Objective
To develop a content-based Tourist Destination Recommendation System that utilizes user preferences to suggest personalized travel destinations, enhancing the travel planning experience.
Motivation
Based on personal experience: a few months back, while planning my summer vacation, I spent countless hours researching different destinations, going through multiple websites to collect information about each one. This inspired me to build an app that uses cosine similarity to recommend destinations based on user input.
Skills: Python, Machine Learning - Cosine Similarity
Dataset: A custom dataset was created. It contains the destination name, category (historical, cultural, architectural, commercial or any other significance of the place), minimum budget per day, best months to visit, state/country, continent, language and visa requirement.
Cosine Similarity
Cosine similarity is a metric that measures how similar two data objects are, irrespective of their size (magnitude). To compare two records, each is represented as a vector, and the similarity is the cosine of the angle between them: the dot product of the two vectors divided by the product of their lengths. The formula for cosine similarity is:
$$ \text{Cosine Similarity} = \frac{{A \cdot B}}{{|A| |B|}} $$
where
- A · B = dot product of the vectors ‘A’ and ‘B’.
- |A| and |B| = lengths (Euclidean norms) of the vectors ‘A’ and ‘B’.
- |A| |B| = product of the lengths of the two vectors.

As the above diagram shows, the angle between vectors v1 and v2 is Θ. The greater the angle between the two vectors, the lower their similarity.
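The formula can be sketched directly in Python. This is a minimal illustration using only the standard library, not the project's actual implementation; the function name `cosine_similarity` and the example vectors are hypothetical. It shows that two vectors pointing in the same direction score about 1 regardless of their lengths, while orthogonal vectors score 0.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity = (A . B) / (|A| |B|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))    # A . B
    norm_a = math.sqrt(sum(x * x for x in a))  # |A|
    norm_b = math.sqrt(sum(y * y for y in b))  # |B|
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical example vectors
v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]   # same direction as v1, twice the length
v3 = [3.0, 0.0, -1.0]  # orthogonal to v1: 1*3 + 2*0 + 3*(-1) = 0

print(cosine_similarity(v1, v2))  # ≈ 1.0 (up to floating-point rounding)
print(cosine_similarity(v1, v3))  # 0.0
```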
The cosine distance is calculated using the formula:
$$ \text{Cosine Distance} = 1 - \text{Cosine Similarity} $$
Hence, when:
- Θ = 0°: cos 0° = 1, so cosine distance = 1 - 1 = 0 ∴ the two vectors point in the same direction (maximum similarity)
- Θ = 90°: cos 90° = 0, so cosine distance = 1 - 0 = 1 ∴ the two vectors are orthogonal, i.e. very different
- Θ = 180°: cos 180° = -1, so cosine distance = 1 - (-1) = 2 ∴ the two vectors point in opposite directions
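The three cases above can be checked numerically. The sketch below is illustrative only: the helper `cosine_distance` and the example vectors are hypothetical, chosen so that each lies at 0°, 90° or 180° to the reference vector `[1, 0]`.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (both vectors must be non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norms


reference = [1, 0]
cases = [
    ("theta = 0 degrees", [3, 0]),     # same direction, different length
    ("theta = 90 degrees", [0, 2]),    # orthogonal
    ("theta = 180 degrees", [-1, 0]),  # opposite direction
]
for label, vector in cases:
    # prints 0.0, 1.0 and 2.0 respectively
    print(label, "-> cosine distance =", cosine_distance(reference, vector))
```

When every feature value is non-negative, as with token counts, the dot product cannot be negative, so the angle never exceeds 90° and the cosine distance stays between 0 and 1.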
Methodology
- Data collection: The first step in building a destination recommendation system is obtaining appropriate data. Since the required data was not directly available, we designed the dataset ourselves. The dataset contains 9 features, namely Index, Destination, Category, Min Budget (Per Day in $), Best Months, State/Country, Continent, Language and Visa.
- Data preparation: Handling corrupted and missing data is essential, so the data was preprocessed to handle corrupted and missing values.
- Combining relevant features: In this step, only the features required to make recommendations were combined into a single attribute. In the destination recommendation system, 3 features (Category, Min Budget (Per Day in $) and Visa) are used to make recommendations.
- Apply a filtering algorithm: To recommend destinations, content-based filtering is used, with cosine similarity as the measure of how closely each destination matches the user input.
- Provide recommendations: Once cosine similarity has been computed between a particular user input and every record in the dataset, each destination receives a similarity score. The destinations are then sorted by this score in descending order, and the top 5 are recommended to the user.
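The methodology above can be sketched end to end as follows. This is a hypothetical, minimal sketch rather than the project's code: the placeholder records (`Destination A` to `Destination F`), the prefixed `budget_`/`visa_` tokens and the bag-of-words token-count vectorization are all assumptions, since exactly how the three features were combined and vectorized is not specified. The sketch combines Category, Min Budget and Visa into one text attribute, scores every destination against the user input with cosine similarity, sorts the scores in descending order and returns the top 5.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder records, NOT the project's real dataset.
DESTINATIONS = [
    {"Destination": "Destination A", "Category": "historical, cultural", "Min Budget": 50, "Visa": "Yes"},
    {"Destination": "Destination B", "Category": "architectural, historical", "Min Budget": 80, "Visa": "No"},
    {"Destination": "Destination C", "Category": "commercial", "Min Budget": 120, "Visa": "Yes"},
    {"Destination": "Destination D", "Category": "cultural", "Min Budget": 50, "Visa": "No"},
    {"Destination": "Destination E", "Category": "historical", "Min Budget": 50, "Visa": "No"},
    {"Destination": "Destination F", "Category": "commercial, architectural", "Min Budget": 200, "Visa": "No"},
]


def combine_features(category, min_budget, visa):
    """Merge Category, Min Budget and Visa into one text attribute.

    Assumption: budget and visa values are prefixed so that, e.g.,
    'visa_no' is never confused with a category word.
    """
    return f"{category} budget_{min_budget} visa_{visa}".lower()


def vectorize(text):
    """Assumed vectorization: bag-of-words token counts."""
    return Counter(re.findall(r"\w+", text))


def cosine_similarity(a, b):
    """(A . B) / (|A| |B|) for two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(category, min_budget, visa, destinations=DESTINATIONS, top_n=5):
    """Score every destination against the user input and return the top_n."""
    query = vectorize(combine_features(category, min_budget, visa))
    scored = []
    for row in destinations:
        doc = vectorize(combine_features(row["Category"], row["Min Budget"], row["Visa"]))
        scored.append((row["Destination"], cosine_similarity(query, doc)))
    # Sort by similarity, highest first (stable, so ties keep dataset order).
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


for name, score in recommend("historical", 50, "No"):
    print(f"{name}: {score:.3f}")
```

With the placeholder data, `Destination E` ranks first because it matches the query on all three tokens. Treating the budget as an exact-match token is a simplification; scikit-learn's `CountVectorizer` combined with `sklearn.metrics.pairwise.cosine_similarity` is a common library alternative to the hand-written counting shown here.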
Results
