Evaluation of Different Clustering Algorithms: With Social Media Data in Tourism Domain
With an ever-growing amount of data, it has become more difficult to apply traditional data analysis methods, as they are often computationally intensive. Traditional travel recommender systems (TSR) are no exception, and one possible solution is cluster analysis. Through clustering, a large dataset can be broken down into smaller datasets, or clusters, and methods like TSR can then be applied independently to each cluster. However, clustering is often exploratory, so it is difficult to know in advance which algorithm should be adopted for the datasets at hand. This study therefore explores both partitioning and hierarchical algorithms to establish the best-performing algorithm for three tourism-related datasets. The algorithms were compared using performance measures such as silhouette scores. Several algorithms, including K-means, PAM, single-linkage, and divisive clustering, were tested; according to the results, the single-linkage and average-linkage methods of agglomerative hierarchical clustering produce the best-performing clusters on the three datasets. This research provides insights into cluster analysis and, more specifically, into clustering tourism-related datasets.
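To make the comparison procedure concrete, the following is a minimal sketch of how two agglomerative linkage methods could be scored with the silhouette measure. It is not the implementation used in this study: the function names (`agglomerative`, `silhouette_score`), the toy two-dimensional points, and the choice of two clusters are all assumptions made for illustration, and the tourism datasets themselves are not reproduced here.

```python
import math
from itertools import combinations

def agglomerative(points, k, linkage="single"):
    """Naive agglomerative clustering (hypothetical helper, k >= 1).

    Repeatedly merges the two closest clusters until k remain.
    linkage="single" uses the minimum pairwise distance between clusters,
    linkage="average" uses the mean pairwise distance.
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_dist(c1, c2):
        ds = [math.dist(points[i], points[j]) for i in c1 for j in c2]
        return min(ds) if linkage == "single" else sum(ds) / len(ds)

    while len(clusters) > k:
        # combinations() yields index pairs with a < b, so deleting b
        # after the merge leaves index a valid.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)

    total = 0.0
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # by convention, a singleton contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for other_lab, other in groups.items()
            if other_lab != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)

# Toy data (an assumption, not from the study): two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

# Score each linkage method on the same data and the same number of clusters.
scores = {}
for method in ("single", "average"):
    labels = agglomerative(points, k=2, linkage=method)
    scores[method] = silhouette_score(points, labels)
    print(method, round(scores[method], 3))
```

A silhouette score closer to 1 indicates compact, well-separated clusters, so the method with the higher score would be preferred for that dataset. In practice an established library would normally replace this quadratic-time sketch, which is only suitable for small examples.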