A Comparison Study on Similarity Measures in
Collaborative Filtering Algorithms for Movie
Recommendation
¹G. Suganeshwari and ²S.P. Syed Ibrahim
¹,²School of Computing Sciences and Engineering,
Vellore Institute of Technology, Chennai Campus,
Chennai, Tamilnadu, India.
Abstract
Similarity measures are the core component used by neighborhood-based
collaborative filtering algorithms to predict a user's interest in
items based on the interests of other people with similar taste.
Depending on the application, usage and domain, collaborative
filtering has been implemented with different similarity metrics, weighting
approaches and rating normalization schemes. In this paper, different similarity
metrics are compared, and a set of recommendations is provided based
on the influence of the similarity measures on the results of the empirical
study. Item-based and user-based collaborative filtering methods are
used to evaluate these similarity metrics. The experiments are
conducted on the MovieLens dataset. The evaluation metrics are
root mean square error and mean average precision. This paper is intended
to help the research community identify a suitable similarity metric
for a given dataset.
Index Terms: Cosine, Euclidean, Jaccard, Pearson, recommendation
system, similarity measures.
International Journal of Pure and Applied Mathematics, Volume 119 No. 15 2018, 1495-1505, ISSN: 1314-3395 (on-line version), url: http://www.acadpubl.eu/hub/, Special Issue http://www.acadpubl.eu/hub/
1. Introduction
With the advent of Web 2.0, a voluminous amount of information has become
available, which has led to information overload in the e-commerce industry.
Users find it difficult to select items that match their preferences and
interests. A recommender system (RS) helps the user handle the information
overload problem by providing personalized recommendations. RSs are popular
and act as a backbone of e-commerce and social networking sites. User-item
interactions (likes, purchases, reviews, ratings) act as the input for
generating recommendations on shopping sites. Similarly, the data on social
networking sites provide a valuable input for recommendation in classical
collaborative filtering methods. Companies that have successfully implemented
recommender systems include Amazon, Spotify and Netflix.
Collaborative filtering is one of the most widely adopted techniques in the
industry. To provide recommendations, similar items or users must be found.
To identify similar objects, a similarity value is computed between pairs of
items. If a user watches the movie Inception, the recommended movies might be
Avatar and The Dark Knight. But an e-commerce site with as few as 10,000
products already requires about 50 million pairwise similarity computations
(10,000 × 9,999 / 2), and this number grows quadratically, which makes the
approach infeasible for the number of items and users available in recent years.
Moreover, users tend to rate only a small fraction of the products in a large
item catalog, so the recommendation system must cope with a huge volume of
sparse data. To address this key challenge in a dynamically growing dataset,
the similarities must be recomputed periodically.
Improving the accuracy of recommendations has long been a target of
researchers. Since similarity metrics are the core components of CF algorithms,
their performance directly influences the recommendations generated. The aim
of this paper is to investigate the benchmark similarity
measures, namely Euclidean distance, Jaccard similarity, Cosine similarity and
Pearson Correlation Coefficient, and to evaluate their performance on a rating
dataset.
The key contributions of this paper are:
• Popular similarity metrics frequently used for predicting continuous data
are empirically studied and evaluated in a single framework.
• Similarity metrics are evaluated on the benchmark MovieLens dataset
in terms of accuracy and mean average precision.
The rest of the paper is organized as follows. Section 2 discusses the
preliminary knowledge. Section 3 compares popular similarity metrics.
Experimental results are discussed in Section 4, followed by the conclusion and
recommendations on the choice of metric.
2. Preliminary Knowledge
Recommendation System and Types
Recommender systems have emerged as a powerful tool for providing useful
content to the user and increasing the revenue of retailers. As defined
in [1], an RS is a guide that directs the user toward an appropriate decision
within the large option space available. Recommender systems are broadly
classified into content-based, collaborative and hybrid methods [2]. In
content-based methods, items are recommended based on the user's past purchases.
Collaborative filtering recommends items based on similarities between users
or between items. Content-based and collaborative methods are combined in
different ways to yield hybrid methods.
The most successful recommendation technique implemented in the e-commerce
industry is collaborative filtering (CF) [3]. It suggests items to the user
based on the similarity between users or between items. CF techniques are
further classified into model-based and memory-based methods. Memory-based
methods exploit the entire utility matrix to provide recommendations.
Statistical techniques are used to find the users most similar to the active
user. These similar users are called neighbors; they share similar interests
with the target user. Once the neighborhood is formed, different algorithms can
be integrated to provide an effective recommendation. Though this method
yields high accuracy, it suffers from scalability issues when the number of
users and items increases. The computational complexity grows drastically, and
it becomes infeasible to provide recommendations in real time. The similarity
can be computed between users or between items. In contrast to memory-based
methods, model-based methods build a model from the users' ratings and then
generate recommendations using a probabilistic approach. Different machine
learning algorithms such as Bayesian networks, clustering, SVM and rule-based
approaches can be used to build the models.
Definition 1
Given a utility matrix R of size m x n with n users and m items, represented
as U = {u1, u2, …, un} and I = {i1, i2, …, im}, the recommendation problem is
to define a utility function F → {i1, i2, …, ik}. In Eqs. (1) and (2), p and q
denote users and x and y denote items; u denotes the active user and i the
active item. In the similarity formulas, rup and ruq are the ratings given by
the same user u to the items p and q, and rxi and ryi are the ratings given by
two users x and y to the same item i. $\bar{r}_p$, $\bar{r}_q$, $\bar{r}_x$
and $\bar{r}_y$ denote the corresponding mean ratings. The heuristic CF method
for the user-based case aggregates the ratings of the similar users:
$$F(p, x) = \operatorname{aggr}_{q \in n(p)} \; r_{qx} \tag{1}$$
Here n(p) is the set of users similar to user p, rqx is the rating given by a
similar user q to item x, and F(p,x) is the predicted rating of item x for user p. For
item-based it is given by
$$F(p, x) = \operatorname{aggr}_{y \in n(x)} \; r_{py} \tag{2}$$
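The user-based aggregation of Eq. (1) can be illustrated with a minimal Python sketch. The paper leaves the aggregation function unspecified, so the similarity-weighted average used here is an assumption, as are the function name `predict_user_based`, the neighborhood size `k`, and the toy similarity values.

```python
import numpy as np

def predict_user_based(ratings, sims, p, x, k=2):
    """Predict the rating of user p for item x from the k most similar users.

    ratings: 2-D array (users x items); np.nan marks a missing rating.
    sims:    1-D array with the similarity between user p and every user.
    Assumption: aggr is a similarity-weighted average of neighbour ratings.
    """
    # Candidate neighbours: every user except p who has rated item x.
    candidates = [q for q in range(ratings.shape[0])
                  if q != p and not np.isnan(ratings[q, x])]
    # n(p): the k candidates most similar to p.
    neighbours = sorted(candidates, key=lambda q: sims[q], reverse=True)[:k]
    if not neighbours:
        return float("nan")  # nobody similar has rated the item
    weights = np.array([sims[q] for q in neighbours])
    values = np.array([ratings[q, x] for q in neighbours])
    norm = np.abs(weights).sum()
    if norm == 0:
        return float("nan")  # similarities carry no information
    return float(weights @ values / norm)

# Toy utility matrix: 4 users x 3 items, user 0 has not rated item 2.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, 3.0, 4.0],
              [1.0, 5.0, 2.0],
              [5.0, 2.0, 5.0]])
sims_to_user0 = np.array([1.0, 0.9, 0.1, 0.8])  # hypothetical similarity values
prediction = predict_user_based(R, sims_to_user0, p=0, x=2, k=2)
print(round(prediction, 3))
```

With these toy values the two nearest neighbours are users 1 and 3, so the prediction is (0.9 · 4 + 0.8 · 5) / 1.7. The item-based case of Eq. (2) follows the same pattern with the roles of rows and columns exchanged.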
3. Similarity Metrics
The first step in neighborhood-based CF is to weight the items by their
similarity to the active item. The aim is to group the items that are similar
and can be influential in providing recommendations. Similarity metrics are
used to determine the closeness between items; they involve a statistical
computation that measures how alike two objects are. In general, it is not
practical to define a single best similarity metric for the myriad
applications involving a variety of data types, but a comparison study can
highlight the performance and behavior of each metric.
Euclidean Distance
The Euclidean distance between two objects is the length of the straight-line
path connecting them. This metric can be used when the data are dense and
continuous. For n dimensions it is given as
$$\mathrm{Sim}(p, q) = \sqrt{\sum_{j=1}^{n} (p_j - q_j)^2} \tag{3}$$
The distance computed is always greater than or equal to 0. Two objects are
identical when the distance is zero, and dissimilar objects have higher distance
values. In Euclidean distance, the average rating is not considered and the
user's ratings are not normalized. As a result, it doesn't reflect the user's mood
or attitude. Euclidean distance [4] is broadly used in clustering-based
recommendation methods.
Jaccard Similarity
This similarity is used when objects are treated as sets. The Jaccard
similarity [5] is the ratio of the cardinality of the intersection of two sets
to the cardinality of their union.
$$\mathrm{Sim}(p, q) = \frac{|p \cap q|}{|p \cup q|} \tag{4}$$
Jaccard is a measure for asymmetric binary attributes. This method is
used when negative values carry no valuable information. When used with a rating
dataset it doesn't yield accurate results because it ignores the rating values.
Cosine Similarity
Cosine similarity [5] measures the similarity as the normalized dot product of
two objects. Two vectors pointing in the same direction have a small angle
between them, which results in a higher similarity score. This measure yields
accurate results on sparse datasets. It ranges from -1 to +1. The cosine of
the angle is given by
$$\mathrm{sim}(p, q) = \cos(p, q) = \frac{p \cdot q}{\lVert p \rVert_2 \, \lVert q \rVert_2} \tag{5}$$
If vectors P and Q are represented as (x1, y1) and (x2, y2), then the cosine
similarity between the vectors is given as
$$\mathrm{sim}(p, q) = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2} \, \sqrt{x_2^2 + y_2^2}} \tag{6}$$
The problem with cosine similarity is that it treats missing values as
negatives and doesn't take the rating scale into account. This is mainly
because the average rating is not considered. To address this issue, the
Pearson correlation coefficient is used, which amounts to cosine similarity
with mean-centering as normalization.
Pearson Correlation Coefficient
It is a very popular similarity metric used by GroupLens [6] and Ringo [7] for
recommendations. It depends on the co-rated items and their deviation from the
average rating. The similarity between two users or two items given by Pearson
correlation is a measure of the linear relationship between the pair of
objects. The similarity ranges from -1 to +1. Pearson correlation for
item-based CF between items p and q is given by
$$\mathrm{sim}(p, q) = \frac{\sum_{u \in U_{pq}} (r_{up} - \bar{r}_p)(r_{uq} - \bar{r}_q)}{\sqrt{\sum_{u \in U_{pq}} (r_{up} - \bar{r}_p)^2} \, \sqrt{\sum_{u \in U_{pq}} (r_{uq} - \bar{r}_q)^2}} \tag{7}$$
Here $U_{pq}$ is the set of common users who have rated both items p and q, and
$\bar{r}_p$ and $\bar{r}_q$ are the average ratings of items p and q.
Similarly, for the user-based algorithm the Pearson correlation between two
users x and y is computed over the items that both users have rated.
$$\mathrm{sim}(x, y) = \frac{\sum_{i \in I_{xy}} (r_{xi} - \bar{r}_x)(r_{yi} - \bar{r}_y)}{\sqrt{\sum_{i \in I_{xy}} (r_{xi} - \bar{r}_x)^2} \, \sqrt{\sum_{i \in I_{xy}} (r_{yi} - \bar{r}_y)^2}} \tag{8}$$
Here $I_{xy}$ represents the items rated by both users x and y, rxi and ryi are
the ratings given by users x and y to item i, and $\bar{r}_x$ and $\bar{r}_y$
are the average ratings of users x and y. The Ringo music RS used constrained
Pearson correlation [7] to compute similarity weights. It is a variation of
Pearson correlation that uses the midpoint of the rating scale instead of the
mean rating. Spearman correlation [8] is used when ratings are represented as
ranks, and Kendall's coefficient [9] is used when relative ranks are used
instead of ranks. Pearson correlation [6] [12] was used by GroupLens for
predicting Usenet news articles by computing the weighted average. A small
number of co-rated items doesn't yield valuable predictions and leads to low
accuracy. To address this, the SigWt Pearson method is used, which multiplies
the correlation value by the significance weight α/20. Here α is the
number of co-rated items.
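The Pearson correlation over co-rated entries and the α/20 significance weight can be sketched in Python as follows. This is a minimal illustration, not the authors' Spark implementation: the function name `pearson_sigwt` is hypothetical, the means are taken over the co-rated entries only (the text does not state which mean is used), and the weight is applied only when α is below the threshold of 20.

```python
import numpy as np

def pearson_sigwt(a, b, threshold=20):
    """Significance-weighted Pearson correlation of two rating vectors.

    a, b: equal-length sequences of ratings; np.nan marks a missing rating.
    The correlation is computed over co-rated positions and multiplied by
    alpha / threshold when alpha (the co-rated count) is below threshold.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    co_rated = ~np.isnan(a) & ~np.isnan(b)
    alpha = int(co_rated.sum())
    if alpha < 2:
        return 0.0  # correlation is undefined for fewer than two points
    # Assumption: means are computed over the co-rated entries only.
    da = a[co_rated] - a[co_rated].mean()
    db = b[co_rated] - b[co_rated].mean()
    denom = np.sqrt((da ** 2).sum()) * np.sqrt((db ** 2).sum())
    if denom == 0:
        return 0.0  # one vector is constant on the co-rated entries
    corr = float((da * db).sum() / denom)
    weight = min(alpha, threshold) / threshold
    return corr * weight

u = [5, 3, 4, np.nan, 1]
v = [4, 2, 3, 5, np.nan]
weighted = pearson_sigwt(u, v)
print(round(weighted, 3))
```

In this toy case the three co-rated ratings are perfectly correlated, but the weight 3/20 shrinks the similarity to about 0.15, which reflects how little evidence supports it.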
4. Experiments
Dataset
The dataset used for the experiments is the benchmark MovieLens [11] dataset,
which includes 100,000 ratings from 943 users and 1642 movies. In order to
represent the relationship between sparsity and the recommendations generated,
the dataset is divided using cross-validation. All the experiments are
performed on machines with Intel(R) Xeon(R)2 CPU 3.36GHz and 32GB RAM, and
are implemented using Spark version 2.2.0.
Performance Measure
To evaluate the effectiveness of the existing similarity measures, the
performance measures root mean square error and mean average precision are computed.
Root Mean Square Error (RMSE)
RMSE represents the error made when estimating the rating of an unrated item
for an active user. It measures the quality of the recommendations made: the
lower the value, the closer the predictions are to the user's preferences.
RMSE [10] is the most widely used accuracy metric in recommender systems. The
equation is as follows
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{(i,j) \in X} (r_{ij} - r'_{ij})^2}{|X|}} \tag{9}$$
where rij is the actual rating given by user i for an item j, r'ij is the predicted
rating and |X| is the size of the test set.
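As a worked illustration of Eq. (9), the following Python sketch computes RMSE for a small test set. The function name `rmse` and the rating values are hypothetical and are not taken from the paper's experiments.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual = [4.0, 3.0, 5.0, 2.0]      # hypothetical test-set ratings
predicted = [3.5, 3.0, 4.0, 3.0]   # hypothetical model predictions
error = rmse(actual, predicted)
print(error)
```

The squared errors here are 0.25, 0, 1 and 1, so the mean is 0.5625 and the RMSE is 0.75.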
Mean Average Precision
In most recommender systems, the items at the top of the list are more
significant than the rest of the items. Mean Average Precision (MAP) is used to
evaluate these top-K items. MAP is a measure of the average relevance scores
of a set of the top-K documents presented in response to a query. It is the
mean of the average precision at K [30], as described in Eq. (10). Higher MAP
values indicate a higher quality of the recommendation system.
$$\mathrm{MAP} = \frac{\sum_{q \in Q} AP(q)}{|Q|} \tag{10}$$
AP(q) is the average precision for query q, and |Q| is the number of queries.
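A minimal Python sketch of MAP at K is given below. The paper does not spell out how average precision at K is normalized, so the convention used here (dividing by the smaller of K and the number of relevant items) is an assumption, and the function names and toy data are hypothetical. Each user plays the role of a query q.

```python
def average_precision_at_k(recommended, relevant, k):
    """AP@K for one user: mean of precision@rank over the ranks that hit.

    Assumption: the sum is normalized by min(k, number of relevant items).
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(k, len(relevant))

def mean_average_precision(recs_by_user, relevant_by_user, k):
    """MAP@K: mean of AP@K over all users (the queries)."""
    users = list(recs_by_user)
    total = sum(
        average_precision_at_k(recs_by_user[u], relevant_by_user.get(u, ()), k)
        for u in users
    )
    return total / len(users)

recs = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}   # ranked top-3 lists
liked = {"u1": {"a", "c"}, "u2": {"e"}}                  # relevant items
map_at_3 = mean_average_precision(recs, liked, k=3)
print(round(map_at_3, 4))
```

For u1 the hits at ranks 1 and 3 give (1/1 + 2/3) / 2, for u2 the hit at rank 2 gives 1/2, and the mean of the two is 2/3.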
Results and Analysis
Fig. 1 and Fig. 2 show the root mean square error obtained with different
similarity metrics for item-based and user-based CF methods. Comparing these
similarity metrics, it is found that Pearson correlation provides much better
results than the rest of the similarity metrics for the MovieLens dataset.
The cosine method doesn't provide accurate results, because it considers only
the angle and not the relationship between the ratings and the average ratings.
For example, consider three sets of ratings r1, r2 and r3 represented by the
vectors (5,5,5), (1,1,1) and (4,5,5). It is evident that r1 and r3 are more
similar, but according to cosine similarity r1 and r2 are more similar. Another
challenge with this method is that it treats missing values as negative. This
method is mostly used for calculating the similarity between documents.
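The cosine example above can be checked with a short Python sketch. The rating vectors come from the text; the helper name `cosine` is hypothetical.

```python
import numpy as np

def cosine(p, q):
    """Cosine similarity: dot product divided by the product of the norms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

r1, r2, r3 = (5, 5, 5), (1, 1, 1), (4, 5, 5)
sim_12 = cosine(r1, r2)  # parallel vectors, so the angle is zero
sim_13 = cosine(r1, r3)  # slightly different direction
print(round(sim_12, 4), round(sim_13, 4))
```

Because r1 and r2 point in exactly the same direction, their cosine similarity is 1, while r1 and r3 score about 0.995, even though the rating values of r1 and r3 are much closer.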
Fig. 1: Comparison of RMSE for MovieLens 100k Dataset for Item-based CF
Fig. 2: Comparison of RMSE for MovieLens 100k Dataset for User-based CF
With Euclidean distance, the data must be normalized before computing the
similarity value. Consider the rating sets r1 (4,4,3,3), r2 (5,5,4,4) and r3
(3,4,3,3). The r1 rating set comes from a tough rater and r2 from a normal
rater, but their tastes are very similar. The distance between r1 and r2
should therefore be small, which is not the case when the Euclidean distance
is used on the raw ratings.
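The effect of normalization on the Euclidean example can be sketched in Python. The text does not say which normalization is meant, so subtracting each user's mean rating is an assumption here, and the helper names are hypothetical.

```python
import numpy as np

def euclidean(p, q):
    """Euclidean distance between two equal-length rating vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.linalg.norm(p - q))

def mean_center(r):
    """Assumed normalization: subtract the user's own mean rating."""
    r = np.asarray(r, dtype=float)
    return r - r.mean()

r1, r2, r3 = (4, 4, 3, 3), (5, 5, 4, 4), (3, 4, 3, 3)
raw_12 = euclidean(r1, r2)
raw_13 = euclidean(r1, r3)
centered_12 = euclidean(mean_center(r1), mean_center(r2))
centered_13 = euclidean(mean_center(r1), mean_center(r3))
print(raw_12, raw_13, centered_12, round(centered_13, 3))
```

On the raw ratings r1 is farther from r2 (distance 2) than from r3 (distance 1). After mean-centering, r1 and r2 coincide (distance 0), which matches the observation that the tough and the normal rater share the same taste.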
The experimental analysis shows that Jaccard similarity also doesn't yield
significant results for a rating dataset. Consider the rating sets r1
(4, x, x, 5, 1), r2 (5, 5, 5, x, x) and r3 (1, x, x, 2, 5), where x represents
a movie not rated by the user. According to Jaccard similarity, r1 and r2 are
less similar than r1 and r3, even though r1 and r3 rate the same movies in
opposite ways. This is because the Jaccard similarity method ignores the
rating values.
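The Jaccard example can be reproduced with a few lines of Python. Unrated movies (x in the text) are represented as `None`, and the helper name `jaccard` is hypothetical.

```python
def jaccard(p, q):
    """Jaccard similarity of the sets of rated positions; values are ignored."""
    rated_p = {i for i, r in enumerate(p) if r is not None}
    rated_q = {i for i, r in enumerate(q) if r is not None}
    union = rated_p | rated_q
    if not union:
        return 0.0  # neither user rated anything
    return len(rated_p & rated_q) / len(union)

r1 = (4, None, None, 5, 1)
r2 = (5, 5, 5, None, None)
r3 = (1, None, None, 2, 5)
sim_12 = jaccard(r1, r2)  # one shared movie out of five rated overall
sim_13 = jaccard(r1, r3)  # the same three movies rated by both
print(sim_12, sim_13)
```

The result is 0.2 for r1 and r2 and 1.0 for r1 and r3: the measure only sees which movies were rated, not how they were rated.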
The Pearson correlation method provides better results by normalizing the
ratings, subtracting the mean rating from each rating. This addresses the
challenge of handling tough raters and normal raters. In addition, missing
ratings are not treated as negative values but as average values. However, the
Pearson correlation is often dominated by pairs with a small number of
co-rated objects. To address this, significance weighting can be used: when
the number of co-rated items n is less than 20, the correlation value is
multiplied by n/20. This reduces the influence of similarities that rest on
only a few co-rated items.
Mean Average Precision
From Fig. 3 and Fig. 4, it is inferred that item-based CF provides better
accuracy than user-based CF.
Fig. 3: Comparison of MAP for MovieLens 100k Dataset for Item-based CF
Fig. 4: Comparison of MAP for MovieLens 100k dataset for User-based CF
5. Conclusion
Item-based collaborative filtering methods provide better results than
user-based collaborative filtering methods. In item-based CF the
neighborhood is static, which makes it possible to precompute the similarity
between the items of a large sparse dataset. User-based CF is extensively used
in social networking sites, and item-based CF is used by e-commerce services
such as Amazon, MovieLens and Netflix. Depending on the dataset, different
similarity metrics can be used. Pearson can be used for large datasets where
the data isn't normalized. Cosine is used when frequency plays an important
role, and Jaccard can be used when the data is represented as binary attributes.
References
[1] Burke R., Hybrid recommender systems: Survey and experiments, User modeling and user-adapted interaction 12(4) (2002), 331-370.
[2] Adomavicius G., Tuzhilin A., Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE transactions on knowledge and data engineering 17(6) (2005), 734-749.
[3] Ren Y., Li G., Zhang J., Zhou W., Lazy collaborative filtering for data sets with missing values, IEEE transactions on cybernetics 43(6) (2013), 1822-1834.
[4] Song L., Tekin C., van der Schaar M., Online learning in large-scale contextual recommender systems, IEEE Transactions on Services Computing 9(3) (2016), 433-445.
[5] Meng S., Dou W., Zhang X., Chen J., KASR: A Keyword-Aware Service Recommendation method on MapReduce for big data applications, IEEE Transactions on Parallel and Distributed Systems 25(12) (2014), 3221-3331.
[6] Resnick P., Iacovou N., Suchak M., Bergstrom P., Riedl J., GroupLens: an open architecture for collaborative filtering of netnews, Proceedings of the ACM conference on Computer supported cooperative work (1994), 175-186.
[7] Shardanand U., Maes P., Social information filtering: algorithms for automating “word of mouth”, Proceedings of the SIGCHI conference on Human factors in computing systems (1995), 210-217.
[8] Goldberg K., Roeder T., Gupta D., Perkins C., Eigentaste: A constant time collaborative filtering algorithm, Information Retrieval 4(2) (2001), 133-151.
[9] Herlocker J.L., Konstan J.A., Terveen L.G., Riedl J.T., Evaluating collaborative filtering recommender systems, ACM Transactions on Information Systems (TOIS) 22(1) (2004).
[10] Sarwat M., Levandoski J.J., Eldawy A., Mokbel M.F., LARS*: An efficient and scalable location-aware recommender system, IEEE Transactions on Knowledge and Data Engineering 26(6) (2014), 1384-1399.
[11] Harper F.M., Konstan J.A., The movielens datasets: History and context, ACM Transactions on Interactive Intelligent Systems (TiiS) 5(4) (2016).
[12] Suganeshwari G., Ibrahim S.S., A survey on collaborative filtering-based recommendation system, Proceedings of the 3rd International Symposium on Big Data and Cloud Computing Challenges (2016), 503-518.