Content
1 Recommender System
2 KNN Algorithm—CF
3 Matrix Factorization
4 MF on Hadoop
5 Thesis Framework
1 Recommender System
A recommender system is a system that can recommend something you may be interested in but have not yet tried.
For example, if you have bought a book about machine learning, the system would give you a recommendation list including books about data mining and pattern recognition, and even some programming books.
1 Recommender System
But how does she get the recommendation list?
Machine Learning
1. Nuclear Pattern Recognition Method and Its Application
2. Introduction to Robotics
3. Data Mining
4. Beauty of Programming
5. Artificial Intelligence
1 Recommender System
There are many ways to generate the list. Recommender systems are usually classified into the following categories, based on how recommendations are made:
1. Content-based recommendations: The user will be recommended items similar to the ones the user preferred in the past;
1 Recommender System
2. Collaborative recommendations: The user will be recommended items that people with similar tastes and preferences liked in the past;
[Figure: using the corated items, the top-1 most similar user is found; an item the similar user favors but the target user has not bought is recommended to the target user.]
1 Recommender System
3. Hybrid approaches: These methods combine collaborative and content-based methods, which helps avoid certain limitations of each. The different ways to combine them into a hybrid recommender system can be classified as follows:
1) implementing collaborative and content-based methods separately and combining their predictions;
2) incorporating some content-based characteristics into a collaborative approach;
3) incorporating some collaborative characteristics into a content-based approach;
4) constructing a general unifying model that incorporates both content-based and collaborative characteristics.
2 KNN Algorithm—CF
KDD CUP 2011 website: http://kddcup.yahoo.com/index.php
Recommending Music Items based on the Yahoo! Music Dataset.
The dataset is split into two subsets:
- Train data: in the file trainIdx2.txt
- Test data: in the file testIdx2.txt
In each subset, user rating data is grouped by user. The first line for a user is formatted as:
<UserId>|<#UserRatings>\n
Each of the next <#UserRatings> lines describes a single rating by <UserId>, with the format:
<ItemId>\t<Score>\n
The scores are integers between 0 and 100, and are withheld from the test set. All user ids and item ids are consecutive integers, both starting at zero.
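As a sketch, the per-user blocks of this format can be parsed as follows (the function name and the inline sample string are mine, not part of the dataset):

```python
from io import StringIO

def parse_ratings(stream):
    """Parse KDD Cup 2011 rating data: per user, a header line
    '<UserId>|<#UserRatings>' followed by that many '<ItemId>\\t<Score>' lines."""
    ratings = {}  # user_id -> {item_id: score}
    line = stream.readline()
    while line:
        user_id, n = line.rstrip("\n").split("|")
        user_ratings = {}
        for _ in range(int(n)):
            item_id, score = stream.readline().rstrip("\n").split("\t")
            user_ratings[int(item_id)] = int(score)
        ratings[int(user_id)] = user_ratings
        line = stream.readline()
    return ratings

sample = "0|2\n5\t80\n9\t100\n1|1\n5\t40\n"
print(parse_ratings(StringIO(sample)))  # {0: {5: 80, 9: 100}, 1: {5: 40}}
```

The same loop works unchanged on an open file handle for trainIdx2.txt or testIdx2.txt.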
2 KNN Algorithm—CF
KNN is the algorithm I used when participating in KDD CUP 2011 with my advisor, Mrs. Lin; KNN belongs to the collaborative recommendation category.
[Figure: using the corated items, the top-1 most similar user is found; a song the similar user favors but the target user has not seen is recommended to the target user.]
2 KNN Algorithm—CF
The user-item rating matrix (users 1-3, items 1-4; "?" marks an unknown rating):

user1 = (r11, ?,   r13, r14)
user2 = (r21, r22, ?,   r24)
user3 = (r31, r32, r33, ?  )
2 KNN Algorithm—CF
1. Cosine distance
2. Pearson correlation coefficient
where $S_{xy}$ is the set of all items corated by both users x and y.
2 KNN Algorithm—CF
1. Cosine distance:

$$\mathrm{sim}(x,y) = \frac{\sum_{s \in S_{xy}} r_{x,s}\, r_{y,s}}{\sqrt{\sum_{s \in S_{xy}} r_{x,s}^2}\; \sqrt{\sum_{s \in S_{xy}} r_{y,s}^2}}$$
2 KNN Algorithm—CF
2. Pearson correlation coefficient:

$$\mathrm{sim}(x,y) = \frac{\sum_{s \in S_{xy}} (r_{x,s} - \bar r_x)(r_{y,s} - \bar r_y)}{\sqrt{\sum_{s \in S_{xy}} (r_{x,s} - \bar r_x)^2}\; \sqrt{\sum_{s \in S_{xy}} (r_{y,s} - \bar r_y)^2}}$$

where $\bar r_x$ and $\bar r_y$ are the mean ratings of users x and y over $S_{xy}$.
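A minimal sketch of both similarity measures restricted to the corated set (function names and the sample ratings are mine):

```python
import math

def corated(rx, ry):
    # S_xy: items rated by both users (ratings given as item -> score dicts)
    return sorted(set(rx) & set(ry))

def cosine_sim(rx, ry):
    s = corated(rx, ry)
    num = sum(rx[i] * ry[i] for i in s)
    den = math.sqrt(sum(rx[i] ** 2 for i in s)) * math.sqrt(sum(ry[i] ** 2 for i in s))
    return num / den if den else 0.0

def pearson_sim(rx, ry):
    s = corated(rx, ry)
    if len(s) < 2:
        return 0.0
    mx = sum(rx[i] for i in s) / len(s)
    my = sum(ry[i] for i in s) / len(s)
    num = sum((rx[i] - mx) * (ry[i] - my) for i in s)
    den = (math.sqrt(sum((rx[i] - mx) ** 2 for i in s))
           * math.sqrt(sum((ry[i] - my) ** 2 for i in s)))
    return num / den if den else 0.0

x = {1: 80, 2: 60, 3: 100}
y = {2: 70, 3: 90, 4: 50}
print(cosine_sim(x, y), pearson_sim(x, y))
```

With scores on the 0-100 scale, cosine similarity of two positive rating vectors is always high, which is why mean-centering (Pearson) is often preferred for neighbor selection.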
2 KNN Algorithm—CF
trackData.txt - Track information, formatted as:
<TrackId>|<AlbumId>|<ArtistId>|<Optional GenreId_1>|...|<Optional GenreId_k>\n
albumData.txt - Album information, formatted as:
<AlbumId>|<ArtistId>|<Optional GenreId_1>|...|<Optional GenreId_k>\n
artistData.txt - Artist listing, formatted as:
<ArtistId>\n
genreData.txt - Genre listing, formatted as:
<GenreId>\n
2 KNN Algorithm—CF
[Figure: item taxonomy tree; tracks (leaf nodes) link to albums, albums to artists, and tracks and albums to genres.]
2 KNN Algorithm—CF
1. The distance between a parent node p and a child node c is given by a weight wt(c, p) that combines the node depth d(p), the information content IC(c) and IC(p) of the two nodes, and the term T(p, c); here E(p) denotes the information entropy (comentropy) of p.
2. The similarity between two items c1 and c2 is then computed from these weights along the taxonomy paths that connect them.
3 Matrix Factorization
Example: three users (u1, u2, u3) and three items (i1, i2, i3). We look for a users feature matrix X and an items feature matrix Y whose product reproduces the known ratings:

x11*y11 + x12*y12 = 1
x11*y21 + x12*y22 = 3
x21*y11 + x22*y12 = 2
x31*y21 + x32*y22 = 1
x31*y31 + x32*y32 = 3

Solving for the factors U = X and V = Y, the unknown ratings can then be predicted:

x11*y31 + x12*y32 = ?
x21*y21 + x22*y22 = ?
x21*y31 + x22*y32 = ?
x31*y11 + x32*y12 = ?
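As an illustrative sketch (assuming k = 2 features per user and item, plain per-rating SGD, and a fixed random seed; not necessarily the thesis code), the system above can be fitted numerically and the "?" entries filled in:

```python
import numpy as np

rng = np.random.default_rng(0)
# the five known ratings from the example: (user, item) -> score
known = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (2, 1): 1.0, (2, 2): 3.0}

X = rng.normal(scale=0.1, size=(3, 2))  # users feature matrix
Y = rng.normal(scale=0.1, size=(3, 2))  # items feature matrix

lr = 0.05
for _ in range(5000):
    for (u, i), r in known.items():
        err = r - X[u] @ Y[i]          # residual on this known rating
        xu = X[u].copy()               # keep old X[u] for Y's update
        X[u] += lr * err * Y[i]
        Y[i] += lr * err * xu

R_hat = X @ Y.T                        # the '?' entries are now predicted too
print(np.round(R_hat, 2))
```

After training, the five known cells of R_hat reproduce their ratings and the remaining four cells hold the model's predictions.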
3 Matrix Factorization
Matrix factorization (abbr. MF), just as the name suggests, decomposes a big matrix into a product of several small matrices. Mathematically: given the target matrix $R \in \mathbb{R}^{m \times n}$, we look for factor matrices $U \in \mathbb{R}^{m \times k}$ and $V \in \mathbb{R}^{n \times k}$, where $k \ll \min(m, n)$, such that $R \approx K(U, V^T)$.
3 Matrix Factorization
Kernel Function
The kernel function decides how to compute the prediction matrix $\tilde R$; that is, it is a function taking the feature matrices U and V as arguments. We can express it as follows:

$$\tilde r_{i,j} = a + c\, K(u_i, v_j)$$

where a and c are a global offset and scale.
3 Matrix Factorization
Kernel Function
For the kernel $K : \mathbb{R}^k \times \mathbb{R}^k \to \mathbb{R}$, one can use one of the following well-known kernels:

linear: $K_l(u_i, v_j) = \langle u_i, v_j \rangle$
polynomial: $K_p(u_i, v_j) = (1 + \langle u_i, v_j \rangle)^d$
RBF: $K_r(u_i, v_j) = \exp\big(-\lVert u_i - v_j \rVert^2 / (2\sigma^2)\big)$
logistic: $K_s(u_i, v_j) = s(\langle u_i, v_j \rangle)$, with $s(x) := 1/(1 + e^{-x})$
3 Matrix Factorization
We quantify the quality of the approximation with the Euclidean distance, so we get the objective function as follows:

$$\operatorname*{argmin}_{U,V} f = \sum_{(i,j) \in R} \big(r_{i,j} - \tilde r_{i,j}\big)^2$$

where $\tilde r_{i,j} = \sum_{k=1}^{K} u_{i,k}\, v_{j,k}$ is the predicted value. A logarithmic alternative is

$$\operatorname*{argmin}_{U,V} \sum_{(i,j) \in R} \Big(r_{i,j} \log \frac{r_{i,j}}{\tilde r_{i,j}} - r_{i,j} + \tilde r_{i,j}\Big)$$

and a regularization term $\lambda_u \lVert u_i \rVert^2 + \lambda_v \lVert v_j \rVert^2$ can be added to either objective.
3 Matrix Factorization
1. Alternating Descent Method
This method only works when the loss function uses the Euclidean distance. Setting the derivative with respect to $U_i$ to zero,

$$\frac{\partial f}{\partial U_i} = \sum_j \big(r_{i,j} - U_i V_j^T\big)(-V_j) = 0$$

so we can get

$$U_i = \Big(\sum_j r_{i,j} V_j\Big)\Big(\sum_j V_j^T V_j\Big)^{-1}$$

The same holds for $V_j$.
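A minimal sketch of one alternating sweep, assuming a fully observed rating matrix for simplicity and adding a small ridge term lam (my addition) to keep each solve well-posed:

```python
import numpy as np

def als_sweep(R, U, V, lam=0.1):
    """One alternating sweep: with V fixed, each U_i has the closed form
    U_i = (sum_j r_ij V_j)(sum_j V_j^T V_j + lam I)^(-1); then likewise for V."""
    k = U.shape[1]
    for i in range(U.shape[0]):
        U[i] = np.linalg.solve(V.T @ V + lam * np.eye(k), V.T @ R[i])
    for j in range(V.shape[0]):
        V[j] = np.linalg.solve(U.T @ U + lam * np.eye(k), U.T @ R[:, j])
    return U, V

rng = np.random.default_rng(1)
R = rng.uniform(1, 5, size=(4, 5))     # toy fully observed rating matrix
U = rng.normal(size=(4, 2))
V = rng.normal(size=(5, 2))
err_start = np.abs(R - U @ V.T).mean()
for _ in range(20):
    U, V = als_sweep(R, U, V)
err_end = np.abs(R - U @ V.T).mean()
print(err_start, "->", err_end)
```

Each subproblem is an ordinary least-squares solve, which is why this method requires the Euclidean loss.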
3 Matrix Factorization
2. Gradient Descent Method
The update rule of U is defined as follows:

$$\frac{\partial f}{\partial U_i} = \sum_j \big(r_{i,j} - U_i V_j^T\big)(-V_j) + \lambda_u U_i, \qquad U_i \leftarrow U_i - \eta\, \frac{\partial f}{\partial U_i}$$

where $U_i = u_i$; the same holds for $V_j$. Here the regularized objective is

$$\operatorname*{argmin}_{U,V} f = \sum_{(i,j) \in R} \big(r_{i,j} - \langle u_i, v_j \rangle\big)^2 + \lambda_u \lVert u_i \rVert^2 + \lambda_v \lVert v_j \rVert^2$$
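A sketch of the regularized gradient update on a toy matrix, where I mask the missing entries (marked 0) so only observed ratings contribute to the gradient; the step size and lambda are my choices:

```python
import numpy as np

rng = np.random.default_rng(2)
R = np.array([[1., 3., 0.],
              [2., 0., 0.],
              [0., 1., 3.]])           # 0 marks a missing rating
M = (R > 0).astype(float)             # mask of observed entries
U = rng.normal(scale=0.1, size=(3, 2))
V = rng.normal(scale=0.1, size=(3, 2))
lr, lam = 0.05, 0.01
for _ in range(5000):
    E = M * (R - U @ V.T)             # residuals on observed entries only
    # U <- U - lr * dF/dU with dF/dU = -E V + lam U (likewise for V)
    U, V = U + lr * (E @ V - lam * U), V + lr * (E.T @ U - lam * V)
print(np.round(U @ V.T, 2))
```

Unlike the alternating method, this gradient form extends directly to other differentiable losses and kernels.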
3 Matrix Factorization
Gradient Algorithm
Stochastic Gradient Algorithm
3 Matrix Factorization
Online Algorithm
Online-Updating Regularized Kernel Matrix Factorization Models for Large-Scale Recommender Systems
4 MF on Hadoop
Loss Function

$$\operatorname*{argmin}_{U,V} f = \sum_{(i,j) \in R} \big(r_{i,j} - \langle u_i, v_j \rangle\big)^2$$

We update the factor V to reduce the objective function f with conventional gradient descent, as follows:

$$V \leftarrow V + \eta\, (R - U V^T)^T U = V + \eta\, (R^T U - V\, U^T U)$$

Here the product $U^T U$ is only a small $k \times k$ matrix, so the update is reachable at scale; the same applies to the factor matrix U.
4 MF on Hadoop
[Figure: A is split into row blocks a_1^T, a_2^T, ..., a_m^T and B into column blocks b_1, b_2, ..., b_n; each pair yields one result block, giving the grid R_1_1, R_1_2, ..., R_m_n.]
4 MF on Hadoop
[Figure: the left matrix times the right matrix equals the sum of the partial products of their corresponding parts.]
4 MF on Hadoop
[Figure: A is split into column blocks a_1, a_2, ..., a_s and B into row blocks b_1^T, b_2^T, ..., b_s^T; each pair produces a partial result R_k_k, and the product A × B is the sum of the outer products a_k b_k^T.]
4 MF on Hadoop
$$AB = C = \begin{pmatrix} C_{11} & \cdots & C_{1N} \\ \vdots & \ddots & \vdots \\ C_{M1} & \cdots & C_{MN} \end{pmatrix}, \quad A = \begin{pmatrix} A_{11} & \cdots & A_{1S} \\ \vdots & \ddots & \vdots \\ A_{M1} & \cdots & A_{MS} \end{pmatrix}, \quad B = \begin{pmatrix} B_{11} & \cdots & B_{1N} \\ \vdots & \ddots & \vdots \\ B_{S1} & \cdots & B_{SN} \end{pmatrix}$$

where $C_{ij} = \sum_{k=1}^{S} A_{ik} B_{kj}$ ($i = 1, \ldots, M$; $j = 1, \ldots, N$).
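The block formula can be checked with a small sketch (the function name is mine); each innermost partial product mirrors one term a reduce task would sum:

```python
import numpy as np

def blocked_matmul(A, B, s):
    """C = A @ B computed block-by-block: C[I,J] += A[I,K] @ B[K,J] for each K,
    mirroring how each reduce task sums partial block products on Hadoop."""
    m, n = A.shape[0], B.shape[1]
    C = np.zeros((m, n))
    for i in range(0, m, s):
        for j in range(0, n, s):
            for k in range(0, A.shape[1], s):
                C[i:i+s, j:j+s] += A[i:i+s, k:k+s] @ B[k:k+s, j:j+s]
    return C

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 4))
B = rng.normal(size=(4, 6))
print(np.allclose(blocked_matmul(A, B, 2), A @ B))  # True
```

Because NumPy slicing clips at the array edge, the same code also handles block sizes that do not divide the matrix dimensions evenly.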
5 Thesis Framework
Recommendation System
1. Introduction to recommendation system
2. My work on KNN
3. Matrix factorization in recommendation system
4. MF incremental updating using Hadoop