Clustering plays an important role in data mining as many applications use it as a preprocessing step for data analysis. Traditional clustering focuses on the grouping of similar objects, while two-way co-clustering can group dyadic data (objects as well as their attributes) simultaneously. Most co-clustering research focuses on single correlation data, but there might be other possible descriptions of dyadic data that could improve co-clustering performance. In this research, we extend ITCC (Information-Theoretic Co-Clustering) to the problem of co-clustering with an augmented matrix. We propose CCAM (Co-Clustering with Augmented Data Matrix) to include this augmented data for better co-clustering. We apply CCAM in the analysis of on-line advertising, where both ads and users must be clustered. The key data that connect ads and users are the user-ad link matrix, which identifies the ads that each user has linked; both ads and users also have their own feature data, i.e. the augmented data matrix. To evaluate the proposed method, we use two measures: classification accuracy and K-L divergence. The experiment is done using the advertisement and user data from Morgenstern, a financial social website that focuses on the advertisement agency business. The experiment results show that CCAM provides better performance than ITCC since it considers the augmented data during clustering.
CO-CLUSTERING WITH AUGMENTED
DATA MATRIX
Authors: Meng-Lun Wu, Chia-Hui Chang, and Rui-Zhe Liu
Dept. of Computer Science and Information Engineering
National Central University
2011/8/24, DaWaK 2011, Toulouse, France
OUTLINE
Introduction
Related Work
Problem Formulation
Co-Clustering Algorithm
Experiment Results and Evaluation
Conclusion
INTRODUCTION
Over the past decade, co-clustering has arisen to address the simultaneous clustering of dyadic data.
However, most research takes only the dyadic data as the main clustering matrix, without considering additional information.
In addition to a user-movie click matrix, we might have user preferences and movie descriptions.
Similarly, in addition to a document-word co-occurrence matrix, we might have document genres and word meanings.
INTRODUCTION (CONT.)
To fully utilize the augmented matrix, we propose a new method called Co-Clustering with Augmented data Matrix (CCAM).
The Umatch [1] social website provides the Ad$mart service, which lets users click ads and shares the profit with them.
We cooperated with Umatch, which asked us to analyze its ad-user information using the following data:
ad-user click data, ad setting data, and user profiles (the Lohas questionnaire).
1. Umatch: http://www.morgenstern.com.tw/users2/index.php/u_match1/
RELATED WORK
Co-clustering research can be separated into three categories: MDCC, MOCC [2], and ITCC.
MDCC: matrix-decomposition co-clustering
  Long et al. (2005), "Co-clustering by Block Value Decomposition"
  Ding et al. (2005) gave a similar co-clustering approach based on nonnegative matrix factorization.
MOCC: topic-model-based co-clustering
  Shafiei et al. (2006), "Latent Dirichlet Co-clustering"
  Hanhuai et al. (2008), "Bayesian Co-clustering"
2. M. Mahdi Shafiei and Evangelos E. Milios “Model-based Overlapping Co-Clustering”
RELATED WORK (CONT.)
ITCC: an optimization-based method
  Dhillon et al. (2003), "Information-Theoretic Co-Clustering"
  Banerjee et al. (2004), "A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation"
Li et al. employ the ITCC framework to propagate class structure and knowledge from in-domain data to out-of-domain data.
Inspired by Li and Dhillon, we extend the ITCC framework with augmented matrices to co-cluster ads and users.
PROBLEM FORMULATION
Let A, U, S and L be discrete random variables:
  A denotes ads, ranging over {a1, …, am}
  U denotes users, ranging over {u1, …, un}
  S denotes ad settings, ranging over {s1, …, sr}
  L denotes Lohas questionnaire items, ranging over {l1, …, lv}
Input data: the joint probability distributions
  p(A, U): ad-user link matrix
  p(A, S): ad-setting matrix
  p(U, L): user-Lohas matrix
Given p(A, U), the mutual information is defined as
  I(A; U) = Σ_a Σ_u p(a, u) log [ p(a, u) / (p(a) p(u)) ]
PROBLEM FORMULATION
Goal: to obtain
  k ad clusters denoted by {â1, …, âk}
  l user groups denoted by {û1, …, ûl}
such that the mutual information loss after co-clustering is minimized. The objective function is
  min [I(A; U) − I(Â; Û)] + λ[I(A; S) − I(Â; S)] + φ[I(U; L) − I(Û; L)]
where λ and φ are trade-off parameters that balance the effect of the augmented matrices on ad clusters and user groups.
The clustering functions are
  C_A : {a1, …, am} → {â1, …, âk}, Â = C_A(A)
  C_U : {u1, …, un} → {û1, …, ûl}, Û = C_U(U)
PROBLEM FORMULATION (CONT.)
Let q(A, U) denote the approximation distribution for p(A, U).
Lemma 1.
For a fixed co-clustering (Â, Û), we can write the loss in mutual information as
  I(A; U) − I(Â; Û) = D( p(A, U) || q(A, U) )
and similarly for the augmented matrices, where q(A, U), q(A, S) and q(U, L) are obtained by
  q(a, u) = p(â, û) p(a|â) p(u|û), for a ∈ â, u ∈ û
  q(a, s) = p(â, s) p(a|â), for a ∈ â
  q(u, l) = p(û, l) p(u|û), for u ∈ û
LEMMA 1 PROOF
LEMMA 1 PROOF (CONT.)
PROBLEM FORMULATION (CONT.)
CO-CLUSTERING ALGORITHM
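The algorithm slide itself was not captured in this transcript. As a rough illustration only, not the paper's exact procedure, the alternating scheme suggested by Lemma 1 can be sketched in Python: rows and columns are repeatedly reassigned to the cluster whose pooled profile minimizes KL divergence. All function and variable names here are mine, not from the paper.

```python
import math

def kl(p, q, eps=1e-12):
    # KL(p || q) for discrete distributions given as equal-length lists;
    # eps guards against zero entries in q
    return sum(pi * math.log(pi / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def normalize(v):
    s = sum(v)
    return [x / s for x in v] if s else [1.0 / len(v)] * len(v)

def assign(profiles, centroids):
    # send each profile to the centroid with the smallest KL divergence
    return [min(range(len(centroids)), key=lambda c: kl(p, centroids[c]))
            for p in profiles]

def cocluster(P, k, l, iters=20):
    # P: joint distribution as a list of m rows of n probabilities
    m, n = len(P), len(P[0])
    rows = [i % k for i in range(m)]   # initial row-cluster labels
    cols = [j % l for j in range(n)]   # initial column-cluster labels
    for _ in range(iters):
        # row step: each row cluster's centroid is its pooled column profile
        cent = [normalize([sum(P[i][j] for i in range(m) if rows[i] == r)
                           for j in range(n)]) for r in range(k)]
        rows = assign([normalize(P[i]) for i in range(m)], cent)
        # column step: symmetric update using pooled row profiles
        cent = [normalize([sum(P[i][j] for j in range(n) if cols[j] == c)
                           for i in range(m)]) for c in range(l)]
        cols = assign([normalize([P[i][j] for i in range(m)]) for j in range(n)],
                      cent)
    return rows, cols

# toy joint with two obvious row blocks and two column blocks
P = [[0.2, 0.2, 0.0, 0.0],
     [0.2, 0.2, 0.0, 0.0],
     [0.0, 0.0, 0.1, 0.1]]
rows, cols = cocluster(P, k=2, l=2)
```

On this toy joint the block structure is recovered: the first two rows land in one cluster, the first two columns in one cluster.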
Worked example. Initial clusters: u1→û3, u2→û1, u3→û2, u4→û2, u5→û3, u6→û3; a1→â2, a2→â1, a3→â2, a4→â1.

p(u, a)      a1(â2)  a2(â1)  a3(â2)  a4(â1)
u1 (û3)      0.050   0.050   0.150   0
u2 (û1)      0.050   0.050   0.150   0
u3 (û2)      0       0       0       0.150
u4 (û2)      0       0.050   0       0.050
u5 (û3)      0.050   0       0       0.050
u6 (û3)      0.050   0.050   0.050   0

p(u, l)      l1      l2
u1 (û3)      0.050   0
u2 (û1)      0.120   0.050
u3 (û2)      0       0.150
u4 (û2)      0.200   0.040
u5 (û3)      0.040   0.200
u6 (û3)      0.050   0.100

p(u | û=CU(u)) under p(A, U): u1: 0.500, u2: 1, u3: 0.600, u4: 0.400, u5: 0.200, u6: 0.300
p(u | û=CU(u)) under p(U, L): u1: 0.114, u2: 1, u3: 0.385, u4: 0.615, u5: 0.545, u6: 0.341

p(û, â)      â(1)    â(2)    p(û)
û(1)         0.050   0.200   0.250
û(2)         0.250   0       0.250
û(3)         0.150   0.350   0.500
p(â)         0.450   0.550   1

p(a | â=CA(a)) for a1..a4: 0.367, 0.444, 0.636, 0.556

q(u, a)      a1(â2)  a2(â1)  a3(â2)  a4(â1)
u1 (û3)      0.064   0.033   0.111   0.042
u2 (û1)      0.073   0.022   0.127   0.028
u3 (û2)      0       0.067   0       0.083
u4 (û2)      0       0.044   0       0.056
u5 (û3)      0.026   0.013   0.045   0.017
u6 (û3)      0.038   0.020   0.067   0.025

q(u, l)      l1      l2
u1 (û3)      0.016   0.034
u2 (û1)      0.120   0.050
u3 (û2)      0.077   0.073
u4 (û2)      0.123   0.117
u5 (û3)      0.076   0.164
u6 (û3)      0.048   0.102

p(û, l)      l1      l2      p(û)
û(1)         0.120   0.050   0.170
û(2)         0.200   0.190   0.390
û(3)         0.140   0.300   0.440
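The q(u, a) table above can be reproduced from the identity q(a, u) = p(â, û)·p(a|â)·p(u|û). A small pure-Python check, with the cluster assignments hard-coded from the example:

```python
# joint p(u, a) from the worked example; rows u1..u6, columns a1..a4
p_ua = [[0.05, 0.05, 0.15, 0.00],
        [0.05, 0.05, 0.15, 0.00],
        [0.00, 0.00, 0.00, 0.15],
        [0.00, 0.05, 0.00, 0.05],
        [0.05, 0.00, 0.00, 0.05],
        [0.05, 0.05, 0.05, 0.00]]
row_cl = [2, 0, 1, 1, 2, 2]   # u1..u6 -> û3, û1, û2, û2, û3, û3 (0-indexed)
col_cl = [1, 0, 1, 0]         # a1..a4 -> â2, â1, â2, â1 (0-indexed)
m, n, k, l = 6, 4, 3, 2

p_u = [sum(row) for row in p_ua]                             # marginal p(u)
p_a = [sum(p_ua[i][j] for i in range(m)) for j in range(n)]  # marginal p(a)
# pooled joint over cluster pairs: p(û, â)
p_cc = [[sum(p_ua[i][j] for i in range(m) for j in range(n)
             if row_cl[i] == r and col_cl[j] == c)
         for c in range(l)] for r in range(k)]
p_uc = [sum(p_u[i] for i in range(m) if row_cl[i] == r) for r in range(k)]  # p(û)
p_ac = [sum(p_a[j] for j in range(n) if col_cl[j] == c) for c in range(l)]  # p(â)

# q(u, a) = p(û, â) * p(u|û) * p(a|â)
q_ua = [[p_cc[row_cl[i]][col_cl[j]]
         * (p_u[i] / p_uc[row_cl[i]])
         * (p_a[j] / p_ac[col_cl[j]])
         for j in range(n)] for i in range(m)]
```

The resulting q(u1, a1) ≈ 0.064, matching the table, and q sums to 1 because it preserves the cluster-level marginals.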
Reassignment step with φ = 0.5: for each user u and each candidate group û, compute
  p(u)·KL(p(A|u) || q(A|û)) + φ·p(u)·KL(p(L|u) || q(L|û))
and move u to the group with the smallest score.

q(a|û)     a1      a2      a3      a4
û(1)       0.291   0.089   0.509   0.111
û(2)       0       0.444   0       0.556
û(3)       0.255   0.133   0.445   0.167

q(l|û)     l1      l2
û(1)       0.706   0.294
û(2)       0.513   0.487
û(3)       0.318   0.682

p(a|u)     a1      a2      a3      a4
u1 (û3)    0.200   0.200   0.600   0.000
u2 (û1)    0.200   0.200   0.600   0.000
u3 (û2)    0.000   0.000   0.000   1.000
u4 (û2)    0.000   0.500   0.000   0.500
u5 (û3)    0.500   0.000   0.000   0.500
u6 (û3)    0.333   0.333   0.333   0.000

p(l|u)     l1      l2
u1 (û3)    1.000   0.000
u2 (û1)    0.706   0.294
u3 (û2)    0.000   1.000
u4 (û2)    0.833   0.167
u5 (û3)    0.167   0.833
u6 (û3)    0.333   0.667

u1:  KL(p(A|u1) || q(A|û))   a1      a2      a3      a4      ×p(u)
     û(1)                    0.033   0.070   0.043   0.000   0.020
     û(2)                    1.707   0.069   5.262   0.000   1.725
     û(3)                    0.021   0.035   0.078   0.000   0.023
     KL(p(L|u1) || q(L|û))   l1      l2      ×p(u)
     û(1)                    0.151   0.000   0.008
     û(2)                    0.290   0.000   0.015
     û(3)                    0.497   0.000   0.025
     combined (φ=0.5):       û(1) 0.024, û(2) 1.732, û(3) 0.035

u2:  KL(p(A|u2) || q(A|û))   a1      a2      a3      a4      ×p(u)
     û(1)                    0.033   0.070   0.043   0.000   0.020
     û(2)                    1.707   0.069   5.262   0.000   1.725
     û(3)                    0.021   0.035   0.078   0.000   0.023
     KL(p(L|u2) || q(L|û))   l1      l2      ×p(u)
     û(1)                    0.000   0.000   0.000
     û(2)                    0.098   0.064   0.006
     û(3)                    0.244   0.107   0.023
     combined (φ=0.5):       û(1) 0.020, û(2) 1.728, û(3) 0.035

u3:  KL(p(A|u3) || q(A|û))   a1      a2      a3      a4      ×p(u)
     û(1)                    0.000   0.000   0.000   0.954   0.143
     û(2)                    0.000   0.000   0.000   0.255   0.038
     û(3)                    0.000   0.000   0.000   0.778   0.117
     KL(p(L|u3) || q(L|û))   l1      l2      ×p(u)
     û(1)                    0.000   0.531   0.080
     û(2)                    0.000   0.312   0.047
     û(3)                    0.000   0.166   0.025
     combined (φ=0.5):       û(1) 0.183, û(2) 0.062, û(3) 0.129

u4:  KL(p(A|u4) || q(A|û))   a1      a2      a3      a4      ×p(u)
     û(1)                    0.000   0.375   0.000   0.327   0.070
     û(2)                    0.000   0.026   0.000   0.023   0.000
     û(3)                    0.000   0.287   0.000   0.239   0.053
     KL(p(L|u4) || q(L|û))   l1      l2      ×p(u)
     û(1)                    0.060   0.041   0.005
     û(2)                    0.176   0.078   0.024
     û(3)                    0.348   0.102   0.059
     combined (φ=0.5):       û(1) 0.072, û(2) 0.012, û(3) 0.082

u5:  KL(p(A|u5) || q(A|û))   a1      a2      a3      a4      ×p(u)
     û(1)                    0.118   0.000   0.000   0.327   0.044
     û(2)                    4.467   0.000   0.000   0.023   0.444
     û(3)                    0.147   0.000   0.000   0.239   0.039
     KL(p(L|u5) || q(L|û))   l1      l2      ×p(u)
     û(1)                    0.104   0.377   0.065
     û(2)                    0.081   0.194   0.027
     û(3)                    0.047   0.073   0.006
     combined (φ=0.5):       û(1) 0.077, û(2) 0.458, û(3) 0.042

u6:  KL(p(A|u6) || q(A|û))   a1      a2      a3      a4      ×p(u)
     û(1)                    0.020   0.191   0.061   0.000   0.022
     û(2)                    2.919   0.042   2.838   0.000   0.857
     û(3)                    0.039   0.133   0.042   0.000   0.019
     KL(p(L|u6) || q(L|û))   l1      l2      ×p(u)
     û(1)                    0.109   0.237   0.019
     û(2)                    0.062   0.091   0.004
     û(3)                    0.007   0.007   0.000
     combined (φ=0.5):       û(1) 0.032, û(2) 0.860, û(3) 0.019

New assignment (minimum combined score per user):
  u1 → û(1): 0.024
  u2 → û(1): 0.020
  u3 → û(2): 0.062
  u4 → û(2): 0.012
  u5 → û(3): 0.042
  u6 → û(3): 0.019
  Total: 0.179

Per-user score: p(u)·KL(p(A|u) || q(A|û)) + φ·p(u)·KL(p(L|u) || q(L|û))
Total objective: KL(p(A|U) || q(A|U)) + φ·KL(p(L|U) || q(L|U))
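The reassignment rule picks, for each user, the group minimizing p(u)·KL(p(A|u) || q(A|û)) + φ·p(u)·KL(p(L|u) || q(L|û)). A generic sketch with hypothetical numbers (the slide's tables appear to smooth zero entries, so its exact values are not reproduced here):

```python
import math

def kl(p, q, eps=1e-9):
    # KL(p || q); eps guards against zero entries in q
    return sum(pi * math.log(pi / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def score(p_u, p_a_u, q_a_g, p_l_u, q_l_g, phi=0.5):
    # p(u)*KL(p(A|u) || q(A|û)) + phi*p(u)*KL(p(L|u) || q(L|û))
    return p_u * kl(p_a_u, q_a_g) + phi * p_u * kl(p_l_u, q_l_g)

# hypothetical user whose ad and Lohas profiles resemble group 0
p_u = 0.25
p_a_u = [0.2, 0.2, 0.6]
p_l_u = [0.7, 0.3]
groups = [([0.25, 0.20, 0.55], [0.7, 0.3]),   # group 0: similar profiles
          ([0.60, 0.30, 0.10], [0.1, 0.9])]   # group 1: dissimilar profiles
scores = [score(p_u, p_a_u, qa, p_l_u, ql) for qa, ql in groups]
best = min(range(len(groups)), key=lambda g: scores[g])   # group 0 wins
```

Weighting both KL terms by p(u) means frequent users influence the objective more, which matches the total-objective line above.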
EXPERIMENT RESULTS AND EVALUATION
A difficulty of clustering research is performance evaluation, because there is no standard target.
Therefore, we present two evaluation methods, based on class prediction and on group variance:
  Classification-based evaluation
  Mutual-information-based evaluation
We retrieved data from 2009/09/01 to 2010/03/31, containing 530 ads and 9,865 users.
For Lohas, only 2,124 users have values (i.e., filled in the Lohas questionnaire); the others are filled with zeros.
CLASSIFICATION BASED EVALUATION
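The details of this slide were not captured in the transcript. One common classification-based scheme, assumed here for illustration only (not necessarily the paper's exact protocol), lets each cluster predict its majority class and reports the macro-averaged F-measure:

```python
from collections import Counter

def f_measure(clusters, labels):
    # `clusters` and `labels` are parallel lists: cluster id and true class id;
    # each cluster predicts the majority class among its members
    majority = {}
    for c in set(clusters):
        members = [lab for cl, lab in zip(clusters, labels) if cl == c]
        majority[c] = Counter(members).most_common(1)[0][0]
    preds = [majority[c] for c in clusters]
    # macro-averaged F-measure over the true classes
    per_class = []
    for cls in set(labels):
        tp = sum(p == cls and t == cls for p, t in zip(preds, labels))
        fp = sum(p == cls and t != cls for p, t in zip(preds, labels))
        fn = sum(p != cls and t == cls for p, t in zip(preds, labels))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        per_class.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(per_class) / len(per_class)

perfect = f_measure([0, 0, 1, 1], ["x", "x", "y", "y"])  # 1.0 when clusters match classes
```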
CLASSIFICATION BASED EVALUATION (CONT.)
AD CLUSTER EVALUATION

Evaluation of ad clustering (F-measure):
            K=2     K=3     K=4     K=5
Baseline    0.989   0.878   0.858   0.877
ITCC        0.990   0.915   0.911   0.922
CCAM        1.000   0.927   0.934   0.926

(CCAM uses the best per-K parameters with φ=1.0: λ=0.6 for K=2 and K=4, λ=0.8 for K=3, λ=0.2 for K=5.)
USER GROUP EVALUATION

Evaluation of user grouping (F-measure):
            K=2     K=3     K=4     K=5
Baseline    0.859   0.830   0.821   0.445
ITCC        0.877   0.845   0.900   0.818
CCAM        0.877   0.845   0.980   0.820

(CCAM uses the best per-K parameters with φ=1.0: λ=0.6 for K=2 and K=4, λ=0.8 for K=3, λ=0.2 for K=5.)
PARAMETER TUNING OF CCAM
We fix φ=1.0, vary λ from 0.2 to 1.0, and observe the average F-measure over ads and users.
The optimal parameters for different K are:
  K=2, 4: φ=1.0, λ=0.6
  K=3: φ=1.0, λ=0.8
  K=5: φ=1.0, λ=0.2
However, when we fix λ=1.0 and vary φ from 0.2 to 1.0 (with K from 3 to 5), nothing changes.
We suspect that φ controls p(U, L), but the zero entries dominate the 161x7736 p(U, L) matrix.
PARAMETER TUNING (FIX φ=1.0)

[Chart: average F-measure vs. λ (0.2 to 1.0) for K=3, 4, 5; labeled values include 0.886, 0.957, and 0.873.]
PARAMETER TUNING (FIX λ=1.0)

[Chart: average F-measure vs. φ (0.2 to 1.0) for K=3, 4, 5.]
MUTUAL INFORMATION BASED EVALUATION
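This evaluation measures how much mutual information the cluster-level joint retains. As a minimal sketch, using the p(û, â) table from the earlier worked example (natural log assumed):

```python
import math

def mutual_information(joint):
    # I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) * p(y)) ), natural log
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(joint[i][j] * math.log(joint[i][j] / (px[i] * py[j]))
               for i in range(len(joint)) for j in range(len(joint[0]))
               if joint[i][j] > 0)

# p(û, â) from the earlier worked example
p_cluster = [[0.05, 0.20],
             [0.25, 0.00],
             [0.15, 0.35]]
mi = mutual_information(p_cluster)   # information retained by the co-clustering
```

A higher value means the co-clustering preserves more of the original association; an independent joint gives exactly zero.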
MUTUAL INFORMATION BASED EVALUATION (CONT.)
Mutual information value:
            K=2     K=3     K=4     K=5
Baseline    0.055   0.062   0.093   0.117
ITCC        0.158   0.209   0.243   0.279
CCAM        0.158   0.209   0.242   0.278
MONOTONICALLY DECREASING MUTUAL INFORMATION LOSS

[Chart: mutual information loss vs. iteration (1 to 9); left axis CCAM (1.95 to 2.35), right axis ITCC (0.8 to 1.05); the loss decreases monotonically for both methods.]
CONCLUSION
Co-clustering achieves the dual goals of row clustering and column clustering.
However, most co-clustering algorithms focus on co-clustering the correlation matrix between rows and columns.
Our proposed method, Co-Clustering with Augmented Matrix (CCAM), fully utilizes the augmented data to achieve better co-clustering.
CCAM achieves better classification performance than ITCC and comparable performance in the mutual information evaluation.
THANK YOU FOR LISTENING. Q & A