CO-CLUSTERING WITH AUGMENTED DATA MATRIX
Meng-Lun Wu, Chia-Hui Chang, and Rui-Zhe Liu
Dept. of Computer Science and Information Engineering, National Central University
DaWaK 2011, 2011/8/24, Toulouse, France

Co-clustering with augmented data


DESCRIPTION

Clustering plays an important role in data mining, as many applications use it as a preprocessing step for data analysis. Traditional clustering focuses on grouping similar objects, while two-way co-clustering groups dyadic data (objects as well as their attributes) simultaneously. Most co-clustering research focuses on a single correlation matrix, but there may be other descriptions of the dyadic data that could improve co-clustering performance. In this research, we extend ITCC (Information-Theoretic Co-Clustering) to the problem of co-clustering with an augmented data matrix. We propose CCAM (Co-Clustering with Augmented Data Matrix) to include this augmented data for better co-clustering. We apply CCAM to the analysis of online advertising, where both ads and users must be clustered. The key data connecting ads and users is the user-ad link matrix, which records the ads each user has clicked; both ads and users also have their own feature data, i.e., the augmented data matrices. To evaluate the proposed method, we use two measures: classification accuracy and K-L divergence. The experiments use advertisement and user data from Morgenstern, a financial social website that acts as an advertising agency. The results show that CCAM performs better than ITCC because it considers the augmented data during clustering.


Page 1: Co-clustering with augmented data

CO-CLUSTERING WITH AUGMENTED DATA MATRIX

Authors: Meng-Lun Wu, Chia-Hui Chang, and Rui-Zhe Liu

Dept. of Computer Science and Information Engineering, National Central University

DaWaK 2011, Toulouse, France, 2011/8/24

Page 2: Co-clustering with augmented data

OUTLINE

Introduction

Related Work

Problem Formulation

Co-Clustering Algorithm

Experiments Result and Evaluation

Conclusion


Page 3: Co-clustering with augmented data

INTRODUCTION (CONT.)

Over the past decade, co-clustering has arisen to cluster both dimensions of dyadic data simultaneously.

However, most research only takes the dyadic data as the main clustering matrix and does not consider additional information.

In addition to a user-movie click matrix, we might have user preferences and movie descriptions.

Similarly, in addition to a document-word co-occurrence matrix, we might have document genres and word meanings.


Page 4: Co-clustering with augmented data

INTRODUCTION (CONT.)

To fully utilize the augmented matrices, we propose a new method called Co-Clustering with Augmented data Matrix (CCAM).

The Umatch [1] social website provides the Ad$mart service, which lets users click ads and shares the profit with them.

Fortunately, we can cooperate with the Umatch website, which asks us to analyze its ad-user information based on the following data:

ad-user click data, ad setting data, and user profiles (Lohas questionnaire).


1. Umatch: http://www.morgenstern.com.tw/users2/index.php/u_match1/

Page 5: Co-clustering with augmented data

RELATED WORK

Co-clustering research can be separated into three categories: MDCC, MOCC [2], and ITCC.

MDCC: matrix decomposition co-clustering

Long et al. (2005), "Co-clustering by Block Value Decomposition."

Ding et al. (2005) gave a similar co-clustering approach based on nonnegative matrix factorization.

MOCC [2]: topic-model-based co-clustering

Shafiei et al. (2006), "Latent Dirichlet Co-clustering."

Hanhuai et al. (2008), "Bayesian Co-clustering."


2. M. Mahdi Shafiei and Evangelos E. Milios “Model-based Overlapping Co-Clustering”


Page 6: Co-clustering with augmented data

RELATED WORK (CONT.)

ITCC: an optimization-based method

Dhillon et al. (2003), "Information-Theoretic Co-Clustering."

Banerjee et al. (2004), "A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation."

Li et al. employ the ITCC framework to propagate the class structure and knowledge from in-domain data to out-of-domain data.

Inspired by Li and Dhillon, we extend the ITCC framework with augmented matrices to co-cluster ads and users.


Page 7: Co-clustering with augmented data

PROBLEM FORMULATION

Let A, U, S and L be discrete random variables.

A denotes ads, taking values in {a1,…,am}

U denotes users, taking values in {u1,…,un}

S denotes ad settings, taking values in {s1,…,sr}

L denotes user Lohas questionnaire items, taking values in {l1,…,lv}

Input data: the joint probability distributions

p(A, U): ad-user link matrix

p(A, S): ad-setting matrix

p(U, L): user-Lohas matrix

Given p(A, U), the mutual information is defined as
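(The equation is missing from the extracted slide; the standard definition of mutual information, as used in ITCC, is)

I(A; U) = \sum_{a}\sum_{u} p(a, u) \log \frac{p(a, u)}{p(a)\, p(u)}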


Page 8: Co-clustering with augmented data

PROBLEM FORMULATION

Goal: to obtain

k ad clusters denoted by {â1, …, âk}

l user groups denoted by {û1, …, ûl}

such that the mutual information loss after co-clustering is minimized by the objective function (see the reconstruction below),

where λ and φ are trade-off parameters that balance the effect of the augmented matrices on the ad clusters and user groups.
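The objective equation itself did not survive extraction. A sketch of its likely form, following ITCC and weighting the two augmented matrices p(A, S) and p(U, L) by the parameters λ and φ named on this slide, is

\min_{\hat{A}, \hat{U}} \; \big[ I(A;U) - I(\hat{A};\hat{U}) \big] \;+\; \lambda \big[ I(A;S) - I(\hat{A};S) \big] \;+\; \varphi \big[ I(U;L) - I(\hat{U};L) \big]

This is an assumed reconstruction, not the authors' exact notation.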

CA: {a1, a2, …, am} → {â1, â2, …, âk},  Â = CA(A)

CU: {u1, u2, …, un} → {û1, û2, …, ûl},  Û = CU(U)


Page 9: Co-clustering with augmented data

PROBLEM FORMULATION (CONT.)

Let q(A, U) denote the approximation distribution for p(A, U).

Lemma 1.

For a fixed co-clustering (Â, Û), we can write the loss in mutual information as a weighted sum of K-L divergences between p and the approximations q (a reconstruction is sketched below),

where q(A, U), q(A, S) and q(U, L) can be obtained from the cluster-level distributions.
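The lemma's equations are missing from the extraction. A sketch consistent with ITCC and with the worked example on slides 14-15 (where q(u, l) equals p(û, l)·p(u | û)) is

\big[I(A;U)-I(\hat{A};\hat{U})\big] + \lambda\big[I(A;S)-I(\hat{A};S)\big] + \varphi\big[I(U;L)-I(\hat{U};L)\big] = D\big(p(A,U) \,\|\, q(A,U)\big) + \lambda\, D\big(p(A,S) \,\|\, q(A,S)\big) + \varphi\, D\big(p(U,L) \,\|\, q(U,L)\big)

with

q(a, u) = p(\hat{a}, \hat{u})\, p(a \mid \hat{a})\, p(u \mid \hat{u}), \quad q(a, s) = p(\hat{a}, s)\, p(a \mid \hat{a}), \quad q(u, l) = p(\hat{u}, l)\, p(u \mid \hat{u})

where D(·||·) is the K-L divergence. Treat this as a reconstruction under those assumptions rather than the authors' exact statement.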


Page 10: Co-clustering with augmented data

LEMMA 1 PROOF


Page 11: Co-clustering with augmented data

LEMMA 1 PROOF (CONT.)


Page 12: Co-clustering with augmented data

PROBLEM FORMULATION (CONT.)


Page 13: Co-clustering with augmented data

CO-CLUSTERING ALGORITHM
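The algorithm listing on this slide did not survive extraction. Below is a minimal Python/NumPy sketch of the user re-assignment step as it is carried out in the worked example on slides 14-15: each user is moved to the group û that minimizes p(u)·KL(p(A|u) || q(A|û)) + φ·p(u)·KL(p(L|u) || q(L|û)). All function and variable names are illustrative and not from the paper; the ad-side step with λ and the ad-setting matrix would be symmetric.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """K-L divergence D(p || q) between two discrete distributions (1-D arrays).
    Natural log is used here; the slides appear to use log10, but the base
    does not change which group minimizes the cost."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

def reassign_users(p_ua, p_ul, q_a_given_uhat, q_l_given_uhat, phi):
    """One user re-assignment pass.

    p_ua           : users x ads joint matrix p(u, a)
    p_ul           : users x Lohas-items joint matrix p(u, l)
    q_a_given_uhat : groups x ads matrix q(a | û)
    q_l_given_uhat : groups x Lohas-items matrix q(l | û)
    phi            : trade-off weight for the user-Lohas matrix
    Returns the new group index for every user.
    """
    n_users = p_ua.shape[0]
    n_groups = q_a_given_uhat.shape[0]
    p_u_a = p_ua.sum(axis=1)  # p(u) from the ad-user matrix
    p_u_l = p_ul.sum(axis=1)  # p(u) from the user-Lohas matrix
    new_groups = np.empty(n_users, dtype=int)
    for u in range(n_users):
        p_a_given_u = p_ua[u] / max(p_u_a[u], 1e-12)
        p_l_given_u = p_ul[u] / max(p_u_l[u], 1e-12)
        costs = [
            p_u_a[u] * kl(p_a_given_u, q_a_given_uhat[g])
            + phi * p_u_l[u] * kl(p_l_given_u, q_l_given_uhat[g])
            for g in range(n_groups)
        ]
        new_groups[u] = int(np.argmin(costs))
    return new_groups
```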


Page 14: Co-clustering with augmented data


Worked example: input joint matrices, cluster-level statistics, and the resulting approximations. Rows are users u1-u6 in order (the label in parentheses is each user's current group CU(u)); the columns of p(u,a) and q(u,a) are ads a1-a4, whose current clusters CA(a) are â(2), â(1), â(2), â(1).

p(u,a)       a1     a2     a3     a4
u1 (û(3))  0.050  0.050  0.150  0
u2 (û(1))  0.050  0.050  0.150  0
u3 (û(2))  0      0      0      0.150
u4 (û(2))  0      0.05   0      0.050
u5 (û(3))  0.050  0      0      0.050
u6 (û(3))  0.050  0.050  0.050  0

p(u,l)       l1     l2
u1 (û(3))  0.050  0
u2 (û(1))  0.120  0.050
u3 (û(2))  0      0.150
u4 (û(2))  0.200  0.040
u5 (û(3))  0.040  0.200
u6 (û(3))  0.050  0.100

p(u | û=CU(u))   from p(U,A)   from p(U,L)
u1               0.500         0.114
u2               1             1
u3               0.600         0.385
u4               0.400         0.615
u5               0.200         0.545
u6               0.300         0.341

p(û, â)    â(1)   â(2)   p(û)
û(1)      0.050  0.200  0.250
û(2)      0.250  0      0.250
û(3)      0.150  0.350  0.500
p(â)      0.450  0.550  1

p(a | â=CA(a))   a1: 0.367   a2: 0.444   a3: 0.636   a4: 0.556

q(u,a)    a1     a2     a3     a4
u1       0.064  0.033  0.111  0.042
u2       0.073  0.022  0.127  0.028
u3       0      0.067  0      0.083
u4       0      0.044  0      0.056
u5       0.026  0.013  0.045  0.017
u6       0.038  0.020  0.067  0.025

q(u,l)    l1     l2
u1       0.016  0.034
u2       0.120  0.050
u3       0.077  0.073
u4       0.123  0.117
u5       0.076  0.164
u6       0.048  0.102

p(û, l)   l1     l2     p(û)
û(1)     0.120  0.050  0.170
û(2)     0.200  0.190  0.390
û(3)     0.140  0.300  0.440
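As a quick numeric check of how the approximation is formed (under the Lemma 1 reconstruction sketched earlier, which is an assumption on my part), the q(u,l) table above is reproduced by q(u,l) = p(û,l)·p(u|û), with p(u|û) taken from the user-Lohas marginals:

```python
import numpy as np

# Values copied from the slide-14 tables (user-Lohas side only).
p_uhat_l = np.array([[0.120, 0.050],   # p(û, l) for û(1)
                     [0.200, 0.190],   # û(2)
                     [0.140, 0.300]])  # û(3)
p_u_given_uhat = np.array([0.114, 1.0, 0.385, 0.615, 0.545, 0.341])  # from p(U, L)
group_of_user = [2, 0, 1, 1, 2, 2]     # u1..u6 -> û(3), û(1), û(2), û(2), û(3), û(3)

# q(u, l) = p(û, l) * p(u | û); matches the slide's q(u,l) table up to rounding.
q_ul = np.array([p_uhat_l[g] * p_u_given_uhat[i] for i, g in enumerate(group_of_user)])
print(np.round(q_ul, 3))  # e.g. the row for u3 is [0.077, 0.073]
```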

Page 15: Co-clustering with augmented data


User re-assignment step of the worked example with φ = 0.5. For each user u and candidate group û, the cost is

p(u)*KL(p(A|u) || q(A|û)) + φ*p(u)*KL(p(L|u) || q(L|û))

and the user is assigned to the group with the smallest cost; summed over users, this gives the objective KL(p(A|U) || q(A|U)) + φ*KL(p(L|U) || q(L|U)).

q(a | û)   a1     a2     a3     a4
û(1)      0.291  0.089  0.509  0.111
û(2)      0      0.444  0      0.556
û(3)      0.255  0.133  0.445  0.167

q(l | û)   l1     l2
û(1)      0.706  0.294
û(2)      0.513  0.487
û(3)      0.318  0.682

p(a | u)   a1     a2     a3     a4
u1        0.200  0.200  0.600  0.000
u2        0.200  0.200  0.600  0.000
u3        0.000  0.000  0.000  1.000
u4        0.000  0.500  0.000  0.500
u5        0.500  0.000  0.000  0.500
u6        0.333  0.333  0.333  0.000

p(l | u)   l1     l2
u1        1.000  0.000
u2        0.706  0.294
u3        0.000  1.000
u4        0.833  0.167
u5        0.167  0.833
u6        0.333  0.667

KL(p(A|u1) || q(A|û))   a1     a2     a3     a4    *p(u)
û(1)                  0.033  0.070  0.043  0.000  0.020
û(2)                  1.707  0.069  5.262  0.000  1.725
û(3)                  0.021  0.035  0.078  0.000  0.023
KL(p(L|u1) || q(L|û))   l1     l2    *p(u)
û(1)                  0.151  0.000  0.008
û(2)                  0.290  0.000  0.015
û(3)                  0.497  0.000  0.025
cost (φ=0.5): û(1) 0.024, û(2) 1.732, û(3) 0.035

KL(p(A|u2) || q(A|û))   a1     a2     a3     a4    *p(u)
û(1)                  0.033  0.070  0.043  0.000  0.020
û(2)                  1.707  0.069  5.262  0.000  1.725
û(3)                  0.021  0.035  0.078  0.000  0.023
KL(p(L|u2) || q(L|û))   l1     l2    *p(u)
û(1)                  0.000  0.000  0.000
û(2)                  0.098  0.064  0.006
û(3)                  0.244  0.107  0.023
cost (φ=0.5): û(1) 0.020, û(2) 1.728, û(3) 0.035

KL(p(A|u3) || q(A|û))   a1     a2     a3     a4    *p(u)
û(1)                  0.000  0.000  0.000  0.954  0.143
û(2)                  0.000  0.000  0.000  0.255  0.038
û(3)                  0.000  0.000  0.000  0.778  0.117
KL(p(L|u3) || q(L|û))   l1     l2    *p(u)
û(1)                  0.000  0.531  0.080
û(2)                  0.000  0.312  0.047
û(3)                  0.000  0.166  0.025
cost (φ=0.5): û(1) 0.183, û(2) 0.062, û(3) 0.129

KL(p(A|u4) || q(A|û))   a1     a2     a3     a4    *p(u)
û(1)                  0.000  0.375  0.000  0.327  0.070
û(2)                  0.000  0.026  0.000  0.023  0.000
û(3)                  0.000  0.287  0.000  0.239  0.053
KL(p(L|u4) || q(L|û))   l1     l2    *p(u)
û(1)                  0.060  0.041  0.005
û(2)                  0.176  0.078  0.024
û(3)                  0.348  0.102  0.059
cost (φ=0.5): û(1) 0.072, û(2) 0.012, û(3) 0.082

KL(p(A|u5) || q(A|û))   a1     a2     a3     a4    *p(u)
û(1)                  0.118  0.000  0.000  0.327  0.044
û(2)                  4.467  0.000  0.000  0.023  0.444
û(3)                  0.147  0.000  0.000  0.239  0.039
KL(p(L|u5) || q(L|û))   l1     l2    *p(u)
û(1)                  0.104  0.377  0.065
û(2)                  0.081  0.194  0.027
û(3)                  0.047  0.073  0.006
cost (φ=0.5): û(1) 0.077, û(2) 0.458, û(3) 0.042

KL(p(A|u6) || q(A|û))   a1     a2     a3     a4    *p(u)
û(1)                  0.020  0.191  0.061  0.000  0.022
û(2)                  2.919  0.042  2.838  0.000  0.857
û(3)                  0.039  0.133  0.042  0.000  0.019
KL(p(L|u6) || q(L|û))   l1     l2    *p(u)
û(1)                  0.109  0.237  0.019
û(2)                  0.062  0.091  0.004
û(3)                  0.007  0.007  0.000
cost (φ=0.5): û(1) 0.032, û(2) 0.860, û(3) 0.019

Resulting assignments and per-user minimum costs:
u1, û(1): 0.024
u2, û(1): 0.020
u3, û(2): 0.062
u4, û(2): 0.012
u5, û(3): 0.042
u6, û(3): 0.019
Total KL(p(A|U) || q(A|U)) + φ*KL(p(L|U) || q(L|U)) = 0.179

Page 16: Co-clustering with augmented data

EXPERIMENTS RESULT AND EVALUATION

A difficulty in clustering research is performance evaluation, because there is no standard ground truth.

Therefore, we present two evaluation methods, based on class prediction and on group variance:

Classification-based evaluation

Mutual-information-based evaluation

We retrieved data from 2009/09/01 to 2010/03/31, containing 530 ads and 9,865 users.

For Lohas, only 2,124 users have values (they filled out the Lohas questionnaire); the others are filled with zeros.


Page 17: Co-clustering with augmented data

CLASSIFICATION BASED EVALUATION


Page 18: Co-clustering with augmented data

CLASSIFICATION BASED EVALUATION (CONT.)


Page 19: Co-clustering with augmented data

AD CLUSTER EVALUATION

[Figure "Evaluation of ad clustering": F-measure (y-axis, 0.75 to 1) versus K = 2, 3, 4, 5 for Baseline, ITCC, and CCAM, with CCAM run at the per-K tuned λ and φ = 1.0. Data labels, read per series in legend order: Baseline 0.989, 0.878, 0.858, 0.877; ITCC 0.99, 0.915, 0.911, 0.922; CCAM 1, 0.927, 0.934, 0.926.]

Page 20: Co-clustering with augmented data

USER GROUP EVALUATION


[Figure "Evaluation of user grouping": F-measure (y-axis, 0 to 1) versus K = 2, 3, 4, 5 for Baseline, ITCC, and CCAM, with CCAM run at the per-K tuned λ and φ = 1.0. Data labels, read per series in legend order: Baseline 0.859, 0.83, 0.821, 0.445; ITCC 0.877, 0.845, 0.9, 0.818; CCAM 0.877, 0.845, 0.98, 0.82.]

Page 21: Co-clustering with augmented data

PARAMETER TUNING OF CCAM

We fix φ=1.0 and set λ from 0.2 to 1.0, then observe the average F-measure over ads and users.

The optimal parameters for different K are

K=2, 4: φ=1.0, λ=0.6

K=3: φ=1.0, λ=0.8

K=5: φ=1.0, λ=0.2

We also fix λ=1.0 and set φ from 0.2 to 1.0, with K from 3 to 5; nothing changes.

We suspect that φ controls the contribution of p(U, L), but the zero entries dominate the 161×7736 p(U, L) matrix.


Page 22: Co-clustering with augmented data

PARAMETER TUNING (FIX φ=1.0)


[Figure "Parameter tuning (fix φ=1.0)": average F-measure (y-axis, 0.840 to 0.960) versus λ = 0.2, 0.4, 0.6, 0.8, 1.0 for K = 3, 4, 5; labeled points include 0.886, 0.957, and 0.873.]

Page 23: Co-clustering with augmented data

PARAMETER TUNING (FIX λ=1.0)


[Figure "Parameter tuning (fix λ=1.0)": average F-measure (y-axis, 0.840 to 0.920) versus φ = 0.2, 0.4, 0.6, 0.8, 1.0 for K = 3, 4, 5.]

Page 24: Co-clustering with augmented data

MUTUAL INFORMATION BASED EVALUATION
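The evaluation details on this slide did not survive extraction; slide 25 reports a "mutual information value" per method. A minimal sketch of how such a value can be computed from the co-clustered joint distribution p(û, â) is given below; it is illustrative only, since the exact quantity the authors report is not stated here.

```python
import numpy as np

def mutual_information(joint):
    """I(X; Y) for a joint probability table (rows x columns), in nats."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)  # marginal over rows
    py = joint.sum(axis=0, keepdims=True)  # marginal over columns
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))

# Example: the cluster-level joint p(û, â) from the slide-14 worked example.
p_uhat_ahat = np.array([[0.050, 0.200],
                        [0.250, 0.000],
                        [0.150, 0.350]])
print(mutual_information(p_uhat_ahat))  # I(Û; Â) of the toy example
```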


Page 25: Co-clustering with augmented data

MUTUAL INFORMATION BASED EVALUATION (CONT.)


[Figure "Mutual information": mutual information value (y-axis, 0.000 to 0.300) versus K = 2, 3, 4, 5 for Baseline, ITCC, and CCAM. Data labels, read per series in legend order: Baseline 0.055, 0.062, 0.093, 0.117; ITCC 0.158, 0.209, 0.243, 0.279; CCAM 0.158, 0.209, 0.242, 0.278.]

Page 26: Co-clustering with augmented data

MONOTONIC DECREASE OF MUTUAL INFORMATION LOSS


[Figure "Mutual information loss": mutual information value versus iteration (1 to 9) for CCAM and ITCC, each plotted on its own y-axis (tick ranges 1.95 to 2.35 and 0.8 to 1.05); the loss decreases monotonically with the iterations for both methods.]

Page 27: Co-clustering with augmented data

CONCLUSION

Co-clustering achieves the dual goals of row clustering and column clustering.

However, most co-clustering algorithms focus only on the correlation matrix between rows and columns.

Our proposed method, Co-Clustering with Augmented Data Matrix (CCAM), fully utilizes the augmented data to achieve better co-clustering.

CCAM achieves better classification performance than ITCC and shows comparable performance in the mutual information evaluation.


Page 28: Co-clustering with augmented data

THANK YOU FOR LISTENING. Q & A
