Clustering plays an important role in data mining as many applications use it as a preprocessing step for data analysis. Traditional clustering focuses on the grouping of similar objects, while two-way co-clustering can group dyadic data (objects as well as their attributes) simultaneously. Most co-clustering research focuses on single correlation data, but there might be other possible descriptions of dyadic data that could improve co-clustering performance. In this research, we extend ITCC (Information-Theoretic Co-Clustering) to the problem of co-clustering with an augmented matrix. We propose CCAM (Co-Clustering with Augmented Data Matrix) to include this augmented data for better co-clustering. We apply CCAM in the analysis of on-line advertising, where both ads and users must be clustered. The key data that connect ads and users are the user-ad link matrix, which identifies the ads that each user has linked; both ads and users also have their own feature data, i.e. the augmented data matrix. To evaluate the proposed method, we use two measures: classification accuracy and K-L divergence. The experiment is done using the advertisement and user data from Morgenstern, a financial social website that focuses on the advertisement agency business. The experiment results show that CCAM provides better performance than ITCC since it considers the augmented data during clustering.
CO-CLUSTERING WITH AUGMENTED
DATA MATRIX
Authors: Meng-Lun Wu, Chia-Hui Chang, and Rui-Zhe Liu
Dept. of Computer Science and Information Engineering
National Central University
2011/8/24, DaWaK 2011, Toulouse, France
OUTLINE
Introduction
Related Work
Problem Formulation
Co-Clustering Algorithm
Experiment Results and Evaluation
Conclusion
INTRODUCTION
Over the past decade, co-clustering has arisen to address the simultaneous clustering of dyadic data.
However, most research takes only the dyadic data as the main clustering matrix, without considering additional information.
In addition to a user-movie click matrix, we might have user preferences and movie descriptions.
Similarly, in addition to a document-word co-occurrence matrix, we might have document genres and word meanings.
INTRODUCTION (CONT.)
To fully utilize the augmented matrix, we propose a new method called Co-Clustering with Augmented data Matrix (CCAM).
The Umatch [1] social website provides the Ad$mart service, which lets users click ads and shares the profit with them.
We cooperated with Umatch, which asked us to analyze its ad-user information using the following data:
ad-user click data, ad setting data, and user profiles (the Lohas questionnaire).
1. Umatch: http://www.morgenstern.com.tw/users2/index.php/u_match1/
RELATED WORK
Co-clustering research can be separated into three categories: MDCC, MOCC [2], and ITCC.
MDCC: matrix-decomposition co-clustering
  Long et al. (2005), "Co-clustering by Block Value Decomposition"
  Ding et al. (2005) gave a similar co-clustering approach based on nonnegative matrix factorization.
MOCC: topic-model-based co-clustering
  Shafiei et al. (2006), "Latent Dirichlet Co-clustering"
  Hanhuai et al. (2008), "Bayesian Co-clustering"
2. M. Mahdi Shafiei and Evangelos E. Milios “Model-based Overlapping Co-Clustering”
RELATED WORK (CONT.)
ITCC: an optimization-based method
  Dhillon et al. (2003), "Information-Theoretic Co-Clustering"
  Banerjee et al. (2004), "A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation"
Li et al. employ the ITCC framework to propagate class structure and knowledge from in-domain data to out-of-domain data.
Inspired by Li and Dhillon, we extend the ITCC framework with augmented matrices to co-cluster ads and users.
PROBLEM FORMULATION
Let A, U, S and L be discrete random variables:
  A denotes ads, ranging over {a1, …, am}
  U denotes users, ranging over {u1, …, un}
  S denotes ad settings, ranging over {s1, …, sr}
  L denotes Lohas questionnaire items, ranging over {l1, …, lv}
Input data: the joint probability distributions
  p(A, U): ad-user link matrix
  p(A, S): ad-setting matrix
  p(U, L): user-Lohas matrix
Given p(A, U), the mutual information is defined as
  I(A; U) = Σ_a Σ_u p(a, u) log [ p(a, u) / (p(a) p(u)) ]
PROBLEM FORMULATION
Goal: to obtain
  k ad clusters denoted by {â1, …, âk}
  l user groups denoted by {û1, …, ûl}
such that the mutual information loss after co-clustering is minimized. The objective function is
  min [I(A; U) − I(Â; Û)] + λ[I(A; S) − I(Â; S)] + φ[I(U; L) − I(Û; L)]
where λ and φ are trade-off parameters that balance the effect of the augmented matrices on ad clusters and user groups.
The clustering functions are
  C_A : {a1, …, am} → {â1, …, âk}, Â = C_A(A)
  C_U : {u1, …, un} → {û1, …, ûl}, Û = C_U(U)
PROBLEM FORMULATION (CONT.)
Let q(A, U) denote the approximation distribution for p(A, U).
Lemma 1.
For a fixed co-clustering (Â, Û), we can write the loss in mutual information as
  I(A; U) − I(Â; Û) = D( p(A, U) || q(A, U) )
and similarly for the augmented matrices, where q(A, U), q(A, S) and q(U, L) are obtained by
  q(a, u) = p(â, û) p(a|â) p(u|û), for a ∈ â, u ∈ û
  q(a, s) = p(â, s) p(a|â), for a ∈ â
  q(u, l) = p(û, l) p(u|û), for u ∈ û
LEMMA 1 PROOF
LEMMA 1 PROOF (CONT.)
PROBLEM FORMULATION (CONT.)
CO-CLUSTERING ALGORITHM
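The algorithm slide itself was not captured in this transcript. As a rough illustration only, not the paper's exact procedure, the alternating scheme suggested by Lemma 1 can be sketched in Python: rows and columns are repeatedly reassigned to the cluster whose pooled profile minimizes KL divergence. All function and variable names here are mine, not from the paper.

```python
import math

def kl(p, q, eps=1e-12):
    # KL(p || q) for discrete distributions given as equal-length lists;
    # eps guards against zero entries in q
    return sum(pi * math.log(pi / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def normalize(v):
    s = sum(v)
    return [x / s for x in v] if s else [1.0 / len(v)] * len(v)

def assign(profiles, centroids):
    # send each profile to the centroid with the smallest KL divergence
    return [min(range(len(centroids)), key=lambda c: kl(p, centroids[c]))
            for p in profiles]

def cocluster(P, k, l, iters=20):
    # P: joint distribution as a list of m rows of n probabilities
    m, n = len(P), len(P[0])
    rows = [i % k for i in range(m)]   # initial row-cluster labels
    cols = [j % l for j in range(n)]   # initial column-cluster labels
    for _ in range(iters):
        # row step: each row cluster's centroid is its pooled column profile
        cent = [normalize([sum(P[i][j] for i in range(m) if rows[i] == r)
                           for j in range(n)]) for r in range(k)]
        rows = assign([normalize(P[i]) for i in range(m)], cent)
        # column step: symmetric update using pooled row profiles
        cent = [normalize([sum(P[i][j] for j in range(n) if cols[j] == c)
                           for i in range(m)]) for c in range(l)]
        cols = assign([normalize([P[i][j] for i in range(m)]) for j in range(n)],
                      cent)
    return rows, cols

# toy joint with two obvious row blocks and two column blocks
P = [[0.2, 0.2, 0.0, 0.0],
     [0.2, 0.2, 0.0, 0.0],
     [0.0, 0.0, 0.1, 0.1]]
rows, cols = cocluster(P, k=2, l=2)
```

On this toy joint the block structure is recovered: the first two rows land in one cluster, the first two columns in one cluster.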
Worked example. Initial clusters: u1→û3, u2→û1, u3→û2, u4→û2, u5→û3, u6→û3; a1→â2, a2→â1, a3→â2, a4→â1.

p(u, a)      a1(â2)  a2(â1)  a3(â2)  a4(â1)
u1 (û3)      0.050   0.050   0.150   0
u2 (û1)      0.050   0.050   0.150   0
u3 (û2)      0       0       0       0.150
u4 (û2)      0       0.050   0       0.050
u5 (û3)      0.050   0       0       0.050
u6 (û3)      0.050   0.050   0.050   0

p(u, l)      l1      l2
u1 (û3)      0.050   0
u2 (û1)      0.120   0.050
u3 (û2)      0       0.150
u4 (û2)      0.200   0.040
u5 (û3)      0.040   0.200
u6 (û3)      0.050   0.100

p(u | û=CU(u)) under p(A, U): u1: 0.500, u2: 1, u3: 0.600, u4: 0.400, u5: 0.200, u6: 0.300
p(u | û=CU(u)) under p(U, L): u1: 0.114, u2: 1, u3: 0.385, u4: 0.615, u5: 0.545, u6: 0.341

p(û, â)      â(1)    â(2)    p(û)
û(1)         0.050   0.200   0.250
û(2)         0.250   0       0.250
û(3)         0.150   0.350   0.500
p(â)         0.450   0.550   1

p(a | â=CA(a)) for a1..a4: 0.367, 0.444, 0.636, 0.556

q(u, a)      a1(â2)  a2(â1)  a3(â2)  a4(â1)
u1 (û3)      0.064   0.033   0.111   0.042
u2 (û1)      0.073   0.022   0.127   0.028
u3 (û2)      0       0.067   0       0.083
u4 (û2)      0       0.044   0       0.056
u5 (û3)      0.026   0.013   0.045   0.017
u6 (û3)      0.038   0.020   0.067   0.025

q(u, l)      l1      l2
u1 (û3)      0.016   0.034
u2 (û1)      0.120   0.050
u3 (û2)      0.077   0.073
u4 (û2)      0.123   0.117
u5 (û3)      0.076   0.164
u6 (û3)      0.048   0.102

p(û, l)      l1      l2      p(û)
û(1)         0.120   0.050   0.170
û(2)         0.200   0.190   0.390
û(3)         0.140   0.300   0.440
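The q(u, a) table above can be reproduced from the identity q(a, u) = p(â, û)·p(a|â)·p(u|û). A small pure-Python check, with the cluster assignments hard-coded from the example:

```python
# joint p(u, a) from the worked example; rows u1..u6, columns a1..a4
p_ua = [[0.05, 0.05, 0.15, 0.00],
        [0.05, 0.05, 0.15, 0.00],
        [0.00, 0.00, 0.00, 0.15],
        [0.00, 0.05, 0.00, 0.05],
        [0.05, 0.00, 0.00, 0.05],
        [0.05, 0.05, 0.05, 0.00]]
row_cl = [2, 0, 1, 1, 2, 2]   # u1..u6 -> û3, û1, û2, û2, û3, û3 (0-indexed)
col_cl = [1, 0, 1, 0]         # a1..a4 -> â2, â1, â2, â1 (0-indexed)
m, n, k, l = 6, 4, 3, 2

p_u = [sum(row) for row in p_ua]                             # marginal p(u)
p_a = [sum(p_ua[i][j] for i in range(m)) for j in range(n)]  # marginal p(a)
# pooled joint over cluster pairs: p(û, â)
p_cc = [[sum(p_ua[i][j] for i in range(m) for j in range(n)
             if row_cl[i] == r and col_cl[j] == c)
         for c in range(l)] for r in range(k)]
p_uc = [sum(p_u[i] for i in range(m) if row_cl[i] == r) for r in range(k)]  # p(û)
p_ac = [sum(p_a[j] for j in range(n) if col_cl[j] == c) for c in range(l)]  # p(â)

# q(u, a) = p(û, â) * p(u|û) * p(a|â)
q_ua = [[p_cc[row_cl[i]][col_cl[j]]
         * (p_u[i] / p_uc[row_cl[i]])
         * (p_a[j] / p_ac[col_cl[j]])
         for j in range(n)] for i in range(m)]
```

The resulting q(u1, a1) ≈ 0.064, matching the table, and q sums to 1 because it preserves the cluster-level marginals.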
Reassignment step with φ = 0.5: for each user u and each candidate group û, compute
  p(u)·KL(p(A|u) || q(A|û)) + φ·p(u)·KL(p(L|u) || q(L|û))
and move u to the group with the smallest score.

q(a|û)     a1      a2      a3      a4
û(1)       0.291   0.089   0.509   0.111
û(2)       0       0.444   0       0.556
û(3)       0.255   0.133   0.445   0.167

q(l|û)     l1      l2
û(1)       0.706   0.294
û(2)       0.513   0.487
û(3)       0.318   0.682

p(a|u)     a1      a2      a3      a4
u1 (û3)    0.200   0.200   0.600   0.000
u2 (û1)    0.200   0.200   0.600   0.000
u3 (û2)    0.000   0.000   0.000   1.000
u4 (û2)    0.000   0.500   0.000   0.500
u5 (û3)    0.500   0.000   0.000   0.500
u6 (û3)    0.333   0.333   0.333   0.000

p(l|u)     l1      l2
u1 (û3)    1.000   0.000
u2 (û1)    0.706   0.294
u3 (û2)    0.000   1.000
u4 (û2)    0.833   0.167
u5 (û3)    0.167   0.833
u6 (û3)    0.333   0.667

u1:  KL(p(A|u1) || q(A|û))   a1      a2      a3      a4      ×p(u)
     û(1)                    0.033   0.070   0.043   0.000   0.020
     û(2)                    1.707   0.069   5.262   0.000   1.725
     û(3)                    0.021   0.035   0.078   0.000   0.023
     KL(p(L|u1) || q(L|û))   l1      l2      ×p(u)
     û(1)                    0.151   0.000   0.008
     û(2)                    0.290   0.000   0.015
     û(3)                    0.497   0.000   0.025
     combined (φ=0.5):       û(1) 0.024, û(2) 1.732, û(3) 0.035

u2:  KL(p(A|u2) || q(A|û))   a1      a2      a3      a4      ×p(u)
     û(1)                    0.033   0.070   0.043   0.000   0.020
     û(2)                    1.707   0.069   5.262   0.000   1.725
     û(3)                    0.021   0.035   0.078   0.000   0.023
     KL(p(L|u2) || q(L|û))   l1      l2      ×p(u)
     û(1)                    0.000   0.000   0.000
     û(2)                    0.098   0.064   0.006
     û(3)                    0.244   0.107   0.023
     combined (φ=0.5):       û(1) 0.020, û(2) 1.728, û(3) 0.035

u3:  KL(p(A|u3) || q(A|û))   a1      a2      a3      a4      ×p(u)
     û(1)                    0.000   0.000   0.000   0.954   0.143
     û(2)                    0.000   0.000   0.000   0.255   0.038
     û(3)                    0.000   0.000   0.000   0.778   0.117
     KL(p(L|u3) || q(L|û))   l1      l2      ×p(u)
     û(1)                    0.000   0.531   0.080
     û(2)                    0.000   0.312   0.047
     û(3)                    0.000   0.166   0.025
     combined (φ=0.5):       û(1) 0.183, û(2) 0.062, û(3) 0.129

u4:  KL(p(A|u4) || q(A|û))   a1      a2      a3      a4      ×p(u)
     û(1)                    0.000   0.375   0.000   0.327   0.070
     û(2)                    0.000   0.026   0.000   0.023   0.000
     û(3)                    0.000   0.287   0.000   0.239   0.053
     KL(p(L|u4) || q(L|û))   l1      l2      ×p(u)
     û(1)                    0.060   0.041   0.005
     û(2)                    0.176   0.078   0.024
     û(3)                    0.348   0.102   0.059
     combined (φ=0.5):       û(1) 0.072, û(2) 0.012, û(3) 0.082

u5:  KL(p(A|u5) || q(A|û))   a1      a2      a3      a4      ×p(u)
     û(1)                    0.118   0.000   0.000   0.327   0.044
     û(2)                    4.467   0.000   0.000   0.023   0.444
     û(3)                    0.147   0.000   0.000   0.239   0.039
     KL(p(L|u5) || q(L|û))   l1      l2      ×p(u)
     û(1)                    0.104   0.377   0.065
     û(2)                    0.081   0.194   0.027
     û(3)                    0.047   0.073   0.006
     combined (φ=0.5):       û(1) 0.077, û(2) 0.458, û(3) 0.042

u6:  KL(p(A|u6) || q(A|û))   a1      a2      a3      a4      ×p(u)
     û(1)                    0.020   0.191   0.061   0.000   0.022
     û(2)                    2.919   0.042   2.838   0.000   0.857
     û(3)                    0.039   0.133   0.042   0.000   0.019
     KL(p(L|u6) || q(L|û))   l1      l2      ×p(u)
     û(1)                    0.109   0.237   0.019
     û(2)                    0.062   0.091   0.004
     û(3)                    0.007   0.007   0.000
     combined (φ=0.5):       û(1) 0.032, û(2) 0.860, û(3) 0.019

New assignment (minimum combined score per user):
  u1 → û(1): 0.024
  u2 → û(1): 0.020
  u3 → û(2): 0.062
  u4 → û(2): 0.012
  u5 → û(3): 0.042
  u6 → û(3): 0.019
  Total: 0.179

Per-user score: p(u)·KL(p(A|u) || q(A|û)) + φ·p(u)·KL(p(L|u) || q(L|û))
Total objective: KL(p(A|U) || q(A|U)) + φ·KL(p(L|U) || q(L|U))
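The reassignment rule picks, for each user, the group minimizing p(u)·KL(p(A|u) || q(A|û)) + φ·p(u)·KL(p(L|u) || q(L|û)). A generic sketch with hypothetical numbers (the slide's tables appear to smooth zero entries, so its exact values are not reproduced here):

```python
import math

def kl(p, q, eps=1e-9):
    # KL(p || q); eps guards against zero entries in q
    return sum(pi * math.log(pi / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def score(p_u, p_a_u, q_a_g, p_l_u, q_l_g, phi=0.5):
    # p(u)*KL(p(A|u) || q(A|û)) + phi*p(u)*KL(p(L|u) || q(L|û))
    return p_u * kl(p_a_u, q_a_g) + phi * p_u * kl(p_l_u, q_l_g)

# hypothetical user whose ad and Lohas profiles resemble group 0
p_u = 0.25
p_a_u = [0.2, 0.2, 0.6]
p_l_u = [0.7, 0.3]
groups = [([0.25, 0.20, 0.55], [0.7, 0.3]),   # group 0: similar profiles
          ([0.60, 0.30, 0.10], [0.1, 0.9])]   # group 1: dissimilar profiles
scores = [score(p_u, p_a_u, qa, p_l_u, ql) for qa, ql in groups]
best = min(range(len(groups)), key=lambda g: scores[g])   # group 0 wins
```

Weighting both KL terms by p(u) means frequent users influence the objective more, which matches the total-objective line above.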
EXPERIMENT RESULTS AND EVALUATION
A difficulty of clustering research is performance evaluation, because there is no standard target.
Therefore, we present two evaluation methods, based on class prediction and on group variance:
  Classification-based evaluation
  Mutual-information-based evaluation
We retrieved data from 2009/09/01 to 2010/03/31, containing 530 ads and 9,865 users.
For Lohas, only 2,124 users have values (i.e., filled in the Lohas questionnaire); the others are filled with zeros.
CLASSIFICATION BASED EVALUATION
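The details of this slide were not captured in the transcript. One common classification-based scheme, assumed here for illustration only (not necessarily the paper's exact protocol), lets each cluster predict its majority class and reports the macro-averaged F-measure:

```python
from collections import Counter

def f_measure(clusters, labels):
    # `clusters` and `labels` are parallel lists: cluster id and true class id;
    # each cluster predicts the majority class among its members
    majority = {}
    for c in set(clusters):
        members = [lab for cl, lab in zip(clusters, labels) if cl == c]
        majority[c] = Counter(members).most_common(1)[0][0]
    preds = [majority[c] for c in clusters]
    # macro-averaged F-measure over the true classes
    per_class = []
    for cls in set(labels):
        tp = sum(p == cls and t == cls for p, t in zip(preds, labels))
        fp = sum(p == cls and t != cls for p, t in zip(preds, labels))
        fn = sum(p != cls and t == cls for p, t in zip(preds, labels))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        per_class.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(per_class) / len(per_class)

perfect = f_measure([0, 0, 1, 1], ["x", "x", "y", "y"])  # 1.0 when clusters match classes
```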
CLASSIFICATION BASED EVALUATION (CONT.)
AD CLUSTER EVALUATION

Evaluation of ad clustering (F-measure):
            K=2     K=3     K=4     K=5
Baseline    0.989   0.878   0.858   0.877
ITCC        0.990   0.915   0.911   0.922
CCAM        1.000   0.927   0.934   0.926

(CCAM uses the best per-K parameters with φ=1.0: λ=0.6 for K=2 and K=4, λ=0.8 for K=3, λ=0.2 for K=5.)
USER GROUP EVALUATION

Evaluation of user grouping (F-measure):
            K=2     K=3     K=4     K=5
Baseline    0.859   0.830   0.821   0.445
ITCC        0.877   0.845   0.900   0.818
CCAM        0.877   0.845   0.980   0.820

(CCAM uses the best per-K parameters with φ=1.0: λ=0.6 for K=2 and K=4, λ=0.8 for K=3, λ=0.2 for K=5.)
PARAMETER TUNING OF CCAM
We fix φ=1.0, vary λ from 0.2 to 1.0, and observe the average F-measure over ads and users.
The optimal parameters for different K are:
  K=2, 4: φ=1.0, λ=0.6
  K=3: φ=1.0, λ=0.8
  K=5: φ=1.0, λ=0.2
However, when we fix λ=1.0 and vary φ from 0.2 to 1.0 (with K from 3 to 5), nothing changes.
We suspect that φ controls p(U, L), but the zero entries dominate the 161x7736 p(U, L) matrix.
PARAMETER TUNING (FIX φ=1.0)

[Chart: average F-measure vs. λ (0.2 to 1.0) for K=3, 4, 5; labeled values include 0.886, 0.957, and 0.873.]
PARAMETER TUNING (FIX λ=1.0)

[Chart: average F-measure vs. φ (0.2 to 1.0) for K=3, 4, 5.]
MUTUAL INFORMATION BASED EVALUATION
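This evaluation measures how much mutual information the cluster-level joint retains. As a minimal sketch, using the p(û, â) table from the earlier worked example (natural log assumed):

```python
import math

def mutual_information(joint):
    # I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) * p(y)) ), natural log
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(joint[i][j] * math.log(joint[i][j] / (px[i] * py[j]))
               for i in range(len(joint)) for j in range(len(joint[0]))
               if joint[i][j] > 0)

# p(û, â) from the earlier worked example
p_cluster = [[0.05, 0.20],
             [0.25, 0.00],
             [0.15, 0.35]]
mi = mutual_information(p_cluster)   # information retained by the co-clustering
```

A higher value means the co-clustering preserves more of the original association; an independent joint gives exactly zero.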
MUTUAL INFORMATION BASED EVALUATION (CONT.)
Mutual information value:
            K=2     K=3     K=4     K=5
Baseline    0.055   0.062   0.093   0.117
ITCC        0.158   0.209   0.243   0.279
CCAM        0.158   0.209   0.242   0.278
MONOTONICALLY DECREASING MUTUAL INFORMATION LOSS

[Chart: mutual information loss vs. iteration (1 to 9); left axis CCAM (1.95 to 2.35), right axis ITCC (0.8 to 1.05); the loss decreases monotonically for both methods.]
CONCLUSION
Co-clustering achieves the dual goals of row clustering and column clustering.
However, most co-clustering algorithms focus on co-clustering the correlation matrix between rows and columns.
Our proposed method, Co-Clustering with Augmented Matrix (CCAM), fully utilizes the augmented data to achieve better co-clustering.
CCAM achieves better classification performance than ITCC and comparable performance in the mutual information evaluation.
THANK YOU FOR LISTENING. Q & A