30
Bayesian Co-clustering for Dyadic Data Analysis Arindam Banerjee [email protected] Dept of Computer Science & Engineering University of Minnesota, Twin Cities Workshop on Algorithms for Modern Massive Datasets (MMDS 2008) Joint work with Hanhuai Shan

Bayesian Co-clustering for Dyadic Data Analysis

  • Upload
    julio

  • View
    58

  • Download
    5

Embed Size (px)

DESCRIPTION

Bayesian Co-clustering for Dyadic Data Analysis. Arindam Banerjee [email protected] Dept of Computer Science & Engineering University of Minnesota, Twin Cities. Workshop on Algorithms for Modern Massive Datasets (MMDS 2008). Joint work with Hanhuai Shan. Introduction. Dyadic Data - PowerPoint PPT Presentation

Citation preview

Page 1: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering for Dyadic Data Analysis

Arindam [email protected]

Dept of Computer Science & EngineeringUniversity of Minnesota, Twin Cities

Workshop on Algorithms for Modern Massive Datasets (MMDS 2008)

Joint work with Hanhuai Shan

Page 2: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 2

Introduction

• Dyadic Data– Relationship between two entities

• Examples– (Users, Movies): Ratings, Tags, Reviews – (Genes, Experiments): Expression– (Buyers, Products): Purchase, Ratings, Reviews – (Webpages, Advertisements): Click-through rate

• Co-clustering– Simultaneous clustering of rows and columns– Matrix approximation based on co-clusters

• Mixed membership co-clustering– Row/column has memberships in multiple row/column clusters– Flexible model, naturally handles sparsity

Page 3: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 3

Example: Gene Expression Analysis

Original Co-clustered

Page 4: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 4

Co-clustering and Matrix Approximation

Page 5: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 5

Example: Collaborative Filtering

Page 6: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 6

Related Work

• Partitional co-clustering– Bi-clustering (Hartigan ’72)– Bi-clustering of expression data (Cheng et al., ’00)– Information theoretic co-clustering (Dhillon et al., ’03)– Bregman co-clustering and matrix approximation (Banerjee et al., ’07)

• Mixed membership models– Probabilistic latent semantic indexing (Hoffman, ’99)– Latent Dirichlet allocation (Blei et al., ’03)

• Bayesian relational models– Stochastic block structure (Nowicki et al, ’01)– Infinite relational model (Kemp et al, ’06)– Mixed membership stochastic block model (Airoldi et al, ’07)

Page 7: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 7

Background

• Bayesian Networks

• Plates

X1

X3

X4 X5

X2

Page 8: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 8

Latent Dirichlet Allocation (LDA) [BNJ’03]

document1 document2

Page 9: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 9

Bayesian Naïve Bayes (BNB) [BS’07]

α

z11

x11

z12

x12

z13

x13

p1

z21

x21

z23

x23

p2

Θ

f21 f23f11 f12 f13

z22

x22

f22

Page 10: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 10

Bayesian Co-clustering (BCC)

Page 11: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 11

Bayesian Co-clustering (BCC)

Page 12: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 12

Variational Inference

• Expectation Maximization

• Variational EM– Introduce a variational distribution to

approximate .

– Use Jensen’s inequality to get a tractable lower bound for log-likelihood

– Maximize the lower bound w.r.t for the best lower bound, i.e., minimize the KL divergence between and

– Maximize the lower bound w.r.t

Page 13: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 13

Variational Distribution

• for each row, for each column

Page 14: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 14

Variational EM for Bayesian Co-clustering

= lower bound of log -likelihood

Page 15: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 15

EM for Bayesian Co-clustering

• Inference (E-step)

• Parameter Estimation (M-step) (Gaussians)

Page 16: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 16

Fast Latent Dirichlet Allocation (FastLDA)

• Introduce a different variational distribution as an approximation of .

• Number of variational parameters φ: m*n →n.• Number of optimizations over φ: m*n →n.

Original FastLDA

Page 17: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 17

FastLDA vs LDA: Perplexity

0

0.5

1

1.5

2

2.5

3

3.5

4

NASA Classic3 CmuDiff CmuSim Movielens

Lo

g o

f P

erp

lex

ity

LDA

Fast LDA

Page 18: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 18

FastLDA vs LDA: Time

0

500

1000

1500

2000

2500

3000

3500

4000

4500

NASA Classic3 CmuDiff CmuSim Movielens

Tim

e (

se

c)

LDA

Fast LDA

Page 19: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 19

Word List for Topics (Classic3)

LDA Fast LDA

Page 20: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 20

Word List for Topics (Newsgroups)

LDA Fast LDA

Page 21: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 21

BCC Results: Simulated Data

Page 22: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 22

BCC Results: Real Data

• Movielens: Movie recommendation data – 100,000 ratings (1-5) for 1682 movies from 943 users (6.3%)– Binarize: 0 (1-3), 1(4-5).– Discrete (original), Bernoulli (binary)

• Foodmart: Transaction data– 164,558 sales records for 7803 customers and 1559 products (1.35%)– Binarize: 0 (less than median), 1(higher than median)– Poisson (original), Bernoulli (binary)

• Jester: Joke rating data– 100,000 ratings (-10.00 - +10.00) for 100 jokes from 1000 users (100%)– Binarize: 0 (lower than 0), 1 (higher than 0)– Gaussian (original), Bernoulli (binary)

Page 23: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 23

BCC vs BNB vs LDA (Binary data)

Training Set Test Set

Perplexity on Binary Jester Dataset with Different Number of User Clusters

Page 24: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 24

BCC vs BNB (Original data)

Training Set Test Set

Perplexity on Movielens Dataset with Different Number of User Clusters

Page 25: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 25

BNB BCC LDA

Jester 1.7883 1.8186 98.3742

Movielens 1.6994 1.9831 439.6361

Foodmart 1.8691 1.9545 1461.7463

BNB BCC LDA

Jester 4.0237 2.5498 98.9964

Movielens 3.9320 2.8620 1557.0032

Foodmart 6.4751 2.1143 6542.9920

Training Set Test Set

On Binary Data

BNB BCC

Jester 15.4620 18.2495

Movielens 3.1495 0.8068

Foodmart 4.5901 4.5938

BNB BCC

Jester 39.9395 24.8239

Movielens 38.2377 1.0265

Foodmart 4.6681 4.5964

Training Set Test Set

On Original Data

Perplexity Comparison with 10 User Clusters

Page 26: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 26

Co-cluster Parameters (Movielens)

Page 27: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 27

Co-embedding: Users

Page 28: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 28

Co-embedding: Movies

Page 29: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 29

Summary

• Bayesian co-clustering– Mixed membership co-clustering for dyadic data– Flexible Bayesian priors over memberships– Applicable to variety of data types– Stable performance, consistently better in test set

• Fast variational inference algorithm – One variational parameter for each row/column– Maintains coupling between row/column cluster memberships– Same idea leads to FastLDA (try it at home)

• Future work– Open problem: Joint decoding of missing entries– Predictive models based on mixed membership co-clusters– Multi-relational clustering

Page 30: Bayesian Co-clustering for Dyadic Data Analysis

Bayesian Co-clustering 30

References

• A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix ApproximationA. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, D. Modha.Journal of Machine Learning Research (JMLR), (2007) .

• Latent Dirichlet Conditional Naive Bayes ModelsA. Banerjee and H. Shan. IEEE International Conference on Data Mining (ICDM), (2007).

• Latent Dirichlet AllocationD. Blei, A. Ng, M. Jordan.Journal of Machine Learning Research (JMLR), (2003).

• Bayesian Co-clusteringH. Shan, A. Banerjee. Tech Report, University of Minnesota, Twin Cities, (2008).