Iclust: information based clustering

Noam Slonim, The Lewis-Sigler Institute for Integrative Genomics, Princeton University

Joint work with Gurinder Atwal, Gasper Tkacik, and Bill Bialek

2 12 -1 -1 6 -3 8 ?? 7 -5 3 -4

12 ?? -5 11 -2 6 11 11 -8 12 ?? -2

?? 12 5 12 4 -1 8 -2 ?? 5 14 ??

8 1 12 1 14 -8 ?? -2 5 14 -8 -7

5 -5 11 17 -2 15 5 14 -8 5 16 2

1 11 -8 0 5 -5 5 14 18 ?? 2 1

-6 12 4 12 4 7 -1 3 -7 3 7 -5

21 ?? ?? 3 2 4 -11 -3 3 -3 ?? 9

K genes

N conditions

Running example

Gene expression data

Relations between genes?

Relations between experimental conditions?

(log) ratio of the mRNA expression level of a gene in a specific condition

Some nice features of the information measure:

Model independent

Responsive to any type of dependency

Captures more than just pairwise relations

Suitable for both continuous and discrete data

Independent of the measurement scale

Axiomatic

Information as a correlation/similarity measure

Mutual information - definition

We have some “uncertainty” about the state of gene-A; but now someone told us the state of gene-B…

How much can we learn from the state of gene-B about the state of gene-A (and vice versa)?

The resulting reduction in the uncertainty about the state of gene-A is called the mutual information between these two variables:

$$I[p(a,b)] = H[p(a)] - H[p(a|b)] = \sum_{a \in A,\, b \in B} p(a,b) \log \frac{p(a,b)}{p(a)\,p(b)}$$
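As a minimal sketch of this definition, the function below evaluates the mutual information of a discrete joint distribution. The function name, the use of NumPy, and the choice of base-2 logarithms (bits) are assumptions made for illustration; the slides do not specify an implementation or how continuous expression levels are discretized.

```python
import numpy as np

def mutual_information(p_ab):
    """Mutual information, in bits, of a joint probability table p(a, b)."""
    p_ab = np.asarray(p_ab, dtype=float)
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal p(a), shape (|A|, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal p(b), shape (1, |B|)
    nz = p_ab > 0                           # skip empty cells (0 log 0 = 0)
    return float(np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a @ p_b)[nz])))

# Two binary "genes" that are always in the same state share 1 bit.
coupled = [[0.5, 0.0],
           [0.0, 0.5]]
# Two independent binary "genes" share no information.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

print(mutual_information(coupled), mutual_information(independent))
```

In the first table, knowing gene-B removes all uncertainty about gene-A; in the second it removes none.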

Model independence & responsiveness to “complicated” relations

[Figure: example panels plotted against gene-A expression level, labeled: MI ~ 1 bit, Corr. ~ 0.9; MI ~ 2 bits, Corr. ~ 0.6; MI ~ 0 bits, Corr. ~ 0; MI ~ 1.3 bits, Corr. ~ 0.]
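A small worked case shows how the correlation can vanish while the mutual information does not. The toy distribution below (x uniform on -1, 0, 1 and y = x squared) and the helper names are assumptions chosen for illustration, not data from the talk.

```python
import numpy as np

# x takes the values -1, 0, 1 with equal probability; y = x**2 exactly.
x_vals = np.array([-1.0, 0.0, 1.0])
p_x = np.full(3, 1.0 / 3.0)
y_vals = x_vals ** 2

def pearson(x, y, p):
    """Pearson correlation computed exactly from outcome probabilities p."""
    mx, my = np.sum(p * x), np.sum(p * y)
    cov = np.sum(p * (x - mx) * (y - my))
    var_x = np.sum(p * (x - mx) ** 2)
    var_y = np.sum(p * (y - my) ** 2)
    return float(cov / np.sqrt(var_x * var_y))

def mutual_information_bits(x, y, p):
    """I(x;y) in bits when the pair (x[k], y[k]) occurs with probability p[k]."""
    joint, px, py = {}, {}, {}
    for xk, yk, pk in zip(x, y, p):
        joint[(xk, yk)] = joint.get((xk, yk), 0.0) + pk
    for (xk, yk), pk in joint.items():
        px[xk] = px.get(xk, 0.0) + pk
        py[yk] = py.get(yk, 0.0) + pk
    return float(sum(pk * np.log2(pk / (px[xk] * py[yk]))
                     for (xk, yk), pk in joint.items()))

corr = pearson(x_vals, y_vals, p_x)
mi = mutual_information_bits(x_vals, y_vals, p_x)
print(corr, mi)
```

Here y is fully determined by x, so the mutual information equals the entropy of y (about 0.92 bits), yet the symmetric relation gives zero linear correlation.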

[Figure: expression levels plotted against experiment index. MI ~ 0 bits; Corr. ~ 0; triplet information ~ 1.0 bits.]

Capturing more than just pairwise relations
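A standard toy case of a dependency that is invisible to every pairwise measure is the XOR of two fair bits. The sketch below assumes that "triplet information" means the multi-information (sum of the single-variable entropies minus the joint entropy); that reading, and all names in the code, are assumptions rather than something the slides state.

```python
import itertools
import math
from collections import Counter

def entropy(samples):
    """Entropy, in bits, of the empirical distribution of a list of outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def pair_information(x, y):
    """I(x;y) = H(x) + H(y) - H(x,y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

# All four equally likely states of two fair bits a, b, with c = a XOR b.
triples = [(a, b, a ^ b) for a, b in itertools.product([0, 1], repeat=2)]
a, b, c = zip(*triples)

pairwise = [pair_information(a, b), pair_information(a, c), pair_information(b, c)]
triplet = entropy(a) + entropy(b) + entropy(c) - entropy(triples)
print(pairwise, triplet)
```

Every pair of variables is independent, so each pairwise value is zero, while the three variables together carry one bit of shared structure.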

Using a model-dependent correlation measure might result in missing significant dependencies in our data.

Mycobacterium tuberculosis, 81 experiments

Pearson Correlation

Mutual-information vs. Pearson-Correlation results in bacteria gene-expression data

Information relations between gene expression profiles

Given the expression of gene-A, how much information do we have about the expression of gene-B? (when averaging over all conditions)

(Sample size: the number of conditions; 173 in the Gasch data.)

Once we find these information relations, we often want to apply cluster analysis.

Numerous clustering methods are available – but typically they assume a particular model.

For example, K-means corresponds to the modeling assumption that each cluster can be described by a spherical Gaussian.

Back to square one…?

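Iclust – information based clustering

What is a “good” cluster?

A simple proposal: given a cluster, we pick two items at random, and we want them to be as “similar” to each other as possible.

Formally, we wish to maximize

$$\langle S(c) \rangle = \sum_{c} q(c) \sum_{i_1, i_2} q(i_1|c)\, q(i_2|c)\, s(i_1, i_2)$$

Or, for relations among r items,

$$\langle S(c) \rangle = \sum_{c} q(c) \sum_{i_1, \ldots, i_r} q(i_1|c)\, q(i_2|c) \cdots q(i_r|c)\, s(i_1, \ldots, i_r)$$

Or, with information as the similarity measure,

$$\langle S(c) \rangle = \langle I(c) \rangle = \sum_{c} q(c) \sum_{i_1, \ldots, i_r} q(i_1|c)\, q(i_2|c) \cdots q(i_r|c)\, I(i_1, \ldots, i_r)$$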

Namely, we wish to maximize the average information relations in our clusters, that is, to find clusters such that in each cluster all items are highly informative about each other.
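For the pairwise case, the average within-cluster similarity can be evaluated directly from a similarity matrix and the assignment probabilities q(c|i). The sketch below assumes a uniform prior p(i) by default, obtains q(i|c) by Bayes' rule, and uses hypothetical names and a toy 0/1 similarity matrix; none of these details are taken from the slides.

```python
import numpy as np

def average_similarity(s, q_c_given_i, p_i=None):
    """<S(c)> = sum_c q(c) sum_{i1,i2} q(i1|c) q(i2|c) s(i1, i2)."""
    s = np.asarray(s, dtype=float)
    q = np.asarray(q_c_given_i, dtype=float)        # shape (items, clusters)
    n = s.shape[0]
    p_i = np.full(n, 1.0 / n) if p_i is None else np.asarray(p_i, dtype=float)
    q_c = p_i @ q                                   # q(c); assumes no empty cluster
    q_i_given_c = q * p_i[:, None] / q_c            # Bayes' rule: q(i|c)
    s_c = np.einsum("ic,ij,jc->c", q_i_given_c, s, q_i_given_c)
    return float(q_c @ s_c)

# Toy similarity matrix with two blocks of mutually similar items.
s = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
matching = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])   # clusters follow the blocks
mixed = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])      # clusters cut across blocks

print(average_similarity(s, matching), average_similarity(s, mixed))
```

Because the two items are drawn independently from q(i|c), the same item may be drawn twice, so the diagonal of the similarity matrix contributes to the average.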

Iclust – information based clustering (cont.)

A penalty term that we wish to minimize, as in rate-distortion theory:

$$I(C;i) = \sum_{i,c} p(i)\, q(c|i) \log \frac{q(c|i)}{q(c)}$$

S(c) is maximized, but the penalty term is maximized as well (no compression).

The penalty term is minimized (maximal compression), but S(c) is minimized as well.

Intermediate, interesting cases: small penalty with high S(c).
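The two extremes of the penalty term can be checked numerically. The sketch below assumes a uniform p(i) by default and measures I(C;i) in bits; the function name and the four-item example are hypothetical.

```python
import numpy as np

def cluster_information(q_c_given_i, p_i=None):
    """I(C;i) = sum_{i,c} p(i) q(c|i) log2[ q(c|i) / q(c) ], in bits."""
    q = np.asarray(q_c_given_i, dtype=float)        # shape (items, clusters)
    n = q.shape[0]
    p_i = np.full(n, 1.0 / n) if p_i is None else np.asarray(p_i, dtype=float)
    q_c = p_i @ q                                   # cluster marginal q(c)
    joint = p_i[:, None] * q                        # p(i) q(c|i)
    nz = joint > 0                                  # skip zero-probability cells
    q_c_full = np.broadcast_to(q_c, q.shape)
    return float(np.sum(joint[nz] * np.log2(q[nz] / q_c_full[nz])))

singletons = np.eye(4)           # every item in its own cluster: no compression
one_cluster = np.ones((4, 1))    # all items in one cluster: maximal compression

print(cluster_information(singletons), cluster_information(one_cluster))
```

With four equally likely items, singleton clusters cost log2(4) = 2 bits, while a single cluster costs nothing.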

Iclust – information based clustering (cont.)

The intuitive clustering problem can be turned into a general mathematical optimization problem:

$$F[q(c|i)] = \langle I(c) \rangle - T\, I(C;i)$$

where q(c|i) are the clustering parameters, ⟨I(c)⟩ is the expected information relations among data items, I(C;i) is the information between data items and clusters, and T is the tradeoff parameter.

Clustering is formulated as trading bits of similarity against bits of descriptive power, without any further assumptions.
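To make the tradeoff concrete, the sketch below evaluates the functional for two candidate clusterings of a toy data set at two values of T. It assumes pairwise relations, a uniform p(i), a hand-made 0/1 similarity matrix standing in for the information relations, and information measured in bits; the function name is hypothetical.

```python
import numpy as np

def iclust_objective(s, q_c_given_i, T):
    """Return (<S>, I(C;i) in bits, F = <S> - T * I(C;i)) for uniform p(i)."""
    s = np.asarray(s, dtype=float)
    q = np.asarray(q_c_given_i, dtype=float)        # shape (items, clusters)
    p_i = np.full(s.shape[0], 1.0 / s.shape[0])
    q_c = p_i @ q                                   # assumes no empty cluster
    q_i_given_c = q * p_i[:, None] / q_c
    avg_s = float(q_c @ np.einsum("ic,ij,jc->c", q_i_given_c, s, q_i_given_c))
    joint = p_i[:, None] * q
    nz = joint > 0
    info = float(np.sum(joint[nz] * np.log2((q / q_c)[nz])))
    return avg_s, info, avg_s - T * info

s = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
two_blocks = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
one_cluster = np.ones((4, 1))

for T in (0.25, 1.0):
    print(T, iclust_objective(s, two_blocks, T)[2], iclust_objective(s, one_cluster, T)[2])
```

At small T the two-cluster solution scores higher because its extra similarity outweighs the one bit it costs; at large T the single cluster wins because compression dominates.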

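Relations with classical rate distortion

Iclust

$$F[q(c|i)] = I(C;i) + \frac{1}{T} \langle D(c) \rangle$$

$$\langle D(c) \rangle = \sum_{c} q(c) \sum_{i_1, \ldots, i_r} q(i_1|c) \cdots q(i_r|c)\, d(i_1, \ldots, i_r)$$

Classical rate distortion

$$F[q(c|i)] = I(C;i) + \frac{1}{T} \langle \tilde{D}(c) \rangle$$

$$\langle \tilde{D}(c) \rangle = \sum_{c} q(c) \sum_{i_1} q(i_1|c)\, d\big(\Phi^{(i_1)}, \Phi^{(c)}\big)$$

For the special case of pairwise relations

$$\langle D(c) \rangle = \sum_{c} q(c) \sum_{i_1} \sum_{i_2} q(i_1|c)\, q(i_2|c)\, d\big(\Phi^{(i_1)}, \Phi^{(i_2)}\big)$$

$$\langle \tilde{D}(c) \rangle = \sum_{c} q(c) \sum_{i_1} q(i_1|c)\, d\Big(\Phi^{(i_1)}, \sum_{i_2} q(i_2|c)\, \Phi^{(i_2)}\Big)$$

The difference is whether the sum over i_2 is taken before or after d is computed.

If the distortion/similarity matrix is a kernel matrix, the formulations are equivalent.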
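One case where the two orderings of the sum are easy to compare is squared Euclidean distance. That distance is an assumption introduced here for illustration, not the measure used in the talk: averaging d over pairs then gives exactly twice the average distance to the centroid, a constant factor that can be absorbed into T. The function names and the toy patterns are hypothetical.

```python
import numpy as np

def sq_dist(x, y):
    """Squared Euclidean distance between two patterns."""
    return float(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def pairwise_distortion(patterns, q_i_given_c):
    """Single-cluster D(c): d is computed for each pair, then averaged."""
    n = len(patterns)
    return sum(q_i_given_c[i1] * q_i_given_c[i2] * sq_dist(patterns[i1], patterns[i2])
               for i1 in range(n) for i2 in range(n))

def centroid_distortion(patterns, q_i_given_c):
    """Single-cluster D~(c): average the patterns first, then compute d."""
    centroid = q_i_given_c @ patterns
    return sum(q_i_given_c[i] * sq_dist(patterns[i], centroid)
               for i in range(len(patterns)))

patterns = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
q_i_given_c = np.full(3, 1.0 / 3.0)     # one cluster, members equally weighted

d_pair = pairwise_distortion(patterns, q_i_given_c)
d_cent = centroid_distortion(patterns, q_i_given_c)
print(d_pair, d_cent)
```

For a general distortion measure no such identity holds, and a centroid may not even be defined.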

And yet – some important differences

Iclust is applicable when the raw data is given directly as pairwise relations

Iclust does not require a definition of a “prototype” (or “centroid”)

The two formulations induce different decoding schemes

A sender observes a pattern $\Phi^{(i)}$, but is allowed to send only the cluster index, c

In classical rate distortion the receiver is assumed to decode by

$$\tilde{\Phi}^{(i)} = \Phi^{(c)} = \sum_{i'} q(i'|c)\, \Phi^{(i')}$$

Deterministic decoding with vocabulary size N_c

In Iclust the receiver is assumed to decode by

$$\tilde{\Phi}^{(i)} = \Phi^{(i')}, \quad i' \sim q(i'|c)$$

Stochastic decoding with vocabulary size N
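The two decoding schemes can be sketched as follows, under one reading of the slide: the deterministic decoder returns the cluster centroid (one of N_c possible outputs), while the stochastic decoder returns one of the N original patterns, drawn from q(i|c). That interpretation, the function names, and the toy gray levels are assumptions.

```python
import numpy as np

def decode_deterministic(patterns, q_i_given_c):
    """Classical rate distortion reading: always return the cluster centroid."""
    return q_i_given_c @ patterns

def decode_stochastic(patterns, q_i_given_c, rng):
    """Iclust reading: return one original pattern, sampled from q(i|c)."""
    idx = rng.choice(len(patterns), p=q_i_given_c)
    return patterns[idx]

# Three gray levels; the received cluster holds the first two with equal weight.
patterns = np.array([10.0, 20.0, 200.0])
q_i_given_c = np.array([0.5, 0.5, 0.0])
rng = np.random.default_rng(0)

centroid_output = decode_deterministic(patterns, q_i_given_c)
sampled_output = decode_stochastic(patterns, q_i_given_c, rng)
print(centroid_output, sampled_output)
```

The centroid output (15) is not one of the original gray levels, whereas the stochastic output is always a level that actually occurs in the data.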

Iclust can handle more than just pairwise correlations

Iclust vs. classical rate-distortion decoding

[Figure: the original image (220 gray levels), its Iclust (stochastic) decoding with 2 clusters, and its RD (deterministic) decoding with 2 clusters.]

Iclust algorithm - freely available Web implementation

Responsive to any type of dependency among the data

Invariant to changes in the data representation

Allows clustering based on more than pairwise relations

For more details: Slonim, Atwal, Tkacik, and Bialek (2005), Information based clustering, PNAS, in press.

See www.princeton.edu/~nslonim

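$$q(c|i) \propto q(c) \exp\left\{ \frac{1}{T} \left[ r\, S(c;i) - (r-1)\, S(c) \right] \right\}$$

where S(c) is the average “similarity” among the members of c, and S(c;i) is the average “similarity” of i to the members of c.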
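A minimal sketch of how such an update could be iterated for pairwise relations (r = 2) is given below. It reads the update rule as q(c|i) proportional to q(c) exp{(1/T)[r S(c;i) - (r-1) S(c)]} and assumes a uniform p(i), a fixed number of synchronous iterations, and a hand-chosen, slightly asymmetric starting point; it is not the authors' Web implementation, and all names are hypothetical.

```python
import numpy as np

def iclust_pairwise(s, q_init, T, n_iter=20):
    """Iterate q(c|i) ∝ q(c) exp{[2 S(c;i) - S(c)] / T} for pairwise similarities."""
    s = np.asarray(s, dtype=float)
    q = np.asarray(q_init, dtype=float).copy()      # q(c|i), shape (items, clusters)
    p_i = np.full(s.shape[0], 1.0 / s.shape[0])     # uniform prior over items
    for _ in range(n_iter):
        q_c = p_i @ q                               # q(c); assumes it stays positive
        q_i_given_c = q * p_i[:, None] / q_c        # Bayes' rule: q(i|c)
        s_ci = s @ q_i_given_c                      # S(c;i): similarity of i to c
        s_c = np.sum(q_i_given_c * s_ci, axis=0)    # S(c): similarity within c
        log_q = np.log(q_c) + (2.0 * s_ci - s_c) / T
        log_q -= log_q.max(axis=1, keepdims=True)   # for numerical stability
        q = np.exp(log_q)
        q /= q.sum(axis=1, keepdims=True)           # normalize over clusters
    return q

# Six items in two blocks of mutually similar items.
s = np.kron(np.eye(2), np.ones((3, 3)))
# Nearly uniform start with a small hand-chosen asymmetry (an exactly uniform
# start is a fixed point of the update and would never split).
q_init = np.array([[0.55, 0.45], [0.50, 0.50], [0.52, 0.48],
                   [0.48, 0.52], [0.50, 0.50], [0.45, 0.55]])

q_final = iclust_pairwise(s, q_init, T=0.1)
print(np.round(q_final, 3))
```

In this toy case the small initial asymmetry is amplified until the assignments follow the two blocks. In general the result depends on T and on the starting point, so several restarts would be a reasonable precaution.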

Iclust – cluster examples

Clusters of genes

C18: RPS10A, RPS10B, RPS11A, RPS11B, RPS12, … (proteins of the small ribosomal subunit)

C15: FRS1, KRS1, SES1, TYS1, VAS1, … (enzymes that attach amino acids to tRNA)

C4: PGM2, UGP1, TSL1, TPS1, TPS2, … (enzymes involved in the trehalose anabolism pathway)

Clusters of stocks

Data: Dynamics of stock prices

Given the price of stock-A, how much information do we have about the price of stock-B? (when averaging over many days)

C17: Wal-Mart, Target, Home Depot, Best Buy, Staples, …

C12: Microsoft, Apple Comp., Dell, HP, Motorola, …

C2: NY Times, Tribune Co., Meredith Corp., Dow Jones & Co., Knight-Ridder Inc., …

Clusters of movies

Data: Rating by viewers

Given the rating of movie-A, how much information do we have about the rating of movie-B? (when averaging over many viewers)

C12: Snow White, Cinderella, Dumbo, Pinocchio, Aladdin, …

C1: Psycho, Apocalypse Now, The Godfather, Taxi Driver, Pulp Fiction, …

C7: Star Wars, Return of the Jedi, The Terminator, Alien, Apollo 13, …

Coherence results – comparison to alternative algorithms

[Figure: coherence compared with K-means, K-medians, and hierarchical clustering, in three panels: ESR, S&P 500, and EachMovie.]

Quick Summary

Information as the core measure of data analysis with many appealing features

Iclust: a novel information-theoretic formulation of clustering, with some intriguing relations with classical rate-distortion clustering.

Validations: finding coherent gene clusters based on information relations in gene-expression data

… and finding coherent stock clusters, coherent movie clusters …

… and genotype-phenotype association in bacteria, based on phylogenetic data: Slonim, Elemento & Tavazoie (2005), Mol. Systems Biol., in press.

… and more?
