
Page 1: A brief introduction to mutual information and its application

A brief introduction to mutual information and its application

2015. 2. 4.

Page 2: A brief introduction to mutual information and its application

Agenda
• Introduction
• Definition of mutual information
• Applications

Page 3: A brief introduction to mutual information and its application

Introduction

Page 4: A brief introduction to mutual information and its application

Why do we need it?
• We need 'a good measure' for something!

Match score?

Page 5: A brief introduction to mutual information and its application

What is 'a good measure'?
• Precision
• Significance
• Applicable to various kinds of data

Page 6: A brief introduction to mutual information and its application

What is 'a good measure'?
• Precision
• Significance
• Applicable to various kinds of data

A solution: mutual information!

Page 7: A brief introduction to mutual information and its application

What is mutual information?
• A measure for two or more random variables
• An entropy-based measure
• A non-parametric measure
• Gives good estimates for discrete random variables

Page 8: A brief introduction to mutual information and its application

What is entropy?
• A measure from information theory
• Quantifies uncertainty, or information content

• Definition of entropy for a random variable $X$:
  $H(X) = -\sum_{x} p(x) \log_2 p(x)$

• Definition of joint entropy for two random variables $X$ and $Y$:
  $H(X,Y) = -\sum_{x,y} p(x,y) \log_2 p(x,y)$

Page 9: A brief introduction to mutual information and its application

Entropy of a coin flip
• Let $p_h = P(\text{head})$ and $p_t = 1 - p_h$
• When $p_h = 0.5$, $H = 1$ bit (maximum uncertainty)
• When $p_h = 0$ or $p_h = 1$, $H = 0$ (no uncertainty)

[Figure: binary entropy $H$ as a function of $p_h$]

Page 10: A brief introduction to mutual information and its application

R code for the previous figure

H <- function(p_h, p_t) {
  ret <- 0
  if (p_h > 0.0) ret <- ret - p_h * log2(p_h)
  if (p_t > 0.0) ret <- ret - p_t * log2(p_t)
  return(ret)
}

head <- seq(0, 1, 0.01)
tail <- 1 - head

entropy <- mapply(H, head, tail)

plot(entropy ~ head, type='n')
lines(entropy ~ head, lwd=2, col='red')

Page 11: A brief introduction to mutual information and its application

Joint entropy
• Venn diagram for the definition of entropies

[Figure: Venn diagram showing H(X) and H(Y) as two overlapping circles]

Page 12: A brief introduction to mutual information and its application

Joint entropy
• Venn diagram for the definition of entropies

[Figure: the same Venn diagram with H(X,Y) as the union of the two circles]

Page 13: A brief introduction to mutual information and its application

Example of joint entropy
• 성도 (X) and 성완 (Y) each tossed a coin 10 times, simultaneously
• 0 : head, 1 : tail
• X : { 0, 0, 0, 0, 0, 1, 1, 1, 1, 1 }
• Y : { 0, 0, 1, 0, 0, 0, 1, 0, 1, 1 }

• H(X,Y) = 1.85
• Note: $H(X,Y) \le H(X) + H(Y)$ (here $1.85 \le 1 + 0.97$)

Page 14: A brief introduction to mutual information and its application

R code for the calculation

> X <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
> Y <- c(0, 0, 1, 0, 0, 0, 1, 0, 1, 1)
>
> freq <- table(X, Y)
>
> ret <- 0
> for (i in 1:2) {
+   for (j in 1:2) {
+     # accumulate -p * log2(p) for each cell of the table
+     ret <- ret - freq[i, j] / 10.0 * log2(freq[i, j] / 10.0)
+   }
+ }
> ret
[1] 1.846439

Page 15: A brief introduction to mutual information and its application

'entropy' library

> library("entropy")
> x1 = runif(10000)
> hist(x1, xlim=c(0,1), freq=FALSE)
> y1 = discretize(x1, numBins=10, r=c(0,1))
> entropy(y1)
[1] 2.30244
> y1
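For reference: entropy() reports values in nats (natural log) by default, and a uniform variable discretized into 10 equal-width bins approaches $\ln(10) \approx 2.303$ nats, which is the value printed above.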

Page 16: A brief introduction to mutual information and its application

Mutual information
• A measure of mutual dependence or interaction

$I(X;Y) = H(X) + H(Y) - H(X,Y)$

[Figure: Venn diagram with I(X;Y) as the overlap of H(X) and H(Y)]
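As a worked check with the coin-toss data from the joint entropy example (my own calculation, not on the slides):

$I(X;Y) = H(X) + H(Y) - H(X,Y) \approx 1.000 + 0.971 - 1.846 \approx 0.125 \text{ bits}$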

Page 17: A brief introduction to mutual information and its application

Mutual information
• Some properties of mutual information:
  • $I(X;Y) \ge 0$ (non-negativity)
  • $I(X;Y) = I(Y;X)$ (symmetry)
  • $I(X;Y) = 0$ if and only if $X$ and $Y$ are independent
  • $I(X;X) = H(X)$

Page 18: A brief introduction to mutual information and its application

How to measure mutual information
• Build a frequency table: the count of each genotype among cases and controls, with row and column sums

[Table: genotype × case/control frequency table]

Page 19: A brief introduction to mutual information and its application

How to measure mutual information
• Convert the frequencies into entropy terms for the genotype, the case/control status, and their joint distribution

[Table: genotype × case/control entropy table]
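A minimal sketch (my own illustration; the genotype counts are made up) of turning such a frequency table into mutual information in R:

library(entropy)
# hypothetical genotype (AA/Aa/aa) x status (case/control) counts
freq <- matrix(c(30, 25, 10, 5, 25, 25), nrow = 2,
               dimnames = list(status = c("case", "control"),
                               genotype = c("AA", "Aa", "aa")))
p <- freq / sum(freq)                  # joint probabilities
H <- function(p) -sum(p[p > 0] * log2(p[p > 0]))
# I(genotype; status) = H(status) + H(genotype) - H(genotype, status)
MI <- H(rowSums(p)) + H(colSums(p)) - H(p)
MI
mi.empirical(freq, unit = "log2")      # same value via the entropy package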

Page 20: A brief introduction to mutual information and its application

'entropy' library

> x1 = runif(10000)
> x2 = runif(10000)
> y2d = discretize2d(x1, x2, numBins1=10, numBins2=10)
> H12 = entropy(y2d)
>
> # mutual information
> mi.empirical(y2d) # approximately zero
> H1 = entropy(rowSums(y2d))
> H2 = entropy(colSums(y2d))
> H1 + H2 - H12

Page 21: A brief introduction to mutual information and its application

Applications

Page 22: A brief introduction to mutual information and its application

Association measure between genomic features and outcome

$I(X_1, X_2; Y) = H(X_1, X_2) + H(Y) - H(X_1, X_2, Y)$

where $(X_1, X_2)$ is a pair of genomic features and $Y$ is the binary outcome.
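A minimal sketch (my own illustration, with synthetic discretized features) of computing $I(X_1, X_2; Y)$ with the entropy package used earlier:

library(entropy)
set.seed(4)
# hypothetical discretized features and a binary outcome
x1 <- sample(0:2, 200, replace = TRUE)
x2 <- sample(0:2, 200, replace = TRUE)
y  <- sample(0:1, 200, replace = TRUE)
H <- function(...) entropy(table(...), unit = "log2")  # plug-in entropy of joint counts
# I(X1, X2; Y) = H(X1, X2) + H(Y) - H(X1, X2, Y)
I12Y <- H(x1, x2) + H(y) - H(x1, x2, y)
I12Y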

Page 23: A brief introduction to mutual information and its application

Mutual Information With Clustering (Leem et al., 2014) (1/2)

[Figure: SNPs (causative SNPs highlighted) grouped around Centroids 1–3, the 3 SNPs with the highest mutual information value; m candidates are kept per cluster, and distanceScore = d1 + d2]

Page 24: A brief introduction to mutual information and its application

Mutual Information With Clustering (Leem et al., 2014) (2/2)
• Mutual information
  • Used as the distance measure for clustering
• K-means clustering algorithm
• Candidate selection
  • Reduces the search space dramatically
• Can detect high-order epistatic interactions
• Also shows better performance (power, execution time) than previous methods
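A rough sketch of the candidate-selection idea in R, not the authors' implementation; the data, the top-3 rule, and the MI-based assignment are illustrative assumptions:

library(entropy)
set.seed(1)
# hypothetical data: 100 samples x 20 SNPs coded 0/1/2 and a binary outcome
snps <- matrix(sample(0:2, 100 * 20, replace = TRUE), nrow = 100)
Y <- sample(0:1, 100, replace = TRUE)
mi <- function(a, b) mi.empirical(table(a, b))
# rank SNPs by mutual information with the outcome
scores <- apply(snps, 2, mi, b = Y)
centroids <- order(scores, decreasing = TRUE)[1:3]  # top-3 MI SNPs as centroids
# assign each remaining SNP to the centroid it shares the most information with
rest <- setdiff(1:20, centroids)
assignment <- sapply(rest, function(j)
  centroids[which.max(sapply(centroids, function(c) mi(snps[, c], snps[, j])))])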

Page 25: A brief introduction to mutual information and its application

Outcome-guided mutual information network in network-based prediction (Jeong et al., 2015) (1/2)
• Two parameters: $\theta$ and $\sigma$

$\theta = \max_{i \neq j} I_{\text{avg}}(i, j)$

$I_{\text{avg}}(i, j) = \frac{1}{30} \sum_{p=1}^{30} I(g_i, g_j; Y_p)$

$G_\sigma = \{ (g_i, g_j) \mid g_i, g_j \in P \text{ and } I(g_i, g_j; Y) \ge \theta (1 + \sigma) \}$

[Figure: distribution of the number of edges over mutual information, with the thresholds $\theta$ and $\theta(1+\sigma)$ marked]
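A minimal sketch of assembling the edge set $G_\sigma$ in R; reading the $Y_p$ as shuffled copies of the outcome is my assumption, not stated on the slide, and the data and gene count are made up:

library(entropy)
set.seed(2)
# hypothetical data: 100 samples x 5 discretized genes and a binary outcome
genes <- matrix(sample(0:2, 100 * 5, replace = TRUE), nrow = 100)
Y <- sample(0:1, 100, replace = TRUE)
H  <- function(...) entropy(table(...), unit = "log2")
I3 <- function(a, b, y) H(a, b) + H(y) - H(a, b, y)  # I(g_i, g_j; Y)
prs <- combn(5, 2)                                   # all gene pairs
# theta: largest pairwise score, averaged over 30 shuffled outcomes
Iavg <- apply(prs, 2, function(p)
  mean(replicate(30, I3(genes[, p[1]], genes[, p[2]], sample(Y)))))
theta <- max(Iavg)
sigma <- 0.1
Ireal <- apply(prs, 2, function(p) I3(genes[, p[1]], genes[, p[2]], Y))
G_sigma <- prs[, Ireal >= theta * (1 + sigma), drop = FALSE]  # edges kept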

Page 26: A brief introduction to mutual information and its application

Outcome-guided mutual information network in network-based prediction (Jeong et al., 2015) (2/2)

[Figure: the resulting feature network]

Page 27: A brief introduction to mutual information and its application

Relevance networks for gastritis (Jeong and Sohn, 2014)

Page 28: A brief introduction to mutual information and its application

MINA: Mutual Information Network Analysis framework

https://github.com/hhjeong/MINA

Page 29: A brief introduction to mutual information and its application

Conclusion

Page 30: A brief introduction to mutual information and its application

Problems of mutual information and their solutions
• Noise for continuous data
  • Alternative discretization techniques

• Assessment of significance
  • Permutation test (sketched below)
  • The multiple testing problem should also be accounted for

• Mutual information is not a metric!
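A minimal sketch of the permutation test mentioned above (synthetic data; the sample size and dependence strength are arbitrary): shuffle one variable to destroy any real association, then compare the observed mutual information against the resulting null distribution.

library(entropy)
set.seed(3)
# synthetic, weakly dependent pair: y copies x %% 2 for ~30% of samples
x <- sample(0:2, 200, replace = TRUE)
y <- ifelse(runif(200) < 0.3, x %% 2, sample(0:1, 200, replace = TRUE))
mi.obs <- mi.empirical(table(x, y))
# null distribution: permuting y breaks the association with x
mi.null <- replicate(1000, mi.empirical(table(x, sample(y))))
p.value <- mean(mi.null >= mi.obs)  # fraction of null MIs at least as large
p.value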