A brief introduction to mutual information and its application
Hyun-Hwan Jeong
2015. 2. 4.
Agenda
• Introduction
• Definition of mutual information
• Applications
Introduction
Why do we need it?
• We need 'a good measure' of association!
• Is a simple match score good enough?
What is 'a good measure'?
• Precision
• Significance
• Applicability to various kinds of data
A solution: mutual information!
What is mutual information?
• A measure for two or more random variables
• An entropy-based measure
• A non-parametric measure
• Shows good estimation for discrete random variables
What is entropy?
• A measure from information theory
• Quantifies uncertainty / information content
• Definition of entropy for a random variable $X$:
  $H(X) = -\sum_{x} p(x) \log_2 p(x)$
• Definition of joint entropy for two random variables $X$ and $Y$:
  $H(X, Y) = -\sum_{x} \sum_{y} p(x, y) \log_2 p(x, y)$
Entropy of a coin flip
• Let $p_h = P(\text{head})$ and $p_t = 1 - p_h$
• When $p_h = p_t = 0.5$, $H = 1$ bit (maximum uncertainty)
• When $p_h = 0$ or $p_h = 1$, $H = 0$ (the outcome is certain)
(Figure: $H$ as a function of $p_h$, produced by the R code below.)
R code for the previous figure

# entropy of a biased coin, in bits
H <- function(p_h, p_t) {
  ret <- 0
  if (p_h > 0.0) ret <- ret - p_h * log2(p_h)
  if (p_t > 0.0) ret <- ret - p_t * log2(p_t)
  return(ret)
}

head <- seq(0, 1, 0.01)
tail <- 1 - head
entropy <- mapply(H, head, tail)

plot(entropy ~ head, type = 'n')
lines(entropy ~ head, lwd = 2, col = 'red')
Joint entropy
• Venn diagram for the definition of entropies: two circles $H(X)$ and $H(Y)$, whose union is the joint entropy $H(X, Y)$.
Example of joint entropy
• Seongdo ($X$) and Seongwan ($Y$) each tossed a coin 10 times
• 0: head, 1: tail
• X: { 0, 0, 0, 0, 0, 1, 1, 1, 1, 1 }
• Y: { 0, 0, 1, 0, 0, 0, 1, 0, 1, 1 }
• $H(X, Y) = 1.85$
• Note: $H(X, Y) \leq H(X) + H(Y) = 1.97$
R code for the calculation

> X <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
> Y <- c(0, 0, 1, 0, 0, 0, 1, 0, 1, 1)
>
> freq <- table(X, Y)
>
> ret <- 0
> for (i in 1:2) {
+   for (j in 1:2) {
+     # accumulate -p * log2(p) over the cells of the joint frequency table
+     ret <- ret - freq[i, j] / 10.0 * log2(freq[i, j] / 10.0)
+   }
+ }
> ret
[1] 1.84644
'entropy' library

> library("entropy")
> x1 = runif(10000)
> hist(x1, xlim=c(0,1), freq=FALSE)
> y1 = discretize(x1, numBins=10, r=c(0,1))
> entropy(y1)   # in nats; close to log(10) = 2.303 for a near-uniform 10-bin histogram
[1] 2.30244
> y1
Mutual information
• A measure of mutual dependence or interaction between random variables
• $I(X; Y) = H(X) + H(Y) - H(X, Y)$
(Venn diagram: $I(X; Y)$ is the overlap of $H(X)$ and $H(Y)$.)
Mutual information
• Some properties of mutual information:
  • Non-negativity: $I(X; Y) \geq 0$, with equality iff $X$ and $Y$ are independent
  • Symmetry: $I(X; Y) = I(Y; X)$
  • Self-information: $I(X; X) = H(X)$
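A quick numerical check of the identity $I(X; Y) = H(X) + H(Y) - H(X, Y)$ on the 10-toss example above; the helper H() is a minimal sketch added here, not code from the deck:

# joint entropy of one or more discrete vectors, in bits
H <- function(...) {
  p <- table(...) / length(list(...)[[1]])
  -sum(p[p > 0] * log2(p[p > 0]))
}
X <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
Y <- c(0, 0, 1, 0, 0, 0, 1, 0, 1, 1)
H(X) + H(Y) - H(X, Y)   # I(X;Y) = 1 + 0.97 - 1.85, approx. 0.125 bits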
How to measure mutual information
• Frequency table: counts of each genotype among cases and controls, together with the row sums (case/control totals), the column sums (genotype totals), and the grand total.
How to measure mutual information
• Entropy table: the frequencies converted into probabilities, from which the marginal entropies H(genotype) and H(status) and the joint entropy H(genotype, status) are computed.
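Putting the two tables together in R, with hypothetical counts (the numbers and the 2×3 layout are assumptions for illustration, not data from the slides):

# hypothetical frequency table: rows = case/control, columns = genotype
counts <- matrix(c(30, 25, 10,
                   15, 20, 25),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("case", "control"), c("AA", "Aa", "aa")))
p <- counts / sum(counts)                          # joint probabilities
H <- function(q) -sum(q[q > 0] * log2(q[q > 0]))   # entropy in bits
H(rowSums(p)) + H(colSums(p)) - H(p)               # I(status; genotype)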
'entropy' library

> x1 = runif(10000)
> x2 = runif(10000)
> y2d = discretize2d(x1, x2, numBins1=10, numBins2=10)
> H12 = entropy(y2d)
>
> # mutual information
> mi.empirical(y2d)   # approximately zero
> H1 = entropy(rowSums(y2d))
> H2 = entropy(colSums(y2d))
> H1 + H2 - H12
Applications
Association measure between genomic features and outcome
$I(X_1, X_2; Y) = H(X_1, X_2) + H(Y) - H(X_1, X_2, Y)$
where $(X_1, X_2)$ is a pair of genomic features and $Y$ is the binary outcome.
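A minimal sketch of this three-variable quantity for discrete vectors; the simulated data and the helper H() are illustrative assumptions:

# joint entropy of one or more discrete vectors, in bits
H <- function(...) {
  p <- table(...) / length(list(...)[[1]])
  -sum(p[p > 0] * log2(p[p > 0]))
}
set.seed(1)
x1 <- sample(0:2, 100, replace = TRUE)   # genotype-like feature 1
x2 <- sample(0:2, 100, replace = TRUE)   # genotype-like feature 2
y  <- sample(0:1, 100, replace = TRUE)   # binary outcome
H(x1, x2) + H(y) - H(x1, x2, y)          # I(X1, X2; Y)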
Mutual Information With Clustering (Leem et al., 2014) (1/2)
(Figure: SNPs, with the causative SNPs marked, are grouped around Centroid 1, Centroid 2, and Centroid 3; distanceScore = d1 + d2; the 3 SNPs with the highest mutual information value are highlighted, and each cluster contributes m candidates.)
Mutual Information With Clustering (Leem et al., 2014) (2/2)
• Mutual information
  • serves as the distance measure for clustering (see the sketch below)
• K-means clustering algorithm
• Candidate selection
  • reduces the search space dramatically
• Can detect high-order epistatic interactions
• Also shows better performance (power, execution time) than previous methods
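A toy sketch of clustering SNPs with a mutual-information-based distance; it substitutes k-medoids (pam from the cluster package) for the authors' k-means variant, and all data and parameter choices here are illustrative, not Leem et al.'s implementation:

library(cluster)   # pam(): k-medoids, which accepts an arbitrary distance matrix
set.seed(2)
snps <- matrix(sample(0:2, 50 * 100, replace = TRUE), nrow = 50)  # 50 SNPs x 100 samples
mi <- function(a, b) {   # mutual information of two discrete vectors, in bits
  p <- table(a, b) / length(a)
  e <- outer(rowSums(p), colSums(p))   # product of the marginals
  sum(p[p > 0] * log2(p[p > 0] / e[p > 0]))
}
n <- nrow(snps)
D <- matrix(0, n, n)
for (i in 1:n) for (j in 1:n) D[i, j] <- -mi(snps[i, ], snps[j, ])  # higher MI = smaller distance
D <- D - min(D)                     # shift so all distances are nonnegative
clusters <- pam(as.dist(D), k = 3)  # 3 clusters, cf. the 3 seed SNPs above
table(clusters$clustering)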
Outcome-guided mutual information network in network-based prediction (Jeong et al., 2015) (1/2)
• Two parameters: $\theta$ and $\sigma$
(Figure: number of edges as a function of the mutual information cutoff, with $\theta$ and $\theta(1+\sigma)$ marked on the axis.)
• $\theta = \max_{i \neq j} I_{\text{avg}}(i, j)$, where $I_{\text{avg}}(i, j) = \frac{1}{30} \sum_{p=1}^{30} I(g_i, g_j; Y_p)$
• $G_\sigma = \{\, (g_i, g_j) \mid g_i, g_j \in P \text{ and } I(g_i, g_j; Y) \geq \theta(1+\sigma) \,\}$
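Read literally, the edge-selection step can be sketched as follows; treating the averaged scores and the scores against the true outcome as two precomputed matrices is my assumption about how the formulas fit together, not the authors' code:

# theta = max over distinct pairs of the averaged MI;
# G_sigma keeps pairs whose MI with the outcome clears theta * (1 + sigma)
G_sigma_edges <- function(I_avg, I_true, sigma) {
  theta <- max(I_avg[upper.tri(I_avg)])
  which(I_true >= theta * (1 + sigma) & upper.tri(I_true), arr.ind = TRUE)
}
set.seed(4)
I_avg  <- matrix(runif(25, 0, 0.1), 5, 5)   # toy averaged pair scores
I_true <- matrix(runif(25, 0, 0.3), 5, 5)   # toy pair-vs-outcome scores
G_sigma_edges(I_avg, I_true, sigma = 0.5)   # row/column indices of selected edges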
Outcome-guided mutual information network in network-based prediction (Jeong et al., 2015) (2/2)
(Figure: the resulting feature network.)
Relevance networks for gastritis (Jeong and Sohn, 2014)
MINA: Mutual Information Network Analysis framework
https://github.com/hhjeong/MINA
Conclusion
Problems with mutual information, and their solutions
• Noise in continuous data
  • Use an alternative discretization technique
• Assessment of significance
  • Permutation test (a sketch follows below)
  • The multiple testing problem should also be considered
• Mutual information is not a metric!
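A minimal sketch of the permutation test mentioned above, on simulated data; mi.empirical comes from the 'entropy' package used earlier, and everything else is an illustrative assumption:

library(entropy)
set.seed(3)
x <- sample(0:2, 200, replace = TRUE)                           # discrete feature
y <- ifelse(x == 2, rbinom(200, 1, 0.8), rbinom(200, 1, 0.5))   # outcome linked to x
obs <- mi.empirical(table(x, y))                                # observed MI
perm <- replicate(1000, mi.empirical(table(x, sample(y))))      # null: shuffle y
mean(perm >= obs)                                               # one-sided empirical p-value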