The Nonparanormal SKEPTIC and Its Application
Outline
• The Nonparanormal SKEPTIC
• inferring biochemical networks
the precision matrix
• inverse of the covariance matrix
• Θ
• if the data is multivariate normal:
node 1 2 3 4 5 6 7
1 0 ~ ~ ~ 0 0 0
2 ~ 0 0 0 0 0 0
3 ~ 0 0 0 ~ 0 0
4 ~ 0 0 0 0 0 0
5 0 0 ~ 0 0 ~ 0
6 0 0 0 0 ~ 0 ~
7 0 0 0 0 0 ~ 0
[Figure: graph on nodes 1 to 7]
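For multivariate normal data, a zero entry Θ_ij means that variables i and j are conditionally independent given all the others, so the nonzero pattern of Θ is the edge list of an undirected graph. A minimal sketch that reads the edges off the 7-node pattern above, assuming `~` marks a nonzero entry (the names `PATTERN` and `edges_from_pattern` are illustrative, not from the talk):

```python
# Nonzero pattern of the precision matrix from the slide; "~" = nonzero (assumed).
PATTERN = """
0 ~ ~ ~ 0 0 0
~ 0 0 0 0 0 0
~ 0 0 0 ~ 0 0
~ 0 0 0 0 0 0
0 0 ~ 0 0 ~ 0
0 0 0 0 ~ 0 ~
0 0 0 0 0 ~ 0
"""


def edges_from_pattern(text):
    """Return undirected edges (i, j) with i < j, 1-based, for nonzero entries."""
    rows = [line.split() for line in text.strip().splitlines()]
    n = len(rows)
    # A precision matrix is symmetric, so its nonzero pattern must be too.
    assert all(rows[i][j] == rows[j][i] for i in range(n) for j in range(n))
    return [
        (i + 1, j + 1)
        for i in range(n)
        for j in range(i + 1, n)
        if rows[i][j] != "0"
    ]


edges = edges_from_pattern(PATTERN)
print(edges)  # [(1, 2), (1, 3), (1, 4), (3, 5), (5, 6), (6, 7)]
```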
2 problems
• dimension >> # observations
• data is not multivariate normal
dimension >> # observations
• log likelihood
max over Θ: log det Θ − tr(SΘ) − (terms involving the mean)
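The objective above can be evaluated directly for small matrices. The sketch below, assuming the mean terms are dropped and S is the maximum-likelihood sample covariance, checks that Θ = S⁻¹ scores higher than the identity, and shows why the maximizer breaks down when the dimension exceeds the number of observations: S becomes singular, so S⁻¹ does not exist. All function names and the toy data are illustrative.

```python
import math


def det(m):
    """Determinant by Laplace expansion; fine for the tiny matrices used here."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def trace_of_product(a, b):
    """tr(AB) = sum over i, j of A[i][j] * B[j][i]."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


def objective(theta, s):
    """log det(Theta) - tr(S Theta): the mean-free part of the log likelihood."""
    return math.log(det(theta)) - trace_of_product(s, theta)


def sample_covariance(rows):
    """Maximum-likelihood covariance (divides by n) of a list of observations."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(p)]
    return [
        [sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n for j in range(p)]
        for i in range(p)
    ]


# Well-posed case: S is invertible, and Theta = S^-1 maximizes the objective.
s = [[2.0, 1.0], [1.0, 2.0]]
d = det(s)
s_inv = [[s[1][1] / d, -s[0][1] / d], [-s[1][0] / d, s[0][0] / d]]
identity = [[1.0, 0.0], [0.0, 1.0]]
best = objective(s_inv, s)
other = objective(identity, s)
print(best > other)  # True

# Ill-posed case: 2 observations in 3 dimensions give a rank-1, singular S.
few_observations = [[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]]
s_singular = sample_covariance(few_observations)
print(det(s_singular))  # 0.0
```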
data is not multivariate normal
• trick 1 (the nonparanormal):
• trick 2 (nonparametric correlation):
Liu, H., Lafferty, J., & Wasserman, L. (2009). The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs. Journal of Machine Learning Research, 10, 2295–2328.
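A minimal sketch of both tricks in plain Python. For trick 1 it uses simple normal scores, Φ⁻¹(rank / (n + 1)), as a stand-in for the Winsorized estimator in the nonparanormal paper. For trick 2 it uses the Kendall's tau transform sin(π/2 · τ) associated with the nonparanormal SKEPTIC. The function names and toy data are illustrative assumptions, and ties are not handled specially.

```python
import math
from itertools import combinations
from statistics import NormalDist


def normal_scores(values):
    """Trick 1 (simplified): map each value to Phi^-1(rank / (n + 1))."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return [NormalDist().inv_cdf(r / (n + 1)) for r in ranks]


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    score = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


def rank_based_correlation(x, y):
    """Trick 2: estimate the latent Gaussian correlation as sin(pi/2 * tau)."""
    return math.sin(math.pi / 2 * kendall_tau(x, y))


x = [0.1, 0.4, 0.35, 0.8, 1.2, 0.9]
y = [1.0, 1.3, 1.6, 2.1, 2.0, 2.6]

# Monotone distortions of the data change neither the ranks nor the estimate.
x_warped = [math.exp(v) for v in x]
y_warped = [v ** 3 for v in y]

r_raw = rank_based_correlation(x, y)
r_warped = rank_based_correlation(x_warped, y_warped)
print(r_raw == r_warped)  # True
print(normal_scores(x) == normal_scores(x_warped))  # True
```

Because both estimates depend only on ranks, they are unchanged by any strictly increasing transformation of each variable, which is what makes them usable when the data are not multivariate normal.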
Pristionchus pacificus
• satellite model organism of C. elegans
• necromenic association with scarab beetles
• global distribution
  – diverse habitats
  – diverse but structured genetic background
Image courtesy of Sommer Lab
Collaboration: Ralf J. Sommer, Director, MPI, Tuebingen, Germany
data set
• ~450 strains
• 2 replicates each
• positive and negative ionization high resolution LC-MS (metabolome)
• restriction site associated DNA marker SNP calls (genome)
RAD-seq
[Figure: RAD-seq workflow. Labels: genomic DNA, restriction enzyme, adapter, sequencing, SNP calling]
Poland, J. A., Brown, P. J., Sorrells, M. E., & Jannink, J.-L. (2012). Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS ONE, 7(2), e32253. doi:10.1371/journal.pone.0032253
SNP data set
          snp_locus_1  snp_locus_2  snp_locus_3  …
sample_1
sample_2
sample_3
…
1% genomic coverage
# alleles  count
1          194
2          2947
3          1
column: http://www.waters.com/webassets/cms/category/media/snapshot/ACQUITY_Column.jpg
mass spectrometer: https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSJGwVjgNgUcS9gVvxiupz6-wrL5jrVypj09BYwFnIfvHGSfFXXdg
liquid chromatography coupled mass spectrometry (LC-MS)
[Figure labels: chromatography column, mass spectrometer, total ion chromatogram]
          peak_1 (m,rt)  peak_2 (m,rt)  peak_3 (m,rt)  …
sample_1
sample_2
sample_3
…
~2,000 features
LC-MS
xcms
[Figure: principal component plots; axes PC 1, PC 2, PC 3, PC 4]
ascaroside centric metabolic network
(466.2, 5.78)
ascaroside centric metabolic network
Start Node  End Node  Shortest Path  Shortest Path To Random Node From Start Node  Shortest Path To Random Node From End Node  Correlation
ascr#9      pasc#12   1              9.18                                          10.74                                       -0.128061447
pasc#9      pasc#12   2              13.28                                         10.88                                       -0.076858659
ascr#9      pasc#9    3              9.6                                           12.52                                       -0.626094706
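The table compares the shortest path between two ascarosides with the shortest path from each of them to random nodes. A sketch of that comparison on a hypothetical toy graph (not the ascaroside network), using breadth-first search for unweighted path lengths. How the random-node baseline was computed is not stated in the slides; here the mean distance to all other reachable nodes stands in for it, which is an assumption.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_distance_to_others(graph, source):
    """Baseline: average hop count from source to all other reachable nodes."""
    dist = shortest_path_lengths(graph, source)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others)


# Hypothetical toy network; node labels are arbitrary integers.
EDGES = [(1, 2), (1, 3), (1, 4), (3, 5), (5, 6), (6, 7)]
graph = {}
for a, b in EDGES:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

start, end = 2, 4
direct = shortest_path_lengths(graph, start)[end]
baseline_start = mean_distance_to_others(graph, start)
baseline_end = mean_distance_to_others(graph, end)

# A pair counts as "close" when its direct path is shorter than both baselines.
print(direct, direct < baseline_start and direct < baseline_end)  # 2 True
```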
advantages of this method
• requires no prior knowledge
• unsupervised
• group-wise inference
• generalizable
• efficient
• functional