Semi-supervised enhancer prediction using the Segway framework
Orion J. Buske, Tzitziki Lemus, Michael M. Hoffman, Jeff A. Bilmes, William Stafford Noble
Department of Genome Sciences, University of Washington
[Figure: unsupervised vs. semi-supervised labeling; legend: novel, known p300 peaks (Heintzman et al. 2009)]
[Figure: precision vs. recall plot, annotated from worse to better]
recall: fraction of p300 sites overlapped by predictions
precision: fraction of predictions that overlap p300 sites
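The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the `Interval` layout, the helper names `overlaps` and `precision_recall`, the rule that a single shared base counts as an overlap, and the toy coordinates are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed layout: (chromosome, start, end) with half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def precision_recall(predictions: List[Interval], sites: List[Interval]) -> Tuple[float, float]:
    """Overlap-based precision and recall.

    precision: fraction of predictions that overlap a known site
    recall:    fraction of known sites overlapped by a prediction
    """
    hit_predictions = sum(any(overlaps(p, s) for s in sites) for p in predictions)
    hit_sites = sum(any(overlaps(s, p) for p in predictions) for s in sites)
    precision = hit_predictions / len(predictions) if predictions else 0.0
    recall = hit_sites / len(sites) if sites else 0.0
    return precision, recall


# Toy data, invented for illustration only.
predictions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80), ("chr2", 300, 400)]
p300_sites = [("chr1", 150, 250), ("chr2", 60, 70), ("chr2", 900, 950)]

precision, recall = precision_recall(predictions, p300_sites)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The nested scan is quadratic, which is fine for a toy example; a genome-scale comparison would normally sort the intervals or use an interval tree.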
[Figure: predicted vs. observed signal by track: CTCF, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K9me1, H3K27ac, H3K27me3, H3K36me3, H4K20me1, DNaseI, Pol2]
Higher: H3K4me1, H3K9me1, H3K36me3, H4K20me1, Input, BDP1, BRF1, GATA1, JunD
Lower: H3K4me3, DNaseI, CTCF, Pol2, TAF1
Example
Semi-supervised label (P-SS): precision 0.27, recall 0.56
Segway hypothesizes more than one type of p300 site
Semi-supervised label: P-SS
Unsupervised labels: P-2, P-3
[Figure: P-SS, P-2, and P-3 compared on a lower-to-higher scale]
P-SS: H3K4me3, TAF1, Pol2, H3K27ac, ZNF267
P-2: H3K4me3, TAF1, Pol3, H3K4me1, cFos
P-3: H3K4me3, TAF1, Pol2, H3K9ac, CTCF
Subtypes correspond to active/repressed chromatin states
[Figure: P-2, P-3, and P-SS; annotation: similar]
With combined labels, we achieve comparable precision with excellent recall
Combined P-SS, P-2, P-3: precision 0.21, recall 0.91
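One plausible way to combine labels is to pool the segments from all three labels into a single prediction set. The sketch below does that and merges segments that overlap or touch. The slides do not say whether or how segments were merged, so the merging rule, the function name `combine_labels`, and the toy coordinates are assumptions.

```python
from typing import Dict, List, Tuple

# Assumed layout: (chromosome, start, end) with half-open coordinates.
Interval = Tuple[str, int, int]


def combine_labels(segments_by_label: Dict[str, List[Interval]]) -> List[Interval]:
    """Pool segments from every label and merge any that overlap or touch."""
    pooled = sorted(seg for segs in segments_by_label.values() for seg in segs)
    merged: List[Interval] = []
    for chrom, start, end in pooled:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous merged segment instead of starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Toy segments, invented for illustration only.
segments_by_label = {
    "P-SS": [("chr1", 100, 200), ("chr1", 900, 1000)],
    "P-2": [("chr1", 200, 300)],
    "P-3": [("chr1", 950, 1100), ("chr2", 10, 20)],
}

combined = combine_labels(segments_by_label)
print(combined)
```

Pooling can only add predictions, so the number of p300 sites covered cannot drop, which is consistent with the higher recall reported for the combined set.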
With multiple predicted sites in close proximity, we improve precision with good recall
At least two segments within 1 kb (P-SS, P-2, P-3): precision 0.31, recall 0.77
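The proximity criterion can be read as a filter: keep a predicted segment only if another predicted segment, from any of the three labels, lies within 1 kb of it. The sketch below implements that reading. The slides do not say how distance was measured, so measuring it as the gap between segment edges, the names `gap` and `clustered`, and the toy coordinates are all assumptions.

```python
from typing import List, Tuple

# Assumed layout: (chromosome, start, end, label) with half-open coordinates.
Segment = Tuple[str, int, int, str]


def gap(a: Segment, b: Segment) -> int:
    """Bases separating two segments on one chromosome (0 if they overlap or touch)."""
    return max(0, max(a[1], b[1]) - min(a[2], b[2]))


def clustered(segments: List[Segment], max_gap: int = 1000) -> List[Segment]:
    """Keep segments that have at least one other segment within max_gap bases."""
    kept = []
    for i, seg in enumerate(segments):
        for j, other in enumerate(segments):
            if i != j and seg[0] == other[0] and gap(seg, other) <= max_gap:
                kept.append(seg)
                break
    return kept


# Toy segments, invented for illustration only.
segments = [
    ("chr1", 1000, 1200, "P-SS"),
    ("chr1", 1800, 2000, "P-2"),     # 600 bases from the first segment
    ("chr1", 10000, 10200, "P-3"),   # isolated on chr1
    ("chr2", 1500, 1700, "P-SS"),    # only segment on chr2
]

kept = clustered(segments)
print(kept)
```

The pairwise scan keeps the logic easy to follow; for whole-genome segmentations, sorting by position and checking only neighbouring segments would do the same job far faster.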
Acknowledgements
ENCODE Project Consortium
NHGRI
Availability
Segway: http://noble.gs.washington.edu/proj/segway
Segtools: http://noble.gs.washington.edu/proj/segtools
False negatives have higher mean signal than true positives
[Figure: signal by track for predicted (TP), observed (TP), and false negative sites; tracks: CTCF, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K9me1, H3K27ac, H3K27me3, H3K36me3, H4K20me1, DNase, Pol2]
Higher: H3K4me3, Pol2, Pol3, TAF1
Lower: GATA1
Example
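A comparison of false negatives against true positives can be sketched as a per-track mean over the sites in each group. This is an illustration only: the per-site dictionary layout, the function name `mean_signal_by_track`, and every signal value are invented for the example and are not taken from the slides.

```python
from statistics import mean
from typing import Dict, List


def mean_signal_by_track(sites: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each track's signal over a group of sites."""
    if not sites:
        return {}
    return {track: mean(site[track] for site in sites) for track in sites[0]}


# Toy per-site signal values, invented for illustration only.
true_positives = [
    {"H3K4me3": 1.0, "GATA1": 2.0},
    {"H3K4me3": 2.0, "GATA1": 4.0},
]
false_negatives = [
    {"H3K4me3": 4.0, "GATA1": 1.0},
    {"H3K4me3": 6.0, "GATA1": 2.0},
]

tp_mean = mean_signal_by_track(true_positives)
fn_mean = mean_signal_by_track(false_negatives)

# Tracks whose mean signal is higher or lower in false negatives than in true positives.
higher = sorted(t for t in tp_mean if fn_mean[t] > tp_mean[t])
lower = sorted(t for t in tp_mean if fn_mean[t] < tp_mean[t])
print("Higher:", higher, "Lower:", lower)
```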