MIPS Retreat, Kloster Frauenchiemsee, Chiemsee, Germany, July 9-10, 2007
Prediction of protein networks through data integration
Lars Juhl Jensen
EMBL Heidelberg
prediction of interactions
STRING
functional interactions
373 genomes
model organism databases
Ensembl
Genome Reviews
RefSeq
genomic context methods
gene neighborhood
gene fusion
phylogenetic profiles
Cell
Cellulosomes
Cellulose
correct interactions
wrong associations
phylogenetic profiles
SVD (Singular Value Decomposition)
Euclidean distance
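A minimal sketch of the phylogenetic-profile idea on this slide, assuming profiles are stored as a presence/absence matrix (rows are genes, columns are genomes), reduced with SVD and then compared by Euclidean distance. The function name `profile_distances`, the parameter `n_components`, and the toy matrix are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def profile_distances(profiles, n_components):
    """Pairwise Euclidean distances between gene profiles after SVD reduction."""
    matrix = np.asarray(profiles, dtype=float)
    # Thin SVD: the rows of u * s are the gene coordinates in singular-vector space.
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    reduced = u[:, :n_components] * s[:n_components]
    # Broadcast to get all pairwise differences, then take the vector norm.
    diff = reduced[:, None, :] - reduced[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical toy data: 1 means the gene is present in that genome.
toy_profiles = [
    [1, 1, 0, 0, 1, 0],  # gene A
    [1, 1, 0, 0, 1, 0],  # gene B, same pattern as gene A
    [0, 0, 1, 1, 0, 1],  # gene C, complementary pattern
]
distances = profile_distances(toy_profiles, n_components=2)
```

Genes with similar presence/absence patterns end up close together (small distance), which is the raw quality score this method would produce.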
gene neighborhood
sum of intergenic distances
raw quality scores
rank by reliability
not comparable
Euclidean distance
sum of intergenic distances
benchmarking
calibrate vs. gold standard
raw quality scores
probabilistic scores
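One way to picture the calibration step is the sketch below: raw scores are binned, and each bin is assigned the fraction of its pairs that the gold standard confirms. The binning approach, the function name `calibrate`, and the toy data are assumptions for illustration; the slides do not say how the calibration curve is actually obtained. Pairs absent from the gold standard are treated as wrong here, which is a simplification.

```python
def calibrate(scored_pairs, gold_positive, bin_edges):
    """Map each raw-score bin to the fraction of its pairs found in the gold standard."""
    counts = [[0, 0] for _ in range(len(bin_edges) - 1)]  # [confirmed, total] per bin
    for pair, raw in scored_pairs:
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= raw < bin_edges[i + 1]:
                counts[i][1] += 1
                if pair in gold_positive:
                    counts[i][0] += 1
                break
    # Empty bins have no estimate, so they are reported as None.
    return [hits / total if total else None for hits, total in counts]

# Hypothetical raw scores from one evidence type.
scored_pairs = [
    (("a", "b"), 0.9), (("a", "c"), 0.8), (("b", "d"), 0.7),
    (("a", "d"), 0.3), (("b", "c"), 0.2), (("c", "d"), 0.1),
]
# Hypothetical gold standard of known associations.
gold_positive = {("a", "b"), ("b", "d"), ("c", "d")}
calibrated = calibrate(scored_pairs, gold_positive, bin_edges=[0.0, 0.5, 1.01])
```

Because every evidence type is mapped onto the same scale (probability of being correct), scores that were not comparable before calibration become comparable afterwards.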
curated knowledge
many sources
KEGG (Kyoto Encyclopedia of Genes and Genomes)
Reactome
PID (NCI-Nature Pathway Interaction Database)
STKE (Signal Transduction Knowledge Environment)
MIPS (Munich Information Center for Protein Sequences)
Gene Ontology
different gene identifiers
synonyms list
literature mining
MEDLINE
SGD (Saccharomyces Genome Database)
The Interactive Fly
OMIM (Online Mendelian Inheritance in Man)
co-mentioning
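Co-mentioning can be sketched as counting how often two genes are named in the same abstract, after mapping names to a common identifier through a synonyms list. The function `co_mention_counts`, the synonym table, and the example abstracts are hypothetical illustrations, not the pipeline used in the talk.

```python
from collections import Counter
from itertools import combinations
import re

def co_mention_counts(abstracts, synonyms):
    """Count, per gene pair, the number of abstracts that mention both genes."""
    counts = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[A-Za-z0-9]+", text.upper()))
        # Resolve every recognized name to its canonical gene identifier.
        genes = sorted({gene for name, gene in synonyms.items() if name.upper() in tokens})
        counts.update(combinations(genes, 2))
    return counts

# Illustrative synonym table: name as written in text -> canonical identifier.
synonyms = {"CYC1": "CYC1", "CYC7": "CYC7", "HAP1": "HAP1", "CYP1": "HAP1", "GAL4": "GAL4"}
# Made-up abstracts for demonstration only.
abstracts = [
    "The expression of CYC1 and CYC7 is controlled by HAP1.",
    "CYP1 regulates CYC1.",
    "The GAL4 gene is mentioned alone here.",
]
counts = co_mention_counts(abstracts, synonyms)
```

The resulting counts are raw scores and, like the other evidence types, would still need calibration against a gold standard.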
NLP (Natural Language Processing)
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxgene The GAL4 gene]
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
calibrate vs. gold standard
primary experimental data
gene expression
GEO (Gene Expression Omnibus)
expression compendia
protein interactions
BIND (Biomolecular Interaction Network Database)
BioGRID (General Repository for Interaction Datasets)
DIP (Database of Interacting Proteins)
IntAct
MINT (Molecular Interactions Database)
HPRD (Human Protein Reference Database)
many sources
different gene identifiers
redundancy
not comparable
merge data by publication
raw quality scores
calibrate vs. gold standard
combine all evidence
spread over many species
transfer by orthology
naïve Bayesian scoring
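A simplified sketch of naïve Bayesian combination, assuming each evidence type has already been calibrated to a probability and that the evidence types are independent: the combined score is the probability that at least one line of evidence is correct. The function name `combine_evidence` is hypothetical, and any correction for a prior or for orthology transfer is deliberately left out.

```python
def combine_evidence(scores):
    """Combine independent probabilistic scores as 1 - product(1 - score)."""
    remaining = 1.0  # probability that every line of evidence is wrong
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be probabilities between 0 and 1")
        remaining *= 1.0 - score
    return 1.0 - remaining

# Two independent evidence types, each 50% reliable, give 75% combined.
combined = combine_evidence([0.5, 0.5])
```

The combined score can never be lower than the strongest single score, so adding weak evidence does not penalize a well-supported interaction.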
prediction of interactions
NetworKIN
the idea
phosphoproteomics
mass spectrometry
phosphorylation sites
Phospho.ELM
in vivo
kinases are unknown
computational methods
NetPhosK
Scansite
sequence motifs
kinase families
overprediction
no context
what a kinase could do
not what it actually does
context
co-activators
scaffolders
protein networks
the algorithm
NetworKIN
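The slides do not spell out the algorithm, so the following is only one possible reading of the preceding points: sequence motifs narrow a phosphorylation site down to a kinase family, and the protein network supplies the context that selects a specific kinase within that family. The function `pick_kinase`, the association table, and all protein names are hypothetical.

```python
def pick_kinase(substrate, family_members, association):
    """Among kinases of the motif-matched family, pick the one best linked to the substrate."""
    best, best_score = None, 0.0
    for kinase in family_members:
        # Association scores are symmetric, so pairs are keyed by an unordered set.
        score = association.get(frozenset((kinase, substrate)), 0.0)
        if score > best_score:
            best, best_score = kinase, score
    return best, best_score

# Hypothetical network scores between kinases and a substrate.
association = {
    frozenset(("kinaseA", "substrateX")): 0.2,
    frozenset(("kinaseB", "substrateX")): 0.9,
}
prediction = pick_kinase("substrateX", ["kinaseA", "kinaseB"], association)
```

In this toy case both kinases match the motif equally well, and only the network context separates them.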
benchmarking
Phospho.ELM
2.5-fold better accuracy
context is crucial
global statistics
visualization
ATM signaling
experimental validation
summary
reanalysis
benchmarking
integration
complementary data types
computational methods
reproduce what is known
biological discoveries
testable hypotheses
Acknowledgments
The STRING database
– Christian von Mering
– Michael Kuhn
– Berend Snel
– Martijn Huynen
– Sean Hooper
– Samuel Chaffron
– Julien Lagarde
– Mathilde Foglierini
– Peer Bork
Literature mining
– Jasmin Saric
– Rossitza Ouzounova
– Isabel Rojas
The NetworKIN method
– Rune Linding
– Gerard Ostheimer
– Francesca Diella
– Karen Colwill
– Jing Jin
– Pavel Metalnikov
– Vivian Nguyen
– Adrian Pasculescu
– Jin Gyoon Park
– Leona D. Samson
– Rob Russell
– Peer Bork
– Michael Yaffe
– Tony Pawson