Text and data mining
Lars Juhl Jensen
Part 1: text mining
exponential growth
some things are constant
~45 seconds per paper
computer
as smart as a dog
teach it specific tricks
named entity identification
Reflect
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
comprehensive lexicon
orthographic variation
“black list”
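The three ingredients above (lexicon, orthographic variation, black list) can be sketched as a small dictionary-based tagger. This is only an illustration under stated assumptions, not the Reflect implementation: the identifiers, the toy lexicon, the normalization rule (ignore case, spaces and hyphens) and all function names are hypothetical.

```python
import re

def normalize(name):
    """Collapse case and hyphen/space differences so spelling variants match."""
    return re.sub(r"[\s\-]+", "", name.lower())

def build_index(lexicon, blacklist):
    """Map normalized names to identifiers, skipping blacklisted names."""
    blocked = {normalize(n) for n in blacklist}
    index = {}
    for identifier, names in lexicon.items():
        for name in names:
            key = normalize(name)
            if key not in blocked:
                index[key] = identifier
    return index

def tag(text, index, max_words=3):
    """Return (matched text, identifier) pairs, preferring longer matches."""
    words = re.findall(r"[A-Za-z0-9\-]+", text)
    hits = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            key = normalize(candidate)
            if key in index:
                hits.append((candidate, index[key]))
                i += n
                break
        else:
            i += 1  # no name starts at this word
    return hits

# Hypothetical toy lexicon and black list (illustrative only).
LEXICON = {
    "GENE:1": ["CDC2", "cdc 2"],
    "GENE:2": ["IL-2", "interleukin 2"],
    "GENE:3": ["SDS"],
}
BLACKLIST = ["SDS"]  # a gene name that usually means the detergent in text
INDEX = build_index(LEXICON, BLACKLIST)
```

Calling `tag("Cdc-2 binds IL2 in SDS buffer", INDEX)` matches the two spelling variants and leaves the blacklisted "SDS" untagged.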
information extraction
no access
collaboration
Part 2: protein networks
guilt by association
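Guilt by association can be illustrated with a simple neighbor vote: an uncharacterized protein is assigned the functions most common among its network neighbors. The sketch below is an assumption-laden toy, not how STRING itself is used; the protein names, annotations and the function `predict_function` are hypothetical.

```python
from collections import Counter

def predict_function(protein, network, annotations):
    """Rank candidate functions by how many network neighbors carry them."""
    votes = Counter()
    for neighbor in network.get(protein, ()):
        for function in annotations.get(neighbor, ()):
            votes[function] += 1
    return votes.most_common()

# Hypothetical toy network: P1 is unannotated, its neighbors are annotated.
NETWORK = {"P1": ["P2", "P3", "P4"]}
ANNOTATIONS = {
    "P2": ["cell cycle"],
    "P3": ["cell cycle", "DNA repair"],
    "P4": ["cell cycle"],
}
```

Here `predict_function("P1", NETWORK, ANNOTATIONS)` ranks "cell cycle" first because all three neighbors share it.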
STRING
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
experimental data
physical interactions
Jensen & Bork, Science, 2008
gene coexpression
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
curated knowledge
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
text mining
many data types
many databases
different formats
different identifiers
variable quality
quality scores
calibrate vs. gold standard
von Mering et al., Nucleic Acids Research, 2005
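Calibrating raw scores against a gold standard can be sketched as follows: rank the scored protein pairs, split them into bins, and record the fraction of each bin that the gold standard confirms. This is a minimal sketch under simplifying assumptions (fixed-size bins, every pair not in the gold standard counted as wrong); it is not the published STRING procedure, and the function names and data are hypothetical.

```python
def calibration_table(scored_pairs, gold_positives, bin_size):
    """Return (lowest raw score in bin, fraction confirmed) per bin, best bin first."""
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    table = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        confirmed = sum(1 for pair, _ in chunk if pair in gold_positives)
        table.append((chunk[-1][1], confirmed / len(chunk)))
    return table

def calibrated_score(raw, table):
    """Look up the confirmed fraction of the best bin that the raw score reaches."""
    for lowest_raw, fraction in table:
        if raw >= lowest_raw:
            return fraction
    return table[-1][1]  # below every bin: fall back to the weakest bin

# Hypothetical toy data: unordered protein pairs with raw scores.
PAIRS = [
    (frozenset({"A", "B"}), 0.9),
    (frozenset({"A", "C"}), 0.8),
    (frozenset({"B", "D"}), 0.4),
    (frozenset({"C", "D"}), 0.2),
]
GOLD = {frozenset({"A", "B"}), frozenset({"A", "C"}), frozenset({"C", "D"})}
TABLE = calibration_table(PAIRS, GOLD, bin_size=2)
```

Because every evidence type is mapped onto the same scale (fraction confirmed by the gold standard), scores from different sources become comparable.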
orthology transfer
Frishman et al., Modern Genome Annotation, 2009
Part 3: drug networks
new uses for old drugs
shared target(s)
chemical similarity
Campillos & Kuhn et al., Science, 2008
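Chemical similarity between two drugs is commonly measured with the Tanimoto coefficient on structural fingerprints. The slides do not say which fingerprints were used, so the sketch below assumes each drug is already represented as a set of hypothetical feature identifiers.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient: shared features divided by all features present."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)

# Hypothetical fingerprints given as sets of feature identifiers.
DRUG_X = {1, 2, 3}
DRUG_Y = {2, 3, 4}
```

With these toy fingerprints, `tanimoto(DRUG_X, DRUG_Y)` is 2 shared features out of 4 in total.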
similar drugs share targets
Campillos & Kuhn et al., Science, 2008
only trivial predictions
phenotypic similarity
chemical perturbations
phenotypic readouts
drug treatment
side effects
no database
package inserts
Campillos & Kuhn et al., Science, 2008
text mining
manual validation
side-effect correlations
Campillos & Kuhn et al., Science, 2008
side-effect frequencies
Campillos & Kuhn et al., Science, 2008
side-effect similarity
chemical similarity
Campillos & Kuhn et al., Science, 2008
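One way to see why side-effect frequencies matter: a rare side effect shared by two drugs is stronger evidence than a common one such as nausea. The sketch below weights each side effect by its rarity and computes a weighted overlap. It is a simplified stand-in, not the measure from Campillos & Kuhn et al. (it ignores side-effect correlations entirely); the drug names, data and function names are hypothetical.

```python
import math

def rarity_weights(drug_side_effects):
    """Weight each side effect by -log of the fraction of drugs that list it."""
    n_drugs = len(drug_side_effects)
    counts = {}
    for effects in drug_side_effects.values():
        for effect in set(effects):
            counts[effect] = counts.get(effect, 0) + 1
    return {effect: -math.log(c / n_drugs) for effect, c in counts.items()}

def side_effect_similarity(effects_a, effects_b, weights):
    """Weighted overlap: weight of shared side effects over weight of all listed."""
    a, b = set(effects_a), set(effects_b)
    shared = sum(weights[e] for e in a & b)
    total = sum(weights[e] for e in a | b)
    return shared / total if total > 0 else 0.0

# Hypothetical toy data: side effects listed per drug.
DRUGS = {
    "X": {"nausea", "rash"},
    "Y": {"nausea", "rash"},
    "Z": {"nausea", "dizziness"},
    "W": {"nausea"},
}
WEIGHTS = rarity_weights(DRUGS)
```

In this toy set every drug lists nausea, so it gets weight zero: drugs X and Z share only nausea and score 0, while X and Y also share the rarer rash and score 1.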
categorization
Campillos & Kuhn et al., Science, 2008
20 drug–drug pairs
in vitro binding assays
Ki < 10 µM for 11 of 20
cell assays
9 of 9 showed activity
Acknowledgments
reflect.ws
Sune Frankild
Heiko Horn
Evangelos Pafilis
Michael Kuhn
Reinhardt Schneider
Sean O’Donoghue
sideeffects.embl.de
Monica Campillos
Michael Kuhn
Anne-Claude Gavin
Peer Bork
string-db.org
Damian Szklarczyk
Andrea Franceschini
Michael Kuhn
Milan Simonovic
Alexander Roth
Pablo Minguez
Tobias Doerks
Manuel Stark
Jean Muller
Peer Bork
Christian von Mering