Text and data mining Lars Juhl Jensen

Page 1: Text and data mining

Text and data mining

Lars Juhl Jensen

Page 2: Text and data mining

Part 1: text mining

Page 3: Text and data mining

exponential growth

Page 4: Text and data mining
Page 5: Text and data mining
Page 6: Text and data mining

some things are constant

Page 7: Text and data mining
Page 8: Text and data mining

~45 seconds per paper

Page 9: Text and data mining

computer

Page 10: Text and data mining

as smart as a dog

Page 11: Text and data mining

teach it specific tricks

Page 12: Text and data mining
Page 13: Text and data mining
Page 14: Text and data mining

named entity identification

Page 15: Text and data mining

Reflect

Page 16: Text and data mining

Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009

Page 17: Text and data mining

comprehensive lexicon

Page 18: Text and data mining

orthographic variation

Page 19: Text and data mining

“black list”
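
The three ingredients named on the preceding slides (a comprehensive lexicon, tolerance for orthographic variation, and a black list) can be illustrated with a minimal dictionary-based tagger. This is only a sketch under stated assumptions: the lexicon entries, identifiers, and black-listed name are made-up toy values, and the normalization rule (ignore case, hyphens, and spaces) is one plausible choice, not necessarily what the Reflect tagger does.

```python
import re

# Hypothetical toy lexicon: normalized name -> identifier (not real database content).
LEXICON = {
    "cdc2": "PROTEIN:1",
    "cyclinb": "PROTEIN:2",
    "sds": "PROTEIN:3",
}

# Hypothetical black list: names that are in the lexicon but too ambiguous to tag.
BLACK_LIST = {"sds"}


def normalize(name):
    """Collapse orthographic variation: ignore case, hyphens, and whitespace."""
    return re.sub(r"[\s-]+", "", name.lower())


def tag_entities(text, max_words=3):
    """Greedy longest-match lookup of word n-grams against the lexicon."""
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            key = normalize(candidate)
            if key in LEXICON and key not in BLACK_LIST:
                hits.append((candidate, LEXICON[key]))
                i += n
                break
        else:
            # No match starts at this token; move on.
            i += 1
    return hits


if __name__ == "__main__":
    print(tag_entities("Cdc-2 binds Cyclin B in SDS buffer"))
```

In this toy setup "Cdc-2" and "Cyclin B" are matched despite the hyphen and the space, while "SDS" is suppressed by the black list even though it is present in the lexicon.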

Page 20: Text and data mining

information extraction

Page 21: Text and data mining

no access

Page 22: Text and data mining
Page 23: Text and data mining

collaboration

Page 24: Text and data mining
Page 25: Text and data mining
Page 26: Text and data mining

Part 2: protein networks

Page 27: Text and data mining

guilt by association
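
Guilt by association can be sketched as a neighbor vote: an uncharacterized protein is assigned the annotation most common among the proteins it is linked to. The network, protein names, and annotations below are hypothetical toy data, and the simple majority vote is an assumption chosen for illustration rather than the method of any particular resource.

```python
from collections import Counter


def predict_function(protein, edges, annotations):
    """Return the most common annotation among the protein's annotated neighbors."""
    neighbors = set()
    for a, b in edges:
        if a == protein:
            neighbors.add(b)
        elif b == protein:
            neighbors.add(a)
    # Sort for a deterministic vote order; unannotated neighbors do not vote.
    votes = Counter(annotations[n] for n in sorted(neighbors) if n in annotations)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy network: protein X has no annotation of its own.
EDGES = [("X", "A"), ("X", "B"), ("X", "C"), ("C", "D")]
ANNOTATIONS = {"A": "cell cycle", "B": "cell cycle", "C": "transport", "D": "transport"}

if __name__ == "__main__":
    print(predict_function("X", EDGES, ANNOTATIONS))
```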

Page 28: Text and data mining
Page 29: Text and data mining

STRING

Page 30: Text and data mining

Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011

Page 31: Text and data mining

genomic context

Page 32: Text and data mining

gene fusion

Page 33: Text and data mining

Korbel et al., Nature Biotechnology, 2004

Page 34: Text and data mining

experimental data

Page 35: Text and data mining

physical interactions

Page 36: Text and data mining

Jensen & Bork, Science, 2008

Page 37: Text and data mining

gene coexpression

Page 38: Text and data mining

genetic interactions

Page 39: Text and data mining

Beyer et al., Nature Reviews Genetics, 2007

Page 40: Text and data mining
Page 41: Text and data mining

curated knowledge

Page 42: Text and data mining

pathways

Page 43: Text and data mining

Letunic & Bork, Trends in Biochemical Sciences, 2008

Page 44: Text and data mining

text mining

Page 45: Text and data mining
Page 46: Text and data mining

many data types

Page 47: Text and data mining

many databases

Page 48: Text and data mining

different formats

Page 49: Text and data mining

different identifiers

Page 50: Text and data mining

variable quality

Page 51: Text and data mining

quality scores

Page 52: Text and data mining

calibrate vs. gold standard
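
One way to read "calibrate vs. gold standard" is: bin the raw scores from an evidence source, then measure what fraction of the pairs in each bin are confirmed by a trusted reference set. The sketch below does exactly that on made-up data. The pair names, scores, bin edges, and the simplification that every pair absent from the gold standard counts as a negative are all assumptions for illustration.

```python
def calibrate(scored_pairs, gold_positive, bin_edges):
    """Per raw-score bin, return the fraction of pairs found in the gold standard.

    Bins are half-open [low, high); a bin with no pairs yields None.
    """
    n_bins = len(bin_edges) - 1
    counts = [[0, 0] for _ in range(n_bins)]  # [hits, total] per bin
    for pair, score in scored_pairs:
        for k in range(n_bins):
            if bin_edges[k] <= score < bin_edges[k + 1]:
                counts[k][1] += 1
                if frozenset(pair) in gold_positive:
                    counts[k][0] += 1
                break
    return [hits / total if total else None for hits, total in counts]


# Hypothetical raw scores for protein pairs from a single evidence source.
SCORED_PAIRS = [
    (("a", "b"), 0.9), (("a", "c"), 0.8), (("b", "c"), 0.7),
    (("a", "d"), 0.3), (("b", "d"), 0.2), (("c", "d"), 0.1),
]
# Hypothetical gold standard of trusted associations (unordered pairs).
GOLD = {frozenset(("a", "b")), frozenset(("b", "c")), frozenset(("c", "d"))}

if __name__ == "__main__":
    print(calibrate(SCORED_PAIRS, GOLD, [0.0, 0.5, 1.0]))
```

The per-bin fractions can then serve as a common, comparable confidence scale across evidence sources whose raw scores are otherwise not comparable.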

Page 53: Text and data mining

von Mering et al., Nucleic Acids Research, 2005

Page 54: Text and data mining

orthology transfer

Page 55: Text and data mining

Frishman et al., Modern Genome Annotation, 2009

Page 56: Text and data mining

Part 3: drug networks

Page 57: Text and data mining

new uses for old drugs

Page 58: Text and data mining

shared target(s)

Page 59: Text and data mining

chemical similarity
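
A common way to quantify chemical similarity is the Tanimoto coefficient between structural fingerprints: the number of shared features divided by the number of features present in either molecule. The sketch below assumes fingerprints are already available as sets of feature indices; the example fingerprints are made up, and no claim is made about which fingerprint type the cited study used.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto coefficient of two fingerprints given as sets of feature indices."""
    union = fingerprint_a | fingerprint_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fingerprint_a & fingerprint_b) / len(union)


# Hypothetical fingerprints (feature indices are arbitrary toy values).
DRUG_A = {1, 2, 3, 4}
DRUG_B = {3, 4, 5, 6}

if __name__ == "__main__":
    print(tanimoto(DRUG_A, DRUG_B))
```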

Page 60: Text and data mining

Campillos & Kuhn et al., Science, 2008

Page 61: Text and data mining

similar drugs share targets

Page 62: Text and data mining

Campillos & Kuhn et al., Science, 2008

Page 63: Text and data mining

only trivial predictions

Page 64: Text and data mining

phenotypic similarity

Page 65: Text and data mining

chemical perturbations

Page 66: Text and data mining

phenotypic readouts

Page 67: Text and data mining

drug treatment

Page 68: Text and data mining

side effects

Page 69: Text and data mining

no database

Page 70: Text and data mining

package inserts

Page 71: Text and data mining

Campillos & Kuhn et al., Science, 2008

Page 72: Text and data mining

text mining

Page 73: Text and data mining

manual validation

Page 74: Text and data mining

side-effect correlations

Page 75: Text and data mining

Campillos & Kuhn et al., Science, 2008

Page 76: Text and data mining

side-effect frequencies

Page 77: Text and data mining

Campillos & Kuhn et al., Science, 2008

Page 78: Text and data mining

side-effect similarity
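
A side-effect similarity score can be sketched as a weighted overlap between the side-effect sets of two drugs, with rare side effects counting more than common ones. The version below is a deliberately simplified assumption: it weights each side effect by log(number of drugs / number of drugs showing it) and ignores correlations between side effects, so it is not the scoring scheme of the cited study. Drug names and side effects are toy data.

```python
import math


def rarity_weights(drug_side_effects):
    """Weight each side effect by log(n_drugs / n_drugs_with_effect)."""
    n_drugs = len(drug_side_effects)
    counts = {}
    for effects in drug_side_effects.values():
        for effect in effects:
            counts[effect] = counts.get(effect, 0) + 1
    return {effect: math.log(n_drugs / c) for effect, c in counts.items()}


def side_effect_similarity(effects_a, effects_b, weights):
    """Weighted Jaccard index: shared weight divided by total weight."""
    shared = sum(weights[e] for e in effects_a & effects_b)
    total = sum(weights[e] for e in effects_a | effects_b)
    return shared / total if total else 0.0


# Hypothetical drugs and side effects.
DRUGS = {
    "drug1": {"nausea", "rash"},
    "drug2": {"nausea", "rash"},
    "drug3": {"nausea", "headache"},
    "drug4": {"nausea"},
}

if __name__ == "__main__":
    w = rarity_weights(DRUGS)
    print(side_effect_similarity(DRUGS["drug1"], DRUGS["drug2"], w))
    print(side_effect_similarity(DRUGS["drug1"], DRUGS["drug3"], w))
```

In this toy data "nausea" occurs for every drug and therefore gets zero weight, so two drugs that share only nausea are not considered similar.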

Page 79: Text and data mining

chemical similarity

Page 80: Text and data mining

Campillos & Kuhn et al., Science, 2008

Page 81: Text and data mining

categorization

Page 82: Text and data mining

Campillos & Kuhn et al., Science, 2008

Page 83: Text and data mining

20 drug–drug pairs

Page 84: Text and data mining

in vitro binding assays

Page 85: Text and data mining

Ki<10 µM for 11 of 20

Page 86: Text and data mining

cell assays

Page 87: Text and data mining

9 of 9 showed activity

Page 88: Text and data mining

Acknowledgments

reflect.ws

Sune Frankild

Heiko Horn

Evangelos Pafilis

Michael Kuhn

Reinhardt Schneider

Sean O’Donoghue

sideeffects.embl.de

Monica Campillos

Michael Kuhn

Anne-Claude Gavin

Peer Bork

string-db.org

Damian Szklarczyk

Andrea Franceschini

Michael Kuhn

Milan Simonovic

Alexander Roth

Pablo Minguez

Tobias Doerks

Manuel Stark

Jean Muller

Peer Bork

Christian von Mering

Page 89: Text and data mining

larsjuhljensen

Page 90: Text and data mining