Data and Text Mining Lars Juhl Jensen

Page 1: Data and Text Mining

Data and Text Mining

Lars Juhl Jensen

Page 2: Data and Text Mining
Page 3: Data and Text Mining
Page 4: Data and Text Mining

sequence analysis

Page 5: Data and Text Mining
Page 6: Data and Text Mining
Page 7: Data and Text Mining

protein networks

Page 8: Data and Text Mining

de Lichtenberg, Jensen et al., Science, 2005

Page 9: Data and Text Mining

adverse drug reactions

Page 10: Data and Text Mining

Campillos, Kuhn et al., Science, 2008

Page 11: Data and Text Mining
Page 12: Data and Text Mining

group leader

Page 13: Data and Text Mining
Page 14: Data and Text Mining

cofounder

Page 15: Data and Text Mining
Page 16: Data and Text Mining

data mining

Page 17: Data and Text Mining

proteomics

Page 18: Data and Text Mining

text mining

Page 19: Data and Text Mining

biomedical literature

Page 20: Data and Text Mining

electronic health records

Page 21: Data and Text Mining

protein networks

Page 22: Data and Text Mining

guilt by association

Page 23: Data and Text Mining
Page 24: Data and Text Mining

STRING

Page 25: Data and Text Mining

Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011

Page 26: Data and Text Mining

computational predictions

Page 27: Data and Text Mining

gene fusion

Page 28: Data and Text Mining

Korbel et al., Nature Biotechnology, 2004

Page 29: Data and Text Mining

gene neighborhood

Page 30: Data and Text Mining

operons

Page 31: Data and Text Mining

Korbel et al., Nature Biotechnology, 2004

Page 32: Data and Text Mining

bidirectional promoters

Page 33: Data and Text Mining

Korbel et al., Nature Biotechnology, 2004

Page 34: Data and Text Mining

phylogenetic profiles

Page 35: Data and Text Mining

Korbel et al., Nature Biotechnology, 2004
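
The phylogenetic-profile idea can be illustrated with a small sketch: genes whose orthologs are present and absent in the same set of genomes are candidates for a functional association. The toy profiles, the gene and genome names, and the choice of Jaccard similarity are all assumptions made for illustration; the slides do not state which similarity measure is actually used.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles.

    Each profile is the set of genomes in which the gene has an ortholog.
    """
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


# Hypothetical toy data: gene -> genomes in which an ortholog is present.
profiles = {
    "geneA": {"g1", "g2", "g4"},
    "geneB": {"g1", "g2", "g4"},
    "geneC": {"g3", "g5"},
}


def rank_partners(query, profiles):
    """Rank all other genes by profile similarity to the query gene."""
    scores = [
        (other, jaccard(profiles[query], profile))
        for other, profile in profiles.items()
        if other != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


ranking = rank_partners("geneA", profiles)
```

Here geneB shares its profile with geneA and ranks first, while geneC never co-occurs with geneA and scores zero.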

Page 36: Data and Text Mining

a real example

Page 37: Data and Text Mining
Page 38: Data and Text Mining
Page 39: Data and Text Mining
Page 40: Data and Text Mining

Cell

Cellulosomes

Cellulose

Page 41: Data and Text Mining

experimental data

Page 42: Data and Text Mining

gene coexpression

Page 43: Data and Text Mining
Page 44: Data and Text Mining

protein interactions

Page 45: Data and Text Mining

Jensen & Bork, Science, 2008

Page 46: Data and Text Mining

genetic interactions

Page 47: Data and Text Mining

Beyer et al., Nature Reviews Genetics, 2007

Page 48: Data and Text Mining

curated knowledge

Page 49: Data and Text Mining

complexes

Page 50: Data and Text Mining

pathways

Page 51: Data and Text Mining

Letunic & Bork, Trends in Biochemical Sciences, 2008

Page 52: Data and Text Mining

many databases

Page 53: Data and Text Mining

different formats

Page 54: Data and Text Mining

different identifiers

Page 55: Data and Text Mining

variable quality

Page 56: Data and Text Mining

not comparable

Page 57: Data and Text Mining

not same species

Page 58: Data and Text Mining

hard work

Page 59: Data and Text Mining

quality scores

Page 60: Data and Text Mining

von Mering et al., Nucleic Acids Research, 2005

Page 61: Data and Text Mining

calibrate vs. gold standard

Page 62: Data and Text Mining

von Mering et al., Nucleic Acids Research, 2005
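
One way to read "calibrate vs. gold standard" is sketched below: raw scores from an evidence channel are binned, and each bin is assigned the fraction of its pairs that also appear in a trusted reference set. The function name, the bin edges, the toy pairs, and the representation of the gold standard as a set of known associated pairs are assumptions for illustration, not the published procedure.

```python
def calibrate(scored_pairs, gold_positive, bin_edges):
    """Return, per score bin, the fraction of pairs found in the gold standard.

    scored_pairs: list of (pair, raw_score); gold_positive: set of pairs.
    Bins are half-open [lo, hi), except the last, which includes its upper edge.
    Empty bins yield None.
    """
    n_bins = len(bin_edges) - 1
    counts = [[0, 0] for _ in range(n_bins)]  # [positives, total] per bin
    for pair, raw in scored_pairs:
        for i in range(n_bins):
            in_bin = bin_edges[i] <= raw < bin_edges[i + 1]
            at_top = i == n_bins - 1 and raw == bin_edges[-1]
            if in_bin or at_top:
                counts[i][1] += 1
                if pair in gold_positive:
                    counts[i][0] += 1
                break
    return [pos / total if total else None for pos, total in counts]


# Hypothetical toy data: unordered protein pairs with raw scores.
scored = [
    (frozenset({"A", "B"}), 0.9),
    (frozenset({"A", "C"}), 0.8),
    (frozenset({"A", "D"}), 0.3),
    (frozenset({"B", "D"}), 0.2),
    (frozenset({"C", "D"}), 0.1),
]
gold = {frozenset({"A", "B"}), frozenset({"A", "C"}), frozenset({"C", "D"})}

calibrated = calibrate(scored, gold, bin_edges=[0.0, 0.5, 1.0])
```

Because every evidence type is mapped onto the same scale (the fraction of pairs supported by the reference set), scores that were "not comparable" in raw form can be compared after calibration.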

Page 63: Data and Text Mining

homology-based transfer

Page 64: Data and Text Mining

Franceschini et al., Nucleic Acids Research, 2013

Page 65: Data and Text Mining

missing most of the data

Page 66: Data and Text Mining

text mining

Page 67: Data and Text Mining

>10 km

Page 68: Data and Text Mining

too much to read

Page 69: Data and Text Mining

computer

Page 70: Data and Text Mining

as smart as a dog

Page 71: Data and Text Mining

teach it specific tricks

Page 72: Data and Text Mining
Page 73: Data and Text Mining
Page 74: Data and Text Mining

named entity recognition

Page 75: Data and Text Mining

comprehensive lexicon

Page 76: Data and Text Mining

CDC2

Page 77: Data and Text Mining

cyclin dependent kinase 1

Page 78: Data and Text Mining

expansion rules

Page 79: Data and Text Mining

hCdc2

Page 80: Data and Text Mining

CDC2

Page 81: Data and Text Mining

flexible matching

Page 82: Data and Text Mining

cyclin-dependent kinase 1

Page 83: Data and Text Mining

cyclin dependent kinase 1

Page 84: Data and Text Mining

“black list”

Page 85: Data and Text Mining

SDS
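
The named entity recognition steps above (lexicon, expansion rules, flexible matching, block list) can be sketched as a small dictionary-based tagger. Everything concrete here is an assumption for illustration: the entity identifiers are hypothetical, the only expansion rule is an "h" prefix on single-word symbols (suggested by the hCdc2 example), and flexible matching is reduced to ignoring case, spaces, and hyphens.

```python
import re

MAX_TOKENS = 5  # longest name, in tokens, that the tagger will try to match


def normalize(name):
    """Lowercase and drop spaces and hyphens so spelling variants collapse."""
    return re.sub(r"[\s\-]+", "", name.lower())


def build_index(lexicon, blocklist):
    """Map normalized name variants to entity identifiers, skipping blocked names."""
    blocked = {normalize(name) for name in blocklist}
    index = {}
    for name, entity in lexicon.items():
        variants = [name]
        if " " not in name:
            variants.append("h" + name)  # assumed expansion rule: CDC2 -> hCDC2
        for variant in variants:
            key = normalize(variant)
            if key not in blocked:
                index[key] = entity
    return index


def tag(text, index):
    """Return (matched text, entity) pairs, preferring the longest match."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"[A-Za-z0-9]+", text)]
    hits = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_TOKENS, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            key = normalize(text[start:end])
            if key in index:
                hits.append((text[start:end], index[key]))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return hits


# Hypothetical lexicon and block list; identifiers are made up for the example.
lexicon = {
    "CDC2": "protein:CDC2",
    "cyclin dependent kinase 1": "protein:CDC2",
    "SDS": "protein:SDS",
}
index = build_index(lexicon, blocklist={"SDS"})

hits = tag("hCdc2 (cyclin-dependent kinase 1) was run on an SDS gel.", index)
```

In this example both "hCdc2" and "cyclin-dependent kinase 1" resolve to the same entity, while "SDS" is suppressed by the block list because in most texts it is not a gene or protein mention.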

Page 86: Data and Text Mining

augmented browsing

Page 87: Data and Text Mining

Reflect

Page 88: Data and Text Mining

browser add-on

Page 89: Data and Text Mining

real-time text mining

Page 90: Data and Text Mining

Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009

O’Donoghue et al., Journal of Web Semantics, 2010

Page 91: Data and Text Mining

information extraction

Page 92: Data and Text Mining

co-mentioning

Page 93: Data and Text Mining

within documents

Page 94: Data and Text Mining

within paragraphs

Page 95: Data and Text Mining

within sentences
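
Co-mentioning at the three levels above can be sketched as a weighted count: a pair of entities earns credit for each document in which both appear, more if they share a paragraph, and more still if they share a sentence. The weights and the nested-list corpus format are hypothetical; the slides do not give the actual scoring scheme.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: closer co-mentions count for more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 4.0}


def pairs(entities):
    """All unordered pairs of distinct entities in a collection."""
    return {frozenset(pair) for pair in combinations(sorted(entities), 2)}


def comention_scores(corpus):
    """Score entity pairs by co-mentioning.

    corpus: list of documents; a document is a list of paragraphs;
    a paragraph is a list of sentences; a sentence is the set of
    entities already tagged in it.
    """
    scores = defaultdict(float)
    for document in corpus:
        doc_entities = set()
        paragraph_pairs = set()
        sentence_pairs = set()
        for paragraph in document:
            par_entities = set()
            for sentence in paragraph:
                sentence_pairs |= pairs(sentence)
                par_entities |= sentence
            paragraph_pairs |= pairs(par_entities)
            doc_entities |= par_entities
        for pair in pairs(doc_entities):
            scores[pair] += WEIGHTS["document"]
            if pair in paragraph_pairs:
                scores[pair] += WEIGHTS["paragraph"]
            if pair in sentence_pairs:
                scores[pair] += WEIGHTS["sentence"]
    return dict(scores)


# Toy corpus of two documents with pre-tagged entities.
corpus = [
    [[{"A", "B"}, {"C"}], [{"D"}]],  # document 1: two paragraphs
    [[{"A"}, {"D"}]],                # document 2: one paragraph
]
scores = comention_scores(corpus)
```

With these weights, A and B (same sentence) score highest, A and C (same paragraph only) score lower, and pairs that merely share a document receive the least credit.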

Page 96: Data and Text Mining

text corpus

Page 97: Data and Text Mining

~22 million abstracts

Page 98: Data and Text Mining

no access

Page 99: Data and Text Mining

millions of full-text articles

Page 100: Data and Text Mining
Page 101: Data and Text Mining

localization and disease

Page 102: Data and Text Mining

general approach

Page 103: Data and Text Mining

COMPARTMENTS

Page 104: Data and Text Mining

TISSUES

Page 105: Data and Text Mining

DISEASES

Page 106: Data and Text Mining

curated knowledge

Page 107: Data and Text Mining

experimental data

Page 108: Data and Text Mining

text mining

Page 109: Data and Text Mining

computational predictions

Page 110: Data and Text Mining

common identifiers

Page 111: Data and Text Mining

quality scores

Page 112: Data and Text Mining

visualization

Page 113: Data and Text Mining

compartments.jensenlab.org

Page 114: Data and Text Mining

tissues.jensenlab.org

Page 115: Data and Text Mining

dissemination

Page 116: Data and Text Mining

web interfaces

Page 117: Data and Text Mining
Page 118: Data and Text Mining

web services

Page 119: Data and Text Mining

diseases.jensenlab.org

Page 120: Data and Text Mining

bulk download

Page 121: Data and Text Mining

Acknowledgments

STRING

Christian von Mering
Damian Szklarczyk
Michael Kuhn
Manuel Stark
Samuel Chaffron
Chris Creevey
Jean Muller
Tobias Doerks
Philippe Julien
Alexander Roth
Milan Simonovic
Jan Korbel
Berend Snel
Martijn Huynen
Peer Bork

Text mining

Sune Frankild
Evangelos Pafilis
Kalliopi Tsafou
Alberto Santos
Janos Binder
Heiko Horn
Michael Kuhn
Nigel Brown
Reinhardt Schneider
Sean O’Donoghue