Large-scale integration of data and text Lars Juhl Jensen

Page 1: Large-scale integration of data and text

Large-scale integration of data and text

Lars Juhl Jensen

Page 2: Large-scale integration of data and text

Large-scale integration of data and text

Lars Juhl Jensen

Page 3: Large-scale integration of data and text

association networks

Page 4: Large-scale integration of data and text

text mining

Page 5: Large-scale integration of data and text

localization and diseases

Page 6: Large-scale integration of data and text

me

Page 7: Large-scale integration of data and text
Page 8: Large-scale integration of data and text
Page 9: Large-scale integration of data and text

promoter analysis

Page 10: Large-scale integration of data and text

Jensen & Knudsen, Bioinformatics, 2000

Page 11: Large-scale integration of data and text

function prediction

Page 12: Large-scale integration of data and text

Jensen, Gupta et al., Journal of Molecular Biology, 2002

Page 13: Large-scale integration of data and text
Page 14: Large-scale integration of data and text
Page 15: Large-scale integration of data and text

protein networks

Page 16: Large-scale integration of data and text

de Lichtenberg, Jensen et al., Science, 2005

Page 17: Large-scale integration of data and text

chemoinformatics

Page 18: Large-scale integration of data and text

Campillos, Kuhn et al., Science, 2008

Page 19: Large-scale integration of data and text
Page 20: Large-scale integration of data and text
Page 21: Large-scale integration of data and text
Page 22: Large-scale integration of data and text
Page 23: Large-scale integration of data and text

data mining

Page 24: Large-scale integration of data and text

text mining

Page 25: Large-scale integration of data and text

electronic health records

Page 26: Large-scale integration of data and text

association networks

Page 27: Large-scale integration of data and text

guilt by association
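
To make the idea concrete, here is a minimal sketch of guilt by association: an uncharacterized protein inherits candidate functions from its network neighbors, weighted by how confident each link is. The network layout, the protein names and the confidence values are all made up for illustration; this is not the method used by any particular resource.

```python
from collections import Counter


def predict_functions(protein, network, annotations):
    """Rank candidate functions for a protein by summed neighbor confidence.

    network:     dict mapping protein -> {neighbor: link confidence}
    annotations: dict mapping protein -> set of known functions
    Both structures are hypothetical and only serve this illustration.
    """
    votes = Counter()
    for neighbor, confidence in network.get(protein, {}).items():
        for function in annotations.get(neighbor, ()):
            votes[function] += confidence
    # Highest-scoring function first.
    return votes.most_common()


# Toy data: protein X has no annotation of its own.
network = {"X": {"A": 0.9, "B": 0.8, "C": 0.4}}
annotations = {"A": {"cell cycle"}, "B": {"cell cycle"}, "C": {"transport"}}
ranking = predict_functions("X", network, annotations)
```

Summing confidences is only one possible voting rule; a real analysis would also need to correct for functions that are common everywhere in the network.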

Page 28: Large-scale integration of data and text
Page 29: Large-scale integration of data and text

STRING

Page 30: Large-scale integration of data and text

~2.6 million proteins

Page 31: Large-scale integration of data and text

Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011

Page 32: Large-scale integration of data and text

STITCH

Page 33: Large-scale integration of data and text

~300,000 small molecules

Page 34: Large-scale integration of data and text

Kuhn et al., Nucleic Acids Research, 2012

Page 35: Large-scale integration of data and text

genomic context

Page 36: Large-scale integration of data and text

gene fusion

Page 37: Large-scale integration of data and text

Korbel et al., Nature Biotechnology, 2004

Page 38: Large-scale integration of data and text

operons

Page 39: Large-scale integration of data and text

Korbel et al., Nature Biotechnology, 2004

Page 40: Large-scale integration of data and text

bidirectional promoters

Page 41: Large-scale integration of data and text

Korbel et al., Nature Biotechnology, 2004

Page 42: Large-scale integration of data and text

metagenome neighborhood

Page 43: Large-scale integration of data and text

Harrington et al., PNAS, 2007

Page 44: Large-scale integration of data and text

phylogenetic profiles

Page 45: Large-scale integration of data and text

Korbel et al., Nature Biotechnology, 2004
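
As a rough illustration of phylogenetic profiles: each gene is described by the set of genomes in which it occurs, and genes with similar presence/absence patterns are candidates for a functional link. The sketch below uses the Jaccard index as the similarity measure, which is an assumption made for simplicity and not necessarily the measure used in the cited work; gene and genome names are invented.

```python
def profile_similarity(genomes_a, genomes_b):
    """Jaccard index of two presence profiles (sets of genome names)."""
    union = genomes_a | genomes_b
    if not union:
        return 0.0
    return len(genomes_a & genomes_b) / len(union)


# Hypothetical profiles: which genomes contain each gene.
profiles = {
    "geneA": {"genome1", "genome2", "genome3"},
    "geneB": {"genome1", "genome2", "genome3"},
    "geneC": {"genome1", "genome2", "genome4"},
    "geneD": {"genome5"},
}

same = profile_similarity(profiles["geneA"], profiles["geneB"])
partial = profile_similarity(profiles["geneA"], profiles["geneC"])
none = profile_similarity(profiles["geneA"], profiles["geneD"])
```

Genes that are present in nearly every genome, or in only one, carry little signal, so a practical scoring scheme would have to down-weight them.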

Page 46: Large-scale integration of data and text

a real example

Page 47: Large-scale integration of data and text
Page 48: Large-scale integration of data and text
Page 49: Large-scale integration of data and text
Page 50: Large-scale integration of data and text

Cell

Cellulosomes

Cellulose

Page 51: Large-scale integration of data and text

experimental data

Page 52: Large-scale integration of data and text

gene coexpression

Page 53: Large-scale integration of data and text
Page 54: Large-scale integration of data and text

protein interactions

Page 55: Large-scale integration of data and text

Jensen & Bork, Science, 2008

Page 56: Large-scale integration of data and text

curated knowledge

Page 57: Large-scale integration of data and text

drug targets

Page 58: Large-scale integration of data and text

complexes

Page 59: Large-scale integration of data and text

pathways

Page 60: Large-scale integration of data and text

Letunic & Bork, Trends in Biochemical Sciences, 2008

Page 61: Large-scale integration of data and text

many databases

Page 62: Large-scale integration of data and text

different formats

Page 63: Large-scale integration of data and text

different identifiers

Page 64: Large-scale integration of data and text

variable quality

Page 65: Large-scale integration of data and text

not comparable

Page 66: Large-scale integration of data and text

hard work

Page 67: Large-scale integration of data and text

quality scores

Page 68: Large-scale integration of data and text

von Mering et al., Nucleic Acids Research, 2005

Page 69: Large-scale integration of data and text

calibrate vs. gold standard
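
One way to read "calibrate vs. gold standard" is sketched below: rank the associations from one evidence source by raw score, cut the ranking into bins, and record which fraction of each bin is confirmed by the gold standard. That fraction then acts as a quality score that is comparable across sources. The bin size, the raw scores and the gold-standard set are assumptions made for the example.

```python
def calibrate(scored_pairs, gold_standard, bin_size):
    """Return (mean raw score, fraction confirmed) for each bin.

    scored_pairs:  list of (pair, raw_score) from a single evidence source
    gold_standard: set of pairs accepted as true (hypothetical here)
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    table = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        confirmed = sum(1 for pair, _ in chunk if pair in gold_standard)
        mean_score = sum(score for _, score in chunk) / len(chunk)
        table.append((mean_score, confirmed / len(chunk)))
    return table


# Toy input: raw scores on an arbitrary scale.
scored = [("a-b", 10), ("a-c", 8), ("b-d", 3), ("c-d", 1)]
gold = {"a-b", "a-c", "b-d"}
calibration = calibrate(scored, gold, bin_size=2)
```

In practice a smooth curve would be fitted through these points so that any raw score can be converted, and only pairs that the gold standard can actually judge would be counted.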

Page 70: Large-scale integration of data and text

missing most of the data

Page 71: Large-scale integration of data and text

text mining

Page 72: Large-scale integration of data and text

>10 km

Page 73: Large-scale integration of data and text

too much to read

Page 74: Large-scale integration of data and text

computer

Page 75: Large-scale integration of data and text

as smart as a dog

Page 76: Large-scale integration of data and text

teach it specific tricks

Page 77: Large-scale integration of data and text
Page 78: Large-scale integration of data and text
Page 79: Large-scale integration of data and text

named entity recognition

Page 80: Large-scale integration of data and text

comprehensive lexicon

Page 81: Large-scale integration of data and text

cyclin dependent kinase 1

Page 82: Large-scale integration of data and text

CDK1

Page 83: Large-scale integration of data and text

CDC2

Page 84: Large-scale integration of data and text

flexible matching

Page 85: Large-scale integration of data and text

spaces and hyphens

Page 86: Large-scale integration of data and text

cyclin dependent kinase 1

Page 87: Large-scale integration of data and text

cyclin-dependent kinase 1

Page 88: Large-scale integration of data and text

orthographic variation

Page 89: Large-scale integration of data and text

CDC2

Page 90: Large-scale integration of data and text

hCdc2

Page 91: Large-scale integration of data and text

“black list”

Page 92: Large-scale integration of data and text

SDS
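
The ingredients on the preceding slides (a lexicon of names and synonyms, matching that ignores spaces and hyphens, tolerance for orthographic variants such as an "h" prefix, and a black list) can be combined in a small dictionary-based tagger. This is a simplified sketch with a tiny invented lexicon, not the actual tagger behind the resources presented here.

```python
import re

# Hypothetical mini-lexicon: every name or synonym maps to one identifier.
LEXICON = {
    "cyclin dependent kinase 1": "CDK1",
    "CDK1": "CDK1",
    "CDC2": "CDK1",
    "SDS": "SDS",  # a valid gene symbol that in text usually means the detergent
}

# Names that would cause too many false positives are blocked.
BLACK_LIST = {"SDS"}


def normalize(name):
    """Lowercase and drop spaces and hyphens so spelling variants collapse."""
    return re.sub(r"[\s\-]+", "", name.lower())


def build_index(lexicon, black_list):
    blocked = {normalize(name) for name in black_list}
    index = {}
    for name, identifier in lexicon.items():
        key = normalize(name)
        if key in blocked:
            continue
        index[key] = identifier
        # Orthographic variation: also accept a leading "h" (as in hCdc2).
        index.setdefault("h" + key, identifier)
    return index


def tag(text, index, max_words=4):
    """Return (matched words, identifier) pairs, preferring the longest match."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(max_words, 0, -1):
            window = tokens[i:i + n]
            key = "".join(window).lower()
            if key in index:
                matches.append((" ".join(window), index[key]))
                i += len(window)
                break
        else:
            i += 1  # no name starts at this token
    return matches


index = build_index(LEXICON, BLACK_LIST)
found = tag("hCdc2 (cyclin-dependent kinase 1) was run on an SDS gel.", index)
```

Blocking "SDS" entirely trades a few missed gene mentions for far fewer false hits, which is the usual reason for keeping a black list.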

Page 93: Large-scale integration of data and text

information extraction

Page 94: Large-scale integration of data and text

count co-mentioning

Page 95: Large-scale integration of data and text

within documents

Page 96: Large-scale integration of data and text

within paragraphs

Page 97: Large-scale integration of data and text

within sentences

Page 98: Large-scale integration of data and text

scoring scheme
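
A minimal sketch of co-mention counting at the three levels named above: a pair of entities scores once for sharing a document, more if it also shares a paragraph, and more again if it shares a sentence. The weights are placeholders chosen for the example, and the input is assumed to be already tagged text; the scoring scheme actually used may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights: closer co-mentions count for more.
WEIGHTS = {"document": 1, "paragraph": 2, "sentence": 4}


def pairs(entities):
    """All unordered entity pairs, as alphabetically sorted tuples."""
    return set(combinations(sorted(entities), 2))


def comention_scores(documents, weights=WEIGHTS):
    """Sum weighted co-mentions over documents.

    documents: list of documents; a document is a list of paragraphs,
    a paragraph is a list of sentences, and a sentence is a set of
    entity identifiers found by a tagger.
    """
    scores = Counter()
    for document in documents:
        sentence_pairs, paragraph_pairs = set(), set()
        document_entities = set()
        for paragraph in document:
            paragraph_entities = set()
            for sentence in paragraph:
                sentence_pairs |= pairs(sentence)
                paragraph_entities |= set(sentence)
            paragraph_pairs |= pairs(paragraph_entities)
            document_entities |= paragraph_entities
        for pair in pairs(document_entities):
            scores[pair] += (
                weights["document"]
                + weights["paragraph"] * (pair in paragraph_pairs)
                + weights["sentence"] * (pair in sentence_pairs)
            )
    return scores


# One toy document with two paragraphs.
corpus = [[[{"CDK1", "CCNB1"}, {"TP53"}], [{"WEE1"}]]]
scores = comention_scores(corpus)
```

Raw weighted counts favor entities that are mentioned everywhere, so a usable score would also normalize by how often each entity occurs on its own.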

Page 99: Large-scale integration of data and text
Page 100: Large-scale integration of data and text
Page 101: Large-scale integration of data and text

corpora

Page 102: Large-scale integration of data and text

~22 million abstracts

Page 103: Large-scale integration of data and text

no access

Page 104: Large-scale integration of data and text

~4 million full-text articles

Page 105: Large-scale integration of data and text
Page 106: Large-scale integration of data and text

augmented browsing

Page 107: Large-scale integration of data and text

Reflect

Page 108: Large-scale integration of data and text

browser add-on

Page 109: Large-scale integration of data and text

real-time text mining

Page 110: Large-scale integration of data and text

Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009

O’Donoghue et al., Journal of Web Semantics, 2010

Page 111: Large-scale integration of data and text

localization and disease

Page 112: Large-scale integration of data and text

small molecules

Page 113: Large-scale integration of data and text

proteins

Page 114: Large-scale integration of data and text

compartments

Page 115: Large-scale integration of data and text

tissues

Page 116: Large-scale integration of data and text

diseases

Page 117: Large-scale integration of data and text

organisms

Page 118: Large-scale integration of data and text

environments

Page 119: Large-scale integration of data and text

suite of web resources

Page 120: Large-scale integration of data and text

common backend database

Page 121: Large-scale integration of data and text

jensenlab.org

Page 122: Large-scale integration of data and text

text mining

Page 123: Large-scale integration of data and text

curated knowledge

Page 124: Large-scale integration of data and text

experimental data

Page 125: Large-scale integration of data and text

computational predictions

Page 126: Large-scale integration of data and text

quality scores

Page 127: Large-scale integration of data and text

web-centric databases

Page 128: Large-scale integration of data and text

DISEASES

Page 129: Large-scale integration of data and text
Page 130: Large-scale integration of data and text
Page 131: Large-scale integration of data and text

visualization

Page 132: Large-scale integration of data and text

COMPARTMENTS

Page 133: Large-scale integration of data and text

compartments.jensenlab.org

Page 134: Large-scale integration of data and text

TISSUES

Page 135: Large-scale integration of data and text

tissues.jensenlab.org

Page 136: Large-scale integration of data and text

project onto networks

Page 137: Large-scale integration of data and text

Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011

Page 138: Large-scale integration of data and text

compartments.jensenlab.org

Page 139: Large-scale integration of data and text

tissues.jensenlab.org

Page 140: Large-scale integration of data and text

diseases.jensenlab.org

Page 141: Large-scale integration of data and text

summary

Page 142: Large-scale integration of data and text

bioinformatics

Page 143: Large-scale integration of data and text

more than alignment

Page 144: Large-scale integration of data and text

data/text mining

Page 145: Large-scale integration of data and text

save you much time

Page 146: Large-scale integration of data and text

Acknowledgments

STRING/STITCH

Christian von Mering

Damian Szklarczyk

Michael Kuhn

Manuel Stark

Samuel Chaffron

Chris Creevey

Jean Muller

Tobias Doerks

Philippe Julien

Alexander Roth

Milan Simonovic

Jan Korbel

Berend Snel

Martijn Huynen

Peer Bork

Literature mining

Sune Frankild

Evangelos Pafilis

Janos Binder

Kalliopi Tsafou

Alberto Santos

Heiko Horn

Michael Kuhn

Nigel Brown

Reinhardt Schneider

Sean O’Donoghue

Page 147: Large-scale integration of data and text
Page 148: Large-scale integration of data and text

Questions?