Large-scale integration of data and text
Lars Juhl Jensen
cellular network biology
association networks
guilt by association
molecular networks
proteins
string-db.org
small molecules
stitch-db.org
subcellular localization
compartments.jensenlab.org
tissue expression
tissues.jensenlab.org
disease associations
usage statistics
heavily used
especially in the US
data integration
heterogeneous data
curated knowledge
experimental data
computational predictions
many databases
different formats
different identifiers
variable quality
not comparable
hard work
common identifiers
quality scores
score calibration
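A minimal sketch of what score calibration could look like, assuming raw scores from one source are benchmarked against a gold standard of known associations. The binning approach, the function names, and the toy numbers are illustrative assumptions, not the method used in the talk.

```python
from bisect import bisect_right


def fit_calibration(scored_pairs, bin_edges):
    """Estimate, per raw-score bin, the fraction of pairs found in a gold standard.

    scored_pairs: iterable of (raw_score, in_gold_standard) tuples (hypothetical benchmark).
    bin_edges: sorted raw-score thresholds separating the bins.
    """
    hits = [0] * (len(bin_edges) + 1)
    totals = [0] * (len(bin_edges) + 1)
    for raw, in_gold in scored_pairs:
        b = bisect_right(bin_edges, raw)  # index of the bin this raw score falls into
        totals[b] += 1
        hits[b] += int(in_gold)
    # None marks bins with no benchmark data.
    return [h / t if t else None for h, t in zip(hits, totals)]


def calibrate(raw, bin_edges, table):
    """Map a raw score to the benchmark precision of its bin."""
    return table[bisect_right(bin_edges, raw)]


# Toy benchmark: low raw scores are mostly wrong, high ones mostly right.
benchmark = [(0.1, False), (0.2, False), (0.3, True),
             (0.7, True), (0.8, True), (0.9, False)]
edges = [0.5]
table = fit_calibration(benchmark, edges)
```

Once every source is mapped onto the same benchmark-derived scale, scores from different databases become comparable.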
missing most of the data
text mining
>10 km
named entity recognition
comprehensive lexicon
cyclin dependent kinase 1
CDC2
orthographic variation
hCdc2
“black list”
SDS
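The named entity recognition slides can be sketched as a dictionary lookup. This is a toy illustration only: the three-entry lexicon, the normalization rules, and the function names are assumptions, and a real tagger would handle far more variation.

```python
import re

# Hypothetical mini-lexicon: normalized synonym -> canonical name (illustrative only).
LEXICON = {
    "cyclin dependent kinase 1": "CDK1",
    "cdc2": "CDK1",
    "sds": "SDS",
}
# Names too ambiguous to tag; SDS is also a common laboratory detergent.
BLACK_LIST = {"sds"}


def normalize(name):
    """Collapse simple orthographic variation: case, hyphens, repeated spaces."""
    name = name.lower().replace("-", " ")
    return re.sub(r"\s+", " ", name).strip()


def variants(name):
    """Yield the normalized name and, if it starts with 'h', the name without that prefix."""
    norm = normalize(name)
    yield norm
    if len(norm) > 1 and norm[0] == "h":  # e.g. hCdc2 -> cdc2
        yield norm[1:]


def lookup(candidate):
    """Return the canonical name for a candidate string, or None."""
    for v in variants(candidate):
        if v in LEXICON and v not in BLACK_LIST:
            return LEXICON[v]
    return None


def tag(text):
    """Return (matched text, canonical name) pairs, preferring the longest match."""
    words = re.findall(r"[A-Za-z0-9-]+", text)
    max_len = max(len(key.split()) for key in LEXICON)
    hits, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            canonical = lookup(candidate)
            if canonical is not None:
                hits.append((candidate, canonical))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this word
    return hits


found = tag("hCdc2 and cyclin-dependent kinase 1 were run on SDS gels")
```

Here `hCdc2` and `cyclin-dependent kinase 1` both resolve to the same entity, while `SDS` is suppressed by the black list.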
co-mentioning
counting
within documents
within paragraphs
within sentences
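Co-mention counting at the three levels could be sketched as follows. The paragraph and sentence splitting rules, the exact-name matching, and all names in the example are simplifying assumptions; how the three counts are combined into a score is left open here.

```python
import re
from collections import Counter
from itertools import combinations


def entities_in(text, names):
    """Return the names that occur in text as whole words (case-insensitive)."""
    return {n for n in names
            if re.search(r"\b" + re.escape(n) + r"\b", text, re.IGNORECASE)}


def pairs(entities):
    """All unordered pairs of entities, in a stable sorted order."""
    return combinations(sorted(entities), 2)


def count_comentions(documents, names):
    """Count how often each pair is co-mentioned per document, paragraph, and sentence."""
    counts = {"document": Counter(), "paragraph": Counter(), "sentence": Counter()}
    for doc in documents:
        counts["document"].update(pairs(entities_in(doc, names)))
        # Assumption: paragraphs are separated by blank lines.
        for paragraph in filter(str.strip, doc.split("\n\n")):
            counts["paragraph"].update(pairs(entities_in(paragraph, names)))
            # Assumption: sentences end with '.', '!' or '?' followed by whitespace.
            for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
                counts["sentence"].update(pairs(entities_in(sentence, names)))
    return counts


docs = ["CDK1 binds cyclin B. TP53 is unrelated.\n\nTP53 again."]
counts = count_comentions(docs, ["CDK1", "cyclin B", "TP53"])
```

In this toy document CDK1 and TP53 share a document and a paragraph but never a sentence, which is the distinction the three counting levels are meant to capture.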
IDG-specific tasks
target classification
text mining
“protein studiedness”
probabilistic counting
resource integration
disease associations
tissue expression
subcellular localization
automation of updates
web services
remapping of identifiers
predictions for dark matter
network-based inference
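One simple form of network-based inference is a weighted neighbor vote, in the spirit of guilt by association. The sketch below assumes a confidence-weighted association network and a set of already annotated neighbors; the protein names, class labels, and the voting rule itself are hypothetical and not necessarily what the talk proposes.

```python
from collections import defaultdict


def neighbor_vote(protein, network, annotations):
    """Score each label by the share of edge weight going to neighbors carrying it.

    network: dict mapping protein -> {neighbor: edge confidence} (hypothetical data).
    annotations: dict mapping protein -> set of known labels.
    """
    votes = defaultdict(float)
    total = 0.0
    for neighbor, weight in network.get(protein, {}).items():
        total += weight
        for label in annotations.get(neighbor, ()):
            votes[label] += weight
    if total == 0.0:
        return {}  # no neighbors, nothing to infer
    return {label: v / total for label, v in votes.items()}


# Toy example: an unannotated protein with three annotated neighbors.
network = {"DARK1": {"A": 0.9, "B": 0.6, "C": 0.5}}
annotations = {"A": {"kinase"}, "B": {"kinase"}, "C": {"ion channel"}}
scores = neighbor_vote("DARK1", network, annotations)
```

The unannotated protein inherits the label that dominates among its most confident neighbors, here "kinase" over "ion channel".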
questions?