Martin Golebiewski Scientific Databases and Visualization Group EML Research, Heidelberg 2nd BioModels.net Training Camp 13-15 th of January 2007, Manchester, UK Annotating SABIO-RK: Integration of MIRIAM and SBO


Martin Golebiewski

Scientific Databases and Visualization Group

EML Research, Heidelberg

2nd BioModels.net Training Camp

13-15th of January 2007, Manchester, UK

Annotating SABIO-RK: Integration of MIRIAM and SBO


• Biochemical model simulations need experimental reaction kinetics data

• Kinetic parameter values highly depend on environmental conditions (temperature, pH, concentrations of reactants and modifiers, etc.)

• Enzyme characteristics vary between organisms, tissues and cellular locations

• Kinetic parameters are only interpretable with their corresponding kinetic laws

• Most databases do not link experimental kinetic data for single reactions to complete sets of information comprising all the information mentioned above

• Data must be easily accessible and interchangeable (data export for exchange)

We aimed to create a database that collects and standardizes kinetic data, relates the data to its biochemical, environmental and experimental context, cross-links corresponding data and associates it with external resources to make the data comparable and accessible in standard formats

Why have we developed SABIO-RK?


SABIO-RK

Merges information about biochemical reactions and pathways mainly collected from other databases (e.g. KEGG) with corresponding kinetic data manually extracted from literature (including the environmental context)

Is curated manually, assisted by semi-automatic tools (e.g. lists of values)

Unifies, systematically structures and interrelates the data

Can be accessed through a web-based user interface and through web-services

Supports export of the data in SBML for exchange

Links entities and expressions to complementary databases and ontologies

Database population and access


Database population: data extraction

Data source:

• Kinetic data contained in publications

• Text with non-local, highly scattered information

• Tables, formulas, graphs, pictures

• Some information is only given as a reference to another publication

Problems:

• No 1:1 relation between the paper and the input mask!

• No controlled vocabulary (e.g. different names for one compound or enzyme), which leads to fuzzy descriptions

(Figure: a full-text publication next to the SABIO-RK input interface)


Problems in the database population

Missing or only partial information in the data source:

- Incomplete reactions (products not mentioned)

- Assay conditions missing or reference to another paper

- Kinetic law equation (or fitting equation) not described

Multiplicity of kinetic law types:

- No real standard used in publications (or even available, except SBO)

- Varying notations referring to several kinetic theories

Parameter units:

- Multiple definitions (e.g. katal or unit (U) for enzyme activities)

- Different compositions (e.g. µmol/s or µmol/(s*mg) for Vmax)

- Wrong parameter unit (e.g. 1/s for Vmax)
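The last item can be made concrete with the Michaelis-Menten rate law. It is used here only as a familiar illustration; the slide does not name a specific kinetic law, and the symbols below are the conventional ones rather than SABIO-RK field names.

```latex
v = \frac{V_{\max}\,[S]}{K_m + [S]}, \qquad V_{\max} = k_{cat}\,[E]_0
```

Here v and V_max are rates (amount or concentration per time, e.g. µmol/s, or µmol/(s*mg) when normalised to the enzyme amount), K_m is a concentration, and k_cat has the unit 1/s. A V_max reported in 1/s is therefore likely to be a k_cat (turnover number), or a V_max whose amount unit was dropped.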

Identification of compounds, reactions and enzymes:

- Ambiguous descriptions of chemical compounds or enzymes (e.g. missing stereochemical information for stereoisomers, simplifying trivial names, ...)


Data integration problems

e.g. parameter units:

1 U = the amount of enzyme that catalyses the transformation of 1 µmol of substrate per minute under standard conditions

= nmol/(min*mg)

= U/mg
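The unit problem on this slide can be sketched as a small conversion table. This is a minimal illustration, not the SABIO-RK implementation: the table `TO_MOL_PER_S` and the function `convert` are hypothetical names, and only the standard definitions 1 U = 1 µmol/min and 1 kat = 1 mol/s are assumed.

```python
from fractions import Fraction

# Factors that convert each unit into the common base unit mol/s.
# 1 U = 1 µmol/min and 1 kat = 1 mol/s are the standard definitions.
TO_MOL_PER_S = {
    "U": Fraction(1, 10**6) / 60,
    "kat": Fraction(1),
    "nmol/min": Fraction(1, 10**9) / 60,
    "µmol/min": Fraction(1, 10**6) / 60,
    "µmol/s": Fraction(1, 10**6),
}


def convert(value, from_unit, to_unit):
    """Convert an enzyme activity between two units listed in the table."""
    factor = TO_MOL_PER_S[from_unit] / TO_MOL_PER_S[to_unit]
    # Exact rational arithmetic avoids rounding noise until the final float.
    return float(Fraction(str(value)) * factor)


# Specific activities share the "per mg" part, so the same factor applies:
# 1 U/mg corresponds to 1000 nmol/(min*mg).
print(convert(1, "U", "nmol/min"))
```

Because both sides of a specific activity are divided by the same protein mass, converting U/mg to nmol/(min*mg) only needs the activity factor.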


Annotations and controlled vocabularies

Infosource
• PubMed ID
• title
• authors
• journal

Kinetic Law
• type (SBO)
• equation

Environment
• buffer
• pH
• temperature

Reactant, Modifier (Species)
• compound name (given in publication)
• role (e.g. substrate, inhibitor; SBO)
• cellular location (Gene Ontology)
• comments (modifications etc.)

Kinetic Parameter
• name
• type (e.g. Km, kcat; SBO)
• value (range)
• standard deviation
• comment
• SBO-ID

Reaction
• stoichiometry
• EC classification
• enzyme variant

General Information
• organism (NCBI-ID)
• tissue
• pathway
• comments

Unit

SBML Unit

Compound
• recommended name
• synonymic names
• IDs in external databases (e.g. KEGG, ChEBI)
• additional information

Protein complex
• UniProt IDs

Relation labels in the diagram: for a, determined under, parameter units, corresponding species, participate in, belongs to, refers to, reported for, from a, defined as, catalyzes

Legend: annotations to external resources; controlled vocabulary
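The entities on this slide can be sketched as plain data classes. This is a simplified illustration under stated assumptions: the class and field names are hypothetical, only a subset of the attributes is modelled, and the sample values are invented for the example rather than taken from SABIO-RK.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Species:
    name: str                                # compound name as given in the publication
    role: str                                # e.g. "substrate", "inhibitor"
    cellular_location: Optional[str] = None  # e.g. a Gene Ontology term


@dataclass
class KineticParameter:
    name: str
    type: str                        # e.g. "Km", "kcat"
    value: float
    unit: str
    species: Optional[str] = None    # corresponding species, e.g. for Km or Ki
    std_dev: Optional[float] = None


@dataclass
class Environment:
    buffer: str = ""
    ph: Optional[float] = None
    temperature_c: Optional[float] = None


@dataclass
class Entry:
    organism: str
    ec_number: str
    kinetic_law_type: str
    equation: str
    environment: Environment
    species: List[Species] = field(default_factory=list)
    parameters: List[KineticParameter] = field(default_factory=list)

    def parameters_for(self, species_name: str) -> List[KineticParameter]:
        """Return the parameters that refer to one species (e.g. its Km)."""
        return [p for p in self.parameters if p.species == species_name]


# Invented sample values, for illustration only.
entry = Entry(
    organism="Homo sapiens",
    ec_number="2.7.1.1",
    kinetic_law_type="Michaelis-Menten",
    equation="Vmax*S/(Km+S)",
    environment=Environment(buffer="Tris-HCl", ph=7.5, temperature_c=37.0),
    species=[Species("D-Glucose", "substrate")],
    parameters=[KineticParameter("Km_glc", "Km", 0.1, "mM", species="D-Glucose")],
)
print(entry.parameters_for("D-Glucose"))
```

Keeping the environment and the kinetic law on the same entry as the parameters reflects the point made earlier in the talk: a parameter value is only interpretable together with its law and conditions.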


Annotations of entities in SABIO-RK

Annotations shown to the user:

• Chemical compounds to KEGG compound and ChEBI

• Enzymatic activities to ExPASy, KEGG, IntEnz, IUBMB and Reactome (query links in the user interface based on the enzyme classification EC)

• Enzyme protein complexes to UniProt/Swiss-Prot

• Cellular locations (compartments etc.) to Gene Ontology (as query link)

• Publications (data sources) to PubMed

Annotations integrated in SABIO-RK, not yet implemented for the output:

• Organisms to NCBI taxonomy

• Kinetic law types and parameter types to SBO (Systems Biology Ontology)

• Species role (substrate, product, modifier, etc.) to SBO

• Reactions to KEGG reactions

More annotations following the MIRIAM standard are planned ...
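A MIRIAM-style annotation links an entity to an external resource through a qualifier, a data collection and an identifier within that collection. The sketch below is an assumption-laden illustration: the class `Annotation` is hypothetical, and the URI layout and collection prefixes follow the Identifiers.org compact-identifier style, which the talk itself does not mention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    qualifier: str    # BioModels.net qualifier, e.g. "is" or "isDescribedBy"
    collection: str   # assumed Identifiers.org-style prefix, e.g. "taxonomy"
    identifier: str   # identifier inside that collection

    def uri(self) -> str:
        """Build a resolvable URI in the assumed Identifiers.org layout."""
        return f"https://identifiers.org/{self.collection}:{self.identifier}"


# Examples covering annotation targets listed on the slide.
annotations = [
    Annotation("is", "taxonomy", "9606"),          # organism: Homo sapiens
    Annotation("is", "kegg.compound", "C00001"),   # compound: water
    Annotation("is", "uniprot", "Q6UG02"),         # protein, accession from the talk
]
for a in annotations:
    print(a.qualifier, a.uri())
```

Storing the collection and the identifier separately, instead of a ready-made link, keeps the annotation stable when the URL scheme of the external resource changes.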


Controlled vocabularies in SABIO-RK

- Unambiguously identify entities or terms

- Facilitate the search, interpretation and comparison of the data

- Permit matching with other database resources based on shared vocabulary

- Facilitate the integration of different database entries into kinetic models

Lists of values (LOV) in the input interface:

Species (compounds) and species roles (e.g. substrate, product, modifier …)

Biochemical reactions and pathways

Organisms (NCBI taxonomy), tissues and cellular locations

Kinetic law types (e.g. 'Competitive inhibition' or 'Sequential ordered Bi Bi')

Parameter types (e.g. Km, kcat, Vmax, Ki, Kd, rate constant, pH, pK ...)

Parameter units (e.g. mM, µM, 1/s, nmol/min, U/(h*mg) ...)

Corresponding species for kinetic parameters (like for Km, Ki or concentrations)
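A list of values can be sketched as a strict membership check that offers near matches when an entry is rejected. This is a minimal sketch, not the SABIO-RK input interface: `PARAMETER_TYPES` copies only the examples named on the slide, and `check_against_lov` is a hypothetical helper.

```python
import difflib

# Only the parameter types named on the slide; the real list is longer.
PARAMETER_TYPES = ["Km", "kcat", "Vmax", "Ki", "Kd", "rate constant", "pH", "pK"]


def check_against_lov(value, allowed):
    """Return the value if it is in the list of values, else raise ValueError."""
    candidate = value.strip()
    if candidate in allowed:
        return candidate
    # Case is significant in these symbols, so only suggest, never auto-correct.
    hints = difflib.get_close_matches(candidate, allowed, n=3, cutoff=0.5)
    raise ValueError(f"{value!r} is not in the list of values; close matches: {hints}")


print(check_against_lov(" Km ", PARAMETER_TYPES))
```

Rejecting unknown terms at input time, rather than cleaning them up later, is what lets the stored entries be searched and compared reliably.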


Other notation standards in SABIO-RK

Semi-controlled notation standards:

- Kinetic law equation (analyzed for mathematical correctness when entered)

- Enzyme variants (e.g. wildtype, mutant E540K, wildtype isoenzyme PFKL ...)

- Protein complex of the enzyme: e.g. (Q6UG02)*4 for a homotetramer

- Recombinant enzymes: e.g. 'expressed in Escherichia coli BL21(DE3)'

- Buffer composition in the experimental setup
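The protein complex notation above can be read mechanically. The sketch below assumes that every subunit is written as an accession in parentheses followed by `*` and a copy number, as in the single example on the slide; the full grammar used by SABIO-RK is not given in the talk, and `parse_complex` is a hypothetical helper.

```python
import re

# One subunit: "(ACCESSION)*COUNT", e.g. "(Q6UG02)*4".
_SUBUNIT = re.compile(r"\(([A-Za-z0-9]+)\)\*(\d+)")


def parse_complex(notation):
    """Return a list of (UniProt accession, copy number) pairs."""
    return [(accession, int(count)) for accession, count in _SUBUNIT.findall(notation)]


print(parse_complex("(Q6UG02)*4"))
```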


Controlled vocabularies in SABIO-RK

(Figure: a list of values (LOV) in the SABIO-RK input interface)


Identifying chemical compounds

Every chemical compound can have multiple synonymic descriptions, e.g.:

Trivial name and systematic chemical description:
Valproic acid = 2-Propylpentanoic acid

Different parts of the molecule could be considered as lead structure:
Acetyl phenol = Phenylacetate

Aberrant order of the substituents of a lead structure (prefixes):
2-Amino-6-methyl-4-pyrimidol = 6-Methyl-2-amino-4-pyrimidol

Description of substituents as prefix (like amino-) or suffix (like -amine):
1-(4-Iodo-2,5-dimethoxyphenyl)-2-aminopropane = 1-(4-iodo-2,5-dimethoxy-phenyl)propan-2-amine
3,17-Dioxoandrost-4-ene = 4-Androstene-3,17-dione

Different nomenclature systems (e.g. aberrant order of the morphemes):
2-Amino-6-methyl-4-pyrimidol = 2-Amino-6-methylpyrimidin-4-ol
2-Methylpropan-2-ol = 2-Hydroxy-2-methyl-propane


Normalization of compound names

Goals:

• Comparing and linking databases with names of chemical compounds, i.e. synonym detection disregarding orthographic and (minor) morpho-syntactic variance in naming

• Matching chemical compound names against existing synonym lists (e.g. ChEBI, PubChem) to identify synonyms with differences in naming not arising from orthographic variations, like trivial names and systematic names.


Normalization of compound names

CompoundID: 10296

IUPAC Name: 2-phenylpropanoic acid

Canonical SMILES: CC(C1=CC=CC=C1)C(=O)O

Synonyms:
Hydratropic acid
2-Phenylpropionic acid
2-Phenylpropanoic acid
alpha-Phenylpropioic acid
alpha-Methylphenylacetic acid
.alpha.-Phenylpropionic acid
alpha-Methylbenzeneacetic acid
Benzeneacetic acid, .alpha.-methyl-
.alpha.-Methylphenylacetic acid
.alpha.-Methylbenzeneacetic acid
ALPHA-PHENYLPROPIONIC ACID
Benzeneacetic acid, alpha-methyl-
(S)-alpha-Methylbenzeneacetic acid
Benzeneacetic acid, .alpha.-methyl-, (S)-
Benzeneacetic acid, .alpha.-methyl-, (R)-
Benzeneacetic acid, alpha-methyl-, (R)-
Benzeneacetic acid, alpha-methyl-, (S)-

ID     NAME
20986  alpha-Phenylpropionate

Normalized Name:

alpha-phenylpropionate
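The orthographic part of this normalization can be sketched in a few lines. This is not the authors' algorithm: `normalize_name` is a hypothetical helper that only removes case, spacing, hyphenation and Greek-letter spelling differences, and it produces a comparison key rather than the normalized name shown on the slide. Synonyms that differ in substance, such as trivial versus systematic names, still need a synonym list.

```python
import re

# Greek letters written as symbols are mapped to their spelled-out form.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta"}


def normalize_name(name):
    """Return a key that ignores case, spaces, hyphens and Greek-letter spelling."""
    key = name.strip().lower()
    # ".alpha." style notation becomes plain "alpha".
    key = re.sub(r"\.(alpha|beta|gamma|delta)\.", r"\1", key)
    for symbol, spelled in GREEK.items():
        key = key.replace(symbol, spelled)
    # Drop whitespace and hyphens, which vary freely between sources.
    return re.sub(r"[\s\-]+", "", key)


variants = [
    "alpha-Phenylpropionic acid",
    ".alpha.-Phenylpropionic acid",
    "ALPHA-PHENYLPROPIONIC ACID",
    "α-Phenylpropionic acid",
]
print({normalize_name(v) for v in variants})
```

All four variants collapse to one key, while "2-Phenylpropionic acid" and "Hydratropic acid" do not, which is the case the second goal (matching against existing synonym lists) is meant to cover.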


Linguistically assisted compound analysis

Systematic compound name

Structure Classification


Access to SABIO-RK

Available interfaces:

Web-based user interface

for browsing and searching the data manually

Web Services (API access)

can be automatically called by external tools, e.g. by other databases or simulation programs for biochemical network models

Both interfaces support the export of the data in SBML


SABIO-RK user interface: Query


SABIO-RK user interface: Query result


SABIO-RK user interface: Reaction


SABIO-RK user interface: Enzyme


SABIO-RK user interface: database entry with kinetic data


SBML export from SABIO-RK


SBML export from SABIO-RK

Reactions are coupled in exported SBML files

Every species is defined only once in the exported SBML file, even if several reactions refer to the same species

Export of layout information in SBML

- using the SBML layout extension

- to draw reaction maps
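The coupling rule (each species defined once, however many reactions use it) can be sketched with the standard library. The output below is a simplified SBML-like fragment, not a valid SBML document: the namespace, compartments, kinetic laws and layout information are left out, and `build_model` and the species ids are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_model(reactions):
    """Build a model element from (reaction id, reactant ids, product ids) tuples."""
    model = ET.Element("model")
    species_list = ET.SubElement(model, "listOfSpecies")
    reaction_list = ET.SubElement(model, "listOfReactions")
    seen = set()
    for reaction_id, reactants, products in reactions:
        reaction = ET.SubElement(reaction_list, "reaction", id=reaction_id)
        for tag, ids in (("listOfReactants", reactants), ("listOfProducts", products)):
            group = ET.SubElement(reaction, tag)
            for species_id in ids:
                ET.SubElement(group, "speciesReference", species=species_id)
                # Define the species only the first time any reaction refers to it.
                if species_id not in seen:
                    seen.add(species_id)
                    ET.SubElement(species_list, "species", id=species_id)
    return model


# Two reactions sharing the species "g6p", which couples them in the export.
model = build_model([
    ("r1", ["glc", "atp"], ["g6p", "adp"]),
    ("r2", ["g6p"], ["f6p"]),
])
print(ET.tostring(model, encoding="unicode"))
```

Because both reactions reference the same `g6p` element, a simulation tool that loads the file treats them as one connected network instead of two isolated reactions.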


Web service methods

SABIO-RK API access

- Integration in simulation tools

- Cross-linking with other databases

- Several possible entry points

- Supports data export in SBML


Data in SABIO-RK: statistics

PubMed records: 923

Organisms: 312

Pathways: 90

Reactions: 9600

Enzymes: 416

Measured parameters:

- Enzyme activities (rate constant, kcat or Vmax): 8118

- Km (Michaelis constant): 8701

- Ki (inhibition constant): 1774

as of 09/01/2007


Data in SABIO-RK: statistics


Conclusions

• SABIO-RK is a web-accessible database containing biochemical reaction kinetics data for systems biologists and experimenters

• Merges general reaction information retrieved from external databases with kinetic data manually extracted from literature

• Manual curation of the data with some semi-automatic support

• High degree of interrelation within the database

• Type of kinetics, modes of inhibition or activation and corresponding equations are shown with their parameters, measured values and experimental conditions

• Access through a web-based user interface or through web services (API)

• Export of the data in SBML from both interfaces

• Controlled vocabulary used and content annotated to ontologies and external resources


Future goals

• Information about detailed reaction mechanisms (elementary reaction steps)

• Expansion of the data export functions (more data, more annotations)

• Tools for information extraction and data integration

• Expand the usage of annotations and controlled vocabularies

• Extension of the database model to store signaling reactions

• Convince scientists to directly insert their kinetic data into SABIO-RK


SABIO-RK project team

and many more: students, colleagues at EML Research and other collaborators ...

Financial support:


Workshop Invitation

Workshop

Storage and Annotation of Reaction Kinetics Data

May 21-23, 2007

Heidelberg, Germany

http://projects.eml.org/sdbv/projects/events/workshop2007/index_html

Topics:

- Data generation

- Data storage and integration

- Data annotation

- Data usage


http://sabio.villa-bosch.de/SABIORK