Model management tools for improved reproducibility in systems biology

Dagmar Waltemath, on behalf of the SEMS team

University of Rostock, Germany

10th International CellML Workshop Auckland, June 2016

On models and simulations

Model Simulation

Figs: BioModels (top) and DOI: 10.1073/pnas.88.16.7328 (bottom)

Most scientific discoveries rely on previous findings.

Fig.: Tyson 1991 (BIOM005), the model

Fig.: Tyson 2001 (BIOM195), its successor

Fig.: History of Cell Cycle models in BioModels

Can we rely on findings that we ourselves cannot evaluate? (Probably not!)

“only in ~20–25% of the projects were the relevant published data completely in line with our in-house findings (Fig. 1c). In almost two-thirds of the projects, there were inconsistencies [..] that either considerably prolonged the duration of the target validation process or, in most cases, resulted in termination of the projects because the evidence [..] was insufficient to justify further investments into these projects.” Prinz et al (2011)

We identified key challenges of reproducibility in systems biology and systems medicine.

– Lack of data standards

– Lack of data quality and quantity

– Lack of data availability

– Lack of transparency

A lack of data availability makes it impossible for researchers to reproduce results.

● Model code in BioModels, including a supplement with a how-to for reproducing the figures given in the original paper

● Online tool makes data available and browseable

TriplexRNA

Recon 2

● Publication backed up with a website containing the supplemental material

● Model code in (non-curated) BioModels

● Visualisation of the model can easily be explored

● References to original works

How can we support scientists who wish to share model-based results?

Issues

– Simulation studies comprise several files

– Data is heterogeneous, distributed, complex

– Data changes over time

– Documentation of how the study was performed is often missing

Our solutions

– Tool support for the COMBINE Archive: lowering the effort to share reproducible models

– Graph-based storage of model-related files: integrated & searchable virtual experiments

– Model version control: towards a provenance of models

The COMBINE archive bundles all files necessary to reproduce a simulation study.

COMBINE archive toolkit

● manage COMBINE archives

– Explore

– Edit

– Share

– Publish

● Used in: PMR2, JWS Online, SED-ML Web Tools, OpenCOR …

WebCAT, Scharm et al 2014
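
To make the bundling idea concrete, here is a minimal sketch of how a COMBINE archive could be assembled by hand. It assumes the OMEX layout described by Bergmann et al. (2014): a ZIP file whose manifest.xml lists every entry together with a format identifier. The file names, the placeholder contents, and the helper names build_manifest and write_archive are hypothetical, and this is not how the COMBINE archive toolkit itself is implemented.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
FORMAT_BASE = "http://identifiers.org/combine.specifications/"


def build_manifest(entries):
    """Return manifest.xml bytes listing each (location, format) pair."""
    ET.register_namespace("", OMEX_NS)
    root = ET.Element(f"{{{OMEX_NS}}}omexManifest")
    # The archive itself and the manifest are listed next to the payload files.
    listed = [(".", FORMAT_BASE + "omex"),
              ("./manifest.xml", FORMAT_BASE + "omex-manifest")]
    listed += [("./" + location, fmt) for location, fmt in entries]
    for location, fmt in listed:
        ET.SubElement(root, f"{{{OMEX_NS}}}content",
                      {"location": location, "format": fmt})
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def write_archive(files):
    """Build the archive in memory; files maps path -> (format URI, bytes)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        manifest = build_manifest([(path, fmt) for path, (fmt, _) in files.items()])
        archive.writestr("manifest.xml", manifest)
        for path, (_, content) in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


# Hypothetical study: placeholder names and contents, for illustration only.
study = {
    "model.xml": (FORMAT_BASE + "sbml", b"<sbml/>"),
    "simulation.sedml": (FORMAT_BASE + "sed-ml", b"<sedML/>"),
}
omex_bytes = write_archive(study)
```

Writing omex_bytes to a file with the .omex extension would give a single file that carries the model, the simulation description, and the manifest that ties them together.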

STON, SED-ML DB & MASYMOS

Integrated storage & retrieval system (MASYMOS)

doi: 10.1093/database/bau130

doi: 10.1186/s13326-015-0014-4

Search across heterogeneous data, ontologies, and structures → poster

Tailor-made storage systems (STON, SED-ML DB)

Using graph databases to integrate standardised model-based data

https://dx.doi.org/10.6084/m9.figshare.3382993.v1

SED-ML DB in JWS Online

BioModels

Physiome Model repository
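
The graph idea can be illustrated with a small in-memory sketch: models, simulation descriptions, and ontology terms become nodes, and typed edges link them so that one query can cross all three. Everything here is an assumption made for illustration (the StudyGraph class, the node ids, the relation names ANNOTATED_WITH and SIMULATES); it is not the MASYMOS schema or its query interface.

```python
from collections import defaultdict


class StudyGraph:
    """Tiny in-memory property graph: labelled nodes plus typed, directed edges."""

    def __init__(self):
        self.nodes = {}                  # node id -> {"label": ..., other properties}
        self.edges = defaultdict(list)   # node id -> [(relation, target id)]

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = {"label": label, **properties}

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def neighbours(self, node_id, relation):
        return [t for r, t in self.edges[node_id] if r == relation]

    def models_annotated_with(self, term_id):
        """Models with at least one annotation edge to the ontology term."""
        return sorted(
            n for n, data in self.nodes.items()
            if data["label"] == "Model"
            and term_id in self.neighbours(n, "ANNOTATED_WITH")
        )

    def simulations_for(self, model_id):
        """Simulation descriptions that point at the given model."""
        return sorted(
            n for n, data in self.nodes.items()
            if data["label"] == "Simulation"
            and model_id in self.neighbours(n, "SIMULATES")
        )


# Hypothetical content: ids and names are placeholders, not repository entries.
graph = StudyGraph()
graph.add_node("model_a", "Model", name="hypothetical cell cycle model")
graph.add_node("model_b", "Model", name="hypothetical successor model")
graph.add_node("term_cell_cycle", "OntologyTerm", name="cell cycle")
graph.add_node("sim_1", "Simulation", name="hypothetical time course")
graph.add_edge("model_a", "ANNOTATED_WITH", "term_cell_cycle")
graph.add_edge("model_b", "ANNOTATED_WITH", "term_cell_cycle")
graph.add_edge("sim_1", "SIMULATES", "model_b")

# One search spans ontology terms, models, and simulation descriptions.
hits = graph.models_annotated_with("term_cell_cycle")
linked = {model: graph.simulations_for(model) for model in hits}
```

A graph database would replace the dictionaries with persistent storage and a query language, but the shape of the question stays the same: start at an ontology term, follow edges to models, then to the simulations that use them.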

BiVeS & COMODI

Model version control (BiVeS, COMODI)

Provenance-to-be (COMODI)

Tracking the evolution of a CellML/SBML model over time

doi: 10.1093/bioinformatics/btv484

Tracking the evolution of simulation studies and biological systems.

https://dx.doi.org/10.6084/m9.figshare.2543059.v5

Physiome Model repository
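
As a rough illustration of what tracking changes between two model versions involves, the sketch below compares elements by their id attribute and reports what was inserted, deleted, or updated. The two XML fragments and the function names are hypothetical. BiVeS works on the structure of the full document and is considerably more involved; this only shows the basic idea.

```python
import xml.etree.ElementTree as ET


def elements_by_id(xml_text, tag):
    """Map id -> attribute dict for every element whose local name is `tag`."""
    found = {}
    for element in ET.fromstring(xml_text).iter():
        local_name = element.tag.rsplit("}", 1)[-1]   # drop any XML namespace
        if local_name == tag and "id" in element.attrib:
            found[element.attrib["id"]] = dict(element.attrib)
    return found


def diff_versions(old_xml, new_xml, tag):
    """Report ids that were inserted, deleted, or had attributes changed."""
    old = elements_by_id(old_xml, tag)
    new = elements_by_id(new_xml, tag)
    return {
        "inserted": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "updated": sorted(i for i in old.keys() & new.keys() if old[i] != new[i]),
    }


# Hypothetical SBML-like fragments, reduced to the species list.
VERSION_1 = """<model><listOfSpecies>
  <species id="A" initialAmount="1"/>
  <species id="B" initialAmount="0"/>
</listOfSpecies></model>"""

VERSION_2 = """<model><listOfSpecies>
  <species id="A" initialAmount="2"/>
  <species id="C" initialAmount="0"/>
</listOfSpecies></model>"""

changes = diff_versions(VERSION_1, VERSION_2, "species")
```

Here species A counts as updated because its initial amount changed, B as deleted, and C as inserted. Matching by id is a simplification: renamed or moved elements would need a structural mapping between the two documents.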

What's next? Models for the clinic, or: Bridging the gap between standards for systems biology & systems medicine

Fig. courtesy Atalag et al (2015) http://hdl.handle.net/2292/27911

Thank you for your attention.

@SemsProject

Martin Scharm: BiVeS, COMODI, COMBINE Archive Video master

Fabienne Lambusch: Pattern & structure search in SBML models

Mariam Nassar: Rank aggregation

Tom Gebhardt: SBGN-compliant diffs

Martin Peters: M2CAT, COMBINE Archive, SED-ML database

Vasundra Toure: STON, SBGN-ED, SBGN symbol of the month

Ron Henkel: MASYMOS, MORRE

www.sems.uni-rostock.de

References

Atalag et al. (2015) http://hdl.handle.net/2292/27911

Bergmann et al. (2014) F.T. Bergmann, R. Adams, S. Moodie, J. Cooper, M. Glont et al.: COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics (2014).

Prinz et al. (2011) F. Prinz, T. Schlange and K. Asadullah: Believe it or not: how much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery 10(9) (2011): 712.

Schmitz et al. (2014) U. Schmitz et al.: Cooperative gene regulation by microRNA pairs and their identification using a computational workflow. Nucleic Acids Research (2014): gku465.

Thiele et al. (2013) I. Thiele et al.: A community-driven global reconstruction of human metabolism. Nature Biotechnology 31(5) (2013): 419-425.

Waltemath & Scharm (2014) D. Waltemath and M. Scharm: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit. Workshop on Data Management for the Life Sciences (2014), Hamburg, BTW 2014.

Waltemath & Wolkenhauer (2016) D. Waltemath and O. Wolkenhauer: How modeling standards, software, and initiatives support reproducibility in systems biology and systems medicine. IEEE Transactions on Biomedical Engineering (2016), in press.
