Outline

Warm-up for a discussion on harmonizing similar initiatives in the NBIC sequencing, metabolomics, proteomics and biobanking task forces (plus friends such as NuGO, EBI, GEN2PHEN, BBMRI-NL, SysMO, EU-PANACEA and the Groningen Genomics Coordination Center).

LIMS (laboratory information management system), AKA study capturing framework (SCF), AKA sample treatment tracker, AKA investigation metadata annotator

Outline

• What do we mean by LIMS / SCF?

• Ingredients for collaboration

• Suggested discussion topics

LIMS/SCF is the portal for:
• Sequencing, Genotyping, Microarrays, Mass spec, …
• Individuals, Samples, Protocols, Results, Background info
• Peak finding, SNP analysis, GWAS, xQTL, …

Examples DSP/NuGO

Courtesy Kees van Bochove & team, NMC & NuGO

Examples Proteomics

• CilairDB

• Corra

• OpenBIS

• OpenMS

• …

http://www.cisd.ethz.ch/software/openBIS/HCS

Examples Sequencing/Genotyping

• SequenceLIMS

• ChIPLIMS Nijmegen

• GenotypeLIMS

• IBIDAS?

• iSeq

• …

Courtesy Joris Lops, GCC & LifeLines

Examples Biobanking

• QTL XGAP/EU-PANACEA

• GWAS XGAP/LifeLines

• HGVbaseG2P 2.0

• …

Courtesy Joeri van der Velde & friends, GCC & LifeLines

Working hypotheses

1. Each platform has one or more study ‘portals’ that:
  • Capture all wet-lab and dry-lab flows
  • Link to (or copy from) public annotations
  • Provide values and data inputs for pipelines
  • Store provenance and results of all pipeline runs (as result files)

2. All tools developed in BioAssist will be connected to them
  • Need to think about user interaction
  • Need to think about data exchange (formats)
  • i.e. what does the biologist want?

• We can benefit greatly if we harmonize and share work
  • Each domain has specific needs, but we can still share data models, user interfaces, back-ends, …
  • Is coordinating this a task for CET?

Ingredients for collaboration

1. Conceptual model: to capture all data, including variation/extension mechanisms

2. Exchange formats: to exchange between public and private databases

3. User interfaces
  • Data import wizards
  • Extraction / query modules
  • Platforms for analysis!!!

4. Backend engines
  1. Large scale binary data
  2. Automatic generation of services/pipelines
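One way to read "automatic generation of services/pipelines" is model-driven generation: describe the data model once and derive the database schema (and, by extension, services) from it. A minimal sketch, assuming a hypothetical dictionary-based model format and SQLite as the target database; it is not the generator of any tool named in this deck.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical model description: entity name -> list of (field name, SQL type).
# Entity and field names are illustrative only.
MODEL: Dict[str, List[Tuple[str, str]]] = {
    "Individual": [("name", "VARCHAR(255)"), ("sex", "VARCHAR(10)")],
    "ObservedValue": [("target", "VARCHAR(255)"), ("feature", "VARCHAR(255)"), ("value", "TEXT")],
}


def generate_ddl(model: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Emit one CREATE TABLE statement per entity, adding a surrogate id column."""
    statements = []
    for entity, fields in model.items():
        columns = ["id INTEGER PRIMARY KEY"] + [f"{name} {sql_type}" for name, sql_type in fields]
        statements.append(f"CREATE TABLE {entity} ({', '.join(columns)});")
    return statements


# Apply the generated schema to an in-memory SQLite database as a smoke test.
conn = sqlite3.connect(":memory:")
for statement in generate_ddl(MODEL):
    conn.execute(statement)
tables = sorted(row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'"))
```

The same model dictionary could in principle drive other generated artifacts (web-service stubs, import/export code), which is where the payoff of generation lies.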

1. Conceptual model

• Targets: the things being followed
  AKA: Individuals, Samples, Panels/Groups, Material

• Features: an abstract property of a target
  AKA: Characteristics, Comments

• Values: a concrete property of a target (at a certain time)
  AKA: Data

• Protocols: description of an activity
  AKA: EventType, Template

• ProtocolApplications: an application of a protocol that produced one or more values
  AKA: Events, Activity, Assay

• Investigation: a container for all of the above, plus contacts/publications
  AKA: Study, Project, Laboratory, Partner
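The six concepts can be sketched as plain Python data classes to show how they reference each other. Class and field names are hypothetical, and the `values_for` helper is only an illustration of querying an investigation, not part of any existing tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Hypothetical class and field names; they mirror the six concepts above,
# not the schema of any particular tool.


@dataclass
class Target:
    """The thing being followed (individual, sample, panel/group, material)."""
    name: str
    kind: str


@dataclass
class Feature:
    """An abstract property of a target, e.g. Height."""
    name: str
    unit: Optional[str] = None


@dataclass
class Protocol:
    """Description of an activity and the features it measures."""
    name: str
    features: List[Feature] = field(default_factory=list)


@dataclass
class ProtocolApplication:
    """One use of a protocol at a certain time; it produces values."""
    protocol: Protocol
    when: date


@dataclass
class Value:
    """A concrete property of a target, produced by a protocol application."""
    target: Target
    feature: Feature
    value: str
    produced_by: ProtocolApplication


@dataclass
class Investigation:
    """Container for all of the above, plus contacts/publications."""
    name: str
    targets: List[Target] = field(default_factory=list)
    protocols: List[Protocol] = field(default_factory=list)
    values: List[Value] = field(default_factory=list)
    contacts: List[str] = field(default_factory=list)

    def values_for(self, target_name: str) -> List[Value]:
        # All values recorded for one target, across protocols.
        return [v for v in self.values if v.target.name == target_name]


# Usage: one individual with one height measurement (date is hypothetical).
height = Feature("Height", unit="cm")
measure = Protocol("Measure height", [height])
ind1 = Target("Ind1", kind="Individual")
run = ProtocolApplication(measure, date(2010, 1, 1))
inv = Investigation("Demo investigation", targets=[ind1], protocols=[measure])
inv.values.append(Value(ind1, height, "179", run))
```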

‘Pheno-OM’ (generic variation mechanism)

Flexible: any feature, value and target combination

[Figure: Pheno-OM UML diagram with the classes ObservationTarget (e.g. Panel, Individual), ObservableFeature, ObservedValue, Protocol, ProtocolApplication, ObservedRelation and InferredValue, several of which carry a time attribute; example: ObservedValue 179cm for ObservableFeature Height on ObservationTarget Ind1]
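The flexibility of Pheno-OM comes from storing each observation as a generic (target, feature, value) record, optionally with a time, instead of adding a column per feature. A minimal sketch, assuming hypothetical record and function names, that pivots such long-format records into a target-by-feature table for analysis:

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class ObservedValue(NamedTuple):
    # One observation: any target, any feature, any value (hypothetical names,
    # loosely following ObservationTarget / ObservableFeature in the diagram).
    target: str
    feature: str
    value: str
    time: Optional[str] = None


def pivot(values: List[ObservedValue]) -> Dict[str, Dict[str, str]]:
    """Turn long-format observations into target -> {feature: value}.
    If a feature was observed more than once, the last record in the list wins."""
    table: Dict[str, Dict[str, str]] = defaultdict(dict)
    for v in values:
        table[v.target][v.feature] = v.value
    return dict(table)


observations = [
    ObservedValue("Ind1", "Height", "179cm"),
    ObservedValue("Ind1", "Sex", "male"),    # hypothetical extra feature
    ObservedValue("Panel1", "Type", "RIL"),  # hypothetical panel: panels are targets too
]
table = pivot(observations)
```

Adding a new feature only adds records; the `ObservedValue` record type itself does not change, which is the point of a generic variation mechanism.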

XGAP (extension based variation mechanism)

Swertz et al. (2010), Genome Biology 11(3).

[Figure: an XGAP DATA ELEMENT is a matrix whose two dimensions (columns and rows) are dimension ELEMENTs: TRAIT types and SUBJECT types, each with its own fields]

TRAIT types, e.g.:
• PROBE: Name, Gene, Chromosome, Locus
• MARKER: Name, Allele, Chromosome, Locus
• MASSPEAK: Name, MZ, RetentionTime
• And so on…

SUBJECT types, e.g.:
• PANEL: Name, Type (CSS, RIL, …), Parent panels
• INDIVIDUAL: Name, Strain, Mother, Father, Sex
• SAMPLE: Name, Individual, Tissue
• And so on…
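In XGAP's extension-based approach, a DATA ELEMENT is a matrix whose dimensions are typed elements such as MARKER or INDIVIDUAL, and new platforms add new element types. A minimal sketch of that idea with hypothetical class names, assuming traits on columns and subjects on rows (the orientation is an assumption of this sketch):

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Marker:
    """Trait type MARKER: Name, Allele, Chromosome, Locus."""
    name: str
    allele: str
    chromosome: str
    locus: int


@dataclass
class Individual:
    """Subject type INDIVIDUAL: Name, Strain, Mother, Father, Sex."""
    name: str
    strain: str
    sex: str
    mother: Optional[str] = None
    father: Optional[str] = None


@dataclass
class DataElement:
    """A matrix of values whose columns and rows are dimension elements.
    Any element type with a .name attribute can serve as a dimension."""
    columns: List[Any]
    rows: List[Any]
    cells: List[List[str]]

    def __post_init__(self) -> None:
        # Every row of cells must line up with the declared dimensions.
        if len(self.cells) != len(self.rows) or any(
            len(row) != len(self.columns) for row in self.cells
        ):
            raise ValueError("cell matrix does not match the row/column dimensions")

    def get(self, row_name: str, column_name: str) -> str:
        i = [r.name for r in self.rows].index(row_name)
        j = [c.name for c in self.columns].index(column_name)
        return self.cells[i][j]


# Hypothetical 2 x 2 genotype matrix: individuals (rows) by markers (columns).
markers = [Marker("M1", "A/B", "1", 1000), Marker("M2", "A/B", "2", 2500)]
individuals = [Individual("Ind1", "RIL-1", "F"), Individual("Ind2", "RIL-2", "M")]
genotypes = DataElement(markers, individuals, [["A", "B"], ["B", "B"]])
```

Supporting mass spectrometry would mean adding a `MassPeak` element type; `DataElement` itself stays unchanged.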

ISA-TAB (generic model)

Differs from MAGE-TAB:
  • Nested investigations (as studies)
  • Templates for assays
  • More closely aligned with FuGE
  • But some find it too difficult

ISA =
  • Investigation
  • Study (a component of Investigation)
  • Assay (a component of Study)
  • Data files

Still in the testing phase, though…

http://isatab.sf.net
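The ISA nesting (Investigation, then Study, then Assay, then data files) can be sketched as nested data classes. This models only the hierarchy, not the tab-delimited ISA-TAB file layout; all titles and file names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    # An Assay is a component of a Study and points at its data files.
    measurement: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    # A Study is a component of an Investigation.
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def all_data_files(inv: Investigation) -> List[str]:
    """Walk Investigation -> Study -> Assay and collect every data file name."""
    return [f for s in inv.studies for a in s.assays for f in a.data_files]


inv = Investigation("Diet study", [
    Study("Liver", [Assay("transcription profiling", ["liver_arrays.txt"]),
                    Assay("metabolite profiling", ["liver_ms.txt"])]),
    Study("Plasma", [Assay("metabolite profiling", ["plasma_ms.txt"])]),
])
```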

MIBBI

• MIBBI: Minimum Information for Biological and Biomedical Investigations (31 areas in total)

http://mibbi.sourceforge.net
Taylor et al. (2008), Nature Biotechnology 26(8), p. 889

MIAME Minimum Information About a Microarray Experiment

MIAPA Minimum Information About a Phylogenetic Analysis

MIAPAR Minimum Information About a Protein Affinity Reagent

MIAPE Minimum Information About a Proteomics Experiment

MIARE Minimum Information About an RNAi Experiment

MIFlowCyt Minimum Information for a Flow Cytometry Experiment

MIGen Minimum Information about a Genotyping Experiment

MIGS Minimum Information about a Genome Sequence

MIMPP Minimal Information for Mouse Phenotyping Procedures

MINSEQE Minimum Information about a high-throughput SeQuencing Experiment

MIPFE Minimal Information for Protein Functional Evaluation

MIQAS Minimal Information for QTLs and Association Studies
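MIBBI-style checklists amount to lists of required items that a submission must fill in. A minimal sketch of checking a record against such a list; the checklist below is a made-up abbreviation, not the actual content of MIGen or any other MIBBI module.

```python
from typing import Dict, List

# Hypothetical, heavily abbreviated checklist; the real MIBBI modules
# define their own, much longer item lists.
CHECKLIST: Dict[str, List[str]] = {
    "genotyping": ["organism", "sample source", "genotyping platform", "protocol"],
}


def missing_items(record: Dict[str, str], area: str) -> List[str]:
    """Return the checklist items that are absent or empty in a record."""
    return [item for item in CHECKLIST[area] if not record.get(item, "").strip()]


# Hypothetical record: "sample source" is absent and "protocol" is empty.
record = {"organism": "Homo sapiens", "genotyping platform": "SNP array", "protocol": ""}
```

An import wizard could run such a check before accepting a submission and point the user at exactly the missing items.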


2. Data formats

Basic

• CSV

• XML

• RDF/Atom

Specific

• MAGE-TAB

• MOLGENIS

• APML

• …
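The basic formats can carry the same records. A minimal sketch, assuming a hypothetical flat record layout (target, feature, value) and hypothetical XML element names, using only the Python standard library:

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical records: one row per observed value.
rows: List[Dict[str, str]] = [
    {"target": "Ind1", "feature": "Height", "value": "179cm"},
    {"target": "Ind1", "feature": "Sex", "value": "male"},
]


def to_csv(records: List[Dict[str, str]]) -> str:
    """Serialize records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["target", "feature", "value"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_xml(records: List[Dict[str, str]]) -> str:
    """Serialize records as XML, one element per record, fields as attributes."""
    root = ET.Element("observedvalues")
    for r in records:
        ET.SubElement(root, "observedvalue", attrib=r)
    return ET.tostring(root, encoding="unicode")
```

Specific formats such as MAGE-TAB add agreed column headers and file layouts on top of this kind of flat serialization.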


Connect to R statistics


Workflow-ready web services


UML documentation of your model


Edit & trace your data

Import/export to Excel

Plug in your own scripts (OntBrowse)


Tech keywords: object-oriented data models, multi-platform Java, Tomcat/GlassFish web server, MySQL/PostgreSQL database, Eclipse/NetBeans IDE, Java API, WSDL/SOAP API, R-project API, MVC, FreeMarker templates and CSS for custom layout, open source.

find.investigation()              # 102 downloaded

obs <- find.observedvalue(…)      # 43,920 downloaded

# some calculation
add.inferredvalue(res)            # 36 added
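The R session above follows a download, compute, upload pattern. The same pattern sketched in Python, against a hypothetical in-memory store whose method names only mirror the R calls; it is not a real client for any of the tools above, and the averaging step is just a stand-in for "some calculation".

```python
from statistics import mean
from typing import Dict, List


class InMemoryStore:
    """Hypothetical stand-in for a remote database behind a generated API.
    Method names mirror the R calls above; this is not a real client."""

    def __init__(self, observed: List[Dict[str, str]]):
        self.observed = observed
        self.inferred: List[Dict[str, object]] = []

    def find_observedvalue(self, feature: str) -> List[Dict[str, str]]:
        # "Download": select all observed values for one feature.
        return [o for o in self.observed if o["feature"] == feature]

    def add_inferredvalue(self, values: List[Dict[str, object]]) -> int:
        # "Upload": store derived values and report how many were added.
        self.inferred.extend(values)
        return len(values)


def mean_per_target(obs: List[Dict[str, str]], feature: str) -> List[Dict[str, object]]:
    """Stand-in for "some calculation": average repeated measurements per target."""
    by_target: Dict[str, List[float]] = {}
    for o in obs:
        by_target.setdefault(o["target"], []).append(float(o["value"]))
    return [
        {"target": t, "feature": f"mean {feature}", "value": mean(vals)}
        for t, vals in by_target.items()
    ]


# Hypothetical data: Ind1 was measured twice, Ind2 once.
store = InMemoryStore([
    {"target": "Ind1", "feature": "Height", "value": "178"},
    {"target": "Ind1", "feature": "Height", "value": "180"},
    {"target": "Ind2", "feature": "Height", "value": "165"},
])
obs = store.find_observedvalue("Height")
res = mean_per_target(obs, "Height")
added = store.add_inferredvalue(res)
```

Storing the result as inferred values (rather than overwriting observed ones) keeps raw data and derived data apart, in line with the provenance goals listed earlier.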

3. User interfaces

3. User interfaces (import wizards)


• http://www.obofoundry.org/
• http://bioportal.bioontology.org/ (REST services)
• http://www.ebi.ac.uk/ontology-lookup/ (SOAP services)
• http://ontocat.sf.net: simple API around BioPortal
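An import wizard can use such ontology services to map spreadsheet column headers onto standard terms. A minimal sketch of the matching step, assuming a small local term table with made-up identifiers in place of a call to any of the services listed above:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical local term table standing in for an ontology lookup service;
# the "EX:" identifiers and synonyms are made up for illustration.
TERMS: Dict[str, Tuple[str, List[str]]] = {
    "EX:0001": ("height", ["body height", "stature"]),
    "EX:0002": ("sex", ["gender"]),
}


def match_header(header: str) -> Optional[str]:
    """Return the term ID whose label or synonym equals the header (case-insensitive)."""
    key = header.strip().lower()
    for term_id, (label, synonyms) in TERMS.items():
        if key == label or key in synonyms:
            return term_id
    return None


# Unmatched headers map to None, so the wizard can ask the user about them.
headers = ["Individual", "Stature", "Gender"]
mapping = {h: match_header(h) for h in headers}
```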


3. User interfaces (compute platform)

Courtesy Arends & van der Velde

Things to discuss as next steps?

Put all people/tools in this room on the table

• Agree on exchange formats & models (generic/specific)

• Test-drive data exchange or even federation

Share the work

• Communicate requirements and plans

• Reuse each other's user interface components

• Share scalable back-ends (for high throughput data)

Invest in technology interoperation

• Invest in Galaxy callback to MOLGENIS/Grails (data chooser)?

• Invest in a MOLGENIS to Grails generator (must be easy)?

Something for the NBIC management team to think about

Extra

• XGAP wizard here

Acknowledgements

• Morris Swertz, Kees van Bochove, Erik Roos, Joris Lops, Joeri van der Velde, GEN2PHEN, MAGE-TAB, XGAP, ISA-TAB, FuGE, GSCF teams

Documents
