Annotating Experimental Records using Ontologies
Olga Giraldo, Unal de Colombia/CIAT
Jael Garcia, Universität der Bundeswehr
Alexander Garcia, UAMS
Motivation and Research Question
• Knowledge-based approach to managing laboratory information: it combines elements from the Semantic Web (SW), e.g. ontologies supporting organization and classification, with elements from Social Tagging Systems, e.g. collaboration and ad-hoc organization strategies.
• How can we semantically annotate laboratory records?
• How can we facilitate the coexistence of laboratory notebooks and electronic laboratory records?
Motivation and Research Question
• Easy to use, highly portable, easy to share, low cost…
• Great artifacts for supporting design
• Legal requirement
da Vinci
Mutis
Marie Curie
Our Approach
• Documents should be able to “know about” their own content for automated processes to “know what to do” with them.
Semantics….
Materials and Methods
• Our scenario: supporting the annotation of experimental data for some of the processes routinely run at the Center for International Tropical Agriculture (CIAT) biotechnology laboratory
• 15 laboratory notebooks together with their corresponding electronic records, e.g. XLS files, outputs from lab equipment, etc.
• 10 biologists
• Direct non-intrusive observation: 6 months
• Ontology and prototype development: iterative and collaborative process
• Existing ontologies
Results
• Data types
• Rhetorical structure
• Ontologies
• Orchestration of ontologies
• Tags and ontologies
• Lessons
Results
• Data Types
– Manuscript
– Digital
– Digital data with manuscript annotations
Results
• Manuscript
– Lists
– To-dos
– How-tos (protocols)
– Incomplete results
– Dates
– Formulas
– Electronic paths
– Sources for information (URLs)
Results
• Digital
– Photos
– Lists
– Incomplete results
– Protocols
– Figures
– Sequences
Results
• Digital + Manuscript
– Digital files and print-outs tagged with manuscript information.
Results
• We identified the rhetorical structure implicit in those laboratory notebooks we studied
• And the metadata describing such structure
Lab Notebook
Body: metadata describing an experimental activity
Header: metadata describing a lab notebook
Title (DC)
Notes (AgMes)
Date of creation (DC)
Laboratory notebook number (M4L)
Creator (DC/AgMes)
Date of finalization (M4L)
Language (DC)
Project (OBI/AGROVOC)
Laboratory procedure (M4L)
Comments (BioPortal, NCIt, SNOMED)
Date (DC)
Page number (M4L)
Purpose (M4L)
Security measurements (M4L)
Outcome (NCIt)
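The field list above can be read as a small lookup table. The following is a minimal sketch in Python, assuming a flat layout (the slide does not make explicit which fields belong to the header and which to the body, so no split is attempted). The names `NOTEBOOK_METADATA` and `fields_by_vocabulary` are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Field -> source vocabularies, copied from the slide.
# Assumption: flat layout, no Header/Body split.
NOTEBOOK_METADATA = {
    "Title": ["DC"],
    "Notes": ["AgMes"],
    "Date of creation": ["DC"],
    "Laboratory notebook number": ["M4L"],
    "Creator": ["DC", "AgMes"],
    "Date of finalization": ["M4L"],
    "Language": ["DC"],
    "Project": ["OBI", "AGROVOC"],
    "Laboratory procedure": ["M4L"],
    "Comments": ["BioPortal", "NCIt", "SNOMED"],
    "Date": ["DC"],
    "Page number": ["M4L"],
    "Purpose": ["M4L"],
    "Security measurements": ["M4L"],
    "Outcome": ["NCIt"],
}


def fields_by_vocabulary(metadata):
    """Invert the mapping: vocabulary -> fields that draw on it."""
    index = defaultdict(list)
    for field_name, vocabularies in metadata.items():
        for vocabulary in vocabularies:
            index[vocabulary].append(field_name)
    return dict(index)


# Print one line per vocabulary with the fields it supplies.
for vocabulary, fields in sorted(fields_by_vocabulary(NOTEBOOK_METADATA).items()):
    print(f"{vocabulary}: {', '.join(fields)}")
```

Inverting the table this way makes it easy to see which fields depend on M4L and which are covered by existing vocabularies such as DC.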
Rhetorical structure: Header, Body.
Materials & Methods, experimental design
Materials & Methods: Samples, Reagents, Assays, Equipment and supplies.
Experimental design
Samples: DNA, RNA, whole plant, etc. (OBI, CHEBI, PO)
Reagents: buffer, dNTP mix (CHEBI, M4L)
Assay: DNA extraction, PCR, gel electrophoresis (OBI, M4L).
Equipment & supplies: freezer, centrifuge, shaker, glove, etc. (OBI, PEO, SEP, SNOMED, BIRNLex, M4L).
Experimental design: (OBI, M4L)
Protocol (OBI)
Recorded by (M4L)
We focused on: DNA extraction, PCR and Electrophoresis
DNA Extraction
A typical process in a plant biotechnology laboratory
Mechanical pulverization of plant material
Results
• M4L: our ontology for the experimental processes we studied
– Based on OBI.
– Terms proposed to OBI: 197, including new terms plus terms from other ontologies
– Other terms will be proposed to other ontologies, e.g. ChEBI, GO, PO
Ontology | No. of concepts
Metadata for Laboratory Notebook (M4L) | 149
Chemical Entities of Biological Interest (CHEBI) (Degtyarenko et al., 2008) | 87
Ontology for Biomedical Investigations (OBI) (Brinkman et al., 2010) | 59
Medical Subject Headings ontology (MSH) (Moerchen et al., 2008) | 17
Gene Ontology (GO) (Ashburner et al., 2000) | 14
Sample Processing and Separation Techniques (SEP) (http://psidev.info/index.php?q=node/312) | 6
BIRN Project lexicon (BIRNLex) (Bug et al., 2008) | 6
Gene Regulation Ontology (GRO) (Beisswanger et al., 2008) | 5
National Cancer Institute thesaurus (NCIt) (Ceusters et al., 2005) | 5
Plant Ontology Consortium (POC) (Jaiswal et al., 2005) | 5
SNOMED-CT (http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html) | 5
BioTop Ontology (Beisswanger et al., 2007) | 1
Foundational Model of Anatomy (FMA) (Rosse and Mejino, 2003) | 1
Ontology for Genetic Interval (OGI) (Lin et al., 2010) | 1
Parasite Experiment Ontology (PEO) (http://wiki.knoesis.org/index.php/Parasite_Experiment_ontology) | 1
Proteomics Data and Process Provenance (PDPP) (Sahoo et al., 2006) | 1
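As a quick check on the table above, the counts can be summed to see how much of the vocabulary comes from existing ontologies rather than from M4L. This is a rough sketch that assumes each row counts the concepts drawn from that ontology; the helper name `reuse_summary` is hypothetical.

```python
# Concept counts per ontology, copied from the table above.
CONCEPTS = {
    "M4L": 149, "CHEBI": 87, "OBI": 59, "MSH": 17, "GO": 14,
    "SEP": 6, "BIRNLex": 6, "GRO": 5, "NCIt": 5, "POC": 5,
    "SNOMED-CT": 5, "BioTop": 1, "FMA": 1, "OGI": 1, "PEO": 1, "PDPP": 1,
}


def reuse_summary(counts, own="M4L"):
    """Return (total concepts, concepts from other ontologies, their share)."""
    total = sum(counts.values())
    reused = total - counts[own]
    return total, reused, reused / total


total, reused, share = reuse_summary(CONCEPTS)
print(f"total={total} reused={reused} share={share:.1%}")
```

Under that reading, 214 of the 363 listed concepts (about 59%) come from ontologies other than M4L.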
Results
• We have structured the descriptive layers by reusing and extending existing ontologies.
• To support annotation within our scenario we have identified three main layers:
– i) that related to the document itself,
– ii) the annotation layer, and
– iii) that related to the experiment.
Results
• Orchestration of ontologies: Annotation Ontology
The Annotation Ontology (AO) is a vocabulary for performing several types of annotation (comments, entity annotations or semantic tags, textual annotations or classic tags, notes, examples, errata...) on any kind of electronic document (text, images, audio, tables...) and on document parts. AO does not provide a domain ontology of its own; instead it fosters the reuse of existing ones, in keeping with the Semantic Web principle of scalability.
[Figure: example annotation graph built with the Annotation Ontology; nodes ANNOT1 and ANNOT2.
Selector: InitEndCornerSelector, ImageSelector; aos:init (304,507), aos:end (360,618); ao:context.
Provenance: pav:createdBy http://www.tags4lab.org/foaf.rdf#olga.giraldo (foaf:Person); pav:createdOn June 1, 2010.
Topic: ao:hasTopic GenBank:AB005238, name "Partial sequence on psy promoter".
Tag: moat:Tag, tags:name, moat:tagMeaning, moat:hasMeaning, moat:Meaning, aoex:hasMoatMeaning (MOAT).
Document: aof:annotatesDocument, aof:onDocument, http://www.ncbi.nlm.nih.gov/pubmed/12520345.
Other labels: Annotation, Annotation Qualifier, Definition, ann:body, rdf:type, rdfs:SubClassOf.]
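The labels in the figure above suggest a graph along the following lines. This is only a sketch: the prefixed names, the URLs and the literal values are taken from the figure, but the node identifiers `ex:ANNOT1` and `ex:SEL1`, the class name `ao:Annotation` and the exact wiring between nodes are assumptions, since the original layout cannot be fully recovered. Triples are kept as plain Python tuples rather than loaded into an RDF library.

```python
# Subject, predicate, object triples written as prefixed names.
# Assumption: ex:ANNOT1 / ex:SEL1 are hypothetical node identifiers and the
# wiring is a plausible reading of the figure, not a confirmed one.
DOCUMENT = "http://www.ncbi.nlm.nih.gov/pubmed/12520345"
CREATOR = "http://www.tags4lab.org/foaf.rdf#olga.giraldo"

TRIPLES = [
    ("ex:ANNOT1", "rdf:type", "ao:Annotation"),
    ("ex:ANNOT1", "aof:annotatesDocument", DOCUMENT),
    ("ex:ANNOT1", "ao:context", "ex:SEL1"),
    ("ex:ANNOT1", "ao:hasTopic", "GenBank:AB005238"),
    ("ex:ANNOT1", "pav:createdBy", CREATOR),
    ("ex:ANNOT1", "pav:createdOn", '"June 1, 2010"'),
    ("ex:SEL1", "rdf:type", "aos:InitEndCornerSelector"),
    ("ex:SEL1", "aos:init", '"(304,507)"'),
    ("ex:SEL1", "aos:end", '"(360,618)"'),
    (CREATOR, "rdf:type", "foaf:Person"),
]


def objects(triples, subject, predicate):
    """All objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe(triples, subject):
    """Render every triple about a subject as one 's p o .' line."""
    return [f"{s} {p} {o} ." for s, p, o in triples if s == subject]


for line in describe(TRIPLES, "ex:ANNOT1"):
    print(line)
```

The point of the sketch is the separation visible in the figure: the selector says where in the document the annotation applies, the provenance says who made it and when, and the topic points to an entity in an external resource.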
Results
• The AO structures the semantic annotations as well as the tags generated by users.
– In this way we support complex SPARQL queries involving several ontologies, for instance:
• “Retrieve from the eLabBook the pages tagged by Tim Andrews or Lisa Watson with the tags rice and iron for which there is a LIMS data entry”
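The logic of the example query above can be sketched over in-memory records. This is not the SPARQL query itself; it is a plain Python filter that mirrors its three conditions (tagger, tags, LIMS entry). The record type `PageAnnotation`, the function `pages_matching`, the page identifiers and the third tagger are all hypothetical, and the sketch assumes that both tags must come from the same person on the same page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageAnnotation:
    """One person's tags on one notebook page (hypothetical record shape)."""
    page: str
    tagger: str
    tags: frozenset
    has_lims_entry: bool


def pages_matching(annotations, taggers, required_tags):
    """Pages tagged by any of `taggers` with all `required_tags`
    for which a LIMS data entry exists."""
    required = frozenset(required_tags)
    return sorted({
        a.page
        for a in annotations
        if a.tagger in taggers and required <= a.tags and a.has_lims_entry
    })


# Made-up sample data for illustration only.
ANNOTATIONS = [
    PageAnnotation("page-12", "Tim Andrews", frozenset({"rice", "iron"}), True),
    PageAnnotation("page-15", "Lisa Watson", frozenset({"rice"}), True),
    PageAnnotation("page-20", "Tim Andrews", frozenset({"rice", "iron"}), False),
    PageAnnotation("page-27", "Lisa Watson", frozenset({"rice", "iron", "PCR"}), True),
    PageAnnotation("page-33", "Another User", frozenset({"rice", "iron"}), True),
]

print(pages_matching(ANNOTATIONS, {"Tim Andrews", "Lisa Watson"}, {"rice", "iron"}))
```

In the system described here the same conditions would be expressed as graph patterns over the annotation, provenance and LIMS vocabularies, which is what makes the query span several ontologies.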
Concluding Remarks
• Although several electronic laboratory notebooks (ELNs) have been proposed, and replacing paper-based records has been a consistent trend for several years, the technology has not yet been widely adopted. Laboratory Information Management Systems (LIMS) in combination with paper-based laboratory notebooks continue to be commonly used, particularly in academic environments.
Concluding Remarks
• Sharing and organizing information happens on a concept basis
– Researchers studying genes involved in iron transport share information with those who undertake nutritional studies assessing the effects of iron intake in human populations
– Clustering information based on concepts
Concluding Remarks
• Simple tagging mechanisms proved to be valuable resources for organizing information
– Tag clouds were used as tables of contents (TOCs)
– Tags were also used to support a quick view of laboratory pages
– Tags tend to stabilize over time
– Tags were a valuable resource of terms and evidence (use cases) for those terms
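How a tag cloud can double as a table of contents is easy to sketch: count how many pages carry each tag, then list the pages under each tag. The code below is an illustrative assumption about the mechanism, not the prototype's implementation; the function names and the sample pages are hypothetical.

```python
from collections import Counter, defaultdict


def tag_cloud(page_tags):
    """Number of pages carrying each tag (the weight of the tag in the cloud)."""
    return Counter(tag for tags in page_tags.values() for tag in set(tags))


def tag_toc(page_tags):
    """Tag -> sorted list of pages carrying it, usable as a table of contents."""
    toc = defaultdict(list)
    for page, tags in page_tags.items():
        for tag in set(tags):
            toc[tag].append(page)
    return {tag: sorted(pages) for tag, pages in sorted(toc.items())}


# Made-up notebook pages (page number -> tags) for illustration only.
PAGES = {
    1: ["PCR", "rice"],
    2: ["rice", "iron"],
    3: ["PCR"],
}

print(tag_cloud(PAGES).most_common())
for tag, pages in tag_toc(PAGES).items():
    print(f"{tag}: pages {pages}")
```

The same per-tag page lists also give the quick view of laboratory pages mentioned above: selecting a tag returns the pages it appears on.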
Concluding Remarks
• Time is difficult to model
• Incremental prototyping and participatory design were key: community engagement
• Limitations in the technology:
– Tablets, electronic pen, iPad (first generation), now Motorola XOOM
– Browser compatibility
• Laboratory notebooks look like specialized wikis
Future Work
• Focus on one technology: Android OS
• Semantic LIMS
• Support the whole cycle (LIMS record, notebook, machine-generated data)
• Automatic annotation of machine-generated data
• Adopt minimal amounts of information
• Adopt techniques from Personal Information Management approaches
• Look more like a wiki
Acknowledgments
• John Bateman, Oscar Corcho, Joe Tohme, Cesar Montana, Alberto Labarga
• The CIAT biotech lab