SMART Protocols: SeMAntic RepresenTation for Experimental Protocols Olga Giraldo [email protected] Ontology engineering group (OEG) Universidad Politécnica de Madrid

SMART Protocols in LISC-2014



Agenda

• What is a lab protocol

• Motivation

• Our general research question

• Our assumption

• Our proposal

• Preliminary results

• Future work

What is a lab protocol

• Laboratory protocols are like cooking recipes:
  • They have ingredients: reagents and samples.
  • They have appliances: equipment.
  • They have a total time.
  • They have a list of instructions.
  • They have critical steps.

• Laboratory protocols describe "how to do" an experiment.

Some problems in lab protocols

Some of them present insufficient granularity.

The instructions can be imprecise or ambiguous due to the use of natural language, for example:

• Incubate the centrifuge tubes in a water bath.

• Incubate the samples for 5 min with gentle shaking.

• Rinse DNA briefly in 1-2 ml of wash.

• Incubate at -20C overnight.

Why do we need to formalize and extract information from lab protocols?

Because we want a recommendation system…

• That matches protocols according to my situation, for instance:
  • samples I have,
  • availability of equipment, reagents, lab conditions,
  • expertise.

We also want content-based information retrieval

• Meaningful sentences, sample used, purpose of the protocol, applicability, critical steps, etc. Also, identification of instructions.

• Find all protocols for DNA extraction that have been used in Oryza sativa that are suitable for processing a large number of samples with a low execution time.
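The DNA extraction query above can be read as a filter over structured protocol metadata. The following is only a minimal sketch of that idea: the `ProtocolRecord` fields, the `find_protocols` helper, the thresholds, and the example records are all hypothetical and are not taken from the SMART Protocols ontology or from the analysed protocols.

```python
from dataclasses import dataclass


@dataclass
class ProtocolRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str
    purpose: str
    organism: str
    max_samples: int      # samples that can be processed in one run
    total_time_min: int   # total execution time in minutes


def find_protocols(records, purpose, organism, min_samples, max_time_min):
    """Return records for the given purpose and organism that handle at
    least min_samples and finish within max_time_min, fastest first."""
    hits = [
        r for r in records
        if r.purpose == purpose
        and r.organism == organism
        and r.max_samples >= min_samples
        and r.total_time_min <= max_time_min
    ]
    return sorted(hits, key=lambda r: r.total_time_min)


# Invented example data, used only to exercise the filter.
records = [
    ProtocolRecord("Protocol A", "DNA extraction", "Oryza sativa", 96, 90),
    ProtocolRecord("Protocol B", "DNA extraction", "Oryza sativa", 8, 240),
    ProtocolRecord("Protocol C", "DNA extraction", "Zea mays", 96, 60),
    ProtocolRecord("Protocol D", "RNA extraction", "Oryza sativa", 96, 45),
]

matches = find_protocols(records, "DNA extraction", "Oryza sativa",
                         min_samples=48, max_time_min=120)
titles = [r.title for r in matches]
```

A filter like this only works once purpose, sample, and execution time are available as explicit fields, which is the reason for formalizing the protocols in the first place.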

Motivation

Currently…

Semi-structured information

Unstructured information

How to formalize the information from laboratory protocols as a knowledge base?

Ontologies + NLP tools

Our assumption

“Experimental protocols are fundamental information structures that should support the description of the processes by means of which results are generated in experimental research”

Our proposal

Methods to represent and extract information

• Gazetteer-based method: use existing lists of named entities (lists of proper nouns, which refer to real-life entities)

• Rule-based approaches: write manual extraction rules

• Combination of the above

• Ontology model representing lab protocols

work in progress
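As a rough illustration of how gazetteer lookup and hand-written rules might be combined, the sketch below tags a single instruction. The gazetteer entries, the two regular expressions, and the `extract` function are assumptions made for this example; they are not the rules used in SMART Protocols, which are still to be written.

```python
import re

# Hypothetical gazetteer: hand-made lists of known entity names per type.
GAZETTEER = {
    "equipment": ["water bath", "centrifuge tubes"],
    "sample": ["DNA", "samples"],
}

# Hypothetical hand-written rules: patterns for durations and temperatures.
RULES = {
    "duration": re.compile(r"\b\d+(?:\.\d+)?\s*(?:min|h|s)\b"),
    "temperature": re.compile(r"-?\d+(?:\.\d+)?\s*°?\s*C\b"),
}


def extract(instruction):
    """Return a dict mapping each entity type to the mentions found."""
    found = {}
    # Gazetteer step: whole-word, case-insensitive lookup of listed names.
    for label, names in GAZETTEER.items():
        hits = [
            name for name in names
            if re.search(r"\b" + re.escape(name) + r"\b", instruction,
                         re.IGNORECASE)
        ]
        if hits:
            found[label] = hits
    # Rule step: apply each regular expression to the same instruction.
    for label, pattern in RULES.items():
        hits = pattern.findall(instruction)
        if hits:
            found[label] = hits
    return found


tagged = extract("Incubate the samples for 5 min with gentle shaking.")
vague = extract("Incubate the centrifuge tubes in a water bath.")
```

In this sketch `vague` contains equipment mentions but neither a duration nor a temperature, which is one simple way to flag an instruction as underspecified.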

Ontology development

Methodology used to develop SMART Protocols

Kick-off

• Gathering use cases.

• Gathering competency questions.

Conceptualization & Formalization

• DAKA - Domain Analysis and Knowledge Acquisition: analysis of 175 experimental protocols. [1]

• LISA - Linguistic and Semantic Analysis:
  • Identification of key metadata for reporting protocols. [2]
  • Determination of workflow aspects in protocols (implicit order in the instructions, following the input-output structure).
  • Extraction of elements pertaining to domain knowledge (e.g. classification of protocols into groups according to their purpose; within each group, basic steps or common patterns were identified according to the type of protocol).

• IO - Iterative Ontology building: design of conceptual maps and draft ontologies. The ontology modules were gathered from the DAKA and LISA activities and exchanged with domain experts.

Evaluation & Evolution

• OWL

• Correction of syntactic inconsistencies by using OWLViz [3] and OOPS [4].

• The ontology model evolves as new knowledge goes through the whole cycle.

[1] http://goo.gl/MC4mR9
[2] goo.gl/gAVnn
[3] http://protegewiki.stanford.edu/wiki/OWLViz
[4] http://oeg-lia3.dia.fi.upm.es/oops/index-content.jsp

SMART Protocols - document

• It is an extension of the IAO ontology.

• It supports rhetorical and structural components (e.g. introduction, materials, and methods).

• It supports information such as the application of the protocol, advantages and limitations, list of reagents, and critical steps.

SMART Protocols ontology is available here:

http://vocab.linkeddata.es/SMARTProtocols/

SMART Protocols - wf

• It is an extension of the P-Plan Ontology.

• It represents the workflow aspects of protocols: the implicit order of the instructions, following the input-output structure.

SMART Protocols ontology is available here:

http://vocab.linkeddata.es/SMARTProtocols/
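The idea that instruction order is implicit in the input-output structure can be sketched with plain Python. The step and variable names below are invented for illustration, and the dictionary keys merely borrow the names of the P-Plan properties `hasInputVar` and `hasOutputVar`; this is not how the ontology itself is queried.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical steps, deliberately listed out of order. Each step names the
# variables it consumes and the variables it produces.
steps = {
    "wash_dna": {"hasInputVar": {"dna_pellet"}, "hasOutputVar": {"clean_dna"}},
    "lyse_tissue": {"hasInputVar": {"leaf_sample"}, "hasOutputVar": {"lysate"}},
    "precipitate_dna": {"hasInputVar": {"lysate"}, "hasOutputVar": {"dna_pellet"}},
}


def order_steps(steps):
    """Derive an execution order: a step must run after every step that
    produces one of its input variables."""
    # Map each variable to the step that outputs it.
    producer = {}
    for name, step in steps.items():
        for var in step["hasOutputVar"]:
            producer[var] = name
    # For each step, collect the steps it depends on (its predecessors).
    # Inputs with no producer (e.g. the starting sample) add no dependency.
    graph = {
        name: {producer[var] for var in step["hasInputVar"] if var in producer}
        for name, step in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


ordered = order_steps(steps)
```

Because every step here consumes exactly what the previous one produces, only one order is possible; with independent branches a topological sort would return one of several valid orders.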

New and reused terms

• Reused classes = 52

Resource         No. of terms
OBI              15
NCI Thesaurus    9
CHEBI            7
IAO              7
MGED Ontology    3
P-Plan           3
NPO              3
EXACT            2
SO               2
MeSH             1

• Reused properties = 4

Property            Origin    Reused in
isManufacturedBy    OBI       SMART Protocols-Document
hasInputVar         P-Plan    SMART Protocols-Workflow
hasOutputVar        P-Plan    SMART Protocols-Workflow
isStepOfPlan        P-Plan    SMART Protocols-Workflow

• New terms

Ontology                    No. of classes    No. of properties
SMART Protocols-Document    60                7
SMART Protocols-Workflow    44                1
Total                       104               8

Future work

• Analysis of the protocols, focusing on the identification of keywords and/or constructs in English (e.g. instructions, actions).

• Writing rules.

• Executing, testing and debugging the rules.

Work in progress

Summarizing…

Our purpose is the formalization of lab protocols by using ontologies and NLP tools to intelligently extract information.

Special thanks…

Supervisors: Oscar Corcho, Alexander Garcia

OEG's colleagues: Daniel Garijo, María Poveda, Pablo Calleja, Nandana Mihindukulasooriya
