Semantics and Services Enabled Problem Solving Environment for Trypanosoma cruzi Amit Sheth , Satya Sahoo , Priti Parikh NCBO 2010 January 20, 2010 Kno.e.sis Center , Wright State University Thursday, January 28, 2010


Semantics and Services Enabled Problem Solving Environment for Trypanosoma cruzi

Amit Sheth, Satya Sahoo, Priti Parikh

NCBO 2010, January 20, 2010

Kno.e.sis Center, Wright State University

Thursday, January 28, 2010


Trypanosoma cruzi

• T. cruzi is a protozoan parasite that causes Chagas Disease or American trypanosomiasis

• Chagas disease is the leading cause of death in Latin America where around 18 million people are infected with this parasite

• Related parasites include Trypanosoma brucei and Leishmania major, which cause African trypanosomiasis and leishmaniasis, respectively.

T. brucei surrounded by red blood cells in a smear of infected blood. (Copyright: Jürgen Berger and Dr. Peter Overath, Max Planck Institute for Developmental Biology, Tübingen)


Project Outline

• Data Sources
  o Internal Lab Data: Gene Knockout, Strain Creation, Microarray, Proteome
  o External Database

• Ontological Infrastructure
  o Parasite Lifecycle
  o Parasite Experiment

• Query processing
  o Cuebee

• Results


Collaborating Institutions

Tarleton Research Group, Center for Tropical and Emerging Global Diseases (CTEGD), University of Georgia

Large Scale Distributed Information Systems, LSDIS Lab, University of Georgia

National Center for Biological Ontologies, NCBO, Stanford University

The Wellcome Trust Sanger Institute, Cambridge, UK

The Oswaldo Cruz Institute (Fiocruz), Brazil


Project Generated Resources

• Trykipedia: Wiki-based discussion and dissemination platform for the parasite community
  o http://knoesis.wright.edu/trykipedia

• Parasite Knowledge Repository (PKR)
  o Parasite Lifecycle Ontology
  o Parasite Experiment Ontology

• Cuebee: platform that provides an intuitive interface to query biological data semantically


Trykipedia: a Wiki-based collaboration platform for the parasite research community


PLO on Trykipedia


Each PLO and PEO class has descriptive texts along with images and external links or references (as appropriate)


Parasite Knowledge Repository (PKR)

• PKR will support complex biological queries related to T. cruzi drugs, vaccination, or gene knockout targets; for example:
  o Find all genes with proteomic expression in mammalian lifecycle stage with GPI anchor or signal peptide predictions.
  o Find genes annotated as potential vaccine candidates.
  o Find all genes with proteomic expression evidence in the mammalian host lifecycle stages for T. cruzi.

• Data
  o Internal lab data (from Tarleton Research Group): Gene Knockout, Strain Creation, Microarray, and Proteome
  o External databases (TriTrypDB, ProtozoaDB, Drug Bank, etc.)

• Ontologies
  o Parasite Lifecycle Ontology (PLO)
  o Parasite Experiment Ontology (PEO)


Parasite Lifecycle Ontology (PLO)

• Models lifecycle stages of T. cruzi, T. brucei, and L. major in OWL

• All the entities are linked to each other by explicitly modeled named relationships, for example, T.cruzi → has_vector_organism → triatominae

• Currently has 41 classes and 5 properties with a description logic expressivity of ALU.

• Collaboration with the Sanger Institute (UK) and Oswaldo Cruz Institute (Brazil)


Parasite Experiment Ontology (PEO)

• Models gene knockout, strain creation, microarray, and proteomics experiment data
  o Process, instrument, parameter, and sample details used to annotate experimental results with provenance metadata

• 110 classes and 23 properties with a description logic expressivity of ALCHQ(D)

• Named relationships, e.g., Tcruzi_lifecyclestage_subsample → part_of → Tcruzi_sample and Tcruzi_lifecyclestage_subsample → is_located_in → spatial_parameter
  o Provide important information about research and provenance


Provenance for GKO and SC Protocols


New Parasite Strains


T. cruzi Provenance System (TPS) for GKO and

• Capture
  o Web pages used in experiments
  o Transform data into RDF instance data corresponding to the PEO schema

• Modeling

• Storage
  o Oracle 10g (release 10.2.0.3.0) RDF database management system (DBMS)

• Query Analysis
  o Provenance query operators
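
The "transform data into RDF instance data" step can be illustrated with a minimal sketch. It is not the TPS implementation: the namespace URIs, the record layout, and the identifiers below are assumptions made for illustration, and only the property names has_agent, has_participant, and has_temporal_value come from the slides. The sketch turns one captured record into N-Triples lines typed against a PEO-like class.

```python
# Hypothetical namespaces: the real PEO and lab-data URIs are not given in the slides.
PEO = "http://example.org/peo#"
DATA = "http://example.org/tps/data/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap an absolute URI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(value):
    """Render a plain string literal, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def record_to_ntriples(record):
    """Turn one captured experiment record (a dict) into N-Triples lines."""
    subject = uri(DATA + record["id"])
    triples = [(subject, uri(RDF_TYPE), uri(PEO + record["type"]))]
    # Links point at other resources captured from the same experiment.
    for prop, target in record.get("links", {}).items():
        triples.append((subject, uri(PEO + prop), uri(DATA + target)))
    # Values are stored as plain literals (a simplification).
    for prop, value in record.get("values", {}).items():
        triples.append((subject, uri(PEO + prop), literal(value)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


# Illustrative record; the identifiers are made up.
record = {
    "id": "transfection_1",
    "type": "transfection",
    "links": {"has_agent": "researcher_1", "has_participant": "Tcruzi_sample_1"},
    "values": {"has_temporal_value": "2008-04-25"},
}
lines = record_to_ntriples(record)
```

Each resulting line is one RDF statement, so the output could be bulk-loaded into any store that accepts N-Triples.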


Provenance in Parasite Research

*T.cruzi Semantic Problem Solving Environment Project, Courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia

[Figure: Gene Knockout and Strain Creation*. Labels: Gene Name, ?, Cloned Sample]


[Figure: gene knockout and strain creation workflow. Process labels: SequenceExtraction, PlasmidConstruction, Transfection, DrugSelection, CellCloning. Data labels: Gene Name, 3' & 5' Region, Knockout Construct Plasmid, Drug Resistant Plasmid, Transfected Sample, Selected Sample, Cloned Sample, T. cruzi sample]

Related Queries from Biologists

• List all groups in the lab that used a Target Region Plasmid.

• Which researcher created a new strain of the parasite (with ID = 66)?

• An experiment was not successful. Has this experiment been conducted earlier? What were the results?


Provenance Management for Scientific Data

• Provenance, from the French word “provenir”, describes the lineage or history of a data entity

• For Verification and Validation of Data Integrity, Process Quality, and Trust

• Issues in Provenance Management
  o Provenance Modeling
  o A Dedicated Query Infrastructure
  o Practical Provenance Management Systems


Ontologies for Provenance Modeling

• Advantages of using Ontologies
  o Formal Description: Machine Readability, Consistent Interpretation
  o Use Reasoning: Knowledge Discovery over Large Datasets

• Problem: A gigantic, monolithic Provenance Ontology is not feasible

• Solution: Modular Approach using a Foundational Ontology

[Figure: modular approach. Labels: FOUNDATIONAL ONTOLOGY, GLYCOPROTEIN EXPERIMENT, OCEANOGRAPHY, PARASITE EXPERIMENT]


Provenir Ontology

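[Figure: the gene knockout and strain creation workflow annotated with Provenir concepts. Classes: PROCESS, AGENT, DATA. Properties: has_agent, participates_in. Workflow labels: SequenceExtraction, PlasmidConstruction, Transfection, DrugSelection, CellCloning, Gene Name, 3' & 5' Region, Knockout Construct Plasmid, Drug Resistant Plasmid, Transfected Sample, Selected Sample, Cloned Sample, T. cruzi sample]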


Provenir Ontology Schema

[Figure: Provenir ontology schema]

• Classes: PROCESS, AGENT, DATA

• is_a: DATA COLLECTION and PARAMETER (under DATA); SPATIAL, THEMATIC, and TEMPORAL (under PARAMETER)

• Properties: has_agent, participates_in, preceded_by, located_in, has_temporal_value
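
The is_a structure of the schema can be sketched as a small lookup table. This is a reading of the flattened diagram rather than the published ontology file: it assumes DATA COLLECTION and PARAMETER specialize DATA, and that SPATIAL, THEMATIC, and TEMPORAL specialize PARAMETER. The helper names are hypothetical.

```python
# Child -> parent links, as read from the schema slide (an assumption, see lead-in).
IS_A = {
    "DATA COLLECTION": "DATA",
    "PARAMETER": "DATA",
    "SPATIAL": "PARAMETER",
    "THEMATIC": "PARAMETER",
    "TEMPORAL": "PARAMETER",
}


def ancestors(cls):
    """Return the chain of superclasses of `cls`, nearest first."""
    chain = []
    while cls in IS_A:
        cls = IS_A[cls]
        chain.append(cls)
    return chain


def is_subclass(child, parent):
    """True if `child` equals `parent` or reaches it by following is_a links."""
    return child == parent or parent in ancestors(child)


temporal_chain = ancestors("TEMPORAL")
```

Because is_a is transitive, a query for all DATA entities would also return every parameter and data collection. This is the kind of inference that rule-based reasoning over the schema makes available.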


Domain-specific Provenance: Parasite Experiment ontology

[Figure: Parasite Experiment ontology classes linked by is_a to Provenir ontology classes]

PROVENIR ONTOLOGY

• Classes: agent, process, data, data_collection, parameter, spatial_parameter, domain_parameter, temporal_parameter

• Properties: has_agent, has_participant, has_parameter

PARASITE EXPERIMENT ONTOLOGY

• Classes: sample, Tcruzi_sample, Time:DateTimeDescription, transfection_buffer, cell_cloning, strain_creation_protocol, transfection_machine, transfection, drug_selection, location

*Parasite Experiment ontology available at: http://wiki.knoesis.org/index.php/Trykipedia


Provenance Query Classification

Classified Provenance Queries into Three Categories

• Type 1: Querying for Provenance Metadata
  o Example: Which gene was used to create the cloned sample with ID = 66?

• Type 2: Querying for Specific Data Set
  o Example: Find all knockout construct plasmids created by researcher Michelle using the “Hygromycin” drug resistant plasmid between April 25, 2008 and August 15, 2008

• Type 3: Operations on Provenance Metadata
  o Example: Were the two cloned samples 65 and 46 prepared under similar conditions? Compare the associated provenance information


Provenance Query Operators

Four Query Operators, based on Query Classification

• provenance(): closure operation, returns the complete set of provenance metadata for the input data entity

• provenance_context(): given a set of constraints defined on provenance, retrieves datasets that satisfy the constraints

• provenance_compare(): adapts the RDF graph equivalence definition

• provenance_merge(): two sets of provenance information are combined using the RDF graph merge
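
The four operators can be sketched over an in-memory set of (subject, predicate, object) triples. This is a toy illustration under stated assumptions, not the query engine from the talk: the data and identifiers are made up, the closure only follows has_participant and preceded_by, and because the graphs contain no blank nodes, graph equivalence reduces to set equality and graph merge to set union.

```python
# Toy provenance graph; all identifiers are illustrative.
TRIPLES = {
    ("transfection_1", "has_agent", "researcher_1"),
    ("transfection_1", "has_participant", "transfected_sample_1"),
    ("drug_selection_1", "preceded_by", "transfection_1"),
    ("drug_selection_1", "has_participant", "selected_sample_1"),
    ("cell_cloning_1", "preceded_by", "drug_selection_1"),
    ("cell_cloning_1", "has_participant", "cloned_sample66"),
    ("cell_cloning_1", "has_agent", "researcher_2"),
}


def provenance(entity, triples):
    """Closure: all triples about processes that `entity` took part in,
    plus every process reachable from them through preceded_by."""
    pending = [s for s, p, o in triples if p == "has_participant" and o == entity]
    seen, result = set(), set()
    while pending:
        process = pending.pop()
        if process in seen:
            continue
        seen.add(process)
        for s, p, o in triples:
            if s == process:
                result.add((s, p, o))
                if p == "preceded_by":
                    pending.append(o)  # walk back to the earlier process
    return result


def provenance_context(constraints, triples):
    """Data entities that took part in processes satisfying every
    (predicate, object) constraint."""
    processes = {s for s, _, _ in triples}
    matching = {proc for proc in processes
                if all((proc, p, o) in triples for p, o in constraints)}
    return {o for s, p, o in triples if s in matching and p == "has_participant"}


def provenance_compare(graph_a, graph_b):
    """Equivalence of two blank-node-free graphs: same set of triples."""
    return set(graph_a) == set(graph_b)


def provenance_merge(graph_a, graph_b):
    """Merge of two blank-node-free graphs: union of their triples."""
    return set(graph_a) | set(graph_b)


cloned = provenance("cloned_sample66", TRIPLES)
selected = provenance("selected_sample_1", TRIPLES)
by_researcher_1 = provenance_context([("has_agent", "researcher_1")], TRIPLES)
```

In this toy graph the provenance of the selected sample is a subset of the provenance of the cloned sample, so merging the two yields a graph equivalent to the latter.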


Answering Provenance Queries using provenance () Operator


Provenance Query Engine

• Available as an API for integration with provenance management systems

• Layer on top of an RDF data store (Oracle 10g); requires support for:
  o Rule-based reasoning
  o SPARQL query execution

• Input:
  o Type of provenance query operator: provenance()
  o Input value to the query operator: cloned sample 66
  o User details to connect to the underlying RDF store
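The three inputs listed above could be bundled into a single request object. The sketch below is only an assumption about how a client might shape that request; the class name, field names, and `validate` helper are hypothetical and are not taken from the engine's actual API.

```python
from dataclasses import dataclass

# Operator names as listed on the "Provenance Query Operators" slide.
SUPPORTED_OPERATORS = {
    "provenance",
    "provenance_context",
    "provenance_compare",
    "provenance_merge",
}


@dataclass(frozen=True)
class ProvenanceQueryRequest:
    """Hypothetical container for the three engine inputs."""
    operator: str        # type of provenance query operator, e.g. "provenance"
    input_value: str     # input value to the operator, e.g. "cloned sample 66"
    store_user: str      # user details for the underlying RDF store (assumed fields)
    store_password: str


def validate(request):
    """Reject requests that name an unknown operator or carry no input value."""
    if request.operator not in SUPPORTED_OPERATORS:
        raise ValueError(f"unknown provenance query operator: {request.operator}")
    if not request.input_value:
        raise ValueError("input value must not be empty")
    return request
```

A request for the slide's example would be `ProvenanceQueryRequest("provenance", "cloned sample 66", user, password)`, validated before any query is sent to the store.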

Evaluation Results

• Queries expressed in SPARQL

• Datasets using real experiment data

Query ID                         Number of Variables   Total Number of Triples   Nesting Levels using OPTIONAL
Query 1: Target plasmid          25                    84                        4
Query 2: Plasmid_66              38                    110                       5
Query 3: Transfection attempts   67                    190                       7
Query 4: cloned_sample66         67                    190                       7

Dataset ID   Number of RDF Inferred Triples   Total Number of RDF Triples
DS 1         2,673                            3,553
DS 2         3,470                            4,490
DS 3         4,988                            6,288
DS 4         47,133                           60,912
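The "nesting levels using OPTIONAL" column of the query table counts how deeply OPTIONAL blocks sit inside one another. As a rough illustration, the Python sketch below builds a SPARQL query in which each triple pattern after the first is nested inside the previous pattern's OPTIONAL block, then measures the nesting depth. The `ex:` prefix and the predicate names are hypothetical; the actual evaluation queries are not shown in the slides.

```python
def nested_optional(patterns):
    """Nest each pattern after the first inside the OPTIONAL of the one before it."""
    if not patterns:
        return ""
    head, rest = patterns[0], patterns[1:]
    if not rest:
        return head
    return f"{head} OPTIONAL {{ {nested_optional(rest)} }}"


def build_query(patterns):
    """Wrap the nested patterns in a SELECT query with an assumed example prefix."""
    prefix = "PREFIX ex: <http://example.org/>"
    return f"{prefix} SELECT * WHERE {{ {nested_optional(patterns)} }}"


def nesting_level(query):
    """Maximum OPTIONAL depth, found by tracking which open braces follow OPTIONAL."""
    tokens = query.replace("{", " { ").replace("}", " } ").split()
    stack, deepest, pending = [], 0, False
    for tok in tokens:
        if tok == "OPTIONAL":
            pending = True
        elif tok == "{":
            stack.append(pending)      # True only for braces opened by OPTIONAL
            pending = False
            deepest = max(deepest, sum(stack))
        elif tok == "}":
            stack.pop()
    return deepest


# Five chained patterns give four nested OPTIONAL levels.
PATTERNS = [
    "?sample ex:derives_from ?attempt .",
    "?attempt ex:used ?plasmid .",
    "?plasmid ex:derives_from ?target .",
    "?target ex:created_by ?process .",
    "?process ex:performed_by ?agent .",
]
QUERY = build_query(PATTERNS)
```

Sibling OPTIONAL blocks at the same depth do not increase the level under this definition; only blocks nested inside other OPTIONAL blocks do.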

Query Optimization: Materialized Provenance Views

• Materializes a single logical unit of provenance

• Does not require query-rewriting

• View updates: addressed by characteristics of provenance

• Created using a memoization approach
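The memoization idea can be sketched as a cache keyed by data entity: the first provenance() call for an entity computes and stores its provenance, and later calls return the stored view. The Python below is a minimal sketch under that assumption; the class name, the `toy_closure` stand-in, and the derivation chain are hypothetical, and view updates are not handled.

```python
class MaterializedProvenanceViews:
    """Memoizes the result of a provenance closure computation per entity."""

    def __init__(self, compute_provenance):
        self._compute = compute_provenance  # expensive closure query against the store
        self._views = {}                    # entity -> materialized provenance view
        self.store_queries = 0              # how often the store was actually queried

    def provenance(self, entity):
        # The caller uses the same call either way, so no query rewriting is needed.
        if entity not in self._views:
            self.store_queries += 1
            self._views[entity] = frozenset(self._compute(entity))
        return self._views[entity]


def toy_closure(entity):
    """Stand-in for the real closure query; walks a hypothetical derivation chain."""
    chain = {"cloned_sample66": "plasmid_66", "plasmid_66": "target_plasmid"}
    triples = []
    while entity in chain:
        triples.append((entity, "derives_from", chain[entity]))
        entity = chain[entity]
    return triples


views = MaterializedProvenanceViews(toy_closure)
first = views.provenance("cloned_sample66")   # computed and materialized
second = views.provenance("cloned_sample66")  # served from the stored view
```

Each stored entry corresponds to one logical unit of provenance, namely the full provenance of a single entity.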

Provenance Query Engine Architecture

[Architecture diagram; surviving labels: TRANSITIVE CLOSURE, QUERY]

Evaluation Results using Materialized Provenance Views

Provenance Management System for Parasite Research

Semantics and Services Enabled Problem Solving Environment for T. cruzi

Work Done

• PKR
  o Development of ontologies
  o Conversion of internal lab data to RDF
  o Modeling of internal lab data to PEO

• Cuebee
  o Formulation of simple queries

• External Collaboration
  o Initiated with the Sanger Institute (UK) and Oswaldo Cruz Institute (Brazil)

Future Work

• PKR
  o Addition of external databases, e.g., TriTrypDB, DrugBank, etc.

• Cuebee
  o Formulation and execution of advanced and complex biological queries

• NCBO
  o Extensive collaboration on semantics-driven Web services using SA-REST and APIHut

• External Collaboration
  o Extensive collaboration to extend PLO with other human parasites
  o Expand the scope of PKR to support queries related to drug targets or repositioning (Oswaldo Cruz, Brazil)

Semantics and Services Enabled Problem Solving

Questions?

http://knoesis.wright.edu/trykipedia
