Coordinating data interoperability – a W3C perspective M. Scott Marshall, Ph.D. W3C HCLS IG co-chair Leiden University Medical Center University of Amsterdam http://staff.science.uva.nl/~marshall http://www.w3.org/blog/hcls


Page 1

Coordinating data interoperability – a W3C perspective

M. Scott Marshall, Ph.D.
W3C HCLS IG co-chair

Leiden University Medical Center
University of Amsterdam

http://staff.science.uva.nl/~marshall
http://www.w3.org/blog/hcls

Page 2

The Semantic Web is the New Global Web of pre- and post-competitive Knowledge

It is about standards for publishing, sharing and querying knowledge drawn from diverse sources

It makes it possible to answer sophisticated questions using background knowledge

Adapted from: Michel Dumontier

Page 3

Changing tide favors data sharing

• Pharmaceutical industry changing strategy from one size fits all to tailored therapy

• Personalized Medicine

• David Cox (Pfizer) Strategy: Academic / Industry partnership, wellness: rare variants that protect against disease

• Biobanking – Data stewards that can facilitate search for rare variants

• EHRs

Page 4

Background of the HCLS IG

• Originally chartered in 2005
  – Chairs: Eric Neumann and Tonya Hongsermeier

• Re-chartered in 2008
  – Chairs: Scott Marshall and Susie Stephens
  – Team contact: Eric Prud’hommeaux

• Broad industry participation
  – Over 100 members
  – Mailing list of over 600

• Background Information
  – http://www.w3.org/blog/hcls
  – http://esw.w3.org/topic/HCLSIG

Page 5

Mission of HCLS IG

• The mission of HCLS is to develop, advocate for, and support the use of Semantic Web technologies for
  – Biological science
  – Translational medicine
  – Health care

• These domains stand to gain tremendous benefit from adopting Semantic Web technologies, as they depend on the interoperability of information from many domains and processes for efficient decision support

Page 6

Translating across domains

[Figure: data sources to be connected across domains: Microarray, MRI, PubMed, AlzForum, EHR]

Page 7

Current Task Forces

• BioRDF – federating (neuroscience) knowledge bases
  – M. Scott Marshall (Leiden University Medical Center / University of Amsterdam)

• Clinical Observations Interoperability – patient recruitment in trials
  – Vipul Kashyap (Cigna Healthcare)

• Linking Open Drug Data – aggregation of Web-based drug data
  – Susie Stephens (Johnson & Johnson)

• Translational Medicine Ontology – high level patient-centric ontology
  – Michel Dumontier (Carleton University)

• Scientific Discourse – building communities through networking
  – Tim Clark (Harvard University)

• Terminology – Semantic Web representation of existing resources
  – John Madden (Duke University)

Page 8

Some HCLS deliverables

• Translational Medicine Ontology
• Linked Open Drug Data
• CDISC – HL7 ‘light-weight’ bridge for patient eligibility studies
• HCLS Knowledge Base
• Emerging Best Practices
• Expression RDF
• Federation

Page 9

BioRDF: Translating across domains

[Figure: data sources to be connected across domains: Microarray, MRI, PubMed, AlzForum, EHR]

Page 10

A Bottom-up Approach

[Figure: layers of the bottom-up approach: Provenance models; Workflow, experimental design; Domain ontologies (DO, GO…); Community models; Raw Data; Results]

Questions
• Which genes are markers for neurodegenerative diseases?
• Was gene ALG2 differentially expressed in multiple experiments?

Provenance of Microarray experiment
• What software was used to analyse the data?
• How can the experiment be replicated?

Source: Helena Deus

Page 11

LODD: Translating across domains

[Figure: data sources to be connected across domains: Microarray, MRI, PubMed, AlzForum, EHR]

Page 12

The Linked Data Cloud

Source: Chris Bizer

Page 13

LODD

Page 14

TripleMap

Page 15

Pistoia Alliance Vocabulary Services Initiative

“The life sciences industry currently operates in an environment where few of the basic components of its study (e.g. genes, proteins, cells, diseases, biomarkers, assays, drugs and technologies) are described using consistent, universally agreed-upon vocabularies.”

Page 16

Homonyms

PSA
• Prostate Specific Antigen
• PSoriatic Arthritis
• alpha-2,8-PolySialic Acid
• PolySubstance Abuse
• Picryl Sulfonic Acid
• Polymeric Silicic Acid
• Partial Sensory Agnosia
• Poultry Science Association

Source: Martijn Schuemie
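The ambiguity can be made concrete with a minimal sketch: one abbreviation, eight senses, and therefore eight distinct identifiers that a terminology service would need to tell apart. The expansions are those listed above; the `example.org` URIs and the helper names `slug` and `candidate_uris` are hypothetical placeholders, not entries from any real terminology server.

```python
import re

# The expansions are the ones listed on the slide. The example.org URIs are
# hypothetical placeholders, not identifiers from a real terminology server.
SENSES = {
    "PSA": [
        "Prostate Specific Antigen",
        "PSoriatic Arthritis",
        "alpha-2,8-PolySialic Acid",
        "PolySubstance Abuse",
        "Picryl Sulfonic Acid",
        "Polymeric Silicic Acid",
        "Partial Sensory Agnosia",
        "Poultry Science Association",
    ]
}


def slug(label):
    """Lowercase a label and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def candidate_uris(abbreviation):
    """Return one (hypothetical) URI per known sense of an abbreviation."""
    return {
        label: "http://example.org/term/" + slug(label)
        for label in SENSES.get(abbreviation, [])
    }
```

A bare string match on "PSA" cannot choose among these candidates; only the surrounding context, or an explicit identifier in the data, can.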

Page 17

Shared Identifiers

• Must use common URIs in order to link data

• Provenance-related identifiers still needed:
  – Identifiers for people (researchers)
  – Identifiers for diseases
  – Identifiers for terms (Terminology servers)
  – Identifiers for programs, processes, workflows
  – Identifiers for chemical compounds

• Shared Names http://sharednames.org
• Bio2RDF
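Why common URIs matter can be sketched in a few lines: when two independently published datasets name the same compound with the same URI, merging them is a simple grouping step with no mapping table. Every URI, predicate, and value below is a hypothetical placeholder, and `link_by_subject` is an illustrative helper rather than part of any HCLS tool.

```python
from collections import defaultdict

# All URIs, predicates, and values are hypothetical placeholders.
COMPOUND = "http://example.org/compound/C1"

# Two datasets published independently, as (subject, predicate, object) triples.
TRIAL_DATA = [
    (COMPOUND, "ex:testedIn", "http://example.org/trial/T1"),
]
ADVERSE_EVENT_DATA = [
    (COMPOUND, "ex:reportedEvent", "http://example.org/event/E1"),
    ("http://example.org/compound/C2", "ex:reportedEvent",
     "http://example.org/event/E2"),
]


def link_by_subject(*datasets):
    """Group statements from all datasets under their shared subject URI."""
    linked = defaultdict(list)
    for dataset in datasets:
        for subject, predicate, obj in dataset:
            linked[subject].append((predicate, obj))
    return dict(linked)


linked = link_by_subject(TRIAL_DATA, ADVERSE_EVENT_DATA)
```

Had the two sources used different identifiers for the same compound, the statements would have landed under separate keys and the link would be lost.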

Page 18

Semagacestat - cancelled

• Lilly Alzheimer’s drug cancelled during Phase III trials because it worsened condition of patients

• Refresh rate of clinicaltrials.gov? A pharma consortium could do better.

Page 19

Ideally...

• Pre-competitive data sharing
  – Share public domain resources such as vocabularies, vocabulary preferences, and literature indexes (Pfizerpedia?)
  – Share annotated resources

• Post-competitive data sharing
  – Share drug development history info
  – Drug on the market? Release the clinical trial data!
  – Share adverse event information

Page 20

Views of linked data

• EHR
• PMR
• Clinical report
• Investigator-led Clinical trial
• Biobank
• Wet lab study

Page 21

Data Sharing Steps

• Establish common URI identifiers (PURLs)
• PURL server for vocabularies
• Create a registry of SPARQL endpoints
• Establish query patterns for data discovery, i.e. the structure of a federation
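The last two steps can be sketched together: a registry that maps source names to SPARQL endpoints, and one reusable query pattern that asks every registered endpoint about a shared identifier. The sketch only builds the query text, using the `SERVICE` keyword from SPARQL 1.1 Federated Query; it sends nothing over the network. The registry entries, the endpoint URLs, and the function name `federated_query` are assumptions made for illustration.

```python
# Hypothetical registry: the source names and endpoint URLs are placeholders.
ENDPOINT_REGISTRY = {
    "drugs": "http://example.org/drugs/sparql",
    "trials": "http://example.org/trials/sparql",
}


def federated_query(term_uri, registry):
    """Build a SPARQL query asking each registered endpoint which
    statements point at the given shared identifier."""
    blocks = []
    for name, endpoint in sorted(registry.items()):
        # One SERVICE block per endpoint; BIND records which source answered.
        blocks.append(
            f'  {{ SERVICE <{endpoint}> {{ ?s ?p <{term_uri}> }} '
            f'BIND("{name}" AS ?source) }}'
        )
    body = "\n  UNION\n".join(blocks)
    return f"SELECT ?source ?s ?p WHERE {{\n{body}\n}}"
```

Calling `federated_query("http://purl.example.org/vocab/T1", ENDPOINT_REGISTRY)` yields a single query with one `SERVICE` block per registered endpoint, so adding a data source to the federation means adding a registry entry rather than rewriting queries.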

Page 22

HCLS IG – how can it fit?

• Current HCLS IG is based on volunteer effort – chairs steer but deliverables are determined by fit with day job

• Would another type of group fit the goals of the pharma consortia better?

• An HCLS task force dedicated to project work for the collaboration

Page 23

Summary

• Cross-membership is more effective than occasional liaison calls

• Budget time and travel for interaction across the consortia

• Budget for contribution to existing projects such as NCBO, SWObjects

• Start a collaborative project based on sustainable services

Page 24

The End

“Science is built up of facts, as a house is built of stones; but an accumulation of facts is no more a science than a heap of stones is a house.”

– Henri Poincaré, Science and Hypothesis, 1905