LINKED DATA: THE CHALLENGES AHEAD

A PERSONAL PERSPECTIVE

LEE HARLAND, SCIBITE LIMITED

@SCIBITELY

PRESENTED AT OPEN PHACTS MEETING, VIENNA, FEB 2016

SciBite

http://openphacts.org http://scibite.com

Open PHACTS : Linked Data Future Challenges

Page 2: Open PHACTS : Linked Data Future Challenges

CONTEXT FOR SLIDESHARE

• THIS TALK WAS PRESENTED AT THE OPEN PHACTS CLOSING MEETING IN VIENNA, FEB 2016, WHEN THE OPEN PHACTS INFRASTRUCTURE WAS OFFICIALLY HANDED OVER TO THE OPEN PHACTS FOUNDATION

• THE AIM OF THE TALK WAS TO DISCUSS SOME OF THE KEY CHALLENGES OF THE ORIGINAL OPEN PHACTS PROJECT, BUT IN TODAY'S CONTEXT

• PLEASE VISIT HTTP://OPENPHACTS.ORG FOR INFORMATION ON THE OPEN PUBLIC-PRIVATE SEMANTICS-BASED PLATFORM FOR DRUG DISCOVERY AND HELP SUPPORT THIS VALUABLE INITIATIVE!

• PLEASE VISIT HTTP://SCIBITE.COM FOR INFORMATION ON OUR HIGH-THROUGHPUT SEMANTIC TOOLS FOR INTEGRATING SCIENTIFIC DOCUMENTS WITH “BIG DATA” SOLUTIONS AND TEXT MINING!

Page 3: Open PHACTS : Linked Data Future Challenges

IT STARTED IN 2009…

A meeting of multiple pharma companies to discuss key issues led to the Open PHACTS IMI call text

Page 4: Open PHACTS : Linked Data Future Challenges

2009 COMPETITION

http://readwrite.com/2009/12/15/twitters_top_10_tech_trends_of_2009

Even with a lot of cash, success is not a given

Page 5: Open PHACTS : Linked Data Future Challenges

AH… 2009…

It was a time of massive increase in data science / big data

Page 6: Open PHACTS : Linked Data Future Challenges

2016…

So if we were planning Open PHACTS in 2016, what would be on my mind?

Page 7: Open PHACTS : Linked Data Future Challenges

ACKNOWLEDGEMENTS

• BRYN WILLIAMS-JONES

• NICK LYNCH

• KIERA MCNEICE

• ANNA GAULTON

• ALL THOSE WHO GAVE SUGGESTIONS

• AND OPEN PHACTS CONSORTIUM FOR A GREAT 5 YEARS AND DELIVERING A UNIQUE SYSTEM

Page 8: Open PHACTS : Linked Data Future Challenges

CHALLENGE #1

Page 9: Open PHACTS : Linked Data Future Challenges

IF THIS IS NEWS TO YOU, YOU NEED TO STAY IN MORE

A second concern held by some is that a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites”

Rather than labelling people parasites, let's make the tools to ensure credit for all involved!

Page 10: Open PHACTS : Linked Data Future Challenges

…. Prior to this…

However, many of us who have actually conducted clinical research, managed clinical studies and data collection and analysis, and curated data sets have concerns about the details. The first concern is that someone not involved in the generation and collection of the data may not understand the choices made in defining the parameters. Special problems arise if data are to be combined from independent studies and considered comparable. How heterogeneous were the study populations? Were the eligibility criteria the same….

Actually a very fair comment

Page 11: Open PHACTS : Linked Data Future Challenges

RESEARCH REPRODUCIBILITY

See also http://reason.com/archives/2016/01/19/broken-science

This is really becoming a hot topic right now

Page 12: Open PHACTS : Linked Data Future Challenges

……estimates for the reproducibility of preclinical research range from 51 percent to 89 percent. They estimate that at least half of all U.S. preclinical biomedical research funding—about $28 billion annually—is therefore squandered……

http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165

But is this just an “academic” problem, or should industry care?

Page 13: Open PHACTS : Linked Data Future Challenges

http://www.cell.com/cell/abstract/S0092-8674%2809%2900316-X

Looks like a good lead; certainly some drug companies thought so

Page 15: Open PHACTS : Linked Data Future Challenges

http://f1000research.com/articles/5-136/v1

Another paper that was a hot idea back in the day…

Page 16: Open PHACTS : Linked Data Future Challenges

http://www.nature.com/nrd/journal/v10/n9/full/nrd3439-c1.html

A first-of-a-kind analysis of Bayer's internal efforts to validate 'new drug target' claims now not only supports this view but suggests that 50% may be an underestimate; the company's in-house experimental data do not match literature claims in 65% of target-validation projects, leading to project discontinuation.

Industry should *really* care

Page 17: Open PHACTS : Linked Data Future Challenges

#DATALAKE

Page 18: Open PHACTS : Linked Data Future Challenges

DOES THE #DATALAKE LOOK LIKE THIS

Page 19: Open PHACTS : Linked Data Future Challenges

OR THIS?

Page 20: Open PHACTS : Linked Data Future Challenges

http://www.cafepress.co.uk/+metadata+t-shirts

http://explorer.openphacts.org/

https://www.w3.org/TR/prov-o/

Page 21: Open PHACTS : Linked Data Future Challenges

WHOA….

Quality?

http://lod-cloud.net/

Page 22: Open PHACTS : Linked Data Future Challenges

CAN “DATA” PEOPLE HELP?

Quality Of The Experiment

Independent Confirmations

Negative Assertions

Literature “dead ends”

Social Commentary / Sentiment Analysis

Marking Questionable Assertions

Page 23: Open PHACTS : Linked Data Future Challenges

Reproducibility, Meta Data & Provenance

Page 24: Open PHACTS : Linked Data Future Challenges

CHALLENGE #2

Page 25: Open PHACTS : Linked Data Future Challenges

HERE’S A SPARQL QUERY

Page 26: Open PHACTS : Linked Data Future Challenges

ONTOLOGIES ARE CRITICAL

Page 27: Open PHACTS : Linked Data Future Challenges

ESSENTIAL FOR DATA DISCOVERY

2011

Here we talked about how industry needs open ontologies

Page 28: Open PHACTS : Linked Data Future Challenges

BIOPORTAL UPDATES (OF 618 ONTOLOGIES)

[Bar chart: % of ontologies updated in the last year, six months, three months and one month; vertical axis 0 to 100%]

Only 1/3rd updated in the last year

Page 29: Open PHACTS : Linked Data Future Challenges

SCIENCE MOVES AT A DIFFERENT PACE

OK, so apples and oranges, but still, there’s a vast difference between the two

Page 30: Open PHACTS : Linked Data Future Challenges

ONTOLOGY SUPPORT

BAO is an incredibly valuable resource; we need to support it!

Page 31: Open PHACTS : Linked Data Future Challenges

IS IT TIME FOR A NEW STRATEGY?

We may never have “enough” resource, so what are the alternatives for sustaining ontologies?

Page 32: Open PHACTS : Linked Data Future Challenges

PISTOIA ONTOLOGY WG

https://www.qmarkets.org/live/pistoia/

Page 33: Open PHACTS : Linked Data Future Challenges

ON THE FLY URI

ORPHANET Rare/Orphan disease

Dynamic Phenotype Network In RDF

Generated by TERMite PhenotypeFinder

Ack. Michael Hughes

http://scibite.com

SciBite

Yellow = disease, pink = on-the-fly phenotype concept generated by text mining (not found in HPO)
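
One way to picture the on-the-fly concepts above is as URIs minted deterministically from the text-mined label, so that the same phrase always lands on the same node in the RDF graph. The sketch below is only an illustration under that assumption and does not describe how TERMite PhenotypeFinder works: the namespace `http://example.org/phenotype/`, the `otf_` prefix, the example phrase, and the function names `normalise_label` and `mint_uri` are all hypothetical.

```python
import hashlib
import re

# Hypothetical namespace for concepts that have no HPO identifier yet.
ON_THE_FLY_NS = "http://example.org/phenotype/"


def normalise_label(label: str) -> str:
    """Lower-case the label and collapse runs of whitespace to one space."""
    return re.sub(r"\s+", " ", label.strip().lower())


def mint_uri(label: str, namespace: str = ON_THE_FLY_NS) -> str:
    """Return a stable URI for a text-mined label.

    The local part is the first 16 hex characters of the SHA-1 digest of
    the normalised label, so equal labels always yield the same URI.
    """
    digest = hashlib.sha1(normalise_label(label).encode("utf-8")).hexdigest()
    return f"{namespace}otf_{digest[:16]}"


if __name__ == "__main__":
    # Illustrative phrase only; not taken from the slide.
    print(mint_uri("Delayed  speech development"))
```

Hashing the normalised label keeps the URI stable across documents without a central registry; the trade-off is that synonyms still get different URIs until someone maps them to a curated term.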

Page 34: Open PHACTS : Linked Data Future Challenges

We need some cool new ideas!

Page 35: Open PHACTS : Linked Data Future Challenges

Reproducibility, Meta Data & Provenance

Ontology

Page 36: Open PHACTS : Linked Data Future Challenges

CHALLENGE #3

Page 37: Open PHACTS : Linked Data Future Challenges

139755-83-2

A standard, but not “open” – inhibits certain projects

Page 38: Open PHACTS : Linked Data Future Challenges

AN OPEN SEMANTIC CHEMISTRY API

21,000,000 structures

Open PHACTS built a completely open semantic chemistry registry

Page 39: Open PHACTS : Linked Data Future Challenges

http://www.slideshare.net/alasdair_gray/scientific-lenses-to-support-multiple-views-over-linked-chemistry-data

With which you can do some very fancy things

Page 40: Open PHACTS : Linked Data Future Challenges

http://www.inchi-trust.org/

Chemistry doesn’t stand still

Page 41: Open PHACTS : Linked Data Future Challenges

And we need to support biologicals too!

Page 42: Open PHACTS : Linked Data Future Challenges

https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL1252

Excellent open-data success story!

Page 43: Open PHACTS : Linked Data Future Challenges

HOW TO SEMANTICALLY ENCODE TACIT KNOWLEDGE?

http://blogs.sciencemag.org/pipeline/archives/2016/01/19/what-does-one-do-with-these

Some amazing knowledge out there; how do we get it “into the graph”?

Page 44: Open PHACTS : Linked Data Future Challenges

Reproducibility, Meta Data & Provenance

Ontology

Semantics 4 Therapeutics

Page 45: Open PHACTS : Linked Data Future Challenges

CHALLENGE #4

Page 46: Open PHACTS : Linked Data Future Challenges

The Biggest Challenge In Data Integration Is…

…The Data

Page 47: Open PHACTS : Linked Data Future Challenges

OPEN PHACTS APPROACH

DATA TOOLING

The best approach: not doing either in isolation, but a dialog between both

Page 48: Open PHACTS : Linked Data Future Challenges

CHATTER PLOT

data tech

Much more time was spent on issues with the data than the tech. Not saying the tech was easy!

Page 49: Open PHACTS : Linked Data Future Challenges

WE BUILT SOME CUTTING EDGE TECH

Dynamic Identifier Resolution

Mixing SPARQL & web services

Production Deployment

API Centric Integration

Business Question-Focus

API Management

“App Store” Ecosystem

Semantic Chemistry

Nanopubs & Provenance

Cool, Friendly UIs

Page 50: Open PHACTS : Linked Data Future Challenges

DRIVING FORWARD THROUGH DIRECT DIALOG

But every data conversation generated more and more questions!

Page 51: Open PHACTS : Linked Data Future Challenges

DATA IS ALWAYS EVOLVING

Page 52: Open PHACTS : Linked Data Future Challenges

ChEMBL 01 (2009)

When OPS Started

Page 53: Open PHACTS : Linked Data Future Challenges

By the time we really started making RDF!

Page 54: Open PHACTS : Linked Data Future Challenges

And things didn’t stand still!

Page 65: Open PHACTS : Linked Data Future Challenges

Today, the ChEMBL schema is unrecognizable from 2009. Open PHACTS is the way companies can work with providers to stay on top of data evolution!

Page 66: Open PHACTS : Linked Data Future Challenges

Reproducibility, Meta Data & Provenance

Ontology

Semantics 4 Therapeutics

Data Evolution

Page 67: Open PHACTS : Linked Data Future Challenges

CHALLENGE #5

Page 68: Open PHACTS : Linked Data Future Challenges

PCSK9: AN “ICONIC EXAMPLE” OF TRANSLATIONAL MEDICINE IN THE GENOMICS ERA*

*http://www.nature.com/news/genetics-a-gene-of-rare-effect-1.12773

PCSK9 is an example of genomics, informatics and drug discovery coming together for real change

Page 69: Open PHACTS : Linked Data Future Challenges

TRANSLATING PRECLINICAL DISCOVERIES

Researchers have found that removing a gene called USF1 protects mice against heart disease, diabetes and obesity

We don’t yet have a system where we can just jump in and explore the biology and chemistry of the USF1 pathways…

Page 70: Open PHACTS : Linked Data Future Challenges

Reproducibility, Meta Data & Provenance

Ontology

Semantics 4 Therapeutics

Data Evolution

The Testable Disease Network

Conclusions: this is what I’d be arguing for in Open PHACTS 2016