Chunlei Wu, Ph.D. [email protected] @chunleiwu Associate Professor of Molecular Medicine Dept. of Molecular Experimental Medicine The Scripps Research Institute La Jolla, CA, USA 01/22/2016 From MyGene.info and MyVariant.info towards BioThings API

Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Page 1: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Chunlei Wu, Ph.D.

@chunleiwu

Associate Professor of Molecular Medicine

Dept. of Molecular Experimental Medicine

The Scripps Research Institute

La Jolla, CA, USA

01/22/2016

From MyGene.info and MyVariant.info towards BioThings API

Page 2: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

MyGene.info and MyVariant.info recap

(Aggregated) Gene/Variant Annotations as a (high-performance) (real-time) Web Service

Page 3: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

So many variant annotation resources

dbNSFP

The Exome Aggregation Consortium (ExAC)

Page 4: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Annotations centered around bio-entities

[Figure: bio-entity nodes: Gene (G), Variant (V), Pathway (P), Disease (D), Metabolite (M)]

Page 5: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Simple JSON-based Aggregation mechanism

Aggregated variant object, with one top-level key per data source:

{
  "_id": "chr1:g.196659237C>T",
  "cadd": { … },
  "clinvar": { … },
  "cosmic": { … },
  "dbsnp": { … },
  "dbnsfp": { … },
  "evs": { … },
  "emv": { … },
  "mutdb": { … },
  "gwassnp": { … },
  "snpedia": { … },
  "wellderly": { … }
}

Per-source objects sharing the same "_id" (likewise for "cadd", "clinvar", "evs", "mutdb"):

{
  "_id": "chr1:g.196659237C>T",
  "dbsnp": {
    "snpclass": "single",
    "rsid": "rs1061170",
    "func": "missense"
  }
}

{
  "_id": "chr1:g.196659237C>T",
  "cosmic": {
    "tumor_site": "breast",
    "mut_freq": 0.49
  }
}

{
  "_id": "chr1:g.196659237C>T",
  "dbnsfp": {
    "sift": {
      "breast": "tolerated",
      "val": 1
    }
  }
}
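The aggregation step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the MyVariant.info implementation: the function name `aggregate_by_id` is hypothetical, and it assumes each data source contributes exactly one top-level key per variant.

```python
from collections import defaultdict


def aggregate_by_id(source_docs):
    """Merge per-source documents that share an "_id" into one object per variant.

    Assumption: every source uses its own top-level key ("dbsnp", "cosmic", ...),
    so keys from different sources never collide.
    """
    merged = defaultdict(dict)
    for doc in source_docs:
        variant_id = doc["_id"]
        merged[variant_id]["_id"] = variant_id
        for key, value in doc.items():
            if key != "_id":
                merged[variant_id][key] = value
    return dict(merged)


# Two of the per-source objects shown on the slide.
docs = [
    {"_id": "chr1:g.196659237C>T",
     "dbsnp": {"snpclass": "single", "rsid": "rs1061170", "func": "missense"}},
    {"_id": "chr1:g.196659237C>T",
     "cosmic": {"tumor_site": "breast", "mut_freq": 0.49}},
]

variant = aggregate_by_id(docs)["chr1:g.196659237C>T"]
print(sorted(variant))
```

Because every source lives under its own key, a source can be refreshed by replacing that one key without touching the rest of the object.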

Page 6: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Keep data always up-to-date

Each data source is updated individually. Colors indicate their different updating schedules.

Schematic view of MyVariant.info architecture

Page 7: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

High-performance web service APIs

Schematic view of MyVariant.info architecture

Page 8: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

MyVariant.info for the end users:

http://MyVariant.info (currently v1 API, two endpoints)

http://MyVariant.info/v1/query?q=<query>

Input: any query term(s)

Output: matching variant hits

http://MyVariant.info/v1/variant/<variantid>

Input: HGVS id(s)

Output: matching variant object(s)

Both support batch mode via POST

Simple API. No sign-up. No API key.

Try our live API and documentation
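As a rough illustration of the two GET endpoints, the sketch below builds the request URLs with the Python standard library. The helper names (`query_url`, `variant_url`, `fetch_json`) are hypothetical, and the URL patterns are the v1 ones shown on this slide, which may differ from the currently deployed service.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://MyVariant.info/v1"  # v1 pattern taken from the slide


def query_url(query, **params):
    """URL for the query endpoint: /v1/query?q=<query>."""
    return f"{BASE_URL}/query?" + urlencode({"q": query, **params})


def variant_url(hgvs_id):
    """URL for the annotation endpoint: /v1/variant/<variantid>."""
    # HGVS ids contain ">" which must be percent-encoded in a URL path.
    return f"{BASE_URL}/variant/" + quote(hgvs_id, safe=":")


def fetch_json(url):
    """Send the GET request and decode the JSON body (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


print(query_url("rs1061170"))
print(variant_url("chr1:g.196659237C>T"))
```

Calling `fetch_json(variant_url(...))` would then return the variant object; it is left out of the printed example so the sketch runs offline.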

Page 9: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

MyGene.info for the end users:

http://MyGene.info (currently v2 API, two endpoints)

http://MyGene.info/v2/query?q=<query>

Input: any query term(s)

Output: matching gene hits

http://MyGene.info/v2/gene/<geneid>

Input: gene id(s)

Output: matching gene object(s)

Both support batch mode via POST

Simple API. No sign-up. No API key.

Try our live API and documentation
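Batch mode can be sketched as a single POST carrying several gene ids. The slide does not spell out the request body, so the form field name `ids` with comma-separated values is an assumption here, and `batch_gene_request` is a hypothetical helper. The gene ids 1017 and 1018 are only illustrative. The request is built but not sent, so the example runs without network access.

```python
from urllib.parse import urlencode
from urllib.request import Request


def batch_gene_request(gene_ids, base_url="http://MyGene.info/v2"):
    """Build a POST request for the gene endpoint covering several ids at once.

    Assumption: the ids travel in a form-encoded field named "ids".
    """
    body = urlencode({"ids": ",".join(str(g) for g in gene_ids)}).encode("ascii")
    return Request(
        f"{base_url}/gene",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = batch_gene_request([1017, 1018])
print(request.get_method(), request.full_url, request.data)
```

Passing the request to `urllib.request.urlopen` would submit it; one round trip then replaces one GET per gene.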

Page 10: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

MyGene.info usage updates

[Chart: monthly hits in millions, last year vs. this year; labeled levels 2M and 3M]

Page 11: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Usage spikes (5M hits/day) during X-Mas 2014

Page 12: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Increased client adoption

Requests by MyGene.info clients (10/07/2015-01/05/2016)

[Pie chart segments: 30%, 9%, 35%, 26%]

Highlights:
• mygene Python client usage now surpasses BioGPS usage
• mygene R client usage has increased to 9% from <1%

Page 13: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Increased client adoption

[Pie chart segments: 30%, 9%, 35%, 26%]

mygene Python client hosted in PyPI

mygene R client hosted in Bioconductor

Page 14: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

MyVariant.info updates

Over 334 million annotated variants in total

New additions: The Exome Aggregation Consortium (ExAC)

Updated: dbNSFP

Page 15: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

MyVariant.info updates

1 million requests in 3 months (10/07/2015-01/05/2016)

[Pie chart segments: 30%, 68%, 2%]

Page 16: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

MyVariant.info official Python/R Clients

myvariant Python client hosted in PyPI (initial release in Aug 2015)

myvariant R client hosted in Bioconductor (initial release in Oct 2015)

Page 17: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

A Node.js client made by a user with passion

Page 18: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Next?

MyVariant.info

MyGene.info

Page 19: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Make our APIs serve Linked Data via JSON-LD

Page 20: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Why Linked Data?

[Figure: linked bio-entity nodes: Gene (G), Variant (V), Pathway (P), Disease (D), Metabolite (M)]

Page 21: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Linked Data for data aggregation

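[Figure: a variant object (V) from MyVariant.info and a variant object (V) from another variant API, combined into a single variant object (V)]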

Page 22: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Linked Data for data aggregation

MyVariant.info:

{
  "_id": "chr1:g.196659237C>T",
  "cosmic": {
    "tumor_site": "breast",
    "mut_freq": 0.49
  },
  "clinvar": {…},
  "dbsnp": {…},
  …
}

Another Variant API:

{
  "pop": "GWD",
  "nobs": 226,
  "freq": 0.371681415929,
  …
}

Aggregated:

{
  "_id": "chr1:g.196659237C>T",
  "cosmic": {
    "tumor_site": "breast",
    "mut_freq": 0.49
  },
  "clinvar": {…},
  "dbsnp": {…},
  "new_src": {
    "pop": "GWD",
    "nobs": 226,
    "freq": 0.371681415929
  },
  …
}
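The cross-API aggregation on this slide amounts to nesting the external record under a new source key. The sketch below assumes both APIs have already been shown to describe the same variant, which is the part Linked Data is meant to establish. The helper name `attach_source` is hypothetical; the key `new_src` comes from the slide.

```python
import copy


def attach_source(variant_doc, source_key, external_record):
    """Return a copy of variant_doc with external_record nested under source_key.

    Refuses to overwrite an existing source so that no annotation is lost silently.
    """
    if source_key in variant_doc:
        raise KeyError(f"{source_key!r} is already present in the variant object")
    merged = copy.deepcopy(variant_doc)
    merged[source_key] = copy.deepcopy(external_record)
    return merged


myvariant_doc = {
    "_id": "chr1:g.196659237C>T",
    "cosmic": {"tumor_site": "breast", "mut_freq": 0.49},
}
other_api_record = {"pop": "GWD", "nobs": 226, "freq": 0.371681415929}

merged = attach_source(myvariant_doc, "new_src", other_api_record)
print(sorted(merged))
```

The inputs are copied rather than mutated, so the original MyVariant.info object stays usable after the merge.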

Page 23: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

JSON + context = JSON-LD

{
  "@context": {
    "clinvar": "http://schema.myvariant.info/datasource/clinvar",
    "rcv": "http://schema.myvariant.info/datanode/rcv",
    "gene": "http://schema.myvariant.info/datanode/gene",
    "_id": "@id"
  },
  "_id": "chr6:g.26093141G>A",
  "clinvar": {
    "@context": {
      "uniprot": "http://identifiers.org/uniprot/",
      "omim": "http://identifiers.org/omim/"
    },
    "chrom": "6",
    "alt": "A",
    "ref": "G",
    "allele_id": 15048,
    "rsid": "rs1800562",
    "rcv": {
      "@context": {
        "accession": "http://identifer.org/clinvar"
      },
      "accession": "RCV000000020",
      "origin": "germline",
      "clinical_significance": "risk factor"
    },
    "gene": {
      "@context": {
        "symbol": "http://identifiers.org/hgnc.symbol/"
      },
      "id": "3077",
      "symbol": "HFE"
    },
    "omim": "613609.0001",
    "variant_id": 9
  }
}

Page 24: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Processed JSON-LD

JSON-LD N-Quads output:

<chr6:g.26093141G>A> <http://schema.myvariant.info/datasource/clinvar> _:b0 .
_:b0 <http://identifiers.org/omim/> "613609.0001" .
_:b0 <http://schema.myvariant.info/datanode/gene> _:b1 .
_:b0 <http://schema.myvariant.info/datanode/rcv> _:b2 .
_:b1 <http://identifiers.org/hgnc.symbol/> "HFE" .
_:b2 <http://identifer.org/clinvar> "RCV000000020" .

JSON-LD compacted output:

{
  "@id": "chr6:g.26093141G>A",
  "http://schema.myvariant.info/datasource/clinvar": {
    "http://identifiers.org/omim/": "613609.0001",
    "http://schema.myvariant.info/datanode/gene": {
      "http://identifiers.org/hgnc.symbol/": "HFE"
    },
    "http://schema.myvariant.info/datanode/rcv": {
      "http://identifer.org/clinvar": "RCV000000020"
    }
  }
}
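The processed output pairs each annotated value with the URI that its key maps to in the context. The sketch below imitates that pairing in plain Python. It is a deliberately simplified walk, not a conforming JSON-LD processor: it handles only string-valued term definitions and nested "@context" blocks, and the function name `iter_uri_values` is hypothetical.

```python
def iter_uri_values(node, context=None, path=()):
    """Yield (dotted_path, uri, value) for each leaf whose key is mapped to a URI.

    A nested "@context" extends the terms inherited from the enclosing object.
    Keys mapped to JSON-LD keywords (such as "_id": "@id") are skipped.
    """
    context = {**(context or {}), **node.get("@context", {})}
    for key, value in node.items():
        if key == "@context":
            continue
        key_path = path + (key,)
        if isinstance(value, dict):
            yield from iter_uri_values(value, context, key_path)
        elif key in context and not context[key].startswith("@"):
            yield ".".join(key_path), context[key], value


# A trimmed copy of the JSON-LD document from the previous slide.
doc = {
    "@context": {
        "clinvar": "http://schema.myvariant.info/datasource/clinvar",
        "rcv": "http://schema.myvariant.info/datanode/rcv",
        "_id": "@id",
    },
    "_id": "chr6:g.26093141G>A",
    "clinvar": {
        "@context": {"omim": "http://identifiers.org/omim/"},
        "rsid": "rs1800562",
        "rcv": {
            "@context": {"accession": "http://identifer.org/clinvar"},
            "accession": "RCV000000020",
            "origin": "germline",
        },
        "omim": "613609.0001",
    },
}

for item in iter_uri_values(doc):
    print(item)
```

Keys without a context entry, such as "rsid" and "origin" here, produce no output, which mirrors how unmapped fields are absent from the processed form.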

Page 25: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

In a nutshell, what does a JSON-LD context do?

Maps values in a JSON object to defined URIs

"http://identifer.org/clinvar" → clinvar.rcv.accession

Page 26: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

JSON-LD context makes your data "Linkable"

Downstream processing libraries make it "Linked"

Page 27: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

A Python library for processing JSON-LD data

In [1]: fetch_value_source_for_variant("chr6:g.26093141G>A", "http://identifiers.org/dbsnp/")

Out[1]:
['rs1800562 http://schema.myvarint.info/datasource/dbnsfp',
 'rs1800562 http://schema.myvarint.info/datasource/clinvar',
 'rs1800562 http://schema.myvarint.info/datasource/dbsnp',
 'rs1800562 http://schema.myvarint.info/datasource/evs',
 'rs1800562 http://schema.myvarint.info/datasource/gwassnps',
 'rs1800562 http://schema.myvarint.info/datasource/mutdb']

By Kevin Xin

Page 28: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Need to define an API spec to enable data exchange via JSON-LD

• Output as a JSON object with a defined _id.
• "jsonld=true/false" toggle for the inclusion of JSON-LD context.
• Support the retrieval of a single entity via GET (use case: individual data aggregation on the fly)
• Support the retrieval of a list of entities via POST (use case: routine data aggregation in batches)
• Output should indicate the entity existence:

GET /variant/<unknown_id> → 404

POST /variant/ id1, <unknown_id>, id3 → [id1: {…}, <unknown_id>: "notfound", id3: {…}]
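The rules above can be mocked up with an in-memory store to make the expected behavior concrete. Everything in this sketch is a placeholder: `STORE`, `CONTEXT`, and the function names are hypothetical, the payloads are dummy values, and the batch response is modeled as a list of single-key objects because the slide's notation is not literal JSON.

```python
# Placeholder data: two known entities with dummy payloads.
STORE = {
    "id1": {"_id": "id1", "clinvar": {"note": "placeholder"}},
    "id3": {"_id": "id3", "clinvar": {"note": "placeholder"}},
}
CONTEXT = {"clinvar": "http://schema.myvariant.info/datasource/clinvar", "_id": "@id"}


def render(entity, jsonld=False):
    """Return the entity, adding the JSON-LD context only when jsonld is true."""
    return {"@context": CONTEXT, **entity} if jsonld else dict(entity)


def get_entity(entity_id, jsonld=False):
    """GET /variant/<id>: (200, object) for a known id, (404, None) otherwise."""
    if entity_id not in STORE:
        return 404, None
    return 200, render(STORE[entity_id], jsonld)


def post_entities(entity_ids, jsonld=False):
    """POST /variant/: one result per requested id, unknown ids marked "notfound"."""
    results = [
        {eid: render(STORE[eid], jsonld) if eid in STORE else "notfound"}
        for eid in entity_ids
    ]
    return 200, results


print(get_entity("unknown_id"))
print(post_entities(["id1", "unknown_id", "id3"])[1])
```

Keeping one result per requested id, in request order, lets a batch client match responses to inputs even when some ids are unknown.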

Page 29: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

BioThings API

MyVariant.info

MyGene.info

By Cyrus Afrasiabi

Page 30: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

BioThings API

MyVariant.info MyGene.info

JSON data aggregation mechanism

High-performance query engine

Well-designed REST API pattern

JSON-LD enabled Linked Data

Data-updating scheduler

Python/R clients

…

Page 31: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Data-sharing via Web API is trending

Making a single web service is trivial, but making a sustainable/scalable web API is non-trivial.

We would like to help other groups to create their own hosted web API for sharing their data.

Page 32: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Action item 1: BioThings API whitepaper

Also an action item from the last BD2K CA consortium meeting and from the API working group at last year's NIH BD2K AHM

Page 33: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Action item 2: BioThings API framework

NIH Commons

Infrastructure as a Service:

Software as a Service: BioThings API

Page 34: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Action item 3: expansion to other "BioThings"

[Figure: entity nodes Disease (D) and Drugs (D)]

MyDrug.info, MyDisease.info

need an alt. name here

Page 35: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Acknowledgement

Funding and Support: U54GM114833, U01HG008473

Washington U: Ben Ainscough, Obi Griffith

TSRI: Andrew Su, Jiwen Xin, Cyrus Afrasiabi, Ginger Tsueng, Adam Mark, Greg Stupp, Tim Putman

STSI: Eric Topol, Ali Torkamani, Galina Erikson

U. Washington: Sean Mooney, Moritz Juchler, Nikhil Gopal

OICR: Robin Haw

UC Berkeley: Chris Mungall

UCSD: Trish Whetzel

MyVariant.info

MyGene.info