Chunlei Wu
@chunleiwu
Associate Professor of Molecular Medicine
Dept. of Molecular Experimental Medicine
The Scripps Research Institute
La Jolla, CA, USA
01/22/2016
From MyGene.info and MyVariant.info towards BioThings API
MyGene.info and MyVariant.info recap
Gene and variant annotations (aggregated) as a (high-performance, real-time) web service
So many variant annotation resources
dbNSFP
The Exome Aggregation Consortium (ExAC)
Annotations centered around bio-entities
Gene (G), Variant (V), Pathway (P), Metabolite (M), Disease (D)
Simple JSON-based Aggregation mechanism
{ "_id": "chr1:g.196659237C>T", "cadd": { … }, "clinvar": { … }, "cosmic": { … }, "dbsnp": { … }, "dbnsfp": { … }, "evs": { … }, "emv": { … }, "mutdb": { … }, "gwassnp": { … }, "snpedia": { … }, "wellderly": { … }}
{ "_id": "chr1:g.196659237C>T", "dbsnp": { "snpclass": "single", "rsid": "rs1061170", "func": "missense" }}
{ "_id": "chr1:g.196659237C>T", "cosmic": { "tumor_site": "breast", "mut_freq": 0.49 }}
{ "_id": "chr1:g.196659237C>T", "dbnsfp": { "sift": { "breast": "tolerated", "val": 1 } }}
"cadd" "clinvar" "evs" "mutdb"
…
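The aggregation idea above can be sketched in a few lines of Python. This is a minimal illustration, not the MyVariant.info implementation: the function name aggregate_by_id is hypothetical, and it assumes every per-source document carries the shared "_id" plus exactly one top-level key per data source.

```python
from collections import defaultdict


def aggregate_by_id(source_docs):
    """Merge per-source documents that share an _id into one document per _id.

    Hypothetical helper for illustration only; each source is assumed to
    contribute its own top-level key (e.g. "dbsnp", "cosmic").
    """
    merged = defaultdict(dict)
    for doc in source_docs:
        variant_id = doc["_id"]
        for key, value in doc.items():
            if key != "_id":
                merged[variant_id][key] = value
    return [{"_id": vid, **fields} for vid, fields in merged.items()]


# Two per-source documents for the same variant, taken from the slide.
docs = [
    {"_id": "chr1:g.196659237C>T",
     "dbsnp": {"snpclass": "single", "rsid": "rs1061170", "func": "missense"}},
    {"_id": "chr1:g.196659237C>T",
     "cosmic": {"tumor_site": "breast", "mut_freq": 0.49}},
]
aggregated = aggregate_by_id(docs)
```

Because each source lives under its own key, a source can be refreshed by overwriting that one key without touching the others.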
Keep data always up-to-date
Each data source is updated individually. Colors indicate their different updating schedules.
Schematic view of MyVariant.info architecture
High-performance web service APIs
MyVariant.info for the end users:
http://MyVariant.info (currently v1 API, two endpoints)
http://MyVariant.info/v1/query?q=<query>
any query term(s) → matching variant hits
http://MyVariant.info/v1/variant/<variantid>
hgvs id(s) → matching variant object(s)
Both support batch mode via POST
Simple API. No sign-up. No API key.
Try our live API and documentation
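As a rough sketch of how a client might address the two endpoints, the Python below only builds the URLs and a batch POST request object; it does not send anything. The helper names are hypothetical, and the form field name "ids" for batch mode is an assumption that is not stated on the slide, so check the live documentation before relying on it.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://MyVariant.info/v1"


def query_url(query):
    """URL for the query endpoint: any query term(s) -> matching variant hits."""
    return f"{BASE_URL}/query?{urlencode({'q': query})}"


def variant_url(variant_id):
    """URL for the variant endpoint: one hgvs id -> matching variant object."""
    # Keep ':' readable; characters such as '>' are percent-encoded.
    return f"{BASE_URL}/variant/{quote(variant_id, safe=':')}"


def batch_variant_request(variant_ids):
    """Build (but do not send) a batch-mode POST request.

    Assumption: ids are sent as a comma-separated, form-encoded "ids" field.
    """
    body = urlencode({"ids": ",".join(variant_ids)}).encode("utf-8")
    return Request(f"{BASE_URL}/variant", data=body, method="POST")


request = batch_variant_request(["chr1:g.196659237C>T", "chr6:g.26093141G>A"])
```

Passing the request object to urllib.request.urlopen would perform the call; that step is left out so the sketch runs without network access.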
MyGene.info for the end users:
http://MyGene.info (currently v2 API, two endpoints)
http://MyGene.info/v2/query?q=<query>
any query term(s) → matching gene hits
http://MyGene.info/v2/gene/<geneid>
gene id(s) → matching gene object(s)
Both support batch mode via POST
Simple API. No sign-up. No API key.
Try our live API and documentation
MyGene.info usage updates
Monthly hits (in millions): 2M last year, 3M this year
Usage spikes (5M hits/day) during X-Mas 2014
Increased client adoption
Requests by MyGene.info clients (10/07/2015-01/05/2016; chart segments: 30%, 9%, 35%, 26%)
Highlights:
• mygene Python client usage now surpasses BioGPS usage
• mygene R client usage has increased to 9% from <1%
Increased client adoption
mygene Python client hosted in PyPI
mygene R client hosted in Bioconductor
MyVariant.info updates
Over 334 million annotated variants in total
The Exome Aggregation Consortium (ExAC)
New additions:
dbNSFP
Updated:
MyVariant.info updates
1 million requests in 3 months (10/07/2015-01/05/2016; chart segments: 30%, 68%, 2%)
MyVariant.info official Python/R Clients
myvariant Python client hosted in PyPI (initial release in Aug 2015)
myvariant R client hosted in Bioconductor (initial release in Oct 2015)
A Node.js client made by a user with passion
Next?
MyVariant.info
MyGene.info
Make our APIs serve Linked Data via JSON-LD
Why Linked Data?
Gene (G), Variant (V), Pathway (P), Metabolite (M), Disease (D)
Linked Data for data aggregation
MyVariant.info
Another Variant API
{ "_id": "chr1:g.196659237C>T", "cosmic": { "tumor_site": "breast", "mut_freq": 0.49 }, "clinvar": {…}, "dbsnp": {…}, …}
{ "pop": "GWD", "nobs": 226, "freq": 0.371681415929, …}
{ "_id": "chr1:g.196659237C>T", "cosmic": { "tumor_site": "breast", "mut_freq": 0.49 }, "clinvar": {…}, "dbsnp": {…}, "new_src": { "pop": "GWD", "nobs": 226, "freq": 0.371681415929 }, …}
JSON + context = JSON-LD
{
  "@context": {
    "clinvar": "http://schema.myvariant.info/datasource/clinvar",
    "rcv": "http://schema.myvariant.info/datanode/rcv",
    "gene": "http://schema.myvariant.info/datanode/gene",
    "_id": "@id"
  },
  "_id": "chr6:g.26093141G>A",
  "clinvar": {
    "@context": {
      "uniprot": "http://identifiers.org/uniprot/",
      "omim": "http://identifiers.org/omim/"
    },
    "chrom": "6",
    "alt": "A",
    "ref": "G",
    "allele_id": 15048,
    "rsid": "rs1800562",
    "rcv": {
      "@context": { "accession": "http://identifer.org/clinvar" },
      "accession": "RCV000000020",
      "origin": "germline",
      "clinical_significance": "risk factor"
    },
    "gene": {
      "@context": { "symbol": "http://identifiers.org/hgnc.symbol/" },
      "id": "3077",
      "symbol": "HFE"
    },
    "omim": "613609.0001",
    "variant_id": 9
  }
}
Processed JSON-LD
JSON-LD N-Quads output:
<chr6:g.26093141G>A> <http://schema.myvariant.info/datasource/clinvar> _:b0 .
_:b0 <http://identifiers.org/omim/> "613609.0001" .
_:b0 <http://schema.myvariant.info/datanode/gene> _:b1 .
_:b0 <http://schema.myvariant.info/datanode/rcv> _:b2 .
_:b1 <http://identifiers.org/hgnc.symbol/> "HFE" .
_:b2 <http://identifer.org/clinvar> "RCV000000020" .
JSON-LD compacted output:
{
  "@id": "chr6:g.26093141G>A",
  "http://schema.myvariant.info/datasource/clinvar": {
    "http://identifiers.org/omim/": "613609.0001",
    "http://schema.myvariant.info/datanode/gene": {
      "http://identifiers.org/hgnc.symbol/": "HFE"
    },
    "http://schema.myvariant.info/datanode/rcv": {
      "http://identifer.org/clinvar": "RCV000000020"
    }
  }
}
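To make the relationship between the annotated JSON and the processed output more concrete, here is a deliberately simplified Python sketch. It is not a JSON-LD processor and does not follow the full JSON-LD algorithms: the hypothetical function apply_context just replaces each key with the URI its (possibly nested) "@context" assigns and drops keys that have no mapping. The input is a trimmed version of the document on the slide.

```python
def apply_context(node, inherited=None):
    """Rewrite keys to the URIs given by @context, dropping unmapped keys.

    Simplified illustration only; a real JSON-LD library handles many
    more cases (arrays, typed values, keyword aliases, and so on).
    """
    context = dict(inherited or {})
    context.update(node.get("@context", {}))  # nested contexts extend outer ones
    result = {}
    for key, value in node.items():
        if key == "@context":
            continue
        uri = context.get(key)
        if uri is None:
            continue  # no URI defined for this key, so it is not linked data
        if isinstance(value, dict):
            value = apply_context(value, context)
        result[uri] = value
    return result


doc = {
    "@context": {
        "clinvar": "http://schema.myvariant.info/datasource/clinvar",
        "rcv": "http://schema.myvariant.info/datanode/rcv",
        "gene": "http://schema.myvariant.info/datanode/gene",
        "_id": "@id",
    },
    "_id": "chr6:g.26093141G>A",
    "clinvar": {
        "@context": {"omim": "http://identifiers.org/omim/"},
        "chrom": "6",
        "rcv": {
            "@context": {"accession": "http://identifer.org/clinvar"},
            "accession": "RCV000000020",
            "origin": "germline",
        },
        "gene": {
            "@context": {"symbol": "http://identifiers.org/hgnc.symbol/"},
            "id": "3077",
            "symbol": "HFE",
        },
        "omim": "613609.0001",
    },
}
linked = apply_context(doc)
```

Fields such as "chrom" and "origin" disappear from the result because no context entry assigns them a URI, which mirrors what the processed output on the slide keeps and discards.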
In a nutshell, what does a JSON-LD context do?
Maps values in a JSON object to defined URIs
"http://identifer.org/clinvar" → clinvar.rcv.accession
JSON-LD context makes your data "Linkable"
Downstream processing libraries make it "Linked"
A Python library for processing JSON-LD data
In [1]: fetch_value_source_for_variant("chr6:g.26093141G>A", "http://identifiers.org/dbsnp/")
Out[1]:
['rs1800562 http://schema.myvarint.info/datasource/dbnsfp',
 'rs1800562 http://schema.myvarint.info/datasource/clinvar',
 'rs1800562 http://schema.myvarint.info/datasource/dbsnp',
 'rs1800562 http://schema.myvarint.info/datasource/evs',
 'rs1800562 http://schema.myvarint.info/datasource/gwassnps',
 'rs1800562 http://schema.myvarint.info/datasource/mutdb']
By Kevin Xin
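The library's internals are not shown on the slide, so the following is only a guess at how such a lookup could work on a JSON-LD annotated document that is already in memory. The function find_values_for_uri is hypothetical, and the example document assumes that each data source maps its "rsid" field to "http://identifiers.org/dbsnp/", which the slides do not confirm.

```python
def find_values_for_uri(doc, target_uri):
    """Return (value, datasource URI) pairs for fields mapped to target_uri.

    Hypothetical sketch: walks each top-level data source, tracking the
    contexts in scope, and collects leaf values whose key maps to target_uri.
    """
    top_context = doc.get("@context", {})
    hits = []

    def walk(node, context, source_uri):
        context = {**context, **node.get("@context", {})}
        for key, value in node.items():
            if key == "@context":
                continue
            if isinstance(value, dict):
                walk(value, context, source_uri)
            elif context.get(key) == target_uri:
                hits.append((value, source_uri))

    for key, value in doc.items():
        if key != "@context" and isinstance(value, dict):
            walk(value, top_context, top_context.get(key))
    return hits


# Assumed document shape; the "rsid" mappings are illustrative.
annotated = {
    "@context": {
        "dbsnp": "http://schema.myvariant.info/datasource/dbsnp",
        "clinvar": "http://schema.myvariant.info/datasource/clinvar",
        "_id": "@id",
    },
    "_id": "chr6:g.26093141G>A",
    "dbsnp": {
        "@context": {"rsid": "http://identifiers.org/dbsnp/"},
        "rsid": "rs1800562",
    },
    "clinvar": {
        "@context": {"rsid": "http://identifiers.org/dbsnp/"},
        "rsid": "rs1800562",
        "rcv": {"accession": "RCV000000020"},
    },
}
rsid_hits = find_values_for_uri(annotated, "http://identifiers.org/dbsnp/")
```

Each hit pairs the value with the data source it came from, so agreement or disagreement between sources on the same identifier becomes easy to check.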
Need to define an API spec to enable data exchange via JSON-LD
• Output as a JSON object with a defined _id.
• "jsonld=true/false" toggle for the inclusion of the JSON-LD context.
• Support the retrieval of a single entity via GET (use case: individual data aggregation on the fly)
• Support the retrieval of a list of entities via POST (use case: routine data aggregation in batches)
• Output should indicate the entity existence:
GET /variant/<unknown_id> → 404
POST /variant/ id1, <unknown_id>, id3 → [id1: {…}, <unknown_id>: "notfound", id3: {…}]
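The behaviour this spec asks for can be illustrated with a small in-memory Python sketch. Everything here is hypothetical: the store, the context, and the function names are made up for illustration, no web framework is involved, and the batch result is returned as a dict keyed by id, which is one possible reading of the bracketed notation on the slide.

```python
# Hypothetical in-memory stand-ins for a real data store and context.
CONTEXT = {
    "_id": "@id",
    "clinvar": "http://schema.myvariant.info/datasource/clinvar",
}
STORE = {
    "chr6:g.26093141G>A": {"clinvar": {"rsid": "rs1800562"}},
}


def render(entity_id, jsonld):
    """Build the output object with a defined _id and an optional context."""
    doc = {"_id": entity_id, **STORE[entity_id]}
    if jsonld:
        doc = {"@context": CONTEXT, **doc}
    return doc


def get_entity(entity_id, jsonld=False):
    """GET-style retrieval of one entity: (status code, body)."""
    if entity_id not in STORE:
        return 404, None
    return 200, render(entity_id, jsonld)


def post_entities(entity_ids, jsonld=False):
    """POST-style batch retrieval: unknown ids are reported as "notfound"."""
    return {
        eid: render(eid, jsonld) if eid in STORE else "notfound"
        for eid in entity_ids
    }


status, body = get_entity("chr6:g.26093141G>A", jsonld=True)
batch = post_entities(["chr6:g.26093141G>A", "unknown_id"])
```

Reporting missing ids inline in the batch result lets a client line up responses with requests without issuing one call per id.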
BioThings API
MyVariant.info
MyGene.info
By Cyrus Afrasiabi
BioThings API
MyVariant.info MyGene.info
JSON data aggregation mechanism
High-performance query engine
Well-designed REST API pattern
JSON-LD enabled Linked Data
Data-updating scheduler
Python/R clients
…
Data-sharing via Web API is trending
Making a single web service is trivial, but making a sustainable/scalable web API is non-trivial.
We would like to help other groups to create their own hosted web API for sharing their data.
Action item 1: BioThings API whitepaper
Also an action item from the last BD2K CA consortium meeting and from the API working group at last year's NIH BD2K AHM
Action item 2: BioThings API framework
NIH commons
Infrastructure as a Service:
Software as a Service: BioThings API
Action item 3: expansion to other "BioThings"
Disease (D)
Drugs (D)
MyDrug.info MyDisease.info
need an alt. name here
Acknowledgement
Funding and Support: U54GM114833, U01HG008473
Washington U: Ben Ainscough, Obi Griffith
TSRI: Andrew Su, Jiwen Xin, Cyrus Afrasiabi, Ginger Tsueng, Adam Mark, Greg Stupp, Tim Putman
STSI: Eric Topol, Ali Torkamani, Galina Erikson
U. Washington: Sean Mooney, Moritz Juchler, Nikhil Gopal
OICR: Robin Haw
UC Berkeley: Chris Mungall
UCSD: Trish Whetzel
MyVariant.info
MyGene.info