Overview
Publishing collaborations: Making databases more like journals
NPG New Technology: Making journals more like databases
Tagging and social bookmarking: New methods of annotation and navigation
Database publishing at NPG
The AfCS-Nature Signaling Gateway (http://www.signaling-gateway.org/)
The CMC-Nature Cell Migration Gateway (http://www.cellmigration.org/)
Forthcoming collaborations with NCI and several other groups
The AfCS-Nature Signaling Gateway
A freely available online resource for anyone interested in cellular signalling
A collaboration with the research community through the Alliance for Cellular Signaling
An experiment in the next generation of online, database-driven scientific publications
The Signaling Gateway
Hardware & software hosted at San Diego Supercomputer Center
Molecule Pages
• Facts and figures on major cell signaling proteins (3,700+)
• Continually updated by selected experts (~1000)
• Peer-review run by NPG
AfCS Data Center
• Repository for raw experimental data from AfCS
• Tools for viewing and analyzing data (online & offline)
Signaling Update
• News & comment written and commissioned by NPG editors
Home, Info & News
The Molecule Pages
Comprehensive, structured data for 3,700+ proteins involved in cellular signalling
Some information automatically fed in from other online databases and updated monthly
Other information entered by selected expert authors and updated annually
Author-entered data peer-reviewed by NPG
Fully citable using digital object identifiers (DOIs)
Using Digital Object Identifiers
Nature 409, 860-921 (2001)
doi:10.1038/35057062
• Allows unambiguous identification of paper
• Allows readers to find the paper online
• Allows publishers to cross-link reference lists
• Guaranteed not to change (even if the publisher changes)
http://dx.doi.org/10.1038/35057062 → IDF/CrossRef databases → Correct URL at publisher’s website
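The resolver step on this slide can be sketched in a few lines. This is a minimal illustration, assuming only the `http://dx.doi.org/` prefix shown above; the function name `doi_to_url` is hypothetical, and the sketch builds the link without contacting the resolver.

```python
from urllib.parse import quote

# Resolver prefix as shown on the slide.
DOI_RESOLVER = "http://dx.doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn '10.1038/35057062' or 'doi:10.1038/35057062' into a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # A DOI is a prefix beginning with '10.' and a suffix, separated by '/'.
    prefix, sep, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters in the suffix that are unsafe in a URL path.
    return DOI_RESOLVER + prefix + "/" + quote(suffix, safe="/")


print(doi_to_url("doi:10.1038/35057062"))
# prints http://dx.doi.org/10.1038/35057062
```

Because the link is derived from the identifier alone, it stays valid when the publisher's own URL for the paper changes; only the resolver's database entry needs updating.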
The Molecule Pages: A scientific publication
Characteristic | Traditional journal | Traditional database | Molecule Pages
Recognised serial publication with an ISSN
Authored by recognised scientific experts ?
Subjected to full anonymous peer review
Maintained indefinitely (with errata and addenda)
Formally citable and fully integrated into CrossRef
Structured and highly queryable
The Molecule Pages has the same features as a traditional journal, except that the information it contains is more highly structured and queryable.
Great underestimated technologies of our age
Technology | Purported use | Eventual impact
Steam engines (early 1700s) | Pumping water from coal mines | The Industrial Revolution
Alternating current (1880s) | Executing criminals | The electrically powered society
Web-based scientific publishing (2004) | A new charging model for scientific papers | Redefining the concept of the scientific paper
Scientific papers as structured data objects
circa 2000: Print journal; online facsimile
circa 2006: Article metadata database; structured data sets; structured, interactive and queryable figures and text (RDF and SVG markup)
Experimental article metadata database
Initial data to be included:
• Author and institute details
• Scientific:
  Molecules (InChI)
  Genes (Entrez Gene)
  Proteins (UniProt)
  Cellular processes, functions, locations (GO)
  Species (NCBI)
• Citation annotations (controlled vocabulary)
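One way to picture a record in such a metadata database is sketched below. The class name, field names, helper function, and the two example DOIs are all hypothetical placeholders; only the list of data types (InChI, Entrez Gene, UniProt, GO, NCBI species, citation annotations) comes from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ArticleRecord:
    """One entry in a hypothetical article metadata store."""
    doi: str
    authors: List[str] = field(default_factory=list)
    institutes: List[str] = field(default_factory=list)
    molecules: List[str] = field(default_factory=list)   # InChI strings
    genes: List[str] = field(default_factory=list)       # Entrez Gene IDs
    proteins: List[str] = field(default_factory=list)    # UniProt accessions
    go_terms: List[str] = field(default_factory=list)    # GO identifiers
    species: List[str] = field(default_factory=list)     # NCBI taxonomy IDs
    # Maps a cited DOI to a term from a controlled vocabulary.
    citation_annotations: Dict[str, str] = field(default_factory=dict)


def articles_mentioning_protein(records: List[ArticleRecord], accession: str) -> List[str]:
    """Return the DOIs of all records that list the given UniProt accession."""
    return [r.doi for r in records if accession in r.proteins]


# Placeholder records; the DOIs are invented for illustration only.
records = [
    ArticleRecord(doi="10.0000/example-1", proteins=["P03378", "Q05397"], species=["9606"]),
    ArticleRecord(doi="10.0000/example-2", proteins=["P48061"]),
]
print(articles_mentioning_protein(records, "Q05397"))
# prints ['10.0000/example-1']
```

Keying every field to an external identifier scheme, rather than free text, is what would let a query such as "all articles mentioning this protein" return exact matches.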
Support for structured data sets
Preview in browser
Download to desktop software
Search for more data
Developing support for:
• Systems Biology Markup Language
• CellML
• Chemical Markup Language
• Others
SVG: Figures as interactive data objects
Plot graph on axes of choice
Overlay data sets of choice
Click to download raw data
Zoom and pan to view detail
Increasing structure in text markup (1)
The old way (no semantic markup):
<p>...gp120 binding to CXCR4 or CCR5 activates PYK2 and FAK…</p>
Now (key entities and concepts marked up):
<p>...<protein id="urn:lsid:uniprot.org:uniprot:P03378">gp120</protein> <action id="urn:lsid:geneontology.org:go:000548">binding</action> to <protein id="urn:lsid:uniprot.org:uniprot:P48061">CXCR4</protein> or <protein id="urn:lsid:uniprot.org:uniprot:P10147">CCR5</protein> <action id="urn:lsid:geneontology.org:go:0008047">activates</action> <protein id="urn:lsid:uniprot.org:uniprot:O43150">PYK2</protein> and <protein id="urn:lsid:uniprot.org:uniprot:Q05397">FAK</protein>…</p>
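To show what the entity markup makes possible, the sketch below parses the marked-up sentence with Python's standard XML parser and lists each tagged entity with its identifier. The element names and identifiers are copied from the slide; the leading and trailing ellipses are dropped so the fragment parses, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The marked-up sentence from the slide, without the surrounding ellipses.
MARKED_UP = (
    '<p><protein id="urn:lsid:uniprot.org:uniprot:P03378">gp120</protein> '
    '<action id="urn:lsid:geneontology.org:go:000548">binding</action> to '
    '<protein id="urn:lsid:uniprot.org:uniprot:P48061">CXCR4</protein> or '
    '<protein id="urn:lsid:uniprot.org:uniprot:P10147">CCR5</protein> '
    '<action id="urn:lsid:geneontology.org:go:0008047">activates</action> '
    '<protein id="urn:lsid:uniprot.org:uniprot:O43150">PYK2</protein> and '
    '<protein id="urn:lsid:uniprot.org:uniprot:Q05397">FAK</protein></p>'
)


def extract_entities(xml_text):
    """Return (tag, visible text, identifier) for every element carrying an id."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text, el.get("id")) for el in root.iter() if el.get("id")]


def plain_text(xml_text):
    """Recover the sentence a reader sees, with all markup stripped."""
    return "".join(ET.fromstring(xml_text).itertext())


for tag, text, identifier in extract_entities(MARKED_UP):
    print(f"{tag:8} {text:10} {identifier}")
print(plain_text(MARKED_UP))
```

The readable sentence is unchanged, but a program can now find every protein and action in it without any text mining.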
Increasing structure in text markup (2)
The new way (full RDF/XML):
<p>...<rdf:Graph xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:go="urn:lsid:geneontology.org:go:"
    xmlns:uniprot="urn:lsid:uniprot.org:uniprot:">
  <go:000548>
    <uniprot:Protein rdf:resource="urn:lsid:uniprot.org:uniprot:P03378"/>
    <uniprot:Protein rdf:resource="urn:lsid:uniprot.org:uniprot:P48061"/>
    <go:0008047 rdf:resource="urn:lsid:uniprot.org:uniprot:O43150"/>
    <go:0008047 rdf:resource="urn:lsid:uniprot.org:uniprot:Q05397"/>
  </go:000548>
  <go:000548>
    <uniprot:Protein rdf:resource="urn:lsid:uniprot.org:uniprot:P03378"/>
    <uniprot:Protein rdf:resource="urn:lsid:uniprot.org:uniprot:P10147"/>
    <go:0008047 rdf:resource="urn:lsid:uniprot.org:uniprot:O43150"/>
    <go:0008047 rdf:resource="urn:lsid:uniprot.org:uniprot:Q05397"/>
  </go:000548>
  <rdf:label>gp120 binding to CXCR4 or CCR5 activates PYK2 and FAK</rdf:label>
</rdf:Graph>…</p>
With RDF markup, the article XML itself effectively becomes a queryable database of statements
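The statements carried by the RDF example can be pictured as rows that a program filters. The sketch below is an assumption-laden illustration, not an RDF implementation: the event labels `binding-1` and `binding-2`, the predicate names `participant` and `activates`, and both function names are invented here, while the protein identifiers are those used on the slide.

```python
from typing import List, NamedTuple, Optional


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str


UNIPROT = "urn:lsid:uniprot.org:uniprot:"
# Identifiers as given on the slide for each protein name.
GP120, CXCR4, CCR5 = UNIPROT + "P03378", UNIPROT + "P48061", UNIPROT + "P10147"
PYK2, FAK = UNIPROT + "O43150", UNIPROT + "Q05397"

# The two binding events from the example, flattened into triples.
STATEMENTS = [
    Statement("binding-1", "participant", GP120),
    Statement("binding-1", "participant", CXCR4),
    Statement("binding-1", "activates", PYK2),
    Statement("binding-1", "activates", FAK),
    Statement("binding-2", "participant", GP120),
    Statement("binding-2", "participant", CCR5),
    Statement("binding-2", "activates", PYK2),
    Statement("binding-2", "activates", FAK),
]


def match(statements: List[Statement], subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None) -> List[Statement]:
    """Return statements matching the pattern; None acts as a wildcard."""
    return [s for s in statements
            if (subject is None or s.subject == subject)
            and (predicate is None or s.predicate == predicate)
            and (obj is None or s.obj == obj)]


def activated_by_binding_of(statements: List[Statement], protein: str) -> List[str]:
    """List what is activated by any binding event the protein takes part in."""
    events = {s.subject for s in match(statements, predicate="participant", obj=protein)}
    return sorted({s.obj for s in statements
                   if s.subject in events and s.predicate == "activates"})


print(activated_by_binding_of(STATEMENTS, CCR5))
```

A real system would load the triples into an RDF store and query them with a graph query language; the wildcard filter above only shows the shape of such a query.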
Why go to all this effort?
Discoverability and recontextualisation
“Show me statements about the hedgehog gene.”
“Find claims that disagree with this.”
Transparency and flexibility
“Plot this graph on a different scale, with error bars added and with these two extra data sets overlaid.”
Specificity and completeness
“Give me a full description of this mathematical model that I can run on my own computer.”
Reuse and interoperability
“Provide the raw data set used in this analysis in a form that allows me to merge it with my own data.”
Views from the database side
“Before the end of the next decade, pathway databases will become scientific journals and journals will become databases. Biologists will be greatly empowered, and bioinformatics will continue its long evolution.”
Lincoln Stein (Reactome)
“Is a biological database any different than a biological journal? I am working toward reaching an answer of, no, there is no difference.”
Phil Bourne (Protein Data Bank)