TreeBASE is a public repository of peer-reviewed phylogenetic knowledge. Researchers submit their results to TreeBASE while writing a manuscript based on them for publication in a suitable journal. The submitted data are assigned permanent, unique identifiers and web addresses that authors can cite in their article. Anyone can locate and access the data once the study has been published by both TreeBASE and the target journal.

A prototype of this system has served the phylogenetics community well for a number of years, accumulating the results of thousands of studies. Its usage model was that of a silo: data could only be accessed through a web browser, and could only be downloaded in representations that omitted important associated metadata. Making sense of what was available required a human with considerable expertise to read and interpret the web pages through which everything was served.

This model is not always practical. Phyloinformatic research, for example, often uses so much data that automation is becoming necessary. Where human intervention is no longer feasible, machines (which are stupid) must do the job instead, and they need to be told what is what. This has spurred more explicit standardization of the syntax and semantics of phylogenetic knowledge. The latest version of TreeBASE facilitates this by adopting a collection of community standards:

• PhyloWS for automated searching using a contextual query language, and for retrieval using a clearly defined URL API.
• NeXML for robust data syntax and flexible metadata annotation.
• CDAO (and other ontologies) for defining the semantics of the metadata.

We will present an overview of how these components work together to make phylogenetic knowledge accessible to machines on the semantic web. Using this new architecture, client-side software (including off-the-shelf tools such as RSS readers) can query, transform and download TreeBASE data autonomously.
Rise of the machines
Rutger A. Vos, Hilmar Lapp, William H. Piel, Val Tannen
What is TreeBASE?
A repository of user-submitted phylogenies and source data.
Accepts all types of comparative data for all taxa. Data are public once published in a peer-reviewed medium.
Data in preparation are available to the editors or reviewers using a special access code.
Web app
The machine-readable web
Locations on the web are increasingly visited by machines instead of human eyes.
Programmable interfaces with structured return values
The TreeBASE web API
Objects can be found using CQL
Permanent, simple URLs
Every object a resolvable resource
Serialized in various formats
Searching using CQL
Contextual Query Language: a standard for queries to information retrieval systems
Hides the database schema
Instead, search on predicates
Search results as RSS
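Because CQL hides the database schema, a client only needs to put a predicate-based query into a URL. The sketch below builds such a search URL. It assumes a `/find` path and `query`/`format` parameter names under the PhyloWS base shown in the resource URI. These names are assumptions, not a confirmed description of the TreeBASE API. The example query is the journal predicate used elsewhere in these slides.

```python
from urllib.parse import urlencode

# Base taken from the PhyloWS resource URI in the slides.
PHYLOWS_BASE = "http://purl.org/phylo/treebase/phylows"


def build_search_url(resource_type: str, cql_query: str, fmt: str = "rss1") -> str:
    """Return a hypothetical PhyloWS search URL for a CQL query.

    ASSUMPTION: the "/find" path segment and the "query"/"format"
    parameter names are illustrative; check the service documentation.
    """
    params = urlencode({"query": cql_query, "format": fmt})
    return f"{PHYLOWS_BASE}/{resource_type}/find?{params}"


url = build_search_url("study", "prism.publicationName==Evolution")
print(url)
```

Percent-encoding through `urlencode` keeps operators such as `==` intact on the wire. This matters because CQL queries routinely contain characters that are reserved in URLs.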
PhyloWS Resource URI
http://purl.org/phylo/treebase/phylows/study/TB2:S1787
• purl.org: PURL domain
• /phylo: Phylogenetics
• /treebase: TreeBASE
• /phylows: PhyloWS
• /study/TB2:S1787: Object ID
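Since each part of the URI has a fixed role, a client can take it apart mechanically. The minimal sketch below splits the example URI into the labelled components. It further divides the object ID into a resource type (`study`) and a local identifier (`TB2:S1787`). That finer split is an assumption based on this one example, not a documented rule.

```python
from urllib.parse import urlparse


def parse_phylows_uri(uri: str) -> dict:
    """Split a PhyloWS resource URI into the components named on the slide.

    ASSUMPTION: the path always has exactly five segments,
    /phylo/treebase/phylows/<type>/<id>; other shapes raise ValueError.
    """
    parsed = urlparse(uri)
    segments = parsed.path.strip("/").split("/")
    if len(segments) != 5:
        raise ValueError(f"unexpected path shape: {parsed.path}")
    phylo, repository, api, resource_type, local_id = segments
    return {
        "purl_domain": parsed.netloc,
        "phylogenetics": phylo,
        "repository": repository,
        "api": api,
        "resource_type": resource_type,  # e.g. "study"
        "object_id": local_id,           # e.g. "TB2:S1787"
    }


components = parse_phylows_uri(
    "http://purl.org/phylo/treebase/phylows/study/TB2:S1787"
)
print(components)
```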
Same data, different formats
?format=NEXUS: Flat file standard for phylogenetics
?format=NeXML: XML redesign of NEXUS
?format=RDF: CDAO/RDF mapping of NeXML
?format=HTML: Web page describing the resource
?format=RSS1: RSS 1.0 feed for search results
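Switching representations only changes the `format` parameter. The sketch below derives one URL per format for a single resource. It uses the format names exactly as they appear on the slide, so the server may expect a different spelling or case. RSS 1.0 is skipped because the slide lists it for search results rather than for individual objects.

```python
# Format names and descriptions as listed on the slide.
FORMATS = {
    "NEXUS": "Flat file standard for phylogenetics",
    "NeXML": "XML redesign of NEXUS",
    "RDF": "CDAO/RDF mapping of NeXML",
    "HTML": "Web page describing the resource",
    "RSS1": "RSS 1.0 feed for search results",
}


def format_urls(resource_uri: str) -> dict:
    """Map each per-object format name to its URL for one resource."""
    # RSS1 is for search results, so it is not offered for a single object.
    return {
        fmt: f"{resource_uri}?format={fmt}"
        for fmt in FORMATS
        if fmt != "RSS1"
    }


urls = format_urls("http://purl.org/phylo/treebase/phylows/study/TB2:S1787")
for fmt, link in urls.items():
    print(f"{fmt:6} {link}")
```

A client would pass one of these URLs to any HTTP library, for example `urllib.request.urlopen`. Because the identifier stays fixed, caching and linking work the same regardless of which serialization is requested.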
Data and metadata
TreeBASE holds a lot of metadata, for example:
• Lat/long coordinates for specimen samples
• Literature metadata
• Identifiers
Using the newer serialization formats (NeXML and RDF) we can embed all of them using predicates from a variety of ontologies.
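The three kinds of metadata listed above can be expressed as RDF statements whose predicates come from different vocabularies. The sketch below serializes them as N-Triples using Dublin Core terms for literature metadata and identifiers, and Darwin Core terms for coordinates. The vocabulary namespaces are real. The specimen fragment `#specimen1`, the title and the coordinate values are hypothetical placeholders. This illustrates the idea and does not reproduce TreeBASE's actual RDF output.

```python
# Subject IRIs: the study URI comes from the slides; the specimen
# fragment is a HYPOTHETICAL placeholder.
STUDY = "http://purl.org/phylo/treebase/phylows/study/TB2:S1787"
SPECIMEN = STUDY + "#specimen1"

DCTERMS = "http://purl.org/dc/terms/"
DWC = "http://rs.tdwg.org/dwc/terms/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal_triple(subject, predicate, value, datatype=None):
    """Serialize one N-Triples statement whose object is a literal."""
    # Escape backslashes first, then quotes and newlines.
    escaped = (
        str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    obj = f'"{escaped}"'
    if datatype is not None:
        obj += f"^^<{datatype}>"
    return f"<{subject}> <{predicate}> {obj} ."


triples = [
    literal_triple(STUDY, DCTERMS + "title", "Placeholder study title"),
    literal_triple(STUDY, DCTERMS + "identifier", "TB2:S1787"),
    literal_triple(SPECIMEN, DWC + "decimalLatitude", 52.16, XSD + "decimal"),
    literal_triple(SPECIMEN, DWC + "decimalLongitude", 4.49, XSD + "decimal"),
]
print("\n".join(triples))
```

Mixing vocabularies in this way is what lets generic RDF tools interpret parts of the data. For example, a tool can recognize Darwin Core coordinates without knowing anything else about phylogenetics.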
External links
[Diagram: Study, Taxon and Taxon variant objects with external links]
Example: Journal feeds
prism.publicationName==Evolution
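A feed for the `prism.publicationName==Evolution` query can be consumed by any RSS reader or by a few lines of code. The sketch below parses an RSS 1.0 document, which is RDF/XML with items as children of `rdf:RDF`, and extracts each item's URI, title and link. The sample feed is a hypothetical stand-in written for this example, not real TreeBASE output.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the RSS 1.0 specification.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# HYPOTHETICAL sample feed; the URIs are placeholders.
SAMPLE_FEED = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/feed">
    <title>Placeholder feed</title>
    <link>http://example.org/feed</link>
  </channel>
  <item rdf:about="http://example.org/study/1">
    <title>Placeholder study title</title>
    <link>http://example.org/study/1</link>
  </item>
</rdf:RDF>"""


def feed_items(xml_text):
    """Return the items of an RSS 1.0 feed as a list of dicts."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("rss:item", NS):
        items.append({
            "about": item.get(f"{{{NS['rdf']}}}about"),
            "title": item.findtext("rss:title", default="", namespaces=NS),
            "link": item.findtext("rss:link", default="", namespaces=NS),
        })
    return items


items = feed_items(SAMPLE_FEED)
print(items)
```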
Example: UniProt sequences
TreeBASE stores NCBI taxonomy identifiers
Standard tools can rewrite these linkout URLs
Result is a corresponding list of UniProt records
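The rewriting step amounts to extracting the taxon ID from one URL and placing it into another. The minimal sketch below assumes two things: the linkouts follow the NCBI Taxonomy browser pattern `...wwwtax.cgi?id=<taxid>`, and the target is the current UniProt REST search endpoint queried by `organism_id`. Both URL patterns are assumptions, and neither is necessarily what the original demonstration used.

```python
import re
from typing import Optional
from urllib.parse import quote

# ASSUMPTION: linkouts carry the taxon ID in an "id" query parameter.
NCBI_TAXON_RE = re.compile(r"[?&]id=(\d+)")


def ncbi_to_uniprot(linkout: str) -> Optional[str]:
    """Rewrite an NCBI Taxonomy linkout into a UniProt search URL."""
    match = NCBI_TAXON_RE.search(linkout)
    if match is None:
        return None  # not a recognisable taxonomy linkout
    query = quote(f"organism_id:{match.group(1)}")
    return f"https://rest.uniprot.org/uniprotkb/search?query={query}&format=tsv"


print(ncbi_to_uniprot(
    "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606"
))
```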
Example: ToLWeb pages
TreeBASE maps to uBio using skos:closeMatch...
…and uBio to ToL using gla:mapping
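Reaching a ToLWeb page from a TreeBASE taxon means following two predicates in sequence, which is what a SPARQL 1.1 property path such as `skos:closeMatch/gla:mapping` expresses. The sketch below does the same traversal over an in-memory triple list. The `skos:closeMatch` IRI is the real SKOS term. The slide gives no full IRI for `gla:mapping`, so the code keeps it as the prefixed name. All subject and object IRIs are hypothetical placeholders.

```python
from collections import defaultdict

SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"
GLA_MAPPING = "gla:mapping"  # full namespace IRI not given on the slide

# HYPOTHETICAL triples with placeholder IRIs.
TRIPLES = [
    ("http://example.org/treebase/taxon/1", SKOS_CLOSE_MATCH,
     "http://example.org/ubio/namebank/100"),
    ("http://example.org/ubio/namebank/100", GLA_MAPPING,
     "http://example.org/tolweb/page/200"),
]


def index(triples):
    """Index triples by (subject, predicate) for quick lookup."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[(subject, predicate)].append(obj)
    return graph


def follow_path(graph, start, predicates):
    """Follow a sequence of predicates from start, like a SPARQL path."""
    current = [start]
    for predicate in predicates:
        current = [o for node in current for o in graph.get((node, predicate), [])]
    return current


graph = index(TRIPLES)
print(follow_path(graph, "http://example.org/treebase/taxon/1",
                  [SKOS_CLOSE_MATCH, GLA_MAPPING]))
```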
Example: geocoding
TreeBASE uses DarwinCore for lat/lon annotations
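Once coordinates are annotated with the Darwin Core terms `decimalLatitude` and `decimalLongitude`, converting them for a mapping tool is straightforward. The sketch below produces a GeoJSON FeatureCollection. It assumes the annotations have already been extracted into plain dicts with a hypothetical `id` key. The values shown are placeholders. Note that GeoJSON orders positions as longitude first, which is a common source of swapped points.

```python
import json


def to_geojson(records):
    """Convert Darwin Core lat/lon records into a GeoJSON FeatureCollection."""
    features = []
    for rec in records:
        lat = float(rec["decimalLatitude"])
        lon = float(rec["decimalLongitude"])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"coordinates out of range: {lat}, {lon}")
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"id": rec["id"]},
        })
    return {"type": "FeatureCollection", "features": features}


# HYPOTHETICAL record extracted from a specimen annotation.
collection = to_geojson([
    {"id": "specimen1", "decimalLatitude": "52.16", "decimalLongitude": "4.49"},
])
print(json.dumps(collection, indent=2))
```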
What's next?
Make TreeBASE Linked Data compliant
Make TreeBASE extensible with additional annotations using an external triple store
Acknowledgements