1
Turtle
Terse RDF Triple Language
• Concise, human-readable
• Prefixes improve readability
https://www.w3.org/TR/turtle/
2
Turtle
@prefix css: <http://www.example.org/CSS/> .
@prefix ct: <http://bio2rdf.org/clinicaltrials/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ct:NCT00799760 css:title "Evaluation of Efficacity and Safety…"@en ;
css:phase "Phase 3"@en ;
css:enrollment "541"^^xsd:integer .
Graph: TrialURI
• title: Evaluation of Efficacity and Safety of Oseltamivir and Zanamivir
• phase: Phase 3
• enrollment: 541
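To make the role of prefixes concrete, here is a minimal Python sketch of how a prefixed name expands to a full IRI. The prefix table is copied from the slide; the `expand` helper is a hypothetical illustration, not part of any Turtle library.

```python
# Prefix table copied from the @prefix lines on the slide.
PREFIXES = {
    "css": "http://www.example.org/CSS/",
    "ct": "http://bio2rdf.org/clinicaltrials/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'css:title' to a full IRI.

    Hypothetical helper for illustration only; a real Turtle parser
    also handles escapes, base IRIs, and the empty prefix.
    """
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError("Undefined prefix: " + prefix)
    return prefixes[prefix] + local


for name in ("ct:NCT00799760", "css:title", "xsd:integer"):
    print(name, "->", expand(name))
```

An undefined prefix raises an error here, which mirrors the kind of problem a Turtle validator reports when a prefix declaration is missing.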
3
Working with RDF
• Storage
• Querying
• Creation
Optional Applications
– Apache Jena, Jena Fuseki
• RDF storage, validation, querying
– R or SAS
Instructions provided prior to conference
4
Storing RDF: Triple Stores
Native
• 4Store http://www.4store.org/
• AllegroGraph http://franz.com/agraph/allegrograph/
• Apache Jena TDB http://jena.apache.org/
• GraphDB http://ontotext.com/products/graphdb/
DBMS-backed
• Apache Jena SDB http://jena.apache.org/
• Oracle Spatial and Graph http://www.oracle.com/technetwork/database/options/spatialandgraph/overview/rdfsemantic-graph-1902016.html
Hybrid
• Sesame http://rdf4j.org/
• Virtuoso http://virtuoso.openlinksw.com/
List at the W3C: https://www.w3.org/2001/sw/wiki/Category:Triple_Store
Adapted from Dr. Harald Sack, Knowledge Engineering with Semantic Web Technologies, 2015
5
Introduction to Jena Fuseki
Try or follow along
• Apache-Jena – contains the APIs, SPARQL engine, the TDB native RDF database and command line tools ARQ, RIOT …
• Apache-Jena-Fuseki – the Jena SPARQL server
6
Load a File into Fuseki
Try or follow along
• File: ex001.ttl
@prefix css: <http://www.example.org/CSS/> .
@prefix ct: <http://bio2rdf.org/clinicaltrials/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ct:NCT00799760 css:title "Evaluation of Efficacity…"@en ;
css:phase "Phase 3"@en ;
css:enrollment "541"^^xsd:int .
Instructions sent to attendees/available on wiki
7
Resource Description Framework (RDF)
• Basic Concepts
• SPARQL
• Creating RDF
8
What is SPARQL?
• SPARQL: SPARQL Protocol and RDF Query Language
• Not limited to RDF
– Utilities for relational database, spreadsheets, XML, JSON
• Protocol
– Rules for queries and results exchange
(Image: Mr. Sparkle - The Simpsons)
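The "Protocol" half of SPARQL can be illustrated with a short Python sketch that builds, but does not send, a query request: the query text travels in a `query` parameter and the `Accept` header names the desired results format. The endpoint URL assumes a local server with a dataset called `test`, and `build_query_request` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Assumption: a SPARQL service is running locally with a dataset named "test".
ENDPOINT = "http://localhost:3030/test/query"
QUERY = "SELECT * WHERE { ?s ?p ?o . } LIMIT 10"


def build_query_request(endpoint, query):
    """Build (without sending) a SPARQL Protocol "query via GET" request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask the server for the JSON serialization of the result table.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_query_request(ENDPOINT, QUERY)
print(request.get_method())
print(request.full_url)
# Decoding the URL again recovers the original query text.
print(parse_qs(urlsplit(request.full_url).query)["query"][0])
```

Passing `request` to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without a server.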
9
Your First SPARQL Query
Try or follow along
File: ex002.rq
PREFIX css: <http://www.example.org/CSS/>
SELECT *
WHERE{
?s ?p ?o .
} LIMIT 10
10
Query #2: Graph Pattern for Title
Query
PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>
SELECT ?nctid ?title
WHERE{
?nctid css:title ?title .
}
Data
ct:NCT00799760 css:title "Evaluation of Efficacity and Safety…"@en ;
Pattern match
• S: ?nctid matches ct:NCT00799760
• P: css:title matches css:title
• O: ?title matches "Evaluation of Efficacity and Safety…"@en
11
Query for Study Title
Try or follow along
File: ex003.rq
PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>
SELECT ?nctid ?title
WHERE{
?nctid css:title ?title .
}
12
Upload another file
Try or follow along
File: ex004.TTL
@prefix css: <http://www.example.org/CSS/> .
@prefix ct: <http://bio2rdf.org/clinicaltrials/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ct:NCT00799760 css:title "Evaluation of Efficacity …"@en ;
css:phase "Phase 3"@en ;
css:enrollment "541"^^xsd:integer ;
css:primOutcome css:outcome1 .
css:outcome1 rdf:type ct:primary-outcome;
ct:measure "RT-PCR for influenza A virus…"@en ;
ct:time-frame "2 days".
13
Query for Primary Outcome
Data
ct:NCT00799760 css:title "Evaluation of Efficacity …"@en ;
css:phase "Phase 3"@en ;
css:enrollment "541"^^xsd:integer ;
css:primOutcome css:outcome1.
css:outcome1 rdf:type ct:primary-outcome;
ct:measure "RT-PCR for influenza A virus…"@en ;
ct:time-frame "2 days".
Graph Query
ct:NCT00799760 css:primOutcome ?outURI
?outURI ct:measure ?outcome
14
Data and Query
ct:NCT00799760 css:primOutcome ?outURI
?outURI ct:measure ?outcome
SELECT ?outcome
Result: "RT-PCR for influenza A virus…"@en
15
SPARQL Query
PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>
SELECT ?outcome
WHERE
{
ct:NCT00799760 css:primOutcome ?outURI .
?outURI ct:measure ?outcome .
}
Retrieve data that matches the Graph Pattern
NCTID → primOutcome → ?outURI → measure → ?outcome
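How a graph pattern retrieves data can be sketched in a few lines of Python. This is a toy matcher over the ex004.ttl triples written as plain tuples, not how Jena or any real SPARQL engine is implemented; `match_pattern` and `solve` are hypothetical names.

```python
# Triples from ex004.ttl as (subject, predicate, object) tuples.
TRIPLES = [
    ("ct:NCT00799760", "css:title", '"Evaluation of Efficacity …"@en'),
    ("ct:NCT00799760", "css:phase", '"Phase 3"@en'),
    ("ct:NCT00799760", "css:enrollment", '"541"^^xsd:integer'),
    ("ct:NCT00799760", "css:primOutcome", "css:outcome1"),
    ("css:outcome1", "rdf:type", "ct:primary-outcome"),
    ("css:outcome1", "ct:measure", '"RT-PCR for influenza A virus…"@en'),
    ("css:outcome1", "ct:time-frame", '"2 days"'),
]


def match_pattern(pattern, triple, bindings):
    """Match one triple pattern against one triple.

    Terms starting with "?" are variables. Returns the extended
    bindings, or None when the triple does not fit.
    """
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it agrees with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def solve(patterns, triples):
    """Return every set of bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


PATTERNS = [
    ("ct:NCT00799760", "css:primOutcome", "?outURI"),
    ("?outURI", "ct:measure", "?outcome"),
]
solutions = solve(PATTERNS, TRIPLES)
for row in solutions:
    print(row["?outcome"])
```

The shared variable `?outURI` is what links the two patterns: the second pattern only matches triples whose subject equals the value bound by the first.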
16
Query for Study Outcome
Try or follow along
PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>
SELECT ?outcome
WHERE{
ct:NCT00799760 css:primOutcome ?outURI .
?outURI ct:measure ?outcome . }
File: ex005.rq
17
Query with R
R Packages:
• rrdf
• rrdflibs
http://github.com/egonw/rrdf
Requires Java 7 or higher
Willighagen E. (2014) Accessing biological data in R with semantic web technologies. PeerJ PrePrints 2:e185v3. https://dx.doi.org/10.7287/peerj.preprints.185v3
18
File: queryLocalTTL.R
library(rrdf)
dataSource = load.rdf("<path to the TTL file>/ex004.ttl",
format="N3")
query = 'PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>
SELECT ?primaryOutcome
WHERE
{
ct:NCT00799760 css:primOutcome ?outURI .
?outURI ct:measure ?primaryOutcome .
}'
queryResult = as.data.frame(sparql.rdf(dataSource, query))
queryResult
Try or follow along
19
> library(rrdf)
Loading required package: rJava
Loading required package: rrdflibs
> dataSource = load.rdf("<your path>/ex004.ttl", format="N3")
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.RDFLanguages).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
> query = 'PREFIX css: <http://www.example.org/CSS/>
+ PREFIX ct: <http://bio2rdf.org/clinicaltrials/>
+ SELECT ?primaryOutcome
+ WHERE
+ .... [TRUNCATED]
> queryResult = as.data.frame(sparql.rdf(dataSource, query))
> queryResult
primaryOutcome
1 RT-PCR for influenza A virus in nasal secretion
Ignore log4j warnings
Query result!
20
Query an Endpoint with R
library(rrdf)
endpoint = "http://localhost:3030/test/query"
query = "SELECT * WHERE {?s ?p ?o . } LIMIT 10 "
queryResult = sparql.remote(endpoint, query)
queryResult
File: queryLocalFuseki.R
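For readers following along without R, the response side of an endpoint query can be sketched in Python. The JSON below is a hand-written sample in the SPARQL 1.1 Query Results JSON Format using one triple from ex004.ttl; it was not captured from a running server, and `rows_from_results` is a hypothetical helper.

```python
import json

# Hand-written sample response (assumption: not real server output).
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["s", "p", "o"]},
  "results": {
    "bindings": [
      {
        "s": {"type": "uri", "value": "http://bio2rdf.org/clinicaltrials/NCT00799760"},
        "p": {"type": "uri", "value": "http://www.example.org/CSS/phase"},
        "o": {"type": "literal", "xml:lang": "en", "value": "Phase 3"}
      }
    ]
  }
}
"""


def rows_from_results(document):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    data = json.loads(document)
    variables = data["head"]["vars"]
    return [
        # A variable may be unbound in a given row, so check before reading it.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in data["results"]["bindings"]
    ]


rows = rows_from_results(SAMPLE_RESPONSE)
for row in rows:
    print(row["s"], row["p"], row["o"])
```

Each row keeps only the `value` field; the `type` and `xml:lang` details are dropped, which is roughly the level of detail a data frame of results shows.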
21
Query with SAS
SAS Macros:
• %sparqlquery - SPARQL query
• %sparqlupdate - SPARQL update
https://github.com/MarcJAndersen/SAS-SPARQLwrapper
Implementation:
• SAS PROC HTTP to access the service
• Send query/update as text file
• Input result using SAS LIBNAME for XML
Other approaches:
• PROC groovy to execute Java code from Apache Jena
• SAS Java objects to interface to Apache Jena
Requires running SPARQL service, for example Apache Jena
22
Try or follow along
File: queryLocalFuseki.sas
Assumptions:
• Service active at endpoint
• TTL file uploaded to store
23
Trial Triples with SPARQL
http://lod.openlinksw.com/sparql
DESCRIBE <http://bio2rdf.org/clinicaltrials:NCT00799760>
ns1:NCT00799760 rdf:type ns2:Resource ,
ns2:Clinical-Study .
ns1:NCT00799760 ns3:title "Evaluation of Efficacity and Safety of Oseltamivir and Zanamivir"@en ;
ns2:actual-enrollment 541 ;
…AND MUCH MORE….
24
Query a Remote Source
At: http://lod.openlinksw.com/sparql
25
Federated Query: Join data across sources
Local Fed Query Example
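Conceptually, "join data across sources" means combining result rows that agree on their shared variables. In SPARQL 1.1 the endpoint does this for you through the SERVICE keyword; the Python sketch below only illustrates the join idea. The rows are illustrative: the local row reuses the outcome from ex004.ttl, the remote rows are hand-written (one is made up), and it assumes both sources identify the trial with the same IRI.

```python
def join(left, right):
    """Natural join of two lists of bindings on their shared variables."""
    joined = []
    for left_row in left:
        for right_row in right:
            shared = left_row.keys() & right_row.keys()
            # Keep the pair only when every shared variable agrees.
            if all(left_row[var] == right_row[var] for var in shared):
                joined.append({**left_row, **right_row})
    return joined


# Illustrative rows from a local store (values from ex004.ttl).
local_rows = [
    {"nctid": "ct:NCT00799760", "outcome": "RT-PCR for influenza A virus…"},
]

# Illustrative rows from a remote source; the second row is made up.
remote_rows = [
    {"nctid": "ct:NCT00799760", "enrollment": 541},
    {"nctid": "ct:NCT00000000", "enrollment": 100},
]

combined = join(local_rows, remote_rows)
for row in combined:
    print(row)
```

Only the trial present in both sources survives the join, which is why consistent identifiers across sources matter for federated queries.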
26
27
More SPARQL
SPARQL Query Language for RDF https://www.w3.org/TR/rdf-sparql-query/
SPARQL 1.1 Query Language https://www.w3.org/TR/sparql11-query/
“Learning SPARQL” - Bob DuCharme
http://www.learningsparql.com/index.html - examples for download
28
Resource Description Framework (RDF)
• Basic Concepts
• SPARQL
• Creating RDF
29
Creating RDF
• Ontologies
• Create RDF with
• SPARQL
• Text editor
» Validate
• R
• SAS
Other Choices
• Python
• Ruby
• Java
• OpenRefine.....
30
Become a “Triple Maker”…
31
… and not a “Trouble Maker”
The Trouble with Triples…
As you make triples, tame them with:
• Standard Vocabularies/Ontologies
• Data Models
– RDF Data Cube
• “Datensparsamkeit” [1]
Store only the data you need. Link to the rest!
[1] http://martinfowler.com/bliki/Datensparsamkeit.html
32
• No clear division between Vocabulary and Ontology. – W3C
• Vocabulary = standard set of words
• Ontology = concepts and their relations, classes, hierarchies. More formal than vocabulary
• RDFS, Web Ontology Language (OWL)
• Standard naming, classification, inferencing, reasoning
• One of the most important components in the Semantic Web
What is an Ontology?
33
• General purpose
– Dublin Core (common metadata) http://dublincore.org
• Modeling
– OWL, RDFS, SKOS
– W3C RDF Data Cube
• Concept Specific
– STATO – general statistics http://stato-ontology.org/
– Ontology of Clinical Research OCRE http://bioportal.bioontology.org/ontologies/OCRE
– CDISC Standards RDF http://www.cdisc.org/rdf
– Provenance Authoring and Versioning http://purl.org/pav/
Example Ontologies
34
Find Ontologies
Linked Open Vocabularies http://lov.okfn.org/dataset/lov/
541 vocabularies as of March 2016
35
Ontology Tools
Protégé
• Free, widely used
• Web/cloud version
http://protege.stanford.edu/
TopBraid Composer from TopQuadrant
• Free edition, commercial edition
36
Resource Description Framework (RDF)
• Ontologies
• Create RDF with
• SPARQL
• Text editor
» Validate
• R
• SAS
37
Create RDF using SPARQL
…similar to SQL
• CREATE
• UPDATE
• INSERT *
• DELETE
* See later SAS example
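As a rough illustration of creating RDF through SPARQL, the Python sketch below builds, but does not send, a SPARQL 1.1 Update request carrying an INSERT DATA operation. The update endpoint URL is an assumption (a local server with a dataset named `test` and an update service at `/update`), and `build_update_request` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Assumption: local SPARQL service with an update endpoint for dataset "test".
UPDATE_ENDPOINT = "http://localhost:3030/test/update"

# INSERT DATA adds the listed triples to the store.
UPDATE = """PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>
INSERT DATA {
  ct:NCT00799760 css:phase "Phase 3"@en .
}"""


def build_update_request(endpoint, update):
    """Build (without sending) a SPARQL Protocol "update via POST" request."""
    body = urlencode({"update": update}).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(endpoint, data=body, headers=headers)


request = build_update_request(UPDATE_ENDPOINT, UPDATE)
print(request.get_method())
# Decoding the form body recovers the original update text.
print(parse_qs(request.data.decode("utf-8"))["update"][0])
```

Unlike a query, an update changes the store, so it is sent with POST and usually to a separate, access-controlled endpoint.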
38
Create RDF: Text Editor
@prefix ct: <http://bio2rdf.org/clinicaltrials/> .
@prefix css: <http://www.example.org/CSS/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix pav: <http://purl.org/pav/> .
ct:NCT00799760 css:enrollment "541"^^xsd:int ;
css:phase "Phase 3"@en ;
css:title "Evaluation of Efficacity a…"@en ;
pav:createdWith "Text Editor"^^xsd:string .
Try or follow along
39
Validate
• Apache Jena RIOT (RDF I/O Technology)
riot --validate CreateTTLFromEditor.TTL
Example errors
1. Forgot PAV prefix
08:45:44 ERROR riot :: [line: 9, col: 16] Undefined prefix: pav
2. Incorrect triples termination
08:45:44 ERROR riot :: [line: 9, col: 32] Unexpected IRI for predicate…
* Note: requires Apache Jena in the system path