RDF Databases
By:
Chris Halaschek
Outline
Motivation / Requirements
Storage Issues
Sesame
  General Introduction
  Architecture
  Scalability
RQL
  Introduction
  Demo
Future Directions
Motivation
Having metadata available is not enough
Need tools to process, transform, and reason with the information
Need a way to store the metadata and interact with it
Requirements
Scalable
Good performance
Useful query language
Storage Issues
How to store the data?
  In relational database as tables
    Querying requires many joins…costly
  Triples
  Native graph structure
    Querying requires graph traversals…need efficient algorithms
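To make the join cost concrete, here is a minimal sketch that stores triples in a single relational table and follows a two-step path. It uses Python's built-in sqlite3 module purely for illustration; the table layout and the sample data are assumptions, not the schema Sesame actually uses.

```python
import sqlite3

# Hypothetical museum-style data; identifiers are shortened for readability.
TRIPLES = [
    ("picasso", "paints", "guernica"),
    ("guernica", "technique", "oil on canvas"),
    ("rodin", "sculpts", "thinker"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", TRIPLES)

# A two-step path (painter -> painting -> technique) needs one self-join;
# every additional step in the path adds another join.
rows = conn.execute(
    """
    SELECT t1.subject, t1.object, t2.object
    FROM triples AS t1
    JOIN triples AS t2 ON t2.subject = t1.object
    WHERE t1.predicate = 'paints' AND t2.predicate = 'technique'
    """
).fetchall()
print(rows)
```

A native graph store would answer the same question by walking edges from each painter node, which is why efficient traversal algorithms matter there.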
Sesame - Introduction
Open source RDF Schema-based repository and querying facility
Developed as a research prototype by Aidministrator Nederland bv
NLnet Foundation sponsors its further development as open source software
Sesame - Introduction
Can handle RDF data in XML-serialized RDF and N-Triples format
Can extract the contents of a Sesame repository in XML-serialized RDF, N-Triples, and N3 format
Sesame – Architecture
Repository
Many options due to Repository Abstraction Layer (RAL)
  DBMS – relational, object-relational, etc.
  Existing RDF stores
  RDF files
  RDF network services
Repository Abstraction Layer (RAL)
Interface that translates RDF-specific methods to a specific DBMS
Defined by an RDF API
Created their own set of interfaces rather than adopt or extend the existing RDF API proposal
  Existing API targeted main memory model
  Theirs offers specific operations that support RDF Schema semantics (e.g. subsumption reasoning)
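The idea of an abstraction layer can be sketched as an interface with interchangeable backends. The class and method names below (RdfRepository, InMemoryRepository, is_subclass_of) are hypothetical and written in Python for brevity; they are not Sesame's actual interfaces.

```python
from abc import ABC, abstractmethod


class RdfRepository(ABC):
    """Hypothetical RAL-style interface; not Sesame's actual API."""

    @abstractmethod
    def add_statement(self, subject, predicate, obj): ...

    @abstractmethod
    def get_statements(self, subject=None, predicate=None, obj=None): ...

    @abstractmethod
    def is_subclass_of(self, sub, sup): ...


class InMemoryRepository(RdfRepository):
    """One possible backend; a DBMS-backed class would expose the same methods."""

    SUBCLASS = "rdfs:subClassOf"

    def __init__(self):
        self._statements = set()

    def add_statement(self, subject, predicate, obj):
        self._statements.add((subject, predicate, obj))

    def get_statements(self, subject=None, predicate=None, obj=None):
        # None acts as a wildcard for that position.
        pattern = (subject, predicate, obj)
        return sorted(
            s for s in self._statements
            if all(p is None or p == v for p, v in zip(pattern, s))
        )

    def is_subclass_of(self, sub, sup):
        # Subsumption check: follow rdfs:subClassOf edges upward from `sub`.
        # A class counts as a subclass of itself in this sketch.
        seen, frontier = set(), [sub]
        while frontier:
            current = frontier.pop()
            if current == sup:
                return True
            if current in seen:
                continue
            seen.add(current)
            frontier.extend(
                o for _, _, o in self.get_statements(current, self.SUBCLASS)
            )
        return False


repo = InMemoryRepository()
repo.add_statement("Cubist", "rdfs:subClassOf", "Painter")
repo.add_statement("Painter", "rdfs:subClassOf", "Artist")
print(repo.is_subclass_of("Cubist", "Artist"))
```

Client code depends only on the interface, so swapping the storage backend should not require changes in the modules that call it.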
RAL Continued
Several of Sesame’s functional modules are clients of the RAL
Problems: Must read from repository – performance decrease
Solution – selectively caching data in memory
For small repositories, all data can be cached
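One simple way to picture the caching idea is a read-through cache placed in front of a slower backend. The sketch below is an assumption about how such a layer could look; ListBackend and CachingRepository are hypothetical names, and the invalidation strategy (clear everything on write) is the simplest possible choice rather than anything the source describes.

```python
class ListBackend:
    """Stand-in for a slow repository (hypothetical)."""

    def __init__(self):
        self.rows = []

    def add_statement(self, s, p, o):
        self.rows.append((s, p, o))

    def get_statements(self, s=None, p=None, o=None):
        pattern = (s, p, o)
        return [
            r for r in self.rows
            if all(q is None or q == v for q, v in zip(pattern, r))
        ]


class CachingRepository:
    """Read-through cache keyed by the (s, p, o) lookup pattern."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}
        self.backend_reads = 0  # counts how often the backend is actually hit

    def add_statement(self, s, p, o):
        self._backend.add_statement(s, p, o)
        self._cache.clear()  # drop all cached results on any write

    def get_statements(self, s=None, p=None, o=None):
        key = (s, p, o)
        if key not in self._cache:
            self.backend_reads += 1
            self._cache[key] = self._backend.get_statements(s, p, o)
        return self._cache[key]


repo = CachingRepository(ListBackend())
repo.add_statement("picasso", "paints", "guernica")
repo.get_statements(p="paints")
repo.get_statements(p="paints")  # served from the cache
print(repo.backend_reads)
```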
Functional Modules
Interact with RAL
RQL query module
  Evaluates RQL queries
RDF administration module
  Allows uploading RDF data and schema information, as well as deleting information
RDF export module
  Allows extraction of schema and/or data from repository
RQL Query Module
Proposed RQL:
  Developed within the European IST project C-Web
  Follow-up project by ICS at FORTH, in Greece
  Adopts the syntax of OQL
Sesame’s implementation of RQL is slightly different from the proposed RQL
  Better compliance with W3C specifications
  Support for optional domain and range restrictions
Queries are translated into sets of calls to the RAL
Note: Also supports RDQL – based on SquishQL
Admin Module
Main functions:
  Add RDF data/schema information
  Clear repository
Retrieves information from an RDF(S) source and parses it using the SiRPAC RDF parser
Parser delivers information to the admin module in statement form – (S,P,O)
Module checks statements for consistency and then inserts the data
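The check-then-insert flow can be sketched as follows. The source does not say which consistency rules the admin module applies, so the rule used here (subject and predicate must be URIs, never literals) is an assumption chosen only to illustrate the pipeline; all function names are hypothetical, and blank nodes are ignored for simplicity.

```python
def is_uri(term):
    # Assumption for this sketch: URIs are written in angle brackets and
    # literals in double quotes, as in N-Triples.
    return term.startswith("<") and term.endswith(">")


def check_statement(statement):
    """Illustrative rule only: subject and predicate must be URIs."""
    subject, predicate, _obj = statement
    return is_uri(subject) and is_uri(predicate)


def upload(statements, repository):
    """Insert consistent (S, P, O) statements; return the rejected ones."""
    rejected = []
    for statement in statements:
        if check_statement(statement):
            repository.add(statement)
        else:
            rejected.append(statement)
    return rejected


parsed = [
    ("<http://example.org/picasso>", "<http://example.org/paints>", "<http://example.org/guernica>"),
    ('"Pablo"', "<http://example.org/paints>", "<http://example.org/guernica>"),
]
repository = set()
rejected = upload(parsed, repository)
print(len(repository), len(rejected))
```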
RDF Export Module
Exports the contents of a repository formatted in XML-serialized RDF
Supplies a basis for using Sesame in combination with other RDF tools
Communication with Sesame
Multiple options for various contexts
  HTTP
  RMI
  SOAP
Intermediaries between the functional modules and their clients
Sesame - Scalability
Performance Tests
  Uploaded and queried collection of nouns from WordNet – 400,000 RDF statements
  Performed on Sun UltraSPARC 5, 256 MB RAM
  Used Java Servlets running on web server to communicate over HTTP
  PostgreSQL version 7.1.2 repository
Scalability Continued
Uploading nouns
  94 minutes
  71 statements per second
Querying was much slower than expected
  Due to distributed storage over multiple tables
  Retrieving data required doing many joins
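The upload rate follows directly from the two reported figures; a quick check, assuming the 400,000 statements and 94 minutes refer to the same upload run:

```python
statements = 400_000
upload_minutes = 94

# Statements per second = total statements / total seconds.
rate = statements / (upload_minutes * 60)
print(round(rate))
```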
Sesame’s Future
Migration of Sesame to alternate repositories to boost performance
DAML + OIL support
RQL Introduction
Museum schema example
RQL - Syntax
Query typically built upon three clauses
Select
  Projection over query results
From
  Bind variables to specific locations in graph model
Where
  Optional – constraint on values of variables in the from clause
RQL - Example
select X, @P
from {X} @P {Y}
where Y like "Pablo"
X and Y are bound to nodes
@P is bound to a connecting edge
  @ prefix signifies the variable is bound to properties
  $ prefix signifies classes
http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
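Read over a set of triples, the query above amounts to a filter plus a projection. The sketch below mimics it in Python over hypothetical sample data; it treats like as a glob-style match via fnmatchcase, which is an assumption about the operator's semantics and not a statement of how Sesame evaluates it.

```python
from fnmatch import fnmatchcase

# Hypothetical data: (subject node, property edge, object node).
TRIPLES = [
    ("picasso", "first_name", "Pablo"),
    ("picasso", "last_name", "Picasso"),
    ("rodin", "first_name", "Auguste"),
]


def select_x_p(triples, pattern):
    """Mimics: select X, @P from {X} @P {Y} where Y like pattern."""
    return [(x, p) for x, p, y in triples if fnmatchcase(y, pattern)]


result = select_x_p(TRIPLES, "Pablo")
print(result)
```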
RQL - Namespaces
In RDF, nodes and edges are identified by URIs
Can be very long
Namespace abbreviation mechanism
  Extra clause: using namespace cult = http://www.icom.com/schema.rdf#
  Simply type: cult:paints
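Abbreviation is plain prefix substitution. A minimal sketch of the expansion step, assuming a simple prefix table (the expand function is a hypothetical helper, not part of RQL or Sesame):

```python
NAMESPACES = {"cult": "http://www.icom.com/schema.rdf#"}


def expand(qname, namespaces=NAMESPACES):
    """Replace a known prefix with its full namespace URI."""
    prefix, sep, local = qname.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return qname  # unknown prefix or already a full URI: leave unchanged


print(expand("cult:paints"))
```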
RQL – Path Expressions
Specify a linear path through the graph
select PAINTER, PAINTING, TECH
from {PAINTER} cult:paints {PAINTING}. cult:technique {TECH}
using namespace cult = http://www.icom.com/schema.rdf#
http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
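A path expression chains two triple patterns through a shared node, here PAINTING. The sketch below spells that out as a nested loop over hypothetical sample triples; it illustrates the meaning of the query and is not how Sesame evaluates it.

```python
# Hypothetical data using the abbreviated property names from the query.
TRIPLES = [
    ("picasso", "cult:paints", "guernica"),
    ("guernica", "cult:technique", "oil on canvas"),
    ("rembrandt", "cult:paints", "nightwatch"),
]


def path_query(triples):
    """Mimics: {PAINTER} cult:paints {PAINTING}. cult:technique {TECH}"""
    results = []
    for painter, p1, painting in triples:
        if p1 != "cult:paints":
            continue
        # The object of the first step must be the subject of the second.
        for subject, p2, tech in triples:
            if p2 == "cult:technique" and subject == painting:
                results.append((painter, painting, tech))
    return results


result = path_query(TRIPLES)
print(result)
```

Paintings with no recorded technique drop out of the result, because both steps of the path must match.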
RQL – Querying Schema
Retrieving the class of a resource
select X, $X, Y
from {X : $X} cult:paints {Y}
using namespace cult = http://www.icom.com/schema.rdf#
Variable $X is matched to the class of the resource value of X
http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
RQL – Querying Schema
Constraining resources to a schema
select X, Y
from {X : cult:Cubist } cult:paints {Y}
using namespace cult = http://www.icom.com/schema.rdf#
RQL – Standard Functions
Class (also Property)
subClassOf (also subPropertyOf)
typeOf
In all of the above, use ^ for only direct descendants (e.g. subClassOf^( cult:Painter ) )
RQL – subClassOf
Example:
select X, @P, Y
from {X} @P {Y}
where X in subClassOf^( cult:Painter )
using namespace cult = http://www.icom.com/schema.rdf#
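The difference between subClassOf and subClassOf^ is the difference between all descendants and direct descendants only. A small sketch over a hypothetical class hierarchy (the class names and function names are assumptions for illustration):

```python
# child -> direct parent classes; hypothetical hierarchy.
SUBCLASS_OF = {
    "cult:Cubist": ["cult:Painter"],
    "cult:Flemish": ["cult:Painter"],
    "cult:Painter": ["cult:Artist"],
    "cult:Sculptor": ["cult:Artist"],
}


def direct_subclasses(cls):
    """Corresponds to subClassOf^(cls): one level down only."""
    return sorted(c for c, parents in SUBCLASS_OF.items() if cls in parents)


def all_subclasses(cls):
    """Corresponds to subClassOf(cls): every level below cls."""
    found, frontier = set(), [cls]
    while frontier:
        for child in direct_subclasses(frontier.pop()):
            if child not in found:
                found.add(child)
                frontier.append(child)
    return sorted(found)


print(direct_subclasses("cult:Artist"))
print(all_subclasses("cult:Artist"))
```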
RQL – Advanced Queries
Set Operators
  Union, Intersection, Difference
Logical Operators
Domain and Range Constraints
Comprehensive List:
http://sesame.aidministrator.nl/publications/rql-tutorial.html
Future of RDF Databases
Standard query language
Improved storage structures
Native graph model
References / Links
Sesame: http://sesame.aidministrator.nl/
NLnet Foundation: http://www.nlnet.nl/
Original Specifications of RQL: http://139.91.183.30:9090/RDF/RQL