Knowledge Technologies Institute

Triple Stores in a Nutshell

Franjo Bratić
Alfred Wertner

Overview

What are the essential characteristics of a Triple Store?
- short introduction
- examples and background information

"The agony of choice": what's on the market? Which one fits my needs?
- a few examples

Benchmark
- example

Live demo with AllegroGraph
- import data
- use the Java Client API and run some queries

Motivation

RDF is good at modeling assertions
- RDF consists of assertions
- aka triples

Application developers need tools that can manage RDF data
- import/export
- query
- update

http://www.franz.com/agraph/support/documentation/current/agraph-introduction.html
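The three management tasks listed above (import/export, query, update) can be illustrated with a minimal in-memory sketch. The class name `TinyTripleStore` and the `http://example.org/` URIs are hypothetical and not taken from AllegroGraph or any other product. The import routine only understands the URI-only lines that the export routine writes, so it is not a full N-Triples parser.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as URI strings


class TinyTripleStore:
    """Hypothetical in-memory store: import/export, query, update."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    # --- update ---
    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def remove(self, s: str, p: str, o: str) -> None:
        self._triples.discard((s, p, o))

    # --- query: None acts as a wildcard for that position ---
    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> Iterator[Triple]:
        for t in sorted(self._triples):
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
                yield t

    # --- export / import (URI-only subset of N-Triples) ---
    def export_ntriples(self) -> str:
        return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in sorted(self._triples))

    def import_ntriples(self, text: str) -> None:
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            s, p, o = (part.strip("<>") for part in line.rstrip(" .").split(" "))
            self.add(s, p, o)


EX = "http://example.org/"  # hypothetical namespace
store = TinyTripleStore()
store.add(EX + "Angel", EX + "petOf", EX + "GroundChuck")
dump = store.export_ntriples()
```

A real triple store adds persistence, indexes and a SPARQL engine on top of this basic add/remove/match interface.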

Triple Stores: Essentials

Triple Stores are tools for RDF data management

Essential characteristics:

Persist RDF data
- native storage design (graph database)
- use a relational database

Query and update the graph
- support SPARQL

Persist RDF Data: Native Store

Designed for storing graphs

Block diagram of a native store implementation

http://www.franz.com/agraph/support/documentation/current/agraph-introduction.html

Persist RDF Data: Quads

A quad extends a triple with context information
- fast retrieval of triples
- supported by many Triple Stores
- is not part of RDF!

Example: "Get everything about Chuck's home page"

Subject        Predicate    Object         Context
Ground Chuck   Type         Human          Chuck's home page
Angel          petOf        Ground Chuck   Chuck's home page
petOf          inverseOf    hasPet         English grammar
Dog            subClassOf   Mammal         science
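Why the context column speeds up retrieval can be sketched with the example rows above. The function name `build_context_index` and the dictionary-based index are assumptions made for illustration; actual stores use their own index structures.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]  # (subject, predicate, object, context)

QUADS: List[Quad] = [
    ("Ground Chuck", "Type", "Human", "Chuck's home page"),
    ("Angel", "petOf", "Ground Chuck", "Chuck's home page"),
    ("petOf", "inverseOf", "hasPet", "English grammar"),
    ("Dog", "subClassOf", "Mammal", "science"),
]


def build_context_index(quads: List[Quad]) -> Dict[str, List[Triple]]:
    """Group triples by their context so one lookup returns a whole context."""
    index: Dict[str, List[Triple]] = defaultdict(list)
    for s, p, o, c in quads:
        index[c].append((s, p, o))
    return dict(index)


index = build_context_index(QUADS)

# "Get everything about Chuck's home page" is a single dictionary lookup.
chuck = index.get("Chuck's home page", [])
```

Without the fourth column, the same request would need extra triples describing where each statement came from.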

Persist RDF Data: RDBMS

Stores triples in a relational database

Can you think of a simple way to achieve that?
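One simple answer, given here only as a sketch, is a single table with three columns. The table name `triples`, the index name `idx_pos` and the use of SQLite are assumptions chosen for illustration; they do not describe the schema of any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per triple; the primary key doubles as a subject-first index.
conn.execute(
    "CREATE TABLE triples ("
    " subject TEXT NOT NULL,"
    " predicate TEXT NOT NULL,"
    " object TEXT NOT NULL,"
    " PRIMARY KEY (subject, predicate, object))"
)
# Second index for lookups that start from predicate and object.
conn.execute("CREATE INDEX idx_pos ON triples (predicate, object, subject)")

conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("Ground Chuck", "Type", "Human"),
        ("Angel", "petOf", "Ground Chuck"),
        ("Dog", "subClassOf", "Mammal"),
    ],
)

# One triple pattern: "Who is a pet of Ground Chuck?"
pets = [
    row[0]
    for row in conn.execute(
        "SELECT subject FROM triples WHERE predicate = ? AND object = ?",
        ("petOf", "Ground Chuck"),
    )
]

# Two triple patterns: "What type is Angel's owner?" needs a self-join.
owner_types = [
    row[0]
    for row in conn.execute(
        "SELECT t2.object FROM triples AS t1"
        " JOIN triples AS t2 ON t2.subject = t1.object"
        " WHERE t1.subject = ? AND t1.predicate = 'petOf' AND t2.predicate = 'Type'",
        ("Angel",),
    )
]
```

Each additional triple pattern in a query adds another self-join on the same table, which is one reason why dedicated stores invest in their own storage and index designs.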

Query and update the Graph: SPARQL

SPARQL support
- SPARQL Protocol
- SPARQL Query Language

SPARQL Protocol
- query and update operations based on HTTP
- between client and SPARQL endpoint

SPARQL Query Language
- queries: SELECT, ASK, DESCRIBE, CONSTRUCT
- updates: INSERT, DELETE
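How a client might talk to a SPARQL endpoint over HTTP can be sketched as follows. The endpoint URL, the helper names `build_query_request` and `build_update_request`, and the example data are hypothetical. The parameter names `query` and `update` follow the SPARQL 1.1 Protocol; many stores expose queries and updates on separate endpoint URLs, so the shared URL here is a simplification. The sketch only builds the requests and does not send them.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://localhost:8080/sparql"  # hypothetical endpoint URL


def build_query_request(endpoint: str, query: str) -> Request:
    """Query operation: HTTP GET with the query text in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )


def build_update_request(endpoint: str, update: str) -> Request:
    """Update operation: HTTP POST with a form-encoded 'update' parameter."""
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


select_req = build_query_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
insert_req = build_update_request(
    ENDPOINT,
    "INSERT DATA { <http://example.org/Angel> "
    "<http://example.org/petOf> <http://example.org/GroundChuck> }",
)
```

Passing either request object to `urllib.request.urlopen` would send it to a running endpoint.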

Triple Stores …

The agony of choice …

Are there differences?

Is one of them "the right one"?

How to choose one for the project?
- Requirements / criteria?
- Environment of use?
- Performance?
- Costs?
- …

Set some criteria …

Scalability
- persistent stores scale better than in-memory stores

Interoperability & portability
- programming language !!!
- commitment to using the entire stack of a store

Optimization
- native stores vs. 3rd party stores

License, support, community, …

… only a few left!

AllegroGraph v4.9

load, store, query RDF data

includes an implementation of Prolog

runs natively on 64-bit Linux (x86-64)

Interfaces: Java, Python, Ruby, Perl, C#, Clojure, Common Lisp

Tools: AGWebView, Gruff, …

License: free for fewer than 50 million triples

OpenLink Virtuoso v6.2

high-performance object-relational SQL database

written in C

distributions for Unix & Windows

Access through: Jena & Sesame

Tools: ISQL, Graphical Conductor

License: GPL v2 & commercial

OpenLink Virtuoso v6.2

http://virtuoso.openlinksw.com/images/varch625.jpg

Jena

Java-based open source framework

represents RDF graphs as native models:
- in-memory
- other data sources (file, database)

Framework includes:
- RDF API
- reading and writing RDF in RDF/XML, N3 and N-Triples
- OWL API
- in-memory and persistent storage
- rule-based inference engine
- query engine implementing the SPARQL specification

Jena TDB

high-performance, pure-Java, non-SQL storage subsystem

persistent graph storage layer for Jena

works with Jena SPARQL query engine (ARQ)

number of extensions (e.g. property functions, aggregates, arbitrary length property paths)

custom implementation of B+ trees

License: BSD-License

Jena SDB

is basically a Java loader

multiple stores supported
- e.g. MySQL, PostgreSQL, Oracle, DB2, Apache Derby, …

provides for:
- scalable storage & query of RDF datasets using conventional SQL databases

standard database tools can all be used to manage the installation:
- load balancing, security, clustering
- backup and administration

designed specifically to support SPARQL

Sesame

framework for processing RDF data
- parsing, storing, inference & querying

on top of a variety of storage systems
- relational databases, in-memory, file systems, keyword indexers, …

wide range of access options
- HTTP, SOAP, RMI

fully supports SPARQL (since 2008)

supports the main RDF file formats:
- RDF/XML, Turtle, N-Triples, TriG & TriX, …

Sesame

runs as a Java servlet application in Apache Tomcat

communicates over HTTP

http://www.openrdf.org/doc/sesame/users/figures/sesame-server.png

Benchmark

What data should be used?
- Lehigh University Benchmark (LUBM)
  - 14 test queries
- Berlin SPARQL Benchmark (BSBM)
  - 12 test queries
- "real-world" data
  - e.g. DBpedia, WordNet, …

Who is testing?
- no central institution
- tests are (mostly) run only by the stores' creators, so results can be manipulated

Testing architecture?

Benchmark

Not considered in almost all benchmarks:
- RDFS reasoning
- SPARQL 1.1
- heavy load
- multiple queries in parallel

Conclusion of every benchmark, known in advance:

NO store wins in every field!!!

Benchmark example

“Yet Another Triple Store Benchmark”

http://mt.inf.tu-dresden.de/forschung/topics/bm/

Machine hardware
- CPU: Intel® Xeon® CPU X5660 @ 2.80GHz x 4
- RAM: 16 GB
- Hard disk: 1 x 34 GB, 1 x 42 GB

Software
- OS: Ubuntu 12.04 LTS / 64 Bit
- JRE: JDK 1.7.0_04
- Apache Tomcat Ver. 7.0.28

Benchmark example stores

Fuseki (Jena TDB SPARQL Server) ver. 0.2.3
- TDB Loader of Jena TDB 0.9.0

NanoSPARQLServer of bigdata ver. 1.2.0
- deployed on a Tomcat server

OWLIM LITE ver. 5.0.5001
- via Sesame 2.6.5 deployed on a Tomcat server

OpenLink Virtuoso Ver. 6.01.3127

Benchmark example dataset

                           NYTimes   Jamendo  Movie DB  Yago 2 Core
N-Triples data size (MB)      56.2     151.0     891.6      5,427.2
Triples (millions)            0.35      1.05      6.15        35.43
Instances (thousands)         13.2     290.4     665.4      2,648.4
Classes                         19        21        53      292,861
Properties                      69        47       222           93

Benchmark example queries

Queries 1-6
- generic queries
- same for each dataset

Queries 7-13
- SPARQL 1.1 queries specialized for each dataset

Queries 14 & 15
- SPARQL Update queries
- delete and insert some data in the graph
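How a set of such queries might be timed can be sketched as follows. This is not the harness used by "Yet Another Triple Store Benchmark"; the function names `time_query` and `run_benchmark`, the warm-up and repeat counts, and the choice of the median are all assumptions. The two lambda workloads are stand-ins for real calls to a store.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run: Callable[[], object], warmups: int = 2, repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds of `repeats` runs."""
    for _ in range(warmups):
        run()  # warm caches before measuring
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def run_benchmark(queries: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Time every named query once per store under test."""
    return {name: time_query(run) for name, run in queries.items()}


# Stand-in workloads; a real run would send each query to the store.
results = run_benchmark(
    {
        "query_1": lambda: sum(range(10_000)),
        "query_14": lambda: sorted(range(10_000), reverse=True),
    }
)
```

Update queries such as 14 and 15 change the data, so a real harness would also need to restore the dataset between repeats.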

Triple Store

DEMO!!!