Towards Versioning of Arbitrary RDF Data

FROMMHOLD M., NAVARRO PIRIS R., ARNDT N., TRAMP S., PETERSEN N., AND MARTIN M.

September 14, 2016

INTRODUCTION

What is it?

A change tracking system for RDF datasets that creates invertible patches for SPARQL Update queries.
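To make the idea of an invertible patch concrete, here is a minimal sketch in Python. It is not the system's actual data model: the `Patch` class, its field names, and the representation of quads as 4-tuples of strings are assumptions made for illustration. The sketch only shows that a patch recording the removed and added quads can be undone by swapping the two sets, provided the patch contains effective changes only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

# Hypothetical representation: (subject, predicate, object, graph) as plain strings.
Quad = Tuple[str, str, str, str]


@dataclass(frozen=True)
class Patch:
    """Illustrative patch: the quads an update removed and the quads it added."""
    removed: FrozenSet[Quad]
    added: FrozenSet[Quad]

    def apply(self, dataset: Set[Quad]) -> Set[Quad]:
        # Deletions are applied before insertions, as in SPARQL Update.
        return (dataset - self.removed) | self.added

    def invert(self) -> "Patch":
        # Swapping both sets yields the patch that undoes this one.
        return Patch(removed=self.added, added=self.removed)


g = "ex:graph"
before = {("ex:alice", "ex:age", "30", g)}
patch = Patch(
    removed=frozenset({("ex:alice", "ex:age", "30", g)}),
    added=frozenset({("ex:alice", "ex:age", "31", g)}),
)
after = patch.apply(before)
restored = patch.invert().apply(after)
```

In this sketch `restored` equals `before`. The inverse is only exact when every removed quad was actually present and every added quad was actually absent before the update.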

What is it not (yet)?

A full-fledged versioning system with support for versioning operations.

MOTIVATION

LEDS research project

Allow for co-evolution of RDF datasets outside and inside of an enterprise for better integration of public and private data.

http://www.leds-projekt.de/

LUCID research project

Automatic distribution of RDF data changes in decentralized value chain networks to optimize the flow of information.

http://www.lucid-project.org/

REQUIREMENTS

• Support for versioning operations
  • commute, revert, merge [Cassidy and Ballantine 2007]

• Detection of effective changes

• Full RDF support

• Protection against unnoticed manipulation of the history

• Distributable patches
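The requirement to detect effective changes can be illustrated with a small Python sketch. It assumes a dataset modelled as a set of triples and an update given as a set of requested deletions and a set of requested insertions; the function name `effective_changes` is hypothetical. Only triples whose presence actually differs before and after the update count as changes.

```python
from typing import Set, Tuple

# Hypothetical representation: (subject, predicate, object) as plain strings.
Triple = Tuple[str, str, str]


def effective_changes(
    dataset: Set[Triple], deletes: Set[Triple], inserts: Set[Triple]
) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (removed, added): the triples that really changed state."""
    # SPARQL Update applies deletions first, then insertions.
    after = (dataset - deletes) | inserts
    return dataset - after, after - dataset


dataset = {("ex:a", "ex:p", "1"), ("ex:b", "ex:p", "2")}
deletes = {("ex:a", "ex:p", "1"), ("ex:c", "ex:p", "3")}  # second triple is not stored
inserts = {("ex:b", "ex:p", "2"), ("ex:d", "ex:p", "4")}  # first triple already exists
removed, added = effective_changes(dataset, deletes, inserts)
```

Here `removed` holds only the triple about `ex:a` and `added` only the triple about `ex:d`; the no-op deletion and the no-op insertion are dropped.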

FEATURES

✓ Invertible patches

✓ Change detection

✓ Patches with quad and blank node support

✓ Version integrity by signing of patches

✓ eccrev: RDF changes and revisions vocabulary
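The slides do not say how patches are signed, so the following Python sketch is only one possible reading of "version integrity by signing of patches". It assumes that each patch digest also covers the digest of its predecessor, and it uses an HMAC from the standard library as a stand-in for a real digital signature. All function names and the dictionary layout are hypothetical.

```python
import hashlib
import hmac
import json


def patch_digest(patch: dict, parent: str) -> str:
    """Hash a patch together with the digest of the previous patch."""
    payload = json.dumps(
        {
            "parent": parent,
            "removed": sorted(patch["removed"]),
            "added": sorted(patch["added"]),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def sign(digest: str, key: bytes) -> str:
    # Stand-in for a signature: an HMAC over the digest (an assumption).
    return hmac.new(key, digest.encode("utf-8"), hashlib.sha256).hexdigest()


def append_patch(history: list, patch: dict, key: bytes) -> None:
    parent = history[-1]["digest"] if history else ""
    digest = patch_digest(patch, parent)
    history.append({"patch": patch, "digest": digest, "signature": sign(digest, key)})


def verify_history(history: list, key: bytes) -> bool:
    """Recompute every digest and signature; any edit to an old patch fails."""
    parent = ""
    for entry in history:
        digest = patch_digest(entry["patch"], parent)
        if digest != entry["digest"]:
            return False
        if not hmac.compare_digest(sign(digest, key), entry["signature"]):
            return False
        parent = digest
    return True


key = b"demo-key"  # illustrative only
history: list = []
append_patch(history, {"removed": [], "added": [("ex:s", "ex:p", "1", "ex:g")]}, key)
append_patch(history, {"removed": [("ex:s", "ex:p", "1", "ex:g")], "added": []}, key)
ok_before = verify_history(history, key)

history[0]["patch"]["added"] = [("ex:s", "ex:p", "2", "ex:g")]  # tamper with the past
ok_after = verify_history(history, key)
```

Because each digest includes its parent, changing an earlier patch invalidates that entry and every entry after it, which is what makes a manipulation of the history noticeable.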

ARCHITECTURE

BLANK NODES

Tummarello et al. 2005:

Given an RDF statement s, the Minimum Self-contained Graph (MSG) containing that statement, written MSG(s), is the set of RDF statements comprised of the following:

1. The statement in question;

2. Recursively, for all the blank nodes involved by statements included in the description so far, the MSG of all the statements involving such blank nodes.
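The definition above can be sketched in Python as a breadth-first traversal over blank nodes. This is an illustration under simplifying assumptions, not the implementation used by the system: triples are plain 3-tuples of strings, a term counts as a blank node when it starts with `_:`, and the helper names `is_blank` and `msg` are hypothetical.

```python
from collections import deque
from typing import Set, Tuple

# Hypothetical representation: (subject, predicate, object) as plain strings.
Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    # Simplification: blank nodes are recognised by their "_:" label prefix.
    return term.startswith("_:")


def msg(statement: Triple, graph: Set[Triple]) -> Set[Triple]:
    """Minimum Self-contained Graph of `statement` within `graph`."""
    result = {statement}
    visited = set()
    # Blank nodes can occur in subject or object position.
    queue = deque(t for t in (statement[0], statement[2]) if is_blank(t))
    while queue:
        bnode = queue.popleft()
        if bnode in visited:
            continue
        visited.add(bnode)
        for triple in graph:
            subject, _, obj = triple
            if bnode in (subject, obj):
                result.add(triple)
                # Follow any further blank nodes this statement introduces.
                for term in (subject, obj):
                    if is_blank(term) and term not in visited:
                        queue.append(term)
    return result


graph = {
    ("ex:alice", "ex:address", "_:a"),
    ("_:a", "ex:city", "Leipzig"),
    ("_:a", "ex:geo", "_:g"),
    ("_:g", "ex:lat", "51.3"),
    ("ex:bob", "ex:name", "Bob"),
}
address_msg = msg(("_:a", "ex:city", "Leipzig"), graph)
bob_msg = msg(("ex:bob", "ex:name", "Bob"), graph)
```

In this example `address_msg` contains the four statements connected through `_:a` and `_:g`, while `bob_msg` contains only the statement itself because it involves no blank node.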

We can now unambiguously identify the correct statement, which allows us to revert the update.

EVALUATION

BSBM Explore and Update Benchmark (10 million statements)

[Figure: BSBM QMpH (axis 0 to 12000) for three configurations: No Versioning, Versioning Async., Versioning Sync.]

Patch Size Benchmark
• break even point: >10000 changed statements

Changed Triples vs. SPARQL Update Execution Times

Changed triples | w/o versioning | w/ versioning
50              | 0.048 s        | 0.333 s
500             | 0.075 s        | 1.605 s
5000            | 0.402 s        | 14.905 s
50000           | 4.85 s         | 256.715 s
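The execution times reported above can be turned into slowdown factors with a few lines of Python. The pairing of values to series is an assumption read off the chart legend order: the first four times are taken as "w/o versioning" and the last four as "w/ versioning". The variable names are illustrative.

```python
# Changed triples -> (seconds without versioning, seconds with versioning).
# The assignment of times to the two series is an assumption (see lead-in).
measurements = {
    50: (0.048, 0.333),
    500: (0.075, 1.605),
    5000: (0.402, 14.905),
    50000: (4.85, 256.715),
}

# Slowdown factor of an update when versioning is enabled, rounded to one decimal.
slowdown = {
    changed: round(with_v / without_v, 1)
    for changed, (without_v, with_v) in measurements.items()
}

for changed, factor in slowdown.items():
    print(f"{changed:>6} changed triples: {factor}x slower with versioning")
```

Under this reading the relative overhead grows with the number of changed triples, from roughly 7x at 50 triples to roughly 53x at 50000 triples.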

FUTURE WORK

• Integration into LEDS stack

• Implementation of versioning operations [Cassidy and Ballantine 2007]

• Support for more RDF repositories
  • Virtuoso ✓
  • Stardog: provides versioning without blank node support

• Scalable hashing algorithm
  • current one based on [Carroll 2003]

• Benchmark for RDF versioning systems

Questions?

Web: http://aksw.org/MarvinFrommhold

Email: frommhold@informatik.uni-leipzig.de / marvin.frommhold@eccenca.com
