Stardog 1.1: An Easier, Smarter, Faster RDF Database

Michael Grove, Clark & Parsia LLC

@mikegrovesoft, @stardog_db, @candp

stardog.com

About C&P

•We build semantic technology tools for enterprise solutions

•Proud bootstrappers since 2005

•Offices in DC and Cambridge, MA

•Government & enterprise customers

What is Stardog?

•a pure Java RDF database

• full-service, feature rich

• focus on query performance

•standards compliant

•scalable (up first, out next)

History

•Development started summer 2010

•Stardog 0.5 alpha - 2 May 2011

•Stardog 1.0 final - 19 June 2012

•Total of 32 releases, ~500 tickets, hundreds of emails on the mailing list

•Stardog 1.0.7 presently

•Stardog 1.1 real soon now...

Easier.

What is easy?

•What’s “easy” in an RDF database?

•Configuration

•Maintenance

•User Experience

• i.e., rationally predictable

•Easier for whom? Not a simple question.

Configuration

• Convention, not configuration

• “Quick Start” is shortest page in the docs

• 4 steps to querying

• Predictable, sane defaults throughout

• Adapted to Java, Unix, Semtech cultures

• Culture is key to convention

• Very good (!) documentation

Maintenance

•Nothing is easier than doing nothing

•RDF & OWL are ideally schema flexible

• Job scheduler: search, indexes, etc.

•Data migration tools since before 1.0

•Multi-tenancy, online & offline DBs

• Just add data...Automatic data quality*

•NoSQL == Anti-jobs program for DBAs

Except that...

•Every DB has to be administered & maintained

•Matter of degree, not kind

•Stardog Enterprise Server Management

• audit logging

• JMX monitoring

•web console

•online backups (coming soon!)

User Experience

•Client-server & Embeddable

• Jena, Sesame, SNARL, HTTP

•SPARQL query simplifications

•ACID transactions

• Idiomatic Java & Unix interfaces

•Great CLI & shell…

•Windows has gotten much better! :>

•Rich security model

Smarter.

Okay...that’s BS.

•“Smarter” is marketing-speak

•But Stardog 1.1 has a rich feature set

•Reasoning, including user-defined rules (UDR)

•Integrity Constraint Validation (ICV)

•Semantic Search

•Security

•Spring

•Linked Data Platform

Reasoning

•OWL 2 DL, QL, EL, and RL

•Query-time, no materialization

•Only pay for what you eat

•Embarrassingly parallel in part

•Pellet 3 embedded for OWL 2 DL schema reasoning only

•Very flexible re: named graphs (NGs) & schemas
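
One common way to reason at query time without materializing inferences is to expand the query against the schema instead of expanding the data. The sketch below is a toy illustration of that general idea only; it is not Stardog's implementation, and the schema, data, and function names are hypothetical.

```python
# Hypothetical schema: child class -> parent class.
SUBCLASS_OF = {"Manager": "Employee", "Employee": "Person"}

# Asserted types only; no inferred facts are ever stored.
TYPES = {"alice": "Manager", "bob": "Employee", "carol": "Person"}


def subclasses(cls):
    """Return cls plus every class that is transitively a subclass of it."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def instances_of(cls):
    """Answer 'who is a cls?' by widening the query, not the data."""
    wanted = subclasses(cls)
    return sorted(name for name, t in TYPES.items() if t in wanted)


print(instances_of("Employee"))  # ['alice', 'bob']
print(instances_of("Person"))    # ['alice', 'bob', 'carol']
```

Only the classes a query actually mentions are expanded, which is one reading of "only pay for what you eat".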

User-defined Rules

• New in 1.1!

• Using SWRL syntax

• Including all SWRL builtins

• Which are also available to SPARQL

• Recently added a "new individual" builtin

• Create new individuals in your rules

• Beware of non-termination!

• Executed at query time like everything else
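
To see why rules that create new individuals can fail to terminate, consider a hypothetical rule "every Person has a parent who is also a Person". The sketch below applies it by naive forward chaining with an explicit round limit; it is a toy model of the hazard, not how Stardog evaluates rules, and every name in it is made up for illustration.

```python
import itertools


def apply_rule(people, max_rounds):
    """Rule: Person(x) -> hasParent(x, p) and Person(p), with p a fresh individual.

    Each round creates one new individual per individual in the frontier,
    and every new individual triggers the rule again in the next round.
    """
    fresh = itertools.count(1)
    frontier = list(people)
    facts = []
    for _ in range(max_rounds):
        next_frontier = []
        for x in frontier:
            p = f"_:parent{next(fresh)}"
            facts.append((x, "hasParent", p))
            next_frontier.append(p)
        frontier = next_frontier
    return facts


for fact in apply_rule(["alice"], max_rounds=3):
    print(fact)
```

The frontier never becomes empty, so the loop stops only because of `max_rounds`; remove the bound and the rule keeps producing new parents forever.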

ICV?

• Integrity Constraint Validation

•Automated data quality

•Closed world semantics

•Transactional

•High-level & declarative

• ICs can be OWL, SWRL, or SPARQL

Example...

Only employees who are US citizens can work on projects that receive funding from a US government agency.

Class: Project and (receivesFundsFrom some USGovAgency)
SubClassOf: inverse(worksOn) only (Employee and nationality value "US")

More examples: http://stardog.com/docs/
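
A minimal sketch of what checking this constraint under closed world semantics amounts to, using an in-memory set of triples. The class and property names come from the constraint above; the individuals (`apollo`, `nasa`, `alice`, `bob`) and the helper functions are hypothetical, and this is not Stardog's validation code.

```python
# Toy data: (subject, predicate, object) triples.
triples = {
    ("apollo", "type", "Project"),
    ("apollo", "receivesFundsFrom", "nasa"),
    ("nasa", "type", "USGovAgency"),
    ("alice", "type", "Employee"),
    ("alice", "nationality", "US"),
    ("alice", "worksOn", "apollo"),
    ("bob", "type", "Employee"),
    ("bob", "worksOn", "apollo"),  # no nationality recorded for bob
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def violations():
    """Return (person, project) pairs that break the constraint."""
    bad = []
    for person, predicate, project in triples:
        if predicate != "worksOn":
            continue
        is_project = "Project" in objects(project, "type")
        us_funded = any(
            "USGovAgency" in objects(agency, "type")
            for agency in objects(project, "receivesFundsFrom")
        )
        allowed = (
            "Employee" in objects(person, "type")
            and "US" in objects(person, "nationality")
        )
        if is_project and us_funded and not allowed:
            bad.append((person, project))
    return sorted(bad)


print(violations())  # [('bob', 'apollo')]
```

Under closed world semantics, bob's missing nationality is a violation. Read as an ordinary open-world OWL axiom, the same statement would instead let a reasoner infer that bob has nationality "US".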

Semantic Search

•Uses Waldo, our deep adaptation of Lucene

•Text index from RDF literals

•Search for resources or literals

• Integrated with SPARQL query evaluation

•Auto-managed search indexes
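
The idea of a text index built from RDF literals can be sketched with a tiny inverted index. This is a conceptual illustration in plain Python, not Waldo or Lucene; the sample triples, the `is_literal` flag, and the function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# (subject, predicate, object, is_literal); only literal objects are indexed.
triples = [
    ("ex:stardog", "rdfs:label", "Stardog RDF database", True),
    ("ex:stardog", "ex:vendor", "ex:candp", False),
    ("ex:pellet", "rdfs:comment", "OWL 2 DL reasoner", True),
]


def build_index(data):
    """Map each lowercased token to the (resource, literal) pairs containing it."""
    index = defaultdict(set)
    for subject, _predicate, obj, is_literal in data:
        if not is_literal:
            continue
        for token in re.findall(r"\w+", obj.lower()):
            index[token].add((subject, obj))
    return index


def search(index, term):
    """Return matching (resource, literal) pairs, so callers can use either."""
    return sorted(index.get(term.lower(), set()))


index = build_index(triples)
print(search(index, "rdf"))  # [('ex:stardog', 'Stardog RDF database')]
print(search(index, "owl"))  # [('ex:pellet', 'OWL 2 DL reasoner')]
```

Returning both the resource and the matching literal mirrors the "search for resources or literals" bullet; a real engine would add scoring, stemming, and index maintenance.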

Security

•Rich security model

•Based on standard RBAC model

•Applies at database-level

•Will extend to Named Graphs in 1.x

•Easy CLI admin tools (& Java API)
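
A role-based access control (RBAC) check at database level can be sketched in a few lines: users get roles, roles get permissions, and a request is allowed if any of the user's roles grants it. The role names, actions, and database name below are hypothetical and do not reflect Stardog's actual permission vocabulary or API.

```python
# Role -> set of (action, database) permissions.
ROLE_PERMISSIONS = {
    "reader": {("read", "sales")},
    "writer": {("read", "sales"), ("write", "sales")},
}

# User -> set of assigned roles.
USER_ROLES = {"alice": {"writer"}, "bob": {"reader"}}


def is_allowed(user, action, database):
    """True if any role assigned to the user grants the action on the database."""
    return any(
        (action, database) in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("alice", "write", "sales"))  # True
print(is_allowed("bob", "write", "sales"))    # False
```

Extending the model to named graphs would mean adding a finer-grained resource to the permission tuple.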

Spring

•Love it or not, Spring isn’t going away

•Supports Batch, Data Import, etc.

•Open Source: http://github.com/clark-parsia/spring-stardog

•Developed by an early adopter who needed it; supported/maintained by C&P

Linked Data

•Stardog fills a hole in our Linked Data Platform

•HTML5, pure JS, client side web framework (based on backbone.js)

•Linked Data publishing suite

•Stardog Linked Data Catalog...Enterprise Linked Data management app

Faster.

Finally...

•Now we can talk about something that’s objective, context-free, and measurable

•Yes!

•But no…#include <std_disclaim.h>

•Your data & your queries are the only things that really matter

That said...

•Two de facto benchmarks for SPARQL:

•BSBM, OLTP-style, measured in query mixes per hour (QMpH); queries per hour = QMpH · 25

•SP2B, OLAP-style (torture test), a set of queries to complete within a timeout, T, at a data size, D

SP2B

•Stardog completes SP2B at 5M, 10M, and 25M (except q5a)

•No other RDF database completes > 5M. (As of the most recent report. Things change.)

•Considerable performance differential

•Pushing this out to 100M+ in 1.x

BSBM

• A throughput test, primarily. Not necessarily simple queries

• On a modest machine, with 255 clients and 10M triples, we sustain about 7M queries per hour (277k QMpH)

• At 100M triples, with 255 clients, we sustain about 3M queries per hour (125k QMpH)

• Among the top 2 or 3 RDF DBs for BSBM performance

• We will tackle BSBM BI next...
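
The two throughput figures on this slide are related by the size of a query mix. A minimal sketch of the conversion, assuming 25 queries per mix (the "· 25" factor given on the benchmarks slide); the function name is ours, for illustration only.

```python
QUERIES_PER_MIX = 25  # assumption: one BSBM query mix = 25 queries


def queries_per_hour(qmph):
    """Convert query mixes per hour (QMpH) to individual queries per hour."""
    return qmph * QUERIES_PER_MIX


for label, qmph in [("10M triples", 277_000), ("100M triples", 125_000)]:
    print(f"{label}: {qmph:,} QMpH = {queries_per_hour(qmph):,} queries/hour")
```

This gives 6,925,000 and 3,125,000 queries per hour, which round to the 7m and 3m quoted above.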

Data Loading

• Two indexing modes

• Triples only indexing

• Faster loading, slower NG query

• Up to 250,000 triples per second

• Quads indexing

• Slower loading, faster NG query

• Up to 150,000 triples per second

• More improvements coming in the future

• Customized RDF parser

• Will look at user-defined index subsets
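
As a back-of-the-envelope illustration of the loading trade-off, the sketch below estimates load time for a 100M-triple dataset (a size borrowed from the BSBM slide) at the peak rates quoted above. Because those rates are "up to" figures, the results are best-case lower bounds, not measurements; the function and dictionary names are ours.

```python
PEAK_RATES = {"triples-only": 250_000, "quads": 150_000}  # triples per second


def load_time_seconds(triples, triples_per_second):
    """Best-case load time if the peak rate were sustained throughout."""
    return triples / triples_per_second


for mode, rate in PEAK_RATES.items():
    seconds = load_time_seconds(100_000_000, rate)
    print(f"{mode}: at least {seconds:.0f} s ({seconds / 60:.1f} min)")
```

At these rates, triples-only indexing needs at least 400 s and quads indexing at least about 667 s, so the quads mode pays roughly two-thirds more load time for faster named-graph queries.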

What’s new in 1.1

•Aforementioned user defined rules

•But most notably, SPARQL 1.1

•Our most requested feature in a survey

•Oh, we also made it faster

SPARQL 1.1

• Latest revision of the SPARQL query language

• Put off implementing until spec finalized

• It’s still in flux, but we decided to go for it

• Adds useful new features to SPARQL

• Aggregates, grouping, sub-query, negation

• Oh, and the entailment regimes

SPARQL 1.1

• Rewrite of query planner & engine for 1.0.5

• Changes needed to support SPARQL 1.1

• Tested by users for the past 3 releases

• With great power comes great responsibility...

• New features are not without cost

• Query planning & optimization more crucial than ever

• Majority of development time

Roadmap

1. Transitivity & equality

2. GeoSPARQL

3. Web Console

4. Statement identifiers

5. Stored procedures & database triggers

6. “Stardocs”: doc/blob storage & NLP analytics

7. Graph Traversals, Algorithms & query langs

8. Statistical inference & machine learning

9. Stardog 2.0: Distributed Cluster Super Cloud Thingie!

Summary

Pick all three!

Easier. Smarter. Faster.

Thanks!

Licensing

Feature Rich

• Support for RDFS, OWL2 profiles (EL, RL, QL) & OWL2 DL via schema only queries

• Semantic Search

• ICV

• Transactions

• Rich security model

• Support for major APIs

• Jena & Sesame, and our own SNARL

• SPARQL HTTP protocol, Graph Store protocol

• Also includes a CLI & Shell environment
