A talk from Semtech NYC 2012 about Stardog 1.1, the forthcoming release that adds SPARQL 1.1 and user-defined rules.
stardog.com
Stardog 1.1
An Easier, Smarter, Faster RDF Database
Michael Grove, Clark & Parsia LLC
@mikegrovesoft, @stardog_db, @candp
About C&P
•We build semantic technology tools for enterprise solutions
•Proud bootstrappers since 2005
•Offices in DC and Cambridge, MA
•Government & enterprise customers
What is Stardog?
•a pure Java RDF database
• full-service, feature-rich
• focus on query performance
•standards compliant
•scalable (up first, out next)
History
•Development started in summer 2010
•Stardog 0.5 alpha - 2 May 2011
•Stardog 1.0 final - 19 June 2012
•Total of 32 releases, ~500 tickets, hundreds of emails on the mailing list
•Stardog 1.0.7 is the current release
•Stardog 1.1 real soon now...
What is easy?
•What’s “easy” in an RDF database?
•Configuration
•Maintenance
•User Experience
• i.e., rationally predictable
•Easier for whom? Not a simple question.
Configuration
• Convention, not configuration
• “Quick Start” is the shortest page in the docs
• 4 steps to querying
• Predictable, sane defaults throughout
• Adapted to Java, Unix, Semtech cultures
• Culture is key to convention
• Very good (!) documentation
Maintenance
•Nothing is easier than doing nothing
•RDF & OWL are ideally schema-flexible
• Job scheduler: search, indexes, etc.
•Data migration tools since before 1.0
•Multi-tenancy, online & offline DBs
• Just add data...Automatic data quality*
•NoSQL == Anti-jobs program for DBAs
Except that...
•Every DB has to be admin’d & maintained
•Matter of degree, not kind
•Stardog Enterprise Server Management
• audit logging
• JMX monitoring
•web console
•online backups (coming soon!)
User Experience
•Client-server & embeddable
• Jena, Sesame, SNARL, HTTP
•SPARQL query simplifications
•ACID transactions
• Idiomatic Java & Unix interfaces
•Great CLI & shell…
•Windows has gotten much better! :>
•Rich security model
Okay...that’s BS.
•“Smarter” is market speak
•But Stardog 1.1 has rich feature set
•Reasoning, including user-defined rules (UDR)
•Integrity Constraint Validation (ICV)
•Semantic Search
•Security
•Spring
•Linked Data Platform
Reasoning
•OWL 2 DL, QL, EL, and RL
•Query-time, no materialization
•Only pay for what you eat
•Embarrassingly parallel in part
•Pellet 3 embedded for OWL 2 DL schema reasoning only
•Very flexible with respect to named graphs (NGs) & schemas
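To make "query-time, no materialization" concrete, here is a minimal sketch restricted to RDFS-style subclass reasoning. It is not Stardog's rewriting engine: the data, class names, and helper functions are hypothetical. A type query is answered by expanding the requested class into its subclasses when the query runs, so no inferred triples are ever written, and only the part of the schema the query touches is explored ("only pay for what you eat").

```python
from collections import defaultdict

# Hypothetical toy data: only asserted triples are stored; nothing inferred is written back.
TRIPLES = {
    (":alice", "rdf:type", ":Manager"),
    (":bob", "rdf:type", ":Engineer"),
    (":carol", "rdf:type", ":Contractor"),
}
# Hypothetical schema: subclass -> set of direct superclasses (rdfs:subClassOf).
SCHEMA = {
    ":Manager": {":Employee"},
    ":Engineer": {":Employee"},
    ":Employee": {":Person"},
    ":Contractor": {":Person"},
}

def subclasses_of(cls, schema):
    """Every class that is a (reflexive, transitive) subclass of cls."""
    children = defaultdict(set)
    for sub, supers in schema.items():
        for sup in supers:
            children[sup].add(sub)
    seen, stack = {cls}, [cls]
    while stack:
        for sub in children[stack.pop()]:
            if sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return seen

def instances_of(cls, triples, schema):
    """Answer `?x rdf:type cls` by rewriting the query against the schema at query time."""
    targets = subclasses_of(cls, schema)
    return {s for (s, p, o) in triples if p == "rdf:type" and o in targets}

print(sorted(instances_of(":Employee", TRIPLES, SCHEMA)))  # [':alice', ':bob']
```

The trade-off is that the expansion work is repeated per query instead of paid once at load time, but the stored data never goes stale when the schema changes.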
User-defined Rules
• New in 1.1!
• Using SWRL syntax
• Including all SWRL builtins
• Which are also available to SPARQL
• Recently added a “new individual” builtin
• Create new individuals in your rules
• Beware of non-termination!
• Executed at query time like everything else
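Why "beware of non-termination" matters once rules can create new individuals: the sketch below uses naive forward chaining (not Stardog's query-time evaluation) only because it makes the problem easy to see. The rules, the `parent_of(...)` naming scheme, and the round limit are all hypothetical. A rule that only connects existing individuals reaches a fixpoint; a rule whose fresh individual satisfies its own body never does.

```python
def apply_rules(facts, rules, max_rounds=10):
    """Naive forward chaining; returns (facts, reached_fixpoint)."""
    facts = set(facts)
    for _ in range(max_rounds):
        derived = set()
        for rule in rules:
            derived |= rule(facts)
        derived -= facts
        if not derived:
            return facts, True    # fixpoint: nothing new can be derived
        facts |= derived
    return facts, False           # gave up: these rules may never terminate

def employee_rule(facts):
    """Person(?x) ^ worksFor(?x, ?y) -> Employee(?x): derives nothing new once applied."""
    people = {s for (s, p, o) in facts if p == "a" and o == ":Person"}
    return {(s, "a", ":Employee") for (s, p, o) in facts
            if p == ":worksFor" and s in people}

def parent_rule(facts):
    """Person(?x) -> hasParent(?x, NEW) ^ Person(NEW), where NEW is a fresh individual.
    Every new parent is itself a Person, so each round creates another individual."""
    derived = set()
    for (s, p, o) in facts:
        if p == "a" and o == ":Person":
            parent = "parent_of(" + s + ")"   # hypothetical naming scheme for NEW
            derived |= {(s, ":hasParent", parent), (parent, "a", ":Person")}
    return derived

BASE = {(":alice", "a", ":Person"), (":alice", ":worksFor", ":acme")}
print(apply_rules(BASE, [employee_rule])[1])               # True
print(apply_rules(BASE, [parent_rule], max_rounds=5)[1])   # False
```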
ICV?
• Integrity Constraint Validation
•Automated data quality
•Closed world semantics
•Transactional
•High-level & declarative
• ICs can be OWL, SWRL, or SPARQL
Example...
Only employees who are US citizens can work on projects that receive funding from a US government agency.
Class: Project and (receivesFundsFrom some USGovAgency)
SubClassOf: inverse(worksOn) only (Employee and nationality value "US")
More examples: http://stardog.com/docs/
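The closed-world reading of the US-citizen constraint can be illustrated with a hand-written check over a toy set of triples. This is a sketch under assumptions, not how Stardog evaluates ICs: the data and the `violations` helper are hypothetical. Under ordinary open-world OWL, `:ben`'s missing nationality only means "unknown"; under closed-world semantics an absent triple is false, so `:ben` is reported as a violation.

```python
# Hypothetical data for the US-citizen constraint.
DATA = {
    (":p1", "a", ":Project"),
    (":p1", ":receivesFundsFrom", ":agency1"),
    (":agency1", "a", ":USGovAgency"),
    (":ann", ":worksOn", ":p1"), (":ann", "a", ":Employee"), (":ann", ":nationality", "US"),
    (":ben", ":worksOn", ":p1"), (":ben", "a", ":Employee"),   # nationality never asserted
    (":cat", ":worksOn", ":p2"),                                # :p2 has no US government funding
}

def violations(data):
    """Return (worker, project) pairs that break the constraint.
    Closed world: a triple that is not in `data` is treated as false."""
    funded = {s for (s, p, o) in data
              if p == ":receivesFundsFrom"
              and (s, "a", ":Project") in data
              and (o, "a", ":USGovAgency") in data}
    bad = set()
    for (w, p, proj) in data:
        if p == ":worksOn" and proj in funded:
            ok = (w, "a", ":Employee") in data and (w, ":nationality", "US") in data
            if not ok:
                bad.add((w, proj))
    return bad

print(violations(DATA))  # {(':ben', ':p1')}
```

Adding `(":ben", ":nationality", "US")` makes the violation set empty, which is the kind of check a transactional ICV system would run before accepting a commit.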
Semantic Search
•Uses Waldo, our deep adaptation of Lucene
•Text index from RDF literals
•Search for resources or literals
• Integrated with SPARQL query evaluation
•Auto-managed search indexes
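The core idea of "text index from RDF literals" is an inverted index from tokens to the resources whose literals contain them. The sketch below is a toy stand-in, not Waldo or Lucene (no stemming, ranking, or persistence); the data and the convention that IRIs start with ":" are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical convention for this sketch: IRIs start with ":", everything else is a literal.
TRIPLES = [
    (":doc1", ":title", "An introduction to RDF databases"),
    (":doc2", ":title", "Querying RDF with SPARQL"),
    (":doc2", ":creator", ":mike"),
    (":mike", ":label", "Mike"),
]

def tokens(text):
    return re.findall(r"\w+", text.lower())

def build_index(triples):
    """Inverted index: token -> subjects that have a literal containing that token."""
    index = defaultdict(set)
    for s, _, o in triples:
        if not o.startswith(":"):          # index literals only, never IRIs
            for tok in tokens(o):
                index[tok].add(s)
    return index

def search(index, query):
    """Resources whose literals contain every query term (boolean AND)."""
    terms = tokens(query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits

idx = build_index(TRIPLES)
print(sorted(search(idx, "rdf")))         # [':doc1', ':doc2']
print(sorted(search(idx, "RDF sparql")))  # [':doc2']
```

Because hits are resource identifiers, they can be joined with ordinary triple patterns, which is what integrating search with SPARQL evaluation amounts to conceptually.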
Security
•Rich security model
•Based on standard RBAC model
•Applies at database-level
•Will extend to Named Graphs in 1.x
•Easy CLI admin tools (& Java API)
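A database-level RBAC check reduces to: users hold roles, roles hold (action, database) permissions, and a request is allowed if any role covers it. The class below is a hypothetical model for illustration only; its names, the `"*"` wildcard, and the action strings are assumptions, not Stardog's permission vocabulary or Java API.

```python
from dataclasses import dataclass, field

@dataclass
class Rbac:
    """Toy role-based access control scoped to whole databases (hypothetical model)."""
    user_roles: dict = field(default_factory=dict)   # user -> set of role names
    role_perms: dict = field(default_factory=dict)   # role -> set of (action, database)

    def grant(self, role, action, database):
        self.role_perms.setdefault(role, set()).add((action, database))

    def assign(self, user, role):
        self.user_roles.setdefault(user, set()).add(role)

    def allowed(self, user, action, database):
        """True if any of the user's roles grants `action` on `database` ("*" = every database)."""
        for role in self.user_roles.get(user, set()):
            perms = self.role_perms.get(role, set())
            if (action, database) in perms or (action, "*") in perms:
                return True
        return False

acl = Rbac()
acl.grant("reader", "read", "sales")
acl.grant("admin", "write", "*")
acl.assign("alice", "reader")
acl.assign("bob", "admin")
print(acl.allowed("alice", "read", "sales"), acl.allowed("alice", "write", "sales"))  # True False
```

Extending this to named graphs would mean replacing the `database` component of each permission with a (database, graph) pair.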
Spring
•Love it or not, Spring isn’t going away
•Supports Batch, Data Import, etc.
•Open Source: http://github.com/clark-parsia/spring-stardog
•Developed by an early adopter who needed it; supported/maintained by C&P
Linked Data
•Stardog fills a hole in our Linked Data Platform
•HTML5, pure JS, client-side web framework (based on backbone.js)
•Linked Data publishing suite
•Stardog Linked Data Catalog: an enterprise Linked Data management app
Finally...
•Now we can talk about something that’s objective, context-free, and measurable?
•Yes!
•But no… #include <std_disclaim.h>
•Your data & your queries are the only things that really matter
That said...
•Two de facto benchmarks for SPARQL:
•BSBM, OLTP-style, measured in query mixes per hour (QMpH; one mix = 25 queries)
•SP2B, OLAP-style (torture test): a set of queries that must finish within a timeout T at data size D
SP2B
•Stardog completes SP2B at 5M, 10M, and 25M triples (except q5a)
•No other RDF database completes > 5M. (As of the most recent report. Things change.)
•Considerable performance differential
•Pushing this out to 100M+ in 1.x
BSBM
• A throughput test, primarily. Not necessarily simple queries
• On a modest machine, with 255 clients and 10M triples, we sustain 7M queries per hour (277k QMpH)
• At 100M triples, with 255 clients, we sustain 3M queries per hour (125k QMpH)
• Among the top 2 or 3 RDF DBs for BSBM performance
• We will tackle BSBM BI next...
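The two BSBM figures are related by the size of the query mix. BSBM's query mix has 25 queries, which is consistent with the slide's numbers: 277k QMpH × 25 ≈ 6.9M queries per hour (the quoted 7M) and 125k × 25 ≈ 3.1M (the quoted 3M). A small sketch of that conversion; the function names are illustrative only.

```python
QUERIES_PER_MIX = 25  # one BSBM query mix = 25 queries

def queries_per_hour(qmph, per_mix=QUERIES_PER_MIX):
    return qmph * per_mix

def queries_per_second(qmph, per_mix=QUERIES_PER_MIX):
    return queries_per_hour(qmph, per_mix) / 3600

for label, qmph in [("10M triples", 277_000), ("100M triples", 125_000)]:
    print(f"{label}: {queries_per_hour(qmph):,} queries/hour, "
          f"about {queries_per_second(qmph):,.0f} queries/second")
```

This works out to roughly 1,900 queries per second at 10M triples and roughly 870 at 100M, summed across all 255 clients.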
Data Loading
• Two indexing modes
• Triples only indexing
• Faster loading, slower NG query
• Up to 250,000 triples per second
• Quads indexing
• Slower loading, faster NG query
• Up to 150,000 triples per second
• More improvements coming in the future
• Customized RDF parser
• Will look at user-defined index subsets
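Since the loading rates are peak ("up to") figures, dividing a data size by them gives a lower bound on load time, not a prediction. A quick sketch of that arithmetic for a hypothetical 100M-triple load in each indexing mode:

```python
PEAK_RATES = {  # triples per second, the "up to" figures from the Data Loading slide
    "triples": 250_000,
    "quads": 150_000,
}

def min_load_seconds(num_triples, mode):
    """Lower bound on load time: the rates are maxima, not sustained averages."""
    return num_triples / PEAK_RATES[mode]

for mode in PEAK_RATES:
    secs = min_load_seconds(100_000_000, mode)
    print(f"{mode}: at least {secs:.0f} s (~{secs / 60:.1f} min) for 100M triples")
```

That is about 400 s for triples-only indexing versus about 667 s for quads, which is the loading cost paid for faster named-graph queries.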
What’s new in 1.1
•Aforementioned user defined rules
•But most notably, SPARQL 1.1
•Our most requested feature in a survey
•Oh, we also made it faster
SPARQL 1.1
• Latest revision of the SPARQL query language
• We had put off implementing it until the spec was finalized
• It’s still in flux, but we decided to go for it
• Adds useful new features to SPARQL
• Aggregates, grouping, sub-query, negation
• Oh, and the entailment regimes
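To show what grouping, aggregates, and negation add, here is a standard SPARQL 1.1 query (kept as a string, not executed) alongside a plain-Python emulation of its semantics over a hypothetical toy dataset. The emulation is for illustration only and is not how a query engine evaluates the query.

```python
from collections import Counter

# The SPARQL 1.1 query being emulated (reference only; not executed here).
QUERY = """
PREFIX : <http://example.org/>
SELECT ?dept (COUNT(?emp) AS ?n)
WHERE {
  ?emp :worksIn ?dept .
  FILTER NOT EXISTS { ?emp :onLeave true }
}
GROUP BY ?dept
"""

# Hypothetical data.
TRIPLES = {
    (":ann", ":worksIn", ":eng"),
    (":ben", ":worksIn", ":eng"),
    (":cat", ":worksIn", ":sales"),
    (":ben", ":onLeave", True),
}

def count_active_by_dept(triples):
    # Basic graph pattern: ?emp :worksIn ?dept
    rows = [(s, o) for (s, p, o) in triples if p == ":worksIn"]
    # Negation: FILTER NOT EXISTS { ?emp :onLeave true }
    on_leave = {s for (s, p, o) in triples if p == ":onLeave" and o is True}
    rows = [(emp, dept) for (emp, dept) in rows if emp not in on_leave]
    # Grouping + aggregate: GROUP BY ?dept with COUNT(?emp)
    return dict(Counter(dept for (_, dept) in rows))

print(count_active_by_dept(TRIPLES))
```

In SPARQL 1.0 the negation needed an OPTIONAL/!bound workaround and the counting had to happen client-side; SPARQL 1.1 moves both into the query.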
SPARQL 1.1
• Rewrite of query planner & engine for 1.0.5
• Changes needed to support SPARQL 1.1
• Tested by users for the past 3 releases
• With great power comes great responsibility...
• New features are not without cost
• Query planning & optimization more crucial than ever
• Majority of development time
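One reason planning gets harder is that join order dominates cost. The sketch below shows a generic textbook heuristic, not Stardog's planner: start from the most selective triple pattern, then keep picking the cheapest pattern that shares a variable with what has already been joined, so cross products are avoided whenever possible. The patterns and cardinality estimates are made up.

```python
def variables(pattern):
    return {term for term in pattern if term.startswith("?")}

def greedy_join_order(patterns, estimate):
    """Greedy join ordering: start with the most selective pattern, then keep choosing the
    cheapest pattern that shares a variable with what is already joined; fall back to an
    unconnected pattern (a cross product) only when nothing connects."""
    remaining, order, bound = list(patterns), [], set()
    while remaining:
        connected = [p for p in remaining if variables(p) & bound]
        best = min(connected or remaining, key=estimate)
        order.append(best)
        bound |= variables(best)
        remaining.remove(best)
    return order

# Hypothetical patterns with made-up cardinality estimates.
P_TYPE = ("?x", "rdf:type", ":Person")
P_NAME = ("?x", ":name", "?name")
P_MAIL = ("?x", ":mbox", "mailto:ann@example.org")
P_KNOWS = ("?y", ":knows", "?x")
CARD = {P_TYPE: 1_000_000, P_NAME: 800_000, P_MAIL: 1, P_KNOWS: 50_000}

print(greedy_join_order([P_TYPE, P_NAME, P_MAIL, P_KNOWS], CARD.get))
```

Here the single-match `:mbox` pattern goes first, so every later join starts from one binding of `?x` instead of a million. Aggregates, sub-queries, and negation add more operators whose placement a real planner must also cost.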
Roadmap
1. Transitivity & equality
2. GeoSPARQL
3. Web Console
4. Statement identifiers
5. Stored procedures & database triggers
6. “Stardocs”: doc/blob storage & NLP analytics
7. Graph Traversals, Algorithms & query langs
8. Statistical inference & machine learning
9. Stardog 2.0: Distributed Cluster Super Cloud Thingie!
Summary
Pick all three!
Easier. Smarter. Faster.
Feature Rich
• Support for RDFS, OWL 2 profiles (EL, RL, QL) & OWL 2 DL via schema-only queries
• Semantic Search
• ICV
• Transactions
• Rich security model
• Support for major APIs
• Jena & Sesame, and our own SNARL
• SPARQL HTTP protocol, Graph Store protocol
• Also includes a CLI & Shell environment