48
KIT University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu Institute of Applied Informatics and Formal Description Methods (AIFB) No Size Fits All Running the Star Schema Benchmark with SPARQL and RDF Aggregate Views Benedikt Kämpgen , Andreas Harth 10 th ESWC 2013 Semantics and Big Data 29 May 2013 ... ... Supported by

No Size Fits All Running the Star Schema Benchmark with

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: No Size Fits All Running the Star Schema Benchmark with

KIT – University of the State of Baden-Wuerttemberg and

National Research Center of the Helmholtz Association www.kit.edu

Institute of Applied Informatics and Formal Description Methods (AIFB)

No Size Fits All – Running the Star Schema Benchmark with SPARQL and RDF Aggregate Views

Benedikt Kämpgen, Andreas Harth

10th ESWC 2013 Semantics and Big Data

29 May 2013

...

...

Supported by

Page 2: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 2 29 May 2013

Motivation

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Data analytics

http://wiki.planet-data.eu/web/Datasets

Statistical Linked Data

Query

results

Analytical

queries

Query

engine

Balance sheets

Apple Inc.

Real GDP

growth rate

Stock Market

Trading Price

Visa Inc.

Tr-Price for selected companies

Tr-Price aggregated over time

Page 3: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 3 29 May 2013

Motivation

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Data analytics Example Statistical Linked Data

Query

results

Analytical

queries

Query

engine Company Dates Average Trading

Price

Visa Inc. July 1,

2008

80 $

Mastercard

Inc.

July 1,

2008

269 $

Visa Inc. July 1,

2009

90 $

... ... ...

Filter for Visa Inc.

Data

Metadata

RDF Data Cube

Vocabulary [QB]

Company Dates Average Trading

Price

Visa Inc. July 1,

2007

80 $

Visa Inc. July 1,

2009

90 $

... ... ...

Page 4: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 4 29 May 2013

Motivation

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Data analytics Example Statistical Linked Data

Query

results

Analytical

queries

Query

engine Company Dates Average Trading

Price

Visa Inc. July 1,

2008

80 $

Mastercard

Inc.

July 1,

2008

269 $

Visa Inc. July 1,

2009

90 $

... ... ...

Aggregation over Dates

Data

Metadata

RDF Data Cube

Vocabulary [QB]

Company Dates Average Trading

Price

Visa Inc. ALL 85 $

Mastercard

Inc.

ALL 269 $

... ... ...

Page 5: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 5 29 May 2013

Problem

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Tables

OLAP

client ROLAP engine

SQL

RDBMS

Fixed Star Schema

Optimisations: aggregate tables

OLAP query

Data

Metadata

Statistical Linked Data

RDF Data Cube

Vocabulary [QB]

Company Dates Average Trading

Price

Visa Inc. July 1,

2008

80 $

Mastercard

Inc.

July 1,

2008

269 $

... ... ...

Page 6: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 6 29 May 2013

RDF Data Cube

Vocabulary [QB]

Problem

Motivation: Easier integration with other data sources

Filter for companies from cities with more than 100.000 inhabitants

Filter for companies that have a CEO who is younger than 20

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Tables OLAP

client

ROLAP engine

OLAP query

SQL

RDBMS

OLAP4LD

engine Triple

Store

Fixed Star Schema

Optimisations: aggregate tables

SPARQL

OLAP query

RDF

Problem: Performance? Materialised Aggregate Views?

Data

Metadata

Company Dates Average Trading

Price

Visa Inc. July 1,

2008

80 $

Mastercard

Inc.

July 1,

2008

269 $

... ... ...

Page 7: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 7 29 May 2013

Problem: ROLAP vs OLAP4LD –

Performance? Materialised Aggregate Views?

No Materialisation Materialisation

RDBMS/ SQL

(MySQL)

ROLAP

(Mondrian)

ROLAP-M

(Mondrian)

Triple Store/ SPARQL

(Open Virtuoso)

OLAP4LD

(olap4ld)

OLAP4LD-M

(olap4ld)

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Page 8: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 8 29 May 2013

Related Work

We compare ROLAP and OLAP on a common triple store.

Query optimisation on RDF: SPARQL over MapReduce

[1], Column-Stores [2]

We use [QB] to represent aggregate views in RDF.

Vocabularies for statistics: [SCOVO], [QB4OLAP], [QB]

We materialise aggregate views with SPARQL INSERT

queries and run a realistic benchmark with >100M triples.

Views over RDF datasets [3]: Networked Graphs,

vSPARQL

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Page 9: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 9 29 May 2013

Outline

1. Data Analytics on Statistics from the Web

2. Star Schema Benchmark (SSB)

3. Comparing OLAP on RDBMS and on Triple Store

4. Evaluating Materialised Aggregate Views

5. Conclusions

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Page 10: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 10 29 May 2013

Star Schema Benchmark [SSB] – Data

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Measures

Sum of revenues ...

Dimensions with Hierarchies

Dates (→ Month → Year → ALL)

Supplier (→ City → Nation →

Region → ALL) ...

Metadata

SSB Data Cube with Lineorders

Page 11: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 11 29 May 2013

Dates Month Year

January

18, 1992

Jan,

1992

1992

February

8, 1992

Feb,

1992

1992

... ...

Star Schema Benchmark [SSB] – Data

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Measures

Sum of revenues ...

Dimensions with Hierarchies

Dates (→ Month → Year → ALL)

Supplier (→ City → Nation →

Region → ALL) ...

Metadata

Data

SSB Data Cube with Lineorders

Dates Customer Part Supplier Revenue ...

January

18, 1992

Customer#

000009

olive

drab

Supplier

#0001

950,915 ...

February

8, 1992

Customer#

000027

lace

lemon

Supplier

#0035

593,625 ...

... ... ... ... ... ...

Fact table with 6,000,000 lineorders at Scale 1 of SSB

Dimension table with hierarchy

Page 12: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 12 29 May 2013

Star Schema Benchmark (SSB) –

Queries

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Query

Q2.1

Revenues per year

and per product brands

of product category MFGR#12

and of supplier from America.

Aggregation (aka Rollup)

Dates → Year

Part → Brand

Customer → ALL ...

Filter (aka Dice)

Part.Category = MFGR#12

Supplier.Region = AMERICA

Page 13: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 13 29 May 2013

Star Schema Benchmark (SSB) –

Queries

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Query

Result

Year\Brand MFGR#121 ...

1992 667,692,830 ...

... ... ...

Q2.1

Revenues per year

and per product brands

of product category MFGR#12

and of supplier from America.

Calculated

from

Result displayed

in pivot table

Aggregation (aka Rollup)

Dates → Year

Part → Brand

Customer → ALL ...

Filter (aka Dice)

Part.Category = MFGR#12

Supplier.Region = AMERICA

Dates Customer Part Supplier Revenue ...

January

18, 1992

Customer#

000009

olive

drab

Supplier

#0001

950,915 ...

February

8, 1992

Customer#

000027

lace

lemon

Supplier

#0035

593,625 ...

... ... ... ... ... ...

Page 14: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 14 29 May 2013

Star Schema Benchmark (SSB) –

4 Query Flights of 13 Queries

Number of dimensions queried (Filter, Aggregation)

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

0

1

2

3

4

5

Q1.1 Q1.2 Q1.3 Q2.1 Q2.2 Q2.3 Q3.1 Q3.2 Q3.3 Q3.4 Q4.1 Q4.2 Q4.3

Dimensions queried

From simple to

more complex

Page 15: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 15 29 May 2013

Experimental Setup

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Hardware

OS Debian Linux 6.0.6

CPU 2 x 2.60GHz with 16 cores

Disk 900GB RAID10 on 15k SAS disks

RAM 128GB

Software

DB setup Limited memory (400MB for RDBMS, 600MB for Triple Store)

SSB data Manually translated, loaded according to approach (ROLAP,

OLAP4LD...)

SSB queries Manually translated according to OLAP engine (Mondrian,

olap4ld)

Results Elapsed query times on hot runs with Business Intelligence

Benchmark Framework (BIBM)

Paper website: http://people.aifb.kit.edu/bka/ssb-benchmark/

Page 16: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 16 29 May 2013

Outline

1. Data Analytics on Statistics from the Web

2. Star Schema Benchmark (SSB)

3. Comparing OLAP on RDBMS and on Triple Store

4. Evaluating Materialised Aggregate Views

5. Conclusions

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Page 17: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 17 29 May 2013

Problem: ROLAP vs OLAP4LD –

Performance? Materialised Aggregate Views?

No Materialisation Materialisation

RDBMS/ SQL

(MySQL)

ROLAP

(Mondrian)

ROLAP-M

(Mondrian)

Triple Store/ SPARQL

(Open Virtuoso)

OLAP4LD

(olap4ld)

OLAP4LD-M

(olap4ld)

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Page 18: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 18 29 May 2013

ROLAP vs OLAP4LD – Results

Elapsed query time (eqt)

OLAP4LD overall 12 times slower than ROLAP for executing

all queries.

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

12)(

)(4

irolap

ildolap

qeqt

qeqt

Page 19: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 19 29 May 2013

ROLAP vs OLAP4LD – Query processing

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

0

5

10

15

20

25

Q1.1 Q1.2 Q1.3 Q2.1 Q2.2 Q2.3 Q3.1 Q3.2 Q3.3 Q3.4 Q4.1 Q4.2 Q4.3

ROLAP joins

OLAP4LD joins

0,00

50,00

100,00

150,00

200,00

Q1.1 Q1.2 Q1.3 Q2.1 Q2.2 Q2.3 Q3.1 Q3.2 Q3.3 Q3.4 Q4.1 Q4.2 Q4.3

ROLAP

OLAP4LD

OLAP4LD overall 12 times slower than ROLAP for executing all queries?

Hypothesis: Numerous joins for [QB] hierarchies

544.5

Page 20: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 20 29 May 2013

ROLAP vs OLAP4LD – Query processing

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

0

5

10

15

20

25

Q1.1 Q1.2 Q1.3 Q2.1 Q2.2 Q2.3 Q3.1 Q3.2 Q3.3 Q3.4 Q4.1 Q4.2 Q4.3

ROLAP joins

OLAP4LD joins

0,00

50,00

100,00

150,00

200,00

Q1.1 Q1.2 Q1.3 Q2.1 Q2.2 Q2.3 Q3.1 Q3.2 Q3.3 Q3.4 Q4.1 Q4.2 Q4.3

ROLAP

OLAP4LD

OLAP4LD overall 12 times slower than ROLAP for executing all queries?

Hypothesis: Numerous joins for [QB] hierarchies

544.5

Page 21: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 21 29 May 2013

ROLAP vs OLAP4LD – Query processing

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

ROLAP

OLAP4LD overall 12 times slower than ROLAP for executing all queries?

Dates Month Year

January

18, 1992

Jan,

1992

1992

February

8, 1992

Feb,

1992

1992

... ...

Denormalised

Dimension

Table

Aggregation (Rollup) Dates → Year

Page 22: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 22 29 May 2013

Numerous joins through skos:narrower paths for [QB] hierarchies

ROLAP vs OLAP4LD – Query processing

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

ROLAP OLAP4LD

OLAP4LD overall 12 times slower than ROLAP for executing all queries?

Dates Month Year

January

18, 1992

Jan,

1992

1992

February

8, 1992

Feb,

1992

1992

... ...

Denormalised

Dimension

Table

Page 23: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 23 29 May 2013

ROLAP vs OLAP4LD – Query processing

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

ROLAP OLAP4LD

OLAP4LD overall 12 times slower than ROLAP for executing all queries?

OLAP4LD-SSB

Dates Month Year

January

18, 1992

Jan,

1992

1992

February

8, 1992

Feb,

1992

1992

... ...

Numerous joins through skos:narrower paths for [QB] hierarchies

Modeling more closely resembling Star Schema is faster (OLAP4LD-SSB)

Denormalised

Dimension

Table

Page 24: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 24 29 May 2013

Outline

1. Data Analytics on Statistics from the Web

2. Star Schema Benchmark (SSB)

3. Comparing OLAP on RDBMS and on Triple Store

4. Evaluating Materialised Aggregate Views

1. RDF Aggregate Views

2. Performance Gain of RDF Aggregate Views

5. Conclusions

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Page 25: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 25 29 May 2013

Evaluating Materialised Aggregate

Views

Aggregate View (aka cuboid, group-by)

Lineorders on same level of granularity

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Dates.Year

Customer.ALL

Part.Brand

Supplier.Region

Quantity.ALL

Discount.ALL

View

Page 26: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 26 29 May 2013

Evaluating Materialised Aggregate

Views

Aggregate View (aka cuboid, group-by)

Lineorders on same level of granularity

Connected via Aggregation operation

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Rollup

Dates.Year

Customer.ALL

Part.Category

Supplier.Region

Quantity.ALL

Discount.ALL

Dates.Year

Customer.ALL

Part.Brand

Supplier.Region

Quantity.ALL

Discount.ALL

View

View

Page 27: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 27 29 May 2013

Evaluating Materialised Aggregate

Views

Aggregate View (aka cuboid, group-by)

Lineorders on same level of granularity

Connected via Aggregation operation

Data Cube Lattice

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

...

...

Rollup

Dates.Year

Customer.ALL

Part.Category

Supplier.Region

Quantity.ALL

Discount.ALL

Dates.Year

Customer.ALL

Part.Brand

Supplier.Region

Quantity.ALL

Discount.ALL

View

View

Page 28: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 28 29 May 2013

Data Cube Lattice – Features

View computable from

any view on lower

level on Rollup path

The higher the view in

lattice the fewer

lineorders

Number of views:

Number of possible

lineorders in data cube:

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Problem: View

Selection and

Computation

...

...

||i

L

Viewv Levell

i

i

vl )|)(|(

(3,000)

(2.6 ∗ 1019)

Page 29: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 29 29 May 2013

View Selection and Computation

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Q2.1

Revenues per year

and per product brands

of product category MFGR#12

and of supplier from America.

Highest view to compute Q2.1

Dates.Year

Customer.ALL

Part.Brand

Supplier.Region

Quantity.ALL

Discount.ALL

View

For each query, we select

highest view from which

query can be computed

Cost of answering query

proportional to number of

lineorders that need to be

scanned

Page 30: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 30 29 May 2013

View Selection and Computation

For each query, we select

highest view from which

query can be computed

Cost of answering query

proportional to number of

lineorders that need to be

scanned

We compute views via

SPARQL INSERT queries

Views are [QB] slices of the

lineorder data cube that fix

dimensions to ALL value

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Q2.1

Revenues per year

and per product brands

of product category MFGR#12

and of supplier from America.

Highest view to compute Q2.1

Dates.Year

Customer.ALL

Part.Brand

Supplier.Region

Quantity.ALL

Discount.ALL

View

Slice of view Q2.1

Slice

Customer.ALL

Quantity.ALL

Discount.ALL

Page 31: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 31 29 May 2013

Outline

1. Data Analytics on Statistics from the Web

2. Star Schema Benchmark (SSB)

3. Comparing OLAP on RDBMS and on Triple Store

4. Evaluating Materialised Aggregate Views

1. RDF Aggregate Views

2. Performance Gain of RDF Aggregate Views

5. Conclusions

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Page 32: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 32 29 May 2013

Problem: ROLAP vs OLAP4LD –

Performance? Materialised Aggregate Views?

No Materialisation Materialisation

RDBMS/ SQL

(MySQL)

ROLAP

(Mondrian)

ROLAP-M

(Mondrian)

Triple Store/ SPARQL

(Open Virtuoso)

OLAP4LD

(olap4ld)

OLAP4LD-M

(olap4ld)

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Page 33: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 33 29 May 2013

OLAP4LD vs OLAP4LD-M – Results

Elapsed query time (eqt)

OLAP4LD-M overall 2 times slower than OLAP4LD, but 13

times faster for query flight Q4

(ROLAP-M overall is 50 times faster than ROLAP)

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

2)(

)(

4

4

ildolap

imldolap

qeqt

qeqt

Page 34: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 34 29 May 2013

0 100000 200000 300000 400000 500000

Q1.1 Q1.2 Q1.3 Q2.1 Q2.2 Q2.3 Q3.1 Q3.2 Q3.3 Q3.4 Q4.1 Q4.2 Q4.3

Number of lineorders in view

OLAP4LD vs OLAP4LD-M – Query

processing

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

OLAP4LD-M overall 2 times slower than OLAP4LD?

Hypothesis: OLAP4LD-M scans all lineorders although view contains fraction

0

200

400

600

Q1.1 Q1.2 Q1.3 Q2.1 Q2.2 Q2.3 Q3.1 Q3.2 Q3.3 Q3.4 Q4.1 Q4.2 Q4.3

OLAP4LD

OLAP4LD-M

3,313,191 4,178,699

Page 35: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 35 29 May 2013

0 100000 200000 300000 400000 500000

Q1.1 Q1.2 Q1.3 Q2.1 Q2.2 Q2.3 Q3.1 Q3.2 Q3.3 Q3.4 Q4.1 Q4.2 Q4.3

Number of lineorders in view

OLAP4LD vs OLAP4LD-M – Query

processing

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

OLAP4LD-M overall 2 times slower than OLAP4LD?

Hypothesis: OLAP4LD-M scans all lineorders although view contains fraction

0

200

400

600

Q1.1 Q1.2 Q1.3 Q2.1 Q2.2 Q2.3 Q3.1 Q3.2 Q3.3 Q3.4 Q4.1 Q4.2 Q4.3

OLAP4LD

OLAP4LD-M

3,313,191 4,178,699

Page 36: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 36 29 May 2013

OLAP4LD vs OLAP4LD-M – Query

processing

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

OLAP4LD-M overall 2 times slower than OLAP4LD, but 13 times faster for Q4?

ROLAP-M

No joins needed for ROLAP-M

Page 37: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 37 29 May 2013

OLAP4LD vs OLAP4LD-M – Query

processing

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

OLAP4LD-M overall 2 times slower than OLAP4LD, but 13 times faster for Q4?

ROLAP-M

OLAP4LD-M

No joins for ROLAP-M

Reduced number of joins through skos:narrower paths for

OLAP4LD-M hierarchies, but triples are stored in same graph.

...

Page 38: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 38 29 May 2013

OLAP4LD vs OLAP4LD-M – Query

processing

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Hypothesis: Reduced number of joins for hierarchies

OLAP4LD-M 13 times faster than OLAP4LD for Q4?

0

200

400

600

Q1.1 Q1.2 Q1.3 Q2.1 Q2.2 Q2.3 Q3.1 Q3.2 Q3.3 Q3.4 Q4.1 Q4.2 Q4.3

OLAP4LD

OLAP4LD-M

0

10

20

30

Q1.1 Q1.2 Q1.3 Q2.1 Q2.2 Q2.3 Q3.1 Q3.2 Q3.3 Q3.4 Q4.1 Q4.2 Q4.3

OLAP4LD joins

OLAP4LD-M joins

Page 39: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 39 29 May 2013

Conclusion: No Size Fits All – Efficiency

versus Modeling Trade-off

Efficiency: Triple Stores per se not worse for analytical

queries than RDBMS

If modelled similar to Star Schema, query processing just as fast

as on RDBMS

Modeling: If modeled using standard vocabulary [QB],

statistics can more easily be integrated with other data

sources, but

Require numerous joins for hierarchies

RDF aggregate views can be represented but increase number of

triples in graph

Some queries successfully optimised but others much slower

Analytical queries on Statistical Linked Data call for more

optimised query engines

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Page 40: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 40 29 May 2013

Feedback & Questions

Thank you!

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

...

...

Paper website: http://people.aifb.kit.edu/bka/ssb-benchmark/

RDF Data Cube

Vocabulary [QB]

OLAP

client

OLAP query

OLAP4LD

engine Triple

Store

SPARQL

RDF

Metadata

Company Dates Average Trading

Price

Visa Inc. July 1,

2008

80 $

Mastercard

Inc.

July 1,

2008

269 $

... ... ...

Page 41: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 41 29 May 2013

References

[1] Spyros Kotoulas, Jacopo Urbani, Peter A. Boncz, Peter Mika: Robust Runtime

Optimization and Skew-Resistant Execution of Analytical SPARQL Queries on

Pig. ISWC. 2012.

[2] Lefteris Sidirourgos, Romulo Goncalves, Martin Kersten, Niels Nes, and Stefan

Manegold. 2008. Column-store support for RDF data management: not all

swans are white. Proc. VLDB Endow. 2008.

[SCOVO] Michael Hausenblas, Wolfgang Halb, Yves Raimond, Lee Feigenbaum,

and Danny Ayers. SCOVO: Using Statistics on the Web of Data. ESWC 2009

Heraklion. 2009.

[QB4OLAP] Lorena Etcheverry, Alejandro A. Vaisman: QB4OLAP: A Vocabulary

for OLAP Cubes on the Semantic Web. ISWC Workshop on Consuming Linked

Data. 2012.

[QB] http://www.w3.org/TR/vocab-data-cube/

[3] Lorena Etcheverry, Alejandro Vaisman. Views over RDF Datasets: A Survey.

Reasoning Web Summer School. 2011.

[SSB] O’Neil, P., O’Neil, E., Chen, X.: Star Schema Benchmark - Revision 3. Tech.

rep., UMass/Boston (2009), http://www.cs.umb.edu/~poneil/StarSchemaB.pdf

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Page 42: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 42 29 May 2013

Backup: ROLAP vs OLAP4LD – Pre-

processing

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

ROLAP OLAP4LD

OLAP4LD takes 261 times longer than ROLAP for transforming and loading

the data?

ROLAP OLAP4LD

Indices No RDF indexed

Hierarchies Strings Resources

Size of data 600MB 4.3GB

Page 43: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 43 29 May 2013

Backup: SSB Schema Detailed

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Page 44: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 44 29 May 2013

Backup: SPARQL Query for Q2.1 SELECT ?rdfh_lo_orderdate ?rdfh_lo_partkey1 sum(?rdfh_lo_revenue) as ?lo_revenue

WHERE {

?obs qb:dataSet rdfh-inst:ds.

?obs rdfh:lo_orderdate ?rdfh_lo_orderdate0.

?rdfh_lo_orderdate1 skos:narrower ?rdfh_lo_orderdate0.

?rdfh_lo_orderdate2 skos:narrower ?rdfh_lo_orderdate1.

?rdfh_lo_orderdate skos:narrower ?rdfh_lo_orderdate2.

rdfh:lo_orderdateYearLevel skos:member ?rdfh_lo_orderdate.

?obs rdfh:lo_partkey ?rdfh_lo_partkey0.

?rdfh_lo_partkey1 skos:narrower ?rdfh_lo_partkey0.

?rdfh_lo_partkey skos:narrower ?rdfh_lo_partkey1.

rdfh:lo_partkeyCategoryLevel skos:member ?rdfh_lo_partkey.

?obs rdfh:lo_suppkey ?rdfh_lo_suppkey0.

...

?obs rdfh:lo_revenue ?rdfh_lo_revenue.

FILTER(?rdfh_lo_partkey = rdfh:lo_partkeyCategoryMFGR-12 AND ?rdfh_lo_suppkey = rdfh:lo_suppkeyRegionAMERICA ).

} group by ?rdfh_lo_orderdate ?rdfh_lo_partkey1 order by ?rdfh_lo_orderdate ?rdfh_lo_partkey1

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Page 45: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 45 29 May 2013

SPARQL Query for Q2.1 – Cube SELECT ?rdfh_lo_orderdate ?rdfh_lo_partkey1 sum(?rdfh_lo_revenue) as ?lo_revenue

WHERE {

?obs qb:dataSet rdfh-inst:ds.

?obs rdfh:lo_orderdate ?rdfh_lo_orderdate0.

?rdfh_lo_orderdate1 skos:narrower ?rdfh_lo_orderdate0.

?rdfh_lo_orderdate2 skos:narrower ?rdfh_lo_orderdate1.

?rdfh_lo_orderdate skos:narrower ?rdfh_lo_orderdate2.

rdfh:lo_orderdateYearLevel skos:member ?rdfh_lo_orderdate.

?obs rdfh:lo_partkey ?rdfh_lo_partkey0.

?rdfh_lo_partkey1 skos:narrower ?rdfh_lo_partkey0.

?rdfh_lo_partkey skos:narrower ?rdfh_lo_partkey1.

rdfh:lo_partkeyCategoryLevel skos:member ?rdfh_lo_partkey.

?obs rdfh:lo_suppkey ?rdfh_lo_suppkey0.

...

?obs rdfh:lo_revenue ?rdfh_lo_revenue.

FILTER(?rdfh_lo_partkey = rdfh:lo_partkeyCategoryMFGR-12 AND ?rdfh_lo_suppkey = rdfh:lo_suppkeyRegionAMERICA ).

} group by ?rdfh_lo_orderdate ?rdfh_lo_partkey1 order by ?rdfh_lo_orderdate ?rdfh_lo_partkey1

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Lineorder Data Cube

Page 46: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 46 29 May 2013

SPARQL Query for Q2.1 – Slice/Roll-Up SELECT ?rdfh_lo_orderdate ?rdfh_lo_partkey1 sum(?rdfh_lo_revenue) as ?lo_revenue

WHERE {

?obs qb:dataSet rdfh-inst:ds.

?obs rdfh:lo_orderdate ?rdfh_lo_orderdate0.

?rdfh_lo_orderdate1 skos:narrower ?rdfh_lo_orderdate0.

?rdfh_lo_orderdate2 skos:narrower ?rdfh_lo_orderdate1.

?rdfh_lo_orderdate skos:narrower ?rdfh_lo_orderdate2.

rdfh:lo_orderdateYearLevel skos:member ?rdfh_lo_orderdate.

?obs rdfh:lo_partkey ?rdfh_lo_partkey0.

?rdfh_lo_partkey1 skos:narrower ?rdfh_lo_partkey0.

?rdfh_lo_partkey skos:narrower ?rdfh_lo_partkey1.

rdfh:lo_partkeyCategoryLevel skos:member ?rdfh_lo_partkey.

?obs rdfh:lo_suppkey ?rdfh_lo_suppkey0.

...

?obs rdfh:lo_revenue ?rdfh_lo_revenue.

FILTER(?rdfh_lo_partkey = rdfh:lo_partkeyCategoryMFGR-12 AND ?rdfh_lo_suppkey = rdfh:lo_suppkeyRegionAMERICA ).

} group by ?rdfh_lo_orderdate ?rdfh_lo_partkey1 order by ?rdfh_lo_orderdate ?rdfh_lo_partkey1

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Roll-Up DatesYear

Roll-Up Part Brand1

Roll-Up Supplier Region

Slice Customer, Supplier, Quantity, Discount

Page 47: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 47 29 May 2013

SPARQL Query for Q2.1 – Projection SELECT ?rdfh_lo_orderdate ?rdfh_lo_partkey1 sum(?rdfh_lo_revenue) as ?lo_revenue

WHERE {

?obs qb:dataSet rdfh-inst:ds.

?obs rdfh:lo_orderdate ?rdfh_lo_orderdate0.

?rdfh_lo_orderdate1 skos:narrower ?rdfh_lo_orderdate0.

?rdfh_lo_orderdate2 skos:narrower ?rdfh_lo_orderdate1.

?rdfh_lo_orderdate skos:narrower ?rdfh_lo_orderdate2.

rdfh:lo_orderdateYearLevel skos:member ?rdfh_lo_orderdate.

?obs rdfh:lo_partkey ?rdfh_lo_partkey0.

?rdfh_lo_partkey1 skos:narrower ?rdfh_lo_partkey0.

?rdfh_lo_partkey skos:narrower ?rdfh_lo_partkey1.

rdfh:lo_partkeyCategoryLevel skos:member ?rdfh_lo_partkey.

?obs rdfh:lo_suppkey ?rdfh_lo_suppkey0.

...

?obs rdfh:lo_revenue ?rdfh_lo_revenue.

FILTER(?rdfh_lo_partkey = rdfh:lo_partkeyCategoryMFGR-12 AND ?rdfh_lo_suppkey = rdfh:lo_suppkeyRegionAMERICA ).

} group by ?rdfh_lo_orderdate ?rdfh_lo_partkey1 order by ?rdfh_lo_orderdate ?rdfh_lo_partkey1

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Revenues (sum)

Page 48: No Size Fits All Running the Star Schema Benchmark with

Institute of Applied Informatics and

Formal Description Methods (AIFB) 48 29 May 2013

SPARQL Query for Q2.1 – Dice SELECT ?rdfh_lo_orderdate ?rdfh_lo_partkey1 sum(?rdfh_lo_revenue) as ?lo_revenue

WHERE {

?obs qb:dataSet rdfh-inst:ds.

?obs rdfh:lo_orderdate ?rdfh_lo_orderdate0.

?rdfh_lo_orderdate1 skos:narrower ?rdfh_lo_orderdate0.

?rdfh_lo_orderdate2 skos:narrower ?rdfh_lo_orderdate1.

?rdfh_lo_orderdate skos:narrower ?rdfh_lo_orderdate2.

rdfh:lo_orderdateYearLevel skos:member ?rdfh_lo_orderdate.

?obs rdfh:lo_partkey ?rdfh_lo_partkey0.

?rdfh_lo_partkey1 skos:narrower ?rdfh_lo_partkey0.

?rdfh_lo_partkey skos:narrower ?rdfh_lo_partkey1.

rdfh:lo_partkeyCategoryLevel skos:member ?rdfh_lo_partkey.

?obs rdfh:lo_suppkey ?rdfh_lo_suppkey0.

...

?obs rdfh:lo_revenue ?rdfh_lo_revenue.

FILTER(?rdfh_lo_partkey = rdfh:lo_partkeyCategoryMFGR-12 AND ?rdfh_lo_suppkey = rdfh:lo_suppkeyRegionAMERICA ).

} group by ?rdfh_lo_orderdate ?rdfh_lo_partkey1 order by ?rdfh_lo_orderdate ?rdfh_lo_partkey1

B. Kämpgen, A. Harth – Star Schema Benchmark with SPARQL and RDF Aggregate Views

Supplier Region = AMERICA AND

Part Category = MFGR#12