>>> PhD Defense
>>> On the foundations for the compilation of web data queries: optimization and distributed evaluation of SPARQL.
Date: 13 September, 2018
Defendant: Louis JACHIET
Directors: Nabil LAYAÏDA, Pierre GENEVÈS
Reviewers: Dario COLAZZO, Ioana MANOLESCU
Jury: Jérôme EUZENAT, Patrick VALDURIEZ
[1/40]
>>> Introduction / Motivation / Questions
What are the museums in Grenoble?
Who are they exhibiting?
[2/40]
>>> Introduction / Motivation / Answering with Wikipedia?
[3/40]
>>> Introduction / RDF & SPARQL / Goals
The goals of a data standard are to:
* Encode data in a machine readable & processable way
* Allow exchange of data on the web
* Facilitate querying
[4/40]
>>> Introduction / RDF & SPARQL / The RDF standard
RDF: Resource Description Framework [RCM14]
In RDF, data is represented as entities and as statements expressing relationships between these entities.
In N-Triples format:
subject predicate object
:Musée_de_Grenoble :isA :Museum .
:Musée_de_Grenoble :locatedIn :Grenoble .
:Musée_de_Grenoble :exhibits :Chagall .
:Musée_de_Grenoble :exhibits :Fantin-Latour .
:Perret_Tower :locatedIn :Grenoble .
:Louvre :isA :Museum .
Viewed as a graph:
[Figure: the six triples above drawn as a directed graph. Nodes: :Musée_de_Grenoble, :Museum, :Grenoble, :Perret_Tower, :Louvre, :Chagall, :Fantin-Latour. Edge labels: :isA, :locatedIn, :exhibits.]
[5/40]
>>> Introduction / SPARQL / Triple Patterns
Triple Pattern (TP)
A Triple Pattern is a triple (s, p, o) where s, p, o are constants or variables.
“What are the museums?”
[Figure: query graph with a single edge ?what :isA :Museum]
In SPARQL:
?what :isA :Museum .
Solution:
(?what → :Musée_de_Grenoble)
(?what → :Louvre)
[6/40]
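A minimal Python sketch of how one triple pattern can be matched against the example dataset. The tuple encoding, the `is_var` and `match_pattern` helpers, and the convention that a string starting with `?` is a variable are assumptions made for illustration; they are not taken from the slides.

```python
# Example dataset from the slides, encoded as (subject, predicate, object) tuples.
TRIPLES = {
    (":Musée_de_Grenoble", ":isA", ":Museum"),
    (":Musée_de_Grenoble", ":locatedIn", ":Grenoble"),
    (":Musée_de_Grenoble", ":exhibits", ":Chagall"),
    (":Musée_de_Grenoble", ":exhibits", ":Fantin-Latour"),
    (":Perret_Tower", ":locatedIn", ":Grenoble"),
    (":Louvre", ":isA", ":Museum"),
}


def is_var(term):
    """Assumed convention: a term starting with '?' is a variable."""
    return term.startswith("?")


def match_pattern(pattern, triples):
    """Return the mappings (variable -> value) that turn `pattern` into a triple of the data."""
    solutions = []
    for triple in triples:
        mapping = {}
        for term, value in zip(pattern, triple):
            if is_var(term):
                # A variable used twice in the pattern must bind to a single value.
                if mapping.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No position failed: the triple matches the pattern.
            solutions.append(mapping)
    return solutions


museums = match_pattern(("?what", ":isA", ":Museum"), TRIPLES)
```

Each matching triple yields one mapping, which corresponds to one line of the "Solution" list on the slide.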
>>> Introduction / SPARQL / Basic Graph Patterns
Basic Graph Patterns (BGP)
Basic Graph Patterns are conjunctions of Triple Patterns.
“What are the museums in Grenoble exhibiting Chagall?”
[Figure: query graph with edges ?what :isA :Museum, ?what :locatedIn :Grenoble, ?what :exhibits :Chagall]
In SPARQL:
?what :isA :Museum .
?what :locatedIn :Grenoble .
?what :exhibits :Chagall .
Solution:
(?what → :Musée_de_Grenoble)
[7/40]
>>> Introduction / SPARQL / Basic Graph Patterns
“What are the museums in Grenoble and who are they exhibiting?”
[Figure: query graph with edges ?what :isA :Museum, ?what :locatedIn :Grenoble, ?what :exhibits ?artist]
In SPARQL:
?what :isA :Museum .
?what :locatedIn :Grenoble .
?what :exhibits ?artist .
Solution:
(?what → :Musée_de_Grenoble; ?artist → :Chagall)
(?what → :Musée_de_Grenoble; ?artist → :Fantin-Latour)
[8/40]
>>> Introduction / SPARQL / The SPARQL standard 1.0
Other graph patterns in SPARQL 1.0 [PS+08]
* Triple Patterns
* Conjunction: Basic Graph Patterns
* Disjunction: museums or trekking paths
* Filters: museums with artists born before 1900
* Conditional optionals: missing information
* Changing graphs, etc.
[9/40]
>>> Introduction / SPARQL / The SPARQL standard 1.1
Novelties of SPARQL 1.1 [HSP13]
* Expressions: compute the age of monuments
* Minus, Exists: cities with museums but without operas
* Group by & Aggregation: count the number of museums per city
* Property Paths
[10/40]
>>> Introduction / SPARQL / Property Paths
“What are the museums in France?”
[Figure: data graph with :Musée_de_Grenoble :isA :Museum and a chain of :locatedIn edges :Musée_de_Grenoble → :Grenoble → :Isère → :ARA → :France]
[Figure: query graph with edges ?what :isA :Museum and ?what :locatedIn+ :France]
Property Path (PP)
A Property Path is a triple (s, r, o) where r is a path expression.
[11/40]
>>> Introduction / Query evaluation / BGP naive
:Musée_de_Grenoble
:Museum
:Grenoble:Chagall
:Fantin-Latour
:Perret_Tower
:Louvres
:isA
:exhibits
:exhibits
:locatedIn
:isA
:locatedIn
“What are the museums in Grenoble and who are they exposing?”
?What
:Museum
:Grenoble
?artist
:isA:locatedIn
:exhibits
(?What → :Musée_de_Grenoble; ?artist → :Chagall)(?What → :Musée_de_Grenoble; ?artist → :Fantin-Latour)
O(
#nodes#variables)
checks!
[12/40]
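The naive strategy can be sketched in Python as follows: every assignment of graph nodes to the query variables is tried and checked against the data, which gives #nodes^#variables candidates. The `naive_bgp` function and the tuple encoding are illustrative assumptions, and the sketch only handles variables in subject or object position.

```python
from itertools import product

# Example dataset from the slides, encoded as (subject, predicate, object) tuples.
TRIPLES = {
    (":Musée_de_Grenoble", ":isA", ":Museum"),
    (":Musée_de_Grenoble", ":locatedIn", ":Grenoble"),
    (":Musée_de_Grenoble", ":exhibits", ":Chagall"),
    (":Musée_de_Grenoble", ":exhibits", ":Fantin-Latour"),
    (":Perret_Tower", ":locatedIn", ":Grenoble"),
    (":Louvre", ":isA", ":Museum"),
}


def naive_bgp(patterns, triples):
    """Try all #nodes ** #variables assignments; keep those satisfying every pattern."""
    nodes = sorted({t[0] for t in triples} | {t[2] for t in triples})
    variables = sorted({term for p in patterns for term in p if term.startswith("?")})
    solutions, checks = [], 0
    for values in product(nodes, repeat=len(variables)):
        checks += 1
        mapping = dict(zip(variables, values))
        # Substitute the variables, then test that each instantiated triple is in the data.
        if all(tuple(mapping.get(term, term) for term in p) in triples for p in patterns):
            solutions.append(mapping)
    return solutions, checks


BGP = [
    ("?what", ":isA", ":Museum"),
    ("?what", ":locatedIn", ":Grenoble"),
    ("?what", ":exhibits", "?artist"),
]
solutions, checks = naive_bgp(BGP, TRIPLES)
```

On this dataset there are 7 nodes and 2 variables, so the loop performs 49 checks to find the 2 solutions.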
>>> Introduction / Query evaluation / BGP one-by-one
Compute the solutions to each individual TP
TP1: ?what :isA :Museum
TP2: ?what :locatedIn :Grenoble
TP3: ?what :exhibits ?artist
O(#edges) per TP
Combine the individual solutions
(TP1 ⋈ TP2) ⋈ TP3
O(|A| + |B| + |A ⋈ B|) per A ⋈ B
[13/40]
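A possible Python sketch of this two-phase evaluation: each triple pattern is answered by one scan of the edges, then the partial solutions are combined with a hash join. The `scan` and `hash_join` helpers are hypothetical names, and the join assumes that all mappings of one input bind the same variables, which holds for the solutions of a triple pattern.

```python
from collections import defaultdict

# Example dataset from the slides, encoded as (subject, predicate, object) tuples.
TRIPLES = {
    (":Musée_de_Grenoble", ":isA", ":Museum"),
    (":Musée_de_Grenoble", ":locatedIn", ":Grenoble"),
    (":Musée_de_Grenoble", ":exhibits", ":Chagall"),
    (":Musée_de_Grenoble", ":exhibits", ":Fantin-Latour"),
    (":Perret_Tower", ":locatedIn", ":Grenoble"),
    (":Louvre", ":isA", ":Museum"),
}


def scan(pattern, triples):
    """Solutions of one triple pattern in a single pass over the edges: O(#edges)."""
    out = []
    for triple in triples:
        mapping = {}
        if all(
            (mapping.setdefault(term, value) == value) if term.startswith("?") else (term == value)
            for term, value in zip(pattern, triple)
        ):
            out.append(mapping)
    return out


def hash_join(left, right):
    """Natural join of two lists of mappings, in O(|A| + |B| + |A ⋈ B|) expected time."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    index = defaultdict(list)
    for a in left:  # build phase: index the left input on the shared variables
        index[tuple(a[v] for v in shared)].append(a)
    # probe phase: look up each right mapping and merge it with its partners
    return [{**a, **b} for b in right for a in index.get(tuple(b[v] for v in shared), [])]


tp1 = scan(("?what", ":isA", ":Museum"), TRIPLES)
tp2 = scan(("?what", ":locatedIn", ":Grenoble"), TRIPLES)
tp3 = scan(("?what", ":exhibits", "?artist"), TRIPLES)
result = hash_join(hash_join(tp1, tp2), tp3)
```

The intermediate result of TP1 ⋈ TP2 holds a single mapping here, so the second join only probes with the two :exhibits edges.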
>>> Introduction / Query evaluation / BGP: Algebraic vision
(BGP, ⋈) as an algebraic structure:
* ⋈ is associative
TP1 ⋈ (TP2 ⋈ TP3) = (TP1 ⋈ TP2) ⋈ TP3
* ⋈ is commutative
TP1 ⋈ TP2 = TP2 ⋈ TP1
An optimization method:
Generate all the equivalent terms, run the most efficient.
[14/40]
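To illustrate the "generate all the equivalent terms" step, here is a small Python sketch that enumerates every binary join tree reachable through associativity and commutativity. The nested-tuple encoding of terms and the `join_trees` function are assumptions made for this example.

```python
def join_trees(leaves):
    """All binary join trees over `leaves`, up to associativity and commutativity of ⋈.

    A tree is either a leaf name or a pair (left, right) standing for left ⋈ right.
    """
    if len(leaves) == 1:
        return [leaves[0]]
    trees = []
    # Each bit mask splits the leaves into a non-empty left part and a non-empty right part.
    for mask in range(1, 2 ** len(leaves) - 1):
        left = [x for i, x in enumerate(leaves) if (mask >> i) & 1]
        right = [x for i, x in enumerate(leaves) if not (mask >> i) & 1]
        for left_tree in join_trees(left):
            for right_tree in join_trees(right):
                trees.append((left_tree, right_tree))
    return trees


plans = join_trees(["TP1", "TP2", "TP3"])
```

For three triple patterns this produces 12 terms, and the count grows quickly with the number of patterns, which is why a cost estimate is needed to choose among them.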
>>> Introduction / Query evaluation / Relational algebra
The relational algebra [Cod70]
* Base relations
* Combined through a set of operators (⋈, ∪, σ, etc.)
Optimization of relational languages
1. Generate equivalent terms
2. Select the one estimated to be the most efficient
3. Execute it
Problems:
* Not specifically designed for graphs
* Mismatches in the semantics
* Poor optimization of Property Paths
[15/40]
>>> Contributions / The μ-algebra / Overview
1. Generate equivalent terms
2. Select the one estimated to be the most efficient
3. Execute it
[16/40]
>>> Contributions / The μ-algebra / A new algebra
The μ-algebra:
* a variation of the relational algebra
* equipped with fixpoints
* matches the SPARQL semantics
[17/40]
>>> Contributions / The μ-algebra / Syntax
φ ::=                          formula
  | φ1 ∪ φ2                    union
  | φ1 ⧵⧵ φ2                   normal minus
  | φ1 − φ2                    set minus
  | φ1 ⧵ φ2                    strict minus
  | φ1 ⟕ φ2                    left-join
  | φ1 ⋈ φ2                    join
  | ρ_a^b(φ)                   column exchange (or rename)
  | π_a(φ)                     projection
  | β_a^b(φ)                   column multiplying
  | θ(φ, g : C → D)            apply a function to mappings
  | Θ(φ, g, C, D)              reduce
  | σ_filter(φ)                row filtering
  | μ(X = φ)                   fixpoint
  | let (X = φ) in ψ           let-binder
  | X                          variable
  | ∅                          no mapping
  | |c1 → v1, …, cn → vn|      a mapping
[18/40]
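>>> Contributions / The μ-algebra / Translation
[Figure: a path from ?s to ?o labeled Path, decomposed for Path = A/B into an edge A from ?s to ?m followed by an edge B from ?m to ?o]
Tr(A/B) = π_m(ρ_o^m(Tr(A)) ⋈ ρ_s^m(Tr(B)))
[19/40]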
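The translation of a sequence path A/B can be sketched in Python with relations stored as lists of mappings (dicts from column names to values). The helpers `rename`, `join`, `drop` and `seq` are hypothetical names, and the sketch assumes that the projection on the slide removes the intermediate column m once the join is done.

```python
def rename(rel, old, new):
    """ρ: rename column `old` to `new` in every mapping."""
    return [{(new if c == old else c): v for c, v in m.items()} for m in rel]


def join(left, right):
    """⋈: merge the mappings that agree on their shared columns (nested loop for brevity)."""
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[c] == b[c] for c in a.keys() & b.keys())
    ]


def drop(rel, col):
    """π: remove column `col` (assumed reading of the slide); duplicates are merged."""
    seen = {frozenset((c, v) for c, v in m.items() if c != col) for m in rel}
    return [dict(items) for items in seen]


def seq(a, b):
    """Tr(A/B): rename o to m in A and s to m in B, join on m, then drop m."""
    return drop(join(rename(a, "o", "m"), rename(b, "s", "m")), "m")


# Hypothetical relations over columns s (subject) and o (object).
A = [{"s": "a", "o": "b"}]
B = [{"s": "b", "o": "c"}, {"s": "x", "o": "y"}]
path = seq(A, B)
```

Only the pair of edges sharing the middle node b survives the join, so `path` relates a to c.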
>>> Contributions / The μ-algebra / Translation
A* = Empty Path or Path of A*/A
Tr(A*) = EmptyPath ∪ Tr(A*)/A
       = μ(X = EmptyPath ∪ X/A)
       = μ(X = β_s^o(AllNodes) ∪ π_m(ρ_o^m(X) ⋈ ρ_s^m(Tr(A))))
[20/40]
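One way to read the fixpoint μ(X = EmptyPath ∪ X/A) operationally is as an iteration that starts from the empty relation and stops when an additional step adds nothing. The Python sketch below follows that reading with relations stored as sets of (s, o) pairs; `compose`, `star` and the small :locatedIn chain are illustrative assumptions, and this plain iteration is not claimed to be how the thesis prototype evaluates fixpoints.

```python
def compose(r1, r2):
    """Pairs (s, o) such that (s, m) is in r1 and (m, o) is in r2, i.e. the path r1/r2."""
    by_source = {}
    for m, o in r2:
        by_source.setdefault(m, set()).add(o)
    return {(s, o) for s, m in r1 for o in by_source.get(m, ())}


def star(edges, all_nodes):
    """Least fixpoint of X = EmptyPath ∪ X/A, computed by repeated application."""
    x = set()
    while True:
        # EmptyPath relates every node to itself; X/A extends known paths by one edge.
        new_x = {(n, n) for n in all_nodes} | compose(x, edges)
        if new_x == x:
            return x
        x = new_x


# Assumed chain of :locatedIn edges, in the spirit of the "museums in France" example.
LOCATED_IN = {(":Grenoble", ":Isère"), (":Isère", ":ARA"), (":ARA", ":France")}
NODES = {":Grenoble", ":Isère", ":ARA", ":France"}
closure = star(LOCATED_IN, NODES)
```

The loop terminates because each round can only add pairs and the set of possible pairs is finite.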
>>> Contributions / The μ-algebra / An example
:N :Red/:Yellow* ?o
[Figure: animated graph containing the start node N]
[21/40]
>>> Contributions / The μ-algebra / Rewrite rules
Rewrite rules for fixpoints
* pushing filters?
σ_filter(μ(X = φ)) ?= μ(X = σ_filter(φ))
* reverse fixpoints?
* pushing joins?
ψ ⋈ μ(X = φ) ?= μ(X = ψ ⋈ φ)
* pushing projections?
π_p(μ(X = φ)) ?= μ(X = π_p(φ))
* combine fixpoints?
μ(X = … ∪ …) ⋈ μ(X = φ ∪ …) ?= μ(X = φ ∪ … ∪ …)
[22/40]
>>> Contributions / The μ-algebra / An example
:N :Red/:Yellow* ?o
π_?s(σ_?s=:N(:Red / μ(X = β_s^o(AllNodes) ∪ X/:Yellow)))
π_?s(σ_?s=:N(μ(X = :Red/β_s^o(AllNodes) ∪ X/:Yellow)))
π_?s(σ_?s=:N(μ(X = :Red ∪ X/:Yellow)))
π_?s(μ(X = σ_?s=:N(:Red) ∪ X/:Yellow))
μ(X = π_?s(σ_?s=:N(:Red)) ∪ X/:Yellow)
[23/40]
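The effect of pushing the filter ?s = :N into the fixpoint can be checked on a toy graph. The Python sketch below evaluates :N :Red/:Yellow* ?o twice: once by computing :Yellow* for all nodes and filtering at the end, once with a fixpoint seeded only by the :Red edges leaving N. The graph, the helper names and the naive iteration are assumptions for illustration, and the subject column is kept in both plans so the results can be compared directly.

```python
def compose(r1, r2):
    """Pairs (s, o) such that (s, m) is in r1 and (m, o) is in r2."""
    by_source = {}
    for m, o in r2:
        by_source.setdefault(m, set()).add(o)
    return {(s, o) for s, m in r1 for o in by_source.get(m, ())}


def fixpoint(seed, step_edges):
    """Least fixpoint of X = seed ∪ X/step_edges, by iteration until stable."""
    x = set(seed)
    while True:
        new_x = x | compose(x, step_edges)
        if new_x == x:
            return x
        x = new_x


# Hypothetical colored graph.
RED = {("N", "a"), ("M", "b")}
YELLOW = {("a", "c"), ("c", "d"), ("b", "e")}
NODES = {"N", "M", "a", "b", "c", "d", "e"}

# Plan 1: compute :Yellow* everywhere, prepend :Red, filter on the subject last.
yellow_star = fixpoint({(n, n) for n in NODES}, YELLOW)
unoptimized = {(s, o) for s, o in compose(RED, yellow_star) if s == "N"}

# Plan 2: filter pushed inside, the fixpoint only explores what is reachable from N.
optimized = fixpoint({(s, o) for s, o in RED if s == "N"}, YELLOW)
```

Both plans return the same answers, but the first materializes the closure for every node while the second only ever holds pairs whose subject is N.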
>>> Contributions / The μ-algebra / Property Paths
Methods of evaluating Property Paths:
* Ad-hoc: Automata, Waveguide [YGG15]
* Fixpoints: Datalog, Recursive SQL
[24/40]
>>> Contributions / The μ-algebra / Benchmarking
[Figure: log-log plot of time in seconds (10^-3 to 10^1) against the number of nodes n (10^2 to 10^6) for Postgres, SQLite, Virtuoso, ARQ1, ARQ2, DLV1, DLV2, Ramsdell1, Ramsdell2, Vlog1, Vlog2 and Prototype]
Figure: Time for :N :Red/:Yellow* ?o on n nodes
[25/40]
>>> Contributions / Plan selection / Overview
1. Generate equivalent terms
2. Select the one estimated to be the most efficient
3. Execute it
[26/40]
>>> Contributions / Plan selection / Cost model
Cost model
Estimate the running time to evaluate a term.
Cost(A ⋈ B) = Cost(A) + Cost(B) + O(size(A ⋈ B))
Cardinality estimation
Estimate the number of solutions to a term.
[27/40]
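A minimal Python sketch of plan selection with this cost formula. The leaf cardinalities, the fixed selectivity of 1/100 used by `estimated_size`, and the choice of charging each leaf its own size are placeholders invented for the example; they stand in for a real cardinality estimator and are not the CardEst technique.

```python
# Hypothetical cardinalities of the three triple patterns.
LEAF_SIZES = {"TP1": 1000, "TP2": 10, "TP3": 500}


def estimated_size(tree):
    """Placeholder cardinality estimate: a join keeps 1/100 of the cross product."""
    if isinstance(tree, str):
        return LEAF_SIZES[tree]
    left, right = tree
    return estimated_size(left) * estimated_size(right) // 100


def cost(tree):
    """Cost(A ⋈ B) = Cost(A) + Cost(B) + size(A ⋈ B); a leaf costs its own size (assumed)."""
    if isinstance(tree, str):
        return LEAF_SIZES[tree]
    left, right = tree
    return cost(left) + cost(right) + estimated_size(tree)


# Three of the equivalent join orders, written as nested pairs (left, right).
CANDIDATES = [
    (("TP1", "TP2"), "TP3"),
    (("TP1", "TP3"), "TP2"),
    (("TP2", "TP3"), "TP1"),
]
best = min(CANDIDATES, key=cost)
```

All three plans produce a final result of the same estimated size; they differ only in the size of the intermediate join, which is what the cost model penalizes.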
>>> Contributions / Plan selection / SPARQLGX
SPARQLGX
SPARQLGX is a distributed SPARQL query evaluator based on Apache Spark.
CardEst
A worst-case cardinality estimation with a new tool: summaries.
[28/40]
>>> Contributions / Plan selection / Join algorithms
The hash join algorithm
Map(Cogroup(A, B), iter)
O((|A| + |B|) × shuffle + |A ⋈ B|)
The broadcast join algorithm
MapValues(A, f_B)
O(|A| + |B| × #workers + |A ⋈ B|)
[29/40]
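The two complexity formulas can be turned into a rough decision rule. In the Python sketch below, `shuffle` is treated as a per-tuple cost factor and all constants hidden by the O notation are ignored; both are simplifying assumptions, and the function names are hypothetical rather than part of SPARQLGX.

```python
def hash_join_cost(size_a, size_b, size_out, shuffle):
    """(|A| + |B|) × shuffle + |A ⋈ B|: both inputs are repartitioned across workers."""
    return (size_a + size_b) * shuffle + size_out


def broadcast_join_cost(size_a, size_b, size_out, workers):
    """|A| + |B| × #workers + |A ⋈ B|: B is copied to every worker, A stays in place."""
    return size_a + size_b * workers + size_out


def pick_join(size_a, size_b, size_out, shuffle, workers):
    """Return the name of the cheaper algorithm under the two formulas."""
    hash_cost = hash_join_cost(size_a, size_b, size_out, shuffle)
    broadcast_cost = broadcast_join_cost(size_a, size_b, size_out, workers)
    return "broadcast" if broadcast_cost < hash_cost else "hash"


# Hypothetical setting: 8 workers, shuffling a tuple costs 3 units.
small_b = pick_join(1_000_000, 100, 5_000, shuffle=3, workers=8)
large_b = pick_join(1_000_000, 1_000_000, 5_000, shuffle=3, workers=8)
```

With these numbers broadcasting wins when B is tiny and the hash join wins when B is as large as A, which is why the choice depends on a cardinality estimate.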
>>> Contributions / Plan selection / SPARQLGX
[Figure: bar charts of query time in seconds (axis from 0 to 60) comparing three configurations: NoOptim, Stats, CardEst]
Figure: LUBM [GPH05] 10k (1.4 billion triples), queries Q1 to Q14
Figure: WatDiv [AHÖD14] 1k (140 million triples), queries C1 to C3, S1 to S7, L1 to L5, F1 to F5
[30/40]
>>> Contributions / Execution / Overview
1. Generate equivalent terms
2. Select the one estimated to be the most efficient
3. Execute it
[31/40]
>>> Contributions / Execution / Streams
Streams are one-way communication channels
[Figure: a stream between two endpoints labeled R and S, with an "end" marker]
[32/40]
>>> Contributions / Execution / Streams
Execution of μ-algebra terms with streams
[Figure: stream dataflow with the elements start, X1 … Xn, φ and out]
[33/40]
>>> Contributions / Execution / muSPARQL Q2
[Figure: log-log plot of time in seconds (10^-2 to 10^2) against the number of nodes (10^2 to 10^7) for muSPARQL, Ramsdell, DLV, Vlog and Postgres]
Figure: ?a (P1+)/(P5+) ?b.
[34/40]
>>> Contributions / Execution / muSPARQL Q3
[Figure: log-log plot of time in seconds (10^-2 to 10^2) against the number of nodes (10^2 to 10^7) for muSPARQL, Ramsdell, DLV, Vlog and Postgres]
Figure: ?a (P1+)/P2 ?b . ?b P3+ ?c.
[35/40]
>>> Contributions / Execution / muSPARQL Q7
[Figure: log-log plot of time in seconds (10^-2 to 10^2) against the number of nodes (10^2 to 10^7) for muSPARQL, Ramsdell, DLV, Vlog and Postgres]
Figure: N0 P1/(P2+) ?a
[36/40]
>>> Contributions / Execution / muSPARQL Q10
[Figure: log-log plot of time in seconds (10^-2 to 10^2) against the number of nodes (10^2 to 10^7) for muSPARQL, Ramsdell, DLV, Vlog and Postgres]
Figure: ?a (P4+)/(P5+)/(P3+) ?b
[37/40]
>>> Conclusion / Contributions
Main contributions:
μ-Algebra: a new algebra with new efficient rewriting rules for fixpoints (BDA’17, BDA’18)
muSPARQL: a prototype evaluator based on streams
CardEst: a new cardinality estimation technique
SPARQLGX: a distributed SPARQL query evaluator (ISWC’16)
Collaborations:
* Leaves enumeration technique (ICALP’17)
* Distributed SPARQL query evaluators benchmarks (BDA’17)
* ProvSQL: provenance for SQL (VLDB’18)
[38/40]
>>> Conclusion / Perspectives
Short-term perspectives
* Complete the distributed implementation of the μ-algebra
* Use the μ-algebra for SPARQL with ontology-based data access
Long-term perspectives
* Improve the cardinality estimation scheme
* Port the optimization of the μ-algebra to SQL and Datalog
[39/40]
>>> Conclusion / Questions?
Questions?
[40/40]
[AHÖD14] Güneş Aluç, Olaf Hartig, M. Tamer Özsu, and Khuzaima Daudjee. Diversified stress testing of RDF data management systems. In International Semantic Web Conference, pages 197–212. Springer, 2014.
[Cod70] Edgar F. Codd. A relational model of data for large shared data banks. Communications of the ACM, 13(6):377–387, 1970.
[GPH05] Yuanbo Guo, Zhengxiang Pan, and Jeff Heflin. LUBM: A benchmark for OWL knowledge base systems. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2):158–182, October 2005.
[HSP13] Steve Harris, Andy Seaborne, and Eric Prud’hommeaux. SPARQL 1.1 query language. W3C recommendation, 21(10), 2013.
[PS+08] Eric Prud’hommeaux, Andy Seaborne, et al. SPARQL query language for RDF. W3C recommendation, 15, 2008. www.w3.org/TR/rdf-sparql-query/.
[RCM14] Richard Cyganiak, David Wood, and Markus Lanthaler. RDF 1.1 concepts and abstract syntax, February 2014.
[YGG15] Nikolay Yakovets, Parke Godfrey, and Jarek Gryz. Towards Query Optimization for SPARQL Property Paths. arXiv:1504.08262 [cs], April 2015.