gStore: Answering SPARQL Queries Via Subgraph Matching
Lei Zou¹, Jinghui Mo¹, Lei Chen², M. Tamer Özsu³, Dongyan Zhao¹
¹Peking University, ²Hong Kong University of Science and Technology, ³University of Waterloo
Outline
• Background & Related Work
• Overview of gStore
• Encoding Technique
• VS*-tree & Query Algorithm
• Experiments
• Conclusions
Semantic Web
“Semantic Web Technologies” is a collection of standard technologies to realize a Web of Data.
RDF Data Model
[Figure labels: URI, URI, Literals]
SPARQL Queries
SPARQL Query:
SELECT ?name WHERE {
  ?m <hasName> ?name .
  ?m <BornOnDate> "1809-02-12" .
  ?m <DiedOnDate> "1865-04-15" .
}
Query Graph
Naïve Triple Store
SPARQL Query:
SELECT ?name WHERE {
  ?m <hasName> ?name .
  ?m <BornOnDate> "1809-02-12" .
  ?m <DiedOnDate> "1865-04-15" .
}
SQL:
SELECT T3.Object
FROM T AS T1, T AS T2, T AS T3
WHERE T1.Predicate = "BornOnDate" AND T1.Object = "1809-02-12"
  AND T2.Predicate = "DiedOnDate" AND T2.Object = "1865-04-15"
  AND T3.Predicate = "hasName"
  AND T1.Subject = T2.Subject AND T2.Subject = T3.Subject
Too many Self-Joins
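To make the self-join cost concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table T follows the slide; the column name Predicate, the subject identifiers, and the sample rows are illustrative assumptions, not data from the original system. Note that each triple pattern in the query adds one more alias of T to the join.

```python
import sqlite3

# In-memory triple table; one row per RDF triple (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Subject TEXT, Predicate TEXT, Object TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        # Hypothetical subject identifiers, for illustration only.
        ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
        ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
        ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
        ("ex:Someone_Else", "hasName", "Someone Else"),
        ("ex:Someone_Else", "BornOnDate", "1900-01-01"),
    ],
)

# Three triple patterns become three aliases of T joined on Subject.
QUERY = """
SELECT T3.Object
FROM T AS T1, T AS T2, T AS T3
WHERE T1.Predicate = 'BornOnDate' AND T1.Object = '1809-02-12'
  AND T2.Predicate = 'DiedOnDate' AND T2.Object = '1865-04-15'
  AND T3.Predicate = 'hasName'
  AND T1.Subject = T2.Subject AND T2.Subject = T3.Subject
"""

names = [row[0] for row in conn.execute(QUERY)]
print(names)
```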
Existing Solutions
Three categories of solutions have been proposed to speed up query processing:
1. Property Table: Jena [K. Wilkinson et al. SWDB 03], …
2. Vertically Partitioned Solution: SW-store [D. J. Abadi et al. VLDB 07], …
3. Exhaustive Indexing: RDF-3x [T. Neumann et al. VLDB 08], Hexastore [C. Weiss et al. VLDB 08], …
Existing Solutions: Property Table
SPARQL Query:
SELECT ?name WHERE {
  ?m <hasName> ?name .
  ?m <BornOnDate> "1809-02-12" .
  ?m <DiedOnDate> "1865-04-15" .
}
SQL:
SELECT People.hasName
FROM People
WHERE People.BornOnDate = "1809-02-12" AND People.DiedOnDate = "1865-04-15"
Reducing # of join steps
Existing Solutions: Exhaustive Indexing
Each triple pattern in a SPARQL query can be translated into one "range query".
SPARQL Query:
SELECT ?name WHERE {
  ?m <hasName> ?name .
  ?m <BornOnDate> "1809-02-12" .
  ?m <DiedOnDate> "1865-04-15" .
}
Range query & Merge Join
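The following is a minimal sketch of the "range query plus merge join" idea, not the actual RDF-3x or Hexastore implementation. It assumes two sorted permutations of the triples (predicate-object-subject and predicate-subject-object), hypothetical subject identifiers, and illustrative sample rows. Each triple pattern with a bound prefix maps to one contiguous range of a sorted index, and the sorted subject lists are then intersected with a merge join.

```python
from bisect import bisect_left

# Illustrative triples (subject, predicate, object); identifiers are hypothetical.
triples = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("ex:Someone_Else", "hasName", "Someone Else"),
    ("ex:Someone_Else", "BornOnDate", "1809-02-12"),
]

# Two of the possible sorted permutations of the triple components.
pos_index = sorted((p, o, s) for s, p, o in triples)
pso_index = sorted((p, s, o) for s, p, o in triples)


def prefix_scan(index, prefix):
    """Return all entries of a sorted tuple index that start with `prefix`."""
    k = len(prefix)
    start = bisect_left(index, prefix)  # first entry >= prefix
    matches = []
    for entry in index[start:]:
        if entry[:k] != prefix:
            break  # left the contiguous range
        matches.append(entry)
    return matches


def merge_join(left, right):
    """Intersect two sorted lists in a single linear pass."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


born = [s for _, _, s in prefix_scan(pos_index, ("BornOnDate", "1809-02-12"))]
died = [s for _, _, s in prefix_scan(pos_index, ("DiedOnDate", "1865-04-15"))]
people = merge_join(born, died)
names = [o for s in people for _, _, o in prefix_scan(pso_index, ("hasName", s))]
print(names)
```

Because each range comes back already sorted on the join key, the join needs no extra sorting or hashing step.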
Encoding Technique (1)
• |eSig(e).e| = M.
• We employ m different string hash functions H_i (i = 1, ..., m).
• For each hash function H_i, we set the (H_i(eLabel) MOD M)-th bit in eSig(e).e to '1'.
• Encoding eSig(e).n is the same:
  – |eSig(e).n| = N
  – n different hash functions
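The bullets above can be sketched as follows. This is an illustration under stated assumptions, not the gStore implementation: the salted SHA-256 helper stands in for the unspecified hash functions H_i, the widths M = 12 and N = 16 and the counts m = n = 2 are chosen for illustration, the neighbor label is split into 3-grams before hashing, and the two parts are concatenated into one bit string.

```python
import hashlib


def hash_i(text: str, i: int) -> int:
    """Hypothetical i-th string hash function (stand-in for H_i)."""
    digest = hashlib.sha256(f"{i}:{text}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def encode(items, num_hashes: int, width: int) -> int:
    """Set the (H_i(item) MOD width)-th bit for every item and hash function."""
    sig = 0
    for item in items:
        for i in range(num_hashes):
            sig |= 1 << (hash_i(item, i) % width)
    return sig


def ngrams(text: str, size: int = 3):
    """Split a label into overlapping n-grams, e.g. 'Abr', 'bra', 'rah'."""
    return [text[k:k + size] for k in range(len(text) - size + 1)] or [text]


def edge_signature(edge_label: str, neighbor_label: str,
                   M: int = 12, N: int = 16, m: int = 2, n: int = 2) -> int:
    """Concatenate the M-bit edge-label part and the N-bit neighbor-label part."""
    e_part = encode([edge_label], m, M)
    n_part = encode(ngrams(neighbor_label), n, N)
    return (e_part << N) | n_part


sig = edge_signature("hasName", "Abraham Lincoln")
print(format(sig, "028b"))
```

The exact bits depend on the chosen hash functions, so the output will not match the bit strings shown on the slides.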
Encoding Technique (2)
Neighbor label encoding for the edge (hasName, "Abraham Lincoln"):
3-grams: "Abr", "bra", "rah", "aha", …
Bit strings set by the 3-gram hashes:
0000 0010 0000 0000
1000 0000 0000 0000
0000 0000 0100 0000
0000 0000 0000 0001
OR: 1000 0010 0100 0001

Edge signatures:
(hasName, "Abraham Lincoln"): 0010 0000 0000 1000 0010 0100 0001
(BornOnDate, "1809-02-12"): 0100 0000 0000 0100 0010 0100 1000
(DiedOnDate, "1865-04-15"): 0000 1000 0000 0000 0010 0100 0000
(DiedIn, "y:Washington_D.c"): 0000 0010 0000 1000 0010 0100 0001
OR (vertex signature): 0110 1010 0000 1100 0010 0100 1001
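The bit strings on this slide can be checked with a few lines of Python. The sketch below assumes that the 12-bit string 0010 0000 0000 is the edge-label part for hasName, that it is concatenated with the 16-bit OR of the 3-gram bit strings, and that the final string is the bitwise OR of all four edge signatures. The helper names are hypothetical.

```python
def parse_bits(text: str) -> int:
    """Parse a space-grouped bit string such as '0010 0000' into an integer."""
    return int(text.replace(" ", ""), 2)


def format_bits(value: int, width: int) -> str:
    """Format an integer as a bit string in groups of four."""
    bits = format(value, f"0{width}b")
    return " ".join(bits[k:k + 4] for k in range(0, width, 4))


# OR of the 3-gram bit strings gives the 16-bit neighbor-label part.
n_part = 0
for bits in ["0000 0010 0000 0000", "1000 0000 0000 0000",
             "0000 0000 0100 0000", "0000 0000 0000 0001"]:
    n_part |= parse_bits(bits)

# Concatenate the 12-bit edge-label part with the 16-bit neighbor-label part.
has_name = (parse_bits("0010 0000 0000") << 16) | n_part

edge_signatures = [
    has_name,
    parse_bits("0100 0000 0000 0100 0010 0100 1000"),  # BornOnDate
    parse_bits("0000 1000 0000 0000 0010 0100 0000"),  # DiedOnDate
    parse_bits("0000 0010 0000 1000 0010 0100 0001"),  # DiedIn
]

# The vertex signature is the OR of the signatures of its adjacent edges.
vertex_signature = 0
for sig in edge_signatures:
    vertex_signature |= sig

print(format_bits(n_part, 16))
print(format_bits(vertex_signature, 28))
```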
VS*-Tree Insert
• The criterion in the VS-tree depends only on the Hamming distance between the signatures of u and the node in the VS-tree.
• The criterion in the VS*-tree depends on both node signatures and G*'s structure.
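As a rough illustration of the first criterion only, the sketch below picks the child node whose signature has the smallest Hamming distance to the signature of the vertex u being inserted. The function names and the integer representation of signatures are assumptions, and the structure-aware VS*-tree criterion is not covered because the slides do not spell it out.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def choose_child(child_signatures, u_signature: int) -> int:
    """Return the index of the child closest to u in Hamming distance."""
    return min(
        range(len(child_signatures)),
        key=lambda i: hamming(child_signatures[i], u_signature),
    )


children = [0b1100, 0b0011, 0b1111]
best = choose_child(children, 0b0111)
print(best)
```

Ties are resolved in favor of the first child, which is an arbitrary choice made for this sketch.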
VS*-Tree split
• The B+1 entries of the node are partitioned into two new nodes, where B is the maximal fanout of a node in the VS*-tree.
• 1. We find the two entries with the maximal Hamming distance between them and use them as the two seed nodes.
• 2. We associate each remaining entry with the nearest seed node, according to Equation 1.
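The two split steps can be sketched as follows. Equation 1 is not reproduced in these slides, so this sketch substitutes plain Hamming distance between signatures as the "nearest seed" measure. That substitution, the function names, and the sample signatures are assumptions made for illustration.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def split_entries(signatures):
    """Partition entry indices into two groups around two seed entries."""
    # Step 1: the pair with the maximal Hamming distance becomes the seeds.
    seed_a, seed_b = max(
        combinations(range(len(signatures)), 2),
        key=lambda pair: hamming(signatures[pair[0]], signatures[pair[1]]),
    )
    group_a, group_b = [seed_a], [seed_b]
    # Step 2: attach every remaining entry to its nearest seed.
    for i, sig in enumerate(signatures):
        if i in (seed_a, seed_b):
            continue
        if hamming(sig, signatures[seed_a]) <= hamming(sig, signatures[seed_b]):
            group_a.append(i)
        else:
            group_b.append(i)
    return group_a, group_b


# B + 1 = 5 overflowing entries (illustrative 4-bit signatures).
entries = [0b0000, 0b0001, 0b1111, 0b0111, 0b0010]
group_a, group_b = split_entries(entries)
print(group_a, group_b)
```

This sketch does not enforce the minimal fanout b on either group; a fuller version would need to rebalance when one group ends up too small.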
VS*-Tree deletion
• Similar to split.
• If some node d has fewer than b entries, where b is the minimal fanout of a node in the VS*-tree, then d is deleted and its entries are reinserted into the VS*-tree.
Finding Valid Child States
• We propose a DFS strategy to find all valid child states of J.
• Start a DFS over G* beginning from some vertex v_i.
Conclusions
• Vertex Encoding Technique;
• An Efficient index Structure: VS-tree;
• A Novel Filtering Technique.