Hexastore: Sextuple Indexing for Semantic Web Data Management

Presented by Cathrin Weiss, Panagiotis Karras, Abraham Bernstein
Department of Informatics, University of Zurich
Session: Indexing and Query Processing, VLDB 2008

2010-01-22
Summarized by Jaeseok Myung
Intelligent Database Systems Lab
School of Computer Science & Engineering
Seoul National University, Seoul, Korea


Copyright 2010 by CEBT

Overview

Hexastore – Sextuple Indexing

A Triple (S, P, O) can be represented in six ways (3! = 6)

– SPO, SOP, PSO, POS, OSP, OPS

Every possible indexing scheme can be materialized

– Allows quick and scalable query processing

– Up to five times bigger index space is needed
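As a small illustration of the 3! = 6 orderings listed above, the following Python sketch enumerates them. The variable name `orderings` is chosen only for this example.

```python
from itertools import permutations

# Enumerate every ordering of the three triple elements S, P and O.
orderings = ["".join(p) for p in permutations("SPO")]
print(orderings)
```

Each ordering names one index that a Hexastore materializes.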

In this presentation,

Review conventional RDF storage structures

Introduction to Hexastore

Discussion

Center for E-Business Technology, IDS Lab. Seminar


Physical Designs for RDF Storage (1/4)

Giant Triples Table


SELECT ?title
WHERE {
  ?book <title> ?title .
  ?book <author> <Fox, Joe> .
  ?book <copyright> <2001>
}

– Two self-joins are needed (Join! Join!)

– Entire table scan!

– Redundancy!
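To make the cost of the giant triples table concrete, here is a minimal Python sketch that evaluates the query above over an in-memory list of triples. The sample rows and the helper `scan` are hypothetical and exist only for illustration; a real store would use a relational engine.

```python
# Hypothetical sample data: one giant table of (subject, property, object) rows.
triples = [
    ("book1", "title", "XYZ"),
    ("book1", "author", "Fox, Joe"),
    ("book1", "copyright", "2001"),
    ("book2", "title", "ABC"),
    ("book2", "author", "Orr, Tim"),
    ("book2", "copyright", "1985"),
]

def scan(prop, obj=None):
    """Scan the entire table for one triple pattern; return (subject, object) pairs."""
    return [(s, o) for (s, p, o) in triples
            if p == prop and (obj is None or o == obj)]

# One full scan per triple pattern, then two joins on the shared ?book variable.
titles = scan("title")
by_author = {s for s, _ in scan("author", "Fox, Joe")}
by_year = {s for s, _ in scan("copyright", "2001")}
result = [title for book, title in titles
          if book in by_author and book in by_year]
print(result)
```

Every pattern touches all rows, which is the behaviour the slide flags as a problem.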


Physical Designs for RDF Storage (2/4)

Clustered Property Table

Contains clusters of properties that tend to be defined together


Physical Designs for RDF Storage (3/4)

Property-Class Table

Exploits the type property of subjects to cluster similar sets of subjects together in the same table

Unlike in a clustered property table, a property may exist in multiple property-class tables


[Figure label: values of the type property]


Physical Designs for RDF Storage (4/4)

Vertically Partitioned Table

The giant table is rewritten into n two-column tables, where n is the number of unique properties in the data

We don’t have to

– Maintain null values

– Choose a clustering algorithm
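The rewrite into n two-column tables can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `vertically_partition` and the third sample triple are assumptions, not taken from the slides.

```python
from collections import defaultdict

# Sample triples; the last row is a hypothetical addition for illustration.
triples = [
    ("object214", "hasColor", "blue"),
    ("object214", "belongsTo", "object352"),
    ("object352", "hasColor", "red"),
]

def vertically_partition(rows):
    """Split (s, p, o) rows into one (subject, object) table per property."""
    tables = defaultdict(list)
    for s, p, o in rows:
        tables[p].append((s, o))
    # Keep each table sorted by subject so that subject joins can be merge joins.
    for pairs in tables.values():
        pairs.sort()
    return dict(tables)

tables = vertically_partition(triples)
print(tables)
```

A subject simply has no row in the tables of properties it lacks, so no NULL values arise.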


[Figure labels: subject, property, object]


Motivation

The problem of non-property-bound queries
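A minimal Python sketch of why a property-oriented layout struggles when the property is unbound: to answer a pattern such as (object214, ?p, ?o), every property table has to be visited. The tables and the helper `properties_of` below are hypothetical.

```python
# Hypothetical vertically partitioned store: one (subject, object) table per property.
tables = {
    "hasColor": [("object214", "blue"), ("object352", "red")],
    "belongsTo": [("object214", "object352")],
    "madeOf": [("object352", "steel")],
}

def properties_of(subject):
    """Answer (subject, ?p, ?o); the property is unbound, so all tables are visited."""
    visited = 0
    answers = []
    for prop, pairs in tables.items():
        visited += 1
        answers.extend((prop, o) for s, o in pairs if s == subject)
    return answers, visited

answers, visited = properties_of("object214")
print(answers, visited)
```

The number of tables visited grows with the number of properties, regardless of how few of them mention the subject.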


Hexastore: Sextuple Indexing


[Figure: index structure over the S, P, and O elements of a triple]


Hexastore: Sextuple Indexing
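The six indices can be sketched in Python as nested dictionaries: a header key leads to a vector of second keys, and each of those leads to a sorted terminal list. This is a simplified sketch under stated assumptions: `build_hexastore` is a hypothetical name, the input triples are assumed distinct, and for brevity the terminal lists are not shared between paired indices.

```python
from collections import defaultdict
from itertools import permutations

def build_hexastore(triples):
    """Build six nested indices: order -> header key -> vector key -> sorted list."""
    store = {"".join(order): defaultdict(lambda: defaultdict(list))
             for order in permutations("spo")}
    for s, p, o in triples:
        value = {"s": s, "p": p, "o": o}
        for order, index in store.items():
            a, b, c = (value[ch] for ch in order)
            index[a][b].append(c)
    # Sort every terminal list so that later joins can be merge joins.
    for index in store.values():
        for vectors in index.values():
            for terminal in vectors.values():
                terminal.sort()
    return store

# Sample triples; the third row is a hypothetical addition for illustration.
triples = [
    ("object214", "hasColor", "blue"),
    ("object214", "belongsTo", "object352"),
    ("object352", "hasColor", "blue"),
]
store = build_hexastore(triples)
print(store["pos"]["hasColor"]["blue"])
```

Any triple pattern with one or two bound elements can then be answered by a direct lookup in the index whose ordering starts with the bound elements.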


Five-fold Increase in Index Space

Sharing The Same Terminal Lists

SPO-PSO, SOP-OSP, POS-OPS

The key of each of the three resources in a triple appears in two headers and two vectors, but only in one list
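The counting argument above (two headers, two vectors, one shared list per resource key, so five entries where a triples table holds one) can be checked with a short Python sketch. The function `count_key_entries` is hypothetical, and the sample triples are chosen so that no key is shared, which is the worst case.

```python
from itertools import permutations

def count_key_entries(triples):
    """Count header, vector and terminal-list entries, storing shared lists once."""
    headers, vectors, terminals = set(), set(), set()
    for s, p, o in triples:
        value = {"s": s, "p": p, "o": o}
        for a, b, c in permutations("spo"):
            index = a + b + c
            headers.add((index, value[a]))
            vectors.add((index, value[a], value[b]))
            # spo/pso, sop/osp and pos/ops share a list, so identify the list
            # by the unordered pair of its two leading keys.
            pair = frozenset([(a, value[a]), (b, value[b])])
            terminals.add((pair, value[c]))
    return len(headers), len(vectors), len(terminals)

# Worst case: two triples that share no resource.
triples = [("s1", "p1", "o1"), ("s2", "p2", "o2")]
counts = count_key_entries(triples)
ratio = sum(counts) / (3 * len(triples))
print(counts, ratio)
```

When triples share resources, header and vector entries are reused, so the actual overhead stays below this bound.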


Mapping Dictionary

Replace all strings (URIs and literals) with unique IDs using a mapping dictionary

Mapping dictionary compresses the triple store

– Reduces redundancy and saves a lot of physical space

We can concentrate on a logical index structure rather than the physical storage design


Original triples

S          P          O
object214  hasColor   blue
object214  belongsTo  object352
…          …          …

ID-encoded triples

S  P  O
0  1  2
0  3  4
…  …  …

Mapping dictionary

ID  Value
0   object214
1   hasColor
…   …
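The substitution shown in the tables above can be sketched in Python as follows. The function name `encode` is an assumption; IDs are handed out in order of first appearance, which reproduces the numbering in the example.

```python
def encode(triples):
    """Replace every string by an integer ID, assigned in order of first appearance."""
    ids = {}

    def key(value):
        # setdefault returns the existing ID, or stores and returns the next free one.
        return ids.setdefault(value, len(ids))

    encoded = [(key(s), key(p), key(o)) for s, p, o in triples]
    return encoded, ids

triples = [
    ("object214", "hasColor", "blue"),
    ("object214", "belongsTo", "object352"),
]
encoded, ids = encode(triples)
print(encoded)
print(ids)
```

Inverting `ids` gives the ID-to-value table needed to decode query results.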


Clustered B+-Tree (RDF-3X, VLDB 2008)

Store everything in a clustered B+-Tree

Triples are sorted in lexicographical order

– Allows SPARQL patterns to be converted into range scans

No entire table scan is needed
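How a triple pattern becomes a range scan can be sketched in Python with a sorted list standing in for the leaves of a clustered B+-tree. This is an assumption-laden simplification, not the RDF-3X implementation; the function `range_scan` and the third sample triple are hypothetical.

```python
from bisect import bisect_left

# ID-encoded triples in lexicographic (S, P, O) order; the last row is hypothetical.
spo = sorted([(0, 1, 2), (0, 3, 4), (4, 1, 5)])

def range_scan(sorted_triples, prefix):
    """Return all triples that start with `prefix`, located by binary search."""
    start = bisect_left(sorted_triples, prefix)
    matches = []
    for triple in sorted_triples[start:]:
        if triple[:len(prefix)] != prefix:
            break  # sorted order: no later triple can match the prefix
        matches.append(triple)
    return matches

print(range_scan(spo, (0,)))    # pattern (0, ?p, ?o)
print(range_scan(spo, (0, 3)))  # pattern (0, 3, ?o)
```

Only the matching range is read, in contrast to scanning the whole triples table.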


[Figure: clustered B+-tree over the ID-encoded triples; node labels "002 …" and "000 001 002 003"]

S  P  O
0  1  2
0  3  4
…  …  …

Actually, we don’t need this table!

<Mapping Dictionary>

ID  Value
0   object214
1   hasColor
…   …


Argumentation

Concise and Efficient Handling of Multi-valued Resources

Index can contain multiple items

cf. Multi-valued Property Table

Avoidance of NULLs

Only those RDF elements that are relevant to a particular other element need to be stored in a particular index

No ad-hoc Choices Needed

Most other RDF data storage schemes require several ad-hoc decisions about their data representation architecture

– e.g., Clustered Property Table (which properties should be stored together)


Argumentation

Reduced I/O cost

Other RDF storage schemes may need to access multiple tables which are irrelevant to a query

– Queries that are not bound by property

All First-step Pairwise Joins are Fast Merge-Joins

The keys of resources in all vectors and lists used in a Hexastore are sorted

Reduction of Unions and Joins

e.g., a list of subjects related to two particular objects through any property

– Hexastore can use osp index
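The example of subjects related to two particular objects can be sketched in Python as a merge join over two sorted subject vectors taken from the osp index. The index contents below are hypothetical, and the per-subject property lists are omitted for brevity.

```python
def merge_join(left, right):
    """Intersect two sorted key lists in a single linear pass."""
    i = j = 0
    matches = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            matches.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return matches

# Hypothetical osp index, reduced to: object -> sorted vector of subjects.
osp = {
    "blue": ["object214", "object352", "object400"],
    "steel": ["object352", "object400", "object999"],
}
subjects = merge_join(osp["blue"], osp["steel"])
print(subjects)
```

Because both vectors are already sorted, no sorting step and no union over property tables is needed.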


Treating the Path Expression Problem

SELECT B.subj
FROM triples AS A, triples AS B
WHERE A.prop = 'wasBorn'
  AND A.obj = '1860'
  AND A.subj = B.obj
  AND B.prop = 'Author'

A path expression requires (n-1) subject-object self-joins where n is the length of the path

Vertical Partitioning

– Materialized Path Expressions (A.author:wasBorn = ‘1860’)

– $\binom{n-1}{2} = O(n^2)$ possible additional properties

Hexastore

– (n-1) merge-joins using pso and pos indices
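For the path of length two in the query above, the single subject-object join can be sketched in Python as a merge join between two sorted key lists drawn from a pos index. The index contents and resource names are hypothetical, and the sketch uses only the pos index because the object '1860' is bound.

```python
def merge_join(left, right):
    """Intersect two sorted key lists in a single linear pass."""
    i = j = 0
    matches = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            matches.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return matches

# Hypothetical pos index: property -> object -> sorted list of subjects.
pos = {
    "wasBorn": {"1860": ["authorA", "authorC"]},
    "Author": {
        "authorA": ["book1", "book4"],
        "authorB": ["book2"],
        "authorC": ["book3"],
    },
}

born_1860 = pos["wasBorn"]["1860"]     # sorted subjects with wasBorn = 1860
author_objects = sorted(pos["Author"])  # sorted object keys of the Author property
matched = merge_join(born_1860, author_objects)
books = [book for author in matched for book in pos["Author"][author]]
print(books)
```

A longer path repeats this step once per additional hop, giving the (n-1) merge joins mentioned above.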


Experimental Evaluation

Setup

2.8GHz dual core, 16GB RAM

Competitors

Column-oriented Vertical Partitioning Approaches

– COVP1 – PSO Index

– COVP2 – PSO Index + POS Index (second copy)

Hexastore

– SPO, SOP, PSO, POS, OSP, OPS

Datasets

Barton, MIT library data, 61 mil. triples, 258 properties

LUBM, a synthetic benchmark data set (10 universities), 6.8 mil. triples, 18 predicates


Performance (Barton Data)


Performance (LUBM, 10)


Memory Usage

In practice, Hexastore requires a four-fold increase in memory in comparison to COVP1, which is an affordable cost for the derived advantages


Conclusion

Hexastore: Sextuple-Indexing Scheme

Worst-case five-fold storage increase in comparison to a conventional triples table

Quick and scalable general-purpose query processing

– All pairwise joins in a Hexastore can be rendered as merge joins

My Question

Main-memory Indexing (Is it possible?)

– 7GB RAM for 6 mil. triples

Other Options?
