
Storage Engine for Semantic Web

Page 1: Storage Engine for Semantic Web

Storage Engine for Semantic Web

Page 2: Storage Engine for Semantic Web

Assertion

A storage engine for the Semantic Web has requirements similar to those of e-commerce applications.

Draw upon results and lessons from:

– R. Agrawal, A. Somani, Y. Xu: Storage and Querying of E-Commerce Data. VLDB-2001.

Page 3: Storage Engine for Semantic Web

Typical E-Commerce Data Characteristics

An Experimental E-marketplace for Computer Components

– Nearly 2 million components

– More than 2000 leaf-level categories

– Large number of attributes (5000)

– Constantly evolving schema

– Sparsely populated data (about 50-100 attributes/component)

Page 4: Storage Engine for Semantic Web

Alternative Physical Representations

Horizontal
– One N-ary relation

Binary
– N 2-ary relations

Vertical
– One 3-ary relation
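To make the three layouts concrete, here is a minimal sketch that stores the same two sparse objects in each representation. The sample values are a subset of the DVD-player example used later in the deck; the variable names (`products`, `horizontal`, `binary`, `vertical`) are illustrative assumptions, not part of the original work.

```python
# Illustrative sample: two sparsely described products (subset of the deck's example).
products = {
    0: {"Name": "PAN DVD-L75", "Monitor": "7 inch", "Output": "Digital"},
    1: {"Name": "KLH DVD221", "Height": "3.75", "Output": "S-Video"},
}
attributes = ["Name", "Monitor", "Height", "Output"]

# Horizontal: one N-ary relation; every missing attribute becomes None (NULL).
horizontal = [
    (oid,) + tuple(attrs.get(a) for a in attributes)
    for oid, attrs in sorted(products.items())
]

# Binary: N 2-ary relations, one (oid, value) relation per attribute.
binary = {
    a: [(oid, attrs[a]) for oid, attrs in sorted(products.items()) if a in attrs]
    for a in attributes
}

# Vertical: one 3-ary relation of (oid, key, value) triples; no NULLs are stored.
vertical = [
    (oid, key, val)
    for oid, attrs in sorted(products.items())
    for key, val in attrs.items()
]
```

Only the horizontal layout has to store nulls; the binary and vertical layouts simply omit absent attributes.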

Page 5: Storage Engine for Semantic Web

Conventional horizontal representation (n-ary relation)

Name | Monitor | Height | Recharge | Output | playback | Smooth scan | Progressive Scan
PAN DVD-L75 | 7 inch | - | Built-in | Digital | - | - | -
KLH DVD221 | - | 3.75 | - | S-Video | - | - | No
SONY S-7000 | - | - | - | - | - | - | -
SONY S-560D | - | - | - | - | Cinema Sound | Yes | -
… | … | … | … | … | … | … | …

DB catalogs do not support thousands of columns (DB2/Oracle limit: 1012 columns)

Storage overhead of NULL values
– Nulls increase the index size and they sort high in a DB2 B+ tree index

Hard to load/update

Schema evolution is expensive

Querying is straightforward

Page 6: Storage Engine for Semantic Web

Binary Representation (N 2-ary relations)

Dense representation

Manageability is hard because of the large number of tables

Schema evolution is expensive

Decomposition Storage Model [Copeland et al SIGMOD 85], [Khoshafian et al ICDE 87]

Monet: Binary Attribute Tables [Boncz et al VLDB Journal 99]

Attribute Approach for storing XML Data [Florescu et al INRIA Tech Report 99]

Monitor
Name | Val
PAN DVD-L75 | 7 inch

Height
Name | Val
KLH DVD221 | 3.75

Output
Name | Val
PAN DVD-L75 | Digital
KLH DVD221 | S-Video

Page 7: Storage Engine for Semantic Web

Vertical representation (One 3-ary relation)

Oid (object identifier), Key (attribute name), Val (attribute value)

Objects can have a large number of attributes

Handles sparseness well

Schema evolution is easy

Oid | Key | Val
0 | 'Name' | 'PAN DVD-L75'
0 | 'Monitor' | '7 inch'
0 | 'Recharge' | 'Built-in'
0 | 'Output' | 'Digital'
1 | 'Name' | 'KLH DVD221'
1 | 'Height' | '3.75'
1 | 'Output' | 'S-Video'
1 | 'Progressive Scan' | 'No'
2 | 'Name' | 'SONY S-7000'
… | … | …

Implementation of SchemaSQL [LSS 99]

Edge Approach for storing XML Data [FK 99]

Page 8: Storage Engine for Semantic Web

Querying over Vertical Representation is Complex

A simple query on a horizontal scheme:

    SELECT Monitor
    FROM H
    WHERE Output = 'Digital'

becomes quite complex:

    SELECT v1.Val
    FROM vtable v1, vtable v2
    WHERE v1.Key = 'Monitor'
      AND v2.Key = 'Output'
      AND v2.Val = 'Digital'
      AND v1.Oid = v2.Oid

Writing applications becomes much harder. What can we do?
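The two queries can be checked against each other on a small data set. The sketch below is an illustration only: it assumes SQLite (through Python's standard `sqlite3` module) as a stand-in database, and the sample rows are a subset of the deck's DVD-player example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Horizontal table H: one column per attribute, NULL where a value is absent.
    CREATE TABLE H (Oid INTEGER PRIMARY KEY, Name TEXT, Monitor TEXT, Output TEXT);
    INSERT INTO H VALUES (0, 'PAN DVD-L75', '7 inch', 'Digital'),
                         (1, 'KLH DVD221',  NULL,     'S-Video');

    -- Vertical table: one (Oid, Key, Val) triple per non-null attribute value.
    CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT);
    INSERT INTO vtable VALUES
        (0, 'Name', 'PAN DVD-L75'), (0, 'Monitor', '7 inch'), (0, 'Output', 'Digital'),
        (1, 'Name', 'KLH DVD221'),  (1, 'Output', 'S-Video');
""")

horizontal_sql = "SELECT Monitor FROM H WHERE Output = 'Digital'"

# Same question on the vertical table: one self-join per attribute involved.
vertical_sql = """
    SELECT v1.Val
    FROM vtable v1, vtable v2
    WHERE v1.Key = 'Monitor'
      AND v2.Key = 'Output'
      AND v2.Val = 'Digital'
      AND v1.Oid = v2.Oid
"""

horizontal_rows = conn.execute(horizontal_sql).fetchall()
vertical_rows = conn.execute(vertical_sql).fetchall()
print(horizontal_rows, vertical_rows)
```

One difference to keep in mind: the self-join drops an object that matches the filter but has no 'Monitor' triple, whereas the horizontal query would return a NULL for it. The sample data contains no such object, so both queries return the same rows here.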

Page 9: Storage Engine for Semantic Web

Solution

Provide a horizontal view of the vertical table

Translation layer automatically maps operations on H to operations on V

[Diagram: Horizontal view (H) with columns Attr1, Attr2, …, Attrk, … sits on top of the Query Mapping Layer, which sits on top of the Vertical table (V) with columns Oid, Key, Val]
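A very small mapping layer can be sketched as a function that rewrites a select-project request on H into SQL over V. This is a hypothetical illustration, not the translation layer from the paper: the function name `translate`, its parameters, and the use of SQLite are all assumptions, and it covers only projections with equality filters.

```python
import sqlite3

def translate(select_attrs, where_eq):
    """Map 'SELECT <select_attrs> FROM H WHERE <attr = value, ...>' onto vtable.

    Hypothetical helper: returns (sql, params), using one vtable alias per
    referenced attribute. select_attrs must be non-empty.
    """
    aliases, conds, params = [], [], []
    for attr in select_attrs:
        alias = f"s{len(aliases)}"
        aliases.append(alias)
        conds.append(f"{alias}.Key = ?")
        params.append(attr)
    select_cols = [f"{a}.Val" for a in aliases]
    for attr, value in where_eq.items():
        alias = f"w{len(aliases)}"
        aliases.append(alias)
        conds.append(f"{alias}.Key = ? AND {alias}.Val = ?")
        params.extend([attr, value])
    # Every alias must describe the same object.
    conds += [f"{aliases[0]}.Oid = {a}.Oid" for a in aliases[1:]]
    sql = (
        f"SELECT {', '.join(select_cols)} "
        f"FROM {', '.join('vtable ' + a for a in aliases)} "
        f"WHERE {' AND '.join(conds)}"
    )
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT)")
conn.executemany("INSERT INTO vtable VALUES (?, ?, ?)", [
    (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"),
    (0, "Recharge", "Built-in"), (0, "Output", "Digital"),
    (1, "Name", "KLH DVD221"), (1, "Height", "3.75"),
    (1, "Output", "S-Video"), (1, "Progressive Scan", "No"),
])

monitor_rows = conn.execute(*translate(["Monitor"], {"Output": "Digital"})).fetchall()
klh_rows = conn.execute(*translate(["Name", "Output"], {"Height": "3.75"})).fetchall()
print(monitor_rows, klh_rows)
```

Because the sketch uses inner joins, objects lacking a projected attribute are dropped rather than returned with nulls; a fuller mapping layer would need outer joins to match horizontal semantics.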

Page 10: Storage Engine for Semantic Web

Transformation Algebra

Defined an algebra for transforming expressions over horizontal views into expressions over the vertical representation.

Two key operators:

– v2h

– h2v

Page 11: Storage Engine for Semantic Web

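Sample Algebraic Transforms

v2h operation: convert from vertical to horizontal

\[
\mathrm{v2h}_k(V) = \left[\pi_{Oid}(V)\right] ⟕ \left[\, ⟕_{i=1}^{k}\; \pi_{Oid,\,Val}\!\left(\sigma_{Key='A_i'}(V)\right) \right]
\]

where ⟕ is the left outer join on Oid.

h2v operation: convert from horizontal to vertical

\[
\mathrm{h2v}_k(H) = \left[\, \bigcup_{i=1}^{k} \pi_{Oid,\,'A_i',\,A_i}\!\left(\sigma_{A_i \neq \bot}(H)\right) \right] \cup \left[\, \bigcup_{i=1}^{k} \pi_{Oid,\,'A_i',\,A_i}\!\left(\sigma_{\bigwedge_{j=1}^{k} A_j = \bot}(H)\right) \right]
\]

where \(\bot\) denotes a null value. The second term keeps objects whose attributes are all null.

Similar operations such as Unfold/Fold and Gather/Scatter exist in SchemaSQL [LSS 99] and [STA 98] respectively

Complete transforms in VLDB-2001 Paper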
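The two operators can be sketched on in-memory tuples. This is a minimal illustration under stated assumptions, not the paper's implementation: v2h is assumed to fill missing attributes with nulls (left-outer-join semantics), h2v is assumed to keep all-null objects by emitting null-valued triples, and each (Oid, Key) pair is assumed to hold at most one value. `None` stands for null.

```python
def h2v(rows, attrs):
    """Horizontal rows (oid, a1, ..., ak) -> set of (oid, key, val) triples."""
    triples = set()
    for oid, *vals in rows:
        if all(v is None for v in vals):
            # All-null object: emit null triples so its Oid is not lost.
            triples.update((oid, a, None) for a in attrs)
        else:
            # One triple per non-null attribute value.
            triples.update((oid, a, v) for a, v in zip(attrs, vals) if v is not None)
    return triples

def v2h(triples, attrs):
    """Set of (oid, key, val) triples -> horizontal rows ordered by oid."""
    lookup = {(oid, key): val for oid, key, val in triples}
    oids = sorted({oid for oid, _, _ in triples})
    # A missing (oid, attribute) pair yields None, as a left outer join would.
    return [(oid,) + tuple(lookup.get((oid, a)) for a in attrs) for oid in oids]

attrs = ["Monitor", "Output"]
H = [(0, "7 inch", "Digital"), (1, None, "S-Video"), (2, None, None)]
V = h2v(H, attrs)
round_trip = v2h(V, attrs)
```

Under these assumptions the round trip `v2h(h2v(H))` reproduces `H`, including the object whose attributes are all null.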

Page 12: Storage Engine for Semantic Web

From the Algebra to SQL

Equivalent SQL transforms for algebraic transforms

– Select, Project

– Joins (self, two verticals, a horizontal and a vertical)

– Cartesian Product

– Union, Intersection, Set difference

– Aggregation

Extend DDL to provide the Horizontal View

    CREATE HORIZONTAL VIEW hview ON VERTICAL TABLE vtable
    USING COLUMNS (Attr1, Attr2, … Attrk, …)
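`CREATE HORIZONTAL VIEW` is the deck's proposed DDL extension, so it cannot be run on a stock database. As a rough stand-in, the sketch below builds an ordinary `CREATE VIEW` with one left outer join per column. The helper name `horizontal_view_sql`, the choice of SQLite, and the sample rows are assumptions for illustration.

```python
import sqlite3

def horizontal_view_sql(view, vtable, columns):
    """Build a standard CREATE VIEW emulating a horizontal view (hypothetical helper).

    Names are interpolated into the statement because identifiers in a view
    definition cannot be bound as parameters; pass trusted names only.
    """
    select = ", ".join(f'a{i}.Val AS "{col}"' for i, col in enumerate(columns))
    joins = "\n".join(
        f"LEFT JOIN {vtable} a{i} ON a{i}.Oid = o.Oid AND a{i}.Key = '{col}'"
        for i, col in enumerate(columns)
    )
    return (
        f"CREATE VIEW {view} AS\n"
        f"SELECT o.Oid, {select}\n"
        f"FROM (SELECT DISTINCT Oid FROM {vtable}) o\n"
        f"{joins}"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT)")
conn.executemany("INSERT INTO vtable VALUES (?, ?, ?)", [
    (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"), (0, "Output", "Digital"),
    (1, "Name", "KLH DVD221"), (1, "Height", "3.75"), (1, "Output", "S-Video"),
])
conn.execute(horizontal_view_sql("hview", "vtable", ["Monitor", "Height", "Output"]))

# The simple horizontal query now runs unchanged against the view.
digital_rows = conn.execute("SELECT Monitor FROM hview WHERE Output = 'Digital'").fetchall()
all_rows = conn.execute("SELECT * FROM hview ORDER BY Oid").fetchall()
print(digital_rows, all_rows)
```

The left outer joins give absent attributes a NULL, which matches what a horizontal table would store. Whether a view like this performs acceptably depends on the database's optimizer.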

Page 13: Storage Engine for Semantic Web

Alternative Implementation Strategies

VerticalSQL
– Uses only SQL-92 level capabilities

VerticalUDF
– Exploits User Defined Functions and Table Functions to provide a direct implementation

Binary (hand-coded queries)
– 2-ary representation with one relation per attribute (using only SQL-92 transforms)

Page 14: Storage Engine for Semantic Web

Data Organization Matters: Clustering by Key significantly outperforms clustering by Oid

[Figure: join execution time (seconds) versus join selectivity (0.1%, 1%, 5%) for VerticalSQL_oid and VerticalSQL_key; density = 10%, 1000 cols x 20K rows]

Page 15: Storage Engine for Semantic Web

VerticalSQL comparable to Binary and outperforms Horizontal

[Figure: projection of 10 columns; execution time (seconds) versus table size (#cols x #rows: 200x100K, 400x50K, 800x25K, 1000x20K) for HorizontalSQL, VerticalSQL, and Binary; density = 10%]

Page 16: Storage Engine for Semantic Web

VerticalUDF is the best approach

[Figure: projection of 10 columns; execution time (seconds) versus table size (#cols x #rows: 200x100K, 400x50K, 800x25K, 1000x20K) for VerticalUDF, VerticalSQL, and Binary; density = 10%]

Page 17: Storage Engine for Semantic Web

Summary

              | Horizontal | Vertical (w/ Mapping) | Binary (w/ Mapping)
Manageability | +          | +                     | -
Flexibility   | -          | +                     | -
Querying      | +          | +                     | +
Performance   | -          | +                     | +

Page 18: Storage Engine for Semantic Web

Remarks

Lessons of this study apply directly to building a storage engine for the Semantic Web

Performance of vertical representation can be further improved by:

– Enhanced table functions

– First class treatment of table functions

– Native support for v2h and h2v operations

– Partial indices
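The deck does not say how partial indices would be used. One possible reading, sketched below, is an index that covers only the triples of a single frequently queried attribute. The example assumes SQLite, which supports partial indexes through a `WHERE` clause on `CREATE INDEX`; the index name and sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT);
    INSERT INTO vtable VALUES
        (0, 'Name', 'PAN DVD-L75'), (0, 'Monitor', '7 inch'), (0, 'Output', 'Digital'),
        (1, 'Name', 'KLH DVD221'),  (1, 'Height', '3.75'),    (1, 'Output', 'S-Video');

    -- Partial index: indexes only the rows that belong to the 'Output' attribute.
    CREATE INDEX idx_output_val ON vtable (Val, Oid) WHERE Key = 'Output';
""")

# A lookup whose WHERE clause repeats the index condition is eligible to use it.
digital_oids = conn.execute(
    "SELECT Oid FROM vtable WHERE Key = 'Output' AND Val = 'Digital'"
).fetchall()

index_names = [
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    )
]
print(digital_oids, index_names)
```

Such an index stays small because it skips every triple belonging to other attributes. Whether the planner actually picks it for a given query is up to the database.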