Storage Engine for Semantic Web
Assertion
Storage engine for semantic web has requirements similar to those for e-commerce applications.
Draw upon results and lessons from
– R. Agrawal, A. Somani, Y. Xu: Storage and Retrieval of E-Commerce Data. VLDB-2001.
Typical E-Commerce Data Characteristics
An Experimental E-marketplace for Computer components
Nearly 2 Million components
More than 2000 leaf-level categories
Large number of Attributes (5000)
Constantly evolving schema
Sparsely populated data (about 50-100 attributes/component)
Alternative Physical Representations
Horizontal
– One N-ary relation
Binary
– N 2-ary relations
Vertical
– One 3-ary relation
Conventional horizontal representation (n-ary relation)
Name         Monitor  Height  Recharge  Output   playback      Smooth scan  Progressive Scan
PAN DVD-L75  7 inch   -       Built-in  Digital  -             -            -
KLH DVD221   -        3.75    -         S-Video  -             -            No
SONY S-7000  -        -       -         -        -             -            -
SONY S-560D  -        -       -         -        Cinema Sound  Yes          -
…            …        …       …         …        …             …            …
DB Catalogs do not support thousands of columns (DB2/Oracle limit: 1012 columns)
Storage overhead of NULL values
Nulls increase the index size and they sort high in DB2 B+ tree index
Hard to load/update
Schema evolution is expensive
Querying is straightforward
Binary Representation (N 2-ary relations)
Dense representation
Manageability is hard because of large number of tables
Schema evolution expensive
Decomposition Storage Model [Copeland et al SIGMOD 85], [Khoshafian et al ICDE 87]
Monet: Binary Attribute Tables [Boncz et al VLDB Journal 99]
Attribute Approach for storing XML Data [Florescu et al INRIA Tech Report 99]
Monitor
  Name         Val
  PAN DVD-L75  7 inch
Height
  Name        Val
  KLH DVD221  3.75
Output
  Name         Val
  PAN DVD-L75  Digital
  KLH DVD221   S-Video
Vertical representation (One 3-ary relation)
Oid (object identifier), Key (attribute name), Val (attribute value)
Objects can have large number of attributes
Handles sparseness well
Schema evolution is easy
Oid Key Val
0 ‘Name’ ‘PAN DVD-L75’
0 ‘Monitor’ ‘7 inch’
0 ‘Recharge’ ‘Built-in’
0 ‘Output’ ‘Digital’
1 ‘Name’ ‘KLH DVD221’
1 ‘Height’ ‘3.75’
1 ‘Output’ ‘S-Video’
1 ‘Progressive Scan’ ‘No’
2 ‘Name’ ‘SONY S-7000’
… … …
Implementation of SchemaSQL [LSS 99]
Edge Approach for storing XML Data [FK 99]
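As a rough illustration of the vertical layout, the sketch below flattens sparse per-object attribute dictionaries into (Oid, Key, Val) rows. The function name `to_vertical`, the use of list position as Oid, and the `catalog` list are assumptions made for this example; the attribute values are taken from the table above.

```python
from typing import Dict, List, Tuple

Triple = Tuple[int, str, str]


def to_vertical(objects: List[Dict[str, str]]) -> List[Triple]:
    """Flatten sparse attribute dicts into (Oid, Key, Val) rows.

    Assumption for this sketch: the Oid is the object's position in the
    input list. Attributes an object does not have produce no row at all,
    which is why the vertical layout handles sparseness well.
    """
    rows: List[Triple] = []
    for oid, attrs in enumerate(objects):
        for key, val in attrs.items():
            rows.append((oid, key, val))
    return rows


# Hypothetical in-memory catalog using values from the slide's table.
catalog = [
    {"Name": "PAN DVD-L75", "Monitor": "7 inch", "Recharge": "Built-in", "Output": "Digital"},
    {"Name": "KLH DVD221", "Height": "3.75", "Output": "S-Video", "Progressive Scan": "No"},
    {"Name": "SONY S-7000"},
]

vertical_rows = to_vertical(catalog)
for row in vertical_rows:
    print(row)
```

Adding a brand-new attribute to one object only adds one more row; no column has to be added to any table.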
Querying over Vertical Representation is Complex
Simple query on a Horizontal scheme
    SELECT MONITOR FROM H WHERE OUTPUT = 'Digital'
Becomes quite complex:
    SELECT v1.Val
    FROM vtable v1, vtable v2
    WHERE v1.Key = 'Monitor' AND v2.Key = 'Output' AND v2.Val = 'Digital' AND v1.Oid = v2.Oid
Writing applications becomes much harder. What can we do?
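To make the contrast concrete, the following sketch loads a two-object sample into an in-memory SQLite database and runs both forms of the query. SQLite, the reduced column list of `H`, and the sample rows are assumptions chosen for illustration; the slides do not name a particular setup for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Horizontal table H: one column per attribute, NULL where absent.
    CREATE TABLE H (Name TEXT, Monitor TEXT, Height TEXT, Output TEXT);
    INSERT INTO H VALUES ('PAN DVD-L75', '7 inch', NULL, 'Digital');
    INSERT INTO H VALUES ('KLH DVD221', NULL, '3.75', 'S-Video');

    -- Vertical table: one (Oid, Key, Val) row per non-null attribute.
    CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT);
    INSERT INTO vtable VALUES
        (0, 'Name', 'PAN DVD-L75'), (0, 'Monitor', '7 inch'), (0, 'Output', 'Digital'),
        (1, 'Name', 'KLH DVD221'), (1, 'Height', '3.75'), (1, 'Output', 'S-Video');
""")

# The simple query over the horizontal scheme.
horizontal_result = conn.execute(
    "SELECT Monitor FROM H WHERE Output = 'Digital'"
).fetchall()

# The equivalent self-join over the vertical table.
vertical_result = conn.execute("""
    SELECT v1.Val
    FROM vtable v1, vtable v2
    WHERE v1.Key = 'Monitor' AND v2.Key = 'Output'
      AND v2.Val = 'Digital' AND v1.Oid = v2.Oid
""").fetchall()

print(horizontal_result, vertical_result)
```

One difference worth noting: if an object had Output = 'Digital' but no Monitor, the horizontal query would return a NULL row for it, while this self-join would return no row.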
Solution
Provide horizontal view of the vertical table
Translation layer automatically maps operations on H to operations on V
[Figure: Horizontal view (H) with columns Attr1, Attr2, …, Attrk, …; Query Mapping Layer; Vertical table (V) with columns Oid, Key, Val]
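A minimal sketch of what such a mapping layer might do for one narrow class of queries: a single projected attribute with equality predicates. The function `map_query` and its signature are hypothetical, not the layer described in the slides; it only illustrates the idea of rewriting a request phrased over H into a self-join over V.

```python
import sqlite3
from typing import Dict, List, Tuple


def map_query(select_attr: str, predicates: Dict[str, str]) -> Tuple[str, List[str]]:
    """Rewrite 'SELECT <select_attr> FROM H WHERE a1 = v1 AND ...' into
    SQL over vtable, using one self-join alias per predicate.

    Attribute names and values are passed as bound parameters.
    """
    tables = ["vtable v0"]
    conditions = ["v0.Key = ?"]
    params: List[str] = [select_attr]
    for i, (attr, value) in enumerate(predicates.items(), start=1):
        alias = f"v{i}"
        tables.append(f"vtable {alias}")
        conditions.append(
            f"{alias}.Key = ? AND {alias}.Val = ? AND {alias}.Oid = v0.Oid"
        )
        params.extend([attr, value])
    sql = f"SELECT v0.Val FROM {', '.join(tables)} WHERE {' AND '.join(conditions)}"
    return sql, params


# Sample vertical table (values from the slides' example data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT)")
conn.executemany(
    "INSERT INTO vtable VALUES (?, ?, ?)",
    [
        (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"),
        (0, "Recharge", "Built-in"), (0, "Output", "Digital"),
        (1, "Name", "KLH DVD221"), (1, "Height", "3.75"),
        (1, "Output", "S-Video"),
    ],
)

sql, params = map_query("Monitor", {"Output": "Digital"})
monitors = conn.execute(sql, params).fetchall()

sql2, params2 = map_query("Name", {"Output": "Digital", "Recharge": "Built-in"})
names = conn.execute(sql2, params2).fetchall()
print(monitors, names)
```

The application only names attributes as if they were columns of H; the join structure over V is generated for it.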
Transformation Algebra
Defined an algebra for transforming expressions over horizontal views into expressions over the vertical representation.
Two key operators:
– v2h
– h2v
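Sample Algebraic Transforms
v2h operation – Convert from vertical to horizontal
\[ \mathrm{v2h}_{k}(V) = \left[\pi_{Oid}(V)\right] \leftouterjoin \left[\mathop{\leftouterjoin}_{i=1}^{k} \pi_{Oid,\,Val}\left(\sigma_{Key='A_i'}(V)\right)\right] \]
h2v operation – Convert from horizontal to vertical
\[ \mathrm{h2v}_{k}(H) = \left[\bigcup_{i=1}^{k} \pi_{Oid,\,'A_i',\,A_i}\left(\sigma_{A_i \neq \bot}(H)\right)\right] \cup \left[\bigcup_{i=1}^{k} \pi_{Oid,\,'A_i',\,A_i}\left(\sigma_{\bigwedge_{j=1}^{k} A_j = \bot}(H)\right)\right] \]
Here \leftouterjoin is the left outer join on Oid and \bot denotes a null value.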
Similar operations such as Unfold/Fold and Gather/Scatter exist in SchemaSQL [LSS 99] and [STA 98] respectively
Complete transforms in VLDB-2001 Paper
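The two operators can be sketched over plain Python data structures. This is a simplified reading, not the paper's definition: `v2h` fills missing attributes with `None` (the effect a left outer join would have), and `h2v` emits one triple per non-null attribute and, as a simplification, drops objects whose attributes are all null. The type aliases and sample data are assumptions for this example.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[int, str, str]
Row = Dict[str, Optional[str]]


def v2h(vertical: List[Triple], attrs: List[str]) -> Dict[int, Row]:
    """Vertical to horizontal: one row per Oid, one column per attribute.

    Attributes an object lacks stay None, mimicking an outer join.
    """
    horizontal: Dict[int, Row] = {
        oid: {a: None for a in attrs} for oid, _, _ in vertical
    }
    for oid, key, val in vertical:
        if key in horizontal[oid]:
            horizontal[oid][key] = val
    return horizontal


def h2v(horizontal: Dict[int, Row]) -> List[Triple]:
    """Horizontal to vertical: one (Oid, Key, Val) per non-null attribute.

    Simplification: objects with only null attributes produce no rows.
    """
    return [
        (oid, key, val)
        for oid, row in horizontal.items()
        for key, val in row.items()
        if val is not None
    ]


v = [
    (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"),
    (1, "Name", "KLH DVD221"), (1, "Height", "3.75"),
]
h = v2h(v, ["Name", "Monitor", "Height"])
round_trip = h2v(h)
print(h)
print(round_trip)
```

For data where every object has at least one non-null attribute, applying `h2v` after `v2h` gives back the original triples.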
From the Algebra to SQL
Equivalent SQL transforms for algebraic transforms
– Select, Project
– Joins (self, two verticals, a horizontal and a vertical)
– Cartesian Product
– Union, Intersection, Set difference
– Aggregation
Extend DDL to provide the Horizontal View
    CREATE HORIZONTAL VIEW hview ON VERTICAL TABLE vtable
    USING COLUMNS (Attr1, Attr2, … Attrk, …)
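CREATE HORIZONTAL VIEW is a proposed DDL extension, so it will not run on a stock database. As an approximation, the sketch below emulates it in SQLite with an ordinary view built from one left outer join per column. The helper `create_horizontal_view` is hypothetical, assumes each (Oid, Key) pair occurs at most once, and interpolates trusted identifiers directly into the SQL (the column names must not contain quote characters).

```python
import sqlite3
from typing import List


def create_horizontal_view(
    conn: sqlite3.Connection, view: str, vtable: str, columns: List[str]
) -> None:
    """Emulate a horizontal view over a vertical table with a plain view.

    SQLite does not allow bound parameters inside CREATE VIEW, so the
    (trusted) names are formatted into the statement text.
    """
    select_list = ", ".join(
        f'a{i}.Val AS "{c}"' for i, c in enumerate(columns)
    )
    joins = " ".join(
        f"LEFT OUTER JOIN {vtable} a{i} "
        f"ON a{i}.Oid = o.Oid AND a{i}.Key = '{c}'"
        for i, c in enumerate(columns)
    )
    conn.execute(
        f"CREATE VIEW {view} AS SELECT o.Oid, {select_list} "
        f"FROM (SELECT DISTINCT Oid FROM {vtable}) o {joins}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT)")
conn.executemany(
    "INSERT INTO vtable VALUES (?, ?, ?)",
    [
        (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"),
        (0, "Output", "Digital"),
        (1, "Name", "KLH DVD221"), (1, "Height", "3.75"),
        (1, "Output", "S-Video"),
    ],
)
create_horizontal_view(conn, "hview", "vtable", ["Name", "Monitor", "Height", "Output"])

# The simple horizontal query now runs unchanged against the view.
monitors = conn.execute(
    "SELECT Monitor FROM hview WHERE Output = 'Digital'"
).fetchall()
heights = conn.execute("SELECT Oid, Height FROM hview ORDER BY Oid").fetchall()
print(monitors, heights)
```

This covers reads only; mapping updates on the view back to the vertical table is not attempted here.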
Alternative Implementation Strategies
VerticalSQL
– Uses only SQL-92 level capabilities
VerticalUDF
– Exploits User Defined Functions and Table Functions to provide a direct implementation
Binary (hand-coded queries)
– 2-ary representation with one relation per attribute (using only SQL-92 transforms)
Data Organization Matters: Clustering by Key significantly outperforms by Oid
[Figure: Join. Execution time (seconds, axis 0 to 25) versus join selectivity (0.1%, 1%, 5%) for VerticalSQL_oid and VerticalSQL_key; density = 10%, 1000 cols x 20K rows]
Projection of 10 columns
VerticalSQL comparable to Binary and outperforms Horizontal
[Figure: Execution time (seconds, axis 0 to 60) versus table shape (#cols x #rows: 200x100K, 400x50K, 800x25K, 1000x20K) for HorizontalSQL, VerticalSQL and Binary; density = 10%]
VerticalUDF is the best approach
[Figure: Projection of 10 columns. Execution time (seconds, axis 0 to 30) versus table shape (#cols x #rows: 200x100K, 400x50K, 800x25K, 1000x20K) for VerticalUDF, VerticalSQL and Binary; density = 10%]
Summary
               Horizontal  Vertical (w/ Mapping)  Binary (w/ Mapping)
Flexibility    -           +                      -
Manageability  +           +                      -
Performance    -           +                      +
Querying       +           +                      +
Remarks
Lessons of this study apply directly to building a storage engine for the Semantic Web
Performance of vertical representation can be further improved by:
– Enhanced table functions
– First class treatment of table functions
– Native support for v2h and h2v operations
– Partial indices
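The slides list partial indices without further detail. One possible reading, sketched below with SQLite's partial-index support, is to index only the rows of the vertical table that belong to a frequently filtered attribute. The index name `idx_output` and the choice of the 'Output' attribute are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT)")
conn.executemany(
    "INSERT INTO vtable VALUES (?, ?, ?)",
    [
        (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"),
        (0, "Output", "Digital"),
        (1, "Name", "KLH DVD221"), (1, "Height", "3.75"),
        (1, "Output", "S-Video"),
    ],
)

# Partial index: only rows with Key = 'Output' are indexed, so the index
# stays small even when the table holds thousands of distinct keys.
conn.execute(
    "CREATE INDEX idx_output ON vtable (Val, Oid) WHERE Key = 'Output'"
)

# A query whose WHERE clause repeats the index condition is a candidate
# for using the partial index.
digital_oids = conn.execute(
    "SELECT Oid FROM vtable WHERE Key = 'Output' AND Val = 'Digital'"
).fetchall()

index_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
print(digital_oids, index_names)
```

Whether the optimizer actually picks the index depends on the engine and its statistics; the sketch only shows the declaration and a matching query.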