ICS-FORTH & Univ. of Crete Gregory Karvounarakis May 2001
On Storing Voluminous RDF Descriptions:
The case of Web Portal Catalogs (http://www.ics.forth.gr/proj/isst/RDF)
Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis
Computer Science Department, University of Crete
and Institute for Computer Science - FORTH, Heraklion, Crete, Greece
Portalmania!
Internet Portals Example: The Open Directory
Browsing the ODP Topics
Searching the ODP Topics & URLs
• Descriptions in ODP consist of the classification of URIs under topics, a textual description, and various administrative information
ODP Search Results: Hotel Paris Orsay
What is needed?
• Flexible Modeling of Web Portal Catalogs using W3C standards (RDF/S)
  • Exploit existing forms of domain knowledge
    • Ranging from simple vocabularies to formal ontologies
  • Describe information resources in various ways
    • Administration, Classification, Content Rating, Channels, ...
• Secondary Storage Management of Portal Metadata
  • Large Schemas: e.g., 170 Mbytes of ODP Topics (the Art Hierarchy contains 25315 terms)
  • Voluminous Description Bases: e.g., 700 Mbytes of ODP indexed sites (2,342,978 URLs)
• Declarative Query Languages for Portal Catalogs
  • Interleave schema with data querying
  • Optimize access to Portal Catalogs
Our Approach
[Figure: high-level access to community information through a Virtual XML Warehouse over Archives, Documents, Databases, and the Web, with RDF descriptions]
• Use W3C Standards to describe (RDF/S) & exchange (XML) information
• Our Main Contribution: Declarative Languages for Browsing & Querying
Outline
• The Open Directory Portal: a case study
• The RDF Query Language (RQL)
• RDF Storage Strategies
• Testbed: the ODP RDF dump
  • Representative queries
  • Performance
• Summary and Outlook
Modeling the ODP Catalog with RDF/S
[Figure: a fragment of the ODP catalog drawn as an RDF/S graph. Legend: typeOf (instance), subClassOf (isA), attribution.]
Namespaces:
  ns1: http://www.dmoz.org/topic.rdf
  ns2: www.oclc.org/dublincore.rdfs
  rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
  rdfs: http://www.w3.org/2000/01/rdf-schema#
Classes shown: Regional, Recreation, Lodging, Vacation-Rentals, Ile-de-France, Paris, Travel, Hotel Directories, Hotel, Ext.Resource
Properties shown: related, title, description, file_size, last_modified (literal types: string, date)
Resources shown: &r1, &r2, &r3, &r4, with title and description values such as "SunScale", "Danube Orsay", "Notre-Dame", "Disneyland", "Site officiel de Disneyland Paris", and "Official site of Disneyland Paris"
&r1: http://www.sunscale.com/france/paris/index.htm
Resource Description Framework (RDF/S)
• RDF: Resource Descriptions
  • Data Model: Directed Labeled Graphs
    • Nodes: Resources (URIs) or Literals
    • Edges: Properties (Attributes or Relationships)
    • Labels: Nodes (Class names) and Edges (Property names)
    • Statement: assertion of the form (resource, property, value)
    • Description: collection of statements concerning a resource
  • XML syntax
• RDF Schema (RDFS): Schema Vocabularies
  • Specialization of both classes & properties (simple & multiple)
  • Multiple classification under several classes
  • Unordered, optional, and multi-valued properties
  • Domain and range polymorphism of properties
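As a minimal sketch of the data model above (not RDFSuite code), statements can be held as (resource, property, value) triples, and a description is then the collection of statements about one resource. The names `Statement` and `description_of` are hypothetical; the URI, the class name, and the title value are taken from the ODP example in this deck, with the class and property names shortened for readability.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement: an edge from a resource to a value, labeled by a property."""
    resource: str  # node: a resource URI
    prop: str      # edge label: a property name
    value: str     # node: another resource (or class) or a literal


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
R1 = "http://www.sunscale.com/france/paris/index.htm"

statements = [
    Statement(R1, RDF_TYPE, "Hotel Directories"),  # classification under a class
    Statement(R1, "title", "SunScale"),            # attribute with a literal value
]


def description_of(resource, stmts):
    """A description: all statements concerning one resource."""
    return [s for s in stmts if s.resource == resource]


for s in description_of(R1, statements):
    print(s.prop, "->", s.value)
```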
RDF/S vs. Well-Known Formalisms
• Relational or Object Database Models (ODMG, SQL)
  • Classes don't define table or object types
  • Instances may have quite different associated properties
  • Collections with heterogeneous members
• Semistructured or XML Data Models (OEM, UnQL, YAT, XML Schema)
  • Labels on both nodes and edges
  • Schema class and property subsumption is not captured
  • Heterogeneous descriptions reminiscent of SGML exceptions
• Knowledge Representation Languages (Telos, DL, F-Logic)
  • Absence of complex values and n-ary relationships (bags, sequences)
The RDF Query Language: RQL
• Declarative query language for RDF description bases
  • relies on a typed data model (see SemWeb2001 paper)
  • follows a functional approach (basic queries and filters)
  • adapts the functionality of semistructured or XML query languages to RDF, but also:
    • treats properties as self-existent individuals
    • exploits taxonomies of node and edge labels
    • allows querying of schemas as semistructured data
• Relational interpretation of schemas & resource descriptions
Portal Navigation with RQL
• Browsing large description bases is cumbersome!
• RQL provides powerful path expressions permitting filtering and navigation on both portal schemas and resource descriptions
• E.g., to find (under the Regional ODP hierarchy) URIs of Hotels in Paris whose title matches "Orsay":

    select Z
    from (select $X
          from Regional{:$X}
          where $X like "*Hotel*"
          and $X < Paris){Y}.{Z}title{T}
    where T like "*Orsay*"
The ICS-FORTH RDFSuite Architecture
[Figure: RDFSuite architecture. Components shown:
  • VRP: Parser, Validator, VRP Internal RDF Model
  • RDF Loader: Loading, using the RDF Java APIs and JDBC (SQL3)
  • RQL Interpreter: Parser, Graph Constructor, Typing, Evaluation, using the DBMS RDF query APIs (SQL3 + SPI functions, LIBC++)
  • ORDBMS: tables Class (c_name), Property (p_name, domain, range), SubClass (subcl, supcl), SubProperty (subpr, suppr), plus instance tables (URI) and property tables (source, target), e.g. Hotel and title]
Generic Representation
Resources
id: int   uri: text
1         http://www.dmoz.org/topics.rdfs#Hotel
2         http://www.dmoz.org/topics.rdfs#Hotel Directories
3         http://www.oclc.org/dublincore.rdfs#title
4         http://www.dmoz.org/schema.rdf#Ext.Resource
5         http://www.w3.org/1999/02/22-rdf-syntax-ns#type
6         http://www.w3.org/2000/01/rdf-schema#subClassOf
7         http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
8         http://www.w3.org/2000/01/rdf-schema#Class
9         r1

Triples
predid: int   subid: int   objid: int   objvalue: text
6             2            1
5             3            7
5             1            8
5             9            2
3             9                         SunScale
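The generic representation can be tried out with a small sketch. It assumes SQLite (Python's standard `sqlite3` module) in place of the ORDBMS used by RDFSuite; the table names, column names, and rows follow the Resources and Triples tables of this slide, while the helper names `build_generic_db` and `resources_with_value` are hypothetical. The lookup illustrates why a value-based query has to join Triples with Resources twice.

```python
import sqlite3


def build_generic_db():
    """Create the two generic tables and load the rows shown on the slide."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Resources (id INTEGER PRIMARY KEY, uri TEXT);
        CREATE TABLE Triples (predid INTEGER, subid INTEGER,
                              objid INTEGER, objvalue TEXT);
    """)
    con.executemany("INSERT INTO Resources VALUES (?, ?)", [
        (1, "http://www.dmoz.org/topics.rdfs#Hotel"),
        (2, "http://www.dmoz.org/topics.rdfs#Hotel Directories"),
        (3, "http://www.oclc.org/dublincore.rdfs#title"),
        (4, "http://www.dmoz.org/schema.rdf#Ext.Resource"),
        (5, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
        (6, "http://www.w3.org/2000/01/rdf-schema#subClassOf"),
        (7, "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"),
        (8, "http://www.w3.org/2000/01/rdf-schema#Class"),
        (9, "r1"),
    ])
    con.executemany("INSERT INTO Triples VALUES (?, ?, ?, ?)", [
        (6, 2, 1, None),        # Hotel Directories subClassOf Hotel
        (5, 3, 7, None),        # title type Property
        (5, 1, 8, None),        # Hotel type Class
        (5, 9, 2, None),        # r1 type Hotel Directories
        (3, 9, None, "SunScale"),  # r1 title "SunScale"
    ])
    return con


def resources_with_value(con, property_uri, value):
    """Resources having the given property with the given literal value."""
    rows = con.execute("""
        SELECT s.uri
        FROM Triples t
        JOIN Resources p ON p.id = t.predid
        JOIN Resources s ON s.id = t.subid
        WHERE p.uri = ? AND t.objvalue = ?
    """, (property_uri, value)).fetchall()
    return [uri for (uri,) in rows]


con = build_generic_db()
print(resources_with_value(con, "http://www.oclc.org/dublincore.rdfs#title", "SunScale"))
```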
Specific Representation
[Figure: example tables of the specific representation; the row contents are only partly recoverable from the slide.]
Schema tables:
  Namespace (id: int, uri: text)
  Type (id: int, nsid: int, lpart: text)
  Class (id: int, nsid: int, lpart: text)
  Property (id: int, nsid: int, lpart: text, domainid: int, rangeid: int)
  SubClass (subid: int, superid: int)
  SubProperty (subid: int, superid: int)
Data tables:
  Instances (uri: text, classid: int)
  Class instance tables (t1, t11, t12, t13) with column URI: text, related by "subtable" links
  Property instance tables (t14, t15, t16) with columns source: text, target: text
Sample rows:
  Namespace: 1 http://www.w3.org/2000/01/rdf-schema#, 2 http://www.w3.org/1999/02/22-rdf-syntax-ns#, 3 http://www.oclc.org/dublincore.rdfs#, 4 http://www.dmoz.org/topics.rdfs#
  Type: Resource, Bag, Seq, String
  Class: Hotel, Hotel Directories, Ext.Resource
  Property: title, description
  t14: (r1, SunScale)
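A rough sketch of the same idea for the specific representation, again assuming SQLite rather than the ORDBMS used by RDFSuite. Only the parts of the slide that can be read with some confidence are used: the Property table columns, the property `title` with id 14 in namespace 3, and the table `t14` holding the pair (r1, SunScale). The domain and range ids are left empty, and the helper names are hypothetical. With one table per property, a lookup by value touches a single small table instead of joining Triples with Resources.

```python
import sqlite3


def build_specific_db():
    """Create a schema table plus one instance table for the property with id 14."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Property (id INTEGER PRIMARY KEY, nsid INTEGER, lpart TEXT,
                               domainid INTEGER, rangeid INTEGER);
        CREATE TABLE t14 (source TEXT, target TEXT);
    """)
    # domainid / rangeid are not reproduced here (left NULL).
    con.execute("INSERT INTO Property VALUES (14, 3, 'title', NULL, NULL)")
    con.execute("INSERT INTO t14 VALUES ('r1', 'SunScale')")
    return con


def resources_with_value(con, property_id, value):
    """Resources whose property (given by id) has the given value."""
    # int() guards the table name, which is built into the SQL text.
    table = f"t{int(property_id)}"
    rows = con.execute(
        f"SELECT source FROM {table} WHERE target = ?", (value,)
    ).fetchall()
    return [source for (source,) in rows]


con = build_specific_db()
print(resources_with_value(con, 14, "SunScale"))
```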
DBMS Size vs. Schema Triples
=> DBMS size scales linearly with the number of schema triples

                                           SpecRepr                  GenRepr
Aver. triple size (with indexes)           0.086 KB (0.1734 KB)      0.1582 KB (0.3062 KB)
Aver. triple storage time (with indexes)   0.0021 sec (0.0025 sec)   0.0025 sec (0.0032 sec)
DBMS Size vs. Data Triples
=> DBMS size scales linearly with the number of data triples

                                           SpecRepr                  GenRepr
Aver. triple size (with indexes)           0.123 KB (0.2566 KB)      0.123 KB (0.2706 KB)
Aver. triple storage time (with indexes)   0.0033 sec (0.0043 sec)   0.0039 sec (0.00457 sec)
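If the linear trend reported here is assumed to hold for larger inputs, the average triple sizes give a back-of-the-envelope size estimate. The sketch below uses the indexed data-triple sizes from this slide; the function name and the triple count of one million are hypothetical choices for illustration, not measurements.

```python
def estimated_size_kb(num_triples, avg_triple_kb):
    """Linear model: total size = number of triples x average size per triple."""
    return num_triples * avg_triple_kb


# Average data-triple sizes with indexes, in KB, as reported on the slide.
SPEC_DATA_KB = 0.2566
GEN_DATA_KB = 0.2706

n = 1_000_000  # hypothetical number of data triples
print(f"SpecRepr: {estimated_size_kb(n, SPEC_DATA_KB) / 1024:.1f} MB")
print(f"GenRepr:  {estimated_size_kb(n, GEN_DATA_KB) / 1024:.1f} MB")
```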
Query Templates for RDF description bases
Pure schema queries
  Q1 Find the range (or domain) of a property
  Q2 Find the direct subclasses of a class
  Q3 Find the transitive subclasses of a class
  Q4 Check if a class is a subclass of another class
Queries on resource descriptions using available schema knowledge
  Q5 Find the direct extent of a class (or property)
  Q6 Find the transitive extent of a class (or property)
  Q7 Find if a resource is an instance of a class
  Q8 Find the resources having a property with a specific (or range of) value(s)
  Q9 Find the instances of a class having a given property
Schema queries for specific resource descriptions
  Q10 Find the properties of a resource and their values
  Q11 Find the classes under which a resource is classified
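To make the difference between Q2, Q3, and Q4 concrete, here is a minimal in-memory sketch over (subclass, superclass) pairs. It is an illustration only, not how RDFSuite evaluates these queries; the function names are hypothetical, and the class "Budget Hotel Directories" is an invented example added to give the hierarchy a second level.

```python
from collections import defaultdict


def direct_subclasses(pairs, cls):
    """Q2: classes declared as immediate subclasses of cls."""
    return {sub for sub, sup in pairs if sup == cls}


def transitive_subclasses(pairs, cls):
    """Q3: all classes reachable from cls by following subclass links downwards."""
    children = defaultdict(set)
    for sub, sup in pairs:
        children[sup].add(sub)
    seen = set()
    stack = [cls]
    while stack:
        for sub in children[stack.pop()]:
            if sub not in seen:  # also protects against cycles
                seen.add(sub)
                stack.append(sub)
    return seen


def is_subclass(pairs, sub, sup):
    """Q4: check whether sub is a (direct or transitive) subclass of sup."""
    return sub in transitive_subclasses(pairs, sup)


pairs = [
    ("Hotel Directories", "Hotel"),                     # from the ODP example
    ("Budget Hotel Directories", "Hotel Directories"),  # hypothetical class
]
print(direct_subclasses(pairs, "Hotel"))
print(transitive_subclasses(pairs, "Hotel"))
print(is_subclass(pairs, "Budget Hotel Directories", "Hotel"))
```

The transitive variant has to walk the whole hierarchy below a class, which is consistent with Q3 being far more expensive than Q2 on large schemas.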
Execution Time of RDF Benchmark Queries
Query   Generic                        Specific
        Case 1    Case 2    Case 3     Case 1    Case 2    Case 3
Q1      0.0015                         0.0012
Q2      0.0017    0.0028    0.02       0.0012    0.0022    0.0124
Q3      0.0460    0.082     344.91     0.0463    0.0612    341.98
Q4      0.033     0.0415    0.0662     0.0333    0.0415    0.0662
Q5      0.0043    0.008     0.04       0.0015    0.0028    0.027
Q6      0.0573    0.315     627.43     0.0508    0.1118    482.45
Q7      0.0034    0.0034    0.0034     0.0016    0.0016    0.0017
Q8      124.20    365.73    675.42     0.0013    0.0069    0.0466
Q9      110.58    117.68    185.7      0.031     0.0338    0.1059
Q10     0.0072    0.0072    0.0072     0.0071    0.0071    0.0076
Q11     0.0035    0.0043    0.0056     0.0013    0.0015    0.0015
Comparison
• Specific Representation permits the customization of the database representation of RDF metadata
• Specific Representation outperforms the Generic Representation for all types of queries
  • Q1, Q2, Q5, Q7, Q10, Q11: by a factor up to 3.73
  • Q3, Q4, Q6: by a factor up to 2.8
  • Q8, Q9: by a factor up to 95,538
• Generic representation pays a severe penalty for maintaining large tables (Triples, Resources)
  • e.g., queries Q8, Q9 require (self-) joins of Triples, Resources
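The factors quoted above can be recomputed from the execution-time table as the largest Generic/Specific ratio within each query group. The sketch below copies the timings from the benchmark slide; the dictionary layout and the function name `max_speedup` are hypothetical.

```python
# Execution times per query (Case 1, Case 2, Case 3); Q1 has a single value.
GENERIC = {
    "Q1": (0.0015,),
    "Q2": (0.0017, 0.0028, 0.02),
    "Q3": (0.0460, 0.082, 344.91),
    "Q4": (0.033, 0.0415, 0.0662),
    "Q5": (0.0043, 0.008, 0.04),
    "Q6": (0.0573, 0.315, 627.43),
    "Q7": (0.0034, 0.0034, 0.0034),
    "Q8": (124.20, 365.73, 675.42),
    "Q9": (110.58, 117.68, 185.7),
    "Q10": (0.0072, 0.0072, 0.0072),
    "Q11": (0.0035, 0.0043, 0.0056),
}
SPECIFIC = {
    "Q1": (0.0012,),
    "Q2": (0.0012, 0.0022, 0.0124),
    "Q3": (0.0463, 0.0612, 341.98),
    "Q4": (0.0333, 0.0415, 0.0662),
    "Q5": (0.0015, 0.0028, 0.027),
    "Q6": (0.0508, 0.1118, 482.45),
    "Q7": (0.0016, 0.0016, 0.0017),
    "Q8": (0.0013, 0.0069, 0.0466),
    "Q9": (0.031, 0.0338, 0.1059),
    "Q10": (0.0071, 0.0071, 0.0076),
    "Q11": (0.0013, 0.0015, 0.0015),
}


def max_speedup(queries):
    """Largest Generic/Specific time ratio over the given queries and cases."""
    return max(
        g / s
        for q in queries
        for g, s in zip(GENERIC[q], SPECIFIC[q])
    )


print(round(max_speedup(["Q1", "Q2", "Q5", "Q7", "Q10", "Q11"]), 2))
print(round(max_speedup(["Q3", "Q4", "Q6"]), 1))
print(round(max_speedup(["Q8", "Q9"])))
```

The three maxima come from Q11 (Case 3), Q6 (Case 2), and Q8 (Case 1) respectively.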
Conclusions and Future Work
• Use DB technology for Web Portals
• RDF/S data model is flexible enough to represent superimposed descriptions of:
  • Information resources
  • E-services
  for various content syndication applications
• RDFSuite addresses the needs of effective and efficient management of RDF descriptions by providing tools for validation, storage and querying
  • First set of benchmark queries for RDF description bases
  • First implementation of an experimental framework for real-scale RDF applications
• Ongoing efforts:
  • appropriate encoding for taxonomic schema relationships to optimize, e.g., subclass computation (Q3, Q6)
Acknowledgements
• Funding was generously provided by the projects:
  • C-WEB (IST-1999-13479): "A Generic Platform Supporting Community Webs"
  • MESMUSES (IST-2000-26074): "Metaphor for Science Museums"
Summary and Outlook
• RDFSuite addresses the needs of effective RDF metadata management by providing tools for validation, storage and querying
  • validation follows a formal data model and constraints enforcing consistency of RDF schemas
  • incremental loading of voluminous description bases in a persistent store
  • declarative query language for schema and data querying
• Ongoing efforts:
  • RQL query optimization
  • transactional aspects
  • alternative encoding and representation schemes for access optimization