ICS-FORTH & Univ. of Crete Gregory Karvounarakis May 2001

On Storing Voluminous RDF Descriptions:

The case of Web Portal Catalogs (http://www.ics.forth.gr/proj/isst/RDF)

Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis

Computer Science Department, University of Crete and

Institute for Computer Science - FORTH, Heraklion, Crete, Greece

Portalmania!

Internet Portals Example: The Open Directory

Browsing the ODP Topics

Searching the ODP Topics & URLs

• Descriptions in ODP consist of the classification of URIs under topics, a textual description, and various administrative information

ODP Search Results: Hotel Paris Orsay

What is needed?

• Flexible Modeling of Web Portal Catalogs using W3C standards (RDF/S)
  - Exploit existing forms of domain knowledge
    · Ranging from simple vocabularies to formal ontologies
  - Describe information resources in various ways
    · Administration, Classification, Content Rating, Channels, …
• Secondary Storage Management of Portal Metadata
  - Large Schemas: e.g., 170 MB of ODP Topics (the Art hierarchy alone contains 25,315 terms)
  - Voluminous Description Bases: e.g., 700 MB of ODP indexed sites (2,342,978 URLs)
• Declarative Query Languages for Portal Catalogs
  - Interleave schema querying with data querying
  - Optimize access to Portal Catalogs

Our Approach

[Figure: high-level access to community information through a Virtual XML Warehouse over archives, documents, databases and the Web, described in RDF]

• Use W3C Standards to describe (RDF/S) & exchange (XML) information
• Our Main Contribution: Declarative Languages for Browsing & Querying

Outline

• The Open Directory Portal: a case study
• The RDF Query Language (RQL)
• RDF Storage Strategies
• Testbed: the ODP RDF dump
  - Representative queries
  - Performance
• Summary and Outlook

Modeling the ODP Catalog with RDF/S

[Figure: RDF/S model of an ODP catalog fragment. Schema classes (Regional, Recreation, Travel, Lodging, Vacation-Rentals, Ile-de-France, Paris, Hotel Directories, Hotel, Ext.Resource) are linked by subClassOf (isA) edges and a "related" property; Ext.Resource has the attributes title, description, file_size and last_modified (value types string and date). Resources &r1 to &r4 are attached to classes by typeOf (instance) edges and carry titles and descriptions such as "SunScale", "Danube Orsay", "Notre-Dame Hotel", "Disneyland", "Site officiel de Disneyland Paris" and "Official site of Disneyland Paris".]

Namespaces: ns1: http://www.dmoz.org/topic.rdf; ns2: www.oclc.org/dublincore.rdfs; rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#; rdfs: http://www.w3.org/2000/01/rdf-schema#

&r1: http://www.sunscale.com/france/paris/index.htm

Resource Description Framework (RDF/S)

• RDF: Resource Descriptions
  - Data Model: Directed Labeled Graphs
    · Nodes: Resources (URIs) or Literals
    · Edges: Properties (Attributes or Relationships)
    · Labels: Nodes (Class names) and Edges (Property names)
    · Statement: assertion of the form (resource, property, value)
    · Description: collection of statements concerning a resource
  - XML syntax
• RDF Schema (RDFS): Schema Vocabularies
  - Specialization of both classes & properties (simple & multiple)
  - Multiple classification under several classes
  - Unordered, optional, and multi-valued properties
  - Domain and range polymorphism of properties
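The statement/description vocabulary above can be made concrete with a few lines of Python. This is a minimal sketch, not RDFSuite code: statements are plain (resource, property, value) tuples, the namespace URIs and the SunScale resource come from the slides, and the classification of r1 under Vacation-Rentals is a hypothetical addition used only to show multiple classification.

```python
# Namespace URIs as given on the slides; the statement set is illustrative.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://www.oclc.org/dublincore.rdfs#title"
ODP = "http://www.dmoz.org/topics.rdfs#"
R1 = "http://www.sunscale.com/france/paris/index.htm"

# A statement is a (resource, property, value) triple: one labelled edge of the graph.
STATEMENTS = [
    (R1, RDF_TYPE, ODP + "Hotel Directories"),
    (R1, RDF_TYPE, ODP + "Vacation-Rentals"),  # hypothetical second classification
    (R1, DC_TITLE, "SunScale"),                # edge ending in a literal node
]

def description(resource, statements):
    """The description of a resource: every statement whose subject is that resource."""
    return [st for st in statements if st[0] == resource]

def classes_of(resource, statements):
    """Classes under which a resource is classified (possibly several)."""
    return {v for r, p, v in statements if r == resource and p == RDF_TYPE}

for st in description(R1, STATEMENTS):
    print(st)
print(sorted(classes_of(R1, STATEMENTS)))
```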

RDF/S vs. Well-Known Formalisms

• Relational or Object Database Models (ODMG, SQL)
  - Classes don't define table or object types
  - Instances may have quite different associated properties
  - Collections with heterogeneous members
• Semistructured or XML Data Models (OEM, UnQL, YAT, XML Schema)
  - Labels on both nodes and edges
  - Schema class and property subsumption is not captured
  - Heterogeneous descriptions reminiscent of SGML exceptions
• Knowledge Representation Languages (Telos, DL, F-Logic)
  - Absence of complex values and n-ary relationships (bags, sequences)

The RDF Query Language: RQL

• Declarative query language for RDF description bases
  - relies on a typed data model (see SemWeb2001 paper)
  - follows a functional approach (basic queries and filters)
  - adapts the functionality of semistructured or XML query languages to RDF, but also:
    · treats properties as self-existent individuals
    · exploits taxonomies of node and edge labels
    · allows querying of schemas as semistructured data
• Relational interpretation of schemas & resource descriptions
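To illustrate what a relational reading of properties that respects a taxonomy of edge labels can look like, here is a small sketch. It assumes a property denotes a binary relation of (source, target) pairs whose extent includes the pairs of its subproperties; it is not RQL's evaluator. The subproperty link last_modified → date and all statement values are hypothetical.

```python
# Hypothetical property taxonomy (subproperty -> superproperty) and statements.
SUBPROPERTY_OF = {"last_modified": "date"}
STATEMENTS = [
    ("&r1", "title", "SunScale"),
    ("&r1", "last_modified", "2001-01-15"),
    ("&r2", "date", "2000-12-01"),
]

def with_subproperties(prop):
    """prop plus every property below it in the taxonomy (transitive closure)."""
    found = {prop}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUBPROPERTY_OF.items():
            if sup in found and sub not in found:
                found.add(sub)
                changed = True
    return found

def extent(prop):
    """Relational reading of a property: (source, target) pairs, subproperties included."""
    props = with_subproperties(prop)
    return sorted((s, v) for s, p, v in STATEMENTS if p in props)

print(extent("date"))           # [('&r1', '2001-01-15'), ('&r2', '2000-12-01')]
print(extent("last_modified"))  # [('&r1', '2001-01-15')]
```

Because properties are first-class here, the same lookup works for any node in the property hierarchy. This is the behaviour the slide attributes to RQL, approximated in memory.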

Portal Navigation with RQL

• Browsing large description bases is cumbersome!
• RQL provides powerful path expressions permitting filtering and navigation on both portal schemas and resource descriptions
• E.g., to find (under the Regional ODP hierarchy) the URIs of Hotels in Paris whose title matches "Orsay":

Schema part: classes under Regional whose name matches "*Hotel*" and that lie below Paris

    select $X
    from Regional{:$X}
    where $X like "*Hotel*"
    and $X < Paris

Data part: the instances of those classes

    select Y
    from (select $X
          from Regional{:$X}
          where $X like "*Hotel*"
          and $X < Paris){Y}

Full query: those instances whose title matches "Orsay"

    select Z
    from (select $X
          from Regional{:$X}
          where $X like "*Hotel*"
          and $X < Paris){Y}.{Z}title{T}
    where T like "*Orsay*"
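To make the two-phase evaluation concrete (filter the schema, then navigate the data), the sketch below emulates the query over in-memory dictionaries. It is an illustration under stated assumptions, not how RQL executes: the class hierarchy, the class extents and the title of &r2 are hypothetical, `fnmatch` patterns stand in for RQL's `like`, and "below" stands in for RQL's `<` on classes.

```python
import fnmatch

# Hypothetical fragment of the Regional hierarchy (class -> direct superclass).
SUBCLASS_OF = {
    "Ile-de-France": "Regional",
    "Paris": "Ile-de-France",
    "Travel": "Paris",
    "Hotel": "Travel",
    "Hotel Directories": "Hotel",
}
# Hypothetical class extents and title values.
INSTANCES = {"Hotel": ["&r2"], "Hotel Directories": ["&r1"]}
TITLE = {"&r1": "SunScale", "&r2": "Danube Orsay"}

def below(cls, ancestor):
    """True if cls is a strict subclass of ancestor (the role of RQL's cls < ancestor)."""
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls == ancestor:
            return True
    return False

def hotels_in_paris(title_pattern):
    # Schema part: classes under Regional whose name matches *Hotel* and that lie below Paris.
    classes = [c for c in SUBCLASS_OF
               if below(c, "Regional") and fnmatch.fnmatchcase(c, "*Hotel*")
               and below(c, "Paris")]
    # Data part: instances of those classes whose title matches the pattern.
    return sorted({r for c in classes for r in INSTANCES.get(c, [])
                   if fnmatch.fnmatchcase(TITLE.get(r, ""), title_pattern)})

print(hotels_in_paris("*Orsay*"))  # ['&r2']
```

The schema filter runs first and is cheap, because it touches only class names. The title filter then runs only over the extents of the few classes that survive. This ordering is what lets such path expressions avoid scanning the whole description base.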

The ICS-FORTH RDFSuite Architecture

[Figure: The ICS-FORTH RDFSuite architecture. Components: the Validating RDF Parser (VRP), with its parser, validator, internal RDF model and RDF Java APIs; the RDF Loader, which loads descriptions into an ORDBMS through JDBC and SQL3; and the RQL Interpreter (parser, graph constructor, typing, evaluation), which accesses the DBMS through a C++ library, SQL3 and SPI functions and offers DBMS RDF query APIs. Sample DBMS tables: Class (c_name), Property (p_name, domain, range), SubClass (subcl, supcl), SubProperty (subpr, suppr), a class table Hotel (URI) and a property table title (source, target).]

Generic Representation

Resources (id: int, uri: text)
  1  http://www.dmoz.org/topics.rdfs#Hotel
  2  http://www.dmoz.org/topics.rdfs#Hotel Directories
  3  http://www.oclc.org/dublincore.rdfs#title
  4  http://www.dmoz.org/schema.rdf#Ext.Resource
  5  http://www.w3.org/1999/02/22-rdf-syntax-ns#type
  6  http://www.w3.org/2000/01/rdf-schema#subClassOf
  7  http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
  8  http://www.w3.org/2000/01/rdf-schema#Class
  9  r1

Triples (predid: int, subid: int, objid: int, objvalue: text)
  6  2  1  -
  5  3  7  -
  5  1  8  -
  5  9  2  -
  3  9  -  SunScale
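The Generic Representation can be loaded into SQLite to show why value-based queries are expensive. This is a sketch, assuming the rows reconstructed above and using SQLite in place of the ORDBMS used in the talk. A Q8-style lookup (resources with a given property value) must join Triples with Resources twice: once to resolve the predicate URI and once to recover the subject URI. The function name `q8` is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Resources (id INTEGER PRIMARY KEY, uri TEXT);
CREATE TABLE Triples (predid INTEGER, subid INTEGER, objid INTEGER, objvalue TEXT);
""")
# Every URI, whether schema or data, lives in Resources.
con.executemany("INSERT INTO Resources VALUES (?, ?)", [
    (1, "http://www.dmoz.org/topics.rdfs#Hotel"),
    (2, "http://www.dmoz.org/topics.rdfs#Hotel Directories"),
    (3, "http://www.oclc.org/dublincore.rdfs#title"),
    (4, "http://www.dmoz.org/schema.rdf#Ext.Resource"),
    (5, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
    (6, "http://www.w3.org/2000/01/rdf-schema#subClassOf"),
    (7, "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"),
    (8, "http://www.w3.org/2000/01/rdf-schema#Class"),
    (9, "r1"),
])
# Every statement is a row of Triples; literal values go in objvalue.
con.executemany("INSERT INTO Triples VALUES (?, ?, ?, ?)", [
    (6, 2, 1, None),           # Hotel Directories subClassOf Hotel
    (5, 3, 7, None),           # title type Property
    (5, 1, 8, None),           # Hotel type Class
    (5, 9, 2, None),           # r1 type Hotel Directories
    (3, 9, None, "SunScale"),  # r1 title "SunScale"
])

def q8(property_uri, pattern):
    """Q8-style lookup: Resources is joined twice, for the predicate and for the subject."""
    sql = """
        SELECT s.uri
        FROM Triples t
        JOIN Resources p ON t.predid = p.id
        JOIN Resources s ON t.subid = s.id
        WHERE p.uri = ? AND t.objvalue LIKE ?
        ORDER BY s.uri"""
    return [u for (u,) in con.execute(sql, (property_uri, pattern))]

print(q8("http://www.oclc.org/dublincore.rdfs#title", "%Sun%"))  # ['r1']
```

With millions of statements, both Triples and Resources become very large. Every such lookup then pays for these joins over the full tables.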

Specific Representation

[Figure: Specific Representation. Schema tables: Namespace (id, uri), Class, Property (with domainid and rangeid), Type, SubClass (subid, superid) and SubProperty (subid, superid); classes, properties and types are identified by a namespace id (nsid) and a local part (lpart). Data tables: one table per class (e.g., t11, t12, t13, with column URI, linked as subtables) and one per property (e.g., t14, t15, t16, with columns source and target), plus Instances (uri, classid). Sample rows cover the rdfs, rdf, Dublin Core and dmoz topics namespaces; the types Resource, Bag, Seq and String; the classes Ext.Resource, Hotel and Hotel Directories; the properties title and description; and the resources r1 (title "SunScale") and r2.]
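The contrast with the Generic Representation is easiest to see on the same kind of query. The sketch below again uses SQLite and assumes the table names Hotel (one table per class) and title (one table per property), following the sample tables in the architecture figure. The sample rows are hypothetical apart from r1's title. Q5 becomes a scan of one class table, and Q8 becomes a selection on one property table with no join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table names: one table per class (uri) and one per property (source, target).
con.executescript("""
CREATE TABLE Hotel (uri TEXT);
CREATE TABLE title (source TEXT, target TEXT);
INSERT INTO Hotel VALUES ('r1'), ('r2');
INSERT INTO title VALUES ('r1', 'SunScale'), ('r2', 'Danube Orsay');
""")

def q5_direct_extent():
    """Q5: the direct extent of Hotel is just the contents of its class table."""
    return [u for (u,) in con.execute("SELECT uri FROM Hotel ORDER BY uri")]

def q8(pattern):
    """Q8: a selection on the title table alone; it holds only title statements."""
    return [u for (u,) in con.execute(
        "SELECT source FROM title WHERE target LIKE ? ORDER BY source", (pattern,))]

print(q5_direct_extent())  # ['r1', 'r2']
print(q8("%Orsay%"))       # ['r2']
```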

DBMS Size vs. Schema Triples

⇒ DBMS size scales linearly with the number of schema triples

                                                     SpecRepr                  GenRepr
Aver. triple size (with indexes in parentheses)      0.086 KB (0.1734 KB)      0.1582 KB (0.3062 KB)
Aver. triple storage time (with indexes in par.)     0.0021 sec (0.0025 sec)   0.0025 sec (0.0032 sec)

DBMS Size vs. Data Triples

⇒ DBMS size scales linearly with the number of data triples

                                                     SpecRepr                  GenRepr
Aver. triple size (with indexes in parentheses)      0.123 KB (0.2566 KB)      0.123 KB (0.2706 KB)
Aver. triple storage time (with indexes in par.)     0.0033 sec (0.0043 sec)   0.0039 sec (0.00457 sec)
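Because size and load time scale linearly, the per-triple averages support a back-of-the-envelope capacity estimate. This sketch multiplies the with-index averages from the two tables by a triple count. The one million triples is an arbitrary example, and the conversion assumes 1 MB = 1024 KB. The function name `project` is hypothetical.

```python
# Per-triple averages with indexes, copied from the two tables above.
AVG_KB = {("schema", "SpecRepr"): 0.1734, ("schema", "GenRepr"): 0.3062,
          ("data", "SpecRepr"): 0.2566, ("data", "GenRepr"): 0.2706}
AVG_SEC = {("schema", "SpecRepr"): 0.0025, ("schema", "GenRepr"): 0.0032,
           ("data", "SpecRepr"): 0.0043, ("data", "GenRepr"): 0.00457}

def project(n_triples, kind, representation):
    """Linear projection: (size in MB, loading time in hours), assuming 1 MB = 1024 KB."""
    key = (kind, representation)
    size_mb = n_triples * AVG_KB[key] / 1024
    hours = n_triples * AVG_SEC[key] / 3600
    return round(size_mb, 1), round(hours, 2)

for rep in ("SpecRepr", "GenRepr"):
    print(rep, project(1_000_000, "data", rep))
# SpecRepr (250.6, 1.19)
# GenRepr (264.3, 1.27)
```

For data triples the two representations are close. For schema triples the Generic Representation needs about 1.77 times the space (0.3062 / 0.1734).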

Query Templates for RDF description bases

Pure schema queries

Q1 Find the range (or domain) of a property

Q2 Find the direct subclasses of a class

Q3 Find the transitive subclasses of a class

Q4 Check if a class is a subclass of another class

Queries on resource descriptions using available schema knowledge

Q5 Find the direct extent of a class (or property)

Q6 Find the transitive extent of a class (or property)

Q7 Find if a resource is an instance of a class

Q8 Find the resources having a property with a specific (or range of) value(s)

Q9 Find the instances of a class having a given property

Schema queries for specific resource descriptions

Q10 Find the properties of a resource and their values

Q11 Find the classes under which a resource is classified
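Q3 and Q6 are the templates that need recursion over the class hierarchy. The sketch below expresses them in SQLite with a recursive common table expression over a SubClass table, in the spirit of the Specific Representation. The mini-taxonomy, the class tables and their rows are hypothetical. Class names are stored as strings rather than integer ids for readability. Q6 is the union of the class tables of a class and all its subclasses.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical mini-taxonomy and class tables (one table per class).
con.executescript("""
CREATE TABLE SubClass (sub TEXT, sup TEXT);
INSERT INTO SubClass VALUES ('Hotel', 'Lodging'), ('Vacation-Rentals', 'Lodging'),
                            ('Hotel Directories', 'Hotel');
CREATE TABLE Lodging (uri TEXT);
CREATE TABLE Hotel (uri TEXT);
CREATE TABLE "Vacation-Rentals" (uri TEXT);
CREATE TABLE "Hotel Directories" (uri TEXT);
INSERT INTO Lodging VALUES ('r0');
INSERT INTO Hotel VALUES ('r1'), ('r2');
INSERT INTO "Vacation-Rentals" VALUES ('r3');
INSERT INTO "Hotel Directories" VALUES ('r4');
""")

def q3_transitive_subclasses(cls):
    """Q3: every class below cls, computed by a recursive query over SubClass."""
    sql = """
        WITH RECURSIVE below(c) AS (
            SELECT sub FROM SubClass WHERE sup = ?
            UNION
            SELECT s.sub FROM SubClass s JOIN below b ON s.sup = b.c)
        SELECT c FROM below ORDER BY c"""
    return [c for (c,) in con.execute(sql, (cls,))]

def q6_transitive_extent(cls):
    """Q6: union of the class tables of cls and of all its subclasses."""
    # Table names come from our own SubClass table, so quoting them is sufficient here.
    tables = [cls] + q3_transitive_subclasses(cls)
    sql = " UNION ".join(f'SELECT uri FROM "{t}"' for t in tables)
    return sorted(u for (u,) in con.execute(sql))

print(q3_transitive_subclasses("Lodging"))  # ['Hotel', 'Hotel Directories', 'Vacation-Rentals']
print(q6_transitive_extent("Lodging"))      # ['r0', 'r1', 'r2', 'r3', 'r4']
```

The recursion depth grows with the depth of the taxonomy, and the number of unioned tables grows with its breadth. Both are large in a catalog such as ODP.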

Execution Time of RDF Benchmark Queries

Query   Generic                        Specific
        Case 1   Case 2   Case 3       Case 1   Case 2   Case 3
Q1      0.0015                         0.0012
Q2      0.0017   0.0028   0.02         0.0012   0.0022   0.0124
Q3      0.0460   0.082    344.91       0.0463   0.0612   341.98
Q4      0.033    0.0415   0.0662       0.0333   0.0415   0.0662
Q5      0.0043   0.008    0.04         0.0015   0.0028   0.027
Q6      0.0573   0.315    627.43       0.0508   0.1118   482.45
Q7      0.0034   0.0034   0.0034       0.0016   0.0016   0.0017
Q8      124.20   365.73   675.42       0.0013   0.0069   0.0466
Q9      110.58   117.68   185.7        0.031    0.0338   0.1059
Q10     0.0072   0.0072   0.0072       0.0071   0.0071   0.0076
Q11     0.0035   0.0043   0.0056       0.0013   0.0015   0.0015

Comparison

• The Specific Representation permits the customization of the database representation of RDF metadata
• The Specific Representation outperforms the Generic Representation for all types of queries, except Q3 and Q4 (Case 1) and Q10 (Case 3), where the Generic Representation is marginally faster
  - Q1, Q2, Q5, Q7, Q10, Q11: by a factor of up to 3.73
  - Q3, Q4, Q6: by a factor of up to 2.8
  - Q8, Q9: by a factor of up to 95,538
• The Generic Representation pays a severe penalty for maintaining large tables (Triples, Resources)
  - e.g., queries Q8 and Q9 require (self-)joins of Triples and Resources
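The speed-up factors quoted above can be recomputed directly from the benchmark table. This sketch copies the execution times as (generic, specific) pairs per case and takes the maximum generic/specific ratio per query group. It also lists the (query, case) pairs where the Specific Representation was not faster. Q1 has a single value per representation in the table, so it contributes one pair.

```python
# Execution times from the benchmark table: (generic, specific) pairs for Cases 1-3.
TIMES = {
    "Q1":  [(0.0015, 0.0012)],
    "Q2":  [(0.0017, 0.0012), (0.0028, 0.0022), (0.02, 0.0124)],
    "Q3":  [(0.0460, 0.0463), (0.082, 0.0612), (344.91, 341.98)],
    "Q4":  [(0.033, 0.0333), (0.0415, 0.0415), (0.0662, 0.0662)],
    "Q5":  [(0.0043, 0.0015), (0.008, 0.0028), (0.04, 0.027)],
    "Q6":  [(0.0573, 0.0508), (0.315, 0.1118), (627.43, 482.45)],
    "Q7":  [(0.0034, 0.0016), (0.0034, 0.0016), (0.0034, 0.0017)],
    "Q8":  [(124.20, 0.0013), (365.73, 0.0069), (675.42, 0.0466)],
    "Q9":  [(110.58, 0.031), (117.68, 0.0338), (185.7, 0.1059)],
    "Q10": [(0.0072, 0.0071), (0.0072, 0.0071), (0.0072, 0.0076)],
    "Q11": [(0.0035, 0.0013), (0.0043, 0.0015), (0.0056, 0.0015)],
}

def max_speedup(queries):
    """Largest generic/specific time ratio over the given queries and cases."""
    return max(g / s for q in queries for g, s in TIMES[q])

def slower_with_specific():
    """(query, case) pairs where the specific representation took longer."""
    return [(q, i + 1) for q, pairs in TIMES.items()
            for i, (g, s) in enumerate(pairs) if s > g]

print(round(max_speedup(["Q1", "Q2", "Q5", "Q7", "Q10", "Q11"]), 2))  # 3.73
print(round(max_speedup(["Q3", "Q4", "Q6"]), 1))                      # 2.8
print(int(max_speedup(["Q8", "Q9"])))                                  # 95538
print(slower_with_specific())  # [('Q3', 1), ('Q4', 1), ('Q10', 3)]
```

The 95,538 factor comes from Q8 in Case 1 (124.20 / 0.0013). This is the query whose generic plan needs the self-joins on Triples and Resources.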

Conclusions and Future Work

• Use DB technology for Web Portals
• The RDF/S data model is flexible enough to represent superimposed descriptions of information resources and e-services for various content syndication applications
• RDFSuite addresses the needs of effective and efficient management of RDF descriptions by providing tools for validation, storage and querying
  - First set of benchmark queries for RDF description bases
  - First implementation of an experimental framework for real-scale RDF applications
• Ongoing efforts:
  - an appropriate encoding of taxonomic schema relationships to optimize, e.g., subclass computation (Q3, Q6)
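The slides do not say which encoding is being pursued. One well-known option for tree-shaped taxonomies is pre-order interval labelling, and it is shown here only as an illustration, not as the authors' method. Each class gets its pre-order number and the largest pre-order number in its subtree. Q4 then becomes a constant-time comparison, and Q3 becomes a range condition that a database could answer with an index on the pre-order column. The sketch assumes single inheritance; RDFS allows multiple subclassing, which this simple scheme does not cover. The hierarchy is hypothetical.

```python
def interval_encode(children, root):
    """Label each class with (pre, last): its pre-order number and the largest
    pre-order number in its subtree. Assumes a tree (single inheritance)."""
    labels, counter = {}, 0

    def visit(node):
        nonlocal counter
        counter += 1
        pre = counter
        for child in children.get(node, []):
            visit(child)
        labels[node] = (pre, counter)

    visit(root)
    return labels

def is_subclass(labels, d, c):
    """Q4 as a constant-time test: d is strictly below c iff pre(c) < pre(d) <= last(c)."""
    pre_c, last_c = labels[c]
    return pre_c < labels[d][0] <= last_c

def subclasses(labels, c):
    """Q3 as a range condition on pre-order numbers instead of a recursive traversal."""
    pre_c, last_c = labels[c]
    return sorted(n for n, (pre, _) in labels.items() if pre_c < pre <= last_c)

# Hypothetical fragment of the Regional hierarchy.
tree = {"Regional": ["Ile-de-France"], "Ile-de-France": ["Paris"], "Paris": ["Travel"],
        "Travel": ["Lodging"], "Lodging": ["Hotel", "Vacation-Rentals"]}
labels = interval_encode(tree, "Regional")
print(subclasses(labels, "Lodging"))          # ['Hotel', 'Vacation-Rentals']
print(is_subclass(labels, "Hotel", "Paris"))  # True
print(is_subclass(labels, "Paris", "Hotel"))  # False
```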

Acknowledgements

• Funding was generously provided by the projects:
  - C-WEB (IST-1999-13479): “A Generic Platform Supporting Community Webs”
  - MESMUSES (IST-2000-26074): “Metaphor for Science Museums”

Summary and Outlook

• RDFSuite addresses the needs of effective RDF metadata management by providing tools for validation, storage and querying
  - validation follows a formal data model and constraints enforcing the consistency of RDF schemas
  - incremental loading of voluminous description bases into a persistent store
  - declarative query language for schema and data querying
• Ongoing efforts:
  - RQL query optimization
  - transactional aspects
  - alternative encoding and representation schemes for access optimization