Slides of my presentation at WWW 2012 PhD Symposium
Javier D. Fernández
Supervised by: Miguel A. Martínez-Prieto and Claudio Gutierrez
University of Valladolid (Spain)
University of Chile (Chile)
Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data
PhD Symposium
(1) Resource Description Framework
Webs, services, protocols
Persons, Proteins, geography…
(2) A standard model for data exchange on the Web
Understandable by computers
(3) W3C Recommendation (2004)
(4) Data model
(subject, predicate, object)
Brief RDF Introduction
<http://books/book21> (URI)
<http://books/author33> (URI)
<http://myblog/lectures> (URI)
“Spain in the Heart” (literal)
“Pablo Neruda” (literal)
lectures:to_read_list
_collection (blank node)
(subject, predicate, object) ∈ (U ∪ B) × U × (U ∪ B ∪ L), where U = URIs, B = blank nodes, L = literals
RDF Example
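The constraint on which kinds of terms may appear in each position of a triple can be sketched in a few lines. This is only an illustration: the class names `URI`, `Blank` and `Literal` and the helper `is_valid_triple` are hypothetical, and `dc:title` is used as an abbreviated predicate name for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Blank:
    label: str


@dataclass(frozen=True)
class Literal:
    text: str


def is_valid_triple(s, p, o) -> bool:
    """Check a triple against (U or B) x U x (U or B or L)."""
    return (
        isinstance(s, (URI, Blank))
        and isinstance(p, URI)
        and isinstance(o, (URI, Blank, Literal))
    )


book = URI("http://books/book21")
title = URI("dc:title")  # abbreviated predicate name, for illustration only
heart = Literal("Spain in the Heart")

print(is_valid_triple(book, title, heart))  # literal in object position
print(is_valid_triple(heart, title, book))  # literal in subject position
```

The second call is rejected because a literal may only appear as an object.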
Image: Danilo Rizzuti / FreeDigitalPhotos.net
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
DBpedia (en): 233 M triples, ~33 GB
Uniprot: 845 M triples, ~230 GB
Publish?
Exchange?
Process/Consume/Query?
Scalability problems
RDF Publication
No recommendations or methodology for publishing at large scale
Related Work: Some metadata for discovery, such as VoID and Semantic Sitemaps.
RDF dump
SPARQL Endpoints/
APIs
dereferenceable URIs
sensor
RDF Exchanging issues
RDF/XML, N3, Turtle, JSON.
Document-centric view (verbose) rather than data-centric view (machine)
No structure (chunks, universal compression)
Related Work: Universal compression (gzip, bzip2) and the Efficient XML
Interchange Format (EXI).
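As a rough illustration of the universal-compression baseline, the sketch below gzips a small synthetic N-Triples document using Python's standard `gzip` module. The triples are made up for the example (they are not taken from DBpedia or Uniprot), so the sizes it prints say nothing about real datasets.

```python
import gzip

# Synthetic N-Triples lines: long URIs repeat on every line, which is
# the kind of verbosity a general-purpose compressor can exploit.
lines = [
    f'<http://books/book{i}> <http://purl.org/dc/elements/1.1/title> "Title {i}" .'
    for i in range(1000)
]
raw = "\n".join(lines).encode("utf-8")
packed = gzip.compress(raw)

print(f"plain: {len(raw)} bytes, gzip: {len(packed)} bytes")

# The compressed file must be fully decompressed before any triple can be read.
restored = gzip.decompress(packed)
```

The last step is the relevant limitation: the compressed bytes carry no RDF structure, so nothing can be queried until the whole file is restored.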
Image: renjith krishnan / FreeDigitalPhotos.net
RDF Processing/Consumption (After Exchanging)
Costly Post-processing
Decompression
Indexing (RDF Store)
Finally… consume
Related Work (indexing): Relational storage (Virtuoso), multi-indexes (RDF3X), distributed systems (Map-Reduce) and others (Bit-Mat).
The scalability problems mainly impact users
Would you download hundreds of GB...
… if you don’t know exactly what they contain,
that need costly exchange and post-processing,
and require a powerful store to query them?
In the following...
1. Proposed approach for scalable publishing, exchanging and consumption
of large RDF datasets
2. Preliminary results
3. Methodology
4. On-going work and conclusions
Image: jscreationzs / FreeDigitalPhotos.net
An integrated solution
We call for, and we study in this thesis, a Binary RDF Serialization format:
Machine oriented (binary)
Clean publication
Metadata
Modular
Efficient exchange
Compression
Basic data operations
Easy to parse and consume
Primitive query resolution
HDT Overview
Dictionary+Triples partition
<http://books/author33>
<http://books/book21>
dc:author
dc:title
foaf:name
“Pablo Neruda”
“Spain in the Heart”
[Figure: the Dictionary maps the seven terms above to the integer IDs 1 to 7; the Triples component stores the graph using those IDs.]
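The Dictionary+Triples split can be sketched as follows. The three triples are an assumed reading of the example graph (the slide only lists the terms), the helpers `build_dictionary`, `encode` and `decode` are hypothetical, and the single ID space in sorted term order is a simplification chosen for this sketch, so the IDs need not match those in the figure.

```python
def build_dictionary(triples):
    """Assign each distinct term an integer ID, starting at 1.

    IDs follow sorted term order; this ordering is an assumption of the
    sketch, not necessarily the one used in the slides.
    """
    terms = sorted({term for triple in triples for term in triple})
    return {term: i for i, term in enumerate(terms, start=1)}


def encode(triples, term_to_id):
    """Replace every term by its ID."""
    return [tuple(term_to_id[t] for t in triple) for triple in triples]


def decode(id_triples, term_to_id):
    """Map ID triples back to terms."""
    id_to_term = {i: t for t, i in term_to_id.items()}
    return [tuple(id_to_term[i] for i in triple) for triple in id_triples]


# Assumed triples over the seven terms listed on the slide.
triples = [
    ("<http://books/book21>", "dc:title", '"Spain in the Heart"'),
    ("<http://books/book21>", "dc:author", "<http://books/author33>"),
    ("<http://books/author33>", "foaf:name", '"Pablo Neruda"'),
]

term_to_id = build_dictionary(triples)
id_triples = encode(triples, term_to_id)
print(id_triples)  # [(4, 6, 2), (4, 5, 3), (3, 7, 1)]
```

Each long term is stored once in the dictionary, and the graph itself shrinks to small integer tuples.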
Key concepts: The Dictionary
Largest component (up to 74%)
Long URIs, shared prefixes
Lang, datatype tags in literals
Efficient ID-to-string and string-to-ID operations
We plan to work on a specific organization which
Optimizes space (regularities)
Provides efficient performance in operations
Preliminary results in Rich Functional Dictionaries
We propose to adapt techniques for string dictionaries:
Front-Coding
Making dictionary partitions
[*] Compression of RDF Dictionaries. Miguel A. Martínez-Prieto, Javier D. Fernández, Rodrigo Cánovas. ACM Symposium on Applied Computing (SAC 2012).
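Front-Coding exploits the shared prefixes of sorted URIs: each string is stored as the length of the prefix it shares with the previous string plus the remaining suffix. The sketch below shows the plain textbook form of the technique; the function names are hypothetical, and the dictionaries in the cited paper add further structure (such as partitions) that is not reproduced here.

```python
def front_encode(sorted_strings):
    """Encode a sorted list as (shared_prefix_length, suffix) pairs."""
    encoded, prev = [], ""
    for s in sorted_strings:
        k = 0
        limit = min(len(prev), len(s))
        while k < limit and prev[k] == s[k]:
            k += 1
        encoded.append((k, s[k:]))
        prev = s
    return encoded


def front_decode(encoded):
    """Rebuild the original strings from the (length, suffix) pairs."""
    decoded, prev = [], ""
    for k, suffix in encoded:
        prev = prev[:k] + suffix
        decoded.append(prev)
    return decoded


uris = sorted([
    "http://books/author33",
    "http://books/book21",
    "http://myblog/lectures",
])
encoded = front_encode(uris)
print(encoded)
# [(0, 'http://books/author33'), (13, 'book21'), (7, 'myblog/lectures')]
```

Decoding a string requires walking from the start of the sequence, which is why practical layouts usually restart the encoding at regular intervals.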
Key concepts: Triples
Specific compression:
More efficient compression than just gzip.
Data indexing for consumption:
Allows direct resolution of triple patterns without decompression:
(s,p,o), (s,?p,?o) and (s,p,?o)
We plan to work on a specific technique which
optimizes space
provides efficient performance in primitive operations
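The three patterns listed above all bind the subject, so they can be answered by a range search over triples sorted in subject-predicate-object order. The sketch below uses a plain sorted Python list and binary search as a stand-in for a compact index; the function `match` and the sample IDs are hypothetical.

```python
from bisect import bisect_left, bisect_right


def match(sorted_triples, s, p=None, o=None):
    """Resolve (s,p,o), (s,p,?o) or (s,?p,?o) on SPO-sorted ID triples.

    None stands for a variable. Other patterns are out of scope here.
    """
    if p is None and o is not None:
        raise ValueError("(s,?p,o) is not covered by this sketch")
    prefix = tuple(x for x in (s, p, o) if x is not None)
    # Pad the upper bound with infinity so it sorts after any real ID.
    upper = prefix + (float("inf"),) * (3 - len(prefix))
    lo = bisect_left(sorted_triples, prefix)
    hi = bisect_right(sorted_triples, upper)
    return sorted_triples[lo:hi]


id_triples = sorted([(4, 6, 2), (4, 5, 3), (3, 7, 1)])

print(match(id_triples, 4))        # (s,?p,?o)
print(match(id_triples, 4, 6))     # (s,p,?o)
print(match(id_triples, 3, 7, 1))  # (s,p,o)
```

Because the bound terms always form a prefix of the sort order, every answer is a contiguous slice of the list.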
Preliminary results in Triples Encoding
We propose to use Bitmap indexes:
[*] Compact Representation of Large RDF Data Sets for Publishing and Exchange. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutierrez. International Semantic Web Conference (ISWC 2010).
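A simplified sketch of a bitmap-based triples encoding: SPO-sorted ID triples are flattened into a predicate sequence and an object sequence, with two bitmaps marking where each list ends, so subjects never need to be stored explicitly. It assumes subject IDs are the consecutive integers 1..n, and the choice of marking the last element of each list with a 1 bit is a convention of this sketch; the encoding in the cited paper may differ in detail. The function names are hypothetical.

```python
def bitmap_encode(sorted_triples):
    """Encode SPO-sorted ID triples as two ID sequences and two bitmaps.

    pred_bits[i] == 1 marks the last predicate of a subject;
    obj_bits[j] == 1 marks the last object of a (subject, predicate) pair.
    """
    preds, pred_bits, objs, obj_bits = [], [], [], []
    prev_s = prev_p = None
    for s, p, o in sorted_triples:
        if s != prev_s or p != prev_p:
            if obj_bits:
                obj_bits[-1] = 1   # close the previous object list
            if s != prev_s and pred_bits:
                pred_bits[-1] = 1  # close the previous predicate list
            preds.append(p)
            pred_bits.append(0)
        objs.append(o)
        obj_bits.append(0)
        prev_s, prev_p = s, p
    if pred_bits:
        pred_bits[-1] = 1
        obj_bits[-1] = 1
    return preds, pred_bits, objs, obj_bits


def bitmap_decode(preds, pred_bits, objs, obj_bits):
    """Rebuild the triples; subjects are implied by the predicate bitmap."""
    triples, s, pi = [], 1, 0
    for o, last_obj in zip(objs, obj_bits):
        triples.append((s, preds[pi], o))
        if last_obj:
            if pred_bits[pi]:
                s += 1
            pi += 1
    return triples


sample = [(1, 2, 5), (1, 2, 6), (1, 3, 7), (2, 2, 5)]
encoded = bitmap_encode(sample)
print(encoded)  # ([2, 3, 2], [0, 1, 1], [5, 6, 7, 5], [0, 1, 1, 1])
```

Four triples (twelve IDs) become seven IDs and seven bits, and the original order can still be traversed without unpacking anything else.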
Methodology
RDF structure in theory and practice.
Binary RDF Specification.
Succinct Dictionaries.
Triples Indexes.
Practical deployment.
Some Results… HDT acknowledged as a W3C Member Submission:
http://www.w3.org/Submission/2011/03/
supported by:
Some Results... HDT for exchanging
Some Results... HDT for consumption
Direct Consumption, without decompression after exchanging
Example of use: HDT-it (Thanks to Mario Arias, DERI)
On-going promising work: HDT-FoQ
[*] Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto, Mario Arias, Javier D. Fernández. Extended Semantic Web Conference (ESWC 2012). To appear.
In conclusion
Binary RDF aims to make the Web of Data more lightweight:
Logical decomposition: Header, Dictionary, and Triples
Clean publication
Compressed RDF format for exchanging
Machine-friendly, direct consumption
Rich Functional Dictionary/Triples representations for querying
Still much work on…
Getting a global understanding of the real structure of RDF networks.
Applying this knowledge in innovative dictionary and triples indexes.
Supporting full SPARQL at consumption
Supporting dynamic operations
inserting, deleting, and updating binary RDF
Thanks!
HDT: http://www.rdfhdt.org/
Group: http://dataweb.infor.uva.es/
Slides: http://www.slideshare.net/javifer
Javier D. Fernández ([email protected])
Supervised by: Miguel A. Martínez-Prieto, Claudio Gutierrez
University of Valladolid (Spain)
University of Chile (Chile)