Upload
christophe-gueret
View
474
Download
0
Tags:
Embed Size (px)
DESCRIPTION
RDF is increasingly being used to represent large amounts of data on the Web. Current query evaluation strategies for RDF are inspired by databases, assuming perfect answers on finite repositories. In this paper, we present a novel query method based on evolutionary computing, which allows us to handle uncertainty, incompleteness and unsatisfiability, and deal with large datasets, all within a single conceptual framework. Our technique supports approximate answers with anytime behaviour. We present initial results and analyse next steps for improvement.
Citation preview
An Evolutionary Perspective onApproximate RDF Query Answering
Christophe Guéret, Eyal Oren, Stefan Schlobach,Frank van Harmelen and Martijn Schut
Vrije Universiteit, Amsterdam
griffioen
Problem and context Method proposed Experimental results Conclusion
The next 30 minutes in 4 points...
I RDF
I Data on the WebI Inconsistent, uncertain, heterogeneous, Huge and growing!
I RDF Query answering
I Finding data matching criterionI ... many queries are actually not satisfiable
I Approximate RDF Query answering
I Finding some, almost valid, data
I The Evolutionary Perspective
I Test different solutionsI Progressive optimisation of the result
SUM 2008 - October 2, 2008 2 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
The next 30 minutes in 4 points...
I RDFI Data on the Web
I Inconsistent, uncertain, heterogeneous, Huge and growing!
I RDF Query answering
I Finding data matching criterionI ... many queries are actually not satisfiable
I Approximate RDF Query answering
I Finding some, almost valid, data
I The Evolutionary Perspective
I Test different solutionsI Progressive optimisation of the result
SUM 2008 - October 2, 2008 2 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
The next 30 minutes in 4 points...
I RDFI Data on the WebI Inconsistent, uncertain, heterogeneous, Huge and growing!
I RDF Query answering
I Finding data matching criterionI ... many queries are actually not satisfiable
I Approximate RDF Query answering
I Finding some, almost valid, data
I The Evolutionary Perspective
I Test different solutionsI Progressive optimisation of the result
SUM 2008 - October 2, 2008 2 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
The next 30 minutes in 4 points...
I RDFI Data on the WebI Inconsistent, uncertain, heterogeneous, Huge and growing!
I RDF Query answering
I Finding data matching criterionI ... many queries are actually not satisfiable
I Approximate RDF Query answering
I Finding some, almost valid, data
I The Evolutionary Perspective
I Test different solutionsI Progressive optimisation of the result
SUM 2008 - October 2, 2008 2 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
The next 30 minutes in 4 points...
I RDFI Data on the WebI Inconsistent, uncertain, heterogeneous, Huge and growing!
I RDF Query answeringI Finding data matching criterion
I ... many queries are actually not satisfiable
I Approximate RDF Query answering
I Finding some, almost valid, data
I The Evolutionary Perspective
I Test different solutionsI Progressive optimisation of the result
SUM 2008 - October 2, 2008 2 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
The next 30 minutes in 4 points...
I RDFI Data on the WebI Inconsistent, uncertain, heterogeneous, Huge and growing!
I RDF Query answeringI Finding data matching criterionI ... many queries are actually not satisfiable
I Approximate RDF Query answering
I Finding some, almost valid, data
I The Evolutionary Perspective
I Test different solutionsI Progressive optimisation of the result
SUM 2008 - October 2, 2008 2 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
The next 30 minutes in 4 points...
I RDFI Data on the WebI Inconsistent, uncertain, heterogeneous, Huge and growing!
I RDF Query answeringI Finding data matching criterionI ... many queries are actually not satisfiable
I Approximate RDF Query answering
I Finding some, almost valid, data
I The Evolutionary Perspective
I Test different solutionsI Progressive optimisation of the result
SUM 2008 - October 2, 2008 2 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
The next 30 minutes in 4 points...
I RDFI Data on the WebI Inconsistent, uncertain, heterogeneous, Huge and growing!
I RDF Query answeringI Finding data matching criterionI ... many queries are actually not satisfiable
I Approximate RDF Query answeringI Finding some, almost valid, data
I The Evolutionary Perspective
I Test different solutionsI Progressive optimisation of the result
SUM 2008 - October 2, 2008 2 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
The next 30 minutes in 4 points...
I RDFI Data on the WebI Inconsistent, uncertain, heterogeneous, Huge and growing!
I RDF Query answeringI Finding data matching criterionI ... many queries are actually not satisfiable
I Approximate RDF Query answeringI Finding some, almost valid, data
I The Evolutionary Perspective
I Test different solutionsI Progressive optimisation of the result
SUM 2008 - October 2, 2008 2 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
The next 30 minutes in 4 points...
I RDFI Data on the WebI Inconsistent, uncertain, heterogeneous, Huge and growing!
I RDF Query answeringI Finding data matching criterionI ... many queries are actually not satisfiable
I Approximate RDF Query answeringI Finding some, almost valid, data
I The Evolutionary PerspectiveI Test different solutions
I Progressive optimisation of the result
SUM 2008 - October 2, 2008 2 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
The next 30 minutes in 4 points...
I RDFI Data on the WebI Inconsistent, uncertain, heterogeneous, Huge and growing!
I RDF Query answeringI Finding data matching criterionI ... many queries are actually not satisfiable
I Approximate RDF Query answeringI Finding some, almost valid, data
I The Evolutionary PerspectiveI Test different solutionsI Progressive optimisation of the result
SUM 2008 - October 2, 2008 2 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
1 What’s the problem ?Querying RDF datastoresStandard techniques
2 And Now for Something Completely DifferentGuessing the solution insteadThe way we do it
3 Does it work ?Evolution of the qualitySome characteristics of this method
4 TODO list
SUM 2008 - October 2, 2008 3 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
1 What’s the problem ?Querying RDF datastoresStandard techniques
2 And Now for Something Completely DifferentGuessing the solution insteadThe way we do it
3 Does it work ?Evolution of the qualitySome characteristics of this method
4 TODO list
SUM 2008 - October 2, 2008 4 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Example
I RDF dataset<Ullman88> type Book .<Ullman88> label "Principles of Database and
Knowledge-Base Systems" .<Ullman88> author b1 .b1 _1 ullman .ullman homepage <http://stanford.edu/~ullman/> .
I SPARQL querySELECT ?title WHERE {?publication type Book .?publication label ?title .}
I Expected answer?title = "Principles of Database and Knowledge-Base Systems"
SUM 2008 - October 2, 2008 5 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Problem description
I Triple =subject,predicate,object
I Dataset =graph oftriples
I Querying :find apattern inthe graph
A query and a graph [PSPARQL07]SUM 2008 - October 2, 2008 6 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Standard techniques
I Standard approach :
1 Find all the possible results for ?publication typeBook
I?publication<Ullman88>
2 Find all the possible results for ?publication label?title
I?publication ?title<Ullman88> "Principles of ..."
3 Do a join on the two tables and return the result
I ?title = "Principles of ..."
I Fast thanks to the creation of indexes and queryoptimisation
SUM 2008 - October 2, 2008 7 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Standard techniques
I Standard approach :1 Find all the possible results for ?publication type
Book
I?publication<Ullman88>
2 Find all the possible results for ?publication label?title
I?publication ?title<Ullman88> "Principles of ..."
3 Do a join on the two tables and return the result
I ?title = "Principles of ..."
I Fast thanks to the creation of indexes and queryoptimisation
SUM 2008 - October 2, 2008 7 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Standard techniques
I Standard approach :1 Find all the possible results for ?publication type
Book
I?publication<Ullman88>
2 Find all the possible results for ?publication label?title
I?publication ?title<Ullman88> "Principles of ..."
3 Do a join on the two tables and return the result
I ?title = "Principles of ..."
I Fast thanks to the creation of indexes and queryoptimisation
SUM 2008 - October 2, 2008 7 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Standard techniques
I Standard approach :1 Find all the possible results for ?publication type
Book
I?publication<Ullman88>
2 Find all the possible results for ?publication label?title
I?publication ?title<Ullman88> "Principles of ..."
3 Do a join on the two tables and return the result
I ?title = "Principles of ..."
I Fast thanks to the creation of indexes and queryoptimisation
SUM 2008 - October 2, 2008 7 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Standard techniques
I Standard approach :1 Find all the possible results for ?publication type
Book
I?publication<Ullman88>
2 Find all the possible results for ?publication label?title
I?publication ?title<Ullman88> "Principles of ..."
3 Do a join on the two tables and return the result
I ?title = "Principles of ..."
I Fast thanks to the creation of indexes and queryoptimisation
SUM 2008 - October 2, 2008 7 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Standard techniques
I Standard approach :1 Find all the possible results for ?publication type
Book
I?publication<Ullman88>
2 Find all the possible results for ?publication label?title
I?publication ?title<Ullman88> "Principles of ..."
3 Do a join on the two tables and return the result
I ?title = "Principles of ..."
I Fast thanks to the creation of indexes and queryoptimisation
SUM 2008 - October 2, 2008 7 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Standard techniques
I Standard approach :1 Find all the possible results for ?publication type
Book
I?publication<Ullman88>
2 Find all the possible results for ?publication label?title
I?publication ?title<Ullman88> "Principles of ..."
3 Do a join on the two tables and return the resultI ?title = "Principles of ..."
I Fast thanks to the creation of indexes and queryoptimisation
SUM 2008 - October 2, 2008 7 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Standard techniques
I Standard approach :1 Find all the possible results for ?publication type
Book
I?publication<Ullman88>
2 Find all the possible results for ?publication label?title
I?publication ?title<Ullman88> "Principles of ..."
3 Do a join on the two tables and return the resultI ?title = "Principles of ..."
I Fast thanks to the creation of indexes and queryoptimisation
SUM 2008 - October 2, 2008 7 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Motivation
I Designed to return results only when there are someI Not designed for incomplete and approximate
queries/answersI Hard to distribute
I Approximate answers to precise queries
I If the query is unsat, return the best almost sat solutionfound
I Precises answers to approximate queries
I Return a subset of existing solutions instead of showingthem all
I Interactive querying
I Use of intermediate results to help the user improving hisquery
SUM 2008 - October 2, 2008 8 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Motivation
I Designed to return results only when there are someI Not designed for incomplete and approximate
queries/answersI Hard to distribute
I Approximate answers to precise queriesI If the query is unsat, return the best almost sat solution
found
I Precises answers to approximate queries
I Return a subset of existing solutions instead of showingthem all
I Interactive querying
I Use of intermediate results to help the user improving hisquery
SUM 2008 - October 2, 2008 8 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Motivation
I Designed to return results only when there are someI Not designed for incomplete and approximate
queries/answersI Hard to distribute
I Approximate answers to precise queriesI If the query is unsat, return the best almost sat solution
found
I Precises answers to approximate queriesI Return a subset of existing solutions instead of showing
them all
I Interactive querying
I Use of intermediate results to help the user improving hisquery
SUM 2008 - October 2, 2008 8 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Motivation
I Designed to return results only when there are someI Not designed for incomplete and approximate
queries/answersI Hard to distribute
I Approximate answers to precise queriesI If the query is unsat, return the best almost sat solution
found
I Precises answers to approximate queriesI Return a subset of existing solutions instead of showing
them all
I Interactive queryingI Use of intermediate results to help the user improving his
query
SUM 2008 - October 2, 2008 8 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
1 What’s the problem ?Querying RDF datastoresStandard techniques
2 And Now for Something Completely DifferentGuessing the solution insteadThe way we do it
3 Does it work ?Evolution of the qualitySome characteristics of this method
4 TODO list
SUM 2008 - October 2, 2008 9 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Approach
I “I’m Feeling Lucky” approach :
1 Assign some random values to the variables
I?publication = <Ullman88>?title = Book
2 Verify if the solution is valid
I
Triple Is in the graph ?<Ullman88> type Book yes<Ullman88> label Book no
3 If the solution is OK, stop. Otherwise, try again withsomething else
I Rely on membership testing (instead of lookup)I The testing loop can be stopped at any timeI A result may satisfy part of the query
SUM 2008 - October 2, 2008 10 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Approach
I “I’m Feeling Lucky” approach :1 Assign some random values to the variables
I?publication = <Ullman88>?title = Book
2 Verify if the solution is valid
I
Triple Is in the graph ?<Ullman88> type Book yes<Ullman88> label Book no
3 If the solution is OK, stop. Otherwise, try again withsomething else
I Rely on membership testing (instead of lookup)I The testing loop can be stopped at any timeI A result may satisfy part of the query
SUM 2008 - October 2, 2008 10 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Approach
I “I’m Feeling Lucky” approach :1 Assign some random values to the variables
I?publication = <Ullman88>?title = Book
2 Verify if the solution is valid
I
Triple Is in the graph ?<Ullman88> type Book yes<Ullman88> label Book no
3 If the solution is OK, stop. Otherwise, try again withsomething else
I Rely on membership testing (instead of lookup)I The testing loop can be stopped at any timeI A result may satisfy part of the query
SUM 2008 - October 2, 2008 10 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Approach
I “I’m Feeling Lucky” approach :1 Assign some random values to the variables
I?publication = <Ullman88>?title = Book
2 Verify if the solution is valid
I
Triple Is in the graph ?<Ullman88> type Book yes<Ullman88> label Book no
3 If the solution is OK, stop. Otherwise, try again withsomething else
I Rely on membership testing (instead of lookup)I The testing loop can be stopped at any timeI A result may satisfy part of the query
SUM 2008 - October 2, 2008 10 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Approach
I “I’m Feeling Lucky” approach :1 Assign some random values to the variables
I?publication = <Ullman88>?title = Book
2 Verify if the solution is valid
I
Triple Is in the graph ?<Ullman88> type Book yes<Ullman88> label Book no
3 If the solution is OK, stop. Otherwise, try again withsomething else
I Rely on membership testing (instead of lookup)I The testing loop can be stopped at any timeI A result may satisfy part of the query
SUM 2008 - October 2, 2008 10 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Approach
I “I’m Feeling Lucky” approach :1 Assign some random values to the variables
I?publication = <Ullman88>?title = Book
2 Verify if the solution is valid
I
Triple Is in the graph ?<Ullman88> type Book yes<Ullman88> label Book no
3 If the solution is OK, stop. Otherwise, try again withsomething else
I Rely on membership testing (instead of lookup)I The testing loop can be stopped at any timeI A result may satisfy part of the query
SUM 2008 - October 2, 2008 10 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Approach
I “I’m Feeling Lucky” approach :1 Assign some random values to the variables
I?publication = <Ullman88>?title = Book
2 Verify if the solution is valid
I
Triple Is in the graph ?<Ullman88> type Book yes<Ullman88> label Book no
3 If the solution is OK, stop. Otherwise, try again withsomething else
I Rely on membership testing (instead of lookup)
I The testing loop can be stopped at any timeI A result may satisfy part of the query
SUM 2008 - October 2, 2008 10 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Approach
I “I’m Feeling Lucky” approach :1 Assign some random values to the variables
I?publication = <Ullman88>?title = Book
2 Verify if the solution is valid
I
Triple Is in the graph ?<Ullman88> type Book yes<Ullman88> label Book no
3 If the solution is OK, stop. Otherwise, try again withsomething else
I Rely on membership testing (instead of lookup)I The testing loop can be stopped at any time
I A result may satisfy part of the query
SUM 2008 - October 2, 2008 10 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Approach
I “I’m Feeling Lucky” approach :1 Assign some random values to the variables
I?publication = <Ullman88>?title = Book
2 Verify if the solution is valid
I
Triple Is in the graph ?<Ullman88> type Book yes<Ullman88> label Book no
3 If the solution is OK, stop. Otherwise, try again withsomething else
I Rely on membership testing (instead of lookup)I The testing loop can be stopped at any timeI A result may satisfy part of the query
SUM 2008 - October 2, 2008 10 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Our choices
I Need to pay attention to two aspects
1 Each try should be a step closer to the solution
I Random guessing may never endI Stopping the process at t + 1 should give better results than
at t
2 Testing a candidate solution must be fast
I Will try a lot of solutions
I We made the following choices
I Generation of solutions : Evolutionary algorithmI Verification of solutions : Bloom filter based testing
SUM 2008 - October 2, 2008 11 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Our choices
I Need to pay attention to two aspects1 Each try should be a step closer to the solution
I Random guessing may never endI Stopping the process at t + 1 should give better results than
at t
2 Testing a candidate solution must be fast
I Will try a lot of solutions
I We made the following choices
I Generation of solutions : Evolutionary algorithmI Verification of solutions : Bloom filter based testing
SUM 2008 - October 2, 2008 11 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Our choices
I Need to pay attention to two aspects1 Each try should be a step closer to the solution
I Random guessing may never endI Stopping the process at t + 1 should give better results than
at t2 Testing a candidate solution must be fast
I Will try a lot of solutions
I We made the following choices
I Generation of solutions : Evolutionary algorithmI Verification of solutions : Bloom filter based testing
SUM 2008 - October 2, 2008 11 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Our choices
I Need to pay attention to two aspects1 Each try should be a step closer to the solution
I Random guessing may never endI Stopping the process at t + 1 should give better results than
at t2 Testing a candidate solution must be fast
I Will try a lot of solutions
I We made the following choicesI Generation of solutions : Evolutionary algorithmI Verification of solutions : Bloom filter based testing
SUM 2008 - October 2, 2008 11 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Binary Bloom filters (1/2)
I Compact representation of information : a set of n = 8 bits
1 2 3 4 5 6 7 8
I Supports two operationsI INSERT(KEY) : Insert a key into the filterI CONTAINS(KEY) : Test for the presence of a key
I Use k = 3 hash functions to compute a set of bits from akey
HASH3(“HELLO WORLD”)=3HASH2(“HELLO WORLD”)=6HASH1(“HELLO WORLD”)=8
SUM 2008 - October 2, 2008 12 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Binary Bloom filters (2/2)
I INSERT(“HELLO WORLD”)Current
“Hello world”
New
OR
=
I Bit-wise or operation
I Always successful (i.e.unlimited capacity)
I Precision depends ofnumber of elements m.
I CONTAINS(“BONJOUR !”)Current
“Bonjour !”
Test result
AND
=
I Bit-wise and operation
I Positive result can be acollision
perror = (1− e−knm )k
SUM 2008 - October 2, 2008 13 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
A first (naive) approach
I Insert all the triples into a unique Bloom filter.I INSERT(“<Ullman88>_type_Book”)I INSERT(“<Ullman88>_label_"Principles of ..."”)I . . .
I Use the CONTAINS operation to verify a solution
I CONTAINS(“<Ullman88>_type_Book”)⇒ trueI CONTAINS(“<Ullman88>_label_Book”)⇒ false
I Not the best approach ! Let’s see what happen in detail . . .
?publication label ?title
CONTAINS(“<Ullman88>_label_Book”)
modify ?publication and ?title
SUM 2008 - October 2, 2008 14 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
A first (naive) approach
I Insert all the triples into a unique Bloom filter.I INSERT(“<Ullman88>_type_Book”)I INSERT(“<Ullman88>_label_"Principles of ..."”)I . . .
I Use the CONTAINS operation to verify a solutionI CONTAINS(“<Ullman88>_type_Book”)⇒ trueI CONTAINS(“<Ullman88>_label_Book”)⇒ false
I Not the best approach ! Let’s see what happen in detail . . .
?publication label ?title
CONTAINS(“<Ullman88>_label_Book”)
modify ?publication and ?title
SUM 2008 - October 2, 2008 14 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
A first (naive) approach
I Insert all the triples into a unique Bloom filter.I INSERT(“<Ullman88>_type_Book”)I INSERT(“<Ullman88>_label_"Principles of ..."”)I . . .
I Use the CONTAINS operation to verify a solutionI CONTAINS(“<Ullman88>_type_Book”)⇒ trueI CONTAINS(“<Ullman88>_label_Book”)⇒ false
I Not the best approach ! Let’s see what happen in detail . . .
?publication label ?title
CONTAINS(“<Ullman88>_label_Book”)
modify ?publication and ?title
SUM 2008 - October 2, 2008 14 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
A first (naive) approach
I Insert all the triples into a unique Bloom filter.I INSERT(“<Ullman88>_type_Book”)I INSERT(“<Ullman88>_label_"Principles of ..."”)I . . .
I Use the CONTAINS operation to verify a solutionI CONTAINS(“<Ullman88>_type_Book”)⇒ trueI CONTAINS(“<Ullman88>_label_Book”)⇒ false
I Not the best approach ! Let’s see what happen in detail . . .
?publication label ?title
CONTAINS(“<Ullman88>_label_Book”)
modify ?publication and ?title
SUM 2008 - October 2, 2008 14 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Graph parsing
I Every triple of the graph is inserted into 4 Bloom filters
type<Ullman88> Book
<Ullman88>_type_Book
SPO
<Ullman88>_type
SP
type_Book
PO
<Ullman88>_Book
SO
I Three domains are definedS = <Ullman88> b1 ullmanP = type label author _1 homepageO = Book "Principles of ..." b1 ullman <http://...>
I Each term is replaced by an integer (with a dictionary)I <Ullman88>→ 46
SUM 2008 - October 2, 2008 15 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Graph parsing
I Every triple of the graph is inserted into 4 Bloom filters
type<Ullman88> Book
<Ullman88>_type_Book
SPO
<Ullman88>_type
SP
type_Book
PO
<Ullman88>_Book
SO
I Three domains are definedS = <Ullman88> b1 ullmanP = type label author _1 homepageO = Book "Principles of ..." b1 ullman <http://...>
I Each term is replaced by an integer (with a dictionary)I <Ullman88>→ 46
SUM 2008 - October 2, 2008 15 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Graph parsing
I Every triple of the graph is inserted into 4 Bloom filters
type<Ullman88> Book
<Ullman88>_type_Book
SPO
<Ullman88>_type
SP
type_Book
PO
<Ullman88>_Book
SO
I Three domains are definedS = <Ullman88> b1 ullmanP = type label author _1 homepageO = Book "Principles of ..." b1 ullman <http://...>
I Each term is replaced by an integer (with a dictionary)I <Ullman88>→ 46
SUM 2008 - October 2, 2008 15 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Graph parsing
I Every triple of the graph is inserted into 4 Bloom filters
type<Ullman88> Book
<Ullman88>_type_Book
SPO
<Ullman88>_type
SP
type_Book
PO
<Ullman88>_Book
SO
I Three domains are definedS = <Ullman88> b1 ullmanP = type label author _1 homepageO = Book "Principles of ..." b1 ullman <http://...>
I Each term is replaced by an integer (with a dictionary)I <Ullman88>→ 46
SUM 2008 - October 2, 2008 15 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Graph parsing
I Every triple of the graph is inserted into 4 Bloom filters
type<Ullman88> Book
<Ullman88>_type_Book
SPO
<Ullman88>_type
SP
type_Book
PO
<Ullman88>_Book
SO
I Three domains are definedS = <Ullman88> b1 ullmanP = type label author _1 homepageO = Book "Principles of ..." b1 ullman <http://...>
I Each term is replaced by an integer (with a dictionary)I <Ullman88>→ 46
SUM 2008 - October 2, 2008 15 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Graph parsing
I Every triple of the graph is inserted into 4 Bloom filters
type<Ullman88> Book
<Ullman88>_type_Book
SPO
<Ullman88>_type
SP
type_Book
PO
<Ullman88>_Book
SO
I Three domains are definedS = <Ullman88> b1 ullmanP = type label author _1 homepageO = Book "Principles of ..." b1 ullman <http://...>
I Each term is replaced by an integer (with a dictionary)I <Ullman88>→ 46
SUM 2008 - October 2, 2008 15 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Graph parsing
I Every triple of the graph is inserted into 4 Bloom filters
type<Ullman88> Book
<Ullman88>_type_Book
SPO
<Ullman88>_type
SP
type_Book
PO
<Ullman88>_Book
SO
I Three domains are definedS = <Ullman88> b1 ullmanP = type label author _1 homepageO = Book "Principles of ..." b1 ullman <http://...>
I Each term is replaced by an integer (with a dictionary)I <Ullman88>→ 46
SUM 2008 - October 2, 2008 15 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Evolutionary algorithm flowchart [Eiben2003]
I Set of populations + Set of operators
SUM 2008 - October 2, 2008 16 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Query parsing
I Definition of the chromosome for the individuals?publication1 ?publication2 ?title
I Creation of constraints to verify
I Clause ?publication type Book .bloom(spo |?publication1 type Book)bloom(sp |?publication1 type)bloom(po |type Book)
I Clause ?publication label ?title .bloom(spo |?publication2 label ?title)bloom(sp |?publication2 label)bloom(po |label ?title)bloom(so |?publication2 ?title)
I Equality constraintequal(?publication1,?publication2)
Removedbecausealways true
SUM 2008 - October 2, 2008 17 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Query parsing
I Definition of the chromosome for the individuals?publication1 ?publication2 ?title
I Creation of constraints to verify
I Clause ?publication type Book .bloom(spo |?publication1 type Book)bloom(sp |?publication1 type)bloom(po |type Book)
I Clause ?publication label ?title .bloom(spo |?publication2 label ?title)bloom(sp |?publication2 label)bloom(po |label ?title)bloom(so |?publication2 ?title)
I Equality constraintequal(?publication1,?publication2)
Removedbecausealways true
SUM 2008 - October 2, 2008 17 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Query parsing
I Definition of the chromosome for the individuals?publication1 ?publication2 ?title
I Creation of constraints to verifyI Clause ?publication type Book .
bloom(spo |?publication1 type Book)bloom(sp |?publication1 type)bloom(po |type Book)
I Clause ?publication label ?title .bloom(spo |?publication2 label ?title)bloom(sp |?publication2 label)bloom(po |label ?title)bloom(so |?publication2 ?title)
I Equality constraintequal(?publication1,?publication2)
Removedbecausealways true
SUM 2008 - October 2, 2008 17 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Query parsing
I Definition of the chromosome for the individuals?publication1 ?publication2 ?title
I Creation of constraints to verifyI Clause ?publication type Book .
bloom(spo |?publication1 type Book)bloom(sp |?publication1 type)bloom(po |type Book)
I Clause ?publication label ?title .bloom(spo |?publication2 label ?title)bloom(sp |?publication2 label)bloom(po |label ?title)bloom(so |?publication2 ?title)
I Equality constraintequal(?publication1,?publication2)
Removedbecausealways true
SUM 2008 - October 2, 2008 17 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Query parsing
I Definition of the chromosome for the individuals?publication1 ?publication2 ?title
I Creation of constraints to verifyI Clause ?publication type Book .
bloom(spo |?publication1 type Book)bloom(sp |?publication1 type)bloom(po |type Book)
I Clause ?publication label ?title .bloom(spo |?publication2 label ?title)bloom(sp |?publication2 label)bloom(po |label ?title)bloom(so |?publication2 ?title)
I Equality constraintequal(?publication1,?publication2)
Removedbecausealways true
SUM 2008 - October 2, 2008 17 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Query parsing
I Definition of the chromosome for the individuals?publication1 ?publication2 ?title
I Creation of constraints to verifyI Clause ?publication type Book .
bloom(spo |?publication1 type Book)bloom(sp |?publication1 type)bloom(po |type Book)
I Clause ?publication label ?title .bloom(spo |?publication2 label ?title)bloom(sp |?publication2 label)bloom(po |label ?title)bloom(so |?publication2 ?title)
I Equality constraintequal(?publication1,?publication2)
Removedbecausealways true
SUM 2008 - October 2, 2008 17 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Evaluation of a candidate solution
I Solution is checked against all the constraints. If one issatisfied,
I A global reward w is wonI Each variable used is equally rewarded
I Rewards for : bloom(spo|?publication2 label?title)
I reward(solution) += wI reward(?publication1) += w
2I reward(?title) += w
2
SUM 2008 - October 2, 2008 18 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Creation of new individuals
I Select two individuals and do a one point crossover
dblp:ullman <Ullman88> "Principles. . ."
<Ullman88> dblp:ullman _:b1
Randomly pick a pivot point
dblp:ullman <Ullman88> _:b1
<Ullman88> dblp:ullman "Principles. . ."
Swap the two parts
I Mutate the least efficient variable
dblp:ullman <Ullman88> "Principles. . ."
0 3× w 2× w
Select the variable with lowestreward
<Ullman88> <Ullman88> "Principles. . ."
Assign a random new value
SUM 2008 - October 2, 2008 19 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
1 What’s the problem ?Querying RDF datastoresStandard techniques
2 And Now for Something Completely DifferentGuessing the solution insteadThe way we do it
3 Does it work ?Evolution of the qualitySome characteristics of this method
4 TODO list
SUM 2008 - October 2, 2008 20 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Results on some (small) datasets
I Database FOAF (15k triples) and DBLP (3M triples)I Query with, respectively, 4 and 11 different variablesI Average result for 200 individuals and 500 generations
0
10
20
30
40
50
60
0 100 200 300 400 500
fitne
ss v
alue
n-th generation
60
70
80
90
100
0 100 200 300 400 500
fitne
ss v
alue
n-th generation
I Solutions with maximum reward (52) are found for FOAFI Not enough time for DBLP (max 319)
SUM 2008 - October 2, 2008 21 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Scalibility & speed
I Low memory requirementsI Only depends on the number of individuals and the size of
the Bloom filters
(a) parsing
dataset memoryFOAF 65 MBDBLP 230 MB
(b) querying
dataset memoryFOAF 15 MBDBLP 140 MB
Table: Average memory usage (mostly due to dictionary)
I Computation can be distributedI Candidate solutions are independentI The dictionary can be based on a DHT
SUM 2008 - October 2, 2008 22 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
1 What’s the problem ?Querying RDF datastoresStandard techniques
2 And Now for Something Completely DifferentGuessing the solution insteadThe way we do it
3 Does it work ?Evolution of the qualitySome characteristics of this method
4 TODO list
SUM 2008 - October 2, 2008 23 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Status and future work
I Current statusI The search process can be slow to convergeI Several parameters to tune (rewards, size of the population,
number of generations, . . . )
I Current work
1 Improve benchmarking
I Test with more queries and more datasetsI Better study of the influence of the parameters
2 Improve evolution
I Experiment different type of crossover and mutationI Implement dynamic valuations for the rewardsI Improve early results on tabbu search approach
3 Test other, easy to parallelize and anytime, optimizer
I Swarm based algorithm (PSO, ...) or an other EAI CSP solver
SUM 2008 - October 2, 2008 24 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Status and future work
I Current statusI The search process can be slow to convergeI Several parameters to tune (rewards, size of the population,
number of generations, . . . )
I Current work
1 Improve benchmarking
I Test with more queries and more datasetsI Better study of the influence of the parameters
2 Improve evolution
I Experiment different type of crossover and mutationI Implement dynamic valuations for the rewardsI Improve early results on tabbu search approach
3 Test other, easy to parallelize and anytime, optimizer
I Swarm based algorithm (PSO, ...) or an other EAI CSP solver
SUM 2008 - October 2, 2008 24 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Status and future work
I Current statusI The search process can be slow to convergeI Several parameters to tune (rewards, size of the population,
number of generations, . . . )
I Current work1 Improve benchmarking
I Test with more queries and more datasetsI Better study of the influence of the parameters
2 Improve evolution
I Experiment different type of crossover and mutationI Implement dynamic valuations for the rewardsI Improve early results on tabbu search approach
3 Test other, easy to parallelize and anytime, optimizer
I Swarm based algorithm (PSO, ...) or an other EAI CSP solver
SUM 2008 - October 2, 2008 24 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Status and future work
I Current statusI The search process can be slow to convergeI Several parameters to tune (rewards, size of the population,
number of generations, . . . )
I Current work1 Improve benchmarking
I Test with more queries and more datasetsI Better study of the influence of the parameters
2 Improve evolutionI Experiment different type of crossover and mutationI Implement dynamic valuations for the rewardsI Improve early results on tabbu search approach
3 Test other, easy to parallelize and anytime, optimizer
I Swarm based algorithm (PSO, ...) or an other EAI CSP solver
SUM 2008 - October 2, 2008 24 / 24
griffioen
Problem and context Method proposed Experimental results Conclusion
Status and future work
I Current statusI The search process can be slow to convergeI Several parameters to tune (rewards, size of the population,
number of generations, . . . )
I Current work1 Improve benchmarking
I Test with more queries and more datasetsI Better study of the influence of the parameters
2 Improve evolutionI Experiment different type of crossover and mutationI Implement dynamic valuations for the rewardsI Improve early results on tabbu search approach
3 Test other, easy to parallelize and anytime, optimizerI Swarm based algorithm (PSO, ...) or an other EAI CSP solver
SUM 2008 - October 2, 2008 24 / 24