30
SPARQL Query Answering over OWL Ontologies Presented By Mohamed Sabri Ilianna Kollia 1,2 , Birte Glimm 2 , and Ian Horrocks 2 1 Na:onal Technical University of Athens, Greece 2 Oxford University, UK

SPARQL’Query’Answering’over’ OWL’Ontologies’’gweddell/cs848/... · SPARQL’Query’Answering’over’ OWL’Ontologies’’ Presented(By(Mohamed(Sabri(IliannaKollia

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

SPARQL  Query  Answering  over  OWL  Ontologies    

Presented  By  Mohamed  Sabri  

Ilianna  Kollia1,2,  Birte  Glimm2,  and  Ian  Horrocks2    

1  Na:onal  Technical  University  of  Athens,  Greece    2  Oxford  University,  UK    

Mo-va-on:  •  The  query  evalua;on  mechanism  defined  in  the  SPARQL  Query  

specifica;on  is  based  on  subgraph  matching  (simple  entailment).    •  SPARQL  1.1  includes  more  elaborate  entailment  regimes;  including  

RDFS  and  OWL.      •  Query  answering  under  such  entailment  regimes  is  more  complex  

as  it  may  involve  retrieving  answers  that  only  follow  implicitly  from  the  queried  graph.    

 •  While  several  methods  and  implementa;ons  for  SPARQL  under  

RDFS  seman;cs  are  available,  methods  that  use  OWL  seman;cs  have  not  yet  been  well-­‐studied.  

Effects  of  Different  Entailment  Regimes:  

•  Green  dashed  lines  indicate  RDF-­‐entailed  triples  and  red  dashed  lines  indicate  triples  that  are  also  RDFS-­‐entailed.  

 •  SELECT ?pub WHERE { ?pub rdf:type ex:Publication }

OWL  Reasoning:  

SELECT ?pub WHERE { ?pub rdf:type ex:Publication }  1)  Materializa-on  approaches.  

2)  Query  Rewri-ng  approaches.  

3)  Combined  approaches.  

OWL  Reasoning:  

SELECT ?pub WHERE { ?pub rdf:type ex:Publication }  1)  Materializa-on  approaches.  

2)  Query  Rewri-ng  approaches.  

3)  Combined  approaches.  

OWL  Reasoning:  

SELECT ?pub WHERE { ?pub rdf:type ex:Publication }  1)  Materializa-on  approaches.  

2)  Query  Rewri-ng  approaches.  

3)  Combined  approaches.  

Mo-va-on  (cont’d):  •  However,  the  previous  techniques  are  only  applicable  for  less  

expressive  OWL  2  profiles.    

•  In  this  paper,  the  authors  present  a  sound  and  complete  algorithm  for  answering  SPARQL  queries  under  the  OWL  2  Direct  Seman:cs  entailment  regime  (SPARQL-­‐OWL).  

•  …the  first  implementa;on  to  fully  support  SPARQL-­‐OWL!    •  “[Query  Answering]  is  a  very  ac;ve  research  area,  with  many  

different  techniques  being  developed  and  inves;gated…  it  is  reasonable  to  expect  that  the  future  performance  improvements  in  query  answering  systems  will  be  even  more  spectacular  than  those  achieved  in  the  past  by  class  reasoning  systems”,  Ian  Horrocks.  

Mo-va-on  (cont’d):  •  However,  the  previous  techniques  are  only  applicable  for  less  

expressive  OWL  2  profiles.    

•  In  this  paper,  the  authors  present  a  sound  and  complete  algorithm  for  answering  SPARQL  queries  under  the  OWL  2  Direct  Seman:cs  entailment  regime  (SPARQL-­‐OWL).  

•  …the  first  implementa;on  to  fully  support  SPARQL-­‐OWL!    •  “[Query  Answering]  is  a  very  ac;ve  research  area,  with  many  

different  techniques  being  developed  and  inves;gated…  it  is  reasonable  to  expect  that  the  future  performance  improvements  in  query  answering  systems  will  be  even  more  spectacular  than  those  achieved  in  the  past  by  class  reasoning  systems”,  Ian  Horrocks.  

Web  Ontology  Language  OWL  :    SubClassOf(:DogOwner ObjectSomeValuesFrom(:owns :Dog))) SubClassOf(:CatOwner ObjectSomeValuesFrom(:owns :Cat)))

ObjectPropertyDomain(:owns :Person)

ClassAssertion(ObjectUnionOf(:DogOwner :CatOwner) :mary)

ObjectPropertyAssertion(:owns :mary _:somePet)

Ø  We  can  infer:  O |= ClassAssertion(:Person :mary)

u  However,  there  is  no  unique  canonical  model  that  could  be  used  to  answer  queries.  Why?!  

u  And  in  OWL  it  cannot  be  guaranteed  that  the  models  of  an  ontology  are  finite!  

Web  Ontology  Language  OWL  :    SubClassOf(:DogOwner ObjectSomeValuesFrom(:owns :Dog))) SubClassOf(:CatOwner ObjectSomeValuesFrom(:owns :Cat)))

ObjectPropertyDomain(:owns :Person)

ClassAssertion(ObjectUnionOf(:DogOwner :CatOwner) :mary)

ObjectPropertyAssertion(:owns :mary _:somePet)

Ø  We  can  infer:  O |= ClassAssertion(:Person :mary)

u  However,  there  is  no  unique  canonical  model  that  could  be  used  to  answer  queries.  Why?!  

u  And  in  OWL  it  cannot  be  guaranteed  that  the  models  of  an  ontology  are  finite!  

Defini-ons  (simplified):  

Defini-on  1.  We  write  I  for  the  set  of  all  IRIs,  L  for  the  set  of  all  literals,  and  B  for  the  set  of  all  blanknodes.  The  set  T  of  RDF  terms  is  I∪L∪B.  Let  V  be  a  countably  infinite  set  of  variables  disjoint  from  T.  A  triple  pa]ern  is  member  of  the  set  (T  ∪  V)  ×  (I  ∪  V)  ×  (T  ∪  V),  and  a  basic  graph  pa]ern  (BGP)  is  a  set  of  triple  paQerns.      Defini-on2.  A  solu;on  mapping  is  a  par:al  func:on  μ:  V  →  T  from  variables  to  RDF    terms.      Defini-on  3.  An  RDF  instance  mapping  is  a  par:al  func:on  σ:  B  →  T  from  blank  nodes  to  RDF  terms.      We  use:    

     to  denote  the  result  of  mapping  an  OWL  2  DL  graph  G  into  an  OWL          ontology.    

 to  denote  the  result  of  mapping  an  OWL  2  DL  graph  G  into  an  OWL    ontology.    

   

OG

OBGPG

SPARQL-­‐OWL  Query  Answering:  

•  The  SPARQL-­‐OWL  regime  specifies  what  the  answers  are,  but  not  how  they  can  actually  be  computed.    

•  A  straigh(orward  algorithm  to  realize  the  entailment  regime:  Given  an  OWL  2  DL  graph  G  and  a  well-­‐formed  BGP  BGP  for  G,    

1)  map  G  into                ,  BGP  into                        2)  and  then  simply  tests,  for  each  compa;ble  pair  (μ,σ),  whether              sk(              )  |=  μ(σ(                    )).    

•  Compa1ble  Mappings:  (intui;vely)  class  variables  can  only  be  mapped  to  class  names,  object  property  variables  to  object  proper;es,  etc.    

•  but  in  the  worst  case,  the  number  of  dis;nct  compa;ble  pairs  (μ,  σ)  is  exponen;al  in  the  number  of  variables  in  the  query,  i.e.,  if  m  is  the  number  of  terms  in              and  n  is  the  number  of  variables  in                    ,  we  test  O(mn)  solu;ons.    

OG OBGPG

OG OBGPG

SPARQL-­‐OWL  Query  Answering:  

•  The  SPARQL-­‐OWL  regime  specifies  what  the  answers  are,  but  not  how  they  can  actually  be  computed.    

•  A  straigh(orward  algorithm  to  realize  the  entailment  regime:  Given  an  OWL  2  DL  graph  G  and  a  well-­‐formed  BGP  BGP  for  G,    

1)  map  G  into                ,  BGP  into                        2)  and  then  simply  tests,  for  each  compa;ble  pair  (μ,σ),  whether              sk(              )  |=  μ(σ(                    )).    

•  Compa1ble  Mappings:  (intui;vely)  class  variables  can  only  be  mapped  to  class  names,  object  property  variables  to  object  proper;es,  etc.    

•  but  in  the  worst  case,  the  number  of  dis;nct  compa;ble  pairs  (μ,  σ)  is  exponen;al  in  the  number  of  variables  in  the  query,  i.e.,  if  m  is  the  number  of  terms  in              and  n  is  the  number  of  variables  in                    ,  we  test  O(mn)  solu;ons.    

OG OBGPG

OG OBGPG

OG OBGPG

SPARQL-­‐OWL  Query  Answering  (cont’d):  

•  So,  the  authors  presents  op;miza;ons  to  reduce  the  number  of:    •  entailment  checks,  and    •  method  calls  to  the  reasoner.  

•  However,  op;miza;ons  cannot  easily  be  integrated  in  the  above  simple  algorithm  since  it  uses  the  reasoner  to  check  for  the  entailment  of  the  query  as  a  whole  and  does  not  take  advantage  of  rela;ons  that  may  exist  between  axiom  templates.  

•  For  example,  choosing  a  good  execu;on  order  can  significantly  affect  the  performance;  

consider  the  BGP  {  ?x  rdf:type  :A  .  ?x  :op  ?y  .  }    With  100  individuals  only  one  of  which  belongs  to  class  :A,  you  may  perform  10,200  tests  instead  of  just  200.  

SPARQL-­‐OWL  Query  Answering  (cont’d):  

•  Also,  instead  of  checking  entailment,  we  can,  for  several  axiom  templates,  directly  retrieve  the  solu;ons  from  the  reasoner.    Example:  {  ?x  rdfs:subClassOf  :C  }  

 •  Most  methods  of  reasoners  are  highly  op:mized,  which  can  

significantly  reduce  the  number  of  tests  that  are  performed.  Furthermore,  if  the  class  hierarchy  is  precomputed,  the  reasoner  can  find  the  answers  simply  with  a  cache  lookup.  

•  However,  the  actual  execu;on  cost  might  vary  significantly  (depending  on  the  internal  state  of  the  reasoner)  

   •  The  proposed  algorithm  internally  uses  an  OWL  2  DL  reasoner  to  

check  entailment  (HermiT,  in  this  implementa;on).  

Simple  and  Complex  Axiom  Templates:  

•  We  dis;nguish  between  simple  and  complex  axiom  templates,  where:    

 •  Simple  axiom  templates  are  those  that  correspond  to  dedicated  reasoning  tasks.      SubClassOf(?x  :C)  

•  Complex  axiom  templates  are,  in  contrast,  evaluated  by  itera;ng  over  the  compa;ble  mappings  and  by  checking  entailment  for  each  instan;ated  axiom  template.  

   SubClassOf(:C  ObjectIntersec;onOf(?z  ObjectSomeValuesFrom(?x  ?y)))        ClassAsser;on(ObjectSomeValuesFrom(:op  ?x)  ?y)  

390 I. Kollia, B. Glimm, and I. Horrocks

Algorithm 1. Query Evaluation ProcedureInput: G: the active graph, which is an OWL 2 DL graph

BGP: an OWL 2 DL BGPOutput: a multiset of solutions for evaluating BGP over G under OWL 2 Direct Semantics1: OG:=map(G)2: OG

BGP:=map(BGP,OG)3: Axt := rewrite(OG

BGP) {create a list Axt of simplified axiom templates from OGBGP}

4: Axt1, . . . ,Axtm:=connectedComponents(Axt)5: for j=1, . . . , m do6: Rj := {(µ0,!0) | dom(µ0) = dom(!0) = !}7: axt1, . . . , axtn := reorder(Axtj)8: for i = 1, . . . , n do9: Rnew := !

10: for (µ,!) " Rj do11: if isSimple(axti) and ((V(axti) # B(axti)) \ (dom(µ) # dom(!))) ! ! then12: Rnew := Rnew # {(µ # µ$,! # !$) | (µ$,!$) " callReasoner(µ(!(axti )))}13: else14: B := {(µ # µ$,! # !$) | dom(µ$) = V(µ(axti)), dom(!$) = B(!(axti)),

(µ # µ$,! # !$) is compatible with axti and sk(OG)}15: B := prune(B, axti, OG)16: while B ! ! do17: (µ$,!$) := removeNext(B)18: if OG |= µ$(!$(axti)) then19: Rnew := Rnew # {(µ$,!$)}20: else21: B := prune(B,axti, (µ$,!$))22: end if23: end while24: end if25: end for26: Rj := Rnew

27: end for28: end for29: R := {(µ1 # . . . # µm,!1 # . . . # !m) | (µ j,! j) " Rj, 1 % j % m}30: return {(µ,m) | m > 0 is the maximal number with {(µ,!1), . . . , (µ,!m)} & R}

Direct Semantics. We first explain the general outline of the algorithm and leave thedetails of the used submethods for the following section. First, G and BGP are mappedto OG and OG

BGP, respectively (lines 1 and 2). The function rewrite (line 3) can beassumed to do nothing. Next, the method connectedComponents (line 4) partitionsthe axiom templates into sets of connected components, i.e., within a component thetemplates share common variables, whereas between components there are no sharedvariables. Unconnected components unnecessarily increase the amount of intermedi-ate results and, instead, we can simply combine the results for the components in theend (line 29). For each component, we proceed as described below: we first determinean order (method reorder in line 7). For a simple axiom template, which contains sofar unbound variables, we then call a specialized reasoner method to retrieve entailed

390 I. Kollia, B. Glimm, and I. Horrocks

Algorithm 1. Query Evaluation ProcedureInput: G: the active graph, which is an OWL 2 DL graph

BGP: an OWL 2 DL BGPOutput: a multiset of solutions for evaluating BGP over G under OWL 2 Direct Semantics1: OG:=map(G)2: OG

BGP:=map(BGP,OG)3: Axt := rewrite(OG

BGP) {create a list Axt of simplified axiom templates from OGBGP}

4: Axt1, . . . ,Axtm:=connectedComponents(Axt)5: for j=1, . . . , m do6: Rj := {(µ0,!0) | dom(µ0) = dom(!0) = !}7: axt1, . . . , axtn := reorder(Axtj)8: for i = 1, . . . , n do9: Rnew := !

10: for (µ,!) " Rj do11: if isSimple(axti) and ((V(axti) # B(axti)) \ (dom(µ) # dom(!))) ! ! then12: Rnew := Rnew # {(µ # µ$,! # !$) | (µ$,!$) " callReasoner(µ(!(axti )))}13: else14: B := {(µ # µ$,! # !$) | dom(µ$) = V(µ(axti)), dom(!$) = B(!(axti)),

(µ # µ$,! # !$) is compatible with axti and sk(OG)}15: B := prune(B, axti, OG)16: while B ! ! do17: (µ$,!$) := removeNext(B)18: if OG |= µ$(!$(axti)) then19: Rnew := Rnew # {(µ$,!$)}20: else21: B := prune(B,axti, (µ$,!$))22: end if23: end while24: end if25: end for26: Rj := Rnew

27: end for28: end for29: R := {(µ1 # . . . # µm,!1 # . . . # !m) | (µ j,! j) " Rj, 1 % j % m}30: return {(µ,m) | m > 0 is the maximal number with {(µ,!1), . . . , (µ,!m)} & R}

Direct Semantics. We first explain the general outline of the algorithm and leave thedetails of the used submethods for the following section. First, G and BGP are mappedto OG and OG

BGP, respectively (lines 1 and 2). The function rewrite (line 3) can beassumed to do nothing. Next, the method connectedComponents (line 4) partitionsthe axiom templates into sets of connected components, i.e., within a component thetemplates share common variables, whereas between components there are no sharedvariables. Unconnected components unnecessarily increase the amount of intermedi-ate results and, instead, we can simply combine the results for the components in theend (line 29). For each component, we proceed as described below: we first determinean order (method reorder in line 7). For a simple axiom template, which contains sofar unbound variables, we then call a specialized reasoner method to retrieve entailed

390 I. Kollia, B. Glimm, and I. Horrocks

Algorithm 1. Query Evaluation ProcedureInput: G: the active graph, which is an OWL 2 DL graph

BGP: an OWL 2 DL BGPOutput: a multiset of solutions for evaluating BGP over G under OWL 2 Direct Semantics1: OG:=map(G)2: OG

BGP:=map(BGP,OG)3: Axt := rewrite(OG

BGP) {create a list Axt of simplified axiom templates from OGBGP}

4: Axt1, . . . ,Axtm:=connectedComponents(Axt)5: for j=1, . . . , m do6: Rj := {(µ0,!0) | dom(µ0) = dom(!0) = !}7: axt1, . . . , axtn := reorder(Axtj)8: for i = 1, . . . , n do9: Rnew := !

10: for (µ,!) " Rj do11: if isSimple(axti) and ((V(axti) # B(axti)) \ (dom(µ) # dom(!))) ! ! then12: Rnew := Rnew # {(µ # µ$,! # !$) | (µ$,!$) " callReasoner(µ(!(axti )))}13: else14: B := {(µ # µ$,! # !$) | dom(µ$) = V(µ(axti)), dom(!$) = B(!(axti)),

(µ # µ$,! # !$) is compatible with axti and sk(OG)}15: B := prune(B, axti, OG)16: while B ! ! do17: (µ$,!$) := removeNext(B)18: if OG |= µ$(!$(axti)) then19: Rnew := Rnew # {(µ$,!$)}20: else21: B := prune(B,axti, (µ$,!$))22: end if23: end while24: end if25: end for26: Rj := Rnew

27: end for28: end for29: R := {(µ1 # . . . # µm,!1 # . . . # !m) | (µ j,! j) " Rj, 1 % j % m}30: return {(µ,m) | m > 0 is the maximal number with {(µ,!1), . . . , (µ,!m)} & R}

Direct Semantics. We first explain the general outline of the algorithm and leave thedetails of the used submethods for the following section. First, G and BGP are mappedto OG and OG

BGP, respectively (lines 1 and 2). The function rewrite (line 3) can beassumed to do nothing. Next, the method connectedComponents (line 4) partitionsthe axiom templates into sets of connected components, i.e., within a component thetemplates share common variables, whereas between components there are no sharedvariables. Unconnected components unnecessarily increase the amount of intermedi-ate results and, instead, we can simply combine the results for the components in theend (line 29). For each component, we proceed as described below: we first determinean order (method reorder in line 7). For a simple axiom template, which contains sofar unbound variables, we then call a specialized reasoner method to retrieve entailed

 •  For  a  simple  axiom  template,  we  then  call  a  specialized  

reasoner  method  to  retrieve  entailed  results  (line  12).    

•  Otherwise,  we  check  which  compa;ble  solu;ons  yield  an  entailed  axiom  (lines  13  to  24).  

 

Proposed  Op-miza-ons:  

1)  Axiom  Template  Reordering:  q The  simple  axiom  templates  are  ordered  by  their  cost,  which  is  computed  as  the  weighted  sum  of  the  es;mated  number  of  required  consistency  checks  and  the  es;mated  result  size.  (reasoner-­‐dependant)  

q The  complex  templates  are  ordered  based  only  on  the  number  of  bindings  that  have  to  be  tested.  

2)  Axiom  Template  Rewri1ng:  q Some  costly  to  evaluate  axiom  templates  can  be  rewri]en  into  axiom  templates  that  can  be  evaluated  more  efficiently  and  yield  an  equivalent  result.    

q Example:  

SubClassOf(?x  ObjectIntersec;onOf(ObjectSomeValuesFrom(:op  ?y)  :C))      SubClassOf(?x  :C)  and  SubClassOf(?x  ObjectSomeValuesFrom(:op  ?y))  

Proposed  Op-miza-ons:  

1)  Axiom  Template  Reordering:  q The  simple  axiom  templates  are  ordered  by  their  cost,  which  is  computed  as  the  weighted  sum  of  the  es;mated  number  of  required  consistency  checks  and  the  es;mated  result  size.  (reasoner-­‐dependant)  

q The  complex  templates  are  ordered  based  only  on  the  number  of  bindings  that  have  to  be  tested.  

2)  Axiom  Template  Rewri1ng:  q Some  costly  to  evaluate  axiom  templates  can  be  rewri]en  into  axiom  templates  that  can  be  evaluated  more  efficiently  and  yield  an  equivalent  result.    

q Example:  

SubClassOf(?x  ObjectIntersec;onOf(ObjectSomeValuesFrom(:op  ?y)  :C))      SubClassOf(?x  :C)  and  SubClassOf(?x  ObjectSomeValuesFrom(:op  ?y))  

Proposed  Op-miza-ons:  

1)  Axiom  Template  Reordering:  q The  simple  axiom  templates  are  ordered  by  their  cost,  which  is  computed  as  the  weighted  sum  of  the  es;mated  number  of  required  consistency  checks  and  the  es;mated  result  size.  (reasoner-­‐dependant)  

q The  complex  templates  are  ordered  based  only  on  the  number  of  bindings  that  have  to  be  tested.  

2)  Axiom  Template  Rewri1ng:  q Some  costly  to  evaluate  axiom  templates  can  be  rewri]en  into  axiom  templates  that  can  be  evaluated  more  efficiently  and  yield  an  equivalent  result.    

q Example:  

SubClassOf(?x  ObjectIntersec;onOf(ObjectSomeValuesFrom(:op  ?y)  :C))      SubClassOf(?x  :C)  and  SubClassOf(?x  ObjectSomeValuesFrom(:op  ?y))  

392 I. Kollia, B. Glimm, and I. Horrocks

Table 2. Axiom templates and their equivalent simpler ones, where C(i) are class expressions(possibly containing variables), a is an individual or variable, and r is an object property expres-sion (possibly containing a variable)

ClassAssertion(ObjectIntersectionOf(:C1 . . . :Cn) :a) ! {ClassAssertion(:Ci :a) | 1 " i " n}SubClassOf(:C ObjectIntersectionOf(:C1 . . . :Cn)) ! {SubClassOf(:C :Ci) | 1 " i " n}

SubClassOf(ObjectUnionOf(:C1 . . . :Cn) :C) ! {SubClassOf(:Ci :C) | 1 " i " n}SubClassOf(ObjectSomeValuesFrom(:op owl:Thing :C) ! ObjectPropertyDomain(:op :C)

SubClassOf(owl:Thing ObjectAllValuesFrom(:op :C)) ! ObjectPropertyRange(:op :C)

Class-Property Hierarchy Exploitation The number of consistency checks needed toevaluate a BGP can be further reduced by taking the class and property hierarchies intoaccount. Once the classes and properties are classified (this can ideally be done before asystem accepts queries), the hierarchies are stored in the reasoner’s internal structures.We further use the hierarchies to prune the search space of solutions in the evaluationof certain axiom templates. We illustrate the intuition with an example. Let us assumethat OG

BGP contains the axiom template:

SubClassOf(:Infection ObjectSomeValuesFrom(:hasCausalLinkTo ?x))

If :C is not a solution and SubClassOf(:B :C) holds, then :B is also not a solution.Thus, when searching for solutions for x, the method removeNext (line 17) chooses thenext binding to test by traversing the class hierarchy topdown. When we find a non-solution :C, the subtree rooted in :C of the class hierarchy can safely be pruned, whichwe do in the method prune in line 21. Queries over ontologies with a large number ofclasses and a deep class hierarchy can, therefore, gain the maximum advantage fromthis optimization. We employ similar optimizations using the object and data propertyhierarchies. It is obvious that we only prune mappings that cannot constitute actualsolution and instance mappings, hence, soundness and completeness of Algorithm 1 ispreserved.

Exploiting the Domain and Range Restrictions Domain and range restrictions in OG

can be exploited to further restrict the mappings for class variables. Let us assume thatOG contains Axiom (6) and OG

BGP contains Axiom Template (7).

ObjectPropertyRange(:takesCourse :Course) (6)

SubClassOf(:GraduateStudent ObjectSomeValuesFrom(:takesCourse ?x)) (7)

Only the class :Course and its subclasses can be solutions for x and we can immediatelyprune other mappings in the method prune (line 15), which again preserves soundnessand completeness.

4 System Evaluation

Since entailment regimes only change the evaluation of basic graph patterns, standardSPARQL algebra processors can be used that allow for custom BGP evaluation. Fur-thermore, standard OWL reasoners can be used to perform the required reasoning tasks.

Proposed  Op-miza-ons  (cont’d):  

3)  Class-­‐Property  Hierarchy  Exploita1on:  q The  hierarchies  are  stored  in  the  reasoner’s  internal  structures.    q Can  be  used  to  prune  the  search  space  of  solu;ons  in  the  evalua;on  of  certain  axiom  templates.    

q Example:    SubClassOf(:Infec;on  ObjectSomeValuesFrom(:hasCausalLinkTo  ?x))      [If  :C  is  not  a  solu;on  and  SubClassOf(:B  :C)  holds,  then  :B  is  also  not  a      solu;on.]  

4)  Exploi1ng  the  Domain  and  Range  Restric1ons  :  q Can  be  exploited  to  further  restrict  the  mappings  for  class  variables.  

q Example:  ObjectPropertyRange(:takesCourse  :Course)  SubClassOf(:GraduateStudent  ObjectSomeValuesFrom(:takesCourse  ?x))        [Only  the  class  :Course  and  its  subclasses  can  be  solu;ons  for  x  and      we  can  immediately  prune  other  mappings]  

Proposed  Op-miza-ons  (cont’d):  

3)  Class-­‐Property  Hierarchy  Exploita1on:  q The  hierarchies  are  stored  in  the  reasoner’s  internal  structures.    q Can  be  used  to  prune  the  search  space  of  solu;ons  in  the  evalua;on  of  certain  axiom  templates.    

q Example:    SubClassOf(:Infec;on  ObjectSomeValuesFrom(:hasCausalLinkTo  ?x))      [If  :C  is  not  a  solu;on  and  SubClassOf(:B  :C)  holds,  then  :B  is  also  not  a      solu;on.]  

4)  Exploi1ng  the  Domain  and  Range  Restric1ons  :  q Can  be  exploited  to  further  restrict  the  mappings  for  class  variables.  

q Example:  ObjectPropertyRange(:takesCourse  :Course)  SubClassOf(:GraduateStudent  ObjectSomeValuesFrom(:takesCourse  ?x))        [Only  the  class  :Course  and  its  subclasses  can  be  solu;ons  for  x  and      we  can  immediately  prune  other  mappings]  

System  Evalua-on:  SPARQL Query Answering over OWL Ontologies 393

uses

QueryParsing

Algebra Evaluation

SPARQLquery

BGP Execution

AlgebraObject

Query solution sequence

BGP

determines

BGPsolution

sequence

OWL Reasoner

OWL Ontology

BGP parsing

BGP Rewriting

BGP Evaluation

BGP Reordering

Fig. 1. The main phases of query processing in our system

4.1 The System Architecture

Figure 1 depicts the main phases of query processing in our prototypical system. In oursetting, the queried graph is seen as an ontology that is loaded into an OWL reasoner.Currently, we only load the default graph/ontology of the RDF dataset into a reasonerand each query is evaluated using this reasoner. We plan, however, to extend the systemto named graphs, where the dataset clause of the query can be used to determine a rea-soner which contains one of the named ontologies instead of the default one. Loadingthe ontology and the initialization of the reasoner are performed before the system ac-cepts queries. We use the ARQ library3 of the Jena Semantic Web Toolkit for parsingthe query and for the SPARQL algebra operations apart from our custom BGP evalu-ation method. The BGP is parsed and mapped into axiom templates by our extensionof the OWL API [6], which uses the active ontology for type disambiguation. The re-sulting axiom templates are then passed to a query optimizer, which applies the axiomtemplate rewriting and then searches for a good query execution plan based on statisticsprovided by the reasoner. We use the HermiT reasoner4 for OWL reasoning, but onlythe module that generates statistics and provides cost estimations is HermiT specific.

4.2 Experimental Results

We tested our system with the Lehigh University Benchmark (LUBM) [3] and a rangeof custom queries that test complex axiom template evaluation over the more expressiveGALEN ontology. All experiments were performed on a Windows Vista machine witha double core 2.2 GHz Intel x86 32 bit processor and Java 1.6 allowing 1GB of Javaheap space. We measure the time for one-o! tasks such as classification separatelysince such tasks are usually performed before the system accepts queries. Whether morecostly operations such as the realization of the ABox, which computes the types for allindividuals, are done in the beginning, depends on the setting and the reasoner. Sincerealization is relatively quick in HermiT for LUBM (GALEN has no individuals), wealso performed this task upfront. The given results are averages from executing eachquery three times. The ontologies and all code required to perform the experiments areavailable online.5

3 http://jena.sourceforge.net/ARQ/4 http://www.hermit-reasoner.com/5 http://www.hermit-reasoner.com/2010/sparqlowl/sparqlowl.zip

•  Since  entailment  regimes  only  change  the  evalua;on  of  basic  graph  pa]erns,  standard  SPARQL  algebra  processors  can  be  used  that  allow  for  custom  BGP  evalua;on.  

•  Uses  the  ARQ  library  of  the  Jena  Seman;c  Web  Toolkit  for  parsing  the  query  and  for  the  SPARQL  algebra  opera;ons  apart  from  our  custom  BGP  evalua;on.  

•  The  BGP  is  parsed  and  mapped  into  axiom  templates  by  our  extension  of  the  OWL  API.  

•  We  use  the  HermiT  reasoner4  for  OWL  reasoning.  

Experimental  Results:  394 I. Kollia, B. Glimm, and I. Horrocks

Table 3. Query answering times in milliseconds for LUBM(1,0) and in seconds for the queriesof Table 4 with and without optimizations

LUMB(1, 0) GALEN queries from Table 4Query Time Query Reordering Hierarchy Rewriting Time

Exploitation1 20 1 2.12 46 1 x 0.13 19 2 780.64 19 2 x 4.45 32 3 >30 min6 58 3 x 119.67 42 3 x 204.78 353 3 x x 4.99 4,475 4 x x >30 min

10 23 4 x x 361.911 19 4 x x >30 min12 28 4 x x x 68.213 16 5 x >30 min14 45 5 x >30 min

5 x x 5.6

We first evaluate the 14 conjunctive ABox queries provided in the LUBM. Thesequeries are simple ones and have variables only in place of individuals and literals. TheLUBM ontology contains 43 classes, 25 object properties, and 7 data properties. Wetested the queries on LUBM(1,0), which contains data for one university starting fromindex 0, and which contains 16,283 individuals and 8,839 literals. The ontology took3.8 s to load and 22.7 s for classification and realization. Table 3 shows the executiontime for each of the queries. The reordering optimization has the biggest impact onqueries 2, 7, 8, and 9. These queries require much more time or are not answered at allwithin the time limit of 30 min without this optimization (758.9 s, 14.7 s, >30 min, >30min, respectively).

Conjunctive queries are supported by a range of OWL reasoners. SPARQL-OWLallows, however, the creation of very powerful queries, which are not currently sup-ported by any other system. In the absence of suitable standard benchmarks, we cre-ated a custom set of queries as shown in Table 4 (in FSS). Note that we omit variabletype declarations since the variable types are unambiguous in FSS. Since the complexqueries are mostly based on complex schema queries, we switched from the very simpleLUBM ontology to the GALEN ontology. GALEN consists of 2,748 classes, 413 objectproperties, and no individuals or literals. The ontology took 1.6 s to load and 4.8 s toclassify the classes and properties. The execution time for these queries is shown on theright-hand side of Table 3. For each query, we tested the execution once without opti-mizations and once for each combination of applicable optimizations from Section 3.

As expected, an increase in the number of variables within an axiom template leadsto a significant increase in the query execution time because the number of mappingsthat have to be checked grows exponentially in the number of variables. This can, inparticular, be observed from the di!erence in execution time between Query 1 and 2.

•  Evaluate  the  14  conjunc;ve  ABox  queries  provided  in  the  LUBM.  •  Without  op;miza;on,  queries  2,  7,  8,  and  9  required  758.9  s,  14.7  s,  >30  min,  and  >30  

min,  respec;vely.    

Experimental  Results:  SPARQL Query Answering over OWL Ontologies 395

Table 4. Sample complex queries for the GALEN ontology

1 SubClassOf(:Infection ObjectSomeValuesFrom(:hasCausalLinkTo ?x))2 SubClassOf(:Infection ObjectSomeValuesFrom(?y ?x))3 SubClassOf(?x ObjectIntersectionOf(:Infection

ObjectSomeValuesFrom(:hasCausalAgent ?y)))4 SubClassOf(:NAMEDLigament ObjectIntersectionOf(:NAMEDInternalBodyPart ?x)

SubClassOf(?x ObjectSomeValuesFrom(:hasShapeAnalagousToObjectIntersectionOf(?y ObjectSomeValuesFrom(?z :linear))))

5 SubClassOf(?x :NonNormalCondition)SubObjectPropertyOf(?z :ModifierAttribute)SubClassOf(:Bacterium ObjectSomeValuesFrom(?z ?w))SubObjectProperty(?y :StatusAttribute)SubClassOf(?w :AbstractStatus)SubClassOf(?x ObjectSomeValuesFrom(?y :Status))

From Queries 1, 2, and 3 it is evident that the use of the hierarchy exploitation opti-mization leads to a decrease in execution time of up to two orders of magnitude and, incombination with the query rewriting optimization, we can get an improvement of upto three orders of magnitude as seen in Query 3. Query 4 can only be completed in thegiven time limit if at least reordering and hierarchy exploitation is enabled. Rewritingsplits the first axiom template into the following two simple axiom templates, which areevaluated much more e!ciently:

SubClassOf(NAMEDLigament NAMEDInternalBodyPart)SubClassOf(NAMEDLigament ?x)

After the rewriting, the reordering optimization has an even more pronounced e"ectsince both rewritten axiom templates can be evaluated with a simple cache lookup.Without reordering, the complex axiom template could be executed before the simpleones, which leads to the inability to answer the query within the time limit of 30 min.Without a good ordering, Query 5 can also not be answered, but the additional use ofthe class and property hierarchy further improves the execution time by three orders ofmagnitude.

Although our optimizations can significantly improve the query execution time, therequired time can still be quite high. In practice, it is, therefore, advisable to add as manyrestrictive axiom templates for query variables as possible. For example, the addition ofSubClassOf(?y Shape) to Query 4 reduces the runtime from 68.2 s to 1.6 s.

5 Discussion

We have presented a sound and complete query answering algorithm and novel opti-mizations for SPARQL’s OWL Direct Semantics entailment regime. Our prototypicalquery answering system combines existing tools such as ARQ, the OWL API, and theHermiT OWL reasoner to implement an algorithm that evaluates basic graph patternsunder OWL’s Direct Semantics. Apart from the query reordering optimization—which

Conclusion  &  Future  Work:  •  A   sound   and   complete   query   answering   algorithm   and   novel   op;miza;ons   for  

SPARQL’s  OWL  Direct  Seman;cs  entailment  regime.    

•  Prototype  combines  exis;ng  tools;  ARQ,  the  OWL  API,  and  the  HermiT.  

•  Apart   from   the  query   reordering  op;miza;on,   the   system   is   independent  of   the  reasoner  used.  

•  The   op;miza;ons   can   improve   query   execu;on   ;me   by   up   to   three   orders   of  magnitude.  

•  Future  work  will  include  the  crea;on  of  more  accurate  cost  es;mates  for  the  cost-­‐based  query  reordering.    (the  next  paper  on  course  webpage!)  

•  The   implementa;on   of   caching   strategies   that   reduce   the   number   of   tests   for  different  instan;a;ons  of  a  complex  axiom  template.  (no  reported  work  yet!)  

•  Although   the   proposed   op;miza;ons   can   significantly   improve   the   query  execu;on   ;me,   the   required   ;me   can   s;ll   be   quite   high.   In   prac;ce,   it   is,  therefore,   advisable   to   add   as   many   restric;ve   axiom   templates   for   query  variables  as  possible.    

Conclusion  &  Future  Work:  •  A   sound   and   complete   query   answering   algorithm   and   novel   op;miza;ons   for  

SPARQL’s  OWL  Direct  Seman;cs  entailment  regime.    

•  Prototype  combines  exis;ng  tools;  ARQ,  the  OWL  API,  and  the  HermiT.  

•  Apart   from   the  query   reordering  op;miza;on,   the   system   is   independent  of   the  reasoner  used.  

•  The   op;miza;ons   can   improve   query   execu;on   ;me   by   up   to   three   orders   of  magnitude.  

•  Future  work  will  include  the  crea;on  of  more  accurate  cost  es;mates  for  the  cost-­‐based  query  reordering.    (the  next  paper  on  course  webpage!)  

•  The   implementa;on   of   caching   strategies   that   reduce   the   number   of   tests   for  different  instan;a;ons  of  a  complex  axiom  template.  (no  reported  work  yet!)  

•  Although   the   proposed   op;miza;ons   can   significantly   improve   the   query  execu;on   ;me,   the   required   ;me   can   s;ll   be   quite   high.   In   prac;ce,   it   is,  therefore,   advisable   to   add   as   many   restric;ve   axiom   templates   for   query  variables  as  possible.    

 Thank  You!  

J