Integrating Databases into the Semantic Web through an Ontology-based Framework. Dejing Dou, Paea LePendu, Shiwoong Kim, Computer and Information Science, University of Oregon, USA. Peishen Qi, Computer Science Department, Yale University, USA. April, 2006 @ SWDB’06.
1
Integrating Databases into the Semantic Web through an Ontology-based Framework
Dejing Dou, Paea LePendu, Shiwoong Kim
Computer and Information Science, University of Oregon, USA
Peishen Qi
Computer Science Department, Yale University, USA
April, 2006 @ SWDB’06
2
Outline
Introduction
– The status of the Semantic Web
– Realizing SW needs existing databases
OntoGrate: An Ontology-based Information Integration Framework
– Some previous work
– Modules in OntoGrate Architecture
Case Study for integrating Databases into SW
– Without an existing domain ontology
– With an existing domain ontology
Conclusion and Future Work
3
The Semantic Web
One major goal of the Semantic Web is that web-based agents can process and “understand” data [Berners-Lee etal01].
Ontologies formally describe the semantics of data, and web-based agents can take SW documents (e.g., in RDF/OWL) as a set of assertions (true statements) and draw inferences from them.
[Figure: human, SW, Web-based agents]
4
What do we have now?
DAML+OIL → OWL (Web Ontology Language)
More and more domain ontologies are defined in DAML+OIL/OWL, even for some specific domains (e.g., GO)
We are developing some tools, agents, services
See http://www.semwebcentral.org, http://knowledgeweb.semanticweb.org/, http://www.daml.org/
5
Two things are important
Real data for sharing
– Relational databases (may be the biggest resource)
– Other kinds of databases
– WWW/XML data
– Some knowledge bases
Better Semantic Web services/agents
6
Semantic Annotation for Data?
It works well for small data resources.
It works less well for large data resources (relational databases):
– “Redundant” copies
– Time-consuming query answering
E.g., it currently works by loading OWL data into a knowledge base and then answering queries with DL ABox reasoning. (Can it compete with existing DBMSs, which have well-developed indexing and query optimization techniques?)
It is better if relational databases can be accessed/queried directly by SW agents/services.
7
The difficulties
The Semantic Web: ontologies define the semantics of data
The Relational DBs: schemas define the structure and integrity constraints
8
A more general question
How can we make databases, SW resources, WWW/XML data, and KBs work together?
The problem is similar:
– SW resources and KBs are defined by ontologies, which are more expressive and focus on semantics
– Databases and XML documents are defined by schemas, which focus on structure
– Syntax differences (e.g., OWL vs. SQL)
9
OntoGrate: An Ontology-based Information Integration System
10
Some Previous Work
Schemas (e.g., the stores7 DB in IBM Informix)
11
Some Previous Work
Schemas, Ontologies and Web-PDDL
Schema                 Ontology
Relation               Type/Class
Attribute              Predicate/Property
Integrity Constraint   Axiom/Rule
Primary Key            Fact/Instance
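The schema-to-ontology correspondence can be sketched in a few lines of Python. This is only an illustration: the `Table` class, the `schema_to_ontology` helper and the shape of the returned dictionary are hypothetical and are not part of PDDSQL or OntoGrate. Integrity constraints (which would become axioms/rules) are left out to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """A relational table: name, {column: SQL type}, primary-key column, rows."""
    name: str
    columns: dict
    primary_key: str
    rows: list = field(default_factory=list)


def schema_to_ontology(table):
    """Hypothetical helper following the correspondence on the slide:
    relation -> type, attribute -> predicate, primary-key value -> instance."""
    type_name = table.name.capitalize()
    predicates = [
        # Each attribute becomes a binary predicate over (instance, value).
        (f"{table.name}{col}", type_name, sql_type)
        for col, sql_type in table.columns.items()
    ]
    # Each row yields one instance, identified by its primary-key value.
    instances = [f"{type_name}#{row[table.primary_key]}" for row in table.rows]
    return {"type": type_name, "predicates": predicates, "instances": instances}


state = Table(
    name="state",
    columns={"code": "varchar", "name": "varchar"},
    primary_key="code",
    rows=[{"code": "OR", "name": "Oregon"}],
)
ontology = schema_to_ontology(state)
```

Here the `state` relation yields a `State` type, the predicates `statecode` and `statename`, and one instance per row.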
12
Some Previous Work
Merging Ontologies with Bridging Axioms
13
Some Previous Work
The bridging axiom/mapping on customerfname/customerlname vs. customercontactname:
(forall (c - @stores7:Customer f l - @sql:varchar)
  (if (and (@stores7:customerfname c f)
           (@stores7:customerlname c l))
      (@nwind:customercontactname c (@sql:concat f l))))
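Read as a forward rule, this axiom derives one nwind fact from each pair of stores7 facts about the same customer. A minimal Python sketch follows, assuming facts are `(predicate, subject, value)` tuples and that `@sql:concat` is plain string concatenation with no separator. Both are assumptions made for illustration; `translate_contact_name` is a hypothetical name, not an OntoEngine function.

```python
def translate_contact_name(stores7_facts, concat=lambda f, l: f + l):
    """Whenever a customer has both a customerfname and a customerlname fact,
    derive a customercontactname fact (the consequent of the axiom)."""
    first = {c: v for (pred, c, v) in stores7_facts if pred == "customerfname"}
    last = {c: v for (pred, c, v) in stores7_facts if pred == "customerlname"}
    derived = []
    for c in first:
        if c in last:  # both antecedent atoms hold for customer c
            derived.append(("customercontactname", c, concat(first[c], last[c])))
    return derived


facts = [
    ("customerfname", "C1", "Paea"),
    ("customerlname", "C1", "LePendu"),
    ("customerfname", "C2", "Dejing"),  # no last name: the axiom does not fire
]
derived = translate_contact_name(facts)
```

Passing a different `concat` (for example one that inserts a space) changes only the function term, not the rule structure.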
14
Some Previous Work
The bridging axiom/mapping on customerregion vs. customerstatecode/statename/statecode:
(forall (x - @nwind:Customer y - @sql:varchar)
  (if (@nwind:customerregion x y)
      (exists (z - @stores7:State t - @sql:varchar)
        (and (@stores7:customerstatecode x t)
             (@stores7:statename z y)
             (@stores7:statecode z t)))))
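The existential variables `z` and `t` mean that a forward translation has to introduce placeholder terms when no concrete state or code is known. One common way to do this is Skolemization, sketched below in Python. The placeholder naming scheme (`sk_state_n`, `sk_code_n`) and the `translate_region` helper are hypothetical, and the source does not say that OntoEngine handles existentials this way.

```python
def translate_region(nwind_facts):
    """For each (customerregion x y) fact, emit the three stores7 facts from
    the axiom's consequent, with fresh placeholders for z and t."""
    regions = [(x, y) for (pred, x, y) in nwind_facts if pred == "customerregion"]
    derived = []
    for n, (x, y) in enumerate(regions, start=1):
        z, t = f"sk_state_{n}", f"sk_code_{n}"  # fresh terms for the exists-variables
        derived += [
            ("customerstatecode", x, t),
            ("statename", z, y),
            ("statecode", z, t),
        ]
    return derived


stores7_facts = translate_region([("customerregion", "C1", "Oregon")])
```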
15
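Some Previous Work
Inferential Data Integration with OntoEngine
– Data Translation:
View data as true statements, e.g., (statecode S#28 “OR”)
$(M_{s\_t}; \alpha_s) \rightsquigarrow_D \alpha_t$ only if $(M_{s\_t}; \alpha_s) \models \alpha_t$
$(M_{s\_t}; \alpha_s) \rightsquigarrow_D \alpha_t \;\Rightarrow\; (M_{s\_t}; \alpha_s) \vdash \alpha_t \;\Rightarrow\; (M_{s\_t}; \alpha_s) \models \alpha_t$
– Query Translation:
$(M_{s\_t}; \alpha_s) \rightsquigarrow_Q \alpha_t$ only if $(M_{s\_t}; \theta(\alpha_t)) \models \theta(\alpha_s)$
$(M_{s\_t}; \alpha_s) \rightsquigarrow_Q \alpha_t \;\Rightarrow\; (M_{s\_t}; \theta(\alpha_t)) \vdash \theta(\alpha_s) \;\Rightarrow\; (M_{s\_t}; \theta(\alpha_t)) \models \theta(\alpha_s)$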
16
OntoGrate Architecture Revisited
17
Modules in OntoGrate Architecture
The Syntax Translators (Wrappers)
– e.g., PDDSQL (SQL ↔ Web-PDDL), PDDOWL (OWL ↔ Web-PDDL)
The Matching (correspondence) Generation
– e.g., name and structure (tree, graph) similarity, synonyms and is-a (part-of) relationships using thesauri and dictionaries, such as WordNet
The Data Mining Module
The Machine Learning Module
The Inference Engine (OntoEngine)
The User Interface
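As an illustration of name-based matching generation, the sketch below ranks candidate target names by plain string similarity using Python's standard `difflib`. It is a stand-in, not the matcher used in OntoGrate: the `rank_candidates` helper is hypothetical, and structure, synonym or thesaurus information is not used at all.

```python
from difflib import SequenceMatcher


def rank_candidates(source_name, target_names):
    """Return target names sorted by descending string similarity to source_name."""
    def score(target):
        # Case-insensitive similarity ratio in [0, 1].
        return SequenceMatcher(None, source_name.lower(), target.lower()).ratio()
    return sorted(target_names, key=score, reverse=True)


candidates = rank_candidates("customercity", ["FirstName", "LastName", "City"])
```

Such a ranking only proposes candidate correspondences; names that share long substrings (e.g., first-name and last-name columns) are easy to confuse, so a domain expert or a further module still has to confirm the mapping.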
18
Learning the mappings from domain experts
(forall (x - @A1:Invertebrate)
(if (is @A1:Insect x)
(and (@A2:legs x 6)
(@A2:bodySegments x 3))))
19
Mining the mappings from large datasets
For example, consider two medical databases in the same hospital: DB1 lists the blood pressure of patients with nominal values, such as low, normal, at risk, and high, while DB2 records the exact numerical values for systolic and diastolic pressure.
By association rule mining, we may get a rule/mapping like:
@DB2:SystolicPressure > 140 ∧ @DB2:DiastolicPressure > 90 ⇒ @DB1:BloodPressure = 'High'
(support = 40%, confidence = 90%)
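Support and confidence for such a rule can be computed directly once the two databases are joined per patient. The Python sketch below uses made-up toy records, so its numbers differ from the figures on the slide. The strict `>` comparisons and the per-patient join are assumptions, since the slide does not spell them out.

```python
def rule_stats(records, antecedent, consequent):
    """Support = share of all records where antecedent and consequent both hold.
    Confidence = share of antecedent-matching records where the consequent holds."""
    matches_a = [r for r in records if antecedent(r)]
    matches_both = [r for r in matches_a if consequent(r)]
    support = len(matches_both) / len(records)
    confidence = len(matches_both) / len(matches_a) if matches_a else 0.0
    return support, confidence


# Toy joined records: numeric values as in DB2, nominal label as in DB1.
records = [
    {"systolic": 150, "diastolic": 95, "bp": "High"},
    {"systolic": 160, "diastolic": 100, "bp": "High"},
    {"systolic": 145, "diastolic": 92, "bp": "At risk"},
    {"systolic": 120, "diastolic": 80, "bp": "Normal"},
    {"systolic": 110, "diastolic": 70, "bp": "Normal"},
]
support, confidence = rule_stats(
    records,
    antecedent=lambda r: r["systolic"] > 140 and r["diastolic"] > 90,
    consequent=lambda r: r["bp"] == "High",
)
```

On these five records the rule holds for two of the three patients who satisfy the antecedent.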
20
Case Study in Two Scenarios
Integrating DBs into SW without an existing domain ontology
Integrating DBs into SW with an existing domain ontology
21
Without an existing domain ontology
22
Generating OWL ontologies from DB Schemas
SQL schema → Web-PDDL (by using PDDSQL)
Web-PDDL → OWL (by using PDDOWL)
– E.g., Stores7.sql → Stores7.pddl → Stores7.owl
...
<owl:Class rdf:ID="Customer">
  <rdfs:subClassOf rdf:resource="http://www.cs.uoregon.edu/~paea/sql#Relation"/>
</owl:Class>
<owl:DatatypeProperty rdf:ID="customercity">
  <rdfs:domain rdf:resource="#Customer"/>
  <rdfs:range rdf:resource="#String"/>
</owl:DatatypeProperty>
...
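A rough sketch of how such OWL fragments could be generated from a table description is given below in Python. It is not PDDOWL (which goes through Web-PDDL) and `table_to_owl` is a hypothetical helper. It emits only the class and datatype-property elements in the style of the Stores7.owl excerpt, and it maps every column to `#String` as a simplifying assumption.

```python
from xml.sax.saxutils import quoteattr

SQL_NS = "http://www.cs.uoregon.edu/~paea/sql#"  # base URI as shown on the slide


def table_to_owl(table, columns):
    """Emit one owl:Class for the table and one owl:DatatypeProperty per column."""
    class_name = table.capitalize()
    lines = [
        f"<owl:Class rdf:ID={quoteattr(class_name)}>",
        f"  <rdfs:subClassOf rdf:resource={quoteattr(SQL_NS + 'Relation')}/>",
        "</owl:Class>",
    ]
    for col in columns:
        lines += [
            # Property names are prefixed with the table name, e.g. customercity.
            f"<owl:DatatypeProperty rdf:ID={quoteattr(table + col)}>",
            f"  <rdfs:domain rdf:resource={quoteattr('#' + class_name)}/>",
            '  <rdfs:range rdf:resource="#String"/>',  # assumption: all columns are strings
            "</owl:DatatypeProperty>",
        ]
    return "\n".join(lines)


owl = table_to_owl("customer", ["city", "fname", "lname"])
```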
23
An OWL-QL query based on Stores7.owl
<owl-ql:query xmlns:owl-ql="http://www.w3.org/2003/10/owl-ql-syntax#" ...>
  <owl-ql:premise>
    <rdf:RDF>
      <rdf:Description rdf:about="#C">
        <rdf:type rdf:resource="#Customer"/>
        <customercity rdf:resource="#Eugene"/>
      </rdf:Description>
    </rdf:RDF>
  </owl-ql:premise>
  <owl-ql:queryPattern>
    <rdf:RDF>
      <rdf:Description rdf:about="#C">
        <customerfname rdf:resource="http://www.w3.org/2003/10/owl-ql-variables#x"/>
        <customerlname rdf:resource="http://www.w3.org/2003/10/owl-ql-variables#y"/>
      </rdf:Description>
    </rdf:RDF>
  </owl-ql:queryPattern>
  <owl-ql:answerKBPattern>
    <owl-ql:kbRef rdf:resource="...stores7.owl"/>
    …
24
The corresponding Web-PDDL and SQL queries
Web-PDDL query (translated from the OWL-QL query by PDDOWL):
(and (customercity ?C - Customer "Eugene")
     (customerfname ?C - Customer ?x - String)
     (customerlname ?C - Customer ?y - String))
SQL query (translated from the Web-PDDL query by PDDSQL):
SELECT C.customerfname, C.customerlname
FROM Customer C
WHERE C.customercity = "Eugene"
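The step from a conjunctive Web-PDDL query to SQL can be sketched as follows in Python: atoms with a variable in value position are projected, atoms with a constant become filters. The `atoms_to_sql` helper and the `(predicate, value)` atom encoding are hypothetical simplifications for a single-table query and are not the PDDSQL implementation.

```python
def atoms_to_sql(atoms, table, alias="C"):
    """atoms: list of (predicate, value); value is a '?variable' or a constant.
    Illustration only: real code should use parameterized queries, not string
    interpolation of constants."""
    select, where = [], []
    for predicate, value in atoms:
        column = f"{alias}.{predicate}"
        if value.startswith("?"):
            select.append(column)  # variables are projected
        else:
            where.append(f'{column} = "{value}"')  # constants become filters
    sql = f"SELECT {', '.join(select)}\nFROM {table} {alias}"
    if where:
        sql += "\nWHERE " + " AND ".join(where)
    return sql


sql = atoms_to_sql(
    [("customercity", "Eugene"), ("customerfname", "?x"), ("customerlname", "?y")],
    table="Customer",
)
```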
25
Getting Answers from Stores7 DB
Result table from the Stores7 DB:
customerfname   customerlname
Paea            LePendu
Dejing          Dou
Shiwoong        Kim
Web-PDDL bindings (via PDDSQL):
{?x/Paea, ?y/LePendu}
{?x/Dejing, ?y/Dou}
{?x/Shiwoong, ?y/Kim}
OWL-QL answer (via PDDOWL):
<owl-ql:answerBundle xmlns:owl-ql="http://www.w3.org/2003/10/owl-ql-syntax#" ...>
  <owl-ql:answer>
    <owl-ql:binding-set>
      <var:x rdf:resource="#Paea"/>
      <var:y rdf:resource="#LePendu"/>
    </owl-ql:binding-set>
    <owl-ql:answerPatternInstance>
      <rdf:RDF>
        <rdf:Description rdf:about="#C">
          <customerfname rdf:resource="#Paea"/>
          …
Timings shown on the slide: (1000 bindings/3 secs), (1000/100,000/3 secs)
26
With an existing domain ontology
Order ontology: http://www.dayf.de/2004/owl/order.owl
27
An OWL-QL query based on order.owl
<owl-ql:query xmlns:owl-ql="http://www.w3.org/2003/10/owl-ql-syntax#" ...>
  <owl-ql:premise>
    <rdf:RDF>
      <rdf:Description rdf:about="#C">
        <rdf:type rdf:resource="#Person"/>
        <hasAddress rdf:resource="#A"/>
      </rdf:Description>
      <rdf:Description rdf:about="#A">
        <rdf:type rdf:resource="#Address"/>
        <City rdf:resource="#Eugene"/>
      </rdf:Description>
    </rdf:RDF>
  </owl-ql:premise>
  <owl-ql:queryPattern>
    <rdf:RDF>
      <rdf:Description rdf:about="#C">
        <FirstName rdf:resource="http://www.w3.org/2003/10/owl-ql-variables#x"/>
        <LastName rdf:resource="http://www.w3.org/2003/10/owl-ql-variables#y"/>
        …
    <owl-ql:kbRef rdf:resource="http://www.dayf.de/2004/owl/order.owl"/>
    …
28
The Bridging Axioms/Mappings between Stores7.pddl and Order.pddl
(T-> @stores7:Customer @order:Person)
(forall (P - @order:Person A - @order:Address z - String)
  (if (and (@order:hasAddress P A)
           (@order:City A z))
      (@stores7:customercity P z)))
(forall (C - @stores7:Customer z - String)
  (if (@stores7:customercity C z)
      (exists (A - @order:Address)
        (and (@order:hasAddress C A)
             (@order:City A z)))))
29
The Bridging Axioms/Mappings between Stores7.pddl and Order.pddl
(T-> @stores7:Customer @order:Person)
(forall (C - @stores7:Customer x - String)
(iff (@stores7:customerfname C x)
(@order:FirstName C x)))
(forall (C - @stores7:Customer y - String)
(iff (@stores7:customerlname C y)
(@order:LastName C y)))
30
The Query Translation between Stores7 and Order
OWL-QL query in order.owl
→ PDDOWL →
(and (hasAddress ?C - Person ?A - Address)
     (City ?A "Eugene")
     (FirstName ?C - Person ?x - String)
     (LastName ?C - Person ?y - String))
→ OntoEngine with Bridging Axioms ( < 1 sec) →
(and (customercity ?C - Customer "Eugene")
     (customerfname ?C - Customer ?x - String)
     (customerlname ?C - Customer ?y - String))
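The effect of the bridging axioms on this query can be imitated with a small rewriting function in Python. It is a hand-coded sketch of the three mappings only, not OntoEngine's inference procedure. The `translate_query` helper and the `(predicate, subject, object)` atom encoding are hypothetical, and type atoms (Person vs. Customer) are omitted.

```python
def translate_query(atoms):
    """Rewrite order-ontology atoms into stores7 atoms.
    Atoms are (predicate, subject, object) tuples."""
    rename = {"FirstName": "customerfname", "LastName": "customerlname"}
    # Map each address variable back to the person variable that owns it.
    person_of = {o: s for (p, s, o) in atoms if p == "hasAddress"}
    out = []
    for p, s, o in atoms:
        if p in rename:
            out.append((rename[p], s, o))
        elif p == "City" and s in person_of:
            # hasAddress(P, A) and City(A, z) together map to customercity(P, z).
            out.append(("customercity", person_of[s], o))
        elif p == "hasAddress":
            continue  # handled together with the matching City atom
        else:
            raise ValueError(f"no bridging axiom for {p}")
    return out


order_query = [
    ("hasAddress", "?C", "?A"),
    ("City", "?A", "Eugene"),
    ("FirstName", "?C", "?x"),
    ("LastName", "?C", "?y"),
]
stores7_query = translate_query(order_query)
```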
31
Final Answers in the order ontology
Result table from the Stores7 DB:
customerfname   customerlname
Paea            LePendu
Dejing          Dou
Shiwoong        Kim
PDDSQL translates the result rows into Web-PDDL facts:
(customerfname C1 Paea) (customerlname C1 LePendu) (customerfname C2 Dejing) …
OntoEngine (40,000 facts/30 secs) translates them into the order ontology:
(FirstName C1 Paea) (LastName C1 LePendu) (FirstName C2 Dejing) …
PDDOWL (10,000 facts/11 secs) produces the OWL-QL answer:
<owl-ql:answer>
  <owl-ql:binding-set>
    <var:x rdf:resource="#Paea"/>
    <var:y rdf:resource="#LePendu"/>
  </owl-ql:binding-set>
  <owl-ql:answerPatternInstance>
    <rdf:RDF>
      <rdf:Description rdf:about="#C">
        <FirstName rdf:resource="#Paea"/>
        <LastName rdf:resource="#LePendu"/>
        …
32
Some related work
Semantic Annotation
– [Stojanovic etal@SAC02] maps the relational model to frame logic/RDF.
– DOGMA [Verheyden etal@SWDB04] translates an ontology query to SQL.
Schema and Ontology mapping
– Similarity matching, machine learning… useful for generating candidate matchings
– Semi-automatic tool (Clio)
Data integration and query answering
– Federated databases [Sheth&Larson 90], data warehouses, peer-to-peer management [Halevy etal@ICDE03]; MiniCon [PottingerLevy@VLDB00] uses query rewriting at GLV
Logic and Databases
– Reiter’s reconstruction of the relational model in FOL.
– Carnot, SIMS, Information Manifold use a global ontology, DL or Datalog
33
Conclusion and Future work
We applied OntoGrate, an ontology-based information integration framework, to integrate relational databases with the Semantic Web. The test results in the two scenarios are promising.
We are developing other modules (e.g., learning/mapping/UI) in OntoGrate.
Scalability and efficiency need to be investigated on larger data resources.
We plan to extend the current work to integrate XML (with/without XML Schemas or DTDs) with the Semantic Web.
34
Thank you for your attention!