Iowa State University, Department of Computer Science, Artificial Intelligence Research Laboratory
Query Translation for Data Sources with Heterogeneous Content Semantics
Jie Bao
Department of Computer Science
Iowa State University
May 5, 2006
Outline
• Ontology-Extended Data Sources (OEDS)
• Query Translation for OEDS with Heterogeneous Data Content Semantics
• The INDUS Implementation
• Summary
Data Semantics
Even if you have the data, do you really understand it?
[Figure: behavior categories from a health database for lorises: Environmental Stress, Tiredness, Unwellness, Normal, Hear Something, Fear, Social Stress, Social Play]
Bridging the Semantic Gap
• Interpretations of data are always context-specific; therefore, semantic gaps are common:
– between data sources in the same domain
– between a data provider and a data user
– between different users of the same data source
• Ontologies can make explicit the usually implicit assumptions about the “meaning” of data.
Example: Academic Department
• Schema: Student --RegisterFor--> Classes --OfferedBy--> Instructors, with a data set conforming to it
• Ontological commitment:
– Students and Instructors are People (data schema ontology)
– Classes:Duration's values are times in minutes (data content ontology)
– Student status “2ndYear” implies “Undergrad” (data content ontology)
We focus on data content ontologies in this work.
Data Content Ontologies
• Data provider’s ontology: Classes:Duration : Minutes
• Data users’ ontologies:
– Jane’s ontology: Classes:Duration : Minutes
– Bob’s ontology: Classes:Duration : Hours
• Kinds of data content ontologies shown: AVH (Attribute Value Hierarchy) and Unit Scale
Ontology-Extended Data Sources
Ontology-extended data sources (OEDS) make explicit the otherwise implicit ontologies associated with data sources.
• Ontologies can be specified by data providers or by data users, representing their local points of view.
[Figure: an ontology-extended data source. A data source (relational, RDF, …) with schema S and data set D is linked to a data schema ontology OS and a data content ontology OD.]
Data Content Ontology as Data Type
• Common data types: String, Integer, Float, …
• Unit scales
– e.g., MinuteDuration, HourDuration
• Hierarchies as partial-order ontologies (PO)
– Partial orderings (≤) are reflexive, antisymmetric, and transitive relations.
– PO operators: = (equal to), < (below), > (above), ≥ (above or equal to), ≤ (below or equal to), ≠ (not equal to)
– e.g., StudentStatus: Undergrad ≤ StudentStatus, Undergrad ≥ 1st_Year, 2nd_Year ≤ Undergrad, …
• They can be easily implemented as extensions to many RDBMSs: Oracle, PostgreSQL…
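The PO operators can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the INDUS implementation: the class `PartialOrderOntology` and its methods are hypothetical, the hierarchy is stored as (child, parent) edges, and ≤ is computed as reachability along parent links (the reflexive-transitive closure of the edges). The term "Grad" is added to the StudentStatus fragment because it appears in later examples.

```python
class PartialOrderOntology:
    """A term hierarchy ordered by a reflexive, antisymmetric, transitive relation (<=).

    Hypothetical helper for illustration; terms are plain strings.
    """

    def __init__(self, edges):
        # Each (child, parent) edge states child <= parent ("child is below parent").
        self.parents = {}
        for child, parent in edges:
            self.parents.setdefault(child, set()).add(parent)
            self.parents.setdefault(parent, set())

    def below_or_equal(self, a, b):
        """a <= b iff b is reachable from a by following parent links (or a == b)."""
        stack, seen = [a], set()
        while stack:
            term = stack.pop()
            if term == b:
                return True
            if term not in seen:
                seen.add(term)
                stack.extend(self.parents.get(term, ()))
        return False

    def compare(self, a, op, b):
        """Evaluate one of the PO operators: =, <, >, <=, >=, !=."""
        le = self.below_or_equal(a, b)
        ge = self.below_or_equal(b, a)
        results = {
            "=": le and ge,  # antisymmetry: a <= b and b <= a means a == b
            "<": le and not ge,
            ">": ge and not le,
            "<=": le,
            ">=": ge,
            "!=": not (le and ge),
        }
        return results[op]


# A fragment of the StudentStatus hierarchy.
status = PartialOrderOntology([
    ("Undergrad", "StudentStatus"),
    ("Grad", "StudentStatus"),
    ("1st_Year", "Undergrad"),
    ("2nd_Year", "Undergrad"),
])

print(status.compare("2nd_Year", "<", "StudentStatus"))  # True
print(status.compare("1st_Year", "<=", "Grad"))          # False
```

Because the order is only partial, two terms such as 1st_Year and 2nd_Year can be incomparable: every operator except ≠ then evaluates to false.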
Ontology-Extended Query
Bob’s query: How many regular classes (classes whose duration, in hours, is longer than half an hour) are taken by students with status ‘Masters’?
However, the data source cannot directly understand this query because of semantic gaps:
• The data provider’s ontology has no concept equivalent to “Masters”.
• Class duration is recorded in the data source in minutes, not hours.
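One way to make an ontology-extended query concrete is to tag every constant with the ontology (data type) that gives it meaning. The sketch below encodes Bob's query that way and flags the conditions the data source cannot interpret directly. The classes `Term` and `Condition`, the attribute names, the helper `semantic_gaps`, and the set `PROVIDER_ONTOLOGIES` are all hypothetical; reading "status Masters" as "Masters or any status below it" is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A value tagged with the ontology (data type) that gives it meaning, e.g. HourDuration:0.5."""
    ontology: str
    value: object


@dataclass(frozen=True)
class Condition:
    """A selection condition `attribute op term`, e.g. Duration > HourDuration:0.5."""
    attribute: str
    op: str  # one of =, <, >, <=, >=, !=
    term: Term


# Bob's query, expressed in his own ontologies (hypothetical encoding).
bobs_query = [
    Condition("Duration", ">", Term("HourDuration", 0.5)),
    # "status Masters" read as: Masters or any status below it in the hierarchy.
    Condition("Status", "<=", Term("GradStatus", "Masters")),
]

# Assumed: the ontologies the data provider actually uses.
PROVIDER_ONTOLOGIES = {"MinuteDuration", "StudentStatus"}


def semantic_gaps(query, provider_ontologies):
    """Return the conditions whose terms the data source cannot interpret directly."""
    return [c for c in query if c.term.ontology not in provider_ontologies]


for gap in semantic_gaps(bobs_query, PROVIDER_ONTOLOGIES):
    print(gap.attribute, gap.term.ontology)
# Duration HourDuration
# Status GradStatus
```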
Query Translation
• Query translation transforms a query expressed in one ontology into a query expressed in another ontology
– usually from a user’s ontology to the data provider’s ontology
• Let {q(t)} denote the set of tuples that match a query q
• A translation q -> q′ is
– Sound, if {q′(t)} ⊆ {q(t)} (every retrieved result is needed)
– Complete, if {q(t)} ⊆ {q′(t)} (every needed result is retrieved)
– Exact, if {q(t)} = {q′(t)} (both sound and complete)
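The three properties are plain set inclusions, so they can be checked directly on sample data. The sketch below is a hypothetical helper `classify_translation` that compares the answer sets of q and q′ over the same tuples; the duration values are made up for illustration. Note that a finite sample can only refute soundness or completeness, not prove it for all possible data.

```python
def classify_translation(q, q_prime, tuples):
    """Classify q -> q' by comparing {q(t)} and {q'(t)} over the given tuples.

    On a finite sample this can refute soundness or completeness but cannot prove them.
    """
    wanted = {t for t in tuples if q(t)}     # {q(t)}: results the user needs
    got = {t for t in tuples if q_prime(t)}  # {q'(t)}: results actually retrieved
    sound = got <= wanted                    # every retrieved result is needed
    complete = wanted <= got                 # every needed result is retrieved
    if sound and complete:
        return "exact"
    if sound:
        return "sound"
    if complete:
        return "complete"
    return "neither"


# Hypothetical class durations stored in minutes; the user asks for classes longer than 0.5 hours.
durations = [15, 30, 45, 50, 90]


def user_query(minutes):
    return minutes / 60 > 0.5


print(classify_translation(user_query, lambda m: m > 30, durations))  # exact
print(classify_translation(user_query, lambda m: m > 45, durations))  # sound
print(classify_translation(user_query, lambda m: m > 15, durations))  # complete
```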
Translation with Conversion Function
• A conversion function f: O1 -> O2 establishes one-to-one correspondences between terms in the two ontologies
– O1:t and O2:f(O1:t) are semantically equivalent
• Examples:
– State2Code: {Iowa -> IA, Delaware -> DE, …}
– H2M: y = x × 60 (HourDuration to MinuteDuration)
• With conversion functions, exact translation can be achieved by term substitution
– Duration op HourDuration:0.5 -> Duration op MinuteDuration:30 (the comparison operator op is kept unchanged)
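Term substitution can be sketched as a lookup of the conversion function followed by a rewrite of the constant. In this sketch the ontology names "State" and "StateCode", the table `CONVERSIONS`, and the helper `substitute` are hypothetical. Keeping the operator unchanged is only safe for = and ≠ under any one-to-one mapping, and for ordering operators when the mapping preserves order, as H2M does.

```python
# Conversion functions; the ontology names "State" and "StateCode" are assumed.
STATE2CODE = {"Iowa": "IA", "Delaware": "DE"}  # excerpt only


def h2m(hours):
    """H2M: HourDuration -> MinuteDuration, y = x * 60."""
    return hours * 60


CONVERSIONS = {
    ("HourDuration", "MinuteDuration"): h2m,
    ("State", "StateCode"): STATE2CODE.__getitem__,
}


def substitute(condition, target_ontology):
    """Exact translation by term substitution: replace the term, keep the operator.

    Keeping the operator is valid for = and != under any one-to-one mapping, and for
    <, >, <=, >= only when the mapping preserves order (H2M does).
    """
    attribute, op, (source_ontology, value) = condition
    convert = CONVERSIONS[(source_ontology, target_ontology)]
    return (attribute, op, (target_ontology, convert(value)))


print(substitute(("Duration", ">", ("HourDuration", 0.5)), "MinuteDuration"))
# ('Duration', '>', ('MinuteDuration', 30.0))
print(substitute(("State", "=", ("State", "Iowa")), "StateCode"))
# ('State', '=', ('StateCode', 'IA'))
```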
Translation with Interoperation Constraints (1)
• In many cases, no one-to-one term correspondence exists
– Float:3.5 has no correspondence in Integer
– GradStatus:Masters has no correspondence in StudentStatus
• Therefore, exact translation is not always possible.
• However, we may still build sound or complete translations with the help of interoperation constraints (IC).
Translation with Interoperation Constraints (2)
• IC between Float and Integer
– Float:x <= Integer:⌈x⌉ (ceiling)
– Float:x >= Integer:⌊x⌋ (floor)
• Translation rules
– Sound translation: A < Float:x -> A < Integer:⌊x⌋; A > Float:x -> A > Integer:⌈x⌉
– Complete translation: A < Float:x -> A < Integer:⌈x⌉; A > Float:x -> A > Integer:⌊x⌋
• Example
– Sound translation: A < Float:3.5 -> A < Integer:3; A > Float:3.5 -> A > Integer:4
– Complete translation: A < Float:3.5 -> A < Integer:4; A > Float:3.5 -> A > Integer:3
The translation depends on both the terms and the operators involved.
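The Float-to-Integer rules reduce to choosing floor or ceiling based on the operator and on whether a sound or a complete translation is wanted. The function name `translate_float_bound` and its string-valued `mode` parameter are hypothetical; only < and > are covered, matching the rules listed for this slide.

```python
import math


def translate_float_bound(op, x, mode):
    """Translate `A op Float:x` into `A op Integer:n` using the ceiling/floor ICs.

    mode="sound" may drop needed tuples but never adds unneeded ones;
    mode="complete" may add unneeded tuples but never drops needed ones.
    """
    if op not in ("<", ">"):
        raise ValueError("this sketch only covers < and >")
    if mode not in ("sound", "complete"):
        raise ValueError("mode must be 'sound' or 'complete'")
    # Sound tightens the bound (floor for <, ceiling for >); complete loosens it.
    use_floor = (op == "<") == (mode == "sound")
    n = math.floor(x) if use_floor else math.ceil(x)
    return op, n


for mode in ("sound", "complete"):
    print(mode, translate_float_bound("<", 3.5, mode), translate_float_bound(">", 3.5, mode))
# sound ('<', 3) ('>', 4)
# complete ('<', 4) ('>', 3)
```

When x is already an integer, floor and ceiling coincide, so the sound and complete translations are the same and the translation is exact.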
Translation with Interoperation Constraints (3)
• IC between partial-order ontologies
– INTO (<=): GradStatus:"Masters" <= StudentStatus:"Grad"
– ONTO (>=): GradStatus:"Masters" >= StudentStatus:"Master of Science"
– EQUIV (=): GradStatus:"Ph.D" = StudentStatus:"Doctor of Philosophy"
Translation Rules for PO
Example
• Sound translation: Status <= GradStatus:"Masters" -> Status <= StudentStatus:"Master of Science" (IC: GradStatus:"Masters" >= StudentStatus:"Master of Science")
• Complete translation: Status <= GradStatus:"Masters" -> Status <= StudentStatus:"Grad" (IC: GradStatus:"Masters" <= StudentStatus:"Grad")
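For a "below or equal" condition such as Status <= GradStatus:"Masters", transitivity tells us which ICs are usable. If a provider term p satisfies p <= "Masters" (ONTO), then Status <= p implies Status <= "Masters", so the translation is sound. If "Masters" <= p (INTO), then Status <= "Masters" implies Status <= p, so the translation is complete. The sketch below encodes that reasoning. The names `ICS`, `USABLE`, and `translate_below_or_equal` are hypothetical, and choosing the tightest term among several usable ICs is left out.

```python
# Interoperation constraints: (GradStatus term, relation, StudentStatus term).
ICS = [
    ("Masters", "<=", "Grad"),               # INTO
    ("Masters", ">=", "Master of Science"),  # ONTO
    ("Ph.D", "=", "Doctor of Philosophy"),   # EQUIV
]

# Which IC relations justify each kind of translation of `Status <= user_term`:
#   sound    needs provider_term <= user_term  (ONTO or EQUIV)
#   complete needs user_term <= provider_term  (INTO or EQUIV)
USABLE = {"sound": {">=", "="}, "complete": {"<=", "="}}


def translate_below_or_equal(user_term, mode):
    """Translate `Status <= GradStatus:user_term` into a StudentStatus condition, or None."""
    candidates = [(rel, provider_term) for term, rel, provider_term in ICS
                  if term == user_term and rel in USABLE[mode]]
    if not candidates:
        return None
    # Prefer an EQUIV constraint, which makes the translation exact.
    candidates.sort(key=lambda c: c[0] != "=")
    return ("<=", candidates[0][1])


print(translate_below_or_equal("Masters", "sound"))     # ('<=', 'Master of Science')
print(translate_below_or_equal("Masters", "complete"))  # ('<=', 'Grad')
print(translate_below_or_equal("Ph.D", "sound"))        # ('<=', 'Doctor of Philosophy')
```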
A Query Translation Algorithm
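The algorithm is not reproduced in this text. As a hedged illustration of one way to combine per-condition rules for a conjunctive query, the sketch below relies on a simple monotonicity argument. If each translated condition is sound (or complete), their conjunction is sound (or complete). A condition with no sound replacement can be replaced by "match nothing," which is trivially sound. A condition with no complete replacement can be dropped, which can only retrieve more tuples. The functions `translate_conjunction`, `hour_rule`, and `grad_rule`, the tuple encoding of conditions, and the use of `None` for "match nothing" are all hypothetical.

```python
def translate_conjunction(conditions, rules, mode):
    """Translate a conjunction of (attribute, op, ontology, value) conditions.

    Returns the translated conditions (an empty list means "no constraint"),
    or None, meaning "match nothing".
    """
    translated = []
    for cond in conditions:
        rule = rules.get(cond[2])
        new = rule(cond, mode) if rule else None
        if new is not None:
            translated.append(new)
        elif mode == "sound":
            return None  # no sound replacement: returning nothing is trivially sound
        # mode == "complete": dropping the condition can only retrieve more tuples
    return translated


def hour_rule(cond, mode):
    """HourDuration -> MinuteDuration by H2M; exact, so valid in either mode."""
    attribute, op, _, hours = cond
    return (attribute, op, "MinuteDuration", hours * 60)


def grad_rule(cond, mode):
    """GradStatus -> StudentStatus for `<= Masters`, using the INTO/ONTO constraints."""
    attribute, op, _, term = cond
    if op != "<=" or term != "Masters":
        return None
    target = "Master of Science" if mode == "sound" else "Grad"
    return (attribute, "<=", "StudentStatus", target)


RULES = {"HourDuration": hour_rule, "GradStatus": grad_rule}
bobs_query = [("Duration", ">", "HourDuration", 0.5),
              ("Status", "<=", "GradStatus", "Masters")]

print(translate_conjunction(bobs_query, RULES, "sound"))
# [('Duration', '>', 'MinuteDuration', 30.0), ('Status', '<=', 'StudentStatus', 'Master of Science')]
print(translate_conjunction(bobs_query, RULES, "complete"))
# [('Duration', '>', 'MinuteDuration', 30.0), ('Status', '<=', 'StudentStatus', 'Grad')]
```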
Ontology-based information integration in INDUS
Query processing in INDUS
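[Figure: query processing pipeline. A user query QL, posed in the local schema SV and local ontology OV, passes through Query Formulation; Local Rewriting (QLSQL, in local schema SV); Query Decomposition (Q1 … Qn, over remote schemas S1 … Sn, still in ontology OV); Query Translation using mappings M1 … Mn (Qr1 … Qrn, in remote ontologies O1 … On); Remote Rewriting (Qr1SQL … QrnSQL, in remote schemas S1 … Sn); Query Execution on data sources D1 … Dn (results r1 … rn); Inverse Translation using M1 … Mn (r1L … rnL, in the local ontology); and Result Composition (final result RL).]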
Handling both schema heterogeneity and data content heterogeneity
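The data flow through the content-related stages of this pipeline can be illustrated with a toy example. This is a minimal sketch, not INDUS code: the two sources, their rows, and the mapping tables `TO_SOURCE` and `FROM_SOURCE` are hypothetical. The SQL rewriting steps are omitted, and each source is assumed to differ from the user's view only in its duration unit.

```python
# Two hypothetical sources that record class durations in different units.
SOURCES = {
    "D1": {"unit": "MinuteDuration", "rows": [("AI", 50), ("DB", 75), ("Lab", 20)]},
    "D2": {"unit": "HourDuration", "rows": [("Logic", 1.5), ("Seminar", 0.25)]},
}

# Hypothetical mappings M_i between the user's HourDuration and each source's unit.
TO_SOURCE = {"MinuteDuration": lambda hours: hours * 60, "HourDuration": lambda hours: hours}
FROM_SOURCE = {"MinuteDuration": lambda minutes: minutes / 60, "HourDuration": lambda hours: hours}


def classes_longer_than(min_hours):
    """User query in the local ontology: classes with Duration > HourDuration:min_hours."""
    results = []
    for source_name, source in SOURCES.items():  # query decomposition: one subquery per source
        unit = source["unit"]
        threshold = TO_SOURCE[unit](min_hours)   # query translation into the source's ontology
        hits = [(title, d) for title, d in source["rows"] if d > threshold]  # query execution
        for title, d in hits:
            # inverse translation: report durations back in the user's ontology (hours)
            results.append((title, FROM_SOURCE[unit](d), source_name))
    return sorted(results)                       # result composition


for row in classes_longer_than(0.5):
    print(row)
```

Keeping the per-source mappings in separate tables mirrors the figure's use of M1 … Mn in both the translation and the inverse translation steps.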
INDUS: Ontology Editor
INDUS: Schema Editor
INDUS: Mapping Editor
INDUS: Query Editor
Related work
• Extensive work on semantic data integration; see the surveys [Hull, 1997; Wache et al., 2001; Levy, 2000]
• Query translation with schema ontologies
– OBSERVER [Mena et al., 2000]
– SIRUP [Ziegler and Dittrich, 2004]
• Query translation with data content ontologies
– BUSTER [Wache and Stuckenschmidt, 2001] and COIN [Goh et al., 1999]: both address only term substitution, i.e., translation with conversion functions
– HOME and ontology-extended relational algebra [Bonatti et al., 2003]: allow data types to be hierarchies, but support only the “below” (<=) operation on hierarchies
Conclusions
• In this study, we:
– Argued for the need to make explicit the ontological commitments behind data content semantics, in addition to data schema semantics
– Formulated the problem of translating queries with respect to context-specific data content ontologies
– Described an algorithm for semantics-preserving translation of ontology-extended queries
• Future work:
– Improve the scalability of the translation process
– Improve the expressiveness of the supported ontologies
Thank you!
Questions?