Merging Source Query Interfaces on Web Databases
Eduard C. Dragut (speaker), University of Illinois at Chicago
Wensheng Wu, University of Illinois at Urbana-Champaign
Prasad Sistla, University of Illinois at Chicago
Clement Yu, University of Illinois at Chicago
Weiyi Meng, SUNY at Binghamton
ICDE 2006, Atlanta, USA
E. Dragut et al -Merging Source Query Interfaces on Web Databases Page 2
A Motivating Scenario
Looking for a ticket Chicago – Atlanta, April 3rd – April 9th
A user looking for the “best” price for a ticket:
  Has to explore multiple sources (e.g. orbitz.com, aa.com, delta.com)
  It is tedious, frustrating and time-consuming
The goal
Provide a unified way to query multiple sources in the same domain
[Figure: the user formulates a query on a unified query interface (Airfare.com), which reaches priceline.com, nwa.com, delta.com and united.com over the Web]
Overview: Integrating Query Interfaces
  (Deep) Web
  Extract query interfaces [He05, Zhang04], in various formats, e.g. ASCII files
  Cluster query interfaces [Peng04], e.g. Auto, Car Rental, Books, Airfare
  Match query interfaces [B.He03, Dhamankar04, Doan02, Madhavan05, Wu04]
  Merge query interfaces [H.He03]: the topic of this presentation
Merge Algorithm: The input
A set of query interfaces in the same domain
  E.g. Airline domain: Delta, AA, NWA, Orbitz, Travelocity
  Each query interface is represented hierarchically [Wu04]
A mapping that globally characterizes the semantic correspondences between the fields in the query interfaces
  Organized in clusters (e.g. [Wu04 et al, B.He03 et al])
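[Figure: the vacations.net query interface and its hierarchical representation, rooted at Vacations. Section labels: “Where and when do you want to travel?” and “How many people are going?”. Field labels: Leaving, Departing from, Going to, Returning, depDate, depTime, retDate, retTime, Adults, Seniors, Children]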
An Example
Three fragments of query interfaces represented hierarchically (node numbers in parentheses, children of a node in brackets):
  Travel: unlabeled node (2), From City(3), To City(4), Departure Date(5) [Depday(6), DepMonth(7), Heuredep(8)], Airlines(12), Travellers(13) [Adult(s)(14), Children(15), Senior(s)(16)]
  PriceLine: Departure City(2), Arrival City(3), Departure Date(4) [Departure month(5), Departure day(6), Departure Year(7)], Number Tickets(11) [Adult passengers(12), Child passengers(13), Infant passengers(14)]
  British: Leaving from(2), Going to(3), unlabeled node (4) [Adults(5), Children(6)], Departing on(7) [depDay(8), depMonth(9)], Flight class(13)
The mapping between them, i.e. the set of clusters (each entry is the node number of the field in that interface, or null if the interface has no such field):
  Cluster      Travel   PriceLine   British
  c_DepCity    3        2           2
  c_DestCity   4        3           3
  c_DepMonth   7        5           9
  c_DepDay     6        6           8
  c_DepTime    8        null        null
  c_DepYear    null     7           null
  c_Adults     14       12          5
  c_Infants    null     14          null
  c_Children   15       13          6
  c_Seniors    16       null        null
  c_Airlines   12       null        null
  c_Class      null     null        13
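One way to hold such a mapping in code is a dictionary from cluster name to the node number of the matching field in each interface. The following is only a minimal sketch: the names `INTERFACES`, `MAPPING`, `fields_of` and `clusters_shared_by_all` are illustrative assumptions and do not come from the talk.

```python
# Illustrative encoding of the cluster mapping from the example slide.
# None plays the role of "null": the interface has no field in that cluster.
INTERFACES = ("Travel", "PriceLine", "British")

_ROWS = {
    "c_DepCity":  (3, 2, 2),
    "c_DestCity": (4, 3, 3),
    "c_DepMonth": (7, 5, 9),
    "c_DepDay":   (6, 6, 8),
    "c_DepTime":  (8, None, None),
    "c_DepYear":  (None, 7, None),
    "c_Adults":   (14, 12, 5),
    "c_Infants":  (None, 14, None),
    "c_Children": (15, 13, 6),
    "c_Seniors":  (16, None, None),
    "c_Airlines": (12, None, None),
    "c_Class":    (None, None, 13),
}

# cluster -> {interface name: node number or None}
MAPPING = {c: dict(zip(INTERFACES, nodes)) for c, nodes in _ROWS.items()}


def fields_of(interface: str) -> dict[str, int]:
    """Clusters that the given interface contributes a field to."""
    return {c: m[interface] for c, m in MAPPING.items() if m[interface] is not None}


def clusters_shared_by_all() -> list[str]:
    """Clusters that have a field in every interface."""
    return [c for c, m in MAPPING.items() if all(v is not None for v in m.values())]
```

The unified interface has one field per key of `MAPPING`, so in this fragment it would have 12 fields.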
Merge Algorithm: The output
A unified query interface that
  consists of all the fields of the individual interfaces, i.e. it has a field for each of the clusters in the mapping definition
  preserves all the constraints enforced by the interfaces being merged
The constraints to be satisfied by the global interface are:
  the grouping constraints (to be described)
  the ancestor-descendant relationships among the elements within individual interfaces
Grouping
Within a domain of discourse (e.g. Airfare) we observe:
  A spatial locality property among the fields of query interfaces
  Designers tend to place related fields close to each other
Hence, in the integrated interface these fields should be placed in adjacent positions, too
Grouping Problem
The goal (requirement)
  Groups of fields that occur together in the source query interfaces should appear together in the integrated interface
  The actual order of elements is immaterial
The problem
  Find a partition over the set of fields of a given domain that characterizes the way fields are grouped in the integrated interface
Capture Grouping Constraints
Introduce the notion of potential groups
  Informally, a potential group is a maximal set of adjacent sibling leaves whose parent is not the root
  Potential groups capture the way fields are organized within source query interfaces
  They underline the designer’s perspective that these fields should be together, so that users can easily understand what is required and fill in the desired information with ease
Example: the set of all potential groups induced by query interface Travel
[Figure: the Travel interface tree, with node labels Airlines, To City, From City, Departure Date (depDay, depMonth, depTime), Travellers (Adults, Children, Seniors) and an unlabeled node 1]
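The informal definition of a potential group can be sketched as a tree traversal that collects maximal runs of adjacent leaf siblings under every non-root node. This is a hedged illustration, not the authors' code. The `Node` class, the `min_size` cutoff (whether single-field runs count is not stated on the slide) and the exact nesting of the Travel tree below are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def potential_groups(root: Node, min_size: int = 2) -> list[list[str]]:
    """Maximal runs of adjacent leaf siblings whose parent is not the root."""
    groups: list[list[str]] = []

    def visit(node: Node, is_root: bool) -> None:
        run: list[str] = []
        for child in node.children:
            if child.is_leaf:
                run.append(child.label)
                continue
            # An internal child ends the current run of adjacent leaves.
            if not is_root and len(run) >= min_size:
                groups.append(run)
            run = []
            visit(child, False)
        if not is_root and len(run) >= min_size:
            groups.append(run)

    visit(root, True)
    return groups


# Assumed nesting of the Travel fragment (the slide text does not fix it).
travel = Node("Travel", [
    Node("1", [
        Node("From City"),
        Node("To City"),
        Node("Departure Date", [Node("Depday"), Node("DepMonth"), Node("Heuredep")]),
    ]),
    Node("Airlines"),
    Node("Travellers", [Node("Adult(s)"), Node("Children"), Node("Senior(s)")]),
])
```

Under these assumptions, `potential_groups(travel)` yields the city pair, the three departure-date fields and the three traveller fields, while Airlines is skipped because its parent is the root.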
Constructing Groups
Use this structural information, collected from multiple source interfaces, to infer the way fields are organized in the integrated interface
Introduce the notion of a group of fields
  Informally, it is a sequence of fields that preserves the adjacency constraints within related potential groups
  Two potential groups are related if their intersection is nonempty
  A group represents the desired organization of the fields in an integrated interface
An example
  Set of related potential groups: {Depday, DepMonth, DepTime}, {Departure month, Departure day, Departure Year}, {depDay, depMonth}
  The resulting group: [DepTime, Departure day, Departure month, Departure Year]
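To make "related" concrete, the sketch below buckets potential groups that share at least one field, after fields have been replaced by their cluster names. It assumes that relatedness is applied transitively (A related to B and B related to C puts all three in one bucket), which the slide does not state explicitly. The function name and the `EXAMPLE` input are illustrative.

```python
def related_components(potential_groups):
    """Partition potential groups into buckets of (transitively) overlapping sets.

    Returns a list of (all_fields, member_groups) pairs.
    """
    components = []  # existing buckets always have pairwise disjoint field sets
    for group in map(frozenset, potential_groups):
        fields, members = set(group), [group]
        untouched = []
        for comp_fields, comp_members in components:
            if comp_fields & group:
                # The new group overlaps this bucket: absorb the bucket.
                fields |= comp_fields
                members = comp_members + members
            else:
                untouched.append((comp_fields, comp_members))
        components = untouched + [(fields, members)]
    return components


# Potential groups expressed over cluster names (illustrative input).
EXAMPLE = [
    {"c_DepDay", "c_DepMonth", "c_DepTime"},
    {"c_DepCity", "c_DestCity"},
    {"c_DepMonth", "c_DepDay", "c_DepYear"},
    {"c_DepDay", "c_DepMonth"},
    {"c_Adults", "c_Children", "c_Seniors"},
    {"c_Adults", "c_Children", "c_Infants"},
]
```

On `EXAMPLE` this produces three buckets: the two city clusters, the four departure date and time clusters, and the four passenger clusters. Each bucket is then the input for ordering its fields into a group.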
Grouping Problem as C1P
The grouping problem can be cast into the Consecutive Ones Property (C1P) problem [Booth76 et al, Fulkerson65 et al]
  For a universal set U and a subset B of the power set of U, we want a permutation π of the elements of U such that all the elements in each set in B appear as a consecutive sequence in π
In our grouping problem
  Potential groups correspond to the set B
  U is the union of the fields in the potential groups
  π is the desired permutation of the fields
Several algorithms exist to obtain the groups in the integrated schema
  E.g. the PQ-tree algorithm [Meidanis98 et al], used in our implementation
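The C1P condition itself is easy to state in code. The sketch below checks it by brute force over all permutations, which is only workable for a handful of fields. It is not the PQ-tree algorithm used in the talk, and the function names are illustrative. The input uses the departure-date clusters of the running example.

```python
from itertools import permutations


def is_consecutive(order, group) -> bool:
    """True if the members of `group` occupy adjacent positions in `order`."""
    positions = sorted(order.index(f) for f in group)
    return positions[-1] - positions[0] == len(positions) - 1


def c1p_orders(universe, groups):
    """All permutations of `universe` in which every group is consecutive."""
    return [
        p for p in permutations(sorted(universe))
        if all(is_consecutive(p, g) for g in groups)
    ]


U = {"c_DepDay", "c_DepMonth", "c_DepYear", "c_DepTime"}
B = [
    {"c_DepDay", "c_DepMonth", "c_DepTime"},
    {"c_DepMonth", "c_DepDay", "c_DepYear"},
    {"c_DepDay", "c_DepMonth"},
]
```

For this input, day and month must sit next to each other in the middle, with time and year at the two ends, so `c1p_orders(U, B)` contains four permutations. A PQ-tree represents the same family of valid orders compactly instead of enumerating them.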
Grouping Problem as C1P
An example of applying the PQ-tree algorithm
  Set of related potential groups: B = {{c_DepDay, c_DepMonth, c_DepTime}, {c_DepMonth, c_DepDay, c_DepYear}, {c_DepDay, c_DepMonth}}
  U = {c_DepDay, c_DepMonth, c_DepYear, c_DepTime}
  Universal tree: a P-node whose children are c_DepDay, c_DepMonth, c_DepTime, c_DepYear
  Final PQ-tree: a Q-node whose children are c_DepTime, a P-node over c_DepDay and c_DepMonth, and c_DepYear
  The frontier of the final PQ-tree gives the group
A permutation satisfying all related potential groups cannot always be derived
  In that case, minimize the number of violations
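When no permutation satisfies every potential group, one can count the groups each candidate order breaks and keep an order with the fewest. The slide does not say how the minimization is carried out, so the exhaustive search below is purely an illustrative assumption. The field names `a`, `b`, `c` are hypothetical.

```python
from itertools import permutations


def violations(order, groups) -> int:
    """Number of groups whose members are not adjacent in `order`."""
    pos = {f: i for i, f in enumerate(order)}
    count = 0
    for group in groups:
        idx = [pos[f] for f in group]
        if max(idx) - min(idx) != len(idx) - 1:
            count += 1
    return count


def best_order(universe, groups):
    """An order of `universe` with the fewest violated groups (factorial time)."""
    return min(permutations(sorted(universe)), key=lambda p: violations(p, groups))


# Three pairwise constraints on three fields cannot all hold in a linear order.
groups = [{"a", "b"}, {"b", "c"}, {"a", "c"}]
order = best_order({"a", "b", "c"}, groups)
```

Here every order of the three fields separates exactly one pair, so the best achievable result is one violation.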
Constructing Groups
On the running example, the set of all groups:
  [c_DepCity, c_DestCity]
  [c_DepTime, c_DepDay, c_DepMonth, c_DepYear]
  [c_Seniors, c_Adults, c_Children, c_Infants]
[Figure: the Travel, PriceLine and British interface trees from the example]
Fields that are children of the root were not considered when forming the groups
Pairwise merge
For a set of query interfaces:
  Iteratively merge two at a time
  Traverse the schema trees bottom-up
  Place the group elements
  Preserve the ancestor-descendant relationships in the source schemas
On the running example: first iteration
[Figure: Travel and PriceLine are merged. The result, Travel & PriceLine, is the Travel tree plus the fields Infant passengers and Departure Year]
Pairwise merge
Second iteration
[Figure: Travel & PriceLine and British are merged. The result, Travel & PriceLine & British, is the Travel & PriceLine tree plus the field Flight class(13)]
Note that the fields are naturally placed in the merged interface
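The two iterations can be imitated on flat lists of cluster names. This is a deliberate simplification: the algorithm in the talk works bottom-up on schema trees and preserves ancestor-descendant relationships, which a flat list cannot express. The function `merge_into`, the placement rule (insert a new field right after the last group mate already present) and the three input lists are assumptions for illustration only.

```python
from functools import reduce

# Groups from the running example.
GROUPS = [
    ["c_DepCity", "c_DestCity"],
    ["c_DepTime", "c_DepDay", "c_DepMonth", "c_DepYear"],
    ["c_Seniors", "c_Adults", "c_Children", "c_Infants"],
]


def merge_into(merged: list[str], source: list[str]) -> list[str]:
    """Add the clusters of `source` that `merged` lacks, next to a group mate."""
    result = list(merged)
    for cluster in source:
        if cluster in result:
            continue
        mates = next((g for g in GROUPS if cluster in g), [])
        anchors = [result.index(m) for m in mates if m in result]
        if anchors:
            result.insert(max(anchors) + 1, cluster)  # keep the group contiguous
        else:
            result.append(cluster)  # ungrouped field: append at the end
    return result


travel = ["c_DepCity", "c_DestCity", "c_DepDay", "c_DepMonth", "c_DepTime",
          "c_Airlines", "c_Adults", "c_Children", "c_Seniors"]
priceline = ["c_DepCity", "c_DestCity", "c_DepMonth", "c_DepDay", "c_DepYear",
             "c_Adults", "c_Children", "c_Infants"]
british = ["c_DepCity", "c_DestCity", "c_Adults", "c_Children",
           "c_DepDay", "c_DepMonth", "c_Class"]

# Iteratively merge two at a time: (Travel & PriceLine) & British.
unified = reduce(merge_into, [priceline, british], travel)
```

As in the slides, the first step adds the year and infant fields next to their group mates, and the second step adds only the flight class field.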
Experiment Setup
Five real-world domains
Mapping consists of clusters [Wu04 et al]
  Domain        # interfaces   Avg. # fields per interface   Avg. # internal nodes per interface   Avg. depth of interfaces
  Airfare       20             10.7                          5.1                                   3.6
  Automobile    20             5.1                           1.7                                   2.4
  Book          20             5.4                           1.3                                   2.3
  Job           20             4.6                           1.1                                   2.1
  Real Estate   20             6.5                           2.4                                   2.7
Experiment
The characteristics of the integrated interfaces
  Domain        # potential groups   # groups   # violations   # fields on the integ. interface   Depth of the integ. interface
  Airfare       46                   8          2              24                                 5
  Automobile    22                   4          0              18                                 3
  Book          34                   4          0              19                                 3
  Job           12                   1          0              19                                 2
  Real Estate   47                   7          0              28                                 4
All group constraints are satisfied, with the exception of two potential groups in the Airfare domain: [Seniors, Adults, Children, Infants] and [Airline, Class, NonStop].
Example Integrated Interfaces
Airfare domain integrated interface
[Figure: the integrated Airfare interface. Recoverable labels: “Where and when do you to go?” (From, To, Dept time and date with Date and Time, Ret from, Ret to, Ret time and date with Date and Time); “How many people are going?” (Seniors, Adults, Children, Infants); “Do you have any preferences?” (Max. Number of Stops, Class of Tickets, Airline Preference); Contact, Name, Your First Name, Last Name, Email Address, Phone; Country of residence; Airline]
Note that fields are placed naturally
Example Integrated Interfaces
Auto domain integrated interface
[Figure: the integrated Auto interface, rooted at Auto. Recoverable labels: Your Information (Email, First Name, Last Name, Phone); Car Information; Year (From, To); Price (Min, Max); Make/Model (Make, Model); Keywords; Class; Body Style; Car Type; State; City; Near Zip Code; Locate within]
Note that fields are placed naturally
End
Please visit the project web site:
http://www.cs.uic.edu/~edragut/QIProject.html
Thank you for your time and patience!