Merging Source Query Interfaces on Web Databases

Eduard C. Dragut (speaker), University of Illinois at Chicago

Wensheng Wu, University of Illinois at Urbana-Champaign

Prasad Sistla, University of Illinois at Chicago

Clement Yu, University of Illinois at Chicago

Weiyi Meng, SUNY at Binghamton

ICDE 2006, Atlanta, USA


Page 2: A Motivating Scenario

Looking for a ticket Chicago – Atlanta, April 3rd – April 9th

A user looking for the “best” price for a ticket:

  Has to explore multiple sources

  It is tedious, frustrating and time-consuming

[Figure: the query interfaces of orbitz.com, aa.com and delta.com]

Page 3: The goal

Provide a unified way to query multiple sources in the same domain

[Figure labels: Formulate the query; Unified query interface; The Web; priceline.com; nwa.com; delta.com; united.com; Airfare.com]

Page 4: Overview: Integrating Query Interfaces

(Deep) Web. Domains shown: Auto, Car Rental, Books, Airfare

Extract query interfaces [He05, Zhang04]

  Various formats, e.g. ASCII files

Cluster query interfaces [Peng04]

Match query interfaces [B.He03, Dhamankar04, Doan02, Madhavan05, Wu04]

Merge query interfaces [H.He03]

  The topic of this presentation

Page 5: Merge Algorithm: The input

A set of query interfaces in the same domain

  E.g. Airline domain: Delta, AA, NWA, Orbitz, Travelocity

  Each query interface is represented hierarchically [Wu04]

And a mapping, globally characterizing the semantic correspondences between the fields in the query interfaces.

  Organized in clusters (e.g. [Wu04 et al, B.He03 et al])

[Figure: the vacations.net query interface and its hierarchical representation. Node labels: Vacations; Where and when do you want to travel?; Departing from; Going to; Leaving; Returning; depDate; depTime; retDate; retTime; How many people are going?; Adults; Seniors; Children]

Page 6: An Example

Three fragments of query interfaces represented hierarchically

Numbers in parentheses are node identifiers; a bare number is an unlabeled internal node. Children follow their parent after a colon where the nesting is recoverable.

  Travel: 2; From City(3); To City(4); Departure Date(5): Depday(6), DepMonth(7), Heuredep(8); Airlines(12); Travellers(13): Adult(s)(14), Children(15), Senior(s)(16)

  PriceLine: Departure City(2); Arrival City(3); Departure Date(4): Departure month(5), Departure day(6), Departure Year(7); Number Tickets(11): Adult passengers(12), Child passengers(13), Infant passengers(14)

  British: Leaving from(2); Going to(3); 4; Adults(5); Children(6); Departing on(7): depDay(8), depMonth(9); Flight class(13)

The mapping between them, i.e. the set of clusters

  Cluster      Travel   PriceLine   British
  c_DepCity    3        2           2
  c_DestCity   4        3           3
  c_DepMonth   7        5           9
  c_DepDay     6        6           8
  c_DepTime    8        null        null
  c_DepYear    null     7           null
  c_Adults     14       12          5
  c_Infants    null     14          null
  c_Children   15       13          6
  c_Seniors    16       null        null
  c_Airlines   12       null        null
  c_Class      null     null        13
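The cluster mapping above can be held in a small lookup table. The following is a minimal sketch, assuming a plain dictionary layout; the names `MAPPING`, `fields_of` and `missing_clusters` are illustrative and do not come from the presentation. The node identifiers are the ones listed in the mapping.

```python
from typing import Dict, List, Optional

INTERFACES = ("Travel", "PriceLine", "British")

# cluster -> node ids in (Travel, PriceLine, British); None stands for "null".
_ROWS = {
    "c_DepCity": (3, 2, 2),
    "c_DestCity": (4, 3, 3),
    "c_DepMonth": (7, 5, 9),
    "c_DepDay": (6, 6, 8),
    "c_DepTime": (8, None, None),
    "c_DepYear": (None, 7, None),
    "c_Adults": (14, 12, 5),
    "c_Infants": (None, 14, None),
    "c_Children": (15, 13, 6),
    "c_Seniors": (16, None, None),
    "c_Airlines": (12, None, None),
    "c_Class": (None, None, 13),
}

# cluster -> {interface name: node id or None}
MAPPING: Dict[str, Dict[str, Optional[int]]] = {
    cluster: dict(zip(INTERFACES, ids)) for cluster, ids in _ROWS.items()
}


def fields_of(interface: str) -> Dict[int, str]:
    """Return {node id: cluster} for the fields this interface contributes."""
    return {
        ids[interface]: cluster
        for cluster, ids in MAPPING.items()
        if ids[interface] is not None
    }


def missing_clusters(interface: str) -> List[str]:
    """Clusters that have no field in the given interface."""
    return sorted(c for c, ids in MAPPING.items() if ids[interface] is None)


if __name__ == "__main__":
    print(fields_of("British"))
    print(missing_clusters("Travel"))
```

Keeping one row per cluster makes it easy to ask which clusters an interface lacks, which is the information a merge step needs when it adds fields to the integrated interface.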

Page 7: Merge Algorithm: The output

A unified query interface that

  consists of all the fields of individual interfaces, i.e. it has a field for each of the clusters in the mapping definition

  preserves all the constraints enforced by the interfaces being merged

The constraints to be satisfied by the global interface are:

  the grouping constraints (to be described) and

  the ancestor-descendant relationships among the elements within individual interfaces.

Page 8: Grouping

Within a domain of discourse (e.g. Airfare) we observe:

  A spatial locality property among the fields of query interfaces

  Designers tend to place related fields close to each other

Hence, in the integrated interface these fields should be placed in adjacent positions, too

Page 9: Grouping Problem

The goal (requirement)

  Groups of fields that occur together in the source query interfaces to appear together in the integrated interface

  The actual order of elements is immaterial

The problem

  Find a partition over the set of fields of a given domain characterizing the way fields are grouped in the integrated interface.

Page 10: Capture Grouping Constraints

Introduce the notion of potential groups

  Informally, a potential group is a maximal set of adjacent sibling leaves whose parent is not the root

  Capture the way fields are organized within source query interfaces

  Underline the designer’s perspective that these fields should be together so that users can easily understand what is required and fill in the desired information with ease.

Example: the set of all potential groups induced by query interface Travel

[Figure: the Travel schema tree. Node labels: Travel; 1; From City; To City; Departure Date: depDay, depMonth, depTime; Airlines; Travellers: Adults, Children, Seniors]
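The informal definition of a potential group can be sketched as a tree walk. This is a minimal sketch, not the authors' implementation. The `Node` class, the exact shape of the Travel tree (in particular which fields sit under the unlabeled node 1) and the choice to keep runs of any length are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def potential_groups(root: Node) -> List[List[str]]:
    """Collect maximal runs of adjacent sibling leaves whose parent is not the root."""
    groups: List[List[str]] = []

    def visit(node: Node, is_root: bool) -> None:
        run: List[str] = []
        for child in node.children:
            if child.is_leaf:
                run.append(child.label)
            else:
                # An internal sibling ends the current run of leaves.
                if run and not is_root:
                    groups.append(run)
                run = []
                visit(child, False)
        if run and not is_root:
            groups.append(run)

    visit(root, True)
    return groups


# Assumed shape of the Travel tree; only the labels come from the slide.
travel = Node("Travel", [
    Node("1", [
        Node("From City"),
        Node("To City"),
        Node("Departure Date", [Node("depDay"), Node("depMonth"), Node("depTime")]),
    ]),
    Node("Airlines"),
    Node("Travellers", [Node("Adults"), Node("Children"), Node("Seniors")]),
])

if __name__ == "__main__":
    for group in potential_groups(travel):
        print(group)
```

Under these assumptions Airlines yields no potential group, because it is a leaf attached directly to the root.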

Page 11: Constructing Groups

Use this structural information collected from multiple source interfaces to infer the way fields are organized in the integrated interface

Introduce the notion of a group of fields

  Informally, it is a sequence of fields that preserves the adjacency constraints within related potential groups

  Two potential groups are related if their intersection is nonempty.

  A group represents the desired organization of the fields in an integrated interface

An example:

  Set of related potential groups: {Depday, DepMonth, DepTime}, {Departure month, Departure day, Departure Year}, {depDay, depMonth}

  The resulting group: [DepTime, Departure day, Departure month, Departure Year]
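Collecting related potential groups can be sketched as follows. This is a minimal sketch, assuming that fields are compared through their cluster names and that relatedness is applied transitively (two potential groups land in the same family if a chain of overlaps connects them). The function names and the sample input are illustrative.

```python
from typing import FrozenSet, Iterable, List, Set


def related_families(potential_groups: Iterable[Set[str]]) -> List[List[FrozenSet[str]]]:
    """Partition potential groups into families connected by nonempty intersections."""
    families: List[List[FrozenSet[str]]] = []
    for pg in map(frozenset, potential_groups):
        # Every existing family that shares a field with pg is absorbed.
        overlapping = [f for f in families if any(pg & other for other in f)]
        merged = [pg]
        for family in overlapping:
            merged.extend(family)
            families.remove(family)
        families.append(merged)
    return families


def universe(family: Iterable[FrozenSet[str]]) -> Set[str]:
    """Union of the fields that occur in one family."""
    return set().union(*family)


# Illustrative potential groups, written with cluster names.
sample = [
    {"c_DepDay", "c_DepMonth", "c_DepTime"},
    {"c_DepCity", "c_DestCity"},
    {"c_DepMonth", "c_DepDay", "c_DepYear"},
    {"c_Adults", "c_Children", "c_Seniors"},
    {"c_DepDay", "c_DepMonth"},
    {"c_Adults", "c_Children", "c_Infants"},
]

if __name__ == "__main__":
    for family in related_families(sample):
        print(sorted(universe(family)))
```

Each family is then ordered on its own, so the fields of one family never interleave with those of another.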

Page 12: Grouping Problem as C1P

The grouping problem can be cast into the Consecutive Ones Property (C1P) problem [Booth76 et al, Fulkerson65 et al].

  For a universal set U and a subset, B, of the power set of U we want a permutation π of the elements of U such that all the elements in each set in B appear as a consecutive sequence in π.

In our grouping problem

  Potential groups correspond to the set B

  U is the union of the fields in the potential groups

  π is the desired permutation of the fields

Several algorithms to obtain the groups in the integrated schema

  E.g. PQ-tree algorithm [Meidanis98 et al]

  Used in our implementation
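The C1P condition itself is easy to state in code. The sketch below is not the PQ-tree algorithm used in the implementation; it swaps in an exhaustive search over permutations, which is only practical for very small universes, purely to make the definition concrete. The function names are illustrative.

```python
from itertools import permutations
from typing import Iterable, List, Optional, Sequence, Set


def is_consecutive(order: Sequence[str], group: Set[str]) -> bool:
    """True if the members of group occupy adjacent positions in order."""
    if not group:
        return True
    positions = [i for i, item in enumerate(order) if item in group]
    return (len(positions) == len(group)
            and positions[-1] - positions[0] + 1 == len(group))


def find_c1p_order(universe: Iterable[str],
                   groups: Iterable[Set[str]]) -> Optional[List[str]]:
    """Return a permutation in which every group is consecutive, or None."""
    groups = [set(g) for g in groups]
    for order in permutations(sorted(universe)):
        if all(is_consecutive(order, g) for g in groups):
            return list(order)
    return None


if __name__ == "__main__":
    U = {"c_DepDay", "c_DepMonth", "c_DepYear", "c_DepTime"}
    B = [
        {"c_DepDay", "c_DepMonth", "c_DepTime"},
        {"c_DepMonth", "c_DepDay", "c_DepYear"},
        {"c_DepDay", "c_DepMonth"},
    ]
    print(find_c1p_order(U, B))
```

The search costs factorial time in the size of U, whereas a PQ-tree represents all valid permutations compactly, which is why a PQ-tree style algorithm is the practical choice.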

Page 13: Grouping Problem as C1P

An example of applying the PQ-tree algorithm

  Set of related potential groups: B = {{c_DepDay, c_DepMonth, c_DepTime}, {c_DepMonth, c_DepDay, c_DepYear}, {c_DepDay, c_DepMonth}}

  U = {c_DepDay, c_DepMonth, c_DepYear, c_DepTime}

[Figure: Universal Tree: a P node over the leaves c_DepMonth, c_DepDay, c_DepTime, c_DepYear. Final PQ-tree: a Q node over c_DepTime, a P node with leaves c_DepDay and c_DepMonth, and c_DepYear. Frontier gives the group]

A permutation satisfying all related potential groups cannot always be derived

  Minimize the number of violations
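When no permutation satisfies every potential group, one can count how many groups a candidate order breaks and keep the order with the fewest. The presentation does not say how the minimization is carried out, so the sketch below is an assumption: it uses exhaustive search as a stand-in, and the fields a, b, c, d are hypothetical.

```python
from itertools import permutations
from typing import Iterable, List, Sequence, Set, Tuple


def is_consecutive(order: Sequence[str], group: Set[str]) -> bool:
    """True if the members of group occupy adjacent positions in order."""
    positions = [i for i, item in enumerate(order) if item in group]
    return not group or (len(positions) == len(group)
                         and positions[-1] - positions[0] + 1 == len(group))


def count_violations(order: Sequence[str], groups: Iterable[Set[str]]) -> int:
    """Number of groups whose members are not adjacent in order."""
    return sum(not is_consecutive(order, g) for g in groups)


def least_violating_order(universe: Iterable[str],
                          groups: Iterable[Set[str]]) -> Tuple[List[str], int]:
    """Exhaustively pick an order that violates as few groups as possible."""
    groups = [set(g) for g in groups]
    best = min(permutations(sorted(universe)),
               key=lambda order: count_violations(order, groups))
    return list(best), count_violations(best, groups)


if __name__ == "__main__":
    # Hypothetical fields: "a" would need three neighbours, so one group must break.
    order, broken = least_violating_order(
        {"a", "b", "c", "d"}, [{"a", "b"}, {"a", "c"}, {"a", "d"}])
    print(order, broken)
```

In a linear order a field has at most two neighbours, so at least one of the three pairs in this example cannot be kept adjacent.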

Page 14: Constructing Groups

On the running example

The set of all groups

  [c_DepCity, c_DestCity]

  [c_DepTime, c_DepDay, c_DepMonth, c_DepYear]

  [c_Seniors, c_Adults, c_Children, c_Infants]

[Figure: the Travel, PriceLine and British schema trees of the running example]

Page 15: Constructing Groups

On the running example (same set of groups and schema trees as on the previous page)

[Figure callout on the schema trees:] They were not considered (children of the root)

Page 16: Pairwise merge

For a set of query interfaces:

  Iteratively merge two at a time

  Traversing the schema trees bottom-up

  Placing of group elements

  Preserving ancestor-descendant relationships in the source schemas

On the running example: first iteration

[Figure: the Travel and PriceLine schema trees, with an arrow labeled "Merge direction", and the result Travel & PriceLine: the Travel tree with Infant passengers and Departure Year added]

Page 17: Pairwise merge

Second iteration

Note, the fields are naturally placed in the merged interface

[Figure: the Travel & PriceLine and British schema trees, with an arrow labeled "Merge direction", and the result Travel & PriceLine & British: the Travel & PriceLine tree with Flight class(13) added]
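The iteration order of the pairwise merge can be traced at the level of clusters. This is a rough sketch under strong assumptions: it tracks only which clusters each iteration adds and ignores tree structure, group placement and ancestor-descendant relationships, which the actual merge handles. The cluster sets are read off the mapping on Page 6; the function names are illustrative.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Clusters with a non-null entry for each interface in the Page 6 mapping.
CLUSTERS_BY_INTERFACE: Dict[str, FrozenSet[str]] = {
    "Travel": frozenset({"c_DepCity", "c_DestCity", "c_DepMonth", "c_DepDay",
                         "c_DepTime", "c_Adults", "c_Children", "c_Seniors",
                         "c_Airlines"}),
    "PriceLine": frozenset({"c_DepCity", "c_DestCity", "c_DepMonth", "c_DepDay",
                            "c_DepYear", "c_Adults", "c_Infants", "c_Children"}),
    "British": frozenset({"c_DepCity", "c_DestCity", "c_DepMonth", "c_DepDay",
                          "c_Adults", "c_Children", "c_Class"}),
}


def merge_pair(merged: List[str], source: FrozenSet[str]) -> List[str]:
    """Keep the clusters merged so far and append those only the source has."""
    return merged + sorted(source - set(merged))


def merge_all(names: Sequence[str]) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    """Fold the interfaces in one at a time, recording what each step adds."""
    merged: List[str] = []
    trace: List[Tuple[str, List[str]]] = []
    for name in names:
        before = set(merged)
        merged = merge_pair(merged, CLUSTERS_BY_INTERFACE[name])
        trace.append((name, sorted(set(merged) - before)))
    return merged, trace


if __name__ == "__main__":
    merged, trace = merge_all(["Travel", "PriceLine", "British"])
    for name, added in trace[1:]:
        print(name, "adds", added)
```

Under these assumptions the first iteration adds the departure year and infants clusters and the second adds the class cluster, in line with the fields that appear in the merged trees on Pages 16 and 17.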

Page 18: Experiment Setup

Five real-world domains:

  Domain        # interfaces   Avg. # fields per interface   Avg. # internal nodes per interface   Avg. depth of interfaces
  Airfare       20             10.7                          5.1                                   3.6
  Automobile    20             5.1                           1.7                                   2.4
  Book          20             5.4                           1.3                                   2.3
  Job           20             4.6                           1.1                                   2.1
  Real Estate   20             6.5                           2.4                                   2.7

Mapping consists of clusters [Wu04 et al]

Page 19: Experiment

The characteristics of the integrated interfaces.

  Domain        # potential groups   # groups   # Violations   # Fields on the integ. interface   Depth of the integ. interface
  Airfare       46                   8          2              24                                 5
  Automobile    22                   4          0              18                                 3
  Book          34                   4          0              19                                 3
  Job           12                   1          0              19                                 2
  Real Estate   47                   7          0              28                                 4

All group constraints are satisfied with the exception of two potential groups in the airline domain: [Seniors, Adults, Children, Infants] and [Airline, Class, NonStop].

Page 20: Example Integrated Interfaces

Airfare domain integrated interface

Note that fields are placed naturally

[Figure: the integrated Airfare interface. Recoverable labels: Where and when do you to go?; From; To; Dept time and date (Date, Time); Ret from; Ret to; Ret time and date (Date, Time); How many people are going?; Seniors; Adults; Children; Infants; Do you have any preferences?; Max. Number of Stops; Class of Tickets; Airline Preference; Contact Name; Your First Name; Last Name; Email Address; Phone; Country of residence; Airline]

Page 21: Example Integrated Interfaces

Auto domain integrated interface

Note that fields are placed naturally

[Figure: the integrated Auto interface. Recoverable labels: Auto; Your Information; Email; First Name; Last Name; Phone; Car Information; Year (From, To); Price (Min, Max); State; City; Near Zip Code; Locate within; Make/Model; Make; Model; Keywords; Class; Body Style; Car Type]

Page 22: End

Please visit the project web site

http://www.cs.uic.edu/~edragut/QIProject.html

Thank you for your time and patience!