Query Relaxation Using Malleable Schemas

Xuan Zhou, Julien Gaugaz, Wolf-Tilo Balke, Wolfgang Nejdl

L3S Research Center

Leibniz University Hanover, Germany

Presented by Aaron Stewart

BYU CS 652

Spring 2009

Problem

• Multiple data sources

• Unmatched schemas

Approach

1. Malleable schemas

2. Discover correlations

3. Relax user queries

Malleable Schemas

• Allow duplicate fields

• Allow related fields

Malleable Schemas

first_name, sur_name

name

Malleable Schemas

contents

body

In Practice: Tables

• “…a malleable schema… contains imprecise and overlapping definitions of attributes or relationships.”

• “In this way, a malleable schema can capture such heterogeneous data structures as in Figure 1.”

In Practice: Tables

Entities (database records, rows)

Attributes (database fields, columns)

Equivalently: Distinct tables
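
As a rough illustration of the table view above, the sketch below stores entities as sparse rows whose attribute names come from a malleable schema, so overlapping fields such as name and first_name/sur_name, or contents and body, can coexist. The records and the attribute_table helper are made up for illustration and are not taken from the paper.

```python
# Hypothetical records from two sources with unmatched schemas.
entities = [
    {"name": "Ada Lovelace", "body": "Notes on the Analytical Engine"},
    {"first_name": "Alan", "sur_name": "Turing", "contents": "On Computable Numbers"},
]

def attribute_table(rows):
    """Return (columns, table): one column per attribute seen in any row.

    Missing values are None, so every entity fits the same wide table
    even though no single source defines all the attributes.
    """
    columns = sorted({attr for row in rows for attr in row})
    table = [[row.get(col) for col in columns] for row in rows]
    return columns, table

columns, table = attribute_table(entities)
```

Nothing here merges name with first_name and sur_name; the overlapping attributes simply sit side by side, which is the situation query relaxation is meant to handle.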

Query Relaxation Planning

• Multiple queries
  – Different columns or tables
  – As few queries as possible

• Exponential number of relaxed queries
  – Evaluate in order of precision
  – Stop at k results
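
The "evaluate in order of precision, stop at k results" idea can be sketched with a priority queue. This is a minimal sketch, not the paper's planner: the precision estimates, the query strings, and the fake_index lookup are all hypothetical, and the candidate queries are listed up front here, whereas the point about an exponential number of relaxed queries suggests generating them lazily.

```python
import heapq

def top_k(relaxed_queries, run_query, k):
    """Evaluate relaxed queries from most to least precise until k results.

    relaxed_queries: iterable of (estimated_precision, query) pairs.
    run_query: callable returning a list of result ids for a query.
    """
    # heapq is a min-heap, so negate precision to pop the best query first.
    heap = [(-precision, i, query)
            for i, (precision, query) in enumerate(relaxed_queries)]
    heapq.heapify(heap)
    results, seen, evaluated = [], set(), 0
    while heap and len(results) < k:
        _, _, query = heapq.heappop(heap)
        evaluated += 1
        for item in run_query(query):
            if item not in seen:  # skip results already returned
                seen.add(item)
                results.append(item)
    return results[:k], evaluated

# Hypothetical index: which results each query string returns.
fake_index = {
    "name = 'Turing'": ["e1"],
    "sur_name = 'Turing'": ["e1", "e2"],
    "contents contains 'Turing'": ["e3", "e4"],
}
queries = [(0.6, "sur_name = 'Turing'"),
           (0.9, "name = 'Turing'"),
           (0.2, "contents contains 'Turing'")]
results, evaluated = top_k(queries, lambda q: fake_index[q], k=2)
```

In this toy run the least precise query is never executed, because the first two already supply k = 2 distinct results.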

Query Relaxation Planning

[Figure: a relaxed attribute with child attributes A1 and A2]

Query Relaxation Planning

• A “relaxed query always yields better precision than its child queries, so that it should always be evaluated prior to its child queries”

Parent/Child Relationship

• We would think A is the parent, and A1 and A2 are the children, but…

• Put them in order of correlation probability
  – If P(A|A1) > P(A|A2)
  – Then A => A1 => A2
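
The ordering rule above can be sketched as follows. How the correlation probability is actually estimated is not stated on the slide; this sketch assumes, purely for illustration, that P(A | Ai) is approximated by the share of Ai's distinct values that also occur under A. The function names and column values are hypothetical.

```python
def conditional_overlap(values_a, values_ai):
    """Estimate P(A | Ai) as the share of Ai's distinct values also seen under A.

    This estimator is an assumption for illustration, not the paper's definition.
    """
    distinct_ai = set(values_ai)
    if not distinct_ai:
        return 0.0
    return len(distinct_ai & set(values_a)) / len(distinct_ai)

def relaxation_order(target, candidates, columns):
    """Return the target followed by candidates, highest P(target | Ai) first."""
    scored = [(conditional_overlap(columns[target], columns[c]), c)
              for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [target] + [c for _, c in scored]

# Hypothetical column values.
columns = {
    "A":  ["x", "y", "z", "w"],
    "A1": ["x", "y", "q"],       # 2 of 3 values also occur under A
    "A2": ["x", "p", "q", "r"],  # 1 of 4 values also occur under A
}
order = relaxation_order("A", ["A2", "A1"], columns)
```

With these values P(A|A1) exceeds P(A|A2), so the resulting order matches the slide's A => A1 => A2.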

Query Relaxation

Experiments

• Data sets
  – IMDb movies
  – Amazon.com DVDs and VHS videos

Results

Analysis

• Strengths
  – Handles mixed schemas
  – Well-designed algorithms (IMO)

• Future work
  – Speed
