1

Ranking Inexact Answers

2

Ranking Issues

• When inexact querying is allowed, there may be MANY answers

– Different answers have different levels of incompleteness

• Ranking the answers allows the user to quickly see the (hopefully) most relevant answers

• Preference: Create answers in ranking order

– Why is this important?

• We will consider several different approaches to this problem

3

Tree Pattern Relaxation

Amer-Yahia, Cho, Srivastava

EDBT 2002

4

Tree Patterns

• Queries are tree patterns, as considered in previous lessons

[Figure: tree pattern with root Book; Book has children Collection and Editor; Editor has child Name and descendant Address. A double line indicates a descendant edge.]
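As an illustration, the pattern on this slide could be held in a small tree structure. This is only a sketch: the class name `PatternNode`, its fields and the helper `edges` are assumptions made for this example, and the edge types are taken from the join conditions of the exact query plan shown later (c for child, d for descendant).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PatternNode:
    """One node of a tree pattern (hypothetical representation)."""
    label: str
    # Edge from the parent: "child" (single line) or "descendant" (double line).
    edge: str = "child"
    children: List["PatternNode"] = field(default_factory=list)


def edges(node):
    """Yield (parent label, edge type, child label) for every edge, top-down."""
    for child in node.children:
        yield (node.label, child.edge, child.label)
        yield from edges(child)


# The example query: Book with Collection and Editor; Editor with Name and Address.
query = PatternNode("Book", children=[
    PatternNode("Collection"),
    PatternNode("Editor", children=[
        PatternNode("Name"),
        PatternNode("Address", edge="descendant"),
    ]),
])

for parent, kind, child in edges(query):
    print(f"{parent} -[{kind}]-> {child}")
```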

5

Relaxed Queries

• Four types of “relaxations” are allowed on the trees

• Node Generalization: Assume that we know a relationship of types/super-types among labels. Allow a label to be changed to its super-type

[Figure: the query before and after generalizing the root label Book to Document; the rest of the pattern (Collection, Editor, Name, Address) is unchanged.]

6

Relaxed Queries

• Leaf Node Deletion: Delete a leaf node (and its incoming edge) from the tree

[Figure: the query before and after deleting the leaf Collection; Book keeps Editor, and Editor keeps Name and Address.]

7

Relaxed Queries

• Edge Generalization: Change a parent-child edge to an ancestor-descendant edge

[Figure: the query before and after relaxing the parent-child edge between Book and Collection to an ancestor-descendant edge.]

8

Relaxed Queries

• Subtree Promotion: A query subtree can be promoted so that it is directly connected to its former grandparent by an ancestor-descendant edge

[Figure: the query before and after promoting the subtree rooted at Name, which becomes a descendant of Book instead of a child of Editor.]

9

Composing Relaxations

• Relaxations can be composed. Are the following relaxations of Q?

[Figure: the query Q (Book, Collection, Editor, Name, Address) and three candidate patterns, drawn over the labels Book and Collection; Book, Collection, Address and Name; and Document and Address.]
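The four relaxations and their composition can be sketched as plain functions. This is a minimal sketch under stated assumptions: the pattern is a dictionary mapping each non-root label to its (parent, edge) pair, with "c" for child and "d" for descendant, and labels are assumed unique within a pattern. The function names are made up for this example.

```python
# Query Q: each non-root label maps to (parent label, edge type).
Q = {
    "Collection": ("Book", "c"),
    "Editor": ("Book", "c"),
    "Name": ("Editor", "c"),
    "Address": ("Editor", "d"),
}


def generalize_node(q, label, supertype):
    """Node generalization: replace a label by its super-type everywhere."""
    def rename(x):
        return supertype if x == label else x
    return {rename(child): (rename(parent), edge)
            for child, (parent, edge) in q.items()}


def delete_leaf(q, leaf):
    """Leaf node deletion: drop a leaf together with its incoming edge."""
    if any(parent == leaf for parent, _ in q.values()):
        raise ValueError(f"{leaf} is not a leaf")
    return {child: pe for child, pe in q.items() if child != leaf}


def generalize_edge(q, child):
    """Edge generalization: turn the edge into `child` into a descendant edge."""
    parent, _ = q[child]
    return {**q, child: (parent, "d")}


def promote_subtree(q, child):
    """Subtree promotion: attach `child` to its grandparent by a descendant edge."""
    parent, _ = q[child]
    if parent not in q:
        raise ValueError(f"{child} has no grandparent")
    grandparent, _ = q[parent]
    return {**q, child: (grandparent, "d")}


# Composition: promote Name, delete Collection, then generalize Book.
relaxed = generalize_node(
    delete_leaf(promote_subtree(Q, "Name"), "Collection"), "Book", "Document")
print(relaxed)
```

Each function returns a new pattern, so relaxations compose by ordinary function nesting, and a pattern is a relaxation of Q if some sequence of these steps produces it.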

10

Approximate Answers and Ranking

• An approximate answer to Q is an exact answer to a relaxed query derived from Q

• In order to give different answers different rankings, tree patterns are weighted

• Each node and edge has 2 weights: the value when exactly satisfied, and the value when satisfied by a relaxation

[Figure: the query Q (Book, Collection, Editor, Name, Address) with a weight pair on each node and edge: (7, 1), (4, 3), (2, 1), (6, 0), (5, 0), (8, 5), (6, 0), (4, 0), (3, 0).]

A fragment of a document that exactly satisfies the query will have a score of 45, the sum of the first weight of every pair
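The score of 45 can be checked by adding up the first weight of each pair. The sketch below lists the nine pairs in the order they appear on the slide, without claiming which pair belongs to which node or edge. The `score` helper and its status strings are assumptions for illustration, as is the choice that an unmatched element contributes nothing.

```python
# (weight when exactly satisfied, weight when satisfied by a relaxation),
# one pair per node or edge of the query, in slide order.
WEIGHTS = [(7, 1), (4, 3), (2, 1), (6, 0), (5, 0), (8, 5), (6, 0), (4, 0), (3, 0)]


def score(weights, status):
    """Sum the weight of each query element according to how it was satisfied."""
    total = 0
    for (exact, relaxed), how in zip(weights, status):
        if how == "exact":
            total += exact
        elif how == "relaxed":
            total += relaxed
        # Any other status (e.g. a deleted leaf) adds nothing: an assumption.
    return total


exact_score = score(WEIGHTS, ["exact"] * len(WEIGHTS))
print(exact_score)  # 45
```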

11

Example Ranking

[Figure: the weighted query Q from the previous slide, next to a document fragment containing the nodes Book, Person, Name, Address and Details, with the values Sam and NY.]

How much would this answer score?

13

Problem Definition

Given an XML document D, a weighted tree pattern Q and a threshold t, find all approximate answers of Q in D whose scores are ≥ t

• Naive strategy to solve the problem:

– Find all relaxations of Q

– For each relaxation, compute all exact answers

– Remove answers with score below t

• Is this a good strategy?
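The naive strategy can be sketched generically. Everything here is an assumption made for illustration: the callables `relax_steps`, `exact_answers` and `score` are hypothetical hooks, and the toy data models a query as a set of required labels where the only relaxation is dropping a label. Even so, it shows one likely cost of the approach: the set of relaxations can grow very quickly with the size of the query.

```python
def all_relaxations(query, relax_steps):
    """Closure of `query` under single relaxation steps (breadth-first)."""
    seen = {query}
    frontier = [query]
    while frontier:
        next_frontier = []
        for q in frontier:
            for r in relax_steps(q):
                if r not in seen:
                    seen.add(r)
                    next_frontier.append(r)
        frontier = next_frontier
    return seen


def naive_threshold(query, relax_steps, exact_answers, score, t):
    """Evaluate every relaxation, keep each answer's best score, filter by t."""
    best = {}
    for q in all_relaxations(query, relax_steps):
        for answer in exact_answers(q):
            s = score(query, q, answer)
            best[answer] = max(s, best.get(answer, s))
    return {a: s for a, s in best.items() if s >= t}


# Toy setting (hypothetical): documents are label sets, a query is a frozenset
# of required labels, and the score is the number of labels still required.
docs = {"d1": {"Book", "Editor", "Name"}, "d2": {"Book"}}
Q = frozenset({"Book", "Editor", "Name"})


def relax_steps(q):
    return [q - {label} for label in q if len(q) > 1]


def exact_answers(q):
    return [d for d, labels in docs.items() if q <= labels]


def score(original, relaxed, answer):
    return len(relaxed)


print(len(all_relaxations(Q, relax_steps)))
print(naive_threshold(Q, relax_steps, exact_answers, score, t=2))
```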

14

Problem Definition

Given an XML document D, a weighted tree pattern Q and a threshold t, find all approximate answers of Q in D whose scores are ≥ t

• A better strategy to compute an answer to a relaxation of a query:

– Intuition: Compute the query as a series of joins

– Can use stack-merge algorithms (studied before) for computing joins

– Filter out intermediate results whose scores are too low

15

The Query Plan

• We now show how to derive a plan for evaluating queries in this setting

• First, we show how an exact plan is derived

• Then, we consider how each individual relaxation can be added in

• Finally, we show the complete relaxed plan

16

Query Plan: Exact Answers

[Figure: the weighted query Q, with weight pairs (7, 1), (4, 3), (2, 1), (5, 0), (8, 5), (6, 0), (4, 0), (3, 0), (6, 0), next to its exact evaluation plan. The plan joins the node lists Book, Collection, Editor, Name and Address using the join conditions below.]

c(Book, Collection)

c(Book, Editor)

c(Editor, Name)

d(Editor, Address)

c(x,y) = y is child of x

d(x,y) = y is descendant of x
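The two join predicates can be sketched over a positional encoding of document nodes. The (start, end, level) encoding below is an assumption for illustration, of the kind commonly used with structural joins: a node contains another when its interval strictly encloses the other's. The `Region` type and the sample positions are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Position of a document node (assumed encoding)."""
    start: int  # position of the opening tag
    end: int    # position of the closing tag
    level: int  # depth in the document tree


def d(x, y):
    """d(x, y): y is a descendant of x, i.e. y's interval lies inside x's."""
    return x.start < y.start and y.end < x.end


def c(x, y):
    """c(x, y): y is a child of x, i.e. a descendant exactly one level deeper."""
    return d(x, y) and y.level == x.level + 1


book = Region(1, 12, 1)
editor = Region(2, 11, 2)
name = Region(3, 4, 3)
details = Region(5, 10, 3)
address = Region(6, 7, 4)

print(c(book, editor), c(editor, name), d(editor, address), c(editor, address))
```

With this encoding, Address nested inside Details satisfies d(Editor, Address) but not c(Editor, Address).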

17

Query Plan: Exact Answers

[Figure: the same weighted query and exact evaluation plan as on the previous slide, with join conditions c(Book, Collection), c(Book, Editor), c(Editor, Name) and d(Editor, Address).]

Remember, to compute a join, e.g., of Book and Collection, we actually find the list of Books and the list of Collections (from the index) and perform the stack-merge algorithms

18

Adding Relaxations into Plan

• Node generalization: Book relaxed to Document

[Figure: the exact plan with the Book list replaced by Document; the lists Collection, Editor, Name and Address are unchanged.]

c(Book, Collection) becomes c(Document, Collection)

c(Book, Editor) becomes c(Document, Editor)

c(Editor, Name)

d(Editor, Address)

19

Adding Relaxations into Plan

• Edge generalization: Relax Editor-Name Edge

[Figure: the exact plan over the lists Book, Collection, Editor, Name and Address, with the Editor-Name join condition relaxed.]

c(Book, Collection)

c(Book, Editor)

d(Editor, Address)

c(Editor, Name) is replaced by: c(Editor, Name) or (Not exists c(Editor, Name) and d(Editor, Name))

Written in short as: c(Editor, Name) or d(Editor, Name)

We only allow relaxations when a direct child does not exist
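The relaxed condition "c(Editor, Name) or (not exists c(Editor, Name) and d(Editor, Name))" can be sketched as follows. This is only an illustration: nodes are assumed to be (start, end, level) tuples, and the function name and the "exact"/"relaxed" tags are made up for this example.

```python
def d(x, y):
    """y is a descendant of x (interval containment on (start, end, level))."""
    return x[0] < y[0] and y[1] < x[1]


def c(x, y):
    """y is a child of x: a descendant exactly one level deeper."""
    return d(x, y) and y[2] == x[2] + 1


def match_with_edge_generalization(parent, candidates):
    """Return child matches if any exist; otherwise fall back to descendants."""
    children = [n for n in candidates if c(parent, n)]
    if children:
        # A direct child exists, so the relaxation is not allowed.
        return [(n, "exact") for n in children]
    return [(n, "relaxed") for n in candidates if d(parent, n)]


editor = (2, 11, 2)
child_name = (3, 4, 3)
deep_name = (6, 7, 4)  # nested one level further down

print(match_with_edge_generalization(editor, [child_name, deep_name]))
print(match_with_edge_generalization(editor, [deep_name]))
```

The tag makes it possible to add the exact weight or the relaxed weight of the edge to the score of the intermediate answer.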

20

Adding Relaxations into Plan

• Subtree Promotion: Promote tree rooted at Name

[Figure: the exact plan over the lists Book, Collection, Editor, Name and Address, with the Editor-Name join condition relaxed.]

c(Book, Collection)

c(Book, Editor)

d(Editor, Address)

c(Editor, Name) is replaced by: c(Editor, Name) or (Not exists c(Editor, Name) and d(Book, Name))

Written in short as: c(Editor, Name) or d(Book, Name)

21

Adding Relaxations into Plan

• Leaf Node Deletion: Make Address Optional

[Figure: the exact plan over the lists Book, Collection, Editor, Name and Address, with the join that adds Address replaced by an outer join.]

c(Book, Collection)

c(Book, Editor)

c(Editor, Name)

d(Editor, Address)

Outer Join Operator: join if possible, but do not delete values that cannot join
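The effect of the outer join can be sketched with a simple nested-loop version. This is an illustration only, not the stack-merge implementation: the function name, the `matches` callback and the sample data are assumptions.

```python
def left_outer_join(left, right, matches):
    """Pair each left item with its matching right items; keep unmatched
    left items paired with None instead of dropping them."""
    out = []
    for l in left:
        partners = [r for r in right if matches(l, r)]
        if partners:
            out.extend((l, r) for r in partners)
        else:
            out.append((l, None))  # optional part missing: the result survives
    return out


# Hypothetical data: editors, and addresses tagged with the editor they sit under.
editors = ["e1", "e2"]
addresses = [("e1", "NY")]

result = left_outer_join(editors, addresses, lambda e, a: a[0] == e)
print(result)  # [('e1', ('e1', 'NY')), ('e2', None)]
```

An editor without an address stays in the result, which corresponds to an answer for the query with the Address leaf deleted.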

22

Combining All Possible Relaxations

• All approximate answers can be derived from the following query plan

[Figure: relaxed evaluation plan over the lists Document, Collection, Editor, Name and Address, with the join conditions below.]

c(Document, Editor) OR d(Document, Editor)

c(Editor, Name) OR d(Editor, Name) OR d(Document, Name)

d(Editor, Address) OR d(Document, Address)

c(Book, Collection) OR d(Document, Collection)

[Figure: the weighted query Q, with weight pairs (7, 1), (4, 3), (2, 1), (5, 0), (8, 5), (6, 0), (4, 0), (3, 0), (6, 0).]

23

Creating “Best Answers”

• Want to find answers whose ranking is over the threshold t

• Naive solution: Create all answers. Delete answers with low ranking

• Algorithm Thres: Goal of the algorithm is to prune intermediate answers that cannot possibly meet the specified threshold

24

Associating Nodes with Maximal Weight

• The maximal weight of a node in the evaluation plan is the largest value by which the score of an intermediate answer computed for that node can grow

[Figure: relaxed evaluation plan over the lists Document, Collection, Editor, Name and Address, with the join conditions below.]

c(Document, Editor) OR d(Document, Editor)

c(Editor, Name) OR d(Editor, Name) OR d(Document, Name)

d(Editor, Address) OR d(Document, Address)

c(Book, Collection) OR d(Document, Collection)

25

[Figure: the weighted query Q, with weight pairs (7, 1), (4, 3), (2, 1), (5, 0), (8, 5), (6, 0), (4, 0), (3, 0), (6, 0), next to the relaxed evaluation plan from the previous slide. Each plan node is annotated with its maximal weight; the values shown are (38), (39), (30), (40), (39), (41), (21), (7) and (0).]

26

Algorithm Thres

• Relaxed query evaluation plan is computed bottom-up

– Note that the joins are computed for all matching intermediate results at the same time

• At each step, intermediate results are computed, along with their scores

• If the sum of an intermediate result score with the maximal weight of the current node is less than the threshold, prune the intermediate result
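The pruning rule in the last bullet can be sketched in a few lines. The function name and the sample numbers are illustrative assumptions; in particular, the pairing of scores with a maximal weight of 21 is not read off the slides' figure.

```python
def prune(intermediate, max_weight, t):
    """Keep only intermediate results that can still reach the threshold.

    intermediate: list of (answer, score so far) pairs at one plan node
    max_weight:   the most the score can still grow above this node
    t:            the threshold
    """
    return [(answer, s) for answer, s in intermediate if s + max_weight >= t]


# Hypothetical intermediate results at a node whose maximal weight is 21.
results = [("a", 7), ("b", 16)]
kept = prune(results, max_weight=21, t=35)
print(kept)  # [('b', 16)]: 7 + 21 = 28 < 35 is pruned, 16 + 21 = 37 is kept
```

Since the maximal weight is an upper bound on future growth, a pruned result could never have reached the threshold, so no qualifying answer is lost.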

27

Example: Threshold = 35

[Figure: a document fragment containing the nodes Book, Editor, Name, Address and Details, with the values Sam and NY; the weighted query Q, with weight pairs (7, 1), (4, 3), (2, 1), (5, 0), (8, 5), (6, 0), (4, 0), (3, 0), (6, 0); and the relaxed evaluation plan annotated with the maximal weights (38), (39), (30), (40), (39), (41), (21), (7) and (0). The intermediate scores shown along the plan are 7, 7, 16 and 27.]

When will the answer be pruned?

28

Test Yourself

29

Example Ranking

[Figure: the weighted query Q, with weight pairs (7, 1), (4, 3), (2, 1), (6, 0), (5, 0), (8, 5), (6, 0), (4, 0), (3, 0), next to a document fragment containing the nodes Document, Collection, Name and Address, with the values Sam and NY.]

How much would this answer score?

30

Query Plan

[Figure: a weighted query over the nodes Book, Collection, Editor, Name, FName and LName, with weight pairs (8, 5), (7, 1), (4, 3), (2, 1), (5, 0), (6, 0), (6, 0), (2, 1), (2, 1), (2, 0), (1, 0).]

1. What will the exact plan look like?

2. What will the plan look like if all possible relaxations are added?

3. What is the maximal weight by which the score of an intermediate answer can grow, for each node?
