iTrails: Pay-as-you-go Information Integration in Dataspaces

Presented by Marcos Vaz Salles, Jens Dittrich, Shant Karakashian, Olivier Girard, Lukas Blunschi, ETH Zurich, 2008-02-22. Summarized by Sungchan Park.


Page 1: iTrails: Pay-as-you-go Information Integration in Dataspaces

iTrails: Pay-as-you-go Information Integration in Dataspaces

Presented By Marcos Vaz Salles, Jens Dittrich, Shant Karakashian, Olivier Girard, Lukas Blunschi

ETH Zurich

2008-02-22

Summarized By Sungchan Park

Page 2: iTrails: Pay-as-you-go Information Integration in Dataspaces

Copyright 2008 by CEBT

Problem: Querying Several Sources

Center for E-Business Technology

Page 3: iTrails: Pay-as-you-go Information Integration in Dataspaces

Solution #1: Use a Search Engine

Page 4: iTrails: Pay-as-you-go Information Integration in Dataspaces

Solution #2: Use an Information Integration System

Page 5: iTrails: Pay-as-you-go Information Integration in Dataspaces

iTrail Core Idea

Is there an integration solution in-between these two extremes?

Page 6: iTrails: Pay-as-you-go Information Integration in Dataspaces

iTrail Core Idea

Is there an integration solution in-between these two extremes?

Declaratively add lightweight ‘hints’ to a search engine thus allowing gradual enrichment of loosely integrated data sources

Page 7: iTrails: Pay-as-you-go Information Integration in Dataspaces

Example Scenario

Query

“pdf yesterday”

Hints (Trails)

1. The date attribute is mapped to the modified attribute

2. The date attribute is mapped to the received attribute

3. The yesterday keyword is mapped to a query for values of the date attribute equal to yesterday's date

4. The pdf keyword is mapped to a query for elements whose names end in pdf

Page 8: iTrails: Pay-as-you-go Information Integration in Dataspaces

Where do hints come from?

Given by the user

Explicitly

Via Relevance Feedback

(Semi-)Automatically

Information extraction techniques

Automatic schema matching

Ontologies and thesauri (e.g., WordNet)

User communities (e.g., trails on gene data, bookmarks)

All these aspects are beyond the scope of this paper

Page 9: iTrails: Pay-as-you-go Information Integration in Dataspaces

Data and Query Model

Data Model

Assume that all data is represented by a logical graph G

A query is also represented by a graph

Page 10: iTrails: Pay-as-you-go Information Integration in Dataspaces

Query Syntax

Page 11: iTrails: Pay-as-you-go Information Integration in Dataspaces

Query Example

“//Home/projects//*[“Mike”]”
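As a rough illustration of how such a path query could be evaluated over a logical graph, here is a minimal sketch on a toy in-memory tree. The `Node` class, the `descendants` and `evaluate` helpers, and the sample data are all assumptions made for this example; they are not the iMeMex data model or query processor.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical graph node: a name, some text content, and child nodes."""
    name: str
    content: str = ""
    children: list = field(default_factory=list)


def descendants(node):
    """Yield every node strictly below `node`, depth first."""
    for child in node.children:
        yield child
        yield from descendants(child)


def evaluate(root, steps, keyword):
    """Evaluate a tiny path query.

    `steps` is a list of (axis, name) pairs: axis "/" selects children,
    axis "//" selects descendants, and the name "*" matches any node.
    Only nodes whose content contains `keyword` are returned.
    """
    current = [root]
    for axis, name in steps:
        selected = []
        for node in current:
            candidates = node.children if axis == "/" else descendants(node)
            selected.extend(c for c in candidates if name == "*" or c.name == name)
        current = selected
    seen, hits = set(), []
    for node in current:
        # Keep each matching node once, in document order.
        if keyword in node.content and id(node) not in seen:
            seen.add(id(node))
            hits.append(node)
    return hits


# Assumed sample data, invented for this sketch.
root = Node("", children=[
    Node("Home", children=[
        Node("projects", children=[
            Node("PIM", children=[
                Node("report.tex", "Mike wrote this section"),
                Node("notes.txt", "draft"),
            ]),
            Node("OLAP", children=[Node("mail.eml", "From: Mike")]),
        ]),
        Node("photos", children=[Node("trip.jpg", "Mike at the lake")]),
    ]),
])

# //Home/projects//*["Mike"]
hits = evaluate(root, [("//", "Home"), ("/", "projects"), ("//", "*")], "Mike")
print([n.name for n in hits])
```

The file under `photos` also mentions Mike but is not returned, because the path restricts the search to everything below `projects`.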

Page 12: iTrails: Pay-as-you-go Information Integration in Dataspaces

Basic Form of a Trail

A unidirectional trail

A bidirectional trail

Page 13: iTrails: Pay-as-you-go Information Integration in Dataspaces

Trail Example

Trails in an example scenario

Trails

Given query

– “pdf yesterday”

Transformed query

– “//*.pdf[modified=yesterday() OR received=yesterday()]”
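The rewrite from the keyword query to the transformed query can be sketched at the string level. The dictionaries below are a hypothetical encoding of the four hints from the example scenario, and `rewrite` is an illustrative helper, not the iTrails algorithm itself. In particular, this sketch replaces each matched keyword, mirroring the transformed query shown on the slide, and does not keep the original keyword as an alternative.

```python
# Hypothetical encoding of the four trails from the example scenario.
ATTRIBUTE_TRAILS = {"date": ["modified", "received"]}    # hints 1 and 2
KEYWORD_TRAILS = {
    "yesterday": ("predicate", "date", "yesterday()"),    # hint 3
    "pdf": ("name", "*.pdf"),                             # hint 4
}


def rewrite(keyword_query):
    """Rewrite a keyword query into a path query using the trails above."""
    name_pattern = "*"
    conditions = []
    for keyword in keyword_query.split():
        trail = KEYWORD_TRAILS.get(keyword)
        if trail is None:
            # No trail matches: keep the keyword as a plain keyword predicate.
            conditions.append(f'"{keyword}"')
        elif trail[0] == "name":
            name_pattern = trail[1]
        else:
            _, attribute, value = trail
            # Attribute trails fan one attribute out to several attributes.
            targets = ATTRIBUTE_TRAILS.get(attribute, [attribute])
            conditions.append(" OR ".join(f"{a}={value}" for a in targets))
    if len(conditions) > 1:
        # Parenthesize so that OR groups stay intact under AND.
        conditions = [f"({c})" for c in conditions]
    suffix = f"[{' AND '.join(conditions)}]" if conditions else ""
    return f"//{name_pattern}{suffix}"


print(rewrite("pdf yesterday"))
# prints: //*.pdf[modified=yesterday() OR received=yesterday()]
```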

Page 14: iTrails: Pay-as-you-go Information Integration in Dataspaces

iTrail Query Processing

1. Matching

2. Transforming

3. Merging

Page 15: iTrails: Pay-as-you-go Information Integration in Dataspaces

iTrail Query Processing Example

Given Query

Q1 = //home/projects//*[“Mike”]

Trail

Ψ8 := //home/*.name -> //calendar//*.tuple.category

Resulting Query

Q1{Ψ8} = //home/projects//*[“Mike”] U //calendar//*[category=“projects”]//*[“Mike”]
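The three phases (matching, transforming, merging) can be illustrated for this one trail with a string-level sketch. This is a strong simplification under stated assumptions: the regular expression stands in for the containment-based matching on query plans, and `LEFT_PATTERN`, `RIGHT_TEMPLATE`, and the helper functions are hypothetical names introduced only for this example.

```python
import re

# Hypothetical string-level encoding of trail Ψ8: the name of an element
# directly under //home maps to the `category` attribute under //calendar.
LEFT_PATTERN = re.compile(r"^//home/(?P<name>[^/\[\]*]+)(?P<rest>.*)$")
RIGHT_TEMPLATE = '//calendar//*[category="{name}"]{rest}'


def match(query):
    """Matching: check whether the left side of the trail covers the query prefix."""
    return LEFT_PATTERN.match(query)


def transform(found):
    """Transforming: instantiate the right side with the matched name."""
    return RIGHT_TEMPLATE.format(name=found.group("name"), rest=found.group("rest"))


def merge(original, rewritten):
    """Merging: the result is the union of the original and the rewritten query."""
    return f"{original} U {rewritten}"


def apply_trail(query):
    """Run the three phases; queries the trail does not match stay unchanged."""
    found = match(query)
    return merge(query, transform(found)) if found else query


q1 = '//home/projects//*["Mike"]'
print(apply_trail(q1))
# prints: //home/projects//*["Mike"] U //calendar//*[category="projects"]//*["Mike"]
```

Keeping the original query as one branch of the union means a trail can only add results, never remove them.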

Utilizing G. Miklau and D. Suciu. Containment and Equivalence for an XPath Fragment. In PODS, 2002.

Page 16: iTrails: Pay-as-you-go Information Integration in Dataspaces

Applying Multiple Trails

MMCA (Multiple Match Colouring Algorithm)

Without a restriction, trails could be applied infinitely often

To prevent infinite recursion, a trail should not be rematched to nodes in a logical plan generated by itself
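The colouring rule can be sketched on plain keywords instead of logical plan nodes. This is an illustrative simplification, not MMCA itself: the two trails, which deliberately form a cycle, and the `expand` function are assumptions made for the example. Each term carries the set of trail ids (its colours) that produced it, and a trail is never matched against a term that already carries its own colour.

```python
# Hypothetical keyword-level trails: (trail id, left keyword, right keyword).
TRAILS = [
    ("t1", "car", "auto"),
    ("t2", "auto", "car"),  # together with t1 this forms a cycle
]


def expand(keywords):
    """Expand keywords level by level, colouring every generated term.

    Returns a list of (keyword, colours) pairs, where `colours` is the
    frozenset of trail ids applied to produce that term.
    """
    level = [(keyword, frozenset()) for keyword in keywords]
    result = list(level)
    while level:
        next_level = []
        for keyword, colours in level:
            for trail_id, left, right in TRAILS:
                # Colouring rule: skip a trail that already produced this term.
                if left == keyword and trail_id not in colours:
                    next_level.append((right, colours | {trail_id}))
        result.extend(next_level)
        level = next_level
    return result


terms = expand(["car"])
for keyword, colours in terms:
    print(keyword, sorted(colours))
```

Every generated term has strictly more colours than the term it came from, so the expansion stops after at most as many levels as there are trails, even though `t1` and `t2` reference each other.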

Page 17: iTrails: Pay-as-you-go Information Integration in Dataspaces

Other Issues

Trail Pruning

Problem: MMCA is exponential in number of levels

Solution: Trail Pruning

– Prune by number of levels

– Prune by top-K trails matched in each level

Assign a weight and a probability to each trail

– Prune by both top-K trails and number of levels
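The combined pruning strategy (top-K per level plus a bound on the number of levels) can be sketched as follows. The trails, their weights, and the `expand_pruned` function are hypothetical values and names chosen for illustration; how weights are actually assigned is not covered here.

```python
import heapq

# Hypothetical trails: (trail id, left keyword, right keyword, weight).
TRAILS = [
    ("t1", "paper", "publication", 0.9),
    ("t2", "paper", "article", 0.6),
    ("t3", "paper", "newspaper", 0.2),
    ("t4", "publication", "pdf", 0.8),
    ("t5", "pdf", "document", 0.7),
]


def expand_pruned(keyword, max_levels, top_k):
    """Expand `keyword` with trails under two pruning limits.

    At each level only the `top_k` highest-weight matching trails are
    applied, and expansion stops after `max_levels` levels.
    """
    frontier = [keyword]
    expanded = [keyword]
    for _level in range(max_levels):
        matches = [trail for trail in TRAILS if trail[1] in frontier]
        best = heapq.nlargest(top_k, matches, key=lambda trail: trail[3])
        frontier = [trail[2] for trail in best]
        if not frontier:
            break
        expanded.extend(frontier)
    return expanded


print(expand_pruned("paper", max_levels=2, top_k=2))
# prints: ['paper', 'publication', 'article', 'pdf']
```

With these limits the low-weight trail `t3` is cut by the top-K rule and `t5` is cut by the level bound, which keeps the rewritten query small.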

Trail Indexing

Precompute trail expressions in order to speed up query processing

Trail materialization

Page 18: iTrails: Pay-as-you-go Information Integration in Dataspaces

Experiments

Setting

Configured iMeMex to act in three modes

– Baseline: Graph / IR search engine

– iTrails: Rewrite search queries with trails

– Perfect Query: Semantics-aware query

Data

Page 19: iTrails: Pay-as-you-go Information Integration in Dataspaces

Experiment, Quality

Compare with baseline

Page 20: iTrails: Pay-as-you-go Information Integration in Dataspaces

Experiment, Overhead

Compare with perfect query

Overhead is not negligible

However, this can be fixed by exploiting trail materializations

Page 21: iTrails: Pay-as-you-go Information Integration in Dataspaces

Experiment, Scalability #1

Rewrite Time

Query-rewrite time can be controlled with pruning

Page 22: iTrails: Pay-as-you-go Information Integration in Dataspaces

Experiment, Scalability #2

Quality

Pruning improves precision

Page 23: iTrails: Pay-as-you-go Information Integration in Dataspaces

Conclusion

Our Contributions

iTrails: generic method to model semantic relationships (e.g. implicit meaning, bookmarks, dictionaries, thesauri, attribute matches, ...)

We propose a framework and algorithms for Pay-as-you-go Information Integration

Smooth transition between search and data integration

Future Work

Trail Creation

– Use collections (ontologies, thesauri, Wikipedia)

– Work on automatic mining of trails from the dataspace

Other types of trails
