36
2 April 2014, IC2 Lunch Seminar UnSAID: Uncertainty and Structure in the Access to Intensional Data Pierre Senellart

UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

2 April 2014, IC2 Lunch Seminar

UnSAID:Uncertainty and Structurein the Accessto Intensional Data

Pierre Senellart

Page 2: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

2 / 24 IC2 Pierre Senellart

Uncertain data is everywhere

Numerous sources of uncertain data:

Measurement errors

Data integration from contradicting sources

Imprecise mappings between heterogeneous schemas

Imprecise automatic processes (information extraction, naturallanguage processing, etc.)

Imperfect human judgment

Lies, opinions, rumors

Page 3: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

3 / 24 IC2 Pierre Senellart

Structured data is everywhere

Data is structured, not flat:Variety of representation formats of data in the wild:

relational tablestrees, semi-structured documentsgraphs, e.g., social networks or semantic graphsdata streamscomplex views aggregating individual information

Heterogeneous schemas

Additional structural constraints: keys, inclusion dependencies

Page 4: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

4 / 24 IC2 Pierre Senellart

Intensional data is everywhereLots of data sources can be seen as intensional: accessing all the datain the source (in extension) is impossible or very costly, but it ispossible to access the data through views, with some accessconstraints, associated with some access cost.

Indexes over regular data sources

Deep Web sources: Web forms, Web services

The Web or social networks as partial graphs that can beexpanded by crawling

Outcome of complex automated processes: information extraction,natural language analysis, machine learning, ontology matching

Crowd data: (very) partial views of the world

Logical consequences of facts, costly to compute

Page 5: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

5 / 24 IC2 Pierre Senellart

Introducing UnSAID

Uncertainty and Structure in the Access to Intensional Data

Jointly deal with Uncertainty, Structure, and the fact that accessto data is limited and has a cost, to solve a user’s knowledge need

Lazy evaluation whenever possible

Evolving probabilistic, structured view of the current knowledgeof the world

Solve at each step the problem: What is the next best access to dogiven my current knowledge of the world and the knowledge need

Knowledge acquisition plan (recursive, dynamic, adaptive) thatminimizes access cost, and provides probabilistic guarantees

Page 6: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

formulation

answer & explanation

optimization

modeling

priors

intensional access

knowledge update

Knowledgeneed

Currentknowledgeof the world

Knowledgeacquisition plan

Uncertainaccess result

Structuredsource profiles

Page 7: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

formulation

answer & explanation

optimization

modeling

priors

intensional access

knowledge update

Knowledgeneed

Currentknowledgeof the world

Knowledgeacquisition plan

Uncertainaccess result

Structuredsource profiles

Page 8: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

formulation

answer & explanation

optimization

modeling

priors

intensional access

knowledge update

Knowledgeneed

Currentknowledgeof the world

Knowledgeacquisition plan

Uncertainaccess result

Structuredsource profiles

Page 9: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

formulation

answer & explanation

optimization

modeling

priors

intensional access

knowledge update

Knowledgeneed

Currentknowledgeof the world

Knowledgeacquisition plan

Uncertainaccess result

Structuredsource profiles

Page 10: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

formulation

answer & explanation

optimization

modeling

priors

intensional access

knowledge update

Knowledgeneed

Currentknowledgeof the world

Knowledgeacquisition plan

Uncertainaccess result

Structuredsource profiles

Page 11: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

formulation

answer & explanation

optimization

modeling

priors

intensional access

knowledge update

Knowledgeneed

Currentknowledgeof the world

Knowledgeacquisition plan

Uncertainaccess result

Structuredsource profiles

Page 12: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

formulation

answer & explanation

optimization

modeling

priors

intensional access

knowledge update

Knowledgeneed

Currentknowledgeof the world

Knowledgeacquisition plan

Uncertainaccess result

Structuredsource profiles

Page 13: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

formulation

answer & explanation

optimization

modeling

priors

intensional access

knowledge update

Knowledgeneed

Currentknowledgeof the world

Knowledgeacquisition plan

Uncertainaccess result

Structuredsource profiles

Page 14: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

7 / 24 IC2 Pierre Senellart

What this talk is about

General overview of my current (and recent) research, throughone-slide presentation of individual works

Hopefully, emerging consistent themes

Connections with the UnSAID problem

Page 15: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

8 / 24 IC2 Pierre Senellart

Plan

Introduction

Instances of UnSAID

Uncertainty and Structure

UnSAID Applications

Conclusion

Page 16: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

9 / 24 IC2 Pierre Senellart

Adaptive focused crawling(Gouriten, Maniu, and Senellart 2014)

with

3

5 0

4

0

0

3

5

3

4

2

2

3

3

5 0

4

0

0

3

5

3

4

2

2

3

2 3

0

00

10

0

1

0

1

3

00

0

0

Problem: Efficiently crawl nodes in agraph such that total score is high

Challenge: The score of a node isunknown till it is crawled

Methodology: Use various predictorsof node scores, and adaptively selectthe best one so far with multi-armedbandits

Page 17: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

9 / 24 IC2 Pierre Senellart

Adaptive focused crawling(Gouriten, Maniu, and Senellart 2014)

with

3

5 0

4

0

0

3

5

3

4

2

2

3

3

5 0

4

0

0

3

5

3

4

2

2

3

2 3

0

00

10

0

1

0

1

3

00

0

0

Problem: Efficiently crawl nodes in agraph such that total score is high

Challenge: The score of a node isunknown till it is crawled

Methodology: Use various predictorsof node scores, and adaptively selectthe best one so far with multi-armedbandits

Page 18: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

9 / 24 IC2 Pierre Senellart

Adaptive focused crawling(Gouriten, Maniu, and Senellart 2014)

with

3

5 0

4

0

0

3

5

3

4

2

2

3

3

5 0

4

0

0

3

5

3

4

2

2

3

2 3

0

00

10

0

1

0

1

3

00

0

0

Problem: Efficiently crawl nodes in agraph such that total score is high

Challenge: The score of a node isunknown till it is crawled

Methodology: Use various predictorsof node scores, and adaptively selectthe best one so far with multi-armedbandits

Page 19: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

10 / 24 IC2 Pierre Senellart

Adaptive Web application crawling

with

entrypoint p4

2039

p1

239

p2

754

p3

3227

p5

2600

l 2

l3

l 1

l1 l4

Problem: Optimize the amount ofdistinct content retrieved from a Website w.r.t. the number of HTTPrequests

Challenge: No way to know a prioriwhere the content lies on the Web site

Methodology: Sample a small part ofthe Web site and discover optimalcrawling patterns from it

Page 20: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

11 / 24 IC2 Pierre Senellart

Optimizing crowd queries under order

with

Problem: Given a query, what is thenext best question to ask the crowdwhen crowd answers are constrainedby a partial order

Challenge: Order constraints makequestions not independent of eachother

Methodology: Construct a polytopeof admissible regions and uniformlysample from it to determine theimpact of a data item

Page 21: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

12 / 24 IC2 Pierre Senellart

Online influence maximization

with

Action Log

User | Action | Date

1 1 2007-10-102 1 2007-10-123 1 2007-10-142 2 2007-11-104 3 2007-11-12

. . .

1

42

3

0.5

0.1 0.9

0.50.2

. . .

Real World Simulator

1

42

3

. . .

Social Graph

1

42

3

. . .

Weighted Uncertain Graph

One Trial

Sampling

1

2

Activation SequencesUpdate

3

Solution Framework Problem: Run influence campaigns insocial networks, optimizing theamount of influenced nodes

Challenge: Influence probabilities areunknown

Methodology: Build a model ofinfluence probabilities and focus oninfluent nodes, with anexploration/exploitation trade-off

Page 22: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

13 / 24 IC2 Pierre Senellart

Query answering under uncertain rules

with

Pope(X )) BuriedIn(X ; Rome) (98%)

LocatedIn(X ; Lombardy))

BelongsTo(X ; AustrianEmpire) (45%)

Pope(PiusXI)

BornIn(PiusXI; Desio)

LocatedIn(Desio; Lombardy)

9X ; BornIn(X ;Y ) ^

BelongsTo(Y ; AustrianEmpire) ^

BuriedIn(X ; Rome)?

Problem: Determine efficiently theprobability of a query being true,given some data and uncertain rulesover this data

Challenge: Produced facts may becorrelated, the same facts can begenerated in different ways,probability computation is hard ingeneral. . .

Methodology: Find restrictions on therules (guarded?) and the data(bounded tree-width?) that make theproblem tractable

Page 23: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

14 / 24 IC2 Pierre Senellart

Plan

Introduction

Instances of UnSAID

Uncertainty and Structure

UnSAID Applications

Conclusion

Page 24: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

15 / 24 IC2 Pierre Senellart

Efficient querying of uncertain graphs(Maniu, Cheng, and Senellart 2014)

with

06

5

60

2

06

4

3

4

26

1

61

3: 0.144: 0.01

2: 0.183: 0.01

1: 0.75

1: 0.752: 0.06

1: 0.75

1: 0.75

1: 0.5

1: 0.75

1: 0.25

1: 0.75

1: 0.5 1: 0.5

1: 1

(α)

(β)

(γ)

(ε)

(δ)

(ζ)

Problem: Optimize query evaluationon probabilistic graphs

Challenge: Probabilistic queryevaluation is hard, and standardindexing techniques for large graphsdo not work

Methodology: Build a treedecomposition that preservesprobabilities and run the query onthis tree decomposition

Page 25: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

16 / 24 IC2 Pierre Senellart

Uniform sampling of XML documents(Tang, Amarilli, Senellart, and Bressan 2014)

with

0n

1n2n 3n

5n4n

0n

1n

2n 3n

5n4n

6n

0n

1n

3n 4n

6n5n

2n

Problem: Sample a subtree of fixedsize/characteristics uniformly atrandom from a tree, e.g., for datapricing reasons

Challenge: Naive top-down samplingdoes not work, will result in biasedsampling depending on the treestructure

Methodology: Bottom-up annotationof the tree recording distributioninformation, followed by top-downsampling

Page 26: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

17 / 24 IC2 Pierre Senellart

Truth discovery on heterogeneous data

with

Problem: Determine true values fromintegrated semi-structured Websources

Challenge: Semi-structured data onthe Web is contradictory and copiedfrom source to source

Methodology: Estimate the truth anddetermine copy patterns betweensources, not only of base facts but ofsubtrees of the data

Page 27: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

18 / 24 IC2 Pierre Senellart

Probabilistic XML conditioning

with

0

1

2 3 4

1e

2e 3e 4e

0

1

2 3 4

1a

2a 32 aa 32 aa

0e 0a

0

1

2

1e

2e

3e

0e

3 4

5

4e

5e

0

1

2

3 4

5

2a

4a

0a

1a

2a

0

1

2 3

1e

2e3e

0e

4

5

4e

0

1

2 3 32 aa

4

5

42 aa

0a

1a

2a

2a

5e 2a

0

1

2 3

1e

2e3e

0e

4

5

4e

0

1

2 3 ')( 332 aaa

4

5

')( 442 aaa

0a

1a

2a

5e '52 aa

0

1 3 51e3e 5e

0e

2 4 62e 4e6e

0

1 2

3 4

0e

1e 2e

3e 4e

label(0)=Rlabel(1)=Alabel(2)=Blabel(3)=Clabel(4)=D

Problem: Incorporate a logicalconstraint into a probabilistic XMLdatabase

Challenge: Constraining is not astandard probabilistic databaseoperation, NP-hard in general

Methodology: Identify tractablesubcases

Page 28: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

19 / 24 IC2 Pierre Senellart

Provenance and Order(Amarilli, Ba, Deutch, and Senellart 2013)

with

Problem: Clean semantics fornondeterministic order-aware queries

Challenge: Existing datamanipulation languages treat order inan ad hoc manner, with nocompositional semantics

Methodology: Order-aware typesystem for strict static checking, andindividual data-level provenanceannotations for dynamic analysis ofallowed ordered operations

Page 29: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

20 / 24 IC2 Pierre Senellart

Plan

Introduction

Instances of UnSAID

Uncertainty and Structure

UnSAID Applications

Conclusion

Page 30: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

21 / 24 IC2 Pierre Senellart

Social Sensing of Moving Objects(Ba, Montenez, Abdessalem, and Senellart 2014)

with

Problem: Infer trajectories andmeta-information of moving objectsfrom Web and social Web data

Challenge: Uncertainty andinconsistency in extracted information

Methodology: Data cleaning byfiltering incorrect locations, and truthdiscovery to identify reliable sources

Page 31: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

22 / 24 IC2 Pierre Senellart

Smarter urban mobility

with

Problem: Smart and adaptiverecommendations for mobility incities (transit, bike rental, car, etc.)

Challenge: Should take into accountpersonal information (calendar, etc.),past trajectories, public informationabout transit and traffic

Methodology: Map GPS tracks toroutes of public transport, learn routepatterns, infer destination while intransit and provide push suggestions

Page 32: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

23 / 24 IC2 Pierre Senellart

Plan

Introduction

Instances of UnSAID

Uncertainty and Structure

UnSAID Applications

Conclusion

Page 33: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

24 / 24 IC2 Pierre Senellart

What’s next?

So far, we have tackled individual aspects or specializations of theUnSAID problem

Now we need to consider the general problem, and propose generalsolutions

There is a strong potential for uncovering the unsaid informationfrom the Web

Strong connections with a number of research areas: activelearning, reinforcement learning, adaptive query evaluation, etc.Inspiration to get from these areas.

Everyone is welcome to join the effort!

Merci.

Page 34: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

24 / 24 IC2 Pierre Senellart

What’s next?

So far, we have tackled individual aspects or specializations of theUnSAID problem

Now we need to consider the general problem, and propose generalsolutions

There is a strong potential for uncovering the unsaid informationfrom the Web

Strong connections with a number of research areas: activelearning, reinforcement learning, adaptive query evaluation, etc.Inspiration to get from these areas.

Everyone is welcome to join the effort!

Merci.

Page 35: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

Amarilli, A., M. L. Ba, D. Deutch, and P. Senellart (Dec. 2013).Provenance for Nondeterministic Order-Aware Queries.Preprint available athttp://pierre.senellart.com/publications/amarilli2014provenance.pdf.

Ba, M. L., S. Montenez, T. Abdessalem, and P. Senellart (Apr. 2014).Extracting, Integrating, and Visualizing Uncertain WebInformation about Moving Objects. Preprint available athttp://pierre.senellart.com/publications/ba2014extracting.pdf.

Gouriten, G., S. Maniu, and P. Senellart (Mar. 2014). Scalable,Generic, and Adaptive Systems for Focused Crawling. Preprintavailable athttp://pierre.senellart.com/publications/gouriten2014scalable.pdf.

Maniu, S., R. Cheng, and P. Senellart (Mar. 2014). ProbTree: AQuery-Efficient Representation of Probabilistic Graphs. Preprintavailable athttp://pierre.senellart.com/publications/maniu2014probtree.pdf.

Page 36: UnSAID: Uncertainty and Structure in the Access to ...2014/04/02  · UnSAID: Uncertainty and Structure in the Access to Intensional Data PierreSenellart 2 / 24 IC2 Pierre Senellart

Tang, R., A. Amarilli, P. Senellart, and S. Bressan (Mar. 2014). Get aSample for a Discount: Sampling-Based XML Data Pricing.Preprint available athttp://pierre.senellart.com/publications/tang2014get.pdf.