Distributed Query-Sub-Query Presented by Noam Pettel 29/5/05


2

Motivation

Optimization of query evaluation in a peer-to-peer environment

Development of a distributed algorithm based on Query-Sub-Query technique for optimization of Datalog queries in a peer-to-peer environment

Implementation of the algorithm using the Active XML system

3

Outline

• Datalog
• Query-Sub-Query (QSQ)
• Distributed Query-Sub-Query (dQSQ)
• Implementation using AXML
• Using dQSQ for Petri Nets


5

Example

Input:

parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth

[Figure: family tree with nodes Alice; Joyce, Nancy; Ruth, Lois; Andy, Mark]

We are interested in the ancestor(x,y) relation

Typical query: “Give me all the ancestors of Andy”

6

Relational Database

• A database composed of relations (tables)
• Stores only explicit information

parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth

[Figure: family tree with nodes Alice; Joyce, Nancy; Ruth, Lois; Andy, Mark]

anc(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth
Alice  Lois
Alice  Mark
Alice  Andy
Alice  Ruth
Joyce  Mark
Joyce  Andy

7

Deductive Database

• Explicit information
• Rules that enable inferences based on the stored data

Datalog program (the second rule is recursive):

anc(x,y) :- parent(x,y)
anc(x,y) :- anc(x,z), parent(z,y)

parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth

Each rule has the form head ← body:

∀x,y (anc(x,y) ← parent(x,y))
∀x,y,z (anc(x,y) ← anc(x,z), parent(z,y))
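To make the inference step concrete, here is a minimal sketch of a naive bottom-up evaluation of the two anc rules over the parent facts from the slides. The function name `naive_anc` and the set-of-tuples representation are assumptions made for illustration; the slides do not prescribe an evaluation strategy at this point.

```python
# Facts from the parent(x,y) table on the slides.
PARENT = {
    ("Alice", "Nancy"), ("Alice", "Joyce"), ("Joyce", "Lois"),
    ("Lois", "Mark"), ("Lois", "Andy"), ("Joyce", "Ruth"),
}

def naive_anc(parent):
    """Apply both anc rules repeatedly until no new tuple is derived."""
    anc = set()
    while True:
        # anc(x,y) :- parent(x,y)
        derived = set(parent)
        # anc(x,y) :- anc(x,z), parent(z,y)
        derived |= {(x, y) for (x, z) in anc for (z2, y) in parent if z == z2}
        if derived <= anc:      # fixpoint: nothing new was derived
            return anc
        anc |= derived

anc = naive_anc(PARENT)
print(len(anc))                                     # 12
print(sorted(y for (x, y) in anc if x == "Joyce"))  # ['Andy', 'Lois', 'Mark', 'Ruth']
```

The result has the same 12 tuples as the anc(x,y) table shown earlier. Note that this approach materializes the whole relation, even when the query only asks about one person.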


9

Query Evaluation

Query: q(y) :- anc(“Joyce”,y)

Goal: Compute the query with minimal data materialization

[Figure: family tree with nodes Alice; Joyce, Nancy; Ruth, Lois; Andy, Mark]

10

QSQ

Known technique for optimization of Datalog queries: Query-Sub-Query (QSQ)

QSQ rewrites the Datalog program according to the given query

QSQ is based on two main notions:
• Binding patterns
• Supplementary relations

11

Binding Patterns

For each relation, adorned versions of the relation based on the bindings of the variables are considered

For example, the adorned versions of anc are: anc_bb, anc_bf, anc_fb, anc_ff

anc(x,y) :- parent(x,y)
anc(x,y) :- anc(x,z), parent(z,y)
q(y) :- anc(“Joyce”,y)

12

Binding Patterns

anc_bf(x,y) :- parent(x,y)
anc_bf(x,y) :- anc_bf(x,z), parent(z,y)
q(y) :- anc_bf(“Joyce”,y)

In anc_bf(“Joyce”,y), the first argument is bound to a constant (b) and the second is free (f).

• The same relation may appear with different adornments in the Datalog program
• Different adornments of the same relation are treated as different relations during the QSQ computation
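As a small illustration of how a binding pattern can be read off an atom, the sketch below marks an argument as bound (b) when it is a constant or a variable that is already bound, and free (f) otherwise. The helper `adorn` and the convention that constants are written in double quotes are assumptions made for this example only.

```python
def adorn(args, bound_vars):
    """Return the binding pattern of an atom's argument list.

    args: argument strings; a double-quoted string is a constant,
          anything else is treated as a variable.
    bound_vars: variables already bound when the atom is evaluated.
    """
    pattern = ""
    for arg in args:
        is_constant = arg.startswith('"')
        pattern += "b" if is_constant or arg in bound_vars else "f"
    return pattern

# anc("Joyce", y) in the query: constant first, free variable second.
print(adorn(['"Joyce"', "y"], set()))   # bf
# anc(x, z) in the recursive rule, once x is bound by the head.
print(adorn(["x", "z"], {"x"}))         # bf
# The same relation with nothing bound would get a different adornment.
print(adorn(["x", "z"], set()))         # ff
```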

13

Supplementary Relations

For each adorned relation and each position in the body of a rule, we define a supplementary relation to accumulate the bindings relevant to that position

anc_bf(x,y) :- parent(x,y)
    supplementary relations: sup_10(x), sup_11(x,y)

anc_bf(x,y) :- anc_bf(x,z), parent(z,y)
    supplementary relations: sup_20(x), sup_21(x,z), sup_22(x,y)

q(x) :- anc_bf(“Joyce”,x)

QSQ rewriting of the program:

sup_10(x) :- in_anc_bf(x)
sup_11(x,y) :- sup_10(x), parent(x,y)
anc_bf(x,y) :- sup_11(x,y)

sup_20(x) :- in_anc_bf(x)
sup_21(x,z) :- sup_20(x), anc_bf(x,z)
sup_22(x,y) :- sup_21(x,z), parent(z,y)
anc_bf(x,y) :- sup_22(x,y)
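The argument lists of the supplementary relations can be derived mechanically. The sketch below uses a common textbook choice, which is an assumption here since the slides only show the result: sup_i keeps the variables bound so far that are still needed by the head or by a later body atom. The function name `supplementary_vars` is hypothetical.

```python
def supplementary_vars(head_args, head_pattern, body):
    """Return the variable lists of sup_0 .. sup_n for one adorned rule.

    head_args: head variables, e.g. ["x", "y"]
    head_pattern: adornment of the head, e.g. "bf"
    body: list of (relation, args) atoms in left-to-right order
    """
    # Initially only the head variables marked 'b' are bound.
    bound = [v for v, p in zip(head_args, head_pattern) if p == "b"]
    sups = []
    for i in range(len(body) + 1):
        # Variables still needed: those in the head or in atoms not yet evaluated.
        needed = set(head_args)
        for _, args in body[i:]:
            needed.update(args)
        sups.append([v for v in bound if v in needed])
        # Evaluating the i-th body atom binds all of its variables.
        if i < len(body):
            for v in body[i][1]:
                if v not in bound:
                    bound.append(v)
    return sups

rule1 = supplementary_vars(["x", "y"], "bf", [("parent", ["x", "y"])])
rule2 = supplementary_vars(["x", "y"], "bf",
                           [("anc", ["x", "z"]), ("parent", ["z", "y"])])
print(rule1)  # [['x'], ['x', 'y']]
print(rule2)  # [['x'], ['x', 'z'], ['x', 'y']]
```

These lists match sup_10(x), sup_11(x,y) and sup_20(x), sup_21(x,z), sup_22(x,y) from the slide.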

14

QSQ Example

anc_bf(x,y) :- parent(x,y)
anc_bf(x,y) :- anc_bf(x,z), parent(z,y)
q(y) :- anc_bf(“Joyce”,y)

parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth

Contents of the supplementary relations:

sup_10(x):    Joyce
sup_11(x,y):  (Joyce, Lois), (Joyce, Ruth)
sup_20(x):    Joyce
sup_21(x,z):  (Joyce, Lois), (Joyce, Ruth), (Joyce, Mark), (Joyce, Andy)
sup_22(x,y):  (Joyce, Mark), (Joyce, Andy)

Query result: Lois, Ruth, Mark, Andy

[Figure: family tree with nodes Alice; Joyce, Nancy; Ruth, Lois; Andy, Mark]
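The example can be replayed with a short sketch that evaluates the rewritten rules for the query constant "Joyce" until a fixpoint is reached. The function `qsq_anc_bf` is a hypothetical, hand-written evaluator for this one program, not a general QSQ engine. In general the input relation in_anc_bf is also extended by sub-queries; for this program the sub-query on anc_bf(x,z) reuses the same x, so nothing new is added.

```python
PARENT = {
    ("Alice", "Nancy"), ("Alice", "Joyce"), ("Joyce", "Lois"),
    ("Lois", "Mark"), ("Lois", "Andy"), ("Joyce", "Ruth"),
}

def qsq_anc_bf(parent, constant):
    """Evaluate the rewritten anc_bf program for one bound first argument."""
    in_anc_bf = {constant}   # seeded by q(y) :- anc_bf(constant, y)
    anc_bf = set()
    while True:
        sup_10 = set(in_anc_bf)
        sup_11 = {(x, y) for x in sup_10 for (p, y) in parent if p == x}
        sup_20 = set(in_anc_bf)
        sup_21 = {(x, z) for (x, z) in anc_bf if x in sup_20}
        sup_22 = {(x, y) for (x, z) in sup_21 for (p, y) in parent if p == z}
        new_anc = sup_11 | sup_22
        if new_anc <= anc_bf:   # fixpoint: no new anc_bf tuple
            return {"sup_10": sup_10, "sup_11": sup_11, "sup_20": sup_20,
                    "sup_21": sup_21, "sup_22": sup_22, "anc_bf": anc_bf}
        anc_bf |= new_anc

relations = qsq_anc_bf(PARENT, "Joyce")
answer = sorted(y for (_, y) in relations["anc_bf"])
print(answer)                    # ['Andy', 'Lois', 'Mark', 'Ruth']
print(len(relations["anc_bf"]))  # 4
```

Only 4 anc_bf tuples are materialized here, compared with the 12 tuples of the full anc(x,y) relation; no tuple about Alice is ever derived.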

15

Properties of QSQ

Computes the correct answer to the query

Materializes only a minimal set of tuples

Is guaranteed to terminate

QSQ evaluations have nice properties!


17

Distributed Environment

Centralized Datalog program:

r1  r(x,y) :- a(x,y)
r2  r(x,y) :- s(x,z), t(z,y)
r3  s(x,y) :- r(x,y), b(y,z)
r4  t(x,y) :- c(x,y)

Distribution of the program between 3 peers:

• R, hosting r, a
• S, hosting s, b
• T, hosting t, c

r1  r@R(x,y) :- a@R(x,y)
r2  r@R(x,y) :- s@S(x,z), t@T(z,y)
r3  s@S(x,y) :- r@R(x,y), b@S(y,z)
r4  t@T(x,y) :- c@T(x,y)

The rules at peer P are the rules where P is the peer of the head
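The placement rule can be sketched in a few lines. The tuple encoding of rules and the helper names `rules_by_peer` and `remote_relations` are assumptions made for illustration; only the four rules and the "peer of the head" criterion come from the slide.

```python
from collections import defaultdict

# Each rule: (name, head, body); an atom is written as (relation, peer).
RULES = [
    ("r1", ("r", "R"), [("a", "R")]),
    ("r2", ("r", "R"), [("s", "S"), ("t", "T")]),
    ("r3", ("s", "S"), [("r", "R"), ("b", "S")]),
    ("r4", ("t", "T"), [("c", "T")]),
]

def rules_by_peer(rules):
    """Assign each rule to the peer that hosts the relation in its head."""
    placement = defaultdict(list)
    for name, (_, head_peer), _ in rules:
        placement[head_peer].append(name)
    return dict(placement)

def remote_relations(rules):
    """For each rule, the body atoms hosted at a peer other than the head's."""
    return {name: [atom for atom in body if atom[1] != head[1]]
            for name, head, body in rules}

placement = rules_by_peer(RULES)
remote = remote_relations(RULES)
print(placement)     # {'R': ['r1', 'r2'], 'S': ['r3'], 'T': ['r4']}
print(remote["r2"])  # [('s', 'S'), ('t', 'T')]
```

Rules r2 and r3 are the ones that need data from another peer, which is what the distributed evaluation has to handle.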

18

Naïve Distributed Evaluation

Activation of remote relations

r2 r@R(x,y) :- s@S(x,z), t@T(z,y)

[Figure: peer R sends a request to each of peers S and T, and each of them returns a response]

AXML and Web Services make it very easy!

19

Termination Detection

We need to detect when the system reaches a fixpoint

Fixpoint is reached when no new facts can be derived at any peer

Termination detection is a standard problem in distributed computing

20

Termination Detection

The model:
• Communication is asynchronous
• Each message eventually arrives and is acknowledged
• At some point, the site that started the query decides to check for termination
• It calls all the sites that it directly invoked and asks them if they completed
• These sites contact the sites they invoked, and so on…

21

Termination Detection

A site answers positively if:
• It is idle (cannot produce more data)
• All the data it has sent has been acknowledged
• All its successors believe the computation terminated
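The three conditions translate directly into a recursive check. This is a minimal sketch that assumes the invocation structure is a tree (no cycles) and that each site tracks a count of unacknowledged messages; the `Site` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    idle: bool = True              # cannot produce more data
    unacknowledged: int = 0        # messages sent but not yet acknowledged
    successors: list = field(default_factory=list)  # sites this site invoked

def has_terminated(site):
    """A site answers positively only if all three conditions hold."""
    return (site.idle
            and site.unacknowledged == 0
            and all(has_terminated(s) for s in site.successors))

t = Site("T")
s = Site("S", unacknowledged=1)    # S still waits for one acknowledgement
r = Site("R", successors=[s, t])   # R started the query and invoked S and T

before = has_terminated(r)
s.unacknowledged = 0               # the pending acknowledgement arrives
after = has_terminated(r)
print(before, after)               # False True
```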

22

Termination Detection

r1  r@R(x,y) :- a@R(x,y)
r2  r@R(x,y) :- s@S(x,z), t@T(z,y)
r3  s@S(x,y) :- r@R(x,y), b@S(y,z)
r4  t@T(x,y) :- c@T(x,y)

[Figure: graph of the program over the relations r, a, s, b, t, c]

Build a graph to represent the distributed Datalog program

Recursions result in cycles in the graph

Use a spanning tree of the graph in order to decide termination
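A possible way to build such a graph and a spanning tree is sketched below. It assumes one node per relation and an edge from the head relation of each rule to every relation in its body, which is a guess at the construction since the figure is not available; the spanning tree is computed with a plain breadth-first search.

```python
from collections import deque

# (head relation, body relations) for rules r1..r4.
RULES = [("r", ["a"]), ("r", ["s", "t"]), ("s", ["r", "b"]), ("t", ["c"])]

def dependency_graph(rules):
    """Adjacency lists with an edge head -> body relation for every rule."""
    graph = {}
    for head, body in rules:
        graph.setdefault(head, [])
        for rel in body:
            graph.setdefault(rel, [])
            if rel not in graph[head]:
                graph[head].append(rel)
    return graph

def spanning_tree(graph, root):
    """Breadth-first spanning tree, returned as a child -> parent map."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return parent

graph = dependency_graph(RULES)
tree = spanning_tree(graph, "r")
# Edges of the graph that are not in the tree; these close the cycles.
non_tree = [(u, v) for u in graph for v in graph[u] if tree.get(v) != u]
print(tree)      # {'r': None, 'a': 'r', 's': 'r', 't': 'r', 'b': 's', 'c': 't'}
print(non_tree)  # [('s', 'r')]
```

The single non-tree edge (s, r) corresponds to the recursion between r and s. Termination questions can then be propagated along the tree edges only, so the cycle does not cause an endless round of checks.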

23

Distributed QSQ Rewriting

For each rule: The peer in the head of the rule starts the rewriting

When a remote relation is encountered, the peer delegates the remainder of the rule to the remote peer in charge of that relation

24

Distributed QSQ Rewriting

Centralized:

r_bf(x,y) :- s_bf(x,z), t_bf(z,y)
    supplementary relations: sup_0(x), sup_1(x,z), sup_2(x,y)

sup_0(x) :- in_r_bf(x)
sup_1(x,z) :- sup_0(x), s_bf(x,z)
sup_2(x,y) :- sup_1(x,z), t_bf(z,y)
r_bf(x,y) :- sup_2(x,y)

Distributed:

r_bf@R(x,y) :- s_bf@S(x,z), t_bf@T(z,y)
    supplementary relations: sup_0@R(x), sup_1@S(x,z), sup_2@T(x,y)

• R computes: sup_0@R(x) :- in_r_bf@R(x)
• R sends to S: sup_2@S(x,y) :- sup_0@R(x), s_bf@S(x,z), t_bf@T(z,y)
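The placement of the supplementary relations and the resulting hand-offs can be sketched as follows. The sketch assumes that sup_0 lives at the head's peer and sup_i at the peer of the i-th body atom, as in the example above, and that the remainder of a rule is passed on each time the next atom is hosted elsewhere. Both helper names are hypothetical.

```python
def place_supplementary(head_peer, body_peers):
    """Peer hosting sup_0, sup_1, ..., sup_n for one rule."""
    return [head_peer] + list(body_peers)

def delegation_chain(head_peer, body_peers):
    """(sender, receiver) pairs for each hand-off of the rule's remainder."""
    peers = place_supplementary(head_peer, body_peers)
    return [(a, b) for a, b in zip(peers, peers[1:]) if a != b]

# r@R(x,y) :- s@S(x,z), t@T(z,y)
placement_r2 = place_supplementary("R", ["S", "T"])
chain_r2 = delegation_chain("R", ["S", "T"])
# r@R(x,y) :- a@R(x,y) needs no delegation at all.
chain_r1 = delegation_chain("R", ["R"])
print(placement_r2)  # ['R', 'S', 'T']
print(chain_r2)      # [('R', 'S'), ('S', 'T')]
print(chain_r1)      # []
```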

25

Distributed QSQ Rewriting

The rewriting is performed locally at each peer, without any global knowledge

Once the QSQ rewriting is complete, we start the QSQ computation process – Like in the central case, except for calling remote services


27

Why Active XML?

• AXML is a natural choice
• An AXML document contains both explicit and implicit data, just like in Datalog

r@R(x,y) :- s@S(x,z), t@T(z,y)

<r> <t> <x>1</x> <y>2</y> </t> <t> <x>1</x> <y>3</y> </t>

<sc>…

[Figure: the document for r together with peers S and T, labelled "continuous services"]

28

Implementation Steps

Given a distributed Datalog program and a query:
1. Transform the Datalog program to distributed QSQ
2. Transform the distributed QSQ to Active XML
3. Run!
4. Detect termination


30

Article

“Diagnosis of Asynchronous Discrete Event Systems: Datalog to the Rescue!”

S. Abiteboul, Z. Abrams, S. Haar, T. Milo

PODS, June 2005

31

Datalog & P2P

Deductive databases were a hot topic in the late 80s

Research in this area led to beautiful results, with little industrial impact

Years later, with networks everywhere, recursive data management is becoming more essential

Datalog and QSQ are becoming hot again!

32

Abstract

Diagnosis of distributed telecommunication systems

The problem can be modeled by Datalog

Can benefit from dQSQ

33

Petri Nets

• An enabled transition can fire and yield a new Petri net
• If a transition fires, its alarm symbol is reported to the supervisor
• For example, if transition (i) fires, the marking moves from places 1,7 to places 2,3

[Figure: a Petri net; legend: place, marked place, transition, alarm symbol]

The marked places model the current state of the peer

A transition node is enabled iff all its parent nodes are marked
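The firing rule can be sketched as follows, using the example of transition (i) with parent places 1,7 and child places 2,3. The sketch assumes a safe net in which a marking is simply a set of marked places; the `Transition` class and the alarm symbol "alpha" are hypothetical, since the slide does not name the alarm of transition (i).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    name: str
    alarm: str             # symbol reported to the supervisor on firing
    parents: frozenset     # places that must be marked for the transition to fire
    children: frozenset    # places that become marked after firing

def is_enabled(t, marking):
    """A transition is enabled iff all its parent places are marked."""
    return t.parents <= marking

def fire(t, marking):
    """Return the new marking and the alarm emitted by firing t."""
    if not is_enabled(t, marking):
        raise ValueError(f"transition {t.name} is not enabled")
    return (marking - t.parents) | t.children, t.alarm

t_i = Transition("i", "alpha", frozenset({1, 7}), frozenset({2, 3}))
marking = frozenset({1, 7})
new_marking, alarm = fire(t_i, marking)
print(sorted(new_marking), alarm)     # [2, 3] alpha
print(is_enabled(t_i, new_marking))   # False
```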

34

The Problem

The supervisor receives an alarm sequence (a1,p1),(a2,p2),…,(an,pn), where:
• ai is an alarm symbol
• pi is the peer that emitted the alarm

Due to asynchronous communication:
• We do not guarantee that alarms sent by different peers appear in the order they were emitted
• We can only assume that the order of alarms is kept for each individual peer

Goal: Find an explanation for a given alarm sequence
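The ordering assumption can be made concrete with a small check: a candidate emission order is compatible with the received sequence exactly when both give the same alarm order for every individual peer. The helper names `per_peer_order` and `consistent` are hypothetical, and this only tests the ordering constraint; it does not compute a diagnosis.

```python
def per_peer_order(sequence):
    """Project an alarm sequence onto each peer, keeping the order."""
    order = {}
    for alarm, peer in sequence:
        order.setdefault(peer, []).append(alarm)
    return order

def consistent(received, candidate):
    """True if candidate could be the emission order behind received."""
    return per_peer_order(received) == per_peer_order(candidate)

received = [("b", "p1"), ("a", "p2"), ("c", "p1")]

# The alarm of p2 may have been emitted first: interleaving across peers is free.
ok = consistent(received, [("a", "p2"), ("b", "p1"), ("c", "p1")])
# Swapping b and c violates the order kept for peer p1.
bad = consistent(received, [("c", "p1"), ("b", "p1"), ("a", "p2")])
print(per_peer_order(received))  # {'p1': ['b', 'c'], 'p2': ['a']}
print(ok, bad)                   # True False
```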

35

Example

The set of shaded nodes in figure 2 is a diagnosis for the alarm sequence (b; p1), (a; p2), (c; p1).

36

From Petri Nets to dQSQ

Petri Nets can be modeled by Datalog and dQSQ

A set of relations and rules is defined at each peer

Each peer builds its own Datalog program using local information only, even if it has transitions to other peers

37

From Petri Nets to dQSQ

Here is a small part of the Datalog rules…

38

From Petri Nets to AXML

Translation steps from Petri Nets to Active XML:

Petri Net → Datalog → QSQ → AXML

Translators: PNet2Datalog (Petri Net to Datalog), Datalog2QSQ (Datalog to QSQ), QSQ2AXML (QSQ to AXML)

39

The End
