Efficient Filtering in Pub-Sub Systems using BDD

Slides prepared based on the paper Efficient Filtering in Publish-Subscribe Systems using BDD by Alexis Campailla, Sagar Chaki, Edmund Clarke, Somesh Jha, Helmut Veith

Efficient Filtering in Publish-Subscribe Systems using BDD

Alexis Campailla, Sagar Chaki, Edmund Clarke, Somesh Jha, Helmut Veith

Prepared by Nabeel Mohamed

4/16/08

1

Outline

Research problem at hand

Content-based Publish-Subscribe

Subscription Query Language

BDD Semantics

BDD Based matching

Experimental Results

Discussion (Pros and Cons)

2

Research Problem at Hand

Loosely coupled interactions in publish-subscribe systems make it possible to build very large-scale systems

However, filtering techniques used are

a major bottleneck

Efficiency of the filtering technique

plays a major role in scalability

Whatever technique we use should be

provably correct

3

Major Contributions

Precise semantics for matching messages (events) to subscriptions (subscription queries)

Modeling filtering as a satisfiability check on BDDs

4

Roadmap

Research problem at hand

Content-based Publish-Subscribe

Subscription Query Language

BDD Semantics

BDD Based matching

Experimental Results

Discussion (Pros and Cons)

5

Publish-Subscribe Systems

[Figure: pub-sub architecture. Labels: Publisher (x3); Subscriber with Notify() (x3); distributed content routers providing distributed subscription management and routing, with Notify(), Subscribe(), Unsubscribe(); arrows: publish, notify, subscribe, unsubscribe.]

6

Publish-Subscribe Systems

Publishers and Subscribers are

loosely coupled

◦ Space decoupled

◦ Time decoupled

◦ Synchronization decoupled

Content routers (brokers) form a

structured p2p system

Scalable Systems

7

Message (Event) Filtering

Filtering

◦ Matching incoming messages (events) generated by Publishers with subscription criteria

◦ A main task of content routers (brokers), performed by the filtering engine

Content-based pub-sub systems route messages (events) based on the content itself

Example: Filter Quotes with symbol = Google and offer price < 400 in a Financial ticker.
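The ticker example above can be sketched as a plain predicate over message content. This is only an illustration: the field names `symbol` and `offer_price` and the sample quotes are assumptions, not part of the slides.

```python
# Hypothetical quote messages; the field names are assumptions for illustration.
def matches(quote: dict) -> bool:
    """Content-based filter: symbol = Google and offer price < 400."""
    return (quote.get("symbol") == "Google"
            and quote.get("offer_price", float("inf")) < 400)

quotes = [
    {"symbol": "Google", "offer_price": 395.0},
    {"symbol": "Google", "offer_price": 410.0},
    {"symbol": "IBM", "offer_price": 120.0},
]

# Only quotes whose content satisfies the filter are delivered.
delivered = [q for q in quotes if matches(q)]
print(delivered)  # [{'symbol': 'Google', 'offer_price': 395.0}]
```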

8

Example Pub-Sub Systems

Stock market feeds

◦ For delivery of financial data such as

stock quotes, trade reports, news, etc. to

customers

◦ OPRA feed disseminates more than

100,000 quotes/sec

Sensor networks

Network traffic analysis

Transaction log analysis

9

Desirable Functions of a Filtering Engine

Correctness:

◦ Correctly matching incoming messages with subscription criteria

Expressiveness:

◦ Rich subscription language

Efficiency:

◦ Real time matching

Scalability:

◦ Handling a large number of subscriptions

Dynamic:

◦ Capability to add and remove subscriptions online

10

Related Work

Most existing systems support only conjunctive subscriptions

◦ GRYPHON

◦ SIENA

◦ Le Subscribe

Example: The following subscription requires 27 GRYPHON-like subscriptions while BDD handles it naturally.
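The subscription itself is not reproduced here. The sketch below only shows how a count like 27 can arise when a query with disjunctions is rewritten for a conjunctive-only system. The query shape (a conjunction of three disjunctions with three atoms each) and the atom names are assumptions made for illustration.

```python
from itertools import product

# Hypothetical query: (a1 OR a2 OR a3) AND (b1 OR b2 OR b3) AND (c1 OR c2 OR c3).
clauses = [
    ["a1", "a2", "a3"],
    ["b1", "b2", "b3"],
    ["c1", "c2", "c3"],
]

# A conjunctive-only system needs one subscription per way of picking
# one atom from each disjunction.
conjunctive_subscriptions = [" AND ".join(choice) for choice in product(*clauses)]
print(len(conjunctive_subscriptions))  # 27
```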

11

Related Work

Some systems have higher expressive power at the expense of less efficient filtering.

◦ ELVIN

Can we come up with an efficient filtering technique while providing an expressive subscription language?

BDD based filtering may be employed in existing systems to improve matching efficiency

12

Roadmap

Research problem at hand

Content-based Publish-Subscribe

Subscription Query Language

BDD Semantics

BDD Based matching

Experimental Results

Discussion (Pros and Cons)

13

Subscription Query Language

The language used to describe

subscription criteria or subscriptions

Three Subscription Languages of

increasing complexity

◦ SiSL – Simple Subscription Language

◦ StSL – Strict Subscription Language

◦ DeSL – Default Subscription Language

14

Messages and Attributes

V = <v1, .., vn> = a finite sequence of

attributes

Each attribute vi has a type

Each attribute vi has a corresponding

domain

Event schema = <type(v1), .., type(vn)>, the sequence of attribute types

15

Messages and Attributes

A message = an assignment of values

to some (not necessarily all) of the

attributes

Formally, a message is a mapping m such that for each attribute v, either m(v) is a value in the domain of v, or m(v) = * (m does not define v)

A message is total if it defines all attributes in V.

16

Messages and Attributes – Example 1

Let V = <company, product, price> over the event schema <STR, STR, DBL>

Consider the following message:

<company> IBM </company>

<product>PC AT, 20 Mhz, 256 KB RAM</product>

<price>5000</price>

This describes a total message m1 where m1(company) = “IBM”, m1(product) = “PC AT, 20 Mhz, 256 KB RAM” and m1(price) = 5000.

17

Messages and Attributes – Example 2

Consider the following message:

<company> IBM </company>

<product>PC AT, 20 Mhz, 256 KB RAM</product>

This describes a different message m2

which is not total (i.e. partial), since

m2(price) = *.
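The two example messages can be written as partial mappings. A minimal sketch, assuming a Python dict per message and `None` standing in for *; the helper names `lookup` and `is_total` are illustrative.

```python
V = ("company", "product", "price")   # attribute sequence from Example 1
UNDEFINED = None                      # stands in for * (an assumption of this sketch)

def lookup(m: dict, v: str):
    """Return m(v), or UNDEFINED when m does not define v."""
    return m.get(v, UNDEFINED)

def is_total(m: dict) -> bool:
    """A message is total if it defines every attribute in V."""
    return all(lookup(m, v) is not UNDEFINED for v in V)

m1 = {"company": "IBM", "product": "PC AT, 20 Mhz, 256 KB RAM", "price": 5000.0}
m2 = {"company": "IBM", "product": "PC AT, 20 Mhz, 256 KB RAM"}

print(is_total(m1), is_total(m2))  # True False
```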

18

Three Subscription Languages

SiSL – Simple Subscription Language

◦ All messages are total

StSL – Strict Subscription Language

◦ Messages define all attributes that occur in the query (subscription criteria)

◦ SiSL is a subset of StSL

DeSL – Default Subscription Language

◦ All attributes are initialized to default values (e.g. using NULL)

◦ Extends the functionality of SiSL to heterogeneous message formats

19

Formalizing SiSL Queries (Subscriptions)

Atomic formulas

Let v be an attribute in V

If and

then the formulas v = c, v < c, c < v

are atomic formulas.

If , atomic formulas are

defined similarly.

If

then the formulas are

atomic formulas. ( ≡ substring)

20

Formalizing SiSL Queries (Subscriptions)

Atoms = the set of atomic formulas

A query is a Boolean combination of atomic formulas

Two sets are associated with each query: the set of attributes occurring in it, and the set of atomic formulas occurring in it

21

Formalizing SiSL Queries (Subscriptions)

Abbreviations

22

Example: SiSL Query

The following SiSL query matches all messages for 1000 MHz PCs manufactured by IBM, Dell or Siemens which cost at most $1000.

23

Formalizing SiSL Queries (Subscriptions)

The instantiation of a query by a message m

Definition:

The instantiation of a query by m is the query obtained by replacing every variable v for which m(v) ≠ * by m(v).

Definition:

A SiSL query matches the total message m if its instantiation by m evaluates to true.
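A minimal sketch of SiSL matching against a total message, assuming queries are encoded as nested tuples and messages as dicts. The encoding, the helper names `holds` and `matches`, and the concrete query (one guess at how the IBM, Dell or Siemens PC example could be written, not the slide's own formula) are all illustrative assumptions.

```python
# Atoms: ("=", v, c), ("<", v, c) for v < c, (">", v, c) for c < v,
# ("substr", v, s) for "s is a substring of v". Connectives: and / or / not.
def holds(atom, m):
    op, v, c = atom
    x = m[v]                      # SiSL messages are total, so m[v] exists
    if op == "=":
        return x == c
    if op == "<":
        return x < c
    if op == ">":
        return x > c
    if op == "substr":
        return c in x
    raise ValueError(f"unknown atom {op!r}")

def matches(q, m):
    """Instantiate q by m and evaluate the resulting Boolean combination."""
    if q[0] == "and":
        return all(matches(s, m) for s in q[1:])
    if q[0] == "or":
        return any(matches(s, m) for s in q[1:])
    if q[0] == "not":
        return not matches(q[1], m)
    return holds(q, m)

query = ("and",
         ("substr", "product", "1000 Mhz"),
         ("or", ("=", "company", "IBM"),
                ("=", "company", "Dell"),
                ("=", "company", "Siemens")),
         ("not", (">", "price", 1000)))   # "at most 1000" as NOT (1000 < price)

m_ok = {"company": "Dell", "product": "PC, 1000 Mhz", "price": 999.0}
m_no = {"company": "Dell", "product": "PC, 1000 Mhz", "price": 1200.0}
print(matches(query, m_ok), matches(query, m_no))  # True False
```

Writing "at most" as a negated atom keeps the sketch within the three comparison forms v = c, v < c and c < v.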

24

Formalizing StSL Queries (Subscriptions)

StSL (Strict Subscription Language) is a generalization of SiSL.

Definition: adequacy

A message m is adequate for a query if for all attributes v occurring in the query, it holds that m(v) ≠ *.

Definition:

A query matches m iff m is adequate for the query and the instantiation of the query by m evaluates to true.
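The adequacy condition can be sketched as a guard in front of ordinary evaluation. In this sketch a query is represented, purely for illustration, by the set of attributes it mentions plus a Python predicate; `STAR`, `adequate` and `stsl_matches` are hypothetical names.

```python
STAR = None  # stands in for the undefined value *

def adequate(m, query_attrs):
    """m is adequate if it defines every attribute occurring in the query."""
    return all(m.get(v, STAR) is not STAR for v in query_attrs)

def stsl_matches(query_attrs, predicate, m):
    # The predicate only runs on adequate messages (short-circuit `and`).
    return adequate(m, query_attrs) and predicate(m)

# Hypothetical query: company = "IBM" AND price < 6000
attrs = {"company", "price"}
pred = lambda m: m["company"] == "IBM" and m["price"] < 6000

m1 = {"company": "IBM", "product": "PC AT", "price": 5000.0}
m2 = {"company": "IBM", "product": "PC AT"}      # price undefined: not adequate
m3 = {"company": "IBM", "price": 5000.0}         # product undefined, but unused

print([stsl_matches(attrs, pred, m) for m in (m1, m2, m3)])  # [True, False, True]
```

Note that m3 is partial yet still matches, because the query never mentions product.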

25

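Formalizing DeSL Queries (Subscriptions)

DeSL (Default Subscription Language) is the most general of the three.

For each attribute vi, there is a default value

Definition:

The default extension of m is the total message that agrees with m on every attribute that m defines and assigns the default value to every attribute v with m(v) = *.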

26

Formalizing DeSL Queries (Subscriptions)

Definition:

A query matches the message m under default semantics if the instantiation of the query by the default extension of m evaluates to true
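A small sketch of default semantics: fill in defaults for the undefined attributes, then evaluate as for a total message. The default values, the predicate and the helper names below are assumptions chosen for illustration.

```python
# Hypothetical per-attribute defaults; the slides do not fix concrete values.
DEFAULTS = {"company": "", "product": "", "price": 0.0}

def default_extension(m):
    """Total message: m's own values where defined, defaults elsewhere."""
    return {v: (m[v] if m.get(v) is not None else d) for v, d in DEFAULTS.items()}

def desl_matches(predicate, m):
    return predicate(default_extension(m))

m2 = {"company": "IBM", "product": "PC AT, 20 Mhz, 256 KB RAM"}   # no price
cheap_ibm = lambda m: m["company"] == "IBM" and m["price"] < 1000

print(default_extension(m2)["price"])   # 0.0
print(desl_matches(cheap_ibm, m2))      # True
```

The outcome depends on the chosen defaults: with a default price of 0.0 the partial message matches a "price < 1000" query, which may or may not be what a subscriber intends.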

27

Roadmap

Research problem at hand

Content-based Publish-Subscribe

Subscription Query Language

BDD Semantics

BDD Based matching

Experimental Results

Discussion (Pros and Cons)

28

BDDs (Binary Decision Diagrams)

Notations

A = a set of propositional variables

A linear ordering (variable ordering) is fixed on A

An ordered BDD over A has non-terminal nodes labeled by variables in A and terminals labeled by 0 or 1

Each node v of the BDD represents a Boolean function

29

Properties of BDDs

Each non-terminal node v has two out-

edges: low edge and high edge

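Let a non-terminal node v with label ai have successors u and w at its low and high edges, respectively. Then the function represented by v is (¬ai ∧ fu) ∨ (ai ∧ fw), where fu and fw are the functions represented by u and w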

Size = # nodes in the BDD

30

Example: BDD

The following BDD represents the

Boolean function x AND ( y OR z).

The variable ordering is
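The diagram is not reproduced here, so the following is a sketch of one OBDD for x AND (y OR z), assuming the ordering x < y < z. The node names and the dict encoding are illustrative. The loop checks the diagram against the formula on all eight assignments.

```python
from itertools import product

# node id -> (label, low successor, high successor); terminals are 0 and 1.
# Assumed variable ordering for this sketch: x < y < z.
BDD = {
    "nz": ("z", 0, 1),       # z
    "ny": ("y", "nz", 1),    # y OR z
    "nx": ("x", 0, "ny"),    # x AND (y OR z)
}

def evaluate(node, assignment):
    """Follow low/high edges from `node` until a terminal is reached."""
    while node not in (0, 1):
        label, low, high = BDD[node]
        node = high if assignment[label] else low
    return node

for x, y, z in product([False, True], repeat=3):
    assert evaluate("nx", {"x": x, "y": y, "z": z}) == int(x and (y or z))
```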

31

Shared BDDs (SBDDs)

While OBDDs represent one Boolean function, SBDDs represent multiple Boolean functions.

An SBDD is a collection of component OBDDs respecting the same variable ordering.

An SBDD has a set of output nodes Vo = {o1, …, on}, corresponding to the Boolean functions <f1,…, fn> respectively.

32

SBDDs

Every root node of a component OBDD belongs to Vo

Notation:

Denotes the BDD together with its

output nodes {o1, …, on}

is polynomial time

computable from any other shared

BDD over A for <f1,…, fn>

33

Example: Shared BDD

Node 1 represents

Node 2 represents

Node 3 represents

34

BDD Data Structure

A BDD with n nodes is represented as a graph whose vertices are the natural numbers 1,…, n.

The adjacency relationship is described by an array of size n.

ith element = (low[i], high[i], label[i], value[i])

◦ low[i] = low successor of i

◦ high[i] = high successor of i

◦ label[i] = label of i

◦ value[i] = used later to store the result of the BDD evaluation corresponding to i.

35

BDD Evaluation

The EvalBDD algorithm computes the value of each node in the BDD under a truth assignment to the variables in A, where the ith component of the assignment is the value of the ith variable

36

BDD Evaluation

Notice that we can compute the value

of Boolean functions associated with

each output node in one pass.
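The array representation and the one-pass evaluation can be sketched together. This is not the slides' algorithm listing. It is a minimal version under two stated assumptions: vertices 1 and 2 are the terminals 0 and 1, and every successor has a smaller index than its parent. The small shared BDD and the `OUTPUTS` table are made up for the example.

```python
from itertools import product

# Vertices are 1..n (index 0 is padding). Assumptions of this sketch: vertices
# 1 and 2 are the terminals 0 and 1, and each successor has a smaller index
# than its parent, so one ascending pass over the array is a bottom-up pass.
low   = [None, None, None, 1, 3, 1, 1]
high  = [None, None, None, 2, 2, 4, 3]
label = [None, None, None, "z", "y", "x", "x"]
value = [None] * len(label)
OUTPUTS = {"x AND (y OR z)": 5, "x AND z": 6}   # output nodes of the shared BDD

def eval_bdd(assignment):
    """Fill value[i] for every vertex in one pass; return the output values."""
    value[1], value[2] = 0, 1
    for i in range(3, len(label)):
        successor = high[i] if assignment[label[i]] else low[i]
        value[i] = value[successor]          # successor was computed earlier
    return {name: value[node] for name, node in OUTPUTS.items()}

for x, y, z in product([False, True], repeat=3):
    result = eval_bdd({"x": x, "y": y, "z": z})
    assert result["x AND (y OR z)"] == int(x and (y or z))
    assert result["x AND z"] == int(x and z)
```

Vertex 3 (the test on z) is shared by both output functions and is evaluated once per message, which is the point of evaluating all output nodes in a single pass.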

37

BDD Restrictions

The idea is to restrict the possible truth assignments to those under which an external constraint f (a Boolean function over A) evaluates to true

Definition: f-restriction

38

Roadmap

Research problem at hand

Content-based Publish-Subscribe

Subscription Query Language

BDD Semantics

BDD Based matching

Experimental Results

Discussion (Pros and Cons)

39

Query BDDs

Key Idea

◦ Represent many subscription queries by a

single shared BDD whose nodes

correspond to atomic sub-formulas of the

queries.

◦ Messages are matched against queries

by simply running EvalBDD on the shared

BDD.

40

Query BDDs

Given a sequence of queries over the set of attributes V:

A = the set of atomic sub-formulas of the queries.

Each atomic sub-formula a in A is assigned its own propositional variable.

For each query, the corresponding Boolean query is obtained by substituting each atomic sub-formula a with its propositional variable.

41
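The abstraction step can be sketched as follows, assuming queries are nested tuples over atoms. The two sample queries, the variable names p1, p2, ... and the helper names are illustrative assumptions.

```python
CONNECTIVES = ("and", "or", "not")

def atoms(q):
    """Yield the atomic sub-formulas of q in left-to-right order."""
    if q[0] in CONNECTIVES:
        for sub in q[1:]:
            yield from atoms(sub)
    else:
        yield q

def propositional_vars(queries):
    """Assign one propositional variable per distinct atom across all queries."""
    table = {}
    for q in queries:
        for a in atoms(q):
            table.setdefault(a, f"p{len(table) + 1}")
    return table

def abstract(q, table):
    """Replace every atom in q by its propositional variable."""
    if q[0] in CONNECTIVES:
        return (q[0],) + tuple(abstract(sub, table) for sub in q[1:])
    return table[q]

# Hypothetical subscriptions sharing the atom price < 100.
q1 = ("and", ("<", "price", 100), ("=", "company", "IBM"))
q2 = ("or", ("<", "price", 100), ("=", "product", "PC"))

table = propositional_vars([q1, q2])
print(abstract(q1, table))  # ('and', 'p1', 'p2')
print(abstract(q2, table))  # ('or', 'p1', 'p3')
```

Because the shared atom maps to the same variable p1 in both queries, a shared BDD built over p1, p2, p3 can reuse the corresponding node.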

Example: Query BDDs

Let & two subscriptions received

Then, =

Three atomic sub-formulas => Three

propositional variables

42

Example: Query BDDs

Let the variable order be

SBDD corresponding

to the queries

43

Query Matching: SiSL

Use EvalBDD algorithm for query

matching

A query Qi is considered matched if

the BDD node corresponding to Qi

evaluates to 1.

Bottom-up evaluation makes sure sub-

queries are evaluated only once.

44

Query Matching: DeSL

Same as handling complete

messages

When a message is received, it is extended to a total message before performing the matching.

45

Query Matching: StSL

Recall that a message m matches a

subscription Q iff m is adequate for Q

and m satisfies Q.

Can use a modified EvalBDD to

perform faster matching

Key Ideas

◦ An undefined atom renders all sub-

formulas in which it occurs undefined.

◦ Treat * as a new value, undefined
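One possible reading of the two key ideas is a three-valued bottom-up evaluation in which "undefined" propagates upward. This is not the paper's MVEvalBDD listing, only a sketch under that assumption. The node table and names are made up for illustration.

```python
UNDEF = None   # third truth value, used when an atom's attribute is *

# node -> (atom label, low, high); terminals are 0 and 1. Listed bottom-up.
NODES = {
    "n_z": ("z", 0, 1),        # z
    "n_y": ("y", "n_z", 1),    # y OR z
    "n_x": ("x", 0, "n_y"),    # x AND (y OR z)
}

def mv_eval(atom_values):
    """Strict evaluation: a node is undefined if its own atom or any
    sub-formula below it is undefined."""
    value = {0: 0, 1: 1}
    for node, (atom, lo, hi) in NODES.items():   # dicts keep insertion order
        a = atom_values.get(atom, UNDEF)
        if a is UNDEF or value[lo] is UNDEF or value[hi] is UNDEF:
            value[node] = UNDEF
        else:
            value[node] = value[hi] if a else value[lo]
    return value

full = mv_eval({"x": True, "y": True, "z": False})
partial = mv_eval({"x": True, "y": True})        # z is undefined
print(full["n_x"], partial["n_x"])  # 1 None
```

In the partial case the query does not match even though x and y alone would make it true, which mirrors the adequacy requirement of StSL.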

46

Query Matching: StSL

MVEvalBDD for StSL is significantly

faster than EvalBDD for SiSL

47

Roadmap

Research problem at hand

Content-based Publish-Subscribe

Subscription Query Language

BDD Semantics

BDD Based matching

Experimental Results

Discussion (Pros and Cons)

48

# Nodes in SBDD vs. # Subscriptions

The number of nodes scales almost linearly with the number of subscriptions

◦ High scalability

Restriction further reduces the node count, minimizing memory requirements

49

Matching time for SiSL and StSL

Inputs: Number of subscription queries and message density (how close to total the messages are)

Partial messages can be matched quickly.

Time for StSL queries

50

Roadmap

Research problem at hand

Content-based Publish-Subscribe

Subscription Query Language

BDD Semantics

BDD Based matching

Experimental Results

Discussion (Pros and Cons)

51

Variable Ordering vs. BDD size

Variable ordering has a tremendous

influence on BDD size.
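The effect can be reproduced on a small textbook-style function, (a1 AND b1) OR (a2 AND b2) OR (a3 AND b3), which is not taken from the slides. The sketch below builds a reduced OBDD by brute-force expansion (fine for six variables) and counts nodes under two orderings. `robdd_size` is an illustrative helper, not a library call.

```python
def robdd_size(f, order):
    """Number of nodes (terminals included) of the reduced OBDD of a
    non-constant function f under the variable ordering `order`."""
    unique = {}   # (variable, low id, high id) -> node id; merges equal nodes

    def build(i, env):
        if i == len(order):
            return int(bool(f(env)))          # terminal 0 or 1
        lo = build(i + 1, {**env, order[i]: False})
        hi = build(i + 1, {**env, order[i]: True})
        if lo == hi:                          # redundant test: skip the node
            return lo
        return unique.setdefault((order[i], lo, hi), len(unique) + 2)

    build(0, {})
    return len(unique) + 2                    # plus the two terminals

f = lambda e: (e["a1"] and e["b1"]) or (e["a2"] and e["b2"]) or (e["a3"] and e["b3"])

interleaved = ["a1", "b1", "a2", "b2", "a3", "b3"]
grouped     = ["a1", "a2", "a3", "b1", "b2", "b3"]
print(robdd_size(f, interleaved), robdd_size(f, grouped))  # 8 16
```

With the grouped ordering the diagram has to remember which ai were true before any bi is tested, so the gap widens quickly as more pairs are added.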

52

Pros

Introduces well-defined semantics to describe the matching process in publish-subscribe systems

Treating matching as a satisfiability check on an SBDD allows multiple subscriptions to be checked incrementally

Scalable

StSL is more efficient than SiSL

53

Cons/Improvements

Does not describe any heuristics for selecting the variable ordering (finding an optimal ordering is NP-hard)

◦ Can we order based on the significance of the attributes involved?

Does not explore the possibility of eliminating redundancies due to semantically related atomic sub-formulas (e.g. price = 100 and price > 80) (again NP-hard)

◦ Can we further reduce the node count by exploiting the semantics without causing side effects?

Efficiency of matching is not compared with existing systems

54

Conclusion

Two major contributions

◦ Precise semantics for matching messages to subscriptions

◦ Modeling filtering as a satisfiability check on BDDs

55

Questions

56

Thank You

57
