Application Scenarios Language & Semantics Intuitions Adaptation Evaluation Conclusions
BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High-dimensional Space
Mohammad Sadoghi Hans-Arno Jacobsen
University of Toronto
June 15, 2011
Mohammad Sadoghi (University of Toronto) BE-Tree SIGMOD’11 1 / 45
1 Application Scenarios
2 Matching Problem
3 BE-Tree (Boolean Expression-Tree)
4 BE-Tree Adaptation Policies
5 Experimental Evaluation
6 Conclusions
Computational Advertising (A Billion-dollar Industry)
[Figure: advertisers (Sony, Sears, Amazon) register advertising campaigns with the broker as subscriptions; online users' profiles and clickstream reach the broker as events; the broker returns targeted ads.]
Advertisement (BE): age < 32, credit-score > 630, num-visits > 4, price = 150
Events (user profiles, clickstream): car = BMW, model = X3, year = 2008, num-visits = 13, credit-score = 647, age = 25; “BMW X3 2008”, price < 235
Application Scenarios
1 Computational advertising (targeted advertising)
2 Computational finance (algorithmic trading)
3 Intrusion detection (deep packet inspection)
4 Real-time data analysis (data analytics)
5 Emerging mobile applications in co-spaces (location-based services)
6 Approximate string matching (data cleansing)
Common Denominator
To continuously evaluate a set of predefined patterns/specifications (subscriptions) over incoming events.
Challenges Derived from Application Scenarios
Key matching problem challenges addressed in this work
1 Enable an expressive matching semantics that supports constraints on both the subscription and the event.
2 Handle subscriptions with expressive operators that impose conditions only on a small set of dimensions, which results in a high degree of overlap among subscriptions.
3 Scale to large collections of subscriptions with thousands of dimensions.
4 Sustain high matching rates of events in the presence of frequent insertions and deletions of subscriptions.
5 Adapt to temporal changes of workload distribution (self-adjusting mechanism), i.e., avoid structure deterioration.
Language and Data Model
Both subscription and event are defined as Boolean expressions, which are collections of predicates.
A predicate P is a triple consisting of an attribute uniquely representing a dimension in n-dimensional space, an operator, and a set of values, denoted by $P^{(attr,opt,val)}$.
A predicate P(x) either accepts or rejects an input x such that $P : x \to \{\text{True}, \text{False}\}$, where $x \in Dom(P^{attr})$.
Each predicate supports relational operators (<, ≤, =, ≠, ≥, >), set operators (∈, ∉), or the SQL BETWEEN operator.
Formally, a Boolean expression e is defined over an n-dimensional space as follows:
Definition
$$e = \left\{ P_1^{(attr,opt,val)} \wedge \cdots \wedge P_k^{(attr,opt,val)} \right\}, \quad \text{where } k \le n;\ i, j \le k,\ P_i^{attr} = P_j^{attr} \text{ iff } i = j$$
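The data model can be sketched in a few lines of Python. This is only an illustration under assumptions, not the authors' implementation: the names `OPS`, `Predicate`, and `make_expression` are hypothetical, and the operator spellings (`"!="`, `"in"`, `"not in"`, `"between"`) are chosen here for convenience.

```python
from dataclasses import dataclass
import operator
from typing import Any, Callable, Dict, Tuple

# Hypothetical operator table covering the relational, set, and BETWEEN operators.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
    "in": lambda x, vals: x in vals,
    "not in": lambda x, vals: x not in vals,
    "between": lambda x, bounds: bounds[0] <= x <= bounds[1],
}


@dataclass(frozen=True)
class Predicate:
    """A triple (attr, opt, val); calling it accepts or rejects an input x."""
    attr: str
    opt: str
    val: Any

    def __call__(self, x: Any) -> bool:
        return OPS[self.opt](x, self.val)


def make_expression(*preds: Predicate) -> Tuple[Predicate, ...]:
    """Build a conjunction of predicates with at most one predicate per attribute."""
    attrs = [p.attr for p in preds]
    if len(attrs) != len(set(attrs)):
        raise ValueError("each attribute may appear in at most one predicate")
    return tuple(preds)


# The advertisement from the computational advertising scenario.
ad = make_expression(
    Predicate("age", "<", 32),
    Predicate("credit-score", ">", 630),
    Predicate("num-visits", ">", 4),
    Predicate("price", "=", 150),
)
```

The uniqueness check in `make_expression` mirrors the condition that two predicates share an attribute iff they are the same predicate.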
Matching Semantics
Stabbing Subscription
Given an event e and a set of subscriptions S, find all subscriptions $s_i \in S$ satisfied by e.
Definition
$$SQ(e) = \left\{ s_i \;\middle|\; \forall P_q^{(attr,opt,val)}(x) \in s_i,\ \exists P_o^{(attr,opt,val)}(x) \in e,\ P_q^{attr} = P_o^{attr},\ \exists x \in Dom(P_q^{attr}),\ P_q(x) \wedge P_o(x) \right\} \quad (1)$$
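Read literally, the stabbing-subscription definition can be evaluated by an exhaustive scan: for every predicate of a subscription, the event must carry a predicate on the same attribute, and some domain value must satisfy both. The sketch below does exactly that over small finite integer domains. It is a baseline illustration, not the BE-Tree algorithm; the function name, the `(attribute, test)` representation, and the domain ranges are assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Pred = Tuple[str, Callable[[int], bool]]  # (attribute, acceptance test)
Expr = List[Pred]                         # conjunction of predicates


def stabbing_subscriptions(event: Expr,
                           subs: Dict[str, Expr],
                           domain: Dict[str, Iterable[int]]) -> List[str]:
    """Return the names of all subscriptions satisfied by the event."""
    event_by_attr = dict(event)
    matched = []
    for name, sub in subs.items():
        satisfied = True
        for attr, p_q in sub:
            p_o = event_by_attr.get(attr)
            # The event needs a predicate on attr, and some x must satisfy both.
            if p_o is None or not any(p_q(x) and p_o(x) for x in domain[attr]):
                satisfied = False
                break
        if satisfied:
            matched.append(name)
    return matched


# Assumed finite domains; the slides do not specify them.
domain = {"age": range(0, 121), "credit-score": range(300, 851),
          "num-visits": range(0, 1001), "price": range(0, 1001)}
subs = {
    "ad": [("age", lambda x: x < 32), ("credit-score", lambda x: x > 630),
           ("num-visits", lambda x: x > 4), ("price", lambda x: x == 150)],
    "other": [("age", lambda x: x > 40)],
}
event = [("age", lambda x: x == 25), ("credit-score", lambda x: x == 647),
         ("num-visits", lambda x: x == 13), ("price", lambda x: x < 235)]

matched = stabbing_subscriptions(event, subs, domain)
print(matched)  # ['ad']
```

The `price` pair shows a constraint on both sides: the subscription requires price = 150, the event states price < 235, and the value 150 satisfies both.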
Key Principles
Most Important Observation
To utilize the underlying high-dimensional, discrete, finite, and multi-valued attribute space that is assumed in many practical scenarios.
Most Important Design Feature
To systematically explore the space in two iterative phases of space partitioning and space clustering (i) to cope with the curse of dimensionality and (ii) to support dynamic insertion and removal of subscriptions independent of their order.
The two-phase space-cutting technique consists of
1 space partitioning: global structuring to determine the best splitting dimension
2 space clustering: local structuring for each partition to determine the best grouping of expressions with respect to the expressions’ range of values for each dimension
The Complete Structure
[Figure: the complete BE-Tree structure, built from l-nodes, c-nodes, p-nodes, c-directories, and p-directories. The space partitioning and space clustering phases are marked, with complexity annotations O(1), O(log N), and O(k log N).]
The Partition Directory
[Figure: the partition directory (p-directory) of a c-node, with entries for dimensions dim_i and dim_j; each entry leads to a p-node with its own c-directory.]
Maintaining information for dimensions
The Cluster Directory
[Figure: the cluster directory (c-directory) of a p-node, organizing buckets (c-nodes with their l-nodes) by value range: [1, N] at the top; [1, N/2] and [N/2, N] below; then [1, N/4], [N/4, N/2], and [3N/4, N]; down to atomic buckets such as [1] and [N/2].]
Maintaining information for the range of values
Intuition Behind the Two-phase Space-cutting Technique
Required rules to avoid the cascading split problem and to ensure a deterministic property for alternating between space partitioning and clustering.
1 Insertion rule: an expression is always inserted into the smallest bucket that encloses it.
2 Forced split rule: a non-atomic bucket (i.e., a multi-valued, divisible bucket) is always split before switching back to space partitioning.
3 Merge rule: an underflowing leaf bucket is merged with its parent only if the parent bucket is not partitioned yet.
Invariance
Every expression always resides in the smallest bucket that encloses it, and a non-atomic leaf bucket is never partitioned.
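The insertion rule can be illustrated with a small sketch that locates the smallest bucket enclosing a predicate's value range when a domain is halved recursively. The halving scheme is an assumption made here: a bucket [lo, hi] splits into [lo, mid] and [mid + 1, hi], whereas the cluster directory figure draws adjacent buckets sharing a boundary. The function name is hypothetical, and the sketch assumes every bucket on the path has already been split.

```python
from typing import Tuple


def smallest_enclosing_bucket(lo: int, hi: int,
                              dom_lo: int, dom_hi: int) -> Tuple[int, int]:
    """Descend from the bucket [dom_lo, dom_hi] while one half still encloses [lo, hi]."""
    if not (dom_lo <= lo <= hi <= dom_hi):
        raise ValueError("range must lie inside the domain")
    b_lo, b_hi = dom_lo, dom_hi
    while b_lo < b_hi:                 # stop at an atomic (single-valued) bucket
        mid = (b_lo + b_hi) // 2
        if hi <= mid:                  # fits entirely in the lower half
            b_hi = mid
        elif lo > mid:                 # fits entirely in the upper half
            b_lo = mid + 1
        else:                          # straddles the split point: stays here
            break
    return b_lo, b_hi


print(smallest_enclosing_bucket(3, 3, 1, 8))  # (3, 3)
print(smallest_enclosing_bucket(2, 5, 1, 8))  # (1, 8)
print(smallest_enclosing_bucket(5, 6, 1, 8))  # (5, 6)
```

An equality predicate ends in an atomic bucket, while a wide range that straddles the first split point stays at the top-level bucket.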
BE-Tree Self-adjustment Mechanism
Ranking Objective
A novel ranking objective that directly reduces the matching cost as opposed to a ranking that is biased towards either the least or the most popular dimension(s).
The matching cost consists of
1 minimizing false candidate computations: reduce the number of predicate evaluations before an unsatisfied predicate is reached, namely, penalizing multiple search paths and penalizing paths that produce many false candidates (Loss function).
2 minimizing true candidate computations: promote the evaluation of the common predicate among matched expressions exactly once (Gain function).
Self-adjustment Mechanism (Cost-based Ranking Model)
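Definition
$$Rank(n_i) = \begin{cases} \beta\, Gain(n_i) - \gamma\, Loss(n_i) & \text{if } n_i \text{ is an l-node} \\ \left( \sum_{l_j \in c(n_i)} Rank(l_j) \right) - \gamma\, Loss(n_i) & \text{otherwise} \end{cases}, \quad \text{where } 0 \le \beta, \gamma \le 1 \quad (2)$$
$$Loss(n_i) = \sum_{e' \in window_m(n_i)} \frac{\#\ \text{discarded pred eval for } e'}{|window(n_i)|} \quad (3)$$
$$Gain(l_j) = \alpha_s\, Gain_s(l_j) + \alpha_c\, Gain_c(l_j), \quad 0 \le \alpha_s, \alpha_c \le 1 \quad (4)$$
$$Gain_s(l_j) = \#\ \text{subsumed pred}, \quad Gain_c(l_j) = \#\ \text{covered pred} \quad (5)$$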
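Equations (2) to (5) translate directly into a recursive computation. The sketch below is an illustration under assumptions: the `Node` fields are hypothetical, c(n_i) is modeled as a plain list of child nodes, and the per-event counts of discarded predicate evaluations are supplied as data rather than measured during matching.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    is_leaf: bool                                             # True for an l-node
    discarded_evals: List[int] = field(default_factory=list)  # one count per event in the window
    subsumed: int = 0                                         # number of subsumed predicates (l-node only)
    covered: int = 0                                          # number of covered predicates (l-node only)
    children: List["Node"] = field(default_factory=list)      # c(n_i)


def loss(n: Node) -> float:
    """Eq. (3): discarded predicate evaluations averaged over the window."""
    if not n.discarded_evals:
        return 0.0
    return sum(n.discarded_evals) / len(n.discarded_evals)


def gain(l: Node, alpha_s: float = 1.0, alpha_c: float = 1.0) -> float:
    """Eqs. (4) and (5): weighted count of subsumed and covered predicates."""
    return alpha_s * l.subsumed + alpha_c * l.covered


def rank(n: Node, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Eq. (2): leaf rank, or the children's total rank minus the node's own loss."""
    if n.is_leaf:
        return beta * gain(n) - gamma * loss(n)
    return sum(rank(c, beta, gamma) for c in n.children) - gamma * loss(n)


leaf1 = Node(is_leaf=True, discarded_evals=[1, 3], subsumed=2, covered=1)
leaf2 = Node(is_leaf=True, covered=4)
parent = Node(is_leaf=False, discarded_evals=[2, 2, 2], children=[leaf1, leaf2])
print(rank(parent, beta=1.0, gamma=0.5))  # 5.0
```

In the example, `leaf1` ranks 3 - 0.5 * 2 = 2, `leaf2` ranks 4, and the parent ranks (2 + 4) - 0.5 * 2 = 5.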
BE-Tree Example (Insertion)
Insertion Sequence (l-node's maxcap = 3)
S1 = [a>0, b=2, e>4]
S2 = [c<3, f>1, m<2]
S3 = [d=1, f<3]
S4 = [a=3]
S5 = [d>3, f>2]
S6 = [d=4, f<4]
S7 = [a<4, b=1]
S8 = [a<3]
S9 = [a>1]
[Figure sequence: tree states shown during insertion]
1 The root c-node's l-node holds {S1, S2, S3}.
2 Inserting S4 overflows the root l-node: {S1, S2, S3, S4}.
3 The root is partitioned on attribute a (p-directory entry a, c-directory bucket [1, 4]): the new l-node holds {S1, S4}; the root l-node keeps {S2, S3}.
4 Inserting S5 and S6 overflows the root l-node: {S2, S3, S5, S6}.
5 The root is partitioned on attribute d: the new l-node holds {S3, S5, S6}; the root l-node keeps {S2}.
6 Inserting S7 and S8 overflows the l-node under a: {S1, S4, S7, S8}.
7 The c-directory under a gains the bucket [2, 4]: its l-node holds {S4}; {S1, S7, S8} stays in [1, 4].
8 Inserting S9 overflows the l-node of [1, 4]: {S1, S7, S8, S9}.
9 The c-node of [1, 4] is partitioned on attribute b (p-directory entry b): the new l-node holds {S1, S7}; {S8, S9} stays.
BE-Tree Example (the Notion of a Key)
[Figure: the final tree of the insertion example, with l-nodes {S2} (root), {S8, S9} (a, bucket [1, 4]), {S4} (a, bucket [2, 4]), {S1, S7} (b), and {S3, S5, S6} (d), used to illustrate the notion of a key.]
BE-Tree Example (Matching)
Event: e1 = [a=1, b=2, e=3]
[Figure: the final tree of the insertion example, traversed for the event e1; the figure also carries the label e1 = [a=1].]
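As a cross-check on the running example, the subscriptions S1 to S9 can be evaluated against e1 by a plain sequential scan. This does not traverse a BE-Tree; it only shows which subscriptions any correct index must return for this event. The tuple encoding and the helper name `matches` are assumptions made for this sketch.

```python
import operator

OPS = {"<": operator.lt, "=": operator.eq, ">": operator.gt}

# Subscriptions from the insertion sequence, encoded as (attribute, operator, value).
SUBS = {
    "S1": [("a", ">", 0), ("b", "=", 2), ("e", ">", 4)],
    "S2": [("c", "<", 3), ("f", ">", 1), ("m", "<", 2)],
    "S3": [("d", "=", 1), ("f", "<", 3)],
    "S4": [("a", "=", 3)],
    "S5": [("d", ">", 3), ("f", ">", 2)],
    "S6": [("d", "=", 4), ("f", "<", 4)],
    "S7": [("a", "<", 4), ("b", "=", 1)],
    "S8": [("a", "<", 3)],
    "S9": [("a", ">", 1)],
}

# Event e1, treated as attribute-value pairs (equality predicates only).
EVENT = {"a": 1, "b": 2, "e": 3}


def matches(sub, event):
    """Every predicate of the subscription must hold on the event's value for that attribute."""
    return all(attr in event and OPS[op](event[attr], val)
               for attr, op, val in sub)


matched = [name for name, sub in SUBS.items() if matches(sub, EVENT)]
print(matched)  # ['S8']
```

Only S8 matches: S1 fails on e > 4, S7 fails on b = 1, S4 and S9 fail on a, and the remaining subscriptions constrain attributes that e1 does not carry.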
Experimental Evaluation
Algorithms
1 BE: BE-Tree (our index structure)
2 BE-B: BE-Tree (batch version)
3 GR: IBM Gryphon (Aguilera et al., PODC’99)
4 P: Propagation Algorithm (Fabret et al. SIGMOD’01)
5 k-ind: k-index (Whang et al. VLDB’09)
6 SIFT: Counting Algorithm (Yan et al. TODS’94)
7 SCAN: Sequential Scan
Workload Configurations
Table: Synthetic and Real Workload Properties
| Property | Workload Size | Number of Dimensions | Dimension Cardinality | Predicate Selectivity | Dimension Selectivity | Sub/Event Size | % Equality Pred | Match Prob | DBLP (Author) | DBLP (Title) | Match Prob (Author) | Match Prob (Title) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Size | 100K-1M | 1M | 100K | 100K | 100K | 100K | 1M | 1M | 100-760K | 50-250K | 400K | 150 |
| Number of Dim | 400 | 50-1400 | 400 | 400 | 400 | 400 | 400 | 400 | 677 | 677 | 677 | 677 |
| Cardinality | 48 | 48 | 48-150K | 48 | 2-10 | 48 | 48 | 48 | 26 | 26 | 26 | 26 |
| Avg. Sub Size | 7 | 7 | 7 | 7 | 7 | 5-66 | 7 | 7 | 8 | 35 | 8 | 30 |
| Avg. Event Size | 15 | 15 | 15 | 15 | 15 | 13-81 | 15 | 15 | 8 | 35 | 16 | 43 |
| Pred Avg. Range Size % | 12 | 12 | 12 | 6-50 | - | 12 | 12 | 12 | - | - | 12 | 12 |
| % Equality Pred | 0.3 | 0.3 | 0.3 | 0.3 | 1.0 | 0.3 | 0.2-1.0 | 0.3 | 1.0 | 1.0 | 0.3 | 0.3 |
| Op Class | Med | Med | Med | Med | Min | Med | Med | Lo-Hi | Min | Min | Lo-Hi | Lo-Hi |
| Match Prob % | 1 | 1 | 1 | 1 | - | 1 | 1 | 0.01-9 | - | - | 0.01-9 | 0.01-9 |
The results in this paper were verified by the SIGMOD repeatability committee.
Effect of Workload Size on Matching (Log Scale)
[Figure: Varying Workload Size. Panels: (a) Uniform: Workload Size; (b) Zipf: Workload Size. x-axis: Varying Number of Subscriptions (100K to 1M); y-axis: Matching Time/Event (ms), log scale. Algorithms: BE-B, BE, GR, P, k-Ind, SIFT, SCAN.]
Effect of Dimension Cardinality on Matching (Log Scale)
[Figure: Varying Dimension Cardinality. Panels: (a) Uniform: Dimension Cardinality; (b) Zipf: Dimension Cardinality. x-axis: Varying Cardinality (48 to 150K), Sub=100K; y-axis: Matching Time/Event (ms), log scale. Algorithms: BE-B, BE, GR, P, k-Ind, SIFT, SCAN.]
Effect of Predicate Expressiveness on Matching (Log Scale)
Figure: Varying % of Equality Predicates. Panels: (a) Uniform: % Equality Predicate; (b) Zipf: % Equality Predicate. X-axis: Varying % of Equality Pred (0.2 to 1.0), Sub=1M; y-axis: Matching Time/Event (ms). Series: BE-B, BE, GR, P, k-Ind, SIFT, SCAN.
Conclusions
The main benefits in the overall design of BE-Tree are to
1 support an expressive set of predicate operators (e.g., range predicates) through a novel predicate transformation
2 handle subscriptions that impose conditions on only a small set of dimensions (of a high-dimensional space), resulting in a high degree of overlap, through the two-phase space-cutting technique that identifies dense subspaces and copes with the curse of dimensionality
3 adapt to temporal changes of the workload distribution and sustain high matching rates in the presence of frequent subscription updates through
  1 a cascading-split-free approach when inserting and removing expressions
  2 a deterministic clustering and a partition-clustering alteration strategy that are independent of the insertion sequence
  3 a novel cost-based technique to optimize the matching cost and recycle ineffective buckets
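To make the matching problem behind these points concrete, here is a minimal sketch of a sequential-scan matcher, in the spirit of the SCAN baseline in the evaluation plots. It is not BE-Tree itself and not the authors' code. The function names, the tuple encoding of predicates, and the subscription identifiers are hypothetical; the attribute names are borrowed from the advertising example earlier in the deck. The sketch also assumes that a predicate on an attribute missing from the event counts as unsatisfied.

```python
import operator

# Supported predicate operators: equality plus range-style comparisons.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
    "!=": operator.ne,
}


def matches(expression, event):
    """True if every (attribute, op, value) predicate holds for the event.

    Assumption: a predicate on an attribute the event lacks is unsatisfied.
    """
    return all(
        attr in event and OPS[op](event[attr], value)
        for attr, op, value in expression
    )


def scan(subscriptions, event):
    """Check every subscription against the event, one by one."""
    return [sid for sid, expr in subscriptions.items() if matches(expr, event)]


# Each subscription constrains only a few attributes of a much larger space.
subscriptions = {
    "ad-1": [("age", "<", 32), ("credit-score", ">", 630), ("num-visits", ">", 4)],
    "ad-2": [("age", "<", 20)],
    "ad-3": [("price", "=", 150)],
}
event = {"age": 25, "credit-score": 647, "num-visits": 13, "year": 2008}
matched = scan(subscriptions, event)  # only "ad-1" satisfies all its predicates
```

The scan touches every subscription for every event, so its cost grows linearly with the number of subscriptions; an index such as BE-Tree aims to prune most of that work.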
Thank You
Effect of Sub/Event Size on Matching (Log Scale)
Figure: Varying Subscription/Event Size. Panels: (a) Uniform: Sub/Event Size; (b) Zipf: Sub/Event Size. X-axis: Varying Sub/Event Size (5/13 to 66/81), Sub=100K; y-axis: Matching Time/Event (ms). Series: BE-B, BE, GR, P, k-Ind, SIFT, SCAN.
Effect of Matching Probability on Matching (Log Scale)
Figure: Varying % of Matching Probability. Panels: (a) Uniform: % Matching Prob; (b) Zipf: % Matching Prob.