Algorithms and Metrics for Processing Multiple Heterogeneous Continuous Queries


    Mohamed A. Sharaf, Panos K. Chrysanthis, Alexandros Labrinidis, and Kirk Pruhs

    Department of Computer Science, University of Pittsburgh

The emergence of monitoring applications has precipitated the need for Data Stream Management Systems (DSMSs) which constantly monitor incoming data feeds (through registered continuous queries), in order to detect events of interest. In this paper, we examine the problem of how to schedule multiple Continuous Queries (CQs) in a DSMS to optimize different Quality of Service (QoS) metrics. We show that, unlike traditional on-line systems, scheduling policies in DSMSs that optimize for average response time will be different from policies that optimize for average slowdown, which is a more appropriate metric to use in the presence of a heterogeneous workload. Towards this, we propose policies to optimize for the average-case performance for both metrics. Additionally, we propose a hybrid scheduling policy that strikes a fine balance between performance and fairness, by looking at both the average- and worst-case performance, for both metrics. We also show how our policies can be adaptive enough to handle the inherent dynamic nature of monitoring applications. Furthermore, we discuss how our policies can be efficiently implemented and extended to exploit sharing in optimized multi-query plans and multi-stream CQs. Finally, we experimentally show using real data that our policies consistently outperform currently used ones.

Categories and Subject Descriptors: H.2.4 [Database Management]: Systems – Query processing

General Terms: Algorithms, Design, Performance

Additional Key Words and Phrases: Data Stream Management System, Continuous Queries, Operator Scheduling

    1. INTRODUCTION

The growing need for monitoring applications [Carney et al. 2002] has forced an evolution of data processing paradigms, moving from Database Management Systems (DBMSs) to Data Stream Management Systems (DSMSs). Traditional DBMSs employ a store-and-then-query data processing paradigm, where data is stored in the database and queries are submitted by the users to be answered in full, based on the current snapshot of the database. In contrast, in DSMSs, monitoring applications register Continuous Queries (CQs) which continuously process unbounded

This work is supported in part by the National Science Foundation under grant IIS-0534531. Authors' addresses: Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260; emails: {msharaf, panos, labrinid, kirk}@cs.pitt.edu. Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee. © 2007 ACM 0362-5915/2007/0300-0001 $5.00

ACM Transactions on Database Systems, Vol. V, No. N, November 2007, Pages 1–42.


    data streams looking for data that represent events of interest to the end-user.

Currently, we are developing a DSMS, called AQSIOS, that can help support monitoring applications such as the real-time detection of disease outbreaks, tracking the stock market, environmental monitoring via sensor networks, and personalized and customized Web pages. One of the main goals in the design of AQSIOS is the development of scheduling policies that optimize Quality of Service (QoS).

This goal is complicated by the fact that the scheduling policy must take into account that the CQs are heterogeneous, i.e., they may have different time complexities (the amount of processing required to find if input data represents an event), and different productivity or selectivity (the number of events detected by the CQ). For example, consider two CQs, GOOGLE and ANALYSIS, on streams of stock market data. GOOGLE is a simple query that asks the DSMS to be notified when there is a stock quote for GOOGLE. ANALYSIS is a complex query that asks the application to provide some specific technical analysis for any new stock price. Obviously, GOOGLE has low cost and it detects fewer events, whereas ANALYSIS has high cost and it detects more events.

The most commonly used QoS metric in the literature is average response time. In this paper, we show that if the objective is to optimize the response time, then the right strategy is to schedule CQs according to their output rate. Specifically, we present a new scheduling policy called Highest Rate (HR). HR generalizes the Rate-based policy (RB) [Urhan and Franklin 2001] for scheduling operators in multiple CQs, as opposed to RB, which has been proposed for scheduling operators within a single query. Under HR, the priority of a query is set to its output rate, where the output rate of the query is the ratio between its expected selectivity and its expected cost.

Although scheduling to minimize average response time works well for homogeneous workloads, there are some well-known disadvantages to using average response time as the metric to optimize when the workload is heterogeneous. In the above example, the user who issued the ANALYSIS query likely knows that it is a complex query, and is expecting a higher response time than the user that issued the GOOGLE query. A metric that captures this phenomenon is average slowdown. The slowdown of a job is the ratio of the response time of the job to its ideal processing time [Muthukrishnan et al. 1999]. So, for example, if each job had slowdown 1.1, then each user would experience a 10% delay due to queuing (although the responses could be very different).

Interestingly, in most on-line systems (e.g., Web servers), Shortest-Remaining-Processing-Time (SRPT) is one policy that is optimal for average response time and near optimal for average slowdown [Muthukrishnan et al. 1999]. A surprising discovery that motivated this work is that this is not the case with the HR policy which optimizes average response time of CQs [Sharaf et al. 2006]. In general, HR will not optimize average slowdown because of the probabilistic nature of CQs, where the selectivity might not equal 1. In this paper, we argue that if the objective is to optimize average slowdown, then the right scheduling strategy is to set the priority of a query to the ratio of its selectivity over the product of its expected cost and its ideal total processing cost. We call this policy the Highest Normalized Rate (HNR) policy.
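The contrast between the two prioritizing functions can be sketched in a few lines of Python (not part of the original paper; the query parameters below are hypothetical, and for a single-operator query the ideal total processing cost equals its operator cost):

```python
def hr_priority(selectivity, cost):
    # HR: priority = expected output rate = selectivity / cost
    return selectivity / cost

def hnr_priority(selectivity, cost, ideal_total_cost):
    # HNR: output rate further divided by the ideal total processing cost
    return selectivity / (cost * ideal_total_cost)

# Hypothetical single-operator queries: a cheap, selective one (GOOGLE-like)
# and a costly, productive one (ANALYSIS-like)
cheap  = {"selectivity": 0.1, "cost": 1.0}
costly = {"selectivity": 0.9, "cost": 5.0}

# HR favors the productive query; HNR favors the cheap one
assert hr_priority(**cheap) < hr_priority(**costly)
assert hnr_priority(0.1, 1.0, 1.0) > hnr_priority(0.9, 5.0, 5.0)
```

With these numbers the two policies disagree on which query to run first, which is exactly the gap between the response time and slowdown objectives discussed above.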


The average slowdown provided by the DSMS captures the system's average-case performance. However, improving the average-case performance usually comes at the expense of unfairness toward certain classes of queries that might experience starvation. Starvation is typically captured by measuring the maximum slowdown of the system [Bender et al. 1998], i.e., the perceived worst-case performance.

Starvation is an unacceptable behavior in a DSMS that supports monitoring applications where all kinds of events are equally important. Hence, it is crucial to balance the trade-off between the average-case and worst-case performances of the DSMS. Toward this, we propose a hybrid scheduling policy that optimizes the ℓ2 norm of slowdowns [Bansal and Pruhs 2003]. As such, it is able to strike a fine balance between the average- and worst-case performances, and hence it avoids starvation and exhibits a higher degree of fairness.

In addition to new scheduling policies, we consider two special problems that are unique to DSMSs and should be exploited by the query scheduler. The first problem is inherent in the dynamic nature of data streams, where the distribution of data may vary significantly over time. Towards solving this problem, we propose an adaptive scheduling mechanism that allows our proposed policies to react quickly to changes in data distribution. The second problem is inherent in the inter-dependency between operators in CQs due to the presence of join or shared operators. Towards this, we first address the scheduling of multi-stream queries with time-based sliding window join operators. We formulate the definition of slowdown for composite tuples produced by join operators and extend our proposed scheduling policies to handle such multi-stream queries. Second, we consider the scheduling of multiple queries with shared operators, where we show that a proper setting of the priority of shared operators significantly improves system performance.

Contributions. The contributions of this paper can be summarized as follows:

(1) We propose policies for scheduling multiple CQs that maximize the average-case performance of a DSMS, for response time and for slowdown.

(2) We propose hybrid policies that strike a fine balance between the average- and worst-case performances, for response time and for slowdown.

(3) We consider three issues that are very particular to DSMSs. Namely, we propose: (a) extending our proposed policies to handle multi-stream continuous queries, (b) exploiting sharing in optimizing multi-query plans, and (c) utilizing an adaptive scheduling mechanism.

(4) To ensure that our proposed hybrid policy can be efficiently realized in AQSIOS, we propose a low-overhead implementation which uses clustering in addition to efficient search pruning techniques adopted from [Aksoy and Franklin 1999; Fagin et al. 2001].

Our extensive experimental evaluation using real and synthetic data shows the significant gains provided by our proposed policies under different QoS measures, compared to existing scheduling policies in DSMSs. Our experiments also highlight the memory requirements of each of our proposed policies and the trade-off between optimizing for QoS vs. optimizing for memory usage.


[Figure: the global plan of two continuous queries Q1 and Q2 over input streams M1 and M2, built from operators 1 through 5 with shared operators, alongside a query Q_k illustrating an operator segment E^k_x through operators O_x, O_y, and O_z, with leaf operator O_l and root operator O_r.]

Fig. 1. Continuous Queries Plans

Road Map. Section 2 provides the system model. Sections 3 and 4 define our QoS metrics and present our proposed scheduling policies. Section 5 focuses on multi-stream queries. In Sections 6 and 7, we discuss implementation details and extend our work to consider queries with shared operators. Sections 8 and 9 discuss our simulation testbed and our experimental results. Finally, Section 10 surveys related work.

2. SYSTEM MODEL

In a DSMS, users register continuous queries that are executed as new data arrives. Data arrives in the form of continuous streams from different data sources, where the arrival of new data is similar to an insertion operation in traditional database systems. A DSMS is typically connected to different data sources and a single stream might feed more than one query.

A continuous query evaluation plan can be conceptualized as a data flow tree [Carney et al. 2002; Babcock et al. 2003], where the nodes are operators that process tuples and edges represent the flow of tuples from one operator to another (Figure 1). An edge from operator Ox to operator Oy means that the output of Ox is an input to Oy. Each operator is associated with a queue where input tuples are buffered until they are processed.

Multiple queries with common sub-expressions are usually merged together to eliminate the repetition of similar operations [Sellis 1988]. For example, Figure 1 shows the global plan for two queries Q1 and Q2. Both queries operate on data streams M1 and M2 and they share the common sub-expression represented by operators O1, O2 and O3, as illustrated by the half-shaded pattern for these operators.

A single-stream query Q_k has a single leaf operator O^k_l and a single root operator O^k_r, whereas a multi-stream query has a single root operator and more than one leaf operator. In a query plan Q_k, an operator segment E^k_{x,y} is the sequence of operators that starts at O^k_x and ends at O^k_y. If the last operator on E^k_{x,y} is the root operator, then we simply denote that operator segment as E^k_x. Additionally,


E^k_l represents an operator segment that starts at the leaf operator O^k_l and ends at the root operator O^k_r. For example, in Figure 1, E^1_1 = <O1, O3, O4>, whereas E^2_1 = <O1, O3, O5>.

In a query, each operator O^k_x (or simply O_x) is associated with two parameters:

(1) Processing cost or Processing time (c_x): is the amount of time needed to process an input tuple.

(2) Selectivity or Productivity (s_x): is the number of tuples produced after processing one tuple for c_x time units. s_x is less than or equal to 1 for a filter operator and it could be greater than 1 for a join operator.

Given a single-stream query Q_k which consists of operators <O^k_l, ..., O^k_x, O^k_y, ..., O^k_r> (Figure 1), we define the following characterizing parameters for any operator O^k_x (or equivalently, for any operator segment E^k_x that starts at operator O^k_x):

Operator Global Selectivity (S^k_x): is the number of tuples produced at the root O^k_r after processing one tuple along operator segment E^k_x.

S^k_x = s^k_x × s^k_y × ... × s^k_r

Operator Global Average Cost (C^k_x): is the expected time required to process a tuple along an operator segment E^k_x.

C^k_x = c^k_x + (c^k_y × s^k_x) + ... + (c^k_r × s^k_{r−1} × ... × s^k_x)

If O^k_x is a leaf operator (x = l), then when a processed tuple actually satisfies all the filters in E^k_l, C^k_l represents the ideal total processing cost or time incurred by any tuple produced or emitted by query Q_k. In this case, we denote C^k_l as T_k:

Tuple Processing Time (T_k): is the ideal total processing cost required to produce a tuple by query Q_k.

T_k = c^k_l + ... + c^k_x + c^k_y + ... + c^k_r
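The three parameters can be computed in one pass over a segment's operators. The sketch below is a Python rendering of the formulas above (not part of the original system; the operator list in the example is hypothetical):

```python
def segment_params(ops):
    """Operators of a segment, from O_x up to the root O_r, given as
    (cost, selectivity) pairs in pipeline order; returns (S, C, T)."""
    S, C, reach = 1.0, 0.0, 1.0
    for cost, sel in ops:
        C += cost * reach   # each cost is weighted by the upstream selectivities
        reach *= sel        # fraction of tuples reaching the next operator
        S *= sel            # global selectivity: product of all selectivities
    T = sum(cost for cost, _ in ops)  # ideal cost: every filter is satisfied
    return S, C, T

# Hypothetical two-operator segment: a filter (cost 2, selectivity 0.5)
# feeding the root (cost 3, selectivity 1.0)
S, C, T = segment_params([(2.0, 0.5), (3.0, 1.0)])
# S = 0.5, C = 2 + 3 * 0.5 = 3.5, T = 5
```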

    We extend the above parameters for multi-stream queries in Section 5.

    3. AVERAGE-CASE PERFORMANCE

In this section, we focus on QoS metrics for single-stream queries and present our scheduling policies for optimizing these metrics. Multi-stream queries are discussed in Section 5.

3.1 Response Time Metric

In DSMSs, the arrival of a new tuple triggers the execution of one or more CQs. Processing a tuple by a CQ might lead to discarding it (if it does not satisfy some filter predicate) or it might lead to producing one or more tuples at the output, which means that the input tuple represents an event of interest to the user who registered the CQ. Clearly, in DSMSs, it is more appropriate to define response time from a data/event perspective rather than from a query perspective as in traditional DBMSs. Hence, we define the tuple response time or tuple latency as follows:


Definition 1. Tuple response time, R_i, for tuple t_i is R_i = D_i − A_i, where A_i is t_i's arrival time and D_i is t_i's output time. Accordingly, the average response time for N tuples is: (1/N) Σ_{i=1}^{N} R_i.

Notice that tuples that are filtered out do not contribute to the metric, as they do not represent any event [Tian and DeWitt 2003].

3.1.1 Highest Rate Policy (HR). The Rate-based policy (RB) has been shown to improve the average response time of a single query [Urhan and Franklin 2001]. In Aurora [Carney et al. 2003], RB was used for scheduling operators within a query, after the query had been selected by Round Robin (RR). Below, we present a policy that extends RB for scheduling both queries and operators [Sharaf et al. 2005; Sharaf 2007].

In the basic RB policy, each operator path within a query is assigned a priority that is equal to its output rate. The path with the highest priority is the one scheduled for execution. In our proposed Highest Rate policy (HR), we simply view the network of multiple queries as a set of operators, and at each scheduling point we select for execution the operator with the highest priority (i.e., output rate).

Specifically, under HR, each operator O^k_x is assigned a value called global output rate (GR^k_x). The output rate of an operator is basically the expected number of tuples produced per time unit due to processing one tuple by the operators along the operator segment starting at O^k_x all the way to the root O^k_r. Formally, the output rate of operator O^k_x is defined as follows:

GR^k_x = S^k_x / C^k_x    (1)

where S^k_x and C^k_x are the operator's global selectivity and global average cost as defined in Section 2. The intuition underlying HR is to give higher priority to operator paths that are both productive and inexpensive. In other words, the highest priority is given to the operator paths with the minimum latency for producing one tuple.
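A minimal sketch of HR's selection step, assuming each candidate segment is summarized by its precomputed global selectivity and global cost (the segment names and numbers below are hypothetical):

```python
def pick_hr(segments):
    """segments: {name: (global_selectivity, global_cost)}.
    HR executes the segment with the highest output rate GR = S / C."""
    return max(segments, key=lambda name: segments[name][0] / segments[name][1])

# Hypothetical candidates: rates are 0.2, 0.3, and 0.2 respectively
segments = {"E1": (1.0, 5.0), "E2": (0.9, 3.0), "E3": (0.2, 1.0)}
assert pick_hr(segments) == "E2"
```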

The priority of each operator O^k_x is set to its global output rate GR^k_x, or equivalently, the output rate of the operator segment E^k_x starting at O^k_x. Hence, the priority of E^k_x is basically equal to the priority of O^k_x, and executing O^k_x implies the pipelined execution of all the operators on E^k_x unless it is interrupted by a higher priority operator (or operator segment) as we will describe in Section 6.

    3.2 Slowdown Metric

Average response time is an expressive metric in a homogeneous setting, i.e., when all tuples require the same processing time. However, in a heterogeneous workload, as in our system, the processing requirements for different tuples may vary significantly and average response time is not an appropriate metric, since it cannot relate the time spent by a tuple in the system to its processing requirements. Given this realization, other on-line systems with heterogeneous workloads such as DBMSs, OSs, and Web servers have adopted average slowdown or stretch [Muthukrishnan et al. 1999] as another metric. This motivated us to consider stretch as the metric


Symbol       Description
O^i_x        Operator x in query i
E^i_{x,y}    Segment of operators that starts at O^i_x and ends at O^i_y
E^i_x        Segment of operators that starts at O^i_x and ends at the root O^i_r
c^i_x        Processing time/cost of operator O^i_x
s^i_x        Selectivity of operator O^i_x
C^i_x        Expected processing time/cost of operator segment E^i_x
S^i_x        Selectivity of operator segment E^i_x
W^i_x        Wait time for the tuple at the head of O^i_x's input queue
T_i          Ideal processing time/cost of a tuple produced by query Q_i
V_x          Window interval for join operator O_x
λ_l          Mean inter-arrival time of data stream M_l
SE_x         Set of operator segments starting at shared operator O_x
SC_x         Expected processing time/cost of the set of segments in SE_x

Table I. Table of Symbols

in our system.

The definition of slowdown was initiated by the database community in [Mehta and DeWitt 1993] for measuring the performance of a DBMS executing multi-class workloads. Formally, the slowdown of a job is the ratio between the time a job spends in the system to its processing demands [Muthukrishnan et al. 1999]. In a DSMS, we define the slowdown of a tuple as follows [Sharaf et al. 2006; Sharaf 2007]:

Definition 2. The slowdown, H_i, for tuple t_i produced by query Q_k is H_i = R_i / T_k, where R_i is t_i's response time and T_k is its ideal processing time. Accordingly, the average slowdown for N tuples is: (1/N) Σ_{i=1}^{N} H_i.

Intuitively, in a general purpose DSMS where all events are of equal importance, a simple event (i.e., an event detected by a low-cost CQ) should be detected faster than a complex event (i.e., an event detected by a high-cost CQ), since the latter contributes more to the load on the DSMS.
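Definitions 1 and 2 translate directly into code. The sketch below (the event trace is hypothetical) computes both metrics for produced tuples given as (arrival, output, ideal processing time) triples:

```python
def avg_response_time(events):
    # events: list of (A_i arrival, D_i output, T_k ideal processing time)
    return sum(d - a for a, d, _ in events) / len(events)

def avg_slowdown(events):
    # H_i = R_i / T_k per Definition 2
    return sum((d - a) / t for a, d, t in events) / len(events)

# Hypothetical trace: a costly event, and a cheap event that waited behind it
events = [(0.0, 5.0, 5.0), (0.0, 19.0, 2.0)]
# avg response time = 12.0; avg slowdown = (1 + 9.5) / 2 = 5.25
```

The second tuple dominates the slowdown average even though both metrics are computed over the same trace, illustrating how slowdown surfaces the delayed cheap event.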

    3.3 Highest Normalized Rate Policy (HNR)

Based on the above definitions, we developed the Highest Normalized Rate (HNR) policy for minimizing average slowdown. Table I summarizes the parameters used for describing the HNR policy for single-stream queries, as well as the other scheduling policies discussed in the next section. It also includes the parameters used for join operators (Section 5) and shared operators (Section 7).

To illustrate the intuition underlying HNR, consider two operator segments E^i_x and E^j_y starting at operators O^i_x and O^j_y respectively. For each of the two operator segments, we compute its global selectivity and global average cost as described above. Further, assume that the current wait time for the tuple at the head of O^i_x's queue is W^i_x and for the tuple at the head of O^j_y's queue is W^j_y.

We then consider two different scheduling policies:

Policy (A), where E^i_x is executed before E^j_y, and

Policy (B), where E^j_y is executed before E^i_x.


In policy A, where E^i_x is executed before E^j_y, the total slowdown of tuples produced under this policy is:

H_A = S^i_x H_{A,i} + S^j_y H_{A,j}    (2)

where S^i_x and S^j_y are the numbers of tuples produced by E^i_x and E^j_y respectively, and H_{A,i} and H_{A,j} are the slowdowns of the E^i_x tuples and the E^j_y tuples respectively. Recall that the slowdown of a tuple is the ratio between the time it spent in the system to its ideal processing time. Hence, H_{A,i} and H_{A,j} are computed as follows:

H_{A,i} = (T_i + W^i_x) / T_i        H_{A,j} = (C^i_x + T_j + W^j_y) / T_j

where C^i_x is the amount of time E^j_y will spend waiting for E^i_x to finish execution. By substitution in (2),

H_A = S^i_x (T_i + W^i_x) / T_i + S^j_y (C^i_x + T_j + W^j_y) / T_j

Similarly, under the alternative policy B, where E^j_y is executed before E^i_x, the total slowdown H_B is:

H_B = S^j_y (T_j + W^j_y) / T_j + S^i_x (C^j_y + T_i + W^i_x) / T_i

In order for H_A to be less than H_B, the following inequality must be satisfied:

(S^j_y C^i_x) / T_j < (S^i_x C^j_y) / T_i    (3)

The left-hand side of Inequality 3 shows the increase in total slowdown incurred by the tuples produced by E^j_y when E^i_x is executed first. Similarly, the right-hand side shows the increase in total slowdown incurred by the tuples produced by E^i_x when E^j_y is executed first. The inequality implies that between the two alternative execution orders, we should select the one that minimizes the increase in the total slowdown. That is, we should select the segment with the smallest negative impact on the other one.
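This comparison can be verified numerically. The sketch below (with hypothetical segment parameters) computes the total slowdown under both execution orders and checks that Inequality 3 picks the order with the smaller total:

```python
def total_slowdown(first, second):
    # Each segment: (S, C, T, W). Executing `first` delays the tuples of
    # `second` by C_first (Equation 2 expanded).
    S1, C1, T1, W1 = first
    S2, C2, T2, W2 = second
    return S1 * (T1 + W1) / T1 + S2 * (C1 + T2 + W2) / T2

# Hypothetical segments: (S, C, T, W)
Ei = (0.8, 4.0, 6.0, 10.0)
Ej = (0.5, 2.0, 3.0, 10.0)

HA = total_slowdown(Ei, Ej)  # policy A: Ei before Ej
HB = total_slowdown(Ej, Ei)  # policy B: Ej before Ei
lhs = Ej[0] * Ei[1] / Ej[2]  # S_j * C_i / T_j: harm Ei inflicts on Ej's tuples
rhs = Ei[0] * Ej[1] / Ei[2]  # S_i * C_j / T_i: harm Ej inflicts on Ei's tuples
assert (HA < HB) == (lhs < rhs)
```

In fact H_A − H_B equals the difference of the two sides of the inequality, so the check holds for any choice of parameters, not just these.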

In order to select the segment with the smallest negative impact, in our HNR policy, each operator O^k_x is assigned a priority V^k_x, which is the weighted rate or normalized rate of the operator segment E^k_x that starts at operator O^k_x, and it is defined as:

V^k_x = (1 / T_k) × (S^k_x / C^k_x)    (4)

The term S^k_x / C^k_x is basically the global output rate (GR^k_x) of the operator segment starting at operator O^k_x as defined in [Urhan and Franklin 2001]. As such, the priority of each operator O^k_x is its normalized output rate, or equivalently, the normalized output rate of the operator segment E^k_x starting at O^k_x. Hence, executing O^k_x implies the pipelined execution of all the operators on E^k_x unless it is interrupted by a higher priority operator as we will describe in Section 6.


[Figure: output timelines for Example 1. (A) Optimizing for response time (HR): Q1's tuples are emitted at times 5, 10, and 15 (SD = 1, 2, 3) and Q2's accepted tuple at time 19 (SD = 9.5), with Q2's other two tuples discarded. (B) Optimizing for slowdown (HNR): Q2's accepted tuple is emitted at time 4 (SD = 2) and Q1's tuples at times 11, 16, and 21 (SD = 2.2, 3.2, 4.2). SD: slowdown.]

Fig. 2. The output of Example 1

    3.4 HNR vs. HR

It is interesting to notice that if the objective is optimizing the response time, then the ideal total processing cost T should be eliminated from the denominators of all the above equations, resulting in setting the priority V^k_x of operator O^k_x to:

V^k_x = S^k_x / C^k_x = GR^k_x    (5)

In fact, this is the prioritizing function we use in our Highest Rate (HR) policy for optimizing the response time, presented in Section 3.1.1. The HR policy schedules jobs in descending order of output rate, which might result in a high average slowdown, because a low-cost query can be assigned a low priority since it is not productive enough. Those few tuples produced by this query will all experience a high slowdown, with a corresponding increase in the average slowdown of the DSMS.

Our policy HNR, like HR, is based on output rate; however, it also emphasizes the ideal tuple processing time in assigning priorities. As such, an inexpensive operator segment with low productivity will get a higher priority under HNR than under HR.

Example 1. To further illustrate the difference between the HR and the HNR policies, let us consider an example where we have two queries Q1 and Q2. Each query consists of a single operator. For Q1, the cost of the operator is 5 ms and its selectivity is 1.0. For Q2, the cost of the operator is 2 ms and its selectivity is 0.33. Further, assume that there are 3 pending tuples to be processed by the 2 queries and that all 3 tuples have arrived at time 0.

Under the HR policy, Q1's priority is 1.0/5.0 = 0.2, whereas Q2's priority is 0.33/2.0 = 0.1667 (which is the output rate of each query). Figure 2(A) shows the queries' output under the HR policy, where Q1 is executed first and it accepted/emitted


all the pending 3 tuples, then Q2 is executed and it only accepted one of the 3 pending tuples (since its selectivity is 0.33); we assume it was the middle one in this example.

Under the HNR policy, Q1's priority is 1.0/(5.0 × 5.0) = 0.04, whereas Q2's priority is 0.33/(2.0 × 2.0) = 0.08. Hence, under HNR, Q2 is scheduled before Q1, resulting in the output shown in Figure 2(B).

Policy    Response Time    Slowdown
HR        12.25            3.875
HNR       13.0             2.9

Table II. Results of Example 1

Table II summarizes the results of the two different policies and shows that HNR provides the lower average slowdown compared to HR. The reason is that the one tuple accepted by Q2 experienced a slowdown of 4/2 = 2.0 under HNR, while its slowdown under HR is 19/2 = 9.5. This unfairness of HR toward Q2 resulted in a higher overall average slowdown compared to HNR.

    3.5 HNR vs. HR vs. SRPT

It should be clear that under HR, if all the operators' selectivities are equal to one, then Equation 5 is simply the inverse of the processing time. Hence, in this case, HR is equivalent to SRPT. Similarly, if all the operators' selectivities are equal to one, then in Equation 4, C^k_x is equal to T_k and O^i_x is executed before O^j_y if 1/(T_i)^2 > 1/(T_j)^2. By taking the square root of both sides, HNR is also equivalent to SRPT.

The above observation shows the effect of the selectivity parameter on this problem. That is, under a probabilistic workload, HR reduces the response time, whereas HNR reduces the slowdown. However, as the workload becomes deterministic, both HR and HNR converge to a single policy, SRPT, which has been shown to be optimal for task scheduling when looking at response time and near optimal when looking at slowdown.
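The convergence to SRPT under unit selectivities can be checked directly (the costs below are hypothetical):

```python
# With every selectivity equal to 1, S = 1 and C = T for a full segment, so
# HR priority = 1/T and HNR priority = 1/T^2: both rank queries as SRPT does
# (shortest ideal processing time first).
costs = [5.0, 2.0, 9.0, 4.0]  # hypothetical ideal processing times T
srpt = sorted(range(4), key=lambda i: costs[i])
hr   = sorted(range(4), key=lambda i: 1.0 / costs[i], reverse=True)
hnr  = sorted(range(4), key=lambda i: 1.0 / costs[i] ** 2, reverse=True)
assert srpt == hr == hnr
```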

    4. AVERAGE-CASE VS. WORST-CASE PERFORMANCE

Here, we first define the worst-case performance and a policy that minimizes it. Then, we introduce our scheduling policy for balancing the trade-off between the average- and worst-case performance.

4.1 Worst-case Performance

It is expected that a scheduling policy that strives to minimize the average-case performance might lead to a poor worst-case performance under a relatively high load. That is, some queries (or tuples) might starve under such a policy. The worst-case performance is typically measured using maximum response time or maximum slowdown [Bender et al. 1998].

Definition 3. The maximum response time for N tuples is max(R_1, R_2, ..., R_N).


    Definition 4. The maximum slowdown for N tuples is max(H1, H2, ..., HN ).

Intuitively, a policy that optimizes for the worst-case performance should be pessimistic. That is, it assumes the worst-case scenario where each processed tuple will satisfy all the filters in the corresponding query. An example of such a policy is the traditional First-Come-First-Serve (FCFS), which has been shown to optimize the maximum response time metric in [Bender et al. 1998]. Similarly, the traditional Longest Stretch First (LSF) [Acharya and Muthukrishnan 1998] has been shown to optimize the maximum slowdown. Under LSF, each operator O^k_x is assigned a priority V^k_x which is computed as:

V^k_x = W^k_x / T_k    (6)

where W^k_x is the wait time of the tuple at the head of O^k_x's input queue and T_k is the ideal processing cost for that tuple.

LSF is a greedy policy under which the priority assigned to an operator O^k_x is basically the current slowdown of the tuple at the top of O^k_x's input queue; the current slowdown of a tuple is the ratio of the time the tuple has been in the system thus far to its processing time.
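Equation 6 in code (the wait times below are hypothetical): with equal waits, the tuple of the cheaper query is the more "stretched" one and is served first:

```python
def lsf_priority(wait, ideal_cost):
    # LSF: the priority is the current slowdown of the queued tuple (Equation 6)
    return wait / ideal_cost

# Two tuples that have waited 40 ms each; the cheap query's tuple wins
assert lsf_priority(40.0, 2.0) > lsf_priority(40.0, 10.0)
```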

    4.2 Balancing the Trade-off between Average-case and Worst-case Performance

A policy that strikes a fine balance between the average-case and worst-case performance needs a metric that is able to capture this trade-off. In this section, we first present such a metric, and then describe our proposed scheduling policy which optimizes that metric.

4.2.1 The ℓ2 Norm Metric. On one hand, the average value for a QoS metric provided by the system represents the expected QoS experienced by any tuple in the system (i.e., the average-case performance). On the other hand, the maximum value measures the worst QoS experienced by some tuple in the system (i.e., the worst-case performance). It is known that each of these metrics by itself is not enough to fully characterize system performance.

To get a better understanding of system performance, we need to look at both metrics together or, alternatively, we can use a single metric that captures both of them. The most common way to capture the trade-off between the average-case and the worst-case performance is to measure the ℓ2 norm [Bansal and Pruhs 2003]. Specifically, the ℓ2 norm of response times, R_i, is defined as:

Definition 5. The ℓ2 norm of response times for N tuples is equal to √(Σ_{i=1}^{N} R_i^2).

The definition shows how the ℓ2 norm considers the average in the sense that it takes into account all values; yet, by considering the second norm of each value instead of the first norm, it penalizes outliers more severely than the average slowdown metric does.

Similarly, the ℓ2 norm of slowdowns, H_i, is defined as:

Definition 6. The ℓ2 norm of slowdowns for N tuples is equal to √( Σ_{i=1}^{N} H_i^2 ).
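The two definitions above compute directly; a minimal sketch (illustrative names) also shows why the ℓ2 norm separates two workloads that the plain average cannot:

```python
import math

def l2_norm(values):
    """l2 norm of a sequence of response times or slowdowns: the square
    root of the sum of squares. Outliers are penalized quadratically, so
    the metric reflects both average- and worst-case behavior."""
    return math.sqrt(sum(v * v for v in values))

# Two workloads with the same average (5.0) but different worst cases:
balanced = [5.0, 5.0, 5.0, 5.0]   # l2 = 10.0
skewed   = [1.0, 1.0, 1.0, 17.0]  # l2 ~= 17.09, exposing the outlier
```

The average cannot distinguish the two workloads, while the ℓ2 norm ranks the skewed one as clearly worse.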

In the following sections, we present our policies for balancing the trade-off between the average and worst cases.

    ACM Transactions on Database Systems, Vol. V, No. N, November 2007.

  • 12 Mohamed A. Sharaf et al.

4.2.2 Balancing the Trade-off for Slowdown. Our proposed HNR policy is still biased toward certain classes of queries. These classes are:

    (1) Queries with high productivity; and/or

    (2) Queries with low processing cost.

For example, under HNR, a query with high cost and low productivity comes at the bottom of the priority list. When the system is overloaded, such a low-priority query will starve waiting for execution. This behavior may be viewed as unfair, as it yields a system with a high value for the maximum slowdown metric. The LSF policy, on the other hand, avoids the starvation of tuples, yet yields a poor average-case performance.

In order to balance the trade-off between the average- and worst-case performance, we propose a new scheduling policy that minimizes the ℓ2 norm of slowdowns. We call this new policy Balance Slowdown (BSD). To understand the intuition underlying BSD, we use the same technique as in the previous section, but with the objective of minimizing the ℓ2 norm of slowdowns.

Specifically, consider a policy A where operator segment E_x^i is executed before operator segment E_y^j. The ℓ2 norm of slowdowns of tuples produced under this policy is:

    L_A = √( S_x^i (H_{A,i})^2 + S_y^j (H_{A,j})^2 )

where S_x^i, H_{A,i}, S_y^j, and H_{A,j} are calculated as in Section 3. Similarly, we can compute L_B, which is the ℓ2 norm of slowdowns of tuples produced under policy B. In order for L_A to be less than L_B, the following inequality must be satisfied:

    S_y^j / (C_y^j (T^j)^2) (2W_y^j + 2T^j + C_x^i)  <  S_x^i / (C_x^i (T^i)^2) (2W_x^i + 2T^i + C_y^j)

Fig. 4. An example that illustrates the different implementation techniques

is executed first, then at the next scheduling point, tuple t3 would be the one associated with C_x. However, if the pair is executed first, then at the next scheduling point, t1 would be associated with the next cluster in the cluster list, or it would be eliminated from the queue if it has been processed by all clusters.

    6.3 Adaptive Scheduling

It should be clear that the success of any of the scheduling policies proposed above relies heavily on the DSMS being able to estimate the processing time and the selectivity parameters for each operator. This enables the scheduler to compute the right priority for each query, which in turn leads to optimizing the desired QoS metric.

The first of these parameters (i.e., the processing time of an operator) is fairly static; it can be estimated once when a CQ is registered and used throughout the lifetime of the CQ. However, the selectivity parameter of an operator can be very dynamic, as it depends on the data distribution in the input data stream, which may vary significantly over time. For example, in an environment monitoring application, a filter like "where temp < 40F" will have higher selectivity during the night than during the day (at least in some parts of the world, including Pittsburgh!).

To circumvent this problem of dynamic selectivity, we propose an adaptive scheduling mechanism that enables our proposed policies to deliver the expected performance at all times. Under this mechanism, the DSMS continuously monitors the execution of queries and updates the current priorities of queries based on the new estimates.

Specifically, the DSMS monitors the input and output of query operators over a time window and updates the selectivity of each operator at the end of the window. If the new selectivity is different from the old one, then the operator is assigned a new priority based on the new selectivity. The new selectivity S_new assigned to an


    operator is basically computed as follows:

    S_new = (1 − α) · S_old + α · (N_O / N_I)

where S_old is the current selectivity of the operator, and N_O and N_I are, respectively, the number of output and input tuples of the operator during the window interval. Finally, α is an aging parameter that determines how much weight is assigned to the newly observed selectivity relative to the selectivity currently assigned to the operator.

For instance, if α is set to 0, then the selectivity is never updated and the system is static. On the contrary, if α is set to 1, then the system always ignores the past and the new selectivity is simply the one observed during the last window. This might lead to a very unstable system, especially with a short monitoring window. Hence, a value of α greater than 0 and less than 1 should allow for a stable and adaptive system. In fact, we found that setting α to 0.175, the same value used in network congestion control mechanisms [Jacobson 1988], provides the best performance.
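The update rule above can be sketched in a few lines (the function name and the zero-input guard are our own additions):

```python
def update_selectivity(s_old, n_out, n_in, alpha=0.175):
    """Aging-based selectivity re-estimation over one monitoring window:
    S_new = (1 - alpha) * S_old + alpha * (N_O / N_I).
    alpha = 0 freezes the estimate; alpha = 1 keeps only the last window."""
    if n_in == 0:          # no input this window: keep the old estimate
        return s_old
    return (1 - alpha) * s_old + alpha * (n_out / n_in)
```

Calling this once per monitoring window, per operator, yields the exponentially aged estimate the scheduler feeds into the priority formulas.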

Notice that our mechanism for monitoring and adaptation is very similar to the ticket scheme used in eddies-based query processing [Madden et al. 2002]. However, the ticket scheme is used for routing tuples between operators, rather than for scheduling the execution of multiple continuous queries. Specifically, the ticket scheme provides dynamic query plans that can adapt to changes in workload.

Under the ticket scheme, a lottery scheduling mechanism [Waldspurger and Weihl 1994] is used where the eddy gives a ticket to an operator whenever it consumes a tuple and takes a ticket away whenever it sends a tuple back to the eddy for further processing. To choose an operator to which a new tuple should be routed, a lottery is conducted between operators, with the chances of a particular operator winning being proportional to the number of tickets it has acquired. On the one hand, the ticket scheme gives higher priority to operators with low selectivity, which is beneficial for query plan optimization. On the other hand, our proposed policies generally give higher priority to operators with high selectivity, which is beneficial for scheduling multiple queries for improved online performance.
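A lottery draw of this kind can be sketched as follows (illustrative names; the eddy's ticket bookkeeping is omitted):

```python
import random

def lottery_pick(tickets, rng=random):
    """Pick an operator index with probability proportional to its ticket
    count, as in lottery scheduling (a sketch)."""
    total = sum(tickets)
    draw = rng.uniform(0, total)
    acc = 0.0
    for i, t in enumerate(tickets):
        acc += t
        if draw <= acc:
            return i
    return len(tickets) - 1   # guard against floating-point rounding
```

Over many draws, an operator holding 90% of the tickets wins roughly 90% of the lotteries.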

    7. OPERATOR SHARING

Operator sharing eliminates the repetition of similar operations in different queries. Hence, a multi-query scheduler should exploit shared operators for further optimization. In this section, we show how to set the priority of a shared operator under our proposed policies.

First, let us consider a set of operator segments SE_x in which operator O_x is shared among multiple operator segments E_x^1, E_x^2, ..., E_x^N (Figure 5), where for each segment E_x^i we can compute the selectivity S_x^i and the average cost C_x^i.

Further, assume that the cost of the shared operator O_x is c_x and that SC_x is the average cost of executing the set of segments SE_x. Intuitively, SC_x is equal to the total average cost of executing the N segments with the cost of the shared operator O_x counted only once. Formally, the average cost SC_x of N paths sharing


Fig. 5. Multiple CQ plans sharing operator O_x

an operator O_x is:

    SC_x = Σ_{i=1}^{N} C_x^i − Σ_{i=1}^{N−1} c_x

where C_x^i is the average cost of segment E_x^i and c_x is the cost of the shared operator O_x.
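The shared-set cost computes directly from the formula above; the helper below is an illustrative sketch, not code from the paper:

```python
def shared_set_cost(segment_costs, shared_op_cost):
    """Average cost SC_x of N segments sharing operator O_x: the total
    segment cost with the shared operator's cost c_x counted only once,
    i.e. sum(C_x^i) - (N - 1) * c_x."""
    n = len(segment_costs)
    return sum(segment_costs) - (n - 1) * shared_op_cost
```

For three segments of costs 5, 6, and 7 sharing an operator of cost 2, the set cost is 18 − 2·2 = 14.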

    7.1 HNR with Operator Sharing

In this section, we describe the general method for setting the priority of a shared operator under HNR. In the next section, we describe the particular details of this method. Note that the BSD policy can be extended in the same way; however, the details are omitted for brevity.

To set the priority of a shared operator under the HNR policy, consider two sets of operator segments SE_p and SE_q, where SE_p = {E_p^1, ..., E_p^N} shares operator O_p and SE_q = {E_q^1, ..., E_q^M} shares operator O_q. For now, assume that if a set of segments is scheduled, then all the segments within that set are executed.

To measure the impact of executing one set on the other, we use the same concept as in the definition of Inequality 3. Basically, we measure the increase in slowdown incurred by the tuples produced from one set if the other set is scheduled for execution first. Hence, if the set of segments SE_p is executed first, then the increase in slowdown incurred by tuples from SE_q is computed as follows:

    ΔH_q = S_q^1 (SC_p / T_{q,1}) + S_q^2 (SC_p / T_{q,2}) + ... + S_q^M (SC_p / T_{q,M})

where SC_p is the amount of time that set SE_q will spend waiting for set SE_p to finish execution, and T_{q,i} is the ideal total processing time for the tuples processed by E_q^i.

Similarly, we can compute ΔH_p, which is the increase in slowdown incurred by tuples from SE_p. In order for ΔH_q to be less than ΔH_p, the following inequality must be satisfied:

    SC_p Σ_{i=1}^{M} (S_q^i / T_{q,i})  <  SC_q Σ_{i=1}^{N} (S_p^i / T_{p,i})


Hence, the priority of a set of operator segments SE_x that consists of N segments sharing a common operator O_x is:

    V_x = ( Σ_{i=1}^{N} S_x^i / T_{x,i} ) / SC_x        (9)

    7.2 Priority-Defining Tree (PDT)

Setting the priority of a shared operator using all the N segments in a set is only beneficial if it maximizes the value of Equation 9. However, that is not always the case, because Equation 9 is not monotonically increasing: adding a new segment to the equation might increase or decrease its value.

We definitely want to boost the priority of a shared operator; however, we do not want segments with a low normalized rate to hurt those with a high normalized rate by bringing down the overall priority of the shared operator. As such, we select from each set what we call a Priority-Defining Tree (PDT), which is the subset of segments that maximizes the aggregated value of the priority function. Hence, the priority of a shared operator is basically the priority of its PDT, and once a shared operator is scheduled, the segments in the PDT are executed as one unit (unless it is preempted).

In order to compute the priority value V_x for operator O_x, we sort the segments according to their priority. Then, we visit the segments in descending order of priority, and only add a segment to the priority-defining tree of O_x (PDT_x) if it increases the aggregate priority value; otherwise we stop, and the shared operator O_x is assigned that aggregate priority value. Hence, for an operator O_x shared between N segments, with a PDT_x that is composed of m segments where m ≤ N, the priority of O_x under the HNR policy is defined as:

    V_x = ( Σ_{i=1}^{m} S_x^i / T_{x,i} ) / ( Σ_{i=1}^{m} C_x^i − Σ_{i=1}^{m−1} c_x )

If m = N, that is, if the PDT consists of all the segments sharing O_x, then V_x is equal to the global normalized rate as defined in Equation 9.

Any operator segment E_x^i that does not belong to PDT_x can be viewed as two components: O_x and L_x^i (as shown in Figure 5). Executing PDT_x will naturally lead to executing the O_x component of E_x^i. Scheduling L_x^i for execution depends on its priority, which is computed in the normal way using its normalized rate as in Section 3. Hence, for example, in a query-level implementation of the HNR scheduler, the priority list will contain all the leaf operators in addition to the first operator in each segment that does not belong to any PDT.

    8. EVALUATION TESTBED

To evaluate the performance of the algorithms proposed in this paper, we created a DSMS simulator with the following properties.

Queries: We simulated a DSMS with 500 registered continuous queries. The structure of each query is the same as in [Chen et al. 2002; Madden et al. 2002], where each query consists of three operators: select, join, and project. For the


experiments on single-stream queries, we assume a join with a stored relation; for multi-stream queries, we use a window join between data streams.

Streams: We used the LBL-PKT-4 trace from the Internet Traffic Archive(1) as our input stream. The trace contains an hour's worth of wide-area traffic between the Lawrence Berkeley Laboratory and the rest of the world. This trace gives us a realistic data arrival pattern with On/Off traffic, which is typical of many applications.

Selectivities: In order to control the selectivity, we added two extra attributes to each packet in the trace and assigned each attribute a uniform value in the range [1, 100]. The selectivity of the select and join operators is then uniformly assigned in the range [0.1, 1.0] using predicates defined on the introduced attributes. Since the performance of a policy depends on its behavior toward different classes of queries, where a query class is defined by its global selectivity and cost, we chose to use the same selectivity for operators that belong to the same query. This enables us to control the creation of classes in a uniform distribution, to better understand the behavior of each policy (e.g., Figure 13).

Costs: Similar to selectivity, operators that belong to the same query have the same cost, which is uniformly selected from five possible classes of costs. The cost of an operator in class i is equal to K × 2^i time units, where i ∈ [0, 4] and K is a scaling factor used to scale the costs of operators to meet the simulated utilization (or load). Specifically, we measure the average inter-arrival time of the data trace, then set K so that the ratio between the total expected cost of the queries and the inter-arrival time is equal to the simulated utilization.
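Under these assumptions, the scaling factor K can be sketched as follows (illustrative names; query costs are expressed in units of K):

```python
def cost_scale_factor(mean_interarrival, query_cost_units, utilization):
    """Scaling factor K (a sketch): chosen so that the total expected query
    cost, K * sum(query_cost_units), divided by the mean tuple inter-arrival
    time equals the target utilization."""
    return utilization * mean_interarrival / sum(query_cost_units)
```

For example, with a mean inter-arrival time of 10 time units, total query cost of 5 units of K, and a target utilization of 0.5, K = 1.0.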

Policies: We compared the performance of our proposed policies to the two-level scheduling scheme from Aurora, where Round-Robin (RR) is used to schedule queries and Rate-based (RB) is used to schedule operators within a query. Collectively, we refer to the Aurora scheme in our experiments as RR.

We also considered the SRPT policy, where the priority of an operator segment is inversely proportional to its total ideal processing time, as well as the Chain scheduling policy [Babcock et al. 2003], which minimizes memory usage.

    Here is a list of the rest of the policies considered in our experiments:

FCFS: First Come First Served policy for minimizing maximum response time (Section 4.1).

LSF: Longest Stretch First policy for minimizing maximum slowdown (Section 4.1).

HR: Highest Rate policy for minimizing average response time (Section 3.1.1).

HNR: Highest Normalized Rate policy for minimizing average slowdown (Section 3.3).

BRT: Balance Response Time policy for minimizing the ℓ2 norm of response times (Section 4.3).

BSD: Balance Slowdown policy for minimizing the ℓ2 norm of slowdowns (Section 4.2.2).


Parameter              Value
Base-case policies     RR, SRPT, Chain
Adopted policies       FCFS, LSF, HR, HNR, BRT, BSD
Queries                500 3-operator queries
Operator cost          K × 2^0 to K × 2^4 secs
Operator selectivity   0.1 to 1.0
Window interval        1 to 10 secs
System utilization     0.1 to 0.99

Table III. Simulation Parameters

Fig. 6. Avg. slowdown vs. system load

Table III summarizes the simulation parameters described above.

    9. EXPERIMENTS

In this section, we present the performance of our proposed policies under the different QoS metrics. We also present results on the implementation of the BSD policy, as well as the performance of the PDT strategy for scheduling shared operators.

    9.1 Performance under Different Metrics

We first examine the performance of our proposed policies under each of the QoS metrics individually.

9.1.1 Average Slowdown. Figure 6 shows how average slowdown increases with utilization. Clearly, HNR, our proposed policy, provides the lowest slowdown, followed by HR. For instance, at 0.7 utilization, the slowdown provided by HNR is 74% lower than that of RR, 51% lower than that of SRPT, and 18% lower than that of HR. At 0.97 utilization, HNR is 75% lower than RR, 53% lower than SRPT, and 20% lower than HR.

9.1.2 Average Response Time. As expected, this improvement in slowdown by HNR would lead to an increase in response time compared to HR, as shown in Figure 7. For instance, at 0.7 utilization, HNR's response time is 4% higher than HR's, and it is 7% higher at 0.97 utilization.

Fig. 7. Avg. response vs. system load

Fig. 8. Max. response time vs. system load

(1) http://ita.ee.lbl.gov/html/contrib/LBL-PKT.html

9.1.3 Maximum Response Time. In terms of worst-case performance, Figure 8 shows that FCFS provides the lowest maximum response time, which is 75% lower than HR's at 0.97 utilization. However, that improvement comes at the expense of poor average-case performance, as shown in Figure 7, where the average response time provided by FCFS is 630% that of HR.

9.1.4 Maximum Slowdown. Similar to FCFS, Figure 9 shows that LSF reduces the maximum slowdown by 80% compared to HNR. However, that improvement comes at the expense of poor average-case performance (as depicted in Figure 10).

9.1.5 Trade-off in Slowdown. Figure 10 shows that BSD can strike a fine balance between average slowdown and maximum slowdown. For instance, as shown in Figure 10, at 0.95 utilization, BSD decreases the maximum slowdown by 44% compared to HNR while decreasing the average slowdown by 80% compared to LSF under the same utilization.

Fig. 9. Max. slowdown vs. system load

Fig. 10. Max. vs Avg. Slowdown for HNR, LSF, SRPT, and BSD

9.1.6 ℓ2 norm of Slowdowns. As mentioned above, the trade-off between average and maximum slowdowns is easily captured using the ℓ2 metric. Figure 11 shows the ℓ2 norm of slowdowns as the utilization of the system increases. The figure shows that BSD reduces the ℓ2 norm of slowdowns by up to 57% compared to LSF and by 24% compared to HNR.

9.1.7 ℓ2 norm of Response Times. Similar to BSD, BRT reduces the ℓ2 norm of response times by up to 51% compared to FCFS and by 23% compared to HR, as shown in Figure 12.

9.1.8 Slowdown per Class. To gain better insight into the behavior of the different policies toward different classes of queries, we split the workload into distinct classes (as suggested in [Acharya and Muthukrishnan 1998]). Tuples belong to the same class if they were processed by operators with similar costs and selectivities. In Figure 13, we show the slowdown of tuples processed by the class of low-cost


Fig. 11. ℓ2 of slowdowns vs. system load

Fig. 12. ℓ2 of response times vs. system load

queries (i.e., queries where the operator cost is K × 2^0) and different selectivities. The figure shows how HR is unfair toward the low-selectivity queries, which leads to a significant increase in the slowdown of the tuples processed by those queries. HNR is still biased toward high-selectivity queries, yet less so than HR. Similarly, BSD is less biased than HNR. That balance allows BSD to provide the best ℓ2 norm of slowdowns, as shown in Fig. 11.

9.1.9 Performance over Time. Figures 14, 15, 16, and 17 show the performance of the different scheduling policies over simulation time in the interval from 10^8 to 4 × 10^8 sec. The figures show that our proposed policies provide the best performance over time for each of the optimization metrics, especially at peak times where traffic is more bursty.

Fig. 13. Slowdown per class for low-cost queries

Fig. 14. Response time over time

Fig. 15. Slowdown over time

Fig. 16. ℓ2 of response times over time

Fig. 17. ℓ2 of slowdowns over time

9.1.10 Impact of Selectivity. To further study the impact of selectivity, we conducted an experiment where we assigned the same cost to all operators while varying the maximum value of selectivity assigned to an operator. For instance, if the maximum selectivity is set to 1.0, then the selectivity value assigned to an operator is uniformly distributed in the range [0.1, 1.0], whereas if the maximum is 0.5, then the selectivity value assigned to an operator is uniformly distributed in the range [0.1, 0.5], etc.

Fig. 18. ℓ2 of slowdown vs. maximum operator selectivity

Figure 18 shows the ℓ2 norm of slowdowns for the setting described above. The figure shows that when the maximum selectivity is 0.1, HR, HNR, and BSD provide almost the same performance, since all operators have the same selectivity of 0.1. As the maximum value of selectivity increases, HR favors queries with higher selectivity over those with lower selectivity, resulting in a high ℓ2 norm of slowdowns compared to HNR and BSD. The figure shows that BSD always provides the best performance, since it considers both the ideal processing time of a query and the age of its pending tuples. For instance, when the maximum selectivity is 0.5, BSD reduces the ℓ2 norm of slowdowns by 44% compared to HR and by 19% compared to HNR; at a maximum of 1.0, the ℓ2 norm is reduced by 61% compared to HR and by 27% compared to HNR.

9.1.11 An Oracle Scheduling Policy. In order to validate our general strategy of using output rate (or normalized output rate) for multiple CQ scheduling, we introduce what we call an "oracle" scheduling strategy. The oracle strategy has the ability to peek into an input tuple and determine whether it will generate an output event or be discarded.

Clearly, the oracle strategy is not implementable in a real system, as it requires processing the input stream (in the same way continuous queries do) before deciding what to schedule. As such, we introduce this strategy only for the sake of comparison, since it takes the guesswork out of the scheduling decision. Specifically, at each scheduling point, the oracle strategy is able to compute the exact output rate of a query, as opposed to its expected output rate as computed by our proposed policies.

As an example, consider a continuous query Q with 5 pending tuples, where only the 5th one is an event. Under regular scheduling, each of the 5 inputs is an event with probability S (the selectivity of the query), whereas under the oracle strategy, the 5th tuple is an event with probability 1.0 and the other tuples are known to be discarded. Given this information, the oracle can compute the instantaneous output rate of Q as 1.0 (the number of tuples produced) divided by the amount of time needed to process the 5 pending tuples.
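The example above can be sketched as follows (illustrative names; a uniform per-tuple cost is assumed):

```python
def expected_output_rate(selectivity, per_tuple_cost, n_pending):
    """Regular policy: each pending tuple is an event with probability S,
    so the expected rate is (S * n) / (cost * n) = S / cost."""
    return (selectivity * n_pending) / (per_tuple_cost * n_pending)

def oracle_output_rate(is_event, per_tuple_cost):
    """Oracle: the exact number of events among the pending tuples,
    divided by the time needed to process all of them (a sketch)."""
    return sum(is_event) / (per_tuple_cost * len(is_event))
```

With 5 pending tuples of cost 2 each and only the 5th an event, the oracle rate is 1/10; a regular policy with S = 0.2 happens to expect the same rate, but its guess would be off whenever the exhibited selectivity deviates from 0.2.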

Fig. 19. Performance of an oracle scheduling policy compared to regular scheduling

Clearly, the oracle strategy has the advantage of not relying on selectivity estimation in making the scheduling decision. This is especially beneficial when the expected selectivity deviates significantly from the exhibited one. This is illustrated in Figure 19, where we plot the ratio in performance between regular policies (HR and HNR) and oracle policies (HR-O and HNR-O) while increasing the maximum selectivity in the system (as in the previous experimental setup). The figure shows that at low maximum selectivity (i.e., 0.1), the oracle can improve the performance by up to 80%. As the maximum selectivity increases, the gains decrease, dropping to 12% when the maximum selectivity is 1.0. The reason is that at low selectivity there is a higher chance that an input tuple is not an event; only the oracle knows accurately whether it is an event, which allows it to make a better decision. As the maximum selectivity increases, there are more queries in the system with high selectivity, which means there is a higher chance that a regular policy's guess about a tuple being an event is correct. This brings the performance of the regular policies close to the oracle at higher values of maximum selectivity.

Given the above comparisons, it is clear that, in general, using variants of output rate is the right strategy to schedule CQs. However, the exhibited gains in performance depend on the accuracy in computing the rate, as illustrated in Figure 19.

9.1.12 ℓ2 norm for Multi-stream Queries. BSD also provides the lowest ℓ2 norm of slowdowns for multi-stream queries, as shown in Figure 20. In this experimental setting, we generated a workload where queries receive input tuples from 2 data streams, generated following Poisson arrivals. In this workload, the costs and selectivities of the operators are assigned uniformly as before, and the windows are in the range of 1 to 10 secs. Figure 20 shows that BSD improves the ℓ2 norm by up to 14% compared to HNR.

It is also interesting to notice the large improvement offered by BSD over policies like RR and FCFS. For instance, at 0.9 utilization, BSD improves the performance by 17 times compared to RR and by 15 times compared to FCFS. The reason is that RR and FCFS do not exploit selectivity, which plays a more significant role in the case of multi-stream queries, where the selectivity of the join operator often exceeds 1.0.

Fig. 20. ℓ2 of slowdown for multi-stream queries

    9.2 Memory Usage

Besides CPU, memory is another resource that needs to be considered in a DSMS. For this reason, we also studied the memory requirements of each of the proposed scheduling policies. Figure 21 shows the average memory usage of our proposed policies, along with that of the Chain policy [Babcock et al. 2003], which was designed to minimize memory usage; we include Chain as a yardstick for comparison.

Figure 21 shows how the policies that optimize for slowdown (i.e., HNR and BSD) reduce memory usage compared to those that optimize for response time (i.e., HR and BRT). For instance, HNR reduces memory usage by up to 22% compared to HR. Both HNR and HR give higher priority to queries with low processing cost. Similarly, such queries are given higher priority under policies that optimize for memory usage, like Chain, since tuples that belong to them spend a short time in memory. However, when it comes to selectivity, Chain gives higher priority to queries with low selectivities, or in other words, to queries whose input tuples have a higher chance of being discarded, since discarding them saves memory space. On the contrary, HR gives those queries low priority, since they will not generate results. Meanwhile, since HNR emphasizes processing cost, it boosts the priority value of a low-selectivity query with low processing cost. Hence, HNR schedules such queries earlier and saves memory space.

Figure 21 also shows how the BRT and BSD policies provide further savings in memory usage. The reason is that under such policies, if a low-selectivity query has been waiting for a long time, its priority increases until it is eventually executed. For instance, BSD decreases memory usage by up to 13% compared to HNR.

In order to put these results into the proper perspective, we also compared the performance of Chain to our proposed policies under the different QoS metrics that we have studied in previous experiments. Figure 22 shows that Chain consistently suffers under all of the QoS metrics studied in this paper. For example, at utilization 0.97, Chain provides 3 times the average slowdown of HNR, which needs only 2 times the memory of Chain. Similarly, at utilization 0.97, Chain increases the ℓ2 norm of slowdowns by 2.2 times compared to BSD, although BSD requires memory space that is only 1.85 times that of Chain. Thus, BSD is also able to strike a fine balance, improving the interactive performance within acceptable memory requirements.

Fig. 21. Memory usage vs. system load

Fig. 22. Performance of Chain under QoS metrics

    9.3 Comparison of Implementation Techniques

    To evaluate the impact of the implementation techniques proposed in Section 6, we compared the performance of four policies: HNR, BSD-Hypothetical, BSD-Uniform, and BSD-Logarithmic, which are defined as follows:

    BSD-Hypothetical is a version of BSD where we ignore the scheduling overheads.

    BSD-Uniform uses uniform clustering as in [Carney et al. 2003].

    BSD-Logarithmic uses our proposed logarithmic clustering (described in Section 6).

    In both BSD-Uniform and BSD-Logarithmic, we set the cost of each of the priority computation and comparison operations to the cost of the cheapest operator in the query plans.

    Fig. 23. `2 norm of slowdowns vs. number of clusters

    Fig. 24. Efficient implementation of BSD: `2 norm of slowdowns of BSD-Logarithmic relative to BSD-Hypothetical (None: 6470%; +Clustering: 1550%; +Pruning: 224%; +Clustered Processing: 105%)

    Figure 23 shows the `2 norm of slowdowns provided by the four policies vs. the number of clusters (i.e., m) at 0.95 utilization. The figure shows that for BSD-Logarithmic, when m is small (≤ 6), its `2 norm might exceed that of HNR, because the priority range covered by each cluster is large, which decreases the accuracy of the scheduling. However, as we increase m, its performance gets closer to that of BSD-Hypothetical, such that at 12 clusters, its `2 norm is only 5% higher than that of BSD-Hypothetical. Increasing m beyond 12 makes its `2 norm increase again, due to the larger search space. On the other hand, BSD-Uniform starts at a very high `2 norm, which decreases only slowly with increasing m. That is, the accuracy of the solution is very poor when the number of clusters is small (i.e., each cluster range is large). As such, BSD-Uniform starts to provide acceptable performance (10% higher than BSD-Hypothetical) only when the cluster range is very small (notice that in this setting this requires on the order of 1.2e+05 clusters).
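    The two clustering schemes can be sketched as follows. This is a simplified illustration under our own assumptions (a known priority range [p_min, p_max]; the function names are not from the paper): uniform clustering splits the range into m equal-width buckets, while logarithmic clustering lets bucket widths grow geometrically, so a small m already gives fine resolution among the small priority values.

```python
import math

def uniform_cluster(priority, p_min, p_max, m):
    """Map a priority to one of m clusters of equal width.
    Accuracy is poor unless the per-cluster range is very small,
    which requires a very large m."""
    width = (p_max - p_min) / m
    return min(int((priority - p_min) / width), m - 1)

def logarithmic_cluster(priority, p_min, p_max, m):
    """Map a priority to one of m clusters whose boundaries grow
    geometrically: cluster i covers [p_min * b^i, p_min * b^(i+1)).
    A small m (e.g., around 12) already separates priorities well
    near p_min, where scheduling accuracy matters most."""
    b = (p_max / p_min) ** (1.0 / m)  # common ratio of the boundaries
    return min(int(math.log(priority / p_min, b)), m - 1)
```

    A scheduler would then keep one queue per cluster and serve the non-empty cluster with the highest priority, trading a bounded loss of ordering accuracy for cheap insertions.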



    Fig. 25. Response for grouped queries

    Figure 24 shows the incremental gains provided by each of the proposed implementation techniques when using 12 logarithmic clusters. The figure shows that a naive implementation of BSD increases the `2 norm by 6470% compared to BSD-Hypothetical. By incrementally adding each of the implementation techniques, we achieve a performance that is only 5% higher than that of BSD-Hypothetical, i.e., the implementation overhead of the BSD policy is only 5%.

    9.4 Operator Sharing

    To measure the performance of the sharing-aware versions of HNR and BSD, we created a workload in which queries are grouped randomly into sets of 10 queries each, where all queries within a set share the same select operator.

    Figures 25, 26, and 27 show the performance of different scheduling policies under the response time, slowdown, and `2 norm of slowdowns metrics, respectively. In each figure, we compare the performance of three variants for implementing the same policy that optimizes the metric under investigation. These three variants are Max, Sum, and PDT, defined as follows:

    Max: where the shared operator's priority is equal to the priority of the one segment within the group that has the maximum priority.

    Sum: where the shared operator's priority is equal to the aggregation of the priorities of all the segments in the group.

    PDT: where the shared operator's priority is equal to the aggregation of the priorities of the segments in its priority-defining tree (as described in Section 7).
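    As a rough sketch of how the three variants would compute a shared operator's priority from the priorities of the segments that consume its output (a simplified model we assume here; the segment priorities and the PDT membership are taken as given, since computing them is described earlier in the paper):

```python
def shared_priority_max(segment_priorities):
    # Max: the shared operator inherits the single highest segment priority.
    return max(segment_priorities)

def shared_priority_sum(segment_priorities):
    # Sum: the shared operator aggregates the priorities of all segments
    # in the group, which can overstate the benefit of running it.
    return sum(segment_priorities)

def shared_priority_pdt(segment_priorities, in_pdt):
    # PDT: aggregate only the segments that belong to the operator's
    # priority-defining tree (in_pdt is a parallel list of booleans;
    # how that tree is chosen is described in Section 7 of the paper).
    return sum(p for p, keep in zip(segment_priorities, in_pdt) if keep)
```

    For example, with segment priorities [5, 3, 2] and only the first and third segments in the priority-defining tree, Max yields 5, Sum yields 10, and PDT yields 7.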

    The figures show that the PDT strategy significantly improves the performance of each scheduling policy. For example, Figure 25 shows that, compared to Max and Sum, PDT reduces the response time by 21% and 12%, respectively, whereas the reductions in slowdown are 24% and 18% (Figure 26) and, finally, the reductions in the `2 norm of slowdowns are 10% and 8% (Figure 27).



    Fig. 26. Slowdown for grouped queries


    Fig. 27. `2 of slowdowns for grouped queries

    9.5 Adaptive Scheduling

    In all the previous experiments, queries operated on data that was generated according to a uniform distribution in the range [1, 100]. In this experiment, we use a more dynamic setting to study the performance of our adaptive scheduling mechanism. Specifically, we divide the simulation time into 100 intervals, where the data in each interval is generated according to a Gaussian distribution that is specified by a mean and a standard deviation. The mean starts at 50.0 and is incremented by one with every new interval.

    The goal of this set of experiments is to study the behavior of the adaptive variants of our proposed policies; basically, this means that for the adaptive policies, selectivity is estimated dynamically, as described in Section 6.3.
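    Such a dynamic selectivity estimator can be sketched as follows. This is our own simplified formulation, not the exact formula of Section 6.3: selectivity is observed over a window of input tuples and smoothed with the aging parameter α, where α weights the old estimate, so that a high α yields an almost static estimator and a low α an unstable one.

```python
class SelectivityEstimator:
    """Maintains a per-operator selectivity estimate from observed
    input/output tuple counts, refreshed every `window` input tuples
    and aged with parameter alpha (alpha weights the old estimate)."""

    def __init__(self, window=100, alpha=0.175, initial=1.0):
        self.window = window
        self.alpha = alpha
        self.estimate = initial
        self.in_count = 0
        self.out_count = 0

    def observe(self, tuple_passed):
        """Record one input tuple; tuple_passed is True if the operator
        emitted it. Returns the current selectivity estimate."""
        self.in_count += 1
        self.out_count += tuple_passed
        if self.in_count == self.window:
            observed = self.out_count / self.in_count
            # Exponential aging: the old estimate keeps weight alpha.
            self.estimate = (self.alpha * self.estimate
                             + (1 - self.alpha) * observed)
            self.in_count = self.out_count = 0
        return self.estimate
```

    The scheduler would recompute operator priorities from the refreshed estimates at the end of each window.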

    Figure 28 shows the ratio between the performance of the adaptive and the non-adaptive versions of each policy under the metric optimized by that policy. For instance, it compares the performance of the adaptive HR (i.e., HR-A) to the non-adaptive HR under the response time metric. For example, a value of 20% for HR-A vs. HR means that HR-A's response time is 20% of that of HR. The non-adaptive HR assumes that data is uniformly distributed, whereas HR-A monitors the data distribution and adjusts the operators' selectivities and priorities accordingly.

    Fig. 28. Ratio of the adaptive scheduler performance vs. the static one for different metrics

    Figure 28 shows that the adaptive versions of all policies always outperform the non-adaptive ones, especially at low values of standard deviation, where the distribution is highly skewed within each interval. For instance, at a standard deviation of 25, HR-A's response time is 74% of HR's and HNR-A's slowdown is 76% of HNR's, whereas at a standard deviation of 5, these values are 10% and 15%, respectively.

    Figure 28 also shows that the relative gain provided by HNR-A is lower than that provided by HR-A. This is because HNR uses the ideal processing time in its prioritizing function; this makes its non-adaptive version less sensitive to fluctuations in selectivity. Similarly, the relative gains provided by BRT-A and BSD-A are lower than that of HR-A, since both BRT and BSD use the wait time in their prioritizing functions.

    Obviously, the improvement in performance provided by adaptive scheduling depends on the choice of values for the monitoring window length and the aging parameter α. In the results shown in Figure 28, we selected a window of length 100 input tuples and a value of α equal to 0.175. In order to choose these specific values, we explored the combinatorial search space of the two parameters. We observed that, in general, very low values of α yield a very unstable system, as they give very low weight to the old observations, while high values of α result in an almost static system that cannot adapt fast enough to changes. Similarly, for the window length, a short window does not have enough data to provide good estimates of selectivity, while long windows provide outdated statistics.

    Samples of the search space are provided in Figures 29 and 30 (at standard deviation 5, as in Figure 28). In particular, in Figure 29, we plot the performance of the adaptive scheduler compared to the static one when α is equal to 0.175 and the window length is variable. Similarly, in Figure 30, we plot the performance when the window length is 100 and α is variable. The figures show that, in general, windows between 50 and 150 tuples and values of α between 0.1 and 0.25 provide the best performance.

    Fig. 29. Impact of monitoring window length on adaptive scheduling

    Fig. 30. Impact of the α value on adaptive scheduling

    10. RELATED WORK

    The growing need for monitoring applications has led to the development of several prototype DSMSs (e.g., [Carney et al. 2002; Motwani et al. 2003; Cranor et al. 2003; Chandrasekaran et al. 2003; Hammad et al. 2004]). These prototypes utilize new techniques for the efficient processing of continuous queries over unbounded data streams. For example, [Viglas and Naughton 2002] proposed rate-based query optimization as a replacement for the traditional cost-based approach. Also, new techniques for processing aggregate CQs appeared in [Li et al. 2005], while techniques for processing join CQs appeared in [Golab and Ozsu 2003].

    For multiple queries, multi-query optimization has been exploited by [Chen et al. 2002] to improve system throughput in the Internet and by [Madden et al. 2002] for improving throughput in TelegraphCQ. TelegraphCQ uses a query execution model that is based on eddies [Avnur and Hellerstein 2000]. In that model, the execution order of operators is determined at run-time. This is particularly important when the operators' costs and selectivities change over time. Similar to TelegraphCQ, our policies can work in a dynamic environment, with support for monitoring the queries' costs and selectivities and for updating the priorities whenever necessary.

    Policy                                    Reference / Section          Objective
    RB    Rate-based                          [Urhan and Franklin 2001]    Average response time
    ML    Min-Latency                         [Carney et al. 2003]         Average response time
    RR    Round Robin                         [Carney et al. 2003]         Average response time
    HR    Highest Rate                        Section 3.1.1                Average response time
    HNR   Highest Normalized Rate             Section 3.3                  Average slowdown
    FCFS  First Come First Served             Section 4.1                  Maximum response time
    LSF   Longest Stretch First               Section 4.1                  Maximum slowdown
    BRT   Balance Response Time               Section 4.3                  `2 norm of response times
    BSD   Balance Slowdown                    Section 4.2.2                `2 norm of slowdowns
    Chain Chain                               [Babcock et al. 2003]        Maximum memory usage
    FAS   Freshness-Aware Scheduling          [Sharaf et al. 2005]         Average freshness

    Table IV. Classification of priority-based scheduling policies for CQs (the policies referenced by section number are presented in this paper)

    Operator scheduling has been addressed in several research efforts (e.g., [Urhan and Franklin 2001; Carney et al. 2003; Babcock et al. 2003; Hammad et al. 2003; Sutherland et al. 2005]). The work in [Urhan and Franklin 2001] proposes the rate-based (RB) scheduling policy for scheduling operators within a single query to improve response time. Aurora [Carney et al. 2003] uses a policy called Min-Latency (ML), which is similar to the rate-based one; ML minimizes the average tuple latency in a single query. For multiple queries, Aurora uses a two-level scheduling scheme where Round Robin (RR) is used to schedule queries and ML (or RB) is used to schedule operators within a query.

    Aurora also proposes a QoS-aware scheduler which attempts to satisfy application-specified QoS requirements. Specifically, each query is associated with a QoS graph which defines the utility of stale output; the scheduler then tries to maximize the average QoS. In this paper, we focused on system QoS metrics that do not require the user to have any prior knowledge about the query processing requirements or to predict the appropriate QoS graph. Specifically, we developed policies that minimize the average response time as well as the average slowdown for multiple CQs that include join and shared operators. We also considered balancing the worst- and average-case performance, and presented policies to do so for both response time and slowdown.

    Multi-query scheduling has also been exploited to optimize metrics other than QoS. For example, Chain is a multi-query scheduling policy that optimizes memory usage [Babcock et al. 2003]. The work on Chain has also been extended to balance the trade-off between memory usage and response time [Babcock et al. 2004]. Another metric to optimize is Quality of Data (QoD). In our work in [Sharaf et al. 2005], we proposed the freshness-aware scheduling policy for improving the QoD of data streams, where QoD is defined as freshness.

    Table IV lists the scheduling policies discussed above. For each policy, it states the optimization metric targeted by the policy, along with the reference or section in which the policy is proposed.

    11. CONCLUSIONS AND FUTURE WORK

    In this paper, we considered the problem of scheduling multiple heterogeneous CQs in a DSMS with the goal of optimizing QoS for end users and applications. To quantify such QoS, we first used the traditional metric of response time, which we defined over multiple CQs, including CQs that contain joins of multiple data streams. We also considered slowdown as another QoS metric, since we believe it to be a fairer metric for heterogeneous workloads and, as such, more suitable for a wide range of monitoring applications.

    Having defined the QoS metrics to optimize, we developed new scheduling policies that optimize the average-case performance of a DSMS for response time and for slowdown. Additionally, we proposed hybrid policies that strike a fine balance between the average-case performance and the worst-case performance, thus avoiding starvation (which is crucial for event detection CQs).

    Further, we have extended the proposed policies to exploit operator sharing in optimized multi-query plans and to handle multi-stream queries. We have also augmented the proposed policies with mechanisms that ensure their adaptivity to changes in the workload. Finally, we have evaluated our proposed policies and their implementation experimentally and showed that our scheduling policies consistently outperform previously proposed ones.

    As the experimental results show, the magnitude of the improvement in performance provided by our proposed policies depends on the considered metric as well as on the query workload and the characteristics of the input data streams. For instance, we showed that the improvement is more significant when queries cover a large selectivity range. We also showed the need for good estimates of selectivity and processing costs, as well as for efficient implementation and approximation algorithms. Our experimental results also show the trade-off between optimizing for QoS and optimizing for memory usage. As part of our future work, we are planning to study policies that balance the trade-off between these two objectives.


    We are also planning to study the problem of scheduling multiple window-based continuous aggregate queries, which is currently an open research problem. Having such a policy in place, we believe it would be very rewarding to study the performance of all our proposed policies under more complicated workloads, that is, workloads in which different kinds of queries exist at the same time (i.e., filters, joins, or aggregates) as well as workloads with more complicated query structures (e.g., multi-way joins, or queries with shared join and aggregate operators).

    We are also considering incorporating the proposed policies into the scheduling component of an existing DSMS prototype, through our AQSIOS project. This will provide us with the ability to assess the actual gains provided by these policies in a real-system implementation. It will also provide us with insights into the scheduling overheads involved and the appropriate approximation methods to use in order to balance the trade-off between scheduling benefits and overheads.

    Additionally, we are interested in developing policies that are able to balance the trade-off between different QoS metrics as well as QoD metrics. Finally, we would like to investigate policies that consider query importance and/or popularity in the scheduling decision.

    REFERENCES

    Acharya, S. and Muthukrishnan, S. 1998. Scheduling on-demand broadcasts: New metrics and algorithms. In Proc. of the ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom).

    Aksoy, D. and Franklin, M. 1999. RxW: A scheduling approach for large-scale on-demand data broadcast. IEEE/ACM Trans. on Networking 7, 6, 846-860.

    Avnur, R. and Hellerstein, J. M. 2000. Eddies: Continuously adaptive query processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data.

    Babcock, B., Babu, S., Datar, M., and Motwani, R. 2003. Chain: Operator scheduling for memory minimization in data stream systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data.

    Babcock, B., Babu, S., Datar, M., Motwani, R., and Thomas, D. 2004. Operator scheduling in data stream systems. The International Journal on Very Large Data Bases (VLDB J.) 13, 4.

    Bansal, N. and Pruhs, K. 2003. Server scheduling in the Lp norm: A rising tide lifts all boats. In Proc. of the ACM Symposium on Theory of Computing (STOC).

    Bender, M. A., Chakrabarti, S., and Muthukrishnan, S. 1998. Flow and stretch metrics for scheduling continuous job streams. In Proc. of the ACM Symposium on Discrete Algorithms (SODA).

    Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Seidman, G., Stonebraker, M., Tatbul, N., and Zdonik, S. 2002. Monitoring streams: A new class of data management applications. In Proc. of the Very Large Data Bases (VLDB) Conference.

    Carney, D., Cetintemel, U., Rasin, A., Zdonik, S., Cherniack, M., and Stonebraker, M. 2003. Operator scheduling in a data stream manager. In Proc. of the Very Large Data Bases (VLDB) Conference.

    Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M. J., Hellerstein, J. M., Hong, W., Krishnamurthy, S., Madden, S., Raman, V., Reiss, F., and Shah, M. A. 2003. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. In Proc. of the Biennial Conference on Innovative Data Systems Research (CIDR).

    Chen, J., DeWitt, D. J., and Naughton, J. F. 2002. Design and evaluation of alternative selection placement strategies in optimizing continuous queries. In Proc. of the International Conference on Data Engineering (ICDE).

    Cranor, C., Johnson, T., Spataschek, O., and Shkapenyuk, V. 2003. Gigascope: A stream database for network applications. In SIGMOD.

    Fagin, R., Lotem, A., and Naor, M. 2001. Optimal aggregation algorithms for middleware. In Proc. of the ACM Symposium on Principles of Database Systems (PODS).

    Golab, L. and Ozsu, M. T. 2003. Processing sliding window multi-joins in continuous queries over data streams. In Proc. of the Very Large Data Bases (VLDB) Conference.

    Hammad, M., Franklin, M., Aref, W., and Elmagarmid, A. K. 2003. Scheduling for shared window joins over data streams. In Proc. of the Very Large Data Bases (VLDB) Conference.

    Hammad, M. A., Mokbel, M. F., Ali, M. H., Aref, W. G., Catlin, A. C., Elmagarmid, A. K., Eltabakh, M., Elfeky, M. G., Ghanem, T. M., Gwadera, R., Ilyas, I. F., Marzouk, M., and Xiong, X. 2004. Nile: A query processing engine for data streams. In Proc. of the International Conference on Data Engineering (ICDE).

    Jacobson, V. 1988. Congestion avoidance and control. In Proc. of the ACM SIGCOMM International Conference on Communications Architectures and Protocols (SIGCOMM).

    Kang, J., Naughton, J. F., and Viglas, S. D. 2003. Evaluating window joins over unbounded streams. In Proc. of the International Conference on Data Engineering (ICDE).

    Li, J., Maier, D., Tufte, K., Papadimos, V., and Tucker, P. A. 2005. No pane, no gain: Efficient evaluation of sliding-window aggregates over data streams. SIGMOD Record 34, 1, 39-44.

    Madden, S., Shah, M. A., Hellerstein, J. M., and Raman, V. 2002. Continuously adaptive continuous queries over streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data.

    Mehta, M. and DeWitt, D. J. 1993. Dynamic memory allocation for multiple-query workloads. In Proc. of the Very Large Data Bases (VLDB) Conference.

    Motwani, R., Widom, J., Arasu, A., Babcock, B., Babu, S., Datar, M., Manku, G., Olston, C., Rosenstein, J., and Varma, R. 2003. Query processing, resource management, and approximation in a data stream management system. In Proc. of the Biennial Conference on Innovative Data Systems Research (CIDR).

    Muthukrishnan, S., Rajaraman, R., Shaheen, A., and Gehrke, J. E. 1999. Online scheduling to minimize average stretch. In Proc. of the IEEE Symposium on Foundations of Computer Science (FOCS).

    Sellis, T. K. 1988. Multiple-query optimization. ACM Transactions on Database Systems (TODS) 13, 1.

    Sharaf, M. A. 2007. Metrics and algorithms for processing multiple continuous queries. Ph.D. thesis, University of Pittsburgh.

    Sharaf, M. A., Chrysanthis, P. K., and Labrinidis, A. 2005. Preemptive rate-b