Incremental Aggregation on Multiple Continuous Queries
Chun Jin, Carnegie Mellon University
09/28/2006, ISMIS, Bari, Italy


Page 1: Incremental Aggregation on Multiple Continuous Queries

Incremental Aggregation on Multiple Continuous Queries

Chun Jin
Carnegie Mellon University

09/28/2006, ISMIS, Bari, Italy

Page 2: Incremental Aggregation on Multiple Continuous Queries

Stream Processing

• Intelligence monitoring
• Fraud detection
• Onset epidemic patterns
• Network intrusion detection
• Geospatial changes

• Transactions
• Sensor network readings
• Network traffic data

Page 3: Incremental Aggregation on Multiple Continuous Queries

Problem

• Aggregate queries

• Continuous evaluation

• Multiple concurrent queries

Page 4: Incremental Aggregation on Multiple Continuous Queries

Solutions

• Incremental aggregation

• Incremental multiple aggregate query optimization (incremental sharing)

Page 5: Incremental Aggregation on Multiple Continuous Queries

Roadmap

• System overview

• Query examples

• Incremental Aggregation

• Incremental sharing

• Evaluation

Page 6: Incremental Aggregation on Multiple Continuous Queries

System Architecture

[Figure: system architecture diagram. Labeled components: Query Network, Query Coordinator, System Catalog, Common Computation Identifier (CCI), Network Operation Manager (NOM), Code Assembler, Sharing Optimizer (SO), Projection Manager (PM), Engine, Generator, Oracle.]

New Query Insertion:
1. Index query network
2. Identify common computation
3. Select optimal sharing path
4. Expand query network

Query Network Execution:
1. Code assembly
2. Incremental aggregation
3. Periodic execution

Page 7: Incremental Aggregation on Multiple Continuous Queries

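Query Examples

(a) Query A

    SELECT dis_cat, hospital, vdate, COUNT(*), AVERAGE(fee)
    FROM Med
    GROUP BY CAT(disease) AS dis_cat, hospital, DAY(visit_time) AS vdate

(b) Query B

    SELECT hospital, vdate, AVERAGE(fee)
    FROM Med
    GROUP BY hospital, DAY(visit_time) AS vdate

[Figure: query network. On the path S, A, node A holds the columns dis_cat, hospital, vdate, COUNT(*), SUM(fee), AVERAGE(fee). On the path S, A, B, node B holds the columns hospital, vdate, COUNT(*), SUM(fee), AVERAGE(fee). Data labels: SH and SN (historical and new stream data), AH and AN (historical and new aggregate results of A).]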

Page 8: Incremental Aggregation on Multiple Continuous Queries

Roadmap

• System overview

• Query examples

• Incremental Aggregation

• Incremental sharing

• Evaluation

Page 9: Incremental Aggregation on Multiple Continuous Queries

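Aggregate Function Types

• Distributive: can be computed incrementally with the aggregate function itself. Examples: SUM, COUNT.

• Algebraic: can be computed from a finite set of aggregate functions. Example: AVERAGE (from SUM and COUNT).

• Holistic: no such finite set exists. Example: quantiles.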

Incremental Aggregation
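The three categories can be illustrated with a small Python sketch. It is not taken from the slides: the function names and the sample fee values are hypothetical, and MEDIAN stands in for the quantile example. It merges a historical batch with a new batch for each category.

```python
from statistics import median

def merge_sum(sum_h, sum_n):
    # Distributive: two partial SUMs are merged by summing them.
    return sum_h + sum_n

def merge_count(count_h, count_n):
    # Distributive: partial COUNTs are merged by addition.
    return count_h + count_n

def merge_average(sum_h, count_h, sum_n, count_n):
    # Algebraic: AVERAGE is derived from a finite set of
    # distributive aggregates (SUM and COUNT).
    return merge_sum(sum_h, sum_n) / merge_count(count_h, count_n)

def recompute_median(history, new):
    # Holistic: no fixed-size summary is enough in general,
    # so the whole history is revisited.
    return median(history + new)

# Hypothetical fee values for one group.
history = [10.0, 20.0, 30.0]
new = [40.0, 100.0]

avg = merge_average(sum(history), len(history), sum(new), len(new))
med = recompute_median(history, new)
```

Only the holistic case needs the raw historical values. The other two need just the stored partial aggregates, which is what makes incremental evaluation possible for them.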

Page 10: Incremental Aggregation on Multiple Continuous Queries

Holistic Aggregation

• Revisits the entire history on every evaluation.

• Usage:
  – For holistic aggregates.
  – For aggregates that follow non-incrementally-evaluated aggregates.
  – As the baseline for incremental aggregation.

Page 11: Incremental Aggregation on Multiple Continuous Queries

Algorithm

[Figure: incremental aggregation for query A. Tables AH (tuples t1) and AN (tuples t2) both have the columns GID, COUNT(*) AS COUNTA, SUM(fee) AS SUMA, AVERAGE(fee) AS AVGA. Stream data: SH, SN.]

0: PreUpdate State

1: Aggregate SN into AN

2: Merge Groups
   t2.COUNTA = t1.COUNTA + t2.COUNTA
   t2.SUMA = t1.SUMA + t2.SUMA

3: Compute Algebraic Aggregate
   t2.AVGA = t2.SUMA / t2.COUNTA

4: Drop Duplicates

5: Insert New Results
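The five steps can be sketched in Python as follows. This is a minimal in-memory illustration, not the system's implementation, which the slides describe as running on a DBMS. The dictionary layout and the function names are assumptions. Steps 4 and 5 are read here as "delete from AH the groups superseded by AN" and "insert AN into AH", which is also an interpretation.

```python
from collections import defaultdict

def aggregate(tuples):
    """Step 1: aggregate the new tuples S_N into A_N, keyed by group id."""
    groups = defaultdict(lambda: {"COUNTA": 0, "SUMA": 0.0})
    for gid, fee in tuples:
        groups[gid]["COUNTA"] += 1
        groups[gid]["SUMA"] += fee
    return dict(groups)

def incremental_update(a_h, s_n):
    """Apply steps 1 to 5 to the stored result a_h (modified in place)."""
    a_n = aggregate(s_n)                         # 1: aggregate S_N
    for gid, t2 in a_n.items():
        t1 = a_h.get(gid)
        if t1 is not None:                       # 2: merge groups
            t2["COUNTA"] += t1["COUNTA"]
            t2["SUMA"] += t1["SUMA"]
        t2["AVGA"] = t2["SUMA"] / t2["COUNTA"]   # 3: algebraic aggregate
    for gid in a_n:                              # 4: drop duplicates
        a_h.pop(gid, None)
    a_h.update(a_n)                              # 5: insert new results
    return a_h

# Hypothetical state: two stored groups, then two new tuples arrive.
a_h = {
    "g0": {"COUNTA": 1, "SUMA": 7.0, "AVGA": 7.0},
    "g1": {"COUNTA": 2, "SUMA": 30.0, "AVGA": 15.0},
}
s_n = [("g1", 30.0), ("g2", 5.0)]
result = incremental_update(a_h, s_n)
```

Group g0 is never touched, g1 is merged with its historical counterpart, and g2 is inserted as a new group. The work depends on the new data and the affected groups rather than on the full history.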

Page 12: Incremental Aggregation on Multiple Continuous Queries

Complexity

1. Aggregate SN. T1 = O(|SN|)

2. Merge groups in AH to AN. Tcurr2 = O(|AH| + |AN|), Thash2 = O(|AH| + |AN|), Tprefetch2 = O(|AN|)

3. Compute algebraic aggregates in AN. T3 = O(|AN|)

4. Drop duplicates. Tcurr4 = O(|AN| * |A^N_H|) = O(|AN|^2), Thash4 = O(|AH| + |AN|), Tprefetch4 = O(|AN|)

5. Insert new results. T5 = O(|AN|)
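Adding the per-step costs gives a total per update for each variant. The derivation below is a worked sum of the bounds listed above, not a result stated on the slide. The superscripts curr, hash, and prefetch follow the slide's labels, and the last simplification uses the fact that every group in AN contains at least one tuple of SN, so |AN| ≤ |SN|.

```latex
\begin{align*}
T^{\mathrm{curr}} &= T_1 + T_2^{\mathrm{curr}} + T_3 + T_4^{\mathrm{curr}} + T_5 \\
  &= O(|S_N|) + O(|A_H| + |A_N|) + O(|A_N|) + O(|A_N|^2) + O(|A_N|) \\
  &= O(|S_N| + |A_H| + |A_N|^2) \\[4pt]
T^{\mathrm{hash}} &= O(|S_N|) + O(|A_H| + |A_N|) + O(|A_N|) + O(|A_H| + |A_N|) + O(|A_N|) \\
  &= O(|S_N| + |A_H| + |A_N|) = O(|S_N| + |A_H|) \\[4pt]
T^{\mathrm{prefetch}} &= O(|S_N|) + O(|A_N|) + O(|A_N|) + O(|A_N|) + O(|A_N|) \\
  &= O(|S_N| + |A_N|) = O(|S_N|)
\end{align*}
```

Under these bounds, only the prefetch variant is independent of the size of the historical result AH.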

Page 13: Incremental Aggregation on Multiple Continuous Queries

Implementation

• System catalog:
  – AggreRules
  – AggreBasics

• Incremental aggregation instantiation

Page 14: Incremental Aggregation on Multiple Continuous Queries

System Catalog

AggreRules:

Function  Category  Incremental Aggregation Rule  Vertical Expansion Rule
AVERAGE   A         SUMX/COUNTW                   SUMX/COUNTW
SUM       D         SUMX(H)+SUMX(N)               SUM(SUMX)
MEDIAN    H         NULL                          NULL
COUNT     D         COUNTW(H)+COUNTW(N)           SUM(COUNTW)

AggreBasics:

Function  Basics    Basic ID
AVERAGE   COUNT(W)  COUNTW
AVERAGE   SUM(X)    SUMX
SUM       SUM(X)    SUMX
COUNT     COUNT(W)  COUNTW

Page 15: Incremental Aggregation on Multiple Continuous Queries

Instantiation

New Query A: AVERAGE(fee)

1. Parse
   AVERAGE(fee): function AVERAGE, argument fee

2. Retrieve rules
   AggreRules:
     AVERAGE = SUMX / COUNTW
     SUMX(N) = SUMX(H) + SUMX(N)
     COUNTW(N) = COUNTW(H) + COUNTW(N)
   AggreBasics:
     AVERAGE: SUM(X): SUMX
     AVERAGE: COUNT(W): COUNTW

3. Substitute
   SUM(X): SUMX, COUNT(W): COUNTW becomes SUM(fee): SUMX, COUNT(*): COUNTW
   AVERAGE(fee) = SUMX / COUNTW

4. Insert columns
   GroupColumns:
     SUM(fee): SUMA
     COUNT(*): COUNTA
     AVERAGE(fee): AVGA

   Name Mapping:
     SUM(fee)      SUMX    SUMA
     COUNT(*)      COUNTW  COUNTA
     AVERAGE(fee)          AVGA

5. Substitute
   AVGA = SUMX / COUNTW becomes AVGA = SUMA / COUNTA
   SUMA(N) = SUMA(H) + SUMA(N)
   COUNTA(N) = COUNTA(H) + COUNTA(N)

Result:
   t2.SUMA = t1.SUMA + t2.SUMA
   t2.COUNTA = t1.COUNTA + t2.COUNTA
   t2.AVGA = t2.SUMA / t2.COUNTA
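The substitution part of instantiation can be sketched in Python. The rule strings and basic IDs are copied from the AggreRules and AggreBasics tables (categories there are D for distributive, A for algebraic, H for holistic). The dictionary layout, the `instantiate` function, and the regular-expression substitution are assumptions made for illustration, not the system's code.

```python
import re

# Catalog contents from the slides; the dict representation is hypothetical.
AGGRE_RULES = {
    "AVERAGE": "SUMX/COUNTW",
    "SUM": "SUMX(H)+SUMX(N)",
    "COUNT": "COUNTW(H)+COUNTW(N)",
}
AGGRE_BASICS = {
    "AVERAGE": {"SUM(X)": "SUMX", "COUNT(W)": "COUNTW"},
    "SUM": {"SUM(X)": "SUMX"},
    "COUNT": {"COUNT(W)": "COUNTW"},
}

def instantiate(function, name_mapping):
    """Rewrite a function's rule, replacing basic IDs with column names."""
    rule = AGGRE_RULES[function]
    for basic_id in AGGRE_BASICS[function].values():
        # Word boundaries keep SUMX from matching inside a longer name.
        rule = re.sub(rf"\b{basic_id}\b", name_mapping[basic_id], rule)
    return rule

# Name mapping for Query A, as on the slide: SUMX -> SUMA, COUNTW -> COUNTA.
mapping_a = {"SUMX": "SUMA", "COUNTW": "COUNTA"}
avg_rule = instantiate("AVERAGE", mapping_a)
sum_rule = instantiate("SUM", mapping_a)
count_rule = instantiate("COUNT", mapping_a)
```

Keeping the rules in terms of generic basic IDs means one catalog entry per function serves every query; only the name mapping changes from query to query.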

Page 16: Incremental Aggregation on Multiple Continuous Queries

Roadmap

• System overview

• Query examples

• Incremental Aggregation

• Incremental sharing

• Evaluation

Page 17: Incremental Aggregation on Multiple Continuous Queries

Incremental Multiple Query Optimization (Incremental Sharing)

• Index existing query plan information R.

• Given a new query Q, identify the sharable computations from R.

• Select the optimal sharing path.

• Expand R to compute Q.

Incremental Sharing

Page 18: Incremental Aggregation on Multiple Continuous Queries

Expanding Query Network

• Limited sharing on holistic aggregates

• Sharing on distributive/algebraic aggregates through vertical expansion

Page 19: Incremental Aggregation on Multiple Continuous Queries

Vertical Expansion

[Figure: node B is computed from node A (A, then B). AH has the columns BID, Rest ID, COUNT(*) AS COUNTA, SUM(fee) AS SUMA, AVERAGE(fee) AS AVGA. BH has the columns BID, COUNT(*) AS COUNTB, SUM(fee) AS SUMB, AVERAGE(fee) AS AVGB.]

1: Further Aggregate
   COUNTB = SUM(COUNTA)
   SUMB = SUM(SUMA)
   GROUP BY BID

2: AVGB = SUMB / COUNTB
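A minimal Python sketch of this step, assuming A's stored result is available as a list of rows. The row layout, the `vertical_expand` name, and the sample values are hypothetical; BID stands for the group columns that B keeps (hospital and vdate for Query B) and RestID for the columns B drops.

```python
from collections import defaultdict

def vertical_expand(a_rows):
    """Compute B from A: COUNTB=SUM(COUNTA), SUMB=SUM(SUMA) GROUP BY BID."""
    b = defaultdict(lambda: {"COUNTB": 0, "SUMB": 0.0})
    for row in a_rows:
        group = b[row["BID"]]
        group["COUNTB"] += row["COUNTA"]   # further aggregate the counts
        group["SUMB"] += row["SUMA"]       # further aggregate the sums
    for group in b.values():
        # Algebraic aggregate, derived from the two distributive ones.
        group["AVGB"] = group["SUMB"] / group["COUNTB"]
    return dict(b)

# Hypothetical rows of A_H.
a_h = [
    {"BID": ("H1", "2006-09-28"), "RestID": "flu", "COUNTA": 2, "SUMA": 30.0},
    {"BID": ("H1", "2006-09-28"), "RestID": "injury", "COUNTA": 1, "SUMA": 60.0},
    {"BID": ("H2", "2006-09-28"), "RestID": "flu", "COUNTA": 4, "SUMA": 40.0},
]
b_h = vertical_expand(a_h)
```

AVGB is recomputed from SUMB and COUNTB rather than by averaging the AVGA values. For hospital H1 the two AVGA values are 15 and 60, whose mean is 37.5, while the correct average over the three visits is 30.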

Page 20: Incremental Aggregation on Multiple Continuous Queries

Vertical Expansion

[Figure: incremental maintenance of node B from node A (A, then B). AN has the columns BID, Rest ID, COUNT(*) AS COUNTA, SUM(fee) AS SUMA; AH has the columns BID, Rest ID. BH and BN have the columns BID, COUNT(*) AS COUNTB, SUM(fee) AS SUMB, AVERAGE(fee) AS AVGB.]

1: Further Aggregate
   COUNTB = SUM(COUNTA)
   SUMB = SUM(SUMA)
   GROUP BY BID

2: Merge Groups
   t2.COUNTA = t1.COUNTA + t2.COUNTA
   t2.SUMA = t1.SUMA + t2.SUMA

3: Compute Algebraic Aggregate
   AVGB = SUMB / COUNTB

4: Drop Duplicates

5: Insert New Results
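The incremental form of vertical expansion can be sketched as below. Several points are assumptions rather than facts from the slides: AN is taken to hold per-group increments computed from the new tuples only, the merge is written on B's columns (COUNTB, SUMB), and dropping a duplicate is folded into the lookup of the historical group. All names and values are hypothetical.

```python
def incremental_vertical_expand(a_n, b_h):
    """Fold the increment a_n of node A into the stored result b_h of B."""
    # 1: further aggregate A_N by BID to obtain B_N.
    b_n = {}
    for row in a_n:
        t2 = b_n.setdefault(row["BID"], {"COUNTB": 0, "SUMB": 0.0})
        t2["COUNTB"] += row["COUNTA"]
        t2["SUMB"] += row["SUMA"]
    for bid, t2 in b_n.items():
        # 2 and 4: merge with the matching historical group and
        # drop that group from B_H so it is not stored twice.
        t1 = b_h.pop(bid, None)
        if t1 is not None:
            t2["COUNTB"] += t1["COUNTB"]
            t2["SUMB"] += t1["SUMB"]
        # 3: compute the algebraic aggregate.
        t2["AVGB"] = t2["SUMB"] / t2["COUNTB"]
    # 5: insert the new results.
    b_h.update(b_n)
    return b_h

# Hypothetical state of B_H and an increment of A.
b_h = {
    "H1": {"COUNTB": 3, "SUMB": 90.0, "AVGB": 30.0},
    "H2": {"COUNTB": 4, "SUMB": 40.0, "AVGB": 10.0},
}
a_n = [
    {"BID": "H1", "RestID": "flu", "COUNTA": 1, "SUMA": 30.0},
    {"BID": "H3", "RestID": "flu", "COUNTA": 2, "SUMA": 10.0},
]
b_updated = incremental_vertical_expand(a_n, b_h)
```

B is maintained from A's increment alone, so neither the stream S nor the full contents of AH are read when B is updated.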

Page 21: Incremental Aggregation on Multiple Continuous Queries

Vertical Expansion Complexity

• TVcurr = O(|AN|^2 + |BH|)

• TVhash = O(|AN| + |BH|)

• TVprefetch = O(|AN|)

Page 22: Incremental Aggregation on Multiple Continuous Queries

System Catalog

[Figure: schemas of the system catalog relations GroupTopology, GroupExprSet, GroupExprIndex, and GroupColumns. Field labels recovered from the figure, in extraction order: Original, DirectParent, NodeName, GroupID; Original Expr, Canonical, ColumnName, NodeName; Original GroupExpr, Canonical, GroupExprID; GroupExprID, GroupID.]

Page 23: Incremental Aggregation on Multiple Continuous Queries

Select Optimal Sharing Path

• Among the nodes that can be shared, select the one with the least size.
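The least-size rule can be sketched as follows. The slides do not spell out how sharable nodes are identified, so the test used here (a node is sharable if its group-by columns include all of the new query's group-by columns) is an assumption, as are the node names, sizes, and the `select_sharing_node` function.

```python
def select_sharing_node(query_groupby, nodes):
    """Return the name of the smallest sharable node, or None if none fits."""
    candidates = [
        (node["size"], name)
        for name, node in nodes.items()
        # Assumed sharability test: the node keeps every column
        # that the new query groups by.
        if set(query_groupby) <= set(node["groupby"])
    ]
    return min(candidates)[1] if candidates else None

# Hypothetical existing aggregate nodes with their result sizes.
nodes = {
    "A": {"groupby": ["dis_cat", "hospital", "vdate"], "size": 5000},
    "C": {"groupby": ["doctor", "hospital", "vdate"], "size": 20000},
    "D": {"groupby": ["dis_cat", "vdate"], "size": 300},
}
choice = select_sharing_node(["hospital", "vdate"], nodes)
```

Node D is the smallest but lacks the hospital column, so the query is computed from A, the smallest node that still carries the needed groups.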

Page 24: Incremental Aggregation on Multiple Continuous Queries

Rerouting

[Figure: animation showing the evolution of the query network during rerouting, with nodes A and B connected to stream S in successive configurations.]

Page 25: Incremental Aggregation on Multiple Continuous Queries

Roadmap

• System overview

• Query examples

• Incremental Aggregation

• Incremental sharing

• Evaluation

Page 26: Incremental Aggregation on Multiple Continuous Queries

Evaluation

• Databases:
  – Synthesized FedWire money transfers (Fed)
  – Anonymized medical patient admission records (Med)

• Queries:
  – Seed queries
  – Sharable queries generated from the seeds
  – A wide range of queries (aggregates in this paper)

• Simulation:
  – Historical data (300000 on Fed and 600000 on Med)
  – Chunks of new data (4000 per chunk)

Page 27: Incremental Aggregation on Multiple Continuous Queries

Incremental Aggregation

Total execution time in seconds:

                             Fed (350 queries)  Med (450 queries)
Incremental Aggregation      662                316
Non Incremental Aggregation  6236               938
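The speedup implied by these totals can be worked out directly. The short Python calculation below uses only the four numbers reported above; the variable names are illustrative.

```python
# Total execution times in seconds, as reported on the slide.
times = {
    "Fed": {"incremental": 662, "non_incremental": 6236},
    "Med": {"incremental": 316, "non_incremental": 938},
}

# Speedup = non-incremental time / incremental time.
speedups = {
    name: round(t["non_incremental"] / t["incremental"], 2)
    for name, t in times.items()
}
```

This gives a speedup of about 9.42 on Fed and about 2.97 on Med.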

Page 28: Incremental Aggregation on Multiple Continuous Queries

[Figure (a) Fed: execution time in seconds (y-axis, 0 to 1600) versus number of FED queries (x-axis, 0 to 350), for SIA and NS-IA.]

Page 29: Incremental Aggregation on Multiple Continuous Queries

[Figure (a) Med: execution time in seconds (y-axis, 0 to 180) versus number of MED queries (x-axis, 0 to 450), for SIA and NS-IA.]

Page 30: Incremental Aggregation on Multiple Continuous Queries

Conclusion

• Multiple aggregates over streams

• Solutions:
  – Incremental aggregation
  – Incremental MQO (incremental sharing)
  – Built atop DBMSs for direct practical utility

• Large performance improvement: total execution time fell from 6236 s to 662 s on Fed and from 938 s to 316 s on Med

• Future work:
  – A broad range of queries
  – Built atop DSMSs

Page 31: Incremental Aggregation on Multiple Continuous Queries

Acknowledgement

• Work with Professor Jaime Carbonell.

• Part of ARGUS by CMU and Dynamix.

• Team: Phil Hayes, Santosh Ananthraman, Bob Frederking, Eugene Fink, Dwight Dietrich, Ganesh Mani, Johny Mathew.

• Thanks to Professor Chris Olston for helpful discussion.

Page 32: Incremental Aggregation on Multiple Continuous Queries

[Figure (a) Pair 1: FED Query Pair 1. X-axis: incremental size |SN| (1 to 30000). Y-axes: IBT, incremental-batch execution time (s), and ITT, average individual-tuple execution time (s). Series: Non-VE ITT, VE ITT, Non-VE IBT, VE IBT.]