Semantics and Evaluation Techniques for Window Aggregates in Data Streams. Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, Peter A. Tucker. SIGMOD 2005


Page 1:

Semantics and Evaluation Techniques for Window Aggregates in Data Streams

Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, Peter A. Tucker

SIGMOD 2005

Page 2:

Introduction

Window aggregation is an important query capability.

Evaluating window aggregate queries over streams is non-trivial:
  Window extents overlap.
  Window definitions are confused with the physical stream.
  Data arrives out of order.
  ……

Page 3:

Techniques

Window-ID (WID):
  Overlapping window extents
  Confusion of window definition with the physical stream
Punctuation:
  Out-of-order data arrival

Page 4:

Example 1

Q1: SELECT seg-id, max(speed), min(speed)
    FROM Traffic [RANGE 300 seconds
                  SLIDE 60 seconds
                  WATTR ts]
    GROUP BY seg-id
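To make the meaning of Q1 concrete, here is a minimal reference sketch that evaluates it naively over a finite list of readings. It assumes that window w covers timestamps in the half-open interval [w*SLIDE - RANGE, w*SLIDE); that convention, the function name `evaluate_q1`, and the sample readings are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

RANGE = 300  # seconds, from Q1
SLIDE = 60   # seconds, from Q1


def evaluate_q1(readings, last_window):
    """Naively evaluate Q1 over a finite list of (seg_id, speed, ts) readings.

    Assumption: window w covers timestamps in [w*SLIDE - RANGE, w*SLIDE).
    Returns {(w, seg_id): (max_speed, min_speed)} for w = 1..last_window.
    """
    results = {}
    for w in range(1, last_window + 1):
        lo, hi = w * SLIDE - RANGE, w * SLIDE
        per_segment = defaultdict(list)
        # Rescan every reading for every window: simple, but repeats work
        # because consecutive window extents overlap.
        for seg_id, speed, ts in readings:
            if lo <= ts < hi:
                per_segment[seg_id].append(speed)
        for seg_id, speeds in per_segment.items():
            results[(w, seg_id)] = (max(speeds), min(speeds))
    return results


# Hypothetical sample data: (seg_id, speed, ts)
readings = [(1, 50, 10), (1, 70, 70), (2, 40, 75), (1, 30, 310)]
out = evaluate_q1(readings, last_window=6)
```

Each reading is examined once per window here, which is the redundancy that overlapping extents introduce.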

Page 5:

Example 1

[Figure: tuple]

Page 6:

Window Semantics

Window semantics has often been described operationally.
  Example: some window query operators process window extents sequentially, but data does not necessarily arrive in window-extent order.

Page 7:

Window Specification

Window specification: a window type and a set of parameters that define the window to be used by a query.
  Example: RANGE, SLIDE and WATTR in Q1.

Different window aggregate queries have different window specifications.
  Sliding window aggregate query.

Stream query:
  Data-driven
  Domain-driven

Page 8:

Window Specification

Similar to CQL (Continuous Query Language).
  Difference: user-specified WATTR and SLIDE parameters.

Page 9:

Sliding Window Aggregate

Time-based: Q1

Row-based:

RANGE and SLIDE on different attributes:

Page 10:

Sliding Window Aggregate

Partitioned window aggregate:

Using a function: a variation of Q3

Page 11:

Window Semantic Framework

Three functions map between window-ids and tuples in both directions: windows, extent and wids.

T: a set of tuples.
S: a window specification.
windows(T, S): the set of window-ids that identify the window extents to which tuples in T may belong.
extent(w, T, S): the set of tuples in T belonging to the window extent identified by w, where $w \in windows(T, S)$.
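The relationship between the three functions can be sketched in a few lines of Python for a time-based sliding window. This is an illustration under stated assumptions, not the paper's definitions: window w is assumed to cover timestamps in [w*SLIDE - RANGE, w*SLIDE), timestamps are non-negative integers, tuples are plain dicts with a `ts` key, and `wids` is computed directly from `extent` rather than from a closed form.

```python
RANGE = 300  # seconds, as in Q1
SLIDE = 60   # seconds, as in Q1


def extent(w, T):
    """Tuples of T in the extent of window w.

    Assumption: window w covers ts in [w*SLIDE - RANGE, w*SLIDE).
    """
    lo, hi = w * SLIDE - RANGE, w * SLIDE
    return [t for t in T if lo <= t["ts"] < hi]


def windows(T):
    """Window-ids (starting at 1) whose extent contains some tuple of T."""
    if not T:
        return set()
    last = (max(t["ts"] for t in T) + RANGE) // SLIDE
    return {w for w in range(1, last + 1) if extent(w, T)}


def wids(t, T):
    """Window-ids of the extents that tuple t belongs to (inverse of extent)."""
    return {w for w in windows(T) if t in extent(w, T)}


# Hypothetical sample tuples
T = [{"ts": 10}, {"ts": 130}, {"ts": 400}]
ids_for_130 = wids(T[1], T)
```

With RANGE = 300 and SLIDE = 60 each tuple falls into RANGE / SLIDE = 5 extents, which is why `extent` and `wids` are inverse views of the same membership relation.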

Page 12:

windows, extent

Queries in which RANGE and SLIDE are specified on the WATTR attribute:

Slide-by-tuple:

Page 13:

Slide-by-n_tuples:

Slide-by-n_tuples over logical order:

Partitioned tuple-based:

Page 14:

Mapping Tuples to Window-ids

wids: function for identifying the window extents to which a tuple t belongs.

Queries in which RANGE and SLIDE are specified on the WATTR attribute:

Slide-by-tuple (and variations):

$wids(t, T, S) = \{\, w \in W \mid t \in extent(w, T, S) \,\}$

$wids(t, T, S[RANGE, RATTR, 1, row\text{-}num]) = \{\, w \in W \mid (t.WATTR \ \ldots\ w \ \ldots\ t.RATTR \ \ldots\ RANGE) \,\}$
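As a worked illustration for the time-based case, suppose window $w$ covers the half-open interval $[w \cdot SLIDE - RANGE,\ w \cdot SLIDE)$ on WATTR. That interval convention is an assumption made here for the example, since the formula itself is not present in this text. Inverting the membership test gives the window-ids of a tuple directly, without enumerating extents:

```latex
\begin{align*}
t \in extent(w, T, S)
  &\iff w \cdot SLIDE - RANGE \le t.WATTR < w \cdot SLIDE \\
  &\iff \frac{t.WATTR}{SLIDE} < w \le \frac{t.WATTR + RANGE}{SLIDE}
\end{align*}
% Q1 numbers: RANGE = 300, SLIDE = 60, t.ts = 130
%   130/60 < w <= 430/60   =>   w \in \{3, 4, 5, 6, 7\}
```

Under this convention each tuple belongs to RANGE / SLIDE = 5 window extents for Q1.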

Page 15:

Partitioned tuple-based:

r = rank(t, row-num, PATTR, T)
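One plausible reading of rank, offered here only as an assumption, is the 1-based position of t among the tuples of T that share its PATTR value, ordered by row-num. The sketch below implements that reading; the dict-based tuples and attribute names are hypothetical, and row-num values are assumed to be unique.

```python
def rank(t, order_attr, part_attr, T):
    """1-based position of t among tuples of T with the same part_attr value,
    ordered by order_attr (an assumed reading of rank, not the paper's text)."""
    return sum(
        1
        for u in T
        if u[part_attr] == t[part_attr] and u[order_attr] <= t[order_attr]
    )


# Hypothetical sample tuples, partitioned by seg_id
T = [
    {"row_num": 1, "seg_id": "A"},
    {"row_num": 2, "seg_id": "B"},
    {"row_num": 3, "seg_id": "A"},
    {"row_num": 4, "seg_id": "A"},
]
ranks = [rank(t, "row_num", "seg_id", T) for t in T]
```

Under this reading only tuples at or before t in row-num order contribute to its rank.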

Page 16:

Towards Window Query Evaluation

Backward-context
  Given a tuple t, its backward-context is information about tuples that have arrived before t.
  Example: partitioned tuple-based window.
Forward-context
  Given a tuple t, its forward-context is information about tuples that arrive after t.
  Example: slide-by-tuple.
FCF (forward-context free)
FCA (forward-context aware)

Page 17:

Disorder

Causes: merging unsynchronized streams, network delays.
  Example: network flow records sometimes use the flow start time as the timestamp.
Methods: slack, BSort, heartbeats.
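Buffer-based handling of disorder can be sketched as a bounded reorder buffer. The generator below is a generic illustration in the spirit of such methods, not the actual definition of slack or BSort; the name `reorder` and the `capacity` parameter are assumptions.

```python
import heapq


def reorder(stream, capacity):
    """Yield timestamps in sorted order, provided no item arrives more than
    `capacity` positions late. Holds at most `capacity` items in a min-heap."""
    heap = []
    for ts in stream:
        heapq.heappush(heap, ts)
        if len(heap) > capacity:
            # The buffer is full: release the smallest timestamp seen so far.
            yield heapq.heappop(heap)
    # End of stream: drain what is left, smallest first.
    while heap:
        yield heapq.heappop(heap)


in_order = list(reorder([1, 3, 2, 5, 4, 6], capacity=2))
too_late = list(reorder([4, 5, 6, 1], capacity=1))
```

The buffer adds latency proportional to its capacity, and an item that arrives later than the bound allows is still emitted out of order, as the second call shows.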

Page 18:

FCF Window with WID Approach

Punctuation: a message embedded in a data stream indicating that a certain subset of data is complete.
  WID uses punctuations to signal the end of window extents.

[Figure: wids function, punctuation]
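The idea can be sketched for Q1 as a single Python generator: tag each tuple with its window-ids, keep one partial aggregate per (window-id, seg-id), and emit a window's results when a punctuation says the window is complete. This is a simplified sketch under assumptions, not the paper's operator design: window w is assumed to cover ts in [w*SLIDE - RANGE, w*SLIDE), timestamps are non-negative integers, and a punctuation `("punct", p)` is assumed to promise that no later tuple has ts < p. The message encoding and function names are hypothetical.

```python
RANGE = 300  # seconds, from Q1
SLIDE = 60   # seconds, from Q1


def wids(ts):
    """Window-ids for a timestamp, assuming window w covers
    [w*SLIDE - RANGE, w*SLIDE)."""
    return range(ts // SLIDE + 1, (ts + RANGE) // SLIDE + 1)


def wid_aggregate(stream):
    """Yield (wid, seg_id, max_speed, min_speed) as windows are punctuated.

    Stream items are ("tuple", seg_id, speed, ts) or ("punct", p), where the
    punctuation promises that no later tuple has ts < p (assumed encoding).
    """
    state = {}  # (wid, seg_id) -> (max_speed, min_speed)
    for item in stream:
        if item[0] == "tuple":
            _, seg_id, speed, ts = item
            for w in wids(ts):
                key = (w, seg_id)
                if key in state:
                    hi, lo = state[key]
                    state[key] = (max(hi, speed), min(lo, speed))
                else:
                    state[key] = (speed, speed)
        else:
            _, bound = item
            # Window w ends at w*SLIDE, so it is complete once w*SLIDE <= bound.
            done = sorted(k for k in state if k[0] * SLIDE <= bound)
            for key in done:
                hi, lo = state.pop(key)
                yield (key[0], key[1], hi, lo)


# Hypothetical input; the tuple with ts=20 arrives after the one with ts=70.
stream = [
    ("tuple", 1, 50, 10),
    ("tuple", 1, 70, 70),
    ("tuple", 1, 60, 20),
    ("punct", 120),
]
results = list(wid_aggregate(stream))
```

No tuples are buffered or sorted: arrival order within a window does not matter, and state is one partial aggregate per open (window-id, group) pair.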

Page 19:

FCA Windows with WID Approach

FCB (forward-context bounded)
FCU (forward-context unbounded)

Page 20:

Performance

Environment:
  Data generator: XMark data generator, and network analysis tool.
    1. Data in generated order.
    2. Data in bounded disorder.
    3. Data in block-sorted disorder.
  Comparison: buffering mechanism.

Page 21:

Parameters

R: RANGE
S: SLIDE

Page 22:

Result

WID vs. Buffering

Page 23:

Result

Page 24:

Conclusion