Semantics and Evaluation Techniques for Window Aggregates in Data Streams
Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, Peter A. Tucker
SIGMOD 2005
Introduction
Window aggregation is an important query capability.
Evaluating window aggregate queries over streams is non-trivial:
  Overlapping window extents.
  Confusion of the window definition with the physical stream.
  Out-of-order data arrival.
  …
Techniques
Window-ID (WID):
  Overlapping
  Confusion of the window definition with the physical stream
Punctuation:
  Out-of-order data arrival
Example 1
Q1: SELECT seg-id, max(speed), min(speed)
    FROM Traffic [RANGE 300 seconds
                  SLIDE 60 seconds
                  WATTR ts]
    GROUP BY seg-id
Example 1 (continued)
[Figure: label "tuple"]
Window Semantics
Window semantics has often been described operationally.
  Example: some window query operators process window extents sequentially, but data does not always arrive in window-extent order.
Window Specification
Window specification: a window type and a set of parameters that define the window to be used by a query.
  ex: RANGE, SLIDE and WATTR in Q1.
Different window aggregate queries have different window specifications.
  Sliding window aggregate query.
Stream query:
  Data-driven
  Domain-driven
Window Specification
Similar to CQL (Continuous Query Language).
Difference: user-specified WATTR and SLIDE parameters.
Sliding Window Aggregate
Time-based: Q1
Row-based:
RANGE and SLIDE on different attributes:
Sliding Window Aggregate
Partitioned window aggregate:
Using a function: a variation of Q3
Window Semantic Framework
Three functions for mapping between window-ids and tuples in both directions: windows, extent and wids.
  T: a set of tuples.
  S: a window specification.
  windows(T, S): the set of window-ids that identify window extents to which tuples in T may belong.
  extent(w, T, S): the set of tuples in T belonging to the window extent identified by w, where $w \in \mathit{windows}(T, S)$.
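To make windows and extent concrete, here is a minimal Python sketch for a time-based specification like the one in Q1. The window-id convention is an assumption made for illustration and is not taken from the slides: window w is assumed to cover WATTR values in [(w+1)·SLIDE − RANGE, (w+1)·SLIDE). Tuples are plain dicts with integer timestamps, and the field names are hypothetical.

```python
def windows(T, RANGE, SLIDE, WATTR="ts"):
    """Window-ids whose extents may contain tuples of T.

    Assumed convention: window w covers
    [(w + 1) * SLIDE - RANGE, (w + 1) * SLIDE) on the WATTR attribute.
    """
    if not T:
        return set()
    lo = min(t[WATTR] for t in T)
    hi = max(t[WATTR] for t in T)
    # First window that can contain `lo`, last window that can contain `hi`.
    return set(range(lo // SLIDE, (hi + RANGE) // SLIDE))


def extent(w, T, RANGE, SLIDE, WATTR="ts"):
    """Tuples of T that fall inside the extent identified by window-id w."""
    end = (w + 1) * SLIDE
    return [t for t in T if end - RANGE <= t[WATTR] < end]


# Hypothetical sample data shaped like the Traffic stream of Q1.
T = [
    {"seg_id": 1, "ts": 10, "speed": 50},
    {"seg_id": 1, "ts": 70, "speed": 60},
    {"seg_id": 2, "ts": 130, "speed": 40},
]
print(sorted(windows(T, RANGE=300, SLIDE=60)))                  # [0, 1, 2, 3, 4, 5, 6]
print([t["ts"] for t in extent(1, T, RANGE=300, SLIDE=60)])     # [10, 70]
```

Both functions depend only on the specification and the tuple contents, not on the order in which tuples arrive.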
windows, extent
Queries in which RANGE and SLIDE are specified on the WATTR attribute:
slide-by-tuple:
slide-by-n_tuples:
slide-by-n_tuples over logical order:
partitioned tuple-based:
Mapping Tuples to Window-ids
wids: function for identifying the window extents to which a tuple t belongs.
Queries in which RANGE and SLIDE are specified on the WATTR attribute:
slide-by-tuple (and variations):
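$\mathit{wids}(t, T, S) = \{\, w \in W \mid t \in \mathit{extent}(w, T, S) \,\}$
$\mathit{wids}(t, T, S[\mathrm{RANGE}, \mathrm{RATTR}, 1, \text{row-num}]) = \{\, w \in W \mid (t.\mathrm{WATTR} \;\ldots\; w \;\ldots\; t.\mathrm{RATTR} \;\ldots\; \mathrm{RANGE}) \,\}$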
Partitioned tuple-based:
r = rank(t, row-num, PATTR, T)
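As an illustration of the tuple-to-window direction, the sketch below computes wids for a time-based window such as Q1's (RANGE 300, SLIDE 60). It rests on an assumed window-id convention that does not come from the slides: window w covers WATTR values in [(w+1)·SLIDE − RANGE, (w+1)·SLIDE). The helper in_extent and the field names are hypothetical and exist only to check the result.

```python
def wids(t, RANGE, SLIDE, WATTR="ts"):
    """Window-ids of the extents that contain tuple t.

    Assumed convention: window w covers
    [(w + 1) * SLIDE - RANGE, (w + 1) * SLIDE) on the WATTR attribute.
    Only t itself is inspected; no other tuples are consulted.
    """
    v = t[WATTR]
    return list(range(v // SLIDE, (v + RANGE) // SLIDE))


def in_extent(t, w, RANGE, SLIDE, WATTR="ts"):
    """Membership test derived from the same assumed convention."""
    end = (w + 1) * SLIDE
    return end - RANGE <= t[WATTR] < end


t = {"seg_id": 7, "ts": 130, "speed": 55}
ids = wids(t, RANGE=300, SLIDE=60)
print(ids)  # [2, 3, 4, 5, 6]
print(all(in_extent(t, w, 300, 60) for w in ids))  # True
```

With RANGE a multiple of SLIDE, each tuple lands in RANGE / SLIDE window extents (five for Q1), which is the overlap that makes window aggregation non-trivial.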
Towards Window Query Evaluation
Backward-context
  Given a tuple t, its backward-context is information about tuples that have arrived before t.
  ex: partitioned tuple-based window.
Forward-context
  Given a tuple t, its forward-context is information about tuples that arrive after t.
  ex: slide-by-tuple.
FCF (forward-context free)
FCA (forward-context aware)
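A small sketch of why a partitioned tuple-based window needs only backward-context: the rank of a tuple within its partition can be computed from a per-partition counter over the tuples seen so far. The exact semantics of rank(t, row-num, PATTR, T) is assumed here to be "position of t among the tuples sharing its PATTR value, in arrival order"; the class name and field names are hypothetical.

```python
from collections import defaultdict


class PartitionRank:
    """Running rank of each tuple within its partition (backward-context only)."""

    def __init__(self, pattr="seg_id"):
        self.pattr = pattr
        self._seen = defaultdict(int)  # partition value -> tuples seen so far

    def rank(self, t):
        """Return the 1-based position of t among tuples of its partition."""
        key = t[self.pattr]
        self._seen[key] += 1
        return self._seen[key]


ranker = PartitionRank()
stream = [{"seg_id": s} for s in (1, 2, 1, 1, 2)]
print([ranker.rank(t) for t in stream])  # [1, 1, 2, 3, 2]
```

The counter summarizes everything the operator must remember about earlier tuples, so the rank is available as soon as the tuple arrives.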
Disorder
Causes: merging unsynchronized streams, network delays.
  ex: network flows sometimes use the start time as the timestamp.
Methods: slack, BSort, heartbeats.
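The buffering idea behind such methods can be sketched with a generic bounded reorder buffer. This is not claimed to be the exact slack or BSort algorithm; it is a hypothetical illustration that holds up to `capacity` tuples in a min-heap and always releases the one with the smallest timestamp. The function and field names are assumptions.

```python
import heapq


def reorder(stream, capacity, key=lambda t: t["ts"]):
    """Yield tuples in timestamp order, assuming no tuple arrives
    more than `capacity` positions late."""
    heap = []
    for seq, t in enumerate(stream):
        # `seq` breaks ties so that dicts are never compared directly.
        heapq.heappush(heap, (key(t), seq, t))
        if len(heap) > capacity:
            yield heapq.heappop(heap)[2]
    while heap:  # end of input: flush whatever is still buffered
        yield heapq.heappop(heap)[2]


disordered = [{"ts": v} for v in (1, 3, 2, 5, 4, 6)]
print([t["ts"] for t in reorder(disordered, capacity=2)])  # [1, 2, 3, 4, 5, 6]
```

The cost of this style is visible in the sketch: every tuple is held back until `capacity` later tuples have arrived, which adds latency and memory even when the input is already in order.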
FCF Windows with WID Approach
Punctuation: a message embedded in a data stream indicating that a certain subset of data is complete.
WID uses punctuations to signal the end of window extents.
[Figure: labels "wids function", "punctuation"]
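The following Python sketch illustrates the general idea for Q1: each arriving tuple is tagged with its window-ids, partial max/min aggregates are kept per (window-id, seg-id), and a punctuation releases the windows it closes. It is a simplified illustration under stated assumptions, not the operators from the paper: window w is assumed to cover ts in [(w+1)·SLIDE − RANGE, (w+1)·SLIDE), a punctuation is assumed to carry a bound meaning "no more tuples with ts below this value", and all names are hypothetical.

```python
RANGE, SLIDE = 300, 60  # parameters of Q1


def wids(ts):
    """Window-ids containing timestamp ts (assumed convention, see lead-in)."""
    return range(ts // SLIDE, (ts + RANGE) // SLIDE)


class WidMaxMin:
    """max/min(speed) per (window-id, seg-id), with no tuple buffering."""

    def __init__(self):
        self.state = {}  # (wid, seg_id) -> [max_speed, min_speed]

    def on_tuple(self, seg_id, ts, speed):
        # Update every window the tuple belongs to; arrival order is irrelevant.
        for w in wids(ts):
            agg = self.state.get((w, seg_id))
            if agg is None:
                self.state[(w, seg_id)] = [speed, speed]
            else:
                agg[0] = max(agg[0], speed)
                agg[1] = min(agg[1], speed)

    def on_punctuation(self, ts_bound):
        """Assumed meaning: all tuples with ts < ts_bound have arrived.
        Emit and discard every window that ends at or before ts_bound."""
        closed = sorted(k for k in self.state if (k[0] + 1) * SLIDE <= ts_bound)
        return [(w, seg, *self.state.pop((w, seg))) for (w, seg) in closed]


op = WidMaxMin()
op.on_tuple(seg_id=1, ts=130, speed=55)
op.on_tuple(seg_id=1, ts=70, speed=60)   # arrives out of order
op.on_tuple(seg_id=2, ts=100, speed=40)
print(op.on_punctuation(180))
# [(1, 1, 60, 60), (1, 2, 40, 40), (2, 1, 60, 55), (2, 2, 40, 40)]
```

Because state is keyed by window-id rather than by arrival position, the out-of-order tuple at ts=70 is handled the same way as any other, and memory holds only partial aggregates for windows that are still open.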
FCA Windows with WID Approach
FCB (forward-context bounded)
FCU (forward-context unbounded)
Performance
Environment:
  Data generators: XMark data generator and a network analysis tool.
    1. Data in generated order.
    2. Data in bounded disorder.
    3. Data in block-sorted disorder.
  Comparison: buffering mechanism.
Parameters
R: RANGE
S: SLIDE
Result
WID vs. Buffering
Result
Conclusion