Update-Pattern-Aware Modeling and Processing of Continuous Queries
Lukasz Golab
University of Waterloo, Canada
[email protected]
Joint work with M. Tamer Özsu
SIGMOD, June 2005 2 of 21 Lukasz Golab

Introduction
Relational algebra and queries
  Each operator consumes one or more relation instances and outputs a relation instance
Blocking computations
  Some operators have non-blocking variants (aggregation, join)

What is a continuous query?
Expression composed of non-blocking "relational" operators that operate on streams
  Streams may be bounded by sliding windows
Q(t) = answer of a continuous query Q at time t
     = output of the corresponding one-time relational query Q' whose inputs are the current states of the streams, windows, and tables referenced in Q
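
The definition of Q(t) can be illustrated with a minimal Python sketch. It assumes a time-based window that holds tuples with timestamps in (t - window_size, t]; the function names, the boundary convention, and the sample stream are illustrative assumptions, not part of the talk.

```python
def window_contents(stream, t, window_size):
    """Current state of a time-based sliding window at time t.

    Assumption: a tuple (ts, value) is in the window when
    t - window_size < ts <= t.
    """
    return [(ts, value) for ts, value in stream if t - window_size < ts <= t]


def one_time_sum(relation):
    """The one-time relational query Q': SUM over a relation instance."""
    return sum(value for _, value in relation)


def continuous_sum(stream, t, window_size):
    """Q(t): run Q' on the current state of the window at time t."""
    return one_time_sum(window_contents(stream, t, window_size))


# Hypothetical stream of (timestamp, value) pairs.
stream = [(1, 10), (2, 20), (4, 5), (7, 1)]
answers = {t: continuous_sum(stream, t, window_size=3) for t in (2, 4, 7)}
```

Re-running the one-time query at every t is only a way to state the semantics; an actual engine would maintain the answer incrementally.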

Example of a continuous query
[Figure: a SUM operator with its inputs and output]

What is an update pattern?
Update pattern does not refer to individual tuples
  stream = append-only
Update pattern refers to changes in the answer of a continuous query (insertions/deletions)
Deletions? Aren't streams append-only?
  Queries over an append-only database don't necessarily produce append-only output

Non-append-only output
Select stocks whose price this hour is greater than their price in the previous hour
  Company X   8am   $1.00
  Company X   9am   $1.50
  Company X   10am  $1.25
  Update pattern?
Select all stock prices reported in the last 5 minutes
  [Figure: timeline of arrivals 1 to 12]
  FIFO update pattern
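
The Company X quotes can be traced with a short Python sketch of the "price greater than the previous hour" query. The function name and the dictionary layout are assumptions made for illustration; only the three quotes come from the slide.

```python
def rising_stocks(prices_by_hour, hour):
    """Companies whose price at `hour` exceeds their price at `hour - 1`."""
    current = prices_by_hour.get(hour, {})
    previous = prices_by_hour.get(hour - 1, {})
    return {
        company
        for company, price in current.items()
        if company in previous and price > previous[company]
    }


# Quotes from the slide, keyed by hour of day.
prices_by_hour = {
    8: {"Company X": 1.00},
    9: {"Company X": 1.50},
    10: {"Company X": 1.25},
}

answers = {hour: rising_stocks(prices_by_hour, hour) for hour in (8, 9, 10)}
```

Company X enters the answer at 9am and leaves it at 10am, even though the input only ever received appended quotes.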

Monotonic queries
Query Q is monotonic (over an append-only database) if Q(t) ⊆ Q(t') for all t ≤ t'
Queries over sliding windows are non-monotonic because all of their results eventually expire as the windows slide forward
  Some queries are non-monotonic over an append-only database (stream)
    Stock quotes whose price is higher than last hour
  But others become non-monotonic due to windowing
    Select all stock quotes: monotonic without a window
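
The monotonicity condition can be checked on recorded answers with a small Python sketch. It assumes the answers Q(t) are available as sets listed in increasing time order; the function name and the sample snapshots are hypothetical.

```python
def is_monotonic(snapshots):
    """True if Q(t) is a subset of Q(t') for all t <= t'.

    `snapshots` lists the answer sets in increasing time order. Checking
    consecutive pairs is enough because the subset relation is transitive.
    """
    return all(
        earlier <= later for earlier, later in zip(snapshots, snapshots[1:])
    )


# Selecting all quotes from an unbounded stream only ever adds results.
select_all = [{"q1"}, {"q1", "q2"}, {"q1", "q2", "q3"}]

# The same selection over a window of two quotes drops the oldest one.
select_all_windowed = [{"q1"}, {"q1", "q2"}, {"q2", "q3"}]
```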

Problem definition
Motivation
  Two possible reasons for non-monotonic behaviour of continuous queries
Problem statement
  Divide non-monotonic queries into classes
  Analyze the update patterns of each class
  Use the knowledge of update patterns in query processing and optimization

Outline
Update patterns of sliding window queries
  Classification
Advantages of update-pattern awareness
  Modeling (query semantics)
  Processing (query execution)

Sliding window operators
When a tuple falls out of its window, it also expires from the output and from operator state
[Figure: a sliding-window DISTINCT and a join of windows over streams S1 and S2; labels: oldest, undo]

Calculating expiration times
Time-based windows: predictable expiration times
  Assign a timestamp, ts, upon arrival
  Expiration time = ts + window_size (FIFO)
  For joins: min(expiration times of the joined tuples)
    Predictable, but is it still FIFO?
Count-based windows, non-monotonic queries over infinite streams: unpredictable
  Expiration time depends on stream arrival rates or the data arriving on the stream, so negative tuples are needed
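
The two expiration rules for time-based windows can be worked through in a short Python sketch. The window size of 10, the function names, and the sample timestamps are assumptions chosen for illustration.

```python
WINDOW_SIZE = 10  # assumed window length, in the same units as timestamps


def expiration_time(ts, window_size=WINDOW_SIZE):
    """Expiration time of a base tuple: ts + window_size."""
    return ts + window_size


def join_expiration_time(*timestamps, window_size=WINDOW_SIZE):
    """A join result expires when its earliest-expiring input tuple expires."""
    return min(expiration_time(ts, window_size) for ts in timestamps)


def is_fifo(expirations):
    """True if results, listed in generation order, also expire in that order."""
    return all(a <= b for a, b in zip(expirations, expirations[1:]))


# Base tuples arriving at times 1, 4, 5, 6 expire in arrival order.
base = [expiration_time(ts) for ts in (1, 4, 5, 6)]

# A result generated at time 5 joins tuples with ts 4 and 5; a result
# generated at time 6 joins tuples with ts 1 and 6.
join_results = [join_expiration_time(4, 5), join_expiration_time(1, 6)]
```

Here the second join result is generated later but expires earlier (11 versus 14), which is why the join's expiration times stay predictable while its expiration order is no longer FIFO.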

Classification of update patterns
Monotonic: answers never expire
  selection, join, duplicate elimination, over infinite streams
Weakest non-monotonic: answers expire in FIFO order, negative tuples are not necessary
  operators over time-based windows that don't reorder incoming tuples during processing
Weak non-monotonic: order is not FIFO, but negative tuples are not needed
  time-based window join, duplicate elimination
Strict non-monotonic: unpredictable expiration order
  negation, queries over count-based windows
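
The four classes can be read as a decision over three properties of a query's answer. The Python sketch below encodes that reading; the enum, the function, and its boolean parameters are hypothetical names, not an interface from the talk.

```python
from enum import Enum


class UpdatePattern(Enum):
    MONOTONIC = "monotonic"
    WEAKEST = "weakest non-monotonic"
    WEAK = "weak non-monotonic"
    STRICT = "strict non-monotonic"


def classify(results_expire, expiration_predictable, fifo_order):
    """Map three answer properties to an update-pattern class.

    results_expire: do answers ever leave the result?
    expiration_predictable: is each expiration time known when the result
        is generated (so no negative tuples are needed)?
    fifo_order: do results expire in the order they were generated?
    """
    if not results_expire:
        return UpdatePattern.MONOTONIC
    if not expiration_predictable:
        return UpdatePattern.STRICT
    return UpdatePattern.WEAKEST if fifo_order else UpdatePattern.WEAK


# Examples taken from the classification above.
selection_over_stream = classify(False, True, True)
selection_over_time_window = classify(True, True, True)
time_window_join = classify(True, True, False)
negation = classify(True, False, False)
```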

Outline
Update patterns of sliding window queries
  Classification
Advantages of update-pattern awareness
  Modeling (query semantics)
  Processing (query execution)

Update-pattern-aware semantics of continuous queries
How are updates of relational tables different from insertions and deletions caused by the movement of the windows?
  Join of two infinite streams is monotonic
  Join of two windows is weak non-monotonic
  Join of a window and a table: weakest (easier), weak (same), or strict non-monotonic (harder)?

Update-pattern-aware modeling of continuous queries, cont.
Harder: allow arbitrary table updates
  Strict non-monotonic because we can't predict when and how the table will be changed
Easier: don't allow retroactive updates
  Non-retroactive relation (NRR): table updates don't affect previously arrived stream tuples
  Weakest non-monotonic

Example
Stream: stock quotes
Table: mapping between stock symbols and company names
Query: select company name and price over a (time-based) window
  Company goes bankrupt: delete its previous quotes (relation) or not (NRR)
  Company changes name: update the name in previous quotes (relation) or not (NRR)
  New company: no prior stock quotes
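
The difference between the two table semantics can be sketched in Python. The event encoding, function names, and company names are assumptions for illustration, and window expiration is left out to keep the contrast visible.

```python
def apply_table_update(table, symbol, name):
    """Insert or rename a company; a name of None deletes it."""
    if name is None:
        table.pop(symbol, None)
    else:
        table[symbol] = name


def nrr_join(events):
    """NRR semantics: each quote joins with the table as of its arrival."""
    table, output = {}, []
    for kind, symbol, payload in events:
        if kind == "table":
            apply_table_update(table, symbol, payload)
        elif symbol in table:
            output.append((table[symbol], payload))
    return output


def relation_join(events):
    """Relation semantics: all quotes join with the current table state."""
    table, quotes = {}, []
    for kind, symbol, payload in events:
        if kind == "table":
            apply_table_update(table, symbol, payload)
        else:
            quotes.append((symbol, payload))
    return [(table[s], price) for s, price in quotes if s in table]


# Hypothetical history: a company is listed, quoted, renamed, quoted again.
events = [
    ("table", "X", "Acme"),
    ("quote", "X", 1.00),
    ("table", "X", "Acme Corp"),
    ("quote", "X", 1.50),
]
bankrupt = events + [("table", "X", None)]
```

Under NRR semantics the first quote keeps the old name and both quotes survive the bankruptcy; under relation semantics the rename rewrites the earlier result and the deletion removes both.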

Update-pattern-aware query processing
Annotate query plan with update patterns of each sub-query
Use appropriate data structures for storing state
Use appropriate physical operators
[Figure: DISTINCT operator state partitioned by expiration time, with Insert and Delete positions; labels: strict non-monotonic, weakest or weak non-monotonic]
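
One way to picture "appropriate data structures for storing state" is the Python sketch below. It is an assumption-laden illustration: the class names are hypothetical, and a binary heap stands in for the partition-by-expiration-time structure named in the figure.

```python
import heapq
from collections import deque


class FifoState:
    """State for weakest non-monotonic input: results expire in arrival order."""

    def __init__(self):
        self._queue = deque()

    def insert(self, expiration, result):
        self._queue.append((expiration, result))

    def expire(self, now):
        """Remove and return every result whose expiration time has passed."""
        expired = []
        while self._queue and self._queue[0][0] <= now:
            expired.append(self._queue.popleft()[1])
        return expired


class ExpirationOrderedState:
    """State for weak non-monotonic input: keep results ordered by expiration."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so results themselves are never compared

    def insert(self, expiration, result):
        heapq.heappush(self._heap, (expiration, self._counter, result))
        self._counter += 1

    def expire(self, now):
        """Remove and return expired results, earliest expiration first."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[2])
        return expired


weak = ExpirationOrderedState()
weak.insert(14, "r1")  # generated first, expires later
weak.insert(11, "r2")  # generated second, expires first
```

Both structures find expired results without scanning the whole state. Strict non-monotonic input would additionally need lookup by value, since a negative tuple can delete any stored result.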

Update-pattern-aware query optimization
Cost model
  Per-unit-time cost of executing operators, maintaining state, and processing negative tuples
Update-pattern-aware heuristic
  Strict NM pull-up, weakest NM push-down
  Operator and state implementations are simpler with weakest and weak NM

Update-pattern-aware query optimization, cont.
[Figure: two alternative query plans over Stream 1, Stream 2, and Stream 3, with edges labeled WKS, WK, and STR]

Summary
Monotonic vs. non-monotonic classification is not precise enough
  Fails to distinguish between predictable (due to windowing) and unpredictable update patterns
Our update-pattern classification
  Clarifies the semantics of continuous queries that reference tables alongside streams/windows
  Forms the basis of our update-pattern-aware query processor

Future work
Extend update-pattern-aware query optimization
Investigate the update patterns of periodically re-executed queries
Sub-divide queries over count-based windows
  For now, strict non-monotonic