Aurora – system architecture

Pawel Jurczyk

Currently used DB systems

• Classical DBMS:
  – Passive repository storing data (HADP: human-active, DBMS-passive model)
  – Only the current state of data is important
  – Data is synchronized; queries have exact answers (no support for approximation)
• Monitoring applications are difficult to implement in a traditional DBMS
  – Triggers do not scale past a few triggers per table
  – Problems with getting required data from historical time series
  – Development of dedicated middleware is expensive
• Conclusion: these systems are ill suited for applications that alert a human when an abnormal situation occurs (expected DAHP model: DBMS-active, human-passive)

Aurora – main assumptions

• Data comes from various, uniquely identified data sources (data streams)

• Each incoming tuple is timestamped
• Aurora is expected to process incoming streams
• Tuples are transferred through a loop-free, directed graph
• Outputs from the system are presented to applications
• Maintains historical storage

Aurora system overview

[Figure: Aurora system overview; labels: Input data streams, Output data, Applications, Queries, Storage]

• Any box can filter a stream (select operation)
• A box can compute stream aggregates by applying an aggregate function across a window of values in the stream
• The output of any box can be an input for several other boxes (split operation)
• Each box can gather tuples from many inputs (union operation)
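
The four box operations listed above can be illustrated with a minimal sketch. It assumes tuples are Python dicts with a "ts" timestamp field; the function names (filter_box, aggregate_box, split_box, union_box) and the sample readings are hypothetical and are not Aurora's actual operator interface.

```python
import heapq
import itertools
from collections import deque


def filter_box(stream, predicate):
    """Select: pass through only tuples that satisfy the predicate."""
    return (tup for tup in stream if predicate(tup))


def aggregate_box(stream, field, window_size, aggregate):
    """Apply an aggregate function across a sliding window of values."""
    window = deque(maxlen=window_size)
    for tup in stream:
        window.append(tup[field])
        if len(window) == window_size:
            yield aggregate(window)


def split_box(stream, fan_out):
    """Split: make one output readable by several downstream boxes."""
    return itertools.tee(stream, fan_out)


def union_box(*streams):
    """Union: gather tuples from many inputs, merged by timestamp.

    Assumes every input is already ordered by its "ts" field.
    """
    return heapq.merge(*streams, key=lambda tup: tup["ts"])


# Hypothetical sensor readings used only for illustration.
readings = [{"ts": t, "temp": v} for t, v in enumerate([20, 35, 40, 22, 38])]
to_filter, to_average = split_box(iter(readings), 2)
hot = [tup["temp"] for tup in filter_box(to_filter, lambda tup: tup["temp"] > 30)]
averages = list(aggregate_box(to_average, "temp", 2, lambda w: sum(w) / len(w)))
print(hot)       # temperatures above 30
print(averages)  # mean of each pair of consecutive readings
```

Generators are used here so that each box consumes tuples one at a time, which loosely mirrors the data flow mode of the network.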

Aurora query model

• Each connection point (CP) and view should have a persistence specification (e.g. "keep data for 2 hr")
• Each output is associated with a QoS specification (helps to allocate the processing elements along the path)

[Figure: Aurora query network of boxes b1 to b7 with outputs to applications (Appl). Labels: connection points; Storage S1, S2, S3; persistence specifications ("Keep 2 hr"); QoS specs; continuous query; view; ad-hoc query]

Queries in Aurora

• Continuous queries
  – Query continuously processes tuples
  – Output tuples are delivered to an application
• Ad-hoc queries
  – System will process data and deliver the answer starting from the earliest time stored in the connection point
  – Semantics are the same as for a continuous query that started execution at t_now - (persistence specification)
  – Query continues until explicit termination
• Views
  – Similar to materialized or partially materialized views in classical DB systems
  – Application may connect to the end of this path whenever there is a need

Connection points

• Support for dynamic modification of the network
• Support for data caching (persistence specification), which is helpful for ad-hoc queries
• A connection point without an upstream node can be used as a stored data set (as in a classical DBMS)
• Tuples from a connection point can be pushed through the system (e.g. when the connection point is "materialized" and stored tuples are passed as a stream to the downstream nodes)
• Alternatively, a downstream node can pull the data (helpful in the execution of filtering or joining operations)

Optimization in the Aurora - problems

• Many changes in the network over time

• The need to deal with a large number of boxes

• The system operates in a data flow mode

• Optimization issues address different needs than classical DBMS

Optimization of continuous queries

• Optimization is done at run time
• Aurora starts execution of an unoptimized network
• Optimization is performed step by step for portions of the network (subnetworks)
• First, hold all input messages for the selected subnetwork and drain it of messages
• Then, optimize the selected subnetwork
  – Insert projections (get rid of unneeded attributes of tuples as soon as possible)
  – Combine boxes (e.g. projection with filtering)
  – Reorder boxes (e.g. filtering can be pushed down the query tree through a join)
• Finally, stop holding input messages
• The optimizer cycles periodically through all subnetworks (it is a background task)

Optimization of continuous queries - details

• Each box has:
  – c(b): execution cost
  – s(b): selectivity, the expected number of output tuples per input tuple
• Amount of processing for two successive boxes (bi followed by bj, as in the figure): c(bi) + c(bj) * s(bi)
• Boxes are in the right order if: (1 - s(bj)) / c(bj) < (1 - s(bi)) / c(bi)
• Let’s check the condition above for bi and bj:
  – (1 - 0.5) / 1 < (1 - 5) / 4, i.e. 0.5 < -4/4 = -1: FALSE
  – The condition is not satisfied, so we should change the order of the boxes

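[Figure: query network over streams S(A) and T(B, C). Boxes shown: Filter (A>2, A<4), Filter (B>2, B<4), Join (A=B) and Filter (C > 0). The two successive boxes are labeled bi: c=4; s=5 (the join) and bj: c=1; s=0.5 (the filter that follows it)]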
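
The ordering check for bi and bj can be reproduced with a short sketch. The cost and selectivity values are taken from the slide; the function names and the dict representation of a box are assumptions made for illustration only.

```python
def pipeline_cost(first, second):
    """Processing per input tuple for `first` followed by `second`:
    c(first) + c(second) * s(first)."""
    return first["c"] + second["c"] * first["s"]


def in_right_order(bi, bj):
    """Ordering condition from the slide for bi followed by bj."""
    return (1 - bj["s"]) / bj["c"] < (1 - bi["s"]) / bi["c"]


bi = {"c": 4, "s": 5}    # values from the slide (the join)
bj = {"c": 1, "s": 0.5}  # values from the slide (the filter)

print(pipeline_cost(bi, bj))   # current order: 4 + 1 * 5
print(pipeline_cost(bj, bi))   # swapped order: 1 + 4 * 0.5
print(in_right_order(bi, bj))  # condition from the slide, False here
```

With these numbers the current order costs 9 units per input tuple and the swapped order costs 3, which agrees with the conclusion that the boxes should be reordered.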

Optimization of ad-hoc queries

• Each ad-hoc query is attached to a connection point; it runs on all the historical data stored in that connection point
• The connection point keeps historical data as a B-tree
• First, the ‘historical part’ of the ad-hoc query (the successor(s) of the connection point) is examined
  – Filter boxes that are compatible with the B-tree storage key can use an indexed lookup
  – Joins can use merge-sort or indexed lookup; the cheapest one is chosen
• The rest of the query is optimized in the same way as continuous queries

Run-time architecture

[Figure: run-time architecture. Components shown: Inputs, Router, Outputs, Scheduler, Box Processors, Load Shedder, QoS Monitor, Storage Manager, queues Q1, Q2, Qi, Qj, Qn, Buffer Manager, Persistent Storage]

Run-time components

• Router
  – Routes tuples in the system
  – Forwards them either to outputs or to the storage manager
• Storage manager
  – Responsible for maintaining the box queues and managing the buffer
• Scheduler
  – Decides which box will be processed
• Box processor
  – Executes the appropriate operation
  – Forwards output to the router
• QoS monitor
  – Observes outputs and activates the load shedder
• Load shedder
  – Sheds load until the performance reaches an acceptable level

QoS

• Optimization is based on an attempt to maximize the perceived QoS for the outputs
• Basically, QoS is a function of:
  – Response times (production of output tuples)
  – Tuple drops
  – Values produced (importance of produced values)
• The administrator specifies QoS graphs for an output based on one or more of the functions mentioned above
• Other types of QoS functions can be defined too
• The administrator defines headroom for the system (the percentage of computing resources that can be used by Aurora)

QoS graphs

• Graphs are expected to be normalized
• Graphs should allow a properly sized network to operate with all outputs in a ‘good zone’
• Graphs should be convex (the value-based graph is an exception)

[Figure: three QoS graphs, each with a vertical axis from 0 to 1, plotted against delay, % tuples delivered, and output value; a ‘good zone’ is marked]

Aurora Storage Manager (ASM) – Queues management

• Windowed operations (e.g. aggregations) require historical collection of tuples

• Tuples may accumulate in various places when network is saturated

• There is one queue at the output of each box; this queue is shared by all successor boxes

• Queues are stored in memory and on disk
• Queues may change length
• The scheduler and the ASM share information about scheduling priority and the percentage of each queue held in main memory

[Figure: queue organization. A queue of tuples along a time axis, with boxes b1 and b2 and the processed tuples marked]

Aurora Storage Manager (ASM) – Connection point management

• If the amount of historical data needed in the CP is less than the maximal window size of the successor boxes, no extra storage is needed

• Historical data is organized in B-trees based on the storage key (default: timestamp)

• Periodically, all tuples that are older than the history requirement are removed from the B-tree

• B-trees are stored in the space allocated by the ASM
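
The periodic removal of old tuples can be sketched as follows. This is only an illustration: a sorted Python list searched with bisect stands in for the B-tree, the timestamp is used as the storage key, and the class and method names are hypothetical.

```python
import bisect


class ConnectionPointHistory:
    """Toy history store keyed by timestamp (sorted list instead of a B-tree)."""

    def __init__(self, keep_seconds):
        self.keep_seconds = keep_seconds  # persistence specification
        self.timestamps = []              # kept sorted in ascending order
        self.tuples = []                  # parallel to self.timestamps

    def append(self, timestamp, tup):
        # Insert in key order so range lookups stay cheap.
        pos = bisect.bisect_right(self.timestamps, timestamp)
        self.timestamps.insert(pos, timestamp)
        self.tuples.insert(pos, tup)

    def prune(self, now):
        """Remove all tuples older than the history requirement."""
        cutoff = now - self.keep_seconds
        pos = bisect.bisect_left(self.timestamps, cutoff)
        del self.timestamps[:pos]
        del self.tuples[:pos]

    def since(self, start):
        """Return stored tuples with timestamp >= start (indexed lookup)."""
        pos = bisect.bisect_left(self.timestamps, start)
        return self.tuples[pos:]


history = ConnectionPointHistory(keep_seconds=2 * 3600)  # "keep 2 hr"
history.append(0, "a")
history.append(3600, "b")
history.append(9000, "c")
history.prune(now=9000)  # drops everything older than 9000 - 7200 = 1800
print(history.tuples)
```

An ad-hoc query attached to this connection point would then read from the earliest timestamp that survived pruning.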

Scheduling in Aurora

• The scheduler (and Aurora) aims to reduce the overall tuple execution cost
• It exploits two nonlinearities in tuple processing
  – Interbox nonlinearity:
    • Minimize tuple thrashing (if buffer space is not sufficient, tuples have to be shuttled between memory and disk)
    • Avoid copying data from output to buffer (it is possible to bypass the ASM when one box is scheduled right after another)
  – Intrabox nonlinearity:
    • The cost of tuple processing may decrease as the number of available tuples in the queue increases (avoiding context switching, better optimization)

Scheduling in Aurora

• Aurora’s approach: (1) have as many tuples as possible in the queues, (2) process them at once (train scheduling), and (3) pass them to subsequent boxes without going to disk (superbox scheduling)
• Two goals: (1) minimize the number of I/O operations and (2) minimize the number of box calls per tuple
• How does it work?
  – An output is selected for execution
  – The first downstream box with its queue in memory is found
  – Then upstream boxes are considered: as many upstream boxes as possible with non-empty queues in memory are found
  – The sequence of boxes found can be scheduled one after another
  – The storage manager is notified to keep all the queues of the selected boxes in memory during the execution

Priorities assignment in Scheduler

• The waiting delay of tuples (a part of the latency of each output) is a function of scheduling
• The goal of the scheduler: to assign priorities to box outputs so as to maximize the overall QoS
• The scheduler’s approach is divided into two aspects:
  – State-based analysis that assigns priorities to outputs and picks the output with the highest utility for scheduling
  – Feedback-based analysis that observes the overall system and increases the priorities of outputs that are not doing well

Scheduler – execution overhead

[Figure: bar chart of time (ms), axis from 0 to 300, split into execution costs and scheduling overhead, for three scheduling approaches: tuple at a time, trains, superboxes]

Prediction of overload situations

• Static analysis
  – The goal: determine whether the hardware running the network is sized correctly
  – Each box has a processing cost c(b) and a selectivity s(b)
  – Each input has a tuple production rate r(d)
  – Analysis starts from each data source and continues downstream
  – The system is stable when: 1/c(bi) ≥ r(di)
  – The output rate from bi is: min(1/c(bi), r(di)) * s(bi)
  – Iterating the steps above gives the output data rate and computational requirements for each box
  – The required computational resources can then be predicted


[Figure: example network over streams S(A, B, C) and T(B, C) with input rates rs=100t/s and rt=100t/s. Boxes shown: Filter (A>2, A<4) and Filter (B>2, B<4), labeled b1 and b2 (each c=0.05s; s=0.1); Join (A=B), labeled b3 (c=0.1s; s=5); Filter (C > 0), labeled b4 (c=0.05s; s=0.5)]

• b1: (1/0.05)t/s ≥ 100t/s (not true!)
• Output stream: min((1/0.05)t/s, 100t/s) * 0.1 = 2t/s
• b2: (1/0.05)t/s ≥ 100t/s (not true!), with the same output stream of 2t/s
• b3: (1/0.1)t/s ≥ (2 + 2)t/s (true)
• Output stream: min((1/0.1)t/s, 4t/s) * 5 = 20t/s
• b4: (1/0.05)t/s ≥ 20t/s (true)
• Needed computation: 100t/s+100t/s+2t/s+2t/s+20t/s+10t/s=234t/s
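
The static analysis of this example can be replayed with a small sketch. The costs, selectivities and input rates are the ones from the slide; the helper analyze_box and the way the network is wired by hand are assumptions for illustration, and a floating-point tolerance is added for the boundary case where capacity equals the input rate.

```python
import math


def analyze_box(cost, selectivity, input_rate):
    """Return (stable, output_rate) for one box.

    stable:      1/c(b) >= r(d)
    output_rate: min(1/c(b), r(d)) * s(b)
    """
    capacity = 1.0 / cost  # tuples per second the box can process
    stable = capacity >= input_rate or math.isclose(capacity, input_rate)
    output_rate = min(capacity, input_rate) * selectivity
    return stable, output_rate


rs = rt = 100.0  # input rates of streams S and T (t/s)

b1_ok, b1_out = analyze_box(0.05, 0.1, rs)
b2_ok, b2_out = analyze_box(0.05, 0.1, rt)
b3_ok, b3_out = analyze_box(0.1, 5, b1_out + b2_out)  # join sees both filter outputs
b4_ok, b4_out = analyze_box(0.05, 0.5, b3_out)

# Sum of rates as computed on the slide.
needed = rs + rt + b1_out + b2_out + b3_out + b4_out

for name, ok, out in [("b1", b1_ok, b1_out), ("b2", b2_ok, b2_out),
                      ("b3", b3_ok, b3_out), ("b4", b4_ok, b4_out)]:
    print(f"{name}: stable={ok}, output rate={out:g} t/s")
print(f"needed computation: {needed:g} t/s")
```

The sketch flags b1 and b2 as the boxes that cannot keep up with their inputs, matching the "not true!" results in the worked example.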


• Run-time analysis
  – Helps to deal with input rate spikes
  – Uses delay-based QoS information
  – If many tuples are outside the ‘good zone’, an overload is likely

Load shedding

• Reaction to overload
• The load shedding process relies on QoS information
• Load shedding by dropping tuples
  – Drop is a system-level operator that randomly drops tuples from a stream at a specified rate
  – The drop box is located as far upstream as possible
  – As a result of static analysis
    • Tuples are dropped on network branches that terminate in more tolerant outputs
    • Algorithm: (1) choose the output with the smallest negative slope in the tuple-drops graph, (2) move horizontally along this curve until there is another output with a smaller negative slope at this point, (3) this horizontal difference is an indication of the output tuple drop rate
  – As a result of dynamic analysis
    • A similar algorithm is used
    • Delay-based graphs can be used
    • Tuples are dropped on branches that terminate in higher-priority outputs (otherwise it would be ineffective)
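
A drop box of the kind described above can be sketched in a few lines. This is a hypothetical illustration, not Aurora's operator: the name drop_box, the drop_rate parameter (a fraction between 0 and 1) and the optional seeded random generator are all assumptions.

```python
import random


def drop_box(stream, drop_rate, rng=None):
    """Randomly drop tuples from a stream at the given rate (0.0 to 1.0)."""
    rng = rng or random.Random()
    for tup in stream:
        # rng.random() is in [0.0, 1.0), so rate 0 keeps all and rate 1 drops all.
        if rng.random() >= drop_rate:
            yield tup


kept = list(drop_box(range(1000), drop_rate=0.5, rng=random.Random(42)))
print(len(kept))  # roughly half of the 1000 input tuples survive
```

Placing such a box as far upstream as possible means that dropped tuples consume no processing in any downstream box.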

Load shedding

• Load shedding by filtering tuples
  – Idea: remove less important tuples rather than randomly chosen ones
  – It uses value-based QoS information
  – A histogram is prepared that contains the frequency with which value ranges have been observed
  – The utility of each interval can then be calculated (multiply the frequency by the value of the value-based QoS function)
  – Backward interval propagation: Aurora picks the interval with the lowest utility and prepares a predicate for it that is used in a filter box
  – Forward interval propagation: a proper filter predicate is estimated and checked by trial and error
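
The utility calculation and the choice of the interval to shed can be sketched as follows. Several details are assumptions made only for illustration: the histogram is a dict from (low, high) intervals to observed frequencies, the value-based QoS function is evaluated at each interval's midpoint, and the function names and sample numbers are hypothetical.

```python
def interval_utilities(histogram, value_qos):
    """Utility of each interval: frequency * value-based QoS of the interval."""
    return {
        (low, high): freq * value_qos((low + high) / 2)
        for (low, high), freq in histogram.items()
    }


def lowest_utility_interval(histogram, value_qos):
    """Pick the interval whose tuples contribute the least utility."""
    utilities = interval_utilities(histogram, value_qos)
    return min(utilities, key=utilities.get)


def make_filter_predicate(interval):
    """Predicate for a filter box that keeps values outside the shed interval."""
    low, high = interval
    return lambda value: not (low <= value < high)


# Hypothetical histogram and QoS graph: higher output values are more important.
histogram = {(0, 10): 0.5, (10, 20): 0.3, (20, 30): 0.2}
value_qos = lambda v: v / 30

shed = lowest_utility_interval(histogram, value_qos)
keep = make_filter_predicate(shed)
print(shed, keep(5), keep(15))
```

In this example the most frequent interval is still the one chosen for shedding, because its values carry the least weight in the assumed QoS graph.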
