Making Every Bit Count in Wide Area Analytics

Ariel Rabkin

Joint work with: Matvey Arye, Siddhartha Sen, Michael J. Freedman, and Vivek Pai

Global Systems Have Global Data

The Rise of Big Distributed Data

• CDNs:
  – Akamai has ~20 million requests per second
  – CloudFlare has about 300 MB/s of logs; volume doubles every 4 months
• Sensor data (e.g., power grid, highways)
• Smart camera networks

Trends

[Figure: amount per dollar over time, for data volumes and for wide-area bandwidth]

Analyzing Low-rate Events is Easy

Server Crashed!

Alert me when server crashes!

High-rate Events can be Costly

Every minute, compute request counts by URL

[Diagram: two groups of request streams]
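The slide's query, "every minute, compute request counts by URL," is a tumbling-window count. A minimal Python sketch, assuming each request arrives as a (timestamp in seconds, URL) pair; `counts_by_minute` is a hypothetical name, not part of any system described here.

```python
from collections import Counter, defaultdict


def counts_by_minute(requests):
    """Group (timestamp_seconds, url) records into per-minute URL counts.

    Assumed input format: an iterable of (timestamp, url) pairs, with
    timestamps in seconds since some epoch.
    """
    windows = defaultdict(Counter)
    for ts, url in requests:
        minute = int(ts // 60) * 60  # start of the one-minute tumbling window
        windows[minute][url] += 1
    return dict(windows)


log = [
    (0, "www.mysite.com"),
    (12, "www.mysite.com"),
    (30, "www.yoursite.com"),
    (61, "www.mysite.com"),
]
result = counts_by_minute(log)
print(result)
```

Each distinct URL seen in a minute adds one entry to that minute's result, so the data a site must ship grows with the variety of URLs as well as with the time resolution.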

Backhaul has Bad Dynamics

Example: backhaul count of events every 5 minutes. The choice of summaries is made upfront, statically.

• Buyer’s remorse: Chose to collect unnecessary and expensive data

• Analyst’s remorse: Summaries insufficient for analysis. No way to retroactively get more data

Local Storage!

Every minute, compute request counts by URL

[Diagram: two groups of request streams, each feeding local aggregation and storage]
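To illustrate local aggregation and storage, here is a minimal sketch, assuming each site keeps per-second counts in memory; `LocalStore`, `ingest`, and `query` are hypothetical names, not JetStream's API. Because the counts stay at the site, a query can later be answered at a different granularity.

```python
from collections import Counter, defaultdict


class LocalStore:
    """Hypothetical per-site store: aggregates requests locally and keeps the
    results, so only query answers need to cross the wide area."""

    def __init__(self):
        self._by_second = defaultdict(Counter)  # second -> Counter(url -> count)

    def ingest(self, ts, url):
        """Aggregate one request into its one-second cell."""
        self._by_second[int(ts)][url] += 1

    def query(self, start, end, period):
        """Return URL counts for [start, end), rolled up into buckets of `period` seconds."""
        result = defaultdict(Counter)
        for sec, counts in self._by_second.items():
            if start <= sec < end:
                bucket = start + (sec - start) // period * period
                result[bucket].update(counts)  # Counter.update adds counts
        return dict(result)


store = LocalStore()
for ts, url in [(0, "a"), (5, "a"), (59, "b"), (60, "a"), (125, "b")]:
    store.ingest(ts, url)

per_minute = store.query(0, 180, 60)
print(per_minute)

# A revised query over the same stored data, at a coarser granularity.
whole = store.query(0, 180, 180)
print(whole)
```

The second query is answered from data already stored at the site, which a fixed summary backhauled upfront cannot offer.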

Challenge: Bandwidth Scarcity

I want the request count for every URL every second.

I can’t do that, Ari. That costs 100 MB/sec. You only have 12 MB/sec. Want to impose a rank cutoff, value cutoff, or change frequency?

Can I get the top 1000 URLs every second?

I can do that for 900 KB/sec.

Great, do it!
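The three options offered in this exchange can each be sketched as a transformation on windows of URL counts. A minimal sketch, assuming each window is a `collections.Counter` mapping URL to count; the function names are hypothetical.

```python
from collections import Counter


def rank_cutoff(counts, k):
    """Keep only the k most frequent URLs (e.g. "top 1000 URLs")."""
    return Counter(dict(counts.most_common(k)))


def value_cutoff(counts, threshold):
    """Keep only URLs whose count is at least `threshold`."""
    return Counter({url: c for url, c in counts.items() if c >= threshold})


def change_frequency(windows, factor):
    """Merge consecutive windows in groups of `factor` (e.g. 1 s -> 10 s)."""
    merged = []
    for i in range(0, len(windows), factor):
        total = Counter()
        for w in windows[i:i + factor]:
            total.update(w)
        merged.append(total)
    return merged


second = Counter({"/a": 50, "/b": 20, "/c": 3, "/d": 1})
top2 = rank_cutoff(second, 2)
frequent = value_cutoff(second, 5)
slower = change_frequency([second, second, Counter({"/e": 4})], 2)
print(top2, frequent, slower)
```

A rank cutoff bounds each window to at most k entries regardless of traffic, a value cutoff drops rare URLs, and changing frequency reduces how many windows are sent.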

Challenge: Varying Scarcity

[Figure: bandwidth needed vs. bandwidth available over time]

Can do: first aggregate over longer time periods, up to 30 seconds. Then only keep the top URLs.
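The reply describes a degradation policy: coarsen time first, up to 30 seconds, then truncate to the top URLs. A minimal sketch, assuming every (URL, count) entry costs a fixed `ENTRY_BYTES` to ship and assuming a ladder of window lengths `PERIODS`; all names and constants are hypothetical.

```python
from collections import Counter

ENTRY_BYTES = 100             # assumed cost of shipping one (url, count) entry
PERIODS = [1, 2, 5, 10, 30]   # assumed ladder of window lengths, in seconds


def coarsen(per_second, period):
    """Merge a list of one-second Counters into windows of `period` seconds."""
    windows = []
    for i in range(0, len(per_second), period):
        w = Counter()
        for c in per_second[i:i + period]:
            w.update(c)
        windows.append(w)
    return windows


def rate(windows, seconds):
    """Bytes per second needed to ship `windows` covering `seconds` of data."""
    return sum(len(w) for w in windows) * ENTRY_BYTES / seconds


def degrade(per_second, budget):
    """Lengthen the window first (up to 30 s); if still over budget, keep top URLs."""
    seconds = len(per_second)
    for period in PERIODS:
        windows = coarsen(per_second, period)
        if rate(windows, seconds) <= budget:
            return period, None, windows
    # Still too expensive at 30 s: pick k so that k entries per window fit.
    k = max(1, int(budget * seconds / (ENTRY_BYTES * len(windows))))
    return period, k, [Counter(dict(w.most_common(k))) for w in windows]


# One minute of traffic: each second, URL /uj is requested j + 1 times.
per_second = [Counter({f"/u{j}": j + 1 for j in range(10)}) for _ in range(60)]
loose = degrade(per_second, budget=250)
tight = degrade(per_second, budget=10)
print(loose[0], loose[1], tight[0], tight[1])
```

With a loose budget the policy only lengthens the window; with a tight one it reaches 30 seconds and then applies a rank cutoff.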

Data Processing Requirements
• Aggregatable
• Merge-able
• Reducible

[Diagram: Data + Data = Merged Representation; Stored Data + Data = Update]
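Summing counts is one aggregate that meets the merge-able and aggregatable requirements: partial results from two sources combine into the same answer as aggregating all the raw data, and stored data can be updated with new data. A minimal sketch with hypothetical function names; the median counter-example is an added illustration, not from the slides.

```python
import statistics
from collections import Counter


def aggregate(records):
    """Count records per key (the running example in these slides)."""
    return Counter(records)


def merge(a, b):
    """Merge two partial aggregates by adding counts key-wise."""
    return a + b


site1 = ["/a", "/b", "/a"]
site2 = ["/a", "/c"]

# Merge-able: combining partial results equals aggregating all the raw data.
assert merge(aggregate(site1), aggregate(site2)) == aggregate(site1 + site2)

# Aggregatable: stored data + new data = updated data.
stored = aggregate(site1)
updated = merge(stored, aggregate(["/b"]))
print(updated)

# Counter-example: the median of per-site medians is not the overall median.
x, y = [1, 2, 3], [10, 20]
print(statistics.median([statistics.median(x), statistics.median(y)]), statistics.median(x + y))
```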

                          Raw byte strings    Database tables
                          (e.g. MapReduce)
High-level API                   X                   √
Merge + Aggregate                X                   X
Predictable performance          √                   X
Arbitrary joins                  X                   √

The Data Cube Model

Counts by URL        12:00   12:01   12:02
www.mysite.com           3       5   …
www.yoursite.com         5       4   …
www.hersite.com          8      12   …

Roll-up of mysite.com by time from 12:00 to 12:01: 8

Roll-up of sites at time 12:00: 16

Cube: A multidimensional array, with one or more aggregates, indexed by a set of dimensions

Aggregation function used for:
• Updates
• Roll-ups
• Merging cubes
• Degrading cubes
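The cube above can be sketched as a dictionary from (URL, time) to a count, with one aggregation function (here, addition) used for updates, roll-ups, and merges. This minimal sketch reproduces the two roll-ups shown (8 and 16); it loads only the 12:00 and 12:01 columns because the 12:02 values are elided, and the `other` cube holds hypothetical data. The `Cube` class and its methods are illustrative names, not JetStream's API.

```python
class Cube:
    """Minimal data cube: one aggregate indexed by two dimensions (url, time)."""

    def __init__(self, agg=lambda a, b: a + b):
        self.agg = agg    # aggregation function, here a sum of counts
        self.cells = {}   # (url, time) -> aggregate value

    def update(self, url, time, value):
        """Fold a new value into one cell using the aggregation function."""
        key = (url, time)
        self.cells[key] = self.agg(self.cells[key], value) if key in self.cells else value

    def merge(self, other):
        """Merge another cube into this one, cell by cell, with the same function."""
        for (url, time), value in other.cells.items():
            self.update(url, time, value)

    def roll_up(self, urls=None, times=None):
        """Aggregate every cell whose url/time is in the given sets (None means all)."""
        result = None
        for (url, time), value in self.cells.items():
            if (urls is None or url in urls) and (times is None or time in times):
                result = value if result is None else self.agg(result, value)
        return result


cube = Cube()
table = {"www.mysite.com": [3, 5], "www.yoursite.com": [5, 4], "www.hersite.com": [8, 12]}
for url, row in table.items():
    for time, count in zip(["12:00", "12:01"], row):
        cube.update(url, time, count)

mysite_total = cube.roll_up(urls={"www.mysite.com"}, times={"12:00", "12:01"})
noon_total = cube.roll_up(times={"12:00"})
print(mysite_total, noon_total)  # 8 16

other = Cube()  # e.g. a cube from another site (hypothetical data)
other.update("www.mysite.com", "12:00", 2)
cube.merge(other)
merged_noon = cube.roll_up(urls={"www.mysite.com"}, times={"12:00"})
print(merged_noon)  # 5
```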

Data Cube

                          Raw byte strings    Database tables    Data Cube
                          (e.g. MapReduce)
High-level API                   X                   √               √
Merge + Aggregate                X                   X               √
Predictable performance          √                   X               √
Arbitrary joins                  X                   √               X

A Vision for Wide-Area Analytics

[Diagram: at each of two sites, dataflow operators feed a local cube, followed by more dataflow operators; both sites send across a network bottleneck into a merged cube, followed by dataflow operators]

Dataflow adapted to bandwidth
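The vision of local cubes feeding a merged cube across a bottleneck can be sketched with counts keyed by (URL, minute). A minimal sketch, assuming that coarsening the time dimension is the summary applied before data crosses the network; `local_cube`, `summarize`, and `merge` are hypothetical names.

```python
from collections import Counter


def local_cube(requests):
    """Ingest (timestamp, url) records into a site-local cube keyed by (url, minute)."""
    return Counter((url, int(ts // 60)) for ts, url in requests)


def summarize(cube, minutes_per_cell):
    """Coarsen the time dimension before sending, trading detail for bandwidth."""
    out = Counter()
    for (url, minute), count in cube.items():
        out[(url, minute // minutes_per_cell * minutes_per_cell)] += count
    return out


def merge(cubes):
    """Combine cubes arriving from several sites into one merged cube."""
    merged = Counter()
    for c in cubes:
        merged.update(c)
    return merged


site_a = local_cube([(0, "/x"), (70, "/x"), (10, "/y")])
site_b = local_cube([(5, "/x"), (130, "/y")])

ample = merge([site_a, site_b])                                # full detail
limited = merge([summarize(site_a, 5), summarize(site_b, 5)])  # coarser cells
print(ample)
print(limited)
```

With ample bandwidth the full cubes are merged; with limited bandwidth the summarized cubes yield fewer cells but the same totals.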

Adaptivity

[Diagram: dataflow operators, a local cube, and more dataflow operators, sending across a network bottleneck]

Adaptivity

[Diagram: dataflow operators feed a local cube; further dataflow operators produce a summarized cube that crosses the network bottleneck, with feedback control]

• Key ingredients:
  – Cube summarization as mechanism
  – User-defined policies
  – Feedback control
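One way to combine these three ingredients is a threshold controller that watches the backlog of data waiting to cross the bottleneck and moves along a user-defined ladder of summarization levels. This is a hedged sketch with hypothetical names and constants; a simple hysteresis controller is an assumed choice, not necessarily the one JetStream uses.

```python
LEVELS = [1, 5, 10, 30, 60]  # hypothetical policy: window lengths in seconds, finest first


class DegradationController:
    """Hypothetical threshold controller: coarsens the summary when the send
    backlog grows and refines it again once the backlog drains."""

    def __init__(self, high_water, low_water):
        self.level = 0                # index into LEVELS; 0 is the finest data
        self.high_water = high_water  # backlog (bytes) above which we coarsen
        self.low_water = low_water    # backlog (bytes) below which we refine

    def observe(self, backlog_bytes):
        """Take one backlog measurement and return the window length to use next."""
        if backlog_bytes > self.high_water and self.level < len(LEVELS) - 1:
            self.level += 1
        elif backlog_bytes < self.low_water and self.level > 0:
            self.level -= 1
        return LEVELS[self.level]


def simulate(bandwidth_per_step, base_rate=1000.0):
    """Toy simulation: output bytes per step shrink in proportion to window length."""
    ctrl = DegradationController(high_water=500, low_water=100)
    backlog, windows = 0.0, []
    for available in bandwidth_per_step:
        window = ctrl.observe(backlog)
        windows.append(window)
        produced = base_rate / window
        backlog = max(0.0, backlog + produced - available)
    return windows


# Bandwidth drops from 2000 to 300 bytes/step, then recovers.
windows = simulate([2000] * 3 + [300] * 6 + [2000] * 6)
print(windows)
```

The gap between the low and high thresholds keeps the controller from switching levels on every measurement.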

Backup Slides

Conclusions
• The hard problems in wide-area analysis:
  – Reasoning about bandwidth/data quality tradeoffs
  – Optimizing data quality under changing conditions
  – Jointly optimizing bandwidth and other resources
• We are building a system.
  – We call it JetStream. Stay tuned…

Bandwidth Costs do not Decline Smoothly

[TeleGeography's Global Bandwidth Research Service]

2012 Bandwidth Price Shifts

[Figure: 2012 bandwidth price changes; labels include Frankfurt-London and two 20% figures]

[TeleGeography's Global Bandwidth Research Service]

Diurnal Load Makes Overprovisioning Expensive

• Leased lines waste capacity during off-peak

• Public internet gets congested during peak

Benefit: Iteration

Can iteratively pose different queries

[Diagram: two groups of request streams, each feeding local aggregation and storage, answering a revised query]

Benefit: Adaptation

Can adapt data volume collected to available bandwidth

[Diagram: two groups of request streams, each feeding local aggregation and storage, over limited bandwidth]

Benefit: Adaptation

Can adapt data volume collected to available bandwidth

[Diagram: two groups of request streams, each feeding local aggregation and storage, over ample bandwidth]

A dataflow model for wide-area analytics

Operator: Defines a data transformation on tuples. Can do input or output.

Cube: Structured storage of data.
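The operator/cube split can be sketched with Python generators: an input operator yields tuples, a transformation operator filters them, and a cube stores the result. A minimal sketch; the log line format and all names (`log_source`, `filter_op`, `CountCube`) are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator

Record = tuple  # a tuple flowing between operators


def log_source(lines: Iterable[str]) -> Iterator[Record]:
    """Input operator: turns raw "timestamp url" log lines into tuples."""
    for line in lines:
        ts, url = line.split()
        yield (int(ts), url)


def filter_op(pred: Callable[[Record], bool]):
    """Transformation operator: keeps only tuples satisfying `pred`."""
    def op(tuples: Iterable[Record]) -> Iterator[Record]:
        return (t for t in tuples if pred(t))
    return op


class CountCube:
    """Structured storage: request counts indexed by URL."""

    def __init__(self):
        self.counts = Counter()

    def insert(self, tuples: Iterable[Record]) -> None:
        for _ts, url in tuples:
            self.counts[url] += 1


lines = ["0 /a", "1 /b", "2 /a", "3 /static.css"]
cube = CountCube()
drop_css = filter_op(lambda t: not t[1].endswith(".css"))
cube.insert(drop_css(log_source(lines)))
print(cube.counts)
```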

[Diagram: at each of two sites, processing feeds a source cube; data crosses a network bottleneck and arrives as processed data]

Generated data ingested into local cubes

Processed Data

Processing