IFLOW: Self-managing distributed information flows Brian Cooper Yahoo! Research Joint work with colleagues at Georgia Tech: Vibhore Kumar, Zhongtang Cai, Sangeetha Seshadri, Greg Eisenhauer, Karsten Schwan and others



Overview

Motivation
Case study: inTransit
  Architecture
  Flow graph deployment/reconfiguration
  Experiments
Other aspects of the system


Motivation

Lots of data produced in lots of places
  Examples: operational information systems, scientific collaborations, end-user systems, web traffic data


Airline example

[Figure: airline information flows. Labels: Flights arriving, Flights departing, Bags scanned, Customers check-in, Weather updates, Catering updates, Check seats, FAA updates, Rebook missed connections, Shop for flights, Concourse display, Gate display, Baggage display, Home user display]


Previous solutions

Tools for managing distributed updates
  Pub/sub middlewares
  Transaction Processing Facilities
  In-house solutions
Times have changed
  How to handle larger data volumes?
  How to seamlessly incorporate new functionality?
  How to effectively prioritize service?
  How to avoid hand-tuning the system?


Approach

Provide a self-managing distributed data flow graph

[Figure: example flow graph. Labels: Flight data, Weather data, Check-in data, Correlate flights and reservations, Select ATL data, Predict delays, Generate customer messages, Terminal or web]


Approach

Deploy operators in a network overlay
Middleware should self-manage this deployment
  Provide necessary performance, availability
  Respond to business-level needs


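IFLOW

[Figure: two applications running over the IFLOW middleware. Airline labels: WEATHER, FLIGHTS, COUNTERS, OVERHEAD-DISPLAY. Collaboration labels: Molecular Dynamics Experiment, Coordinates, Calculates Distance and Bonds, Radial Distance, Coordinates+Bonds, IPaq Client, X-Window Client, ImmersaDesk]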

AirlineFlowGraph {
  Sources -> {FLIGHTS, WEATHER, COUNTERS}
  Sinks -> {DISPLAY}
  Flow-Operators -> {JOIN-1, JOIN-2}
  Edges -> {(FLIGHTS, JOIN-1), (WEATHER, JOIN-1), (JOIN-1, JOIN-2), (COUNTERS, JOIN-2), (JOIN-2, DISPLAY)}
  Utility -> [Customer-Priority, Low Bandwidth Utilization]
}

IFLOW middleware

CollaborationFlowGraph {
  Sources -> {Experiment}
  Sinks -> {IPaq, X-Window, ImmersaDesk}
  Flow-Operators -> {Coord, DistBond, RadDist, CoordBond}
  Edges -> {(Experiment, Coord), (Coord, DistBond), (DistBond, RadDist), (DistBond, CoordBond), (RadDist, IPaq), (CoordBond, ImmersaDesk), (CoordBond, X-Window)}
  Utility -> [Low-Delay, Synchronized-Delivery]
}

[ICAC ’06]
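
As a rough illustration of what such a specification contains, the AirlineFlowGraph above can be transcribed into plain data and checked for consistency. This is a minimal sketch, not IFLOW's own representation: the dictionary layout and the `topological_order` helper are assumptions made for the example.

```python
from collections import deque

# AirlineFlowGraph from the slide, transcribed into plain Python data.
# The dictionary keys are a hypothetical layout, not an IFLOW format.
airline = {
    "sources": {"FLIGHTS", "WEATHER", "COUNTERS"},
    "sinks": {"DISPLAY"},
    "operators": {"JOIN-1", "JOIN-2"},
    "edges": [
        ("FLIGHTS", "JOIN-1"), ("WEATHER", "JOIN-1"),
        ("JOIN-1", "JOIN-2"), ("COUNTERS", "JOIN-2"),
        ("JOIN-2", "DISPLAY"),
    ],
    "utility": ["Customer-Priority", "Low Bandwidth Utilization"],
}


def topological_order(graph):
    """Return vertices in dataflow order; raise ValueError on a bad graph."""
    vertices = graph["sources"] | graph["sinks"] | graph["operators"]
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for src, dst in graph["edges"]:
        if src not in vertices or dst not in vertices:
            raise ValueError(f"edge ({src}, {dst}) uses an undeclared vertex")
        successors[src].append(dst)
        indegree[dst] += 1
    # Kahn's algorithm; sorted() only makes the order deterministic.
    ready = deque(sorted(v for v in vertices if indegree[v] == 0))
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for nxt in sorted(successors[v]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(vertices):
        raise ValueError("flow graph contains a cycle")
    return order


order = topological_order(airline)
```

The resulting order places every source before JOIN-1, JOIN-1 before JOIN-2, and DISPLAY last, which is the order in which data would move through the deployed operators.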


Case study: inTransit

Query processing over distributed event streams
Operators are streaming versions of relational operators


Architecture [ICDCS ’05]

[Figure: the inTransit Distributed Stream Management Infrastructure, built on IFLOW. Layers: application layer, middleware layer, underlay layer. Components shown: query, data-flow parser, flow-graph control, ECho pub-sub, PDS, Stones, messaging]


Application layer

Applications specify data flow graphs
  Can specify directly
  Can use SQL-like declarative language

STREAM N1.FLIGHTS.TIME, N7.COUNTERS.WAITLISTED, N2.WEATHER.TEMP
FROM N1.FLIGHTS, N7.COUNTERS, N2.WEATHER
WHEN N1.FLIGHTS.NUMBER='DL207'
  AND N7.COUNTERS.FLIGHT_NUMBER=N1.FLIGHTS.NUMBER
  AND N2.WEATHER.LOCATION=N1.FLIGHTS.DESTINATION;

[Figure: flow graph for this query over nodes N1, N2, N7 and N10, with a selection on 'DL207']
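
To make the semantics of the query above concrete, here is a small sketch of how it could be evaluated over interleaved events on a single machine. It is not inTransit's implementation: the event format (dictionaries tagged with a stream name) and the "keep only the latest record" window are assumptions chosen to keep the example short.

```python
def run_query(events, flight_number="DL207"):
    """Evaluate the slide's STREAM query over (stream_name, record) pairs.

    Assumed simplification: only the latest matching FLIGHTS and COUNTERS
    records and the latest WEATHER record per location are retained.
    """
    latest_flight = None
    latest_counter = None
    weather_by_location = {}
    for stream, rec in events:
        if stream == "FLIGHTS":
            if rec["NUMBER"] != flight_number:        # WHEN NUMBER='DL207'
                continue
            latest_flight = rec
        elif stream == "COUNTERS":
            if rec["FLIGHT_NUMBER"] != flight_number:  # join on flight number
                continue
            latest_counter = rec
        elif stream == "WEATHER":
            weather_by_location[rec["LOCATION"]] = rec
            # Weather for another location cannot change the output.
            if latest_flight is not None and rec["LOCATION"] != latest_flight["DESTINATION"]:
                continue
        else:
            continue
        # Emit an output tuple only when all three inputs can be joined.
        if latest_flight is None or latest_counter is None:
            continue
        weather = weather_by_location.get(latest_flight["DESTINATION"])
        if weather is None:
            continue
        yield (latest_flight["TIME"], latest_counter["WAITLISTED"], weather["TEMP"])


events = [
    ("WEATHER", {"LOCATION": "ATL", "TEMP": 75}),
    ("FLIGHTS", {"NUMBER": "DL100", "TIME": "09:00", "DESTINATION": "JFK"}),
    ("FLIGHTS", {"NUMBER": "DL207", "TIME": "10:15", "DESTINATION": "ATL"}),
    ("COUNTERS", {"FLIGHT_NUMBER": "DL207", "WAITLISTED": 4}),
    ("WEATHER", {"LOCATION": "JFK", "TEMP": 60}),
    ("WEATHER", {"LOCATION": "ATL", "TEMP": 78}),
]
results = list(run_query(events))
```

In the distributed setting the selection and the two joins would run as separate operators placed on different nodes; the sketch collapses them into one loop.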


Middleware layer

ECho – pub/sub event delivery
  Event channels for data streams
  Native operators
    E-code for most operators
    Library functions for special cases
Stones – operator containers
  Queues and actions

[Figure: a join operator (⋈) connecting Channel 1, Channel 2 and Channel 3]
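
The idea of event channels with operators attached between them can be sketched in a few lines. The `EventChannel` class and `derive` function below are hypothetical names invented for this illustration and are not the ECho or Stones API; the sketch also runs in one process, whereas the real middleware delivers events across the network.

```python
class EventChannel:
    """A named channel that pushes each submitted event to its subscribers."""

    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def submit(self, event):
        # Copy the list so handlers may subscribe others while we iterate.
        for handler in list(self._subscribers):
            handler(event)


def derive(source, sink, operator):
    """Attach an operator between two channels.

    operator(event) returns a transformed event, or None to drop it.
    """
    def on_event(event):
        result = operator(event)
        if result is not None:
            sink.submit(result)
    source.subscribe(on_event)


flights = EventChannel("flights")
atl_flights = EventChannel("atl_flights")
# A selection operator: forward only flights whose destination is ATL.
derive(flights, atl_flights, lambda e: e if e["dest"] == "ATL" else None)

received = []
atl_flights.subscribe(received.append)
flights.submit({"number": "DL207", "dest": "ATL"})
flights.submit({"number": "DL100", "dest": "JFK"})
```

Only the ATL event reaches the derived channel, so downstream subscribers never see traffic they would discard anyway.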


Middleware layer

PDS – resource monitoring
  Nodes update PDS with resource info
  inTransit notified when conditions change

[Figure: nodes reporting CPU information to PDS]


Flow graph deployment

Where to place operators?
Basic idea: cluster physical nodes


Flow graph deployment

Partition flow graph among coordinators
  Coordinators represent their cluster
  Exhaustive search among coordinators

[Figure: the 'DL207' query graph over nodes N1, N2, N7 and N10, with join operators (⋈) not yet assigned to coordinators]


Flow graph deployment

Coordinator deploys subgraph in its cluster
  Uses exhaustive search to find best deployment

[Figure: a join operator (⋈) awaiting placement inside a cluster]
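
An exhaustive search over placements inside one cluster might look like the sketch below. The cost model (sum of link delays along flow graph edges) and all names are assumptions for illustration; the slides do not state the exact objective used by the coordinator.

```python
from itertools import product


def best_placement(operators, nodes, edges, fixed, delay):
    """Try every operator-to-node assignment and keep the cheapest one.

    operators: operator names to place
    nodes:     candidate physical nodes in the cluster
    edges:     (upstream, downstream) pairs of the flow (sub)graph
    fixed:     already pinned vertices (sources, sinks) mapped to nodes
    delay:     delay[(a, b)] between two nodes; same-node delay is 0
    Assumed cost: total delay summed over all edges.
    """
    def link(a, b):
        if a == b:
            return 0
        return delay[(a, b)] if (a, b) in delay else delay[(b, a)]

    best_cost, best_map = float("inf"), None
    # len(nodes) ** len(operators) candidates: feasible only for small clusters.
    for choice in product(nodes, repeat=len(operators)):
        placement = dict(fixed)
        placement.update(zip(operators, choice))
        cost = sum(link(placement[u], placement[v]) for u, v in edges)
        if cost < best_cost:
            best_cost, best_map = cost, placement
    return best_cost, best_map


nodes = ["N1", "N2", "N10"]
delay = {("N1", "N2"): 10, ("N1", "N10"): 40, ("N2", "N10"): 15}
edges = [("FLIGHTS", "JOIN"), ("WEATHER", "JOIN"), ("JOIN", "DISPLAY")]
fixed = {"FLIGHTS": "N1", "WEATHER": "N2", "DISPLAY": "N10"}
cost, placement = best_placement(["JOIN"], nodes, edges, fixed, delay)
```

The number of candidates grows as nodes to the power of operators, which is one way to see why the search is confined to a cluster rather than run over the whole network.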


Flow graph reconfiguration

Resource or load changes trigger reconfiguration
Clusters reconfigure locally
Large changes require inter-cluster reconfiguration


Hierarchical clusters

Coordinators themselves are clustered
  Coordinators form a hierarchy
May need to move operators between clusters
  Handled by moving up a level in the hierarchy


What do we optimize?

Basic metrics
  Bandwidth used
  End-to-end delay
Autonomic metrics
  Business value
  Infrastructure cost

[Figure: business utility plotted against user priority and end-to-end delay]

[ICAC ’05]
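
A utility function of this kind could be sketched as follows. The functional form, the priority scale and the delay bound are all assumptions made for the example; the slides show only that utility depends on user priority and end-to-end delay, not how.

```python
def business_utility(priority, delay_ms, max_priority=10, max_delay_ms=50.0):
    """Hypothetical utility in [0, 1]: grows with priority, falls with delay."""
    if not 0 <= priority <= max_priority:
        raise ValueError("priority out of range")
    # Timeliness drops linearly to zero at max_delay_ms (assumed shape).
    timeliness = max(0.0, 1.0 - delay_ms / max_delay_ms)
    return (priority / max_priority) * timeliness


def total_utility(flows):
    """Sum utility over (priority, delay_ms) pairs, one per deployed flow."""
    return sum(business_utility(p, d) for p, d in flows)


# Two candidate deployments of the same pair of flows.
favor_high_priority = [(10, 10), (2, 40)]   # fast path given to priority 10
favor_low_priority = [(10, 40), (2, 10)]    # fast path given to priority 2
u_high = total_utility(favor_high_priority)
u_low = total_utility(favor_low_priority)
```

Under this assumed function the deployment that gives the fast path to the high-priority flow scores higher, even though both deployments have the same average delay. That is the kind of distinction a delay-only metric cannot express.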


Experiments

Simulations
  GT-ITM transit/stub Internet topology (128 nodes)
  NS-2 to capture trace of delay between nodes
  Deployment simulator reacts to delay
OIS case study
  Flight information from Delta airlines
  Weather and news streams
  Experiments on Emulab (13 nodes)


Approximation penalty

[Figure: end-to-end delay (ms) versus number of nodes in flow graph (4 to 14), centralized versus decentralized deployment. Flow graphs on simulator]


Impact of reconfiguration

[Figure: end-to-end delay (ms) over time (0 to 2000 seconds), dynamic versus static deployment. 10 node flow graph on simulator]


Impact of reconfiguration

[Figure: end-to-end delay (ms) over time (0 to 2000 seconds), dynamic versus static deployment. 2 node flow graph on Emulab. Annotated events: network congestion, increased processor load]


Different utility functions

[Figure: actual utility, cost (10^3 dollars/sec) and delay (msec) obtained under each optimization criterion (utility, cost, delay). Simulator, 128 node network]


Query planning

We can optimize the structure of the query graph
  A different join order may enable a better mapping
  But there are too many plan/deployment possibilities to consider
Use the hierarchy for planning
  Plus: stream advertisements to locate sources and deployed operators
  Planning algorithms: top-down, bottom-up

[IPDPS ’07]
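
A back-of-the-envelope count shows why the combined plan/deployment space is too large to enumerate. The sketch assumes that left and right join inputs are distinguished (ordered bushy join trees) and that each join operator may be placed on any node; both are simplifying assumptions, not figures from the talk.

```python
from math import factorial


def join_tree_count(n_sources):
    """Ordered binary join trees over n labeled sources: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n_sources - 1)) // factorial(n_sources - 1)


def plan_deployment_count(n_sources, n_nodes):
    """Join trees times placements of the n-1 join operators on any node."""
    return join_tree_count(n_sources) * n_nodes ** (n_sources - 1)


plans = join_tree_count(5)                # 1680 join trees for 5 sources
combined = plan_deployment_count(5, 64)   # 28185722880 plan/placement pairs
```

Even a single query over 5 sources on a 64 node network yields tens of billions of combinations under these assumptions, which motivates restricting the search with the hierarchy.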


Planning algorithms

Top down

[Figure: top-down planning of A ⋈ B ⋈ C ⋈ D over the hierarchy. Labels: A ⋈ B ⋈ C ⋈ D; A ⋈ B; C ⋈ D; sources A, B, C, D]


Planning algorithms

Bottom up

[Figure: bottom-up planning of A ⋈ B ⋈ C ⋈ D over the hierarchy. Labels: A ⋈ B; ⋈ C ⋈ D; A ⋈ B ⋈ C ⋈ D; sources A, B, C, D]


Query planning

[Figure: bandwidth cost per unit time (dollars), phased versus combined planning. 100 queries, each over 5 sources, 64 node network]


Availability management

Goal is to achieve both:
  Performance
  Reliability
These goals often conflict!
  Spend scarce resources on throughput or availability?
Manage tradeoff using utility function


Fault tolerance

Basic approach: passive standby
  Log of messages can be replayed
  Periodic “soft-checkpoint” from active to standby
Performance versus availability (fast recovery)
  More soft-checkpoints = faster recovery, higher overhead
  Choose a checkpoint frequency that maximizes utility

[Middleware ’06]

[Figure: a join operator (⋈) marked as failed (X)]
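
Choosing the checkpoint frequency can be illustrated with a toy model. Everything in it is an assumption made for the example (the linear overhead term, the mean replay length of half an interval, the weights and default rates); it is not the utility function from the cited work.

```python
def utility(interval_s, ckpt_cost_s=0.05, failures_per_s=1e-3,
            perf_weight=1.0, avail_weight=1.0, replay_ratio=1.0):
    """Hypothetical utility of soft-checkpointing every interval_s seconds."""
    # Fraction of time spent taking checkpoints (hurts performance).
    overhead = ckpt_cost_s / interval_s
    # After a failure, on average half an interval of log must be replayed.
    expected_replay_s = replay_ratio * interval_s / 2
    # Fraction of time spent recovering (hurts availability).
    downtime = failures_per_s * expected_replay_s
    return perf_weight * (1.0 - overhead) - avail_weight * downtime


def best_interval(candidates, **params):
    """Pick the candidate interval with the highest utility."""
    return max(candidates, key=lambda t: utility(t, **params))


candidates = [1, 2, 5, 10, 20, 50, 100]
balanced = best_interval(candidates)
availability_first = best_interval(candidates, avail_weight=4.0)
```

With the default parameters the model prefers a checkpoint every 10 seconds; weighting availability four times as heavily moves the choice to every 5 seconds, matching the slide's point that more frequent checkpoints buy faster recovery at the price of overhead.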


Proactive fault tolerance

Goal: predict system instability



Mean time to recovery


IFLOW beyond inTransit

Self-managing information flow

Complex infrastructure

inTransit Pub/sub Science app…


Related work

Stream data processing engines
  STREAM, Aurora, TelegraphCQ, NiagaraCQ, etc.
  Borealis, TRAPP, Flux, TAG
Content-based pub/sub
  Gryphon, ARMADA, Hermes
Overlay networks
  P2P
  Multicast (e.g. Bayeux)
  Grid
Other overlay toolkits
  P2, MACEDON, GridKit


Conclusions

IFLOW is a general information flow middleware
  Self-configuring and self-managing
  Based on application-specified performance and utility
inTransit distributed event management infrastructure
  Queries over streams of structured data
  Resource-aware deployment of query graphs
  IFLOW provides utility-driven deployment and reconfiguration
Overall goal
  Provide useful abstractions for distributed information systems
  Implementation of abstractions is self-managing
  Key to scalability, manageability, flexibility


For more information

http://www.brianfrankcooper.net
[email protected]