DevoFlow - Scaling Flow Management for High-Performance Networks

Internet Research Lab at NTU, Taiwan.

DevoFlow: Scaling Flow Management for High-Performance Networks

Andrew R. Curtis (University of Waterloo); Jeffrey C. Mogul, Jean Tourrilhes, Praveen Yalagandula, Puneet Sharma, Sujata Banerjee (HP Labs), SIGCOMM 2011

Presenter: Jason (Tsung-Cheng) Hou    Advisor: Wanjiun Liao

Mar. 22nd, 2012

Motivation

• SDN / OpenFlow can enable per-flow management… however:
• What are the costs and limitations?
• Network-wide logical graph = always collecting all flows' stats?
• Any more problems beyond the controller's scalability?
• Does enhancing controller performance / scalability solve all problems?

DevoFlow Contributions

• Characterize overheads of implementing OpenFlow on switches

• Evaluate flow mgmt capability within data center network environment

• Propose DevoFlow to enable scalable flow mgmt by balancing:
  – Network control
  – Statistics collection
  – Overheads
  – Switch functions and controller loads

Agenda

• OF Benefits, Bottlenecks, and Dilemmas
• Evaluation of Overheads
• DevoFlow
• Simulation Results

Benefits

• Flexible policies w/o switch-by-switch config.
• Network graph and visibility, stats collection
• Enable traffic engineering and network mgmt
• OpenFlow switches are relatively simple
• Accelerate innovation:
  – VL2, PortLand: new architectures, virtualized addressing
  – Hedera: flow scheduling
  – ElasticTree: energy-proportional networking
• However, no further estimation of the overheads

Bottlenecks

• Root cause: excessively couples central control and complete visibility
• Controller bottleneck: can be scaled out as a distributed system
• Switch bottleneck:
  – Data- to control-plane: limited BW
  – Enormous flow tables, too many entries
  – Control and stats pkts compete for BW
  – Introduces extra delays and latencies
• The switch bottleneck was not well studied

Dilemma

• Control dilemma:
  – Role of the controller: visibility and mgmt capability; however, per-flow setup is too costly
  – Wildcard / hash-based flow matching: much less load, but no effective control
• Statistics-gathering dilemma:
  – Pull-based mechanism: counters of all flows give full visibility but demand high BW
  – Wildcard counter aggregation: far fewer entries, but loses track of elephant flows
• Aim: strike a balance in between

Main Concept of DevoFlow

• Devolve most flow control to switches
• Maintain partial visibility
• Keep track of significant flows
• Default vs. special actions:
  – Security-sensitive flows: categorically inspected
  – Normal flows: may evolve or cover other flows, becoming security-sensitive or significant
  – Significant flows: special attention
• Collect stats by sampling, triggering, and approximating

Design Principles of DevoFlow

• Try to stay in the data plane, by default
• Provide enough visibility:
  – Esp. for significant flows & security-sensitive flows
  – Otherwise, aggregate or approximate stats
• Maintain the simplicity of switches

Agenda

• OF Benefits, Bottlenecks, and Dilemmas
• Evaluation of Overheads
• DevoFlow
• Simulation Results

Overheads: Control PKTs

For a path with N switches, per-flow setup costs N+1 control pkts:
• First pkt of a flow goes to the controller
• N control messages, one to each of the N switches
Average length of a flow in 1997: 20 pkts
In a Clos / fat-tree DCN topology: 5 switches, so 6 control pkts per flow
The smaller the flow, the higher the relative BW cost (back-of-the-envelope check below)

(Figure: an N-switch path)
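A back-of-the-envelope check of the figures above; the 5-switch path and the 20-packet average flow length come from the slide, and the script is purely illustrative:

```python
# Back-of-the-envelope: control-plane cost of per-flow setup.
# Figures taken from the slide: 5-switch path, 20-packet average flow.

def control_packets_per_flow(n_switches):
    """One packet-in to the controller plus one flow-mod per switch."""
    return 1 + n_switches

n_switches = 5        # typical Clos / fat-tree path length
avg_flow_pkts = 20    # average flow length (1997 measurement)

ctrl = control_packets_per_flow(n_switches)
print(f"{ctrl} control packets per flow")                       # 6
print(f"{ctrl / avg_flow_pkts:.0%} control overhead relative "  # 30%
      f"to a {avg_flow_pkts}-packet flow")
```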

Overheads: Flow Setup

• Switch w/ finite BW between data / control plane, i.e. overheads between ASIC and CPU

• Setup capability: 275~300 flows/sec
• Similar to [30]
• In data centers: mean flow interarrival of ~30 ms per server
• A rack w/ 40 servers needs ~1,300 flows/sec (see the check below)
• Across the whole data center: far more

[43] R. Sherwood, G. Gibb, K.-K. Yap, G. Appenzeller, M. Casado, N. McKeown, and G. Parulkar. Can the production network be the testbed? In OSDI, 2010.
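A quick sanity check of the mismatch, using the 30 ms mean interarrival per server and the 40-server rack quoted above (illustrative arithmetic only):

```python
# Rough check: flow-setup demand at a ToR switch vs. measured capability,
# using the figures on this slide.

mean_interarrival_s = 0.030    # ~30 ms between new flows, per server
servers_per_rack = 40
measured_setup_rate = 300      # upper end of the 275~300 flows/sec measured

demand = servers_per_rack / mean_interarrival_s   # new flows/sec at the rack
print(f"demanded: ~{demand:.0f} flows/sec")        # ~1333
print(f"shortfall: {demand / measured_setup_rate:.1f}x the measured "
      f"{measured_setup_rate} flows/sec")
```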

Overheads: Flow Setup

Experiment: a single switch (figure). Measured latencies along the setup path:
• Switching in the ASIC: ~5 μs
• ASIC to CPU: 0.5 ms
• CPU to controller: 2 ms
A huge waste of resources!

Overheads: Gathering Stats

• [30]: most of even the longest-lived flows last only a few seconds
• Counters: (pkts, bytes, duration)
• Push-based: counters sent to the controller when a flow ends
• Pull-based: counters fetched actively by the controller
• 88F bytes for F flows
• In the HP 5406zl switch:
  – Entries: 1.5K wildcard match / 13K exact match
  – Total ~1.3 MB per full pull; at 2 fetches/sec, ~17 Mbps
  – Not fast enough! Consumes a lot of BW! (order-of-magnitude check below)

[30] S. Kandula, S. Sengupta, A. Greenberg, and P. Patel. The Nature of Datacenter Traffic: Measurements & Analysis. In Proc. IMC, 2009.
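An order-of-magnitude sketch of the pull bandwidth implied by the figures above; framing and message overheads are ignored, so this only reproduces the scale of the quoted numbers:

```python
# Order-of-magnitude estimate of stats-pulling bandwidth, from the
# per-entry size and table sizes quoted above (framing overheads ignored).

bytes_per_entry = 88
entries = 1_500 + 13_000     # wildcard-match + exact-match entries
fetches_per_sec = 2          # the 5406zl sustains ~2 full pulls per second

bytes_per_pull = bytes_per_entry * entries
print(f"~{bytes_per_pull / 1e6:.1f} MB per pull")
# Roughly 20 Mbps, the same order as the ~17 Mbps quoted on the slide:
print(f"~{bytes_per_pull * fetches_per_sec * 8 / 1e6:.0f} Mbps of stats traffic")
```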

Overheads: Gathering Stats

(Figure) Time to pull the flow counters from the switch:
• 2.5 sec to pull 13K entries
• 1 sec to pull 5,600 entries
• 0.5 sec to pull 3,200 entries

Overheads: Gathering Stats

• Per-flow setup generates too many entries
• The more entries the controller fetches, the longer each fetch takes
• The longer the fetch, the longer the control loop
• Hedera uses a 5-sec control loop, BUT its workload is too ideal (Pareto distribution)
• On the VL2 workload, a 5-sec loop improves only 1~5% over ECMP
• Per [41], the loop must be shorter than 0.5 sec to do better

[41] C. Raiciu, C. Pluntke, S. Barre, A. Greenhalgh, D. Wischik, and M. Handley. Data center networking with multipath TCP. In HotNets, 2010.

Overheads: Competition

• Flow setups and stats-pulling compete for control-plane BW
• Scheduling needs timely stats
• Switch flow entries:
  – OpenFlow: wildcard rules live in TCAMs, which consume lots of power & space
  – Rules match 10 header fields, 288 bits each
  – vs. only 60 bits for traditional Ethernet forwarding
• Per-flow entries vs. per-host entries

Agenda

• OF Benefits, Bottlenecks, and Dilemmas
• Evaluation of Overheads
• DevoFlow
• Simulation Results

Mechanisms

• Control
  – Rule cloning
  – Local actions
• Statistics-gathering
  – Sampling
  – Triggers and reports
  – Approximate counters
• Flow scheduler: like Hedera
• Multipath routing: based on a probability distribution, enabling oblivious routing

Rule Cloning

• The ASIC clones a wildcard rule into an exact-match rule for each new microflow (see the sketch below)
• Cloned rules can carry their own timeouts or pick their output port by probability
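A minimal sketch of the rule-cloning idea, using hypothetical Python data structures rather than anything resembling the switch ASIC; the timeout and probabilistic output-port aspects of real clones are omitted here:

```python
# Minimal sketch of DevoFlow rule cloning (hypothetical data structures).
# A wildcard rule flagged clone=True spawns an exact-match rule for each new
# microflow it matches, so later packets of that flow hit the exact-match
# table and their counters accumulate without leaving the data plane.

from dataclasses import dataclass

@dataclass
class Rule:
    match: dict          # header fields to match; absent fields are wildcarded
    actions: list        # forwarding actions, inherited by clones
    clone: bool = False  # CLONE flag on wildcard rules
    packets: int = 0     # per-rule counter

class FlowTable:
    def __init__(self):
        self.exact = {}      # exact-match (hash) table: header tuple -> Rule
        self.wildcard = []   # TCAM-like wildcard rules, in priority order

    def lookup(self, pkt):
        key = tuple(sorted(pkt.items()))
        rule = self.exact.get(key)
        if rule is None:
            for wc in self.wildcard:
                if all(pkt.get(f) == v for f, v in wc.match.items()):
                    if wc.clone:
                        # Clone: install an exact-match rule for this microflow.
                        rule = Rule(match=dict(pkt), actions=list(wc.actions))
                        self.exact[key] = rule
                    else:
                        rule = wc
                    break
        if rule is not None:
            rule.packets += 1   # counters stay in the data plane
        return rule             # None = table miss
```

A table miss (None) is what would be punted to the controller in plain OpenFlow; with a clone-flagged wildcard rule covering the flow space, those punts never happen.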

Local Actions

• Rapid re-routing: fallback paths are predefined, so the switch recovers almost immediately
• Multipath support: output chosen from a probability distribution, adjusted by link capacity or load (see the sketch below)
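A minimal sketch of the multipath local action, assuming next hops weighted by link capacity; the port names and weights are made up for illustration:

```python
# Sketch of a DevoFlow-style multipath local action: pick the output port
# for a new flow from a probability distribution. Here the weights reflect
# link capacity, but they could equally encode measured load.

import random

def choose_uplink(port_weights):
    """Weighted random choice of an output port for a new microflow."""
    ports = list(port_weights)
    weights = [port_weights[p] for p in ports]
    return random.choices(ports, weights=weights, k=1)[0]

# Illustrative only: two 10G uplinks and one 1G uplink, weighted by capacity.
print(choose_uplink({"uplink1": 10.0, "uplink2": 10.0, "uplink3": 1.0}))
```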

Statistics-Gathering

• Sampling
  – Packet headers are sent to the controller with 1/1000 probability
• Triggers and reports
  – Set a threshold per rule
  – When the counter exceeds it, report the flow and let the controller set it up
• Approximate counters
  – Maintain a list of the top-k largest flows
(A sketch of the sampling and trigger logic follows.)
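A sketch of how per-flow sampling and a threshold trigger might combine; the 1/1000 sampling probability is from the slide, while the byte threshold is an assumed placeholder:

```python
# Illustrative sketch of two DevoFlow statistics mechanisms: 1/1000 packet
# sampling plus a per-rule byte threshold that triggers a report once a
# flow becomes "significant". The threshold value is an assumption.

import random

SAMPLE_PROB = 1.0 / 1000          # forward ~0.1% of packet headers
ELEPHANT_THRESHOLD = 128 * 1024   # bytes before a flow is reported (assumed)

class FlowStats:
    def __init__(self):
        self.bytes = 0
        self.reported = False

    def on_packet(self, header, length, send_sample, send_report):
        self.bytes += length
        if random.random() < SAMPLE_PROB:
            send_sample(header)               # sampled header to the controller
        if not self.reported and self.bytes > ELEPHANT_THRESHOLD:
            self.reported = True
            send_report(header, self.bytes)   # trigger: flow is now significant
```

Once a flow is reported, the controller can treat it as an elephant and, for example, re-route it onto a less loaded path.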

Implementation

• Not yet implemented in hardware
• Switch engineers indicate that most mechanisms can reuse existing functional blocks
• Provides some basic tools for SDN
• However, scaling remains an open question: what threshold? how to sample? at what rate?
• Default multipath routing on switches
• Controller samples or sets triggers to detect elephants, then schedules them with a bin-packing algorithm (see the sketch below)
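A sketch of that controller side under stated assumptions: elephants detected via sampling or triggers are greedily bin-packed onto the candidate path whose busiest link is least loaded. The function and its greedy rule are illustrative, not the paper's exact algorithm:

```python
# Sketch of a controller-side elephant scheduler: greedily bin-pack detected
# elephant flows onto candidate paths, largest flows first, always choosing
# the path whose most-loaded link is currently the lightest. Illustrative
# only; the paper's scheduler (like Hedera's) is more involved.

def schedule_elephants(elephants, paths):
    """elephants: flow id -> estimated rate; paths: path id -> list of links."""
    link_load = {}
    placement = {}
    for flow, rate in sorted(elephants.items(), key=lambda kv: -kv[1]):
        best = min(paths,
                   key=lambda p: max(link_load.get(l, 0.0) for l in paths[p]))
        placement[flow] = best
        for link in paths[best]:
            link_load[link] = link_load.get(link, 0.0) + rate
        # Here the controller would install exact-match rules for this
        # flow along the chosen path.
    return placement

# Illustrative use: two elephants, two disjoint two-hop paths.
print(schedule_elephants({"f1": 400.0, "f2": 300.0},
                         {"p1": ["s1-a1", "a1-s2"], "p2": ["s1-a2", "a2-s2"]}))
```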

Simulation

• How much can flow-scheduling overhead be reduced while still achieving high performance?
• Custom-built flow-level simulator, based on the 5406zl experiments
• Workloads generated:
  – Reverse-engineered from [30] (MSR measurements of a 1,500-server cluster)
  – MapReduce shuffle stage: 128 MB sent to each other server
  – A combination of the two

[30] S. Kandula, S. Sengupta, A. Greenberg, and P. Patel. The Nature of Datacenter Traffic: Measurements & Analysis. In Proc. IMC, 2009.

Agenda

• OF Benefits, Bottlenecks, and Dilemmas
• Evaluation of Overheads
• DevoFlow
• Simulation Results

Simulation Results

(Figure slides: results for the Clos topology, results for the HyperX topology, and additional simulation results.)

Conclusion

• Per-flow control imposes too much overhead
• Balance between:
  – Overheads and network visibility
  – Effective traffic engineering / network mgmt
  This trade-off could lead to various lines of research
• Switches have limited resources:
  – Flow entries / control-plane BW
  – Hardware capability / power consumption
