Hedera: Dynamic Flow Scheduling for Data Center Networks. Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, Amin Vahdat. USENIX NSDI 2010. Presenter: Jason, Tsung-Cheng, HOU. Advisor: Wanjiun Liao. Dec. 22nd, 2011

Hedera - Dynamic Flow Scheduling for Data Center Networks, an Application of Software-Defined Networking (SDN)


DESCRIPTION

Internet Research Lab at NTU, Taiwan.


Hedera: Dynamic Flow Scheduling for Data Center Networks

Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, Amin Vahdat

- USENIX NSDI 2010 -

Presenter: Jason, Tsung-Cheng, HOU
Advisor: Wanjiun Liao

Dec. 22nd, 2011


Problem

• Relying on multipathing, due to…
– Limited port densities of routers/switches
– Horizontal expansion

• Multi-rooted tree topologies
– Example: Fat-tree / Clos


Problem

• BW demand is essential and volatile
– Must route among multiple paths
– Avoid bottlenecks and deliver aggregate BW

• However, current multipath routing…
– Mostly flow-hash-based ECMP
– Static and oblivious to link utilization
– Causes long-term collisions between large flows

• Path diversity is used inefficiently
– Need a protocol or a scheduler
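To make the collision problem concrete, here is a minimal sketch of flow-hash-based path selection. It assumes a CRC32 hash over the flow 5-tuple; the function name `ecmp_path` and the example addresses are illustrative only, and real switches use their own hash functions. The choice depends only on header fields, so it never reacts to link utilization.

```python
import zlib

def ecmp_path(flow, num_paths):
    """Pick one of num_paths equal-cost paths from a hash of the 5-tuple.

    flow is (src_ip, dst_ip, src_port, dst_port, proto). CRC32 stands in
    for whatever hash a real switch uses (an assumption of this sketch).
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_paths

# Two hypothetical long-lived flows between different host pairs.
flow_a = ("10.0.1.2", "10.2.0.3", 40000, 5001, "tcp")
flow_b = ("10.0.1.3", "10.3.0.2", 40001, 5001, "tcp")

path_a = ecmp_path(flow_a, 4)
path_b = ecmp_path(flow_b, 4)

# The mapping is static: the same flow always hashes to the same path,
# so if path_a == path_b the two flows share that path for their lifetime.
collide = path_a == path_b
```

Because the hash ignores load, two elephants that happen to collide stay collided until one of them ends.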


Collisions of elephant flows

• Collisions in two ways: Upward or Downward

[Figure: source/destination pairs S1/D1, S2/D2, S3/D3, S4/D4]


Equal Cost Paths

• Many equal cost paths going up to the core switches

• Only one path down from each core switch

• Need to find good flow-to-core mapping

[Figure: paths between source S and destination D]


Goal

• Given dynamic flow demands
– Need to find paths that maximize network bisection BW
– No end-host modifications

• However, local switch information alone cannot find a proper allocation
– Need a central scheduler
– Must use commodity Ethernet switches
– OpenFlow


Architecture

• Detect Large Flows

– Flows that need bandwidth but are network-limited

• Estimate Flow Demands

– Use max-min fairness to allocate flows between SD pairs

• Allocate Flows

– Use estimated demands to heuristically find better placement of large flows on the EC paths

– Arrange switches and iterate again

[Figure: control loop: Detect Large Flows → Estimate Flow Demands → Allocate Flows]


Architecture

• Feedback loop

• Optimize achievable bisection BW by assigning flow-to-core mappings

• Heuristics of flow demand estimation and placement

• Central Scheduler

– Global knowledge of all links in the network

– Control tables of all switches (OpenFlow)

[Figure: control loop: Detect Large Flows → Estimate Flow Demands → Allocate Flows]


Elephant Detection

• Scheduler polls edge switches
– Flows exceeding threshold are “large”
– Threshold: 10% of hosts’ link capacity (> 100 Mbps)

• Small flows: default ECMP hashing

• Hedera complements ECMP
– Default forwarding is ECMP
– Only schedules large flows contributing to bisection BW bottlenecks

• Centralized functions: the essentials
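The threshold rule can be sketched as a simple classifier over polled flow rates. This is an illustration, not the scheduler's actual code: `split_flows` and the flow ids are hypothetical, and the 1 Gbps link speed is inferred from the slide's "10% = 100 Mbps".

```python
LINK_CAPACITY_BPS = 1_000_000_000            # 1 Gbps host links (inferred)
ELEPHANT_THRESHOLD_BPS = LINK_CAPACITY_BPS // 10   # 10% of link capacity

def split_flows(flow_rates_bps):
    """Split polled per-flow rates into (large, small) lists of flow ids."""
    large = [f for f, rate in flow_rates_bps.items()
             if rate > ELEPHANT_THRESHOLD_BPS]
    small = [f for f, rate in flow_rates_bps.items()
             if rate <= ELEPHANT_THRESHOLD_BPS]
    return large, small

# Hypothetical poll result from one edge switch (bits per second).
polled = {"f1": 850_000_000, "f2": 2_000_000, "f3": 120_000_000}
large, small = split_flows(polled)
# Only the flows in `large` are handed to the scheduler;
# the flows in `small` keep their default ECMP forwarding.
```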


Demand Estimation

• Current flow rate: misleading
– May already be constrained by the network

• Need to find flow’s “natural” BW demand when not limited by network
– As if only limited by NIC of S or D

• Allocate S/D capacity among flows using max-min fairness

• Equals the BW allocation under optimal routing; used as input to the placement algorithm


Demand Estimation

• Given pairs of large flows, modify each flow size at S/D iteratively
– S distributes its unconverged BW among its flows
– R, if limited, redistributes BW among excessive-demand flows
– Repeat until all flows converge

• Guaranteed to converge in O(|F|)
– Linear in the number of flows
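The iteration above can be sketched in Python as follows. This is a simplified reading of the slides, not the authors' code: the names `estimate_demands`, `_est_src` and `_est_dst` are illustrative, every NIC is normalized to capacity 1, and exact fractions are used so that the convergence check is a plain equality test. The input is the four-flow example A→X, A→Y, B→Y, C→Y used in this section.

```python
from fractions import Fraction

def _est_src(src, flows, demand, converged):
    """Sender step: split the sender's unconverged BW equally among its open flows."""
    mine = [f for f in flows if f[0] == src]
    fixed = sum((demand[f] for f in mine if converged[f]), Fraction(0))
    open_flows = [f for f in mine if not converged[f]]
    if open_flows:
        share = (1 - fixed) / len(open_flows)
        for f in open_flows:
            demand[f] = share

def _est_dst(dst, flows, demand, converged):
    """Receiver step: if oversubscribed, cap the excessive-demand flows at an equal share."""
    mine = [f for f in flows if f[1] == dst]
    if sum(demand[f] for f in mine) <= 1:
        return                       # receiver is not limited
    limited = set(mine)              # flows still treated as receiver-limited
    small_total = Fraction(0)        # demand of flows already below the share
    share = Fraction(1, len(limited))
    while True:
        small = {f for f in limited if demand[f] < share}
        if not small:
            break
        small_total += sum(demand[f] for f in small)
        limited -= small
        share = (1 - small_total) / len(limited)
    for f in limited:
        demand[f] = share
        converged[f] = True

def estimate_demands(flows):
    """flows: list of (src, dst). Returns {flow: demand as a fraction of NIC capacity}."""
    demand = {f: Fraction(0) for f in flows}
    converged = {f: False for f in flows}
    senders = sorted({s for s, _ in flows})
    receivers = sorted({d for _, d in flows})
    while True:
        before = dict(demand)
        for s in senders:
            _est_src(s, flows, demand, converged)
        for d in receivers:
            _est_dst(d, flows, demand, converged)
        if demand == before:         # no estimate changed: done
            return demand

demands = estimate_demands([("A", "X"), ("A", "Y"), ("B", "Y"), ("C", "Y")])
# demands: AX = 2/3, AY = BY = CY = 1/3
```

The first pass gives A's flows 1/2 each and caps the three flows into Y at 1/3; the second pass hands A's freed 1/6 to A→X, after which nothing changes.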


Demand Estimation

[Figure: senders A, B, C and receivers X, Y]

Flow   Estimate   Conv.?
AX
AY
BY
CY

Senders

Sender   Available Unconv. BW   Flows   Share
A        1                      2       1/2
B        1                      1       1
C        1                      1       1


Demand Estimation

Receivers

Recv   RL?   Non-SL Flows   Share
X      No    -              -
Y      Yes   3              1/3

Flow   Estimate   Conv.?
AX     1/2
AY     1/2
BY     1
CY     1

[Figure: senders A, B, C and receivers X, Y]


Demand Estimation

Flow   Estimate   Conv.?
AX     1/2
AY     1/3        Yes
BY     1/3        Yes
CY     1/3        Yes

Senders

Sender   Available Unconv. BW   Flows   Share
A        2/3                    1       2/3
B        0                      0       0
C        0                      0       0

[Figure: senders A, B, C and receivers X, Y]


Demand Estimation

Flow   Estimate   Conv.?
AX     2/3        Yes
AY     1/3        Yes
BY     1/3        Yes
CY     1/3        Yes

Receivers

Recv   RL?   Non-SL Flows   Share
X      No    -              -
Y      No    -              -

[Figure: senders A, B, C and receivers X, Y]


Placement Heuristics

• Find a good large-flow-to-core mapping
– such that average bisection BW is maximized

• Two approaches

• Global First Fit: Greedily choose a path that has sufficient unreserved BW
– O((ports/switch)^2)

• Simulated Annealing: Iteratively find a globally better mapping of paths to flows
– O(# flows)


Global First-Fit

• When a new flow is found, linearly search all paths from S to D

• Place it on the first path whose links can all fit the flow

• Once the flow ends, entries + reservations time out

[Figure: Scheduler assigning Flow A, Flow B, Flow C between S and D; cores labeled 0, 1, 2, 3]
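A minimal sketch of the first-fit step, assuming link capacities normalized to 1 and paths given as lists of link ids. The function name `global_first_fit`, the link labels, and the `None` return for "no path fits" are illustrative choices, not taken from the slides.

```python
def global_first_fit(demand, paths, reserved, capacity=1.0):
    """Reserve `demand` on the first path whose links all have room.

    paths:    candidate paths in search order, each a list of link ids
    reserved: dict of link id -> bandwidth already reserved (updated in place)
    Returns the chosen path, or None if no candidate path fits.
    """
    for path in paths:
        if all(reserved.get(link, 0.0) + demand <= capacity for link in path):
            for link in path:
                reserved[link] = reserved.get(link, 0.0) + demand
            return path
    return None

# Two hypothetical S-to-D paths, through core 0 and core 1.
paths = [["S-c0", "c0-D"], ["S-c1", "c1-D"]]
reserved = {}
first = global_first_fit(0.6, paths, reserved)    # fits on the core-0 path
second = global_first_fit(0.6, paths, reserved)   # core 0 is full, uses core 1
third = global_first_fit(0.6, paths, reserved)    # nothing fits
```

The reservations would be released when a flow's entries time out; that bookkeeping is omitted here.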


Simulated Annealing

• Annealing: letting metal cool down to get a better crystal structure
– Heating up to enter a higher energy state
– Cooling to a lower energy state with a better structure, stopping at a given temperature

• Simulated Annealing:
– Search the neighborhood for possible states
– Probabilistically accept a worse state
– Accept a better state, settle gradually
– Avoid local minima


Simulated Annealing

• State / State Space
– Possible solutions

• Energy
– Objective

• Neighborhood
– Other options

• Boltzmann’s Function
– Prob. to higher state

• Control Temperature
– Current temp. affects prob. to higher state

• Cooling Schedule
– How temp. falls

• Stopping Criterion

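\[
\Delta f = f(X') - f(X)
\]

\[
p(X') =
\begin{cases}
1, & \Delta f \le 0 \\
\exp(-\Delta f / T), & \Delta f > 0
\end{cases}
\]

\[
P(\Delta E) = \exp(-\Delta E / t)
\]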


Simulated Annealing

• State Space:
– All possible large-flow-to-core mappings
– However, flows to the same destination map to the same core
– Reduces the state space, as long as there are not too many large flows and the threshold is proper

• Neighborhood:
– Swap cores for two hosts within the same pod, attached to the same edge / aggregation switch
– Avoids local minima


Simulated Annealing

• Energy:
– Estimated demand of flows
– Total exceeded BW capacity of links, to be minimized

• Temperature: remaining iterations

• Probability:

• Final state is published to switches and used as initial state for next round

• Incremental calculation of exceeded capacity

• No recalculation of all links, only for newly found large flows and neighborhood swaps
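The pieces above (state, neighborhood, energy, temperature) can be put together in a small sketch. Several parts are assumptions rather than the slides' method: the acceptance rule uses the generic form exp(-scale * ΔE / T), the `scale` parameter and the toy energy function (one shared link per core, capacity 1) are invented for illustration, and the neighborhood is an unrestricted swap between any two hosts. The names `anneal`, `excess_capacity`, and `DEMAND` are hypothetical.

```python
import math
import random

def anneal(core_of, energy, iterations, rng, scale=1.0):
    """Search host-to-core mappings; returns (best_mapping, best_energy).

    core_of: dict of destination host -> assigned core (the state)
    energy:  function(state) -> total exceeded capacity, to be minimized
    Temperature is the number of remaining iterations, as on the slide.
    """
    current = dict(core_of)
    e_cur = energy(current)
    best, e_best = dict(current), e_cur
    hosts = list(current)
    for temperature in range(iterations, 0, -1):
        a, b = rng.sample(hosts, 2)
        neighbor = dict(current)
        neighbor[a], neighbor[b] = current[b], current[a]   # swap two cores
        e_new = energy(neighbor)
        delta = e_new - e_cur
        # Always accept a better state; accept a worse one with some probability.
        if delta <= 0 or rng.random() < math.exp(-scale * delta / temperature):
            current, e_cur = neighbor, e_new
            if e_cur < e_best:
                best, e_best = dict(current), e_cur
    return best, e_best

# Toy energy: each core has one link of capacity 1.0 (an assumption).
DEMAND = {"h1": 0.9, "h2": 0.9, "h3": 0.1, "h4": 0.1}

def excess_capacity(core_of):
    load = {}
    for host, core in core_of.items():
        load[core] = load.get(core, 0.0) + DEMAND[host]
    return sum(max(0.0, total - 1.0) for total in load.values())

start = {"h1": "c0", "h2": "c0", "h3": "c1", "h4": "c1"}
best, best_energy = anneal(start, excess_capacity, 200, random.Random(0))
```

In the full scheme, `best` would be published to the switches and reused as the starting state of the next scheduling round.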


Evaluation


Implementation

• 16 hosts, k=4 fat-tree data plane
– 20 switches: 4-port NetFPGAs / OpenFlow
– Parallel 48-port non-blocking Quanta switch
– 1 scheduler, OpenFlow control protocol
– Testbed: PortLand


Simulator

• k=32; 8,192 hosts
– Packet-level simulators not applicable
– 1 Gbps for 8k hosts takes 2.5×10^11 pkts

• Model TCP flows
– TCP’s AIMD when constrained by topology
– Poisson arrival of flows
– No pkt size variations
– No bursty traffic
– No inter-flow dynamics
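The host and switch counts quoted for the testbed and the simulator follow from the standard k-ary fat-tree formulas (k pods, k/2 edge and k/2 aggregation switches per pod, (k/2)^2 core switches, k^3/4 hosts). A short check, with `fat_tree_size` as an illustrative helper name:

```python
def fat_tree_size(k):
    """Return (hosts, switches) for a k-ary fat-tree built from k-port switches."""
    if k % 2:
        raise ValueError("k must be even")
    hosts = k ** 3 // 4
    core = (k // 2) ** 2
    edge_and_aggregation = k * k    # k pods, each with k/2 edge + k/2 aggregation
    return hosts, core + edge_and_aggregation

testbed = fat_tree_size(4)      # (16, 20): the k=4 testbed
simulated = fat_tree_size(32)   # (8192, 1280): the k=32 simulation
largest = fat_tree_size(48)     # (27648, 2880): consistent with "27K hosts"
```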


PortLand/OpenFlow, k=4


Simulator


Reactiveness

• Demand Estimation:

– 27K hosts, 250K flows, converges < 200ms

• Simulated Annealing:

– Asymptotically dependent on # of flows + # iter., 50K flows and 1K iter.: 11ms

– Most of final bisection BW: few hundred iter.

• Scheduler control loop:

– Polling + Est. + SA = 145ms for 27K hosts


Comments

• Flows destined to the same host go via the same core
– May congest at cores, but how severe?
– Large flows to/from a host: <k/2
– No proof, no evaluation

• Decreases search space and runtime
– Scalable on a per-flow basis? For large k?

• No protection for mice flows, RPCs
– Only assumes they work well under ECMP
– Does not address how they fare when routed alongside large flows


Comments

• Own flow-level simulator
– Aims to saturate the network
– No breakdown of flow counts by size
– Traffic generation: avg. flow size and arrival rates (Poisson) with a mean
– Only the above descriptions, no specific numbers
– Too ideal or not volatile enough?
– Avg. bisection BW, but real-time graphs?

• States that per-flow VLB = per-flow ECMP
– Does not compare with other options (VL2)
– No further elaboration


Comments

• Shared responsibility
– Controller only deals with critical situations
– Switches perform default measures
– Improves performance and saves time
– How to strike a balance?
– Adapt to different problems?

• Default multipath routing
– States problems of per-flow VLB and ECMP
– How about per-pkt? Authors’ future work
– How to improve switches’ default actions?


Comments

• Critical controller actions
– Considers that large flows degrade overall efficiency
– What are critical situations?
– How to detect and react?
– How to improve reactiveness and adaptability?

• Amin Vahdat’s lab
– Proposes fat-tree topology
– Develops PortLand L2 virtualization
– Hedera: enhances multipath performance
– Integrates all of the above


References

• M. Al-Fares et al., “Hedera: Dynamic Flow Scheduling for Data Center Networks”, USENIX NSDI 2010

• Tathagata Das, “Hedera: Dynamic Flow Scheduling for Data Center Networks”, UC Berkeley course CS 294

• M. Al-Fares, “Hedera: Dynamic Flow Scheduling for Data Center Networks”, USENIX NSDI 2010, slides


Supplement


Fault-Tolerance

• Link / Switch failure
– Use PortLand’s fault notification protocol
– Hedera routes around failed components

[Figure: Scheduler with Flow A, Flow B, Flow C; cores labeled 0, 1, 2, 3]


Fault-Tolerance

• Scheduler failure
– Soft-state, not required for correctness (connectivity)
– Switches fall back to ECMP

[Figure: Scheduler with Flow A, Flow B, Flow C; cores labeled 0, 1, 2, 3]


Limitations

• Dynamic workloads, large flow turnover faster than control loop

– Scheduler will be continually chasing the traffic matrix

• Need to include penalty term for unnecessary SA flow re-assignments

[Figure: axes Flow Size and Matrix Stability (Stable to Unstable); regions labeled ECMP and Hedera]