Helios: A Hybrid Electrical/Optical Switch Architecture for Modular Data Centers


Nathan Farrington, George Porter, Sivasankar Radhakrishnan, Hamid Hajabdolali Bazzaz, Vikram Subramanya, Yeshaiahu Fainman, George Papen, and Amin Vahdat

Electrical Packet Switch
• $500/port
• 10 Gb/s fixed rate
• 12 W/port
• Requires transceivers
• Per-packet switching
• For bursty, uniform traffic

Optical Circuit Switch
• $500/port
• Rate free
• 240 mW/port
• No transceivers
• 12 ms switching time
• For stable, pair-wise traffic

2010-09-02 SIGCOMM

Intro
Technology
Analysis
Data Plane
Control Plane
Experimental Setup
Evaluation
Related Work
Conclusion

Optical Circuit Switch

Figure labels: Glass Fiber Bundle; Lenses; Fixed Mirror; Mirrors on Motors; Input 1; Output 1; Output 2; Rotate Mirror

1. Full crossbar switch
2. Does not decode packets
3. Needs external scheduler

Wavelength Division Multiplexing

Figure labels: Electrical Packet Switch (ports 1 to 8); 10G WDM Optical Transceivers; WDM MUX; WDM DEMUX; Superlink (80G); Optical Circuit Switch (No Transceivers Required)

Stability Increases with Aggregation

Inter-Thread, Inter-Process, Inter-Server, Inter-Rack, Inter-Pod, Inter-Data Center

Where is the Sweet Spot?
1. Enough Stability
2. Enough Traffic



Bisection Bandwidth

          10% Electrical           100% Electrical    Helios Example
          (10:1 Oversubscribed)                       (10% Electrical + 90% Optical)
Cost      $6.3 M                   $62.2 M            $22.1 M   (2.8x Less)
Power     96.5 kW                  950.3 kW           157.2 kW  (6.0x Less)
Cables    6,656                    65,536             14,016    (4.7x Less)

Example: N=64 pods * k=1024 hosts/pod = 64K hosts total; 8 wavelengths

Fewer Core Switches: N pods, k-ports each; less than k switches, N-ports each
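The "x Less" factors on this slide can be re-derived from its own figures. The short Python sketch below divides the 100% electrical column by the Helios column; the dictionary keys and the function name are illustrative choices, not part of the talk.

```python
# Figures copied from the Bisection Bandwidth slide
# (64 pods x 1024 hosts/pod, 8 wavelengths).
FULL_ELECTRICAL = {"cost_musd": 62.2, "power_kw": 950.3, "cables": 65536}
HELIOS = {"cost_musd": 22.1, "power_kw": 157.2, "cables": 14016}


def savings_factors(baseline, candidate):
    """Return baseline/candidate for every metric, rounded to one decimal."""
    return {name: round(baseline[name] / candidate[name], 1) for name in baseline}


factors = savings_factors(FULL_ELECTRICAL, HELIOS)
for name, factor in factors.items():
    print(f"{name}: {factor}x less")
```

The rounded ratios come out at 2.8, 6.0, and 4.7, matching the last column of the slide.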


Figure labels: EPS; OCS; Pod 1, Pod 2, Pod 3; three 10G links and three 80G links

Setup a Circuit
Pod 1 -> 2:
• Capacity = 10G
• Demand = 10G
• Throughput = 10G
Pod 1 -> 3:
• Capacity = 80G
• Demand = 80G
• Throughput = 80G

Traffic Patterns Change
Pod 1 -> 2:
• Capacity = 10G
• Demand = 80G (was 10G)
• Throughput = 10G
Pod 1 -> 3:
• Capacity = 80G
• Demand = 10G (was 80G)
• Throughput = 10G

Break a Circuit
(same values as above)

Setup a Circuit
(same values as above)

(after the new circuit is up)
Pod 1 -> 2:
• Capacity = 80G
• Demand = 80G
• Throughput = 80G
Pod 1 -> 3:
• Capacity = 10G
• Demand = 10G
• Throughput = 10G
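The numbers in this walkthrough are consistent with a simple model in which each pod pair achieves the smaller of its link capacity and its demand. The Python sketch below applies that model to the state after the traffic shift, before and after the circuit is moved. The `min(capacity, demand)` rule and all names are assumptions made for illustration; the slides only list the values.

```python
def throughput(capacity_gbps, demand_gbps):
    """Per-pair throughput: a pair cannot exceed its demand or its link capacity."""
    return {pair: min(capacity_gbps[pair], demand_gbps[pair]) for pair in demand_gbps}


# Demand after the traffic pattern changes (Gb/s), keyed by (source pod, destination pod).
demand_after_shift = {(1, 2): 80, (1, 3): 10}

# Stale topology: the 80G circuit still points at Pod 3.
stale_capacity = {(1, 2): 10, (1, 3): 80}
# Reconfigured topology: the circuit has been moved to Pod 2.
new_capacity = {(1, 2): 80, (1, 3): 10}

before = throughput(stale_capacity, demand_after_shift)
after = throughput(new_capacity, demand_after_shift)

print(sum(before.values()))  # 20
print(sum(after.values()))   # 90
```

Under this model, moving the circuit raises the aggregate throughput out of Pod 1 from 20G to 90G.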


Figure labels: EPS; OCS; Pod 1, Pod 2, Pod 3 (10G and 80G links); one Pod Switch Manager per pod; Circuit Switch Manager; Topology Manager

Outline of Control Loop

1. Estimate traffic demand
2. Compute optimal topology for maximum throughput
3. Program the pod switches and circuit switches

1. Estimate Traffic Demand

Question: Will this flow use more bandwidth if we give it more capacity?

1. Identify elephant flows (mice don’t grow)
   Problem: Measurements are biased by current topology
2. Pretend all hosts are connected to an ideal crossbar switch
3. Compute the max-min fair bandwidth fixpoint

Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, and Amin Vahdat. Hedera: Dynamic Flow Scheduling for Data Center Networks. In NSDI’10.
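Steps 2 and 3 can be sketched as follows. This is a minimal Python sketch modelled on the demand estimator described in the Hedera paper cited above, not the Helios code itself. It assumes every host has a NIC of normalised capacity 1.0 and that the ideal crossbar imposes no other limit; the function name, the flow representation, and the tolerance `EPS` are illustrative.

```python
EPS = 1e-9  # tolerance for floating-point comparisons (an assumption of this sketch)


def estimate_demands(flows, max_rounds=100):
    """flows: list of (src_host, dst_host). Returns one demand per flow,
    as a fraction of NIC capacity, at the max-min fair fixpoint."""
    demand = [0.0] * len(flows)
    converged = [False] * len(flows)
    senders = sorted({s for s, _ in flows})
    receivers = sorted({d for _, d in flows})

    for _ in range(max_rounds):
        previous = demand[:]
        # Sender pass: split each sender's spare capacity equally
        # among its flows that are not yet fixed by a receiver.
        for host in senders:
            mine = [i for i, (s, _) in enumerate(flows) if s == host]
            fixed = sum(demand[i] for i in mine if converged[i])
            free = [i for i in mine if not converged[i]]
            for i in free:
                demand[i] = (1.0 - fixed) / len(free)
        # Receiver pass: if a receiver is oversubscribed, cap its largest flows.
        for host in receivers:
            mine = [i for i, (_, d) in enumerate(flows) if d == host]
            if sum(demand[i] for i in mine) <= 1.0 + EPS:
                continue
            limited = set(mine)
            share = 1.0 / len(limited)
            while True:
                small = {i for i in limited if demand[i] < share - EPS}
                if not small or len(small) == len(limited):
                    break
                limited -= small
                used = sum(demand[i] for i in mine if i not in limited)
                share = (1.0 - used) / len(limited)
            for i in limited:
                demand[i] = share
                converged[i] = True
        if all(abs(a - b) < EPS for a, b in zip(demand, previous)):
            break
    return demand


# Two senders targeting the same receiver each get half of its NIC.
print(estimate_demands([("A", "C"), ("B", "C")]))  # [0.5, 0.5]
```

Because only NIC limits are modelled, the result does not depend on the current circuit topology, which is the point of step 2 on the slide.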

2. Compute Optimal Topology

1. Formulate as instance of max-weight perfect matching problem on bipartite graph
2. Solve with Edmonds' algorithm

Figure labels: bipartite graph with Source Pods 1 to 4 and Destination Pods 1 to 4

a) Pods do not send traffic to themselves
b) Edge weights represent interpod demand
c) Algorithm is run iteratively for each circuit switch, making use of the previous results
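Points a) to c) can be illustrated with a small Python sketch. It does not implement Edmonds' algorithm: for readability it finds the maximum-weight matching by brute-force enumeration, which is only practical for a handful of pods. The rule for reusing previous results (subtract one circuit's capacity from the demand it serves) is an assumption of this sketch, as are all names and the example matrix.

```python
from itertools import permutations


def best_matching(demand):
    """Return the source->destination assignment (a tuple indexed by source pod)
    with the largest total demand, excluding self-loops. Needs at least 2 pods."""
    n = len(demand)
    best, best_weight = None, -1.0
    for perm in permutations(range(n)):
        if any(src == dst for src, dst in enumerate(perm)):
            continue  # a) pods do not send traffic to themselves
        weight = sum(demand[src][dst] for src, dst in enumerate(perm))  # b)
        if weight > best_weight:
            best, best_weight = perm, weight
    return best


def schedule_circuits(demand, num_switches, circuit_gbps):
    """c) Run the matching once per circuit switch on the remaining demand."""
    remaining = [row[:] for row in demand]
    plan = []
    for _ in range(num_switches):
        perm = best_matching(remaining)
        plan.append(perm)
        for src, dst in enumerate(perm):
            remaining[src][dst] = max(0.0, remaining[src][dst] - circuit_gbps)
    return plan


# Hypothetical inter-pod demand in Gb/s (row = source pod, column = destination pod).
demand = [
    [0, 80, 10, 5],
    [80, 0, 5, 10],
    [10, 5, 0, 80],
    [5, 10, 80, 0],
]
plan = schedule_circuits(demand, num_switches=2, circuit_gbps=80.0)
print(plan)  # [(1, 0, 3, 2), (2, 3, 0, 1)]
```

The first circuit switch serves the four 80G pairs; the second is then matched against what is left, so it picks up the 10G pairs instead of duplicating the first assignment.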

Example: Compute Optimal Topology

[Figure: worked example shown over three slides; image content not recoverable]


Figure labels: Traditional Network; Helios Network

100% bisection bandwidth (240 Gb/s)

Hardware
• 24 servers
  – HP DL380
  – 2 socket (E5520) Nehalem
  – Dual Myricom 10G NICs
• 7 switches
  – One Dell 1G 48-port
  – Three Fulcrum 10G 24-port
  – One Glimmerglass 64-port optical circuit switch
  – Two Cisco Nexus 5020 10G 52-port


Traditional Network

Figure annotations: Hash Collisions; TCP/IP Overhead

190 Gb/s Peak
171 Gb/s Avg

Helios Network (Baseline)

160 Gb/s Peak
43 Gb/s Avg

Port Debouncing

[Timeline figure: Time (s) axis from 0.0 to 2.0]

1. Layer 1 PHY signal locked (bits are detected)
2. Switch thread wakes up and polls for PHY status
   • Makes note to enable link after 2 seconds
3. Switch thread enables Layer 2 link
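The three steps above amount to a hold-off timer between PHY lock and link-up. The Python sketch below models that behaviour. The class and method names are hypothetical, and resetting the timer when the signal is lost is an assumption of this sketch, not something the slide states.

```python
class PortDebouncer:
    """Enable the Layer 2 link only after the PHY has stayed locked
    for hold_seconds (2 seconds on the slide)."""

    def __init__(self, hold_seconds=2.0):
        self.hold_seconds = hold_seconds
        self.enable_at = None  # time at which the link may be enabled
        self.link_up = False

    def poll(self, now, phy_locked):
        """Called by the switch thread; returns whether the link is up."""
        if not phy_locked:
            # Assumption: losing the signal cancels the pending enable.
            self.enable_at = None
            self.link_up = False
        elif self.link_up:
            pass  # already enabled, nothing to do
        elif self.enable_at is None:
            # Step 2: note when the link should be enabled.
            self.enable_at = now + self.hold_seconds
        elif now >= self.enable_at:
            # Step 3: the hold-off has elapsed, enable the Layer 2 link.
            self.link_up = True
        return self.link_up


port = PortDebouncer()
print(port.poll(0.0, True))  # False: PHY locked, timer started
print(port.poll(1.0, True))  # False: still inside the 2 s hold-off
print(port.poll(2.0, True))  # True: link enabled
```

With a circuit switch that reconfigures in milliseconds, a hold-off of this length is long compared with the switching time itself.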

Without Debouncing

160 Gb/s Peak
87 Gb/s Avg

Without EDC

160 Gb/s Peak
142 Gb/s Avg

Figure annotations: Software Limitation; 27 ms Gaps

Bidirectional Circuits

Figure labels: Optical Circuit Switch; three Pod Switches, each with RX and TX ports

Unidirectional Circuits

Figure labels: Optical Circuit Switch; three Pod Switches, each with RX and TX ports

Unidirectional Circuits

• Unidirectional Scheduler: 142 Gb/s Avg
• Bidirectional Scheduler: 100 Gb/s Avg

Daisy Chain Needed for Good Performance For Arbitrary Traffic Patterns


Traffic Stability and Throughput


                              Link Technology              Modifications Required           Working Prototype
Helios (SIGCOMM ’10)          Optics w/ WDM                Switch Software                  Glimmerglass, Fulcrum
                              10G-180G (CWDM)
                              10G-400G (DWDM)
c-Through (SIGCOMM ’10)       Optics (10G)                 Host OS                          Emulation
Flyways (HotNets ’09)         Wireless (1G, 10m)           Unspecified
IBM System-S (GLOBECOM ’09)   Optics (10G)                 Host Application;                Calient, Nortel
                                                           Specific to Stream Processing
HPC (SC ’05)                  Optics (10G)                 Host NIC Hardware



“Why Packet Switching?”

“The conventional wisdom [of 1985 is] that packet switching is poorly suited to the needs of telephony . . .”


Jonathan Turner. “Design of an Integrated Services Packet Network”. IEEE J. on Selected Areas in Communications, SAC-4 (8), Nov 1986.

Conclusion

• Helios: a scalable, energy-efficient network architecture for modular data centers
• Large cost, power, and cabling complexity savings
• Dynamically and automatically provisions bisection bandwidth at runtime
• Does not require end-host modifications or switch hardware modifications
• Deployable today using commercial components
• Uses the strengths of circuit switching to compensate for the weaknesses of packet switching, and vice versa
