2nd Symposium on Networked Systems Design & Implementation (NSDI) Boston, MA May 2-4, 2005 Guohui Wang 3 , David G. Andersen 2 , Michael Kaminsky 1 , Michael Kozuch 1 , T. S. Eugene Ng 3 , Dina Papagiannaki 1 , Madeleine Glick 1 and Lily Mummert 1 , 1. Intel Labs Pittsburgh 2. Carnegie Mellon University 3. Rice University 1 Your Data Center Is a Router: The Case for Reconfigurable Optical Circuit Switched Paths


Page 1: Your Data Center Is a Router: The Case for Reconfigurable Optical Circuit Switched Paths

2nd Symposium on Networked Systems Design & Implementation (NSDI)

Boston, MA, May 2-4, 2005

Guohui Wang (3), David G. Andersen (2), Michael Kaminsky (1), Michael Kozuch (1), T. S. Eugene Ng (3), Dina Papagiannaki (1), Madeleine Glick (1) and Lily Mummert (1)

(1) Intel Labs Pittsburgh, (2) Carnegie Mellon University, (3) Rice University

Page 2: Data Center Network

Today’s Data Center Network

Data-intensive applications are hitting bandwidth bottlenecks in tree-structured data center networks, e.g. video data processing, MapReduce …

[Figure: data center network with labels Core Switch, End of Row Switch and Top of Rack Switch. Picture from: James Hamilton, Architecture for Modular Data Centers]

Page 3: Full bisection bandwidth solutions

Re-structure data center network to provide full bisection bandwidth among all the servers.

Complicated network structure, hard to construct and expand.

[Figure: Tree, FatTree, BCube. Picture from: Ken Hall, Green Data Centers]

Page 4: Full bisection bandwidth may not be necessary

Spatial Traffic Locality
– Nodes only communicate with a small number of partners.
– e.g. Earthquake simulation

Temporal Traffic Locality
– Applications might hit CPU, disk IO or sync bounds.
– e.g. MapReduce

Many measurement studies have suggested evidence of traffic locality.
– [SC05][WREN09][IMC09][HotNets09]

Full bisection bandwidth solutions provide more bandwidth than needed, at high cost.

Page 5: An alternative design: hybrid data center network

A hybrid network may give us the best of both worlds:
– Optical circuit-switched paths for data intensive transfer.
– Electrical packet-switched paths for timely delivery.

[Figure: racks A to F attached to both an optical circuit-switched network and an electrical packet-switched network]

Page 6: Optical Circuit Switching

MEMS Optical Switching Module

Switches light at whatever rate is modulated on the input/output ports

Up to tens of ms physical reconfiguration time

Picture from: http://www.ntt.co.jp/milab/en/project/pr05_3Dmems.html

Page 7: Optical Channels

Ultra-high bandwidth

Dropping prices

40G, 100Gbps technology has been developed.

15.5Tbps over a single fiber!

[Figure: Price of Optical Transceivers. x-axis: Year (2000 to 2012); y-axis: Cost. Series: OC-192, 10Gbps, OvumRHK; OC-192, 10Gbps, Lightcounting; OC-768, 40Gbps, VSR. Price data from: Joe Berthold, Hot Interconnects’09]

Page 8: Optical circuits in datacenters

[Figure: racks A to F with three example optical circuit configurations: (A - E, B - D, C - F), (A - D, B - E, C - F), (A - F, B - E, C - D)]

Advantages:
– Simple and flexible: easy to construct, expand and manage
– Ultra-high bandwidth
– Low power

Disadvantages:
– Fat pipes are not all-to-all.
– Reconfiguration overhead
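The "not all-to-all" limitation can be made concrete with a short sketch. It assumes, as the three example configurations on this slide suggest, that each rack terminates exactly one optical circuit at a time; the helper names are illustrative and do not come from the talk.

```python
from itertools import combinations

RACKS = ["A", "B", "C", "D", "E", "F"]

# The three example configurations shown on the slide.
CONFIGS = [
    [("A", "E"), ("B", "D"), ("C", "F")],
    [("A", "D"), ("B", "E"), ("C", "F")],
    [("A", "F"), ("B", "E"), ("C", "D")],
]

def is_valid_configuration(circuits, racks):
    """True if every rack terminates exactly one circuit (assumed constraint)."""
    endpoints = [rack for pair in circuits for rack in pair]
    return sorted(endpoints) == sorted(racks)

def connected_fraction(circuits, racks):
    """Fraction of all rack pairs that currently have a direct optical circuit."""
    all_pairs = len(list(combinations(racks, 2)))
    return len(circuits) / all_pairs

for cfg in CONFIGS:
    print(cfg, is_valid_configuration(cfg, RACKS), connected_fraction(cfg, RACKS))
```

Under that assumption, only 3 of the 15 rack pairs have a fat pipe at any moment, so the circuits have to follow the traffic through reconfiguration.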

Page 9: Research questions

• Is there enough traffic locality in data centers to leverage optical paths?

• Can optical paths be reconfigured fast enough to meet dynamic traffic?

• How to integrate optical circuits into data centers at low cost?

• How to manage and leverage optical paths?

• How do applications behave over the hybrid network?

Page 10: Is there enough traffic locality?

Analyzing a production data center traffic trace:
– 7 racks, 155 servers, 1060 cores
– One week of NetFlow traces collected at all servers
– Configure 3 optical paths, out of 21 cross-rack paths in total, so as to maximize optical traffic; reconfigure every 10s.

Traffic locality: a few optical paths have the potential to offload a significant amount of traffic from the electrical network.

[Figure: one traffic matrix (TM) per 10 s interval over time. Bar chart of the fraction of optical traffic (on average) for evenly distributed traffic versus real traffic, which is skewed.]
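The following sketch illustrates the comparison between evenly distributed and skewed traffic. It is not the analysis used in the talk. As a simplification it picks the 3 heaviest rack pairs in one interval, without enforcing one circuit per rack, and the skewed volumes are made-up numbers.

```python
from itertools import combinations

def optical_fraction(pair_volume, k=3):
    """Fraction of cross-rack traffic carried by the k heaviest rack pairs."""
    total = sum(pair_volume.values())
    if total == 0:
        return 0.0
    heaviest = sorted(pair_volume.values(), reverse=True)[:k]
    return sum(heaviest) / total

racks = [f"R{i}" for i in range(1, 8)]      # 7 racks, as in the trace
pairs = list(combinations(racks, 2))        # 21 cross-rack paths

# Evenly distributed traffic: every rack pair exchanges the same volume.
uniform = {pair: 1.0 for pair in pairs}

# Hypothetical skewed interval: three pairs dominate, the rest carry little.
skewed = {pair: 1.0 for pair in pairs}
skewed[("R1", "R2")] = 40.0
skewed[("R3", "R4")] = 25.0
skewed[("R5", "R6")] = 17.0

print(len(pairs))                   # 21
print(optical_fraction(uniform))    # 3/21, about 0.14
print(optical_fraction(skewed))     # 82/100 = 0.82 for these made-up volumes
```

With evenly distributed traffic, 3 of 21 paths can carry at most one seventh of the volume. Any larger measured fraction therefore indicates skew.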

Page 11: Can optical paths be reconfigured fast enough? - Optical Path Configuration Algorithm

[Figure: graph G = (V, E) over racks R1 to R8, shown with a matrix indexed by R1 to R8. Edge weight: w_xy = vol(Rx, Ry) + vol(Ry, Rx). Edges shown: w12, w14, w43, w38, w68, w36, w35, w27, w47.]

Optical path configuration is a maximum weight perfect matching on graph G.

Solved in polynomial time by Edmonds’ algorithm [1]!

[1] J. Edmonds, Paths, trees and flowers, Canadian J. of Mathematics, pp. 449-467, 1965
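The matching formulation can be sketched on a small example. Note that the code below is not Edmonds' blossom algorithm: it is an exponential-time dynamic program over subsets, which is only practical for a handful of racks and is used here because it fits in a few lines. Racks are numbered from 0, and the volumes are made up.

```python
from functools import lru_cache

def pair_weights(vol):
    """Build w_xy = vol(Rx, Ry) + vol(Ry, Rx) from directed volumes."""
    weights = {}
    for (src, dst), volume in vol.items():
        key = frozenset((src, dst))
        weights[key] = weights.get(key, 0.0) + volume
    return weights

def max_weight_perfect_matching(n, weight):
    """Return (total_weight, pairs) for racks 0..n-1; n must be even.

    weight maps frozenset({x, y}) -> w_xy; missing pairs count as 0.
    """
    if n % 2:
        raise ValueError("a perfect matching needs an even number of racks")

    def w(x, y):
        return weight.get(frozenset((x, y)), 0.0)

    @lru_cache(maxsize=None)
    def best(unmatched):
        # unmatched is a bitmask of racks that still need a partner.
        if unmatched == 0:
            return 0.0, ()
        x = (unmatched & -unmatched).bit_length() - 1   # lowest unmatched rack
        rest = unmatched & ~(1 << x)
        result = None
        for y in range(x + 1, n):
            if (rest >> y) & 1:
                sub_total, sub_pairs = best(rest & ~(1 << y))
                cand = (sub_total + w(x, y), ((x, y),) + sub_pairs)
                if result is None or cand[0] > result[0]:
                    result = cand
        return result

    return best((1 << n) - 1)

# Hypothetical directed traffic volumes between 4 racks.
vol = {(0, 1): 5.0, (1, 0): 3.0, (2, 3): 4.0,
       (0, 2): 6.0, (1, 3): 1.0, (3, 1): 1.5}
total, circuits = max_weight_perfect_matching(4, pair_weights(vol))
print(total, circuits)   # 12.0 ((0, 1), (2, 3))
```

At data center scale, a polynomial-time implementation of Edmonds' algorithm would take the place of the subset search.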

Page 12: Can optical paths be reconfigured fast enough? - Optical Path Configuration Time

Several time factors
– Computation time
  • 640ms for a 1000-rack data center using Edmonds’ algorithm.
– Signaling time
  • < 1ms in data centers
– Physical reconfiguration time
  • Up to tens of ms for MEMS optical switches

Even in very large data centers, optical paths can still be reconfigured at small time scales (< 1 sec).
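As a rough check of the sub-second claim, the three factors can simply be added. The computation and signaling figures are the ones quoted on this slide. The MEMS value is an assumed stand-in for "tens of ms", and the 10 s interval is the reconfiguration period used in the trace study.

```python
COMPUTATION_MS = 640.0       # Edmonds' algorithm, 1000-rack data center
SIGNALING_MS = 1.0           # upper bound for "< 1 ms in data centers"
MEMS_SWITCHING_MS = 50.0     # assumption: a representative "tens of ms" value
RECONFIG_INTERVAL_MS = 10_000.0   # reconfigure every 10 s

def reconfiguration_time_ms(computation, signaling, switching):
    """Total time to compute, signal and physically apply a new configuration."""
    return computation + signaling + switching

total_ms = reconfiguration_time_ms(COMPUTATION_MS, SIGNALING_MS, MEMS_SWITCHING_MS)
overhead = total_ms / RECONFIG_INTERVAL_MS

print(total_ms)    # 691.0, below the 1 s bound
print(overhead)    # about 0.069 of a 10 s interval
```

Computation dominates the total. Only the MEMS switching step requires the optical path itself to be down.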

Page 13: How to manage optical paths in data centers?

Routing over dynamic dual-path (electrical/optical) network:

• Ethernet Spanning Tree?
– NO, dual paths will be blocked

• Link State Routing?
– NO, long routing convergence time after reconfiguration

Page 14: How to manage optical paths in data centers?

VLAN based dual-path routing:

VLAN1: Electrical

VLAN2: Optical

• Advantages:
– Leverage both electrical and optical paths by tagging packets
– No route convergence delay after optical reconfiguration
– No need to modify switches
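A minimal sketch of the tagging idea follows. It assumes the sender knows which destination rack its own rack currently has a circuit to. The VLAN IDs follow the slide, the tag layout is the standard IEEE 802.1Q one, and the function names are hypothetical.

```python
import struct

VLAN_ELECTRICAL = 1    # VLAN1 on the slide
VLAN_OPTICAL = 2       # VLAN2 on the slide
TPID_8021Q = 0x8100    # IEEE 802.1Q tag protocol identifier

def choose_vlan(dst_rack, optical_peer):
    """Use the optical VLAN only when a circuit to dst_rack is currently up."""
    return VLAN_OPTICAL if dst_rack == optical_peer else VLAN_ELECTRICAL

def vlan_tag(vlan_id, priority=0):
    """Build the 4-byte 802.1Q tag: TPID, then PCP (3 bits), DEI (1), VID (12)."""
    if not 0 < vlan_id < 4095:
        raise ValueError("VLAN ID must be in 1..4094")
    tci = (priority << 13) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)

optical_peer = "D"                       # hypothetical: circuit to rack D is up
print(choose_vlan("D", optical_peer))    # 2
print(choose_vlan("E", optical_peer))    # 1
print(vlan_tag(VLAN_OPTICAL).hex())      # 81000002
```

After a reconfiguration only `optical_peer` changes on the sender. The switches keep their static VLAN settings, so no routing protocol has to reconverge.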

Page 15: How to manage optical paths in data centers?

How to measure application traffic demand?

Extensive buffering at servers
– Traffic demand measurement
– Aggregate traffic and batch it for optical transfer

Per-rack virtual output queuing:
– Avoid head-of-line blocking

[Figure: server diagram with labels Servers, User, Kernel, Apps, Per-rack Virtual Output Queue, Scheduler and Network Interface]
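Per-rack virtual output queuing can be sketched as one FIFO per destination rack. This is an illustrative user-space model, not the kernel implementation described in the talk. The class and method names are hypothetical.

```python
from collections import defaultdict, deque

class PerRackVOQ:
    """One FIFO queue per destination rack (a virtual output queue)."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def enqueue(self, dst_rack, packet):
        self.queues[dst_rack].append(packet)

    def demand(self):
        """Bytes waiting per destination rack: a simple traffic demand estimate."""
        return {rack: sum(len(p) for p in q)
                for rack, q in self.queues.items() if q}

    def dequeue(self, dst_rack):
        """Pop the next packet for dst_rack, or None if that queue is empty."""
        q = self.queues.get(dst_rack)
        return q.popleft() if q else None

    def drain(self, dst_rack, budget_bytes):
        """Batch packets for one rack, e.g. while its optical circuit is up."""
        batch, used = [], 0
        q = self.queues.get(dst_rack, deque())
        while q and used + len(q[0]) <= budget_bytes:
            pkt = q.popleft()
            used += len(pkt)
            batch.append(pkt)
        return batch

voq = PerRackVOQ()
voq.enqueue("B", b"x" * 1500)
voq.enqueue("C", b"y" * 500)
voq.enqueue("B", b"z" * 1500)
print(voq.demand())                 # {'B': 3000, 'C': 500}
print(len(voq.dequeue("C")))        # 500: not stuck behind the packets for B
```

Because each destination rack has its own queue, a backlog toward one rack never delays traffic to another. The queue lengths also serve as the demand measurement.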

Page 16: How to manage optical paths in data centers?

[Figure: a Configuration Manager exchanges Stats and Config with a Daemon on each server and sends Config to the Switches with VLAN settings, which carry the Traffic. Server labels: Servers, User, Kernel, Apps, Per-rack Virtual Output Queue, Scheduler, Network Interface.]

How to configure optical paths and schedule traffic to them?

A centralized manager to control the optical path configuration.

Configurable virtual output queue scheduler to control traffic to optical paths.
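One control round of such a manager might look like the sketch below. The shapes of the stats and config messages are assumptions, not a format described in the talk. For brevity the circuit choice uses a greedy heuristic, whereas the talk formulates it as a maximum weight perfect matching.

```python
def aggregate_demand(server_stats, rack_of):
    """Sum per-server queue stats into undirected rack-pair demand."""
    demand = {}
    for server, per_rack_bytes in server_stats.items():
        src = rack_of[server]
        for dst, nbytes in per_rack_bytes.items():
            if dst != src:
                key = frozenset((src, dst))
                demand[key] = demand.get(key, 0) + nbytes
    return demand

def greedy_circuits(demand):
    """Pick the heaviest rack pairs first, one circuit per rack (heuristic)."""
    used, circuits = set(), []
    ordered = sorted(demand.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for pair, _ in ordered:
        if not pair & used:
            circuits.append(tuple(sorted(pair)))
            used |= pair
    return circuits

def server_configs(circuits, rack_of):
    """Tell each server which destination rack, if any, may use the optical path."""
    peer = {}
    for x, y in circuits:
        peer[x], peer[y] = y, x
    return {server: peer.get(rack) for server, rack in rack_of.items()}

# Hypothetical placement and queue statistics (bytes waiting per destination rack).
rack_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "C", "s5": "D"}
server_stats = {
    "s1": {"B": 100, "C": 900},
    "s2": {"C": 300},
    "s3": {"A": 50, "D": 400},
    "s4": {"A": 200},
    "s5": {"B": 100},
}

circuits = greedy_circuits(aggregate_demand(server_stats, rack_of))
print(circuits)                           # [('A', 'C'), ('B', 'D')]
print(server_configs(circuits, rack_of))
```

Each server's scheduler would then release only the queue for its configured peer rack onto the optical VLAN. All other traffic stays on the electrical path.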


Page 17: Challenges

• TCP/IP reacting to optical path reconfiguration.

• Potential long delays caused by extensive queuing at servers.

• Collecting traffic demand from a million servers.

• Choosing the right buffer sizes and reconfiguration intervals.

Page 18: Summary

Adding optical circuit switched paths into data centers.

Potential benefits:

• A simpler and more flexible data center network design.

• Relieving data intensive applications from network bottlenecks.