
Front. Comput. Sci. DOI

REVIEW ARTICLE

Rethinking the architecture design of data center networks

Kaishun WU 1,2, Jiang XIAO 2, Lionel M. NI 2

1 National Engineering Research Center of Digital Life, State-Province Joint Laboratory of Digital Home Interactive Applications, School of Physics and Engineering, Sun Yat-sen University, Guangzhou 510006, China

2 Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2012

Abstract In the rising tide of the Internet of things, more and more things in the world are connected to the Internet. Recently, data has kept growing at a rate more than four times that predicted by Moore's law. This explosion of data comes from various sources such as mobile phones, video cameras, and sensor networks, and the data often presents multidimensional characteristics. The huge amount of data brings many challenges to the management, transportation, and processing IT infrastructures. To address these challenges, state-of-the-art large-scale data center networks have begun to provide cloud services that are increasingly prevalent. However, how to build a good data center remains an open challenge, and the architecture design, which significantly affects the total performance, is of great research interest. This paper surveys advances in data center network design. We first introduce the upcoming trends in the data center industry. Then we review some popular design principles for today's data center network architectures. In the third part, we present some up-to-date data center frameworks and make a comprehensive comparison of them. During the comparison, we observe that there is no single optimal data center: the design should differ according to the data placement, replication, processing, and query processing. After that, several existing challenges and limitations are discussed. Based on these observations, we point out some possible future research directions.

Keywords data center networks, switch-based networks, direct networks, hybrid networks

Received August 23, 2011; accepted January 15, 2012

E-mail: [email protected], [email protected], [email protected]

1 Introduction

Because of the rapid data explosion, many companies are outgrowing their current server space, and more data centers are required. These data-intensive systems may have hundreds of thousands of computers and an overwhelming requirement for aggregate network bandwidth. Different from traditional hosting facilities, in these systems the computations continue to move into the cloud and the computing platforms are becoming warehouses full of computers. These new data centers should not be considered simply as collections of servers, because plenty of hardware and software resources work together, with a good Internet service provider (ISP) also taken into account. To support such services, companies such as eBay, Facebook, Microsoft, and Yahoo have invested heavily in data center construction. For example, Google envisioned having 10 million servers, and Microsoft had 50 000+ servers in its data centers in 2009, as shown in Fig. 1 [1].

During the last few years, research on data centers has been growing fast. In these developments, networking is the only component that has not changed dramatically [2]. Though this component does not have the largest cost among others, it is considered one of the key levers to reduce cost and improve performance. Aware of this, the architecture design of data centers has become a hot research topic in the last decade. Nowadays, typical network architectures are

http://www.datacenterknowledge.com/archives/2009/10/20/google-envisions-10-million-servers


Kaishun WU et al. Rethinking the architecture design in data center networks

Fig. 1 Microsoft data center in Chicago

a hierarchy of routers and switches. When the network size scales up and the hierarchy becomes deeper, more powerful (and much more expensive) routers and switches are needed. As data centers further develop and expand, the gap between the desired bandwidth and the provisioned bandwidth increases, even though the hardware develops quickly as well. Thus, one of the major challenges in architecture design is how to achieve higher performance while keeping the cost low. This article focuses on this question and presents some current works in this area. In the remainder of this article, we first introduce some basic design principles. Subsequently, we give details of interconnection techniques using commodity switches, such as fat-tree [3], DCell [4], and BCube [5]. We then compare the different data center architectures. Based on what we have learned, we highlight some open challenges and limitations of existing works and suggest some possible future directions.

2 Design principles/issues

In this section, we present some design criteria and consider-ations of modern data center designs.

Scalability: as the amount of data grows, we need more storage capacity in the data center. One typical way to increase storage is to add more components instead of replacing old ones. As more hardware is integrated into data centers, the scalability of the data center network becomes crucial.

Incremental scalability: in practice, instead of adding a huge number of servers at a time, we usually add a small number of storage hosts at a time. We expect such an add-on to have minimal impact on both the system operator and the system itself [6].

Cabling complexity: in traditional networks (e.g., homes and offices), cabling is simple. In data center environments, however, cabling is a critical issue when tens of thousands of nodes are hosted. The massive number of cables introduces many practical problems in connection effort, maintenance, and cooling.

Bisection bandwidth: the minimum total bandwidth between the two halves of the network, taken over all ways of partitioning the nodes into two equal parts. This metric is widely used in the performance evaluation of data center networks [3–5, 7, 8].
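To make the definition concrete, the sketch below brute-forces the bisection bandwidth of a tiny topology; the function name and the four-switch ring are our own illustration, not from the paper.

```python
from itertools import combinations

def bisection_bandwidth(nodes, links):
    """Brute-force bisection bandwidth of a tiny topology (illustrative only).

    links maps (a, b) pairs to link capacity. Returns the minimum total
    capacity crossing any split of the nodes into two equal halves.
    """
    best = float("inf")
    for half in combinations(nodes, len(nodes) // 2):
        half = set(half)
        # Sum the capacity of links with exactly one endpoint in this half.
        crossing = sum(c for (a, b), c in links.items()
                       if (a in half) != (b in half))
        best = min(best, crossing)
    return best

# A 4-node ring with unit-capacity links: every bisection cuts 2 links.
nodes = ["s0", "s1", "s2", "s3"]
links = {("s0", "s1"): 1, ("s1", "s2"): 1, ("s2", "s3"): 1, ("s3", "s0"): 1}
print(bisection_bandwidth(nodes, links))  # 2
```

The brute force enumerates every equal split, so it is only feasible for toy examples; it is meant to pin down the metric, not to evaluate real topologies.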

Aggregate throughput: the sum of the data rates achieved when a network-wide broadcast is conducted. It is also known as the system throughput.

Oversubscription: given a particular communication topology in a data center network, its oversubscription is defined as the ratio of the worst-case achievable aggregate bandwidth among the end hosts to the total bisection bandwidth [3]. A ratio of 1:1 means all hosts can communicate at full line rate.
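As a concrete illustration of this ratio at a single switch layer, consider a hypothetical top-of-rack switch; the port counts below are assumptions for illustration, not figures from [3].

```python
def oversubscription(server_ports, server_port_gbps, uplinks, uplink_gbps):
    """Oversubscription of one switch layer (illustrative sketch):
    worst-case offered load from below divided by capacity toward the core."""
    return (server_ports * server_port_gbps) / (uplinks * uplink_gbps)

# A hypothetical ToR switch: 48 x 1 Gbps server ports, 4 x 10 Gbps uplinks.
ratio = oversubscription(48, 1, 4, 10)
print(f"{ratio}:1")  # 1.2:1
```

In a multi-level tree the ratios multiply along the path to the core, which is why top-level bottlenecks dominate.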

Fault tolerance: hardware failures are common in large-scale data centers, and they make data center networks suffer from poor reliability and low utilization [9]. When hardware failures occur, alternative means are needed to ensure the availability of the data center network.

Energy consumption: recently, the power efficiency of data center networks has become increasingly important. To reduce energy consumption, we can use low-power CPUs and GPUs, equip more efficient power supplies, and apply water-cooling mechanisms. Software means, such as virtualization and smart cooling, can also help. Besides these, the architecture design is also critical for controlling the energy cost.

Costs: cost greatly affects the design decisions for building large-scale data center networks. We hope to leverage economical off-the-shelf hardware for large-scale network interconnection [2].

Fairness: for applications that farm work out to many workers and finish only when the last worker finishes, fair sharing of network resources among the workers can greatly improve overall performance in data center networks.

Reliability: high reliability is one of the most essential criteria for designing data center networks. In fact, an unreliable data center network that causes applications and services to fail can induce a great waste of computing resources.

Security: security is also critical for the success of data center network services. Data exchanged between different nodes should be isolated from unintended services to guarantee security.

Latency: in data center networks, the delay incurred in the end systems or in transmission between network nodes is called latency. Low-latency interconnection in data center networks benefits international data traffic; for example, by reducing international data transmission latency, colocation costs can be reduced.

These criteria interplay with each other to influence the performance of data center networks. Such interaction includes checks and balances. For example, a data center network will incur high latency when some link fails (because too many hops are needed to deliver a packet); a data center network with high reliability should also be fault tolerant.

3 Existing DCN architectures

State-of-the-art data center networks can be classified into three main classes according to their network interconnection principles: switch-based networks, direct networks, and hybrid networks. In what follows we elaborate on and exemplify each of them.

3.1 Switch-based network

A switch-based network, also called an "indirect network", typically consists of a multi-level tree of switches (typically two or three levels) that connects the end servers. Switch-based networks are widely adopted and implemented in today's tera-scale data centers. They are able to support communications between tens of thousands of servers. Take a conventional three-level switch-based network as an example. The leaf switches (also known as top-of-rack (ToR) switches) have a set of 1 Gbps Ethernet ports and are responsible for transferring packets within the rack. The layer-2 aggregation switches have 10 Gbps links to interconnect the ToR switches, and these layer-2 switches are in turn connected by a more powerful switch when a deeper hierarchy is applied. In switch-based network architectures, the bottleneck is at the top level of the tree. This bandwidth bottleneck is often alleviated by employing more powerful hardware at the expense of high-end switches. Such solutions may worsen the oversubscription problem and cause scalability issues. To address these issues, the Fat-tree architecture [3] has been proposed.


Instead of the "skinny" links used in a traditional tree, Fat-tree allows "fatter" links from the leaves towards the root. A typical Fat-tree can be split into three layers: core, aggregation, and edge (see Fig. 2). Suppose there are k pods (a pod is a small bunch of servers with certain connections in between); each pod supports non-blocking operation among the (k/2)^2 core switches of the data center network (in the example of Fig. 2, k = 4). The switches in each pod are divided into two groups: the edge switches, each directly connected to k/2 servers, and the aggregation switches, which connect the edge group to the core. Consequently, the total number of servers supported by a fat tree is k^3/4. Each core switch connects to one switch in every pod and thus ultimately reaches all the servers. The Fat-tree architecture performs as well as the traditional tree architecture but uses commodity switches only, avoiding expensive high-end devices.
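The counting above can be sketched in a few lines; this is an illustrative helper of our own, not code from [3].

```python
def fat_tree_size(k):
    """Element counts for a k-ary fat-tree (k even), following the
    construction in [3]: k pods, each with k/2 edge and k/2 aggregation
    switches, and (k/2)^2 core switches."""
    assert k % 2 == 0, "k must be even"
    core = (k // 2) ** 2           # core switches
    aggregation = k * (k // 2)     # k pods x k/2 aggregation switches
    edge = k * (k // 2)            # k pods x k/2 edge switches
    servers = k ** 3 // 4          # k/2 servers per edge switch
    return {"core": core, "aggregation": aggregation,
            "edge": edge, "servers": servers}

print(fat_tree_size(4))   # the k = 4 example of Fig. 2: 16 servers
print(fat_tree_size(48))  # 48-port commodity switches: 27 648 servers
```

The k = 48 case shows the appeal of the design: tens of thousands of servers interconnected entirely with identical commodity switches.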

Recently, the proliferation of cloud services has incentivized the construction of massive data centers. To supply a myriad of distinct services in data centers at such scale, server utilization should be improved as well. Towards this end, the agility of assigning any server to any service is an essential property for a data center network design. An architecture with higher agility can achieve high utilization and the capability of dynamically allocating resources. For instance, Greenberg et al. [9] introduce the virtual layer 2 (VL2) architecture based on the basic fat tree topology. VL2 presents the attractive agility of giving all the servers, ranging from one to 100 000, the illusion of being connected to a single virtual layer-2 Ethernet switch. VL2 deploys valiant load balancing (VLB) among multiple paths to ensure a non-interfering network. To better host online applications running on substantial numbers of servers within a common multi-rooted tree data center, PortLand [10] has been proposed. PortLand adopts a "plug-and-play" layer-2 design to enhance fault tolerance and scalability. By employing an OpenFlow fabric manager together with the local switches at the edge of the data center network, PortLand can delicately make the appropriate forwarding decisions.

3.2 Direct network

Another option for making connections between the servers is a direct network (also termed a "router-based network"). Direct networks connect servers directly to other servers without any intermediate switches, routers, or other network devices. Each server acts both as an end host and as a network forwarder. Direct networks are often used to provide better scalability, fault tolerance, and high network capacity. Some practical implementations of direct networks will be presented


Fig. 2 Switch-based architecture (core, aggregation, and edge layers)

Fig. 3 DCell architecture (a DCell_1 built from five DCell_0s, n = 4)

Fig. 4 BCube architecture (a BCube_k recursively built from n BCube_{k-1}s)

Fig. 5 Hybrid architecture (Ethernet and optical switches)

here. DCell [4] is one of the first direct data center networks. In DCell, servers are connected to several other servers via mini-switches with bidirectional communication links. A high-level DCell is constructed recursively. More specifically, denote a level-k DCell as DCell_k, where k ≥ 0. First, n servers and a mini-switch form a DCell_0, in which all servers are connected to the mini-switch. A DCell_1 is then made of n + 1 DCell_0s, and every pair of these DCell_0s is connected by a single link. Therefore, in a DCell_1 each server has two links: one to its mini-switch and the other to a server in another DCell_0 (see Fig. 3 for an example). Similarly, we can construct a DCell_2 with n + 1 DCell_1s, and so on for a DCell_k using n + 1 DCell_{k-1}s. High network capacity and good fault tolerance are desirable traits of DCell. Though DCell is advantageous for scaling out, its incremental scalability is poor. Specifically, once a DCell structure is complete, it is very hard to add a small number of new servers without ruining the original structure. Moreover, imbalanced traffic load also makes DCell perform poorly. To support unevenly distributed traffic loads in data centers, the generalized DCell framework [7] has been proposed, which has a smaller diameter and a more symmetric structure.
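The recursive construction implies doubly exponential growth: if t is the number of servers in a DCell_{k-1}, then a DCell_k holds t(t + 1) servers. A small sketch (our own helper, not code from [4]):

```python
def dcell_servers(n, k):
    """Number of servers in a DCell_k built from n-port mini-switches,
    following [4]: t_0 = n, and a DCell_k consists of t_{k-1} + 1 copies
    of DCell_{k-1}, so t_k = t_{k-1} * (t_{k-1} + 1)."""
    t = n
    for _ in range(k):
        t = t * (t + 1)
    return t

print(dcell_servers(4, 1))  # 4 * 5 = 20, matching the n = 4 DCell_1 of Fig. 3
print(dcell_servers(4, 2))  # 20 * 21 = 420
```

Even with tiny mini-switches the server count explodes after a few levels, which is the source of DCell's excellent scale-out property.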

With data-intensive services spread out all over the world, modular data centers (MDCs) with a high degree of mobility are urged to emerge. A shipping-container-based MDC is ideal for eliminating hardware administration tasks (e.g., installation, trouble-shooting, and maintenance). It achieves cost effectiveness and environmental robustness by deploying a server-centric approach. For instance, the BCube [5] structure is specially devised for MDCs and consists of multi-port servers and switches. Following DCell, a BCube is constructed recursively: a BCube_k (k ≥ 1, where k denotes the level) is built from n BCube_{k-1}s and n^k n-port switches, and each server has k + 1 ports. It is easy to see that a BCube_k comprises n^{k+1} servers and k + 1 levels of switches. Figure 4 illustrates the basic procedure of constructing a BCube_k. The construction strategy of BCube guarantees that switches connect only to servers, never to other switches. Recently, Microsoft Research proposed a project named CamCube [11] that applies a 3D torus topology, which shares a similar idea with BCube. In a 3D torus, each server directly connects to six other servers, bypassing the use of switches or routers. As the communication links between servers in the data center are direct, higher bisection bandwidth is expected.
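The sizes implied by this construction can be sketched as follows; this is our own illustrative helper, not code from [5].

```python
def bcube_size(n, k):
    """Servers and switches in a BCube_k built from n-port switches,
    following [5]: n^(k+1) servers, and k+1 levels of n^k switches each."""
    servers = n ** (k + 1)
    switches = (k + 1) * n ** k   # k+1 levels, n^k switches per level
    return servers, switches

print(bcube_size(4, 1))  # a BCube_1 with 4-port switches: 16 servers, 8 switches
print(bcube_size(8, 3))  # a BCube_3 with 8-port switches: 4096 servers
```

A BCube_3 of 8-port switches already fills a shipping container's worth of servers, which matches its intended MDC deployment scale.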

3.3 Hybrid network

A novel approach to interconnecting servers and switches has appeared in the rising tide of optical circuit switches. Compared to packet switching, optical circuit switching is superior in terms of its ultra-high bandwidth, low transmission loss, and low power consumption. More importantly, optical switches are becoming commodity off-the-shelf (COTS) components and require shorter reconfiguration times thanks to recent advances in micro-electro-mechanical systems (MEMS). With these improvements, a number of data center networks deploy both optical circuit switching and electrical packet switching to make the connections. We call these "hybrid networks", as shown in Fig. 5. For instance, Helios [12] explores a hybrid two-level multi-rooted tree architecture. By simply programming the packet switches and circuit switches, Helios creates an opportunity to provide preferable ultra-high network bandwidth and a reduced number of wiring cables. Besides Helios, another hybrid data center network architecture is c-Through [13]. c-Through makes better use of the transient high-capacity optical circuits by integrating optical circuit switches and packet-switching servers. Traffic is buffered to collect sufficient volume for high-speed transmission over the optical circuits. A key difference between the two is that Helios handles this buffering on the switches while c-Through utilizes the hosts to buffer data. Helios is advantageous for its transparency of traffic control at the end hosts, but requires the modification of every employed switch. In contrast, c-Through buffers data in the hosts, which allows it to amortize the workload over a longer period of time and utilize the optical links more effectively. Helios and c-Through are two typical hybrid schemes that attempt to optimize data center networks by taking advantage of both kinds of switches.

4 A sea of architectures: which to choose?

In the previous section we gave insights into the state-of-the-art data center network architectures. These proposals exhibit promising features according to their measurements and performance evaluations. It is, however, not clear how they perform when compared with one another. We therefore make a comprehensive comparison between them. In this section we construct a typical data center network context and compare the performance of the different proposals using the metrics from Section 2. We compare the following alternatives and summarize them in Table 1.

The traditional hierarchical tree structure presents the advantage of easy wiring but is limited by poor scalability. It is well known that tree-based architectures are vulnerable to link failures between switches and routers, so their fault tolerance is poor. Fat-tree solves this problem to some extent by increasing the number of aggregation switches, but the wiring becomes much more complex. Multipath routing is effective in maximizing the network capacity, exemplified by the TwoLevelTable of fat-tree, the hot-spot-free routing used by VL2, and the location discovery protocol (LDP) of PortLand. To cope with the tremendous workload volatility in data centers, the fat-tree-based VL2 adopts VLB to balance different traffic patterns. In terms of fault tolerance, fat tree provides gracefully degraded performance, making it greatly outperform the tree structure. It develops a failure broadcast protocol to handle two groups of link failures: (a) between the lower- and upper-layer switches, and (b) between the upper-layer and core switches. Fat tree is also much more cost effective than the tree structure as it requires no expensive high-end switches and routers.

DCell is an alternative proposal that adopts a "direct", recursively defined interconnection topology. In DCell, the sub-networks in the same layer are fully connected, which makes it more scalable than fat tree. However, incremental expansion is a strenuous mission for DCell due to its significant cabling complexity. In addition, traffic imbalance could be a severe obstacle to considering DCell as a primary choice.

In most commercial companies today, a shipping-container-based modular data center meets the need for a high degree of mobility. BCube is the first representative modular data center design. It packs sets of servers and switches into a standard 20- or 40-foot shipping container and then connects different containers through external links. Building on DCell, BCube is designed to support various traffic loads and provide high bisection bandwidth. Load balancing is an appealing advantage of BCube compared to DCell. MDCube [14] scales the BCube structure to a mega level while ensuring high capacity at a reasonable cost. The server-centric MDCube deploys a virtual generalized hypercube at the container level. This approach directly interconnects multiple BCube blocks using 10 Gbps optical switch links. Each switch functions as a virtual interface and each BCube block is treated as a virtual node. Since one node can have multiple interfaces, MDCube can interconnect a huge number of BCube blocks with high network capacity. It also delicately provides load balancing and fault-tolerant routing to immensely improve performance.

Table 1 General comparison of state-of-the-art data center architectures (Tree / Fat tree / DCell / BCube / Hybrid)

Scalability: Poor (scale up) / Good (scale out) / Excellent (scale out) / Good (scale out) / Good
Incremental scalability: Good / Good / Poor / Not necessary / Good
Wiring: Easy / Easy / Very difficult / Difficult / Easy
Multipath routing: No / Switch and router protocol upgrade / End-host protocol upgrade / End-host protocol upgrade / Switch and router protocol upgrade
Fault tolerance: Poor / Against switch and router failures / Against switch, router, and end-host port failures / Against switch, router, and end-host port failures / Against switch, router, and end-host port failures
Cost: High-end switches and routers / Low-end customized switches and routers (cheap but many) / Low-end customized switches and routers (cheap but many) / Low-end customized switches and routers (cheap but many) / Low-end Ethernet and optical switches
Traffic balance: No / Yes / No / Yes / Yes
Graceful degradation: Poor / Good / Excellent / Excellent / Good

For hybrid structures, electrical switches provide low-latency immediate configuration while optical switches excel at ultra-high-speed data transmission, low loss, ultra-high bandwidth, and low power consumption. To combine the best of both worlds, hybrid networks develop traffic demand estimation and traffic demultiplexing mechanisms to dynamically allocate traffic onto the circuit-switched or packet-switched network.
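The demand-based demultiplexing idea can be caricatured in a few lines. This is a toy sketch: the threshold and flow names are assumptions of ours, and real systems such as Helios and c-Through estimate demand far more carefully.

```python
def demultiplex(flows, circuit_threshold_mb=100):
    """Toy traffic demultiplexer (illustrative only; threshold is assumed).

    Large flows go to the optical circuit path, where setup cost is
    amortized over big transfers; the rest stay on the packet-switched
    network, loosely in the spirit of hybrid designs like Helios/c-Through.
    """
    circuit, packet = [], []
    for flow, demand_mb in flows.items():
        (circuit if demand_mb >= circuit_threshold_mb else packet).append(flow)
    return circuit, packet

# Hypothetical per-flow demand estimates, in megabytes.
flows = {"backup": 500, "rpc": 1, "shuffle": 250, "heartbeat": 0.01}
print(demultiplex(flows))  # (['backup', 'shuffle'], ['rpc', 'heartbeat'])
```

The interesting engineering lies outside this sketch: estimating demand, deciding when reconfiguring the MEMS circuit switch pays off, and buffering traffic (on switches for Helios, on hosts for c-Through) until a circuit is available.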

In this section, we observe that the existing topologies of data center networks are similar to those used in high-performance computing (HPC). The difference between them lies in the lower-layer design methods. For example, latency is a key issue in both data center networks and HPC, but data transfer from memory in HPC differs from data transfer between servers in a data center network.

Existing data center architectures are all fixed, so they may not provide adequate support for dynamic traffic patterns and varying traffic loads. It remains an open question which architecture performs best and whether adaptive, dynamic data center networks are feasible.

5 Challenges

With the existing data center network designs, we identifysome key limitations and point out some open challenges thatcan be the subject of future research.

First, existing interconnection designs are all symmetrical. These symmetric architectures are difficult to extend when we need to add a small number of servers, unless we give up the original network structure. In other words, these architectures have poor incremental scalability. For example, in order to expand a data center of the hypercube architecture, the number of servers has to be doubled every time. In practice, however, most companies cannot afford the cost of adding such a large number of servers at one time. In BCube, DCell, and Fat-tree, when the present configuration is full, network performance will degrade due to imbalance when only a small number of new servers are added. Besides these interconnection problems, heterogeneity is also a major issue in network design. In ten years' time new technologies will be available, and we will face a practical dilemma: either we integrate the old and new technologies into a single system, or we retire the old ones. It is yet an open question which will be the better choice. To deal with this problem, is it a good idea to reserve some room for such potential expansion at the present time?

Second, we should consider not only the connections within the data center but also the connections to the external world. In a switch-based network such as Fat-tree, it is easy to connect to the external world. Direct networks, in contrast, focus merely on the interconnection inside the data center and take no account of connections to the external world. Clearly, the latter problem is also crucial. For example, HTTP serves external traffic while MapReduce generates internal traffic. In that case, we should not treat them the same and take uniform actions; different flows may have a varied impact on network performance. We might consider this as a quality of service (QoS) problem and find an optimal design to better schedule the external traffic flows.

Third, as MEMS further develops, energy issues become increasingly important. Different applications may present various traffic patterns with certain unique characteristics. For example, Amazon EC2 is a cloud service that provides infrastructure as a service (IaaS). In EC2, many users and applications run concurrently within a data center. Workloads are affected by user activities, which are difficult if not impossible to predict. Thus, the traffic pattern constantly changes over time. In such cases, the networking design should not assume any fixed traffic pattern. Additionally, the design should also emphasize the network connections to the outside world (the Internet). Another example is a data center that runs data-intensive applications such as Hadoop, where the network design may be optimized for bisection bandwidth and connectivity to the outside world matters less. We can observe that some data may be used very frequently at a given time (called "hot data"). Without careful design, the disks and servers may consume a lot of energy in the transitions between sleep and wake-up states. Some servers, in contrast, store data for backup purposes only (called "cold data"); such servers and disks can safely stay in a sleep state to save energy. In practice, optimizations can be achieved by appropriate scheduling of servers between sleep and wake-up cycles. With this in mind, we can observe that data placement is also important for green data centers.
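The hot/cold scheduling idea can be caricatured as follows; the server names and threshold are assumptions of ours for illustration, not a mechanism from the paper.

```python
def plan_power_states(access_counts, hot_threshold=100):
    """Toy data-placement-aware power plan (illustrative only).

    Servers holding frequently accessed ("hot") data stay awake;
    servers holding only rarely accessed ("cold") data, such as
    backup replicas, can be put to sleep to save energy.
    """
    plan = {}
    for server, accesses_per_hour in access_counts.items():
        plan[server] = "awake" if accesses_per_hour >= hot_threshold else "sleep"
    return plan

# Hypothetical access rates per server, in requests per hour.
counts = {"web-cache-1": 5000, "backup-7": 2, "logs-3": 40}
print(plan_power_states(counts))
# {'web-cache-1': 'awake', 'backup-7': 'sleep', 'logs-3': 'sleep'}
```

A real scheduler would also account for wake-up latency and the energy cost of state transitions, which is exactly why placing hot and cold data on separate servers matters.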

Notice that there may not be a single optimal design suitable for all applications, and thus the choice is likely to be application-dependent; it is not yet clear what the implications are for such applications. Today, data may come from various sources such as mobile phones, camera videos, and sensor networks, and often presents multidimensional characteristics. Different users may have different requirements for their data, which in turn imply different workloads and thus different traffic patterns. Suppose, for instance, that we are designing a data center for traffic data from monitoring cameras or sensors; the optimal data center architecture is not straightforward. If the trajectory of a taxi is distributed across several servers, how do we place this data so that we can search the trajectory of this taxi quickly? Should we replicate this data? When new trajectory data arrives at the data center, how can we migrate the original data? If we design a data center for such data, the data placement, replication, and migration all become challenges, and these questions are hard to answer in practical environments. In addition, the design depends on the queries to be used, since each query also implies certain communication patterns between nodes.
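One plausible answer to the taxi-placement question is to hash the taxi's ID so that every point of a trajectory lands on the same primary server (plus replicas), making a trajectory query touch only a few servers. The server list, replica count, and function below are assumed purely for illustration.

```python
import hashlib

# Hypothetical placement policy: hash each taxi's ID so all of its
# trajectory points share the same primary server and replica set.
SERVERS = ["s0", "s1", "s2", "s3"]
REPLICAS = 2  # primary plus one backup copy

def place(taxi_id, servers=SERVERS, replicas=REPLICAS):
    """Return the servers that should store this taxi's trajectory data."""
    h = int(hashlib.md5(taxi_id.encode()).hexdigest(), 16)
    start = h % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(replicas)]

# Deterministic: every point of one trajectory maps to the same server
# set, so a full-trajectory query contacts only `replicas` servers.
print(place("taxi-1024"))
```

The trade-off is that such keyed placement can create hot spots for popular keys; migration on arrival of new data remains an open question, as discussed above.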

6 Conclusion and future directions

This paper discusses the architecture design issues in current data center networks. Motivated by the need to better support data-intensive applications and to supply higher bandwidth, how to optimize the interconnection of data centers has become a fundamental issue. We began by introducing the development landscape of current data center architectures. Then, we comprehensively elaborated on the prevalent data center frameworks deployed in current enterprises and research institutions, compared several well-known architectures, and remarked on their existing limitations. After reviewing these representative architecture designs, we listed some possible research directions.

Thanks to the maturity of optical techniques, some of the aforementioned hybrid architectures have begun to use both optical and electrical components, e.g., Helios and c-Through [12, 13]. As optical devices become less and less expensive, all-optical architectures may become another direction for future data center design. In [6], the authors present an all-optical architecture and compare its cost with that of Fat-tree. Though their work is in the initial stages, without extensive experimental evaluation, we believe that all-optical architectures will become a good choice because of their high capacity.

However, these purely wired data centers have static characteristics that cannot accommodate dynamic cases: once a data center is built, it is hard to change its topology. There are two possible solutions for future data center network design. The first applies to fixed applications, where we design the architecture for their special traffic patterns using the metrics mentioned in Section 2. With the spread of container-based data centers, the architecture design is fixed in some cases; the data center should then enable some dynamic design features to achieve higher performance. Moreover, cabling complexity is a big issue in data centers, as cables waste much space, can be difficult to connect, are hard to maintain, and need adequate cooling. To meet these demands, a hybrid data center that combines wired and wireless networks may be a good choice. Recently, the authors in [15] have proposed leveraging multi-gigabit 60 GHz wireless links in data centers to reduce interference and enhance reliability. In [16], the feasibility of wireless data centers was explored by applying a 3D beamforming technique, which improves link range and concurrent transmissions. Similarly, by taking advantage of line-of-sight (LOS) paths between top-of-rack servers, steered-beam mmWave links have been applied in wireless data center networks [8]. As multi-gigabit wireless communication is being developed and specified by the Wireless Gigabit Alliance (WiGig, http://wirelessgigabitalliance.org), wireless data centers (WDCs) are likely to arrive in the near future. Instead of using wired connections, wireless technologies will bring many advantages. For example, it is easy to expand a WDC, and its topology can be changed easily. Maintenance of a WDC is also much easier, since wireless nodes can be replaced easily. In addition, with wireless connections we can simply transmit packets from one node to any other node we wish, and building such data centers takes much less time to connect the servers. However, compared with wired connections, wireless is less reliable and has lower channel capacity. How to design a good hybrid data center that combines wired and wireless links while maintaining performance will be a future research topic.

Acknowledgements This research was supported in part by the Pearl River New Star Technology Training Project, Hong Kong RGC Grants (HKUST617710, HKUST617811), the National High Technology Research and Development Program of China (2011AA010500), the NSFC-Guangdong Joint Fund of China (U0835004, U0935004, and U1135003), and the National Key Technology Research and Development Program of China (2011BAH27B01).

References

1. Yang M, Ni L M. Incremental design of scalable interconnection networks using basic building blocks. IEEE Transactions on Parallel and Distributed Systems, 2000, 11(11): 1126–1140

2. Greenberg A, Hamilton J, Maltz D A, Patel P. The cost of a cloud: research problems in data center networks. ACM SIGCOMM Computer Communication Review, 2009, 39(1): 68–73

3. Al-Fares M, Loukissas A, Vahdat A. A scalable commodity data center network architecture. In: Proceedings of the ACM SIGCOMM 2008 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. 2008, 63–74

4. Guo C, Wu H, Tan K, Shi L, Zhang Y, Lu S. DCell: a scalable and fault-tolerant network structure for data centers. In: Proceedings of the ACM SIGCOMM 2008 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. 2008, 75–86

5. Guo C, Lu G, Li D, Wu H, Zhang X, Shi Y, Tian C, Zhang Y, Lu S. BCube: a high performance, server-centric network architecture for modular data centers. In: Proceedings of the ACM SIGCOMM 2009 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. 2009, 63–74

6. Singla A, Singh A, Ramachandran K, Xu L, Zhang Y. Proteus: a topology malleable data center network. In: Proceedings of the 9th ACM Workshop on Hot Topics in Networks. 2010

7. Kliegl M, Lee J, Li J, Zhang X, Guo C, Rincon D. Generalized DCell structure for load-balanced data center networks. In: 2010 INFOCOM IEEE Conference on Computer Communications Workshops. 2010

8. Katayama Y, Takano K, Kohda Y, Ohba N, Nakano D. Wireless data center networking with steered-beam mmWave links. In: Proceedings of the 2011 IEEE Wireless Communications and Networking Conference. 2011, 2179–2184

9. Greenberg A G, Hamilton J R, Jain N, Kandula S, Kim C, Lahiri P, Maltz D A, Patel P, Sengupta S. VL2: a scalable and flexible data center network. In: Proceedings of the ACM SIGCOMM 2009 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. 2009, 51–62

10. Mysore R N, Pamboris A, Farrington N, Huang N, Miri P, Radhakrishnan S, Subramanya V, Vahdat A. PortLand: a scalable fault-tolerant layer 2 data center network fabric. In: Proceedings of the ACM SIGCOMM 2009 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. 2009, 39–50

11. Costa P, Donnelly A, O'Shea G, Rowstron A. CamCube: a key-based data center. Technical Report, Microsoft Research, 2010

12. Farrington N, Porter G, Radhakrishnan S, Bazzaz H H, Subramanya V, Fainman Y, Papen G, Vahdat A. Helios: a hybrid electrical/optical switch architecture for modular data centers. In: Proceedings of the ACM SIGCOMM 2010 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. 2010, 339–350

13. Wang G, Andersen D G, Kaminsky M, Papagiannaki K, Ng T S E, Kozuch M, Ryan M P. c-Through: part-time optics in data centers. In: Proceedings of the ACM SIGCOMM 2010 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. 2010, 327–338

14. Wu H, Lu G, Li D, Guo C, Zhang Y. MDCube: a high performance network structure for modular data center interconnection. In: Proceedings of the 2009 ACM Conference on Emerging Networking Experiments and Technologies. 2009, 25–36

15. Halperin D, Kandula S, Padhye J, Bahl P, Wetherall D. Augmenting data center networks with multi-gigabit wireless links. In: Proceedings of the ACM SIGCOMM 2011 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. 2011, 38–49

16. Zhang W, Zhou X, Yang L, Zhang Z, Zhao B Y, Zheng H. 3D beamforming for wireless data centers. In: Proceedings of the 10th ACM Workshop on Hot Topics in Networks. 2011


Kaishun Wu is currently a research assistant professor at the Hong Kong University of Science and Technology (HKUST). He received his PhD degree in Computer Science and Engineering from HKUST in 2011 and his BEng degree from Sun Yat-sen University in 2007. His research interests include wireless communications, mobile computing, wireless sensor networks, and data center networks.

Jiang Xiao is a first-year PhD student at the Hong Kong University of Science and Technology. Her research interests focus on wireless indoor localization systems, wireless sensor networks, and data center networks.

Lionel M. Ni is chair professor in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology (HKUST). He also serves as special assistant to the president of HKUST, dean of the HKUST Fok Ying Tung Graduate School, and visiting chair professor of the Shanghai Key Lab of Scalable Computing and Systems at Shanghai Jiao Tong University. A fellow of the IEEE, Prof. Ni has chaired over 30 professional conferences and has received six awards for authoring outstanding papers.