
Intel Technology Journal Q4, 2000

Preface

Lin Chao
Editor, Intel Technology Journal

Technology is often a key business differentiator. This is especially true for e-Businesses. Simply put, e-Business is defined as Internet-based commerce. Start-ups and established companies alike face the challenge of doing business on the Internet, at Internet speeds.

Intel is one such company adapting to doing business at Internet speed. We are striving to make our own internal processes "e-Business-like." For example, Intel's Web-based order management system allows our customers to place orders, track deliveries, post inquiries, and obtain product and pricing updates. We also have an Internet hosting business, Intel® Online Services, Inc., whose mission is to provide second-generation web hosting to companies worldwide. Internet data centers are the physical environments where all the e-Business magic occurs.

The five papers in this issue of the Intel Technology Journal focus on the planning, implementation, and deployment of our e-Business technology data centers, including those managed by Intel's information technology group and by Intel Online Services, Inc.

The first paper gives an overview of Intel's own internal e-Business data centers. It outlines the current and future direction of the technologies being applied to support Intel's future e-Business growth.

The second paper describes IP addressing issues for Internet data centers; in particular, it raises the concern about the dwindling amount of IP address space. The authors describe how IP address space concerns affect the design, implementation, and operation of Internet data centers.

The third paper discusses the process used to certify the reliability of systems and service offerings in each new data center that comes online for Intel Online Services, Inc.

To meet the rapid growth in the number of e-Business applications being deployed and the rate at which they are being released, the fourth paper looks at release management and application landing.

Finally, the fifth paper looks at asset management and capacity planning. Asset management can mean anything from determining where a system is located to what applications are running on that system. Effective asset management and capacity planning are vital to the success of any Internet data center.

Copyright © Intel Corporation 2000. This publication was downloaded from http://www.intel.com/. Legal notices at http://www.intel.com/sites/corporate/tradmarx.htm


Fueling the Engines of the Internet Business Machines

By Mike Aymar, Corporate Vice President and President, Intel Online Services, Inc.

Intel Corp.

By Doug Busch, Vice President and Director, Information Technology

Intel Corp.

Building on Intel's own internal experience and expertise, Intel® Online Services was formed in early 1999 to deliver global e-Commerce and application hosting for other companies. By having Intel deploy and manage their e-Business applications and infrastructures, these companies can remain focused on their core competencies.

Market research firm Gartner Group estimates that Business-to-Business e-Commerce will be a $7.29 trillion market segment by 2004. To remain competitive, companies must move online. Responding to an internal and external focus on e-Commerce, Intel is addressing this competitiveness head-on by becoming the building block supplier to the worldwide Internet economy. Intel Online Services' core infrastructure is a growing network of data centers that provides application-hosting services to customers worldwide. Currently, seven data centers are located in the US, Europe, Australia, India, Japan, and Korea, with more planned for 2001. First- and second-generation hosting are offered. "First-generation" Internet hosting is generally defined as "co-location," meaning that the customer brings everything but the basic facilities and connectivity. "Second-generation" hosting adds features such as multiple open-standard applications that enable the fast development of complete e-Business solutions. These capabilities translate into secure, ultra-reliable Internet applications, as well as world-class network and server infrastructures.

Since 1993, Intel has been building strong global networking and Information Technology (IT) expertise. Intel's internal IT organization, with nearly 4000 employees, manages a very large worldwide infrastructure that supports Intel facilities in 93 locations across 32 countries. A sense of the scale of Intel's IT infrastructure is provided by a few statistics: there are over 4300 infrastructure and app servers; 2300 network devices (routers, WAN links, LAN connections, hubs, and switches); 325 Mbps worldwide network capacity on more than 300 circuits; 3.5 million mail messages per day; and 24X7 e-Business and enterprise applications operations with better than 99.97% uptime. This worldwide, highly reliable environment is a crucible for the development of IT expertise and techniques. This infrastructure allows Intel to do the majority of our business — almost $24 billion of $31.8 billion in sales — over the Net.

Quokka, CommerceRoute, Inc., and a variety of other companies are already taking advantage of the e-Business solutions offered by Intel Online Services. Intel Online Services has also forged relationships with top solutions developers and network providers, including Appgenesys, Proxicom, and PricewaterhouseCoopers.

Over the next few years, Intel will continue to grow our internal and external e-Commerce support. Internal e-Business systems will be an important vehicle for developing Intel expertise and leading the industry in the development of e-Business capability. Intel Online Services will continue to open data centers globally and provide more end-to-end application services in support of Intel's external customers. Intel will continue to focus on being the preeminent building block supplier to the worldwide Internet economy.

Copyright © Intel Corporation 2000. This publication was downloaded from http://www.intel.com/. Legal notices at http://www.intel.com/sites/corporate/tradmarx.htm




IP Addressing Space Design Issues for Internet Data Centers

Lynne Marchi, Intel Online Services, Intel Corp.
Sridhar Mahankali, Intel Online Services, Intel Corp.

Jeff Sedayao, Intel Online Services, Intel Corp.

Index words: Internet protocol, address space, data center

ABSTRACT

The implementation of public Internet Protocol (IP) address space is a key factor in the size and growth of Internet data centers. IP addressing space decisions affect how many servers can be hosted at a data center, and they influence the kind of network connectivity technology that will be used and even how web sites are implemented. This paper describes IP addressing issues for Internet data centers. First, we provide an overview of Internet addressing and routing: we discuss IP networks, autonomous systems, and high-level Internet network routing. Key Internet constraints are described, particularly the finite amount of IP address space and autonomous systems and the current addressing and routing policies that result from those constraints. We then go over key IP address design decisions. The Internet data center builder needs to decide what address space to use, the size of that address space, the autonomous system number to use, and the address allocation policy to use with customers. These choices are constrained by the difficulty of obtaining space, the required speed of implementation, Internet Service Provider (ISP) routing policies, ISP connectivity decisions, and security requirements. Next, we describe how these design choices affect technology choice and implementation within the data center, using virtual web site design and Network Address Translation (NAT) as examples. We then provide examples of how address space constraints affect the design of Intel® Online Services (IOS) data center address spaces and other technology choices. The last section discusses some trends and future technologies that may alleviate IP address constraints.

INTRODUCTION

Issues with Internet Protocol (IP) address space are critical, yet often overlooked, factors in building and maintaining Internet-accessible data centers and web server farms, such as those hosted by companies like Intel Online Services. A shortage of address space can limit the growth and expansion of data centers. Moreover, dealing with the scarcity of IP address space and with situations where the data center and customers have to communicate while sharing the same private address space drives technology decisions and complicates the debugging of server and network problems. This paper describes how IP address space concerns can impact the design, implementation, and operation of Internet data centers. First, we present an overview of how IP addressing and routing works on the Internet. We then discuss key address space design considerations. The next section describes the effects of addressing choices on data center design and implementation, and it is followed by a section in which we show examples of address space design decisions at Intel Online Services. We end with a discussion of future technology development trends regarding IP address space.

A BRIEF INTRODUCTION TO INTERNET ADDRESSING AND ROUTING

Internet protocol addressing and routing must first be understood before any discussion of IP address space design issues will be useful. This section goes over the original IP address scheme, its limitations, and the current methods used to deal with the finite number of IP addresses. This information is crucial to an understanding of the choices and constraints for IP addresses in a data center.

Page 5: Intel Technology Journal Q4, 2000 · Intel Technology Journal Q4, 2000 Preface Lin Chao Editor Intel Technology Journal Technology is often a key business differentiator. This is

Intel Technology Journal Q4, 2000

IP Addressing Space Design Issues For Internet Data Centers 2

Class | Range of Network Numbers | Default Network Mask | Network vs. Host Portion | Number of Hosts
A | 1.0.0.0 to 126.0.0.0 | 255.0.0.0 | network.host.host.host | 16,777,214
B | 128.1.0.0 to 191.254.0.0 | 255.255.0.0 | network.network.host.host | 65,534
C | 192.0.1.0 to 223.255.254.0 | 255.255.255.0 | network.network.network.host | 254

Table 1: Original IP version 4 address classes

Table 1 shows the different classes of IP addresses. Note that two other classes of address space, class D and class E, were not included in the above table. Class D addresses start at 224.0.0.0 and are used for multicast. Class E addresses start at 240.0.0.0 and are used for experimental purposes.

Original IPv4 Addressing Scheme

In order for two hosts to communicate over the Internet, there needs to be a way to uniquely identify hosts. The Internet Protocol, IP version 4 (IPv4) [1], defined in 1981, specifies the current method of uniquely identifying hosts. IPv4 addressing uses a 32-bit binary address. Support for a decimal representation of addresses was also incorporated to make addresses human-readable. In decimal form, an IP address consists of 4 octets (sets of 8 bits), separated by dots. Each octet can be a number ranging from 0 to 255. Examples of valid (decimal) IP addresses are 10.245.171.1 and 172.16.50.224.

IP addresses are partitioned into a network portion followed by a host portion. Hosts belong to a network, and that network is defined by the network portion of the IP address. The original design called for classes of address space that divided the IP space into large, medium, and small networks that could be assigned to organizations (businesses, universities, government agencies, etc.). Included in the design was the notion of a network mask that defines what part of an IP address is the network portion (as opposed to the host portion of the address). In binary, the network portion of the mask is a series of ones followed by a series of zeroes representing the host portion of the address. In decimal, the network portion of the mask is equal to 255 for each octet.
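As a quick illustration of the dotted-decimal form and the classful mask just described, the short Python sketch below (using the standard ipaddress module; the snippet itself is not part of the original paper) splits one of the example addresses into its network and host portions:

```python
# Illustrative only: dotted-decimal notation and the classful network mask
# described above, using Python's standard ipaddress module.
import ipaddress

addr = ipaddress.ip_address("10.245.171.1")
print(int(addr))             # the underlying 32-bit value
print(addr.packed.hex())     # the same value shown as four octets

# Applying the class A mask 255.0.0.0 splits the address into a network
# portion (10.0.0.0) and a host portion (the remaining three octets).
iface = ipaddress.ip_interface("10.245.171.1/255.0.0.0")
print(iface.network)         # 10.0.0.0/8
print(int(addr) - int(iface.network.network_address))  # host portion as an integer
```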

Autonomous Systems

Another requirement for Internet communication is that each host needs to know how to reach all of the other hosts. To facilitate this, organizations advertise the path to their network to other networks. Devices called routers learn about networks in this way and forward packets appropriately. Routers exchange network and routing information through what is called a routing protocol. A routing table, which is a list of networks and the next hop (often another router) to forward packets for those networks, is stored in the router's memory. The router will select the best path (next-hop router) to put into its table when multiple paths exist to the same network.

In the early days of the Internet, all connected routers shared their routing tables. As use of the Internet started to grow, more routers and networks were added, and the amount of overhead required to store the routing table and manage changes to the routing table also increased. In addition, as more companies began manufacturing different routers that ran their own implementation of the routing software, compatibility issues between different vendors arose. For these and a number of other reasons, it was decided to break the Internet into smaller routing domains, called Autonomous Systems (AS).

An autonomous system (AS) [2] is a set of routers and networks that are managed by one or more administrative entities (e.g., company, university, Internet Service Provider, etc.). Each AS is assigned a unique number so that communication between different autonomous systems can occur. Routers inside the AS run an Interior Gateway Protocol (IGP) such as RIP [3] or OSPF [4]. To communicate externally, one or more border routers are chosen. Border routers use an Exterior Gateway Protocol (EGP) to exchange routing information with routers in different autonomous systems. Today, the Border Gateway Protocol version 4 (BGP4) [5] is generally used for this purpose.

Each AS has a number associated with it. BGP4 uses 16 bits for AS numbers, so AS numbers range from 0 to 65535. The upper 1024 are reserved as private AS numbers, usable only within an AS and not directly reachable from the Internet. This leaves AS numbers 1 to 64511 as valid, Internet-usable AS numbers.
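A small sketch of these ranges (illustrative only; the boundary values follow from the 16-bit AS number space described above):

```python
# Classify a 16-bit BGP4 AS number as described above: 64512-65535 are the
# 1024 private AS numbers, and 1-64511 are usable directly on the Internet.
def classify_asn(asn: int) -> str:
    if not 0 <= asn <= 65535:
        raise ValueError("original BGP4 AS numbers are 16-bit values")
    if asn >= 64512:
        return "private"
    if asn >= 1:
        return "public"
    return "reserved"  # AS 0 is not assigned to any network

print(classify_asn(65000))  # private
print(classify_asn(64511))  # public
```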

Issues With IP Addressing

Since IPv4 was finalized, use of the Internet has grown exponentially, causing major addressing issues. In the early days of the Internet, organizations were able to obtain large blocks of IP space without proof that it was needed or even going to be used, and as a result, IP address space was being rapidly depleted. Another side effect of address space allocation policies was that the routing tables for Internet routers were once again becoming huge [6]. Remember, routers store a list of networks and next-hop information in memory. When routing tables are large, they take up more memory, and more CPU processing time is required to search them.

Finally, the class of address space as defined in Table 1 did not always meet, and sometimes exceeded, the needs of the organization receiving it. For example, a small business that expected to grow to no larger than 300 hosts would require two class C networks (508 usable addresses). This wastes 208 addresses (two 256-host networks, minus the four addresses that are network overhead, minus the 300 hosts)!
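The arithmetic in this example can be reproduced directly (a trivial sketch, not from the paper):

```python
# Two class C networks assigned to a business that will never exceed 300 hosts.
addresses = 2 * 256       # two class C networks of 256 addresses each
overhead = 2 * 2          # network number and broadcast address per network
usable = addresses - overhead
wasted = usable - 300     # hosts actually needed
print(usable, wasted)     # 508 208
```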

Address Allocation Authority

To slow the depletion of IP space, the Internet Assigned Numbers Authority (IANA) [7] was established to oversee allocation of the remaining IPv4 addresses. IANA further delegated this authority to the regional registries:

• American Registry for Internet Numbers (ARIN)

• Asia-Pacific Network Information Center (APNIC)

• Réseaux IP Européens (RIPE NCC)

Today, it is much harder to obtain IP address space, as the requesting body must provide a detailed plan that shows that the requested space is justified and how it will be used.

Subnetting Changes

Several new methods of addressing were also created so that usage of IP space was more efficient. The first of these methods is called Variable-Length Subnet Masking (VLSM) [8]. Subnetting had long been a way to better utilize address space [9]. Subnets divide a single network into smaller pieces. This is done by taking bits from the host portion of the address to use in the creation of a "sub" network. For example, take the class B network 147.208.0.0. The default network mask is 255.255.0.0, and the last two octets contain the host portion of the address. To use this address space more efficiently, we could take all eight bits of the third octet for the subnet.

One drawback of subnetting is that once the subnet mask has been chosen, the number of hosts on each subnet is fixed. This makes it hard for network administrators to assign IP space based on the actual number of hosts needed. For example, assume that a company has been assigned 147.208.0.0 and has decided to subnet this by using eight bits from the host portion of the address. Assume that the address allocation policy is to assign one subnet per department in an organization. This means that 254 addresses are assigned to each department. Now, if one department only has 20 servers, then 234 addresses are wasted.

Using variable-length subnet masks (VLSM) improves on subnet masking. VLSM is similar to traditional fixed-length subnet masking in that it also allows a network to be subdivided into smaller pieces. The major difference between the two is that VLSM allows different subnets to have subnet masks of different lengths. For the example above, a department with 20 servers can be allocated a subnet mask of 27 bits. This allows the subnet to have up to 30 usable hosts on it.
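The fixed-length and variable-length subnetting examples above can be checked with Python's ipaddress module (an illustrative sketch; the specific subnet chosen for the 20-server department is hypothetical):

```python
# Subnetting the class B network 147.208.0.0 as described above.
import ipaddress

class_b = ipaddress.ip_network("147.208.0.0/16")

# Fixed-length subnetting: all eight bits of the third octet, i.e., /24 subnets.
fixed = list(class_b.subnets(new_prefix=24))
print(len(fixed), fixed[0].num_addresses - 2)   # 256 subnets, 254 usable hosts each

# VLSM: a 27-bit mask for a 20-server department leaves up to 30 usable hosts.
dept = ipaddress.ip_network("147.208.5.0/27")   # hypothetical subnet choice
print(dept.num_addresses - 2)                   # 30
```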

Class | Private Address Space
A | 10.0.0.0 to 10.255.255.255
B | 172.16.0.0 to 172.31.255.255
C | 192.168.0.0 to 192.168.255.255

Table 2: Private address space ranges

Private IP Space

In 1996, IANA set aside three blocks of the global IP space to be used by organizations solely for the purpose of internal communication [10]. This address space, called private IP space, meant that a company could assign private addresses to hosts inside the company that did not require direct access to the Internet. Any organization could use private space without fear of colliding with another organization's address space. This allowed companies to conserve the public IP space they had already acquired by assigning it only to those hosts that needed to communicate directly with the Internet. Table 2 shows which networks can be used for private addressing.
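These ranges are recognized by common tooling; for instance, Python's ipaddress module flags them as private (a quick check, not part of the paper):

```python
# The three private ranges from Table 2, plus a public address for contrast.
import ipaddress

for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"):
    print(net, ipaddress.ip_network(net).is_private)      # True for all three

print(ipaddress.ip_address("147.208.61.8").is_private)    # False: public space
```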

Classless Inter-Domain Routing

So far, the discussion of IP address allocation has used the model shown in Table 1. This model is often referred to as a "classful" model because it relies on using the definitions of class A, B, and C networks. Classless Inter-Domain Routing (CIDR) [11, 12] eliminates classful addressing in the same way that VLSM eliminated fixed-length subnet masks. CIDR uses a prefix to indicate the number of bits used for the network portion of the address, while the remaining bits are used for the host address. For example, 147.208.61.8/20 is a CIDR address in which the first 20 bits contain the network portion of the address, leaving 12 bits for the host portion. The network mask for a /20 prefix is 255.255.240.0 and is equivalent to 16 traditional class C networks!

Another advantage of CIDR is that it allows routes to be aggregated. This means many networks can be summarized into a single route. For example, 147.208.0.0/19, 147.208.32.0/19, 147.208.64.0/19, and 147.208.96.0/19 can be summarized as 147.208.0.0/17. Once CIDR was implemented, the growth in the size of Internet routing tables was significantly reduced.
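The same aggregation can be demonstrated with ipaddress.collapse_addresses (an illustrative sketch; the /20 network shown is simply one that uses the mask mentioned above):

```python
# Route aggregation (CIDR supernetting) for the example above.
import ipaddress

routes = [ipaddress.ip_network(n) for n in
          ("147.208.0.0/19", "147.208.32.0/19",
           "147.208.64.0/19", "147.208.96.0/19")]
print(list(ipaddress.collapse_addresses(routes)))   # [IPv4Network('147.208.0.0/17')]

# A /20 prefix corresponds to a 255.255.240.0 mask and spans 16 class C networks.
net = ipaddress.ip_network("147.208.48.0/20")
print(net.netmask, net.num_addresses // 256)        # 255.255.240.0 16
```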

ADDRESS SPACE DESIGN ISSUES

When implementing an Internet data center, there are a number of key decisions that need to be made about IP address space. In this section, we discuss four key design points: what address space to use, the size of address space to advertise for a data center, what autonomous system number to use, and the IP address allocation policy. For each of these design issues, we talk about the different choices available and the tradeoffs involved with each choice.

What IP Address Space to Use

The first critical choice that data center implementers have to make is what IP address spaces to use. There are several choices here:

1. Private IP address space

2. Currently owned and used space

3. Space from the data center’s ISPs

4. New space obtained directly from Internet registries

5. Customer space

These choices are not mutually exclusive within a data center, and we will go over the tradeoffs involved with each choice.

Private IP address space has the primary advantage of being plentiful and immediately available. It has the primary disadvantage of not being immediately usable on the Internet without some form of Network Address Translation (to be discussed later) or some kind of proxying technique. This disadvantage is at times not a problem. For applications and services that do not require direct access to the Internet, it is not a concern. Also, for hosts such as database or application servers that do not directly talk to other systems on the Internet, this has some security advantages, as these systems are not vulnerable to direct attack from the Internet. (Do not think that they are invulnerable because of this, however.)

If data center implementers already have their own IP address space, another possibility is to use that space. This can be advantageous, as an organization may have plenty of address space to utilize immediately. The prime disadvantages relate to routing. For example, some ISPs do not accept network prefixes longer than a /16 for parts of traditional class B networks. So if you wanted to use a /19 part of a traditional class B for a data center, that data center would not be accessible from all ISPs.

Another option is to use space from an ISP. ISPs have address space that they will provide to customers. While this option has the advantage that there is address space to use immediately, it has a number of powerful disadvantages. The first disadvantage is that using an ISP's address space will typically limit you to using one ISP. That address space is bound to that ISP, and you typically will not be able to have traffic routed through another ISP. Another disadvantage is that should you choose to discontinue your service with that ISP, you would have to give back all of the space you received, forcing you to renumber all of the hosts directly accessible from the Internet.

Another option is to obtain new address space directly from the Internet registries that distribute space. This option has a number of advantages. A data center using space obtained from the Internet registries can change ISPs without having to worry about renumbering hosts. The registries usually allocate addresses in /19 blocks, making that address space immediately routable. The disadvantage of getting space from registries is that the process takes time, on the order of weeks, and longer if you need to first join the registry. The process also involves rigorous justification of address allocation and why currently owned addresses will not suffice. In addition, once space is allocated, it cannot be used all at once. To use more of an address allocation, another justification process is required, often requiring verification that previously assigned addresses have been used.

A final choice is using the data center's customers' address space. When possible, this is good, but it is often not possible, as Internet data center customers usually expect that you will use your own address space. Even if a customer is willing to do this, it may take some time to make changes to the data center's ISP's address filtering to make it possible to use that space. Customer address space is also vulnerable to the same problems as address space you already own. The customer may use a piece of existing address space, such as part of a class B, that some ISPs may refuse to accept as a route.

Size of Address Space to Advertise from the Data Center

A decision closely related to what address space to use is what size of address space to use and how to advertise it on the Internet. Clearly, you can only advertise the space that you have: that puts an upper limit on the address space advertised for the data center, and thus a lower limit on the prefix length of the network advertised. Many ISPs will not accept route advertisements for networks with prefix lengths longer than /19, which puts an effective upper limit on the prefix length of what you advertise and a lower limit on the size of the address space.

There are a number of factors that affect the size of the address space advertised. First, it depends on how many hosts within the data center need public addresses. You want to advertise enough address space to cover the hosts that need public addresses, both immediately and in the near future. To make routing more manageable and to help reduce the growth in the size of the Internet routing table, it is better to advertise fewer routes. Instead of advertising each network in a data center, if you advertise a single aggregation of those routes, there are fewer routes to manage. As mentioned above, some ISPs will not accept parts of a traditional class B network. One way to deal with such ISPs is to connect data centers to ISPs that will accept parts of a class B. Each data center can advertise a prefix that is short enough to be accepted, while at least one data center advertises the whole /16 class B network. This way, those ISPs that only accept a whole class B will see a route for the whole class B and send traffic to that data center's ISP. Once the traffic reaches the data center's ISP, the ISP will route it to the most appropriate data center, because each data center is also advertising a route for its section of the class B.

Another factor in deciding what size address space to advertise is the backbone infrastructure connecting data centers. If data centers are connected by a backbone network that has enough capacity to route significant amounts of public Internet traffic coming into one data center that is bound for another (the worst case being that the backbone will handle all of a data center's traffic), then it is feasible to advertise a single route that aggregates all of the data center's networks into one. If the backbone connecting the networks does not have the capacity, advertising a single aggregated route can result in performance bottlenecks when end users accessing one data center reach it through another.

Autonomous System Number

The issues and choices regarding Autonomous System (AS) numbers are very similar to those regarding IP address space. A data center's address space can be advertised from the following:

• Private AS

• Currently owned and used AS

• The data center’s ISP’s AS

• A new AS obtained directly from Internet registries

• Customer AS

Advertising a route from a private AS number is fast and immediately available, but those private AS numbers are only usable within an organization's public AS. Like private IP addresses, private AS numbers are not usable over the Internet. Advertising from an existing AS number that is already owned by a data center's administrators is also easy and quick. One consideration to keep in mind when using an existing AS is the routing policy implemented by ISPs and other organizations. Routing policies are often implemented by AS number, and each of the data center networks advertised from that AS will be affected by such a policy. This can become a great disadvantage when data centers are spread across geographies. Internet conditions can vary greatly: the routing policy made on one continent may be (and usually is) totally inappropriate on another.

If a data center's routes are advertised from an ISP's main AS number, the data center is locked into using only that ISP and cannot have connections into other ISPs (although it should be mentioned that some ISPs provide a special AS for multihomed customers). Getting a completely new AS number from an Internet registry has the advantage that a data center can change ISPs with much less work, usually just changing entries in routing registries. Multihoming to multiple ISPs is now possible, and end-user access to the data center can be improved by routing traffic based on the full Internet routing table. The downside to getting a new AS is that it takes time: you usually have to join an Internet registry and apply for an AS number. Also, AS numbers are limited in quantity, as mentioned above, with only 64511 AS numbers available for use directly on the Internet.

Finally, a data center can advertise address space and routes from a customer's AS number. This has the advantage of allowing that address space to be served by multiple ISPs. It also has a number of constraints and disadvantages. Only the address space and routes that the customer owns can be advertised from that AS. As with using a previously owned AS, this option has the disadvantage of being affected by any other policy that organizations and ISPs may implement based on the customer's AS number. The data center effectively becomes an ISP, and the data center's ISPs often must change AS and network filtering policies to allow that route to be advertised. In addition, routing registry information concerning that network will also have to be changed.

IP Address Allocation Policy

Given that IP addresses are limited in quantity and their use has serious constraints, the allocation policy for IP addresses to data center customers is a serious concern. There are a number of constraints and tradeoffs. The first choice that needs to be made is whether to allocate a separate address space to each customer. From a customer, security, and administrative standpoint, it is better to give each customer a separate address space. Firewall policies are easier to implement on a subnet basis, and any special traffic policies, such as giving certain customers a different path or giving them priority over others, are much easier to implement if customers are on different subnets. Customers may be competitors, and the thought of a competitor on the same subnet may be unpalatable to a data center customer. The cost of separate subnets per customer is a loss of usable address space. Each subnet has a subnet number, and typically that is not used as a host address, to avoid confusion. Also, each subnet has a broadcast address that cannot be used for hosts. As a result, two addresses are consumed as overhead for each subnet. The more subnets, the more addresses that are lost to subnetting overhead. Figure 1 shows the fraction of a subnet that is lost to varying degrees of subnet overhead if that space is divided into subnets with prefixes of the specified length. Half of all addresses in a /30 are consumed by subnet overhead even with the lowest possible subnet overhead.

[Figure 1 is a chart: the x-axis is subnet prefix length (/22 to /30), the y-axis is the fraction of the subnet's address space consumed as overhead, with one curve per overhead level (2 to 7 addresses per subnet).]

Figure 1: Fraction of overhead per subnet, depending on subnet length and overhead per subnet

The amount of overhead consumed per customer subnet affects usable address space availability. There are router redundancy techniques such as Hot Standby Router Protocol (HSRP) [13] that allow more than one router to handle traffic for a virtual interface. The overhead cost of HSRP is one virtual address and one address per router. Thus, for a /29 segment with two routers using HSRP on the subnet interface, 62.5% of the available address space is consumed with overhead, leaving only three usable IP addresses. Figure 1 also shows how usable address space disappears with the increase in per-subnet overhead.
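The overhead fractions plotted in Figure 1, and the /29 HSRP example above, follow from a simple calculation (a sketch only, not from the paper):

```python
# Fraction of a subnet consumed by overhead for a given prefix length and
# per-subnet overhead (network and broadcast addresses, plus any router or
# virtual addresses).
def overhead_fraction(prefix_len: int, overhead: int) -> float:
    return overhead / 2 ** (32 - prefix_len)

print(overhead_fraction(30, 2))   # 0.5   -- half of a /30, the minimum-overhead case
print(overhead_fraction(29, 5))   # 0.625 -- /29 with two HSRP routers (5 of 8 addresses)
print(2 ** (32 - 29) - 5)         # 3 usable addresses remain
```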

From a customer and administration standpoint, it is also advantageous to have as much address space as possible. This makes it easy for a data center customer to expand operations by adding servers. Moving servers to a completely different, larger subnet in order to expand forces a customer to reconfigure all the hosts, typically involving significant downtime. As mentioned above, firewall and other access policies are often configured by subnet, and having as large a subnet as possible dedicated to a customer allows additions and changes to be made to servers without having to change those firewall and access policies.

Of course, since the supply of IP addresses is limited, customers cannot have all the space that might be convenient for them. Data center administrators must consider what happens when address space becomes nearly exhausted. In that case, they need to consider meeting Internet registry requirements to obtain new space. Typical registry requirements for new public IP space are as follows:

• 25% of the new space must be utilized immediately.

• 50% of the space must be used within one year.

• To get more space, the address space must be 80% utilized.

If these rules are not followed, the data center will be hard pressed to get more space if necessary. Again, the cost of giving smaller allocations of address space is that there will be more subnets and more addresses consumed with subnetting overhead.

The final consideration that a data center needs to evaluate is economic. An IP address has economic value, and if a customer is willing to pay for space that is unused, that value needs to be weighed against another customer using that space and also generating income.

EFFECTS OF ADDRESS SPACE CHOICES ON DATA CENTER IMPLEMENTATION

The scarcity of IP addresses drives many implementation and technology choices. In this section, we discuss some of these choices, particularly Network Address Translation, the complications of implementing Network Address Translation, and web server implementations.

Network Address Translation

A very useful service that an Internet data center can offer is connectivity between a customer's internal network and their servers hosted at the data center. One problem that such a service can encounter is conflict between private address spaces. The resolution of private IP address conflicts can affect address space design choices made at the customer end as well as those made for data center internal networks. Another major design factor is the preference to hide customers' internal networks from being seen in the data center network.

A popular technology used for resolving IP address conflicts is Network Address Translation (NAT) [14]. NAT helps translate IP addresses to a non-conflicting IP space and can be used to resolve IP conflicts that occur between a customer network and data center internal networks, as well as those that occur between two different customers' networks. There are two different modes of NAT:

• many-to-one or many-to-few NAT

• static one-to-one NAT

Many-to-one or many-to-few NAT entails hiding a set of networks or IP addresses behind a single IP address or a small pool of IP addresses, respectively. A key characteristic of this form of NAT is that in addition to the IP address being translated to a non-conflicting IP space, the port numbers are also translated to dynamically assigned port numbers to enable differentiation among the set of networks or IP addresses being hidden.


[Figure 2 is a diagram showing a packet from network A (source IP 192.168.43.x, source port 4321, destination IP 147.208.25.31, destination port Y) being rewritten at the NAT boundary so that it appears on network B with source IP 209.23.151.16 and source port 1234; the destination fields are unchanged, and replies are translated back in the reverse direction.]

Figure 2: Many-to-one NAT packet flow

Figure 2 shows a sample flow of how a packet changes as it traverses the many-to-one NAT boundary between networks A and B. Any traffic traversing from network A (designated by 192.168.43.0/24) to network B has its source IP address translated to a single IP address (209.23.151.16). To differentiate the multiple hosts on network A trying to communicate with hosts on network B, the source port 4321 is also translated to a dynamically assigned port 1234 as the NAT boundary is crossed. The device performing NAT maintains a dynamic table of these IP address and port translations to ensure appropriate communication between network A and network B.

A consequence of using many-to-one or many-to-few NAT is that network communication cannot be initiated bidirectionally. Consider the example in Figure 2. Network communication can be initiated from hosts on network A to those on network B. Hosts on network B cannot initiate connections to hosts on network A because network A is hidden behind the single IP address 209.23.151.16, and hosts on network B cannot uniquely address a host on network A for communication. This could potentially be considered a security feature as well.
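A toy model of the translation table behind Figure 2 makes this one-way property concrete. The sketch below is illustrative only (the destination port 80 stands in for the unspecified port Y in the figure, and the port numbering is simplified):

```python
# Simplified many-to-one NAT: outbound flows from network A are rewritten to
# the single public address and a dynamically assigned port; the table built
# on the way out is the only way replies can be mapped back, which is why
# hosts on network B cannot initiate connections into network A.
import itertools

PUBLIC_IP = "209.23.151.16"
_next_port = itertools.count(1234)      # dynamically assigned ports
_table = {}                             # public port -> (private ip, private port)

def translate_outbound(src_ip, src_port, dst_ip, dst_port):
    public_port = next(_next_port)
    _table[public_port] = (src_ip, src_port)
    return (PUBLIC_IP, public_port, dst_ip, dst_port)

def translate_inbound(public_port):
    # Returns None for traffic that does not match an existing entry,
    # i.e., a connection a host on network B tried to initiate inward.
    return _table.get(public_port)

print(translate_outbound("192.168.43.7", 4321, "147.208.25.31", 80))
# -> ('209.23.151.16', 1234, '147.208.25.31', 80)
print(translate_inbound(1234))   # ('192.168.43.7', 4321): reply is mapped back
print(translate_inbound(9999))   # None: inbound-initiated traffic has nowhere to go
```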

Static one-to-one NAT entails translating an IP address uniquely to another IP address. In addition, static NAT does not involve any port translation. Further, there is no restriction regarding which direction network communication can be initiated in, because the device performing NAT can uniquely translate back to a specific host on the network with the conflicting IP space.

Based on network communication requirements, many-to-one NAT, static NAT, or both can be used. Later on, we discuss how Intel Online Services uses NAT for IP conflict resolution under various remote access scenarios. Extensive use of NAT also drives the need to choose a device that allows flexible NAT configurations to fit the diverse requirements of multiple data center customers.

NAT may need to be used more than once as network traffic traverses between the customer networks and the data center networks and back. An example of such a situation is when one customer's internal IP space conflicts with another customer's internal IP space. In such a situation, NAT would need to be performed at one of the customer ends to resolve IP conflicts with the other customer's IP space. NAT would need to be used a second time at the data center end to hide the customer network from data center internal networks.

To further complicate matters, there could be situations where a customer needs to run applications such as DCOM [16] that do not work across NAT. In such situations, alternate solutions, like readdressing customer-end systems to a public IP space and allowing that IP space to be visible within the data center, may need to be explored.

Debugging Complications from NAT

While NAT can be a very useful tool for resolving IP address conflicts and enabling end-to-end connectivity to customer-end networks, the use of NAT can lead to complex troubleshooting scenarios [17]. In situations where NAT is performed more than once as a packet transits from its source to its destination, the IP address of an end system will change as many times as a NAT boundary is traversed. Debugging access issues in such cases requires intimate knowledge of the end-to-end network path and also a clear comprehension of which IP address is associated with an end system on a given section of that network path.

A packet sniffer is critical when troubleshooting potential connectivity issues across a device performing NAT. Visibility inside the packets captured on both sides of the NAT boundary helps establish whether the IP address translation is occurring as desired. Looking at the payload in a packet can also help identify whether an application will work across a NAT boundary. Usually, applications that contain IP addresses or application port information in the packet payload will not work across NAT, as the NAT process modifies the IP address only in the IP header and does not modify anything in the packet payload. Without visibility into a packet, this kind of troubleshooting can be long and tedious.

Use of NAT can also lead to configuration complications on the end systems, since the translated address is valid on one side of a NAT boundary and the actual IP address may be valid on the other side of that NAT boundary. For example, when printing to a printer whose IP address has been statically translated (one-to-one), say from the actual printer IP of 172.16.20.5 to an IP address of 10.81.249.23, the print job will need to be sent to the translated IP address, i.e., 10.81.249.23. Consider another case where the printer IP address is translated from its actual IP address of 172.16.23.15 to 192.168.23.26 at the customer end and is further translated to 10.81.249.35 at the data center. In this case, the system at the data center will need to send a print job to 10.81.249.35 in order to print to the actual printer at 172.16.23.15.

Furthermore, in the case of many-to-one or many-to-few NAT, network traffic can only be initiated in one direction, as discussed earlier. This should be kept in mind when troubleshooting connectivity issues across such address translation boundaries.

Virtual Web Server Implementation

Data center customers occasionally implement what are called virtual servers. This means that multiple web sites are implemented on the same set of servers. One way to implement this is to use a different IP address for each web site, with each web server having multiple IP addresses. Each IP address corresponds to a different web site. For example, let us say a customer has three web servers (1, 2, and 3) and two different virtual servers, www.site1.com and www.site2.com. Each web server would have two virtual IP addresses, one for www.site1.com and another for www.site2.com. www.site1.com would map to an IP address that is load-balanced among the three site 1 IP addresses. www.site2.com would map to a different IP address that is load-balanced among the three site 2 IP addresses. This configuration consumes eight public IP addresses.

Extending this logic for m web sites and n web servers, m * (n+1) public IP addresses are consumed. This is a very wasteful way to use IP addresses. A better implementation uses virtual sites for customers with only one IP address per server. The web server produces content depending on the HTTP Host request header [17]. As a result, for m web sites on n servers, only n+1 IP addresses are consumed (n server addresses plus a single shared virtual address). The shortage of IP address space dictates using this technique. Some registries will not accept virtual sites as a justification for requesting additional IP addresses.
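The address savings can be seen by comparing the two schemes directly (a trivial sketch of the arithmetic above, not from the paper):

```python
# Public IP addresses consumed by m virtual web sites hosted on n web servers.
def per_site_ip_scheme(m_sites: int, n_servers: int) -> int:
    # One address per site on every server, plus one load-balanced virtual
    # address per site.
    return m_sites * (n_servers + 1)

def host_header_scheme(m_sites: int, n_servers: int) -> int:
    # One address per server plus a single shared virtual address; the web
    # server selects the site from the HTTP Host request header.
    return n_servers + 1

print(per_site_ip_scheme(2, 3))    # 8, as in the example above
print(host_header_scheme(2, 3))    # 4
```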

ADDRESSING DESIGN DECISIONS AT INTEL® ONLINE SERVICES

In this section, we discuss some of the implementation choices made at Intel Online Services that were driven by IP address space concerns. We first talk about the choice of IP address space. We then discuss remote access implementations, and we conclude with a description of virtual web site implementations.

Addressing Choices

As we mentioned above, some types of address space are more appropriate than others in different situations. At Intel Online Services (IOS), we use a hybrid addressing approach for the data center network that uses the most appropriate type of address space depending on the purpose of the host. We use private IP addressing for the data center internal networks and for customer servers that do not need to talk to the Internet. Since our internal networks do not need to talk to the Internet, there is no need to use precious public space. Also, back-end servers that do not need to talk to the Internet gain a measure of security because they are impossible to access directly from the Internet. (They are not immune to attack, however.)

Links to transit ISPs and other ISPs with which IOS peers use the address space of those ISPs on the router interfaces of the links. Since these IP addresses will need to change if the ISPs change, and will go away if an ISP is no longer used, use of ISP space in this situation is not a problem and helps conserve IOS public address space.

For data center hosts and routers that need to communicate directly with the Internet, we have used a variety of address spaces. In North America, we use a class B (/16) that was made available to us. Using available public space allows us to have address space independent of our ISP selection, and it makes multihoming to multiple ISPs much easier. For our data centers in Asia and Europe, we have obtained our own space from the regional address registries. Using the available class B there was not feasible because some ISPs will not accept route advertisements for pieces of traditional class B address space. Had we chosen to use this space, we would have had to use the same ISPs in Asia, Europe, and North America in order to have any kind of data center-specific routing policy. The process of obtaining new space took significant effort, but it was well worth it to have address space independent of ISP choice and the ability to multihome.

Size of Address Space Advertised

While IOS data centers are designed to handle thousands of hosts, not all of the hosts need to communicate directly with the Internet. Each data center advertises /19 networks, providing address space for up to 8192 hosts. A /19 is the longest prefix that some ISPs will accept. For the data centers in North America using parts of a traditional class B, we also need to advertise the entire /16 network out of two data centers in order to deal with ISPs that do not accept parts of traditional class B networks. These ISPs will route traffic to IOS's ISPs, which will then route it to the individual data centers.

Autonomous System Number Choices

These choices are similar to our choices of address space. For North American data centers, IOS uses an autonomous system number that it had available. Separate autonomous system numbers for each data center were considered wasteful and, in the North American environment, not particularly necessary. For IOS Asian and European data centers, we have obtained AS numbers from APNIC and RIPE, respectively. This allows the data centers independence in ISP selection, and it avoids any possible routing policy conflicts with other data centers on different continents.

Address Allocation Policy

For security and ease of management, IOS has chosen to place each of its customers on separate segments. In doing so, IOS enforces requirements for address utilization that mirror those of the address registries mentioned above. This positions IOS to be able to meet the utilization requirements of the registries when IOS requires more space.

To meet those requirements, IOS uses variable-length subnet masks extensively. VLSM allows IOS to assign an appropriately sized subnet to a customer while maintaining utilization requirements. One consequence of this is that IOS infrastructure hosts that need routing information sent to them must use a routing protocol that understands VLSM. This eliminates the RIP version 1 protocol [3] and basically limits the routing protocol used by hosts to OSPF [4] and RIP version 2 [15].

Remote Access Implementation

IOS data centers offer a variety of remote access options, such as Virtual Private Network (VPN), ISDN, and dedicated leased lines (T-1, T-3, etc.), to customers as a way of providing access directly from their networks into their servers hosted at IOS. Some of these options, such as LAN-to-LAN VPNs and leased lines, create network channels into the data center across which customer-end network addressing, which may very likely be in the private IP ranges, becomes visible. In this section, we focus only on these remote access options.

Allowing customer-end IP addressing, whether it is in public or private IP space, inside the data center network makes routing extremely difficult. The data center routing policy needs to account for routing network traffic appropriately to multiple customers' home networks. Furthermore, across multiple customers, the networks at their ends can be spread all across the public and private IP address ranges, leaving little scope for summarizing networks and, as a consequence, leading to larger routing tables. In addition, private IP conflicts across customers as well as the data center networks need to be resolved. Considering all the above challenges, we made a decision to hide customer-end internal networks from data center internal networks.

In this light, let us discuss the salient features of IOS's remote access infrastructure, which is logically represented in Figure 3.

[Figure 3 is a diagram of the IOS remote access infrastructure, showing the Internet, VPN and ISDN/leased-line (T-1/T-3, etc.) connections, a remote access layer, outer routers, a network aggregation segment, a firewall layer, inner routers, and the Intel Online Services network.]

Figure 3: Intel Online Services remote access infrastructure for data center customers

The remote access layer comprises the devices that provide the various remote access technologies supported at each data center. The firewall layer has devices that enforce security policies, among other things, on remote access traffic. The firewall layer also employs many-to-one as well as static NAT extensively to ensure that customer-end networks are not visible beyond the network aggregation segment. An exception to this rule is when a customer wants to run an application, such as the Distributed Component Object Model (DCOM) [16], that does not work across NAT boundaries.

The actual design of the end-to-end network communication, across the above infrastructure, for various customers depends on the following factors:

• private IP addressing conflicts with other customers' networks

• where the network communication is initiated

• whether the applications that need to be run can work across NAT boundaries

Following is a discussion of the various remote access scenarios that can be encountered and how IOS supports end-to-end communication in those situations.

Scenario 1: The customer-end network does not conflict with any other customer's home networks or with data center internal networks. All network communication needs to be initiated from the customer end and inbound into the data center. All required applications work well with NAT.

This is a simple case and can be resolved by translating all of the customer network to one IP address (many-to-one NAT) from the data center's IP addressing range. Each customer network that fits this scenario is translated to a different IP address. As discussed earlier in the NAT section, many-to-one NAT does not allow communication to be initiated in both directions. In this design, servers in the data center cannot initiate communication back to customer-end systems.

Scenario 2: This is the same as Scenario 1 but it requiresthat communication needs to be initiated from the datacenter to a few customer-end systems.

In such situations, we still create a NAT rule to translateall of the customer’s networks to one single IP address.In addition, we create static one-to-one NATrelationships between specific data center’s IP addressesand specific IP addresses of systems at the customer end.Hence, bi-directional initiation of communication isallowed only for specific customer-end systems.
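As a purely illustrative sketch of Scenarios 1 and 2, the following Python fragment models a many-to-one rule for customer-initiated traffic plus static one-to-one entries for the few customer-end systems that must accept connections initiated from the data center. The addresses, class name, and rule structure are hypothetical assumptions and are not taken from the actual IOS firewall configuration.

# Minimal sketch (not IOS's actual firewall configuration) of the
# Scenario 1 and 2 translation rules. All addresses, names, and the
# rule structure are hypothetical illustrations.
from ipaddress import ip_address, ip_network

class CustomerNatPolicy:
    def __init__(self, customer_net, overload_addr, static_map=None):
        self.customer_net = ip_network(customer_net)    # customer-end network
        self.overload_addr = ip_address(overload_addr)  # many-to-one address
        # Static one-to-one entries: data center address -> customer address.
        self.static_map = {ip_address(dc): ip_address(cust)
                           for dc, cust in (static_map or {}).items()}

    def translate_outbound(self, customer_src):
        """Customer-initiated traffic: the whole network hides behind one address."""
        if ip_address(customer_src) in self.customer_net:
            return self.overload_addr
        raise ValueError("source is not in this customer's network")

    def translate_inbound(self, dc_dest):
        """Data-center-initiated traffic: allowed only via a static entry."""
        dest = ip_address(dc_dest)
        if dest in self.static_map:
            return self.static_map[dest]
        raise PermissionError("no static NAT entry; connection not allowed")

# Scenario 1 uses only the many-to-one rule; Scenario 2 adds a static entry.
policy = CustomerNatPolicy("10.1.0.0/16", "192.0.2.10",
                           static_map={"192.0.2.11": "10.1.5.20"})
print(policy.translate_outbound("10.1.4.7"))   # -> 192.0.2.10
print(policy.translate_inbound("192.0.2.11"))  # -> 10.1.5.20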

Scenario 3: Customer-end internal networks conflict with another customer's home network but do not conflict with data center IP address space. Communication is always initiated from the customer end to the data center, and all required applications work well in spite of NAT.

To resolve the private IP address conflicts here, we ask customers to perform NAT at their end to hide their internal network from IOS behind either non-conflicting private IP address space or public address space. Furthermore, we translate that IP address space, at the firewall layer in the data center remote access infrastructure, to a single IP address from the data center's IP address space.

Scenario 4: This is the same as Scenario 3 except that communication needs to be initiated from the data center to some systems at the customer end.

To allow the needed end-to-end communication here, we create static one-to-one NAT relationships for the specific systems at the customer end in addition to the solution implemented for Scenario 3. If customers are using many-to-one NAT on their end as well, they will need to set up corresponding one-to-one NAT relationships on their end in order to allow connections to be initiated from the data center to systems at their end.

Scenario 5: Customer-end networks have private IP conflicts with the data center's private IP space.

With the exception of catering to applications that do not work across NAT, we have decided to allocate the data center's public IP space to the internal networks that such customers would need to reach at the data center. Customers in turn will need to perform NAT at their end to translate their networks to a non-conflicting IP space. This non-conflicting IP space is still hidden from the data center for reasons discussed earlier.

Scenario 6: Customer needs to run applications that do not work across a NAT boundary across one of the remote access channels.

In such cases, we must ensure that the data center's networks as well as the customer-end systems that need such communication use public IP addressing. In addition, we have to accommodate the routing for the customer's public IP address range in the data center routing policy, as it is not possible to hide customer-end networks behind NAT here.

Other Options: End-to-end connectivity design, across the remote access channels discussed in this section, can get quite complex and potentially difficult to troubleshoot. IOS also provides remote access options, such as client-to-LAN VPN and ISDN, that provide direct connectivity from an individual desktop system to the data center network and are free of the design complexities of the remote access options discussed so far. These remote access options hide the customer-end IP addressing from the data center network, eliminating the possibility of address conflicts and the need for technologies such as NAT. Instead, these technologies assign an additional IP address from a designated pool of addresses in the data center's IP address space to the customer-end system while the system is connected. This IP address is used for all communication with systems within the data center. Applications that carry IP address or application port information in the packet payload work without any issue across these remote access channels, since the IP address and port information never changes anywhere between the customer desktop system and the systems in the data center.

This option works well for customers who travel a lot and access their systems from a number of locations, or for customers who do not need continuous access to their systems. It does not work well for customers whose connectivity to their systems in the data center must be up all of the time.

Virtual Web Server Implementations

IOS uses the HTTP 1.1 host header technique [18] mentioned above for virtual web site public addresses. This means for m virtual web sites and n web servers, only n + 1 public addresses are used. With this implementation, the number of IP addresses required is not sensitive to the number of virtual web sites. Some IOS customers have web usage analysis packages that require that different virtual web sites have different IP addresses. IOS minimizes the impact on address space from such customers by mapping IP virtual addresses to web servers on ports other than the HTTP standard port 80. A different web server virtual IP address might map to port 81 on the web servers, while a different virtual site's virtual IP address might map to port 82 on those same servers. In this way, for m virtual web sites and n web servers, m + n public addresses are used. This is not as good as the host header implementation, but is much better than the IP address per web server implementation.
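To make the address arithmetic concrete, the short Python sketch below compares public address consumption under the schemes just described. The site and server counts are hypothetical, and the per-site-per-server formula is our own reading of the "IP address per web server" approach rather than a figure stated above.

# Illustrative arithmetic only: public address consumption for m virtual
# web sites hosted on n web servers under the schemes discussed above.

def host_header(m, n):
    # HTTP/1.1 Host header technique: n server addresses plus one shared VIP.
    return n + 1

def port_mapping(m, n):
    # One virtual IP per site, mapped to ports 81, 82, ... on the same servers.
    return m + n

def per_site_per_server(m, n):
    # Assumed worst case for the "IP address per web server" approach:
    # every site gets its own address on every server.
    return m * n

m, n = 200, 10  # hypothetical counts of virtual web sites and web servers
for scheme in (host_header, port_mapping, per_site_per_server):
    print(f"{scheme.__name__:20s}: {scheme(m, n)} public addresses")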

TRENDS IN IP ADDRESSING

IP address space is finite, and as the number of available addresses gets smaller, new addresses and AS numbers will be harder to obtain. As of February 2000, about half of the available IP address space had been utilized [19]. We anticipate that depletion of addresses will accelerate as the Internet becomes more pervasive worldwide and as more and more devices (cellular phones, game consoles) become able to communicate directly on the Internet.

One technical solution we are evaluating to reduce the number of addresses being used is to use NAT on public web servers. With NAT, a public virtual IP address could be mapped to multiple private addresses, reducing the need for a public IP address per web server. This solution has the potential drawback of complicating monitoring of web servers and debugging of problems, since the individual web servers can no longer be contacted individually from the Internet. Also, various security schemes can be broken by using NAT [20].

A longer term solution to the depletion problem is for the Internet to move to IP version 6 [21]. The address space for IPv6 is much, much larger, and many of the actions necessitated by IPv4 address scarcity will not be necessary. The address registries have already begun allocating IPv6 address space. Unfortunately, IPv6 is not backward compatible. At the moment, there is not enough economic incentive to undertake a large-scale conversion to IPv6. We anticipate that this may change as the amount of IPv4 address space is depleted.

CONCLUSION

IP address space is clearly one of the most critical resources that an Internet data center needs to manage. The fact that IP address space is a limited resource drives many technology and operational decisions. Even private address space, once thought to provide relief from addressing problems, can be the source of problems as two organizations find themselves trying to resolve private address space collisions. IPv6 holds some promise for relieving many of these problems, but the Internet has quite a way to go before there is widespread adoption of IPv6.

ACKNOWLEDGMENTS

We acknowledge our technical reviewers, Sanjay Rungta and Matt W. Baker, for their time and helpful feedback.

REFERENCES

[1] Postel, J. (editor), "Internet Protocol: DARPA Internet Program Protocol Specification," RFC 791, September 1981, http://www.ietf.org/rfc/rfc0791.txt.

[2] Hawkinson, J., and T. Bates, "Guidelines for Creation, Selection, and Registration of an Autonomous System (AS)," RFC 1930, September 1996, http://www.ietf.org/rfc/rfc1930.txt.

[3] Hedrick, C., "Routing Information Protocol," RFC 1058, June 1988, http://www.ietf.org/rfc/rfc1058.txt.

[4] Moy, J., "OSPF Version 2," RFC 2328, April 1998, http://www.ietf.org/rfc/rfc2328.txt.

[5] Rekhter, Y., and T. Li, "Border Gateway Protocol 4," RFC 1771, March 1995, http://www.ietf.org/rfc/rfc1771.txt.

[6] Cerf, V., "IAB Recommended Policy on Distributing Internet Identifier Assignment and IAB Recommended Policy Change to Internet 'Connected' Status," RFC 1174, August 1990, http://www.ietf.org/rfc/rfc1174.txt.

[7] Nesser II, P., "An Appeal to the Internet Community to Return Unused IP Networks (Prefixes) to the IANA," RFC 1917, February 1996, http://www.ietf.org/rfc/rfc1917.txt.

[8] Manning, B., and T. Pummill, "Variable Length Subnet Table For IPv4," RFC 1878, December 1995, http://www.ietf.org/rfc/rfc1878.txt.

[9] Mogul, J., and J. Postel, "Internet Standard Subnetting Procedure," RFC 950, August 1985, http://www.ietf.org/rfc/rfc0950.txt.

[10] Rekhter, Y., B. Moskowitz, D. Karrenberg, G. J. de Groot, and E. Lear, "Address Allocation for Private Internets," RFC 1918, February 1996, http://www.ietf.org/rfc/rfc1918.txt.

[11] Rekhter, Y., and T. Li, "An Architecture for IP Address Allocation with CIDR," RFC 1518, September 1993, http://www.ietf.org/rfc/rfc1518.txt.

[12] Fuller, V., T. Li, J. Yu, and K. Varadhan, "Classless Inter-Domain Routing (CIDR): an Address Assignment and Aggregation Strategy," RFC 1519, September 1993, http://www.ietf.org/rfc/rfc1519.txt.

[13] Cisco Systems, Hot Standby Router Protocol, http://www.cisco.com/warp/public/619/index.shtml.

[14] Egevang, K., and P. Francis, "The IP Network Address Translator (NAT)," RFC 1631, May 1994, http://www.ietf.org/rfc/rfc1631.txt.

[15] Malkin, G., "RIP Version 2," RFC 2453, November 1998, http://www.ietf.org/rfc/rfc2453.txt.

[16] Microsoft Corporation, http://msdn.microsoft.com/library/backgrnd/html/msdn_dcomfirewall.htm.

[17] Srisuresh, P., and M. Holdrege, "IP Network Address Translator (NAT) Terminology and Considerations," RFC 2663, August 1999, http://www.ietf.org/rfc/rfc2663.txt.

[18] Fielding, R., J. Gettys, J. Mogul, H. Frystyk, and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1," RFC 2068, January 1997, http://www.ietf.org/rfc/rfc2068.txt.

[19] Mankin, A., "Will there be an IPv6 Transition?" 1999 USENIX Annual Technical Conference, Monterey, 1999, http://www.east.isi.edu/~mankin/usenix99/tsld001.htm.

[20] Carpenter, B., "Internet Transparency," RFC 2775, February 2000, http://www.ietf.org/rfc/rfc2775.txt.

[21] Deering, S., and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification," RFC 2460, December 1998, http://www.ietf.org/rfc/rfc2460.txt.

AUTHORS' BIOGRAPHIES

Jeff Sedayao is a network engineer at Intel Online Services. Between 1987 and 1999, he was the architect of and ran Intel's Internet connectivity. His primary interests are in Internet performance, security, and policy implementation. He also serves as Intel's representative to the Cross Industries Working Team's Internet Performance Team and has participated in the IETF's IP Performance Metrics Workgroup. His e-mail address is [email protected].

Lynne Marchi received her B.S. degree in computer science from California State University, Sacramento. She began work at Intel in 1992 as a co-op and then joined the Corporate Information Security group in 1993. In 1996, she joined Internet Connectivity Engineering, where she focused on the secure implementation of new firewalls. Recently, she has joined the Internet Firewall Engineering group of Intel Online Services. Her e-mail address is [email protected].

Sridhar Mahankali is an M.S. graduate in electrical and computer engineering from the University of Rhode Island, Kingston. He worked as a systems and network engineer with the Board of Governors of the Federal Reserve System before joining Intel's e-Business group in 1998, where he focused on firewall and intrusion detection implementation. Currently, he is part of the Internet Firewall Engineering group in Intel Online Services and specializes in engineering VPNs and network consulting. His e-mail address is [email protected].

Copyright © Intel Corporation 2000. This publication was downloaded from http://developer.intel.com/.

Legal notices at http://developer.intel.com/sites/developer/tradmarx.htm.


Certifying Service Reliability in Data Centers

Michael Schlierf, Intel Online Services Quality and Reliability Engineering, Intel Corp.
Michael Fuller, Intel Online Services Quality and Reliability Engineering, Intel Corp.

Mike Kerr, Intel Online Services Quality and Reliability Engineering, Intel Corp.

Index words: data center, service, reliability, start-up, cookbook, ORC, operations, readiness, certification, SRCL, CIPA, WTP, OJR

ABSTRACT

Intel® Online Services (IOS) Quality and Reliability Engineering (QRE) applied the Virtual Factory and Copy Exactly (CE!) [1] Start-Up methodologies from Intel's Technology Manufacturing Group (TMG) to the challenge of starting up an IOS Data Center (DC). Our approach was to take existing methods for solving specific problems and modify them for service industry needs. The results are the Data Center Start-Up "Cookbook" and the Operations Readiness Certification (ORC) process. IOS uses Operations Readiness Certification to certify the reliability of systems and service offerings in each data center.

INTRODUCTION

Intel® Online Services is chartered to deliver managed services to host customers' unique e-Commerce applications. Our challenge is to leverage Intel's expertise in producing millions of units reliably and to apply it to rapidly developing and deploying new technologies, with a high level of customization.

The e-Commerce community is unforgiving of mistakes, and IOS cannot afford "black eyes" from poor execution. New data centers must be ready for customers within 54 days of construction being completed. The IOS business plan requires simultaneous data center start-ups with many new Intel personnel. On opening day, a data center must immediately provide reliable service consistent with other IOS data centers.

After opening the first data center (named IDS01) in Santa Clara in September, 1999, the QRE team organized a rigorous Lessons Learned process in which we identified hundreds of issues and suggested improvements. We developed specific action plans to resolve the major issues before opening the second data center (IDS02), which was scheduled for six months later.

As a result of the Lessons Learned and action plans, QRE developed an IOS Operations Start-Up Methodology consisting of 1) the DC Start-Up process (the Cookbook) and 2) the DC Operations Readiness Certification (ORC) process.

DATA CENTER START-UP COOKBOOK

The DC Start-Up Cookbook is a set of recipes for ramping up new data centers. The cookbook contains mechanisms for planning and scheduling, managing the project, and capturing lessons learned for subsequent start-ups.

Planning and Scheduling

The cookbook contains standardized phases and critical-path milestones from the following groups:

• Corporate Services (CS) Construction (limited to IOS-dependent milestones)

• Information Technology (IT) (limited to IOS-dependent milestones)

• Asset Management (AM)

• Intel Online Services Engineering (IOSE) Fit-up

• Staffing

• Human Resources / Employee Integration

• Organizational Resource Development (ORD)—Functional Training

• Customer Service Integration (CSI)

• Data Center Operations (DC Ops)

• Facilities Sustaining

• Customer Service Representatives (CSR)

The cookbook defines the key supporting tasks and their duration, sequence, and dependencies. It includes a detailed technical (IOSE Fit-up) sub-schedule that can be integrated into the master schedule.

Each phase/milestone has a standard duration relative to the "Go-Live" date. Representatives of all groups involved hold a full-day planning session to develop a detailed plan from the generic cookbook milestones. Each group defines its critical-path milestones to ensure that all groups understand and agree, particularly groups with direct interdependencies.

Managing the Project

Throughout the start-up phases, representatives of each involved group hold a weekly meeting to manage the IOS start-up milestones and group interfaces. Each week all group owners present a Weekly Risk Assessment template for their critical path milestones and assign a risk assessment (high, medium, or low) to meeting their milestones. They also present highlights/lowlights and plans, and they request help if needed. A Weekly Readiness Assessment Report is generated with Program Highlights, Lowlights, and assessed risk to the critical path.

Lessons Learned

The IOS Operations start-up methodology ends with a rigorous Lessons Learned process from the previous data center start-up. This is organized into two sessions:

• Session 1 (DC Operations) focuses on staffing, training, human resource allocation and utilization, management/planning, IOS Ops certification processes (SRCL Planning & Management, Customer Integration Process Audit, Walk-the-Pod, On-the-Job Readiness), and business processes (customer landing, build team, and change management).

• Session 2 (IOSE Fit-up) focuses on fit-up technical issues, planning and coordination of fit-up interdependencies to CS construction and IT, and SRCL technical content.

In these sessions, the stakeholders define Action Plans and assign Actions Required (AR) to specific owners or forums. Those that affect the data center start-up plans and schedules are included in the next version of the IOS Operations Start-Up Cookbook.

The purpose of the cookbook is to ensure reliable deployment of a service as measured by successful completion of the Operations Readiness Certification.

OPERATIONS READINESS CERTIFICATION (ORC)

The Operations Readiness Certification process validates that the data center is able to support customer sites with standard production processes and personnel. It consists of the following phases:

• The System Readiness Checklist (SRCL) verifies that the infrastructure is built correctly and that all systems and subsystems are operational.

• The Customer Integration Process Audit (CIPA) ensures that all business processes required to land a customer in the data center are in place.

• A series of Walk-the-Pod (WTP) exercises or scripts validate that the data center has the capability to support Service-Level Agreements (SLAs). The ORC team develops a risk assessment matrix to identify high-risk areas requiring immediate action plans.

• During On-the-Job Readiness (OJR), auditors observe all data center personnel performing their roles and responsibilities to validate that they have the required skills.

When all of these phases are completed successfully, the DC is certified as ready to "go live."

Systems Readiness Checklist (SRCL)

The first phase of IOS Operations Readiness Certification is the Systems Readiness Checklist (SRCL) process, based on the Intel Technology Manufacturing Group's Machine Readiness Checklist (MRCL) process. SRCLs are an integral part of starting up new data centers as well as supporting new service offerings.

SRCL deliverables are grouped into three levels:

1. The build checklist allows the system builder to capture completion of the build elements.

2. The functionality checklist validates all required system functions.

3. The documentation checklist verifies that all documentation required for transition to data center operations is in place.

The Build Checklist (SRCL Level 1) coincides with the IOSE Fit-up. The Functionality Checklist (Level 2) occurs the week after the Build Checklist. The Documentation Checklist (Level 3) begins the week after the Functionality Checklist and continues through "Go-Live."

An SRCL has several benefits:

• It provides the foundation for a standardized data center start-up process.

• It stipulates the requirements and deliverables for the system and ensures that those deliverables are met.

• It provides a standard qualitative assessment of the technical readiness of each system or subsystem.

• It creates a measurable endpoint for the system installers responsible for deployment.

• It specifies what routine maintenance activities are required for DC Operations to assume ownership of and sustain the system.

• It is a means for DC Ops analysts and engineers to learn system details and reinforce technical training.


• It serves as a vehicle to commission the system to production in the data center. (The SRCL requires written sign-off by the system installer, the DC Ops system owner, and ultimately the DC Ops manager.)

• It provides a framework for continuous improvement for future deployments.

There are a number of specific SRCL management processes in place to manage completion of SRCLs by service offering. There is also a specific management and team structure defined to complete SRCL execution.

The SRCL Functional Owner Matrix assigns groups of SRCLs to a DC Ops functional owner. This functional owner is responsible for the completeness and accuracy of technical SRCL content and also drives the completion of SRCLs within their functional area.

Customer Integration Process Audit (CIPA)

The CIPA process exercises the Customer Landing Process, as defined by Customer Service Integration (CSI). CSI determines which specific CIPA forms, processes, procedures, and roles need to be evaluated for each service offering, then ensures that all business processes required to land a customer in the data center are in place.

The overall purpose of the CIPA is to certify the data center's ability to build a simulated customer pod (which typically consists of web application and database servers) and then land the customer's e-commerce application around the clock, using only the skills and resources within the data center. The completed pod serves as the platform for the Walk-the-Pod simulations.

In a typical scenario, the implementation team

• blocks out a 72-hour period to conduct the CIPA,

• gathers all known procedural documentation,

• builds a faux pod (for a simulated customer) from Customer Enrollment through CSR Start-Up,

• presents the Customer Build Request (CBR) to the Implementation Planning Council (IPC), Planning Meeting, and Change Assessment Team (CAT), and

• creates a library of all documents that need to be validated after the audit.

Throughout the CIPA, independent auditors observe every step.

The CIPA allows the implementation team to

• test all documented procedures, remove unnecessary steps, and identify missing steps;

• identify procedures that are passed on through "tribal knowledge" or are not defined;

• identify processes that must be performed in serial or may be performed in parallel;

• test document control; and

• test revision control.

The CIPA also allows the implementation team to

• follow a Remedy ticket throughout the process, conducting warm handoffs between roles;

• uncover major issues in the landing/build/support processes;

• identify important training issues;

• test the project tracking tools;

• review the network design;

• ensure that all required asset, server, and network data are captured;

• develop Best Known Methods (BKM) with Customer Service Integration (CSI) and Information Security (InfoSec); and

• train other network engineers on the processes.

Once the implementation team has completed the Customer Integration Process Audit, the pod is ready for the Walk-the-Pod exercises.

Walk-the-Pod

Walk-the-Pod (WTP) is a concept adopted from Intel manufacturing that is used for validation of a new process or process change. The goal of WTP is to assess the operational readiness of a new/upgraded service or tool and validate that it is fully integrated into a production data center. Testing personnel or training readiness is not within the scope of WTP.

The purpose of WTP is to

• ensure that supporting technical and business documentation (such as production designs and run books) can be located and executed successfully;

• exercise procedures between support levels;

• exercise all monitoring, fault detection, troubleshooting, escalation, and problem resolution procedures; and

• ensure that the technology and processes support both customer and internal service-level agreements.

Planning Phase

During the planning phase, assigned team members create a set of simulation exercises, or scripts, covering a representative range of the promised capability. Each script describes a specific service scenario. WTP scripts are organized by the primary capability they evaluate and contain the following components:


• Targeted Primary Capability

• Required Procedures and Processes

• Required Preliminary Training

• Scenario Set-Up

• Expected Process Flow

• Expected Outcome

The best practice is to cover each primary capability and key process with several exercises (with primary and secondary focus). For example, a network-focused exercise will simulate a network outage monitored, corrected, and verified by specific management tools (network availability and capability). Resolving the network outage also verifies the problem management workflow.

In order to develop robust simulation scripts, the team members use a capability matrix that ensures that adequate tests are run against critical capabilities. The capability matrix also shows which capability, process, or tool is either over- or under-tested, thus maximizing the resources allocated to WTP.
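The sketch below shows one possible, purely hypothetical, in-memory representation of WTP scripts and of the capability-matrix coverage check described above; the field names mirror the script components listed earlier, while the capabilities, script names, and counts are invented for illustration.

# Sketch only: a possible representation of WTP scripts and a capability
# coverage check. Field names mirror the script components listed above;
# the capabilities and example scripts are invented for illustration.
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class WtpScript:
    name: str
    primary_capability: str
    secondary_capabilities: list = field(default_factory=list)
    required_procedures: list = field(default_factory=list)
    required_training: list = field(default_factory=list)
    scenario_setup: str = ""
    expected_flow: str = ""
    expected_outcome: str = ""

scripts = [
    WtpScript("network-outage", "network availability",
              secondary_capabilities=["problem management"],
              scenario_setup="simulate loss of an uplink in the test pod"),
    WtpScript("database-failover", "database availability",
              secondary_capabilities=["monitoring"]),
]

def capability_matrix(scripts, critical_capabilities):
    """Count primary and secondary coverage; zero counts flag under-testing."""
    coverage = Counter()
    for script in scripts:
        coverage[script.primary_capability] += 1
        coverage.update(script.secondary_capabilities)
    return {cap: coverage.get(cap, 0) for cap in critical_capabilities}

print(capability_matrix(scripts, ["network availability",
                                  "database availability",
                                  "backup and restore"]))
# "backup and restore" shows zero exercises, so another script is needed.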

Validation Phase

During the validation phase, team members perform a dry-run validation to make certain that the tools, processes, and workflow function as expected. Key questions to answer during the dry-run validation are as follows:

• Can the scenario script create the error needed?

• Is the physical infrastructure correctly described and tested?

• Are there missing processes or procedures?

• Is the operational business model correctly assumed?

During Walk-the-Pod validation, Data Center Operations personnel perform the simulation exercises to verify that individual components, tools, processes, procedures, and documentation required for landing, monitoring, and supporting a service offering exist and function as an integrated system.

Execution Phase

Before the execution phase can begin, the SRCLs and test pod must be completed, and the planned scripts must be validated in the specific environment to be tested.

As the Walk-the-Pod team runs each exercise, team members perform specific roles. During the execution of each key function, an auditor captures changes, issues, or needed action items. After each exercise is run, the team does a postmortem on the effort to determine if the issues found require a re-write and re-run of the scenario.

The final output of executing the simulation exercises is the Summary Report. It captures each scenario, associates a risk level (High, Medium, or Low) to each exercise and primary capability, and reports any open issues or required action items, as discovered through Walk-the-Pod.

On-the-Job Readiness

Once Walk-the-Pod verifies that the system is ready, On-the-Job Readiness (OJR) verifies that the people are ready to support the system.

The purpose of OJR is to ensure that command center personnel can

• learn the new service or tool in a simulated environment,

• execute realistic scenarios,

• perform all procedures between support levels,

• perform all troubleshooting, escalation, and problem-resolution procedures, and

• support both customer and internal service-level agreements (SLAs).

Planning Phase

During the planning phase of OJR, the implementation team re-uses the Walk-the-Pod exercises and categorizes them by job classification.

Validation Phase

During the validation phase of OJR, the implementation team members make certain that the scenarios and workflows reflect realistic situations.

Key questions to answer during the validation are as follows:

• Does the scenario apply to this (regional, non-regional, or tier-3) data center?

• Does the workflow identify all critical decisions?

• Does the command center person have access to all the resources required?

• Does the process support all related SLAs?

• Are there missing processes or procedures?

Execution Phase

During the execution phase of OJR, DC Operations personnel perform the simulation exercises to verify that they can follow all related procedures, perform the required tasks, and consistently make correct decisions, all within the constraints of the service-level agreement for the new service or tool in a production environment.


As the data center personnel run each exercise, performing their specific roles, an auditor captures changes, issues, or needed action items. After each exercise, the OJR team members evaluate the results to determine if the issues found require revising or re-running the exercise.

At the completion of the OJR, each data center analyst or engineer on each shift must be able to perform all required procedures. Full OJR status is achieved when all data center personnel on each shift have completed this skill validation.

The Post Deployment Summary details each scenario, associates a risk level (High, Medium, or Low) to each exercise, and reports any open issues or required actions.

CERTIFICATION DECLARATION

The Ops Readiness Certification process provides data center management with an objective assessment of their readiness to "go live." The result of the SRCL, Walk-the-Pod, and On-the-Job Readiness exercises is a "punch list" of processes and procedures that may not be complete. Data center management makes an informed-risk decision regarding whether to "go live" immediately, or to complete the punch items before landing customers. When the data center has completed all phases of the Operations Readiness process, QRE issues a memo to the data center certifying that it is ready to support customers.

RESULTS

As a result of following the Data Center Start-Up Cookbook, IOS experienced near-flawless execution when opening the second data center (IDS02) in Chantilly, Virginia, in March, 2000. The Fit-up and Go-Live processes were completed on schedule, three weeks faster than in the prior start-up, in IDS01.

DISCUSSION

IDS01 (in Santa Clara, CA), IDS02 (in Chantilly, VA), and IDS03 (in Winnersh, UK) are similar data centers in the sense that they are the same relative size, have a similar number of customers, have comparable service offerings, had permanent staff during the start-up period, and their operating support systems were implemented in the local language. IDS04 (in Tokyo, Japan) is significantly different in these areas.

The declining total Ticket Count for the three data centers suggests that the Ops Readiness Certification process reduces the number of critical issues causing customer problems during the six weeks after data center start-up (see Figure 1).

Figure 1: Ticket count

The progressive decrease in the problem tickets in the Operating Systems Service (OSS) category, where we can best measure the integration of technology and people, also furthers the conclusion that the ORC process increases the "Opening Quality" (see Figure 2).

[Chart: percentage of total tickets in each category (HD, Hardware, Connectivity, OSS) for data centers IDS01 through IDS04.]

Figure 2: Category %

The decrease in the ticket Resolve Rate from IDS01, where we did not execute an ORC, to IDS02 and IDS03, where we did, supports our increased confidence in our "Opening Quality" (see Figure 3).

[Chart: ticket resolve rate in hours for the like data centers IDS01, IDS02, and IDS03.]

Figure 3: Resolve rate

CONCLUSION

Using the Data Center Start-Up Cookbook and Operations Readiness Certification (ORC) methodology, IOS has increased our confidence in data center quality at opening without lengthening the ramp-up time.

This confidence level prediction is supported by the following datasets: 1) the decreasing number of trouble tickets within each data center during the first six weeks; 2) the decreasing percentage of total tickets in the Operating System Support (OSS) category; and 3) an overall decrease in the time required to resolve problems. The data demonstrate that, between similar data centers (IDS01, 02, and 03), ORC produces a continuous improvement process.

ACKNOWLEDGMENTS

We acknowledge Steve Dickson, the QRE Deployment Manager, for creating the SRCL process and documenting a significant part of the Ops Readiness Certification (ORC) process. He has played a major role in applying the ORC methodology in IDS02, 03, and 04.

Stephen Dobbins played a key role in developing the SRCL process. Michelle Sugasawara and Michelle Griffiths were instrumental in executing the ORC process and interpreting the results, which improved the process for IDS03.

REFERENCES

[1] McDonald, C.J., "The Evolution of Intel's Copy Exactly Technology Transfer Method," Intel Technology Journal, Q4, 1998, http://developer.intel.com/technology/itj/q41998/articles/art_2.htm.

AUTHORS' BIOGRAPHIES

Michael Schlierf is the Intel Online Services Quality and Reliability Engineering Manager. He has 20 years of experience with Intel in Industrial Engineering, Semiconductor Fabrication Operations Management, Factory Automation, and Information Technology. His e-mail is [email protected].

Michael Fuller is a QRE Program Manager with over 10 years of experience in program analysis and program management, within both technology and applied behavioral analysis. His e-mail is [email protected].

Mike Kerr is a professional technical writer with over 20 years of experience documenting user interfaces, programs, and processes. His e-mail is [email protected].

Copyright © Intel Corporation 2000. This publication was downloaded from http://developer.intel.com/.

Legal notices at http://developer.intel.com/sites/developer/tradmarx.htm.


e-Business Asset Management and Capacity Planning

Brad Coyne, IT e-Business Integration Engineering, Intel Corp.
Sonja K. Sandeen, IT e-Business Integration Engineering, Intel Corp.

Index words: asset management; capacity planning; facilities management; infrastructure management

ABSTRACT

This paper describes Intel's e-Business data center asset management and capacity planning programs, including the business drivers that led to their creation, the technology developed to build and maintain them, and Intel's future plans for them.

As Intel Corporation's Internet presence grew from static content served from a single desktop PC to over 65 externally facing Internet applications running on over 850 servers in multiple states, the need for asset management and capacity planning programs became very obvious. This paper explains these programs in depth. We will first, however, give a brief overview of each program.

INTRODUCTION

Asset Management

Intel's e-Business asset management program is a single web-based source from which all the physical and configuration data for each of the servers in the e-Business production and pre-production environments can be obtained. This program was initially planned with four sequential phases and it is still evolving today. Effective asset management relies on both process and tools coupled with the discipline of the using audience.

There are specific processes created and implemented in order to maintain strong data integrity. Architectural design review boards review the requirements and approve them prior to any assets being procured for Intel's e-Business environment. With each of the processes explicitly outlined in the intranet web site, each engineer is responsible for updating the database when there is a change in the assets.

From a tools perspective, to make asset management easy for engineers, Intel's e-Business team has created a central web application for recording the perpetually changing asset information. This asset management application utilizes two back-end databases that capture physical information as well as configuration data. The two repositories are then seamlessly integrated into a single web portal for engineers to view and update vital server information. The overall infrastructure for the asset management application consists of one web server, two database servers, and a system-side agent for software and hardware inventory. The asset management application can be used by Intel's e-Business team for anything from determining where a system is located to what applications are running on that system. Future goals of the asset management application will be to integrate branding information (system-specific documentation), system capacity information, and health monitoring controls and data. In e-Business, asset management is seen as more than just recording serial numbers. Our goal is to provide a comprehensive system that can be easily used to manage, administer, and evaluate Intel's e-Business assets.

Capacity Planning

Intel's e-Business is undergoing exponential growth rates due to the corporate emphasis on Internet commerce. As our Internet business grows, the need to forecast system and facility capacity requirements grows as well. Capacity planning is divided into two very large categories: system management (which includes server, network infrastructure, and storage area networks) and facilities/infrastructure management (which includes physical space, network connectivity requirements, and power and cooling requirements specifically related to the data centers and pre-production labs). Capacity planning is a crucial part of the entire planning process and integration for Intel's e-Business infrastructure because of Intel's demand for 99.999% application availability.

Every year Intel's e-Business more than doubles the number of systems used to support our Internet applications. To prevent our e-Business from outgrowing its facilities, system resources, and network infrastructure, we have to use a variety of methods to not only predict, but to also control capacity. Like asset management, Intel e-Business has invoked programs dedicated to both technical and process-based solutions to allow for effective capacity planning.


THE EVOLUTION OF ASSET MANAGEMENT AND CAPACITY PLANNING

Asset management and capacity planning are viewed as behind-the-scenes operational processes and tools not often represented in a very public manner. If one does not hear about asset management and capacity planning, it means that the processes and tools are working effectively and are optimally integrated to provide all of the essential business planning and analytical data required at any given time. This, we feel, is the true success criterion of good operational asset management and capacity planning programs.

Understanding the need for well-defined, integrated, and scalable asset management and capacity planning programs for any business unit is the first step toward effective and efficient programs.

These programs are inherently evolutionary. As e-Business and the Internet grow, so too will the requirements to manage assets and plan for capacity.

THE TECHNOLOGIES AND DESIGNS THAT ENABLE ASSET MANAGEMENT

In order to successfully implement an asset management program within e-Business, a harmony must exist between the processes that define asset management and the tools that enable those definitions. This portion of the paper explains the enabling technologies and designs that were modeled from the processes.

As the processes for asset management were constructed, based on an evaluation of the existing environment and the projected environment, careful consideration was given to the development of an application that would suit both realms. When designing an Asset Management Application (AMA), our first priority was to utilize existing data and processes. We then developed an application that would immediately leverage the current system in use as well as define a system that would enrich asset management as it evolved to the desired state. The following sections of this paper describe the retrofitting and development of a comprehensive application that enabled Intel's e-Business to manage its assets from both a corporate capital and a business support perspective.

Re-Developing Asset Management

Effectively managing over a thousand capital-valued systems is a challenge for any company or business. To meet that challenge within Intel's e-Business structure, we were chartered with not only retromanaging the existing environment, but also with controlling an exponentially growing farm of servers. Step one of the asset management challenge was to leverage existing data and applications.

The management of assets within Intel's e-Business has gone through as many revisions and evolutions as the Internet itself. The first attempt at asset management was a simple spreadsheet maintained by numerous engineers. Such a static effort was, at the time, all that was needed to manage a business consisting of only a handful of servers. As Intel's e-Business grew, and the value of asset management was recognized, more dynamic applications were implemented. The first draft of such an application consisted of a free-text web form with a simple one-table database. Although this development was pointed in the right direction, the demands of e-Business and Intel Corporation soon overcame its capabilities. To address the obvious needs at the time, management created an internal group that was chartered with corralling and coordinating Intel's e-Business assets. This newly formed entity consisted of both process and application developers.

Once processes were developed that outlined asset management, the application developers needed to implement a cost-effective solution. Third-party applications were considered briefly. However, the needed level of scalability and customization, together with the inflated costs of outside applications, quickly paved the way for an in-house solution.

It was determined that the first step in instigating asset management was to create a system that would not only host all the needed information, but also make the acquisition of that information as simple as possible for the users.

Creating a Dependency

To create such a system, first a database was designed that could hold physical asset information as well as integrate links to other systems and information that were used by engineers. The reasoning behind such cross-functional integration was the realization that an engineer rarely needs corporate-level asset information. By integrating asset management into more daily processes and applications, Intel e-Business created a dependency that would enforce the discipline required by asset management processes.

One logical opportunity to create such a dependency was the already existing process for system space and networking requests. In order for a system (asset) to land within Intel's e-Business environment, an engineer must have two things: physical space assigned to place the system in the data center and an IP address with the accompanying DNS entry. By taking advantage of these two necessities, the application developers were able to enforce the collection of asset information. In order to acquire rack space and networking for a landing server, an engineer must first pass through the asset management processes.

To facilitate this dependency, the developers formed an alliance with the facility and networking groups. This new alliance not only benefited the asset management process but also improved the efficiency of the joining partners. By integrating multiple processes and interrelating the data within, duplicate efforts were eliminated, and valuable data were joined.

Adding Features

Through dependencies, the asset management effort was assigned to the e-Business Integration Engineering group. However, to implement a successful application and process, there need to be tangible benefits to the user. As mentioned previously, the typical engineer will not likely benefit directly from asset management. Therefore, to truly create a business-wide need for asset management, the developers needed to create business-wide functionality.

An opportunity to do just this presented itself in the form of the integration of systems management. Intel's e-Business had an existing system for dynamically accumulating system and application data. The systems management infrastructure consisted of a system-side agent that inventoried software and hardware configuration data for each server within the e-Business environment. The asset management program developers realized that if they could provide a portal to the asset data within the asset application, they would be adding value to everyday operations. By altering the installation of the systems management tool to incorporate a logical link to the asset management database, a virtual conglomerate system was created. With the marriage of configuration and asset data, engineers now had an everyday use for the e-Business asset application.
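As a hypothetical sketch of this "marriage" of configuration and asset data, the fragment below joins a physical-asset record with agent-collected configuration data into the single view a portal could present; all field names and values are invented and are not the AMA's actual schema.

# Hypothetical sketch of joining the two repositories behind the AMA portal:
# a physical-asset store and an agent-collected configuration store.
# All keys, field names, and values are invented for illustration.

physical_assets = {  # keyed by asset tag: location and ownership data
    "A-10042": {"rack": "DC1-R17", "slot": 12, "owner": "web-eng",
                "serial": "XYZ123", "purchase_date": "2000-03-15"},
}

config_inventory = {  # keyed by asset tag: agent-reported configuration
    "A-10042": {"cpus": 4, "memory_mb": 2048, "os": "Windows NT 4.0",
                "applications": ["IIS", "order-mgmt-frontend"]},
}

def portal_view(asset_tag):
    """Merge physical and configuration data for one system into one record."""
    record = dict(physical_assets.get(asset_tag, {}))
    record.update(config_inventory.get(asset_tag, {}))
    return record

print(portal_view("A-10042"))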

The creation of dependencies and the addition of everyday functionality to Intel's e-Business AMA have significantly contributed to the success of the asset management program. However, limiting the scope of the application and its user interfaces was not part of the application's design. Further development and interfaces are on the horizon.

FUTURE DEVELOPMENTS

Within Intel e-Business we are constantly pushing for tight integration of applications, of which asset management is no exception. Although the AMA is now coupled with our systems management data, there are even more opportunities for integration. The goal of the AMA is to provide a comprehensive portal to any and all system data. The portal will eventually be the one interface engineers and managers alike will visit in order to view any system information. The end result will be the transparent integration of all system data, ranging from a system's physical and financial data to a system's configuration and build documentation.

The next step for the AMA is to link the interface and data to branding documentation. Intel e-Business maintains a document for each system within the environment. The branding documentation outlines all the steps used to build and configure a system. Branding information is crucial to controlling and building our production environment. Today, branding is viewed through shared directory browsing. A project is now in place to improve the branding process by initiating content control through version control. In addition to version control, branding documents will be managed through a central database.

The AMA developers see the upcoming branding application as yet another component that can be integrated into the AMA portal. By adding the branding component, we will create even more dependencies and increase the functionality of the AMA for our end users.

With the linking of the branding and systems management applications, the AMA will be an advanced infrastructure consisting of a web server and three back-end databases. The real benefit the AMA will provide will be the seamless joining of multiple data repositories. Where the databases and the data once stood alone, they will now be fully integrated into one portal (see Figure 1).

Figure 1: A view of the AMA components as they are scoped for today

After branding information is brought into the AMA, there are even more possibilities on the horizon. Plans are being considered to also incorporate our capacity management and monitoring applications into the AMA portal. Assuming Intel e-Business can create this immensely conglomerated system, the AMA portal will undeniably be the one-stop interface for complete systems, asset, capacity, and monitoring management (see Figure 2).


Figure 2: A view of the AMA components as they will be viewed in the future

CAPACITY AND SYSTEMS MANAGEMENT

Efficient management tools can greatly ease the management load in any computing environment, especially with an architecture that includes large numbers of systems. The Intel e-Business infrastructure includes over 850 servers, approximately 60% of which are part of the production environment. All servers are remotely monitored for CPU, memory, and disk utilization. This helps management staff plan for upgrades and provides useful baseline information for sizing new systems and applications. It also helps ensure that overtaxed systems are discovered before they suffer unnecessary failures or performance degradation.

In order to proactively monitor the capacity of systems within our e-Business environment, Intel's IT group has developed a server-side agent that captures dozens of system performance counters every five minutes. The performance data are then aggregated on several large databases exceeding a _ terabyte of performance data (Figure 3). Intel e-Business then runs daily reports for each of its systems, which are routinely reviewed by production engineers for potential capacity issues. The performance data and reports allow us to predict as well as prevent capacity problems within the environment (see Figure 3).

Figure 3: An example of reports used by e-Business to monitor capacity
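The following minimal sketch, which is not the actual Intel IT agent, illustrates the pattern described above: sample a few performance counters on a five-minute interval and append them to a store that daily capacity reports can later aggregate. The counter names, the CSV store, and the collection loop are assumptions.

# Minimal sketch, not the actual Intel IT agent: sample a few performance
# counters on a fixed interval and append them for later daily reporting.
# Counter names, the CSV store, and the loop structure are assumptions.
import csv
import time
from datetime import datetime

SAMPLE_INTERVAL_SECONDS = 300  # every five minutes, as described above

def read_counters():
    """Placeholder for real counter collection (e.g., OS performance APIs)."""
    return {"cpu_pct": 37.5, "memory_pct": 62.0, "disk_pct": 71.2}

def collect(samples, store_path="perf_samples.csv"):
    with open(store_path, "a", newline="") as store:
        writer = csv.writer(store)
        for i in range(samples):
            counters = read_counters()
            writer.writerow([datetime.now().isoformat(),
                             counters["cpu_pct"],
                             counters["memory_pct"],
                             counters["disk_pct"]])
            if i < samples - 1:
                time.sleep(SAMPLE_INTERVAL_SECONDS)

if __name__ == "__main__":
    collect(samples=1)  # a daily report would aggregate the accumulated file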

In addition to monitoring system capacity, Intel e-Business also uses a third-party agent and infrastructure to collect system configuration data. Knowing how a system is built and what applications are running on that system is an important component of managing and controlling a large e-Business site. E-Business systems are routinely scheduled to query their own hardware and software configurations and relay those data to a central repository (Figure 4). Once in the repository, engineers can query and review the dynamic data. This gives Intel e-Business the ability to identify, on a large scale, the systems that either need upgrades or configuration changes.

Intel IT has also developed a custom application called Metrios® that performs functionality testing. Metrios accesses e-Business systems in much the same way a user would and verifies system responses. It can run Active Server Page scripts and even dial out to test connectivity from the Internet. Metrios can be configured for automatic monitoring and will page out to notify staff of problems requiring immediate attention (see Figure 4). Once notified, Intel IT staff use Microsoft's Terminal Server∗ and the Intel LANDesk Server Manager 6.2 for remote system management. Staff members can dial in from their desks or homes to access and manage systems throughout the e-Business computing environment. These tools are essential components for maintaining the Intel e-Business infrastructure and enabling efficient control of widely distributed systems.

∗ Other brands and names are the property of their respective owners.


Figure 4: Comprehensive e-Business systems management obtained through an integrated infrastructure

FACILITIES AND INFRASTRUCTURE MANAGEMENT

The e-Business environment is becoming increasingly complex and, in many cases, is challenging Moore's Law. This growth is exponential in the speed with which new applications are being introduced and existing applications are being upgraded. These requirements are being driven by the very intelligent end user, who also happens to be the one developing the architecture. Along with asset management and systems capacity planning, facilities and infrastructure management was identified as a key contributor to planning Intel's e-Business success strategy.

Facilities and infrastructure management is defined as the physical space (data center and labs), the network infrastructure, and the power and cooling requirements to house all of the systems in a given space. These three major components are wholly dependent on each other in order to accommodate a single system. One cannot land a system in the data center or any of the pre-production/development labs without having all three components in place.

In summary, facilities and infrastructure planning is so tightly integrated with asset management and capacity planning in Intel's e-Business environment that it cannot be treated as a separate issue.

ChallengesThe single most challenging problem with facilities andinfrastructure management planning is how to plan forgrowth and how to decide how much growth to plan forbased on cost and return on investment. It is a verypeculiar and confusing state to be in. Do you plan fortwo, four, or six years? Can you even plan for any furtherthan four years out? Based on trend analysis of system

procurement and the change in systems, how muchdensity do you plan for? What is your focus and how willthis investment be returned? What are the success criteriaof a data center or lab design? Once these questions havebeen answered, you can start the management process.

The problems with managing facilities and infrastructureare, in fact, in most cases the most costly and the mosttime consuming to react to and fix. We are trying toexpand our power/cooling capacity in the data center toaccommodate the amount of physical space and networkcapacity we currently have available. Power and coolingrequirements seem very basic, but they can be thedeciding factor in the installation of systems in a specificlab or data center. The design and planning of power andcooling capacity are essential and are very sensitive to ourever expanding e-Business infrastructure.

In order to understand the requirements of the pre-production labs and data center, we first prepared anhistorical trend analysis specific to the data center,production system growth, and pre-production systemgrowth. We factored in the understanding that the growthrate of systems in our e-Business environment has beendoubling every year, and the system platform physicalvertical size has decreased to 1/3 or 1/2 of the size fromone year ago. We also factored in our intention to makesure we would reduce the complexity of applicationdeployment (support, launches), support processes,network infrastructure (less reliance on WAN, MAN,routers), instrumentation, metrics, analyses, securityadministration, and capacity management. In addition toreducing complexity we also wanted to avoid indirectcosts of load balancing and network infrastructureoptimization, increased support costs, and increasedremote administration.

An overall facility and infrastructure roadmap was designed with a consistent physical capacity, network capacity, and power/cooling capacity architecture and framework. Each of the pre-production labs is now outfitted with cabinets and racks to accommodate all types of systems, tower or rack mount, as well as with network and power/cooling capacities that scale at an equal rate to each other.

The data center power/cooling capacity has been retrofitted to accommodate the physical capacity required by e-Business production.

RESULTS

The success criterion for facilities and infrastructure management is a lab or data center design that factors in the correct ratio of physical space to network capacity to power and cooling capacity, with all of these scalable at an even level. For example, the capacity remaining for network, physical space, and power/cooling should be identical. In a facility design, you do not want to be in a position where you have a large amount of one component remaining and none of another.
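A toy check of the "even remaining capacity" criterion, with made-up numbers and an invented imbalance threshold: compare the percentage of physical, network, and power/cooling capacity left and flag a design whose components are badly out of balance.

```python
# Illustrative only; the capacity figures and spread threshold are invented.
def remaining_pct(used: float, total: float) -> float:
    return 100.0 * (total - used) / total

def capacity_balance(capacities: dict, max_spread_pct: float = 10.0) -> bool:
    """capacities maps component -> (used, total). Returns True when the
    remaining percentages are within max_spread_pct of each other."""
    remaining = {name: remaining_pct(u, t) for name, (u, t) in capacities.items()}
    for name, pct in remaining.items():
        print(f"{name}: {pct:.0f}% remaining")
    spread = max(remaining.values()) - min(remaining.values())
    return spread <= max_spread_pct

balanced = capacity_balance({
    "physical space (rack units)": (600, 1000),
    "network (ports)": (1150, 2000),
    "power/cooling (kW)": (270, 450),
})
print("balanced design" if balanced else "component mix out of balance")
```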



The design of the Intel e-Business data center and pre-production labs is complete. The pre-production labs are in operation and are using the new facilities and infrastructure strategy. The data center will soon be online with the new infrastructure strategy.

With a long-term business strategy in place and a complete understanding of the components of the facility and infrastructure, Intel e-Business will be able to scale the data center and all of its pre-production labs with zero interference to business.

The importance of proactive facilities management and an understanding of infrastructure scalability have been recognized and will always be an integral part of e-Business strategy planning.

CONCLUSION

As e-Business strives to provide the most successful and available Internet presence to the world, behind the scenes are the operational processes and tools that enable it to do just that. Integrated, scalable, and highly accurate asset management and capacity planning programs are essential. Without these programs in place, there is no platform on which to build the business strategy. The initiation and development of an asset management solution and a capacity planning program will pay off in the most rewarding ways. With these programs in place you will be able to reduce the complexity of the e-Business environment with consistent processes and tools. Resources, both assets and personnel, will be used effectively; system and application performance and availability will increase; and consequently, time to market will decrease. The result is an overall reduction in cost for Intel.

ACKNOWLEDGMENTS

The contributions of Jeremy Murtishaw, an e-Business Lead Internet Gateway Engineer with knowledge and experience in Data Center and Lab Network and Facilities Infrastructure Design, are gratefully appreciated.

AUTHORS' BIOGRAPHIES

Brad Coyne is an e-Business Architect with Intel Corporation. He graduated in 1998 with a B.S. degree in Business and an M.S. degree in Information Sciences from Ball State University in Indiana. He has been with Intel Corporation for almost two years now. Since joining Intel his engineering work has focused on server management and Internet development and architecture. This focus includes the development of new applications and methods for managing an IT organization's assets. His e-mail is [email protected].

Sonja K. Sandeen is an e-Business Logistics Analyst with Intel Corporation. She obtained a B.S. degree in Business Operations from the DeVry Institute of Technology. Sonja provides e-Business Integration Engineering with a process-driven program for all of the operational facets of a unit. Her primary focus is on asset management, followed by capacity planning of the CPS Data Center and six pre-production labs. She enables process improvements across all of the e-Business environments to ensure maximum efficiency through the integration of e-Business data tracking tools and processes. Her e-mail is [email protected].

Copyright © Intel Corporation 2000. This publication was downloaded from http://developer.intel.com/.

Legal notices at http://developer.intel.com/sites/developer/tradmarx.htm.


Intel® e-Business Engineering Release Management and Application Landing

Alan Hodgson, IT e-Business Integration Engineering, Intel Corp.

Index words: release, environments, engineering, prioritization, testing, milestones, automation

ABSTRACT

To meet the rapid growth in the number of e-Business applications to be deployed and the demand for increasing the frequency of application releases, Intel's e-Business release cycle has gone through many evolutions and has seen many improvements. This paper examines these changes from an IT e-Business Integration Engineering perspective. The e-Business release cycle may range from 6 weeks to 20 weeks depending on the functional and technical complexity of the release. A release may include up to 6 applications and, in general, each application releases 4 or more times a year. There are over 65 core e-Business external-facing applications.

In this paper, we focus mostly on Business-to-Business (B2B) direct sales and marketing, the most active e-Business area, and we deal only with the e-Business Integration Engineering group's role in e-Business release planning and management.

INTRODUCTION

Since 1998, the Intel® IT e-Business Integration Engineering organization has grown from a small group of 25 engineers supporting 50 servers running a handful of external-facing Internet applications to 80+ engineers supporting approximately 850 servers running 65 external-facing Internet applications. This server farm is split between pre-production, which in general covers development, QA test, and stress test servers, and production, which includes production, failover, and production support servers. The challenge has been to provide on-time, reliable engineering activities to support the rapidly increasing number of applications being released in the external-facing Intel® e-Business space and to keep pace with the increasing application release frequency.

The Intel IT e-Business Integration Engineering group has progressed, out of necessity, from ad hoc releases requested with minimal lead time by the application groups to a very close partnership with the business application groups that includes early engagement and notification of release priorities and plans. Out of this partnership has evolved a comprehensive release process that incorporates planning and prioritization, resource and environment management, engineering management, and release management.

Intel is dedicated to delivering new and improved functionality to its customers in a timeframe that keeps Intel and its customers competitive in the e-Business marketplace. To ensure reliability and speed in the release process, Intel's IT e-Business Integration Engineering group is in the process of automating many of the engineering build activities. This has reduced environment build times by up to 50% and increased accuracy and reliability. Automation is continuing with all aspects of instance designs and code migrations.

The engineering activities vary with the technical complexity of the release, but in general include instance design activities, server build activities, stress testing, day 1 testing (a rehearsal of the deployment activities), and deployment of the application into production. A complex release may also include purchasing and assembling new hardware and possibly the development and implementation of new reference designs. These engineering activities are now closely managed and aligned with the application development lifecycle activities to ensure that development, testing, and deployment milestones can be met. Daily management of release milestones has become a necessity for meeting release deadlines.

This paper covers a broad range of issues and activities relating to the management of the release and deployment of an external-facing (where Internet communication passes through a firewall) e-Business application at Intel. It is written from an e-Business integration engineering perspective and looks at the Intel e-Business release process with respect to how the e-Business Integration Engineering and the e-Business Application Development teams handle the release process for over 60 applications a year.

Intel's e-Business applications fall into one of the following business areas:

• B2B Sales and Marketing Group (SMG)

• B2B Indirect Channel

• B2C (Business to Consumer)

• B2C Core Services (e.g., Search, Registration, etc.)


• B2B Supplier

• Other non-mission-critical e-Business applications

In most cases this paper refers to the release process for the most active and mission-critical e-Business area, Intel's B2B Sales and Marketing Group.

The Engagement Process

For new applications and major enhancements to existing applications, there are two phases of customer engagement with the e-Business Integration Engineering group. The first phase, shown in Figure 1, deals with finding strategic responses to business initiatives and involves participation in the Architecture Design Group (ADG). In this phase, the e-Business Integration Engineering group provides expertise in infrastructure and architecture to help find the best strategic response. This may include piloting and validating a recommendation before sanctioning its acceptance.

Figure 1: Initiative study and strategic architecture design (market and industry inputs, business requirements, and new technology flow through the Architecture Working Group (AWG), project selection, and pilot projects, and through the Architecture Design Group (ADG) and resource allocation, to produce new reference designs, recommendations, and the strategic roadmap for the e-Business groups)

The second phase, shown in Figure 2, occurs when a project has been funded and is ready to be prioritized in the release process. In this phase, the technical complexity of the project is assessed, resources are assigned, and there is participation in the design, development, and deployment of the application.

This paper focuses on the second phase of the engagement process, dealing with e-Business release planning and management.

Figure 2: Release planning/management (scope of work from the e-Business groups' strategic roadmaps flows through the Program Management Office, release planning/prioritization against the current and new plan of record, and a prioritization/re-planning forum held as required, into engineering engagement: an IT engineering coordinator performs an application impact assessment and complexity analysis, assigns a technical implementation lead, and builds the resource Gantt, EMT Gantt, and IT technical timeline through implement-to-plan and go-live)

CHALLENGES

As the number of e-Business applications increased and the need to release more functionality more frequently grew, there was an ever-increasing demand for hardware and engineering resources. This demand often outpaced the hardware and resources available. Environments and servers were being shared among applications and needed to be constantly rebuilt for the next release. Releases would follow so closely upon one another that a missed release date could very well have an impact on the date of the next release. Engagement with the IT e-Business integration engineers would happen towards the end of the application design phase, leaving very little time for engineering to get acquainted with, and prepare for, the new technology requirements. In many cases, new hardware needed to be purchased and new reference designs developed. In addition, there was little or no business prioritization of the applications and associated releases. All this resulted in unrealistic expectations from the users that a release could always be resourced and delivered on time. From an engineering standpoint this was a no-win situation and resulted in overloaded resources and missed schedules. This paper addresses the release planning and management processes that were developed and implemented to change this scenario while still supporting this fast-paced, dynamic environment.

ENGINEERING RESPONSIBILITIES

First, let's take a look at the Intel IT e-Business Integration Engineering organization and its role in the release process. Support for our release cycle environments requires a team of engineers who can plan, design, build, test, and maintain the environments. These engineering activities are in the critical path of the release and are dependent on certain development activities being completed on time. Likewise, the QA test teams are dependent on the e-Business integration engineering activities being completed on time. The Intel e-Business Integration Engineering group has categorized these activities as follows:

• Design Engineering

• Factory Engineering

• Pre-Production Engineering

• Production Engineering

Design Engineering includes those activities relating to adding or changing infrastructure designs. These design additions or changes may be the result of new software, new hardware, or new releases of operating systems and databases. Design engineering may include designing new infrastructure architecture, prototyping and testing the changes, and ultimately developing reference designs that can be used to consistently set up and install the new hardware, software, or release. These reference designs are used by other engineers to build servers for new application releases.

Factory Engineering includes those activities relating to ensuring a successful deployment of the application release from an engineering perspective. These activities include developing and managing the engineering project plan and designing and documenting the builds for all servers in the environment. These design documents are called Instance Designs (often referred to as branding documents), and they provide the complete specification for building and configuring a server. They include, in part, reference designs for the base builds of the OS and databases, but in addition include directory structures, application installation and configuration instructions, and Component Object Model (COM) specifications. Factory engineering requires the engineer to be closely involved with the application development team to understand the new application's technical designs. The engineer should be a partner in the design solutions. A factory engineer will, in most cases, use the build of the proto environment as an engineering prototyping opportunity to test out and develop the instance design documentation. This documentation will ultimately be passed on to the pre-production engineer and the production engineer to build and maintain the testing and production servers.
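To make the idea concrete, here is a hypothetical fragment of what an instance design might capture for a single server and how it could be flattened into a build checklist. The fields, names, and values are illustrative assumptions, not Intel's actual branding-document format.

```python
# Hypothetical instance design for one server in a pipe; all values invented.
instance_design = {
    "server_role": "web-front-end",
    "reference_design": "NT4-SP6-IIS4-v2",      # base OS build to start from
    "directory_structure": ["D:\\apps\\webom", "D:\\logs\\webom"],
    "application_install": [
        "install WebOM 3.1 from the release share",
        "apply configuration file webom.cfg",
    ],
    "com_components": ["OrderMgmt.Pricing", "OrderMgmt.Tracking"],
}

def build_checklist(design: dict) -> list:
    """Flatten an instance design into the ordered steps an engineer follows."""
    steps = [f"Apply reference design {design['reference_design']}"]
    steps += [f"Create {d}" for d in design["directory_structure"]]
    steps += design["application_install"]
    steps += [f"Register COM component {c}" for c in design["com_components"]]
    return steps

for step in build_checklist(instance_design):
    print("-", step)
```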

Pre-Production Engineering includes those activities relating to the building and maintenance of servers in the pre-production environments. These engineers are expected to build from the Instance Design documentation to ensure consistent and reliable server builds. In the release cycle, these engineers are most impacted by any slippage in application design and development activities, since they are still expected to deliver test environments to the QA and stress test teams on time.

Production Engineering includes those activities relating to building the production servers, deploying an application release into production, and providing adequate production support for the application infrastructure. Production engineering is also a primary recipient of the Instance Designs developed by the factory engineer and validated by the pre-production engineer. A production engineer is also a primary participant in building the stress and day 1 environments and performing the day 1 execution. The day 1 execution is in effect a dress rehearsal of the deployment of a release into production and usually takes place the week prior to production deployment. There is an additional burden on the production engineer to voice concern if the application does not meet the entrance criteria for production stability and performance. Once a deployment has taken place, the production engineer becomes responsible for on-call support, production environment maintenance, data center management, and troubleshooting of production issues.


Figure 3: Sample pre-production environment server map (key: Landed/Online, To Be Ordered, Requires Upgrade, Received/Not Installed, Order Approved; ESD = Estimated Shipping Date)

RELEASE ENVIRONMENTS

Each functionally dependent set of applications is developed, tested, and deployed on an infrastructure of servers that has been designed and configured to support the various applications. This infrastructure typically consists of groups of servers forming environments that are dedicated to phases of the release cycle.

• A development environment consists of a set of development servers used by developers to develop and test their applications.

• A proto environment consists of servers used to prototype or test out the designs.

• The pre-production QA environment consists of servers used by an independent QA test team to perform functional and integrated testing.

• The pre-production stress testing and day 1 testing environment is the most dynamic environment. It actually comprises up to five environments, called stations, which are built on an as-needed basis to stress test applications within a release. An environment station may be used for stress testing Web Order Management one week and rebuilt to stress test a channel application the next week.

• The production servers host the application in production and typically include a redundant set of fail-over servers. In addition, there may also be a production support environment where production bug fixes can be made and tested without impacting a new release that is being tested in the pre-production environment.


The complete set of phased environments, including production, is called a pipe. A pipe is, in most cases, specific and dedicated to an application area (e.g., B2B sales and marketing). A phased environment typically consists of one or more web, database, and application servers. The numbers vary depending on how close the phase is to production. The QA test environments are more representative of the production architecture than the development environment and include back-end system environments that make up the integration architecture. At Intel, the e-Business B2B Direct SMG pipe requires between 20 and 26 servers in each of the test environments and around 50 servers in production, including failover. The pipe consists of around 140 servers that need to be maintained and supported for a release. See Figure 3.
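As a back-of-the-envelope check on the counts quoted above, the totals add up roughly as follows; the per-environment split is an assumption chosen to be consistent with the paper's figures, not actual data.

```python
# Rough arithmetic on the B2B Direct SMG pipe sizing; the split is assumed.
test_environments = {"development": 22, "proto": 22, "QA": 24, "stress/day 1": 24}
production_and_failover = 50

test_total = sum(test_environments.values())
pipe_total = test_total + production_and_failover
print(f"servers in test environments: {test_total}")
print(f"total servers in the pipe: ~{pipe_total}")   # on the order of 140
```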

The new or changed application is moved from environment to environment as development is completed and testing begins. At each stage, entrance criteria need to be met, as shown in Figure 4, before the code can be migrated to the next environment.

Figure 4: Environment migration and exit criteria. The e-Business pipeline "promote to production" path runs Development, Proto, QA, Stress/Day 1, and Production, with entrance criteria at each stage: Proto requires code functional enough for unit testing in a production-like environment; QA requires code to be functionally frozen (only bug fixes thereafter), unit, functional, and user-acceptance testing completed with no show-stoppers, and branding/install documents completed or updated; Stress/Day 1 requires successful integration testing, ASP reviewed for guideline compliance, stress, load, and time-dimension test plans completed and ratified, and the Day 1 test plan ready; Production requires successful stress, load, time-dimension, and Day 1 testing, a Go/No-Go completed with a "Go", and an approved e-Business CCB request.

The Release Cycle

The release cycle for an e-Business application consists of the standard phases of planning, requirements, design, development, testing, and deployment. However, unlike the large back-end systems that often span years of development, an e-Business release takes anywhere from six weeks to six months depending on its functional and technical complexity.

Figure 5: Complex release model


The Complex Release Model shown in Figure 5 reflects significant infrastructure changes that would include new servers and associated reference and instance designs.

A Medium Complex Release Model would not necessarily require new equipment and reference designs, but would require significant changes to instance designs with corresponding build and test activities.

The Simple Release Model shown in Figure 6 may require significant testing of functionality but would require very little engineering. In this model, the environments stay dedicated to the application and do not need to be rebuilt or changed.

Figure 6: Simple release model

The Release Management Process

In the early days of e-Business development at Intel, the planning horizon for releases was no more than six months, and releases would occur somewhat sequentially within each business group. A release could consist of up to six applications. The dependencies between those applications were shared functionality, test environments, shared resources, or just a shared release date. This approach did have some success, but success was always dependent on the weakest application. If an application failed to pass testing criteria, the entire release might have been pushed out. This in turn had an impact on future releases that were dependent on the same resources and environments. This constant flow of critical-path activity was the most stressful for the environment engineers and the QA test teams.

Figure 7 illustrates the four major processes that need to take place to manage a release.

Figure 7: Major release management processes. The e-Business release process comprises PMO business prioritization (strategic planning, project/initiative prioritization); PMO project resourcing and the ZBB list (build a release roadmap for the next 3-6 months, present the projects to be released, establish release milestones, obtain IT commitment to resource the release, address issues impacting the release schedule); engineering planning and development (develop a detailed, resourced engineering plan for a release, engage the resources to execute the plan, monitor execution to plan); and project planning and development (monitor project progress against the release milestones, engage in test and Day 1 planning, daily checkpoints).


Business Prioritization and Resourcing

In our largest e-Business area, B2B, the business organization has established a Program Management Office (PMO). The PMO is chartered with the responsibility of establishing a forum for business owners to submit release requests and join with other business owners to prioritize all requests relative to Intel's B2B e-Business priorities. Other participants in the forum include e-Business Integration Engineering management and QA/Test management. The process is simple. The projects are assigned tentative release dates and priorities by the business owners. Each resource manager assigns resources to the projects until all resources are allocated. In the case of e-Business integration engineering, it is also necessary to determine whether the shared test environments are available as required by the prioritization and release date. The business owners then determine whether the projects left unresourced can be postponed. If they can, then no priority changes are made. If they cannot, then further prioritization is done, or funding is allocated for additional resources or environments, until the correct business prioritization is achieved. This process has brought visibility to the ratio between release demand and the resource availability to implement a release, and has resulted in all the key players working from the same prioritized list.
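A simplified sketch of the allocation logic described above; the project list, resource pool, and numbers are invented for illustration. Resources are assigned down the business-prioritized list until they run out, and the unresourced remainder is surfaced for the business owners to postpone, reprioritize, or fund.

```python
# Illustrative only; projects, resource pools, and figures are hypothetical.
def allocate(projects: list, engineers_available: int):
    """Walk the business-prioritized list, resourcing projects until the
    engineering pool is exhausted; return (resourced, unresourced)."""
    resourced, unresourced = [], []
    for project in sorted(projects, key=lambda p: p["priority"]):
        if project["engineers_needed"] <= engineers_available:
            engineers_available -= project["engineers_needed"]
            resourced.append(project["name"])
        else:
            unresourced.append(project["name"])
    return resourced, unresourced

resourced, unresourced = allocate(
    [{"name": "Web Order Mgmt 4.0", "priority": 1, "engineers_needed": 6},
     {"name": "Channel Pricing",    "priority": 2, "engineers_needed": 4},
     {"name": "Search Upgrade",     "priority": 3, "engineers_needed": 5}],
    engineers_available=9)
print("resourced:", resourced)                     # higher priorities first
print("needs postponement or funding:", unresourced)
```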

Engineering Planning and Development

From an engineering perspective, there is a lot to do in a very short space of time. We may identify a set of resources to work on the release in the PMO forum, but at that time we don't know much about the technical requirements. In the past, the application teams presented their technical requirements to the e-Business integration engineers at the end of their design phase, at the e-Business Design Review Board forum (EDRB). This late engagement has proved to be a major issue for complex releases where new hardware needs to be purchased and possibly new reference designs developed. Today, we assign an Integration Technical Lead (ITL) to the release after the PMO forum has given the green light for the release to proceed. It is the role of the ITL to partner with the application team to get an understanding of the complexity of the application and then to put together an engineering plan to meet the requirements. The ITL is also expected to be a partner with the application development team in developing the technical design and jointly presenting it at the EDRB.

Daily Release Management Meetings

In the past, the daily release management meetings focused on issues surrounding the release at a particular point in time. This served a purpose and allowed bottlenecks and issues to be addressed very quickly. However, it did not lend itself to looking at the big picture or even looking a week or more ahead. Today, we still have a daily meeting, but the release is managed through the implementation of milestones. These milestones (see Figure 8 below) reflect major dependent events that need to be completed on time to ensure that the release stays on track.

Daily monitoring of milestones has proved to be quite successful and provides advanced warning of potential showstoppers. A missed milestone results in more attention being focused on the issues rather than an immediate halt to the application or release. A continuation of missed milestones for an application may result in that application being pulled from the release.

Figure 8: Milestone tracking for a release. For each application in the release, target work-weeks and dates are tracked against milestones such as High Level Design, Reference Design, Detailed Design Review, Proto Env Ready, Prelim Inst Design, Start Funct Test, eARL Ready, Start Char'n Test, QA Env Ready, Funct Code Freeze, PROTO Exit, Char'n Testing, Start Integ Testing, Inst Design Freeze, Day 1 Env Ready, QA Exit, Integ Stress Env, Int Stress Complete, Day 1 Plan, Start Day 1, Management Update, Day 1 Complete, and Go/No-Go.
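A small sketch of the daily milestone monitoring described above; the milestone names echo Figure 8, while the dates, completion status, and escalation threshold are placeholders. A missed milestone draws attention, and repeated misses flag the application as a candidate to be pulled from the release.

```python
# Hypothetical milestone tracker; dates and thresholds are invented.
from datetime import date

def review_application(name: str, milestones: dict, completed: set,
                       today: date, pull_threshold: int = 3) -> str:
    """Return a daily status line; escalate after repeated missed milestones."""
    missed = [m for m, due in milestones.items()
              if due < today and m not in completed]
    if not missed:
        return f"{name}: on track"
    if len(missed) >= pull_threshold:
        return f"{name}: {len(missed)} milestones missed - candidate to pull from release"
    return f"{name}: missed {missed} - focus attention here"

print(review_application(
    "Web Order Management",
    {"QA Env Ready": date(2000, 8, 28), "Funct Code Freeze": date(2000, 8, 28),
     "Day 1 Complete": date(2000, 9, 13)},
    completed={"QA Env Ready"},
    today=date(2000, 9, 1)))
```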


INCREASING THE RELEASE FREQUENCY

Engineering and Business Partnership

The e-Business world is constantly changing as customers require new functionality. The industry is constantly improving technology, functionality, and server technology, and operating systems and databases are issuing major releases almost every year. This is not a time for bureaucracy and process for process' sake. It is a time for flexibility, compromise, and speed. These are most effectively accomplished by an alliance, or a partnership, among all the groups participating in the release cycle. The goal is to jointly deliver the release on time, not to demonstrate the shortcomings of the other groups' deliverables. However, the realistic requirements of each group still need to be respected and delivered with flawless execution.

In the Intel e-Business world, the partnership that exists today between the business organization and the IT e-Business Integration Engineering group is being enhanced with a much closer resource alignment to the business groups. This will provide more focused, dedicated environments and resources for each business group.

Automation

Automation is beginning to play a significant role in speeding up the release process. In the e-Business integration engineering world, we have started to automate server build scripts, which can dramatically reduce the time and the amount of engineering resources it takes to build a server. Today, we have all of our OS and database base builds completed and in operation. In the future, we will be automating as many of the instance design scripts as possible.
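A skeletal illustration of the build-script idea: the same scripted sequence, driven by a reference design, replaces a long manual build. The commands, package names, and reference-design identifiers below are assumptions for illustration, not Intel's actual scripts.

```python
# Illustrative automation sketch only; commands and names are invented.
import subprocess

BASE_BUILD_STEPS = {
    "NT4-SP6-IIS4": [
        ["installer.exe", "/package", "nt4-sp6", "/quiet"],
        ["installer.exe", "/package", "iis4", "/quiet"],
    ],
}

def build_server(hostname: str, reference_design: str, dry_run: bool = True) -> None:
    """Run the scripted base build for a server from its reference design."""
    for command in BASE_BUILD_STEPS[reference_design]:
        print(f"[{hostname}] run:", " ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop on the first failed step

build_server("webfe01", "NT4-SP6-IIS4")
```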

In addition, we are also bringing in testing tools that have been successful with the back-end systems for test script automation and execution for both functional and stress/performance testing.

Pipeline Server Management

Dedicated pipelines of servers for each business group will reduce server conflicts between business group releases, but will most likely double or triple the number of servers over the next two to four years. Managing these large server farms is going to be a major challenge, ensuring redundancy for failover and reliability rates in the 99.6% to 99.8% range. To complicate the picture even more, Intel is planning a distributed data center architecture across North America, Asia, and Europe. This architecture is seen as a means to improve connectivity and performance in Asian and European countries. The release process may ultimately include multiple releases in different time zones.
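As a rough illustration of why failover redundancy matters for reliability targets in that range, the arithmetic below treats redundant servers as independent; the 98% per-server figure is a made-up assumption, not a measured value.

```python
# Back-of-the-envelope availability arithmetic; the 0.98 figure is invented.
def redundant_availability(single: float, copies: int) -> float:
    """Availability of N independent redundant servers: 1 - (1 - a)^N."""
    return 1 - (1 - single) ** copies

single = 0.98                                 # one server, hypothetical
pair = redundant_availability(single, 2)      # primary + failover
print(f"single server: {single:.1%}, failover pair: {pair:.2%}")
# A 98% server alone misses the 99.6%-99.8% range; a failover pair clears it.
```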

MANAGING E-BUSINESS GROWTH AND CHANGE

Resource Constraints

One of the most difficult challenges facing the e-Business Integration Engineering group today is hiring enough experienced people who can hit the ground running when they join Intel. The e-Business Integration Engineering group has grown from a 30-person organization two years ago to a 100-person organization today. Windows* NT and other operating systems, Oracle, and SQL engineers are the primary resources. The e-Business Integration Engineering group has developed boot camp classes for new hires and puts a high priority on training. We are also looking at opportunities for outsourcing the more standardized tasks to free up our experienced employees to work on the more creative design and engineering tasks.

Outsourcing

The key areas of opportunity being evaluated for outsourcing are application development and support, infrastructure engineering environment builds, production hosting, and first-, second-, and third-level support. Many of these opportunities can be outsourced within Intel, but there may be situations where it makes more sense to use external vendors.

Infrastructure Releases

Keeping the infrastructure current and consistent has always been a challenge in this fast-moving, ever-changing e-Business world. A year ago there were 75 different reference designs, many of which had minor service pack or version differences. Today, through a concerted effort to standardize and reduce the number of reference designs, the number is down to around 40 and is still going down. Scheduling an infrastructure release for upgrades to servers, operating systems, and databases has always been difficult because of the impact upgrades have on the business release schedule. To lessen the impact, infrastructure releases have been aligned with functional releases. This allows the infrastructure release to leverage the environment engineering resources and the QA test resources. This strategy has been successful, but it is sometimes viewed as unnecessary overhead and involves a risk to a business release. As a result, Intel's e-Business engineering group is looking into having one infrastructure release a year. The business is supporting this idea because it means a more stable and reliable environment.


RESULTS

Over the last three months that these improved release management processes have been in place, there have been no release misses in the B2B space, and the IT e-Business Integration Engineering group has delivered as required. The early engagement both with the PMO and the ITL has caused teams to plan ahead and adequately resource the releases. Time will tell, however, with respect to increasing the number of releases and whether automation and outsourcing can add further value.

In the next six months there will be approximately 12 releases involving approximately 21 applications in the e-Business B2B business area. The majority of these applications have been prioritized and resourced. This is twice as many releases as in the same period last year.

DISCUSSION

The processes and improvements that have been addressed in this paper are bearing fruit, but they are just the beginning of the change that needs to take place. The e-Business world is fast paced and ever changing and requires constant monitoring for new and better ways of doing business.

As we move forward with our releases, it is clear that there are limits to the number of servers and people that can be thrown at the problems. The key success factor in delivering on schedule is staying closely partnered with our customers. We all have to make sure that our resources are expended on the highest priorities. For Intel, the PMO forum is a major success factor and needs to be expanded to include all e-Business groups.

It will be difficult for the business to stand by the decision to dedicate one quarter to an infrastructure release. However, without it we stand the chance of falling behind on significant improvements to our ever-growing environment.

Finally, outsourcing, both internally and externally, may turn out to be a key opportunity to increase the release frequency without incurring a major increase in resources and hardware.

CONCLUSION

Intel's executive management has set the goal of having 100% of its customer business Internet enabled by early 2001. To meet this goal, it has been necessary to develop and deploy e-Business applications in the shortest possible time and with an acceptable level of quality. To achieve this, we are changing the paradigm for building, testing, and deploying systems. Business-focused teams; partnerships between business and engineering groups; automation of environment builds and application testing; outsourcing of suitable functions; and a flexible but managed release management process, where engineers and developers feel each other's pain when milestones are missed, are all essential to our success.

ACKNOWLEDGMENTS

Thanks to Jeff Engman for his significant efforts in driving change into the release process and steadfastly following the process.

Thanks to David Roff, who has helped drive the automation of the base builds.

Thanks also to Bruce Theurer, who has driven change into the release process from the user side. Bruce is responsible for both the PMO and the daily release management meetings.

AUTHOR'S BIOGRAPHY

Alan Hodgson graduated from John Dalton College, Manchester, England in 1971 with an equivalent B.S. degree in mathematics, statistics, and computing. He is a 30-year veteran in information systems and has seen the good, the bad, and the ugly in release management over the years. He specialized for many years in testing and joined Intel 3 1/2 years ago as the Test and Integration Manager for a large SAP project. Alan is currently an Engineering Manager in the IT e-Business Integration Engineering group. His e-mail is [email protected].

Copyright © Intel Corporation 2000. This publication was downloaded from http://developer.intel.com/.

Legal notices at http://developer.intel.com/sites/developer/tradmarx.htm.