Designed for uptime

Defining data center availability via a tier classification system

MIETEK GLINKOWSKI – All systems can fail; this is a simple fact that every industry must deal with. The paramount concern for the data center industry is the unbroken continuity of systems operations. Industry analysts estimate that a one-hour outage in a data center costs on average $350,000. And the cost is expected only to rise as more and more business enterprises depend on the storage, networking and processing of digital data, nearly all of it through or inside a data center. Since loss of service for a data center is so costly, even if only for an extremely short time, availability is still the most critical driver for data center design, operation and maintenance.



ABB review 4|13

Availability of the data center refers to meeting the uptime expectations of the users. The current high availability of data centers has been achieved mostly through redundancy in design, equipment (both IT equipment and power devices), electricity delivery paths and software ➔ 1. Several classification systems exist in the industry to define data center availability. Rapidly changing technologies, the desire of operators to differentiate themselves, environmental awareness and, foremost, cost pressures often dictate designs that either fall between tier structures or seek more radical departures. The tier structure from the Uptime Institute, though not always followed, is considered an important industry guideline and thus is the classification referenced in this article. The Uptime Institute defines a four-tier system, where each level describes the availability as a guideline for designing data center infrastructure ➔ 2. The higher the tier, the greater the availability.

The lowest-cost and lowest-performance data center, Tier I, has a target availability of 99.671 percent, which translates to 28.8 hours of annual IT downtime. The highest-level data center design, Tier IV, has a target of 99.995 percent availability, or 24 minutes of annual IT downtime. The different tier designs are also capable of accommodating different power load densities, from 200 W/m² to 1,500 W/m². For power engineers it is important to realize that the higher the tier, the higher the utility voltage supplied to the facility. This is predominantly because the availability of power within a power system generally increases from low-voltage (LV) area distribution to medium-voltage (MV) distribution to high-voltage (HV) transmission systems. The closer one is to the infinite bus of a large power system, the lower the likelihood of a disturbance or blackout.

Tier I

This architecture is the simplest and therefore offers the lowest availability and lowest IT load power density. This design concept is called N, reflecting the fact that “n” IT loads need “n” sets of UPS units and gensets. ➔ 3 identifies the basic components of a data center, as described below.

Utility source

The utility source component in a Tier I classification feeds an input transformer stepping down from MV to LV.

1 Reliability and availability

Reliability and availability are often misinterpreted and confused with the quality of a system or a product. Reliability is defined as a function of time:

R(t) = e^{-λt}

where R(t) is reliability, t is time, and λ = Tf/Tp is the failure rate. Tf is the total number of failures during the total period Tp. The longer the system operates, the lower the reliability. The parameter λ is the reciprocal of MTBF (mean time between failures). Mean time to repair (MTTR), which is the time needed to repair a failed system or device, is another important parameter. Used in combination, MTBF and MTTR determine the inherent availability (Ai) of a system or device:

Ai = MTBF / (MTBF + MTTR)

If one expands the concept of availability to include the scheduled maintenance downtime the availability changes to the operational availability, Ao.
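As a rough illustration of the two formulas above, the following Python sketch evaluates R(t) and Ai. The MTBF and MTTR figures are hypothetical values chosen only for illustration, and the function names are assumptions of this sketch, not part of any standard library.

```python
import math


def reliability(t_hours, mtbf_hours):
    """R(t) = exp(-lambda * t), with lambda = 1 / MTBF (constant failure rate)."""
    failure_rate = 1.0 / mtbf_hours
    return math.exp(-failure_rate * t_hours)


def inherent_availability(mtbf_hours, mttr_hours):
    """Ai = MTBF / (MTBF + MTTR); scheduled maintenance is not included."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical device data, for illustration only.
MTBF_HOURS = 50_000.0
MTTR_HOURS = 8.0

print(f"R(1 year) = {reliability(8760.0, MTBF_HOURS):.4f}")
print(f"Ai        = {inherent_availability(MTBF_HOURS, MTTR_HOURS):.6f}")
```

Note how the two quantities behave differently: reliability falls steadily with operating time, while inherent availability depends only on the ratio of repair time to time between failures.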

Reliability and availability are not fixed numbers. They are both functions of the specific components of the system as well as the system topology, which determines how critical these components are to the mission-critical function of the data center. Therefore, reliability has to be evaluated at the different points of the system where power is delivered to the IT load.

As mentioned, reliability and availability are not the same as quality. Quality refers to the condition of the new equipment when delivered to the customer, i.e., “out of the box.” Reliability, and therefore availability, is measured over a period of time and thus, besides quality, includes the effects of aging and of the stress level of the equipment within the system.

Increased reliability can be accomplished by redundancy (of equipment and delivery paths). However, the more equipment there is, the greater the likelihood that one or more of the components will fail. For any system design there is a balance between the level of redundancy and the associated complexity and reliability gains. Good system designs need to get the most out of the equipment, utilize its full potential and provide a sufficient level of redundancy and backup for reliable energy supply.
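The effect of redundancy on availability can be sketched with the usual series and parallel combination rules. The Python sketch below assumes independent failures and uses hypothetical component availabilities; it models a redundant UPS pair as "at least one of two works", which is a simplification of an N+1 arrangement.

```python
from math import prod


def series(availabilities):
    """All components must work, so the availabilities multiply."""
    return prod(availabilities)


def parallel(availabilities):
    """At least one component must work, so the unavailabilities multiply."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical component availabilities, for illustration only.
SWITCHGEAR = 0.9999
UPS = 0.999
PDU = 0.9999

single_ups_path = series([SWITCHGEAR, UPS, PDU])
redundant_ups_path = series([SWITCHGEAR, parallel([UPS, UPS]), PDU])

print(f"single UPS path:    {single_ups_path:.6f}")
print(f"redundant UPS path: {redundant_ups_path:.6f}")
```

Under these assumptions the redundant UPS pair improves the path, but the result stays below the availability of the non-redundant switchgear and PDU, which remain single points of failure.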

Title picture: What sort of designs do data centers follow in order to meet the high demands for availability?

Genset

A genset is an emergency power generator, typically with a diesel engine, that provides long-term power backup in the event of a utility outage. Long-term is defined by the amount of fuel stored in the tank and can vary from 24 to 72 hours. Having a high-priority fuel delivery contract can extend the time. The generator is a synchronous machine with a power rating from a few hundred kW to 2 to 3 MW.

Automatic transfer switching

Specialized automatic transfer switchgear (ATS) with control and protection logic allows a seamless switch of the source between the utility and the genset under a number of different conditions. Most of the time the switch from the utility to the generator is open-before-close, i.e., when the utility power is lost the utility breaker is opened and the genset breaker is closed only after the genset has started properly, reached the desired rpm and excitation, and is synchronized. The starting of the genset can take a few seconds. With multiple gensets this time can increase to up to a minute.

Uninterruptible power supplies

There are primarily three types of uninterruptible power supply (UPS) technologies: standby, line interactive and double conversion. By far the most popular is double conversion, where all the power flowing through the UPS is rectified from AC to DC and inverted back to AC, and is therefore fully conditioned and cleaned of all utility-side disturbances, transients, voltage sags and swells, and other power quality (PQ) effects. The DC bus in the middle is also connected to the battery bank, which, in the event of power loss, provides short-term power. The switch between the utility AC power and the internal battery power is seamless and instantaneous. The duration of short-term power is determined by the size of the battery bank and typically ranges from 2 to 3 min up to 7 to 10 min.

3 Tier I design (N). Components shown: utility feed, main switchgear, generator (GEN N), LV switchgear, UPS (N), switchgear, PDU, IT equipment (critical load) and mechanical load (cooling).

4 Tier II design (N+1). Components shown: utility feed, main switchgear, generators (GEN N, GEN +1), GEN switchgear, LV switchgear, UPS (N, +1), switchgear, PDU, IT equipment (critical load) and mechanical load (cooling).

2 Tier similarities and differences

                                Tier I      Tier II     Tier III             Tier IV
Number of delivery paths        Only 1      Only 1      1 active, 1 passive  2 active
Redundant components            N           N+1         N+1                  2(N+1) or S+S
Utility voltage                 208, 480 V  208, 480 V  12-15 kV             12-15 kV
Annual IT downtime due to site  28.8 hours  22.0 hours  1.6 hours            0.4 hours
Site availability               99.671%     99.749%     99.982%              99.995%

© The Uptime Institute
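The downtime row of the tier table follows directly from the availability row. As a minimal sketch, assuming a non-leap year of 8,760 hours, the conversion can be written in Python as follows (the function and constant names are illustrative assumptions):

```python
HOURS_PER_YEAR = 8760  # assumption: non-leap year


def annual_downtime_hours(availability_percent):
    """Convert a site availability in percent into hours of downtime per year."""
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR


# Site availability targets taken from the tier table.
TIER_AVAILABILITY = {
    "Tier I": 99.671,
    "Tier II": 99.749,
    "Tier III": 99.982,
    "Tier IV": 99.995,
}

for tier, percent in TIER_AVAILABILITY.items():
    hours = annual_downtime_hours(percent)
    print(f"{tier}: {hours:.1f} hours/year ({hours * 60:.0f} min)")
```

Rounded to one decimal place, the results match the table (28.8, 22.0, 1.6 and 0.4 hours). For Tier IV the unrounded value is about 26 minutes; the 24 minutes quoted in the text corresponds to the rounded 0.4 hours.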


Switchgear

A variety of switchgear is needed in data centers to distribute the power to the many different rows of IT equipment (critical loads) as well as to cooling equipment (pumps, fans, valves, compressors, etc.) and other auxiliary loads. The circuit breakers in the switchgear also provide protection against faults and other abnormal conditions. In a Tier I facility all of the switchgear is low voltage (less than ~1 kV).

Power distribution unit

Power distribution units (PDUs) comprise circuit breakers, metering units and, in North America, LV transformers. They further distribute the power to the IT racks, provide protection and measure the power (voltage and current) delivered to the individual loads.

Power supply units

Power supply units (PSUs) are part of the IT equipment. Similar to the power supply of a desktop computer, these units transform the 220 V or 110 V input power to the DC voltage distributed to the various IT equipment: servers, network and storage systems. The most popular PSUs are transformer-less switched-mode power supplies (SMPS). Due to the redundancy of the power distribution for Tier III and IV, more and more PSUs are now provided with dual AC inputs and can function from either of the two.

Tier II

This design is known as N+1 ➔ 4. The primary difference between a Tier I and Tier II classification is the presence of an additional genset and UPS. This provides some degree of device redundancy for the most critical components of the system for short-term and long-term backup. All other components of the system are basically the same. Even with this redundancy there are still several single points of failure in the path delivering power to the IT load.

Tier III

Tier III is referred to as an active-passive system ➔ 5. In a Tier III classification the power delivery path has to be doubled. Besides the redundant critical components there has to be a second path parallel to the critical IT load in case the primary path has failed. This second path could be passive, i.e., used only in case of emergency. A Tier III classification also requires a second utility connection. The addition of the passive delivery path significantly raises the cost of the entire system and also complicates the control, coordination, maintenance, etc. There is also additional switchgear and an additional motor control center (MCC), which should allow the full operation of the data center from the passive path. The IT equipment can now take full advantage of the dual supply paths and therefore utilize dual PSUs for each server, for example. As a result the number of single points of failure is significantly reduced. However, the passive delivery path does not require a UPS, so during emergency conditions the system is directly exposed to the utility and therefore to utility power quality issues or even power outages.

5 Tier III active-passive design; no UPS in the passive path. Components shown: two utility feeds (active and passive), each with main switchgear, generator (GEN), GEN switchgear, LV switchgear, switchgear and PDU; UPS (N, +1) in the active path; IT equipment (critical load) and mechanical load (cooling).

Tier IV

Referred to as a 2N+1 system, the Tier IV classification is also considered the Cadillac of data center design ➔ 6. A relatively small number of data centers in the world are certified as Tier IV designs. They are fully redundant, complete dual systems running actively in parallel. By virtue of the redundancy, each path has to be rated for 100 percent of the load, and therefore the utilization of each path under normal operating conditions is at most 50 percent. In addition, some Tier IV designs will have N+1 UPSs and gensets in each path, further increasing the complexity and cost but at the same time gaining a valuable fraction of a percent (0.01 percent to be exact) of availability. The target for Tier IV availability is to allow a maximum of 24 min per year of site-caused end-user downtime (representing one failure every five years).

Changes to come

Tier structure availability and downtime are not the only factors to consider. The impact of interruptions on the operation of a mission-critical facility can vary. For example, during one year, 10 short power interruptions at the server power supply lasting 50 ms each will have a much more detrimental impact on the operation of the servers than one longer interruption of 500 ms during the same period of time. Although both will result in the same annual availability (a total of 0.5 s of lost power), the first will cause the servers to reboot and possibly lose some data 10 times during the year; the second will result in only one reboot a year.

Highly skilled engineering resources are needed to design, implement and optimize the entire data center ecosystem for availability and reliability. The traditional way of thinking about availability and reliability is changing rapidly. Increased system voltages, more sophisticated switching schemes, wider operating regimes for IT equipment and, foremost, the advent of failure-resilient software and cloud computing introduce new dimensions to data center reliability. So, stay tuned.

Mietek Glinkowski
ABB Data Centers
Raleigh, NC, United States
[email protected]

6 Tier IV design 2N+1; two simultaneously active paths. Components shown: two utility feeds (A and B), each with main switchgear, generators (GEN N, GEN +1), GEN switchgear, LV switchgear, UPS (N, +1), switchgear and PDU (A and B); IT equipment (critical load) and mechanical load (cooling).