1 14062 Understanding the Cost of Data Center Downtime ENP


  • 8/2/2019 1 14062 Understanding the Cost of Data Center Downtime ENP


    Executive Summary

    Over the course of the past decade, enterprise business has fundamentally changed. Among the many changes experienced, none has been more profound than the increase in reliance on information technology (IT) systems to support business-critical applications. For many of today's enterprises (including banks, telecommunications companies, internet service providers and cloud/co-location facilities), data center throughput has evolved into a monetized commodity. No longer simply supporting the internal needs of the organization, data center availability has become essential to many companies whose customers pay a premium for access to a variety of IT applications.

    This unprecedented reliance on IT systems has forged an even stronger connection between data center availability and total cost of ownership (TCO). A single downtime event now has the potential to significantly impact the profitability (and, in extreme cases, the viability) of an enterprise. Unfortunately, a severe disconnect exists between IT personnel and their C-suite counterparts with regard to understanding the frequency and the cost of data center downtime.

    Recognizing the need to address these misconceptions, Emerson Network Power partnered with the Ponemon Institute to conduct two in-depth studies on the perceptions, causes and true monetary costs of data center downtime (totaling thousands of dollars per minute on average), as well as which infrastructure vulnerabilities have the most significant and costly impact on the availability of critical IT systems (see National Survey on Unplanned Data Center Outages and The Cost of Data Center Outages).

    In addition to examining the differing perceptions between the C-suite and IT staff, this white paper takes a detailed look at the potential bottom-line costs of data center downtime and examines how power, cooling, monitoring and service inadequacies can contribute to a facility's risk of downtime. It explores specific data center infrastructure vulnerabilities and associated downtime costs, as well as recommendations for fortifying these infrastructures to minimize downtime and achieve the highest possible return on investment (ROI). Finally, it offers a long-term business case for addressing these critical vulnerabilities as well as factors CIOs and IT personnel should consider when prioritizing their actions and investments.


    Introduction:

    Downtime Perceptions vs. Realities

    Since the dot-com boom (and subsequent bust) of the late '90s and early 2000s, IT networks and data center systems have experienced a resurgence in the central role they play in revenue generation and business growth. From streamlining customer service and networking to facilitating a variety of e-commerce and enterprise IT services, data centers have evolved into business foundations for companies in a wide range of industries. Furthermore, as IT services become increasingly commoditized (via co-location, disaster recovery and cloud computing services), the economic impact of data center operations will continue to grow at an unprecedented rate.

    However, even though more enterprises depend on their data centers to support business-critical applications than ever before, significant infrastructure vulnerabilities and misperceptions about the frequency and cost of IT failures have put many companies at increased risk for costly downtime events.

    According to a September 2010 Ponemon Institute study commissioned by Emerson Network Power, misconceptions about the frequency and impact of data center downtime have become commonplace in businesses across the United States. The survey of more than 400 data center and IT operations professionals revealed a widening disconnect in perceptions being perpetuated between the C-suite and rank-and-file IT staff:

    Seventy-one percent of senior-level respondents believe their company's business model is dependent on its data center to generate revenue and/or conduct e-commerce. Only 58 percent of rank-and-file respondents shared this belief.

    Though respondents experienced an average of two downtime events over the two-year period studied (lasting up to 120 minutes apiece, on average), 62 percent of senior-level respondents agreed that unplanned outages did not happen frequently. Forty-one percent of rank-and-file respondents also agreed with this statement.

    Seventy-five percent of senior-level respondents feel their companies' senior management fully supports efforts to prevent and manage unplanned outages, while just 31 percent of supervisor-level employees and below agreed with this statement.

    Fewer than 32 percent of all respondents agreed their company utilizes all best practices to maximize availability of critical IT equipment (40 percent at the executive level; 29 percent at the rank-and-file level).

    Based on these findings, it is clear that executive-level respondents are extremely cognizant of the economic importance of their company's data center operations. This is not surprising, as the core responsibility for senior management and C-level executives (including Chief Information Officers) is to understand how all facets of the business contribute to a company's growth and performance.

    Survey responses also indicated that most of these executives are not as in tune with day-to-day data center operations as the rank-and-file employees specifically charged with maintaining the company's IT infrastructure. As such, many of the executives surveyed are not as aware of the frequency of downtime events and the vulnerabilities in their data center infrastructures that are contributing to these events.


    Conversely, rank-and-file IT staff are more aware of the frequency of system failures and specific vulnerabilities in their companies' data center infrastructures than their executive-level counterparts. However, fewer rank-and-file respondents actively acknowledge the role of their companies' data center operations in generating revenue and/or facilitating e-commerce activity.

    On the surface, these findings may appear to be benign examples of how siloed work groups can promote disconnects in how common issues are perceived. However, for companies whose profitability is directly tied to the availability of enterprise IT operations, such disconnects can dramatically increase the risk to the profitability, and potentially the viability, of a business.

    By bridging the perception gap between C-suite executives and rank-and-file IT staff, companies will be better positioned to maximize the availability of critical IT applications without overly inflating a data center's total cost of ownership. In addition to ensuring the entire organization has an accurate perception of the state of its data center infrastructure, it is critical that employees at all levels of the organization have a thorough understanding of the true financial implications of downtime.

    These alarming misperceptions about the frequency and impact of data center downtime events triggered the commission of a second study to determine and benchmark the average cost of data center downtime in the United States.

    Methodology:

    Benchmarking the Cost of Downtime

    Data center professionals from 41 independent facilities across the country, spanning a variety of organizational responsibilities, were asked to participate in the study. Participating data centers represented a wide variety of industry segments, including financial services, telecommunications, retail (conventional and e-commerce), health care, government and third-party IT services. To ensure that costs were representative of an average enterprise data center, participating data centers were required to have a minimum floor area of 2,500 square feet.

    Figure 1. Distribution of participating organizations by industry segment.

    [Pie chart. Segments shown: transportation, defense, communications, hospitality, media, conventional retail, technology & software, education, e-commerce retail, colocation services, financial services, healthcare, consumer products, public sector, industrial, services. Individual shares range from 2% to 12%.]


    Representatives from all levels of the IT staff were asked to participate in the study, including:

    Facility Managers

    Chief Information Officers

    Data Center Management Personnel

    Chief Information Security Officers

    IT Compliance Leaders

    To calculate the comprehensive cost of data center downtime, researchers used an activity-based costing model that took into consideration direct, indirect and opportunity costs. As shown in Figure 2, costs were categorized according to internal activity centers and external cost consequences.

    Respondents provided direct, indirect and opportunity cost estimates (separately) for a single recent outage based on provided range variables. To ensure that the reported losses included in the study were as comprehensive as possible, follow-up interviews also were conducted to obtain additional information about further revenue losses resulting from data center outages.

    Quantifying the Cost of Downtime

    The study, completed in 2011, uncovered a number of key findings related to the cost of downtime. Based on cost estimates provided by survey respondents, the average cost of data center downtime was approximately $5,600 per minute.

    Based on an average reported incident length of 90 minutes, the average cost of a single downtime event was approximately $505,500. These costs are based on a variety of factors, including but not limited to data loss or corruption, productivity losses, equipment damage, root-cause detection and recovery actions, legal and regulatory repercussions, revenue loss and long-term repercussions on reputation and trust among key stakeholders.
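    As a rough cross-check, the per-event figure can be approximated as the per-minute cost multiplied by the incident length. The sketch below uses the rounded values quoted in this section; the function name is illustrative only, and the small gap to the reported $505,500 presumably reflects rounding of the per-minute figure.

```python
def downtime_event_cost(cost_per_minute: float, minutes: float) -> float:
    """Approximate the cost of one outage as rate x duration."""
    return cost_per_minute * minutes


# Rounded values quoted in the study summary above.
estimate = downtime_event_cost(cost_per_minute=5_600, minutes=90)
print(f"Estimated cost per event: ${estimate:,.0f}")
```

    The same function can be reused with a facility's own per-minute estimate and typical outage length to obtain a first-order exposure figure.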

    Though direct costs accounted for nearly one third of all costs reported, indirect and opportunity costs (significantly more difficult to perceive for rank-and-file staff) proved to be significantly more costly, accounting for more than 62 percent of all costs resulting from data center downtime.

    Figure 2. Activity-based cost framework.

    [Diagram of the activity-based costing model. Activity centers: detection, containment, recovery, ex-post response. Cost types: direct costs, indirect costs, opportunity costs. Cost consequences: equipment, IT productivity, user productivity, third parties, lost revenue, business disruption.]

    While business disruption and lost revenue were cited as the most significant cost consequences of downtime, other less obvious costs such as losses in end-user and IT productivity also had a significant impact on the cost of an average downtime event (Figure 3).

    Surprisingly, equipment costs were among the lowest costs reported for a downtime event, averaging approximately $9,000 per incident. This means that the residual, downstream effects of a data center outage often are far more costly than the costs to detect and remedy the root cause of an outage after it has already occurred.

    When considering that the typical data center in the United States experiences an average of two downtime events¹ over the course of two years, the costs of downtime for an average data center easily can surpass $1 million in less than two years' time.

    For enterprises with revenue models that depend solely on the data center's ability to deliver IT and networking services to customers (such as telecommunications service providers and e-commerce companies), downtime can be particularly costly, with the highest cost of a single event topping $1 million (more than $11,000 per minute).

    In total, the cost of the most recent downtime events for the 41 participating data centers totaled $20,735,602.
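    The totals quoted above can be tied back to the per-event average with simple arithmetic. The following sketch assumes one reported event per participating facility (the study asked about each facility's most recent outage); the variable names are illustrative.

```python
TOTAL_REPORTED_COST = 20_735_602  # sum over the most recent outage at each facility
FACILITIES = 41                   # participating data centers
EVENTS_PER_TWO_YEARS = 2          # average frequency cited in the text

# Mean cost of a single event, assuming one reported event per facility.
mean_cost_per_event = TOTAL_REPORTED_COST / FACILITIES

# Exposure of an "average" data center over two years.
two_year_exposure = EVENTS_PER_TWO_YEARS * mean_cost_per_event

print(f"Mean cost per event: ${mean_cost_per_event:,.0f}")
print(f"Two-year exposure:   ${two_year_exposure:,.0f}")
```

    The mean lands within a fraction of a percent of the roughly $505,500 average quoted earlier, and two such events exceed the $1 million mark.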

    Other key findings from the study included:

    Total cost of both partial and total unplanned outages can be a significant expense for organizations (approximately $258,000 and $680,000 per event on average, respectively).

    The average recovery time from a total outage was more than twice that of a partial outage (134 and 59 minutes, respectively).

    Total cost of outages is systematically related to the duration of the outage and the size of the data center.

    The leading (and most costly) root causes of downtime reported by respondents were directly related to vulnerabilities in the data center's power and cooling infrastructures.

    ¹ Downtime events are not limited to total data center outages. Rack- and row-level outages also are factored in to this aggregate, as well as associated downtime costs.


    Figure 3. Average cost of unplanned data center outages for nine categories.

    Business disruption       $179,827
    Lost revenue              $118,080
    End-user productivity      $96,226
    IT productivity            $42,530
    Detection                  $22,347
    Recovery                   $20,884
    Ex-post activities          $9,537
    Equipment costs             $9,063
    Third parties               $7,008
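    The category averages in Figure 3 can be tallied to see how an average event's cost is distributed. In the sketch below, the pairing of dollar amounts with categories follows the order in which both appear in the figure, which is an assumption about the chart's layout; the dictionary name is illustrative.

```python
# Average cost per outage by category (USD), paired in figure order (assumed).
figure3_costs = {
    "Business disruption": 179_827,
    "Lost revenue": 118_080,
    "End-user productivity": 96_226,
    "IT productivity": 42_530,
    "Detection": 22_347,
    "Recovery": 20_884,
    "Ex-post activities": 9_537,
    "Equipment costs": 9_063,
    "Third parties": 7_008,
}

total = sum(figure3_costs.values())

# Print each category with its share of the total.
for category, cost in figure3_costs.items():
    print(f"{category:<24} ${cost:>8,}  {cost / total:6.1%}")
print(f"{'Total':<24} ${total:>8,}")
```

    The nine categories sum to $505,502, which agrees with the approximately $505,500 average cost per event reported in the text.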


    The Cost of Infrastructure Vulnerability

    In addition to revenue costs associated with downtime events, a variety of costs are directly associated with the response activities necessary for restoring service and identifying and addressing the root cause(s) of the outage. As such, respondents were asked to cite the specific root cause(s) of the most recent outage at their organization as well as all costs associated with identifying and remedying the root cause to restore data center operations.

    As evidenced by Figure 4, while a variety of root causes were cited by survey respondents, including UPS system failure (battery), water incursion and IT equipment failures, the majority of root causes can be attributed to vulnerabilities in the data center's power and cooling infrastructure. These root causes closely mirror those identified by respondents to the initial Ponemon Institute study.

    As explored in the Emerson Network Power white paper Addressing the Leading Root Causes of Downtime, many of the leading root causes of downtime can be attributed to a variety of factors, chief among them being the need to get more from less. As demands to increase performance and efficiency grew amidst the recent national economic recession, data center managers began implementing design strategies that achieved these gains at the cost of exposing critical vulnerabilities in their infrastructures.

    Fortunately, the risk of many of the leading root causes of downtime can be minimized by observing best practices in infrastructure design and system redundancy, as well as implementing a comprehensive preventive service and maintenance regimen.

    In the following sections, this paper will further examine the costs incurred by vulnerabilities in respondents' power and cooling infrastructures as well as actions and best practices that can be implemented to minimize recovery costs as well as the overall risk of downtime².

    Power-Related Outages

    According to survey respondents, more than 39 percent of data center outages reported were attributed directly to vulnerabilities in the data center's power infrastructure. Among the general root causes of downtime related to power, UPS-related failures (including batteries) proved to be the most costly ($687,700), followed by generator failures ($463,890).

    One of the primary reasons power vulnerabilities are so costly for data centers is that a failure in the power infrastructure will likely result in a catastrophic, total unplanned outage. This means that in addition to any direct costs incurred to remedy the cause of the outage, indirect and opportunity costs also will be significant because all stakeholders will be affected by the outage.

    ² NOTE: For detailed recommendations for fortifying data center infrastructures against the most common root causes of downtime, please refer to the companion white paper Addressing the Leading Root Causes of Downtime: Technology Investments and Best Practices for Assuring Data Center Availability.


    Figure 4. Primary root causes of reported unplanned outages.

    UPS system failure (battery)    29%
    Accidental/human error          24%
    Water, heat or CRAC failure     15%
    Weather related                 12%
    Generator failure               10%
    IT equipment failure             5%
    Other                            5%


    By definition, Tier I and II data center facilities are not equipped with the technologies needed to isolate a power system failure, such as redundancy, dual power paths and static switches. As a result, the availability of these data centers' power infrastructures is wholly dependent on the integrity of the facility's single backup system.

    Because Tier I and II data centers can do relatively little to prevent the indirect and opportunity costs incurred by a total data center outage caused by a power failure, making investments that minimize the impact of a power system failure on data center operations is strongly recommended. One of the best ways to do this is to ensure that all power systems are backed by an adequate level of redundancy.

    Implementing redundancy allows facility managers to eliminate single points of failure in their power infrastructures. Because there is always a possibility of equipment failure over time, redundancy ensures that a backup is always in place. While direct costs would still be incurred to repair or replace the failed module, the equipment failure would not have a catastrophic impact on data center availability, and thus the organization would not incur the substantial indirect and opportunity costs associated with a total unplanned outage.

    When adding a UPS for redundancy or replacing an existing or failed module, the long-term reliability of the solution should be the highest priority. Some UPS systems, including the Liebert NXL, also are capable of achieving superior performance and availability through redundant components, a reduced number of components, fault tolerances for input currents and integrated battery monitoring capabilities.

    In addition to establishing redundancy in the power infrastructure, adequate service and maintenance for critical power systems can play a significant role in minimizing the risk of power equipment failure. In fact, even a single annual preventive maintenance visit can increase the mean time between failures (MTBF) of a UPS unit more than tenfold.

    Finally, the implementation of comprehensive infrastructure monitoring and management tools such as Liebert Nform, Liebert SiteScan and Alber Battery Monitoring also can minimize the activity costs intrinsic to detecting and recovering from power system failures. Integrating a comprehensive monitoring solution, including battery and branch circuit monitoring, allows IT staff to quickly identify, isolate and address power equipment issues.


    Figure 5. Average total cost by root cause of the unplanned outage.

    IT equipment failure            $750,326
    UPS system failure (battery)    $687,700
    Other root causes               $612,993
    Water, heat or CRAC failure     $489,100
    Generator failure               $463,890
    Weather related                 $395,065
    Accidental/human error          $298,099


    Environmental-Related Outages

    Along with vulnerabilities in the power infrastructure, environmental vulnerabilities also accounted for a noteworthy portion of the root causes cited by survey respondents. Fifteen percent of all root causes were directly attributed to thermal issues, including water incursion and IT equipment failures related to heat density and cooling capacity. The costs associated with detecting and recovering from these failures also were significant, at more than $489,000 per incident.

    Environmental issues also are a leading cause of IT equipment failures. In fact, though IT equipment failures only accounted for five percent of root causes cited by survey respondents, these failures incurred the highest overall cost: more than $750,000.

    In many cases, a single failure can cause a chain reaction of IT equipment failures, requiring extensive detection and recovery efforts to identify the root cause in addition to the replacement of affected IT equipment. For example, a chilled water leak in the data center's in-row cooling system can cause the failure of sensitive IT equipment. In addition to identifying and remedying the cooling issue that caused the outage, servers and other damaged IT equipment will need to be replaced.

    Also, it is critical to point out that cooling equipment does not need to fail to cause an IT equipment failure. In fact, these failures (typically caused by high heat densities and hot spots within the rack) frequently occur as a result of an inadequate cooling infrastructure rather than a cooling equipment failure. This further reinforces the importance of an optimized cooling infrastructure.

    While some outages relating to the data center's cooling infrastructure may be more isolated than power-related failures (contributing to both total and partial data center outages), a comprehensive cooling infrastructure remains critical to minimizing downtime events and their associated costs. This is particularly true considering the many connections between a data center's cooling infrastructure and the viability of critical IT equipment, where cooling systems do not need to fail to cause catastrophic failures and damage sensitive and costly equipment.

    Fortunately, there are a number of best practices and investments that can be made to a data center's cooling infrastructure to minimize the risk of catastrophic equipment failures and associated downtime events. Many of these best practices are explored in the white paper Addressing the Leading Root Causes of Downtime, including:

    Minimizing the risk of water incursion through the use of refrigerant-based cooling instead of water-based solutions.

    Eliminating hot spots and high heat densities by bringing precision cooling closer to the load via row-based precision cooling solutions.

    Installing robust monitoring and management solutions with remote monitoring functionality.

    Fortifying cooling and IT equipment investments with regular preventive maintenance and service visits.

    While these recommendations embody many of the best practices for maximizing the availability, effectiveness and efficiency of the data center's cooling infrastructure, some vendors, including Emerson Network Power, now offer facility managers the ability to implement an integrated solution optimized for efficient, high-availability power and cooling performance. These solutions offer all of the aforementioned design best practices, some with the additional benefit of rapid deployment for data center expansion or disaster recovery.

    These integrated solutions also offer the added benefit of efficient precision cooling through cold-aisle containment (see Figure 6), maximizing the effectiveness of the integrated cooling solution. These characteristics play a critical role in focusing cooling based on the real-time needs of the equipment housed within the racks, minimizing the risk of hot spots and other faults common in high-density computing environments while operating at a high level of efficiency.

    Making the Business Case for Infrastructure Optimization³

    As detailed in the preceding sections, vulnerabilities in a data center's infrastructure can have a dramatic impact on a facility's susceptibility to costly downtime events totaling hundreds of thousands of dollars. However, as this paper has demonstrated, only 29 percent of rank-and-file IT staff members believe that their companies have implemented the technologies and best practices required to minimize the occurrence and impact of data center downtime.

    This disconnect begs the obvious question: If executives understand the role of their data centers in generating revenue and sustaining their respective business models, why have many hesitated to make the necessary investments required to fortify their infrastructures against downtime? The likely answer is that, prior to quantifying the cost of data center downtime, most executives could not recognize how downtime prevention speeds the ROI of their infrastructure investments.

    As evidenced by the findings of the Ponemon Institute, downtime can result in a variety of long-term recurring costs, which include direct costs associated with identifying and addressing root causes, as well as indirect costs associated with disrupting business-critical operations. While minimizing the risk of downtime events and their overall financial impact may necessitate a significant up-front CAPEX investment, when considering the reductions in direct and indirect downtime costs as well as savings gleaned from increases in efficiency that reduce OPEX, select investments can actually speed a business's time-to-ROI while reducing a data center's total cost of ownership over time.

    Figure 6. Data center solutions to optimize precision cooling, like SmartAisle from Emerson Network Power, address specific needs with rapidly deployable solutions that cost-effectively add data center capacity, improve IT control and increase efficiency.

    ³ NOTE: Though based on real-world scenarios, the costs detailed in this analysis are approximations of market costs for a reference model data center (presented in Appendix A). To obtain a detailed estimate for optimizing your specific data center infrastructure in accordance with the recommendations below, please contact your Emerson Network Power representative.

    To emphasize this point, one needs only to compare the cost of infrastructure optimization to the average cost and occurrence of downtime over time. It is important to first understand how the cost of downtime impacts the speed to ROI for data center infrastructure investments.

    Power Infrastructure Optimization

    First, consider that a typical unoptimized enterprise data center experiences an average of ten downtime events over a period of ten years, spanning a variety of root causes. At an average per-event cost of just over $500,000 (including direct costs, indirect costs and opportunity costs), a typical enterprise data center can incur more than $5 million in downtime costs during this time.

    UPS system failures accounted for 29 percent of data center outages reported by survey respondents. Extrapolated over ten years, these data centers can expect to incur approximately three downtime events related to UPS system failure, at a total cost in excess of $2 million.

    Compare this figure to the approximate costs associated with adding UPS redundancy to a 2,500-square-foot data center with 105 high-density racks (1,000 servers) and a facility power draw of approximately 1,200 kW. Adding UPS redundancy to a data center of this size would likely require an initial capital investment of approximately $250,000 and an annual investment of up to $15,000 for two annual preventive service visits (increasing the MTBF for UPS systems by up to 23 times).

    Based on these numbers, when extrapolating these investments over ten years, the total investment in strengthening this data center's UPS systems infrastructure would be approximately $400,000. Compared to the average total cost of downtime events caused by a UPS system failure as reported by respondents ($687,700), ROI is easily achieved through the prevention of a single UPS-related downtime event. Furthermore, over a period of ten years, ROI can be achieved threefold in potential downtime costs alone, not considering efficiency gains and the OPEX savings associated with fewer reactive service visits.
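    The power-infrastructure comparison above reduces to a few lines of arithmetic. The sketch below uses only figures quoted in this section; rounding 2.9 expected UPS events up to three mirrors the text, and the variable names are illustrative rather than part of the study.

```python
# Investment side (reference model data center).
UPS_CAPEX = 250_000        # initial capital investment for UPS redundancy
ANNUAL_SERVICE = 15_000    # two preventive service visits per year (upper bound)
YEARS = 10

# Downtime side (survey figures).
EVENTS_PER_DECADE = 10     # typical unoptimized enterprise data center
UPS_SHARE_PCT = 29         # percent of outages attributed to UPS failure
UPS_EVENT_COST = 687_700   # average cost of a UPS-related outage

ten_year_investment = UPS_CAPEX + ANNUAL_SERVICE * YEARS

# 2.9 expected events, rounded to whole events as in the text.
expected_ups_events = round(EVENTS_PER_DECADE * UPS_SHARE_PCT / 100)
potential_downtime_cost = expected_ups_events * UPS_EVENT_COST

print(f"Ten-year investment:      ${ten_year_investment:,}")
print(f"Expected UPS events:      {expected_ups_events}")
print(f"Potential downtime cost:  ${potential_downtime_cost:,}")
print(f"One event covers the investment: {UPS_EVENT_COST > ten_year_investment}")
```

    Under these assumptions, each of the three expected UPS-related events would on its own cost more than the full ten-year investment.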

    Cooling Infrastructure Optimization

    A similar analysis can be conducted with regard to the optimization of a data center's cooling infrastructure. Data center outages related to failures or inadequacies of critical cooling systems accounted for approximately 20 percent of reported outages, including IT equipment failures. Collectively, the average cost of these root causes was approximately $554,000. This means that if an average data center experiences ten downtime events over a period of ten years, an average of two events (with an average total cost of more than $1.1 million in downtime costs) will be related to vulnerabilities in the data center's cooling infrastructure.
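    One way to arrive at the combined cooling-related figures is to weight each root cause's average cost by its share of reported outages. The sketch below assumes that weighting (the paper does not state how the average was derived) and uses the shares and costs reported for thermal failures and IT equipment failures; the names are illustrative.

```python
# root cause -> (percent of reported outages, average cost per event in USD)
cooling_related = {
    "Water, heat or CRAC failure": (15, 489_100),
    "IT equipment failure": (5, 750_326),
}

# Combined share of outages and share-weighted average cost.
share_pct = sum(pct for pct, _ in cooling_related.values())
weighted_cost = sum(pct * cost for pct, cost in cooling_related.values()) / share_pct

# Extrapolate to a facility with ten downtime events per decade.
EVENTS_PER_DECADE = 10
expected_events = EVENTS_PER_DECADE * share_pct / 100
expected_cost = expected_events * weighted_cost

print(f"Share of outages:       {share_pct}%")
print(f"Weighted average cost:  ${weighted_cost:,.0f}")
print(f"Expected events:        {expected_events:.0f}")
print(f"Expected ten-year cost: ${expected_cost:,.0f}")
```

    With these inputs the weighted average comes to roughly $554,000 per event and about $1.1 million over ten years, in line with the figures in the text.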

    To contrast these costs with the cost of infrastructure optimization, one can revisit the aforementioned model data center. In this case, the model data center is assumed to rely on eight chilled-water based cooling solutions servicing load from the data center's IT equipment, UPS and PDU systems, as well as building egress and human load.

    Based on these parameters, it is strongly recommended that data center managers invest in an assessment of their data center space. These services can range from a data center audit performed by a trained service representative (often free as part of an existing service agreement) to a more comprehensive thermal assessment complete with CFD modeling (approximately $12,000 for the baseline data center in Appendix A), which unveils a clear picture of vulnerabilities in a data center's cooling infrastructure and areas where significant efficiency gains can be achieved through cooling optimization. Often, such assessments conclude that additional equipment investments can be postponed by optimizing the configuration of cooling systems, racks and IT equipment.

    By optimizing a data center's existing cooling infrastructure via a cold-aisle containment strategy (costing as little as approximately $15,000 for a partitioned containment solution), data center managers can dramatically enhance the effectiveness of their cooling equipment with the added benefit of significant energy savings. The addition of intelligent controls (Liebert iCOM) and remote monitoring to a contained infrastructure (approximately $80,000 for the baseline data center presented in Appendix A) can further enhance cooling efficiency by at least 12 percent and ensure that all IT equipment is being adequately and precisely cooled based on real-time heat densities (see Figure 7). Finally, investing in ongoing preventive maintenance and service for the equipment (an approximate annual investment of $2,000) and installation of a comprehensive leak detection solution for all cooling units (approximately $5,000) is recommended.


    Figure 7. Dynamic control provides an additional 15 percent increase in total system efficiency over cold aisle containment alone.

                      Conventional        With Cold Aisle       With CAC and
                      Cooling Approach    Containment (CAC)     Intelligent Control
    Compressor        69.7%               50.9%                 50.4%
    Condenser         9.3%                9.3%                  9.3%
    Evaporator Fan    21.0%               18.5%                 7.2%
    Total             100%                78.7%                 66.9%
    Savings           -                   21%                   33%

    [Three diagram panels, each showing a precision cooling unit. Temperature labels in extraction order: 75°F, 89°F, 54°F, 54°F; 85°F, 94°F, 62°F, 62°F; 92°F, 97°F, 62°F, 62°F.]
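
    The relationship between the 12 percent figure in the text and the 15 percent figure in the Figure 7 caption can be checked with a few lines of arithmetic. The Python sketch below assumes (the extracted figure does not state this explicitly) that the 78.7 percent total belongs to cold aisle containment alone and the 66.9 percent total to containment plus intelligent control. The dictionary keys and the function name are illustrative only.

```python
# Component power from Figure 7, as a percentage of the conventional total.
# ASSUMPTION: the 78.7% column is CAC alone and the 66.9% column is
# CAC plus intelligent control; the extracted figure is ambiguous on this.
COMPONENT_POWER = {
    "conventional": {"compressor": 69.7, "condenser": 9.3, "evaporator_fan": 21.0},
    "cac": {"compressor": 50.9, "condenser": 9.3, "evaporator_fan": 18.5},
    "cac_intelligent_control": {"compressor": 50.4, "condenser": 9.3, "evaporator_fan": 7.2},
}


def total_power(config: str) -> float:
    """Sum the component percentages for one cooling configuration."""
    return sum(COMPONENT_POWER[config].values())


baseline = total_power("conventional")
cac = total_power("cac")
cac_ic = total_power("cac_intelligent_control")

# Savings relative to the conventional approach.
savings_cac = 1 - cac / baseline
savings_cac_ic = 1 - cac_ic / baseline

# Extra gain from intelligent control, expressed two ways.
points_gained = cac - cac_ic          # percentage points of the conventional total
relative_gain = points_gained / cac   # relative to containment alone

print(f"CAC savings:            {savings_cac:.0%}")
print(f"CAC + control savings:  {savings_cac_ic:.0%}")
print(f"Extra gain: {points_gained:.1f} points, {relative_gain:.0%} relative to CAC alone")
```

    Under that assumption, the two figures describe the same improvement: roughly 12 percentage points of the conventional total, or about 15 percent relative to containment alone.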

    Over ten years, the total investment in strengthening this data center's cooling infrastructure would be approximately $135,000 ($115,000 in year one). Compared to the average total cost of a single downtime event caused by IT systems failure or thermal-related outages as reported by respondents ($554,000), these investments can easily be justified if they prevent even a single thermal-related downtime event.

    Furthermore, as in the case of power infrastructure optimization, over a period of ten years, ROI can be achieved several times over when considering potential downtime costs as well as significant gains in energy efficiency, cutting cooling-related energy usage by as much as 33 percent.
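
    The cooling investment totals above can be approximated by adding up the line items quoted earlier in this section. The following Python sketch does only that; the dictionary labels and the helper name are illustrative, and the assumption that the $2,000 service cost recurs unchanged every year is a simplification.

```python
# Approximate one-time cooling investments quoted in the text (USD).
ONE_TIME_COSTS = {
    "thermal assessment with CFD modeling": 12_000,
    "cold-aisle containment (partitioned)": 15_000,
    "intelligent controls and remote monitoring": 80_000,
    "leak detection for all cooling units": 5_000,
}
ANNUAL_SERVICE_COST = 2_000   # preventive maintenance and service, per year
SINGLE_EVENT_COST = 554_000   # reported average cost of one IT/thermal-related outage


def cumulative_cooling_investment(years: int) -> int:
    """One-time costs plus `years` of annual service (hypothetical helper)."""
    return sum(ONE_TIME_COSTS.values()) + ANNUAL_SERVICE_COST * years


year_one = cumulative_cooling_investment(1)
ten_year = cumulative_cooling_investment(10)
share_of_one_event = ten_year / SINGLE_EVENT_COST

print(f"Year one:  ${year_one:,}")
print(f"Ten years: ${ten_year:,}")
print(f"Ten-year investment as a share of one outage: {share_of_one_event:.0%}")
```

    The itemized sums come out slightly below the paper's rounded figures of $115,000 and $135,000, which is consistent with those figures being approximations. Either way, the ten-year total is roughly a quarter of the reported cost of a single thermal-related event.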

    Other Opportunities for Optimization

    In addition to vulnerabilities in the data center's power and cooling infrastructure, accidents and human errors also can cause costly downtime events.

    Twenty-four percent of study respondents cited human error as the primary cause of their most recent downtime event, with downtime caused by human error accounting for nearly $300,000 in downtime costs per incident. Over a period of ten years, downtime events related to human errors and/or accidents can easily cost an organization in excess of $600,000.

    Figure 8. Potential downtime costs (blue) compared to CAPEX and ongoing service investments for power and cooling infrastructure optimization (dark gray).

    Year    Total Downtime Cost (Potential)    Total Optimization Investment
    1       $451,000                           $368,000
    2       $631,400                           $388,000
    3       $1,082,400                         $408,000
    4       $1,262,800                         $428,000
    5       $1,713,800                         $448,000
    6       $1,894,200                         $468,000
    7       $2,345,200                         $488,000
    8       $2,525,600                         $508,000
    9       $2,976,600                         $528,000
    10      $3,157,000                         $548,000

    Fortunately, best practices to minimize the risk of downtime events caused by human error are among the least expensive to implement. As explained in the white paper Addressing the Leading Root Causes of Downtime, recommended actions for minimizing the occurrence of human errors and accidental Emergency Power Off (EPO) events include:

    Shielding Emergency OFF buttons

    Strictly enforcing food and drink policies

    Avoiding contaminants

    Establishing secure access policies

    Performing ongoing personnel training

    Promoting consistent standards for operation

    Labeling all components accurately

    Documenting maintenance procedures

    According to experts from Emerson Network Power's Liebert Services business, implementing these recommended actions would cost approximately $3,500. When considering the high overall cost of downtime, such investments represent a nominal cost that can easily achieve an ROI of more than a hundred-fold by preventing a single error or accident.
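
    The size of this return depends on which per-incident cost is assumed, and the paper does not say which one it used. As a rough Python sketch, the multiple can be evaluated for two figures quoted in this paper: the "nearly $300,000" cost of a human-error incident and the "more than $450,000" average across UPS, cooling and human-error incidents. The constant and function names are illustrative.

```python
MITIGATION_COST = 3_500            # approximate cost of the recommended actions (USD)
HUMAN_ERROR_EVENT_COST = 300_000   # "nearly $300,000" per human-error incident
COMBINED_EVENT_COST = 450_000      # "more than $450,000" average per incident


def avoided_cost_multiple(event_cost: float, investment: float) -> float:
    """Avoided downtime cost divided by the investment that prevented it."""
    return event_cost / investment


human_error_multiple = avoided_cost_multiple(HUMAN_ERROR_EVENT_COST, MITIGATION_COST)
combined_multiple = avoided_cost_multiple(COMBINED_EVENT_COST, MITIGATION_COST)

# Event cost at which one prevented incident repays the investment 100 times over.
hundred_fold_threshold = 100 * MITIGATION_COST

print(f"Human-error incident:   {human_error_multiple:.0f}x")
print(f"Combined average:       {combined_multiple:.0f}x")
print(f"100x threshold:         ${hundred_fold_threshold:,}")
```

    A single prevented incident repays the $3,500 more than a hundred times over whenever its cost exceeds $350,000, which the combined average figure does.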

    A Comprehensive Comparison

    To put all of these calculations into greater perspective, vulnerabilities in a data center's UPS and cooling infrastructure, as well as human error and accidental EPO events, collectively account for nearly three quarters of the root causes of downtime reported by survey respondents, with an average cost of more than $450,000 per incident. As such, for data centers experiencing an average of ten major or minor downtime events over a period of ten years, UPS, cooling and human error-related outages can be expected to account for at least seven major or minor downtime events, with an average total cost in excess of $3.15 million.

    As illustrated in Figure 8, the ROI of infrastructure optimization can be immediately realized when comparing the potential cost of downtime to the approximate cost of recommended investments capable of minimizing the risk for these root causes: $548,000 including ten years of preventive maintenance of power and cooling equipment; $368,000 in Year One.

    Furthermore, when considering the additional efficiency gains achieved as a result of these changes, the return on investment in power and cooling infrastructure optimization is particularly evident, especially when considering long-term savings in indirect and opportunity costs unique to recurring downtime events.
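
    The two series in Figure 8 can be reproduced from a handful of numbers. The Python sketch below is one way to do so. It takes $451,000 per event and a $20,000 yearly increase in investment directly from the figure's values; the alternating schedule of one event in odd years and 0.4 of an event in even years (seven events over ten years) is an inference from those values, not something the paper states.

```python
EVENT_COST = 451_000           # per-incident cost implied by Figure 8, Year 1 (USD)
YEAR_ONE_INVESTMENT = 368_000  # CAPEX plus first-year service
ANNUAL_SERVICE = 20_000        # yearly increase in the investment series


def cumulative_series(years: int = 10) -> tuple[list[int], list[int]]:
    """Return cumulative (downtime cost, investment) lists for each year."""
    downtime, investment = [], []
    total = 0
    for year in range(1, years + 1):
        # ASSUMED schedule, in tenths of an event: 1.0 in odd years, 0.4 in even years.
        tenths_of_event = 10 if year % 2 == 1 else 4
        total += EVENT_COST * tenths_of_event // 10
        downtime.append(total)
        investment.append(YEAR_ONE_INVESTMENT + ANNUAL_SERVICE * (year - 1))
    return downtime, investment


downtime, investment = cumulative_series()
for year, (cost, spend) in enumerate(zip(downtime, investment), start=1):
    print(f"Year {year:>2}: downtime ${cost:>9,}  investment ${spend:>7,}  ratio {cost / spend:.1f}x")
```

    On these assumptions the potential downtime cost exceeds the cumulative investment in every year, and by year ten it is more than five times larger.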

    Investment Prioritization: Evaluating Existing Infrastructure

    While the recommended actions outlined in this paper are critical to minimizing the risk of the leading root causes of downtime (as well as their associated costs), many enterprises may wish to prioritize these investments over time. These decisions are often based on a variety of factors, including CAPEX and OPEX required for comprehensive optimization, the criticality of data center operation and the impact of planned downtime on data center operations.

    If a comprehensive infrastructure overhaul is not feasible, spreading out investments over time can be an effective way to balance short-term CAPEX/OPEX with the long-term cost and risk of the leading root causes of downtime. For example, many of the recommended actions for safeguarding against human error and accidental EPO represent low-hanging fruit and are relatively inexpensive to execute. As a result, some data centers may choose to complete these and other minimally invasive optimizations (such as row partitioning) first, and plan for more intensive optimizations based on available resources and a required time-to-ROI.

    However, regardless of whether an enterprise decides to complete an infrastructure overhaul or space out these updates over time, many overlook the need to complete comprehensive assessments of their existing infrastructures, a critical step that can help to avoid unnecessary investments that yield little additional value in terms of availability or efficiency.

    As highlighted in Addressing the Leading Root Causes of Downtime: Technology Investments and Best Practices for Assuring Data Center Availability, a white paper from Emerson Network Power, a comprehensive assessment of the facility as well as all thermal and electrical systems can offer detailed insight into how an existing data center can be optimized for efficiency without compromising the availability of critical systems.

    In addition to the performance of a data center's power and cooling systems, data center assessments also take into consideration a variety of additional factors not tied directly to equipment performance that can impact the availability and performance of critical systems, including heat densities in racks and rows, raised floor obstructions and arc flash vulnerabilities in the data center's electrical infrastructure.

    Based on the assessment performed by specially trained service personnel, the data center manager can clearly assess where capital investments are required (including redundant power systems and precision cooling equipment designed for high-density environments) and where existing infrastructure can be adjusted or optimized in accordance with best practices to minimize the risk of data center downtime.

    Conclusion

    As evidenced by the findings of the Ponemon Institute, a single downtime event now has the potential to significantly impact the profitability (and, in extreme cases, the viability) of an enterprise. This shift can be attributed to a variety of economic trends, evolving business practices and the emergence of revenue streams that are wholly dependent on the availability of critical IT systems.

    With an average downtime cost for an enterprise data center totaling thousands of dollars per minute, it is vital to close the widening disconnect between IT personnel and their C-suite counterparts. An effective way to achieve this goal is to promote a thorough understanding of the frequency, cost and causes of data center downtime.

    Left unattended, an inadequate data center infrastructure will contribute to recurring downtime events and result in significant financial losses as well as permanent damage to a company's reputation and customer goodwill. While identifying these vulnerabilities and addressing them based on some of the aforementioned best practices may require a significant up-front cost, when contrasting these investments with the potential bottom line costs of data center downtime, data center professionals can gain a clear understanding of how direct and indirect costs can impact revenue over time.

    Appendix A: Infrastructure Assumptions for Model Data Center (Pre-Optimization)

    The 2,500-square-foot hypothetical data center has 105 racks with average density of 5.6 kW each. The racks are arranged in a hot-aisle/cold-aisle configuration. Cold aisles are four feet wide, and hot aisles are three feet wide. Based on this configuration and operating parameters, average facility power draw was calculated to be 1,127 kW.

    Following are additional details used in the analysis:

    Servers

    Age is based on average server replacement cycle of 4-5 years.

    Processor Thermal Design Power averages 91 W per processor.

    All servers have dual redundant power supplies. The average DC-DC conversion efficiency is assumed at 85 percent and average AC-DC conversion efficiency is assumed at 79 percent for the mix of servers from four years old to new.

    Daytime power draw is assumed to exist for 14 hours on weekdays and 4 hours on weekends. Nighttime power draw is 80 percent of daytime power draw.

    See Figure 16 for more details on server configuration and operating parameters.

    Storage

    Storage Type: Network attached storage.

    Capacity is 120 Terabytes.

    Average Power Draw is 49 kW.

    Communication Equipment

    Routers, switches and hubs required to interconnect the servers, storage and access points through Local Area Network and provide secure access to public networks.

    Average Power Draw is 49 kW.

    Power Distribution Units (PDU):

    Provides output of 208V, 3-phase through whips and rack power strips to power servers, storage, communication equipment and lighting. (Average load is 539 kW.)

    Input from UPS is 480V 3-phase.

    Efficiency of power distribution is 97.5 percent.

    UPS System

    One double conversion 750 kVA UPS with input filters for power factor correction (power factor = 91 percent).

    The UPS receives 480V input power from the distribution board and provides 480V, 3-phase power to the power distribution units on the data center floor.

    UPS efficiency at part load: 92.5 percent.

    Cooling system

    Cooling System is chilled water based.

    Total sensible heat load on the precision cooling system includes heat generated by the IT equipment, UPS and PDUs, building egress and human load.

    Cooling System Components:

    - Eight 146 kW chilled water based precision cooling systems placed at the end of each hot aisle. Includes one redundant unit.

    - The chilled water source is a chiller plant consisting of three 200 ton chillers (n+1) with matching condensers for heat rejection and four chilled water pumps (n+2).

    - The chiller, pumps and air conditioners are powered from the building distribution board (480V 3-phase).

    - Total cooling system power draw is 429 kW.

    Building substation:

    The building substation provides 480V 3-phase power to UPSs and cooling system.

    Average load on building substation is 1,099 kW.

    Utility input is 13.5 kV, 3-phase connection.

    System consists of transformer with isolation switchgear on the incoming line, switchgear, circuit breakers and distribution panel on the low voltage line.

    Substation, transformer and building entrance switchgear composite efficiency is 97.5 percent.
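
    Several of the figures in this appendix are linked by the stated efficiencies. As a rough check, the Python sketch below walks power upstream by dividing each load by the efficiency of the stage feeding it. It assumes that the 539 kW figure is the load on the PDU output and that the 1,099 kW figure is the load on the substation output; the function and variable names are illustrative.

```python
PDU_LOAD_KW = 539.0            # average load served by the PDUs (assumed: PDU output)
PDU_EFFICIENCY = 0.975
UPS_EFFICIENCY = 0.925         # at part load
SUBSTATION_LOAD_KW = 1_099.0   # average load on the building substation (assumed: output)
SUBSTATION_EFFICIENCY = 0.975  # substation, transformer and entrance switchgear


def input_power_kw(output_kw: float, efficiency: float) -> float:
    """Power a stage must draw to deliver `output_kw` at the given efficiency."""
    return output_kw / efficiency


# IT power chain: PDU output -> UPS output -> UPS input.
ups_output_kw = input_power_kw(PDU_LOAD_KW, PDU_EFFICIENCY)
ups_input_kw = input_power_kw(ups_output_kw, UPS_EFFICIENCY)

# Whole facility: substation output -> utility draw.
facility_draw_kw = input_power_kw(SUBSTATION_LOAD_KW, SUBSTATION_EFFICIENCY)

print(f"UPS output: {ups_output_kw:,.0f} kW")
print(f"UPS input:  {ups_input_kw:,.0f} kW")
print(f"Facility:   {facility_draw_kw:,.0f} kW")
```

    Under these assumptions, the facility figure rounds to the 1,127 kW average power draw quoted at the start of this appendix.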

    Emerson Network Power.

    The global leader in enabling Business-Critical Continuity. EmersonNetworkPower.com

    AC Power

    Connectivity

    DC Power

    Embedded Computing

    Embedded Power

    Infrastructure Management & Monitoring

    Outside Plant

    Power Switching & Controls

    Precision Cooling

    Racks & Integrated Cabinets

    Services

    Surge Protection

    Emerson Network Power

    1050 Dearborn Drive

    P.O. Box 29186

    Columbus, Ohio 43229

    800.877.9222 (U.S. & Canada Only)

    614.888.0246 (Outside U.S.)

    Fax: 614.841.6022

    EmersonNetworkPower.com

    Liebert.com

    While every precaution has been taken to ensure accuracy and completeness in this literature, Liebert Corporation assumes no responsibility, and disclaims all liability for damages resulting from use of this information or for any errors or omissions.

    © 2011 Liebert Corporation. All rights reserved throughout the world. Specifications subject to change without notice. All names referred to are trademarks or registered trademarks of their respective owners. Liebert and the Liebert logo are registered trademarks of the Liebert Corporation. Business-Critical Continuity, Emerson Network Power and the Emerson Network Power logo are trademarks and service marks of Emerson Electric Co. © 2011 Emerson Electric Co.

    SL-24661 R05-11 Printed in USA