Datacenters And Resilient Services
Benjamin Ravani, General Manager, Global Foundation Services, Microsoft Corporation
ES30
Agenda And Objectives
Web services operations
Size and scale
Data center challenges
Case studies
Best practices in building resiliency
Opportunities during design phase
Summary
Global Foundation Services’ Mission
Enable and deliver winning services to everyone, everywhere
Global Foundation Services: across the company, all over the world, around the clock
Growth is expected to keep increasing over the next 5 years!
Data Center Operations Challenges
Why Power Matters…
In 2006, U.S. data centers consumed an estimated 61 billion kilowatt-hours (kWh) of energy, about 1.5% of the total electricity consumed in the U.S. that year.
That energy cost a total of $4.5 billion. The amount of electricity was more than that consumed by all color televisions in the country and equivalent to the electricity consumption of about 5.8 million average U.S. households.
Koomey, Jonathan. 2007. Estimating Total Power Consumption by Servers in the U.S. and the World. Oakland, CA: Analytics Press. February 15. http://enterprise.amd.com/downloads/svrpwrusecompletefinal.pdf
Data centers' power and cooling infrastructure accounts for about half of that electricity consumption; IT equipment accounts for the other half.
If the status quo continues, by 2011 data centers are projected to consume 100 billion kWh of energy per year, at a total annual cost of $7.4 billion.
Meeting that level of demand would require building 10 additional power plants.
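The power figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted on the slide. It computes the implied average electricity price, the per-household equivalent, and the implied annual growth from 2006 to 2011. It also computes the power usage effectiveness (PUE) implied by a roughly 50/50 split between facility infrastructure and IT equipment. Reading that split as a PUE of about 2.0 is our own interpretation, not a figure from the slide.

```python
# Back-of-the-envelope checks on the U.S. data center figures quoted on the slide.
# All inputs come from the slide; the PUE reading of the 50/50 split is an assumption.

ENERGY_2006_KWH = 61e9    # 61 billion kWh consumed in 2006
COST_2006_USD = 4.5e9     # total energy cost in 2006
HOUSEHOLDS = 5.8e6        # equivalent number of average U.S. households
ENERGY_2011_KWH = 100e9   # projected 2011 consumption
COST_2011_USD = 7.4e9     # projected 2011 annual cost
IT_SHARE = 0.5            # IT equipment's share of data center electricity


def implied_price_per_kwh(cost_usd: float, energy_kwh: float) -> float:
    """Average electricity price implied by total cost divided by total energy."""
    return cost_usd / energy_kwh


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


def pue_from_it_share(it_share: float) -> float:
    """PUE = total facility power / IT equipment power = 1 / IT share (assumed reading)."""
    return 1 / it_share


price_2006 = implied_price_per_kwh(COST_2006_USD, ENERGY_2006_KWH)
price_2011 = implied_price_per_kwh(COST_2011_USD, ENERGY_2011_KWH)
kwh_per_household = ENERGY_2006_KWH / HOUSEHOLDS
growth = annual_growth_rate(ENERGY_2006_KWH, ENERGY_2011_KWH, 5)
pue = pue_from_it_share(IT_SHARE)

print(f"Implied price 2006: ${price_2006:.3f}/kWh, 2011: ${price_2011:.3f}/kWh")
print(f"Per-household equivalent: {kwh_per_household:,.0f} kWh/year")
print(f"Implied growth 2006-2011: {growth:.1%} per year")
print(f"Implied PUE: {pue:.1f}")
```

The implied price comes out at about $0.074/kWh for both years. This suggests the 2011 projection assumes roughly flat electricity prices, with consumption growing about 10.4% per year.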
Environmental Sustainability
Protecting our environment
Smart growth in data centers
Make every kW count!
Invest in innovation for energy efficiency
Examples:
Hydro power equipment supply
Compute resource utilization
Virtualization
Green Grid: http://www.microsoft.com/environment/our_commitment/articles/green_grid.aspx
Last year beans, this year a data center
Data Center Costs
Land: 2%
Core & shell: 9%
Architectural: 7%
Mechanical / electrical: 82%
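To see how the cost shares above translate into dollars, the following sketch applies them to a build budget. The $100M budget is a hypothetical figure used only for illustration. The code also checks that the shares sum to 100%.

```python
# Apply the data center construction-cost shares from the slide to a build budget.
# The $100M budget used below is hypothetical.

COST_SHARES = {
    "Land": 0.02,
    "Core & shell": 0.09,
    "Architectural": 0.07,
    "Mechanical / electrical": 0.82,
}


def allocate_budget(total_usd: float, shares: dict[str, float]) -> dict[str, float]:
    """Split a total budget across cost categories; shares must sum to 1."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("cost shares must sum to 100%")
    return {category: total_usd * share for category, share in shares.items()}


if __name__ == "__main__":
    HYPOTHETICAL_BUDGET_USD = 100_000_000
    for category, amount in allocate_budget(HYPOTHETICAL_BUDGET_USD, COST_SHARES).items():
        print(f"{category:<25} ${amount:>13,.0f}")
```

With these shares, mechanical and electrical systems would account for $82M of a $100M build. This shows how much of the cost sits in power and cooling infrastructure.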
Case Study I: Capacity Planning and Internal Security (November 2006)
Problem and impact: Poor planning. About 500K users experienced delays of several hours in creating or updating accounts.
Root cause: An interdependent service's batch job degraded overall performance, and the batch job had bugs.
Solution:
Capacity planning across services and groups
Testing all batch jobs in a test environment first
Increased internal security
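Cross-service capacity planning, as in the solution above, means checking that every workload sharing a dependency fits within that dependency's capacity before a new job is scheduled. The minimal sketch below assumes that each workload's peak load on the shared service is known in transactions per second. It also assumes the teams have agreed on 20% headroom. Both assumptions, and the workload names, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """One consumer of a shared back-end service (hypothetical model)."""
    name: str
    peak_tps: float  # peak transactions per second sent to the shared service


def fits_capacity(workloads: list[Workload], capacity_tps: float,
                  headroom: float = 0.2) -> bool:
    """Return True if the combined peak load leaves the requested headroom free."""
    total_tps = sum(w.peak_tps for w in workloads)
    return total_tps <= capacity_tps * (1 - headroom)


if __name__ == "__main__":
    interactive = Workload("account updates", peak_tps=6000)
    batch = Workload("partner batch job", peak_tps=2500)
    # Interactive traffic alone fits; adding the batch job at peak does not.
    print(fits_capacity([interactive], capacity_tps=10000))         # True
    print(fits_capacity([interactive, batch], capacity_tps=10000))  # False
```

When the check fails, the natural next steps are to run the batch job off-peak, or to test it in a test environment first so that its real load is measured.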
Case Study II: Protection Against Accidental Partner Errors (March 2007)
Problem and impact: About 5 hours of login outage for 75% of users. We couldn't isolate the source of the load.
Root cause: A bug in an internal service partner caused latency in another dependent service. This triggered re-authentication requests, which overloaded the service with logins.
Solution:
Application architecture: reduce dependencies
Improved monitoring specific to partner dependencies
Throttling by partner
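Throttling by partner, the third remedy above, can be sketched as one token bucket per calling partner. A partner that suddenly floods the service (for example, with re-authentication requests) then exhausts only its own allowance. The partner IDs, rates, and burst sizes are hypothetical. The token bucket is one common throttling technique, not necessarily the one used in this service.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PartnerThrottle:
    """One token bucket per partner, so one partner's surge cannot starve the others."""

    def __init__(self, limits: dict[str, tuple[float, int]], default: tuple[float, int],
                 clock: Callable[[], float] = time.monotonic):
        self.limits = limits    # hypothetical: partner id -> (rate per second, burst)
        self.default = default  # limit for partners without an explicit entry
        self.clock = clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, partner_id: str) -> bool:
        bucket = self.buckets.get(partner_id)
        if bucket is None:
            rate, burst = self.limits.get(partner_id, self.default)
            bucket = self.buckets[partner_id] = TokenBucket(rate, burst, self.clock)
        return bucket.allow()
```

A request that gets `False` would typically receive a "try again later" response, so the misbehaving partner backs off while other partners keep their full allowance.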
Throttling at All Layers of the System
Control incoming requests to prevent a total shutdown.
Network: protect against DDoS attacks
Front-end machines:
Kernel throttling for a high connection queue
IIS connection throttling for a high number of connections
Interface queue throttling for a high request queue
CPU throttling based on a CPU threshold
TPS throttling for high transactions per second (TPS) per interface
Partner-level throttling for unexpected load increases from a partner
Back-end SQL connections: throttling on the number of database connections
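The layered checks above can be modeled as a single admission decision. The function consults each layer in turn, from the outside of the stack inward, and reports which layer would throttle the request. The field names and thresholds in this sketch are hypothetical, and the network-level DDoS protection is not modeled. In practice, each layer is enforced by its own component (kernel, IIS, the application, the SQL connection pool), not by one function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    """Current load readings on one front-end machine (hypothetical fields)."""
    connection_queue: int  # pending connections (kernel layer)
    iis_connections: int   # open IIS connections
    request_queue: int     # queued requests on the interface
    cpu_percent: float     # CPU utilization, 0-100
    tps: float             # transactions per second on this interface
    db_connections: int    # open back-end SQL connections


@dataclass
class Limits:
    """Hypothetical per-layer thresholds; real values come from capacity planning."""
    connection_queue: int = 1000
    iis_connections: int = 5000
    request_queue: int = 2000
    cpu_percent: float = 85.0
    tps: float = 3000.0
    db_connections: int = 200


# Layers checked from the outside of the stack inward.
LAYERS = ["connection_queue", "iis_connections", "request_queue",
          "cpu_percent", "tps", "db_connections"]


def first_throttling_layer(signals: Signals, limits: Limits) -> Optional[str]:
    """Return the first layer whose reading meets or exceeds its limit, else None."""
    for layer in LAYERS:
        if getattr(signals, layer) >= getattr(limits, layer):
            return layer
    return None
```

Checking the cheapest, outermost layers first means an overloaded machine rejects work before spending CPU or database connections on it.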
URL Reputation Service (URS) Overview
Clients: Internet Explorer 7 and 8 Phishing Filter; URS phishing reporting site
Service profile: grown to billions of transactions daily
Capacity model: capable of sustaining a response time of <0.5 sec
Managed by 3 people
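A response-time target such as <0.5 sec is usually checked against a percentile of measured latencies rather than an average. This sketch assumes a 99th-percentile target computed with the nearest-rank method. The slide does not say which statistic the capacity model uses, so the percentile choice is an assumption.

```python
import math


def nearest_rank_percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (0 < pct <= 100) of a non-empty list of samples."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_target(latencies_sec: list[float], target_sec: float = 0.5,
                 pct: float = 99.0) -> bool:
    """True if the chosen latency percentile is strictly below the target."""
    return nearest_rank_percentile(latencies_sec, pct) < target_sec
```

Consider 100 samples where only one request takes 0.9 sec. The 99th percentile is still 0.1 sec, so the target is met. A 100th-percentile (maximum) check would fail.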
URL Reputation Service Topology
Architecture: designed with a pod-oriented architecture (POA)
A pod consists of a couple of dozen servers and a couple of load balancers across multiple VIPs.
Pods are distributed in multiple data centers globally.
Pods are globally load balanced by intelligent traffic control for reliability and performance.
[Figure: URS pods in Asia, North America (NA), and Europe (EU) data centers behind intelligent traffic management (ITM)]
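The pod-oriented architecture above can be captured in a small data model. Each data center holds several pods, each pod has its own servers and VIPs, and losing a pod reduces a data center's capacity without taking it offline. The class names are hypothetical. So is the 24-server pod size, which stands in for "a couple of dozen," and so are the documentation-range VIP addresses.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """A self-contained unit of servers behind its own load balancers (hypothetical model)."""
    name: str
    servers: int
    vips: list[str] = field(default_factory=list)
    healthy: bool = True


@dataclass
class DataCenter:
    name: str
    region: str
    pods: list[Pod] = field(default_factory=list)

    def healthy_pods(self) -> list[Pod]:
        return [pod for pod in self.pods if pod.healthy]

    def serving_servers(self) -> int:
        """Servers available to take traffic, counting only healthy pods."""
        return sum(pod.servers for pod in self.healthy_pods())


if __name__ == "__main__":
    dc = DataCenter("na-dc1", "NA", [
        Pod("na-pod1", 24, ["192.0.2.10", "192.0.2.11"]),
        Pod("na-pod2", 24, ["192.0.2.20", "192.0.2.21"]),
    ])
    dc.pods[0].healthy = False  # one pod fails; the data center keeps serving
    print([p.name for p in dc.healthy_pods()], dc.serving_servers())
```

Because pods are independent units, the failure domain is one pod rather than the whole data center.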
Input model:
Known phish
Business rules
Customer feedback loop
Grading filters
Partner input
URS DB on a SQL cluster
URF distribution to all pods
[Figure: partner, grading, and feedback inputs flowing into the URS, with URF distribution to pods in Asia, NA, and EU data centers]
Performance And Global Load Balancing
Optimizing client traffic by geography reduces latency and error rates.
Send customers to the closest data center based on source IP.
Response time < 0.5 sec
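One way to send customers to the closest data center by source IP is a longest-prefix match from client address ranges to regions. The prefix table below is hypothetical and uses reserved documentation ranges; the overlapping /25 is there only to show that the more specific prefix wins. Production systems typically rely on a GeoIP database or DNS-based global load balancing rather than a hard-coded table.

```python
import ipaddress

# Hypothetical prefix-to-region table using documentation address ranges.
PREFIX_TO_REGION = {
    ipaddress.ip_network("198.51.100.0/24"): "NA",
    ipaddress.ip_network("198.51.100.128/25"): "EU",  # more specific; wins on overlap
    ipaddress.ip_network("203.0.113.0/24"): "Asia",
}
DEFAULT_REGION = "NA"


def closest_region(source_ip: str) -> str:
    """Pick the region for a client by longest-prefix match on its source IP."""
    addr = ipaddress.ip_address(source_ip)
    best_len, best_region = -1, DEFAULT_REGION
    for network, region in PREFIX_TO_REGION.items():
        # Membership is False across IPv4/IPv6, so unmatched clients get the default.
        if addr in network and network.prefixlen > best_len:
            best_len, best_region = network.prefixlen, region
    return best_region
```

For example, `closest_region("198.51.100.200")` matches both the /24 and the /25, and the /25's region is returned.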
Fault Tolerance: Data Center Failover
Intelligent traffic management
Based on policy, re-route traffic from an unavailable data center (DC) to other DCs
No service downtime during a DC failure
Disaster recovery/business continuity is built in
[Figures: global topology of Asia, NA, and EU pods with ITM and Pulse]
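Policy-based failover can be expressed as an ordered list of fallback data centers for each client region. Traffic goes to the first data center in the list that health monitoring reports as available. The policy table and region names here are hypothetical.

```python
# Hypothetical failover policy: preferred data centers per client region, in order.
FAILOVER_POLICY: dict[str, list[str]] = {
    "NA": ["NA", "EU", "Asia"],
    "EU": ["EU", "NA", "Asia"],
    "Asia": ["Asia", "NA", "EU"],
}


def route(client_region: str, healthy_dcs: set[str]) -> str:
    """Return the first healthy data center in the client's policy list."""
    for dc in FAILOVER_POLICY[client_region]:
        if dc in healthy_dcs:
            return dc
    raise RuntimeError(f"no healthy data center for region {client_region!r}")


if __name__ == "__main__":
    print(route("EU", {"NA", "EU", "Asia"}))  # EU: normal operation
    print(route("EU", {"NA", "Asia"}))        # NA: EU data center is down
```

Keeping the policy as data rather than code lets operators change failover order without redeploying the routing logic.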
Rolling Upgrade (Roll Forward/Backward)
Multiple VIPs per DC
Reassign 1 pod to test VIPs for deploying new bits
Rolling upgrade: change validation process; low risk of outage during deployment
Rollback: easy recovery
Lower cost of test labs
[Figure: global topology of Asia, NA, and EU pods with ITM and Pulse]
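The rolling upgrade described above can be sketched as a loop over pods, one at a time. Each pod moves onto test VIPs and receives the new build, then gets validated. It then either returns to production (roll forward), or has its previous build restored and the rollout stops (roll back). The `Pod` fields, the VIP group names, and the `validate` callback are hypothetical stand-ins for a real deployment system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pod:
    name: str
    version: str
    vip_group: str = "production"  # hypothetical: "production" or "test" VIPs


def rolling_upgrade(pods: list[Pod], new_version: str,
                    validate: Callable[[Pod], bool]) -> bool:
    """Upgrade pods one at a time; stop and roll back the first pod that fails validation."""
    for pod in pods:
        previous = pod.version
        pod.vip_group = "test"        # move the pod off production VIPs
        pod.version = new_version     # deploy the new bits
        if not validate(pod):
            pod.version = previous    # roll back: restore the last good build
            pod.vip_group = "production"
            return False              # remaining pods stay on the old build
        pod.vip_group = "production"  # roll forward: the pod rejoins production
    return True
```

This version stops at the first failed pod and leaves the remaining pods on the old build. A fuller rollback could also revert the pods that were already upgraded.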
Environment Control
Security: security trumps features
Monitoring and instrumentation: availability, performance, transaction monitoring, capacity and load (System Center Operations Manager 2007)
Capacity management: software control, not people control
Change management: release pipeline, system administration automation
Datacenter Agnostic Deployment And Standards
Deploy servers where there is capacity
Global scale
Eliminate moves
Standards: hardware SKUs
Optimize costs at the data center level
Fault Tolerance
Throttle incoming traffic and limit retries
Back-end server failover
Data center failover: services fail over across DCs
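Limiting retries matters because unbounded client retries turned a partner's latency into a login overload in Case Study II. A common client-side pattern is a small, fixed number of attempts with capped exponential backoff and jitter. The sketch below assumes that pattern. Its parameter defaults and the choice of retryable exceptions are illustrative, not values from the slide.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    op: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call op, retrying transient failures a bounded number of times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return op()
        except retry_on:
            if attempt >= max_attempts:
                raise  # give up: surface the last error instead of retrying forever
            # Full jitter: sleep a random fraction of the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * rng())
            attempt += 1
```

The jitter spreads retries from many clients over time, so a brief outage does not turn into a synchronized wave of retried requests.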
Summary
24x7 global data center operations, managing tens of thousands of servers
We have learned from the industry and from our growing experience what it takes to make it better.
GFS partnership with Windows Azure from the start
Resiliency is a competitive advantage
Evals & Recordings
Please fill out your evaluation for this session at:
This session will be available as a recording at: www.microsoftpdc.com
Q&A
Please use the microphones provided
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.