Datacenters And Resilient Services Benjamin Ravani General Manager Global Foundation Services Microsoft Corporation ES30

Datacenters And Resilient Services

Benjamin Ravani
General Manager
Global Foundation Services
Microsoft Corporation

ES30

Agenda And Objectives

Web services operations
Size and scale
Data center challenges
Case studies
Best practices in building resiliency
Opportunities during design phase
Summary

Global Foundation Services’ Mission

Enable and Deliver

Winning Services

To Everyone, Everywhere

Global Foundation Services

Across the company, all over the world, around the clock

Data Center Operations Challenges

Growth is expected to continue over the next 5 years!

Why Power Matters…

In 2006, U.S. data centers consumed an estimated 61 billion kilowatt-hours (kWh) of energy, about 1.5% of the total electricity consumed in the U.S. that year.

That energy cost about $4.5 billion. The consumption is more than the electricity used by all color televisions in the country and is equivalent to the electricity consumption of about 5.8 million average U.S. households.

Data centers' power and cooling infrastructure accounts for about half of that electricity consumption; IT equipment accounts for the other half.

Koomey, Jonathan. 2007. Estimating Total Power Consumption by Servers in the U.S. and the World. Oakland, CA: Analytics Press. February 15. http://enterprise.amd.com/downloads/svrpwrusecompletefinal.pdf

Why Power Matters…

If the status quo continues, by 2011 data centers will consume 100 billion kWh of energy, at a total annual cost of $7.4 billion.

Those levels of power consumption would also necessitate the construction of 10 additional power plants.
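The 2006 and 2011 figures quoted on the two power slides imply an average electricity price and a growth rate that are easy to check. The sketch below uses only the numbers stated on the slides; the constant and function names are illustrative, not from the presentation.

```python
# Figures quoted on the slides.
KWH_2006 = 61e9     # kWh consumed by U.S. data centers in 2006
COST_2006 = 4.5e9   # USD spent on that energy in 2006
KWH_2011 = 100e9    # projected kWh in 2011 if the status quo continues
COST_2011 = 7.4e9   # projected annual cost in USD


def price_per_kwh(cost_usd: float, kwh: float) -> float:
    """Average price per kWh implied by a total cost and total consumption."""
    return cost_usd / kwh


price_2006 = price_per_kwh(COST_2006, KWH_2006)
price_2011 = price_per_kwh(COST_2011, KWH_2011)
growth = KWH_2011 / KWH_2006 - 1  # relative growth in consumption

print(f"2006 implied price: {price_2006 * 100:.1f} cents/kWh")
print(f"2011 implied price: {price_2011 * 100:.1f} cents/kWh")
print(f"Projected consumption growth 2006-2011: {growth:.0%}")
```

Both years work out to roughly 7.4 cents per kWh, so the projected cost increase comes from consumption growing by about 64%, not from an assumed change in price.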

Environmental Sustainability

Protecting our environment
Smart growth in data center
Make every kW count!
Invest in innovation for energy efficiency
Examples
  Hydro Power equipment supply
  Compute resource utilization
  Virtualization
  Green grid
  http://www.microsoft.com/environment/our_commitment/articles/green_grid.aspx

Last year beans, this year a data center

Data Center Costs

Land - 2%
Core & Shell Costs - 9%
Architectural - 7%
Mechanical / Electrical - 82%

[Pie chart: Land, Core / Shell, Mech / Elec, Arch]

Case Study I: Capacity planning and internal security, November 2006

Problem and impact
  Poor planning
  About 500K users experienced delays in creating/updating accounts for several hours
Root cause
  Interdependent service's batch job affecting overall performance
  Batch job had bugs
Solution
  Capacity planning cross-services/cross-groups
  Testing all batch jobs in a test environment first
  Increase internal security

Case Study II: Protection against accidental partner's error, March 2007

Problem and impact
  ~5 hours of login outage for 75% of users
  We couldn't isolate the source of load
Root cause
  An internal service partner bug caused latency in another dependent service, resulting in re-authentication requests that overloaded the service with a high login rate
Solution
  Application architecture - reduce dependency
  Improved monitoring - specific to partner dependency
  Develop throttling - throttling by partners

Throttling - at all layers of the system
Control incoming requests to prevent total shut down

Network
  Protect against DDoS attacks
Front-end machines
  Kernel throttling - for high connection queue
  IIS connections - for high connections
  Interface queue throttling - for high request queue
  CPU throttling - CPU threshold based
  TPS throttling - for high TPS per interface
  Partner level throttling - for unexpected load increase from a partner
Back-end
  SQL connections
  Throttling on number of database connections
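Partner level throttling can be illustrated with a token bucket kept per partner, so that an unexpected load increase from one partner is rejected without affecting the others. This is a minimal sketch under stated assumptions: the slides do not say which algorithm was used, and the class names, rates, and partner identifiers below are hypothetical.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Allows `rate` requests per second, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PartnerThrottle:
    """Keeps one bucket per partner so a noisy partner cannot starve the rest."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, partner: str, now: Optional[float] = None) -> bool:
        # `now` is injectable for testing; defaults to a monotonic clock.
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(partner)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[partner] = bucket
        return bucket.allow(now)


# Hypothetical usage: 2 requests/second per partner, burst of 2.
throttle = PartnerThrottle(rate=2.0, capacity=2.0)
burst = [throttle.allow("partner-a", now=0.0) for _ in range(3)]
other = throttle.allow("partner-b", now=0.0)
later = throttle.allow("partner-a", now=1.0)
```

In this usage the third request in the burst from `partner-a` is rejected, `partner-b` is unaffected, and `partner-a` is admitted again once its bucket has refilled. The same pattern applies at other layers by keying the buckets on interface or connection instead of partner.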

URL Reputation Service (URS)

Internet Explorer 7, 8 Phishing Filter

URS Phishing reporting site

URL Reputation Service (URS) Overview

Service profile
  Grown to billions of transactions daily
  Capacity model:
    Capable of sustaining a response time of <0.5 sec
  Managed by 3 people
Architecture
  Designed with a pod-oriented architecture (POA)
  A pod consists of a couple of dozen servers and a couple of load-balancers across multiple VIPs
  Pods are distributed in multiple data centers globally
  Pods are globally load balanced by intelligent traffic control for reliability and performance

URL Reputation Service Topology

[Diagram labels: Asia, NA, EU, ITM]

Input Model: Known Phish Business Rules

Customers feedback loop
Grading filters
Partners input
URS DB on SQL Cluster
URF distribution to all pods

[Diagram labels: Feedback, Partners, Grading, URS, URF, ITM, Asia, NA, EU]

Performance And Global Load Balancing

Optimizing client traffic by geography reduces latency and error rates
Send customers to the closest data center based on source IP
Response time < 0.5 sec

[Diagram labels: Asia, NA, EU, ITM, Pulse]
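Sending a customer to the closest data center based on source IP can be sketched as a lookup from address prefix to region. The prefix table below is hypothetical and uses reserved documentation address ranges; a real deployment would rely on a GeoIP database or the traffic manager's own mapping, which the slides do not describe.

```python
import ipaddress

# Hypothetical prefix-to-region table (documentation address ranges only).
REGION_BY_PREFIX = {
    ipaddress.ip_network("192.0.2.0/24"): "NA",
    ipaddress.ip_network("198.51.100.0/24"): "EU",
    ipaddress.ip_network("203.0.113.0/24"): "Asia",
}
DEFAULT_REGION = "NA"  # assumption: fall back to one region for unknown sources


def closest_region(source_ip: str) -> str:
    """Return the region whose prefix contains the client's source address."""
    addr = ipaddress.ip_address(source_ip)
    for network, region in REGION_BY_PREFIX.items():
        if addr in network:
            return region
    return DEFAULT_REGION


print(closest_region("198.51.100.7"))  # client inside the EU prefix
print(closest_region("10.0.0.1"))      # unknown source, default region
```

The linear scan is fine for a handful of prefixes; a large table would normally use a longest-prefix-match structure such as a trie.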

Fault Tolerance: Data Center Failover

Intelligent traffic management
Based on policy - re-route traffic from an unavailable data center (DC) to other DCs
No service downtime during a DC failure
Disaster recovery/business continuity is built in

[Diagram labels: Asia, NA, EU, ITM, Pulse]
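Policy-based re-routing away from an unavailable data center can be sketched as an ordered preference list per client region, filtered by current health. The policy table and function name below are assumptions for illustration; the slides do not describe how the traffic manager's policies are expressed.

```python
from typing import Dict, List, Set

# Hypothetical policy: for each client region, data centers in preference order.
FAILOVER_POLICY: Dict[str, List[str]] = {
    "NA": ["NA", "EU", "Asia"],
    "EU": ["EU", "NA", "Asia"],
    "Asia": ["Asia", "NA", "EU"],
}


def route(client_region: str, healthy: Set[str]) -> str:
    """Pick the first healthy data center in the region's preference order."""
    for dc in FAILOVER_POLICY[client_region]:
        if dc in healthy:
            return dc
    raise RuntimeError("no healthy data center available")


# Normal operation: EU clients are served from EU.
print(route("EU", {"NA", "EU", "Asia"}))
# EU data center unavailable: EU clients are re-routed to NA.
print(route("EU", {"NA", "Asia"}))
```

Because every region has a fallback, a single data center failure changes where requests land but does not leave any client without a target.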

Rolling Upgrade (Roll Forward/Backward)

Multiple VIPs per DC
Reassign 1 pod to test VIPs for deploying new bits
Rolling upgrade: change validation process
Low risk of outage during deployment
Rollback: easy recovery
Lower cost of test labs

[Diagram labels: Asia, NA, EU, ITM, Pulse]
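The roll forward/backward flow can be sketched as a loop that upgrades one pod at a time, validates it, and restores the previous version if validation fails. This is a simplified illustration: the pod names, version strings, and `validate` callback are hypothetical, and the step of reassigning a pod to test VIPs is represented only by a comment.

```python
from typing import Callable, Dict, List


def rolling_upgrade(
    pods: Dict[str, str],
    new_version: str,
    validate: Callable[[str, str], bool],
) -> List[str]:
    """Upgrade pods one at a time; roll back and stop at the first failure.

    `pods` maps pod name to deployed version and is updated in place.
    Returns the names of the pods that ended up on `new_version`.
    """
    upgraded: List[str] = []
    for name in list(pods):
        previous = pods[name]
        # In the real flow the pod would be reassigned to test VIPs here,
        # so production traffic never reaches the new bits before validation.
        pods[name] = new_version
        if not validate(name, new_version):
            pods[name] = previous  # roll back this pod only
            break                  # halt the rollout; other pods are untouched
        upgraded.append(name)
    return upgraded


# Hypothetical run in which validation fails on the second pod.
pods = {"pod1": "v1", "pod2": "v1", "pod3": "v1"}
done = rolling_upgrade(pods, "v2", lambda name, version: name != "pod2")
print(done, pods)
```

Only one pod is ever out of rotation at a time, which is what keeps the risk of an outage during deployment low.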

Environment Control

Security - security trumps features
Monitoring and instrumentation
  Availability, performance
  Transaction monitoring
  Capacity and load
  System Center Operations Manager 2007
Capacity management
  Software control, not people control
Change management
  Release pipeline
  System administration automation

Datacenter Agnostic Deployment And Standards

Deploy servers where there is capacity
  Global scale
  Eliminate moves
Standards
  Hardware SKUs
  Optimize costs at data center level

Fault Tolerance

Throttle incoming traffic/limit retries
Back-end servers failover
Datacenter failover - services failover cross DC
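Limiting retries matters because unbounded client retries can turn a brief slowdown into sustained overload, as in Case Study II. One common way to bound them is a fixed attempt cap with exponential backoff and jitter; the sketch below assumes that approach, and the function name and default values are illustrative rather than taken from the presentation.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on failure up to `max_attempts` in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # retry budget exhausted: surface the last error
            # Exponential backoff capped at max_delay, with full jitter so
            # many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff can be tested without real delays. Catching `Exception` keeps the sketch short; production code would retry only on errors known to be transient.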

Summary

24x7 global data center operations - managing tens of thousands of servers
We have learned from the industry and from our growing experience what it takes to make it better!
GFS partnership with Windows Azure from the start
Resiliency is a competitive advantage

Evals & Recordings

Please fill out your evaluation for this session at:

This session will be available as a recording at:

www.microsoftpdc.com

Please use the microphones provided

Q&A

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
