1 Dependability in the Internet Era Jim Gray Microsoft Research High Dependability Computing Consortium Conference Santa Cruz, CA 7 May 2001 REVISED: 13 Feb 2005 Stanford, CA


Dependability in the

Internet Era

Jim Gray

Microsoft Research

High Dependability Computing Consortium Conference

Santa Cruz, CA 7 May 2001

REVISED: 13 Feb 2005 Stanford, CA


Outline

• The glorious past (Availability Progress)

• The dark ages (current scene)

• Some recommendations


Preview

The Last 10 Years: Availability Dark Ages. Ready for a Renaissance?

• Things got better, then things got a lot worse!

[Chart: availability (axis from 9% to 99.999%) by year (1950 to 2010) for Computer Systems, Telephone Systems, Cellphones, and the Internet.]


DEPENDABILITY: The 3 ITIES

• RELIABILITY / INTEGRITY: Does the right thing. (also MTTF >> 1)

• AVAILABILITY: Does it now. (also 1 >> MTTR)
  Availability = MTTF / (MTTF + MTTR)

• System Availability: if 90% of terminals are up & 99% of the DB is up?
  (=> 89% of transactions are serviced on time).

• Holistic vs. Reductionist view

[Diagram labels: Security, Integrity, Reliability, Availability]
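The system-availability arithmetic on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the parts fail independently and that every part must be up for a transaction to be serviced; the function names are hypothetical and not from the talk.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a module is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def series_availability(*parts: float) -> float:
    """Availability of a chain in which every part must be up.

    Assumes the parts fail independently, so the fractions multiply.
    """
    result = 1.0
    for part in parts:
        result *= part
    return result


# The slide's example: 90% of terminals up and 99% of the database up.
end_to_end = series_availability(0.90, 0.99)
print(f"{end_to_end:.1%}")  # 89.1%
```

The product is why the slide rounds to 89%: the end-to-end figure is always lower than the weakest part.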


Fail-Fast is Good, Repair is Needed

Improving either MTTR or MTTF gives benefit

Simple redundancy does not help much.

Lifecycle of a module: Fault → Detect → Repair → Return

Fail-fast gives short fault latency.

High availability is low UN-availability.

Unavailability ≈ MTTR / MTTF
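A small sketch of why improving either MTTR or MTTF helps, using the slide's approximation Unavailability ≈ MTTR / MTTF. The numbers (a 1,000-hour MTTF and a 1-hour MTTR) are hypothetical and chosen only to make the arithmetic visible.

```python
import math


def unavailability_exact(mttf: float, mttr: float) -> float:
    """Exact fraction of time down: MTTR / (MTTF + MTTR)."""
    return mttr / (mttf + mttr)


def unavailability_approx(mttf: float, mttr: float) -> float:
    """The slide's approximation, good when MTTR is much smaller than MTTF."""
    return mttr / mttf


# Hypothetical module: fails every 1,000 hours, takes 1 hour to repair.
base = unavailability_approx(mttf=1000.0, mttr=1.0)
half_repair = unavailability_approx(mttf=1000.0, mttr=0.5)   # repair twice as fast
double_life = unavailability_approx(mttf=2000.0, mttr=1.0)   # fail half as often

print(base, half_repair, double_life)
print(math.isclose(half_repair, double_life))  # both changes halve unavailability
```

Under these assumptions, halving repair time and doubling time-to-failure are interchangeable, which is one reading of "Fail-Fast is Good, Repair is Needed".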


Fault Model

• Failures are independent
  So, single fault tolerance is a big win

• Hardware fails fast (dead disk, blue-screen)

• Software fails-fast (or goes to sleep)

• Software often repaired by reboot:
  – Heisenbugs

• Operations tasks: major source of outage
  – Utility operations
  – Software upgrades


Disks (RAID): the BIG Success Story

• Duplex or Parity: masks faults

• Disks @ 1M hours (~100 years)

• But
  – controllers fail and
  – have 1,000s of disks.

• Duplexing or parity, and dual path gives “perfect disks”

• Wal-Mart never lost a byte (thousands of disks, hundreds of failures).

• Only software/operations mistakes are left.
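To make the disk numbers concrete, here is a rough sketch. The fleet size of 1,000 disks and the 24-hour repair time are hypothetical values, and the mirrored-pair formula (MTTF squared over twice the MTTR) is the usual textbook approximation for independent failures, not something stated on the slide.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760


def fleet_hours_between_failures(disk_mttf_hours: float, n_disks: int) -> float:
    """Expected time between disk failures somewhere in a fleet of n disks."""
    return disk_mttf_hours / n_disks


def mirrored_pair_mttf_hours(disk_mttf_hours: float, repair_hours: float) -> float:
    """Approximate MTTF of a duplexed pair: data is lost only if the second
    disk fails while the first is still being repaired (assumed model)."""
    return disk_mttf_hours ** 2 / (2 * repair_hours)


disk_mttf = 1_000_000  # hours, from the slide

print(disk_mttf / HOURS_PER_YEAR)                      # one disk: roughly a century
print(fleet_hours_between_failures(disk_mttf, 1_000))  # 1,000 disks: a failure about every 1,000 hours
print(mirrored_pair_mttf_hours(disk_mttf, 24) / HOURS_PER_YEAR)  # mirrored pair, in years
```

A single disk looks nearly immortal, a large fleet sees failures every few weeks, and mirroring with prompt repair pushes data loss far beyond any practical horizon, which is consistent with "hundreds of failures" and no lost bytes.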


Fault Tolerance vs Disaster Tolerance

• Fault-Tolerance: mask local faults
  – RAID disks
  – Uninterruptible Power Supplies
  – Cluster Failover

• Disaster Tolerance: masks site failures
  – Protects against fire, flood, sabotage,..
  – Also, software changes, site moves,…
  – Redundant system and service at remote site.


Availability

[Figure: ladder of availability levels (axis: Availability; surviving tick labels: 99, 999), from lowest to highest:]

• Un-managed

• Well-managed nodes: masks some hardware failures

• Well-managed packs & clones: masks hardware failures and operations tasks (e.g. software upgrades); masks some software failures

• Well-managed GeoPlex: masks site failures (power, network, fire, move,…); masks some operations failures


Case Study - Japan

"Survey on Computer Security", Japan Info Dev Corp., March 1986. (trans: Eiichi Watanabe).

MTTF by cause of outage:

Vendor (hardware and software)   5 Months
Application software             9 Months
Communications lines             1.5 Years
Operations                       2 Years
Environment                      2 Years
Overall                          10 Weeks

1,383 institutions reported (6/84 - 7/85)

7,517 outages, MTTF ~ 10 weeks, avg duration ~ 90 MINUTES

To Get 10 Year MTTF, Must Attack All These Areas

[Pie chart: share of outages by cause (Vendor, Application Software, Tele Comm lines, Operations, Environment); slice values shown: 42%, 25%, 12%, 11.2%, 9.3%.]
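The overall 10-week figure follows from the per-cause MTTFs if the causes are treated as independent, so that their failure rates add. The sketch below checks that, and also derives the availability implied by a 90-minute average outage; the conversion constants (52 weeks per year, 12 equal months) are simplifying assumptions.

```python
WEEKS_PER_YEAR = 52.0
WEEKS_PER_MONTH = WEEKS_PER_YEAR / 12.0

# Per-cause MTTF from the slide, converted to weeks.
mttf_weeks_by_cause = {
    "vendor (hardware and software)": 5 * WEEKS_PER_MONTH,
    "application software": 9 * WEEKS_PER_MONTH,
    "communications lines": 1.5 * WEEKS_PER_YEAR,
    "operations": 2 * WEEKS_PER_YEAR,
    "environment": 2 * WEEKS_PER_YEAR,
}


def combined_mttf(mttfs) -> float:
    """Independent causes: failure rates (1/MTTF) add, so MTTFs combine harmonically."""
    return 1.0 / sum(1.0 / m for m in mttfs)


overall_weeks = combined_mttf(mttf_weeks_by_cause.values())
print(round(overall_weeks, 1))  # close to the slide's "10 weeks"

# Availability implied by a ~90-minute outage every ~10 weeks.
HOURS_PER_WEEK = 7 * 24
uptime_hours = 10 * HOURS_PER_WEEK
outage_hours = 1.5
implied_availability = uptime_hours / (uptime_hours + outage_hours)
print(f"{implied_availability:.3%}")
```

Because the rates add, removing any single cause leaves the total dominated by the rest, which is the point of "Must Attack All These Areas".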


Case Studies - Tandem Trends

• MTTF improved

• Shift from Hardware & Maintenance (from 50% to 10%) to Software (62%) & Operations (15%)

• NOTE: Systematic under-reporting of
  – Environment
  – Operations errors
  – Application Software

[Charts for 1985, 1987, 1989: "Outages/1000 System Years by Primary Cause" and "% of Outages by Primary Cause"; causes: unknown, environment, operations, maintenance, hardware, software.]


Dependability Status circa 1995

• ~4-year MTTF

• 5 9s for well-managed sys. Fault Tolerance Works.

• Hardware is GREAT (maintenance and MTTF).

• Software masks most hardware faults.

• Many hidden software outages in operations:
  – New Software.
  – Utilities.

• Need to make all hardware/software changes ONLINE.

• Software seems to define a 30-year MTTF ceiling.

• Reasonable Goal: 100-year MTTF. class 4 today => class 6 tomorrow.
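The "class" in the last bullet can be read as the number of leading nines in the availability figure. The sketch below assumes that reading (class = floor of log10 of 1/unavailability); the function name is hypothetical.

```python
import math


def availability_class(availability: float) -> int:
    """Number of leading nines, assuming class = floor(log10(1 / (1 - availability))).

    The value is rounded to 9 decimals before flooring so that inputs such as
    0.999999 are not pushed down a class by floating-point error.
    """
    nines = -math.log10(1.0 - availability)
    return math.floor(round(nines, 9))


print(availability_class(0.9999))    # class 4
print(availability_class(0.999999))  # class 6
```

Each step up in class cuts allowed downtime by a factor of ten, so going from class 4 to class 6 is a hundredfold reduction.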


Honorable Mention

• The nice folks at Tandem (now HP)

– Made failover fast (30 seconds or less).

– Made change online
  • Add hardware/software
  • Reorganize database.
  • Rolling upgrades.

– Added at least one 9 to their story.


And Then?

• Hardware got better (& more complex)

• Software got better (& more complex)

• RAID is standard; snapshots are becoming standard

• Cluster in a box: commodity failover

• Remote replication is standard.


Outline

• The glorious past (Availability Progress)

• The dark ages (current scene)

• Some recommendations


Progress?

• MTTF improved from 1950-1995

• MTTR incremental improvements 1970 --- failover

• Hardware and Software online change (pNp) is now standard

• Then the Internet arrived:
  – No project can take more than 3 months.
  – Time to market is everything
  – Change is good.

[Chart legend: Computer Systems, Telephone Systems, Cellphones, Internet]


The Internet Changed Expectations

1990

Phones delivered 99.999%

ATMs delivered 99.99%

Failures were front-page news.

Few hackers

Outages last an “hour”

2005

Cell phones deliver 90%

Web sites deliver 99%

Failures are business-page news

Many hackers.

Outages last a “day”

This is progress?
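The percentages in this comparison are easier to feel as downtime per year. A minimal sketch, assuming a 365-day year and treating each figure as a steady-state availability:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime in a year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Figures quoted on the slide.
for label, a in [
    ("1990 phones", 0.99999),
    ("1990 ATMs", 0.9999),
    ("2005 web sites", 0.99),
    ("2005 cell phones", 0.90),
]:
    hours = downtime_hours_per_year(a)
    print(f"{label:18s} {a:.3%}  {hours:8.2f} hours/year")
```

Five nines is about five minutes a year; 99% is more than three and a half days, and 90% is more than five weeks.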


Eric Brewer said it best:

ACID vs BASE: the internet litmus test

“copy” of slide 8 of http://www.ccs.neu.edu/groups/IEEE/ind-acad/brewer/sld008.htm

ACID

• Atomicity, Consistency, Isolation, Durability
• Availability?
• Strong consistency, Isolation
• Focus on commit
• Conservative (pessimistic)
• Difficult evolution (e.g. schema)
• Nested transactions

BASE

• Basic Availability, Soft State, Eventual Consistency
• Availability FIRST
• Weak consistency: stale data is OK
• Approximate answers OK
• Best effort
• Aggressive (optimistic)
• Easier evolution.
• Simpler!
• Faster

I think it is a spectrum


Why (1) Complexity

• Internet sites are MUCH more complex.
  – NAP
  – Firewall/proxy/IPsprayer
  – Web
  – DMZ
  – App server
  – DB server
  – Links to other sites
  – tcp/http/html/dhtml/dom/xml/com/corba/cgi/sql/fs/os…

• Skill level is much reduced


A Data Center (500 servers)

[Figure: Canyon Park Data Center, Microsoft.com Network Diagram. Drawn by: Matt Groshong. Last updated: April 12, 2000. IP addresses removed by Jim Gray to protect security.]

Labelled components include Cisco 7000 routers (HSRP), Catalyst 5000 and Catalyst 2926 switches, ASX-1000 units, and racks of IIS and SQL servers grouped by site: www, search, register, support, windows, windows98, windowsmedia, premium, msdn, kbsearch, insider, download, activex, ftp, ntservicepack, hotfix, asksupport, officeupdate, newsletters, newswire, communities, codecs, cgl, and backoffice (all under microsoft.com). Also shown are live, backup, consolidator, and miscellaneous SQL servers, stagers, build servers, PPTP / terminal servers, and monitoring servers.

Microsoft.com Server Count

FTP                   6
Build Servers        32
IIS                 210
Application           2
Exchange             24
Network/Monitoring   12
SQL                 120
Search                2
NetShow               3
NNTP                 16
SMTP                  6
Stagers              26
Total               459


A Schematic of HotMail

• ~7,000 servers

• 100 backend stores with 300TB (cooked)

• many data centers

• Links to
  – Internet Mail gateways
  – Ad-rotator
  – Passport
  – …

• ~5 B messages per day

• 350M mailboxes, 250M active

• ~1M new per day.

• New software every 3 months (small changes weekly).

[Schematic labels: Internet; Switched Ethernet; Telnet Management; Local Directors in front of Front Doors, Incoming Mail Servers, AD Servers, Graphics Servers, and Login Servers; USTORES (data); Member Directory; gateways.]


Why (2) Velocity

• No project can take more than 13 weeks.

• Time to market is everything

• Functionality is everything

• Faster, cheaper, …

[Diagram: Schedule, Quality, and Functionality, with a "trend" arrow.]


Why (3) Hackers

• Hackers are a new, increased threat

• Any site can be attacked from anywhere

• Motives include ego, malice, and greed.

• Complexity makes it hard to protect sites.

• Whole internet attacks: Slammer

• Concentration of wealth makes attractive target:

Reporter: “Why did you rob banks?”

Willie Sutton: “’Cause that’s where the money is!”

Note: Eric Raymond’s How to Become a Hacker (http://www.tuxedo.org/~esr/faqs/hacker-howto.html) describes the positive use of “Hacker”; here I mean malicious and anti-social hackers. Black-hats, not white-hats.


How Bad Is It?

http://www-iepm.slac.stanford.edu/

Connectivity is poor.

http://www.internettrafficreport.com/main.htm


How Bad Is It?

• Median monthly % ping packet loss for 2/99

http://www-iepm.slac.stanford.edu/pinger/


And in 2006, about the same


Or in the US


Keynote measures Response Time and Up Time

Measures response time around the world

Business service is better than popular service

Has many proprietary services for SLAs.

                       Week of April 22 - April 28, 2001   Previous Week
Index                  15.90                               15.78
Web sites with best    Ameritrade (65)   3.29              Ameritrade (64)   3.35
performance averages   Lycos (81)        5.41              Lycos (80)        5.58
                       Yahoo! (81)       5.79              Yahoo! (80)       5.74
                       Altavista (19)    6.03              Ask Jeeves (7)    6.11
                       Go.com            7.02              Altavista (18)    6.17
Worst average          (anonymous)      38.04              (anonymous)      37.44


2006: typical 97.48% Availability



Netcraft’s Crisis-of-the-Day


Service Level Measurements

• Many organizations are measured on SLAs

• Example: 1 sec response 99% of prime time

• Keynote, Netcraft, …
  – offer to monitor your site (probe every few min)
    • This probing can go deep into the tree to detect services.
  – Send alerts via email
  – Give monthly reports.
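An SLA such as "1 sec response 99% of prime time" can be checked directly against probe results. The sketch below is an illustration only: the `Probe` record, the prime-time window of 08:00 to 18:00, and the rule that an unanswered probe counts as a miss are all assumptions, not how Keynote or Netcraft define their measurements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    hour_of_day: int                   # 0..23, local time of the probe
    response_seconds: Optional[float]  # None means the probe got no answer


def sla_fraction(probes: List[Probe], limit_seconds: float = 1.0,
                 prime_start: int = 8, prime_end: int = 18) -> float:
    """Fraction of prime-time probes answered within the limit."""
    prime = [p for p in probes if prime_start <= p.hour_of_day < prime_end]
    if not prime:
        return 1.0  # nothing measured in prime time, nothing violated
    good = sum(1 for p in prime
               if p.response_seconds is not None
               and p.response_seconds <= limit_seconds)
    return good / len(prime)


def sla_met(probes: List[Probe], target: float = 0.99) -> bool:
    return sla_fraction(probes) >= target


# 99 fast probes and 1 slow one in prime time; a slow probe at 02:00 is ignored.
sample = [Probe(9, 0.4)] * 99 + [Probe(10, 3.0), Probe(2, 9.0)]
print(sla_fraction(sample), sla_met(sample))
```

With probes every few minutes, a single one-hour outage in prime time is enough to miss a 99% monthly target by a wide margin.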


In addition

• Most large sites build their own instrumentation (several times)

• This instrumentation is elaborate and essential for the Network Operations Center (NOC).

• There are attempts now to systematize it: Tivoli, OpenView, NetIQ, WhatsUP, Mom,..


Microsoft.Com

• Operations mis-configured a router

• Took a day to diagnose and repair.

• DOS attacks cost a fraction of a day.

• Regular security patches.


Back-End Servers are More Stable

• Generally deliver 99.99%

• TerraServer for example: single back-end failed after 2.5 y.

• Went to 4-node cluster

• Fails every 2 mo. Transparent failover in 30 sec. Online software upgrades. So… 99.999% in backend…

Year 1                    Time          %
Total Up Time             8754:07:22    99.93%
Total Down Time           5:52:38       0.07%
Total Time                8760:00:00    100.00%
Scheduled Down            2:50:45
Scheduled Availability    8757:09:15    99.97%
Un-Scheduled Down         3:01:53

Through 18 Months         Time          %
Up Time                   12888:21:49   99.519%
Scheduled Down            4:00:25       0.031%
Unscheduled Down          58:20:46      0.451%
Total Time                12950:43:00   99.52%
Total Down                62:21:11      0.48%

Down 30 hours in July (hardware stop, auto restart failed, operations failure)

Down 26 hours in September (Backplane failure, I/O Bus failure)
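The percentages in the uptime tables can be reproduced from the hours:minutes:seconds figures. A minimal sketch, with hypothetical helper names, that parses the durations quoted on the slide:

```python
def to_seconds(hms: str) -> int:
    """Convert 'H:MM:SS' (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def percent(part: str, total: str) -> float:
    """Share of the total period, as a percentage."""
    return 100.0 * to_seconds(part) / to_seconds(total)


# Year 1
print(round(percent("8754:07:22", "8760:00:00"), 2))    # up time
print(round(percent("5:52:38", "8760:00:00"), 2))       # down time

# Through 18 months
print(round(percent("12888:21:49", "12950:43:00"), 3))  # up time
print(round(percent("58:20:46", "12950:43:00"), 3))     # unscheduled down
```

The same arithmetic shows how much the two long outages cost: 56 hours of downtime moved the site from roughly three nines in year 1 to roughly two nines through 18 months.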


eBay: A very honest site

• Publishes operations log.

• Has 99% of scheduled uptime

• Schedules about 2 hours/week down.

• Has had some operations outages

• Has had some DOS problems.

http://www2.ebay.com/aw/announce.shtml


And 2006….

Welcome to eBay's System Board. Visit this board for information on scheduled site maintenance or system issues that are affecting Marketplace trading. For general eBay news, please see our General Announcements Board.

***Resolved - PayPal site slowness***
February 08, 2006 | 05:20PM PST/PT
For several hours today, members may have experienced slowness while trying to access the PayPal website. This issue has now been resolved. Thank you for your patience.

***PayPal site slowness***
February 08, 2006 | 02:38PM PST/PT
Members may be experiencing intermittent slowness while trying to access the PayPal website. We're aware of this issue and are working to fix it as quickly as possible. Thank you for your patience.

***Scheduled Maintenance For This Week***
February 08, 2006 | 02:03PM PST/PT
The eBay system will be undergoing general maintenance from approximately 23:00 PT on Thursday, February 9th to 01:00 PT on Friday, February 10th. During this maintenance period, certain eBay site features may be intermittently unavailable or slow.

http://www2.ebay.com/aw/announce.shtml


Some Cool New Things

• There are 100,000 node services.

• Google File System shows importance & benefit of Triplex

• DB replication & mirroring works (is easy)

• little things I have done
  – With Leslie Lamport: unified Paxos & 2PC
  – Measured mean-time-to-data-loss (and continue to measure things).


Outline

• The glorious past (Availability Progress)

• The dark ages (current scene)

• Some recommendations


Not to throw stones but…

• Everyone has a serious problem.

• The BEST people publish their stats.

• The others HIDE their stats (check Netcraft to see who I mean).

• We have good NODE-level availability: 5-9s is reasonable.

• We have TERRIBLE system-level availability: 2-9s “scheduled” is the goal (!).


Gresham’s Law: “bad money drives out good”

• People WANT features!

• People WANT convenience!

• People WANT cheap!

• In exchange, they seem to be willing to tolerate some
  – Un-availability (= inconvenience)
  – “Dirty data” that needs reconciliation
  – Insecurity

• I see it as our task to make it easier & cheaper to get high availability and Security.

[Diagram: Schedule, Quality, and Functionality, with a "trend" arrow.]


Recommendation #1

• Continue progress on back-ends.
  – Make management easier (AUTOMATE IT!!!)
  – Measure
  – Compare best practices
  – Continue to look for better algorithms.

• Live in fear
  – We are at 10,000 node servers
  – We are headed for 1,000,000 node servers
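A back-of-the-envelope sketch of why scale is frightening. It assumes independent node failures, a hypothetical per-node availability of five nines, and a hypothetical per-node MTTF of 4 years; none of these per-node values are given on this slide.

```python
def probability_all_up(node_availability: float, nodes: int) -> float:
    """Chance that every node is up at a given instant (independent nodes)."""
    return node_availability ** nodes


def expected_failures_per_day(nodes: int, node_mttf_years: float) -> float:
    """Average number of node failures per day across the whole fleet."""
    return nodes / (node_mttf_years * 365.0)


for nodes in (10_000, 1_000_000):
    p = probability_all_up(0.99999, nodes)
    f = expected_failures_per_day(nodes, node_mttf_years=4.0)
    print(f"{nodes:>9,} nodes: all up {p:.4%} of the time, "
          f"about {f:.1f} failures/day")
```

Under these assumptions a 10,000-node service has some node down about a tenth of the time, and a 1,000,000-node service essentially always has failed nodes, so the software must treat node failure as the normal case and management must be automated.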


Recommendation #2

• Current security approach is unworkable:
  – Anonymous clients
  – Firewall is clueless
  – Incredible complexity

• We can’t win this game!

• So change the rules (redefine the problem):
  – No anonymity
  – Unified authentication/authorization model
  – Single-function devices (with simple interfaces)
  – Only one kind of interface (uddi/wsdl/soap/…).


Recommendation #3

• Dependability requires a holistic, not reductionist, approach.

• It’s the WHOLE system (end-to-end, top-to-bottom)

• Hard to publish in this area, hard to get tenure.
  – Journals want theorem+proof and crisp statements.

• Companies want to make money, so do not share their knowledge.

• Dependability is an important social good,

• So, Dependability Research needs government or philanthropic sponsorship


References

Adams, E. (1984). “Optimizing Preventative Service of Software Products.” IBM Journal of Research and Development. 28(1): 2-14.

Anderson, T. and B. Randell. (1979). Computing Systems Reliability.

Garcia-Molina, H. and C. A. Polyzois. (1990). Issues in Disaster Recovery. 35th IEEE Compcon 90. 573-577.

Gray, J. (1986). Why Do Computers Stop and What Can We Do About It. 5th Symposium on Reliability in Distributed Software and Database Systems. 3-12.

Gray, J. (1990). “A Census of Tandem System Availability between 1985 and 1990.” IEEE Transactions on Reliability. 39(4): 409-418.

Gray, J. N., Reuter, A. (1993). Transaction Processing Concepts and Techniques. San Mateo, Morgan Kaufmann.

Lampson, B. W. (1981). Atomic Transactions. Distributed Systems -- Architecture and Implementation: An Advanced Course. ACM, Springer-Verlag.

Laprie, J. C. (1985). Dependable Computing and Fault Tolerance: Concepts and Terminology. 15’th FTCS. 2-11.

Long, D.D., J. L. Carroll, and C.J. Park (1991). A study of the reliability of Internet sites. Proc 10’th Symposium on Reliable Distributed Systems, pp. 177-186, Pisa, September 1991.

Theory and Practice of Reliable System Design, Dan Siewiorek, Robert Swarz

Building Secure and Reliable Network Applications, Ken P. Birman

Darrell Long, Andrew Muir and Richard Golding, “A Longitudinal Study of Internet Host Reliability,” Proc of the Symposium on Reliable Distributed Systems, Bad Neuenahr, Germany: IEEE, 1995, p. 2-9

http://www.netcraft.com/ They have even better for-fee data as well, but for-free is really excellent.

http://www2.ebay.com/aw/announce.shtml#top eBay is an Excellent benchmark of best Internet practices

Empirical Measurements of Disk Failure Rates and Error Rates + C. van Ingen moving 2P with cheap iron

“Consensus on Transaction Commit”, +, L. Lamport, unifies 2PC and Byzantine-Paxos