J. Gray, Dependability in the Internet Era (acknowledgement: slides from J. Gray, E. Brewer)



Page 2: J. Gray, Dependability in the Internet Era (acknowledgement:  slides from J.Gray, E.Brewer)

The Last 10 Years: Availability Dark Ages
Ready for a Renaissance?

• Things got better, then things got a lot worse!

[Figure: Availability (y-axis, 9% up to 99.999%) by year (x-axis, 1950 to 2010) for Computer Systems, Telephone Systems, Cellphones, and Internet]

Page 3: J. Gray, Dependability in the Internet Era (acknowledgement:  slides from J.Gray, E.Brewer)

DEPENDABILITY: The 3 ITIES

• RELIABILITY / INTEGRITY: Does the right thing. (also MTTF >> 1)

• AVAILABILITY: Does it now. (also 1 >> MTTR)
  Availability = MTTF / (MTTF + MTTR)
  System Availability: If 90% of terminals up & 99% of DB up?
  (=> 89% of transactions are serviced on time).

• Holistic vs. Reductionist view

[Diagram labels: Security, Integrity, Reliability, Availability]
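The arithmetic on this slide can be checked in a few lines. The sketch below assumes the terminals and the database fail independently and that a transaction needs both, which is the usual reading of the 89% figure. The function names and the sample MTTF/MTTR values are illustrative and do not come from the slides.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def series_availability(*parts):
    """Availability of a chain in which every part must be up.

    Assumes the parts fail independently, so the probabilities multiply.
    """
    result = 1.0
    for a in parts:
        result *= a
    return result


# Slide example: 90% of terminals up and 99% of the DB up.
end_to_end = series_availability(0.90, 0.99)
print(f"{end_to_end:.3f}")  # 0.891, i.e. about 89% of transactions serviced

# Hypothetical module (not from the slides): fails once a year, 1 h to repair.
print(f"{availability(8760, 1):.5f}")
```

The product rule is why end-to-end availability is always worse than the weakest component: each extra part in the path can only multiply the result by a number below one.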

Page 4: J. Gray, Dependability in the Internet Era (acknowledgement:  slides from J.Gray, E.Brewer)

Fail-Fast is Good, Repair is Needed

Lifecycle of a module: Fault, Detect, Repair, Return

Fail-fast gives short fault latency

High Availability is low UN-Availability

Unavailability ~ MTTR / MTTF

Improving either MTTR or MTTF gives benefit
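As a rough check of the approximation Unavailability ~ MTTR / MTTF, the sketch below compares it with the exact value MTTR / (MTTF + MTTR) and shows that halving MTTR buys the same benefit as doubling MTTF. The numbers (10,000 h MTTF, 1 h MTTR) are hypothetical, chosen only to make the effect visible.

```python
def unavailability_exact(mttf, mttr):
    """Exact steady-state unavailability: MTTR / (MTTF + MTTR)."""
    return mttr / (mttf + mttr)


def unavailability_approx(mttf, mttr):
    """Slide approximation, valid when MTTF >> MTTR: MTTR / MTTF."""
    return mttr / mttf


# Hypothetical module: 10,000 hours between failures, 1 hour to repair.
mttf, mttr = 10_000.0, 1.0

base = unavailability_exact(mttf, mttr)
approx = unavailability_approx(mttf, mttr)
halved_mttr = unavailability_exact(mttf, mttr / 2)
doubled_mttf = unavailability_exact(2 * mttf, mttr)

print(f"exact        {base:.3e}")
print(f"approximate  {approx:.3e}")
print(f"MTTR halved  {halved_mttr:.3e}")
print(f"MTTF doubled {doubled_mttf:.3e}")
```

Because only the ratio matters, faster repair is often the cheaper lever: it does not require making the module fail less often.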

Page 5: J. Gray, Dependability in the Internet Era (acknowledgement:  slides from J.Gray, E.Brewer)

Disks (raid) the BIG Success Story

• Duplex or Parity: masks faults

• Disks @ 1M hours (~100 years)

• But
  – controllers fail and
  – have 1,000s of disks.

• Duplexing or parity, and dual path gives “perfect disks”

• Wal-Mart never lost a byte (thousands of disks, hundreds of failures).

• Only software/operations mistakes are left.
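To see why a 1M-hour disk MTTF still needs duplexing or parity once a site has 1,000s of disks, here is a back-of-the-envelope sketch. It assumes a constant failure rate of 1/MTTF per disk and independent failures; the function name and the farm size of 1,000 are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def expected_failures_per_year(n_disks, mttf_hours):
    """Expected number of disk failures per year in a farm of n_disks.

    Assumes a constant failure rate of 1/MTTF per disk and independent failures.
    """
    return n_disks * HOURS_PER_YEAR / mttf_hours


print(expected_failures_per_year(1, 1_000_000))      # 0.00876
print(expected_failures_per_year(1_000, 1_000_000))  # 8.76
```

A single disk fails on average once in over a century, but a farm of a thousand sees several failures every year, so faults must be masked rather than merely made rare.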

Page 6: J. Gray, Dependability in the Internet Era (acknowledgement:  slides from J.Gray, E.Brewer)

Fault Tolerance vs Disaster Tolerance

• Fault-Tolerance: mask local faults
  – RAID disks
  – Uninterruptible Power Supplies
  – Cluster Failover

• Disaster Tolerance: masks site failures
  – Protects against fire, flood, sabotage,..
  – Also, software changes, site moves,…
  – Redundant system and service at remote site.

Page 7: J. Gray, Dependability in the Internet Era (acknowledgement:  slides from J.Gray, E.Brewer)

Availability

[Figure: availability levels, lowest to highest]

• Un-managed

• Well-managed nodes: masks some hardware failures

• Well-managed packs & clones: masks hardware failures and operations tasks (e.g. software upgrades); masks some software failures

• Well-managed GeoPlex: masks site failures (power, network, fire, move,…); masks some operations failures

Page 8: J. Gray, Dependability in the Internet Era (acknowledgement:  slides from J.Gray, E.Brewer)

Case Studies - Tandem Trends

• MTTF improved

• Shift from Hardware & Maintenance (from 50% to 10%) to Software (62%) & Operations (15%)

• NOTE: Systematic under-reporting of
  – Environment
  – Operations errors
  – Application Software

[Figures: "Outages / 1000 System Years by Primary Cause" and "% of Outages by Primary Cause", for 1985, 1987, and 1989; causes: unknown, environment, operations, maintenance, hardware, software]

Page 9: J. Gray, Dependability in the Internet Era (acknowledgement:  slides from J.Gray, E.Brewer)

Dependability Status circa 1995

• ~4-year MTTF

• 5 9s for well-managed sys. Fault Tolerance Works.

• Hardware is GREAT (maintenance and MTTF).

• Software masks most hardware faults.

• Many hidden software outages in operations:
  – New Software.
  – Utilities.

• Need to make all hardware/software changes ONLINE.
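If the ~4-year MTTF and the five 9s are read together (an assumption; the slide does not state that they describe the same system), the availability formula implies a repair-time budget. The sketch below solves A = MTTF / (MTTF + MTTR) for MTTR; the function name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def max_mttr_hours(mttf_hours, target_availability):
    """Largest MTTR that still meets the target, from A = MTTF / (MTTF + MTTR)."""
    return mttf_hours * (1.0 - target_availability) / target_availability


mttf = 4 * HOURS_PER_YEAR               # ~4-year MTTF from the slide
mttr = max_mttr_hours(mttf, 0.99999)    # five 9s
print(f"{mttr * 60:.0f} minutes")       # about 21 minutes per outage
```

Under these assumptions an outage every four years must be repaired in roughly twenty minutes, which is why offline hardware or software changes do not fit in the budget.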

Page 10: J. Gray, Dependability in the Internet Era (acknowledgement:  slides from J.Gray, E.Brewer)

Progress?

• MTTF improved from 1950-1995

• MTTR incremental improvements 1970 --- failover

• Hardware and Software online change (pNp) is now standard

• Then the Internet arrived:
  – No project can take more than 3 months.
  – Time to market is everything
  – Change is good.

[Figure labels: Computer Systems, Telephone Systems, Cellphones, Internet]

Page 11: J. Gray, Dependability in the Internet Era (acknowledgement:  slides from J.Gray, E.Brewer)

The Internet Changed Expectations

1990

• Phones delivered 99.999%
• ATMs delivered 99.99%
• Failures were front-page news.
• Few hackers
• Outages last an “hour”

2005

• Cell phones deliver 90%
• Web sites deliver 99%
• Failures are business-page news
• Many hackers.
• Outages last a “day”

This is progress?
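To make the percentages concrete, the sketch below converts each availability figure from this slide into downtime per year. It assumes a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability):
    """Expected minutes of downtime per year at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


figures = [
    ("Phones, 1990", 0.99999),
    ("ATMs, 1990", 0.9999),
    ("Web sites, 2005", 0.99),
    ("Cell phones, 2005", 0.90),
]

for label, a in figures:
    minutes = downtime_minutes_per_year(a)
    print(f"{label:18s} {minutes:10.1f} min/year  ({minutes / 1440:.2f} days)")
```

Each lost 9 multiplies downtime by ten: about 5 minutes a year at 99.999%, under an hour at 99.99%, more than 3 days at 99%, and over a month at 90%.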

Page 12: J. Gray, Dependability in the Internet Era (acknowledgement:  slides from J.Gray, E.Brewer)

2006

Page 13: J. Gray, Dependability in the Internet Era (acknowledgement:  slides from J.Gray, E.Brewer)
Page 14: J. Gray, Dependability in the Internet Era (acknowledgement:  slides from J.Gray, E.Brewer)

Eric Brewer said it best:

ACID vs BASE: the internet litmus test

ACID

• Atomicity, Consistency, Isolation, Durability
• Availability?
• Strong consistency
• Isolation
• Focus on commit
• Conservative (Pessimistic)
• Difficult evolution (e.g. schema)
• Nested transactions

BASE

• Basic Availability, Soft State, Eventual Consistency
• Availability FIRST
• Weak consistency
  – stale data is OK
  – Approximate answers OK
• Best effort
• Aggressive (optimistic)
• Easier Evolution.
• Simpler!
• Faster

I think it is a spectrum
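The trade-off can be made concrete with a toy model. The sketch below is not taken from the slides or from any real system; the class `ReplicatedStore` and its methods are hypothetical. It keeps one primary copy and one lazily updated replica: the strong read is always current but fails when the primary is down, while the available read always answers but may return stale data.

```python
class ReplicatedStore:
    """Toy key-value store with one primary and one lazily updated replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []        # writes not yet applied to the replica
        self.primary_up = True

    def write(self, key, value):
        if not self.primary_up:
            raise RuntimeError("primary unavailable")
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply queued writes to the replica (the 'eventual' part)."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read_strong(self, key):
        """ACID-leaning read: always current, fails if the primary is down."""
        if not self.primary_up:
            raise RuntimeError("primary unavailable")
        return self.primary.get(key)

    def read_available(self, key):
        """BASE-leaning read: always answers, but may return stale data."""
        return self.replica.get(key)


store = ReplicatedStore()
store.write("stock", 10)
store.replicate()
store.write("stock", 9)                 # not yet replicated

print(store.read_strong("stock"))       # 9  (current)
print(store.read_available("stock"))    # 10 (stale, but answered)

store.primary_up = False                # simulate a primary outage
print(store.read_available("stock"))    # 10 (still answering)

store.replicate()                       # replica catches up eventually
print(store.read_available("stock"))    # 9
```

A real design would sit somewhere between these two read paths, for example by bounding how stale a replica may be, which is one way to read the "spectrum" remark.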
