111
1 ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability Fundamentals of Dependability Jean-Claude Laprie

Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

  • Upload
    builien

  • View
    228

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

1ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Fundamentals of Dependability

Jean-Claude Laprie

Page 2: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

2ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Dependability: ability to deliver service that can justifiably be trusted

Service delivered by a system: its behavior as it is perceived by its user(s)

User: another system that interacts with the former

Function of a system: what the system is intended to do

(Functional) Specification: description of the system function

Correct service: when the delivered service implements the system function

(Service) Failure: event that occurs when the delivered service deviatesfrom correct service, either because the system does not comply with thespecification, or because the specification did not adequately describe itsfunction

Failure modes: the ways in which a system can fail, ranked according tofailure severities

Part of system state that may cause a subsequent service failure: error

Adjudged or hypothesized cause of an error: fault

Dependability: ability to avoid failures that are unacceptably frequent or

severe

Failures unacceptably frequent or severe: dependability failure

Page 3: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

3ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Absenceof catastrophic

consequences onthe user(s) and the environment

Continuityof service

Readinessfor usage

Absence of unauthorized disclosure of information

Absenceof improper

systemalterations

Ability toundergo

repairs andevolutions

SafetyReliability ConfidentialityAvailability Integrity Maintainability

SecurityAbsence of unauthorized access to, or handling of, system state

Dependability

Authorized actions

Page 4: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

4ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

SafetyReliability ConfidentialityAvailability Integrity Maintainability

FaultPrevention

FaultTolerance

FaultRemoval

FaultForecasting

Faults Errors FailuresActivation Propagation Causation

FaultsFailures …… Causation

Page 5: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

5ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Dependability attributes

! Availability, Reliability, Safety, Confidentiality, Integrity, Maintainability:Primary attributes

! Secondary attributes

" Specialization

# Robustness: dependability with respect to external faults

# Survivability: dependabilty in the presence of active fault(s)

# Resilience: dependability when facing functional, environmental,or technological changes

" Distinguishing among various types of (meta-)information

# Accountability: availability and integrity of the person whoperformed an operation

# Authenticity: integrity of a message content and origin, andpossibly some other information, such as the time of emission

# Non-repudiability: availability and integrity of the identity of thesender of a message (non-repudiation of the origin),or of the receiver (non-repudiation of reception)

Page 6: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

6ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Fault Error Failure

Deviation of thedelivered

service fromcorrect service,

i.e.,implementing

the systemfunction

Part of systemstate that may

cause asubsequent

failure

Adjudged or

hypothesized causeof an error

Failure… …Fault

System does notcomply withspecification

Specification does notadequately describe

function

Dependability Threats

PropagationActivation Causation

Page 7: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

7ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Error ErrorError

activation

propagation prop.

ServiceInterface

IncorrectService: Outage

CorrectService

Failure

Correct ServiceIncorrectService: Outage

Failure

extern

al fault

occurre

nce

Internal fault,dormant

vulnerability

activation

(computationprocess)

Page 8: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

8ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Ca

usa

tio

n

Propagation, Causation

System: Set of components interconnected in order to interactComponent: Another system

Page 9: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

9ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Means for Dependability

Preventingoccurrence orintroduction of

faults

FaultPrevention

Delivering correctservice in spite of

faults

Fault Tolerance

Reducing thepresence of

faults

FaultRemoval

Estimating the presentnumber, the future

incidence and the likelyconsequences of faults

FaultForecasing

Dependability Provision

Dependability Assessment

Fault Avoidance

Fault Acceptance

! !

! !

! !

! !

Page 10: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

10ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

" Original definition: ability to deliver service that can justifiably betrusted

! Enables to generalize availability, reliability, safety,confidentiality, integrity, maintainability, that are then attributesof dependability

" Alternate definition: ability to avoid service failures that areunacceptably frequent or severe

! A system can, and usually does, fail. Is it however stilldependable? When does it become undependable?

$criterion for deciding whether or not, in spite of service failures,a system is still to be regarded as dependable

! Dependence of system A on system B is the extent to which systemA’s dependability is (or would be) affected by that of system B

! Trust: accepted dependence

Dependability definitions

Explicitly Implicitly

Page 11: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

11ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

F. Schneider, ed.,‘Trust in cyberspace’,National AcademyPress, 1999

A. Ellison et al.,‘Survivable networksystems’, SEI Report,1999

‘Informationtechnology frontiers fora new millenium’, BlueBook 2000, NTSC

Reference

1) hostile attacks (from

hackers or insiders)

2) environmentaldisruptions (accidentaldisruptions, either man-made or natural)

3) human and operatorerrors (e.g., softwareflaws, mistakes by humanoperators)

1) attacks (e.g.,

intrusions, probes, denialsof service)

2) failures (internally

generated events due to,e.g., software designerrors, hardwaredegradation, humanerrors, corrupted data)

3) accidents (externallygenerated events such asnatural disasters)

• internal and externalthreats

• naturally occurringhazards and maliciousattacks from asophisticated and well-funded adversary

1) development faults(e.g., software flaws,hardware errata, maliciouslogic)

2) physical faults (e.g.,production defects,physical deterioration)

3) interaction faults(e.g., physicalinterference, inputmistakes, attacks,including viruses, worms,intrusions)

Threatspresent

assurance that asystem will perform asexpected

capability of a systemto fulfill its mission in atimely manner

consequences of thesystem behavior arewell understood andpredictable

1) ability to deliverservice that canjustifiably be trusted

2) ability of a system toavoid service failuresthat are unacceptablyfrequent or severe

Goal

TrustworthinessSurvivabilityHigh ConfidenceDependabilityConcept

Dependability and similar notions

Page 12: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

12ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Attributes

AvailabilityReliabilitySafetyConfidentialityIntegrityMaintainability

Dependability Means

Fault PreventionFault ToleranceFault RemovalFault Forecasting

Threats

FaultsErrorsFailures

Security

Page 13: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

13ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Faultforecating

Ordinal or qualitative evaluation

Probabilistic orquantitative evaluation

Modeling

Operational testing

Faulttolerance

Error detection

System recoveryError handling

Fault handling

Development

Static verification

Dynamic verificationVerification

Diagnosis

CorrectionNon-regression verificationFault

removalOperational life Corrective or preventive maintenance

Means

Faultprevention

Attributes

Availability/Reliability

Safety

Confidentiality

Integrity

Maintainability

Threats

Faults

Errors

Failures

Development

Physical

Interaction

Dependability

Page 14: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

14ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

111!References

109!Conclusion

102!From dependability to resilience

90!Development of dependable systems

66!Fault tolerance

57!Fault forecasting

45!Fault removal

28!State of the art from statistics

15!Threats: faults, errors, failures

Slides

Outline

Page 15: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

15ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Dependability threats: faults, errors, failures

Page 16: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

16ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Error of a programmer

FaultImpaired instructions or data

ActivationFaulty component and inputs

Error

PropagationWhen delivered service deviates (value, timing) from

implementing function

Failure

Page 17: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

17ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Short-circuit in integrated circuit

Failure

FaultStuck-at connection, modification of circuit function

ActivationFaulty component and inputs

Error

PropagationWhen delivered service deviates (value, timing) from

implementing function

Failure

Causation

Page 18: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

18ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Operator ErrorInappropriate human-system interaction

Fault

Error

PropagationWhen delivered service deviates (value, timing) from

implementing function

Failure

Page 19: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

19ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Electromagnetic perturbation

Fault

Error

ActivationFaulty component and inputs

FaultImpaired memory data

PropagationWhen delivered service deviates (value, timing) from

implementing function

Failure

Page 20: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

20ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Faults Errors Failures

Phase of creationor occurrence

Development faults

Operational faults

System boundariesInternal faults

External faults

Phenomenologicalcause

Natural faults

Human-made faults

PersistencePermanent faults

Transient faults

FaultsFailures ……

IntentMalicious faults

Non-malicious faults

Capability

Accidental faults

Incompetence faults

Deliberate faults

Signaled failures

Unsignaled failuresDetectability

ConsistencyConsistent failures

Inconsistent(Byzantine) failures

Consequences

Minor failures

Catastrophic failures

%

%

%

Content failuresDomain

Timing failures

PropagationActivation CausationCausation

Page 21: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

21ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Failures

Détectability

Signalled failures [delivered service is detected as incorrect, and signalled as such]

Unsignalled failures[incorrect service deivery is not detected]

Consistency

Consistent failures[incorrect service identically perceived by all users]

Inconsistent, or Byzantine, failures[some, or all, users perceive differently incorrect service]

Consequences

Minor failures[harmful consequences are of similar cost to the benefits provided by correctservice delivery ]

Catastrophic failures[cost of harmful consequences is orders of magnitude, or evenincommensurably higher than the benefits provided by correct service delivery]

•••

Domain

Value failures[value of delivered service deviates from implementing system function]

Timing failures[timing of service delivery (instant or duration) deviates from implementingsystem function]

Page 22: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

22ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Failuredomain

Value(correct timing)

Timing(correct value)

Value andtiming

Value failures

Service deliveredtoo early

Early timing failures

Service deliveredtoo late

Late timing failures

Suspendedservice

Halt failures

Erraticservice

Erratic failures

Non signalling of incorrect service: unsignalled failure

Signalling incorrect service in absence of failure: false alarm

• halt failures: fail-halt system

• minor failures: fail-safe system

Failure of detecting mechanisms

System whose all failures are, to an acceptable extent

Page 23: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

23ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Faults

Phase of creationor occurrence

Development faults[occur during (a) system development, (b) maintenance during the use phase,and (c) generation of procedures to operate or to maintain the system]

Operational faults[occur during service delivery of the use phase]

Systemboundaries

Internal faults[originate inside the system boundary]

External faults[originate outside the system boundary and propagate errors into the system by interaction or interference]

Phenomenologicalcause

Natural faults[caused by physico-chemical natural phenomena without human participation]

Human-made faults[result from human actions]

Intent

Malicious faults[introduced by a human with the malicious objective of causing harmto the system]

Non-malicious faults[introduced without a malicious objective]

Deliberate faults[result of a decision] Capability

Accidental faults[introduced inadvertently]

Incompetence faults[result from lack of professional competence by the authorized human(s),or from inadequacy of the development organization]

Persistence

Permanent faults[presence is assumed to be continuous in time]

Transient faults[presence is bounded in time]

Page 24: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

24ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Development faults Physical faults Interaction faults

Faults

Perm. Perm. Perm. Perm. Perm. Perm. Trans. Perm. Trans. Trans. Trans. Perm. Trans. Trans. Perm.

Nonmalicious

Malicious Nonmalicious

Nonmalicious

Nonmalicious

Nonmalicious

Malicious

Internal Internal External

Development Operational

Accid. Inc. Accid. Accidental Accidental Accid. Delib. Incompetence DeliberateDelib. Delib.

Persistence

Intent

System boundaries

Phase of creationor occurrence

Capability

Phenomenologicalcause

Human-made

Natural Natural NaturalHuman-made

Maliciouslogic

PhysicalDeterior.

PhysicalInterference

IntrusionAttempts

VirusesWorms

Input Mistakes

Overload

Configuration Mistakes

Design Flaws

Production Defects

Page 25: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

25ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Human-made Faults

Non-malicious MaliciousIntent

Accidental(Mistakes)

Deliberate(Bad decisions)

DeliberateIncompetenceCapability

Interaction(operators,

maintainers)&

Development(designers)

Malicious logicfaults:

logic bombs,Trojan horses,

trapdoors,viruses, worms,

zombies

Intrusionattempts

Individuals&

organizations

Development faultsPhysical faults Interaction faults

Hardware Software System

Page 26: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

26ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Failure pathology

Activation of dormant fault by computation process

Activation of vulnerability

" Active fault produces error(s)

" Error is latent or detected

Error can disappear before detection

Error propagation Other errors

" Component failure

Fault as viewed by• Components interacting with failed component• System containing failed component

Fault Error Failure FaultActivation

oroccurrence

Propagation Causation(interaction,composition)

External fault

Page 27: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

27ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Fault Error FailureFailure… …Fault

Erroralters

serviceFacility forstoppingrecursion

Contextdependent

PropagationActivation orOccurrence

Causation

Interaction orcomposition

Activationreproducibility

Solid (hard)faults

Elusive (soft)faults

Interactionfaults

Prior presenceof a vulnerability:Internal fault that

enables anexternal fault to

harm the systemPermanent faults

(development,physical, interaction)

Transient faults(physical, interaction)

Elusive faults

Solid faults Intermittent faults

Page 28: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

28ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

State of the art from statistics

Page 29: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

29ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

June 1980: False alerts at the North American Air Defense (NORAD)

June 1985 - January 1987: Excessive radiotherapy doses (Therac-25)

November 1988: Internet worm

15 January 1990: 9 hours outage of the long-distance phone in the USA

February 1991: Scud missed by a Patriot (Dhahran, Gulf War)

November 1992: Crash of the communication system of the Londonambulance service

26 and 27 June 1993: Authorization denial of credit card operations in France

4 June 1996: Failure of Ariane 5 maiden flight

13 April 1998: Crash of the AT&T data network

February 2000: Distributed denials of service on large Web sites

May 2000: Virus I love you

July 2001: Worm Code Red

August 2003: Propagation of the electricity blackout in the USA and Canada

August 1986 - 1987: the "wily hacker" penetrates several tens ofsensitive computing facilities

October 2006: 83,000 e-mail addresses, credit card info, banking transactionfiles stolen in UK

Faults

Ph

ysic

al

De

ve

lop

me

nt

Inte

ractio

n

Ava

ilab

ility

/R

elia

bili

ty

Sa

fety

Co

nfid

en

tia

lity

Lo

ca

lize

d

Dis

trib

ute

d

Failures

" ""

" ""

" " ""

" ""

" " """

" " """

"" ""

" " "

" " ""

" " ""

" " ""

" " ""

" " ""

" " ""

" " ""

Page 30: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

30ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

1

1.1

1.2

1.4

1.6

2.8

Industrysector

Banking

Retail

Insurance

Financial institutions

Manufacturing

Millions of $revenue/hourlost

EnergyAverage outage costs#

# Maintenance costs

Space shuttle on-board software: 100 M $ / an!

# Cost of software project cancellation (failure of the development process)

FAA AASEstimate 1983

1 G $

Estimate 1988(contract passed)

4 G $

Estimate 1994

7 G $

Timing shift(estimate 1994)

6 - 8 yrs

!

USA [Standish Group, 2002,13522 projects]

Success

34%

Challenged

51%

Cancelled

15%

loss ~ 38 G$ (out of total 225 G$)

!

Yearly cost of failures#

Accidental faults

Malicious faults

UK

1.25 G£

France (private sector)

0.9 G!

1 G!

USA

4 G$

Estimates of insurance companies (2000)

Global estimate USA : 80 G$ UE : 60 G!

Page 31: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

31ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Accidental faults

15-20%2~ 60%1Development

40-50%1~ 20%2Human interactions

15-20%2~ 10%3Physical external

15-20%2~ 10%3Physical internal

ProportionRankProportionRankFaults

Controlled systems

(e.g., civil airplanes, phonenetwork, Internet frontend

servers)

Dedicated computingsystems

(e.g., transactionprocessing, electronic

switching, Internet backendservers)

Number of failures

[consequences and outagedurations depend uponapplication]

Page 32: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

32ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Permanent faults and productionyield for Intel microprocessors

1 FIT = 10-9/hDPM : defects per million

1

10

100

1000

10000

1972

1976

1980

1984

1988

1992

1996

2000

FIT

DPM

1980 1990 2000

-1

-2

310

1

10

10

10

10

8086

286DX 386DX

486DX Pentium

Pentium MMX

K7

Pentium2

Pentium3

Pentium4

PentiumM

Athlon

PowerPC4

Itanium2

2

Number of transistors byintegrated circuit (x 106)

1st microprocessor : 4004,1971, 2300 transistors

Page 33: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

33ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Transient operational physical faults:proportion increases with integration

Reduced geometric dimensions

Lower and lower energy

Environmental perturbations become faults more easily

Directly:! particles,

outer spaceparticles

Indirectly

Limitation of electrical fields

Lower supply voltage

Redfuced noise immunity

Page 34: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

34ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Cosmic rays

Low energy deflected

Primariesdisappear~ 25 km

~ 1600/m2-s cascade to sea level

~ 100/cm2-s at 12000 m

~ 1/cm2-s sea-level flux

µ-

! "0

"+

"-

e+

e-

e+e-

!

!

!

!

n

pp

np

np

np

p

n

n

Page 35: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

35ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

" Andrew network of CMU, 13 SUN stations, 21 station.yrs

689298System failures

6552

183

29

1056

Permanent faults

Intermittent faults

Mean time tomanifestation (h)

Number ofmanifestations

" Tandem experimental data

Examination of anomaly log files (several tens of systems, 6months): 132 recorded software faults

• 1 solid fault (« bohrbug »)• 131 elusive faults (« heisenbugs »)

Page 36: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

36ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

• Complexity

• Economic pressure

[From J. Gray, ‘Dependability in the Internet era’]

36j 12h0,9

3j 16h0,99

8h 46mn0,999

52mn 34s0,9999

5mn 15s0,99999

32s0,999999

Outage duration/yrAvailability

1950 1960 1970 1980 1990 2000

9%

99%

99.9%

99.99%

99.999%

99.999%

Computer Systems

Telephone Systems

Cellphones

Internet

Availa

bili

ty

2010

Page 37: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

37ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Mean time to system crash, dueto hardware failure

1985 1990 1995 20000

10

20

30

ECL-TCM

CMOS

308X/3090

9020

G1-G5

MT

TF

(yrs

)

21 yearsSystem MTBF

438Reported outages

20000074000Disks

8000025500Processors

300009000Systems

70002000Clients

Duration (yrs)Number

TandemHigh end IBM servers

[From J. Gray, ‘A Census of Tandem SystemAvailability Between 1985 and 1990’, IEEE Tr. Onreliability, Oct.1990]

[From L. Spainhower and T. A. Gregg, ‘IBM S/390Parallel Enterprise Server G5 fault tolerance: Ahistorical perspective’, IBM J. Research andDevelopment, 1999]

Environment

Total

Maintenance

Operations

Software

Hardware

50

100

150

200

250

300

350

400

450

0

MTBF (yrs)

Page 38: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

38ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

AvailabilityMTTR

98.598 hours

99.011 hour

99.8310 mins

99.981 minAvailabilityfor 100h

MTBF

Website uptime statistics (Netcraft)

Top 50 most requested sites

0

50

100

150

200

250

300

Dec

2003

Nov

2004

July

2005

July

2006

Sep

2007

June

2008

Avg of Avg

Avg of Max

Page 39: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

39ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Average availability

Average TTRby part ofservice (hrs)

Service failurecause bylocation

Servicecharacteristic

Website

97.8%97.2%93.5%

1.2 (2 serv. fai.)1.2 (16 serv. fai.)7.8 (4 serv. fai.)Network

14 (3 serv. fai.)0.2 (1 serv. fai.)7.3 (5 serv. fai.)Back-end

2.5 (10 serv. fai.)N/A9.4 (16 serv. fai.)Front-end

4%9%2%Unknown

18%81%18%Network

11%10%3%Back-end

66%0%77%Front-end

39 hours206 hours126 hoursMTTF

562140Service failures

205N/A296Component failures

3 months6 months7 monthsPeriod of data stud.

Open-source OSon x86

Open-source OSon x86

Solaris onSPARC and x86

Front-end nodearchitecture

~500, ~15 sites>2000, 4 sites~500, 2 sites# of machines

~7 million~100 million~100 millionHits per day

Content

(bleeding edge)

Readmostly

(mature)Online

(mature)

Three large websites [from D. Oppenheimer, A. Ganapathi, D.A. Patterson, ‘Why

do Internet services fail, and what can be done about it?’, USISTS ‘03]

Page 40: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

40ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Component failure to service failure

node operator

node hardware

node software

net

Content

32

90

48

814

6 9103

10

06

0 10

10

20

30

40

50

60

70

80

90

100

component failure

service failure

node opera

tor

node hard

ware

node softw

are

net opera

tor

net hard

ware

net softw

are

net unkn

own

Online

operator hardware software

40

104

54

10 9 10

75

91

81

component failure

service failure

coverage (%)

0

40

80

120

0

10

20

30

40

50

60

70

80

90

100

36

18

59

37

18

1

14

7

50

94

7681

component failure

service failure

coverage (%)

Page 41: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

41ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

0

1000

2000

3000

4000

5000

6000

7000

1995

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

SEI/CERT Statistics: vulnerabilities reported

Malicious faults

Page 42: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

42ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

The geographic spread of Sapphire in the 30 minutes after release.

Slammer/Sapphire worm

The fastest computer worm in history. As it began spreading throughout the Internet, it doubled in sizeevery 8.5 seconds. It infected more than 90 percent of vulnerable hosts within 10 minutes.The wormbegan to infect hosts slightly before 05:30 UTC on Saturday, January 25, 2003. Sapphire exploited abuffer overflow vulnerability in computers on the Internet running Microsoft's SQL Server or MSDE 2000(Microsoft SQL Server Desktop Engine). This weakness in an underlying indexing service was discoveredin July 2002; Microsoft released a patch for the vulnerability before it was announced. The worm infectedat least 75,000 hosts, perhaps considerably more, and caused network outages and such unforeseenconsequences as canceled airline flights, interference with elections, and ATM failures.

[From: http://www.caida.org/publications/papers/2003/sapphire/sapphire.html]

Page 43: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

43ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Global Information Security Survey 2004 — Ernst & Young

Loss of availability: Top ten incidents

Percentage of respondents that indicated the following incidents resulted in an unexpected orunscheduled outage of their critical business

0% 20% 40% 60% 80%

Hardware failures

Major virus, Trojan horse, or Internet worms

Telecommunications failure

Software failure

Third party failure, e.g., service provider

System capacity issues

Operational erors, e.g., wrong software loaded

Infrastructure failure, e.g., fire, blackout

Former or current employee misconduct

Distributed Denial of Servive (DDoS) attacks

Non malicious76%

Malicious24%

Page 44: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

44ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Yearly survey on computer damages in France — CLUSIF (2000, 2001, 2002)

3 year trends& stable' increase( decrease

Occurrences

0

5,0%

10,0%

15,0%

20,0%

25,0%

Internal failures Loss of

essential services

Naturalevents

Physical accidents

Utilization errors

Design errorsVirus

infection

Targetted logic

attacks

Disclosures and frauds

Image damage

Thefts

Physical sabotage

Non malicious causes71%

!

"

"

"

!

#

#

"!

"

!

!

Malicious causes29%

Risk perception

Non malicious causes29%

!

!

"

"

""

!

#

#

Internal failures

Loss of essential services

Naturalevents

Physical accidents

Utilization errors

Design errorsVirus

infection

Targetted logic

attacks

Disclosures and frauds

Image damage

Thefts

Physical sabotage

Malicious causes71%

10,0%

20,0%

30,0%

Occurrence impact

Loss ess. serv.

Int. fail.

Nat events

Util. err.

Des. err.

Virus Infec.

Thefts

High

Average

Low

Non malicious causes Malicious Causes

0%

20%

40%

60%

80%

100%

Page 45: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

45ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Fault Removal

Page 46: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

46ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Reducing the presence of faults

Diagnosis

Correction

Non-regression verification

Verification Checking whether the systemsatisfies verification conditions

general specific

Page 47: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

47ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Verification

DynamicStatic

Static analysis

System

Effective execution

Reviews andinspections

Staticanalysis

Theoremproving

Systembehaviormodel

Modelchecking

Symbolicinputs

Systemmodel

Symbolicexecution

Valued inputs

System

Test

Specification

Design

Implementation

%

%

% %

%

%

%

%

%

% %

Page 48: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

48ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Reviews and inspections $ Testing

! Schneider Electric data: nuclear control software

0

50

100

150

200

250

300

Componentstatic

analysis

Unittests

Integration tests

Qualification tests

Number of uncovered faults

Verification effortin man.day

! HP data Uncovered faults/hour

Operational testing 0.21

Functional testing 0.28

Structural testing 0.32

Inspections 1.06

Page 49: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

49ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

! Space shutlle on-board software (500 000 lines de code)

Versions

0

5

10

15

20

25

30

5/83 12/85 6/91

Product (operation)

Process (testing)

Early detection (pre-build)

Faults /

KS

LO

C

Page 50: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

50ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Cost of software verification

! Average figures: for critical systems, " 70%

Corrective20%

[D,I,V]Adaptative

25%[S,D,I,V]

Perfective55%

[R,S,D,I,V]

Maintenance67%

Verification15%

Implementation7%

Design5%

Specification3%

Requirements3%

Verification45%

Implementation22%

Design17%

Specification 8%

Requirements 7%

Verification46%

Page 51: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

51ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Relative cost of fault removal

1976(Boehm)

1995(Baziuk)

Design 5 - 7.5

Implementation 8 - 15

Requirements & specification 1 1

Integration 10 - 25 90 - 440

Qualification 25 - 75 440

Operation/Maintenance 470 - 88050 - 500

Cost of fault removal all the more high as it is late

IBM data

Fa

ult p

erc

en

tag

e

Implem Unittesting

Integr. testing

Qualif. testing

Operat.life

Faultsintroducedby phase

Faultscorrectedby phase

Cost offault

correctionby phase

Page 52: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

52ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Functional specs review 1%

Detailed specs review 2%

Inspection of component logic 2%

Inspection of component code 3%

Unit testing 4%

Functional testing 6%

System testing 10%

Percentage of erroneous corrections wrt number ofremoved faults

Page 53: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

53ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Incorrect or poorly expressed requirements

Incorrect or poorly expressed specification

Design-implementation faults affectingseveral components

Design-implementation faults affectingone components

Typographical fault

Regression fault

Others

0 10 20 30 40

Percentage of uncovered faults

Typical distribution of software faults uncovered during development

Page 54: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

54ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Persistent software faults classified by manifestation frequency

(200 faults examined)

1. Omitted logic (existing code too simple)

2. Non re-initialization of data

3. Regression fault

4. Documentation fault (correct software )

5. Inadequate specification

6. Faulty binary correction

7. Erroneous comment

8. IF instruction too simple

9. Faulty data reference

10. Data alignment fault (left bitswrt right bits, etc.)

11. Timing fault causing data loss

12. Non-initialization of data

13. Other categories of lesser importance (total less than or

equal to 4)

Category:Projet B

(100 k ins.) Total

60

23

17

16

11

11

11

11

10

7

6

5

Projet A(500 K ins.)

24

6

12

6

1

11

11

2

4

3

3

1

36

17

5

10

10

0

0

9

6

4

3

4

Page 55: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

55ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Fault density

! Created faults: 10 à 300 / KLOC

! Residual faults: 0.01 à 10 / KLOC

Example : increments of AT&T ESS-5

Size of Density of faults Density of faults Learning Type ofincrement uncovered during uncovered in factor software

development the field (y/x)(x) (y)

42069 28,5 7,2 0,25 Preventive maintenance5422 67,3 21,0 0,31 Billing9313 79,3 27,7 0,35 On-line upgrading14467 26,5 7,2 0,27 System growth

165042 101,6 5,3 0,05 Hardware fault recovery16504 84,1 2 0,02 Software fault recovery38737 149,4 5,8 0,04 System integrity

Page 56: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

56ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

0

20

40

60

80

100

120

140

Pentium[march 95]

PentiumPro

[nov 95]

PentiumII

[may 97]

MobileP. II

[aug 98]

Celeron[apr 98]

Xeon[may 98]

Total

Present jan 99

Design faults (‘errata’) of Intel processors

Desig

n faults

Processors[date of first update]

Page 57: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

57ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Fault Forecasting

Page 58: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

58ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Identification, analysis ofconsequences, and classification

of failures

Probabilistic evaluation of the extentto which some dependability

attributes are satisfied

Ordinal or

qualitativeevaluation

Probabilistic or

quantitative evaluation

Estimation of presence, creation and consequences of faults

Operational testing

Evaluation testaccording to

operational inputprofile

Behavior model of system /failure, maintenance actions,

solicitations

Modelling

Page 59: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

59ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Failure mode and effect (and criticality)analysis [FME(C)A]

Reliability diagrams

Fault trees

State diagrams

Markov chains

Petri nets

Ordinal evaluation Probabilistic evaluation

Page 60: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

60ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

a

c

Time

Failure intensity (average number of failures per unit of time)

Reliability growth

Improved ability to delivercorrect service

[stochastic increaseof times to failure]

Decreasing failure intensity

Non-stationary stochasticprocesses

Stable reliability

Preserved ability to delivercorrect service

[stochastic equalityof times to failure]

Constant failure intensity

Stationary stochasticprocesses

Reliability decrease

Degraded ability to delivercorrect service

[stochastic decreaseof times to failure]

Increasing failure intensity

Page 61: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

61ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Before version X

Version X

Version X+1

Version X+2

Version X+3

Version X+4

IBM Data

18

16

14

12

10

8

6

4

2

0

Avera

ge n

um

ber

of uniq

ue faults / 3

month

s

Relative month where failure first reported

1 3 5 7 9

11

13

15

17

19

21

23

25

27

29

31

33

35

Version X+1 X+2 X+3 X+4 X+5

Page 62: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

62ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

0

.1

.2

.3

.4packs p

er

1k lin

es p

er

month

(/5

0)

Yr 3Yr 2Yr 1

Hardware replacement rate trend

Field data

AT&T Data

Page 63: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

63ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Page 64: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

64ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Page 65: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

65ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Study of the FAA Airworthiness Directives (AD)

1 Jan. 1980 - 21 Sept. 1994

Confirmed avionics ADs : 33

Hardware : 20 Software : 13

Equipments Rockwell/CollinsBendix/KingHoneywell/SperryTracor Aerospace

Estimation of software reliability

10-610-710-810-910-10

DME (88)

DME (94)

TCAS II (91)

TCAS (94)

Omega

ATC Transp.

AverageNo software problem reported

Failure rate (h-1)

Page 66: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

66ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Fault Tolerance

Page 67: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

67ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Delivering service implementing system function in spite of faults

Error detection: identification of error presence

Error handling: error removal from system state, if possiblebefore failure occurrence

Fault handling: avoiding fault(s) to be activated again

System recovery: transformation of erroneous state in a state freefrom detected error and from fault that can be activated again

Page 68: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

68ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Concurrent detection, during service delivery

Preemptive detection: service delivery suspended, search forlatent errors and dormant faults

Rollback: brings the system back into a state saved prior to erroroccurrence

Saved state: recovery point

Rollforward: new state (free from detected error) found

Compensation: erroneous state contains enough redundancy forenabling error masking

Addition of error detection mechanisms in component

Self-checking component

Error detection

Error handling

Page 69: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

69ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Diagnostics: identifies and records the error cause(s), in terms oflocalisation and category

Isolation: performs physical or logical exclusion of the faultycomponent(s) from further contribution to service delivery, i.e.,makes the fault(s) dormant

Reconfiguration: either switches in spare components or reassignstasks among non-failed components

Reinitialization: checks, updates and records the new configuration,and updates system tables and records

Fault handling

! Intermittent faults

" Isolation and reconfiguration not necessary

Error handling Non recurrenceof error

Fault diagnosis Absenceof fault

Intermittentfault

" Identification

Page 70: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

70ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Error detection and systemrecovery

orDetection - recovery

Fault masking andsystem recovery

orMasking

Error detection

Error handling

Systematic application

even in error absence

2

Rollback 2

1

Rollforward 2

1

Compensation 12

1

Fault handling 3 3 3 3

Page 71: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

71ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

coverage factor influent when 1-c ~

Effectiveness of error processing: coverage factor

c = P{ service is delivered ! failed component }

Component failure is covered if system error successfully processed

Duplex system, active redundancy

Component failure rate: "

Repair rate: µ -c = 1 - c

Reliability

R(t) ! exp{ -2 ( -c + "/µ ) " t }

MTTF ! 1"

2"("-c"+""

µ")""

Unavailability

-A = 1 - A ! 2 ( -c + "/µ ) ( " / µ )

-ANR = " / µ

Equivalent failure rate "eq = 2 ( -c + "

µ ) "

2 -c " ! non-covered failure

2 ( " / µ ) " ! two successive failures, first failure covered

# "/µ

Page 72: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

72ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

µ

!

µ

!

!

eq

10-1

10-2

10-3

10-4

1

10-4

10-3

10-2

c = .95

c = .9

c = .99

c = .995

c = .999 c = 1.2

.4

.6

.8

1

0

10 -2 10 -1 10 10 2 10 31

! t

R(t)

c = .95

c = .99

c = .995

c = .999

c = 1

!

µ = 10

-3

10-4 10 -3 10-2

10 -23,6 j / yr

10 -38,8 h / yr

10 -453 min / yr

10 -55 min / yr

10 -630 sec / yr

10 -73 sec / yr

c = .9

c = .95

c = .99

c = .995

c = .999 c = 1

NON REDUNDANT

!

-A

Page 73: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

73ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

! Tandem

Dynabus

Dynabus interface

CPU

Memory

I/O channels

Disk control.

Disk control.

Disk control.

Tape control.

VDU control.

Processor

moduleProcessor

module

Processor

module

Page 74: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

74ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Experimental data

Systems 9000

Processors 25500

Disks

MTBF System 21 years

1 2 3 4 5 6 7 80

50

100

150

200

71

183

53

21

1 1 1

Num

ber

of re

port

ed failu

res

Length of fault chains

Length of fault

chains having

led to failure

Environment

Total

Maintenance

Operations

Software

Hardware

50

100

150

200

250

300

350

400

450

0

MTBF (yrs)

Page 75: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

75ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Hierarchy of faulttolerant protocols

according to relation"uses service”

Group membership

Atomic multicast

Clock synchronization

Common mechanism:consensus

Distributed system: set of processors (processing and storage)

communicating by messages via communication network

Pro

cesso

r

Processor Processor Processor

Processor

Processor Processor Processor

Communication network

Page 76: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

76ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

! Implies self-checking processors

! Consequences:

1)! Only correct value messages are sent

" minimum number of processors for consensus in presence of t faults: n"#"t+2

2)! Error detection: halting detection by interrogating and waiting

3)! Resource replication for tolerating t faults ("halts"): # t+1

4)! Communication network saturation by “babbling” is impossible

5)! Network architecture:

Fail-silent processors

Page 77: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

77ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

! Induced complications:

1)! Possibility of non consistent (“byzantine”) behavior

" minimum number of processors for consensus in presence of t faults: n"#"3.t+1

" t+1 exchanges of messages

2)! Error detection $ halt detection

3)! Resource replication for tolerating t faults # 2.t+1

4)! Communication network saturation by “babbling” possible

5)! Possibility of "lying" on source address

6) ! Network architecture:

Arbitrary failure processors

Page 78: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

78ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

(1) 2 3 9

1 (2) 3 4

1 2 (3) 9

X X X (4)

1

2

3

4

P1P2 P3 P4 P1P2 P3 P4 P1 P2 P3 P4 P1P2 P3 P4Origin

Private values

P1 P2 P3 P4

After 1st exchange

– – 3 4

– 2 – 9

– 5 6 –

– – 3 9

(1 2 3 4)

1 – – 9

6 – 7 –

– 2 – 9

1 – – 4

(1 2 3 9)

7 8 – – (X X X 4)

(1 2 3 9)

X X – X

X – X X

– X X XFrom P1

From P2

From P3

From P4

After 2nd exchange

After majority vote 1 2 3 9 1 2 3 9 1 2 3 9 X X X X

P1P2 P3 P4 P1 P2 P3 P4 P1 P2 P3 P4 P1 P2 P3 P4

Broadcastingprivate values

Broadcastingvalues receivedfrom theirprocessors

Page 79: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

79ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

! Practical aspects

" Performance overhead due to volume of exchanged data

# t=1 % 9 values received by each processor

# t=2 % 156 values received by each processor

" Use of signatures

Signatures attached to exchanged data such that corruption probability withoutdetection is negligible

& 2t+1 processors and communication channels

& t+1 exchanges

& Reduction of volume of data exchanged

) t=1% 4 values received by each processor

) t=2% 40 values received by each processor

Page 80: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

80ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

(176,178,233)

178

(176,178,233)

178

(176,178,233)

178

177

176

233

178

! Other utilization: data with likely variations (e.g., from sensors),median vote

(110,176,178)

176

(176,177,178)

177

(176,178,233)

178

177

176

?

178

(176,177,178)

177

(176,177,178)

177

(176,177,178)

177

Page 81: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

81ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Prevention of errorpropagation

Fail-fast

Error detection(defensive

programming)and exception

handling

Service continuity

Error detectionand

recovery points

Design diversity

Recoveryblocks

Two-folddiversity

+acceptance

test

N-Versionprogramming

Three-folddiversity

+vote

N-self-checkingprogramming

Four-folddiversity

+switching

Two-folddiversity

+comparison

Doubleprogramming

Soft faults Solid faults

Development fault tolerance

Page 82: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

82ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Examples of recovery points

Tandem process pairs

Statistics on 169 software failures

1 processor halted

Several processors halted

138

31

efficiency: 82% (138/169)# does not account for under-reporting

Causes of multiple processor halts

2nd halt: processor executing the backup 24of failed primarysame fault primary-backup 17different fault 4unclear 3Not process-pair related 4Unclear 3

Page 83: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

83ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

"Aim: failure independency

! Obstacles: common specification, common difficulties,inter-variant synchronizations and decisions

"Operational use

! Civil avionics: generalized, at differing levels

! Railway signalling: partial

! Nuclear control: partial

"Dependability improvement

! Gain, although less than for physical fault tolerance

! Contribution to verification of specification

Design diversity

Page 84: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

84ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

CRA - NCSU - RTI - UCLA/UCSB - UIUC - UVA EXPERIMENT

Sensor management in a redundant inertia platform

20 programmes (1600 to 5000 lines)

920746 tests on flight simulator

Average failure probab 1 version 5.3 10-3

Average failure probab 3 -version systems 4.2 10-4

Reliability improvement : 13

Page 85: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

85ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Architecture of Airbus computers

28V

ProcessorMemory I/O

Power

Control channel

Monitoring channelPower

ProcessorMemory I/O

Protection lightning, EMI, over/under voltage

Page 86: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

86ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Airbus A-320

Ailerons

Horizontal plane

Elevators

SFCC1

SFCC2

Rudder

Slats and

flaps

FAC1

FAC2

SpoilersSEC1

SEC2

SEC3

ELAC1

ELAC2

Page 87: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

87ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Verification and evaluation of fault tolerance

Faults (deficiencies) in algorithms and mechanisms of fault tolerance

Fault toleranceCoverage

Modelling Test

Evaluation / Influence

Fault forecasting

Improvement

Fault removal

Test

Dynamicverification

Staticverification

Model checking

Fault injection

Page 88: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

88ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

ActivityTargetsystem

Inputs

Faults

Outputs

Correct/Incorrect

Error detection,error and faulthandling

Fault injection

By hardware

•Radiations

•Interferences

•Pins

Physical

By software

•Memory

•Executive

•Processor

Insimulation

Informational

Injection

Prototype oractual system

Simulationmodel

Target system

! Representativityof faults

Page 89: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

89ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Delta-4 system (LAAS)

Robustness testing of POSIX calls (CMU) Test of Chorus Classix R3 (LAAS)

Detectederrors

Toleratederrors

ErrorsFaults

Failures

94% 85% 99%

3% 1%

12%

Dig

ital U

nix

4.0

Fre

e B

SD

2.2

.5

HP

UX

B.1

0.2

0

0

0,05

0,1

0,15

0,2

0,25

Failu

re p

robabili

ty

AIX

4.1

Irix

6.2

Lin

ux 2

.0.1

8

LynxO

S 2

.4.0

NetB

SD

1.3

QN

X 4

.2.4

SunO

S 5

.5

Kern

el

Debug.

No o

bs

Err

or

sta

tus0

10

20

30

40

50

60

70

80

90

Syst.

hang

Excep.

Appl

.hang

Incor.

re

s.

% r

esponses

Internal faults

External faults

Average Min-Max

Average Min-Max

Page 90: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

90ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Development of DependableSystems

Page 91: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

91ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Dvelopment steps

Requirements and specification

Design

Realisation(including selection and

adaptation of pre-existingcomponents)

Integration

Qualification

Means fordependability

Fault Tolerance

Fault Removal

Fault Forecasting

Fault Prevention

Page 92: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

92ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Architecturaldesign

QualificationSpecification

Detaileddesign

Componentrealisation

Integration

Creation

Verification

V development model

Page 93: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

93ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Effort distribution

30%30%40%1990

20%60%20%1980

10%80%10%1960-1970

QualificationIntegrationComponentrealisation

Detaileddesign

Architecturaldesign

Requirementsand

specificationYears

Page 94: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

94ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Standards

!CEI 61508 (1998) – Combination of a damage probability and of itsseverity

! ISO/CEI 15026 (1998) – A function of the probability of occurrence of agiven threat and the potential adverse consequences of that threat’soccurrence

!MIL-STD-882D (February 2000) – An expression of the impact andpossibility of a mishap in terms of potential mishap severity and probabilityof occurrence

Techniques and approaches $ System criticitality

Risk

Failurelikelihood

Possibleconsequences of

failures

Page 95: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

95ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Major

Improbable

Infrequent

Occasional

Probable

Frequent

Probabilityofoccurrence

CatastrophicSignificantMinor

Severity

Probabilities of occurrence andseverities of consequences

Application domains (transportation,energy production,

telecommunications, etc.)

Page 96: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

96ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

10-4 - 10-610-8 - 10-10Improbable

10-3 - 10-410-6 - 10-8Infrequent

10-2 - 10-310-5 - 10-6Occasional

10-1 - 10-210-4 - 10-5Probable

> 10 -1> 10-4Frequent

Discrete time (bysolicitation)

Continuous time(h-1)

Probability ofoccurrence

Examples of occurrence probabilities

Page 97: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

97ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Example of damage severity (civil avionics)

No effect

Minor

Major

Hazardous /severe-major

Catastrophic

failure conditions which do not effect the operational capability of theaircraft or increase pilot workload.

failure conditions which would not significantly reduce aircraft safety, andwhich involve crew actions that are well within their capabilities. Minorfailure conditions include, for example, a slight increase in crew workload,such as routine flight plan changes or some inconvenience to occupants.

failure conditions which would reduce the capability of the aircraft or theability of the crew to cope with adverse operating conditions to the extentthat there would be, for example, a significant reduction in safety marginsor functional capabilities, a significant increase in crew workload or inconditions impairing crew efficiency, or some discomfort to occupants.

failure conditions which would reduce the capability of the aircraft or theability of the crew to cope with adverse operating conditions to the extentthat there would be:

• a large reduction in safety margins or functional capabilities;

• physical distress or higher workload such that the flight crew could notbe relied on to perform their tasks accurately or completely; or

• serious or fatal injury to a relatively small number of the occupants.

failure conditions which would prevent continued safe flight and landing

Page 98: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

98ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Major

Improbable

NegligibleInfrequent

Occasional

Probable

Frequent

Probabilityofoccurrence

CatastrophicSignificantMinor

Severity

UnacceptableUndesirable

Tolerable

Risk classes

Page 99: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

99ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Major

TrivialImprobable

Infrequent

Occasional

IntermediateProbable

Frequent

Probabilityofoccurrence

CatastrophicSignificantMinor

Severity

Low

High

Risk classes ' Criticality

Page 100: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

100ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Utilisation of techniques and approaches

M : Mandatory

SR : Strongly recommended. If technique or approachnot used, reasons have to be made explicit

R : Recommended

— : No recommendation in favor or against

NR : Explicitly not recommended. If technique orapproach used, reasons have to be made explicit

Page 101: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

101ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

! Actor independency

Designer/Implementer

Verifyer Verifyer

Designer/Implementer

Verifyer Verifyer

Designer/Implementer

Verifyer Verifyer

Trivial

Low

Intermediate and High

Can be thesameperson(s)

Can belong tothe samecompany

Criticality

Page 102: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

102ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

From Dependability to Resilience

Page 103: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

103ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

!Adjective Resilient" In use for 30+ years

" Recently, escalating use( buzzword

" Used essentially as synonym tofault tolerant

" Noteworthy exception: prefaceof Resilient Computing Systems,T. Anderson (Ed.), Collins, 1985

«The two key attributes here aredependability and robustness. […] Acomputing system can be said to berobust if it retains its ability to deliverservice in conditions which are beyondits normal domain of operation»

) in dependability and securityof computing systems

Materialscience

Ecology

Childpsychiatryandpsychology

Industrialsafety

Business

Socialpsychology

Adaptation tochanges, andgetting back

after asetback

!Fault and change tolerance

) in other domains

Resilience

Page 104: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

104ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Resilience: persistence of service delivery thatcan justifiably be trusted, when facing changes

Nature

Short termFunctional Foreseen

Medium termEnvironmental Foreseeable

Long termTechnological Unforeseen

Prospect Timing

Dependability: Ability to deliverservice that can justifiably be trusted

At stake: Maintain dependabilityin spite of changes

! The definition does not exclude the possibility of failure

Alternate definition of dependability:Ability to avoid unacceptably frequent or severe service failures

Threat evolution

Resilience: persistence of avoidance of unacceptablyfrequent or severe service failures, when facing changes

Resilience: persistence of dependability when facing changes

Page 105: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

105ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Changes

Nature

Short term, e.g., seconds

to hours, as in dynamicallychanging systems(spontaneous, or ‘ad-hoc’,networks of mobile nodes andsensors, etc.)

Functional Foreseen,as in newversioning

Medium term, e.g., hours

to months, as in newversioning or reconfigurations

Environmental

Foreseeable,as in the advent ofnew hardwareplatforms

Long term, e.g., months to

years, as in reorganizationsresulting from merging ofsystems in companyacquisitions, or from couplingof systems in militarycoalitions

Technological

Unforeseen,as drastic changesin service requestsor new types ofthreats

Prospect Timing

Threat evolution, e.g.,

) ever increasingproportion of transienthardware faults,

) ever-evolving andgrowing attacks,

) mismatches between the modificationsthat implement the changes and theformer status of the system

Page 106: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

106ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Context: emerging ubiquitous systems

immense information systems,perhaps involving everything from

super-computers, huge server "farms"and giant data centers, to myriads ofsmall mobile computers (billions) and

tiny embedded devices (trillions)

Characteristic: perpetual evolution

Development: dynamiccomponentization via service

discovery and assembly

Page 107: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

107ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

[From M.P. Papazoglou et al., ‘Service-Oriented Computing: State of

the Art and Research Challenges’, Computer, Nov. 2007]

Page 108: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

108ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Technologies for resilience

Evolutionary changes Evolvability! Adaptation

Trusted service Assessability! Verification and evaluation

Complex systems Diversity! Taking advantage of existing

diversity for avoiding single pointsof failure, and augmenting diversity

Fault Removal

Fault Forecasting

Assessability

Fault Prevention

Fault Tolerance

Evolvability

Usability

Diversity

*

*

*

*

*

*

*

* *

Ubiquitous systems Usability! Human and system users

Means fordependability

Page 109: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

109ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

Conclusion

Page 110: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

110ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

integration

inte

rcon

nect

ion p

erfo

rmance

!! vanishing substitutes ! for computers

funnel factor!! decreasing natural

! robustness

unavoidability of faults!! ill-mastered complexity

dependency

Page 111: Fundamentals of Dependability - CNRresist.isti.cnr.it/files/corsi/courseware_slides/dependability... · ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

111ReSIST courseware — Jean-Claude Laprie — Fundamentals of Dependability

References(in addition to those mentioned in the slides)

A.Avizienis, J.C. Laprie, B. Randell, C. Landwehr, Basic Concepts

and Taxonomy of Dependable and Secure Computing, IEEETransactions on Dependable and Secure Computing, Vol. 1, no. 1,pp. 11-33, January-March 2004

J.C. Laprie, From Dependability to Resilience, 38th IEEE/IFIP Int.Conf. On Dependable Systems and Networks, Anchorage, Alaska,June 2008, Sup. Vol., pp. G8-G9