Upload
lcm3766l
View
217
Download
0
Embed Size (px)
Citation preview
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
1/22
0
Dependability and Resilience
of Computing Systems
Jean-ClaudeLaprie
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
2/22
1
! Dependability: an integrative concept
! Based on, and elaboration upon A. Avizienis, J.C. Laprie, B.
Randell, C. Landwehr, Basic Concepts and Taxonomy of
Dependable and Secure Computing, IEEE Tr. Dependable
and Secure Computing, 2004
! Resilience: a framework for ubiquitous computing
challenges
! Based on, and elaboration upon outcomes of
the European Network of Excellence ReSIST
(Resilience for Survivability in Information
Society Technologies)
!Conclusion
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
3/22
2
SafetyReliability ConfidentialityAvailability Integrity Maintainability
FaultPrevention
FaultTolerance
FaultRemoval
FaultForecasting
Faults Errors FailuresActivation Propagation Causation
FaultsFailures Causation
Attributes
Threats
Means
Dependability: delivery of service that can justifiably be trusted, thusavoidance of failures that are unacceptably frequent or
severe
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
4/22
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
5/22
4
Fault
forecating
Ordinal or qualitative evaluation
Probabilistic orquantitative evaluation
ModelingOperational testing
Fault
tolerance
Error detection
System recoveryError handling
Fault handling
Development
Static verificationDynamic verification
Verification
Diagnosis
Correction
Non-regression verificationFault
removalOperational life Corrective or preventive maintenance
Means
Fault
prevention
Attributes
Availability
Safety
Confidentiality
Integrity
Maintainability
Threats
Faults
Errors
Failures
Development
Physical
Interaction
Dependability
Reliability
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
6/22
5
Dependability definition
" First part: delivery of service that can justifiably be trusted
! Enables to generalize availability, reliability, safety, confidentiality,
integrity, maintainability, that are then attributes of dependability
" Second part: avoidance of service failures that are unacceptably frequent or
severe
! Failure frequency and severity "risk assessment and management! A system can, and usually does, fail. Is it however still dependable?
When does it become undependable?
#
criterion for deciding whether or not, in spite of service failures, a
system is still to be regarded as dependable
! Unacceptably frequent or severe service failures "dependabilityfailure, connected to development process failure
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
7/22
6
Dependability attributes
!Availability, Reliability, Safety, Confidentiality, Integrity, Maintainability:
Property-based attributes
! Event-based attributes
" Robustness: persistence of dependability with respect to external faults
" Survivability: persistence of dependability in the presence of active fault(s)
" Resilience: persistence of dependability when facing functional,
environmental, or technological, changes
!Attributes based on distinguishing among various types of (meta-)information
" Accountability: availability and integrity of the person who performed an
operation
" Authenticity: integrity of a message content and origin, and possibly some
other information, such as the time of emission
" Non-repudiability: availability and integrity of the identity of the sender of a
message (non-repudiation of the origin), or of the receiver (non-
repudiation of reception)
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
8/22
7
Fault Error Failure
Deviationof the
delivered service
from correct service,
i.e., implementing
the system function
Part of system
state that may
cause a
subsequent
failure
Adjudged or
hypothesized
cause of an
error
Failure
Fault
System does
not comply with
the specification
describing the
system function
Specification
does not
adequately
describe the
system function
Dependability Threats
PropagationActivation Causation
Taking advantage of
a vulnerability, i.e.,
an internal fault
which enables the
system to be harmed
Computation
process
Internal
fault
External
fault
Context
dependent
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
9/22
8
Faults Errors Failures FaultsFailures PropagationActivation CausationCausation
Content failures
DomainTiming failures
Signaled failures
Unsignaled failuresDetectability
Consistency
Consistent failures
Inconsistent
(Byzantine) failures
Catastrophic failures
Consequences
Minor failures$$$
IdentificationLatent errors
Detected errors
ExtentSingle errors
Multiple related errors
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
10/22
9
Faults Errors Failures FaultsFailures PropagationActivation CausationCausation
Origin
Manifestatio
n
Phase of creationor occurrence
Development faults
Operational faults
System
boundaries
Internal faults
External faults
Phenomenologicalcause
Natural faults
Human-made faults
IntentMalicious faults
Non-malicious faults
Capability
Accidental faults
Incompetence faults
Deliberate faults
PersistencePermanent faults
Transient faults
Activation
reproducibility
Solid faults
Elusive faults
Hardware faults
Software faults
System faults
Dimension
Omission faults
Commission faultsStyle
PluralitySingle faults
Multiple faults
CorrelationIndependent faults
Related faults
Damage Soft faults
Hard faults
StatusDormant faults
Active faults
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
11/22
10
Faults
by origin
Non-
malicious
Malicious Non-
malicious.
Non-
malicious.
Non-malicious Non-malicious Malicious
Intent
Internal Internal ExternalSystem boundaries
Development OperationalPhase of creation
or occurrence
Accid. Incomp. Accid. Accidental Accidental Accid. Delib. Incomp. DeliberateDelib. Delib.Capability
Phenomenologicalcause
Human-madeNatural Natural NaturalHuman-made
Development faults Physical faults Interaction faults
Malicious
logic
Physical
Deterior.
Physical
Interference
Intrusion
Attempts
Viruses
Worms
Input Mistakes
Overload
Configuration andReconfiguration
Mistakes
Design Flaws
Production Defects
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
12/22
11
Style
Persistence
Faultsbymanifestation
Dimension
Faults by origin
Physical faults Development faults Interaction faults
Plurality
Omission faults # # #
Correlation
Damage
Status
Activation
reproducibility
Commission faults # # #
Permanent faults # # #
Transient faults # #
Hardware faults # # #
Software faults # #
System faults # # #
Single faults # # #
Multiple faults # # #
Independent faults # # #
Related faults # # #
Hard faults # # #
Soft faults # # #
Dormant faults # #
Active faults # # #
Solid faults # # #
Elusive faults # # #
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
13/22
12
A few milestones
! 12th IEEE Int. Symposium on Fault-Tolerant Computing, Santa Monica, June
1982" Session Fundamental Concepts of Fault Tolerance, papers by
A. Avizienis, H. Kopetz, J.C. Laprie and A. Costes, A.S. Robinson,
T. Anderson and P.A. Lee, P.A. Lee and D. Morgan
" Session Prospective on the State-of-the-Art, position paper by
W.C. Carter
! Paper by J.C. Laprie, 15th IEEE Int. Symposium on Fault-TolerantComputing, Ann Arbor, June 1985, Dependable Computing and Fault
Tolerance: Concepts and Terminology
! Publication by J.C. Laprie of the book Dependability: Basic Concepts and
Terminology in English, French, German, Italian, Japanese, Springer
Verlag, 1992, vol. 5 of the series Dependable Computing and Fault-Tolerant
Systems, A. Avizienis, H. Kopetz, J.C. Laprie, eds.
! Paper by A. Avizienis, J.C. Laprie, B. Randell, and C. Landwehr, Basic
Concepts and Taxonomy of Dependable and Secure Computing, IEEE
Transactions on Dependable and Secure Computing, 2004
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
14/22
13
! Strength of the dependability concept: integrative role
! Enables the more classical notions of reliability, availability, safety,
confidentiality, integrity, maintainability, etc. to be put into perspective, as
attributes of dependability
! Model provided for the means for achieving dependability
" Those means are much more orthogonal to each other than the more
classical classification according to the attributes of dependability
" Facilitates the unavoidable trade-offs, as the attributes tend to conflict
with each other
! Fault - Error - Failure model
" Understanding and mastering the threats
" Unified presentation of the threats, while explaining their specificities
via the various fault classes
! Concept and terminology widely accepted by international community
! IEEE Fault-Tolerant Computing Symposium +IFIP Dependable Computingfor Critical Applications Conference $IEEE/IFIP Conference on
Dependable Systems and Network
! Regional (Europe, Pacific Rim, South America) Fault Tolerant Computing
Conferences $Dependable Computing Conferences
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
15/22
14
Ubiquitous systems: future large, networked, continuously evolving systems
constituting complex information infrastructures perhaps involving everything from
super-computers and huge server "farms" to myriads of small mobile computers and
tiny embedded devices
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
16/22
15
Short term[e.g., second to hours, as in
dynamically changing systems]
Functional[incl. Specification faults]
Foreseen[e.g., new versioning]
Medium term
[e.g., hours to months, as inreconfigurations or new versioning]
Environmental
[incl. interaction faults]
Foreseeable
[e.g., advent of newhardware platforms]
Long term[e.g., months to years, as in
reorganisations resulting from
mergers, or in military coaltions]
Technological[incl. development
and physical faults]
Unforeseen[e.g., drastic changes in
servivce requests or
new types of threats]
At stake:Maintain dependability in spite of changes
Threat evolution
Resilience: persistence of dependability when facing changes
TimingNature Prospect
% Attacks
% Mismatches in evolutionary changes
% Side-effects in emerging behaviors
% Increasing importance of configuration faults
% Increasing proportion of hardware transient faults
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
17/22
16
!AdjectiveResilient
" In use for 30+ years
" Recently, escalating use %buzzword
" Used essentially as synonym, or
substitute, to fault tolerant
" Noteworthy exception: prefaceof Resilient Computing Systems,
T. Anderson (Ed.), Collins, 1985The two key attributes here are dependability
and robustness. [] A computing system can
be said to be robustif it retains its ability to
deliver service in conditions which are beyond
its normal domain of operation
$ in dependability and security of
computing systems
Adaptation to
changes, andgetting back
after a setback
Materialscience
Ecology
Child
psychiatry
and
psychology
Industrialsafety
Business
Social
psychology
$ in other domains
Resilience
!Generalisation of fault tolerance
changetolerance#
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
18/22
17
Technologies for resilience
Changes Evolvability
! Self evolvability: adaptation
Trusted service Assessability
! Verification and evaluation
Complex systems Diversity
! Taking advantage of existing diversity,
and augmenting it
Ambient intelligence Usability
!Human and system users
Fault Removal
Fault Forecasting
Assessability
Fault Prevention
Fault Tolerance
Evolvability
Usability
Diversity
#
#
#
#
#
#
#
# #Means for
dependability
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
19/22
18
Assessability
Continuous changes
Need for complementing
pre-deployment verification and evaluation
with
operational verification and evaluation
Emerging behaviors due to
vast number of interactions
Vast number of
configurations,
including
dependencies
not existing priorto deployment
Elusive,
context-
dependent,
faults
Concurrent
executions due
to prevalence of
multi-core
systems
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
20/22
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
21/22
20
Expectations of the computing science and engineering
community
! 50th anniversary issue of Communications of the ACM (January 2008) : most
articles mention dependability or resilience as a major factor (Jeannette Wing,
Rodney Brooks, Gul Agha, John Crowcroft, Gordon Bell)
! Rodney Brooks : New formalisms will let us analyze complex distributed systems,
producing new theoretical insights that lead to practical real-world payoffs. Exactly
what the basis for these formalisms will be is, of course impossible to guess. My own
bet is on resilience and adaptability
! March 2008 issue of IEEE Computer, feature section devoted to software
engineering in the 21st century
! Barry Boehm : In the 21st century, software engineers face the often formidable
challenges of simultaneously dealing with rapid change, uncertainty and emergence,
dependability, diversity, and interdependence
8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience
22/22
21
Ubiquitous computing systems: integral part of society
Dependability of current
computing systems
Web sites availability:
one to two 9s
Well managed systems
availability: four 9s
Dependability of classical
essential infrastructures
public transportation,
energy and water supply,
fixed phone: availability of
at least five 9s
Society problem
Societal reponsibility
$
//
&Mismatch
'Increasing
dependency
One 9
Two 9s
Three 9s
Four 9s
Five 9s
Six 9s
36j 12h0,9
3j 16h0,99
8h 46mn0,999
52mn 34s0,9999
5mn 15s0,99999
32s0,999999
Outageduration/year
Availability