Jean-ClaudeLaprie(1) Dependability and Resilience

Embed Size (px)

Citation preview

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    1/22

    0

    Dependability and Resilience

    of Computing Systems

    Jean-ClaudeLaprie

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    2/22

    1

    ! Dependability: an integrative concept

    ! Based on, and elaboration upon A. Avizienis, J.C. Laprie, B.

    Randell, C. Landwehr, Basic Concepts and Taxonomy of

    Dependable and Secure Computing, IEEE Tr. Dependable

    and Secure Computing, 2004

    ! Resilience: a framework for ubiquitous computing

    challenges

    ! Based on, and elaboration upon outcomes of

    the European Network of Excellence ReSIST

    (Resilience for Survivability in Information

    Society Technologies)

    !Conclusion

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    3/22

    2

    SafetyReliability ConfidentialityAvailability Integrity Maintainability

    FaultPrevention

    FaultTolerance

    FaultRemoval

    FaultForecasting

    Faults Errors FailuresActivation Propagation Causation

    FaultsFailures Causation

    Attributes

    Threats

    Means

    Dependability: delivery of service that can justifiably be trusted, thusavoidance of failures that are unacceptably frequent or

    severe

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    4/22

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    5/22

    4

    Fault

    forecating

    Ordinal or qualitative evaluation

    Probabilistic orquantitative evaluation

    ModelingOperational testing

    Fault

    tolerance

    Error detection

    System recoveryError handling

    Fault handling

    Development

    Static verificationDynamic verification

    Verification

    Diagnosis

    Correction

    Non-regression verificationFault

    removalOperational life Corrective or preventive maintenance

    Means

    Fault

    prevention

    Attributes

    Availability

    Safety

    Confidentiality

    Integrity

    Maintainability

    Threats

    Faults

    Errors

    Failures

    Development

    Physical

    Interaction

    Dependability

    Reliability

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    6/22

    5

    Dependability definition

    " First part: delivery of service that can justifiably be trusted

    ! Enables to generalize availability, reliability, safety, confidentiality,

    integrity, maintainability, that are then attributes of dependability

    " Second part: avoidance of service failures that are unacceptably frequent or

    severe

    ! Failure frequency and severity "risk assessment and management! A system can, and usually does, fail. Is it however still dependable?

    When does it become undependable?

    #

    criterion for deciding whether or not, in spite of service failures, a

    system is still to be regarded as dependable

    ! Unacceptably frequent or severe service failures "dependabilityfailure, connected to development process failure

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    7/22

    6

    Dependability attributes

    !Availability, Reliability, Safety, Confidentiality, Integrity, Maintainability:

    Property-based attributes

    ! Event-based attributes

    " Robustness: persistence of dependability with respect to external faults

    " Survivability: persistence of dependability in the presence of active fault(s)

    " Resilience: persistence of dependability when facing functional,

    environmental, or technological, changes

    !Attributes based on distinguishing among various types of (meta-)information

    " Accountability: availability and integrity of the person who performed an

    operation

    " Authenticity: integrity of a message content and origin, and possibly some

    other information, such as the time of emission

    " Non-repudiability: availability and integrity of the identity of the sender of a

    message (non-repudiation of the origin), or of the receiver (non-

    repudiation of reception)

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    8/22

    7

    Fault Error Failure

    Deviationof the

    delivered service

    from correct service,

    i.e., implementing

    the system function

    Part of system

    state that may

    cause a

    subsequent

    failure

    Adjudged or

    hypothesized

    cause of an

    error

    Failure

    Fault

    System does

    not comply with

    the specification

    describing the

    system function

    Specification

    does not

    adequately

    describe the

    system function

    Dependability Threats

    PropagationActivation Causation

    Taking advantage of

    a vulnerability, i.e.,

    an internal fault

    which enables the

    system to be harmed

    Computation

    process

    Internal

    fault

    External

    fault

    Context

    dependent

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    9/22

    8

    Faults Errors Failures FaultsFailures PropagationActivation CausationCausation

    Content failures

    DomainTiming failures

    Signaled failures

    Unsignaled failuresDetectability

    Consistency

    Consistent failures

    Inconsistent

    (Byzantine) failures

    Catastrophic failures

    Consequences

    Minor failures$$$

    IdentificationLatent errors

    Detected errors

    ExtentSingle errors

    Multiple related errors

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    10/22

    9

    Faults Errors Failures FaultsFailures PropagationActivation CausationCausation

    Origin

    Manifestatio

    n

    Phase of creationor occurrence

    Development faults

    Operational faults

    System

    boundaries

    Internal faults

    External faults

    Phenomenologicalcause

    Natural faults

    Human-made faults

    IntentMalicious faults

    Non-malicious faults

    Capability

    Accidental faults

    Incompetence faults

    Deliberate faults

    PersistencePermanent faults

    Transient faults

    Activation

    reproducibility

    Solid faults

    Elusive faults

    Hardware faults

    Software faults

    System faults

    Dimension

    Omission faults

    Commission faultsStyle

    PluralitySingle faults

    Multiple faults

    CorrelationIndependent faults

    Related faults

    Damage Soft faults

    Hard faults

    StatusDormant faults

    Active faults

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    11/22

    10

    Faults

    by origin

    Non-

    malicious

    Malicious Non-

    malicious.

    Non-

    malicious.

    Non-malicious Non-malicious Malicious

    Intent

    Internal Internal ExternalSystem boundaries

    Development OperationalPhase of creation

    or occurrence

    Accid. Incomp. Accid. Accidental Accidental Accid. Delib. Incomp. DeliberateDelib. Delib.Capability

    Phenomenologicalcause

    Human-madeNatural Natural NaturalHuman-made

    Development faults Physical faults Interaction faults

    Malicious

    logic

    Physical

    Deterior.

    Physical

    Interference

    Intrusion

    Attempts

    Viruses

    Worms

    Input Mistakes

    Overload

    Configuration andReconfiguration

    Mistakes

    Design Flaws

    Production Defects

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    12/22

    11

    Style

    Persistence

    Faultsbymanifestation

    Dimension

    Faults by origin

    Physical faults Development faults Interaction faults

    Plurality

    Omission faults # # #

    Correlation

    Damage

    Status

    Activation

    reproducibility

    Commission faults # # #

    Permanent faults # # #

    Transient faults # #

    Hardware faults # # #

    Software faults # #

    System faults # # #

    Single faults # # #

    Multiple faults # # #

    Independent faults # # #

    Related faults # # #

    Hard faults # # #

    Soft faults # # #

    Dormant faults # #

    Active faults # # #

    Solid faults # # #

    Elusive faults # # #

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    13/22

    12

    A few milestones

    ! 12th IEEE Int. Symposium on Fault-Tolerant Computing, Santa Monica, June

    1982" Session Fundamental Concepts of Fault Tolerance, papers by

    A. Avizienis, H. Kopetz, J.C. Laprie and A. Costes, A.S. Robinson,

    T. Anderson and P.A. Lee, P.A. Lee and D. Morgan

    " Session Prospective on the State-of-the-Art, position paper by

    W.C. Carter

    ! Paper by J.C. Laprie, 15th IEEE Int. Symposium on Fault-TolerantComputing, Ann Arbor, June 1985, Dependable Computing and Fault

    Tolerance: Concepts and Terminology

    ! Publication by J.C. Laprie of the book Dependability: Basic Concepts and

    Terminology in English, French, German, Italian, Japanese, Springer

    Verlag, 1992, vol. 5 of the series Dependable Computing and Fault-Tolerant

    Systems, A. Avizienis, H. Kopetz, J.C. Laprie, eds.

    ! Paper by A. Avizienis, J.C. Laprie, B. Randell, and C. Landwehr, Basic

    Concepts and Taxonomy of Dependable and Secure Computing, IEEE

    Transactions on Dependable and Secure Computing, 2004

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    14/22

    13

    ! Strength of the dependability concept: integrative role

    ! Enables the more classical notions of reliability, availability, safety,

    confidentiality, integrity, maintainability, etc. to be put into perspective, as

    attributes of dependability

    ! Model provided for the means for achieving dependability

    " Those means are much more orthogonal to each other than the more

    classical classification according to the attributes of dependability

    " Facilitates the unavoidable trade-offs, as the attributes tend to conflict

    with each other

    ! Fault - Error - Failure model

    " Understanding and mastering the threats

    " Unified presentation of the threats, while explaining their specificities

    via the various fault classes

    ! Concept and terminology widely accepted by international community

    ! IEEE Fault-Tolerant Computing Symposium +IFIP Dependable Computingfor Critical Applications Conference $IEEE/IFIP Conference on

    Dependable Systems and Network

    ! Regional (Europe, Pacific Rim, South America) Fault Tolerant Computing

    Conferences $Dependable Computing Conferences

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    15/22

    14

    Ubiquitous systems: future large, networked, continuously evolving systems

    constituting complex information infrastructures perhaps involving everything from

    super-computers and huge server "farms" to myriads of small mobile computers and

    tiny embedded devices

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    16/22

    15

    Short term[e.g., second to hours, as in

    dynamically changing systems]

    Functional[incl. Specification faults]

    Foreseen[e.g., new versioning]

    Medium term

    [e.g., hours to months, as inreconfigurations or new versioning]

    Environmental

    [incl. interaction faults]

    Foreseeable

    [e.g., advent of newhardware platforms]

    Long term[e.g., months to years, as in

    reorganisations resulting from

    mergers, or in military coaltions]

    Technological[incl. development

    and physical faults]

    Unforeseen[e.g., drastic changes in

    servivce requests or

    new types of threats]

    At stake:Maintain dependability in spite of changes

    Threat evolution

    Resilience: persistence of dependability when facing changes

    TimingNature Prospect

    % Attacks

    % Mismatches in evolutionary changes

    % Side-effects in emerging behaviors

    % Increasing importance of configuration faults

    % Increasing proportion of hardware transient faults

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    17/22

    16

    !AdjectiveResilient

    " In use for 30+ years

    " Recently, escalating use %buzzword

    " Used essentially as synonym, or

    substitute, to fault tolerant

    " Noteworthy exception: prefaceof Resilient Computing Systems,

    T. Anderson (Ed.), Collins, 1985The two key attributes here are dependability

    and robustness. [] A computing system can

    be said to be robustif it retains its ability to

    deliver service in conditions which are beyond

    its normal domain of operation

    $ in dependability and security of

    computing systems

    Adaptation to

    changes, andgetting back

    after a setback

    Materialscience

    Ecology

    Child

    psychiatry

    and

    psychology

    Industrialsafety

    Business

    Social

    psychology

    $ in other domains

    Resilience

    !Generalisation of fault tolerance

    changetolerance#

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    18/22

    17

    Technologies for resilience

    Changes Evolvability

    ! Self evolvability: adaptation

    Trusted service Assessability

    ! Verification and evaluation

    Complex systems Diversity

    ! Taking advantage of existing diversity,

    and augmenting it

    Ambient intelligence Usability

    !Human and system users

    Fault Removal

    Fault Forecasting

    Assessability

    Fault Prevention

    Fault Tolerance

    Evolvability

    Usability

    Diversity

    #

    #

    #

    #

    #

    #

    #

    # #Means for

    dependability

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    19/22

    18

    Assessability

    Continuous changes

    Need for complementing

    pre-deployment verification and evaluation

    with

    operational verification and evaluation

    Emerging behaviors due to

    vast number of interactions

    Vast number of

    configurations,

    including

    dependencies

    not existing priorto deployment

    Elusive,

    context-

    dependent,

    faults

    Concurrent

    executions due

    to prevalence of

    multi-core

    systems

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    20/22

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    21/22

    20

    Expectations of the computing science and engineering

    community

    ! 50th anniversary issue of Communications of the ACM (January 2008) : most

    articles mention dependability or resilience as a major factor (Jeannette Wing,

    Rodney Brooks, Gul Agha, John Crowcroft, Gordon Bell)

    ! Rodney Brooks : New formalisms will let us analyze complex distributed systems,

    producing new theoretical insights that lead to practical real-world payoffs. Exactly

    what the basis for these formalisms will be is, of course impossible to guess. My own

    bet is on resilience and adaptability

    ! March 2008 issue of IEEE Computer, feature section devoted to software

    engineering in the 21st century

    ! Barry Boehm : In the 21st century, software engineers face the often formidable

    challenges of simultaneously dealing with rapid change, uncertainty and emergence,

    dependability, diversity, and interdependence

  • 8/10/2019 Jean-ClaudeLaprie(1) Dependability and Resilience

    22/22

    21

    Ubiquitous computing systems: integral part of society

    Dependability of current

    computing systems

    Web sites availability:

    one to two 9s

    Well managed systems

    availability: four 9s

    Dependability of classical

    essential infrastructures

    public transportation,

    energy and water supply,

    fixed phone: availability of

    at least five 9s

    Society problem

    Societal reponsibility

    $

    //

    &Mismatch

    'Increasing

    dependency

    One 9

    Two 9s

    Three 9s

    Four 9s

    Five 9s

    Six 9s

    36j 12h0,9

    3j 16h0,99

    8h 46mn0,999

    52mn 34s0,9999

    5mn 15s0,99999

    32s0,999999

    Outageduration/year

    Availability