Levenson - 1-2-Beginners-Tutorial-part2.pdf

Embed Size (px)

Citation preview

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    1/163

    STPA: A New HazardAnalysis Technique

    (System-Theoretic Process Analysis)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    2/163

    Summary: Accident Causality in STAMP

    Accidents occur when

    Control structure or control actions do not enforce safetyconstraints

    Unhandled environmental disturbances or conditions Unhandled or uncontrolled component failures Dysfunctional (unsafe) interactions among components

    Control structure degrades over time (asynchronousevolution)

    Control actions inadequately coordinated among multiplecontrollers

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    3/163

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    4/163

    A Third Source of Risk

    Control actions inadequately coordinated among multiplecontrollers

    Controller 1 Process 1

    Boundary areas

    Copyright Nancy Leveson, Aug. 2006

    Controller 2 Process 2

    Controller 1

    Controller 2Process

    Overlap areas (side effects of decisions and control actions)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    5/163

    Uncoordinated Control Agents

    SAFE STATEATC provides coordinated instructions to both planes

    SAFE STATETCAS provides coordinated instructions to both planes

    Control Agent(TCAS)

    InstructionsInstructions

    UNSAFE STATEBOTH TCAS and ATC provide uncoordinated & independent instructions

    Control Agent(ATC)

    InstructionsInstructions

    Control Agent(ATC)

    InstructionsInstructions

    No Coordination

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    6/163

    Hazard Analysis

    Investigating an accident before it occurs.

    Goal: Identify potential causes of accidents (scenarios that can lead to

    losses)

    So can be eliminated or controlled in design or operations before .

    Used for: Developing requirements and design constraints

    Validating requirements and design for safety Preparing operational procedures and instructions Test planning and evaluation Management planning

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    7/163

    System-Theoretic Process Analysis(STPA)

    Supports a safety-driven design process where

    Hazard analysis influences and shapes early design decisions Hazard analysis iterated and refined as design evolves

    Also supports accident analysis and risk analysis/hazard analysis

    Goals (same as any hazard analysis)

    Identify safety constraints/requirements necessary to ensure

    acceptable risk Accumulate information about how hazards can be violated

    (scenarios), which is used to eliminate, reduce and controlhazards in system design, development, manufacturing, andoperations

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    8/163

    STPA

    Used to assist in defining scenarios in which the safetyconstraints could be violated.

    The same goal as fault trees or any other hazard analysisapproach) but Looks at more than component failures More support provided in the analysis Finds more types of accident scenarios

    Starts from basic control structure and assignedresponsibilities for safety-critical actions.

    Copyright Nancy Leveson, Aug. 2006

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    9/163

    STPA on Social Systems

    We have applied STPA to social (organizational)systems

    NASA Shuttle operations management structure

    Effect of policy changes following the Vioxx events

    Accident analysis and system redesign for food safety

    But will concentrate in the following on the physical system

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    10/163

    Steps in STPA

    Establish fundamentals

    Define accident for your system Define hazards

    Rewrite hazards as constraints on system design Draw preliminary (high-level) safety control structure

    Identify potentially unsafe control actions (safetyrequirements and constraints)

    Determine how each potentially hazardous control actioncould occur

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    11/163

    Steps in STPA

    Define accidents

    Define system hazards associated with accidents

    Translate system hazards into high-level safety

    requirements (constraints) Construct high-level control structure including

    Responsibilities of components Preliminary process model

    Refine high-level safety constraints into detailed safetyrequirements on components and scenarios for losses

    Use results to create or improve system design

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    12/163

    Defining Accidents

    Accident : An undesired and unplanned (but notnecessarily unexpected) event that results in (at least) aspecified level of loss.

    Incident : An event that involves no loss (or only minor

    circumstances.

    Loss can include human injury, property damage,

    environmental pollution (damage), mission loss, etc.

    Could prioritize or assign varying levels of severity

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    13/163

    Accidents or UnacceptableLosses for Explorer Spacecraft

    ACC1. Humans and/or human assets on earth arekilled/damaged. (PC1), (H5)

    ACC2. Humans and/or human assets off of the earth arekilled/dama ed. PC1 H6

    ACC3. Organisms on any of the moons of the outer planet (if theyexist) are killed or mutated by biological agents of Earth Origin.(H4)

    ACC4. The scientific data corresponding to the mission goals arenot collected. (G1, G2, G3, G4, G5, G6, G7), (H1)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    14/163

    Accidents (cont)

    ACC5. The scientific data is rendered unusable (e.g., deleted,corrupted, not returned at required time) before it can be fullyinvestigated . (G1, G2, G3, G4, G5, G6, G7), (H2,H3)

    ACC6 Organisms of Earth origin are mistaken for organisms

    missions to study the outer planet's moon. (H4 )

    ACC7. An incident during this mission directly causes anothermission to fail to collect, return, and/or use the scientific datacorresponding to its mission goals. (PC1)(H7)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    15/163

    Exercise

    Select an application from your industry or company

    Define accidents in this system.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    16/163

    Defining Hazards

    Hazard : A state or set of conditions that, together with other (worstcase) conditions in the environment, will lead to an accident (lossevent).

    Note that a hazard is NOT equal to a failure. s ngu s ng azar s rom a ures s mp cin understanding the difference between safetyand reliability engineering. (C.O. Miller)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    17/163

    Hazard Level : A combination of severity (worst potentialdamage in case of an accident) and likelihood of occurrenceof the hazard.

    Risk : The hazard level combined with the likelihood of thehazard leading to an accident plus exposure (or duration) ofthe hazard.

    RISK

    HAZARD LEVEL

    Hazardseverity

    Likelihood of hazard occurring

    HazardExposure

    Likelihood of hazardLeading to an accident

    Safety : Freedom from accidents or losses.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    18/163

    Identifying Hazards

    Must be within system but that depends on where drawsystem boundaries

    Choose hazards within design space

    Example: release of chemicals from plant

    Each part of socio-technical system responsible fordifferent parts of accident process and perhaps differenthazards

    Define small set of high-level hazards first

    Then can translate hazards into safety constraints andrequirements and refine them.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    19/163

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    20/163

    Hazards for Explorer Spacecraft

    H1. Inability of Mission to collect data. (ACC4)H2. Inability of Mission to return collected data. (ACC5)

    H3. Inability of Mission scientific investigators to use returned data.(ACC5)

    H4. Contamination of Outer Planet Moon with biological agents of.

    H5. Exposure of Earth life or human assets on Earth to toxic,radioactive, or energetic elements of mission hardware. (ACC1)

    H6. Exposure of Earth life or human assets off Earth to toxic,radioactive, or energetic elements of mission hardware. (ACC2)

    H7. Inability of other space exploration missions to use shared spaceexploration infrastructure to collect, return, or use data. (ACC5)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    21/163

    Hazards for TCAS II

    TCAS Hazards:1. Near mid-air collision (NMAC): Two controlled aircraft

    violate minimum separation standards)

    2. Controlled maneuver into ground

    3. Pilot loses control of aircraft

    4. Interference with other safety-related aircraft systems

    5. Interference with the ground-based ATC system

    6. Interference with ATC safety-related advisory

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    22/163

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    23/163

    Defining the High-Level Control Structure

    Need the control structure not the physical structure

    Engineers more used to defining physical connectionsthan logical connections

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    24/163

    Defining the Safety Control Structure

    High-level preliminary control structure is defined first

    Then refine as design process continues

    Need the control structure not the physical structure Not the same as the physical structure

    Basically the functional structure of the system

    Often useful to define levels or different views

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    25/163

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    26/163

    TCAS II Control Structure

    ICAO

    Copyright Nancy Leveson, Aug. 2006

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    27/163

    Component ResponsibilitiesTCAS:

    Receive and update information about its own and other aircraft Analyze information received and provide pilot with

    Information about where other aircraft in the vicinity are located An escape maneuver to avoid potential NMAC threats

    Pilot

    scanning

    Monitor TCAS displays and implement TCAS escape maneuvers

    Follow ATC advisories

    Air Traffic Controller

    Maintain separation between aircraft in controlled airspace byproviding advisories (control action) for pilot to follow

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    28/163

    Aircraft components (e.g., transponders, antennas)

    Execute control maneuvers

    Receive and send messages to/from aircraft

    Etc.

    Airline Operations Management

    Provide procedures for using TCAS and following TCAS advisories

    Train ilots

    Audit pilot performance

    Air Traffic Control Operations Management

    Provide procedures

    Train controllers,

    Audit performance of controllers

    Audit performance of overall collision avoidance system

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    29/163

    Control Structure Diagram Level 0

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    30/163

    Control Structure Diagram ISS Level 1

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    31/163

    ACC Control Structure Development

    31

    :

    5/18/2011

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    32/163

    325/18/2011

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    33/163

    Exercise

    Draw the functional control structure for your application

    Start with a VERY simple, very high-level model

    Identify responsibilities, commands, feedback

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    34/163

    Accident with No Component Failures

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    35/163

    Exercise

    Draw the high-level control structure for this system.

    Start with a simple control structure with three boxes Operator

    Automated controller

    Specify Component responsibilities Control actions Process model for each of the two controllers

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    36/163

    Computer

    Operator

    Plant

    Start ProcessStop Process

    Status Info

    Plant State:OK, Not OK, Unknown

    Reactor State:Operating, Not Operating,Unknown

    Water Valve:

    Catal st Valve:

    Open, ClosedUnknown

    O en, Closed

    Valves

    Plant stateinformation

    Open Water Close Water Open CatalystClose Catalyst ???

    Plant State:OK, Not OK, Unknown

    Unknown

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    37/163

    Documentation

    Remember to document all this as go along As part of engineering specifications (not separate) but

    identified as safety-related

    In hazard log

    Include cc en s, azar s, g - eve sa e y requ remen s, con ro

    structure

    Refined safety requirements and allocation to components

    Analysis results

    Design decisions

    Design rationale

    Tracing between design decisions and safety requirements

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    38/163

    STPA Step 1: Identifying Unsafe ControlActions

    We have now established the fundamental information tostart the analysis

    Next step (Step 1) is to identify the unsafe controlactions that each component can produce. Helps in refining safety requirements and constraints

    Step 2 will determine the causes of these unsafe controlactions. Causes will be used to guide design to eliminate

    or control them.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    39/163

    Identifying Unsafe Control Actions

    Four ways a controller can provide unsafe control:

    1. A control action required for safety is not provided

    2. An unsafe control action is provided

    .early (at the wrong time) or in the wrong sequence

    4. A control action required for safety is stopped too soon orapplied too long.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    40/163

    Identifying Unsafe Control Actions

    Four ways a controller can provide unsafe control:

    1. A control action required for safety is not providedThe aircraft are on a near collision course and TCAS does notprovide an RA

    The pilot does not follow the resolution advisory provided byTCAS (does not respond to the RA)

    2. An unsafe control action is providedThe aircraft are in close proximity and TCAS provides an RA

    that degrades vertical separation.The pilot incorrectly executes the TCAS resolution advisory.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    41/163

    Identifying Unsafe Control Actions

    3. A potentially safe control action is provided too late ortoo early (at the wrong time) or in the wrong sequence

    The aircraft are on a near collision course and TCAS providesan RA too late to avoid an NMAC

    The pilot applies the RA but too late to avoid the NMAC

    4. A control action required for safety is stopped too soonor applied too long.

    TCAS removes an RA too soon

    The pilot stops the RA maneuver too soon.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    42/163

    Defining System-Level Safety

    Constraints (Requirements)TCAS

    When two aircraft are on a collision course, TCAS must alwaysprovide an RA to avoid the collision

    TCAS must always provide an RA in time to prevent an NMAC

    Pilot The pilot must always follow the RA provided by TCAS

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    43/163

    Refinement of High-Level SafetyConstraints and Requirements

    (Done in Step 2)

    [HA-237]

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    44/163

    STPA Automated Train Door Example

    Define accident (death or injury of passenger oremployee)

    Identify hazards and translate into high-level safety design

    constraints Door Control System hazards:

    Doors open while train is in motion or not aligned withstation platform

    Door closes on a person Passengers cannot evacuate in case of an emergency

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    45/163

    Define control structure and basic componentsafety-related responsibilities

    Door Controller

    Door position

    Door clear?

    Controlcommands

    Close doors

    Reverse direction

    Train motion and position

    Open doors

    Train motion and positionEmergency notification

    Door Actuator

    DoorSensors

    Train Doors

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    46/163

    Door Controller

    Door position

    Door clear?

    Controlcommands

    Train motion and positionEmergency notification

    Open doorsClose doors

    Reverse Direction

    Door position

    Train motion

    Doorway obstructed?

    Fully openFully closedOpeningClosing

    Train position

    Emergency?

    Add ProcessModel

    Door Actuator

    DoorSensors

    Train Doors

    Disturbances

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    47/163

    Using a Table Helps to Organize Step 1

    Start from each high level hazard.

    Create a table with a row for each control action and acolumn for the four types of unsafe control.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    48/163

    ControlAction

    1) Not providingcauses hazard

    2) Providingcauses hazard

    3) Wrong Timingor Order

    4) Stoppedtoo soon

    Providesdoor close

    command

    Doors not commandedclosed or re-closed

    before moving

    Doorscommanded

    closed whileperson or objectis in the doorway

    Doorscommandedclosed during an

    emergencyevacuation

    Doors commandedclosed too early,

    before passengersfinish entering/exiting

    Doors commandedclosed too late, aftertrain starts moving

    Door closecommand

    stopped toosoon, notcompletelyclosed

    Providesdoor opencommand

    Doors not commandedopen for emergencyevacuation

    Doors not commandedopen after closing whilea person or obstacle isin the doorway

    Doorscommandedopen while trainis in motion

    Doorscommandedopen while trainis not aligned at aplatform

    Doors commandedopen before trainhas stopped or afterit started moving(covered by whiletrain is in motion)

    Doors commandedopen late afteremergency

    Door opencommandstopped toosoon duringemergencystop

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    49/163

    Generating Safety Requirements

    Rewrite entries in table as high-level requirements orconstraints on controller.

    Doors must not be opened until train is stopped and aligned withplatform

    Doors must not be closed if someone is in the doorway.

    If a person is detected in doorway during door closing, doorclosing must be stopped and reversed

    Train must not move with doors open

    etc. Use Step 2 to refine constraints by identifying causes of

    unsafe control actions.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    50/163

    Class Exercise

    Take the batch chemical reactor example.

    Using the control diagram and process models you drew

    STEP 1: Create a table of unsafe control actions for the

    hazard: Catalyst in reactor without reflux condenseropera ng wa er ow ng roug

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    51/163

    Providingcauses hazard

    Not providingcauses hazard

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    52/163

    Providingcauses hazard

    Not providingcauses hazard

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    53/163

    Resulting High-Level Constraints

    Translate the entries in the table into constraints

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    54/163

    Determine how each potentially hazardous control actionidentified in Step 1 could occur

    a) For each unsafe control action, examine the parts of control loop

    to see if they could cause it.

    STPA Step 2: Identifying Causes andDesigning Controls

    b) Design mitigation measures if they do not already exist orevaluate existing measures if analysis being performed on anexisting design.

    c) Determine if new hazards created by design changes

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    55/163

    Potential ControlFlaws

    A

    ( , ,

    )

    ProcessModel

    (inconsistent,incomplete, or

    incorrect)

    Inadequate

    ,,

    Inadequate

    55

    operation operation

    Component failures

    Changes over time

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    56/163

    Problems can occur when there is shared control or at theboundary areas of separately controlled processes.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    57/163

    Exercise Continued

    STEP 2: Identify some causes of the hazardouscondition: Open catalyst valve when water valve notopen

    HINT: Consider how controllers process model coulden y a wa er va ve s open w en s no .

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    58/163

    Results

    My first guess, without STPA, of the software safetyrequirement:

    Always open water valve before catalyst valve

    turned out to be incomplete

    Need more than this as well as additional design controls(e.g., flow monitor)

    Can potentially provide automated support

    A simple example, but more complex examples havebeen done and compared with standard safety analysis

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    59/163

    Step 3: Operations and PerformanceMonitoring

    Need to consider how designed controls could degradeover time and build in protection, including

    a) Planned performance audits where assumptionsunderlying the hazard analysis are the preconditions forthe operational audits and controls

    b) Management of change procedures

    c) Incident analysis

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    60/163

    HTV: H-II Transfer Vehicle

    Items Specifications

    Length 9.8 m (includingthrusters)

    Diameter 4.4 m

    Mass 10,500 kg

    Pro ellant Fuel: MMH

    Oxidizer: MON3(Tetroxide)

    Cargocapacity

    (suppliesandequipment)

    6,000 kg- Pressurized cargo:

    4,500 kg- Unpressurized cargo:1,500 kg

    Cargocapacity(waste)

    Max. 6,000 kg

    Target orbit Altitude: 350-460 km

    1 ( 10 2) B

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    61/163

    HTV Operations

    1. 2. 3. B 4. 5. / /

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    62/163

    PROX Operations

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    63/163

    /

    ( )

    t , x / t

    , x

    t , x / / / /

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    64/163

    Non-advocate Safety Assessmentof the Ballistic Missile Defense

    System using STPA

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    65/163

    Ballistic Missile Defense System (BMDS)Non-Advocate Safety Assessment using STPA

    A layered defense to defeat all ranges of threats in allphases of flight (boost, mid-course, and terminal)

    Uses a hit-to-kill interceptor that destroys incomingballistic missiles through force of impact

    Made up of many existing systems (BMDS Element) Early warning radars Aegis Ground-Based Midcourse Defense (GMD) Command and Control Battle Management and Communications

    (C2BMC) Others

    MDA used STPA to evaluate the residual safety risk ofinadvertent launch prior to deployment and test

    f f

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    66/163

    Safety Control Structure Diagram for FMIS

    8/2/2006 66

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    67/163

    Example Causes Identified

    1. Providing Fire Enable causes hazard The fire control computer is intended to send the fire enable

    command to the launch station upon receiving a weapons freecommand from an FMIS operator and while the fire controlsystem has at least one active track

    The specification requires an active track

    e so ware suppor s ec ar ng rac s nac ve a er a cer a nperiod with no radar input, after the total predicted impact timefor the track, and/or after a confirmed intercept

    One case was not well considered: if an operator de-selects allof these options

    The inadvertent or intentional entry of a weapons free commandwould send the fire enable command to the launch station evenif there were no threats to engage currently tracked by thesystem

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    68/163

    FMIS Inadequate Controls (contd)

    2. Providing Fire Enable causes hazard System undergoes periodic system operability testing using an

    interceptor simulator that mimics the interceptor flight computer

    Hazard analysis of the system identified the possibility thatcommands intended for test activities could be sent to theoperational system

    the LS is connected only to missile simulators or to any liveinterceptors

    If the fire control computer detects a change in this state, it willwarn the operator and offer to reset into a matching state

    There is a small window of time before the LS notifies the firecontrol component of the change during which the fire controlsoftware might send a fire enable command intended for test tothe live LS

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    69/163

    Results of Real BMDS Analysis

    Deployment and testing held up for 6 months because somany scenarios identified for inadvertent launch. In many ofthese scenarios: All components were operating exactly as intended Complexity of component interactions led to unanticipated

    system behavior

    STPA also identified component failures that could causeinadequate control (most analysis techniques consider onlythese failure events)

    As changes are made to the system, the differences areassessed by updating the control structure diagrams andassessment analysis templates .

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    70/163

    Evaluation

    STPA worked on this enormously complex system. Why? Top-down analysis

    Considers hazards and causes due to complex systeminteractions (more than just failure events)

    Provides guidance in conducting the analysis Comprehensively addresses the whole of the system,

    including hardware, software, operators, procedure,maintenance, and continuing development activities

    Focuses resources on the areas of the system with thegreatest impact on safety risk

    Provides a clear description of problem to decision makers(not just a probability number)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    71/163

    Assurance of Flight Critical Systems(NASA Aviation Safety Program)

    Goal: Development of safe, rapid, and cost effectiveNextGen systems using a unified safety assuranceprocess for ground based and airborne systems.

    Demonstrate a new safety assurance approach ona NextGen component

    Evaluate and compare it with the current approach

    Create enhanced safety risk managementtechniques

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    72/163

    Problem Statement (2)

    Attempts to re-engineer the NAS in the past have been not beenterribly successful and have been very slow, partly due toinability to assure safety of the changes.

    Question: How can NAS be re-engineered incrementally without

    negatively impacting safety? Hypothesis:

    Rethinking of how to do safety assurance required to successfullyintroduce NextGen concepts

    Applying systems thinking and systems theory can improve ourability to assure safety in these complex systems

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    73/163

    Assurance of Flight-Critical Systems(NASA Aviation Safety Program)

    Current ATC systems remarkably safe due to Conservative adoption of new technologies

    Careful introduction of automation to augment human capabilities

    Reliance on experience and learning from the past

    Extensive decoupling of system components NextGen violates these assumptions:

    Increased coupling and inter-connectivity among airborne, ground,and satellite systems

    Control shifting from ground to aircraft and shared responsibilities Use of new technologies with little prior experience in this

    environment

    Need to be careful that in trying to fix old problems do notintroduce new hazards or new causes of current hazards

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    74/163

    InIn--Trail Procedure (ITP)Trail Procedure (ITP)

    .

    Designed for oceanic and remote airspaces not covered by radar.

    Permits climb and descent using new reduced longitudinal separationstandards .

    Potential Benefits Reduced fuel burn and CO2 emissions via more opportunities to reach the

    optimum FL or FL with more favorable winds. Increased safety via more opportunities to leave turbulent FL.

    But standard separation requirements not met during maneuver

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    75/163

    ITP ProcedureITP Procedure Step by StepStep by Step

    1. Check that ITP criteria are met.

    2. If ITP is possible, request ATCclearance via CPDLC using ITPphraseology.

    3. Check that there are no blockingaircraft other than Reference

    Aircraft in the ITP request.

    4. Check that ITP request isapplicable (i.e. standard request

    not sufficient) and compliant with

    A

    .

    5. Check that ITP criteria are met.

    6. If all checks are positive, issue ITPclearance via CPDLC.8. ,

    .

    9. , .

    10. .11.

    .

    ,,

    ( , ) ,

    ATSA ITP ConceptATSA ITP Concept ITP SeparationITP Separation

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    76/163

    ATSA ITP ConceptATSA ITP Concept ITP SeparationITP SeparationStandardsStandards

    <

    >=

    Before the ITP maneuver, ITP criteria must be met (i.e. stage 1) During an ITP maneuver, the ITP longitudinal separation between

    aircraft is applied (i.e. stage 2).

    At final FL, procedural separation must exist with aircraft that arealready at that final FL (i.e. stage 3).

    A A

    1. 2. 3.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    77/163

    NextGen Hazards

    H-1: A pair of controlled aircraft violates minimum separationstandards.

    H-2: Aircraft enters an unsafe atmospheric region.

    H-3: Aircraft enters uncontrolled state.

    -pitch/roll/yaw that causes passenger injury but not necessarilyaircraft loss).

    H-5: Aircraft enters a prohibited area .

    Because ATSA-ITP will be used first in oceanic airspace, only H-1 was

    considered in the STPA analysis. But later, if it is used elsewhere, the

    other hazards will need to be considered.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    78/163

    HighHigh--Level ControlLevel ControlStructure for ITPStructure for ITP

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    79/163

    Execute ITP Abnormally Terminate ITP

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    80/163

    Approve request

    Deny Request Abnormal Termination Request ITP

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    81/163

    Potentially Hazardous Control Actionsby the Flight Crew

    Control ActionNot ProvidingCauses Hazard Providing Causes Hazard

    Wrong Timing/Order Causes Hazard

    Stopped TooSoon/Applied TooLong

    Execute ITP

    ITP executed when notapprovedITP executed when ITPcriteria are not satisfied

    ITP executed too soonbefore approval

    ITP aircraft levels offabove requested FL

    ITP executed with incorrectclimb rate, final altitude, etc

    after reassessment

    below requested FL

    AbnormalTermination of ITP

    FC continues withmaneuver in

    dangerous situation

    FC aborts unnecessarily

    FC does not follow regional

    contingency procedureswhile aborting

    (Not complete, does not include FC requesting ITP when not safe, considered later)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    82/163

    High Level Constraints on Flight Crew

    The flight crew must not execute the ITP when it has not beenapproved by ATC.

    The flight crew must not execute an ITP when the ITP criteria arenot satisfied.

    The flight crew must execute the ITP with correct climb rate, flightlevels, Mach number, and other associated performance criteria.

    The flight crew must not continue the ITP maneuver when it wouldbe dan erous to do so. The flight crew must not abort the ITP unnecessarily. (Rationale:

    An abort may violate separation minimums) When performing an abort, the flight crew must follow regional

    contingency procedures.

    The flight crew must not execute the ITP before approval by ATC. The flight crew must execute the ITP immediately when approved

    unless it would be dangerous to do so. The crew shall be given positive notification of arrival at the

    requested FL

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    83/163

    Potentially Hazardous Control Actionsfor ATC

    Control ActionNot Providing CausesHazard Providing Causes Hazard

    Wrong Timing/OrderCauses Hazard

    Stopped Too Soon orApplied Too Long

    Approve ITPrequest

    Approval given when criteriaare not met

    Approval given to incorrectaircraft

    Approval given too early

    Approval given too late

    Deny ITP request

    AbnormalTerminationInstruction

    Aircraft should abort butinstructionnot given

    Abort instruction given whenabort is not necessary

    Abort instruction giventoo late

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    84/163

    High-Level Constraints on ATC

    Approval of an ITP request must be given only when the ITPcriteria are met.

    Approval must be given to the requesting aircraft only.

    Approval must not be given too early or too late [needs to be

    clarified as to the actual time limits] An abnormal termination instruction must be given when

    continuing the ITP would be unsafe.

    An abnormal termination instruction must not be given when it

    is not required to maintain safety and would result in a loss ofseparation.

    An abnormal termination instruction must be givenimmediately if an abort is required.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    85/163

    ExampleSTPA

    Results

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    86/163

    Limitations of Current SafetyAssurance Approach

    Barriers and Effects

    Identify Operational Effects (OEs) that could result fromoccurrence of an OH

    Identify barriers that could prevent the OH from leading to.

    Barriers modeled and probability that an OE occurs giventhat the corresponding OH has occurred is estimated.

    Safety targets assigned to events (based on severity)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    87/163

    (It looks like they took a mishmash of techniques from the nuclear powercommunity)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    88/163

    Limitations of Safety AssuranceApproach

    Human Error analysis

    Held workshops with pilots and controllers to assess likelihood ofeach human error.

    Not sure how generated list of human errors but seemsincomplete

    Then created fault trees to determine probabilities

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    89/163

    DODO--312 Hazard Analysis for FC312 Hazard Analysis for FC

    DO-312 begins withOperational Hazards (whichare actually basic causes) Then identify chains-of-

    events (fault trees) thatcould lead to basic causes

    Each set of events isass gne a quant tat vesafety objective

    Human factors Assign probability of error Provides little accounting for

    why errors may occur

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    90/163

    Assumes that ATC &FC failing to detectdistance non-compliance areindependent

    communicationerrors are due only tocorruption of HF data

    DO-312 STPA

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    91/163

    DO 312Execution of an ITP Clearance notCompliant with ITP Criteria

    Assumption AS.40 The probability that ATC does notreceive ITP Distance (as part of the ITPclimb/descent request) but approves ITPprocedure or fails to detect that ITPDistance received in the request is not

    compliant, is assumed to occur no more

    Unsafe Control Action: ITP Flight Crewincorrectly executes ITPRequirement

    [1.2.1.1] Once ITP request has been made, allcommunication between ATC and the FC mustoccur on the same communication channel

    [1.2.1.2] All communication protocols mustinclude definitions of when a communication iscomplete

    [1.10] [1.17]

    STPA

    requen y an ery are.

    AS.12 The corruption of informationbecause of HF occurs no more than Often.

    [1.18] ATC must have access to current*knowledge of the velocity, heading, and locationof all aircraft involved in ITP request

    Assumption : ATC will have this knowledge as partof their overall ability to maintain separation,

    regardless of ITP clearances.[1.1.2] ITP shall provide the flight crews of aircraftoperating in procedural airspace the ability todetermine a clear procedure for communicatingdata about the desired flight level change andnecessary state data to the local air trafficcontroller

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    92/163

    How Can Performance Requirements beVerified?

    Example:

    The likelihood that the ITP equipment providesundetected erroneous information about accuracy andintegrity levels of own data shall be less than 1E-3 perflight hour.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    93/163

    Comparison

    We found many missing requirements Example:

    DO-312 assumes that the reference aircraft will not deviatefrom its flight plan during ITP execution.

    There should be a contingency or protocol in the event thatt e re erence a rcra t oes not ma nta n ts expecte speeand trajectory, for example, because of an emergencyrequiring immediate action (e.g., TCAS alert)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    94/163

    Heuristics to Help with Step 1

    We are creating additional procedures to assist with thisstep (and others)

    Thomas has defined some guidewords and a procedureto go through this process more rigorously

    Starts with identifying environmental or system stateconditions affecting behavior of component

    Then consider for each possible state the result of

    Providing the control action

    Not providing it

    Much of this can potentially be automated

    Control T i i d i i

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    95/163

    Door Controller

    Door position

    Door clear?

    Controlcommands

    Train motion and positionEmergency notification

    Open doorsClose doors

    Reverse Direction

    Door position

    Train motion

    Doorway obstructed?

    Fully openFully closedOpeningUnknown

    Train position

    Emergency?

    Train Door Controller

    Door Actuator

    DoorSensors

    Train Doors

    Disturbances

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    96/163

    1) Control actions provided in state thatmakes them hazardous

    Define context conditions (from process model and fromhazards)

    Train motion: Train is stopped, train is in motion

    Hazard: Doors are o ened while train in motion Emergency: No emergency, emergency situation requiringevacuation(Hazard: Doors do not open for emergency evacuation)

    Train position: Train is aligned with platform, train is notaligned with platform

    (Hazard: doors open when train not aligned with platform)

    Control actions provided in state where action is hazardous

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    97/163

    ControlAction

    Condition

    1: TrainMotion

    Condition 2:Emergency

    Condition

    3: TrainPosition

    Hazardous control action?

    Providedany timein thiscontext?

    Providedtoo earlyin thiscontext?

    Providedtoo late inthiscontext?

    Door opencommandprovided

    Train ismoving

    Noemergency

    (doesntmatter) Yes Yes Yes

    Control actions provided in state where action is hazardous

    moving

    exists

    matter) Yes* Yes* Yes*

    Train isstopped

    Emergencyexists

    (doesntmatter) No No Yes

    Train isstopped

    Noemergency

    Not alignedwithplatform

    Yes Yes Yes

    Train isstopped

    Noemergency

    Aligned withplatform No No No

    Control actions provided in state where inaction is hazardous

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    98/163

    Control actions provided in state where inaction is hazardous

    Control ActionTrain

    Motion EmergencyTrain

    Position Door StateHazardous if

    not provided in this

    context?Door opencommand notprovided

    Train isstopped

    Noemergency

    Alignedwith

    platform

    Person notin doorway No

    [1]

    Train isstopped

    Noemergency

    Alignedwith

    platform

    Person indoorway Yes

    Train isstopped

    Noemergency

    oaligned

    withplatform

    (doesntmatter) No

    Train isstopped

    Emergencyexists

    (doesntmatter)

    (doesntmatter) Yes

    Train ismoving

    (doesntmatter)

    (doesntmatter)

    (doesntmatter) No

    1 This is not hazardous because it does not lead to any of the identifiedhazards but clearly not desirable

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    99/163

    Safety-Guided Design

    Safety analysis and design should be integrated intosystem engineering process

    Most important decisions related to design made in earlyconcept development stage.

    Once made ver difficult or im ossible to chan e

    So kludges made to try to fix the problems (usuallyexpensive and not very effective)

    Cheapest and most effective if design safety in from thebeginning

    Can save money and time doing this (less rework)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    100/163

    Steps in Safety-Guided Design

    1. Identify system hazards

    2. Translate hazards into system-level safety constraintsand requirements.

    3. Try to eliminate hazards from system conceptual level.4. If cannot eliminate, then identify potential for control at

    system.

    5. Create system control structure and assignresponsibilities for enforcing safety constraints.

    S i S f G id d D i (2)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    101/163

    Steps in Safety-Guided Design (2)

    6. Refine system safety constraints and design in parallel.

    a. STPA step 1: identify potentially hazardous controlactions. Restate as component safety design constraintsand requirements.

    b. STPA step 2: determine factors that could lead to

    violation of safety constraintsc. Augment basic design to eliminate, mitigate, or control

    potentially unsafe control actions and behaviors.

    d. Iterate over the process, i.e. perform STPA on the newaugmented design and continue to refine the design until

    all hazardous scenarios are eliminated, mitigated, orcontrolled.

    7. Document design rationale and trace requirements andconstraints to the related design decisions

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    102/163

    1. Identify high-level functional requirements andenvironmental constraints.

    e.g. size of physical space, crowded area

    2. Identify high-level hazardsa. Violation of minimum separation between mobile base and

    Thermal Tile Robot Example

    b. Mobile robot becomes unstable (e.g., could fall over)

    c. Manipulator arm hits something

    d. Fire or explosion

    e. Contact of human with DMESf. Inadequate thermal control (e.g., damaged tiles not detected,

    DMES not applied correctly)g. Damage to robot

    Define preliminary control structure and refine constraints

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    103/163

    Define preliminary control structure and refine constraintsand design in parallel.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    104/163

    3. Try to eliminate hazards from system conceptualdesign. If not possible, then identify controls andnew design constraints.

    For unstable base hazard

    ys em a e y ons ra n :

    Mobile base must not be capable of falling over under

    worst case operational conditions

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    105/163

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    106/163

    Id tif t ti ll h d t l ti b h f

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    107/163

    Identify potentially hazardous control actions by each ofsystem components

    1. A required control action is not provided or not followed2. An incorrect or unsafe control action is provided3. A potentially correct or inadequate control action is provided

    too late or too early (at the wrong time)4. A correct control action is stopped too soon.

    azar ous con ro o s a zer egs: Legs not deployed before arm movement enabled

    Legs retracted when manipulator arm extended

    Legs retracted after arm movements are enabled or retracted

    before manipulator arm fully stowed Leg extension stopped before they are fully extended

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    108/163

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    109/163

    Restate as safety design constraints on components

    1. Controller must ensure stabilizer legs are extendedwhenever arm movement is enabled

    2. Controller must not command a retraction of stabilizer legswhen manipulator arm extended

    . on ro er mus no comman ep oymen o s a zer egsbefore arm movements are enabled. Controller must notcommand retraction of legs before manipulator arm fullystowed

    4. Controller must not stop leg deployment before they are fullyextended

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    110/163

    Do same for all hazardous commands:

    e.g., Arm controller must not enable manipulator armmovement before stabilizer legs are completely extended.

    ,controller in same component

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    111/163

    To produce detailed scenarios for violation of

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    112/163

    To produce detailed scenarios for violation ofsafety constraints, augment control structure with

    process models

    Arm MovementEnabledDisabled

    Stabilizer LegsExtendedRetracted

    Manipulator ArmStowed

    Extended

    How could become inconsistent with real state?

    e.g. issue command to extend stabilizer legs but externalobject could block extension or extension motor couldfail

    f

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    113/163

    Problems often in startup or shutdown:

    e.g., Emergency shutdown while servicing tiles. Stability legs

    manually retracted to move robot out of way. When restart,assume stabilizer legs still extended and arm movement could becommanded. So use unknown state when starting up

    Do not need to know all causes, only safety constraints: May decide to turn off arm motors when legs extended or when

    arm extended. Could use interlock or tell computer to power it off.

    Must not move when legs extended? Power down wheel motorswhile legs extended.

    Check for coordination problems

    General Design for Safety Principles

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    114/163

    General Design for Safety Principles

    In addition to identified application-specific designconstraints

    Result from: General STAMP principles of accident causation General engineering design principles

    (requirements completeness criteria in Safeware )

    Divided into General principles for any controller Special system design principles to reduce human errors

    Details in Chapter 9 of Engineering a Safer World

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    115/163

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    116/163

    CAST: Accident/IncidentCausal Analysis

    Goals for an Accident Analysis

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    117/163

    Goa s o a cc de t a ys sTechnique

    Minimize hindsight bias

    Provide a framework or process to assist inunderstanding entire accident process and identifyingsystemic factors

    Get away from blame (who) and shift focus to whyand how to prevent in the future

    Goal is to determine Why people behaved the way they did

    Weaknesses in the safety control structure that allowedthe loss to occur

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    118/163

    Hindsight Bias

    After an incident

    Easy to see where people went wrong, what they shouldhave done or avoided

    Easy to judge about missing a piece of information thatturned out to be critical

    Easy to see what people should have seen or avoided

    shoulda, coulda, woulda

    Hindsight Bias

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    119/163

    Hindsight Bias

    Almost impossible to go back and understand how worldlooked to somebody not having knowledge of outcome

    Oversimplify causality because can start from outcome andreason backward to presumed or plausible causes

    Overestimate likelihood of the outcome and peoples ability to

    foresee it because already know outcome Overrate rule or procedure violations

    Misjudge prominence or relevance of data presented to peopleat the time

    Match outcomes with actions that went before it: if outcome bad,actions leading to it must have been bad too (missedopportunities, bad assessments, wrong decisions, andmisperceptions)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    120/163

    Sidney Dekker, 2009

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    121/163

    Hindsight Bias Examples

    Data availability vs. data observability The available evidence should have been sufficient to give

    the Board Operator a clear indication that Tank 731 wasindeed filling and required immediate attention.

    Operators could have trended the data on the control board

    Board Control Valve Position: closed Flow Meter: shows no flow Flow: none

    Bypass Valve: closed SO 2 alarm: off Level in tank: 7.2 feet High level alarm: off

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    122/163

    Hindsight Bias Examples

    Another example

    Interviews with operations personnel did not produce a clearreason why the response to the SO 2 alarm took 31 minutes.The only explanation was that there was not a sense ofurgency since, in their experience, previous SO

    2alarms

    were a r u e o m nor re eases a no requ re a unevacuation.

    Overcoming Hindsight Bias

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    123/163

    Overcoming Hindsight Bias

    Assume nobody comes to work to do a bad job. Assume were doing reasonable things given the complexities,

    dilemmas, tradeoffs, and uncertainty surrounding them.

    Simply finding and highlighting peoples mistakes explainsnothing.

    Sa in what did not do or what should have done does notexplain why they did what they did.

    Investigation reports should explain Why it made sense for people to do what they did rather than

    judging them for what they allegedly did wrong and

    What changes will reduce likelihood of happening again

    Avoiding Hindsight Bias

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    124/163

    Avoiding Hindsight Bias

    Need to consider: Goals person pursuing at time and whether reasonable given

    circumstances

    Whether and how goals conflicted with each other (e.g., safetyvs. efficiency, production vs. protection)

    Unwritten rules and norms that may have played a role inbehavior

    Available vs. observable information

    Attentional demands Organizational context

    Cali American Airlines Crash

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    125/163

    Cali American Airlines Crash

    Cited probable causes : Flight crews failure to adequately plan and execute the approach to

    runway 10 at Cali and their inadequate use of automation

    Failure of flight crew to discontinue the approach into Cali, despitenumerous cues alerting them of the inadvisability of continuing the

    approach Lack of situational awareness of the flight crew regarding vertical

    navigation, proximity to terrain, and the relative location of criticalradio aids.

    Failure of the flight crew to revert to basic radio navigation at the

    time when the FMS-assisted navigation became confusing anddemanded an excessive workload in a critical phase of flight.

    CAST (Causal Analysis using

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    126/163

    System Theory)

    Identify system hazard violated and the system safetydesign constraints

    Construct the safety control structure as it was designedto work

    Control actions and feedback loops

    For each component, determine if it fulfilled itsresponsibilities or provided inadequate control.

    If inadequate control, why? (including changes over time) Context Process Model Flaws

    ( )

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    127/163

    CAST (2)

    Examine coordination and communication

    Consider dynamics and migration to higher risk

    Determine the changes that could eliminate the

    constraints) in the future.

    Generate recommendations

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    128/163

    Titan IV/Milstar Loss

    Physical Control Structure Involvedand Component Responsibilities

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    129/163

    and Component Responsibilities

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    130/163

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    131/163

    Copyright Nancy Leveson, Aug. 2006

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    132/163

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    133/163

    Copyright Nancy Leveson, Aug. 2006

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    134/163

    Id tif i g C t t I l d

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    135/163

    Identifying Components to Include

    Start with physical process

    What inadequate controls allowed the physical events?

    Physical

    Direct controller Indirect controllers

    Add controls and control components as required toexplain the inadequate controls already identified.

    Federal Aviation Administration

    Flight release, Charts etc.NOTAMs except L

    IOR, ASAP

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    136/163

    Comair: DeltaConnection

    Airport Safety &

    Standards District

    LEX ATCFacility

    5191

    FlightCrew

    Certification, Regulation,Monitoring & Inspection

    Procedures, Staffing, Budget Aircraft Clearance and

    Monitoring

    Read backs, RequestsLocal

    NOTAMs

    ATIS & L NOTAMsOperational Reports

    ALPA

    Optional constructionsignage

    Certification, Inspection,Federal Grants

    ATO:TerminalServices

    Pilot perspectiveinformation

    Blue Grass Airport

    Procedures &Standards

    Reports

    Certification & Regulation

    ce

    National

    Flight DataCenter Jeppesen

    Charts, NOTAM Data(except L) to Customer

    Reports, Project Plans

    NOTAM Data

    Chart Discrepancies

    a e y

    AirportDiagram

    AirportDiagram

    Verification

    = missing feedback lines

    Composite Flight Data, except L NOTAM

    Graphical Airport Data

    Construction information

    Authority

    Communications

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    137/163

    Why Our Efforts are Often NotWhy Our Efforts are Often Not

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    138/163

    yyCostCost- -EffectiveEffective

    Efforts superficial, isolated, or misdirected

    Often isolated from engineering design

    Spend too much time and effort on assurance not

    desi nin for safet Focusing on making arguments that systems are safe rather

    than making them safe

    Safety cases: Subject to confirmation bias

    Should be trying to prove the system is unsafe, not that it issafe

    Safety must be built in, it cannot be assured in

    Safety Cases

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    139/163

    y

    An argument that system design is safe is not enough Have been criticized as a causal factor in accidents

    Subject to confirmation bias

    their preconceptions or hypotheses regardless of whetherthe information is true.

    Value of system safety is doing what engineers do not

    do. A different viewpoint.

    Confirmation Bias

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    140/163

    Confirmation Bias

    People will focus on and interpret evidence in a way thatconfirms the goal they have set for themselves

    If the goal is to prove the system is safe, they will focus onthe evidence that shows it is safe and create an argument

    . If the goal is to show the system is unsafe, the evidence

    used and the interpretation of available evidence will bequite different.

    People also tend to interpret ambiguous evidence assupporting their existing position.

    Confirmation Bias (2)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    141/163

    Confirmation Bias (2)

    Experiments show people tend to test hypotheses in aone-sided way, by searching for evidence consistent withthe hypothesis they hold at a given time.

    Rather than searching through all the relevant evidence,

    answer supports their hypothesis.

    A related aspect is the tendency for people to focus on onepossibility and ignore alternatives.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    142/163

    Why our Efforts are Often NotWhy our Efforts are Often Not

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    143/163

    CostCost- -Effective (2)Effective (2)

    Safety efforts start too late

    80-90% of safety-critical decisions made in early systemconcept formation

    Cannot add safet to an unsafe desi n

    Why our Efforts are Often NotWhy our Efforts are Often NotCostCost- -Effective (3)Effective (3)

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    144/163

    CostCost Effective (3)Effective (3)

    Using inappropriate techniques for systems built today Mostly used hazard analysis techniques created 40-50 years

    ago Developed for relatively simple electromechanical systems New technology increasing complexity of system designs and

    introducing new accident causes Complexity is creating new causes of accidents

    Should build simplest systems possible, but usually unwillingto make the compromises necessary

    1. Complexity related to the problem itself

    2. Complexity introduced in the design of solution of problem Need new, more powerful safety engineering approaches to

    dealing with complexity and new causes of accidents

    Why our Efforts are Often NotWhy our Efforts are Often Not

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    145/163

    CostCost- -Effective (4)Effective (4)

    Focus efforts only on technical components of systems

    Ignore or only superficially handle

    Management decision making

    Safety culture

    Focus on development and often ignore operations

    Inadequate risk communication (inaccurate perceptionsof risk)

    Limited learning from events

    Safety culture,

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    146/163

    management,

    and thesinking of thelargestoffshore oilp a orm

    March 2001

    For those of you who maybe involved in managing

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    147/163

    g gsafety-critical projects

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    148/163

    Read this quote from aPetrobras executive,

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    149/163

    ,

    on the project thatsunk into the Atlantic

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    150/163

    Ocean off the coast of

    Brazil in March 2001.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    151/163

    "Petrobras has established new global benchmarksfor the generation of exceptional shareholder wealth

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    152/163

    through an aggressive and innovative programmeof cost cutting on its P36 production facility.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    153/163

    Conventional constraints have been successfully challenged

    and replaced with new paradigms appropriate to the

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    154/163

    p p g pp pglobalised corporate market place.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    155/163

    Through an integrated network offacilitated workshops,

    the project successfully rejected the established constricting andnegative influences of prescriptive engineering,

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    156/163

    onerous quality requirements, andoutdated concepts of inspection andl l

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    157/163

    client control.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    158/163

    Elimination of these unnecessary straitjackets has empowered the project'ssuppliers and contractors to propose highly economical solutions,

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    159/163

    with the win-win bonus of enhancedprofitability margins for themselves.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    160/163

    The P36 platform shows the shape of things to come

    in the unregulated global market economy of the 21st Century.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    161/163

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    162/163

    And now you have seen the final result ofthis proud achievement by Petrobras.

  • 8/10/2019 Levenson - 1-2-Beginners-Tutorial-part2.pdf

    163/163

    A life without adventure is likely to be unsatisfying, but a lifein which adventure is allowed to take whatever form it will,is likely to be short.

    Bertrand Russell