
r2A Risk and Reliability 5th_Edition



Contents

Risk & Reliability Associates Pty Ltd i

Copyright © March, April 1996, February 1997, January 1998, April-July 1999, June 2000, January 2001, May 2001, February 2002, March 2004, April 2004. Risk & Reliability Associates Pty Ltd, Consulting Engineers.

5th Edition cover by Peter Anderson. 5th Edition co-ordination and review by Kris Francis. 5th Edition editing by Cherilyn Tillman and Bob Browning. Printed and bound in Australia by Imscam Pty Ltd, Melbourne.

This text is copyright. Apart from any fair dealing for the purpose of private study, research, criticism or review or as otherwise permitted under the Copyright Act, no part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means electronic, optic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher, Risk and Reliability Associates Pty Ltd.

ISBN 0-9585241-3-0
RRP AUD $298.00 (including GST). Postage and handling extra.

Published by:
Risk & Reliability Associates Pty Ltd
ACN: 072 114473 ABN: 98 072114473
Consulting Engineers
Level 2, 56 Hardware Lane
MELBOURNE AUSTRALIA 3000
e-mail: [email protected]
web: http://www.r2a.com.au
fax: +61 3 9670 5278
voice: +61 3 9602 4747
Also in Sydney and Wellington.

This text is intended to provide general information concerning the concepts and applications of risk and reliability theory. The text is used by R2A in its training courses on risk and reliability assessment. The examples and templates are provided as examples of the analytical tools used in assessing and managing risk. They should not be used as a substitute for obtaining professional advice or assistance. The authors accept no responsibility for any errors or omissions in the material, or for the results of any actions taken as a result of using these examples or templates.


R2A Document Control: Risk & Reliability – An Introductory Text

Edn. | Date | Section | Issue/Nature of Revision | Prepared | Reviewed
1.0 | 04/96 | | First Edition | RMR | KJA
2.0 | 02/97 | | Second Edition | RMR | KJA
3.0 | 01/98 | | Third Edition | RMR |
3.1 | 07/99 | | Third Edition, Revised | RMR |
3.2 | 06/00 | | Third Edition, Second Revision | GEF | RMR
3.3 | 01/01 | | Third Edition, Third Revision | LS | RMR
4.0 | 02/02 | | Fourth Edition | GEF, CJT, RWB | RMR
5.0 | 02/03/04 | | Fifth Edition | RMR, KJA, CJT, RWB | CJT, RWB
 | 15/03/04 | | Typos and layout | RMR | KNF
 | 23/03/04 | Chapter 16 & Index | | CJT, RWB | RMR
 | 04/04/04 | Chapters 17 & 18 | | RWB | MK, FS, RMR
 | 19/04/04 | | Typos & Index | RWB | RMR

Contributors to earlier editions and revisions include:

Teresa Alam John Bellhouse Keith Hart Matthew Lambert Simon Meiers Paul Rees PM Strickland.

TABLE OF CONTENTS

Preface to the 5th Edition
A SHORT DICTIONARY OF RISK & RELIABILITY TERMS AND ACRONYMS


PART 1 – GENERAL PRINCIPLES

1. INTRODUCTION TO RISK AND RELIABILITY CONCEPTS
   1.1 The Nature of Risk
   1.2 Types of Risk
   1.3 Risk Management Evolution
   1.4 Historical Perspectives of Risk
   1.5 Reliability
   1.6 Quality
2. RISK PARADIGMS & MODELS
   2.1 The Rule of Law
   2.2 Insurance
   2.3 Asset Management
   2.4 Threats and Vulnerabilities
   2.5 Risk as Variance
   2.6 Best Practice
   2.7 Simulation
   2.8 Culture
   2.9 Paradigm Integration
   2.10 Risk Models
3. RISK AND GOVERNANCE
   3.1 Risk Management’s Role in Good Governance
   3.2 Corporate Governance Systems
   3.3 Origins of the Good Governance Movement
   3.4 The Rise of the Risk Society
   3.5 Governance and Non-Financial Risk
   3.6 Public Sector Governance and Risk
   3.7 Risk and Corporate Citizenship
   3.8 Fallout Severity
   3.9 Basic Principles of Good Corporate Urban Governance
4. LIABILITY
   4.1 Criminal vs Civil Standard
   4.2 Common Law Criteria
   4.3 On Juries and Justice
   4.4 Due Diligence
   4.5 Safety Cases
   4.6 Adversarial Legal System Contradictions
   4.7 Risk Auditing Systems
5. CAUSATION
   5.1 Paradigms
   5.2 Biological Metaphors
   5.3 Discrete State Concepts
   5.4 Time Sequence
   5.5 Energy Damage
   5.6 Energy Damage Models
   5.7 Latent Conditions
6. RISK CRITERIA
   6.1 Legal Criteria
   6.2 Individual Risk Criteria
   6.3 Societal Risk Criteria
   6.4 Environmental Risk Criteria
   6.5 Insurance Criteria
   6.6 Ethical Criteria


PART 2 – TECHNIQUES

7. TOP DOWN TECHNIQUES
   7.1 SWOT Assessments
   7.2 Upside and Downside Risk
   7.3 Vulnerability Assessments
   7.4 Enterprise Risk Profiling
   7.5 Project Risk Profiling
8. RANKING TECHNIQUES
   8.1 Risk Registers
   8.2 Ranking Acute OH&S Hazards
   8.3 Ranking Property Loss Prevention Hazards
   8.4 Integrated Investment Ranking
9. MODELLING TECHNIQUES
   9.1 Trees
   9.2 Blocks
   9.3 Integrated Presentation Models
   9.4 Common Cause Failures
   9.5 Human Error Rates
   9.6 Equipment Fault Rates
   9.7 System Safety Assurance
10. BOTTOM UP TECHNIQUES
   10.2 RCM
   10.3 HazOps
   10.4 Common Mode Failures
   10.5 Risk Management and the Project Life Cycle
   10.6 QRA
   10.7 HACCP
11. GENERATIVE TECHNIQUES
   11.1 James Reason et al
   11.2 Transparent Independent Rapid Risk Reporting
   11.3 Generative Interview Technique
   11.4 Generative Solutions Technique
12. RISK & RELIABILITY MATHEMATICS
   12.1 Discrete Event Mathematics
   12.2 Breakdown Failure Mathematics
   12.3 State Theory Mathematics
   12.4 Fractional Dead Time Mathematics


PART 3 – THEMES, APPLICATIONS AND CASE STUDIES

13. PROCESS INDUSTRY MODELLING
   13.1 Safety Cases
   13.2 Context (Top Down)
   13.3 Quantitative Risk Assessment (QRA)
   13.4 Fire Modelling
   13.5 Pool Fires
   13.6 Jet Flames
   13.7 Explosions
   13.8 Toxic Gas Clouds
   13.9 Fire Safety Studies
   13.10 Risk Criteria Used in Australia and New Zealand
14. CRISIS MANAGEMENT
   14.1 Intention
   14.2 Lessons in Fallout Management
   14.3 Design Stage
   14.4 Case Studies
   14.5 Conclusion
15. INDUSTRY BASED CASE STUDIES
   15.1 Airspace Risk Assessment
   15.2 Train Operations Rail Model
   15.3 Fire Risk Management (in buildings)
   15.4 Transmission Line Risk Management
   15.5 Bushfire Risk Management
   15.6 Tunnel Risk Management
16. OCCUPATIONAL HEALTH & SAFETY
   16.1 Legislative Framework
   16.2 OH & S Risk Assessment
   16.3 Performance Indicators
   16.4 Information Structures
   16.5 Audit & Safety Management Systems
17. FINANCIAL RISK
   17.1 Risk and Opportunity
   17.2 Terms
   17.3 Utility and Risk
   17.4 Models
   17.5 Market Risk Mathematics
18. SECURITY
   18.1 Security and Risk Management
   18.2 Security Terms
   18.3 Basic Elements of Security Management
   18.4 The Terrorist Threat


Preface to the 5th Edition

This is the 5th Edition of Risk and Reliability - An Introductory Text. Risk and Reliability Associates Pty Ltd published the first edition of this Text in April 1996.

Presently the Text has three parts. Parts 1-2 are based on the very successful 2-day risk management short courses presented by R2A Director Richard Robinson for EEA (Engineering Education Australia). Part 3 summarises published R2A practice experience. R2A intends to extend the Text to four parts so as to include material based on the System Safety Assurance Course presented by R2A Director Kevin Anderson for EEA. This course presently uses the 4th Edition as background reading, but work on the 6th Edition is scheduled for later in 2004.

The evolving nature of risk and risk management in the contemporary globalising environment, sometimes described as the Risk Society, necessitates frequent revisions and additions. The recent spate of high-profile local and overseas corporate failures, for example, has created unprecedented interest in corporate governance. The vulnerabilities evident in large-scale technology require scrutiny with respect to both accidental and deliberate actions. And liability is increasingly ubiquitous.

An integration of the top down and bottom up risk management concepts and techniques explained in Parts 1-2 becomes necessary to cope with the widening range and severity of modern risk. Part 3 comprises technical explanations of the practical applications of these concepts and techniques. The addition of Part 4 to the planned 6th Edition will address risks resulting from the rise of computer systems and how, in the context of human frailty, such risks can be managed.

R W Browning
Hardware Lane, Melbourne
March 2004


A Short Dictionary of Risk & Reliability Terms and Acronyms

The dictionary below defines the usage of key terms in the R2A Text. Given the multi-disciplinary nature of risk management, different specialist groups often attribute different meanings to commonly used terms, and different terms are often used for similar or near identical concepts. Items underlined are referenced as a separate entry in the R2A dictionary. For simplicity, acronyms have been included rather than given a separate listing. The list is adapted from an earlier list presented in a paper by R M Robinson and D B L Viner (1983).

Accountability: The property that ensures that the actions of an entity can be traced.
ALARA: As Low As Reasonably Achievable.
ALARP: As Low As Reasonably Practicable.
Algorithm: An explicit and finite step-by-step procedure for solving a problem or achieving a required end.
Asset: In engineering and commerce, usually a capital cost item. In security, insurance and loss control, usually refers to an item that if (accidentally) lost would cause a loss.
Audit: An inspection or checking of methods of doing business.
Audit Trail: Data collected and potentially used to facilitate an audit.
Availability: The ratio of the total system or entity ‘up time’ to system or entity elapsed time, the latter being the sum of the total ‘up time’ and ‘down time’. It is therefore a function of reliability and repair time.
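The Availability entry can be illustrated with a minimal Python sketch. It assumes total up and down times have already been summed over the observation period; the second function uses the commonly quoted steady-state form MTBF/(MTBF + MTTR), which links the ratio to the MTBF and MTTR acronyms defined later in this dictionary. The function names and figures are hypothetical.

```python
def availability(up_time: float, down_time: float) -> float:
    """Ratio of total up time to elapsed time (up time + down time)."""
    elapsed = up_time + down_time
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return up_time / elapsed


def steady_state_availability(mtbf: float, mttr: float) -> float:
    """Long-run availability from mean time between failures and mean time to repair."""
    return mtbf / (mtbf + mttr)


# Hypothetical plant: 8,700 h up and 60 h down over one year of operation.
print(round(availability(8700, 60), 4))                # 0.9932
print(round(steady_state_availability(1000, 10), 4))   # 0.9901
```

Because repair time appears in the denominator, shortening MTTR improves availability even when reliability (MTBF) is unchanged. This is why the entry describes availability as a function of both reliability and repair time.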

Business Interruption: In insurance terms, the loss of profits over a defined period, typically a year; otherwise any production or sales stoppage.
Common Law: The unwritten law derived from the traditional law of England as developed by judicial precedent, interpretation, expansion and modification (Butterworth (1998). Concise Australian Legal Dictionary. Butterworth, Australia).

Common Mode Failure: Common Mode Failures refer to the simultaneous failure of multiple components or systems due to a single, normally external, cause such as an earthquake or fire. The term is used to distinguish these from discrete failures of individual components or systems due to a defect arising locally within that component or system.
In commercial terms it refers to threats whose occurrence would simultaneously affect multiple inputs to any equation, for example, the advent of a third world war, changes in interest rates or raw material sources, and the like.
Consequence/s: The actual or potential degree of severity of loss or gain.


Controls: The most common term used in safety; in this context it means to hold in check or to restrain. It encompasses a large range of measures taken to reduce the likelihood and consequences of adverse outcomes. Controls can encompass both protection and precautions. For example, personal protective equipment is generally protection. The usual hierarchy of controls is:
   Elimination, that is, removal of the hazard or risk
   Engineering controls, that is, those that design out the hazard or reduce it
   Substitution of a less hazardous substance or equipment or process
   Administrative controls such as job rotation to reduce exposure time to the hazard
   Personal protective equipment, for example, dust masks, hearing protectors, gloves etc
Critical Control Point (CCP): A point, step or procedure at which control can be applied and a food safety hazard can be prevented, eliminated or reduced to acceptable levels.
Damage Control: Procedures designed to minimise the severity of loss.
The same performance of a function by two or more independent and dissimilar means (of particular reference to software) (Smith D J (1993). Reliability, Maintainability and Risk. Practical Methods for Engineers. 4th Edition. Butterworth Heinemann, Oxford).

Due Diligence: A minimum standard of behaviour involving a system which provides against contravention of relevant regulatory provisions and adequate supervision ensuring that the system is properly carried out (Butterworth (1998). Concise Australian Legal Dictionary. Butterworth, Australia).
A statutory defence to a charge of causing or permitting environmental harm or pollution (Butterworth (1998). Concise Australian Legal Dictionary. Butterworth, Australia).
Engineering: Those activities devoted to changing the material world to a desired state (Robinson Richard M (1981). An Outline of the Philosophy of Engineering and its Consequences, General Engineering Transactions, Engineers Australia, Vol. GE5, No.1, July 1981 pp.35-41).
ERA: Environmental Risk Assessment.
ERRF: External Risk Reduction Facility.
Engineers Australia: The trading name of The Institution of Engineers, Australia.
Environmental Hazard: An event or continuing process which, if realised, will lead to circumstances having a potential to degrade, directly or indirectly, the quality of the environment in the short or long term (Wright N H (1993). Development of Environmental Risk Assessment (ERA) in Norway. Norske Shell Exploration and Production).

Environmental Risk: A measure of potential threats to the environment, which combines the probability that the events will cause, or lead to, degradation of the environment and the severity of that degradation (Wright N H (1993). Development of Environmental Risk Assessment (ERA) in Norway. Norske Shell Exploration and Production).
EUC: Equipment Under Control.
Event: An incident or situation which occurs in a particular place during a particular interval of time (AS 4360:1999 Risk Management).


Event Tree Analysis: A hazard identification and frequency analysis technique which employs inductive reasoning to translate different initiating events into possible outcomes (AS 3931:1998 Risk Analysis of Technological Systems – Applications Guide). These are displayed graphically.
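The inductive step of an event tree, from an initiating event forward to outcomes, amounts to multiplying the initiating frequency by the branch probabilities along each path. The following is a minimal Python sketch under stated assumptions. The tree structure, the function name `outcome_frequencies` and all numbers are hypothetical, and the branch probabilities are taken as conditional on the path already followed.

```python
from typing import Dict, Union

# A branch node is (question, probability of "yes", yes-subtree, no-subtree);
# a leaf is simply the name of an outcome.
Tree = Union[str, tuple]


def outcome_frequencies(tree: Tree, frequency: float) -> Dict[str, float]:
    """Propagate an initiating-event frequency through an event tree."""
    if isinstance(tree, str):  # leaf: an outcome
        return {tree: frequency}
    _question, p_yes, yes_branch, no_branch = tree
    result: Dict[str, float] = {}
    for branch, p in ((yes_branch, p_yes), (no_branch, 1.0 - p_yes)):
        for outcome, f in outcome_frequencies(branch, frequency * p).items():
            result[outcome] = result.get(outcome, 0.0) + f
    return result


# Hypothetical tree: a leak (1e-2 per year) may ignite; if it does, the sprinkler may fail.
leak_tree = ("ignition?", 0.1,
             ("sprinkler fails?", 0.05, "uncontrolled fire", "controlled fire"),
             "safe dispersal")

for outcome, f in outcome_frequencies(leak_tree, 1e-2).items():
    print(f"{outcome}: {f:.2e} per year")
```

The outcome frequencies always sum to the initiating-event frequency, which is a useful completeness check on a hand-drawn tree.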

Failure (risk): A cessation of function that has consequences (usually meaning death, injury or damage) beyond a component or entity merely becoming unavailable to perform its function. It can also be referred to as a ‘hazardous’ failure (Smith D J (1993) Reliability, Maintainability and Risk. Practical methods for Engineers. 4th Edition. Butterworth Heinemann, Oxford).
Failure (reliability): See Fault.
Fault: The inability of an entity to perform its required function, resulting in unavailability. Non-performance to some defined performance criterion (Smith D J (1993) Reliability, Maintainability and Risk. Practical methods for Engineers. 4th Edition. Butterworth Heinemann, Oxford). It can also be referred to as a breakdown failure.

FDT: Fractional Dead Time (a form of unavailability). The fraction of any time period that a defence or control system is ‘dead’ (cannot operate correctly). It is therefore a function of audit frequency and the time to revive/restore the control system.
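The link between FDT and audit (test) frequency can be shown with a minimal Python sketch. The sketch makes the following assumptions, none of which comes from this entry: failures occur at a constant rate λ, they stay hidden until the next test, tests occur every T, and restoration is instantaneous. Under these assumptions, averaging the probability of being failed over one test interval gives 1 − (1 − e^(−λT))/(λT), which is close to the familiar λT/2 when λT is small. The restore-time contribution mentioned in the entry is deliberately left out. The function names and figures are hypothetical.

```python
import math


def fdt_exact(failure_rate: float, test_interval: float) -> float:
    """Mean fractional dead time: unrevealed failures, constant rate, instant restoration."""
    x = failure_rate * test_interval
    if x <= 0:
        raise ValueError("failure rate and test interval must be positive")
    return 1.0 - (1.0 - math.exp(-x)) / x


def fdt_approx(failure_rate: float, test_interval: float) -> float:
    """Common approximation lambda*T/2, valid when lambda*T << 1."""
    return failure_rate * test_interval / 2.0


# Hypothetical trip system: 0.1 unrevealed failures per year, tested every 3 months.
lam, T = 0.1, 0.25
print(f"exact {fdt_exact(lam, T):.5f}, approximate {fdt_approx(lam, T):.5f}")
```

Under these assumptions, halving the test interval roughly halves the fractional dead time.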

FMEA: Fault Modes and Effects Analysis (AS 3931:1998 Risk Analysis of Technological Systems – Applications Guide).
FMECA: Fault Modes, Effects and Criticality Analysis (AS 3931:1998 Risk Analysis of Technological Systems – Applications Guide).
Frequency: The rate at which something occurs per unit time.
FTA: Fault Tree Analysis. A hazard identification and frequency analysis technique which starts with the undesired event and determines all the ways in which it could occur. These are displayed graphically (AS 3931:1998 Risk Analysis of Technological Systems – Applications Guide).
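Once a fault tree has been drawn, its top-event probability can be estimated by combining basic-event probabilities through AND and OR gates. The following minimal Python sketch assumes all basic events are independent. A common mode failure, as defined earlier in this dictionary, breaks that assumption, and the AND-gate result would then be optimistic. The gate functions, the example tree and its numbers are hypothetical.

```python
from math import prod


def and_gate(*probs: float) -> float:
    """Output occurs only if every input occurs (independent inputs assumed)."""
    return prod(probs)


def or_gate(*probs: float) -> float:
    """Output occurs if at least one input occurs (independent inputs assumed)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical top event: loss of cooling if the pump fails OR both power supplies fail.
p_pump, p_supply_a, p_supply_b = 1e-3, 1e-2, 1e-2
p_top = or_gate(p_pump, and_gate(p_supply_a, p_supply_b))
print(f"P(loss of cooling) = {p_top:.3e}")
```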

Group Risk: See Societal Risk.
HACCP: Hazard Analysis and Critical Control Point. An approach of identifying, evaluating and controlling safety hazards in food processes.
Hazard: A source of potentially damaging energy which can give rise to a loss; this usage is common among engineers and physical scientists. To be compared to a vulnerability. A source of potential harm or a situation with a potential to cause loss (AS 3931:1998 Risk Analysis of Technological Systems – Applications Guide and AS 4360:1999 Risk Management). A situation that could occur during the lifetime of a product, system or plant that has the potential for damage to the environment.
Hazard Identification: Process of recognising that a hazard exists and defining its characteristics (AS 3931:1998 Risk Analysis of Technological Systems – Applications Guide).
HazOp: HAZard and OPerability study. A formal analysis of a process or plant by the application of guidewords.
HEART: Human Error Assessment and Reduction Technique.


Heuristic: Proceeding to a solution in the absence of an algorithm, by incremental exploration using conceptual devices such as ideal types, models and working hypotheses which are intended to provide solutions rather than explain facts.
HPR: Highly Protected Risk. US engineering term used to describe a level of loss control excellence.
HRA: Human Reliability Assessment.
IChemE: The Institution of Chemical Engineers (UK).
IPENZ: The Institution of Professional Engineers, New Zealand.
Incident: An event or situation which occurs in a particular place during a particular interval of time and which should provide an alert to the risk management system. This can be a failure of a control system or a near miss.

Individual Risk: The frequency at which an individual may be expected to sustain a given level of harm from the realisation of specified hazards (Institution of Chemical Engineers (1985). Nomenclature for Hazard and Risk Assessment in the Process Industries. IChemE, Rugby, Warwickshire).

Insurance: A method of transferring risk by financial means.
Integrity: A property of an object or data that has not been modified and is fit for the purpose for which it is to be used.
IRR: Internal Rate of Return.
JSA: Job Safety Analysis.
Latent Condition: A failure which is not detected and/or annunciated when it occurs (SAE ARP 4761:1998 Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment).

Liability: A person’s present or prospective legal responsibility, duty, or obligation (Butterworth (1998). Concise Australian Legal Dictionary. Butterworth, Australia).
Life Cycle Costing: Life cycle costing provides a method for determining the total cost of a system over its entire life cycle and is used to establish the cost effectiveness of alternative asset solutions. Cost effectiveness is defined as the ratio of systems effectiveness to life cycle cost (Blanchard (1991) Systems Engineering Management. Prentice Hall; Blanchard and Fabrycky (1990). Systems Engineering and Analysis. 2nd Edition, Prentice Hall International; Aslaksen and Belcher (1992). Systems Engineering. Prentice Hall).
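The cost-effectiveness ratio in the Life Cycle Costing entry can be sketched in Python. The sketch assumes that future running costs are discounted to present value. The entry does not specify discounting, so treat this as an illustrative choice. It also uses availability as a stand-in measure of "system effectiveness". Option names, costs and the 7% rate are hypothetical.

```python
from typing import Sequence


def life_cycle_cost(capital: float, annual_costs: Sequence[float], discount_rate: float) -> float:
    """Present value of the capital cost plus each year's running cost (years 1..n)."""
    pv = capital
    for year, cost in enumerate(annual_costs, start=1):
        pv += cost / (1.0 + discount_rate) ** year
    return pv


def cost_effectiveness(effectiveness: float, lcc: float) -> float:
    """Ratio of system effectiveness to life cycle cost."""
    return effectiveness / lcc


# Hypothetical options over 10 years at a 7% discount rate; effectiveness taken as availability.
options = {"A": (100_000, [8_000] * 10, 0.97), "B": (140_000, [3_000] * 10, 0.99)}
for name, (capital, running, eff) in options.items():
    lcc = life_cycle_cost(capital, running, 0.07)
    print(f"{name}: LCC ${lcc:,.0f}, cost effectiveness {cost_effectiveness(eff, lcc):.3e} per $")
```

In this example the cheaper-to-buy option can still have the lower life cycle cost. That trade-off between capital and running costs is what the technique is meant to expose.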

Likelihood: A term to describe the probability or frequency of an occurrence.
Loss: The embarrassment, harm, financial loss, legal or other damage which could occur due to a loss event. Any negative consequence, financial or otherwise (AS 4360:1999 Risk Management), including death, injury, damage, loss or breach of statute. It may lead to a claim and/or court proceedings.
Loss Event: See Occurrence.


Maintainability: The set of technical processes that apply maintainability theory to establish system maintainability requirements, allocate these requirements down to system elements, and predict and verify system maintainability performance (Blanchard and Fabrycky (1990). Systems Engineering and Analysis. 2nd Edition, Prentice Hall International).
MDT: Mean Down Time.
Mitigation: The act of reducing the severity of the potential adverse outcome. Of the types of controls listed above under Controls, mitigation of risk could be achieved by any except the first, that is, elimination.
Monitor: To check, supervise, observe critically, or record the progress of an activity, action or system on a regular basis in order to identify change (AS 4360:1999 Risk Management).

Monte-Carlo Simulation: A frequency analysis technique which uses a model of the system to evaluate variations in input conditions and assumptions (AS 3931:1998 Risk Analysis of Technological Systems – Applications Guide).
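The Monte-Carlo entry can be illustrated with a minimal Python sketch in which the "model of the system" is simply a total cost made up of uncertain items. Each item is sampled from a triangular distribution (low, most likely, high), and the run estimates both the mean cost and the chance of exceeding a budget. The cost items, the choice of triangular distributions and the budget are all hypothetical assumptions.

```python
import random
from typing import List, Tuple

# Hypothetical cost items in $k: (low, most likely, high)
COST_ITEMS: List[Tuple[float, float, float]] = [(80, 100, 150), (40, 50, 90), (20, 30, 60)]


def simulate_cost(n_trials: int, budget: float, seed: int = 1) -> Tuple[float, float]:
    """Sample total cost n_trials times; return (mean cost, fraction of trials over budget)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    total = 0.0
    over = 0
    for _ in range(n_trials):
        # random.triangular takes its arguments in the order (low, high, mode)
        cost = sum(rng.triangular(low, high, mode) for low, mode, high in COST_ITEMS)
        total += cost
        if cost > budget:
            over += 1
    return total / n_trials, over / n_trials


mean_cost, p_over = simulate_cost(100_000, budget=200)
print(f"mean total cost ≈ {mean_cost:.1f} $k; P(cost > 200 $k) ≈ {p_over:.3f}")
```

A single-point estimate using the "most likely" values would give $180k and suggest the budget is safe. The simulation shows that the skew in the inputs pushes the expected cost above $200k.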

MORT: Management Oversight and Risk Tree.
MTBF: Mean Time Between Failure.
MTTF: Mean Time To Failure.
MTTR: Mean Time To Repair.
Occurrence: A sequence of events leading to damage or injury.
P&ID: Process (or Piping) and Instrumentation Diagram.
Paradigm: A universally recognised knowledge system that for a time provides model problems and solutions to a community of practitioners (Kuhn T S (1970). The Structure of Scientific Revolutions. 2nd Edition, enlarged, sixth impression. University of Chicago Press).
Pathogen: In the risk context, Reason (1993) has defined pathogens as analogous to latent failures in technical systems, similar to resident pathogens in the human body (Managing the Management Risk: New Approaches to Organisational Safety, Chapter 1 of Reliability and Safety in Hazardous Work Systems: Approaches to Analysis and Design).

Perceived Risk: That risk thought by an individual or group to be present in a given situation (Institution of Chemical Engineers (1985). Nomenclature for Hazard and Risk Assessment in the Process Industries. IChemE, Rugby, Warwickshire).
Precautions: Measures taken beforehand to ward off possible adverse events. In the context of risk management, precautions are the result of prudent foresight, that is, due diligence. In the context of a cause-consequence model, precautions act before the loss of control point.
Probability: The likelihood of an event occurring. A number in a scale from 0 to 1 that expresses the likelihood that one event will succeed another (Institution of Chemical Engineers (1985). Nomenclature for Hazard and Risk Assessment in the Process Industries. IChemE, Rugby, Warwickshire).


Protection: Protection has many meanings. However, in the context of risk management it is the state of being protected, something that protects, or preservation from injury or harm. In the context of a cause-consequence model, protection usually acts after the loss of control point, as does much fire protection equipment.
QRA: Quantified Risk Assessment. The estimation of a given risk by logical and analytical modelling techniques, or using statistical information from historical data from circumstances similar to existing or planned operations.

Quality: Conformance to a set of requirements that, if met, results in an organisation, service or product that is fit for its intended purpose. Totality of characteristics of an entity that bear on its ability to satisfy stated and implied needs (AS/NZS 9000.1:1994 Model for Quality Assurance in Design, Development, Production, Installation and Servicing).
RAROC: Risk Adjusted Return On Capital.
RBD: Reliability Block Diagram. A frequency analysis technique that creates a model of the system and its redundancies to evaluate the overall system reliability (AS 3931:1998 Risk Analysis of Technological Systems – Applications Guide).
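The arithmetic behind a reliability block diagram is short. Blocks in series must all work, so their reliabilities multiply. Redundant blocks in parallel fail only if all of them fail. The following minimal Python sketch assumes block failures are independent. The function names and the pump/valve example are hypothetical.

```python
from math import prod


def series(*reliabilities: float) -> float:
    """All blocks must work for the path to work."""
    return prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """At least one redundant block must work (independent failures assumed)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical system: two redundant pumps (0.9 each) feeding a single valve (0.99).
r_system = series(parallel(0.9, 0.9), 0.99)
print(round(r_system, 4))  # 0.9801
```

The redundancy lifts the pump stage from 0.9 to 0.99, but the single valve now limits the whole system. RBDs are designed to make exactly this kind of weak point visible.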

RCM: Reliability Centred Maintenance.
Recovery: Restoration of a system to its desired state following a fault or failure.
Reliability: The probability that a device will satisfactorily perform a specified function, under given operating conditions, for a specified period of time (Smith David J (1993). Reliability, Maintainability and Risk. Practical Methods for Engineers. Fourth Edition. Butterworth Heinemann, Oxford).
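The Reliability entry defines a probability over a specified period, so it needs a failure model. A common modelling assumption, not stated in the entry, is a constant failure rate λ. Under that assumption R(t) = e^(−λt) and MTTF = 1/λ. The following minimal Python sketch uses a hypothetical component with an MTTF of 50,000 hours.

```python
import math


def reliability(failure_rate: float, mission_time: float) -> float:
    """R(t) = exp(-lambda * t): survival probability under a constant failure rate."""
    return math.exp(-failure_rate * mission_time)


mttf_hours = 50_000               # hypothetical component MTTF
lam = 1.0 / mttf_hours            # a constant failure rate implies lambda = 1 / MTTF
print(f"R(1 year continuous) = {reliability(lam, 8_760):.3f}")
print(f"R(t = MTTF)          = {reliability(lam, mttf_hours):.3f}")  # e^-1, about 0.368
```

The second line shows that, under this model, only about 37% of units survive to their MTTF. This point is often misread when MTTF is quoted as if it were a guaranteed life.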

Reliability Engineering: The set of technical processes that apply reliability theory to establish system reliability requirements, allocate these requirements down to system elements, predict and verify system reliability performance and establish reliability growth programs (US MIL-HDBK-338-1A).
Residual Risk: The remaining level of (pure) risk after risk treatment measures have been taken (AS 4360:1999 Risk Management).
Resource/s: The human, physical and financial assets of an organisation.
Risk: The chance of something happening that will have an adverse impact upon objectives. It is measured in terms of consequences and likelihood (AS 4360:1999 Risk Management).
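One simple quantitative way to combine likelihood and consequence, as the Risk entry requires, is expected annual loss: frequency multiplied by loss per event. The following minimal Python sketch ranks a hypothetical three-item register on that basis. All entries and figures are invented for illustration. A single expected value hides severity: a rare catastrophe and a frequent nuisance can score the same. For that reason, risk curves that plot likelihood against consequence separately remain useful.

```python
from typing import List, Tuple

# Hypothetical register entries: (description, events per year, loss per event in $)
register: List[Tuple[str, float, float]] = [
    ("warehouse fire", 0.01, 2_000_000),
    ("forklift collision", 0.5, 20_000),
    ("data centre outage", 0.1, 150_000),
]


def expected_annual_loss(frequency: float, consequence: float) -> float:
    """Combine likelihood and consequence into a single expected value per year."""
    return frequency * consequence


ranked = sorted(register, key=lambda r: expected_annual_loss(r[1], r[2]), reverse=True)
for name, f, c in ranked:
    print(f"{name:20s} {expected_annual_loss(f, c):>10,.0f} $/yr")
```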

Risk (Pure): The potential realisation of the unwanted consequences of an event from which there is no prospect of gain.
Risk (Speculative): Generally, risk deliberately undertaken for a perceived benefit.
Risk Analysis: A systematic use of available information to determine how often specified events might occur and the magnitude of their consequences (AS 4360:1999 Risk Management).
Risk Assessment: The study of decisions subject to uncertain consequences. The overall process of risk analysis and risk evaluation.
Risk Curve or Diagram: A plot of likelihood vs consequence for a series of events.


Risk Engineering: The application of engineering techniques to the risk management process.
Risk Evaluation: The process used to determine risk management priorities by comparing the level of risk against predetermined standards, target risk levels or other criteria (AS 4360:1999 Risk Management).
Risk Financing: The methods applied to fund risk treatment and the financial consequences of risk. Note: in some industries risk financing relates to the funding of the financial consequences of risk (AS 4360:1999 Risk Management).
Risk Identification: The observation and identification of new risk parameters (Rowe W D (1977). An Anatomy of Risk. Wiley Interscience, New York). The process of determining what can happen, why and how (AS 4360:1999 Risk Management).
Risk Management: The process of planning, organising, directing and controlling the resources and activities of an organisation in order to minimise the adverse effects of accidental losses to that organisation at least possible cost (Head E L (1978). The Risk Management Process. The Risk & Insurance Management Society Incorporated, New York. Page 8).

Safe: An acceptably low or tolerable level of risk. The opposite of dangerous.
SafetyMAP: Safety Management Achievement Program. Term coined by the Victorian WorkCover Authority.
Security: The combination of availability, confidentiality and integrity.
Sensitivity Analysis: Examines how the results of a calculation or model vary as individual assumptions are changed (AS 4360:1999 Risk Management).
Severity: The measure of the absolute consequences of a loss, hazard or vulnerability, ignoring likelihood. In insurance terms, the absolute magnitude of the dollars associated with a single (potential) loss event.
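The simplest form of the Sensitivity Analysis described above is "one at a time": each input is varied in turn while the others are held at base values. The following minimal Python sketch applies this to a hypothetical downtime model derived from MTBF and MTTR, using an arbitrary ±20% swing. The function names, model and figures are illustrative assumptions. One-at-a-time analysis does not capture interactions between inputs.

```python
from typing import Callable, Dict, Tuple


def one_at_a_time(model: Callable[..., float], base: Dict[str, float],
                  change: float = 0.2) -> Dict[str, Tuple[float, float]]:
    """Vary each input by -change and +change (as a fraction), holding the others at base."""
    results: Dict[str, Tuple[float, float]] = {}
    for name, value in base.items():
        low = model(**{**base, name: value * (1 - change)})
        high = model(**{**base, name: value * (1 + change)})
        results[name] = (low, high)
    return results


def annual_downtime(mtbf: float, mttr: float) -> float:
    """Hypothetical model: expected hours of downtime per year (8,760 h)."""
    return 8760 * mttr / (mtbf + mttr)


base = {"mtbf": 1000.0, "mttr": 10.0}
print(f"base case: {annual_downtime(**base):.1f} h/yr")
for name, (low, high) in one_at_a_time(annual_downtime, base).items():
    print(f"{name}: -20% -> {low:.1f} h/yr, +20% -> {high:.1f} h/yr")
```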

Societal Risk: The relationship between frequency and the number of people suffering from a specified level of harm in a given population from the realisation of specified hazards (Institution of Chemical Engineers (1985). Nomenclature for Hazard and Risk Assessment in the Process Industries. IChemE, Rugby, Warwickshire). Sometimes referred to as Group Risk.
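The frequency-versus-number relationship in the Societal Risk entry is usually presented as an F-N curve. Each point gives F, the total frequency of events harming N or more people. The following minimal Python sketch builds those points from a list of scenarios. The scenario list is hypothetical, as if taken from a QRA, and the function name is illustrative.

```python
from typing import List, Tuple


def fn_curve(events: List[Tuple[float, int]]) -> List[Tuple[int, float]]:
    """Return (N, F) points, F being the total frequency of events with N or more fatalities."""
    points = []
    for n in sorted({k for _, k in events}):
        f_cumulative = sum(f for f, k in events if k >= n)
        points.append((n, f_cumulative))
    return points


# Hypothetical scenario list: (frequency per year, number of fatalities)
scenarios = [(1e-3, 1), (2e-4, 5), (5e-5, 20), (1e-6, 100)]
for n, f in fn_curve(scenarios):
    print(f"N >= {n:3d}: F = {f:.2e} per year")
```

Plotted on log-log axes, these points form the stepped curve that is compared against societal risk criteria.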

Stakeholders: Those people and organisations who may affect, be affected by, or perceive themselves to be affected by, a decision or activity (AS 4360:1999 Risk Management).
Statute Law: Law created by legislation, that is, made by Parliament (Butterworth (1998). Concise Australian Legal Dictionary. Butterworth, Australia).
System Safety: A set of technical processes that apply risk management theory to establish system safety requirements, allocate these requirements down to the system elements, predict and verify system safety performance, and direct actions to prevent and/or reduce unacceptable levels of identified safety hazards (Blanchard B (1991). Systems Engineering Management. Wiley Interscience).
SRS: Safety Related System.
THERP: Technique for Human Error Rate Prediction.
Threat: An action or event that might prejudice any asset.


Tolerable Risk: Risk that is not regarded as negligible or something that can be ignored, but must be kept under review and reduced further still (Health and Safety Executive (1988). The Tolerability of Risk From Nuclear Power Stations. HMSO, London).
VAR: Value At Risk. A concept similar to that of Loss Expectancy.
Vulnerability: A weakness with regard to a threat. To be compared to a hazard.
Vulnerability Analysis: A method of 'completeness' checking for a defined scenario.
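VAR can be estimated in several ways. The simplest is historical simulation: sort observed period losses and read off the loss that was not exceeded in the chosen fraction of periods. The following minimal Python sketch uses that method with one empirical-quantile convention among several. The profit/loss series is hypothetical, and 20 observations are far too few for a real estimate.

```python
import math
from typing import Sequence


def historical_var(pnl: Sequence[float], confidence: float = 0.95) -> float:
    """Historical-simulation VaR: the loss not exceeded in `confidence` of observed periods."""
    losses = sorted(-x for x in pnl)                   # losses as positive numbers, ascending
    index = math.ceil(confidence * len(losses)) - 1    # empirical quantile (one of several conventions)
    return losses[index]


# Hypothetical daily profit/loss figures ($m) for 20 trading days
pnl = [1.2, -0.5, 0.3, -2.1, 0.8, -1.4, 0.1, -0.2, 2.0, -3.5,
       0.6, -0.9, 1.1, -0.4, 0.2, -1.8, 0.9, -0.1, 0.4, -2.6]
print(f"1-day 95% VaR ≈ {historical_var(pnl):.1f} $m")
```

Like Loss Expectancy, the result is a single loss figure attached to a likelihood statement. It says nothing about how large the losses beyond that threshold might be.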

Concepts

1. Introduction to Risk and Reliability Concepts

1.1 The Nature of Risk

Risk means different things to different people at different times. However, one element common to all concepts of risk is the notion of uncertainty. If we knew what would happen next, there would be no risk. If immortal and omnipotent beings existed, the concept of risk would be incomprehensible to them. But in the world of finite beings, all face uncertain, possibly precarious futures. Risk, and what to do about it, are vital human concerns. Decision-making processes, whether of statutory regulators, court judges, business managers or ordinary individuals, reflect a human concern to improve safety and security, and the reliability and efficacy of their endeavours, in the face of ever-present uncertainty.

1.2 Types of Risk

Risk is generally divided into two broad types: Pure Risk and Speculative or Business Risk. If the possible consequences of a risk are considered always bad, offering no prospect of gain, it is designated pure risk, and the possible events or situations it poses are treated as hazards or vulnerabilities. If the possible consequences of a risk are considered potentially desirable, that risk is designated speculative or business risk and is treated as an opportunity. Consequently, risk is assessed according both to its estimated likelihood or probability (how often it is likely to occur) and to the value of its estimated consequences (how desirable or undesirable its impact may be).

1.3 Risk Management Evolution

USER | OBJECTIVES | LIMITATIONS
Insurance Broker | Maximise new clients; Maximise profits | Affordable services only; Conflict of objectives
Insurance Company | Maximise underwriting profits | Conflict of objectives; Narrow approach
Safety Manager | Maximise safety budget; Minimise loss | Loss reduction may not be cost effective
Risk Manager | Maximise corporate profits | Lacks knowledge of specialised disciplines; Not line management
Line Manager | Meet production objectives; Maximise profits | May not understand contribution of risk management to results
Investment Manager | Maximise investment returns; Minimise risk | Risk and profit do not directly accrue to adviser
Auditors | Confirm reality matches reports | Historical analysis; the past may not reflect the future
Legal Advisors/Lawyers | Manage (potential) conflicts; Win court cases | Disputes = prosperity; Sign off is difficult
Board Members | Maximise corporate profits; Minimise personal liability | Lacks knowledge of specialised disciplines

Users of the term "Risk Management" (Adapted from Blombery, 1982)


Several large international insurance brokers introduced both the concept and the term "risk management" into Australia in the 1970s, largely as a marketing strategy to gain new clients. Subsequently, others outside the insurance industry took up the term, using it for various purposes. Because the term is now used in many different ways by different groups of professionals, confusion often arises as to what precisely is meant. Blombery (1982) suggests that the best way to avoid misinterpretation is to examine what the main professional users of the term customarily imply when they refer to risk management, as shown in the table above.

NB: More recently the financial investment industry has also adopted the concept, developing its own lexicon in the process. For example, VAR (Value At Risk) is a variation on Loss Expectancy, the more traditional term historically used by the insurance industry (Taylor, 1996).

1.4 Historical Perspectives of Risk

What we think about risk and how we address it depends on the way we perceive that risk and what, at different times, we believe to be its cause. For example:

1.4.1 The Plague

When a society believes that many are dying from the plague because God is punishing people for their sins, it will manage the risk differently from a society that believes in viruses and bacteria. The following illustrates some early attempts to control the plague (Nohl, 1926):

SPEYER 1347: A strict prohibition against gambling in churchyards.
COUNCIL OF TOURNAI: All concubines to be expelled or married; Sundays to be strictly observed; manufacture, sale and use of dice completely suppressed. (Dice factories turned to making rosary beads.)
ROUEN (France) 1507: 'No gambling, cursing, drinking or excesses'.

1.4.2 United Kingdom - Public Health Reforms in the 1840s

A particularly interesting risk management issue arose with the control of epidemics in the UK in the 1830s and 1840s (Winslow, 1967). Note that at this time viruses and bacteria were not known. The prevailing theory attributed disease to miasmas, or clouds of noxious, odious gases. Chadwick's Report on the Sanitary Condition of the Labouring Population of Great Britain (1842) recognised that disease struck where there was work and urban congestion. By providing clean water, sanitation and reasonable housing, the problem would be contained, if not solved. His concept reflected the view, later championed by Florence Nightingale during the Crimean War, that "cleanliness is indeed next to Godliness". To quote from Chadwick's report:

...That the expense of public drainage, of supplies of water laid on in houses, and of means of improved cleansing would be a pecuniary gain, by diminishing the existing charges attended on sickness and premature mortality.

and

That by the combinations of all these arrangements it is probable that the full insurable period of life indicated by the Swedish tables; that is an increase of thirteen years at least, may be extended to the whole of the labouring classes.


Chadwick's arguments for his risk management recommendations appealed to humanitarian and public interest benefits as well as to cost savings over time. They did not achieve the immediate acceptance and success one might expect in today's more democratic society, with its greater capacity for public scrutiny, accountability and liability. Many with vested interests could not see, or did not agree, that the very expensive fresh water and sewerage treatment was necessary or even effective. Today, passive smoking may be considered in the same context.

1.4.3 The 1840 North American Factory Mutual System

In the early 1800s, cotton mills were a notorious source of fire and burned down regularly. A major part of the problem was the need to extract the seeds from the cotton bolls, which generated significant friction in a highly combustible medium. Zachariah Allen, a factory owner in the 1840s, decided to build a superior mill. He fire-isolated the cotton gins, provided massive construction, and taught his people how to respond appropriately to a fire, using hoses and sand buckets. He then went to his existing underwriter and asked for a discount. The underwriter responded, "No, the good pay for the bad". He then approached other owners who had built superior facilities and suggested that they pool the premiums they were paying to existing underwriters. As they should have fewer losses, the pool could then pay back a profit after a few years. This was a great success and was the forerunner of the Factory Mutual System and the "Highly Protected Risk" (HPR) concept. Such an engineering-underwriter viewpoint contrasts dramatically with a wholly financial view of insurance. Under the Factory Mutual concept, only those plants that meet certain minimum design and management system requirements can join the premium pool. The loss rate therefore tends to remain stable over time, with minimal influence from market forces.
With a purely financial approach, a burning building can be insured if sufficient premium is paid.

1.4.4 Tripartite Risk Control Philosophies

For health and safety policy in particular, Australia adopted the philosophies of the United Kingdom, following the work of the Robens Committee (Creighton, 1996). The general concept is that there are three key parties to the risk control process: those who own the industry, those who work in it, and the government. Each party is of equal status. This applies particularly to the development of codes of practice and regulations. While the tripartite concept has driven traditional approaches to OH&S risk control, the emerging legal environment puts increasing emphasis on a fourth party: stakeholders. Stakeholders range from consumers of products such as food or pharmaceuticals to the public and communities affected by industrial pollution or corporate governance failures.

1.4.5 Bipartite Philosophies

An alternative is what might be called the bipartite approach, apparently adopted by Germany and arising from the industry based insurance scheme started by Bismarck in the 1880s. A bipartite guild (Berufsgenossenschaft) is established for each relevant industry. The government's role is confined to ensuring that the process occurs; specifically, that the industry guild exists, that it determines the acceptable levels of risk for that industry, and that the consequences of this target are appropriately funded by industry based insurance.


1.5 Reliability

Reliability is a risk-related concept and a specific area of professional activity. The main concern of reliability-focussed professionals is to ensure that systems or system components work the first time they are required, and every time thereafter. The military has always had a very specific interest in this, in both organisational and technological terms. The beginnings of the 20th century arms race in Europe can be traced to the application of industrial technology in the production of HMS Warrior in 1861. World War I provided the impetus for the development of aircraft and armoured vehicles and the beginning of increasingly capable military equipment. World War II brought the development of electronics and a dramatic increase in the complexity of increasingly accurate and destructive weapons. Such systems often consumed enormous resources yet failed to deliver effective service to their users. As might be expected, the sophisticated valve-based electronic systems used in the emerging jet fighter industry proved very unreliable in the 1950s.

1.5.1 Failure Modes

Until the mid 1970s, reliability-focussed professionals saw system components as exhibiting a standard failure profile consisting of three distinct periods:

i) An infant mortality period, due to product quality failures.
ii) A useful life period, with only random stress related failures.
iii) A wear out period, due to increasingly rapid deterioration in condition resulting from use or environmental degradation.
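The three periods listed above are often modelled as competing failure processes whose rates simply add together. The minimal sketch below assumes Weibull failure rates (shape beta < 1 for infant mortality, beta > 1 for wear out) plus a constant random rate. The parameter values are invented for illustration and are not taken from this text. Summing the three rates reproduces the falling, flat and rising sections of the bathtub.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull failure rate h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


# Illustrative (assumed) parameters, in operating hours.
INFANT = dict(beta=0.5, eta=10_000.0)    # beta < 1: failure rate falls with age
RANDOM_RATE = 1e-4                       # constant, age-independent failure rate
WEAR_OUT = dict(beta=5.0, eta=5_000.0)   # beta > 1: failure rate rises with age


def bathtub_hazard(t: float) -> float:
    """Total failure rate at age t: infant mortality + random + wear out."""
    return (weibull_hazard(t, **INFANT)
            + RANDOM_RATE
            + weibull_hazard(t, **WEAR_OUT))


if __name__ == "__main__":
    for hours in (1, 100, 1_000, 2_000, 3_000, 5_000):
        print(f"{hours:>6} h  failure rate = {bathtub_hazard(hours):.2e} per hour")
```

With these assumed numbers the total rate is lowest around 2,000 hours. Earlier failures are dominated by the infant mortality term, and later failures by the wear out term.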

These three periods are shown in the figure below. The consequence of such beliefs was that equipment was taken out of service and maintained at fixed intervals, regardless of whether or not it was showing signs of wear.

[Figure: failure rate plotted against time, showing the infant mortality, useful life and wear out periods.]

Bathtub Failure Curve

However, actuarial studies of aircraft equipment failure data conducted in the early 1970s identified a more complex relationship between age and the probability of failure, as shown in the figure below. This work evolved in the civil airline industry, primarily through the activities of the Maintenance Steering Group of the Air Transport Association of America. The Maintenance Steering Group's 1980 report, MSG-3, provided the backbone of the decision logic contained in the referenced texts and in Reliability Centred Maintenance (RCM) analysis (Moubray, 1992).


| Failure rate pattern | Proportion of items |
|---|---|
| Wear-in, random, then wear out (bathtub) | 4% |
| Random then wear out | 2% |
| Steadily increasing | 5% |
| Increasing during wear-in and then random | 7% |
| Random over measurable life | 14% |
| Wear-in then random | 68% |

The last three patterns, which show no wear-out period, together account for 89% of items.

Failure Rate Curves

Specifically, the bathtub curve was found to be one of the least common failure patterns, and for many items periodic maintenance actually increased the likelihood of failure. This led to the idea that the maintenance regime ought to be based on the reliability of the components and the required level of availability of the system as a whole.
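The finding that periodic maintenance can increase failures follows from the shape of the failure rate pattern. The minimal sketch below approximates each pattern with a two-parameter Weibull model. This is an assumption, and the parameter values are invented. For each pattern, the sketch compares two probabilities: that a brand-new item fails within the next interval, and that an aged survivor does. Replacing an aged item at a fixed interval only helps when the new item is the safer bet.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Probability that an item survives to age t (two-parameter Weibull model)."""
    return math.exp(-((t / eta) ** beta))


def prob_fail_in_next(age: float, interval: float, beta: float, eta: float) -> float:
    """Conditional probability that an item which has survived to `age`
    fails within the following `interval`."""
    survive_now = weibull_reliability(age, beta, eta)
    survive_later = weibull_reliability(age + interval, beta, eta)
    return 1.0 - survive_later / survive_now


if __name__ == "__main__":
    eta, interval, age = 1_000.0, 100.0, 500.0  # illustrative values, in hours
    patterns = [(0.7, "infant mortality"), (1.0, "random"), (3.0, "wear out")]
    for beta, pattern in patterns:
        new = prob_fail_in_next(0.0, interval, beta, eta)
        aged = prob_fail_in_next(age, interval, beta, eta)
        if math.isclose(new, aged):
            verdict = "replacement makes no difference"
        elif new < aged:
            verdict = "replacement helps"
        else:
            verdict = "replacement makes failure more likely"
        print(f"{pattern:<17} new {new:.3f}  aged {aged:.3f}  -> {verdict}")
```

This is a simplification. Real maintenance decisions also weigh the cost and consequence of failure against the cost of the maintenance task.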


1.6 Quality

Davis (2001) reviews a large number of contributors to the quality movement. Although their approaches differ, there appear to be six common principles, namely: management commitment; measurement to determine current position and goals; quality teamwork in the workforce; system based tools; prevention rather than inspection; and customer focus.

1.6.1 W Edwards Deming (US circa 1948)
Defines quality as a predictable degree of uniformity and dependability at low cost, suited to the market. The objective of his approach is to reduce variability through continuous improvement, using the "PDCA Cycle" (Plan, Do, Check, Act). Management is responsible for 94% of quality problems.

1.6.2 Joseph M Juran (US)
Defines quality as fitness for use, and has a 10-step process for quality improvement. Like Deming, Juran believes that senior management is largely responsible for quality, with less than 20% of quality issues being due to workers. However, quality improvements are not free.

1.6.3 Philip B Crosby (US)
Believes that quality is conformance to requirements. He introduced the concept of "zero defects" within the framework of his "four quality absolutes". The cost of quality is the cost incurred through non-conformance, and therefore quality is free.

1.6.4 William E Conway (US)
Has similar beliefs to Deming and holds that quality increases productivity and lowers costs. He has a 6-tool process for quality improvement and advocates the use of simple statistical methods to identify problems and point to solutions.

1.6.5 Kaoru Ishikawa (Japan circa 1949)
Focussed on seven basic tools for quality improvement, quality circles and company-wide quality control (CWQC) from top to bottom. Used cause and effect diagrams extensively (see section 5.4).

1.6.6 Shigeru Mizuno (Japan)
Promoted 7 tools for quality management: relations diagram, KJ or affinity diagram, systemic/tree diagram, matrix diagram, matrix data-analysis, process decision program chart, and arrow plan.

1.6.7 Masaaki Imai (Japan)
Developed the Kaizen process to build logical, systemic thinking, using an expanded form of the PDCA cycle.

1.6.8 Genichi Taguchi (Japan)
Restates the Japanese view of investing first and not last; that is, design should be superior.

1.6.9 Shigeo Shingo (Japan)
Promoted just-in-time manufacturing and defects = 0 (Poka-Yoke).

1.6.10 Armand V Feigenbaum (US)
Holds that total quality management (TQM) is the way to completely manage an organisation.

1.6.11 Tom Peters (US)
Focuses on leadership and customer satisfaction rather than management, and includes tools like management by walking about (MBWA).

1.6.12 Claus Møller (Denmark)
Personal quality is a central element of total quality, with a focus on administrative improvement.

1.6.13 John Oakland (UK)
Leadership is the key to business excellence and quality.


REFERENCES

Blombery R I (1982). Risk Management Origins, Objectives and Directions. Proceedings of the Victorian Industrial Safety Convention, Vol. 1, 1982, pp. 39-48.
Chadwick E L (1842). Report on the Sanitary Condition of the Labouring Population of Great Britain. Presented to both Houses of Parliament, London.
Creighton W B (1996). Understanding Occupational Health and Safety in Victoria. 2nd edition, Federation Press.
Davis E C (2001). The quality gurus: What have we learnt from them? Reprinted in Engineering World, December 2001 / January 2002, pp. 15-19.
Moubray J (1992). RCM II Reliability Centred Maintenance. Butterworth Heinemann.
Nohl J (1926). The Black Death, a Chronicle of Plague. George Allen & Unwin Ltd, London.
Taylor R T and W A MacDonald (1996). The Future of Market Risk Management. Article in Financial Derivatives & Risk Management, Issue 6, June 1996. IFR Publishing.
Winslow C E A (1967). The Conquest of Epidemic Disease. The Hafner Publishing Company, New York. The particularly relevant chapter is Chapter XII, The Great Sanitary Awakening.

READING

Beck U (1986). Risk Society: Towards a New Modernity. Translated, Sage Publications, London. Reprinted 1998.
Head E L (1978). The Risk Management Process. The Risk & Insurance Management Society Incorporated, New York. Page 8.
McCabe F M (1978). Risk Management and the Australian Safety Practitioner. Marsh & McLennan Pty Ltd, Melbourne, Australia.
Robinson R M, D B L Viner and M A Muspratt (1985). National and Public Risk: Risk Control Strategy - Some Fundamentals. Paper presented at the ANZAAS Festival of Science, Monash University.
Smith A (1993). Reliability Centred Maintenance. McGraw Hill.


2.0 Risk Paradigms and Models

Efforts to demonstrate how risk should best be managed have given rise to a number of risk management paradigms. A paradigm is a universally recognised knowledge system that, for a time, provides model problems and solutions to a community of practitioners (after Kuhn, 1970). New paradigms based on more comprehensive or convincing theories may supersede older ones or coexist with them. The following sections describe the most common paradigms, including some of the advantages and disadvantages of each. The paradigms are:

i) The rule of law.
ii) Traditional risk management, historically typified by the Lloyds Insurance and the Factory Mutual Highly Protected Risk (HPR) approaches.
iii) Asset based risk management, typified by engineering based Failure Modes, Effects and Criticality Analysis (FMECA), Hazard and Operability (HazOp) and Quantified Risk Assessment (QRA) 'bottom-up' approaches.
iv) Threat-based risk management, typified by Strengths, Weaknesses, Opportunities and Threats (SWOT) and vulnerability type 'top-down' analyses.
v) The comparatively recent market based risk management, which uses the notion of risk being equal to variance, with an equivalent risk of gain as well as risk of loss.
vi) Solution-based 'best practice' risk management rather than hazard based risk management.
vii) The development of biological, systemic mutual feedback loop paradigms, practically manifested in hyper-reality computer based simulations.
viii) The development of risk culture concepts, including quality type approaches.

Many proprietary risk management systems integrate several of these approaches.

2.1 The Rule of Law

When everything else fails, the ultimate appeal is generally to the rule of law. In a very real sense, all the other paradigms represent methods of satisfying legal requirements in the event of an adverse outcome. As a consequence, when lawyers are asked which paradigm is needed to ensure 'due diligence', their response, once the paradigms are explained, is that all of them are necessary. The diagram below shows a pathogen based cause-consequence diagram in a legal context, with LOC indicating loss of control. The power of the legal approach is that it is time-tested and proven. If the judiciary is independent of the political and commercial interests of the day, then an independent and potentially fair resolution of otherwise potentially catastrophic social dislocation can occur. Perhaps this is why it works: both the political and judicial systems must fail simultaneously before social breakdown occurs. The weakness of the legal approach, certainly in an adversarial legal system, is that the courts remain courts of law rather than courts of justice.


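[Figure: a whole-of-life (cradle to grave) timeline in which pathogens act against an immune system; loss of control (LOC) at the event horizon leads to a hit or a miss. Column headings WHAT, WRONG, WHY NOT and WHAT IF sit above CAUSATION, FORESEEABILITY, PREVENTABILITY and REASONABLENESS.]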

Pathogen Cause-Consequence Model in Legal Context

In the common law tests of negligence the four key words are Causation, Foreseeability, Preventability and Reasonableness. This Rule of Law underpins the ALARP principle that risks shall be demonstrated to be "As Low As Reasonably Practicable". It also provides a focus for other risk management principles including "not less safe", "continuous improvement" and "best practice".

(i) Define WHAT we are talking about: CAUSATION
(ii) Identify what could go WRONG: FORESEEABILITY
(iii) Control WHY it will not happen: PREVENTABILITY
(iv) Assess the balance of precautions to the consequences IF it did: REASONABLENESS

Common law is covered in more depth in Chapter 4.

2.2 Insurance Based Risk Management

The Lloyds Insurance and the Factory Mutual Highly Protected Risk (HPR) approaches historically typify this paradigm. Both consider empirical history to be the source of wisdom: judgements about risk are made by looking at past incidents and losses and comparing these with existing plants and facilities. The difference is that the Lloyds approach has a financial focus, whereas the Factory Mutual focus is on a target level of engineering and management excellence. The power of the process lies in the very tangible nature of history; in a sense the results represent the ultimate Darwinian 'what if' analysis. Its weakness is that in the modern, rapidly changing world, empirical history has become an increasingly uncertain method of predicting the future.

2.3 Asset Based Risk Management

Asset based risk management is typified by engineering based FMECA, HazOp and QRA 'bottom-up' approaches. Any bottom-up method has problems with common cause or common mode failures. A detailed assessment of individual components or sub-systems, such as a HazOp or FMECA, examines how that component or sub-system can fail under normal operating conditions. It does not examine how a catastrophic failure elsewhere might affect this component or the others around it.
HazOps attempt to address such 'knock-on' effects with a series of general questions after the detailed review is completed, but it nevertheless remains difficult to use a HazOp to determine credible worst-case scenarios. FMECA and QRA have the same problems.
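Common cause failure is why a component-by-component review can be optimistic about redundancy. The minimal sketch below uses the beta-factor model. This is a widely used simplification, but it is our assumption here and not a method prescribed in this text. The model assumes that a fraction beta of each channel's failures also disables the other channel. The values of q and beta are illustrative only.

```python
def pair_failure_probability(q: float, beta: float = 0.0) -> float:
    """Probability on demand that BOTH channels of a duplicated (1-out-of-2) system fail.

    q    -- failure probability on demand of one channel (channels assumed identical)
    beta -- assumed fraction of a channel's failures that are common cause,
            i.e. that also disable the other channel (beta-factor model)
    """
    if not (0.0 <= q <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("q and beta must lie in [0, 1]")
    independent_q = (1.0 - beta) * q  # failures unique to one channel
    common_q = beta * q               # failures shared by both channels
    # Rare-event approximation: both channels fail independently, OR a single
    # common cause event takes out both; the small overlap is ignored.
    return independent_q ** 2 + common_q


if __name__ == "__main__":
    q = 0.01
    print(f"independent failures only : {pair_failure_probability(q):.1e}")
    print(f"10% common cause (beta=0.1): {pair_failure_probability(q, beta=0.1):.1e}")
```

With these assumed figures, the independent-failure estimate is 1 in 10,000. A 10% common cause fraction makes the duplicated system roughly ten times more likely to fail. A review that looks only at each channel in isolation would not reveal this.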


The power of bottom-up techniques lies in the detailed, intense scrutiny of complex systems and the provision of closely coupled solutions to identified problems. Any proposed risk control solutions are focussed and specific, and can easily be assessed for cost/benefit. The resulting risk registers are powerful decision making tools.

2.4 Threats and Vulnerabilities

Threat based risk management is typified by SWOT and vulnerability type 'top-down' analyses. These methods mostly identify areas of general strategic concern rather than solutions to particular problems. A very simple example of a threat and vulnerability analysis is shown in the table below. Again, this focuses on areas of concern rather than precise solutions.

| Threats \ Critical Success Factors | Reputation | Operability | Staff |
|---|---|---|---|
| Technical | xx | xx | xx |
| Community | - | - | xx |
| Political (change of government) | x | x | x |
| Financial | xxx | xxx | xxx |
| Natural Events | x | xxx | x |

Sample Vulnerability Matrix

Scores:
xxx Critical potential vulnerability that must be addressed.
xx Moderate potential vulnerability.
x Minor potential vulnerability.
- No noticeable vulnerability.

The intersections of a threat with a "critical success factor" or "asset" are termed vulnerabilities. Interpreted from a risk perspective, the SWOT analysis provides insight into vulnerabilities (the risk of loss) and value addeds (the risk of gain). This is shown in the figure below.

[Figure: external factors (Opportunities, Threats) and internal factors (Strengths, Weaknesses), spanning Strategy and Organisation, mapped to Value Addeds and Vulnerabilities.]

Augmented SWOT Process

The aim in this model is to ensure that ownership of the upside (value addeds) is retained, and that ownership of the downside (liabilities) is avoided.
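The Sample Vulnerability Matrix in this section can be held as a simple threat-by-critical-success-factor lookup. This makes it easy to list the cells that must be addressed. In the sketch below, the scores are transcribed from that matrix. The numeric scale (- = 0 up to xxx = 3) and the names MATRIX, vulnerabilities and threat_totals are hypothetical. Summing ordinal scores is only a rough ranking aid, not part of the method described in the text.

```python
# Ordinal scale for the matrix symbols (assumed numeric mapping).
SCALE = {"-": 0, "x": 1, "xx": 2, "xxx": 3}
MEANING = {
    "xxx": "Critical potential vulnerability that must be addressed",
    "xx": "Moderate potential vulnerability",
    "x": "Minor potential vulnerability",
    "-": "No noticeable vulnerability",
}

# Threats (rows) against critical success factors (columns), as in the sample matrix.
MATRIX = {
    "Technical": {"Reputation": "xx", "Operability": "xx", "Staff": "xx"},
    "Community": {"Reputation": "-", "Operability": "-", "Staff": "xx"},
    "Political (change of government)": {"Reputation": "x", "Operability": "x", "Staff": "x"},
    "Financial": {"Reputation": "xxx", "Operability": "xxx", "Staff": "xxx"},
    "Natural Events": {"Reputation": "x", "Operability": "xxx", "Staff": "x"},
}


def vulnerabilities(matrix: dict, minimum: str = "xxx") -> list[tuple[str, str, str]]:
    """Return (threat, critical success factor, score) for every cell at or above `minimum`."""
    floor = SCALE[minimum]
    return [
        (threat, factor, score)
        for threat, row in matrix.items()
        for factor, score in row.items()
        if SCALE[score] >= floor
    ]


def threat_totals(matrix: dict) -> dict[str, int]:
    """Sum the ordinal scores for each threat (indicative ranking only)."""
    return {threat: sum(SCALE[s] for s in row.values()) for threat, row in matrix.items()}


if __name__ == "__main__":
    for threat, factor, score in vulnerabilities(MATRIX):
        print(f"{threat} x {factor}: {MEANING[score]}")
```

Running it lists the four critical cells: Financial against all three critical success factors, and Natural Events against Operability.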


2.5 Risk as Variance The comparatively recent market based risk management stems from the notion of risk being equal to variance with an equivalent risk of gain as well as risk of loss (see figure below). In finance, risk is normally assumed to be symmetric. This is not absolutely true, but by making such an assumption many of the tools of statistics become available, most notably the normal distribution, which is symmetric about its mean value. This is the principal strength of the approach.
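The symmetric, variance-based view of risk described above can be sketched in a few lines. The sketch assumes that returns are normally distributed and treats the sample standard deviation as "risk". It then reads off the return that is undershot only 5% of the time, which is a parametric form of the Value At Risk idea mentioned in Section 1.3. The function name and the return figures are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def return_risk(returns: list[float], confidence: float = 0.95) -> tuple[float, float, float]:
    """Return (mean, standard deviation, downside return) for a series of returns.

    Assumes the returns are normally distributed, so risk is symmetric about the mean.
    The downside return is the value that returns fall below only (1 - confidence)
    of the time: a parametric, Value-At-Risk style figure.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns to estimate a standard deviation")
    mu = mean(returns)
    sigma = stdev(returns)  # sample standard deviation, 'deemed to equal risk'
    downside = NormalDist(mu, sigma).inv_cdf(1.0 - confidence)
    return mu, sigma, downside


if __name__ == "__main__":
    # Hypothetical annual rates of return, for illustration only.
    mu, sigma, downside = return_risk([0.05, -0.02, 0.08, 0.01, 0.03])
    upside = NormalDist(mu, sigma).inv_cdf(0.95)
    print(f"mean {mu:.3f}, standard deviation {sigma:.3f}")
    print(f"5% downside {downside:.3f}, 5% upside {upside:.3f}")
```

The normal distribution is symmetric, so the 5% upside sits the same distance above the mean as the downside sits below it. This is the "equivalent risk of gain as well as risk of loss" that the paradigm assumes, and also the assumption that may fail in practice.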

[Figure: distribution of rate of return, with the standard deviation deemed to equal risk; regions labelled Pure Risk and Speculative Risk.]

Normal Distribution showing the Mean and Variance

However, from a systems engineering perspective at least, this should really be known as the "boom/bust" model since, if everyone uses the same model, mutual feedback loops are inevitable. If only pure risk is assumed, then self-dampening effects are likely; this is the position adopted by most engineers and technologists. Business risk is usually considered to be the sum of both pure risk and speculative risk.

2.6 Best Practice

So far all the paradigms considered have been hazard based; that is, they look for problems and then solutions. In health and safety, a hazard is defined as a source of potentially damaging energy which can give rise to a loss. In more general terms, a hazard is a source of potential harm or a situation with the potential to cause loss. In this sense it is analogous to a vulnerability, that is, the potential impact of a threat upon an asset. Most risk systems, like the Australian/New Zealand Risk Management Standard AS/NZS 4360:1999, suggest a process of hazard identification, risk assessment, control option development and then implementation.

An alternative is solution based 'best practice' risk management. The best practice approach simply looks at all the good ideas other people in an industry use and asks whether there is any reason why such ideas ought not to be applied at your own site. In the figure below this means starting on the right rather than at the top or the left.


[Figure: boxes for Credible Hazards, Vulnerabilities or Pathogens; Hazard Assessment (assess consequences, estimate likelihoods); Control Options (mitigate consequences, decrease likelihood); Judgements (statute, TLS, ALARP, common law due diligence etc.); Risk Actions and Residual Risk Allocation.]

Best Practice Approaches (TLS = Target Level of Safety)

The best practice approach is particularly powerful in a common law 'due diligence' sense. The hazard assessment approach implies that statutes may be satisfied, target levels of safety met or 'As Low As Reasonably Practicable' (ALARP) arguments fulfilled. But if a simple solution to a trivial problem had been implemented at a competitor's facility, then common law negligence could arise if something went wrong at the facility in question. A best practice process is one of the few approaches that targets this difficulty. In a sense, it confirms the view that liability arises from unimplemented good ideas rather than from the existence of hazards or vulnerabilities in themselves.

2.7 Simulation

Biological/computer simulation paradigms apply evolutionary concepts within a virtual reality environment. The most practical manifestation of biological paradigms is in computer simulations: modelling a complex system in a virtual environment and playing endless "what if" scenarios. For example, oil rigs and process plants are generally modelled in 3D before construction so that designers and operators can 'walk around them' and in many ways 'try them out'. If every component (or at least every one containing or controlling a major energy source) is identified and assigned its risk and reliability properties, then the designer can play 'god'. Continuing this example, suppose every vessel in the plant 'knows' what over-temperature or overpressure it can withstand before rupture and, having ruptured under such conditions, can 'project' and 'communicate' its thermal and pressure energies to adjacent vessels, which then respond accordingly. If the designer then told one vessel to explode, a chain reaction might result, depending on separation distances, the force of the explosion and very many other factors.
But by resetting the computer simulation and exploding different vessels an evolutionary process of plant risk design can occur. That is, the designer could ‘explode’ every vessel and keep adjusting the plant in small increments until the likelihood of secondary explosions is made vanishingly small. Obviously, this requires fearsome computer power, an extensive interpretation of nature and a belief that hyper-reality can come close to reality.
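The "explode every vessel and adjust the plant" loop described above can be sketched in miniature. The sketch makes several assumptions, all for illustration only. Blast load decays with the inverse square of distance. Loads from several ruptured vessels are not summed. The vessel names, positions, blast energies and withstand limits are hypothetical. A real escalation model would be far richer; this only shows the shape of the evolutionary design loop.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Vessel:
    name: str
    x: float          # plan position, metres
    y: float
    blast: float      # hypothetical energy released on rupture (arbitrary units)
    withstand: float  # hypothetical load at which this vessel ruptures


def load_on(target: Vessel, source: Vessel) -> float:
    """Assumed inverse-square decay of blast load with separation distance.
    Vessels are assumed to occupy distinct positions (distance > 0)."""
    distance = math.dist((source.x, source.y), (target.x, target.y))
    return source.blast / distance ** 2


def cascade(vessels: list[Vessel], initiator: str) -> set[str]:
    """Rupture one vessel, then propagate until no further vessel fails.
    Simplification: loads from several ruptured vessels are not summed."""
    by_name = {v.name: v for v in vessels}
    ruptured = {initiator}
    to_process = [by_name[initiator]]
    while to_process:
        source = to_process.pop()
        for target in vessels:
            if target.name not in ruptured and load_on(target, source) >= target.withstand:
                ruptured.add(target.name)
                to_process.append(target)
    return ruptured


def worst_secondary_count(vessels: list[Vessel]) -> int:
    """'Explode' every vessel in turn; return the largest number of secondary ruptures."""
    return max(len(cascade(vessels, v.name)) - 1 for v in vessels)


def row_layout(spacing: float, count: int = 3) -> list[Vessel]:
    """Hypothetical plant: identical vessels in a straight row."""
    return [Vessel(f"V{i}", i * spacing, 0.0, blast=1000.0, withstand=5.0)
            for i in range(count)]


def adjust_spacing(start: float = 10.0, step: float = 1.0) -> float:
    """Widen the spacing in small increments until no single rupture
    causes a secondary one."""
    spacing = start
    while worst_secondary_count(row_layout(spacing)) > 0:
        spacing += step
    return spacing


if __name__ == "__main__":
    print("secondary ruptures at 10 m spacing:", worst_secondary_count(row_layout(10.0)))
    print("spacing with no knock-on ruptures:", adjust_spacing(), "m")
```

With these invented numbers, a 10 m spacing lets one rupture take out the whole row. The loop then settles on 15 m, the first whole-metre spacing at which no initiating rupture causes a secondary one.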


2.8 Culture

James Reason (1997) develops a cultural paradigm model in several ways (he is a psychologist by training). He notes three types of risk culture:

| Pathological Culture | Bureaucratic Culture | Generative Culture |
|---|---|---|
| Don't want to know | May not find out | Actively seek it |
| Messengers are 'shot' | Messengers are listened to if they arrive | Messengers are trained and rewarded |
| Responsibility is shirked | Responsibility is compartmentalised | Responsibility is shared |
| Failure is punished | Failures lead to local repairs | Failures lead to far reaching reforms |
| New ideas actively discouraged | New ideas often present new problems | New ideas are welcomed |

Three Risk Cultures after Reason (1997)

To some extent, those dealing with technological risks have generally suffered a decline in influence as business risks and associated risk management techniques have come to the fore over the past ten years. However, culture has now been identified as central to effective risk management suggesting a new focus has been emerging in the last five years as shown in the figure below. Reason's Pathogen model is discussed in Chapter 5.

[Figure: progression from Hazards (Technological Risks) to Vulnerabilities (Business Risks) to Pathogens (Risk Culture).]

Movement from Technological to Business to Risk Culture


2.8.1 Safety Culture

An interesting application of the cultural risk paradigm arises when considering safety in Australian industry. A major study that sought to determine why Australia has a good commercial aviation safety record documented aspects of Australian culture that affect safety performance (Braithwaite et al, 1997). The graph below reflects the answers staff gave when asked by their manager to help paint his house. Australians were the most likely of any of the nationalities surveyed to say "No" (up to 95%).

[Figure: bar chart of the percentage of "No" responses (0 to 100%) for Australia, Netherlands, UK, West Germany, USA, Italy, Japan, Canada, Poland, Pakistan, Mexico, Hong Kong, Malaysia, Egypt, Singapore, Indonesia, Nepal and China.]

"No" Responses to the question "Would you help paint your manager's house?"

Australians tend to be individualistic and to have a low "power-distance". That is, actions or instructions from others have a comparatively limited effect on the way in which they act, and they perceive a relatively flat power gradient between manager and subordinate. For example, on aircraft flight decks, junior crew members feel able to speak up to senior crew without loss of face or other repercussions if they think an error has occurred. This facilitates effective additional checks. In industries with different management styles, however, difficulties can arise. If a person being directed does not believe that a directive is either practical or safe, that person will tend to assess the situation and do it his or her own way. The person may do so without declaring the intention or discussing the intended change to procedures with management.


2.9 Paradigm Integration

The figure below shows how the different paradigms presented in this chapter might fit within a large organisation.

[Figure: pre-event (left of the event horizon), from the Board and CEO (policy, strategic, top-down) down to operations and maintenance. Strategic box: vulnerability analyses, SWOT analyses, audits, underwriting assessments, availability assessments. Operational box: QRA, HazOps, FMECA, RCM, job safety analysis, cause-consequence modelling. Event: losses, incidents and breakdowns. Post-event (tactical): crisis and fallout management; fire fighting, first aid, legal actions; insurance payments; courts. Also marked: scales A to E and 1 to 5 (AS4360), and IEC (AS) 61508.]

An Integrated Risk Paradigm Framework

The top left-hand box shows those paradigms that would be expected to apply strategically at the higher levels of an organisation, whilst those in the bottom left-hand box could generally be applied at the operational level. On the right-hand side are the tactical issues that are faced post-event. The objective of risk management is to stay on the left-hand side of the event horizon, but a complete risk management framework must also provide for post-event scenarios. There are a number of risk techniques available but only three generic methods by which organisations can proceed with strategic tasks to address risk. These are:

i) Expert knowledge provided from experts, literature and research.
ii) Facilitated workshops of experts and interested parties.
iii) Interviews with selected players.

Each of these methods has different strengths and weaknesses depending on the culture of the organisation and the nature of a particular task.


The best methodologies to use in the implementation of each of the paradigms are illustrated in the following table:

| Risk Management Paradigm | Expert reviews | Facilitated workshops | Selective interviews |
|---|---|---|---|
| 1. The rule of law | Yes (Legal opinions) | Yes (Arbitration, moot courts) | Yes (Royal Commissions) |
| 2. Insurance approaches | Yes (Risk surveys, actuarial studies) | Yes (Risk profiling sessions) | Yes (especially moral risk) |
| 3. Asset based, 'bottom-up' approaches | Yes (QRA, availability & reliability audits) | Yes (HazOps, FMECAs etc) | Difficult |
| 4. Threat based 'top-down' approaches | Difficult in isolation | Yes (SWOT & vulnerability) | Yes (Interviews) |
| 5. Business (upside AND downside) approaches | Yes (Actuarial studies) | Difficult in isolation | Yes (Fact finding tours) |
| 6. Solution based 'best practice' approaches | Difficult to be comprehensive | Difficult to be comprehensive | Yes (Fact finding tours) |
| 7. Simulation | Yes (Computer simulations) | Yes (Crisis simulations) | Difficult |
| 8. Risk culture concepts | Yes (Quality audits) | Difficult | Yes (Interviews) |

Risk Management Paradigm - Technique Matrix

One interesting development is the concept of a Safety Case, which is logically prior to, and supports, the Business Case for an enterprise. The techniques and paradigms highlighted in the table could be used in developing a Safety Case.

2.10 Models

2.10.1 Risk and Reliability Diagrams

A particularly useful way of examining (pure) risk and reliability in an organisational sense is via a risk diagram. A risk diagram is fundamentally a plot of the likelihood of events occurring against the severity of their outcomes. This can be done in different ways depending on the industry or organisation being examined. Likelihood is expressed as a frequency over a chosen denominator (events per year, per kilometre, per passenger mile, or per any other measure of exposure) and is plotted against consequence severity, measured as downtime, dollars, lives lost, working days lost or days lost to the community.

[Figure: Organisation Risk Diagram. Relative likelihood plotted against relative severity of consequence. Regions: maintenance, OH&S, fire and explosion, catastrophic. Example events: breakdowns, staff complaints, public criticism, personal injury, industrial stoppage, protest pickets, product boycott, high technology and high hazard system failures, class actions, market collapse. Annotations: Service; Safety; Reliability Engineering (FMECA and RCM, defence industry driven); Risk Engineering (HazOp and FTA, aerospace and nuclear industry driven).]

In organisational terms, the risk diagram describes the relationship between the different technical and commercial areas of activity, and the relationship between risk and reliability. Plotted on linear axes, the curve typically takes the form of a hyperbola, as shown. If likelihood is plotted against severity in dollars, then the area under the graph represents the size of the economic loss. Typically, the greatest area is at the maintenance end, followed by the OH&S or personal injury area, then the fire and explosion zone and lastly the catastrophic event region. The maintenance region, being the largest, therefore provides the greatest returns for good management and is the target of programs such as Reliability Centred Maintenance (RCM). However, the other regions, which deal with damage, injury and death, also have a legal dimension. One view is that failure to optimise the maintenance region can send an organisation broke, but failure to deal with the legal dimension can send directors to gaol. Certainly, both are important.
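As a rough numerical illustration of the area argument, the expected annual loss in each region can be approximated by summing frequency × severity over the event classes that fall in it. The sketch below is a minimal Python example under stated assumptions: the region names follow the diagram, but every frequency and dollar severity is a hypothetical value chosen only to show the arithmetic, and `EventClass` and `expected_annual_loss` are illustrative names rather than part of any established tool.

```python
from dataclasses import dataclass


@dataclass
class EventClass:
    region: str        # region of the organisation risk diagram
    frequency: float   # events per year (the chosen frequency denominator)
    severity: float    # consequence per event, in dollars


def expected_annual_loss(events):
    """Return {region: sum of frequency x severity}, in dollars per year."""
    totals = {}
    for e in events:
        totals[e.region] = totals.get(e.region, 0.0) + e.frequency * e.severity
    return totals


# Hypothetical event classes, one per region, for illustration only.
events = [
    EventClass("maintenance", frequency=200.0, severity=5_000.0),
    EventClass("OH&S", frequency=10.0, severity=40_000.0),
    EventClass("fire and explosion", frequency=0.1, severity=2_000_000.0),
    EventClass("catastrophic", frequency=0.001, severity=50_000_000.0),
]

totals = expected_annual_loss(events)
# List regions from largest to smallest expected annual loss
for region, loss in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region:20s} ${loss:>12,.0f} per year")
```

With these invented numbers the ordering happens to match the typical pattern described above; with real data the ordering has to be established from the organisation's own records, not assumed.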

2.10.2 Asset Management and the Costs of Ownership

Asset management is more than ownership, accountability and demand management after the assets are in place; it is a whole-of-life approach to management. Asset management covers all those actions, from the first stirrings of a need to the final recycling of the disposed asset, which ensure that an asset achieves the business objectives of:

i) Being safe for operators, users and the public.
ii) Not adversely impacting on the environment during its use, maintenance or disposal.
iii) Providing the service for which it was procured.
iv) Achieving the above at minimum cost of ownership over its life.

The cost of ownership includes at least:

a) The initial capital cost, plus
b) the whole-of-life cost of operation and maintenance, plus
c) the whole-of-life cost of risk (the cost of prevention plus the cost of loss).
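The three components listed above can be combined directly. The following Python sketch is a simplified illustration that assumes constant annual costs and no discounting (a real whole-of-life study would normally discount future costs); the function name `cost_of_ownership` and all of the figures are hypothetical. Prevention and loss are kept together under the cost of risk, so a risk control measure such as railway signalling is counted as part of the cost of ownership.

```python
def cost_of_ownership(capital, annual_om, annual_prevention, annual_loss, years):
    """Undiscounted whole-of-life cost of ownership (simplifying assumption).

    capital            a) initial capital cost
    annual_om          b) operation and maintenance cost per year
    annual_prevention  c) cost of prevention per year (e.g. a signalling system)
    annual_loss        c) expected cost of loss per year
    years              asset life in years
    """
    operation_and_maintenance = annual_om * years
    cost_of_risk = (annual_prevention + annual_loss) * years
    return {
        "capital": capital,
        "operation_and_maintenance": operation_and_maintenance,
        "cost_of_risk": cost_of_risk,
        "total": capital + operation_and_maintenance + cost_of_risk,
    }


# Hypothetical asset: $10m capital, 30-year life; other figures are per year.
result = cost_of_ownership(
    capital=10_000_000.0,
    annual_om=500_000.0,
    annual_prevention=400_000.0,
    annual_loss=300_000.0,
    years=30,
)
# Identify which component dominates the cost of ownership
components = {k: v for k, v in result.items() if k != "total"}
largest = max(components, key=components.get)
print(result["total"], largest)
```

In this invented case the whole-of-life cost of risk is the largest single component, which is the situation the next paragraph describes for many public authorities.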

In some cases the largest component of the cost of ownership will be the whole-of-life cost of risk. For public authorities especially, it is very common to have very large expenditures on risk control measures that are not identified specifically as part of the cost of ownership of the operating assets. For example, signalling on railways is a risk control measure to prevent trains from colliding. If all trains ran exactly on time and the timetable were perfect, no signal in a train network would ever need to show red. Historically, this was indeed the case. Signalling systems were introduced because the train system eventually became so complicated that perfect adherence to the timetable was no longer possible, which meant that collisions would inevitably occur unless some interposing system was installed. The cost of the signalling system should therefore be included in the cost of ownership of the railway, but identified as part of the preventive component of the cost of risk. A similar concept is reflected in market risk terms, especially in banks, as RAROC (Risk Adjusted Return On Capital).

2.10.3 Risk Management Process Model

The Risk Management Process Model is one of the most commonly used risk management models and dates from the mid-1970s.

[Figure: A Risk Management Process Model. Identification (historical data or past experience, surveys, workforce, scientific literature) → Quantification (likelihood of occurrence and severity of consequence) → Evaluation (balance the advantages and disadvantages of running the risk against the advantages and disadvantages of controlling it) → Control (risk retention, risk reduction, insurance, risk transfer).]

The identification phase parallels the common law aspects of foreseeability (see Chapter 4). The risk transfer option under Control has been severely curtailed in Australia in recent times. It used to be possible to sell a high-risk portion of a business and then contract the service back, leaving the risk associated with that enterprise quarantined from the original business. Such a practice has been soundly rejected in Australian jurisdictions.

The model below is an overview from the Australian/New Zealand Standard, Risk Management, AS/NZS 4360:1999. It follows the process model.

[Figure: Risk Management Overview. Establish the context → Identify risks → Analyse risks → Evaluate risks → Treat risks, with identification, analysis and evaluation grouped as "Assess risks". Monitor & Review and Communicate & Consult run alongside every step.]

The main elements form an iterative process:
a) Establish the Context - Establish the strategic, organisational and risk management context in which the rest of the process will take place. The risk assessment criteria and the structure to be used should also be defined.
b) Identify Risks - Identify what hazards can arise, and why and how.
c) Analyse Risks - Determine existing controls and establish the likelihood of the events and the severity of their consequences.
d) Evaluate Risks - Compare estimated risk levels against the criteria to determine the acceptability or otherwise of each hazard, and set risk priorities.
e) Treat Risks - Accept and monitor low-priority hazards. For all other hazards, develop and implement a specific management plan that includes consideration of funding.
f) Monitor and Review - Monitor and review the performance of the risk management system and any changes that might affect it.
g) Communicate and Consult - Communicate and consult with both internal and external stakeholders at each stage of the risk management process and concerning the process as a whole.

For each stage of the process adequate records should be kept to satisfy an external audit.
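Steps c) to e) can be illustrated with a small risk register. The Python sketch below is a minimal illustration only: the 1 to 5 likelihood and consequence scales, the rule of multiplying the two scores and the acceptance threshold `ACCEPT_BELOW` are hypothetical assumptions standing in for the criteria that would be defined when establishing the context; they are not taken from AS/NZS 4360. The `Risk` and `evaluate_and_treat` names are likewise illustrative.

```python
from dataclasses import dataclass

# Hypothetical criteria, standing in for those set when establishing the context.
ACCEPT_BELOW = 6  # scores below this are treated as low priority


@dataclass
class Risk:
    hazard: str
    likelihood: int   # 1 (rare) to 5 (almost certain), hypothetical scale
    consequence: int  # 1 (minor) to 5 (catastrophic), hypothetical scale
    existing_controls: str = ""

    @property
    def score(self) -> int:
        # Analyse: combine likelihood and severity into a single ranking score
        return self.likelihood * self.consequence


def evaluate_and_treat(register):
    """Evaluate each risk against the criteria, set priorities and pick a treatment.

    Returns (hazard, score, action) tuples, highest priority first.
    """
    ranked = sorted(register, key=lambda r: r.score, reverse=True)
    return [
        (r.hazard, r.score,
         "accept and monitor" if r.score < ACCEPT_BELOW else "develop management plan")
        for r in ranked
    ]


register = [
    Risk("Pump seal failure", likelihood=4, consequence=2),
    Risk("Tank overfill", likelihood=2, consequence=5, existing_controls="high-level alarm"),
    Risk("Office trip hazard", likelihood=2, consequence=1),
]

for hazard, score, action in evaluate_and_treat(register):
    print(f"{hazard:20s} {score:3d}  {action}")
```

Because the scores are ordinal, their product is best read as a ranking device rather than a measure of risk. Re-running the evaluation as conditions change is one way to support the monitor and review step, and the register itself forms part of the records kept for audit.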

2.10.4 An Idealised Risk Management Structure

The diagram below represents the way in which industry often establishes an idealised risk management structure. It is generally considered idealised because, whilst a company manager may indeed hold the title of Risk Manager, legal responsibility for the management of risk is a line management function.

[Figure: An Idealised Risk Management Model. Labels: Risk Manager; pre-event: security manager, finance manager, risk engineer, public affairs, ergonomist, OSH&E manager; damage control: crisis management team (media relations), first aid, fire team; post-event: medical staff, insurance, legal advisers.]

In practice, most persons with the title of Risk Manager are in fact internal risk advisers, on whose advice line management may choose to rely. Legally, at least, the ultimate decision maker regarding the level of risk an organisation can accept will be its highest level of management, namely its Board of Directors or equivalent.

2.10.5 A Facilities Management Model

The facilities management model is favoured by organisations that have large volumes of occupied space, for example universities and hotels.

[Figure: Facilities Management Model. Elements: Facilities Management, Risk Management, Asset Management, Space Management.]

2.10.6 An Asset Management Model

The organisational model shown below has proved attractive to local government.

[Figure: An Asset Management Model. Elements: Asset Management, Risk Management, Resource Management, Insurance, Operation Management, Maintenance Management, Risk Engineering.]

2.10.7 Process Model of Risk Management

This model is built on an underlying time sequence within a legal framework.

[Figure: Process Model of Risk Management. Labels: a brick; someone about to trip over a brick; risk management options to modify the work environment (i.e. remove the brick) or teach people to work safely (i.e. lift their feet); environmental engineering; safety; event; injury recovery; rehabilitation; insurance; required feedback; courts (safe = acceptable risk).]

The model also suggests that the purpose of risk management is to optimise the total costs of risk, subject to the constraint of meeting legal expectations at all times.
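The idea of optimising the total costs of risk subject to a legal constraint can be sketched as a small constrained choice between control options. In the Python sketch below, the two treatment options echo the brick example in the figure, but every cost and frequency is hypothetical, and the legal constraint is modelled, purely as an assumption, as a ceiling on residual incident frequency; real legal expectations cannot be reduced to a single number. `ControlOption` and `choose_control` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ControlOption:
    name: str
    annual_cost: float         # cost of prevention, $ per year
    incident_frequency: float  # expected incidents per year with this option
    cost_per_incident: float   # $ per incident (injury, recovery, claims, etc.)

    @property
    def total_cost_of_risk(self) -> float:
        # Cost of prevention plus expected cost of loss
        return self.annual_cost + self.incident_frequency * self.cost_per_incident


def choose_control(options, max_frequency):
    """Minimise total cost of risk over the options meeting the legal constraint.

    The constraint is modelled (hypothetically) as a ceiling on residual
    incident frequency.
    """
    admissible = [o for o in options if o.incident_frequency <= max_frequency]
    if not admissible:
        raise ValueError("no option meets the legal constraint")
    return min(admissible, key=lambda o: o.total_cost_of_risk)


# Hypothetical figures for the brick example.
options = [
    ControlOption("accept the risk", 0.0, 0.5, 20_000.0),
    ControlOption("teach people to work safely", 2_000.0, 0.25, 20_000.0),
    ControlOption("modify the work environment", 8_000.0, 0.02, 20_000.0),
]

unconstrained = choose_control(options, max_frequency=1.0)
constrained = choose_control(options, max_frequency=0.1)
print(unconstrained.name, unconstrained.total_cost_of_risk)
print(constrained.name, constrained.total_cost_of_risk)
```

With the constraint relaxed, training is the cheapest option in this invented case; once the hypothetical legal ceiling is applied, the more expensive modification of the work environment is the only admissible choice, which is the trade-off the model describes.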

2.10.8 Key Performance Areas Model

The key performance areas model is a spin-off of a recent business management refocus on measuring all business activity through Key Performance Indicators (KPIs) within Key Performance Areas (KPAs). This can be represented in a number of ways, such as the one shown below.

[Figure: Key Performance Areas Model. Context: Organisation (culture, structure, resources) and Business/External Environment (customer, competition, growth, political). Key performance areas: Competent Staff (selection, training, assessment, retraining); Physical Configuration (design, procure, construct, modify, audit); Operation & Maintenance Management (operations, maintenance, audits, corrective actions, procedures); Incident, Crisis & Emergency Planning (plans, resources, rehabilitation, support). Outcome: World's Best Practice.]

2.10.9 Risk Role Models

Different elements of society play different risk management roles. Governments, for example, are expected to have a major role in the management of public risk. This usually manifests itself as various forms of regulation of corporate risk and, where required, emergency response services. Interestingly, how organisations regard the activities of the others, and therefore the role each must play, depends on where they lie in the causal chain.

[Figure: Risk Roles Model. Along a time axis, a corporate hazard or pathogen may lead in turn to corporate prevention failure, corporate crisis management failure, public emergency response failure and government crisis management failure, ending in loss of public confidence and change of government. The corporate stages fall under corporate or institutional risk management (indirect government control through regulation); the public stages fall under public risk management (direct government control).]

From a government perspective, unmanaged corporate hazards represent a threat that must be addressed, usually by regulation and the provision of adequate emergency response and crisis management systems. From the corporations' perspective, governments and their regulations represent disproportionate interference over the possible consequences of matters that the corporations believe they already have in hand.

REFERENCES

Braithwaite G, Faulkner J P E, Caves R E (1997). Latitude or Attitude? - Airline Safety in Australia. Paper presented at the 1997 National Conference of the Risk Engineering Society, Engineers Australia, Canberra.
Kuhn T S (1970). The Structure of Scientific Revolutions, 2nd Edition, enlarged, sixth impression. University of Chicago Press.
Reason J (1997). Managing the Risks of Organizational Accidents. Ashgate Publishing Limited.
Standards Australia/Standards New Zealand (1999). Risk Management. Australian/New Zealand Standard AS/NZS 4360:1999.

READING

Standards Australia (1999). Functional Safety of electrical/electronic/programmable electronic safety related systems. Par 6.5: Examples of methods for the determination of safety integrity levels. AS 61508.5 – 1999 / IEC 61508.5 – 1998.

3. Governance and Risk

3.1 Risk Management’s Role in Good Governance

Over the last decade, numerous international and national inter-governmental bodies have sought to promote good corporate governance. One element that all of them emphasise is that risk management is an integral part of good governance. For example, the Commonwealth Heads of Government meeting in Edinburgh in 1997 issued a Declaration whose purpose was “to promote excellence in corporate governance”. It set up the Commonwealth Association for Corporate Governance (CACG), which issued 15 Principles it considered “fundamental to a holistic approach to corporate governance”. On risk management, the CACG Principles state:

“The board must identify key risk areas and key performance indicators of the business enterprise and monitor these factors. If its strategies and objectives are to have any relevance, the board must understand and fully appreciate the business risk issues and key performance indicators affecting the ability of the corporation to achieve its purpose. Generating economic profit so as to enhance shareholder value in the long term, by competing effectively, is the primary objective of a corporation and its board. The framework of good corporate governance practices in a corporation must be designed with this objective in mind, while fulfilling broader economic, social and other objectives in the environment and circumstances in which the corporation operates. These factors – business risk and key performance indicators - should be benchmarked against industry norms and best practice, so that the corporation’s performance can be effectively evaluated. Once established, the board must constantly monitor these indicators. Management must ensure that they fully and accurately report on them to the satisfaction of the board. The board, as emphasised throughout, has a critical role to play in ensuring that the business enterprise is directed towards achieving its primary economic objectives of profit and growth. It must, therefore, fully appreciate the key performance indicators of the corporation and respond to key risk areas when it deems it necessary to assure the long-term sustainable development of the corporation.”

3.2 Corporate Governance Systems

Corporate governance is the system by which an organisation is directed and controlled. Laws regulate only some aspects of corporate governance. In the main, directors and managers have only principles and guidelines to help them construct systems and keep them current. There is no single governance model that fits all types of organisations.

3.2.1 Governance Models

A number of generic models issued by international bodies and national standard-setting councils are commonly available, for example from the Organisation for Economic Co-operation and Development (OECD), the United Nations (UN), the Commonwealth Association for Corporate Governance (CACG) and the Council of Standards Australia. These generic models seek to enable users to appreciate and identify the wide range of concerns that good governance needs to cover.

Other relevant models, as well as regulations, codes of best practice, and government programs and policies, also exist. Most focus on aspects of governance pertinent to the particular areas of authority or expertise of the issuing bodies, such as national stock exchanges, chartered accountants, auditors, company secretaries and other finance-related professional groups. More often than not, these refer to financial risk. Examples include:

ASX Corporate Governance Council: Principles of Good Corporate Governance and Best Practice Recommendations
IFSA: IFSA Guidance Note No 2.00, Corporate Governance: A Guide for Fund Managers and Corporations
ANAO: Australian National Audit Office 1999
AS/NZS ISO 9001 Quality Management Systems – Requirements
AS/NZS 4269 Complaints Handling
AS/NZS 4360 Risk Management
AS/ISO 15489 Records Management
AS/ISO 15489.1 Part 1: General
AS/ISO 15489.2 Part 2: Guidelines
AS 3806 Compliance Programs
AS 8000 Good Governance Principles
AS 8001 Fraud and Corruption Control
AS 8002 Organisational Codes of Conduct
AS 8003 Corporate Social Responsibility
AS 8004 Whistleblower Protection Programs for Entities

State as well as national government laws, regulations and programs can also apply. In Victoria, for example:
Victorian Managed Insurance Authority Act 1996
Financial Management Act 1994
Victorian Government’s Management Reform Program
Victorian Government policies associated with private-public sector service and infrastructure delivery, such as Partnerships Victoria.

3.2.2 Key Governance Areas and Issues

The following table lists only some of the numerous issues and operational functions that a reasonably comprehensive corporate governance system should encompass:

Accountability | Asset management
Transparency | Quality management
Code of conduct | Continuous improvement
Good citizenship | Best practice
Social responsibility | Training
Shareholder rights | OH&S
Stakeholder identification | Fraud and corruption control
Stakeholder liaison | Complaint handling
Corporate ethics | Compliance
Board charter | Due diligence
Board protocol | Records management
Authority delegation | Internal reporting
CEO remuneration | Security

3.3 Origins of the Good Governance Movement

The main challenge for those charged with designing or reviewing a corporate governance system is to ensure the system recognises all the key aspects of the corporation’s objectives, context, structure and operation. In undertaking this task, it helps first to know why the recent global emphasis on governance came about. What was the good governance movement a reaction to? What did it seek to avoid? What does it aim to achieve? Until the 1990s, little if anything was heard in business circles of the term “governance”. When the term first came into business usage, many took it to be merely another of those verbal fads that pop up from time to time, a fancy way of referring to governing or government.

3.3.1 Stock Crashes and Mega-Corporation Collapses

Underpinning the changes to the business vocabulary were efforts to reorientate corporate organisation and decision-making. As the Australian Standard AS 8000-2003, Good Governance Principles (p.4) states:

“The stock market crash in 1987 and the subsequent collapse of many corporate entities around the world lead to urgent calls, particularly from institutional shareholders, for the reform of corporate governance mechanisms”.

After the 1990s stock market bubble burst, five of the 10 biggest corporate collapses on record pushed US corporate bankruptcies to new records for the second consecutive year. Topping the list was WorldCom, whose $104bn in assets made it the most expensive collapse in history. In 2002 alone, 186 US companies with $368 billion in assets went bust, beating the previous year’s record of $259 billion. Australia had its counterpart when a number of high-profile companies, including HIH and OneTel, imploded, causing severe damage to public as well as investor perceptions of corporate governance. The corporate failures of 2001 were mainly the result of debt problems arising from poor appreciation of, and response to, financial risk. But in the previous year, accounting scandals were the order of the day. Many involved criminal fraud that went undetected or unexposed over lengthy periods, even by those claiming to be professional financial watchdogs in the media, banking, investment advisory and audit houses. WorldCom’s accounts overstated its profits by more than $9 billion. With the benefit of hindsight, the consensus of the financial media was that, to have such thunderous bankruptcies, companies had both to have taken on a huge amount of debt and to have been either badly or fraudulently run. Such companies also had to have something sufficiently attractive about them to lead creditors into foolishly or mistakenly extending them huge amounts of credit. These scandals inflicted severe damage on employees and pension fund holders. In a number of cases (many still proceeding), senior managers earned gaol sentences.
3.3.2 Other Contemporary Causal Factors

At the same time, global efforts were developing to refit companies to cope with other new challenges, especially those posed by:
- the expanding, increasingly competitive, international market economy,
- the de-regulatory, neo-liberal economic policies generally associated with globalisation,
- the increasingly complex technology creating what some called the “risk society”,
- increasing concern and activism over environmental and public health issues,
- escalating liability litigation, and, more recently,
- new styles and severity of international terrorism.

The concept of governance came into fashion at about the same time as a number of other new terms, or at least new usages of terms. The thrust of words like deregulation, corporatisation, privatisation, globalisation, international competitiveness and continuous improvement becomes clear only against the background of the new economic policy orthodoxy, changes in the international market and the global spread of new technology. Soon, we also began to hear more of concepts like transparency, stakeholder as well as shareholder interest, social responsibility, business ethics and corporate good citizenship. Later, environmentalist and public safety terms like sustainability and the precautionary principle joined the verbal influx. It was a period in which liability litigation also proliferated. Concepts like duty of care, best practice and due diligence helped swell the vocabulary of day-to-day corporate activity. These trends were part and counterpart of government deregulation, global market orientation, technology, and the public reactions all these developments engendered.

3.4 The Rise of the Risk Society

The nature and extent of risks today are a far cry from those of the “satanic mills” of the first Industrial Revolution in the nineteenth century. The physical pollution and social harm associated with early technology were localised, mostly confined to a limited urban area. The risks, as well as the opportunities, of much modern technology are limitless: nuclear fission, radioactive waste disposal, genetic engineering, techno-scientific animal husbandry, food manufacture, pharmaceuticals and numerous other new processes and systems affect populations, often for the better and sometimes for the worse, across continents and down generations. Risk extends to the planet itself: the greenhouse effect, the ozone layer, acid rain, rain forest clearance in Brazil, forest fires in Indonesia and massive dam construction in China.
But even where risk is less than universally dispersed, the economic and social impact of local incidents can be great. Consider, for example, the Longford gas disaster in Victoria, the failures of Sydney Water and Auckland Power, and the collapses of HIH Insurance and Ansett Airlines. Some sociologists have called attention to these newly emerging conditions with the term Risk Society (Beck, 1986). The term draws attention to the fact that new globalising, highly complex technological systems are unleashing hazards and potential threats, as well as benefits, to an extent previously unknown. Francis Fukuyama (1999) notes “it is science that drives the historical process; and we are on the cusp of an explosion in technological innovation in the life sciences and biotechnology”. Early technology was designed largely to control the risks that sprang from nature (flood, fire, disease, etc) and from scarcity (famine, low productivity, limited distribution capacity, etc). But now, as Ulrich Beck points out, risk increasingly emanates from man and his inventions. He uses the term “reflexive modernity” to warn of the catastrophic as well as the beneficial potential of the new technology. Technological progress is also enabling the world, if it will, to abolish scarcity in the supply of human material needs. Many emerging hazards are both unintended and unanticipated. Arguably, some risks may lurk so deeply in new products or processes that they may be unknowable, even to state-of-the-art science at the time the innovations are implemented. Increasingly sophisticated methods of risk identification, calculation and control will therefore be demanded of risk management. Of necessity, risk management functions will increasingly be conducted in the glare of public scrutiny.
New parameters of transparency and public “risk tolerability” will be forged not in the comfortable privacy of boardrooms but on the exposed public battlefields of political controversy, legal liability and impending government regulation on default. When the nature of risk from new technology changes so that many risks remain latent and do not manifest themselves for years, the danger is that the incentives to control them are weakened. Competition and the profit motive may drive some managers to neglect consequences they think will not materialise during their term of office.

The outbreaks of Mad Cow Disease and dioxin-contaminated food exports should be taken as just two of many warning signals that worse is to come unless risk management keeps pace with the burgeoning risk society.

3.5 Governance and Non-Financial Risk

One effect of the risk society and of the corporate governance movement that gained momentum in the 1990s was to put greater emphasis on risk management. Since then, pressure has been maintained not just for best practice but also for continuous improvement in governance risk identification and management. “The proper governance of companies will become as crucial to the world economy as the proper governing of countries”, declared the President of the World Bank, James D. Wolfensohn, commenting on the good governance movement. Nevertheless, there are still hard yards to cover. A McKinsey study of risk management practice in May 2002 covered 200 directors representing over 500 boards of major companies. Thirty-six percent of the directors believed their boards did not understand their companies’ major risks. Approximately 40 percent believed they could not identify, safeguard and plan for risk effectively enough. The same percentage believed that non-financial risk received only “anecdotal treatment in the boardroom” (Protiviti 2003). Similarly, research published by Financial Executives International (FEI) in November 2001 claimed that 65 percent of senior executives lacked “high confidence” in their risk management. FEI reported that doubts persisted over the extent to which existing processes could be relied upon to identify all potentially significant business risks to their enterprises (Protiviti 2003).

3.6 Public Sector Governance and Risk

3.6.1 Auditor General Victoria’s Audit Report

The Auditor General Victoria’s performance audit report of March 2003, “Managing Risk Across the Public Sector”, aimed to “provide a timely assessment about risk management practices at individual agency and whole-of-government or State-sector levels”. The report noted the effort to establish a formal and structured focus on risk across all industries, and to integrate business risk with more technical or financial risk assessment, that began with the first publication of the Australian/New Zealand Standard AS/NZS 4360 Risk Management in 1995 (revised as AS/NZS 4360:1999). The report found that the Victorian State public sector was increasingly applying a structured risk management approach, though not necessarily the one suggested by the Standard. Key drivers in that State included the Victorian Managed Insurance Authority Act 1996, the Financial Management Act 1994, the Victorian Government’s Management Reform Program and policies associated with private-public sector service and infrastructure delivery such as Partnerships Victoria. The Auditor General also found that in over three quarters of public sector organisations, Boards/CEOs and executive management were directly involved and taking leadership roles in risk management. Nevertheless, he concluded:
- Although more than 90 percent of the State’s public sector organisations examined and applied risk management processes “in some part of their business and services”, risk management across the State public sector was “not yet an established or mature discipline”.
- Nearly one third of all organisations were still not explicitly identifying and assessing their key risks.
- Nor were they always reporting risk information to their key internal and external stakeholders.

- Improvement was needed in the ability of organisations to identify their key State-sector risks. While various entities might have an adequate view of their own risk exposures, they did not all understand how their exposures would affect other agencies or the State as a whole.
- The likelihood therefore existed that significant State-sector risks were going undetected and under-managed. There was “a lack of clarity around the responsibility for the escalation of these risks and a lack of a full understanding of State-sector risks within portfolios”. Certain risk types could therefore go undetected at a State-sector level, and the risk persisted that insufficient risk mitigation strategies could be implemented from a whole-of-State perspective.
- Most agencies had no existing structure to share risk management best practice across the State sector.
- The practice was still prevalent of reviewing risk strategies and assessments as a separate annual exercise or through periodic Board presentations.

The Auditor General Victoria report advised that risk management should not be an annual or infrequent exercise, but should be embedded in usual business processes. It said that “Risk leadership, appetite and culture” should be monitored constantly, and that there should be reliable access to demonstrated risk management good practices in other public sector organisations, as well as up-to-date information on key success factors or benchmarks.

3.6.2 UK Strategy Unit Study

Britain’s Prime Minister, Tony Blair, recently directed his UK Strategy Unit to conduct an in-depth study of modern risk and how governments might better manage it. Despite improvements across government, Blair admitted that risk management in the UK had been “found wanting in a number of recent policy failures and crises”. What government needed to know was how to get “the right balance between innovation and change on the one hand, and avoidance of shocks and crises on the other”. This was now “central to the business of good government”. Blair instructed the Strategy Unit to draw on “good practice and thinking around the world - from across government, the private sector, and other experts and commentators”. Even before Blair’s directive to the Strategy Unit, the UK government had already made changes to its approach to risk. Blair described these as “radical”, and referred in particular to bodies like the UK Food Standards Agency, the Human Genetics Commission and the Monetary Policy Committee. He said these bodies illustrated the trend to “more open processes, based on evidence”, arguing that such processes were more effective than secrecy at handling risks and winning public confidence. He also pointed to the Civil Contingencies Secretariat, whose aim was to improve the way the UK prepares for threats of serious disruption to the nation.
One of the Unit’s early conclusions was that it was not only the accelerating pace of change in science and technology and the greater connectedness of the world that was heightening the risk environment for government. Escalating risk, especially political risk, was also due to “rising public expectations... [and] declining trust in institutions, declining deference, and increased activism around specific risk issues, with messages amplified by the news media.” The report concluded that, although improved, risk management by the UK government was still inadequate to the burgeoning challenge. It needed to keep constantly under review where risk management should best sit. It should strive for continuous improvement through “good judgement supported by sound processes and systems”.

On the changing nature and severity of risk, the report referred to “unforeseen events, programmes going wrong, projects going awry”, including:

- manufactured risks, that is, those “requiring governments and regulators to make judgements about the balance of benefit and risk across a huge range of technologies – from genetically modified food and drugs, to industrial processes or cloning methods”;
- direct threats, ranging from events such as those of September 11 to the threat of chemical and biological attack;
- risks resulting from the increasing vulnerability of citizens to distant events, ranging from economic crises on the other side of the world to attacks on IT networks, diseases carried by air travellers, or the indirect impact of civil wars and famines;
- safety risk issues, for example those arising from BSE and the measles, mumps and rubella (MMR) vaccine, and other issues of risk to the public regarding, for example, rail safety, adventure holidays and flooding;
- imposed risks, that is, those imposed on the public by individuals or businesses that necessitate government regulatory intervention;
- risks of infrastructure disruption from industrial action, protest or failure of transport or IT networks;
- risks to government from the transfer of risk to the private sector, for example in capital projects and service delivery;
- risks of damage to government’s reputation in the eyes of stakeholders and the public that impact government’s ability to carry out its programs.

The report recommended action in six main areas.

- “systematic, explicit consideration of risk should be firmly embedded in government’s core decision-making processes (covering policy making, planning and delivery)”
- “government should enhance its capacity to identify and handle strategic risks, with improved horizon scanning, resilience building, contingency planning and crisis management”
- “risk handling should be supported by best practice, guidance and skills development – organised around a risk ‘standard’”
- “departments and agencies should make earning and maintaining public trust a priority in order to help them advise the public about risks they may face. There should be more openness and transparency, wider engagement of stakeholders and the public, wider availability of choice and more use of ‘arm’s-length’ bodies such as the Food Standards Agency to provide advice on risk decisions. Underpinning principles for handling and communicating on risk to the public should be published for consultation”
- “ministers and senior officials should take a clear lead in handling risk in their departments – driving forward improvements, making key risk judgements, and setting a culture which supports well judged risk taking and innovation”
- “the quality of risk handling across government should be improved through a two-year programme of change, linked to the Spending Review, and clearly set in the context of public sector reform (the Departmental Change Programme)”.

The report said its recommendations aimed to enable confident decision taking on both risk and innovation in order to reduce waste and inefficiency and lead to fewer unanticipated problems and crises that may undermine confidence and trust.

Guideline principles were suggested to cover difficult areas. The report noted, for example, that governments normally seek to ensure that those who impose risks on others bear the consequences. But cases arise where responsibility cannot be attributed to any specific individual or agency. The report recommended that governments aim to ensure that responsibility rests with those best placed to manage the risk. It said that this should include protecting minority interests by balancing risks between different groups. Where the consequences of a risk are too great for any one individual or business to bear, the Unit recommended that government should intervene to provide protection or to pool the risk. Where the market cannot provide sufficient cover and the consequences are unacceptable, it believed the government should step in as insurer of last resort. Government might also need to intervene where market provision is withdrawn in response to an external shock. A case in point was the inability or unwillingness of airline companies after September 11 to bear the costs of enhanced airport and aircraft security. When the study was completed the UK Prime Minister introduced the report with a caution against the sort of unwarranted risk avoidance that results in unnecessary loss of promising opportunities:

“All life involves some risk, and any innovation brings risk as well as reward - so the priority must be to manage risks better. We need to do more to anticipate risks, so that there are fewer unnecessary and costly crises, like BSE or failed IT contracts, and to ensure that risk management is an integral part of all delivery plans. But we also need to be sure that innovations are not blocked by red tape and risk aversion, and that there is a proper balance between the responsibilities of government and the responsibilities of the individual”.

(The UK Strategy Unit’s report itself is available at http://www.number10.gov.uk/SU/RISK/risk/home.html.)

3.7 Risk and Corporate Citizenship

Clearly, the corporate goal of maximising returns for shareholders is no longer acceptable as the magic ethical bullet that justifies any means. The CACG states:

“Good corporate governance requires that the board must govern the corporation with integrity and enterprise in a manner which entrenches and enhances the licence it has to operate. This licence is not only regulatory but embraces the corporation’s interaction with its shareholders and other stakeholders such as the communities in which it operates, bankers and other suppliers of finance and credit, customers, the media, public opinion makers and pressure groups. While the board is accountable to the owners of the corporation (shareholders) for achieving the corporate objectives, its conduct in regard to factors such as business ethics and the environment for example may have an impact on legitimate societal interests (stakeholders) and thereby influence the reputation and long-term interests of the business enterprise.”

The wider social impact of corporate decisions is being recognised, and a widening sense of social responsibility is being encouraged. Obviously, this expands the area of risk that can now affect the business enterprise through its public image and civil liability. Note the emphasis on stakeholders, not just shareholders. External as well as internal stakeholders are mentioned. That is, not only the jobs and working lives of the corporation’s employees are involved, but also members of the public affected by corporate activities. Note that the CACG’s reference to shareholder interest is restricted to the “legitimate” interests of shareholders. Shareholders are now described as only one among a number of stakeholder groups.

Likewise, the reference is not to vague, generalised industry standards and relevant statutory obligations but to keeping up with “best” business practice, a more specific and demanding term. CACG principles point to increasing recognition of the wider impact of corporate decisions on the community. Attention is focussing on how the corporation should relate to the community, including the extent of social responsibility over and above an organisation’s obligations to shareholders, the law, and the bottom line.

3.8 Fallout Severity

In the contemporary climate, failures of corporate governance can result in very public fallouts with severe consequences not only for the corporation, its shareholders and stakeholders, but also for individual managers. In Australia this was most recently illustrated on 24 March 2004, when the Australian Prudential Regulation Authority's (APRA) review of the foreign currency trading scandal at the National Australia Bank (NAB) became public. Irregular currency options trades had incurred losses of $360 million at the NAB. The scandal led to the sacking of four traders, other executive departures, and a change of chairman and chief executive. Media reports highlighted that the NAB had to halt its latest share buy-back in order to lift its capital adequacy ratio; that the bank was not able to use its own internal measure of market risk capital; and that its currency options desk was also halted for proprietary trading and corporate business (for example, ABC Radio National Report, 24.3.04). The APRA review found that:

- there were many missed opportunities to detect and close down the irregular currency options trades;
- management at the bank had turned "a blind eye" to known concerns;
- back-office procedures had significant gaps;
- executive risk committees were "particularly ineffective"; and
- the bank's board was not sufficiently pro-active on risk issues.

The regulator's report criticised what it called a culture where risk management controls were seen as "trip-wires to be negotiated rather than presenting any genuine constraint on risk-taking behaviour". The regulator said it frequently came across the phrase "profit is king" in its investigations. The chairman of the Australian Shareholders Association, John Curry, commented that it was “not sufficient for the audit committee to say that they didn't receive some of the information they should have received. The audit committee should have been out there asking questions and probing and finding out whether the systems were correct or not." (ABC Radio National, 24.3.04)

3.9 Basic Principles of Good Corporate Urban Governance

It is also worth noting that an inter-agency grouping is seeking to get the UN General Assembly to adopt the following principles for good urban governance. The campaign proposes the following concepts as goals not merely for rhetorical declarations, but for operational implementation:

• sustainability
• subsidiarity
• equity
• efficiency
• transparency
• accountability
• civic engagement
• citizenship
• security.

Given the efforts by the Commonwealth Heads of Government, UN agencies and numerous non-government bodies, it is reasonable to anticipate sharper, more critical public scrutiny of, and reaction to, actual or perceived corporate failure to live up to the new standards of good governance. Companies should be prepared to face rigorous public probing during the “fallouts” that will certainly follow any such occurrences. In regard to good governance in general, whether governmental, quasi-governmental or corporate, the Commonwealth Association for Corporate Governance (CACG) believes the following elements are essential: efficiency, probity, higher levels of conduct by professions and professionals, active and responsible capital providers, effective legal and regulatory regimes, reasonably competitive markets, and a free and critical media.

REFERENCES

Beck U (1992). Risk Society: Towards a New Modernity. SAGE Publications.
Commonwealth Association for Corporate Governance (CACG). CACG's 15 Principles. www.combinet.net/governance/FinalVer/commonwe.htm
Fukuyama F. Professor of Public Policy, George Mason University. The Independent (16/6/99).
Protiviti (2003). The Bulletin, Vol 1, Issue 7, June 2003.
Standards Australia/Standards New Zealand (1999). Risk Management. Australian/New Zealand Standard AS/NZS 4360:1999.

4. Liability

The law is much too important to be left up to lawyers.
Australian aphorism.

4.1 Statute vs Civil Law

Civil or common law is law derived from actual cases, that is, law made or modified by the judiciary. Common law is the product of societal values over centuries and evolved in the English courts. Civil cases are brought under common law by one party claiming damages from another, and the case must be proved “on the balance of probabilities”.

Statute law is law passed by Acts of parliament. This law takes the form of Acts and Regulations made under an Act, and it specifies penalties for breaches. Statutory offences require that the case against the accused be proved 'beyond reasonable doubt', a considerably heavier burden than the civil standard. Government departments and statutory authorities are responsible for enforcing statute law and determine whether to prosecute for breach of statutory duty. Gumley (2003) outlines the offences and penalties for environmental crimes in Victoria; offences under OH & S legislation take a similar approach.

Australian OH & S legislation is based on the UK Robens-style legislation and is derived from the common law duty of care concept (Creighton 1996), particularly the duty of employers towards their employees. Each Australian state has its own OH & S and environmental legislation, and while all are very similar there are some subtle differences in the extent of the duties. In the 1985 Victorian OHS Act, for example, the duties of employers are qualified by “so far as practicable”, with “practicable” being defined as having regard to:

a) the severity of the hazard or risk in question
b) the state of knowledge about that hazard or risk and the ways of removing or mitigating that hazard or risk
c) the availability and suitability of ways to remove or mitigate that hazard or risk
d) the cost of removing or mitigating the hazard or risk.

In some cases of workplace deaths, the authorities have brought charges of manslaughter under the Crimes Act. As yet no successful convictions have been obtained in Australia, because the individuals charged must be shown to have had mens rea, or a guilty mind. In Victorian law the relevant mens rea for manslaughter is gross or criminal negligence. Because of the difficulty of proof beyond reasonable doubt, some jurisdictions have introduced the concept of industrial manslaughter for workplace fatalities. There are also some difficulties in determining which individuals represent the mind of the corporation.

4.2 Common Law Criteria

Common law actions arising from workplace injury have largely been supplanted by workers' compensation systems, i.e. injured workers receive compensation for the impacts of injury without having to take action against the employer in the courts. However, apart from a much reduced number of workplace injury cases, the common law duty of care of one person to another is invoked in many aspects of modern life. For example, the organisers of any public event have a duty of care to all those involved in or potentially affected by the event. Owners and occupiers of premises have a common law duty to ensure the premises are safe for members of the public who have access. Failure to do so may be negligent, can lead to the significant costs associated with common law claims, and can also lead to statutory penalties for ‘responsible’ individuals if the relevant government authority decides to act.

To be found negligent, the answers to all four of the questions posed below must, on the balance of probabilities, be “Yes”. These are termed the four common law tests of negligence.

A. CAUSATION
Did the injury or damage occur because of the 'unsafe' matter on which the claim of negligence is based?

B. FORESEEABILITY
Did you know, or ought you to have known...? Could this have been foreseen...? (Prior incidents, complaints, wide or common knowledge, or expert advice.)

C. PREVENTABILITY
Is there a practical way or alternative to how things were done? (Design or removal; administration and training.)

D. REASONABLENESS
Was the balance between the significance of the risk and the effort required to reduce it reasonable? Note that:
• approved or common practice may or may not be reasonable.
• compliance with regulations and codes of practice is a starting point, not a goal. For example, BS 5760: Part 12: 1993 (page iii) states: “Compliance with a British Standard does not of itself confer immunity from legal obligations.”

• the occupier/employer must be practically able to undertake the change.
• expense alone is not a factor, nor is practical inconvenience.
• the creation of other risks by the change needs to be considered.
• individual susceptibility needs to be considered.

Because of the considerable volume of case law available to the judiciary, the common law tests of negligence also provide much of the basis for decisions in cases of offences under OH & S and environmental law. These tests of negligence require expert evidence; lawyers cannot decide them. Most common law cases never reach court because the lawyers settle out of court. If there is no significant evidence that would lead a judge or jury to answer “No” to any of the four tests above, the lawyers for the defendant can only accept defeat and settle for a relatively large sum.

4.3 On Juries and Justice

In common law actions for negligence, a jury sometimes determines the balance of probabilities. It seems that juries can be affected by the horror of the injuries and other matters, so that even if the assessment might be less than 50% in favour of the plaintiff, the jury will still find in the plaintiff’s favour. But juries are complex. For an extreme example, consider the following case from a sitting of the District Court, composed of the presiding Judge and Jury, in Dubbo, NSW (as quoted by the Hon. James Muirhead QC in Discharge the Jury?, Menzies School of Health Research, 1989).

The accused, a local man, was charged with cattle stealing. Apparently the evidence that he had stolen the cattle was overwhelming. The local jury, having considered their verdict, returned to court. When asked for their verdict the foreman replied, 'We find the defendant not guilty if he returns the cows.' The Judge was furious. He vigorously reminded the jury of their oaths to 'bring in a true verdict according to the evidence', declined to record their verdict and sent them back to the jury room to reconsider. The jury retired briefly and returned with a defiant air. When asked if they had reconsidered their verdict the foreman said, 'Yes, we have. We find the accused not guilty and he can keep the cows.'

There are several points about the adversarial system that need to be remembered. An adversarial court is first and foremost a court of law. As Engineers Australia notes in the brochure Are You at Risk? (1990):

Adversarial courts are not about dispensing justice, they are about winning actions.

In this context, the advocates are not concerned with presenting the court with all the information that might be relevant to the case. Quite the reverse: each seeks to exclude information considered to be unhelpful to their side's position. The idea is that the truth lies somewhere between the competing positions of the advocates. Further, courts do not deal in facts; they deal in opinions. Again from Are You at Risk?:

What is a fact? Is it what actually happened between Sensible and Smart? Most emphatically not. At best, it is only what the trial court - the trial judge or jury - thinks happened. What the trial court thinks happened may, however, be hopelessly incorrect. But that does not matter - legally speaking.

That is, in court, the laws of man take precedence over the laws of nature.

4.4 Due Diligence

The primary defence against negligence claims is due diligence. In essence, this means showing that the defendant undertook the procedures and processes that a reasonable person in the same position (in the eyes of the court, and with the advantage of 20:20 hindsight) would have undertaken to ensure that whatever did happen should not, on the balance of probabilities, have occurred. This is probably best represented by the diagram below (adapted from Sappideen and Stillman 1995).

[Figure: How Would a Reasonable Defendant Respond to the Foreseeable Risk? Factors shown: magnitude of risk (probability of occurrence, severity of harm); expense, difficulty and inconvenience; utility of conduct.]

The overall situation is perhaps best summarised by Chief Justice Gibbs of the High Court of Australia:

Where it is possible to guard against a foreseeable risk, which, though perhaps not great, nevertheless cannot be called remote or fanciful, by adopting a means, which involves little difficulty or expense, the failure to adopt such means will in general be negligent.

Turner v. The State of South Australia (1982) (High Court of Australia before Gibbs CJ, Murphy, Brennan, Deane and Dawson JJ).

The balance is the hard part. It is hard for outsiders to know the true extent of the resources (financial, administrative and/or staff) ultimately available to an organisation. This means that the correctness of the “balance” is difficult to assess externally; assessing it is something each organisation must do internally. The legislated hierarchical order of risk control solutions is:

i) Elimination or Removal (100% effective)
ii) Design or engineering (typically 90% effective)
iii) Administration (typically 50% effective)
iv) Training (typically 30% effective).
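As a rough illustration of what these typical percentages imply, the following Python sketch applies them to a normalised initial risk. It assumes, which the text does not state, that each percentage is the fraction of the remaining risk a control removes and that stacked controls act independently, so their effects multiply. The names `TYPICAL_EFFECTIVENESS` and `residual_risk` are hypothetical, chosen for this example only.

```python
from typing import Sequence

# "Typical" effectiveness figures quoted in the text. Treating each one as the
# fraction of *remaining* risk a control removes is an assumption of this sketch.
TYPICAL_EFFECTIVENESS = {
    "elimination": 1.00,
    "design": 0.90,
    "administration": 0.50,
    "training": 0.30,
}


def residual_risk(initial_risk: float, controls: Sequence[str]) -> float:
    """Return the risk left after applying each named control in turn.

    Assumes controls act independently, so their effects multiply.
    Raises KeyError for a control not listed in TYPICAL_EFFECTIVENESS.
    """
    remaining = initial_risk
    for control in controls:
        remaining *= 1.0 - TYPICAL_EFFECTIVENESS[control]
    return remaining


if __name__ == "__main__":
    options = (["elimination"], ["design"], ["administration", "training"])
    for option in options:
        share = residual_risk(1.0, option)  # 1.0 = normalised initial risk
        print(f"{' + '.join(option):<28} residual risk = {share:.0%}")
```

Under these assumptions a design solution alone leaves about 10% of the original risk, while administration and training combined still leave about 35%. This is one way of seeing why a design solution is preferred whenever it is, on balance, available.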

Another way of expressing the courts’ reluctance to rely on training and administrative controls is to see it in the context of a cause-consequence model. A concept diagram is shown below:

[Figure: Concept Cause-Consequence Model. Threat: falling objects on construction site. Precaution: kickrails to restrain small objects; precaution failure leads to loss of control. PPE (hardhat) then determines whether the outcome is a near miss or an incident (loss).]

Primary controls include kickboards on platforms to prevent objects from being dislodged and falling in the first place. Note that personal protective equipment (in this case a hard hat) only improves the probability that an event ends as a near miss; once an object has started to fall, the system is already out of control. This needs to be taken into consideration when assessing the balance noted above. Specifically, it is imprudent, and indeed unlawful, to rely on administrative and training solutions when a design solution is, on balance, available.
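The cause-consequence logic for the falling-object example can be sketched numerically. This is a minimal Python illustration, assuming the kickrail and the hardhat act independently; the `Outcomes` class, the `cause_consequence` function and every rate and probability used are hypothetical, chosen only to show the structure of threat, precaution, loss of control and PPE.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Expected annual counts at each stage of the concept cause-consequence model."""
    threats: float          # objects knocked towards the platform edge (the threat)
    loss_of_control: float  # objects that get past the kickrail and start to fall
    near_misses: float      # falling objects where the hardhat prevents injury
    incidents: float        # falling objects that cause an injury (the loss)


def cause_consequence(threat_rate: float, p_kickrail_fails: float,
                      p_hardhat_protects: float) -> Outcomes:
    """Propagate a threat rate through one primary control and one PPE layer.

    Assumes the two layers act independently; all inputs are illustrative.
    """
    loss_of_control = threat_rate * p_kickrail_fails
    near_misses = loss_of_control * p_hardhat_protects
    incidents = loss_of_control - near_misses
    return Outcomes(threat_rate, loss_of_control, near_misses, incidents)


if __name__ == "__main__":
    # Hypothetical figures: 20 objects per year reach the edge; kickrails stop 95%;
    # a hardhat prevents injury in 80% of strikes.
    with_kickrail = cause_consequence(20.0, 0.05, 0.8)
    hardhat_only = cause_consequence(20.0, 1.0, 0.8)  # no kickrail fitted
    for label, o in (("kickrail + hardhat", with_kickrail), ("hardhat only", hardhat_only)):
        print(f"{label:<18} loss of control {o.loss_of_control:5.1f}/yr, "
              f"near misses {o.near_misses:5.1f}/yr, incidents {o.incidents:5.1f}/yr")
```

In this hypothetical case the hardhat turns the same proportion of falling objects into near misses either way, but without the kickrail twenty times as many objects fall, so incidents rise from 0.2 to 4.0 per year. Changing `p_hardhat_protects` never alters `loss_of_control`, which mirrors the point that PPE acts only after control has already been lost.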

4.5 Safety Cases

Safety Cases provide a very interesting perspective in the liability context. Historically they were developed to optimise safety performance. There are parallels with a Business Case, which is usually drawn up to convince a financier that a business is viable (Redmill and Rajan, 1997). The object is to ensure that all significant factors affecting the business have been identified and that appropriate measures are in place to maximise the positive factors and minimise the negative ones. A Business Case is usually the responsibility of the highest levels of management of the organisation; accordingly, responsibility for the failure of a business usually rests there too. A Safety Case is intended to provide the same assurance with respect to the safety of a system or complex. Again it is primarily the responsibility of the operating company, at its highest levels.

[Figure: Idealised Safety Case Structure. Elements: Board; CEO; Safety Management System with Safety Audit; Business Management System with Financial Audit; Middle Management; Business Units.]

Safety Cases are in effect reasoned (legal) arguments that all significant hazards have been identified and properly managed, and that the system is ‘safe’. Once established, a Safety Case typically functions as a contract between an organisation and a regulator that permits the organisation to operate within defined limits in accordance with documented procedures. Compliance failure is a breach of contract. If damage to third parties, or death and injury, occur due to such breaches then serious liabilities arise. Because of this, the adversarial legal process seems to have converted the concept into a liability management device. This is discussed further in the next section. Quality-type processes are good at ensuring compliance with the contract. However, they are less effective in establishing the Safety Case initially, or in arguing for its subsequent redevelopment. Risk analysis is essential for the Safety Case’s initial development and continuing validity.

4.6 Adversarial Legal System Contradictions

Arising from the above review, there appear to be some profound contradictions being created between risk control and the adversarial legal system. Firstly, the emerging view that risk control failures arise from systemic (that is, strategic or policy) errors (Reason, 1993) does not appear to create liabilities for the policy or strategic decision makers. Rather, it imposes the responsibility to be diligent (with all the consequent liability) on those who actually have to implement such policies.

It is also interesting to note that, for senior management and board members at least, liability management is identical to consequence management. Frequency, and therefore risk management, is not really an issue. If a serious loss event can credibly occur (in legal terms, if it is possible) then it must be managed. The fact that it occurs very, very rarely is not relevant. To paraphrase a judge in NSW: "What do you mean you didn't think it could happen; there are seven dead".

This liability impact has had a great effect on the development of safety cases. The Victorian Major Hazards legislation, for example, indicates that the chief executive officer or the most senior officer resident in the state of Victoria shall sign off the safety case. Passing a draft safety case through two sets of lawyers, in the loop shown below, changes its nature from a wholly technical statement of safety by technical persons to a liability management device, a substantial development.

[Figure: Safety Case Development Loop. Elements: Board; policy; middle management assessment and attempted feedback; requested resources ($, time and people); in-house legal advice; corporate legal sign-off.]

Secondly, the notion of a statute is that it represents a law that a citizen can choose to obey or not (we have free will). If it is not obeyed then a penalty will be imposed. Ignorance is no excuse. However, if a policy or system of management created the circumstances leading to the failure, meaning the individual did not really understand the risk framework being imposed, then a very difficult contradiction arises. That individual has to have mastery of the total social/legal/technical risk control system in which he or she works, so that potential problems can be demonstrated to those same policy makers in ways the policy makers cannot legally avoid. Otherwise the responsibility (and liability) cannot be restored to those higher management echelons. The parallel is with a soldier being commanded to commit a crime. A soldier is trained to obey orders, as this is part of his work culture, but he also needs knowledge of societal law and mores. He has to know when to refuse an illegal command; otherwise he can find himself being charged with a crime.

4.7 Risk Auditing Systems

Risk auditing rating systems like the Victorian Government’s SafetyMAP, the NSCA’s 5 Star system or Det Norske Veritas’s International Safety Rating System are also interesting in this context. Whilst they may provide indications as to the overall health of risk control systems, they are not a direct defence against liability arising from a particular accident, even if perfect scores had been consistently attained by the participating organisation.

REFERENCES

Creighton W B (1996). Understanding Occupational Health and Safety Now in Victoria. CCH Australia, Sydney.
Engineers Australia (1990). Are You at Risk? Canberra.
Gumley W (2003). Environmental Crimes: Offences and Penalties in Victoria. Corporate Misconduct ezine: http://www.lawbookco.com.au/academic/Corporate-Misconduct-ezine/html-files/articles.asp
Muirhead J (1989). Discharge the Jury? Menzies School of Health Research.
Reason J (1993). Managing the Management Risk: New Approaches to Organisation Safety. Chapter 1 of Reliability and Safety in Hazardous Work Systems: Approaches to Analysis and Design. Eds B Wilpert et al. Lawrence Erlbaum Associates Ltd, East Sussex. ISBN 0-86377-309-5.
Redmill F and Rajan J (1997). Human Factors in Safety Critical Systems. Butterworth-Heinemann, Oxford.
Sappideen C and Stillman R H (1995). Liability for Electrical Accidents: Risk, Negligence and Tort. Engineers Australia Pty Limited, Crows Nest, Sydney.
Turner v. The State of South Australia (1982). High Court of Australia before Gibbs CJ, Murphy, Brennan, Deane and Dawson JJ.

READING

Smith Damien J (1986). Engineers & Professional Negligence. Enterprise Care, 1st Floor, 21 Burwood Road, Hawthorn, Victoria, 3122. Reprinted 1994. ISBN 0 646 09785 7.

5. Causation

We all have our philosophies, whether or not we are aware of the fact, and our philosophies are not worth very much. But the impact of our philosophies upon our actions and our lives is often devastating. This makes it necessary to try to improve our philosophies.
(Paraphrased from Karl Popper, 1972.)

The way in which we believe things occur determines how we will respond to them and attempt to manage them. Risk analysis is in many ways an examination of our philosophies or prejudices using processes that can withstand judicial scrutiny. It is therefore culture, time and place specific. As mentioned in Chapter 1, if one believes that people are dying from the plague because of selective retribution from God for past sins, then the way this risk is managed will be different from that of a society that believes in germ theory.

In business terms, for example, the world is often regarded as a wholly commercial place with everyone acting in a self-interested manner. If this is true, then what one does to prosper and minimise risk will differ from the actions of those who hold a more humanistic view of human behaviour and responsibility. The natural material world, on the other hand, tends to be considered as deterministic or probabilistic in nature and subject to natural laws. This creates some profound contradictions. Engineers believe they can change the future using materials and systems that behave predictably. But if the engineers themselves are predictable, are their actions similarly predetermined? Our courts have similar problems. How can someone be convicted of a crime if his or her behaviour was predetermined by his or her situation and circumstances?

5.1 Paradigms

Paradigms (Kuhn 1970), that is, sets of concepts shared by a community of scientists or scholars, are fundamental issues. Alternative names for these different views of how things happen include worldviews, or Weltanschauungen.
Consider a simple example comparing the views of some insurers with those of risk engineers. Suppose that a certain class of car was having more accidents than most. An engineer investigating this might conclude that the cause was malfunctioning brakes. If so, a product recall would be made and the problem fixed. A month or two might go by to ensure that the accident frequency really did drop abruptly (a step function), and if so the matter, from the engineer's perspective, would be closed. That is, the causal link between malfunctioning brakes and accidents had been established and the problem solved. Now consider the insurers. They will have increased premiums for this class of car once the accident rate increased. Once the accident rate drops back to that of other cars, the engineer might expect an immediate drop in premiums. However, underwriters usually take a probabilistic view of the universe, not a causal one. The premiums will almost certainly be averaged over several years and drop progressively, not abruptly.

Interesting shifts are occurring in management thinking. Maruyama (1974) describes three simplified 'pure' paradigms, or structures of reasoning, shown below. Some of the views in the table predate current views on community consultation, particularly in environmental risk assessment, where community consultation is a prime source of legitimacy.


It appears that a shift has occurred from paradigm 1 to paradigm 3 in the last 25 years.

| | (1) Unidirectional Causal Paradigm | (2) Random Process Paradigm | (3) Mutual Causal Paradigm |
|---|---|---|---|
| Science | Traditional 'cause and effect' model | Thermodynamics; Shannon's information theory | Post-Shannon information theory |
| Information | Past and future inferable from the present | Information decays and gets lost; blueprint must contain more information than the finished product | Information can be generated; non-redundant complexity can be generated without a pre-established blueprint |
| Cosmology | Predetermined universe | Decaying universe | Self-generating and self-organising universe |
| Social organisation | Hierarchical | Individualistic | Non-hierarchical interactionist |
| Social policy | Homogenistic | Decentralisation | Heterogenistic coordination |
| Ideology | Authoritarian | Anarchistic | Cooperative |
| Philosophy | Universalism | Nominalism | Network |
| Ethics | Competitive | Isolationist | Symbiotic |
| Aesthetics | Unity by similarity and repetition | Haphazard | Harmony of diversity |
| Religion | Monotheism | Freedom of religion | Polytheism harmonism |
| Decision process | Dictatorship, majority rule or consensus | Do your own thing | Elimination of hardship on any individual |
| Logic | Deductive, axiomatic | Inductive, empirical | Complementary |
| Perception | Categorical | Atomistic | Contextual |
| Knowledge | Believe in one truth. If the people are informed, they will agree. | Why bother to learn beyond one's own interest? | Polyocular; must learn different views and take them into consideration. |
| Methodology | Classificational, taxonomic | Statistical | Relational, contextual analysis, network analysis |
| Research hypothesis and strategy | Dissimilar results must have been caused by dissimilar conditions. Differences must be traced to the conditions producing them. | There is a probability distribution; find out the probability distribution. | Dissimilar results may come from similar conditions due to a mutually amplifying network. Network analysis instead of tracing the difference back to initial conditions in such cases. |
| Assessment | 'Impact' analysis | What does it do to me? | Look for feedback loops for self-cancellation or self-reinforcement. |
| Analysis | Pre-set categories used for all situations | Limited categories for own use | Changeable categories depending on situation |
| Community people viewed as | Ignorant, poorly informed, lacking expertise, limited in scope | Egocentric | Most direct source of information, articulate in their own view, essential in determining relevance |
| Planning | By 'experts'; either keep community people uninformed, or inform them in such a way that they will agree | Laissez-faire | Generated by community people |

Three 'Pure' Paradigms after Maruyama (1974)


5.2 Biological Metaphors

5.2.1 Reason's Pathogens

James Reason's (1993) resident pathogen model of how things go wrong is described in the figure below. The idea is that latent failures in technical systems are analogous to resident pathogens in the human body, which combine with local triggering factors (for example, life stresses or toxic chemicals) to overcome the immune system and produce disease. Like cancers and cardiovascular disorders, accidents in defended systems do not arise from single causes. Rather, they occur as a result of the adverse conjunction of several factors, each necessary but none sufficient to breach the defences alone. And, as in the case of the human body, no technical system can ever be entirely free of pathogens.

[Figure: Reason's Resident Pathogen Metaphor Model. Fallible decisions (latent failures, high-level decision makers) lead through line management deficiencies (latent failures, line management), preconditions for unsafe acts (latent failures, preconditions), unsafe acts (active failures, productive activities) and failed or absent defences (active and latent failures) to an accident.]

Such a view leads to a number of propositions about accident causation:
a) Accident likelihood is a function of the number of pathogens within the system.
b) The more complex and opaque the system, the more pathogens it will contain.
c) Simpler, less well-defended systems need fewer pathogens to bring about an accident.
d) The higher a person's position within the decision-making structure of a system, the greater the opportunity to spawn pathogens.
e) Local triggers are hard to anticipate.
f) Resident pathogens can be identified proactively.
g) Measures that neutralise pathogens (latent failures) are likely to have greater and wider-ranging safety benefits than those directed at minimising active failures.
h) Establishing diagnostic organisational signs will give general indications of the health of the high-hazard technical system.
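Reason's point that an accident needs an adverse conjunction of several failures, each necessary but none sufficient, can be illustrated with a simple layered-defence calculation. This is a minimal sketch under a loud assumption: each layer is breached independently with a fixed, hypothetical probability, whereas real latent failures are rarely independent. The layer names follow the figure; the numbers are invented for illustration.

```python
from math import prod

# Hypothetical probabilities that each layer is breached on a given demand.
# ASSUMPTION: layers fail independently, which real organisations rarely satisfy.
LAYERS: dict[str, float] = {
    "fallible decisions": 0.1,
    "line management": 0.2,
    "preconditions": 0.3,
    "unsafe acts": 0.5,
    "defences": 0.05,
}


def accident_probability(layers: dict[str, float]) -> float:
    """Probability that every layer is breached at once (the adverse conjunction)."""
    return prod(layers.values())


def with_degraded_layer(layers: dict[str, float], name: str, new_p: float) -> dict[str, float]:
    """Copy of the layers with one layer's breach probability changed,
    e.g. raised by an additional resident pathogen."""
    changed = dict(layers)
    changed[name] = new_p
    return changed


if __name__ == "__main__":
    base = accident_probability(LAYERS)
    worse = accident_probability(with_degraded_layer(LAYERS, "line management", 0.6))
    print(f"baseline: {base:.1e}  with an extra pathogen: {worse:.1e}")
```

With these figures the baseline is 1.5 × 10⁻⁴ per demand; raising the line management breach probability from 0.2 to 0.6 triples it, in the spirit of proposition a), while setting any one layer to zero drives the result to zero, reflecting that each breach is necessary.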


5.2.2 Kauffman's Complexity

Kauffman's view (1995) is interesting in terms of organisational behaviour. We may have our intentions, but we remain blind watchmakers. We are all, cells and CEOs, rather blindly climbing deforming fitness landscapes. If so, then the problem confronted by an organization (cellular, organismic, business, governmental or otherwise) living in niches created by other organizations is preeminently how to evolve on its deforming landscape, to track the moving peaks. Tracking peaks on deforming landscapes is central to survival. Landscapes, in short, are part of the search for excellence: the best compromises we can attain.

5.2.3 Dawkins' Neo-Darwinism

In the context of modelling complex technological systems, Richard Dawkins' computer-based artificial selection (his Biomorphs) provides some fertile parallels for risk and reliability engineers (Dawkins 1986 and 1998). In practice this boils down to modelling a complex system in a virtual reality environment and playing endless "what if" scenarios. An example was discussed in Section 2.7. Other examples, with which you may be more familiar, are flight simulators for pilot training, road traffic modelling for designing traffic control, and simulation modelling of nuclear explosions. The last of these has been influential in convincing governments to sign nuclear test ban treaties. Obviously, these require fearsome computer power, an extensive interpretation of nature and a belief that hyper-reality can come close to reality.

5.3 Discrete State Concepts

State models are based on the notion that any system with different ways and combinations of achieving similar outcomes can be described by a number of distinct, mutually exclusive, independent states. That is, failure or change in one state or condition is independent of the others. Once established, any of the defined operating states can be attained. The sequence of achieving each state may not be important per se.
What matters is the time the system spends in each state and the likelihood of transition between states. Block diagrams and other graphical methods can be used to illustrate the system. The one immediately below depicts a redundant accounting system intended to ensure that correct accounts are kept.

[Figure: A Redundant Accounting System, comprising an Accounting System and an Auditing System.]

There are three possible states:
State (0) Both systems operating
State (1) One system operating
State (2) Both systems failed
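Under the Markov assumptions this section relies on (constant failure and restoration rates), the redundant accounting system can be analysed numerically. A minimal sketch follows. It treats state 2 as absorbing, matching the three transitions in the state transition diagram (1st system fails, 2nd system fails, 1st system restored), and computes the mean time to reach state 2 by first-step analysis. The rate names and values are hypothetical: `lam0` is the failure rate of each system while both operate, `lam1` the failure rate of the survivor once one has failed (which may differ), and `mu` the restoration rate.

```python
def mean_time_to_total_failure(lam0: float, lam1: float, mu: float) -> float:
    """Mean time from state 0 (both operating) to state 2 (both failed).

    Three-state Markov chain with constant rates:
      0 -> 1 at rate 2*lam0  (either of the two systems fails)
      1 -> 2 at rate lam1    (the remaining system fails)
      1 -> 0 at rate mu      (the failed system is restored)
    First-step analysis gives T0 = (2*lam0 + lam1 + mu) / (2*lam0 * lam1).
    """
    a = 2.0 * lam0
    return (a + lam1 + mu) / (a * lam1)


def state_probabilities(lam0: float, lam1: float, mu: float, t: float,
                        steps: int = 100_000) -> tuple[float, float, float]:
    """Probabilities of states 0, 1, 2 at time t, by simple Euler integration
    of the Kolmogorov forward equations, starting in state 0."""
    a = 2.0 * lam0
    p0, p1, p2 = 1.0, 0.0, 0.0
    dt = t / steps
    for _ in range(steps):
        d0 = -a * p0 + mu * p1
        d1 = a * p0 - (lam1 + mu) * p1
        d2 = lam1 * p1
        p0, p1, p2 = p0 + d0 * dt, p1 + d1 * dt, p2 + d2 * dt
    return p0, p1, p2


if __name__ == "__main__":
    lam0 = lam1 = 0.1  # hypothetical failures per year
    mu = 12.0          # hypothetical restorations per year (about a month)
    print(f"MTTF without restoration: {mean_time_to_total_failure(lam0, lam1, 0.0):.0f} years")
    print(f"MTTF with restoration:    {mean_time_to_total_failure(lam0, lam1, mu):.0f} years")
    print("P(state 0, 1, 2) after one year:", state_probabilities(lam0, lam1, mu, 1.0))
```

With these illustrative rates the mean time to losing both systems rises from 15 years without restoration to about 615 years with it, which is why the restoration rate matters as much as the failure rate in a redundant system. The Euler loop is deliberately crude; a matrix exponential would be the usual choice where a linear-algebra library is available.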


The likelihood of failure of the second system once the first system has failed may well be different from the likelihood of failure when both systems are operating. Depending on which system has failed, the restoration rate of the other may also be different. These three states and transitions can be represented in different ways, such as in the following figure:

[Figure: State Transition Diagram. States 0, 1 and 2, with transitions 0 → 1 (1st system fails), 1 → 2 (2nd system fails) and 1 → 0 (1st system restored).]

Markov chain analysis is the most common state analysis technique. It assumes constant failure and restoration rates; other probability distributions can be used, but with more difficulty. The conceptual problem with the technique is defining all the possible system states, and the number of states can grow very rapidly. Ignoring partial states can render the analysis difficult.

5.4 Time Sequence

One of the central concepts of causation is conjunction in time and space. The courts reflect this in the form of a chronology of events leading up to the "crime". For lawyers, the time sequence is a list of events described in words down a page. For engineers it is usually an arrow of time running from left to right across a page. This general idea can easily be extended to most events, such as fire in a building, as shown below.

[Figure: Time Sequence Model of Fire. Ignition → Smoke → Flame → Flashover → Escalation → Burnout, along a time axis.]

Such a simple model can be extended. The time sequence below for fire in a building was developed to satisfy underwriting concepts. It contains the two parts of the risk equation: how likely the fire is to develop (the inception risk) and how severe the consequences are likely to be (the propagation risk).

[Figure: Generalised Time Sequence Model for Fire. Along a time axis from ignition to burnout, the inception risk covers supporting conditions (housekeeping, dust control, storage arrangements, construction, etc.), ignition sources (smoking, wiring, welding, static sparks, etc.) and fire development (rate of fire growth, combustible loading, spill systems, etc.) until the fire reaches a significant, detectable size. The propagation risk covers smoke detection (alert staff, smoke detectors), after which evacuation and brigade response can commence, with the smoke loss expectancy; thermal detection with sprinklers and/or foam, with the thermal loss expectancy; and passive fire control (firewalls, space separation), with the maximum foreseeable loss.]


Note that the fire development or growth rate is not linear once flaming has commenced as shown in the following figure:

[Figure: Representative Fire Curve. Fire size against time, rising slowly through the smoke stage and then very rapidly once flaming has begun.]
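One common way to put numbers on the non-linear growth in the representative fire curve is the "t-squared" design fire used in fire engineering, in which the heat release rate grows as Q = αt² once burning is established. This is a swapped-in model, not one this text prescribes. The coefficients below are the standard slow, medium, fast and ultra-fast values (kW/s²), each reaching about 1055 kW after roughly 600, 300, 150 and 75 seconds respectively; the incubation-time parameter is an illustrative addition.

```python
# Standard t-squared growth coefficients, in kW/s^2.
ALPHA: dict[str, float] = {
    "slow": 0.00293,
    "medium": 0.01172,
    "fast": 0.0469,
    "ultra-fast": 0.1876,
}


def heat_release_rate(alpha: float, t: float, t_incubation: float = 0.0) -> float:
    """Heat release rate (kW) of a t-squared fire; zero before the incubation time."""
    t_eff = max(0.0, t - t_incubation)
    return alpha * t_eff ** 2


def time_to_reach(alpha: float, q_kw: float, t_incubation: float = 0.0) -> float:
    """Time (s) for a t-squared fire to reach the heat release rate q_kw."""
    return t_incubation + (q_kw / alpha) ** 0.5


if __name__ == "__main__":
    for name, a in ALPHA.items():
        print(f"{name:>10}: about 1055 kW after {time_to_reach(a, 1055):.0f} s")
```

Because growth is quadratic, doubling the time since established burning quadruples the heat release rate, which is why the window for detection and response shown in the time sequence models closes so quickly.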

Different analysts have developed different time sequence models for different problems at different times. Heinrich's domino model of causation (derived in the 1940s) has a particular focus (Heinrich, 1959), shown below.

[Figure: Heinrich's Domino Model. Five dominoes: 1. Ancestry and social environment; 2. Fault of person; 3. Unsafe act and/or unsafe mechanical or physical condition; 4. Accident; 5. Injury. Removal of the middle domino breaks the chain.]

Such a model suggests that accidents are ultimately derived from an individual's ancestry and social environment. That is:
1. People are born with and/or are socialised to develop faulty personal characteristics such as recklessness, stubbornness, avariciousness and the like.
2. Inherited or acquired faults of a person, including recklessness, violent temper, nervousness, excitability and inconsiderateness, constitute proximate reasons for committing unsafe acts or permitting the existence of physical hazards.
3. Unsafe acts or performance will occur.
4. Accidents will occur (falls of persons, being hit, etc.).
5. Injuries will then result.

The point of his model is that if one link in the chain can be removed (domino 3), then the chain will be broken. This supports the modern thrust in legislation and views such as those expressed by Kletz (1985) in his text, An Engineer's View of Human Error. Kletz notes that saying that accidents are the result of human failings may or may not be true, but it is certainly not helpful in risk control terms.


Rowe's Risk Estimation Model (see figure below) is directed at hazards with multiple pathways to damage situations (Rowe, 1977). This is particularly appropriate for some large chemical incidents and nuclear reactors, for example, where direct radiation, radioactive dust fallout and entrainment in the food chain can all provide cumulative doses to the exposed group. A risk agent is a person or group of persons who directly evaluate the consequences of a risk to which they are subject. The arrows indicate that in principle multiple pathways can lead from one element to the next. Each pathway can have a probability associated with its occurrence.

[Figure: Rowe's Risk Estimation Model. Elements linked by arrows, with their definitions:
Causative event: the beginning in time of an activity.
Outcome(s): the final result of an activity initiated by a causative event.
Exposure(s): the condition of being vulnerable in some degree to a particular outcome of an activity, if that outcome occurs.
Consequence types: the impact to a risk agent of exposure to a risky event.
Consequence values: the importance a risk agent subjectively attaches to the undesirability of a specific risk consequence.]

Rowe's Risk Estimation Model
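Because each pathway in Rowe's model can carry its own probability, the expected cumulative consequence to the exposed group can be estimated as a probability-weighted sum over pathways, provided one assumes the pathway contributions simply add. A minimal sketch: the pathway names echo the nuclear example in the text, but the `Pathway` type, the probabilities and the dose figures (in arbitrary units) are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pathway:
    """One route from the causative event to a dose received by the exposed group."""
    name: str
    probability: float  # hypothetical chance this pathway occurs, given the event
    dose: float         # hypothetical dose if it occurs, in arbitrary units


def expected_cumulative_dose(pathways: list[Pathway]) -> float:
    """Probability-weighted sum of pathway doses (ASSUMES contributions add)."""
    return sum(p.probability * p.dose for p in pathways)


PATHWAYS = [
    Pathway("direct radiation", 0.2, 10.0),
    Pathway("radioactive dust fallout", 0.5, 4.0),
    Pathway("entrainment in the food chain", 0.3, 6.0),
]

if __name__ == "__main__":
    for p in PATHWAYS:
        print(f"{p.name}: contributes {p.probability * p.dose:.1f}")
    print(f"expected cumulative dose: {expected_cumulative_dose(PATHWAYS):.1f}")
```

Listing the per-pathway contributions, rather than only the total, shows which route dominates and therefore where control effort is best directed.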

Ishikawa (1985) ‘fishbone’ diagrams, shown below, are another form of time sequence model often used by quality control advisers.

[Figure: Ishikawa 'Fishbone' Diagram. Cause factors on the main branches (Man, Machine, Material, Method, Measurement, Milieu) make up the process and lead to the effect (quality characteristics) at the right-hand end.]

Ishikawa also refers to them as ‘cause and effect’ diagrams (Ishikawa, 1985). The effect is found at the right-hand end. The words appearing at the tips of the main branches are causes, or so-called ‘cause factors’. The collection of these ‘cause factors’ is a process. The minor branches are inputs to the cause factors, or sub-causes. The object of the exercise is to improve the quality characteristics by identifying the most important cause factors and adjusting them appropriately.
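Structurally, a fishbone diagram is a two-level tree: the effect, the main-branch cause factors, and the sub-causes feeding each factor. A minimal sketch, assuming a team has scored each sub-cause for importance, shows how that structure supports picking the most important cause factor to adjust. The effect, sub-causes and scores are hypothetical, and summing scores is only one simple ranking rule.

```python
# Hypothetical effect, sub-causes and importance scores (e.g. from a team vote).
EFFECT = "incorrect accounts"
FISHBONE: dict[str, dict[str, int]] = {
    "Man": {"insufficient training": 3, "fatigue": 1},
    "Machine": {"software defaults": 2},
    "Material": {"illegible source documents": 4},
    "Method": {"manual re-keying": 5, "no checklist": 2},
    "Measurement": {"no error tracking": 1},
    "Milieu": {"frequent interruptions": 2},
}


def rank_cause_factors(fishbone: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Rank main-branch cause factors by the summed scores of their sub-causes."""
    totals = {factor: sum(subs.values()) for factor, subs in fishbone.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(f"Effect: {EFFECT}")
    for factor, score in rank_cause_factors(FISHBONE):
        print(f"{factor:<12}{score}")
```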


5.5 Energy Damage

From an engineering perspective, injury, damage and ill health are observed to result from the loss of control of damaging energies. This concept has a number of useful consequences. Firstly, establishing where energy can be released to affect people (conjunction in time and space) provides a simple basis for determining vulnerabilities in a complex system. Secondly, the nature of the energy release provides insight into control options. For example, kinetic energy is proportional to the square of the speed of a vehicle, so going twice as fast means that four times the energy would be released on impact. A list of damaging energies is shown below.

External Energies
* Potential energies: gravitational; structural strain; compressed fluids
* Kinetic energy: linear and rotational motion
* 'Flowing' mechanical energy: mechanical power in machinery
* Acoustic and other vibrating energy: noise and mechanical vibration
* Electrical energy: electrical potential energy (volts); electric and magnetic radiation; electrostatic charge
* Ionising radiation: nuclear particles and radiation
* Thermal radiation: solids, fluids, flames; ambient condition
* Chemical energy: fire, explosion; toxic effects; corrosive effects
* Micro-biological: infections, parasites, bacteria, viruses, etc.
* Muscular energy: purposeful (attacks) and inadvertent

Internal Energies
* Whole or part-body mass energy: gravitational, potential and/or kinetic energy, for example walking/running or swinging/moving the limbs
* Muscular energy: overload, overuse and postural energy levels

Damaging Energies
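The kinetic energy example in Section 5.5 follows directly from E = ½mv². A minimal sketch, using a hypothetical 1500 kg car, confirms that doubling the speed quadruples the energy available for release on impact.

```python
def kinetic_energy_joules(mass_kg: float, speed_kmh: float) -> float:
    """Kinetic energy E = 1/2 m v^2, converting speed from km/h to m/s."""
    speed_ms = speed_kmh / 3.6
    return 0.5 * mass_kg * speed_ms ** 2


if __name__ == "__main__":
    mass = 1500.0  # hypothetical car mass in kg
    for speed in (50, 100):
        print(f"{speed} km/h: {kinetic_energy_joules(mass, speed) / 1000:.0f} kJ")
    ratio = kinetic_energy_joules(mass, 100) / kinetic_energy_joules(mass, 50)
    print(f"doubling the speed multiplies the energy by {ratio:.1f}")
```

The mass cancels in the ratio, so the factor of four holds for any vehicle; this is why reducing the energy marshalled (reducing speed) is such an effective control.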

5.6 Energy Damage Models

Energy damage concepts define a hazard as the source of energy. So a brick on the floor is not a hazard in itself; rather, the hazard is the potential energy of the person who trips over it. This sometimes seems trivial, but in one expert witness case, for example, the authors considered the situation of an electrician who received an electric shock whilst on top of a ladder. He subsequently fell and hit his head on the concrete floor, resulting in serious injury. The hazard in this case was the gravitational potential energy released during the fall, which could have been controlled by wearing a hard hat. The electric shock represents only a possible reason for the fall and not the primary hazard source.

Energy damage models are particularly effective in establishing control options. Haddon (1973), for example, suggests 10 generic countermeasure strategies:
i) Prevent marshalling of energy (don't climb to a height)
ii) Reduce energy marshalled (reduce speed)
iii) Separate in time and space (install road traffic signals)
iv) Prevent the release of energy (fit guard rails)
v) Separate by a barrier (install guards)
vi) Modify release rate of energy (reduce slope)
vii) Strengthen structure (fire proof buildings)
viii) Modify surface impact (remove sharp edges)
ix) Detect, counteract damage (fire sprinklers)
x) Optimise repair (rehabilitation)


These 10 strategies provide a hierarchy of control and an opportunity to recognise additional essential factors as shown below:

[Figure: Strategy for Management of Energy Exchanges. Haddon's ten strategies, from preventing the marshalling of energy through to optimising repair, set against time zones: Predisposing Conditions, Situation Normal, Moving out of Control, Out of Control, Damage and Repair.]

The energy damage concept can be represented in different ways. The figure below, of the extended energy damage model (Viner, 1991), shows possible hazard control mechanisms in terms of recipient effects.

[Figure: Extended Energy Damage Model, showing the hazard, the hazard control mechanism, the space transfer mechanism, the recipient and the recipient's boundary.]

The types of risk control measures evident from this model are:
i) Control the existence or amount of energy.
ii) Maintain the reliability of the hazard control mechanism.
iii) Remove or reduce the need for the space transfer mechanism.
iv) Raise the damage threshold of the recipient.
v) Separate the hazard and the recipient.

This is perhaps best explained by considering someone exposed to a noisy machine. The machine could be replaced by a less noisy device; the noise could be reduced at its source by acoustic damping on the machine; the machine and the recipient could be separated by installing an acoustic hood over the machine; or the recipient's damage threshold could be artificially raised by providing hearing protection.


Energy damage concepts are particularly useful for constructing cause-consequence models that help determine the loss of control point. Whilst in many circumstances we may consider that an incident occurs only when there is a loss, in complex systems at least, the loss of control point is actually the incident. Unless such loss of control incidents are recorded and investigated, the system is heading for a fall.

[Figure: Concept Cause-Consequence Diagram. Threat → Precaution Failure → Loss of Control → Loss, with near miss incidents also shown.]
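The cause-consequence chain implies a simple frequency cascade: threats arrive at some rate, a fraction get past the precautions to a loss of control, and only some losses of control end in actual loss. A minimal sketch follows, assuming (as an interpretation of the diagram) that a near miss is a loss of control that does not end in loss; the threat rate and conditional probabilities are hypothetical.

```python
def cause_consequence_rates(threats_per_year: float,
                            p_precaution_fails: float,
                            p_loss_given_loss_of_control: float) -> dict[str, float]:
    """Expected annual counts at each stage of a threat -> loss chain."""
    loss_of_control = threats_per_year * p_precaution_fails
    losses = loss_of_control * p_loss_given_loss_of_control
    near_misses = loss_of_control - losses  # ASSUMED: loss of control without loss
    return {"loss of control": loss_of_control, "losses": losses, "near misses": near_misses}


if __name__ == "__main__":
    # Hypothetical: 200 threats a year, precautions fail 5% of the time,
    # and 10% of loss of control events end in an actual loss.
    for stage, rate in cause_consequence_rates(200, 0.05, 0.1).items():
        print(f"{stage}: {rate:.1f} per year")
```

With these figures there are about nine near misses for every loss, which is the practical reason for recording and investigating loss of control events rather than waiting for losses.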

It is always better to control a hazard before the loss of control point than to respond during or after the event. Lawyers are far more inclined to sign off on a management strategy that suggests the dangerous situation will be prevented than on one that relies on a rapid response. Consider the hazard of fire in a building. In this case, it would be best to eliminate the source of energy, that is, the vulnerability or hazard, by using non-combustible materials. The next best alternative would be to control the hazard, for example, by installing automatic sprinklers. The least desirable (although sometimes necessary) option is to rely on human response, which occurs after the outbreak of fire, that is, after the loss of control of the latent chemical energy stored in the structure.

5.7 Conditions and Failures

A latent failure (a failure which is not detected and/or annunciated when it occurs) will disable protective mechanisms or reduce safety margins, thereby increasing the risk associated with hazards due to subsequent conditions or failures. Latent failures, by themselves, do not constitute hazards (that is, by themselves they have no effect which would make them noticeable; otherwise, by definition, they would not be latent). Usually latent failures affect only functions which are not relied upon in normal operation, but which provide fail-safe coverage and/or protection against abnormal conditions. (SAE ARP 4761, Appendix D)

The notion of latent conditions has recently re-emerged in discussions of causation, largely as a result of James Reason's (1997) promotion of the concept. J L Mackie (1965) outlines a situation that can be used to explore it. Suppose that a fire has started in a house and is extinguished before it consumes the house completely. Fire investigators will investigate the cause and may conclude that it started in some wiring due to a short circuit. However, this is not a simple concept.
5.7.1 Necessary Conditions

A necessary condition is a positive condition that must be present for the incident to occur. In the example of a house fire, necessary conditions include combustible materials and an ignition source. By this definition a short circuit is not a necessary condition for a house fire, as hot oil fires on stoves and children playing with matches are other well-known domestic fire sources.

5.7.2 Sufficient Conditions

For an incident to occur there must also be sufficient conditions. For example, there have to be sufficient nearby combustibles in an appropriate configuration, with an adequate supply of air (oxygen), to cause a fire.


5.7.3 Negative Conditions

Negative conditions are the absence of conditions that would have prevented or limited the fire. For example, the absence of:
* a correctly sized fuse (which would have interrupted the short circuit before it could start a fire), or
* a metal pipe enclosing the cable to shield it from combustibles, or
* a nearby automatic sprinkler, which would have minimised the fire, or
* a micro-meteorite that would have crashed through the area just as the fire was about to start.

Obviously, negative conditions are problematic because they can include a vast array of unpredictable 'what if' possibilities.

5.7.4 Controllable Conditions

What the fire investigators may be attempting to do is to describe those conditions that they believe should have been considered controllable. This is in some ways problematic, since establishing all the relevant conditions can be very difficult, especially if it is deemed to include all aspects of human behaviour in the context of underlying cultural, social and economic circumstances. To establish what might be practicable, some form of probability test seems to be applied, and the legal tests of causation appear to be relevant. If a negative condition were removed, would that have:
* controlled the situation beyond reasonable doubt? or
* controlled the situation on the balance of probabilities?

5.7.5 Latent Conditions

The notion of latent conditions seems to rest on some form of failure that is not apparent when it occurs, similar to a hidden or concealed failure in FMECA (Failure Modes, Effects and Criticality Analysis). So, like a software error, a latent condition waits until a particular pattern of circumstances arises that enables a catastrophe. In this sense, latent conditions would be controllable, possibly negative, and necessary but not sufficient.
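Mackie's own analysis (1965) treats a cause such as the short circuit as an INUS condition: an insufficient but non-redundant part of an unnecessary but sufficient condition. A minimal sketch makes the necessary/sufficient distinction concrete by listing alternative sufficient condition sets for a house fire and testing conditions against them. The condition sets are simplified and hypothetical, built only from the examples in this section, and each set is assumed to be minimal.

```python
# Each frozenset is one simplified, hypothetical (minimal) sufficient set for a house fire.
SUFFICIENT_SETS: list[frozenset[str]] = [
    frozenset({"short circuit", "no correctly sized fuse", "combustibles", "air"}),
    frozenset({"hot oil on stove", "combustibles", "air"}),
    frozenset({"child with matches", "combustibles", "air"}),
]


def is_sufficient(present: frozenset[str], sets: list[frozenset[str]]) -> bool:
    """The conditions present are sufficient if they include some complete sufficient set."""
    return any(s <= present for s in sets)


def is_necessary(condition: str, sets: list[frozenset[str]]) -> bool:
    """Necessary: the condition appears in every sufficient set."""
    return all(condition in s for s in sets)


def is_inus(condition: str, sets: list[frozenset[str]]) -> bool:
    """Insufficient alone, Non-redundant within a sufficient set that is itself
    Unnecessary (other sufficient sets exist)."""
    for s in sets:
        if condition not in s:
            continue
        insufficient = not is_sufficient(frozenset({condition}), sets)
        non_redundant = not is_sufficient(s - {condition}, sets)
        unnecessary = any(other != s for other in sets)
        if insufficient and non_redundant and unnecessary:
            return True
    return False


if __name__ == "__main__":
    for c in ("short circuit", "combustibles"):
        print(c, "necessary:", is_necessary(c, SUFFICIENT_SETS), "INUS:", is_inus(c, SUFFICIENT_SETS))
```

The short circuit comes out as an INUS condition but not a necessary one, matching the argument in Section 5.7.1, while combustibles appear in every sufficient set and so are necessary.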


REFERENCES

Dawkins R (1998). Climbing Mount Improbable. Penguin Books. His earlier works, The Blind Watchmaker (1986, Penguin Books) and The Selfish Gene (1976, Oxford University Press), are also worth reading.

Haddon W (1973). Energy Damage and the Ten Countermeasure Strategies. Journal of Trauma, Volume 13, Number 4, pages 321-331.

Heinrich H W (1959). Industrial Accident Prevention, 4th Edition. McGraw Hill Books, New York.

Ishikawa K (1985). What is Total Quality Control? Translated by David J Lu. Prentice-Hall.

Kauffman S (1995). At Home in the Universe: The Search for Laws of Self Organisation and Complexity. Penguin Books Edition 1996. (Quote is from page 247.)

Kletz T A (1985). An Engineer's View of Human Error. IChemE, London.

Kuhn T S (1970). The Structure of Scientific Revolutions, 2nd Edition, enlarged, sixth impression. University of Chicago Press.

Mackie J L (1965). Causes and Conditions. American Philosophical Quarterly, 2.4 (October 1965), pp 245-64 and 261-4. Reprinted as Chapter I of Causation and Conditionals, edited by Ernest Sosa. Oxford Readings in Philosophy. Oxford University Press (1975), pp 15-38.

Maruyama M (1974). Paradigmatology and its Application to Cross-Disciplinary Cross-Professional and Cross-Cultural Communications. Cybernetica, No. 2, pp 136-156.

Popper K R (1972). Objective Knowledge: An Evolutionary Approach. Clarendon Press, Oxford. Revised Edition 1979. Paraphrase is from Chapter 2.

Reason J (1993). Managing the Management Risk: New Approaches to Organisation Safety. Chapter 1 of Reliability and Safety in Hazardous Work Systems: Approaches to Analysis and Design. Eds I Wilpert et al. Lawrence Erlbaum Associates Ltd, East Sussex. ISBN 0-86377-309-5.

Reason J (1997). Managing the Risks of Organisational Accidents. Ashgate Publishing Limited.

Rowe W D (1977). An Anatomy of Risk. Wiley Interscience, New York.

SAE ARP 4761 (1996). Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment. Society of Automotive Engineers, Aerospace Recommended Practice.

Viner D B L (1991). Accident Analysis and Risk Control. VRJ Information Systems, Melbourne. ISBN 0 646 02009 9.


Risk Criteria

Risk & Reliability Associates Pty Ltd 6.1

6. Risk Criteria

Risk criteria are used as a decision-making yardstick by governmental agencies, business and occasionally individuals to determine whether a risk is acceptable, tolerable or unacceptable.

6.1 Legal Criteria

A robust form of measurement can be devised around legal criteria, as discussed in Chapter 4. For example, the number of actions taken against an organisation by various environmental and occupational health and safety enforcement agencies, or perhaps the number of days directors might spend in jail, could be considered.
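In the concept hazard register that follows, each Risk entry is the product of Likelihood and Severity, and each column total is the sum of those products (the hazard total includes the elided row Hi). A minimal sketch checks that arithmetic; the data are transcribed from the register, and the register's own display rounding (0.225 shown as 0.23, 0.003 shown as 0) is left to the print step.

```python
# (likelihood, severity) pairs transcribed from the concept hazard register,
# including the elided hazard row Hi; the all-zero incident row Ij is omitted.
HAZARDS: list[tuple[float, float]] = [
    (0.1, 2), (0.2, 3), (0.05, 50), (0.3, 2), (0.65, 13), (0.025, 260),
    (0.001, 1500), (0.45, 0.5), (0.01, 6), (0.5, 60), (0.005, 100), (0.003, 1),
]
INCIDENTS: list[tuple[float, float]] = [
    (0, 2), (1, 3), (0, 50), (2, 2), (0, 13), (0, 260),
    (0, 1500), (0, 0.5), (0, 6), (1, 45), (0, 100),
]


def risks(register: list[tuple[float, float]]) -> list[float]:
    """Risk of each entry: likelihood multiplied by severity."""
    return [likelihood * severity for likelihood, severity in register]


def total_risk(register: list[tuple[float, float]]) -> float:
    """Column total, as in the register's summation row."""
    return sum(risks(register))


if __name__ == "__main__":
    print(f"Total hazard risk:   {total_risk(HAZARDS):.1f}")    # register shows 51.1
    print(f"Total incident risk: {total_risk(INCIDENTS):.1f}")  # register shows 52
```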

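| Hazard | L | S | Risk | Incident or occurrence | L | S | Risk | Claim | L | S | Risk | Judicial proceeding | L | S | Risk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| H1 | 0.1 | 2 | 0.2 | I1 | 0 | 2 | 0 | C1 | 0 | 2 | 0 | J1 | 0 | 2 | 0 |
| H2 | 0.2 | 3 | 0.6 | I2 | 1 | 3 | 3 | C2 | 1 | 3 | 3 | J2 | 0 | 3 | 0 |
| H3 | 0.05 | 50 | 2.5 | I3 | 0 | 50 | 0 | C3 | 0 | 50 | 0 | J3 | 0 | 50 | 0 |
| H4 | 0.3 | 2 | 0.6 | I4 | 2 | 2 | 4 | C4 | 1 | 2 | 2 | J4 | 1 | 2 | 2 |
| H5 | 0.65 | 13 | 8.45 | I5 | 0 | 13 | 0 | C5 | 0 | 13 | 0 | J5 | 0 | 13 | 0 |
| H6 | 0.025 | 260 | 6.5 | I6 | 0 | 260 | 0 | C6 | 0 | 260 | 0 | J6 | 0 | 260 | 0 |
| H7 | 0.001 | 1500 | 1.5 | I7 | 0 | 1500 | 0 | C7 | 0 | 1500 | 0 | J7 | 0 | 1500 | 0 |
| H8 | 0.45 | 0.5 | 0.23 | I8 | 0 | 0.5 | 0 | C8 | 0 | 0.5 | 0 | J8 | 0 | 0.5 | 0 |
| H9 | 0.01 | 6 | 0.06 | I9 | 0 | 6 | 0 | C9 | 0 | 6 | 0 | J9 | 0 | 6 | 0 |
| H10 | 0.5 | 60 | 30 | I10 | 1 | 45 | 45 | C10 | 1 | 45 | 45 | J10 | 1 | 45 | 45 |
| H11 | 0.005 | 100 | 0.5 | I11 | 0 | 100 | 0 | C11 | 0 | 100 | 0 | J11 | 0 | 100 | 0 |
| ... | | | | ... | | | | ... | | | | ... | | | |
| Hi | 0.003 | 1 | 0 | Ij | 0 | 0 | 0 | Cj | 0 | 0 | 0 | Jj | 0 | 0 | 0 |
| ∑Hi | | | 51.1 | ∑Ij | | | 52 | ∑Cj | | | 50 | ∑Jj | | | 47 |

L = likelihood, S = severity. The event horizon separates pre-event control (hazards) from post-event management (incidents and occurrences, claims and judicial proceedings).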

Concept Hazard Register

The above table suggests why such an approach could be considered. Over any period of time, most hazards will not result in incidents, and of the incidents that do occur only a few will give rise to claims. Most of the costs will manifest in those claims that make it to court, which ought to be a small subset of all hazards and incidents. However, managing risk in this way obviously has other dimensions. Unless one is clairvoyant, it is not possible to know which hazards will definitely lead to court cases and which will not. So only a company that was both naive and immoral would attempt to manage risk by identifying and managing only those hazards which it thought might lead to incidents that could end in prosecution or a common law claim.

6.2 Individual Risk Criteria

If a single severity of outcome is being considered, then very often probability criteria can be used as the basis to benchmark risk. Many countries maintain databases on the causes of death of their citizens. These can be analysed. A typical result is shown on the following page. These tables are basically a statement of what a particular community seems to have historically accepted as 'reasonable', that is, what we as a society are willing to live with. Nuclear authorities usually undertake such studies, as they are very interested in where nuclear risk is perceived to lie. The numbers for the NSW figures were prepared by ANSTO (Australian Nuclear Science and Technology Organisation).


From such lists, various authorities suggest acceptable frequencies of death for individuals in critically exposed groups. These numbers are in chances per million per year. That is, the chance, on average, of being struck and killed by lightning in NSW is one in ten million per year or, alternatively, for an individual, once in every ten million years.

| Risk | Chances of fatality per million person years |
|---|---|
| Voluntary risks (average to those who take the risk) | |
| Smoking (20 cigarettes/day): all effects | 5000 |
| Smoking (20 cigarettes/day): all cancers | 2000 |
| Smoking (20 cigarettes/day): lung cancers | 1000 |
| Drinking alcohol (average for all drinkers): all effects | 380 |
| Drinking alcohol (average for all drinkers): alcoholism and alcoholic cirrhosis | 115 |
| Swimming | 50 |
| Playing rugby football | 30 |
| Owning firearms | 30 |
| Transportation risks (average to travellers) | |
| Travelling by motor vehicle | 145 |
| Travelling by train | 30 |
| Travelling by aeroplane (accidents) | 10 |
| Risks averaged over the whole population | |
| Cancers from all causes: total | 1800 |
| Cancers from all causes: lung | 380 |
| Air pollution from burning coal to generate electricity | 0.07-300 |
| Being at home (accidents at home) | 110 |
| Accidental falls | 60 |
| Pedestrians being struck by motor vehicles | 35 |
| Homicide | 20 |
| Accidental poisoning: total | 18 |
| Accidental poisoning: venomous animals and plants | 0.1 |
| Fires and accidental burns | 10 |
| Electrocution (non-industrial) | 3 |
| Falling objects | 3 |
| Therapeutic use of drugs | 2 |
| Cataclysmic storms and storm floods | 0.2 |
| Lightning strikes | 0.1 |
| Meteorite strikes | 0.001 |

Risks to Individuals in New South Wales (from NSW Department of Planning, 1990)

Source: Edited from D J Higson, Risks to Individuals in NSW and Australia as a Whole, Australian Nuclear Science and Technology Organisation, July 1989


Such data can also be represented in a triangle-type diagram, sometimes referred to as "the dagger diagram". The two key levels seem to lie around the road death rate and the chance of being struck by lightning. In simple terms, it seems that if we believe something is more dangerous than driving a car (about one chance in 10,000 per year), then the risk is unacceptable, but that if it is about as likely as being struck by lightning (about one chance in 10 million per year), then it is probably so low that we don't expect anyone to do anything about it. In the range between these two figures, cost-benefit studies to reduce the risk to as low as reasonably practicable (ALARP) are appropriate.
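The two anchor levels in the paragraph above can be written as a simple three-way screen. This is a minimal sketch, assuming annual individual fatality risk as input and taking 10^-4 and 10^-7 per year as the boundaries; how a value lying exactly on a boundary is treated is an assumption, not something the text specifies.

```python
INTOLERABLE_ABOVE = 1e-4        # about the road death rate: one chance in 10,000 per year
NEGLIGIBLE_AT_OR_BELOW = 1e-7   # about the lightning death rate: one in 10 million per year


def screen_individual_risk(annual_fatality_risk: float) -> str:
    """Place an individual annual fatality risk in one of three broad bands."""
    if annual_fatality_risk < 0:
        raise ValueError("risk cannot be negative")
    if annual_fatality_risk > INTOLERABLE_ABOVE:
        return "unacceptable"
    if annual_fatality_risk <= NEGLIGIBLE_AT_OR_BELOW:
        return "negligible"
    # Between the two levels: cost-benefit studies to reduce the risk ALARP
    return "ALARP"


print(screen_individual_risk(1.45e-4))  # unacceptable
print(screen_individual_risk(1e-6))     # ALARP
print(screen_individual_risk(1e-7))     # negligible
```

Regulatory criteria, such as the land use criteria discussed below, set different limits for different exposed groups, so a screen like this is only a first pass.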

[Figure: triangular "dagger" diagram of risk levels for individuals in a critically exposed group. Levels of risk acceptability run from intolerable (risk cannot be justified except in extraordinary circumstances), through undesirable (tolerable only if reduction is impractical or if cost is grossly disproportionate to the improvement gained) and tolerable (if the cost of reduction would exceed the improvement gained), to broadly acceptable and negligible or trivial risk, labelled as risk categories I to V. Typical quantification values are 10^-4, 10^-5, 10^-6 and 10^-7 per year, with the car accident death rate near the top and the lightning strike death rate near the bottom, and markers for the WA EPA limit, the NSW DoP objective and the Vic VWA objective.]

Risk Levels for Individuals in a Critically Exposed Group (the diagram, without quantification, appears in IEC 61508 as Figure B1)

Many organisations now emphasise tolerance, rather than acceptance, as the risk criterion. To tolerate a risk means that it is not regarded as negligible, that is, as something that can be ignored. Rather, it must be kept under review and reduced still further towards the negligible level if and when this becomes practicable. The key element is the process by which it is demonstrated that all practicable measures have been taken to reduce risk to a minimum. The Victorian WorkCover Authority, the NSW Department of Planning and the Western Australian Environmental Protection Authority (EPA) have defined individual risk levels, and other Australian states tend to use one or other of these criteria when assessing individual and/or societal risk. A summary of the criteria used in Australia and New Zealand is given in Chapter 13, Process Industry.


For example, the NSW Department of Planning has published an advisory paper, "Risk Criteria for Land Use Safety Planning" (June 1992), that outlines the criteria by which the acceptability of risks associated with potentially hazardous developments will be assessed. The table below summarises the individual fatality risk criteria for new installations.

Risk level (per annum)   Land use
0.5 x 10^-6              Hospitals, schools, child care facilities, old age housing
1.0 x 10^-6              Residential, hotels, motels, tourist resorts
5 x 10^-6                Commercial developments including retail centres, offices and entertainment centres
10 x 10^-6               Sporting complexes and active open spaces
50 x 10^-6               Industrial

Individual Fatality Risk-New Installations
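The table can be read as a lookup: a location is suitable for a land use if the calculated individual fatality risk there does not exceed that use's criterion. A minimal sketch of that reading; the short category keys are hypothetical labels, and treating "equal to the criterion" as compliant is an assumption.

```python
# Individual fatality risk criteria for new installations (per annum), from the table above.
# The short keys are hypothetical labels for the land-use categories.
NSW_NEW_INSTALLATION_LIMITS = {
    "sensitive": 0.5e-6,    # hospitals, schools, child care facilities, old age housing
    "residential": 1.0e-6,  # residential, hotels, motels, tourist resorts
    "commercial": 5e-6,     # retail centres, offices, entertainment centres
    "sporting": 10e-6,      # sporting complexes and active open spaces
    "industrial": 50e-6,
}


def compatible_land_uses(individual_risk_pa: float) -> list:
    """Return the land uses whose criterion is not exceeded at this risk level."""
    return [use for use, limit in NSW_NEW_INSTALLATION_LIMITS.items()
            if individual_risk_pa <= limit]


# A hypothetical point on a risk contour where the calculated risk is 3 x 10^-6 pa
print(compatible_land_uses(3e-6))
```

At 3 x 10^-6 per annum the sketch returns only the commercial, sporting and industrial categories, so residential or sensitive uses would need to lie further from the installation.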

6.3 Societal Risk Criteria

As the severity of the event increases, we appear to become more risk averse. In particular, once the death threshold is passed, the community appears to have a much greater aversion to multiple-fatality incidents. Authors such as Wiggins (1984) in the USA have noted that the dollars Congress spends per life saved for a coal mine disaster or aircraft collision are much higher than the dollars spent to save a life on the road. In many countries this seems to amount to a one hundred-fold decrease in the tolerable likelihood of the event for a ten-fold increase in the severity of the consequence measured in fatalities, as shown in the Netherlands criteria below. Societal risk analysis combines the consequence and likelihood information with population information. The result is presented as an F-N plot, which shows the cumulative frequency (F) of events killing N or more people.

[Figure: F-N plot of the frequency of N or more fatalities per year (10^-8 to 10^-3) against the number of fatalities N (1 to 1000), showing the Netherlands unacceptable limit and acceptable limit, with the ALARP (as low as reasonably practicable) region between them.]

Societal Risk Criteria as reported by the NSW Department of Planning (1990)
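An F-N curve is built by summing, for each fatality count N, the frequencies of all scenarios that kill N or more people. The sketch below does that and then compares each point with a straight criterion line on the log-log plot, F = C x N^-2, whose slope of -2 corresponds to the hundred-fold reduction in frequency per ten-fold increase in fatalities described above. The scenario list and the anchor value C are hypothetical; the actual Netherlands limits are not reproduced here.

```python
def fn_curve(scenarios):
    """scenarios: iterable of (frequency_per_year, fatalities).

    Returns sorted (N, F) points, where F is the total frequency of
    scenarios causing N or more fatalities.
    """
    scenarios = list(scenarios)
    counts = sorted({n for _, n in scenarios if n >= 1})
    return [(n, sum(f for f, k in scenarios if k >= n)) for n in counts]


def criterion_frequency(n, anchor_at_n1, slope=-2.0):
    """Straight line on a log-log F-N plot: F = anchor * N**slope."""
    return anchor_at_n1 * n ** slope


# Hypothetical scenarios for one facility: (frequency per year, fatalities)
scenarios = [(1e-4, 1), (2e-5, 10), (1e-6, 100)]
ANCHOR = 1e-3  # hypothetical frequency of the criterion line at N = 1

for n, f in fn_curve(scenarios):
    limit = criterion_frequency(n, ANCHOR)
    status = "above line" if f > limit else "on or below line"
    print(f"N >= {n:>3}: F = {f:.2e} per year (line {limit:.1e}): {status}")
```

With these made-up numbers the single-fatality point sits well below the line while the 10- and 100-fatality points lie above it, which illustrates how the steeper societal criterion focuses attention on multiple-fatality scenarios.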

There also appear to be occasions, with very severe events, where the consequence is deemed so high that it is simply politically unacceptable. Societal risk criteria have been proposed by a number of authorities, including the Victorian WorkCover Authority and the NSW Department of Planning. Again, these are described in more detail in Chapter 13.


For example, societal risk criteria for public safety relating to hazardous industries have not been formally established and publicised in Victoria. There is currently a set of draft criteria issued by the Victorian WorkCover Authority (VWA), which is used by government authorities involved in land use planning. These criteria were used in Technica Ltd, “Risk Sensitivity Analysis for the Altona Petrochemical Complex and Environs”, October 1997. They establish criteria for societal risk in the form of a log-log F-N plot, with two parallel lines defining three zones:

a) above the acceptable limit, the societal risk level is not tolerable
b) between the acceptable and negligible limits, the societal risk level is acceptable, but if the perceived benefits gained by the activity are not high enough, some risk-reducing measures may be required. Risk should be "as low as reasonably practicable" (ALARP).
c) below the negligible limit, the societal risk level is acceptable, regardless of the perceived value of the activity.

[Figure: log-log F-N plot of the frequency of N or more fatalities per year (10^-7 to 10^-2) against the number of fatalities N (1 to 1000), with two parallel lines dividing the plot into three zones: risk unacceptable; risk acceptable but remedial measures desirable; risk negligible.]

Victorian Societal Risk Criteria
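Because the two Victorian limits are parallel lines on a log-log F-N plot, each can be written as F = C x N^k with the same slope k but a different constant C. The sketch below classifies F-N points into zones (a) to (c) above and reports the worst zone reached by a whole curve. The slope and both constants are parameters, and the example values are hypothetical, not the VWA draft figures.

```python
ZONE_ORDER = ["negligible", "ALARP", "not tolerable"]  # least to most severe


def societal_zone(n, f, acceptable_c, negligible_c, slope):
    """Classify one (N, F) point against two parallel log-log lines F = C * N**slope."""
    acceptable_limit = acceptable_c * n ** slope
    negligible_limit = negligible_c * n ** slope
    if f > acceptable_limit:
        return "not tolerable"  # zone (a)
    if f < negligible_limit:
        return "negligible"     # zone (c)
    return "ALARP"              # zone (b): reduce risk as low as reasonably practicable


def worst_zone(curve, **criteria):
    """The most severe zone reached by any point on an F-N curve."""
    zones = [societal_zone(n, f, **criteria) for n, f in curve]
    return max(zones, key=ZONE_ORDER.index)


# Hypothetical F-N curve and hypothetical criterion constants
curve = [(1, 1.2e-4), (10, 2.0e-5), (100, 5.0e-7)]
criteria = dict(acceptable_c=1e-2, negligible_c=1e-4, slope=-1.0)
print([societal_zone(n, f, **criteria) for n, f in curve])
print(worst_zone(curve, **criteria))
```

Judging a facility by the worst zone its curve enters is one reasonable convention; an assessor could equally report each zone crossing separately.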

6.4 Environmental Risk Criteria

Unlike OH&S risk assessment, in which all evaluations have a common denominator, namely “human exposure”, environmental risk assessment has a much broader and more complex scope, with a substantial increase in the number of uncertainties.

6.4.1 Wright's Criteria

Wright (1993) describes several factors which need to be recognised:
* ecosystems are complex, open and dynamic
* the time-scale to cause measurable impact, or to recover from impacts, may be longer than a human life
* persistent materials which are bio-available and have the potential to bio-accumulate should be avoided, since their discharge will cause irreversible net change
* the relative scale of the environmental impact must be considered in all environmental dimensions (spatial, temporal etc)
* the ecosystem has inherent or built-in variability and recoverability
* cause and effect relationships are often difficult to measure
* interdependency exists between different eco-sub-systems
* acceptability of risks to environmental resources is dependent on human values.


There is also the problem of synergistic effects: for example, two chemicals which are individually inert in the environment may interact to cause damage. Wright also suggests that it is possible to calculate the likelihood and size of accidental or intermittent releases and then make a judgement on what the consequences of such releases would be. The table of consequences is shown below:

Consequence type   Description

Catastrophic       Irreversible alteration to one or more eco-systems or several component levels. Effects can be transmitted, can accumulate. Loss of sustainability of most resources. Life cycle of species impaired. No recovery. Area affected 100 km².

Very Serious       Alteration to one or more eco-systems or component levels, but not irreversible. Effects can be transmitted, can accumulate. Loss of sustainability of selected resources. Recovery in 50 years. Area affected 50 km².

Serious            Alteration/disturbance of a component of an eco-system. Effects not transmitted, not accumulating or impairment. Loss of resources but sustainability unaffected. Recovery in 10 years.

Moderate           Temporary alteration or disturbance beyond natural variability. Effects confined to < 5000 m², not accumulating. Resources temporarily affected. Recovery < 5 years.

Not detectable     Alteration or disturbance within natural variability. Effects not transmitted, not accumulating. Resources not impaired.

Environmental Consequences

In the context of a risk diagram:

[Figure: risk diagram for accidental and intermittent releases, plotting likelihood (frequency per year, 1 to 10^-6) against consequence (not detectable, moderate, serious, very serious, catastrophic), showing an intolerable risk level, a design/operation risk level, the "As Low As Reasonably Practicable" (ALARP) region and a negligible risk level.]

Risk Levels for Accidental Releases to the Environment


6.4.2 Inter-governmental Agreement on the Environment (Feb 1992)

The 'Precautionary Principle' has been adopted by the Inter-governmental Agreement on the Environment (IGAE, 1992) between the Commonwealth and the States as a cornerstone of Australian environmental policy. The principle expressed in the IGAE is:

Where there are threats of serious or irreversible environmental damage, lack of full scientific certainty should not be used as a reason for postponing measures to prevent environmental degradation. In the application of the precautionary principle, public and private decisions should be guided by:
(i) careful evaluation to avoid, wherever practicable, serious or irreversible damage to the environment; and
(ii) an assessment of the risk-weighted consequences of various options.

This principle apparently had its origins in Germany's democratic socialist movement in the 1930s and gained acceptance through the 1970s and early 1980s as a powerful corporate governance tool, significantly reducing the instances of imprudent business practices and adding strength to the world's rapidly developing securities markets.

The significance of an inter-governmental agreement relates to the Australian Constitution and the fact that the original six Australian states existed before federation. Unless the Constitution specifically provides for powers to be exercised by the federal government, the residual powers remain with the states. So, to obtain a consistent national outcome on matters that lie outside the Commonwealth's constitutional powers, an inter-governmental agreement must be obtained.

6.5 Insurance Criteria

Depending on the nature of the event, the insurance approach can provide certain benchmarks or criteria.

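[Figure: risk diagram of relative likelihood of consequence against relative severity of consequence, locating maintenance, OH&S, fire and explosion, and catastrophic events against the insurance regimes: uninsured, workers compensation, public liability, property insurance and re-insurance.]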

Risk Diagram Showing Some Insurance Regimes


There are presently about thirteen different definitions of property loss expectancy used throughout the world, each with subtle differences and variations. The reason for this plethora appears primarily to derive from the history of the organisations using them. Once a company has established an underwriting tradition, it is difficult to change the definitions without seriously complicating the attitudes of individual underwriters, and those of the re-insurers towards the underwriters. Perhaps not unnaturally, there appears to be an observable trend that the more conservative the underwriter, the more severe the loss estimate criteria will be. This is particularly noticeable with re-insurers' definitions. In the case of workers’ compensation insurance, almost all Australian jurisdictions have different criteria for claim thresholds.

6.6 Ethical Criteria

The codes of ethics of most professional societies contain certain performance criteria which are supposed to apply to their members. In 1993 the UK Engineering Council adopted the following code, which Engineers Australia (1993) picked up in a more diluted form. The small print on the back of the UK brochure stated:

The Engineering Council expects registrants to adhere to good engineering practice wherever and whenever possible and considers that this code of professional practice will assist registrants in achieving this standard. Registrants should be aware that non-compliance with the provisions of this code might be relevant when considering professional disciplinary matters although adherence to this code will be regarded as demonstrating good practice, which could provide the best protection against such action. While a failure to adhere to the provision of this code by an individual registrant may not necessarily amount to negligence or a breach of an applied contractual term by that registrant, such failure may be evidence of an infringement of the Council’s rules of conduct, which could lead to disciplinary proceedings.

The ten-point code of professional practice on risk issues is:
i. professional responsibility: exercise reasonable professional skill and care
ii. law: know about and comply with the law
iii. conduct: act in accordance with codes of conduct
iv. approach: take a systematic approach to risk issues
v. judgement: use professional judgement and experience
vi. communication: communicate within your organisation
vii. management: contribute effectively to corporate risk management
viii. evaluation: assess the risk implications of alternatives
ix. professional development: keep up to date by seeking education and training
x. public awareness: encourage public understanding of risk issues.

The point to note is that the law is second on the list, and that the small-print statement is quite clear that if a registrant fails to adhere to the code, then he or she is on their own.


REFERENCES

Commonwealth of Australia (1992). Intergovernmental Agreement on the Environment.
Engineers Australia (1993). Dealing with Risk. Engineers Australia, Canberra.
The Engineering Council of the United Kingdom (1993). Code of Good Practice for Dealing with Risk.
Higson D J (1989). Risks to Individuals in New South Wales and in Australia as a Whole. Nuclear Safety Bureau, Australian Nuclear Science and Technology Organisation. NSB Report 2/1989.
IEC 61508-5 (1998). International Standard on Functional Safety: Functional safety of electrical/electronic/programmable electronic safety-related systems, Part 5: Examples of methods for the determination of safety integrity levels. July 1998.
NSW Department of Planning (1990 and 1992). Risk Criteria for Land Use Safety Planning. Hazardous Industry Planning Advisory Paper No. 4.
Technica Ltd (1997). Risk Sensitivity Analysis for the Altona Petrochemical Complex and Environs.
Western Australian EPA (1988). Guidance for the Assessment of Environmental Factors, Risk Assessment and Management: Offsite Individual Risk from Hazardous Industrial Plants, No. 2 (Interim, July 1988).
Wiggins J H (1984). Risk Analysis in Public Policy. Proceedings of Victoria Division, Engineers Australia, Risk Engineering Symposium 1984: Engineering to Avoid Business Interruption.
Wright N H (1993). Development of Environmental Risk Assessment (ERA) in Norway. Norske Shell Exploration and Production.

READING

Engineers Australia (1990). Are You at Risk? Engineers Australia, Canberra.
Fernandes-Russell D (1988). Societal Risk Estimates from Historical Data for UK and Worldwide Events. Research Report No. 3. Environmental Risk Assessment Unit, School of Environmental Sciences, University of East Anglia, Norwich, UK.
Health and Safety Commission, UK (1991). Major Hazard Aspects of the Transport of Dangerous Substances. Report and Appendices of the Advisory Committee on Dangerous Substances. London, HMSO.
Health and Safety Executive, UK (1988). The Tolerability of Risk from Nuclear Power Stations. London, HMSO.
Higson D J (1990). Nuclear Safety Assessment Criteria. Nuclear Safety, Volume 31, No. 2, April/June 1990, pp 173-186.
Lees F P (1995). Loss Prevention in the Process Industries. 2nd Edition. Butterworth-Heinemann Ltd, Oxford, UK (3 Volumes).
Muspratt M A & Robinson R M (1991). Ethics and their Environment. Proceedings of the Annual Conference, Hobart. Engineers Australia.
NSW Department of Planning, Sydney (1989). Environmental Risk Impact Assessment Guidelines. Hazardous Industry Planning Advisory Paper No. 3.
NSW Government (1993). Total Asset Management Manual - Risk Management. Public Works Department, November 1993.
Warren Centre for Advanced Engineering (1986). Major Industrial Hazards. The University of Sydney.


7.0 Top Down Techniques

This chapter focuses on the top-down view of downside risk, or vulnerabilities. Further discussion of upside risk, or value addeds, is contained in Section 3.3, Risk and Opportunity. Ranking combinations of upside and downside risk is covered in Section 8.4, Integrated Investment Ranking.

Two high-level or top-down techniques appear common: vulnerability techniques, derived from the military intelligence community, and SWOT (Strengths, Weaknesses, Opportunities and Threats), from the commercial sector. Conceptually the two overlap, as shown in the augmented diagram below.

7.1 SWOT Assessments

Interpreted from a risk perspective, SWOT analysis provides insight into Liabilities, as established by Vulnerabilities (the risk of loss), and Rewards, as identified by Value Adding (the risk of gain).

[Figure: augmented SWOT diagram linking external/internal factors, strategy and organisation, with Opportunities, Strengths and Value Addeds on one side and Threats, Weaknesses and Vulnerabilities on the other.]

Augmented SWOT Process

7.2 Upside and Downside Risk

It should be noted that many risk decisions have simultaneous upside and downside risk elements.

7.2.1 Business Risk

Market risk is an obvious form of business risk, with both upside (speculative) and downside (pure) risk implications.

7.2.2 Clinical and Military Risk Decisions

Different clinical procedures can also entail a mix of risk outcomes. Take the crude example of a traumatic leg injury. Amputation will almost certainly save the life of the patient, but at the price of reduced mobility. Saving the leg is possible, but with an increased risk of gangrene. Which procedure should be adopted? If only a downside risk assessment were considered, the leg would almost certainly be amputated.

Military decisions also have this two-sided element. The best immediate course of action (COA) might be very chancy but could reduce the conflict to days rather than years. Is it better to take the chance, or to play it ‘safe’ and prolong the conflict?

7.2.3 Project Risk Decisions

Project risk provides another interesting insight. In this case the upside risk is assumed in the proposal, and the risk analysis generally focuses on those issues which will prevent the assumed upside benefits from being achieved. That is, it is a downside risk assessment process from an assumed upside risk position. This is discussed further in Section 7.5, Project Risk Profiling.


7.3 Vulnerability Assessments

The diagram below outlines a generic vulnerability assessment technique that is used very widely to assess, and propose appropriate solutions to, risks that affect most organisations. The technique is used by military intelligence, strategic planners, public affairs risk analysts and project managers, as well as risk engineers. The central concept is to define the assets of the business and all the possible threats to them; the organisation’s Critical Success Factors can also be considered to be its assets. The threats are then systematically matched against the assets to see which assets are vulnerable to each threat. Only the assessed vulnerabilities then have control efforts directed at them. This prevents the misapplication of resources to something that was really only a threat and not a vulnerability.
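The matching step can be pictured as walking every cell of an asset/threat matrix and keeping only the cells the analysts judge to be genuine exposures. A minimal sketch; the asset and threat names are adapted from the diagram below, while the exposure judgements are hypothetical placeholders for what a real assessment would supply.

```python
from itertools import product

assets = ["public image", "personnel", "physical facilities", "customer loyalty"]
threats = ["fire", "earthquake", "major supplier failure", "sabotage"]

# Hypothetical analyst judgements: (asset, threat) pairs where the asset is exposed
exposed_pairs = {
    ("physical facilities", "fire"),
    ("personnel", "fire"),
    ("customer loyalty", "major supplier failure"),
    ("public image", "sabotage"),
}


def vulnerabilities(assets, threats, exposed):
    """Match every threat against every asset; keep only genuine exposures."""
    return [(a, t) for a, t in product(assets, threats) if (a, t) in exposed]


for asset, threat in vulnerabilities(assets, threats, exposed_pairs):
    print(f"{asset} vulnerable to {threat}")
```

A threat that appears in no vulnerability (here, earthquake) attracts no control effort, which is the resource-saving point of the technique.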

Assets (Critical Success Factors)
  Public image and confidence
  Capability to perform an organisation’s function
  Physical resources and facilities
  Personnel resources
  Customer loyalty

Threats
  Smoke, fire, explosion
  Natural hazards (rain, snow, wind, earthquake etc.)
  Critical plant failure
  Failure of a major supplier
  Sabotage, acts of aggression

Vulnerabilities (Assets exposed to Threats)
  Physical (e.g. buildings vulnerable to fire, money to theft, equipment to sabotage, product to contamination)
  Personal (e.g. personnel to injury/vehicle accident, chemical exposure, discrimination, terrorism)
  Public Relations (e.g. corporate image to pollution, product fault, fraud and corruption)
  Financial (e.g. assets to currency, market or interest rate changes)

Management Strategies
  Risk Control (Design, Administration, Training)
  Risk Avoidance
  Risk Transfer
  Risk Acceptance

Generalised Vulnerability Assessment Technique

It is important that the identified threats are credible. For example, one would not list “earthquake” as a credible threat for a site that is not in an earthquake region. Nor would terrorism normally be a credible threat to the building of a new jam production facility in a rural location. The power of the vulnerability technique lies in its potential to provide a completeness check. For example, if all the critical success factors for an enterprise are declared, and all the primary credible threats identified, then no unexpected vulnerabilities should impact the organisation. However, if one credible vulnerability is overlooked, then an unexpected event can occur ‘out of the blue’.
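The completeness check can be made mechanical by insisting that every asset/threat cell carries an explicit judgement, even if that judgement is "not credible". A minimal sketch under that assumption; the judgement labels and example entries are hypothetical.

```python
def unassessed_cells(assets, threats, judgements):
    """Return asset/threat pairs with no recorded judgement.

    judgements maps (asset, threat) -> "vulnerable", "not vulnerable" or "not credible".
    Any pair missing from the map is a gap through which a surprise could arrive.
    """
    return [(a, t) for a in assets for t in threats if (a, t) not in judgements]


assets = ["physical facilities", "personnel"]
threats = ["fire", "earthquake"]
judgements = {
    ("physical facilities", "fire"): "vulnerable",
    ("personnel", "fire"): "vulnerable",
    ("physical facilities", "earthquake"): "not credible",  # site not in an earthquake region
}
print(unassessed_cells(assets, threats, judgements))
```

Recording "not credible" explicitly, rather than leaving the cell blank, distinguishes a threat that was considered and dismissed from one that was simply overlooked.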


The vulnerability process can also be shown as a simple flow chart.

Vulnerability Assessment Process

The power of the process rests on the fact that whilst there may be a large number of identified assets to be protected against a large number of threats, the actual number of critical vulnerabilities is usually quite small, typically around 10% of the intersections of a typical asset/threat matrix. Critical vulnerabilities are explained further in Section 7.3.5. The weakness of the technique is that it often identifies areas of strategic concern rather than particular risk issues and precautions.
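The observation that critical vulnerabilities typically occupy only around 10% of the matrix intersections is easy to check on a completed matrix. The sketch below assumes, hypothetically, that each cell has been given a criticality score from 1 (low) to 5 (high) and that scores of 4 or more count as critical; both the scale and the threshold are illustrative, not part of the method described here.

```python
def critical_share(scores, threshold=4):
    """Fraction of scored asset/threat intersections at or above the critical threshold."""
    if not scores:
        raise ValueError("no scored intersections")
    critical = [cell for cell, score in scores.items() if score >= threshold]
    return len(critical) / len(scores)


# Hypothetical 5 asset x 4 threat matrix: every cell scores 1 except two critical ones
scores = {(a, t): 1 for a in range(5) for t in range(4)}
scores[(0, 1)] = 5
scores[(3, 2)] = 4
print(f"{critical_share(scores):.0%} of intersections are critical")
```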


The figure below shows the vulnerability technique as a flow chart for computer risk assessment.

[Figure: flow chart for computer risk assessment. Starting from the objectives of the organisation, it asks: Is the computing facility essential to the maintenance of the objectives? What parts are essential (payroll, accounting . . .)? Can these essential services be done elsewhere? Is insurance enough to cover the cost of outside operations, replacement of equipment and non-essential services? Vital points (processors, power supplies, air conditioning . . .) and threats (fire, water damage, power failure, sabotage . . .) are identified, followed by: Vulnerable (do threats expose vital points)? Is protection adequate and appropriate? Is protection commensurate with insured levels? More insurance required? Other boxes read "Adequate protection against disaster possibilities essential", "Define disaster period?", "Increased insurance", "Cost effective recommendations? Business interruption insurance largely ineffective. Should such premiums provide funds for physical protection?" and "Implementation", leading to End.]

Flow Chart of the Asset and Threat Technique Applied to Computer Risk Assessment

The vulnerability approach is used in the Information Security Standard AS/NZS 4444.2:2000. Step 3, entitled 'Undertake a Risk Assessment', comprises the steps of:

• Threats
• Vulnerabilities
• Impacts


7.3.1 Assets

Lists are the most common way of establishing assets. For example, the Australian Risk Management Standard (AS 4360:1999) lists possible areas of impact as:
a) Asset and resource base of the organisation, including personnel
b) Revenue and entitlements
c) Costs of activities, both direct and indirect
d) People
e) Community
f) Performance
g) Timing and schedule of activities
h) The environment
i) Intangibles, such as reputation, goodwill, quality of life
j) Organisational behaviour

Dependency trees can also be used for such an assessment. The example below sets out the key assets from the viewpoint of an airline that perceives its business to be that of moving paying passengers by air.

[Figure: dependency tree with the top-level asset Flying Paying Passengers and the sub-assets Serviceable Airports, Passenger Terminals, Trains, Taxis, Carparks, Passengers, Trained Aircrew, Serviceable Aircraft, Reservation Systems, Computers & Software and Trained Operators.]

Dependency Tree Diagram of an Airline

Each of these sub-assets could then be examined for its vulnerability to each of the listed threats. All these approaches assume that the analyst has a clear view of what the business of the organisation actually is, something that is not always easily achieved. It is very difficult to undertake a risk analysis if the organisation concerned cannot clearly state its business at the outset. There are various ways that a vulnerability assessment can be made, including desktop studies, workshops, hiring specialist consultants, or combinations of these.
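A dependency tree can be held as a simple parent-to-children mapping, and its leaves are the sub-assets to be tested against the threat list. The nesting below is one plausible reading of the airline example (for instance, that terminals and ground transport sit under serviceable airports); it is an interpretation, not a reproduction of the figure, and the threat list is hypothetical.

```python
# Parent asset -> the sub-assets it depends on (an interpretation of the airline example)
AIRLINE_DEPENDENCIES = {
    "Flying Paying Passengers": [
        "Serviceable Airports", "Passengers", "Trained Aircrew",
        "Serviceable Aircraft", "Reservation Systems",
    ],
    "Serviceable Airports": ["Passenger Terminals", "Trains, Taxis, Carparks"],
    "Reservation Systems": ["Computers & Software", "Trained Operators"],
}


def leaf_assets(tree, node):
    """Depth-first walk returning the lowest-level assets beneath a node."""
    children = tree.get(node, [])
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(leaf_assets(tree, child))
    return leaves


leaves = leaf_assets(AIRLINE_DEPENDENCIES, "Flying Paying Passengers")
threats = ["fire", "industrial action", "computer failure"]  # hypothetical
print(leaves)
print(f"{len(leaves)} sub-assets x {len(threats)} threats = {len(leaves) * len(threats)} cells")
```

Each leaf then becomes a row of the asset/threat matrix used in the vulnerability assessment.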


7.3.2 Threats

The second task, after identification and assessment of assets, is identification and assessment of threats to these assets. Threat, as used in this section, refers to any occurrence or activity that could destroy a business asset or reduce its value or business effectiveness. (Where some disciplines use the term threat in this way, others would prefer terms like hazard or risk.) The type and degree of protection required for different assets will depend on the nature and likelihood of the threat and how vulnerable each asset is to those threats. The security appropriate to bomb threats, for example, is obviously different from that required for product extortion. The issue to be considered is: what particular credible threats exist, or could arise, to the identified assets, and which of these threats are significant? A sample threat checklist is shown below.

Threats to Treasury & Finance: credit squeezes; liquidity issues; customer payment defaults; exchange fluctuations; funding sources failure; interest rate fluctuations
Threats to Assets: fire; earthquake; flood; explosion; critical plant failure; malicious damage
Threats of Business Interruption: industrial action; political/civil upheaval; picketing/demonstrations/boycott; bomb threat; bomb "hoax"; malicious damage/sabotage
Threats to Information: industrial espionage; takeover; sabotage of data
Threats to Company Reputation: scandal (eg, frauds, business or political); product fault or contamination; environmental pollution
Threats to Company's Competitive Edge: professional incompetence; failure to best practice; failure to continuously improve; poor public image
Threats to Product: product extortion; collusive theft; pilferage; contamination
Threats to Staff: discrimination; OH&S injury; harassment
Threats from Staff: pilferage; theft; fraud; malicious damage
Threats to Cash: robbery; burglary
Military Threats: sniper fire; small arms fire; machine gun fire; RPG or mortar attack; artillery attack; missile attack; thermonuclear

A Sample Threat Checklist


7.3.3 Vulnerabilities

A vulnerability is a weakness with respect to a threat. This weakness may be intrinsic in the asset. For example, a US multinational company is probably more vulnerable to politically motivated attacks than a Swiss company. Product is more vulnerable to theft and fraud if the stock control and accounting systems are dominated by the requirements of the sales department, to the detriment of accurate and timely accounting.

Or the weakness may be due to the location of the asset. For example, an Australian company in the Middle East is more vulnerable to terrorism than one in Iceland. Confidential information on a meeting room blackboard in an office with some public access is more vulnerable than when it is in a locked cabinet in a manager's private office or a secure registry.

Or the weakness may be due to inadequate or inappropriate risk management. For example, a company with no contingency planning for crisis management, public relations fallout or disaster recovery is more vulnerable to adverse business impact if certain threats materialise.

NB: Vulnerability is also used to refer to the extent of exposure of a business or asset to a risk.

7.3.4 Business Impact

Business impact is a form of risk characterisation that is particularly persuasive in assessing commercial risk. It is the overall cost to the company if threats succeed. Proper assessment of potential business impact is essential in determining the cost-benefit of proposed counter-measures. Commercial vulnerabilities are often characterised by an inability to purchase insurance against them. The quantification of commercial vulnerabilities is necessarily less ‘scientific’, as human nature appears to be of much greater significance.

Many organisations create a Group Risk Profile. This is discussed further in Section 3.3, Risk and Opportunity. It provides consideration of the major recognised balance sheet, off-balance-sheet, strategic performance and operational performance issues, together with procedures for their day-to-day management.

The key issues are to establish the nature of the perceived vulnerability, quantified in terms of possible dollar impact and return period. How much would the counter-measure cost to implement and maintain? How much risk reduction would this achieve? How does this compare with the maximum foreseeable loss that could result if the measure were not introduced and threats succeeded?

Such an approach can, for example, direct attention to revenue concentration. Any business that obtains more than 25% of its income from a single source or contract can be subject to major profit fluctuations if that source abruptly stops. To ensure a steady dividend stream, it may be desirable to retain profits to offset against the possible loss of income, or to use such retained funds to diversify the income stream.

Business impact should include human cost, that is, the suffering, anguish, anxiety, stress and the like which staff, members of the public and associated families would experience, not just loss measurable in dollars. Good corporate citizens and managers should be motivated by normal human values, not just "economic rationalism". (Although in legal cases where injured parties sue for damages due to negligence, dollar values will be put on such things.)

It is also necessary to consider consequential or indirect costs as well as direct costs. For example, it may only cost thousands of dollars to replace a contaminated product, even less if it is covered by insurance, but the loss of market share and business reputation may be far more important. Consequential damage includes such things as:

• business interruption
• loss of market share or competitive edge
• fines due to incidental pollution resulting from fire, explosion, or malicious damage


Consequential damage can also result if a breach of security causes such things as:

• strikes
• legal liability
• government regulation
• deterioration in relations with staff, unions, neighbourhood, government, media / public
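The cost-benefit questions raised earlier in this section (dollar impact, return period, cost of the counter-measure and the risk reduction it achieves) can be illustrated with a simple annualised comparison. This is a minimal sketch under stated assumptions: the annual loss expectancy is taken as dollar impact divided by return period, the counter-measure's capital cost is spread evenly over an assumed service life, no discounting is applied, and all names and figures are hypothetical. The 25% revenue-concentration test from the text is included as a separate check.

```python
from dataclasses import dataclass


@dataclass
class CounterMeasure:
    """A proposed counter-measure (all figures hypothetical)."""
    capital_cost: float         # $ to implement
    annual_maintenance: float   # $ per year to maintain
    service_life_years: float   # assumed life over which capital cost is spread
    risk_reduction: float       # fraction of the annual loss expectancy removed (0..1)


def annual_loss_expectancy(dollar_impact: float, return_period_years: float) -> float:
    """Expected loss per year for an impact expected once per return period."""
    return dollar_impact / return_period_years


def annual_net_benefit(dollar_impact: float, return_period_years: float,
                       measure: CounterMeasure) -> float:
    """Loss avoided per year minus the annualised cost of the measure (undiscounted)."""
    avoided = annual_loss_expectancy(dollar_impact, return_period_years) * measure.risk_reduction
    annual_cost = measure.capital_cost / measure.service_life_years + measure.annual_maintenance
    return avoided - annual_cost


def concentrated_sources(income_by_source: dict, threshold: float = 0.25) -> list:
    """Income sources supplying more than `threshold` of total income."""
    total = sum(income_by_source.values())
    return [name for name, income in income_by_source.items() if income / total > threshold]


if __name__ == "__main__":
    # Hypothetical: a $2 M loss expected once in 20 years, countered by a measure
    # costing $150,000 (10-year life) plus $10,000 a year that removes 80% of the risk.
    measure = CounterMeasure(150_000, 10_000, 10, 0.8)
    print(annual_net_benefit(2_000_000, 20, measure))
    print(concentrated_sources({"Contract A": 400_000, "Contract B": 300_000, "Retail": 1_000_000}))
```

With these figures the measure avoids about $80,000 of loss a year against an annualised cost of $25,000, and only the retail source exceeds the 25% threshold. The human and reputational costs discussed above are deliberately left out of this dollar-only sketch.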

Sometimes security itself can be the cause of poor staff and union relations if it is inappropriate or insensitively implemented. A common example is the inept use of baggage inspections or searches as a counter-measure against pilferage.

Assessing business impact is a collective task. A manager cannot do it effectively without the assistance of the managers of other specialist functions. Virtually all other functions are involved in assessing business impact in relation to one or other of the company's assets. Obviously insurance and finance/accounting departments need to be involved, but so too, in many cases, do production/operations, marketing, personnel, industrial relations, public and media relations, and legal departments.

7.3.5 Control

Only when risk has been identified and prioritised by assessing assets, threats, vulnerabilities and potential business impact can appropriate control options be identified and appraised.

7.3.6 Workshops

One of the most successful methods of obtaining consensus on the relative importance of vulnerabilities, characterising risk, establishing control options and creating an action list is to use an asset and threat matrix in a workshop with relevant managers. There are various possibilities, but a common approach is the two-stage workshop shown below.

[Figure: two-stage workshop (Stage 1, Stage 2) with the elements Asset ID, Credible Threats, Credible Vulnerabilities, Criticality Assessment, Critical Credible Vulnerabilities, Risk Analysis, Possible Precautions and Recommendations and/or Residual Risk Allocation, tested against Statutory and Regulatory Compliance, Common Law "Due Diligence", Investment Payback Criteria and Insurance Criteria.]

Vulnerability Workshop Process

As discussed in the Liability chapter (Chapter 4), senior decision makers and the courts require a demonstration that all reasonably practicable precautions are in place. The underlying issue is that if something untoward occurs, the courts immediately look to establish (with the advantage of 20:20 hindsight) which precautions should have been implemented but were not. Risk is not strictly relevant since, after the event, likelihood is not relevant. It has happened. As an Australian judge has been reported as noting to the engineers after a recent train incident: "What do you mean you did not think it could happen, there are seven dead".


Hence the notion of risk is really only used to test the value of the precaution that, it is claimed, ought to have been in place. How risky a situation was before the event is not germane.

7.3.7 Criticality Assessment

One of the simplest ways to address this is to undertake a preliminary criticality analysis. Prior to the Stage 2 workshop, the assets and threats of concern to the organisation are developed into a matrix. A preliminary criticality determination is then made using the values in the table below.

Score | Meaning
xxx | Critical potential vulnerability that must be (seen to be) addressed
xx | Moderate potential vulnerability
x | Minor potential vulnerability
- | No detectable change
va | Possible value adding

Criticality Scoring System

If this is correctly done, then around 10% of the cells will score xxx. This reflects the Pareto principle: typically 80% to 90% of the risk comes from 10% to 20% of the vulnerabilities. Dealing with these 10% to 20% is the primary purpose of the analysis. A very simple example result from a first stage is shown below.

ASSETS \ THREATS | Technical Failure | Community Issues | Political (change of government) | Credit Squeeze | Flood
Reputation | xx | - | x | xxx | x
Operability | xx | - | x | xxx | xxx
Staff | xx | xx | x | xx | xx

Sample Vulnerability Matrix
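The rule of thumb that around 10% of cells should score xxx can be checked mechanically once a matrix is filled in. The sketch below tallies the criticality symbols of the small sample matrix above, with each asset's scores listed in the threat order of that matrix; the data layout and function name are illustrative assumptions.

```python
from collections import Counter

# Sample vulnerability matrix: asset -> criticality symbol per threat
# (threat order: Technical Failure, Community Issues, Political, Credit Squeeze, Flood).
SAMPLE = {
    "Reputation": ["xx", "-", "x", "xxx", "x"],
    "Operability": ["xx", "-", "x", "xxx", "xxx"],
    "Staff": ["xx", "xx", "x", "xx", "xx"],
}


def criticality_summary(matrix: dict) -> tuple:
    """Count each criticality symbol and return the share of cells scored 'xxx'."""
    counts = Counter(symbol for row in matrix.values() for symbol in row)
    total_cells = sum(len(row) for row in matrix.values())
    return counts, counts["xxx"] / total_cells


counts, critical_share = criticality_summary(SAMPLE)
print(counts)
print(f"{critical_share:.0%} of cells are critical")
```

In this deliberately small example 3 of the 15 cells (20%) score xxx; the figure of around 10% in the text applies to a full-size matrix.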

Many analyses in fact stop at the criticality stage. Provided there are cogent arguments explaining why all critical vulnerabilities are being managed, further analysis is often not required, at least from a liability perspective. In a sense, the critical vulnerabilities are the top consequence scores in a risk characterisation matrix, as shown below. The next section considers risk characterisation in greater detail.

LIKELIHOOD \ CONSEQUENCE | 1 Insignificant | 2 Minor | 3 Moderate | 4 Major | 5 Catastrophic
A Almost Certain | H | H | E | E | E
B Likely | M | H | H | E | E
C Possible | L | M | H | E | E
D Unlikely | L | L | M | H | E
E Rare | L | L | M | H | H

5 x 5 Risk Characterisation Matrix Showing "xxx" Criticality Consequence Values


However, critical vulnerabilities (xxx) can be analysed further in a number of ways, depending on the nature of the analysis. Profiling enterprise risk using the risk matrix approach is very popular and is described further in ensuing sections, but other techniques can be used depending on the nature of the issue. A sample vulnerability matrix for a business is shown below. In this case scores are out of 10. The sum of the scores in each column indicates the organisation's best collective belief as to which key assets are most susceptible to possible threats. The sum of the scores in each row indicates its belief as to the most serious threats it faces. The highest individual scores represent critical areas of vulnerability that should be addressed.

THREATS \ ASSETS | Reputation | Comp Edge | Staff | Operability | Public | Env/Habitat | Information | Bldg/Facility | Totals
Chemical (fire, explosion, poisons) | 5 | 8 | 5 | 9 | 2 | 8 | 0 | 0 | 37
Bomb | 6 | 5 | 5 | 4 | 5 | 4 | 3 | 4 | 36
Statutory non-compliance | 9 | 10 | 5 | 10 | 2 | 0 | 0 | 0 | 36
Pollution (oil spills, fires, dangerous goods releases) | 9 | 6 | 3 | 5 | 2 | 10 | 0 | 0 | 35
Spill | 9 | 5 | 3 | 4 | 2 | 10 | 0 | 0 | 33
Malicious damage and contamination | 7 | 6 | 3 | 4 | 2 | 0 | 5 | 2 | 29
Biomechanical (incl. personal injury) | 9 | 10 | 7 | 0 | 2 | 0 | 0 | 0 | 28
Scandal (e.g. frauds, political involvements) | 8 | 6 | 5 | 7 | 1 | 0 | 0 | 0 | 27
Extortion | 9 | 10 | 5 | 0 | 2 | 0 | 0 | 0 | 26
Picketing/demonstrations | 9 | 10 | 5 | 0 | 2 | 0 | 0 | 0 | 26
Pilferage and theft | 9 | 10 | 5 | 0 | 2 | 0 | 0 | 0 | 26
Industrial espionage | 6 | 9 | 1 | 0 | 1 | 0 | 8 | 0 | 25
Storm (wind, hail, lightning, floods) | 3 | 2 | 4 | 4 | 2 | 2 | 2 | 6 | 25
Contamination | 10 | 0 | 0 | 0 | 6 | 8 | 0 | 0 | 24
Harassment | 9 | 5 | 5 | 0 | 2 | 0 | 0 | 0 | 21
Alcohol/drugs | 8 | 2 | 3 | 5 | 1 | 1 | 0 | 0 | 20
Suborning of staff for fraud or collusive theft | 4 | 5 | 3 | 0 | 2 | 0 | 4 | 0 | 18
Bomb (threats) and hoaxes | 2 | 5 | 1 | 5 | 1 | 0 | 2 | 0 | 16
Gravitational (falls, falling objects, landslides) | 2 | 2 | 6 | 0 | 2 | 0 | 0 | 0 | 12
Discrimination | 0 | 0 | 4 | 0 | 6 | 0 | 0 | 0 | 10
Electrical | 3 | 1 | 4 | 0 | 2 | 0 | 0 | 0 | 10
Assault | 0 | 0 | 5 | 0 | 4 | 0 | 0 | 0 | 9
Noise and vibration | 2 | 2 | 3 | 0 | 0 | 0 | 0 | 0 | 7
Defamation | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 0 | 4
Totals | 138 | 119 | 92 | 57 | 55 | 43 | 24 | 12 | 540

Sample Vulnerability Matrix of a Business
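The row and column totals described above are simple sums, so the matrix lends itself to a few lines of code. The sketch below holds the sample scores (threat names abbreviated) and computes the threat totals, the asset totals and the individual cells at or above a chosen threshold; the threshold of 10 and the function names are assumptions for illustration, not part of the method described in the text.

```python
ASSETS = ["Reputation", "Comp Edge", "Staff", "Operability",
          "Public", "Env/Habitat", "Information", "Bldg/Facility"]

# Threat -> scores out of 10 for each asset, in ASSETS order (sample matrix above).
SCORES = {
    "Chemical": [5, 8, 5, 9, 2, 8, 0, 0],
    "Bomb": [6, 5, 5, 4, 5, 4, 3, 4],
    "Statutory non-compliance": [9, 10, 5, 10, 2, 0, 0, 0],
    "Pollution": [9, 6, 3, 5, 2, 10, 0, 0],
    "Spill": [9, 5, 3, 4, 2, 10, 0, 0],
    "Malicious damage and contamination": [7, 6, 3, 4, 2, 0, 5, 2],
    "Biomechanical": [9, 10, 7, 0, 2, 0, 0, 0],
    "Scandal": [8, 6, 5, 7, 1, 0, 0, 0],
    "Extortion": [9, 10, 5, 0, 2, 0, 0, 0],
    "Picketing/demonstrations": [9, 10, 5, 0, 2, 0, 0, 0],
    "Pilferage and theft": [9, 10, 5, 0, 2, 0, 0, 0],
    "Industrial espionage": [6, 9, 1, 0, 1, 0, 8, 0],
    "Storm": [3, 2, 4, 4, 2, 2, 2, 6],
    "Contamination": [10, 0, 0, 0, 6, 8, 0, 0],
    "Harassment": [9, 5, 5, 0, 2, 0, 0, 0],
    "Alcohol/drugs": [8, 2, 3, 5, 1, 1, 0, 0],
    "Suborning of staff": [4, 5, 3, 0, 2, 0, 4, 0],
    "Bomb threats and hoaxes": [2, 5, 1, 5, 1, 0, 2, 0],
    "Gravitational": [2, 2, 6, 0, 2, 0, 0, 0],
    "Discrimination": [0, 0, 4, 0, 6, 0, 0, 0],
    "Electrical": [3, 1, 4, 0, 2, 0, 0, 0],
    "Assault": [0, 0, 5, 0, 4, 0, 0, 0],
    "Noise and vibration": [2, 2, 3, 0, 0, 0, 0, 0],
    "Defamation": [0, 0, 2, 0, 2, 0, 0, 0],
}


def threat_totals(scores: dict) -> list:
    """Row sums (most serious threats), highest first; ties keep table order."""
    return sorted(((threat, sum(row)) for threat, row in scores.items()),
                  key=lambda item: item[1], reverse=True)


def asset_totals(scores: dict) -> dict:
    """Column sums: how susceptible each asset is across all threats."""
    return {asset: sum(row[i] for row in scores.values()) for i, asset in enumerate(ASSETS)}


def critical_cells(scores: dict, threshold: int = 10) -> list:
    """Individual (threat, asset) cells scoring at or above the threshold."""
    return [(threat, ASSETS[i]) for threat, row in scores.items()
            for i, score in enumerate(row) if score >= threshold]


if __name__ == "__main__":
    print(threat_totals(SCORES)[:3])
    print(asset_totals(SCORES))
    print(critical_cells(SCORES))
```

Recomputing the totals this way also confirms that the printed row and column totals are consistent, both summing to 540.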


7.3.8 Risk Characterisation

A risk characterisation matrix, such as that described in Appendix E of the Risk Management standard (AS/NZS 4360:1999) and shown below, is a very common approach. It appears to have been adapted from earlier military work (U.K. Ministry of Defence, 1996 and U.S. Department of Defense, 2000, both revised versions of earlier standards). Such a matrix can be larger or smaller than 5x5 on either scale: 7x5 is common for very large organisations, and 4x3 or 2x2 for small projects. The matrix below uses A to E for likelihood and 1 to 5 for consequence; other systems use categories 1 to 5 for both likelihood and consequence.

LIKELIHOOD \ CONSEQUENCE | 1 Insignificant | 2 Minor | 3 Moderate | 4 Major | 5 Catastrophic
A Almost Certain | H | H | E | E | E
B Likely | M | H | H | E | E
C Possible | L | M | H | E | E
D Unlikely | L | L | M | H | E
E Rare | L | L | M | H | H

Example of Risk Definition and Classification (after AS 4360:1999)

Three methods of risk presentation are possible and are shown below. The first is a linear risk profile, in which each cell has a unique rank. The second is hyperbolic, using the product of the two values. The third in effect sums the numbers as logarithms; that is, each step represents a change of one order of magnitude if both scales are logarithmic.

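Linear risk profile (rows: likelihood; columns: consequence severity)
Likelihood \ Consequence | 1 | 2 | 3 | 4 | 5
5 | 15 | 19 | 22 | 24 | 25
4 | 10 | 14 | 18 | 21 | 23
3 | 6 | 9 | 13 | 17 | 20
2 | 3 | 5 | 8 | 12 | 16
1 | 1 | 2 | 4 | 7 | 11

Hyperbolic risk profile (product of the two values)
Likelihood \ Consequence | 1 | 2 | 3 | 4 | 5
5 | 5 | 10 | 15 | 20 | 25
4 | 4 | 8 | 12 | 16 | 20
3 | 3 | 6 | 9 | 12 | 15
2 | 2 | 4 | 6 | 8 | 10
1 | 1 | 2 | 3 | 4 | 5

Logarithmic risk profile (sum of the two values)
Likelihood \ Consequence | 1 | 2 | 3 | 4 | 5
5 | 6 | 7 | 8 | 9 | 10
4 | 5 | 6 | 7 | 8 | 9
3 | 4 | 5 | 6 | 7 | 8
2 | 3 | 4 | 5 | 6 | 7
1 | 2 | 3 | 4 | 5 | 6

Risk Assessment Charts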

The hyperbolic ranking system provides for a much greater scatter between identified vulnerabilities. The top score is 25 in both systems, but a vulnerability with a (3, 3) value scores 13 in the linear system and only 9 in the hyperbolic system, which deems it much less important and therefore deserving of less organisational control effort. By concentrating effort on the highest-scoring vulnerabilities, less time and money will produce substantially greater results. However, the linear system indicates exactly where the risk lies, since a unique number describes each point on the chart. The use of logarithmic scales seems to resolve a number of issues, since it creates lines of constant risk, which makes such presentation tools more intuitive and user friendly. This may make the third method the most mathematically pure, but it appears to be the least common. The Australian Standard matrix above does not appear to be based on either the hyperbolic or the linear system.
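The three presentations can be generated from simple rules, which makes the difference in scatter easy to see. In the sketch below the hyperbolic score is the product and the logarithmic score the sum of the two 1 to 5 values. The linear ranks are an assumption inferred from the linear chart as reproduced in this text: cells are numbered diagonal by diagonal (by likelihood + consequence), with ties broken in favour of the higher likelihood. The (3, 3) cell ranks 13 whichever way ties are broken.

```python
def linear_ranks(n: int = 5) -> dict:
    """Unique rank 1..n*n for each (likelihood, consequence) cell.

    Assumed ordering: by likelihood + consequence, then by likelihood.
    """
    cells = [(l, c) for l in range(1, n + 1) for c in range(1, n + 1)]
    cells.sort(key=lambda cell: (cell[0] + cell[1], cell[0]))
    return {cell: rank for rank, cell in enumerate(cells, start=1)}


def hyperbolic(likelihood: int, consequence: int) -> int:
    """Product of the two scores (1..25)."""
    return likelihood * consequence


def logarithmic(likelihood: int, consequence: int) -> int:
    """Sum of the scores; with log-log scales each step is one order of magnitude (2..10)."""
    return likelihood + consequence


LINEAR = linear_ranks()

if __name__ == "__main__":
    # Print each chart row by row, likelihood 5 at the top.
    for likelihood in range(5, 0, -1):
        print([LINEAR[(likelihood, c)] for c in range(1, 6)],
              [hyperbolic(likelihood, c) for c in range(1, 6)],
              [logarithmic(likelihood, c) for c in range(1, 6)])
```

Every cell on an anti-diagonal of the logarithmic chart has the same score, which is what gives that presentation its lines of constant risk.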

E = extreme risk; immediate attention required
H = high risk; senior management attention required
M = moderate risk; management responsibility must be specified
L = low risk; manage by routine procedures


7.4 Enterprise Risk Profiling

Ultimately there must be an enterprise view of how identified risk issues should be characterised. This appears necessary when there are competing risk agendas and limited capital available. For example, underwriting requirements, environmental issues, RCM (reliability centred maintenance) requirements and OH&S issues can compete for scarce capital. How can an organisation come to grips with such issues without an overall top-down risk framework?

[Figure: levels from Context (Business Context, Enterprise Risk Management) through System, Sub-system and Assembly to Component, with top-down and bottom-up approaches (FMECA, HazOp, JSA, QRA etc.) meeting as "low level top down or high level bottom up".]

Enterprise Risk Framework

The enterprise risk framework diagram above describes one understanding. When activities are undertaken bottom up, each specialist group comes to an internalised understanding of what is important to the organisation. However, when the risk assessment of the environmental group competes for resources with the risk assessments of the HazOp and JSA groups, a very difficult situation can arise. A high-level business risk framework can normalise the value systems of the competing groups, saving considerable time and much frustration.

7.4.1 Determining Risk Matrix Values

One simple method for developing the consequence values of the matrix is to consider a loss that would prove catastrophic to the organisation and then to step back in order-of-magnitude increments from catastrophic to noticeable. The table should reflect the full range of loss values, not just directly measurable items. An example of a consequence table is shown below. The loss values, and in particular the range of consequences, differ between organisations: catastrophic may be $1 billion for some companies, whereas a $100,000 loss is probably devastating for most households. Note also that loss of reputation and other intangibles account for the vast majority of loss.


Critical Success Factors \ Consequence Rating | 1 Noticeable | 2 Important | 3 Serious | 4 Major | 5 Catastrophic
Reputation & Competitive Edge | Serious complaint; Magistrate's Court action | Local press; County Court action; adverse ministerial comment in State Parliament | State press; Supreme Court action; adverse ministerial comment in Federal Parliament | National press; Court of Appeal of a Supreme Court or Federal Court action | International press; High Court action
Financial Performance | $10,000 | $100,000 | $1 M | $10 M | $100 M
Compliance, Corporate Governance & Information (in increasing severity) | Breach of statutory EPA regulations; isolated release of private information, isolated database hacking, EPA fine; successful prosecution for breach of privacy; widespread access to confidential records, breach of statutory, regulatory or contractual obligations, ongoing and extensive database hacking or fraud
Occupational Health & Safety and Environment | Minor injury | Temporary serious injury | Permanent serious injury or disability; ongoing staff harassment or abuse; minor structural damage | 1 death; significant structural damage due to fire etc. | 10 deaths; massive industrial disputes; loss of a major infrastructure facility due to earthquake etc.

Sample Consequence Table

Likelihood for an organisation is usually characterised on a frequency basis, for example:

Almost Certain | Once per year
Likely | Once in 10 years
Some Chance | Once in 100 years
Unlikely | Once in 1,000 years
Rare | Once in 10,000 years

Typical Likelihood Values for an Organisation

The use of combined logarithmic values for each scale provides for lines of constant risk.
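Because each step on both organisational scales is one order of magnitude, the rating numbers behave like logarithms, and their sum tracks the expected annual loss. The sketch below assumes the sample scales above, using the financial row only (consequence 1 = $10,000 up to 5 = $100 M; likelihood 5 = once per year down to 1 = once in 10,000 years); the function names are hypothetical. With these scales a rating sum S corresponds to an expected loss of 10^(S - 2) dollars per year.

```python
import math


def consequence_rating(loss_dollars: float) -> float:
    """1 = $10,000, 2 = $100,000, ... 5 = $100 M (one order of magnitude per step)."""
    return math.log10(loss_dollars) - 3


def likelihood_rating(events_per_year: float) -> float:
    """5 = once per year, 4 = once in 10 years, ... 1 = once in 10,000 years."""
    return 5 + math.log10(events_per_year)


def expected_annual_loss(likelihood: float, consequence: float) -> float:
    """Invert the ratings: a rating sum S corresponds to $10**(S - 2) per year."""
    return 10 ** (likelihood + consequence - 2)


if __name__ == "__main__":
    # Two cells on the same line of constant risk (rating sum 6):
    # a $10 M loss once in 1,000 years and a $10,000 loss once a year.
    for freq, loss in [(1e-3, 1e7), (1.0, 1e4)]:
        l, c = likelihood_rating(freq), consequence_rating(loss)
        print(round(l + c, 6), freq * loss, expected_annual_loss(l, c))
```

Both cells print a rating sum of 6 and an expected loss of about $10,000 a year, which is the sense in which equal sums lie on a line of constant risk.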


Each critical vulnerability can then be placed on the risk matrix as shown below. Taken together, the dots on the matrix form the unmitigated risk profile of the subject organisation. As noted in Section 4.4, Due Diligence, the final decision for action is individual to an organisation, but a process like this makes it transparent to anyone who wishes to know, whether shareholders, a judge or jury, or a regulator.

[Figure: seven critical vulnerabilities, numbered 1 to 7, plotted on the 5 x 5 risk matrix (likelihood: Almost Certain, Likely, Some Chance, Unlikely, Rare; consequence: Noticeable, Important, Serious, Major, Catastrophic).]

Sample Risk Profile

One of the simplest approaches is to place a dot at the current risk position, as shown above, and another at the revised position after the proposed risk control is in place, as shown below. The payback can then be seen immediately.

[Figure: the residual positions 1R to 7R of the same seven vulnerabilities plotted on the 5 x 5 risk matrix (likelihood: Almost Certain to Rare; consequence: Noticeable to Catastrophic).]

Sample Residual Risk Profile

Residual risks (those that remain after risk mitigation) can and should be classified. The categories given in AS/IEC 61508:2000 are instructive:

• Class I (intolerable, except in extraordinary circumstances)
• Class II (undesirable, unless risk reduction is impracticable or the cost of reduction would exceed the improvement gained)
• Class III (tolerable, if the cost of risk reduction would exceed the improvement gained)
• Class IV (broadly acceptable, negligible risk)
• Class V (acceptable, trivial risk)

Class I is broadly equivalent to the 'Extreme' category in the Australian Standard and Class II to the 'High' category. Class III is equivalent to the 'Moderate' category, whilst the remaining two classes equate to the 'Low' category. Hence all residual risks should be Class III, IV or V, that is, 'Moderate' or 'Low' in the Australian Standard risk terminology.
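The class-to-category equivalence above can be held as a lookup table and used to flag residual risks that still need treatment. This is a minimal sketch: the mapping follows the paragraph above, while the register entries (labelled 1R to 4R) and their classes are hypothetical.

```python
# Residual risk class -> Australian Standard category, as stated above.
CLASS_TO_AS = {
    "I": "Extreme",
    "II": "High",
    "III": "Moderate",
    "IV": "Low",
    "V": "Low",
}
ACCEPTABLE_RESIDUAL = {"III", "IV", "V"}


def unacceptable_residuals(residuals: dict) -> dict:
    """Return the residual risks (name -> class) that are still Class I or II."""
    return {name: cls for name, cls in residuals.items() if cls not in ACCEPTABLE_RESIDUAL}


# Hypothetical residual risk register after mitigation.
register = {"1R": "III", "2R": "IV", "3R": "II", "4R": "V"}

if __name__ == "__main__":
    for name, cls in unacceptable_residuals(register).items():
        print(f"{name}: Class {cls} ({CLASS_TO_AS[cls]}) needs further treatment")
```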


7.5 Project Risk Profiling

Projects have an interesting conceptual risk profile. The upside risk position is assumed in the proposal, and the risk analysis generally focuses on those issues that would prevent the assumed upside benefits from being achieved. That is, it is a downside risk assessment process conducted from an assumed upside risk position. Again, the vulnerability approach can be used, as shown below.

Flow Chart for Project Vulnerability Assessment


The analysis can be done at any stage in the project’s life cycle depending on the project’s nature. Such a life cycle is shown below.

Stages of the project life cycle (PLC): Conceive, Design, Plan, Allocate, Execute, Deliver, Review, Support.

Roles for risk analysis across these stages, in life-cycle order:

• identifying stakeholders and their expectations
• identifying appropriate performance objectives
• setting performance criteria
• assessing the likely cost of a design
• identifying and allowing for regulatory constraints
• determining appropriate levels of contingency funds and resources
• evaluating alternative procurement strategies
• determining appropriate risk sharing arrangements
• identifying remaining execution risks
• assessing implications of changes to design or plan
• identifying risks to delivery
• assessing feasibility of meeting performance criteria
• assessing effectiveness of risk management strategies
• identifying realised risks and effective responses
• identifying the extent of failure liabilities
• assessing the profitability of the project

Applications of Risk Management in the Project Life Cycle (adapted from Project Risk Management, Chapman and Ward, 1997)

If done on a 5x5 matrix, risk characterisation requires further consideration. For a project, likelihood is usually expressed as a probability rather than a frequency, since the likelihood relates to the project, which may extend over many years, for example:

Almost Certain | 100% chance of occurrence during the project
Likely | 30% chance of occurrence during the project
Some Chance | 10% chance of occurrence during the project
Unlikely | 3% chance of occurrence during the project
Rare | 1% chance of occurrence during the project

Typical Likelihood Values for a Project

To ensure lines of constant risk the consequence scale thus also needs to be (semi) logarithmic.

Consequence Rating | 1 Noticeable | 2 Important | 3 Serious | 4 Major | 5 Catastrophic
Project Delivery | 1% time over-run | 3% time over-run | 10% time over-run | 30% time over-run | 100% time over-run
Financial Performance | 1% budget over-run | 3% budget over-run | 10% budget over-run | 30% budget over-run | 100% budget over-run
Occupational Health & Safety | Minor injury | Temporary serious injury | Permanent serious injury or disability | 1 death | Multiple deaths
Environmental (in increasing severity) | EPA reportable incident; major spill or bushfire

Typical Consequence Values for a Project
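With steps of roughly half an order of magnitude (factors of about 3, since 3 × 3 ≈ 10) on both project scales, equal rating sums again correspond to roughly equal expected over-runs. The sketch below maps a probability, or a time or budget over-run expressed as a fraction, to the 1 to 5 ratings in the two tables above. The rounding rule (two steps per factor of ten above 1%, clamped to 1..5) and the function names are assumptions, not taken from the text.

```python
import math


def semi_log_rating(fraction: float) -> int:
    """Map 0.01, 0.03, 0.10, 0.30 and 1.00 to ratings 1..5 (two steps per decade).

    Values outside the scale are clamped to 1 or 5.
    """
    steps = round(2 * math.log10(fraction / 0.01))
    return max(1, min(5, 1 + steps))


def expected_overrun(probability: float, overrun: float) -> float:
    """Expected over-run (fraction of schedule or budget) = probability x over-run."""
    return probability * overrun


if __name__ == "__main__":
    # A 30% chance of a 3% over-run and a 3% chance of a 30% over-run
    # share a rating sum and an expected over-run.
    for p, o in [(0.30, 0.03), (0.03, 0.30)]:
        print(semi_log_rating(p) + semi_log_rating(o), expected_overrun(p, o))
```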

If the project delays and costs can be usefully characterised, then contingency sums and time allowances can be estimated. This can be done simply by calculating the loss expectancy (probability × consequence) of each residual risk and then summing these. For example, if there is estimated to be a 50% chance of losing 6 days to wet weather, the wet weather loss expectancy for the project is 0.5 × 6 = 3 days. Such an approach assumes that each risk being considered is discrete, that is, that the loss events do not overlap.
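The contingency calculation described above is a sum of probability-weighted consequences. In the sketch below, the wet weather entry reproduces the example in the text; the second entry and all class and function names are hypothetical. As the text notes, the simple sum is only valid if the loss events are discrete and do not overlap.

```python
from dataclasses import dataclass


@dataclass
class ResidualRisk:
    name: str
    probability: float   # chance of occurring during the project (0..1)
    delay_days: float    # delay if it occurs
    cost: float = 0.0    # budget over-run if it occurs ($)


def contingency(risks: list) -> tuple:
    """Sum the loss expectancies of discrete, non-overlapping residual risks."""
    days = sum(r.probability * r.delay_days for r in risks)
    dollars = sum(r.probability * r.cost for r in risks)
    return days, dollars


risks = [
    ResidualRisk("Wet weather", 0.5, 6),                        # example from the text
    ResidualRisk("Late equipment delivery", 0.1, 20, 50_000),   # hypothetical
]

if __name__ == "__main__":
    days, dollars = contingency(risks)
    print(f"Time contingency {days:.1f} days, cost contingency ${dollars:,.0f}")
```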


REFERENCES

Chapman C and Ward S (1997). Project Risk Management. John Wiley and Sons, Chichester, UK. (Table is from page 27.)
Department of Defense (USA) (2000). Standard Practice for System Safety. MIL-STD-882D, 10 February 2000.
Ministry of Defence (UK) (1997). Safety Management Requirements for Defence Systems, Part 1: Requirements. Defence Standard 00-56 (Part 1)/Issue 2, 13 December 1997.
Standards Australia/International Electrotechnical Commission (2000). Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems. AS/IEC 61508:2000.
Standards Australia/Standards New Zealand (1999). Risk Management. Australian/New Zealand Standard AS/NZS 4360:1999.
Standards Australia/Standards New Zealand (2000). Information Security Management. Australian/New Zealand Standard AS/NZS 4444.2:2000.

READING

Grey S (1995). Practical Risk Assessment for Project Management. John Wiley & Sons, Chichester, UK.
Robinson R M, Francis G E and Anderson K J (2003). Lessons from Cause-Consequence Modelling for Tunnel Emergency Planning. Proceedings of the Fifth International Conference on Safety in Road and Rail Tunnels, University of Dundee, pp 149-158. ISBN 1 901808 22 X.


8. Ranking Techniques

8.1 Risk Registers

A Risk Register is an action list of identified problems ranked by risk criteria. The nature of a register varies according to the techniques by which the problems were identified and the manner of the risk characterisation. Common risk registers include Vulnerability, HazOp, Hazard (OH&S), FMECA and Property Loss Prevention registers. They all have a common purpose: to establish tactical and strategic weaknesses so that they can be managed before they manifest themselves as real pain to an organisation. Accordingly, they have many similarities, especially in the methods of risk characterisation.

8.1.1 Vulnerability Registers

A Vulnerability Register is derived from a top-down process. It is described in detail in Section 7.3, Vulnerability Assessments. In summary, the process requires that critical success factors (the assets) be identified for an enterprise. A list of potential threats is then developed. Assets that are vulnerable to threats can have a risk characterisation (business impact assessment) made to establish priorities. The primary benefit of such a process is that real resources are spent only on vulnerabilities rather than on every threat. The primary weakness of such an approach is that the identified vulnerabilities can be merely areas of concern, insufficiently precise for action to be targeted effectively.

8.1.2 HazOp Risk Registers

A HazOp (Hazard and Operability) risk register is derived from a bottom-up process. It is described in detail in Chapter 10, Bottom Up Techniques. In summary, the process requires that a detailed functional statement of a contract, project or process be available. Each functional element is examined using a series of predetermined guidewords to see whether its failure will cause problems. If so, action is proposed. The principal benefit of a HazOp process is that it is very specific, and the benefits of corrective action can be easily seen.
The primary weakness is that it may fail to spot problems that result from simultaneous failures, so-called "common cause" or "common mode" failures, which can have serious liability implications.

8.1.3 FMECA Registers

A FMECA (Failure Modes, Effects and Criticality Analysis) is another form of bottom-up risk assessment, very similar to HazOp but directed at reliability rather than risk issues, although in practice HazOp and FMECA seem to be largely interchangeable. This process is also described in detail in Chapter 10, Bottom Up Techniques.

8.1.4 Hazard (OH&S) Registers

The focus of such studies is obviously human safety, and they can incorporate a number of the Vulnerability and HazOp techniques.

8.1.5 Property Loss Prevention Registers

Property Loss Prevention Registers are also described in Section 8.3 of this chapter. These focus on property loss, typically using assessments of the kind conducted by the insurance industry.


8.2 Ranking Acute OH&S Hazards

Organisationally, risk normally follows a hyperbolic profile. Such a view is consistent with the accident risk triangle espoused by Bird and Heinrich. On average it does appear that, for pure risk events, a tenfold decrease in likelihood is accompanied by a tenfold increase in severity. On log-log paper this is a line of constant risk. This is based on the notion that risk is a function of both severity and frequency and, all other aspects being equal, can be expressed as the product of the two. This means that if the injury severity can be shown to decrease by a factor of ten, its likelihood can increase by a similar factor (and vice versa) without changing the overall risk. On a log-log graph this concept appears as a line at 45 degrees, as represented below.

8.2.1 Lines of Constant Risk

[Figure: log-log plot of Likelihood of Occurrence (1x10^-3 down to 1x10^-6) against Severity of Consequence, with 45 degree lines of constant risk separating Higher Risk (Dangerous) from Lower Risk (Safe).]

Lines of Constant Risk

If such a concept is adopted, then a simple spreadsheet risk assessment and solution ranking method can be developed. This requires that, for each identified hazard, an appropriate recommendation is made and the following parameters are determined:

• the likelihood of the event occurring;
• the most probable severity outcome for that occurrence;
• the probable risk control effectiveness of the proposed recommendation; and
• an estimate of its cost.


8.2.2 Spreadsheet Based Acute Hazard Quantification

Provided an assessment of the likelihood and consequence severity of a hazard can be made, a simple spreadsheet risk calculator can be devised as shown below.

Proposed Measure | Likelihood per year | Consequence Severity | Control Effectiveness | Control Cost ($) | Risk Reduction Rating
Provide foam padding | 1 | 25 | 90% | 100 | 22.5

Spreadsheet Risk Calculator

Absolute severity is the greatest expected measure of consequence for a particular hazard, in whatever units are being used. The product of the likelihood of the event occurring per annum and the expected severity of the outcome measures absolute risk. Greatest risk reduction per dollar spent is calculated by the formula:

$$\text{Risk reduction rating} = \frac{\text{Likelihood} \times \text{Severity} \times \text{Percentage risk control}}{\text{Total capital cost of recommendation}}$$

With the percentage risk control entered as a whole number, the foam padding example gives 1 × 25 × 90 / 100 = 22.5.
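The formula above transcribes directly into a function. This sketch follows the spreadsheet convention implied by the calculator's 22.5 result, namely that control effectiveness is entered as a percentage (90, not 0.9); the function name is hypothetical.

```python
def risk_reduction_rating(likelihood_pa: float, severity: float,
                          control_effectiveness_pct: float, capital_cost: float) -> float:
    """Likelihood x Severity x percentage risk control / total capital cost.

    Effectiveness is a percentage (90 for 90%), matching the spreadsheet calculator.
    """
    return likelihood_pa * severity * control_effectiveness_pct / capital_cost


if __name__ == "__main__":
    print(risk_reduction_rating(1, 25, 90, 100))  # foam padding example → 22.5
```

Entering effectiveness as a fraction instead would scale every rating by 1/100; the ranking order would be unchanged, but the numbers would no longer match the calculator.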

If historical data on injury frequencies and severity is not available then a risk estimation can still be made for any hazard by developing exposure data. That is:

Likelihood = Exposure x Probability of injury

where Exposure is the number of trials per time period and Probability is a number between 0 and 1. For example, consider a tripping hazard due to wrinkled carpet. How many times in a working day does a typical employee step over the carpet? How many employees typically do this? How many days does a typical employee work? The product of all these numbers will give a first approximation as to the number of trials per annum. This can also be done in a spreadsheet form.

Trials per time unit per person | Time units p.a. | People per shift | Shifts | Trials p.a. | Probability of injury per trial | Likelihood of injury p.a.
2 | 240 | 10 | 2 | 9,600 | 1 x 10^-4 | 0.96 (≈ 1)

Quantifying Exposure and Likelihood
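The exposure calculation is the product of the spreadsheet columns, followed by Likelihood = Exposure × Probability of injury. The sketch below reproduces the wrinkled carpet figures from the table above (2 crossings per working day per person, 240 working days, 10 people per shift, 2 shifts, 1 in 10,000 chance of injury per crossing); the function names are hypothetical.

```python
def trials_per_annum(trials_per_unit_per_person: float, time_units_pa: float,
                     people_per_shift: float, shifts: float) -> float:
    """Exposure: number of trials per year (product of the spreadsheet columns)."""
    return trials_per_unit_per_person * time_units_pa * people_per_shift * shifts


def injury_likelihood_pa(exposure_pa: float, probability_per_trial: float) -> float:
    """Likelihood = Exposure x Probability of injury per trial."""
    return exposure_pa * probability_per_trial


if __name__ == "__main__":
    exposure = trials_per_annum(2, 240, 10, 2)      # wrinkled carpet example
    print(exposure, injury_likelihood_pa(exposure, 1e-4))
```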


Item No. | Proposed control measure | Trials per time unit per person | Time units p.a. | People per shift | Shifts | Exposure (trials p.a.) | Probability per trial | Injury frequency (p.a.) | Severity rating (days lost) | Risk (days lost p.a.) | Control effectiveness (%) | Control cost ($) | Payback score | Rank order priority
1 | Provide foam for head bump potential | 2 | 240 | 10 | 2 | 9,600 | 0.0001 | 0.96 | 25 | 24 | 90 | 100 | 21.6 | 1

Sample Spreadsheet Hazard Register


A table of helpful figures is provided to facilitate risk ranking.

Exposure (time units p.a.) | Value
Constant (every 5 working minutes) | 24,000 per year
Hours (typical working hours) | 2,000 per year
Days (working days per year) | 240 per year
Weeks (typical working weeks) | 48 per year
Months | 11 per year
Years | 1 per year

Reasonable severity potential (after Viner 1991) | Days lost
Medical and Temporary Partial Incapacity (hit thumb with a hammer) | 0.5
Temporary Total Incapacity (unconscious) | 25
Permanent Partial Incapacity (maiming) | 275
Permanent Total Incapacity/Death | 6,000
Multiple (typically 3) Deaths | 18,000

Probability of injury per trial | Value
Certain | 10^0 = 1 = 1/1
Imminent | 10^-1 = 0.1 = 1/10
Probable | 10^-2 = 0.01 = 1/100
Likely | 10^-3 = 0.001 = 1/1,000
Unexpected | 10^-4 = 0.0001 = 1/10,000
Remote | 10^-5 = 0.00001 = 1/100,000
1 in a million | 10^-6 = 0.000001 = 1/1,000,000

Recommendation effectiveness | Anticipated risk reduction
Total removal | 100%
Design | 90%
Administration | 50%
Training | 30%

Recommendation cost | Value
Maintenance budget item | $100
Annual budget item | $1,000
Capital works item | $10,000+

Helpful Ranking Figures


These figures can be extended as shown below. The first table provides for a rapid calculation of expected accident frequency (exposure × probability of injury per trial).

Exposure \ Probability per trial | 1/1,000 | 1/10,000 | 1/100,000 | 1/1,000,000
24,000 p.a. | 24 p.a. | 2.4 p.a. | 0.24 p.a. | 0.024 p.a.
2,000 p.a. | 2 p.a. | 0.2 p.a. | 0.02 p.a. | 0.002 p.a.
240 p.a. | 0.24 p.a. | 0.024 p.a. | 0.0024 p.a. | 0.00024 p.a.
48 p.a. | 0.048 p.a. | 0.0048 p.a. | 0.00048 p.a. | 0.000048 p.a.
11 p.a. | 0.011 p.a. | 0.0011 p.a. | 0.00011 p.a. | 0.000011 p.a.
1 p.a. | 0.001 p.a. | 0.0001 p.a. | 0.00001 p.a. | 0.000001 p.a.

Figures to Calculate Expected Accident Frequency

The second table provides for a typical first order correlation between injury severity, loss expectancy and public response in the form of environmental, regulatory and media impact.

Severity | OH&S (days lost) | Property (dollars)
Noticeable | 0.5 | $1,000
Important | 25 | $10,000
Serious | 275 | $100,000
Severe | 6,000 (1 death) | $1,000,000
Critical | 18,000 (3 deaths) | $10,000,000
Catastrophic | 18,000+ (3+ deaths) | $100,000,000

Environmental and regulatory/media response, in increasing severity: local media (non-metropolitan); local media (metropolitan); national media, local regulation; national media and regulation; international media and national regulation.

Estimated Expected Severity

8.2.3 Precautionary Ranking Note

Care should be taken when selecting a point on a line of constant risk as a system of risk characterisation. Whilst such lines may be true on average for all risks, they are not true for individual risks. For example, tripping on a footpath is more likely to cause injury than death, whereas falling off a high-rise building is far more likely to cause death than injury. That is, there is a unique risk curve for each hazard, and it is almost certainly not a line of constant risk. It is therefore prudent to characterise the most probable consequence severity first and then to characterise the likelihood of that consequence severity occurring. The object is to ensure that the worst point on the risk curve for an individual risk is chosen for characterisation.

A sample list of possible risk curves follows. They are very subjective risk curves based on the experience of the authors. They are drawn as though on log-log graph paper, so that a 45 degree line would represent a line of constant risk. Consequence is represented as days lost. Likelihood is on a probability basis and would need to be multiplied by the number of trials to obtain the expected number of injuries. Rounding the figures above to the nearest order of magnitude provides the following scales.

1 day lost Medical and Temporary Partial Incapacity 10 days lost Temporary Total Incapacity (Unconscious) 100 days lost Permanent Partial Incapacity (Maiming) 1,000 days lost Permanent Total Incapacity/Death 10,000 days lost Multiple Deaths

Consequence Scale

1 in 100: Probable
1 in 1,000: Likely
1 in 10,000: Unexpected
1 in 100,000: Remote
1 in 1,000,000: 1 in a million

Likelihood Scale


[Figure: three panels, each plotting consequence (1 to 10,000 days lost, from Medical and Temporary Partial Incapacity to Multiple Deaths) against likelihood (1 in 100, Probable, to 1 in 1,000,000): Risk Curve for Manual Handling Hazard; Risk Curve for Trip on Paving Hazard; Risk Curve for High Voltage Electrocution Hazard.]

Sample Possible Risk Curves of Particular Hazards
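The "worst point" rule can be made concrete with a short sketch. It assumes a risk curve is given as pairs of (days lost, probability per trial) on the scales above, multiplies each point by a number of trials per year, and returns the point with the greatest expected days lost per year. The curve, the trial count and the function name `worst_point` are hypothetical illustrations, not values or tools from the text.

```python
# Minimal sketch: pick the worst point on one hazard's risk curve.
# Assumption: the curve values below are hypothetical, not read from the book's figures.


def worst_point(curve, trials_per_year):
    """Return (days_lost, probability, risk) for the point with the highest risk.

    curve: list of (days lost per occurrence, probability per trial) pairs.
    Risk at each point = days lost x probability per trial x trials per year.
    """
    worst = None
    for days_lost, probability in curve:
        risk = days_lost * probability * trials_per_year
        if worst is None or risk > worst[2]:
            worst = (days_lost, probability, risk)
    return worst


# Hypothetical trip-type hazard: minor injuries are far more likely than severe ones.
trip_on_paving = [
    (1, 1e-2),      # Medical and Temporary Partial Incapacity, "Probable"
    (10, 1e-4),     # Temporary Total Incapacity, "Unexpected"
    (100, 1e-5),    # Permanent Partial Incapacity, "Remote"
    (1_000, 1e-6),  # Permanent Total Incapacity/Death, "1 in a million"
]

days, prob, risk = worst_point(trip_on_paving, trials_per_year=1_000)
print(f"Characterise at {days} day(s) lost, probability {prob:g}: {risk:.1f} days lost p.a.")
```

On a true line of constant risk every point would score the same; in this made-up curve the minor-injury end dominates, which is the kind of behaviour the trip on paving curve suggests.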


In the authors' experience, the most appropriate order in which to consider such matters is:

* absolute severity
* absolute risk
* greatest risk reduction per dollar spent

Absolute severity reflects the need to ensure that anything with (multiple) death potential has been seriously considered. The spreadsheet calculator described scores expensive and/or inefficient risk control solutions poorly, so if an expensive solution has been proposed when a cheaper one was available, due diligence may not have been satisfied. The results of such work can be represented by tabular outputs such as that shown below.

Columns: Statement | Risk Controls | Risk | Severity | Cost | Payback

Head bump potential exists at the end of the conveyor.
Controls: Provide foam padding in addition to the stripe indicating surface.
Risk 23 | Severity 25 | Cost 100 | Payback 207

Use of the blow down gun on the conveyor creates embolism, mechanical damage and eye injury potential for personnel.
Controls: Discontinue the use of the blow down gun in favour of a suitable vacuum cleaner. Remove the flexible air hose.
Risk 143 | Severity 6,000 | Cost 1,000 | Payback 143

The dock should be guarded against fall potential when not actually in use. This is difficult to achieve effectively.
Controls: Minimum options are: 1. Paint the edge brightly. 2. Mark a "no walking" area. 3. Provide a small raised wooden edging.
Risk 264 | Severity 275 | Cost 1,000 | Payback 132

Jumping out of the truck while holding goods imposes severe back strain problems.
Controls: 1. Provide a large non-slip step down. 2. Provide induction and training.
Risk 1,031 | Severity 275 | Cost 5,000 | Payback 103

The platforms by the discharge chute do not have kick boards, a proper access ladder or complete hand railing.
Controls: This really requires a redesign of the loading operation in this area to conform to AS1657.
Risk 2,639 | Severity 275 | Cost 20,000 | Payback 92

The ramp safety chain fastening appears inadequate. Any slack in the chain would enable the trailer/ramp to separate.
Controls: Provide a welded stanchion down to bumper level so that the chain is horizontal and slack is minimised.
Risk 26 | Severity 275 | Cost 1,000 | Payback 13

The stairway of Building 1 has slippery surfaces.
Controls: Resurface the stairs with a non-slip surface (coefficient of friction 0.4 min., desirably 0.5, for all foreseeable conditions).
Risk 7 | Severity 25 | Cost 1,000 | Payback 6

Sample Hazard Register

Sorted by Greatest Risk Reduction per Dollar spent (Payback Score)
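One way to apply the order of consideration given earlier (absolute severity, then absolute risk, then payback) is a lexicographic sort of the register. This is a sketch of one possible reading, not the authors' stated procedure; the rows are transcribed from the sample register with hazard descriptions abbreviated, and `review_order` and `payback_order` are illustrative names.

```python
# Rows transcribed from the sample hazard register:
# (hazard, risk, severity, cost, payback score). Hazard names are abbreviated.
register = [
    ("Head bump at end of conveyor", 23, 25, 100, 207),
    ("Blow down gun on conveyor", 143, 6_000, 1_000, 143),
    ("Unguarded dock edge", 264, 275, 1_000, 132),
    ("Jumping out of truck holding goods", 1_031, 275, 5_000, 103),
    ("Discharge chute platforms", 2_639, 275, 20_000, 92),
    ("Ramp safety chain fastening", 26, 275, 1_000, 13),
    ("Slippery stairway, Building 1", 7, 25, 1_000, 6),
]


def review_order(rows):
    """Absolute severity first, then absolute risk, then payback score (all descending)."""
    return sorted(rows, key=lambda row: (row[2], row[1], row[4]), reverse=True)


def payback_order(rows):
    """Greatest risk reduction per dollar spent only, as the register is printed."""
    return sorted(rows, key=lambda row: row[4], reverse=True)


for hazard, risk, severity, cost, payback in review_order(register):
    print(f"{hazard:36s} severity {severity:>6,} risk {risk:>6,} payback {payback:>4}")
```

Under this reading the blow down gun (severity 6,000) is reviewed first, even though the head bump padding tops the payback-sorted register.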


8.2.4 Process Review

Reviewing the process:

i) Simultaneously identify the hazard and a possible solution.
ii) Select a realistic maximum injury severity for that hazard.
iii) Assess exposure and probability per trial to determine the frequency of that consequence severity.
iv) Conduct a reality check: "Is that frequency sensible for that consequence severity?"
v) Select the solution's control cost and risk control effectiveness.
vi) Calculate risk, risk reduction and ranking.

8.2.5 Risk Control Measures

There are five general categories for risk control:

Category: Effectiveness
1. Removal or Elimination: 100%
2. Design or Physical Control (engineering): 90%
3. Administrative Control (procedural): 50%
4. Training (Work Method Controls) (personnel): 30%
5. PPE (Personal Protective Equipment): 20%
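Using the nominal effectiveness figures above, the risk removed by a control is the starting risk multiplied by its effectiveness, and the residual risk is the starting risk multiplied by (1 - effectiveness). The sketch below assumes a hypothetical hazard with a risk of 24 days lost p.a.; the dictionary keys are shortened category names chosen for illustration.

```python
# Nominal effectiveness of each control category, from the list above.
CONTROL_EFFECTIVENESS = {
    "Removal or Elimination": 1.00,
    "Design or Physical Control": 0.90,
    "Administrative Control": 0.50,
    "Training": 0.30,
    "PPE": 0.20,
}


def risk_reduction(risk, category):
    """Risk removed by a control of the given category (same units as risk)."""
    return risk * CONTROL_EFFECTIVENESS[category]


def residual_risk(risk, category):
    """Risk remaining after a control of the given category (same units as risk)."""
    return risk * (1.0 - CONTROL_EFFECTIVENESS[category])


starting_risk = 24.0  # days lost p.a.; hypothetical hazard
for category in CONTROL_EFFECTIVENESS:
    print(f"{category:28s} removes {risk_reduction(starting_risk, category):5.1f}, "
          f"leaves {residual_risk(starting_risk, category):5.1f} days lost p.a.")
```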

Some examples of the above categories are given in the table below:

Columns: Occurrence Type | Engineering Controls | Procedural Controls | Personnel Controls

CHEMICAL EXPOSURE - toxic properties
Engineering controls: 1. Design for containment. 2. Ventilation systems. 3. Change rooms and showers etc.
Procedural controls: 1. Chemical purchasing procedure. 2. Chemical register. 3. Provision of personal protective equipment. 4. Medical monitoring programs. 5. Transportation, handling and storage practices. 6. Maintenance of equipment. 7. Emergency procedures.
Personnel controls: 1. Training in the selection, use and care of personal protective equipment. 2. Information on toxic properties and routes of ingestion.

CHEMICAL EXPOSURE - corrosive properties
Engineering controls: 1. Splash and leak proof containers. 2. Provision of showers and eye washes.
Procedural controls: 1. Chemical purchasing procedure. 2. Chemical register. 3. Provision of personal protective equipment. 4. Transportation, handling and storage practices. 5. Maintenance of equipment. 6. Emergency procedures.
Personnel controls: 1. Training in the selection, use and care of personal protective equipment. 2. Information on toxic properties and routes of ingestion.

CHEMICAL EXPOSURE - fire and explosion effects
Engineering controls: 1. Provision of storage facilities. 2. Provision of containers.
Procedural controls: 1. Chemical purchasing procedure. 2. Chemical register. 3. Provision of personal protective equipment. 4. Transportation, handling and storage practices. 5. Maintenance of equipment. 6. Emergency procedures.
Personnel controls: 1. Awareness of hazardous properties. 2. Awareness of emergency procedures.

CHEMICAL EXPOSURE - asphyxiant properties
Engineering controls: 1. Atmosphere assessment equipment. 2. Ventilation equipment. 3. Harnesses and air supply equipment.
Procedural controls: 1. Work permit systems. 2. Equipment maintenance.
Personnel controls: 1. Awareness of hazardous properties. 2. Awareness of emergency procedures.

Examples of Risk Control Measures


8.3 Ranking Property Loss Prevention Hazards

An example of a property hazard risk calculator is given in the figure below.

Property Loss Prevention Program: Major Recommendation Register
Recommendation No: 1
Date: 15 March 1996
Recommendation: Installing in-rack sprinklers in the multiple row racks in the raw materials warehouse and under the finished goods conveyor is required to make the sprinkler protection effective. The existing sprinkler system was designed for solid pile storage. It is inadequate for multiple row rack storage, and in-rack sprinklers or a very serious increase in overhead sprinkler protection would be required. The new conveyor system shields the overhead sprinklers and a new row is required under it.

Backg'd Event Freq. (p.a.): 0.01
Hot Spot Freq. (p.a.): 0
Total Event Freq. (p.a.): 0.01
Years Between Events: 100
Asset Damage $: 2,000,000
Business Interruption $: 1,000,000
Severity (PD + BI) $: 3,000,000
Rec. Capital Cost $: 100,000
Rec. Maint. Cost $ p.a.: 100
Rec. Effectiveness %: 90
Pre-Rec. Loss Expectancy $ p.a.: 30,000
Post-Rec. Loss Expectancy $ p.a.: 3,000
∆ Annual Loss Expectancy $ p.a.: 27,000
Payback Period (years): 3.7

Property Loss Payback Calculator

The definitions for each of the items above are given in Section 8.3.1. The key concept is the total cost of risk. For property damage, the product of the expected frequency of the loss event and its severity (cost) is the annual loss expectancy. That is how much money would need to be put aside each year to pay for the cost of loss if no insurance were purchased. It is a direct measure of the risk of the event. For example, if the projected cost of the event is $1m and it occurs once every 10 years, then $100,000 per year should be set aside to pay for the cost of loss. This follows from the Loss Rate Concept (Browning, 1980).

However, if a risk control option can be implemented, it should substantially reduce either the likelihood or the severity of the loss event, perhaps by 90%. In the calculator above, this reduces the annual loss expectancy by 90%, from $30,000 per year to $3,000 per year. That is, there will be a saving in the cost of ownership of $27,000 per year. Thus, if the cost of the improvement is $100,000, it will nominally take 3.7 years to pay back. The formula is:

$$\text{Payback Period (years)} = \frac{\text{Recommendation Cost}}{\Delta\,\text{Annual Loss Expectancy} - \text{Maintenance Cost p.a.}}$$

So in the above case:

$$\text{Payback Period (years)} = \frac{\$100{,}000}{\$27{,}000\ \text{p.a.} - \$100\ \text{p.a.}} \approx 3.7\ \text{years}$$


8.3.1 Property Loss Calculator - Definition of Terms

Backg'd Event Freq. (p.a.): This is the expected fire frequency associated with the event, for example a fire in a warehouse.
Hot Spot Freq. (p.a.): This is an assessment of unusual items which add a particular event frequency beyond the normal, background frequency, for example an internal petrol bowser.
Total Event Freq. (p.a.): The sum of Background and Hot Spot Frequency.
Years Between Events: The reciprocal of Total Event Frequency.
Asset Damage: An estimate of the expected property damage.
Business Interruption: An estimate of the expected loss of profits.
Severity (PD + BI): The sum of Asset Damage and Business Interruption.
Rec. Maint. Cost $: An estimate of the cost of maintaining the recommendation per year. This needs to include any potential losses associated with the proposed solutions. For example, in-rack sprinklers might be struck once a year by forklifts, causing $10,000 damage on each occasion.
Rec. Effectiveness %: An estimate of the control effectiveness of the proposed risk control solution. It can be either a frequency reduction or a severity reduction or both.
Pre-Rec. Loss Expectancy: This is the annual loss expectancy and is the product of the Total Event Frequency and the Severity.
Post-Rec. Loss Expectancy: This is the revised annual loss expectancy after the recommendation has been implemented. It is the Pre-Recommendation Loss Expectancy reduced by the Recommendation Effectiveness.
∆ Annual Loss Expectancy: This is the difference between the Pre- and Post-Recommendation Loss Expectancy.
Payback Period (years): This is equal to:

$$\text{Payback Period (years)} = \frac{\text{Recommendation Cost}}{\Delta\,\text{Annual Loss Expectancy} - \text{Maintenance Cost p.a.}}$$

This excludes any discounted cash flow considerations, which do not seem to be important for projects that have a payback of 3 years or less.
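The calculator definitions translate directly into a few derived quantities. The following sketch assumes effectiveness is entered as a fraction and reproduces the sprinkler example from Section 8.3; the class name `PropertyLossCalculator` and its field names are illustrative, and treating a non-positive net saving as "never pays back" is an added assumption, not part of the original calculator.

```python
from dataclasses import dataclass


@dataclass
class PropertyLossCalculator:
    """Sketch of the property loss payback calculator (terms as defined in 8.3.1)."""
    background_freq: float        # Backg'd Event Freq. (p.a.)
    hot_spot_freq: float          # Hot Spot Freq. (p.a.)
    asset_damage: float           # $
    business_interruption: float  # $
    capital_cost: float           # Rec. Capital Cost $
    maint_cost: float             # Rec. Maint. Cost $ p.a. (including losses the solution introduces)
    effectiveness: float          # Rec. Effectiveness as a fraction, e.g. 0.9 for 90%

    @property
    def total_freq(self):
        return self.background_freq + self.hot_spot_freq

    @property
    def years_between_events(self):
        return 1.0 / self.total_freq

    @property
    def severity(self):
        return self.asset_damage + self.business_interruption

    @property
    def pre_rec_loss_expectancy(self):
        return self.total_freq * self.severity

    @property
    def post_rec_loss_expectancy(self):
        return self.pre_rec_loss_expectancy * (1.0 - self.effectiveness)

    @property
    def delta_loss_expectancy(self):
        return self.pre_rec_loss_expectancy - self.post_rec_loss_expectancy

    @property
    def payback_years(self):
        net_saving = self.delta_loss_expectancy - self.maint_cost
        # Assumption: a recommendation whose net saving is not positive never pays back.
        return self.capital_cost / net_saving if net_saving > 0 else float("inf")


sprinklers = PropertyLossCalculator(
    background_freq=0.01, hot_spot_freq=0.0,
    asset_damage=2_000_000, business_interruption=1_000_000,
    capital_cost=100_000, maint_cost=100, effectiveness=0.9,
)
print(f"{sprinklers.pre_rec_loss_expectancy:,.0f} -> {sprinklers.post_rec_loss_expectancy:,.0f} "
      f"($ p.a.), payback {sprinklers.payback_years:.1f} years")
```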


8.4 Integrated Investment Ranking

Capital investment proposals are often focused on new projects or schemes. But projects which improve reliability or reduce risk can provide superior investments. To properly assess and compare different capital works projects, an integrated assessment process is needed. Such a payback assessment system should also:

• Establish a balanced investment program.
• Rank projects to provide the maximum rate of return.
• Assess the cost of providing a specified level of service.

A concept model is shown below.

[Figure: The Benefits arising from a Solution to a Perceived Problem are: Commercial Benefits; PR, Image, Morale Value; Savings in Maintenance Costs; Reduction in Risk (Loss Expectancy). They are Calculated as an Investment Ratio or Years Payback Value.]

Benefit Model

Of the four forms of benefit identified in the figure above, determining dollar values for Commercial Benefits and Maintenance Savings is relatively straightforward. However, determining dollar benefits for Public Relations, Corporate Morale, Image and Reduction in Loss Expectancy is more complex. Results of any investment assessment must be presented in ways that senior management can understand, that is, financially based and on clean, crisp pieces of paper. In our experience, senior managers and directors do not respond to computer screens. A spreadsheet example of a possible layout is included at the conclusion of this chapter.


New Company Pty Ltd
Project Investment Summary

Project Description (Years Payback: 1.59 yrs)
A problem exists with a lack of oil traps on the storm water drains. This means that any transformer that leaks will release oil directly to the creek. The proposal is to install oil traps on each sub-station.

Investment Overview
Cost: Design $12,500; Labour $86,000; Materials $45,000; Contingency $14,350; Total Cost $157,850
Return (Summary over): Commercial Return $0 pa; Maintenance Saving ($1,000) pa; PR Benefit $500 pa; Risk Saving $99,954 pa; Total Return $99,454 pa

[This is for a photograph]


Commercial Return: $0 p.a. No commercial prospects noted.
Maintenance Saving: ($1,000) p.a. Will cost $1,000 per year to maintain.
Risk Saving: $99,954 p.a.

Risk Saving Calculation
Event Frequency per year: 2.00
Years between events: 0.50
Consequence Severity:
  Asset Damage: $0
  Business Interruptions: $0
  Clean Up Cost: $5,000
  Legal Cost: $3,000
  Fines: $2,000
  Management Stress Cost: $10,000
  Public Relations Damage: $30,000
  Total Severity: $50,000
Project Effectiveness: 99.95%

Public Relations Benefits p.a. Comments: Small benefit to locals but no real positives.

Total cost to the organisation is 2 × $50,000 p.a., or $100,000 p.a. PR Damage equals the cost to restore the Organisation's good name. Effectiveness: the only time when the oil traps won't work is during a raging storm, say 10 hours out of 8,760 hours per year.

$$\text{Project Effectiveness} = \frac{8760 - 10}{8760} \approx 99.89\%$$
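The project summary arithmetic can be checked with a short sketch. It totals the cost and return items from the summary and computes the simple payback of about 1.59 years. It also evaluates the stated effectiveness formula: (8760 - 10)/8760 is about 99.89%, slightly below the tabulated 99.95%, and the corresponding risk saving is about $99,886 p.a. rather than the tabulated $99,954 p.a., so the payback below uses the tabulated return. The function names are hypothetical.

```python
HOURS_PER_YEAR = 8760


def effectiveness_from_outage(hours_ineffective):
    """Fraction of the year the control works, given the hours per year it does not."""
    return (HOURS_PER_YEAR - hours_ineffective) / HOURS_PER_YEAR


def risk_saving(event_freq, severity_items, effectiveness):
    """Annual risk saving: event frequency x total severity x project effectiveness."""
    return event_freq * sum(severity_items.values()) * effectiveness


def payback_years(cost_items, annual_returns):
    """Simple payback: total investment cost / total annual return (no discounting)."""
    return sum(cost_items.values()) / sum(annual_returns.values())


severity = {
    "Asset damage": 0, "Business interruption": 0, "Clean up": 5_000,
    "Legal": 3_000, "Fines": 2_000, "Management stress": 10_000,
    "Public relations damage": 30_000,
}
costs = {"Design": 12_500, "Labour": 86_000, "Materials": 45_000, "Contingency": 14_350}
returns = {  # $ p.a., as tabulated in the summary
    "Commercial": 0, "Maintenance saving": -1_000, "PR benefit": 500, "Risk saving": 99_954,
}

eff = effectiveness_from_outage(10)
print(f"Severity ${sum(severity.values()):,}; effectiveness {eff:.2%}; "
      f"risk saving at that effectiveness ${risk_saving(2.0, severity, eff):,.0f} p.a.")
print(f"Total cost ${sum(costs.values()):,}; payback {payback_years(costs, returns):.2f} years")
```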


READING

Anderson K J, Robinson R M and Hyland D (1992). Ranking of Infrastructure Renewals Taking into Account the Business Requirements of the Railway. CompRail '92 Conference, Washington.
Browning R L (1980). The Loss Rate Concept in Safety Engineering. Marcel Dekker, USA.
Lees F P (1995). Loss Prevention in the Process Industries. 2nd Edition. Butterworth-Heinemann Ltd, Oxford, UK. (3 Volumes).
Robinson R M, Kennedy J R and Beattie T (1995). Risk Based Investment Ranking.
Viner D B L (1991). Accident Analysis and Risk Control. VRJ Information Systems, Melbourne. ISBN 0 646 02009 9. Table is on page 132.


Item No.: 1
Proposed control measure: Provide foam for head bump potential
Trials per time unit per person: 2
Time units p.a.: 240
People per shift: 10
Shifts: 2
Exposure (trials p.a.): 9,600
Probability per trial: 0.0001
Injury frequency (p.a.): 0.96
Severity rating (days lost): 25
Risk (days lost p.a.): 24
Control effectiveness (%): 90
Control cost ($): 100
Payback score: 21.6
Rank Order Priority: 1
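The worksheet row can be reproduced step by step: exposure is the product of the first four numeric columns, injury frequency is exposure times probability per trial, and risk is frequency times severity. The printed payback score of 21.6 is consistent with risk × effectiveness (%) ÷ control cost; that formula is inferred from the numbers rather than stated in the text, and `worksheet_row` is an illustrative name.

```python
def worksheet_row(trials_per_unit, units_pa, people_per_shift, shifts,
                  prob_per_trial, severity_days, effectiveness_pct, control_cost):
    """Reproduce one row of the ranking worksheet.

    The payback score formula (risk x effectiveness % / cost) is inferred from
    the worksheet's numbers, not stated in the text.
    """
    exposure = trials_per_unit * units_pa * people_per_shift * shifts  # trials p.a.
    injury_freq = exposure * prob_per_trial                            # injuries p.a.
    risk = injury_freq * severity_days                                 # days lost p.a.
    payback = risk * effectiveness_pct / control_cost
    return {"exposure": exposure, "injury_freq": injury_freq, "risk": risk, "payback": payback}


row = worksheet_row(trials_per_unit=2, units_pa=240, people_per_shift=10, shifts=2,
                    prob_per_trial=0.0001, severity_days=25,
                    effectiveness_pct=90, control_cost=100)
print({name: round(value, 4) for name, value in row.items()})
```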


9. Modelling Techniques

There are a variety of analytical methods for risk and reliability modelling of the pure risk of technical systems documented in a range of standards and codes. They are especially applicable to the analysis of computer systems (functional safety assessment), which now appear to be a substantive component of any significant infrastructure control system. The ones that the authors have used successfully are shown below and will be discussed in this chapter.

Trees: Fault Trees; Success Trees; Event Trees (Consequence Trees); Dependency Trees
Blocks: Reliability Block Diagrams; Dependence Block Diagrams; Blocks vs Trees
Integrated Presentation Diagrams: Cause-Consequence Diagrams; Threat-Barrier Diagrams; Venn ('Swiss Cheese') Diagrams

List of Modelling Techniques and Presentation Methods

The choice mostly relates to the nature of the problem under investigation and the requirements of the audience to whom the analysis is being addressed. The integrated presentation diagrams, as the name suggests, are generally more palatable to the public and the courts as they provide the most pictorial representation of the subject. However, analytical technical people generally prefer to use trees and block diagrams, at least for the initial analysis. A summary of the mathematics required to support these pure risk-modelling techniques is contained in Chapter 12, which also contains a summary of the mathematics used for modelling market (speculative) risk.


9.1 Trees

The heart of decision trees is the assumption that truly independent variables contribute to occurrences and outcomes. That is, what independent things must conspire together to bring about an event, and having occurred, what are the possible outcomes? The general structure of such models was established in 1975 with the publication of the US Reactor Safety Study, known as WASH-1400 and formally entitled An Assessment of Accident Risks in U.S. Commercial Nuclear Power Plants (Reason, 1990). The basic steps are:

i) Identify sources of potential hazard.
ii) Identify the events that could initiate such a hazard occurring (fault trees).
iii) Establish the possible sequence of events that could result from such occurrences (event trees).
iv) Quantify in probability and frequency terms the likelihood of ii) and iii).
v) Determine the overall risk by aggregating all the known quantified hazards.

The difficulty is in determining the input numbers and ensuring that there are no common inputs or processes that are affected simultaneously by one external factor.

9.1.1 Fault Trees

The time sequence concept can be extended in several different ways using probabilistic concepts. A "fault" tree is effectively a statement of what events have to conspire together to bring about an undesired outcome. Traditionally these have been drawn top-down, and the undesired event is therefore known as the "top event". Because of the logical hierarchy of the items, it can be seen as a form of time sequence going from the bottom towards the top of the page.

[Figure: Light Fails (3.002 p.a.) = OR of Bulb Burnt Out (2 p.a.), Fuse Failure (2 × 10^-3 p.a.) and Power Failure (1 p.a.), where Fuse Failure = OR of Incorrectly Set (1 × 10^-3 p.a.) and Power Surge (1 × 10^-3 p.a.).]

A 'Fault' Tree

The fault tree leads to the conclusion that, to minimise the likelihood of light failure, minimising the likelihood of bulb burnout provides the greatest contribution. The success tree in Section 9.1.2 indicates that, to maximise light availability, improving bulb operability is more effective than improving any other aspect. That is, both trees lead to the same general conclusion if the top event is similarly defined. From the risk engineer's perspective, the reliability engineer has a distinct advantage: the outcome of the success tree, the "top event", is defined in terms of what makes the system operate to its specification, perhaps its availability; its success objective. The failures are all grouped together and contained in the idea of "unavailability", irrespective of whether the failure is due to a breakdown (in the vast majority of instances) or to a failure that constitutes a risk.
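The arithmetic behind both light bulb trees is short. Assuming independent items, an OR gate on event frequencies adds the rates, and an AND gate in a success tree multiplies availabilities. The sketch below uses the values as read from the two figures (assigning 2 p.a. to bulb burnout and 1 p.a. to power failure, consistent with the statement that bulb burnout dominates); it gives 0.94886 for light availability, which the success tree figure shows truncated to 0.9488. The helper names are illustrative.

```python
import math


def or_gate_rate(*rates):
    """OR gate on event frequencies: rates of independent event streams add."""
    return sum(rates)


def and_gate_availability(*availabilities):
    """AND gate in a success tree: availabilities of independent items multiply."""
    return math.prod(availabilities)


# Fault tree (frequencies p.a., as read from the figure)
fuse_failure = or_gate_rate(1e-3, 1e-3)             # incorrectly set, power surge
light_fails = or_gate_rate(2.0, fuse_failure, 1.0)  # bulb burnt out, fuse failure, power failure

# Success tree (availabilities, as read from the figure)
fuse_operational = and_gate_availability(0.9999, 0.9999)                # correctly set, fuse available
light_available = and_gate_availability(0.999, 0.95, fuse_operational)  # power, bulb, fuse

print(f"Light fails {light_fails:.3f} p.a.; bulb burnout share {2.0 / light_fails:.0%}")
print(f"Light available {light_available:.5f}; bulb unavailability {1 - 0.95:.2f}")
```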


9.1.2 Success Trees

It seems that because of this, reliability engineers conceptually prefer "success" tree analysis to "fault" tree analysis. The concept is similar but the Boolean mathematics in the construction of the tree is reversed ('or' gates become 'and' gates) because of this focus on availability (the desired outcome) rather than the fault (the undesired outcome). Reconsidering the light bulb fault tree example:

[Figure: Light Available (0.9488) = AND of Power Available (0.999), Bulb Operational (0.95) and Fuse Operational (0.9998), where Fuse Operational = AND of Correctly Set (0.9999) and Fuse Available (0.9999).]

A 'Success' Tree

9.1.3 Event Trees (Consequence Trees)

An event tree is a similar device, except that it answers the questions associated with a particular event occurring with several possible outcomes. These traditionally have also been drawn top-down, although in this case the time arrow moves from the top of the page towards the bottom, as shown below.

[Figure: Fire Start Frequency: 100 fires p.a. Sprinklers Effective? Yes (0.95): 95 controlled fires p.a.; No (0.05): 5 large fires p.a. Outcome Frequencies.]

An 'Event' (or 'Outcome') Tree
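An event tree multiplies the initiating frequency by the conditional probability at each branch point. The following sketch, built around a hypothetical helper `event_tree`, reproduces the sprinkler example and would accept further branch points in sequence.

```python
def event_tree(initiating_freq, branch_points):
    """Propagate an initiating frequency through sequential yes/no branch points.

    branch_points: list of (question, probability of "Yes").
    Returns {tuple of (question, answer) pairs: outcome frequency}.
    """
    outcomes = {(): initiating_freq}
    for question, p_yes in branch_points:
        next_outcomes = {}
        for path, freq in outcomes.items():
            next_outcomes[path + ((question, "Yes"),)] = freq * p_yes
            next_outcomes[path + ((question, "No"),)] = freq * (1.0 - p_yes)
        outcomes = next_outcomes
    return outcomes


fires = event_tree(100.0, [("Sprinklers effective?", 0.95)])
for path, freq in fires.items():
    print(path, f"{freq:.1f} fires p.a.")
```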


9.1.4 Dependency Trees

The block diagram technique is powerful because it agglomerates all the detailed failure or reliability data into a single communicative overview at a system level, something most of the other techniques fail to achieve. A dependency tree for an airline business is shown below. The likelihood of achieving the top objective could be assessed from the reliability of simultaneously achieving each of the sub-objectives.

[Figure: top objective Flying paying passengers, dependent on Passengers; Serviceable Airports; Trained Aircrew; Serviceable Aircraft; Reservations Systems; with further dependencies including Passenger Terminals; Trains, taxis, car parks; Trained Operators; Computers & Software.]

Airline Dependency Tree

Such dependency trees appear to be particularly useful for critical infrastructure assessments using the threat and vulnerability technique (Chapter 7.3).
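If the sub-objectives are independent and all are needed, the likelihood of achieving the top objective is the product of their individual likelihoods, as with an AND gate. The sketch below assumes a particular nesting of the airline sub-objectives and uses made-up probabilities; both the structure and the numbers are illustrative only.

```python
import math

# Hypothetical dependency tree: a node is (name, probability) for a leaf or
# (name, [children]) for an objective that needs all of its children.
# The nesting and every probability here are illustrative assumptions.
airline = ("Flying paying passengers", [
    ("Passengers", 0.99),
    ("Serviceable airports", [
        ("Passenger terminals", 0.999),
        ("Trains, taxis, car parks", 0.995),
    ]),
    ("Trained aircrew", 0.995),
    ("Serviceable aircraft", 0.98),
    ("Reservations systems", [
        ("Trained operators", 0.998),
        ("Computers & software", 0.99),
    ]),
])


def achievement_probability(node):
    """Probability of achieving a node's objective, assuming independent sub-objectives."""
    _name, content = node
    if isinstance(content, list):
        return math.prod(achievement_probability(child) for child in content)
    return content


print(f"P(top objective achieved) = {achievement_probability(airline):.4f}")
```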


9.2 Blocks

9.2.1 Reliability Block Diagrams

Block diagrams are a simple way of representing complex systems diagrammatically. They can be used for both risk and reliability studies. The key concept is to divide the system or process under consideration into sub-systems that are independent of each other and which all the interested parties can see pictorially and agree represent the system as a whole. (This is definitely art and not science.) It is absolutely critical that as many interested parties as possible participate in and sign off the block diagram, as any modelling done is on the basis that the block diagram is an accurate representation of reality for the particular study. For reliability work the representation will depend on the definition of success or failure (usually in terms of availability) adopted for the system. If it has multiple definitions (usually associated with alternate operating modes), separate diagrams may be required for each. There are four basic configurations (BS 5760: Part 2:1994), namely series, parallel (active redundant), m out of n units and cold standby. These are shown below:

[Figure: Series System (blocks A and B in series to Output); Parallel or Active Redundant System (blocks S and T); Two Out of Three System (blocks X, Y and Z); Cold Standby (blocks P and Q).]

Each block could be further reduced to other block diagrams. The block diagram technique is powerful because it agglomerates all the detailed reliability data into a single communicative overview at the system level, something most of the other techniques fail to achieve.
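The four basic configurations have standard reliability expressions for independent blocks. The sketch below is illustrative: the two-out-of-three formula allows unequal block reliabilities, and the cold standby expression assumes two identical units with a constant failure rate, perfect switching and no failures while in standby.

```python
import math


def series(*reliabilities):
    """All blocks must work."""
    return math.prod(reliabilities)


def parallel(*reliabilities):
    """Active redundancy: at least one block must work."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


def two_out_of_three(rx, ry, rz):
    """At least two of the three blocks must work."""
    return rx * ry + rx * rz + ry * rz - 2.0 * rx * ry * rz


def cold_standby(failure_rate, mission_time):
    """One duty unit plus one identical cold standby unit.

    Assumes a constant failure rate, perfect switching and no failures while in standby.
    """
    lt = failure_rate * mission_time
    return math.exp(-lt) * (1.0 + lt)


r = 0.9
print(series(r, r), parallel(r, r), two_out_of_three(r, r, r))
print(cold_standby(failure_rate=1e-3, mission_time=1_000))
```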


9.2.2 Dependence Block Diagrams

A reliability block diagram is, in fact, a success block diagram. It describes what elements have to work in order to get a successful output. Just as fault trees have a logical opposite in success trees, there are also fault block diagrams, generally known as dependence diagrams (SAE ARP 4761). The figure below shows the equivalent dependence diagram for the RBD in Section 9.2.3, with all relevant failure paths.

[Figure: blocks Failure A, Failure B, Failure C, Failure D and Failure E.]

Sample Dependence Diagram

Dependence diagrams are particularly useful for analysing fault trees and checking both the logic and the mathematics, since they can easily be drawn on a spreadsheet. In fact, the dependence diagram represents the cut sets of a fault tree, which together describe all the ways the top event in the fault tree can be true.

9.2.3 Blocks vs Trees

Block diagrams and success trees (and therefore fault trees) are mathematically interchangeable. The choice between the two techniques (or the use of both) depends on the scope of the analysis and presentation needs. The advantage of block diagrams is the simplicity of high-level presentations. The advantage of fault trees is the mathematical convenience of modelling a large number of inputs using, for example, spreadsheets.

[Figure: Success A and Success B in parallel, in series with Success C, in series with Success D and Success E in parallel, giving the Outcome.]

Sample Reliability Block Diagram

This can be redrawn as a fault tree.

[Figure: Failure H (Failure F, C or G) = OR of Failure F, Failure C and Failure G, where Failure F (Failure A & B) = AND of Failure A and Failure B, and Failure G (Failure D & E) = AND of Failure D and Failure E.]

Sample Fault Tree
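The claim that block diagrams and fault trees are mathematically interchangeable can be checked numerically for the sample diagrams. The sketch assumes independent basic events with hypothetical failure probabilities, evaluates the reliability block diagram and the fault tree separately, and compares one minus the RBD reliability with the fault tree's top event probability.

```python
# Hypothetical failure probabilities for the independent basic events A to E.
q = {"A": 0.05, "B": 0.05, "C": 0.001, "D": 0.02, "E": 0.03}
r = {name: 1.0 - p for name, p in q.items()}  # success probabilities


def rbd_success(r):
    """Reliability block diagram: (A parallel B), then C, then (D parallel E), in series."""
    ab = 1.0 - (1.0 - r["A"]) * (1.0 - r["B"])
    de = 1.0 - (1.0 - r["D"]) * (1.0 - r["E"])
    return ab * r["C"] * de


def fault_tree_top(q):
    """Top event H = F or C or G, with F = A and B, G = D and E.

    The OR gate is evaluated exactly as 1 - product of complements,
    which assumes its inputs are independent.
    """
    f = q["A"] * q["B"]
    g = q["D"] * q["E"]
    return 1.0 - (1.0 - f) * (1.0 - q["C"]) * (1.0 - g)


print(f"RBD unreliability {1.0 - rbd_success(r):.6e}, fault tree top event {fault_tree_top(q):.6e}")
```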


9.3 Integrated Presentation Models

9.3.1 Cause-Consequence Models

Fault and event trees can be put together as shown below as a combined fault and event tree or, more elegantly, a cause-consequence diagram (Lees, 1995).

[Figure: Manifest Threat & Failed Precaution lead to Loss of Control (fault trees); Vulnerability then determines Hit or Miss (event trees).]

Concept 'Cause Consequence' Diagram

In a complex situation, a major difficulty is usually encountered in selecting the precise point of the loss of control event in such a cause-consequence diagram. In theory at least, it could be anywhere along the chain. A useful solution to this difficulty for a risk engineer is to use an energy damage model approach (Viner, 1991) and to say that the event is the point at which control of the potentially damaging energy is lost. As emphasised in Chapter 4.4, Due Diligence, the loss of control point is very important legally. It is always better to prevent the problem, either by eliminating the threat or enhancing the precautions, than to try to recover the situation after control is lost. R2A has tested this with numerous lawyers on many occasions.

For example, with regard to airspace collision risk, it is the point at which the two aircraft collision envelopes overlap. That is, they become so close that the pilots cannot avoid each other; they have lost control of their kinetic energy (Chapter 15.1). It does not mean that they will collide. In fact, the collision envelope is large compared to the aircraft. It is just that the pilots have lost control over the outcome.

The loss of control point is not always obvious. For example, in an analysis for an electrical authority with high voltage transmission lines, the point of loss of control of energy was when someone or something penetrated the flashover envelope of the high voltage conductor (Chapter 15.4). That is, even if a fishing pole on the back of a vehicle enters this region, a flashover with fatal results to the occupants may not occur. The occupants might be insulated from the road, or it may be a very dry day and the actual envelope a little smaller than usual.

The loss of control point for fire in a tunnel appears to be the fire size that overwhelms the usual air handling system (Chapter 15.6). There are several arguments for this. The simplest, legally, probably revolves around confined spaces. The tunnels should only have sweet, decent air whenever they are occupied, even during a fire/smoke incident; otherwise they would be considered a confined space. Emergency ventilation to prevent a situation becoming a confined space is an attempt to restore control and acts after the event.

For level crossings, it is the point at which the vehicle approaching the level crossing has inadequate stopping distance. An example of a cause-consequence diagram for an inadequate stopping distance at a level crossing can be seen below. Fully describing a cause-consequence model requires three parameters: threat likelihood, precaution failure probability and the hit and miss balance (degree of vulnerability).


In terms of due diligence, the lawyers/courts always focus on the prevention side first. Trying to restore control after the event is always difficult. This parallels the OHS hierarchy of controls: elimination/engineering, administration and PPE (personal protective equipment). The latter can only be adopted if the other options are not viable. Viable in this sense seems to mean the common law test of negligence, that is, the balance of the significance of the risk versus the effort required to reduce it. Cause-consequence models invariably demonstrate that control before the loss of control point is the only way to reliably prevent large scale multiple life loss scenarios when large energies and many people are involved. In practice, in ensuring no loss of control, at least three assessment levels of precautions need to be considered:

i) Not less safe: comparison with the current situation.
ii) Best practice: what other organisations and comparable industries do to manage similar threats.
iii) As low as reasonably practicable: the balance of the significance of an additional precaution of defined safety integrity level versus its cost (a legally difficult process).

[Figure: Cause side: Failure to detect train (1.00E-06) = AND of Crossing lights not seen (1.00E-03), Train not seen (1.00E-02) and Train not heard (1.00E-01). Failure to apply brakes (1.10E-05) = OR of Failure to detect train and Car driver dysfunctional (1.00E-05). Loss of Control, stopping distance inadequate (2.10E-05) = OR of Failure to apply brakes and Road/Braking system fails (1.00E-05). Consequence side: Collision? No (0.9): Near miss 1.89E-05. Yes (0.1): Severe? No (0.1): Injury/Damage 2.10E-07; Yes (0.9): Extension? No (0.99): Vehicle deaths 1.87E-06; Yes (0.01): Train deaths 1.89E-08. Check Sum: 2.10E-05.]

Conditional Cause-Consequence diagram for an inadequate stopping distance for a level crossing
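The level crossing diagram can be recomputed to show how the three parameters combine: the cause side gives the loss of control frequency, and the conditional branches split it into outcomes whose check sum equals the loss of control value. The gate structure and the pairing of values with labels are reconstructed from the figure and should be treated as an interpretation; the OR gates use the rare-event sum, which is how the figure's numbers appear to have been combined.

```python
# Cause side: values and gate structure as reconstructed from the figure (an interpretation).
crossing_lights_not_seen = 1e-3
train_not_seen = 1e-2
train_not_heard = 1e-1
car_driver_dysfunctional = 1e-5
road_braking_system_fails = 1e-5

failure_to_detect_train = crossing_lights_not_seen * train_not_seen * train_not_heard  # AND
failure_to_apply_brakes = failure_to_detect_train + car_driver_dysfunctional            # OR (rare-event sum)
loss_of_control = failure_to_apply_brakes + road_braking_system_fails                  # OR (rare-event sum)

# Consequence side: conditional probabilities at each branch point.
p_collision = 0.1
p_severe = 0.9      # vehicle deaths given a collision
p_extension = 0.01  # deaths extend to the train given vehicle deaths

outcomes = {
    "Near miss": loss_of_control * (1 - p_collision),
    "Injury/damage": loss_of_control * p_collision * (1 - p_severe),
    "Vehicle deaths": loss_of_control * p_collision * p_severe * (1 - p_extension),
    "Train deaths": loss_of_control * p_collision * p_severe * p_extension,
}

print(f"Loss of control {loss_of_control:.2e}")
for outcome, value in outcomes.items():
    print(f"{outcome:15s} {value:.2e}")
print(f"Check sum {sum(outcomes.values()):.2e}")
```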

[Figure: Advance crossing warning failure; Train detection failure; Driver fails to actuate brakes; Stopping system fails; LOC stopping distance inadequate; Stopping distance inadequate; Scrunch; Deaths/injury/damage; Coroner's inquiry.]

Cause Consequence Diagram of a Level Crossing

One of the primary advantages of cause-consequence models is that they can readily be prepared on spreadsheets, with the border tool drawing the lines. (It is necessary to include four cells for a particular item so that the line can come from the centre.) Spreadsheets have become ubiquitous: everyone can use them and share the model.


9.3.2 Threat Barrier Diagrams

From the authors' perspective, threat barrier diagrams are another representation of cause-consequence models, as drawn on a drafting package. They can be particularly useful in showing barriers that have effects on multiple threats, such as that shown below for the tunnel case study (Chapter 15.6).

[Figure: threats Fire in Car; Fire in Heavy Commercial Vehicle; DG Fire; Traffic Congestion. Barriers include Prohibited vehicle enforcement, Auto Deluge System, Manual Fire Control, Emergency Ventilation and Emergency Evacuation. Loss of Control: fire in vehicle in stalled traffic greater than 5 MW. Outcome: Deaths, injury and damage.]

Sample Threat Barrier Diagram for Fire in a Road Tunnel

9.3.3 Venn (Swiss Cheese) Diagrams

Venn diagram models are graphical representations of AND and OR gates. These are expanded in more detail in Chapter 12, Mathematics. James Reason's use of this model type has provided the name "Swiss Cheese".

[Figure: Venn Diagram Model of the Series of Failures Required for a Mid-Air Collision. Sets: traffic density; radar option; separation/segregation; see and avoid. Regions: near miss; mid-air collision.]
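The overlaps in a Venn model correspond to AND gates and the unions to OR gates. This minimal sketch assumes the events are independent (section 9.4 explains why that assumption must be checked); the barrier names follow the figure, but the probabilities are hypothetical.

```python
from math import prod


def and_gate(probabilities) -> float:
    """All events occur: the overlap of the Venn sets (assumes independence)."""
    return prod(probabilities)


def or_gate(probabilities) -> float:
    """At least one event occurs: the union of the Venn sets (assumes independence)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical per-encounter failure probabilities, for illustration only.
barrier_failures = {
    "separation/segregation lost": 0.01,
    "radar option unavailable": 0.1,
    "see and avoid fails": 0.1,
}

p_collision = and_gate(barrier_failures.values())
p_any_barrier_fails = or_gate(barrier_failures.values())
print(f"P(mid-air collision per encounter) = {p_collision:.1e}")
print(f"P(at least one barrier fails)      = {p_any_barrier_fails:.4f}")
```

A collision needs every hole in the cheese to line up, so its probability is far smaller than the chance that some single barrier fails.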


9.4 Common Mode and Cause Failures

The validity of any of these models rests on the independence of the inputs and failure mechanisms: each must be completely independent of all the others. If a single outside process can affect two inputs simultaneously, the model is compromised by what is termed a common mode or common cause failure. Smith D (1993) distinguishes between the two, and the distinction can be important, especially with diverse redundant systems. Common mode usually refers to an event such as a fire or power outage that can damage both systems simultaneously. Common cause refers to matters such as a software misspecification. The hardware may be diverse and the software written by different contractors using alternative software, but the built-in error will be faithfully repeated by both systems: a common cause failure.
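The effect can be illustrated with a deliberately simple model in which a redundant pair fails either through a single common mode/cause event that takes out both units at once, or through two independent unit failures. This is a minimal sketch under that assumption; the function name and the probabilities are hypothetical.

```python
def pair_failure_probability(p_unit: float, p_common: float = 0.0) -> float:
    """Probability that both units of a redundant pair fail.

    p_unit   -- independent failure probability of each unit
    p_common -- probability of a single event that fails both units together
    """
    both_fail_independently = p_unit ** 2
    # Either the common event occurs, or it does not and both units fail anyway.
    return p_common + (1.0 - p_common) * both_fail_independently


independent_only = pair_failure_probability(0.01)
with_common_mode = pair_failure_probability(0.01, p_common=0.001)
print(f"independent units only: {independent_only:.2e}")
print(f"with common mode term:  {with_common_mode:.2e}")
```

Even a small common term dominates: here the pair is roughly eleven times more likely to fail than the independence assumption suggests.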

[Figure: A Redundant System. Inputs feed an accounting system and an auditing system in parallel to produce outputs; common cause failures act on the inputs and common mode failures act on both systems.]

A common cause failure occurs when both systems fail because of a flawed input that each of the diverse systems processes incorrectly. A common mode failure is a simultaneous failure of both systems due to an external agency, for example, a fire or corruption.

9.5 Human Error Rates

Key references in the field of human reliability assessment (HRA) include the seminal US Nuclear Reactor Safety Study (1975), Lees (1995) and Swain (1983). Numerous techniques, including HEART (Human Error Assessment and Reduction Technique) and THERP (Technique for Human Error Rate Prediction), are described by Villemeur (1992) and Kirwan (1994), and more recent publications by Leveson (1995), Storey (1996) and Redmill (1997) also draw attention to the subject. The following figures for the error rates of people performing different tasks come from the 1975 US Nuclear Reactor Safety Study. Errors of commission and errors of omission differ, but the figures below have proven remarkably robust in work undertaken by R2A, including work on air and sea pilots, car and train drivers and industrial situations generally.

Type of Activity                                              Probability of Error per Task
Critical Routine Task (tank isolation)                        0.001
Non-Critical Routine Task (misreading temperature data)       0.003
Non Routine Operations (start up, maintenance)                0.01
Check List Inspection                                         0.1
Walk Around Inspection                                        0.5
High Stress Operations; Responding after major accident
  - first five minutes                                        1
  - after five minutes                                        0.9
  - after thirty minutes                                      0.1
  - after several hours                                       0.01

Human Error Rates

(Source: US Atomic Energy Commission Reactor Safety Study, 1975)
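As an illustration of how the table might be used, the chance that an error in a critical routine task (0.001) is also missed by a check list inspection (0.1) is the product of the two, provided the errors are independent. This is a minimal sketch under that independence assumption, which section 9.4 warns can be optimistic. Treating the inspection figures as the probability that the check misses an existing error is also an assumption.

```python
# Probability of error per task, from the Reactor Safety Study table above.
HUMAN_ERROR_RATES = {
    "critical routine task": 0.001,
    "non-critical routine task": 0.003,
    "non-routine operation": 0.01,
    "check list inspection": 0.1,
    "walk around inspection": 0.5,
}


def undetected_error(task: str, check: str) -> float:
    """P(the task is done wrongly AND the check misses it), assuming independence."""
    return HUMAN_ERROR_RATES[task] * HUMAN_ERROR_RATES[check]


for check in ("check list inspection", "walk around inspection"):
    p = undetected_error("critical routine task", check)
    print(f"critical routine task + {check}: {p:.0e}")
```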


Smith D (1993) summarises various sources. The following is an extract from this reference.

Type of Activity                                                  Probability of Error per Task
Simplest Possible Task
  Overfill Bath                                                   0.00001
  Fail to isolate supply (electrical work)                        0.0001
  Fail to notice major cross roads                                0.0005
Routine Simple Task
  Read checklist or digital display wrongly                       0.001
  Set switch (multiposition) wrongly                              0.001
Routine Task with Care Needed
  Fail to reset valve after some related task                     0.01
  Dial 10 digits wrongly                                          0.06
Complicated Non-routine Task
  Fail to recognise incorrect status in roving inspection         0.1
  Fail to notice wrong position on valves                         0.5

Human Error Rates

(Source: Smith DJ 1993)

A coarse summary is that human errors in trained tasks typically occur at a rate of 1 in 100 per demand, checklist errors are notorious (1 in 10) and even critical tasks can show error rates of 1 in 1000. For example, recent Watchdog monitoring of several thousand train orders found a handful of mistakes, none critical in itself, but suggesting a human error probability of 2 in 1000. Similarly, following the successful (failure-free) testing of some 529 combinations of the software interlocking rules, the failure probability per demand at 95% confidence, according to Annex L of IEC 61508, is at most 3/529 ≈ 5.7 in 1000.
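The 3/529 figure is the familiar "rule of three": if n demands have all succeeded, an approximate 95% upper confidence bound on the failure probability per demand is 3/n. The exact binomial bound instead solves (1 - p)^n = 0.05 for p. This minimal sketch compares the two; the function names are illustrative.

```python
def zero_failure_upper_bound(n_demands: int, confidence: float = 0.95) -> float:
    """Exact upper bound on failure probability per demand after n failure-free demands.

    Solves (1 - p) ** n_demands = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_demands)


def rule_of_three(n_demands: int) -> float:
    """Approximation to the 95% bound above, valid for large n."""
    return 3.0 / n_demands


n = 529
print(f"exact 95% bound: {1000 * zero_failure_upper_bound(n):.2f} in 1000")
print(f"rule of three:   {1000 * rule_of_three(n):.2f} in 1000")
```

For a few hundred tests the two agree closely, which is why the simple 3/n form is so widely used.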


9.6 Equipment Fault (Breakdown Failure) Rates

The following table lists typical breakdown failure rates for mechanical parts, from work done by the authors. The data can vary with operating environment, system interactions and maintenance regime.

Item          MTBF (Hrs)    F/Million Hrs    Life (yrs)
Motor         100,000       10               11.42
Gearbox       100,000       10               11.42
Clutch        100,000       10               11.42
Bearings      250,000       4                28.54
Belts         125,000       8                14.27
Tensioners    100,000       10               11.42

(MTBF = Mean Time Between Failures)

Typical Component Breakdown Failure Rates
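The F/Million Hrs and Life columns follow directly from the MTBF, assuming a constant failure rate and 8,760 hours per year (the table's figures are consistent with that divisor). A minimal sketch; the function names are illustrative.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours; leap years ignored


def failures_per_million_hours(mtbf_hours: float) -> float:
    """Constant failure rate implied by an MTBF, per 10^6 hours."""
    return 1e6 / mtbf_hours


def mtbf_in_years(mtbf_hours: float) -> float:
    return mtbf_hours / HOURS_PER_YEAR


components = {"Motor": 100_000, "Bearings": 250_000, "Belts": 125_000}
for item, mtbf in components.items():
    print(f"{item:9s} {failures_per_million_hours(mtbf):4.0f} f/10^6 h"
          f"  {mtbf_in_years(mtbf):6.2f} yrs")
```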

Smith D (1993) summarises various sources of failure rates; the following is an extract from this reference. He provides up to three figures (lower, most likely and upper) for each item. A single figure means his sources are in good agreement; two or three figures indicate scatter.

Failure Rates per million hours Item Lower Most Upper

Alarm Siren Alternator Computer-PLC Detectors-smoke-ionisation Motor-electrical-ac Transformers->415V VDU

1 1 20 2 1 0.4 10

6 5 1 200

20 9 50 6 20 7 500

General Breakdown Failure Rates

(Source: Smith DJ, 1993)

9.7 Generic Failure Rates

Generic failure rates are useful for various forms of preliminary analysis. For example:

Item                  Failure Rate
People                10⁻² per operation
Mechanical systems    10⁻³ per operation
Electrical systems    10⁻⁴ per operation

Generic Failure Rates

9.8 System Safety Assurance

System safety assurance is a large domain and the subject of separate R2A writings and courses. Nevertheless, certain elements are presented here for introductory purposes. Much of the modelling described above is used for functional safety assessment pursuant to IEC 61508:1998 (also known as AS61508:2000).


9.8.1 Nines

The table below summarises the terminology sometimes used to describe availability.

Downtime pa       Availability pa    Name
up to 30 secs     99.999905%         “6 nines”
up to 1 min       99.999810%
up to 5 mins      99.999049%         “5 nines”
up to 10 mins     99.998097%
up to 30 mins     99.994292%
up to 45 mins     99.991438%         “4 nines”
up to 1 hr        99.988584%
up to 2 hrs       99.977169%
up to 10 hrs      99.885845%         “3 nines”

Summary of Availability Numbers
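The availability figures follow from availability = 1 - downtime / time in a year, taking a year as 365 days (525,600 minutes). The sketch below reproduces entries from the table and also computes the strict downtime allowance for n nines. The named labels in the table are round-number approximations: strictly, “3 nines” (99.9%) allows about 8.8 hours per year rather than 10. Function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def availability_percent(downtime_minutes_pa: float) -> float:
    """Availability per annum, as a percentage, for a given annual downtime."""
    return 100.0 * (1.0 - downtime_minutes_pa / MINUTES_PER_YEAR)


def max_downtime_minutes(nines: int) -> float:
    """Largest annual downtime consistent with an availability of n nines."""
    return MINUTES_PER_YEAR * 10.0 ** -nines


for label, minutes in [("30 secs", 0.5), ("5 mins", 5), ("45 mins", 45), ("10 hrs", 600)]:
    print(f"{label:8s} {availability_percent(minutes):.6f}%")

for n in (3, 4, 5, 6):
    print(f"{n} nines allows {max_downtime_minutes(n):.1f} minutes pa")
```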

9.8.2 SIL (Safety Integrity Level)

SIL is a measure of the probability that a safety related system will fail dangerously. SIL values range from 1 (the lowest) to 4 (the highest). The table below is adapted from IEC 61508-1, clause 7.6.2.9, via Factory Mutual.

Safety integrity    Low demand mode of operation               High demand or continuous mode of operation
level               (average probability of failure to         (probability of a dangerous failure
                    perform its designed function on demand)   per hour)
4                   ≥ 10⁻⁵ to < 10⁻⁴                           ≥ 10⁻⁹ to < 10⁻⁸
3                   ≥ 10⁻⁴ to < 10⁻³                           ≥ 10⁻⁸ to < 10⁻⁷
2                   ≥ 10⁻³ to < 10⁻²                           ≥ 10⁻⁷ to < 10⁻⁶
1                   ≥ 10⁻² to < 10⁻¹                           ≥ 10⁻⁶ to < 10⁻⁵

Table of SIL Values
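Reading a SIL off the table is a simple band lookup. A minimal sketch, assuming the target failure measure has already been calculated; meeting the numeric band is only part of what IEC 61508 requires before a SIL can be claimed. Names are illustrative.

```python
from typing import List, Optional, Tuple

# (SIL, lower bound inclusive, upper bound exclusive), from the table above.
Band = Tuple[int, float, float]
LOW_DEMAND_PFD: List[Band] = [(4, 1e-5, 1e-4), (3, 1e-4, 1e-3), (2, 1e-3, 1e-2), (1, 1e-2, 1e-1)]
HIGH_DEMAND_PFH: List[Band] = [(4, 1e-9, 1e-8), (3, 1e-8, 1e-7), (2, 1e-7, 1e-6), (1, 1e-6, 1e-5)]


def sil_for(value: float, bands: List[Band]) -> Optional[int]:
    """Return the SIL whose band contains value, or None if outside SIL 1 to 4."""
    for sil, lower, upper in bands:
        if lower <= value < upper:
            return sil
    return None


print(sil_for(5e-3, LOW_DEMAND_PFD))   # average probability of failure on demand
print(sil_for(3e-8, HIGH_DEMAND_PFH))  # dangerous failures per hour
```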

9.8.3 COTS & SOUP

High reliability is most simply and economically achieved by parallel low reliability systems. A very simple example is shown in the figure below.

[Figure: Parallel Active Redundant Systems. Two units, each 99% reliable, in active parallel give a system reliability of 99.99%.]

As a result, the commercial and military industrial approaches are no longer distinct. For years the military has had advocates for the use of commercial off-the-shelf (COTS) equipment, non-developmental items (NDI) and software of unknown pedigree (SOUP), but military use of commercial designs is now required. For example, in June 1994 a memorandum from the US Secretary of Defence, William Perry, officially changed the way the military develops and acquires systems: military standards and specifications are out (except with a waiver) and commercial practices are in.
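The figures in the parallel redundancy example follow from the rule for active parallel units: the system fails only if every unit fails, so R = 1 - (1 - R1)(1 - R2). This minimal sketch assumes the units fail independently (section 9.4 explains how common mode and common cause failures undermine that assumption); the function name is illustrative.

```python
from math import prod


def active_parallel_reliability(unit_reliabilities) -> float:
    """Reliability of units in active parallel: the system fails only if all units fail."""
    return 1.0 - prod(1.0 - r for r in unit_reliabilities)


print(f"{active_parallel_reliability([0.99, 0.99]):.2%}")        # two 99% units
print(f"{active_parallel_reliability([0.99, 0.99, 0.99]):.4%}")  # adding a third unit
```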


REFERENCES

British Standards Institution (1994). Reliability of Systems, Equipment and Components, Part 2: Guide to the Assessment of Reliability (BS 5760: Part 2).
International Electrotechnical Commission (1998). Functional Safety of Electrical/Electronic/Programmable Electronic Safety Related Systems (IEC 61508). Also known as AS61508:2000.
Kirwan Barry (1994). A Guide to Practical Human Reliability Assessment. Taylor & Francis, London.
Lees F P (1995). Loss Prevention in the Process Industries. 2nd Edition. Butterworth-Heinemann Ltd, Oxford, UK (3 Volumes).
Leveson Nancy G (1995). Safeware: System Safety and Computers. Addison-Wesley.
Perry William, as quoted by Preston R. MacDiarmid and John J. Bart in Reliability Toolkit: Commercial Practices Edition. Reliability Analysis Center and Rome Laboratory, NY.
Reason J (1990). Human Error. Cambridge University Press.
Redmill Felix and Rajan Jane (editors) (1997). Human Factors in Safety-Critical Systems. Butterworth-Heinemann.
Smith David J (1993). Reliability, Maintainability and Risk: Practical Methods for Engineers. Fourth Edition. Butterworth-Heinemann, Oxford.
Society of Automotive Engineers (1995). Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment (SAE ARP 4761).
Storey Neil (1996). Safety-Critical Computer Systems. Addison-Wesley.
Swain Alan D and Bell Barbara Jean (1983). A Procedure for Conducting a Human Reliability Analysis for Nuclear Power Plants.
US Atomic Energy Commission (1975). Reactor Safety Study.
Villemeur Alain (1992). Reliability, Availability, Maintainability and Safety Assessment. John Wiley & Sons.
Viner Derek (1991). Accident Analysis and Risk Control. VRJ Delphi.

READING

Department of Defence (USA) (1984). Electronic Reliability Design Handbook (MIL-HDBK-338-1A), Washington DC.
Department of Defence (USA). Reliability Prediction of Electronic Equipment (MIL-HDBK-217), Washington DC.
Department of Defence (USA) (1986). Reliability Centred Maintenance Requirements for Naval Aircraft, Weapons Systems and Support Equipment (MIL-STD-2173(AS)), Washington DC.
Factory Mutual Research (2001). Approval Guide, Chapter 4: Functional Safety of Safety Related Systems and Components.
Moubray John (1992). RCM II: Reliability Centred Maintenance. Butterworth-Heinemann.
Participating OREDA Companies. Offshore Reliability Data Handbook (OREDA). DNV Technica, Hovik, Norway.
Standards Australia/Standards New Zealand (1998). Risk Analysis of Technological Systems: Applications Guide. Australian/New Zealand Standard AS/NZS 3931:1998.


Bottom Up Techniques


10. Bottom Up Techniques

Generically, bottom up techniques examine how an element can fail and then assess the impact of this on the system as a whole. Different bottom up techniques divide the system under consideration differently and may consider different failure types, depending on the purpose of the analysis. The most common approach is to gather relevant experts in a room and use a structured process to obtain group consensus on the seriousness of a problem and what should be done about it. The general layout for such an assessment is sketched below.

[Figure: Typical Analysis Facility Layout. Facilitator, technical secretary and analysts, with whiteboard, overhead screen, computer projector and laptop.]

The analysts are usually the designers and the (proposed) operators or maintainers, that is, those who have to live with the plant or process day to day. The facilitator and secretary are usually external to both groups, often outside consultants, to minimise potential bias. The facility, process or contract is then examined in a structured manner, one piece at a time. Problems identified by the group are discussed and consensus is reached on their significance and the best solution. Actions are documented on the spot by the technical secretary, and all those present sign off on them at that time.


10.1 FMEA, FMECA and RCM

10.1.1 FMEA and FMECA

Fault (failure) modes and effects analysis (FMEA) and fault modes, effects and criticality analysis (FMECA) are similar in nature, except that in FMECA the criticality of each failure mode is also assessed and used to rank the failure modes. The process is divided into four key parts, as shown below.

[Figure: Fault Modes, Effects and Criticality Approach. System description and block diagram; fault modes; effects (and criticality); conclusion and recommendation.]

The level of detail in the analysis depends on how far the system is broken down in the system description and block diagram. If the plant is considered as several large subsystems, the results will be quite coarse. However, if the system description goes down to individual components, extraordinarily detailed analysis will ensue. Typically, the system breakdown for most reliability analysis is to four levels, as shown below:

[Figure: Typical System Breakdown. System; sub systems; assemblies; components (parts).]


Several authorities provide lists of failure modes to be considered for each component or sub-system. For example, MIL-STD-1629A (US Military Standard, pages 101-105) lists:
• premature operation
• failure to operate at a prescribed time
• intermittent operation
• failure to cease operation at the prescribed time
• loss of output or failure during operation
• degraded output or operational capability
• other unique failure condition based on system characteristics and operational requirements or constraints

A more typical list is:

Delayed operation         Fails to start            Open circuit
Erratic operation         Fails to stop             Out of tolerance (high)
Erroneous indication      Fails to switch           Out of tolerance (low)
Erroneous input           False actuation           Physical binding or jamming
Erroneous output          Inadvertent operation     Premature operation
External leakage          Intermittent operation    Restricted flow
Fails closed              Internal leakage          Short circuit
Fails open                Leakage (electrical)      Structural failure
Fails to close            Loss of input             Vibration
Fails to open             Loss of output

Generic Fault Modes for FMEA and FMECA

The effect of each fault mode of each component or sub-system is then considered, especially where the effect will be concealed or hidden from the operators. This is common with redundant systems, where the loss of one unit could remain undetected until the second fails. It is of particular concern with protective devices that do not fail safe. In establishing criticality, the effects are usually placed in four categories, prioritised in the order listed:
* safety (fault mode with possible death or injury effects)
* environmental (fault mode with unacceptable environmental effects)
* service (fault mode with operational effects such as production interruptions, product quality variations, customer service implications)
* economic (fault mode with increased costs only)

By considering each component or sub-assembly, how it might exhibit the fault mode described and the consequences of such a fault, a detailed understanding of the system can be achieved.
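A simple way to turn these categories into a criticality ranking is to sort fault modes by category priority and, within a category, to put concealed effects first, reflecting the concern about hidden failures above. This is a minimal sketch under those assumptions; the category and concealment assigned to each example fault mode are illustrative only.

```python
from dataclasses import dataclass

# Lower number = higher priority, in the order listed in the text.
CATEGORY_PRIORITY = {"safety": 0, "environmental": 1, "service": 2, "economic": 3}


@dataclass
class FaultMode:
    component: str
    mode: str
    category: str    # one of the CATEGORY_PRIORITY keys
    concealed: bool  # is the effect hidden from the operators?


def rank(fault_modes):
    """Sort by category priority, then concealed effects before evident ones."""
    return sorted(fault_modes, key=lambda f: (CATEGORY_PRIORITY[f.category], not f.concealed))


modes = [
    FaultMode("Relay", "contact remains open", "service", concealed=False),
    FaultMode("Fuse", "does not melt", "safety", concealed=True),
    FaultMode("Push-button", "contact remains stuck", "safety", concealed=False),
]
for f in rank(modes):
    state = "concealed" if f.concealed else "evident"
    print(f"{f.category:13s} {state:9s} {f.component}: {f.mode}")
```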


A summary of the sort of results obtainable from such a study is shown in the table below. Each entry gives the component (item, functional group), its fault (failure) mode, possible causes and the effects on the system (and criticality if desired).

Push-Button (PB)
  Fault mode: The PB is stuck
  Possible causes: Primary (mechanical) fault
  Effects on system: Loss of system function: the motor does not operate

Push-Button (PB)
  Fault mode: The PB contact remains stuck
  Possible causes: Primary (mechanical) fault; the operator fails to release the PB (human error)
  Effects on system: The motor operates too long, hence a motor short circuit, which leads to a high electric current and melting of the fuse

Relay
  Fault mode: The relay contact remains open
  Possible causes: Primary (mechanical) fault
  Effects on system: Loss of system function: the motor does not operate

Relay
  Fault mode: The relay contact remains stuck
  Possible causes: A high current passes through the contact
  Effects on system: The motor operates too long, hence a motor short circuit, which leads to a high electric current and melting of the fuse

Fuse
  Fault mode: The fuse does not melt
  Possible causes: The operator overrated the fuse (human error)
  Effects on system: In the case of a short circuit, the fuse will not open the circuit

FMEA Table of Results

FMEA and FMECA are normally bottom up processes that look at how component parts can affect the larger systems defined in the system description and block diagram. They can therefore be very detailed and are normally applied to high-value systems where failure (breakdown) causes major difficulties, such as aircraft and military combat equipment.

10.2 RCM

The purpose of Reliability Centred Maintenance (RCM) is to establish the nature and frequency of maintenance tasks that ensure a target (optimum) level of reliability at best cost. It evolved in the commercial airline industry, primarily through the activities of the Maintenance Steering Group of the Air Transport Association of America. The Maintenance Steering Group's final report of 1980, titled MSG-3, provided the backbone of the logic processes contained in the referenced texts and RCM analysis (Moubray 1992). The RCM process asks seven basic questions:
i) which assets (significant items) are to be subject to the analysis process?
ii) what are the functions and associated performance criteria (accept/reject boundaries) of each asset in its operating context?
iii) in what manner does it cease to fulfil its listed functions (fault mode)?
iv) what failure mechanism causes each loss of function (failure cause or fault)?
v) what is the outcome and impact (criticality) of each fault (effect)?
vi) what maintenance tasks can be applied to prevent each fault (preventive maintenance)?
vii) what action should be taken if effective tasks cannot be identified?

The main point of the RCM analysis is to select the most appropriate maintenance regime.


Until the mid 1970s, items were seen as exhibiting a standard fault profile with three distinct periods: an infant mortality period due to product quality faults; a useful life period with only random, stress-related faults; and a wear out period due to increasingly rapid deterioration in condition resulting from use or environmental degradation. This is shown in the figure below. The consequence of this belief was that equipment was taken out of service and maintained at fixed intervals, whether or not it was showing signs of wear.

[Figure: Bathtub Fault Rate. Failure rate against time, showing the infant mortality, useful life and wear out periods.]
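The three periods are often described (though not in this text) with the Weibull hazard function h(t) = (β/η)(t/η)^(β-1), where η is a characteristic life and the shape parameter β sets the trend: β < 1 gives a falling rate (infant mortality), β = 1 a constant rate (random failures) and β > 1 a rising rate (wear out). A minimal sketch with hypothetical parameter values:

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate at age t > 0 for a Weibull(beta, eta) distribution."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def trend(beta: float, eta: float) -> str:
    """Compare the failure rate early and late in life."""
    early = weibull_hazard(0.1 * eta, beta, eta)
    late = weibull_hazard(0.9 * eta, beta, eta)
    if early > late:
        return "falling"
    if early < late:
        return "rising"
    return "constant"


eta = 10_000.0  # hypothetical characteristic life, hours
for period, beta in [("infant mortality", 0.5), ("useful life", 1.0), ("wear out", 3.0)]:
    print(f"{period:16s} beta={beta}: {trend(beta, eta)} failure rate")
```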

However, actuarial studies of aircraft equipment fault data conducted in the early 1970s identified a more complex relationship between age and the probability of fault (Moubray 1992).

[Figure: Fault Rate Curve. Age-reliability patterns and the proportion of items exhibiting each:
Wear-in to random to wear out (bathtub)        4%
Random then wear out                           2%
Steadily increasing                            5%
Increasing during wear-in and then random      7%
Random over measurable life                    14%
Wear-in then random                            68%
The last three patterns together account for 89%.]

Specifically, the studies found that the bathtub curve was one of the least common fault patterns and that periodic maintenance increased the likelihood of fault. This led to the idea that the maintenance regime ought to be based on the reliability of the components and the required availability of the system as a whole.


The figure below indicates the overall process:

[Figure: RCM Analysis Flow Chart. Collect system information and present data; select component/assembly/sub-system; identify function; identify failure modes and effects; assess criticality (concealed or evident? safety, environmental, service or economic?); redesign?; maintenance plan.]

Note that a concealed fault mode is of major significance when assessing criticality. As can be seen, the process is really a FMECA with a focus on maintenance outcomes.

10.3 HazOps

The Hazard and Operability (HazOp) Study technique was originally pioneered in the chemical industry (Tweeddale 1992) and has since been adapted for a wide range of industries. The essential features of a HazOp study are:

* It is systematic and detailed. A series of guidewords is used repeatedly to ensure consistency and repeatability.
* It is conducted by a team who know most about the project or facility, typically those who designed it and those who must operate it.
* It concentrates on exploring the consequences of deviations from the usual operating conditions.
* It is an audit of the completed part of a design.

Traditionally, the HazOp procedure examines process equipment on a system-by-system basis, reviewing the process parameters against a checklist of guidewords that suggest deviations from normal operating conditions. The consequences of a deviation are assessed, as are the circumstances that might bring it about. If the deviation is judged significant, the workshop addresses it on the spot and proposes a solution for action. The technique seems to work because the key parties are present: the designers and operators, the builders and maintainers, or the contractor and contractee.


The guidewords are tailored to suit the particular industry. They can be defined using the following conceptual deviations (Tweeddale 1992):
* too much of ... (speed, load, level, elapsed time, distance, vibration etc.)
* not enough of ... (speed, load, level, elapsed time, distance, vibration etc.)
* none of ... (speed, load, level, elapsed time, distance, vibration etc.)
* part of ... (wrong composition, wrong component)
* opposite of ... (reverse direction)
* wrong timing of ... (starting or stopping too early or too late, wrong sequence)
* wrong direction of ... (to left or right, wrong setting of points etc.)
* wrong location of ... (too high or low, too far or too short)
* poor performance of ... (normal duty, testing etc.)
* other than ... (whatever else can happen apart from normal operation, such as start-up, shutdown, uprating, low rate operation, alternative mode of operation, maintenance etc.)
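The systematic core of a HazOp, applying each guideword to each parameter in turn, can be sketched as generating a worklist of candidate deviations. A minimal sketch; the parameter list is hypothetical and, in practice, the team would set aside pairings that make no physical sense (such as “opposite of temperature”).

```python
from itertools import product

# A subset of the conceptual deviations listed above.
GUIDEWORDS = ["too much of", "not enough of", "none of", "opposite of", "wrong timing of"]
# Hypothetical parameters for one line of the plant under review.
PARAMETERS = ["flow", "pressure", "temperature"]


def deviation_worklist(guidewords, parameters):
    """Every guideword/parameter pairing, to be reviewed one at a time."""
    return [f"{g} {p}" for g, p in product(guidewords, parameters)]


worklist = deviation_worklist(GUIDEWORDS, PARAMETERS)
print(len(worklist), "candidate deviations")
for deviation in worklist[:4]:
    print(deviation)
```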

[Figure: HazOp Flow Process. Steps shown: select a line; select a deviation (e.g. more flow); is more flow possible?; consider other causes of more flow; is it hazardous or does it prevent efficient operation?; will the operator know there is more flow?; consider and specify mechanisms for identification of the deviation; what change in plant or methods will prevent the deviation, make it less likely or protect against consequences?; is the change likely to be cost effective?; consider other changes or agree to accept hazard; agree change(s) and who is responsible for action; follow up to see action has been taken; move on to next deviation.]


10.3.1 Process Industry HazOps

The chemical process industry usually focuses on the process and instrumentation drawings (P&IDs). The typical guidewords used are:

Flow: leak, high, low, reverse, phase
Level: high, low
Temperature: high, low
Pressure: high, low, and vacuum
Reaction rate: fast, slow
Quality: concentration, impurities, cross-contamination, side reactions, inspection and testing
Physical Damage: impact, dropping, vibration
Control and Protection: response speed, independence, testing

After these key deviations have been applied to the P&IDs, a further list of overview guidewords can be applied. These include: Materials of Construction (corrosion, erosion etc.), Services Needed (compressed air and the like), Commissioning, Start-up, Shutdown, Breakdown, Electrical Safety, Fire & Explosion, Toxicity, Environmental Control, Access, Testing, Safety Equipment, Output or Throughput, and Efficiency.

10.3.2 HazOps Applied to Contracts

Most breakdowns in a contracting-out relationship arise from a lack of understanding of which elements of the relationship were truly important and susceptible to unrecognised threats. Such hazards can, however, be identified before the contract is entered into, using a modified HazOp technique. Those who have watched contracts come unravelled will have noted the oft-expressed sentiment, “Gee, I wish we had thought of that before we got into this thing”. The HazOp technique described here will not necessarily predict all possible problems, but it has proved superior to one or two individuals from the contracting organisations sitting in different rooms trying to crystal ball the future and write it into the contract conditions, especially for a project that is large or unique in nature. It also helps ensure the “win-win” nature of the contract, as both parties are assessing the potential difficulties together and mutually agreeing on solutions. This reduces the likelihood of subsequent accusations and conspiracy theories. Actual assessment figures can be recorded on a HazOp Item Data Sheet, and such data can be exported to spreadsheet reports for listing and ranking.


In flow chart terms, the process is shown below with a sample HazOp Item Data Sheet following.

[Figure: HazOp Procedure Applied to Contracts. Steps shown: select a key contract function; select threat (e.g. key contractor staff absence); can it occur?; consider other critical contract staff absence; is it hazardous or does it prevent efficient operation?; will the company know the absence has occurred?; will a change in contract advise of this?; what change in contractor methods will prevent the deviation, make it less likely or protect against consequences?; is the cost of the change justified?; consider other changes or agree to accept hazard; agree to change(s) and who is responsible for action; follow up to see action has been taken; move on to next deviation.]


R2A Hazard Item Data Sheet

Hazard Item No: 23
Identified: 14 March 1996, 2.35pm
Client: New Company
Project: VIP Product Line
Location: 3 stand press
Drawing number: 736.67, Rev 4, 12/05/00
Title:
Present: Design Engineer; Maintenance Engineer; Contractor; Scribe/Secretary: Fred Gatt, R2A; Facilitator/Chairman: Richard Robinson, R2A

Nature of Problem
Guide word: Production line maintenance
Threat deviation: Key maintenance contractor staff unavailability
Possible causes: Illness
Consequences: Production interruptions due to slow, inexperienced maintenance staff

Preliminary Solution
Review contractor back-up staff arrangements. Price back-up machine. Choose between increasing the contract price to have stand-by staff available or buying new parallel production equipment. Payback assessment as for back-up machine.

Payback Assessment
Event frequency               0.5 pa
Consequence severity          $10,000
Solution effectiveness        99%
Risk saving                   $4,950 pa
Proposal cost                 $5,000
Commercial return             $1,000 pa
Maintenance saving            ($1,000) pa
PR/Morale benefit             $500 pa
Risk saving                   $4,950 pa
Total                         $5,450 pa
Investment payback period     0.92 yrs

Action and Sign Off
Responsible person: Design Engineer
Follow up action: Price B/U machine, request contractor price to guarantee staff availability
Date: 15 March 1996
By: Maintenance Engineer
Status:
Comments: Further work required

HazOp Item Data Sheet
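The payback assessment on the data sheet can be checked with a short calculation. The risk saving is event frequency × consequence severity × solution effectiveness, 0.5 × $10,000 × 99% = $4,950 pa; adding the other annual items (treating the bracketed maintenance figure as a cost) gives $5,450 pa, and a payback of $5,000 / $5,450 ≈ 0.92 years. A minimal sketch; the function and field names are illustrative.

```python
from typing import NamedTuple


class Payback(NamedTuple):
    risk_saving_pa: float
    total_benefit_pa: float
    payback_years: float


def payback(frequency_pa: float, severity: float, effectiveness: float,
            proposal_cost: float, other_benefits_pa: float = 0.0) -> Payback:
    """Risk saving = frequency x severity x effectiveness; payback = cost / total benefit."""
    risk_saving = frequency_pa * severity * effectiveness
    total = risk_saving + other_benefits_pa
    return Payback(risk_saving, total, proposal_cost / total)


# Data sheet items: commercial return $1,000, maintenance ($1,000), PR/morale $500 pa.
result = payback(0.5, 10_000, 0.99, 5_000, other_benefits_pa=1_000 - 1_000 + 500)
print(f"Risk saving ${result.risk_saving_pa:,.0f} pa; "
      f"total ${result.total_benefit_pa:,.0f} pa; "
      f"payback {result.payback_years:.2f} yrs")
```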


10.4 Common Mode Failures

Bottom up techniques have difficulty with common cause and common mode failures because they work from the bottom up rather than from the top down. A detailed assessment of individual components or sub-systems, such as a HazOp or FMECA, examines how that component or sub-system can fail under normal operating conditions. It does not examine how a catastrophic failure elsewhere might affect this component or the others around it. HazOps attempt to address such ‘knock on’ effects with a series of general questions after the detailed review is completed, but it nevertheless remains difficult to use a HazOp to determine credible worst-case scenarios.

It is also difficult to examine systems designed to deal with common mode failures using RCM techniques. An automatic sprinkler system, for example, will only be called upon to operate quite rarely, perhaps once in a hundred years. But when it is required, a massive common mode failure of all the equipment in the fire-affected area will be occurring. Sprinkler systems are therefore quite tough, and an RCM analysis will suggest that they require little or no maintenance to remain in an effective operating condition. Nevertheless, they are checked regularly to ensure that the fractional dead time (the proportion of time the system is out of service) remains trivial. Sprinklers are in fact subject to latent failures such as stones in the piping or a restriction in the water supply. Unless tested, such a condition may well remain hidden until the sprinklers are called upon to act during a fire, which is obviously the worst possible time to discover the fault.

Reliability analysis is conceptually focused on minimising breakdown failures within the 5% section shown in the diagram below; that is, on what should be done to plant and equipment to ensure optimal availability and service at best cost. Risk analysis is targeted at minimising damage, injury and death and consequential problems, including legal implications; that is, the 0.0001% (10⁻⁴) region in the diagram. Applying reliability analysis to failure (risk) problems can be conceptually difficult, since the intellectual focus is different. In a sense, this is why reliability people are optimists and risk people pessimists.

[Figure: Reliability vs Risk. 95% existing availability; 5% RCM reliability focus; 0.00001% (10E-5) risk focus.]
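The fractional dead time mentioned above for sprinkler systems is commonly approximated (the formula is not given in this text) as FDT ≈ λT/2, where λ is the rate of latent (unrevealed) failures and T is the interval between tests; the approximation holds when λT is small. A minimal sketch with a hypothetical latent failure rate:

```python
def fractional_dead_time(latent_failures_pa: float, test_interval_years: float) -> float:
    """Approximate fraction of time a periodically tested item is unknowingly failed.

    Uses FDT ~= rate * interval / 2, reasonable only when rate * interval << 1.
    """
    return latent_failures_pa * test_interval_years / 2.0


# Hypothetical: a latent fault (e.g. a blocked pipe) about once in 20 years.
rate_pa = 0.05
for months in (1, 3, 12, 60):
    fdt = fractional_dead_time(rate_pa, months / 12.0)
    print(f"test every {months:2d} months: fractional dead time = {fdt:.4f}")
```

More frequent testing shrinks the dead time in direct proportion; the 60-month case is already near the limit where the simple approximation holds.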

For example, a critical facility was recently built with two power grid connections, a gas turbine generator and several diesel generators, any one of which was capable of running the entire plant. Power supply reliability was very high from a breakdown failure perspective, as the reliability designer intended. However, all this equipment was put in a single machine hall and was thus exposed to a single fire event: a common mode failure risk. Had a risk engineer been involved in the design process, the different power supply devices would have been fire-isolated from each other, so that a fire in one, or a gas explosion in the hall, could not expose the others and knock out all power supplies.

In the context of outsourcing, lawyers represent a most interesting form of common mode failure. The diagram below shows two arrangements. In the first, the lawyers act as advocates; in the second, the two parties communicate directly and the lawyers act as documenters. From observation of the difficulties associated with a number of outsourcing contracts, it appears difficult for Party 1 and Party 2 to have a clear and complete understanding of each other's position when lawyers act as advocates, in effect passing pieces of paper under the door to each other. The second arrangement seems to be much more effective.


[Figure: Lawyers acting as advocates (Party 1 and Party 2 communicate through Lawyer 1 and Lawyer 2); lawyers acting as documenters (Party 1 and Party 2 communicate directly, with Lawyer 1 and Lawyer 2 documenting)]

10.5 Risk Management and the Project Life Cycle

The role of risk management, and the way it is applied, varies with the stage the project has reached in its life cycle. This can be represented by the figure below.

[Figure: Risk Techniques in Project Management. Life cycle stages shown: pre-planning; functional definition/specification; roll-out, transition or project management; commissioning; contract management; operation and maintenance. Top down analysis: vulnerability assessments. Bottom up analysis: HazOps, FMECAs, QRA, JSA etc.]

A pre-planning approach uses top down analysis, such as vulnerability assessments, to identify possible risks facing a project and/or the organisation in general. Identified vulnerabilities (assets coinciding with a threat) are documented and addressed appropriately with a risk reduction solution in mind. This process can be conducted before a project is commenced as a form of completeness check. Once the project has been commissioned, risk management forms part of the project management process. Bottom up analysis techniques such as Quantified Risk Assessment (QRA), Job Safety Assessment (JSA) and HazOp studies can be used to identify specific project risks. It is here that engineering, procurement and construction solutions can be implemented.


Risk management processes should be ongoing to be effective. Once the project is completed, risk management is incorporated into the project's operation and maintenance procedures. Periodic assessments of the project need to be conducted to keep the risk management status current. This can be done using top down or bottom up methods, or a combination of the two.

10.6 Hazard Analysis and Critical Control Point (HACCP)

HACCP is a systematic, organised approach to identifying, evaluating and controlling safety hazards in a food process. It is used to develop and maintain a system that minimises the risk of contamination. It was apparently developed by NASA in the 1960s to help prevent food poisoning in astronauts. In many ways it appears as a top down vulnerability technique applied at a very low level, in the sense that it identifies who is to be protected and from what. It then goes on to establish how.

A critical control point is defined as any point or procedure in a specific food system where loss of control may result in an unacceptable health risk. A control point, by contrast, is a point where loss of control may result in failure to meet (non-critical) quality specifications. HACCP can be used for both corrective and preventive risk management. Risks are identified, and a management option is selected and implemented to control each risk. However, the aim is to prevent hazards at the earliest possible point in the food chain. HACCP involves identifying acceptable risk standards appropriate to different types of food hazard, and the procedures needed to keep the risks within the limits set by those standards. Food safety risks can be divided into the following three categories:

Microbiological risks
- Escherichia coli
- Salmonella
- Listeria monocytogenes
- Staphylococcus
- Clostridium botulinum

Chemical risks
- Pesticide and herbicide residues
- Cleaning chemicals
- Heavy metal residues
- Allergens

Physical risks
- Glass
- Plastic
- Metal
- Wood etc

There are seven principles to the HACCP technique:

1. Identify hazards
2. Determine the critical control points
3. Determine the critical limits for each control point
4. Monitor the critical limits
5. Identify corrective action procedures (corrective action requests or CARs)
6. Establish records and control sheets
7. Verify the HACCP plan
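Principles 3 to 6 (critical limits, monitoring, corrective action and records) lend themselves to a simple sketch. The Python below is illustrative only. The `CriticalControlPoint` class, the `monitor` function and the limit values are assumptions made for the example; the limits are a hypothetical minimum cooking temperature and a hypothetical maximum chiller temperature. They are not regulatory figures and are not part of any HACCP standard.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CriticalControlPoint:
    """Hypothetical CCP with optional lower/upper critical limits (principle 3)."""
    name: str
    lower: Optional[float] = None
    upper: Optional[float] = None

    def within_limits(self, reading: float) -> bool:
        # A reading outside either limit means loss of control at this CCP.
        if self.lower is not None and reading < self.lower:
            return False
        if self.upper is not None and reading > self.upper:
            return False
        return True


def monitor(ccp: CriticalControlPoint,
            readings: List[Tuple[str, float]]) -> List[dict]:
    """Check each timestamped reading (principle 4), flag a corrective
    action request where a limit is breached (principle 5) and return a
    record of every check (principle 6)."""
    records = []
    for timestamp, value in readings:
        ok = ccp.within_limits(value)
        records.append({
            "ccp": ccp.name,
            "time": timestamp,
            "value": value,
            "within_limits": ok,
            "action": None if ok else "raise CAR",
        })
    return records


# Illustrative limits only (assumed values, degrees Celsius).
cooking = CriticalControlPoint("cooking core temperature", lower=75.0)
chiller = CriticalControlPoint("chiller temperature", upper=5.0)

log = monitor(cooking, [("08:00", 78.2), ("08:15", 71.5)])
log += monitor(chiller, [("08:00", 3.9)])
for entry in log:
    print(entry["time"], entry["ccp"], entry["value"], entry["action"])
```

The sketch records every check, not just the breaches. Principle 7 (verifying the plan) relies on a complete history of monitoring rather than a list of exceptions.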


REFERENCES
Department of Defense (USA). A Procedure for a Failure Mode, Effects and Criticality Analysis (MIL-STD-1629A). Washington DC.
Moubray J (1992). RCM II Reliability Centred Maintenance. Butterworth-Heinemann.
Tweeddale M (2003). Managing Risk and Reliability of Process Plants. Gulf Professional Publishing, an imprint of Elsevier Science (USA).

READING
British Standards Institution (1994). Reliability of Systems, Equipment and Components, Part 2: Guide to the Assessment of Reliability (BS 5760: Part 2).
Blanchard B (1991). Systems Engineering Management. Wiley Interscience.
Blanchard B and Fabrycky W (1990). Systems Engineering and Analysis, 2nd Edition. Prentice Hall International.
Chemical Industries Association (1977). A Guide to Hazard and Operability Studies.
Department of Urban Affairs and Planning (1995). Hazardous Industry Planning Advisory Paper No. 8: Hazard and Operability Studies. HAZOP Guidelines.
Department of Defense (USA) (1984). Electronic Reliability Design Handbook (MIL-HDBK-338-1A). Washington DC.
Department of Defense (USA). Reliability Prediction of Electronic Equipment (MIL-HDBK-217). Washington DC.
Department of Defense (USA) (1986). Reliability Centred Maintenance Requirements of Naval Aircraft, Weapons Systems and Support Equipment (MIL-STD-2173(AS)). Washington DC.
Kletz T A (1985). An Engineer's View of Human Error. IChemE, London.
Kletz T A (1986). HAZOP & HAZAN: Notes on the Identification and Assessment of Hazards. IChemE, London.
Kletz T A (1985). Cheaper, Safer Plants or Wealth and Safety at Work. IChemE, London.
Lees F P (1995). Loss Prevention in the Process Industries. 2nd Edition. Butterworth-Heinemann Ltd, Oxford, UK. (3 Volumes)
Reason J (1990). Human Error. Cambridge University Press.
Smith A (1993). Reliability Centred Maintenance. McGraw Hill.
Smith D J (1993). Reliability, Maintainability and Risk: Practical Methods for Engineers. Fourth Edition. Butterworth-Heinemann, Oxford.
Villemeur A (1992). Reliability, Maintainability and Safety Assessment. John Wiley & Sons.


Generative Techniques


11. Generative Risk Techniques

‘Generative’ technique is a term adopted from James Reason’s work in the risk area (Reason, 1993). In terms of the paradigm model in this text (Section 2.9), it generally refers to the ‘selective interview’ column. It has much to do with morale: the willingness of people to speak up constructively, and of the organisation to respond positively. In a legal sense, it provides assurance after the event that no one can say, ‘I knew that but nobody listened’.

11.1 James Reason et al

James Reason is an English psychologist who has written extensively on risk. In 1993 he suggested a 7-point rating scale for overall organisational risk control:

i) Pathological: barest minimum industry safety practices
ii) Pathological / low reactivity: one step ahead of regulators, some concern re adverse trends
iii) Worried / reactive: anxious about a run of incidents or accidents
iv) Repair / routine: sensitive to events, safety data collection / analysis but local repair only
v) Repair / some proactivity: wide range of auditing but "technocratic" remedial measures
vi) Reform / generative: aware that engineering, selection and training are not enough, looking for better
vii) Truly generative: proactive measures in place, safety measures under continuous review, range of diagnostic/remedial measures being considered, not complacent or self-congratulatory, still afraid of the hazards

Reason (1997) noted three types of risk models:

The Person Model
The Person Model is exemplified by the traditional occupational safety approach. The main emphasis is on individual unsafe acts and personal injury accidents. It is usually policed by safety departments. The most widely used countermeasures are 'fear appeal', unsafe act auditing, new procedures, training and selection.

The Engineering Model
The Engineering Model is system based and quantified where possible. Countermeasures are engineered into the system using techniques such as HazOps, FMECAs etc. Measures include quantified individual risk and societal risk.

The Organisational Model
The Organisational Model is allied to crisis management. Human error is a consequence and not a cause. Countermeasures aim at an 'informed culture'. Safety may be measured as quality.

Audit systems can often be seen to favour one or more of these models.


Reason also notes three types of culture, each having particular characteristics:

Pathological Culture
• Don’t want to know
• Messengers are 'shot' on arrival
• Responsibility is shirked
• Failure is punished or concealed
• New ideas actively discouraged

Bureaucratic Culture
• May not find out
• Messengers are listened to if they arrive
• Responsibility is compartmentalised
• Failures lead to local repairs
• New ideas often present new problems

Generative Culture
• Actively seek it
• Messengers are trained and rewarded
• Responsibility is shared
• Failures lead to far reaching reforms
• New ideas are welcomed

For Reason, an informed culture is a safety culture. It has the following components: a reporting culture, a just culture, a flexible culture and a learning culture.

A Reporting Culture

Disincentives
• Extra work
• Scepticism that anything constructive will be done to prevent a recurrence
• A desire to forget all about it
• Lack of trust
• Fear of reprisals

Incentives
• Indemnity against disciplinary proceedings
• Confidentiality or de-identification
• The separation of the agency or department collecting and analysing reports from those bodies with the authority to institute disciplinary proceedings and impose sanctions
• Rapid, useful, accessible and intelligible feedback to the reporting community
• Ease of making a report

A Just Culture

[Figure: A decision tree for determining the culpability of unsafe acts. Yes/No questions: Were the actions as intended? Were the consequences as intended? Were safe operating procedures knowingly violated? Were procedures available, workable, intelligible and correct? Were adequate training, selection processes and expertise available and present? Is there a history of unsafe acts? Outcomes, placed along a scale of diminishing culpability (1 to 0), include: sabotage, malevolent damage etc.; reckless violation; system induced violation; system induced error; blameless error; negligent error.]


A Flexible Culture

• A culture that favours face-to-face communication
• Work groups made up of divergent people (with shared values and assumptions)
• Able to shift from centralised control to a decentralised mode in which the guidance of local operations depends largely on the professionalism of the first-line supervisors

A Learning Culture

• Observing (noticing, attending, heeding, tracking)
• Reflecting (analysing, interpreting, diagnosing)
• Creating (imagining, designing, planning)
• Acting (implementing, doing, testing)

Reason is not the only author to notice the importance of culture. Charles Hampden-Turner (1990) describes virtuous and vicious circles, shown below.

[Figure: The Vicious Circle. The culture promotes an extreme formality and an increasing centralization of authority, thereby precipitating considerable informal resistance and dissent and a tendency for units to decentralize and deviate, with the result that...]

[Figure: The Virtuous Circle. The culture carefully notes what informal activity among the decentralized units is of most value to customers and formalizes these into its regular operations, ensuring that a centralized information system encourages...]


11.2 Transparent Independent Rapid Risk Reporting

A number of organisations have developed transparent, rapid risk reporting systems that are independent of line management. Such systems have two prime aims:

i) To enable rapid reporting of matters, such as critical near misses, that give individual employees a ‘chill’. A number of organisations have noted that just before something really serious happens, someone somewhere in the organisation develops a premonition which, if promptly reported, can prevent a disaster; and

ii) To deal with issues that normal, day-to-day line management systems have repeatedly failed to address. An example is remote monitoring systems that persistently fail despite the IT department’s recurring efforts to sustain them. Frustrated employees may otherwise develop hidden, independent fixes outside the ken of line management, and these can easily create latent conditions. Instead, one last risk communication system can be invoked.

One common approach is a weekly Red, Amber, Green (RAG) report. All employees should be able to use the RAG report to flag emergency risks, near misses / unsafe conditions and systemic failures, typically by email to a central coordinator. The report is sent electronically to all managers and board members weekly. If a critical issue is identified that requires immediate attention, it is entered into the RAG report as a 'red' risk. A review and/or investigation is then conducted to examine the extent of the problem. The problem is then actioned and moved to either the Amber (under review) or Green (fixed) section. Once an item is Green, it is deleted. If the emerging risk is ongoing, it should be transferred from the RAG report to the usual risk register database for ongoing monitoring. Such a process is peculiarly open and powerful, since it routinely steps outside normal day-to-day line management decision-making and real alerts are gratefully acknowledged. It does not appear to be abused, since raising a false alarm is personally damaging and is not repeated.

11.3 Generative Interview Techniques

This is a top down enquiry into, and judgement of, unique organisations, rather than a bottom up audit that seeks deficiencies and castigates variations among like organisations. The object is to probe until sufficient evidence to sustain a judgement is transparently available to those concerned. (Enquiries should be positive and indicate future directions, whereas audits are usually negative and suggest what ought not to be done.) The diagram below shows a stylised picture of the ‘corporate soup’. Individuals have different levels of responsibility. For example, some are firmly grounded, with direct responsibility for production and maintenance. Others work at the community interface surface, with responsibilities that extend deep into the organisation as well as high into the community.

[Figure: The ‘corporate soup’. Layers: community interface surface, corporate ocean, grass roots; hazards, vulnerabilities and pathogens shown against interview depth]


The idea is that a team interviews recognised 'good players' at each level of the organisation. If individuals at all levels consistently identify a common set of problems and, more particularly, solutions, then adopting those solutions should be fast and reliable. Other positive feedback loops should be created too. The process should be stimulating, educational and constructive. Good ideas from other parts of the organisation ought to be explained, and the interviewee's views sought on whether they should be implemented elsewhere. The following questionnaire has been used as a general basis for such an interview process.

SAMPLE GENERATIVE INTERVIEW GUIDE OVERVIEW

A. WHAT IS UNDERSTOOD BY RISK AND RISK MANAGEMENT?
The purpose of this section is to obtain the interviewee's initial perception of risk management in the organisation.
A.1 What is risk? (pure/business/speculative)
A.2 What is risk management? (AS/NZS 4360 vs other concepts like assurance, quality etc)
A.3 What risks are relevant to you? (types, concerns etc)
A.4 What risk management approaches do you currently use?
A.5 How effective do you believe your risk management systems are?
A.6 Are you familiar with the requirements of AS/NZS 4360?

B. WHAT RISK/DEPENDABILITY/ASSURANCE MEASURES AND TECHNIQUES ARE IN USE?
This section tests knowledge of formal risk related processes.
B.1 What specific risk skills have you and/or your people been trained in?
B.2 What makes you believe that when a (potential) emergency occurs your people will respond well?
B.3 Have you or others attended courses in risk management?
B.4 Do you have knowledge of the following techniques?
B.5 Do you have knowledge of the following codes and standards?
B.6 Do you have access to, and do your staff use, the library of past incidents?

C. WHAT IS THE PRESENT RISK/SAFETY CULTURE?
This section reflects the principle that systems must match cultures for optimum results.
C.1 Is your culture risk pro-active?
C.2 Does your section have a clear understanding of the organisation's aims?
C.3 Do your people have a clear understanding of your section's aims?
C.4 Do you feel there is a good active knowledge of past organisation risk failures?
C.5 What are your measures of risk performance?
C.6 Do you receive management feedback on risk performance?

D. WHAT RISK INFORMATION SYSTEMS ARE IN PLACE?
This section tests not only the types of risk information collected, but also how it is used and the overall integration of these systems.
D.1 What are your claims management/insurance/legal response systems?
D.2 How does your OSH&E function operate?
D.3 How does the internal audit system function?
D.4 Is the whole of life cost of risk available in the organisation's information and planning systems?

E. WHAT CHANGES WOULD YOU SUGGEST FOR RISK MANAGEMENT?
This section is particularly focused on the positive things that could be done to enhance risk management in the subject organisation. It is really a best practice focus: it identifies success factors (what is being done well) and how these can be extended. It also embodies the recognition that all organisations are unique and that there are different ways of achieving success.


11.4 Generative Solutions Technique

Hazard based approaches to risk focus on identifying problems and how they should be controlled. Concepts such as ALARP (as low as reasonably practicable) are often used. Another approach is simply to put up solutions, try them and see which work. Such an approach was used to develop the “best way forward” for Silver Fern Shipping (Kneller et al, 2002).

Following fires in the “Westralia” and “Helix”, a top down threat and vulnerability approach was initially adopted. Its purpose was to determine the primary issues with regard to potential fires in the unmanned engine rooms of the “Taiko” and “Kakariki”. The review concluded (amongst other matters) that stopping all fires from starting is very difficult indeed. But it also noted that fires in manned engine rooms were generally detected early and managed quickly. Such detection relied on the human senses: in addition to sight and smell, a change in the sound pattern or altered vibrations can also provide early alert. That is, early detection was achieved by more than just typical fire detection systems; the engine room staff actually acted as environmental monitoring devices.

This prompted speculation as to the best early detection system. No crisp answer was available. Much expensive research could have been undertaken. However, this would have committed the organisation to an endless series of irresolvable “what if” problems, and possibly to an untested technology, thereby sapping organisational resources and enthusiasm generally. It was also noted that the ships' (marine) engineers gained the greatest respect and satisfaction from fixing problems. If they had spare time at sea, they also seemed to have an uncontrollable urge to 'fiddle' with things. In view of this, a generative solutions approach was recommended. Basically, the chief engineers of the two ships were each given a budget to buy detection equipment. This potentially included sniffers, cameras (thermal imaging & others), vibration monitors (torsional and longitudinal), sound and noise analysers and the like. For the next few months they fiddled, and then returned to advise what worked well on their ship. This was seen to be cheaper than hiring engineering consultants or researchers to attempt to determine a solution that might or might not operate in a harsh marine environment. It was also constructive, agreeable and interesting for the crew.

REFERENCES
Hampden-Turner C (1990). Corporate Culture: From Vicious to Virtuous Circles. Hutchinson Business Books Limited, Great Britain.
Kneller A, Robinson R and McCann D (2002). A Fire Risk Assessment. Paper presented at the Pacific 2002 Conference, Darling Harbour, Sydney.
Reason J (1993). Managing the Management Risk: New Approaches to Organisation Safety. Chapter 1 of Reliability and Safety in Hazardous Work Systems: Approaches to Analysis and Design. Eds I Wilpert et al. Lawrence Erlbaum Associates Ltd, East Sussex. ISBN 0-86377-309-5.
Reason J (1997). Managing the Risks of Organizational Accidents. Ashgate Publishing Limited.

READING
Reason J (1990). Human Error. Cambridge University Press.


Mathematics


12. Risk and Reliability Mathematics

This section is devoted to pure risk mathematics as used for technical and safety risk. Financial (market risk) mathematics, which has both upside and downside components, is described in Section 17.5.

12.1 Discrete Event Mathematics

Both risk and reliability engineers require an appreciation of how the probabilistic outcomes of different independent events can be combined. This can be shown in several ways. The overall probability of at least one of two mutually independent systems operating successfully for a particular period of time can be shown as a block diagram, that is:

$$\Pr(A \text{ or } B) = \Pr(A) + \Pr(B) - \Pr(A) \times \Pr(B)$$

[Figure: Active Redundant System Block Diagram: blocks Pr(A) and Pr(B) in parallel]

(That is, both units are operating but only one needs to operate for success.) Note that a probability is a pure number between 0 and 1. So for the example above, if each unit has a 50% chance of operating in the next hour, then there is a 75% chance of at least one operating in the next hour. That is:

$$\text{if } \Pr(A) = \Pr(B) = 0.5 \ (50\%) \text{ then } \Pr(A \text{ or } B) = 0.5 + 0.5 - 0.5 \times 0.5 = 0.75 \ (75\%)$$

This can also be shown as a Venn diagram, below.

[Figure: Probability of Occurrence of at Least One of Two Independent Events: overlapping circles Pr(A) and Pr(B), with overlap Pr(A) x Pr(B)]

The probability of at least one of the two independent events occurring equals the combined area of the overlapping circles. That is, Pr(A) plus Pr(B), less the overlap Pr(A) x Pr(B) (the two events occurring simultaneously), which would otherwise be counted twice.
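The arithmetic can be checked with a short Python sketch. It computes Pr(A or B) in two ways: by the add-and-subtract-the-overlap formula above, and by the complement route, 1 - [1 - Pr(A)] x [1 - Pr(B)]. For independent events the two give the same answer. The function names are illustrative only.

```python
def prob_at_least_one(p_a: float, p_b: float) -> float:
    """Pr(A or B) for independent events: add the circles, subtract the overlap once."""
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("a probability must lie between 0 and 1")
    return p_a + p_b - p_a * p_b


def prob_at_least_one_complement(p_a: float, p_b: float) -> float:
    """Same quantity via the complement: 1 - Pr(neither occurs)."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)


print(prob_at_least_one(0.5, 0.5))             # 0.75
print(prob_at_least_one_complement(0.5, 0.5))  # 0.75
```

The complement form is the one that generalises most easily to more than two units, because it is simply a product of "does not occur" probabilities.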


12.1.1 Systems in Series

For a series block diagram, shown below, the probability of system success (all three independent components operating) can be shown as:

[Figure: Probability of Success of a Series System: blocks Pr(A), Pr(B), Pr(C) in series]

$$\Pr(\text{Success}) = \Pr_s(A) \times \Pr_s(B) \times \Pr_s(C)$$

where Pr_s(x) is the probability of success of component x.

The probability of failure of this system can then be described as:

$$\Pr(\text{Failure}) = 1 - \Pr(\text{Success}) = 1 - \{[1 - \Pr_f(A)] \times [1 - \Pr_f(B)] \times [1 - \Pr_f(C)]\}$$

where Pr_f(x) = 1 - Pr_s(x) is the probability of failure of component x.

12.1.2 Systems in Parallel

For a parallel block diagram, shown below, the probability of system success (at least one of the three independent components operating) can be shown as:

[Figure: Probability of Success of a Parallel System: blocks Pr(A) success, Pr(B) success and Pr(C) success in parallel]

$$\Pr(\text{Success}) = 1 - \{[1 - \Pr_s(A)] \times [1 - \Pr_s(B)] \times [1 - \Pr_s(C)]\}$$

Again, the probability of failure of this system can then be described as:

$$\Pr(\text{Failure}) = 1 - \Pr(\text{Success}) = \Pr_f(A) \times \Pr_f(B) \times \Pr_f(C)$$

The mathematical equivalence of these formulae should be noted.
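The series and parallel formulae can be sketched in a few lines of Python. The component success probabilities used for A, B and C are hypothetical. The assertions at the end check the equivalence noted above: for both arrangements, computing failure directly gives the same answer as 1 - Pr(Success).

```python
from math import isclose, prod
from typing import Sequence


def series_success(p_success: Sequence[float]) -> float:
    """Series: every independent component must work."""
    return prod(p_success)


def series_failure(p_fail: Sequence[float]) -> float:
    """Series fails if any component fails: 1 - product of (1 - Prf)."""
    return 1.0 - prod(1.0 - pf for pf in p_fail)


def parallel_success(p_success: Sequence[float]) -> float:
    """Parallel (active redundancy): at least one component must work."""
    return 1.0 - prod(1.0 - ps for ps in p_success)


def parallel_failure(p_fail: Sequence[float]) -> float:
    """Parallel fails only if every component fails."""
    return prod(p_fail)


# Hypothetical component success probabilities for A, B and C.
ps = [0.90, 0.95, 0.99]
pf = [1.0 - p for p in ps]

print(f"series success   {series_success(ps):.5f}")    # 0.84645
print(f"parallel success {parallel_success(ps):.5f}")  # 0.99995

# Pr(Failure) = 1 - Pr(Success) holds for both arrangements.
assert isclose(series_failure(pf), 1.0 - series_success(ps))
assert isclose(parallel_failure(pf), 1.0 - parallel_success(ps))
```

With the same three components, the series arrangement is less reliable than its weakest member, while the parallel arrangement is far more reliable than its strongest.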


12.1.3 Fault Trees & Block Diagrams

Most risk and reliability analysis is done on a frequency basis, that is, in events per period (usually a month or a year). For project management, this may not be so. Usually the problem in question applies to a particular project. This means it has a "probability" (a pure number between 0 and 1) of occurrence for that project, rather than any time basis. To get around this, the term "likelihood" is used in this text as a general term to describe a probability, a frequency or a combination of both. The relationships between the probabilities of failure and success for OR and AND systems are shown below.

[Figure: Venn, Fault Tree & Block Diagram Comparisons. Series blocks (Pr(A) success, Pr(B) success) compared with a fault tree OR gate (Pr(A) fails OR Pr(B) fails); parallel blocks (Pr(A) success, Pr(B) success) compared with a fault tree AND gate (Pr(A) fails & Pr(B) fails); "Swiss cheese" Venn diagrams of A fails and B fails for each case]

[Figure: Series of Failures Required for a Mid Air Collision to Occur (after Reason). Stages: traffic density, radar option, separation/segregation, see and avoid, near miss, mid air collision]
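A fault tree uses the same arithmetic as the block diagrams, expressed in terms of failure: an OR gate behaves like a series system and an AND gate like a parallel one. The sketch below also shows one way a "likelihood" can combine a frequency with probabilities. It treats the first stage as an initiating frequency per year and the later stages as independent, per-demand barrier failure probabilities. Reading the mid air collision chain this way is an assumption made for illustration, and all the numbers are hypothetical.

```python
from math import prod
from typing import Sequence


def or_gate(p_fail: Sequence[float]) -> float:
    """Output event occurs if ANY independent input occurs (series logic)."""
    return 1.0 - prod(1.0 - p for p in p_fail)


def and_gate(p_fail: Sequence[float]) -> float:
    """Output event occurs only if ALL independent inputs occur ("Swiss cheese")."""
    return prod(p_fail)


def and_gate_with_initiator(frequency_pa: float,
                            barrier_fail: Sequence[float]) -> float:
    """Frequency (per year) of an initiating event getting through every barrier.

    Multiplying a frequency by probabilities keeps the units of frequency,
    provided each barrier probability is per demand and independent.
    """
    return frequency_pa * and_gate(barrier_fail)


# Hypothetical numbers only: 100 close encounters p.a. from traffic density,
# then radar, separation/segregation and see-and-avoid barriers failing.
collisions_pa = and_gate_with_initiator(100.0, [0.01, 0.1, 0.2])
print(f"{collisions_pa:.3f} collisions p.a.")  # 0.020
```

The independence assumption matters. If one cause, such as poor visibility, degrades several barriers at once, the true result can be far worse than the product suggests. This is the common mode problem discussed in Section 10.4.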


12.2 Breakdown Failure Mathematics

Reliability is inextricably entwined with availability. If availability is thought of in terms of a repairable system being “up” or “down”, then a number of concepts and terms can be simply defined.

[Figure: Two State Availability Concept: a system alternating over time between the up state (acceptable) and the down state (unacceptable); time interval = t]

The time in the up state is related to reliability, and the time in the down state to repair.
MDT, or Mean Down Time, is the average time the system is in a down state.
MTTR, or Mean Time To Repair, is the average time to restore the system to the up state.
MTBF, or Mean Time Between Failures, is the average up time.

For a system where the breakdown failure rate is constant with respect to time (or random), the reliability is calculated as:

$$R = e^{-\lambda t} = e^{-t/\text{MTBF}}$$

where:
R is reliability
t is mission time in hours
λ = 1/MTBF is the (average) failure rate per hour
MTBF is the mean time between breakdown failures in hours
e = 2.71828… (a constant)

For example, if λ = 0.01 per hour (1 per 100 hours) and t = 10 hours, then R = e^(-0.1) ≈ 0.905. That is, there is about a 90% chance of the item operating continuously for that 10 hour period. Where the mission time equals the MTBF, the reliability formula reduces to:

$$R = e^{-1} = 0.368$$

This predicts that about 37% of the population will survive until the MTBF. Note that unreliability is:

$$1 - R = 1 - e^{-\lambda t}$$

For t = 1 and very small λ (around 10^-6 to 10^-7), then:

$$\lambda \approx 1 - e^{-\lambda t}$$

This is the point at which the reliability engineer’s failure rate becomes equivalent to the risk engineer’s failure frequency. Mathematically at least, it suggests that risk is a simplification of reliability:

$$\text{Unreliability (1 year)} \approx \lambda \text{ per year}$$
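The formulae above can be evaluated directly, and doing so also shows how close the rare-event approximation is. This is a minimal Python sketch assuming a constant failure rate, as the text does. It uses `math.expm1` so that 1 - e^(-λt) stays accurate when λt is tiny; the function names are illustrative.

```python
import math


def reliability(failure_rate: float, mission_time: float) -> float:
    """R = exp(-lambda * t) for a constant (random) failure rate."""
    return math.exp(-failure_rate * mission_time)


def unreliability(failure_rate: float, mission_time: float) -> float:
    """1 - exp(-lambda * t); expm1 keeps precision when lambda * t is tiny."""
    return -math.expm1(-failure_rate * mission_time)


# Worked example from the text: lambda = 0.01 per hour, t = 10 hours.
print(round(reliability(0.01, 10), 3))  # 0.905

# Mission time equal to the MTBF (t = 1 / lambda) gives e^-1.
mtbf = 100.0
print(round(reliability(1 / mtbf, mtbf), 3))  # 0.368

# Rare-event case: for t = 1 and lambda = 1e-6, unreliability is almost exactly lambda.
lam = 1e-6
print(unreliability(lam, 1.0) / lam)  # very close to 1
```

The last ratio differs from 1 by roughly λt/2. This is why treating a small failure rate as an annual probability is harmless for risk work, but becomes misleading once λt approaches 0.1 or more.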


The table below summarises the different terminology sometimes used to describe availability.

Downtime per year     Availability per year
up to 30 secs         99.999905%   ("6 nines")
up to 1 min           99.999810%
up to 5 mins          99.999049%   ("5 nines")
up to 10 mins         99.998097%
up to 30 mins         99.994292%
up to 45 mins         99.991438%   ("4 nines")
up to 1 hr            99.988584%
up to 2 hrs           99.977169%
up to 10 hrs          99.885845%   ("3 nines")

Summary of Availability Numbers
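The table can be regenerated from the definition of availability as the fraction of the year spent in the up state. This is a minimal Python sketch assuming a 365-day year (31,536,000 seconds), which reproduces the figures above. The inverse function answers the more common design question: how much downtime does a given availability target allow?

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s; assumes a 365-day year


def availability_percent(downtime_seconds: float) -> float:
    """Availability over one year, given the total downtime in that year."""
    return 100.0 * (1.0 - downtime_seconds / SECONDS_PER_YEAR)


def allowed_downtime_seconds(availability_pct: float) -> float:
    """Inverse: the downtime per year that a target availability permits."""
    return (1.0 - availability_pct / 100.0) * SECONDS_PER_YEAR


for label, secs in [("30 secs", 30), ("5 mins", 300), ("1 hr", 3600), ("10 hrs", 36000)]:
    print(f"{label:>8}: {availability_percent(secs):.6f}%")

# "Five nines" (99.999%) permits roughly 5.3 minutes of downtime per year.
print(f"{allowed_downtime_seconds(99.999) / 60:.1f} mins")
```

Using a 365.25-day year instead would shift the sixth decimal place slightly. That is worth knowing when comparing against figures quoted by others.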

12.3 State Theory Mathematics

State theory analysis considers the ‘states’ in which a system can exist. An example of a multi-state system is shown below. This system can be in one of three possible states at any given time:
S1 - Both units A & B are operating
S2 - One unit, A or B, has ceased to operate but the other is still functioning
S3 - Both units A & B have ceased to operate

[Figure: Multi State System: units A and B]

One reason for this type of modelling is to take into account the decrease in reliability due to solo operation. That is, the load on the second unit may be greater than when both are operating, implying that the breakdown rate of the system is higher once the first unit has failed. Breakdown and repair times can follow exponential, log normal or Weibull distributions, in order of increasing analytical complexity. The simplest type of analysis is Markov analysis, which assumes constant breakdown and repair rates. Monte Carlo simulation techniques are often necessary for models using the other breakdown failure distributions. The modelling starts with the system in its ‘perfect’ state (S1). It then defines all the intermediate states (S2 to Sn-1) through to failure (Sn). These states can include degradation, maintenance and repair. The diagrams can be represented in different ways too. Consider the two units, A and B, below, which are identical and have the same failure rate and repair rate.


[Figure: Multi State System State Diagram: system states S1 (A & B operating), S2 (A or B failed) and S3 (A & B failed) plotted against time]

12.3.1 Markov Analysis

Consider a single unit with three states, such as a ball bearing:
S1 - Bearing is in good working order
S2 - Bearing is degenerating (increased vibration)
S3 - Bearing has failed

[Figure: Component State Diagram: S1, S2, S3]

The failed state can be reached in different ways. One route is normal wear leading to failure if the bearing is not replaced (S1 to S2 to S3). The other is a catastrophic breakdown of the bearing due to the propagation of a hairline crack (S1 to S3). Returning to the two-unit system, the system MTTF for active redundancy (two identical units) is shown in the equation below:

$$\text{MTTF} = \frac{3\lambda + \mu}{2\lambda^2} \approx \frac{\mu}{2\lambda^2} \quad \text{if } \mu \gg \lambda$$

where:
λ = 1/MTBF is the (average) failure rate per hour
µ = 1/MTTR is the (average) repair rate per hour

This equation is obtained by analysing the probability of each system state at any given time, then developing and solving a set of differential equations. Take the above multi state system and assume the units are electrical generators. Each has a failure rate of 1 x 10^-4 per hr (100 failures per million hours) and a repair rate of 2 x 10^-2 repairs per hr (an average of 50 hrs per repair). If this system is in active redundancy, then:

$$\text{MTTF} = \frac{3\lambda + \mu}{2\lambda^2} = \frac{3 \times 10^{-4} + 2 \times 10^{-2}}{2 \times (10^{-4})^2} = \frac{0.0203}{0.02 \times 10^{-6}} = 1{,}015{,}000 \text{ hrs}$$
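The closed form can be cross-checked by first-step analysis. This is a standard way of obtaining a Markov chain's mean time to absorption without solving the differential equations. The sketch makes the same assumptions as the formula: both units share the same constant failure rate λ (no extra load on the surviving unit), and a failed unit is repaired at the constant rate µ. The function names are illustrative.

```python
def mttf_closed_form(lam: float, mu: float) -> float:
    """MTTF = (3*lam + mu) / (2*lam**2) for two identical active-redundant units."""
    return (3 * lam + mu) / (2 * lam ** 2)


def mttf_first_step(lam: float, mu: float) -> float:
    """Mean time to reach S3 from S1, from the Markov transition rates.

    S1 -> S2 at rate 2*lam (either unit fails).
    S2 -> S1 at rate mu (repair) or S2 -> S3 at rate lam (second failure).
    With T1, T2 the mean times to S3 from S1 and S2:
        T1 = 1/(2*lam) + T2
        T2 = 1/(lam + mu) + (mu / (lam + mu)) * T1
    Substituting T2 into the first equation and solving for T1:
        T1 * lam / (lam + mu) = 1/(2*lam) + 1/(lam + mu)
    """
    return (1 / (2 * lam) + 1 / (lam + mu)) * (lam + mu) / lam


lam, mu = 1e-4, 2e-2  # generator example: failures per hr, repairs per hr
print(f"closed form : {mttf_closed_form(lam, mu):,.0f} hrs")
print(f"first step  : {mttf_first_step(lam, mu):,.0f} hrs")
print(f"mu/(2 lam^2): {mu / (2 * lam ** 2):,.0f} hrs (approximation)")
```

If solo operation raises the surviving unit's failure rate, only the S2 to S3 rate in the second equation changes. This is an easy extension of the sketch, and one the closed form above does not cover.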


12.4 Fractional Dead Time Mathematics

Fractional Dead Time (FDT) is the fraction of time that the equipment is dead (cannot operate properly). The dead equipment does not itself pose a threat; the threat arises only when another hazard, such as fire, is realised while the equipment is dead. The probability of the uncontrolled hazard (hence, the overall failure rate) can be determined through a simple AND gate argument:

[Figure: AND Gate Argument: HAZARD (chances p.a.) & CONTROL DEAD (FDT) give UNCONTROLLED HAZARD]

For example, a fire detection system that is checked weekly and takes one hour to repair has a maximum dead time of one week and one hour. Assuming one equipment failure on average per year gives a maximum FDT of 0.01929 (169 hours per 8,760 hours). Similarly, equipment averaging 2 failures per year has an FDT of 0.03858. If the building typically experiences a fire once every 10 years (or 0.1 chances p.a.), then the probability of an undetected fire is:

0.1 × 0.01929 = 0.001929 chances p.a.

The occurrence of equipment failure can be estimated as the Mean Time Between Failure (MTBF). MTBF is the reciprocal of the equipment failure rate, so the above example has an MTBF of 1 year. Note that the MTBF is characteristic of the equipment item and is independent of the frequency of testing.

The equivalent measure for the overall hazard is the Mean Time Between Hazard (MTBH). It is the reciprocal of the probability of the overall hazard (a fire with no detection). In our example, the MTBH is 518 years (1/0.001929). If the system were checked only once a year, the FDT would approach 1 and the MTBH would fall to about 10 years. These examples show the importance of checking equipment regularly. The time between checks is usually much greater than the time required to repair the equipment.

Because MTBH and FDT are reciprocally related, using the worst-case FDT produces a minimum MTBH. In practice, however, if the system fails randomly then on average it fails midway between tests. For the above example, the FDT would then be half a week plus one hour (85 hours per 8,760 hours), or 0.0097:

Mean Time Between Hazard = 1/(0.1 × 0.0097) ≈ 1,030 years

Which calculation better approximates reality depends on the failure curve after testing. Failure is often most likely immediately after the equipment goes back on line following a test, rather than at random. In that case, the minimum Mean Time Between Hazard is probably the prudent design assumption.
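The FDT and MTBH arithmetic above can be reproduced with a short Python sketch. It assumes that hidden failures are found only at the next test and that an 8,760-hour year is used. The FDT is capped at 1.0 because equipment cannot be dead for more than the whole year. The simple product of failures and dead time is only a good approximation when the result is small. Function and parameter names are illustrative.

```python
HOURS_PER_YEAR = 8760


def fractional_dead_time(failures_per_year, test_interval_h, repair_h, average=False):
    """Fraction of the year a hidden-failure protective system is dead.

    Worst case: the failure occurs just after a test, stays undetected for the
    whole test interval, and then takes repair_h to fix.
    Average case (average=True): a random failure stays undetected for half the
    test interval on average.
    """
    undetected_h = test_interval_h / 2 if average else test_interval_h
    dead_h_per_failure = undetected_h + repair_h
    return min(1.0, failures_per_year * dead_h_per_failure / HOURS_PER_YEAR)


def mean_time_between_hazard(hazard_rate_pa, fdt):
    """MTBH in years: reciprocal of the uncontrolled-hazard rate (AND gate: rate x FDT)."""
    return 1 / (hazard_rate_pa * fdt)


fire_rate = 0.1  # fires per year (one every 10 years)
cases = [("weekly, worst case", 168, False),
         ("weekly, average", 168, True),
         ("yearly, worst case", HOURS_PER_YEAR, False)]
for label, interval_h, avg in cases:
    fdt = fractional_dead_time(1, interval_h, 1, average=avg)
    mtbh = mean_time_between_hazard(fire_rate, fdt)
    print(f"{label:20s} FDT = {fdt:.4f}  MTBH = {mtbh:,.0f} years")
```

The three cases reproduce the 518 year, roughly 1,030 year and 10 year figures in the text. The small differences come only from intermediate rounding.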


READINGS

Finucane, Pinkney et al. (1989). Reliability of Fire Protection and Detection Systems. Fire Safety Engineering: Proceedings of the 2nd International Conference.
Kirwan, Barry (1994). A Guide to Practical Human Reliability Assessment. Taylor & Francis, London.
Lees, F P (1995). Loss Prevention in the Process Industries. 2nd Edition (3 volumes). Butterworth-Heinemann, Oxford, UK.
Moubray, John (1992). RCM II: Reliability Centred Maintenance. Butterworth-Heinemann.
Sherwin & Bossche (1993). The Reliability, Availability & Productiveness of Systems. Chapman & Hall, London.
Smith, Anthony (1993). Reliability Centred Maintenance. McGraw-Hill.
Smith, David J (1993). Reliability, Maintainability and Risk: Practical Methods for Engineers. 4th Edition. Butterworth-Heinemann, Oxford.
Villemeur, Alain (1992). Reliability, Maintainability and Safety Assessment. John Wiley & Sons.
Vinogradov, Oleg (1991). Introduction to Mechanical Reliability: A Designer's Approach. Hemisphere Publishing Corporation, New York.


Process Industry and Consequence Modelling


13. Process Industry Modelling

13.1 Safety Cases

With large and complex plants, managing safety, health and environmental issues requires a formal management system. The formal approach adopted is usually referred to as a safety management system (SMS). An argument or case that a facility is operated with acceptable risks is often termed a safety case. There are parallels with business cases, which are usually drawn up to convince a financier that a business is viable (Redmill). The object of a business case is to ensure that all significant factors affecting the business have been identified. It also ensures that appropriate measures are in place to maximise the positive factors and minimise the negative ones. It is usually the responsibility of the highest levels of management within the organisation. Accordingly, responsibility for failure of the business usually rests there too. A safety case is intended to provide the same assurance with respect to the safety of a system or facility. Again, it is primarily the responsibility of the operating company, at its highest levels. The Victorian major hazards legislation, for example, requires the CEO, or the most senior company officer resident in Victoria, to sign off the safety case.

[Figure: Idealised Safety Case Structure. Elements: Board; CEO; Business Management System with Financial Audit; Safety Management System with Safety Audit; Middle Management; Business Units]

Once established, a safety case effectively operates as a contract between an organisation and a regulator. It permits the organisation to operate within defined limits in accordance with documented procedures. Failure to comply is a breach of contract. If such a breach causes damage to third parties, or death or injury, serious liabilities arise. Because of this, it appears to the authors that the legal system has converted the safety case concept into a liability management device. An overriding consideration is therefore that any safety case work be to the satisfaction of legal counsel. This is difficult to achieve if the safety task is assigned to technical 'experts' in isolation.

An initial context definition is essential. What constitutes a safety case varies from industry to industry, and the paradigm discussion in Chapter 2 is relevant. The authors have made a number of presentations to lawyers. Based on these, at least the techniques and paradigms highlighted in the following table can be used in developing a safety case.


| Risk Management Paradigm / Technique | Expert reviews | Facilitated workshops | Selective interviews |
|---|---|---|---|
| 0. The rule of law | Yes (Legal opinions) | Yes (Arbitration, moot courts) | Yes (Royal Commissions) |
| 1. Insurance approaches | Yes (Risk surveys, actuarial studies) | Yes (Risk profiling sessions) | Yes (especially moral risk) |
| 2. Asset based, 'bottom-up' approaches | Yes (QRA, availability & reliability audits) | Yes (HazOps, FMECAs etc) | Difficult |
| 3. Threat based 'top-down' approaches | Difficult in isolation | Yes (SWOT & vulnerability) | Yes (Interviews) |
| 4. Business (upside AND downside) approaches | Yes (Actuarial studies) | Difficult in isolation | Yes (Fact finding tours) |
| 5. Solution based 'best practice' approaches | Difficult to be comprehensive | Difficult to be comprehensive | Yes (Fact finding tours) |
| 6. Biological, systemic mutual feedback loop paradigms | Yes (Computer simulations) | Yes (Crisis simulations) | Difficult |
| 7. Risk culture concepts | Yes (Quality audits) | Difficult | Yes (Interviews) |

Risk Management Paradigm - Technique Matrix

Each of the approaches in the cells above has particular strengths and weaknesses, and they can be combined in different ways. This chapter considers how safety case arguments are developed in the process industries. It focuses especially on quantitative risk assessment (QRA), the mainstay of process industry safety cases to date. Note also that legislation in various states calls up the term 'safety case', especially in regard to major hazard industries. Because of this, smaller facilities may choose to undertake a similar process but use a different term to avoid legal entanglements.

13.2 Context (Top Down)

There are a number of methods by which the context, and the depth of technical study required, can be assessed and explained.

13.2.1 Vulnerability Workshops

Top down techniques are described generically in Chapter 7. They include asset and threat assessments such as those used by military intelligence and other authorities. The key hazards are identified in a workshop with key design team personnel. The workshop uses a consequence assessment based on an Asset and Threat (or vulnerability) technique. This tends to focus on the worst possible outcomes, irrespective of their cause or relative likelihood.


Basically, the main assets are defined and then all the possible threats to them are identified. The concept is that a hazard exists when an asset is actually vulnerable to a threat. The main asset groups usually include:

• People (especially off site persons like pedestrians, people in vehicles, neighbours and the wider community, as well as visitors, employees, contractors and emergency services)
• Environment (habitat)
• Operability (business continuity)
• Property (third party and company property)

Threats are typically energy based in the first instance, for example:

• Chemical Energy (including fire, explosion, BLEVE, toxic cloud, vapour cloud explosion)
• Kinetic Energy (including impact of cars, trucks, and projectiles such as exploding 200 l drums)
• Potential Energy (including landslides, collapsing structures, falling objects, dam failure)
• Environmental (including storm, wind, hail, lightning, floods)

[Figure: Threat and Vulnerability Approach. Phase 1, Context Definition & Legal Sign Off: Vulnerability (Context) Workshop (group session, best available knowledge, completeness check, representative scenario identification, corporate legal sign off). Phase 2, Technical Study: Safety Case, WorkCover Regulations (high consequence, low frequency, control measures, safety management system, emergency planning); Fire Safety Study, NSW Dept. of Planning HIPAP 2 (model scenario impacts, preventative measures, protection measures, capability of resources); OH&S (manual handling, machine guarding)]

Depending on the outcome of the workshop, the need for further detailed studies can be decided.

13.2.2 Tiered Approach

A three-stage process, consistent with the National Code of Practice for the Control of Major Hazard Facilities (NOHSC:2016:1996), provides a tiered approach to risk review. It suggests that the following types and combinations of risk assessment be considered:

• a broad qualitative hazard analysis;
• a semi-quantitative hazard consequence evaluation to determine hazard effects; or
• a quantitative risk assessment.


[Figure: Multilevel Risk Review. Define scope of risk review; determine the proposed risk review process, methodology and criteria for Levels I, II and III with the relevant public authority; conduct preliminary hazard analysis (checklists, 'what if' analysis, reactive chemicals review, consolidated audit, technology review, insurance inspections). If Level I criteria are exceeded: conduct risk evaluation (HazOps, FMECAs, zonal vulnerability analyses, consequence analysis, consequence classification, qualitative likelihood assessment). If Level II criteria are exceeded: conduct quantitative risk assessment (QRA, quantitative cause consequence modelling, escalation & propagation, scenario assessment). If Level III criteria are exceeded: high level review of the activity with the relevant public authority. Other elements: identify opportunities to reduce risk and revise system; manage residual risk; IEC 61508 criteria]

The National Code of Practice for the Control of Major Hazard Facilities gives an example of the Multilevel Risk Review process used by Dow Chemical Limited. The version adapted by R2A is shown in the Multilevel Risk Review figure. A similar approach is followed in the New South Wales Department of Urban Affairs and Planning's "Multi-Level Risk Assessment" guidelines (1997). The multilevel risk review is tiered so that detailed studies may not be needed. If the preliminary studies find no significant offsite risks, studies such as quantitative risk assessment may not be necessary.
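The escalation logic of the multilevel review can be summarised as a simple decision procedure. The Python sketch below is a simplification under stated assumptions. Each flag records whether the criteria for that level, agreed with the relevant public authority, are exceeded. A later tier is only reached when the earlier tier's criteria are exceeded. The flowchart's loop to "identify opportunities to reduce risk and revise system" is omitted. The function name and step wording are illustrative.

```python
def multilevel_review(exceeds_level_1, exceeds_level_2, exceeds_level_3):
    """Return the ordered list of review steps triggered by a tiered risk review.

    Flags for later levels are ignored once an earlier level's criteria are met,
    because the more detailed study is then not required.
    """
    steps = ["Define scope of risk review", "Conduct preliminary hazard analysis"]
    if not exceeds_level_1:
        return steps + ["Manage residual risk"]
    steps.append("Conduct risk evaluation (HazOps, FMECA, consequence analysis)")
    if not exceeds_level_2:
        return steps + ["Manage residual risk"]
    steps.append("Conduct quantitative risk assessment (QRA)")
    if not exceeds_level_3:
        return steps + ["Manage residual risk"]
    return steps + ["High level review of activity with relevant public authority"]


# A facility whose preliminary analysis already meets Level I criteria needs no QRA.
for step in multilevel_review(False, True, True):
    print(step)
```

This structure keeps the cost of the study proportional to the risk. The expensive QRA step is reached only when both earlier tiers indicate it is warranted.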


13.3 Quantitative Risk Assessment (QRA)

13.3.1 Concept

The figure below summarises an individual risk plotting process. It is a preliminary individual risk diagram for an LPG tank at a service station. Known hazards include:

• a relief valve fire on the tank itself;
• a relief valve fire on the truck that fills it;
• major leak valve fires; and
• a tank rupture with a resulting vapour cloud explosion.

Each hazard has a different likelihood of occurrence and consequence severity, as well as a different location and hazard radius.

[Figure: Individual risk plot for an LPG Tank (plan is a 10 m grid). Events and frequencies: Tank Relief Valve Fire (17 × 10⁻⁶ pa), Tanker Relief Valve Fire (10 × 10⁻⁶ pa), Major Leak Fire (7 × 10⁻⁶ pa), Tank Rupture Explosion (3 × 10⁻⁶ pa). Iso-risk contours at 3, 10, 20 and 37 × 10⁻⁶ pa relative to the site boundary; scale in chances in a million per year (0 to 40)]

The likelihood of each event is shown in chances per year. Each circle represents the region in which an unprotected standing person is likely to be killed if that particular event occurs. Summing the frequencies of all events whose circles cover a point gives the likelihood of killing an individual who stands at that spot continuously for one year. Once the cumulative risk has been added up at different locations, iso-risk contours can be plotted. These can be compared with the land use planning criteria described later in this chapter to determine whether the facility or operation is acceptable.

Individual risk is the risk that an individual would face from a facility if they remained fixed at one spot 24 hours a day, 365.25 days per year: the so called "tethered person". This effectively represents an individual such as a toddler or an elderly adult who has limited mobility. Such a person may be expected to be present at a residential location for much of the time.
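The summation at a point can be sketched in a few lines of Python. The event frequencies are those shown in the figure. The geometry is hypothetical: all four lethal circles are assumed to be centred on the tank, with illustrative radii of 10, 20, 30 and 40 m, whereas in the figure the tanker event is centred on the truck. The sketch also assumes certain death inside each circle and none outside it.

```python
import math

# (event, frequency per year, centre (x, y) in m, lethal radius in m)
# Frequencies from the figure; centres and radii are hypothetical.
EVENTS = [
    ("Tank relief valve fire",   17e-6, (0.0, 0.0), 10.0),
    ("Tanker relief valve fire", 10e-6, (0.0, 0.0), 20.0),
    ("Major leak fire",           7e-6, (0.0, 0.0), 30.0),
    ("Tank rupture explosion",    3e-6, (0.0, 0.0), 40.0),
]


def individual_risk(x, y, events=EVENTS):
    """Fatality frequency per year for a person tethered at (x, y).

    Sums the frequency of every event whose lethal circle contains the point.
    """
    return sum(freq for _, freq, (cx, cy), radius in events
               if math.hypot(x - cx, y - cy) <= radius)


for distance_m in (5, 15, 25, 35, 45):
    risk = individual_risk(distance_m, 0.0)
    print(f"{distance_m:>3} m: {risk * 1e6:.0f} chances in a million per year")
```

Moving outwards from the tank, the sketch reproduces the 37, 20, 10 and 3 × 10⁻⁶ pa contour values, and zero beyond the largest circle. Evaluating `individual_risk` over a grid of points and contouring the result gives the iso-risk plot.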


The generic steps of the QRA procedure for assessing process hazards are:

a) Context and Scope
b) Credible Threat (Hazard) Identification
c) Likelihood Assessment
d) Consequence Assessment
e) Risk Assessment (combining c and d)

The five key stages of the QRA process are expanded in the following sections.

13.3.2 Credible Threat (Hazard) Identification

Credible threat (hazard) identification is the stage where materials, equipment and operations that have the potential to do harm are identified. Threats can include the storage or processing of hazardous substances. They also include operations where an error can release hazardous material or damaging energy. A number of generic techniques can be used to perform a well documented and systematic threat (hazard) identification. Some of these techniques are discussed in Chapters 7, 9 and 10. Chief amongst these are:

Top Down
* Threat and Vulnerability Assessments (can be done on a geographic or zonal basis)
* Tiered Approach (Section 13.2.2)

Bottom Up
* Failure Mode, Effects and Criticality Analyses (FMECA)
* Hazard and Operability Studies (HazOps)

13.3.3 Likelihood Assessment

When all threats (hazards) have been identified, the frequency of their occurrence is estimated, usually from relevant historical data. In the process industries the initial incident usually involves a loss of containment of some sort, typically a leak. The most common failure modes are therefore various hole sizes producing different sized leaks. R2A like to use the term "Hazardous Event" for the initial incident, because at this point there is a chance that no harm will eventuate.

Hazards can have a variable number of potential failure modes. For example, piping sections have an infinite spectrum of potential hole sizes and resultant release rates. To deal with this, the failure modes (hole sizes) of the equipment making up the hazard are broadly categorised into a number of discrete groups, such as pinhole, hole and rupture. The number of groups used depends on three things: how sensitive the overall risk results are to the grouping, the nature of the available historical failure rate data, and the need to keep the analysis from becoming overly complex. Once the failure modes of a hazard are categorised, all components contributing to each failure mode are identified. The next figure shows how the failure rates of the various components are aggregated into an overall failure rate.

[Figure: Fault tree showing logical combination of component failures. Potential failure components (piping, flanges, valves, etc.) combine through an OR gate into the hazardous event failure mode (minor leak), shown against time]
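The OR gate in the fault tree can be sketched as an aggregation over the items in an isolatable section. The Python example below assumes that component failures are independent and have small, constant rates. Under that assumption, the frequency of "any item leaks" is the sum of the individual rates. The per-item rates and the section make-up are hypothetical placeholders, not values from a failure rate database.

```python
from collections import Counter

# Hypothetical per-item failure rates (per year) by hole-size category.
# Placeholder numbers for illustration only.
FAILURE_RATES = {
    "pipe_per_m": {"pinhole": 1e-5, "hole": 3e-6, "rupture": 5e-7},
    "flange":     {"pinhole": 5e-5, "hole": 1e-5, "rupture": 1e-6},
    "valve":      {"pinhole": 1e-4, "hole": 2e-5, "rupture": 2e-6},
}


def section_failure_rates(item_counts):
    """Aggregate item failure rates into a section leak frequency per hole size.

    This is the OR gate: any one item leaking causes the hazardous event, and for
    independent items with small constant rates the OR-gate frequency is the sum.
    """
    totals = Counter()
    for item, count in item_counts.items():
        for hole_size, rate in FAILURE_RATES[item].items():
            totals[hole_size] += count * rate
    return dict(totals)


# A hypothetical isolatable section: 120 m of pipe, 14 flanges and 6 valves.
section = {"pipe_per_m": 120, "flange": 14, "valve": 6}
for hole_size, freq in section_failure_rates(section).items():
    print(f"{hole_size:8s} {freq:.2e} per year")
```

Keeping the rates per hole size allows each category to feed its own event tree in the consequence assessment.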


The process described here has been systematically expressed as the R2A computer based system of work, as follows.

• Process and instrumentation diagrams (P&IDs) are imported as images into the R2A system.
• Identified hazards are separated into isolatable sections containing common failure modes (pipes or vessels).
• Intelligent computer 'objects' representing all valves, flanges, vessels, pumps, pipework etc. are overlaid on the P&ID. These (potential) failure items are linked to a failure rate database.

Each isolated section is aware of the failure items associated with it. The failure rates for the range of hole sizes deemed appropriate for the section can therefore be aggregated. Up to 4 hole sizes are selected to represent the spectrum of failure hole sizes possible for the process section under consideration.

13.3.4 Consequence Assessment

Incident Outcome Determination

Having established the range of failure modes to be considered for each hazard, the next stage of the analysis is to determine the range of possible outcomes for each failure mode. This depends on the existence and implementation of mitigation measures (automatic or manual detection and isolation). It also depends on the potential for event escalation (for example, ignition of flammable material).

Event (outcome) tree analysis is a useful method for representing the time sequence of events and the possible outcomes following a release. The event tree starts at the hazardous event, which is one of the failure modes of the hazard in question. The tree then branches. Each branch point represents an intermediate event, such as early ignition of a flammable release. Each branch is assigned a probability, and the ends of the tree represent the probabilistic distribution of all potential outcomes. The figure below extends the fault tree shown in Section 13.3.3 Likelihood Assessment. It includes the fault tree as well as an event tree, and hence becomes a cause-consequence diagram.

[Figure: Cause Consequence Diagram. Threat (hazard) components (piping, flanges, valves, etc.) combine through an OR gate into the hazardous event failure mode (loss of control: minor leak). Intermediate events: Rapid Isolation? (Yes: small release) then Delayed Isolation? (Yes: medium release; No: large release)]

The intermediate events that can produce different permutations of outcomes include release intervention strategies such as:

* automatic detection and isolation equipment
* manual detection and isolation equipment

or factors affecting the nature of a release such as:

* early or delayed ignition of flammable releases
* rainout of a two phase release
* high obstacle density providing the potential for detonation of a release
* presence of bunding or drainage for liquid releases.
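The event tree part of the cause-consequence diagram can be evaluated by multiplying the hazardous event frequency along each branch. The Python sketch below assumes the branch structure read from the diagram. If rapid isolation succeeds, the result is a small release. If rapid isolation fails, successful delayed isolation gives a medium release, and failed delayed isolation gives a large release. The leak frequency and branch probabilities in the usage are hypothetical. For hidden-failure isolation equipment, the success probability might be estimated as 1 minus its fractional dead time (Section 12.4).

```python
def isolation_event_tree(leak_freq, p_rapid_isolation, p_delayed_isolation):
    """Outcome frequencies (per year) for a two-stage isolation event tree.

    leak_freq            frequency of the hazardous event, e.g. from the fault tree
    p_rapid_isolation    probability that rapid isolation succeeds
    p_delayed_isolation  probability that delayed isolation succeeds,
                         given that rapid isolation has failed
    """
    small = leak_freq * p_rapid_isolation
    medium = leak_freq * (1 - p_rapid_isolation) * p_delayed_isolation
    large = leak_freq * (1 - p_rapid_isolation) * (1 - p_delayed_isolation)
    return {"small release": small, "medium release": medium, "large release": large}


# Hypothetical inputs: leak frequency 2.5e-3 per year, 90% rapid and 50% delayed isolation.
outcomes = isolation_event_tree(2.5e-3, p_rapid_isolation=0.9, p_delayed_isolation=0.5)
for outcome, freq in outcomes.items():
    print(f"{outcome:15s} {freq:.2e} per year")
```

The outcome frequencies always sum to the hazardous event frequency. This is a useful check that no branch of the tree has been lost.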


Each of the intermediate events is predetermined to occur at a nominated time and in a specific order, and changing the order influences the potential outcomes. Timing can also affect the size of a release, so the analysis can show how the performance of mitigation and control equipment affects the overall risk result. The conditional probability of intervention strategies can be determined from reliability data for the components making up the system. Some intervention and detection equipment fails in a hidden manner. For such equipment, fractional dead time analysis can provide the conditional probability that it is in a failed state when called upon (refer Section 12.4). Fractional dead time depends on the testing period of the equipment, so the testing period becomes a further performance measure in the risk model.

Using event trees to show the time order of potential intermediate events following an initial release is a useful way of exploring the range of possible outcomes. For a simple plant, where the number of possible intermediate events is small, choosing a fixed time order is reasonable. For a complex and congested plant the number of intermediate events is large, and determining their time order with certainty becomes impossible. In these cases more complex models are required, which consider all possible permutations of the time order of intermediate events.

Impact Quantification

Event trees establish the size of potential releases and their probabilistic consequence scenarios. Scenarios resulting from a flammable release that have an impact include:

* Fireballs or BLEVEs (Boiling Liquid Expanding Vapour Explosions)
* Flash Fires
* Vapour Cloud Explosions
* Pool Fires
* Jet Fires
* Projectiles (especially 200 l drums of flammable liquid)

Releases of toxic materials can have wide ranging impacts as toxic clouds.
The severity of impact from these consequence scenarios can be quantified in terms of:

* Heat Radiation for Fireballs, Pool Fires and Jet Fires;
* Explosion Overpressure for Vapour Cloud Explosions;
* Flammable Concentrations for Flash Fires; and
* Toxic Load or dose for Toxic Clouds.

To determine the extent of the impact of the consequence scenarios, a model or combination of models is required for each type of consequence. The modelling of the impact of accidental releases of hazardous materials is an extensive subject and is discussed only briefly in this chapter.

Probit Equations

To quantify the risk of fatality or injury following a hazardous release, a dose response relationship is required. Probit equations are particularly useful for heat radiation or toxic releases, where a sustained low level exposure can be as fatal as an instantaneous high level exposure. Probit equations are usually written in the form:

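$$Y = A + B \ln(\text{hazardous load})$$

where A and B are constants for the hazard in question. The probit Y is a normally distributed variable with a mean of 5 and a variance of 1. The probability of fatality is therefore Φ(Y − 5), where Φ is the standard normal cumulative distribution function. For example, Y = 5 corresponds to a 50% chance of fatality. Probit equations for exposure to thermal radiation and toxic gas are expanded later in the chapter.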
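Converting a probit value into a probability only requires the standard normal distribution. The Python sketch below writes Φ using the error function from the standard library. The `probit` helper simply evaluates Y = A + B ln(load) for whatever constants apply to a given hazard. No particular A and B are implied here.

```python
import math


def probit(a, b, hazardous_load):
    """Probit value Y = A + B ln(hazardous load) for hazard-specific constants A and B."""
    return a + b * math.log(hazardous_load)


def probit_to_probability(y):
    """Probability of fatality for probit value Y.

    Y is normally distributed with mean 5 and variance 1, so P = Phi(Y - 5),
    where Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    """
    return 0.5 * (1 + math.erf((y - 5) / math.sqrt(2)))


for y in (3.72, 5.0, 6.28):
    print(f"Y = {y:.2f} -> P(fatality) = {probit_to_probability(y):.2f}")
```

Probits of about 3.72, 5 and 6.28 correspond to fatality probabilities of roughly 10%, 50% and 90%.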


13.3.5 Risk Assessment

Risks to the life and safety of people on and off site can be measured in a number of ways. Some of the more common are:

* Individual Risk,
* Societal or Group Risk,
* Potential Loss of Life (PLL), and
* Other Criteria, for example TLS (Target Level of Safety) for rare maintenance events.

Individual risk and societal risk are discussed in Chapter 6. Individual risk is the risk that an individual would face from a facility if they remained fixed at one spot 24 hours a day, 365.25 days per year. Its value is a frequency of fatality, usually in chances per million per year. It is displayed as a two dimensional plot over a locality plan, as contours of iso-risk. The fact that the values are for fixed targets is not always made clear. Readers may wrongly assume that allowance has been made for people who are only present periodically. The figure below shows a simplified example of an individual risk plot.

[Figure: Simplified Individual Risk Plot (numbers are fatality frequency per year). Iso-risk contours of 1 × 10⁻⁵, 1 × 10⁻⁶ and 1 × 10⁻⁷ per year relative to the site boundary]

Societal Risk is a measure of the frequency (F) of various numbers (N) of fatalities in the community for a particular hazard. It is represented as a curve on log axes, called an FN curve. The curve is cumulative in terms of frequency. An event that causes 10 fatalities has also caused 9, 8, 7 and so on, so each point gives the frequency of N or more fatalities. Societal risk is designed to display how risks vary with changing levels of severity. For example, a hazard may have an acceptable level of risk for one fatality but an unacceptable level for 10 fatalities. The figure below shows a simplified example of a societal risk plot.

[Figure: Societal Risk Plot (or FN Curve). Frequency of N or more fatalities per year (10⁻³ to 10⁻⁸, log scale) against Number of Fatalities (N) (1 to 1000, log scale), showing the Netherlands Unacceptable Limit and the Netherlands Acceptable Limit]


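The data from a societal risk plot can also be used to determine the PLL (potential loss of life). For each scenario, multiply its frequency by its number of fatalities, using the individual FN pairs rather than the cumulative curve. The PLL is the sum of these products over all scenarios. The result is a single number representing the expected number of fatalities per year.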
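Both the FN curve and the PLL can be derived from the same list of scenarios. The Python sketch below uses a hypothetical scenario list of (frequency per year, fatalities) pairs. It builds the cumulative curve by accumulating frequencies from the largest N downwards. It computes the PLL from the non-cumulative pairs. Summing F × N on the cumulative curve would double-count.

```python
from collections import defaultdict

# Hypothetical scenarios: (frequency per year, number of fatalities N).
SCENARIOS = [(1e-4, 1), (1e-5, 10), (1e-6, 100)]


def fn_curve(scenarios):
    """Return sorted (N, F) points, where F is the frequency of N or more fatalities."""
    freq_by_n = defaultdict(float)
    for freq, n in scenarios:
        freq_by_n[n] += freq
    points = []
    cumulative = 0.0
    for n in sorted(freq_by_n, reverse=True):  # accumulate from the largest N down
        cumulative += freq_by_n[n]
        points.append((n, cumulative))
    return sorted(points)


def potential_loss_of_life(scenarios):
    """Expected fatalities per year: sum of frequency x N over the non-cumulative scenarios."""
    return sum(freq * n for freq, n in scenarios)


for n, f in fn_curve(SCENARIOS):
    print(f"F(N >= {n:>3}) = {f:.2e} per year")
print(f"PLL = {potential_loss_of_life(SCENARIOS):.1e} fatalities per year")
```

For this list, each scenario contributes 1 × 10⁻⁴ fatalities per year, giving a PLL of 3 × 10⁻⁴. The FN points are 1.11 × 10⁻⁴, 1.1 × 10⁻⁵ and 1 × 10⁻⁶ per year at N = 1, 10 and 100.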

Individual risk uses the "tethered person" approach. Societal risk (and hence potential loss of life) is more flexible in terms of the habits of the population. Factors such as variable population densities during the day and installed protective measures can be taken into account when determining the number of fatalities. Traditionally, QRA for the petroleum and chemical industry is required to produce results as both individual risk and societal risk plots. This allows comparison against regulatory risk criteria and helps in assessing the available risk control options.

Typically a QRA uses a facility's stable, year to year operating mode. However, construction and commissioning may increase risk during those particular periods. Annualising these risks in the QRA may not be wholly relevant, because the precautions taken during construction can be expected to differ from those taken during normal operation. In practice, some form of Not Less Safe (NLS) or common law criterion is often applied. The NLS criterion is essentially a question of the form, "What should be done during these potentially higher risk periods to ensure that the risk to people (the public and workers) is no greater than the risk during normal operation?" The QRA and the application of the Individual and Societal Risk criteria then become the base case against which any special process such as construction may be compared.

The common law criteria are the final arbiters. They extend beyond all of the above and directly address causation, foreseeability, preventability and reasonableness. They really consider the question, "Is there any practicable good precaution which should be applied?" This tests whether there is a simple risk control available at minimal cost that should be applied irrespective of any formal QRA type criteria.
13.3.6 QRA Difficulties

Unreality

Quantitative risk analysis is about finding out what things must conspire together to bring about a serious problem. It then assesses which of these matters most to the hazard, and suggests that those items be the primary focus of risk management. It often deals with absurdly small numbers and statistics, which can lead observers to question the validity of the approach. One important factor in the outcome is the failure data used. Analysts are often forced to use failure data for 30 year old facilities simply because that data is widely accepted in the field as the most reliable. More modern data is less certain. A possible answer is that, whilst QRA is not an exact description of reality, it can be the best available to date. Until a better method is developed, it should be used to demonstrate due diligence.

Not Reproducible

Some argue that QRA results are best used to compare the relative safety of different systems, rather than to compare the absolute magnitude of the risk against risk criteria. Relative risk may help designers choose an optimum design. However, it does not address the concern of the public, and hence the regulator, about the level of risk a facility presents beyond its site boundary. Moreover, using alternative failure rate data and consequence models can produce different results for analyses of the same plant. Studies by different analysts on similar facilities can also give different QRA results. Standardised failure data and methodologies would address some of these differences.

A major limitation of quantitative risk assessment (QRA) is that it relies on generic data where no specific data is available, in particular for pipeline failure rates.
This does not take into consideration improvements in manufacturing and monitoring standards, or the possibility that local systems are superior to world standards. Generic failure rates also do not take land use into account. For example, third party pipeline damage is far more likely in a rural area than in a major city street.


QRA is a methodology widely used in the process industry, where risk is localised and can often be contained within the site boundaries. "Black box" QRA approaches contain value judgements that are not made explicit, and the wide range of parameters involved is beset by uncertainty. A more transparent approach seeks to make explicit the source, range and application of assumptions. This gives decision makers the best possible information at the time the decision is made.

Expense

The expense of QRA is also of concern. Multilevel risk review approaches, as previously described in Sections 13.2.2 and 13.3, are being used to address this. Regulatory authorities are increasingly adopting them to reduce the cost burden on industry.

13.4 Fire Modelling

13.4.1 Finite Element Modelling

Thermal impacts are quantified in terms of radiative heat flux (kW/m²). Radiative heat flux is the main form of damaging energy, provided there is no direct flame impingement. The models used to calculate heat flux represent the flame as a solid surface treated as a grey body radiator. An average radiative heat flux, the surface emissive power (SEP), is assigned to the surface. View factors (F) and atmospheric transmissivity (τ) are then used to determine the proportion of that heat incident at a specific location:

$$I = \tau F \times \text{SEP}$$

where I is the incident radiative heat flux at the target. Finite element analysis breaks the surface of the flame and the target down into a number of planar surfaces. It then aggregates the heat flux contribution from all fire elements on all receptor elements:

[Figure: Finite Element Calculation of a Tank on Fire]


13.4.2 View Factors

For each element receiving radiation, a line can be drawn to each element emitting radiation. The normals to the two elements form angles β1 and β2 with this line. If either angle is equal to or greater than 90°, the elements cannot "see" one another and the view factor is zero. The view factor between two differential elements can be expressed as:

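$$dF_{d1-d2} = \frac{\cos\beta_1 \cos\beta_2}{\pi r^2}\, dA_2$$

where r is the length of the line joining the elements and dA₂ is the area of the emitting element.

[Figure: View Factors]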
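The finite element approach sums this expression over every emitting element. The Python sketch below does this for a single receiving element. It tests the code on a case with a known exact answer: a small element facing a parallel, coaxial disc of radius R at distance H, for which F = R²/(R² + H²). The disc stands in for a flame surface purely for verification. The transmissivity and SEP in the usage are hypothetical values, and the function names are illustrative.

```python
import math


def view_factor(receiver, elements):
    """Sum differential view factors from one receiving element to emitting elements.

    receiver: (point, unit normal) of the receiving element.
    elements: iterable of (centre, unit normal, area) describing the emitting surface.
    Each contribution is cos(b1) cos(b2) / (pi r^2) * dA2. It is zero when either
    angle is 90 degrees or more, because the elements cannot see each other.
    """
    (px, py, pz), n1 = receiver
    total = 0.0
    for (cx, cy, cz), n2, area in elements:
        d = (cx - px, cy - py, cz - pz)  # line from receiver to emitter
        r = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
        cos_b1 = (n1[0] * d[0] + n1[1] * d[1] + n1[2] * d[2]) / r
        cos_b2 = -(n2[0] * d[0] + n2[1] * d[1] + n2[2] * d[2]) / r
        if cos_b1 > 0 and cos_b2 > 0:
            total += cos_b1 * cos_b2 / (math.pi * r ** 2) * area
    return total


def disc_elements(radius, n_rings=200, n_sectors=36):
    """Split a disc in the plane z = 0, facing +z, into small planar elements."""
    dr, dt = radius / n_rings, 2 * math.pi / n_sectors
    for i in range(n_rings):
        rho = (i + 0.5) * dr
        for j in range(n_sectors):
            theta = (j + 0.5) * dt
            centre = (rho * math.cos(theta), rho * math.sin(theta), 0.0)
            yield centre, (0.0, 0.0, 1.0), rho * dr * dt


# Receiver 1 m above a 1 m radius disc, facing down towards it: exact F = 0.5.
F = view_factor(((0.0, 0.0, 1.0), (0.0, 0.0, -1.0)), disc_elements(1.0))
tau, sep_kw_m2 = 0.8, 100.0  # hypothetical transmissivity and SEP
print(f"F = {F:.4f}, I = tau * F * SEP = {tau * F * sep_kw_m2:.1f} kW/m2")
```

With these inputs the summed view factor is close to 0.5, giving an incident flux of about 40 kW/m². Turning the receiver to face away from the disc makes every contribution zero.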

13.4.3 Effects of Thermal Radiation

To predict the number of fatalities resulting from jet fire impacts, a relationship between heat radiation and fatalities is required. Morbid statistics for lethality resulting from heat radiation do exist, coming primarily from WW2 measurements and military research. The significant levels of heat radiation, compiled from the sources quoted by Lees (1996), are as follows:

| Heat Radiation | Effect |
|---|---|
| 1.2 kW/m² | Received from the sun at noon in summer |
| 2.1 kW/m² | Minimum to cause pain after 1 minute |
| 4.7 kW/m² | Will cause pain in 15-20 seconds and injury after 30 seconds' exposure |
| 12.6 kW/m² | Significant chance of fatality for extended exposure. Thin steel with insulation on the side away from the fire may reach a thermal stress level high enough to cause structural failure. |
| 23 kW/m² | Likely fatality for extended exposure and chance of fatality for instantaneous exposure. Spontaneous ignition of wood after long exposure. Unprotected steel will reach thermal stress temperatures which can cause failure. |
| 35 kW/m² | Cellulosic material will pilot ignite within one minute's exposure. Significant chance of fatality for people exposed instantaneously. |

Heat Radiation Values (after HIPAP No 4:1992)

13.4.4 Thermal Radiation Fatality Probits

Thermal dose is typically expressed as a combination of the thermal radiation intensity I (W/m²) and the exposure time t (seconds). The model proposed by Eisenberg, Lynch and Breeding (see Lees 1996) determines a probit value, which is a normally distributed variable with mean 5 and variance 1 (so a value of 5 represents a 50% chance of fatality). The model proposed by Lees relates thermal load to burn depth, and then uses a correlation between burn depth and mortality determined by Hymes (see Lees 1996). Typically, 90 seconds’ exposure to a heat flux of 12.6 kW/m² results in a fatality probability of around 50%.
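As a sketch of the Eisenberg, Lynch and Breeding approach, the following Python uses the probit equation in the form in which it is commonly quoted, Y = -14.9 + 2.56 ln(t I^(4/3) / 10^4), with I in W/m² and t in seconds. Treat these constants as an assumption to be checked against Lees before use. The probit is converted to a probability through the normal distribution with mean 5 and variance 1.

```python
import math

def eisenberg_thermal_probit(intensity_w_m2: float, exposure_s: float) -> float:
    """Fatality probit Y = -14.9 + 2.56 ln(t * I**(4/3) / 1e4); I in W/m2, t in s.

    Constants as commonly quoted for Eisenberg, Lynch and Breeding (assumed here).
    """
    dose = exposure_s * intensity_w_m2 ** (4.0 / 3.0) / 1.0e4
    return -14.9 + 2.56 * math.log(dose)

def probit_to_probability(y: float) -> float:
    """Probability from a probit (normal variable, mean 5, variance 1)."""
    return 0.5 * (1.0 + math.erf((y - 5.0) / math.sqrt(2.0)))

y = eisenberg_thermal_probit(12.6e3, 90.0)   # 12.6 kW/m2 for 90 seconds
print(f"Y = {y:.2f}, P(fatality) = {probit_to_probability(y):.2f}")
```

For 90 seconds at 12.6 kW/m² this gives a probit of about 5.3, i.e. a fatality probability of roughly 60%, the same order as the "around 50%" quoted above; the different models in Lees give somewhat different answers.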


13.5 Pool Fires

In order to calculate the heat radiated from a fire, it is first necessary to determine the size and shape of the flame. For pool fires the flame can be represented as a tilted cylinder. The parameters used to define the flame shape for a tilted cylinder are presented in the figure below.

[Figure: tilted cylinder flame, labelled Flame Length, Pool Diameter, Dragged Diameter and Flame Tilt θ]

Parameters Defining Pool Fire Shape

13.5.1 Flame Dimensions

The physical dimensions of pool fires, including flame tilt, dragged pool diameter and flame length, are dependent on the properties of the material (mass burning rate, vapour density) and on environmental factors (wind speed, air temperature, humidity). Pool diameter is often based on physical constraints such as bund dimensions. Flame height is only constrained in particular scenarios such as tunnel fires. Numerous models based on experimental observations are available for a large range of materials and pool sizes. R2A use the following correlations available from the SFPE Handbook of Fire Protection Engineering (1995):

• Flame Tilt: American Gas Association
• Dragged Diameter: Welker & Sliepcevich
• Flame Length: Thomas Equation
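Two of these correlations can be sketched as follows, in the forms in which they are usually presented in the SFPE Handbook (quoted here from general knowledge, so the coefficients should be checked against the Handbook): Thomas, L/D = 42 [m''/(ρa √(gD))]^0.61 for flame length in still air; and AGA, cos θ = 1 for u* ≤ 1 and cos θ = 1/√u* otherwise, with u* = uw / (g m'' D/ρv)^(1/3). The Welker & Sliepcevich dragged-diameter correlation is not reproduced. The pool size, burning rate and vapour density in the example are illustrative assumptions.

```python
import math

G = 9.81  # gravitational acceleration, m/s2

def thomas_flame_length(diameter_m: float, burning_rate_kg_m2s: float,
                        air_density_kg_m3: float = 1.2) -> float:
    """Mean flame length (m) from the Thomas correlation (still-air form)."""
    group = burning_rate_kg_m2s / (air_density_kg_m3 * math.sqrt(G * diameter_m))
    return 42.0 * diameter_m * group ** 0.61

def aga_flame_tilt_deg(wind_speed_m_s: float, diameter_m: float,
                       burning_rate_kg_m2s: float, vapour_density_kg_m3: float) -> float:
    """Flame tilt from the vertical (degrees) from the AGA correlation."""
    u_char = (G * burning_rate_kg_m2s * diameter_m / vapour_density_kg_m3) ** (1.0 / 3.0)
    u_star = wind_speed_m_s / u_char              # non-dimensional wind speed
    cos_theta = 1.0 if u_star <= 1.0 else 1.0 / math.sqrt(u_star)
    return math.degrees(math.acos(cos_theta))

# Illustrative inputs: 10 m pool, 0.055 kg/m2s burning rate, vapour density 3.5 kg/m3
length = thomas_flame_length(10.0, 0.055)
tilt = aga_flame_tilt_deg(5.0, 10.0, 0.055, 3.5)
print(f"flame length ~{length:.1f} m, tilt ~{tilt:.0f} degrees in a 5 m/s wind")
```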

13.5.2 Surface Emissive Power

The surface emissive power of a pool fire depends on the pool diameter and the physical properties of the burning product. Experimental data indicate that larger pool fires have a lower surface emissive power, due in part to a loss in combustion efficiency in larger fires. Smoke and soot particles also reduce the surface emissive power of pool fires, with soot having an SEP of around 20 kW/m² and clean flame around 140 kW/m². Typical averaged SEPs are in the order of 25-90 kW/m².
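One published way to blend the two limiting values with pool diameter is the correlation attributed to Mudan and reproduced in the SFPE Handbook: SEP = Emax e^(-SD) + Es (1 - e^(-SD)), with Emax = 140 kW/m², Es = 20 kW/m² and S = 0.12 m⁻¹. The sketch below assumes that form; it is not necessarily the correlation R2A uses.

```python
import math

E_MAX = 140.0  # kW/m2, clean (luminous) flame, as quoted above
E_SOOT = 20.0  # kW/m2, smoke/soot, as quoted above
S = 0.12       # 1/m, extinction coefficient assumed from the Mudan correlation

def pool_fire_sep(diameter_m: float) -> float:
    """Average SEP (kW/m2): luminous fraction exp(-S*D), smoke obscuring the rest."""
    luminous = math.exp(-S * diameter_m)
    return E_MAX * luminous + E_SOOT * (1.0 - luminous)

for d in (5.0, 10.0, 20.0, 40.0):
    print(f"D = {d:4.0f} m  SEP ~ {pool_fire_sep(d):5.1f} kW/m2")
```

Over pool diameters of roughly 5 to 20 m this gives values from the mid-80s down to about 30 kW/m², consistent with the typical range quoted above.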


13.6 Jet Flames

Jet fires can liberate large amounts of energy. According to Chamberlain, a release rate of 100 kg/s over a few seconds would produce a flame about 65 m long in moderate winds and release some 5000 MW of combustive power, which is more than two and a half times the output of Loy Yang power station. The model developed by G.A. Chamberlain (1987) of Shell assumes that the surface of the flame can be treated as a frustum for the purpose of calculating the Surface Emissive Power (SEP). The dimensions of the flame can be defined in terms of the flame lift-off, tilt, length, frustum length, base width and tip width:

Jet Flame Frustum
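The arithmetic behind Chamberlain's figure can be checked with Q = ṁ ∆Hc (Section 13.6.2), assuming a heat of combustion for natural gas of about 50 MJ/kg: 100 kg/s × 50,000 kJ/kg = 5,000,000 kW = 5000 MW. The sketch below also applies the radiant-fraction expression given in Section 13.6.2 to estimate how much of that power is radiated; the jet exit velocity is a hypothetical value.

```python
import math

def heat_release_rate_mw(release_kg_s: float, heat_of_combustion_kj_kg: float) -> float:
    """Q = m_dot * dHc in kW, returned in MW."""
    return release_kg_s * heat_of_combustion_kj_kg / 1000.0

def radiant_fraction(jet_velocity_m_s: float) -> float:
    """Chamberlain's radiant fraction Fr = 0.21 exp(-0.00323 u) + 0.11, u in m/s."""
    return 0.21 * math.exp(-0.00323 * jet_velocity_m_s) + 0.11

q = heat_release_rate_mw(100.0, 50_000.0)  # assumed dHc ~50 MJ/kg for natural gas
fr = radiant_fraction(300.0)               # hypothetical jet exit velocity, m/s
print(f"Q = {q:.0f} MW, Fr = {fr:.2f}, radiated ~{fr * q:.0f} MW")
```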

13.6.1 Release Rates

Gaseous release rates are calculated using an analytical solution assuming adiabatic flow of gas leaving an orifice. Different relationships are used depending on whether the flow is "choked" (critical) or "un-choked" (sub-critical). Under choked flow, the gas exits the pipeline at greater than atmospheric pressure and continues to expand downstream of the release. For full bore ruptures, choked flow occurs when sonic velocity is achieved, which is the maximum possible velocity in the pipe. The calculation gives a good estimate of the release rate of a gas leaving an orifice, but as hole sizes approach the pipeline diameter it begins to over-predict the release rate. This makes the analysis somewhat conservative. The following graph shows how the release rate drops as a function of pipeline length for a 100 mm diameter pipeline rupture at transmission pressure:

[Graph: Release Rate (kg/s, 0 to 30) against Distance along Pipeline (m, 0 to 1000)]
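A minimal sketch of the choked/un-choked orifice calculation described above, using the standard isentropic ideal-gas orifice equations; this is an assumption, since the exact formulation R2A uses is not given here. Flow is choked when pa/p0 ≤ (2/(γ+1))^(γ/(γ−1)). The sketch returns an initial release rate only and does not model the decay with pipeline length shown in the graph. The gas properties, pressure and discharge coefficient are illustrative.

```python
import math

R = 8.314  # universal gas constant, J/(mol K)

def orifice_release_rate(p0_pa: float, t0_k: float, hole_diameter_m: float,
                         molar_mass_kg_mol: float, gamma: float,
                         pa_pa: float = 101_325.0, cd: float = 1.0) -> float:
    """Adiabatic (isentropic) ideal-gas mass flow through an orifice, kg/s.

    p0_pa, t0_k: upstream pressure and temperature; pa_pa: downstream pressure;
    cd: discharge coefficient (assumed value).
    """
    if p0_pa <= pa_pa:
        return 0.0
    area = math.pi * hole_diameter_m ** 2 / 4.0
    m_over_rt = molar_mass_kg_mol / (R * t0_k)  # p0 * m_over_rt = upstream gas density
    critical_ratio = (2.0 / (gamma + 1.0)) ** (gamma / (gamma - 1.0))
    ratio = pa_pa / p0_pa
    if ratio <= critical_ratio:
        # Choked (critical) flow: sonic at the orifice, independent of pa
        flow_fn = gamma * (2.0 / (gamma + 1.0)) ** ((gamma + 1.0) / (gamma - 1.0))
    else:
        # Un-choked (sub-critical) flow
        flow_fn = (2.0 * gamma / (gamma - 1.0)) * (
            ratio ** (2.0 / gamma) - ratio ** ((gamma + 1.0) / gamma))
    return cd * area * p0_pa * math.sqrt(m_over_rt * flow_fn)

# Hypothetical methane-like gas (M = 0.016 kg/mol, gamma = 1.31) at 7 MPa and 288 K,
# discharging through a 100 mm hole
rate = orifice_release_rate(7.0e6, 288.0, 0.100, 0.016, 1.31)
print(f"initial release rate ~{rate:.0f} kg/s")
```

The two branches meet smoothly at the critical pressure ratio, and in the choked branch the rate is directly proportional to upstream pressure.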


13.6.2 Surface Emissive Power

The net heat release rate of a flame, Q (kW), is simply the product of the heat of combustion (∆Hc) of the gas (kJ/kg) and the rate of gas release (kg/s). Jet flames have a much higher surface emissive power than pool fires, owing to the more efficient combustion resulting from turbulent gas flow. The fraction of the total heat that is radiated is a function of the gas jet velocity (u), and can be determined from the following expression:

$$F_r = 0.21\, e^{-0.00323\,u} + 0.11$$

where u is the gas jet exit velocity (m/s). Typically the surface emissive powers of jet flames are in the order of 100-400 kW/m².

13.7 Explosions

The energy released in an explosion is normally due to stored chemical energy, fluid expansion energy or vessel strain energy. For all explosion types, the energy released is equal to the work done by the expansion of gas from its initial to its final state:

$$W = -\int_{1}^{2} P\, dV$$

The effects of an explosion are determined using a scaling law and an equivalent mass of TNT, W (in tonnes). For a particular criterion, the scaled distance (z) is determined, which can then be used to find the actual distance (r) at which that overpressure occurs, using the following formula:

$$r = z\, W^{1/3}$$
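The cube-root scaling law can be applied in either direction, as in this short sketch. The function names are hypothetical, z must be in the units of the chart being used (per tonne^(1/3) here, since W is in tonnes), and the example numbers are illustrative.

```python
def distance_for_scaled_distance(z: float, tnt_tonnes: float) -> float:
    """Actual distance r = z * W**(1/3) for a scaled distance read from a chart."""
    return z * tnt_tonnes ** (1.0 / 3.0)

def scaled_distance(r_m: float, tnt_tonnes: float) -> float:
    """Inverse relationship: z = r / W**(1/3)."""
    return r_m / tnt_tonnes ** (1.0 / 3.0)

# Hypothetical: scaled distance of 100 for a given overpressure, 8 t TNT equivalent
r = distance_for_scaled_distance(100.0, 8.0)
print(f"distance to that overpressure ~{r:.0f} m")
```

Because of the cube root, doubling the TNT equivalent increases the distance to a given overpressure by only about 26% (2^(1/3) ≈ 1.26).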

13.7.1 Scaled Distance

The scaling is a function of the overpressure, and is usually determined from a graph based on empirical studies. The following chart for vapour cloud explosions is based on the equation in "Major Industrial Hazards technical papers" from the Warren Centre, University of Sydney, sourced from the 2nd report of the UK Advisory Committee on Major Hazards:

[Chart: Scaled Distance (0 to 900) against Overpressure (kPa, 0 to 70)]


13.7.2 TNT Equivalence

The equivalent quantity of TNT is calculated based on an energy for TNT of 4600 kJ/kg. For vapour cloud explosions, energy release is based on complete combustion of the explosive material. In determining the equivalent mass of TNT, a yield factor is applied. The energy in the blast wave of an explosion is generally a small fraction of that theoretically available; the remainder goes into the kinetic energy of fragments, potential energy in the products and residual energy in the air. Typically, 1-10% of the available energy of an explosion is in the blast wave. The yield of the Flixborough explosion, in which 30-40 metric tonnes of cyclohexane were released, was estimated to be 4-5%.

13.7.3 Effects of Explosive Overpressure

The following table outlines the typical observable effects of explosive overpressures.

Explosion Overpressure   Effect
3.5 kPa (0.5 psi)        * 90% glass breakage.
                         * No fatality and very low probability of injury.
7 kPa (1 psi)            * Damage to internal partitions and joinery, which can be repaired.
                         * Probability of injury is 10%. No fatality.
14 kPa (2 psi)           * House uninhabitable and badly cracked.
21 kPa (3 psi)           * Reinforced structures distort.
                         * 20% chance of fatality to a person in a building.
35 kPa (5 psi)           * 50% chance of fatality for a person in a building and 15% chance of fatality for a person in the open.
70 kPa (10 psi)          * Threshold for lung damage.
                         * 100% chance of fatality for a person in a building or in the open.
                         * Complete demolition of house.

Some Effects of Explosion Overpressure (after HIPAP No 4:1992)
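Sections 13.7.2 and 13.7.3 can be combined in a short sketch: the TNT equivalent is W = η m ∆Hc / E_TNT, with E_TNT = 4600 kJ/kg and η the yield factor, and the table above can then be used to look up effects once an overpressure has been read from the scaled-distance chart. The fuel mass, the heat of combustion of cyclohexane (about 43,500 kJ/kg) and the 4.5% yield are assumptions chosen within the Flixborough ranges quoted above, not a reconstruction of that event; the function names are hypothetical.

```python
E_TNT_KJ_KG = 4600.0  # energy of TNT used in Section 13.7.2

# (threshold overpressure in kPa, summary of effect), from the table above
OVERPRESSURE_EFFECTS = [
    (3.5, "90% glass breakage; no fatality, very low probability of injury"),
    (7.0, "repairable damage to partitions and joinery; 10% injury, no fatality"),
    (14.0, "house uninhabitable and badly cracked"),
    (21.0, "reinforced structures distort; 20% fatality chance in a building"),
    (35.0, "50% fatality chance in a building, 15% in the open"),
    (70.0, "threshold for lung damage; 100% fatality chance; house demolished"),
]

def tnt_equivalent_tonnes(fuel_tonnes: float, heat_of_combustion_kj_kg: float,
                          yield_factor: float) -> float:
    """W = yield * m_fuel * dHc / E_TNT (result in the same mass units as the fuel)."""
    return yield_factor * fuel_tonnes * heat_of_combustion_kj_kg / E_TNT_KJ_KG

def effect_at(overpressure_kpa: float) -> str:
    """Most severe tabulated effect whose threshold is reached."""
    effect = "below the lowest tabulated level (3.5 kPa)"
    for threshold, description in OVERPRESSURE_EFFECTS:
        if overpressure_kpa >= threshold:
            effect = description
    return effect

# Assumed: 35 t of cyclohexane, dHc ~43,500 kJ/kg, 4.5% yield
w = tnt_equivalent_tonnes(35.0, 43_500.0, 0.045)
print(f"W ~ {w:.1f} t TNT; at 21 kPa: {effect_at(21.0)}")
```

The lookup returns the most severe row reached; the effects listed for lower overpressures also apply.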

13.8 Toxic Gas Clouds

Many calculation-intensive computer programs exist to determine the toxic "footprint" as a function of time in the event of a release of a heavier-than-air toxic gas. Major factors affecting the impact of such releases are discussed below.

13.8.1 Release Type

The manner in which a material is released has a large bearing on the toxic cloud footprint. Sudden releases of liquefied gas tend to result in a large initial cloud due to aerosol particles and flashing liquid, which rapidly drops back to a steady state size. Continuous releases take longer to achieve a maximum cloud size, which is often the same size as the steady state cloud formed by a sudden release. The steady state cloud size is limited by the rate of mass transport from the liquid pool. This is influenced by factors such as heat transfer from the ground, solar radiation levels, and the surface area of the pool (which can be limited by bunding). For gaseous releases, high pressure causes forced mixing of air and gas, resulting in a long narrow plume. Lower pressure releases tend to be wider as natural dispersion is more influential. For an equivalent release rate, low pressure scenarios are likely to have more far-reaching impacts.

13.8.2 Meteorological Data

Atmospheric stability characterises the conditions of convective heat and mass transfer within the atmospheric boundary layer. It influences both the rate at which liquid chlorine evaporates from a pool and the rate at which the gas disperses from the toxic cloud. Pasquill stability is determined from the wind speed and solar radiation levels (or, at night, cloud cover). The table below outlines the factors used to determine atmospheric stability:


Wind Speed   Day: Solar Radiation          Night: Cloud Cover Fraction
(m/s)        Strong   Moderate   Slight    <0.5   0.5-0.8   >0.8
<2           A        A-B        B         F      E         D-E
2-3          A-B      B-C        C         F      E         D-E
3-5          B        B-C        C         E      D         D
5-6          C        C-D        D         D      D         D
>6           C        D          D         D      D         D

Atmospheric Stability
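The stability table lends itself to a simple lookup. The sketch below transcribes the rows above; how to treat wind speeds falling exactly on a band boundary (here 2 m/s is assigned to the 2-3 m/s row, and so on) is an assumption, as are the function and argument names.

```python
from typing import Optional

# (upper wind speed bound in m/s, day classes, night classes), from the table above.
# Day classes:   strong, moderate, slight solar radiation.
# Night classes: cloud cover fraction < 0.5, 0.5-0.8, > 0.8.
STABILITY_TABLE = [
    (2.0, ("A", "A-B", "B"), ("F", "E", "D-E")),
    (3.0, ("A-B", "B-C", "C"), ("F", "E", "D-E")),
    (5.0, ("B", "B-C", "C"), ("E", "D", "D")),
    (6.0, ("C", "C-D", "D"), ("D", "D", "D")),
    (float("inf"), ("C", "D", "D"), ("D", "D", "D")),
]
SOLAR_COLUMN = {"strong": 0, "moderate": 1, "slight": 2}

def pasquill_class(wind_m_s: float, solar: Optional[str] = None,
                   cloud_fraction: Optional[float] = None) -> str:
    """Day: pass solar ('strong', 'moderate' or 'slight'). Night: pass cloud_fraction (0 to 1)."""
    if (solar is None) == (cloud_fraction is None):
        raise ValueError("give exactly one of solar (day) or cloud_fraction (night)")
    for upper, day, night in STABILITY_TABLE:
        if wind_m_s < upper:
            if solar is not None:
                return day[SOLAR_COLUMN[solar]]
            if cloud_fraction < 0.5:
                return night[0]
            return night[1] if cloud_fraction <= 0.8 else night[2]
    raise ValueError("invalid wind speed")

print(pasquill_class(1.5, solar="strong"), pasquill_class(4.0, cloud_fraction=0.3))
```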

13.8.3 Surface Roughness

Effective surface roughness (in metres) characterises the ground conditions over which a plume will travel. Surface roughness generally varies between 0.005 and 1.5 m, with the lower end representing a surface such as a spill over water, and the upper end forested or built-up urban areas. Increased surface roughness reduces the impact area of toxic clouds.

13.8.4 Probit Relationships

Probit equations for toxic exposure take the same form as that for heat radiation exposure used by Eisenberg, Lynch and Breeding:

$$Y = A + B \ln(\text{toxic load})$$

Toxic load and dose are interchangeable terms for the integral over time (t) of the concentration of a toxic substance (C), raised to a power termed the dose exponent (n):

$$\text{toxic load} = \int C^{n}\, dt$$

The dose exponent has the effect of placing greater emphasis on acute exposures (high concentration over a short time) than on chronic exposures (low concentration over a sustained period). Toxic load is expressed in terms of concentration (in ppm) and time (in minutes). Typical probit equation constants for chlorine exposure (sourced from Lees) are:

Probit Equation                          A        B      n
Eisenberg, Breeding & Lynch              -17.1    1.69   2.75
Perry & Articola                         -36.45   3.13   2.64
Rijnmond                                 -11.4    0.82   2.75
ten Berge & van Heemst                   5.04     0.5    2.75
Withers & Lees (Standard Population)     -8.29    0.92   2
Withers & Lees (Vulnerable Population)   -6.61    0.92   2

Representative Probit Equation Constants
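A sketch of the toxic load integral and probit calculation, assuming concentration is sampled at discrete times so that the integral of Cⁿ dt can be approximated with the trapezoidal rule. It uses the Withers & Lees (standard population) constants from the table above; the 100 ppm, 30 minute exposure is hypothetical.

```python
import math

def toxic_load(times_min: list[float], conc_ppm: list[float], n: float) -> float:
    """Trapezoidal estimate of the integral of C**n dt (ppm**n . min)."""
    if len(times_min) != len(conc_ppm) or len(times_min) < 2:
        raise ValueError("need matching time and concentration series of length >= 2")
    load = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        load += 0.5 * (conc_ppm[i] ** n + conc_ppm[i - 1] ** n) * dt
    return load

def probit(a: float, b: float, load: float) -> float:
    """Y = A + B ln(toxic load)."""
    return a + b * math.log(load)

def probit_to_probability(y: float) -> float:
    """Probability from a probit (normal variable, mean 5, variance 1)."""
    return 0.5 * (1.0 + math.erf((y - 5.0) / math.sqrt(2.0)))

A, B, N = -8.29, 0.92, 2.0  # Withers & Lees, standard population (table above)
load = toxic_load([0.0, 30.0], [100.0, 100.0], N)  # hypothetical: 100 ppm for 30 min
y = probit(A, B, load)
print(f"load = {load:.3g}, Y = {y:.2f}, P = {probit_to_probability(y):.3f}")
```

For this exposure the load is 3 × 10⁵ ppm²·min and the probit is about 3.3, i.e. a fatality probability of roughly 5%.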

13.9 Fire Safety Studies

A fire safety study is a useful tool for a systematic review of an existing or planned fire prevention and protection system. It represents what would be done in the event that the risk prevention system breaks down and contingency plans are invoked. In the sense of the risk management matrix, it is a combination of best practice and simulation. From the point of view of an attending fire brigade, it has no likelihood component, since the event is assumed to be happening: the brigade would only be required to attend because the undesired event is underway.


The structure used to perform Fire Safety Studies is often that adopted by the NSW Department of Planning in its Advisory Paper No 2, “Fire Safety Study Guidelines”, namely:
• identify fire hazards (this stage may already have been completed if a top down context study has been completed)
• determine the credible fire scenarios from identified hazards
• determine preventive measures to minimise the possibility of fire
• model the potential impacts of identified scenarios
• quantify the fire protection resources required to manage the identified scenarios
• model the capability of proposed or installed fire protection systems to provide these resources

This approach is performance based, although relevant codes and standards are still used for guidance. Adopted references include the NFPA (National Fire Protection Association of the USA) Codes, and Australian Standards including AS 1940 “Storage and Handling of Flammable and Combustible Liquids”. A range of fire models, usually pool fire and jet fire models, can be used to estimate flame impacts. These include the use of finite element 3D modelling. An example of an R2A model used to determine the radiation impact from a high pressure gas line fire in a city is shown in the figure below. This is available for viewing on the R2A website (www.r2a.com.au). Once the consequences of a fire have been determined, the level of protection required for adjacent facilities and the requirements to extinguish the fire can be ascertained. This is typically done using a combination of thermal response models, code requirements and experience.

3D View of a High Pressure Gas Jet Fire in a City Block

13.10 Risk Criteria used in Australia and New Zealand

Individual and societal risk criteria have been defined by the Victorian WorkCover Authority, the NSW Department of Planning and the Western Australian Environmental Protection Authority (EPA). Other Australian States and New Zealand authorities tend to use a combination of these criteria when assessing individual and/or societal risk. It is important to note that such regulatory compliance does not appear to satisfy common law criteria. Even if a risk is in the ‘acceptable’ region, any cost effective precaution that reduces it further needs to be considered. This issue is expanded further in Chapter 4, Liability.


13.10.1 Victorian Risk Criteria

Individual and societal risk criteria for public safety relating to hazardous industries have not been formally established and publicised in Victoria. There is currently a set of draft criteria issued by the Victorian WorkCover Authority (VWA), which is used by Government Authorities involved in Land Use Planning. These criteria were used as part of the Technica Ltd “Risk Sensitivity Analysis for the Altona Petrochemical Complex and Environs”, October 1997. The following tables outline the criteria for individual fatality risk for both new and existing installations.

Risk Level          Actions
>10⁻⁵ pa            Must not be exceeded at the plant boundary.
10⁻⁵ to 10⁻⁷ pa     All practicable risk reduction measures to be taken. No residential development (applicable to new developments).
<10⁻⁷ pa            Acceptable

Individual Fatality Risk - New Installations

Risk Level          Actions
>10⁻⁵ pa            Must not be exceeded at the plant boundary.
10⁻⁵ to 10⁻⁷ pa     All practicable risk reduction measures to be taken, but restrictions on residential development are applicable to new developments.
<10⁻⁷ pa            Acceptable

Individual Fatality Risk - Existing Installations

The document also establishes criteria for societal risk. Societal risk analysis combines the consequence and likelihood information with population information. This is presented as an F-N plot, which indicates the cumulative frequency (F) of killing N or more people. A log-log F-N plot results in two parallel lines which define three zones:
a) above the acceptable limit, the societal risk level is not tolerable;
b) between the acceptable and negligible limits, the societal risk level is acceptable, but if the perceived benefits gained by the activity are not high enough, some risk reducing measures may be required. Risk should be "as low as reasonably practicable" (ALARP);
c) below the negligible limit, the societal risk level is acceptable, regardless of the perceived value of the activity.

[F-N plot: Frequency of N or more fatalities per year (10⁻⁷ to 10⁻²) against Number of Fatalities (N) (1 to 1000), with zones labelled Risk Unacceptable; Risk Acceptable but remedial measures desirable; Risk Negligible]

Victorian Societal Risk Criteria
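The F-N plot is built by summing, for each N, the frequencies of all outcomes that kill N or more people. The sketch below does this for a hypothetical set of scenarios. The Victorian criterion lines are not reproduced here (their values cannot be read from this text), so the comparison function takes a line's value at N = 1 and its log-log slope as user-supplied arguments; the function names are hypothetical.

```python
def fn_curve(scenarios: list[tuple[float, int]]) -> list[tuple[int, float]]:
    """Return (N, F) pairs, where F is the frequency per year of N or more fatalities.

    scenarios: (frequency per year, number of fatalities) for each outcome.
    """
    levels = sorted({n for _, n in scenarios if n >= 1})
    return [(n, sum(f for f, m in scenarios if m >= n)) for n in levels]

def points_above_line(curve: list[tuple[int, float]], f_at_n1: float,
                      slope: float) -> list[int]:
    """N values where F(N) exceeds a straight log-log line F = f_at_n1 * N**slope."""
    return [n for n, f in curve if f > f_at_n1 * n ** slope]

# Hypothetical outcomes: (frequency per year, fatalities)
scenarios = [(1e-4, 1), (1e-5, 10), (1e-6, 100)]
curve = fn_curve(scenarios)
for n, f in curve:
    print(f"N >= {n:3d}: F = {f:.3g} per year")
```

Calling points_above_line once for each of the two parallel lines (same slope, different intercepts) shows which zone each point of the curve falls in.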


13.10.2 NSW Department of Planning

The NSW Department of Planning has published an advisory paper, "Risk Criteria for Land Use Safety Planning" (June 1992), that outlines the criteria by which the acceptability of risks associated with potentially hazardous developments will be assessed. The table below summarises the criteria for individual fatality risk for new installations.

Risk Level        Land Use
0.5 x 10⁻⁶ pa     Hospitals, schools, child care facilities, old age housing
1.0 x 10⁻⁶ pa     Residential, hotels, motels, tourist resorts
5 x 10⁻⁶ pa       Commercial developments including retail centres, offices and entertainment centres
10 x 10⁻⁶ pa      Sporting complexes and active open spaces
50 x 10⁻⁶ pa      Industrial

Individual Fatality Risk-New Installations
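The NSW individual risk criteria can be encoded as a simple lookup, as in this sketch. The category keys and the function name are hypothetical labels for the land-use rows of the table above, and the calculated risk at a location is assumed to come from a separate QRA. As noted in Section 13.10, meeting the criterion does not by itself satisfy common law obligations.

```python
# NSW individual fatality risk criteria for new installations (per annum), from the table above
NSW_CRITERIA = {
    "sensitive": 0.5e-6,    # hospitals, schools, child care facilities, old age housing
    "residential": 1.0e-6,  # residential, hotels, motels, tourist resorts
    "commercial": 5.0e-6,   # retail centres, offices, entertainment centres
    "sporting": 10.0e-6,    # sporting complexes and active open spaces
    "industrial": 50.0e-6,
}

def meets_nsw_criterion(land_use: str, individual_risk_pa: float) -> bool:
    """True if the calculated individual fatality risk does not exceed the criterion."""
    try:
        limit = NSW_CRITERIA[land_use]
    except KeyError:
        raise ValueError(f"unknown land use category: {land_use!r}") from None
    return individual_risk_pa <= limit

print(meets_nsw_criterion("residential", 0.8e-6))  # residential limit is 1 x 10^-6 pa
print(meets_nsw_criterion("sensitive", 0.8e-6))    # sensitive limit is 0.5 x 10^-6 pa
```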

The NSW Department of Planning also puts forward risk criteria for property damage and inter-plant propagation. It recommends that the risk of the following levels being experienced at an adjacent site should be no greater than 5 x 10⁻⁵ pa:
• 23 kW/m² of radiative heat flux; and
• 14 kPa of explosive overpressure.

Societal risk is also addressed. The paper outlines two components of the societal risk concept, namely the number of people exposed to the levels of risk, and the fact that society is more averse to incidents that involve multiple fatalities or injuries than to the same number of deaths or injuries occurring through a large number of smaller incidents.

[F-N plot: Frequency of N or more fatalities per year (10⁻⁸ to 10⁻³) against Number of Fatalities (N) (1 to 1000), showing the Netherlands Unacceptable Limit and Netherlands Acceptable Limit]

NSW (Netherlands) F-N Curve

The department then explains that societal risk criteria F-N curves should be used cautiously. This is also the R2A experience. They provide insight into the matter under investigation and a view as to the effectiveness of proposed precautions. But, as noted at the commencement of this section, compliance is not sufficient to satisfy common law criteria. Even if a risk is in the acceptable region, any cost effective precaution that reduces it further will need to be considered.


13.10.3 Western Australia EPA Criteria

In the document "Guidance for the Assessment of Environmental Factors, Risk Assessment and Management: Offsite Individual Risk from Hazardous Industrial Plants, No. 2 (Interim July 1998)", the Western Australia EPA has set out the following criteria for individual fatality risk.
a) A risk level in residential zones of one in a million per year or less is so small as to be acceptable to the EPA.
b) A risk level in "sensitive developments", such as hospitals, schools, child care facilities and aged care housing developments, of one half in a million per year or less is so small as to be acceptable to the EPA. In the case of risk generators within the grounds of the "sensitive developments" necessary for the amenity of the residents, the risk level can exceed one half in a million per year, up to a maximum of one in a million per year, for areas that are intermittently occupied, such as garden areas and car parks.
c) Risk levels from industrial facilities should not exceed a target of fifty in a million per year at the site boundary for each individual industry, and the cumulative risk level imposed upon an industry should not exceed a target of one hundred in a million per year.
d) A risk level for any non-industrial activity located in buffer zones between industrial facilities and residential zones of ten in a million per year or less is so small as to be acceptable to the Environmental Protection Authority.
e) A risk level for commercial developments, including offices, retail centres and showrooms located in buffer zones between industrial facilities and residential zones, of five in a million per year or less is so small as to be acceptable to the Environmental Protection Authority.

13.10.4 Risk Criteria in New Zealand

The risk criteria used in New Zealand (Auckland City Council, NZ 1998) for land use safety planning appear to be the same as the New South Wales risk criteria for land use (Department of Planning, Sydney 1990), which are listed in Section 13.10.2.


REFERENCES

Auckland City Council, New Zealand (1998). Auckland Western Reclamation Area Land Use Safety Study.
Chamberlain G.A. (1987). Developments in Design Methods for Predicting Thermal Radiation from Flares. Chem Eng Res Des, Vol. 65, July 1987.
DNV Technica Ltd (1997). Risk Sensitivity Analysis for the Altona Petrochemical Complex and Environs. Prepared for ACC and Victorian Government, October 1997.
Lees F.P. (1996). Loss Prevention in the Process Industries – Hazard Identification, Assessment and Control.
NSW Department of Urban Affairs and Planning (1997). Multi-Level Risk Assessment guidelines.
NSW Department of Planning (1993). Fire Safety Study Guidelines. Hazardous Industry Planning Advisory Paper No. 2.
NSW Department of Planning (1993). Risk Assessment. Hazardous Industry Planning Advisory Paper No. 3, Environmental Risk Impact Assessment Guidelines.
NSW Department of Planning (1992). Risk Criteria for Land Use Safety Planning. Hazardous Industry Planning Advisory Paper No. 4.
NSW Department of Planning (1992). Guidelines for Hazard Analysis. Hazardous Industry Planning Advisory Paper No. 6.
Redmill, Felix and Jane Rajan (1997). Human Factors in Safety Critical Systems. Butterworth-Heinemann, Oxford.
Society of Fire Protection Engineers (1995). SFPE Handbook of Fire Protection Engineering.
Standards Australia (1993). Storage and Handling of Flammable and Combustible Liquids. Australian Standard AS 1940:1993.
Western Australia EPA (July 1998). Guidance for the Assessment of Environmental Factors, Risk Assessment and Management: Offsite Individual Risk from Hazardous Industrial Plants, No. 2 (Interim July 1998).
Worksafe Australia (1996). The National Code of Practice for the Control of Major Hazard Facilities [NOHSC:2016] 1996.


READING

Australian Standard 2885.1-1997. Pipelines – Gas and liquid petroleum, Part 1: Design and Construction.
Australian Standards HB105-1998. Guide to pipeline risk assessment in accordance with AS 2885.1.
Barry, Thomas F. (1995). An Introduction to Quantitative Risk Assessment in Chemical Process Industries. Section 5, Chapter 12, SFPE Handbook of Fire Protection Engineering, 2nd Edition, 1995.
Chamberlain G.A. (1987). Developments in Design Methods for Predicting Thermal Radiation from Flares. Chem Eng Res Des, Vol. 65.
Chen, Richardson & Saville (1992). Numerical Simulation of Full Bore Ruptures of Pipelines Containing Perfect Gases. Trans IChemE, Vol. 70, Part B, May 1992.
Det Norske Veritas (USA) Inc (1999). API Committee on Refinery Equipment BRD on Risk Based Inspection, Revision 04.
E & P Forum (1996). Risk Assessment Data Directory.
European Gas Pipeline Incident Data Group (EGPIDG). Gas pipeline incidents: 1970-1992. Pipes & Pipelines International, July-August 1995, as quoted in E&P Forum QRA Data Directory, Section 9, Report No 11.8/250, October 1996.
Johnson A.D., Brightwell H.M. and Cresley A.J. (1994). A Model for Predicting the Thermal Radiation Hazards from Large-scale Horizontally Released Natural Gas Jet Fires. Trans IChemE, 72(B3).
Kletz T.A. (1986). HAZOP & HAZAN: Notes on the Identification and Assessment of Hazards. IChemE, London.
Lees F.P. (1995). Loss Prevention in the Process Industries. 2nd Edition. Butterworth-Heinemann Ltd, Oxford, UK (3 Volumes).
Miller, Peter (1996). Difficulties with Quantifying Risk. Miller’s Tales. Engineers Australia, May 1996.
Pipeline Operators Group Database (1971-1995).
Pipes & Pipelines International, July-August 1995. Gas pipeline incidents: 1970-1992, A report of the European Gas Pipeline Incident Data Group.
The Longford Royal Commission Report (1999). The Esso Longford Gas Plant Accident. Sir Daryl Dawson, Chairman, and Brian J Brooks, Commissioner. Published by the Government Printer for the State of Victoria, June 1999.
Tweeddale, Mark (2003). Managing Risk and Reliability of Process Plants. Gulf Professional Publishing, an imprint of Elsevier Science (USA).


Crisis Management


14. Crisis Management

Ideally, every aspect of a company’s activity that could expose the enterprise to significant risk should be known, assessed and managed to best effect. This is not the easiest thing to do. Total success may be more an aspiration than a reality. One thing, however, is certain: efforts made to achieve this goal rarely fail to pay off. Effective risk management not only avoids or reduces losses, but also sharpens the competitive edge. Good governance requires that risk identification be comprehensive, and risk management as optimal as best practice will allow.

Spotting and assessing risks that may result in crises with legal, political or public relations fallout is only one aspect of risk management, but it is one of the most important. “Fallouts” refer to the various sorts of external reactions and business consequences that may follow corporate decisions and activities. Risk management is mainly concerned with those negative, sometimes unexpected reactions that can threaten the corporate image, more so if not well handled. Fallout crises may be triggered by events such as:
- accidents, engineering failures, and other such events that have adverse health, safety or environmental effects;
- cost overruns or bankruptcy due to inadequate financial management or criminal fraud;
- challenges to a corporate project at the proposal stage;
- responses to a negative, that is, a decision not to do something, or a failure to perform as expected; or
- perceptions by influential elements in the community that a corporation’s priorities or values clash with the public interest.

Fallouts can be fought out in the media, in industrial disputes, in protest actions, and/or in the courts and the legislature. In other words, the range and possibilities of fallouts are almost limitless, both in respect of their causes and the way they unfold.

14.1 Intention

As this chapter mostly addresses those already involved and experienced in many aspects of coping with risk, the intention is to skip the customary introductory points about the need for contingency planning for crisis or incident management. Instead, this section will make a number of assertions regarding crisis management, backed by illustrative examples and case studies. The aim is to help readers check whether their current thinking on risk (that is, how to identify risks and prepare to manage them, hopefully before they eventuate) is comprehensive enough. The range of relevant risks matches the wide variety of fallout possibilities. The assertions also aim to prompt consideration of whether existing systems for managing risk within readers’ organisations are appropriate to cope with the rapidly changing social, legal, political and international public environments in which corporations, government, and public and private bodies now have to operate.

The first of these assertions refers to the most important guideline in managing fallout. One thing that managing fallout is always about, and will always remain about, is securing and maintaining public trust and confidence. By trust is meant public acceptance that:
- corporate management is acting in good faith;
- the public can accept management claims without undue suspicion, doubt, or cynicism;
- it can accept the corporation’s stated agendas as its real agendas; and
- these agendas are public interest as well as special interest agendas.


For example, when it comes to changes (say) in health, education, law enforcement, or some other essential public service, is the dominant and only aim to improve the quality and availability of services to the public, or is it to cut costs in line with some political agenda, or to cater only for some special interest? Trust relates to corporate ethics and social responsibility. Confidence, on the other hand, refers to ability: public confidence in the competence, diligence, and professionalism of management to get jobs done properly, i.e. efficiently, safely, on budget, on time, and with the end product delivering what earlier promotion led the public and/or consumers to expect.

A common temptation in managing fallout is to fall back on accusing the media, political parties, or activist groups of unwarranted interference and trouble-making. Often this involves saying or implying that critics are acting for reasons of special interest, ideological opposition, or out of sheer “agin-the-government” bloody-mindedness. Whether or not this is the case (as it sometimes is), the battle is still about reassuring or re-earning public trust and confidence. Managing fallout always remains a battle for public credibility between managements and their critics. In the end the outcome will boil down essentially to what the facts are. What is the truth, or at least, who can be most believed?

The rough and tumble working-out of public accountability is an essential feature of the democratic process. It will remain an ever-present occupational hazard of management, especially when public services or key infrastructure development or redevelopment are involved, or when products are perceived to adversely affect public health, safety or the environment. More often than not, fallouts involve legitimate and desirable public interest probing into how matters affecting the public, or some significant section of the public, are being conducted.
Public suspicions may sometimes be unfounded, or mischievously provoked. Criticism may be unnecessarily abrasive. Nevertheless, this does not reduce the importance of accountability and transparency in a democracy.

14.2 Lessons in Fallout Management

Given the current political and business environment, we have little excuse for not all becoming experts on the risk of fallout and its management. Local and international media deliver almost daily free and detailed lessons in fallout management and mismanagement. From September 11 to Bali, from Enron and Arthur Andersen to One.Tel, HIH, Ansett, and National Australia Bank, from the Tampa and “children overboard” controversies to the challenging of church managements over the handling of child abuse cases, and, most recently of all, to the accuracy and political use of intelligence to justify the pre-emptive war on Iraq, daily news headlines have been about little else. Every day brings a fresh instalment, another tutorial in the dos and don’ts of fallout management or mismanagement.

Throughout all this coverage, the stress is clearly on trust and confidence, that is, on the credibility and competence, the professional integrity and proficiency of managements. Obviously some fallout questioning is motivated by partisan political opportunism, and/or by special interest and ideological preferences of one sort or another. Quite often the manner of questioning is unnecessarily shrill and abrasive. Presumption of innocence until proven guilty is not a prominent feature of media trials. But this does not undermine the basic democratic legitimacy of these public interrogations and their social utility. Most people have more confidence in those managerial spokespersons who manage to respond in a workmanlike, unresentful, fact-focussed, up-front way to media questioning, even when that questioning is at its most deliberately provocative, accusatory and insulting.

The ability of some spokespersons to do this seems to come when they accept that public accountability is an essential and proper part of their job: not the most jolly and comfortable part, but an inevitable one. The fallout cases mentioned so far have been particularly dramatic and have occurred at the highest national and international levels. But this enhances rather than limits the lessons they carry for smaller-scale enterprises. The dynamics exposed so starkly in these examples highlight the basics of risk management, i.e. the importance of prior risk identification, the origins of fallouts, the sorts of issues they raise, and the techniques that succeed or fail in managing them. One clear lesson is that what management does, or can do, during the fallout is limited by what it did or didn’t do before the fallout. Trying to rewrite, reinterpret or shred history after the event has limited success. As Arthur Andersen discovered, it is more likely to superheat the frying fat.

14.3 Design Stage

Managing fallout effectively begins at the design stage. The design stage is when the risks of proposed policies and projects should be comprehensively explored, not only in terms of their likely and possible impact on the public in general, but also on particular community, political and special interest groups, many of which, as we well know, are more than capable of a vigorous and practised response. It is at the design stage that we should explore how proposed decisions might be misunderstood or challenged, and how projects can be unambiguously explained and communicated to watchful and potentially critical elements of the general public. This exploration should include how, if necessary, decisions can be justified later, after the fact, when unintended consequences may have manifested themselves. Even statements and directions that seem perfectly clear and simple can be badly misunderstood.
14.4 Case Studies

Before proceeding to two specific case studies of contrasting fallout technique, it is useful to restate a little more bluntly the key point made so far. Fallout management, that is, public interrogation and response, is a legitimate and desirable aspect of democratic accountability. Even when the process is misused and Rafferty’s Rules apply, more benefit accrues to the corporate image through frank and willing engagement with the public than through resentful reluctance, avoidance or obfuscation. In these times, when political and PR minders and spin doctors seem to abound, there is a danger that managers will be tempted to think cynically of the fallout process. Some see fallout management largely as a matter of training spokespeople in PR and political-minder techniques for media appearances. Training is seen too much as preparation for doing battle with hostile, unfair and tricky adversaries: the negative approach, rather than a positive bid to win public trust and confidence. Should combat or communication, openness or secrecy, fact or spin, explanation or avoidance be foremost in one’s approach to fallout management? The following two case studies may help readers decide between the two approaches. Both cases involved global product recalls, but their lessons are widely applicable. They are classic illustrations of woeful and good fallout management respectively.

Perrier

One recall was by Perrier of its mineral water in 1990. The other was the recall by Johnson & Johnson in 1982 of its Tylenol pain-relief tablets. Perrier’s fallout was triggered when a US health authority, using advanced technology, detected benzene in a shipment of Perrier water from France. The concentration was allowable under World Health Organisation standards but not under US standards. Different Perrier spokesmen in the US and France started making factual assertions to the media before the company had established the facts. One claim was that contamination was confined to the US shipment. Another was that the benzene came from bottle cleaning. These claims were later shown by the media to be false or mistaken. This immediately set back the company’s credibility and aroused the media’s blood scent. Media focus on the Perrier story intensified, and the investigative spotlight extended to Perrier’s promotional marketing. The value of the mineral water product lay in its lifestyle image. Perrier claimed that the mineral water was pure at its natural source, that it was naturally sparkling, and that it was calorie and sodium free. Perrier’s image was built around promotional slogans like "It's Perfect, it's Perrier" and words like "Natural", "Pure" and “Health”. Many consumers obviously saw Perrier as the fashionable drink of choice for those wishing to display a health-conscious, organic sort of lifestyle. During the course of the fallout it was revealed not only that the benzene came from the natural spring source in France, but also that the water was not naturally as sparkling as the bottled product. Neither was it calorie or sodium free. The company got no credit for finally admitting these facts. Its eventual retractions were regarded as forced confessions. Credibility went to the media for dragging the facts out of the Perrier spokespersons.

The company’s image was also damaged by the fact that at no time during the fallout did the company apologise to its customers or express concern. The brand loyalty of its consumers was decimated. Company spokesmen gave the impression that they regarded public questioning as something of an unwarranted impertinence. Many members of the public thought the company should have known, and perhaps did know, what was in its product. There was obviously little anticipatory risk management and little or no coordination between risk management and Perrier’s marketing consultants. By overlooking, ignoring or concealing so many potentially explosive risk factors in its marketing, Perrier was inviting disaster. As a result, 160 million bottles of Perrier eventually had to be withdrawn and disposed of. Ultimate stock market and other business losses exceeded their value many times over. By the end of the fallout, and for a long time afterwards, few saw the corporation’s image in terms of competence, transparency, credibility and good governance. Re-establishing the corporate image was a slow and painful task.

Tylenol

Now let’s contrast the Perrier outcome with Johnson & Johnson’s handling of the Tylenol incident. Note that in the Perrier case no one was injured and probably no one’s health was really damaged. Benzene is a known carcinogen, but the concentrations involved were small: above the US limit but below the WHO limit. When a criminal extortionist poisoned Tylenol tablets, however, several consumers died. When the extortion threat was first brought to the company’s attention through the media, Johnson & Johnson had previously assessed the risk and had contingency plans in place, plans thought out in the calm times before any crisis occurred.

It was the corporation’s CEO, not a low-ranking staffer, who immediately appeared as spokesperson. He went straight on the nation’s leading media interview show. First, he declared the company’s concern for its customers. He said the company’s first priority was public safety and that a global recall was already in motion. He announced that 24-hour, toll-free, multiple phone-in lines had been set up to handle all inquiries and problems. All phone-in staff were fully trained and kept up to date by progress briefings. The CEO admitted that, in hindsight, the product would have been more secure if it had had tamper-evident packaging, and said this would be corrected. He was the company’s only media spokesman on the issue. He refused to hazard guesses when asked factual questions about which he was uncertain; instead he explained why the facts were not yet clear and what was being done to establish them. Willing transparency, demonstrable competence and public-interest priorities were the qualities that won the corporation the public’s trust and confidence. In contrast to Perrier, Johnson & Johnson emerged from its fallout with an enhanced rather than a splattered public image. Within a short time, the value of its shares and its product market leadership had returned to pre-incident levels.

14.5 Conclusion

Unlike the positive focus required of the enterprising movers and shakers vital to corporate energy and achievement, risk management has the job of looking at the grey-sky scenarios, not just the sunny blue ones. It is right and proper for mainstream managers to keep their hearts and eyes on new fields of conquest. It is the function of risk management to spot and point out the minefields that may slow them down or even prevent them from reaching their goals. The current state of corporate credibility with much of the public is somewhat damaged. This has occurred at the very time that governments and others are putting pressure on corporations to expand good governance.
These two trends comprise a sort of pincer movement, creating a fairly rugged environment in which to manage.

REFERENCES
The Economist (1991). When the Bubble Burst. The Economist, 3rd August 1991. Article on the Perrier incident.
Gideon Haigh (1991). The Business of Managing Crises. The Age, 15th August 1991. Summary article including a review of the Tylenol incident.
Gideon Haigh (1991). Ignorance is not Bliss in Crisis Management. The Age, 16th August 1991. Article on the Perrier incident.

READING
David Elias (1997). Arnott’s Agenda. Textbook case was the template for food threat. The Age, 22nd February 1997, p. A20.
Murray Mottram (1995). Going, Going, Gone. The Sunday Age, 6th August 1995. Article on the Iron Baron incident.

15. Industry Based Case Studies

15.1 Airspace Risk Assessment

An Airspace Risk Model (ARM) was developed for Airservices Australia to address the risks of various airspace classifications, in particular those in isolated areas (Jones et al). Initially this model was used to determine the level of risk for both the current and proposed methods of operating in Australian airspace. The critical event as defined by the airspace risk model is the ‘near miss’. This is considered to have occurred when two or more aircraft come within the defined horizontal (1 NM) and vertical (500 feet) limits without being aware of each other’s presence. Defining this as the critical event assumes that loss of control of the situation occurs at the point beyond which movement of the control surfaces of an aircraft at risk would have no significant effect before the collision point was passed; that is, whatever the pilots did from this point, the outcome would be ruled by luck. This point is deemed to occur 12 seconds before any near miss or collision. The cause/consequence diagram is centred on this critical event, with the consequences flowing from left to right. Time is also taken to progress from left to right across the page. This cause/consequence diagram is reproduced later in this section. Event diagrams were developed to show the sequence of events that lead to the critical event in the cause/consequence diagram:

* Traffic Alert not received
* Aircraft cannot receive call
* Considered action fails
* Evasive action fails
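The near-miss criterion that defines the critical event can be expressed as a simple test. This is a minimal sketch, assuming a local flat-earth grid in nautical miles, altitudes in feet, that "within" means strictly inside both limits, and that a near miss counts only when neither crew is aware of the other; `Position` and `is_near_miss` are hypothetical names, not part of the Airspace Risk Model itself.

```python
import math
from dataclasses import dataclass

HORIZONTAL_LIMIT_NM = 1.0   # horizontal near-miss limit (nautical miles)
VERTICAL_LIMIT_FT = 500.0   # vertical near-miss limit (feet)


@dataclass
class Position:
    """Hypothetical aircraft position on a local flat-earth grid."""
    x_nm: float
    y_nm: float
    alt_ft: float


def is_near_miss(a: Position, b: Position,
                 a_aware: bool = False, b_aware: bool = False) -> bool:
    """True if both separation limits are breached and neither crew is aware.

    Assumption: 'within' is read as strictly inside both limits.
    """
    horizontal_nm = math.hypot(a.x_nm - b.x_nm, a.y_nm - b.y_nm)
    vertical_ft = abs(a.alt_ft - b.alt_ft)
    inside = horizontal_nm < HORIZONTAL_LIMIT_NM and vertical_ft < VERTICAL_LIMIT_FT
    return inside and not (a_aware or b_aware)


# Usage: 0.6 NM apart horizontally and 300 ft apart vertically, no awareness.
print(is_near_miss(Position(0.0, 0.0, 10_000), Position(0.6, 0.0, 10_300)))  # True
```

Both limits must be breached together; an aircraft 0.6 NM away but 600 ft above would not count.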

The event diagram for “Traffic Alert not received” is shown below. Event diagrams were also developed for each of the other three events.

[Figure: Event Diagram for Traffic Alert not received. Events shown: ATS alert fails; No alert from other aircraft; Traffic alert not provided; Aircraft cannot receive call; Traffic Alert not received; combined through AND (&) and OR gates.]

Once all these event diagrams had been developed and verified, the model was quantified by a panel of operational research personnel, who also drew on various surveys and publications. The values were then inserted into the model, which was solved using the methods outlined in Chapter 9 of this text. The results showed that the model was quite sensitive in some areas, which required further investigation. This quantified risk analysis approach (cause-consequence modelling) can be calibrated to give an assessment of the existing risk of the system under study. Testing such models against both the available data and the experience of senior management and technical personnel in the industry concerned helps ensure that the model reflects the best available information and knowledge at the time it is used to make decisions about risk acceptance and, if required, risk reduction.
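Event diagrams of this kind are evaluated by combining event probabilities through their gates. The sketch below assumes independent events and one plausible reading of the Traffic Alert not received diagram: the alert is not provided only if the ATS alert fails AND no alert comes from the other aircraft, and it is not received if it is not provided OR the aircraft cannot receive the call. The input probabilities are placeholders, not values from the study.

```python
import math


def and_gate(*probs: float) -> float:
    """AND gate for independent events: all inputs must occur."""
    return math.prod(probs)


def or_gate(*probs: float) -> float:
    """OR gate for independent events: at least one input occurs."""
    return 1.0 - math.prod(1.0 - p for p in probs)


# Placeholder input probabilities (hypothetical, for illustration only).
p_ats_alert_fails = 0.05
p_no_alert_from_other_aircraft = 0.2
p_aircraft_cannot_receive_call = 0.01

# Assumed gate structure for the "Traffic Alert not received" event diagram.
p_alert_not_provided = and_gate(p_ats_alert_fails, p_no_alert_from_other_aircraft)
p_alert_not_received = or_gate(p_alert_not_provided, p_aircraft_cannot_receive_call)

print(f"Traffic alert not provided: {p_alert_not_provided:.4f}")  # 0.0100
print(f"Traffic Alert not received: {p_alert_not_received:.4f}")  # 0.0199
```

The exact OR form is used here; the rare-event approximation (simply adding the inputs) would give 0.0200, which is close only because both inputs are small.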

[Figure: Cause-Consequence Model for Enroute Airspace Collision Risk]
Cause side: ATC separation inapplicable (1.00E+00). For each aircraft, the considered action fails at the 5 minute response (2.00E-03; events 2A and 3A) & the evasive action fails at the 1 minute response (1.00E-03; events 2B and 3B), so that the 1st aircraft fails to avoid the 2nd (2.00E-06) and the 2nd aircraft fails to avoid the 1st (2.00E-06). Both together (&) give the critical event, Loss of Control of aircraft energy (envelopes overlap; aircraft collision), at 4.00E-12.
Consequence side: Collision? Yes 0.01 (4.00E-14), otherwise null. Aircraft loss? No 0.10, slight damage and fly away (4.0E-15); Yes 0.90 (3.60E-14). Populous area? Yes 0.01, aircraft loss & collateral damage (3.60E-16); No 0.99, aircraft loss only (3.56E-14).
Time line: ATC separation; considered action (5 minutes); evasion action (1 minute); critical loss of control event (0 seconds); immediate outcome? (+10 seconds); aircraft loss? (+30 seconds); collateral damage? (+3 minutes).
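The arithmetic of the enroute airspace cause-consequence model can be checked by multiplying along its branches. This sketch uses the branch values printed in the Cause-Consequence Model for Enroute Airspace Collision Risk, assumes the inputs to each AND gate are independent, and adds a "no collision" outcome computed from the complementary branch (that value is not printed in the figure).

```python
from __future__ import annotations

# Branch values from the enroute airspace cause-consequence model.
P_ATC_SEPARATION_INAPPLICABLE = 1.0e0
P_CONSIDERED_ACTION_FAILS = 2.0e-3   # 5 minute response (events 2A and 3A)
P_EVASIVE_ACTION_FAILS = 1.0e-3      # 1 minute response (events 2B and 3B)
P_COLLISION = 0.01                   # given the critical loss of control event
P_AIRCRAFT_LOSS = 0.90               # given a collision
P_POPULOUS_AREA = 0.01               # given an aircraft loss


def outcome_probabilities() -> dict[str, float]:
    """Multiply along each branch of the cause-consequence diagram."""
    # Each aircraft fails to avoid the other only if both of its actions fail (AND gate).
    one_aircraft_fails = P_CONSIDERED_ACTION_FAILS * P_EVASIVE_ACTION_FAILS
    # Critical event: separation inapplicable AND both aircraft fail to avoid each other.
    critical = P_ATC_SEPARATION_INAPPLICABLE * one_aircraft_fails * one_aircraft_fails
    collision = critical * P_COLLISION
    loss = collision * P_AIRCRAFT_LOSS
    return {
        "critical event": critical,
        "no collision (null)": critical * (1 - P_COLLISION),
        "slight damage, fly away": collision * (1 - P_AIRCRAFT_LOSS),
        "aircraft loss only": loss * (1 - P_POPULOUS_AREA),
        "aircraft loss & collateral damage": loss * P_POPULOUS_AREA,
    }


for outcome, p in outcome_probabilities().items():
    print(f"{outcome:35s} {p:.2e}")
```

The printed values reproduce the figure's 4.00E-12, 4.0E-15, 3.56E-14 and 3.60E-16, and the four terminal outcomes sum back to the critical event value, which is a useful consistency check on any event tree.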

15.2 Train Operations Rail Model

Risk analysis is being used by NSW CityRail to rank infrastructure renewals in a way that ensures the work done will have a significant impact on the business (Anderson et al 1992). This is done by obtaining specifications of the acceptable ranges of quality of service from assets, the lower bounds of which are considered unambiguously safe for all possible levels of operation. Against these specifications, management can then identify safety and service risks and eliminate them through specific projects. This would also allow management to assess the cost of providing a specific level of service and safety or, alternatively, the levels that can be provided with the funding available. The information required is that which enables the likely frequency of a train collision on a particular section of track to be estimated. The data sheet designed for this input data is shown below.

[Figure: Section Based Data Sheet for the Estimation of Railway Collision Risk Data (illustrative purposes only). The sheet records section data (line, section name and length, wrong side failure and visibility failure probabilities), line controls (signals, track, points, interlocking, train stops), train controls (driver, deadman, vigilance, AWS, ATS, ATP), head-on and rear-on exposure, section occurrence frequency (per annum) and conditional probabilities for the cases analysed.]

The data in this sheet is then used in the Fault and Event Tree (cause-consequence model) for the Loss of Train Energy that calculates the probability of the possible outcomes. This is shown in the figure below.
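As an illustration of how section data might feed a ranking of renewals, the sketch below assumes a much simplified model: a section's annual collision frequency is the annual frequency of head-on and rear-on conflicts multiplied by the probability that every installed control fails, with the controls treated as independent. The `Section` class, section names and numbers are hypothetical; the CityRail fault and event tree for the Loss of Train Energy is not reproduced here.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical section record loosely modelled on the data sheet fields."""
    name: str
    conflicts_per_year: dict[str, float]   # e.g. {"head on": 0.15, "rear on": 0.66}
    control_failure: dict[str, float] = field(default_factory=dict)  # installed controls only

    def collision_frequency(self) -> float:
        """Collisions per year, assuming a collision needs a conflict AND all controls failing."""
        p_all_controls_fail = math.prod(self.control_failure.values())  # 1 if no controls
        return sum(self.conflicts_per_year.values()) * p_all_controls_fail


sections = [
    Section("Section A", {"head on": 0.15, "rear on": 0.66}, {"driver": 1e-3, "trainstop": 1e-2}),
    Section("Section B", {"head on": 0.05, "rear on": 0.30}, {"driver": 1e-3}),
    Section("Section C", {"head on": 0.20, "rear on": 0.90},
            {"driver": 1e-3, "trainstop": 1e-2, "ATP": 1e-3}),
]

# Rank sections so renewal work goes first where it removes the most risk.
for s in sorted(sections, key=Section.collision_frequency, reverse=True):
    print(f"{s.name}: {s.collision_frequency():.2e} collisions per year")
```

Multiplying control failure probabilities assumes independence; common-cause failures (for example, a signalling fault that defeats several controls at once) would need explicit treatment in a full fault tree.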

Each data sheet of this kind sits behind the network layout shown below.

[Figure: A Computer based Network Layout to which the Section Data is linked. Stations shown: Emu Plains, Lapstone, Glenbrook, Blaxland, Warrimoo, Valley Heights, Springwood, Faulconbridge, Linden, Woodford, Hazelbrook, Lawson, Bullaburra, Wentworth Falls, Leura, Katoomba, Medlow Bath, Blackheath, Mt Victoria, Bell, Newnes Junction, Zig Zag Tunnel, Lithgow, Bowenfels, Edgecombe, Oakley Park.]

This allows the entire system to be managed on a single sheet, juxtaposing the data most relevant to determining the relative importance of different line and train controls.

15.3 Fire Risk Management (in buildings)

Monash University owns or occupies many different types of buildings, from multi-storey high-rise buildings to low-level sprawling buildings of varying ages, each with a different level of fire protection. The authors provided advice on the establishment and ongoing use of a Fire Risk Management Information System, which would give management accurate assessments of deficiencies, corrective costs, work priorities and work completed. This method would also ensure that the limited pool of funds was used effectively. An initial assessment of almost half the campus building floor area was made, concentrating on the adequacy of the following systems:
• emergency procedures
• alert and communication systems
• exits
• exit signs and emergency lighting
• smoke control systems
• air handling systems
• fire penetrations
• inspection, testing and maintenance
• fire detection and control systems
This assessment revealed considerable life safety problems, which would require large amounts of funds to correct. For this reason, fire risk management was framed as the optimisation of the total cost of risk, that is, the maximisation of risk reduction per dollar spent. This was done with respect to safety, ‘an acceptable level of risk’ and ‘duty of care’ as defined by the Victorian Occupiers Liability Act (1983) and the Victorian Occupational Health and Safety Act (1985).
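Maximising risk reduction per dollar can be illustrated with a simple greedy allocation: rank candidate measures by risk reduction per dollar and fund them in that order while the budget allows. This is a sketch under stated assumptions: the measure names echo the systems listed above, but the costs and risk reductions are hypothetical, risk reductions are assumed to be additive, and a greedy ranking is a heuristic rather than a guaranteed optimum.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Measure:
    name: str
    cost: float             # dollars (hypothetical)
    risk_reduction: float   # reduction in fatalities per year (hypothetical, assumed additive)

    @property
    def reduction_per_dollar(self) -> float:
        return self.risk_reduction / self.cost


def fund_by_value(measures: list[Measure], budget: float) -> list[Measure]:
    """Greedy allocation: best risk reduction per dollar first, skipping what no longer fits."""
    funded: list[Measure] = []
    remaining = budget
    for m in sorted(measures, key=lambda m: m.reduction_per_dollar, reverse=True):
        if m.cost <= remaining:
            funded.append(m)
            remaining -= m.cost
    return funded


measures = [
    Measure("Exit signs and emergency lighting", 20_000, 4e-5),
    Measure("Smoke control system upgrade", 150_000, 1.5e-4),
    Measure("Fire penetrations sealed", 50_000, 2e-5),
    Measure("Fire detection and control systems", 400_000, 2e-4),
]

for m in fund_by_value(measures, budget=200_000):
    print(f"{m.name}: {m.reduction_per_dollar:.1e} per dollar")
```

With a $200,000 budget only the first two measures are funded. Where measures interact or budgets are tight, an exact selection (for example, a knapsack formulation) may do better than the greedy order.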

An unacceptable level of risk is reached when the risk of fatality is assessed to be too high. An acceptable level of risk can be determined by analysing existing risks that are familiar to, and accepted by, the public. A summary of the risks arising from various voluntary and involuntary activities is given and discussed in Chapter 9 of this text. For this project, the acceptable level of risk was defined as a frequency of one in a million per year (or less) for events causing one or more fatalities. If the calculated risk exceeded this value, risk reduction measures would be deemed necessary. A time-sequence fire model was used for the event/consequence analysis. This model emphasises when the various conditions occur and relates them to the risk control measures. A smaller fire, which can be put out with an extinguisher and does not require a fire brigade response, is not considered in this type of analysis. The frequency of larger fires is then determined, and any parameters that would aid early detection are considered. These include:
− smoke detection
− occupant response
− fire rating of doors, walls etc.
− sprinkler system operation (where installed)
A fault tree was then developed to describe the system; it sets out the failures or faults that have to occur before the top event of the tree eventuates. This type of modelling is described in greater detail in Chapter 9 of this text. To add to the complexity of the analysis, the buildings were also classed as one of four different occupancies:

1. Residential Occupancy
2. Office Occupancy
3. Public Occupancy
4. Laboratory Occupancy

Each of these occupancies has different parameters, which affect the result of the fault tree. A user-friendly interface, with all the relevant calculations in the background, was developed on a Macintosh computer using “SuperCard” software. This allows the user to look at any of the building categories listed above. Individual buildings can also be viewed in their current state of life safety risk, which relates directly to the probability of successful escape from a burning building. Applicable financial data (including the cost of maintenance, i.e. inspection and testing) are also included. There are fifteen factors that affect the probability of escape (shown in the fault tree). Each of these factors has three possible probabilities, corresponding to the item being:
• not installed
• installed but not maintained
• installed and maintained
Data files containing the above probabilities are used to calculate each building’s risk of multiple deaths. A fire risk optimisation model was then used to rank the buildings in descending order of risk. This routine produces a hierarchical list of risk reduction measures to be undertaken, with the corresponding reduction in risk, which can be used to achieve specific risk levels. Overall, the following steps were followed:
i) Analyse the current life safety equipment.
ii) Define the ‘level of acceptable risk’ and the number of fire starts per year.
iii) Develop a fault tree for the system.
iv) Calculate each building’s risk of multiple deaths with the system in its current state.
v) Rank the buildings in descending order of risk.
vi) Produce a hierarchical list of risk reduction measures to be undertaken and the corresponding reduction in risk.
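Steps iii) to v) can be sketched as follows. The sketch assumes, for simplicity, that escape fails only if every protective factor fails (a single AND gate over independent factors), uses three factors instead of the study's fifteen, and uses hypothetical failure probabilities and fire-start frequencies; only the one-in-a-million acceptability criterion comes from the text.

```python
from __future__ import annotations

import math

ACCEPTABLE_RISK = 1e-6  # per year, for one or more fatalities (criterion quoted above)

STATES = ("not installed", "installed but not maintained", "installed and maintained")

# Hypothetical probability that each factor fails to secure escape, by state.
FACTOR_FAILURE = {
    "smoke detection": dict(zip(STATES, (1.0, 0.2, 0.01))),
    "emergency lighting": dict(zip(STATES, (1.0, 0.3, 0.02))),
    "sprinklers": dict(zip(STATES, (1.0, 0.1, 0.005))),
}

# Hypothetical buildings: (fire starts per year, state of each factor).
BUILDINGS = {
    "Building A": (0.05, dict.fromkeys(FACTOR_FAILURE, "installed and maintained")),
    "Building B": (0.10, {"smoke detection": "installed but not maintained",
                          "emergency lighting": "installed and maintained",
                          "sprinklers": "not installed"}),
    "Building C": (0.02, {"smoke detection": "installed and maintained",
                          "emergency lighting": "installed but not maintained",
                          "sprinklers": "installed and maintained"}),
}


def risk_of_multiple_deaths(fire_starts: float, states: dict[str, str]) -> float:
    """Fire starts per year times the probability that escape fails (AND of all factors)."""
    p_escape_fails = math.prod(FACTOR_FAILURE[f][s] for f, s in states.items())
    return fire_starts * p_escape_fails


def rank_buildings(buildings) -> list[tuple[str, float]]:
    """Buildings in descending order of risk (step v)."""
    risks = {name: risk_of_multiple_deaths(f, s) for name, (f, s) in buildings.items()}
    return sorted(risks.items(), key=lambda item: item[1], reverse=True)


for name, risk in rank_buildings(BUILDINGS):
    verdict = "acceptable" if risk <= ACCEPTABLE_RISK else "risk reduction required"
    print(f"{name}: {risk:.1e} per year ({verdict})")
```

Building B ranks first and is the only one above the criterion, mainly because it has no sprinklers. Re-running with a factor moved to "installed and maintained" shows the risk reduction that step vi) would list for that measure.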

15.4 Transmission Line Risk Management

Over 30% of the transmission lines used in Tasmania are 50 years old or more. These lines were built before industry-based guidelines were established, and as a result many of them do not meet the clearance requirements set out by the Electricity Supply Association of Australia (ESAA). In addition, a number of these lines were built across what were then remote areas which, with new roads and greater public access by off-road vehicles, are now frequented by the public. Greater clearances are therefore required to comply with the ESAA guidelines, and in their current state these lines may pose a danger to the community and the environment. Transend Networks Pty Ltd, Tasmania, has adopted risk management techniques as an essential part of the asset management of the transmission system (Houbaer and Seddon 1995). This method was chosen to help the company obtain the greatest risk reduction per dollar spent, reduce overall expenditure and optimise operations, while also limiting its legal liability. This was done using a risk-ranking model, which ranked the lines according to the severity of the breach of the statutory minimum clearance obligation of 5.5 m above roads (the obligation is greater for other categories). The other techniques used to rank hazards and solutions were to quantify the level of risk exposure to people, equipment and the environment, to determine the consequences of these hazards, and to compare these risk levels with acceptable risk exposure levels documented in legislation, best industry practice or guidelines. The model also helped management decide whether expenditure on refurbishment or development projects could be minimised or deferred. Deferral increases the likelihood of an unwanted event occurring, but the risk is deemed acceptable provided appropriate preventative control measures are put in place.
This is preferable, as fixing all infringements would cost tens of millions of dollars in capital expenditure over a number of years. The four main parameters of the risk model were:
• identification of critically exposed groups
• classification of credible hazards
• development of cause/consequence diagrams to determine what events have conspired together to cause loss of control of the conductor energy under consideration, and
• determining acceptable risk criteria.

15.4.1 Risk Criteria

The rationale for the ground-to-conductor clearances prescribed by the ESAA Guidelines could not be established, so the following analysis was made. Vehicles over 4.3 m are over dimensioned and require statutory approval before movement can commence. The Australian bridge overpass design height is 5.5 m. The flashover distance for 110 kV is 0.25 m and for 220 kV is 0.55 m. The flashover distance for a lightning strike (about 500 kV) is 1.2 m. Thus, for traversable areas, the following conductor “risk” thresholds can be established.

A1   4.3 m + 0.25 m = 4.55 m   110 kV flashover threshold for maximum dimensioned vehicles
A2   4.3 m + 0.55 m = 4.85 m   220 kV flashover threshold for maximum dimensioned vehicles
B0   4.3 m + 1.2 m = 5.5 m     Lightning (500 kV) flashover threshold for maximum dimensioned vehicles (ESAA Guideline for 110 kV non traversable)
B1   5.5 m + 0.25 m = 5.75 m   110 kV flashover threshold for over dimensioned vehicles that fit under bridges
B2   5.5 m + 0.55 m = 6.05 m   220 kV flashover threshold for over dimensioned vehicles that fit under bridges (ESAA Guideline for 220 kV non traversable)
C1   5.5 m + 1.2 m = 6.7 m     Lightning (500 kV) flashover threshold for over dimensioned vehicles that fit under bridges (ESAA Guideline for 110 kV traversable)
C2   7.5 m                     ESAA Guideline for 220 kV traversable

C1 is the same as the ESAA Guideline for 110 kV.

15.4.2 Process

The Transmission Line Risk Management System (TLRMS) is a PC-based desktop colour publishing solution for assessing and managing span-based hazards. The prime focus is on direct flashover hazards to the public. The process was developed with the support of the HEC solicitor. The operational steps are:

15.4.2.1 Preliminary PC Based Risk Assessment using Original Design Data
A preliminary computer-based assessment is made using the original data used in the design of the transmission line, with some field verification of the original design data as required. The original design data is transferred to an Excel spreadsheet format on disk. This information is then transferred to the TLRMS PC and analysed. Infringing spans (generally those with clearances of less than 6.7 m for 110 kV conductors, 7.6 m for 220 kV conductors or 9.5 m over public roads) are determined for alternative conductor core temperatures (typically 49°C and 75°C). A Register of Offending Spans is then printed by Transmission Line Number and Core Temperature.

15.4.2.2 Desk Top Risk Assessment
This considers each span, taking into account factors such as land use, road and rail crossings, conductor crossings and the infringements determined above. It must be done by someone knowledgeable about the conductor and its environs.

15.4.2.3 Field Inspection - Best Available Data Established
Based on the results of the above, suspect spans are inspected in the field using specially developed single-page A4 Register sheets. If the ground or conductor profiles are incorrect, the TLCAD data needs to be corrected and the above two stages repeated.

15.4.2.4 Final Assessment and Action
a) If the span data is correct, the proposed (infringement) control option/s need to be selected, costed and marked on the register page.
b) Special hazards related to special critical groups (for example, scenic views encouraging low-flying pilots) need to be assessed and noted on the register page. The single ‘help’ page lists the possible hazards considered by the original expert team, but the field operative should try to consider whether any other hazards to any other exposed group exist, for example, hang gliders, abseilers and others.
c) The control option data is then entered into the TLRMS PC, and the risk and control data options are exported to an Excel spreadsheet.
d) This data is then ranked by:
   − worst Electricity Supply Association of Australia (CB1) infringement per span
   − worst Electricity Supply Association of Australia (CB1) infringement per linear metre
   − greatest hazard reduction per dollar spent for design controls
e) Action budgets are formulated and plans made. If a physical change is implemented, the design data needs to be altered and the TLRMS item re-run.
f) If a procedural solution is adopted (for rare excessive conductor sags during extreme weather/load conditions), this needs to be formally documented and implemented. Regular training and/or drills will be required.
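The conductor risk thresholds in 15.4.1 are simple sums, and step d) ranks spans by their infringement. The sketch below recomputes A1 to C2 from the quoted vehicle heights and flashover distances, then ranks hypothetical spans by how far their minimum ground clearance falls below an assumed guideline threshold (C1 for 110 kV and C2 for 220 kV traversable areas). The span data and the choice of guideline per voltage are illustrative assumptions, not TLRMS behaviour.

```python
from __future__ import annotations

MAX_DIMENSIONED_VEHICLE_M = 4.3   # taller vehicles need statutory approval to move
BRIDGE_OVERPASS_M = 5.5           # Australian bridge overpass design height
FLASHOVER_M = {"110 kV": 0.25, "220 kV": 0.55, "lightning": 1.2}  # lightning is about 500 kV


def thresholds() -> dict[str, float]:
    """Conductor 'risk' thresholds for traversable areas, in metres above ground."""
    raw = {
        "A1": MAX_DIMENSIONED_VEHICLE_M + FLASHOVER_M["110 kV"],
        "A2": MAX_DIMENSIONED_VEHICLE_M + FLASHOVER_M["220 kV"],
        "B0": MAX_DIMENSIONED_VEHICLE_M + FLASHOVER_M["lightning"],
        "B1": BRIDGE_OVERPASS_M + FLASHOVER_M["110 kV"],
        "B2": BRIDGE_OVERPASS_M + FLASHOVER_M["220 kV"],
        "C1": BRIDGE_OVERPASS_M + FLASHOVER_M["lightning"],
        "C2": 7.5,  # ESAA guideline value for 220 kV traversable; not derived
    }
    return {name: round(value, 2) for name, value in raw.items()}


# Assumed guideline threshold per voltage for traversable areas.
GUIDELINE = {"110 kV": "C1", "220 kV": "C2"}

# Hypothetical spans: (span id, voltage, minimum ground clearance in metres).
SPANS = [
    ("Span 12", "110 kV", 6.1),
    ("Span 47", "220 kV", 7.0),
    ("Span 63", "110 kV", 6.9),
    ("Span 88", "220 kV", 5.9),
]


def rank_infringements(spans) -> list[tuple[str, str, float]]:
    """Worst infringement first; spans at or above their guideline are dropped."""
    limits = thresholds()
    rows = []
    for span_id, voltage, clearance in spans:
        shortfall = round(limits[GUIDELINE[voltage]] - clearance, 2)
        if shortfall > 0:
            rows.append((span_id, voltage, shortfall))
    return sorted(rows, key=lambda row: row[2], reverse=True)


print(thresholds())
for span_id, voltage, shortfall in rank_infringements(SPANS):
    print(f"{span_id} ({voltage}): {shortfall:.2f} m below guideline")
```

Span 63 clears its guideline and drops out, while Span 88 heads the list at 1.6 m short. Ranking per linear metre or per dollar, as in step d), would only need a different sort key once span lengths and control costs are attached.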

15.5 Bushfire Risk Management

The need for risk management, or loss control, in bushfire prone areas is discussed at length in a paper written by the authors after the devastating Ash Wednesday bushfires (Anderson and Robinson 1984). The main objectives of such a bushfire risk management system would include:
i) Relating the costs of various bushfire protection methods to the vulnerability of threatened assets, in particular lives, property and areas of particular environmental/habitat significance.
ii) Documenting methods of environmental management towards an optimum level of bushfire prevention and safety. This will include both active and passive management items, such as the application of planning controls and standards for road access and water supply reticulation reliability.
iii) Determining an appropriate balance between environmental conservation and fire hazard reduction practices.
iv) Concentrating on prevention measures within the ambit of local councils.

15.5.1 Assets
The main assets that need to be protected are lives, property (residential, commercial and municipal) and areas of high environmental/habitat quality. Identifying assets also identifies where fire protection needs to be concentrated.

15.5.2 Threat Assessment
To assess the threat to an asset, the type, severity and frequency of hazards need to be estimated. This would be based on history and Rural Land Mapping (which includes an assessment of fire hazard). Any information available on past incidents will obviously be useful in this assessment.

15.5.3 Asset Exposure
As a fire reaches and grows beyond the controllable stage, the options for fire retardation decrease and the losses increase significantly. The probability of fire depends on supporting environmental conditions such as wind, temperature and combustible loading. There are generally three stages of fire growth that can be directly related to asset loss. In the fire inception phase there is very little loss. The minimum loss situation (also referred to as the ‘Normal Loss Expectancy’) is defined as the largest loss expected under normal circumstances, which assumes no loss of life and minimal loss of property. The maximum loss situation (also referred to as the ‘Maximum Foreseeable Loss’) is the worst-case scenario. This exposure increases as housing density decreases, and as clearings, adequate water supplies and access roads become scarcer.
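Relating the cost of a protection measure to the vulnerability of threatened assets can be sketched as an expected annual loss comparison over the Normal Loss Expectancy and Maximum Foreseeable Loss scenarios. This is a minimal sketch covering property loss only: all frequencies, losses and costs below are hypothetical, and treating each scenario as having its own independent annual frequency is a simplifying assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    frequency_per_year: float   # hypothetical annual frequency of the scenario
    loss_dollars: float         # hypothetical property loss if it occurs


def expected_annual_loss(scenarios: list[Scenario]) -> float:
    """Sum of frequency x loss over the scenarios considered."""
    return sum(s.frequency_per_year * s.loss_dollars for s in scenarios)


# Hypothetical township, before and after a protection measure
# (for example, loop roads and emergency water supplies).
before = [
    Scenario("Normal Loss Expectancy", 0.05, 200_000),
    Scenario("Maximum Foreseeable Loss", 0.005, 20_000_000),
]
after = [
    Scenario("Normal Loss Expectancy", 0.05, 150_000),
    Scenario("Maximum Foreseeable Loss", 0.005, 8_000_000),
]
annualised_cost = 40_000  # hypothetical cost of the measure per year

benefit = expected_annual_loss(before) - expected_annual_loss(after)
print(f"Reduction in expected annual loss: ${benefit:,.0f}")
print("Worthwhile" if benefit > annualised_cost else "Not justified on property loss alone")
```

In this example most of the benefit comes from cutting the Maximum Foreseeable Loss, which is why measures aimed at the uncontrolled stage of a fire can be justified even when the worst case is rare.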


15.5.4 Protection Measures
Protection measures, both passive and active, were then proposed for both the Normal Loss Expectancy and the Maximum Foreseeable Loss. Some examples are shown in the table below.

Assets (rows): Property (Urban, Rural); Lives; Critical Facilities & Services; Significant Habitat/Environment.
Fire stages (columns): in the initial stages of the fire; once the fire has started to develop; once the fire is at a size it can be easily detected; once the fire has been detected; fire developed to the uncontrolled stage.

Before the fire becomes uncontrolled:

Property (Urban, Rural)
− Control burning off through enforcement.
− Clear roadsides as part of the regular works program.
− Enforce the clearing around houses and in gutters, and ensure fire fighting equipment is kept by households in isolated areas.
− Make cleared areas available.
− Infra-red fire towers to minimise detection time.
− Increased surveillance, particularly on days of high fire danger (e.g. use a helicopter on days of total fire ban).
− Not at risk at some stages.

Lives
− Evacuation of residents to town; some loss of life among fire fighters, but minimised with better equipment and knowledge of the fire situation and behaviour.
− Fire crews in fire resistant tankers.
− If the public are in isolated areas, public knowledge of fire danger days is important.
− Population density ensures detection in towns.
− Not at risk at some stages.

Critical Facilities & Services
− Population density ensures detection, in which case the facilities should be made fire resistant.
− Water pumps etc. should be fireproof to an appropriate level.
− Infra-red fire towers.

Significant Habitat/Environment
− Firebreaks around habitat if minimal disturbance is to occur within areas.
− Water supplies in the habitat area or nearby areas, or provide an area for animal evacuation.
− Control public access to habitat areas. Experienced fire crews to do maintenance/clearing in significant areas.
− Patrol areas in high fire risk areas.
− Not at risk at some stages.

Fire developed to the uncontrolled stage:
− Urban buildings protected as part of the protection of critical areas; urban fringe buildings outside the protected area may be at risk.
− Rural residents evacuated to towns (elderly, children); some may remain to protect houses.
− Complete evacuation to town; clean-up groups may be sent out after the critical period to save houses. Some loss of houses, no loss of life.
− Fire crews maintaining communication links to HQs and obtaining information on water supplies etc.
− Overall community information system (e.g. siren) to alert residents to evacuate to town, or radio time for warnings throughout the fire danger period.
− Emergency water supplies operated by diesel pumps (say, from underground tanks).
− No one-way roads; loop roads for two-way access to all areas.
− Township protected by the golf course on its north side, which would act as a firebreak. Critical buildings could be placed in the park area; where only minimal areas could be cleared, the evacuation centre should be double-bricked and sprinklered via underground piping, or alternatively the evacuation centre could be underground.
− Emergency water supplies. If possible, ground crews dispatched to work on the most significant areas (say, sufficient).
− No further protection possible.

Prevention Measures Applicable At Various Stages of A Bushfire


15.6 Tunnel Risk Management
The following is summarised from Robinson, Francis & Anderson (2003). An initial vulnerability assessment was conducted as a completeness check to test for issues to be addressed. A very reduced sample for a tunnel is shown in the table below.

Threats (rows) against assets (columns). Column key: TP = Travelling Public, including disabled, elderly, small children and people who behave erratically; OS = Operator Staff, including contractors and breakdown services; ES = Emergency Services (fire brigade, ambulance & police); LR = Local Residents; HE = Habitat/Environment (air quality); IT = Infrastructure & Third Party.

Threat | TP | OS | ES | LR | HE | IT
Motorcycle breakdown | x | x | - | - | - | -
Passenger car breakdown | x | x | - | - | - | -
Bus breakdown | xx | x | x | - | - | -
HCV load fire, stationary vehicle in free-flowing traffic | xx | xx | xxx | x | x | x
HCV vehicle fire, burning vehicle in stationary traffic | xxx | xxx | xxx | x | x | x
Injury/entrapment accident, all lanes blocked | xx | x | x | - | - | -
Fatal accident, all lanes blocked | xx | x | x | - | - | -
Pedestrians in tunnel on walkway | x | x | x | - | - | -
Cyclist in tunnel | xx | x | x | - | - | -

Sample Vulnerability Table

HCV (heavy commercial vehicle) fire, especially in stationary traffic, appears as critical (xxx) for three exposed groups and is analysed further. The figure below shows a preliminary cause-consequence model for a fire in an HCV in stalled traffic in a long two-tunnel system using longitudinal emergency ventilation (jet fans).

15.6.1 Loss of Control Point
The loss of control point appears to be a fire that overwhelms the usual air handling system. There are several arguments for this. The simplest, legally, probably revolves around confined spaces. The tunnels should have clean, breathable air whenever they are occupied, even during a fire/smoke incident; otherwise they would be considered a confined space. Emergency ventilation to prevent a situation becoming a confined space is an attempt to restore control and acts after the event. On an open freeway a fire is mostly an isolated event, since the heat and smoke go up and exposed persons (beyond those trapped in the vehicle/s) basically stay away from the inferno until the brigade arrives or the fire burns out. In a tunnel this is potentially far more problematic because of the contained environment. Even an unmanaged 5 MW fire can create substantial problems for persons remote from the fire unless special precautions are taken. It is therefore the change in the tunnel environment caused by the fire that creates the loss of control.


Threat: fire in HCV in stalled traffic, 0.01 pa. Threat controls: dangerous goods restrictions; non-combustible vehicles.
Precautions (automatic fire control), probability of failure 0.01: usual ventilation/air handling; early automatic fire control including sprinklers/deluge systems; storm drainage deals with spilt fuel fire etc.
Loss of control (manifest threat): smoke/fire overwhelms usual air handling systems (5+ MW fire?), 0.0001 pa.
Vulnerability controls, branch probability 0.5: stalled traffic minimisation; manual efforts, deluge systems; fire brigades response; emergency evacuation systems; jet fans.
Outcomes: hit (potential injuries and deaths), 0.00005 pa; near miss (null outcome), 0.00005 pa.

Preliminary Cause-Consequence Model for HCV Fire in a Tunnel in Stalled Traffic

Another way to think of this relates to different size fires in the tunnel. Suppose that a car engine catches fire, the driver pulls over and a passing truck driver stops and extinguishes the fire with a fire extinguisher. Other than the lane restriction and the possibility of collision, from the point of view of the tunnel environment there has been no loss of control, since the smoke and heat will have been dissipated in the overall tunnel air movement (the piston effect of cars, the jet fans etc.). However, beyond a certain size a fire will disrupt the air flow, place remote persons at risk and thus require emergency measures, including an emergency ventilation system and the like. This appears to be the loss of control point.
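The branch arithmetic in the preliminary cause-consequence model can be reproduced with a short sketch. It assumes, as the figure's numbers imply, that 0.01 is the probability that the automatic fire control precautions fail and 0.5 is the probability that people are exposed once control is lost. The function and parameter names are illustrative only.

```python
def cause_consequence(threat_pa: float, p_precaution_fails: float,
                      p_exposed: float) -> dict[str, float]:
    """Propagate an annual threat frequency through two branch points.

    threat_pa          -- frequency of the threat (HCV fire in stalled traffic), per annum
    p_precaution_fails -- assumed probability that automatic fire control fails, so the
                          fire overwhelms the usual air handling (the loss of control point)
    p_exposed          -- assumed probability that, once control is lost, people are hit
    """
    loss_of_control = threat_pa * p_precaution_fails
    hit = loss_of_control * p_exposed
    near_miss = loss_of_control * (1.0 - p_exposed)  # null outcome
    return {"loss_of_control_pa": loss_of_control,
            "hit_pa": hit,
            "near_miss_pa": near_miss}


if __name__ == "__main__":
    result = cause_consequence(threat_pa=0.01, p_precaution_fails=0.01, p_exposed=0.5)
    for outcome, frequency in result.items():
        print(f"{outcome}: {frequency:.5f}")
```

Multiplying along each branch gives the loss of control frequency of 0.0001 pa and the equal hit and near miss frequencies of 0.00005 pa shown in the model.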

Diagram: jet fans and piston effect; buoyancy effect of hot combustion gases.

Fire in Downward Facing Tunnel

Since tunnels can slope, cars in the two tunnels travel in opposite directions and hot air rises, the loss of control point for a fire is potentially different in each tunnel. It is likely to be more severe in the tunnel where vehicles travel downhill. As suggested in the diagram above, a fire in the downhill tunnel is far more likely to produce turbulence and mixing, because the buoyant hot combustion gases rise against the air flow driven by the jet fans and the piston effect.


There are three primary risk control regions.

15.6.2 Threat Reduction
Firstly, threat reduction; in this case, reduce the source of fire, for example trucks carrying large combustible loads. Small fires in any vehicle may occur once every two months, in a heavy commercial vehicle say once per 10 years, and in a heavy commercial vehicle in stalled traffic say once in 100 years.

15.6.3 Precautions
Secondly, precautions such as deluge systems that can control a fire before the normal air handling system is overloaded (small fires are safe fires). A further consideration is the size of the uncontrolled fires. Suppose the environment can be designed to manage, say, a 5 MW fire and the proposed deluge system can be relied upon to control the fire on 99% of the occasions on which it is called upon to act. Automatic activation is probably required to achieve such reliability. In legal terms, might this be considered to be beyond reasonable doubt?

15.6.4 Vulnerability Reduction
Thirdly, reduce vulnerability by ensuring no one is present during a fire (minimal stalled traffic) and by providing emergency response, ventilation and evacuation systems. The critical scenario is high congestion with stalled traffic, meaning there are stopped vehicles both upstream and downstream of the fire. This makes use of the longitudinal (jet fan) emergency mode problematic, since it would blow smoke over one column of stopped traffic, hampering evacuation. That is, with stalled traffic and longitudinal emergency ventilation, a heavy commercial vehicle fire will expose a large number of people who would have to evacuate on foot through a smoky environment. Achieving this reliably is very, very difficult. The lawyers (and regulators to whom such arguments have been presented) have always confirmed that precautions implemented before the loss of control point are the best place for the precautionary dollar.
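The indicative return periods in 15.6.2 and the 99% deluge reliability in 15.6.3 can be turned into annual frequencies with a small sketch. The function names are illustrative, and the 0.9 and 0.999 reliabilities are added only for comparison with the 0.99 example in the text.

```python
def annual_frequency(return_period_years: float) -> float:
    """Events per annum for an event expected once every `return_period_years` years."""
    return 1.0 / return_period_years


def uncontrolled_fire_pa(threat_pa: float, reliability: float) -> float:
    """Frequency of fires that an automatic deluge system fails to control."""
    return threat_pa * (1.0 - reliability)


# Indicative return periods quoted in 15.6.2
small_fire_pa = annual_frequency(2 / 12)   # small fire in any vehicle: once every two months
hcv_fire_pa = annual_frequency(10)         # fire in an HCV: once per 10 years
hcv_stalled_pa = annual_frequency(100)     # HCV fire in stalled traffic: once per 100 years

if __name__ == "__main__":
    print(f"Fires in any vehicle: {small_fire_pa:.1f} pa")
    print(f"Share of vehicle fires involving an HCV: {hcv_fire_pa / small_fire_pa:.3f}")
    print(f"Share of HCV fires occurring in stalled traffic: {hcv_stalled_pa / hcv_fire_pa:.2f}")
    # 0.99 is the example reliability from 15.6.3; 0.9 and 0.999 are comparison values only
    for reliability in (0.9, 0.99, 0.999):
        rate = uncontrolled_fire_pa(hcv_stalled_pa, reliability)
        print(f"Deluge reliability {reliability}: {rate:.0e} pa fires escape early control")
```

At 99% reliability the frequency of fires that escape early control falls by a factor of one hundred, from 0.01 pa to 0.0001 pa, which is the loss of control frequency used in the cause-consequence model.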
Complex, expensive, hard-to-model and unpredictable emergency measures, invoked after the loss of control point in an attempt to bring a situation back under control, are legally difficult to defend, especially when a sensible pre-loss of control point precaution was available. It is, of course, necessary to acknowledge and verify the reliability of the actual automatic systems that are proposed. Complex systems require commensurate safety assurance, for example by obtaining a Safety Integrity Level (SIL) pursuant to the Functional Safety Standard IEC (AS) 61508.

REFERENCES
Anderson K J and R M Robinson (1984). A Proposal for the Development of a Strategy Role in Bushfire Loss Reduction. Engineers Australia Local Government Conference, Melbourne.
Anderson K J, R M Robinson and D J Hyland (1992). Ranking of Infrastructure Renewals Taking into Account the Business Requirements of the Railway. CompRail ’92 Conference, Washington.
Houbaer R and M Seddon (1995). Risk Management of Transmission Line Clearances in the Hydro-Electric Commission of Tasmania. Hydro-Electric Commission, Tasmania.
Jarman M, C Tillman and R Robinson (1989). Management of Building Fire Risks through Quantified Risk Assessment Techniques: A Case Study at Monash University. NSCA Convention, Monash University.
Jones K, K Anderson, W Ely and R Phillips (1995). Application of Risk Analysis to Airspace Planning. ICAO Review of the General Concept of Separation Panel (RGCSP), Gold Coast, Australia.
Robinson R M, G E Francis and K J Anderson (2003). Lessons from Cause-Consequence Modelling for Tunnel Emergency Planning. Proceedings of the Fifth International Conference on Safety in Road and Rail Tunnels, University of Dundee, pp 149-158. ISBN 1 901808 22 X.
Victorian Occupiers’ Liability Act (1983). Now incorporated as Part IIA of the Wrongs Act (1958) as amended in 1989 (Reprint No. 6, 15 January 1992).
Victorian Occupational Health & Safety Act (1985). Act No. 10190/1985 (Reprint No. 5, 17 November 1998).


16. Occupational Health & Safety

16.1 Legislative Framework

16.1.1 History
Early Occupational Health & Safety legislation followed on the heels of the industrial revolution; it was generally very prescriptive and detailed, and was largely aimed at factories and shops. In the 1960s it was becoming increasingly obvious that prescriptive legislation could not keep pace with social, economic and technological change. Attempts at doing so had resulted in a huge volume of sometimes complex and rigid regulations. Consequently, in the early 1970s the British Government established a Committee of Inquiry, chaired by Lord Robens, to review OH & S in the UK. The report of this review (Robens, 1972), which came to be known as the Robens Report, was extremely influential in the reform of OH & S not only in the UK but also in Australia, Canada and many other countries. All Australian States and Territories followed in the footsteps of the UK during the 1970s and 80s with a total overhaul of their OH & S legislation and regulatory framework.
The other development during the last century was the establishment in some countries, including Australia, of Workers’ Compensation systems and laws. These had their origin not in the UK but in Germany in the 19th century. Prior to the establishment of Workers’ Compensation schemes in Australia, the only avenue for injured workers to recover costs associated with their injury was to sue their employer under Common Law. This meant the injured employee had to prove that, on the balance of probabilities, the employer had been negligent. For many injured employees taking legal action was beyond their financial means, and even if they could afford it they risked having court costs awarded against them if they failed to prove negligence. Hence it was often not worth taking this risk if the potential damages were not much greater than the court costs.
The other problem with the Common Law system, which is why some States have removed or reduced the rights of workers to sue under Common Law, is that a Common Law claim takes many years to be decided, and in the meantime there is an incentive, in the form of increased damages, for workers to remain injured; i.e. it is counter-productive in terms of rehabilitation. Because of this, Workers’ Compensation legislation in Australia places an emphasis on the rehabilitation of injured workers.

16.1.2 Acts, Regulations and Codes of Practice
In Australia Occupational Health & Safety is regulated by the States and Territories; in other words, they have the responsibility of making and enforcing the OH & S laws in the form of Acts and Regulations. Each State and Territory has an OH & S Act which sets out the general requirements for ensuring safe and healthy workplaces. These Acts establish the structure and define the responsibilities for achieving this goal. They define the government bodies responsible for OH & S, as well as specifying the duty of care required of employers, employees and others who may have an impact, by their acts or omissions, on workplace health & safety, such as contractors, designers, suppliers or manufacturers. The main objectives of OH & S legislation are to ensure the safety, health and welfare of people at work and to eliminate risks to health and safety from the workplace. However, many of the OH & S Acts also extend the duty to persons at the workplace other than employees. Hence retailers have a duty towards customers on their premises and educational institutions have a duty to their students. This is simply a reinforcement of the Common Law Duty of Care. Regulations can be made to support the OH & S Act. In some States and Territories there are OH & S Regulations which deal with a large number of hazards and issues, whereas in some jurisdictions the regulations are hazard specific, e.g. Noise Regulations, Asbestos Regulations, Plant Regulations.
The Regulations specify in more detail the steps that must be taken to control specific hazards and by whom. In some states Regulations may be supported by Codes of Practice. These are basically practical “how to comply” documents with a lot of useful advice on assessment and control.


16.1.3 Standards and Guidance Documents
The National Occupational Health & Safety Commission (NOHSC) draws up National Standards in consultation with State/Territory health & safety authorities, employee unions and employer organisations. These are adopted into their legislation by the States/Territories or called up by them, as is the case for the National Standards for Atmospheric Contaminants in the Occupational Environment (NOHSC, 1995). Standards produced by Standards Australia and other organisations provide technical and design advice. Some are safety related, such as those dealing with fire safety and emergencies, and many others contain some health & safety provisions. There are also many other codes, standards and guidance notes in the public domain, some produced by authorities such as NOHSC and others by bodies such as professional and industry associations. The legal framework is represented in the figure below:

Legal Framework

16.1.4 Compliance
Compliance with Acts and Regulations is mandatory, whereas compliance with the other types of document mentioned above is generally not mandatory unless the document is called up by an Act or a Regulation. However, Codes and Australian Standards can be used as evidence in court to demonstrate what could have been done, that is, a form of best practice. Compliance is therefore desirable unless another solution or precaution achieves an equal or better outcome.


16.1.5 Extent of General Duties
The wording of the General Duties of Care in Australian OH & S legislation varies between jurisdictions. For example, in Victoria employers must provide a safe and healthy work environment “so far as is practicable”, whereas in South Australia the extent of the duties is “so far as is reasonably practicable”. In some states there is no such qualification, so the duty imposed is absolute. However, to date there is no evidence that these differences have led to a higher compliance standard being enforced in one State than in another. “Practicable” is defined (Occupational Health & Safety Act, 1985) as having regard to:
(a) the severity of the hazard or risk in question;
(b) the state of knowledge about that hazard or risk and any ways of removing or mitigating that hazard or risk;
(c) the availability and suitability of ways to remove or mitigate the hazard or risk; and
(d) the cost of removing or mitigating that hazard or risk.
In general, the extent of the duties appears to be similar to the Common Law Duty of Care in all Australian jurisdictions. However, there are significant differences between jurisdictions when it comes to regulations, and this can add complexity for companies operating across borders.

16.1.6 Penalties and Interventions
Breaches of OH & S legislation can result in fines being imposed, generally through proceedings in a Magistrates’ Court. The legislation also provides for inspectors and, in some states, other parties such as Health & Safety Representatives to issue Improvement Notices or Prohibition Notices. An Improvement Notice requires an employer to take specified actions within a stipulated time period. A Prohibition Notice requires work to cease until specified remedies have been implemented. It is important to be aware of the rights and powers conferred on certain types of individual under OH & S legislation, as hindering these people or failing to respond to notices is usually also an offence.

16.1.7 Definition of Employer
Whilst all employees are in no doubt as to their status under OH & S legislation, the question of which employees, if any, could also be deemed to be “the employer” generally causes more anxiety. The interpretation now generally applied is that anyone in a management or supervisory role, that is, anyone involved in the management of others, could be an “employer”. There have not been many cases where middle or lower managers or supervisors have been prosecuted for OH & S breaches, but it would appear that for this to occur the manager must have knowingly issued instructions, or omitted to take action, that s/he knew was in violation of company policy or OH & S requirements; in other words, s/he knowingly, by act or omission, put others at risk.
16.2 OH & S Risk Assessment
Most Australian legislation specifies that a process of hazard identification, risk assessment and risk control must be undertaken. In most instances the risk assessment methodology used is the risk matrix approach from the Australian Risk Management Standard, although this Standard presents the matrix as only one of several methods that can be used. The matrix approach has already been described in Chapter 7. In the OH & S context hazards are usually categorised using the energy-based classification described in Section 5.5. In our experience risk assessments are often worthless or, worse, lead to efforts and expenditure being targeted inappropriately, because the hazard or vulnerability has not been properly defined. Furthermore, consequence or likelihood is often estimated using the qualitative scales given in the Standard, which makes this a very subjective process.


Sometimes, where there are several vulnerabilities from the one hazard, a critical vulnerability can be overlooked. For example, a risk assessment of liquid nitrogen use in a laboratory dealt with the risk of liquid nitrogen burns but not with the risk of asphyxiation, presumably because the controls were believed to be more than adequate, being of best practice standard (good ventilation, backup ventilation, 24-hour monitoring of ventilation, oxygen monitoring and alarms). In effect all the controls failed to one degree or another and a worker died. In hindsight it would have been better to focus risk management resources on the vulnerabilities with the potential for the greatest consequence. The legislation requires that risk control be based upon the Hierarchy of Controls, which is defined in Victoria as being, in order from most to least preferred:

1. Elimination
2. Substitution
3. Engineering controls
4. Administrative controls
5. Personal protective equipment and clothing

There are small variations to this in other states/territories. The legislative requirement to carry out risk assessments, which must be documented to prove that they have been done, can result in an extremely large list of controls that need to be implemented. The authors’ belief is that the best use of resources is frequently obtained by ignoring the risk assessment stage and going straight to the identification of risk mitigating controls/precautions. It is interesting to note that this concept has now been adopted in the UK and elsewhere with respect to substances where inhalation exposure is one of the main risks (IOHA, 2002). The concept of control banding is an attempt at shifting the emphasis onto controls rather than risk assessment by simplifying the risk assessment, which for inhalation exposures amounts to exposure assessment.

16.3 Performance Indicators
There are a number of possible performance measures available to assess risk and reliability. Commonly used ones are based around:
− Fatalities (total number and frequency of occurrence).
− Injuries (total number, severity and frequency of occurrence).
− Statutory breaches (number and severity).
− Days gained or delayed (especially for projects and contracts).
− Dollars (gained or lost).
− Availability (% time operating).
These measures can be per period (per day, week, month or year) for an organisation or for a particular contract or project. A number of the more commonly used formulations follow.

16.3.1 Fatality Risk
A common form of assessing fatality risk is:
$$\text{Fatality risk from an activity} = \frac{\text{Number of deaths per annum from that activity}}{\text{Exposed population}}$$
Obviously, this is only statistically significant if the exposed population is of a reasonable size. Sometimes, if the number of lives at risk can be assessed, attempts are made to assess the value of human life in financial terms. Ramachandran (1995) summarises the five methods his research shows are used to value human life.


i) Gross Output

This examines the gross output based on goods and services that a person can produce if not deprived, by death, of the opportunity to do so. This gives a relatively small value to a human life.

ii) Livelihood Approach

This approach, not altogether different from the output approach, assigns value in direct proportion to income. This also gives a relatively small value to a human life. It favours the higher paid over the lower paid.

iii) Insurance Method

This uses the value of life insurance policies purchased by individuals. This is a form of self valuation but has constraints in that what one person thinks their life is worth and what they can actually afford may be quite different.

iv) Court Awards

This involves the awards given to the heirs of the deceased person.

v) Willingness to Pay

This approach to valuing life rests on the principle that living is generally an enjoyable experience for which people are willing to sacrifice other activities such as consumption. That is, it asks how much people are willing to pay to feel safe. It reflects the notion of “consumer sovereignty”.
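Returning to the fatality risk formula in 16.3.1, the calculation and its small-population caveat can be sketched as follows. The function name and all figures (3 deaths among 150,000 exposed people, and a group of 200) are hypothetical, chosen only to illustrate the arithmetic.

```python
def fatality_risk(deaths_per_annum: float, exposed_population: int) -> float:
    """Annual fatality risk per exposed person: deaths per annum / exposed population."""
    if exposed_population <= 0:
        raise ValueError("exposed population must be positive")
    return deaths_per_annum / exposed_population


if __name__ == "__main__":
    # Hypothetical activity: 3 deaths per annum among 150,000 exposed people.
    print(f"{fatality_risk(3, 150_000):.1e} per person per annum")
    # Hypothetical small group of 200: a single death swings the estimate enormously,
    # which is why the measure needs a reasonably large exposed population.
    for deaths in (0, 1, 2):
        print(deaths, fatality_risk(deaths, 200))
```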

16.3.2 Lost Time Injury Frequency Rates
There have been attempts to reduce injury statistics to single numbers to compare the performance of organisations. This does not seem to have been hugely successful. Consider, for example, the use of a measure called the Lost Time Injury Frequency Rate (LTIFR) for OH & S performance, described in the Australian Standard AS 1885.1-1990. The LTIFR is the number of incidents in a given period in which more than a day was lost, per million hours worked. The Lost Time Injury Rate (LTIR) is defined as the number of lost time injuries per 100 workers. Even if actual days lost (per million hours worked) is used as a measure of risk, care needs to be taken to distinguish a “cash flow” view from an “accruals” view. The figure below represents four work injuries that occurred over three years, each with a different duration as shown. The diagram indicates that there were three incidents in the year 2002/03, with the days lost shown in light grey hatching. Incident 2 was carried over from 2001/02, and incident 4 was carried over into 2003/04 and extended through the whole of that year and beyond.

Schematic of Four Injuries that Occurred over Three Years

A consequence of the “cash flow” approach is that a death is measured as a loss of one man-year, which is regarded as a ridiculously low value. Compare this with a debilitating back injury extending over several years (which would be brought to account each year) from which a complete recovery was made, something like incident 4. This means that companies that use a measure like the Lost Time Injury Frequency Rate, or any “cash flow” basis of risk accounting, would focus on high frequency, low severity events rather than on high severity (fatality), low frequency events, which are the primary focus of regulators and the courts.
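The rate definitions and the “cash flow” versus “accruals” distinction in 16.3.2 can be sketched together. This assumes an Australian July to June financial year (suggested by labels such as 2002/03) and counts lost days as calendar days. The injury dates, durations, worker numbers and hours are hypothetical, loosely following the four-injury schematic, and all function names are illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta


def ltifr(lost_time_injuries: int, hours_worked: float) -> float:
    """Lost time injuries per million hours worked."""
    return lost_time_injuries * 1_000_000 / hours_worked


def ltir(lost_time_injuries: int, workers: int) -> float:
    """Lost time injuries per 100 workers."""
    return lost_time_injuries * 100 / workers


def financial_year(d: date) -> str:
    """Assumed July-June financial year label, e.g. 1 Oct 2002 -> '2002/03'."""
    start = d.year if d.month >= 7 else d.year - 1
    return f"{start}/{str(start + 1)[-2:]}"


def cash_flow_basis(injuries: list[tuple[date, int]]) -> dict[str, int]:
    """Charge each lost (calendar) day to the financial year in which it falls."""
    totals: dict[str, int] = defaultdict(int)
    for start, days_lost in injuries:
        for offset in range(days_lost):
            totals[financial_year(start + timedelta(days=offset))] += 1
    return dict(totals)


def accruals_basis(injuries: list[tuple[date, int]]) -> dict[str, int]:
    """Charge all days lost to the financial year in which the injury occurred."""
    totals: dict[str, int] = defaultdict(int)
    for start, days_lost in injuries:
        totals[financial_year(start)] += days_lost
    return dict(totals)


# Hypothetical injuries: (date of injury, calendar days lost)
injuries = [
    (date(2001, 9, 1), 60),    # 1: wholly within 2001/02
    (date(2002, 5, 1), 120),   # 2: carried over from 2001/02 into 2002/03
    (date(2002, 10, 1), 30),   # 3: wholly within 2002/03
    (date(2003, 3, 1), 500),   # 4: runs through 2003/04 and beyond
]

if __name__ == "__main__":
    hours = 250 * 1800  # hypothetical: 250 workers at 1,800 hours each
    print(f"LTIFR = {ltifr(3, hours):.2f}, LTIR = {ltir(3, 250):.2f}")
    print("Cash flow:", sorted(cash_flow_basis(injuries).items()))
    print("Accruals: ", sorted(accruals_basis(injuries).items()))
```

Both bases total the same 710 days; they differ only in which year carries them. On the cash flow basis 2003/04 carries 366 days from injury 4, whereas on the accruals basis all 500 days of that injury fall in 2002/03 and 2003/04 shows no injuries at all. Both rate measures count an injury once, however long it lasts.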


An alternative proposal is an accruals basis using days lost. In the case above, only two incidents actually occurred in the 2002/03 year (labelled 3 and 4). However, the days lost extend into 2004. The whole of this amount would be brought to account in the 2002/03 year, and 2003/04 would be deemed to have no injuries. Since an accruals basis of accounting is the one most organisations use, and the one the whole organisation is usually trained to understand, using a cash flow basis for injury measurement seems curious. A detailed discussion of this sort of problem, and of other difficulties associated with the use of existing injury indicators, is contained in the WorkSafe Australia (1994) documents entitled Positive Performance Indicators, Beyond Lost Time Injuries.

16.4 Information Structures
This section actually addresses a larger risk management domain than OH & S, but this seems to be the context in which it is most frequently raised.

16.4.1 Hazards (Vulnerabilities), Incidents and Risk
The relationship between hazards (vulnerabilities) and incidents requires clarification. From a control viewpoint, it is always better to focus on preventing hazards than on managing incidents. This can perhaps best be explained as follows.

Hi = particular or specific hazard (for i = 1 to n hazards)
{Hi} = set of all known hazards
Ij = particular or specific incident (for j = 1 to m incidents)
{Ij} = set of all known incidents

n is much larger than m, hence {Hi} ≠ {Ij}. For every Ij there is a particular Hi, but not vice versa.

Figure axes: Frequency and Severity, with incidents (Ij) shown among the hazards (Hi).

Relationship of Incidents to Hazards or Vulnerabilities

A pictorial representation on a risk curve is shown in the figure above. Note that the focus of risk management should be on the set of all hazards. The set of all possible incidents is in fact identical to the set of all hazards, except that over a particular time period most have a null rather than an actual outcome.


For example, if a company were exposed to n hazards in a defined period, say a year, then the set of hazards {Hi} would be represented as {H1, H2, H3, H4, H5, ...}. If we then look at data for a particular year there might have been only three actual incidents. The full set could be represented as {Ij} = {I1, I2, I3, I4, I5, ...}. However, since only three of the incidents do not have a null outcome, it would be better represented as {Ij} = {I2, I4, I10}. The risk associated with each hazard and incident is the product of likelihood and severity; that is, how likely it is to occur and, for example, how many days are lost if it does. In the case of an incident the ‘likelihood’ of occurrence will be 1 for an incident that has occurred and 0 for one that hasn’t (or the number of occurrences, as for I4, which occurred twice). The table below sets this out using some hypothetical figures. Note that the null incidents are also shown.

Each group of columns gives Likelihood, Severity and Risk (Risk = Likelihood × Severity).

HAZARDS | INCIDENTS AND OCCURRENCES | CLAIMS | JUDICIAL PROCEEDINGS
H1 0.1 2 0.2 | I1 0 2 0 | C1 0 2 0 | J1 0 2 0
H2 0.2 3 0.6 | I2 1 3 3 | C2 1 3 3 | J2 0 3 0
H3 0.05 50 2.5 | I3 0 50 0 | C3 0 50 0 | J3 0 50 0
H4 0.3 2 0.6 | I4 2 2 4 | C4 1 2 2 | J4 1 2 2
H5 0.65 13 8.45 | I5 0 13 0 | C5 0 13 0 | J5 0 13 0
H6 0.025 260 6.5 | I6 0 260 0 | C6 0 260 0 | J6 0 260 0
H7 0.001 1500 1.5 | I7 0 1500 0 | C7 0 1500 0 | J7 0 1500 0
H8 0.45 0.5 0.23 | I8 0 0.5 0 | C8 0 0.5 0 | J8 0 0.5 0
H9 0.01 6 0.06 | I9 0 6 0 | C9 0 6 0 | J9 0 6 0
H10 0.5 60 30 | I10 1 45 45 | C10 1 45 45 | J10 1 45 45
H11 0.005 100 0.5 | I11 0 100 0 | C11 0 100 0 | J11 0 100 0
: | : | : | :
Hi 0.003 1 0 | Ij 0 0 0 | Cj 0 0 0 | Jj 0 0 0
∑Hi = 51.1 | ∑Ij = 52 | ∑Cj = 50 | ∑Jj = 47

Event Horizon: Pre-Event Control (hazards) <<< | >>> Post-Event Management (incidents, claims, judicial proceedings)

Concept Hazard (or Vulnerability) Register
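As a check on the arithmetic, the column totals of the register can be recomputed from the rows. This is a minimal Python sketch, assuming severity is read as days lost and that the generic last row (Hi: likelihood 0.003, severity 1) is included; the dictionary layout is purely illustrative.

```python
# Recompute the totals of the concept hazard register (hypothetical figures from the table).
# Hazards: (annual likelihood, severity in days lost).
HAZARDS = {
    "H1": (0.1, 2), "H2": (0.2, 3), "H3": (0.05, 50), "H4": (0.3, 2),
    "H5": (0.65, 13), "H6": (0.025, 260), "H7": (0.001, 1500),
    "H8": (0.45, 0.5), "H9": (0.01, 6), "H10": (0.5, 60),
    "H11": (0.005, 100), "Hi": (0.003, 1),  # Hi: the generic last row of the table
}

# Non-null incidents only: (number of occurrences, severity in days lost).
# I4 occurred twice; I10 cost 45 days rather than the 60 expected for H10.
INCIDENTS = {"I2": (1, 3), "I4": (2, 2), "I10": (1, 45)}


def risk(likelihood, severity):
    """Risk as the product of likelihood and severity."""
    return likelihood * severity


hazard_total = sum(risk(p, s) for p, s in HAZARDS.values())
incident_total = sum(risk(n, s) for n, s in INCIDENTS.values())

print(f"Sum of hazard risks:   {hazard_total:.1f} days")   # expected (probabilised) loss
print(f"Sum of incident risks: {incident_total:.1f} days")  # actual loss in the period
```

The two totals reproduce the 51.1 and 52 days shown at the foot of the table.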

In this particular example the total risk due to the hazards is 51.1, which represents a theoretical loss of about fifty-one days. The incidents that were recorded show that fifty-two days were lost, although only three of the potential hazards (H2, H4 and H10) actually gave rise to incidents. In fact, if there is a statistically large enough number of hazards, the sum of the probabilised outcomes of the hazard set should approximately equal the sum of the actual incidents experienced. This means that with a large amount of data over a long period of time it is possible to determine the probable risk loss, based on the following formula:

$$\sum_{i} \mathrm{Risk}\{H_i\} \approx \sum_{j} \mathrm{Risk}\{I_j\}$$

The focus is then on reducing the probable risk amount, which in turn will reduce the actual risk loss due to incidents occurring.
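The claim that the hazard set predicts actual losses only over a long period can be illustrated by simulation. This is a sketch under an assumption not made in the text: each hazard occurs independently, at most once per year, with probability equal to its likelihood. A single year's loss swings widely, but the long-run average approaches the sum of the hazard risks.

```python
import random

# (likelihood, severity in days lost) pairs from the concept hazard register.
HAZARDS = [
    (0.1, 2), (0.2, 3), (0.05, 50), (0.3, 2), (0.65, 13), (0.025, 260),
    (0.001, 1500), (0.45, 0.5), (0.01, 6), (0.5, 60), (0.005, 100), (0.003, 1),
]


def simulate_average_loss(years, seed=42):
    """Average annual days lost, assuming each hazard independently occurs
    at most once per year with probability equal to its likelihood."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(years):
        for likelihood, severity in HAZARDS:
            if rng.random() < likelihood:  # hazard realised as an incident this year
                total += severity
    return total / years


expected = sum(p * s for p, s in HAZARDS)  # sum of Risk{Hi}
for years in (1, 10, 1_000, 50_000):
    print(f"{years:>6} years: average loss {simulate_average_loss(years):7.1f} days"
          f" (hazard register predicts {expected:.1f})")
```

Rare, severe hazards such as H7 dominate the year-to-year variation, which is why a short record of incidents is a poor guide to the underlying risk.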


16.4.2 Coordinated Information

To ensure that information regarding losses, incidents, near misses and control system failures is effectively recorded, a coordinated risk information system needs to be available. This may be part of other information systems, but its definition needs to be developed independently so that it supports the predicted credible loss scenarios (especially legal and insurance details) and identifies any emerging, unpredicted hazards or vulnerabilities in a timely manner. In terms of a strategic risk management control system, names need to be given to the different parts so that it is clear to everyone what is being discussed. There are several possible good solutions to the naming issues, so the principle of “Ockham’s Razor”¹ has been applied. Broadly, this means choosing the simplest answer unless a reason to select a more complex one is discovered. In information terms for risk management this can mean:

[Figure: Hazard (Risk) Information Framework: along a time axis, hazards lead to incidents (control system failures; loss of control, i.e. near misses with no loss) and losses (death, injury, medical cost, damage, statutory breach, courts), including claims (insurable losses); risk control and risk management efforts act on these and feed a Risk Management Information System]

There can be discussion about the desirability of including control system failures in incidents, especially if other parallel control systems were in place that prevented the loss of control, so that only a near miss occurred, or if the hazard did not arise while the control system was not operational. The authors believe that all control system failures ought to be recorded, as a significant increase in them is indicative of the health, or otherwise, of the control system. In cause-consequence terms, this means that incidents are all those items shown in the larger shaded area below.

[Figure: Concept Cause-Consequence Diagram for Information Framework: hazards, control system failure, loss of control (near miss or loss), with incidents shown as the larger shaded area]

In practice, the documentation process outlined in the following figure is needed.

¹ Ockham's Razor. The usual formulation of the principle of ontological economy attributed to William of Ockham is: Entia non sunt multiplicanda praeter necessitatem, or “Entities are not to be multiplied beyond necessity”.


[Figure: Strategic Information System: an incident pro forma (date, event type, damage) and lookup tables (event: exchange, collision, dangerous goods, OHS&E; location; treasury; production; shipping) feed co-ordination, summary, analysis and recommendation; cause-consequence models (collision model, fire model, etc.) incorporating energy-damage and time-sequence analysis concepts (hazard, control, loss, null); a vulnerability register and loss calculator; outputs include reporting by region per period, feedback by event type, and review by period plotted as likelihood (p.a.) against consequence ('000 $)]

A key element is the need for co-ordinator(s) to assess the significance of each incident. For example, the authors have noted many cases where an incident such as a broken rail is given the same rating irrespective of its location, be it a remote siding or a busy main line carrying many high-speed passenger trains. Obviously, a sudden increase in main line breaks is of considerably greater concern than a similar increase on rarely used sidings. Obtaining this understanding requires a co-ordination review that reclassifies the risk associated with each event on the basis of current operations. Then, and only then, can a review by period have meaning.

16.4.3 An Integrated Concept of Risk & Reliability Information Management

The figure below describes an understanding of how the different processes and techniques described in this text fit within a large organisation, and how the information flows occur.

[Figure: An Integrated Concept of Risk & Reliability Information Management. Dimensions: strategic to tactical (Board and CEO policy, top down; operations & maintenance, bottom up) and pre-event, event, post-event. Elements: vulnerability analysis, SWOT analysis, underwriting assessment, availability assessment; QRA, Hazops, RCM, Job Safety Analysis (JSA), detectability, reliability, maintainability, cause-consequence modelling etc.; crisis management, fire fighting, first aid; losses, incidents & breakdowns; judicial actions, insurance payments; linked by a loop of control reporting, co-ordination, review and feedback]

Interestingly, the process of risk-related information management does seem to need the loop shown in the circle above. Note that this does not exclude one-off studies across any of the boundaries, which can be done at any time.
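The co-ordination review described in 16.4.2 (rating a broken rail according to where it occurs before reviewing by period) can be sketched in a few lines of Python. The event type, the location weights and the `Incident` record below are hypothetical, chosen only to show how reclassifying by location changes what a review by period reports.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical weights for illustration only: the same event type is rated
# according to where it happens and what runs over it.
EVENT_SEVERITY = {"broken rail": 1.0}
LOCATION_WEIGHT = {"siding": 1.0, "main line": 100.0}


@dataclass(frozen=True)
class Incident:
    period: str    # reporting period, e.g. "Q1"
    event: str     # event type from the incident pro forma
    location: str  # operating context used by the co-ordination review


def review_by_period(incidents):
    """Return raw incident counts and location-weighted significance per period."""
    counts = defaultdict(int)
    significance = defaultdict(float)
    for inc in incidents:
        counts[inc.period] += 1
        significance[inc.period] += EVENT_SEVERITY[inc.event] * LOCATION_WEIGHT[inc.location]
    return dict(counts), dict(significance)


log = (
    [Incident("Q1", "broken rail", "siding")] * 4
    + [Incident("Q2", "broken rail", "main line")] * 2
)
counts, significance = review_by_period(log)
print(counts)        # raw counts fall from Q1 to Q2
print(significance)  # but the reclassified significance rises sharply
```

A raw count suggests things improved from Q1 to Q2; the reclassified view shows the opposite, which is the point the text makes about reviews by period.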


16.5 Audit & Safety Management Systems

There has been a continuing desire to develop systems that can provide advice as to the overall effectiveness of risk control systems. These have manifested themselves in various auditing and scoring systems.

16.5.1 SafetyMAP

The Victorian WorkCover Authority has developed a health and safety audit system whose purpose is to enable an organisation to:

a) measure the performance of its health and safety program
b) implement a cycle of continual improvement
c) introduce recognised benchmarking standards for health and safety
d) gain recognition for its health and safety management standards.

It has five elements:

1. Health and safety policy
2. Planning
3. Implementation
4. Measurement and evaluation
5. Management review

Initial Level Certification requires an organisation to satisfy the requirements of 82 SafetyMAP audit criteria. The Victorian WorkCover Authority states that these criteria have been selected as encompassing the building blocks for an effective, integrated health and safety management system. Advanced Level Certification requires all 125 applicable SafetyMAP audit criteria to be in place.

Interestingly, this system is based on the concept of ensuring that the process (the presence and effectiveness of management systems) is sound, and that the proper results will therefore follow. However, as the Victorian WorkCover Authority notes:

“However conformance to SafetyMAP criteria, whether recognised by formal certification or other means, does not assure compliance with statutory obligations nor does it preclude any action by a statutory body.”

The danger with such a system is that OH&S resources become focussed on preparing documentation rather than on action and prevention.


16.5.2 ISRS (International Safety Rating System)

This has been developed in various guises in different parts of the world. The version described here is that of Det Norske Veritas (UK, 1996), who appear to have purchased the program of Frank E Bird Jr's (1976) Atlanta-based International Loss Control Institute. The program is based on several key propositions:

1. Safety is good for business and profits.
2. Proactively managing loss is much better than reacting to events.
3. Losses are ultimately due to a lack of effective management systems.
4. An audit system can indicate the health of the proactive loss control management systems.

The following figure shows the time sequence model adopted.

[Figure: The DNV Loss Causation Model, a time sequence: Lack of Control (1. Inadequate Programme; 2. Inadequate Programme Standards; 3. Inadequate Compliance Standards) → Basic Causes (Personal Factors; Job Factors) → Immediate Causes (Substandard Acts and Conditions) → Incident (Contact with Energy or Substance) → Loss (People, Property, Process, Environment, Quality)]

The key program elements and points score/weighting are given in the table below. Recognition levels are scored out of 10.

Program element                             Points
1. Leadership and Administration              1310
2. Leadership training                         700
3. Planned inspections and maintenance         690
4. Critical task analysis and maintenance      650
5. Accident/incident investigation             605
6. Task observation                            450
7. Emergency preparedness                      700
8. Rules and work permits                      615
9. Accident/incident analysis                  550
10. Knowledge and skill training               700
11. Personal protective equipment              380
12. Health and hygiene control                 700
13. System evaluation                          700
14. Engineering and change management          670
15. Personal communications                    490
16. Group communications                       450
17. General promotion                          380
18. Hiring and placement                       405
19. Materials and services management          615
20. Off-the-job safety                         240

ISRS Program Elements

Like the other audit systems, scoring a perfect 10 out of 10 does not mean that all legal duties have been met.


16.5.3 The DuPont Safety Training Observation Program (STOP) System

STOP was developed by DuPont to provide a behaviour-based observation program that may be used to improve safety in any organisation. The system is designed to be used by management at all levels. It is not really an audit system as such, although the authors have observed it being used in this capacity. STOP is based on a series of Safety Principles, noted below:

• All injuries and occupational illnesses can be prevented.
• Safety is everyone's responsibility.
• Management is directly accountable for preventing injuries and occupational illnesses.
• Safety is a condition of employment.
• Training is an essential element for safe workplaces.
• Safety audits must be conducted.
• Safe work practices should be reinforced and all unsafe acts and unsafe conditions must be corrected promptly.
• It is essential to investigate injuries and occupational illnesses, as well as incidents with the potential for injury.
• Safety off the job is an important element of the overall safety effort.
• Preventing injuries and occupational illnesses is good business.
• People are the most critical element in the success of a safety and health program.

[Figure: Safety Observation Cycle (STOP... for Safety): Decide → Stop → Observe → Act → Report]

The procedure to be used can be seen in the Safety Observation Cycle in the figure above. This shows the path of action, which starts with a manager deciding to observe an employee. The manager must then stop and watch the employee carry out their job, particularly noting how the employee does or does not adhere to safe working practices. The manager then needs to approach the employee and discuss their working practices, reinforcing the safe ones as well as addressing the unsafe ones. Finally, the manager needs to report the situation appropriately to their superiors. In an Australian cultural context, the system has had varying degrees of success attributed to it. Certainly, the authors have noted that if it becomes known as the 'dob-a-mate' technique, it seems to become a cultural anathema and fails. Conversely, if it becomes a 'look-after-your-mate' process, it seems to have a good chance of being effective.


16.5.4 NSCA 5-Star Health & Safety Management System

The NSCA 5-Star Health & Safety Management System was developed by the National Safety Council of Australia to identify the elements of a complete OH&S program. It also provides organisations with a framework for improvement and quantitative measurement of OHS performance. The system uses 60 Key Elements considered to be a comprehensive and exhaustive set of risk management components for any organisation in any aspect of business. These 60 Key Elements are grouped into five categories:

1. Policy, Organisation & Program Management
2. Management of Health & Safety Risks
3. Control of Specific Work Risks
4. Working Environment
5. Emergency Preparedness & Management

Star gradings range from zero to five. Grading audits are conducted within an organisation annually and assessed against the Key Elements. A star grading is awarded after each annual grading audit to record the organisation's standard of achievement in implementing best practice levels of risk management. A One Star grading means that the organisation's OHS system is better than approximately 50% of other organisations, while a Five Star grading means that the organisation is in the top 2-5%. The Key Element Score (KES) and the Injury & Illness Statistics Index (IISI) are then used to assess the current state of an organisation and allocate a Star Grading.

Star Grading   KES (%)
0 Star         0-49
1 Star         50-59
2 Star         60-69
3 Star         70-79
4 Star         80-89
5 Star         90-100
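The KES bands above amount to a simple lookup. This is a minimal sketch that reads each band as a lower threshold (so a fractional score such as 59.5 counts as 1 Star, an assumption the table does not settle) and ignores the Injury & Illness Statistics Index, which the text says is also used but whose role is not specified here.

```python
def star_grading(kes_percent: float) -> int:
    """Map a Key Element Score (%) to a star grading using the KES bands.

    Assumptions: bands are read as lower thresholds, and the Injury &
    Illness Statistics Index (IISI) is not taken into account.
    """
    if not 0 <= kes_percent <= 100:
        raise ValueError("KES must be between 0 and 100 per cent")
    for lower_bound, stars in ((90, 5), (80, 4), (70, 3), (60, 2), (50, 1)):
        if kes_percent >= lower_bound:
            return stars
    return 0


for kes in (45, 50, 68, 85, 93):
    print(f"KES {kes}% -> {star_grading(kes)} Star")
```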

The benefits of using the NSCA 5-Star Health & Safety Management System are described as:

1. A better measurement of performance
2. Independent assessment
3. International recognition
4. Improved management skills and communication
5. Improved employee involvement

The NSCA 5-Star System states the following in terms of legal obligations: "The organisation's standards of health, safety and environmental risk management are normally based on continuous improvement above the legal statutory minimum obligations up to international 'best practice'. Where national/international standards are incomplete, or unacceptably low or non existent, NSCA 5-Star System (Version 2) assists an organisation define its own standards based on its corporate structure."


REFERENCES

Bird, Frank E, Jr (1974). Management Guide to Loss Control. International Loss Control Institute, Georgia, USA.
DuPont Safety and Environmental Management Services. DuPont STOP for Safety System (Supervision). © 1986, revised 1992 and 1995.
IOHA (2002). Report of the International Control Banding Workshop. International Occupational Hygiene Association, London.
National Safety Council of Australia (1995). 5-Star Health and Safety Management System, Version 2.
NOHSC (1995). National Standards for Atmospheric Contaminants in the Occupational Environment, NOHSC:3008. National Occupational Health and Safety Commission, Canberra.
Ramachandran, G (1995). Value of Human Life. In Society of Fire Protection Engineers Handbook (1995), Section 5, Chapter 8. Society of Fire Protection Engineers, Boston.
Robens, Lord (1972). Committee on Health and Safety at Work, Report. HMSO, London.
Standards Australia/Standards New Zealand (1999). Risk Management. Australian/New Zealand Standard AS/NZS 4360:1999.
WorkSafe Australia (National Health and Safety Commission) (1994). Positive Performance Indicators, Beyond Lost Time Injuries, Part 1 - Issues. ISBN 0 644 35266 3. © The Commonwealth of Australia.
WorkSafe Australia (National Health and Safety Commission) (1994). Positive Performance Indicators, Beyond Lost Time Injuries, Part 2 - Practical Approaches. ISBN 0 644 35267 1. © The Commonwealth of Australia.
Victorian Occupational Health & Safety Act (1985). Act No. 10190/1985 (Reprint No. 5, November 1998).
Victorian WorkCover Authority (2002). A Guide to Occupational Health and Safety Management Systems. SafetyMAP (4th Edition).

READING

The NOHSC web site (http://www.nohsc.gov.au/) provides a lot of useful information, as well as links to all the State/Territory Authority web sites.


17. Financial Risk

The good news is that risk has speculative as well as negative aspects: it can offer business opportunities. The more successful companies become at identifying and managing risk, the bigger the comparative advantage they gain.

17.1 Terms

Banks and large-scale financial institutions have only comparatively recently started to focus on risk-adjusted return measures on capital rather than purely a return on assets or book equity (Smithson 1997). In doing so, they appear to be creating a new lexicon. Terms currently used include:

VAR        Value at Risk
EAR        Earnings at Risk
Raroc      Risk adjusted return on capital
Rorac      Return on risk adjusted capital
Rarorac    Risk adjusted return on risk-adjusted capital
Economic Capital = credit risk capital + market risk capital + operational risk capital.

These are obviously designed to take the costs-of-risk into account in terms of the financial institution's business. In part, they try to answer the question: “How much has been earned for the risks that have been taken?” The problem of advisors and managers taking extreme chances with someone else's money is always real. If everything goes well, everyone profits from such extreme risk taking (blue sky), but if matters sour it is the shareholders and investors who lose money, not the advisors or managers. Based on a perusal of US magazines such as Risk and Financial Derivatives and Risk Management, there are remarkable pockets of extraordinarily sophisticated statistical modelling occurring. But whether this is truly cost effective is difficult for an outsider to know. There are editorials reporting that some managers feel “…that using derivatives destroys shareholder value through the costs of dealing; monitoring the transactions; and management time.” (Cooper, 1997). That is, the costs of managing risk can exceed the reduction in the costs-of-risk. Further, such activities impose risk on others.

17.2 Hedge Funds

In September 1998 the world came “within a whisker of meltdown” (David Thomas 1999). It arose because Long Term Capital Management (LTCM), a US-based hedge fund, went to the wall. US hedge funds reportedly manage up to US$1 trillion, most of it (up to 90%) borrowed. They deploy this in various markets, acquiring around $10 trillion worth of exposures. To put this in perspective, the GNP of the US is reported to be about $7 trillion. These hedge funds are secretive things. Provided there are fewer than 100 investors, they do not have to report to the US government who these investors are, how much money they raise or how they invest it (Browning, 1998). Basically, it seems that despite having Nobel Prize-winning economists on staff (or perhaps because of it), LTCM punted and lost. By September 1998, it had lost about 90% of its capital.
However, rather than leave the company subject to market forces and let it go belly up, the US financial community and Wall Street authorities, led by the US Federal Reserve, provided enough capital (US$3.65 billion) for the hedge fund to be salvaged. The idea was to prevent a domino effect that might fatally destabilise a weakened global market. This approach contrasts sharply with the approach taken by the IMF and the US with Asian and Latin American debtor countries. Essentially, the profits associated with the hedge funds are retained by the funds, but the risks associated with their operation are shared by the global community. Obviously, this has not gone unnoticed.


For example, the Australian Treasurer, Peter Costello, said in a speech in March 1999 that the global overhaul in finance had to, amongst other things, address the need for better supervision of highly leveraged international investors such as hedge funds. To quote the Treasurer, “Vested interests in the international financial sector who benefit from the international community's sharing of their risks (but not their profits) will resist the necessary evolution in the international financial architecture”. As the journalist Alan Wood noted at the time (The Australian), for vested interests, read the interests of Wall Street. This may not have changed much in recent times. Tim Colebatch (The Age), reviewing the Treasurer's presentation to the Asia Pacific Economic Summit (Sept 2000), notes the Treasurer's statement that there is still no agreement on reforms such as requiring hedge funds in capital markets to disclose their operations.

As many articles point out (Smithson 1997), the major challenge will be in accounting. The dominant methodology begins with book-keeping entries and subjects these to a series of adjustments governed by precise rules. This looks backward at a stable past. Looking to an uncertain future (the essence of risk), assessing the performance of the market, and the market valuations of derivatives and of the risks they are used to manage, is very difficult. And how should such general uncertainty in accounts be portrayed? How can all this be made transparent to investors and customers?

17.3 Utility and Risk

The financial economics literature always starts by discussing the concept of utility. In common parlance, an individual's utility is the gain or usefulness they obtain from a certain course of action, relative to its cost, which in real life is often not quantified, or not even quantifiable at all. Of course, in finance things are quantified, and so certain simplifying assumptions must be made.
These assumptions are extremely important to bear in mind, since the conclusions one ultimately reaches are strongly influenced by them. Individual preferences are very diverse, but individuals are often characterised as being either risk averse or risk takers. Those who gamble, say at a casino or in Tattslotto, are willing to lose a small (or often cumulatively not so small) amount of money in the hope of making a large gain. Rationally, they are playing what is called in the game theory literature a negative-sum game. They are certain to lose in the long run; otherwise the casino could not pay all its operational costs and return a profit to its owners. The risk function of gamblers is not symmetric, since they accept a small loss in the hope of a large gain. But most models assume that financial risk is symmetric.

17.4 Models

Markets go up and down. So risk in market terms can be adverse (pure risk) or beneficial (speculative risk). From observation and experience it would seem that most investors have a greater preference for not losing money than for gaining it; that is, given equal probabilities, risk aversion is the norm. However, in finance risk is normally assumed to be symmetric. This is not absolutely true, but by making such an assumption many of the tools of statistics become available, most notably the normal distribution, which is symmetric about its mean value.
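To make the gambling example in 17.3 concrete, the sketch below computes the payoff distribution of a single-number bet on a European (single-zero) roulette wheel: 37 pockets, with a winning bet paying 35 to 1. The choice of game is an illustrative assumption, not one made in the text. The negative mean shows the negative-sum game; the large positive skewness shows the asymmetric "small loss, large gain" shape that the mean and standard deviation alone do not describe.

```python
import math
from fractions import Fraction

# European roulette, straight-up bet on one number: win 35 per unit staked
# with probability 1/37, otherwise lose the unit staked.
P_WIN = Fraction(1, 37)
outcomes = [(35, P_WIN), (-1, 1 - P_WIN)]  # (payoff, probability)

mean = sum(x * p for x, p in outcomes)  # exact expected value per unit staked
variance = sum((x - mean) ** 2 * p for x, p in outcomes)
sd = math.sqrt(variance)
skewness = float(sum((x - mean) ** 3 * p for x, p in outcomes)) / sd ** 3

print(f"Expected value per $1 bet: {mean} (about {float(mean):.4f})")
print(f"Standard deviation: {sd:.3f}, skewness: {skewness:.2f}")
```

Using `Fraction` keeps the expected value exact (-1/37), which makes the house edge explicit rather than hiding it in rounding.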

[Figure: distribution of rate of return; standard deviation deemed to equal risk; pure risk and speculative risk on either side of the mean]


So, for practical reasons, it is assumed that mean and standard deviation are the appropriate measures of return and risk respectively; that is, risk is assumed symmetric and investors are risk neutral. This enables some formal definitions:

i. Mean = 'average' = a measure of central tendency, or the average return on the asset.
ii. Standard deviation, or volatility. (The terms risk, volatility and standard deviation of returns tend to be used synonymously.)
iii. Distribution. For example, stock prices are log-normally distributed.

In the finance sector there is a wide range of uses to which the above principles can be put:

Life Insurance: matching of assets and liabilities, and solvency margins
General Insurance: business risk, catastrophes, re-insurance and claims reserving
Superannuation and Funds Management: asset allocation, returns to members, guaranteeing minimum returns
Banking: risk assessment, accumulations of risk, value at risk (VAR) across the business, derivatives.

The general approach to dealing with portfolios of assets or liabilities is the same. Since financial market returns ultimately depend upon the economy, events will tend to affect different assets in often similar ways. For example, if interest rates rise, a bank will find problems in all aspects of its book: real estate, business or other loans. Given that returns to assets are clearly not independent, statistical principles are again used. This is covered in further detail in section 17.5 Market Risk Mathematics.

17.4.1 Diversification: Systematic and Unsystematic Risk

Within an asset class most securities are highly correlated. Hence there is a limit to the reduction in variance that can be achieved in practice. The risk of being in the market per se cannot be eliminated (indeed it is the source of the reward). The index for an asset class and its standard deviation (= risk) is effectively the minimum risk for that asset class. In practice, this can be achieved with a relatively small number of securities: 25-30 is usually more than adequate, and even 10 may not be far off. The element of risk that can be eliminated by diversification is called diversifiable or unsystematic risk. The component that remains (the core risk of being in the market) is called systematic risk.
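Why a few dozen securities are usually enough can be seen from a simplified case: N securities held in equal proportions (1/N each), all with the same volatility σ and the same pairwise correlation ρ. These simplifying assumptions are ours, not the text's. The portfolio variance is then σ²/N + (1 - 1/N)ρσ², so the unsystematic part shrinks like 1/N while ρσ² (the systematic floor) remains.

```python
import math


def equal_weight_portfolio_sd(n, sigma, rho):
    """Standard deviation of an equally weighted portfolio of n securities,
    each with volatility sigma and the same pairwise correlation rho.

    With weight 1/n each: the n variance cells contribute sigma^2 / n and the
    n(n - 1) covariance cells contribute (1 - 1/n) * rho * sigma^2.
    """
    variance = sigma**2 / n + (1 - 1 / n) * rho * sigma**2
    return math.sqrt(variance)


sigma, rho = 0.20, 0.30  # hypothetical: 20% volatility, correlation 0.3
systematic = math.sqrt(rho) * sigma  # limit as n grows: the undiversifiable risk

for n in (1, 5, 10, 25, 30, 100):
    sd = equal_weight_portfolio_sd(n, sigma, rho)
    print(f"{n:>3} securities: sd {sd:.4f} ({sd / systematic:.2f} x systematic floor)")
print(f"systematic floor: {systematic:.4f}")
```

With these assumed figures, the portfolio standard deviation is within about 11% of the floor at 10 securities and within about 4% at 30, in line with the rule of thumb above.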
17.4.2 Asset Allocation

Securities can be categorised into asset classes (in an intuitive sense) that have like characteristics, for example, fixed interest securities, Australian equities, international equities and so on. (There are, of course, securities that are hybrid or intermediate in nature.) Indices are used to represent price movements in the asset classes as a whole. In Australia we use:

All Ordinaries Index - Australian Equities
Commonwealth Bank Bond Index (All Maturities) - Fixed Interest
Morgan Stanley Capital International Index - International Equities

The above principles are used to build suitable portfolios of assets: by knowing the correlations between asset classes, optimal portfolios can be built. That is, the asset mix that gives the best possible return for a given level of risk can be determined. A plot of these optimal points is known as the efficient frontier.

17.4.3 Value at Risk (VAR)

The risk in any business can be assessed in just the same way, not just in funds management. Thus, by breaking a bank down into its component assets and liabilities, one can derive a single estimate of how much the firm could lose due to the price volatility of the instruments it holds. This methodology, introduced by J P Morgan, requires just such a system of correlations and matrices as described above. (There are, of course, competing methodologies of risk assessment, described in the Risk magazine special supplement.)
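A variance-covariance (parametric) VAR can be sketched in a few lines. This is a simplified illustration under stated assumptions (normally distributed returns with zero mean over the horizon the volatilities refer to, and hypothetical positions, volatilities and correlation), not J P Morgan's full methodology. It uses `statistics.NormalDist` from the Python standard library for the normal quantile.

```python
import math
from statistics import NormalDist


def portfolio_sd(weights, vols, corr):
    """Portfolio standard deviation: sqrt of sum_i sum_j w_i w_j s_i s_j rho_ij."""
    n = len(weights)
    variance = sum(
        weights[i] * weights[j] * vols[i] * vols[j] * corr[i][j]
        for i in range(n)
        for j in range(n)
    )
    return math.sqrt(variance)


def parametric_var(value, weights, vols, corr, confidence=0.95):
    """Variance-covariance Value at Risk, assuming normally distributed returns
    with zero mean over the horizon that the volatilities refer to."""
    z = NormalDist().inv_cdf(confidence)  # about 1.645 at 95%
    return value * z * portfolio_sd(weights, vols, corr)


# Hypothetical book: $100m split 60/40 between equities and bonds,
# daily volatilities of 1.2% and 0.4%, correlation 0.2.
value, weights, vols = 100e6, [0.6, 0.4], [0.012, 0.004]
corr = [[1.0, 0.2], [0.2, 1.0]]

combined = parametric_var(value, weights, vols, corr)
standalone = sum(
    parametric_var(value * w, [1.0], [s], [[1.0]]) for w, s in zip(weights, vols)
)
print(f"1-day 95% VAR of the combined book: ${combined:,.0f}")
print(f"Sum of stand-alone VARs:            ${standalone:,.0f}")
```

Because the correlation is below 1, the combined VAR is smaller than the sum of the stand-alone figures; this diversification effect is exactly what the correlation matrix captures.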


17.4.4 Solvency Risk

Both general and life insurance companies need to maintain prudent levels of reserves to cope with fluctuations in the business. They set their solvency levels so as to be able to meet all eventualities to a certain level of probability. By analysing their assets and liabilities they can assess this particular measure of risk, because the resulting portfolio of assets and liabilities is assumed to follow a normal distribution.

17.4.5 Claims Reserving

The process of measuring outstanding liabilities is called claims reserving. A company needs to estimate current levels of profit, but leave behind sufficient reserves to meet obligations as they arise. These obligations may not arise for many years, as with diseases such as asbestosis. By putting together the risks from the separate lines of business, one can assess risk for the company as a whole.

17.5 Market Risk Mathematics

In finance, risk is normally assumed to be symmetric. In taking such a position, market risk analysts are defining risk as a simultaneous combination of pure and speculative risk. That is, the likelihood of loss is the same as the likelihood of gain, an interesting and perhaps optimistic assumption. It may not be true, but by making such an assumption many of the tools of statistics become available, most notably the normal distribution, which is symmetric about its mean value. So, for practical reasons, it is assumed that mean and standard deviation are the appropriate measures of return and risk respectively; that is, risk is assumed symmetric and investors are risk neutral. This enables some formal definitions.

[Figure: Standard Deviation Showing the Mean and Variance: average rate of return at the centre; standard deviation deemed to equal risk; pure risk and speculative risk on either side]

i. Mean = “average”:

$$\bar{r} = \sum_{i=1}^{n} r_i p_i$$

where $p_i = p(r_i)$ is the probability of occurrence of return $r_i$; this is a measure of central tendency, or the average return ($\bar{r}$) on the asset.

ii. Standard deviation:

$$S = \left[ \sum_{i=1}^{n} (r_i - \bar{r})^2 \, p_i \right]^{1/2}$$

is a measure of the risk of an investment, or its volatility. (The terms risk, volatility and standard deviation of returns tend to be used synonymously.) Also of use are the skewness and kurtosis (the standardised 3rd and 4th moments about the mean), which measure the symmetry of the distribution and its “peakedness” respectively. For the standard normal distribution, the mean, variance, skewness and kurtosis are 0, 1, 0 and 3 respectively.

iii. Distribution. For example, stock prices are log-normally distributed, that is:

$$\ln p_t \sim N(\mu, \sigma^2)$$

a normal distribution. Similarly:

$$\Delta\% \, p_t \sim N(\mu, \sigma^2)$$

which is the more usual way of expressing this fact (that is, $\ln \frac{p_t}{p_{t-1}} = \ln p_t - \ln p_{t-1}$).
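The definitions above translate directly into code for a discrete set of return scenarios. This is a minimal sketch; the three scenarios and their probabilities are hypothetical. Because they are symmetric about the mean, the skewness is zero, and the kurtosis of 2 shows a distribution less peaked than the normal's 3. The last lines show the log return of the form in iii.

```python
import math


def moments(returns, probs):
    """Mean, standard deviation, skewness and kurtosis of a discrete
    distribution of returns r_i with probabilities p_i."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(r * p for r, p in zip(returns, probs))
    sd = math.sqrt(sum((r - mean) ** 2 * p for r, p in zip(returns, probs)))
    skew = sum((r - mean) ** 3 * p for r, p in zip(returns, probs)) / sd**3
    kurt = sum((r - mean) ** 4 * p for r, p in zip(returns, probs)) / sd**4
    return mean, sd, skew, kurt


# Hypothetical scenarios: -10%, +5% or +20% with probabilities 0.25, 0.5 and 0.25.
mean, sd, skew, kurt = moments([-0.10, 0.05, 0.20], [0.25, 0.5, 0.25])
print(f"mean {mean:.3f}, sd {sd:.4f}, skewness {skew:.2f}, kurtosis {kurt:.2f}")

# Log return between two hypothetical prices: ln(p_t / p_{t-1}) = ln p_t - ln p_{t-1}.
p_prev, p_now = 10.00, 10.50
print(f"log return {math.log(p_now / p_prev):.4f} vs simple return {p_now / p_prev - 1:.4f}")
```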

In the finance sector there are a wide number of uses to which the above principles can be put. Since financial market returns are ultimately dependent upon the economy, events will tend to affect different assets in often similar ways. For example, if interest rates rise a bank will find problems in all aspects of its book; real estate, business or other loans. Given that returns to assets are clearly not independent, statistical principles are again used. Correlation and the Correlation Coefficient Given any two series { } { }X x x Y y yn n= =1 1, . . . . , . . . . it is of considerable interest to estimate any linkages between the two time series. A measure of this is the covariance, or the degree to which the series rise or fall together, and it is defined to be:

cov ( ) ( )( ) ( )( )( )X Yn

x x y y E X Yi i x, = − = − −∑1

µ µ

Note that : Var ( ) ( ) ( )X X X E X X= = − =cov , µ σ2 2

The correlation coefficient between two series is the standardised covariance, which takes a value between +1 (perfect correlation) and -1 (perfect inverse correlation):

$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$
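A minimal sketch of the two definitions, assuming two equal-length return series held as Python lists. It uses the $1/n$ form of the covariance given above; the function names are illustrative only.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Covariance in the 1/n form: (1/n) * sum of (x_i - x_bar)(y_i - y_bar)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    """rho = cov(X, Y) / (sigma_X * sigma_Y), using var(X) = cov(X, X)."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(corr(x, [2.0, 4.0, 6.0, 8.0]))  # series rise together: perfect correlation
    print(corr(x, [8.0, 6.0, 4.0, 2.0]))  # mirror image: perfect inverse correlation
```

Perfectly proportional series give $\rho = +1$ and a mirror-image series gives $\rho = -1$; real return series fall somewhere in between. Library routines such as Python's `statistics.covariance` use the $1/(n-1)$ sample form instead; the factor cancels in the correlation coefficient.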

12.5.1 The Two Variable Case

Since market risk analysts use the standard deviation as a measure of risk, the need arises to consider what happens when two securities or assets (X and Y) are combined in a simple portfolio. In general it is assumed that the securities are not independent and that their price changes will in fact be correlated. Securities are, after all, only financial claims on assets in the real economy. Thus:

$$
\begin{aligned}
\operatorname{var}(X+Y) &= E\left[(X+Y) - (\mu_X+\mu_Y)\right]^2\\
&= E\left[(X-\mu_X) + (Y-\mu_Y)\right]^2\\
&= E\left[(X-\mu_X)^2 + (Y-\mu_Y)^2 + 2(X-\mu_X)(Y-\mu_Y)\right]\\
&= E(X-\mu_X)^2 + E(Y-\mu_Y)^2 + 2E\left[(X-\mu_X)(Y-\mu_Y)\right]\\
&= \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X,Y)\\
&= \operatorname{var}(X) + \operatorname{var}(Y) + 2\rho\sqrt{\operatorname{var}(X)\operatorname{var}(Y)}\\
&= \sigma_X^2 + \sigma_Y^2 + 2\rho\sigma_X\sigma_Y
\end{aligned}
$$

In general:

$$\operatorname{var}(aX + bY) = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\rho\sigma_X\sigma_Y$$

and

$$E(aX + bY) = aE(X) + bE(Y)$$
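The practical consequence of the formula is that, for positive weights, the standard deviation of a combination is less than the weighted average of the individual standard deviations unless $\rho = +1$. The minimal Python sketch below evaluates the formula for a hypothetical 50/50 portfolio of assets with standard deviations of 20% and 10%; `pvar` and `pcov` are illustrative helpers using $1/n$ moments, under which the identity also holds exactly (up to rounding) for any made-up return series.

```python
import math

def combined_variance(a, b, sigma_x, sigma_y, rho):
    """var(aX + bY) = a^2 sigma_X^2 + b^2 sigma_Y^2 + 2ab rho sigma_X sigma_Y."""
    return (a * sigma_x) ** 2 + (b * sigma_y) ** 2 + 2 * a * b * rho * sigma_x * sigma_y

def pvar(xs):
    """Population (1/n) variance of a series."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pcov(xs, ys):
    """Population (1/n) covariance of two equal-length series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

if __name__ == "__main__":
    # Hypothetical 50/50 portfolio: sigma_X = 20%, sigma_Y = 10%.
    for rho in (1.0, 0.0, -1.0):
        sd = math.sqrt(combined_variance(0.5, 0.5, 0.20, 0.10, rho))
        print(f"rho = {rho:+.1f}: portfolio standard deviation = {sd:.2%}")
```

With these figures the portfolio standard deviation is 15% when $\rho = 1$ (simply the weighted average), about 11.2% when $\rho = 0$ and 5% when $\rho = -1$.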


12.5.2 Extension to n securities: Real Portfolios, Real Assets

Real-world portfolios consist of many securities (within an asset class) and indeed many potential asset classes (each with n securities). The above may be extended by noting:

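If $S_n = X_1 + \cdots + X_n = \sum_{i=1}^{n} X_i$, where the $X_i$ are not independent and $\operatorname{var}(X_i) = \sigma_i^2$,

then

$$\operatorname{var}(S_n) = \sum_{i=1}^{n} \sigma_i^2 + 2\sum_{i<j} \operatorname{cov}(X_i, X_j)$$

the second term being taken over all possible combinations $X_i, X_j$; and, noting that $\operatorname{cov}(X_i, X_j) = \operatorname{cov}(X_j, X_i)$, there are

$$\binom{n}{2} = \frac{n!}{2!\,(n-2)!} = \frac{n(n-1)}{2}$$

pairs.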

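This can be put in matrix form (the variance-covariance matrix):

$$
\begin{array}{c|cccc}
 & X_1 & X_2 & \cdots & X_n \\ \hline
X_1 & \sigma_1^2 & \sigma_{1,2} & \cdots & \sigma_{1,n} \\
X_2 & \sigma_{2,1} & \sigma_2^2 & \cdots & \sigma_{2,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X_n & \sigma_{n,1} & \sigma_{n,2} & \cdots & \sigma_n^2
\end{array}
$$

where $\sigma_{i,j} = \operatorname{cov}(X_i, X_j)$.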

Note: the leading diagonal contains the variances, and the matrix itself is symmetrical about the leading diagonal. The above process may then be used to combine assets in such a way as to achieve a minimum variance or risk, for example by choosing assets that have a low or negative correlation with each other. This process is known as mean-variance optimisation. User-friendly computer packages exist to take over the heavy computation.

In optimising the risk of a portfolio of securities or assets, it becomes apparent from the above matrix that the covariances far outnumber the variances ($N^2 - N$ covariance cells against $N$ variance cells). To simplify matters, let us assume we are dealing with a portfolio of N assets or securities, with the proportion invested in each asset being 1/N. Each variance cell in the matrix then contains $(1/N)^2 \times$ variance, and each covariance cell contains $(1/N)^2 \times$ covariance, so that:

$$
\begin{aligned}
\text{Portfolio variance} &= N \times (1/N)^2 \times \text{average variance} + (N^2 - N) \times (1/N)^2 \times \text{average covariance}\\
&= (1/N) \times \text{average variance} + (1 - 1/N) \times \text{average covariance}
\end{aligned}
$$

As N increases, the portfolio variance approaches the average covariance. If the average covariance is zero (for example, if the assets' returns are uncorrelated), the portfolio variance tends to zero as N grows, and in the limit all risk can be diversified away. However, this rarely occurs in a given market or industry, as assets or securities are affected by similar factors. The average covariance is therefore the lowest level of risk that can be achieved by diversification. This residual is the market risk.
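The equal-weight result can be checked numerically. The minimal Python sketch below, using hypothetical figures (an average variance of 0.04, i.e. a 20% standard deviation, and an average covariance of 0.01), computes the portfolio variance directly as the weighted sum over every cell of the variance-covariance matrix and compares it with the formula above; it also counts the $n(n-1)/2$ distinct pairs. The helper names are illustrative assumptions, and the matrix is held as nested lists to avoid any library dependency.

```python
import math

def portfolio_variance(weights, cov):
    """w' Sigma w: sum of w_i * w_j * cov[i][j] over every cell of the matrix."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j]
               for i in range(n) for j in range(n))

def uniform_cov_matrix(n, variance, covariance):
    """Hypothetical matrix: equal variances on the diagonal, equal covariances elsewhere."""
    return [[variance if i == j else covariance for j in range(n)] for i in range(n)]

def average_var_cov(cov):
    """Arithmetic averages of the diagonal (variance) and off-diagonal (covariance) cells."""
    n = len(cov)
    diag = [cov[i][i] for i in range(n)]
    off = [cov[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(diag) / n, (sum(off) / len(off) if off else 0.0)

def equal_weight_variance(n, avg_var, avg_cov):
    """(1/N) * average variance + (1 - 1/N) * average covariance."""
    return avg_var / n + (1 - 1 / n) * avg_cov

def n_pairs(n):
    """Number of distinct covariance pairs, n(n-1)/2."""
    return n * (n - 1) // 2

if __name__ == "__main__":
    for n in (1, 10, 100):
        m = uniform_cov_matrix(n, 0.04, 0.01)
        direct = portfolio_variance([1 / n] * n, m)
        print(n, n_pairs(n), round(direct, 6),
              round(equal_weight_variance(n, 0.04, 0.01), 6),
              f"sd = {math.sqrt(direct):.2%}")
```

With these figures the variance falls from 0.04 for one asset to 0.013 for ten and about 0.0103 for a hundred, approaching the average covariance of 0.01 but never falling below it. Because the formula uses arithmetic averages of the diagonal and off-diagonal cells, it holds exactly for equal weights even when the individual variances and covariances differ.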




18.0 Security

The international reach and severity of contemporary terrorism, the increasingly sophisticated modus operandi of much modern crime (especially “white collar” crime), and the way in which criminal networks are globalising have raised the importance of security within risk management and good governance.

Security is obviously more relevant to some enterprises than others. Generally the most relevant factors in assessing the vulnerability of companies to terrorist and/or criminal threats are the location, national identity and political profile of companies, together with the nature of their operations and products. However, few if any enterprises are immune from security risk of some sort. Even companies that are not the direct targets of terrorist or criminal intentions can be indirectly affected by attacks on others, principally through the way threat environments raise costs and affect stock and product markets. For example, the tourist industry has been directly affected by attacks on hotels and resorts, but it has also been indirectly affected by attacks on the airlines on which the tourist market depends. The cost of exporting certain goods to US markets has been affected by delays and costs caused by stringent new border-crossing customs requirements.

Public infrastructure management, in particular, is currently beset with the need to reassess security in the light of new terrorist threats. Meeting the costs of sometimes substantial enhancements to security affects the user-payers as well as the owners of infrastructure.

One or more of the new security threats is affecting businesses across the board. Threats range from mega-corporate bankruptcies resulting from management-auditor malfeasance, to industrial espionage, electronic and credit card fraud, computer hacking and viruses, and petty vandalism.
Whether directly attacked or not, public as well as corporate enterprises need to exercise security cognisance and apply the appropriate type and degree of security risk management in regard to this widening range of threats.

18.1 Security and Risk Management

The security function is required to cope with aspects of risk that differentiate it from other risk management functions. The chief of these is that security threats spring from deliberate intention rather than from accidental, natural, or dysfunctional systemic causes. Persons, not systems, components or acts of God, create security threats. Persons are not only capable of acts of ill will, but also have intelligence, which enables them to discern what protective systems are in place and devise ways to defeat them. Consequently, the first priority and unavoidable task in the security process is to assess the threat. Does a threat actually exist? Do any politically or criminally motivated actions pose a significant risk to the enterprise in question? If so, what are the likely methods of attack?

Once the threat assessment is made, most of the regular processes and techniques of risk management come into play. The key steps after the threat assessment are common to both security and general risk management functions, and most if not all of them will, or should, have been performed in the course of previous risk management. Those assets vital to the conduct and success of the enterprise will already have been identified. The vulnerability of systems to failure due to explosion, fire, flood, mis-operation and loss will also have been identified, as will resilience, appropriate crisis management, recovery plans, business impact costs, and so on. The cause of damaging events may be different in the security context, but the effects and responses are mainly replicated in the other areas of risk management.
In most business and other organisations, security is separated, and often isolated, as a management function. This occurs mainly for reasons of confidentiality. Nevertheless, security remains in essence a risk management function, requiring coordination and integration into the overall management system, and is a key consideration in good governance.


18.2 Security Terms

Because security personnel use certain terms differently from other risk management professionals, it is appropriate to begin with definitions of three terms basic to security management.

Security Management
This refers to managing the risk of deliberate intention and attempts to cause harm and/or inflict loss. Security risk emanates from individuals or agencies with will and intelligence. Consequently, it involves the potential to detect and defeat controls designed to prevent loss, dysfunction, or harm by natural, accidental or deliberate causes.

Security Threats
This refers to a generic risk or hazard of a security nature, as used, for example, in the sentences “The threat of terrorism is being taken more seriously in Europe after the carnage in Madrid” or “The threat of burglary is a constant concern of many householders.” Security references are generally to “the threat of...”, not to “a threat” (as in “Company X received a bomb threat”).

Security Vulnerability
This refers to a weakness or susceptibility of something (a potential target) to a security threat, as used, for example, in the sentences “The inadequately trained and equipped Iraqi police are particularly vulnerable to terrorist attack” or “Democracies provide countless soft targets for terrorists. Shopping centres, railway stations, and other crowded locations, for example, are especially vulnerable as they are largely unprotectable.”

Non-security risk professionals often use the term vulnerability to indicate the extent of exposure of an organisation to some risk, rather than its susceptibility to that risk. For example, “The firm’s vulnerability to currency fluctuations could be in the order of millions of dollars”, compared with “The firm is vulnerable to currency fluctuations”.
18.3 Basic Elements of Security Management

The central consideration in the design or review of a security system is to identify and assess the following elements: assets, threats, vulnerabilities, business impact and counter-measures. The choice of elements is determined by the logic of the flow chart below:

[Flow chart: ASSET → Valuable? → Threatened? → Vulnerable? → Adverse business impact? → Cost effective counter measures? → ACTION. Each “Yes” leads to the next question; a “No” at any question leads to END.]
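The flow chart is a sequential screen in which the first “No” ends the process. A minimal Python sketch of that logic follows; the `AssetScreen` record, its field names and the example assets are hypothetical, and in practice each yes/no answer would itself be the outcome of the assessments described in the following sections.

```python
from dataclasses import dataclass

@dataclass
class AssetScreen:
    """Hypothetical record of the yes/no answers for one asset."""
    name: str
    valuable: bool
    threatened: bool
    vulnerable: bool
    adverse_business_impact: bool
    cost_effective_counter_measures: bool

# The flow chart questions, in the order they are asked.
QUESTIONS = [
    ("valuable", "Valuable?"),
    ("threatened", "Threatened?"),
    ("vulnerable", "Vulnerable?"),
    ("adverse_business_impact", "Adverse business impact?"),
    ("cost_effective_counter_measures", "Cost effective counter measures?"),
]

def screen(asset: AssetScreen) -> str:
    """Return 'ACTION' if every answer is Yes, otherwise 'END at <question>'
    naming the first question answered No."""
    for field, question in QUESTIONS:
        if not getattr(asset, field):
            return f"END at {question}"
    return "ACTION"

if __name__ == "__main__":
    print(screen(AssetScreen("customer list", True, True, True, True, True)))
    print(screen(AssetScreen("server room", True, True, False, True, True)))
```

An asset that is valuable and threatened but not vulnerable stops at the third question, before any counter-measure is costed, matching the flow chart.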


No security risk exists, nor is expense on counter-measures warranted, unless:
- the organisation in question has valuable assets;
- those assets are threatened and are vulnerable to those threats;
- significant business impact would result if the threats eventuated; and
- cost-effective, appropriate counter-measure options can be identified.

Proposal Model

The following suggests the basic elements of a generic model for a risk control proposal:

Risk control measure A is proposed. It is designed to protect assets B and C, which are at risk from threats D, E and F, which are assessed as having the likelihood of occurrence G and H due to existing vulnerabilities I and J. The business impact (severity), if these threats eventuate, is estimated to be in the region of $K-$L. There are also the human factors M and N to consider. The cost of implementing the measure is $P, with maintenance of approximately $Q per year. The risk reduction from counter-measure A will produce an estimated cost-benefit in the order of $......

Assessment Status

The assessment is that the above threats exist, are sufficiently likely to occur, and would have sufficient business impact to warrant the above responses. The business impact assessment was made and/or checked with functional managers: production, marketing, finance, legal, industrial relations, health and safety, insurance and other relevant personnel.

A Risk Control Format

18.3.1 Assets

The first task in the security management process is to identify comprehensively all the significant assets of the organisation. This includes identifying the relative importance of various types of asset to the viability and success of the organisation. Not all assets are material assets such as capital, plant, equipment and products; many of the more important are non-material assets. The chart below includes a number of asset categories as a partial guide to asset identification. Naturally, every organisation’s list will be somewhat different and more comprehensive.

• company reputation with consumers, the public, government, regulatory agencies, etc.
• morale, loyalty, retention and motivation of staff
• industrial relations
• electronic data in transmission
• information in the possession of staff
• credit rating
• competitive edge (comparative advantage)
• intellectual property
• market-sensitive information
• accounting and auditing integrity
• good governance
• state of OH&S
• position regarding legal liability


The three charts below indicate ways of ensuring that the asset survey is complete, and that no assets are overlooked whose loss or impairment would cause the organisation significant harm.

[Chart: Asset Survey by Workflow. The flow runs Inward Goods → Raw Material Storage → Production → Finished Goods → Product Warehouse → Outward Goods → Wholesaler/retailer → Consumer → Public Arena, annotated with orders (goods quantity and quality), accounts and payments, stock control, continuity, waste, quality control, formulae, unaccountable and desirable goods, and accounting; the public arena is annotated with reputation, market share, liability, extortion and regulation.]

[Chart: Asset Survey by Legal Issues, under the headings Staff Security, Consumer Security and Public Security. Items shown: safe workplace, product liability, assault, harassment, discrimination, traffic control, car parks, change rooms, contamination, product extortion, pollution, toxic emissions, fires and explosions.]

[Chart: Asset Survey by Information. Competitive information: marketing, customer lists, formulae, processes. Price sensitive information: property buying, takeovers. Personnel information: medical records, salaries. Form of information: hardcopy, electronic, email, mail, voice. Location of information: IT centre, laptops, desktops, board reports, consultants, government, sales staff.]


18.3.2 Threats

The second task, after identification and assessment of assets, is identification and assessment of threats to these assets. The type and degree of protection required for different assets will depend on the nature, likelihood, and severity of the threat. The security appropriate to bomb threats, for example, is obviously different from that appropriate to product extortion or industrial espionage. The issue to be considered at this stage is: what particular threats, if any, exist to the identified assets, and which are significant? A sample threat checklist is shown below.

Threats to Treasury & Finance: credit squeezes; liquidity issues; customer payment defaults; exchange fluctuations; funding sources failure; interest rate fluctuations
Threats to Assets: fire; earthquake; flood; explosion; critical plant failure; malicious damage
Threats of Business Interruption: industrial action; political/civil upheaval; picketing/demonstrations/boycott; bomb threat; bomb “hoax”; malicious damage/sabotage
Threats to Information: industrial espionage; takeover; sabotage of data
Threats to Company Reputation: scandal (e.g. frauds, business or political); product fault or contamination; environmental pollution; non-compliance
Threats to Company's Competitive Edge: professional incompetence; failure to adopt best practice; failure to continuously improve; poor public image
Threats to Product: product extortion; collusive theft; pilferage; contamination
Threats to Staff: discrimination; OH&S injury; harassment
Threats from Staff: pilferage; theft; fraud; malicious damage
Threats to Equipment, Cash: robbery; burglary; drug abuse, gambling
Sovereign Risk: nationalisation
Military Threats: coups; civil disturbance; civil war

A Sample Threat Checklist

It is important to check and review assessments. Consultation is desirable with, at least, functional managers and staff (for example, financial, legal, personnel, industrial relations, public relations, security, safety, warehouse and stock control staff), in addition to specialist police services (for example, bomb, crime prevention, armed robbery and fraud squads). Relevant private services (financial auditors, risk engineers, liability lawyers, etc.) might also be consulted. Remember that it is futile to include threats that are not credible; consulting others is particularly important in this regard.


18.3.3 Vulnerability

Vulnerability is a weakness or susceptibility of an asset with respect to a threat. This weakness may be intrinsic to the asset. For example, a US multinational company is more vulnerable to politically motivated attacks than a Swiss company. A company whose Board practises inadequate or inappropriate corporate governance is more vulnerable to costly scandal than one maintaining best practice and continuous improvement. A financial company is more vulnerable to theft and fraud if the accounting, investment and audit systems are dominated by the requirements of the sales and marketing department to the detriment of accurate and timely accounting, audit and risk management.

Or the weakness may be due to the location of the asset. For example, a multinational company in the Middle East may be more vulnerable to terrorism than one in Iceland. Confidential information on a meeting room blackboard in an office with some public access is more vulnerable than when it is in a locked cabinet in a manager's private office or a secure registry.

Or the weakness may be due to inadequate or inappropriate protection against known threats. For example, a plant with poor personnel, industrial and public relations may be more vulnerable to malicious damage than one with good relations. A company with no contingency planning for serious security and other incidents, and with no pre-prepared disaster recovery plan or guidelines, may be more vulnerable to adverse business impact if certain threats materialise. A sample list of vulnerabilities is shown below.

Business Continuity
• Production dependent on ongoing supplies of raw materials, which could be stopped by picketing?
• Cash flow interruptions through product recall due to contamination or extortion could prove financially difficult for the company?

Business Reputation
• Removal of product from sale for a period could affect long-term market share? (That is, people try other brands and change brand loyalties.)

Information
• Price and competition sensitive information exists?
• Competitors exist?
• Unscrupulous competitors exist?
• Environmentalist or consumerist critics exist?
• Political and/or industrial militant critics exist?
• Data is inadequately backed up?
• Some managers refuse to take risk seriously and manage it professionally?

Plant
• Production equipment, which is easily damaged and slow to be replaced?
• Inadequate access control? Inappropriate intruder detection?

Staff
• Inadequate personnel selection, checking, and training procedures?
• Poor personnel relations/supervision?
• Disgruntled employees, ex-employees, contractors?
• Isolated female staff working at night?
• Badly lit car parks?

Product
• Stock control system will not warn reliably and in good time that a loss trend has emerged?
• Product loss is put down to unexplained "shrinkage" or inaccurate stocktaking or accounting?
• Product is small, highly desirable, easily disposable, subject to access during night shifts, and employees' car park is unlit and close to rear doors of product warehouse which is poorly supervised.

Table of Vulnerabilities


Vital or Key Points

The concept of vital points (sometimes also referred to as key points) is important to vulnerability assessment and prioritisation. From the security viewpoint, a vital or key point of any asset (for example, plant, equipment, or a communications or information system) is any part or feature that is essential to its continuing operation or integrity. If this vital point is easily damaged (due to accessibility or fragility), and would be difficult, for any reason, to restore to proper operation, it becomes all the more vital to reduce its vulnerability.

18.3.4 Business Impact

Having identified and assessed an organisation’s assets, the significant threats to them and whether they are vulnerable to those threats, the fourth task is to assess the business impact if various threats were to eventuate. Only when these four elements are identified, assessed, and related can the appropriate priorities of a security system be correctly determined.

Business impact is the overall consequence for an organisation if threats succeed. Business impact assessments are similar to, but not identical with, risk management severity measurements. Business impact should include human cost, that is, the suffering, anguish, anxiety, stress and the like which staff, members of the public and associated families would experience, not just loss measurable in dollars. Good corporate citizens and managers are motivated by normal human values, not just “bottom line” or “Profit is King” attitudes. It is also necessary to consider consequential or indirect costs as well as direct costs. For example, it may cost only thousands of dollars to replace a contaminated product, even less if it is covered by insurance, but the loss of market share, brand loyalty and business reputation may be far more important. Consequential damage includes such things as:

• business interruption
• loss of market share or competitive edge
• fines due to incidental pollution resulting from fire, explosion, or malicious damage

Consequential damage can result also if a breach of security causes such things as:

• strikes
• legal liability
• government regulation
• deterioration in relations with staff, unions, neighbourhood, government, media/public

Sometimes security itself can be the cause of poor staff and union relations if it is inappropriate or insensitively implemented. A common example is the inept use of baggage inspections or searches as a counter-measure against terrorism.

Assessing business impact is a collective task. A manager cannot do it effectively without the assistance of other managers of specialist functions. Virtually all other functions are involved in assessing business impact in relation to one or other of the company's assets. Obviously insurance and finance/accounting departments need to be involved, but so too, in many cases, do production/operations, marketing, personnel, industrial relations, public and media relations, and legal departments.

Business impact is a form of risk characterisation that is particularly persuasive in assessing commercial risk. It is the overall cost to the company if threats succeed. Proper assessment of potential business impact is essential in determining the cost-benefit of proposed counter-measures. The key issues are to establish the nature of the perceived vulnerability, quantified in terms of possible dollar impact and return period. How much would the counter-measure cost to implement and maintain? How much risk reduction would this achieve? How does this compare with the maximum foreseeable loss that could result if the measure were not introduced and threats succeeded?
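The cost-benefit questions above can be roughed out by converting a dollar impact and a return period into an expected annual loss (impact divided by return period). The Python sketch below does this before and after a hypothetical counter-measure and compares the reduction with an annualised cost of the measure. It assumes straight-line write-off of the implementation cost over an assumed service life, with no discounting; all figures and function names are illustrative only, and only the dollar component of business impact is captured.

```python
def expected_annual_loss(impact, return_period_years):
    """Average loss per year for an event of this size recurring on average
    once every return_period_years (annual frequency = 1 / return period)."""
    if return_period_years <= 0:
        raise ValueError("return period must be positive")
    return impact / return_period_years

def annualised_cost(implementation, maintenance_per_year, service_life_years):
    """Straight-line write-off of the implementation cost plus annual maintenance.
    Assumption for illustration: no discounting or inflation."""
    return implementation / service_life_years + maintenance_per_year

def net_annual_benefit(impact, rp_before, rp_after, implementation, maintenance, life):
    """Reduction in expected annual loss minus the annualised cost of the measure."""
    reduction = expected_annual_loss(impact, rp_before) - expected_annual_loss(impact, rp_after)
    return reduction - annualised_cost(implementation, maintenance, life)

if __name__ == "__main__":
    # Hypothetical figures: $2m impact, return period lengthened from 20 to 100 years,
    # $150k to implement over a 10-year life, $10k a year to maintain.
    print(net_annual_benefit(2_000_000, 20, 100, 150_000, 10_000, 10))
```

With these figures the expected annual loss falls from $100,000 to $20,000, while the measure costs $25,000 a year ($15,000 write-off plus $10,000 maintenance), leaving a net benefit of about $55,000 a year. An annual average can understate a rare but ruinous event, so the maximum foreseeable loss should still be compared directly, as the questions above suggest.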


18.3.5 Counter Measures

When identification and assessment of assets, threats, vulnerabilities, and potential business impact is complete, it is possible to consider what cost-effective counter-measure options exist to avoid or reduce the cost of risk. Counter-measures to avoid or deter security threats, lessen vulnerability and reduce potential business impact comprise both material and non-material measures.

Material measures, or physical security, include such things as:
- access control systems
- intruder detection and alarm systems
- perimeter fences, locks, safes, and other physical barriers
- signage
- guards, patrols

Many possible control options are non-physical. For example:
- credible threat intelligence (that is, pre-warning of crime, terror or other relevant trends)
- accounting and inventory control techniques (that is, the capacity to get timely warning that losses are occurring through theft, pilferage, fraud, etc.)
- personnel, industrial, and public relations techniques (that is, reducing risk from disgruntled staff, unionists, neighbours, activist groups)
- training (that is, raising security consciousness and motivation)
- contingency planning
- crisis management (damage/business impact control)
- avoidance (giving up activities if they are too risky compared with the possible profit; relocating activities to safer areas)
- transference of risk (that is, insurance, contracting out)
- secure back-up (for example, data and equipment back-up and offsite secure storage)
- payroll techniques (for example, payment by cheque or bank deposit)
- law enforcement and security liaison arrangements
- effective monitoring performance indicators for timely warning of loss trends.

No selection of physical and non-material measures constitutes an effective security system unless the measures are coordinated so as to complement each other in furtherance of the organisation’s goals and objectives. It is important that the security function should not be compartmentalised so as to allow demarcation gaps, inconsistent security arrangements, or haphazard variations in security standards within the one organisation. Zoning of security standards and control levels can, however, be appropriate when it is a considered, deliberate and coordinated measure applied to vital points within an organisation, for example, research departments or departments that store or process confidential information.

Testing protective security

How well a vital point is protected can be highlighted for review by applying what some call the “onion test”, illustrated in the ‘onion’ diagram below. The security principle illustrated by the onion test is that the degree of protection is indicated by the number of protective layers that surround a vital point. In high security situations, an initial barrier and intruder detection should operate at the external perimeter so as to warn security monitors in time to respond before an intruder reaches the core of the concentric circles surrounding the vital point. Inner barriers aim to delay the intruder to facilitate timely intervention.
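The timing principle behind the onion test (detect early, at the outer layers, then rely on the inner layers to delay the intruder for longer than the response takes) can be sketched as a simple check. This Python sketch is an illustration under stated assumptions, not a method given in the text: each layer has a hypothetical delay in minutes and a flag for whether it raises an alarm, the alarm is assumed to be raised only once the intruder is through the detecting layer, and the response time is a single fixed figure.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    delay_minutes: float  # hypothetical time needed to get through this layer
    detects: bool         # True if getting through this layer raises an alarm

def timely_detection(layers, response_minutes):
    """Layers are ordered from the outer perimeter inward to the vital point.
    Assumes the alarm is raised once the intruder is through the first
    detecting layer; protection is judged timely if the delay of the layers
    still to be crossed is at least the response time."""
    for i, layer in enumerate(layers):
        if layer.detects:
            remaining = sum(inner.delay_minutes for inner in layers[i + 1:])
            return remaining >= response_minutes
    return False  # no layer detects the intruder, so no response is triggered

if __name__ == "__main__":
    early = [Layer("fencing", 2, True),
             Layer("door, window locks", 3, False),
             Layer("secure room", 10, False)]
    late = [Layer("fencing", 2, False),
            Layer("door, window locks", 3, False),
            Layer("secure room", 10, True)]
    print(timely_detection(early, 12), timely_detection(late, 12))
```

With detection at the fence, the 13 minutes of remaining delay covers a 12-minute response; if the only detection is at the secure room itself, no delay remains and even a fast response arrives too late. This mirrors the text's placing of detection at the external perimeter and delay on the inner layers.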


[Diagram: concentric protective layers around a valuable asset. Physical security: patrols, fencing, security lighting, door and window locks, intruder detection, secure room, security monitor. Managerial security: contingency plan, security management, staff security awareness, accounting system, inventory system, personnel selection.]

“Onion” Test of Vulnerability

Resources can be used more effectively if it is possible to concentrate protection around vital points within an establishment, rather than seeking to protect everything within a location equally through often futile efforts to seal off the whole establishment at the outer perimeter.

18.4 The Terrorist Threat

Contemporary terrorism has put increased emphasis on the security function in general, and on certain elements of that function in particular.

18.4.1 Severity

The severity of the terrorist threat has increased, as exemplified by the World Trade Centre, Bali and Madrid incidents. Currently favoured targets are highly vulnerable crowded public areas such as transport stations, entertainment areas and tourist hotel areas.

18.4.2 New Modus Operandi

Terrorists can combine primary and secondary targets, as in the hijacking of airliners and their weaponisation into missiles to attack the primary targets, the Twin Towers and the Pentagon.

18.4.3 Range and Applicability

The threat now has a global reach, with attacks ranging from Moscow to Bali and from New York to Madrid. Although most business operations will never become the primary or secondary targets of the new terrorism, few if any will avoid being affected indirectly. Terrorism has already increased, and will continue to increase, certain costs of business, including:
- compliance costs in regard to increasing anti-terrorist regulation (for example, container export to the US market)
- possible delays and uncertainties regarding “just-in-time” manufacturing delivery systems
- delays and interference with executive travel
- accidental involvement in counter-terrorist investigations (for example, unwitting involvement in terrorist money laundering and funding operations)
- unanticipated economic and/or market fluctuations in various parts of the world due to terrorist incidents, war, and civil disturbances.


INDEX

Entries in italics type indicate other referenced writers.

Adversarial Legal System 4.2-3
Airspace Risk Assessment 15.1
Asset Management 2.2, 2.14, 7.3-16
Audit Systems 4.6, 11.1
Australian Risk Criteria 13.18
Availability 9.13
Beck, U 3.1, 3.3
Best Practice Risk Management 2.4
Bipartite Philosophies 1.3
Biological Metaphors 2.5, 5.3
Block Diagrams 9.1, 9.5
Block vs Trees 9.6
Blombery Dr Ron 1.1
Bottom Up Techniques 8.1-15
Breakdown Failure Mathematics 12.4
Browning R W 3.10
Browning R L 8.10
Bushfire Risk Management 15.10
Business Impact 7.7, 18.7
Causation 5.1-7
Cause-Consequence Modelling 4.4, 9.7-10
Chadwick E L 1.2
Chapman and Ward 7.16
Claims Reserving 9.22
Common Cause and Mode Failures 9.10
Common Law Criteria 4.1-2
Common Mode Failures 10.11
Conditions and Failures 5.10-11
Context–Process Industry Risk Assessment 13.2-4
Control 7.15-16
Conway W E 1.6
Coordinated Information 16.8
Costello, Peter (Australian Treasurer) 3.11
Costs of Ownership 3.15
COTS 9.13
Creighton W B 1.3, 4.1
Criminal Matters vs Civil Standards 4.1
Crosby P 1.6
Dawkins Richard 5.4
Deming W E 1.6
Discrete Event Mathematics 12.1
Discrete State Concepts 5.4
Diversification 3.12
Due Diligence 4.3
Det Norske Veritas 4.6, 16.11
DuPont STOP System 16.12
Energy Damage 5.8
Energy Damage Models 5.8-10
Environmental Risk Criteria 6.7-9

Equipment Breakdown Failure Rates 9.12
Ethical Criteria 6.11
Event Trees 9.3
Facilities Management 3.17
Factory Mutual System 1.3
Failure Modes 1.4
Failure Rates 1.5, 9.12
Fatality Risk 16.4
Fault Trees and Block Diagrams 12.3
Fault Trees and Success Trees 9.2-3
Feigenbaum A 1.6
Fire Safety Studies 13.17-18
Fire Risk Management (in buildings) 15.6
FMEA, FMECA 10.2
FMECA Registers 8.1
Fractional Dead Time Mathematics 12.7
Generative Techniques 11.1
Group/Societal Risk Criteria 6.6
HACCP Analysis 10.13
Hazard (OH&S) Registers 8.1
Hazards, Incidents and Risk 16.6
HazOps 10.6-10
HazOp Risk Registers 8.1
Haddon W 5.8
Heinrich H W 5.6
Human Error Rates 9.10-11
Idealised Risk Management Structure 3.17
Imai M 1.6
Individual Fatality Risk 6.2
Individual Risk Levels 6.2-5
Industry Based Risk Assessment 15.1
Information Systems 16.6-10
Information Measures 16.4
Information Security 7.6
Information Structures 16.6
Insurance based Risk Management 2.2
Insurance Criteria 6.10
Integrated Information Management 16.10
Integrated Investment Ranking 8.12
Intergovernmental Environment Agreement 6.9
International Safety Rating System 3.8, 16.11
Ishikawa Fishbone Diagram 5.7
Ishikawa K 1.6
Juran J M 1.6
Juries and Justice 4.2
Kauffman R 5.4
Key Performance Areas 3.19
Kletz T 5.6
Kuhn T 2.1, 5.1


Lees F P 9.7
Legal Criteria 6.1
Liability 4.1
Liability & Consequence Management 4.6
Lost Time Injury Frequency Rates 16.5
Market Risk 2.4, 3.10
Market Risk Mathematics 17.4
Market Risk Models 3.11-12
Markov Analysis 12.6
Maruyama M 5.2
Mizuno S 1.6
Modelling Techniques 9.1
Moubray J 1.4, 10.5
Møller C 1.6
New Zealand Risk Criteria 13.18
Nohl J 1.2
NSCA 5-Star 16.13
NSW Department of Planning 6.2, 13.20
Oakland J 1.6
Ockham’s Razor 16.8
OH&S Hazard Ranking 8.2
OH&S Hazard Registers 8.1
Organisational Models 3.14
Paradigms 2.1
Paradigms Integration 2.8
Pathogen Metaphor Model 5.3
Payback Assessments 8.12-14
Peters T 1.6
Popper K R 5.1
Probability Criteria 6.1
Process Industry Risk Assessments 13.1
Process Risk Management 3.18
Project Risk Process Model 2.4, 7.15
Property Loss Prevention Registers 8.1
Property Loss Prevention Ranking 8.10-11
Public Risk 3.19
Quality 1.6
Quantitative Risk Analysis (QRA) 11.5
Ranking Techniques 7.1
RCM 8.4
Reason James 0.6, 3.4, 5.3, 10.9
Redmill Felix 3.5
Reliability 1.4
Residual Risk 9.15
Rise of the Risk Society 1.8
Risk 1.1
Risk Assessments 7.11-16
Risk Assessment in the Process Industry 13.1
Risk as Variance 2.4
Risk Auditing Systems 4.6
Risk Characterisation 7.11
Risk Culture 2.6
Risk Criteria 6.1, 13.18-21
Risk of Financial Loss or Gain 3.10

Risk Management 1.1
Risk Management Overview 3.16
Risk Management Process Models 3.15
Risk Management & Project Life Cycle 7.16
Risk Management Structure 7.15
Risk Profiling 7.14
Risk Registers 8.1
Risk & Reliability Diagrams 3.14
Risk & Reliability Mathematics 12.1
Risk Role Models 3.19
Rowe W D 5.7
Rule of Law 2.1
Safety Cases 4.5, 13.1
Safety Culture 2.7
Safety Integrity Level (SIL) 9.13
SafetyMAP 16.10
Severity Criteria 6.4
Shingo S 1.6
Simulation 2.5
Smith D J 9.11
Societal Risk Criteria 6.4
Solution Based Risk Management 2.5
Solvency Risk 3.13
SOUP 9.13
State Theory Mathematics 12.5
STOP System (DuPont) 16.12
Success Trees 9.3
SWOT Assessments 2.1, 7.1
Systems in Series 12.2
Systems in Parallel 12.2
Taguchi G 1.6
Taylor R T 1.2
Terrorism 18.9
Time Sequence 5.5
Threats 4.4, 7.6, 7.15, 14.1, 18.1-5, 18.9
Top Down Context 13.2
Top Down Techniques 7.1
Train Operations Rail Model 15.3
Transmission Line Risk Management 15.7
Tripartite Risk Control Philosophies 1.3
Tweeddale H M 10.7
UK Health & Safety Executive 6.8
Utility and Risk 3.11
Value at Risk 3.13
Victorian Risk Criteria 13.19
Viner D B L 5.9, 9.7
Vulnerabilities 7.2-4, 7.8-10, 18.2, 18.6
Vulnerability Assessments 2.3, 7.2-4, 18.6
Vulnerability Registers 8.1
Vulnerability Workshops 7.8
Western Australia EPA Risk Criteria 6.3
Wiggins J H 6.4
Winslow C E A 1.2
Workshops 7.8
Wright J H 6.5