
Preventing recurrence of industrial control system accident using assurance case

Mirko Napolano, Fumio Machida, Roberto Pietrantuono, and Domenico Cotroneo

University of Naples Federico II, NEC Corporation

Outline

1. Motivation

2. Assurance of accident recurrence prevention

3. A case study

4. Conclusion


Critical infrastructure systems

▌Critical infrastructure systems

Power grids, gas pipelines, water supplies, communication and transportation services, etc.

They are essential to human life and to a wide variety of social activities

▌Advances and threats

Infrastructure systems are becoming smarter

As a result, they may face new types of threats


Accidents can happen

▌Accidents in critical infrastructure systems, e.g., the PG&E gas pipeline explosion, which killed 8 people and injured 58

September 9, 2010 - San Bruno, California

Similar accidents should be avoided in the future by applying the lessons learned from this experience

NTSB accident report, PAR-11/01


Understanding what happened

▌Independent public agencies investigate the accident

Authoritative body with experience in the field

Many months spent reconstructing the events and assessing the causes

Participation of all stakeholders

▌At the end of this process a final report is published with:

Accident narrative

Systems descriptions and analyses

List of safety recommendations

▌Recommendations are guidelines for solving the identified problems

E.g., “The flight management computer needs to be improved in accordance with the design specifications” (issued after an aircraft crash)


Challenge

▌A source of information is available: accident knowledge

Useful for third-party organizations that need to improve their existing systems in the same domain

▌However, the list of recommendations alone is not enough:

It is directed at the system providers involved in the accident

It offers generic solutions that are not straightforward to apply

Goal

• Learning clearly from experience how to effectively avoid the recurrence of similar accidents

Our contribution

• A methodology to structure the accident knowledge through graphical notations and arguments


Approach overview

▌Step 1: ECFMA (Event and Causal Factor Mitigation Analysis)

Graphical representation of events, problems and solutions

Uses the information provided by the whole report (descriptions and recommendations)

▌Step 2: Assurance Case

Argumentation over the mitigation of the discovered problems

Instantiation of a new pattern, “Accident Recurrence Prevention Pattern”


Example of ECFMA

▌ECFA (Event and Causal Factor Analysis): a tool used by investigative agencies as an accident causation model to identify root, direct, and contributory causes

ECFMA extends ECFA with a “solution” element connected to the “causal factor” it addresses
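The ECFMA structure above can be pictured as a small typed graph. The sketch below is a hypothetical data model, not the authors' tool: the class names (`Event`, `CausalFactor`, `Solution`, `ECFMAChart`) and the `unmitigated` helper are assumptions made for illustration. It shows the one element ECFMA adds to ECFA, a solution linked to a causal factor, and how that link makes it easy to spot factors still lacking a mitigation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class CauseType(Enum):
    ROOT = "root"
    DIRECT = "direct"
    CONTRIBUTORY = "contributory"


@dataclass
class Event:
    """One step of the accident narrative (the ECFA event chain)."""
    label: str


@dataclass
class CausalFactor:
    """A condition that contributed to an event, classified as in ECFA."""
    label: str
    cause_type: CauseType
    event: Event


@dataclass
class Solution:
    """ECFMA addition: a mitigation linked to the causal factor it addresses."""
    label: str
    mitigates: CausalFactor


@dataclass
class ECFMAChart:
    events: List[Event] = field(default_factory=list)
    factors: List[CausalFactor] = field(default_factory=list)
    solutions: List[Solution] = field(default_factory=list)

    def unmitigated(self) -> List[CausalFactor]:
        """Causal factors with no solution connected to them yet."""
        return [f for f in self.factors
                if not any(s.mitigates is f for s in self.solutions)]


# Illustrative use: one event, one contributory factor, then its solution.
explosion = Event("Pipeline explosion")
fail_safe = CausalFactor("Inadequate fail-safe mode", CauseType.CONTRIBUTORY, explosion)
chart = ECFMAChart(events=[explosion], factors=[fail_safe])
print([f.label for f in chart.unmitigated()])  # ['Inadequate fail-safe mode']
chart.solutions.append(Solution("Use of close fail-safe mode", fail_safe))
print(chart.unmitigated())  # []
```

Identity (`is`) is used to match a solution to its factor so that two factors with identical labels are still treated as distinct nodes.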


Assurance case concepts

▌Safety case

A structured argument supported by a body of evidence used for assuring system safety

▌Assurance case

A generalization of the safety case: a structured argument for assuring any kind of system property

▌Goal Structuring Notation (GSN)

A standard graphical notation widely used to describe assurance cases

▌Assurance case patterns

A means of documenting and reusing successful argument structures
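To make the GSN vocabulary concrete, here is a minimal sketch of a goal structure as a tree. The element kinds (Goal, Strategy, Solution, Context) and the SupportedBy / InContextOf relations come from GSN; the `GsnNode` class, the `undeveloped` helper, and the toy argument are hypothetical illustrations, not part of any GSN tool. The helper lists goals or strategies that nothing supports yet, i.e. parts of the argument still lacking sub-goals or evidence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GsnNode:
    ident: str   # e.g. "G1", "S1", "Sn1", "C1"
    kind: str    # "Goal", "Strategy", "Solution", "Context", ...
    text: str
    supported_by: List["GsnNode"] = field(default_factory=list)
    in_context_of: List["GsnNode"] = field(default_factory=list)


def undeveloped(node: GsnNode) -> List[str]:
    """IDs of goals/strategies that have no SupportedBy children.

    Such elements are not yet backed by sub-goals or evidence (Solutions).
    """
    if node.kind in ("Goal", "Strategy") and not node.supported_by:
        return [node.ident]
    result: List[str] = []
    for child in node.supported_by:
        result.extend(undeveloped(child))
    return result


# A toy safety case: the second hazard still lacks evidence.
top = GsnNode(
    "G1", "Goal", "The system is acceptably safe",
    in_context_of=[GsnNode("C1", "Context", "Defined operating environment")],
    supported_by=[GsnNode(
        "S1", "Strategy", "Argument over each identified hazard",
        supported_by=[
            GsnNode("G2", "Goal", "Hazard H1 is mitigated",
                    supported_by=[GsnNode("Sn1", "Solution", "Test report")]),
            GsnNode("G3", "Goal", "Hazard H2 is mitigated"),
        ],
    )],
)
print(undeveloped(top))  # ['G3']
```

Context nodes are attached through `in_context_of` and are deliberately not traversed, since they frame a claim rather than support it.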


Example of assurance case


Accident Recurrence Prevention Pattern

▌Define a new assurance case pattern

The goal is to prevent the recurrence of similar accidents in the future
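The slides do not spell out the pattern's elements, so the sketch below assumes a plausible shape: a top goal claiming that recurrence is prevented, a strategy arguing over each problem found in the accident analysis, and one sub-goal plus one solution per (problem, solution) pair. The function name `instantiate_recurrence_pattern` and the wording of every node are hypothetical; the point is only to show how pattern instantiation turns analysis outcomes into argument nodes mechanically.

```python
from typing import List, Tuple


def instantiate_recurrence_pattern(accident: str, system: str,
                                   problems: List[Tuple[str, str]]) -> List[str]:
    """Fill a hypothetical recurrence-prevention pattern with analysis results.

    `problems` holds (problem, solution) pairs, e.g. read off an ECFMA chart.
    Returns an indented GSN outline: one sub-goal and one solution per problem.
    """
    outline = [
        f"G1 (Goal): Recurrence of {accident} is prevented in {system}",
        "  S1 (Strategy): Argument over each problem identified in the accident analysis",
    ]
    for i, (problem, solution) in enumerate(problems, start=1):
        outline.append(f"    G{i + 1} (Goal): {problem} is mitigated")
        outline.append(f"      Sn{i} (Solution): Evidence that {solution} is in place")
    return outline


for line in instantiate_recurrence_pattern(
        "an accident similar to the reference one", "our SCADA system",
        [("Problem A", "mitigation A"), ("Problem B", "mitigation B")]):
    print(line)
```

Because each problem yields exactly one sub-goal and one solution, the resulting argument has a direct link from every identified problem to its mitigation.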


Case study: PG&E accident

▌Date and location: September 9, 2010 - San Bruno, California

▌Industrial system: SCADA system managing and controlling a gas pipeline

▌The accident: an explosion in the pipeline caused by an overpressure not adequately managed by the SCADA system

▌Consequences: 8 people killed, 58 injured, and 38 homes destroyed

NTSB accident report, PAR-11/01


Accident analysis

▌Analysis performed using the final report issued by the NTSB

▌Problems identified from ECFMA

1. Lack of information in the maintenance work procedures (root cause)

2. Failure of the two redundant power supplies that energize the electrical valves in the station under maintenance (direct cause)

3. Inadequate fail-safe mode (contributory cause)

4. Absence of Remote Control Valves (RCV) (contributory cause)

▌Proposed solutions

1. Maintenance work procedures that include requirements for identifying the likelihood and consequences of planned work on the SCADA system

2. Use of separate circuit breakers in the station

3. Use of a "close" fail-safe mode

4. Installation of RCVs along all the lines
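The four problems and solutions listed above map naturally onto the branches of the assurance case. The sketch below simply encodes them as data and emits one (sub-goal, solution) pair per problem, ordering root causes before direct and contributory ones. The ordering rule, the `mitigation_branches` name, and the node wording are assumptions for illustration; the problem and solution texts are taken from the lists above.

```python
from typing import List, Tuple

# (problem, cause type, proposed solution) as identified from the PG&E ECFMA.
PGE_ANALYSIS: List[Tuple[str, str, str]] = [
    ("Lack of information in the maintenance work procedures", "root",
     "Maintenance work procedures with requirements for identifying the "
     "likelihood and consequences of planned work on the SCADA system"),
    ("Failure of the two redundant power supplies of the electrical valves", "direct",
     "Use of separate circuit breakers in the station"),
    ("Inadequate fail-safe mode", "contributory",
     "Use of close fail-safe mode"),
    ("Absence of Remote Control Valves (RCV)", "contributory",
     "Installation of RCVs along all the lines"),
]

# Assumed presentation order: root causes first, then direct, then contributory.
CAUSE_ORDER = {"root": 0, "direct": 1, "contributory": 2}


def mitigation_branches(analysis: List[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """One (sub-goal, solution) pair per problem, root causes first."""
    ranked = sorted(analysis, key=lambda row: CAUSE_ORDER[row[1]])
    return [
        (f"G{i}: {problem} ({cause} cause) is mitigated", f"Sn{i - 1}: {solution}")
        for i, (problem, cause, solution) in enumerate(ranked, start=2)
    ]


for goal, evidence in mitigation_branches(PGE_ANALYSIS):
    print(goal)
    print("   ", evidence)
```

Numbering sub-goals from G2 leaves G1 free for the top-level claim that recurrence is prevented.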


PG&E ECFMA: an excerpt


PG&E assurance case


Evaluation

▌Comparison between two possible approaches to improving systems from accident knowledge:

Use of the list of recommendations

Assurance case

▌The report is treated as a structured document made of nodes (sections, subsections, paragraphs) and links, to be compared against the assurance case nodes

▌Evaluation criteria:

Understandability

Reusability

Effectiveness
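The hop-count metrics in the results can be read as shortest-path lengths between two nodes of a document graph. The slides do not state the exact counting rule, so the sketch below assumes that every structural link (section to subsection, subsection to paragraph, goal to solution) counts as one hop and can be followed in either direction. The `hops` and `undirected` helpers, the node names, and both toy graphs are hypothetical; the numbers they produce are not the paper's measurements.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def hops(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[int]:
    """Fewest links to follow from start to goal (breadth-first search)."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # goal not reachable


def undirected(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Adjacency lists where every link can be followed in both directions."""
    graph: Dict[str, List[str]] = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


# Hypothetical report tree: hazard described in 2.1, recommendation in 4.3.
report = undirected([("Report", "Sec 2"), ("Sec 2", "Par 2.1 hazard"),
                     ("Report", "Sec 4"), ("Sec 4", "Par 4.3 recommendation")])
# Hypothetical assurance case: the hazard goal points straight at its solution.
case = undirected([("G2 hazard mitigated", "Sn1 mitigation")])

pairs = [(report, "Par 2.1 hazard", "Par 4.3 recommendation"),
         (case, "G2 hazard mitigated", "Sn1 mitigation")]
print([hops(g, s, t) for g, s, t in pairs])  # [4, 1]
```

An average hop count, as reported in the results, would then be the mean of `hops` over all hazard and mitigation pairs in each representation.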


Results

▌Understandability

#1: Direct links from hazard to mitigation: Recommendations 0/4, Assurance case 4/4

#2: Average hops from hazard to mitigation: Recommendations 24.5, Assurance case 1

▌Reusability

#1: Links from recommendations to hazard context: Recommendations 0/4, Assurance case 4/4

#2: Hops from mitigation to hazard context: Recommendations 31.25, Assurance case 2

▌Effectiveness

Number of mitigated hazards: Recommendations 2, Assurance case 4

The assurance case provides more structured and reusable knowledge


Conclusions

▌We presented an approach to create a post-failure assurance case from the accident analysis

▌A new assurance case pattern has been developed to directly use the analysis outcomes about identified problems and solutions

▌Our approach effectively increases understandability and reusability in the system improvement process
