52
Are We Doing It Wrong? Jim McGlone, MBA, GICSP CMO, Kenexis

Jim McGlone, MBA, GICSP CMO, Kenexis · •Makes SL-T (ISA/IEC62443) selection similar to LOPA for SIL selection targets for SIF. Security PHA Review •Can be done after a PHA, or

  • Upload
    buinhan

  • View
    218

  • Download
    0

Embed Size (px)

Citation preview

Are We Doing It Wrong?Jim McGlone, MBA, GICSP

CMO, Kenexis

• In my lifetime seatbelts, crumple zones, power windows, airbags, more airbags, door lock knobs disappeared, drive by wire technology made my emergency brake a pushbutton, and now automatic collision avoidance

• What if something goes wrong

• Patch Tuesday

• Never mind, just patch everyday!

• How are we supposed to run the plant if we are patching, updating firmware, updating antivirus signatures

• Routable protocols are appearing on everything, even pressure and temperature transmitters

• Everything is digital, remotely programmable or configurable

• Now we have wireless and IoT stuff appearing

• Communication stacks have been hacked

• Remote Access Trojans have compromised control

• Safety Instrumented Systems have been hacked

• When is the last time you felt like you had this under control

• If we are trying to protect information in the office, what are we protecting in the plant?• Production, process, safety, machines,

environment• Can it be segmented and categorized

differently• Are some parts of production more

important than others• Are some processes more important or

dangerous • Are service level requirements at risk• MTTR to long• Is life at risk• Can it go boom

• Is it statistically possible to measure the risk of being attacked or of a a malware infestation

• How do I make the business case for management

• We already measure the risk of critical processes using Process Hazards Analysis (PHA)• Common method is HAZOP

• Most wet processes use it today

• Established ISA84 & IEC61511• We must understand the process under control

• We must understand under what circumstances control can be lost

• We must know the consequences, causes, and safeguards for each scenario

• Once the SCENARIO is known that can compromise a critical process• We can assess the risk

• Evaluate the SAFEGUARDS

• Determine if improved security measures are necessary for that SCENARIO

• Network device vulnerabilities alone do not tell us what the risk is to the plant

• Looking at the design of the network to determine what hazards are present is backwards

• Its like looking at the Safety Instrumented System (SIS) to determine what risk to the process exist, we do not do that

• Cybersecurity is important, but not specific unless attached to a specific SCENARIO

• Cyber PHA, Cyber HAZOP, CHAZOP (Computer Hazards & Operability)

• These methods focus on vulnerabilities in network devices

• These methods are more like Failure Modes & Effects Analysis (FMEA) • Great tool for identifying flaws in design, doesn’t identify process hazard

scenarios

• They lack an Initiating Event

• Have Infinite Potential Outcomes

• Unknowable Frequency of Attack

• Failure to Consider Inherent Safety

1• Identify an ICS asset

2• Identify a threat employing that asset

3• Identify a vulnerability allowing that threat to occur

4• Determine likelihood of attack

5• Determine consequence

6• Calculate risk

• Maybe we should be analyzing the process to verify at least one layer of protection for a SCENARIO cannot be hacked like the car example

• Maybe we should argue that any industrial cybersecurity effort that begins with the network devices is focused on the wrong thing

Security PHA Review• Designed to either generate the cybersecurity

performance targets for a zone or specify the recommendations for inherently cyber-secure layers of protection

• Developed by technical safety practitioners with a strong background in industrial controls implementation and cybersecurity

• Designed to fit with existing project life cycles of design, implementation, and operation of process plants while leveraging existing engineering tasks and reports generated for process safety

• Makes SL-T (ISA/IEC62443) selection similar to LOPA for SIL selection targets for SIF

Security PHA Review• Can be done after a PHA, or as a

step during we check each SCENARIO• Is the initiating event hackable• Microprocessors are hackable• Control loops, SIS functions, operator

interface actions are all micro-based• Human operation manually opening a

valve are not hackable yet• Mechanical safety devices like pressure

relief valves are not hackable

• If all layers are hackable • Assign Security Level (SL) or

recommend inherently safe device

Determine if ALL Safeguards are Hackable

Next Scenario

Assign SL based on Consequence of Scenario or Propose a Safeguard that is Inherently Safe Against Cyber Attack

Determine if Cause is Hackable

Yes

Yes

No

No

Security PHA Review Benefits

• Lower risk to tolerable level based on lowering consequence of event

• Better understanding of attack vectors

• Make the right choices for the design you have

• Increased efficiency by extending existing studies

• Standards compliance by building on recognized and generally accepted good engineering practices

• ISA/IEC 62443 Industrial Automation and Control Systems Security• Significant collection of documents to support various security aspects

General

• IEC 62443-1-1Concepts, and Models

• IEC 62443-1-2Master Glossary of Terms and Abbreviations

• IEC 62443-1-3System Security Compliance Metrics

• IEC 62443-1-4IACS Security Lifecycle and Use-Case

Policies & Procedures

• IEC 62443-2-1Requirements for an IACS Security Management System

• IEC 62443-2-2Implementation Guidance for an IACS Security Management System

• IEC 62443-2-3Patch Management in the IACS Environment

• IEC 62443-2-4Requirements for IACS Solution Suppliers

System

• IEC 62443-3-1Security Technologies for IACS

• IEC 62443-3-2Security Risk Assessment and System Design

• IEC 62443-3-3System Security Requirements and Security Levels

Component

• IEC 62443-4-1Product Development Requirements

• IEC 62443-4-2Technical Security Requirements for IACS Components

• Process Hazards Analysis

• ~50 years old

• HAZOP is the most common method

• Facility is broken down into Nodes and every deviation like High Pressure, Low Temperature, Reverse Flow is considered

• If safeguards are inadequate, recommendations are made

Let’s Look at HAZOP quickly for those who might not be familiar…

1. Collect Process Safety Information (PSI)A. Piping and Instrumentation Diagrams (P&IDs)

B. Process Flow Diagrams (PFDs)

C. Process Block Flow Diagrams

D. Material and Energy Balance (including stream compositions, temperatures, pressures, flow rates, etc.)

E. Equipment Specification Sheets

F. Instrumentation Specification Sheets

G. Relief System Design Basis Documentation

H. Cause-and-Effect Diagrams

2. Assess Deviations from Design IntentA. Four deviations and eight process parameters (32 combinations) are

common

B. It is possible to apply this to each piece of equipment, but it is impractical

C. Facilitators group equipment together into “Nodes” where the operating conditions of the equipment are similar

3. P&IDs Marked to Provide a Visual Extent of a Node

4. HAZOP TeamA. Facilitator (a.k.a., leader or chairman)

B. Scribe (a.k.a, technical support engineer)

C. Operations

D. Operations Management

E. Process Engineering

F. Maintenance (including specific

G. Process Safety Management

H. Instrumentation and Controls Engineering

I. Specialty Equipment Engineering (e.g., rotating equipment, fired heaters)

5. DeviationsA. Analyze each deviation for each node

and document the results of the discussion

B. Deviations vary for different industriesC. Process industries commonly use

i. Pressureii. Temperatureiii. Leveliv. Flowv. Compositionvi. Viscosity

D. Guide words are used to drive deviations from design intent

i. High (More)ii. Low (Less)iii. Reverseiv. Misdirectedv. Other thanvi. Abnormal

6. Scenarios & ConsequencesA. Facilitator leads the team through a

discussion about the first deviation, high pressure in this case

B. Team considers is there is a way to achieve high pressure above the design capabilities of the Node

C. If not, then it is documented and the team moves on

D. If it is possible to exceed maximum allowable working pressure, potential rupture of vessel, release of flammable material, potential fire or explosion, potential for single fatality for exposed personnel”

A. Then it is documented and assigned a Consequence Category based on the severity

7. Safeguards A. Cause of high pressure might be

excessive upstream pressure feeding into this process section, or an external fire near process equipment

B. All applicable safeguards are listed

8. LikelihoodA. Team makes an assessment of

the likelihood that the event, and its associated consequence, will occur considering all of the safeguards that are available

9. RiskA. The overall risk posed by the

scenario is a function of the combination of consequence and likelihood

B. Each intersection represents a statement about tolerable risk of the Consequence and Likelihood pair.

C. If the consequence severity was a category 4 and the likelihood was a category 1, then the table shows an orange risk level 2

D. If this is unacceptable to the plant, then recommendations must be made to lower the risk to an acceptable level

Security PHA Review

• Identify the locations where safeguards that are inherently safe against cyber-attack should be deployed or an increased security level should be put in place

• Another PHA or HAZOP is not required, a small cybersecurity team can accomplish the task maybe with an I&C technician

• Each scenario is reviewed to determine if a pathway that is subject to exploitation via cyber-attack (i.e., “hackable”) exists • Review all the scenarios and their safeguards to see if they are hackable

• Spring-based safeguards are not hackable

• Valves physically operated by humans are not hackable yet

• Microprocessor-based controls with routable protocols are hackable

Security PHA Review Flowchart

Document as NO CONCERN –Next Scenario

Is the Cause Hackable?

Start

Obtain PHA Report

Identify Scenario

Identify Scenario Cause

Identify First Safeguard

Is the Safeguard Hackable

More Safeguards?

Identify Consequence

Add Non-Hackable

Safeguard?

Consequence Significant?

No

Yes

No

No

Yes

Yes

Yes

Not Feasible

Feasible

Document Recommendation for Non-Hackable Safeguard

Document as NO CONCERN PENDING RECOMMEDATION –

Next Scenario

Document Recommendation for SL Target

Hackable CauseHackable Safeguards

Cons. Sig. I.S. Safeguard N/P

Choose Recommended SL from Appendix 3

Safety Consequence Category Table

More Scenarios

Select Highest SL

Assign SL

No

Yes

• If the DEVIATION in the HAZOP includes at least one non-hackable safeguard like a spring-based relief valve, then the deviation cannot be generated through a cyber-attack and is thus considered not hackable

• If you find that everything is hackable, you need to look at the CONSEQUENCES to determine if that deviation results in a significant consequence • If the CONSEQUENCE is acceptable, that attack vector is essentially a nuisance

that is best left to traditional cyber-security

• If the CONSEQEUNCE is significant and not acceptable, then it is incumbent upon the analyst to make a recommendation to add a non-hackable safeguard or establish a Security Level (SL) based on the organizations tolerable risk criteria

• SL is chosen from an example chart like the one on the next page…

• Risk reduction is expensive so practical measures are used

• Individual Risk of Fatality (IR)

• As Low As Reasonable Possible (ALARP)

• Target Maximum Event Likelihood (TMEL)

• Typical risk tolerance criteria will be developed using a scenario-based fatality TMEL of 1x10-5 per year

• 10-5 or 1E-05 is 1 in 100,000

Note: This is an example, your details might be different.

• Security PHA Review documented, using paper and highlighters

• Security PHA Review documentation, new document

Initiating Event Loc. Hack? Safeguard Loc. Hack? All SG

Hack?

Scenario

Hack?

Cons.

Cat

SL Req.

1.1.1.1 Failure of FIC-202,

Quench After Bed #1, such

that the valve goes to the

closed position and the

quench flow is stopped

DCS Yes Operator intervention based

on high outlet temperature

alarms TAH-204, TAH-205,

TAH-206, TAH-207

DCS Yes Yes Yes High 2

Operator intervention based

on low quench flow alarm –

FAH-201

DCS Yes

Safety Instrumented Function

UZC-207 Stopping Inlet Flow

Upon Detection of High

Temperature

DCS Yes

• Security PHA Review documentation, modified HAZOP report

• Tank Reactor Runaway Reaction• Continuously Stirred Tank Reactor (CSTR) producing a highly reactive

chemical compound

• Reactor is charged with a fixed quantity of Reactant A

• After reactant A has been completed charged, the agitator (i.e., mixer) is started, and cooling water is introduced into the cooling water jacket

• After cooling and agitation are established, Reactant B is added to the reactor at a slow and controlled rate

• After a set period of time, all of Reactant A is consumed by the addition of Reactant B, and the reactor vessel is left containing the reaction product, Product C

• The product is drained from the tank reactor and sent downstream for further processing

• Tank Reactor Runaway Reaction• Simplified P&ID of the batch chemical reaction process

• Tank Reactor Runaway Reaction• Cooling water failure causes reactor to increase

• Increased temperature increases reaction rates and runaway

• Once the runaway reaction starts it can no longer be stopped by re-establishing the flow of cooling water

• Causes a significant pressure increase as Product C decomposes into gaseous by-products

• Pressure in the reactor vessel will quickly rise in excess of the maximum allowable working pressure of the vessel causing it to rupture

• Vessel rupture is also expected to result in a fire and vapor cloud explosion in addition to the physical explosion of the pressure vessel

• Tank Reactor Runaway Reaction• HAZOP

• Tank Reactor Runaway Reaction• SPR begins with an analysis of the initiating event

• Failure of a cooling water pump such that flow of cooling water stops flowing to the reactor cooling jacket

• Unlikely cyber-attack could cause the pump to fail, a cyber-attack could cause the pump to stop if any start/stop functionality is included in the control system or the safety instrumented system, which is quite likely in a batch process

• Initiating event is determined to be hackable.

Initiating Event Location Hackable?

Failure of Cooling Water Pump P-403,

which results in loss of cooling water

flow to Cooling Jacket E-402 of Reactor

R-401

DCS Yes

• Tank Reactor Runaway Reaction• There is only one safeguard which is a Safety Instrumented Function (SIF)

• The SIF is determined to be hackable because it resides in a SIS that is based on a programmable logic controller

• If the control system were taken over by an attacker, the output of the SIF could be frozen in an energized state, preventing the dump valve from opening

Safeguard Location Hackable?

Safety Instrumented Function UZC-402

Dumping the reactor contents into the

quench vessel which stops the reaction.

SIS Yes

ALL SAFEGUARDS HACKABLE? YES

• Tank Reactor Runaway Reaction• Next step is assign SL based on Consequence Category

• Potential for a single fatality as the result of the fire and explosion that could accompany the loss of containment

• Tank Reactor Runaway Reaction• Sometimes in situations like this, the team may recommend the use of a non-

hackable safeguards so that the scenario consequence does not unduly increase the required SL

• Upon review of the common non-hackable safeguards, it is determined that no self-contained mechanical device will prevent the scenario under consideration

• Tank Reactor Runaway Reaction• Analog “mimic” of the SIF UZC-403 will employ the second thermocouple of a

dual element thermocouple set in the existing thermowell

• Wired to an analog temperature transmitter that will convert the temperature measurement to a 4-20 mA signal

• 4-20 Ma signal will be analyzed by an analog current monitor relay that will open a contact in the 24 VDC signal to the solenoid valve for UZV-403, de-energizing the solenoid, venting the valve’s actuator, and causing the valve to go to the open position

• Entire analog “mimic” is inherently safe against cyber-attack, and any cyber-attack that is waged on the digital complement (UZC-207) will be not interfere in the safety functionality of the analog “mimic” function

• Tank Reactor Runaway Reaction• Analog “Mimic” of the digital SIF

TZT403B

TZT403

TT

100

TZH

TSH

100

Digital Pathway

Analog Pathway

TYB

100

Instrument Air

TYA

100

• Non-hackable Safeguards• Analog “Mimic” of a Digital SIF

• Non-hackable Safeguards• Pressure Relief Valve, Rupture Disc

• Buckling Pin, Check Valve

• Non-hackable Safeguards• Mechanical Overspeed, Current Relay

• Non-hackable Safeguards• Excess Flow Check Valve

• In this session… • Described the problem and made the case for doing it differently

• We described a Security PHA Review

• We explained how a HAZOP works

• We showed hoe to perform a SPR on a HAZOP

• We provided a real world scenario

• We through in a few non-hackable systems for consideration

• Final Thoughts• We know we can connect everything

• But should we

• Does every device need to be remotely controllable or programmable