
Safety Instrumented System Design: Lessons Learned

Paul Gruhn
Moore Process Automation Solutions, 8924 Kirby Drive, Houston, TX 77054

Engineering is a bold discipline. Engineers are constantly reaching for new heights, searching for new materials and greater efficiency. Unfortunately, part of that process means we occasionally exceed known boundaries. It is regrettable, but it would appear that human nature requires that we learn the hard way. While this is an obviously painful process, we can learn more from our few mistakes than from our many successes. Our many successes may contain flaws that are never revealed under normal conditions, and we may go on repeating them over and over. It is only when expected conditions are exceeded, and failure is the result, that we learn where we went wrong [12].

Valuable lessons can be learned from failures, and there are plenty of examples from industry in general as well as specific details regarding failures of safety control systems. For example, the UK HSE (Health and Safety Executive) issued a publication in 1995 [2] that reviewed 34 accidents that were directly caused by control and safety system failures. The HSE published the reviews so that engineers could learn from and hopefully not repeat the mistakes discussed in the book. The IEC and ISA standards on this subject, as well as the CCPS Guidelines, are based upon a "Safety Life Cycle," which is a set of steps one should go through in the overall design process in an effort to ensure that nothing falls through the cracks.

As responsible engineers, we should not have to learn the hard way. Plants have become too large and the risks have become too great for us to learn by mere trial and error. Because we cannot do a recall on all refineries, we need to get things right the first time. We can, however, learn from the mistakes of others without re-inventing the wheel or operating in isolation.

THE DANGER OF OVER-CONFIDENCE AND COMPLACENCY

We install multiple safety layers, we use the latest technology, and an accident may not have occurred in a long time. While it is easy to become over-confident and complacent about many safety issues, this is a very dangerous attitude to develop. The accidents at Bhopal and Chernobyl help to drive the point home that one cannot assume one has a safe facility just because an accident has never happened [11]. Many people have said, "We haven't had an accident here in so many years, therefore we have a safe facility." Such a statement could have been made by plant managers at Bhopal, Chernobyl, and Pasadena the day before the incidents occurred.

MAJOR INDUSTRIAL ACCIDENTS

The following list is by no means complete, as the scope of this paper does not allow for inclusion of all the major industrial accidents that have occurred. The following are some of the incidents that have received the most press coverage over the years:

Flixborough (England, 1974) was the wake-up call for the UK. The incident resulted in 28 deaths, over 100 injuries, and the complete destruction of the plant. Fortunately the accident occurred on a holiday weekend, as few people were in the facility at the time. Had the accident happened during a normal work week, it could have resulted in over 500 fatalities.

The death toll from the incident in Bhopal (India, 1984) was over 2,000 at the time of the accident. Some recent reports place the estimates as high as 10,000 with over 200,000 injuries.

Chernobyl (Soviet Union, 1986) is arguably one of the worst industrial accidents of all time. Thousands of deaths have been estimated to date.

Piper Alpha was another major catastrophe for the UK, and the offshore industry in general (North Sea, 1988). The incident caused 165 deaths and the entire platform was destroyed.

Pasadena was the wake-up call for the US (Texas, 1989), with 23 fatalities and 130 injuries. Another accident in nearby Channelview killed 17 and injured over 100 less than one year later. These two accidents resulted in the Occupational Safety and Health Administration (OSHA) PSM (Process Safety Management) legislation.

RECOMMENDATIONS OF THE LORD CULLEN REPORT

The Lord Cullen report detailed the results of the inquiry, commissioned by the British government, into the Piper Alpha accident [1]. The following are some of the recommendations outlined in the report:

The operator should be required ... to submit a Safety Case ... of each installation.

A Safety Case is a detailed design document that verifies that certain objectives have been met. An adequate safety management system must be in place. Potential major hazards must be identified and appropriate controls must be provided.

Regulations should be performance oriented (set goals), rather than prescriptive.

The second statement, while not a direct quote, is the intent of the original statement. Prescriptive documents quickly become obsolete and do not allow for new knowledge and systems.

Standards should be set by reference to the ALARP (As Low As Reasonably Practical) principle.

The ALARP principle is similar to many of the new industry standards. In essence, the systems that are required should be proportionate to the level of the risk. This requires the owner/operator to decide what levels of risk are “tolerable” and what economic limits they are willing to set. The new standards do not mandate tech- nology, levels of redundancy, or test intervals.

GENERAL FINDINGS OF THE UK HSE

In 1995 the UK HSE (Health and Safety Executive) published Out of Control: Why Control Systems Go Wrong and How to Prevent Failure. The publication reviewed 34 accidents that were the direct result of control and safety system failures [2]. These findings were published so that engineers could learn from, and hopefully not repeat, the mistakes cited in the book (a summary of the findings appears in Figure 1).

FIGURE 1. Causes of 34 control system accidents (from a summary of the findings of the UK Health & Safety Executive [2]).

Over 44% of the accidents examined in Out of Control were due to incorrect specifications. Specifications consist of two parts. The "functional specification" states what the system is to perform (regardless of the technology or level of redundancy). In other words, what is the functional logic to be performed? Cases were cited of systems that did exactly what they were programmed to do, but the procedure was wrong. The second half of specifications is the "integrity specification," or how well the system is supposed to work. The book describes a case where an operator replaced a relay system with a simplex software-based system and assumed that, since it was performing the same functional logic, it was the same system. Wrong! The two systems involve totally different technologies with completely different failure characteristics. An accident occurred because the operator failed to understand the basic technology issues involved.

More than 20% of the accidents discussed were caused by changes made after commissioning and start-up. Changes that one person assumed to be minor and insignificant proved to be anything but. Every change must be thoroughly documented and reviewed.

Therefore, 64% of the accidents occurred where technical decisions were involved, such as deciding what, and how well, the system should perform. Similar findings have been documented by others, with the result that various industry groups are endeavoring to cover these issues in industry standards, some of which are listed below:

SAFETY INSTRUMENTED SYSTEM (SIS) STANDARDS

Programmable Electronic Systems In Safety Related Applications, Part 1 - An Introductory Guide, Part 2 - General Technical Guidelines, UK HSE [5]. An excellent document published by the UK Health and Safety Executive in 1987 on the use of Programmable Electronic Systems (for use in safety applications) that is just as applicable to other technologies as well.

Guidelines for Safe Automation of Chemical Processes, published by the American Institute of Chemical Engineers' Center for Chemical Process Safety [6]. CCPS released this book as part of their Guidelines series in late 1993. It covers the design of DCS (Distributed Control Systems) and "interlock" (shutdown) systems.

Application of Safety Instrumented Systems for the Process Industries, International Society for Measurement and Control, ANSI/ISA S84.01 [3]. ISA S84 took over 10 years to produce and was finally released in 1996. It applies to the process industry, and covers all logic box technologies, including field devices.

Functional Safety - Safety Related Systems, International Electrotechnical Commission, Draft Standards 61508 & 61511 [4]. The International Electrotechnical Commission has been working for several (almost ten) years on its overall standard (for all industries), dS61508, which will cover the use of relay, solid state, and programmable systems. The standard will apply to all industries, such as transportation, medical, nuclear, etc. Portions of the dS61508 document were released in 1998. The process industry specific document (dS61511), which is still a few years away from release, should essentially be an IEC version of ANSI/ISA S84.

All of the standards and guidelines listed above (except the early HSE text) are performance oriented, not prescriptive. In other words, they do not mandate technology, level of redundancy, or manual test intervals. The essential premise is: the greater the risk, the better the systems needed to control the risk.

If we, the experts from industry, do not issue our own standards, the government will do it for us. Recent OSHA and EPA regulations are a prime example of this fact.

DESIGN LIFE CYCLE

The IEC and ISA standards, as well as the CCPS Guidelines, are based on a "Safety Life Cycle," a partial version of which is shown in Figure 2. The figure is in no way all-encompassing, as additional procedures (not shown in the figure) must be followed once the system is actually manufactured and installed.

The first step is to analyze the hazards. The goal is to have an inherently safe plant, although this is not always possible. A number of non-SIS safety layers should be considered before deciding to add an SIS.


FIGURE 2. Safety Design Life Cycle.

Once the decision has been made to add an SIS, a determination must be made as to the performance required. (The greater the process risk, the better the performance required of the SIS.) This can be a difficult task when one takes the legal system in the US into account. What level of risk should be considered acceptable, or tolerable? How safe should the plant be? This is like trying to answer the proverbial "How safe is safe?" which is not the sort of thing companies want to commit to writing. ("We at company XYZ consider it tolerable to kill 5 persons every 100 million man-hours.")

Consequently, one must choose a technology, level of redundancy, and method for testing the proposed system. When all the facts are accumulated, the system can be analyzed to see if it will meet the initial performance target. If it does not, changes must be made and, like all engineering disciplines, it is an iterative process.

The standards cover many additional topics for the manufacture, test, installation, and maintenance of systems as well.

INDEPENDENT LAYERS OF PROTECTION

A safety instrumented system is just one of many "layers of protection," as illustrated in Figure 3. The process control system should keep the process within normal limits. If that does not work, alarms are typically configured to alert the operator to take action.

FIGURE 3. Multiple Independent Safety Layers.


When the process goes beyond the alarm limits, the Safety Instrumented System is triggered. There are usually additional layers of protection in place, including mechanical safety devices, relief systems, evacuation procedures, etc. All the layers must fail in order for an accident to reach the general population. The diagram in Figure 3 illustrates the idea of not placing all one's eggs in one basket. The nuclear industry calls this concept "defense in depth."
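As a rough illustration of why layering works, one can multiply the probabilities of failure on demand of the individual layers. The sketch below is illustrative only: the layer names and numbers are hypothetical, and the assumption of full independence is an idealization that a real design must justify, since common causes (power, people, procedures) can defeat several layers at once.

```python
# Illustrative sketch only: hypothetical layer PFDs, assuming full independence.
# Treat the result as an optimistic bound on the credit stacked layers deserve.

initiating_event_per_year = 0.1  # hypothetical demand rate: once per 10 years

layer_pfd = {
    "basic process control": 0.1,
    "operator response to alarm": 0.1,
    "safety instrumented system": 0.01,  # roughly SIL 2 performance
    "relief valve": 0.01,
}

combined_pfd = 1.0
for name, pfd in layer_pfd.items():
    combined_pfd *= pfd  # accident requires every layer to fail on demand

hazard_frequency = initiating_event_per_year * combined_pfd
print(f"Probability that every layer fails on demand: {combined_pfd:.1e}")
print(f"Estimated hazard frequency: {hazard_frequency:.1e} per year")
```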

SIS PERFORMANCE REQUIREMENTS

FIGURE 4. Three-dimensional Risk Ranking.

Figure 4 demonstrates how one may qualitatively evaluate process risk to determine the required safety integrity level (SIL). Risk is a function of the frequency and severity of an event. Does the SIS under consideration have any form of additional layers of protection beyond it? In other words, if the system under consideration fails, will anything else be able to prevent the hazard? If not, then one is restricted to the first (top) layer in this figure.

However, if there are additional safety layers in the process (Figure 3), one can consider lowering the design requirements of the SIS under study. Instead of requiring level 3 performance, perhaps the requirements could be lowered to those for level 2.
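A minimal sketch of that qualitative logic follows. The severity and likelihood categories and the mapping are hypothetical placeholders, not the actual rankings of Figure 4; the only rule carried over from the text is that an additional independent protection layer can justify lowering the target by one level (e.g., SIL 3 relaxed to SIL 2).

```python
# Hypothetical qualitative risk ranking, not the chart in Figure 4.
# Severity and likelihood are each ranked 1 (low) to 3 (high).

def required_sil(severity: int, likelihood: int, extra_layer: bool) -> int:
    """Return an illustrative target SIL (0 means no SIS required)."""
    base = {  # hypothetical mapping of (severity, likelihood) to SIL
        (1, 1): 0, (1, 2): 1, (1, 3): 1,
        (2, 1): 1, (2, 2): 2, (2, 3): 2,
        (3, 1): 2, (3, 2): 3, (3, 3): 3,
    }[(severity, likelihood)]
    # Credit one level for an additional independent protection layer.
    return max(base - 1, 0) if extra_layer else base

print(required_sil(severity=3, likelihood=2, extra_layer=False))  # 3
print(required_sil(severity=3, likelihood=2, extra_layer=True))   # 2
```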

The performance numbers required by the standards for the different safety integrity levels are displayed in Table 1. Again, the documents do not prescribe technology, level of redundancy, or test intervals. If the user evaluates the risk and determines the required SIL, then he must document in some fashion that the proposed system meets the performance requirements.

Several different terms are used to measure system performance, some more confusing than others. Availability numbers beyond 99% lose significance with most people. The complement of that (unavailability, or probability of failure on demand) can also be difficult to relate to, as the numbers are so small that one needs to use scientific notation.

The term "Risk Reduction Factor" was first introduced in the CCPS Guidelines in 1993 [6], although some were using a similar term before that. The range of numbers is much easier to relate to, and the term itself is more intuitive.
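Since the three terms describe the same quantity from different angles, converting between them is a one-line exercise. The sketch below simply applies the definitions used here and in Table 1 (availability = 1 - PFD, RRF = 1/PFD); the sample values are illustrative.

```python
# Convert between the three equivalent ways of stating SIS performance.

def pfd_from_availability(availability: float) -> float:
    """Unavailability, i.e., probability of failure on demand."""
    return 1.0 - availability

def rrf_from_pfd(pfd: float) -> float:
    """Risk reduction factor is the reciprocal of PFD."""
    return 1.0 / pfd

for availability in (0.99, 0.999, 0.9999):
    pfd = pfd_from_availability(availability)
    print(f"availability {availability:.4%}  PFD {pfd:.0e}  RRF {rrf_from_pfd(pfd):,.0f}")
```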


TABLE 1. Performance Requirements for Each SIL

Integrity Level      Safe Availability    PFD*              Equivalent RRF**
4                    > 99.99%             < 0.0001          > 10,000
3                    99.9 - 99.99%        0.001 - 0.0001    1,000 - 10,000
2                    99 - 99.9%           0.01 - 0.001      100 - 1,000
1                    90 - 99%             0.1 - 0.01        10 - 100
0 (Control - N/A)

(The ISA and AIChE documents are restricted to three levels.)

*PFD = Probability of failure on demand. **RRF = Risk reduction factor (1/PFD)
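The band boundaries in Table 1 translate directly into a lookup, as in the sketch below. It follows the IEC drafts in recognizing SIL 4; under ISA S84.01 or the CCPS Guidelines one would stop at SIL 3.

```python
# Map a calculated average PFD to a safety integrity level per Table 1.

def sil_from_pfd(pfd: float) -> int:
    """Return the SIL whose PFD band contains the given value (0 = below SIL 1)."""
    if pfd < 1e-4:
        return 4  # IEC drafts only; ISA/AIChE documents stop at SIL 3
    if pfd < 1e-3:
        return 3
    if pfd < 1e-2:
        return 2
    if pfd < 1e-1:
        return 1
    return 0      # no better than ordinary control

print(sil_from_pfd(0.005))   # 2
print(sil_from_pfd(0.0004))  # 3
```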

EXAMPLE SIS CONFIGURATIONS

Figure 5 is derived from the CCPS Guidelines and the ISA S84 standard [3, 6]. Three safety integrity levels (1, 2, and 3) have been defined for the process industries. For an SIL 1 system, an availability of about 0.99 is required. The normal design is the best available non-redundant device. Therefore, the normal design would be single sensors, single logic solvers, and single actuators.

SIL 2 requires an availability of approximately 0.999. Depending upon the types of devices employed, that can sometimes be achieved in a non-redundant system and sometimes requires a redundant system. Therefore, the second path is optional. Redundancy should be employed for components with lower availabilities.

SIL 3 requires availabilities in the range of 0.999 to 0.9999. In order to achieve those availability levels, a totally redundant system with independent paths is typically required.

However, should a company develop such a "cookbook," they will need to justify their decisions, usually based on some form of quantitative analysis.

FIGURE 5. Sample Configurations for Each SIL.

Integrity Level (Availability)   Minimum Interlock Design Structure
1 (about 0.99)                   Non-redundant; best single path design
2 (about 0.999)                  Partially redundant; redundant independent paths for elements with lower availability
3 (0.999 to 0.9999)              Totally redundant; redundant independent paths for the total interlock system (a single fault of an SIS component should not result in loss of process protection)

ANALYZING SIS PERFORMANCE

Several modeling techniques are available to analyze the performance of a safety instrumented system:

Fault trees are excellent for modeling an entire system, including the field devices and the process. They do not handle complex redundant systems with time dependent factors, such as test intervals and repair rates, very well, and they only account for known circumstances (an unknowable situation cannot be included in the fault tree).

Reliability block diagrams, along with their associated series/parallel formulas, are simple to use. However, they do not handle the more complex redundant systems and time dependent variables.
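As a reminder of how simple that series/parallel arithmetic is, the sketch below combines component failure probabilities the way a reliability block diagram would. The component values are hypothetical, and the usual assumption of independent failures applies; it deliberately ignores the time dependence noted above.

```python
# Series/parallel combination of failure probabilities, as used with
# reliability block diagrams. Values are hypothetical; components are
# assumed to fail independently.

def series(*pfds: float) -> float:
    """System fails if ANY element fails (e.g., sensor -> logic solver -> valve)."""
    ok = 1.0
    for p in pfds:
        ok *= (1.0 - p)
    return 1.0 - ok

def parallel(*pfds: float) -> float:
    """System fails only if ALL redundant elements fail."""
    fail = 1.0
    for p in pfds:
        fail *= p
    return fail

single_path = series(0.02, 0.001, 0.03)        # sensor, logic solver, valve
dual_path = parallel(single_path, single_path)  # two identical paths
print(f"single path PFD ~ {single_path:.3f}, dual path PFD ~ {dual_path:.1e}")
```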

Most reliability practitioners have settled upon variations of Markov modeling for these systems. Markov modeling is typically associated with complex transition diagrams and matrix math, but there are algebraic simplifications that have been around for decades. The level of complexity and apparent accuracy or precision offered by the more detailed methods, however, is illusory; the initial assumptions and simplifications have far more of an impact on the final results, often by orders of magnitude.

Either way, one can do the calculations by hand, with a computer spreadsheet, or with a specialized program. The intent is to analyze the problem before one specifies the solution.
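To show what such a hand calculation can look like, the sketch below uses the widely published first-order approximations for periodically tested architectures. These formulas are not taken from this paper; they neglect common cause failures, diagnostics, and repair times, which a fuller Markov analysis would include, and the failure rate and test interval are hypothetical.

```python
# Simplified, widely used approximations for average PFD of periodically
# tested shutdown architectures (first-order textbook formulas, stated here
# as an assumption, not as this paper's method).

def pfd_avg_1oo1(lambda_du: float, test_interval_hr: float) -> float:
    """Single channel: PFDavg ~ lambda_DU * TI / 2."""
    return lambda_du * test_interval_hr / 2.0

def pfd_avg_1oo2(lambda_du: float, test_interval_hr: float) -> float:
    """Dual redundant channels: PFDavg ~ (lambda_DU * TI)**2 / 3."""
    return (lambda_du * test_interval_hr) ** 2 / 3.0

lam = 1e-6   # hypothetical dangerous undetected failure rate, per hour
ti = 8760.0  # manual proof test once per year

print(f"1oo1 PFDavg ~ {pfd_avg_1oo1(lam, ti):.1e}")  # ~4.4e-3, in the SIL 2 band
print(f"1oo2 PFDavg ~ {pfd_avg_1oo2(lam, ti):.1e}")  # ~2.6e-5; common cause dominates in practice
```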

CONCLUSION

"Engineering responsibility should not require the stimulation that comes in the wake of catastrophe." S. C. Florman

As responsible engineers, we should not have to learn the hard way. Plants are too large, and the risks are too great, to learn by mere trial and error. We cannot do a recall on all refineries; therefore, we need to get it right the first time, and we can learn from the mistakes of others. We do not have to keep re-inventing the wheel or operating in isolation.


"The result of the increased application of hazard analysis has been a 50% reduction in injuries, increased ease of operations, and decreased production stoppages." N. Leveson [11]

Very often people complain about doing safety studies (e.g., HAZOPs, etc.), but history has shown them to be beneficial. Plants are too complex for any one person to be aware of all of the hazards. Complex facilities require a detailed, systematic review by an experienced team.

"Accidents are not due to lack of knowledge, but failure to use the knowledge we have." T. Kletz

Trevor Kletz is a recognized leader in the field of process safety and is one of the people who developed the HAZOP technique. He has written many excellent textbooks on process safety, inherently safe design, plant disasters, human error, and computer control [7-10].

"When a man's education is finished, he is finished." E. A. Filene

Filene, a businessman, made this statement almost 100 years ago and, in a way, times have not changed. If you let your knowledge stagnate, no matter what your field, you are doomed.

The key point is that what it takes to be a control system engineer today is different from what it was 20 years ago. People need to be aware of new standards, new techniques, and new solutions.

"Those who cannot remember the past are condemned to repeat it." G. Santayana

LITERATURE CITED

1. UK HMSO, "The Public Inquiry into the Piper Alpha Disaster", London, England (1990).
2. UK Health & Safety Executive, "Out of Control, Why Control Systems Go Wrong and How to Prevent Failure", Sheffield, England (1995).
3. International Society for Measurement and Control, "Application of Safety Instrumented Systems for the Process Industries", ANSI/ISA S84.01, RTP, NC (1996).
4. International Electrotechnical Commission, "Functional Safety - Safety Related Systems", draft standards 61508 & 61511, Geneva, Switzerland (1998).
5. UK Health & Safety Executive, "Programmable Electronic Systems In Safety Related Applications, Part 1 - An Introductory Guide, Part 2 - General Technical Guidelines", Sheffield, England (1987).
6. American Institute of Chemical Engineers, Center for Chemical Process Safety, "Guidelines for Safe Automation of Chemical Processes", New York, NY (1993).
7. Kletz, T. A., "Computer Control and Human Error", Gulf Publishing Co., Houston, TX (1986).
8. Kletz, T. A., "What Went Wrong? Case Histories of Process Plant Disasters", Gulf Publishing Co., Houston, TX (1986).
9. Kletz, T. A., "An Engineer's View Of Human Error", The Institute of Chemical Engineers, Warwickshire, England (1985).
10. Kletz, T. A., "Lessons From Disaster - How Organizations Have No Memory and Accidents Recur", Gulf Publishing Co., Houston, TX (1993).
11. Leveson, N. G., "Safeware - System Safety and Computers", Addison-Wesley, New York, NY (1995).
12. Petroski, H., "To Engineer is Human, The Role of Failure in Successful Design", Vintage, New York, NY (1982).

This paper (25a) was presented at the AIChE Spring National Meeting held in Houston, TX, in March 1999.
