13 Control System Design

13.1 Process Characteristics 13/2
13.2 Control System Characteristics 13/4
13.3 Instrument System Design 13/4
13.4 Process Computer Control 13/9
13.5 Control of Batch Processes 13/13
13.6 Control of Particular Units 13/15
13.7 Computer Integrated Manufacturing 13/18
13.8 Instrument Failure 13/18
13.9 Trip Systems 13/26
13.10 Interlock Systems 13/45
13.11 Programmable Logic Systems 13/50
13.12 Programmable Electronic Systems 13/50
13.13 Software Engineering 13/58
13.14 Safety-related Instrument Systems 13/63
13.15 CCPS Safe Automation Guidelines 13/65
13.16 Emergency Shut-down Systems 13/69
13.17 Level of Automation 13/71
13.18 Toxic Storage Instrumentation 13/72
13.19 Notation 13/72


07:23 7/11/00 Ref: 3723 LEES – Loss Prevention in the Process Industries Chapter 13 Page No. 1


13/2 CONTROL SYSTEM DESIGN

The operation of the plant according to specified conditions is an important aspect of loss prevention. This is very largely a matter of keeping the system under control and preventing deviations. The control system, which includes both the process instrumentation and the process operator, therefore has a crucial role to play. Selected references on process control are given in Table 13.1.

Traditionally, control systems have tended to grow by a process of accretion as further functions are added. One of the thrusts of current work is to move towards a more systematic design approach in which there is a more formal statement of the control objectives, hierarchy, systems and subsystems.

Once the objectives have been defined, the functions of the systems and subsystems can be specified. Typical subsystems are those concerned with measurement, alarm detection, loop control, trip action, etc. The next step is the allocation of function between man and machine – in this case the instrumentation and the operator. This allocation of function and the human factors aspects of process control are discussed in Chapter 14.

It is convenient to distinguish several broad categories of function that the control system has to perform: these are (1) information collection, (2) normal control and (3) fault administration.

A control system is usually also an information collection system. In addition to that required for immediate control of the process, other information is collected and transmitted. Much of this is used in the longer term control of the process. Another category which is somewhat distinct from normal control is the administration of fault conditions, which represent disturbances more severe than the control loops can handle.

Table 13.1 Selected references on process control

NRC (Appendix 28 Control Systems); A.J. Young (1955); Ceaglske (1956); D.F. Campbell (1958); Grabbe, Ramo and Wooldrige (1958); Macmillan (1962); Buckley (1964); R.J. Carter (1964, 1982); Harriott (1964); Hengstenberg, Sturm and Winkler (1964); Coughanowr and Koppel (1965); Perlmutter (1965); Franks (1967); IChemE (1967/45); C.D. Johnson (1967); E.F. Johnson (1977); H.S. Robinson (1967b); Shinskey (1967, 1977, 1978, 1983); Himmelblau and Bischoff (1968); Chemical Engineering (1969c); Gould (1969); McCoy (1969); Soule (1969–); Himmelblau (1970); Considine (1971); Hartmann (1971); Pollard (1971); Luyben (1973); C.A.J. Young (1973); C.L. Smith and Brodman (1976); Lees (1977a); R.E. Young (1977, 1982); C.L. Smith (1979); Dorf (1980); L.A. Kane (1980); Basta (1981d); Frankland (1981); Auffret, Boulvert and Thibault (1983); Stephanopoulos (1984); Hydrocarbon Processing (1986a–); Tsai, Lane and Lin (1986); Benson (1987); Prett and Morari (1987); W.R. Fisher, Doherty and Douglas (1988); Prett and Garcia (1988); Asbjornsen (1989); T. Martin (1989b); K. Pritchard (1989); R. Hill (1991); Ayral and Melville (1992); Y.Z. Friedman (1992); T. Palmer (1992); C. Butcher (1993c); Holden and Hodgson (1993); Ponton and Laing (1993); Roberson, O'Hearne and Harkins (1993)

Sequence control, batch control, including computer control
Kochhar (1979); Thome, Cline and Grillo (1979); Ghosh (1980); Rosenof (1982b); Armstrong and Coe (1983); Severns and Hedrick (1983); Anon. (1984ii); M. Henry, Bailey and Abou-Loukh (1984); Bristol (1985); Cherry, Preston and Frank (1985); E.M. Cohen and Fehervari (1985); Krigman (1985); Namur Committee (1985); Preston and Frank (1985); Egli and Rippin (1986); Love (1987a,b, 1988); Rosenof and Ghosh (1987); ISA (1988); Kondili, Pantiledes and Sargent (1988); Cott and Macchietto (1989); IChemE (1989/135); T.G. Fisher (1990); Crooks, Kuriyna and Macchietto (1992); Wilkins (1992); Sawyer (1992a,b, 1993a,b); Hedrik (1993)

Reactor control (see also Table 11.4)
Aris and Amundson (1957, 1958); Harriott (1961, 1964); Levenspiel (1962); Dassau and Wolfgang (1964); Coughanowr and Koppel (1965); Denbigh (1965); Perlmutter (1965); Shinskey (1967); Buckley (1970); Schöttle and Hader (1977); Rosenhof (1982a,b); R. King and Gilles (1986); Rosenof and Ghosh (1987); Craig (1989)

Compressor control, turbine control
Claude (1959); Hagler (1960); Tezekjian (1963); R.N. Brown (1964); Daze (1965); Marton (1965); Hatton (1967); Magliozzi (1967); Hougen (1968); Labrow (1968); M.H. White (1972); Nisenfeld et al. (1975); Sweet (1976); IEE (1977 Coll. Dig. 77/38); Nisenfeld and Cho (1978); Staroselsky and Ladin (1979); D.F. Baker (1982); Bass (1982); Gaston (1982); Maceyka (1983); B. Fisher (1984); Rana (1985); AGA (1988/52)

Process instrument and control systems
Isaac (1960); Anon. (1962a); Fusco and Sharshon (1962); Richmond (1965); Fowler and Spiegelman (1968); Byrne (1969); Frey and Finneran (1969); Klaassen (1971); Hix (1972); Nisenfeld (1972); Jervis (1973); K. Wright (1973); Calabrese and Krejci (1974); Wilmot and Leong (1976); Gremillion (1979); Mosig (1977); Redding (1977); Shinskey (1978); Kumamoto and Hensley (1979); Rinard (1982); Cocheo (1983); Rindfleisch and Schecker (1983); Swanson (1983); Galuzzo and Andow (1984); Love (1984); E.M. Cohen (1985); B. Davis (1985); S.J. Brown (1987); Cluley (1993); I.H.A. Johnston (1993)

13.1 Process Characteristics

The control system required depends very much on the process characteristics (E. Edwards and Lees, 1973). Important characteristics include those relating to the disturbances and the feedback and sequential features. A review of the process characteristics under these headings assists in understanding the nature of the control problem on a particular process and of the control system required to handle it.

Processes are subject to disturbances due to unavoidable fluctuations and to management decisions. The disturbances include:


(1) raw materials quality and availability;
(2) services quality and availability;
(3) product quality and throughput;
(4) plant equipment availability;
(5) environmental conditions;

and due to

(6) links with other plants;
(7) drifting and decaying factors;
(8) process materials behaviour;
(9) plant equipment malfunction;
(10) control system malfunction.

Quality may relate to any relevant parameter such as the composition or particle size of the material, the voltage level of a power supply or the specification of a product. Plant equipment may be taken off or brought back into service. Links with other plant may require changes in the operation of the process. Typical drifting and decaying factors are fouling of a heat exchanger and decay of catalyst. Process materials introduce disturbances through such behaviour as the clogging of solids on weighbelts or the blocking of pipes. Plant equipment failures constitute disturbances, as do those of the control system such as instrument faults, measurement noise, control loop instability or operator error.

Certain trends in modern plants tend to intensify the process disturbances. They include use of continuous, high throughput processes, existence of recycles, elimination of storage and interlinking of plants.

Some process characteristics which tend to make feedback control more difficult include:

(1) measurement problems;
(2) dead time;
(3) very short time constants;
(4) very long time constants;
(5) recycle;
(6) non-linearity;
(7) inherent instability;
(8) limit cycles;
(9) strong interactions;
(10) high sensitivity;
(11) high penalties;
(12) parameter changes;
(13) constraint changes.

Measurement has always been one of the principal problems in process control. A measurement may be difficult to make; it may be inaccurate, noisy, or unreliable; or it may be available in sampled form only. Even if the measurement itself is satisfactory, it may not be the quantity of prime interest. An 'indirect' or 'inferred' measurement may have to be computed or otherwise obtained from the actual plant measurement(s). Feedback control is totally dependent on measurement.

Dead time or time delay arises in various ways in processes. It may be introduced by the distance–velocity lag in pipework, the nature of distributed parameter systems or the time to obtain a sample or laboratory analysis. Dead time makes feedback control more difficult, owing to the delay before any error is measured and corrective action is initiated.
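The effect described above can be seen in a minimal simulation. The sketch below (illustrative numbers, not from this text) puts a first-order process under proportional control, once with no dead time and once with a dead time comparable to the process time constant; the same controller gain that settles smoothly without delay produces a large overshoot with it.

```python
# Minimal sketch: proportional control of a first-order process,
# with and without dead time. All parameter values are hypothetical.

def simulate(dead_time_steps, gain=2.0, tau=5.0, dt=0.1, n=600):
    """Euler simulation of tau*dy/dt = -y + u(t - theta), setpoint = 1."""
    y = 0.0
    u_history = [0.0] * (dead_time_steps + 1)  # past controller outputs
    ys = []
    for _ in range(n):
        u = gain * (1.0 - y)                   # action on the current error
        u_history.append(u)
        u_delayed = u_history[-(dead_time_steps + 1)]  # what the plant sees
        y += dt * (-y + u_delayed) / tau
        ys.append(y)
    return ys

no_delay = simulate(0)
with_delay = simulate(40)      # dead time of 4 time units against tau = 5

# Without dead time the response is monotonic; with dead time the
# correction arrives late and the loop overshoots badly.
peak_no_delay = max(no_delay)
peak_with_delay = max(with_delay)
```

With these numbers the undelayed loop approaches its offset steady state (2/3 for pure proportional control) without ever overshooting, while the delayed loop swings well past the setpoint before recovering.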

Processes with very short time constants are obviously difficult to control, because the speed of response required for control decisions and actions is rapid. But so also are processes with very long time constants, where the problems have to do with the increased chance of disturbances and other control interactions upsetting the control action taken and with the difficulty of remembering all the relevant factors.

Recycle takes a number of forms, including recycle of a process stream to an earlier point in the process and internal recycle within a vessel.

If a process is very non-linear, its behaviour tends to vary with throughput, its responses to disturbances and corrective actions differ, and it becomes difficult to find satisfactory controller settings.

Some processes, notably certain chemical reactors, are inherently unstable over a certain range of operation. If the process enters the unstable region, variables such as temperature and pressure may increase exponentially, leading to an explosion. In other cases the process enters a limit cycle and oscillates between definite limits.
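The exponential divergence mentioned above can be illustrated with a toy heat balance (all parameters hypothetical, in arbitrary units): heat generation grows exponentially with temperature while cooling grows only linearly, so below a critical cooling capacity the temperature runs away.

```python
# Sketch of inherent reactor instability: exponential heat generation
# versus linear heat removal. Parameter values are illustrative only.
import math

def simulate_reactor(cooling_coeff, dt=0.01, n=5000):
    """Euler integration of dT/dt = generation - removal."""
    T = 300.0
    for _ in range(n):
        generation = 5.0 * math.exp((T - 300.0) / 20.0)  # Arrhenius-like
        removal = cooling_coeff * (T - 300.0)            # linear cooling
        T += dt * (generation - removal)
        if T > 1000.0:             # far past any credible operating range
            return float("inf")    # treat as runaway
    return T

stable_T = simulate_reactor(cooling_coeff=2.0)   # strong cooling: settles
runaway_T = simulate_reactor(cooling_coeff=0.3)  # weak cooling: runs away
```

With strong cooling the temperature settles a few degrees above the coolant temperature; with weak cooling the same model diverges, which is the qualitative behaviour behind runaway reactions.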

The relationships between the input and output variables of a process are often complex and there may be strong interactions. One input may change several outputs and one output may be changed by several inputs. Where the output variables are controlled by single loops, severe interactions may occur between these loops.

Some processes are very sensitive and this clearly intensifies the difficulty of control. So also does the existence of very high penalties for excursions outside the control limits.

Process parameter changes tend to reduce the effectiveness of controller settings and may make the process inherently more difficult to control. Constraint changes alter the envelope within which the process is to be controlled.

The sequential control characteristics of a process include:

(1) plant start-up;
(2) plant shut-down;
(3) batch operation;
(4) equipment changeover;
(5) product quality changes;
(6) product throughput changes;
(7) equipment availability changes;
(8) mechanical handling operations.

The sequential element in the start-up and shut-down of continuous processes and in batch processes is obvious, but there are other operations with sequential features. Continuous processes often contain semi-continuous equipment, particularly where regeneration is necessary. Deliberate changes in product quality or throughput or in equipment status involve sequential operation. In general, a sequence consists of a series of stages, of which some are initiated by events occurring in the process and others are initiated after the lapse of a specified time.
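The idea of a sequence as stages triggered either by process events or by elapsed time can be sketched as a small data structure (all stage names and state fields here are hypothetical):

```python
# Minimal sketch of a stage sequence: each stage completes either on a
# process event (a predicate on the plant state) or after a set time.

def run_sequence(stages, process_state):
    """Run stages in order; each stage is (name, kind, condition).
    kind 'event': condition is a predicate on process_state.
    kind 'time': condition is a duration in ticks (simulated here)."""
    log = []
    for name, kind, condition in stages:
        if kind == "event":
            # in a real system this would wait for the event to occur
            assert condition(process_state), f"event for {name} not seen"
        else:  # 'time'
            process_state["clock"] += condition
        log.append(name)
    return log

state = {"level_high": True, "temp_ok": True, "clock": 0}
batch = [
    ("charge reactor", "event", lambda s: s["level_high"]),
    ("heat to reaction temperature", "event", lambda s: s["temp_ok"]),
    ("hold", "time", 30),
    ("discharge", "time", 5),
]
order = run_sequence(batch, state)
```

The point of the sketch is only the two trigger kinds; a real sequence controller would also handle hold states, aborts and operator intervention.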

Some other process characteristics which may be significant include requirements for:

(1) monitoring;
(2) feedforward control;
(3) optimization;


(4) scheduling;
(5) process investigation;
(6) plant commissioning.

Monitoring is usually a very important function in the control system. The monitoring requirements posed by a process vary, but in cases such as multiple identical units or batch operations they can be very large.

Feedforward control may be appropriate if there are difficulties in feedback control due to measurement problems or process lags. It is applicable where the disturbances can be measured but not eliminated, and where a model exists which makes possible the prediction of the effect on the controlled variable of both the disturbing and correcting variables.
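In the simplest static case, the model reduces to two steady-state gains, and the feedforward correction cancels the predicted effect of the measured disturbance. The sketch below uses hypothetical gains to make the idea concrete:

```python
# Static feedforward sketch: the (assumed) model is
#   output = K_PROCESS * u + K_DISTURB * d
# so a measured disturbance d can be cancelled by adjusting u.
# Both gain values are hypothetical.

K_PROCESS = 2.0   # effect of the correcting variable u on the output
K_DISTURB = 0.5   # effect of the measured disturbance d on the output

def feedforward_correction(d_measured):
    """Change in u that cancels the predicted effect of disturbance d."""
    return -(K_DISTURB / K_PROCESS) * d_measured

def output(u, d):
    return K_PROCESS * u + K_DISTURB * d

u0 = 1.0
baseline = output(u0, 0.0)                                  # on target
disturbed = output(u0, 4.0)                                 # pushed off target
corrected = output(u0 + feedforward_correction(4.0), 4.0)   # restored
```

Note that the correction is only as good as the model gains: this is why the text makes a predictive model a precondition for feedforward control, and why feedforward is usually trimmed by a feedback loop in practice.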

If the plant has a time-varying operating point, continuous optimization may be appropriate. Although optimization is carried out normally for economic reasons, it is characterized by adherence to a set of constraints. Operation within the envelope of constraints contributes to process safety.

Some processes pose a scheduling requirement, particularly where batch operations are concerned. There is normally some element of novelty in the process or plant equipment, and this may give rise to a requirement for process investigation and collection of information which is not otherwise needed for control.

The investigative element is particularly important during plant commissioning. So also is the need for facilities which assist in bypassing problems on plant equipment or control instrumentation, while solutions are sought or equipment ordered.

13.2 Control System Characteristics

The characteristics of process control systems have passed through three broad phases: (1) manual control, (2) analogue control and (3) computer control (covering all forms of programmable electronic system). However, such a classification can be misleading, because it does not bring out the importance of measuring instrumentation and displays, because neither analogue nor computer control is a homogeneous stage and because it says very little about the quality of control engineering and reliability engineering and the human factors involved.

The sophistication of the measuring instrumentation greatly affects the nature of the control system even at the manual control stage. This covers instruments for measuring the whole range of chemical and physical properties. The displays provided can also vary widely. These are discussed in more detail in the next chapter.

The stage of analogue control implies the use of simple analogue controllers, but may also involve the use of other special purpose equipment. Most of this equipment serves to facilitate one of the following functions: (1) measurement, (2) information reduction and (3) sequential control.

The first two functions, therefore, improve the information available to the operator and assist him to digest it, but leave the control to him. The equipment typically includes data loggers and alarm scanners. The third function does relieve the operator of a control function. Batch sequential controllers exemplify this sort of equipment.

Another crucial distinction is in the provision of protective or trip systems. In some cases the safety shut-down function is assigned primarily to automatic systems; in others it is left to the operator. Similarly, computer control is not a homogeneous stage of development. In some early systems the function of the computer was limited to the execution of direct digital control (DDC). The real control of the plant was then carried out by the operator with the computer as a rather powerful tool at his disposal. In other systems the computer had a complex supervisory program which took most of the control decisions and altered the control loop set points, leaving the operator a largely monitoring function. The two types of system are very different.

The quality of the theoretical control engineering is another factor which distinguishes a system and largely determines its effectiveness in coping with problems such as throughput changes, dead time and loop interactions.

Equally important is the reliability engineering. Unless good reliability is achieved, nominally automated functions will be degraded so that they have to be done manually or not at all. Control loops on manual setting are the typical result.

The extent to which human factors has been applied is another distinguishing feature. This aspect is considered further in Chapter 14. The general trend in control systems is an increase in the degree of automation and a change in the operator's role from control to monitoring.

Computer control itself has progressed from control by a single computer, or possibly several such computers, to distributed control by programmable electronic systems (PESs). These are described further in Section 13.4.

13.3 Instrument System Design

The design of process instrument systems, like most kinds of design, is largely based on previous practice. The control panel instrumentation and the control systems on particular operations tend to become fairly standardized. Selected references on process instrumentation are given in Table 13.2.

13.3.1 Some design principles
There are some basic principles which are important for control and instrument systems on hazardous processes. The following account has been given by Lees (1976b):

(1) There should be a clear design philosophy and proper performance and reliability specifications for the control and instrumentation. The design philosophy should deal among other things with the characteristics of the process and of the disturbances to which it is subject, the constraints within which the plant must operate, the definition of the functions which the control system has to perform, the allocation of these functions between the automatic equipment and the process operator, the requirements of the operator and the administration of fault conditions. The philosophy and specification should cover: measurements, displays, alarms and control loops; protective systems; interlocks; special valves (e.g. pressure relief, non-return, emergency isolation); the special purpose equipment; and the process computer(s).


Table 13.2 Selected references on process instrumentation

British Gas (Appendix 27 Instrumentation); IEEE (Appendix 27); ISA (Appendix 27); Gillings (1958); Howe, Drinker and Green (1961); Jenett (1964a); J.T. Miller (1964); O.J. Palmer (1965); Richmond (1965, 1982); Holstein (1966); Liptak (1967, 1970, 1993); Regenczuk (1967); Considine (1968, 1971, 1985); Fowler and Spiegelman (1968); EEUA (1969 Doc. 32, 1970 Doc. 37D, 1973 Hndbk 34); HSE (1970 HSW Bklt 24); Tully (1972); Whitaker (1972); Zientara (1972); EEMUA (1973 Publ. 120); Perry and Chilton (1973); Weston (1974a,b); Anon. (1975i); Andrew (1975); Doebelin (1975); Anon. (1976 LPB 7, p. 1); J. Knight (1976); Benedict (1977); Hayward (1977, 1979); C.D. Johnson (1977); Yothers (1977); Anon. (1978 LPB 21, p. 68); Cavaseno (1978b); C. Tayler (1987a); Verstoep and Schlunk (1978); Cheremisinoff (1979, 1981); B.E. Cook (1979); Hayward (1979); Hougen (1979); Marcovitch (1979); Ottmers et al. (1979); Andrew and Williams (1980); Chemical Engineering Staff (1980); Coppack (1980); IChemE (1980/73); Medlock (1980); Messniaeff (1980); Hewson (1981); Cramp (1982); Liptak and Venczel (1982); R.J. Smith (1982); R.H. Kennedy (1983); Anon. (1984gg); IBC (1984/51); Klaassen (1984); Perry and Green (1984); Atkinson (1985); Borer (1985); Cahners Exhibitions Ltd (1985); Demorest (1985); M.J. Hauser, McKeever and Stull (1985); Higham (1985a,b); Langdon (1983); Leigh (1985); Challoner (1986); A. Moore (1986); A. Morris (1986); Tily (1986); Leigh (1987); Sinnott (1988); C. Butcher (1990b, 1991c); Bosworth (1991); Burchart (1991); Bond (1992 LPB 106); Krohn (1992); Nimmo (1992); K. Petersen (1992); API (1993 RP 551); Goodner (1993); Chilton Book Co. (1994); McClure (1994)
BS (Appendix 27 Instrumentation); VDI (see Appendix 27)

Symbols
ISA (1976, 1982)

Measurement
Flow: IBC (1982/26, 1984/54); IMechE (1989/100)
Level: IBC (1982/28)
Pressure, vacuum: Waters (1978); Pressure Gauge Manufacturers Association (1980); Masek (1981, 1982, 1983); Demorest (1985); Liptak (1987); Roper and Ryans (1989)
Temperature: ASTM (1974 STP 470A)
Process analysers: Huyten (1979); Verdin (1973, 1980); Huskins (1977); Carr-Brion (1986); Clevett (1986); EEMUA (1988 Publ. 138); Dailey (1993)

Non-invasive instruments
Asher (1982)

Intelligent and self-checking instruments
Hasler and Martin (1971, 1973, 1974); J.O. Green (1978); R.E. Martin (1979, 1980); Barney (1985); Dent (1988); Anon. (1994b)

Control valves
ISA (Appendix 27); Charlton (1960); Liptak (1964, 1983); EEUA (1969 Hndbk 32); Driskell (1969, 1983, 1987); Baumann (1971, 1981); Baumann and Villier (1974); Hays and Berggren (1976); Hutchison (1976); Forman (1978); R.T. Wilson (1978); Kawamura (1980); Perry (1980); Royle and Boucher (1980); Whitaker (1981); Langford (1983); M. Adams (1984); Kerry (1985); Kohan (1985); Vivian (1988); Barnes and Doak (1990); Bhasin (1990); Fitzgerald (1990); Luyben (1990); B.A. White (1993); Anon. (1994b)
BS 5793: 1979–

Fluidics
J. Grant and Marshall (1976, 1977); Grant and Rimmer (1980); Anon. (1981 LPB 40, p. 7)

Sampling
Cornish, Jepson and Smurthwaite (1981); Strauss (1985)

Signal transmission, cabling
Berry (1978); Garrett (1979); Kaufman and Perz (1978); Boxhorn (1979); Anon. (1984cc); K. Hale (1985); Higham (1985a,b); Mann (1985); C. Tayler (1986e); P. Reeves (1987); Fuller (1989)

Sneak circuits
McAlister (1984); Rankin (1984)

Intrinsic safety (see Table 16.2)

Fail-safe philosophy
Fusco and Sharshon (1962); Axelrod and Finneran (1965); Hix (1972); Nisenfeld (1972); Bryant (1976); Ida (1983)

Instrument commissioning
Gans and Benge (1974); Spearing (1974); Shanmugam (1981); Meier (1982)

Instrument maintenance
Upfold (1971); Skala (1974); Denoux (1975); van Eijk (1975); R. Kern (1978d)

Instrument failure (see also Appendix 14)
SIRA (1970); Anyakora, Engel and Lees (1971); A.E. Green and Bourne (1972); Lees (1976b); Cornish (1978a–c); English and Bosworth (1978); H.S. Wilson (1978); Mahood and Martin (1979); Kletz (1981i); Perkins (1980); Weir (1980); R.I. Wright (1980); Vannah and Calder (1981); Rooney (1983); Prijatel (1984); May (1985)

Logic systems
Hodge and Mantey (1967); F.J. Hill and Peterson (1968); Maley (1970); Steve (1971); D. King (1973); E.P. Lynch (1973, 1974, 1980); Zissos (1976); Kampel (1986); S.B. Friedman (1990)

Protective systems, trip systems
Bowen and Masters (1959); Obermesser (1960); Eames (1965 UKAEA AHSB(S) R99, 1966 UKAEA AHSB(S) R119, 1967 UKAEA AHSB(S) R122, R131); A.E. Green and Bourne (1965 UKAEA AHSB(S) R91, 1966 UKAEA AHSB(S) R117, 1972); L.A.J. Lawrence (1965–66); Bourne (1966 UKAEA AHSB(S) R110, 1967); A.E. Green (1966 UKAEA AHSB(S) R113, 1968, 1969 UKAEA AHSB(S) R172, 1970); Hensley (1967 UKAEA AHSB(S) R136, 1968, 1971); Hettig (1967); Vaccaro (1969); Schillings (1970); M.R. Gibson and Knowles (1971, 1982


LPB 44); Kletz (1971, 1972a, 1985n, 1987j, 1991n); R.M. Stewart (1971, 1974a,b); Stewart and Hensley (1971); Tucker and Cline (1971); Wood (1971); Bennet (1972); R.L. Browning (1972); Herrmann (1972); Nisenfeld (1972); Ruziska (1972); J.T. Fisher (1973); de Heer (1973, 1974, 1975); J.R. Taylor (1973, 1976c); AEC (1975); van Eijk (1975); Lawley and Kletz (1975); E.J. Rasmussen (1975); Hullah (1976); B.R.W. Wilson (1976); Giugioiu (1977); Quenne and Signoret (1977); B.W. Robinson (1977); Süss (1977); Troxler (1977); M.R. Gibson (1978); Kumamoto and Henley (1978); Verde and Levy (1979); Chamany, Murty and Ray (1981); Wheatley and Hunns (1981); Aitken (1982); Lees (1982a); Rhodes (1982); Ciambarino, Merla and Messina (1983); Jonstad (1983); Yip, Weller and Allan (1984); Enzina (1985); Lihou and Kabir (1985); Hill and Kohan (1986); Onderdank (1986); C. Tayler (1986c); Zohrul Abir (1987); Barclay (1988); R. Hill (1988, 1991); Kumar, Chidambaram and Gopalan (1989); Oser (1990); Papazoglu and Koopman (1990); Rushton (1991a,b, 1992); Argent, Cook and Goldstone (1992); Beckman (1992a,b, 1993); Englund and Grinwis (1992); S.B. Gibson (1992); Gruhn (1992a,b); Kobyakov (1993); R.A. Freeman (1994); VDI 2180 (1967)

Interlocks
D. Hughes (n.d.); Richmond (1965, 1982); E.G. Williams (1965); Platt (1966); Holmes (1971); Rivas and Rudd (1974); Rivas, Rudd and Kelly (1974); Becker (1979); Becker and Hill (1979); E.P. Lynch (1980); Kohan (1984); Rhoads (1985)

Control system classification
W.S. Black (1989); EEMUA (1989 Publ. 160)

Emergency shut-down systems
DoEn (1984); AGA (1988/52); Cullen (1990); HSE (1990b); J. Pearson (1992)

Leak detection
ISA (1982 S67.03)

Gas, smoke and fire detectors (see Table 16.2)

Toxics detectors (see Table 18.1)

Reaction runaway detectors
Hub (1977c); Wu (1985)

Fracture detectors
Ponton (1980); Wilkie (1985a)

Instrument air (see Table 11.17)

(2) The process should be subjected to a critical examination such as a hazop study to discover potential hazards and operating difficulties.

(3) If a process contains serious hazards and requires an elaborate instrument system, it should be re-examined to determine whether the hazards can be reduced at source.

(4) If the process continues to contain serious hazards, these should be assessed and protective systems provided as appropriate. If necessary, these should be high integrity protective systems.

(5) For pressure systems it is necessary to provide protection not just against overpressure, but also against other conditions such as underpressure, overtemperature, undertemperature, overfilling, etc.

(6) The measurements should be as far as possible on the variable of direct interest. If this variable has to be inferred from some other measurement, this fact should be made clear. It is also important that the measurement should be at the right location.

(7) If the variable is critical for process safety, the same measurement should not be used for control and for an alarm or trip.

(8) If the variable is critical for operator comprehension, it may be desirable to provide additional integrity.

(9) The alarm system should have a properly thought out philosophy, which relates the variables alarmed, the number, types and degrees of alarm, and the alarm displays and priorities to factors such as instrument failure and operator confidence, the information load on the operator, the distinction between alarms and statuses, and the action which the operator has to take.

(10) The control loops should have fail-safe action as far as possible, particularly on loss of instrument air or electrical power to the control valves. The action for other equipment should also be fail-safe where applicable.

(11) Those control loops which can add material or energy to the process are particularly critical and it may be desirable to provide additional integrity.

(12) The control system as a whole and the individual instruments should have the 'rangeability' necessary to maintain good measurement and control at low throughputs.

(13) The control system should be designed for off-normal as well as normal conditions, e.g. start-up and shut-down.

(14) Restart situations, such as restarting after a trip or restarting an agitator, tend to be particularly hazardous.

(15) Manual stations should be provided which allow the operator to manipulate control valves in situations such as the failure of the automatic controls.

(16) The fact of instrument failure should be fully taken into account. The reliability of critical instrumentation should be assessed quantitatively where possible.

(17) The ways in which dependent failures can occur and the ways in which the instrument designer's intentions may be frustrated should be carefully considered.

(18) Instrumentation which is intended to deal with a fault should not be disabled by the fault itself. And if the process operator has to manipulate the instrumentation during the fault, he should not be prevented from doing so by the condition arising from the fault.

(19) The services (instrument air, electrical power, inert gas) on which instruments depend should have an appropriate degree of integrity.

(20) The instrument system should be checked regularly and faults repaired promptly. It should not be allowed to deteriorate, even though the process operator compensates for this. The process operator should be


trained not to accept instrumentation unrepaired over long periods.

(21) Ease of detection of instrument faults should be an objective in the design of the instrument system. The process operator should be trained to regard detection of malfunction in instruments as an integral part of his job.

(22) Instruments which are required to operate only under fault conditions, and which may therefore have an unrevealed fault, require special consideration.

(23) Important instruments should be checked regularly. The proof test interval should, where possible, be determined from a reliability assessment. The checks should not be limited to protective systems and pressure relief valves, but should include non-return valves, emergency isolation valves, etc., and often also measurements, alarms, control loops, etc.

(24) Tests should correspond as nearly as possible to the expected plant conditions. It should be borne in mind that an instrument may pass a workshop test, but still not perform satisfactorily on the plant.

(25) Valves, whether control or isolation valves, are liable to pass fluid even when closed. Characterized control valves in particular tend not to give a tight shut-off. More positive isolation may require measures such as the use of double block and bleed valves or of slip plates.

(26) Valves, particularly control valves, also tend to stick. This can give rise to conditions which do not always emerge from a simple application of fail-safe philosophy. Jamming in the open position is often particularly dangerous.

(27) Practices which process operators tend to develop in their use of the instrumentation should be borne in mind, so that these practices do not invalidate the assumptions made in the reliability assessments.

(28) The fact of human error should be fully taken into account. To the extent that is practical, human factors principles should be applied to reduce human error, and the reliability of the process operator should be assessed quantitatively.
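The reliability assessment called for in principle (23) can be illustrated by the standard single-channel approximation: a protective instrument with an unrevealed-failure rate λ, proof-tested at interval T, has an average probability of failure on demand (fractional dead time) of roughly λT/2, valid while λT is small. The numbers below are hypothetical.

```python
# Sketch of the proof-test interval calculation behind principle (23),
# using the standard single-channel approximation PFD ~ lam*T/2.
# Failure rate and target values are illustrative only.

def fractional_dead_time(failure_rate_per_year, test_interval_years):
    """Average probability of failure on demand for one channel
    (valid while failure_rate * interval << 1)."""
    return failure_rate_per_year * test_interval_years / 2.0

def max_test_interval(failure_rate_per_year, target_pfd):
    """Largest proof-test interval meeting a target average PFD."""
    return 2.0 * target_pfd / failure_rate_per_year

# A trip channel failing unrevealed at 0.2/year, tested every 6 months:
pfd = fractional_dead_time(0.2, 0.5)          # 0.05

# Interval needed if the target average PFD is 0.01:
interval = max_test_interval(0.2, 0.01)       # 0.1 year, i.e. ~5 weeks
```

Inverting the formula in this way is what lets the test interval be set from a reliability target rather than by custom; redundancy and voting change the formula, which is taken up in the discussion of trip systems in Section 13.9.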

It is also necessary to pay careful attention to the detailsof the individual instruments used. Some features whichare important are as follows:

(1) Instruments are a potential source of failure, either through a functional fault on the instrument or through loss of containment at the instrument.

(2) Use of inappropriate materials of construction can lead to both kinds of failure. Materials should be checked carefully in relation to the application, bearing in mind the possible impurities as well as the bulk chemicals. It should be remembered that the instrument supplier usually has only a very general idea of the application.

(3) Instruments containing glass, such as sight glasses or rotameters, can break and give rise to serious leaks and should be avoided if such leaks could be hazardous.

(4) Instruments may need protection against the process fluid due to its corrosiveness. Examples of protection are the use of inert liquids in the impulse lines on pressure transmitters or of chemical diaphragm seals on pressure gauges.

(5) Sampling and impulse lines should be given careful attention. Purge systems are often used to overcome blockages in impulse lines. Freezing is another common problem, which can be overcome by the use of steam or electrical trace heating.

(6) Temperature measuring elements should not normally be installed bare, but should be protected by a thermowell. A thermowell is frequently exposed to quite severe conditions such as erosion/corrosion or vibration and should be carefully designed.

(7) Pulsating flow is a problem in flowmeters such as orifice plate devices and can give rise to serious inaccuracies. This is a good example of a situation where replication of identical instruments is no help.

(8) Pressure transmitters and regulators are easily damaged by overpressure and this needs to be borne in mind.

(9) Complex instruments such as analysers, speed controllers, vibration monitors and solids weighers are generally less reliable than other instruments. This requires not only that such instruments should receive special attention but that the consequences of failure should be analysed with particular care.

(10) Different types of pressure regulator are often confused, with perhaps a pressure reducing valve being used instead of a non-return valve, or vice versa. It is specially necessary with these devices to check that the right one has been used. Also, bypasses should not be installed across pressure regulators.

(11) Selection of control valves is very important. A control valve should have not only the right nominal capacity but also appropriate rangeability and control characteristics. It should have any fail-safe features required, which may include not only action on loss of power but also a suitable limit to flow when fully open. It should have any necessary temperature protection, e.g. cooling fins. Bellows seals may need to be provided to prevent leaks. The valve should have a proper mechanical balance for the application, so that it is capable of shutting off against the process pressure. It should be borne in mind that any valve, but particularly a characterized valve, may not give completely tight shut-off, and also that a badly adjusted valve positioner can prevent shut-off.

(12) Instruments should not be potential sources of ignition and should conform with the hazardous area classification requirements.

Further discussions of the SLP aspects of instrument systems are given by Hix (1972) and the Center for Chemical Process Safety (CCPS, 1993/14).

13.3.2 Instrument distribution
A feel for the distribution of types of instrument on a process plant may be obtained from the following figures given by Tayler (1987a):

                Overall (%)   Monitoring (%)   Control (%)

Pressure            40             26               21
Temperature         32             56               15
Flow                20              8               47
Level                8              4                8
Analysis             -              3                4
Miscellaneous        -              3                5


The first column evidently refers only to the four main types. It can be seen that, whereas temperature is dominant for monitoring, it is flow which predominates in control.

13.3.3 Instrument accuracy
Most process plant instrumentation is quite accurate provided it is working properly. Information on the expected error limits of commercially available instrumentation has been given by Andrew and Williams (1980), who list limits for over 100 generic types of instrument. Some ranges of total error quoted by these authors are:

Pressure:
  Bellows transmitter       ±0.5%

Temperature:
  Thermocouple              ±0.25–5%
  Resistance thermometer    ±0.2–0.5%

Flow:
  Orifice meter             ±0.5–1%

Level:
  Differential pressure     ±0.5–2%

Analysis:
  Gas chromatograph         ±0.5–1%

13.3.4 Instrument signal transmission
Pneumatic instrument signals are transmitted by tubing, but several means are available for the transmission of electrical signals: wire, fibre optics and radio waves. The signals from measuring instruments can become corrupted in transmission. Pneumatic signals may be affected by poor quality instrument air, while electrical signals are liable to be subject to electromagnetic interference.

Both pneumatic and electrical instrument signals utilize a live zero, standard ranges being 3–15 psig for pneumatic instruments and 4–20 mA for electronic ones. This avoids the situation where a zero signal is ambiguous, meaning either that the measured variable actually has a zero value or that the instrument signal has simply gone dead.
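The live-zero convention lends itself to a simple validity check in software. The following sketch (the function name and the 3.6 mA fault threshold are illustrative, not from the text) converts a 4-20 mA signal to engineering units and treats a reading well below 4 mA as an instrument failure rather than a zero process value:

```python
def scale_live_zero(current_ma, range_lo, range_hi):
    """Convert a 4-20 mA live-zero signal to engineering units.

    A reading near 0 mA indicates a dead signal (broken wire,
    failed transmitter) rather than a zero process value.
    """
    if current_ma < 3.6:  # illustrative fault threshold
        raise ValueError("signal below live zero: probable instrument failure")
    fraction = (current_ma - 4.0) / 16.0
    return range_lo + fraction * (range_hi - range_lo)

# A healthy 12 mA signal on a 0-100 psig transmitter reads mid-range:
print(scale_live_zero(12.0, 0.0, 100.0))  # 50.0
```

A dead signal (0 mA) raises an error instead of silently reading as the bottom of the range.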

13.3.5 Instrument utilities
Instrument systems require high quality and high reliability utilities. A general account of instrument utilities has been given in Chapter 11. As far as quality is concerned, pneumatic systems require instrument air which is free of dirt and oil. Many electronic instrument systems can operate from an electrical feed which does not constitute an uninterruptible power supply (UPS). But computers and PESs are intolerant of even millisecond interruptions, unless they have their own in-built means of eliminating them. A further treatment of instrument utilities is given by the CCPS (1993/14).

13.3.6 Valve leak-tightness
In many situations on process plants the leak-tightness of a valve is of some importance. The leak-tightness of valves is discussed by Hutchison (1976) in the ISA Handbook of Control Valves.

Terms used to describe leak-tightness of a valve trim are (1) drop tight, (2) bubble tight or (3) zero leakage. Drop tightness should be specified in terms of the maximum number of drops of liquid of defined size per unit time and bubble tightness in terms of the maximum number of bubbles of gas of defined size per minute.

Zero leakage is defined as a helium leak rate not exceeding about 0.3 cm3/year. A specification of zero leakage is confined to special applications. It is practical only for smaller sizes of valves and may last for only a few cycles of opening and closing. Liquid leak-tightness is strongly affected by surface tension.

Specifications for leak-tightness of a stop, or isolation, valve are given in SP-61 by the US Valve Manufacturers Standardization Society, and are quoted in the ISA Handbook. In respect of control valves, the Handbook states:

Properly designed control valves can achieve stop valve tightness and maintain it throughout a long service life before trim replacement; particularly with cage guided, balanced trim having elastomer plug-to-cage seals. The control valve, however, is expected to throttle and often shuts off much more frequently than stop valves. For example, some dump valves may have from 4000 to 7000 opening and closing cycles per day, handling high pressure and erosive fluids at 1000 to 4000 psi pressure drop. Few stop valves could match this performance and remain tight.

It is normal to assume a slight degree of leakage for control valves. It is possible to specify a tight shut-off control valve, but this tends to be an expensive option. A specification for leak-tightness should cover the test fluid, temperature, pressure, pressure drop, seating force and test duration. For a single-seated globe valve with extra tight shut-off the Handbook states that the maximum leakage rate may be specified as 0.0005 cm3 of water per minute per inch of valve seat orifice diameter (not the pipe size of the valve end) per pound per square inch pressure drop. Thus a valve with a 4 in. seat orifice tested at 2000 psi differential pressure would have a maximum water leakage rate of 4 cm3/min.
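The Handbook figure quoted above reduces to a one-line calculation. The function below is an illustrative sketch, not an ISA-published routine; it simply reproduces the worked example in the text:

```python
def max_water_leakage(seat_diameter_in, pressure_drop_psi, rate=0.0005):
    """Maximum water leakage (cm3/min) for an extra-tight shut-off
    single-seated globe valve, per the figure quoted above:
    0.0005 cm3/min per inch of seat orifice diameter per psi of
    pressure drop."""
    return rate * seat_diameter_in * pressure_drop_psi

# The worked example in the text: 4 in. seat orifice at 2000 psi
print(max_water_leakage(4, 2000))  # 4.0 cm3/min
```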

13.3.7 Hazardous area compatibility
The instrument system, including the links to the control computers, should be compatible with the hazardous area classification. Hazardous area classification involves first zoning the plant and then installing in each zone instrumentation with a degree of safeguarding appropriate to that zone. Since much instrumentation is of low power, an approach based on inherent safety is often practical. These various aspects of hazardous area classification are dealt with in Chapter 16.

13.3.8 Multi-functional vs dedicated systems
An aspect of basic design philosophy which occurs repeatedly in different guises is the choice which has to be made between a multi-functional and a dedicated system. Some basic functions which are typically required are (1) monitoring, (2) control, (3) trips and interlocks, (4) fire and gas detection, (5) emergency shut-down (ESD) and (6) communication. The trip system may well be separate from the monitoring and control system and the ESD system trips separate from the other trips.

The situation which develops is illustrated in Figure 13.1(a), which shows a traditional design for an offshore production platform system (A. Morris, 1986). The alternative design which he proposes for consideration is shown in Figure 13.1(b). To the objection that this latter design puts all its eggs in one basket, the author puts two arguments. First, the overall reliability has been improved to such an extent that the frequency of a complete system failure will be very low. Second, in the majority of cases the process should be able to survive such failure because it can be brought to a safe state by simple measures, notably by shutting off the heat input and depressurizing.

A particular but common example of the multi-functional vs dedicated system problem is the choice between a computer-based and a hardwired trip system. This aspect is discussed further in Sections 13.9, 13.12 and 13.15.

13.4 Process Computer Control

The use of computers in control systems began in the late 1950s and is now a mature technology. Process control computer systems and applications are described in Computer Control of Industrial Processes (Savas, 1965), Computer Control of Industrial Processes (Lowe and Hidden, 1971), Handbook of Industrial Control Computers (Harrison, 1972), Understanding Distributed Process Control (Moore and Herb, 1983), Computer Systems for Process Control (Güth, 1986) and Industrial Digital Control Systems (Warwick and Rees, 1986), while a description of computer control and its relation to operator control has been given in Man and Computer in Process Control (E. Edwards and Lees, 1973). Selected references on process computer control are given in Table 13.3.

The inclusion of a process control computer greatly extends the capabilities, but also affects the reliability, of the control system. These two aspects are now considered.

13.4.1 Computer configurations and reliability
There are several ways in which a computer may be incorporated in a process control system. The approaches originally used are illustrated in Figure 13.2. If there is no computer, then the loops are controlled by analogue controllers, as shown in Figure 13.2(a).

The configuration given in Figure 13.2(b) is set-point control. The computer takes in signals from measuring instruments and sends signals to the set points of analogue controllers. If there is a computer failure, control is still maintained by the analogue controllers. Figure 13.2(c) shows direct digital control (DDC). The computer again takes in signals from measuring instruments, but now sends signals direct to the control valves;


Figure 13.1 Instrumentation for a system on an offshore production platform (A. Morris, 1986): (a) conventional system; and (b) alternative system (Courtesy of Process Engineering)


Table 13.3 Selected references on process computer control

Process computer control, including distributed control
Savas (1965); Anke, Kaltenecker and Oetker (1970); Lowe and Hidden (1971); T.J. Harrison (1972); Lees (1972); E. Edwards and Lees (1973); IEE (1977 Conf. Publ. 153, 1982 Control Ser. 21, 1988 Control Ser. 37, 1989 Conf. Publ. 314, 1990 Control Ser. 44, 1993 Control Ser. 48); R.E. Young (1977); Bader (1979); Sandefur (1980); Cocheo (1981); IMechE (1982/61); Petherbridge (1982); Helms (1983); D.R. Miller, Begeman and Lintner (1983); J.A. Moore and Herb (1983); Rembold, Armbruster and Ülzmann (1983); Anon. (1984rr); Nordic Liaison Committee (1985 NKA/LIT (85)5); C. Tayler (1985b, 1986d); Güth (1986); Hide (1986); Morrish (1986); Warwick and Rees (1986); J. Pearson and Brazendale (1988); D.L. May (1988); Strock (1988); Eddershaw (1989 LPB 88); J.A. Shaw (1991); Livingston (1992); Ray, Cary and Belger (1992); Wadi (1993)
BS (Appendix 27 Computers)

Computer integrated processing
Zwaga and Veldkamp (1984); C. Tayler (1985d); O'Grady (1986); T.J. Williams (1989); W. Thompson (1991); Canfield and Nair (1992); Conley and Clerrico (1992); Mehta (1992); Nair and Canfield (1992); Sheffield (1992); Stout (1992); Bernstein et al. (1993); Koppel (1993); Mullick (1993); Yoshimura (1993)

Programmable electronic systems
Zielinski (1978); Bristol (1980); Sargent (1980); EEMUA (1981 Publ. 123); HSE (1981 OP 2, 1987/21, 22); Dartt (1982); IBC (1982/39); Devries (1983); Martinovic (1983); Martel (1984); Lihou (1985b, 1987); Skinner (1985 LPB 62); Weiner (1985); Wilkinson and Balls (1985); R. Bell (1986); Daniels (1986); Fulton and Barrett (1986); Holsche and Rader (1986); Margetts (1986a,b, 1987); Pinkney (1986); Wilkinson (1986); Anon. (1987u); Pinkney and Hignett (1987); Wilby (1987); Bellamy and Geyer (1988); Clatworthy (1988); D.K. Wilson (1988); Deja (1989); IGasE (1989 IGE/SR/15); Max-Lino (1989); Oser (1990); British Gas (1991 Comm. 1456); Borer (1991); J. Pearson (1991); Sawyer (1991a); Gruhn (1992b); Prugh (1992d)

Control rooms, computer displays
Bernard and Wujkowski (1965); Wolff (1970); IEE (1971 Conf. Pub. 80, 1977 Conf. Pub. 150); Dallimonti (1972, 1973); E. Edwards and Lees (1973); Strader (1973); Lees (1976d); Bonney and Williams (1977); Jervis and Pope (1977); Hammett (1980); Burton (1981); Lieber (1982); C.M. Mitchell and Miller (1983); Banks and Cerven (1984); Jansen (1984); Mecklenburgh (1985); C. Tayler (1986a); Gilmore, Gertman and Blackman (1989)

Computer system reliability, including safety critical systems, fault tolerant systems, computer system security (see also Table 7.1)
Hendrie and Sonnenfeldt (1963); R.J. Carter (1964); Sonnenfeldt (1964); Burkitt (1965); A. Thompson (1965); Lombardo (1967); Regenczuk (1967); Amrehn (1969); Stott (1969); Anon. (1970d); Barton et al. (1970); Hubbe (1970); Luke and Golz (1970); H.F. Moore and Ballinger (1970); Parsons, Oglesby and Smith (1970); J. Grant (1971); J.A. Lawrence and Buster (1972); E. Edwards and Lees (1973); Daniels (1979 NCRS 17, 1983, 1986); N.R. Brown (1981); Wong (1982); Anon. (1984cc); Hura (1984); Bucher and Frets (1986)

Computer-based trips
Wilkinson and Balls (1985); Wilkinson (1986); Cobb and Monier-Williams (1988)

Computer-based `black box' recorder
Anon. (1977a)

Safety of computer controlled plants
Kletz (1982g, 1991g, 1993a); Pitblado, Bellamy and Geyer (1989); P.A. Bennett (1991a); Frank and Zodeh (1991); P.G. Jones (1991); Pearson (1991)
BS (Appendix 27 Computers)

Computer control applications
W.E. Miller (1965); UKAC (1965); Control Engineering (1966); IEE (1966 Conf. Pub. 24, 1967 Conf. Pub. 29, 1968 Conf. Pub. 43, 1969 Coll. Dig. 69/2, 1971 Conf. Pub. 81, 1972 Conf. Pub. 83, 1973 Conf. Pub. 103, 1975 Conf. Pub. 127, 1977 Coll. Dig. 77/30); Washimi and Asakura (1966); IChemE (1967/45); M.J. Shah (1967); Whitman (1967); Barton et al. (1970); Higson et al. (1971); Sommer et al. (1971); E. Edwards and Lees (1973); Daigre and Nieman (1974); St Pierre (1975); Tijssen (1977); P.G. Friedman (1978); Weems, Ball and Griffin (1979); British Gas (1983 Comm. 1224); IBC (1983/40); Seitz (1983); C. Tayler (1984b); Tatham, Jennings and Klahn (1986)

there are no analogue controllers. If there is a computer failure, control is lost on all loops, unless stand-by arrangements have been made. Although set-point control developed first, it was followed quickly by DDC, and both methods came into use.

The first large DDC installation on a chemical plant was on the ammonia soda plant of Imperial Chemical Industries (ICI) at Fleetwood (Burkitt, 1965; A. Thompson, 1965). The computer carried out DDC on 98 loops and achieved an availability of about 99.8%. Further accounts of DDC systems have been given by Barton et al. (1970) and by Higson et al. (1971).

Although the initial intention was for DDC to save the cost of analogue controllers, it soon became apparent that many other factors were involved in the choice between set-point control and DDC. Since, with DDC, computer failure leads to loss of control, it may be necessary to achieve a much higher reliability than with set-point control. The effort required to implement a DDC installation tends, therefore, to be much greater. It is necessary to pay very careful attention to details of the computer, the power supply and the environment, the input-output equipment and the programming. Usually DDC does not reduce the cost of adding computer control to the control system much below that for set-point control. Savings in costs per loop tend to be slight, because the equipment needed to get measurements into the computer and to position the control valves from it is quite expensive. It is necessary to provide stand-by analogue controllers for critical control loops and change-over equipment to transfer between computer and analogue control. The extra general effort required to assure integrity in DDC is also significant.

On the other hand, DDC does offer some advantages, not only over conventional control but also over set-point control. The advantages derive from the fact that the computer takes in signals from the measuring instruments and can process them in all sorts of ways before sending out the results as signals to the control valves. It makes it possible to: carry out operations on the measurements, such as calculation of indirect measurements and filtering of measurement signals; ensure that the control algorithm is truly proportional, integral and derivative without the inaccuracies and interactions which tend to occur in analogue controllers; use different control algorithms, such as non-linear or asymmetrical algorithms or algorithms with some logic in them; eliminate features such as integral saturation and derivative kick; position valves more accurately; alter the control configuration; and so on.
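Two of these points can be illustrated in a few lines. The sketch below shows a discrete PID step with conditional integration (a simple guard against integral saturation) and the derivative taken on the measurement rather than the error (which avoids derivative kick on a set-point change). The function and all tuning values are invented for illustration, not taken from the text:

```python
def pid_step(sp, pv, pv_prev, integral, kp, ki, kd, dt,
             out_lo=0.0, out_hi=100.0):
    """One step of a discrete PID controller (illustrative sketch).

    Anti-windup: the integral accumulates only while the output is
    unsaturated. Derivative kick is avoided by differentiating the
    measurement (pv), not the error, so a set-point step does not
    slam the output."""
    error = sp - pv
    derivative = -(pv - pv_prev) / dt  # derivative on measurement
    candidate = kp * error + ki * (integral + error * dt) + kd * derivative
    if out_lo <= candidate <= out_hi:
        integral += error * dt         # accumulate only when unsaturated
        output = candidate
    else:
        output = min(max(candidate, out_lo), out_hi)  # clamp, freeze integral
    return output, integral

# Unsaturated step: small error, output stays within 0-100%
out, integ = pid_step(50.0, 48.0, 47.5, 0.0, 2.0, 0.1, 0.5, 1.0)
```

With a large error the candidate output exceeds 100%, the output is clamped and the integral is left unchanged, so the loop recovers promptly when the error falls.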

There are several ways in which the reliability of DDC systems can be improved. One of these, as mentioned earlier, is the use of stand-by controllers on critical loops. But this is by no means a complete answer to the problem. The system may still be upset by intermittent faults, there may be difficulties in keeping the stand-by instrumentation maintained and avoiding degradation, and the operator is faced with a different interface to use on loss of computer control. Another approach is the use of duplication. In this case it is necessary not only to use dual computers, but also to duplicate other parts of the system such as power supplies and input-output equipment. Various configurations are possible and in normal operation the work may be divided either on a parallel or a hierarchical basis, but in all cases the essential principle is that the surviving computer takes over the critical control functions. The reliability of dual computer systems is undoubtedly higher, but it can still be affected by factors such as intermittent failures, data link troubles, hardware faults in common, such as earthing, and software faults in common, such as programming errors. With regard to reliability, for the type of system just described, the most reliable systems achieved a mean time between failures (MTBF) and an availability of not less than 2000 hours and 99.9%, respectively.
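The MTBF and availability figures quoted can be related through the usual steady-state expression, availability = MTBF/(MTBF + MTTR). The short sketch below (illustrative only) shows that the quoted figures imply a mean repair time of roughly two hours:

```python
def availability(mtbf_h, mttr_h):
    """Steady-state availability from mean time between failures
    and mean time to repair (both in hours)."""
    return mtbf_h / (mtbf_h + mttr_h)

# The figures quoted above (MTBF 2000 h, availability 99.9%) imply:
mttr = 2000 * (1 - 0.999) / 0.999
print(round(mttr, 1))           # 2.0 hours to repair, on average
print(availability(2000, 2.0))  # ~0.999
```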

Advances in process control systems, and particularly the trend towards distributed PESs, have largely resolved the dilemmas described and have gone far towards solving the reliability problems. Figure 13.3 shows schematically a system configuration typical of these developments. The backbone of the system is a data highway to which various devices are connected. The individual PES controllers are capable of operating as DDC controllers in the stand-alone or set-point control


Figure 13.2 Process computer control systems: set-point and direct digital control: (a) analogue control; (b) set-point control by computer; (c) direct digital control by computer

Figure 13.3 Process computer control systems: distributed control system


modes. The VDU display can also operate independently of the computer. Thus the system allows the full facilities of DDC if the computer is working, but on computer failure the controllers maintain control and the VDU display continues to provide the operator with the usual interface.

Various configurations may be used to obtain back-up control of critical loops. Where a loop is backed up it is desirable to ensure `bumpless' transfer when the stand-by equipment assumes control. This involves a process of initialization before control is transferred.

Accounts of computer-based and PES-based process control systems based on these principles include those by E. Johnson (1983), Tatham, Jennings and Klahn (1986), Cobb and Monier-Williams (1988) and the CCPS (1993/14). Programmable electronic systems for process control are considered further in Section 13.12. Data on the reliability of computer systems are given in Appendix 14.

13.4.2 Computer functions
If the computer carries out DDC, then this is its most important function. The facilities and flexibility which DDC offers have already been described. However, as just described, modern process control systems are generally based on distributed PESs.

The other main functions which a process control computer or PES performs are:

(1) measurement;
(2) data processing and handling;
(3) monitoring;
(4) other control;
(5) sequential and logical control;
(6) optimization;
(7) scheduling;
(8) communication.

Several of these functions are important in relation to safety and loss prevention (SLP).

The measurements on which control depends are critical. The computer is often used to carry out certain checks on the measurements, as described in Chapter 30. It can also upgrade them in various ways such as by extraction of non-linearities, zero or range correction, or filtering.

The computer's ability to calculate `indirect' or `inferred' measurements is widely used. These are calculated from one or more process measurements and possibly other data inserted into the computer, e.g. laboratory analyses. Thus the mass flow of a particular component may be calculated from a total mass flow and a concentration measurement. It is often such indirect measurements which are of principal interest and their use represents a real advance in control. An indirect measurement can be subjected to all the operations which are carried out on direct measurements: it can be displayed, logged, monitored, controlled and used in modelling and optimization.
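As a minimal illustration of an inferred measurement, the example given in the text, the mass flow of one component inferred from a total mass flow and a concentration measurement, can be sketched as follows (the function name and figures are invented):

```python
def component_mass_flow(total_mass_flow_kg_h, mass_fraction):
    """Indirect measurement: mass flow of one component, inferred
    from a measured total mass flow and a measured concentration
    (expressed as a mass fraction)."""
    if not 0.0 <= mass_fraction <= 1.0:
        raise ValueError("mass fraction must lie in [0, 1]")
    return total_mass_flow_kg_h * mass_fraction

# 1200 kg/h total stream at 25% component concentration:
print(component_mass_flow(1200.0, 0.25))  # 300.0 kg/h
```

Like a direct measurement, the result can then be displayed, logged, alarmed or used as a controlled variable.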

The computer usually logs data and provides summaries for the process operator and management. These logs often contain important information on equipment faults, operator interventions, etc. Arrangements are also sometimes made for a post-mortem log in the event of a serious incident on the process. This usually involves holding a continuously updated set of data on process instrument readings so that it can be replayed if necessary.

The computer almost invariably carries out monitoring of the process measurements and statuses to detect abnormal conditions. This constant scanning of the operating conditions is invaluable in maintaining control of the process. Computer alarm scanning is considered, together with other aspects of the alarm system with which the operator interacts, in more detail in Chapter 14.
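At its simplest, one pass of such an alarm scan amounts to comparing each measurement against its high and low limits. The sketch below is illustrative only; the tag names and limits are invented:

```python
def scan_alarms(readings, limits):
    """One pass of a simple alarm scan: compare each measurement
    against its (low, high) limits and return the alarms raised."""
    alarms = []
    for tag, value in readings.items():
        lo, hi = limits[tag]
        if value < lo:
            alarms.append((tag, "LOW", value))
        elif value > hi:
            alarms.append((tag, "HIGH", value))
    return alarms

readings = {"TI-101": 187.0, "PI-102": 6.2, "FI-103": 14.0}
limits   = {"TI-101": (50.0, 180.0),
            "PI-102": (1.0, 8.0),
            "FI-103": (5.0, 25.0)}
print(scan_alarms(readings, limits))  # [('TI-101', 'HIGH', 187.0)]
```

A real system would add features such as deadbands and alarm acknowledgement, which are discussed with the operator interface in Chapter 14.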

Frequently, there are one or two process variables, equipments or operations which are particularly difficult to control and for these more advanced control methods may be appropriate. These methods are usually difficult to implement without a computer. The following appear to be especially useful: (1) indirect variable control, (2) automatic loop tuning, (3) control of dead time processes and (4) non-interacting control.

The execution by the computer of sequential operations in a reliable manner is another common function which is invaluable in maintaining trouble-free operation of the process. Such sequential control involves much more than simply sending out control signals. It is essential for checks to be made to ensure that the process is ready to proceed to the next stage, that the equipment has obeyed the control signals, and so on. There is therefore a liberal sprinkling of checks throughout the sequence. Thus sequential control involves continuous checking of the state of the process and the operation of equipment.
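This sprinkling of checks can be sketched as a sequence executor in which every control action must be confirmed before the sequence is allowed to advance. Everything here (the device tags and the send/confirm interface) is invented for illustration:

```python
def run_sequence(steps, send, confirm):
    """Execute (action, check) steps in order. `send` issues the
    control signal; `confirm` verifies that the plant has actually
    responded before the sequence advances."""
    for action, check in steps:
        send(action)
        if not confirm(check):
            raise RuntimeError(f"check failed after {action}: {check}")
    return "sequence complete"

# Stand-ins for the plant interface (illustrative only):
plant_state = {}

def send(action):
    device, position = action
    plant_state[device] = position  # pretend the signal always acts

def confirm(check):
    device, expected = check
    return plant_state.get(device) == expected

steps = [
    (("XV-201", "open"), ("XV-201", "open")),  # open feed valve, then verify
    (("P-202", "run"),   ("P-202", "run")),    # start pump, then verify
]
print(run_sequence(steps, send, confirm))  # sequence complete
```

If any confirmation fails, the sequence halts at a known point instead of blindly issuing the next control signal.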

Using a computer it is possible to carry out more complex sequences with greater reproducibility. This is particularly useful in operations where it is necessary to follow a rather precise schedule in order to avoid damage to the equipment.

On some processes where there is a time-varying optimum, the computer carries out continuous optimization. Optimization is usually performed with a set of constraints. Computer optimization therefore provides as a by-product a more formal definition of, and adherence to, process constraints.

There are several other computer functions which are particularly relevant to SLP. These include computer alarm analysis, valve sequencing and malfunction detection. These are dealt with in Chapter 30.

For many years there was very little use of computers to carry out the protective function of tripping plant when a hazardous condition occurs. The protective system has almost invariably been a system separate from the control system, whether or not the latter contains a computer, and engineered for a greater degree of integrity. There is now movement towards the use of PESs for the trip functions also, but only where it can be demonstrated that the system has a reliability at least equal to that of a conventional hardwired system.

13.4.3 Computer displays and alarms
Process computers, as just indicated, are powerful tools for the support of information display and alarm systems. The design of such systems is intimately bound up with the needs of the process operator, and discussion is therefore deferred to Chapter 14.


13.4.4 Fault-tolerant computer systems
To the extent practical, process computer systems should be fault tolerant. A fault-tolerant system is one which continues to perform its function in the face of one or more faults. Accounts of fault-tolerant design of computer systems, including process computer systems, are given by Shrivastava (1991), the CCPS (1993/14) and Johnston (1993).

The creation of a fault-tolerant system involves a combination of approaches. A necessary preliminary is effort to obtain high reliability and thus to eliminate faults. The methods of reliability engineering may be used to model the system and to identify weak points. The use of redundancy and diversity is a common strategy. Dependent failures and methods of combating them should receive particular attention.
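One common form of redundancy for measurements is triplication with mid-value selection: the median of three transmitters is used, so that no single instrument failing high or low can corrupt the value seen by the control or trip logic. A minimal sketch:

```python
def median_vote(a, b, c):
    """Mid-value selection for a triplicated measurement: the
    median is unaffected by any single sensor failing high or low."""
    return sorted((a, b, c))[1]

# One transmitter failing high is out-voted by the other two:
print(median_vote(101.2, 100.9, 999.0))  # 101.2
```

A practical system would also alarm on excessive disagreement between the three readings, since a large spread indicates that one channel has failed and the redundancy is degraded.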

Prompt detection and repair of faults is an important part of a strategy for a fault-tolerant system. A fault-tolerant system should degrade gracefully, and safely. One important aspect is the fail-safe action of the system.

13.4.5 Computer power supplies
Process computers and PESs require a high reliability and high quality power supply. A general account of power supplies is given in Chapter 11. The operation of such equipments can be upset by millisecond interruptions, unless they have in-built means of dealing with them. They therefore generally require an uninterruptible power supply (UPS). Devices used to provide a UPS include motor generators, DC/AC inverters and batteries.

The power supply also needs to be uninterruptible in the sense that it has high reliability. One option is the use of batteries, another is some form of redundancy or diversity of supply.

A treatment of power supplies for PESs is given by the CCPS (1993/14). A relevant code for UPSs is IEEE 446.

13.4.6 Computer system protection
Process computers and PESs require suitable protection against fire and other hazards.

For fire protection relevant codes are BS 6266: 1992 Code of Practice for Fire Protection of Electronic Data Processing Installations, NFPA 75: 1992 Protection of Electronic Computer/Data Processing Equipment and NFPA 232: 1991 Protection of Records.

Lightning protection is covered in NFPA 78: 1989 Lightning Protection Code.

Codes for earthing are BS 1013: 1965 Earthing and IEEE 142: 1982 Grounding of Industrial and Commercial Power Systems (the IEEE Green Book).

These hazards and protection against them are treated by the CCPS (1993/14).

13.5 Control of Batch Processes

The control of batch processes involves a considerable technology over and above that required for the control of continuous processes. Accounts of batch process control are given in Batch Process Automation (Rosenof and Ghosh, 1987), Batch Control Systems (T.G. Fisher, 1990) and Computer-Controlled Batch Processing (Sawyer, 1993a) and by Love (1987a-c, 1988).

Batch processes constitute a large proportion of those in the process industries. Sawyer (1993a) gives the following figures:

                    Mode of operation

Industry sector     Batch      Continuous

Chemical            45%        55%
Pharmaceutical      80%        20%

Many batch plants are multi-purpose and can make multiple products. Their outstanding characteristic is their flexibility. They differ from continuous plants in that: the operations are sequential rather than continuous; the environment in which they operate is often subject to major variability; and the intervention of the operator is to a much greater extent part of their normal operation rather than a response to abnormal conditions. A typical batch plant is shown in Figure 13.4.

13.5.1 Models of batch processing
There are a number of models which have been developed to represent batch processing. Three described by T.G. Fisher (1990) are (1) the recipe model, (2) the procedure model and (3) the unit model.

The recipe model centres on the recipe required to make a particular product. Its elements are the procedures, the formula, the equipment requirements and the `header'. The procedure is the generic method of processing required to make a class of product. The formula is the raw materials and operating conditions for the particular product. The equipment requirements cover the equipment required to execute the formula, including materials of construction. The header is the identification of the batch in terms of product, version, recipe and so on.

The procedure model has the form:

Procedure → Operation → Phase → Control step

The overall procedure consists of a number of operations, akin to the unit operations of continuous processes, except that they may be carried out by the same equipment. The phase is a grouping of actions within an operation. The control step is the lowest level of action, typically involving the movement of a small number of final control elements.
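The procedure model hierarchy lends itself naturally to a nested data representation. The sketch below is illustrative only; all of the operation, phase and control-step names are invented:

```python
# Procedure -> Operation -> Phase -> Control step, as nested data.
procedure = {
    "name": "Make product X",
    "operations": [
        {"name": "Charge",
         "phases": [
             {"name": "Add solvent",
              "control_steps": ["open XV-301", "close XV-301"]},
             {"name": "Add reagent",
              "control_steps": ["start P-302", "stop P-302"]},
         ]},
        {"name": "React",
         "phases": [
             {"name": "Heat to setpoint",
              "control_steps": ["set TC-303 to 85 C"]},
         ]},
    ],
}

def count_control_steps(proc):
    """Walk the hierarchy down to its lowest level of action."""
    return sum(len(ph["control_steps"])
               for op in proc["operations"]
               for ph in op["phases"])

print(count_control_steps(procedure))  # 5
```

Each phase here ends at a point where the batch could safely be held, in keeping with the `hold' state concept discussed below.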

The concept of phase is a crucial one in batchprocessing. A phase is a set of actions which it islogical to group together and which ends at a pointwhere it is logical and safe for further intervention totake place. It is closely connected, therefore, with theconcept of `hold' states at which it is safe for the processto be held. The possibility that other facilities on whichthe progress of the batch depends may not beimmediately available makes such hold states essential.

The unit model is equipment-oriented and has the form:

Unit → Equipment module → Device/loop → Element

The unit is broken down into functional equipment modules such as vessels and columns. These in turn are decomposed into devices and loops, which are groupings of elements such as sensors and control valves.
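As an illustration, the procedure hierarchy above can be sketched as nested data structures. This is a minimal sketch; the class names and the example fragment are my own, not Fisher's.

```python
from dataclasses import dataclass

# Procedure model: Procedure -> Operation -> Phase -> Control step
@dataclass
class ControlStep:
    action: str          # lowest-level action, e.g. moving a final control element

@dataclass
class Phase:
    name: str
    steps: list          # ControlStep instances; a phase ends at a safe hold point

@dataclass
class Operation:
    name: str
    phases: list         # Phase instances

@dataclass
class Procedure:
    name: str
    operations: list     # Operation instances

def count_steps(proc: Procedure) -> int:
    """Total number of control steps in a procedure."""
    return sum(len(ph.steps) for op in proc.operations for ph in op.phases)

# Hypothetical fragment of a batch procedure
charge = Operation("Charge", [
    Phase("Add Component 3", [ControlStep("open weigh tank outlet valve"),
                              ControlStep("start agitator")]),
])
recipe_procedure = Procedure("Make product X", [charge])
```

The unit model would be represented analogously, with units decomposed into equipment modules, devices/loops and elements.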


13.5.2 Representation of sequential operations
The control of a batch process is a form of sequential control. A typical sequential control procedure, expressed in terms of the procedure model, is shown in Table 13.4. Various methods are available for the specification of sequences. They include (1) flowcharts, (2) sequential function charts and (3) structured plain language.

The flowchart is a common method of representing sequences, but its successful use requires that a consistent style be adopted; that the method cater for the procedure hierarchy by the use of a hierarchy of charts for operations, phases and control steps; that it allow for parallel activities and for actions prompted by alarms and failures; and that it be supplemented by information on recipes, units, etc., and by other representations such as structured language. Computer-based drafting aids are invaluable in creating flowcharts.

The sequential function chart has been developed expressly to describe sequential control and has three basic features: (1) steps, (2) transitions and (3) directed links. A step is an action and ends with a conditional transition. If the condition is satisfied, control passes to the next step. This latter step then becomes active and the previous step inactive. A directed link creates a sequence from steps and transitions. Figure 13.5 shows a sequential function chart together with the standard symbols used in the creation of such charts.
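The step/transition mechanism can be mimicked in a few lines. This is a sketch with invented step names and a simplified linear chain, not an implementation of any standard chart language.

```python
def run_chart(steps, state):
    """Run a linear chain of (name, action, transition) triples.
    Each action mutates `state`, and the transition predicate that follows
    must be satisfied before control passes to the next step; here each
    action is assumed to establish its own transition condition."""
    trace = []
    for name, action, done in steps:
        action(state)                         # step becomes active
        if not done(state):                   # conditional transition
            raise RuntimeError(f"transition after {name!r} not satisfied")
        trace.append(name)                    # step becomes inactive
    return trace

# Hypothetical charging sequence
state = {"valve_open": False, "charged_kg": 0.0}
steps = [
    ("open valve",   lambda s: s.update(valve_open=True),
                     lambda s: s["valve_open"]),
    ("charge 50 kg", lambda s: s.update(charged_kg=50.0),
                     lambda s: s["charged_kg"] >= 50.0),
    ("close valve",  lambda s: s.update(valve_open=False),
                     lambda s: not s["valve_open"]),
]
trace = run_chart(steps, state)
```

A real chart also allows parallel branches and selection between branches, which this linear sketch omits.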

With regard to the use of structured language, Rosenof and Ghosh advise that: (1) simple statements should be used; (2) the required function should be clearly defined in a statement; (3) the plant hardware addressed should, where possible, be identified; (4) text should be indented where necessary; (5) negative logic should be avoided; and (6) excessive nested logic should be avoided.

13.5.3 Structure of batch processing
The overall structure of batch processing is commonly represented as a hierarchy. The following structure and terminology by Rosenof and Ghosh (1987) is widely used:

Production planning → Production scheduling → Recipe management → Batch management → Sequential control → Discrete/regulatory control → Process interlocks → Safety interlocks


Figure 13.4 A typical batch plant (Sawyer, 1993) (Courtesy of the Institution of Chemical Engineers)


A treatment of batch processing as a form of computer integrated manufacturing is given in Section 13.7.

13.5.4 Batch control systems
Batch processing may be controlled by the process operator, by a system of single controllers, by a programmable logic controller (PLC) system, by a distributed control system (DCS) or by a centralized control system (CCS). The selection of the system architecture and hardware is discussed by Sawyer (1993).

Recommendations for batch control have been made in Europe by the NAMUR committee (1985), which addresses particularly the need for standard terminology and for a hierarchical structure of the control system which reflects that of batch processing itself.

In the USA, guidance is available in the form of ISA SP88: 1988 Batch Control Systems.

13.6 Control of Particular Units

The safe operation of process units is critically dependent on their control systems. Two particularly important features of control in process plant are (1) compressor control and (2) chemical reactor control. These are now considered in turn.

13.6.1 Compressor control
Centrifugal and axial compressors are subject to the phenomenon of surging. Surging occurs when flow through the compressor falls to a critical value so that a momentary reversal of flow occurs. This reversal of flow tends to lower the discharge pressure, and normal flow resumes. The surge cycle is then repeated. Severe surging causes violent mechanical shock and noise, and can result in complete destruction of parts of the compressor such as the rotor blades.

A typical centrifugal compressor characteristic showing the surge limit is illustrated in Figure 13.6(a). A centrifugal compressor is usually fitted with anti-surge controls which detect any approach to the surge conditions and open the bypass from the delivery to the suction of the machine, thus increasing the flow through the machine and moving it away from the surge conditions.

The compressor delivery and suction pressures Pd and Ps are related to the gas flow Q as follows:

Pd − Ps ∝ Q²    [13.6.1]

The shape of the surge curve is therefore parabolic, as shown in Figure 13.6(a). An expression of this form is generally inconvenient in instrumentation, for which linear relations are preferred. A linear relation can be


Table 13.4 Typical sequential control procedure (Sawyer, 1993a) (Courtesy of the Institution of Chemical Engineers)

Operations    Phases              Control steps

Initialize    Initialize          Start jacket circulation pump. Put reactor temperature
                                  controller in SECONDARY AUTO mode with set point of 120°C

Weigh         Weigh Component 3   Initialize (tare-off weigh tank). Open outlet valve from
                                  head tank. When weight of Component 3 equals preset, close
                                  outlet valve from head tank

Charge        Add Component 3     Open outlet valve from weigh tank. When enough of
                                  Component 3 has been added, start the agitator. When weigh
                                  tank is empty, close outlet valve

              Add Component 1     Initialize (reset flow totalizer to zero). Open outlet
                                  valves from head tank to flowmeter and from flowmeter to
                                  reactor. When volume of Component 1 charged equals preset,
                                  close outlet valves

React         Heat                Initialize (put reactor temperature controller in CASCADE
                                  mode with set point of 120°C)

              Hold                Initialize (reset timer). Start timer

              Sample

Discharge     Cool                Initialize (set reactor temperature set point to 35°C)

              Transfer            Initialize (set reactor outlet valves to correct
                                  destination, i.e. storage tank). Start discharge pump. Set
                                  reactor temperature controller to MANUAL mode with output
                                  at zero (full cooling). Before agitator blades are
                                  uncovered, stop agitator. When reactor is empty, close
                                  reactor outlet valves, stop discharge pump, stop jacket
                                  circulation pump


obtained by making use of the relation for the pressure drop ΔP across the orifice flowmeter on the compressor suction:

Q² ∝ ΔP    [13.6.2]

Hence, from relations 13.6.1 and 13.6.2,

Pd − Ps ∝ ΔP    [13.6.3]

Figure 13.6(b) shows the compressor characteristics redrawn in terms of this pressure drop. The surge condition is now given by a straight line. The anti-surge control system is set to operate on a line somewhat in advance of the surge limit, as shown in Figure 13.6(b). The anti-surge controller is usually a P + I controller and, since it operates only intermittently, it needs to have arrangements to counteract integral saturation.
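Relations 13.6.1 to 13.6.3 reduce the anti-surge decision to a linear comparison between two measured pressures and an orifice pressure drop. A minimal sketch follows; the constant k and the 10% margin are illustrative assumptions, not values from the text.

```python
def bypass_should_open(pd, ps, dp_orifice, k, margin=1.1):
    """Anti-surge logic following relations 13.6.1-13.6.3.  Since
    Pd - Ps is proportional to Q^2 at surge, and the suction-orifice
    pressure drop dp_orifice is also proportional to Q^2, surge occurs
    when dp_orifice falls to (pd - ps) / k.  The control line is set
    `margin` in advance of the surge line, so the bypass opens while
    dp_orifice is below margin * (pd - ps) / k.  All values in
    consistent pressure units; k and margin are illustrative."""
    return dp_orifice < margin * (pd - ps) / k

# With k = 2.0, the surge line for Pd - Ps = 6.0 sits at dp = 3.0 and
# the control line at 3.3: a reading of 3.2 opens the bypass, 4.0 does not.
near_surge = bypass_should_open(8.0, 2.0, 3.2, k=2.0)
safe_flow = bypass_should_open(8.0, 2.0, 4.0, k=2.0)
```

In practice the output would drive the bypass valve continuously through the P + I controller rather than as an on/off decision; the comparison above shows only the linearized surge-line test.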

Accounts of centrifugal and axial compressor control are given by Claude (1959), R.N. Brown (1964), Daze (1965), Hatton (1967) and Magliozzi (1967), and accounts of reciprocating compressor control are given by Hagler (1960) and Labrow (1968). Multi-stage compressor control is discussed by D.F. Baker (1982), Maceyka (1983) and Rana (1985), and control of compressors in parallel by Nisenfeld and Cho (1978) and B. Fisher (1984).

13.6.2 Chemical reactor control
The basic characteristics of chemical reactors have already been described in Chapter 11, in which, in particular, an account was given of the stability and control of a continuous stirred tank reactor. It is


Figure 13.5 Sequential function chart (Sawyer, 1993): (a) chart for control steps ADD COMPONENT; and (b) basic symbols (Courtesy of the Institution of Chemical Engineers)


appropriate here to consider some additional aspects of reactor control.

A continuous stirred tank reactor is generally stable under open-loop conditions, but in some cases a reactor may be unstable under open-loop but stable under closed-loop conditions. Some polymerization reactors and some fluidized bed reactors may be open-loop unstable under certain conditions.

The reactor should be designed so that it is open-loop stable unless there is good reason to the contrary. One method of achieving this is to use jacket cooling with a large heat transfer area. Another is to cool by vaporization of the liquid in the reactor. This latter method gives a virtually isothermal reactor.

If the reactor is or may be open-loop unstable, the control system should be very carefully designed. The responses of the controls should be fast. One method of achieving this is the use of cascade control of the reactor temperature to the coolant temperature. The dead time should be minimized. A high coolant flow assists in reducing dead time.

Continuous stirred tank reactors and batch reactors have their own characteristic control problems. Some of the control problems of continuous stirred tank reactors are as follows. A reaction in a continuous reactor is often carried out in a single phase in one pass. This requires accurate control of the feed flows to the reactor. Failure to achieve such control may have effects such as unconverted reactant leaving the reactor, undesirable side reactions or rapid corrosion.

It is often possible for impurities to build up in a continuous reactor. Where this is the case, arrangements should be made to purge the impurities. If the reactants to a continuous reactor need to be preheated, this should be done before they are mixed, unless the reaction requires a catalyst. A continuous reactor is sometimes provided with regenerative preheating. It should be borne in mind that such preheating constitutes a form of positive feedback.

As described in Chapter 11, batch reactors are of two broad types. In the first, the 'all-up' batch reactor, the main reactants are all charged at the start. In the semi-batch reactor one reactant is not charged initially but is fed continuously.

The reaction mass in a batch reactor cannot necessarily be assumed to be completely mixed. It is not uncommon for there to be inhomogeneities, hot spots and so on. This has obvious implications for reactor control.

Some of the control problems of batch reactors are as follows. In a typical all-up batch cycle the reactants and catalyst are charged, the charge is heated to reaction temperature, and the reaction mass is then cooled and discharged. In some cases the reaction stage is followed by a curing stage, which may be at a temperature below or above the reaction temperature.

In the initial heating-up period the temperature of the charge should be brought up to the operating point rapidly, but it should not overshoot. If the reactor temperature is controlled by an ordinary three-term controller, integral saturation in the controller will cause overshoot. It is necessary, therefore, to employ a controller which is modified to avoid this. Alternatively, the heating up may be controlled in some other way which avoids overshoot. Once the reaction is under way in a batch reactor the initial heat release is large. The cooling system should be adequate for this peak heat release.
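One common modification of this kind is conditional integration (integral clamping). The sketch below is a generic illustration with invented gains and scan interval, not a recommendation for any particular reactor.

```python
def pi_scan(setpoint, pv, state, kp=2.0, ti=50.0, dt=1.0,
            out_min=0.0, out_max=100.0):
    """One scan of a PI controller with conditional integration: the
    integral term is frozen while the output is saturated, so it cannot
    wind up during a long heat-up and cause temperature overshoot.
    `state` carries the integral between scans; gains are illustrative."""
    err = setpoint - pv
    out = kp * (err + state["integral"] / ti)
    if out_min < out < out_max:        # integrate only when unsaturated
        state["integral"] += err * dt
    return min(max(out, out_min), out_max)

state = {"integral": 0.0}
out_heatup = pi_scan(120.0, 20.0, state)   # far below set point: output saturated
out_near = pi_scan(120.0, 119.0, state)    # close to set point: unsaturated
```

During the saturated heat-up phase the integral stays at zero, so when the charge approaches the set point the controller output falls back promptly instead of overshooting.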

Semi-batch reactors have different problems. The addition of the continuously fed reactant before the batch is up to temperature should be avoided, otherwise it is liable to accumulate and then to react rapidly when the operating temperature is reached.

If agitation is interrupted and then resumed, there may be a sudden and violent reaction of reactants which have accumulated. There should be suitable alarms, trips and interlocks to signal loss of agitation, to cut off the feed of reactant, and to ensure an appropriate restart sequence.

In both types of reactor there should be arrangements to prevent material from the reactor passing back into reactant storage tanks where this could constitute a hazard. The control of flows in the reactant feed pipes is


Figure 13.6 Centrifugal compressor characteristics illustrating surge and anti-surge control: (a) conventional characteristic; (b) characteristic for anti-surge control


important. It is necessary to ensure tight shut-off of the reactants and to prevent flow from the reactor into the reactant feed system.

The reactor should be provided with suitable display and alarm instrumentation, so that the process operator has full information on the state of the reactor. Important variables are typically the flows of the reactants and of the coolant, the pressure in the reactor and the temperatures of the reactor and of the coolant. Important statuses are the states of the agitator, of the pumps and of the valves.

The reactor should have a control system which is fully effective in preventing a reaction runaway. The main reactor control is usually based either on reactor temperature or on reactor pressure. The dynamic response of the loop is especially important. There should be adequate potential correction on the control loops. In other words, the steady-state gain between the manipulated variable and the controlled variable should be high enough to ensure that control of the latter is physically possible.

The instrumentation should possess both capability and reliability for the duty. Important aspects of capability are accuracy and dynamic response. The effects of instrument failure should be fully considered. In particular, failure in the measurement and control of the main variable, which is usually temperature, should be assessed. The ease of detection of instrument malfunction by the process operator should also be considered. Factors which assist in malfunction detection include the use of measuring instruments with a continuous range rather than a binary output and the provision of recorders and of indications of valve position.

Trip systems should be provided to deal with potentially hazardous conditions. These typically include loss of feed, loss of coolant, loss of agitation and rise in reactor temperature. Emergency shut-down arrangements for reactors are discussed in Chapter 11.

Use should be made of interlocks to ensure that critical sequences which have to be carried out on the reactor are executed safely and to prevent actions which are not permissible. Many of these control functions are facilitated by the use of a process control computer. A fuller discussion of instrumentation is given in Section 13.8.

13.7 Computer Integrated Manufacturing

There is now a strong trend in the process industries to integrate the business and plant control functions in a total system of computer integrated manufacturing (CIM). Accounts of CIM are given by T.J. Williams (1989), Canfield and Nair (1992), Conley and Clerico (1992), Mehta (1992), Nair and Canfield (1992), Bernstein et al. (1993) and Koppel (1993).

The aim of CIM is essentially to obtain a flexible and optimal response to changes in market demand, on the one hand, and to plant capabilities on the other. It has been common practice for many years for production plans to be formulated and production schedules to be produced by computer and for these schedules to be passed down to the plant. In refineries, use of large scheduling programs is widespread. In addition to flexibility, other benefits claimed are improved product quality, higher throughputs, lower costs and greater safety.

A characteristic feature of CIM is that information also flows the other way, i.e. up from the plant to the planning function. This provides the latter with a continuous flow of up-to-date information on the capability of the plant so that the schedule can be modified to produce the optimal solution. A CIM system may therefore carry out not only the process control and quality control but also scheduling, inventory control, customer order processing and accounting functions.

The architecture of a CIM system is generally hierarchical and distributed. Treatments of such architecture are given in Controlling Automated Manufacturing Systems (O'Grady, 1986) and by Dempster et al. (1981).

For such a system to be effective it is necessary that the data passing up from the plant be of high quality. The system needs to have a full model of the plant, including the mass and energy balances and the states and capabilities of the equipment. This involves various forms of model-based control, which is of such prominence in CIM that the two are sometimes treated as if they were equivalent.

Plant data are corrupted by noise and errors of various kinds, and in order to obtain a consistent data set it is necessary to perform data reconciliation. Methods based on estimation theory and other techniques are used to achieve this. Complete and rigorous model-based reconciliation (CRMR) is therefore a feature of CIM. Data reconciliation is discussed further in Chapter 30. One implication of CIM is that the plant is run under much tighter control, which should be beneficial to safety.
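For a single linear balance, least-squares reconciliation has a simple closed form. The following is a toy sketch for a stream split F1 = F2 + F3, assuming equal measurement variances; the flow values are invented for illustration.

```python
def reconcile(meas, a=(1.0, -1.0, -1.0)):
    """Least-squares data reconciliation for one linear constraint
    a . x = 0 (here the mass balance F1 - F2 - F3 = 0), with equal
    measurement variances: x_hat = x - a * (a . x) / (a . a).
    The adjusted values satisfy the balance exactly while staying as
    close as possible (in least squares) to the raw measurements."""
    imbalance = sum(ai * xi for ai, xi in zip(a, meas))
    norm = sum(ai * ai for ai in a)
    return [xi - ai * imbalance / norm for ai, xi in zip(a, meas)]

# Raw measurements fail to close by 5 units; reconciliation spreads
# the correction over the three flows.
flows = reconcile([100.0, 60.0, 45.0])
```

Industrial reconciliation handles many constraints and unequal variances with weighted least squares or estimation-theoretic methods, but the principle is the same.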

13.7.1 Batch plants
Batch processing involves not only sequential operations but also a high degree of variability of equipment states and is particularly suited to CIM. Accounts of integrated batch processing include those by Rosenof (1982b), Armstrong and Coe (1983), Rippin (1983), Severns and Hedrick (1983), Bristol (1985), Krigman (1985), Egli and Rippin (1986), Kondili, Pantelides and Sargent (1988), Cott and Macchietto (1989) and Crooks, Kuriyan and Macchietto (1992).

In the system described by Cott and Macchietto (1989), use is made of three levels of control, which are, in descending order: plant level control, batch level control and resource level control, operating respectively on typical time-scales of days, minutes and seconds. A comprehensive approach to batch processing requires the integration of tools for plant design, automation and operating procedures.

13.8 Instrument Failure

Process plants are dependent on complex control systems, and instrument failures may have serious effects. It is helpful to consider first the ways in which instruments are used. These may be summarized as follows:


Instrument              System application

Measuring instrument    Input to:
                          display system (measurement/status/alarm)
                          control loop
                          trip system
                          computer model

Control element         Output from:
                          control loop
                          trip system

Measuring instruments are taken to include those with digital as well as analogue outputs. Control elements are normally control valves, but can include power cylinders, motors, etc.

The important point is that some of these applications constitute a more severe test of the instrumentation than others. The accuracy of a flowmeter may be sufficient for flow control, but it may not be good enough for an input to a mass balance model in a computer. The dynamic response of a thermocouple may be adequate for a panel display, but it may be quite unacceptable in a trip system.

This leads directly, of course, to the question of the definition of failure. In the following sections various kinds of failure are considered. It is sufficient here to emphasize that the reliability of an instrument depends on the definition of failure, and may vary depending on the application.

13.8.1 Overall failure rates
There are more data on the failure rates of instrumentation than on most other types of plant equipment. It is now usually possible to obtain sufficient data for assessment purposes, though there are inevitably some gaps. There are two types of failure data on instruments. The first relates to performance in standard instrument tests and the second to performance on process plant. It is the latter which is of primary interest here.

Many of the data on the failure rates of instruments on process plants derive from the work of the UK Atomic Energy Authority (UKAEA). Table 13.5 gives data quoted in early investigations by UKAEA workers. Table 13.6 shows data from another early survey by Anyakora, Engel and Lees (1971) in three works in the chemical industry. The first (works A) was a large works producing a wide range of heavy organic chemicals. The second (works B) made heavy inorganic chemicals. The third (works C) was two sites in a glass works. The failures were defined as and derived from job requests from the process operators. The failure rates were calculated on the assumption of a constant failure rate. The environment factor quoted in the table is explained below.
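The constant-failure-rate calculation behind such survey tables is simply faults observed per cumulative instrument-year of exposure. For example, the control valve entry in Table 13.6 (447 faults over 747 instrument-years) gives the tabulated 0.60 faults/year:

```python
def failure_rate(n_faults, instrument_years):
    """Constant-failure-rate estimate used in instrument surveys:
    lambda = faults observed / cumulative exposure (instrument-years)."""
    return n_faults / instrument_years

# Control valve row of Table 13.6: 447 faults in 747 instrument-years
control_valve_rate = failure_rate(447, 747)   # about 0.60 faults/year
```

The same calculation applies to every row of Tables 13.5 to 13.8; the constant-rate assumption ignores any early-life or wear-out behaviour.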

The failure rates given in Table 13.6 are in broad agreement with other work published about the same time, such as that of Skala (1974). It should be


Table 13.5 Some data on instrument failure rates published by the UKAEA

Instrument                                  Failure rate (faults/year)   Reference(a)

Control valve(b)                            0.25                         1–4
                                            0.26                         5, 6
Solenoid valve                              0.26                         5, 6
Pressure relief valve                       0.022                        5, 6
Hand valve                                  0.13                         5, 6
Differential pressure transmitter(b)        0.76                         1–4, 7, 8
Variable area flowmeter transmitter(b)      0.68                         1–4, 8
Thermocouple                                0.088                        5, 6
Temperature trip amplifier:
  type A                                    2.6                          7
  type B                                    1.7                          7
Pressure switch                             0.14                         1–4, 8
Pressure gauge                              0.088                        5, 6
O2 analyser                                 2.5                          1, 2, 4, 8
Controller(b)                               0.38                         7
Indicator (moving coil meter)               0.026                        5, 6, 8
Recorder (strip chart)                      0.22                         5, 6, 8
Lamp (indicator)                            0.044                        5, 6, 8
Photoelectric cell                          0.13                         5, 6
Tachometer                                  0.044                        5, 6
Stepper motor                               0.044                        5, 6
Relay(b)                                    0.17                         7
Relay (Post Office)                         0.018                        5, 6

(a) (1) Hensley (1967 UKAEA AHSB(S) R136); (2) Hensley (1968); (3) Hensley (1969 UKAEA AHSB(S) R178); (4) Hensley (1973 SRS/GR/1); (5) Green and Bourne (1966 UKAEA AHSB(S) R117); (6) A.E. Green and Bourne (1972); (7) Eames (1966); (8) Green (1966 UKAEA AHSB(S) R113). Rates are variously observed or assumed/predicted values.
(b) Pneumatic.


emphasized that of the failures given in the tables only a very small proportion resulted in a serious plant condition. In most cases the failures were detected by the process operator, who then called in the instrument maintenance personnel.

The failure rates quoted are those for normal commercial instruments in the process industries. In certain other applications where higher instrument costs are acceptable, the failure rates are lower. Thus instruments used in some defence applications are an order of


Table 13.6 Some instrument failure rate data from three chemical works (Anyakora, Engel and Lees, 1971) (Courtesy of the Institution of Chemical Engineers)

Instrument                                        No. at   Instrument   Environment   No. of   Failure rate
                                                  risk     years        factor        faults   (faults/year)

Control valve                                     1531     747          2             447      0.60
Power cylinder                                    98       39.9         2             31       0.78
Valve positioner                                  334      158          1             69       0.44
Solenoid valve                                    252      113          1             48       0.42
Current/pressure transducer                       200      87.3         1             43       0.49
Pressure measurement                              233      87.9         3             124      1.41
Flow measurement (fluids):                        1942     943          3             1069     1.14
  Differential pressure transducer                636      324          3             559      1.73
  Transmitting variable area flowmeter            100      47.7         3             48       1.01
  Indicating variable area flowmeter              857      409          3             137      0.34
  Magnetic flowmeter                              15       5.98         4             13       2.18
Flow measurement (solids):
  Load cell                                       45       17.9         –             67       3.75
  Belt speed measurement and control              19       7.58         –             116      15.3
Level measurement (liquids):                      421      193          4             327      1.70
  Differential pressure transducer                130      62           4             106      1.71
  Float-type level transducer                     158      75.3         4             124      1.64
  Capacitance-type level transducer               28       13.4         4             3        0.22
  Electrical conductivity probes                  100      39.8         4             94       2.36
Level measurement (solids)                        11       4.38         –             30       6.86
Temperature measurement (excluding pyrometers):   2579     1225         3             425      0.35
  Thermocouple                                    772      369          3             191      0.52
  Resistance thermometer                          479      227          3             92       0.41
  Mercury-in-steel thermometer                    1001     477          2             13       0.027
  Vapour pressure bulb                            27       10.7         4             4        0.37
  Temperature transducer                          300      142          3             124      0.88
Radiation pyrometer                               43       30.9         4             67       2.17
Optical pyrometer                                 4        3.4          4             33       9.70
Controller                                        1192     575          1             164      0.29
Pressure switch                                   549      259          2             87       0.34
Flow switch                                       9        3.59         –             4        1.12
Speed switch                                      6        2.39         –             0        –
Monitor switch                                    16       6.38         –             0        –
Flame failure detector                            45       21.3         3             36       1.69
Millivolt-current transducer                      12       4.78         –             8        1.67
Analyser:                                         86       39.0         –             331      8.49
  pH meter                                        34       15.8         –             93       5.88
  Gas–liquid chromatograph                        8        3.43         –             105      30.6
  O2 analyser                                     12       5.67         –             32       5.65
  CO2 analyser                                    4        1.90         –             20       10.5
  H2 analyser                                     11       5.04         –             5        0.99
  H2O analyser (in gases)                         3        1.38         –             11       8.00
  Infrared liquid analyser                        3        1.43         –             2        1.40
  Electrical conductivity meter (for liquids)     5        1.99         –             33       16.70
  Electrical conductivity meter (for water
    in solids)                                    3        1.20         –             17       14.2
  Water hardness meter                            3        1.20         –             13       10.9
Impulse lines                                     1099     539          3             416      0.77
Controller settings                               1231     609          –             84       0.14


magnitude more expensive, but have a much higher reliability. Further data on instrument failure rates are given in Appendix 14.

13.8.2 Factors affecting failure
Some of the factors which affect instrument failure are listed below.

(1) System context:
    (a) application (display, control, etc.);
    (b) specification (accuracy, response, etc.);
    (c) definition of failure.
(2) Installation practices.
(3) Environmental factors – process materials:
    (a) degree of contact (control room, plant);
    (b) material phase (gas, liquid, solid);
    (c) cleanliness;
    (d) temperature;
    (e) pressure;
    (f) corrosion;
    (g) erosion.
(4) Environmental factors – ambient and plant conditions:
    (a) temperature;
    (b) humidity;
    (c) dust;
    (d) frost exposure;
    (e) vibration;
    (f) impact exposure.
(5) Operating factors:
    (a) movement, cycling.
(6) Maintenance practices.

There is little information available on which to assess the effect of these factors. In the survey by Anyakora, Engel and Lees an attempt was made to assess the effect of environment, defining this rather loosely in terms of both ambient conditions and process materials. Two approaches were tried. One was to compare the effect of being or not being in contact with process fluids. Table 13.7 shows this effect for two groups of instruments, one consisting of those which are in contact and one consisting of those which are not. The instruments which are not in contact with process fluids show a much lower failure rate, although control valves and temperature measurements are exceptions.

The other approach, which was applied to instruments which are in contact with process fluids, was to distinguish between 'clean' and 'dirty' fluids. A fluid was regarded as dirty if it contained 'gunk', polymerized, corroded, etc. Table 13.8 gives data for instruments in these two cases.

From this work it was concluded, as a first approximation and for the instruments considered, that the severity of the environment of an instrument depends on the


Table 13.7 Effect of environment on instrument reliability: instruments in contact with and not in contact with process fluids (Anyakora, Engel and Lees, 1971) (Courtesy of the Institution of Chemical Engineers)

Instrument                                        No. at risk   No. of faults   Failure rate
                                                                                (faults/year)

Instruments in contact with process fluids:       2285          1252            1.15
  Pressure measurement                            193           89              0.97
  Level measurement                               316           233             1.55
  Flow measurement                                1733          902             1.09
  Flame failure device                            43            28              1.37

Instruments not in contact with process fluids:   2179          317             0.31
  Valve positioner                                320           62              0.41
  Solenoid valve                                  168           24              0.30
  Current-pressure transducer                     89            23              0.54
  Controller                                      1083          133             0.26
  Pressure switch                                 519           75              0.30

Control valve                                     1330          359             0.57
Temperature measurement                           2391          326             0.29

Table 13.8 Effect of environment on instrument reliability: instruments in contact with clean and dirty fluids (Anyakora, Engel and Lees, 1971) (Courtesy of the Institution of Chemical Engineers)

Instrument                            No. at risk   No. of faults   Failure rate
                                                                    (faults/year)

Control valve:
  Clean fluids                        214           17              0.17
  Dirty fluids                        167           71              0.89
Differential pressure transmitter:
  Clean fluids                        27            5               0.39
  Dirty fluids                        90            82              1.91


aggressiveness of any process materials with which it is in contact and that other factors are generally of secondary importance.

If the failure rate is taken to be the product of a base failure rate and an environment factor, then Tables 13.7 and 13.8 suggest that a maximum value of about 4 is appropriate. Environment factors are given in Table 13.6; the failure rates given in the table are the original data and should be divided by the environment factor to give the base failure rate. It should also be noted that the sampling/impulse line failure rates given in Table 13.6 should be added to the failure rates of the instruments themselves to obtain the failure rates of installations.
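These two adjustments can be expressed directly. The numbers below come from Table 13.6: the control valve (0.60 faults/year, environment factor 2), pressure measurement (1.41 faults/year) and the impulse line rate of 0.77 faults/year.

```python
def base_failure_rate(observed_rate, env_factor):
    """Recover the base failure rate from an observed (in-service)
    rate by dividing out the environment factor."""
    return observed_rate / env_factor

def installation_rate(instrument_rate, impulse_line_rate=0.77):
    """Failure rate of an installation = instrument rate plus the
    sampling/impulse line rate (0.77 faults/year in Table 13.6)."""
    return instrument_rate + impulse_line_rate

cv_base = base_failure_rate(0.60, 2)     # control valve: 0.30 faults/year
pm_install = installation_rate(1.41)     # pressure measurement installation
```

The impulse-line addition applies only where the instrument is actually connected through such lines, which the analyst must judge case by case.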

13.8.3 Failure modes
The overall failure rate of an instrument gives only limited information. It is often necessary to know its failure modes. Failure modes can be classified in several ways. Some important categories are (1) condition, (2) performance, (3) safety and (4) detection.

In a failure classification based on condition, a failure mode is exemplified by a faulty bellows on a flowmeter or a broken diaphragm in a control valve. In a classification by performance, illustrations of failure are


Table 13.9 Failure modes of some instruments (Lees, 1973b) (Courtesy of the Institution of Chemical Engineers)

Instrument failure mode            No. of faults

Control valve:
  Leakage                          54
  Failure to move freely:
    sticking (but moving)          28
    seized up                      7
    not opening                    5
    not seating                    3
  Blockage                         27
  Failure to shut off flow         14
  Glands repacked/tightened        12
  Diaphragm fault                  6
  Valve greased                    5
  General faults                   27

Thermocouple:
  Thermocouple element faults      24
  Pocket faults                    11
  General faults                   20

Table 13.10 Failure modes of some instruments defined by performance

                                                               Reference

Level measurement and alarm:                                   Lawley (1974b)
  Failure rate (faults/year):
    Level indicator fails to danger                     2
    High level alarm fails                              0.2
  Probability:
    Operator fails to observe level indicator
      or take action                                    0.04
    Operator fails to observe level alarm
      or take action                                    0.03

Flow measurement and control:                                  S.B. Gibson (1977b)
  For an FRC where high flow is undesirable:
    Failure rate (faults/year):
      Flow element fails giving low reading             0.1
      Flow transmitter fails giving low reading         0.5
      Flow recorder controller fails calling
        for more flow                                   0.4
      Flow control valve fails towards open position    0.1
    Fractional dead time:
      High flow trip fails to operate                   0.01
  For an FRC where low flow is undesirable:
    Failure rate (faults/year):
      Flow element fails giving high reading            0.2
      Flow transmitter fails giving high reading        0.4
      Flow recorder controller fails calling
        for less flow                                   0.4
      Flow control valve fails towards closed position  0.2
    Fractional dead time:
      Low flow trip fails to operate                    0.01
      Low flow trip left aborted after start-up         0.01

Manual and control valves:                                     Lawley (1974b)
  Failure rate (faults/year):
    Manual isolation valve wrongly closed               0.05 and 0.1
    Control valve fails open or misdirected open        0.5
    Control valve fails shut or misdirected shut        0.5

FRC, flow recorder controller


a zero error in a flowmeter or the passing of fluid when shut by a control valve. A performance classification emphasizes effects and a condition classification emphasizes causes, but the distinction is not rigid: a blockage in a control valve could reasonably be classed either way. The safety classification divides faults into fail-safe and fail-dangerous. The detection classification distinguishes between revealed and unrevealed faults: a revealed fault signals its presence and is at once detectable; an unrevealed fault is not immediately detectable, but is usually detected by a proof check.

Condition and performance may be regarded as theprimary types of failure. Safety and detection modes maybe obtained from these and from the system context ofthe instrument.

Table 13.9 shows some data obtained by Lees (1973b) in the survey already described on failure modes in thermocouples and control valves. These failure modes are essentially classified by condition, although the condition is often revealed by a performance failure. Similar data for other instruments are given in the original paper. Some data used in reliability studies described in the literature on failure modes of instruments and control loops are shown in Table 13.10.

If information is available on overall but not on mode failure rates, it is sometimes assumed that about one-third of the faults are in the fail-dangerous mode. The safety and detection failure modes of the temperature trip amplifier shown in Figure 13.7 have been analysed by Eames (UKAEA 1973 SRS/GR/12) as shown in Table 13.11. The fault is described by a four-character code. The first character indicates that the fault is fail-safe (S), fail-dangerous (D) or a calibration shift in the dangerous direction (C). The second is the number of the equipment adversely affected by the fault; the numbers 1–5 refer, respectively, to the main trip, the excess margin alarm, the low margin alarm, the indicating meter shown in the figure and the indicating lamp (not shown). The third character indicates that the fault is revealed (r) or unrevealed (u). The fourth is the number of the equipment which reveals the fault; the numbering code is as before. The various failure rates of the equipment are as follows:

Faults                                Failure rate
                                      (faults/10⁶ h)   (faults/year)

Fail-dangerous (D1)                        9.85            0.086
Fail-dangerous, unrevealed (D1u)           4.6             0.040
Total                                    145.5             1.27

Thus the total fail-dangerous and fail-dangerous unrevealed faults are, respectively, 6.8% and 3.2% of the total faults, which is one measure of the success of the fail-safe design of the equipment.

13.8.4 Prediction of failure rates
It is sometimes necessary to know the failure rate of an instrument for which field data are not available. To meet this situation, methods have been developed for estimating the failure rate of an instrument from those of its constituent parts. Table 13.12 shows part of a prediction by Hensley (UKAEA 1969 AHSB(S) R178) of the failure rate of a pressure switch.
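A minimal sketch of this parts-count estimate in Python; the function and constant names are mine, and the component figures are those shown in Table 13.12:

```python
# Parts-count prediction: the instrument failure rate is estimated as
# the sum of the failure rates of its constituent parts, split into
# fail-dangerous and fail-safe categories.

HOURS_PER_YEAR = 8760.0

# (component, fault, category, failure rate in faults/10^6 h)
components = [
    ("spring",       "fracture", "dangerous", 0.2),
    ("bellows",      "rupture",  "safe",      5.0),
    ("screws+pivot", "loosen",   "dangerous", 1.0),
    ("microswitch",  "random",   "dangerous", 0.5),   # 25% of random faults
    ("microswitch",  "random",   "safe",      1.5),   # 75% of random faults
]

def predicted_rate(parts, category=None):
    """Sum component failure rates, optionally for one fault category."""
    return sum(rate for _, _, cat, rate in parts
               if category is None or cat == category)

dangerous = predicted_rate(components, "dangerous")   # 1.7 faults/10^6 h
safe = predicted_rate(components, "safe")             # 6.5 faults/10^6 h
total = predicted_rate(components)                    # 8.2 faults/10^6 h

# Conversion to faults/year, as in the last row of Table 13.12:
dangerous_per_year = dangerous * HOURS_PER_YEAR / 1e6
```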

A comparison of some observed and predicted failure rates is given in Table 13.13. It can be seen that the agreement is quite good. A more quantitative measure of the effectiveness of the technique is Figure 13.8.

07:23 7/11/00 Ref: 3723 LEES ± Loss Prevention in the Process Industries Chapter 13 Page No. 23

Figure 13.7 Temperature trip amplifier (Eames, 1973 UKAEA SRS/GR/12) (Courtesy of the UK Atomic Energy Authority, Systems Reliability Directorate)

Table 13.11 Failure modes and rates of a temperature trip amplifier (Eames, 1973 UKAEA SRS/GR/12) (Courtesy of the UK Atomic Energy Authority, Systems Reliability Directorate)

Failure rate (faults/10⁶ h)

Sr1   53.89      D1r2   4.85      D2r3   9.95      C1u   3.15
Sr2    2.1       D1r3   0.4       D2r4   0.15      C2u   6.79
Sr3    7.4       D1u    4.6       D2u    6.0       C3u   5.77
Sr5    5.0                        D3r2   0.35
Su    15.84                       D3u    3.3
                                  D4r4   5.8
                                  D5u   10.2

Total S 84.23    Total D1 9.85    Total other D 35.75    Total C 15.71


Figure 13.8 is given by A.E. Green and Bourne (1972) and shows the ratio of observed to predicted failure rates for a number of equipments. The median value of this ratio is 0.76, and the probabilities of the ratio being within factors of 2 and 4 of this value are 70% and 96%, respectively.

13.8.5 Loop failure rates
Data on failure rates of complete control loops have been given by Skala (1974) and are shown in Table 13.14. Loop failure rates can be calculated from the failure rates of the constituent instruments. The failure rates of a loop with a pneumatic flow indicator controller, as calculated from the data in Table 13.5 (UKAEA), as calculated from the data in Table 13.6 (Anyakora, Engel and Lees), and as given by Skala, are shown in Table 13.15.

Again it should be emphasized that, of the loop failures given in these tables, only a very small proportion results in a serious plant upset or trip. In one study of control loop failures on a large chemical plant quoted by M.R. Gibson (1978), it was found that there had been three control loop failures which resulted in plant trips and that the frequency of such failures was one failure every 20 years per loop.
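As a sketch, the additive loop calculation (here with the UKAEA figures of Table 13.15) and the fraction of loop failures that caused trips implied by the Gibson observation can be computed as follows; the names and the final ratio are illustrative, not from the text:

```python
# A control loop is a series system: the loop failure rate is taken as
# the sum of the failure rates of the instruments in the loop.
# Rates below are the UKAEA data of Table 13.15 (faults/year).

loop = {
    "differential pressure transmitter": 0.76,
    "controller": 0.38,
    "control valve": 0.25,
}

loop_rate = sum(loop.values())          # 1.39 faults/year

# With the trip frequency of 1 per 20 loop-years quoted by M.R. Gibson
# (1978), the implied fraction of loop failures causing a plant trip:
trip_fraction = (1.0 / 20.0) / loop_rate   # roughly 4% of loop failures
```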

13.8.6 Detection of failure
If instrument failure occurs, it is important for it to be detected. The ease of detection of an instrument failure depends very much on whether the fault is revealed or unrevealed. Unrevealed faults are generally detectable only by proof testing.

An instrument fault which is revealed is usually detected by the process operator either from the behaviour of the instrument itself or from the effect of the failure on the control system. There are, however, developments in the use of the process computer to detect instrument faults. Fault detection by the operator and by the computer is discussed in Chapters 14 and 30, respectively.

The detection of failure in instruments which have a binary output, such as pressure or level switches, is particularly difficult, because the fault is generally unrevealed, but is particularly important, because such instruments are frequently part of an alarm or trip system. One approach to the problem is to use an instrument with a continuous range output rather than a binary output. Thus a level measuring instrument may be used instead of a level switch. In this way many of the faults on the instrument which would otherwise be unrevealed become revealed.


Table 13.12 Predicted failure rates of a pressure switch (Hensley, 1969 UKAEA AHSB(S) R178) (Courtesy of the UK Atomic Energy Authority, Systems Reliability Directorate)

Component                  Fault      Category        Failure rate (faults/10⁶ h)
                                                      Dangerous   Safe    Total

Spring                     Fracture   Dangerous       0.2
Bellows                    Rupture    Safe                        5.0
Screws + pivot (2 items)   Loosen     Dangerous       1.0
Microswitch                Random     Dangerous 25%   0.5
                                      Safe 75%                    1.5
Total of above 5                                      1.7         6.5
Total for 30 components in instrument                 2.9        11.7    14.6

                                                      (faults/year)
Total for 30 components in instrument                 0.025       0.10    0.13

Table 13.13 Observed and predicted instrument failure rates

Instrument                              Failure rate (faults/year)   Reference^a
                                        Observed      Predicted

Control valve^b                         0.25          0.19           1–3
Differential pressure transmitter^b     0.76          0.45           1–4
Variable area flowmeter transmitter^b   0.68          0.7            1–3
Temperature trip amplifier:
  type A                                2.6           2.8            1, 2, 4
  type B                                1.7           2.1            4
Controller                              0.38          0.87           4
Pressure switch                         0.14          0.13           1–3
Gas analyser                            2.5           3.3            1, 2
Relay^b                                 0.17          0.35           4

a (1) Hensley (1967 UKAEA AHSB(S) R136); (2) Hensley (1968); (3) Hensley (1969 UKAEA AHSB(S) R178); (4) Eames (1966).
b Pneumatic.


13.8.7 Self-checking instruments
Developments are also occurring in instruments which have a self-checking capability. Principles on which such instruments are designed include (1) multiple binary outputs and (2) electrical sensor check.

A self-checking level measuring instrument which uses multiple binary outputs has been described by Hasler and Martin (1971). The instrument has a series of binary output points, which measure the liquid level at different heights. These points provide a mutual check. Thus, for example, if there are 10 points and the liquid level is up to point 5, so that this point gives a positive output, the absence of a positive output from point 4 indicates a failure on that point.

A self-checking level switch in which electronic signals are used to check the state of the sensor has been described by J.O. Green (1978).

Increasingly, instruments are also being provided with the enhanced capabilities available from the incorporation of microprocessors. Self-checking is one such capability.

A general account of such instruments is given in Intelligent Instrumentation (Barney, 1985). A further discussion of intelligent, or smart, instruments is given by the CCPS (1993/14).

13.8.8 Fault-tolerant instrumentation
Instrument systems should have a degree of fault tolerance. The need for fault-tolerant systems has already been mentioned in relation to computer systems, where certain basic principles were outlined. These principles are equally applicable to the design of fault-tolerant instrument systems. Two main features of such instrumentation are redundancy and/or diversity and fail-safe operation. Fault-tolerant design of instrument systems is discussed by Bryant (1976), Ida (1983), Frederickson and Beckman (1990) and the CCPS (1993/14).

13.8.9 Instrument testing
Information is also available on the performance of instruments when subjected to a battery of standard tests. The evaluation of instruments is carried out both by special testing organizations and by major users. In the UK the main organization concerned with instrument evaluation is the Scientific Instrument Research Association (SIRA). Some of the tests carried out by SIRA have been described by Cornish (1978a). The instruments tested are normal production models.


Figure 13.8 Ratio of observed to predicted equipment failure rates (A.E. Green and Bourne, 1972) (Reproduced with permission from Reliability Technology by A.E. Green and J.R. Bourne, Copyright ©, 1972, John Wiley and Sons Inc.)


Results of instrument evaluations by SIRA for the period 1971–76 have been given by Cornish (1978b) and are shown in Table 13.16. The reference conditions are the manufacturer's specification or, where no specification is quoted, an assumed specification based on current practice. The influence conditions refer to variations in electrical power or instrument air supply, high and low temperature, and humidity. These failure rates under test are high, but similar results are apparently obtained in other industrial countries.

13.9 Trip Systems

It is increasingly the practice in situations where a hazardous condition may arise on the plant to provide some form of automatic protective system. One of the principal types of protective system is the trip system, which shuts down the plant, or part of it, if a hazardous condition is detected. Another important type of protective system is the interlock system, which prevents the operator or the automatic control system from following a hazardous sequence of control actions. Interlock systems are discussed in Section 13.10.

Accounts of trip systems are given in Reliability Technology (A.E. Green and Bourne, 1972) and by Hensley (1968), R.M. Stewart (1971), Kletz (1972a), de Heer (1974), Lawley and Kletz (1975), Wells (1980), Barclay (1988), Rushton (1991a,b) and Englund and Grinwis (1992).

The existence of a hazard which may require a protective system is usually revealed either during the design process, which includes, as routine, consideration of protective features, or by hazard identification techniques such as hazop studies.

The decision as to whether a trip system is necessary in a given case depends on the design philosophy. There are quite wide variations in practice on the use of trip systems. There is no doubt, however, about the general trend, which is towards the provision of more comprehensive coverage by trip systems. The problems are considered further in Chapter 14. The decision as to whether to install a trip system can be put on a less subjective basis by making a quantitative assessment of the hazard and of the reliability of the operator in preventing it.


Table 13.14 Control loop failure rates (after Skala, 1974) (Reproduced from Instrument Technology with permission of the publisher, Copyright ©, Instrument Society of America, 1974)

Loop failures (by type of loop):
  Loop    (faults/year)
  PIC     1.15
  PRC     1.29
  FIC     1.51
  FRC     2.14
  LIC     2.37
  LRC     2.25
  TIC     0.94
  TRC     1.99

Loop failures (by frequency per loop):^a
  (% loops)   (faults/year)
  25          0
  34          1
  14          2
  9           3
  5           4
  4           5
  3           6
  2           7
  1           8
  1           9
  0           10
  1           11
  2           12

Loop failures (by element in loop):
  (loop element)     (% faults)
  Sensing/sampling   21
  Transmitter        20
  Transmission       10
  Receiver^b         18
  Controller          7
  Control valve       7
  Other              17

a These data have been read from Figure 2b of the original paper.
b Presumably indicators, recorders.

Table 13.15 Failure rates for a pneumatic flow indicator control loop

                                        (faults/year)
UKAEA data:
  Differential pressure transmitter     0.76
  Controller                            0.38
  Control valve                         0.25
  Total                                 1.39

Anyakora, Engel and Lees' data:
  Impulse lines                         0.26
  Differential pressure transmitter^a   0.58
  Controller^a                          0.29
  Control valve^a                       0.30
  Valve positioner^a                    0.09 (0.2 × 0.44)^b
  Total                                 1.52

Skala's data:
  FIC loop^c                            1.51

a Pneumatic.
b It is assumed that 20% of the valves have positioners.
c FIC, flow indicator controller.

Table 13.16 Instrument test failures (after Cornish, 1978b)

Fault                                                Instruments subject
                                                     to fault (%)

Instruments faulty as received                       21
Outside specification under reference conditions     27
Outside specification under influence conditions     30
Component failure during evaluation                  27
Inadequate handbook/manual                           26
Modification to design or manufacturing method
  after evaluation                                   33


13.9.1 Single-channel trip system
A typical single-channel trip system is shown in Figure 13.9. It consists of a sensor, a trip switch and a trip valve. The configuration of a trip loop is therefore not dissimilar to that of a control loop. The difference is that, whereas the action of a control loop is continuous, that of a trip loop is discrete.

The trip switch may be of a general type, being capable of taking an electronic or pneumatic signal from any type of sensor. Thus in a pneumatic system a pressure switch would serve as the trip switch. Alternatively, the trip switch and the sensor may be combined to give a switch dedicated to a particular variable. Thus common types of trip switch include flow, pressure, temperature, level and limit switches.

13.9.2 Dependability of trip systems
Since a trip system is used to protect against a hazardous condition, it is essential for the system itself to be dependable. The dependability of a trip system depends on (1) capability and (2) reliability. Thus it is necessary both for the system to have the capability of carrying out its function in terms of features such as accuracy, dynamic response, etc., and for it to be reliable in doing so.

The reliability of the trip system may be improved by the use of (1) redundancy and (2) diversity. Thus one approach is to use multiple redundant instruments, which generally give a reliability greater than that of a single instrument. But redundancy is not always the full answer, because there are some dependent failures which may disable the whole set of redundant instruments. This difficulty can be overcome by the use of diversity, which is exemplified by the use of different measurements to detect the same hazard and by the use of different instruments to measure the same variable.

Most trip systems consist of a single channel comprising a sensor, a switch and a shut-off valve, but where the integrity required is higher than that which can be obtained from a single channel, redundancy is generally used.

A trip system should be reliable against functional failure, i.e. failure which prevents the system shutting the plant down when a hazardous condition occurs. Such a condition is not normally present and its rate of occurrence, which is the demand rate on the trip system, is usually very low. Thus functional failures of the system are generally unrevealed failures. The trip system should also be reliable against operational failure, i.e. failure which causes the system to shut the plant down when no hazardous condition exists. Thus operational failures of the system are always revealed failures.

It is the object of trip system design and operation to avoid both loss of protection against the hazardous condition due to functional failure and plant shut-down due to operational failure, or spurious trip. Since functional failure of the system is generally unrevealed, it is necessary to carry out periodic proof testing to detect such failure.

The simpler theoretical treatments of trip systems usually assume that the functional failures are unrevealed and the operational failures revealed, and that the failure rates are constant; this approach is followed here. The treatment draws particularly on the work of A.E. Green and Bourne (1966 UKAEA AHSB(S) R117), some of which was later published by the same authors in Reliability Technology (1972).

13.9.3 Fractional dead time
The fractional dead time (FDT) of an equipment or system gives the probability that it is in a failed state. If the failure of an equipment is revealed, with a revealed failure rate λ, the FDT φ depends on the failure rate and the repair time τ_r:

    φ = λτ_r        λτ_r ≪ 1                    (13.9.1)

For a series system with revealed failure, the FDT φ of the system is related to the FDTs φ_i of the constituent equipments as follows:

    φ = Σ_{i=1}^{n} φ_i        λ_i τ_ri ≪ 1     (13.9.2)

For a parallel system with revealed failure, the FDT φ of the system is related to the FDTs φ_i of the equipments as follows:

    φ = Π_{i=1}^{n} φ_i        λ_i τ_ri ≪ 1     (13.9.3)

For a parallel redundant, or 1/n (1-out-of-n), system with revealed failure, the FDT φ_{1/n} of the system is related to the FDT φ_{1/1} of a single equipment as follows:

    φ_{1/n} = (φ_{1/1})^n                       (13.9.4)

If, however, the failure of an equipment is unrevealed, with an unrevealed failure rate λ, the FDT depends on this failure rate and on the proof test interval τ_p. The probability q of failure within time period t is:

    q = 1 − exp(−λt)                            (13.9.5)

or, for small values of λt:

    q = λt        λt ≪ 1                        (13.9.6)

Then the FDT is:


Figure 13.9 A trip system


    φ = (1/τ_p) ∫₀^{τ_p} q dt                   (13.9.7)

For a 1/n system with unrevealed failure, the FDT φ_{1/n} of the system is obtained from the probability q_{1/n} of failure of the system within the time period t:

    φ_{1/n} = (1/τ_p) ∫₀^{τ_p} q_{1/n} dt       (13.9.8)

A detailed account of fractional dead times is given by A.E. Green and Bourne (1966 UKAEA AHSB(S) R117).
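These definitions can be sketched numerically. The following Python fragment (function names are mine) evaluates the FDT for revealed failures and, for unrevealed failures, both the exact average of Equation 13.9.7 and the λτ_p/2 approximation (Equation 13.9.24):

```python
import math

def fdt_revealed(lam, tau_r):
    """FDT of an equipment whose failures are revealed (Equation 13.9.1);
    valid for lam * tau_r << 1."""
    return lam * tau_r

def fdt_unrevealed(lam, tau_p):
    """FDT of an equipment whose failures are unrevealed, obtained by
    averaging q(t) = 1 - exp(-lam*t) over the proof test interval
    (Equation 13.9.7), here evaluated in closed form."""
    x = lam * tau_p
    return 1.0 - (1.0 - math.exp(-x)) / x

# For lam*tau_p << 1 the exact average tends to lam*tau_p/2:
lam, tau_p = 0.67, 0.0192        # faults/year; 1 week expressed in years
exact = fdt_unrevealed(lam, tau_p)
approx = lam * tau_p / 2.0       # 0.0064, as in the example of Section 13.9.6
```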

13.9.4 Functional reliability of trip systems
Functional failure of a trip system is here assumed to be unrevealed. The failure rate λ used in the equations in this section is that applicable to these unrevealed fail-dangerous faults.

For a simple trip system consisting of a single-channel 1/1 (1-out-of-1) system with a failure rate λ, the probability q of failure within the proof test interval τ_p is:

    q = 1 − exp(−λτ_p)                          (13.9.9)

For small values of λτ_p:

    q = λτ_p        λτ_p ≪ 1                    (13.9.10)

The trip system is required to operate only if a hazardous plant condition occurs. The probability p_δ that such a plant demand, which has a demand rate δ, will occur during the dead time τ_0 after the failure is:

    p_δ = 1 − exp(−δτ_0)                        (13.9.11)

But, on average, the dead time τ_0 is half the proof test interval τ_p:

    τ_0 = τ_p/2                                 (13.9.12)

Hence:

    p_δ = 1 − exp(−δτ_p/2)                      (13.9.13)

For small values of δτ_p:

    p_δ = δτ_p/2        δτ_p ≪ 1                (13.9.14)

The probability p_η that a plant hazard will be realized can be written in terms of the plant hazard rate η:

    p_η = 1 − exp(−ητ_p)                        (13.9.15)

For small values of ητ_p:

    p_η = ητ_p        ητ_p ≪ 1                  (13.9.16)

Frequently, some or all of the approximations of Equations 13.9.10, 13.9.14 and 13.9.16 apply. If all do, then taking

    p_η = q p_δ                                 (13.9.17)

gives

    η = δλτ_p/2        λτ_p ≪ 1; δτ_p ≪ 1; ητ_p ≪ 1    (13.9.18)

If the assumptions underlying Equation 13.9.18 are not valid, an expression which has been commonly used is:

    η = λ p_δ                                   (13.9.19)

Alternatively, the plant hazard rate can be expressed in terms of the FDT φ of the system:

    η = δφ                                      (13.9.20)

As given earlier, the probability q of failure of the simple trip system within the time period t is:

    q = 1 − exp(−λt)                            (13.9.21)

For small values of λt:

    q = λt        λt ≪ 1                        (13.9.22)

Then the FDT φ of the simple trip system is:

    φ = (1/τ_p) ∫₀^{τ_p} q dt                   (13.9.23)

Hence:

    φ = λτ_p/2                                  (13.9.24)

and

    η = δλτ_p/2                                 (13.9.25)

as before.

For a parallel redundant, or 1/n (1-out-of-n), system, or for an m/n (m-out-of-n) system, which may be a majority voting system, the following treatment applies. If there are n equipments of which m must survive for the system to survive and r must fail for the system to fail*:

    r = n − m + 1                               (13.9.26)

The probability q_{m/n} of failure of this system within the proof test interval is (C(n, k) denoting the binomial coefficient):

    q_{m/n} = Σ_{k=r}^{n} C(n, k) q^k (1 − q)^{n−k}    (13.9.27)

For small values of q:

    q_{m/n} = C(n, r) q^r        q ≪ 1          (13.9.28)

The FDT φ_{m/n} of the system is:

    φ_{m/n} = (1/τ_p) ∫₀^{τ_p} q_{m/n} dt       (13.9.29)

Then from Equations 13.9.22, 13.9.28 and 13.9.29 the FDT φ_{m/n} of the system is:

    φ_{m/n} = C(n, r) (λτ_p)^r / (r + 1)        (13.9.30)

Thus for a 2/3 majority voting system:

    φ_{2/3} = (λτ_p)²                           (13.9.31)

For the special case of a parallel redundant, or 1/n (1-out-of-n), system, Equation 13.9.30 reduces to give the FDT φ_{1/n}:

* In this chapter the number of equipments which must fail for the system to fail is r. This notation differs from that used in Chapter 7 for r-out-of-n parallel systems, in which r was used for the number of equipments which must survive for the system to survive. These two notations are used in order to preserve correspondence with established usage in texts on general reliability (e.g. Shooman, 1968) and on trip systems (e.g. A.E. Green and Bourne, 1972).


Table 13.17 Fractional dead times for trip systems with simultaneous proof testing

System   n   m   r   φ              φ in terms of φ_{1/1}

1/1      1   1   1   λτ_p/2         φ_{1/1}
1/2      2   1   2   (λτ_p)²/3      (4/3)φ²_{1/1}
1/3      3   1   3   (λτ_p)³/4      2φ³_{1/1}
2/2      2   2   1   λτ_p           2φ_{1/1}
2/3      3   2   2   (λτ_p)²        4φ²_{1/1}

Table 13.18 Fractional dead times for trip systems with staggered proof testing (A.E. Green and Bourne, 1972) (Reproduced with permission from Reliability Technology by A.E. Green and J.R. Bourne, Copyright ©, 1972, John Wiley and Sons Inc.)

System   φ*^a              φ/φ*^a

1/1      λτ_p/2            1
1/2      (5/24)(λτ_p)²     1.6
1/3      (1/12)(λτ_p)³     3.0
2/2      λτ_p              1
2/3      (2/3)(λτ_p)²      1.5

a φ, fractional dead time for simultaneous testing; φ*, fractional dead time for staggered testing.

    φ_{1/n} = (λτ_p)^n / (n + 1)                (13.9.32)

Thus for a 1/2 parallel system

    φ_{1/2} = (λτ_p)²/3                         (13.9.33)

and for a 1/3 system

    φ_{1/3} = (λτ_p)³/4                         (13.9.34)

Fractional dead times calculated from Equation 13.9.30 are shown in Table 13.17, both as functions of λτ_p and of the FDT φ_{1/1} for a single channel. Table 13.17 assumes that the instruments are tested simultaneously at the end of the proof test interval. Some improvement can be obtained by staggered testing, as shown by the data in Table 13.18, which are taken from A.E. Green and Bourne (1972).
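A short Python sketch of Equation 13.9.30 (the function name is mine), which reproduces the simultaneous-testing entries of Table 13.17:

```python
from math import comb

def fdt_m_out_of_n(lam, tau_p, m, n):
    """FDT of an m-out-of-n trip system with unrevealed failures and
    simultaneous proof testing (Equation 13.9.30); valid for
    lam * tau_p << 1."""
    r = n - m + 1                    # failures needed to fail the system
    x = lam * tau_p
    return comb(n, r) * x**r / (r + 1)

# Entries of Table 13.17, e.g. for lam*tau_p = 0.01:
fdt_1_1 = fdt_m_out_of_n(0.01, 1.0, 1, 1)   # lam*tau_p / 2
fdt_1_2 = fdt_m_out_of_n(0.01, 1.0, 1, 2)   # (lam*tau_p)**2 / 3
fdt_2_3 = fdt_m_out_of_n(0.01, 1.0, 2, 3)   # (lam*tau_p)**2
```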

13.9.5 Operational reliability of trip systems
It is also necessary to consider operational failure of trip systems. Operational failure is here assumed to be revealed. The failure rate λ used in the equations in this section is that applicable to these revealed fail-safe, or fail-spurious, faults.

For a simple trip system consisting of a single-channel 1/1 system with a failure rate λ, the operational failure, or spurious trip, rate γ is:

    γ = λ                                       (13.9.35)

For a parallel redundant, or 1/n, system the operational failure rate γ_{1/n} is:

    γ_{1/n} = nλ                                (13.9.36)

For an m/n system, which may be a majority voting system, the following treatment applies. The rate at which the first operational failure of a single channel occurs is nλ. This first failure only results in a system trip if further operational failures of single channels sufficient to trip the system occur within the repair time τ_r. The probability q that this will occur is (C(n, k) denoting the binomial coefficient):

    q = C(n−1, m−1) (λτ_r)^{m−1}        λτ_r ≪ 1    (13.9.37)

Then the operational failure rate γ_{m/n} of the system is:

    γ_{m/n} = nλq                               (13.9.38)

    γ_{m/n} = nλ C(n−1, m−1) (λτ_r)^{m−1}       (13.9.39)

Thus for a 2/3 majority voting system:

    γ_{2/3} = 3λ [2(λτ_r)]                      (13.9.40)

    γ_{2/3} = 6λ²τ_r                            (13.9.41)

Operational failure, or spurious trip, rates calculated fromEquations 13.9.36 and 13.9.39 are shown in Table 13.19.
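Equation 13.9.39 can be sketched as follows (the function name is mine); it reproduces the entries of Table 13.19:

```python
from math import comb

def spurious_trip_rate(lam, tau_r, m, n):
    """Operational failure (spurious trip) rate of an m/n trip system
    (Equation 13.9.39); valid for lam * tau_r << 1. For m = 1 it
    reduces to n*lam (Equation 13.9.36)."""
    return n * lam * comb(n - 1, m - 1) * (lam * tau_r)**(m - 1)

# Table 13.19 entries, e.g. for lam = 0.1 faults/year, tau_r = 0.001 year:
g_1_2 = spurious_trip_rate(0.1, 0.001, 1, 2)   # 2*lam
g_2_3 = spurious_trip_rate(0.1, 0.001, 2, 3)   # 6*lam**2*tau_r
```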

It is emphasized again that the foregoing treatment is a simplified one. The expressions derived here appear, however, to be those in general use (e.g. Hensley, 1968; Kletz, 1972a; de Heer, 1974; Lawley and Kletz, 1975). Full theoretical treatments of trip systems have been given by A.E. Green and Bourne (1966 UKAEA AHSB(S) R117) and by Wheatley and Hunns (1981). The latter give expressions for a wide variety of trip systems.

13.9.6 Proof testing of trip systems
The treatment of the functional reliability of trip systems which has just been given demonstrates clearly the importance of the proof test interval. The expressions derived show that the condition for high functional reliability is

Table 13.19 Spurious trip rates for trip systems

System   γ

1/1      λ
1/2      2λ
1/3      3λ
2/2      2λ²τ_r
2/3      6λ²τ_r


    λτ_p ≪ 1                                    (13.9.42)

As an illustration of the effect of the proof test interval, consider a simple trip system which has a failure rate of 0.67 faults/year on a duty where the demand rate is 1 demand/year:

    δ = 1 demand/year
    λ = 0.67 faults/year

If the proof test interval is 1 week:

    τ_p = 0.0192 year

Then the fractional dead time is:

    φ = λτ_p/2 = 0.0064

and the plant hazard rate is:

    η = δφ = 0.0064 hazards/year

The plant hazard rate for a range of proof test intervals is:

    τ_p        η (hazards/year)
    1 week     0.0064
    1 month    0.027
    1 year     0.26

For the longer proof test intervals the approximate Equation 13.9.18 is not valid, and Equation 13.9.19 was used.
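The worked example can be reproduced with a short Python fragment (names are mine), using Equation 13.9.19, which reduces to the approximation of Equation 13.9.18 for small δτ_p:

```python
import math

def hazard_rate(delta, lam, tau_p):
    """Plant hazard rate of a simple 1/1 trip system, eta = lam * p_delta
    (Equation 13.9.19). For small delta*tau_p this reduces to
    eta = delta*lam*tau_p/2 (Equation 13.9.18)."""
    return lam * (1.0 - math.exp(-delta * tau_p / 2.0))

delta, lam = 1.0, 0.67                  # demands/year, faults/year
eta_week = hazard_rate(delta, lam, 0.0192)
eta_month = hazard_rate(delta, lam, 1.0 / 12.0)
eta_year = hazard_rate(delta, lam, 1.0)
# reproduces the 0.0064, 0.027 and 0.26 hazards/year quoted in the text
```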

Some additional factors which affect the choice of proof test interval are the facts that, while it is being tested, a trip is disarmed, and that each test is an opportunity for an error which disables the trip, such as leaving it isolated after testing.

Thus for a simple trip system with a trip disarmed period τ_d and an isolation dead time φ_is the FDT becomes:

    φ = λτ_p/2 + τ_d/τ_p + φ_is                 (13.9.43)

This expression has a minimum at:

    (τ_p)_min = (2τ_d/λ)^{1/2}                  (13.9.44)

The effect of these factors can be illustrated by considering the trip system described in the previous example. If the trip disarmed period is 1 h and the isolation dead time is 0.001:

    τ_d = 1.14 × 10⁻⁴ year
    φ_is = 0.001
    (τ_p)_min = 0.0184 year ≈ 0.96 weeks

Assume a proof test interval of 1 week is chosen:

    τ_p = 0.0192 year

Then the fractional dead time is:

    φ = 0.013

and the plant hazard rate is:

    η = 0.013 hazards/year
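A sketch of Equations 13.9.43 and 13.9.44 (function names are mine), reproducing the example figures:

```python
import math

def fdt_with_testing_effects(lam, tau_p, tau_d, phi_is):
    """FDT of a 1/1 trip system including the disarmed period tau_d per
    test and the probability phi_is of being left isolated after a test
    (Equation 13.9.43)."""
    return lam * tau_p / 2.0 + tau_d / tau_p + phi_is

def optimal_test_interval(lam, tau_d):
    """Proof test interval minimizing Equation 13.9.43 (Equation 13.9.44)."""
    return math.sqrt(2.0 * tau_d / lam)

lam = 0.67                      # faults/year
tau_d = 1.0 / 8760.0            # 1 h disarmed period, in years
phi_is = 0.001

tau_opt = optimal_test_interval(lam, tau_d)                  # ~0.0184 year
phi = fdt_with_testing_effects(lam, 0.0192, tau_d, phi_is)   # ~0.013
```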

In some instances it is not possible to test all parts of the trip system every time a proof test is carried out. For example, it is often not permissible to close the shut-off valve completely. In such cases a partial test is done, checking out the system to demonstrate valve movement but not valve shut-off.

If for a simple trip system the functional failure rate and proof test interval of the first part of the system are λ_A and τ_pA, respectively, and those of the second part are λ_B and τ_pB, respectively, then:

    q_A = 1 − exp(−λ_A t)                       (13.9.45)

    q_A = λ_A t        λ_A t ≪ 1                (13.9.46)

Similarly:

    q_B = λ_B t        λ_B t ≪ 1                (13.9.47)

Then the FDT is:

    φ = (1/τ_pA) ∫₀^{τ_pA} q_A dt + (1/τ_pB) ∫₀^{τ_pB} q_B dt    (13.9.48)

    φ = (λ_A τ_pA + λ_B τ_pB)/2                 (13.9.49)

There appears to be some variability in industrial practice with respect to the proof test interval. Generally a particular firm tends to have one longer interval, which is the standard one, and a shorter interval which is used for more critical cases. One such pair of intervals is 3 months and 1 month. Another is 1 month and 1 week. Thus 1 month and 1 week proof test intervals are mentioned in many of the trip system applications described by Lawley and Kletz. In some cases the policy is adopted that, if analysis shows that the proof test interval for a single trip is short, a redundant trip system is used.

13.9.7 Some other trip system characteristics
There are certain general characteristics which are desirable in any instrument system, but which are particularly important in a trip system. A trip system should possess not only reliability but also capability. In other words, when functional, it should be capable of carrying out its function. If it is not, then no amount of redundancy will help. The measuring instrument of the trip system should be accurate. This is particularly important where the safety margin is relatively fine.

The trip system should have a good dynamic response. What matters here is the ability of the trip to give rapid detection of the sensed variable and to effect rapid correction of that variable or rapid plant shut-down. This response therefore depends not only on the trip system itself, but also on the dynamic response of the plant. This aspect is considered further in Section 13.9.16.

The trip system should have sufficient rangeability tomaintain accuracy at different plant throughputs. Anotherimportant property of a trip system is the ease andcompleteness with which it can be checked. It is


obviously desirable to be able to check all the elements in a trip system, but this is not always easy to arrange.

13.9.8 Trip system applications
Some illustrations of the specification and design of protective systems have been given by Kletz (1972a, 1974a) and by Lawley and Kletz (1975). As already mentioned in Chapter 12, the use of trip systems instead of pressure relief valves is sometimes an attractive proposition, particularly where the relief valve solution involves large flare or toxic scrubbing systems. This is considered by Kletz (1974a) and by Lawley and Kletz (1975), who suggest that if a trip system is used instead of a relief valve, it should be designed for a reliability 10 times that of the latter. The reason for this is the uncertainty in the figures and the difference in the modes of failure; a relief valve which fails to operate at the set pressure may nevertheless operate at a higher pressure, whereas a trip is more likely to fail completely.

The fail-dangerous failure rates of a pressure relief valve and of a simple 1/1 trip system are quoted by these workers as 0.01 and 0.67 faults/year, respectively (Kletz, 1974a). Then, using these failure rates and assuming a demand rate of 1 demand/year, the plant hazard rates shown in Table 13.20 are obtained. For the longer proof test intervals the conditions λτ_p ≪ 1 and δτ_p ≪ 1 do not apply, and in this case the data in the table are obtained not from the approximate Equation 13.9.18 but from Equation 13.9.19. Thus to meet the design criterion suggested for functional reliability a 1/2 trip system with weekly testing is required. The spurious trip rate for this system, however, might be unacceptable, leading to a requirement for a 2/3 system.

Another example of the use of trip systems is the hydrocarbon sweetening plant shown in Figure 13.10 (Kletz, 1972a). The hydrocarbon is sweetened with small quantities of air which normally remain completely dissolved, but conditions can arise in which an explosive mixture may be formed. Initially it was assumed that the principal problem lay in a change in the air/hydrocarbon ratio, but a hazop study revealed that a hazard could arise in a number of ways. One is for an air pocket to be formed, which can occur as follows:

(1) the temperature can be so high that the amount of air normally used will not dissolve;
(2) an air pocket can be left behind when a filter is recommissioned;
(3) the pressure can fall, allowing air to come out of solution;
(4) a fault in the mixer can prevent the air being mixed with the hydrocarbon, so that pockets of air can be carried forward;
(5) a fault in the air/hydrocarbon ratio controller can result in the admission of excess air.

In this case it is also necessary for the feed to be above its flashpoint, which can occur in a number of ways: (1) the temperature can be above the normal flashpoint and (2) the feed can contain low flashpoint material. Alternatively, if there is both a loss of pressure in the receiver and a failure of the non-return valve, hydrocarbon may find its way into the air receiver.

The fault tree for the hazard is shown in Figure 13.11. The probabilities of the various fault paths were evaluated from this and the trip requirements were identified. The trip initiators considered, shown by the circles in Figures 13.10 and 13.11, were:

(1) a device for detecting a pocket of air in the reactor;
(2) a pressure switch for detecting a low pressure in the receiver;
(3) a temperature measurement device for detecting a high temperature in the feed;
(4) laboratory analysis for detecting a low flashpoint feed;
(5) a device for detecting a high air/hydrocarbon ratio.

As the latter condition is detected by trip initiator 1 anyway, trip initiator 5 was not used. The shut-down arrangement was that any of trip initiators 1-3 shuts a valve in the air line and shuts down the compressor. The use of the laboratory analysis of feed flashpoint was restricted to ensuring that low flashpoint feeds were present only a sufficiently small fraction of the time to meet the system specification.
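The AND/OR arithmetic used to evaluate fault paths such as those in Figure 13.11 can be sketched as follows. All the numerical probabilities below are hypothetical, chosen only to illustrate the gate calculations; they are not the plant's actual figures.

```python
# Hypothetical illustration of fault tree gate arithmetic of the kind used to
# evaluate Figure 13.11. The input probabilities are invented for illustration.

def or_gate(probs):
    # Probability that at least one of several independent causes is present
    p = 1.0
    for q in probs:
        p *= (1.0 - q)
    return 1.0 - p

def and_gate(probs):
    # Probability that several independent conditions coincide
    p = 1.0
    for q in probs:
        p *= q
    return p

air_pocket = or_gate([1e-3, 5e-4, 2e-3, 1e-3, 5e-4])  # five causes, hypothetical
above_flashpoint = or_gate([1e-2, 5e-3])               # two causes, hypothetical
explosive_condition = and_gate([air_pocket, above_flashpoint])
print(f"P(explosive condition) = {explosive_condition:.2e}")
```

For rare events the OR gate is close to the sum of the inputs and the AND gate to their product, which is the approximation normally used in hand evaluation of such trees.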

A third example is the distillation column heating system shown in Figure 13.12 (Lawley and Kletz, 1975). Heat was supplied to the distillation column from an existing steam-heated reboiler and a hot-water-heated feed vaporizer. The plant was to be uprated by the addition of another reboiler and vaporizer.

Table 13.20 Plant hazard rates for pressure relief valves and for trip systems (after Kletz, 1974a) (Courtesy of Chemical Processing)

System                      Plant hazard rate, η
                            (hazards/year)   (years/hazard)
No relief valve or trip     1 (say)          1
Relief valve:
  annual testing            0.004            250
Single 1/1 trip:
  annual testing            0.264            4
  monthly testing           0.027            36
  weekly testing            0.0064           180
Duplicate 1/2 trip:
  weekly testing            5.5 × 10⁻⁵       18 000

The existing pressure relief valve was a 2/2 system and was adequate to handle overpressure from the existing reboiler and vaporizer. The problem was to cope with the overpressure to which the new reboiler and vaporizer might give rise. Both re-sizing of the existing relief valves and the addition of a third were unattractive in the particular situation. There was therefore a requirement for a trip system which would shut down both the steam to the new reboiler and the hot water pump on the new vaporizer.

Both 1/1 and 1/2 trip systems were considered. The 1/1 system consisted of a pressure switch for detecting a high pressure on the overhead vapour line, a relay and a contact on the power supply to the hot water pumps, and a relay and contact on that to the solenoid-operated shut-off valve on the reboiler steam supply. The 1/2 system consisted of a duplication of the 1/1 system. The summary of the failure rates of a simple trip system for this case is shown in Table 13.21.

The functional reliability of the existing 2/2 relief valve system was calculated as follows:

λ = 0.005 faults/year

τp = 2 years

φ = λτp = 0.01

The target FDT for the new trip system was taken as a factor of 10 less than this, namely 0.001, for the reasons already explained. The functional reliability of a 1/1 trip system was calculated as follows:

λ = 0.42 faults/year (from Table 13.21)

φ = λτp/2 = 0.21τp

Then, taking into account also the disarmed time and the isolation dead time:

τd = 1 h/test = 0.0001 years/test

φis = 0.001

φ = 0.21τp + 0.0001/τp + 0.001

The minimum FDT is at a proof test interval of

(τp)min = (2τd/λ)^1/2 = 0.022 years ≈ 8 days

Assuming a proof test interval of 1 week, the FDT is

φ ≈ 0.01
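The trade-off described above (more frequent testing shrinks the λτp/2 term but inflates the disarmed-time term τd/τp) can be checked numerically with the quoted values for the 1/1 system:

```python
import math

lam = 0.42      # fail-to-danger rate of the 1/1 trip system (faults/year)
tau_d = 0.0001  # disarmed time per proof test (years/test)
phi_is = 0.001  # isolation dead time contribution

def fdt(tau_p):
    # phi = lam*tau_p/2 + tau_d/tau_p + phi_is
    return lam * tau_p / 2.0 + tau_d / tau_p + phi_is

# Setting d(phi)/d(tau_p) = 0 gives the proof test interval of minimum FDT
tau_p_min = math.sqrt(2.0 * tau_d / lam)
print(f"interval of minimum FDT: {tau_p_min:.3f} years ({tau_p_min * 365:.0f} days)")
print(f"FDT with weekly testing: {fdt(1.0 / 52):.4f}")
```

This reproduces the values in the text: the optimum interval is about 0.022 years (8 days), and weekly testing gives an FDT of about 0.01, an order of magnitude above the 0.001 target.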

This does not meet the target. The authors therefore consider a 1/2 trip system. The analysis for this system is more complex and takes into account the fact that the complete shut-off of the valves can be checked only on a proportion of the tests. It is concluded that a 1/2 system does just meet the target set.

13.9.9 High integrity trip systems

The trip system applications described so far have been 1/1 or 1/2 systems. The more complex system with 2/3 majority voting is now considered.

A major system of this kind, which appears to have been the most sophisticated then on a chemical plant and which has been influential in the general development of protective systems in the industry, is that on the ethylene oxide process of ICI described by Stewart and co-workers (R.M. Stewart, 1971, 1974a; R.M. Stewart and Hensley, 1971). The ethylene oxide process is potentially extraordinarily hazardous: it operates with a reaction mixture very close to the explosive limit, and there is both a fire/explosion hazard and a toxic release hazard.

The design of the protective system followed the methods already outlined. The risk criterion was set at a probability of one fatality of 3 × 10⁻⁵ per year. The hazards were assessed by means of a fault tree, part of which is shown in Figure 13.13.

The system devised is a high integrity protective system (HIPS), consisting of the high integrity trip initiators (HITIs), the high integrity voting equipment (HIVE) and the high integrity shut-down system (HISS). A schematic system diagram, which omits replicated signal connections, is shown in Figure 13.14.


Figure 13.10 Hydrocarbon sweetening plant (Kletz, 1972a) (Courtesy of the American Institute of Chemical Engineers)


Redundancy is fully exploited throughout the system. Against each logic path to fire/explosion in the fault tree at least one parameter was selected initially to be a trip initiator. The integrity specified in fact required the use of at least two parameters. The choice of the trip initiating parameters is important but difficult. Some are obvious, such as high oxygen concentration, high reactor temperature and low recycle gas flow. Others are far less obvious, but are needed to guard against combinations of faults or to substitute for other parameters. The latter occurs, for example, where the measurement response is too slow, e.g. oxygen concentration, or where the trip could itself result in a hazardous condition, e.g. recycle compressor trip.

The measuring instruments used are carefully selected and, if necessary, modified to ensure high reliability.

Each parameter is measured by triplicate instruments. The cables from each trip initiator on a parameter go by different routes so that there is less chance of all three being disabled by an incident such as a flash fire; the equipments have separate power supplies; and so on. The arrangement of the shut-down valves in the oxygen line further illustrates the use of redundancy. There are two lines, each with three valves. A single line represents a 1/3 shut-down system. Duplication is provided to permit complete testing without disarming.

The advantages of the system are that the failure of one trip initiator in the fail-safe mode does not cause the plant to be tripped spuriously, the failure of one trip initiator in the fail-dangerous mode does not prevent the plant from being tripped, and the proof testing can be done without disarming the system.


Figure 13.11 Fault tree for explosion on a hydrocarbon sweetening plant (after Kletz, 1972a) (Courtesy of the American Institute of Chemical Engineers)


The design of the system was subjected to an independent assessment by assessors within the company, who were advised by the UKAEA. The assessors checked all feasible faults which could lead to hazardous conditions, the capability of the HIPS to carry out the protective action against the hazardous conditions arising from such faults, and the occurrence rate of other hazardous conditions which the HIPS would not prevent, in relation to the design target. Table 13.22 shows an extract from the table produced during this assessment.

The assessment showed that at this stage the plant hazard rate was 4.79 × 10⁻⁵/year, which was higher than the target of 3 × 10⁻⁵/year. An extra HITI was used to reduce the contribution of fault 3 from 2.72 × 10⁻⁵ to 0.8 × 10⁻⁵/year, which brought the system within specification.

The assessors also examined the HIPS as installed to ensure that there were no significant deviations from design, and reviewed the maintenance, calibration and testing procedures. The quality of the maintenance and testing is crucial to the integrity of a protective system and much attention was paid to this aspect.

It was estimated that an alternative system with 70 1/1 single trip initiators would result in some 30 spurious trips a year and that the system used reduced this by a factor of over 12. Since the cost of a trip was estimated as £2000, the saving due to avoidance of spurious trips was about £55 000 per annum. The cost of £140 000 for


Figure 13.12 Distillation column heating system (Lawley and Kletz, 1975) (Courtesy of Chemical Engineering)

Table 13.21 Failure rates of trip systems for a distillation column heating system (Lawley and Kletz, 1975) (Courtesy of Chemical Engineering)

Components                                            Failure rate (faults/year)
                                                      Fail-to-danger  Fail-safe  Total
Trip initiator:
  Impulse lines - blocked                             0.03            -          0.03
                - leaking                             0.06            -          0.06
  Pressure switch (contacts open to give trip
    signal on rising pressure)                        0.10            0.03       0.13
  Cable fractured or severed                          -               0.03       0.03
  Loss of electrical supply                           -               0.05       0.05
  Total                                               0.19            0.11       0.30

Steam shut-off system:
  Relay coil (de-energize to trip)                    -               0.05       0.05
  Relay contact                                       0.01            0.01       0.02
  Relay terminals and wire                            -               0.01       0.01
  Solenoid valve (de-energize to trip)                0.10            0.20       0.30
  Loss of electrical supply (to solenoid valve)       -               0.05       0.05
  Trip valve (closes on air failure)                  0.10            0.15       0.25
  Air supply line - blocked or crushed                0.01            -          0.01
                  - fractured or holed                -               0.01       0.01
  Loss of air supply                                  -               0.05       0.05
  Total                                               0.22            0.53       0.75

Pump shut-off system:
  Relay coil, contact, terminals and wire (as above)  0.01            0.07       0.08
  Total                                               0.01            0.07       0.08


Figure 13.13 Fault tree for fire/explosion on an ethylene oxide plant (after R.M. Stewart, 1971) (Courtesy of the Institution of Chemical Engineers)


the installation was therefore considered justified on these grounds alone.

A coda to this account has been given by A. Taylor (1981), who describes the operation of the trip system over the period 1971-80. The information on the performance of the trip system is of two kinds: operations of, and tests on, the system. Events were classified as spurious, genuine and deliberate, the latter being initiations by the operators.

Analysis of these events revealed that in a few cases the demand frequency was greater than that originally estimated by orders of magnitude. The author gives a table listing seven fault conditions, with the event numbering rising to 53, which exemplify the worst discrepancies. The two fault conditions which show the greatest discrepancies are the opening of a certain relief valve and loss of reaction:

Fault  Fault description              Predicted       Actual          Ratio of actual/
No.                                   frequency       frequency       predicted frequency
                                      (events/year)   (events/year)

15     A certain relief valve opens   0.001           1.68            1680
48b    Loss of reaction               0.01            1.16            116

The relief valve fault was due to `feathering', which had not been anticipated. The loss of reaction fault is not explicitly explained, but references by the author to the effect of modifications in reaction conditions may bear on this. It is noteworthy that of the seven fault conditions it is those for which the original frequency estimates were lowest which are most in error.

With regard to instrument failure rates, in the case of magnetic float switches three different failure mechanisms, and three different failure rates, were observed. Switches operating submerged in clean lubricating oil recorded no failures; those operating in recycled gas with occasional slugs of dirty water choked up; and a new type of switch was found to suffer from corrosion.

The author quotes three examples of dependent failure. One of these relates to the choked level switches just mentioned, which were all on one vessel. On four occasions, testing of the switches revealed that all were choked. The test procedure was altered to require that if one switch was found to be choked, the others should be tested.

There were also mistakes made in the installation of the instruments. In one case pneumatic pressure switches, of a flameproof type which is not waterproof, were located downwind of a low pressure steam vent pipe, and suffered water ingress and corrosion.

13.9.10 PES-based trip systems

A trip system needs to be highly reliable. For this reason, it has been the practice to design trip systems as separate, hardwired systems. The acceptability of using a programmable electronic system (PES) to implement the trip functions has long been a matter of debate. There has been a marked reluctance to abandon dedicated, hardwired systems.

The most constructive approach to the problem is to try to define the conditions which must be met by a PES-based trip system. As described in Section 13.12, the Health and Safety Executive (HSE, 1987b) has issued guidance based on this approach. Further guidance is given in the CCPS Safe Automation Guidelines described in Section 13.15, which are largely concerned with this topic.

An account of a computer-based trip system on an ammonia plant has been given by Cobb and Monier-Williams (1988). The reason given for moving to such a system is the avoidance of spurious trips. Design options were considered based on programmable logic controller and computer systems. The latter was selected largely because it offered a better interface with the operator. The system uses two computers operating in parallel. Some features of the system are: the ability to use inferred measurements; improved reliability of the trips; decreased defeating, or disarming, of the trips; and better control of any disarming which does occur.

13.9.11 Disarming of trip systems

It may sometimes be necessary to disarm a trip. This need arises particularly where a transition is being made between one state and another, such as during start-up. The disarming of a trip should be assessed to ensure that it does not negate the design intent, whether this check is made at the time of the original design or subsequently. Such disarming should be the subject of a formal authorization procedure. This may be supplemented by hardware measures such as a key interlock.

If a trip proves troublesome, it is liable to be disarmed without such authorization. This is particularly likely to occur if there are frequent spurious trips, due to sensor failure or other causes. In order to disarm a trip it may not be necessary to interfere with the hardware. It is often sufficient simply to alter the set-point.

13.9.12 Restart after a trip

Once a trip has operated, it is necessary to reset the system so that a safe restart can be made. Therefore, the trip action which has driven the plant to a safe state should not simply be cancelled; instead a planned sequence of actions should be taken to effect the restart. One situation which has frequently led to incidents is the restart of agitation in a batch reactor following an interruption of agitation.


Figure 13.14 High integrity protective system (after Stewart, 1971) (Courtesy of the Institution of Chemical Engineers)


Table 13.22 Assessment of reliability of a high integrity trip system (after R.M. Stewart, 1971) (Courtesy of the Institution of Chemical Engineers)

Fault  Description                               a      b    c    d = a·b·c      Relevant trip  e (HITI)  f (HIVE)    g (HISS)  h = e+f+g    i = d·h
No.                                                               (demands/year) initiator No.                                               (hazards/year, ×10⁵)

1      Feed filters blocked                      0.001  0.2  0.1  0.000 02       10 & 12        10⁻⁴      10⁻⁵        10⁻⁵      1.2 × 10⁻⁴   0.000 24
2      Oxygen supply failure                     2.0    0.2  0.1  0.04           10 & 38        10⁻⁴      10⁻⁵        10⁻⁵      1.2 × 10⁻⁴   0.48
3      PCV fails open                            0.25   0.1  1.0  0.025          11             10⁻³      8.3 × 10⁻⁵  10⁻⁵      1.09 × 10⁻³  2.72
4      Compressor antisurge bypass fails open    0.2    1.0  0.1  0.02           18 & 36        10⁻⁴      10⁻⁵        10⁻⁵      1.2 × 10⁻⁴   0.24
5      Gross carryover from absorber             0.1    1.0  0.1  0.01           18 & 24        10⁻⁴      10⁻⁵        10⁻⁵      1.2 × 10⁻⁴   0.12
6      etc.

a, occasions per year; b, probability that the fault leads to rupture; c, probability that operator's intervention fails; d, demand rate; e-g, fractional dead times of the HITI, HIVE and HISS; h, overall fractional dead time; i, hazard rate.
PCV, pressure control valve.
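The column arithmetic of Table 13.22 (d = a × b × c, h = e + f + g, i = d × h) can be reproduced directly from the tabulated inputs:

```python
# Each row: (description, a, b, c, [e, f, g]) transcribed from Table 13.22
rows = [
    ("Feed filters blocked",        0.001, 0.2, 0.1, [1e-4, 1e-5, 1e-5]),
    ("Oxygen supply failure",       2.0,   0.2, 0.1, [1e-4, 1e-5, 1e-5]),
    ("PCV fails open",              0.25,  0.1, 1.0, [1e-3, 8.3e-5, 1e-5]),
    ("Antisurge bypass fails open", 0.2,   1.0, 0.1, [1e-4, 1e-5, 1e-5]),
    ("Gross carryover",             0.1,   1.0, 0.1, [1e-4, 1e-5, 1e-5]),
]
hazard_rates = {}
for name, a, b, c, fdts in rows:
    d = a * b * c  # demand rate (demands/year)
    h = sum(fdts)  # overall fractional dead time of the protective chain
    i = d * h      # hazard rate (hazards/year)
    hazard_rates[name] = i
    print(f"{name}: d = {d:g}, h = {h:g}, hazard rate = {i * 1e5:.2f} x 10^-5/year")
```

The products confirm the tabulated hazard rates, including the dominant 2.72 × 10⁻⁵/year contribution from fault 3 that the extra HITI was later added to reduce.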


A case history caused by restart after a trip has been described by Kletz (1979a). A cumene oxidation reactor was fitted with a high temperature trip for which the trip action was to shut off the air and dump the contents of the reactor into a water tank. A spurious trip occurred, the air valve closed and the dump valve opened. The trip condition cleared itself, the dump valve remained open, but the air valve reopened. Air passed into the reactor, creating a flammable mixture.

13.9.13 Restart after a depressurization

A particular case of restart after a trip is the repressurization of a vessel following emergency depressurization. The effect of rapid reduction of pressure in a vessel containing a material such as liquefied gas may be to chill the vessel below the transition temperature, thus creating the hazard of brittle fracture. Too prompt a repressurization, before the vessel has warmed up sufficiently, can result in realization of this hazard.

Cases where this has occurred are mentioned by Valk and Sylvester-Evans (1985). Treatment of the problem using a model of blowdown has been described by S.M. Richardson and Saville (1992).

13.9.14 Hazard rate of a single-channel trip system

In the relations for the functional reliability of a single-channel trip given in Section 13.9.3:

η = δλτp/2    (13.9.50)

The assumptions made are that λτp ≪ 1 and δτp ≪ 1, as stated.

If Equation 13.9.50 is used outside its range of validity, the results obtained can be not only incorrect but nonsensical. Consider the case where the failure rate is λ = 0.01 failures/year, the demand rate is δ = 3 demands/year and the proof test interval is τp = 1 year. Then Equation 13.9.50 gives for the hazard rate η a value of 0.015 hazards/year, which is actually greater than the failure rate λ.

A treatment is now given for the more general case, based on the work of Lees (1982a) as extended by de Oliveira and Do Amaral Netto (1987). For a single-channel trip one formulation of the possible states is: (1) trip operational; (2) trip failed but failure undetected; and (3) trip failed, failure detected and trip under repair. The corresponding Markov model is:

dP1(t)/dt = −λP1(t) + μP3(t)    (13.9.51a)

dP2(t)/dt = λP1(t) − δP2(t)    (13.9.51b)

dP3(t)/dt = δP2(t) − μP3(t)    (13.9.51c)

where Pn is the probability that the trip is in state n, λ is the failure rate, δ is the demand rate and μ is the repair rate.

With the initial condition that the trip is operational, the solution of Equations 13.9.51 is as follows:

P1(t) = μδ/(r1r2) + [(r1 + δ)(r1 + μ)/(r1(r1 − r2))] exp(r1t) − [(r2 + δ)(r2 + μ)/(r2(r1 − r2))] exp(r2t)    (13.9.52a)

P2(t) = λμ/(r1r2) + [λ(r1 + μ)/(r1(r1 − r2))] exp(r1t) − [λ(r2 + μ)/(r2(r1 − r2))] exp(r2t)    (13.9.52b)

P3(t) = λδ/(r1r2) + [λδ/(r1(r1 − r2))] exp(r1t) − [λδ/(r2(r1 − r2))] exp(r2t)    (13.9.52c)

with

r1 = [−(λ + δ + μ) − ((λ + δ + μ)² − 4(λδ + δμ + μλ))^1/2]/2    (13.9.53a)

r2 = [−(λ + δ + μ) + ((λ + δ + μ)² − 4(λδ + δμ + μλ))^1/2]/2    (13.9.53b)

Then the fractional dead time and hazard rate obtained from Equations 13.9.52b and 13.9.52c are instantaneous values, and are:

φ(t) = P2(t) + P3(t)    (13.9.54)

η(t) = δ[P2(t) + P3(t)]    (13.9.55)

The fractional dead time and the hazard rate given in Equations 13.9.54 and 13.9.55 are functions of time. The average value of the hazard rate over the proof test interval is:

η = (1/τp) ∫₀^τp η(t) dt    (13.9.56)

Then, substituting Equation 13.9.55 in Equation 13.9.56 and integrating gives for the average hazard rate:

η = δλ(δ + μ)/(r1r2) + [δλ(r1 + δ + μ)/(r1²τp(r1 − r2))][exp(r1τp) − 1] − [δλ(r2 + δ + μ)/(r2²τp(r1 − r2))][exp(r2τp) − 1]    (13.9.57)

The foregoing treatment is based on the assumptions that the trip is always operational after a proof test is performed and that the test duration is negligible compared with the proof test interval.
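The average hazard rate of Equation 13.9.57, as reconstructed here, can be cross-checked by integrating the Markov equations 13.9.51 numerically. The sketch below does this for one parameter set of Table 13.23 (τp = 1 year, λ = 0.1/year, δ = 1/year, μ = 365/year), for which the tabulated value is 0.359 × 10⁻¹ hazards/year:

```python
import math

def eta_avg(lam, delta, mu, tau_p):
    # Average hazard rate over the proof test interval, Equation 13.9.57
    b = lam + delta + mu
    c = lam * delta + delta * mu + mu * lam
    disc = math.sqrt(b * b - 4.0 * c)
    r1 = (-b - disc) / 2.0
    r2 = (-b + disc) / 2.0
    eta = delta * lam * (delta + mu) / (r1 * r2)
    eta += (delta * lam * (r1 + delta + mu) * (math.exp(r1 * tau_p) - 1.0)
            / (r1 ** 2 * tau_p * (r1 - r2)))
    eta -= (delta * lam * (r2 + delta + mu) * (math.exp(r2 * tau_p) - 1.0)
            / (r2 ** 2 * tau_p * (r1 - r2)))
    return eta

def eta_avg_numerical(lam, delta, mu, tau_p, steps=100000):
    # Forward Euler integration of Equations 13.9.51a-c, averaging
    # eta(t) = delta*(P2(t) + P3(t)) over the proof test interval
    p1, p2, p3 = 1.0, 0.0, 0.0
    dt = tau_p / steps
    acc = 0.0
    for _ in range(steps):
        acc += delta * (p2 + p3) * dt
        dp1 = -lam * p1 + mu * p3
        dp2 = lam * p1 - delta * p2
        dp3 = delta * p2 - mu * p3
        p1, p2, p3 = p1 + dp1 * dt, p2 + dp2 * dt, p3 + dp3 * dt
    return acc / tau_p

closed = eta_avg(0.1, 1.0, 365.0, 1.0)
numeric = eta_avg_numerical(0.1, 1.0, 365.0, 1.0)
print(f"closed form: {closed:.4f}, numerical: {numeric:.4f} hazards/year")
```

Agreement between the closed form, the direct integration and the tabulated value gives some confidence in the reconstructed expression.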

Although this model has a high degree of generality, it is based on the assumption that, following detection of a trip failure, the plant continues to operate while the trip is repaired. If in fact the policy is that plant operation does not continue while the trip is being repaired, different expressions apply. If the state P3(t) is dropped from the instantaneous hazard rate η(t) in Equation 13.9.55 and the repair rate μ = 0, Equation 13.9.57 for the average hazard rate then becomes:

η = (1/τp){1 − [1/(δ − λ)][δ exp(−λτp) − λ exp(−δτp)]}    δ ≠ λ    (13.9.58a)

η = (1/τp){1 − (1 + λτp) exp(−λτp)}    δ = λ    (13.9.58b)

This case is essentially that considered by Lees (1982a), who used the joint density function method. Two of the relations which he gives, for the failure density function fη and the probability pη of realization of the hazard, are also of interest and are:

fη = [δλ/(δ − λ)][exp(−λτp) − exp(−δτp)]    δ ≠ λ    (13.9.59a)

fη = λ²τp exp(−λτp)    δ = λ    (13.9.59b)

and

pη = 1 − [1/(δ − λ)][δ exp(−λτp) − λ exp(−δτp)]    δ ≠ λ    (13.9.60a)

pη = 1 − (1 + λτp) exp(−λτp)    δ = λ    (13.9.60b)

A number of other relations have been given in the literature for situations where Equation 13.9.50 is not valid. An expression given by Kletz and by Lawley (Kletz, 1972a; Lawley and Kletz, 1975; Lawley, 1976) is:

η = λ[1 − exp(−δτp/2)]    (13.9.61)

This is in effect Equation 13.9.19. It may be derived from Equation 13.9.17 together with Equations 13.9.10, 13.9.13 and 13.9.16. It is applicable for small λτp, but also for higher δτp.

Lawley (1981) has subsequently given the more accurate Equation 13.9.58. The assumptions underlying this equation have just been described.

Wells (1980) has given the expression:

η = λδ/(λ + δ)    (13.9.62)

as an upper bound on the hazard rate for higher values of δτp. This expression is equivalent to taking the fractional dead time as φ = λ/(λ + δ).

De Oliveira and Do Amaral Netto give the relation:

η = δ{1 − [1/(λτp)][1 − exp(−λτp)]}    (13.9.63)

for low values of δ but higher values of λτp.

Numerical results for some of these expressions have been given by Lees and by de Oliveira and Do Amaral Netto. Table 13.23 shows some comparative results, obtained mainly by the latter workers.
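The relative behaviour of these expressions is easy to examine numerically. The sketch below evaluates Equations 13.9.58, 13.9.61, 13.9.63 and 13.9.18, as reconstructed above, for one parameter set of Table 13.23 (τp = 1 year, λ = 0.1/year, δ = 1/year):

```python
import math

def eta_58(lam, delta, tau_p):
    # Equation 13.9.58a (plant shut down during repair, mu = 0), delta != lam
    return (1.0 - (delta * math.exp(-lam * tau_p)
                   - lam * math.exp(-delta * tau_p)) / (delta - lam)) / tau_p

def eta_61(lam, delta, tau_p):
    # Equation 13.9.61 (Kletz; Lawley and Kletz)
    return lam * (1.0 - math.exp(-delta * tau_p / 2.0))

def eta_63(lam, delta, tau_p):
    # Equation 13.9.63 (de Oliveira and Do Amaral Netto)
    return delta * (1.0 - (1.0 - math.exp(-lam * tau_p)) / (lam * tau_p))

def eta_18(lam, delta, tau_p):
    # Equation 13.9.18, valid only when all rate-interval products are small
    return delta * lam * tau_p / 2.0

lam, delta, tau_p = 0.1, 1.0, 1.0
for name, f in [("13.9.58", eta_58), ("13.9.61", eta_61),
                ("13.9.63", eta_63), ("13.9.18", eta_18)]:
    print(f"Equation {name}: {f(lam, delta, tau_p):.4f} hazards/year")
```

The computed values (approximately 0.0355, 0.0393, 0.0484 and 0.0500 hazards/year) match the corresponding row of Table 13.23 and show how the simple approximation 13.9.18 overestimates the hazard rate once δτp is no longer small.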

13.9.15 Frequency of events in a trip system

A method of determining for a trip system the frequency of the events of principal interest has been described by Kumamoto, Inoue and Henley (1981). These events are the demand, the functional failure and the operational failure of the trip. The method is implemented in the program PROTECT.

The procedure is to designate each of these events in turn as the top event of a fault tree, to create the fault tree and to determine its cut sets. These cut sets, together with the proof test interval for the trip, are the inputs used by the model to provide estimates of the frequency of the events mentioned.

The application of this program to determine the expected frequency of these events for an ammonia-air mixing plant, as a function of the proof test interval, has been described by Kumar, Chidambaram and Gopalan (1989).


Table 13.23 Some numerical values given by expressions for the average hazard rate of a single-channel trip system (after de Oliveira and Do Amaral Netto, 1987) (Courtesy of Elsevier Science Publishers)

Parameters                      Equation(a)
τp      λ         δ             13.9.57      13.9.58      13.9.61      13.9.63      13.9.18
(year)  (year⁻¹)  (year⁻¹)

0.0192  0.1       0.1           0.958×10⁻⁴   0.958×10⁻⁴   0.958×10⁻⁴   0.958×10⁻⁴   0.959×10⁻⁴
                  1.0           0.954×10⁻³   0.952×10⁻³   0.954×10⁻³   0.958×10⁻³   0.959×10⁻³
                  10            0.919×10⁻²   0.900×10⁻²   0.914×10⁻³   0.958×10⁻²   0.959×10⁻²
        1.0       0.1           0.952×10⁻³   0.952×10⁻³   0.958×10⁻³   0.953×10⁻³   0.959×10⁻³
                  1.0           0.949×10⁻²   0.947×10⁻²   0.954×10⁻²   0.953×10⁻²   0.959×10⁻²
                  10            0.913×10⁻¹   0.895×10⁻¹   0.914×10⁻¹   0.953×10⁻¹   0.959×10⁻¹
        10        0.1           0.900×10⁻²   0.900×10⁻²   0.958×10⁻²   0.900×10⁻²   0.959×10⁻²
                  1.0           0.897×10⁻¹   0.895×10⁻¹   0.954×10⁻¹   0.900×10⁻¹   0.959×10⁻¹
                  10            0.864        0.845        0.914        0.900        0.959
1.0     0.1       0.1           0.468×10⁻²   0.468×10⁻²   0.488×10⁻²   0.484×10⁻²   0.5×10⁻²
                  1.0           0.359×10⁻¹   0.355×10⁻¹   0.393×10⁻¹   0.484×10⁻¹   0.5×10⁻¹
                  10            0.916×10⁻¹   0.860×10⁻¹   0.993×10⁻¹   0.484        0.5
        1.0       0.1           0.358×10⁻¹   0.355×10⁻¹   0.488×10⁻¹   0.368×10⁻¹   0.5×10⁻¹
                  1.0           0.284        0.264        0.393        0.368        0.50
                  10            0.847        0.591        0.993        3.68         5.0
        10        0.1           0.892×10⁻¹   0.860×10⁻¹   0.488        0.900×10⁻¹   0.5
                  1.0           0.827        0.591        3.93         0.900        5.0
                  10            4.82         1.00         9.93         9.00         50

(a) In equations where μ is used, μ = 365/year.


13.9.16 Time response of a trip system

The point has already been made that the dependability of a trip system is a function not only of its reliability but also of its capability. An important aspect of capability is the dynamic response. The effect of the dynamic response of the instrument is illustrated in Figure 13.15. It is assumed in the figure that, when a fault occurs, the variable increases linearly from its normal level to the danger level. The nominal trip point is set part way up the ramp, but the trip will not usually occur at the point in time corresponding to this level of the variable. There will normally be delays due to sampling and the dynamic response of the measuring instrument, and there may be an instrument error. After the measuring instrument has responded, there will be delays in the safety circuitry and the shut-down valve. There will be a further delay in the process itself, before the effect of the shut-off is felt on the variable measured. All these factors, delays and errors, erode the nominal safety margin and should be considered carefully. The original assumptions concerning the maximum rate of rise of the variable are clearly critical also.

Further reduction of the nominal trip point may be appropriate, but the setting should not be put so low that noise on the variable at its normal level activates the trip. A spurious trip can arise from too low a level of the trip setting as well as from instrument unreliability.

The dynamic response of the complete situation against which the trip system is designed to protect may be modelled using standard methods. An account of unsteady-state modelling of plant is given in Mathematical Modeling in Chemical Engineering (Franks, 1967) and the modelling of instrumentation is treated in texts on process control such as those by Harriott (1964) and Coughanowr and Koppel (1965).

The following treatment is confined to the dynamic response of the measuring instrument, or sensor. The inputs to a sensor are generally characterized by a set of idealized forcing functions, of which the main types relevant here are: (1) the step function; (2) the ramp function; and (3) the impulse function. The unit step function changes suddenly at time zero from a value of zero to one of unity. The unit ramp function increases linearly with time, with unit slope, so that its value at time t is t. The unit impulse is a function which is infinitely large at time zero and zero elsewhere, but which has an area of unity. These three forcing functions are shown in Figure 13.16(a)-(c).

The instrument itself is typically modelled as either a first- or a second-order system. Thus a temperature sensor might be modelled as a first-order system:

Mcp dT/dt = UA(Ti − T)    (13.9.64)

where A is the area for heat transfer to the sensor, cp is the specific heat of the sensor, M is the mass of the sensor, t is time, T is the temperature of the sensor, U is the overall heat transfer coefficient, and the subscript i indicates input, or forcing. Thus Ti is the temperature of the surrounding fluid.

Equation 13.9.64 may be written in the more general form for a first-order system as:

τ dT/dt = Ti − T    (13.9.65)

where

τ = Mcp/(UA)    (13.9.66)

and τ is a time constant. A lag of the form given by Equation 13.9.65 is known as a `transfer lag'.

In this case the model obtained is a linear one. If the model obtained is non-linear, it needs first to be linearized. A non-linear model is obtained, for example, if the mode of heat transfer to the sensor is radiation rather than conduction.

The normal approach is then to express each term in the linear model as the sum of the steady-state value and of a transient component. Equation 13.9.65 then becomes:

τ d(Tss + θ)/dt = (Ti,ss + θi) − (Tss + θ)    (13.9.67)

The corresponding steady-state equation is:

0 = Ti,ss − Tss    (13.9.68)

Subtracting Equation 13.9.68 from Equation 13.9.67 gives:

τ dθ/dt = θi − θ    (13.9.69)

where θ is the transient component of temperature and the subscript ss indicates steady state.

Equation 13.9.69 is then transformed into the Laplace, or s, domain by taking the Laplace transform:

τ[sθ̄ − θ(0)] = θ̄i − θ̄    (13.9.70)

Taking the initial condition as the steady state with zero deviation gives θ(0) = 0 and hence the ratio of the output to the input, or the transfer function, is:


Figure 13.15 Effect of instrument error and dynamic response on the safety margin in a trip system (after Hensley, 1988) (Courtesy of the Institute of Measurement and Control)


Figure 13.16 Dynamic response of sensor systems: (a) step forcing function; (b) ramp forcing function; (c) impulse forcing function; (d) response of first-order system to step forcing function; (e) response of first-order system to ramp forcing function; (f) response of first-order system to impulse forcing function; (g) response of second-order system to step forcing function. For second-order system: ωn, natural frequency; ζ, damping factor


��

�i� 1

1� �s�13:9:71�

The response of the first-order system to the threeforcing functions is then as follows. For the stepresponse:

�i � k �13:9:72�where k is a constant Taking the Laplace transform ofEquation 13.9.72 gives

��i �k

s�13:9:73�

Substituting Equation 13.9.73 in Equation 13.9.71 gives:

�� � k

1� �s� �s �13:9:74�

Inverting the Laplace transformed expression 13.9.74back into the time domain:

� � k 1ÿ exp ÿt=�� �� � �13:9:75�Equation 13.9.75 is sometimes written as:

�i� 1ÿ exp ÿt=�� � �13:9:76�

Where this is done, it should be noted that �i is aconstant, defined by Equation 13.9.72, whereas inEquation 13.9.69 it was a variable.

For the ramp response, proceeding in same way:

θi = kt, say    (13.9.77)

θ̄i = k/s²    (13.9.78)

θ̄ = k/[(1 + τs)s²]    (13.9.79)

θ/θi = (τ/t){t/τ − [1 − exp(−t/τ)]}    (13.9.80)

The ramp response has the important property that:

θ/θi → (t − τ)/t,   t → ∞    (13.9.81)

In other words, after an initial transient, the measured value lags the actual value by a time equal to the time constant τ.

For the impulse response:

θi = kδ(t), say    (13.9.82)

θ̄i = k    (13.9.83)

θ̄ = k/(1 + τs)    (13.9.84)

θ/k = (1/τ) exp(−t/τ)    (13.9.85)

where δ(t) is the impulse function. The step, ramp and impulse responses of a first-order system are shown in Figure 13.16(d)-(f).
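These first-order responses are easy to check numerically. The sketch below is illustrative only (τ and k are chosen arbitrarily) and simply evaluates Equations 13.9.75, 13.9.80 and 13.9.85:

```python
import math

def step_response(t, tau, k=1.0):
    # Equation 13.9.75: theta = k[1 - exp(-t/tau)]
    return k * (1.0 - math.exp(-t / tau))

def ramp_response(t, tau, k=1.0):
    # From Equation 13.9.80: theta = k{t - tau[1 - exp(-t/tau)]}
    return k * (t - tau * (1.0 - math.exp(-t / tau)))

def impulse_response(t, tau, k=1.0):
    # Equation 13.9.85: theta = (k/tau) exp(-t/tau)
    return (k / tau) * math.exp(-t / tau)

tau = 5.0
# Step: after one time constant the response reaches ~63.2% of its final value.
print(round(step_response(tau, tau), 3))            # 0.632
# Ramp: for large t the output lags the input kt by k*tau (Equation 13.9.81).
print(round(100.0 - ramp_response(100.0, tau), 6))  # 5.0
```

The last line illustrates the lag property: at t = 100 the measured value trails the true ramp value by almost exactly τ = 5.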

An overdamped second-order system is equivalent to two transfer lags in series. The basic model is therefore:

τ1 dT1/dt = Ti − T1    (13.9.86)

τ2 dT2/dt = T1 − T2    (13.9.87)

where subscripts 1 and 2 indicate the first and second stages, respectively.

The transfer function of the second-order system is:

θ̄2/θ̄i = 1/[(1 + τ1s)(1 + τ2s)]    (13.9.88)

The step response is:

θ2/θi = 1 − [τ1 exp(−t/τ1) − τ2 exp(−t/τ2)]/(τ1 − τ2)    (13.9.89)

The impulse response is:

θ2/θi = [exp(−t/τ1) − exp(−t/τ2)]/(τ1 − τ2)    (13.9.90)

The step response of a second-order system is shown in Figure 13.16(g).

It is sometimes required to provide an unsteady-state model of the sensor for incorporation into an unsteady-state model of the total system. In this case an equation such as Equation 13.9.65 may be used for a first-order system and a pair of equations such as Equations 13.9.86 and 13.9.87 for a second-order system.
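As a sketch of such an unsteady-state model (illustrative values of τ1 and τ2, Euler integration with an arbitrary step size), Equations 13.9.86 and 13.9.87 can be integrated numerically and compared with the analytic step response of Equation 13.9.89:

```python
import math

def analytic_step(t, tau1, tau2):
    # Equation 13.9.89 for a unit step input
    return 1.0 - (tau1 * math.exp(-t / tau1)
                  - tau2 * math.exp(-t / tau2)) / (tau1 - tau2)

def simulated_step(t_end, tau1, tau2, dt=1e-4):
    # Euler integration of Equations 13.9.86 and 13.9.87 with Ti = 1
    T1 = T2 = 0.0
    for _ in range(int(t_end / dt)):
        T1 += dt * (1.0 - T1) / tau1
        T2 += dt * (T1 - T2) / tau2
    return T2

tau1, tau2 = 4.0, 1.0
err = abs(simulated_step(10.0, tau1, tau2) - analytic_step(10.0, tau1, tau2))
print(err < 1e-3)   # True: the simulation agrees with Equation 13.9.89
```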

The three forcing functions may be illustrated by the example of a flammable gas cloud at a gas detector. The gas cloud may present to the sensor as any one of these forcing functions. The gas concentration may rise suddenly from zero to a value which then remains constant (step function), it may rise linearly (ramp function) or it may rise momentarily from zero to a high value and then subside as rapidly (approximated by an impulse function).

An account of the time lags which occur in practical trip systems has been given by R. Hill and Kohan (1986), who characterize the dynamic response of a trip by a ramp function similar to that shown in Figure 13.15 and consider in turn the individual time lags. If the total interval from the time when the process starts to deviate to the time when it reaches the danger point exceeds 2 minutes, there is normally no problem in designing a trip system, but if the interval is less than this there is a potential problem, and, if it is of the order of seconds, a trip solution may well not be practical.

The signal transmission lags to and from the logic system, and the logic system delay itself, are normally negligible, even for pneumatic systems. Exceptions may occur where there are very long pneumatic transmission lines or where the logic is executed on a time-shared device. The more significant lags are likely to be in the sampling and the sensor, in the final control element, and in the process itself. Sampling lags may amount to a dead time of 10-30 seconds. Transfer lags in sensors vary, with temperature measurement lags often being large due to the thermal inertia of the measuring pocket. The lag at the control valve can vary from a fraction of a second up to several minutes, depending on the valve size. The lag in the process is also highly variable.
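One way to apply this reasoning is as a simple lag budget for the loop. All figures below are assumptions for illustration (within the ranges quoted above), not values from Hill and Kohan:

```python
# Hypothetical lag budget for a trip loop, in seconds (all values assumed).
lags_s = {
    "sampling dead time": 20.0,            # assumed, within the 10-30 s range
    "sensor transfer lag": 15.0,           # assumed
    "logic and signal transmission": 0.5,  # normally negligible
    "control valve stroke": 10.0,          # assumed
    "process response": 30.0,              # assumed
}
total_lag = sum(lags_s.values())
print(total_lag)   # 75.5

# Compare with the interval from the start of the deviation to the danger point.
interval_to_danger = 120.0   # the ~2 minute threshold quoted in the text
print(total_lag < interval_to_danger)   # True: a trip solution looks feasible
```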

07:24 7/11/00 Ref: 3723 LEES ± Loss Prevention in the Process Industries Chapter 13 Page No. 42

Page 43: Ch13 Control System Design

C O N T R O L S Y S T E M D E S I G N 1 3 / 4 3

Table 13.24 Ranking of trip system configurations with respect to functional and operational failure (Rushton, 1991b) (Courtesy of the Institution of Chemical Engineers)

Functional ranking (φm/n)    Operational ranking (γm/n)

Low     1/3                  Low     3/3
 ↓      1/2                   ↓      2/2
        2/3                          2/3
        1/1                          1/1
        2/2                          1/2
High    3/3                  High    1/3

13.9.17 Configuration of trip systems
Inspection of Tables 13.17 and 13.19 indicates that the use of parallel redundant, or 1/n, systems gives an increase in functional reliability, but a decrease in operational reliability compared with a 1/1 system. Better overall reliability characteristics can be obtained by the use of a majority voting system, of which the 2/3 system is the simplest. A comparison of a 2/3 system with a 1/1 system shows that the 2/3 system has a high functional and operational reliability, while a comparison with a 1/2 system shows that the 2/3 system has a slightly lower functional reliability, but a much higher operational one. The 2/2 system is little used for trip systems but has some interesting characteristics. It is effectively a series system which has a rather lower functional reliability than a 1/1 system, but its operational reliability exceeds not only that of the 1/1 but also that of the 2/3 system.

The configuration of trip systems has been discussed by Rushton (1991a,b), who describes a formal approach. According to Rushton, for typical systems the ranking of trip systems with respect to their functional and operational reliability is invariant and is as shown in Table 13.24. The trip systems most commonly used are the 1/1, 1/2 and 2/3 systems. The requirement for functional reliability is rarely such as to justify a 1/3 system and that for operational reliability rarely such as to justify a 2/2 or 3/3 system.

The criterion given by Rushton for selection of the trip system configuration is an economic one and is:

V = nC + δφsH + γsS + δ(1 − φs)G    (13.9.91)

where C is the annualized cost of a single-channel trip, G is the cost of a genuine trip, H is the cost of realization of a hazard, S is the cost of a spurious trip, V is the overall annual cost, and the subscript s indicates the trip system (as opposed to a single channel). The most economic solution is that which minimizes V.

For a genuine trip there is an element of loss related to the process failure which causes the demand. If, for purposes of comparison, this element (which will occur in all cases) is neglected, the cost of a genuine trip is approximately the same as that of a spurious one (G ≈ S), so that Equation 13.9.91 becomes:

V = nC + δφsH + [γs + δ(1 − φs)]S    (13.9.92)

If the basic parameters of a particular application are known, namely the demand rate δ, the fail-to-danger and spurious failure rates λ and γ, the proof test interval τp, the repair time τr, and the costs of hazard realization H and spurious trip S, then a plot of H/C vs S/C gives a map showing the regions where a particular configuration is optimal. The boundaries of the regions are curves of constant V/C.

As an illustration, consider the case given by Rushton where the application is characterized by δ = 0.01 demands/year, λ = 0.2 failures/year, γ = 0.5 failures/year, τp = 1/12 year and τr = 1/52 year. The map giving the optimal configurations for this case is shown in Figure 13.17(a). If δ is increased to 1.0 demands/year, the map becomes that shown in Figure 13.17(b).
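This selection procedure can be sketched in code, using the Table 13.25 expressions in the no-common-cause limit (β1 = β2 = 0). The costs C, H and S below are hypothetical, chosen only to exercise the criterion, since Rushton's maps are drawn in terms of the ratios H/C and S/C:

```python
from math import comb

def phi_gamma(m, n, lam, gam, tau_p, tau_r):
    # Fractional dead time and spurious trip rate of an m-out-of-n trip system
    # (general row of Table 13.25 with beta1 = beta2 = 0).
    r = n - m + 1
    phi = comb(n, r) * (lam * tau_p) ** r / (r + 1)
    gam_s = n * gam * (comb(n - 1, m - 1) * gam * tau_r) ** (m - 1)
    return phi, gam_s

def annual_cost(m, n, C, H, S, delta, lam, gam, tau_p, tau_r):
    # Equation 13.9.92: V = nC + delta*phi_s*H + [gamma_s + delta(1 - phi_s)]S
    phi, gam_s = phi_gamma(m, n, lam, gam, tau_p, tau_r)
    return n * C + delta * phi * H + (gam_s + delta * (1.0 - phi)) * S

# Rushton's illustrative parameters; the costs C, H, S are hypothetical.
delta, lam, gam = 0.01, 0.2, 0.5      # demands/year, failures/year
tau_p, tau_r = 1.0 / 12, 1.0 / 52     # years
C, H, S = 1.0, 1.0e5, 50.0
configs = [(1, 1), (1, 2), (2, 2), (1, 3), (2, 3), (3, 3)]
best = min(configs,
           key=lambda mn: annual_cost(*mn, C, H, S, delta, lam, gam, tau_p, tau_r))
print(best)   # (2, 3) for this particular cost ratio
```

For these (assumed) cost ratios the 2/3 system minimizes V, which is consistent with its popularity noted above.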

Rushton also treats the case where there is an element of common cause failure (CCF) and uses for this the beta method described in Chapter 9. He considers the simplest trip configuration to which such failure applies, the 1/2 system. For such a system:

φ1/2 = [(1 − β1)λτp]²/3 + β1λτp/2    (13.9.93)

γ1/2 = 2γ(1 − β2) + γβ2    (13.9.94a)
     = γ(2 − β2)    (13.9.94b)

where β1 is the fraction of the functional failure rate which is common cause, or the beta value for that failure rate, and β2 is the operational beta value.

The effect of CCF may be illustrated by considering the extension given by Rushton of his example to the case of a 1/2 trip system where δ = 1 demand/year, λ = 0.2 failures/year, γ = 0.5 failures/year, τp = 1/12, τr = 1/52 and where β1 = β2 = β. Maps of the configuration space for this case are shown in Figure 13.17(c) and (d) for different values of β. Table 13.25 gives expressions for the fractional dead time and spurious trip rate for different trip configurations.
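Equations 13.9.93 and 13.9.94 are the 1/2 entries of the general m/n expressions given in Table 13.25; a quick consistency check (the parameter values are arbitrary):

```python
from math import comb

def phi_mn(m, n, lam, tau_p, beta1):
    # General m/n fractional dead time (Table 13.25)
    r = n - m + 1
    return (comb(n, r) * ((1.0 - beta1) * lam * tau_p) ** r / (r + 1)
            + beta1 * lam * tau_p / 2.0)

def gamma_mn(m, n, gam, tau_r, beta2):
    # General m/n spurious trip rate (Table 13.25)
    return (n * gam * (1.0 - beta2)
            * (comb(n - 1, m - 1) * (1.0 - beta2) * gam * tau_r) ** (m - 1)
            + gam * beta2)

lam, gam, tau_p, tau_r, b1, b2 = 0.2, 0.5, 1.0 / 12, 1.0 / 52, 0.3, 0.3
# The 1/2 system reduces to Equations 13.9.93 and 13.9.94:
assert abs(phi_mn(1, 2, lam, tau_p, b1)
           - (((1.0 - b1) * lam * tau_p) ** 2 / 3.0 + b1 * lam * tau_p / 2.0)) < 1e-15
assert abs(gamma_mn(1, 2, gam, tau_r, b2) - gam * (2.0 - b2)) < 1e-15
# The 3/3 system: gamma_3/3 = 3*gamma^3*(1 - beta2)^3*tau_r^2 + gamma*beta2
assert abs(gamma_mn(3, 3, gam, tau_r, b2)
           - (3.0 * gam ** 3 * (1.0 - b2) ** 3 * tau_r ** 2 + gam * b2)) < 1e-15
print("Table 13.25 entries consistent")
```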

This cost-based approach allows the different trip system configurations to be put on a common basis for purposes of comparison. Where the hazard includes one to human life, there will be a certain level of functional reliability which must be achieved and this should be a factor in the choice of configuration. The approach described may still be applicable, with adaptation, in judging which configurations are reasonably practicable.

13.9.18 Integration of trip systems
As already described, a trip system is normally dormant and comes to life only when a demand occurs. An element of the trip system such as a sensor or a valve may experience failure and such a failure will lie unrevealed unless detected by proof testing or some other means. By contrast, equivalent elements in a control system are exercised continuously, and failure in such an element is liable to cause an operational excursion of some kind. The failure in this case is a revealed one. Yet the actual physical fault in the two cases may well be identical. A sensor may fail giving a low/zero or high reading, or a valve may jam open or shut. The concept of trip integration, which has been described by Rushton (1992), is based on this contrast


between a fault which lies unrevealed in a trip system but is revealed in a measurement and control system. The principle applies to any system which has a protective function. The system is regarded as integrated provided it is regularly exercised, which generally means that it is in use during the normal operation of the plant.

As an illustration, Rushton describes a refrigerated storage tank for a toxic liquid, equipped with a cooling system and a pressure relief valve. Both are protective systems, but the cooling system is in more or less continuous use and is thus integrated, whereas the relief valve is not.

In this case the integration is benign, but it can also be malign. As an example of the latter, the author cites the case of a sensor which is common to both a level control loop and a high level alarm. Failure of the sensor results in failure not only of the control loop but also of the alarm.

It should be an aim of trip system design to convert unrevealed failures into revealed failures, and hence to enhance reliability, by the judicious exploitation of benign integration.

13.9.19 Maintenance of trip systems
The foregoing account of trip systems has brought out the importance of proof testing. This testing and, more generally, the maintenance of trips need to be of a high standard if the design reliabilities are to be achieved. Accounts of the testing and maintenance of trip systems have been given by R.M. Stewart (1971), A. Taylor (1981) and Barclay (1988).

The system described by Barclay is broadly as follows. The trips on a plant are covered by a testing schedule which specifies a test interval for each trip system. A common test interval is 12 months, but the interval is established for each trip individually. A change to the test interval, or complete removal of the trip, is governed by formal procedures which involve consultation with the interested parties.

There is a written procedure for the test which details the actions to be taken. This is necessary because the procedure can be quite complex, because the individual performing the test may not be familiar with the particular trip and because in many cases the test is one of the last tasks done prior to a start-up, when there may be considerable pressure. This procedure can be changed only after formal consultations.

The test should cover the whole trip from initial to final element. From the point of view of testing, the preferred method is an actual test, in which the procedure is to take the process to the trip point and verify the trip action. The alternative is a simulated test, which is performed by simulating process conditions using test equipment.


Figure 13.17 Configuration selection map for trip systems: illustrative examples (Rushton, 1991b): (a) case with δ = 0.01, β = 0; (b) case with δ = 1.0, β = 0; (c) case with δ = 1, β = 0.3; and (d) case with δ = 1, β = 1. See text for further details (Courtesy of the Institution of Chemical Engineers)


In many cases it is impractical to carry out an actual test. In the case of a hazardous process, the reasons are obvious. But even for a less hazardous process the number of trips may be such that repeated shut-down and start-up is not practical. On a plant with 30 or 40 trips the equipment may be worn out just by testing.

It can be misleading to rely on a single-point trip as a sufficient test. And particularly where there is complex logic, it is necessary to exercise all the steps in the chain; omission of intermediate steps can be misleading.

Instruments which are part of a trip system are provided with identification, both on circuit diagrams and in the field by a tag. This helps avoid shut-downs caused by work on such instruments.

The trip system is maintained in good condition by preventive maintenance. Equipment is inspected for deterioration. Critical equipment is classified as such and subject to periodic overhaul. It is required that following maintenance work a function check be carried out on the equipment. A trip which is out of service or fails to operate is not tolerated. It is classed as a hazard and action is taken.

At the site described by Barclay, there are some 35 000 to 40 000 instruments with more than 5000 trips and interlocks. Trip maintenance is handled by a computerized system. The responsibility for testing in this works lies with the operations rather than the mechanical function. Essentially similar considerations apply to the maintenance of interlocks.

13.10 Interlock Systems

Interlocks are another important type of protective device. They are used to control operations which must take place in a specified sequence and equipments which must have specified relations between their states. This definition of an interlock differs from that often used in the American literature, where the term `interlock' tends to be applied to both trip and interlock systems (as defined here).

Accounts of interlock systems are given in Applied Symbolic Logic (E.P. Lynch, 1980) and Logical Design of Automation Systems (V.B. Friedman, 1990) and by D. Richmond (1965), E.G. Williams (1965), Becker (1979), Becker and Hill (1979), Kohan (1984) and the CCPS (1993/14).

There are various kinds of interlock. The original type is a mechanical device such as a padlock and chain on a hand valve. Another common type is the key interlock. Increasing use is made of software interlocks based on process computers.

Some typical applications of interlocks are in such areas as:

(1) electrical switchgear;
(2) test cubicles;
(3) machinery guards;
(4) vehicle loading;
(5) conveyor systems;
(6) machine start-up and shut-down;
(7) valve systems;
(8) instrument systems;
(9) fire protection systems;
(10) plant maintenance.

An interlock is often used to prevent access as long as an equipment is operating. Thus electrical switchgear may be installed in a room where an interlock prevents the door opening until there is electrical isolation. Similarly, an interlock prevents access to a test cubicle for operations involving high pressure or explosive materials until safe conditions pertain. An interlock may be used to stop access to a machine or entry into a vessel unless the associated machinery cannot move. In vehicle loading, interlocks are used to prevent a tanker


Table 13.25 Fractional dead times and spurious trip rates for trip systems with simultaneous proof testing and common cause failures accounted for by the beta method (Rushton, 1991b) (Courtesy of the Institution of Chemical Engineers)

System   φm/n                                        γm/n

1/1      λτp/2                                       γ
1/2      [(1 − β1)λτp]²/3 + β1λτp/2                  γ(2 − β2)
2/2      λτp(1 − β1/2)                               2γ²(1 − β2)²τr + γβ2
1/3      [(1 − β1)λτp]³/4 + β1λτp/2                  γ(3 − 2β2)
2/3      [(1 − β1)λτp]² + β1λτp/2                    6γ²(1 − β2)²τr + γβ2
3/3      (λτp/2)(3 − 2β1)                            3γ³(1 − β2)³τr² + γβ2
m/n      C(n, r)[(1 − β1)λτp]^r/(r + 1) + β1λτp/2,   nγ(1 − β2)[C(n − 1, m − 1)(1 − β2)γτr]^(m−1) + γβ2
         where r = n − m + 1


moving away while it is still connected to the discharge point.

Where synchronized operation of equipment is necessary, as in a conveyor system, interlocks are used to ensure this. Interlocks are used for the start-up of machinery to ensure that all the prestart conditions are met, that the correct sequence is followed and that conditions for transition from stage to stage are met. For large rotating machinery, key factors are process conditions and oil pressures.

Pressure relief valves have interlocks to prevent all the valves being shut off simultaneously. There may be interlocks on other critical valve systems. Interlocks are also a part of instrument systems. An interlock may be used to prevent the disarming of a trip system unless certain conditions are met. Fire protection systems are provided with interlocks as a safeguard against leaving the system disabled, particularly after testing or maintenance. Plant maintenance operations make much use of interlocks to prevent valves being opened or machinery started up while work is in progress.

Some features of a good hardware interlock are that it (1) controls operations positively, (2) is incapable of defeat, (3) is simple, robust and inexpensive, (4) is readily and securely attachable to engineering devices and (5) is regularly tested and maintained.

Some interlocks are quite simple, but some interlock systems are quite complex. Such systems are often not confined to interlocks, but incorporate other logic functions. Interlock systems therefore shade over into general logic control systems. In particular, there are some very large interlock systems on boilers and gas turbines.

An especially important type of logic control is the control of sequential operations. Sequential control systems usually have numerous checks which must be satisfied before the next stage is initiated and checks that equipment has obeyed the control signals. These checks constitute a form of interlock.

Since an interlock can bring the process to a halt, it is important to provide adequate status and alarm signals to indicate which feature is responsible for the stoppage. It will be apparent that some interlocks are effectively trips. The distinction between the two is often blurred.

The interlocks described so far are simple rather than high integrity systems, but the latter can, of course, be used if the situation warrants it. The general approach is similar to that described for trip systems.

13.10.1 Interlock diagrams
As with protective systems, so with interlock systems, the design may involve a number of parties and a common language is needed. Unfortunately, this is an area of some difficulty, for three reasons. The description of interlock systems involves the use of several different types of diagram; there appears to be considerable variability in the types of diagram employed and in the nomenclature used to describe them. The symbols for use in these diagrams are given in standards; however, not only are these standards subject to continuous revision, but also the symbols given are often not those in common use. Interlock systems are not well served with textbooks. In particular, there is in electrical engineering a large literature on switching systems, but very little of this addresses process interlock systems as such.

Three types of diagram commonly used in the design of interlock systems in the process industries are (1) the process flowchart, (2) the logic diagram and (3) the ladder diagram. The last two are sometimes referred to as the `attached logic diagram' and the `detached logic diagram', respectively.

The starting point for design of an interlock system is a description of a sequence of operations. A diagram showing this is a process chart. Process chart symbols have been given in Work Study (Currie, 1960) and are shown in Table 13.26, Section A.

The logic required to implement this sequence may be shown in a logic diagram. This utilizes standard symbols for functions such as OR, AND and NOT, similar to those used in fault tree work, as described in Chapter 9. Standard symbols for fault trees are given in BS 5760 Reliability of Systems, Equipment and Components, Part 7: 1991 Guide to Fault Tree Analysis. For some functions, two sets of symbols are given, the preferred and the alternative. It is the latter which are commonly used in the process industries and which are used here. The logic symbols used here are the alternative symbols given in BS 5760 and are shown in Table 13.26, Section B.

The logic diagram may then be converted to a ladder diagram. Standard symbols for protective logic systems are given in BS 3939: 1985 Graphical Symbols for Electrical Power, Telecommunications and Electronic Diagrams. The relevant IEC standard is IEC 617 Graphical Symbols for Diagrams. BS 3939: Part 7: 1985 Switchgear, Controlgear and Protective Devices, which is identical to IEC 617-7, gives relevant symbols. Other sets of symbols include those given by E.G. Williams (1965) and those of E.P. Lynch (1980). An account of the evolution of logic symbols is given in An Introduction to the New Logic Symbols (Kampel, 1986). Table 13.26, Section C, shows a selection of symbols, including those used here, from those given by Lynch.

13.10.2 Some basic systems
Some of the basic building blocks of interlock systems are illustrated in Figure 13.18. Figure 13.18(a) shows a simple starting circuit. Activation of the circuit occurs if there is a signal due to depression of the start pushbutton AND a signal due to non-depression of the stop pushbutton. Since the signal from the start pushbutton will disappear when it is no longer being depressed by the operator, it is necessary to provide the feedback signal shown, which ensures that there continues to be an output signal. If the stop pushbutton is depressed, the output signal is extinguished.

Figure 13.18(b) shows a time delayed holding circuit. If, following activation by the start pushbutton, the signal X does not appear within the time interval specified, the output signal disappears. A typical application of this circuit is start-up of a motor-driven pump which is supplied with lubricating oil by a lube oil pump driven from the same motor. If after the time interval specified the lubricating oil pressure signal is still absent, the pump is shut down.

Figure 13.18(c) shows a self-extinguishing circuit. Activation of the pushbutton gives an output signal which continues until the time interval specified has elapsed, when the output signal is extinguished. This circuit might typically be used to have a motor-driven equipment run for a period and then shut down.
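The latching behaviour of the simple starting circuit of Figure 13.18(a) can be written as a boolean update rule; this is a sketch, and the signal names are illustrative:

```python
def starting_circuit(start_pb, stop_pb, held):
    # Output = (start pressed OR output already held) AND stop not pressed.
    # The 'held' term is the feedback signal that keeps the output alive
    # after the operator releases the start pushbutton.
    return (start_pb or held) and not stop_pb

out = False
out = starting_circuit(True, False, out)   # operator presses start
out = starting_circuit(False, False, out)  # start released: feedback holds it in
print(out)                                 # True
out = starting_circuit(False, True, out)   # stop pressed: output extinguished
print(out)                                 # False
```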

13.10.3 Illustrative example: conveyor system
As an illustration of an interlock system, consider the conveyor system described by Lynch. A screw conveyor A feeds material from a car vibrator to an elevator which discharges to screw conveyor B above two storage bins A and B. There is a slide gate on the pipe between conveyor B and each bin, with a limit switch on each gate. Material is fed from a bin by a star feeder into screw conveyor C. The loading equipment can fill the bins at several times the rate at which it can be withdrawn.

Figure 13.19(a) shows a logic diagram for the interlocks for manual operation of this system. Conveyor B can be started only if either A or B slide gate is open. The elevator can be started only if conveyor B is running. Conveyor A can be started only if the elevator is running. The diagram also shows the simple non-interlocked starting circuit for the car vibrator.

The corresponding ladder diagram is shown in Figure 13.19(b). The diagram shows six circuits A-F. Certain relays occur in more than one circuit, e.g. relay R1 in circuits A and D, and it is this which imparts the interlocking feature. Circuit A is the starting circuit for conveyor B. This circuit can be activated only if either relay R2 or R3, the relays for the slide gate limit switches (LS), is closed. If this condition is met, depression of the start pushbutton (PB) energizes relays R1 and M1 and causes R1 to close and M1 to operate a relay in the power circuit. When the stop pushbutton is pressed, the circuit is de-energized and R1 opens.

In circuit B closure of the slide valve limit switch LS1 energizes relay R2 and causes it to close, and opening of the switch causes R2 to open. Circuit C implements a similar relationship between limit switch LS2 and relay R3. Circuit D is the starting circuit for the elevator. The circuit can be activated only if relay R1 is closed. If this condition is met, depression of the start button energizes relays R4 and M2 and causes R4 to close and M2 to


Table 13.26 Interlock logic symbols

A Work study symbols^a

Symbol    Activity      Predominant result
          Operation     Produces, accomplishes, changes or furthers the process
          Inspection    Verifies quantity or quality
          Transport     Moves or carries
          Delay         Interferes or delays
          Storage       Holds, retains or stores

B Logic symbols^b

AND

OR

NOT

Delay

C Ladder diagram symbols^c

Pushbutton start

Pushbutton stop

Position, or limit, switch

Relay or solenoid contacts, normally open, closed when relay or solenoid is energized

Relay or solenoid contacts, normally closed, opened when relay or solenoid is energized

Motor n

Relay n

Solenoid n

^a These symbols are given by Currie (1960), who attributes them without reference to the American Society of Mechanical Engineers.
^b These symbols are given in BS 5760: Part 7: 1991. The alternative symbol for NOT is a common alternative and is that used by E.P. Lynch (1980).
^c These symbols are those used by E.P. Lynch (1980).


Figure 13.18 Some basic interlock system logic diagrams: (a) simple starting circuit; (b) time delayed holding circuit; and (c) self-extinguishing circuit. PB, pushbutton

Figure 13.19 Conveyor interlock system diagrams: (a) logic diagram; and (b) ladder diagram (E.P. Lynch, 1980) (Reproduced with permission from Applied Symbolic Logic by E.P. Lynch, Copyright © 1980, John Wiley and Sons Inc.)


operate. Circuit E is the starting circuit for conveyor A, and is similar to circuit D. The circuit can be activated only if relay R4 is closed. Circuit F is a simple starting circuit and is not interlocked.
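The start-permissive conditions of this interlock chain can be expressed as plain boolean functions; this is a sketch following the text rather than Lynch's ladder diagram, and the names are illustrative:

```python
# Start permissives for the conveyor chain (sketch; names illustrative).
def can_start_conveyor_b(gate_a_open, gate_b_open):
    # Conveyor B may start only if either slide gate is open.
    return gate_a_open or gate_b_open

def can_start_elevator(conveyor_b_running):
    # The elevator may start only if conveyor B is running.
    return conveyor_b_running

def can_start_conveyor_a(elevator_running):
    # Conveyor A may start only if the elevator is running.
    return elevator_running

# With both gates shut, nothing upstream may be started:
print(can_start_conveyor_b(False, False))    # False
# Opening gate A permits conveyor B, then the elevator, then conveyor A:
print(can_start_conveyor_b(True, False)
      and can_start_elevator(True)
      and can_start_conveyor_a(True))        # True
```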

13.10.4 Illustrative example: reactor system
Another example of a simple interlock system is illustrated in Figures 13.20 and 13.21. Figure 13.20 shows a plant consisting of a water-cooled reactor in


which a batch reaction is carried out. The reactor is charged with chemical A and chemical B is then fed in gradually from a weigh tank as the reaction proceeds. The interlock system is required to cut off the supply of B from the weigh tank if any of the following conditions apply: (1) the shut-off valve V3 on reactor 2 is open; (2) the agitator is not operating; (3) the agitator paddle has fallen off; or (4) the reactor temperature has risen above a fixed limit. The loss of the agitator paddle is detected by a current-sensitive relay on the motor.

An interlock system for carrying out these functions is shown in Figure 13.21. The start input opens valve 1, unless valve 3 is open or the agitator is stopped, which conditions inhibit start-up. If these conditions occur later, or if the reactor temperature rises or the agitator paddle falls off, valve 1 is closed. The interlock causing the closure is signalled by a status or alarm display. There is a 10 s delay on the reactor high temperature interlock to allow for noise on that signal. If operation is inhibited by the reactor high temperature or agitator stoppage interlocks, these inhibitions are removed 5 and 10 min, respectively, after the inhibiting condition has disappeared. An account of the reliability of interlock systems is given by R.A. Freeman (1994).
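The four cut-off conditions can be collected into a single shut-off predicate; this is a sketch only, omitting the 10 s noise delay and the 5 and 10 min inhibition-release timers:

```python
# Cut off the feed of B if any of the four trip conditions applies (sketch).
def cut_off_feed(v3_open, agitator_running, paddle_present, temp_high):
    return (v3_open
            or not agitator_running
            or not paddle_present
            or temp_high)

print(cut_off_feed(False, True, True, False))   # False: normal operation
print(cut_off_feed(False, True, False, False))  # True: paddle lost
print(cut_off_feed(False, False, True, False))  # True: agitator stopped
```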

13.11 Programmable Logic Systems

As already indicated, increasing use is made in process control systems of programmable logic controllers (PLCs). An account of the application of PLCs to functions such as pump change over, fire and gas detection and emergency shut-down has been given by

Margetts (1986a,b). He describes the planning of an operation such as pump change over using hierarchical task analysis, in which the change over task is successively redescribed until it has been broken down into executable elements, and the application of the hazard and operability (hazop) method to assess the adequacy of the resultant design.

He also deals with the reliability of the PLC system. For the system which he considers, the MTBFs of the input device, the control logic and the output device are 100 000, 10 000 and 50 000 h, respectively, giving an overall system MTBF of 7690 h. Use of as many as four control logic units in parallel would raise the system MTBF to 14 480 h, but this is not the complete answer. The method described by the author for the further enhancement of reliability is the exploitation of the ability of the PLC to test the input and output devices and also itself.
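The quoted system figure follows from adding the failure rates of the three elements in series (failure rate = 1/MTBF), which can be verified directly:

```python
# Series-system MTBF: failure rates (1/MTBF) add for elements in series.
mtbf_h = {
    "input device": 100_000.0,
    "control logic": 10_000.0,
    "output device": 50_000.0,
}
system_rate = sum(1.0 / m for m in mtbf_h.values())
system_mtbf = 1.0 / system_rate
print(round(system_mtbf))   # 7692, consistent with the ~7690 h quoted
```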

13.12 Programmable Electronic Systems

Increasingly, the concept of computer control has become subsumed in the broader one of the programmable electronic system (PES). The account given here is confined to the safety aspects of PESs and is based on the HSE PES Guide. The treatment in the CCPS Safe Automation Guidelines is discussed in Section 13.15.

13.12.1 HSE PES Guide
An account of programmable electronic systems and their safety implications is given in Programmable Electronic Systems in Safety Related Applications (HSE, 1987b) (the HSE PES Guide), of which Part 1 is an


Figure 13.20 Batch reactor system. FIC, flow indicator controller; S, speed measurement; TI, temperature indicator; TIC, temperature indicator controller


Introductory Guide (PES 1) and Part 2 the General Technical Guidelines (PES 2). The general configuration of a PES is shown in Figure 13.22.

Whereas in a safety-related system the use of conventional hardwired equipment is routine, the use of a PES in such an application has been relatively unknown territory. The approach taken, therefore, has been to assess the level of integrity required in the PES by reference to that obtained with a conventional system based on good practice. This level of integrity is referred to as `conventional safety integrity'.

PES 2 gives three system elements which should be taken into account in the design and analysis of safety-related systems:

(1) configuration;
(2) reliability;
(3) overall quality.

Safety integrity criteria for the system should be specified which cover all three of these system elements.

13.12.2 Configuration
The configuration of the system should be such as to protect against failures, both random and systematic. The former are associated particularly with hardware and the

latter with software. PES 2 lays down three principles which should govern the configuration:

(1) the combined number of PES and non-PES safety-related systems which are capable, independently, of maintaining the plant in a safe condition, or bringing it to a safe state, should not be less than the number of conventional systems which have provided conventional safety integrity;

(2) no failure of a single channel of programmable electronic (PE) hardware should cause a dangerous mode of failure of the total configuration of safety-related systems;

(3) faults within the software associated with a single channel of PE should not cause a dangerous mode of failure of the total configuration of safety-related systems.

Observance of the second principle may require that, in addition to the single channel of PE hardware, there should be at least one additional means of achieving the required level of safety integrity. Three such means might be:

(1) additional non-programmable hardware;
(2) additional programmable hardware of diverse design;
(3) additional PE hardware of the same design.


Figure 13.21 Batch reactor interlock system logic diagram


The latter is applicable only where the design is well established and there is a record of reliable operation in an environment similar to that under consideration.

Observance of the third principle may require that where a single design of software is used, there should be an additional means of achieving safety integrity. Such means may be:

(1) additional software of diverse design;
(2) additional non-programmable hardware.

Diversity of software is required only where:

(1) PES safety-related systems are the sole means of achieving the required level of safety integrity;

(2) faults in the software of a single channel of PE might cause a dangerous mode of failure of the total configuration of safety-related systems.

This strategy is intended to protect against systematic failures and, in particular: (a) software errors in the embedded or applications software; (b) differences in the detailed operation of microprocessor and other large-scale integrated circuits from that specified; (c) incompatibility between original and replacement hardware modules;


Figure 13.22 A programmable electronic system (HSE, 1987b). ADC, analogue-to-digital converter; DAC, digital-to-analogue converter; NP, non-programmable hardware; PE, programmable electronics (Courtesy of HM Stationery Office)


and (d) incompatibility of updated or replacement embedded software with original software or hardware.

The extent to which it is necessary to have diversity of software depends on the application. As a minimum, the safety-related function should use diverse applications software. For higher reliability it may be necessary to consider also diverse embedded software. The safety requirements specification is necessarily a common feature of the diverse software implementations. It is therefore important that it be correct.

PES 2 recognizes that there may exist other ways of providing against failure. In some applications it may be possible to achieve the required level of safety integrity by adopting a formal approach to the software design and testing. Furthermore, for situations where a relatively low level of reliability is acceptable, the use of a single PE channel may be acceptable provided there is extensive self-monitoring of the hardware and automatic safety action on detection of failure.

13.12.3 Reliability
The governing principle for reliability of the hardware is that the overall failure rate in a dangerous mode of failure, or, for a protection system, the probability of failure to operate on demand, should meet the standard of conventional safety integrity. PES 2 specifies three means of meeting this criterion:

(1) a qualitative appraisal of the safety-related systems, using engineering judgement;

(2) a quantified assessment of the safety-related systems;
(3) a quantified assessment of the safety of the plant.

Essentially, the level of reliability should be governed by the conventional safety integrity principle. Where the acceptable level of reliability is relatively low, the first method may suffice, but where a higher reliability is required the second and third methods will be appropriate.

13.12.4 Overall quality
Whilst the foregoing measures concerning configuration and reliability are a necessary framework, they are not in themselves sufficient. In particular, systematic errors creep in due to deficiencies in features such as the safety requirements specification and software faults. They need, therefore, to be supplemented by the third system element, overall quality. Overall quality is concerned essentially with high quality procedures and engineering. These should cover the quality of the specification, design, construction, testing, commissioning, operation, maintenance and modification of the hardware and software.

In determining the level of overall quality to be aimed for, regard should be paid to the level which would be appropriate for conventional safety integrity and to the level determined for the system elements of configuration and reliability. As a minimum, attention should be paid to (a) the quality of manufacture and (b) the quality of implementation. For overall quality to match a higher level of reliability, (c) each procedural and engineering aspect should be reviewed.

Qualitative assessment checklists in support of such a review are included in PES 2 in Appendix 7. Three sets of checklists are given, for (1) a control computer, (2) a programmable logic controller (PLC) and (3) common cause failure (CCF). The headings of these checklists are: (1) safety requirements specification; (2) hardware specification; (3) hardware design; (4) hardware manufacture; (5) hardware test; (6) installation; (7) system test; (8) operations; (9) hardware maintenance and modification; (10) software specification; (11) software design; (12) software coding; (13) software test; (14) embedded software; (15) application programming; and (16) software maintenance and modification. The applicable headings are: for a control computer, all except (14)-(15); for a PLC, all except (10)-(13); and for CCF, all.

13.12.5 Design considerations
PES 2 describes a number of design considerations which are particularly relevant to the safety integrity of PESs. The replacement of a control chain in which the sensor sends a signal directly to the actuator by one which involves analogue-to-digital (A/D) converters and PEs may reduce the safety integrity. Unless there is a positive contribution to safety by doing otherwise, the direct route between sensor and actuator should normally be retained. An additional signal may be taken from the sensor, with suitable isolation, to the PEs.

For the execution of safety functions, it may be necessary to have a shorter sampling interval than is required for normal measurement and control functions.

There should be hardware or software to ensure that on switch-on, or on restart after a power failure, the resetting of the system is complete and the point in the program at which entry occurs is a safe one. Interruptions of the power supply should be catered for and should not lead to unidentified or unsafe conditions.
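The restart requirement above can be sketched in software. The following is an illustrative sketch only, not taken from the PES Guide; all names (the controller class, the output labels, the safe step) are hypothetical. The point is that after a power interruption the sequence is not resumed mid-step: outputs are driven to their de-energized state and execution re-enters at a defined safe point.

```python
# Hypothetical sketch of safe re-entry after a power failure.
SAFE_STEP = "idle"  # the only step at which execution may resume

class SequenceController:
    def __init__(self):
        self.step = SAFE_STEP
        self.outputs = {"feed_valve": "closed", "heater": "off"}

    def power_fail(self):
        # Power loss: the in-memory step counter can no longer be trusted.
        self.step = None

    def restart(self):
        # On switch-on, do NOT resume mid-sequence: drive all outputs to
        # their de-energized (safe) state and re-enter at the safe step.
        self.outputs = {"feed_valve": "closed", "heater": "off"}
        self.step = SAFE_STEP

c = SequenceController()
c.step = "nitration"          # sequence in progress when power is lost
c.power_fail()
c.restart()
print(c.step, c.outputs)
```

In a real PES this resetting would be done by hardware or by watchdog-supervised firmware; the sketch only shows the required behaviour, a complete reset to a known safe entry point.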

As far as practicable, safety critical functions should be automatically monitored or should be self-checking. The emergency shut-down systems should be proof checked at appropriate intervals to discover unrevealed failures.

PES 2 also gives detailed guidance on the environmental aspects of PESs, particularly in respect of electrical interference and of electrostatic sensitive devices.

13.12.6 Software considerations
The software for use in safety-related applications needs to be of high quality and PES 2 gives an account of some of the measures which may be taken to achieve this. These include:

(1) safety requirements specification;
(2) software specification;
(3) software design, coding and test;
(4) system test.

They also include:

(5) software modification procedures.

For all these aspects there should be

(6) formal documentation.

PES 2 puts considerable emphasis on the safety requirements specification, as already described. It also devotes a good deal of space to the control of software changes.


Figure 13.23 A nitrator unit under control of a PES (HSE, 1987b) (Courtesy of HM Stationery Office)


A further account of software reliability is given in Section 13.13.

13.12.7 Illustrative example
PES 2 gives as an illustrative example part of the safety integrity assessment of a plant for the manufacture of the explosive pentaerythritol tetranitrate (PETN). Figure 13.23 shows a schematic diagram of one of the nitrators.

A particularly critical parameter is high temperature in the nitrator, the limit being 35°C. For this, protection is provided in the form of a dump valve, which opens to dump the reactor contents to a drowning tank. The conventional control and protection system for such a plant is a control system incorporating some protection features and a single dedicated protection system.

In the design considered, the plant is controlled by a control computer which performs the basic control of the operating sequence. At each stage of the sequence the computer performs checks to ensure that the previous stage is complete and that the plant is in the correct state and ready to proceed to the next stage.

On each of the critical parameters there are duplicate sensors. The signal from one sensor goes to the computer and that from the other to a PLC. The computer and the PLC operate their own relays in the appropriate interlock system. An attempt by the computer to take an action is inhibited if: (1) the PLC relay contact is not closed in agreement with the computer; or (2) the combination of permissives being sent to the computer is correct but indicates that the action is to be inhibited; or (3) the combination of permissives being sent to the computer is incompatible with the inputs to the PLC. The latter occurrence is assessed by the computer. Thus the control of critical parameters and sequences is by the computer monitored by the PLC, which is in turn monitored by the computer.

For the critical parameter of high temperature in the nitrator, the computer and PLC act in effect as a 1-out-of-2 protective system. If either detects a high temperature, it acts to open the dump valve on the reactor.
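The 1-out-of-2 voting described above can be sketched as follows. This is an illustrative sketch, not code from the guide; the function name is hypothetical, while the 35°C limit and the two channels (computer and PLC) come from the text.

```python
# Hypothetical sketch of the 1-out-of-2 high-temperature trip: the dump
# valve is demanded if EITHER the computer channel OR the PLC channel
# reads above the limit, so a single healthy channel suffices to trip.

TEMP_LIMIT_C = 35.0   # nitrator high-temperature limit from the text

def dump_demanded(computer_temp_c, plc_temp_c):
    # 1oo2 voting: one channel alone is sufficient to open the dump valve.
    return computer_temp_c > TEMP_LIMIT_C or plc_temp_c > TEMP_LIMIT_C

assert dump_demanded(36.2, 30.0)       # computer channel alone trips
assert dump_demanded(30.0, 35.1)       # PLC channel alone trips
assert not dump_demanded(30.0, 30.0)   # no demand below the limit
```

The design choice here is availability of protection: a 1oo2 arrangement tolerates a dangerous failure of one channel, at the price of a higher spurious trip rate than a 2oo2 arrangement.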

A hazard analysis of the nitrator system was performed and a fault tree developed for the top event `Decomposition', as shown in Figure 13.24(a). This event occurs if there is a demand in the form of high temperature and a failure of the top level of protection (`protection fails'). The guide includes the further fault trees for the events C1-C4 and B4-B8.

Figure 13.24(b) shows one of these constituent fault trees, subtree B5 for the event `no dump signal'. This has several interesting features. The top event in the


Figure 13.24 Sections of fault tree for the top event `Decomposition' for the nitrator unit shown in Figure 13.23 (HSE, 1987b): (a) top section of fault tree; and (b) subtree for event B5 (Courtesy of HM Stationery Office)


subtree will occur if both the computer and the PLC fail to send a trip signal. Among the causes of the computer failing to send this trip signal are various combinations of instrument failure, including common cause failure of all the instruments in one group, e.g. resistance thermometers RT1 and RT2. In the tree the failures of RT1 and RT2 are regular random failures, whilst the failure `CCF of RTs' is the CCF for this group. The separation of the CCF in this way both highlights it as a specific event and assists in assigning to it a numerical value.

Other CCFs occur higher up the subtree, just beneath the top event. These are the CCFs of: (a) all three resistance thermometers, RT1 and RT2 on the computer and RT3 on the PLC; (b) the hydraulic valves; and (c) both the computer and the PLC.

A safety integrity analysis of the system is given in which each of the three system elements (configuration, reliability and overall quality) is examined. For configuration, a check is made against each of the three principles given in Section 13.12.2. In respect of criterion 1, the combined number of PES and non-PES systems is two, as in the conventional system, so this criterion is met. For criterion 2, failure of no single channel, computer or PLC, will cause loss of protection, so criterion 2 is met. For criterion 3, failure of no single set of software, on the computer or on the PLC, will cause loss of protection, so criterion 3 is met.

For reliability, the fault tree is analysed to produce the cut sets and it is shown that the Boolean relation for the top event A1 is:

A1 = D10 + Z + X(B6 + J13 + J14 + J15 + J16 + J18 + J20 + L1 + L2 + Y) + Y(D11 + D12) + (B1 + D1 + D2 + D11 + D12 + H1 + H2)(J20 + B6)    (13.12.1)

The frequency of the top event A1 was then estimated by applying data on the frequency of failures and, for protective features, by utilizing Equation 13.9.18 with data on proof test intervals. Table 13.27 shows the numerical values obtained for the events given in Equation 13.12.1. The frequency of the top event was found to be 8.8 × 10⁻⁴/year. This was based on a number of pessimistic assumptions and on this basis was deemed an acceptable frequency.
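The arithmetic of this estimate can be cross-checked with a short calculation. The sketch below is not from the guide: it simply combines the Table 13.27 values using the rare-event (sum of cut sets) approximation of Equation 13.12.1 and the fractional dead time formula PFD = λτ/2 (Equation 13.9.18) for the proof-tested items, with a 730 h month assumed. The unquantified or negligible events (Z, B1, L1, H1, H2) are omitted, as in the table totals.

```python
# Rough numerical cross-check of the Table 13.27 totals. Rates are in
# failures per 1e6 h; PFD = lambda * tau / 2 for proof-tested elements.

HOURS_PER_MONTH = 730.0

def pfd(rate_per_1e6h, test_interval_h):
    # Probability of failure on demand for a proof-tested protective element
    return rate_per_1e6h * 1e-6 * test_interval_h / 2.0

# Sigma1: demand-mode terms multiplying X in Equation 13.12.1
sigma1 = (3.8e-4                          # B6 drain valve fails to open
          + pfd(2.0, HOURS_PER_MONTH)     # J13 thermometer RT3 low
          + 1.2e-3                        # J14 temperature switch TSW1
          + pfd(1.0, HOURS_PER_MONTH)     # J15 PLC input LUI 1
          + 7.5e-4                        # J16 valve V3 fails to open
          + pfd(1.0, HOURS_PER_MONTH)     # J18 PLC output LUO 1
          + 2.3e-6                        # J20 CCF of hydraulic valves
          + 1e-3                          # L2 drowning tank drain valve open
          + pfd(2.5, HOURS_PER_MONTH))    # Y PLC dangerous failure

sigma2 = 0.06 + 0.03                      # D11 + D12 (rates)
sigma3 = 2.6 + 1.0 + 0.06 + 0.03          # D1 + D2 + D11 + D12 (rates)
sigma4 = 2.3e-6 + 3.8e-4                  # J20 + B6 (probabilities)

# Total = D10 + X*Sigma1 + Y*Sigma2 + Sigma3*Sigma4, in failures/1e6 h
total = 0.06 + 7.5 * sigma1 + pfd(2.5, HOURS_PER_MONTH) * sigma2 + sigma3 * sigma4
per_year = total * 1e-6 * 8760.0
print(per_year)   # about 9e-4/year, the order of the 8.8e-4 quoted
```

The small difference from the published 8.8 × 10⁻⁴/year figure is within the rounding of the tabulated values.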

For overall quality, it was considered necessary to examine not only the quality of manufacture and implementation, but also the procedures and engineering. For this the checklists given in Appendix 7 of PES 2 were used and the results obtained for each item in this example are shown in that appendix. This check led to consideration of the following modifications: (a) addition of software limits on programmable alarm and trip levels; and (b) provision of test signal injection and monitoring


Figure 13.24 continued



Table 13.27 Some results obtained in the estimation of the hazard rate of the nitrator shown in Figure 13.23 (HSE, 1987b) (Courtesy of HM Stationery Office)

| Event reference | Event description | Failure rate (failures/10⁶ h) | Test interval | Probability of failure on demand |
| D10 | Common cause failure of resistance thermometers RT1, RT2 and RT3 giving low output | 0.06 | | |
| Z | Common cause failure of computer and PLC | Not quantified | | |
| X | Failure in a dangerous mode of control computer | 7.5 | | |
| B6 | Drain valve fails to open | | | 3.8 × 10⁻⁴ |
| J13 | Failure of resistance thermometer RT3 giving low output | 2.0 | 1 month | 7.3 × 10⁻⁴ |
| J14 | Temperature switch TSW1 fails to operate on high temperature | 3.4 | 1 month | 1.2 × 10⁻³ |
| J15 | Logic unit (PLC) input LUI 1 failed; does not respond to TSW1 | 1.0 | 1 month | 3.7 × 10⁻⁴ |
| J16 | Valve V3 fails to open | | | 7.5 × 10⁻⁴ |
| J18 | Logic unit (PLC) output LUO 1 failed; does not de-energize | 1.0 | 1 month | 3.7 × 10⁻⁴ |
| L1 | Drowning tank leaking | | | Negligible |
| L2 | Drowning tank drain valve opened | | | 1 × 10⁻³ |
| Y | PLC failed in dangerous mode | 2.5 | 1 month | 9.1 × 10⁻⁴ |
| Σ1 | | | | 5.7 × 10⁻³ |
| X·Σ1 | | 0.042 | | |
| Y | PLC failed in dangerous mode | 2.5 | 1 month | 9.1 × 10⁻⁴ |
| D11 | Common cause failure of temperature transmitters TT1, TT2 giving low output | 0.06 | | |
| D12 | Common cause failure of control computer analogue inputs AI1 and AI2 giving low reading | 0.03 | | |
| Σ2 | | 0.09 | | |
| Y·Σ2 | | 8.2 × 10⁻⁵ | | |
| B1 | PE impure | Not quantified | | |
| D1 | Feeder fails at high speed | 2.6 | | |
| D2 | Control computer analogue output AO1 fails to high output | 1.0 | | |
| D11 | Common cause failure of temperature transmitters TT1, TT2 | 0.06 | | |
| D12 | Common cause failure of control computer analogue inputs AI1, AI2 | 0.03 | | |
| H1 | Hydraulic failure of agitator causing high speed | Negligible | | |
| H2 | Stirrer breaks from shaft | Negligible | | |
| Σ3 | | 3.7 | | |
| J20 | Common cause failure of hydraulic valves V2 and V3 | | | 2.3 × 10⁻⁶ |
| B6 | Drain valve fails to open | | | 3.8 × 10⁻⁴ |
| Σ4 | | | | 3.8 × 10⁻⁴ |
| Σ3·Σ4 | | 1.4 × 10⁻³ | | |

Total = D10 + Z + X·Σ1 + Y·Σ2 + Σ3·Σ4 ≈ 0.1


points, particularly on the resistance thermometers measuring high temperature in the nitrator.

13.13 Software Engineering

The use of various types of computer aid in process plant design and operation is now routine. The dependability of these aids is determined by the quality of the computer programs. The dependability of this software is therefore important and may be critical. This is especially the case in real-time, on-line computer-based systems.

The dependability of software, particularly in safety critical systems, is a major topic in software engineering and is beyond the scope of this book. However, it cannot be neglected, and therefore a brief description is given of some of the principal issues of which engineers in the process industries should be aware.

Accounts of software engineering and software reliability are given in Software Engineering (Bauer, 1975a), Software Reliability, Principles and Practice (Myers, 1976), Quality Assurance for Computer Software (Dunn and Ullman, 1982), Program Verification Using Ada (McGettrick, 1982), Software Engineering (Shooman, 1983), Software Defect Removal (Dunn, 1984), Program Construction and Verification (Backhouse, 1986), Systematic Software Development Using VDM (C.B. Jones, 1986), The Spine of Software: Designing Provably Correct Software - Theory and Practice (Baber, 1987), Achieving Safety and Reliability with Computer Systems (Daniels, 1987), The Craft of Software Engineering (Macro and Buxton, 1987), Software Reliability (Littlewood, 1987b), Software Reliability (Musa et al., 1987), Handbook of Software Quality Assurance (Schulmeyer and MacManus, 1987), Software Diversity in Computerised Control Systems (Voges, 1987), Managing the Software Process (Humphrey, 1989), High Integrity Software (Sennett, 1989), Software Engineering (Sommerville, 1989), Case Studies in Systematic Software Development (C.B. Jones and Shaw, 1990), Deriving Programs from Specifications (C. Morgan, 1990), Software Quality and Reliability (Ince, 1991a), Software Engineers Reference Book (McDermid, 1991), Developing Safety Systems: A Guide Using Ada (Pyle, 1991), Reliability in Instrumentation and Control (Cluley, 1993) and Safety Aspects of Computer Control (P. Bennett, 1993).

13.13.1 Software dependability
The software provided should be dependable in serving the purposes of the system. The dependability of the software has two aspects: (1) specification and (2) reliability. The requirements of the system need to be defined and then converted into a specification. Both the formulation of the requirements and the conversion into a specification are critical features. It is then necessary to ensure that the software conforms with the specification to a high degree of reliability. One of the recurring themes in discussions of software dependability is that reliability alone is not enough. If the specification is defective, the software will be so too, however high its reliability.

13.13.2 Some software problems
There are some persistent problems associated with software. A review of these problems by Bauer (1975b) cites the following tendencies: (1) software is produced in a relatively amateurish and undisciplined way; (2) it is developed in the research environment by tinkering, or in industry by a human wave approach; (3) it is unreliable and needs permanent maintenance; (4) it is messy, opaque and difficult to modify or extend; and (5) it arrives later, costs more and performs less well than expected.

13.13.3 Software error rates
There are a number of rules-of-thumb used in the software industry for the error rates which occur in programming. An account is given by Cluley (1993). For software, an important distinction is that made between a fault and a failure. A fault is an error in the program. A failure occurs when the program is run and produces an incorrect result for software reasons. It is a common occurrence that a program which contains a fault may be run many times before a failure occurs.

A rule-of-thumb widely used in the industry is that a program typically contains 1 fault per 1000 instructions. This is supported by data given by Musa et al. (1987) to the effect that for programs of some 100 000 lines of source code, the incidence of faults when first operational varies between 1.4 and 3.9 faults per 1000 lines. For programs when first written the number of faults is much higher.

Faults may be corrected, but correction is not always straightforward and the potential exists to introduce other faults. The data of Musa et al. indicate that between 1 and 6 new faults are introduced for every 100 faults corrected.

Musa et al. also quote data for the number of failures per fault for a single run of a program. The average value of this ratio is 4.2 × 10⁻⁷ failures/fault. In other words, this implies that in order to detect a fault by triggering a failure, it is necessary on average to run a program 2.4 × 10⁶ times.

In real-time applications of safety critical systems another metric of concern is the failure intensity, or frequency of failure per unit time, or per mission. For passenger aircraft an error rate used has been 10⁻⁹ per mission, where the mission is a flight of 1 to 10 hours.

The progress of a debugging task may be monitored by `seeding' the program with deliberate errors which are not known to the team engaged in the work. Thus if 35 faults have been introduced deliberately, and 25 genuine and 7 deliberate faults are found, the estimated number of original faults is 125 (= 25 × 35/7).
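The seeding estimate above is a capture-recapture style calculation: the fraction of seeded faults found is taken as the fraction of genuine faults found. A minimal sketch (the function name is ours, not from the source):

```python
# Fault-seeding estimate of the number of genuine faults in a program.
# If s of S seeded faults are found, the detection ratio s/S is assumed
# to apply equally to genuine faults, so N_est = n_genuine_found * S / s.

def estimated_faults(seeded, genuine_found, seeded_found):
    return genuine_found * seeded / seeded_found

print(estimated_faults(35, 25, 7))  # 125.0, as in the worked example
```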

There exist reliability growth models for software which may be used by management to estimate the time necessary to debug a program. One such model is described by Cluley.

13.13.4 Software management
Management commitment is crucial in achieving dependability in software, as in other fields. Management needs to create a culture which gives priority to, and so ensures, dependability of the software. The management of a software project includes the following aspects:

(1) project management;


(2) software quality assurance;
(3) software standards;
(4) system requirements and software specifications;
(5) software development;
(6) software documentation;
(7) software verification;
(8) software modification control;
(9) software validation and testing;
(10) software maintenance.

Accounts of project management are given by Tsichritzis (1975a) and P.A.V. Hall (1991). The other aspects are considered below.

13.13.5 Software quality assurance
There should be a system of quality assurance (QA) for the software. The extent of this system will depend on the scale of the operation, and in some cases will be governed by standards and/or user requirements, but as a minimum there should be a formal system and an independent QA function. Some of the methods of assuring quality are described below.

13.13.6 Software standards
Use has long been made in software development of the traditional quality standards such as BS 5750 and ISO 9000, but there are an increasing number of standards specific to software. Accounts of developments in these standards are given by P. Bennett (1991a), the CCPS (1993/14) and Rata (1993). In the UK, standards and guidance include: BS 5887: 1980 Code of Practice for Testing of Computer-based Systems; BS 6238: 1982 Code of Practice for Performance Monitoring of Computer-based Systems; BS 5515: 1984 Code of Practice for Documentation of Computer-based Systems; BS 6719: 1986 Guide to Specifying User Requirements for a Computer-based System; Programmable Electronic Systems in Safety Related Applications (HSE, 1987b) (the HSE PES Guide); and the Ministry of Defence (MoD) Interim Defence Standards 00-55: 1989 Requirements for the Procurement of Safety Critical Software in Defence Equipment (MoD, 1989c) and 00-56: 1989 Requirements for the Analysis of Safety Critical Hazards (MoD, 1989b).

Relevant US standards are IEEE 1058-1987 Software Project Management Plans, IEEE 1012-1987 Software Verification and Validation Plans, IEEE 1028-1988 Software Reviews and Audits, IEEE 730-1989 Software Quality Assurance Plans and IEEE 1063-1989 Software User Documentation, as well as the guides IEEE 830-1984 Guide to Software Requirement Specifications and IEEE 1042-1987 Guide to Software Configuration Management. An international standard is IEC SC65A WG9: 1991 Software for Computers in the Application of Industrial Safety-related Systems.

The PES Guide has been described in Section 13.12.
MOD 00-55 is in three main sections. The first deals with the project management, the parties involved and the documentation; the second with the software engineering; and the third relates the requirements of these two sections to the life cycle of the project.

MOD 00-56 gives requirements for the hazard analysis of safety critical systems.

There are also two IEC working groups, WG9 and WG10, which deal with software for safety-related applications and with generic safety aspects, respectively. WG9 is responsible for IEC SC65A.

13.13.7 Software development
The process of software development is generally described broadly in the following terms:

(1) requirements specification;
(2) system specification;
(3) program specification;
(4) program design;
(5) program production;
(6) program verification;
(7) program validation and testing;
(8) system integration and testing.

In software development two terms widely used are `verification' and `validation' (V&V). Verification is the process of determining whether the product of a given phase of development meets the requirements established in the previous phase. Validation is the process of evaluating software at the end of the software development process to ensure compliance with software requirements.

It is good practice to verify the software produced in each phase of the project before proceeding to the next phase. Another aspect of good practice is the production of good documentation.

One method of software development which is found useful in many cases is prototyping. There is more than one kind of prototype. An account of prototyping is given by Ince (1991b).

13.13.8 Software specification
The conversion of the user's requirements into an unambiguous specification for the system and then for the software is one of the most important, but difficult, tasks in software development. There is a high degree of formality in the approach taken to the specification of the software and a number of formal methods have been developed. An account is given by Webb (1991). Use is made of mathematically based languages such as VDM, Z and OBJ and of mathematically based methodologies such as JSD, EPOS and Yourdon Structured Development. For many safety critical systems such formal methods are a requirement.

13.13.9 Software design, production and verification
There are a number of basic principles governing software design. They include (1) modularity and (2) hierarchy.

The computer program required for even a moderately sized project may be large. A large program needs to be subdivided into manageable parts, or modules. Whilst subdivision into modules is necessary, problems arise if the interfaces between modules are poorly defined. The specification of the interfaces between modules requires careful attention. In some applications, it may be possible to exploit the use of verified modules and of a module library.
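The point about well-defined inter-module interfaces can be sketched in a modern idiom. This illustration is ours, not from the source; all names are hypothetical. The interface between modules is written down explicitly as a checked contract, so a calling module depends on the contract rather than on another module's internals.

```python
# Hypothetical sketch: an explicit inter-module interface using
# typing.Protocol (structural subtyping), so the module boundary is a
# declared contract rather than an informal convention.

from typing import Protocol

class TemperatureSensor(Protocol):
    def read_celsius(self) -> float: ...

class Rtd:
    """One concrete module implementing the agreed interface."""
    def read_celsius(self) -> float:
        return 21.5  # placeholder reading

def high_temp_alarm(sensor: TemperatureSensor, limit: float) -> bool:
    # The caller depends only on the interface, not on module internals.
    return sensor.read_celsius() > limit

print(high_temp_alarm(Rtd(), 35.0))  # False
```

Any module exposing `read_celsius` satisfies the contract, so implementations can be swapped or drawn from a verified module library without changing the callers.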

The program will generally have a hierarchical structure, with the higher level modules controlling the lower level ones. Structured programming involves the use of a hierarchy of conceptual layers and provides a formal approach to the creation of hierarchical software.


As already mentioned, verification of the programs produced at each phase should not be left until the end, but should be performed before proceeding to the next phase.

13.13.10 Software modification control
A major software project will generally be subject to modifications. Demands to make modifications may occur at any level, starting with the system requirements. There should be a system for the control of such modifications. The ease with which such a system can be created and operated depends very much on the quality of the software design and documentation.

13.13.11 Software reliability
The point has already been made that `software reliability' is not the same as `software dependability'. It is nevertheless an essential feature. Accounts of software reliability are given in the texts quoted and by Tsichritzis (1975b). Some aspects of software reliability are:

(1) programming language;
(2) programming practice;
(3) software design;
(4) measurement of reliability;
(5) assessment of reliability.

The programming language used can influence the reliability of the software produced. A number of examples of differences between languages in this respect are given by Tsichritzis.

Likewise, the programming style can affect reliability. One aspect is the naming of items. It is usually recommended that semantic naming be practised, in which the name is a meaningful one. Another aspect is the length of sections of the program. Here the recommendation is to keep the verification length short. A practice which tends to increase the verification length is the use of GO TO statements.
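A small before-and-after sketch illustrates the naming and length advice above. The example is hypothetical (a compound-growth calculation invented for illustration); the two functions compute the same thing, but the second uses semantic names and a short, separately verifiable body.

```python
# Opaque style: single-letter names, one expression to verify as a whole.
def f(a, b, c):
    return a * (1 + b) ** c

# Semantic naming and a short body: each line can be checked on its own.
def future_value(principal, annual_rate, years):
    growth_factor = (1 + annual_rate) ** years
    return principal * growth_factor

# Both yield the same result; only the verifiability differs.
assert f(100, 0.05, 2) == future_value(100, 0.05, 2)
```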

One aspect of good design practice which contributes to software reliability is a strong structure. Another is transparency of the programs. A third is well-defined interfaces between modules.

In principle, improvement of reliability depends on the ability to measure it. Traditionally, metrics have been concerned primarily with aspects of performance such as execution time rather than with reliability. Measures for reliability are discussed by Tsichritzis. Software protection contributes to reliability by providing barriers to the transmission of errors between different features of the system.

13.13.12 Software testing and debugging
The traditional way of dealing with errors in a program is testing and debugging. An account of this aspect is given by Poole (1975). Debugging and testing are greatly facilitated if they are planned for in the design phase. Another feature which can make a major contribution is documentation written with this requirement in mind.

Debugging tends to be a difficult task and various aids are available. One is the system dump, activated by a call in, or by catastrophic failure of, the program. Another is the snapshot, similar to a dump, but occurring during execution. The trace mode of program execution causes an output to be made for each statement in the section

traced. The traceback facility shows how control reached the point in the program where the error has occurred.

It is helpful to debugging if key quantities in the program are made parameters which the user can alter. This permits a fuller exploration of the program characteristics. Debugging is also assisted by the incorporation of debugging code in the program. The use of such code is discussed by Poole.

Testing is assisted by subdivision of the program into modules. It is not, however, a straightforward matter to devise test beds and test strategies for modules.

13.13.13 Software protection
Software protection may be regarded as an aspect of software reliability. The aim of software protection is to guard against error and malice. There are a variety of items, such as files and programs, which need to be protected, and a corresponding variety of means of achieving protection.

Protection establishes barriers to the transmission of an error between one part of the system and another. It therefore contributes to reliability by limiting the effect of an error. Protection contributes to reliability in another way. The occurrence of an error usually results in an attempt to violate a protection barrier. This can be used as a means of error detection.

Closely related to software protection is software security. The aim of software security is to guard against unauthorized use.

13.13.14 Software assessment
There are a number of methods available for the assessment of software reliability. Accounts of these techniques are given in the texts mentioned and by Tsichritzis (1975b), Fergus et al. (1991), Webb (1991) and M.R. Woodward (1991). Three main approaches are:

(1) auditing;
(2) static analysis;
(3) dynamic analysis.

Auditing particularly addresses aspects such as the quality assurance and standards, the comprehensibility and readability of the program, and the documentation.

Static analysis involves analysing the program without running it. Some methods which may be used include:

(1) semantic checking;
(2) control flow analysis;
(3) data use analysis;
(4) information flow analysis;
(5) semantic analysis;
(6) compliance analysis.

The program compiler is generally utilized to perform checks on statements in the program, or semantic checks. The power of this facility depends on the programming language used.

The control flow of the program may be analysed to reveal its structure and to detect undesirable features such as multiple starts, multiple ends, unreachable code, etc. One method of doing this is to represent the program as a graph of nodes joined by arcs, where initially each node represents a statement. A process of reduction is then applied whereby nodes are successively eliminated to reveal the underlying structure.
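By way of illustration only (a sketch, not the method of any of the tools cited later), a control flow graph can be held as a mapping from each statement node to its successor nodes and searched from the entry point; statements the search never reaches are unreachable code:

```python
def reachable(arcs, start):
    """Return the set of nodes reachable from `start`.

    `arcs` maps each node to the list of nodes control can pass to.
    """
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(arcs.get(node, []))
    return seen

def unreachable_code(arcs, start):
    """Statements never reached from the program entry point."""
    return set(arcs) - reachable(arcs, start)

# Hypothetical five-statement program: statement 4 is dead code.
arcs = {1: [2], 2: [3, 5], 3: [5], 4: [5], 5: []}
print(sorted(unreachable_code(arcs, 1)))  # [4]
```

The same graph representation also exposes multiple entry or exit points: a start node is any node with no incoming arc, an end node any node with no outgoing arc.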




The data use of the program may be analysed to identify incorrect uses of data such as attempts to read data which are not available or failure to utilize data which have been generated.
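A minimal sketch of such a data use check, assuming straight-line code represented as hypothetical (reads, writes) tuples (a real analyser follows every control path):

```python
def data_use_errors(statements):
    """Flag reads of unset variables and writes that are never read.

    Each statement is a (reads, writes) pair of variable names,
    processed in order as straight-line code.
    """
    defined, used, errors = set(), set(), []
    for i, (reads, writes) in enumerate(statements):
        for v in reads:
            if v not in defined:
                errors.append(f"statement {i}: '{v}' read before being set")
            used.add(v)
        defined |= set(writes)
    for v in defined - used:
        errors.append(f"'{v}' is set but never used")
    return errors

# Hypothetical sequence: y is read before it is set; t is never used.
prog = [((), ("x",)), (("x", "y"), ("z",)), (("z",), ("t",))]
for e in data_use_errors(prog):
    print(e)
```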

The information flow in the program may be analysed to identify the dependence of output variables on input variables.

Semantic analysis determines the mathematical relationship between the input and output variables for each semantically feasible path. It can be used to determine the outputs for the whole input space, including unexpected inputs.

Compliance analysis compares the program with the specification and reveals discrepancies. The specification is expressed as a statement in the predicate calculus of the pre-conditions and post-conditions to be satisfied by the program. For a complex program assertions may be provided about the functionality of the program for intermediate stages.
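As an illustration of the idea only (a dynamic check on sample inputs, not the proof-based comparison that compliance analysers perform), pre- and post-conditions can be written as predicates and checked against a program:

```python
import math

def check_compliance(func, pre, post, test_inputs):
    """Check a function against its specification on sample inputs.

    `pre` and `post` are predicates standing in for the pre- and
    post-conditions taken from the specification; inputs that fail
    the pre-condition are outside the specification and skipped.
    """
    failures = []
    for x in test_inputs:
        if not pre(x):
            continue
        y = func(x)
        if not post(x, y):
            failures.append((x, y))
    return failures

# Specification: for x >= 0, isqrt(x) returns r with r*r <= x < (r+1)**2.
pre = lambda x: x >= 0
post = lambda x, r: r * r <= x < (r + 1) ** 2
print(check_compliance(math.isqrt, pre, post, range(100)))  # []
```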

Use may also be made of diagrams showing the logic of the program, such as fault trees, event trees, Petri nets, and state transition diagrams, as described by P. Bennett (1991a). The application of fault tree analysis to programs has been developed by Leveson and co-workers (Leveson and Harvey, 1983; Leveson and Stolzy, 1983). Figure 13.25 shows the analysis of an IF. . .THEN. . .ELSE statement by fault tree, Petri net and event tree methods.

Fault trees and event trees are described in Chapter 9, but the Petri net representation requires brief explanation. A Petri net consists of the quintuple C:

C = (P, T, I, O, u)    [13.13.1]

where P is a place, T a transition, I an input, O an output and u an initial condition. Initialization of a Petri net is called 'marking' it. A transition is said to 'fire'. Assigning a value is referred to as 'passing' a 'token' to a place.
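A minimal executable sketch of these definitions (the class and the IF-branch example are illustrative, not taken from Bennett):

```python
class PetriNet:
    """Minimal Petri net C = (P, T, I, O, u).

    I[t] and O[t] give the input and output places of transition t;
    the marking u counts tokens on each place.
    """
    def __init__(self, places, transitions, I, O, u):
        self.P, self.T = set(places), set(transitions)
        self.I, self.O = I, O
        self.u = dict(u)            # current marking: tokens per place

    def enabled(self, t):
        """A transition is enabled when every input place holds a token."""
        return all(self.u[p] > 0 for p in self.I[t])

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        for p in self.I[t]:
            self.u[p] -= 1          # consume a token from each input place
        for p in self.O[t]:
            self.u[p] += 1          # pass a token to each output place

# Hypothetical net for an IF branch: one token at 'test' enables
# either branch transition; firing one disables the other.
net = PetriNet({"test", "then", "else"}, {"t_then", "t_else"},
               I={"t_then": ["test"], "t_else": ["test"]},
               O={"t_then": ["then"], "t_else": ["else"]},
               u={"test": 1, "then": 0, "else": 0})
net.fire("t_then")
print(net.u)  # {'test': 0, 'then': 1, 'else': 0}
```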

Software tools have been developed to assist in the static analysis of software. Some of the tools available are described by Fergus et al. (1991). They include MALPAS, SPADE and the LDRA testbed. An account of MALPAS, which includes control flow, data use, information flow, semantic and compliance analysers, is given by Webb (1991).

Dynamic analysis, or testing, involves running the program and analysing the results. The basic technique is to force situations where errors are revealed. There are two main approaches. One is 'black box' testing and the other 'white box' testing. The distinction is that the latter relies on knowledge of the structure of the program, whilst the former has no such knowledge but tests the performance against the requirements and relies essentially on knowledge of the application domain. Dynamic testing may be control flow or data flow driven. A discussion is given by M.R. Woodward (1991).

One aim of testing is to remove whole classes of error. A technique for doing this is mutation testing. An account is given by Woodward. The basic concept is to make a small change to the program in the expectation that this will make an observable difference in its performance.

It should be appreciated that good results from a validation test do not necessarily indicate high reliability. This is so only if the exercise of the control path in the validation test corresponds to that which will occur in practice.

The level of assessment should be matched to the application. This aspect is discussed by P. Bennett (1991a). He lists five classes of assessment:

0 System overview.
1 System structure analysis.
2 System hazard analysis.
3 Rigorous analysis.
4 Formal mathematical methods.

13.13.15 Software correctness
The use of formal methods to prove the correctness of the program has already been mentioned. This is a major area of research. There are differing views as to the feasibility of such proof.

The methods used to prove correctness may be informal or formal. The informal method derives from work of Naur (1966), Floyd (1967) and London (1968), following von Neumann. Points are selected on all the control paths at which assertions can be made about the variables. Then if A is an assertion at one point in a control path and B an assertion at the following point, the approach taken is to prove that the code is such that if A is true, B is true. If this verification is performed for all adjacent pairs of assertions and for all control paths, the partial correctness of the program is proved. Proof of complete correctness requires a separate proof of halting. It tends to be a substantial task, however, to develop the assertions and to perform the proofs.
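The inductive assertion idea can be sketched with executable assertions on a simple loop (an illustration of where the assertions sit, not a formal proof):

```python
def triangular(n):
    """Sum 1..n, with the inductive assertion checked on each pass.

    A: at the loop head, total == i * (i - 1) // 2.
    If A holds before an iteration, the loop body re-establishes it
    (partial correctness); termination needs the separate argument
    that i strictly increases towards n.
    """
    assert n >= 0                          # pre-condition
    total, i = 0, 1
    while i <= n:
        assert total == i * (i - 1) // 2   # assertion A at this point
        total += i
        i += 1
    assert total == n * (n + 1) // 2       # post-condition B
    return total

print(triangular(10))  # 55
```

In the informal method these assertions would be proved by hand for each adjacent pair; running them as checks, as here, only samples particular paths.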

The formal method of proving correctness is based on the demonstration by Floyd (1967) that proof of partial correctness is equivalent to proving corresponding theorems in the first-order predicate calculus. Manna and Pnueli (1969) extended such proof to include halting. The approach taken is to formulate the problem so that it is possible to apply automatic theorem-proving techniques.

13.13.16 Software maintenance
Software generally requires a good deal of maintenance. This is particularly true of safety related software, especially real-time software. The project management should make suitable provisions for software maintenance. The quality of the software, and the associated documentation, largely determines the ease of maintenance.

13.13.17 Software for real-time systems
Real-time, on-line systems controlling process plants place even more stringent demands on software. An account of the software aspects is given by Fergus et al. (1991).

The characteristics of real-time systems have been described by Quirk (1985). In such systems the demands are driven in timing and sequencing by the real world, they may occur in parallel and they may be unexpected and even conflicting. The software must satisfy time constraints and it must continue to operate. Moreover, the software is part of a total system and is difficult to validate in isolation.





Figure 13.25 Some representations used in error analysis of an IF. . .THEN. . .ELSE statement in a computer program (after P. Bennett, 1991a): (a) fault tree; (b) Petri net; and (c) event tree (Courtesy of Butterworth-Heinemann)



Work on the methods of describing the behaviour of real-time systems typically deals with issues such as concurrency and synchronization, resource scheduling, and liveness and deadlock.

The dynamic testing of a real-time program may be carried out using an off-line host machine. Such testing is described by Fergus et al.

13.13.18 Safety critical systems
If the consequences of failure of a real-time computer system are sufficiently serious, the system is a safety critical system (SCS). SCSs are of particular concern in the military, aerospace and nuclear fields, but are of growing importance in the process industries.

SCSs are treated in Safety Aspects of Computer Control (P. Bennett, 1993). Other accounts are given in the texts cited at the start of this section and by P. Bennett (1991a,b), Bologna (1993), Ehrenberger (1993), Malcolm (1993), McDermid (1993) and Pyle (1993). Standards are particularly important for SCSs. Some relevant standards and guidance are detailed in Section 13.13.6. Practical guidance is available in the HSE PES Guidelines and the CCPS Safe Automation Guidelines, described in Sections 13.12 and 13.15, respectively.

There are a number of real-time languages and environments which have special safety related features. One such is ADA. Accounts are given in Program Verification Using ADA (McGettrick, 1982), ADA for Specification and Design (Goldsack, 1985), ADA in Industry (Heilbrunner, 1988) and Developing Safety Systems: A Guide Using ADA (Pyle, 1991). Pyle (1993) discusses the guidance given in the HSE PES Guide in the context of ADA.

Where the process is dependent on a computer or PES, methods are required to identify the associated hazards. The application of hazop to process computers (chazop) is described in Chapter 8.

13.14 Safety-related Instrument Systems

It will be apparent from the foregoing that it is necessary to adopt a systematic approach to the whole system of instrumentation, control and protection.

13.14.1 EEMUA Safety-related Instrument System Guide
A scheme for this is described in Safety-related Instrument Systems for the Process Industries by the Engineering Equipment and Materials Users Association (EEMUA) (1989 Publ. 160) (the EEMUA Safety-related Instrument System Guide). This document complements the HSE PES Guide by providing additional guidance specific to the process industries. The background to, and an account of, the scheme is given by W.S. Black (1989).

The starting point is the practice in conventional systems of separating the protective functions from the control functions. Whereas in such systems control functions may be performed by a PES, it has been almost universal practice to use hardwired systems for protective functions.

13.14.2 Categories of system
Four categories of system are defined:

0 Self-acting devices.
1 Non-self-acting devices.
2 System which protects against damage to environment.


Table 13.28 Categories of control and protective system given in the EEMUA Guide (after EEMUA, 1989) (Courtesy of the Engineering Equipment Manufacturers and Users Association)

Category 0 (a)
Type of system: Self-acting device such as PRV, BD
Purpose: Safety
Consequence of failure: Hazard to persons or containment
Requirements: Relevant BSs

Category 1 (b)
Type of system: Instrument safety system
Purpose: Safety
Consequence of failure: Hazard to persons
Requirements: PES Guide; EEMUA Guide

Category 2 (c)
Type of system: Protective system
Purpose: Economic or environmental
Consequence of failure: Loss of production or harm to environment
Requirements: Reliability comparable to conventional analogue systems so that demands on protective devices are limited

Category 3 (c)
Type of system: Control system
Purpose: Operational
Consequence of failure: Loss of production and possible demand on Category 0, 1 or 2 system

(a) Where Category 0 devices are installed and their capability and integrity alone are adequate to ensure safety, Category 1 systems will be unnecessary.
(b) Where mechanical devices cannot be used or are not adequate alone to ensure safety, Category 1 systems will be necessary.
(c) If programmable systems are used for Category 2 or 3 systems, a full assessment of the system according to the PES Guide or the EEMUA Guide will be unnecessary.



3 System which ensures reliable production and keeps plant operation within operational limits.

These categories are amplified in Table 13.28.

13.14.3 Categorization process
Assignment of systems to these categories should be made on the basis of a review, involving consideration of the plant line by line. A schedule should be prepared of all the failures which result in excursions outside normal process operating limits. The process conditions after failure should be determined. Cases where the process conditions are unacceptable with respect to safety should be identified. The option of making a modification to eliminate the unacceptable condition should be considered. If it is decided to rely on the instrument system to prevent the unacceptable condition, the system should be listed together with the potential hazard.

The review may be part of a hazop study or it may be separate. A separate review has the advantage that any rethinking can be done outside the hazop. If a separate review is undertaken, the results should be considered in the hazop.

13.14.4 Selection of systems
In selecting a system for a Category 1, 2 or 3 duty, consideration should be given as to whether the system should be programmable or non-programmable. The Institution of Electrical Engineers (IEE) classification of programmable systems recognizes three types:

(1) fixed program system;
(2) limited variability system;
(3) full variability system.

Examples of these three types are a three-term controller which emulates its analogue equivalent, a programmable logic controller (PLC) and a minicomputer. Table 13.29 gives the selection scheme presented in the EEMUA Guide.

13.14.5 Review of systems
Once the systems have been selected, the arrangements should be subjected to a review by a team including process engineers, control engineers and operations managers. It should be established that the requirements given in the PES Guide for configuration, reliability and quality are met. The EEMUA Guide refers to the checklists in the PES Guide and gives its own checklists.

13.14.6 Implementation of systems
A Category 0 or 1 system should have the capability and reliability to deal with the foreseeable failure modes and failure frequency of the plant itself and of the Category 2 and 3 systems.

Where a Category 1 system is used, the system should be engineered in accordance with the PES Guide and the EEMUA Guide. The requirements in these documents relating to hardware, quality and reliability are applicable both to programmable and non-programmable systems. For the latter, however, the requirements relating to software are not applicable.

Failure of a Category 2 or 3 system may put a demand on a Category 0 or 1 system. Where a Category 0 or 1 system is used which is based on a PES, the failure rate should not exceed that of the equivalent conventional system.

13.14.7 Failures in systems
The EEMUA Guide gives an account of, and guidance on, the failures which occur in conventional and programmable systems. In conventional systems a single output failure is usually to the zero or low state. This is the mode of failure on loss of air or power. Systems are designed so that the plant goes to a safe state in this failure mode. In such systems the usual assumption is that multiple failures are in the zero or low mode. This is commonly the basis on which relief capacity is sized.

In programmable systems a single output failure will be due to failure in an input or output channel. The failure rate to the high state is unlikely to exceed that in a conventional system. There is potential, however, in a programmable system for multiple failures to the high state due to random hardware failure or systematic software failure. An assessment should be made of the system to ensure that the probability of multiple failure to the high state due to random hardware failure is low.

Failures of software may be failures of system software or of applications software. It is rare for system software to be fault free, although a mature system can be


Table 13.29 Selection scheme for control and protective systems given in the EEMUA Guide (EEMUA, 1989) (Courtesy of the Engineering Equipment Manufacturers and Users Association)

System                       Self-acting  Non-programmable  Fixed program  Limited variability  Full variability
Ultimate safety, Category 0  Preferred    –                 –              –                    –
Ultimate safety, Category 1  –            Preferred         Acceptable     Acceptable           Avoid
Protection, Category 2       –            Preferred         Preferred      Acceptable           Avoid
Regulatory, Category 3       –            Acceptable        Acceptable     Preferred            Avoid
Supervisory control          –            –                 Avoid          Preferred            Acceptable
Information                  –            –                 –              Acceptable           Preferred



expected to contain fewer faults than a new one. The system software for any control system to be used in a safety related application should be evaluated. The alternative means given are formal evaluation and user experience.

Failure in the applications software should be minimized by good software engineering. The EEMUA Guide gives guidance on software development and testing.

13.14.8 Loop allocation strategies
There are two basic strategies for loop allocation: (1) outputs distributed and (2) outputs grouped. The principles are that, in the first case, the outputs from a single PES unit are distributed around a number of process units, whereas in the second they are concentrated at a single process unit or at least at a minimum number of units.

If the outputs are distributed, loops which may fail simultaneously are not concentrated on the same process unit. The resultant problem at any given unit will therefore be less severe. In particular, this policy allows the pressure relief system to be designed for a single failure on each unit. On the other hand, there may then be a quite large number of process units with some degree of problem. Alternatively, the outputs may be grouped. The problem of multiple failures of loops is then concentrated on one process unit.

The choice between these strategies depends on the characteristics of the process and the probability of multiple failure. A distributed strategy may be suitable for a simple, slow-responding process, but a grouped strategy for a fast-acting process. With a well-designed instrument system, particularly with redundancy, the probability of multiple failure may be small compared with failures of the process unit for other causes.

13.15 CCPS Safe Automation Guidelines

13.15.1 Guidelines for Safe Automation of Chemical Processes
The safety aspects of process control systems are the subject of Guidelines for Safe Automation of Chemical Processes (CCPS, 1993/14) (the CCPS Safe Automation Guidelines). The Safe Automation Guidelines cover the safety aspects of the whole process control system, including the basic process control system (BPCS), the safety interlock system (SIS) and the human operator. Two types of interlock are distinguished: (1) failure interlocks and (2) permissive interlocks. The distinction corresponds to that used here between trips and interlocks proper.

The headings of the Guidelines are: (1) overview; (2) the place of automation in chemical plant safety – a design philosophy; (3) techniques for evaluating integrity of process control systems; (4) safety considerations in the selection and design of BPCSs; (5) safety considerations in the selection and design of SISs; (6) administrative controls to ensure control system integrity; (7) an example involving a batch polymerization reactor; and (8) the path forward. Appendices deal with SIS technologies, separation of the BPCS and SIS, watchdog timer circuits, communications, sensor fail-safe considerations, SIS equipment selection, PES failure modes and factory acceptance test guidelines.

The Guidelines are concerned particularly with PES-based SISs. As described earlier, at least until recently the normal approach has been to use for the safety interlock a hardwired system separate from the rest of the control system, whether or not this be computer based. The Guidelines describe a design philosophy in which the system of choice for an SIS is a PES-based system. In large part the guidance is concerned with ensuring that a PES-based system has the availability and reliability required for this duty.

This section gives an outline of the Guidelines. The latter contain a wealth of practical guidance on the various topics which are touched on here.

13.15.2 Basic design method
The design requirements for the SIS arise out of the process hazard analysis (PHA). The Guidelines require that the SIS should be designed by a formal method, but are flexible with respect to the method used. They give a basic design method, which includes what they term a 'qualitative approach to specification' of the safety interlocks required but which allows for the use of alternative quantitative approaches.

The basic design method given in the Guidelines is based on the following features:

(1) independent protection layers;
(2) process risk ranking;
(3) safety interlock integrity level specification;
(4) safety interlock integrity level implementation.

This philosophy is outlined in Table 13.30. Section A of the table lists the features which are treated as layers of protection and Section B gives the criteria for a layer or combination of layers to constitute an independent layer of protection (IPL). An IPL protects against a particular type of hazardous event. The event severity and event likelihood are obtained from the process hazard analysis as shown in Section C. The scheme given in Section D indicates the integrity level (IL) required for any safety interlock (SI). There are three integrity levels: Levels 1, 2 and 3 (IL1, IL2 and IL3). As stated in the footnotes, the number of IPLs to be used in the table is the total number of IPLs, including the safety interlock being classified. The implementation of a safety interlock system of specified integrity level is indicated in Section E and an example of the determination of the integrity level of a safety interlock is given in Section F.

13.15.3 Evaluation of control system integrity
The Guidelines review the various safety and integrity evaluation techniques applicable to process control systems. The qualitative techniques include operating experience, standards and codes, design guidelines, checklists, What-If analysis, failure modes and effects analysis (FMEA) and hazop; the quantitative techniques include trip capability analysis, fault tree analysis, event tree analysis, reliability block diagrams, Markov models, Monte Carlo simulation, non-destructive fault insertion testing and QRA.

The BPCS and SIS should both be certified, either by self-certification or third party certification. For PES devices three maturity levels are recognized: user-approved (for BPCS), user-approved safety (UAS) (for SIS) and user-obsolete. The Guidelines give criteria for user approvals.





Table 13.30 Basic design philosophy of the safety interlock system in the CCPS Safe Automation Guidelines (CCPS, 1993/14) (Courtesy of the American Institute of Chemical Engineers)

A Layers of protection

1. Process design
2. Basic controls, process alarms, operator supervision
3. Critical alarms, operator supervision and manual intervention
4. Automatic SIS
5. Physical protection (relief devices)
6. Physical protection (containment dikes)
7. Plant emergency response
8. Community emergency response

B Criteria for independent layers of protection

The criteria for a protection layer or a combination of protection layers to qualify as an independent protection layer (IPL) are:

1. The protection provided reduces the identified risk by a large amount, that is, at least by a 100-fold reduction
2. The protective function is provided with a high degree of availability (0.99 or greater)
3. The protection has the following characteristics:

- Specificity. An IPL is designed solely to prevent or to mitigate the consequences of one potentially hazardous event (e.g. a runaway reaction, release of toxic material, a loss of containment, or a fire). Multiple causes may lead to the same hazardous event and therefore multiple event scenarios may initiate action of one IPL
- Independence. An IPL is independent of the other protection layers associated with the identified danger
- Dependability. It can be counted on to do what it was designed to do. Both random and systematic failure modes are addressed in the design
- Auditability. It is designed to facilitate regular validation of the protective functions. Functional testing and maintenance of the safety system is necessary

C Process risk ranking

An event is assigned a severity and a likelihood:

Event severity
Minor incident: Impact initially limited to the local area of the event, with potential for broader consequences if corrective action is not taken
Serious incident: One that could cause:
- Any serious injury or fatality on site or off site
- Property damage of $1 million off site or $5 million on site
Extensive incident: One that is five or more times worse than a serious incident

Event likelihood
Low: A failure or series of failures with a very low probability of occurrence within the expected lifetime of the plant (<10⁻⁴ failures/year). Examples: (1) three or more simultaneous instrument, valve or human failures; (2) spontaneous failure of single tanks or process vessels
Moderate: A failure or series of failures with a low probability of occurrence within the expected lifetime of the plant (10⁻⁴ to 10⁻² failures/year). Examples: (1) dual instrument failures; (2) combination of instrument failures and operator errors; (3) single failures of small process lines or fittings
High: A failure that can reasonably be expected to occur within the expected lifetime of the plant (>10⁻² failures/year). Examples: (1) process leaks; (2) single instrument or valve failures; (3) human errors that could result in material releases

The event is designated low risk for any of the following combinations: (1) severity minor, likelihood low; (2) severity serious, likelihood low; (3) severity minor, likelihood moderate. It is designated high risk for any of the following combinations: (1) severity extensive, likelihood high; (2) severity serious, likelihood high; (3) severity extensive, likelihood moderate. It is designated moderate risk for the other three combinations
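This risk ranking can be transcribed as a simple lookup (the labels follow the severity and likelihood classes above; the function name is illustrative):

```python
def risk_rank(severity, likelihood):
    """Process risk ranking over severity x likelihood.

    severity: 'minor' | 'serious' | 'extensive'
    likelihood: 'low' | 'moderate' | 'high'
    Three combinations are low risk, three high risk,
    and the remaining three moderate risk.
    """
    low = {("minor", "low"), ("serious", "low"), ("minor", "moderate")}
    high = {("extensive", "high"), ("serious", "high"),
            ("extensive", "moderate")}
    combo = (severity, likelihood)
    if combo in low:
        return "low risk"
    if combo in high:
        return "high risk"
    return "moderate risk"

print(risk_rank("extensive", "moderate"))  # high risk
print(risk_rank("serious", "moderate"))    # moderate risk
```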

Page 67: Ch13 Control System Design

C O N T R O L S Y S T E M D E S I G N 1 3 / 6 7

13.15.4 Basic process control system
The basic process control system is not usually an IPL. It is, nevertheless, the next line of defence after the process design and has an important part to play. The Guidelines therefore deal with the safety considerations in the selection and design of the BPCS. The account given covers (1) the technology selection, (2) the signals, (3) the field measurements, (4) the final control elements, (5) the process controllers, (6) the operator/control interfaces, (7) communication considerations, (8) electrical power distribution systems, (9) control system grounding, (10) batch control, (11) software design and data structures and (12) advanced computer control strategies, and contains much practical material on these features.

The Guidelines advise that use of a supervisory computer should be subject to a discipline which restricts it to manipulation of loop set points. It should not normally be able to change the operational mode of the loops except for transfer to the back-up mode on computer failure or to computer mode on initialization. It should not compromise the integrity of the back-up controls.

The design philosophy of the Guidelines requires that the BPCS and the SIS should be separate systems. The BPCS should not be relied on to protect against unsafe


D Safety interlock integrity level specification (a)

Required IL by event severity (Minor, Serious, Extensive) and event likelihood (Low, Moderate, High):

No. of       Minor               Serious             Extensive
IPLs (b)     Low  Mod   High     Low  Mod   High     Low    Mod    High
3            (5)  (5)   (5)      (5)  (5)   (5)      (5)    1      1
2            (5)  (5)   1        (5)  1     2        1      2      3 (2)
1            1    1     3        1    2     3 (2)    3 (2)  3 (2)  3 (1)

(a) The values in the table without brackets refer to the integrity level (IL) required; the values in brackets refer to the number of the note given below.
(b) Total number of IPLs, including the safety interlock being classified.

Notes:
1. One Level 3 safety interlock does not provide sufficient risk reduction at this risk level. Additional PHA modifications are required.
2. One Level 3 safety interlock may not provide sufficient risk reduction at this risk level. Additional PHA review is required.
3. Event likelihood = likelihood that the hazardous event occurs without any of the IPLs in service (i.e. the frequency of demand).
4. Event likelihood and total number of IPLs are defined as part of the PHA team work.
5. SIS IPL is probably not needed.

Integrity level availability

Level    Availability (%)
Level 1  about 99
Level 2  99 to 99.9
Level 3  99.9 to 99.99

E Safety interlock integrity level implementation

Integrity level (IL) Minimum interlock design structure

1  Non-redundant: best single path design
2  Partially redundant: redundant independent paths for elements with lower availability
3  Totally redundant: redundant, independent paths for total interlock system. Diversity should be considered and used where appropriate. A single fault of an SIS component is highly unlikely to result in a loss of process protection

F Illustrative example

Event severity: Extensive
Event likelihood without benefit of either IPL: Moderate
Total number of IPLs (non-SIS IPL + SIS interlock): 2
Required SIS interlock integrity level: 2



process conditions. The integrity of the SIS should not be compromised by the BPCS. Appendix B of the Guidelines gives detailed guidance on separation.

13.15.5 Safety interlock system
As regards safety considerations in the selection and design of the SIS, the Guidelines cover (1) the design issues, (2) the requirements analysis, (3) the technology selection, (4) the architecture selection, (5) the equipment selection and (6) the system design. The necessary preliminaries are to determine the need for safety interlocks and to establish their integrity levels.

Design issues
Design issues are of two main kinds: function and integrity. Issues concerning function include the parameters to be monitored, the trip actions to be taken, and the testing facilities and policy. Among those bearing on integrity are the number of integrity levels required, which affects the choice of technology.

Some specific design issues addressed in the Guidelines are (1) the fail-safe characteristics, (2) logic structures, (3) fault prevention and mitigation, (4) separation of the BPCS and SIS, (5) diversity, (6) software considerations, (7) diagnostics, (8) the human/machine interface and (9) communications.

The fail-safe issue involves the choice between de-energize-to-trip and energize-to-trip. There is also the question of the failure modes of PES-based devices. Even at the chip level the probable states are equally likely to be on or off. The problem is even more severe at the level of a PES-based device. Effectively the Guidelines suggest alternative approaches based on use of equipment of proven reliability, capable of self-diagnosis and of proof-testing, with judicious use of redundancy.

There should be a separation between the SIS and BPCS such as to ensure the integrity of the former. Conventional SISs have long utilized separate sensors and power supplies. The Guidelines also bring into consideration the input/output system, the software and the human/machine interface.

Diagnostics may be used to detect fail-to-danger failures in the safety interlock equipment, including the sensor, the logic solvers, the final control elements and the energy sources. The Guidelines distinguish between passive and active diagnostics. In passive diagnostics the failure is revealed only when a demand is imposed, either by the system or by a user test. In active diagnostics the device is subjected continuously to testing by input of out-of-range conditions and its response monitored, but over a time interval short enough not to upset the safety interlock loop. The example quoted is the perturbation of a solenoid valve on a control valve with sufficient rapidity that the control valve is not affected.

Requirements analysis
The requirements analysis determines the targets for availability and reliability (or functional and operational reliability).

Technology selection
The SIS technologies given in the Guidelines include: (1) fluid logic (pneumatic, hydraulic); (2) electrical logic, including direct-wired systems, electromechanical devices (relays, timers), solid state relays, solid state logic and motor-driven timers; (3) PES technology, involving programmable logic controllers (PLCs) and distributed control systems (DCSs); and (4) hybrid systems. The technologies are detailed in Appendix A of the Guidelines.

The hardware of a typical SIS as envisaged in the Guidelines might consist of a logic solver with input modules receiving sensor signals, output modules sending out signals to final control elements, a BPCS interface, a human/machine interface and an engineer's interface.

Architecture selection
Under architecture selection the Guidelines discuss the various ways of achieving an integrity appropriate to the integrity level determined. Thus for IL1 redundancy is usually not necessary, though it may be appropriate for a lower reliability element. For IL3, on the other hand, there should be full redundancy. Other features mentioned for IL3 are the use of analogue sensors so that active diagnostics can be practised, monitoring of the logic solver outputs by the BPCS and consideration of the use of diversity in the sensors. For both high availability and high reliability use may be made of a triple modular redundant (TMR) system, or 2/3 voting system.
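
The 2/3 (2-out-of-3) vote at the heart of a TMR logic solver can be sketched as follows; the channel arrangement and the trip setpoint are hypothetical illustrations, not values from the Guidelines.

```python
def vote_2oo3(a: bool, b: bool, c: bool) -> bool:
    """Return True (trip) when at least two of three channels demand a trip."""
    return (a and b) or (a and c) or (b and c)

# Hypothetical application: three redundant pressure transmitters, trip on high pressure.
TRIP_SETPOINT = 12.0  # bar, illustrative value

def trip_demanded(p1: float, p2: float, p3: float) -> bool:
    """Vote the three channel comparisons against the common setpoint."""
    return vote_2oo3(p1 > TRIP_SETPOINT, p2 > TRIP_SETPOINT, p3 > TRIP_SETPOINT)
```

The property for which TMR is chosen is visible directly in the logic: a single channel stuck in either state neither causes a spurious trip nor defeats the protection.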

Equipment selection
The equipment selected for a PES-based SIS should be user-approved for safety.

System design
The basic design method for the SIS has already been described, but the design involves more than this. The design should allow for the special features of PES-based systems. One of these, the difficulty of determining fail-safe states, has already been mentioned. Another feature is false `turn-ons' of inputs or outputs.

Another problem in PES-based systems is that the life of a given version of the software is relatively short, so that the version initially used is liable to become out of date and, after a time, no longer supported by the vendor. The problem then arises that insertion of an updated version constitutes a software modification, with all that that entails. There are various approaches to the problem, none entirely satisfactory.

The design should take into account the potential impacts of the SIS on the other components of the process control system, including the alarm system, the communications system and the human/machine interfaces.

Most process control systems involve some sequential control, even if it is largely limited to start-up and shut-down. The sequential logic should operate in such a way as not to cause any safety problems. Its operation should be tested against the safety interlock logic to ensure that normal operation of the sequential control does not trigger interlock action.

The documentation for the SIS specified in the Guidelines includes (1) the operational description, (2) the schematic diagrams, (3) the binary logic diagrams and (4) the single line diagrams. Examples are given of these different types of diagram.


13.15.6 Administrative actions
In order to ensure the control system integrity, the design process just described needs to be supported by administrative actions. The Guidelines outline minimum procedural requirements, the scope of which includes (1) operating procedures, (2) maintenance facilities, (3) testing of the BPCS, (4) testing of the SIS and alarms, (5) test frequency requirements, (6) testing facilities, (7) operations training, (8) documentation, and (9) auditing of maintenance and documentation.

The test frequency indicated in the Guidelines for SIS functional testing is, for minimal risk systems, testing once every 2 years or at major turnarounds, whichever is more frequent, and, for high risk systems, testing at least once a year or on major maintenance, whichever is more frequent.
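
The effect of proof test frequency on protection can be illustrated with the standard single-channel approximation for fractional dead time, φ ≈ λτ/2, using the notation of Section 13.9 (λ equipment failure rate, τ proof test interval). The failure rate in the sketch below is illustrative only.

```python
def fractional_dead_time(fail_danger_rate: float, test_interval: float) -> float:
    """Approximate fractional dead time of a single channel: phi ~ lambda * tau / 2.

    Valid when lambda * tau << 1 and fail-danger faults are unrevealed
    between proof tests.
    """
    return fail_danger_rate * test_interval / 2.0

# Illustrative fail-danger rate of 0.1 faults/year for one trip channel:
lam = 0.1
phi_2yr = fractional_dead_time(lam, 2.0)  # 2-yearly testing
phi_1yr = fractional_dead_time(lam, 1.0)  # yearly testing
```

Halving the test interval halves the fraction of time the channel is dead, which is the quantitative basis for the more frequent testing demanded of high risk systems.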

13.16 Emergency Shut-down Systems

In quite a large proportion of cases, the plant is provided not just with individual trips but with a complete automatic emergency shut-down (ESD) system. There is relatively little written about ESD systems. One of the principal accounts is that given in Offshore Installations: Guidance on Design and Construction, Guidance Notes to the Offshore Installations (Construction and Use) Regulations 1974, issued by the Department of Energy (1984), followed now by Offshore Installations: Guidance on Design, Construction and Certification (HSE, 1990b) (the HSE Design, Construction and Certification Guidance Notes).

13.16.1 Conceptual design of ESD
The function of an ESD system is to detect a condition or event sufficiently hazardous or undesirable as to require shut-down and then to effect transition to a safe state. The potential hazards are determined by a method of hazard identification such as hazop. Estimates are then made of the frequency and consequences of these hazards. The hazards against which the ESD system is to protect are then defined.

This protection is effected by identifying the operating parameters which must be kept within limits if realization of the hazards is to be avoided, and selecting shut-down actions which will achieve this. A shut-down sequence is determined and the shut-down logic formulated. It is not always necessary to shut down the whole plant, and there are different levels of ESD which fall short of this, such as shut-down of an individual unit or of a section of plant.

13.16.2 Initiation of ESD
The arrangements for initiation of the ESD are critical. If these are defective, so that the system is not activated when it should be, all the rest of the design goes for nothing. There is a balance to be struck between the functional and operational reliability of the ESD system. It should act when a hazard arises, but should not cause unnecessary shut-downs or other hazards.

One factor which affects this balance is the fact that usually the plant is safest in the normal operating mode, and that transitions such as shut-down and start-up tend to be rather more prone to hazards and are to be avoided unless really necessary. Another, related factor is that shut-down of one plant may impose shut-down on other, linked plants.

Initiation may be manual, automatic or, more usually, both. The usual arrangement is a manual initiation point, or shut-down button, in the control centre, other manual initiation points located strategically throughout the plant, and initiation by instrumentation. Such automatic initiation may be effected by the fire and gas system and/or by process instruments. Measures should be taken to avoid inadvertent activation, including activation during maintenance and testing.

13.16.3 Action on ESD
There are a variety of actions which an ESD system may take. Three principal types are:

(1) flow shut-off;
(2) energy reduction;
(3) material transfer.

Flow shut-off includes shut-off of feed and other flows. It often involves shut-down of machinery and may include isolation of units. Energy reduction covers shut-off of heat input and initiation of additional cooling. Material transfer refers to pressure reduction, venting and blowdown.

A fundamental principle in ESD is failure to a safe state. The overall aim is failure to a safe state for the system as a whole. This is normally effected by applying the principle to individual units, but there may be exceptions, and cases should be considered individually. Each required action of the ESD system should be effected by positive means. Reliance should not be placed on the cascading effect of other trip actions.

13.16.4 Detail design of ESD system
It is a fundamental principle that protective systems be independent of the rest of the instrument and control system, and this applies equally to an ESD system. The design of the ESD system should follow the principles which apply to trip systems generally, as described in Section 13.9. There should be a balance between functional and operational reliability. Dependent failures should be considered. The reliability may be assessed using fault tree and other methods. The techniques of diversity and redundancy should be used as appropriate. Use may be made of majority voting systems.
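
The functional reliability of simple m-out-of-n voting configurations can be compared as follows, using the Section 13.9 convention that the system fails when r = n - m + 1 channels have failed. The sketch assumes independent channels (it deliberately ignores the dependent failures discussed above) and an illustrative per-channel fail-to-danger probability.

```python
from math import comb

def prob_system_fails_danger(q: float, m: int, n: int) -> float:
    """P(fail-to-danger) of an m-out-of-n trip system with independent channels.

    The system fails to danger when fewer than m channels remain able to trip,
    i.e. when at least r = n - m + 1 channels have failed dangerously
    (probability q each).
    """
    r = n - m + 1  # number of channel failures that defeats the system
    return sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(r, n + 1))

q = 0.01  # illustrative per-channel fail-to-danger probability
results = {f"{m}oo{n}": prob_system_fails_danger(q, m, n)
           for m, n in [(1, 1), (1, 2), (2, 2), (2, 3)]}
```

With these numbers, 1oo2 improves the fail-danger probability by two orders of magnitude over a single channel, while 2oo3 nearly matches 1oo2 yet also suppresses spurious trips — the balance of functional and operational reliability referred to above.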

The emergency shut-down valves (ESVs) should have a high degree of integrity. Such valves are frequently provided with pneumatic or hydraulic power supplies in addition to the electrical power supply. An ESV should be located so that it is unlikely to be disabled by the type of incident against which it is intended to protect.

The ESD system should be provided with power supplies which have a high degree of integrity. The normal approach is to provide an uninterruptible power supply. This supply should be designed and located so that it is unlikely to be disabled by the incident itself. The cables from the power supply to the final shut-down elements should be routed and protected to avoid damage by the incident.

13.16.5 Operation of ESD system
The status of the ESD system should be clear at all times. There should be a separate display showing this status in the control centre. This display should give the status of any part of the ESD system which is under test or maintenance and of any part which is disarmed. Initiation of ESD should activate audible and visual alarms in the control centre. There should be an indication of the source of the initiation, whether manual or instrument. ESD should also be signalled by an alarm which is part of the general alarm system.

It may be necessary in certain situations, such as start-up, changeover or maintenance, to disarm at least part of the ESD system, but such disarming should be governed by formal arrangements. The principles are essentially similar to those which apply to trip systems generally, as described in Section 13.9.

13.16.6 Testing and maintenance of ESD system
The ESD system should be subject to periodic proof testing, and such testing should be governed by a formal system. The principles of proof testing were discussed in Section 13.9. As far as is practical, the test should cover the complete system from initiation to shut-down condition.

The need for proof testing and, more generally, for the detection of unrevealed failure should be taken into account in the design. The equipment should be designed for ease of testing. It should be segregated and clearly identified. Techniques for detection of instrument malfunction should be exploited. In voting systems, the failure of a single channel should be signalled.

13.16.7 Documentation of an ESD system
The ESD system should be fully documented. The HSE Design, Construction and Certification Guidance Notes give details of recommended documentation.

13.16.8 ESD of a gas terminal
The design of systems for ESD and emergency depressurization (EDP) of a gas terminal has been described by Valk and Sylvester-Evans (1985). The design philosophy described is that the ESD system should operate only in an extreme emergency, that the ESD and EDP systems are separate from the control, trip and relief systems, and that the systems should be simple and reliable.

The preliminary design of the ESD and EDP systems was reviewed by means of a hazop study. Potential operational failures were studied using general reliability engineering methods, and functional failures were studied using, in particular, fault tree analysis. A further hazop was conducted on the final design for the ESD and EDP system.

Design studies showed that a totally fail-safe concept would result in a relief and flare system of exceptional size. Alternatives considered were to allow an increase in the depressurization time for certain critical equipment beyond that recommended in the codes and to control the peak depressurization flow in the relief and flare system. In the design adopted, the plant was divided into sections such that the depressurization of each section could be done independently, and the operation of the sections was interlocked. The depressurization time of certain items was extended to 30 minutes, as opposed to the 15 minutes recommended in API 521, but the design compensated for this extension by provision of additional fireproofing and water cooling arrangements.

The authors highlight the differences of philosophy between companies on whether the ESD and EDP systems should be used for normal shut-down and depressurization or reserved as systems dedicated for emergency use. This project reaffirmed the need to consider the ESD and EDP systems at an early stage and to avoid treating them as an `add-on' feature to be dealt with late in the design.

13.16.9 ESD on Piper Alpha
The ESD system on Piper Alpha illustrates a number of the points just made. Overall, the system was largely effective in achieving shut-down and venting and blowdown, but there were a number of features which are of interest.

The main button for the initiation of the ESD caused closure of the ESV on the main oil pipeline but not on the three gas pipelines. One reason for this was that closure of these latter ESVs would impose a forced shut-down on the linked platforms. There were three separate shut-down buttons, one for each of these valves, and shut-down depended on manual action by the control room operator; but he was thrown across the control room by the explosion.

The ESVs on the risers of the gas pipelines were so located that they were vulnerable to the fires that developed. This defect was widespread throughout the North Sea, and regulations were introduced without delay to require such valves to be relocated. There was also evidence that some of the ESVs did not achieve tight


Table 13.31 Elements of three candidate protective systems for a large steam boiler plant (after Hunns, 1981) (Courtesy of Elsevier Science Publishers)

Control and protective features    Protective system design(a)
                                   Manual   Medium automated   Highly automated
Trip parameters(b)                 A        A                  A
Boiler purge sequence              M        A                  A
Burner flame failure detection     M        A                  A
Gas valves leak test               M        M                  A
Ignition burner control            M        M                  A
Burner fuel valves operation       M        A                  A

(a) A, automated; M, manual.
(b) Low boiler drum level; low combustion air flow; low instrument air pressure; low fuel oil pressure; low atomizing steam pressure; low fuel gas pressure; high fuel gas pressure; high knockout drum level; and loss of 110V DC supplies.


shut-off. The explosion damaged power supplies, and in some cases closure of ESVs occurred, not due to survival of the intended power supply, but fortuitously. Further details are given in Appendix 19.

13.17 Level of Automation

The allocation of function between man and machine is a principal theme in human factors and is discussed in the next chapter. Of particular interest here is the allocation of control and protective functions to the process operator or the instrument system. This was touched on above and is now considered in more detail by means of an industrial example.

13.17.1 Illustrative example: steam boiler protective system
A case study of the optimum level of automation has been described by Hunns (1981). The system investigated was the protective system of a large steam plant. The plant consisted of a 100 MW boiler operating at 1500-2000 psi and producing 500 ton/h of steam, the boiler being dual fired with oil and gas. The principal relevant features of the three candidate control and protective systems are shown in Table 13.31.

Each system was assessed for its reliability in the start-up and operational phases of the plant. A variety of start-up sequences were considered, each related to the event which had caused the previous shut-down. Some 200 logic diagrams were produced.

The criterion used to determine the optimum system was a function of the expected shut-downs. These were classified as low penalty and high penalty. Low penalty shut-downs were unwanted shut-downs due to spurious trips and correct shut-downs in response to a demand, whilst high penalty, or catastrophic, shut-downs were those caused by a demand to which the protective system did not respond.

One such case is an excessive release of unignited fuel into the combustion chamber. Figure 13.26 shows a section of the logic, which the authors term `matrix logic'. Events envisaged by the analyst are shown in the left-hand column. A particular event sequence is shown by a vertical column of the matrix containing one or more dots. The circle enclosing the & symbol at the head of the column indicates that the events are ANDed together. The set of event sequences is collected under the OR symbol, which indicates that these event sequences are related to the top event, the unignited release, by OR logic.
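
The structure just described, a top event fed by an OR of ANDed event sequences, can be evaluated numerically. The sketch below uses hypothetical events and probabilities, not Hunns' data, and assumes independent events with the rare-event approximation for the OR gate.

```python
# Each column of the matrix is one ANDed event sequence; the top event is the
# OR of all sequences. Events and probabilities here are hypothetical.
event_prob = {
    "fuel valve fails open": 1e-3,
    "flame failure undetected": 5e-3,
    "operator misses alarm": 1e-1,
    "purge sequence omitted": 2e-3,
}

sequences = [  # the dots in a column select the events ANDed for that sequence
    ["fuel valve fails open", "flame failure undetected"],
    ["purge sequence omitted", "operator misses alarm"],
]

def p_sequence(events):
    """AND gate: product of independent event probabilities."""
    p = 1.0
    for e in events:
        p *= event_prob[e]
    return p

# OR gate, rare-event approximation: sum of the sequence probabilities.
p_top = sum(p_sequence(s) for s in sequences)
```

The same traversal, run over each of Hunns' 200 or so diagrams with equipment and human error data in place of the hypothetical values above, yields the expected frequencies of the low and high penalty shut-downs used as the optimization criterion.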

The equipment failure data were taken from the Safety and Reliability Directorate SYREL data bank, whilst the human error estimates were obtained by expert judgement. There were some 180 elements in the latter list. Estimates of the values in the list were made by two experienced analysts. Use was made of performance shaping factors such as `time to react', `prior expectancy', `conspicuity of task' and `perception of consequence'. Good agreement was obtained between the two. Those estimates which were particularly critical or where


Figure 13.26 Matrix logic diagram for a steam boiler protective system: event `excessive unignited fuel release' (Hunns, 1981) (Courtesy of Elsevier Science Publishers)


a divergence had emerged were mediated by a third, independent, analyst.

The results of the study were expressed in terms of the number of low and high penalty shut-downs per year and the corresponding mean outage time. The mean outage of low penalty events was taken as 1/3 day/event and that of high penalty events as 60 days/event. The manual system gave appreciably more high penalty shut-downs but fewer low penalty ones, and overall a higher outage, than the other two systems. The medium automated system gave slightly more high penalty shut-downs and fewer low penalty ones than the highly automated one, but the same outage time. On a life cycle cost basis, for which the highly automated system had higher capital and maintenance costs, the medium automated system was superior. The total life cycle costs of the three systems (manual, medium automated and highly automated) were £0.122, £0.095 and £0.099 million/year, respectively.
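
The outage criterion reduces to a simple expected-value calculation combining the 1/3 day and 60 day penalties. The shut-down rates in the sketch below are hypothetical, chosen only to show how a system with more frequent low penalty trips can still win on overall outage.

```python
LOW_PENALTY_OUTAGE = 1.0 / 3.0   # days per low penalty shut-down (from the study)
HIGH_PENALTY_OUTAGE = 60.0       # days per high penalty shut-down (from the study)

def expected_outage(n_low: float, n_high: float) -> float:
    """Expected outage (days/year) from low and high penalty shut-down rates."""
    return n_low * LOW_PENALTY_OUTAGE + n_high * HIGH_PENALTY_OUTAGE

# Hypothetical shut-down rates (events/year) for two candidate systems:
manual_like = expected_outage(n_low=6.0, n_high=0.05)     # fewer trips, more failures to respond
automated_like = expected_outage(n_low=9.0, n_high=0.01)  # more spurious trips, fewer failures
```

With these illustrative rates the more automated system incurs three extra low penalty trips per year yet has the lower expected outage, mirroring the direction of Hunns' result.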

13.18 Toxic Storage Instrumentation

On plants handling high toxic hazard materials (HTHMs), the instrumentation and control system assumes particular importance. Some relevant considerations are outlined in Guidelines for Safe Storage and Handling of High Toxic Hazard Materials by the CCPS (1988/2) (the CCPS HTHM Guide).

Depending on the degree of hazard, the instrumentation and control system should be a high integrity one. This requires adherence to the various principles already described for high integrity design, including the application of principles such as fail-safe and second chance design and the use, as appropriate, of high reliability instrumentation, instrument diversity and redundancy, and high quality maintenance. It also involves the application of the techniques of hazard identification and assessment to the design.

In respect of measurement, the principal considerations are that the potential for release from the instrument, or its fittings, should be minimized, that the instrument be reliable and that the measurement be accurate. For flow measurement this favours the use of non-invasive sensors such as magnetic flowmeters and avoidance of glass in instruments such as rotameters. Orifice flowmeters also have the disadvantage of an extra flange and associated piping. For pressure measurement, diaphragm pressure sensors are preferred to direct-connected gauges of the Bourdon tube type. Precautions to be taken where the latter have to be used include protection by inert liquid filling in corrosive service and installation of shut-off valves and, possibly, flow limiters in the form of restriction orifices. For level measurement, weighing methods have advantages, but use of sight glasses should be avoided. For temperature measurement, particular care should be taken in the design of the thermowell, which can be a weak point.

The arrangements for control and protection should address the hazards of particular importance for the storage of toxics. These include (1) overpressure, (2) overfilling, (3) overtemperature and (4) high release flow. For overpressure, the main requirements are the provision of overpressure protection and of means of disposal for the relief flows. For overfilling, a significant role is likely to be played by trip systems. For temperature deviations, which may indicate reaction runaway or thermal stratification with its attendant risk of rollover, the need is for warning. Some methods of dealing with overtemperature are described below. High release flow following a failure of containment may be mitigated by the use of suitable control valve trims and of restriction orifices or excess flow valves.
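
The mitigation offered by a restriction orifice can be estimated from the standard sharp-edged orifice relation for mass flow, ṁ = Cd A (2ρΔP)^1/2. The bore, pressure drop and fluid density below are illustrative values, not recommendations.

```python
from math import pi, sqrt

def orifice_mass_flow(d_orifice: float, delta_p: float, rho: float,
                      cd: float = 0.61) -> float:
    """Mass flow (kg/s) through a restriction orifice: m = Cd * A * sqrt(2*rho*dP).

    d_orifice: bore diameter (m); delta_p: pressure drop (Pa); rho: density (kg/m3).
    Single-phase incompressible flow is assumed; Cd = 0.61 is a typical
    sharp-edged orifice discharge coefficient.
    """
    area = pi * d_orifice**2 / 4.0
    return cd * area * sqrt(2.0 * rho * delta_p)

# Illustrative case: 10 mm bore, 5 bar driving pressure, liquid of density 800 kg/m3.
leak_rate = orifice_mass_flow(0.010, 5e5, 800.0)
```

Because the area term goes as the square of the bore, a modest reduction in line bore at the restriction gives a large reduction in the worst-case release rate from a downstream failure.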

Storage of a reactive chemical requires close control in respect both of temperature and of contamination. Methods of temperature control include the use of cooling coils, a reflux condenser, a quench system and a short-stop arrangement. All these methods of temperature control require for their effective functioning good mixing in the tank.

A toxic gas detection system should be provided, on the lines described in Chapter 18. Toxic gas detectors should also be installed on vents on which breakthrough of a toxic gas may occur. The sensors should have a range adequate for this duty. Instrumentation may also be required to ensure that the pilot burner remains lit on any flare which has the function of destroying by combustion any toxic gas routed to it.

13.19 Notation

Section 13.6
Pd  delivery pressure
Ps  suction pressure
ΔP  pressure drop
Q   gas flow

Section 13.9
fh  density function for a plant hazard occurring
m   number of equipments which must survive for trip system to survive
n   number of identical equipments
ph  probability of a plant hazard occurring
r   number of equipments which must fail for trip system to fail
t   time
γ   spurious trip rate
δ   plant demand rate
η   plant hazard rate
λ   equipment failure rate
τp  proof test interval
τr  repair time
φ   fractional dead time (with simultaneous testing)

Subsections 13.9.1-13.9.13
pd  probability of a plant demand occurring
q   probability of a single channel failing
q   probability defined by Equation 13.9.37
τd  disarmed time
τ0  dead time
φ*  fractional dead time (with staggered testing)
τis isolation dead time

Subscripts:
m/n  for an m/n (m-out-of-n) system
min  minimum


Subsection 13.9.14
Pn      probability that system is in state n
r1, r2  terms defined by Equation 13.9.53

Subsection 13.9.16
A     heat transfer area of sensor
cp    specific heat of sensor
k     constant
M     mass of sensor
s     Laplace operator
t     time
T     temperature
Ti    input temperature
U     overall heat transfer coefficient of sensor
δ(t)  delta function
ζ     damping factor
θ     temperature deviation
θi    input temperature deviation
τ     time constant of sensor
ωn    natural frequency

Subscripts:
i    input
ss   steady-state
1,2  first, second stage

Superscripts:
¯    Laplace transform

Subsection 13.9.17
C  annualized cost of a single channel trip
G  cost of genuine trip
H  cost of realization of hazard
S  cost of spurious trip
V  overall annual cost

β1  functional beta value
β2  operational beta value

Subscripts:
s    trip system
1/2  1/2 system
