




ELSEVIER 0951-8320(95)00075-7

Reliability Engineering and System Safety 50 (1995) 189-201 © 1995 Elsevier Science Limited

Printed in Northern Ireland. All rights reserved 0951-8320/95/$9.50

A procedure for the analysis of errors of commission in a Probabilistic Safety Assessment of a nuclear power plant at full power

J. Julius, E. Jorgenson NUS, 1303 S. Central Ave., Kent, WA 98032, USA

G. W. Parry NUS, 910 Clopper Rd., Gaithersburg, MD 20878, USA

&

A. M. Mosleh University of Maryland, 2135 Building 090, College Park, MD 20742, USA

(Received 1 May 1995; accepted 7 July 1995)

This paper describes an analytical procedure that has been developed to facilitate the identification of errors of commission for inclusion in a Probabilistic Safety Assessment (PSA) of a nuclear power plant operating at full power. The procedure first identifies the opportunities for error by determining when operators are required to intervene to bring the plant to a safe condition following a transient, and then identifying under what conditions this is likely to occur using a model of the causes of error. In order to make the analysis practicable, a successive screening approach is used to identify those errors with the highest potential of occurrence. The procedure has been applied as part of a PSA study, and the results of that application are summarized. For the particular plant to which the procedure was applied, the conclusion was that, because of the nature of the procedures, the high degree of redundancy in the instrumentation, the operating practices, and the control board layouts, the potential for significant errors of commission is low.

1 INTRODUCTION

In recent years, the analysis of human/system interactions has played an increasingly important role in Probabilistic Safety Assessments (PSAs) of nuclear power plants. In particular, attention has been focused on those interactions that take place after an initiating event, in which the operators are guided by emergency and abnormal operating procedures (EOPs and AOPs) to bring the plant to a safe, stable state. Considerable effort has been expended in many recent PSAs to capture the impact of the failures that might occur in these human/system interactions in terms of their effect on the availability of plant equipment and on safety functions.

However, in most current PSAs, failures to respond as called for by procedure are generally modeled as errors of omission: a response is modeled as being executed correctly or not executed at all. The consequences of incorrect responses are generally not addressed. A human failure that leads to an incorrect response is often referred to as arising from an error of commission. In the PSA context, the most significant errors of commission are those that either, in addition to resulting in failure to perform the primary function, also fail or make unavailable other equipment or functions needed to mitigate the accident scenario, or otherwise exacerbate the situation. Such an error introduces a dependency between events within a fault tree, or between functions on the event tree. However, an error of commission that fails a single function or system (e.g., by terminating it prematurely) can also be significant from a risk assessment perspective if it provides a new mechanism of failure that cannot reasonably be expected to be included in the failure probability assigned for the function/system. This latter type of error does not introduce a new dependency between the events of the model, but increases the failure probability of the affected function.

It should be noted that some of the impacts of errors of commission may be incorporated in a PSA even if they are not explicit in the structure of the logic model. For example, THERP (Ref. 1) includes contributions to the probability of failure to perform an action from such errors of commission as selecting the wrong switch from an array of switches. However, using the THERP approach, the impact of those errors of commission on other equipment is not modeled.

This paper presents a procedure for the identification of significant errors of commission for inclusion in a PSA. The identification of a human failure as an error of commission in the context of a PSA must be made in relation to the consequence of the failure and the way it is to be modeled. There is no universal definition. We adopt, as a working definition, that an error of commission results in an action that is inappropriate for the given scenario, in that it changes the state of one or more required pieces of equipment or functions in an adverse way. Within this definition, errors of commission can take many forms, and could include spurious actions and deliberate acts of sabotage. However, in order to limit the scope to within reasonable bounds, this paper addresses primarily those errors of commission that could be made by a rational crew while responding to an upset, transient or accident, while being guided by the set of normal, abnormal, or emergency operating procedures and functional restoration procedures. The procedure described here is for a PSA for internal initiating events from full power; shutdown states and external events are not addressed. The procedure developed for a PSA of the low-power and shutdown states will be presented separately. In addition, neither errors that can be the cause of, or contribute to, an initiating event, nor errors of commission in the time frame preceding the initiating event that result in unannounced unavailabilities of systems are addressed.

As stated above, the errors of commission addressed in this procedure are restricted to those that result from rational behavior of the operating crew; random, or arbitrary, behavior is not addressed. Thus, it is assumed that the interactions of the crew with the plant are directed by the requirements of the procedures in response to the plant status as it is perceived by the crew. With these constraints, errors are envisioned as being largely caused by problems in the plant information/operating crew interface, or the procedure/operating crew interface. It is not the intent of this work to address methods to assess the adequacy or correctness of the procedures, although it is recognized that this is a potential cause of errors. With these analytical boundary conditions, therefore, the errors of commission addressed will primarily include those that are a result of systematic influences on all crews, rather than those resulting from individual crew characteristics.

The paper is structured as follows. Section 2 discusses the basic assumptions and models of the causes of human error upon which the analysis procedures described in this paper are based. It was found convenient, for analysis purposes, to treat three error modes separately, namely, global misdiagnoses, local misdiagnoses and slips. Sections 3 and 4 describe the procedures developed to identify the opportunities for error and the specific error expressions resulting from these error modes, and to screen these error expressions for their significance to risk. Screening of the potential errors is performed on the basis of their consequence, the potential for recovery via the procedures, and an assessment of their frequency. Section 5 gives a summary of the results of applying these procedures to a specific plant, and Section 6 presents concluding remarks.

2 GENERAL ASSUMPTIONS AND UNDERLYING MODEL

2.1 General assumptions

As stated in the introduction, this work was performed under the assumption that the operating crew is responding in a rational manner, and is guided by a set of operating procedures. Specifically, for the development of the analytical procedure presented in this paper, it was assumed that the procedures would be symptom-based procedures derived from the generic Westinghouse Emergency Response Guidelines. Thus, the first procedure that is entered in response to a reactor trip or transient is E-0, 'Reactor Trip or Safety Injection', which is a diagnostic procedure that helps the operators identify the type of accident and leads them to an appropriate procedure for the accident type. Once the operators are in the E-0 procedure, they are trained to continue in the procedure at hand, and not to be diverted and delayed by other alarms that might annunciate. The accepted practice is that only one procedure is to be followed at a time, except for certain local procedures performed in the plant by auxiliary operators.

In contrast with the conventional Human Reliability Analysis (HRA) models, which essentially address human failures only as omitting to perform a required function called out by the operating procedures, modeling specific errors of commission (EOCs) requires identification of the reasons for erroneous actions. This is because EOCs are not necessarily associated with predefined actions or responses, and the set of possible actions is virtually unlimited. Therefore, the identification of error opportunities, error modes, and specific manifestations of the error, i.e., error expressions, requires an understanding of the causes of error and the specific context in which the errors are likely to occur.

2.2 Underlying model of error causes

Unfortunately, current theories and models of cognition and the causes of error are still in their early stages of development and are not mature enough to be implemented in a PSA context. This study builds upon the cause-effect mapping concept introduced in Ref. 2. The mapping approach relates the impact of a set of identifiable influencing factors to possible modes of error without specifying the exact internal mechanism of the influence. In all cases, however, the connection between the influencing factor and the mode of error is either intuitive and obvious, or can be fully or partially supported by the available literature on cognitive science, human factors research, or nuclear power operating experience. As will be seen, however, use is made of ideas concerning the internal error mechanisms as an aid to identifying the conditions for which the potential for error is high.

Recognition of the fact that operator responses to accidents at nuclear power plants are heavily guided by procedures provides a reasonable basis for modeling human-caused failures as errors in, or deviations from, following the procedural instructions. Intentional deviations from procedures are considered in this paper to be rational actions based on the operators' understanding and perceptions of the plant status. This has been observed in reviews of the majority of significant events in nuclear power plants, and is consistent with the view of a number of researchers in the field (Refs 2-7). For example, according to Fullwood & Hall (Ref. 3), '... any intentional deviation from standard operating procedure is made because the employee believes his method of operation to be safer, more economical, or more efficient or because he believes performance as stated in the procedure is unnecessary.'

2.2.1 Error modes
For the purposes of this study, because the causes of error are different, it was found convenient to group the errors of interest into three broad categories of error mode:

• Global Misdiagnosis, manifested by the selection of an inappropriate procedure. An Incorrect Procedure Selection is an error that occurs when the operator, while following the correct protocol, selects an incorrect operating procedure. Incorrectness is considered with respect to the class of accident to which the particular procedure corresponds, e.g., transient, LOCA, SGTR;

• Local Misdiagnosis, manifested by the commission of an (intentional) inappropriate action at the level of a human/system interaction. Such an intentional action is based on an incorrect interpretation of the procedure or of the information required to correctly execute the procedure. This category includes shortcuts and skip errors (Ref. 2) as causes of the misdiagnosis;

• Slip, manifested by the commission of an (unintentional) inappropriate action at the level of a human/system/component interaction.

A diagnosis can be classified in terms of timing, i.e., whether it was performed on time, was premature, or was delayed, and in terms of correctness, i.e., whether it was consistent with the initiating event, or the status of the system or component. This classification applies to both global and local diagnoses. With this classification, an error in diagnosis includes misdiagnosis, premature correct diagnosis, premature misdiagnosis, and delayed diagnosis. In the context of a PSA, the time scale associated with a delayed diagnosis is, by definition, that which leads to unsafe plant consequences, and is a contributing cause of the errors of omission as normally modeled in PSAs.
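This classification lends itself to a simple encoding. The sketch below is our illustration (the names and types are not the paper's); it captures the rule that an on-time, correct diagnosis is the only outcome not counted as an error in diagnosis:

```python
from dataclasses import dataclass

# Illustrative encoding (ours, not the paper's) of the diagnosis
# classification: timing (on time, premature, delayed) and correctness.

@dataclass(frozen=True)
class Diagnosis:
    timing: str    # "on_time", "premature", or "delayed"
    correct: bool  # consistent with the initiating event / component status?

def is_diagnosis_error(d: Diagnosis) -> bool:
    """Misdiagnosis, premature correct diagnosis, premature misdiagnosis,
    and delayed diagnosis all count as errors in diagnosis; only an
    on-time, correct diagnosis does not."""
    return (not d.correct) or d.timing in ("premature", "delayed")
```

Under this encoding, the four error categories listed above are exactly the outcomes for which `is_diagnosis_error` returns true.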

2.2.2 Performance influencing factors (PIFs)
In order to understand causes of error it is necessary to consider the processes involved in intention formation, namely information processing, problem solving, and decision making, as well as the execution process itself. The factors in the operator's physical and cognitive environment that can promote errors in these processes must be identified. Examples of these factors are instrument layout and redundancies, instrument failure, procedural inadequacy, and crew training deficiencies. These factors, which we will call Performance Influencing Factors (PIFs), are discussed in this section.

PIFs are environmental, systemic, or human-related characteristics that influence the likelihood of occurrence of errors. The identification of which PIFs to use to characterize error causes should be based on models of internal (to the operator) mechanisms (Ref. 4). The PIFs that were considered as candidates for this analytic procedure are a subset of those identified in Ref. 2, where two basic categories and several subcategories of PIFs are recognized, as discussed below. The modeling approach adopted in Ref. 2 to account for the influence of the factors is one of mapping PIFs via the error modes to specific error expressions and providing an explanation of the map. For example, a set of PIFs may be identified as the probabilistic reason for misdiagnosis of the initiating event. The misdiagnosis (error mode), in turn, could be the cause of operators taking actions such as starting a system when not needed or terminating a function when needed (error expressions). The major categories of PIFs are:

(a) Context-Independent PIFs. These are factors that apply to all accident conditions and scenarios as generic and systematic influences on operator behavior. Their roots usually go back to conditions set prior to the event, and their influences are not expected to change on the time scale of typical accident scenarios.

(b) Context-Dependent PIFs. These are factors whose influences on operator performance are time and context dependent. These are further divided into three categories: plant-related, procedure-related, and operator-related.
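The cause-effect mapping from PIFs via error modes to error expressions can be sketched as a pair of lookup tables. The entries below are invented for illustration only, not taken from Ref. 2:

```python
# Toy version (entries invented for illustration) of the mapping:
# PIFs -> error modes -> error expressions.

PIF_TO_ERROR_MODES = {
    "instrument_failure":   ["global_misdiagnosis", "local_misdiagnosis"],
    "high_rate_of_change":  ["local_misdiagnosis", "slip"],
    "procedural_ambiguity": ["local_misdiagnosis"],
}

ERROR_MODE_TO_EXPRESSIONS = {
    "global_misdiagnosis": ["select an incorrect recovery procedure"],
    "local_misdiagnosis":  ["terminate a function when it is needed",
                            "start a system when it is not needed"],
    "slip":                ["operate the wrong switch in an array"],
}

def error_expressions_for(pifs):
    """Collect every error expression reachable from a set of PIFs."""
    expressions = set()
    for pif in pifs:
        for mode in PIF_TO_ERROR_MODES.get(pif, []):
            expressions.update(ERROR_MODE_TO_EXPRESSIONS[mode])
    return expressions
```

The point of the two-stage structure is that the map itself carries the explanation: each retained error expression can be traced back through its error mode to the PIFs that made it plausible.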

2.2.2.1 Context-independent PIFs
These are divided into several subcategories.

Training related PIFs. These include:

• degree of familiarity with, and frequency of training on, EOPs;

• general philosophy towards using the EOPs (e.g., the degree to which the EOPs are adhered to);

• generic rules for handling procedural ambiguities, e.g., use of the Re-diagnosis Procedure;

• method of resolving conflicting information from different instrumentation, e.g., routinely checking redundant or diverse instruments to confirm signals from the primary instrument.

Crew team characteristic PIFs.

• Team structure (clearly defined command and control hierarchy);

• established protocol for communication (e.g., the form of communicating the steps of the procedures, the method of confirming and acknowledging commands and messages);

• adequacy of resources (workload/resource ratio for the spectrum of initiating event types).

Plant related PIFs.

• Human factors design of the plant:

- control panel layout;

- quality of information media;

- precautionary measures against unintentional activation of control buttons.

2.2.2.2 Context-dependent PIFs

Plant related PIFs.

• Value of critical parameter (e.g., primary pressure)

An unfavorable value of a key parameter may lead to a threat to, or time-stress on, the operator. For example, if the operator finds a very high level in a steam generator, he may conclude that the steam generator is about to become solid, resulting in water going to the steam lines. This may lead him to take a shortcut (Ref. 2). The term value, however, does not refer to the value of the parameter as a continuous variable. Rather, it refers to the degree to which the parameter value is outside the normal range (approaching a very high or a very low value), close to a trip set point, or at a value which signifies an alarming condition to the operator.

• Trend of critical parameters (e.g., rate of change in primary pressure)

Rate of change has a time-stress effect on the operator similar to that of the value of a parameter. But, unlike the value of the parameter, which is significant at very high or low levels, a high rate of change has an effect over a broad range of parameter values.

• Availability of equipment

The availability of equipment impacts the choices an operator may have to respond to an upset condition.

• Availability of instrumentation

The status of the instrumentation impacts the information available to the operator to enable him to make decisions.
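The value and trend PIFs above are judgments about degree of abnormality rather than raw readings. As a minimal sketch of how they might be flagged for a scenario (the thresholds, including the 5% set-point margin, are our assumptions, not the paper's):

```python
# Sketch (thresholds assumed, not from the paper) of flagging the
# plant-related, context-dependent PIFs for a given scenario.

def value_pif_present(value, normal_lo, normal_hi, trip_setpoint,
                      margin=0.05):
    """The 'value' PIF concerns degree of abnormality: the parameter is
    outside its normal range, or within an assumed 5% margin of a trip
    set point."""
    if value < normal_lo or value > normal_hi:
        return True
    return abs(value - trip_setpoint) <= margin * abs(trip_setpoint)

def trend_pif_present(rate_of_change, alarming_rate):
    """Unlike the value PIF, a high rate of change matters over a broad
    range of parameter values, so only the rate itself is tested."""
    return abs(rate_of_change) >= alarming_rate
```

For example, with a hypothetical steam generator level whose normal band is 30-70% and whose trip set point is 85%, a reading of 90% would flag the value PIF, while a mid-band reading changing rapidly would flag only the trend PIF.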

EOP related PIFs.

• EOP response phase (e.g., verification, diagnosis)

The major types of activities included in the EOPs are verification, diagnosis, and recovery, and the EOPs are normally followed in phases. Upon entry into E-0, the first phase consists of the verification of the actuation of important systems and of the values of critical parameters. The next phase is diagnosis of the accident type, which leads to the recovery procedure, e.g., E-1, E-2A, etc., for the identified root cause. The recovery procedures are in turn divided into re-verification, re-diagnosis, and recovery phases.

Each phase represents a different type of activity. For example, in the verification phase, the operators are verifying automatic actuation of the safety-related systems. During the diagnosis phase, on the other hand, the operators are trying to diagnose the root cause of the accident by checking the symptoms from the plant. Different types of errors can occur in the different phases. For example, in the early verification phase of following the EOP, a shortcut or premature transfer to another procedure is one of the likely errors. Due to the nature of the activity in the verification phase, a skip is another likely error. The recovery phase comes towards the end of the operator response, and so shortcuts or incorrect transfers are not among the likely errors.


2.2.2.3 Identification of applicable PIFs
The degree to which these factors may be present in any PSA-defined scenario must be established through a review of the plant physical response to initiating events (in terms of system performance criteria and thermal-hydraulic trends), a review of the emergency operating procedures, a control room walk-through, and interviews with operators and training personnel.

Operator related PIFs.

• Confidence in diagnosis

When the operator has a very high degree of confidence in his diagnosis, he may tend to become fixed on that diagnosis. He would expect a plant behavior that is consistent with his diagnosis. Thus, he may ignore the signals from the plant that are inconsistent with his diagnosis. Furthermore, having made a diagnosis about the root cause, the operator may not pay as much attention to the systems he thinks are less important based on his diagnosis.

• Expectation

Operator expectations about plant behavior are generated as a result of the operator's diagnosis and the actions performed on the plant. If the behavior of the plant following his intervention does not match the operator's expectations, this may lead him to reconsider his actions, allowing him an opportunity to recover from his error. When such a mismatch happens after taking an erroneous shortcut, the operator may lose confidence in his own judgement. Thus, the operator would be more likely to return to following the procedure.

Expectation as a factor is mentioned by Weston (Ref. 8), who notes how experience with an event causes people to form anticipations about the future course of the event. Davis (Ref. 9) also indicates that people have expectations about situations. The specific measures which are used to gauge this influence on operator behavior are:

- degree of similarity between the signatures of different accident conditions;

- degree of match between the plant response and the expected response.

• Memory of previous actions and accident history

An example of the influence of memory is the case where the operator has recently verified that a pump is ON, and he is asked to verify flow. He may remember that the pump was verified ON and omit verifying the flow.

2.2.3 The theoretical basis for the analysis procedure
The PIFs discussed in the previous section are recognized as impacting human performance. However, our interest is in identifying the potential for specific errors of commission and, in particular, those that are intentional. Therefore, it is necessary to discuss how these PIFs impact this potential.

Most of the examples discussed under context-independent PIFs define the boundary conditions under which the analysis is performed. For instance, it is assumed that the EOPs will be used appropriately, that the operators will be aware of the diagnosis procedure, and that there are clearly defined roles for crew members. It will be seen in Sections 3 and 4 that the general operational practices are considered in identifying potential EOC scenarios. These types of assumptions enable us to create a construct within which the expected crew response is relatively well-defined, and the opportunities for self-checking and correction are identified.

The context-dependent PIFs impact the behavior in different ways. However, it is clear that they are not independent. For example, the operator's expectation is certainly a function of his training, and of his diagnosis of the situation. It is not possible to construct an approach that maps the PIFs into errors in a simple way. Some of the PIFs can be thought of as primarily setting up opportunities for error and determining the potential consequences, and others as primarily impacting the likelihood. For example, the plant parameters and status, together with the procedures, dictate what the appropriate response should be, and therefore they characterize the conditions under which a human/system interaction is expected. Ambiguities in procedures help define the potential consequences, and the structure, clarity and logic of the procedures have an impact on the likelihood of errors. In addition, the error probabilities associated with responses are impacted by such things as the rate of change of parameters, workload, etc.

As will be seen in the following sections of the paper, the first concern in performing an analysis of errors of commission is to identify the opportunities for, and potential consequences of, error. Given the style of procedures assumed in this work, if the conditions are optimal, the likelihood of significant error is minimal. Therefore, our search is for scenarios in which conditions are not optimal. Decisions are made by the control room crew on the basis of the information available to them. It is assumed, therefore, that for an error of commission to be made, the crucial factor is that there be something about the scenario that either distorts the available information so that it looks like the signature of another scenario for which the inappropriate action would indeed be appropriate (e.g., instrumentation failure), or creates conditions such that the operators are inclined to censor information to match a more familiar scenario signature. These assumptions are consistent with the similarity matching and, to a lesser extent, frequency gambling processes that are proposed by many (e.g., Reason, Ref. 10) to be the underlying processes that drive human cognitive behavior. In this way, the PIFs discussed as being related to the value of critical parameters, the operators' biases induced by training, and the availability of instrumentation and/or equipment are taken into account in this initial screening for opportunities for error. It will be noted that the values of critical parameters are also used to define the need for the system/human interaction, as seen above. In addition, ambiguities, or unclear directions, in the procedures (referred to as 'type of response required') are considered at this stage.

The operator's expectations of the plant response to his actions, the strength of belief in his diagnosis, and memory of recent actions are all influential in the recovery process and are addressed as needed during screening at a later stage.

Thus, while there is not a one-to-one correspondence between each PIF discussed in this section and a particular step in the method developed here to analyze errors of commission, it will be seen in the following sections that each PIF has been considered either explicitly in the development of the procedure, or in its application.

The following sections provide the procedural steps for identifying, screening, and ranking potential errors of commission for the three different error modes. The analytical procedure that has been developed for Global Misdiagnosis (i.e., choice of an incorrect procedure) is described in Section 3, and that for the local errors of commission, both misdiagnosis and slip errors, is described in Section 4.

The philosophy adopted in developing these procedures is essentially similar to that in Ref. 2, namely a successive screening of the very large number of situations in which operators interact with the plant, in order to find those cases where the likelihood of a scenario leading to an unrecovered error of commission with potentially significant consequences is relatively high. The major difference is that, while the approach of Ref. 2 relies on simulation to identify the opportunities for error, the approach presented here is, of necessity, more economical and focused. The first step in the process is to identify those PSA-defined scenarios that provide opportunities for a human/system interaction. The second step is, for each scenario, to use the models of error mechanisms and error causes discussed in the previous section to identify under what conditions, within the boundary conditions implied by that scenario, an error of commission might occur. Once an error is considered plausible, it may be screened out on the grounds that the consequences of the likely errors are unimportant, that it is very likely to be recovered, or that the likelihood of its occurring in the first place is sufficiently low.

Using the PSA model, the consequences of an error are relatively straightforward to identify once the error has been identified. In addition, using the information on accident progression within the PSA, the opportunities for recovery before the consequences of an error become irreversible are relatively easily identified. The intent in developing this procedure was that, if an error could not be screened out, its likelihood would be formally assessed. The determination of the likelihood of an error is a function of the number and importance of the performance influencing factors (PIFs) associated with the scenario generating the opportunity for error. In this procedure, the assessment of the likelihood was based on judgmental considerations, using PIFs that have been identified or postulated in previous research on the causes of human error; a formal quantification method was not developed.
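The successive screening just described can be summarized as a filter over candidate errors. The following is our sketch, not the authors' implementation; the three predicate arguments stand in for the analyst judgments on consequence, recovery, and likelihood:

```python
# Sketch of the successive screening: a candidate error of commission
# survives only if it is consequential, not assured of recovery via the
# procedures, and not judged negligibly likely. The predicate functions
# are placeholders for analyst judgment, not a computable algorithm.

def screened_out(error, consequential, likely_recovered, negligible):
    if not consequential(error):
        return True   # consequences unimportant
    if likely_recovered(error):
        return True   # very likely to be recovered via the procedures
    if negligible(error):
        return True   # likelihood of occurrence sufficiently low
    return False      # retained for formal likelihood assessment

def surviving_errors(candidates, consequential, likely_recovered, negligible):
    return [e for e in candidates
            if not screened_out(e, consequential, likely_recovered, negligible)]
```

The ordering mirrors the text: consequence screening uses the PSA model directly, recovery screening uses the accident progression information, and only the remainder calls for a likelihood judgment.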

3 PROCEDURE FOR ANALYSIS OF OPPORTUNITIES FOR GLOBAL MISDIAGNOSIS

In Westinghouse procedures, the diagnosis of the accident type is essentially performed in the E-0 procedure. Therefore, the potential for global misdiagnosis is evaluated by applying the following steps to the plant's E-0 procedure, to identify the possibility of, and assess the likelihood of, transfers to an incorrect procedure. The analysis procedure has three phases and provides various options for screening, as described below.

Phase 1: initial screening

1. Screen the initiating events of the PSA on the basis of their frequency. Any initiating event having a frequency of 1.0E-x or smaller is screened out, where 1.0E-x is two orders of magnitude below the rounded-up core damage frequency (CDF). This is based on the argument that the maximum impact of an error of commission associated with the response to such an initiator is less than 1% of CDF.
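As a rough sketch of this cutoff (assuming that 'rounded-up' means rounding the CDF up to the next power of ten, which is our reading rather than a statement from the text):

```python
import math

def screening_cutoff(cdf_per_year: float) -> float:
    """Initiating-event frequency below which an initiator is screened out:
    two orders of magnitude below the CDF rounded up to the next power of
    ten, so a screened-out initiator can contribute at most ~1% of CDF."""
    rounded_up_exponent = math.ceil(math.log10(cdf_per_year))
    return 10.0 ** (rounded_up_exponent - 2)

# For example, a CDF of 4.2E-5/yr rounds up to 1.0E-4/yr,
# giving a screening cutoff of 1.0E-6/yr.
```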

Phase 2: identification of the opportunities for global misdiagnosis

2. Develop a procedure response matrix (PRM) for all initiating events, or for initiator groups that produce significantly different plant responses. This is a table of the expected trends of the important plant parameters and/or indicators identified in the E-0 procedure (e.g., primary pressure) for each of the initiator groups. Consideration needs to be given to subdividing an initiating event group for cases where the initiating event is caused by an instrumentation failure (e.g., SG level indication) that may also provide a false signal to the operators. Then, for each of these major groups, review the decision points in the procedural path. For each decision point that relates to entering a new path in the procedure (e.g., entering E-1, E-2, E-3A, E-3B, etc.), identify the possible incorrect decisions resulting from either misinterpretation or failure of the plant to provide the correct information used at the decision step, or from missing the decision step altogether. An example PRM is provided in Table 1.
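A PRM can be thought of as a small table of expected trends per initiator group. The entries below are invented for illustration and are not taken from Table 1 of the paper:

```python
# Hypothetical procedure response matrix: expected trend of each key E-0
# parameter for three initiator groups (entries invented for illustration).
prm = {
    "small LOCA": {"primary pressure": "falling",
                   "containment radiation": "rising",
                   "SG pressure": "stable"},
    "SGTR":       {"primary pressure": "falling",
                   "containment radiation": "normal",
                   "SG pressure": "rising in ruptured SG"},
    "SLB":        {"primary pressure": "falling",
                   "containment radiation": "normal",
                   "SG pressure": "falling in faulted SG"},
}

def shared_cues(a: str, b: str) -> list:
    """Parameters whose expected trends coincide for two initiator groups;
    a decision point relying only on these cues cannot discriminate the two
    events and is a candidate for misdiagnosis."""
    return [p for p, trend in prm[a].items() if prm[b][p] == trend]
```

For the invented entries above, `shared_cues('SGTR', 'SLB')` picks out the cues that fail to discriminate the two events.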

Phase 3: screening

Screening can be performed on the basis of consequences, on the basis of procedurally guided recovery to the appropriate procedure, or on the basis of likelihood. Step 3 collects the information that will be useful at a later stage in the procedure. Step 4 addresses screening on consequence, Steps 5-7 address long-term recovery (i.e., as a result of procedurally guided re-diagnosis), and Steps 8-10 address screening on likelihood. In practice, Steps 4-7 are carried out simultaneously, and in many cases will precede the screening on likelihood.

3. Identify the critical indicators and alarms corresponding to the entries in the PRM developed in Step 2 above. Also, for each alarm and indicator identify:

- indicator/alarm location(s)
- redundancy level
- whether there are diverse indicators or alarms (e.g., alternative methods of verifying the status of the critical parameter).

It is convenient to summarize this information in the form of a plant information matrix (PIM), listing the critical parameters and the above information for each parameter, to be used in the screening at a later stage. An example PIM is provided in Table 2. The form of this matrix is abstracted from that constructed for the application described later in this paper; the table itself summarizes conclusions which are supported by copious notes.

Screening on the basis of insignificant consequences

4. For each potential error, review the incorrectly applied procedure and identify whether there exists the potential for failing, not performing, or otherwise making unavailable, those functions required to bring the plant to a safe, stable state (identified from the event tree). For those errors for which this cannot be excluded unequivocally, proceed to Steps 5-7.

Screening on the basis of the potential for recovery

5. For each global misdiagnosis opportunity, review the procedure entered as a result of the misdiagnosis to identify re-diagnosis opportunities based on procedural directions. If there are such opportunities, the case can be screened from further analysis, provided that the conditions in Steps 6 and 7 below are not met.

6. If the scenario involves additional hardware failures over and above the initiating event (e.g., failure of AFW), identify the impact on the operator's ability to make a re-diagnosis. This can be done by considering the impact of the hardware failure on the plant response. Specify this in terms of changes in the values of the critical parameters in the PRM that are used to make the re-diagnosis. If recovery cannot be guaranteed, proceed to frequency screening.

7. Assess the potential for a mindset leading to failure of the recovery action. This may, for example, be a function of training bias, or of a commonality of cues. For example, if the information that resulted in the misdiagnosis in the first place is the same as that which would be used to provide the opportunity for re-diagnosis, i.e., if common cues are used for the initial diagnosis and the re-diagnosis, do not claim recovery. If recovery cannot be guaranteed, proceed to frequency screening.

Screening on likelihood

To screen a scenario on the basis of frequency, it is helpful to consider the different mechanisms that can be postulated to lead to a symptom being perceived as different from what it should be:


[Table 2. Example plant information matrix (PIM): critical parameters with their indicator/alarm locations, redundancy levels, and diverse indications. The detailed entries are not legible in the source scan.]


(a) instrumentation failure, or unavailability of an indication due to equipment failure or loss of a support system;

(b) modification of the plant behavior by other equipment states (e.g., a steam line break inside containment may look like a small LOCA);

(c) the step in the procedure that completes the diagnosis may be missed, bypassed, or misinterpreted.

For (a) follow Step 8 below; for (b) follow Step 9 below; for (c) follow Step 10 below.

8. Postulate the appropriate failure mode of the instrumentation for the appropriate critical parameter in each case. Focus primarily on single failures of instrumentation; multiple instrumentation faults generally have a very low probability because of redundancy.

- Check the possible failure modes (high, low, as-is) to determine whether it is likely that the instrumentation would fail in such a way as to give incorrect information in the manner required to cause the error.
- For each possible failure mode, using the PIM, determine the possibility of the operators recognizing the instrumentation failure, using information on the redundancy of instrumentation, standard practices for checking alternate indications, procedural backup, etc.

Retain only those possibilities for which there is no obvious immediate backup to aid recovery from the incorrect information.
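The retention rule in Step 8 can be sketched as a predicate; the fields and the redundancy threshold below are illustrative assumptions, since the procedure itself leaves this judgment qualitative:

```python
def retain_instrument_fault(mode_matches_error: bool,
                            redundant_channels: int,
                            diverse_backup: bool,
                            checked_by_practice: bool) -> bool:
    """Retain a postulated single-instrument fault only if its failure mode
    (high/low/as-is) produces the misleading cue the error requires AND
    there is no obvious immediate backup: no redundant channel, no diverse
    indication, and no standard practice of cross-checking."""
    if not mode_matches_error:
        return False
    has_backup = redundant_channels >= 2 or diverse_backup or checked_by_practice
    return not has_backup
```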

9. Identify potential equipment failures that can produce misleading indications, as follows:

- identify the status of equipment as it should be in response to the initiating event, e.g., PORVs remain closed, safety injection (SI) pumps should not start, etc.;
- identify failures that result in violating these conditions and that affect the key symptom(s) for decision-making in such a way as to lead to an incorrect decision;
- identify other symptoms or indications that are triggered by the additional faults and that can provide an opportunity to correct the misinterpretation.

Retain those failures that can lead to misinterpretation, allow no immediate potential for recovery, and have a non-negligible probability of occurrence.

10. Assess the possibility of the operator not noticing, or misinterpreting, the discriminating information available from the alarms and indicators. To do this, list whether the following conditions exist for each initiator:

- work overload (typically the number of alarms, and the number of parameters to be monitored, for the initiator, on a relative basis);
- a perception of time urgency based on training (this can be determined by interviewing operators and asking them to rank different initiators with respect to an overall urgency factor; the rate of change of the parameter is one way of measuring this);
- whether information is supplied by an instrument that is known to be unreliable. This is a random hardware failure phenomenon, and the reliability of an instrument can be assessed as high or low for each of the important instruments. Persistent problems with instruments over a long period of time should be noted; this is a problem of desensitization to information;
- whether there are any negative human factors considerations, e.g., lack of clarity of information, remoteness of recovery equipment, lighting, etc.;
- whether the procedural instruction is unambiguous and clear;
- whether training has over- or under-emphasized the scenario.

One potential way of using this information is to perform a qualitative screening, using a rating scheme such as High, Moderate, Low. A 'High' rating on any of the first four factors, or a high negative rating on either of the last two, is translated into a 'High Likelihood' of ignoring confirming information. Two or more 'Moderate' ratings also translate into 'High Likelihood'; other combinations are screened out as 'Low Likelihood'.
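This combination rule can be written out directly. The function below is a sketch that treats a 'high negative' rating on the last two factors simply as a 'High' entry in the rating list:

```python
def likelihood_of_ignoring_cue(ratings):
    """Apply the qualitative Step 10 screening rule to six factor ratings
    ('High'/'Moderate'/'Low'), ordered as in the list above: any 'High'
    rating, or two or more 'Moderate' ratings, gives 'High Likelihood';
    all other combinations are screened out as 'Low Likelihood'."""
    if "High" in ratings or ratings.count("Moderate") >= 2:
        return "High Likelihood"
    return "Low Likelihood"
```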

For each alarm or indicator assessed as having a high likelihood of not being noticed by the operator, use the PIM to identify a recovery possibility, that is, the potential for the operator to eventually notice the information or learn about the plant condition through the other means listed in the PIM.

Assessment of likelihood

If an error cannot be screened out, it will be necessary to assess its likelihood. This is done judgementally in Step 11.

11. Consider the combined likelihood of the initiator and any of the conditions in Steps 8, 9 and 10. Return to Step 1 and reassess the likelihood. Again, if the combined likelihood is less than 1.0E-x/yr, the error may be screened out.

4 ANALYSIS FOR OPPORTUNITIES FOR LOCAL MISDIAGNOSES OR SLIPS

A local misdiagnosis as an error mode may have a greater potential for failing systems than does a slip, since a local misdiagnosis is associated with a particular step in a procedure, which could apply to all trains. Slips that impact the unavailability of individual components are assumed to be minor contributors to those unavailabilities, and it is the common cause potential of slip errors that is of primary interest. Therefore, in this analysis, the opportunities for the most important inappropriate actions are taken to be the same for both slips and local misdiagnoses. The assessment of the likelihood of such errors and of the potential for recovery is, however, different.

4.1 Identification of opportunities for and causes of inappropriate acts

1. For each initiating event group, list the functions appearing in the event tree and identify the set of success paths.

2. Identify possible human-induced failure modes of the first function on a given success path, e.g., initiate the system prematurely, terminate the system prematurely, create a diversion path, provide too much flow, provide too little flow.

3. Identify reasons for activating these failure modes by reviewing the procedure being followed, to identify steps requiring some action as a result of a 'result not obtained' statement, or a symptom-driven requirement to perform some action.

4. Each such step constitutes an opportunity for an error of commission if it satisfies the following:

(a) it changes the state of the system from that required,
(b) there is no automatic realignment,
(c) an override of an interlock is not necessary (the necessity to perform an override of an interlock is a powerful argument against the occurrence of a slip, though less so against a mistake).

4.2 Assessment of the potential for recovery from, or the likelihood of, local misdiagnosis

1. Develop a list of potential error mechanisms based on the critical procedural step:

- incorrect information error
- indicator fault
- misread instruction
- omission of a step in a procedure (this can cause an error of commission by omitting a precautionary action or a disabling action, for example).

2. Determine whether the error is irreversible; these are the most critical errors.

3. To assess the impact of potential immediate recovery or compensatory mechanisms, develop a System/Function Status Indicator list for each system which, according to the above steps, can be incorrectly operated. This is a list of the alarms and indicators that provide the operator with the parameter values used in the procedures to assess the status of the system or function. In each case identify:

- redundancy level of indications
- diverse indicators or alarms (e.g., alternative methods of verifying the status of the critical parameter)
- procedural guidance or standard practice on checking and verifying the functional status of the indicator or alarm
- persistence of the alarm.

4. For those errors that are reversible, using the information in Step 3, determine whether the operator may recover from the misdiagnosis of the functional status of the system. Recoverable errors may be screened out. If there are no immediate recovery possibilities, identify later recovery possibilities. This can be done by determining the plant or system response to the action and the availability of feedback information through indicators and alarms. Procedural steps should also be consulted for response guidelines. Errors for which there is an opportunity for recovery may be screened out.

4.3 Assessment of the potential for recovery from, or the likelihood of, slip (unintentional commission) errors

1. For the significant error opportunities, list the important Operator Action Points (OAPs) at the execution level (e.g., Open Valve x), identifying only the actions essential to success.

2. Review the control panel layout to identify the location of the switches associated with the action. Locate and identify other switches on the panel in the vicinity of those needed for the action, and determine their function, and the likelihood of confusion, based on similarity of layout, etc.

3. Determine the criticality of the associated functions in terms of their impact on the plant response to the accident. If an accidental change in the status of these systems has no impact, the case can be screened out.

4. For cases not screened out, postulate a slip error for each of the functions in the vicinity of the intended function on the control panel (e.g., open the wrong valve).

5. For each remaining case, list the possible system and plant response scenarios, and determine whether the error can be recovered, using Steps 2 and 3 of the local misdiagnosis analysis procedure. Recoverable error cases can be screened out.

4.4 Detailed impact analysis

For each case where the recovery of the function is not possible before there is an adverse impact on the system, the failure of the function or system may be added to the corresponding event tree or fault tree with an increased probability of failure due to operator error.

5 OVERVIEW OF THE RESULTS OF AN APPLICATION OF THE PROCEDURES

The following section summarizes the main general conclusions of the study in which these procedures were applied.

5.1 Global misdiagnosis

In the sense in which it is defined (Ref. 2), a global misdiagnosis is a misdiagnosis of the general class of accident, and is manifested by a transfer out of the E-0 procedure to an inappropriate procedure. The E-0 is essentially a diagnostic procedure that is entered as a result of a reactor trip, and serves to guide the operators into the specific procedures for the type of accident that has been 'diagnosed'. Following the procedure described in Section 3, on the basis of the screening criterion adopted, only the large LOCA was screened out on initiating event frequency.

The types of adverse consequences that were identified can be grouped into:

- degradation of decay heat removal by unnecessary isolation of a steam generator,
- creation of a LOCA by an inappropriate transfer to feed and bleed,
- non-performance of required actions.

The latter are effectively covered in the existing PSA, and their impact is included in the failure to respond in a timely manner. For the other two types, most potential inappropriate transfers can be screened out effectively for one or more of the reasons given below:

- committing the error would require multiple and diverse indications to be faulty or mistaken;
- multiple instrument failures are assumed to be low probability events. This is particularly true of safety function actuation signals, since each signal is generated by a 2-out-of-3 logic, each channel is continuously monitored for continuity, and mutual comparisons are made to identify potential failures;
- faults in multiple diverse indications are considered to be of very low probability;
- simultaneously mistaken indications are very low probability events if the indications are diverse. This is particularly true for the cases where indications are mistakenly assumed to be triggered, as these combinations represent the unexpected rather than the expected;
- the procedures themselves allow, and even direct, recovery into the correct procedure. For example, if E-2 were to be entered instead of E-1, then after isolation of the perceived faulted steam generator, E-2 leads to procedure E-1. E-2 also allows a diagnosis of a rupture of a steam generator and can lead to E-3A.
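The screening argument for the 2-out-of-3 actuation logic can be made concrete with a standard reliability calculation; the per-channel failure probability used below is an assumed, illustrative value, not a figure from the study:

```python
def prob_2oo3_defeated(p: float) -> float:
    """Probability that a 2-out-of-3 voting logic produces the wrong
    outcome, given independent channels each failing with probability p:
    exactly two channels fail, or all three fail."""
    return 3 * p**2 * (1 - p) + p**3

# With an assumed p = 1e-3 per demand, the logic is defeated with a
# probability of roughly 3e-6, i.e., orders of magnitude below p itself,
# which supports screening multiple-instrument faults as low probability.
```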

However, as the first E-2 example above shows, while it is possible to get back on track, it may be with a degraded function. For example, the decay heat removal function may be degraded by isolating a steam generator if E-2 has been followed unnecessarily.

Mistaking an SGTR for a steam line break (SLB) could in fact be significant, in that it would be possible, or even likely, to isolate the incorrect generator if the rising pressure in the ruptured generator were interpreted as falling pressure in the intact generator. However, some other signal, such as a radioactivity alarm, should occur on an SGTR but not on an SLB, making this mistake unlikely. Moreover, since a steam line break is no more likely than an SGTR, the radioactivity alarm is not likely to be ignored through a short cut in diagnosis.

The worst scenario from the point of view of consequence is mistaking a LOCA for an SGTR, since procedure E-3A would not allow for cooldown via ECA 3.1. However, the likelihood of this is minimized because the transfer to E-1 occurs before the transfer to E-3A; strict adherence to the procedures, i.e., no short cuts, would therefore prevent this.

The conclusion was reached that it was unnecessary to modify the existing PSA model to include global errors of commission.

5.2 Local misdiagnosis or slip

Two different types of adverse consequences were identified:


- failure of a system;
- degradation of a system.

The most significant errors are those that lead to irreversible failures. The detailed analysis performed resulted in the following conclusions.

• Slips have a higher potential for further degrading already degraded systems than for leading to failure of other, undegraded systems, because the controls for a system are grouped together on the control panels. Slips are highly unlikely to lead to failure of systems or components as a result of an intention to manipulate other systems or components, because of spatial separation. Spurious slips are ruled out because of the need to provide a confirmatory signal while effecting a change of state; changing the state of a component must thus be a deliberate act, so any error caused by a slip would need to arise out of some intention to act. In addition, the degree of redundancy in the plant significantly reduces the chance of causing the failure of complete systems.

• The most likely candidates for a significant slip error are associated with the electrical power system and, in the case of degradation of one train, the steam generators. In the case of the electrical power system, the possibility of creating irreversible damage is greater than that for the SGs, where an inappropriate isolation is reversible.

• The significance of mistakes is considerably lessened because of the redundancy and diversity of the instrumentation. The highest likelihood scenario was identified as the isolation of an intact steam generator.

5.3 Summary

The overall conclusion of applying the analytical procedures described here to an operating plant is that, for that plant, there are few opportunities for significant errors of commission, for the following reasons:

(a) the procedures are forgiving and to a large extent self-correcting, allowing opportunities for the operators to recover from a global misdiagnosis and re-enter the correct procedure. In addition, there is a re-diagnosis procedure to be used when the plant does not appear to be responding as expected. However, if an error were made such that E-3A was entered instead of E-1, then there is a potential for a non-recoverable omission, in that in E-3A there are no instructions to proceed to cool down;

(b) the instrumentation is redundant and highly reliable. It has a self-checking capability which annunciates when one of three channels is in disagreement;

(c) it is the practice of the operating crews to check subsidiary, confirmatory indications, and this is an additional factor that has been taken into account in reaching our conclusions. This is important for screening out local errors of commission;

(d) the layout of the control boards, and the way in which manipulations are performed, are significant factors militating against slips as sources of errors.

In terms of consequences, perhaps the most significant effects would be the loss, or increased degradation, of a function; an example is the isolation of the wrong steam generator. Events representing unavailabilities of equipment, of which these errors are an additional failure cause, already existed in the PSA model, so the model itself did not need to be restructured to account for errors of commission, and no new accident sequences were identified.

6 CONCLUSION

This paper has presented an analytical procedure for the identification of human failure events, for inclusion in a PSA model, that represent the consequences of significant errors of commission associated with operating crew responses to equipment malfunctions, transients, and accidents at a nuclear power plant. In the context of a PSA model, an error of commission is defined as one that leads to the failure of a system or function required to mitigate an accident, or to an inappropriate actuation of a system or function. The procedure is based on an assumption of rational behavior on the part of the operating crews, and on the plant being operated with Emergency Operating Procedures based on the Westinghouse Emergency Response Guidelines. Because of this, the procedure has focused on the operator/information interface and the operator/procedure interface as the main sources of potential problems. The investigation of the appropriateness, or correctness, of the procedures themselves has not been addressed.

For analysis purposes, errors of commission were addressed for three high-level error modes: global errors of commission, manifested by the selection of an incorrect procedure; local misdiagnoses, manifested by a misinterpretation of the status of a piece of equipment; and slips. To analyze the potential for errors of commission, it is necessary to understand the reasons why, and under what conditions, errors are made. The procedure that has been developed here is based on a relatively simple model of possible error causes and mechanisms, derived from some current models of human behavior. While it is relatively complete with respect to the identification of the scenarios in which operator intervention is


required, the procedure for the identification of the signatures of error-prone situations, or what others have called the error forcing context (Ref. 7), could be made more complete by establishing a more comprehensive set of error modes and error cause descriptions (Ref. 4). In particular, the search for the subtle causes of error resulting from the dynamic aspects of accident development, which cannot be easily captured in PSA scenario descriptions, should be improved.

The procedure was developed in an iterative manner, and the experience gained by applying it to a specific plant was instrumental to its success. A brief overview of the results of an application of this procedure to an operating plant has been presented in Section 5. For the particular plant to which the procedure was applied, the conclusion was that, because of the nature of the procedures, the high degree of redundancy in the instrumentation, the operating practices, and the control board layouts, the potential for significant errors of commission is low. The application is a time-consuming exercise, particularly because it is necessary to pay attention to the details of the information interfaces in a way that has until now not been a major concern in PSAs.

ACKNOWLEDGEMENTS

The authors are indebted to Mario van der Borst of N.V. Elektriciteits-Produktiemaatschappij Zuid-Nederland (EPZ) for his support in the performance of the work described in this paper.

REFERENCES

1. Swain, A.D., & Guttman, H., Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications, NUREG/CR-1278, US Nuclear Regulatory Commission, Washington, DC, 1983.

2. Macwan, A. & Mosleh, A., A methodology for modeling operator errors of commission in Probabilistic Risk Assessment. Reliab. Engng & System Safety, 45 (1994) 139-157.

3. Fullwood, R. R., & Hall, R. E., Probabilistic Risk Assessment in Nuclear Power Industry; Fundamentals and Applications, Pergamon, Oxford, 1988.

4. Parry, G. W., Suggestions for an improved HRA method for use in Probabilistic Safety Assessments. Reliab. Engng & System Safety, 49 (1995) 1-12.

5. Dougherty, E., Context and human reliability analysis. Reliab. Engng & System Safety, 41 (1993) 25-47.

6. Hollnagel, E. & Cacciabue, P. C., Reliability of cognition, context, and data for a second generation HRA. In Proc. Int. Conf. Probabilistic Safety Assessment and Management, San Diego, California, 20-25 March 1994.

7. Barriere, M., Luckas, W., Cooper, S., Wreathall, J., Bley, D., Ramey-Smith, A. & Thompson, C., Multidisciplinary framework for analyzing errors of commission and dependencies in Human Reliability Analysis. Presentation at the Water Reactor Safety Meeting, Rockville, MD., 24-26 October 1994.

8. Weston, R.C.W., Human factors in air traffic control. In Pilot Error, (eds Hurst, R. & Hurst, L.), Macmillan, NY, 1982, pp. 118-135.

9. Davis, D. R., Human error and transport accidents. Ergonomics, 2 (1958) 26-33.

10. Reason, J., Human Error, Cambridge University Press, Cambridge, UK, 1990.