



IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 38, NO. 5, SEPTEMBER/OCTOBER 2012, PAGES 1178–1194

Reasoning About The Reliability Of Diverse Two-Channel Systems In Which One Channel Is “Possibly Perfect”

Bev Littlewood, Member, IEEE Computer Society and John Rushby, Member, IEEE Computer Society


Abstract—This paper refines and extends an earlier one by the first author [1]. It considers the problem of reasoning about the reliability of fault-tolerant systems with two “channels” (i.e., components) of which one, A, because it is conventionally engineered and presumed to contain faults, supports only a claim of reliability, while the other, B, by virtue of extreme simplicity and extensive analysis, supports a plausible claim of “perfection.”

We begin with the case where either channel can bring the system to a safe state. The reasoning about system probability of failure on demand (pfd) is divided into two steps. The first concerns aleatory uncertainty about (i) whether channel A will fail on a randomly selected demand and (ii) whether channel B is imperfect. It is shown that, conditional upon knowing pA (the probability that A fails on a randomly selected demand) and pB (the probability that channel B is imperfect), a conservative bound on the probability that the system fails on a randomly selected demand is simply pA × pB. That is, there is conditional independence between the events “A fails” and “B is imperfect.” The second step of the reasoning involves epistemic uncertainty represented by assessors’ beliefs about the distribution of (pA, pB) and it is here that dependence may arise. However, we show that under quite plausible assumptions, a conservative bound on system pfd can be constructed from point estimates for just three parameters. We discuss the feasibility of establishing credible estimates for these parameters.

We extend our analysis from faults of omission to those of commission, and then combine these to yield an analysis for monitored architectures of a kind proposed for aircraft.

1 BACKGROUND

THIS paper is about the assessment of dependability for fault tolerant software-based systems that employ two design-diverse channels. This type of system architecture is mainly used to provide protection against design faults in safety-critical applications. The paper clarifies and greatly extends an earlier note by one of the authors on a specific instance of the topic [1].

The use of intellectual diversity to improve the dependability of processes is ubiquitous in human activity: witness the old saws “Two heads are better than one” and “Belt and braces” (or suspenders for North American readers). We all believe that having a colleague review our work—“two heads”—is better than conducting the review ourselves, and we generally consider it prudent to have a backup for complicated machines or plans.

Manuscript received 27 January, 2010; revised 15 December, 2010 and 16 May, 2011.
B. Littlewood is with the Centre for Software Reliability, City University, London, UK.
J. Rushby is with the Computer Science Laboratory, SRI International, Menlo Park CA, USA.

The use of diversity to build reliable systems is widespread in non-software-based engineering. For example, in the nuclear industry it is common to have multi-channel protection systems that are based on physically different process variables (e.g., temperature, pressure, flow-rates). The simple idea underlying these applications of diversity is that, whilst it is inevitable that humans make mistakes in designing and building systems, and that these mistakes can cause failures during operation, forcing the systems to be built differently might make the failure processes of different versions be diverse also. In other words, you might expect that even if channel A fails on a particular demand, there is a good chance that channel B will not fail on that demand.

These ideas have been applied to software systems for nearly thirty years, but there has been some controversy about the benefits that the approach brings. On the one hand, there are reports of successful industrial applications of software design diversity: for example, the safety-critical flight control systems of the Airbus fleets [2] have now experienced massive operational exposure [3] with apparently no critical failure where diversity has been employed.¹ On the other hand, successive carefully conducted experiments—see e.g., [5], [6]—have shown that multiple diverse channels cannot be assumed to fail independently. Thus, if the two channels of a system each have a probability of failure on demand (pfd) of 10−3, it would be prima facie wrong to claim a pfd of 10−6 for the system. In fact, there will generally be positive dependence of failures between the two channels, and so the pfd will be greater than 10−6.

1. An accident involving an A330 on 7 October 2008 near Learmonth, Australia, was due to a poor algorithm, implemented in software, for spike rejection in angle-of-attack data [4]; the software has since been revised. This software does not have a diverse backup, so its failure does not vitiate our point. However, it does highlight that sensor processing can be a single point of failure in otherwise diverse implementations; we will return to this point later.


Some insight into these experimental results came from probabilistic models of diversity [7], [8]. This work gave careful attention to the different notions of “independence” that had hitherto been treated somewhat informally, and showed why versions that had been developed “independently” could not necessarily be expected to fail “independently.” The key idea in this work is a notion of “difficulty” variation over the demand space—informally, some demands are intrinsically more difficult to handle correctly than others. Thus, if channel A fails on a randomly selected demand, this increases the likelihood that the demand is a “difficult” one and hence increases the likelihood that B also will fail. That is, the conditional probability of failure of B, given that A has failed, is greater than its marginal probability of failure, pB, and the probability of system failure will be greater than the product of the marginal probabilities pA × pB.
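As a rough illustration of this “difficulty variation” effect (a sketch only, not taken from [7], [8]), the following Python fragment assumes, hypothetically, that each demand has a difficulty θ and that two independently developed channels each fail on that demand with probability θ; variation in θ alone then makes the probability that both fail exceed the product of the marginal pfds.

```python
# Illustrative sketch (hypothetical numbers): difficulty variation induces positive
# failure dependence even when the channels fail independently given each demand.
import random

random.seed(1)
N = 200_000
# Hypothetical difficulty distribution: most demands are easy, a few are hard.
thetas = [0.0001 if random.random() < 0.99 else 0.01 for _ in range(N)]

p_marginal = sum(thetas) / N              # marginal pfd of each channel, E[theta]
p_both = sum(t * t for t in thetas) / N   # P(both fail) = E[theta^2]

print(f"pA = pB      ~ {p_marginal:.2e}")
print(f"pA * pB      ~ {p_marginal**2:.2e}")
print(f"P(both fail) ~ {p_both:.2e}  (> pA * pB because Var(theta) > 0)")
```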

The picture that emerged from this evidence—industrial experience, experiments, and theoretical modeling—has sometimes been taken to undermine claims for the efficacy of software diversity. Whilst the industrial evidence is positive, it inevitably emerges only after years of operating experience, and even then does not easily allow quantitative claims for achieved reliability. The experimental and modeling evidence, on the other hand, has been taken to be negative.

In fact, this negative view seems somewhat unfair. Although the experiments and models show that assumptions of independence of failures cannot be trusted, this does not mean that there cannot be useful benefits to be gained from design diversity. Ruling out independence means that we cannot simply multiply the pfds of the separate channels to obtain the system pfd, but there might still be considerable reliability benefits. The modeling results of Littlewood and Miller [8] actually show that there will be benefits under quite plausible assumptions, but they do not help quantify these benefits for particular instances; the experiments of Knight and Leveson [9] show that the benefits can be considerable on average, but again their results do not help to assess a particular system’s reliability.

The position for several years, then, has been this: there is ample evidence that design diversity can “work,” in some average sense, in delivering improved reliability, but there is very little evidence as to how much improvement it delivers even on average,² and the important problem of how to estimate the reliability of a particular design-diverse system remains very difficult.

2. This notion of “on average” efficacy is similar to that used to support claims for other software engineering processes. We know that certain software engineering processes—e.g., sound static analysis—can completely eliminate certain fault classes, and that they and other methods such as various kinds of testing are effective in improving the resulting reliability of the systems to which they are applied [10]. But we usually know very little about how much they deliver even on average, and generally nothing at all about what contribution they make to a particular system’s quality (because the particular can be very different from the average). This means that knowledge about the software engineering processes used to build a system is of limited use in estimating its operational reliability.

This paper concerns this last problem, and it begins with a rather special class of architectures that appear to bring a simplification to the assessment problem. This work was prompted by one author’s involvement in the early 1990s with the difficulties of assessing the reliability of the protection system of the UK’s Sizewell B nuclear plant. This system comprises the Primary Protection System (PPS), which is software-based, and the Secondary Protection System (SPS), which is hard-wired, in a “one-out-of-two” (sometimes written 1oo2) configuration—that is, if either of the systems requests it, the reactor “trips” and enters a safe state.

The PPS provides quite extensive and desirable functionality, over and above that required for a simple reactor trip; for example, it has extensive built-in hardware self-testing and redundancy management features to maximize its availability. The result is that, for a safety-critical system, it is quite complex with about 100,000 lines of code. For this reason it was not thought plausible to make claims for absolute correctness of the software in the PPS; instead, it was required to have a pfd no worse than 10−3.

The SPS, on the other hand, has much more restricted functionality, and as a consequence is much simpler (and does not depend upon software for its correct operation). Because of this simplicity of design, it might have been possible to claim that the SPS is free of design faults, and thus could be deemed free of failures in operation from such causes.³ Of course, regulators might have justifiable skepticism when faced with such a claim of “perfection”: they might be prepared to accept a high confidence in perfection, given extensive suitable supporting evidence, but not accept certainty. It might therefore be plausible to contemplate attaching a probability to the claim of perfection for the SPS and then to combine this in some way with the reliability claimed for the PPS to yield a quantifiable claim for the overall system. This is the key idea in the work reported here; we think it is new.

Asymmetric fault tolerant architectures (i.e., those in which one channel is claimed to be reliable and the other is possibly perfect) are intuitively plausible for other safety-critical systems in addition to nuclear shutdown. For example, airplanes sometimes employ simple “monitors” that check the operation of complex subsystems and trigger higher-level fault recovery when they detect problems, while some heart pacemakers and defibrillators have a purely hardware “safety core” that provides backup to a more complex firmware-driven primary system. Whilst these approaches are recognized as attractive ways for achieving reliability and safety, it is less widely appreciated that the asymmetry of the architecture can also aid in assuring that the system is sufficiently reliable (or safe), if we can exploit the possibility of perfection for the second channel.

3. Of course, the SPS could fail as a result of hardware component failures—as could the PPS. But there are good reasons to believe that such contributions to unreliability are better understood and are small compared to those arising from design faults. For simplicity here and later in this paper we shall ignore hardware failures, but it is easy to extend the modeling to include them.

In Section 2 of the paper this idea will be examined in some detail. For clarity, the account will be conducted in terms of a 1oo2 demand-based system, such as the Sizewell protection system, but (as seems likely now, even for these nuclear systems) both channels will be assumed to be software-based.

A 1oo2 architecture reduces the likelihood of failures of omission, but may raise those of commission (i.e., uncommanded activations). Section 3 considers these failures of commission, and Section 4 combines the previous results and applies them to “monitored architectures” in which the behavior of a complex operational channel is monitored by a simpler channel that can trigger higher-level fault recovery when safety properties are violated. Architectures of this kind are among those recommended for commercial airplanes [11]. Section 5 presents our conclusions.

2 REASONING ABOUT A ONE-OUT-OF-TWO SYSTEM WITH A POSSIBLY PERFECT CHANNEL

The first part of the reasoning here is similar to that used in the earlier paper [1]. The main novelty is that the present paper establishes conditional independence of failure of one channel and imperfection of the other⁴ and, in the second part of this section, introduces a proper modeling of the epistemic uncertainty about the parameters of the original work. In giving presentations of this work, we have found that some are surprised by, and others are skeptical of, our results; we attempt to allay concern by presenting our arguments in some detail.

We begin by outlining our notion of failure; this is described in more detail in Section 2.3 where we examine the estimation of model parameters, but it is useful to narrow the interpretation of the term before we proceed.

Our context is a critical system where the events that constitute a critical failure are defined by regulation (e.g., “inability to continue safe flight and landing” [12]) or as the top-level claims of an assurance or safety case [13], [14]. Through design iterations interleaved with hazard analysis, fault-tree analysis, and other systems engineering processes, we arrive at a level of design that identifies a subsystem to be implemented as a 1oo2 architecture. The process of system decomposition and safety analysis will have identified the critical properties of this subsystem, and will have allocated a reliability requirement to it. Our focus is on this subsystem and in our analysis we take its critical properties (whose violations constitute its failures) and its reliability goal as given, but we assume that these have been chosen appropriately for its role in the overall system. This generally means that the critical properties are directly derived from the system safety case, rather than stated in terms of the detailed requirements for the subsystem—because the latter are often wrong (we will see an example of this in Section 4). In our aleatory analysis, we assume that the critical properties are “correct” so that their violations correspond exactly to (all and only) the failures of the system. Of course, it may be that the critical properties actually documented in the safety case are not “correct” and some event that is apprehended after the fact as a failure does not violate the documented properties. There is therefore epistemic uncertainty about the correctness of the critical properties, and we will account for this, but we reiterate that in the aleatory analysis, the critical properties are correct, or “what they should have been,” and failures are violations of these properties.

4. Two events X and Y are conditionally independent given some third event Z when P(X and Y | Z) = P(X | Z) × P(Y | Z); we say the independence of X and Y is conditional on Z. Z may be the event that some model parameter takes a known value. Unconditional independence is P(X and Y) = P(X) × P(Y) and is not, in general, implied by conditional independence.

We can now focus on our object of study, which is a 1oo2 demand-based (sub)system comprising two channels, A and B. Channel A is assumed to have sufficient functionality, and thus complexity, for it to be infeasible to claim it to be fault-free. We thus have to assume that in operation it will eventually fail, and the criterion of its acceptability (when judged alone) will be that it fails infrequently—that is, its probability of failure on a randomly selected demand, pfdA, is acceptably small. Failure here refers to violation of the critical properties allocated to the 1oo2 subsystem in the system decomposition described earlier. The kinds of evidence that could be used by assessors to gain confidence that pfdA is small include: seeing lots of operational testing without failure, high quality of developers and development process, no faults found in extensive static analysis, etc.

Channel B, on the other hand, has been kept very simple (its functionality is limited and its implementation straightforward) and thus is open to a claim of “perfection.” This word has been chosen deliberately for its convenience, but in the knowledge that it comes with extensive baggage. In this context it means just that this channel of software will not fail however much operational exposure it receives; it is “fault-free” if we assume that failures can and will occur if and only if faults are present. “Perfection” is preferred to “correctness” because the latter is judged with respect to requirements that are often quite detailed, whereas we reserve “perfection” for just the absence of the critical failures. There is a possibility that this channel is not perfect, and so we speak of it as “possibly perfect.” We use “possibly” perfect rather than other adjectives such as “plausibly” perfect because we will wish to refine our assessment as evidence becomes available: thus we begin from the neutral stance that the channel’s perfection is “possible,” and may revise that to “plausible,” or even “likely,” as supporting evidence becomes available. In our formal treatment, we use the neutral term “possibly” perfect and employ probability to express the “degree” of perfection (or, rather, imperfection).

In particular, we will henceforth use pnpB to denote the probability that the channel B is not perfect. Evidence allowing assessors to have confidence that the probability pnpB is small could include: a formal specification and high confidence that this really does accurately capture the critical properties, formal proof that the software implementation is correct with respect to this specification, and absence of failures in test, etc.

We will often speak of A and B as “software” because it is expected that the bulk of their implementation will be realized in this way, but it is important to note that our assessments of reliability and perfection refer to the totality of their design and implementation, and flaws therein, not just the parts that are explicitly software. It is also important that the 1oo2 system is pared down to just the two channels, with a minimum of coordinating mechanism, because we wish to derive the reliability of the system from properties of the channels, and this will be invalid if the subsystem contains a lot of additional mechanism. (We do account for the coordinating mechanism in Section 4.)

There are two kinds of uncertainty involved in reasoning about the reliability of the system. The first is aleatory uncertainty, or “uncertainty in the world,” the second is epistemic uncertainty, or “uncertainty about the world” [15]. Aleatory uncertainty can be thought of as “natural” uncertainty that is irreducible. For example, if we were to toss a coin we would not be able to predict with certainty whether it would fall as “heads” or as “tails.” This uncertainty cannot be eliminated, and any predictions about future tosses of the coin can only be expressed as probabilities. This is true even when we know the probability of heads, which we denote pH, on a single toss of the coin, such as when the coin is “fair” and pH = 0.5. Real coins, of course, may not be fair, so there will be uncertainty about the value of the parameter pH. This is epistemic uncertainty, and it is reducible: in the case of the coin, we could toss it very many times and observe the frequency of heads in the sequence of tosses. This frequency is an estimate of pH and the estimate gets closer to the true value of pH as the number of observed tosses increases (there are limit theorems in probability that detail the nature of this convergence), so the uncertainty reduces.
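A minimal sketch of the coin-toss example (the “true” value of pH below is hypothetical and, in practice, unknown to the observer): the aleatory uncertainty remains on every toss, while the epistemic estimate of pH improves with observation.

```python
# Sketch: epistemic uncertainty about the aleatory parameter pH shrinks with evidence.
import random

random.seed(0)
pH = 0.37  # hypothetical "true" aleatory parameter, unknown to the observer

for n in (10, 1000, 100000):
    heads = sum(random.random() < pH for _ in range(n))
    print(f"after {n:6d} tosses, estimate of pH = {heads / n:.3f}")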

In much scientific modeling, the aleatory uncertainty is captured conditionally in a model with unknown parameters, and the epistemic uncertainty centers upon the values of these parameters—as in the simple example of coin tossing.⁵ For our 1oo2 system the two probabilities, pfdA and pnpB, are parameters that capture two types of aleatory uncertainty. For a randomly selected demand, Channel A either does, or does not, fail; pfdA is just the chance that it does fail. As in the case of a coin toss, this chance can be given a classical frequentist interpretation, as the proportion of failures in a hypothetical infinitely large sequence of independently selected demands.

5. The situation can be more complicated than this, of course. There may be uncertainty about which of several models is “correct,” and this uncertainty may not be captured by a parametric model.

Similarly, Channel B either is, or is not, perfect; pnpB is just the chance that it is not perfect. It is a little harder to give a frequentist “in the world” interpretation to this probability. One approach, which follows the spirit of [7], [8], is to think of the program for B as a random selection from the set of all programs, with the probability distribution of the selection being determined by the problem to be solved and the character of the software development and assurance processes employed. Each program can be assigned an indicator variable that takes the value 0 if the program is perfect for B’s problem and 1 if it is not; pnpB is then the expected value of the indicator variable over the selection distribution. Just as the nature of the problem and the character of the software development and assurance processes determine the aleatory selection distribution, so information about these will later influence the assessors’ judgment about the epistemic uncertainty concerning pnpB.
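The following fragment is only a sketch of this frequentist reading: the hypothetical population and its selection weights stand in for the distribution induced by B’s problem and its development and assurance processes, and pnpB is the expectation of the imperfection indicator over that distribution.

```python
# Sketch (hypothetical weights): pnpB as the expected value of an imperfection
# indicator over a notional population of programs the process might produce.
import random

random.seed(2)
population = [0] * 98 + [1] * 2            # assume 2% of candidate programs are imperfect
draws = [random.choice(population) for _ in range(200_000)]

pnpB = sum(draws) / len(draws)             # expectation of the indicator variable
print(f"pnpB ~ {pnpB:.3f}")                # ~0.02 under these assumed weights
```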

A result that we shall prove in Section 2.1 is that these two parameters are sufficient to describe the uncertainty about the system probability of failure. Specifically, conditional on knowing the actual values pA and pB of pfdA and pnpB, respectively, the system probability of failure is no worse than pA × pB. The epistemic uncertainty here solely concerns the values of pA and pB: assessors will not know these values with certainty. Their beliefs about these values—and here we adopt a Bayesian subjective interpretation of probabilities—will be represented by a joint distribution

F(pA, pB) = P(pfdA < pA, pnpB < pB)    (1)

and their (conservative) probability for system failure will then be

∫_{0≤pA≤1, 0≤pB≤1} pA × pB dF(pA, pB).    (2)

Generally, assessors’ beliefs about (pA, pB) will not be independent: that is, the bivariate distribution (1) does not factorize into a product of the marginal distributions of the random variables pfdA and pnpB and so it is not straightforward to derive a numerical value for (2). This is an important issue which is examined in Section 2.2.
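To make (2) concrete, here is a sketch only of how it could be evaluated by Monte Carlo once a joint distribution F(pA, pB) has been elicited; the particular positively associated belief distribution used here (a shared “quality” factor influencing both parameters) is purely hypothetical.

```python
# Sketch: Monte Carlo evaluation of the epistemic integral (2) under an assumed,
# positively associated joint distribution of (pA, pB).
import random

random.seed(3)

def sample_beliefs():
    # Hypothetical joint beliefs: a shared quality factor q affects both parameters.
    q = random.gauss(0.0, 1.0)
    pA = min(1.0, 10 ** (-3.0 + 0.5 * q + 0.3 * random.gauss(0.0, 1.0)))
    pB = min(1.0, 10 ** (-2.0 + 0.5 * q + 0.3 * random.gauss(0.0, 1.0)))
    return pA, pB

samples = [sample_beliefs() for _ in range(200_000)]
bound = sum(pA * pB for pA, pB in samples) / len(samples)       # estimate of (2)
mean_pA = sum(pA for pA, _ in samples) / len(samples)
mean_pB = sum(pB for _, pB in samples) / len(samples)

print(f"E[pA * pB]    ~ {bound:.2e}")
print(f"E[pA] * E[pB] ~ {mean_pA * mean_pB:.2e}  (smaller here: beliefs are positively associated)")
```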

Our reasoning about the reliability of the system now proceeds in two stages, the first involving the aleatory uncertainty, the second the epistemic uncertainty.

2.1 Aleatory Uncertainty

We shall assume in what follows that our interest centers upon the system reliability expressed as the probability of its failure upon an (operationally) randomly selected demand. Note that other things might be of interest, for instance the probability of surviving a specified number of demands (e.g., the number expected to be encountered during a specific mission, or during the lifetime of the system), but we shall not consider these here.

For aleatory uncertainty we have, by the formula for total probability, the following conditional pfd, where we use al ps as an abbreviation for the aleatory parameterization pfdA = pA, pnpB = pB wherever this is necessary to keep formulas within a column.

P(system fails on randomly selected demand | al ps)    (3)
    = P(system fails | A fails, B imperfect, al ps) × P(A fails, B imperfect | al ps)
    + P(system fails | A succeeds, B imperfect, al ps) × P(A succeeds, B imperfect | al ps)
    + P(system fails | A fails, B perfect, al ps) × P(A fails, B perfect | al ps)
    + P(system fails | A succeeds, B perfect, al ps) × P(A succeeds, B perfect | al ps).

The last three terms of the right hand side of (3) are zero: the system does not fail if either A succeeds or B is perfect. We shall assume conservatively that if B is not perfect, then it will fail with certainty whenever A fails, as in the original version of this work [1]: B brings no benefit if it is not perfect. This (very) conservative assumption is important in the following reasoning where, in the first term on the right hand side of (3), the first factor in the product is replaced by 1, to yield:

P(system fails on randomly selected demand | al ps)
    ≤ P(A fails, B imperfect | pfdA = pA, pnpB = pB)
    = P(A fails | B imperfect, pfdA = pA, pnpB = pB)    (4)
        × P(B imperfect | pfdA = pA, pnpB = pB).

Now, the fact that B is imperfect tells us nothing about whether or not A will fail on this demand. So we have

P(A fails | B imperfect, pfdA = pA, pnpB = pB)    (5)
    = P(A fails | pfdA = pA, pnpB = pB)
    = P(A fails | pfdA = pA)

and so, substituting (5) in (4),

P(system fails on randomly selected demand | al ps)
    ≤ P(A fails | B imperfect, pfdA = pA, pnpB = pB) × P(B imperfect | pfdA = pA, pnpB = pB)
    = P(A fails | pfdA = pA) × P(B imperfect | pnpB = pB)
    = pA × pB.    (6)

This argument follows that in [1] quite closely; the difference is that here we have shown that there is conditional independence between the events “A fails” and “B imperfect” when pfdA and pnpB are known.

The importance of this independence result can be seen by contrasting the situation here with the more usual 1oo2 system, where both channels are sufficiently complex that neither is open to a believable claim of perfection. In the latter case, we cannot apply the reasoning in step (5) and so it is not possible to claim independence of failures in the two channels, even conditionally given pfdA and pfdB (in an obvious notation). Instead, even at this stage of reasoning where we are concerned only with aleatory uncertainty, we need to deal with dependence of failures. Thus, it is shown in [8], for example, that the conditional probability of such a system failing on a randomly selected demand is

pfdA × pfdB + Cov(θA, θB)

where θA, θB are the difficulty function random variables for the two channels. So assessors’ beliefs about pfdA and pfdB are not alone sufficient for them to reason about the reliability of the system: they also need to know the covariance of the difficulty functions. And this is before we consider the problem of epistemic uncertainty concerning parameters such as pfdA, pfdB, and Cov(θA, θB), which would involve further dependencies.
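As a numerical aside with purely hypothetical figures, even a modest covariance between the difficulty functions can dominate the naive product of the two pfds:

```python
# Hypothetical numbers only: conditional system pfd for two reliable channels,
# pfdA * pfdB + Cov(thetaA, thetaB), per the formula quoted above from [8].
pfdA = 1e-3
pfdB = 1e-3
cov_theta = 1e-5                      # assumed covariance of the difficulty functions

naive_product = pfdA * pfdB
system_pfd = naive_product + cov_theta
print(f"naive product  : {naive_product:.1e}")
print(f"with covariance: {system_pfd:.2e}   # an order of magnitude worse here")
```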

In contrast, (4–6) above show that knowledge of the values of pfdA and pnpB is sufficient for a complete description of aleatory uncertainty via the (conservative) conditional independence expression for the system pfd, (6).

Lest it seem that a particular choice was needed at equation (4), it is worth observing that this could equally have been written as

P(system fails on randomly selected demand | al ps)
    ≤ P(A fails, B imperfect | pfdA = pA, pnpB = pB)
    = P(B imperfect | A fails, pfdA = pA, pnpB = pB)    (7)
        × P(A fails | pfdA = pA, pnpB = pB),

where the conditioning is reversed in (7). Here, the failure of A on this demand tells us nothing about the imperfection of B, since this is a global property with known probability pB, and so we have

P(system fails on randomly selected demand | al ps)
    ≤ P(B imperfect | A fails, pfdA = pA, pnpB = pB) × P(A fails | pfdA = pA, pnpB = pB)
    = P(B imperfect | pnpB = pB) × P(A fails | pfdA = pA)
    = pB × pA

as before.

The attributes of “perfection” that enable the reasoning employed at (4) and (7) are: first, that its satisfaction guarantees its channel does not fail on a given demand (this is needed to eliminate the last two terms in (3)); second, that it is a global property (i.e., independent of this demand, which is needed in (4) and (7)); and third, that it is aleatory (i.e., its uncertainty is irreducible).

Other global properties, such as “has a valid proof of correctness” (whose negation would be “has a flawed proof of correctness”), appear plausible but are not aleatory. Proof of correctness is relative to a specification, and it is possible this specification is flawed. Thus, absence of failure on a given demand (needed for the first point above) is contingent on correctness of the specification, which is an additional parameter and a reducible source of uncertainty.

Hence “perfection” as we have defined it is precisely the property required for this derivation. Properties such as “has a valid proof of correctness” are important in assigning an epistemic value to the probability of perfection and are discussed in Section 2.3. First, though, we examine the basic problem of epistemic uncertainty about the model parameters.

2.2 Epistemic Uncertainty and the Structure of F(pA, pB)

The previous section has derived a conservative bound for the probability of failure on demand for a 1oo2 system, conditional on knowledge of the values of the random variables pfdA and pnpB. In practice, of course, pfdA and pnpB will not be known with certainty, and this is where the epistemic uncertainty comes in, via the assessors’ posterior beliefs represented by the distribution, (1). “Posterior” here means that this distribution takes account of all evidence the assessors have acquired about the unknown parameters.

Practical interest centers upon the unconditional probability of system failure on a randomly selected demand. By the formula for total probability, and using the Riemann-Stieltjes integral, this is

P(system fails on randomly selected demand)
    = ∫_{0≤pA≤1, 0≤pB≤1} P(system fails . . . | al ps) dF(pA, pB)

so, substituting (6),

    ≤ ∫_{0≤pA≤1, 0≤pB≤1} pA × pB dF(pA, pB),    (8)

which provides the formal derivation for (2).

If the assessors’ beliefs about pfdA and pnpB are independent, so that F(pA, pB) factorizes as F(pA) × F(pB), then (8) becomes

∫_{0≤pA≤1} pA dF(pA) × ∫_{0≤pB≤1} pB dF(pB)    (9)
    = P(A fails) × P(B imperfect)
    = PA × PB.

That is, the bound on the probability of system failure is just the product of the assessors’ posterior probabilities of the events “A fails” and “B imperfect” (i.e., the means of the distributions for pfdA and pnpB, respectively, taking account of any experimental and analytical evidence that may be available). In this case the reliability assessment is particularly simple—it involves only the assessment of two parameters.
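A worked sketch of the independent case, under the assumption (ours, not the paper’s) that each belief is summarized by a Beta posterior with hypothetical evidence counts: the bound (9) is then just the product of the two posterior means.

```python
# Sketch with assumed Beta posteriors: bound (9) as the product of posterior means.
aA, bA = 1, 1 + 5000   # Beta(1, 1) prior updated with 5000 hypothetical failure-free demands for A
aB, bB = 1, 1 + 99     # hypothetical posterior giving roughly 1% probability of imperfection for B

PA = aA / (aA + bA)    # posterior mean of pfdA
PB = aB / (aB + bB)    # posterior mean of pnpB
print(f"PA ~ {PA:.2e}, PB ~ {PB:.2e}, bound PA * PB ~ {PA * PB:.2e}")
```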

However, if there is positive association between assessors’ beliefs about pfdA and pnpB, then (8) will be larger than (9) and reliability assessment cannot be reduced to the simple product of two parameters. Readers may think that this is the situation that will apply in most real-life cases.

The two-stage reasoning—aleatory uncertainty followed by epistemic uncertainty—makes clear the source of any dependence here: it can arise only from the epistemic uncertainty, via the posterior distribution F(pA, pB). Even though the events “A fails” and “B is imperfect” are conditionally independent at the aleatory level, assessors’ beliefs about the probabilities of these events may not be independent. For example, if assessors see A fail in testing, they may wonder if the developers have understood the requirements correctly, and this may affect their estimate of the probable perfection of B; this does not happen at the aleatory level, where the probabilities of failure and of imperfection are parameters to the model.

Assessors whose beliefs about pfdA and pnpB are not independent are faced with the problem of expressing their beliefs as a complete bivariate distribution. Elicitation of experts’ subjective probabilities is an active research area in Bayesian statistics, and there have been considerable advances in recent years in techniques and tools: see [16] for a good introduction to this work. We shall not consider this general elicitation problem further here, since for a particular system the elicitation process will depend upon the exact details of the application. Instead, we propose a simplification that will ease this elicitation task in general. Our approach derives from a conservative treatment of common-cause faults in safety cases and is described in the following section.

2.3 Estimation of Model Parameters

One reason—indeed, we will argue, the only sound reason—that assessors’ beliefs about the parameters pA and pB might not be independent is concern about higher-level faults that could be common to both channels. Common-cause faults are a major issue in the safety assessment of any multi-channel system, so we begin by describing the general context of safety assessment, and the treatment of common-cause faults. We then show how epistemic assessment of the parameters for a 1oo2 system can be performed within this context.

The decision to deploy a safety-critical or other kind of critical system is based on careful assessment of risks, estimating both the likelihood of undesired events, and their potential costs and consequences in terms of environmental harm, loss of life, erosion of national security, loss of commercial confidence or reputation, and so on. The assessment is highly specific to the system concerned and its context, and nowadays is often grounded in an argument-based safety or assurance “case” (see, for example [17]–[21]) and similar argumentation implicitly underlies the guidelines (e.g., [11], [22], [23]) developed for standards-based certification processes. Such a case begins with claims that enumerate the undesired loss events and the tolerable failure rate or likelihood associated with them: for example, “as low as reasonably practicable” (ALARP) [24], or “so unlikely that they are not anticipated to occur during the entire operational life of all airplanes of one type” [12, paragraph 9.e(3)]. Evidence is collected about the system, its design, and its processes of development, and then an argument is constructed to justify satisfaction of the claims, based on the evidence. This process may recurse through subsystems, with substantiated claims about a subsystem being used as evidence in a parent case.

The upper levels of a safety case will often decompose into subcases based on enumeration of hazards, or failure modes, or architectural components. In the case of multi-channel systems, there will be upper-level analysis of the mechanism (if any) that coordinates them (e.g., a voter or arbiter), and of causes or modes of failure that may be common to all channels. Common causes may include physical faults in shared components such as sensors or in mechanisms that coordinate separate channels, or conceptual faults in the understanding of assumptions or requirements. Much engineering expertise and many associated analytical techniques and tools are available for identifying shared or common mechanisms: for example, common-cause analysis, common-mode analysis, and zonal safety analysis.

By these and other means, arguments and supporting evidence for the correctness of top-level safety requirements and for the absence of common-cause faults in assumptions, mechanisms, and failure modes can be developed in great detail and subjected to substantial analysis and empirical investigation. This is necessary because flaws in these elements could dominate overall system safety and also its probabilistic assessment, no matter how this is performed. We may then propose that assessors’ subjective probability for system failure on a randomly selected demand will take the form C + (1 − C) × D, where C is due to residual doubts about the top-level issues of system safety requirements and hazards common to all channels, and D is due to the specific architecture of the system and its constituent channels.

The probability C must be small if the system is to be fit for a critical purpose, but it may not be realistic or useful to strive to make it arbitrarily small. In the UK, for example, it is common for assessors to impose a claim limit of around 10−4 or 10−5 for the probabilistic assessment of any system [25]; in UK civil nuclear systems, there is a specific recommended limit of 10−5 for probabilistic assessment of common-cause faults [17, paragraph 172]. These limits can be interpreted as a judgement that assessors will always retain residual doubts about top-level issues and that claims for these, expressed here as C, should be taken only so far, and no further; they ensure that this quantity is sufficiently conservative that it really can absorb the hazards common to all channels and thus allow assessors’ beliefs about the probabilities for the separate channels to be treated as independent. In other words, claim limits prevent the value of C being pared away to the point where it threatens this independence.

With this idea in mind, we return to the problem of estimating (8) when assessors’ beliefs about pfdA and pnpB are not independent. Consider the ways in which dependency might arise. There seem to be only two possibilities: one is concern about the intrinsic difficulty of some demands, the other is common-cause faults. We consider these in turn.

Assessors might observe in testing that the A channel fails more often on some classes of demands than others. But should this affect their beliefs about the probability of perfection of the B channel—that is, their beliefs about the existence of demands for which B fails? We argue not; failures of the A channel are “per demand” events, whereas perfection of the B channel is a very different kind of “global” event. For this reason, assurance techniques that can contribute to claims of perfection, such as mechanically checked proofs of correctness, are global also, and are “uniformly credible” across all cases: a (mechanically checked) proof is a proof. Failures of A may indeed influence assessment of the failure behavior of B given imperfection of B, but that is not the event under consideration (furthermore, our aleatory analysis already makes the conservative assumption that if the B channel is imperfect, then it will fail whenever the A channel does).

Dually, assessors might observe that some parts of the safety case for the possibly perfect channel B are harder (i.e., require more work) than others. But should this affect their confidence in corresponding parts of the reliable channel A and their beliefs about its probability of failure? Again, we argue not: it is accepted that the reliable channel fails with some probability, and we have evidence about that probability (e.g., by statistically valid random testing).

Thus, unlike the case of two reliable channels, we do not think that intrinsic difficulty of some demands should be a cause for dependency in beliefs about pfdA and pnpB: failure of the A channel and imperfection of the B channel are such different events, and the assurance methods that provide evidence for estimation of their probabilities are of such different kinds, that dependency should not arise from concern about difficult demands.

Rather, dependency in beliefs must have its root cause in some higher level that is common to both channels, such as shared mechanism, or the interpretation of requirements (so that one way in which observed failures of A can affect belief in the perfection of B is by raising doubts about the top level requirements). One way forward, therefore, building on the treatment described earlier for common-cause faults, is for the assessors to place some probability mass, say C, at the point (1, 1) in the (pA, pB)-plane to represent their subjective probability that there are such common faults. The effect of this is that the assessors believe that if there are such faults, then A will fail with certainty (pA = 1), B will be imperfect with certainty (pB = 1), and hence pA × pB = 1. Hence, there is a probability C that the system is certain to fail on a randomly selected demand (since we assume that if B is imperfect, it always fails when A does). This is clearly conservative, and may be very much so.

The assessors then have the following conservative unconditional probability of failure of the system on a randomly selected demand (again by a version of the formula for total probability).

P(system fails on randomly selected demand)
    ≤ ∫_{0≤pA≤1, 0≤pB≤1} pA × pB dF(pA, pB)
    = C × ∫_{0≤pA≤1, 0≤pB≤1} pA × pB dF(pA, pB | pA = pB = 1)
        + (1 − C) × ∫_{0≤pA≤1, 0≤pB≤1} pA × pB dF(pA, pB | pA ≠ 1, pB ≠ 1)
    = C + (1 − C) × ∫_{0≤pA<1, 0≤pB<1} pA × pB dF(pA, pB).    (10)

If the assessors believe that all dependence between their beliefs about the model parameters has been captured conservatively in C, the conditional distribution in the second term of (10) now factorizes, and we have:

P(system fails on randomly selected demand)
    ≤ C + (1 − C) × ∫_{0≤pA<1} pA dF(pA) × ∫_{0≤pB<1} pB dF(pB)
    = C + (1 − C) × P*A × P*B    (11)

where P*A and P*B are the means of the posterior distributions representing the assessors’ beliefs about the two parameters, each conditional on the parameter not being equal to 1 (i.e., A not certain to fail, B not certain to be imperfect).

If C is small (as it must be if this approach is to be useful), we can replace the factor (1 − C) in (11) by 1, and we can substitute PA, the unconditional probability of failure of the A channel, for P*A, and PB, the unconditional probability of imperfection of the B channel, for P*B, to yield the conservative approximation

C + PA × PB.    (12)

Assessors may find it easier to formulate their beliefs about these unconditional probabilities. The same simplification can be applied to similar formulas developed in later sections and the resulting approximations will all be conservative.

We now consider and illustrate numeric values that might be assigned to the parameters that appear in Formula (11) or its simplification (12). We use the word “estimation” for the process by which assessors express their “expert beliefs” in terms of these values, but we do not mean to imply that the numbers have aleatoric, in-the-world meaning, of course. Rather, they are representations of the assessors’ beliefs about the world.

Bearing in mind that a 1oo2 system has little or no mechanism shared between its two channels and that their common top-level requirements are not engineering details but the critical safety requirements (around which the entire safety case is constructed), we believe that very small values (e.g., 10−7) could plausibly be assessed for C, the probability of common-cause failure. However, this value is also used to absorb any lack of independence in beliefs about the parameters of the two channels (which, we have argued, must derive from common-cause concerns) so, borrowing from the idea of claim limits, we suggest that 10−5 could be a plausible and conservative value to use for C. It then remains to consider the processes for obtaining, or eliciting, P*A and P*B, or their unconditional variants PA and PB.

The first of these is the assessors’ probability of failure on demand for the reliable channel. For values down to about 10−4, it is feasible to provide strong and credible evidence by statistically valid random testing [26], [27]. By statistically valid here we mean that the test case selection probabilities are exactly the same as those that are encountered in real operation, and that the oracle (used to determine whether a demand has been executed correctly or not) is sound. See [28] for an example of such testing used to assess the reliability of a reactor protection system. In the event that there is doubt about the representativeness of the test case selection, or of the correctness of the oracle, then this epistemic uncertainty must be incorporated into the assessors’ probability P*A. Expert assessors might take account of other kinds of evidence as well as that obtained from testing in arriving at their P*A. We shall not discuss this issue in detail here, but note that Bayesian Belief Nets (BBNs) are one example of a formalism for handling such disparate sources of evidence and uncertainty (see, for example, [29]).
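To give a sense of why 10−4 is a practical limit for testing-based evidence, the following sketch uses the standard classical sample-size calculation for failure-free, operationally representative demands; the cited treatments [26], [27] are more nuanced (and partly Bayesian), so this is only an order-of-magnitude illustration under our own simplifying assumptions.

```python
# Sketch: failure-free demands needed to support a pfd claim at a given confidence,
# using the classical bound (1 - p)^n <= alpha for n independent representative demands.
import math

def tests_needed(p, alpha):
    """Failure-free demands needed to claim pfd < p with confidence 1 - alpha."""
    return math.ceil(math.log(alpha) / math.log(1.0 - p))

print(tests_needed(1e-3, 0.05))   # ~3,000 demands for a 1e-3 claim at 95% confidence
print(tests_needed(1e-4, 0.05))   # ~30,000 demands for a 1e-4 claim, near the practical limit noted above
```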

For the possibly perfect channel, the assessors must express their subjective probability of its imperfection: that is, their degree of belief that there is at least one demand on which it fails. A rational process for developing this probability uses Bayesian methods to modify a prior belief through the acquisition of evidence to yield a refined posterior belief (see [26, sidebar] for a brief description of this process). The strongest evidence will come from testing and scrutiny and analysis of the design and implementation of the channel concerned. As the claim is “perfection,” the analysis will likely include extensive “Verification and Validation” (V&V) of the kinds recommended in guidelines such as those of MISRA [30] for cars, or DO-178B [22] for airplanes. More forward-looking guidelines such as the forthcoming DO-178C update to DO-178B may also countenance formal verification. Notice that if testing reveals a failure, or if formal verification or other analysis reveals a fault, then the claim of perfection is destroyed and the channel must be replaced in its entirety.⁶ In the following, therefore, we consider only the case in which testing and verification reveal no failures.

Formal verification and other analyses can be performed to various degrees of thoroughness: they may examine just the abstract design of the software and its algorithms, or detailed models developed in a model-based design framework, or the executable code, or all of these. In addition, the formalizations and analyses may extend to the non-software elements that are part of the channel—for example, its sensors and actuators, the physical processes that they monitor and control, and its environment—possibly including human operators. The properties verified may range from simple absence of runtime anomalies, as may be guaranteed by static analysis, to full functional correctness with respect to a detailed requirements specification. The subjective probability of imperfection will obviously be influenced by consideration of how comprehensive is the analysis, and the extent of what has not been formally verified or otherwise examined in comparable detail; it will also be tempered by consideration of the complexity of the design and its implementation, and possibly by residual concerns about the soundness of the formalization, the analyses, and the tools employed. The significance and relative weights of these concerns in assessing a probability of imperfection are considered in [31], where it is suggested that values for P*B in the range 10−2–10−4 are credible.

In summary, we believe that it is plausible and conservative that well-designed and highly assured 1oo2 systems can support claims such as 10−5 for C, the probability of failure due to causes common to both channels, 10−3 for P*A, the probability of failure by the reliable channel, and 10−2 for P*B, the probability of imperfection of the possibly perfect channel. Substituting these values in (11) we obtain 10−5 + (1 − 10−5) × 10−3 × 10−2 and thereby substantiate a conservative assessment of about 2 × 10−5 for the probability of failure on demand of the overall system. This assessment and its constituent quantities are comparable to those that have been employed for real systems, but we believe this is the first account that provides an argument in which the basis for each step is exposed to scrutiny and their method of combination can be rigorously defended.
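The arithmetic of this worked example can be checked directly; the snippet below simply substitutes the quoted values into (11) and into the simpler conservative form (12).

```python
# Check of the worked example: C = 1e-5, P*A = 1e-3, P*B = 1e-2.
C, PA_star, PB_star = 1e-5, 1e-3, 1e-2

bound_11 = C + (1 - C) * PA_star * PB_star   # formula (11)
bound_12 = C + PA_star * PB_star             # simplification (12), slightly more conservative

print(f"(11): {bound_11:.6e}")   # ~1.99999e-05, i.e. about 2e-5 as stated in the text
print(f"(12): {bound_12:.6e}")   # 2.0e-05
```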

Because the idea is new, there is no experience in assessing probabilities of imperfection, so we really have little idea what values might be assessed in practice and therefore deliberately used a very conservative value in this example. However, as we outline in [31], and intend to argue more fully in future [32], the traditional processes of software assurance performed in support of DO-178B [22] (the guidelines for certification of aircraft software) and similar certification regimes are best understood as developing evidence of possible perfection. The probability of failure for such highly assured software is then bounded by the product of its probability of imperfection and its conditional probability of failure, if imperfect. Software developed and evaluated to the higher “Design Assurance Levels” of DO-178B can be used in contexts requiring very low probabilities or rates of failure (e.g., 10−9 per hour), and historical experience validates this judgment. Littlewood and Povyakalo [33] show that when faced with increasing evidence for failure-free operation, our beliefs should tend toward attributing this to the software being perfect rather than having a very small conditional probability of failure. Thus, we believe that precedent and experience are able to support credible assessments of very low probabilities of imperfection for suitable software, and we are hopeful that consensus will develop in support of this position.

6. We are speaking here of the testing and verification performed for assurance and evaluation; discovery and repair of failures and faults in earlier stages of development may be acceptable in some lifecycle processes.

The relative sizes of the quantities appearing in the example above are worth some comment. It is plausible that assessors could assign a value as small as 10^−4 to P∗_A and, as argued above, a similar value seems feasible for P∗_B; their product is then 10^−8, which approaches the range required for ultra-high dependability in mass transit systems such as trains. The fly in the ointment is common-cause failure, where it may be difficult to assign a probability smaller than 10^−5 to C, and this therefore sets the overall limit on claimed reliability. There are two ways to interpret this observation; one is to conclude that our analysis is cautionary and that requirements and claims for ultra-high dependability should be viewed skeptically, especially if they depend on assigning very low probabilities to common-cause events. We note that the Airbus A340 fuel emergency to be discussed in Section 4 was essentially a common-cause failure.

The other interpretation is that our approach to simplifying the problem of epistemic estimation is inadequate for applications requiring ultra-high dependability: it conflates genuine common-cause failure (for whose likelihood we may truly be able to assess a very low probability) with lack of independence in beliefs about pfd_A and pnp_B. This is an area where further research and investigation of different approaches seem warranted. One alternative approach has been developed by Littlewood and Povyakalo [34], following circulation of drafts of the present paper. This approach asks assessors to state beliefs about the marginal distributions of the two parameters p_A and p_B, and then calculates conservative approximations to the joint distribution (8) that do not assume independence. One of their most attractive results is that if we have a confidence bound 1 − α_A on an assessment P_A for pfd_A (i.e., P(pfd_A < P_A) > 1 − α_A), and 1 − α_B on an assessment P_B for pnp_B, then pfd_S, the probability of failure on demand for the system, satisfies

P(pfd_S < P_A × P_B) > 1 − (α_A + α_B).


In words, we “multiply the claims and add the doubts” [34]. For example, if we are 95% confident that pfd_A < 10^−4 and 90% confident that pnp_B < 10^−4, then we can be 85% confident that the system probability of failure on demand is less than 10^−8. Whilst this approach delivers only confidence bounds and does not support calculation of the actual posterior probability of failure on demand of the system, (8), it may nevertheless be possible to use these confidence bounds within a wider safety case.
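As a small illustration of this rule (the function below is our own sketch, not an artifact of [34]), the combination of claims and doubts can be computed directly:

    # "Multiply the claims and add the doubts" [34]:
    # given P(pfd_A < P_A) > 1 - alpha_A and P(pnp_B < P_B) > 1 - alpha_B,
    # conclude P(pfd_S < P_A * P_B) > 1 - (alpha_A + alpha_B).
    def combine_claims(P_A, alpha_A, P_B, alpha_B):
        claim = P_A * P_B              # multiply the claims
        doubt = alpha_A + alpha_B      # add the doubts
        return claim, 1 - doubt        # system claim and the confidence in it

    claim, confidence = combine_claims(1e-4, 0.05, 1e-4, 0.10)
    print(claim, confidence)           # 1e-08 and 0.85, as in the example above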

We return to this topic in Section 4, where we develop an analysis for the kind of monitored architectures proposed for aircraft, but first we need to examine failures of commission.

3 TYPE 2 FAILURES, OR FAILURES OF COMMISSION

In this section, we turn from failures of omission to those of commission: those that perform an action (such as shutting down a reactor) when it is not required. In some system contexts, this kind of failure can have serious safety consequences, while in others it may be a costly nuisance, and in yet others merely an annoying “false alarm.” We study failures of this kind because they are of some importance in themselves, and because they lay the groundwork for our analysis of monitored architectures in Section 4.

In a 1oo2 system, either channel can put the system into a safe state. Hence, the failure of one channel to bring the system to a safe state when this should be done (i.e., on a “demand”) can be compensated by the other channel working perfectly. But, dually, the 1oo2 arrangement means that a single channel can force the system to a safe state when it is not necessary to do so (i.e., when there is no demand or, as we shall sometimes say, on a “nondemand”). We call the failure to bring the system to a safe state when it is necessary to do so a “Type 1” failure (a failure of omission), and the dual failure that does this unnecessarily a “Type 2” failure (a failure of commission).

Although we refer to Type 1 and Type 2 failures as duals, their treatment requires careful consideration because the demands and nondemands that underlie them are somewhat different in character. Intuitively, a demand is an event at some point in time, whereas a nondemand is the absence of a demand over a period of time. We unify the two by adjusting our treatment to consider the rate at which demands occur.

The channels of the systems we are interested in typically operate as cyclic processes executing at some fixed rate, such as so many times a second; on each execution, they sample sensors, perform some computation, possibly update their internal state, and decide whether to act (e.g., to shut down the reactor) or not. As time passes, there will be very many executions, only a few of which might be demands—i.e., represent states in which the channels are required to act. We suppose that whether any particular execution is a demand or not is independent of any other execution, so the numbers of demands in disjoint time intervals are independent of each other (i.e., “independent increments” in the terminology of statistics) and the probability distribution of the number of demands in a time interval depends only on the length of the interval (i.e., “stationary increments”). Under these conditions, the occurrence of demands will be approximately a Poisson process [35] with rate d, say. For the applications we have in mind, there will be very great disparity between the durations of the cycles and the expected times between successive demands, so this approximation will be very close. In what follows, therefore, we shall use the continuous time formalism of the Poisson process. So

d = lim_{δt→0} P(1 demand in (t, t+δt)) / δt.   (13)
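The Poisson approximation invoked here is easy to check by simulation. The sketch below (the cycle rate and the per-execution demand probability are illustrative assumptions of ours) compares demands generated cycle by cycle with the Poisson model of rate d.

    import numpy as np

    rng = np.random.default_rng(1)

    # Cyclic executions at 100 per second; a tiny, assumed per-execution demand probability.
    cycles_per_hour = 100 * 3600
    p_demand_per_cycle = 1e-8
    d = cycles_per_hour * p_demand_per_cycle        # demand rate per hour, about 3.6e-3

    # Count demands hour by hour and compare with a Poisson process of rate d.
    hours = 100_000
    demands = rng.binomial(cycles_per_hour, p_demand_per_cycle, size=hours)
    print(demands.mean(), d)                        # empirical rate vs. Poisson rate d
    print((demands >= 1).mean(), 1 - np.exp(-d))    # P(at least one demand in an hour)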

We define failure rates in a similar way. Then, for Type 1 failures we have

1oo2 system Type 1 failure rate
  = lim_{δt→0} P(1 demand and system fails in (t, t+δt)) / δt
  = lim_{δt→0} [P(system fails | 1 demand in (t, t+δt)) × P(1 demand in (t, t+δt))] / δt
  = lim_{δt→0} P(system fails | 1 demand in (t, t+δt)) × [P(1 demand in (t, t+δt)) / δt]
  ≤ (C + (1 − C) × P∗_A × P∗_B) × d

where the first term comes from the epistemic formula (11) and the second from (13). Using (12) and rearranging terms we also have the conservative simplification

1oo2 system Type 1 failure rate ≤ d × (C + P_A × P_B).   (14)

For Type 2 failures due to the A channel, we have

1oo2 system Type 2 failure rate due to A
  = lim_{δt→0} P(no demand and A activates in (t, t+δt)) / δt
  = lim_{δt→0} [P(A activates | no demand in (t, t+δt)) × P(no demand in (t, t+δt))] / δt
  = lim_{δt→0} [P(A activates | no demand in (t, t+δt)) / δt] × P(no demand in (t, t+δt)).

The second term here is conservatively approximated by 1. Then, taking limits, the first term is the failure rate of Channel A with respect to Type 2 failures, whose epistemic value (i.e., mean of the assessors’ posterior


distribution) we will denote by F_A2. Thus, we have

1oo2 system Type 2 failure rate due to A ≤ F_A2.   (15)

We could perform an exactly similar calculation for Type 2 failures due to the B channel, but this would not exploit the possible perfection of that channel. Accordingly, we suppose that the possible perfection of B extends to Type 2 failures and that p_B2 is its aleatory probability of imperfection with respect to those failures. Using ndi(.,.) to abbreviate “no demand in (.,.)” where necessary, we then have

P(B activates | ndi(t, t+δt))
  = P(B activates | B imperfect, ndi(t, t+δt)) × P(B imperfect)
  + P(B activates | B perfect, ndi(t, t+δt)) × P(B perfect).

The second term in this sum is zero (B does not activate on a nondemand if it is perfect with respect to Type 2 failures), so the right hand side reduces to

P(B activates | B imperfect, ndi(t, t+δt)) × p_B2.

The calculation given earlier for the A channel can now be reproduced for the B channel, but working at the aleatory level and substituting the expression above in the final line. This gives

1oo2 system Type 2 aleatory failure rate due to B
  ≤ lim_{δt→0} [P(B activates | B imperfect, ndi(t, t+δt)) / δt] × P(ndi(t, t+δt)) × p_B2.

Here, the second term is conservatively approximated by 1. Then, taking limits, the first term is the aleatory failure rate of B with respect to Type 2 failures, conditional on B being imperfect with respect to these failures, which we will denote by f_B2|np.

1oo2 system Type 2 aleatory failure rate due to B ≤ f_B2|np × p_B2.   (16)

The right hand side is the product of two parameters describing aleatory uncertainty. We now have to consider assessors’ epistemic uncertainty about these parameters to derive a posterior expression for the failure rate. A fully accurate treatment requires integrating with respect to a joint probability function G(f_B2|np, p_B2) just as we did with the function F(p_A, p_B) in Section 2.2. However, it may be that assessors’ beliefs about these parameters will be independent (why should the rate of failure, if imperfect, be dependent on the probability of imperfection?) so that the distribution G can be factorized and the failure rate due to Type 2 errors in B is then given by

1oo2 system Type 2 failure rate due to B ≤ F_B2|np × P_B2   (17)

where F_B2|np is the assessors’ posterior failure rate of B with respect to Type 2 failures, assuming B is imperfect with respect to these failures, and P_B2 is the posterior probability of B being imperfect with respect to Type 2 failures. Each of these is the mean of the posterior distribution of its random variable considered alone.
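The step from (16) to (17) is just the observation that, when the epistemic distribution G factorizes, the posterior mean of the product f_B2|np × p_B2 is the product of the two posterior means. The sketch below checks this numerically by Monte Carlo sampling; the prior distributions are illustrative choices of ours, not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000

    # Assumed, independent epistemic distributions for the two aleatory parameters:
    # f_B2np: Type 2 failure rate of B, given it is imperfect with respect to Type 2 failures
    # p_B2:   probability that B is imperfect with respect to Type 2 failures
    f_B2np = rng.gamma(shape=2.0, scale=5e-4, size=n)   # posterior mean 1e-3
    p_B2 = rng.beta(a=1.0, b=999.0, size=n)             # posterior mean 1e-3

    # Under independence, E[f_B2np * p_B2] equals E[f_B2np] * E[p_B2], i.e., F_B2|np * P_B2 in (17).
    print(np.mean(f_B2np * p_B2))                       # about 1e-6
    print(np.mean(f_B2np) * np.mean(p_B2))              # about 1e-6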

We can now assess the overall merit of a 1oo2 system in terms of its risk per unit time, taking account of both Type 1 and Type 2 failures, as follows. Risk is the product of the rate or probability of a failure and the severity of its worst-case consequences. Denote the worst-case severity of a Type 1 failure by c_1, and those of Type 2 failures by the A and B channels by c_A2 and c_B2, respectively. (Since the A and B channels may bring the system to a safe state in different ways—e.g., by dropping the control rods or by poisoning the moderator, respectively—it is reasonable to assign different severities to their Type 2 failures.) In some applications, a Type 2 failure may have no safety consequences, so c_A2 and c_B2 will be assessed as zero. But even in these cases there are likely to be economic or other losses (e.g., reputation) associated with a Type 2 failure and these can be included in a broader calculation of risk, resulting in nonzero assessments for c_A2 and c_B2. We will refer to c_1, c_A2, and c_B2 as costs, presuming that worst-case safety consequences and economic and other losses are normalized into some financial scale.

The rates for the three kinds of system failure are given by (14), (15), and (17), respectively. Adding the products of these and their costs yields the total risk:

1oo2 risk per unit time ≤ c_1 × d × (C + P_A1 × P_B1) + c_A2 × F_A2 + c_B2 × F_B2|np × P_B2.   (18)

We may assume that the cost c_1 of a Type 1 failure is large, and hence that extensive design and engineering effort is focused on ensuring that both the demand rate d and the failure rate (C + P_A1 × P_B1) are small, so that the contribution to overall risk by the first term in this sum is acceptable.

The second and third terms in the sum (18) are not reduced by a demand rate so, when c_A2 and c_B2 are nonzero, their contribution to overall risk will be small only if all their other components are small. The risk from a Type 2 failure of the main A channel is inherent in any safety system, so we may assume that the failure rate F_A2 is acceptable, relative to the cost of this type of failure. The risk due to a Type 2 failure of the B channel, however, is an additional factor introduced by the decision to employ a 1oo2 architecture. Hence, we will need to be sure that the contribution from the product c_B2 × F_B2|np × P_B2 is an acceptable price to pay for the reduction in risk that the architecture brings to Type 1 failures.
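To make the structure of (18) concrete, the following sketch evaluates its three contributions for a purely hypothetical set of values; the demand rate, failure parameters, and normalized costs are ours, chosen only for illustration.

    # Total 1oo2 risk per unit time, equation (18):
    #   c1*d*(C + P_A1*P_B1) + c_A2*F_A2 + c_B2*F_B2np*P_B2
    def risk_1oo2(c1, d, C, P_A1, P_B1, c_A2, F_A2, c_B2, F_B2np, P_B2):
        type1 = c1 * d * (C + P_A1 * P_B1)     # Type 1 failures (omission)
        type2_A = c_A2 * F_A2                  # Type 2 failures due to channel A
        type2_B = c_B2 * F_B2np * P_B2         # Type 2 failures due to channel B
        return type1 + type2_A + type2_B

    # Hypothetical values: demands at d = 0.01 per hour, costs normalized so c1 = 1.
    print(risk_1oo2(c1=1.0, d=1e-2, C=1e-5, P_A1=1e-3, P_B1=1e-2,
                    c_A2=1e-3, F_A2=1e-4, c_B2=1e-3, F_B2np=1e-3, P_B2=1e-3))

The relative sizes of the three terms depend entirely on these assumed values; the point of the sketch is only to show how the terms combine.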

In the above analysis we have assumed that d, c_1, c_A2, and c_B2 are known. If they are not, then they need to be treated as a part of a more general epistemic


analysis. This is simple if it can be assumed, as may be reasonable in some cases, that assessors’ beliefs about these parameters are independent of their beliefs about the parameters representing different probabilities and rates of failure and imperfection. We shall not pursue this further here.

4 MONITORED ARCHITECTURES

In this section we examine a different class of architectures from the 1oo2 case considered so far. These architectures employ an operational channel that is completely responsible for the functions of the system concerned, and a monitor channel that can signal an “alarm” if it deems the operational channel to have failed or to be causing unsafe behavior. Other systems will ignore the outputs of a system whose monitor is signaling an alarm, so the monitor transforms potential “malfunction” or “unintended function” into “loss of function” (another interpretation is that the purpose of the monitor is to avoid transitions into hazardous states). Loss of function may still be a dangerous condition, so there must generally be some larger system that responds to the alarm signal and compensates in some way for the lost function. This is really, therefore, an architecture for subsystems; we refer to it as a monitored architecture.

Monitored architectures are attractive for subsystems of highly redundant systems where occasional loss of function is anticipated and it is likely that some other subsystem can take over the functions of the one whose monitor has signaled the alarm, and the latter subsystem can be shut down or otherwise isolated.

Aircraft systems have this characteristic, as there is a requirement that no single failure may cause a “catastrophic failure condition” [12, paragraph 5]; the redundancy (e.g., multiple engines, multiple sources of air data, and internal redundancy in systems such as autopilot, flight management, autoland, autothrottle, FADEC, etc.) is primarily present to cope with physical failures (of a mechanical element or of a computer), but there is increasing interest in using it to handle software faults. The software in flight-critical functions is supposed to be almost certainly fault-free and guidelines such as ARP4754 [11], DO-178B [22], and DO-297 [23] are intended to ensure that it is so. However, the reliability requirements for flight-critical functions (those that could cause catastrophic failure conditions) are so high (e.g., 1 − 10^−9 per hour, sustained for the duration of the flight [12, paragraph 10.b])7 that there is some concern that no amount of development assurance can deliver sufficient confidence that its embedded software is free of faults.

A position paper by a team of experts representing certification authorities of North and South America, and Europe [37] expresses this concern, and suggests that “fault-tolerance techniques should be applied (such as, diverse and redundant implementations or simple design)...”8 Monitored architectures are an approach for achieving this and our proposal is that the monitor channel should be supported by a credible claim of possible perfection. One way to do this would be to develop the monitor channel as a very simple piece of software and to formally verify it against its requirements, in much the same way as we suggested for the possibly perfect channel in the 1oo2 architecture.

7. An explanation for this figure can be derived [36, page 37] by considering a fleet of 100 aircraft, each flying 3,000 hours per year over a lifetime of 33 years (thereby accumulating about 10^7 flight-hours). If hazard analysis reveals ten potentially catastrophic failure conditions in each of ten systems, then the “budget” for each is about 10^−9 if such a condition is not expected to occur in the lifetime of the fleet.

Alternatively, rather than formally analyzing the monitor channel, we could formally synthesize it from its requirements. There is a topic in formal methods known as Runtime Verification, whose technology provides methods for automated synthesis of monitors for formally specified properties [38]. Given requirements specified in a language such as EAGLE or RULER [39], the methods of runtime verification can automatically generate an efficient (and provably correct) monitor that will raise an alarm as soon as a requirement is violated.

We refer to monitors developed using either formal analysis or synthesis as formal monitors, and now turn to consideration of the requirements that they monitor.

One obvious source of requirements is the documentation for the operational channel being monitored. The problem with this choice is that even the “high-level software requirements” of guidelines such as DO-178B are focused on software functions rather than (sub)system safety properties, and operational software is often robustly correct with respect to these requirements (due to the effectiveness of practices such as those recommended by DO-178B). That is to say, the overall (sub)system may have failed or be operating in an unsafe manner, but each individual software component is operating correctly according to its requirements.

A recent in-flight incident illustrates this topic. It concerns an Airbus A340-642, registration G-VATL, which suffered a fuel emergency on 8 February 2005 [40]. The plane was over Europe on a flight from Hong Kong to London when two engines flamed out. The crew found that the fuel tanks supplying those engines were empty and those for the other two engines were very low. They declared an emergency and landed at Amsterdam. The subsequent investigation reported that two Fuel Control Monitoring Computers (FCMCs) are responsible for pumping fuel between the tanks on this type of airplane. The two FCMCs cross-compare and the “healthiest” one drives the outputs to the data bus. In this case, both FCMCs had known faults (but complied with the minimum capabilities required for flight); unfortunately, one of the faults in the one judged healthiest was inability to drive the data bus. Thus, although it gave correct commands to the fuel pumps (there was plenty of fuel distributed in other tanks), these were never received.

8. Observe that the concern underlying this suggestion can be interpreted as a “claim limit” (recall Section 2.3), though of a different kind from that used in nuclear systems: the expert team is saying they have residual doubts about the reliability of any complex single-channel system.

Another component of the A340 fuel system, the Fuel Data Concentrator (FDC), independently monitors fuel levels and generates a warning when these fall below a threshold. However, this alarm is disregarded by the Flight Warning Computer (FWC) unless both FCMCs have failed. The reason for this is not explained, but a plausible explanation is that it is to avoid Type 2 failures: a correctly functioning FCMC has access to more sensor data and can more accurately judge fuel levels than can the FDC. One of the recommendations of the Incident Report [40, Safety Recommendation 2005-37] is to change the FWC logic so that the FDC can always trigger a low fuel level warning. Observe that the fault that caused the FCMCs to malfunction and that disabled the FDC warning was a conceptual flaw in how the “healthiness” of FCMCs should be assessed, and this gave rise to common-cause faults in the requirements for the FCMCs and FWC.

This example illustrates our claim that there is unlikely to be much benefit in monitoring requirements at or below the component level: not only is critical software generally correct with respect to this level of specification, but faults are often located in these requirements. Furthermore, larger problems may not be manifested at this level (i.e., they are emergent above the component level). Instead, we need to monitor properties that more directly relate to the safe functioning of the overall system, and that are more likely to be violated when problems are present.

The claims and assumptions of an argument-based safety case can provide exactly these properties. A safety case for the A340 fuel system would surely include statements about the acceptable distribution of fuel among the different tanks, in addition to minimum levels in the tanks feeding the engines. Properties concerning fuel distribution are different in kind (i.e., “diverse”) from those that appear in software requirements, which typically focus on tasks to be performed rather than properties to be maintained (i.e., invariants), and are good candidates for monitoring: unlike low fuel level (the property monitored by the A340 FDC), incorrect fuel distribution could trigger an alarm quite early in the flight and seems unlikely to trigger false alarms.
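As a minimal sketch of the kind of simple, formally checkable invariant we have in mind, the fragment below monitors a fuel-distribution property; the tank names, thresholds, and alarm interface are hypothetical and are not taken from the A340 design.

    # Illustrative fuel-distribution invariant monitor; all values are hypothetical.
    LOW_LEVEL_KG = 1000.0        # assumed minimum level for an engine feed tank
    MAX_IMBALANCE_KG = 2000.0    # assumed acceptable imbalance between feed tanks

    def fuel_distribution_ok(feed_tanks_kg):
        # Invariant (notionally taken from a safety case): every engine feed tank
        # is above its minimum level, and the feed tanks are roughly balanced.
        return (min(feed_tanks_kg) >= LOW_LEVEL_KG and
                max(feed_tanks_kg) - min(feed_tanks_kg) <= MAX_IMBALANCE_KG)

    def monitor_step(feed_tanks_kg, raise_alarm):
        # Called once per execution cycle with the latest sensor readings.
        if not fuel_distribution_ok(feed_tanks_kg):
            raise_alarm("fuel distribution invariant violated")

    # Example cycle: one feed tank draining while the others remain full.
    monitor_step([900.0, 5200.0, 5100.0, 5300.0], raise_alarm=print)

The point is only that such a property is small enough to be verified, or synthesized, in its entirety; how faithfully it reflects the underlying hazard is a separate question, taken up later in this section.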

A monitored architecture therefore has a highly reliable operational channel, whose behavior is derived from its software requirements specification, and a simple formal monitor that has a high probability of perfection with respect to properties taken directly from the system safety case.9 Analysis of the reliability of such architectures can build on our previous analysis of 1oo2 architectures, with the operational channel taking the part of A and the monitor the part of B, but with some changes that reflect differences in the architectures.

9. Of course, the safety case should provide evidence and argument that the operational channel does satisfy those properties. Thus, the monitor is redundant if the safety case is sound; its purpose—as with all applications of diversity—is for protection against the possibility that the safety case is flawed.

One significant change is that there is no notion of “demand” for the operational channel in the monitored architecture; instead, it continuously performs its assigned operational role, but may occasionally suffer failures that violate a safety requirement. Thus, a monitored architecture has a Type 1 failure if its operational channel violates a safety requirement and its monitor fails to detect that fact. The operational channel will have some failure rate, denoted fr_A, the monitor will have some probability of imperfection with respect to Type 1 failures (i.e., failures of omission), denoted pnp_B1, and we seek a rate for Type 1 system failure based on these parameters. The other significant change is that there is no notion of Type 2 failure (i.e., of commission) for the operational channel: only the monitor channel can raise an alarm, and thereby cause a Type 2 system failure by doing so unnecessarily. We suppose that the monitor will have some probability of imperfection with respect to Type 2 failures, denoted pnp_B2, and we seek a rate for Type 2 system failure based on this parameter.

We begin with the aleatory analysis for Type 1 system failure, which is displayed in the boxed analysis below. From (21), dividing by δt and taking limits, we obtain:

Monitored architecture Type 1 aleatory failure rate ≤ f_A × p_B1.   (22)

The Type 2 failure rate for the monitored architecture is just that for the monitor channel. We can directly apply the analysis from the 1oo2 case so that (16) becomes

Monitored architecture Type 2 aleatory failure rate ≤ f_B2|np × p_B2   (23)

where f_B2|np is the Type 2 failure rate for the monitor channel, conditional on its imperfection with respect to that class of failures.

As with the cases examined earlier, we now need to consider epistemic uncertainty in the two formulas above. For the Type 1 failure rate (22), we will need to consider a joint distribution function H(f_A, p_B1) representing assessors’ beliefs about these variables. The first of these is the failure rate of the operational channel, while the second is the probability of imperfection of the monitor. We argue that these are different measures about very different channels: one a fairly complex system designed to meet its requirements (which will be highly operational and removed from the claims of the safety case—though intended to ensure them), the other a very simple and direct check of properties from the safety case. Thus, it is very plausible that assessors’ beliefs about these channels will be independent, and so the function H(f_A, p_B1) can be factorized into distributions for the individual variables. However, although the channels are diverse in purpose and design, it is quite likely they will share some implementation mechanisms.


Aleatory Analysis for Type 1 Failure in a Monitored Architecture

We assume that fr_A has known value f_A and that pnp_B1 has known value p_B1. Then

P(monitored architecture has Type 1 failure in (t, t+δt) | fr_A = f_A, pnp_B1 = p_B1)
  = P(A fails in (t, t+δt), B does not detect failure | fr_A = f_A, pnp_B1 = p_B1)
  ≤ P(A fails in (t, t+δt), B imperfect | fr_A = f_A, pnp_B1 = p_B1)   (19)
  = P(A fails in (t, t+δt) | B imperfect, fr_A = f_A, pnp_B1 = p_B1) × P(B imperfect | fr_A = f_A, pnp_B1 = p_B1)
  = P(A fails in (t, t+δt) | fr_A = f_A, pnp_B1 = p_B1) × P(B imperfect | fr_A = f_A, pnp_B1 = p_B1)   (20)
  = P(A fails in (t, t+δt) | fr_A = f_A, pnp_B1 = p_B1) × p_B1.   (21)

Line (20) follows from the one above it because the fact that B is imperfect tells us nothing about the failure of A in this interval. Line (19) follows from the one above it by the following reasoning:

P(A fails in (t, t+δt), B imperfect | fr_A = f_A, pnp_B1 = p_B1)
  = P(A fails in (t, t+δt), B imperfect and does not detect failure | fr_A = f_A, pnp_B1 = p_B1)
  + P(A fails in (t, t+δt), B imperfect and detects failure | fr_A = f_A, pnp_B1 = p_B1)
  ≥ P(A fails in (t, t+δt), B imperfect and does not detect failure | fr_A = f_A, pnp_B1 = p_B1)
  = P(A fails in (t, t+δt), B does not detect failure | fr_A = f_A, pnp_B1 = p_B1).

The last line above follows from its predecessor because B must be imperfect if it does not detect failure (of A).

For example, a monitor for a fuel system, such as that in the A340 example, might rely on the same sensors as those used by the operational channel. These shared mechanisms induce potential common-cause failures that must be taken into account.

One approach would be to factor the system explicitly into three components—shared mechanism (e.g., sensors), operational channel, and monitor—and to calculate probabilities over this more complex model. However, we propose a simpler treatment that is similar to that employed to yield (12) in Section 2.2. There, we introduced an element C to absorb failures due to faults in requirements and interpretation that are common to both channels; here, we introduce an element M_1 to absorb failures due to mechanisms that are common to both channels. Thus,

Monitored architecture Type 1 failure rate ≤ M_1 + F_A × P_B1,   (24)

where M_1 is the assessed Type 1 failure rate due to mechanisms shared between the operational and monitor channels, F_A is the posterior failure rate of the operational channel (excluding items in M_1), and P_B1 is the posterior probability of imperfection (with respect to Type 1 failures) of the monitor channel (again, excluding items in M_1).

The Type 2 failure rate (23) was derived directly from the B channel case in a 1oo2 architecture (16). We argue that the epistemic valuation can similarly be derived from (17) in the 1oo2 case to yield

Monitored architecture Type 2 failure rate ≤ M_2 + F_B2|np × P_B2,   (25)

where M_2 is the assessed Type 2 failure rate due to mechanisms shared between the operational and monitor channels, F_B2|np is the posterior Type 2 failure rate for the monitor channel, assuming it is imperfect with respect to that class of failures, and P_B2 is the posterior probability of imperfection of the monitor channel with respect to Type 2 failures. (Although Type 2 failures are due to the monitor channel alone, we must still include the contributions of the mechanisms that it shares with the operational channel.)

Putting these together, we obtain the risk of a monitored architecture:

Monitored architecture risk per unit time ≤ c_1 × (M_1 + F_A × P_B1) + c_2 × (M_2 + F_B2|np × P_B2),   (26)

where c_1 is the cost of a Type 1 failure and c_2 the cost of a Type 2 failure.

Assuming, as seems likely, that M_1 is much smaller than F_A (since the failures considered in M_1 are included among those in F_A), a relatively modest claim for perfection of the monitor with respect to Type 1 failures (e.g., P_B1 = 10^−3) provides a significant reduction in the overall risk due to Type 1 failures.

Employing a monitor brings with it the new risk of a Type 2 failure, and responsibility for this is borne by the monitor alone. For this new risk to be acceptable, assuming M_2 is small, we need either a strong claim for perfection of the monitor with respect to Type 2 failures (e.g., P_B2 = 10^−6), or the consequence of a Type 2 failure must be modest. Relief from this demanding requirement on P_B2 is possible if assessors are able to assign a small value to F_B2|np, the failure rate of the


monitor channel, assuming it is imperfect. Statistically valid testing of the monitor channel provides a basis for assessing F_B2|np and could yield values on the order of 10^−3, thereby potentially reducing the requirement on P_B2 to a similar value. Obviously, the tests must reveal no failures, since otherwise the channel would definitely be imperfect.
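To see how these quantities interact, the sketch below evaluates (26) for illustrative values of the kind discussed here; the costs and the shared-mechanism rates M_1 and M_2 are assumptions of ours.

    # Monitored architecture risk per unit time, equation (26):
    #   c1*(M1 + F_A*P_B1) + c2*(M2 + F_B2np*P_B2)
    def risk_monitored(c1, M1, F_A, P_B1, c2, M2, F_B2np, P_B2):
        type1 = c1 * (M1 + F_A * P_B1)      # operational channel fails and the monitor misses it
        type2 = c2 * (M2 + F_B2np * P_B2)   # the monitor raises an unnecessary alarm
        return type1 + type2

    # Illustrative values: F_A = 1e-4 per hour, P_B1 = P_B2 = 1e-3, F_B2np = 1e-3 per hour,
    # small shared-mechanism contributions, and costs normalized so c1 = 1.
    print(risk_monitored(c1=1.0, M1=1e-8, F_A=1e-4, P_B1=1e-3,
                         c2=1e-2, M2=1e-8, F_B2np=1e-3, P_B2=1e-3))

Under these assumptions the Type 1 contribution is roughly a thousand times smaller than the unmonitored rate F_A, while the Type 2 term stays small only because both F_B2|np and c_2 are assumed small.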

Whatever the contribution of F_B2|np, there should be an inverse relationship between the product F_B2|np × P_B2 and the cost c_2. For the case of the A340 fuel management system described earlier, it is likely that the response to an alarm raised by a monitor would be a warning to the pilots, who would then follow the standard troubleshooting and emergency procedures established for possible faults in the fuel system. Thus, the consequences of a false alarm are modest and commensurate with a correspondingly modest claim for perfection of the monitor with respect to Type 2 failures.

On the other hand, monitoring mechanisms on American Airlines Flight 903 of 12 May 1997 (an Airbus A300) inappropriately raised an alarm indicating failure of one of the avionics buses. This caused an automated bus reset, which in turn caused the Electronic Flight Instrumentation System (EFIS), which is responsible for displaying instruments on the cockpit monitors, to go blank for a while. At that time, the pilots were attempting to recover from a major upset (the plane was in a succession of stalls) and the loss of all primary instruments at this critical time (“when the situation was at its gravest”) jeopardized the recovery [41]. Here, the cost of a Type 2 error was very high and it would seem that an extremely low probability of imperfection should have been required of the monitor.

In fact, the monitor on Flight 903 did not fail (which is why we called its action “inappropriate” rather than “incorrect”): the problem was in the properties that it was required to monitor. Until now, we have implicitly considered demands to be events in the logical or physical world—but such events do not always announce themselves unambiguously: their detection may require delicate interpretation of sensors or indirect inference from other available data. In this case, the event to be monitored is malfunction of an avionics bus, and one way to detect this event is reception of faulty data delivered by the bus. And one way to judge whether data is faulty is by considering its physical plausibility. Here, roll angle rates of change greater than 40 degrees/second were considered implausible, and reception of sensor values in excess of this limit was what triggered the monitor. But, as noted, the airplane was stalled, the roll rates were real, and the inference of a bus fault was incorrect.

Designers choose the requirements for properties to be monitored; these properties, not the events they are intended to detect, are the demands that a monitor responds to and for which its probability of perfection is assessed. Through consideration of the possible costs of appropriate and inappropriate actions and inactions, designers may choose to fine-tune detection thresholds and adjust other elements in the specifications of properties to be monitored. They can also adjust the actions taken in response to violation of these properties. The consequences of these choices can far outweigh the perfection or otherwise of the monitor—hence, there is little point in requiring high probability of perfection in a monitor when the properties that it monitors are imperfect indicators of the event that it should respond to. Notice that this concern is not unique to monitors: it applies just the same to 1oo2 architectures.

It might at first seem reasonable to extend the probability of perfection for a monitor to include “perfection” of the monitored properties (relative to the physical or logical events they are intended to identify). The problem is that perfection is an inappropriate notion to apply to these properties: they are internal models of an external reality and the models may work better in some cases than others, and are also subject to vicissitudes such as the fallibilities of sensors. Hence, the fidelity of a property to its intended event is not a global attribute, but varies from one demand to another, and is therefore measured as reliability, not perfection.

For this reason, monitors and their required probabilities of perfection, the fidelity of the properties that they monitor, and the costs of the responses that they trigger and of the failures that they avert, should be developed in the context of a safety case, where the necessary assurances and tradeoffs can be made in an integrated manner. In the case of G-VATL, the property that was monitored by the FDC (low fuel in the engine tanks) had low fidelity with respect to the physical event of interest (malfunction of the fuel system) and was therefore prone to both Type 1 and Type 2 failures. To compensate for the latter, common mechanism was introduced between the operational system and the monitor, and that led to system failure. As we described earlier, a better design would have monitored the distribution of fuel: this would trigger much earlier and, modulo reliability of the sensors, would seem to have high fidelity. As noted, the cost of a false alarm is fairly modest in this system, and so we conclude that a properly constructed formal monitor would work well for this application: a modest and achievable probability of perfection would deliver a useful and credible claim to the safety case.

In the case of Flight 903, the property that was monitored (plausibility of data) had very low fidelity with respect to the event of interest (malfunction in an avionics bus) and the cost of a Type 2 failure was high (an automated bus reset). It seems unlikely that a monitor of higher fidelity could be constructed given the limitations of the subsystem concerned. We conclude that a monitored architecture was probably inappropriate for this application and that larger architectural revisions should have been considered, such as a fault-tolerant avionics bus, additional redundancy (e.g., checksums) to allow monitoring of end-to-end reliability, or more sophisticated


fusion and diagnosis (e.g., analytical redundancy [42]) of available sensor data.

In summary, monitored architectures seem appropriate and effective for a usefully broad class of safety-critical systems. The monitor channel should evaluate properties taken directly from the system safety case and can be vastly simpler than the operational channel, as well as strongly diverse from it. Its simplicity should allow assessment of an attractively small probability of imperfection for the monitor channel, and the strong diversity between the two channels should allow separation of beliefs so that the complicating factor C employed in the 1oo2 case that led to the “cautionary” conclusion at the end of Section 2.3 is avoided. There are factors M_1 in (24) and M_2 in (25) that play arithmetically similar roles to C, but these concern different events (failures due to common sensors and mechanisms shared between the operational and monitor channels) whose assessed probabilities can be reduced to small values by suitable design and assurance. (The problems with the A340 fuel management system example do not invalidate this conclusion: that was an unsuitable design.) A modest and plausible value such as 10^−3 for the probability of imperfection of the monitor, P_B1, then delivers a very useful reduction in the Type 1 failure rate for the system, and a similar value should also be adequate for P_B2 if the cost of a Type 2 failure can be controlled by suitable engineering.

5 CONCLUSION

We have considered systems in which a highly reliable software component or “channel” A operates in combination with a channel B that supports a credible claim of “perfection,” meaning that it will never fail. The claim of perfection may be wrong, so we speak of a “possibly perfect” channel characterized by a probability of imperfection. Our results for this model clarify, generalize, and extend an earlier result in [1].

The clarification lies in showing that there is conditional independence between the events “channel A fails on a randomly selected demand” and “channel B is imperfect.” This means that the description of the aleatory uncertainty is particularly simple: under conservative assumptions, the conditional system pfd is bounded by p_A × p_B, where p_A is the probability that A fails on a randomly selected demand, and p_B is the probability that B is imperfect.

This contrasts with the usual approach involving channels for which claims of perfection cannot be made and only reliability is claimed; here, independence of channel failures cannot be asserted, and thus the two channel pfds are not sufficient to characterize the model. Instead, consideration needs to be given to the “degree” of dependence of the failures of the two channels, and this is extremely difficult to incorporate into the model: in the Eckhardt and Lee [7] and Littlewood and Miller [8] models, for example, it involves knowing the covariance of the “difficulty functions” over the demand space. Typically, these are not known, and cannot easily be estimated.

The generalization lies in the treatment of epistemic uncertainty, which is shown to be the sole source of dependence in this model. Assessors must describe their beliefs about the two parameters of the model via the “belief” distribution F(p_A, p_B) in order to obtain their unconditional probability of failure of the system on a randomly selected demand. Representing beliefs as a bivariate distribution may not be easy, but supposing that the distribution can be factored will usually not accurately represent the assessors’ beliefs. We therefore proposed a conservative approach that is similar to that underpinning some “claim limit” approaches in safety cases, and that can be seen as one way of formalizing such arguments. It requires assessors to represent their beliefs in terms of just three quantities: the probability C of common faults that afflict both channels, the probability P∗_A of failure for the reliable channel A in the absence of such common faults, and the probability P∗_B of imperfection for the possibly perfect channel, again assuming the absence of common faults. The probability that the system fails on a randomly selected demand is then conservatively bounded by

C + (1 − C) × P∗_A × P∗_B.

The model described here seems much simpler than the classical one (with two reliable channels) in the treatment of both aleatory and epistemic uncertainty. We described the context of argument-based safety cases in which the epistemic probabilities C, P∗_A, and P∗_B should be formulated, and argued that credible and useful values can be constructed with modest enhancements to current development and assurance practices. Development of a channel that may claim perfection is most plausible when the channel is very simple, or is subjected to mechanically checked formal verification or, preferably, both. Fallibilities in formal verification are described in [31] and we argued that these can be accommodated within useful and credible values for P∗_B.

The earlier result has been extended by considering failures of commission (e.g., tripping a reactor when this is not needed), and then further extended by combining these results with a modification of the 1oo2 analysis to yield an analysis for monitored architectures: those, such as are proposed for aircraft, in which a reliable operational channel is monitored by a possibly perfect channel that can trigger higher-level fault recovery. These cases require extension of the statistical model from one based on probability of failure on demand to one based on rates of failure.

Like our first result, these extended results are relatively simple and require estimation of only a few parameters. The simplicity is achieved through conservative assumptions; hence our results are conservative also, possibly very much so. Nonetheless, plausible values for the parameters yield attractive conclusions.


The thrust of this paper has been about assessment of reliability of fault tolerant diverse software-based systems. The problem of assessing the reliability of such a system is often more difficult than the problem of achieving that reliability, particularly when the reliability requirements are very stringent. It is worth saying, then, that the asymmetric 1oo2 architecture considered here is one that has long been regarded as an attractive one for achieving reliability in certain applications (e.g., nuclear reactor protection systems), not least because it forces the reduction of excessive functionality, with its ensuing complexity, which are known enemies of dependability. That this architecture brings greater simplicity to the assessment process is an added bonus.

1oo2 architectures are most appropriate for systems that have a safe state; monitored architectures are one attempt to extend these benefits to systems that must continue operation in the presence of faults. Monitored architectures are recommended in the relevant aircraft guidelines [11, Table 5–2] as one way to achieve the high levels of dependability required for flight critical software. For example, it is suggested that a Level A system (one requiring the highest level of assurance) can be achieved by a Level C operational channel and a Level A monitor. Current revisions to these guidelines are augmenting Table 5–2 with rules that allow calculation of acceptable alternatives. Our results can be seen as a way to provide formal underpinnings to those rules. We showed that risk due to failure of the operational channel can be significantly reduced by a monitor, but that the probability of imperfection in the monitor with respect to faults of commission must be carefully weighed against the danger and cost of unnecessary fault recovery. Furthermore, a perfect monitor is only as good as the properties that it monitors, so a large part of the challenge in developing effective monitors is to ensure that the properties chosen for monitoring are good ones.

Asymmetric 1oo2 systems and monitored architectures can be seen as ways to implement and exploit ideas that have long been known in the fields of fault tolerance and security: for example, fail-stop processors [43], safety kernels [44], [45], detectors and correctors [46], and security by execution monitoring [47]. All of these use 1oo2 comparison or some form of monitoring to avoid transitions to hazardous states. The utility of these architectures in the construction of critical systems is well known; our contribution is to show how the particular form of these architectures (where the monitor can support a claim of probable perfection) enables assessment of the reliability achieved.

As we have indicated at various points in the paper, assessment at the epistemic level in models of these kinds is not easy (although, as we have also noted, it is considerably easier than for models in which claims about reliability must be made for both channels). Our treatment makes assumptions that may be very conservative, and still the parameter C, for example, presents great difficulty for an assessor. More work is needed on ways of easing the task of assessors in using our approach: we briefly mentioned some recent results that avoid the problem of assessing C by requiring the assessors instead to provide assessments about their marginal distributions for pfd_A and pnp_B. We intend to investigate alternative approaches to these problems.

Other work for the future could involve extending the scope of the models. For example, it will be interesting to extend the analysis of monitored architectures to include recovery mechanisms, and possibly to consider a basis for certification of systems that employ adaptive control or other techniques that are currently considered beyond the pale.

Acknowledgements

We were very fortunate that our original manuscript received very careful readings from the Associate Editor and three reviewers. Their extensive and thoughtful comments and constructive suggestions resulted in this final version being a great improvement on the original, both in readability and in technical accuracy. We thank them for what must have been a substantial effort on all their parts.

We would also like to thank the following colleagues for valuable discussions about earlier versions of this work: Peter Bishop, Robin Bloomfield, Alan Burns, Bruno Dutertre, Pat Lincoln, Paul Miner, Peter Popov, Andrey Povyakalo, Bob Riemenschneider, N. Shankar, Lorenzo Strigini, David Wright, and Bob Yates.

Bev Littlewood’s work was supported by the INDEED project, funded by the UK Engineering and Physical Sciences Research Council; and by the DISPO project, funded by British Energy, NDA (Sellafield, Magnox North, Magnox South), AWE and Westinghouse under the CINIF Nuclear Research Programme (the views expressed in this Report are those of the authors and do not necessarily represent the views of the members of the Parties; the Parties do not accept liability for any damage or loss incurred as a result of the information contained in this Report).

John Rushby’s research was supported by National Science Foundation grant CNS-0720908 and by NASA cooperative agreements NNX08AC64A and NNX08AY53A.

REFERENCES

[1] B. Littlewood, “The use of proof in diversity arguments,” IEEE Transactions on Software Engineering, vol. 26, no. 10, pp. 1022–1023, Oct. 2000.

[2] J. C. Rouquet and P. J. Traverse, “Safe and reliable computing on board the Airbus and ATR aircraft,” in Safety of Computer Control Systems (SAFECOMP ’86). Sarlat, France: Published by Pergamon for the International Federation of Automatic Control, IFAC, Jul. 1986.

[3] Statistical Summary of Commercial Jet Aircraft Accidents, Worldwide Operations, 1959–2009, Boeing Commercial Airplane Group, Seattle, WA, Jul. 2010, published annually by Boeing Airplane Safety Engineering, available at http://www.boeing.com/news/techissues/pdf/statsum.pdf.


[4] In-Flight Upset 154 km West of Learmonth, WA, 7 October 2008, VH-QPA, Airbus A330-303, Australian Transport Safety Bureau, Dec.2011, Aviation Occurrence Investigation AO-2008-070, Final.

[5] D. E. Eckhardt, A. K. Caglayan, J. C. Knight, L. D. Lee, D. F.McAllister, M. A. Vouk, and J. P. J. Kelly, “An experimentalevaluation of software redundancy as a strategy for improvingreliability,” IEEE Transactions on Software Engineering, vol. 17, no. 7,pp. 692–702, Jul. 1991.

[6] J. C. Knight and N. G. Leveson, “An experimental evaluation ofthe assumption of independence in multiversion programming,”IEEE Transactions on Software Engineering, vol. SE-12, no. 1, pp.96–109, Jan. 1986.

[7] D. E. Eckhardt, Jr. and L. D. Lee, “A theoretical basis for theanalysis of multiversion software subject to coincident errors,”IEEE Transactions on Software Engineering, vol. SE-11, no. 12, pp.1511–1517, Dec. 1985.

[8] B. Littlewood and D. R. Miller, “Conceptual modeling of coin-cident failures in multiversion software,” IEEE Transactions onSoftware Engineering, vol. 15, no. 12, pp. 1596–1614, Dec. 1989.

[9] J. C. Knight and N. G. Leveson, “An empirical study of failureprobabilities in multi-version software,” in Fault Tolerant Comput-ing Symposium 16. Vienna, Austria: IEEE Computer Society, Jul.1986, pp. 165–170.

[10] B. Littlewood, P. T. Popov, L. Strigini, and N. Shryane, “Modellingthe effects of combining diverse software fault detection tech-niques,” IEEE Transactions on Software Engineering, vol. 26, no. 12,pp. 1157–1167, Dec. 2000.

[11] Aerospace Recommended Practice (ARP) 4754: Certification Consider-ations for Highly-Integrated or Complex Aircraft Systems, Society ofAutomotive Engineers, Nov. 1996, also issued as EUROCAE ED-79; revised as ARP 4754A, December 2010.

[12] System Design and Analysis, Federal Aviation Administration, Jun.21, 1988, advisory Circular 25.1309-1A.

[13] T. Kelly, “Arguing safety—a systematic approach to safety casemanagement,” Ph.D. dissertation, Department of Computer Sci-ence, University of York, UK, 1998.

[14] P. Bishop and R. Bloomfield, “A methodology for safety casedevelopment,” in Safety-Critical Systems Symposium, Birmingham,UK, Feb. 1998, available at http://www.adelard.com/resources/papers/pdf/sss98web.pdf.

[15] W. L. Oberkampf and J. C. Helton, “Alternative representationsof epistemic uncertainty,” Reliability Engineering and System Safety,vol. 85, no. 1–3, pp. 1–10, 2004.

[16] A. O’Hagan, C. E. Buck, A. Daneshkhah, J. R. Eiser, P. H.Garthwaite, D. J. Jenkinson, J. E. Oakley, and T. Rakow, UncertainJudgements: Eliciting Experts’ Probabilities. Wiley, 2006.

[17] Safety Assessment Principles for Nuclear Facilities, 2006th ed., UKHealth and Safety Executive, Bootle, UK, available at http://www.hse.gov.uk/nuclear/saps/saps2006.pdf.

[18] Licensing of Safety Critical Software for Nuclear Reactors: CommonPosition of Seven European Nuclear Regulators and Authorised Tech-nical Support Organizations, AVN Belgium, BfS Germany, CSNSpain, ISTec Germany, NII United Kingdom, SKI Sweden, STUKFinland, 2007, available at http://www.bfs.de/de/kerntechnik/sicherheit/Licensing safety critical software.pdf.

[19] Air Traffic Services Safety Requirements, CAP 670, Safety RegulationGroup, UK Civil Aviation Authority, Jun. 2008, see Part B, Section3, Systems Engineering SW01: Regulatory Objectives for SoftwareSafety Assurance in ATS Equipment; Available at http://www.caa.co.uk/docs/33/cap670.pdf.

[20] Defence Standard 00-56, Issue 4: Safety Management Requirementsfor Defence Systems. Part 1: Requirements, UK Ministry of Defence,Jun. 2007, available at http://www.dstan.mod.uk/data/00/056/01000400.pdf.

[21] Engineering Safety Management (The Yellow Book), Volumes1 and 2, Fundamentals and Guidance, Issue 4, Rail Safetyand Standards Board, London, UK, 2007, available fromhttp://www.yellowbook-rail.org.uk/site/the yellow book/theyellow book.html.

[22] DO-178B: Software Considerations in Airborne Systems and Equip-ment Certification, Requirements and Technical Concepts for Avia-tion (RTCA), Washington, DC, Dec. 1992, this document is knownas EUROCAE ED-12B in Europe.

[23] DO-297: Integrated Modular Avionics (IMA) Development Guidanceand Certification Considerations, Requirements and Technical Con-cepts for Aviation (RTCA), Washington, DC, Nov. 2005, alsoissued as EUROCAE ED-124 (2007).

[24] Health and Safety at Work etc. Act, UK Health and Safety Execu-tive, 1974, available at http://www.hse.gov.uk/legislation/hswa.htm; guidance suite at http://www.hse.gov.uk/risk/theory/alarp.htm.

[25] The Use of Computers in Safety-Critical Applications: Final Report ofthe Study Group on the Safety of Operational Computer Systems, UKHealth and Safety Commission, 1998, available at http://www.hse.gov.uk/nuclear/computers.pdf.

[26] B. Littlewood and L. Strigini, “Validation of ultrahigh depend-ability for software-based systems,” Communications of the ACM,pp. 69–80, Nov. 1993.

[27] R. W. Butler and G. B. Finelli, “The infeasibility of experimentalquantification of life-critical software reliability,” IEEE Transactionson Software Engineering, vol. 19, no. 1, pp. 3–12, Jan. 1993.

[28] J. May, G. Hughes, and A. D. Lunn, “Reliability estimationfrom appropriate testing of plant protection software,” IEE/BCSSoftware Engineering Journal, vol. 10, no. 6, pp. 206–218, Nov. 1995.

[29] B. Littlewood and D. Wright, “The use of multi-legged argumentsto increase confidence in safety claims for software-based systems:a study based on a BBN analysis of an idealised example,” IEEETransactions on Software Engineering, vol. 33, no. 5, pp. 347–365,May 2007.

[30] Development Guidelines for Vehicle Based Software, The Motor Indus-try Software Reliability Association (MISRA), Nuneaton, UK, Jan.2001, pDF Version 1.1.

[31] J. Rushby, “Software verification and system assurance,” in Sev-enth International Conference on Software Engineering and FormalMethods (SEFM), D. V. Hung and P. Krishnan, Eds. Hanoi,Vietnam: IEEE Computer Society, Nov. 2009, pp. 3–10.

[32] B. Littlewood, J. Rushby, and L. Strigini, “On the nature of soft-ware assurance,” Computer Science Laboratory, SRI International,Menlo Park, CA, Tech. Rep., 2012, in preparation.

[33] B. Littlewood and A. Povyakalo, “On claims for the perfectionof software,” Centre for Software Reliability, City University,London, UK, Tech. Rep., Jan. 2010.

[34] ——, “Conservative reasoning about epistemic uncertainty for theprobability of failure on demand of a 1oo2 software-based systemin which one channel is “possibly perfect”,” Centre for SoftwareReliability, City University, London, UK, Tech. Rep., Jan. 2010.

[35] S. M. Ross, Stochastic Processes. New York, NY: Wiley, 1996.[36] E. Lloyd and W. Tye, Systematic Safety: Safety Assessment of Air-

craft Systems. London, England: Civil Aviation Authority, 1982,reprinted 1992.

[37] CAST Position Paper 24: Reliance on Development AssuranceAlone when Performing a Complex and Full-Time Critical Function,Certification Authorities Software Team (CAST), Mar. 2006,available from http://www.faa.gov/aircraft/air cert/designapprovals/air software/cast/cast papers/.

[38] K. Havelund and G. Rosu, “Efficient monitoring of safety prop-erties,” Software Tools for Technology Transfer, vol. 6, no. 2, pp. 158–173, Aug. 2004.

[39] H. Barringer, D. Rydeheard, and K. Havelund, “Rule systemsfor run-time monitoring: From EAGLE to RULER,” in RuntimeVerification (RV 2007), ser. Lecture Notes in Computer Science,vol. 4839. Vancouver, British Columbia, Canada: Springer-Verlag,Mar. 2007, pp. 111–125.

[40] Report on the incident to Airbus A340-642, registration G-VATL en-route from Hong Kong to London Heathrow on 8 February 2005, UKAir Investigations Branch, 2007, available at http://www.aaib.gov.uk/publications/formal reports/4 2007 g vatl.cfm.

[41] Safety Recommendations A-98-3 through -5, National Transportation Safety Board, Washington, DC, Jan. 1998, available at http://www.ntsb.gov/Recs/letters/1998/A98_3_5.pdf.

[42] A. S. Willsky, “A survey of methods for failure detection in dynamic systems,” Automatica, vol. 12, no. 6, pp. 601–611, Nov. 1976.

[43] R. D. Schlichting and F. B. Schneider, “Fail-stop processors: An approach to designing fault-tolerant computing systems,” ACM Transactions on Computer Systems, vol. 1, no. 3, pp. 222–238, Apr. 1983.

[44] J. Rushby, “Kernels for safety?” in Safe and Secure Computing Systems, T. Anderson, Ed. Blackwell Scientific Publications, 1989, ch. 13, pp. 210–220 (Proceedings of a Symposium held in Glasgow, October 1986).

[45] K. G. Wika and J. C. Knight, “On the enforcement of software safety policies,” in COMPASS ’95 (Proceedings of the Tenth Annual Conference on Computer Assurance). Gaithersburg, MD: IEEE Washington Section, Jun. 1995, pp. 83–93.

[46] A. Arora and S. S. Kulkarni, “Designing masking fault tolerance via nonmasking fault tolerance,” IEEE Transactions on Software Engineering, vol. 24, no. 6, pp. 435–450, Jun. 1998.

[47] F. Schneider, “Enforceable security policies,” ACM Transactions on Information and System Security, vol. 3, no. 1, pp. 30–50, Feb. 2000.


Bev Littlewood has degrees in mathematics and statistics, and a PhD in statistics and computer science; he is a Chartered Engineer and a Chartered Statistician. He has worked for more than 30 years on problems associated with the dependability of software-based systems, has published many papers in international journals and conference proceedings, and has edited several books. His technical contributions have largely focused on the application of rigorous probabilistic and statistical techniques in software systems engineering. In 1983 he founded the Centre for Software Reliability (CSR) at City University, London, and was its Director until 2003. He is currently Professor of Software Engineering in CSR. From 1990 to 2005 he was a member of the UK Nuclear Safety Advisory Committee. He is a member of IFIP Working Group 10.4 on Reliable Computing and Fault Tolerance and of the UK Computing Research Committee, and is a Fellow of the Royal Statistical Society. He is on the editorial boards of several international journals. In 2007 he was the recipient of the IEEE Computer Society’s Harlan D. Mills Award.


John Rushby received B.Sc. and Ph.D. degrees in computing science from the University of Newcastle upon Tyne in 1971 and 1977, respectively. He joined the Computer Science Laboratory of SRI International in 1983, and served as its director from 1986 to 1990; he currently manages its research program in formal methods and dependable systems. His research interests center on the use of formal methods for problems in the design and assurance of secure and dependable systems. Prior to joining SRI, he held academic positions at the Universities of Manchester and Newcastle upon Tyne in England.

Dr. Rushby is a former associate editor for Communications of the ACM, IEEE Transactions on Software Engineering, and Formal Aspects of Computing, and is a current member of IFIP Working Group 10.4 on Reliable Computing and Fault Tolerance. He was recently a member of a National Research Council study that produced the report “Software for Dependable Systems: Sufficient Evidence?” He is the 2011 recipient of the IEEE Computer Society’s Harlan D. Mills Award.