
RAC is a DoD Information Analysis Center Sponsored by the Defense Technical Information Center and Operated by IIT Research Institute

The Journal of the Reliability Analysis Center
First Quarter - 2000

INSIDE:
• Tutorial: Testing for MTBF
• PRISM - A New Tool from RAC
• Calls for Papers
• Special Insert - START 2000-1: Sustained Maintenance Planning
• Industry News
• From the Editor
• Calendar
• In Progress at RAC

A Discussion of Software Reliability Modeling Problems

By: Jorge Romeu, Reliability Analysis Center

Introduction

A quarter of a century has passed since the first software reliability model appeared. Many dozens more, of various types, have been developed since. And many practitioners still disagree on the practical uses of models in software managing, staffing, costing and release activities. The present article examines this situation, discusses some of its causes and suggests some approaches to improve it.

This author believes that the current user dissatisfaction stems from the manner in which reliability, as a concept, is applied to the software environment and from how the related models have evolved. This paper, therefore, begins by providing an overview of the characteristics of software reliability models and of their development efforts. This is followed by a discussion of the assumptions underlying software reliability models and other related problems. Finally, some suggestions on how to improve the situation are provided.

Software Reliability

Broadly speaking, reliability is the probability of satisfactory operation of a system or device, under specific conditions, for a specific time. In software systems, the concept of reliability is complicated by several factors. An operator and a hardware subsystem are always associated with the software. Hence, documentation, training, and interface problems can (directly or indirectly) induce software failures, thus becoming part of the reliability assessment process.

It is also important to understand the origins of the concept of software reliability. It evolved as a result of the increasing use of embedded software in already existing hardware systems. Hardware reliability had been successfully developed and understood since the early 50s. Hence, it was only natural that the same type of hardware professionals (e.g., systems and electrical engineers) would develop the first software models by extending and adapting their previously successful hardware ones. But the hardware modeling techniques, as we will later see, did not always work well in the software environment.

This lack of model portability occurred due to some basic differences between the two environments. It is true that, in general, both hardware and software reliability models can be broadly divided into three categories: Structural (theoretical), Part Count (component) and Black Box (empirical). Examples of theoretical hardware/software models are few and usually are of systems of minor complexity (independent series, parallel components). Of the second type, we find MIL-HDBK-217 models for hardware, and models based on software science or cyclomatic complexity for software. Black Box models, usually time-driven, include the Army Materiel Systems Analysis Activity (AMSAA) model for hardware, and Jelinski-Moranda, Musa, Goel-Okumoto, etc. for software. Also included are other types of empirical models based on time series, input domain, seeding, etc.

However, this author finds one substantial difference between the two modeling activities.


After its development stage, hardware is mass-produced and the resulting industrial product is used in relatively similar environments. For example, after the prototype has been developed and tested, a military helicopter is mass-produced and flown by similarly trained pilots in attack and rescue missions.

Software, on the other hand, always remains a "prototype" on its own, of which exact copies are made and used by a wide variety of people with very different interests and applications. For example, a matrix-inversion software package may be used by a high school student to solve a 2x2 system of linear equations or by a Ph.D. candidate to invert thousands of multivariate correlation matrices in a simulation study.

The modeling stage is, therefore, permanent for the (prototype) software product. Hence, its characteristics and problems have a more significant impact in this environment than in the hardware one, as we will see next.

Modeling Problems

A major problem encountered in the software reliability modeling activity arises from the involvement of two different groups of individuals, modelers and practitioners, each having a different product and process in mind and seeking a different result. The fundamental differences between these two groups make the existing software reliability models a professional success for many modelers but unsatisfactory working tools for many practitioners.

Modelers are usually researchers or academicians, while most practitioners are software developers. Academicians and researchers rely on publishing papers, which are peer reviewed and assessed for their theoretical value by other academicians and researchers, to obtain their tenure, promotion or doctorates. To publish their work, modelers use sophisticated statistical theories that require strict (and sometimes unrealistic or unjustified) underlying assumptions.

Practitioners (managers, developers), on the other hand, need to staff, cost, and release software products. Practitioners work with programmers, under time constraints, and must rely on insufficient and sometimes deficient information. Software practitioners need models and approaches that are feasible (implemented without incurring exorbitant costs or excessive burden) and practical (can be used to staff, cost, release the software, etc.).

Theoretical models are based on many mathematically driven software assumptions that, in practice, do not hold or are weak. In addition, many models of the Black Box (e.g., time-based) class fail to capture several other important factors that affect software reliability.

In general, modelers achieve their goals, but practitioners (whose needs are not met) remain unsatisfied. For, even when the software reliability models developed have indeed helped users in their work, they have not completely solved their practical problems in a satisfactory manner.

Such a dichotomy of interests is, in the opinion of this author, the source of most of the problems encountered in software reliability modeling. For models that have been based on theoretical assumptions, many times far removed from reality, cannot produce accurate results. This is not to say that other problems, such as defining software reliability, agreeing on software metrics, etc., do not complicate the matter even further. We will, in the following pages, discuss some of the major discrepancies that arise from the mentioned dichotomy between software theory and practice.

Validity of Software Reliability Model Assumptions

Some software reliability model assumptions do not hold or are weak because they have a purely theoretical (mathematical) origin. Note that not all model assumptions are invalid all the time or in all the models. Some (Black Box) model assumptions and related topics, and the reasons for their possible lack of validity, are:

• Definition and Criticality of Failures: In many cases, failures are user dependent and poorly defined. This also makes their identification in the field difficult.

• Definition of Time Units: These include calendar time, execution time, etc., which may differ substantially or may not always be accurately recorded. Some models (Musa) have found ways to deal with this by converting units from one time domain to another. The assumption also implies that testing intensity is time homogeneous.

• Fixed Number of Faults: Assumes that no additional faults are introduced and that every debugging attempt is successful. Some models (e.g., the imperfect debugging model of Goel) attempt to address these issues.

• All Faults Have the Same Failure Rate: This implies that all faults have the same probability of occurrence. But failure probability is in fact associated with input domain and user profile. Hence, all failures are not equally likely. For example, a software failure occurs only when a specific input is given. But some users may provide such input very frequently. For this user, the program will have a high failure rate. Another consequence of failures not being equally likely is that reliability will be affected by the order in which faults are discovered. Say two different testing teams have uncovered two different sequences of "n" faults. They may obtain two different reliability estimates (and this is complicated by the specific user profile).

• All Software Faults are Always Exposed: Faults are encountered only if the part of the software where they reside is exercised. If there is a fault that prevents the execution of some part of the software until it is removed, the faults that exist downstream, in that part of the code, are not exposed until the initial one is uncovered and removed.

• Faults are Immediately Removed: Testing will not usually be stopped once a fault is uncovered. Adaptive procedures (removing/patching part of the code; restricting the input) will be used to continue the testing while the fault is uncovered and fixed.

• Only One Failure at a Time Occurs: This is a required Poisson Process assumption and not all software necessarily complies with it. Multiple failures may occur simultaneously (and not all faults will be corrected before restarting the testing).

• Testing is Homogeneous: Testing effort is not always the same; personnel may vary, as well as the time dedicated to testing and to other concurrent functions. In addition, if a critical fault is uncovered or a deadline approaches, testing may become more intensive.

• Failure Rate is (only) Proportional to Error Content: There are many other factors, such as user profile, complexity of the problem, language, programming experience, etc.

• Number of Failures in Disjoint Intervals is Independent: This is another key Poisson Process assumption. There is a finite number of faults in the software, which are sequentially removed. If we encounter and remove a large number of faults in one interval, then in the next time interval there will be fewer faults to find, and vice versa. Hence, the number of faults uncovered in two disjoint, adjacent time intervals is affected by the number of faults previously uncovered.

• Times Between Failures are Independent: Since the number of failures encountered in disjoint intervals is not independent, this associated assumption is not true either.

• Testing Proceeds Only After a Fault is Removed (corrected): This is an ideal situation that does not occur in practice. Adaptive procedures are used to proceed with testing.

• All the Code is Tested, All the Time: Some testing may occur before all the code is completed. Then, if a fault is encountered and located in a given module, this fault may be removed or patched (adaptive procedure) to proceed with the testing.

• Run Time versus Think Time: Run (test) time models penalize development strategies that spend more desk (think) time analyzing the program than testing it. Calendar time captures both of these activities (think and run times), but this time measurement is weak.

• Specific Prior Distribution: Some modelers have attempted to deal with the reality of different failure rates for different faults by assuming a prior distribution and then using a Bayesian model for the reliability. The form of such a prior is selected for mathematical reasons, in order to obtain a closed form solution for the corresponding posterior.

• Reliability Growth Continues with Additional Testing Time: It is implicitly assumed that, as test time proceeds, new faults will be uncovered and removed. Hence, the software reliability will increase. This precludes the introduction of new faults, as well as the increase in program complexity brought by the maintenance operation.

• Seeded Faults Have the Same Failure Rate as Indigenous Ones: In this approach to software reliability, the developer intentionally introduces a number of faults in the program (fault "seeding"). Then the testing team "uncovers" some of them during testing, along with other "indigenous" faults (not seeded, but actual programming ones). Based on the number of indigenous and seeded faults uncovered, an estimate of the total number of program faults is obtained (see the sketch after this list). This estimation approach assumes that the complexity and location of seeded faults are the same as the complexity and program location of the indigenous faults. This is not necessarily so.

• Software Input and User Profiles are Known and Representative: Some software reliability models use input domain profiles, which are difficult to establish. Then, some users exercise some parts of the code more than others do, establishing a particular user profile. Estimating such profiles constitutes a complex statistical problem, involving multivariate parameter estimation and goodness of fit tests. In addition, these profiles are user dependent, requiring one for each different user.

• Failure Data Collection is Accurate: In software development, the basic activity is to develop good code. Data collection is usually a peripheral activity that programmers are assigned in addition to their work. Data collection forms (problem reports, etc.) are complicated, and data elements such as exact times may not be recorded accurately.
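For concreteness, the estimate referred to in the fault-seeding bullet above is usually a capture-recapture calculation (often attributed to Mills). A minimal sketch, with hypothetical counts, resting on exactly the equal-detectability assumption the bullet questions:

```python
def estimate_indigenous_faults(seeded_total, seeded_found, indigenous_found):
    """Capture-recapture (Mills-type) estimate of the total number of
    indigenous faults. Assumes every fault, seeded or indigenous, is
    equally likely to be detected -- the assumption the article questions."""
    if seeded_found == 0:
        raise ValueError("No seeded faults found; the estimate is undefined.")
    # The detected fraction of seeded faults estimates the detected fraction
    # of indigenous faults, so scale the indigenous count accordingly.
    return indigenous_found * seeded_total / seeded_found

# Hypothetical example: 20 faults seeded, 10 of them found during testing,
# along with 25 indigenous faults; the estimate is 25 * 20/10 = 50 faults.
print(estimate_indigenous_faults(20, 10, 25))  # 50.0
```

If seeded faults are easier (or harder) to find than indigenous ones, this estimate is biased, which is the point the bullet makes.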

In addition, other issues associated with software development and model use impact software reliability model assessment. Some of them are:

• Software Doesn't Wear Out with Time: This has always been a key difference between software and hardware reliability. Hardware devices "age". Software also "ages", just in a different way! As time proceeds and software maintenance occurs, new functions, modules, hardware, and capabilities are added or modified. Its complexity increases to the point that it becomes more economic to "retire" it than to continue maintaining it. This process is, conceptually, akin to hardware "aging" processes.

• Development Phases and Fault Exposure: In different software development phases, different types of faults are uncovered. Hence, combining failure data from different phases for model input may be detrimental to the overall software reliability estimation.

• Experimentation: Many software development experiments undertaken to assess software reliability models and methods have been implemented using very specific problems and subjects. This situation poses restrictions on the extrapolation of results. The experimental subjects are usually students or pre-selected professional teams. The problems are usually theoretical, or replications of available real life ones. Neither has been randomly selected, and both may be far from representative. Experimental results are still useful and provide valuable insight, but care should be exercised in their interpretation and especially in extrapolation.

• Additional Factors Not Accounted For: Most software models are affected by project requirements, software environment and documentation, user profile, programmer and management experience, and other factors not accounted for in the model. Their contribution to the unreliability of the software program is therefore not reflected in the model results.

• Initial Reliability Estimations: In some software models, initial estimates are a function of (1) the processor speed, (2) the programming language used (via expansion rates), and (3) the error exposure rate. Program size cancels out and does not constitute a factor at this early time. Other previously mentioned factors that also affect software complexity and reliability are not included in this initial estimation.

• Fault Exposure Ratios: These ratios, used for initial estimation, were obtained several years ago. In a rapidly advancing area such as software programming, where new languages, new environments (e.g., visual) and new technologies are coming out every day, such fault exposure ratios may no longer be representative. In addition, they were obtained from specific environments and projects of the past and may not represent the projects and new application areas of today.

• Language Exposure Ratios: These ratios are subject to the same criticism made of fault exposure ratios. In addition, new programming languages (e.g., Java) have appeared recently for which accurate language exposure ratios may not yet be available.

• Having a Large Pool of Software Reliability Models from Which to Choose: This constitutes an additional and serious problem. Since no single model has been completely established, software developers must choose one. For example, the developer may try fitting several models and then choose the most accurate among them, based on past behavior. However, can one be sure that past behavior always guarantees a model's correct future behavior? Model selection is not an easy task.

Some Suggestions to Improve Software Reliability Modeling

Software reliability models are used to assess the end result of the software development process. This final result (the program) is a function of at least three broad factors: people, project and environment. The people include programmers, management, testers, etc. The project is represented by its characteristics: size, complexity, requirements, functions, interfaces, etc. The environment includes all the characteristics of the software development shop: management style, software tools, methods, etc. This author believes that, in addition to using and improving the software reliability models, resources should also be dedicated to obtaining a better understanding of one's own software organization (its strengths and weaknesses) and to improving it by training its people and readjusting its methods.

In-depth forensic analysis of an organization's past work will reveal its strong and weak points. It will also provide for assessing key components of the three factors mentioned in the previous paragraph. This author also proposes that error prevention, rather than correction, be emphasized. Organizational improvements based on the results of forensic analysis may provide substantial mid- and long-range software reliability gains. The development by the Software Engineering Institute of the Capability Maturity Model (SEI/CMM), with its five-level classification of software shops (from basic practices to a continuous, quantitative improvement process), is a tacit recognition of the urgent need for organizational improvements.

It is known that about 70% of software faults result from problems introduced during the requirements definition and design phases (including software reuse problems). Dedicating more time and staff to better understanding, stating, and conveying to programmers the special requirements and design issues of each project will pay off in the end. Better training, software tools, and programmer and resource time management may also contribute to reducing programming stress and thus software errors.


Finally, forensic analysis may also show that improvements are needed in important areas such as fault avoidance and prevention techniques (tools, training), fault tolerance (recovery blocks, n-version programming), and fault detection/correction (walkthroughs).

Conclusions

This author recognizes the excellent work of many software reliability model developers. He has worked with some of them in industry and academia and respects their many talents and achievements. There are ample examples in the software engineering literature that show how software reliability models, directly and indirectly, have:

• Improved the software estimation, assessment, prediction, etc. processes

• Improved the understanding of the software development process

• Contributed to the development of new software engineering tools and techniques

However, it is also unquestionable that the problems discussed in this article affect model efficiency and accuracy and are at the heart of the dissatisfaction of some model users. For, in general, software reliability models are based upon:

• Poor data (few observations, weak measurement scales)

• Weak assumptions (invalid, incomplete, unrealistic)

• Incomplete factors (lacking many important ones)

Therefore, current software reliability models can only provide an approximation of the results that we want and need. All things considered, however, and under the circumstances discussed in this article, this is the best they can do at this time. Therefore, this author also proposes that the models be improved. This can be accomplished by seriously revisiting the modeling problems discussed in this paper. By approaching the stated model-developer versus model-practitioner dichotomy in a constructive manner, all parties may agree to:

• Not ask or expect unrealistic results from software reliability models

• Have practitioners work more closely with software model developers

• Provide an incentive for adapting (as opposed to developing new) models

• Assess each software organization's strengths and weaknesses

• Improve, correspondingly, each organization (programming/processes)

• Then, use a software reliability model, judiciously, to assess improvement, striving for error prevention rather than error correction.

Bibliography

1. Lyu, Michael R. (Ed.). Handbook of Software Reliability Engineering. IEEE Computer Society Press/McGraw-Hill, 1995.

2. Musa, J., A. Iannino, and K. Okumoto. Software Reliability: Measurement, Prediction, Application. McGraw-Hill, 1987.

3. Goel, A. "Software Reliability: Models, Assumptions, Limitations and Applicability." IEEE Transactions on Software Engineering, Vol. 11, December 1985.

4. Romeu, J. L. "Discussion of Statistical Measures to Evaluate and Compare Predictive Quality of Software Reliability Estimation Methods." Proceedings of the 1997 Biennial Meeting of the International Statistical Institute (ISI), Istanbul, Turkey.

5. Romeu, J. L. and K. Dey. "Classifying Combined Hardware/Software Reliability Models." Proceedings of the 1984 RAMS Conference, San Francisco, CA.

6. Romeu, J. L. and S. Gloss-Soler. "Some Measurement Problems Detected in the Analysis of Software Productivity Data and their Statistical Consequences." Proceedings of the 1983 COMPSAC Conference, Chicago, IL.

RAC Points of Contact

Points of contact have been assigned for many of the services and products available from RAC. For the convenience of our customers and Journal readers, a list of all of our contacts is provided below. Please contact the indicated individuals for information and for answers to your questions on the topics as noted.

RAC Point of Contact | Telephone | E-mail | Topic
Dave Dylis | 315-339-7055 | [email protected] | PRISM and data collection and sharing
Ned Criscimagna | 301-918-1526 | [email protected] | RAC Journal
Mary Priore | 315-339-7135 | [email protected] | RAC Web Site
Bruce Dudley | 315-339-7045 | [email protected] or [email protected] | Technical Inquiries
Nan Pfrimmer | 800-526-4803 | [email protected] | Training
Gina Nash | 800-526-4802 | [email protected] | Product Catalog and orders


Tutorial: Testing for MTBF

By: Anthony Coppola, Reliability Analysis Center

Tests for failure rate or its reciprocal, mean time between failures (MTBF), are based on the risks of accepting a value defined as undesirable (the consumer's risk) and of rejecting a value defined as desirable (the producer's risk). These risks are computed using the Poisson distribution, which provides the probability that any given number of failures will occur during the designated test time:

P_n = \frac{a^n e^{-a}}{n!}    (1)

where:
P_n = probability of n events occurring in some defined circumstances
a = expected number of events

For reliability testing, the expected number of events is the number of failures in a given test time. Since reliability tests define a maximum number of failures for acceptance, we will need the cumulative Poisson:

P(n \text{ or fewer}) = \sum_{x=0}^{n} \frac{a^x e^{-a}}{x!}    (2)

where:
n = maximum number of failures allowed
a = expected number of failures
x = number of events

The consumer's risk (i.e., the probability that a product with a failure rate considered unacceptable would be accepted) is determined by calculating the probability of no more than the allowable number of failures occurring, using the expected number of failures of the most reliable unit that is still considered unacceptable. The expected number is calculated from the failure distribution parameters of a product considered just unacceptable, the number of products on test, and the test time.

For example, if a group of products having a failure distribution described by a Weibull function is put on test, the expected number of failures is:

a = m(t/\theta)^\beta    (3)

where:
a = expected number of failures
m = number of units on test
t = test time on one unit
\theta = characteristic life (the time at which 63% of products will have failed)
\beta = shape parameter (when \beta < 1.0, failures occur at a decreasing rate with time; when \beta > 1.0, failures occur at an increasing rate with time)

Note: use of this equation assumes that testing starts at t = 0 for all units (none operate before the test) and that each unit is operated for the same test time (except those that fail before time t).

For example, suppose we wish to test a product that experience indicates will have a Weibull distribution of failures with \beta = 2. We decide we want to reject it if its characteristic life (\theta) is no better than 100 hours, and will accept no more than a 10% risk of a product with this life passing the test. We can test 20 units (m) for 50 hours (t) each. How many failures can we allow?

Our expected number of failures (a) would be:

a = m(t/\theta)^\beta = 20(50/100)^2 = 20(0.5)^2 = 20(0.25) = 5.0

Hence, with e^{-5} \approx 0.0067, the probability of accepting a product with the life we want to reject is:

P(n \text{ or fewer}) = \sum_{x=0}^{n} \frac{a^x e^{-a}}{x!} = \sum_{x=0}^{n} \frac{5^x e^{-5}}{x!} = \sum_{x=0}^{n} \frac{5^x (0.0067)}{x!}

For n = 0:

P_0 = \frac{5^0 (0.0067)}{0!} = \frac{1(0.0067)}{1} = 0.0067

For n = 1:

P_0 + P_1 = \frac{5^0 (0.0067)}{0!} + \frac{5^1 (0.0067)}{1!} = 0.0067 + 0.0335 = 0.0402

For n = 2:

P_0 + P_1 + P_2 = 0.0402 + \frac{5^2 (0.0067)}{2!} = 0.0402 + 0.0838 = 0.124


Hence, we can allow one failure, but not two, if we wish to keep the consumer's risk below 10%.

What of the producer's risk? It is the probability that a product with a failure rate considered acceptable (one we do not wish to reject) would be rejected. It is found by solving Equation 2 using the characteristic life considered acceptable, with the given test time and the allowable number of failures we calculated earlier. The result is the probability of the acceptable product passing the test. One minus this result is the probability that the product will fail the test and, hence, be rejected, which is the producer's risk.

For example, suppose we decided that the product we are testing should have a characteristic life of 500 hours. What is the risk that the test we devised will reject a product actually achieving this norm?

Since \theta = 500 hours, the expected number of failures for 20 units running 50 hours each would be:

a = m(t/\theta)^\beta = 20(50/500)^2 = 20(0.1)^2 = 20(0.01) = 0.2

The probability of this product having no more than one failure during the test is:

P = \sum_{x=0}^{1} \frac{a^x e^{-a}}{x!} = \frac{(0.2)^0 e^{-0.2}}{0!} + \frac{(0.2)^1 e^{-0.2}}{1!} = 0.8187 + 0.1637 = 0.98

A product with the desirable life will therefore pass the test 98% of the time, or, equivalently, will have a 2% risk of rejection (producer's risk = 0.02).
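Both risk calculations are easy to check numerically. A minimal sketch in Python (the function and variable names here are ours, not from the article):

```python
import math

def expected_failures(m, t, theta, beta):
    """Equation 3: expected failures for m units, each tested for time t,
    with Weibull characteristic life theta and shape parameter beta."""
    return m * (t / theta) ** beta

def prob_n_or_fewer(n, a):
    """Equation 2: cumulative Poisson probability of n or fewer failures
    when a failures are expected."""
    return sum(a**x * math.exp(-a) / math.factorial(x) for x in range(n + 1))

# Consumer's risk: 20 units for 50 hours each, against the just-unacceptable
# characteristic life of 100 hours (beta = 2), so a = 5.0.
a_bad = expected_failures(m=20, t=50, theta=100, beta=2)
print(prob_n_or_fewer(1, a_bad))   # ~0.040: one failure keeps the risk under 10%
print(prob_n_or_fewer(2, a_bad))   # ~0.125: two failures exceed 10%

# Producer's risk: the same test against the desired 500-hour life (a = 0.2).
a_good = expected_failures(m=20, t=50, theta=500, beta=2)
print(1 - prob_n_or_fewer(1, a_good))  # ~0.018, i.e., about a 2% risk
```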

Note that if we tested only one unit and allowed no failures, Equation 2 reduces to:

P = e^{-(t/\theta)^\beta}    (4)

Equation 4 is the expression for the reliability of a product with a Weibull distribution of failures, as should be expected (reliability = probability of no failures in a given operating time).

When a constant failure rate is assumed (\beta = 1.0), it is not even necessary for all units on test to have equal operating times. For this case, the probability of passing the test is determined by the formula:

P = \sum_{x=0}^{n} \frac{e^{-\lambda t} (\lambda t)^x}{x!}    (5)

where:
\lambda = failure rate (\lambda = 1/\theta; note that in this case, \theta is also the mean time between failures)
n = number of failures allowed
t = total test time among all units tested

For determining the consumer's risk, \lambda is set equal to the best failure rate we feel is unacceptable. For the producer's risk, \lambda is set to a value we do not wish to reject.

However, for products assumed to possess a constant failure rate, there is no need to solve the equations. Tabulations of tests have been made citing the total operating hours and number of allowable failures for selected values of the producer's risk, the consumer's risk, and the discrimination ratio (i.e., the ratio of acceptable MTBF to unacceptable MTBF). For "fixed length" tests, the units on test are operated until the allowable number of failures is exceeded (reject decision) or the total operating hours among the units on test equals the test duration (accept decision). Table 1 is one such tabulation, which clearly shows the effects of different risks and discrimination ratios on test duration and allowable number of failures.

Table 1: Fixed Length Test Plans

Producer's Risk | Consumer's Risk | Discrimination Ratio | Test Duration (multiples of "bad" MTBF) | Allowable Failures
10% | 10% | 1.5 | 45.0 | 36
10% | 20% | 1.5 | 29.9 | 25
20% | 20% | 1.5 | 21.5 | 17
10% | 10% | 2.0 | 18.8 | 13
10% | 20% | 2.0 | 12.4 | 9
20% | 20% | 2.0 | 7.8 | 5
10% | 10% | 3.0 | 9.3 | 5
10% | 20% | 3.0 | 5.4 | 3
20% | 20% | 3.0 | 4.3 | 2
30% | 30% | 1.5 | 8.0 | 6
30% | 30% | 2.0 | 3.7 | 2
30% | 30% | 3.0 | 1.1 | 0
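Any row of Table 1 can be checked against Equation 5. A short sketch verifying the 20%/20%, discrimination ratio 2.0 plan (test duration 7.8 multiples of the "bad" MTBF, 5 failures allowed):

```python
import math

def prob_accept(failures_allowed, expected_failures):
    """Probability of passing the test (Equation 5), where the expected
    number of failures is a = lambda * t = t / MTBF."""
    a = expected_failures
    return sum(a**x * math.exp(-a) / math.factorial(x)
               for x in range(failures_allowed + 1))

# Test duration t = 7.8 multiples of the "bad" MTBF, 5 failures allowed.
consumers_risk = prob_accept(5, 7.8)            # a = t / MTBF_bad = 7.8
producers_risk = 1 - prob_accept(5, 7.8 / 2.0)  # a = t / MTBF_good = 3.9
print(round(consumers_risk, 2), round(producers_risk, 2))  # 0.21 0.2
```

Both values land near the nominal 20% decision risks, as the tabulation intends.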

To reduce the test time required to reach decisions, "probability ratio sequential tests" have been devised. These tests assume a constant failure rate and require us to define an MTBF we want to reject with a certain confidence (or, equivalently, one for which we have defined an acceptable consumer's risk) and an MTBF we want to accept with a certain confidence. For these tests, the probability that an observed number of failures in an elapsed test time would have occurred if the product had the undesirable MTBF is divided by the probability that the same number of failures would have occurred if the product had the desirable MTBF. If this ratio is sufficiently small, the product is accepted. If it is sufficiently large, the product is rejected ("sufficiently" is defined by a formula based on predefined producer's and consumer's risks). If neither criterion is satisfied, the test continues. See MIL-HDBK-781, Reliability Test Methods, Plans, and Environments for Engineering Development, Qualification, and Production, for further details.

A new RAC text, "Practical Statistical Tools for the Reliability Engineer," provides basic training in probability, statistics and distributions, followed by detailed discussions of the application of these concepts to measuring reliability, demonstrating reliability, reliability growth testing, sampling, statistical quality control, and improving processes. Statistical techniques are among the most valuable tools of the reliability engineer. However, there is a dearth of statistical textbooks that are both pertinent to reliability applications and easy to understand. The discussions are detailed to the extent that practical use can be made of the tools, and the limitations of each tool are clearly defined to keep a novice from misapplication. The text is designed to be as user-friendly as possible.


PRISM - A New Tool from RAC

As reported earlier, the Reliability Analysis Center has developed a new methodology, and an associated software tool called PRISM, for estimating the failure rate of electronic systems. This methodology includes new component reliability prediction models developed by RAC, called RACRates, and means for assessing the reliability of systems due to non-component variables.

The PRISM methodology is structured to allow the user to estimate the reliability of a system in the early design stages, when little is known about the system. For example, early in the development phase of a system, a reliability estimate can be made based on a generic parts list, using default values for operational profiles and stresses. When additional information becomes available, the model allows the incremental addition of data.

PRISM provides the capability to assess the reliability of electronic systems. It is not intended to be the "standard" prediction methodology, and like any methodology, it can be misused if applied carelessly. It does not consider the effect of redundancy or perform FMEAs. The intent of PRISM is to provide the data necessary to feed these analyses.

The methodology allows modifying a base reliability estimate with process grading factors for the following failure causes: parts, design, manufacturing, system management, wearout, induced, and no defect found. These process grades correspond to the degree to which actions have been taken to mitigate the occurrence of system failure due to these failure categories. Once the base estimate is modified with the process grades, the reliability estimate is further modified by empirical data taken throughout system development and testing. This modification is accomplished using Bayesian techniques that apply the appropriate weights to the different data elements.
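PRISM's actual weighting scheme is not spelled out here, but the general idea of a Bayesian blend of a prediction with observed data can be illustrated with a conjugate gamma-exponential update. A hedged sketch (the function name and numbers are ours, purely illustrative, not PRISM's published algorithm):

```python
def update_failure_rate(lambda_pred, equivalent_hours,
                        observed_failures, observed_hours):
    """Illustrative Bayesian update (not PRISM's published algorithm):
    treat the predicted failure rate as a gamma prior worth
    `equivalent_hours` of pseudo test time, then fold in the observed
    failures over the observed hours. Returns the posterior mean rate."""
    alpha = lambda_pred * equivalent_hours + observed_failures
    beta = equivalent_hours + observed_hours
    return alpha / beta

# A prediction of 100 failures per million hours, weighted as if backed by
# 50,000 hours of pseudo data, combined with 2 failures in 10,000 test hours.
print(update_failure_rate(100e-6, 50_000, 2, 10_000))  # ~1.17e-4 per hour
```

The more empirical hours accumulate, the more the estimate is pulled toward the observed data, which is the weighting behavior the paragraph above describes.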

Advantages of this new methodology are that it uses all available information to form the best estimate of field reliability, is tailorable, has quantifiable confidence bounds, and is sensitive to the predominant system reliability drivers.

The new model adopts a broader scope in predicting reliability. It factors in all available reliability data as it becomes available on the program. It thus integrates test and analysis data, which provides a better prediction foundation and a means of estimating variances from different reliability measures.

PRISM is unique in that, as a single integrated tool, it:

• Includes a failure rate model for a system

• Includes component reliability models with acceleration factors (or pi factors) to estimate the effect on failure rate of various stress and component variables

• Allows an initial estimate of failure rate to be developed using a combination of the new "RACRate" failure rate models developed by RAC, the empirical field failure rate data contained in the RAC databases, or user-defined failure rates entered directly by the user

• Allows the initial estimate of failure rate to be adjusted for process grading factors, infant mortality characteristics, reliability growth characteristics, and environmental stresses

• Includes a factor for assessing the reliability growth characteristics of a system

• Accounts for infant mortality

• Includes a predictive software reliability model that does not require empirical data

• Allows the predicted failure rate to be combined with empirical data using a Bayesian approach

For further information, contact D. Dylis at (315) 339-7055, or visit our RAC web site at http://rac.iitri.org/PRISM to download a DEMO version of PRISM.


Calls for Papers

The 5th High Assurance Systems Engineering Symposium (HASE 2000)
November 15-17, Albuquerque, New Mexico

Special Theme for 2000: Providing Convincing Evidence of Safety

The objective of this Symposium is to provide an effective forum for original scientific and engineering advances in High-Assurance Systems design, development, and deployment, and to foster communication among researchers and practitioners working in all aspects of high assurance systems.

The home page for HASE 2000 is http://www.high-assurance.org/. Papers for the HASE 2000 conference can be submitted via snail mail or through the electronic submission page. To send papers via snail mail, please mail a hard copy of your paper to the following postal address:

Dr. Victor Winter
Principal Member of Technical Staff
High Integrity Software Systems Engineering
PO Box 5800, Department 2662
Sandia National Laboratories
Albuquerque, NM 87185-0535
Telephone: (505) 284-2696
Fax: (505) 844-9478
Web URL: http://www.sandia.gov/ast/

Military & Aerospace/Avionics COTS Conference, Exhibition & Seminar
August 22-25, Fort Collins, CO

Sponsored by the Johns Hopkins University Applied Physics Laboratory, Laurel, MD; the Naval Surface Warfare Center, Crane, IN; and the Jet Propulsion Laboratory, Pasadena, CA.

This conference is dedicated to issues assuring the highest quality, availability, reliability, and cost effectiveness of microelectronic technology and its insertion into high performance, affordable systems. Commercial-Off-The-Shelf (COTS) issues include the application of non-military Plastic Encapsulated Microcircuits (PEMs) on commercially produced printed circuit boards and assemblies used in these systems. Discussion of recent developments and future directions will assure the relevance of the material. The conference will continue to highlight the insertion of commercial technology, i.e., plastic encapsulated packaging and low cost assemblies, into military, aerospace and avionic equipment.

Prospective authors should submit five (5) copies of a one-page abstract by May 1, 2000 for review by the Program Committee. Abstracts must include the author's name(s), affiliation, and complete address, E-mail, fax and telephone number. Abstracts will be selected on the basis of technical merit, supporting test results, and overall suitability. Send abstracts to:

Conference Chairman and Coordinator
Edward B. Hakim
The Center for Commercial Component Insertion (C3I, Inc.)
2412 Emerson Avenue
Spring Lake, NJ 07762
Telephone & Fax: (732) 449-4729
E-mail: [email protected]


The appearance of advertising in this publication does not constitute endorsement by the Department of Defense or RAC of the products or services advertised.


START: Selected Topics in Assurance Related Technologies
START 2000-1: Sustained Maintenance Planning
Volume 7, Number 1

Table of Contents
• Introduction
• Background and Concept
• SMP Components
• Implementing SMP
• For Further Study
• Publications
• Other START Sheets Available
• About the Author

Introduction

Readiness is the ability of forces or equipment to deliver designed outputs without unacceptable delay. While "readiness" is associated more with combat forces, it can also be used to describe how well an enterprise is poised to impact or respond to commercial marketplaces. The term comprises human resources and equipment (e.g., weapons systems) among its many elements. Affordable readiness is that level of readiness that can be sustained within some budget or at minimum life-cycle cost. The discussion here is about the support necessary to keep equipment ready (a contribution to overall readiness) at an affordable cost. Affordable readiness encompasses four separate but related ways to look at support for weapons systems or industrial machinery:

• Total Cost of Ownership

• Sustained Maintenance Planning

• Flexible Sustainment

• Rightsourcing

That is, total cost of ownership, or life-cycle cost, cannot be minimized unless:

• maintenance planning is continually reviewed for system optimization (sustained planning);

• performance-based specifications and metrics are used to adjust existing support concepts and operations; and

• innovative procurement strategies are used to find the best sources of supply, labor, and materials to support the system.

This START sheet addresses the concept of Sustained Maintenance Planning (SMP). (The RAC has also published a closely related sheet on Flexible Sustainment.)

Background and Concept

Innovative maintenance planning and execution can extend the useful life of a system. Maintenance management's functions are to cost-effectively maintain the system to achieve mission objectives, with minimal downtime, and to introduce upgrade and modification programs that improve operational capability as required. To accomplish this, maintenance managers must plan for and execute preventive and corrective maintenance that is based on an in-depth understanding of how the system is performing when compared to design limitations. When done correctly, the useful life of a system can be extended safely, and operational readiness and system effectiveness become more affordable.

In an era of fierce competition for scarce resources, it is no longer sufficient or competitive to develop a maintenance plan during development and then implement that plan without change over the life of a system. That happened (and still does, in commercial industry) all too often when development and production (build) organizations transferred systems with static maintenance plans to owners unprepared to do the required sustaining engineering. Fortunately, modern, proven concepts (e.g., Just-In-Time logistics, Integrated Product Teams, seamless transition, etc.), in addition to powerful communications media and computer-based information systems and analytical tools, enhance the capability to do dynamic and iterative maintenance planning. Such planning, and its judicious execution, optimizes the use of scarce resources and makes readiness more affordable.

To maximize the benefits of applying these concepts, the U.S. Navy's Naval Air Systems Command (NAVAIR) identified key processes that are known collectively as Sustained Maintenance Planning (SMP). SMP is defined as:

"An iterative process that ensures the highest affordable aviation weapons system reliability by using the broad range of aviation metrics to analyze effectiveness and performance of each weapons system's maintenance programs, continually improve maintenance documentation and recommend improvements across the entire spectrum of ILS elements."



The goals of SMP are to:

• have a set of iterative processes that use maintenance metrics to evaluate the effectiveness of a maintenance plan in achieving desired availability;

• have a life-cycle process for establishing and adjusting Preventive Maintenance (PM) requirements for all levels of maintenance; and

• ensure the desired levels of reliability are obtained for a system, either through actual or near-achievement of inherent reliability, or through the identification of the most significant candidates for redesign.

High field reliability, coupled with optimized maintenance, translates directly into affordable readiness.

SMP Components

Figure 1 shows the SMP process and is similar to Figure 6-16 in the Management Manual, Draft NAVAIR 00-25-406, Design Interface Maintenance Planning. The figure shows Sustained Maintenance Planning as "In Service Support Planning." Figure 1 depicts SMP as an overarching closed-loop process that encompasses continual review of established maintenance plans to ensure the most cost-effective maintenance is being performed on fielded systems. The two key components of SMP are:

• Reliability-Centered Maintenance

• Age Exploration

Figure 1. The Sustained Maintenance Planning Process

[Figure 1 is a closed-loop diagram: the initial design and modification process (equipment design, functional breakdown, performance specs, test data, FMECA) feeds RCM decision logic and Age Exploration sampling requirements, which yield preventive maintenance requirements and the maintenance plan (LORA, logistics support, preventive and corrective maintenance documentation, supply and training support). Operational preventive/corrective maintenance and sampling inspections generate equipment performance data; reliability and maintainability data analysis, age exploration data analysis (failure distributions and rates, Weibull and trending analysis), and operational data analysis then feed back into RCM decision analysis and data updates, closing the loop.]


Reliability-Centered Maintenance (RCM) is an extensive process unto itself, and applies to all levels of maintenance. It is well defined in another Management Manual, NAVAIR 00-25-403, Guidelines for the Naval Aviation Reliability-Centered Maintenance Process. RCM is readily adaptable to any system, industrial ones included, having mission or output requirements over some system life (life cycle). Essentially, RCM is a process that can be used to first establish and then adjust maintenance procedures and activities based on projected and observed fielded system performance. That performance can be characterized through the analysis of expected and actual system or component failures, and by tracking the trends over time to determine maintenance plan effectiveness. Also, reliability can be improved by implementing the results of engineering activities such as a Failure Modes, Effects, and Criticality Analysis (FMECA). Proposed solutions and changes to maintenance plans should routinely be subjected to the rigors of cost-effectiveness analysis.

Age Exploration (AE) is the second key component of Sustained Maintenance Planning. AE tasks can range from reviews of usage or failure data (e.g., the Navy's 3-M, the Air Force's 66-1) to actual inspections or tests to monitor age-related phenomena such as wearout, fatigue, longer response times, and degradation caused by exposure or storage. Ideally, AE tasks would be limited to the durations needed to collect sufficient data and update RCM analyses.

The key to both RCM and AE is a commitment to apply the significant resources that may be required to collect and analyze performance data. It follows that only with such a commitment can SMP be meaningful and successful.

Closing the loop in Figure 1, AE program results are integrated into and with RCM activities, which in turn are used to refine maintenance plans and improve their cost-effectiveness. Reiterating these activities over the life of a system and implementing the resulting solutions cost-effectively are the essence of Sustained Maintenance Planning.

Implementing SMP

As shown in Figure 1, implementing SMP actually begins with the initial design activities of the early Acquisition Phase, during Design Interface. It is then that the initial FMEAs are accomplished, maintenance concepts are developed, critical performance parameters are defined, and data gathering and data analysis schemes are formulated. The ability to sustain that initial maintenance planning, and to refine maintenance plans as required throughout the life of a system, depends on an adequate resource commitment to gather and analyze data cost-effectively and intelligently throughout the life of the system.

For Further Study

Web Sites. Additional information on SMP can be obtained from the following web sites. In addition, many of the publications in the list that follows can be downloaded from these sites.

a. http://www.deskbook.osd.mil

b. http://www.nalda.navy.mil/rcm/403/403desc.htm

c. http://www.nalda.navy.mil

d. http://www.nalda.navy.mil/3.6/coo/

e. http://www.acq-ref.navy.mil/

Publications

a. Joint Aeronautical Commanders' Group. Flexible Sustainment Guide, Change 2. December 1998.

b. Naval Air Systems Command. Draft NAVAIR 00-25-406, Management Manual, Design Interface Maintenance Planning. Washington, D.C.: Commander, NAVAIR, January 1999.

c. Naval Air Systems Command. Maintenance Trade Cost Guidebook. Washington, D.C.: Cost Department, NAVAIR-4.2, October 1998.

d. Naval Air Systems Command. Joint Service Guide for Post Production Support Planning. Patuxent River, MD: Logistics Policy and Processes, AIR-3.6.1.1, October 1997.

e. Naval Air Systems Command. NAVAIR 00-25-403, Management Manual, Guidelines for the Naval Aviation Reliability-Centered Maintenance Process. Washington, D.C.: Commander, NAVAIR, October 1996.

f. Naval Air Systems Command. Contracting for Supportability Guide. Patuxent River, MD: Logistics Policy and Processes, AIR-3.6.1.1, October 1997.


Other START Sheets Available

RAC's Selected Topics in Assurance Related Technologies (START) sheets are intended to get you started in knowledge of a particular subject of immediate interest in reliability, maintainability, supportability and quality.

94-1 ISO 9000

95-1 Plastic Encapsulated Microcircuits (PEMs)

95-2 Parts Management Plan

96-1 Creating Robust Designs

96-2 Impacts on Reliability of Recent Changes in DoD Acquisition Reform Policies

96-3 Reliability on the World Wide Web

97-1 Quality Function Deployment

97-2 Reliability Prediction

97-3 Reliability Design for Affordability

98-1 Information Analysis Centers

98-2 Cost as an Independent Variable (CAIV)

98-3 Applying Software Reliability Engineering (SRE) to Build Reliable Software

98-4 Commercial Off-the-Shelf Equipment and Non-Developmental Items

99-1 Single Process Initiative

99-2 Performance-Based Requirements (PBRs)

99-3 Reliability Growth

99-4 Accelerated Testing

99-5 Six-Sigma Programs

These START sheets are available on-line at http://rac.iitri.org/DATA/START.

About the Author
Stephen G. Dizek is a Senior Engineer with IIT Research Institute, where he has worked on projects related to Reliability, Cost-Benefit Analysis, and Decision Support. Before joining IITRI, he worked in industry as Reliability and Sustaining Engineering Manager for the design, build, and fielding of precision, automated, robot-based materials handling systems. Earlier, he spent 15 years with Dynamics Research Corporation (DRC) and The Analytic Sciences Corporation (TASC) as Manager and Technical Director for weapons systems projects on reliability, warranty, logistics, risk analysis, decision support, and the development and implementation of sequential Bayesian techniques for assessing dormant systems. Prior to that, a 22-year United States Air Force career included acquisition assignments to the Electronic Systems Division, the MINUTEMAN SPO at the Space and Missile Systems Organization, and the Aeronautical Systems Division.

Mr. Dizek holds a BS in Mathematics (minor in Mechanical Engineering) from the University of Massachusetts, an MS in Systems Engineering (Reliability) from the Air Force Institute of Technology (AFIT), and an MS in Systems Management from the University of Southern California. He is a resident graduate of the United States Naval War College's School of Naval Command and Staff, and earned graduate-level certificates from AFIT in Systems Software Engineering and Systems Software Acquisition. He has presented numerous papers at RAMS, SOLE, NAECON, and NES/ICA symposia, workshops, and chapter meetings, and has guest-lectured at AFIT, DSMC, and TASC.

About the Reliability Analysis Center
The Reliability Analysis Center is a Department of Defense Information Analysis Center (IAC). RAC serves as a government and industry focal point for efforts to improve the reliability, maintainability, supportability and quality of manufactured components and systems. To this end, RAC collects, analyzes, archives in computerized databases, and publishes data concerning the quality and reliability of equipments and systems, as well as the microcircuit, discrete semiconductor, and electromechanical and mechanical components that comprise them. RAC also evaluates and publishes information on engineering techniques and methods. Information is distributed through data compilations, application guides, data products and programs on computer media, public and private training courses, and consulting services. Located in Rome, NY, the Reliability Analysis Center is sponsored by the Defense Technical Information Center (DTIC). Since its inception in 1968, the RAC has been operated by IIT Research Institute (IITRI). Technical management of the RAC is provided by the U.S. Air Force's Research Laboratory Information Directorate (formerly Rome Laboratory).

For further information on RAC START Sheets, contact the:
Reliability Analysis Center
201 Mill Street
Rome, NY 13440-6916
Toll Free: (888) RAC-USER
Fax: (315) 337-9932

or visit our web site at: http://rac.iitri.org


Industry News

SAE and IMechE to Publish Journal on Engine Research
The Society of Automotive Engineers (SAE) and the UK-based Institution of Mechanical Engineers (IMechE) have agreed to cooperatively publish the quarterly International Journal of Engine Research. The inaugural issue was scheduled for release in March 2000.

Featuring original papers on experimental and analytical studies of engine technology, the journal is meant to be a premier source of reference information on all aspects of engines used in a wide variety of transportation modes. Topics will include combustion engine performance, emissions control, fuel spray technology, electronic engine controls, and other key subjects.

SAE members who are editors of the journal are: Dr. Rolf D. Reitz, Department of Mechanical Engineering, University of Wisconsin, Madison; Dr. Constantine Arcoumanis, Department of Mechanical Engineering, Imperial College of Science, Technology, and Medicine, UK; and Professor Takeyuki Kamimoto, Department of Mechanical Engineering, Tokai University, Japan.

Those interested in having articles published in the new journal should contact Vivian Rathke at SAE International for a copy of the author guidelines (call 724-772-7107 or e-mail [email protected]). Subscriptions can be ordered by calling SAE Customer Sales at 724-776-4970 ([email protected]). Mention order number JER-2000SUB.

ISO Publishes Draft Interim Version of ISO 9000 Standard
In the January issue of Quality Digest, it was reported that ISO, the International Organization for Standardization, published the draft international standards (DIS) of ISO 9000, ISO 9001, and ISO 9004.

• ISO 9000, Quality Management Systems – Fundamentals and vocabulary. This revision will replace the current vocabulary standard, ISO 8402, and the guidelines for selection and use, ISO 9000-1;

• ISO 9001, Quality Management Systems – Requirements. This revision will replace the current quality assurance models, ISO 9001, ISO 9002 and ISO 9003. It will therefore become the sole ISO 9000 standard for use in third-party certification;

• ISO 9004, Quality Management Systems – Guidelines for performance improvements. This revision will replace the following current guidelines: for quality management and quality system elements, ISO 9004-1; for services, ISO 9004-2; for processed materials, ISO 9004-3; for quality improvement, ISO 9004-4; and the technical corrigendum to that document.

The release of these documents marked the beginning of a five-month period during which they will be circulated to ISO member bodies for review and ballot. Balloting should be completed by the second half of April 2000.

The national standards institutes of 90 countries make up the ISO member bodies. If 75% of the countries vote in favor of the DIS standards by April 25, 2000, the documents will be accepted for further processing as final draft international standards (FDIS). The FDISs will be balloted in the third quarter of 2000. If these drafts pass with 75% of the vote or more, the standards will be published as ISO standards in the last quarter of 2000.

The draft standards are publicly available documents that can be obtained from ISO members (a full list is posted on ISO's Web Site: http://www.iso.ch). For additional information on the revisions to ISO 9000, ISO 9001, and ISO 9004, visit the following Web Site: http://www.iso.ch/9000e/revisionstoc.html.

NIST Announces 1999 Baldrige Award Winners
Four companies will receive the Malcolm Baldrige National Quality Award at a ceremony at the Marriott Wardman Hotel in Washington, D.C. on March 13-15, 2000. The winners, named by the National Institute of Standards and Technology (NIST), which administers the award, are:

• Manufacturing category: STMicroelectronics Inc. – Region Americas (Carrollton, TX)
• Service category: BI (Minneapolis, MN)
• Service category: Ritz-Carlton Hotel Co. L.L.C. (Atlanta, GA)
• Small-business category: Sunny Fresh Foods (Monticello, MN)

The Malcolm Baldrige National Quality Award was created by Public Law 100-107, signed into law by President Ronald Reagan on August 20, 1987. The Award Program, responsive to the purposes of Public Law 100-107, led to the creation of a new public-private partnership. Principal support for the program comes from the Foundation for the Malcolm Baldrige National Quality Award, established in 1988.

The Award is named for Malcolm Baldrige, who served as Secretary of Commerce from 1981 until his tragic death in a rodeo accident in 1987. His managerial excellence contributed to long-term improvement in the efficiency and effectiveness of government. The Secretary of Commerce and the National Institute of Standards and Technology (NIST, formerly the National Bureau of Standards) were given responsibilities to develop and administer the Award with cooperation and financial support from the private sector.

For more information on the Baldrige Award and the 1999 winners, visit the NIST Web Site: http://www.quality.nist.gov/.


From the Editor

Those readers who did not receive and read the 4th Quarter issue of the RAC Journal may be surprised to not see Tony Coppola's name in this section. Tony has retired as the Editor of the Journal, and I have been given the daunting task of following in his footsteps. Since 1992, Tony worked to make the Journal one of the most respected and valuable publications in the field of reliability and associated disciplines. As important as that work has been, however, it is just one small part of the incredible contribution Tony has made to the reliability community.

When listing the true pioneers in reliability, Tony Coppola's name must be included. For over 40 years, first as a civilian with the US Air Force and then with IIT Research Institute, Tony developed the reliability concepts and tools many of us now take for granted. In 1964, before many of us even knew what reliability or effectiveness meant, Tony served on the Weapons System Effectiveness Industry Advisory Group. His selection for this group was most appropriate: he developed a mathematical model of Systems Effectiveness in 1961 while with the Rome Air Development Center, where he worked from 1956 to 1992. In another key effort, he chaired the committee on Artificial Intelligence Applications to Maintenance, a committee of the Institute for Defense Analysis Study on Reliability and Maintainability, from June to October 1984.

Tony's technical expertise has always been matched by his good old-fashioned common sense. His talents are many and include an ability to take a complex subject and make it understandable, and a gift for showing how inscrutable theory applies to practical problems. It was natural, then, for Tony to share his expertise through teaching. He served as guest instructor at the USAF Academy in 1973 and 1980; at the USAF Institute of Technology in 1964, 1969, 1974, 1977, and 1978; and at the George Washington University in 1969 and 1980. In addition, he prepared and taught 15 hours of material for a special RAC training course for Eaton Corporation in 1994, and taught both the Design Reliability and Mechanical Reliability RAC courses in 1993 and 1994.

In addition to passing on his expertise through teaching, Tony has single-handedly created a library of references for the reliability community. His more than 100 publications include Practical Statistical Tools for the Reliability Engineer, RAC STAT, 1999; The TQM Toolkit, RAC TQM, 1993; Report on Artificial Intelligence Applications to Maintenance, a 1984 study on reliability and maintainability by the Institute for Defense Analysis; A Reliability and Maintainability Management Manual, with A. Sukert, Air Force Technical Report (AFTR) RADC-TR-79-200, 1979; Bayesian Reliability Testing Made Practical, AFTR RADC-TR-81-106, 1981; and A Design Guide for Built-In-Test, AFTR RADC-TR-78-224, 1978.

Tony's contributions have not gone unnoticed. His awards include the Air Force Award for Meritorious Civilian Service (1987), the Air Force Award for Outstanding Civilian Career Service (1992), the IEEE Centennial Medal (1984), and the P.K. McElroy Award for Best Paper (1979 Annual Reliability and Maintainability Symposium). He is a fellow of the IEEE and has been listed in "Who's Who in America" since 1981.

All of us who work in reliability and maintainability owe a debt of gratitude to Tony. Happily, he will be available to the RAC staff on a very limited consulting basis for the immediate future. For me, that is a godsend. As the new editor of the RAC Journal, I will value Tony's guidance and sage advice. But, Tony, I promise to try not to lean on you too much.

To you, our readers, I pledge my best effort to maintain the level of excellence Tony has achieved in the RAC Journal. To that end, I encourage you to submit articles for the Journal and to respond to articles with letters. For it is through the free and open exchange of ideas and opinions that our body of knowledge grows. So please consider the Journal as a forum for sharing your experiences and research. Readers interested in submitting articles or having ideas for continually improving the Journal can contact me at:

[email protected]

Finally, I know all of his friends and colleagues, and readers of the RAC Journal, join me in wishing Tony a long, healthy, and rewarding retirement. And Tony - happy honks!

New RAC Catalog Available
Descriptions and price data for all RAC publications, as well as information on RAC training, consulting services, START sheets, RAC Journal subscriptions, the RAC web site, and the RAC Data Sharing Consortium, are provided in the RAC Product Catalog. A new issue is now available. To request your free copy, call RAC toll-free at (888) RAC-USER (888-722-8737) or view and print from our web site at http://rac.iitri.org/PRODUCTS/RAC_Catalog.pdf.

Ned Criscimagna


Calendar - Upcoming Events in Reliability

12th Annual Software Technology Conference
April 30-May 4, 2000, Salt Lake City, UT
Contact: Dana Dovenbarger, STC 2000 Conference Manager
OO-ALC/TISEA, 7278 4th Street, Hill AFB, UT 84056-5205
Tel: (801) 777-7411  Fax: (801) 775-4932
E-mail: [email protected]
On the web: http://www.stc-online.org

54th Meeting of the Society for Machinery Failure Prevention Technology
May 1-4, 2000, Virginia Beach, VA
Contact: Henry C. Pusey, MFPT Society
Haymarket, VA 20169-2420
Tel: (703) 754-2234  Fax: (703) 754-9743
E-mail: [email protected]
On the web: www.mfpt.org

2000 IEEE/IAS Industrial & Commercial Power Systems Technical Conference
May 7-11, 2000, Clearwater Beach, FL
Contact: Mr. James H. Beall
Tel: (727) 376-2790
E-mail: [email protected]
On the web: http://www.ewh.ieee.org/soc/ias/icps2k/

54th Annual Quality Congress and Exposition
May 8-10, 2000, Indianapolis, IN
Contact: American Society for Quality
P.O. Box 3066, Milwaukee, WI 53201-3066
Tel: (800) 248-1946  Fax: (414) 272-1734
On the web: http://aqc.asq.org/

International Conference on Metrology: Trends and Applications in Calibration and Testing Laboratories
May 16-18, 2000, Jerusalem, Israel
Contact: Conference Secretariat, ISAS International Seminars
PO Box 34001, Jerusalem, 91340 Israel
Tel: 972-2-6520574  Fax: 972-2-6520558
E-mail: [email protected]

RAC Training Program
June 20-22, 2000, Virginia Beach, VA
Contact: RAC, 201 Mill Street, Rome, NY 13440-6916
Tel: (800) 526-4803  Fax: (315) 337-9932
E-mail: [email protected]
On the web: http://rac.iitri.org/PRODUCTS/course_summaries.html

35th Annual International Logistics Symposium
August 6-10, 2000, New Orleans, LA
Contact: SOLE, 8100 Professional Place, Hyattsville, MD 20785
Tel: (301) 459-8446  Fax: (301) 459-1522
E-mail: [email protected]
On the web: http://www.sole.org

Military & Aerospace/Avionics (COTS) Conference, Exhibition & Seminar
August 22-25, 2000, Fort Collins, CO
Contact: Edward B. Hakim, The C3I
2412 Emerson Ave., Spring Lake, NJ 07762
Tel: (732) 449-4729  Fax: (732) 449-4729
E-mail: [email protected]

Durability and Damage Tolerance of Aging Aircraft Structure
October 11-13, 2000
Contact: Aerospace Short Courses, The University of Kansas Continuing Education
12600 Quivira Road, Overland Park, KS 66213-2402
Tel: (913) 897-8500  Fax: (913) 897-8540
E-mail: [email protected]
On the web: http://www.kuce.org/aero

ISTFA 2000: 26th International Symposium for Testing and Failure Analysis
November 12-16, 2000, Bellevue, WA
Contact: ISTFA Conference Administrator, ASM International
Materials Park, OH 44073-0002
Tel: (440) 338-5151
E-mail: [email protected]
On the web: http://www.edfas.org

COMADEM 2000: Condition Monitoring & Diagnostic Engineering Management Congress & Exhibition
December 3-8, 2000, Houston, TX
Contact: Henry C. Pusey, MFPT Society
4193 Sudley Road, Haymarket, VA 20169-2420
Tel: (703) 754-2234  Fax: (703) 754-9743
E-mail: [email protected]
On the web: http://www.mfpt.org

Year 2001 International Symposium on Product Quality and Integrity - Reliability and Maintainability Symposium (RAMS)
January 22-25, 2001, Philadelphia, PA
On the web: http://www.rams.org/

Also visit our Calendar webpage at http://rac.iitri.org/cgi-rac/Areas?0


In Progress at RAC

COTS Handbook to be Published
In a project for the Department of Defense, the Reliability Analysis Center and the Naval Surface Warfare Center, Crane Division, collected and summarized information specifically addressing reliability and support issues associated with the integration of commercial products into existing military systems. This collection of information also reflects the expertise and experience that RAC and NSWC have gained in the area of commercial product insertion with the DoD acquisition community.

The information is now being compiled into a handbook that is intended to assist technical personnel in supporting commercial products and, as a result, encourage increased use of commercial products. Included in the handbook will be information on best practices, lessons learned, examples and guidance. The guidance will cover support planning, configuration management, reliability analysis, and refresh/upgrade assessment and planning. The title of the new handbook will be "Supporting Commercial Products in Military Applications."

New RAC Guide for Reliability-Centered Maintenance (RCM)
The RAC has begun development of a new guide for Reliability-Centered Maintenance (RCM), Practical Application of RCM. John Moubray, a noted author on the subject of RCM, has observed that the continuing innovation of systems in all aspects of private and public sector endeavors has made reliability and availability key issues. This observation is based on research findings showing that system failures significantly impact the capability of systems to satisfy customers. Research findings also reveal that system failures can have serious safety and environmental consequences.

New developments in maintenance engineering and management have resulted in discarding the past, and now largely discredited, philosophy of scheduled servicing and breakdown repair. New maintenance practices include condition monitoring (condition-based maintenance), design for reliability and maintainability, and the increased use of decision support tools (e.g., hazard studies, criticality analyses, FMEA/FMECA).
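To make the decision-support idea concrete, the Python sketch below computes FMEA-style risk priority numbers (severity x occurrence x detection) for a few failure modes and ranks them. It is a minimal sketch under stated assumptions: the failure modes, 1-10 ratings, and scoring scheme are illustrative examples only, not taken from any RAC or NAVAIR guide.

    # Minimal sketch: FMEA-style risk priority number (RPN) ranking.
    # Failure modes and 1-10 ratings are hypothetical examples.
    failure_modes = [
        # (failure mode, severity, occurrence, detection)
        ("Pump seal leak",       7, 5, 4),
        ("Bearing wearout",      8, 3, 6),
        ("Control card failure", 9, 2, 3),
    ]

    # RPN = severity * occurrence * detection; a higher RPN means a
    # higher-priority candidate for monitoring or redesign.
    for rpn, mode in sorted(
            ((s * o * d, m) for m, s, o, d in failure_modes), reverse=True):
        print(f"RPN {rpn:4d}  {mode}")

Maintenance planners would then concentrate condition monitoring or redesign effort on the highest-RPN failure modes first.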

RCM has not been addressed in a document comparable to the RAC's Reliability Toolkit. The current version of RCM has been largely developed and refined by U.S. civil and military aviation engineers and maintenance experts. They have established the "best practices" for Reliability-Centered Maintenance used throughout the aviation industry and DoD. Virtually every major U.S. airline bases its maintenance program on the results of RCM analysis. However, RCM remains relatively unknown outside of aviation.

The new RAC product will build on and expand information existing in RAC resources, with the objective of providing a single source of information dedicated to the development and implementation of efficient and effective reliability and maintainability programs. Publication is planned for late 2000.

New START Sheets
Two new START Sheets are now available from the RAC. The first addresses Sustained Maintenance Planning and is included in this issue of the RAC Journal. The second sheet addresses the concept of Flexible Sustainment. Sustainment is defined as all of the activities required of an Integrated Weapon System Management (IWSM) single manager, in support of operating command customers, to keep a weapon system operational in both peacetime and wartime. Innovative sustainment is required to extend the useful lifetimes of all systems in a global threat environment that is by no means static. The new START sheet provides a good overview of the flexible sustainment concept and good references for further study.

All RAC START sheets can be downloaded from our web page: http://rac.iitri.org/DATA/START

Maintainability Toolkit Nearing Completion
Work is nearing completion on the development of a Maintainability Toolkit, a companion document to the RAC's Reliability Toolkit: Commercial Practices Edition. The new toolkit is a comprehensive guide to maintainability using the same format that has made the Reliability Toolkit such a best seller.

The new toolkit covers a wide range of subjects including:

· The Concept of Maintainability
· Structuring a Maintainability Program
· Source Selection
· Designing for Maintainability
· Maintainability Analysis
· Maintainability Testing
· Maintainability Data Collection and Analysis

The appendices in the Toolkit include information on sources of R&M Education, R&M Software, and Acronyms and Abbreviations.

Ned Criscimagna, a Senior Engineer with RAC, is authoring the Maintainability Toolkit.


Call: (800) 526-4802 or (315) 339-7047
Fax: (315) 337-9932
E-mail: [email protected]

Reliability Analysis Center
201 Mill Street
Rome, NY 13440-6916

Order Form

Quantity   Title                                                       US Price Each   Non-US Price   Total

________   Practical Statistical Tools for the Reliability Engineer   $75.00          $85.00         ______
________   PRISM                                                      $1995.00        $2195.00       ______

Shipping and Handling:
US orders: add $4.00 per book for first class shipments ($2.00 for RAC Blueprints).
Non-US orders: add $10.00 per book for surface mail (8-10 weeks); $15.00 per book for air mail ($25.00 for NPRD and VZAP, $40.00 for EPRD, $4.00 for RAC Blueprints).

Total

Name

Company

Division

Address

City State Zip

Country Phone Ext

Fax E-mail

Method of Payment:
[ ] Personal Check Enclosed
[ ] Company Check Enclosed (make checks payable to IITRI/RAC)

[ ] Credit Card #: ________________  Exp. Date: ________

Type (circle): American Express  VISA  Mastercard
A minimum of $25.00 is required for credit card orders.

Name on Card

Signature

Billing Address

[ ] DD1155 (Government personnel)  [ ] Company Purchase Order  [ ] Send Product Catalog


IIT Research Institute/Reliability Analysis Center
201 Mill Street
Rome, NY 13440-6916

ADDRESS SERVICE REQUESTED

Non-Profit Organization

US Postage Paid
Utica, NY

Permit No. 566

Reliability Analysis Center

(315) 337-0900  General Inquiries
(888) RAC-USER  General Inquiries
(315) 337-9932  Facsimile
(315) 337-9933  Technical Inquiries
(800) 526-4802  Publication Orders
(800) 526-4803  Training Course Info

[email protected]  Product Ordering
[email protected]  Technical Inquiries

http://rac.iitri.org Visit RAC on the Web

Contact Gina Nash at (800) 526-4802 or the above address for our latest Product Catalog or a free copy of the RAC Users Guide.


Course Dates: June 20-22, 2000

Location:
The Virginia Beach Resort Hotel
2800 Shore Drive
Virginia Beach, VA 23451
(757) 481-9000

Course Fees:
Accelerated Testing  $1,695
Design Reliability  $1,095
Mechanical Reliability  $1,095
System Software Reliability  $1,095

Accelerated Testing
This results-oriented course provides both an in-depth introduction to the underlying statistical theory and methods and a complete overview, with step-by-step guidance, of practical applications of the learned theory using ReliaSoft's ALTA, a software tool designed expressly for the analysis of accelerated life test data. Instructed by Mr. Pantelis Vassiliou.

Design Reliability
This intensive overview covers theoretical and practical aspects of reliability engineering with a focus on electrical and electronic parts and systems. Each of the most important elements of a sound reliability program is covered and supported by practical problem solving. Instructed by Norman Fuqua.

Mechanical Reliability
This Mechanical Reliability training course is a practical application of fundamental mechanical engineering to system and component reliability. Designed for the practitioner, this course covers the theories of mechanical reliability and demonstrates the supporting mathematical theory. Instructed by Ned Criscimagna.

System Software Reliability
This training course is tailored for reliability engineers, systems engineers, and software engineers and testers. Featuring hands-on software reliability measurement, analysis and design, it is intended for those individuals responsible for measuring, analyzing, designing, automating, implementing or ensuring software reliability for either commercial or government programs. Practical approaches are stressed, with many examples included. Instructed by Ann Marie (Leone) Neufelder.

Call us for more information at 1-800-526-4803 or 315-339-7036. You may also get more details and register on line at our web site at http://rac.iitri.org/PRODUCTS/enrollment_info.html

RAC Training Program
The Reliability Analysis Center invites you to join us in Virginia Beach, VA, June 20-22, 2000, for courses on Accelerated Testing, Design Reliability, Mechanical Reliability and System Software Reliability. RAC has been teaching the latest advances in reliability engineering for over twenty-five years. Each of these three-day courses is presented by instructors offering extensive practical experience coupled with deep technical knowledge. Designers, practitioners and managers will leave better prepared, with the tools and vision to make reliability engineering an integral part of the product development process.