Page 1: An Empirical Study on Reliability Modeling for Diverse Software Systems

An Empirical Study on Reliability Modeling for Diverse Software Systems

Xia Cai and Michael R. Lyu

Dept. of Computer Science & Engineering, The Chinese University of Hong Kong

Page 2: An Empirical Study on Reliability Modeling for Diverse Software Systems

Outline

  • Introduction
  • Objectives and previous work
  • Analyses and investigations on reliability models for diverse software systems
    • Reliability bounds model by Popov, Strigini, et al.
    • System reliability model by Dugan and Lyu
  • Discussion
  • Conclusion

Page 3: An Empirical Study on Reliability Modeling for Diverse Software Systems

Introduction

  • Design diversity is one of the two main techniques for software fault tolerance
  • The rationale of this approach is the expectation that software programs built differently will fail differently
  • Reliability models attempt to estimate the probability of coincident failures in multiple versions
  • Empirical data are in high demand for evaluation and cross-validation of the usefulness and/or effectiveness of these models

Page 4: An Empirical Study on Reliability Modeling for Diverse Software Systems

Reliability models for design diversity

  • Eckhardt and Lee (1985)
    • Variation of difficulty on demand space
    • Positive correlations between version failures
  • Littlewood and Miller (1989)
    • Forced design diversity
    • Possibility of negative correlations
  • Dugan and Lyu (1995)
    • Markov reward model
  • Tomek and Trivedi (1995)
    • Stochastic reward net
  • Popov, Strigini et al. (2003)
    • Subdomains on demand space
    • Upper/lower bounds for failure probability

(Slide annotation groups these models as conceptual models, structural models, and models in between.)

Page 5: An Empirical Study on Reliability Modeling for Diverse Software Systems

Our objectives

  • To study reliability and fault correlation issues in design diversity by means of mutation testing
  • To investigate and compare the prediction performance of different existing reliability models for design diversity

Page 6: An Empirical Study on Reliability Modeling for Diverse Software Systems

Our previous work

  • Motivated by the lack of empirical data, we conducted the RSDIMU project in the year 2002
  • It took more than 100 students 12 weeks to develop 34 program versions
  • 1200 test cases were executed on these program versions
  • 426 mutants were generated by injecting a single fault identified in the testing phase
  • A number of analyses and evaluations were conducted in our previous work

Page 7: An Empirical Study on Reliability Modeling for Diverse Software Systems

Outline

  • Introduction
  • Objectives and previous work
  • Analyses and investigations on reliability models for diverse software systems
    • Reliability bounds model by Popov, Strigini, et al. (PS model)
    • System reliability model by Dugan and Lyu (DL model)
  • Discussion
  • Conclusion

Page 8: An Empirical Study on Reliability Modeling for Diverse Software Systems

PS Model

  • Proposed by P. T. Popov, L. Strigini, J. May and S. Kuball (2003)
  • Target: give the upper and "likely" lower bounds for the probability of coincident failures
  • Assumptions: given the knowledge on disjoint subdomains Si of the demand space, i.e.,
    1) the probability P(Si) of a random demand being drawn from Si;
    2) the probabilities of failure on demand (pfds) of A and B for demands from Si, PA|Si and PB|Si.

Page 9: An Empirical Study on Reliability Modeling for Diverse Software Systems

PS Model (cont'd)

  • Alternative estimates for the probability of failure on demand (pfd) of a 1-out-of-2 system

Page 10: An Empirical Study on Reliability Modeling for Diverse Software Systems

PS Model (cont'd)

  • Upper bound of system pfd
  • "Likely" lower bound of system pfd, under the assumption of conditional independence
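Added example (mine, not code from the slides or the authors): the PS bounds for a 1-out-of-2 system computed from constructed subdomain data. It assumes the PS formulas are the upper bound Σ P(Si)·min(PA|Si, PB|Si) and the "likely" lower bound Σ P(Si)·PA|Si·PB|Si, and it checks the two exceptions named on the Discussion slide: a version that is worse in every subdomain, and negative covariance between the two versions' subdomain pfds.

```python
# Added example, not from the slides. Assumed PS-model formulas for a
# 1-out-of-2 system with disjoint subdomains S_i:
#   upper bound        = sum_i P(S_i) * min(PA|S_i, PB|S_i)
#   likely lower bound = sum_i P(S_i) * PA|S_i * PB|S_i
#   (A and B assumed conditionally independent inside each subdomain)

def overall_pfd(profile, pfd):
    """Version pfd over the whole demand space: sum_i P(S_i) * P|S_i."""
    return sum(p * x for p, x in zip(profile, pfd))


def ps_bounds(profile, pfd_a, pfd_b):
    """Return (upper, likely_lower) bounds on the joint pfd of A and B."""
    if abs(sum(profile) - 1.0) > 1e-9:
        raise ValueError("demand profile P(S_i) must sum to 1")
    if not len(profile) == len(pfd_a) == len(pfd_b):
        raise ValueError("need one pfd per subdomain for each version")
    upper = sum(p * min(a, b) for p, a, b in zip(profile, pfd_a, pfd_b))
    lower = sum(p * a * b for p, a, b in zip(profile, pfd_a, pfd_b))
    return upper, lower


def compare(profile, pfd_a, pfd_b):
    """PS bounds next to the subdomain-free bounds min(PA, PB) and PA*PB.

    cov is Cov(PA|S, PB|S) under the demand profile. The likely lower
    bound equals PA*PB + cov, so the sign of cov decides whether it lies
    above PA*PB (tighter) or below it (not tighter).
    """
    pa, pb = overall_pfd(profile, pfd_a), overall_pfd(profile, pfd_b)
    upper, lower = ps_bounds(profile, pfd_a, pfd_b)
    cov = sum(p * (a - pa) * (b - pb)
              for p, a, b in zip(profile, pfd_a, pfd_b))
    return {
        "upper": upper, "lower": lower, "cov": cov,
        "min_ab": min(pa, pb), "prod_ab": pa * pb,
        # when one version is worse in every subdomain, upper == min(PA, PB)
        "upper_tighter": upper < min(pa, pb) - 1e-12,
        "lower_tighter": lower > pa * pb + 1e-12,
    }


# Constructed data (not project data): four subdomains and one demand profile.
PROFILE = [0.4, 0.3, 0.2, 0.1]
# Pair 1: the two versions fail mostly in different subdomains.
A_DIFF = [0.10, 0.01, 0.02, 0.00]
B_DIFF = [0.01, 0.10, 0.02, 0.05]
# Pair 2: version A is worse than B in every subdomain.
A_WORSE = [0.10, 0.05, 0.08, 0.02]
B_BETTER = [0.02, 0.01, 0.03, 0.01]

if __name__ == "__main__":
    for name, a, b in (("fail differently", A_DIFF, B_DIFF),
                       ("A worse everywhere", A_WORSE, B_BETTER)):
        r = compare(PROFILE, a, b)
        print(f"{name}: upper={r['upper']:.5f} vs min={r['min_ab']:.5f}; "
              f"lower={r['lower']:.5f} vs PA*PB={r['prod_ab']:.5f}; "
              f"cov={r['cov']:+.6f}")
```

For the first pair the upper bound (0.011) is well below min(PA, PB) (0.043) but the negative covariance puts the lower bound below PA*PB; for the second pair the upper bound coincides with min(PA, PB) because A is worse in every subdomain.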

Page 11: An Empirical Study on Reliability Modeling for Diverse Software Systems

Experimental setup

  • Mutants are treated as program versions in our experiment
  • 1200 test cases are divided into seven categories by the system status
  • The first 800 test cases (manually designed for functionality testing) are used as the qualification test and the other 400 test cases (randomly generated) as the operational test

Page 12: An Empirical Study on Reliability Modeling for Diverse Software Systems

(Slide diagram of the analysis flow; only its box labels survive: programs that passed the qualification test; information on subdomains; failure data and demand profile; upper bounds; lower bounds; faults in the operational test; hypothetical and real subdomains; analysis.)

Page 13: An Empirical Study on Reliability Modeling for Diverse Software Systems

Estimation Method

  • Since no failure was observed in some subdomains, we adopt the confidence bounds method rather than the point estimates method in our experiment
  • One-sided confidence bounds (Bayesian bounds) are computed for the probabilities of failure
  • 90% confidence upper bounds as well as lower bounds on pfds of mutants in subdomains under all demand profiles were estimated
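Added example (mine, not the authors' code) of the estimation method above: one-sided 90% Bayesian bounds on a subdomain pfd from k failures in n demands, which still give a non-zero lower bound and a finite upper bound when no failure was observed. It assumes a uniform prior and constructed demand and failure counts, estimates P(Si) from the share of demands per subdomain, and feeds the per-subdomain upper bounds into a PS-style joint upper bound Σ P(Si)·min(uA_i, uB_i).

```python
# Added example, not from the slides. One-sided Bayesian bounds on a
# subdomain pfd from k failures in n demands. Assumption: a uniform
# Beta(1, 1) prior, so the posterior is Beta(k+1, n-k+1); its CDF at x
# equals P(Binomial(n+1, x) >= k+1), which needs no SciPy and is exact
# for the few hundred demands per subdomain used here.
from math import comb


def posterior_cdf(x, k, n):
    """P(pfd <= x | k failures in n demands) under the uniform prior."""
    m = n + 1
    return sum(comb(m, j) * x ** j * (1 - x) ** (m - j)
               for j in range(k + 1, m + 1))


def bayesian_bound(k, n, confidence=0.9, upper=True, tol=1e-10):
    """Upper bound u with P(pfd <= u) = confidence, or lower bound l with
    P(pfd >= l) = confidence, found by bisection on the posterior CDF."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    target = confidence if upper else 1.0 - confidence
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if posterior_cdf(mid, k, n) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def joint_upper_bound(demands, fails_a, fails_b, confidence=0.9):
    """Per-subdomain upper bounds for A and B, then the PS-style upper bound
    on the joint pfd, sum_i P(S_i) * min(uA_i, uB_i), with P(S_i) taken as
    the share of demands that fell into S_i."""
    total = sum(demands)
    ua = [bayesian_bound(k, n, confidence) for k, n in zip(fails_a, demands)]
    ub = [bayesian_bound(k, n, confidence) for k, n in zip(fails_b, demands)]
    joint = sum(n / total * min(a, b) for n, a, b in zip(demands, ua, ub))
    return ua, ub, joint


# Constructed data (not project data): 400 operational demands over four
# subdomains and the failure counts of two mutants.
DEMANDS = [160, 120, 80, 40]
FAILS_A = [3, 0, 1, 0]
FAILS_B = [0, 2, 1, 0]

if __name__ == "__main__":
    ua, ub, joint = joint_upper_bound(DEMANDS, FAILS_A, FAILS_B)
    print("A 90% upper bounds per subdomain:", [round(u, 4) for u in ua])
    print("B 90% upper bounds per subdomain:", [round(u, 4) for u in ub])
    print("upper bound on joint pfd:", round(joint, 5))
    # zero failures still yield a non-zero 90% lower bound
    print("0 of 40 failed, 90% lower bound:",
          round(bayesian_bound(0, 40, upper=False), 5))
```

With zero failures in n demands the posterior CDF reduces to 1 - (1 - x)^(n+1), so the 90% upper bound is 1 - 0.1^(1/(n+1)); the bisection reproduces that closed form and also covers k > 0.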

Page 14: An Empirical Study on Reliability Modeling for Diverse Software Systems

Bayesian Bounds under DP4

  • 90% confidence upper bounds on pfds in subdomains
  • 90% confidence lower bounds on pfds in subdomains

Page 15: An Empirical Study on Reliability Modeling for Diverse Software Systems

Upper bounds

  • Upper bounds on the joint pfds under all Demand Profiles

Page 16: An Empirical Study on Reliability Modeling for Diverse Software Systems

Lower Bounds

  • "Likely" lower bounds on the joint pfds under Demand Profiles

Page 17: An Empirical Study on Reliability Modeling for Diverse Software Systems

Analysis on upper/lower bounds

| Mutant pairs | Failure features | Performance comparison | Covariance in failures | Upper bounds | Lower bounds |
|---|---|---|---|---|---|
| (117, 305) | No correlation observed | Fail differently | Positive (DP1); negative (others) | Smaller than min(PA, PB) | Larger than PA*PB in DP1 |
| (215, 382) | Correlation observed | Mutant 382 performs worse in all subdomains | Always positive | Equal to P215 | Larger in all DPs |
| (382, 403) | Correlation observed | Perform differently | Positive (DP1&2); negative (DP3&4) | Smaller than min(PA, PB) | Larger in DP1&2 |

Page 18: An Empirical Study on Reliability Modeling for Diverse Software Systems

Discussion

  • With our data, the confidence bounds in the PS model are tighter than PA*PB and min(PA, PB) under most circumstances, except
    • One program performs worse than the other in all subdomains
    • Negative covariance holds between the failure probabilities of the two programs
  • Difficulties and limitations of the PS model
    • The way to divide the demand space into disjoint subdomains
    • The thorough knowledge on the probability and performance of all the versions in each subdomain

Page 19: An Empirical Study on Reliability Modeling for Diverse Software Systems

DL Model

  • Proposed by Dugan and Lyu (1995)
  • 3-level reliability model
    • A Markov model detailing the system structure
    • Two fault trees presenting the causes of failures in the initial configuration and the reconfigured state
  • Assumptions
    • Unrelated faults: different erroneous results
    • Related faults: similar erroneous results

Page 20: An Empirical Study on Reliability Modeling for Diverse Software Systems

DL Model

  • Example: Reliability model of DRB

Page 21: An Empirical Study on Reliability Modeling for Diverse Software Systems

DL Model (cont'd)

  • Fault tree models for 2-, 3-, and 4-version systems

Page 22: An Empirical Study on Reliability Modeling for Diverse Software Systems

Results of DL model with our project data

  • The new experimental data is applied to verify the effectiveness and consistency of the DL model
  • Six mutants with various failure characteristics are employed in the operational test

Page 23: An Empirical Study on Reliability Modeling for Diverse Software Systems

Results of DL model with our project data

  • Failure characteristics for 2-, 3-, 4-version configurations

Page 24: An Empirical Study on Reliability Modeling for Diverse Software Systems

Results of DL model with our project data

  • Summary of parameter values
    • Prob. of related faults between two versions
    • Prob. of unrelated faults
    • Prob. of related faults in all versions

Page 25: An Empirical Study on Reliability Modeling for Diverse Software Systems

Results of DL model with our project data

  • Predicted reliability by different configurations

Page 26: An Empirical Study on Reliability Modeling for Diverse Software Systems

Results of DL model with our project data

  • Predicted safety by different configurations

Page 27: An Empirical Study on Reliability Modeling for Diverse Software Systems

Discussion

  • Comparing our project with the former project, the reliability and safety performance of DRB, NVP and NSCP shows consistency of the DL model with respect to our experimental data
  • The discrepancy in the first thousands of hours may indicate dependence on operational domains
  • The simplified classification of related and unrelated faults needs to be improved by including real-life scenarios
  • To achieve more accurate results, the information about the correlation between successive executions should be included

Page 28: An Empirical Study on Reliability Modeling for Diverse Software Systems

Comparison of PS & DL Model

| | PS Model | DL Model |
|---|---|---|
| Assumptions | The whole demand space can be partitioned into disjoint subdomains; knowledge on subdomains should be given | The faults among program versions can be classified into unrelated faults and related faults |
| Prerequisite | 1. Probability of subdomains; 2. Failure probabilities of programs on subdomains | 1. Number of unrelated and related faults among versions; 2. Probability of hardware and decider failure |
| Target system | Specific 1-out-of-2 system configurations | All multi-version system combinations |
| Measurement objective | Upper and lower bounds for failure probability | Average failure probability |
| Experimental results | Gives tighter bounds under most circumstances, yet whether they are tight enough needs further investigation | The prediction results agree well with observation, yet may deviate for a specific system |

Page 29: An Empirical Study on Reliability Modeling for Diverse Software Systems

Conclusion

  • Mutants are employed to investigate the prediction performance of two reliability models
  • Advantages, limitations and performance of the PS and DL models are compared
  • With our data, the confidence bounds in the PS model are tighter than PA*PB and min(PA, PB) under most circumstances

Page 30: An Empirical Study on Reliability Modeling for Diverse Software Systems

Conclusion

  • The PS approach is helpful with our data to analyze the behaviors of the versions under subdomains in revealing the features of fault correlation among diverse programs
  • Our analyses with the DL model about the reliability and safety features of DRB, NVP and NSCP are consistent with the original experiment, although there are crossovers in the first thousands of hours in the reliability curves

Page 31: An Empirical Study on Reliability Modeling for Diverse Software Systems

Future work

  • More test cases should be employed for cross-validation on the prediction accuracy of the PS model and DL model
  • Other existing reliability models can be applied for further comparisons with our experimental data

Page 32: An Empirical Study on Reliability Modeling for Diverse Software Systems

Q & A

Thank you!