Software Reliability Research
Pankaj Jalote
Professor, CSE, IIT Kanpur, India
System Reliability
• System – an entity that provides defined behavior at interfaces
  • A system is a hierarchy of subsystems, each subsystem being a system
• Reliability of a system – its ability to provide failure-free operation
• Failure – the system behavior is incorrect or not as expected; it is a random phenomenon
Reliability Quantification
• Reliability of a system is defined as the probability of failure-free operation over a time period
• R(t) = probability that the system has not failed by time t
• For reliability work, the distribution of R(t) is often specified
Reliability Quantification (contd.)
• Reliability can also be quantified by Mean Time to Failure (MTTF)
• It can also be quantified by failure rate (number of failures per unit time)
• From R(t), MTTF or failure rate can be determined
• Under some assumptions, failure rate and MTTF are inversely related
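The inverse relationship stated above can be illustrated with a minimal sketch. It assumes a constant failure rate, so that R(t) is exponential; the rate value and the helper names are illustrative and do not come from the slides. The code integrates R(t) numerically to obtain the MTTF and compares it with the inverse of the failure rate.

```python
import math

def reliability(t, failure_rate):
    """R(t) for a constant failure rate: probability of no failure by time t."""
    return math.exp(-failure_rate * t)

def mttf_numeric(failure_rate, horizon, steps=200_000):
    """Approximate MTTF as the integral of R(t) from 0 to horizon (trapezoidal rule)."""
    dt = horizon / steps
    total = 0.5 * (reliability(0.0, failure_rate) + reliability(horizon, failure_rate))
    for k in range(1, steps):
        total += reliability(k * dt, failure_rate)
    return total * dt

failure_rate = 0.02  # hypothetical value: failures per hour
mttf = mttf_numeric(failure_rate, horizon=2000.0)
print(f"numeric MTTF = {mttf:.3f}, 1/failure_rate = {1 / failure_rate:.3f}")
```

The horizon only needs to be long enough that R(t) has become negligible; for other failure-time distributions the inverse relation does not hold in this simple form.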
Software Reliability
• Software (un)reliability is caused not by aging but by bugs
• The more bugs, the lower the reliability of the software
• Still, failures appear random, hence reliability theory can be applied
Software Reliability Research
• Two main threads
  • Software reliability modeling – how to model and predict software reliability
  • Improving software reliability – by removing defects through program checking, verification, testing, …
• Will discuss some of the work being done here on these two threads
Software Reliability Modeling
Software Reliability
• Software systems are often one-off
  • Measuring reliability in the lab is not practical, as too much failure data is needed, which takes time
• Failures often result in fault removal, leading to reliability improvement
  • Predicting future reliability from measured reliability is harder
• Hence different models are needed
Software Reliability Growth Models
• Assume that reliability is a function of the defect level and that, as defects are removed, reliability improves
• Model the failure-fix process of software evolution
• Many models have been proposed in the last 3 decades
• Model parameters are determined from past data on failures and fixes
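As one concrete illustration (the slides do not name a specific model), the sketch below uses the Goel-Okumoto model, a well-known reliability growth model in which the expected cumulative number of failures by time t is m(t) = a(1 − e^(−bt)). The parameter values are hypothetical; in practice a and b would be estimated from past failure data.

```python
import math

def expected_failures(t, a, b):
    """Goel-Okumoto mean value function: expected cumulative failures by time t."""
    return a * (1.0 - math.exp(-b * t))

def failure_intensity(t, a, b):
    """Derivative of the mean value function: expected failures per unit time at t."""
    return a * b * math.exp(-b * t)

a = 100.0  # hypothetical: total number of failures expected eventually
b = 0.05   # hypothetical: per-fault detection rate (per week)

for week in (0, 10, 20, 40):
    print(week,
          round(expected_failures(week, a, b), 2),
          round(failure_intensity(week, a, b), 3))
```

The failure intensity falls as faults are found and fixed, which is the reliability growth these models are meant to capture.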
Reliability of Software Products
• For software products, a large population exists in the field, and faults are not removed as failures occur
• According to software reliability growth models (SRGMs), the reliability should remain the same
• I.e., the failure rate should be constant
Average Failure Rate of an MS Product
[Figure: failure intensity (failures/month/unit) plotted against months from release; x-axis 1 to 11, y-axis 0 to 0.09]
Reasons for this Phenomenon
• Users learn with time and avoid failure-causing situations
• Users start by exploring more, then limit themselves to some part of the product
  • Most users use only a few product features
• Configuration-related failures are much more frequent at the start
• These failures reduce with time
A New Model for Product Reliability
• For a user, there is a transient failure rate, which decays by a constant factor
• With time the transient dies out, and the failure rate reaches a steady state
• The steady-state failure rate represents the reliability of the product
Failure Rate of a Unit
• Failure rate for one unit is λ(i) = λ0 * α^i + λf
• λ0 is the initial transient rate
• λf is the final steady-state rate
• α is the decay factor
[Figure: failure rate of a unit plotted against time]
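The unit failure rate formula can be sketched directly in code. This is a minimal illustration that assumes i counts whole periods since the unit entered service (starting at 0) and that 0 < α < 1; the parameter values are made up to show the shape of the curve and are not fitted to any data.

```python
def unit_failure_rate(i, lam0, lam_f, alpha):
    """Failure rate of one unit in period i: decaying transient plus steady-state rate."""
    if i < 0:
        raise ValueError("period index must be non-negative")
    return lam0 * alpha ** i + lam_f

# Hypothetical parameters, chosen only to show the shape of the curve.
lam0, lam_f, alpha = 0.10, 0.02, 0.5

rates = [unit_failure_rate(i, lam0, lam_f, alpha) for i in range(8)]
for i, rate in enumerate(rates):
    print(f"period {i}: {rate:.4f}")
```

The printed rates start at λ0 + λf and fall toward λf as the transient term shrinks.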
Applying it to a Product
• Considered the failure and sales data of a real MS product
• Applying the model to the data and determining the parameters, we get
  • λ0 = 0.04 failures/month
  • λf = 0.008 failures/month
  • α = 0.4 (i.e. the transient rate falls to 40% of its previous value each month)
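A product in the field is a mix of units of different ages, so its average failure rate in a given month depends on when the units were sold. The sketch below shows one way this could be computed from the fitted parameters quoted above. The monthly sales figures are hypothetical, and treating a unit's age as the number of months since its sale (starting at 0) is an assumption rather than something the slides state.

```python
LAM0, LAM_F, ALPHA = 0.04, 0.008, 0.4  # fitted values quoted above (failures/month)

def unit_failure_rate(age_months):
    """Per-unit failure rate at a given age: transient plus steady-state rate."""
    return LAM0 * ALPHA ** age_months + LAM_F

def average_failure_rate(sales, month):
    """Average failures/month/unit over all units sold up to and including `month`."""
    cohorts = sales[: month + 1]
    units = sum(cohorts)
    if units == 0:
        return 0.0
    expected = sum(n * unit_failure_rate(month - sold_in)
                   for sold_in, n in enumerate(cohorts))
    return expected / units

sales = [1000, 800, 600, 400, 300, 200]  # hypothetical units sold in months 0..5
averages = [average_failure_rate(sales, m) for m in range(len(sales))]
for m, avg in enumerate(averages):
    print(f"month {m}: {avg:.4f} failures/month/unit")
```

With falling sales, older units dominate the population and the average drifts toward the steady-state rate; a product with rising sales would stay above it for longer.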
Example…
• Steady-state failure rate is 1/6th of the average rate in month 2 and 1/3rd of the average rate in month 4
• I.e., the initial MTTF could be 1/6th of the steady-state MTTF
• Steady state is reached quite soon – in two to three months
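The 1/6 figure for MTTF can be checked against the fitted parameters. This is a back-of-the-envelope check that assumes i = 0 denotes a unit's first month and approximates MTTF by the inverse of the current failure rate:

```latex
\begin{align*}
\lambda(0) &= \lambda_0 \alpha^{0} + \lambda_f = 0.04 + 0.008 = 0.048 \\
\frac{\mathrm{MTTF}_{\text{initial}}}{\mathrm{MTTF}_{\text{steady}}}
  &\approx \frac{1/\lambda(0)}{1/\lambda_f} = \frac{0.008}{0.048} = \frac{1}{6} \\
\lambda(2) &= 0.04 \cdot 0.4^{2} + 0.008 = 0.0144 \\
\lambda(3) &= 0.04 \cdot 0.4^{3} + 0.008 = 0.01056
\end{align*}
```

At i = 3 the transient term contributes about 0.0026 failures/month, roughly a third of the steady-state rate, which fits the observation that a unit approaches steady state within a few months.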
Software Architecture Based Reliability Estimation
Software Architecture
• Architecture is the set of components in the system and how they are connected
• It is decided very early in a software project
• If reliability and performance can be modeled from the architecture, the architecture can be improved
• Some work is going on in architecture-based performance and reliability modeling
Program Verification
• Basic goal – to ensure that the program is free of defects (bugs) as much as possible
• Good program verification leads to higher reliability
Program Verification Techniques
• Testing – program is executed with test data to find bugs
• Static analysis – program source code is analyzed
• Dynamic analysis – program is run on some data and assertions are made
• Model checking
• Formal verification
Techniques
• Most techniques work in isolation
• Sometimes they are complementary in their defect detection capability
• Combining techniques meaningfully can improve reliability
• We are working on techniques for combining testing and static analysis
State-based Testing Automation
Testing
• Testing remains the main verification activity – most reliance is on it
• Consumes as much as half of the total effort in a software product
• Testing involves test case design, execution, checking the results, then debugging, fixing, and retesting
• Each step is expensive
Test Automation
• Test automation can help reduce cost and make testing more effective
• Most test automation approaches focus on data collection and re-testing
• Little effort has gone into complete end-to-end automation
• We are working on automating OO testing using state-based models
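To make the idea of state-based testing concrete, here is a minimal sketch for a small class. It is not the authors' tool: the `Door` class, the transition table, and the helper names are all hypothetical. The state model is a table mapping (state, method) to the expected next state; the driver derives one test per transition, drives a fresh object into the source state, calls the method, and checks the resulting state.

```python
from collections import deque

class Door:
    """Hypothetical class under test with three states: closed, open, locked."""
    def __init__(self):
        self.state = "closed"
    def open(self):
        if self.state == "closed":
            self.state = "open"
    def close(self):
        if self.state == "open":
            self.state = "closed"
    def lock(self):
        if self.state == "closed":
            self.state = "locked"
    def unlock(self):
        if self.state == "locked":
            self.state = "closed"

# State model: (current state, method name) -> expected next state.
MODEL = {
    ("closed", "open"): "open",
    ("open", "close"): "closed",
    ("closed", "lock"): "locked",
    ("locked", "unlock"): "closed",
}

def path_to(target, start="closed"):
    """Breadth-first search over the model for a method sequence reaching `target`."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if state == target:
            return path
        for (src, method), dst in MODEL.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [method]))
    raise ValueError(f"state {target!r} is unreachable in the model")

def run_transition_tests():
    """Generate and run one test per model transition; return the failures."""
    failures = []
    for (src, method), expected in MODEL.items():
        door = Door()
        for step in path_to(src) + [method]:
            getattr(door, step)()
        if door.state != expected:
            failures.append((src, method, expected, door.state))
    return failures

failures = run_transition_tests()
print("all transitions pass" if not failures else failures)
```

This sketch covers each transition once; a fuller approach would also test method calls that the model forbids in a given state and longer transition sequences.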
Summary
• Software reliability is a rich and wide area
• Exciting work is going on across the world in modeling, analysis, program checking, testing, etc.
• Lots of open issues