Predicting and Estimating Nov 06

  • 8/21/2019 Predicting and Estimating Nov 06

    1/89

    1Linda M. Laird 2004All Rights Reserved

    What will the reliability be?

  • 8/21/2019 Predicting and Estimating Nov 06

    2/89

    2 2003 Linda LairdAll Rights Reserved

    Predicting and Estimating Software Reliability

    Linda M. Laird

    SRE 689

  • 8/21/2019 Predicting and Estimating Nov 06

    3/89

    3Linda M. Laird 2004All Rights Reserved

    System Reliability Tasks

    [Figure: flowchart mapping reliability tasks onto the parallel hardware and software development tracks (requirements analysis, preliminary design, detailed design, fabrication or coding and unit test, integration and test). System reliability requirements feed a system HW/SW reliability model and HW/SW reliability allocations. HW/SW reliability predictions, growth testing, and demo testing produce assessment reports to the program manager and engineering manager, with design correction, reallocation, or reassignment of resources when results are not OK.]

    Source: Lakey and Neufelder

  • 8/21/2019 Predicting and Estimating Nov 06

    4/89

    4Linda M. Laird 2004All Rights Reserved

    Projecting Reliability Agenda

    Motivation

    Prediction vs. Estimation

    Model Overview

    Predicting Defect Densities

    Predicting Failures From Defect Densities

    Estimating Failures from Testing

    Execution Time vs. Calendar Time

    Estimating Failure Models

    Reliability Growth

    Reliability Estimation

    This Week

    Next Week

  • 8/21/2019 Predicting and Estimating Nov 06

    5/89

    5Linda M. Laird 2004All Rights Reserved

    You need a view of expected reliability throughout the project, so you can tell whether or not you are going to hit the requirements.

  • 8/21/2019 Predicting and Estimating Nov 06

    6/89

    6Linda M. Laird 2004All Rights Reserved

    And if you are going to miss.

    What can you do about it?

    Plenty

  • 8/21/2019 Predicting and Estimating Nov 06

    7/89

    7Linda M. Laird 2004All Rights Reserved

    (So, quick: what are some ways you can improve the reliability?)

    Fault Tolerance, Reduce Defects, Increase Reviews, Reduce Complexity, Redundancy (maybe), etc.

  • 8/21/2019 Predicting and Estimating Nov 06

    8/89

    8Linda M. Laird 2004All Rights Reserved

    Why else Predict the Reliability?

    When to Ship

    Objective statement of quality of product

    Resource planning for maintenance

  • 8/21/2019 Predicting and Estimating Nov 06

    9/89

    9Linda M. Laird 2004All Rights Reserved

    Prediction and Estimation

    What's the difference?

    http://www.bmc.riken.go.jp/sensor/Huang/Localization/estimation.gif
  • 8/21/2019 Predicting and Estimating Nov 06

    10/89

    10Linda M. Laird 2004All Rights Reserved

    Prediction & Estimation

    Predictions: based on historical reliability data and knowledge from other projects

    Estimations: based on reliability data for this project and reliability models

    Prediction: used in earlier stages

    Estimation: used in later stages (when you have more data)

  • 8/21/2019 Predicting and Estimating Nov 06

    11/89

    11Linda M. Laird 2004All Rights Reserved

    Prediction Semantics

    The terms prediction and estimation are frequently misused (including in these lectures).

    What is important is the concept: are you using historical data from other projects (predicting), or are you trending from this project (estimating)?

  • 8/21/2019 Predicting and Estimating Nov 06

    12/89

    12Linda M. Laird 2004All Rights Reserved

    Reliability Prediction

    Used to predict a product's reliability, or to predict the number of latent defects when the product becomes available to users

    We would like to predict:

    Fault Density (per phase)

    Fault Profile

    Initial Failure Rate

    Final Failure Rate

  • 8/21/2019 Predicting and Estimating Nov 06

    13/89

    13Linda M. Laird 2004All Rights Reserved

    So how do we predict what the reliability will be?

  • 8/21/2019 Predicting and Estimating Nov 06

    14/89

    14Linda M. Laird 2004All Rights Reserved

    Prediction Model steps

    You can either make a prediction for each step, or use actual data if that step has already occurred.

    Multiple methodologies exist for each step

    The issue remains of predicting failures from faults. Be careful: it is a weak link.

    Steps: Fault Profile & Defect Density -> Initial Failure Rate -> Delivered and On-going Failure Rate

  • 8/21/2019 Predicting and Estimating Nov 06

    15/89

    15Linda M. Laird 2004All Rights Reserved

    Predicting Fault Density and Distribution

    Typical Distribution of Faults

    Defect Prediction Models

    Dynamic: Rayleigh, Exponential, S-Curve Models (Fault Injection)

    Static: COQUALMO Model; Based on Process; RL-TR-92-52

    Industry Data, such as SEI Delivered Fault Data

    Local Models: Historical Data

    Note: much of this material is from CS533. It is included as a review.

  • 8/21/2019 Predicting and Estimating Nov 06

    16/89

    16Linda M. Laird 2004All Rights Reserved

    Typical Fault Distributions

  • 8/21/2019 Predicting and Estimating Nov 06

    17/89

    17Linda M. Laird 2004All Rights Reserved

    Defect Dynamics and Behaviors

    Defects have certain dynamics, behaviors, and patterns which are important to understand in order to understand the dynamics of software development

  • 8/21/2019 Predicting and Estimating Nov 06

    18/89

    18Linda M. Laird 2004All Rights Reserved

    Projected Software Defects

    In general, defect arrivals follow a Rayleigh distribution curve. Based upon project size and past defect densities, you can predict the curve, along with the upper and lower control bounds.

    [Figure: defects vs. time, showing the projected Rayleigh curve between an upper limit and a lower limit.]

    $F(t) = 1 - e^{-(t/c)^2}$

    $f(t) = (2/t)\,(t/c)^2\, e^{-(t/c)^2}$

    Recall that F(t) is the cumulative distribution function, f(t) is the probability density function, t is time, and c is a constant.

  • 8/21/2019 Predicting and Estimating Nov 06

    19/89

    19Linda M. Laird 2004All Rights Reserved

    Defects detected tend to follow a curve similar to the staffing curve

    [Figure: people (staffing) and defects plotted against time.]

    Source: Industrial Strength Software, Putnam & Myers, IEEE, 1997

  • 8/21/2019 Predicting and Estimating Nov 06

    20/89

    20Linda M. Laird 2004All Rights Reserved

    Which is related to Code Production Rate

    [Figure: people, defects, and code production rate plotted against time, with the test period marked.]

    And all tend to follow Rayleigh curves

    Note: the period during test is similar to an exponential curve

    Source: Putnam & Myers

  • 8/21/2019 Predicting and Estimating Nov 06

    21/89

    21Linda M. Laird 2004All Rights Reserved

    Defect Prediction/Estimation Models

    Total number of defects

    Distribution of defects over time

  • 8/21/2019 Predicting and Estimating Nov 06

    22/89

    22Linda M. Laird 2004All Rights Reserved

    Defect Model Types: Static and Dynamic

    Dynamic models are usually based on statistical distributions of faults found (aka "estimated")

    Two types:

    One that models the entire development process: Rayleigh distributions

    One that models the testing/deployment process: Exponential and S-Curve models

    They work better in the large, on projects, when you need to estimate when/if the project will fail.

    Static models use attributes of the program to estimate the number of defects (aka "predicted")

    Typically of the form y = f(a, b, c, d, e), where y is the defect rate or number of defects, and a, b, c, ... are attributes of the product, process, and/or project

    COQUALMO, the RL-TR-92-52 model, industry data, and local historical data are all static models

    They usually work better at the module level, to give engineers an indication of where to focus

  • 8/21/2019 Predicting and Estimating Nov 06

    23/89

    23Linda M. Laird 2004All Rights Reserved

    Total Defects and Defect Distribution

    If you have fault data, you can estimate the

    total number of faults and the distribution.

    Via Calculations or Tools, using predictive models

    Method for the three primary distributions:

    Rayleigh

    Exponential

    S-Curves

    If you don't have fault data, you use historical

    data from other projects and static models

  • 8/21/2019 Predicting and Estimating Nov 06

    24/89

    24Linda M. Laird 2004All Rights Reserved

    Development Phase Model Applicability

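    [Figure: development timeline with two milestones (start tracking defects; start independent testing) and the ranges over which static models, the Rayleigh model, and the Exponential (Reliability Growth) model and S-curves apply.]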

  • 8/21/2019 Predicting and Estimating Nov 06

    25/89

    25Linda M. Laird 2004All Rights Reserved

    Exponential and S-Shaped Distributions

    [Figure: cumulative failures found (e.g. F(t)) vs. time, for an S-shaped curve and an exponential curve.]

  • 8/21/2019 Predicting and Estimating Nov 06

    26/89

    26Linda M. Laird 2004All Rights Reserved

    Exponential and S-Shaped Distributions

    [Figure: defects found (arrival distribution f(t)) vs. time, for an S-shaped arrival curve and an exponential curve.]

  • 8/21/2019 Predicting and Estimating Nov 06

    27/89

    27Linda M. Laird 2004All Rights Reserved

    S curves: Overview

    They resemble an S: a slow start, then a much quicker discovery rate, and then a slow tail-off at the end

    Based upon the view that the software defect removal process consists of defect detection, defect isolation, and defect correction, and all of them take time.

    Multiple S-curve models exist, all based upon the non-homogeneous Poisson process for the arrival distribution

    One equation:

    $M(t) = K\,[\,1 - (1 + \lambda t)\, e^{-\lambda t}\,]$

    where M(t) is the expected number of failures by time t, K is the total number of failures, and λ is the detection rate parameter
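The S-curve equation can be explored numerically. The following is a minimal sketch, assuming the delayed S-shaped form M(t) = K[1 - (1 + λt)e^(-λt)]; the function names and the parameter name `lam` (for λ) are illustrative choices, not part of the lecture.

```python
import math

def s_curve_cumulative(t, K, lam):
    """Expected cumulative failures M(t) = K * (1 - (1 + lam*t) * exp(-lam*t))."""
    return K * (1.0 - (1.0 + lam * t) * math.exp(-lam * t))

def s_curve_rate(t, K, lam):
    """Arrival rate dM/dt = K * lam^2 * t * exp(-lam*t); it peaks at t = 1/lam."""
    return K * lam * lam * t * math.exp(-lam * t)

if __name__ == "__main__":
    K, lam = 100.0, 0.5  # hypothetical values, chosen only for illustration
    for t in (0, 1, 2, 4, 8, 16):
        print(t, round(s_curve_cumulative(t, K, lam), 1), round(s_curve_rate(t, K, lam), 1))
```

The slow start, faster middle, and slow tail-off show up as an arrival rate that rises from zero, peaks at t = 1/λ, and then decays.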

  • 8/21/2019 Predicting and Estimating Nov 06

    28/89

    28Linda M. Laird 2004All Rights Reserved

    Rayleigh & Exponential Curves

    In the family of Weibull curves, which have the form:

    $F(t) = 1 - e^{-(t/c)^m}$

    $f(t) = (m/t)\,(t/c)^m\, e^{-(t/c)^m}$

    For m = 1: Exponential Distribution

    For m = 2: Rayleigh Distribution
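As a small illustration of how one Weibull form covers both cases, the sketch below implements F(t) = 1 - e^(-(t/c)^m) and f(t) = (m/t)(t/c)^m e^(-(t/c)^m) and evaluates them for m = 1 and m = 2. The function names and the sample values of t and c are assumptions made for the example.

```python
import math

def weibull_cdf(t, c, m):
    """F(t) = 1 - exp(-(t/c)^m)."""
    return 1.0 - math.exp(-((t / c) ** m))

def weibull_pdf(t, c, m):
    """f(t) = (m/t) * (t/c)^m * exp(-(t/c)^m), defined for t > 0."""
    return (m / t) * ((t / c) ** m) * math.exp(-((t / c) ** m))

if __name__ == "__main__":
    t, c = 2.0, 2.0  # illustrative values
    for m, name in ((1, "exponential"), (2, "Rayleigh")):
        print(name, round(weibull_pdf(t, c, m), 3), round(weibull_cdf(t, c, m), 3))
```

With m = 1 the density reduces to (1/c)e^(-t/c), the exponential distribution with rate 1/c; with m = 2 it is the Rayleigh form used on the following slides.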

  • 8/21/2019 Predicting and Estimating Nov 06

    29/89

    29Linda M. Laird 2004All Rights Reserved

    Rayleigh Model

    Defect Arrival Rate (PDF): the number of defects to arrive at time t:

    $f(t) = K\,(2/t)\,(t/c)^2\, e^{-(t/c)^2}$

    Cumulative Defects (CDF): the total number of defects to arrive by time t:

    $F(t) = K\,(1 - e^{-(t/c)^2})$

    Where:

    K = total number of injected defects

    c is a function of the time tmax at which the curve reaches its peak:

    $c = t_{max}\sqrt{2}$

    Note: at tmax, ~40% of the defects should have been found
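The "~40% at tmax" note can be checked directly. This is a minimal sketch of the Rayleigh model with c = tmax * sqrt(2); the function names are illustrative, and the values K = 1 and tmax = 7 are picked only for the demonstration.

```python
import math

def rayleigh_c(t_max):
    """Scale constant c = t_max * sqrt(2)."""
    return t_max * math.sqrt(2.0)

def rayleigh_cumulative(t, K, t_max):
    """F(t) = K * (1 - exp(-(t/c)^2)): defects expected to have arrived by time t."""
    c = rayleigh_c(t_max)
    return K * (1.0 - math.exp(-((t / c) ** 2)))

def rayleigh_rate(t, K, t_max):
    """f(t) = K * (2/t) * (t/c)^2 * exp(-(t/c)^2), written as 2t/c^2 so t = 0 is safe."""
    c = rayleigh_c(t_max)
    return K * (2.0 * t / c ** 2) * math.exp(-((t / c) ** 2))

if __name__ == "__main__":
    # Fraction found at the peak: 1 - exp(-1/2), about 0.39, whatever t_max is.
    print(round(rayleigh_cumulative(7.0, 1.0, 7.0), 3))
```

Because F(tmax)/K = 1 - e^(-1/2) does not depend on tmax, the same fraction holds for any project duration under this model.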

  • 8/21/2019 Predicting and Estimating Nov 06

    30/89

    30Linda M. Laird 2004All Rights Reserved

    Using Rayleigh Model

    Simple extensions of the model provide

    other useful information.

    For example, defect priority classes can

    be specified as percentages of the total

    curve.

    This allows the model to predict defects

    by severity categories over time

  • 8/21/2019 Predicting and Estimating Nov 06

    31/89

    31Linda M. Laird 2004All Rights Reserved

    Plotting the graphs/looking at the fxns

    If K = 1, F(t) = probability of 1 defect arriving by time t

    f(t) = probability of the defect arriving at time t

    So... what do these charts mean?

    [Figure: Rayleigh distribution, c = 2. F(t) and f(t) plotted as probability vs. time (0 to 15).]

    [Figure: Rayleigh distribution, c = 10. F(t) and f(t) plotted as probability vs. time (0 to 20).]

  • 8/21/2019 Predicting and Estimating Nov 06

    32/89

    32Linda M. Laird 2004All Rights Reserved

    Plotting the graphs/working with the fxns

    These are all for K = 1.

    For case 1, tmax ≈ 1.4, so c ≈ 1.96 (close to 2)

    For example, the probability that the defect will arrive at time 2 is ~.37, and the probability that it has arrived by time 2 is ~.63

    For case 2, tmax ≈ 7, so c = 7 * 1.4 = 9.8 (almost 10)

    [Figures: the same two Rayleigh distribution charts (c = 2 and c = 10) as on the previous slide.]

  • 8/21/2019 Predicting and Estimating Nov 06

    33/89

    33Linda M. Laird 2004All Rights Reserved

    Predicting defects analytically

    Assuming a distribution, and with defect data collected from early in the project, you can mathematically determine the curve and the equation, as long as you've hit the maximum.

    With Rayleigh, you need a maximum

    With Exponential, you need enough data to see the slope starting to change

  • 8/21/2019 Predicting and Estimating Nov 06

    34/89

    34Linda M. Laird 2004All Rights Reserved

    Method for using the Rayleigh Distribution

    Given n data points, plot them

    Determine tm (the time t at which f(t) is max)

    Then you have the formulae:

    $F(t) = K\,[\,1 - e^{-t^2/(2 t_m^2)}\,]$

    $f(t) = K\,(1/t_m)^2\, t\, e^{-t^2/(2 t_m^2)}$

    where F(t) is the cumulative number of defect arrivals, f(t) is the arrival rate for defects, and K is the total number of defects

    And you can then use these to predict the later arrival of defects.

  • 8/21/2019 Predicting and Estimating Nov 06

    35/89

    35Linda M. Laird 2004All Rights Reserved

    Example: 594 Faults found by day 9

    [Figure: Faults vs. Days. Faults found (0 to 100) plotted against days (0 to 10).]

    What is the arrival function f(t)?

    Need tmax and K to determine f(t).

    tmax -> 7, so $f(t) = K\,(1/7)^2\, t\, e^{-t^2/(2 \cdot 7^2)}$

  • 8/21/2019 Predicting and Estimating Nov 06

    36/89

    36Linda M. Laird 2004All Rights Reserved

    Then what would you do?

    Solve for K (you can pick any point; I use t = 1, defects = 20 for simplicity) => $K = 20 \cdot 49 / e^{-1/98} \approx 990$

    You now have an equation:

    $f(t) = (990/49)\, t\, e^{-t^2/98} \approx 20.2\, t\, e^{-0.0102\, t^2}$

    Then plot out the equation and use it to predict arrival rates (and also see how well it matches the data)

    Note: this is an extremely simplistic way to solve for the equation. Using more than one point, or tools, would be a good idea.
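The single-point solution for K can be reproduced in a few lines. This sketch assumes the Rayleigh arrival function f(t) = K (1/tm)^2 t e^(-t^2/(2 tm^2)) with tm = 7 and the observation f(1) = 20 from the example; the function names are illustrative.

```python
import math

def rayleigh_rate_per_unit_K(t, t_max):
    """f(t)/K = (1/t_max)^2 * t * exp(-t^2 / (2 * t_max^2))."""
    return (t / t_max ** 2) * math.exp(-(t ** 2) / (2.0 * t_max ** 2))

def solve_K(t, observed_rate, t_max):
    """Solve f(t) = observed_rate for K using one data point."""
    return observed_rate / rayleigh_rate_per_unit_K(t, t_max)

if __name__ == "__main__":
    K = solve_K(1, 20, 7)
    print(round(K))  # 990
    # Predicted arrivals per day from the fitted curve, for comparison with the data.
    for day in range(1, 11):
        print(day, round(K * rayleigh_rate_per_unit_K(day, 7), 1))
```

Since a single point fixes K completely, any noise in that point goes straight into the estimate, which is why the slide recommends using more points or a tool.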

  • 8/21/2019 Predicting and Estimating Nov 06

    37/89

    37Linda M. Laird 2004All Rights Reserved

    Using this data

    Remember that K is the expected total number of faults to be found

    You can determine the number of defects found so far by taking the sum of the points on the chart, which happens to equal 594.

    This chart and analysis says that:

    You expect ~990 faults

    Therefore, you have found ~60% of the faults so far

    If you wanted to predict when at least 95% had been found, you could either:

    Solve for (K - F(t))/K <= .05

    Use the equations with an Excel model
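Solving for the 95% point has a closed form under the Rayleigh model: requiring F(t)/K >= p gives t >= tm * sqrt(-2 ln(1 - p)). The sketch below applies that with tm = 7 from the example; the function names are illustrative.

```python
import math

def fraction_found(t, t_max):
    """F(t)/K = 1 - exp(-t^2 / (2 * t_max^2))."""
    return 1.0 - math.exp(-(t ** 2) / (2.0 * t_max ** 2))

def time_to_fraction(p, t_max):
    """Earliest time at which a fraction p of all defects is expected to be found."""
    return t_max * math.sqrt(-2.0 * math.log(1.0 - p))

if __name__ == "__main__":
    print(round(594 / 990, 2))                    # share found so far, 0.6
    print(round(time_to_fraction(0.95, 7.0), 1))  # about 17.1 days
```

Under these assumptions, 95% of the faults would be expected by roughly day 17.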

  • 8/21/2019 Predicting and Estimating Nov 06

    38/89

    38Linda M. Laird 2004All Rights Reserved

    What did we just do?

    We figured out how to predict the total number of defects and the distribution, based upon the defects found to date and assuming a Rayleigh distribution

    What would you do if you missed some of the initial data (for example, no one tracked defects found in requirements)? Would this method be useless?

    No. You'd use the maximum, and then project the faults found initially as well.

  • 8/21/2019 Predicting and Estimating Nov 06

    39/89

    39Linda M. Laird 2004All Rights Reserved

    Rayleigh Model Implementation

    SPSS (Regression Module)

    SAS

    SLIM (by Quantitative Software

    Management)

    STEER (by IBM)

  • 8/21/2019 Predicting and Estimating Nov 06

    40/89

    40Linda M. Laird 2004All Rights Reserved

    Now let's look at an exponential distribution

    $F(t) = 1 - e^{-\lambda t}$

    $f(t) = \lambda\, e^{-\lambda t}$

    Or, given K total defects,

    $F(t) = K\,(1 - e^{-\lambda t})$

    $f(t) = K\,\lambda\, e^{-\lambda t}$

  • 8/21/2019 Predicting and Estimating Nov 06

    41/89

    41Linda M. Laird 2004All Rights Reserved

    Exponential Distributions

    [Figure: exponential curve of cumulative failures found vs. time; K is the total number of defects.]

  • 8/21/2019 Predicting and Estimating Nov 06

    42/89

    42Linda M. Laird 2004All Rights Reserved

    Exponential Distributions: what is K?

    [Figure: exponential curve of cumulative failures found vs. time; K is the total number of defects.]

  • 8/21/2019 Predicting and Estimating Nov 06

    43/89

    43Linda M. Laird 2004All Rights Reserved

    Solve equations

    Either:

    solve for a few points (OK)

    draw in your own K (not so good)

    let Excel figure it out for you with trendlines (better)

  • 8/21/2019 Predicting and Estimating Nov 06

    44/89

    44Linda M. Laird 2004All Rights Reserved

    OK, now you try one by hand

    Given the following data, what are K and λ?

    t              1   2   3   4   5   6   7   8   9
    defects found  17  16  15  14  14  13  12  12  12

  • 8/21/2019 Predicting and Estimating Nov 06

    45/89

    45Linda M. Laird 2004All Rights Reserved

    Answer

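    Since $f(t) = K\,\lambda\, e^{-\lambda t}$, then

    $f(a)/f(b) = (K\,\lambda\, e^{-\lambda a}) / (K\,\lambda\, e^{-\lambda b}) = e^{\lambda (b-a)}$

    And $K = f(t) / (\lambda\, e^{-\lambda t})$

    Therefore, selecting a = 1 and b = 5, we have $f(1)/f(5) = e^{\lambda (5-1)}$

    $17/14 = e^{4\lambda}$

    $\ln(17/14) = 4\lambda$, so $\lambda \approx 0.048$

    And then pick a few points to determine K; try 1 and 5 again:

    $K_1 = 17 / (0.048\, e^{-0.048}) \approx 372$

    $K_5 = 14 / (0.048\, e^{-0.048 \cdot 5}) \approx 371$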
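The hand calculation can be reproduced as follows. This is a sketch of the two-point method, assuming f(t) = K λ e^(-λt) and using the data table from the exercise; the function name and the variable name `lam` (for λ) are illustrative.

```python
import math

days = [1, 2, 3, 4, 5, 6, 7, 8, 9]
defects_found = [17, 16, 15, 14, 14, 13, 12, 12, 12]

def fit_exponential(a, f_a, b, f_b):
    """Two-point fit: lam from the ratio f(a)/f(b), then K from f(a)."""
    lam = math.log(f_a / f_b) / (b - a)
    K = f_a / (lam * math.exp(-lam * a))
    return lam, K

if __name__ == "__main__":
    # Use t = 1 and t = 5, as on the slide.
    lam, K = fit_exponential(days[0], defects_found[0], days[4], defects_found[4])
    print(f"lambda = {lam:.4f}, K = {K:.0f}")
```

Carrying λ at full precision gives K of about 368, while rounding λ to .048 first, as in the hand calculation, gives about 372. The gap shows how sensitive K is to small changes in λ when only two points are used.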

  • 8/21/2019 Predicting and Estimating Nov 06

    46/89

    46Linda M. Laird 2004All Rights Reserved

    Using the Rayleigh Model instead of the

    exponential distribution

  • 8/21/2019 Predicting and Estimating Nov 06

    47/89

    47Linda M. Laird 2004All Rights Reserved

    Predicting arrival rates

    If you have a projection of the total number of defects (using static models or historical data), you can also predict the arrival rates of defects using the Rayleigh model

    Then use it as a plan to manage against

    If there are significant deviations, this would cause the manager to investigate and potentially take remedial action

  • 8/21/2019 Predicting and Estimating Nov 06

    48/89

    48Linda M. Laird 2004All Rights Reserved

    Using The Rayleigh Distribution

    This model, given the total number of defects expected, spreads them out over the life-cycle of the project in a Rayleigh curve.

    Use: to compare projected with actual faults found, to determine project performance

    Input is:

    Td: total duration of project (to operational delivery)

    Er: total expected number of faults for the lifetime of the project

    Errors for each time period t:

    $E_m = (6\,E_r / T_d^2)\; t\; e^{-3 t^2 / T_d^2}$

    NOTE: this assumes ~95% of faults are found before delivery

    Source: Putnam and Myers
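The ~95% note follows from the formula itself: integrating Em over time gives a cumulative fraction of 1 - e^(-3 t^2/Td^2), which equals 1 - e^(-3), about 0.95, at t = Td. Below is a minimal sketch of both expressions; the function names are illustrative, and the 26-week, 1000-fault inputs are chosen only as sample values.

```python
import math

def expected_faults_in_period(t, total_faults, duration):
    """Em = (6 * Er / Td^2) * t * exp(-3 * t^2 / Td^2)."""
    return (6.0 * total_faults / duration ** 2) * t * math.exp(-3.0 * t ** 2 / duration ** 2)

def cumulative_fraction(t, duration):
    """Fraction of lifetime faults expected by time t: 1 - exp(-3 * t^2 / Td^2)."""
    return 1.0 - math.exp(-3.0 * t ** 2 / duration ** 2)

if __name__ == "__main__":
    Td, Er = 26, 1000  # sample inputs
    print(round(cumulative_fraction(Td, Td), 3))  # 0.95
    for week in range(1, Td + 1):
        print(week, round(expected_faults_in_period(week, Er, Td), 1))
```

The weekly values can be plotted against actual faults found to see whether the project is tracking the plan.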

  • 8/21/2019 Predicting and Estimating Nov 06

    49/89

    49Linda M. Laird 2004All Rights Reserved

    Rayleigh Model - example

    Given a 26 week project, and expected faults of 1000, then, using the formula:

    [Figure: Defects Per Week. Defects found (0 to 60) plotted against week (0 to 30).]

  • 8/21/2019 Predicting and Estimating Nov 06

    50/89

    50Linda M. Laird 2004All Rights Reserved

    Try another problem

    Suppose you expect to have 100 defects, and you think that the time to shipment is 10 weeks. You've found 60 defects by week 6. Are you in good shape or not?

    The errors you should be finding in each time period are

    $E_m = (6 \cdot 100 / 10^2)\; t\; e^{-3 t^2 / 10^2} = 6\,t\; e^{-3 t^2 / 100}$

    You should have found: total for weeks 1 to 6 = Sum(Em) for m = 1 to 6 = 71.45 (I used a spreadsheet to calculate).

    So... either your software is better than you expected, or you are behind in finding defects.
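The spreadsheet total can be checked with a short script. This sketch sums Em = 6 t e^(-3 t^2/100) over weeks 1 to 6 and compares it with the 60 defects actually found; the function and variable names are illustrative.

```python
import math

def expected_faults_in_period(t, total_faults, duration):
    """Em = (6 * Er / Td^2) * t * exp(-3 * t^2 / Td^2)."""
    return (6.0 * total_faults / duration ** 2) * t * math.exp(-3.0 * t ** 2 / duration ** 2)

expected_by_week_6 = sum(expected_faults_in_period(week, 100, 10) for week in range(1, 7))
actual_by_week_6 = 60

if __name__ == "__main__":
    print(round(expected_by_week_6, 2))                     # 71.45
    print(round(expected_by_week_6 - actual_by_week_6, 2))  # shortfall against the plan
```

The shortfall of about 11 defects does not say which explanation applies. It only flags a deviation from the plan that is worth investigating.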

  • 8/21/2019 Predicting and Estimating Nov 06

    51/89

    51Linda M. Laird 2004All Rights Reserved

    Other Tools

    If you don't like the calculations, there are tools (such as SLIM and STEER) which, given the arrival rate data, will help you predict the remaining defects and arrival patterns.

  • 8/21/2019 Predicting and Estimating Nov 06

    52/89

    52Linda M. Laird 2004

    All Rights Reserved

    Experiential Data and Recommendations:

    Putnam and Myers (1992) found total defects projected using Rayleigh curves were within 5% to 10%

    Others were not as close, but may have had dirty data.

    With small projects, you have a smaller number of data points, and therefore less confidence.

    Using their STEER software tool, IBM Federal Systems in Gaithersburg, MD estimated latent defects for 8 projects and compared the estimates with actual data collected for the first year in the field: very close.

    Some data suggests that m = 1.8 for Weibull curves may be best

    Kan's recommendation: use as many models as possible to predict, compare them with each other, track results, and see what works best.

  • 8/21/2019 Predicting and Estimating Nov 06

    53/89

    53Linda M. Laird 2004

    All Rights Reserved

    Dynamic Model Distribution Summary

    Formal parametric models for projecting latent software defects when development is complete and the project is ready to ship

    Encompasses both defect prevention and early defect removal

  • 8/21/2019 Predicting and Estimating Nov 06

    54/89

    54Linda M. Laird 2004

    All Rights Reserved

    Predicting Fault Density and Distribution

    Typical Distribution of Faults

    Defect Prediction Models

    Dynamic: Rayleigh, Exponential, S-Curve Models

    Static: COQUALMO Model; Based on Process; RL-TR-92-52

    Industry Data, such as SEI Delivered Fault Data

    Local Models: Historical Data

  • 8/21/2019 Predicting and Estimating Nov 06

    55/89

    55Linda M. Laird 2004

    All Rights Reserved

    COQUALMO, by Chulani and Boehm

    Defect Analysis Tool from USC

    Extension to the COCOMO estimation model (Software Sizing Model developed by Boehm and others at USC)

    Based on the Defect Insertion/Removal model

    Tool/Paper available on our course website

    COQUALMO is a model which predicts Delivered

  • 8/21/2019 Predicting and Estimating Nov 06

    56/89

    56Linda M. Laird 2004

    All Rights Reserved

    Defect Density (per KLOC or per FP)

    [Figure: defects in, defects out, and the resulting delivered defect density.]

    Based upon a variety of factors, and which you can tune based on your own experience.

  • 8/21/2019 Predicting and Estimating Nov 06

    57/89

    57Linda M. Laird 2004

    All Rights Reserved

    Coqualmo Models

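    2 separate models: Defect Introduction and Defect Removal

    [Figure: the Defect Introduction model takes size and the software platform, product, personnel, and project attributes as inputs, and outputs the number of non-trivial requirements, design, and coding defects introduced. The Defect Removal model takes those defects and the defect removal activities (automated analysis, reviews, testing and tools) as inputs, and outputs the number of defects per KLOC.]

    Source: COCOMO II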

  • 8/21/2019 Predicting and Estimating Nov 06

    58/89

    58Linda M. Laird 2004

    All Rights Reserved

    Input Parameters

    For defect introduction, it uses the COCOMO II project descriptors (size, personnel capability and experience, platform characteristics, project practices, and product characteristics such as complexity and required reliability) to estimate the number of requirements, design, and code defects introduced into the project.

    For defect removal, it uses ratings of a project's level of use of analysis tools, peer reviews, and execution testing to determine what fraction of the introduced defects are removed. Its estimates to date are consistent with general project experience and a small number of detailed project data points.

  • 8/21/2019 Predicting and Estimating Nov 06

    59/89

    59Linda M. Laird 2004

    All Rights Reserved

    COQUALMO: More detail

    Quantitative model for defect introduction and removal

    Acronym for Constructive Quality Model

    Chulani and Boehm at USC, 1999

    Consistent with the COCOMO model by Boehm

    Current data is from the COCOMO clients and expert opinion

    Need additional data from more projects to tune the model

    Defects Introduced:

    $DI = \sum_{j=1}^{3} A_j \cdot (Size)^{B_j} \cdot QAF_j$

    where, for each phase j (rqmts, design, coding):

    A is the multiplicative constant

    B is initially set to 1 and accounts for economies of scale

    QAF is the quality factor that takes into account 21 defect introduction factors (Platform, Product, Personnel, and Project)

  • 8/21/2019 Predicting and Estimating Nov 06

    60/89

    60Linda M. Laird 2004

    All Rights Reserved

    DI equation, in English

    What does that equation say?

    That the number of defects introduced is the sum of the number of defects introduced in each of requirements, design, and coding

    The number of defects introduced in a given phase = A * (size) ^ B * QAF, where:

    A is based upon which phase

    B is based upon size

    QAF is based upon the quality of the process, platform, etc.

  • 8/21/2019 Predicting and Estimating Nov 06

    61/89

    61Linda M. Laird 2004

    All Rights Reserved

    Example (Using dummy data):

    Assume that you calculated the QAF for each phase, that you have the following values, and that the model has given you the values for A as shown:

    Phase    QAF   A
    Rqmts    1.2   10
    Design   1     20
    Coding   0.5   30

    This says that the Defects Introduced by phase would be:

    Phase    QAF   A    DI
    Rqmts    1.2   10   12
    Design   1     20   20
    Coding   0.5   30   15

    Note that the QAFs imply a requirements activity worse than average and a coding activity better than average
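The dummy-data example can be written out as a short calculation. This is a sketch of the per-phase formula DI = A * Size^B * QAF, assuming a size of 1 KSLOC and B = 1 (the initial setting); the function name and the dictionary layout are illustrative and do not reflect the interface of the COQUALMO tool.

```python
def defects_introduced(size_ksloc, phases):
    """Per-phase defects introduced: A * size^B * QAF for each phase."""
    return {name: a * size_ksloc ** b * qaf for name, (a, b, qaf) in phases.items()}

# (A, B, QAF) per phase, using the dummy values from the example.
phases = {
    "requirements": (10, 1.0, 1.2),
    "design": (20, 1.0, 1.0),
    "coding": (30, 1.0, 0.5),
}

di_by_phase = defects_introduced(1.0, phases)
di_total = sum(di_by_phase.values())

if __name__ == "__main__":
    for name, value in di_by_phase.items():
        print(name, round(value, 1))
    print("total", round(di_total, 1))  # 47.0
```

Changing `size_ksloc` or a phase's QAF gives a quick "what if" view of how the introduced-defect total responds.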

  • 8/21/2019 Predicting and Estimating Nov 06

    62/89

    62Linda M. Laird 2004

    All Rights Reserved

    QAF: Quality Assessment Factor

    The QAF is a factor which is the product of 21 defect introduction drivers, such as analyst capability, programmer capability, required reliability of the system, etc.

  • 8/21/2019 Predicting and Estimating Nov 06

    63/89

    63Linda M. Laird 2004

    All Rights Reserved

    Defects Introduced

    Nominal values, per KSLOC, are:

    DI (requirements) = 10

    DI (design) = 20

    DI (coding) = 30

    DI (total) = 60

    E.g., for every 1K lines of code, the model predicts that, assuming a nominal situation, there would typically be 60 defects injected into the code: 10 requirements defects, 20 design defects, and 30 coding defects.

    Process Maturity had the highest impact on defect introduction. With everything else held constant, it varies the result by a factor of 2.5, which says that if you have a very good process, you significantly reduce the number of defects introduced.

  • 8/21/2019 Predicting and Estimating Nov 06

    64/89

    64Linda M. Laird 2004

    All Rights Reserved

    COQUALMO: Defect Removal

    Initial values determined by experts using the 2-round Delphi technique

    Looked at three different removal techniques: Automated Analysis, People Reviews, and Execution Testing and Tools

    Rated %DRE for removing defects for 6 levels of effectiveness of each technique, for each phase (rqmts, design, coding)

    Computed residual defects as:

    If all techniques Very Low effectiveness = 60 defects per KSLOC

    If all techniques Nominal = 14.3 defects per KSLOC

    If all techniques Extra High effectiveness = 1.57 defects per KSLOC

  • 8/21/2019 Predicting and Estimating Nov 06

    65/89

    65Linda M. Laird 2004

    All Rights Reserved

    Summary on COQUALMO model

    Mathematical model which takes as input:

    Your view of your defect injection drivers

    Your view of your defect removal drivers

    Gives you a projection of the number of defects remaining in your system at any phase

    Can be used to estimate the impact of driver changes on defect density: "what if" analysis, improvement investment analysis

    Other similar models are available

  • 8/21/2019 Predicting and Estimating Nov 06

    66/89

    66Linda M. Laird 2004

    All Rights Reserved

    RL-TR-92-52

    Seems to be a primary reference and model for both defect density and fault density

    Could not obtain a copy of the report; I believe it is very similar to COQUALMO

    Key fault parameters for predicting defect density are:

    Application Type & Difficulty: 2 to 14

    Development Organization: .5 to 2

    Software Complexity: .8 to 1.5

    Compliance with Design Rules: .75 to 1.5

    Note: 1 adds them

  • 8/21/2019 Predicting and Estimating Nov 06

    67/89

    67Linda M. Laird 2004

    All Rights Reserved

    Predicting Fault Density and Distribution

    Typical Distribution of Faults

    Defect Prediction Models

    Dynamic: Rayleigh, Exponential, S-Curve Models

    Static: COQUALMO Model; Based on Process; RL-TR-92-52

    Industry Data, such as SEI Delivered Fault Data

    Local Models: Historical Data

  • 8/21/2019 Predicting and Estimating Nov 06

    68/89

    68Linda M. Laird 2004

    All Rights Reserved

    SEI Defect Removal

    Cumulative % of defects removed through acceptance test:

    SEI Level 2: 25.5%

    SEI Level 3: 41.5%

    SEI Level 4: 62.3%

    SEI Level 5: 87.3%

    Source: Diaz & King, 2002 (in Kan)

  • 8/21/2019 Predicting and Estimating Nov 06

    69/89

    69Linda M. Laird 2004

    All Rights Reserved

    Industry data

    CMM Approach

    Measure: average defects per function point

    Typical defect potential and delivered defects by SEI CMM level:

    SEI CMM Level   Potential   Delivered
    1               5.0         .75
    2               4.0         .44
    3               3.0         .27
    4               2.0         .14
    5               1.0         .05

    Source: Capers Jones, 1995
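One way to read the CMM table is as an implied overall removal efficiency, 1 - delivered/potential, for each level. The sketch below computes that ratio and scales the delivered rate to a project size. The removal-efficiency interpretation, the function names, and the 500 function point project are assumptions added for illustration and are not part of the cited data.

```python
# (potential, delivered) defects per function point, by SEI CMM level (Capers Jones, 1995).
cmm_defects_per_fp = {
    1: (5.0, 0.75),
    2: (4.0, 0.44),
    3: (3.0, 0.27),
    4: (2.0, 0.14),
    5: (1.0, 0.05),
}

def removal_efficiency(potential, delivered):
    """Implied fraction of potential defects removed before delivery."""
    return 1.0 - delivered / potential

def predicted_delivered_defects(function_points, level):
    """Delivered defects predicted for a project of the given size at a CMM level."""
    _, delivered = cmm_defects_per_fp[level]
    return function_points * delivered

if __name__ == "__main__":
    for level, (potential, delivered) in cmm_defects_per_fp.items():
        print(level, f"{removal_efficiency(potential, delivered):.0%}")
    print(predicted_delivered_defects(500, 3))  # hypothetical 500 FP project at level 3
```

On these figures, higher levels both inject fewer defects per function point and remove a larger share of them before delivery.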

  • 8/21/2019 Predicting and Estimating Nov 06

    70/89

    70Linda M. Laird 2004

    All Rights Reserved

    Industry Data

    Industry Approach

    Measure: average defects per function point

    Delivered defects per industry:

    System Software - .4

    Commercial Software - .5

    Information Software - 1.2

    Military Software - .3

    Overall average - .65

    Source: Capers Jones, 1995

  • 8/21/2019 Predicting and Estimating Nov 06

    71/89

    71Linda M. Laird 2004

    All Rights Reserved

    Defect Data By Application Domain - Reifer

    Application Domain      Number of   Error Range       Normative Error Rate   Notes
                            Projects    (Errors/KESLOC)   (Errors/KESLOC)
    Automation              55          2 to 8            5                      Factory automation
    Banking                 30          3 to 10           6                      Loan processing, ATM
    Command & Control       45          0.5 to 5          1                      Command centers
    Data Processing         35          2 to 14           8                      DB-intensive systems
    Environment/Tools       75          5 to 12           8                      CASE, compilers, etc.
    Military - All          125         0.2 to 3          < 1.0                  See subcategories
      Airborne              40          0.2 to 1.3        0.5                    Embedded sensors
      Ground                52          0.5 to 4          0.8                    Combat center
      Missile               15          0.3 to 1.5        0.5                    GNC system
      Space                 18          0.2 to 0.8        0.4                    Attitude control system
    Scientific              35          0.9 to 5          2                      Seismic processing
    Telecom                 50          3 to 12           6                      Digital switches
    Test                    35          3 to 15           7                      Test equipment, devices
    Trainers/Simulations    25          2 to 11           6                      Virtual reality simulator
    Web Business            65          4 to 18           11                     Client/server sites
    Other                   25          2 to 15           7                      All others


    Domain Data Comments

    Defect rates in military systems are much lower due to the safety requirements.

    Defect rates after delivery tend to be cyclical with each version released. They are initially high, and then stabilize around 1 to 2 defects per KLOC in systems with longer lifecycles (> 5 years). Web Business systems tend to have shorter lifecycles (


    Local History

    Simplest: defect densities and defect removal efficiencies from other projects

    Remember from 533 what DRE is -- the % of defects removed in each development phase
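
    As a rough illustration of using local DRE history, the sketch below applies per-phase removal efficiencies in sequence to an initial defect count. The phase names, the efficiencies, and the starting count of 500 are made-up placeholders, and the sketch assumes no new defects are injected in later phases, which is a simplification.

```python
def remaining_defects(initial_defects: float, phase_dre: dict[str, float]) -> float:
    """Apply each phase's defect removal efficiency (a fraction from 0 to 1) in order."""
    remaining = initial_defects
    for phase, dre in phase_dre.items():
        removed = remaining * dre
        remaining -= removed
        print(f"{phase}: removed {removed:.1f}, remaining {remaining:.1f}")
    return remaining


if __name__ == "__main__":
    # Hypothetical local history; replace with DREs measured on your own past projects.
    history = {
        "design review": 0.50,
        "code inspection": 0.60,
        "system test": 0.70,
    }
    delivered = remaining_defects(500, history)
    print(f"Predicted delivered defects: {delivered:.1f}")
```

    With these placeholder values, 500 initial defects fall to 250, then 100, then 30 delivered.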


    Prediction Model 2nd Step

    Now at 2nd step -- predicting failure rate from defects

    Fault Profile & Defect Density -> Initial Failure Rate -> Delivered and On-going Failure Rate


    Predicting Failure Rate from Fault Density

    Issues

    Musa Model

    Using Past Projects Data


    Issue is that failures are a function of

    Faults

    Environment

    System Usage & Mix

    If you can make these the same as your target environment, then the projection should work out better


    Typically you can't have those similar to the operational environment until operational testing, so before that time, use empirical data.


    The Musa Prediction Method

    For predicting failure rate given a fault density -- developed for predicting expected failure rate in system test

    Caveat: This method seems like magic to me. But it does have an empirical basis.


    Musa Model Underlying Concepts

    Each fault is embodied in machine instructions

    There is a probability that the faulty machine instructions will cause a failure

    Therefore, if you know the number of faults remaining, the number of machine instructions for the program, the speed of the machine, and the probability, you can predict the arrival rate of failures.


    Musa Prediction Model I/O

    Input: Fault Density, Size of Program, Processor Speed, and a probability that a given faulty line of code will cause a failure when it is executed

    e.g., a ratio of failures to faults

    can either be from past history, or can use the default (4.2*10^-7)

    Output: Expected Failure Rate


    Musa Model for Failure Rate

    Let w = number of faults

    Let I = number of object code instructions

    Let r = processor speed in instructions per sec

    Let L = expected failure rate (i.e., lambda)

    K = magic constant = 4.2*10^-7 -- the probability that a given faulty line of code will cause a failure when it is executed

    Then L = r*K*w/I


    The Key is obviously K

    And where does it come from?

    If you have other similar programs/projects, generate it from those (K = L*I/(r*w))

    Interestingly, Musa's data across multiple projects shows K varying only slightly, with a range of 1*10^-7 to 7.5*10^-7
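
    Calibrating K from a past project is just the rearranged formula K = L*I/(r*w). A minimal sketch follows; the function names and the past-project numbers are hypothetical, and only the default K and the 1*10^-7 to 7.5*10^-7 range come from the slides.

```python
MUSA_DEFAULT_K = 4.2e-7          # default constant quoted in the slides
MUSA_K_RANGE = (1e-7, 7.5e-7)    # range reported across Musa's projects


def calibrate_k(failure_rate: float, object_instructions: float,
                instr_per_sec: float, faults: float) -> float:
    """Solve L = r*K*w/I for K, using measurements from a past project."""
    return failure_rate * object_instructions / (instr_per_sec * faults)


def within_musa_range(k: float) -> bool:
    """Sanity check: does the calibrated K fall inside the reported range?"""
    low, high = MUSA_K_RANGE
    return low <= k <= high


if __name__ == "__main__":
    # Hypothetical past project: 0.02 failures per execution second,
    # 400,000 object instructions, a 50 MIPS processor, 300 known faults.
    k = calibrate_k(failure_rate=0.02, object_instructions=400_000,
                    instr_per_sec=50e6, faults=300)
    print(f"Calibrated K = {k:.2e}, default K = {MUSA_DEFAULT_K:.1e}")
    print("Inside reported range:", within_musa_range(k))
```

    If a calibrated K lands well outside the reported range, that may be a hint that the fault count or the failure-rate measurement for the past project deserves a second look.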


    Example: Musa Model

    Let w = number of faults

    Let I = number of object code instructions

    Let r = processor speed in instructions per sec

    Let L = expected failure rate (i.e., lambda)

    K = magic constant = 4.2*10^-7 failures per fault

    Then L = r*K*w/I

    Assume a 100 MIPS machine; 5 defects per KLOC, 100K source lines, C++

    Then what is the expected number of failures per execution second?


    Class Example: Musa Model

    Let w = number of faults

    Let I = number of object code instructions

    Let r = processor speed in instructions per sec

    K = magic constant = 4.2*10^-7 failures per fault

    Then L = r*K*w/I

    Assume a 100 MIPS machine; 5 defects per KLOC, 100K source lines, C++

    Then w = 5*100 = 500 total faults

    I = 100K*6 (from table in Rome Notebook) = 600K lines of object code

    L = (100*10^6)*(4.2*10^-7)*500/(6*10^5)

    = (10^8*10^-7*10^2)*(4.2*5)/(6*10^5) = 3.5*10^-2 = .035 failures per execution sec

    = 2.1 failures per execution minute

    So this says that the initial failure rate is estimated to be 2.1 failures per EXECUTION minute.
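
    The class example can be checked with a few lines of code. This is a sketch under the slide's own assumptions (100 MIPS, 5 defects per KLOC, 100K C++ source lines, an expansion ratio of 6 object instructions per source line, K = 4.2*10^-7). The function and variable names are illustrative.

```python
def musa_failure_rate(faults: float, object_instructions: float,
                      instr_per_sec: float, k: float = 4.2e-7) -> float:
    """Expected failures per execution second: L = r*K*w/I."""
    return instr_per_sec * k * faults / object_instructions


def worked_example() -> tuple[float, float]:
    """Reproduce the class example; returns (failures/sec, failures/min)."""
    ksloc = 100                # 100K source lines
    defects_per_ksloc = 5
    expansion_ratio = 6        # object instructions per C++ source line (value used on the slide)
    instr_per_sec = 100e6      # 100 MIPS machine

    faults = defects_per_ksloc * ksloc                     # w = 500
    object_instructions = ksloc * 1000 * expansion_ratio   # I = 600,000

    per_sec = musa_failure_rate(faults, object_instructions, instr_per_sec)
    return per_sec, per_sec * 60


if __name__ == "__main__":
    per_sec, per_min = worked_example()
    print(f"{per_sec:.3f} failures per execution second")
    print(f"{per_min:.1f} failures per execution minute")
```

    Changing `expansion_ratio` or `k` shows how sensitive the prediction is to those two inputs, both of which come from tables or history rather than from the project itself.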


    Musa Model Summary

    The theory behind Musa's model is that the faults are embedded in the code, and that the probability of the faults becoming failures depends on the fault density and the frequency with which the code is executed.

    Kan's Empirical Data


    For system platforms to have > 99.9+%

    availability, the defect level has to be


    Prediction vs. Estimation

    Model Overview

    Predicting Defect Densities

    Predicting Failures From Defect Densities

    Estimating Reliability from Testing

    Execution Time vs. Calendar Time

    Estimating Failure Models Reliability Growth

    Reliability Estimation

    Tools

    Homework


    For the Rayleigh curve example, solve for t such that 95% of the defects have been found.

    Play with Coqualmo so you can actually use it (on website in tools)

    Understand parameters (may need to look up COCOMO model to understand them)

    Read articles on website

    Do project on website.