Daniel Liu & Yigal Darsa - Presentation
Early Estimation of Software Quality Using In-Process Testing Metrics: A Controlled Case Study
Presenters:
Yigal Darsa
Daniel Liu
ABSTRACT
o Field quality of a product tends to become available too late in the software development process
o A controlled case study conducted at North Carolina State University
ABSTRACT (cont’d)
o Use of a suite of in-process metrics that leverages the software testing effort to provide:
an estimation of potential software field quality in early software development phases,
the identification of low quality software programs
INTRODUCTION
o True field quality cannot be measured before a product has been completed and delivered to an internal or external customer.
o Field quality is calculated using the number of failures found by these customers.
INTRO (cont’d)
• Because this information is available late in the process, corrective actions tend to be expensive.
• Software developers can benefit from an early warning regarding the quality of their product.
INTRO (cont’d)
• An early warning can be built from a collection of internal metrics
• An internal metric, such as cyclomatic complexity (explained later), is a measure derived from the product itself
INTRO (cont’d)
• An external measure is a measure of a product derived from an external assessment of the behavior of the system
e.g.: the number of defects found in test is an external measure.
• Structural object-oriented (O-O) measurements are being used to evaluate and predict the quality of software
e.g.: the Chidamber-Kemerer and MOOD O-O metric suites
INTRO (cont’d)
The CK metric suite consists of six metrics:
weighted methods per class (WMC),
coupling between objects (CBO),
depth of inheritance tree (DIT),
number of children (NOC),
response for a class (RFC),
lack of cohesion among methods (LCOM).
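As an illustration of how such metrics are derived from the product itself, here is a minimal sketch (in Python, using only the standard library's ast module) that computes simplified versions of two CK metrics: WMC with unit method weights, and NOC counting only direct subclasses defined in the same source. The sample classes are hypothetical:

```python
import ast

SOURCE = """
class Shape:
    def area(self): ...
    def perimeter(self): ...

class Circle(Shape):
    def area(self): ...

class Square(Shape):
    def area(self): ...
"""

def ck_wmc_noc(source):
    """Return {class name: (WMC, NOC)}, with every method weighted 1."""
    tree = ast.parse(source)
    classes = [n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]
    metrics = {}
    for cls in classes:
        # WMC (simplified): number of methods defined directly in the class.
        wmc = sum(isinstance(n, ast.FunctionDef) for n in cls.body)
        # NOC (simplified): classes in this source that name cls as a base.
        noc = sum(
            any(isinstance(b, ast.Name) and b.id == cls.name for b in other.bases)
            for other in classes
        )
        metrics[cls.name] = (wmc, noc)
    return metrics

print(ck_wmc_noc(SOURCE))
```

Real CK tools weight methods by complexity and resolve inheritance across files; this sketch only shows the shape of the computation.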
INTRO (cont’d)
o These metrics can be a useful early internal indicator of externally-visible product quality in terms of fault-proneness
INTRO (cont’d)
Section 2 outlines the STREW metric suite.
Section 3 discusses a controlled experiment in which the STREW metric suite was studied.
Section 4 presents the experimental results.
Finally, Section 5 presents the conclusions and future work.
Introduction: STREW
The metric suite used is called the Software Testing and Reliability Early Warning metric suite, also known as STREW.
STREW is a set of internal, in-process software metrics that are used to make an early estimation of post release field quality.
GOAL: early prediction of field quality.
STREW Metric Suite
Reasoning behind this metric suite:
Different from the traditional reliability estimation models, STREW puts a greater emphasis on internal software metrics, especially those involving the testing effort.
STREW Metric Suite
The use of the STREW metrics is based on the existence of an extensive collection of unit test cases being created as development proceeds.
During the initial stages of creating any project, such a unit test suite might not be available. In that case, historical data from a comparable project may be used.
STREW Metric Suite
The STREW-J metric suite consists of nine metric ratios.
The metrics are intended to cross-check each other and to triangulate upon an estimate of post-release field quality.
Each metric makes an individual contribution towards the estimation of post-release field quality, but the metrics work best when used together.
STREW Metric Suite
Table: the nine STREW metrics, categorized into three groups: test quantification metrics, complexity and O-O metrics, and a size adjustment metric.
STREW Metric Suite
Test quantification metrics (SM1 to SM4):
Specifically intended to cross-check each other to account for coding/testing styles.
e.g. One developer might write fewer test cases, each with multiple asserts checking various conditions. Another developer might test the same conditions by writing many more test cases, each with only one assert.
STREW Metric Suite
Test quantification metrics (SM1 to SM4):
Intended to provide useful guidance to each of these developers without prescribing the style of writing the test cases.
STREW Metric Suite
The complexity and O-O metrics (SM5 to SM8):
These metrics examine the relative ratio of test code to source code for control flow complexity and for a subset of the CK metrics.
These relative ratios for a product in development can be compared with the historical values for similar projects to indicate the relative complexity of the testing effort with respect to the source code.
STREW Metric Suite
The complexity and O-O metrics (SM5 to SM8) in detail:
The cyclomatic complexity metric: cyclomatic complexity can be defined as the number of linearly independent paths through a program.
Studies of software systems have shown that code complexity correlates strongly with program size measured in lines of code and indicates the extent to which control flow is used. The use of conditional statements increases the amount of testing required because there are more logic and data flow paths to be verified.
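A rough sketch of the idea, for Python code: a function's cyclomatic complexity can be approximated as the number of decision points plus one (boolean operators and comprehension conditions are ignored in this simplification).

```python
import ast

def cyclomatic_complexity(func_source):
    """Approximate V(G) as (number of decision points) + 1."""
    decision_nodes = (ast.If, ast.For, ast.While, ast.ExceptHandler)
    tree = ast.parse(func_source)
    decisions = sum(isinstance(n, decision_nodes) for n in ast.walk(tree))
    return decisions + 1

SAMPLE = """
def classify(x):
    if x < 0:
        return "negative"
    for d in range(3):
        if x == d:
            return "small"
    while x > 100:
        x //= 2
    return "other"
"""

# Two ifs, one for, one while -> 4 decision points -> V(G) = 5.
print(cyclomatic_complexity(SAMPLE))
```

Each decision point adds a logic path that a test suite must cover, which is why this metric feeds the ratio of test complexity to source complexity.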
STREW Metric Suite
The complexity and O-O metrics (SM5 to SM8) in detail:
The larger the inter-object coupling, the higher the sensitivity to change. Therefore, maintenance of the code is more difficult. As a result, the higher the inter-object class coupling, the more rigorous the testing should be.
The number of methods and the complexity of the methods involved are a predictor of how much time and effort is required to develop and maintain the class.
The larger the number of methods in a class, the greater is the potential impact on its children, since the children will inherit all the methods defined in the class.
STREW Metric Suite
The final metric is a relative size adjustment factor:
Defect density has been shown to increase with class size.
Differences in lines-of-code size must be accounted for, because projects that use the STREW metrics for prediction will not all have the same LOC size; the relative size adjustment factor normalizes for this.
STREW Metric Suite
Removal of Metrics:
Some metrics were removed because they were unable to contribute towards the estimation of post-release field quality:
• Statement coverage
• Branch coverage
• Number of requirements / source lines of code
• Number of children (test) / number of children (source)
• Lack of cohesion among methods (test) / lack of cohesion among methods (source)
CONTROLLED EXPERIMENT
o Research Design
o Case Study Limitations
Research Design
To evaluate the predictive ability of STREW-J, a case study was carried out in a junior/senior-level software engineering course at NCSU in the fall 2003 semester.
Research Design (cont’d)
Students developed an open source Eclipse plug-in in Java.
• Eclipse is an open source integrated development environment
Plug-ins were tested via:
• Unit tests via JUnit
• Acceptance tests via the FIT tool
Groups of four or five junior and/or senior students submitted 22 projects.
Research Design (cont’d)
SLOC: Source Lines of Code
TLOC: Test Lines of Code
Research Design (cont’d)
Evaluated by 45 black box test cases:
• Exception checking
• Error handling
• Boundary test cases
• Operational correctness of the plug-in
Case Study Limitations
Students’ experience varies; not everyone is advanced.
The experiment was conducted academically and under ideal conditions; it might not reflect industrial software development.
Eclipse plug-ins are relatively small applications compared with industrial applications.
Experimental Results
Black box test failures/KLOC:
Approximated as the problems that would have been found by the customer had the project been released.
Experimental Results
Using the black box test failures/KLOC quality measure obtained by running the 45 test cases, a multiple linear regression analysis was performed.
Difficulty with multiple linear regression analysis: there is multi-collinearity among the metrics. This can lead to inflated variance in the prediction of post-release field quality.
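Multi-collinearity means the predictor metrics are strongly correlated with one another. A small pure-Python check of that condition computes the Pearson correlation between two metric columns; the values below are made up for illustration and are not the study’s data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical values of two STREW ratios for five projects. Assertion
# counts and test LOC tend to rise together, so the corresponding
# ratios end up strongly correlated (multi-collinear).
sm_a = [0.10, 0.22, 0.35, 0.41, 0.55]
sm_b = [0.12, 0.25, 0.33, 0.45, 0.58]

r = pearson(sm_a, sm_b)
print(round(r, 3))  # close to 1 -> near-redundant predictors
```

When r approaches 1, the regression cannot attribute the effect to either predictor individually, inflating coefficient variance.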
Experimental Results
To eliminate multi-collinearity, principal component analysis (PCA) was used:
PCA removes the multi-collinearity; the resulting principal components are orthogonal to each other. This means that changes in one component do not influence the other components, unlike the individual metrics.
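A minimal 2-D sketch of this property, in pure Python: rotating centered data by the angle that diagonalizes its covariance matrix yields component scores whose covariance is numerically zero, i.e. the components are orthogonal. The metric values are made up.

```python
from math import atan2, cos, sin

def pca_2d(xs, ys):
    """Project centered 2-D data onto its principal axes (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cx = [x - mx for x in xs]
    cy = [y - my for y in ys]
    sxx = sum(v * v for v in cx) / n
    syy = sum(v * v for v in cy) / n
    sxy = sum(a * b for a, b in zip(cx, cy)) / n
    # Rotation angle that zeroes the off-diagonal covariance term.
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    pc1 = [c * cos(theta) + d * sin(theta) for c, d in zip(cx, cy)]
    pc2 = [-c * sin(theta) + d * cos(theta) for c, d in zip(cx, cy)]
    return pc1, pc2

# Two strongly correlated (multi-collinear) metric columns; toy data.
m1 = [0.10, 0.22, 0.35, 0.41, 0.55]
m2 = [0.12, 0.25, 0.33, 0.45, 0.58]
pc1, pc2 = pca_2d(m1, m2)

# Component scores are uncorrelated: their covariance is ~0.
cov = sum(a * b for a, b in zip(pc1, pc2)) / len(pc1)
print(abs(cov) < 1e-12)
```

With nine STREW metrics the same idea applies in nine dimensions, where a library routine would replace the closed-form 2-D rotation.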
Experimental Results
The STREW metrics can also be used to identify programs of low quality.
All programs with a black box test failures/KLOC value lower than the bound calculated from the equation below are classified as high quality; the remaining programs are classified as low quality.
Equation (lower bound) = µ(black box test failures/KLOC) − [(z_(α/2) · standard deviation of black box test failures/KLOC) / √n]
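A sketch of applying such a threshold, assuming the bound is the mean minus z_(α/2) times the standard error (z ≈ 1.96 for α = 0.05) and using made-up failures/KLOC values:

```python
from math import sqrt

def lower_bound(values, z=1.96):
    """Mean minus z * (sample std. dev. / sqrt(n)): the threshold below
    which a program's failures/KLOC classifies it as high quality."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return mean - z * sqrt(var) / sqrt(n)

# Hypothetical black box test failures/KLOC for a set of projects.
failures_per_kloc = [4.0, 7.5, 2.0, 9.0, 5.5, 3.0, 6.0, 8.0]
bound = lower_bound(failures_per_kloc)

high_quality = [v for v in failures_per_kloc if v < bound]
print(round(bound, 2), sorted(high_quality))
```

Projects whose failure density falls below the bound are flagged high quality; the rest are flagged low quality and become candidates for corrective action.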
Experimental Results
Table: overall classification of program quality.
The estimated percentage of correct classification is 90.9% (i.e. overall, 20 of the 22 programs were correctly identified as high or low quality programs using STREW).
Results Conclusion
From the experiment: 20 of the 22 programs were correctly identified as high or low quality programs.
Feedback on potential field quality of software is very useful to developers because it helps identify weaknesses and faults in the software that require fixing.
CONCLUSION
• In most production environments, field quality is measured too late to affordably guide significant corrective actions
• An in-process testing metric suite can provide an early warning regarding post-release field quality, measured by black box test failures/KLOC, and can identify low quality programs
CONCLUSION (cont’d)
• The STREW metric suite is a practical approach to estimating software post-release field quality
• STREW-based regression analysis is a feasible technique for detecting low quality programs.
ANY QUESTIONS!?
Why? What is it that you didn’t understand?
Are you sure you read the article?
If you read it and didn’t understand, what makes you think that we understood?
Go easy on us!!