
A Summary of Software Measurement Experiences in the Software Engineering Laboratory

Jon D. Valett and Frank E. McGarry NASA, Goddard Space Flight Center, Greenbelt, Maryland

The practice of measuring software is increasingly seen as a valuable tool in the overall development of high-quality software projects. Software measurement attempts to use known, quantifiable, objective, and subjective measures to compare and profile software projects and products. To compute these measures effectively, data that characterize the software project and product are needed. This paper covers aspects of data collection and software measurement as they have been applied by one particular organization, the Software Engineering Laboratory (SEL). The measurement results include the experiences and lessons learned through numerous experiments conducted by the SEL on nearly 60 flight dynamics software projects. These experiments have attempted to determine the effect of various software development technologies on overall software project quality and on specific measures such as productivity, reliability, and maintainability.

1. BACKGROUND

The practice of measuring software is increasingly seen as a valuable tool in the overall development process of high-quality software projects. Software measurement attempts to use known, quantifiable, objective, and subjective measures to effectively compare and profile software projects and products. Attaching these quantifiable software measures to specific aspects of software projects allows for meaningful analysis of the quality of a software project. The data for these measures can be obtained in a variety of ways, including data collection forms, automatic collection, and direct interviews with project members. As data are collected and projects are measured, an organization improves its understanding of the software development process within its environment, and therefore the overall process is improved.

Currently, some organizations from industry, academia, and government are measuring their software development process [1-3].

Address correspondence to Jon D. Valett, Code 552, NASA, Goddard Space Flight Center, Greenbelt, MD 20771.

This measurement is typically justified by the argument that an organization must understand its development characteristics before applying some newer technology. Thus, to determine the effect of new technologies on that organization's development process, the effect must be measurable against an environmental baseline. The major deterrent to incorporating a measurement program is cost. If the cost outweighs the benefits, the measurement process is not worth pursuing.

This paper describes one approach to data collection and experimentation, explains how the techniques were applied to one particular production environment, and describes the pertinent lessons learned from this organization's attempt to improve its development process through measurement. This paper does not attempt to summarize the results of the technologies studied during the period for which the measurement program has existed.

1.1 The Software Engineering Laboratory (SEL)

For the past 12 years, the SEL has dedicated itself to measuring and understanding software development in one particular production environment. The SEL is a cooperative effort between the National Aeronautics and Space Administration/Goddard Space Flight Center (NASA/GSFC), the University of Maryland (Computer Science Department), and Computer Sciences Corporation. It is funded by NASA/GSFC and is an integral part of the Flight Dynamics Division at Goddard. The goals of the SEL are to understand the software development process in the GSFC environment; to measure the effect of various methodologies, tools, and models on this process; and to identify and then apply successful development practices [4].

The SEL's study environment is the flight dynamics area at NASA/GSFC, which is responsible for developing medium- and large-scale ground support software systems for spacecraft mission support. Detailed software development data, including data on effort, changes, errors, computer utilization, software size, and methodologies used, have been collected [5] on nearly 60 software projects. These medium-scale projects, developed mostly in Fortran, have ranged from 3000 to 170,000 lines of code, with the average about 60,000 lines. The SEL has thus had a unique opportunity to study and apply different development methodologies to production-level software projects and to determine their effects on the development process. During these years, the SEL has acquired valuable experience in the science of measuring software and has gained immeasurable benefit from applying and assessing available software development technologies.

1.2 The Operation of the SEL

The SEL's approach to studying software technologies is to first define the specific goals of a potential study effort and then to derive appropriate questions and metrics of interest from that set of goals [6]. As teams are formed to develop flight dynamics production software, they are informed of the goals of the study and of which types of data will be collected during development. If needed, the development team is trained in a particular methodology that may be part of the study (e.g., Ada*, structured design, or some development tool).
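The goal-to-metric derivation described above can be illustrated with a small sketch. The goal, questions, and metric names below are hypothetical examples invented for illustration, not items taken from an actual SEL study plan.

    # A minimal, hypothetical sketch of goal-driven metric derivation,
    # in the spirit of the approach described above. None of these goal,
    # question, or metric names come from SEL documents.

    study_plan = {
        "goal": "Assess the effect of code reading on reliability",
        "questions": [
            {
                "question": "How many errors are reported after unit test?",
                "metrics": ["errors_after_unit_test", "thousand_lines_of_code"],
            },
            {
                "question": "How much effort does error correction take?",
                "metrics": ["hours_to_isolate_error", "hours_to_correct_error"],
            },
        ],
    }

    def metrics_to_collect(plan):
        """Return the set of metrics implied by the study questions."""
        return {m for q in plan["questions"] for m in q["metrics"]}

    print(sorted(metrics_to_collect(study_plan)))

Only after such a decomposition are the data collection forms and procedures for a study chosen.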

From the start of project development, information is provided (through the use of data collection forms) by the developers and managers for recording in a central data base [7]. This detailed information includes effort data, change/error data, development characteristics, and environment data. The data are quality assured three times before becoming an inherent part of the data base, where they will be studied by the SEL team. The development team is responsible for filling out forms and responding to short interviews with the study team. All data entry, quality assurance, and analysis are performed by staff personnel other than the development team. The prime goal of the development team is to complete the mission support project in a timely, cost-effective manner, not to study the technology.

1.3 Types of Data

The data collected by the SEL fall into three basic classes, based on how the data are collected:

1. Forms.
2. Interviews/subjective information.
3. Automated methods.

* Ada is a registered trademark of the U.S. Government, Ada Joint Program Office.

Forms. Forms are used to collect data on effort and resources, error and change characteristics, component characteristics, and project estimates. The Personnel Resources Form (Figure 1) is used to collect effort data from each project member on a weekly basis, detailing the number of hours each member of the team spent on different activities. The Change Report Form (Figure 2) provides data on component** changes and errors and is used any time a modification is made to source code that has completed its unit testing. The Component Origination Form (Figure 3) provides detailed component data and is used by project members after a unit has been coded and unit tested. Project estimates consist of managers' projections of phase dates, system size, and effort (resources) required to build the system. A form providing these dates and estimates is collected from project managers after each significant phase or milestone.
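As a rough sketch of the kinds of records these forms yield, the following defines minimal data structures for a weekly effort report, a change report, and a component origination report. The class and field names are simplified paraphrases of the form contents shown in Figures 1 through 3, not the SEL data base schema, and the example values are invented.

    from dataclasses import dataclass
    from datetime import date
    from typing import Dict, List

    @dataclass
    class PersonnelResourcesRecord:      # cf. Figure 1 (weekly, per person)
        person: str
        project: str
        week_ending: date
        hours_by_activity: Dict[str, float]  # e.g. {"design": 12.0, "code": 8.5}

    @dataclass
    class ChangeReport:                  # cf. Figure 2 (per change to unit-tested code)
        project: str
        components_changed: List[str]
        change_type: str                 # e.g. "error correction", "optimization"
        hours_to_isolate: float
        hours_to_implement: float
        is_error: bool = False

    @dataclass
    class ComponentOrigination:          # cf. Figure 3 (per unit-tested component)
        project: str
        component: str
        origin: str                      # "new", "slightly modified", ...
        difficulty: int = 3              # 1 (easy) .. 5 (hard)

    # Example: one week of effort for one developer (hypothetical numbers).
    record = PersonnelResourcesRecord(
        person="dev1", project="FLIGHTDYN",
        week_ending=date(1988, 7, 1),
        hours_by_activity={"create_design": 10.0, "write_code": 20.0, "test": 10.0},
    )
    print(sum(record.hours_by_activity.values()))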

Interviews/subjective information. Interviews and subjective data are collected both during and after project completion to gain additional insight into project characteristics and into forms that may have been completed by project developers and managers. The subjective data represent detailed characteristics of the development process from the eyes of the senior software engineer studying the project. Information such as methodology actually used, quality of tools applied, management approach, and level of discipline adhered to is recorded.

Automated methods. Automated data collection procedures characterize project parameters such as computer utilization, change and growth history, and product characteristics. Computer utilization data are taken from weekly computer accounting data and include the amount of CPU time and the number of computer runs made on a project. Change and growth data are collected automatically on a weekly basis by tools that monitor a project's controlled library to determine the number of components that have been changed, the total number of components, and the number of lines that exist for that week. Other tools are used after a project is completed to characterize the software product developed. All source code is processed to compute the detailed characteristics of the product, including size characteristics in lines of code per module, executable versus nonexecutable versus commentary lines, number of each type of statement, and number and type of modules.

** A component is defined as any separately compilable piece of source code.
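A change and growth monitoring tool of the kind described above could be approximated as follows. This is only a sketch, assuming the controlled library is a directory of source files on disk; the file extensions, function names, and use of a content hash to detect changed components are illustrative assumptions, not the SEL's actual tooling.

    import os, hashlib

    def snapshot(library_dir, extensions=(".for", ".f", ".ada")):
        """Return {component_name: (line_count, content_hash)} for the library."""
        snap = {}
        for name in os.listdir(library_dir):
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(library_dir, name)
            with open(path, "rb") as f:
                data = f.read()
            snap[name] = (data.count(b"\n"), hashlib.md5(data).hexdigest())
        return snap

    def weekly_growth(previous, current):
        """Compare two weekly snapshots: changed components, totals, total lines."""
        changed = [c for c in current
                   if c in previous and previous[c][1] != current[c][1]]
        return {
            "components_changed": len(changed),
            "total_components": len(current),
            "total_lines": sum(lines for lines, _ in current.values()),
        }

Run weekly against the controlled library, the output of weekly_growth would be appended to the history used for change and growth profiles.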


Figure 1. Personnel resources form. [Form not reproduced. Section A records total hours spent on the project for the week; Section B breaks those hours down by activity (work prior to design such as requirements analysis, create design, read/review design, write code, read/review code, test code units, debugging, integration test, acceptance test, and other); Section C records effort on specific activities such as rework caused by unplanned changes or errors, enhancement/optimization, documenting, and reuse.]


Figure 2. Change report form. [Form not reproduced. Section A identifies the project, programmer, and change; the components or documents affected; and the effort in person time to isolate and to implement the change. Section B classifies the type and effects of the change (e.g., error correction, insertion/deletion of debug code, optimization of time/space/accuracy, improvement of user services). Section C, for error corrections only, records the source, class, and characteristics of the error (e.g., initialization, logic/control structure, internal or external interface, computational).]


1.4 Types of Studies

Figure 3. Component origination form. [Form not reproduced. For each component entered into the controlled library, it records the project, subsystem, component, and programmer names; the relative difficulty of development (1 = easy to 5 = hard); the origin (new, extensively modified, slightly modified, or old/unchanged, and the source project or library if not new); the component type (e.g., FORTRAN executable source, assembler, JCL, 'INCLUDE' file, Ada specification or body, data files); and, for executable code, its major purpose (e.g., I/O processing, control, algorithmic/computational, interface to operating system).]

Having all of these data available allows the SEL to perform many different studies and experiments. The experiments can be classified into three types: profile development, methodology evaluations, and comparative project experiments.

To better understand the environment, its process, and the typical project, the SEL studies detailed profile information representing both the process and the products of projects in this environment. Profiles are designed to obtain a better understanding of how an organization does its business. They are not intended to evaluate whether the characteristics are right or wrong but to report on the method of software development. Characteristics such as effort distribution (between design, code, test, etc.), productivity, types of errors commonly committed, and the manner in which code is developed and changed over time are typical items that are captured.

The second type of experiment is the measurement of methodologies and their effects on the development process. This type of experiment compares projects that are similar in complexity and size of effort but which have applied different techniques to the development process. This type of experiment is used to understand the trends that a particular development technique may have on projects within the environment.

The third type of experiment is the direct comparison of two or more projects developing to the same requirements but using different methodologies. The value of this type of experiment is the ability to directly compare the effect of the methodology on the same project. Frequently, these types of experiments are conducted on small, contrived projects because the obvious prohibitive factor in this type of study is cost. Several such studies have, however, been conducted by the SEL.

Throughout the 12 years of the SEL, numerous technologies have been studied and reported on, providing extensive insight into the production of software in the SEL environment. Detailed results of the studies are presented in a series of reports and documents published by the SEL [8-11]. Some results from all three types of experiments are presented as examples later in this paper.

The rest of this paper focuses on the experiences and lessons learned from the SEL's extensive data collection and experimentation efforts. Examples of the results obtained from the various study approaches are presented first. Then the lessons learned from the data collection process itself and the SEL's experiences with the cost and general overhead of the experimentation process are discussed. Finally, the major drivers for supporting software measurement are summarized.

2. EXPERIENCES FROM 12 YEARS OF MEASUREMENT

Since the inception of the SEL in 1976, extensive effort has been put forth in the measurement of software development within the NASA/GSFC environment. Nearly 60 major projects have been monitored to varying levels of detail; experiments and studies with all of these projects have been conducted over the last decade.

The collected data have become part of a major corporate memory, the SEL data base, which currently requires over 15 megabytes of storage. All monitored projects have recorded effort/resource data, and most have detailed records of change and error data as well.

The SEL has used all of the 60 projects, representing over 3 million lines of code, to develop the profiles of the environment and to study general trends in the characteristics of the software process and product of that environment. In addition, approximately 35 of the projects have been (and still are) used in carrying out studies of specific software development methodologies. On six to eight projects, comparative experiments have been performed.

In total, over 150,000 forms have been collected and recorded in the SEL data base, from which the SEL has gleaned vast amounts of information. From the detailed studies performed by the SEL, some key insights have been gained into the measurement and experimentation process.

2.1 Study Examples

Examples of the three types of SEL experiments are presented here to illustrate the types of data collected and the results obtained.

Profile studies. The profile study attempts to characterize an aspect of the development environment. One completed study showed that individual productivity varies significantly within the SEL environment [12]. By recording detailed performance and effort data from over 150 individuals across 25 major projects, the SEL was able to estimate the productivity rates of these individuals. Although there are obvious problems and objections to using lines of code as a productivity measure, the SEL used it to at least compute trends in productivity. Figure 4 shows the computed variations in productivity for the particular projects studied. It shows that individual variation is much greater than project variation in both large (over 20,000 lines of code) and small projects. This simple productivity profile thus verifies a suspected result: good people are the most important methodology. If it could be determined why certain developers performed at a much higher rate than others, training programs could be developed and staffing could be determined that might bring out those characteristics. From a measurement perspective, level-of-effort data and the amount of code produced (size data) were the only items needed to produce this profile. Such relatively high-level data are easily collected and can offer a great deal of information. Figure 5 [13] is another example of an environmental profile, showing the percentage of effort by activity on a typical SEL project. The important point here is that the profile does not indicate whether the SEL development profile is good or bad, only where effort is being expended. Because such a low percentage of effort is spent in actually coding, the SEL has not concentrated its efforts on improving coding activities but has instead concentrated on improving more significant activities such as designing and testing. This profile was created with effort data (by activity) at a very high level; that is, no component-level effort data were used. In developing profiles, it should be remembered that it is major trends that are of interest, and these can typically be found without using exceptionally detailed information.

Figure 4. Productivity variation in the SEL (SLOC/hr). [Charts not reproduced: they show minimum, average, and maximum productivity by project and by individual programmer, for large and small projects, where a large project is greater than 20K source lines of code (SLOC).]

Figure 5. Effort by activity in the SEL. [Chart not reproduced: it shows the percentage of total effort spent in each development activity.]
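A productivity profile of the kind summarized in Figure 4 needs only effort and size data. The sketch below shows one way minimum, average, and maximum rates by person and by project could be computed; the records and numbers are invented, and lines of code per hour is used only because it is the trend measure discussed above.

    from statistics import mean

    # Hypothetical (person, project, hours_charged, sloc_produced) records.
    records = [
        ("dev1", "PROJ_A", 400.0, 1800),
        ("dev2", "PROJ_A", 520.0, 1100),
        ("dev3", "PROJ_A", 380.0, 3000),
        ("dev1", "PROJ_B", 300.0,  900),
        ("dev4", "PROJ_B", 450.0, 2500),
    ]

    def productivity_spread(records, key_index):
        """Group by person (0) or project (1) and report SLOC/hour min/avg/max."""
        totals = {}
        for rec in records:
            key = rec[key_index]
            hours, sloc = totals.get(key, (0.0, 0))
            totals[key] = (hours + rec[2], sloc + rec[3])
        rates = [sloc / hours for hours, sloc in totals.values()]
        return min(rates), mean(rates), max(rates)

    print("by person :", productivity_spread(records, 0))
    print("by project:", productivity_spread(records, 1))

With real data, the spread by person would be expected to be wider than the spread by project, which is the result Figure 4 reports.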

Methodological evaluations. Another type of experiment used by the SEL is the methodological evaluation, an example of which was presented at the Hawaii International Conference on Systems Sciences in 1985 [14]. The emphasis of the experiment was on determining the effects of computer resources and tool usage on software productivity, reliability, and maintainability. Productivity is defined as the number of lines of code produced per staff-month; reliability is the number of errors reported after unit testing per thousand lines of code; and maintainability is the average length of time needed to change or correct the software. The results showed that an increase in the use and quality of software tools had a significant positive effect on the productivity and maintainability of software projects but that the quality of the computer environment had no measurable effect on any of the three measures. A third, surprising result was that, as the number of terminals per programmer increased, productivity decreased. Subjective data were used in this analysis, though more than just the subjective data from the SEL data base were included. The subjective data in the data base did not cover all the areas needed, so additional, experiment-specific data were collected post facto from project managers. This experiment showed that high-level trends in project data were very useful in analysis, as opposed to detailed statistical data, which relates back to the earlier point that too much detail can obscure general results.
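The three measures used in that study follow directly from the definitions given above. The sketch below computes them for a single hypothetical project; the input values are invented and the function names are not taken from the SEL.

    def productivity(lines_of_code, staff_months):
        """Lines of code produced per staff-month."""
        return lines_of_code / staff_months

    def reliability(errors_after_unit_test, lines_of_code):
        """Errors reported after unit testing per thousand lines of code."""
        return errors_after_unit_test / (lines_of_code / 1000.0)

    def maintainability(hours_per_change):
        """Average length of time (hours) needed to change or correct the software."""
        return sum(hours_per_change) / len(hours_per_change)

    # Hypothetical project: 60,000 lines, 180 staff-months, 240 post-unit-test errors.
    print(productivity(60000, 180))              # about 333 LOC per staff-month
    print(reliability(240, 60000))               # 4 errors per thousand lines
    print(maintainability([2.0, 5.5, 1.0, 8.0]))  # average hours per change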

Another example of a methodological evaluation was the application of Independent Validation and Verification (IV&V) to three projects [12]. The IV&V experiment compared the effect of applying IV&V to three large-scale (over 100,000 lines of code) projects, using the measures of productivity, number of errors found early in the project, overall reliability, and cost to correct errors, with similar projects that did not use IV&V. Figure 6 shows the results of the experiment. Although more errors were found early in development using this method, the other three measures show that IV&V was not a particularly effective methodology in the SEL environment. Methodological measurement is extremely valuable to an organization in evaluating new techniques, given the existence of profiles as a mechanism for comparison.

Figure 6. Results of a study on IV&V methodology (based on three experiments). [Charts not reproduced: compared with the minimum, average, and maximum of non-IV&V projects, the IV&V projects showed increased cost, no improvement in reliability, more errors found early, and no difference in error correction cost.]

Comparative project studies. The third type of SEL experiment is a comparison of projects developing from the same requirements but using different methodologies. One study that used this method compared different testing strategies using small programs that were seeded with faults. Three testing strategies (structural, functional, and code reading) were compared to determine their relative ability and cost to locate the errors [15]. The experiment used professional programmers in the SEL environment and compared the quality of these three methods. The most prominent result from this experiment was that code reading proved to be the most effective means of testing for these study programs, regardless of the level of experience of the programmer; the more experienced developers were, however, even more adept at using it. An important corollary to this result is that code reading was also the least expensive technique for locating errors. Although code reading will never be the only testing strategy used by an organization, this experiment has provided a rationale for the SEL to stress code reading as an important part of the development process. For this experiment, specific data designed to answer the questions posed by the goals of the experiment were collected from the programmers. Although the typical data on effort were collected, more specific data were required to assess the methodologies accurately.

A production-size experiment currently being conducted by the SEL involves the comparison of Fortran and Ada as development languages. In this experiment, two independent teams, one using Fortran and one using Ada, are developing a major mission support project for GSFC. This type of software project is typical within the SEL environment and generally represents approximately 50,000 lines of source code; it is therefore a valid test of the two languages within the environment. Although this experiment is not complete, many observations and results have already been reported [16, 17]. The major early results from this experiment show that the use of Ada as a methodology, rather than as simply another programming language, will cause significant changes in the common waterfall model of software development. Typical management planning and tracking methods used in the past will have to be adjusted significantly to develop projects in Ada properly. Again in this experiment, a goal (evaluating Ada for the SEL environment) was established and specific data were collected to help reach that goal. Although this type of experiment is expensive, it does offer the best grounds for evaluating a methodology within an environment.

2.2 Experiences With Data Collection

From the many experiments performed by the SEL, important lessons have been learned about the relative importance of certain classes of data:

1. Critical data.
2. Data useful to specific studies.
3. Data of little or no use.

Critical data. Some data are critical to any organization and should therefore always be collected. These data are generally inexpensive to collect and are truly necessary for an organization to understand its development process. The first type of critical data is the information provided on phase dates; estimates of size and effort; methodology; environment; and tools used. These project characteristics, along with characteristics describing the final software project, are vital to characterizing a project and its environment. A second type of critical data is resource data, which include effort data and computer utilization data. The SEL recently changed its philosophy on the collection of effort data. Because accurate effort data by software component, which were formerly collected on most projects, are both difficult and expensive to collect, a high-level effort data form has been produced (Figure 1). This form is now being used to collect effort data by activity for the overall project and effort data on specific areas of interest to current studies. Computer utilization data are relatively inexpensive to collect, coming directly from computer accounting records, and can be easily used to profile computer use within an environment. This profile can be extremely useful in evaluating the progress of an ongoing development project. A third type of critical data is a record of changes and errors. Again, counts of changes and errors and the relative effort to implement them are needed for a complete profile of the development environment.

Data useful to specific studies. Data that are useful only to specific studies should be collected on an as-needed basis; these data are generally either somewhat more expensive or less critical to overall environment understanding. One example of this class of data, effort data by component, was used extensively in previous SEL studies but is now only extracted from specific projects with a defined study need for the data. Component information data are still being collected by the SEL for ongoing studies that require them; however, they are not as a rule critical for general profile studies. Detailed error data classifying types of errors may be important to some studies but are not as critical as the high-level error data discussed above. If an organization wants to evaluate the most frequent types of errors committed, these data will be needed; otherwise, it may not be worth the effort. Change and growth history data, although somewhat important to the environment profile, are more expensive to collect; an organization would therefore have to determine the relative value of collecting such data.

Least-used data. The third class of data is that which the SEL has found to be least useful. When the SEL first began collecting data, it attempted to generalize the data collection to gather all possible data from a project. Now, with the development of the Goal-Question-Metric paradigm [3], aside from the critical data, data are collected on the basis of experimental goals. This third class consists of the generalized data that were not used extensively in any of the studies. One such item was the Run Analysis Form, which recorded detailed information on each execution of a computer run. These data were not effectively used either as a profile or as data for studies and so are no longer collected. Originally, the SEL also collected over 250 subjective data items from each development project. The attempt was to capture every minute detail pertaining to development methodology, such as the level to which structured programming and top-down design were used, the development team's experience with the application, and the level of management control used on the project, but the extreme detail made actual evaluation of the methodology impossible. Because these 250 items proved to be much too detailed to use for any real studies, an effort was made to reduce the subjective data to 30-35 items. The SEL has just started to collect this type of data, and its usefulness remains undetermined.

2.3 Cost of Collecting Data

As noted previously, the major valid concern over carrying out measurement efforts in any software environment is cost. An organization must be convinced that the potential benefit from the measurement process exceeds the overhead that will be incurred via the process itself. Experience from the SEL has led to greater insight into the general cost of carrying out the measurement and analysis activity. The overhead can be divided into three major areas: data collection, processing, and analysis.

Data collection. The most visible overhead to the measurement process is that of collecting the data. Obviously, requiring all developers and managers to fill out forms reflecting development effort activities adds to the cost of the project. An analysis of the additional cost on over 60 projects shows that the cost of collecting data to the extent performed by the SEL (which includes all three classes of data) is approximately 2-3% of the total development cost.

Originally, the SEL had expected that the overhead would run approximately 8-10%. Early projects, partially because of wasted data collection attempts, incurred an overhead of approximately 5%. For the last 3-5 years, the general overhead cost of collecting data forms has converged to approximately 2-3% of the build cost.

Data processing. After forms have been completed or interviews with developers have been conducted, the available data must be quality checked, entered into some storage medium (data base), and archived. The SEL has permanent staff responsible for this effort for all projects being studied. By examining the total expenditures for designing and generating a data base and then looking at the recurring operational cost of the daily data processing relative to the cost of the projects being developed, the SEL has estimated the cost of data processing to be 5% of the total development effort being supported. This is the recurring operational cost after the initial data base design and setup has been completed. The initial startup cost could run anywhere from 3 to 10 staff-years of effort.

Data analysis. The most visible cost incurred is that of collecting and processing data, but the key to the overall measurement process is interpreting the studies: the data analysis. For the SEL, this effort has been the major expenditure in the measurement process. Obviously, an organization could expend as much effort as available to analyze data collected from software development projects. The experiences of the SEL, an organization that probably expends much more effort in analyzing experiment data than a typical organization would, are presented here.

The analysis overhead includes data interpretation, experiment planning, training, analysis, and report writing. This effort costs, on the average, 10-15% of the total development effort of the project being analyzed. This expenditure has resulted in a major improvement to the understanding and overall process of developing software in this environment. Standard techniques have been established, tools selected, and training needs determined via this analysis. The cost has been well worth it.
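Taken together, the overhead figures reported above suggest a total measurement cost on the order of one-fifth of the supported development effort. The following sketch simply applies those percentages to a hypothetical 100 staff-year development effort; the project size is an assumption made for illustration.

    # SEL-reported overhead ranges, as fractions of the supported development effort.
    overhead = {
        "data collection": (0.02, 0.03),
        "data processing": (0.05, 0.05),
        "data analysis":   (0.10, 0.15),
    }

    development_effort = 100.0  # hypothetical project, in staff-years

    low = sum(lo for lo, _ in overhead.values()) * development_effort
    high = sum(hi for _, hi in overhead.values()) * development_effort
    print("measurement overhead: %.0f-%.0f staff-years on a %.0f staff-year effort"
          % (low, high, development_effort))   # 17-23 staff-years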

3. SUMMARY AND DISCUSSION

An important point of this paper is that software development can be measured and many valuable results can be produced via the measurement process. These results may come at some expense, but the net gain in process understanding and improvement should be worth it. The most important lesson the SEL has learned from its years of measuring software is that it is necessary to define measurement goals before defining the data needed. By defining goals and letting them drive the data collected, an organization can save both time and money in its measurement efforts.

Although some organizations are measuring their software process, more should undertake this effort. Measuring software need not be a tremendously expensive task, and a small expense could lead to large gains in both quality and productivity. This improvement of the software development process and product should be enough to encourage organizations to begin at least a minimal amount of data collection. The SEL has learned many good reasons for any organization to start a measurement program; the most important ones are discussed below.

1. Provides a vital management planning aid. By continually measuring software development over a period of several years, an organization will develop a corporate memory that is an extremely useful management aid. The historical data can be used in predicting the future behavior of projects, estimating resources, and comparing current and past project developments. The corporate memory is also critically important in experimenting with new technologies because it provides a profile or baseline of the typical project within an environment. Thus, when a new technology is applied to a project within the environment, it is possible to determine the cause/effect relationship of that technology and therefore its applicability to that environment. Without accurate historical data, an organization can never determine if it is improving the way it develops software, regardless of how improving is defined.

2. Supports process understanding. By measuring software effectively, an organization can better understand how it develops that software. Measuring software allows an organization to develop profiles of such items as types and causes of errors, relative effort to make corrections, effort distributions across the life cycle, and computer utilization. By understanding the development process, an organization can identify strengths and weaknesses in that process and in the product produced. Before an organization can attempt to improve some facet of its development process, it must understand where the existing weaknesses lie, for example, the types of errors that are most common. Areas of strength can be stressed on future projects, while weaknesses might be addressed with new approaches. These new approaches can be experimented with on future projects to determine their effect based on the changes in the project profile.

3. Provides rationale for adopting a standard approach. Measuring software via experimentation allows for the adoption of a standard approach to developing software within an environment. If a technology or practice can be shown to measurably improve the development process, these data provide significant rationale and support for adopting that technology or practice as a standard technique. Without showing this effect within a particular environment, there is no rationale for using one particular set of techniques over any other.

4. Provides discipline to project development. The process of measurement (i.e., data collection) adds a de facto discipline to the project team by requiring members to realize the characteristics of the development process. The simple act of filling out forms that characterize effort or report an error gives the developer a clearer insight into the project. This project discipline in turn provides insight on software engineering technology to everyone.

For these reasons, collecting data and measuring software are vital activities for any software development organization. All of the above points lead to the conclusion that measuring software is a key step in improving an organization's software process and products.

The SEL has shown that software measurement can become an integral part of an organization's development process. Without software measurement, success or failure in developing software systems may be random. To clearly understand the software development process, the strengths and weaknesses of that process, and how best to improve that process, software measurement is critical. Clearly, by understanding an environment's profile, improvements in software development can be effected via the planned application and evaluation of new development technologies.

REFERENCES

1. C. E. Walston and C. P. Felix, A Method of Programming Measurement and Evaluation, IBM Sys. J. 16(1), 54-73 (1977).
2. V. R. Basili and H. D. Rombach, TAME: Integrating Measurement Into Software Environments, University of Maryland, Technical Report TR-1764, 1987.
3. B. Curtis, S. Sheppard, and P. Milliman, Third Time Charm: Stronger Prediction of Programmer Performance by Software Complexity Metrics, Proceedings of the 4th International Conference on Software Engineering, New York: Computer Society Press, pp. 356-360 (1979).
4. D. N. Card, F. E. McGarry, G. Page, et al., Software Engineering Laboratory, SEL-81-104, The Software Engineering Laboratory, Feb. 1982.
5. V. E. Church, D. N. Card, F. E. McGarry, et al., Software Engineering Laboratory, SEL-81-101, Guide to Data Collection, Aug. 1982.
6. V. R. Basili, Quantitative Evaluation of Software Methodology, University of Maryland, Technical Report TR-1519, 1985.
7. P. Lo and D. Wyckoff, Software Engineering Laboratory, SEL-81-102, Software Engineering Laboratory (SEL) Data Base Organization and User's Guide Revision 1, July 1983.
8. Software Engineering Laboratory, SEL-82-004, Collected Software Engineering Papers: Vol. I, July 1982.
9. Software Engineering Laboratory, SEL-83-003, Collected Software Engineering Papers: Vol. II, Nov. 1983.
10. Software Engineering Laboratory, SEL-85-003, Collected Software Engineering Papers: Vol. III, Nov. 1985.
11. Software Engineering Laboratory, SEL-86-004, Collected Software Engineering Papers: Vol. IV, Nov. 1986.
12. F. E. McGarry, What Have We Learned in the Last 6 Years? Proceedings of the 7th Annual Software Engineering Workshop, SEL-82-007, Software Engineering Laboratory, Dec. 1982.
13. F. E. McGarry, S. Voltz, and J. D. Valett, Determining Software Productivity Leverage Factors, Proceedings of the 11th Annual Software Engineering Workshop, SEL-86-006, Software Engineering Laboratory, Dec. 1986.
14. F. E. McGarry, J. D. Valett, and D. Hall, Measuring the Impact of Computer Resource Quality on the Software Development Process and Product, Proceedings of the Hawaiian International Conference on System Sciences, Jan. 1985.
15. D. N. Card, R. W. Selby, F. E. McGarry, et al., Software Engineering Laboratory, SEL-85-001, Comparison of Software Verification Techniques, April 1985.
16. W. W. Agresti, SEL Ada Experiment: Status and Design Experiences, Proceedings of the 11th Annual Software Engineering Workshop, SEL-86-006, Software Engineering Laboratory, Dec. 1986.
17. C. Brophy, et al., Software Engineering Laboratory, SEL-87-004, Assessing the Ada Design Process and Its Implementation: A Case Study, July 1987.