

Decision Support Systems 22 (1998) 337–353

Hitting the wall: errors in developing and code inspecting a ‘simple’ spreadsheet model 1

Raymond R. Panko *, Ralph H. Sprague Jr. 2

University of Hawaii, 2404 Maile Way, Honolulu, HI 96822, USA

Abstract

Field audits and experiments have found substantial error rates when students and professionals have built spreadsheet models. In this study, 102 undergraduate MIS majors and 50 MBA students developed a model from a word problem that was relatively simple and free of domain knowledge. Even so, 35% of their 152 models were incorrect. There was no significant difference in errors per model between undergraduates and MBAs. Even among the 17 MBAs with 250 h or more of experience, 24% of the models contained errors. The cell error rate (CER)—the percentage of cells with errors—was 2.0%. When 23 undergraduates attempted to audit their models through code inspection, only three with incorrect spreadsheets (13%) produced clean spreadsheets when they finished the audit. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Audit; Code inspection; Decision support system; End user computing; Error; Fault; Spreadsheet; Modeling

1. Introduction

1.1. The importance of spreadsheet modeling

Spreadsheet modeling is enormously important in business. Spreadsheet modeling has long been among the most widely used PC applications [28,37,42], especially among managers [28]. Many spreadsheet models guide important organizational decisions. It is difficult to imagine a firm making a critical decision without ‘going through the numbers’ with a spreadsheet model.

* Corresponding author. Tel.: +1-808-956-5049; fax: +1-808-956-9889; e-mail: [email protected]

1 An earlier version of this paper was presented at the Hawaii International Conference on System Sciences, January 1996.

2 E-mail: [email protected]

If anything, spreadsheet modeling should grow in importance in the future. Sprague and Carlson [50] said that there are three components of decision support systems (DSSs): modeling, data management, and the user interface. Early spreadsheet programs were only strong in modeling. However, the growth of macro languages and Microsoft’s Visual Basic for Applications have brought extensive tools for customizing the user interface and for linking various modules into a coherent whole. In addition, thanks to the Open Database Connectivity (ODBC) protocol, spreadsheet models on Windows PCs can access client/server databases.

Given the importance of spreadsheet programs, it would be a serious concern if even a small fraction of spreadsheets contain errors. Yet Table 1 suggests that spreadsheet errors actually are fairly common. The table presents summary data from a number of experiments and field audits of real-world spreadsheet models.


Table 1
Selected studies of spreadsheet errors

Study | Remarks | Cell error rate (CER) | Errors per model | Percent of models with errors

Field audits
Butler [1996] (footnote 4) | 273 operational models | – | – | 10.7%
Cragg and King [1993] [12] | 20 operational models | – | – | 25%
Davies and Ikin [1987] [15] | 19 operational models | – | – | 21%
Hicks [1995] (footnote 5) | 1 module with 19 submodules about to enter operation | 1.2% | 2.4 a | 26% a

Development experiments
Brown and Gould [1987] [8] | Minimal definition of errors | – | 0.6 | 44%
Brown and Gould [1987] [8] | Broader definition of errors | – | – | 63%
Hassinen [1988] [25] | Paper and pencil exercise | 4.3% b | 0.8 | 55%
Hassinen [1988] [25] | Computer exercise | – | 1.7 | 48%
Lerch [1988] [34] | Fill in formulas in template | 9.3% b | – | –
Janvrin and Morrison [1996] [29], Morrison [1995] [40] | Study 1: links between worksheets | 7%–14% c | – | 84%–95%
Janvrin and Morrison [1996] [29] | Study 2: links between worksheets | 8%–17% c | – | –
Panko and Halverson [forthcoming] [43] | General business students, working alone | 5.6% | 2.4 | 79%
Panko and Halverson [forthcoming] [43] | General business students, working in groups of four | 1.9% | 0.8 | 64%
Teo and Tan [1997] [53] | Undergraduate students | 2.0% | 0.5 | 42%

Code inspection experiments
Galletta et al. [1993] [21] | Finding errors in seeded models | – | – | 34%–54% d
Galletta et al. [1996–1997] [22] | Finding errors in seeded models | – | – | 45%–55% d

a Errors per model and percent of models with errors computed on basis of submodules.
b Errors per formula cell.
c Errors per inter-spreadsheet link.
d Percent of seeded errors not detected.

Perhaps the most significant pattern in the table is that every one of these studies has found error rates that would be unacceptable in practice.

One potential threat to external validity in the experimental data is the possibility that the tasks used in past experiments may have been too difficult for subjects or may have required task domain knowledge that the subjects did not possess. If so, the high error rates seen in past experiments could be reflections of task unsuitability rather than of spreadsheet development per se. To address this threat, our study used a task designed to be relatively simple and free of domain knowledge.

Another potential threat to external validity in the experimental data is that some experiments have used undergraduate students [25,34,43,53]. This raises the concern that undergraduates may be inadequate surrogates for spreadsheet developers in organizations. To address this concern, we drew our sample both from undergraduate business classes and from MBA classes. In addition, among the MBA subjects, we analyzed differences in error rates between inexperienced and experienced spreadsheet users.

From human error research in general [3,47], we know that error is present in all cognitive processes. As a result, we must develop ways to reduce inevitable errors by detecting and correcting them. One way to do this in spreadsheet modeling is to conduct code inspections [17], which involve the detailed examination of the model’s code after it is developed. Galletta et al. [21,22] conducted code inspection experiments using spreadsheet models seeded with errors. The models used in these experiments were developed by the experimenter, rather than by actual subjects. To see if people may have more


trouble code inspecting their own models, we had some of our subjects code inspect their own models.

Finally, we examined the types of errors that subjects made, using the Panko and Halverson [45] taxonomy of logical, mechanical, and omission errors. We were especially interested in seeing if undergraduates, inexperienced MBA students, and experienced MBA students made different types of errors.

2. Research on errors

2.1. Errors in real-world spreadsheet models

In recent years, we have seen a scattering of reports about errors in real-world spreadsheet models [44]. Given the reluctance of organizations to publicize embarrassments, these incidents may be only the tip of a large iceberg. A number of consultants, based on personal experience, have claimed that 20% to 40% of all spreadsheet models contain errors [13,16,32,48].

Table 1 shows that systematic field audits of real-world spreadsheet models have reinforced concerns about spreadsheet accuracy. These four audits involved data from 313 real-world models from over 200 organizations. These studies found errors in 10.7% to 26% of the spreadsheets or spreadsheet modules they examined. In addition, Dent described an audit in an Australian firm that found errors in about 30% of all models 3. Freeman [20], in turn, described a study by Coopers and Lybrand in London. This study found errors in 90% of all spreadsheet models with more than 150 rows.

Although the rates of errors found in these field audits varied somewhat, at least some of the differences probably are due to methodological differences. The Butler study, for example, analyzed spreadsheets with an automated analysis tool similar to a grammar checker for word processors 4. This

3 A. Dent, Personal communication with the first author via electronic mail, April 2, 1995.

4 R. Butler (Team Leader), Central Computer Audit Unit, HM Customs and Excise Computer Audit Unit, personal communications with the first author by electronic mail, August and September, 1996.

tool would not warn the auditor if the developer had used the wrong algorithm in a calculation. Nor would it catch many other errors. Cragg and King [12], in turn, note that their audits were fairly brief (only 2 h per model) and were done by a single person, and so may have missed errors. Only the Hicks field audit used a methodology similar to code inspection [17] in programming 5. It used a cell-by-cell team audit with one developer and two team members from other departments. This intensive code inspection found 45 errors in 3856 cells, for a cell error rate (CER) of 1.2%. Of 19 submodules, 26% had errors.

2.2. High error rates in experiments

While formal audits give us real-world data, they do not give us detailed information about the types of errors that people make when they create spreadsheet models. Nor do they tell us the frequency with which developers make errors. For such information, we need experiments in which numerous subjects perform an identical task. Table 1 shows results from a number of these experiments. All have found disturbingly high error rates.

In most experiments, a majority of the spreadsheet models that developers created contained at least one error. In addition, the CER—the percentage of cells containing errors—has been a few percent. For larger models, this suggests that the issue is not whether such models have errors but rather how many errors they contain.

Error rates differ across the experiments. However, this too seems to reflect methodological differences. The studies by Brown and Gould [8], Hassinen [25] 6, Panko and Halverson [43], and Teo and Tan [53] looked at errors for entire models at the end of a development phase. They had similar error rates. In contrast, the Lerch [34] and Janvrin and Morrison [29] studies only looked at certain especially difficult formula cells. This could plausibly account for their higher CER.

5 L. Hicks, NYNEX, private communication with the first author, June 1995.

6 K. Hassinen, University of Joensuu, private communication with the first author, January 1995. Provided further data from his 1988 study, so that the cell error rate could be computed.


2.3. Error rates in other cognitive activities

Reason [47] summarized recent cognitive research on human error. He argued that human cognitive processes are very fast and flexible, but their methods of operation inherently produce a small error rate. In other words, correct performance and errors are due to the same underlying mechanisms. In all human activities, there will be errors. The only question is the rate of uncorrected errors, that is, errors that are not detected and fixed by people as they work.

Research in other human cognitive activities has told us that errors are not merely inevitable; they are actually predictable. Research has produced human error data from a large number of experiments and real-world incidents across many types of human cognitive activity. These data collectively suggest that human beings have a natural uncorrected error rate of about 1%, plus or minus an order of magnitude. Of course the error rate will depend on the task, as will the error detection and correction rate. However, if spreadsheet error research did not also find errors in roughly 1% of all cells, the research itself would be rather suspect.

In programming, for example, we have data from numerous code inspections [17] of real-world programs developed by experienced programmers [5,7,17,30,49,52] 7,8,9. From these studies, we know that programmers have uncorrected errors in 1% to 10% of their statements even after careful development but before intensive testing. More precisely, programmers refer to what we call errors as faults, and they report fault rates in terms of faults per thousand lines of non-comment source code, KLOC. While there are different ways to count errors and lines of code [1], the consistency among real-world code inspections is striking, suggesting a natural

7 M. Graden, P. Horsley, The effects of software inspections on a major telecommunications project, AT&T Tech. J. 65 (1986), cited in Ref. [52].
8 K. McCormick, Results of using code inspection for the AT&T ICIS Project, in: 2nd Annual Symposium on EDP Quality Assurance, March, 1983, cited in Ref. [51].
9 P.B. Moranda, A comparison of software error-rate models, in: Texas Conference on Computing, 1975, pp. 2A-6.1–2A-6.9, cited in Ref. [19].

underlying error level in human cognitive processing for programming tasks.

In programming, faults per KLOC has proven to be a good measure because it is highly independent of the length of the program module being studied [e.g., 46]. It allows programmers to look at a core error rate in their work. We hope that the CER will be similarly useful. While larger spreadsheet models should have a larger number of errors than smaller models, we speculate that the CERs will be roughly similar, allowing us to see the underlying error rate in spreadsheet model development.

While spreadsheet models are often believed to be much simpler and smaller than programs, surveys of developers have shown that many spreadsheet models are both large [9,12,24] and complex [24]. Third-generation language programs studied in field audits, in turn, tend to be built in relatively small modules. So finding comparable error rates should not be surprising.

Also encouraging us to believe that the error rates shown in Table 1 are reasonable is the fact that comparable error rates have been found in a variety of other computer applications [10,11], even when expert subjects are used. There are also similar error rates when subjects use a calculator or look up numbers from a table [38].

2.4. Error cascades

In most matters, a handful of uncorrected errors in every hundred actions is a small penalty for speed and flexibility. But when there are long sequences of computations, as in spreadsheet models, even very low error rates cascade rapidly into a high probability of a bottom-line error.

Lorge and Solomon [35] developed a general method for analyzing error cascades. This method allows us to compute the probability of an error in a cascade of spreadsheet model cells. If there are N stages (in this case, N cells in a cascade), and if the error rate per stage is e (in this case, the CER), then the probability of an error in the bottom-line value at the end of the cascade, E, will be given by this formula:

E = 1 − (1 − e)^N    (1)


For instance, suppose that the CER is 2% and that there are only 50 cells in the cascade to a certain bottom-line value. Then the probability of an error in the bottom-line value will be 64%! Quite simply, unless the CER is vanishingly small, the probability of a bottom-line error will be quite large. In fact, for larger models, the issue will be how many errors the model is likely to have. For the conditions in Eq. (1), this would be the CER times the number of cells in the spreadsheet model.
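To make the cascade arithmetic concrete, here is a minimal Python sketch of Eq. (1), using the 2% CER and 50-cell example above; the 500-cell case is a hypothetical model size added purely for illustration, not a figure from the study.

```python
def bottom_line_error_probability(cer: float, n_cells: int) -> float:
    """Eq. (1): probability of at least one error in a cascade of n_cells,
    assuming an independent cell error rate (CER) of cer."""
    return 1.0 - (1.0 - cer) ** n_cells


def expected_errors(cer: float, n_cells: int) -> float:
    """Expected number of erroneous cells: the CER times the number of cells."""
    return cer * n_cells


# Example from the text: CER = 2%, 50 cells in the cascade -> about 64%.
print(round(bottom_line_error_probability(0.02, 50), 2))    # 0.64

# Hypothetical larger model (assumed size, for illustration only).
print(round(bottom_line_error_probability(0.02, 500), 4))   # effectively 1.0
print(expected_errors(0.02, 500))                           # 10.0 expected errors
```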

2.5. Research in error detection and correction

Errors are inevitable. We know from protocol analysis studies in statistics problem solving [2] and writing [26] that people both detect and correct errors as they go along and also engage occasionally in more systematic error checking episodes, in which they go back over their work before they finish. From these studies, we also know that they miss many of their errors. For simple slips, the detection and correction rate is over 90%. For more complex errors, the detection and correction rate is much lower.

In programming, we know that we have to spend about a third of the total development time on systematic error checking after development. Otherwise, our programs will have an unacceptably high number of errors upon delivery. Often, this systematic error checking takes the form of team code inspection [17]. Team code inspection is used because both experiments and field experience have shown that individual inspectors will not catch a large fraction of all programming errors in programs. In experiments, the subjects of Basili and Selby [4] and Myers [41] only caught half of all seeded errors in test programs. As Table 1 shows, the studies of Galletta et al. [21,22] had similar error detection rates with spreadsheet models seeded with errors. Tjahjono’s [14] subjects detected only 22% of seeded programming errors.

One concern with past laboratory studies is that the seeded models were developed by the experimenter rather than by subjects. As a result, the seeded experiments and spreadsheet models have followed good design practices. Subjects might have a more difficult time inspecting their own programs. First, many of their models used poor design and so might be difficult to read. Second, it seems plausible that people may have a harder time inspecting their own models than the models of others.

2.6. Types of errors

We noted earlier that error detection and correction rates seem to depend on the type of errors that are made. Several categorizations have been suggested for spreadsheet errors. We will follow the taxonomy created by Panko and Halverson [45]. They divided errors into mechanical, logical, and omission errors, based on a classification scheme developed by Allwood [2] to study students working on statistics problems.

2.6.1. Mechanical errors

Mechanical errors include mistyping a number, accidentally typing a plus sign instead of a minus sign, pointing to a wrong cell when entering a formula, reading a number incorrectly from a problem statement, or selecting the wrong range. In the human error literature, they are often called slips [3].

Surveying several typing studies, Kukich [33] found that expert typists make uncorrected keystroke (mechanical) errors in 0.5% to 1% of all characters typed. In another related area, Swain and Guttman conducted simulations of nuclear plant operation 10. They found that for simple tasks, such as selecting the right switch, error rates were between 0.3% and 1%.

In his field audit of a spreadsheet about to become operational, Hicks found that 64% of the errors discovered were mechanical errors, with pointing errors being the most common type of mechanical error 5. In their spreadsheet experiments, Brown and Gould [8], Floyd and Pyun [18], and Lerch [34] all found considerable numbers of errors that we would classify as mechanical.

2.6.2. Logic errors

Logic errors involve faulty reasoning instead of simple mechanical slips. Logic errors include the use

10 A.D. Swain, H.E. Guttman, Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications (Draft report for interim use and comment), Technical Report NUREG/CR-1278, U.S. Nuclear Regulatory Commission, Washington, DC, October 1980, cited in Ref. [31].


of the wrong algorithm or implementing the algorithm with the wrong logic. For instance, in our task, the subject had to add a 30% profit margin to expected costs. Some subjects divided costs by 70% instead of multiplying them by 130%. If it took you a second or two to see why that was incorrect, you see the subtlety of many logic errors.

Logic errors have been studied in other cognitive domains. In statistics, for example, Allwood [2] found many logic errors in statistical problem solving by novices. In programming, we have long known that logic errors are important even in programs by expert programmers. In one study of 4339 lines of code at Aetna, for instance, 51% of the errors discovered were logic errors [17].

Panko and Halverson [43] found many logic errors in their spreadsheet experiment and subdivided logic errors into Eureka errors [35], which are easy to prove to be incorrect, and Cassandra errors, which are difficult to prove to be incorrect. This distinction was important because some subjects worked in teams. It was hypothesized that teams would have a difficult time dealing with Cassandra errors, and this in fact proved to be true. Panko and Halverson [45] suggested additional ways to subcategorize logic errors. Among the spreadsheet studies in Table 1, Hicks 5, Brown and Gould [8], and Lerch [34] all found logic errors.

2.6.3. Omission errors

Finally, omission errors involve leaving something out of a model. For example, to compute the amount of a loan that he would need when purchasing a new home, one analyst forgot to include paying off the old mortgage, even though he had included the mortgage payoff in several previous analyses of the sale.

Specifically, in an experiment, an omission error is omitting a parameter given in the task statement. For instance, if respondents have to compute labor costs, they may forget to include fringe benefits or may forget that there are two members of a work team.

Only 4% of Hicks’ errors were omissions, although the main logic error consisted of omitting a parameter from an equation 5. In the Brown and Gould [8] experiment, when omission errors were excluded, 44% of the spreadsheet models had errors. When omission errors were included, this rose to 63%.

Omission errors are dangerous because they seem to be very difficult for developers to detect. Allwood’s [2] subjects detected none of their omission errors. In simulations of nuclear emergencies, Woods’ subjects never corrected misdiagnoses of the problem 11.

It can be argued that the Panko and Halverson [45] taxonomy should be a two-by-two matrix, with errors of omission and commission for both mechanical and logic error rates. Indeed, this would be ideal. However, it is very difficult in practice to characterize omission errors as logical or mechanical without interrogating subjects as they work. Panko and Halverson [43] did not do so. (Nor did we in this experiment.)

Some writers, including Reason [47], merge omission and commission errors. However, given differences in detection and correction rates found between omission and other errors in past studies, it seems best to keep a specific omission category.

3. Research goals and hypotheses

Now that we have surveyed past research on spreadsheet errors, we will turn to our specific research goals and hypotheses.

3.1. Measuring error rates for a simple model

As discussed above, the error rates seen in past laboratory studies could plausibly be due to the tasks being too difficult for subjects or requiring task domain knowledge that subjects did not possess. So one research goal was to develop a simple and relatively domain-free model, then measure error rates for subjects. Given past research shown in Table 1, we selected three measures of error rates: the percentage of incorrect models containing at least one error, the number of errors per model, and the CER.

11 D.D. Woods, Some results on operator performance in emergency events, in: Institute of Chemical Engineers Symposium, Series 90 (1984) pp. 21–31, cited in Ref. [47].


3.2. Comparing error rates for different types of subjects

As discussed earlier, many studies have been done with undergraduate subjects. This led to our goal of having different types of subjects do the task, so that we could compare their error rates. We chose to compare undergraduate business students with MBA students, to see if the latter group would in fact make fewer errors. We also wished to compare inexperienced with experienced spreadsheet developers. This was only possible to do among our MBA students. These goals led to the following specific hypotheses: (a) H1, MBA students make fewer errors per model than undergraduate students. (b) H2, Experienced MBA students make fewer errors per model than inexperienced MBA students.

3.3. Types of errors

Given the taxonomy of error types given earlier, we wished to measure the percentages of errors that would fall into the logical, mechanical, and omission error categories.

3.4. Code inspection

As discussed earlier, code inspection studies to date have used models developed by the experimenter. Consequently, we wished to know if subjects code inspecting their own models will find fewer errors than the 50% figures found by Galletta et al. [21,22]. We did not create a formal hypothesis because this part of the research was exploratory. We had no independent data to tell us if a randomly-selected sample of subjects would correct 50% of the errors in these specific models.

3.5. Identifying subjects who made errors

In most human activities, there is a range of ability within the population. In general, studying individual differences in human error is extremely difficult, because in any given task, most subjects make no errors and the rest make only a handful. With a different task, or perhaps even with the same task on another day, the results could be different in terms of who makes errors and how many errors they make. To study individual differences conclusively in human error domains requires giving subjects a long series of tasks, in order to get a good average error rate for individuals. Nevertheless, the fact that we had a post-experiment questionnaire allowed us to compare subjects who made errors in our single task with subjects who did not. This part of the analysis was completely exploratory.

4. Methodology

4.1. The sample

The sample consisted of 152 students at the University of Hawaii. All participated as a class requirement. Students received full credit if they ‘gave the project their best shot.’

In the sample, 102 subjects were upper-division undergraduate business students. All were MIS majors. All had previously taken two accounting courses and an introductory computer course that covered spreadsheet mechanics. An additional six undergraduates were excluded from the sample because their models or disks were unreadable.

The remaining 50 subjects were MBA students who had previously taken an accounting class or who had waived the class because of a previous accounting course. For spreadsheet modeling, all had taken the required course covering this topic or were currently taking the course. Those currently taking the course had already covered the spreadsheet modeling part of the course. An additional five MBA students were given the task but were excluded from the sample because their models or disks were unreadable.

Following the procedures used by Galletta et al. [21], MBA students were subdivided into inexperienced and experienced spreadsheet developers based on hours of experience. Inexperienced subjects were those with 100 h or less of spreadsheet experience. Experienced subjects were those with 250 h or more of experience. Following Galletta et al. [21], we only considered development, auditing, and training experience. We did not consider time spent typing numbers into spreadsheets created by others.

There were 26 inexperienced spreadsheet developers in our MBA sample. They had a mean of only


11 h of experience. Twenty-one had no experience developing models at work. There were 17 experienced spreadsheet developers in the MBA sample. Their mean hours of experience was 2269. The median was 635. One subject with 200 h of experience was not categorized because he or she had more than 100 h but less than 250. Another six could not be categorized because of ambiguities in their responses to the experience scale.

In a post-experiment questionnaire, subjects rated their spreadsheet expertise. The mean on a 7-point scale was 4.8 (4.8 for undergraduates, 4.7 for MBAs). Sixty-two percent used the three highest values on the scale. Another 21% selected the middle value.

Another question asked about the adequacy of their spreadsheet knowledge for the experimental task. Eighty-eight percent rated their spreadsheet knowledge as adequate. Another 11% rated their knowledge as barely adequate. One subject rated his or her knowledge as inadequate. This MBA student, however, built a correct model. Undergraduate and graduate distributions were almost identical. This distribution seems reasonable, because the task only required basic spreadsheet skills.

In a third question, we asked subjects if they had a difficult time with the spreadsheet knowledge required for the specific task in the experiment. Eighty-six percent disagreed, with 56% choosing extreme disagreement on the 7-point scale. Ten percent agreed, with 1% choosing extreme agreement.

4.2. Procedure

The subjects did not work in the laboratory. Instead, they took the experimental materials home in a sealed envelope. They opened the envelope while sitting in front of a blank spreadsheet file. They had 45 min to do the task, which was about twice the length of time subjects averaged on a pre-test. Eighty-nine percent said that they had sufficient time, 5% chose the neutral value on a 7-point scale, and 6% chose values of 1 through 3. Seventy-three percent chose 7, indicating strongest agreement.

Not using the control of laboratory work is controversial. However, there was little incentive to cheat, because subjects knew that they would get full credit if they merely gave the task their best effort.

More importantly, we argue that not doing the task in a laboratory added to realism. One concern with laboratory studies is that they are unrealistic and so create errors. By allowing subjects to work in a more comfortable environment, we hoped to reduce that threat to external validity.

As a cross check, we had another 10 undergraduate subjects do the development task in the laboratory. Thirty percent had errors, in contrast to 37% of the undergraduates who did the assignment on a take-home basis. The difference in number of errors per spreadsheet was not significant. In addition, if people working at home had cheated, we would expect them to have had a lower rate of incorrect spreadsheets. We also checked each of the models in this study to ensure that they were not merely copies of someone else’s work.

The packet contained a consent form and a set of instructions. Both were explained in class before handing out the packet. The packet also contained a brief problem statement, which we present below.

Finally, the packet also contained a post-experiment questionnaire. This questionnaire asked about

Fig. 1. The Wall Task. You are to build a spreadsheet model to help you create a bid to build a wall. You will offer two options—lava rock or brick. Both walls will be built by crews of two. Crews will work 3 8-h days to build either type of wall. The wall will be 20 ft long, 6 ft tall, and 2 ft thick. Wages will be US$10 per h per person. You will have to add 20% to wages to cover fringe benefits. Lava rock will cost US$3 per cubic foot. Brick will cost US$2 per cubic foot. Your bid must add a profit margin of 30% to your expected cost.


the subjects’ perceptions of the problem, the experiment experience, their performance, and their background.

When some undergraduate subjects returned their packets, they were given 10 min of in-class instruction on the data in Table 1. The purpose was to sensitize them to the dangers of spreadsheet errors. They were then taught for 10 min how to code-inspect a model by going through it cell-by-cell. We showed them how to check for incorrect formulas and noted that they need to check facts in the model against those in the problem statement. We showed them how to use Excel’s tools for drawing arrows to the cells referred to in formulas. The subjects were then given their disks and problem statements to take home again and code inspect.

4.3. The task

As discussed earlier, the task was designed to be simple and relatively domain-free. Fig. 1 shows the specific task used in the experiment. We call it the ‘Wall Task,’ because it requires the subject to prepare bids for building a wall made of either brick or lava rock. The domain knowledge consists of measuring the wall’s volume, multiplying this by the cost per cubic foot, simple labor calculations, and the additions of fringe benefits and a profit margin.
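To make those calculations concrete, the following Python sketch works through the arithmetic implied by the parameters in Fig. 1. It is a plain translation of the word problem, not a reproduction of the standard spreadsheet solution shown in Fig. 2, and the variable names are illustrative.

```python
# Parameters taken directly from the task statement in Fig. 1.
LENGTH_FT, HEIGHT_FT, THICK_FT = 20, 6, 2
CREW_SIZE, DAYS, HOURS_PER_DAY = 2, 3, 8
WAGE_PER_HOUR = 10.00            # US$ per hour per person
FRINGE_RATE = 0.20               # add 20% to wages for fringe benefits
PROFIT_MARGIN = 0.30             # add 30% to expected cost
COST_PER_CUBIC_FT = {"lava rock": 3.00, "brick": 2.00}

volume = LENGTH_FT * HEIGHT_FT * THICK_FT                   # 240 cubic ft
labor = CREW_SIZE * DAYS * HOURS_PER_DAY * WAGE_PER_HOUR    # 480.00
labor_with_fringe = labor * (1 + FRINGE_RATE)               # 576.00

for material, unit_cost in COST_PER_CUBIC_FT.items():
    expected_cost = labor_with_fringe + volume * unit_cost
    bid = expected_cost * (1 + PROFIT_MARGIN)
    print(f"{material}: expected cost {expected_cost:.2f}, bid {bid:.2f}")
# lava rock: expected cost 1296.00, bid 1684.80
# brick: expected cost 1056.00, bid 1372.80
```

Several of the errors reported below—omitting the second worker, omitting the 3 days, or dividing by 0.7 instead of multiplying by 1.3—correspond to a single wrong term in this short chain of formulas.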

In the post-experiment questionnaire, we asked respondents to rate the problem’s difficulty on a 5-point scale, with 5 being high. Sixty-six percent of the respondents chose 1 or 2, and another 29% chose 3. Only 5% chose values at the high (difficult) end of the scale. We seem to have succeeded in producing a relatively simple problem from most subjects’ point of view.

In another question, we asked respondents if they had a difficult time with the knowledge required in the problem. Seventy percent disagreed, with 41% choosing extreme disagreement. Sixteen percent agreed, with 2% choosing extreme agreement.

4.4. Error determination

To assess errors, we used a standard spreadsheet solution. Fig. 2 shows this solution.

The first author compared the subjects’ models with the standard solution. If there was no error in

Fig. 2. Standard solution for the wall problem.

the bottom line values, he recorded that fact. If a bottom line value in the subject’s model was incorrect, he identified the errors and corrected them until the model gave the correct bottom line values.


Fig. 3. Rules for Classifying Errors by Type. (a) Omission error, failing to include a fact explicitly given in the task statement; (b) Logic error, a mistake caused by the subject having the wrong algorithm for solving the step or making a mistake implementing the algorithm; (c) Mechanical error, an error caused by a simple slip such as pointing to the wrong cell, mistyping a number, typing the wrong operator, selecting the wrong cell or range, misreading a number from the task statement, and so forth.

4.5. Classifying errors

Of the 63 errors, 61 were classified as mechanical, logical, or omission errors. (Two errors could not be classified because the original spreadsheets were damaged.) As the first author corrected errors, he wrote a brief description of each error. After the descriptions were collected, both authors independently classified errors into mechanical, logical, and omission errors. Fig. 3 shows the definitions they used. They disagreed upon only two errors initially. One was indisputably an omission error; it involved leaving a fact in the problem statement out of the model. One of the authors, however, initially classified it as a mechanical problem. The second involved multiplying the labor cost by 0.2 to give labor cost plus fringe benefits. The labor cost should have been multiplied by 1.2. One author classified this as a mechanical error, the other as a logic error. Based on the fact that students in homework assignments frequently make the mistake of multiplying by a factor instead of by one plus the factor, it was jointly decided that this was a logic error. Identical mechanical errors are not made frequently by multiple people. Measured by the kappa statistic [6], the inter-rater reliability was 0.939.
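For reference, Cohen’s kappa compares observed rater agreement with the agreement expected by chance from the raters’ marginal totals. The short Python sketch below computes it from a 3×3 agreement matrix; the counts shown are an illustrative reconstruction consistent with the figures reported here (61 classified errors, two initial disagreements), not the actual rating data, so the result only approximates the reported 0.939.

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square inter-rater agreement matrix
    (rows: rater 1's categories, columns: rater 2's categories)."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (observed - expected) / (1 - expected)


# Categories: omission, logic, mechanical. Illustrative counts only.
ratings = [
    [32, 0, 1],   # rater 1: omission
    [0, 26, 0],   # rater 1: logic
    [0, 1, 1],    # rater 1: mechanical
]
print(round(cohens_kappa(ratings), 3))  # ~0.938
```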

5. Results

Table 2 summarizes results for both the development and code inspection phases of the experiment.

5.1. Undergraduates vs. MBAs

A quick scan of Table 2 indicates that MBAs did not do much better than undergraduates. Even a quarter of the MBA students with more than 250 h of model development experience had errors in their models. Confirming this impression, the t-test for undergraduates vs. MBAs had a probability of 0.223. Hypothesis H1, that MBA students make fewer errors per model than undergraduate students, was not supported.

An F-test comparing undergraduates, experienced MBAs, and inexperienced MBAs also found no statistically significant difference, with a probability of 0.414. The difference between experienced and inexperienced MBAs had a probability of 0.112, based on a t-test for the number of errors in the model. Hypothesis H2, that experienced MBA students make fewer errors per model than inexperienced MBA students, was not supported.

A larger sample of experienced MBAs might have produced statistical significance. However, even if the difference between experienced and inexperienced MBA students had been statistically significant, it would make little practical difference. Even on this simple problem, a quarter of the experienced MBA students made errors. This fraction is unacceptable from a business viewpoint. So trying to predict differences in error rates across groups, while potentially useful in assessing detailed error rates, would leave intact the main error finding—that all groups produced a level of errors that would be unacceptable in practice.


Table 2
Summary of results

Measure | Total sample | Undergrad MIS majors | All MBA students | Inexperienced MBA students a | Experienced MBA students a

Development phase
Number of subjects | 152 | 102 | 50 | 26 | 17
Spreadsheets with errors (%) | 35% | 37% | 30% | 35% | 24%
Errors per spreadsheet (mean) | 0.41 | 0.44 | 0.36 | 0.46 | 0.24
Standard deviation | 0.61 | 0.62 | 0.60 | 0.71 | 0.44
Total errors | 63 | 45 | 18 | 12 | 4
CER (cell error rate), overall | 2.0% | 2.2% | 1.7% | 2.3% | 1.1%

Code inspection phase b
Inspected spreadsheets with errors | 23
Models corrected completely | 3
% of models corrected completely | 13%
% of errors corrected | 18%

The t-test for the number of errors per model for undergraduates vs. MBAs had a probability of 0.223.
The F-test for differences in the number of errors per model among undergraduates, inexperienced MBAs and experienced MBAs had a probability of 0.414.
a Experienced MBA students have had 250 h or more of creating, auditing, and teaching spreadsheets at work. Inexperienced MBA students had 100 h or fewer.
b Code inspections were only done by 23 undergraduate students. No student with a correct spreadsheet model changed it during code inspection.


Because of the lack of statistical significance, our reporting will focus on data from the total sample. Table 2, however, gives more specific information by group.

5.2. Spreadsheets with errors

The simplest measure of errors is the fraction of all spreadsheet models that contained errors. In this study, 53 of the 152 spreadsheets developed by the subjects had errors. Although this 35% error rate was lower than the error rates found in past experiments, it was still quite high. Even with a rather simple and domain-free task, errors were abundant.

5.3. Numbers of errors

As in the studies by Brown and Gould [8], Panko and Halverson [43], and Hassinen [25] 6, although the fraction of spreadsheets with errors was high, the subjects actually made very few errors per model on the average. Our subjects only made 63 errors—a mere 0.42 errors per model. Even among the 53 models with errors, only 10 had two errors. None had more than two.

As Panko and Halverson [43] discussed, the problem with spreadsheets is not that people make a large number of errors. It is that there are many cells on the logic cascades of cells leading to the bottom-line figures. As noted earlier, even a tiny CER will be multiplied over a logic cascade into a high probability of an error in bottom-line values.

5.4. Cell error rate

As discussed earlier, we are especially interested in the CER—the number of errors per hundred model cells. Our subjects’ models averaged 20.5 cells per spreadsheet, with a high standard deviation of 7.6. There was no statistical difference between correct spreadsheets (20.7 cells) and incorrect spreadsheets (20.2 cells).

The 152 models had a total of 3116 cells. The subjects made 63 errors, so the cell error rate was 2.0%. If the CER had been based on the standard solution’s 25 cells per model, the CER would be 1.7%.
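Both CER figures follow directly from the reported counts; as a quick check, a minimal Python sketch using only the numbers given in the text:

```python
total_errors = 63
total_cells = 3116                 # total cells across the 152 subject models
models = 152
standard_solution_cells = 25       # cells in the standard solution (Fig. 2)

print(f"CER over actual cells:            {total_errors / total_cells:.1%}")
print(f"CER over standard-solution cells: {total_errors / (models * standard_solution_cells):.1%}")
# CER over actual cells:            2.0%
# CER over standard-solution cells: 1.7%
```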

As expected, given the simple nature of the task, this cell error rate was lower than those shown in Table 1. This suggests that our problem was indeed simpler than past problems. The only exception is the Teo and Tan [53] study, which also used our Wall Task 12.

However, even this relatively low CER would still be unacceptable in the real world. In large models, having 2% of all cells in error would mean not just a high probability of an error in the bottom line but also a high number of errors per model. As noted earlier, the Hicks field audit had a CER of 1.2% and found many errors in the model audited 5.

5.5. Types of errors

The study by Panko and Halverson [43] noted that subjects made many distinct errors. This was also true in the current study. Table 3 shows the errors made by subjects.

Table 3 shows that undergraduates and MBAs made extremely similar percentages of omission, logic, and mechanical errors. In fact, despite the relatively small number of errors, this similarity extended down even to common individual errors within categories.

5.5.1. Omission errors

Omission errors were the most common, accounting for 54% of all errors. Interestingly, there were only five distinct omission errors, and all but one were committed more than once. One error—forgetting that there were two people in each work crew—accounted for a third of all errors in the entire experiment.

In fact, there were 25 omission errors in the computation of labor cost. These accounted for almost half the errors in the total study. One possible explanation is that the number of facts in the problem statement for the computation of labor costs was fairly large, so that subjects may have had a difficult time retaining the information in their limited working memory. Our subjects could have used scratch paper to write down facts as they worked, but we

12 They computed their CER on the basis of the number of cells in the standard solution.


Table 3
Errors by type

Type of error | Number of errors | % of errors for total sample | % of errors for undergrads | % of errors for MBAs
Omission errors | 33 | 54% | 56% | 50%
  Omitted two workers in labor calculation | 20 | 33% | 35% | 28%
  Omitted 30% profit margin | 4 | 7% | 7% | 6%
  Omitted fringe benefits | 4 | 7% | 7% | 6%
  Omitted 3 days in labor calculation | 4 | 7% | 7% | 6%
  One other omission error occurred once
Logic errors | 26 | 43% | 42% | 44%
  Profit margin on materials only | 6 | 10% | 9% | 11%
  For profit margin, divided by 0.7 | 5 | 8% | 7% | 11%
  Two other logic errors occurred twice
  11 other logic errors occurred once
Mechanical errors | 2 | 3% | 2% | 6%
  Two mechanical errors occurred once
Total | 61 | 100% | 100% | 100%

Two errors could not be classified because of damage to the spreadsheet models.

collected all papers the subjects used, and very few wrote facts on the task statement or on scratch paper. Various omitted facts may have literally dropped out of memory. This may be a fruitful area for further research. This is plausible because we have long known that working memory can only hold about seven items [39]. This limited working memory is used not only to hold numbers but also to hold plans for the algorithm being considered and for broader planning [3,47]. Mattson and Baars [36] argued that error detection is in part determined by available attentional resources. If the resources are not sufficient, they argued, omission errors will occur.

5.5.2. Logic errors

Logic errors were almost as numerous but were more diverse. While there were only five distinct omission errors, there were 16 distinct logical errors. At least in this study, then, logic errors were much less predictable than omission errors. There seems to be a strong random element in logic error-making.

5.5.3. Mechanical errors

Mechanical errors were almost nonexistent; there were only two definite mechanical errors. One was a pointing error, while the other appeared to be an error in reading and then writing down the fringe benefit rate. The popular perception of spreadsheet errors as pointing and typographical errors was not borne out in this study.

Typographical errors, in fact, were nonexistent, although one or two errors classified in other ways could possibly have been typographical errors. This lack of typographical errors was astonishing, because skilled typists make one uncorrected error in about every 200 keystrokes [33]. In contrast, our subjects probably hit 20,000 to 30,000 keys without making a single uncorrected typographical error in numbers and formulas (we did not study text cells).

5.5.4. Types of errors: implications

The wide variety of errors suggests that when we audit spreadsheets, it will not be enough to inspect parts of the problem that seem especially difficult and to ignore other parts of the model. Although some types of errors are more likely than others, error-making appears to be extremely diverse. If this is generally true, it will mean that specific errors will not be very predictable. From one study, of course, we cannot draw general conclusions. We need to see if similar patterns appear in other experiments and, more importantly, in real-world spreadsheet models.

5.6. Code inspection

As noted earlier, some subjects were given their models back with instructions to code inspect them. This was not possible in all undergraduate classes or in MBA classes from which subjects were drawn.


None of the subjects with correct spreadsheets made any changes. Of the subjects with incorrect spreadsheets, 23 attempted to correct their spreadsheets. As discussed above, we anticipated that our subjects would do poorly because they were inspecting their own models and because their models tended to be difficult to read. Indeed, only three of the 23 subjects with errors (13%) corrected their spreadsheets. Another 17 made no changes. Three caught a single error in a two-error model, and one other actually added an error without fixing any. Counting errors instead of models, the subjects fixed only 21% of their 23 errors. Counting the subject who added an error during the code inspection, a net of 18% of the errors were fixed.

Our error rates are higher than those found in the code inspection studies by Galletta et al. [21,22]. In those studies, subjects caught about half of all seeded errors. One might dismiss our lower error detection rates by saying that our students were undergraduates, while Galletta et al. used CPAs and MBAs. However, we gave the 1996 Galletta et al. problem set and half of the 1993 problem set to another group of 26 undergraduate MIS majors. (The other half of the 1993 problem set involved accounting knowledge we could not assume for our students.) Our undergraduate subjects caught almost exactly the same fraction of errors that the Galletta et al. subjects [21,22] caught. So class standing cannot be the whole story.

5.7. Who makes errors?

Can we distinguish between people who make errors and people who do not? In general, the questions that we asked on the post-experiment questionnaire provided little power to distinguish people who built correct spreadsheets from those who did not.

We asked 36 questions that might have distinguished between the two groups of subjects. Among those that did not distinguish between spreadsheets with errors and clean spreadsheets at the 0.05 cut-off were confidence in the accuracy of the spreadsheet and various measures of prior knowledge. Only three questions distinguished subjects who made errors from those that did not at the 0.05 cut-off.

First, when asked for the best number of people to have done the model development in a team, subjects with errors had a mean of 1.7 people, while for subjects without errors, the mean was 1.4. This suggests that the subjects who made errors may have been somewhat lacking in confidence after the experiment, despite their answers to direct confidence questions.

Second, subjects who made errors were more likely to have said that they had a difficult time with the accounting knowledge required in the problem (2.94 on a 7-point scale) than were those who did not have errors (2.32). However, the difference in means was small.

Third, subjects who did not have errors rated their accounting expertise somewhat higher (4.70 on a 7-point scale) than did subjects who had errors (4.10).

These few differences, although statistically significant, were too small to have any practical predictive power. It appears that giving people questionnaires to assess whether or not they commit errors will not have much predictive power. In addition, given 36 questions, three positives is about what one would expect by chance.

These results seem to suggest that error-making has a strong random element. The same people who made errors in this experiment might be the people with correct spreadsheets in another experiment. Of course, there might be other factors that we have not taken into account.

6. Conclusion

One threat to external validity in past spreadsheet experiments has been the concern that tasks used in the experiments were too difficult for subjects or required domain knowledge that subjects did not possess. To assess this possibility, our subjects used a task that was simple and relatively free of requirements for domain knowledge. Although our subjects did have somewhat lower error rates than subjects in past experiments, our subjects still made errors in 35% of their models and in 2.0% of all cells. If real-world spreadsheets had even an order of magnitude fewer errors, this would still be too much for safety in corporate spreadsheet development. When our subjects code inspected their own models for errors, furthermore, they found only about one error


in ten. Overall, the concern that past experiments used problems that were too difficult for the subjects cannot be used to dismiss our subjects’ unacceptably high error rates.

Quite simply, to err is human. Human factors studies, including those using computers, have consistently shown that while people do not make many errors, they do have natural error rates that often are on the order of 1% or more of their actions. Most of these human factors studies, furthermore, have been done on people who are experts in their fields. Error-making is not a novice-level phenomenon. Our study found that inexperienced and experienced spreadsheet developers made about the same number of errors per spreadsheet model. In addition, Galletta et al. [21] found that when experienced spreadsheet developers audited models, they did not find a higher percentage of the errors in these models than did inexperienced spreadsheet developers. This lack of large differences between relative novices and experts has also been seen in other studies, for instance in Grudin’s [23] study of typing. This is not to say that novice–expert differences do not exist. It is only to say that they are not an order of magnitude in size. Looking at the error rates in Table 1, we would need error rates lower by one or two orders of magnitude to make spreadsheet modeling a safe activity.

Professional programmers have long known that they have error rates comparable to those in Table 1 when they have ‘finished’ a program or module [5,7,17,30,49,52] 7,8,9. As a result, professional programmers use development disciplines that call for spending about a third of their time testing the program. One of the techniques they use to check for errors is code inspection [17], in which a team of programmers systematically checks the code for errors. First, members of the team check the code individually. Then, in a meeting, they read through the program line by line. During this process, members report the errors they discovered before the meeting. In addition, the team discovers additional program faults during the meeting.

Unfortunately, independent audits are rare in spreadsheet modeling [24], and even data testing with extreme values is uncommon [24]. In addition, data on spreadsheet and programming code inspection errors for individuals indicate that only team inspections are likely to succeed at reducing programming and spreadsheet development errors to an acceptable level.
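The arithmetic behind this claim can be made explicit. The sketch below is only an illustration under stated assumptions, not a result from the inspection studies cited here: it treats inspectors as catching faults independently of one another and uses a hypothetical individual detection rate of 50%.

def team_detection_rate(individual_rate: float, team_size: int) -> float:
    # Probability that at least one of the inspectors catches a given fault,
    # assuming inspectors miss faults independently (real teams overlap more).
    return 1.0 - (1.0 - individual_rate) ** team_size

individual_rate = 0.5                 # hypothetical single-inspector yield
for team_size in (1, 2, 3, 4):
    rate = team_detection_rate(individual_rate, team_size)
    print(f"{team_size} inspector(s): expected fault detection = {rate:.0%}")

On these assumptions a lone developer reviewing a model, as our subjects did, catches only about half of its faults, while a team of three or four approaches 90% detection; because real inspectors tend to miss the same subtle faults, actual team yields fall somewhat short of this idealized figure.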

In general, it seems that we will need the kinds of deep testing seen in professional programming for important spreadsheets. This will involve sophisticated data testing and team code inspections. In fact, we will have to rethink the entire development process. Professional programmers have to conduct team design inspections before they ever begin to code. In contrast, Cragg and King [12] found that the 31 spreadsheet developers they interviewed rarely did much planning before they started filling in cells on a spreadsheet. Both Brown and Gould [8] and Panko and Halverson [43] noted a lack of planning in their experiments.

Perhaps spreadsheet developers have felt that their models are small compared to the programs of professional programmers, but as noted earlier we know that many spreadsheets are quite large and complex. We also know that spreadsheet developers often have considerable difficulty when they try to understand even their own spreadsheets and often have problems finding appropriate ways of handling computational tasks [27]. Quite simply, spreadsheet development looks quite a bit like programming.

Fortunately, if our goal is to teach developers how to create spreadsheets professionally, we may be able to draw on what we already know about program development. Not everything in programming development will carry over to spreadsheet development, of course. Still, in many ways, teaching spreadsheet developers how to develop their spreadsheets more safely is likely to be largely a matter of 'teaching new dogs old tricks'.

For further information on spreadsheet errors, consult the Spreadsheet Research Website at http://www.cba.hawaii.edu/panko/ssr/. For more information on human error in general, consult the Human Error Website at http://www.cba.hawaii.edu/panko/humanerr/.

References

[1] A.J. Albrecht, J. Gaffney Jr., Software function, software lines of code, and development effort prediction: a software science validation, IEEE Trans. Software Eng. 9 (11) (1983) 639-648.
[2] C.M. Allwood, Error detection processes in statistical problem solving, Cogn. Sci. 8 (4) (1984) 413-437.
[3] B.J. Baars (Ed.), Experimental Slips and Human Error, Plenum, New York, 1992.
[4] V.R. Basili, R.W. Selby Jr., Four applications of a software data collection and analysis methodology, in: J.K. Skwirzynski (Ed.), Software System Design Methods, Springer-Verlag, Berlin, 1986, pp. 3-33.
[5] B. Beizer, Software Testing Techniques, 2nd edn., Van Nostrand-Reinhold, New York, 1990.
[6] Y.M.M. Bishop, S.E. Fienberg, P.W. Holland, Discrete Multivariate Analysis: Theory and Practice, MIT Press, Cambridge, MA, 1975.
[7] B.W. Boehm, Improving software productivity, Computer 20 (9) (1987) 43-57.
[8] P.S. Brown, J.D. Gould, An experimental study of people creating spreadsheets, ACM Trans. Office Information Systems 5 (3) (1987) 258-272.
[9] E.G. Cale Jr., Quality issues for end-user developed software, J. Systems Manage. 45 (1) (1994) 36-39.
[10] S.K. Card, T.P. Moran, A. Newell, The Psychology of Human-Computer Interaction, Erlbaum, Hillsdale, NJ, 1983.
[11] H.C. Chan, H.J. Lu, K.K. Wei, A survey of SQL language, J. Database Manage. 4 (4) (1993) 4-15.
[12] P.G. Cragg, M. King, Spreadsheet modelling abuse: an opportunity for OR?, J. Operational Res. Soc. 44 (8) (1993) 743-752.
[13] R. Creeth, Microcomputer spreadsheets: their uses and abuses, J. Accountancy 159 (6) (1985) 90-93.
[14] A.D. Danu Tjahjono, Exploring the Effectiveness of Formal Technical Review Factors with CSRS, a Collaborative Software Review System, Technical Report ICS-TR-95-08, Information and Computer Science Department, Univ. of Hawaii, Honolulu, HI 96822, June 1996.
[15] N. Davies, C. Ikin, Auditing spreadsheets, Australian Accountant, December 1987, pp. 54-56.
[16] S. Ditlea, Spreadsheets can be hazardous to your health, Personal Computing, January 1987, pp. 60-69.
[17] M.E. Fagan, Design and code inspections to reduce errors in program development, IBM Systems J. 15 (3) (1976) 182-211.
[18] B.D. Floyd, J. Pyun, Errors in Spreadsheet Use, Working Paper 167, Center for Research on Information Systems, Information Systems Department, New York Univ., New York, 1987.
[19] E.H. Forman, N.D. Singpurwalla, An empirical stopping rule for debugging and testing computer software, J. Am. Stat. Assoc., Application Section 72 (360) (1977) 750-757.
[20] D. Freeman, How to make spreadsheets error-proof, J. Accountancy 181 (5) (1996) 75-77.
[21] D.F. Galletta, D. Abraham, M. El Louadi, W. Lekse, Y.A. Pollalis, J.L. Sampler, An empirical study of spreadsheet error-finding performance, Accounting Manage. Information Technol. 3 (2) (1993) 79-95.
[22] D.F. Galletta, K.S. Hartzel, S.E. Johnson, J.L. Joseph, S. Rustagi, Spreadsheet presentation and error detection, J. Manage. Information Systems 13 (3) (Winter 1996) 45-63.
[23] J.T. Grudin, Error patterns in novice and skilled transcription typing, Chap. 6, in: W.E. Cooper (Ed.), Cognitive Aspects of Skilled Typewriting, Springer-Verlag, New York, 1983, pp. 121-143.
[24] M.J.J. Hall, A risk and control oriented study of the practices of spreadsheet application developers, in: Proceedings of the 29th Hawaii International Conference on System Sciences, Vol. II, January 1996, pp. 364-373.
[25] K. Hassinen, An Experimental Study of Spreadsheet Errors Made by Novice Spreadsheet Users, Department of Computer Science, Univ. of Joensuu, P.O. Box 111, SF-80101 Joensuu, Finland, 1988.
[26] J.R. Hayes, L.S. Flower, Identifying the organization of writing processes, in: L. Gregg, E. Steinberg (Eds.), Cognitive Processes in Writing, Erlbaum, Hillsdale, NJ, 1980, pp. 3-30.
[27] D.G. Hendry, T.R.G. Green, Creating, comprehending, and explaining spreadsheets: a cognitive interpretation of what discretionary users think of the spreadsheet model, Int. J. Human-Comput. Studies 40 (6) (1994) 1033-1065.
[28] M. Igbaria, F.N. Pavri, S.L. Huff, Microcomputer applications: an empirical look at usage, Information Manage. 16 (4) (1989) 187-196.
[29] D. Janvrin, J. Morrison, Factors influencing risks and outcomes in end-user development, in: Proceedings of the 29th Hawaii International Conference on System Sciences, Vol. II, Maui, Hawaii, January 1996, pp. 346-355.
[30] C. Jones, Programming Productivity, McGraw-Hill, New York, 1986.
[31] B. Kantowitz, R.D. Sorkin, Human Factors: Understanding People-System Relationships, Wiley, New York, 1983.
[32] R. Kee, Programming standards for spreadsheet software, CMA Mag. 62 (3) (1988) 55-60.
[33] K. Kukich, Techniques for automatically correcting words in text, ACM Computing Surveys 24 (4) (1992) 377-436.
[34] F.J. Lerch, Computerized Financial Planning: Discovering Cognitive Difficulties in Knowledge Building, unpublished doctoral dissertation, Univ. of Michigan, Ann Arbor Sci. Publ., Ann Arbor, MI, 1988.
[35] I. Lorge, H. Solomon, Two models of group behavior in the solution of eureka-type problems, Psychometrika 20 (2) (1955) 139-148.
[36] M. Mattson, B.J. Baars, Error-minimizing mechanisms: boosting or editing, in: B.J. Baars (Ed.), Experimental Slips and Human Error, Plenum, New York, 1992, pp. 263-287.
[37] E.R. McLean, L.A. Kappelman, J.P. Thompson, Converging end-user and corporate computing, Commun. ACM 36 (12) (1993) 79-92.
[38] R.E. Melchers, M.V. Harrington, Human Error in Simple Design Tasks, Report No. 31, Civil Engineering Research Reports, Monash Univ., 1982.
[39] G.A. Miller, The magic number seven plus or minus two, Psychol. Rev. 63 (1956) 81-97.
[40] Morrison, 1995.
[41] G.J. Myers, A controlled experiment in program testing and code walkthroughs/inspections, Commun. ACM 21 (9) (1978) 760-768.
[42] R.R. Panko, End User Computing: Management, Applications, and Technology, Wiley, New York, 1988.
[43] R.R. Panko, R.H. Halverson Jr., Are two heads better than one? (At reducing errors in spreadsheet development), Office Systems Res. J., forthcoming.
[44] R.R. Panko, R.H. Halverson Jr., Introduction to the minitrack on risks in end user computing, in: Proceedings of the 29th Hawaii International Conference on System Sciences, Vol. II, Maui, Hawaii, January 1996a, pp. 324-325.
[45] R.R. Panko, R.H. Halverson Jr., Spreadsheets on trial: a survey of research on spreadsheet risks, in: Proceedings of the 29th Hawaii International Conference on System Sciences, Vol. II, Maui, Hawaii, January 1996b, pp. 324-325.
[46] L.H. Putnam, W. Myers, Measures for Excellence: Reliable Software on Time, on Budget, Yourdon, Englewood Cliffs, NJ, 1992.
[47] J. Reason, Human Error, Cambridge Univ. Press, Cambridge, UK, 1990.
[48] B. Ronen, M.A. Palley, H. Lucas, Spreadsheet analysis and design, Commun. ACM 32 (1) (1989) 84-92.
[49] B. Spencer, Software inspections at Applicon, in: T. Gilb, D. Graham (Eds.), Software Inspection, Addison-Wesley, Wokingham, England, 1993, pp. 264-279.
[50] R.H. Sprague Jr., E.D. Carlson, Building Effective Decision Support Systems, Prentice-Hall, Englewood Cliffs, NJ, 1982.
[51] Strauss, Ebenau, 1983.
[52] S.H. Strauss, R.G. Ebenau, Software Inspection Process, McGraw-Hill, New York, 1994.
[53] T.S.H. Teo, M. Tan, Quantitative and qualitative errors in spreadsheet development, in: Proceedings of the Thirtieth Hawaii International Conference on System Sciences, Vol. III, Kihei, Hawaii, January 1997, pp. 149-155.

Dr. Raymond R. Panko is a professor of decision sciences in the College of Business Administration at the University of Hawaii. He received his Ph.D. from Stanford University. He has been involved with end-user computing since the 1960s and began his current program of spreadsheet research in 1993. He is also involved in groupwork research; in fact, his initial work on spreadsheet development was an exploration of groupwork in an end-user computing context. His e-mail address is [email protected]. His home page is http://www.cba.hawaii.edu/panko. He maintains a website on spreadsheet research; that URL is http://www.cba.hawaii.edu/panko/ssr/.

Dr. Ralph H. Sprague, Jr. is also a professor of decision sciences in the College of Business Administration at the University of Hawaii. He received his Ph.D. from Indiana University. Dr. Sprague is one of the most widely cited authors in Decision Support Systems. His e-mail address is [email protected]. His home page is http://www.cba.hawaii.edu/sprague.