
Evaluation Practices of Instructional Designers and Organizational Supports and Barriers


Performance Improvement Quarterly, 9(4), pp. 82-92

Leslie Moller
Pamela Mallin
Pennsylvania State University

ABSTRACT

The overall goal of this study was to determine the status quo of evaluation efforts of instructional designers as part of a systematic instructional system. This descriptive research investigated what organizational and managerial frameworks are required to sustain the evaluation component of systematic instructional design. Specifically, this survey: 1) identifies and describes current evaluation practices and strategies of instructional designers within corporate and higher educational settings, 2) describes types and causes of organizational supports and barriers instructional designers encounter as related to conducting instructional evaluation, and 3) compares and contrasts professional practices and organizational issues by industry.

Introduction

Most instructional systems design and performance technology models incorporate summative evaluation as part of the systematic process for solving knowledge or performance deficiencies. Summative evaluation of training is an attempt to obtain information on the long-term effects of a training program and to assess the value of the training in light of that information (Sims, 1993). In spite of its importance, there are reasons to believe that this essential instructional design activity is not practiced by instructional designers. This article presents a discussion of evaluation's role in the instructional design process, reviews evidence that a performance problem exists, and then presents the findings of research examining the summative evaluation practices of instructional designers.

Background

According to Phillips (1983), evaluation is undertaken for two purposes: to improve the training process and to decide whether or not to continue with it. Evaluation benefits trainers, training participants, and their sponsoring organization or department (Roberts, 1990). Trainers and instructional designers can receive useful information on how the program can be improved. Evaluation sessions pinpoint difficulties and identify which training methods are most effective. Evaluation criteria help trainees 1) identify whether the training helped them perform their job more effectively, 2) describe their confidence in their abilities and skills, and 3) offer their reaction to the instructional situation. For the organization, evaluation helps determine whether training has improved its staff's job performance sufficiently to justify the cost. The organization looks for direct change in work performance leading to positive business results.

This is consistent with Kirkpatrick (1994), who states that there are three specific reasons why evaluation is needed. Kirkpatrick describes evaluation as 1) helping to justify the existence of the training department by showing how it contributes to the organization's objectives and goals, 2) providing information to ascertain whether to continue or discontinue training programs, and 3) identifying methods for improving future training programs. Clearly, evaluation has a significant role in the corporate instructional design environment. Approaches vary, but there seems to be little, if any, disagreement that evaluation should be used to improve training programs. From an educational perspective, evaluation is a significant factor in determining the long-term effectiveness of training. From a business perspective, when training is evaluated and justified, management is less likely to view the training department as a target should downsizing occur. The most common reasons to evaluate are to decide whether a particular training program should be continued and how to improve a training program.

This research is built on the assumption that evaluation is an essential instructional design activity. One of the most widely used models in corporate training environments is Kirkpatrick's, which consists of four levels or types of evaluation. This model was selected as the basis for this survey due to the concrete nature of its description and because it serves as a common thread for benchmarking. The following briefly reviews what is accomplished at each of the four levels.

Level One

The first level of training program evaluation is the reaction, or critique, evaluation. In reality it is a customer satisfaction index. It lets you know what the participants think of the program and delivery (Kirkpatrick, 1994). It includes evaluating materials, instructors, facilities, methodology, content, and so forth. It does not include a measure of the learning that takes place (Kirkpatrick, 1994; Phillips, 1983).

Level Two

This level evaluates whether the participants have learned the stated objectives of the program. There are two primary purposes of Level Two measures: to determine the degree to which learners have actually learned what was intended and to determine how a number of factors influenced learning. These factors are course content, learning activities, sequencing of events, course material, instructional aids, the learners themselves, and the environment (Kirkpatrick, 1994).

Level Three

This level evaluates whether participants are using on the job what they have been taught in the training program. The learning of knowledge and skills does not guarantee their transfer. There are four categories of learning outcomes that can be evaluated: affective learning outcomes, cognitive learning outcomes, behavioral or skill learning outcomes, and operational outcomes. In Kirkpatrick's model, Level Three evaluation focuses on behavior or skill application only (Kirkpatrick, 1994).

Level Four

Evaluations at this level are used to relate the results of the training to organizational improvement. Some of the results that can be examined include cost savings, work output improvement, and quality changes (Kirkpatrick, 1994; Phillips, 1983). The calculation of training costs is relatively easy; what is difficult is measuring the benefits. The methods of computing benefits vary greatly from one situation to another. It may be necessary to determine the benefits by calculating the change in productivity, the decrease in production costs, or the increase in output (Kirkpatrick, 1994; Caffarella, 1988).
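To make the arithmetic behind a Level Four cost-benefit judgment concrete, the short sketch below compares estimated benefits against program costs and computes a benefit-cost ratio and ROI. It is an illustration only; the dollar figures, category names, and the Python helper are hypothetical and are not drawn from the study or from Kirkpatrick's text.

    # Hypothetical sketch of a Level Four benefit-cost comparison.
    # All figures are illustrative, not data from the study.

    def benefit_cost_ratio(benefits: dict, costs: dict) -> float:
        """Return total estimated benefits divided by total training costs."""
        return sum(benefits.values()) / sum(costs.values())

    # Estimated annual benefits attributed to the training program.
    benefits = {
        "productivity_gain": 42_000,   # value of increased output
        "reduced_rework": 15_000,      # savings from fewer production errors
    }

    # Fully loaded cost of designing and delivering the program.
    costs = {
        "design_and_development": 18_000,
        "delivery_and_materials": 9_000,
        "participant_time": 12_000,
    }

    bcr = benefit_cost_ratio(benefits, costs)
    roi_percent = (sum(benefits.values()) - sum(costs.values())) / sum(costs.values()) * 100

    print(f"Benefit-cost ratio: {bcr:.2f}")   # 1.46
    print(f"ROI: {roi_percent:.0f}%")         # 46%

As the paragraph above notes, the difficult part is estimating the benefit figures; the division itself is trivial.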

Although it is clearly established in instructional design theory that an effective training intervention requires evaluation activities, questions remain about whether practice is keeping pace with research. An earlier ASTD poll surveyed training leaders about their evaluation practices. Using Kirkpatrick's model as a reference point, the ASTD results indicated that close to 100 percent of companies evaluated training programs at Level One, the participant reaction level. Twenty-five percent of all training programs were evaluated at Level Two, the learning level. Behavior change, Level Three, was the least measured, with only 10 percent of companies evaluating at this level. Organizational results, Level Four, were evaluated twenty-five percent of the time (Carnevale, 1990). (See Figure 1.) That study did not indicate how the evaluation was conducted, whether any patterns existed, or why evaluation was or was not conducted.

Figure 1. ASTD Survey Results

Additional information is needed to help describe the evaluation practices of instructional designers. Specifically, this research investigated four questions:

• What are typical evaluation practices?

• Is there any variation in practice due to type of organization?

• How do practitioners implement the various levels of evaluation?

• What are the organizational supports and barriers to implementing evaluation activities?

Method

A 30-item questionnaire was developed and pilot tested at Penn State University. The subjects for the pilot test were Masters-level graduate students who were also practicing instructional designers. Following revisions based on the pilot test, the questionnaire was sent to 300 randomly selected International Society for Performance Improvement (ISPI) members and 100 randomly selected Association for Educational Communications and Technology (AECT) members who had previously self-identified as instructional designers. Two follow-up letters were sent to non-respondents. One hundred ninety-one questionnaires, of the original four hundred, were returned and analyzed. The questionnaire consisted of 5 demographic questions, 11 multiple-choice descriptive questions regarding specific evaluation-related activities, 8 open-ended questions exploring how each of the four evaluation levels may be performed, and 6 Likert-scale items to identify respondents' attitudes toward evaluation. The information was analyzed using SAS and Hyperqual software programs. A limitation of this research is the potential bias associated with any voluntarily self-reported information.
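As a minimal illustration of the figures implied by this method description (not the authors' SAS or Hyperqual analysis), the sketch below computes the survey's response rate and shows how a single Likert attitude item might be summarized; the Likert responses shown are hypothetical.

    # Illustrative survey summary; the Likert data are hypothetical placeholders.
    from statistics import mean, median

    mailed = 300 + 100          # ISPI + AECT members sampled
    returned = 191
    response_rate = returned / mailed
    print(f"Response rate: {response_rate:.1%}")   # 47.8%

    # Summarizing one hypothetical 5-point Likert attitude item.
    likert_item = [5, 4, 4, 3, 5, 2, 4]
    print(f"Mean: {mean(likert_item):.2f}, median: {median(likert_item)}")  # Mean: 3.86, median: 4

The authors report that the actual analysis was performed with SAS and Hyperqual; this sketch only restates the simple arithmetic.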

Results

The results of the questionnaire indicated that, as expected, an overwhelming number of training programs are evaluated for learner reaction. Specifically, 89.5% of the respondents are conducting one or more types of Level One, or end-of-course learner reaction, evaluations. Respondents were instructed to check all methods they "routinely use." The vast majority of those responding indicated the use of "smile sheets" (74%), quantitative questionnaires (72%), and open-ended questionnaires (67.5%) as their data-collection methods. Respondents also indicated that only 57% of their clients expected that learner reaction would be evaluated.

The data indicated little difference when the responses are aggregated by type of organization. With the exception of the Academic/Higher Education Institution category, there appears to be no practical difference between types of organizations in terms of the rate of use of Level One activities. Only 72% of respondents in the Academic/Higher Education Institution category reported completing Level One, or participant reaction, evaluation. This may be due to a higher sensitivity among corporate-based instructional designers to the importance of customer satisfaction.

Inquiries regarding Level Two, learning achievement, indicate that nearly 71% of respondents use one or more forms of evaluation to identify gains in learning after instruction has occurred. Specifically, 44% of respondents use a pretest/post-test design to measure learning after instruction, while 46% use a post-test-only measure as their method for evaluating learning achievement. A less accurate "learner self-report" method was used by 38% of respondents. An amazing 30% of responding practitioners report not evaluating learning achievement at all. (See Figure 2.) As with Level One evaluation, however, instructional designers take a leadership role in evaluation activities, conducting far more evaluation at these levels than their clients expect. Only 48% of questionnaire respondents indicated that their clients require Level Two results. Furthermore, our respondents, as compared with those in the ASTD study, measure learning achievement significantly more often.
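As a minimal sketch of the pretest/post-test design that 44% of respondents report using, the snippet below computes per-learner and mean gains; the scores are hypothetical, not data from the study.

    # Minimal pretest/post-test learning-gain sketch; scores are hypothetical.
    pre_scores = [55, 60, 48, 72, 65]    # percent correct before instruction
    post_scores = [78, 85, 70, 90, 80]   # percent correct after instruction

    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    mean_gain = sum(gains) / len(gains)

    print(f"Individual gains: {gains}")          # [23, 25, 22, 18, 15]
    print(f"Mean gain: {mean_gain:.1f} points")  # 20.6 points

A post-test-only design, reported by 46% of respondents, omits the baseline measure and so cannot separate learning gain from prior knowledge.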

The practice of evaluating learning, when examined across industries, varies greatly. Academic/Higher Education and Healthcare fields had the lowest reported rates (40% and 55.5%, respectively). Perhaps not coincidentally, these two groups also reported that fewer clients expected Level Two evaluation.

In the Transportation industry, however, 100% of questionnaire respondents indicated that they conduct learning evaluation. At the same time, they share with Healthcare a 33% client expectation rate. Other high response rates came from the Military and Utility/Communications groups. Both of these industries report 100% completion of learner knowledge/skill evaluation with substantial client expectation rates.

Figure 2. Level Two Methods Reported by Respondents

Level Three evaluation, or transfer of training, is measured by only 43% of those responding to the questionnaire, with a little over a third believing that their clients want this information. While the number of participants who report conducting Level Three evaluations is four times greater than was indicated in the ASTD study, it is significantly lower than the number conducting Level Two evaluations.

The extent of Level Three evaluation also varies by industry. Only the Military and Transportation industries report 100% Level Three activity. Government (36%), Academic (17%), and Healthcare (11%) report the least amount of Level Three evaluation, although their clients' expectations are similar to those of the Transportation industry. In the remaining categories, only Utility/Communications (61%), Retail (60%), and Consulting (52%) reported routinely measuring whether transfer of training occurred. (See Figure 3.)

Figure 3. Level Three Evaluation Activities

Of the participants reporting use of Level Three evaluation, the methods most often used for conducting transfer-of-training assessment include communications with supervisors (57 responses) and participant feedback (45 responses). These raw counts translate into very low percentages, and they raise the more disturbing possibility that these methods might not be effective for Level Three evaluation, depending on the type of information the manager and participant questionnaires or interviews yield. Other methods reported included examining job performance data (23), direct observation of performance (16), and use of performance appraisal systems (8). Other types of Level Three evaluation methods included peer evaluation (4), retesting of participants (3), and studying retention and promotion rates (1). (See Figure 4.)

Figure 4. Level Three Evaluation Methods

Level Four evaluation, or evaluating the effectiveness of training at resolving the original need, is conducted by 65% of questionnaire respondents. When the data are aggregated by type of organization, we find that a consistently high rate of respondents perform Level Four evaluations. These ranged from a low of 55% (Government) to 100% (Transportation), with others reporting as follows: 64% (Consulting & Industrial/Manufacturing), 67% (Academic), 70% (Utility/Communications), 73% (Financial/Insurance), 75% (Military), 78% (Health Care), and 80% (Retail). (See Figure 5.)

These are surprisingly high numbers given that slightly over half of the respondents do not report a positive benefit from measuring the change. This high rate, combined with the failure to report the results, raises real questions concerning the quality of Level Four evaluations and whether that information is used. Examination of how respondents collect Level Four information indicates that more responded positively than should have.

The study asked respondents how they measured the effectiveness of the instructional materials in resolving original needs. Using the most liberal interpretation of how this evaluation took place, only 40 of 191 respondents (roughly 21%) identified methods sufficient to qualify as Level Four evaluation. This is a far lower number than one would expect given that 65% reported conducting any Level Four evaluation.

There are several possible explanations for this discrepancy. First, the questionnaire may have been faulty, and respondents were unable to provide the correct information; however, given the extensive pilot testing and the quality of other questions and responses, this, though possible, is unlikely. Another explanation is that respondents do not truly understand how to measure whether an original need was resolved and therefore use inadequate methods. This seems more likely when viewed in the context of answers given to previous questions. Also, many responses identified methodologies more appropriate for Level One or Two as being used for Level Four. For example, answers included "asking instructors after the training" as a measurement of learning gain (or loss). It is impossible, given the existing data, to determine which explanation is more valid.

Figure 5. Level Four Evaluation Activities

This study also examined organizational supports and barriers as they relate to evaluation. Of organizations that have a formalized and stated methodology describing how the organization provides training, only 77% require evaluation. Forty percent of respondents stated that conducting evaluations is not part of their job description. Furthermore, only 63% report that any reference is made to conducting evaluation during their yearly performance appraisal. Only 52% report that they have the time or resources to conduct evaluation. While the survey did not break these organizational issues down in terms of levels of evaluation, one could assume, based on responses to earlier questions, that when attention is given to evaluation, it is at the lower levels.

In organizations where evaluation is "part of the job description," more respondents (33%) do not conduct Level Three (transfer of training) evaluations than do conduct them (27%). Once again, a contradiction exists: when evaluation is part of the job, respondents are twice as likely to conduct evaluation if it is required (40% vs. 20%), yet the practice of evaluation appears not to extend to Level Three.

The survey question that pertained to organizational barriers to conducting evaluation was open-ended. Of the 191 questionnaires returned, 168 mentioned barriers in their organization. Overall, lack of time and/or resources dominated the explanations of barrier types. Many of those responses also identified a culture resistant to evaluation as a critical barrier. Typical responses included "fear of what the evaluation may describe," "lack of concern for quality…just production," and "lack of knowledge by clients or management as to the value of evaluation." Several respondents identified "lack of access to data" (baseline or post-training) as the barrier they often face. This is most likely an extension of the resistant culture previously mentioned. A few responses indicated "lack of training methodology and/or lack of ability" as possible barriers.

Discussion

This study yielded a mixture of positive and negative results. On the positive side, we as a profession are doing a good job at Level One and Level Two. It is not surprising, given the simplicity of Level One evaluations, that almost everyone is analyzing learner reaction. However, as important as it is to understand the "learner as a consumer," this alone is an inadequate evaluation of the role of training. Furthermore, when learner reaction is the only form of evaluation, we may get a misrepresentation of the effectiveness of the training and, possibly more damaging, instructional designers may be influenced to design training that appeals to learners' wishes at the expense of effective instructional design. This is an understandable, although far from ideal, reaction of practitioners who do not have other data to back up their work.

We are also doing a good job in Level Two evaluation, although it does appear that we need to improve our methods of evaluating learner achievement, given the low numbers who use measures appropriate for Level Two evaluation. However, given the resistance to evaluation many instructional designers described, it is surprising how "good" a job we are doing.

The inadequate use of Level Three and Level Four evaluation (transfer of training and end-result change) is understandable in light of the resources, skills, and motivation required, but disturbing given the strong momentum for performance technology, which promotes behavior change and end results as the key criteria of success. Also, the discrepancy between the number claiming to conduct Level Four evaluations and the number actually using appropriate tools or methods is significant and may hold negative consequences, both for the individual situations and for the profession at large.

Many questions about the quality of evaluation practice are answered when we explore the context in which instructional design is performed. When almost 90% of practitioners report their companies place significant barriers to evaluation, and over 30% describe their workplace as having a culture that resists evaluation, it becomes quite clear why a program of effective evaluation is not occurring. One significant question that remains unanswered is what efforts instructional designers are expending to educate their managers and clients.

Part of the role of an instructional designer, according to Rothwell and Kazanas (1992), is to justify and communicate their efforts. It appears, for reasons unknown, that we are failing at this task. Instructional designers, especially with the shift toward performance technology, recognize that performance is the result of expending effort, having the skills and knowledge, and being allowed the opportunity to utilize those skills. If we accept the sentiment (often expressed in the questionnaires) of an existing awareness of and support for the importance of expanding evaluation, and apply the skills and knowledge we should possess, then we must focus our collective efforts on the opportunities to evaluate training. Perhaps one of the challenges of our profession is that before we rush to apply performance technology to fix the rest of the organization, we should look first at our own environments.

As is often the case with research, each study may provide one small answer while identifying many new questions. Although further research is necessary, some implications are evident. Specifically, the profession needs to focus on three areas. First, instructional designers' skill and knowledge of evaluation methods: while this study did not investigate existing knowledge issues, there are indications that instructional designers need to further develop their ability to create evaluation activities, especially those that test learning achievement and long-term impact on the organization. Second, instructional designers need to improve their ability to communicate with and educate clients and organizations as to the importance of evaluation to effective training. Third, instructional designers need to further examine their cultures to determine the extent of the resistance to evaluation and address that resistance in the design and reporting of evaluation activities.

This research project was a first step in understanding and improving the practice of instructional design as it relates to evaluation. In broad strokes, it benchmarks evaluation activities and describes the organizational context for many instructional designers. Although the data may not be surprising and may even be disappointing, they confirm, through empirical evidence, much of the anecdotal information many have heard and shared. Future research should focus on answering several key questions, including:

• How have organizations that successfully conduct evaluations built a support structure?

• How do instructional designers successfully communicate the importance of evaluation and build a support structure?

• Do instructional designers have the necessary skills and knowledge to conduct all types of evaluation?

• Can we create new evaluation models that are more efficient to use?

References

Caffarella, R. (1988). Program development and evaluation resource book for trainers. New York: John Wiley & Sons.

Carnevale, A. (1990). Evaluation practices. Training & Development Journal, 44(7), s23-s29.

Kirkpatrick, D.L. (1994). Evaluating training programs: The four levels. San Francisco, CA: Berrett-Koehler Publishers.

Phillips, J.J. (1983). Handbook of training evaluation and measurement methods. Houston, TX: Gulf Publishing.

Roberts, A. (1990). Evaluating training programs. International Trade Forum, 26(4), 18-23.

Rothwell, W., & Kazanas, H.C. (1992). Mastering the instructional design process. San Francisco, CA: Jossey-Bass.

Sims, R. (1993). Evaluating public sector training programs. Public Personnel Management, 22(4), 591-615.

LESLIE MOLLER, Ph.D., is Assistant Professor and Program Coordinator of the Instructional Systems Program at The Pennsylvania State University-Great Valley. Mailing address: 30 E. Swedesford Road, Malvern, PA 19355. E-mail: [email protected]

PAMELA MALLIN is a graduate student in the Instructional Systems Program at The Pennsylvania State University-Great Valley and Training Manager for AT&T. Mailing address: 30 E. Swedesford Road, Malvern, PA 19355.