
Peer and Self Assessment during Problem-based Tutorials

Maura E. Sullivan, RN, MS, Maurice A. Hitchcock, EdD, Los Angeles, California; Gary L. Dunnington, MD, Springfield, Illinois

BACKGROUND: Peer and self assessment may contribute a unique and insightful perspective on a student's performance. This study investigates the association between self, peer, and faculty evaluations in the intimate setting of a problem-based tutorial group.

METHODS: Third-year medical students participating in the required surgical clerkship during the 1996–97 academic year (n = 154) were randomly assigned to problem-based learning groups and completed self and peer evaluations at the end of the last tutorial. These evaluations were compared with expert tutor ratings using Pearson correlation coefficients.

RESULTS: A moderate correlation was found between peer and tutor ratings. There was very little correlation between self and tutor ratings.

CONCLUSIONS: The results of this study suggest that peer and self ratings in the setting of a tutorial group may provide additional valuable information regarding medical student performance during a surgery clerkship. Am J Surg. 1999;177:266–269. © 1999 by Excerpta Medica, Inc.

As we shift toward a new paradigm in teaching that recognizes a rapidly changing knowledge base in medicine and the need for students to become lifelong learners, there comes a need for different types of evaluation methods. Thus far in education we have depended on faculty ratings as our primary method of assessing medical students; however, self and peer assessments may provide valuable insight regarding student performance. In order for students to acquire lifelong learning skills, they must develop the ability to critically evaluate themselves. As educators, if we are able to assess students' ability to identify their own strengths and weaknesses as compared with their peers and faculty, we can gain valuable insight regarding the self-directed learning skills of students. In addition, peer assessment may provide a unique and insightful perspective into the overall performance of medical students. In the intimate setting of a problem-based tutorial, where students are more actively involved in the learning process, the value of these assessments becomes even more salient. How students perceive themselves and their learning, and the extent to which peer and self evaluations contribute to the overall assessment of medical students, should be important considerations for us as educators as we continually examine and modify our curricula.

The literature shows that self and peer evaluations provide valuable insight into the performance of medical students. It has been demonstrated that peer ratings (1) are a good predictor of future performance,1–4 (2) are internally consistent and reliable,2 and (3) provide information regarding student performance that is not measured by other traditional evaluation methods.1,5 In addition, self-assessment has been shown to contribute to continued education and lifelong learning,6,7 to motivate the student physician to gather and improve knowledge and skills related to patient management,8 and to "have educational merits as a measure of non-cognitive abilities associated with clinical performance."9

There have been only three previous studies investigating the extent to which self and peer ratings correlate with those of faculty.10–12 The varied results reported in these studies make it difficult to draw conclusions about the association of peer, self, and faculty ratings. In addition, there has been little investigation of peer and self evaluation skills in the setting of a problem-based tutorial. Only one study has been identified in the medical literature in which peer evaluations were investigated in a tutorial setting.13 However, that study focused on the correlation between peer evaluation and knowledge gain, not on the correlation among the raters themselves.

Problem-based learning is characterized by the use of clinical cases in a small tutorial setting, where students learn problem-solving skills and acquire knowledge about the clinical sciences in the context of the case presented.14

"The basic outline of the problem-based learning process is encountering the problem first, problem solving with clinical reasoning skills, identifying learning needs in an interactive process, self study, applying newly gained knowledge to the problem, and summarizing what has been learned."15

In a tutorial group students are held responsible for the communication of difficult clinical concepts, yet are seldom held responsible for evaluation methods. One would assume that in this intimate setting, students are in a better position to provide an accurate and valuable assessment of themselves and their peers. These assessments may be valuable to evaluators as they provide additional information regarding students' performance. The purposes of this study are twofold: to determine whether students can identify their own strengths and weaknesses as compared with their peers and faculty, and to determine the extent to which peer and faculty ratings correlate in a problem-based tutorial setting.

From the Departments of Surgery and Medical Education (MES, MAH), University of Southern California School of Medicine, Los Angeles, California, and Southern Illinois University School of Medicine (GLD), Springfield, Illinois.

Requests for reprints should be addressed to Maura E. Sullivan, RN, MS, LAC+USC Medical Center, Department of Surgery, 1200 North State Street, Rm. 9900, Los Angeles, California 90033.

Manuscript submitted August 17, 1998, and accepted in revised form September 22, 1998.

ASSOCIATION FOR SURGICAL EDUCATION

© 1999 by Excerpta Medica, Inc. All rights reserved. 0002-9610/99/$–see front matter. PII S0002-9610(99)00006-9

METHODS

Subjects

Third-year medical students at the University of Southern California participating in the required surgical clerkship during the 1996–97 academic year (n = 154) participated in this study.

Instrumentation

The instrument used in this study was developed at the University of California at Los Angeles and used for problem-based tutorials prior to its incorporation into this study. Each tutor rated each student on a 5-point scale (1 = poor to 5 = outstanding) on the items of problem solving, independent learning, and group participation. Students used the same form to evaluate themselves and their peers. Faculty were introduced to the instrument at a faculty development workshop held prior to the onset of the first tutorial and were given feedback regarding its use throughout the year. Students were introduced to the instrument during the last tutorial of the clerkship. Both faculty and students were instructed to provide a global rating score on each of the three items contained in the instrument.

Procedure

All students were randomly assigned to a problem-based learning group (mean size 8 ± 1). Each group met a total of six times and discussed three different cases. Each case consisted of a 1-hour session followed by a 2-hour session 2 days later. A different tutor was randomly assigned to each group for each case. After the second session of each case the tutor filled out an evaluation form. At the end of the last case the students were given the same form and asked to evaluate themselves and their peers on the same items. Students were instructed on the importance of peer and self-evaluation skills and were encouraged to provide accurate and honest responses. They were also assured that these evaluations would be kept confidential and would not influence their grades. All forms were collected by the end of the clerkship. Pearson correlation coefficients were used to examine the associations among self, peer, and tutor ratings.
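The Pearson coefficient used in this analysis can be computed directly from paired rating lists. A minimal sketch, with hypothetical 5-point ratings standing in for the study's data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length rating lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical peer and tutor ratings for eight students (illustrative
# values only, not the study's data)
peer = [4.5, 4.0, 4.2, 4.8, 3.9, 4.1, 4.6, 4.3]
tutor = [4.0, 3.8, 3.9, 4.4, 3.6, 3.8, 4.2, 4.0]
r = pearson_r(peer, tutor)  # always falls between -1 and 1
```

A value of r near 1 would mean peers and tutors rank the same students high and low; a value near 0, as found for several pairs here, means the two sets of ratings carry largely independent information.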

RESULTS

Complete responses were received from 152 students. One student failed to complete a self rating and one student failed to turn in a completed form. The Table shows the mean and standard deviation for all three areas investigated. Faculty ratings were the lowest on all items. Students tended to rate themselves higher than faculty and peers. Self ratings were highest in the areas of problem solving and group participation.

The highest correlation was found between peer and faculty ratings. There was a moderate correlation in the areas of independent learning (r = .50) and group participation (r = .54), and a lower correlation in the area of problem solving (r = .24). All of these findings were statistically significant at the 0.01 level.

The lowest correlation was found between self and faculty ratings. There was a very low correlation in the area of problem solving (r = .11), a low correlation in the area of independent learning (r = .24), which was statistically significant at the 0.01 level, and a low correlation in the area of group participation (r = .18), which was statistically significant at the 0.05 level.

The correlations between self and peer ratings were low but statistically significant. In the area of problem solving, the correlation (r = .18) was significant at the 0.05 level, and in the areas of independent learning (r = .21) and group participation (r = .23), results were significant at the 0.01 level.

It is important to note that, although the correlations between most variables reached statistical significance, the proportion of variance explained was generally quite low. With the exception of the correlations between tutor and peer evaluations in the areas of independent learning (25%) and group participation (29%), the correlations between all other variables had a proportion of variance explained of 6% or less, indicating a poor association.
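The "proportion of variance explained" used above is simply the square of the correlation coefficient. A quick check of the figures reported in this section:

```python
# Squaring the reported r values reproduces the variance-explained
# figures in the text: only the peer-tutor correlations for independent
# learning and group participation explain an appreciable share.
correlations = {
    "peer-tutor, independent learning": 0.50,
    "peer-tutor, group participation": 0.54,
    "peer-tutor, problem solving": 0.24,
    "self-tutor, problem solving": 0.11,
}
variance_explained = {k: round(r * r, 3) for k, r in correlations.items()}
# 0.50**2 = 0.25 (25%); 0.54**2 = 0.2916 (about 29%);
# 0.24**2 = 0.0576 and 0.11**2 = 0.0121 (6% or less)
```

This is why a statistically significant r can still indicate a weak practical association: with n = 152, even small coefficients clear the significance threshold while leaving most of the variance unexplained.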

COMMENTS

Based on the results of this study we conclude that (1) students are not able to identify their own strengths and weaknesses as compared with their peers and faculty, and (2) there is only a moderate correlation between peer and expert ratings in a tutorial setting. The low correlations that we found between self and peer and between self and faculty ratings support those reported in the literature by Risucci and colleagues,10 Calhoun et al,11 and Morton and MacBeth.12

Risucci and colleagues10 investigated the ratings of surgical residents by self, supervisors, and peers and found that self-ratings failed to correlate significantly with either group. Calhoun et al11 studied the ability of second-year medical students to evaluate their physical examination skills as compared with expert faculty and concluded that students were not able to accurately assess their own performance. Morton and MacBeth12 investigated the correlation among staff, peer, and self-assessments of fourth-year students in surgery and found a low correlation between self and peers (r = .24) and between self and staff (r = .32).

A possible explanation for these results is that students are not routinely taught self-evaluation skills in a traditional curriculum. Like any other skill, students first need to be introduced to the concept and then allowed to practice the skill before they become comfortable enough to incorporate it into their professional behaviors. We would not expect students to demonstrate expert problem-solving or technical skills without first an introduction and then guidance and mentoring. Therefore, we should not expect the development of self-assessment skills to be any different.

TABLE
Mean Assessment (± Standard Deviation) of Student Performance during Problem-based Tutorials

                        Self         Peer         Tutor
Problem solving         4.37 ± .69   4.29 ± .41   3.89 ± .48
Independent learning    4.29 ± .74   4.40 ± .40   4.00 ± .53
Group participation     4.47 ± .72   4.31 ± .44   3.91 ± .64

ASSESSMENT DURING PROBLEM-BASED TUTORIALS/SULLIVAN ET AL

THE AMERICAN JOURNAL OF SURGERY® VOLUME 177 MARCH 1999

The moderate correlation between peer and faculty ratings supports the results found by Morton and MacBeth.12 In that study, the authors found a moderate correlation between peer and faculty ratings (r = .53). These results and those of the present study are lower than the high correlation between peers and faculty (r = .92) reported by Risucci and colleagues.10 A possible explanation for this moderate correlation is that peer ratings may evaluate a different domain than faculty ratings. This does not necessarily mean that they are inferior. Peer evaluations may offer a view of the student often not otherwise available, since students know their peers from a different perspective. Physicians are likely to be viewed differently by peers, patients, and themselves. Each "viewer" in turn provides a unique and meaningful perspective to the overall assessment of the physician's performance.16 Schumacher5 demonstrated that peer evaluations are one of the measures most likely to capture the interpersonal skills of medical students. This is extremely important, as it is generally agreed that the professional competence of physicians includes not only medical knowledge and technical skills but also good personal characteristics, interpersonal skills, professional attitudes, and ethical standards.17 These skills, although critically important in medical education, are difficult to measure by other means.

Peer evaluations have been shown to provide a means of assessing students' noncognitive abilities.2 The normative practice in medical education has long been to focus more on assessing students' performance from a cognitive perspective than on their noncognitive abilities. This is critical when one considers that as many as 30% of residents who fail to complete a residency program are terminated because of what are called noncognitive problems.18

There is widespread disagreement in the literature about the validity of faculty ratings as the gold standard.19,20 Perhaps in the intimate setting of a problem-based tutorial, students are actually better evaluators of their peers' performance. In our study, students remained in the same tutorial groups for the entire clerkship, meeting a total of six sessions. As the tutor was different for each case, faculty spent only two sessions with each group. Thus, students spent more time with each other than they did with the surgical faculty. Students are also more familiar with each other's past ability and performance. In a study investigating knowledge gain in a problem-based surgery clerkship, Schwartz and colleagues13 demonstrated that peer evaluations correlated more highly with knowledge gain (r = .51) than did tutor ratings (r = .18) or preceptor ratings (r = .12). The question of who can best measure student performance in this setting remains unanswered.

IMPLICATIONS

The low correlations between self and peer and between self and faculty ratings are significant in the sense that they demonstrate the inability of students to identify their own strengths and weaknesses. As self-evaluation skills are important in the development of lifelong learning habits, it should be a goal of our curricula to foster and develop these skills in our students. Like any other skills, self-assessment skills must be practiced in order to improve. In our study students assessed themselves and their peers only once; therefore, it was not possible to monitor the development of these skills. Further research needs to be done in the tutorial setting to investigate whether these skills develop over time.

The moderate correlation between peer and faculty ratings suggests that peers may contribute a unique and insightful perspective into a student's performance. It has been demonstrated in the literature that peer evaluations may be better predictors of students' future success than faculty ratings.1–4,13 Further studies in the tutorial setting are needed to determine whether the correlation between faculty ratings and other measures of performance is stronger than that between peer ratings and other measures of performance. The purpose of such studies would be to determine whether peer and self evaluations are inferior or superior to faculty ratings in a tutorial setting.

Various types of measurements are essential for the broad evaluation of clinical performance.13 Self-assessment skills are important in the development of lifelong learning habits and provide educators with an additional lens through which to view student behavior. This study suggests that students are not able to accurately evaluate themselves, an issue that needs to be addressed in the undergraduate curriculum. The moderate correlation between peer and faculty ratings suggests that peers may provide a perspective that is otherwise not obtainable. In the intimate setting of a problem-based tutorial group, peer and self assessments may provide an additional valuable perspective on student behavior that can assist educators with their overall performance evaluation of students. It should be the goal of every medical school to provide an accurate and thorough evaluation of each student. Incorporating these underutilized methods of evaluation into each curriculum is a critical step toward this goal.

REFERENCES

1. Kubany AJ. Use of sociometric peer nominations in medical education. J Appl Psychol. 1957;41:389–394.
2. Arnold L, Willoughby L, Calkins V, et al. Use of peer evaluation in the assessment of medical students. J Med Educ. 1981;56:35–42.
3. Korman M, Stubblefield RL. Medical school evaluation and internship performance. J Med Educ. 1971;46:670–673.
4. Linn BS, Arostegui M, Zeppa R. Peer and self assessment in undergraduate surgery. J Surg Res. 1976;21:453–456.
5. Schumacher C. A factor-analytic study of various criteria of medical student accomplishment. J Med Educ. 1964;39:192–196.
6. Barrows HS, Tamblyn RM. Self-assessment units. J Med Educ. 1976;51:334–336.
7. Keck JW, Arnold L, Willoughby L, Calkins V. Efficacy of cognitive/non-cognitive measures in predicting resident-physician performance. J Med Educ. 1979;54:759–765.
8. Zabarenko R, Zabarenko L. The Doctor Tree. Pittsburgh: University of Pittsburgh; 1978.
9. Arnold L, Willoughby TL, Calkins V. Self-evaluation in undergraduate medical education: a longitudinal perspective. J Med Educ. 1985;60:21–28.
10. Risucci DA, Tortolani AJ, Ward RJ. Ratings of surgical residents by self, supervisors and peers. Surg Gynecol Obstet. 1989;169:519–526.
11. Calhoun JG, Woolliscroft JO, Hockman EM, et al. Evaluating medical student clinical skill performance: relationships among self, peer and expert ratings. Proc Annu Conf Res Med Educ. 1984;23:205–210.
12. Morton JB, MacBeth WAAG. Correlations between staff, peer and self assessments of fourth-year students in surgery. Med Educ. 1977;11:167–170.
13. Schwartz RW, Donnelly MB, Sloan DA, Young B. Knowledge gain in a problem-based surgery clerkship. Acad Med. 1994;69:148–151.
14. Albanese MA, Mitchell S. Problem-based learning: a review of the literature on its outcomes and implementation issues. Acad Med. 1993;68:52–81.
15. Barrows HS. How to Design a Problem-based Curriculum for the Preclinical Years. New York: Springer; 1985.
16. Kegel-Flom P. Predicting supervisor, peer and self ratings of intern performance. J Med Educ. 1972;50:812–815.
17. Herman MW, Veloski JJ, Hojat M. Validity and importance of low ratings given medical graduates in non-cognitive areas. J Med Educ. 1983;58:837–843.
18. King RB. Resident selection: what's the problem? In: Langsley DG, ed. How to Select Residents. Evanston, Ill: American Board of Medical Specialties; 1988:25–36.
19. Magarian GJ, Mazur DJ. Evaluation of students in medicine clerkships. Acad Med. 1990;65:341–345.
20. Methany WP. Limitations of physician ratings in the assessment of student clinical performance in an obstetrics and gynecology clerkship. Obstet Gynecol. 1991;78:136–141.
