Peer and self assessment in undergraduate surgery

JOURNAL OF SURGICAL RESEARCH 21, 453-456 (1976)

Peer and Self Assessment in Undergraduate Surgery¹

BERNARD S. LINN, M.D.,²,⁵ MARTIN AROSTEGUI, M.D.,³ AND ROBERT ZEPPA, M.D.⁴

²Associate Professor of Surgery, University of Miami School of Medicine, Associate Chief of Staff for Education, Veterans Administration Hospital, Miami, Florida, ³Medical Resident, University of Miami School of Medicine, Miami, Florida, and ⁴Professor and Chairman, Department of Surgery, University of Miami School of Medicine, Miami, Florida

The ultimate criterion of a physician’s performance is the quality of care which he delivers. Much has been written about the measurement of quality of care. Even if quality of care can be estimated accurately, the intervening years between medical school and practice make it difficult to identify markers of success. Grades reportedly [1] have had little predictive validity, nor has our system of credentialing [2, 3]. Although student behaviors seem important, those which specifically predict success in practice have not been clearly identified. Furthermore, there is still a question concerning how behaviors can best be measured and by whom.

With PSROs, recertification, and documentation of continued competence, the focus for evaluation has moved beyond the medical student to include the resident and the physician throughout his clinical practice. Some emphasis has been given by the AMA recently to self assessment in continuing education. Little attention has been given to this type of evaluation in medical school. Yet, it would seem that helping the student learn to evaluate himself would be an important asset to him throughout his career. With the interest in peer review in judging quality of care, it is also surprising that peer evaluation has not been used more in measuring student performances, but where it has [4-7], it has been reported to be of value.

¹Work supported in part by VA 8200 Research and Education funds.

⁵Address correspondence to Bernard S. Linn, M.D., Veterans Administration Hospital (141), 1201 N.W. 16 St., Miami, Florida 33125.

The purpose of this study was to determine whether students could identify their own strengths and weaknesses as viewed by their peers and as reflected in other outcome measures such as grades and scores on the National Board Medical Examination (NBME).

METHOD

Ninety-eight of 102 junior medical students during 1 academic year participated in the study. Students rotated through their surgical clerkship in small groups. Each student rated himself and his peers in the group at the end of the 12-week rotation on a 13-item Performance Rating Scale [8]. Items were scored on four-point scales ranging from below average to outstanding, with higher scores being more favorable. The scale yields two subscores (factor scores) that can be equated with (a) knowledge and (b) interpersonal relationships. There were 98 ratings of self and 830 peer ratings. Each student had about eight peers in his group who rated him. These ratings were averaged to provide one peer score on each of the knowledge and relationship factors. Averaged scores were compared between self and peer ratings by Pearson Product Moment Correlations.
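As a minimal sketch of the scoring step described above (not the authors' actual computation), the following Python averages the ratings each student received from peers into one peer score and correlates it with the self score. The student labels, the ratings, and the function names are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean


def mean_peer_scores(peer_ratings):
    """Average the ratings each student received from peers into one score."""
    return {student: mean(scores) for student, scores in peer_ratings.items()}


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical factor scores for four students (not data from the study).
self_scores = {"A": 3.0, "B": 2.5, "C": 3.5, "D": 2.0}
peer_ratings = {
    "A": [3, 3, 2, 3],
    "B": [2, 3, 2, 2],
    "C": [4, 3, 3, 4],
    "D": [3, 2, 2, 3],
}

peer_scores = mean_peer_scores(peer_ratings)
students = sorted(self_scores)
r = pearson_r([self_scores[s] for s in students],
              [peer_scores[s] for s in students])
print(f"self vs. peer r = {r:.3f}")
```

In the study itself the same calculation would be run once for the knowledge factor and once for the relationship factor, over all 98 students.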

To determine the relationship of peer and self ratings to other areas of evaluation of student performance in the surgical clerkship, the four rating scores (peer and self scores on the knowledge and relationship factors) were used to predict final grade in surgery (derived from grades on quizzes, faculty and housestaff ratings on a 12-item Ward Performance Rating Scale [9], and scores on the surgical section of the NBME taken in the junior year) as well as to predict the NBME alone. Data were analyzed by multiple regression analyses. Lastly, all elements of the evaluation system and the two scores from each of the peer and self ratings were factor analyzed (Kaiser Varimax Rotation) to determine the dimensions in evaluation.

Copyright © 1976 by Academic Press, Inc. All rights of reproduction in any form reserved.

JOURNAL OF SURGICAL RESEARCH: VOL. 21, NO. 6, DECEMBER 1976

RESULTS

The correlations between peer and self scores were low but statistically significant. Peer and self ratings on the knowledge factor of the scale correlated at the 0.01 level (r = 0.408). However, the correlation between peer and self judgment on the interpersonal relationship factor of the scale was only r = 0.298 (P < 0.05). This indicated that students and their peers agreed less about the student’s interpersonal skills than about the student’s knowledge.
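One common way to check whether a correlation differs from zero is to convert r to a t statistic with n - 2 degrees of freedom, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below applies this to the two correlations reported above, assuming (the source does not state this explicitly) that each was computed over all 98 students. The function name and the constant are ours.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing a Pearson r against zero."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


N_STUDENTS = 98  # assumption: every student contributed one self/peer pair

for label, r in [("knowledge", 0.408), ("relationship", 0.298)]:
    print(f"{label}: r = {r}, t({N_STUDENTS - 2}) = {t_from_r(r, N_STUDENTS):.2f}")
```

The resulting t values would then be compared with tabled critical values for 96 degrees of freedom.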

There have been discussions about what the NBME measures. Most seem to think it measures fund of knowledge, specifically recall of facts. Our question was whether peers could identify students who would do well on the NBME, specifically through their ratings of the student’s cognitive ability, and, furthermore, whether the student himself could predict his intellectual performance. When the four variables (peer and self knowledge and relationship factors) were entered as predictors of NBME scores, peer knowledge ratings were highly predictive (F = 15.48, P < 0.001) and self judgment of knowledge was only minimally related to performance on NBME (F = 3.96, P < 0.05). Scores on the relationship factor, which measured the student’s interpersonal characteristics, were not statistically significant as predictors, whether rated by peer or self. In fact, the direction of the correlation indicated that higher interpersonal skills were associated with less favorable scores on NBME.

The same four variables were reanalyzed to determine their relationship to the final grade in surgery. The final grade was determined by a combination of grades on quizzes, ratings of the faculty and housestaff on ward performance, and scores from NBME. Grades were converted to a numerical scale of 13 points and used as the outcome (dependent) variable in analysis. Again, peer ratings were more significantly related to grades than self ratings. Both peer knowledge (F = 19.97) and relationship (F = 8.37) factors were highly related to final surgical grade (0.001 level). Self ratings of knowledge (F = 6.24) and relationship (F = 4.11) were predictive of final grade at the 5% level. In this analysis as well, the ratings on interpersonal skills by peers and self were inversely related to grade.
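For readers unfamiliar with how an F value can be attached to a single predictor in a multiple regression, a minimal sketch follows. It fits the full model by ordinary least squares, refits it without one predictor, and forms the partial F statistic from the increase in residual sum of squares. Whether the authors computed their F values this way is not stated in the source; the data, variable names, and helper functions here are hypothetical.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(X, y):
    """Least-squares fit with intercept; returns (coefficients, residual sum of squares)."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, sse


def partial_f(X, y, j):
    """Partial F (1 and n - k - 1 df) for predictor j, given the other predictors."""
    n, k = len(X), len(X[0])
    _, sse_full = ols(X, y)
    reduced = [[v for i, v in enumerate(r) if i != j] for r in X]
    _, sse_reduced = ols(reduced, y)
    return (sse_reduced - sse_full) / (sse_full / (n - k - 1))


# Hypothetical data: columns are a peer knowledge score and a self knowledge score.
X = [[3.1, 2.0], [2.4, 3.0], [3.6, 3.5], [2.0, 2.5],
     [3.9, 3.0], [2.8, 2.0], [3.3, 4.0]]
y = [78.0, 70.0, 85.0, 66.0, 88.0, 74.0, 80.0]  # hypothetical examination scores

for j, name in enumerate(["peer knowledge", "self knowledge"]):
    print(f"{name}: F = {partial_f(X, y, j):.2f}")
```

With four predictors and 98 students, as in the study, each such F would have 1 and 93 degrees of freedom under this formulation.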

In order to determine whether the components of evaluations were essentially measuring the same thing or whether the evaluation system was in fact tapping different dimensions, all evaluation scores were subjected to factor analysis, a statistical technique for grouping highly intercorrelated variables together. Table 1 shows the results. Four rather strong groups (factors) emerged. The major factor suggested that final grade, NBME, and housestaff ratings of ward performance were highly intercorrelated. Peer and self ratings were separated as the second and third factors, each including its own knowledge and relationship components. In the case of self ratings, knowledge carried greater weight than relationship, whereas for peer ratings, relationship was more important. This suggests that students gave more emphasis to knowledge when they assessed their own performance, while peers gave more weight to interpersonal relationships. The fourth factor, accounting for only 10% of the variance, was composed of quiz grades and faculty ratings of ward performance. It suggests that faculty may be more highly influenced by the grades they give on quizzes than by what they observe on the ward or that faculty may not know the student as well from observation as they do from grading papers.

TABLE 1
Factor Structure of All Evaluation Measures Used in the Study with Factor Loading and Percent of Variance Accounted for by Each Factor

Variables                  Factor I   Factor II   Factor III   Factor IV
                           25%        21%         18%          10%
Quiz grades                                                    0.698
NBME scores                0.670
Resident evaluation        0.542
Faculty evaluation                                             0.664
Final grade in surgery     0.932
Self relationship score                           0.879
Self knowledge score                              0.896
Peer relationship score               0.913
Peer knowledge score                  0.855
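To illustrate what a varimax rotation does, the sketch below handles the simplest case of two factors, where the rotation angle that maximizes the raw varimax criterion has a closed form. This is an illustration only: it omits the Kaiser row normalization and the extraction of factors from the correlation matrix, and it is not a reconstruction of the four-factor analysis reported here. The loadings and function names are hypothetical.

```python
from math import atan2, cos, sin, radians


def varimax_angle(loadings):
    """Angle (radians) maximizing the raw varimax criterion for two factors."""
    p = len(loadings)
    u = [a * a - b * b for a, b in loadings]
    v = [2.0 * a * b for a, b in loadings]
    A, B = sum(u), sum(v)
    C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    D = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
    return atan2(D - 2.0 * A * B / p, C - (A * A - B * B) / p) / 4.0


def rotate(loadings, phi):
    """Rotate every (factor 1, factor 2) loading pair by the angle phi."""
    c, s = cos(phi), sin(phi)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


# Hypothetical unrotated loadings: a clean two-group pattern turned by 30 degrees,
# so that every variable appears to load on both factors.
theta = radians(30)
simple = [(0.8, 0.0), (0.8, 0.0), (0.0, 0.8), (0.0, 0.8)]
unrotated = rotate(simple, -theta)

phi = varimax_angle(unrotated)
for a, b in rotate(unrotated, phi):
    print(f"{a:6.3f} {b:6.3f}")
```

After rotation each hypothetical variable loads on only one factor, which is the kind of grouping that a table of salient loadings such as Table 1 reports.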

DISCUSSION

In all instances, peer assessments were better predictors of final grade or NBME performance than self assessments. This suggests that peer assessment of student knowledge is very similar to what is judged by faculty or obtained from paper and pencil tests of cognitive ability. What peers think about the interpersonal behavior of their colleagues is also highly predictive of final grade but not related to NBME. Students receiving less favorable ratings on interpersonal skills had better grades. Geertsma and Chapman [10] similarly reported finding two factors (a cognitive and a noncognitive one) and showed that students who scored high on one scored low on the other. This is somewhat disturbing. Does it mean that students with good interpersonal qualities are less intelligent? Does it imply that popular students are more involved in extracurricular activities and therefore have less time to apply to textbook learning? Up to a point, this could be desirable in that it would reflect a broader base of education with a better rounded student. Another consideration is that, even though interpersonal skills were assessed by faculty and housestaff and were used in determining the final grade, housestaff and faculty may rate these scales with a halo effect from knowing the student’s cognitive performance, or they may simply not know the interpersonal behavior of a student in the same way that his peers know him. Lastly, it could reflect a philosophical problem, namely that medicine has worshipped too long at the altar of academic achievement and has passed the bounds of what is needed in a practicing physician. Most students entering medical school are already in the upper IQ levels, increasingly so each year, and the tendency toward “elitism” in scholastic achievement may not be related to high level performance as a clinician, but may be more predictive of careers in nonclinical areas where interpersonal skills are not as essential.

The data also suggest that students are not as accurate as their peers in judging their overall performance. Self scores were never better than the 5% level as a predictor and sometimes were not significantly associated with outcome at all. However, this does not seem a good argument for abandoning self ratings, if one goal of education is to help students take responsibility for self-directed learning. It may be that they lack experience in assessing themselves in any formal manner. In fact, it would be interesting to test whether repeated feedback from other elements of the evaluation system, given on an ongoing basis, helped to improve the accuracy of self ratings or had any impact on the student’s perceptions of himself.

Based on this study, it is our intention to use both forms of ratings and to add peer assessments, at least, as one of the components of the evaluation system that goes into determining the student’s final grade in surgery.


SUMMARY

The purpose was to determine if students could identify their own strengths and weaknesses as viewed by their peers and whether peer or self ratings of knowledge and interpersonal skills could predict scores on National Board Medical Examinations and final grades in surgery. Ninety-eight juniors rated themselves on a 13-item scale; 830 peer ratings (about eight per student) were also obtained and averaged to provide a peer score for each student. There were low but significant correlations between the two sets of ratings. Peer scores were excellent predictors of both National Boards and final grade. Self scores were only weakly associated with these outcomes. Interpersonal skill ratings were inversely related to grades. Peer ratings seem to be a valuable addition to evaluation of student performance. Self ratings may be more useful as a means of helping students learn to accurately assess their strengths and weaknesses.

REFERENCES

1. Wingard, J. R., and Williamson, J. W. Grades as predictors of physicians’ career performance: An evaluative literature review. J. Med. Ed. 48:311, 1973.
2. Peterson, O. L., Andrews, L. P., Spain, R. S., and Greenberg, B. G. An analytical study of North Carolina general practice. J. Med. Ed. 31:1, 1956.
3. Price, P. B., Taylor, C. W., Richards, J. M., and Jacobsen, T. C. Measurement of physician performance. J. Med. Ed. 38:203, 1964.
4. Korman, M., Stubblefield, R. L., and Martin, L. W. Patterns of success in medical school. J. Med. Ed. 43:405, 1968.
5. Schumacher, C. F. A factor-analytic study of various criteria of medical examinations. J. Med. Ed. 39:192, 1964.
6. Hammond, K. R., and Kern, F. Teaching Comprehensive Medical Care, Harvard University Press, Cambridge, 1959.
7. Helfer, R. E. Peer evaluation: Its potential usefulness in medical education. Brit. J. Med. Ed. 6:224, 1972.
8. Linn, B. S., Arostegui, M., and Zeppa, R. A performance rating scale for peer and self assessment. Brit. J. Med. Ed. 9:98, 1975.
9. Linn, B. S., and Zeppa, R. Measuring performance in surgical clerkship. J. Med. Ed. 49:601, 1974.
10. Geertsma, R. H., and Chapman, R. L. Medical school evaluation and internship performance. J. Med. Ed. 46:670, 1971.
