
British Journal of Medical Education, 1975, 9, 98-101

Performance rating scale for peer and self assessment1

BERNARD S. LINN2, MARTIN AROSTEGUI, and ROBERT ZEPPA Department of Surgery, University of Miami School of Medicine, Miami, Florida, U.S.A.

Key words: SELF ASSESSMENT · PEER REVIEW · EDUCATION, MEDICAL, UNDERGRADUATE · EDUCATIONAL MEASUREMENT · FACTOR ANALYSIS, STATISTICAL · ATTITUDE OF HEALTH PERSONNEL · STUDENTS, MEDICAL · INTERPERSONAL RELATIONS · EFFICIENCY · FLORIDA

With emphasis in teaching shifting from simple communication of facts to more active participation of students in their own learning comes a need for more diverse evaluation methods. Multiple choice, essay, National Board exams, and grades reflect the student's ability to recall facts. Recently, other areas of performance have included ratings of the students' behaviour relevant to ultimate physician performance. Generally, these evaluations have been done by faculty and house staff who have observed a group of students over a period of time. Since students rotate through the various services during their clinical years in groups and since group forms of learning have become popular, it is surprising that evaluation of performance by peers has not been done more often. In addition, the evaluation of medical care by process techniques such as medical record audit or observation by peers and the recent establishment of a Professional Standards Review Organization that will undoubtedly rely heavily on peer assessment place added importance on peer evaluation in estimation of delivery of quality care. If the end point of success of physician performance is delivery of quality care, and if his peers are among the best judges of his abilities, then peer evaluation also seems an important dimension of student performance. Peterson (1973) has suggested that students be taught peer review since they will be doing this in the future. Certainly, the ability to assess oneself is a critical ingredient in the process of continuing medical education.

1Work supported in part by VA 8200 Research and Education funds.
2Requests for reprints to Dr Bernard S. Linn, Veterans Administration Hospital (151), 1201 N.W. 16th Street, Miami, Florida 33125, U.S.A.

Only a few studies have used peer evaluation in medical education, and none reported any systematic attempts to construct a reliable and valid instrument. Hammond and Kern (1959) asked students to rank themselves on nine attributes and found students were good judges of performance, as reflected by correlation with grades. Schumacher (1964) used peer ratings of senior medical students on knowledge, skill in diagnosis, and ability to establish relationships. He concluded that peer ratings accounted for a factor not measured by grades and National Boards, which he called skill in relationships. Lastly, Korman and Stubblefield (1971) studied intern performance using grades and ratings of faculty and peers during the senior year of medical school and found peer ratings the best predictor of intern performance.

The purpose of this paper is to describe development of a scale for self or peer rating to determine level of performance.

Scale development
Table 1 shows the Performance Rating Scale (PRS). Sixteen items are rated on a 4-point scale ranging from below average to outstanding, with a higher score being a more favourable one. Items were selected which had face and consensual validity. During 1971, junior medical students at the University of Miami School of Medicine were asked to rate themselves and a group of their peers, whom they had worked with closely on ward assignments. This generated 928 ratings, each student rating himself and about seven to ten of his peers. These ratings were subjected to factor analysis to determine dimensions of evaluation and to provide a guide to future scoring. Though the factor structure, in itself, provided data on reliability, the reliability of the scale was further evaluated by test-retest: 54 students rated themselves and their peers twice, about one week apart. The validity of the scale was tested by its correlation with final grade.

Table 1. Physician performance rating scale

DIRECTIONS: Please fill out the information below before proceeding to ratings.

NAME OF PERSON EVALUATED: ....................................
DATE: ....................  SERVICE: ....................
STATUS OF PERSON EVALUATED: (circle) Student Intern Resident Faculty Other
PERIOD OF OBSERVATION: ....................
YOUR NAME: ....................
YOUR STATUS: (circle) Student Intern Resident Faculty Other

Please evaluate the items below according to the following scale:
1 = Below average   2 = Average   3 = Above average   4 = Outstanding

Encircle appropriate level:

FUND OF KNOWLEDGE                     1  2  3  4
CONSCIENTIOUSNESS                     1  2  3  4
LEADERSHIP                            1  2  3  4
APPEARANCE                            1  2  3  4
CLINICAL JUDGEMENT                    1  2  3  4
DOCTOR-PATIENT RELATIONSHIP           1  2  3  4
TEAM SPIRIT                           1  2  3  4
INTERPERSONAL RELATIONSHIPS           1  2  3  4
INTELLECTUAL CURIOSITY                1  2  3  4
TECHNICAL ABILITY                     1  2  3  4
POTENTIAL AS PHYSICIAN                1  2  3  4
INTEGRITY                             1  2  3  4
ABILITY TO COMMUNICATE                1  2  3  4
DESIRABILITY AS A PARTNER*            1  2  3  4
DESIRABILITY AS A CONSULTANT*         1  2  3  4
DESIRABILITY AS YOUR OWN PHYSICIAN*   1  2  3  4

*No entry necessary for this item if this is a self-evaluation.

Factor structure
Table 2 shows the results of the factoring; using Kaiser's Varimax rotation, two distinct factors emerged. Ten items, accounting for 40% of the variance, loaded on factor one, and seemed to describe an interpersonal or relationship factor. Six items formed the second factor, accounting for about 29% of the variance, and described knowledge or skill. In subsequent analysis, items which loaded on these factors were added together and averaged to form two factor scores, referred to as a Knowledge Factor (KF) or Relationship Factor (RF). The use of specific weights in developing factor scores did not greatly enhance measurement over the simple method of adding and averaging the ten and six item scores. Since the last three items are dropped when the scale is used for self rating, the procedure of averaging allows ready comparison of levels of scores.
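The simple unweighted scoring described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the item-to-factor assignment is read from Table 2, and the dictionary keys are hypothetical names chosen here for clarity.

```python
from statistics import mean

# Item-to-factor assignment as read from Table 2 (assumed here for illustration).
RF_ITEMS = ["conscientiousness", "leadership", "appearance",
            "doctor-patient relationship", "team spirit",
            "interpersonal relationships", "integrity",
            "ability to communicate", "desirability as a partner",
            "desirability as your own physician"]
KF_ITEMS = ["fund of knowledge", "clinical judgement",
            "intellectual curiosity", "technical ability",
            "potential as physician", "desirability as a consultant"]

def factor_scores(ratings):
    """Average the 1-4 item ratings into the two factor scores.

    Items absent from `ratings` (e.g. the three starred items omitted
    on a self rating) are simply skipped, so RF and KF stay on the
    same 1-4 metric and remain directly comparable."""
    rf = mean(ratings[i] for i in RF_ITEMS if i in ratings)
    kf = mean(ratings[i] for i in KF_ITEMS if i in ratings)
    return rf, kf

# Hypothetical rater: everything "above average", knowledge "outstanding".
example = {i: 3 for i in RF_ITEMS + KF_ITEMS}
example["fund of knowledge"] = 4
rf, kf = factor_scores(example)  # rf = 3.0, kf = 19/6 ≈ 3.17
```

Averaging rather than summing is what makes a ten-item RF directly comparable with a six-item KF, and with the shortened self-rating form.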

Table 2. Factor structure of physician performance rating scale

                                      Factor 1          Factor 2
Variables                             'Relationships'   'Knowledge'
Fund of knowledge                                       -0.869
Conscientiousness                     0.689
Leadership                            0.683
Appearance                            0.572
Clinical judgement                                      -0.743
Doctor-patient relationship           0.760
Team spirit                           0.815
Interpersonal relationships           0.833
Intellectual curiosity                                  -0.755
Technical ability                                       -0.608
Potential as physician                                  -0.695
Integrity                             0.760
Ability to communicate                0.680
Desirability as a partner*            0.752
Desirability as a consultant*                           -0.654
Desirability as your own physician*   0.691

*No entry necessary for this item if this is a self-evaluation.

Reliability and validity
A scale's reliability refers to its ability to provide consistent results. One way of determining consistency is by a test-retest method. The seven to ten peer ratings for each of 54 students were averaged into one KF and one RF score and, together with his self rating, were correlated with the same ratings taken one week later. For self rating, RF was 0.824 and KF was 0.680. For peer ratings, RF was 0.902 and KF was 0.907. Thus, reliability seemed quite respectable.

The concept of validity deals with how well a scale measures what it is intended to measure. A long follow-up of students into their clinical practice and more precise performance criteria are needed to answer this question. In terms of short term performance, one of the best estimates is still probably grades. Correlations of self and peer ratings with final grades were: Self RF (r = 0.244, P < 0.05), Self KF (r = 0.296, P < 0.01), Peer RF (r = 0.326, P < 0.01), and Peer KF (r = 0.503, P < 0.01).

Student reactions
One interesting finding was that students consistently rated themselves lower than they were rated by their peers. Peer ratings were slightly toward above average, with about one-half scale point deviation, and self ratings were toward the average range with half a point standard deviation. For the most part, students were interested in the process of evaluation and made good suggestions. Six students either failed to co-operate or returned unusable data. Two rated faculty and residents instead of themselves or peers. Four submitted in writing their objection to doing the evaluation.

Uses of the scale
It seems that peer evaluation, and to a lesser extent self-evaluation, provided ratings that were highly related to those of the grading system. In addition, such evaluation offers a view of the student not often available, since students know themselves and their peers from a different perspective. A short scale can provide considerable data on students in a rapid and easily obtainable manner.

The fact that students and faculty agreed could simply mean that living together in the same educational environment produces com- mon values. Likewise, grades probably do not provide the best anchor point for validating performance, but they are currently one of the most accepted and reliable methods.

Lastly, in generalizing to a wide use of the scale in clinical practice, it is recognized that there is a considerable difference between the student in training and the physician with years of experience. The physician in practice has no absolute grades against which peer review can be compared. However, one step might be to determine the relation between peer and self assessment, on a scale such as the one described here, and the outcomes of such techniques as medical record audit used to judge the quality of care.

Summary
A performance rating scale was developed and tested on a class of junior medical students who rated themselves and four to ten of their peers. When 928 ratings were factor analysed, two strong factors, knowledge and relationship, emerged. Test-retest reliabilities were good. Validity was measured by correlation of ratings with grades, and though both sources of ratings correlated significantly with grades given by faculty, peer ratings were more highly related to grades than were self ratings. Students tended to rate themselves lower than they were rated by their peers. Grades are probably not the best estimate of performance, but are one of the most reliable. Use of the scale to judge performance of physicians in practice has not been tested. The question of how such use of peer and self assessment relates to various criteria of other measures of quality of care is raised.

References
Hammond, K. R., and Kern, F. (1959). Teaching Comprehensive Medical Care. Harvard University Press: Massachusetts.
Korman, M., and Stubblefield, R. L. (1971). Medical school evaluation and internship performance. Journal of Medical Education, 46, 670-673.
Peterson, P. (1973). Teaching peer review. Journal of the American Medical Association, 224, 884-885.
Schumacher, C. F. (1964). A factor-analytic study of medical examinations. Journal of Medical Education, 39, 192-196.