This article was downloaded by: [University of Birmingham]
On: 25 August 2014, At: 19:20
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Innovations in Education & Training International
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/riie19

Peer Feedback Marking: Developing Peer Assessment
Nancy Falchikov, Napier University, Edinburgh, UK
Published online: 09 Jul 2006.

To cite this article: Nancy Falchikov (1995) Peer Feedback Marking: Developing Peer Assessment, Innovations in Education & Training International, 32:2, 175-187, DOI: 10.1080/1355800950320212

To link to this article: http://dx.doi.org/10.1080/1355800950320212

PLEASE SCROLL DOWN FOR ARTICLE

Taylor & Francis makes every effort to ensure the accuracy of all the information (the "Content") contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

Peer Feedback Marking: Developing Peer Assessment



Page 2: Peer Feedback Marking: Developing Peer Assessment

IETI 32,2 175

Peer Feedback Marking: Developing Peer Assessment

Nancy Falchikov, Napier University, Edinburgh, UK

SUMMARY

Some studies of peer assessment in higher education are reviewed, and found to focus on either assessment of a product such as an examination script, or of the performance of a particular skill, often in a medical or dental setting. Classroom performance studies focus mainly on interpersonal skills or group dynamics.

Many examples where mean peer assessments resembled lecturer assessments were found, and the overwhelming view seems to be that peer assessment is a useful, reliable and valid exercise. Student evaluations of peer assessment suggest that they also perceive it to be beneficial. However, some students expressed a dislike of awarding a grade to their peers, particularly in the context of a small, well established group.

A study which attempted to capitalize on the benefits of peer assessment while minimizing the problems is described. In this study, the emphasis was on critical feedback rather than on the awarding of a grade, though this was required also. Results indicated a close correspondence between lecturer and peer marks, as in previous studies. Feedback was perceived to be useful, and the scheme of Peer Feedback Marking (PFM) rated as conferring more benefits than the more usual, lecturer marked method. The main strength of PFM seems to be related to the enhancement of student learning by means of reflection, analysis and diplomatic criticism.

INTRODUCTION

Assessment of students serves a number of purposes, some of which may be conflicting. For example, Ramsden (1986) argues that assessment for the purpose of selection is not consonant with the enhancement of learning. It is broadly accepted that the assessment demands placed on students influence the type of learning which takes place (Laurillard, 1984; Ramsden, 1984), and devolving some responsibility for assessment to students is often seen as a means of enhancing the learning process (Boud, 1988). Student involvement in assessment can take the form of self assessment, peer assessment or collaborative assessment. Common to most successful self and peer assessment schemes is the act of making explicit the assessment criteria. Other characteristics of successful self assessment are discussed in Falchikov and Boud (1989).

Peer assessment is the process whereby groups of individuals rate their peers. This exercise may or may not entail previous discussion or agreement over criteria. It may involve the use of rating instruments or checklists, which may have been designed by others before the peer assessment exercise, or be designed by the user group to meet their particular needs. It is not unusual for programmes of peer assessment to be associated with self assessment schemes. Products such as written work or examination scripts may be assessed by peers (eg, Falchikov, 1986; Magin and Churches, 1988), as may various aspects of performance. Peer assessment has been used in university and college classrooms and in the supervised work setting, such as hospitals or schools. Examples of peer assessed performance include practical medical or surgical skills (eg, Schumacher, 1964), orthodontic or other dental skills (eg, Forehand et al, 1982), nurse 'effectiveness'


(Klimoski and London, 1974), classroom participation (Boud and Tyree, 1979), pre-practicum counselling skills (eg, Fuqua, Johnson, Newman, Anderson and Gade, 1984) and performance in a group (Falchikov, 1988, 1993). The majority of peer assessment studies appear to fall into the performance group.

Many of the earlier studies of peer assessment took place in medical education. Over 20 years ago, Korman and Stubblefield (1971) reported that peer ratings were the best predictor of intern performance. Linn, Arostegui and Zeppa (1975) concluded that peer evaluation 'offers a view of the student not often available, since students know themselves and their peers from a different perspective', but also found that peer evaluations 'provided ratings that were highly related to those of the grading system'. In 1977, Morton and Macbeth reported a study of fourth year student self and peer assessment of overall performance on a clinical attachment in general surgery. The correlation between the staff and peer ratings was 0.53 (and significant at p < 0.001).

However, not all studies of peer assessment find close correspondence between peers and teachers. For example, Kegel-Flom (1975) found that analysis of correlations between ratings of four dimensions of performance by supervisors, peers and interns indicated that the three groups not only used different perspectives in making their judgements, but also rated performance differently. Further analysis of the performance criteria, the mode of rating and other salient factors in this study and in other studies of peer assessment is necessary in order to better understand Kegel-Flom's apparently contradictory results.

Agreements between ratings by students and teachers are also found in the field of dentistry. For example, a simple study of the assessment of dental clinical procedures carried out by Mast and Bethart (1978), which involved the assignment of one of three categories ('acceptable', 'modify' or 'unacceptable'), reported faculty-student agreements of over 99 per cent. Forehand, Vann, and Shugars (1982) also described peer and self assessment studies in orthodontics carried out by Jacobs, where the correspondence between peer and instructor ratings exceeded that between self and instructor ratings. Similarly, Klimoski and London (1974) found that peer and supervisor ratings of the 'effectiveness' of nurses did not differ significantly, whereas self and supervisor ratings did.

In a very early study which points to the importance of interpersonal skills, Schumacher (1964) reported peer ratings by senior medical students of knowledge, skill in diagnosis and the ability to establish relationships. He concluded that peer ratings accounted for a 'skill in relationships' factor, which he saw as not normally being measured by grades and national boards. In a university classroom study involving peer assessment, D'Augelli (1973) attempted to assess interpersonal skills using a measuring instrument with behavioural categories provided. Trained observer and mean peer ratings were found to be highly significantly related on a sub-set of behaviours compared, and D'Augelli concluded that, 'the convergence of peer and observer ratings suggests the former may be accurate indices of interpersonal behavior'. This was not found to be the case for self ratings.

Other university and college classroom studies fall into two groups: those which look at peer assessment of the processes of interaction and those which assess the products. Looking first at the process group, Boud and Tyree (1979) supplied first year law students with two rating scales with which to assess participation in class. Peer/teacher correlations were found to be positive and highly significant, being r = 0.75 for one rating scale and r = 0.83 for the other. These researchers found that self/teacher ratings were also related, though not to the same extent as peer/teacher ratings.

Falchikov's studies of group process analysis (1988, 1993) differ from most studies in that they reported peer assessment by students in the absence of a lecturer or tutor. As comparison of peer marks with lecturer marks was impossible, some measure of rating reliability was made by means of a comparison of students' rankings. Rank orders were found to be identical in the first, smaller, study, and highly significantly related in the assessment of task behaviours in the later one.

Classroom studies of peer assessment of traditional educational products (essays or examination scripts) are relatively infrequent, and seem to be geographically limited to Scotland and Australia. Gray (1987) reported a study of peer marking of an engineering examination for which a set of model answers was provided. Results indicated 39 per cent agreement to within five marks between peer and staff. However, more than 84 per cent of pairs of ratings agreed to within ten marks. A similar study is reported by Magin and Churches (1988). The


lecturer/peer correlation was calculated to be r = 0.81. Falchikov (1986) reported a study of self and peer assessment of an essay, and found 'reasonable agreement' (< 10 per cent difference) between peer and tutor ratings in just over 60 per cent of cases. A tendency to over-mark on the part of peers was also noted.

Both Falchikov and Magin and Churches report that students perceived the scheme of self and peer assessment to be beneficial to them. For example, 73 per cent of Magin and Churches' (1988) students perceived the experience of peer assessment to be of benefit in 'developing my ability to assess the work of a colleague when I become a practising engineer', though the major benefit of the exercise was seen to be 'improving exam performance through developing an understanding of what examiners look for in answers'. In Falchikov's (1986) study, student ratings of peer assessment suggested that the scheme had conferred many benefits in terms of the learning process. For example, over 80 per cent of the student group stated that peer group assessment made them think more. Other benefits claimed were that the scheme made them more structured and made them learn more. Overall, they rated the scheme as challenging, beneficial and helpful as well as being hard and time consuming. However, least liked features suggested that peer group assessment presented two problems to students. Students did not enjoy the idea of 'marking down or failing a peer' and reported lack of knowledge of the peer topic as presenting a difficulty. These features may account for the overmarking Falchikov (1986) observed on the part of nearly 40 per cent of the student sample. She concluded that some aspects of peer group assessment required some modification.

Thus, in summary, peer assessment appears to havethe following characteristics:

— The majority of studies attempt to assess the 'accuracy' of peer assessment by means of comparisons between the ratings of an 'expert' (usually the lecturer) and the mean peer mark.

— The overwhelming view is that peer assessment is generally a useful, reliable and valid exercise. However, in some circumstances, student overmarking occurs.

— Comparison between studies is difficult, particularly in relation to performance or process assessment. A wide variety of measuring instruments have been used to meet the requirements of a wide variety of situations.

— Studies focus on either a product or performance. Performance assessment studies greatly outnumber product assessment ones.

— Peer assessment of supervised performance at work takes place in locations such as hospitals or schools, whereas peer assessment of educational products takes place in university classrooms.

— Classroom studies tend to focus on interpersonal skills or group dynamics.

— Few student evaluations of peer assessment are reported. Those that are available suggest more benefits than reservations.

— In some circumstances students appear to experience difficulty or reluctance in awarding marks to their peers.

PRESENT STUDY

The present study attempted to capitalize on the benefits of peer assessment in terms of improving the learning process, sharpening critical abilities and increasing student autonomy, while at the same time addressing problems encountered in earlier studies.

Participants

Thirteen students of human developmental psychology took part in this study (12 female, 1 male). All were in the third year of a four year BSc honours degree in biological sciences. Mean age of participants was 20 years 11 months.

Task

Students were required to carry out an individual exercise. First of all, they were directed to go to the current journals section in the library and identify those periodicals relevant to the study of human developmental psychology. They were further instructed to browse through a few copies and identify an experimental study of any topic of pre-adolescent development which interested them. They were told to study their chosen article and then summarize it. The written summary was not to exceed two A4 sides of paper. In addition, they were asked to suggest what investigation the experimenter might carry out next. Finally, students were informed that they would be required to talk for about 10 minutes to the rest of the group about the study and the issues raised in the article. Use of visual aids and handouts was encouraged.


Preparation for peer feedback marking

When the task was presented to students, they were also informed that they were to carry out the assessment of their peers' short oral presentations. Help was promised nearer the time for presentations. Before the first presentation, students were asked to think about oral presentations such as lectures and seminars at which they had been present, and to identify the characteristics of a good presentation. They were encouraged to remember bad presentations also, in order to work out what was wrong with them. A short list of relevant criteria was compiled for use in assessing their peers' presentations (see Table 1).

Table 1. Student generated criteria relating to good oral presentation

Structure
The presentation should have a coherent and logical structure.

Knowledge of topic
The presenter should demonstrate a good knowledge and understanding of the topic.

Amount of information
The presentation should include an appropriate amount of information — neither too much nor too little.

Delivery
The presentation should be delivered in a clear, expressive manner.

Next, students were reminded of the marking bands in use, and of a likely pattern of ultimate degree classifications (ie, a relatively small number of first and third class degrees, and a distribution of marks over the upper and lower second range). Marks out of 20 corresponding to these groupings were calculated and noted by all students. Finally, copies of a peer assessment form were circulated, each simply asking students to identify the 'best feature' and a 'weakness' of each presentation. In addition, an overall mark (out of 20) was requested. Assessments were to be carried out anonymously. Mean peer assessments were to carry equal weight with lecturer marks in terms of coursework grade.

Implementation

After each student's presentation, peers completed the assessment form and awarded an overall mark, taking into account the criteria of excellence previously identified. Written assessments were also made independently by the lecturer, using the same assessment form. Oral feedback, based on the written assessments, was given by students, and then by the lecturer, to each student after her or his presentation.

Treatment of results

All assessment sheets were collected, and mean peer marks calculated. Comparisons were made between mean peer and lecturer marks. 'Best features' and 'weaknesses' listed for each presentation were grouped into a number of categories, and the degree of inter-rater agreement calculated.

Evaluation of the Peer Feedback Marking scheme

An evaluation sheet was distributed to all students at the next meeting after completion of the exercise (ie, when all marking was completed and feedback from peers received) (see appendix). The evaluation instrument consisted of seven questions, which sought information concerning the best liked feature of Peer Feedback Marking (PFM) (Question 1), the least liked feature (Question 2), an evaluation of PFM (Question 3), an evaluation of traditional marking (Question 4), an assessment of the effects of PFM (Question 5), an assessment of the effects of traditional marking (Question 6) and an open-ended question inviting a personal comparison of the two schemes (Question 7).

In this context, 'traditional marking' would, in all probability, have consisted of the assessment of a written version of the presentation by the lecturer, with a global assessment of the oral presentation, again by the lecturer, which might or might not have been taken into account in the final grading of the student's work.

Feedback to students

Lists of 'best features' and 'weaknesses' relating to each presentation were returned to students, together with mean peer assessment marks.

RESULTS

Comparisons of mean peer mark and lecturer mark

Mean peer marks, lecturer marks and the differences between these are shown in Table 2.


Table 2. A comparison of mean peer mark and lecturer mark

Student   Mean peer mark   Lecturer mark   Difference (P-L)   % equivalent
 1        13.3             13              +0.3               +1.5%
 2        12.9             13              -0.1               -0.5%
 3        12.5             12              +0.5               +2.5%
 4        13.5             15              -1.5               -7.5%
 5        12.6             12              +0.6               +3.0%
 6        12.9             12              +0.9               +4.5%
 7        13.9             14              -0.1               -0.5%
 8        13.4             12              +1.4               +7.0%
 9        12.8             11              +1.8               +9.0%
10        14.1             12              +2.1               +10.5%
11        11.4             12              -0.6               -3.0%
12        12.7             12              +0.7               +3.5%
13        10.5             10              +0.5               +2.5%

Mean overmarking   = 1.0 mark   (5.0%)
Mean undermarking  = 0.6 mark   (3.0%)
Overall difference = +0.5 marks (+2.5%)

Differences between the two marks were small and varied from +2.1 (+10.5 per cent) to -0.6 (-3.0 per cent). In nine cases, the mean peer mark exceeded that of the lecturer ('overmarking'), though by one mark only (5.0 per cent). In the remaining four cases, the reverse was true ('undermarking'), and the lecturer's mark exceeded the mean peer mark by 0.6 mark (3.0 per cent). The overall difference was +0.5 mark (2.5 per cent). The distribution of marks over the degree classification ranges indicated no difference between the lecturer and mean peer mark. (Peers awarded one first-class mark, ten upper seconds and two lower seconds, while the lecturer awarded two first-class marks and nine upper seconds and one lower second.)
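As a quick cross-check, the summary figures above can be recomputed from the printed peer-lecturer differences. The sketch below is not part of the original study; it simply assumes marks out of 20, so that one mark corresponds to five percentage points.

```python
# Peer-minus-lecturer differences (P-L) for the 13 students in Table 2.
diffs = [0.3, -0.1, 0.5, -1.5, 0.6, 0.9, -0.1, 1.4, 1.8, 2.1, -0.6, 0.7, 0.5]

over = [d for d in diffs if d > 0]    # peers more generous (9 cases)
under = [-d for d in diffs if d < 0]  # lecturer more generous (4 cases)

mean_over = sum(over) / len(over)     # about 1.0 mark
mean_under = sum(under) / len(under)  # about 0.6 mark
overall = sum(diffs) / len(diffs)     # +0.5 marks

def as_percent(mark):
    # A mark out of 20 expressed in percentage points.
    return mark / 20 * 100

print(round(mean_over, 1), round(mean_under, 1), round(overall, 1))  # 1.0 0.6 0.5
```

Rounded to one decimal place, the recomputed values match the reported 1.0 mark (5.0 per cent), 0.6 mark (3.0 per cent) and +0.5 marks (+2.5 per cent).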

Dimensions along which raters judged the quality of a short oral presentation

Features of presentations listed by students fell into a number of categories, which reflected the criteria relating to a good oral presentation they had identified earlier. Each feature had a positive and a negative pole corresponding to identified strengths and weaknesses:

Understanding and structure

Positive: 'Very interesting. I could relate to it.' 'Easy to follow.'
Negative: 'It was hard to understand the study.' 'Confused and confusing.'

Some raters focused on parts of the presentation: 'Good beginning.' 'Clear results section.'

Delivery

Positive: 'Well presented. Very good overheads and handout.' 'Spoke to the group rather than into the paper.'
Negative: 'Read straight from notes, with little eye contact.' 'Monotonous tone of voice.'

Amount (of material included)

Positive: 'Just the right amount. All the info was relevant.' 'Wide range of ideas.'
Negative: 'Difficult to take in all the info. Too many figures.' 'Lasted a bit too long.'

Knowledge

Positive: 'Seemed to know the study well.' 'Knowledge of the topic was good.'
Negative: 'Seemed like they did not know what was coming next.' 'Lack of knowledge.'


In addition to the four dimensions above, some topics chosen by students were themselves rated as strengths.

Topic

'Intrinsic interest in the topic.' 'Good hard hitting information.'

Effort expended on the presentation featured very little in ratings, attracting one mention only.

'A lot of effort had obviously been put in.'

Some features may belong to more than one category. For example, being 'easy to follow' may result from a good delivery as much as from clear structure.

The degree of agreement between peers over strengths and weaknesses of presentations varied, being between 25 per cent and 77.8 per cent for strengths, and between 25 per cent and 100 per cent for weaknesses. Most students' presentations were rated as having a variety of strengths, and the mean number of positive features mentioned was 4.2. There was less variety and, thus, a greater degree of agreement over the weaknesses listed. Mean number of weaknesses listed was 2.8. The degree of peer rater agreement for the most frequently mentioned 'strength' and 'weakness' of the 13 presentations is shown in Table 3.

Presentation seven is particularly interesting in that understanding and delivery appear as both strengths and weaknesses.

EVALUATION

Best liked features of Peer Feedback Marking

Best liked features tended to fall into one of two groups, in roughly equal numbers, the first concerning the fairness of the system, and the second the conceptualization of feedback as an aid to learning.

Table 3. Degree of peer rater agreement over strengths and weaknesses of presentations

Presentation   Strength    Degree of     Weakness    Degree of
number         category    agreement     category    agreement
 1             U+          50.0%         A-          41.7%
 2             D+          44.4%         A-          44.4%
 3             D+          55.6%         A-          100%
 4             D+          77.8%         Z           35.5%
 5             U+          50.0%         D-          50.0%
 6             U+          66.7%         D-, Z       25.0% each
 7             U+, D+      50.0% each    U-, D-      33.3% each
 8             K+          25.0%         U-          58.3%
 9             U+          66.7%         A-          100%
10             D+          25.0%         A-          91.7%
11             U+          41.7%         Z           41.7%
12             U+          33.3%         A-, E-      58.3% each

Overall agreement: strengths 48.9% (s.d. = 15.7; range = 25.0-77.8%); weaknesses 59.5% (s.d. = 26.9; range = 25.0-100%)

Key: A = Amount, D = Delivery, K = Knowledge, U = Understanding, Z = Zero response; + = positive, - = negative
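The agreement percentages in Table 3 behave like a simple modal-category measure: the share of raters who named the most frequently mentioned category (eg, 5 of 12 raters gives 41.7 per cent). This derivation is an assumption on my part, not stated in the article; a minimal sketch:

```python
from collections import Counter

def modal_agreement(labels):
    # Percentage of raters naming the most frequently mentioned
    # category, rounded to one decimal place.
    top_count = Counter(labels).most_common(1)[0][1]
    return round(100 * top_count / len(labels), 1)

# Hypothetical example: 12 raters, 5 of whom name delivery ('D+')
# as the best feature of a presentation.
ratings = ['D+'] * 5 + ['U+'] * 4 + ['A+'] * 3
print(modal_agreement(ratings))  # 41.7
```

Under this reading, entries such as 44.4 per cent (4 of 9) and 77.8 per cent (7 of 9) would correspond to presentations rated by nine peers rather than twelve, consistent with occasional absences from a class of thirteen.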


Fairness

'Assessed by a greater number of people.'
'Other people's reactions to your work - fairer.'

Feedback as an aid to learning

'Told me about my weaknesses and strengths.'
'Looking for good points/best features.'

Least liked features of Peer Feedback Marking

Again, responses to this question fell into two categories, one concerning the problems of awarding marks to one's friends, the other focusing on the difficulties of exercising the analytical skills necessary for the task.

Marking friends

'Some of the marks given were a bit high. Some people perhaps felt embarrassed judging their peers.'
'I find it difficult to assess people as fairly as I want to. You may feel obliged to friends - everybody knows everybody's writing.'

Analytical skills

'It's hard to criticize and judge differences between different speakers.'
'Difficult to think of things to say about other people's reports.'

A comparison of Peer Feedback Marking and traditional marking

Responses to questions three and four enabled a comparison of Peer Feedback Marking (PFM) and traditional marking to be made. A response recorded in either of the two extreme columns was taken as supportive of the right or left hand description as appropriate. Positive ratings comparing traditional marking and PFM are shown in Figure 1.

In four cases, PFM scores on the positive pole exceeded those for traditional marking. PFM was regarded as fairer, more informative and challenging, and harder than traditional marking. Traditional marking was rated as more accurate than PFM, fractionally more beneficial and more enjoyable. Ratings on the negative poles are shown in Figure 2.

Traditional marking is here characterized as less informative, less challenging, but rather easier for the students than PFM. PFM, on the other hand, is seen as less accurate, less beneficial and less enjoyable than the traditional alternative. However, given the very small number of ratings involved, extreme caution should be exercised in attempting to interpret these data.

Responses to questions five and six enable a comparison of the perceived effects of the two marking schemes. Perceived benefits are shown in Figure 3, and negative characteristics associated with the two schemes in Figure 4.

In contrast to the tentative conclusions about the benefits of the two schemes of marking derived from negative ratings of PFM (Figure 2), in Figure 3, PFM is seen as conferring more benefits to users than is traditional marking. It is seen as making the user think more, learn more, and become more independent, confident and critical than does the traditional scheme. Negative characteristics (Figure 4) tend to support the perceived superiority of PFM (although numbers of ratings are again small). Traditional marking is rated as making you more dependent and less critical than PFM. Students believe that traditional marking does not make you think more about your work, nor increase the amount of your learning. The two schemes appear to be equal in terms of student ratings concerning their lack of confidence.

DISCUSSION

The present study aimed to improve peer assessment by providing students with a greater degree of help with marking than has typically been the case in many previous studies. The present study also differed from previous ones in that its main focus was on the provision of feedback to students, identifying the strengths and weaknesses of their presentations. The experience of the students was tapped, and the criteria by which they were to judge oral presentations made explicit. Students were reminded of the meaning of the marks in terms of ultimate degree classification. The strategy of providing more help with marking appears to have met with some success in that, in the present study, comparisons of mean peer and lecturer mark indicated little difference between the two. However, as previous work has suggested (eg, Falchikov, 1986), peer markers tended to be a little more generous than the lecturer, which may reflect their reluctance to grade peers noted in earlier studies. This hint of overmarking may also be attributable to the fact that peer marks contributed towards the overall grade. However, it


Figure 1. A comparison of Peer Feedback Marking and traditional marking: positive ratings. (Bar chart of positive ratings for PFM and traditional marking; rating categories: Fair, Informative, Accurate, Unbiased, Challenging, Beneficial, Hard, Enjoyable.)

Figure 2. A comparison of Peer Feedback Marking and traditional marking: negative ratings. (Bar chart of negative ratings for PFM and traditional marking; rating categories: Unfair, Uninformative, Inaccurate, Biased, Not challenging, Not beneficial, Easy, Not enjoyable.)


Figure 3. A comparison of the benefits of PFM and traditional marking. (Bar chart of benefit ratings for PFM and traditional marking; rating categories: Independent, Think more, Learn more, Confident, Critical.)

Figure 4. Negative characteristics associated with PFM and traditional marking. (Bar chart of negative ratings for PFM and traditional marking.)


should be noted that, in the present study, marks were high overall. It may be that marking out of 20, rather than using the whole 100 mark range, leads to an elevation of marks, particularly where markers are inclined to give the benefit of the doubt. None the less, in terms of 'accuracy' of peer marking, the present study suggests that peer ratings may be reasonable indices of oral presentation skills.
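The peer-lecturer comparison described above amounts to comparing two means over the same presentations. A minimal sketch, using invented marks out of 20 (not the study's data), shows the kind of check involved:

```python
# Illustrative only: invented marks out of 20, not data from the study.
from statistics import mean

# Mean peer mark and lecturer mark for each of six hypothetical presentations
peer_marks = [15.2, 14.8, 16.0, 15.5, 14.9, 15.8]
lecturer_marks = [14.9, 14.5, 15.8, 15.0, 14.8, 15.4]

difference = mean(peer_marks) - mean(lecturer_marks)
print(f"peer mean:     {mean(peer_marks):.2f}")
print(f"lecturer mean: {mean(lecturer_marks):.2f}")
print(f"difference:    {difference:.2f}")  # small positive value = mild peer generosity
```

On these invented figures the peer mean exceeds the lecturer mean by 0.30 of a mark, mirroring the direction (though not necessarily the size) of the difference reported above.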

The evaluation of the scheme of peer feedback marking (PFM) suggested that the benefits of previous systems of peer assessment had been preserved. PFM was perceived by students as being an aid to learning. It was rated as fair by a majority, though two students felt it to be more biased than the traditional system. One of these students felt the mark awarded her by her peers was too high. PFM was seen as more informative and more challenging than the traditional method of marking, and, understandably, much harder work (and less enjoyable) for the students. However, students reported having become more critical and confident as a result of participation in the scheme. They perceived that PFM made them think more than the traditional scheme of marking, and learn more as a consequence.

In spite of the emphasis of the scheme being on the provision of feedback, negative features identified by students echoed some findings of previous studies where peers experienced reluctance to award marks to their class mates. A 'problem' common to this and previous studies is loyalty to friends. In the present study, students were more reluctant to offer criticism than praise. Overall, they recorded a larger mean number of 'best features' than of 'weaknesses'. Moreover, where criticism of a peer's performance was made, it was not uncommon for some 'softening' of the point to be made. Often, criticisms were modified:

'A touch too much information . . .'
'A little . . .'
'Slightly . . .'

Some criticisms were prefaced by 'perhaps', 'maybe' or 'seemed to be'. Other criticisms were accompanied by an attempt at explanation:

eg, 'Hard to keep following at times, but that could be because it's generally hard to keep listening further down the line.'

'Tended to have a little bit of difficulty getting points across. This was largely due to nerves, however.'
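The 'softening' pattern described above can be caricatured as a simple tally: count how many free-text criticisms contain a hedging phrase. The hedge list and comments below are invented for illustration; the study did not describe any such automated coding.

```python
# Invented example: tallying hedged ('softened') criticisms.
HEDGES = ("perhaps", "maybe", "seemed to be", "a little", "slightly", "a touch")

criticisms = [
    "Perhaps spoke a little too quickly in places.",
    "Maybe needed more eye contact with the audience.",
    "Did not structure the middle section clearly.",
]

# A criticism counts as 'softened' if any hedge phrase appears in it
softened = [c for c in criticisms if any(h in c.lower() for h in HEDGES)]
print(f"{len(softened)} of {len(criticisms)} criticisms were softened")
```

Here the first two comments are flagged as softened while the blunt third one is not, so the sketch prints "2 of 3 criticisms were softened".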

Overall, students appeared to be very supportive of their peers, to the extent that, in one case, the problem identified was seen to belong to the rater rather than to the presenter.

'I got a bit confused.'

These students knew each other well, and appeared to get on well together. It is not entirely surprising that social relationships may have entered into the assessment procedure. Moreover, their assessments of each other are, in effect, public, as handwriting is likely to be traceable to individuals. Thus, as in the prisoner's dilemma game (Brehm and Kassin, 1993), co-operation in the form of 'being nice' to one's peers may be seen to constitute the best strategy for ensuring a good mark for one's self. Certainly, overall marks were on the high side. However, in the present case, given that lecturer marks were also high, the effects of peer friendships seem to be more about the discomfort experienced when offering criticism than about gross inflation of marks.
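The prisoner's dilemma analogy can be made concrete with a toy payoff table (the numbers are invented; Brehm and Kassin describe the game itself, not peer marking). When assessments are public and likely to be repeated, mutual generosity leaves both raters better off than mutual criticism:

```python
# Toy payoff table: (my outcome, peer's outcome) for each pair of marking 'moves'.
# Values follow the standard prisoner's dilemma ordering T > R > P > S.
PAYOFF = {
    ("nice", "nice"): (3, 3),          # mutual generosity: both receive good marks
    ("nice", "critical"): (0, 5),      # I am generous, my peer criticises me
    ("critical", "nice"): (5, 0),      # I criticise a generous peer
    ("critical", "critical"): (1, 1),  # mutual criticism: both marked down
}

# Mutual 'niceness' beats mutual criticism for both parties, which is why
# it looks like the safe strategy when marking is not anonymous.
print(PAYOFF[("nice", "nice")], "beats", PAYOFF[("critical", "critical")])
```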

In the present educational context, where very large class sizes are becoming the norm, close friendships within work groups may become less of a 'problem' than was the case in this study of PFM. As class sizes increase and more demands are placed on students to word process their essays or record assessments directly into a computer, fears about the lack of anonymity may recede. However, it can be argued that learning how to make criticism in a diplomatic manner is a useful skill in itself. The present study indicated that students lacked confidence in their own abilities, in that PFM attracted higher ratings concerning bias than did the traditional scheme, and lower ratings concerning accuracy. It is to be hoped that, having received feedback on the close correspondence between peer and lecturer marking in the present study, these students will approach their next PFM assignment with a greater degree of confidence.

PFM is a versatile addition to the assessment repertoire of any lecturer, in that it may be used in a wide variety of situations. A major strength of PFM seems to be related to the enhancement of student learning by means of reflection, analysis and diplomatic criticism.


REFERENCES

Learning, Second Edition, London: Kogan Page.

Boud, D. and Falchikov, N. (1989) Quantitative Studies of Student Self-Assessment in Higher Education: a Critical Analysis of Findings, Higher Education, 18, 5, 529-49.

Boud, D. and Tyree, A.L. (1979) Self and Peer Assessment in Professional Education: a Preliminary Study in Law, Journal of the Society of Public Teachers of Law, 15, 1, 65-74.

Brehm, S.S. and Kassin, S.M. (1993) Social Psychology, Second Edition, Boston: Houghton Mifflin.

D'Augelli, A.R. (1973) The Assessment of Interpersonal Skills: a Comparison of Observer, Peer and Self Ratings, Journal of Community Psychology, 1, 177-9.

Falchikov, N. (1986) Product Comparisons and Process Benefits of Collaborative Self and Peer Group Assessments, Assessment and Evaluation in Higher Education, 11, 2, 146-66.

Falchikov, N. (1988) Self and Peer Assessment of a Group Project Designed to Promote the Skills of Capability, Programmed Learning and Educational Technology, 26, 4, 327-39.

Falchikov, N. (1993) Group Process Analysis: Self and Peer Assessment of Working Together in a Group, Educational Technology and Training, 30, 3, 275-84.

Falchikov, N. and Boud, D. (1989) Student Self-Assessment in Higher Education: a Meta-Analysis, Review of Educational Research, 59, 4, 395-430.

Forehand, L.S., Vann, W.F. and Shugars, D.A. (1982) Student Self-Evaluation in Pre-Clinical Restorative Dentistry, Journal of Dental Education, 46, 4, 221-6.

Fuqua, D.R., Johnson, A.W., Newman, J.L., Anderson, M.W. and Gade, E.M. (1984) Variability Across Sources of Performance Ratings, Journal of Counselling Psychology, 31, 2, 249-52.

Gray, T.G.F. (1987) An Exercise in Improving the Potential of Exams for Learning, European Journal of Engineering Education, 12, 4, 311-23.

Kegel-Flom, P. (1975) Predicting Supervisor, Peer, and Self-Ratings of Intern Performance, Journal of Medical Education, 50, 812-15.

Klimoski, R.J. and London, M. (1974) Role of the Rater in Performance Appraisal, Journal of Applied Psychology, 59, 445-51.

Korman, M. and Stubblefield, R.L. (1971) Medical School Evaluation and Internship Performance, Journal of Medical Education, 46, 670-73.

Laurillard, D. (1984) Learning from Problem Solving. In Marton, F., Hounsell, D. and Entwistle, N.J. (Eds) The Experience of Learning, Edinburgh: Scottish Academic Press.

Linn, B.S., Arostegui, M. and Zeppa, R. (1975) Performance Rating Scale for Peer and Self-Assessments, British Journal of Medical Education, 9, 98-101.

Magin, D.J. and Churches, A.E. (1988) What do students learn from self and peer assessments? In Steele, J. and Hedberg, J. (Eds) Designing for Learning in Industry and Education, Canberra: Australian Society for Educational Technology.

Mast, T.A. and Bethart, H. (1978) Evaluation of Clinical Dental Procedures by Senior Dental Students, Journal of Dental Education, 42, 4, 196-7.

Moodie, G.C. (ed) (1986) Standards and Criteria in Higher Education, Guildford: SRHE & NFER-Nelson.

Morton, J.B. and Macbeth, W.A.A.G. (1977) Correlations Between Staff, Peer, and Self-Assessments of Fourth-Year Students in Surgery, Medical Education, 11, 30, 167-70.

Ramsden, P. (1984) The Context of Learning. In Marton, F., Hounsell, D. and Entwistle, N.J. (Eds) The Experience of Learning, Edinburgh: Scottish Academic Press.

Ramsden, P. (1986) Students and Quality. In Moodie, G.C. (ed) Standards and Criteria in Higher Education, Guildford: SRHE & NFER-Nelson.

Schumacher, C.F. (1964) A Factor-Analytic Study of Various Criteria of Medical Examinations, Journal of Medical Education, 39, 192-6.

BIOGRAPHICAL NOTES

Nancy Falchikov works in the Department of Social Sciences at Napier University, Edinburgh, where she teaches psychology. She has a BSc and a PhD from the University of Edinburgh, as well as a teaching qualification. She is currently co-ordinating a cross-faculty research study of peer tutoring and developing her work on self and peer assessment and co-operative learning. She is also investigating mismatch in the learning environment.

Address for correspondence: Nancy Falchikov, Napier University, Colinton Road, Edinburgh EH10 5DT.


APPENDIX

PEER FEEDBACK MARKING: ASSESSMENT OF AN ORAL PRESENTATION

1. What did you like BEST about peer feedback marking?

Why?

2. What did you like LEAST about peer feedback marking?

Why?

3. Please rate peer feedback marking on the following dimensions:

Fair . . . . . Unfair
Uninformative . . . . . Informative
Aids learning . . . . . Doesn't aid learning
Inaccurate . . . . . Accurate
Unbiased . . . . . Biased
Easy . . . . . Hard
Good . . . . . Bad

4. Now complete the same ratings for traditional (lecturer) marking:

Fair . . . . . Unfair
Uninformative . . . . . Informative
Aids learning . . . . . Doesn't aid learning
Inaccurate . . . . . Accurate
Unbiased . . . . . Biased
Easy . . . . . Hard
Good . . . . . Bad


5. Peer feedback marking makes me:

Independent . . . . . Dependent
Not think . . . . . Think
Learn more . . . . . Not learn more
Lack confidence . . . . . Confident
Critical . . . . . Uncritical

Now complete the same ratings for traditional marking.

6. Traditional marking makes me:

Independent . . . . . Dependent
Not think . . . . . Think
Learn more . . . . . Not learn more
Lack confidence . . . . . Confident
Critical . . . . . Uncritical

7. How does peer feedback marking compare with traditional marking? Which do you prefer? Why?
