
British Journal of Medical Education, 1970, 5, 147-151

Appraisal of student performance: multiple-choice questions, essays, and ‘short notes’

ANNE FERGUSON*, MARY A. WRIGHT, and G. P. McNICOL

University Department of Medicine, Glasgow Royal Infirmary

This paper describes a comparison of the scores obtained in two types of essay question with those derived from an objective (multiple-choice) test, using a class of dental students during their period of instruction in clinical medicine. The design of the experiment also allowed assessment of the abilities of examiners, with widely varying experience, to mark the different types of essay question.

In Glasgow, dental students have two terms of instruction in medicine, in the fourth year of the curriculum. The course is partly vocational, covering medical emergencies, diseases affecting head, neck, and mouth, blood coagulation, infection, and other conditions directly relevant to dentistry. However, the main purpose is to outline the theoretical and clinical aspects of medical diseases, so that the student can see dentistry in context as only one facet of applied human biology. Students must pass the professional examinations in medicine and surgery before proceeding to fifth (final) year, and the written examinations in medicine have comprised traditional essays and ‘short notes’ questions.

Objective paper

Recently, objective papers were introduced for class and professional examinations in medicine for medical students in Glasgow, and this prompted us to look critically at our method of examining the dental students. We decided to use an objective paper for one year only and, after analysis of the results, to decide whether we should recommend the inclusion of objective questions in future class and professional examinations.

*Present address: University Department of Bacteriology and Immunology, Western Infirmary, Glasgow, W.1.

Material and methods

Students

Thirty-three dental students in the fourth year had 25 hours of lectures and 50 hours of bedside teaching in medicine. They were given a class examination with both an objective test and a written paper. Three weeks later they sat the professional examination.

Thirty-four medical students in the fourth year had approximately 180 hours of lectures and seminars, and 240 hours of bedside teaching in medicine. As part of a class examination, these students sat the same objective paper as the dental students.

Class examinations

Written paper. This contained two essay questions, each allocated 30 marks, and a question with four items, each allocated 10 marks, on which the students were instructed to write ‘short notes’. There was no choice of question. The total time available for the examination was one and a half hours, giving approximately 30 minutes for each of the essay questions and seven minutes for each of the four ‘short note’ items. All scripts were marked by three examiners: A, a senior lecturer with 14 years’ experience in setting and correcting examinations; B, a lecturer with three years’ experience in correcting examinations; and C, an assistant lecturer completely inexperienced as an examiner. The range of marks corresponded to that used in the professional examination: <50%, fail; 60%, good pass; >70%, distinction.

Objective paper. We chose the form of objective question used in the Glasgow MB finals (Harden, Lever, and Wilson, 1969). A short preamble is followed by six items, each to be answered Yes or No. A correct answer (Yes or No) gains +1 mark, an incorrect answer scores -1 mark, and an unanswered item scores 0. Forty questions, with a total of 240 items, were answered in 90 minutes.
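The marking rule for the objective paper (+1 for a correct answer, -1 for an incorrect one, 0 for an item left blank) can be written as a short function. This is a minimal sketch, assuming each answer is recorded as True (Yes), False (No), or None (unanswered); the function name `score_objective_paper` and the data layout are our own, not taken from the paper.

```python
from typing import Optional, Sequence

def score_objective_paper(answers: Sequence[Optional[bool]],
                          key: Sequence[bool]) -> int:
    """Score Yes/No items: +1 correct, -1 incorrect, 0 unanswered."""
    if len(answers) != len(key):
        raise ValueError("answers and key must cover the same items")
    score = 0
    for given, correct in zip(answers, key):
        if given is None:
            continue  # unanswered item: no score either way
        score += 1 if given == correct else -1
    return score

# One hypothetical question: a preamble followed by six Yes/No items.
key = [True, False, False, True, True, False]
answers = [True, False, True, None, True, None]
print(score_objective_paper(answers, key))  # 3 correct, 1 incorrect, 2 blank -> 2
```

Because a wrong answer costs a mark while a blank costs nothing, the possible range on 240 items is -240 to +240.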

Professional examinations

Dental. This consisted of a written paper (100 marks) with two essays and four ‘short notes’, and a 10-minute oral examination (100 marks) conducted by an external examiner working with examiner A or B.

Medical. Marks for four objective papers, sat over the course of a year, were added to give 80% of the professional examination mark. The other 20% was the mark for a dissertation, prepared by the student in his own time over a six-month period.

First-year examination. The medical and dental students had identical courses and examinations in biology, chemistry, and physics. Marks for these three examinations were totalled (maximum 300) and compared with the objective paper marks.

Statistical analysis

Each group of dental students’ marks (essays and short notes as marked by A, B, and C, and the objective paper) was ranked by Spearman’s method (Langley, 1968), and correlation coefficients (r) were calculated from the ranked results.
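Spearman’s method replaces each mark by its rank within the group and then correlates the ranks. The sketch below illustrates the technique and is not the authors’ own calculation: it assumes that tied marks share the average of the ranks they span (a common convention) and computes r as the Pearson correlation of the two rank lists. The function names are ours, and the example marks are invented, although their end-points echo the ranges reported for examiners A and B.

```python
from math import sqrt
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (lowest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the two rank lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("all marks in a sample are tied; r is undefined")
    return cov / sqrt(sxx * syy)

# Hypothetical marks for five scripts: examiner B spreads marks more widely
# than examiner A but puts the scripts in the same order.
examiner_a = [43, 50, 55, 61, 66]
examiner_b = [37, 48, 60, 70, 77]
print(spearman_r(examiner_a, examiner_b))  # 1.0
```

Because only ranks enter the calculation, two examiners who use very different ranges of marks but order the scripts identically obtain r = 1. Without ties, the result equals the familiar formula r = 1 - 6Σd²/(n(n² - 1)), where d is the difference between a student’s two ranks.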

Results

Written papers: inter-examiner correlations

There was some variation in the range of marks given by each examiner. Ranges of total marks for the written paper were: A, 43 to 66%; B, 37 to 77%; C, 38 to 64%. In the analysis of the data, the use of a ranking technique eliminated discrepancies due to these differences in range. Correlation coefficients calculated from ranked values showed good inter-examiner correlations for conventional essays and excellent correlations for short notes (Table 1). Of special interest is the good correlation between the experienced examiner (A) and the completely inexperienced examiner (C) (Fig. 1).

Table 1. Ranking of written papers: inter-examiner correlations (correlation coefficient r for each pair of examiners)

Type of question     A, B    A, C    B, C
Essays (2)           0.75    0.69    0.83
Short notes (4)      0.92    0.90    0.56

[Fig. 1: two scatter plots of each student’s rank as given by examiner A (horizontal axis, ‘Ranking (Examiner A)’) against the rank given by examiner C; left panel, short notes; right panel, essays.]

Fig. 1. Ranking of written paper by examiners A and C. Left, ranking of four short note questions; right, ranking of two essay questions.

Objective paper

This gave a wide range of marks (62 to 162 out of 240) for the dental students. To check the internal consistency of the examination, the correlation coefficient between results for the even-numbered and the odd-numbered questions was calculated; it was 0.69. Since halving the number of items reduces the reliability of any objective examination (Hubbard and Clemans, 1961), this value of r indicates adequate internal consistency in the objective paper.

As can be seen from Fig. 2, there was considerable overlap between the medical and dental students’ marks in the objective paper. The medical students had more experience of objective examinations; however, analysis of the way questions were answered (Table 2) shows that the ratio of incorrectly answered to unanswered items was similar for the two groups, and it is therefore unlikely that the dental students guessed more than the medical students.

[Fig. 2: distributions of objective paper marks (out of 240; axis from 50 to 200) for 33 dental students and 34 medical students, with the top five medical students marked.]

Fig. 2. Marks (out of 240) in the objective paper for 33 dental students and 34 medical students. Coding indicates subsequent performance of the medical students in the professional examination in medicine.
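The split-half check on the objective paper (a correlation of 0.69 between odd- and even-numbered questions) can be made concrete. The paper argues qualitatively that halving a test lowers its reliability; one standard way to estimate the reliability of the full-length test from the half-test correlation is the Spearman-Brown formula, r_full = 2r / (1 + r). The authors do not report applying it, so the sketch below is an illustration only; the function names and the layout of per-question scores are assumptions.

```python
from typing import List, Sequence, Tuple

def odd_even_totals(question_scores: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Split each student's question scores into odd- and even-numbered halves.

    question_scores[s][q] is student s's mark on question q + 1 (the net
    score over that question's six Yes/No items).
    """
    odd = [sum(scores[0::2]) for scores in question_scores]   # questions 1, 3, 5, ...
    even = [sum(scores[1::2]) for scores in question_scores]  # questions 2, 4, 6, ...
    return odd, even

def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Predicted reliability of a test lengthened by length_factor (2 = halves rejoined)."""
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)

# Two hypothetical students, four questions each.
print(odd_even_totals([[3, -1, 6, 2], [0, 4, 1, 5]]))  # ([9, 1], [1, 9])
print(round(spearman_brown(0.69), 2))  # 0.82
```

The two half-totals would then be correlated across students (for example with a rank correlation) to give the half-test r; on this formula, the reported r = 0.69 corresponds to an estimated reliability of about 0.82 for the full 40-question paper.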

Table 2. Marks for 240 objective question items

Item outcome                      Medicals (mean of 34)    Dentals (mean of 33)
Correct answer (scores +1)        169.5                    148.4
Incorrect answer (scores -1)      28.7                     37.9
Unanswered (no score)             41.8                     53.7
Final mark (out of 240)           140.8                    110.5
Incorrect/unanswered ratio        0.69                     0.71
Range of final marks              97-183                   62-162
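The derived rows of Table 2 follow from the marking rule: the final mark is the number correct minus the number incorrect, and the three answer categories must account for all 240 items. The snippet below is our own arithmetic check on the published means, not part of the authors’ analysis; the names `TABLE_2` and `derived_rows` are illustrative.

```python
# Mean item counts per student, copied from Table 2.
TABLE_2 = {
    "medicals": {"correct": 169.5, "incorrect": 28.7, "unanswered": 41.8},
    "dentals": {"correct": 148.4, "incorrect": 37.9, "unanswered": 53.7},
}

def derived_rows(counts: dict) -> tuple:
    """Return (items accounted for, final mark, incorrect/unanswered ratio)."""
    total_items = counts["correct"] + counts["incorrect"] + counts["unanswered"]
    final_mark = counts["correct"] - counts["incorrect"]  # +1 correct, -1 incorrect, 0 blank
    ratio = counts["incorrect"] / counts["unanswered"]
    return total_items, final_mark, ratio

for group, counts in TABLE_2.items():
    total, final, ratio = derived_rows(counts)
    print(f"{group}: items={total:.1f} final={final:.1f} ratio={ratio:.2f}")
# medicals: items=240.0 final=140.8 ratio=0.69
# dentals: items=240.0 final=110.5 ratio=0.71
```

The near-identical ratios (0.69 and 0.71) are the basis for the argument that the dental students did not guess more than the medical students.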

Correlations between rankings for objective and written papers

Values of r for the correlation between the objective paper ranking and the rankings for essays, short notes, and the total written paper, for each examiner and for the three examiners combined, are given in Table 3. Values of r range from 0.51 to 0.63. These are relatively poor correlations, and they show no significant improvement when all three examiners’ marks are pooled and considered together.

Table 3. Correlations between objective paper ranking and written paper rankings (correlation coefficient r with the objective paper)

Type of question          A       B       C       A, B, C
Essays (2)                0.63    0.52    0.55    0.58
Short notes (4)           0.52    0.51    0.55    0.54
Essays and short notes    0.60    0.52    0.59    0.59

Prediction of results of professional examinations in medicine

Dental students. In the professional examination, one student gained a distinction, one failed, five obtained very good passes (>67%), and four obtained poor passes (<53%). These groups of students were clearly separated by their marks for the written paper in the class examination, but not by their marks in the objective paper (Fig. 3).

[Fig. 3: total written paper marks (out of 300) plotted against objective paper marks (out of 240), with symbols for professional examination outcome: distinction, good pass (>67%), poor pass (<53%), fail.]

Fig. 3. Written and objective paper marks for the 33 dental students. The coding indicates subsequent performance in the professional examination.

The written paper in the professional examination contained only essays and short notes, and the results can be interpreted as showing that some students consistently write good essays. Further, when oral marks were ranked and compared with the class examination rankings, the value of r was 0.42 for the oral/objective correlation but 0.69 for the oral/written correlation, suggesting that those who write good essays also tend to score high marks in an oral examination.

Medical students. The objective paper marks clearly separated the five students with the top performance and the six with the poorest performance in the professional examination. However, 80% of the professional examination marks were for objective questions. The dissertation had been graded A, B, C, D, or E before marks were allocated, and these grades were our only available evidence of the students’ performance in non-objective tests. As shown in Table 4, the mean objective paper marks showed a crude positive correlation with the dissertation grade, although within each grade there was a wide scatter of objective marks.


Table 4. Comparison of medical students’ objective paper marks with dissertation grade

Dissertation grade    Number of students    Objective paper mark: mean (range)
A                     8                     161.6 (123-173)
B                     10                    139.8 (122-166)
C                     13                    138.3 (110-183)
D                     3                     119.7 (97-163)

Objective paper marks and first-year professional examination marks

On simple linear regression analysis, for both dental and medical students, no correlation was found between total marks for first-year subjects (biology, chemistry, physics) and objective paper marks.
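The regression check can be sketched as an ordinary least-squares fit of objective paper mark on first-year total, together with Pearson’s r. The paper gives neither the individual data nor the significance test used, so this is an illustration only; the function name `least_squares` and the test data are hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple

def least_squares(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples of size >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample has zero variance")
    slope = sxy / sxx               # change in objective mark per first-year mark
    intercept = my - slope * mx     # the fitted line passes through the means
    r = sxy / sqrt(sxx * syy)       # Pearson correlation coefficient
    return slope, intercept, r
```

In this setting, x would hold each student’s first-year total (out of 300) and y the objective paper mark (out of 240); ‘no correlation’ corresponds to a slope and r that are indistinguishable from zero given the class size.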

Discussion

The very good inter-examiner correlation for written paper ranking is surprising. We think that several factors contributed to it. Although the scripts were marked ‘blind’, the examiners were aware that they were taking part in an experiment and may have approached the task with unusual care. There were only 33 scripts to correct, so a single question could be marked at one sitting; the examiner could easily compare different essays and refer back to previously corrected scripts to check the consistency of marking. Further, the use of ranking eliminated variations due to the different ranges of marks used by each examiner.

The form of teaching made it likely that answers would be fairly uniform. Examiners A and B conducted the lecture course, and all the students used the same textbook. The students recognized the subjects of especial relevance to dental practice (for example, sub-acute bacterial endocarditis and lesions of cranial nerves) and learned these thoroughly. Predictably, about half of the written and objective questions were on such subjects.

It is also known that good inter-examiner correlation is obtained when short written answers are corrected (Bull, 1956; Mowbray and Davies, 1967), and we found that a completely inexperienced examiner correcting short notes gives marks similar to those given by an experienced examiner. Marking out of 10 forces the examiner to sort the answers into four or five groups (given marks of 4, 5, 6, 7, or 8). Clearly, the inexperienced examiner is more likely to discriminate between a script worth 5/10 and one worth 6/10 than between one worth, say, 18/30 and one worth 19/30; allocation to ranges narrower than percentiles suggests a failure to appreciate the imprecision inherent in the grading of essays.

The good inter-examiner correlation suggests that the written paper marks were a reliable assessment of the students’ ability to write essays and short notes. Written paper marks did not correlate well with objective paper marks. It is clear, therefore, that the objective paper and the written paper assessed different attributes. The Yes/No type of objective question is exacting and demands accurate negative knowledge, that is, knowledge that a fact is untrue. The dental students had only 75 hours of teaching; we think it likely that they had insufficient precise ‘negative’ knowledge to answer the objective questions well. The overlap in objective paper scores between the dental and medical students is, none the less, surprising (Fig. 2). Use of a different type of objective question could make the test a more reliable index of the dental students’ knowledge of medicine. Possibly the type of question in which only one of five alternatives is correct would be suitable (Hubbard and Clemans, 1961).

Examinations act as a source of feed-back to teachers. Such feed-back can be obtained from answers to objective papers if each item is analysed (this usually requires access to a computer) and if the analyses are subsequently discussed at staff meetings. Unless many members of staff are interested in education, such discussions are unlikely to take place. In contrast, immediate feed-back takes place during the process of correcting essay or short note questions. Junior members of staff, who are heavily involved in small-group teaching, should derive benefit from correcting students’ scripts, and we have demonstrated that a completely inexperienced examiner marking short notes produces rankings closely comparable with those of an experienced examiner.

When large numbers of candidates and examiners are involved in an examination, an objective format rules out all possibility of unfairness due to inter-examiner inconsistencies in marking. However, with fewer candidates and examiners there comes a point below which it takes more time to prepare and analyse good objective questions than it takes to correct short notes and essays. Our data show that examination of these smaller numbers of students could reliably be carried out using short written questions. In this situation, junior members of staff can be given their first experience in the correction of examination papers.

The heavy demands on the time of skilled professional staff involved in setting and marking examinations are important inducements to carry out experiments on the relevance and efficiency of different types of examination. However, we would sound a note of caution: together we have spent 130 hours preparing and writing this short report, an investment of time that would have covered routine marking of our dental students’ papers for the next eight years.

Summary

A group of 33 dental students had 75 hours of instruction in clinical medicine. Their class examination consisted of a written paper with two essays and four ‘short notes’, and an objective test with 240 Yes/No items. Written papers were marked by three examiners with widely varying experience, and inter-examiner correlations were good for essays and excellent for short notes. The objective test gave a wide range of marks, and there was substantial overlap with the marks of 34 medical students who sat the same objective paper. Poor correlations were obtained when ranked scores in the objective test were compared with ranked written paper marks, with no improvement in correlation when the three examiners’ marks were pooled.

Compared with the objective test rankings, the written paper rankings gave better separation of the students who performed well or badly in the subsequent degree examination.

It is concluded that some of the factors measured by the objective test differed from those measured by the written paper. An inexperienced examiner correcting short notes produces rankings closely similar to those of a highly experienced examiner. In this situation, there is immediate feed-back to junior members of staff, who carry out much of the small-group teaching in clinical departments.

Our thanks are due to Professor E. M. McGirr for his interest in this study, and to Professor C. M. Fleming and T. C. White for permission to publish.

References

Bull, G. M. (1956). An examination of the final medical examination in medicine. Lancet, 2, 368-372.

Harden, R. McG., Lever, R., and Wilson, G. M. (1969). Two systems of marking objective examination questions. Lancet, 1, 40-42.

Hubbard, J. P., and Clemans, W. V. (1961). Multiple Choice Examinations in Medicine. Lea and Febiger: Philadelphia.

Langley, R. (1968). Practical Statistics for Non-Mathematical People. Pan: London.

Mowbray, R. M., and Davies, B. M. (1967). ‘Short-note’ and ‘essay’ examinations compared. British Journal of Medical Education, 1, 356-358.