8/9/2019 Monitoring the Learning Outcomes-Vincent Greaney T
1/134
title: Monitoring the Learning Outcomes of Education Systems. Directions in Development (Washington, D.C.)
author: Greaney, Vincent.; Kellaghan, Thomas.
publisher: World Bank
isbn10 | asin: 0821337343
print isbn13: 9780821337349
ebook isbn13: 9780585233888
language: English
publication date: 1996
lcc: LB2822.75.G736 1996eb
ddc: 379.1/54
subject: Educational evaluation--Cross-cultural studies, Educational indicators--Cross-cultural studies, Educational tests and measurements--Cross-cultural studies.
DIRECTIONS IN DEVELOPMENT
Monitoring the Learning Outcomes of Education Systems
Vincent Greaney
Thomas Kellaghan
© 1996 The International Bank for Reconstruction and Development / THE WORLD BANK
1818 H Street, N.W.
Washington, D.C. 20433
All rights reserved
Manufactured in the United States of America
First printing November 1996
The findings, interpretations, and conclusions expressed in this study are entirely those of the authors and should not be attributed in any manner to the World Bank, to its affiliated organizations, or to the members of its Board of Executive Directors or the countries they represent.
Cover photos: Curt Carnemark, The World Bank
Vincent Greaney is a senior education specialist in the World Bank's Asia Technical Department, Human Resources and Social Development Division. Thomas Kellaghan is director of the Educational Research Centre, St. Patrick's College, Dublin.
Library of Congress Cataloging-in-Publication Data
Greaney, Vincent.
Monitoring the learning outcomes of education systems / Vincent Greaney, Thomas Kellaghan.
p. cm. (Directions in development)
Includes bibliographical references.
ISBN 0-8213-3734-3
1. Educational evaluation--Cross-cultural studies. 2. Educational indicators--Cross-cultural studies. 3. Educational tests and measurements--Cross-cultural studies. I. Kellaghan, Thomas. II. Title. III. Series: Directions in development (Washington, D.C.)
LB2822.75.G736 1996
379.1'54--dc20 96-43969
CIP
Contents
Preface
1 Nature and Uses of Educational Indicators
  Educational Indicators
  Choice of Outcome Indicators
  Uses of Information from Outcome Assessments
  Informing Policy
  Monitoring Standards
  Introducing Realistic Standards
  Identifying Correlates of Achievement
  Directing Teachers' Efforts and Raising Students' Achievements
  Promoting Accountability
  Increasing Public Awareness
  Informing Political Debate
  Role of National Assessments
2 National and International Assessments
  National Assessments
  United States
  England and Wales
  Chile
  Colombia
  Thailand
  Namibia
  Mauritius
  International Assessments
  International Assessment of Educational Progress
  International Association for the Evaluation of Educational Achievement
  Advantages of International Assessments
  Disadvantages of International Assessments
3 National Assessment and Public Examinations
  Purposes
  Achievements of Interest
  Testing, Scoring, and Reporting
  Populations of Interest
  Monitoring
  Contextual Information
  High-Stakes and Low-Stakes Testing
  Efficiency
  Conclusion
4 Components of a National Assessment
  Steering Committee
  Implementing Agency
  Internal Agency
  External Agency
  Team from Internal and External Agencies
  Foreign Experts
  Building Support
  Target Population
  Population Defined by Age or Grade
  Choice of Levels of Schooling
  Sampling
  Choice of Population for Sampling Purposes
  Sample Selection
  What Is Assessed?
  Instrument Construction
  Type of Test
  Test Sophistication
  Nonachievement Variables
  Administration Manual
  Review Process
  Administration
  Analysis
  Reporting
  Average Performance of Students in a Curriculum Area
  Percentage Passing Items
  Percentage Achieving Mastery of Curriculum Objectives
  Percentage Achieving Specified Attainment Targets
  Percentage Functioning at Specified Levels of Proficiency
  Cost-Effectiveness
  Conclusion
5 Pitfalls of National Assessment: A Case Study
  Background to the Initiation of a National Assessment in Sentz
  School System
  Response to Education Concerns
  National Assessment of Educational Standards in Sentz
  Organization
  Test Development
  Implementation
  Analysis of the Case
  Responses to Assessment
  Implementation Procedures
  A Choice to Make
References
Appendix. National Assessment Checklist
Tables
  2.1 Proficiency Levels of Students in Grades 4, 8, and 12, as Measured by U.S. NAEP Mathematics Surveys, 1990 and 1992
  2.2 Percentage of Students at or above Average Proficiency Levels in Grades 4, 8, and 12, as Measured by U.S. NAEP Mathematics Surveys, 1990 and 1992
  4.1 Specifications for Mathematics Test: Intellectual Behaviors Tested
  4.2 Distribution of Costs of Components of National Assessment in the United States
  5.1 Educational Developments in Sentz, 1970-90
  5.2 Schedule of Activities for a National Assessment in Sentz
Boxes
  2.1 Atypical Student Samples
  4.1 Examples of Multiple-Choice Items in Mathematics, for Middle Primary Grades
  4.2 Example of Open-Ended Item in Mathematics, for Lower Secondary Grades
  4.3 Dangers of Cultural Bias in Testing
Preface
The collection and publication of statistics relating to numbers of schools, numbers of teachers, student enrollments, and repetition rates have for some time been a feature of most education systems. Up to relatively recently, however, few systems, with the exception of those with public examinations, have systematically collected information on what education systems actually achieve in terms of students' learning. This is so even though, as the World Declaration on Education for All (UNESCO 1990b) reminds us, "whether or not expanded educational opportunities will translate into meaningful development, for an individual or for society, depends ultimately on whether people learn as a result of those opportunities."
In response to this consideration, education systems in more than fifty countries, most of them in the industrial world, have in recent years shown an interest in obtaining information on what their students have learned as a result of their educational experiences. This interest was manifested either by developing national procedures to assess students' achievements or by participating in international studies of student achievement. It seems likely that the number of countries involved in these activities will increase in the future.
This book is intended to provide introductory information to individuals with an interest in assessing the learning outcomes of education systems. It considers the role of indicators in this process, in particular their nature, choice, and use (chapter 1). A number of approaches to assessing learning outcomes in selected industrial countries (the United States and the United Kingdom) and in representative developing countries (Chile, Colombia, Mauritius, Namibia, and Thailand) are described. Systems of comparative international assessment are also reviewed, and the arguments for and against the participation of developing countries in such assessments are examined (chapter 2).
Some countries already have available and publish information on student learning in the form of public examination results. The question arises: can such information be regarded as equivalent to the information obtained in national assessment systems that are designed specifically to provide data on learning outcomes for an education system? The answer (reached in chapter 3) is that it cannot.
In chapter 4, the various stages of a national assessment, from the establishment of a national steering committee to actions designed to disseminate results and maximize the impact of the assessment, are described. Finally, in chapter 5, a case study containing numerous examples of poor practice in the conduct of national assessments is presented. The more obvious examples of poor practice are identified, and corrective measures are suggested.
The authors wish to express their appreciation for assistance in the preparation of this paper to Leone Burton, Vinayagum Chinapah, Erika Himmel, John Izard, Ramesh Manrakhan, Michael Martin, Paud Murphy, Eileen Nkwanga, O. C. Nwana, Carlos Ro[...], Malcolm Rosier, Molapi Sebatane, and Jim Socknat. The manuscript was prepared by [...]resa Bell and Julie-Anne Graitge. Nancy Berg edited the final manuscript for publication. Abigail Tardiff and Amy Brooks were the proofreaders.
1 Nature and Uses of Educational Indicators
Although most of us probably think of formal education or schooling primarily in terms of the benefits that it confers on individuals, government investment in education has often been based on assumptions about the value of education to the nation rather than the individual. As public schooling developed in the eighteenth and nineteenth centuries, support for it was frequently conceived in the context of objectives that were public rather than private, collective rather than individual (Buber 1963). More recently, colonial administrations recognized the value of education in developing the economy as well as in promoting shared common values designed to make populations more amenable to control.
The importance of education for the nation is reflected in the considerable sums of money that national governments, and, frequently, provincial, regional, and state governments, are prepared to invest in it. In 1987 world public expenditure on education amounted to 5.6 percent of gross national product (GNP); the figure varied from a low of [...]1 percent for East Asia to a high of 6.5 percent for Oceania. As a percentage of total government expenditure, the median share for education was 12.8 percent in industrial countries, a figure considerably lower than the 15.4 percent recorded in developing countries (UNESCO 1990a).
Given this situation, it is not surprising that for some time government departments have routinely collected and published statistics that indicate how their education systems are working and developing. Statistics are usually provided on school numbers and facilities, student enrollments, and efficiency indices such as student-teacher ratios and rates of repetition, dropout, and cohort completion. But despite an obvious interest in what education achieves, and despite the substantial investments of effort and finance in its provision, few systems in either industrial or developing countries have, until recently, systematically collected and made available information on the outcomes of education. Thus, in most countries there is a conspicuous dearth of evidence on the quality of students' learning. Few have stopped, as a former mayor of New York was inclined to do, and asked "Hey, how am I doing?" although knowing precisely how one is doing would surely be useful.
Since the 1980s, however, decisionmakers have begun to attach increasing importance to the development of a coherent system for monitoring and evaluating educational achievement, specifically pupil learning outcomes. In this book, our focus is on the development of such a system. Following usage in the United States, this system is referred to as a national assessment.
The interest in developing a systematic approach to assessing outcomes, that is, in doing a national assessment, can be attributed to several factors. One is a growing concern that many children spend a considerable amount of time in school but acquire few useful skills. As Windham (1992) has pointed out, school attendance without learning "makes no social, economic or pedagogical sense" (p. 56). In the words of the World Declaration on Education for All (UNESCO 1990b, par. 4),
    Whether or not expanded educational opportunities will translate into meaningful development, for an individual or for society, depends ultimately on whether people actually learn as a result of those opportunities, in other words, whether they incorporate useful knowledge, reasoning ability, skills, and values.
The problem of inadequate school learning is not confined to developing countries. Throughout the world, one hears expressions of dissatisfaction with the levels of achievement of today's students, though there may be little evidence that standards are in fact declining. Even without such evidence, a case can still be made that changes in the world of work are resulting in a mismatch between educational outcomes and the needs of society (Townshend 1996). This mismatch is most obvious in the case of what has been called "an educational underclass" made up of students who perform very poorly in the education system. This underclass is found in most countries. In the past its members could find employment in unskilled work, but this is no longer possible because jobs that require only minimal literacy skills are fast disappearing from the labor market, particularly in industrial countries.
Given the need for better-educated students, decisionmakers are concluding that a monitoring system is necessary to gather information needed to describe and monitor the nature of students' achievements, the relevance of those achievements to the world of work, and the number of inadequately prepared students leaving the system.
What is learned at school assumes even more importance because of increased global economic competition, marked by rapid movement of capital and new technologies from country to country. In such a situation, it is claimed that a country's level of productivity and ability to compete depend greatly on workers' and management's skill in using capital and technology (World Bank 1991) and thus that "skilled people become the only sustainable competitive advantage" (Thurow 1992, p. 520). Comparative studies of students' achievements have been used to gauge the relative status of countries in developing individual skills.
Another reason for interest in monitoring student achievements is that governments today are faced with the problem of expanding enrollments while at the same time improving the quality of education, all without increasing expenditure. More detailed knowledge of the functioning of the education system will, it is hoped, help decisionmakers cope with this situation by increasing the system's efficiency.
A final reason for the increased interest in monitoring and evaluating educational provision arises from the move in many countries, in the interest of both democracy and efficiency, to decentralize authority in the education system, providing greater autonomy to local authorities and schools. When traditional central controls are loosened in this way, a coherent system of monitoring is necessary.
Educational Indicators
The term educational indicator (in the tradition of economic and social indicators) is often used to describe policy-relevant statistics that contain information about the status, quality, or performance of an education system. Several indicators are required to provide the necessary information. In choosing indicators, care is taken to provide a profile of current conditions that metaphorically can be regarded as reflecting the "health" of the system (Bottani and Walberg 1994; Burnstein, Oakes, and Guiton 1992). Indicators have the following characteristics (Burnstein, Oakes, and Guiton 1992; Johnstone 1981; Owen, Hodgkinson, and Tuijnman 1995):
An indicator is quantifiable; that is, it represents some aspect of the education system in numerical form.
A particular value of an indicator applies to only one point or period in time.
A statistic qualifies as an indicator only when there is a standard or criterion against which it can be judged. The standard may involve a norm-referenced (synchronic) comparison between different jurisdictions; a self-referenced (diachronic) comparison with indicator values obtained at different points in time for the same education system; or a criterion-referenced comparison with an ideal or planned objective.
An indicator provides information about aspects of the education system that policymakers, practitioners, or the public regard as important. Sometimes it may be easy to obtain consensus among interested parties on what is important; other times it may not.
An indicator is realistic in the sense that it is based on information collected with due regard to financial and other constraints.
An indicator describes conditions amenable to improvement.
Information for indicators is collected frequently enough to allow change to be monitored.
Indicators allow an examination of distributions among subpopulations of interest (for example, by age, gender, income, or socioeconomic group).
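
The three kinds of comparison named in the list above (norm-referenced, self-referenced, and criterion-referenced) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure taken from the authors: the function names, the indicator, and every number are hypothetical.

```python
from statistics import mean


def norm_referenced(value, other_jurisdictions):
    """Synchronic comparison: gap between a value and the mean of other jurisdictions."""
    return value - mean(other_jurisdictions)


def self_referenced(value, earlier_value):
    """Diachronic comparison: change since an earlier value for the same system."""
    return value - earlier_value


def criterion_referenced(value, target):
    """Comparison with a planned objective: True when the target is met."""
    return value >= target


# Hypothetical indicator: percentage of students reaching a reading benchmark.
current = 62.0
others = [58.0, 66.0, 71.0]  # invented values for other jurisdictions, same year
earlier = 55.0               # invented value for the same system, earlier survey
target = 70.0                # invented planned objective

print(norm_referenced(current, others))
print(self_referenced(current, earlier))
print(criterion_referenced(current, target))
```

The same indicator value can look weak against a norm, improved against its own past, and short of a planned target, which is why the standard of comparison has to be stated alongside the statistic.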
The selection of indicators to represent the status of the education system is based on a model, which may be explicit or implicit, of how the education system works (Burnstein, Oakes, and Guiton 1992). The set of indicators incorporated in the model should reflect the multifaceted nature of education in all its complexity (Bottani and Tuijnman 1994) and be comprehensive enough to describe the important dimensions of the system. The model, in turn, provides a context for interpreting what the indicators mean, how they relate to other aspects of the education system (and perhaps to other social and economic systems), and how they are likely to respond to various kinds of manipulation.
The model of the education system on which indicators are built frequently comprises some combination of inputs, processes, and outputs. Inputs are the resources available to the system, for example, buildings, books, the number and quality of teachers, and such educationally relevant background characteristics of students as the socioeconomic conditions of their families, communities, and regions. Processes are the ways schools use their resources as expressed in curricular and instructional activities. Outputs are what the school tries to achieve; they include the cognitive achievements of students and affective characteristics such as the positive and negative feelings and attitudes students develop relating to their activities, interests, and values.
Choice of Outcome Indicators
To enumerate the outcomes of education about which it might be useful to have empirical information in terms of the many aims that have been posited for education would be an endless task. Aims frequently suggested include the development of literacy and numeracy skills, the development of aesthetic areas of experience, preparation for life in a democratic society, preparation for the world of work, development of character and moral sensitivity, and personal self-fulfillment. Aims (and [...]
[...] teachers' efforts and raising students' achievements, promoting accountability, increasing public awareness, and informing political debate.
Informing Policy
Information on the achievements of students in an education system can serve a variety of audiences and functions. Educational administrators, such as senior ministry of education officials, should be in a position to produce valid, timely, and useful information when addressing policy issues to be resolved in a political setting. Without such information, policymaking can be unduly influenced by personal biases of ministers of education or senior civil servants, vested interests of school owners or teacher unions, and anecdotal evidence offered by business interests, journalists, and politicians. Given this range of influences, at a minimum, pertinent data must be available to guide the selection of priorities in curriculum, the provision of material resources, and teacher training strategies. However, as noted above, factual information to assist policymaking, especially data on the quality of student learning, is seldom available in developing countries. Even when data on student achievement are available, the views of powerful constituencies continue to play a role in setting educational priorities. Virtually all decisions in public policy are based on both facts and values (Lincoln and Guba 1981). The role of achievement data is to strengthen the factual basis of decisionmaking.
Many education systems are committed to the principle of equality of opportunity and monitor the extent to which groups enjoy equal access to and participate in education. Information from a national assessment can bring this a step further by providing evidence about the achievements of such groups. Thus, national assessment results have been used in the United States to provide evidence of differences in school achievement related to geography, gender, and ethnicity. Many countries will also be interested in knowing whether mean reading achievement levels are similar for boys and girls, rural and urban children, and children from different linguistic groups.
Information from a national assessment will be more useful to policymakers if it provides information on subdomains of knowledge rather than just an overall score for a curriculum area such as reading or mathematics. Recent reading surveys have examined respondents' performance in analysis and comprehension of narrative material (based on fictional text), expository material (information or opinion writing), and documentary material (information presented in a structured form in charts, maps, lists, or sets of instructions) (Elley 1992). In mathematics, categories (subdomains) that have been used include numbers and operations, measurement, geometry, data analysis and statistics, and algebra and functions (Lapointe, Mead, and Askew 1992). Data on the performance of students in subdomains can point to strengths and weaknesses within curriculum areas, show how intended curricula are implemented in schools, and, in particular, highlight such factors as gender, urban-rural location, or performance at different times. Such information may have implications for curriculum design, teacher training, and the allocation of resources.
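
A short Python sketch may help show why subdomain scores can be more informative than an overall score. It assumes a hypothetical record layout (group, subdomain, percent-correct score) and invented scores; none of the data come from the surveys cited above. In this made-up example the two groups have nearly identical overall means, while the subdomain breakdown exposes a gap in geometry.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (group, subdomain, percent-correct score).
records = [
    ("urban", "numbers and operations", 70),
    ("urban", "geometry", 50),
    ("urban", "measurement", 60),
    ("rural", "numbers and operations", 68),
    ("rural", "geometry", 38),
    ("rural", "measurement", 71),
]


def mean_by(rows, key):
    """Average the score (third field) for each distinct value of key(row)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[key(row)].append(row[2])
    return {k: mean(v) for k, v in buckets.items()}


# One overall score per group hides where the differences lie.
overall = mean_by(records, key=lambda r: r[0])
# Breaking the same data down by subdomain shows them.
by_subdomain = mean_by(records, key=lambda r: (r[0], r[1]))

print(overall)
for (group, subdomain), score in sorted(by_subdomain.items()):
    print(group, subdomain, score)
```

A real assessment would also need sampling weights and standard errors before any such difference could be interpreted; those are left out here to keep the sketch short.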
Monitoring Standards
Information on student achievement in key curriculum areas collected on a regular basis has helped monitor changes in achievement over time in such countries as Chile, France, [...]land, Thailand, the United Kingdom, and the United States. By presenting objective findings on achievement, a national assessment can provide evidence relevant to assertions made frequently by employers, industrialists, and others that educational standards are falling.
Countries vary in the frequency with which they obtain information on particular areas of achievement. A five-year interval would seem to be a reasonable time span, since achievement standards are unlikely to vary greatly from year to year. This does not mean that a national assessment exercise would be conducted only every five years. Assessments could be more frequent, but a particular curriculum area would be assessed only once in five years.
Introducing Realistic Standards
A national assessment can foster a sense of realism in the debate on appropriate achievement levels. In developing countries, unrealistic standards have probably contributed to the high student failure rates that are a feature of many education systems (Kellaghan and Greaney 1992). Unduly high levels of expectation may be prompted by the desire to maintain traditional colonial standards. However, such a target may be almost impossible to attain, given the level of socioeconomic development of some countries. Another factor affecting the target is the changing nature of the school-going population arising from the dramatic increase in enrollment numbers; this increase, in turn, is often accompanied by lower teacher qualification requirements and a decline in the quality of educational facilities.
Identifying Correlates of Achievement
Information on correlates of the outcomes of an education system can help policymakers identify factors over which they can exercise some control, that is, factors likely to contribute to improvements in student achievement levels. Data on some of these potentially manipulable variables may have to be collected along with achievement data at the time of the national assessment. For example, national assessment data have been used in Colombia to assess the impact of in-service teacher training. In Chile the contribution of school resources to student achievement has been examined and decisions made about the allocation of such resources. Other possible correlates of achievement include the emphasis placed on individual subject areas; assessment and supervision procedures; textbooks (prices, numbers, contents, and distribution systems); curricular content; and state policies on language instruction.
Directing Teachers' Efforts and Raising Students' Achievements
The expectation is that action will be taken in the light of national assessment results to mandate changes in policy or in the allocation of resources. However, the information such assessments provide may be sufficient, even without formal action, to bring teaching and learning into line with what is assessed (Burnstein, Oakes, and Guiton 1992). The reason for the improvement is that the indicators may point to what is important, and "what is measured is likely to become what matters" (Burnstein, Oakes, and Guiton 1992, p. 410). As a consequence, curricula, teaching, and learning will be directed toward the achievements represented in the indicators. What is tested is what will be taught, and what is not tested will not be taught (Kellaghan and Greaney 1992).
The conditions under which assessments will have positive effects are not entirely clear. Certainly, there are situations in which assessment systems have little impact on policy or practice (Gipps and Goldstein 1983), for example, when the results are not communicated clearly or in a usable way to policymakers. It is equally certain that when high stakes are attached to performance on an assessment, teaching and learning will be aligned with the assessment (Kellaghan and Grisay 1995; Madaus and Kellaghan 1992). But although this may result in improved test scores, if these are the result of teaching to the test, they will not necessarily be matched by improvement in students' achievement measured in other ways (Kellaghan and Greaney 1992; Le Mahieu 1984; Linn 1983).
Thailand provides an example of a national assessment designed to change teachers' perceptions of what is important to teach. The assessment included affective outcomes such as attitudes toward work, moral values, and social participation in the hope that teachers would begin to stress learning outcomes other than those measured in formal examinations. Subsequently, it was established that teachers began to emphasize affective learning outcomes in their teaching and evaluation (Prawalpruk 1996).
Promoting Accountability
Governments need access to relevant information on the operation of the education system to enable them to determine whether the state is getting good value for its investment. That investment is substantial. Recent figures indicate that in most low-income economies, expenditure on education is one of the largest cost items in government spending, much larger than expenditures on health, defense, housing, social security, or welfare (World Bank 1995a). In this situation, relevant feedback is obviously essential and can help avoid a waste of scarce resources that has been described as socially intolerable, economically unacceptable, and politically short-sighted (Bottani 1990, p. 336).
A variety of models of accountability exists. The precise model employed will depend on many factors. First, it will depend on who is regarded as responsible for performance: the teacher, the school, the ministry of education, or the general public. Second, the nature of the information obtained will affect which individuals or institutions are identified as accountable. In the British system of national assessment, information is available about all schools; thus schools can be identified in the accountability process. If individual teachers or schools are not identified in national assessments, it obviously will not be possible to hold them accountable for student performance. Similarly, when samples, rather than whole populations of schools, are tested in a national assessment, adequate information will not be available (except for a small number of sample schools) to identify and hold accountable poorly performing teachers or schools.
Increasing Public Awareness
Ministries of education are often reluctant to place in the public arena information about the operation of the education system that they regard as sensitive. This is not surprising when the ministry is charged by government with attaining politically sensitive (but practically difficult) objectives such as promotion of a national language. Willingness to publicize policy failures is not a conspicuous characteristic of most ministries. In addition, political expediency may dictate that ministries not report results which highlight the superiority of particular ethnic, linguistic, or regional groups. In such situations, it may be difficult to establish an atmosphere in which national assessments can be conducted and results made freely available to all interested parties.
Although it may sometimes be in the interest of a ministry to control the flow of information, the long-term advantages of an open-information system are likely to outweigh any short-term disadvantage. Several long-term benefits can be identified. When the results of a national assessment are made widely available, they can attract considerable media attention and thus heighten public consciousness on educational matters. The results of a national assessment can also bring an air of reality and a level of integrity to discussions about the education system. The informed debate that is stimulated can, in turn, contribute to increased public support for national, regional, and local efforts to improve the education system. Thus, although the knowledge furnished by national assessments may create immediate problems for politicians and government officials, in the longer term it can provide a stimulus, rationale, or justification for reform initiatives.
Informing Political Debate
National and, even more notably, international comparative assessment exercises give rise to considerable debate among politicians, as well as others interested in education. An education system provides a country with the human resources and expertise necessary to make it competitive in international markets, and from this perspective political interest in national achievement is understandable. Politicians need to know whether the education system is giving value for the considerable portion of the national budget they allocate to it each year. Today, in many countries, rhetoric (usually uninformed) tends to dominate the political debate on education. Armed with objective evidence on the operation of the system, politicians are more likely to initiate reforms and to prompt ministries of education to action.
ole of National Assessments
though there has been a pronounced increase in recent years in support for formalsessment of student achievement (Lockheed 1992), most developing countries still l
lid and timely information on the outcomes of schooling. A national assessment can
lp fill this gap by providing educational leaders and administrators with relevant da
student achievement levels in important curricular areas
on a regular basis. These data can contribute to policy and public debate, to the diagnosis
of problems, to the formulation of reforms, and to improved efficiency.
There is no single formula or design for carrying out a national assessment. A
government's purposes and procedures for assessing national levels of achievement will
be determined by local circumstances and policy concerns. The diversity of uses and
approaches will become more apparent in chapter 2 when we review seven national
assessment systems from different regions of the world, as well as international
comparative assessments of student achievements. The remainder of the book provides
information on how to (and how not to) conduct a national assessment.
It may seem reasonable to argue that spending money on a national assessment is not
justified when resources are inadequate for building schools or for providing textbooks to
students who need them. In response, it needs to be pointed out that the resources
required for the conduct of a national assessment would not go very far in addressing
major shortcomings in the areas of school or textbook provision. Furthermore, the
information obtained through a national assessment can bring about cost-efficiencies by
identifying failing features of existing arrangements or by producing evidence to support
more effective alternatives. However, it is up to the proponents of a national assessment
to show that the likely benefits to the education system as a whole merit the allocation of
the necessary funds. If they cannot show this, the resources earmarked for this activity
might indeed be more usefully devoted to activities such as school and textbook
provision.
National and International Assessments
National assessments tend to be initiated by governments, more specifically by ministries
of education. International assessments often owe their origin to the initiatives of
members of the research community. The main difference between the two types of
assessment is that national assessments are designed and implemented within individual
countries using their own sampling designs and instrumentation, whereas international
assessments require participating countries to follow similar procedures and use the same
instruments.
In this chapter, national assessment systems in two industrial countries (the United States
and England and Wales) and five developing countries (two in Latin America, one in
Asia, and two in Africa) are described. Next, two international assessments are outlined,
and the advantages and disadvantages for developing countries of participating in such
assessments are considered.
National Assessments
National assessments are now a standard feature of education systems in several industrial
countries. The assessments are similar in many ways. Virtually all use multiple-choice and
short-answer questions, although Norway and the United States include essay-type
writing tasks and oral assessments are conducted in Sweden and the United Kingdom
(England, Wales, and Northern Ireland). National assessments also differ in several
respects from country to country. In Canada and France many grades are assessed,
whereas relatively few are assessed in the Netherlands, Norway, Scotland, and Sweden.
The purposes of national assessment also vary.
United States
The U.S. National Assessment of Educational Progress (NAEP) is the most widely reported
national assessment model in the literature. It is an ongoing survey, mandated by the
U.S. Congress and implemented by trained field staff, usually school or district personnel.
The survey is designed to measure students' educational achievements at specified ages
and grades. It also examines achievements of subpopulations defined by demographic
characteristics and by specific background experience. Since 1990 voluntary state-level
assessments, in addition to the national assessments, have been authorized by Congress
(Johnson 1992).
Although the NAEP has been in existence since 1969, politicians and the general public
appear to have become interested in its findings only recently (Smith, O'Day, and Cohen
1990). Heightened political interest as a result of the attention paid by the National
Governors' Association to NAEP findings led to the introduction in 1990 of state-by-state
comparisons (Phillips 1991). Over the years, details of the administration of the NAEP
have changed (for example, the frequency of assessment and the grade level targeted). At
present, assessments are conducted every second year on samples of students in grades
4, 8, and 12. Eleven instructional areas have been assessed periodically. Most recent reports
have focused on reading and writing (Applebee and others 1990a, 1990b; Langer and
others 1990; Mullis and Jenkins 1990); mathematics and science (Dossey and others 1[...];
Mullis and Jenkins 1988; Mullis and others 1993); history (Hammack and others 1990);
geography (Allen and others 1990); and civics (Anderson and others 1990). Data have
been reported by state, gender, ethnicity, type of community, and region.
Up to 1984, the percentages of students who passed items were reported. Since that date,
proficiency scales have been developed for each subject area. These scales were
computed by using statistical techniques (based on item response theory) to create a single
scale representing performance (Phillips and others 1993). The scale is a numerical index
that ranges from 0 to 500. It has three achievement levels (basic, proficient, and advanced)
at each grade level and allows comparison of performance across grades 4, 8, and 12.
In setting the achievement levels, the views of teacher representatives (sixty-eight in
mathematics, for example), administrators, and members of the general public were taken
into account (Mullis and others 1993). Performance at the lowest, or basic, level denotes
partial mastery of the knowledge and skills required at each grade level. For example,
grade 4 students performing at the basic level are able to perform simple operations with
whole numbers and show some understanding of fractions and decimals. Performance at
the middle, or proficient, level demonstrates competence in the subject matter. In the view
of the National Assessment Governing Board, all students should perform at this level.
Grade 4 students who are proficient in mathematics can use whole numbers to estimate,
compute, and determine whether results are reasonable; have a conceptual understanding
of fractions and
decimals; can solve problems; and can use four-function calculators. The highest, or
advanced, level indicates superior performance. Grade 4 students who receive this rating
can solve complex nonroutine problems, draw logical conclusions, and justify answers.
Average mathematics proficiency marks are presented for grades 4, 8, and 12 for 1990
and 1992 in table 2.1. The data in the last column show that in both years more than
one-third of students at all grade levels failed to reach the basic level of performance.
However, the figures in this and in other columns suggest that standards rose between
1990 and 1992.
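The two claims in the preceding paragraph can be checked directly against the figures printed in table 2.1. The short Python sketch below is only an illustration: the tuple layout and the function names are assumptions introduced here, and the numbers are copied from the table as reported by Mullis and others (1993).

```python
# Rows of table 2.1 as printed:
# (grade, year, average proficiency, % at or above basic, % below basic)
TABLE_2_1 = [
    (4, 1990, 213, 54, 46),
    (4, 1992, 218, 61, 39),
    (8, 1990, 263, 58, 42),
    (8, 1992, 268, 63, 37),
    (12, 1990, 294, 59, 41),
    (12, 1992, 299, 64, 36),
]


def rows_are_consistent(rows):
    """True if 'at or above basic' and 'below basic' sum to 100 in every row."""
    return all(at_basic + below == 100 for _, _, _, at_basic, below in rows)


def more_than_a_third_below_basic(rows):
    """True if more than one-third of students are below basic in every row."""
    return all(below > 100 / 3 for _, _, _, _, below in rows)


def change_1990_to_1992(rows):
    """Map grade -> (change in average proficiency, change in % below basic)."""
    lookup = {(grade, year): (avg, below) for grade, year, avg, _, below in rows}
    changes = {}
    for grade in (4, 8, 12):
        avg_90, below_90 = lookup[(grade, 1990)]
        avg_92, below_92 = lookup[(grade, 1992)]
        changes[grade] = (avg_92 - avg_90, below_92 - below_90)
    return changes


print(rows_are_consistent(TABLE_2_1))
print(more_than_a_third_below_basic(TABLE_2_1))
print(change_1990_to_1992(TABLE_2_1))
```

A rise in average proficiency together with a fall in the percentage below basic at every grade is what the text summarizes as standards having risen between the two surveys.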
Results based on one common scale (table 2.2) show that most students, especially those
in grades 4 and 8, performed poorly on tasks involving fractions, decimals, and
percentages. Furthermore, very few grade 12 students were able to solve nonroutine
problems involving geometric relations, algebra, or functions. Subsequent analyses
revealed that performance varied by type of school attended, state, gender, and level of
home support.
Comparisons of trends over time show that achievements in science and mathematics
have improved, whereas, except at one grade level, there has been no significant
improvement in reading or writing since the mid-1980s (Mullis and others 1994).
Information collected in the NAEP to help provide a context for the interpretation of the
achievement results revealed that large proportions of high school students avoid taking
mathematics and science courses.
Table 2.1. Proficiency Levels of Students in Grades 4, 8, and 12, as Measured by
U.S. NAEP Mathematics Surveys, 1990 and 1992

Grade      Average       Percentage of students at or above   Percentage of students
and year   proficiency   Advanced   Proficient   Basic        below basic
Grade 4
  1990     213           1          13           54           46
  1992     218           2          18           61           39
Grade 8
  1990     263           2          20           58           42
  1992     268           4          25           63           37
Grade 12
  1990     294           2          13           59           41
  1992     299           2          16           64           36

Source: Mullis and others 1993.
Table 2.2. Percentage of Students at or above Average Proficiency Levels in Grades
4, 8, and 12, as Measured by U.S. NAEP Mathematics Surveys, 1990 and 1992

Grade      Average       Percentage at or above proficiency level
and year   proficiency   200     250     300     350
Grade 4
  1990     213            67      12       0       0
  1992     218            72      17       0       0
Grade 8
  1990     263            95      65      15       0
  1992     268            97      68      20       1
Grade 12
  1990     294           100      88      45       5
  1992     299           100      91      50       6

Note: Skills for each proficiency level are as follows:
Level 200. Addition, subtraction, and simple problem solving with numbers
Level 250. Multiplication and division, simple measurement, two-step problem solving
Level 300. Reasoning and problem solving involving fractions, decimals,
percentages, and elementary concepts in geometry, algebra, and statistics
Level 350. Reasoning, problem solving involving geometric relationships, algebra,
and functions.
Source: Mullis and others 1993.
Among eleventh-graders who enroll in science courses, approximately half had never
conducted independent experiments. Almost two-thirds of eighth-graders spend more
than three hours a day watching television.
England and Wales
In England and Wales, national monitoring efforts have been a feature of the education
system since 1948. Large-scale national surveys of levels of reading achievement of
9-, [...]-, and 15-year-olds were conducted irregularly up to 1977 (Kellaghan and Madaus
1982). In 1978, partly in response to criticisms about standards in schools, a more
elaborate system of assessment, run by the Assessment of Performance Unit in the
Department of Education and Science, was set up (Foxman, Hutchinson, and Bloomfield
1991). Three main areas of student achievement were
targeted for assessment at ages 11, 13, and 15: language, mathematics, and science. In
addition to pencil-and-paper tests, performance tasks were administered to small samples
of students to assess their ability to estimate and to weigh and measure objects.
Assessments in the 1980s carried considerable political weight. They contributed to the
significant curriculum reform movement embodied in the 1988 Education Act, which, for
the first time, defined a national curriculum in England and Wales (Bennett and Desforges
1991). The new curriculum was divided into four "key" stages, two at the primary level
and two at the secondary level. A new system of national assessment was introduced in
conjunction with the new curriculum. Attainment was to be assessed by teachers in their
own classrooms by administering externally designed performance assessments. These
assessments went well beyond the performance tests introduced by the Assessment of
Performance Unit; they were designed to match normal classroom tasks and to have no
negative backwash effects on the curriculum (Gipps and Murphy 1994).
The policy-related dimension of the assessments was clear. They were intended to have a
variety of functions: formative (to be used in planning further instruction); diagnostic (to
identify learning difficulties); summative (to record the overall achievement of a student in
a systematic way); and evaluative (to provide information for assessing and reporting on
aspects of the work of the school, the local education authority, or other discrete parts of
the education service) (Great Britain, Department of Education and Science, 1988). In
particular, the assessments were expected to play an important role in ensuring that
schools and teachers adhered to the curriculum as laid down by the central authority.
Thus the assessment approach could be described as "fundamentally a management
device" (Bennett and Desforges 1991, p. 72); it was not supported by any theory of
learning (Nuttall 1990).
Although there have been several versions of the curriculum and of the assessment
system since its inception, some significant features of the system have been maintained.
First, all students are assessed at the end of each key stage at ages 7, 11, 14, and 16.
Second, students' performance is assessed against statements of attainment prescribed for
each stage (for example, the student is able to assign organisms to their major groups
using keys and observable features, or the student can read silently and with sustained
concentration). Third, assessments are based on both teacher judgments and external
tests.
Teachers play an important role in assessment: they determine whether a student has
achieved the level of response specified in the statement of attainment, record the
achievement levels reached, indicate level of progress in relation to attainment targets,
provide evidence to support levels of attainment reached, and give information about
student achievements and progress to parents, other teachers, and schools. Moderation is
carried out by other teachers, to help ensure a common marking standard.
Initial reactions to the process indicated that teachers welcomed the materials provided
and the innovative assessment procedures. On the negative side, the assessment process
placed a heavy burden on teachers, the in-service support provided was inadequate, and
the assessment turned out to be largely impractical (Broadfoot and others n.d.; Gipps and
others 1991; Madaus and Kellaghan 1993). To add to the problems, results were being
published at a time of intense competition between schools and of job losses, which gave
rise to questions about entrusting the administration and scoring to teachers (Fitz-Gibbon
1995).
Two important lessons can be drawn from the British national assessment system. First,
the use of complex assessment tasks leads to problems of standardization of procedures
for administration and scoring that, in turn, lead to problems of comparability, both
between schools and over time. Second, it is extremely difficult, if at all possible, to
devise assessment tasks that will serve equally well formative, diagnostic, and summative
evaluative purposes (Kellaghan 1996c). Efforts to deal with these problems are to be
found in the move to make greater use of more conventional centralized written tests and
to accord priority to the summative function in future assessments (Dearing 1993; Gipps
and Murphy 1994).
Chile
In 1978 Chile's Ministry of Education assigned responsibility for a national assessment to
an external agency, the Pontificia Universidad Católica de Chile. The study was piloted
over a two-year period. Data on contextual variables, as well as on achievement, were
collected (Himmel 1996). These included student-home variables (student willingness to
learn, parental expectations for their children); teacher-classroom variables (teaching
methodologies, classroom climate); principal and school variables (expectations of staff
and of students, promotion of parents in school activities); and institutional variables
(educational and financial policies).
The assessment was designed to provide information on the extent to which students
were achieving learning targets considered minimal by the Ministry of Education; to
provide feedback to parents, teachers, and authorities at municipal, regional, and central
levels; and to provide data to planners that would guide the allocation of resources in
textbook development, curriculum development, and in-service teacher training.
All students in grades 4 and 8 were assessed in Spanish (reading and
[...] students in rural schools; students in large schools performed better than students in
small schools; and students in private schools scored highest.
The results were disseminated extensively. Teachers received classroom results containing
the average percentage of correct answers for each objective assessed, as well as the
average number of correct answers over the entire test. Results were also reported
nationally and by school, location, and region. Each classroom and school was given a
percentile ranking based on other schools in the same socioeconomic category, as well as
a national ranking. Special manuals explained the results and indicated how schools and
teachers could use the information to improve achievement levels. Results were given to
school supervisors.
Relatively little use was made of the self-concept information. Parental information was
not used and was not collected after the first year. Parents, however, received a simplified
report of overall results for Spanish and mathematics.
Use of the national assessment results has increased gradually. Low-scoring schools have
access to a special fund to enable them to improve infrastructure, educational resources,
and pedagogical approaches. Results have also been used to prompt curriculum reform.
Percentile rank scores were dropped in favor of percentage scores because teachers found
it difficult to interpret the former.
The Chilean experience highlights the need for consensus and political will, technical
competence, and economic feasibility (Himmel 1996). Currently there appears to be
political and public support for the SIMCE. It provides education administrators with
information for planning, and authors of instructional materials use the information to
identify objectives. However, the enterprise has not been a total success. Some schools,
realizing that their rank depended on the reported socioeconomic grouping of their
students, overestimated the extent of poverty among their students to help boost their
position. Efforts to explain procedures and results to parents have not been reflected in
increased parent involvement with schools except for private schools. Almost two-thirds
of teachers reported that they did not use the special manual that dealt with the
pedagogical implications of the test results. Finally, questions have been raised about the
value of the census approach when sample data could provide policymakers with the
needed information.
Colombia
National assessment in Colombia was prompted by a perception that insufficient relevant
information was available for decisionmaking at central, regional, and local levels (Rojas
1996). The Ministry of Education also wished to use the results to generate debate on
educational issues.
The initial assessment conducted in 1991 focused on the extent to which standards
defined as minimum in mathematics and language were being attained in grades 3 and 5
in urban and rural public and private schools. A total of 15,000 students participated in
the assessment. Originally thirteen states, accounting for 60 percent of the population,
were targeted. The sample comprised 650 students in grade 3 and 500 students in grade 5
in each state.
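The reported total can be reconciled with the per-state figures by simple arithmetic. The Python sketch below assumes that the same per-state targets applied in all thirteen states originally targeted; the function and constant names are illustrative only, and the reported figure of 15,000 appears to be a rounded value.

```python
# Figures quoted in the text for the 1991 Colombian assessment.
STATES_TARGETED = 13
GRADE_3_PER_STATE = 650
GRADE_5_PER_STATE = 500


def planned_sample(states, per_state_counts):
    """Total planned sample, assuming identical targets in every state."""
    return states * sum(per_state_counts)


total = planned_sample(STATES_TARGETED, (GRADE_3_PER_STATE, GRADE_5_PER_STATE))
print(total)  # 13 * (650 + 500) = 14950, close to the reported 15,000
```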
For grade 3 four performance levels were assessed in mathematics and three in reading
comprehension. Performance levels or target standards were determined by the test
development personnel. For example, in mathematics the lowest performance level
included items on simple addition, whereas more complex tasks involving problem
solving were equated with higher performance levels. For grade 5 five performance levels
were assessed in mathematics and four in reading. Both multiple-choice items and items
for which students had to supply short answers were used. Data on personal, school, and
environmental characteristics were collected, as well as information on student
participation in local organizations or associations.
The national leader of the assessment had considerable experience in research, data
collection, and fieldwork. Teams were established to coordinate the fieldwork within
individual states. Each team was led by a coordinator who directed the field testing,
supported by two or three individuals with formal qualifications in the social sciences.
Local coordinators, usually young people, supervised the work of ten to fifteen
fieldworkers. The fieldworkers, often university students or recent social science
graduates, administered the tests and conducted teacher interviews. The supply of
applicants for these positions was ample because of the relatively high unemployment
rates among graduates. Local teachers were not asked to administer tests because it was
felt they might attempt to help students taking the tests. Ministry of Education officials
were considered unqualified for the work.
At the end of the assessment, profiles of high-scoring schools, teachers, and
administrators were developed. The percentages of students who scored at each
performance level were reported separately for each state, for public and private schools,
and for urban and rural schools, as well as at the national level. Correlates of achievement
were identified; these included the number of hours per week devoted to a subject area,
teachers' emphasis on specific content areas, teachers' educational level, school facilities,
and number of textbooks per student. Negative correlations were recorded for grade
repetition, absenteeism, time spent getting to school, and family size (Instituto SER de
Investigación/Fedesarrollo 1994). The number of in-service courses a teacher had taken
did not emerge as a significant predictor of achievement.
Results were released through the mass media, and a program of national and local
workshops was organized to discuss the results and their implications. Individual teachers
received information on national and regional results in newsletters, brochures, and other
user-friendly documents. Administrators, especially at the state level, used results for
local comparisons. A national seminar used the national assessment data to identify
appropriate strategies for improving educational quality. Results for individual schools
were not reported because it was felt that this would undermine teacher support for the
assessment.
The apparent success of the initial assessment has been attributed to the creation of an
evaluation unit within the Ministry of Education; to the commitment of the minister and
vice-minister for education; to the support of ministry officials; to the use of an external
public agency to design the assessment instruments; and to the use of a private agency to
take responsibility for sampling, piloting of instruments, administration of tests, and data
analysis (C. Rojas, personal communication, 1995). After the first two years responsibility
for the national assessment was transferred to a public agency, which administered the
assessment in 1993 and 1994. By late 1995, however, the agency had not managed to
analyze the data collected in either year.
Thailand
Following the introduction of a new higher secondary school curriculum in 1981, public
certification examinations at the end of secondary school were abolished in Thailand, and
teachers were given responsibility for evaluating student achievements in their respective
courses. Concerned that achievement might fall in this situation, the Ministry of Education
introduced national assessment as a means of monitoring standards (Prawalpruk 1996).
Administrators at various levels of the system were expected to use the results to help
improve the quality of education. To encourage schools to broaden their objectives and
instructional practices, the national assessment included measures of affective learning
outcomes (attitudes toward work, moral values, and participation) and practical skills.
Starting in 1983, all grade 12 students (in their final year in secondary school) were
assessed in Thai, social studies, and physical education. In addition, science, mathematics,
and career education were assessed in most subsequent years. Both cognitive and
affective outcomes were assessed in social studies, physical education, and career
education. The task was entrusted to the Office of Educational Assessment and Testing
Services in the Department of Curriculum and Instruction Development.
Many of the staff had achieved master's degrees in educational assessment; eight had been
trained outside Thailand. Subject matter committees (twelve to eighteen members each)
established for each subject area developed tables of specifications for achievement and
wrote multiple-choice items. Nationwide testing was conducted on the same two days.
Schools were furnished with individual student scores and with school, regional, and
provincial mean scores; information on how other individual schools performed was
provided. For public communication purposes, student performance was reported as the
percentage of items answered correctly. Provincial administrators advised how the results
could be used in planning academic programs at school, provincial, and regional levels.
In subsequent years, samples of grades 6 and 9 were assessed, generally every second
year. In a reaction to the initial failure of schools to use assessment results to improve
school practice, the national assessment design was expanded to include measures of
school process (school administration, curriculum implementation, lesson preparation,
and instruction). Starting in 1990 school process measures were assessed by teams of
three external evaluators. The early national assessment results for science and
mathematics were considered disappointing; they showed that students were weak at
applying principles in both subject areas. This conclusion prompted a significant
curriculum revision in 1989.
National assessment has been used for school and provincial planning and for monitoring
levels of student achievement over time; it has also helped increase teacher interest in
affective learning outcomes. According to Prawalpruk (1996), some principals misused
the results by claiming that poor results could be attributed to poor teaching. Results were
used for educational planning only if adequate administrative support was available.
School principals ignored assessment results if they did not consider them useful for
planning.
Namibia
The National Institute for Educational Development in Namibia collaborated with Florida
State University and Harvard University in 1992 to assess the basic language and
mathematics proficiencies of students at grades 4 and 7. The objectives of the assessment
were to inform policymakers on achievement levels to enable them "to decide on resource
targeting to underachieving schools" (Namibia, Ministry of Education and Culture, 19[...],
p. xiv), to sensitize managers to the professional needs of teachers, to enable schools and
regions to
compare themselves with their counterparts, and to provide baseline data for monitoring
progress.
Tests were developed "by reference groups within the head office of the ministry" (p. [...]),
based on official curricula and textbooks. A random sample of 136 schools was drawn,
covering Namibia's six education regions. Within each school, one grade 4 and one grade
7 class were chosen randomly. In one specific region of interest (Ondangwa), thirty-f[...]
schools with grade 4 students and nineteen with grade 7 students took the national
language (Oshindonga) test. Test instructions to all students were given in the local language.
More than 7,000 students in grades 4 and 7 were tested in English and mathematics.
Of the 136 schools, 20 were included in a special longitudinal sample to monitor changes
in English achievement over time. In these schools, students in grades 4 and 5 took the
grade 4 test, whereas those in grades 6 and 7 took the grade 7 test. It was planned to
readminister the tests to students each year. It is now accepted that the longitudinal sample
was too small to permit generalization to the wider population of Namibian children.
The tests were administered to all students in attendance in the targeted grades in the 136
sample schools; only 98 schools, however, had a grade 7 class. Both the English and
Oshindonga tests were timed. The English test took 40 to 60 minutes and the Oshindonga
test 60 to 80 minutes to complete. The untimed mathematics test took up to 120 minutes
and caused some student fatigue.
Because the test designers hoped to get a normal distribution of test scores, tests were [...]
designed to assess levels of mastery. Items answered correctly by less than 20 percent or
more than 80 percent of students were deleted in analyses. This reduced severely the
number of items that could be used in measuring performance levels: in the English grade
[...] test, from seventeen to nine, and in the grade 7 mathematics test, from sixty to
thirty-eight.
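The item-deletion rule described above (drop any item answered correctly by less than 20 percent or more than 80 percent of students) can be sketched as follows. This is a minimal illustration, not the procedure actually used in Namibia: the response matrix is invented, the function names are hypothetical, and items falling exactly on a threshold are assumed to be kept.

```python
def proportion_correct(responses):
    """Per-item proportion correct.

    `responses` is a list with one entry per student; each entry is a list of
    0/1 scores, one per item (hypothetical layout).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(student[item] for student in responses) / n_students
        for item in range(n_items)
    ]


def retained_items(responses, low=0.20, high=0.80):
    """Indexes of items kept after deleting those below `low` or above `high`."""
    return [
        item
        for item, p in enumerate(proportion_correct(responses))
        if low <= p <= high
    ]


# Invented data: five students, four items.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
print(proportion_correct(responses))  # [1.0, 0.6, 0.0, 0.6]
print(retained_items(responses))      # [1, 3]
```

In this invented example the first item is too easy and the third too hard, so only two of four items survive, which mirrors how such a rule can sharply reduce the number of usable items.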
Results showed that many grade 4 students had difficulty with the English test, prompting
concern that the expected level of performance was too high and suggesting that the
curriculum materials might be too advanced. Initial analyses of results suggested that [...]
categories of students increased their scores between grades 4 and 5 and between grades 6
and 7. At grade 7, the performances of girls and boys were similar on the two language
tests, but boys outscored girls on the mathematics test. Older students had much lower
scores than younger ones; for example, 19-year-olds answered correctly fewer than half
the items answered correctly by 12- and 13-year-olds on both the English and
mathematics tests. Differences in scores for regions and for language groups were also
reported.
Data were used to relate performance levels to three background factors (age, gender, and
home language), which in combination explained about one-third of the variance in English
scores and about one-fifth of the variance in mathematics scores. In one region, however,
less than 3 percent of the variance could be attributed to these factors. A set of papers was
prepared for teachers outlining practical suggestions for improving student performance
in areas that had posed difficulties.
The study concluded that the process of developing the tests for the assessment was not
altogether satisfactory and that a new competency-based curriculum will make it
necessary to develop new measures to assess basic competencies in subject areas.
Mauritius
To implement the recommendations of the World Conference on Education for All, the
United Nations Educational, Scientific, and Cultural Organization (UNESCO) and the United
Nations Children's Fund (UNICEF) launched a project to develop national assessment
capacities in China, Jordan, Mali, Mauritius, and Morocco (Chinapah 1992; UNESCO 199[...]).
Identification missions to each country were supported by some centralized training in
survey methodology. Each national assessment focused on learning achievement (literacy,
numeracy, and basic life skills); factors related to learning achievement (personal
characteristics, home environment, and school environment); and access and equity
(female enrollment, and admission and participation rates of specific groups). The
designers hoped that lessons learned in the course of the project could be adapted and
applied in other developing countries.
The national assessment in Mauritius was conducted to address policy issues relating to
educational inequalities (Chinapah 1992) and to provide baseline data on achievement
levels, with the aim of identifying the percentage of students who attained defined
acceptable standards in specified subject areas. Literacy (English and French), numeracy,
and life skills were assessed. Items on road safety, awareness of the environment, social
skills, and study skills were included.
Specific performance criteria were developed for each subject area (Mauritius
Examinations Syndicate 1995). To be rated literate in French, for example, a 9-year-old
was required to obtain a minimum score of twenty marks out of thirty-five, including
eight of a possible thirteen in "reading" and twelve of twenty-two in "vocabulary, written
expression." To be considered literate in English, the 9-year-old was expected to obtain a
minimum score of seventeen marks, including twelve out of a possible twenty-two in
reading and five of eight in writing. Such performances were considered to represent the
ability to read clearly, to understand different types of text judged appropriate for
9-year-olds, and to solve simple shopping problems (V. Chinapah, personal
communication, 1995).
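
As a rough illustration of how a criterion of this kind operates, the following Python sketch checks a pupil's marks against a minimum total and a minimum for each section. The thresholds are the ones quoted above; the class name, the function name, and the section labels are hypothetical placeholders, and the sketch is not the syndicate's actual scoring procedure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LiteracyCriterion:
    """Minimum marks needed overall and in each named section (hypothetical structure)."""
    total_min: int
    section_mins: Dict[str, int]


# Thresholds quoted in the text; section labels are illustrative placeholders.
FRENCH = LiteracyCriterion(total_min=20, section_mins={"reading": 8, "second_section": 12})
ENGLISH = LiteracyCriterion(total_min=17, section_mins={"reading": 12, "writing": 5})


def meets_criterion(marks: Dict[str, int], criterion: LiteracyCriterion) -> bool:
    """Return True only if the total and every section minimum are reached."""
    total_ok = sum(marks.values()) >= criterion.total_min
    sections_ok = all(
        marks.get(section, 0) >= minimum
        for section, minimum in criterion.section_mins.items()
    )
    return total_ok and sections_ok


# A pupil with a high total can still fall short on one section.
print(meets_criterion({"reading": 8, "second_section": 12}, FRENCH))   # True
print(meets_criterion({"reading": 13, "second_section": 11}, FRENCH))  # False
```

In both languages the section minimums add up to the required total (8 + 12 = 20 and 12 + 5 = 17), so in this sketch the section checks are the binding constraint.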
Approximately 1,600 standard IV students, mainly 9-year-olds in a representative sample
of fifty-two schools, were assessed. Questionnaires were administered to parents,
teachers, and school principals to obtain background information on home, school, and
student characteristics. Responsibility for the assessment was entrusted to the Mauritius
Examinations Syndicate. The syndicate, which administers the annual high-stakes public
examinations, had some technical competence in test development, data analysis, and
administration of formal assessments. Each test lasted 40 minutes. The literacy and
numeracy tests relied on multiple-choice and short-answer questions, the life skills test on
multiple-choice items. Tests were administered by retired primary school inspectors and
head teachers. Data were collected in 1994, and findings were presented to the Ministry of
Education and to teachers. The syndicate plans to repeat the assessment in the future to
monitor possible changes in achievement over time (R. Manrakhan, personal
communication, 1995).
International Assessments
International assessments, in contrast with national assessments, involve measurement of
the educational outcomes of education systems in several countries, usually
simultaneously. Representatives from many countries (usually from research
organizations) agree on an instrument to assess achievement in a curriculum area, the
instrument is administered to a representative sample of students at a particular age or
grade in each country, and comparative analyses of the data are carried out (Kellaghan
and Grisay 1995).
Countries participating in international studies are expected to provide personnel and
funds for administration, training, printing, local analyses, and production of national
reports. Costs of instrument development, sampling frameworks, international data
analyses, and report writing are the responsibility of the international assessment agency,
to which individual countries make a financial contribution.
International Assessment of Educational Progress
The first International Assessment of Educational Progress (IAEP), conducted in 1988 under
the direction of Educational Testing Service, under contract to the U.S. Department of
Education, represents an ...
... opportunity to learn, the amount of time a subject is studied, the use of computers, and factors
and resources in the homes of students (Anderson and Postlethwaite 1989; Anderson,
Ryan, and Shapiro 1989; Elley 1992, 1994; Kifer 1989; Lambin 1995; Postlethwaite and
Ross 1992).
Advantages of International Assessments
The main advantage of international studies over national assessments is the comparative
framework they provide in assessing student achievement and curricular provision
(Husén 1967). International assessments give some indication of where the students in a
country stand relative to students in other countries. They also show the extent to which
the treatment of common curriculum areas differs across countries, and, in particular, the
extent to which the approach in a given country may be idiosyncratic. This information
may lead a country to reassess its curriculum policy.
Many accounts are available of how findings of international studies on student
achievement and curricula have been used to change educational policy (Husén 1987;
Kellaghan 1996b; Torney-Purta 1990). For example, results of international studies have
been credited with the increased emphasis placed on science in Canada and in the United
States (McEwen 1992). In Japan the relatively superior performance of students in
mathematical computation compared with mathematical application and analysis led to a
change in emphasis in the curriculum (Husén 1987). In Hungary participation in IEA studies
has been credited with curriculum reform in reading, and the finding that home factors
accounted for more variance in student achievement than school factors helped to
undermine Marxist-Leninist curricular ideologies (Báthory 1989).
International assessments have many other advantages. Their findings tend to attract more
political and media attention than those of national studies. Thus, poor results can provide
politicians and other policymakers with a strong rationale for budgetary support for the
education sector.
For national teams entrusted with the implementation of international assessment, the
experience of rigorous sampling, item review, printing, distribution, supervision, scoring,
data entry, and drafting of national reports according to an agreed-on timetable can
contribute greatly to the development of local capacity to conduct research and national
assessments. Finally, staffing requirements and costs are lower in international studies
than in national assessments because instrumentation and sampling design are developed
in collaboration with experts in other countries.
Disadvantages of International Assessments
It can be argued that such factors as availability of schools and materials, opportunity to
learn, status and quality of teaching, parental interest, and class size differ so radically
from country to country that valid comparisons of international achievement test results
are impossible (Rotberg 1991). Although IEA studies generally consider the extent to which
students in individual countries have had opportunities to learn the content tested, it is
doubtful whether politicians, policymakers, or the media take these into consideration
when commenting on national rankings. Political rhetoric, frequently based on the
perceived implications of the findings for competitiveness in international trade rather
than on a sober evaluation of the meaning of results, may dominate the discussion
immediately following the publication of results. In fairness, it should be stressed that
uninformed political rhetoric can be prompted by the results of national as well as
international assessments and that some of the problems associated with international
assessments apply equally to national assessments.
A potentially significant problem with both international and national studies is the
difficulty in obtaining a representative sample of students (box 2.1). In many developing
countries up-to-date population data may not be available, and communication and
logistical problems can contribute to relatively low response rates. The National Center
for Education Statistics in the United States has set a response rate target of 85 percent for
cross-sectional surveys. This target may be much too high for developing countries, and
indeed it has been achieved only once by the United States in international studies of
mathematics and science (Medrich and Griffith 1992). Sampling problems are
commonplace and have been blamed for significant reversals of performance in some
countries between grades (Rotberg 1991). Targeted populations may not be comparable,
especially in countries where national enrollment, dropout, and retention rates differ
sharply. The result is that countries may have been represented by atypical samples of
students.
Box 2.1. Atypical Student Samples
In the 1991 IAEP mathematics study, only 3 percent of the population of
13-year-old students in Brazil and 1 percent of the corresponding
population of students in Mozambique were sampled. The performance
of Chinese students, which was highlighted in the report of the study, was
based on a sample that excluded many 13-year-olds: those below grade
in twenty provinces and cities, those out of school (almost 50 percent
of the population), and those attending school in nine provinces and
autonomous regions with predominantly non-Chinese populations
(Lapointe, Mead, and Askew 1992). The exclusion of these groups
suggests that the reported achievement levels may seriously
overestimate the mean achievements of Chinese students.
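
The response rate arithmetic behind such a target is simple, and a minimal Python sketch may help make it concrete. The 85 percent figure is the target quoted above; the function names and the counts in the example are hypothetical and are not drawn from any of the studies cited.

```python
def response_rate(responded: int, sampled: int) -> float:
    """Share of sampled students who actually took part."""
    if sampled <= 0:
        raise ValueError("sampled must be a positive number")
    if not 0 <= responded <= sampled:
        raise ValueError("responded must lie between 0 and sampled")
    return responded / sampled


def meets_target(responded: int, sampled: int, target: float = 0.85) -> bool:
    """Compare the achieved response rate with a target (0.85 is the figure quoted in the text)."""
    return response_rate(responded, sampled) >= target


# Hypothetical counts for illustration only.
print(response_rate(1400, 1600))  # 0.875
print(meets_target(1400, 1600))   # True
print(meets_target(1200, 1600))   # False
```

A check like this says nothing about who is missing. A sample can meet the target and still be atypical if whole groups were excluded before sampling began.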
A further problem with international assessments is that it is probably impossible to
develop a test that is equally valid for several countries (Kellaghan and Grisay 1995).
What is meant by "achievement in mathematics" or "achievement in science" varies from
country to country because different countries will choose different skills applied to
different facts and concepts to define what they regard as mathematical or scientific
achievement. Furthermore, a particular domain of a subject may be taught at different
grade levels in different countries. For example, simple geometric shapes, which are
introduced in many countries in the junior or lower primary grades, are not introduced
until grade 5 in Bangladesh. Again, prior knowledge or expectations might interfere with
attempts to solve a simple problem.
Because items included in an international test represent a common denominator of the
curricula of participating countries, it is unlikely that the relative weights assigned to
specific curriculum areas in national curricula will match those in international tests. In
the 1988 IAEP relatively little effort was made to test the curricula covered by non-U.S.
participants. As a result, in one of the participating countries (Ireland), important areas of
the mathematics curriculum were not tested, and other areas that received substantial
emphasis in the national curriculum were accorded relatively little emphasis in the
international test (Greaney and Close 1989).
Although a range of test formats is used in international assessments, the multiple-choice
format is used widely for reasons of management efficiency and desirable psychometric
properties (especially reliability). Even when other assessment formats are included,
reports may be limited to the results of the multiple-choice tests. This means that
important skills in the national curriculum, including writing, oral, aural, and practical
skills, are excluded.
The costs of international assessments are likely to be lower than those of national
assessments, but participation in an international assessment does require considerable
financial support. The IEA estimates that the minimum national requirement is a full-time
researcher and a data manager. Personnel requirements vary according to the nature of the
assessment. Developing countries that wish to participate must pay a nominal annual fee
and make a contribution to the overall costs on the basis of their economic circumstances.
Local funds have to be obtained for printing, data processing, and attendance at IEA
meetings. Costs may be met by a ministry of education, from university operating
budgets, or from a direct grant from the ministry of education to a university or research
center. IEA experience suggests that government-owned institutes have a better track record
than universities in conducting assessments (W. Loxley, personal communication, 1993).
A lack of meaningful contact between university researchers and government ministries
is particularly noteworthy in some Latin American countries.
Many developing countries are likely to encounter a range of common problems, whether
they are conducting an international or a national assessment. These include unavailability
of current population information on schools and enrollment figures; lack of experience
in administering large-scale assessments or in administering objective tests in schools;
tests that do not adequately reflect the curriculum offered in schools or that fail to reflect
regional, ethnic, or linguistic variations; lack of exposure to objective-type items; fear
that results might be used for teacher accountability purposes; insufficient funds and
skilled manpower to do rigorous in-country analyses of the national or international data;
governmental restrictions on publicizing results; and logistical problems in conducting the
assessment.
On balance, a developing country can probably benefit from participation in international
assessments of student achievements. Participation can help develop expertise that can be
drawn on later in more focused and more relevant national assessments. Consultant
support, however, may be needed to carry out an international or national assessment. In
particular, the services of long- and short-term local and foreign consultants may be
required to offer training programs in test development, sampling, and analysis.
National Assessment and Public Examinations
Although the idea of national assessment is new in most countries, public examinations
are an important and well-established feature of education in Africa, Asia, Europe, and
the Caribbean. In developing countries they are usually offered at the end of primary
schooling and at the ends of the junior and senior cycles of secondary schooling. Public
examinations are similar in many respects to national assessments: procedures are
formalized, and testing is normally done outside the classroom setting and requires
students to provide evidence of achievement. Because of their importance, their
frequency, and their similarity to national assessments, it is reasonable to ask whether
public examinations could be used to obtain the kind of information that national
assessment systems are designed to collect.
Eight issues are relevant in attempting to answer this question: the purposes of public
examinations and of national assessments; the achievements of interest to the two
activities; testing, scoring, and reporting procedures; the populations of interest to the two
activities; monitoring capabilities of the two activities; the need for contextual information
in interpreting assessment data; the implications of attaching high stakes to assessment;
and efficiency and cost-effectiveness in obtaining information.
Purposes
The purposes of public examinations and national assessments are significantly different.
The purpose of a public examination is to determine whether an individual student
possesses certain knowledge and skills. A national assessment is not primarily concerned
with identifying the performance of individual students; rather, its purpose is to assess the
performance of all or part of the education system. Given this difference, we can still ask
whether it is possible to aggregate the data from individual assessments in public
examinations to obtain information on the system. To answer that question, we have to
consider the more specific purposes of individual assessment and the implications of
these purposes for the kind of assessment procedure used.
Note: For a more extended treatment of this topic, see Kellaghan (1996a).
In public examinations, information on student performance is used to make decisions
about certification and selection, with selection tending to be the more important function
(Kellaghan and Greaney 1992; Lockheed 1991). As a consequence, the assessment
procedure or examination will attempt to achieve maximum discrimination for those
students for whom the probability of selection is high. This is done by excluding items
that are easy or of intermediate difficulty; if most students answered an item correctly, the
item would not discriminate among the higher-scoring students. However, tests made up
solely of more difficult questions will not cover the whole curriculum or even attempt to
do so. The result is that public examinations may provide information on students'
achievements on only limited aspects of a curriculum.
The purpose of national assessment is to find out what all students know and do not
know. Therefore, the instrument used must provide adequate curriculum coverage. From
a policy perspective, the performance of students who do poorly on an assessment may
be of greater interest than the performance of those who do well.
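
The point about easy items and discrimination among higher-scoring students can be shown with a small Python sketch. It uses two standard quantities for a right/wrong item: the proportion answering correctly and the item variance p(1 - p), computed here only among the top scorers. The data, function names, and the choice of the top half as the comparison group are all hypothetical and are not taken from any examination discussed here.

```python
from typing import List, Sequence


def facility(responses: Sequence[int]) -> float:
    """Proportion of correct answers (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)


def item_variance(responses: Sequence[int]) -> float:
    """Variance of a right/wrong item, p * (1 - p); zero means it separates nobody."""
    p = facility(responses)
    return p * (1 - p)


def top_indices(total_scores: Sequence[float], fraction: float) -> List[int]:
    """Positions of the highest-scoring students (at least one student)."""
    n = max(1, round(len(total_scores) * fraction))
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    return ranked[:n]


def variance_among_top(item: Sequence[int], total_scores: Sequence[float],
                       fraction: float = 0.5) -> float:
    """Item variance computed only for the highest-scoring students."""
    top = top_indices(total_scores, fraction)
    return item_variance([item[i] for i in top])


# Hypothetical data for eight students (not taken from any study).
totals = [95, 90, 88, 85, 60, 55, 40, 30]
easy_item = [1, 1, 1, 1, 1, 1, 1, 0]
hard_item = [1, 1, 0, 0, 0, 0, 0, 0]

print(variance_among_top(easy_item, totals))  # 0.0
print(variance_among_top(hard_item, totals))  # 0.25
```

In this made-up example every top scorer answers the easy item correctly, so it cannot rank them, while the hard item splits them evenly. Among the four lowest scorers the pattern reverses: the hard item has zero variance and only the easy item separates them, which is why an instrument meant to describe all students needs items across the full difficulty range.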
Achievements of Interest
There is some overlap in the student achievements identified as important by public
examinations and national assessments. During the period of basic education, both
certification and national assessment are based on information about basic literacy,
numeracy, and reasoning skills. If we look at primary certificate (public) examinations,
we find that many focus on a number of core subjects, and a glance at several national
assessments indicates that they do the same. For example, students' knowledge of a
national language and mathematics is included in all national assessment systems.
However, no national assessment attempts the coverage found in public examinations at
the secondary level, when students tend to select and specialize in subject areas. The
subjects offered vary from one examination authority to another, but it is not unusual to
find syllabi and examinations in twenty, thirty, or even more subjects.
National assessments have focused on cognitive areas of development. Thailand
(Prawalpruk 1996) and Chile (Himmel 1996) are among the relatively small number of
education systems that have attempted to assess affective o