Monitoring the Learning Outcomes-Vincent Greaney T

title: Monitoring the Learning Outcomes of Education Systems. Directions in Development (Washington, D.C.)

author: Greaney, Vincent.; Kellaghan, Thomas.

publisher: World Bank

isbn10 | asin: 0821337343

print isbn13: 9780821337349

ebook isbn13: 9780585233888

language: English

subject: Educational evaluation--Cross-cultural studies, Educational indicators--Cross-cultural studies, Educational tests and measurements--Cross-cultural studies.

publication date: 1996

lcc: LB2822.75.G736 1996eb

ddc: 379.1/54

DIRECTIONS IN DEVELOPMENT

Monitoring the Learning Outcomes of Education Systems

Vincent Greaney

Thomas Kellaghan


1996 The International Bank for Reconstruction and Development / THE WORLD BANK

1818 H Street, N.W.

Washington, D.C. 20433

All rights reserved

Manufactured in the United States of America

First printing November 1996

The findings, interpretations, and conclusions expressed in this study are entirely those of the authors and should not be attributed in any manner to the World Bank, to its affiliated organizations, or to the members of its Board of Executive Directors or the countries they represent.

Cover photos: Curt Carnemark, The World Bank

Vincent Greaney is a senior education specialist in the World Bank's Asia Technical Department, Human Resources and Social Development Division. Thomas Kellaghan is director of the Educational Research Centre, St. Patrick's College, Dublin.

Library of Congress Cataloging-in-Publication Data

Greaney, Vincent.

Monitoring the learning outcomes of education systems / Vincent Greaney, Thomas Kellaghan.

p. cm. (Directions in development)

Includes bibliographical references.

ISBN 0-8213-3734-3

1. Educational evaluation--Cross-cultural studies. 2. Educational indicators--Cross-cultural studies. 3. Educational tests and measurements--Cross-cultural studies. I. Kellaghan, Thomas. II. Title. III. Series: Directions in development (Washington, D.C.)

LB2822.75.G736 1996

379.1'54--dc20 96-43969

CIP


Contents

Preface

Nature and Uses of Educational Indicators
Educational Indicators
Choice of Outcome Indicators
Uses of Information from Outcome Assessments
Informing Policy
Monitoring Standards
Introducing Realistic Standards
Identifying Correlates of Achievement
Directing Teachers' Efforts and Raising Students' Achievements
Promoting Accountability
Increasing Public Awareness
Informing Political Debate
Role of National Assessments

National and International Assessments
National Assessments
United States
England and Wales
Chile
Colombia
Thailand
Namibia
Mauritius
International Assessments
International Assessment of Educational Progress
International Association for the Evaluation of Educational Achievement
Advantages of International Assessments
Disadvantages of International Assessments

National Assessment and Public Examinations
Purposes
Achievements of Interest
Testing, Scoring, and Reporting
Populations of Interest
Monitoring
Contextual Information
High-Stakes and Low-Stakes Testing
Efficiency
Conclusion

Components of a National Assessment
Steering Committee
Implementing Agency
Internal Agency
External Agency
Team from Internal and External Agencies
Foreign Experts
Building Support
Target Population
Population Defined by Age or Grade
Choice of Levels of Schooling
Sampling
Choice of Population for Sampling Purposes
Sample Selection
What Is Assessed?
Instrument Construction
Type of Test
Test Sophistication
Nonachievement Variables
Administration Manual
Review Process
Administration
Analysis
Reporting
Average Performance of Students in a Curriculum Area
Percentage Passing Items
Percentage Achieving Mastery of Curriculum Objectives
Percentage Achieving Specified Attainment Targets
Percentage Functioning at Specified Levels of Proficiency
Cost-Effectiveness
Conclusion

Pitfalls of National Assessment: A Case Study
Background to the Initiation of a National Assessment in Sentz
School System
Response to Education Concerns
National Assessment of Educational Standards in Sentz
Organization
Test Development
Implementation
Analysis of the Case
Responses to Assessment
Implementation Procedures

A Choice to Make

References

Appendix. National Assessment Checklist

Tables

2.1 Proficiency Levels of Students in Grades 4, 8, and 12, as Measured by U.S. NAEP Mathematics Surveys, 1990 and 1992
2.2 Percentage of Students at or above Average Proficiency Levels in Grades 4, 8, and 12, as Measured by U.S. NAEP Mathematics Surveys, 1990 and 1992
4.1 Specifications for Mathematics Test: Intellectual Behaviors Tested
4.2 Distribution of Costs of Components of National Assessment in the United States
5.1 Educational Developments in Sentz, 1970-90
5.2 Schedule of Activities for a National Assessment in Sentz

Boxes

2.1 Atypical Student Samples
4.1 Examples of Multiple-Choice Items in Mathematics, for Middle Primary Grades
4.2 Example of Open-Ended Item in Mathematics, for Lower Secondary Grades
4.3 Dangers of Cultural Bias in Testing

Preface

The collection and publication of statistics relating to numbers of schools, numbers of teachers, student enrollments, and repetition rates have for some time been a feature of most education systems. Up to relatively recently, however, few systems, with the exception of those with public examinations, have systematically collected information on what education systems actually achieve in terms of students' learning. This is so even though, as the World Declaration on Education for All (UNESCO 1990b) reminds us, "whether or not expanded educational opportunities will translate into meaningful development, for an individual or for society, depends ultimately on whether people learn as a result of those opportunities."

In response to this consideration, education systems in more than fifty countries, most of them in the industrial world, have in recent years shown an interest in obtaining information on what their students have learned as a result of their educational experiences. This interest was manifested either by developing national procedures to assess students' achievements or by participating in international studies of student achievement. It seems likely that the number of countries involved in these activities will increase in the future.

This book is intended to provide introductory information to individuals with an interest in assessing the learning outcomes of education systems. It considers the role of indicators in this process, in particular their nature, choice, and use (chapter 1). A number of approaches to assessing learning outcomes in selected industrial countries (the United States and the United Kingdom) and in representative developing countries (Chile, Colombia, Mauritius, Namibia, and Thailand) are described. Systems of comparative international assessment are also reviewed, and the arguments for and against the participation of developing countries in such assessments are examined (chapter 2).

Some countries already have available and publish information on student learning in the form of public examination results. The question arises: can such information be regarded as equivalent to the information obtained in national assessment systems that are designed specifically to provide data on learning outcomes for an education system? The answer (reached in chapter 3) is that it cannot.

In chapter 4, the various stages of a national assessment, from the establishment of a national steering committee to actions designed to disseminate results and maximize the impact of the assessment, are described. Finally, in chapter 5, a case study containing numerous examples of poor practice in the conduct of national assessments is presented. The more obvious examples of poor practice are identified, and corrective measures are suggested.

The authors wish to express their appreciation for assistance in the preparation of this paper to Leone Burton, Vinayagum Chinapah, Erika Himmel, John Izard, Ramesh Manrakhan, Michael Martin, Paud Murphy, Eileen Nkwanga, O. C. Nwana, Carlos Ro[...], Malcolm Rosier, Molapi Sebatane, and Jim Socknat. The manuscript was prepared by Teresa Bell and Julie-Anne Graitge. Nancy Berg edited the final manuscript for publication. Abigail Tardiff and Amy Brooks were the proofreaders.


Nature and Uses of Educational Indicators

Although most of us probably think of formal education or schooling primarily in terms of the benefits that it confers on individuals, government investment in education has often been based on assumptions about the value of education to the nation rather than the individual. As public schooling developed in the eighteenth and nineteenth centuries, support for it was frequently conceived in the context of objectives that were public rather than private, collective rather than individual (Buber 1963). More recently, colonial administrations recognized the value of education in developing the economy as well as promoting shared common values designed to make populations more amenable to control.

The importance of education for the nation is reflected in the considerable sums of money that national governments, and, frequently, provincial, regional, and state governments, are prepared to invest in it. In 1987 world public expenditure on education amounted to 5.6 percent of gross national product (GNP); the figure varied from a low of [...]1 percent for East Asia to a high of 6.5 percent for Oceania. As a percentage of total government expenditure, the median share for education was 12.8 percent in industrial countries, a figure considerably lower than the 15.4 percent recorded in developing countries (UNESCO 1990a).

Given this situation, it is not surprising that for some time government departments have routinely collected and published statistics that indicate how their education systems are working and developing. Statistics are usually provided on school numbers and facilities, student enrollments, and efficiency indices such as student-teacher ratios and rates of repetition, dropout, and cohort completion. But despite an obvious interest in what education achieves, and despite the substantial investments of effort and finance in its provision, few systems in either industrial or developing countries have, until recently, systematically collected and made available information on the outcomes of education. Thus, in most countries there is a conspicuous dearth of evidence on the quality of students' learning. Few have stopped, as a former mayor of New York was inclined to do, and asked "Hey, how am I doing?" although knowing precisely how one is doing would surely be useful.


Since the 1980s, however, decisionmakers have begun to attach increasing importance to the development of a coherent system for monitoring and evaluating educational achievement, specifically pupil learning outcomes. In this book, our focus is on the development of such a system. Following usage in the United States, this system is referred to as a national assessment.

The interest in developing a systematic approach to assessing outcomes (that is, in doing a national assessment) can be attributed to several factors. One is a growing concern that many children spend a considerable amount of time in school but acquire few useful skills. As Windham (1992) has pointed out, school attendance without learning "makes no social, economic or pedagogical sense" (p. 56). In the words of the World Declaration on Education for All (UNESCO 1990b, par. 4),

Whether or not expanded educational opportunities will translate into meaningful development, for an individual or for society, depends ultimately on whether people actually learn as a result of those opportunities, in other words, whether they incorporate useful knowledge, reasoning ability, skills, and values.

The problem of inadequate school learning is not confined to developing countries. Throughout the world, one hears expressions of dissatisfaction with the levels of achievement of today's students, though there may be little evidence that standards are in fact declining. Even without such evidence, a case can still be made that changes in the world of work are resulting in a mismatch between educational outcomes and the needs of society (Townshend 1996). This mismatch is most obvious in the case of what has been called "an educational underclass" made up of students who perform very poorly in the education system. This underclass is found in most countries. In the past its members could find employment in unskilled work, but this is no longer possible because jobs that require only minimal literacy skills are fast disappearing from the labor market, particularly in industrial countries.

Given the need for better-educated students, decisionmakers are concluding that a monitoring system is necessary to gather information needed to describe and monitor the nature of students' achievements, the relevance of those achievements to the world of work, and the number of inadequately prepared students leaving the system.

What is learned at school assumes even more importance because of increased global economic competition, marked by rapid movement of capital and new technologies from country to country. In such a situation, it is claimed that a country's level of productivity and ability to compete depend greatly on workers' and management's skill in using capital and technology (World Bank 1991) and thus that "skilled people become the only sustainable competitive advantage" (Thurow 1992, p. 520). Comparative studies of students' achievements have been used to gauge the relative status of countries in developing individual skills.

Another reason for interest in monitoring student achievements is that governments today are faced with the problem of expanding enrollments while at the same time improving the quality of education, without increasing expenditure. More detailed knowledge of the functioning of the education system will, it is hoped, help decisionmakers cope with this situation by increasing the system's efficiency.

A final reason for the increased interest in monitoring and evaluating educational provision arises from the move in many countries, in the interest of both democracy and efficiency, to decentralize authority in the education system, providing greater autonomy to local authorities and schools. When traditional central controls are loosened in this way, a coherent system of monitoring is necessary.

Educational Indicators

The term educational indicator (in the tradition of economic and social indicators) is often used to describe policy-relevant statistics that contain information about the status, quality, or performance of an education system. Several indicators are required to provide the necessary information. In choosing indicators, care is taken to provide a profile of current conditions that metaphorically can be regarded as reflecting the "health" of the system (Bottani and Walberg 1994; Burnstein, Oakes, and Guiton 1992). Indicators have the following characteristics (Burnstein, Oakes, and Guiton 1992; Johnstone 1981; Owen, Hodgkinson, and Tuijnman 1995):

• An indicator is quantifiable; that is, it represents some aspect of the education system in numerical form.

• A particular value of an indicator applies to only one point or period in time.

• A statistic qualifies as an indicator only when there is a standard or criterion against which it can be judged. The standard may involve a norm-referenced (synchronic) comparison between different jurisdictions; a self-referenced (diachronic) comparison with indicator values obtained at different points in time for the same education system; or a criterion-referenced comparison with an ideal or planned objective.

• An indicator provides information about aspects of the education system that policymakers, practitioners, or the public regard as important. Sometimes it may be easy to obtain consensus among interested parties on what is important; other times it may not.

• An indicator is realistic in the sense that it is based on information collected with due regard to financial and other constraints.

• An indicator describes conditions amenable to improvement.

• Information for indicators is collected frequently enough to allow change to be monitored.

• Indicators allow an examination of distributions among subpopulations of interest (for example, by age, gender, income, or socioeconomic group).
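The three kinds of standard named in the list above (norm-referenced, self-referenced, and criterion-referenced) can be illustrated with a short sketch. The Python below is not drawn from the book: the IndicatorReading type, the function names, the regions, and the completion-rate figures are all hypothetical, chosen only to show how a single indicator value leads to a different judgment under each kind of comparison.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IndicatorReading:
    """One value of an indicator for one jurisdiction at one point in time."""
    jurisdiction: str
    year: int
    value: float


def norm_referenced(reading: IndicatorReading,
                    peers: Iterable[IndicatorReading]) -> float:
    """Synchronic comparison: gap between a reading and the mean of
    other jurisdictions' readings for the same year."""
    same_year = [p.value for p in peers
                 if p.year == reading.year
                 and p.jurisdiction != reading.jurisdiction]
    if not same_year:
        raise ValueError("no peer readings available for that year")
    return reading.value - sum(same_year) / len(same_year)


def self_referenced(reading: IndicatorReading,
                    earlier: IndicatorReading) -> float:
    """Diachronic comparison: change since an earlier reading
    for the same jurisdiction."""
    if earlier.jurisdiction != reading.jurisdiction:
        raise ValueError("readings must come from the same jurisdiction")
    return reading.value - earlier.value


def criterion_referenced(reading: IndicatorReading, target: float) -> float:
    """Gap between a reading and a planned objective."""
    return reading.value - target


# Hypothetical primary completion rates (percent); illustrative only.
region_a_1990 = IndicatorReading("Region A", 1990, 58.0)
region_a_1995 = IndicatorReading("Region A", 1995, 64.0)
peers_1995 = [
    IndicatorReading("Region B", 1995, 60.0),
    IndicatorReading("Region C", 1995, 80.0),
]

if __name__ == "__main__":
    print("vs. peers:  ", norm_referenced(region_a_1995, peers_1995))
    print("vs. 1990:   ", self_referenced(region_a_1995, region_a_1990))
    print("vs. target: ", criterion_referenced(region_a_1995, 75.0))
```

With these made-up figures, the same 1995 value looks like progress against the region's own past but a shortfall against both its peers and the planned target, which is why the choice of standard has to be stated whenever an indicator is reported.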

The selection of indicators to represent the status of the education system is based on a model, which may be explicit or implicit, of how the education system works (Burnstein, Oakes, and Guiton 1992). The set of indicators incorporated in the model should reflect the multifaceted nature of education in all its complexity (Bottani and Tuijnman 1994) and be comprehensive enough to describe the important dimensions of the system. The model, in turn, provides a context for interpreting what the indicators mean, how they relate to other aspects of the education system (and perhaps to other social and economic systems), and how they are likely to respond to various kinds of manipulation.

The model of the education system on which indicators are built frequently comprises some combination of inputs, processes, and outputs. Inputs are the resources available to the system, for example, buildings, books, the number and quality of teachers, and such educationally relevant background characteristics of students as the socioeconomic conditions of their families, communities, and regions. Processes are the ways schools use their resources as expressed in curricular and instructional activities. Outputs are what the school tries to achieve; they include the cognitive achievements of students and affective characteristics such as the positive and negative feelings and attitudes students develop relating to their activities, interests, and values.

Choice of Outcome Indicators

To enumerate the outcomes of education about which it might be useful to have empirical information in terms of the many aims that have been posited for education would be an endless task. Aims frequently suggested include the development of literacy and numeracy skills, the development of aesthetic areas of experience, preparation for life in a democratic society, preparation for the world of work, development of character and moral sensitivity, and personal self-fulfillment. Aims (and


teachers' efforts and raising students' achievements, promoting accountability, increasing public awareness, and informing political debate.

Informing Policy

Information on the achievements of students in an education system can serve a variety of audiences and functions. Educational administrators, such as senior ministry of education officials, should be in a position to produce valid, timely, and useful information when addressing policy issues to be resolved in a political setting. Without such information, policymaking can be unduly influenced by personal biases of ministers of education or senior civil servants, vested interests of school owners or teacher unions, and anecdotal evidence offered by business interests, journalists, and politicians. Given this range of influences, at a minimum, pertinent data must be available to guide the selection of priorities in curriculum, the provision of material resources, and teacher training strategies. However, as noted above, factual information to assist policymaking, especially data on the quality of student learning, is seldom available in developing countries. Even when data on student achievement are available, the views of powerful constituencies continue to play a role in setting educational priorities. Virtually all decisions in public policy are based on both facts and values (Lincoln and Guba 1981). The role of achievement data is to strengthen the factual basis of decisionmaking.

Many education systems are committed to the principle of equality of opportunity and monitor the extent to which groups enjoy equal access to and participate in education. Information from a national assessment can bring this a step further by providing evidence about the achievements of such groups. Thus, national assessment results have been used in the United States to provide evidence of differences in school achievement related to geography, gender, and ethnicity. Many countries will also be interested in knowing whether mean reading achievement levels are similar for boys and girls, rural and urban children, and children from different linguistic groups.

Information from a national assessment will be more useful to policymakers if it provides information on subdomains of knowledge rather than just an overall score for a curriculum area such as reading or mathematics. Recent reading surveys have examined respondents' performance in analysis and comprehension of narrative material (based on fictional text), expository material (information or opinion writing), and documentary material (information presented in a structured form in charts, maps, lists, or sets of instructions) (Elley 1992). In mathematics, categories (subdomains) that have been used include numbers and operations, measurement, geometry, data analysis and statistics, and algebra and functions (Lapointe, Mead, and Askew 1992). Data on the performance of students in subdomains can point to strengths and weaknesses within curriculum areas, show how intended curricula are implemented in schools, and, in particular, highlight such factors as gender, urban-rural location, or performance at different times. Such information may have implications for curriculum design, teacher training, and the allocation of resources.
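The value of reporting by subdomain and by subgroup, rather than by a single overall score, can be seen in a small worked sketch. The Python below is illustrative only and is not taken from any assessment described in the book: the record layout, the mean_scores helper, and every score are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mathematics results: one record per tested student.
# The field names and all scores are invented for illustration.
RECORDS = [
    {"location": "urban", "subdomain": "measurement", "score": 80.0},
    {"location": "urban", "subdomain": "measurement", "score": 80.0},
    {"location": "urban", "subdomain": "geometry", "score": 60.0},
    {"location": "urban", "subdomain": "geometry", "score": 60.0},
    {"location": "rural", "subdomain": "measurement", "score": 80.0},
    {"location": "rural", "subdomain": "measurement", "score": 80.0},
    {"location": "rural", "subdomain": "geometry", "score": 20.0},
    {"location": "rural", "subdomain": "geometry", "score": 20.0},
]


def mean_scores(records, *keys):
    """Return the mean score for each combination of the given keys.

    The result maps a tuple of key values, such as ("rural", "geometry"),
    to the mean score of the records in that group.
    """
    grouped = defaultdict(list)
    for record in records:
        group = tuple(record[key] for key in keys)
        grouped[group].append(record["score"])
    return {group: mean(scores) for group, scores in grouped.items()}


if __name__ == "__main__":
    # Overall score by location only.
    for group, average in sorted(mean_scores(RECORDS, "location").items()):
        print(group, average)
    # The same data broken down by location and subdomain.
    breakdown = mean_scores(RECORDS, "location", "subdomain")
    for group, average in sorted(breakdown.items()):
        print(group, average)
```

In this made-up data the overall means are 70 for urban students and 50 for rural students, but the breakdown shows that measurement scores are identical (80) and the whole gap lies in geometry (60 against 20). An overall score alone would not tell a curriculum planner where to direct resources.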

Monitoring Standards

Information on student achievement in key curriculum areas collected on a regular basis has helped monitor changes in achievement over time in such countries as Chile, France, [...]land, Thailand, the United Kingdom, and the United States. By presenting objective findings on achievement, a national assessment can provide evidence relevant to assertions made frequently by employers, industrialists, and others that educational standards are falling.

Countries vary in the frequency with which they obtain information on particular areas of achievement. A five-year interval would seem to be a reasonable time span, since achievement standards are unlikely to vary greatly from year to year. This does not mean that a national assessment exercise would be conducted only every five years. Assessments could be more frequent, but a particular curriculum area would be assessed only once in five years.

Introducing Realistic Standards

A national assessment can foster a sense of realism in the debate on appropriate achievement levels. In developing countries, unrealistic standards have probably contributed to the high student failure rates that are a feature of many education systems (Kellaghan and Greaney 1992). Unduly high levels of expectation may be prompted by the desire to maintain traditional colonial standards. However, such a target may be almost impossible to attain, given the level of socioeconomic development of some countries. Another factor affecting the target is the changing nature of the school-going population arising from the dramatic increase in enrollment numbers; this increase, in turn, is often accompanied by lower teacher qualification requirements and a decline in the quality of educational facilities.


Identifying Correlates of Achievement

Information on correlates of the outcomes of an education system can help policymakers identify factors over which they can exercise some control, that is, factors likely to contribute to improvements in student achievement levels. Data on some of these potentially manipulable variables may have to be collected along with achievement data at the time of the national assessment. For example, national assessment data have been used in Colombia to assess the impact of in-service teacher training. In Chile the contribution of school resources to student achievement has been examined and decisions made about the allocation of such resources. Other possible correlates of achievement include the emphasis placed on individual subject areas; assessment and supervision procedures; textbooks (prices, numbers, contents, and distribution systems); curricular content; and state policies on language instruction.

Directing Teachers' Efforts and Raising Students' Achievements

The expectation is that action will be taken in the light of national assessment results to mandate changes in policy or in the allocation of resources. However, the information such assessments provide may be sufficient, even without formal action, to bring teaching and learning into line with what is assessed (Burnstein, Oakes, and Guiton 1992). The reason for the improvement is that the indicators may point to what is important, and "what is measured is likely to become what matters" (Burnstein, Oakes, and Guiton 1992, p. 410). As a consequence, curricula, teaching, and learning will be directed toward the achievements represented in the indicators. What is tested is what will be taught, and what is not tested will not be taught (Kellaghan and Greaney 1992).

The conditions under which assessments will have positive effects are not entirely clear. Certainly, there are situations in which assessment systems have little impact on policy or practice (Gipps and Goldstein 1983), for example, when the results are not communicated clearly or in a usable way to policymakers. It is equally certain that when high stakes are attached to performance on an assessment, teaching and learning will be aligned with the assessment (Kellaghan and Grisay 1995; Madaus and Kellaghan 1992). But although this may result in improved test scores, if these are the result of teaching to the test, they will not necessarily be matched by improvement in students' achievement measured in other ways (Kellaghan and Greaney 1992; Le Mahieu 1984; Linn 1983).

Thailand provides an example of a national assessment designed to change teachers' perceptions of what is important to teach. The assessment included affective outcomes such as attitudes toward work, moral values, and social participation in the hope that teachers would begin to stress learning outcomes other than those measured in formal examinations. Subsequently, it was established that teachers began to emphasize affective learning outcomes in their teaching and evaluation (Prawalpruk 1996).

Promoting Accountability

Governments need access to relevant information on the operation of the education system to enable them to determine whether the state is getting good value for its investment. That investment is substantial. Recent figures indicate that in most low-income economies, expenditure on education is one of the largest cost items in government spending, much larger than expenditures on health, defense, housing, social security, or welfare (World Bank 1995a). In this situation, relevant feedback is obviously essential and can help avoid a waste of scarce resources that has been described as socially intolerable, economically unacceptable, and politically short-sighted (Bottani 1990, p. 336).

A variety of models of accountability exists. The precise model employed will depend on many factors. First, it will depend on who is regarded as responsible for performance: the teacher, the school, the ministry of education, or the general public. Second, the nature of the information obtained will affect which individuals or institutions are identified as accountable. In the British system of national assessment, information is available about all schools; thus schools can be identified in the accountability process. If individual teachers or schools are not identified in national assessments, it obviously will not be possible to hold them accountable for student performance. Similarly, when samples, rather than whole populations of schools, are tested in a national assessment, adequate information will not be available (except for a small number of sample schools) to identify and hold accountable poorly performing teachers or schools.

Increasing Public Awareness

Ministries of education are often reluctant to place in the public arena information about the operation of the education system that they regard as sensitive. This is not surprising when the ministry is charged by government with attaining politically sensitive (but practically difficult) objectives such as promotion of a national language. Willingness to publicize policy failures is not a conspicuous characteristic of most ministries. In addition, political expediency may dictate that ministries not report results which highlight the superiority of particular ethnic, linguistic, or regional groups. In such situations, it may be difficult to establish an atmosphere in which national assessments can be conducted and results made freely available to all interested parties.

Although it may sometimes be in the interest of a ministry to control the flow of information, the long-term advantages of an open-information system are likely to outweigh any short-term disadvantage. Several long-term benefits can be identified. When the results of a national assessment are made widely available, they can attract considerable media attention and thus heighten public consciousness on educational matters. The results of a national assessment can also bring an air of reality and a level of integrity to discussions about the education system. The informed debate that is stimulated can, in turn, contribute to increased public support for national, regional, and local efforts to improve the education system. Thus, although the knowledge furnished by national assessments may create immediate problems for politicians and government officials, in the longer term it can provide a stimulus, rationale, or justification for reform initiatives.

Informing Political Debate

National and, even more notably, international comparative assessment exercises give rise to considerable debate among politicians, as well as others interested in education. An education system provides a country with the human resources and expertise necessary to make it competitive in international markets, and from this perspective political interest in national achievement is understandable. Politicians need to know whether the education system is giving value for the considerable portion of the national budget they allocate to it each year. Today, in many countries, rhetoric (usually uninformed) tends to dominate the political debate on education. Armed with objective evidence on the operation of the system, politicians are more likely to initiate reforms and to prompt ministries of education to action.

Role of National Assessments

Although there has been a pronounced increase in recent years in support for formal assessment of student achievement (Lockheed 1992), most developing countries still lack valid and timely information on the outcomes of schooling. A national assessment can help fill this gap by providing educational leaders and administrators with relevant data on student achievement levels in important curricular areas on a regular basis. These data can contribute to policy and public debate, to the diagnosis of problems, to the formulation of reforms, and to improved efficiency.

There is no single formula or design for carrying out a national assessment. A government's purposes and procedures for assessing national levels of achievement will be determined by local circumstances and policy concerns. The diversity of uses and approaches will become more apparent in chapter 2 when we review seven national assessment systems from different regions of the world, as well as international comparative assessments of student achievements. The remainder of the book provides information on how to, and how not to, conduct a national assessment.

It may seem reasonable to argue that spending money on a national assessment is not justified when resources are inadequate for building schools or for providing textbooks to students who need them. In response, it needs to be pointed out that the resources required for the conduct of a national assessment would not go very far in addressing major shortcomings in the areas of school or textbook provision. Furthermore, the information obtained through a national assessment can bring about cost-efficiencies by identifying failing features of existing arrangements or by producing evidence to support more effective alternatives. However, it is up to the proponents of a national assessment to show that the likely benefits to the education system as a whole merit the allocation of the necessary funds. If they cannot show this, the resources earmarked for this activity might indeed be more usefully devoted to activities such as school and textbook provision.



National and International Assessments

National assessments tend to be initiated by governments, more specifically by ministries of education. International assessments often owe their origin to the initiatives of members of the research community. The main difference between the two types of assessment is that national assessments are designed and implemented within individual countries using their own sampling designs and instrumentation, whereas international assessments require participating countries to follow similar procedures and use the same instruments.

In this chapter, national assessment systems in two industrial countries (the United States and England and Wales) and five developing countries (two in Latin America, one in Asia, and two in Africa) are described. Next, two international assessments are outlined, and the advantages and disadvantages for developing countries of participating in such assessments are considered.

National Assessments

National assessments are now a standard feature of education systems in several industrial countries. The assessments are similar in many ways. Virtually all use multiple-choice and short-answer questions, although Norway and the United States include essay-type writing tasks and oral assessments are conducted in Sweden and the United Kingdom (England, Wales, and Northern Ireland). National assessments also differ in several respects from country to country. In Canada and France many grades are assessed, whereas relatively few are assessed in the Netherlands, Norway, Scotland, and Sweden. The purposes of national assessment also vary.

United States

The U.S. National Assessment of Educational Progress (NAEP) is the most widely reported national assessment model in the literature. It is an ongoing survey, mandated by the U.S. Congress and implemented by trained field staff, usually school or district personnel. The survey is designed to measure students' educational achievements at specified ages and grades. It also examines achievements of subpopulations defined by demographic characteristics and by specific background experience. Since 1990 voluntary state-level assessments, in addition to the national assessments, have been authorized by Congress (Johnson 1992).

Although the NAEP has been in existence since 1969, politicians and the general public appear to have become interested in its findings only recently (Smith, O'Day, and Cohen 1990). Heightened political interest as a result of the attention paid by the National Governors' Association to NAEP findings led to the introduction in 1990 of state-by-state comparisons (Phillips 1991). Over the years, details of the administration of the NAEP have changed, for example, the frequency of assessment and the grade level targeted. At present, assessments are conducted every second year on samples of students in grades 4, 8, and 12. Eleven instructional areas have been assessed periodically. Most recent reports have focused on reading and writing (Applebee and others 1990a, 1990b; Langer and others 1990; Mullis and Jenkins 1990); mathematics and science (Dossey and others 1[…]; Mullis and Jenkins 1988; Mullis and others 1993); history (Hammack and others 1990); geography (Allen and others 1990); and civics (Anderson and others 1990). Data have been reported by state, gender, ethnicity, type of community, and region.

Up to 1984, the percentages of students who passed items were reported. Since that date, proficiency scales have been developed for each subject area. These scales were computed by using statistical techniques (based on item response theory) to create a single scale representing performance (Phillips and others 1993). The scale is a numerical index that ranges from 0 to 500. It has three achievement levels (basic, proficient, and advanced) for each grade level and allows comparison of performance across grades 4, 8, and 12.

In setting the achievement levels, the views of teacher representatives (sixty-eight in mathematics, for example), administrators, and members of the general public were taken into account (Mullis and others 1993). Performance at the lowest, or basic, level denotes partial mastery of the knowledge and skills required at each grade level. For example, grade 4 students performing at the basic level are able to perform simple operations with whole numbers and show some understanding of fractions and decimals. Performance at the middle, or proficient, level demonstrates competence in the subject matter. In the view of the National Assessment Governing Board, all students should perform at this level. Grade 4 students who are proficient in mathematics can use whole numbers to estimate, compute, and determine whether results are reasonable; have a conceptual understanding of fractions and decimals; can solve problems; and can use four-function calculators. The highest, or advanced, level indicates superior performance. Grade 4 students who receive this rating can solve complex nonroutine problems, draw logical conclusions, and justify answers.

Average mathematics proficiency marks are presented for grades 4, 8, and 12 for 1990 and 1992 in table 2.1. The data in the last column show that in both years more than one-third of students at all grade levels failed to reach the basic level of performance. However, the figures in this and in other columns suggest that standards rose between 1990 and 1992.

Results based on one common scale (table 2.2) show that most students, especially those in grades 4 and 8, performed poorly on tasks involving fractions, decimals, and percentages. Furthermore, very few grade 12 students were able to solve nonroutine problems involving geometric relations, algebra, or functions. Subsequent analyses revealed that performance varied by type of school attended, state, gender, and level of home support.

Comparisons of trends over time show that achievements in science and mathematics have improved, whereas, except at one grade level, there has been no significant improvement in reading or writing since the mid-1980s (Mullis and others 1994). Information collected in the NAEP to help provide a context for the interpretation of the achievement results revealed that large proportions of high school students avoid taking mathematics and science courses.

Table 2.1. Proficiency Levels of Students in Grades 4, 8, and 12, as Measured by U.S. NAEP Mathematics Surveys, 1990 and 1992

Grade and   Average        Percentage of students at or above     Percentage of students
year        proficiency    Advanced    Proficient    Basic        below basic

Grade 4
  1990      213            1           13            54           46
  1992      218            2           18            61           39
Grade 8
  1990      263            2           20            58           42
  1992      268            4           25            63           37
Grade 12
  1990      294            2           13            59           41
  1992      299            2           16            64           36

Source: Mullis and others 1993.
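The two claims the text draws from table 2.1 (more than one-third of students below basic, and a rise between 1990 and 1992) can be checked with a short sketch. The figures are copied from the table; the names `TABLE_2_1`, `check_rows`, and `change_1990_to_1992` are illustrative assumptions, not part of the NAEP reporting.

```python
# Rows of table 2.1: (grade, year) -> (average proficiency,
# % at or above advanced, % at or above proficient,
# % at or above basic, % below basic).
TABLE_2_1 = {
    (4, 1990): (213, 1, 13, 54, 46),
    (4, 1992): (218, 2, 18, 61, 39),
    (8, 1990): (263, 2, 20, 58, 42),
    (8, 1992): (268, 4, 25, 63, 37),
    (12, 1990): (294, 2, 13, 59, 41),
    (12, 1992): (299, 2, 16, 64, 36),
}


def check_rows(table):
    """True if 'at or above basic' and 'below basic' sum to 100 in every row."""
    return all(row[3] + row[4] == 100 for row in table.values())


def change_1990_to_1992(table, column):
    """Difference (1992 minus 1990) in one column, keyed by grade."""
    grades = sorted({grade for grade, _ in table})
    return {g: table[(g, 1992)][column] - table[(g, 1990)][column] for g in grades}


rows_consistent = check_rows(TABLE_2_1)
average_gain = change_1990_to_1992(TABLE_2_1, 0)        # average proficiency
below_basic_change = change_1990_to_1992(TABLE_2_1, 4)  # % below basic
over_one_third = all(row[4] > 100 / 3 for row in TABLE_2_1.values())
```

On these figures, average proficiency rises by 5 scale points at each grade and the share below basic falls by 5 to 7 percentage points, while every row still has more than one-third of students below basic.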



Table 2.2. Percentage of Students at or above Average Proficiency Levels in Grades 4, 8, and 12, as Measured by U.S. NAEP Mathematics Surveys, 1990 and 1992

Grade and   Average        Percentage at or above proficiency level
year        proficiency    200       250       300       350

Grade 4
  1990      213            67        12        0         0
  1992      218            72        17        0         0
Grade 8
  1990      263            95        65        15        0
  1992      268            97        68        20        1
Grade 12
  1990      294            100       88        45        5
  1992      299            100       91        50        6

Note: Skills for each proficiency level are as follows:
Level 200. Addition, subtraction, and simple problem solving with numbers
Level 250. Multiplication and division, simple measurement, two-step problem solving
Level 300. Reasoning and problem solving involving fractions, decimals, percentages, and elementary concepts in geometry, algebra, and statistics
Level 350. Reasoning and problem solving involving geometric relationships, algebra, and functions

Source: Mullis and others 1993.

Among eleventh-graders who enroll in science courses, approximately half had never conducted independent experiments. Almost two-thirds of eighth-graders spend more than three hours a day watching television.

England and Wales

In England and Wales, national monitoring efforts have been a feature of the education system since 1948. Large-scale national surveys of levels of reading achievement of 9-, […]-, and 15-year-olds were conducted irregularly up to 1977 (Kellaghan and Madaus 1982). In 1978, partly in response to criticisms about standards in schools, a more elaborate system of assessment, run by the Assessment of Performance Unit in the Department of Education and Science, was set up (Foxman, Hutchinson, and Bloomfield 1991). Three main areas of student achievement were targeted for assessment at ages 11, 13, and 15: language, mathematics, and science. In addition to pencil-and-paper tests, performance tasks were administered to small samples of students to assess their ability to estimate and to weigh and measure objects.

Assessments in the 1980s carried considerable political weight. They contributed to the significant curriculum reform movement embodied in the 1988 Education Act, which, for the first time, defined a national curriculum in England and Wales (Bennett and Desforges 1991). The new curriculum was divided into four "key" stages, two at the primary level and two at the secondary level. A new system of national assessment was introduced in conjunction with the new curriculum. Attainment was to be assessed by teachers in their own classrooms by administering externally designed performance assessments. These assessments went well beyond the performance tests introduced by the Assessment of Performance Unit; they were designed to match normal classroom tasks and to have no negative backwash effects on the curriculum (Gipps and Murphy 1994).

The policy-related dimension of the assessments was clear. They were intended to have a variety of functions: formative, to be used in planning further instruction; diagnostic, to identify learning difficulties; summative, to record the overall achievement of a student in a systematic way; and evaluative, to provide information for assessing and reporting on aspects of the work of the school, the local education authority, or other discrete parts of the education service (Great Britain, Department of Education and Science, 1988). In particular, the assessments were expected to play an important role in ensuring that schools and teachers adhered to the curriculum as laid down by the central authority. Thus the assessment approach could be described as "fundamentally a management device" (Bennett and Desforges 1991, p. 72); it was not supported by any theory of learning (Nuttall 1990).

Although there have been several versions of the curriculum and of the assessment system since its inception, some significant features of the system have been maintained. First, all students are assessed at the end of each key stage at ages 7, 11, 14, and 16. Second, students' performance is assessed against statements of attainment prescribed for each stage (for example, the student is able to assign organisms to their major groups using keys and observable features, or the student can read silently and with sustained concentration). Third, assessments are based on both teacher judgments and external tests.

Teachers play an important role in assessment: they determine whether a student has achieved the level of response specified in the statement of attainment, record the achievement levels reached, indicate level of progress in relation to attainment targets, provide evidence to support levels of attainment reached, and give information about student achievements and progress to parents, other teachers, and schools. Moderation is carried out by other teachers, to help ensure a common marking standard.

Initial reactions to the process indicated that teachers welcomed the materials provided and the innovative assessment procedures. On the negative side, the assessment process placed a heavy burden on teachers, the in-service support provided was inadequate, and the assessment turned out to be largely impractical (Broadfoot and others n.d.; Gipps and others 1991; Madaus and Kellaghan 1993). To add to the problems, results were being published at a time of intense competition between schools and of job losses, which gave rise to questions about entrusting the administration and scoring to teachers (Fitz-Gibbon 1995).

Two important lessons can be drawn from the British national assessment system. First, the use of complex assessment tasks leads to problems of standardization of procedures for administration and scoring that, in turn, lead to problems of comparability, both between schools and over time. Second, it is extremely difficult, if at all possible, to devise assessment tasks that will serve equally well formative, diagnostic, summative, and evaluative purposes (Kellaghan 1996c). Efforts to deal with these problems are to be found in the move to make greater use of more conventional centralized written tests and to accord priority to the summative function in future assessments (Dearing 1993; Gipps and Murphy 1994).

Chile

In 1978 Chile's Ministry of Education assigned responsibility for a national assessment to an external agency, the Pontificia Universidad Católica de Chile. The study was piloted over a two-year period. Data on contextual variables, as well as on achievement, were collected (Himmel 1996). These included student-home variables (student willingness to learn, parental expectations for their children); teacher-classroom variables (teaching methodologies, classroom climate); principal and school variables (expectations of staff and of students, promotion of parents in school activities); and institutional variables (educational and financial policies).

The assessment was designed to provide information on the extent to which students were achieving learning targets considered minimal by the Ministry of Education; to provide feedback to parents, teachers, and authorities at municipal, regional, and central levels; and to provide data to planners that would guide the allocation of resources in textbook development, curriculum development, and in-service teacher training.

All students in grades 4 and 8 were assessed in Spanish (reading and



students in rural schools; students in large schools performed better than students in small schools; and students in private schools scored highest.

The results were disseminated extensively. Teachers received classroom results containing the average percentage of correct answers for each objective assessed, as well as the average number of correct answers over the entire test. Results were also reported nationally and by school, location, and region. Each classroom and school was given a percentile ranking based on other schools in the same socioeconomic category, as well as a national ranking. Special manuals explained the results and indicated how schools and teachers could use the information to improve achievement levels. Results were given to school supervisors.
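The idea of ranking each school against others in the same socioeconomic category can be illustrated with a minimal sketch. The source does not give the formula used in Chile, so the mid-rank percentile definition, the function name `percentile_ranks`, the category labels, and the sample scores below are all assumptions made for illustration.

```python
from collections import defaultdict


def percentile_ranks(schools):
    """Percentile rank of each school within its own socioeconomic category.

    `schools` is a list of (name, category, mean_percent_correct) tuples.
    Assumed definition: percentage of schools in the same category scoring
    lower, counting ties (including the school itself) as one half.
    """
    scores_by_category = defaultdict(list)
    for _, category, score in schools:
        scores_by_category[category].append(score)

    ranks = {}
    for name, category, score in schools:
        peers = scores_by_category[category]
        below = sum(s < score for s in peers)
        tied = sum(s == score for s in peers)
        ranks[name] = 100.0 * (below + 0.5 * tied) / len(peers)
    return ranks


# Hypothetical data: three schools in a "low" category, two in a "high" one.
schools = [
    ("A", "low", 40.0),
    ("B", "low", 50.0),
    ("C", "low", 60.0),
    ("D", "high", 70.0),
    ("E", "high", 80.0),
]
ranks = percentile_ranks(schools)
```

In this made-up example, school D has a higher raw score than school C but a lower rank within its own category, which shows how the comparison group shapes the ranking a school receives.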

Relatively little use was made of the self-concept information. Parental information was not used and was not collected after the first year. Parents, however, received a simplified report of overall results for Spanish and mathematics.

Use of the national assessment results has increased gradually. Low-scoring schools have access to a special fund to enable them to improve infrastructure, educational resources, and pedagogical approaches. Results have also been used to prompt curriculum reform. Percentile rank scores were dropped in favor of percentage scores because teachers found it difficult to interpret the former.

The Chilean experience highlights the need for consensus and political will, technical competence, and economic feasibility (Himmel 1996). Currently there appears to be political and public support for the SIMCE. It provides education administrators with information for planning, and authors of instructional materials use the information to identify objectives. However, the enterprise has not been a total success. Some schools, realizing that their rank depended on the reported socioeconomic grouping of their students, overestimated the extent of poverty among their students to help boost their position. Efforts to explain procedures and results to parents have not been reflected in increased parent involvement with schools except for private schools. Almost two-thirds of teachers reported that they did not use the special manual that dealt with the pedagogical implications of the test results. Finally, questions have been raised about the value of the census approach when sample data could provide policymakers with the needed information.

Colombia

National assessment in Colombia was prompted by a perception that insufficient relevant information was available for decisionmaking at central, regional, and local levels (Rojas 1996). The Ministry of Education also wished to use the results to generate debate on educational issues.

The initial assessment conducted in 1991 focused on the extent to which standards defined as minimum in mathematics and language were being attained in grades 3 and 5 in urban and rural public and private schools. A total of 15,000 students participated in the assessment. Originally thirteen states, accounting for 60 percent of the population, were targeted. The sample comprised 650 students in grade 3 and 500 students in grade 5 in each state.
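The sample figures in this paragraph can be cross-checked with simple arithmetic. The sketch below assumes that the thirteen states and the per-state grade quotas fully describe the planned sample; the names `STATES`, `PER_STATE`, and `planned_sample` are illustrative only.

```python
# Planned design as described in the text (assumed to be exhaustive).
STATES = 13
PER_STATE = {"grade 3": 650, "grade 5": 500}


def planned_sample(states=STATES, per_state=PER_STATE):
    """Total planned sample: states multiplied by the per-state quotas."""
    return states * sum(per_state.values())


total = planned_sample()
print(total)  # 13 * (650 + 500)
```

The product is 14,950, which is consistent with the reported total of about 15,000 participating students.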

For grade 3 four performance levels were assessed in mathematics and three in reading comprehension. Performance levels or target standards were determined by the test development personnel. For example, in mathematics the lowest performance level included items on simple addition, whereas more complex tasks involving problem solving were equated with higher performance levels. For grade 5 five performance levels were assessed in mathematics and four in reading. Both multiple-choice items and items for which students had to supply short answers were used. Data on personal, school, and environmental characteristics were collected, as well as information on student participation in local organizations or associations.

The national leader of the assessment had considerable experience in research, data collection, and fieldwork. Teams were established to coordinate the fieldwork within individual states. Each team was led by a coordinator who directed the field testing, supported by two or three individuals with formal qualifications in the social sciences. Local coordinators, usually young people, supervised the work of ten to fifteen fieldworkers. The fieldworkers, often university students or recent social science graduates, administered the tests and conducted teacher interviews. The supply of applicants for these positions was ample because of the relatively high unemployment rates among graduates. Local teachers were not asked to administer tests because it was felt they might attempt to help students taking the tests. Ministry of Education officials were considered unqualified for the work.

At the end of the assessment, profiles of high-scoring schools, teachers, and administrators were developed. The percentages of students who scored at each performance level were reported separately for each state, for public and private schools, and for urban and rural schools, as well as at the national level. Correlates of achievement were identified; these included the number of hours per week devoted to a subject area, teachers' emphasis on specific content areas, teachers' educational level, school facilities, and number of textbooks per student. Negative correlations were recorded for grade repetition, absenteeism, time spent getting to school, and family size (Instituto SER de Investigación/Fedesarrollo 1994). The number of in-service courses a teacher had taken did not emerge as a significant predictor of achievement.


Results were released through the mass media, and a program of national and local workshops was organized to discuss the results and their implications. Individual teachers received information on national and regional results in newsletters, brochures, and other user-friendly documents. Administrators, especially at the state level, used results for local comparisons. A national seminar used the national assessment data to identify appropriate strategies for improving educational quality. Results for individual schools were not reported because it was felt that this would undermine teacher support for the assessment.

The apparent success of the initial assessment has been attributed to the creation of an evaluation unit within the Ministry of Education; to the commitment of the minister and vice-minister for education; to the support of ministry officials; to the use of an external public agency to design the assessment instruments; and to the use of a private agency to take responsibility for sampling, piloting of instruments, administration of tests, and data analysis (C. Rojas, personal communication, 1995). After the first two years responsibility for the national assessment was transferred to a public agency, which administered the assessment in 1993 and 1994. By late 1995, however, the agency had not managed to analyze the data collected in either year.

Thailand

Following the introduction of a new higher secondary school curriculum in 1981, public certification examinations at the end of secondary school were abolished in Thailand, and teachers were given responsibility for evaluating student achievements in their respective courses. Concerned that achievement might fall in this situation, the Ministry of Education introduced national assessment as a means of monitoring standards (Prawalpruk 1996). Administrators at various levels of the system were expected to use the results to help improve the quality of education. To encourage schools to broaden their objectives and instructional practices, the national assessment included measures of affective learning outcomes (attitudes toward work, moral values, and participation) and practical skills.

Starting in 1983, all grade 12 students (in their final year in secondary school) were assessed in Thai, social studies, and physical education. In addition, science, mathematics, and career education were assessed in most subsequent years. Both cognitive and affective outcomes were assessed in social studies, physical education, and career education. The task was entrusted to the Office of Educational Assessment and Testing Services in the Department of Curriculum and Instruction Development. Many of the staff had achieved master's degrees in educational assessment; eight had been trained outside Thailand. Subject matter committees (twelve to eighteen members each) established for each subject area developed tables of specifications for achievement and wrote multiple-choice items. Nationwide testing was conducted on the same two days. Schools were furnished with individual student scores and with school, regional, and provincial mean scores; information on how other individual schools performed was not provided. For public communication purposes, student performance was reported as the percentage of items answered correctly. Provincial administrators advised how the results could be used in planning academic programs at school, provincial, and regional levels.

In subsequent years, samples of grades 6 and 9 were assessed, generally every second year. In a reaction to the initial failure of schools to use assessment results to improve school practice, the national assessment design was expanded to include measures of school process (school administration, curriculum implementation, lesson preparation, and instruction). Starting in 1990 school process measures were assessed by teams of three external evaluators. The early national assessment results for science and mathematics were considered disappointing; they showed that students were weak at applying principles in both subject areas. This conclusion prompted a significant curriculum revision in 1989.

National assessment has been used for school and provincial planning and for monitoring levels of student achievement over time; it has also helped increase teacher interest in affective learning outcomes. According to Prawalpruk (1996), some principals misused the results by claiming that poor results could be attributed to poor teaching. Results were used for educational planning only if adequate administrative support was available. School principals ignored assessment results if they did not consider them useful for planning.

Namibia

The National Institute for Educational Development in Namibia collaborated with Florida State University and Harvard University in 1992 to assess the basic language and mathematics proficiencies of students at grades 4 and 7. The objectives of the assessment were to inform policymakers on achievement levels to enable them "to decide on resource targeting to underachieving schools" (Namibia, Ministry of Education and Culture, 19[…], p. xiv), to sensitize managers to the professional needs of teachers, to enable schools and regions to compare themselves with their counterparts, and to provide baseline data for monitoring progress.

Tests were developed "by reference groups within the head office of the ministry" (p. […]), based on official curricula and textbooks. A random sample of 136 schools was drawn, covering Namibia's six education regions. Within each school, one grade 4 and one grade 7 class were chosen randomly. In one specific region of interest (Ondangwa), thirty-f[…] schools with grade 4 students and nineteen with grade 7 students took the national language (Oshindonga) test. Test instructions to all students were given in the local language. More than 7,000 students in grades 4 and 7 were tested in English and mathematics.

Of the 136 schools, 20 were included in a special longitudinal sample to monitor change in English achievement over time. In these schools, students in grades 4 and 5 took the grade 4 test, whereas those in grades 6 and 7 took the grade 7 test. It was planned to readminister the tests to students each year. It is now accepted that the longitudinal sample was too small to permit generalization to the wider population of Namibian children.

The tests were administered to all students in attendance in the targeted grades in the 136 sample schools; only 98 schools, however, had a grade 7 class. Both the English and Oshindonga tests were timed. The English test took 40 to 60 minutes and the Oshindonga test 60 to 80 minutes to complete. The untimed mathematics test took up to 120 minutes and caused some student fatigue.

Because the test designers hoped to get a normal distribution of test scores, tests were not designed to assess levels of mastery. Items answered correctly by less than 20 percent or more than 80 percent of students were deleted in analyses. This reduced severely the number of items that could be used in measuring performance levels: in the English grade […] test, from seventeen to nine, and in the grade 7 mathematics test, from sixty to thirty-eight.
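The item-deletion rule described above (dropping items that fewer than 20 percent or more than 80 percent of students answered correctly) can be sketched as follows. This is a minimal illustration, not the procedure actually used in Namibia: the function names, the treatment of items exactly at the 20 and 80 percent boundaries as retained, and the response data are all assumptions.

```python
def proportion_correct(answers):
    """Share of students answering an item correctly (answers are 0 or 1)."""
    return sum(answers) / len(answers)


def screen_items(responses, low=0.20, high=0.80):
    """Split items into kept and dropped according to proportion correct.

    `responses` maps an item id to a list of 0/1 scores, one per student.
    Items with a proportion correct below `low` or above `high` are dropped.
    """
    kept, dropped = {}, {}
    for item_id, answers in responses.items():
        p = proportion_correct(answers)
        target = kept if low <= p <= high else dropped
        target[item_id] = p
    return kept, dropped


# Hypothetical responses from ten students to four items.
responses = {
    "item_1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],  # too easy
    "item_2": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "item_3": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # too hard
    "item_4": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
}
kept, dropped = screen_items(responses)
```

A rule of this kind favors items that spread students out, which suits norm-referenced reporting but removes exactly the very easy and very hard items that a mastery-oriented assessment would need.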

Results showed that many grade 4 students had difficulty with the English test, prompting concern that the expected level of performance was too high and suggesting that the curriculum materials might be too advanced. Initial analyses of results suggested that […] categories of students increased their scores between grades 4 and 5 and between grades 6 and 7. At grade 7, the performances of girls and boys were similar on the two language tests, but boys outscored girls on the mathematics test. Older students had much lower scores than younger ones; for example, 19-year-olds answered correctly fewer than half the items answered correctly by 12- and 13-year-olds on both the English and mathematics tests. Differences in scores for regions and for language groups were also reported.


Data were used to relate performance levels to three background factors (age, gender, and home language), which in combination explained about one-third of the variance in English scores and about one-fifth of the variance in mathematics scores. In one region, however, less than 3 percent of the variance could be attributed to these factors. A set of papers was prepared for teachers outlining practical suggestions for improving student performance in areas that had posed difficulties.

The study concluded that the process of developing the tests for the assessment was not altogether satisfactory and that a new competency-based curriculum will make it necessary to develop new measures to assess basic competencies in subject areas.

Mauritius

To implement the recommendations of the World Conference on Education for All, the United Nations Educational, Scientific, and Cultural Organization (UNESCO) and the United Nations Children's Fund (UNICEF) launched a project to develop national assessment capacities in China, Jordan, Mali, Mauritius, and Morocco (Chinapah 1992; UNESCO 199[…]). Identification missions to each country were supported by some centralized training in survey methodology. Each national assessment focused on learning achievement (literacy, numeracy, and basic life skills); factors related to learning achievement (personal characteristics, home environment, and school environment); and access and equity (female enrollment, and admission and participation rates of specific groups). The designers hoped that lessons learned in the course of the project could be adapted and applied in other developing countries.

The national assessment in Mauritius was conducted to address policy issues relating to educational inequalities (Chinapah 1992) and to provide baseline data on achievement levels, with the aim of identifying the percentage of students who attained defined acceptable standards in specified subject areas. Literacy (English and French), numeracy, and life skills were assessed. Items on road safety, awareness of the environment, social skills, and study skills were included.

Specific performance criteria were developed for each subject area (Mauritius Examinations Syndicate 1995). To be rated literate in French, for example, a 9-year-old was required to obtain a minimum score of twenty marks out of thirty-five, including eight of a possible thirteen in "reading" and twelve of twenty-two in "vocabulary, written expression." To be considered literate in English, the 9-year-old was expected to obtain a minimum score of seventeen marks, including twelve out of a possible twenty-two in reading and five of eight in writing. Such performances were considered to represent the ability to read clearly, to understand different types of text judged appropriate for 9-year-olds, and to solve simple shopping problems (V. Chinapah, personal communication, 1995).
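The literacy criteria quoted above combine an overall minimum with section minimums, and a short sketch makes the rule explicit. The thresholds are taken from the text; the dictionary layout, the section keys, and the function name `is_literate` are assumptions for illustration and do not reproduce the syndicate's own scoring procedure.

```python
# Thresholds as stated in the text; key names are illustrative assumptions.
CRITERIA = {
    "French": {
        "total_min": 20,
        "section_mins": {"reading": 8, "vocabulary_writing": 12},
    },
    "English": {
        "total_min": 17,
        "section_mins": {"reading": 12, "writing": 5},
    },
}


def is_literate(language, section_scores):
    """True if the overall minimum and every section minimum are met."""
    rule = CRITERIA[language]
    total = sum(section_scores.values())
    sections_ok = all(
        section_scores.get(section, 0) >= minimum
        for section, minimum in rule["section_mins"].items()
    )
    return total >= rule["total_min"] and sections_ok


# Hypothetical pupils.
french_pass = is_literate("French", {"reading": 9, "vocabulary_writing": 13})
french_fail = is_literate("French", {"reading": 7, "vocabulary_writing": 20})
english_pass = is_literate("English", {"reading": 12, "writing": 5})
english_fail = is_literate("English", {"reading": 15, "writing": 4})
```

In both languages the section minimums add up to the overall minimum (8 + 12 = 20 and 12 + 5 = 17), so in practice a pupil is rated literate only by reaching the minimum in each section; a high total cannot compensate for a weak section.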

Approximately 1,600 standard IV students, mainly 9-year-olds in a representative sample of fifty-two schools, were assessed. Questionnaires were administered to parents, teachers, and school principals to obtain background information on home, school, and student characteristics. Responsibility for the assessment was entrusted to the Mauritius Examinations Syndicate. The syndicate, which administers the annual high-stakes public examinations, had some technical competence in test development, data analysis, and administration of formal assessments. Each test lasted 40 minutes. The literacy and numeracy tests relied on multiple-choice and short-answer questions, the life skills test on multiple-choice items. Tests were administered by retired primary school inspectors and head teachers. Data were collected in 1994, and findings were presented to the Ministry of Education and to teachers. The syndicate plans to repeat the assessment in the future to monitor possible changes in achievement over time (R. Manrakhan, personal communication, 1995).

International Assessments

International assessments, in contrast with national assessments, involve measurement of the educational outcomes of education systems in several countries, usually simultaneously. Representatives from many countries (usually from research organizations) agree on an instrument to assess achievement in a curriculum area, the instrument is administered to a representative sample of students at a particular age or grade in each country, and comparative analyses of the data are carried out (Kellaghan and Grisay 1995).

Countries participating in international studies are expected to provide personnel and funds for administration, training, printing, local analyses, and production of national reports. Costs of instrument development, sampling frameworks, international data analyses, and report writing are the responsibility of the international assessment agency, to which individual countries make a financial contribution.

International Assessment of Educational Progress

The first International Assessment of Educational Progress (IAEP), conducted in 1988 under the direction of Educational Testing Service, under contract to the U.S. Department of Education, represents an



opportunity to learn, the amount of time a subject is studied, the use of computers, and factors and resources in the homes of students (Anderson and Postlethwaite 1989; Anderson, Ryan, and Shapiro 1989; Elley 1992, 1994; Kifer 1989; Lambin 1995; Postlethwaite and Ross 1992).

Advantages of International Assessments

The main advantage of international studies over national assessments is the comparative framework they provide in assessing student achievement and curricular provision (Husén 1967). International assessments give some indication of where the students in a country stand relative to students in other countries. They also show the extent to which the treatment of common curriculum areas differs across countries, and, in particular, the extent to which the approach in a given country may be idiosyncratic. This information may lead a country to reassess its curriculum policy.

Many accounts are available of how findings of international studies on student achievement and curricula have been used to change educational policy (Husén 1987; Kellaghan 1996b; Torney-Purta 1990). For example, results of international studies have been credited with the increased emphasis placed on science in Canada and in the United States (McEwen 1992). In Japan the relatively superior performance of students in mathematical computation compared with mathematical application and analysis led to a change in emphasis in the curriculum (Husén 1987). In Hungary participation in IEA studies has been credited with curriculum reform in reading, and the finding that home factors accounted for more variance in student achievement than school factors helped to undermine Marxist-Leninist curricular ideologies (Báthory 1989).

International assessments have many other advantages. Their findings tend to attract more political and media attention than those of national studies. Thus, poor results can provide politicians and other policymakers with a strong rationale for budgetary support for the education sector.

For national teams entrusted with the implementation of international assessment, the experience of rigorous sampling, item review, printing, distribution, supervision, scoring, data entry, and drafting of national reports according to an agreed-on timetable can contribute greatly to the development of local capacity to conduct research and national assessments. Finally, staffing requirements and costs are lower in international studies than in national assessments because instrumentation and sampling design are developed in collaboration with experts in other countries.



Disadvantages of International Assessments

It can be argued that such factors as availability of schools and materials, opportunity to learn, status and quality of teaching, parental interest, and class size differ so radically from country to country that valid comparisons of international achievement test results are impossible (Rotberg 1991). Although IEA studies generally consider the extent to which students in individual countries have had opportunities to learn the content tested, it is doubtful whether politicians, policymakers, or the media take these into consideration when commenting on national rankings. Political rhetoric, frequently based on the perceived implications of the findings for competitiveness in international trade rather than on a sober evaluation of the meaning of results, may dominate the discussion immediately following the publication of results. In fairness, it should be stressed that uninformed political rhetoric can be prompted by the results of national as well as international assessments and that some of the problems associated with international assessments apply equally to national assessments.

A potentially significant problem with both international and national studies is the difficulty in obtaining a representative sample of students (box 2.1). In many developing countries up-to-date population data may not be available, and communication and logistical problems can contribute to relatively low response rates. The National Center for Education Statistics in the United States has set a response rate target of 85 percent for cross-sectional surveys. This target may be much too high for developing countries, and indeed it has been achieved only once by the United States in international studies of mathematics and science (Medrich and Griffith 1992). Sampling problems are commonplace and have been blamed for significant reversals of performance in some countries between grades (Rotberg 1991). Targeted populations may not be comparable, especially in countries where national enrollment, dropout, and retention rates differ sharply. The result is that countries may have been represented by atypical samples of students.

Box 2.1. Atypical Student Samples

In the 1991 IAEP mathematics study, only 3 percent of the population of 13-year-old students in Brazil and 1 percent of the corresponding population of students in Mozambique were sampled. The performance of Chinese students, which was highlighted in the report of the study, was based on a sample that excluded many 13-year-olds: those below grade [...] in twenty provinces and cities, those out of school (almost 50 percent of the population), and those attending school in nine provinces and autonomous regions with predominantly non-Chinese populations (Lapointe, Mead, and Askew 1992). The exclusion of these groups suggests that the reported achievement levels may seriously overestimate the mean achievements of Chinese students.

A further problem with international assessments is that it is probably impossible to develop a test that is equally valid for several countries (Kellaghan and Grisay 1995). What is meant by "achievement in mathematics" or "achievement in science" varies from country to country because different countries will choose different skills applied to different facts and concepts to define what they regard as mathematical or scientific achievement. Furthermore, a particular domain of a subject may be taught at different grade levels in different countries. For example, simple geometric shapes, which are introduced in many countries in the junior or lower primary grades, are not introduced until grade 5 in Bangladesh. Again, prior knowledge or expectations might interfere with attempts to solve a simple problem.

Because items included in an international test represent a common denominator of the curricula of participating countries, it is unlikely that the relative weights assigned to specific curriculum areas in national curricula will match those in international tests. In the 1988 IAEP relatively little effort was made to test the curricula covered by non-U.S. participants. As a result, in one of the participating countries (Ireland), important areas of the mathematics curriculum were not tested, and other areas that received substantial emphasis in the national curriculum were accorded relatively little emphasis in the international test (Greaney and Close 1989).

Although a range of test formats is used in international assessments, the multiple-choice format is used widely for reasons of management efficiency and desirable psychometric properties (especially reliability). Even when other assessment formats are included, reports may be limited to the results of the multiple-choice tests. This means that important skills in the national curriculum, including writing, oral, aural, and practical skills, are excluded.

The costs of international assessments are likely to be lower than those of national assessments, but participation in an international assessment does require considerable financial support. The IEA estimates that the minimum national requirement is a full-time researcher and a data manager. Personnel requirements vary according to the nature of the assessment. Developing countries that wish to participate must pay a nominal annual fee and make a contribution to the overall costs on the basis of their economic circumstances. Local funds have to be obtained for printing, data processing, and attendance at IEA meetings. Costs may be met by a ministry of education, from university operating budgets, or from a direct grant from the ministry of education to a university or research center. IEA experience suggests that government-owned institutes have a better track record than universities in conducting assessments (W. Loxley, personal communication, 1993). A lack of meaningful contact between university researchers and government ministries is particularly noteworthy in some Latin American countries.

Many developing countries are likely to encounter a range of common problems, whether they are conducting an international or a national assessment. These include unavailability of current population information on schools and enrollment figures; lack of experience in administering large-scale assessments or in administering objective tests in schools; tests that do not adequately reflect the curriculum offered in schools or that fail to reflect regional, ethnic, or linguistic variations; lack of exposure to objective-type items; fear that results might be used for teacher accountability purposes; insufficient funds and skilled manpower to do rigorous in-country analyses of the national or international data; governmental restrictions on publicizing results; and logistical problems in conducting the assessment.

On balance, a developing country can probably benefit from participation in international assessments of student achievements. Participation can help develop expertise that can be drawn on later in more focused and more relevant national assessments. Consultant support, however, may be needed to carry out an international or national assessment. In particular, the services of long- and short-term local and foreign consultants may be required to offer training programs in test development, sampling, and analysis.



National Assessment and Public Examinations

Although the idea of national assessment is new in most countries, public examinations are an important and well-established feature of education in Africa, Asia, Europe, and the Caribbean. In developing countries they are usually offered at the end of primary schooling and at the ends of the junior and senior cycles of secondary schooling. Public examinations are similar in many respects to national assessments: procedures are formalized, and testing is normally done outside the classroom setting and requires students to provide evidence of achievement. Because of their importance, their frequency, and their similarity to national assessments, it is reasonable to ask whether public examinations could be used to obtain the kind of information that national assessment systems are designed to collect.

Eight issues are relevant in attempting to answer this question: the purposes of public examinations and of national assessments; the achievements of interest to the two activities; testing, scoring, and reporting procedures; the populations of interest to the two activities; monitoring capabilities of the two activities; the need for contextual information in interpreting assessment data; the implications of attaching high stakes to assessment; and efficiency and cost-effectiveness in obtaining information.

Purposes

The purposes of public examinations and national assessments are significantly different. The purpose of a public examination is to determine whether an individual student possesses certain knowledge and skills. A national assessment is not primarily concerned with identifying the performance of individual students; rather, its purpose is to assess the performance of all or part of the education system. Given this difference, we can still ask whether it is possible to aggregate the data from individual assessments in public examinations to obtain information on the system. To answer that question, we have to consider the more specific purposes of individual assessment and the implications of these purposes for the kind of assessment procedure used.

Note: For a more extended treatment of this topic, see Kellaghan (1996a).

In public examinations, information on student performance is used to make decisions about certification and selection, with selection tending to be the more important function (Kellaghan and Greaney 1992; Lockheed 1991). As a consequence, the assessment procedure or examination will attempt to achieve maximum discrimination for those students for whom the probability of selection is high. This is done by excluding items that are easy or of intermediate difficulty; if most students answered an item correctly, the item would not discriminate among the higher-scoring students. However, tests made up solely of more difficult questions will not cover the whole curriculum or even attempt to do so. The result is that public examinations may provide information on students' achievements on only limited aspects of a curriculum.

The purpose of national assessment is to find out what all students know and do not know. Therefore, the instrument used must provide adequate curriculum coverage. From a policy perspective, the performance of students who do poorly on an assessment may be of greater interest than the performance of those who do well.
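The point about item difficulty and discrimination can be illustrated with a small numerical sketch. The data, function names, and the choice of the top half as the "higher-scoring" group are all hypothetical and serve only to show why an item that most students answer correctly does little to separate the students competing for selection.

```python
def facility(responses, item):
    """Proportion of students who answered `item` correctly (1 = correct, 0 = wrong)."""
    return sum(student[item] for student in responses) / len(responses)


def top_group(responses, fraction=0.5):
    """Return the highest-scoring `fraction` of students, ranked by total score."""
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]


# Hypothetical data: one row per student, one column per item.
# Item 0 is easy, item 1 is of intermediate difficulty, item 2 is hard.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
]

top = top_group(responses)
for item in range(3):
    print(item, round(facility(responses, item), 2), round(facility(top, item), 2))
```

In this made-up example every student in the top group answers items 0 and 1 correctly, so only item 2 distinguishes among them. Items 0 and 1, however, are the ones that show what lower-scoring students can and cannot do, which is the information a national assessment needs.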

Achievements of Interest

There is some overlap in the student achievements identified as important by public examinations and national assessments. During the period of basic education, both certification and national assessment are based on information about basic literacy, numeracy, and reasoning skills. If we look at primary certificate (public) examinations, we find that many focus on a number of core subjects, and a glance at several national assessments indicates that they do the same. For example, students' knowledge of a national language and mathematics is included in all national assessment systems.

However, no national assessment attempts the coverage found in public examinations at the secondary level, when students tend to select and specialize in subject areas. The subjects offered vary from one examination authority to another, but it is not unusual to find syllabi and examinations in twenty, thirty, or even more subjects.

National assessments have focused on cognitive areas of development. Thailand (Prawalpruk 1996) and Chile (Himmel 1996) are among the relatively small number of education systems that have attempted to assess affective o
