Monitoring the Learning Outcomes-Vincent Greaney T


  • 8/9/2019 Monitoring the Learning Outcomes-Vincent Greaney T


    title:   Monitoring the Learning Outcomes of Education Systems. Directions in Development (Washington, D.C.)

    author:   Greaney, Vincent.; Kellaghan, Thomas.

    publisher:   World Bank

    isbn10 | asin:   0821337343

    print isbn13:   9780821337349

    ebook isbn13:   9780585233888

    language:   English

    subject:   Educational evaluation--Cross-cultural studies, Educational indicators--Cross-cultural studies, Educational tests and measurements--Cross-cultural studies.

    publication date:   1996

    lcc:   LB2822.75.G736 1996eb

    ddc:   379.1/54


    DIRECTIONS IN DEVELOPMENT

    Monitoring the Learning Outcomes of Education Systems

    Vincent Greaney

    Thomas Kellaghan


    © 1996 The International Bank for Reconstruction and Development / THE WORLD BANK

    1818 H Street, N.W.

    Washington, D.C. 20433

    All rights reserved

    Manufactured in the United States of America

    First printing November 1996

    The findings, interpretations, and conclusions expressed in this study are entirely those of the authors and should not be attributed in any manner to the World Bank, to its affiliated organizations, or to the members of its Board of Executive Directors or the countries they represent.

    Cover photos: Curt Carnemark, The World Bank

    Vincent Greaney is a senior education specialist in the World Bank's Asia Technical Department, Human Resources and Social Development Division. Thomas Kellaghan is director of the Educational Research Centre, St. Patrick's College, Dublin.

    Library of Congress Cataloging-in-Publication Data

    Greaney, Vincent.

    Monitoring the learning outcomes of education systems / Vincent Greaney, Thomas Kellaghan.

    p. cm. (Directions in development)

    Includes bibliographical references.

    ISBN 0-8213-3734-3

    1. Educational evaluation--Cross-cultural studies. 2. Educational indicators--Cross-cultural studies. 3. Educational tests and measurements--Cross-cultural studies. I. Kellaghan, Thomas. II. Title. III. Series: Directions in development (Washington, D.C.)

    LB2822.75.G736 1996

    379.1'54--dc20 96-43969

    CIP


    Contents

    Preface

    Nature and Uses of Educational Indicators
        Educational Indicators
        Choice of Outcome Indicators
        Uses of Information from Outcome Assessments
        Informing Policy
        Monitoring Standards
        Introducing Realistic Standards
        Identifying Correlates of Achievement
        Directing Teachers' Efforts and Raising Students' Achievements
        Promoting Accountability
        Increasing Public Awareness
        Informing Political Debate
        Role of National Assessments

    National and International Assessments
        National Assessments
        United States
        England and Wales
        Chile
        Colombia
        Thailand
        Namibia
        Mauritius
        International Assessments
        International Assessment of Educational Progress
        International Association for the Evaluation of Educational Achievement
        Advantages of International Assessments
        Disadvantages of International Assessments

    National Assessment and Public Examinations
        Purposes
        Achievements of Interest
        Testing, Scoring, and Reporting
        Populations of Interest
        Monitoring
        Contextual Information
        High-Stakes and Low-Stakes Testing
        Efficiency
        Conclusion

    Components of a National Assessment
        Steering Committee
        Implementing Agency
        Internal Agency
        External Agency
        Team from Internal and External Agencies
        Foreign Experts
        Building Support
        Target Population
        Population Defined by Age or Grade
        Choice of Levels of Schooling
        Sampling
        Choice of Population for Sampling Purposes
        Sample Selection
        What Is Assessed?
        Instrument Construction
        Type of Test
        Test Sophistication
        Nonachievement Variables
        Administration Manual
        Review Process
        Administration
        Analysis
        Reporting
        Average Performance of Students in a Curriculum Area
        Percentage Passing Items
        Percentage Achieving Mastery of Curriculum Objectives
        Percentage Achieving Specified Attainment Targets
        Percentage Functioning at Specified Levels of Proficiency
        Cost-Effectiveness
        Conclusion

    Pitfalls of National Assessment: A Case Study
        Background to the Initiation of a National Assessment in Sentz
        School System
        Response to Education Concerns
        National Assessment of Educational Standards in Sentz
        Organization
        Test Development
        Implementation
        Analysis of the Case
        Responses to Assessment
        Implementation Procedures
        A Choice to Make

    References

    Appendix. National Assessment Checklist

    Tables
        2.1 Proficiency Levels of Students in Grades 4, 8, and 12, as Measured by U.S. NAEP Mathematics Surveys, 1990 and 1992
        2.2 Percentage of Students at or above Average Proficiency Levels in Grades 4, 8, and 12, as Measured by U.S. NAEP Mathematics Surveys, 1990 and 1992
        4.1 Specifications for Mathematics Test: Intellectual Behaviors Tested
        4.2 Distribution of Costs of Components of National Assessment in the United States
        5.1 Educational Developments in Sentz, 1970-90
        5.2 Schedule of Activities for a National Assessment in Sentz

    Boxes
        2.1 Atypical Student Samples
        4.1 Examples of Multiple-Choice Items in Mathematics, for Middle Primary Grades
        4.2 Example of Open-Ended Item in Mathematics, for Lower Secondary Grades
        4.3 Dangers of Cultural Bias in Testing


    Preface

    The collection and publication of statistics relating to numbers of schools, numbers of teachers, student enrollments, and repetition rates have for some time been a feature of most education systems. Up to relatively recently, however, few systems, with the exception of those with public examinations, have systematically collected information on what education systems actually achieve in terms of students' learning. This is so even though, as the World Declaration on Education for All (UNESCO 1990b) reminds us, "whether or not expanded educational opportunities will translate into meaningful development -- for an individual or for society -- depends ultimately on whether people learn as a result of those opportunities."

    In response to this consideration, education systems in more than fifty countries, most of them in the industrial world, have in recent years shown an interest in obtaining information on what their students have learned as a result of their educational experiences. This interest was manifested either by developing national procedures to assess students' achievements or by participating in international studies of student achievement. It seems likely that the number of countries involved in these activities will increase in the future.

    This book is intended to provide introductory information to individuals with an interest in assessing the learning outcomes of education systems. It considers the role of indicators in this process, in particular their nature, choice, and use (chapter 1). A number of approaches to assessing learning outcomes in selected industrial countries (the United States and the United Kingdom) and in representative developing countries (Chile, Colombia, Mauritius, Namibia, and Thailand) are described. Systems of comparative international assessment are also reviewed, and the arguments for and against the participation of developing countries in such assessments are examined (chapter 2).

    Some countries already have available and publish information on student learning in the form of public examination results. The question arises: can such information be regarded as equivalent to the information obtained in national assessment systems that are designed specifically to provide data on learning outcomes for an education system? The answer (reached in chapter 3) is that it cannot.

    In chapter 4, the various stages of a national assessment, from the establishment of a national steering committee to actions designed to disseminate results and maximize the impact of the assessment, are described. Finally, in chapter 5, a case study containing numerous examples of poor practice in the conduct of national assessments is presented. The more obvious examples of poor practice are identified, and corrective measures are suggested.

    The authors wish to express their appreciation for assistance in the preparation of this paper to Leone Burton, Vinayagum Chinapah, Erika Himmel, John Izard, Ramesh Manrakhan, Michael Martin, Paud Murphy, Eileen Nkwanga, O. C. Nwana, Carlos Ro[...], Malcolm Rosier, Molapi Sebatane, and Jim Socknat. The manuscript was prepared by Teresa Bell and Julie-Anne Graitge. Nancy Berg edited the final manuscript for publication. Abigail Tardiff and Amy Brooks were the proofreaders.


    Nature and Uses of Educational Indicators

    Although most of us probably think of formal education or schooling primarily in terms of the benefits that it confers on individuals, government investment in education has often been based on assumptions about the value of education to the nation rather than the individual. As public schooling developed in the eighteenth and nineteenth centuries, support for it was frequently conceived in the context of objectives that were public rather than private, collective rather than individual (Buber 1963). More recently, colonial administrations recognized the value of education in developing the economy as well as in promoting shared common values designed to make populations more amenable to control.

    The importance of education for the nation is reflected in the considerable sums of money that national governments, and, frequently, provincial, regional, and state governments, are prepared to invest in it. In 1987 world public expenditure on education amounted to 5.6 percent of gross national product (GNP); the figure varied from a low of [...]1 percent for East Asia to a high of 6.5 percent for Oceania. As a percentage of total government expenditure, the median share for education was 12.8 percent in industrial countries, a figure considerably lower than the 15.4 percent recorded in developing countries (UNESCO 1990a).

    Given this situation, it is not surprising that for some time government departments have routinely collected and published statistics that indicate how their education systems are working and developing. Statistics are usually provided on school numbers and facilities, student enrollments, and efficiency indices such as student-teacher ratios and rates of repetition, dropout, and cohort completion. But despite an obvious interest in what education achieves, and despite the substantial investments of effort and finance in its provision, few systems in either industrial or developing countries have, until recently, systematically collected and made available information on the outcomes of education. Thus, in most countries there is a conspicuous dearth of evidence on the quality of students' learning. Few have stopped, as a former mayor of New York was inclined to do, and asked "Hey, how am I doing?" although knowing precisely how one is doing would surely be useful.


    Since the 1980s, however, decisionmakers have begun to attach increasing importance to the development of a coherent system for monitoring and evaluating educational achievement, specifically pupil learning outcomes. In this book, our focus is on the development of such a system. Following usage in the United States, this system is referred to as a national assessment.

    The interest in developing a systematic approach to assessing outcomes -- in doing a national assessment -- can be attributed to several factors. One is a growing concern that many children spend a considerable amount of time in school but acquire few useful skills. As Windham (1992) has pointed out, school attendance without learning "makes no social, economic or pedagogical sense" (p. 56). In the words of the World Declaration on Education for All (UNESCO 1990b, par. 4),

        Whether or not expanded educational opportunities will translate into meaningful development -- for an individual or for society -- depends ultimately on whether people actually learn as a result of those opportunities, in other words, whether they incorporate useful knowledge, reasoning ability, skills, and values.

    The problem of inadequate school learning is not confined to developing countries. Throughout the world, one hears expressions of dissatisfaction with the levels of achievement of today's students, though there may be little evidence that standards are in fact declining. Even without such evidence, a case can still be made that changes in the world of work are resulting in a mismatch between educational outcomes and the needs of society (Townshend 1996). This mismatch is most obvious in the case of what has been called "an educational underclass" made up of students who perform very poorly in the education system. This underclass is found in most countries. In the past its members could find employment in unskilled work, but this is no longer possible because jobs that require only minimal literacy skills are fast disappearing from the labor market, particularly in industrial countries.

    Given the need for better-educated students, decisionmakers are concluding that a monitoring system is necessary to gather information needed to describe and monitor the nature of students' achievements, the relevance of those achievements to the world of work, and the number of inadequately prepared students leaving the system.

    What is learned at school assumes even more importance because of increased global economic competition, marked by rapid movement of capital and new technologies from country to country. In such a situation, it is claimed that a country's level of productivity and ability to compete depend greatly on workers' and management's skill in using capital and technology (World Bank 1991) and thus that "skilled people become the only sustainable competitive advantage" (Thurow 1992, p. 520). Comparative studies of students' achievements have been used to gauge the relative status of countries in developing individual skills.

    Another reason for interest in monitoring student achievements is that governments today are faced with the problem of expanding enrollments while at the same time improving the quality of education -- without increasing expenditure. More detailed knowledge of the functioning of the education system will, it is hoped, help decisionmakers cope with this situation by increasing the system's efficiency.

    A final reason for the increased interest in monitoring and evaluating educational provision arises from the move in many countries, in the interest of both democracy and efficiency, to decentralize authority in the education system, providing greater autonomy to local authorities and schools. When traditional central controls are loosened in this way, a coherent system of monitoring is necessary.

    Educational Indicators

    The term educational indicator (in the tradition of economic and social indicators) is often used to describe policy-relevant statistics that contain information about the status, quality, or performance of an education system. Several indicators are required to provide the necessary information. In choosing indicators, care is taken to provide a profile of current conditions that metaphorically can be regarded as reflecting the "health" of the system (Bottani and Walberg 1994; Burnstein, Oakes, and Guiton 1992). Indicators have the following characteristics (Burnstein, Oakes, and Guiton 1992; Johnstone 1981; Owen, Hodgkinson, and Tuijnman 1995):

    • An indicator is quantifiable; that is, it represents some aspect of the education system in numerical form.

    • A particular value of an indicator applies to only one point or period in time.

    • A statistic qualifies as an indicator only when there is a standard or criterion against which it can be judged. The standard may involve a norm-referenced (synchronic) comparison between different jurisdictions; a self-referenced (diachronic) comparison with indicator values obtained at different points in time for the same education system; or a criterion-referenced comparison with an ideal or planned objective.

    • An indicator provides information about aspects of the education system that policymakers, practitioners, or the public regard as important. Sometimes it may be easy to obtain consensus among interested parties on what is important; other times it may not.

    • An indicator is realistic in the sense that it is based on information collected with due regard to financial and other constraints.

    • An indicator describes conditions amenable to improvement.

    • Information for indicators is collected frequently enough to allow change to be monitored.

    • Indicators allow an examination of distributions among subpopulations of interest (for example, by age, gender, income, or socioeconomic group).
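
    The three kinds of standard named in the list above (norm-referenced, self-referenced, and criterion-referenced) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the choice of the mean of other jurisdictions as the norm, and the figures themselves are hypothetical and do not come from the book.

```python
from statistics import mean


def norm_referenced(value, other_jurisdictions):
    """Synchronic comparison: gap between one jurisdiction's indicator
    value and the mean of other jurisdictions at the same point in time
    (using the mean as the norm is an assumption of this sketch)."""
    return value - mean(other_jurisdictions)


def self_referenced(current_value, earlier_value):
    """Diachronic comparison: change in the same system's indicator
    value between two points in time."""
    return current_value - earlier_value


def criterion_referenced(value, target):
    """Comparison with an ideal or planned objective."""
    return value - target


# Hypothetical indicator: percentage of pupils reaching a basic
# reading standard. All numbers are invented for illustration.
region_now = 62.0
region_five_years_ago = 55.0
other_regions_now = [58.0, 70.0]
planned_target = 80.0

print(norm_referenced(region_now, other_regions_now))      # -2.0
print(self_referenced(region_now, region_five_years_ago))  # 7.0
print(criterion_referenced(region_now, planned_target))    # -18.0
```

    The same value of 62 percent reads differently under each standard: slightly below the other regions, improved against the region's own past, and well short of the planned target. That is why a bare statistic does not qualify as an indicator until a standard of comparison is specified.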

    The selection of indicators to represent the status of the education system is based on a model, which may be explicit or implicit, of how the education system works (Burnstein, Oakes, and Guiton 1992). The set of indicators incorporated in the model should reflect the multifaceted nature of education in all its complexity (Bottani and Tuijnman 1994) and be comprehensive enough to describe the important dimensions of the system. The model, in turn, provides a context for interpreting what the indicators mean, how they relate to other aspects of the education system (and perhaps to other social and economic systems), and how they are likely to respond to various kinds of manipulation.

    The model of the education system on which indicators are built frequently comprises some combination of inputs, processes, and outputs. Inputs are the resources available to the system -- for example, buildings, books, the number and quality of teachers, and such educationally relevant background characteristics of students as the socioeconomic conditions of their families, communities, and regions. Processes are the ways schools use their resources as expressed in curricular and instructional activities. Outputs are what the school tries to achieve; they include the cognitive achievements of students and affective characteristics such as the positive and negative feelings and attitudes students develop relating to their activities, interests, and values.

    Choice of Outcome Indicators

    To enumerate the outcomes of education about which it might be useful to have empirical information in terms of the many aims that have been posited for education would be an endless task. Aims frequently suggested include the development of literacy and numeracy skills, the development of aesthetic areas of experience, preparation for life in a democratic society, preparation for the world of work, development of character and moral sensitivity, and personal self-fulfillment. Aims (and


    [...] teachers' efforts and raising students' achievements, promoting accountability, increasing public awareness, and informing political debate.

    Informing Policy

    Information on the achievements of students in an education system can serve a variety of audiences and functions. Educational administrators, such as senior ministry of education officials, should be in a position to produce valid, timely, and useful information when addressing policy issues to be resolved in a political setting. Without such information, policymaking can be unduly influenced by personal biases of ministers of education or senior civil servants, vested interests of school owners or teacher unions, and anecdotal evidence offered by business interests, journalists, and politicians. Given this range of influences, at a minimum, pertinent data must be available to guide the selection of priorities in curriculum, the provision of material resources, and teacher training strategies. However, as noted above, factual information to assist policymaking, especially data on the quality of student learning, is seldom available in developing countries. Even when data on student achievement are available, the views of powerful constituencies continue to play a role in setting educational priorities. Virtually all decisions in public policy are based on both facts and values (Lincoln and Guba 1981). The role of achievement data is to strengthen the factual basis of decisionmaking.

    Many education systems are committed to the principle of equality of opportunity and monitor the extent to which groups enjoy equal access to and participate in education. Information from a national assessment can bring this a step further by providing evidence about the achievements of such groups. Thus, national assessment results have been used in the United States to provide evidence of differences in school achievement related to geography, gender, and ethnicity. Many countries will also be interested in knowing whether mean reading achievement levels are similar for boys and girls, rural and urban children, and children from different linguistic groups.

    Information from a national assessment will be more useful to policymakers if it provides information on subdomains of knowledge rather than just an overall score for a curriculum area such as reading or mathematics. Recent reading surveys have examined respondents' performance in analysis and comprehension of narrative material (based on fictional text), expository material (information or opinion writing), and documentary material (information presented in a structured form in charts, maps, lists, or sets of instructions) (Elley 1992). In mathematics, categories (subdomains) that have been used include numbers and operations, measurement, geometry, data analysis and statistics, and algebra and functions (Lapointe, Mead, and Askew 1992). Data on the performance of students in subdomains can point to strengths and weaknesses within curriculum areas, show how intended curricula are implemented in schools, and, in particular, highlight such factors as gender, urban-rural location, or performance at different times. Such information may have implications for curriculum design, teacher training, and the allocation of resources.
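
    As a rough illustration of why subdomain reporting is more informative than a single overall score, the following Python sketch breaks hypothetical test results down by subdomain and by subgroup. The record layout, the function names, and every score are invented for this example; the book does not prescribe any particular computation.

```python
from collections import defaultdict
from statistics import mean


def subdomain_means(records):
    """Mean score for every (group, subdomain) pair.

    `records` is an iterable of (group, subdomain, percent_correct)
    tuples; this layout is an assumption of the sketch.
    """
    buckets = defaultdict(list)
    for group, subdomain, score in records:
        buckets[(group, subdomain)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


def weakest_subdomain(records):
    """Subdomain with the lowest mean score across all groups."""
    by_subdomain = defaultdict(list)
    for _group, subdomain, score in records:
        by_subdomain[subdomain].append(score)
    return min(by_subdomain, key=lambda name: mean(by_subdomain[name]))


# Hypothetical mathematics results (percent correct per pupil).
sample = [
    ("urban", "geometry", 60), ("urban", "geometry", 70),
    ("rural", "geometry", 40), ("rural", "geometry", 50),
    ("urban", "measurement", 80), ("rural", "measurement", 70),
]

means = subdomain_means(sample)
print(means[("urban", "geometry")])   # 65
print(means[("rural", "geometry")])   # 45
print(weakest_subdomain(sample))      # geometry
```

    In this invented data set an overall mathematics average would hide two things that the breakdown exposes: geometry is the weaker subdomain, and the urban-rural gap is larger in geometry than in measurement.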

    Monitoring Standards

    Information on student achievement in key curriculum areas collected on a regular basis has helped monitor changes in achievement over time in such countries as Chile, France, [...]land, Thailand, the United Kingdom, and the United States. By presenting objective findings on achievement, a national assessment can provide evidence relevant to assertions made frequently by employers, industrialists, and others that educational standards are falling.

    Countries vary in the frequency with which they obtain information on particular areas of achievement. A five-year interval would seem to be a reasonable time span, since achievement standards are unlikely to vary greatly from year to year. This does not mean that a national assessment exercise would be conducted only every five years. Assessments could be more frequent, but a particular curriculum area would be assessed only once in five years.

    Introducing Realistic Standards

    A national assessment can foster a sense of realism in the debate on appropriate achievement levels. In developing countries, unrealistic standards have probably contributed to the high student failure rates that are a feature of many education systems (Kellaghan and Greaney 1992). Unduly high levels of expectation may be prompted by the desire to maintain traditional colonial standards. However, such a target may be almost impossible to attain, given the level of socioeconomic development of some countries. Another factor affecting the target is the changing nature of the school-going population arising from the dramatic increase in enrollment numbers; this increase, in turn, is often accompanied by lower teacher qualification requirements and a decline in the quality of educational facilities.


    Identifying Correlates of Achievement

    Information on correlates of the outcomes of an education system can help policymakers identify factors over which they can exercise some control -- factors likely to contribute to improvements in student achievement levels. Data on some of these potentially manipulable variables may have to be collected along with achievement data at the time of the national assessment. For example, national assessment data have been used in Colombia to assess the impact of in-service teacher training. In Chile the contribution of school resources to student achievement has been examined and decisions made about the allocation of such resources. Other possible correlates of achievement include the emphasis placed on individual subject areas; assessment and supervision procedures; textbooks (prices, numbers, contents, and distribution systems); curricular content; and state policies on language instruction.

    Directing Teachers' Efforts and Raising Students' Achievements

    The expectation is that action will be taken in the light of national assessment results to mandate changes in policy or in the allocation of resources. However, the information such assessments provide may be sufficient, even without formal action, to bring teaching and learning into line with what is assessed (Burnstein, Oakes, and Guiton 1992). The reason for the improvement is that the indicators may point to what is important, and "what is measured is likely to become what matters" (Burnstein, Oakes, and Guiton 1992, p. 410). As a consequence, curricula, teaching, and learning will be directed toward the achievements represented in the indicators. What is tested is what will be taught, and what is not tested will not be taught (Kellaghan and Greaney 1992).

    The conditions under which assessments will have positive effects are not entirely clear. Certainly, there are situations in which assessment systems have little impact on policy or practice (Gipps and Goldstein 1983), for example, when the results are not communicated clearly or in a usable way to policymakers. It is equally certain that when high stakes are attached to performance on an assessment, teaching and learning will be aligned with the assessment (Kellaghan and Grisay 1995; Madaus and Kellaghan 1992). But although this may result in improved test scores, if these are the result of teaching to the test, they will not necessarily be matched by improvement in students' achievement measured in other ways (Kellaghan and Greaney 1992; Le Mahieu 1984; Linn 1983).

    Thailand provides an example of a national assessment designed to change teachers' perceptions of what is important to teach. The assessment included affective outcomes such as attitudes toward work, moral values, and social participation in the hope that teachers would begin to stress learning outcomes other than those measured in formal examinations. Subsequently, it was established that teachers began to emphasize affective learning outcomes in their teaching and evaluation (Prawalpruk 1996).

    Promoting Accountability

    Governments need access to relevant information on the operation of the education system to enable them to determine whether the state is getting good value for its investment. That investment is substantial. Recent figures indicate that in most low-income economies, expenditure on education is one of the largest cost items in government spending -- much larger than expenditures on health, defense, housing, social security, or welfare (World Bank 1995a). In this situation, relevant feedback is obviously essential and can help avoid a waste of scarce resources that has been described as socially intolerable, economically unacceptable, and politically short-sighted (Bottani 1990, p. 336).

    A variety of models of accountability exists. The precise model employed will depend on many factors. First, it will depend on who is regarded as responsible for performance: the teacher, the school, the ministry of education, or the general public. Second, the nature of the information obtained will affect which individuals or institutions are identified as accountable. In the British system of national assessment, information is available about all schools; thus schools can be identified in the accountability process. If individual teachers or schools are not identified in national assessments, it obviously will not be possible to hold them accountable for student performance. Similarly, when samples, rather than whole populations of schools, are tested in a national assessment, adequate information will not be available (except for a small number of sample schools) to identify and hold accountable poorly performing teachers or schools.

    Increasing Public Awareness

    Ministries of education are often reluctant to place in the public arena information about the operation of the education system that they regard as sensitive. This is not surprising when the ministry is charged by government with attaining politically sensitive (but practically difficult) objectives such as promotion of a national language. Willingness to publicize policy failures is not a conspicuous characteristic of most ministries. In addition, political expediency may dictate that ministries not report results which highlight the superiority of particular ethnic, linguistic, or regional groups. In such situations, it may be difficult to establish an atmosphere in which national assessments can be conducted and results made freely available to all interested parties.

    Although it may sometimes be in the interest of a ministry to control the flow of information, the long-term advantages of an open-information system are likely to outweigh any short-term disadvantage. Several long-term benefits can be identified. When the results of a national assessment are made widely available, they can attract considerable media attention and thus heighten public consciousness on educational matters. The results of a national assessment can also bring an air of reality and a level of integrity to discussions about the education system. The informed debate that is stimulated can, in turn, contribute to increased public support for national, regional, and local efforts to improve the education system. Thus, although the knowledge furnished by national assessments may create immediate problems for politicians and government officials, in the longer term it can provide a stimulus, rationale, or justification for reform initiatives.

    Informing Political Debate

    National and, even more notably, international comparative assessment exercises give rise to considerable debate among politicians, as well as others interested in education. An education system provides a country with the human resources and expertise necessary to make it competitive in international markets, and from this perspective political interest in national achievement is understandable. Politicians need to know whether the education system is giving value for the considerable portion of the national budget they allocate to it each year. Today, in many countries, rhetoric (usually uninformed) tends to dominate the political debate on education. Armed with objective evidence on the operation of the system, politicians are more likely to initiate reforms and to prompt ministries of education to action.

    Role of National Assessments

    Although there has been a pronounced increase in recent years in support for formal assessment of student achievement (Lockheed 1992), most developing countries still lack valid and timely information on the outcomes of schooling. A national assessment can help fill this gap by providing educational leaders and administrators with relevant data on student achievement levels in important curricular areas on a regular basis. These data can contribute to policy and public debate, to the diagnosis of problems, to the formulation of reforms, and to improved efficiency.

    There is no single formula or design for carrying out a national assessment. A government's purposes and procedures for assessing national levels of achievement will be determined by local circumstances and policy concerns. The diversity of uses and approaches will become more apparent in chapter 2 when we review seven national assessment systems from different regions of the world, as well as international comparative assessments of student achievements. The remainder of the book provides information on how to, and how not to, conduct a national assessment.

    It may seem reasonable to argue that spending money on a national assessment is not justified when resources are inadequate for building schools or for providing textbooks to students who need them. In response, it needs to be pointed out that the resources required for the conduct of a national assessment would not go very far in addressing major shortcomings in the areas of school or textbook provision. Furthermore, the information obtained through a national assessment can bring about cost-efficiencies by identifying failing features of existing arrangements or by producing evidence to support more effective alternatives. However, it is up to the proponents of a national assessment to show that the likely benefits to the education system as a whole merit the allocation of the necessary funds. If they cannot show this, the resources earmarked for this activity might indeed be more usefully devoted to activities such as school and textbook provision.


    National and International Assessments

    National assessments tend to be initiated by governments, more specifically by ministries of education. International assessments often owe their origin to the initiatives of members of the research community. The main difference between the two types of assessment is that national assessments are designed and implemented within individual countries using their own sampling designs and instrumentation, whereas international assessments require participating countries to follow similar procedures and use the same instruments.

    In this chapter, national assessment systems in two industrial countries (the United States and England and Wales) and five developing countries (two in Latin America, one in Asia, and two in Africa) are described. Next, two international assessments are outlined, and the advantages and disadvantages for developing countries of participating in such assessments are considered.

    National Assessments

    National assessments are now a standard feature of education systems in several industrial countries. The assessments are similar in many ways. Virtually all use multiple-choice or short-answer questions, although Norway and the United States include essay-type writing tasks, and oral assessments are conducted in Sweden and the United Kingdom (England, Wales, and Northern Ireland). National assessments also differ in several respects from country to country. In Canada and France many grades are assessed, whereas relatively few are assessed in the Netherlands, Norway, Scotland, and Sweden. The purposes of national assessment also vary.

    United States

    The U.S. National Assessment of Educational Progress (NAEP) is the most widely reported national assessment model in the literature. It is an ongoing survey, mandated by the U.S. Congress and implemented by trained field staff, usually school or district personnel. The survey is designed to measure students' educational achievements at specified ages and grades. It also examines achievements of subpopulations defined by demographic characteristics and by specific background experience. Since 1990 voluntary state-level assessments, in addition to the national assessments, have been authorized by Congress (Johnson 1992).

    Although the NAEP has been in existence since 1969, politicians and the general public appear to have become interested in its findings only recently (Smith, O'Day, and Cohen 1990). Heightened political interest as a result of the attention paid by the National Governors' Association to NAEP findings led to the introduction in 1990 of state-by-state comparisons (Phillips 1991). Over the years, details of the administration of the NAEP have changed, for example, the frequency of assessment and the grade level targeted. At present, assessments are conducted every second year on samples of students in grades 4, 8, and 12. Eleven instructional areas have been assessed periodically. Most recent reports have focused on reading and writing (Applebee and others 1990a, 1990b; Langer and others 1990; Mullis and Jenkins 1990); mathematics and science (Dossey and others 1988; Mullis and Jenkins 1988; Mullis and others 1993); history (Hammack and others 1990); geography (Allen and others 1990); and civics (Anderson and others 1990). Data have been reported by state, gender, ethnicity, type of community, and region.

    Up to 1984, the percentages of students who passed items were reported. Since that date, proficiency scales have been developed for each subject area. These scales were computed by using statistical techniques (based on item response theory) to create a single scale representing performance (Phillips and others 1993). The scale is a numerical index that ranges from 0 to 500. It has three achievement levels (basic, proficient, and advanced) for each grade level and allows comparison of performance across grades 4, 8, and 12.

    In setting the achievement levels, the views of teacher representatives (sixty-eight in mathematics, for example), administrators, and members of the general public were taken into account (Mullis and others 1993). Performance at the lowest, or basic, level denotes partial mastery of the knowledge and skills required at each grade level. For example, grade 4 students performing at the basic level are able to perform simple operations with whole numbers and show some understanding of fractions and decimals. Performance at the middle, or proficient, level demonstrates competence in the subject matter. In the view of the National Assessment Governing Board, all students should perform at this level. Grade 4 students who are proficient in mathematics can use whole numbers to estimate, compute, and determine whether results are reasonable; have a conceptual understanding of fractions and decimals; can solve problems; and can use four-function calculators. The highest, or advanced, level indicates superior performance. Grade 4 students who receive this rating can solve complex nonroutine problems, draw logical conclusions, and justify answers.

    Average mathematics proficiency marks are presented for grades 4, 8, and 12 for 1990 and 1992 in table 2.1. The data in the last column show that in both years more than one-third of students at all grade levels failed to reach the basic level of performance. However, the figures in this and in other columns suggest that standards rose between 1990 and 1992.

    Results based on one common scale (table 2.2) show that most students, especially those in grades 4 and 8, performed poorly on tasks involving fractions, decimals, and percentages. Furthermore, very few grade 12 students were able to solve nonroutine problems involving geometric relations, algebra, or functions. Subsequent analyses revealed that performance varied by type of school attended, state, gender, and level of home support.

    Comparisons of trends over time show that achievements in science and mathematics have improved, whereas, except at one grade level, there has been no significant improvement in reading or writing since the mid-1980s (Mullis and others 1994). Information collected in the NAEP to help provide a context for the interpretation of the achievement results revealed that large proportions of high school students avoid taking mathematics and science courses.

    Table 2.1. Proficiency Levels of Students in Grades 4, 8, and 12, as Measured by U.S. NAEP Mathematics Surveys, 1990 and 1992

    Grade       Average        Percentage of students at or above     Percentage of students
    and year    proficiency    Advanced    Proficient    Basic        below basic

    Grade 4
      1990      213            1           13            54           46
      1992      218            2           18            61           39
    Grade 8
      1990      263            2           20            58           42
      1992      268            4           25            63           37
    Grade 12
      1990      294            2           13            59           41
      1992      299            2           16            64           36

    Source: Mullis and others 1993.
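
    The two claims made in the text about table 2.1 (that more than one-third of students fell below the basic level in both years, and that results rose between 1990 and 1992) can be checked directly from the published figures. The following is a minimal sketch in Python; the figures are transcribed from the table, while the names `TABLE_2_1`, `rows_are_consistent`, and `change_1990_to_1992` are illustrative assumptions and not part of the NAEP reports.

```python
# Rows transcribed from table 2.1:
# (grade, year, average proficiency,
#  % at or above advanced, % at or above proficient, % at or above basic,
#  % below basic)
TABLE_2_1 = [
    (4, 1990, 213, 1, 13, 54, 46),
    (4, 1992, 218, 2, 18, 61, 39),
    (8, 1990, 263, 2, 20, 58, 42),
    (8, 1992, 268, 4, 25, 63, 37),
    (12, 1990, 294, 2, 13, 59, 41),
    (12, 1992, 299, 2, 16, 64, 36),
]


def rows_are_consistent(rows):
    """True if 'at or above basic' plus 'below basic' equals 100 in every row."""
    return all(row[5] + row[6] == 100 for row in rows)


def change_1990_to_1992(rows):
    """Map each grade to (change in average proficiency, change in % below basic)."""
    by_key = {(row[0], row[1]): (row[2], row[6]) for row in rows}
    changes = {}
    for grade in sorted({row[0] for row in rows}):
        avg_1990, below_1990 = by_key[(grade, 1990)]
        avg_1992, below_1992 = by_key[(grade, 1992)]
        changes[grade] = (avg_1992 - avg_1990, below_1992 - below_1990)
    return changes


# More than one-third of students below basic in every grade and year?
more_than_one_third_below = all(row[6] > 100 / 3 for row in TABLE_2_1)
```

    Under these assumptions, a positive first value and a negative second value for a grade indicate a higher average and a smaller share of students below the basic level in 1992 than in 1990.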


    Table 2.2. Percentage of Students at or above Average Proficiency Levels in Grades 4, 8, and 12, as Measured by U.S. NAEP Mathematics Surveys, 1990 and 1992

    Grade       Average        Percentage at or above proficiency level
    and year    proficiency    200       250       300       350

    Grade 4
      1990      213            67        12        0         0
      1992      218            72        17        0         0
    Grade 8
      1990      263            95        65        15        0
      1992      268            97        68        20        1
    Grade 12
      1990      294            100       88        45        5
      1992      299            100       91        50        6

    Note: Skills for each proficiency level are as follows:
    Level 200. Addition, subtraction, and simple problem solving with numbers
    Level 250. Multiplication and division, simple measurement, two-step problem solving
    Level 300. Reasoning and problem solving involving fractions, decimals, percentages, and elementary concepts in geometry, algebra, and statistics
    Level 350. Reasoning, problem solving involving geometric relationship, algebra, and functions.

    Source: Mullis and others 1993.
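
    Because table 2.2 reports cumulative percentages (students at or above each level), the share of students falling between two adjacent levels is obtained by subtraction. The sketch below illustrates that arithmetic in Python; the function name `band_shares` and the band labels are assumptions introduced here for illustration only.

```python
# Scale points used as column headings in table 2.2.
LEVELS = (200, 250, 300, 350)


def band_shares(at_or_above):
    """Convert cumulative 'at or above' percentages into shares per band.

    at_or_above: percentages at or above 200, 250, 300, and 350, in that order.
    Returns a dict mapping a band label to the percentage of students in it.
    """
    # Everyone is at or above the bottom of the scale; no one is above the top.
    bounds = [100] + list(at_or_above) + [0]
    labels = (
        ["below 200"]
        + [f"{lo} to under {hi}" for lo, hi in zip(LEVELS, LEVELS[1:])]
        + ["350 and above"]
    )
    return {label: bounds[i] - bounds[i + 1] for i, label in enumerate(labels)}


# Grade 8, 1992 row of table 2.2.
grade_8_1992 = band_shares((97, 68, 20, 1))
```

    For the grade 8 row in 1992, this reading places the largest group of students between levels 250 and 300, which is consistent with the statement that most grade 8 students had not reached the level involving fractions, decimals, and percentages.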

    Among eleventh-graders who enroll in science courses, approximately half had never conducted independent experiments. Almost two-thirds of eighth-graders spend more than three hours a day watching television.

    England and Wales

    In England and Wales, national monitoring efforts have been a feature of the education system since 1948. Large-scale national surveys of levels of reading achievement of 9-, 11-, and 15-year-olds were conducted irregularly up to 1977 (Kellaghan and Madaus 1982). In 1978, partly in response to criticisms about standards in schools, a more elaborate system of assessment, run by the Assessment of Performance Unit in the Department of Education and Science, was set up (Foxman, Hutchinson, and Bloomfield 1991). Three main areas of student achievement were targeted for assessment at ages 11, 13, and 15: language, mathematics, and science. In addition to pencil-and-paper tests, performance tasks were administered to small samples of students to assess their ability to estimate and to weigh and measure objects.

    Assessments in the 1980s carried considerable political weight. They contributed to the significant curriculum reform movement embodied in the 1988 Education Act, which, for the first time, defined a national curriculum in England and Wales (Bennett and Desforges 1991). The new curriculum was divided into four "key" stages, two at the primary level and two at the secondary level. A new system of national assessment was introduced in conjunction with the new curriculum. Attainment was to be assessed by teachers in their own classrooms by administering externally designed performance assessments. These assessments went well beyond the performance tests introduced by the Assessment of Performance Unit; they were designed to match normal classroom tasks and to have no negative backwash effects on the curriculum (Gipps and Murphy 1994).

    The policy-related dimension of the assessments was clear. They were intended to have a variety of functions: formative, to be used in planning further instruction; diagnostic, to identify learning difficulties; summative, to record the overall achievement of a student in a systematic way; and evaluative, to provide information for assessing and reporting on aspects of the work of the school, the local education authority, or other discrete parts of the education service (Great Britain, Department of Education and Science, 1988). In particular, the assessments were expected to play an important role in ensuring that schools and teachers adhered to the curriculum as laid down by the central authority. Thus the assessment approach could be described as "fundamentally a management device" (Bennett and Desforges 1991, p. 72); it was not supported by any theory of learning (Nuttall 1990).

    Although there have been several versions of the curriculum and of the assessment system since its inception, some significant features of the system have been maintained. First, all students are assessed at the end of each key stage at ages 7, 11, 14, and 16. Second, students' performance is assessed against statements of attainment prescribed for each stage (for example, the student is able to assign organisms to their major groups using keys and observable features, or the student can read silently and with sustained concentration). Third, assessments are based on both teacher judgments and external tests.

    Teachers play an important role in assessment: they determine whether a student has achieved the level of response specified in the statement of attainment, record the achievement levels reached, indicate level of progress in relation to attainment targets, provide evidence to support levels of attainment reached, and give information about student achievements and progress to parents, other teachers, and schools. Moderation is carried out by other teachers, to help ensure a common marking standard.

    Initial reactions to the process indicated that teachers welcomed the materials provided and the innovative assessment procedures. On the negative side, the assessment process placed a heavy burden on teachers, the in-service support provided was inadequate, and the assessment turned out to be largely impractical (Broadfoot and others n.d.; Gipps and others 1991; Madaus and Kellaghan 1993). To add to the problems, results were being published at a time of intense competition between schools and of job losses, which gave rise to questions about entrusting the administration and scoring to teachers (Fitz-Gibbon 1995).

    Two important lessons can be drawn from the British national assessment system. First, the use of complex assessment tasks leads to problems of standardization of procedures for administration and scoring that, in turn, lead to problems of comparability, both between schools and over time. Second, it is extremely difficult, if at all possible, to devise assessment tasks that will serve equally well formative, diagnostic, and summative evaluative purposes (Kellaghan 1996c). Efforts to deal with these problems are to be found in the move to make greater use of more conventional centralized written tests and to accord priority to the summative function in future assessments (Dearing 1993; Gipps and Murphy 1994).

    Chile

    In 1978 Chile's Ministry of Education assigned responsibility for a national assessment to an external agency, the Pontificia Universidad Católica de Chile. The study was piloted over a two-year period. Data on contextual variables, as well as on achievement, were collected (Himmel 1996). These included student-home variables (student willingness to learn, parental expectations for their children); teacher-classroom variables (teaching methodologies, classroom climate); principal and school variables (expectations of staff and of students, promotion of parents in school activities); and institutional variables (educational and financial policies).

    The assessment was designed to provide information on the extent to which students were achieving learning targets considered minimal by the Ministry of Education; to provide feedback to parents, teachers, and authorities at municipal, regional, and central levels; and to provide data to planners that would guide the allocation of resources in textbook development, curriculum development, and in-service teacher training.

    All students in grades 4 and 8 were assessed in Spanish (reading and [...]

    [...] students in rural schools; students in large schools performed better than students in small schools; and students in private schools scored highest.

    The results were disseminated extensively. Teachers received classroom results containing the average percentage of correct answers for each objective assessed, as well as the average number of correct answers over the entire test. Results were also reported nationally and by school, location, and region. Each classroom and school was given a percentile ranking based on other schools in the same socioeconomic category, as well as a national ranking. Special manuals explained the results and indicated how schools and teachers could use the information to improve achievement levels. Results were given to school supervisors.
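
    To illustrate what a percentile ranking within a socioeconomic category involves, the following Python sketch ranks each school only against other schools in the same category. It is an illustration under stated assumptions, not the procedure used in Chile: the school names, categories, and scores are hypothetical, and the rank is defined here as the percentage of other schools in the category with a lower score, which is only one of several common definitions.

```python
from collections import defaultdict


def percentile_ranks_within_category(schools):
    """Rank each school against the other schools in its own category.

    schools: list of (name, category, mean_percent_correct) tuples.
    Returns {name: percentile rank}, where the rank is the percentage of
    other schools in the same category with a strictly lower score.
    A school with no peers in its category gets None.
    """
    scores_by_category = defaultdict(list)
    for _name, category, score in schools:
        scores_by_category[category].append(score)

    ranks = {}
    for name, category, score in schools:
        peers = scores_by_category[category]
        if len(peers) == 1:
            ranks[name] = None  # nothing to compare against
            continue
        lower = sum(1 for s in peers if s < score)
        ranks[name] = 100.0 * lower / (len(peers) - 1)
    return ranks


# Hypothetical data: schools C and D have the same raw score.
SAMPLE_SCHOOLS = [
    ("A", "low", 40.0),
    ("B", "low", 55.0),
    ("C", "low", 70.0),
    ("D", "high", 70.0),
    ("E", "high", 85.0),
]
SAMPLE_RANKS = percentile_ranks_within_category(SAMPLE_SCHOOLS)
```

    In this hypothetical data, schools C and D have identical scores but opposite ranks because they are compared with different groups. This shows why a school's rank depends on the socioeconomic category to which it is assigned.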

    Relatively little use was made of the self-concept information. Parental information was not used and was not collected after the first year. Parents, however, received a simplified report of overall results for Spanish and mathematics.

    Use of the national assessment results has increased gradually. Low-scoring schools have access to a special fund to enable them to improve infrastructure, educational resources, and pedagogical approaches. Results have also been used to prompt curriculum reform. Percentile rank scores were dropped in favor of percentage scores because teachers found it difficult to interpret the former.

    The Chilean experience highlights the need for consensus and political will, technical competence, and economic feasibility (Himmel 1996). Currently there appears to be political and public support for the SIMCE. It provides education administrators with information for planning, and authors of instructional materials use the information to identify objectives. However, the enterprise has not been a total success. Some schools, realizing that their rank depended on the reported socioeconomic grouping of their students, overestimated the extent of poverty among their students to help boost their position. Efforts to explain procedures and results to parents have not been reflected in increased parent involvement with schools except for private schools. Almost two-thirds of teachers reported that they did not use the special manual that dealt with the pedagogical implications of the test results. Finally, questions have been raised about the value of the census approach when sample data could provide policymakers with the needed information.

    Colombia

    National assessment in Colombia was prompted by a perception that insufficient relevant information was available for decisionmaking at central, regional, and local levels (Rojas 1996). The Ministry of Education also wished to use the results to generate debate on educational issues.


    The initial assessment conducted in 1991 focused on the extent to which standards defined as minimum in mathematics and language were being attained in grades 3 and 5 in urban and rural public and private schools. A total of 15,000 students participated in the assessment. Originally thirteen states, accounting for 60 percent of the population, were targeted. The sample comprised 650 students in grade 3 and 500 students in grade 5 in each state.
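
    The sample figures quoted above can be cross-checked with simple arithmetic. The Python sketch below multiplies the per-state quotas by the number of states originally targeted, reading the second quota (500) as the grade 5 figure; the constant and function names are illustrative assumptions.

```python
# Figures quoted in the text for the 1991 Colombian assessment.
STATES_TARGETED = 13
GRADE_3_PER_STATE = 650
GRADE_5_PER_STATE = 500  # assumption: the 500 figure refers to grade 5


def planned_sample(states=STATES_TARGETED,
                   grade_3=GRADE_3_PER_STATE,
                   grade_5=GRADE_5_PER_STATE):
    """Total planned sample: per-state quotas for both grades times states."""
    return states * (grade_3 + grade_5)


PLANNED_TOTAL = planned_sample()
```

    On this reading the planned sample comes to just under the reported total of 15,000 participants, so the quoted figures are mutually consistent.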

    For grade 3, four performance levels were assessed in mathematics and three in reading comprehension. Performance levels or target standards were determined by the test development personnel. For example, in mathematics the lowest performance level included items on simple addition, whereas more complex tasks involving problem solving were equated with higher performance levels. For grade 5, five performance levels were assessed in mathematics and four in reading. Both multiple-choice items and items for which students had to supply short answers were used. Data on personal, school, and environmental characteristics were collected, as well as information on student participation in local organizations or associations.

    The national leader of the assessment had considerable experience in research, data collection, and fieldwork. Teams were established to coordinate the fieldwork within individual states. Each team was led by a coordinator who directed the field testing, supported by two or three individuals with formal qualifications in the social sciences. Local coordinators, usually young people, supervised the work of ten to fifteen fieldworkers. The fieldworkers, often university students or recent social science graduates, administered the tests and conducted teacher interviews. The supply of applicants for these positions was ample because of the relatively high unemployment rates among graduates. Local teachers were not asked to administer tests because it was felt they might attempt to help students taking the tests. Ministry of Education officials were considered unqualified for the work.

    At the end of the assessment, profiles of high-scoring schools, teachers, and administrators were developed. The percentages of students who scored at each performance level were reported separately for each state, for public and private schools, and for urban and rural schools, as well as at the national level. Correlates of achievement were identified; these included the number of hours per week devoted to a subject area, teachers' emphasis on specific content areas, teachers' educational level, school facilities, and number of textbooks per student. Negative correlations were recorded for grade repetition, absenteeism, time spent getting to school, and family size (Instituto SER de Investigación/Fedesarrollo 1994). The number of in-service courses a teacher had taken did not emerge as a significant predictor of achievement.


    Results were released through the mass media, and a program of national and local workshops was organized to discuss the results and their implications. Individual teachers received information on national and regional results in newsletters, brochures, and other user-friendly documents. Administrators, especially at the state level, used results for local comparisons. A national seminar used the national assessment data to identify appropriate strategies for improving educational quality. Results for individual schools were not reported because it was felt that this would undermine teacher support for the assessment.

    The apparent success of the initial assessment has been attributed to the creation of an evaluation unit within the Ministry of Education; to the commitment of the minister and vice-minister for education; to the support of ministry officials; to the use of an external public agency to design the assessment instruments; and to the use of a private agency to take responsibility for sampling, piloting of instruments, administration of tests, and data analysis (C. Rojas, personal communication, 1995). After the first two years responsibility for the national assessment was transferred to a public agency, which administered the assessment in 1993 and 1994. By late 1995, however, the agency had not managed to analyze the data collected in either year.

    Thailand

    Following the introduction of a new higher secondary school curriculum in 1981, public certification examinations at the end of secondary school were abolished in Thailand, and teachers were given responsibility for evaluating student achievements in their respective courses. Concerned that achievement might fall in this situation, the Ministry of Education introduced national assessment as a means of monitoring standards (Prawalpruk 1996). Administrators at various levels of the system were expected to use the results to help improve the quality of education. To encourage schools to broaden their objectives and instructional practices, the national assessment included measures of affective learning outcomes (attitudes toward work, moral values, and participation) and practical skills.

    Starting in 1983, all grade 12 students (in their final year in secondary school) were assessed in Thai, social studies, and physical education. In addition, science, mathematics, and career education were assessed in most subsequent years. Both cognitive and affective outcomes were assessed in social studies, physical education, and career education. The task was entrusted to the Office of Educational Assessment and Testing Services in the Department of Curriculum and Instruction Development.


    Many of the staff had achieved master's degrees in educational assessment; eight had been trained outside Thailand. Subject matter committees (twelve to eighteen members each) established for each subject area developed tables of specifications for achievement and wrote multiple-choice items. Nationwide testing was conducted on the same two days. Schools were furnished with individual student scores and with school, regional, and provincial mean scores; information on how other individual schools performed was provided. For public communication purposes, student performance was reported as the percentage of items answered correctly. Provincial administrators advised how the results could be used in planning academic programs at school, provincial, and regional levels.

    In subsequent years, samples of grades 6 and 9 were assessed, generally every second year. In a reaction to the initial failure of schools to use assessment results to improve school practice, the national assessment design was expanded to include measures of school process (school administration, curriculum implementation, lesson preparation, and instruction). Starting in 1990 school process measures were assessed by teams of three external evaluators. The early national assessment results for science and mathematics were considered disappointing; they showed that students were weak at applying principles in both subject areas. This conclusion prompted a significant curriculum revision in 1989.

    National assessment has been used for school and provincial planning and for monitoring levels of student achievement over time; it has also helped increase teacher interest in affective learning outcomes. According to Prawalpruk (1996), some principals misused the results by claiming that poor results could be attributed to poor teaching. Results were used for educational planning only if adequate administrative support was available. School principals ignored assessment results if they did not consider them useful for planning.

    Namibia

    The National Institute for Educational Development in Namibia collaborated with Florida State University and Harvard University in 1992 to assess the basic language and mathematics proficiencies of students at grades 4 and 7. The objectives of the assessment were to inform policymakers on achievement levels to enable them "to decide on resource targeting to underachieving schools" (Namibia, Ministry of Education and Culture, 19[..], p. xiv), to sensitize managers to the professional needs of teachers, to enable schools and regions to compare themselves with their counterparts, and to provide baseline data for monitoring progress.

    Tests were developed "by reference groups within the head office of the ministry" (p. [..]), based on official curricula and textbooks. A random sample of 136 schools was drawn, covering Namibia's six education regions. Within each school, one grade 4 and one grade 7 class were chosen randomly. In one specific region of interest (Ondangwa), thirty-f[..] schools with grade 4 students and nineteen with grade 7 students took the national language (Oshindonga) test. Test instructions to all students were given in the local language. More than 7,000 students in grades 4 and 7 were tested in English and mathematics.

    Of the 136 schools, 20 were included in a special longitudinal sample to monitor changes in English achievement over time. In these schools, students in grades 4 and 5 took the grade 4 test, whereas those in grades 6 and 7 took the grade 7 test. It was planned to administer the tests to students each year. It is now accepted that the longitudinal sample was too small to permit generalization to the wider population of Namibian children.

    The tests were administered to all students in attendance in the targeted grades in the 136 sample schools; only 98 schools, however, had a grade 7 class. Both the English and Oshindonga tests were timed. The English test took 40 to 60 minutes and the Oshindonga test 60 to 80 minutes to complete. The untimed mathematics test took up to 120 minutes and caused some student fatigue.

    Because the test designers hoped to get a normal distribution of test scores, tests were not designed to assess levels of mastery. Items answered correctly by less than 20 percent or more than 80 percent of students were deleted in analyses. This severely reduced the number of items that could be used in measuring performance levels: in the English grade […] test, from seventeen to nine, and in the grade 7 mathematics test, from sixty to thirty-eight.
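    The item-screening rule described above can be illustrated with a minimal sketch in Python. It assumes each response is scored 0 (incorrect) or 1 (correct); the function name, thresholds as parameters, and the sample data are hypothetical and are not taken from the Namibian study.

```python
def screen_items(responses, low=0.20, high=0.80):
    """Return (p_values, kept_indices) for a matrix of 0/1 item scores.

    responses: list of rows, one per student; each row has one 0/1 score per item.
    An item is dropped when its proportion correct is below `low` or above `high`.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Proportion of students answering each item correctly.
    p_values = [
        sum(row[i] for row in responses) / n_students
        for i in range(n_items)
    ]
    # Keep only items whose proportion correct lies within [low, high].
    kept = [i for i, p in enumerate(p_values) if low <= p <= high]
    return p_values, kept


# Illustrative data only: 5 students, 4 items.
responses = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
]
p_values, kept = screen_items(responses)
print(p_values)
print(kept)
```

    In this made-up example the first item (answered correctly by everyone) and the third (answered correctly by no one) are dropped, which shows how quickly such a rule can shrink a short test.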

    Results showed that many grade 4 students had difficulty with the English test, prompting concern that the expected level of performance was too high and suggesting that the curriculum materials might be too advanced. Initial analyses of results suggested that […] categories of students increased their scores between grades 4 and 5 and between grades 6 and 7. At grade 7, the performances of girls and boys were similar on the two language tests, but boys outscored girls on the mathematics test. Older students had much lower scores than younger ones; for example, 19-year-olds answered correctly fewer than half the items answered correctly by 12- and 13-year-olds on both the English and mathematics tests. Differences in scores for regions and for language groups were also reported.

    Data were used to relate performance levels to three background factors (age, gender, and home language), which in combination explained about one-third of the variance in English scores and about one-fifth of the variance in mathematics scores. In one region, however, less than 3 percent of the variance could be attributed to these factors. A set of papers was prepared for teachers outlining practical suggestions for improving student performance in areas that had posed difficulties.

    The study concluded that the process of developing the tests for the assessment was not altogether satisfactory and that a new competency-based curriculum will make it necessary to develop new measures to assess basic competencies in subject areas.

    Mauritius

    To implement the recommendations of the World Conference on Education for All, the United Nations Educational, Scientific, and Cultural Organization (UNESCO) and the United Nations Children's Fund (UNICEF) launched a project to develop national assessment capacities in China, Jordan, Mali, Mauritius, and Morocco (Chinapah 1992; UNESCO 199[…]). Identification missions to each country were supported by some centralized training in survey methodology. Each national assessment focused on learning achievement (literacy, numeracy, and basic life skills); factors related to learning achievement (personal characteristics, home environment, and school environment); and access and equity (female enrollment, and admission and participation rates of specific groups). The designers hoped that lessons learned in the course of the project could be adapted and applied in other developing countries.

    The national assessment in Mauritius was conducted to address policy issues relating to educational inequalities (Chinapah 1992) and to provide baseline data on achievement levels, with the aim of identifying the percentage of students who attained defined acceptable standards in specified subject areas. Literacy (English and French), numeracy, and life skills were assessed. Items on road safety, awareness of the environment, social skills, and study skills were included.

    Specific performance criteria were developed for each subject area (Mauritius Examinations Syndicate 1995). To be rated literate in French, for example, a 9-year-old was required to obtain a minimum score of twenty marks out of thirty-five, including eight of a possible thirteen in "reading" and twelve of twenty-two in "vocabulary, wri[…] expression." To be considered literate in English, the 9-year-old was expected to obtain a minimum score of seventeen marks, including twelve out of a possible twenty-two in reading and five of eight in writing. Such performances were considered to represent the ability to read clearly, to understand different types of text judged appropriate for 9-year-olds, and to solve simple shopping problems (V. Chinapah, personal communication, 1995).
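    The literacy criteria quoted above combine a minimum total with minimum section scores. The following Python sketch shows one way such a rule could be checked; it is an illustration under stated assumptions, not the syndicate's procedure. The dictionary layout, the function name, and the section key "vocabulary_and_writing" (the section title is incomplete in the text) are hypothetical; only the mark thresholds come from the passage.

```python
# Thresholds taken from the passage; key names are hypothetical.
FRENCH = {
    "min_total": 20,  # out of 35
    "min_by_section": {"reading": 8, "vocabulary_and_writing": 12},
}
ENGLISH = {
    "min_total": 17,  # out of 30
    "min_by_section": {"reading": 12, "writing": 5},
}


def is_rated_literate(scores, criterion):
    """Return True if `scores` meets the total and every section minimum.

    scores: dict mapping section name to marks obtained.
    criterion: dict with "min_total" and "min_by_section" entries.
    """
    if sum(scores.values()) < criterion["min_total"]:
        return False
    return all(
        scores.get(section, 0) >= minimum
        for section, minimum in criterion["min_by_section"].items()
    )


# A high total does not compensate for a section below its minimum.
print(is_rated_literate({"reading": 13, "vocabulary_and_writing": 11}, FRENCH))
print(is_rated_literate({"reading": 8, "vocabulary_and_writing": 12}, FRENCH))
```

    The first call fails the criterion despite a total of twenty-four marks, because the second section falls one mark short; the second call meets every minimum exactly.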

    Approximately 1,600 standard IV students, mainly 9-year-olds in a representative sample of fifty-two schools, were assessed. Questionnaires were administered to parents, teachers, and school principals to obtain background information on home, school, and student characteristics. Responsibility for the assessment was entrusted to the Mauritius Examinations Syndicate. The syndicate, which administers the annual high-stakes public examinations, had some technical competence in test development, data analysis, and administration of formal assessments. Each test lasted 40 minutes. The literacy and numeracy tests relied on multiple-choice and short-answer questions, the life skills test on multiple-choice items. Tests were administered by retired primary school inspectors and head teachers. Data were collected in 1994, and findings were presented to the Ministry of Education and to teachers. The syndicate plans to repeat the assessment in the future to monitor possible changes in achievement over time (R. Manrakhan, personal communication, 1995).

    International Assessments

    International assessments, in contrast with national assessments, involve measurement of the educational outcomes of education systems in several countries, usually simultaneously. Representatives from many countries (usually from research organizations) agree on an instrument to assess achievement in a curriculum area, the instrument is administered to a representative sample of students at a particular age or grade in each country, and comparative analyses of the data are carried out (Kellaghan and Grisay 1995).

    Countries participating in international studies are expected to provide personnel and funds for administration, training, printing, local analyses, and production of national reports. Costs of instrument development, sampling frameworks, international data analyses, and report writing are the responsibility of the international assessment agency, to which individual countries make a financial contribution.

    International Assessment of Educational Progress

    The first International Assessment of Educational Progress (IAEP), conducted in 1988 under the direction of Educational Testing Service, under contract to the U.S. Department of Education, represents an […]

    […] opportunity to learn, the amount of time a subject is studied, the use of computers, and factors and resources in the homes of students (Anderson and Postlethwaite 1989; Anderson, Ryan, and Shapiro 1989; Elley 1992, 1994; Kifer 1989; Lambin 1995; Postlethwaite and Ross 1992).

    Advantages of International Assessments

    The main advantage of international studies over national assessments is the comparative framework they provide in assessing student achievement and curricular provision (Husén 1967). International assessments give some indication of where the students in a country stand relative to students in other countries. They also show the extent to which the treatment of common curriculum areas differs across countries, and, in particular, the extent to which the approach in a given country may be idiosyncratic. This information may lead a country to reassess its curriculum policy.

    Many accounts are available of how findings of international studies on student achievement and curricula have been used to change educational policy (Husén 1987; Kellaghan 1996b; Torney-Purta 1990). For example, results of international studies have been credited with the increased emphasis placed on science in Canada and in the United States (McEwen 1992). In Japan the relatively superior performance of students in mathematical computation compared with mathematical application and analysis led to a change in emphasis in the curriculum (Husén 1987). In Hungary participation in IEA studies has been credited with curriculum reform in reading, and the finding that home factors accounted for more variance in student achievement than school factors helped to undermine Marxist-Leninist curricular ideologies (Báthory 1989).

    International assessments have many other advantages. Their findings tend to attract more political and media attention than those of national studies. Thus, poor results can provide politicians and other policymakers with a strong rationale for budgetary support for the education sector.

    For national teams entrusted with the implementation of international assessment, the experience of rigorous sampling, item review, printing, distribution, supervision, scoring, data entry, and drafting of national reports according to an agreed-on timetable can contribute greatly to the development of local capacity to conduct research and national assessments. Finally, staffing requirements and costs are lower in international studies than in national assessments because instrumentation and sampling design are developed in collaboration with experts in other countries.

    Disadvantages of International Assessments

    It can be argued that such factors as availability of schools and materials, opportunity to learn, status and quality of teaching, parental interest, and class size differ so radically from country to country that valid comparisons of international achievement test results are impossible (Rotberg 1991). Although IEA studies generally consider the extent to which students in individual countries have had opportunities to learn the content tested, it is doubtful whether politicians, policymakers, or the media take these into consideration when commenting on national rankings. Political rhetoric, frequently based on the perceived implications of the findings for competitiveness in international trade rather than on a sober evaluation of the meaning of results, may dominate the discussion immediately following the publication of results. In fairness, it should be stressed that uninformed political rhetoric can be prompted by the results of national as well as international assessments and that some of the problems associated with international assessments apply equally to national assessments.

    A potentially significant problem with both international and national studies is the difficulty in obtaining a representative sample of students (box 2.1). In many developing countries up-to-date population data may not be available, and communication and logistical problems can contribute to relatively low response rates. The National Center for Education Statistics in the United States has set a response rate target of 85 percent for cross-sectional surveys. This target may be much too high for developing countries, and indeed it has been achieved only once by the United States in international studies of mathematics and science (Medrich and Griffith 1992). Sampling problems are commonplace and have been blamed for significant reversals of performance in some countries between grades (Rotberg 1991). Targeted populations may not be comparable, especially in countries where national enrollment, drop-out, and retention rates differ sharply. The result is that countries may have been represented by atypical samples of students.

    Box 2.1. Atypical Student Samples

    In the 1991 IAEP mathematics study, only 3 percent of the population of 13-year-old students in Brazil and 1 percent of the corresponding population of students in Mozambique were sampled. The performance of Chinese students, which was highlighted in the report of the study, was based on a sample that excluded many 13-year-olds: those below grade […] in twenty provinces and cities, those out of school (almost 50 percent of the population), and those attending school in nine provinces and autonomous regions with predominantly non-Chinese populations (Lapointe, Mead, and Askew 1992). The exclusion of these groups suggests that the reported achievement levels may seriously overestimate the mean achievements of Chinese students.

    A further problem with international assessments is that it is probably impossible to develop a test that is equally valid for several countries (Kellaghan and Grisay 1995). What is meant by "achievement in mathematics" or "achievement in science" varies from country to country because different countries will choose different skills applied to different facts and concepts to define what they regard as mathematical or scientific achievement. Furthermore, a particular domain of a subject may be taught at different grade levels in different countries. For example, simple geometric shapes, which are introduced in many countries in the junior or lower primary grades, are not introduced until grade 5 in Bangladesh. Again, prior knowledge or expectations might interfere with attempts to solve a simple problem.

    Because items included in an international test represent a common denominator of the curricula of participating countries, it is unlikely that the relative weights assigned to specific curriculum areas in national curricula will match those in international tests. In the 1988 IAEP relatively little effort was made to test the curricula covered by non-U.S. participants. As a result, in one of the participating countries (Ireland), important areas of the mathematics curriculum were not tested, and other areas that received substantial emphasis in the national curriculum were accorded relatively little emphasis in the international test (Greaney and Close 1989).

    Although a range of test formats is used in international assessments, the multiple-choice format is used widely for reasons of management efficiency and desirable psychometric properties (especially reliability). Even when other assessment formats are included, reports may be limited to the results of the multiple-choice tests. This means that important skills in the national curriculum, including writing, oral, aural, and practical skills, are excluded.

    The costs of international assessments are likely to be lower than those of national assessments, but participation in an international assessment does require considerable financial support. The IEA estimates that the minimum national requirement is a full-time researcher and a data manager. Personnel requirements vary according to the nature of the assessment. Developing countries that wish to participate must pay a nominal annual fee and make a contribution to the overall costs on the basis of their economic circumstances. Local funds have to be obtained for printing, data processing, and attendance at IEA meetings. Costs may be met by a ministry of education, from university operating budgets, or from a direct grant from the ministry of education to a university or research center. IEA experience suggests that government-owned institutes have a better track record than universities in conducting assessments (W. Loxley, personal communication, 1993). A lack of meaningful contact between university researchers and government ministries is particularly noteworthy in some Latin American countries.

    Many developing countries are likely to encounter a range of common problems, whether they are conducting an international or a national assessment. These include unavailability of current population information on schools and enrollment figures; lack of experience in administering large-scale assessments or in administering objective tests in schools; tests that do not adequately reflect the curriculum offered in schools or that fail to reflect regional, ethnic, or linguistic variations; lack of exposure to objective-type items; fear that results might be used for teacher accountability purposes; insufficient funds and skilled manpower to do rigorous in-country analyses of the national or international data; governmental restrictions on publicizing results; and logistical problems in conducting an assessment.

    On balance, a developing country can probably benefit from participation in international assessments of student achievements. Participation can help develop expertise that can be drawn on later in more focused and more relevant national assessments. Consultant support, however, may be needed to carry out an international or national assessment. In particular, the services of long- and short-term local and foreign consultants may be required to offer training programs in test development, sampling, and analysis.

    National Assessment and Public Examinations

    Although the idea of national assessment is new in most countries, public examinations are an important and well-established feature of education in Africa, Asia, Europe, and the Caribbean. In developing countries they are usually offered at the end of primary schooling and at the ends of the junior and senior cycles of secondary schooling. Public examinations are similar in many respects to national assessments: procedures are formalized, and testing is normally done outside the classroom setting and requires students to provide evidence of achievement. Because of their importance, their frequency, and their similarity to national assessments, it is reasonable to ask whether public examinations could be used to obtain the kind of information that national assessment systems are designed to collect.

    Eight issues are relevant in attempting to answer this question: the purposes of public examinations and of national assessments; the achievements of interest to the two activities; testing, scoring, and reporting procedures; the populations of interest to the two activities; monitoring capabilities of the two activities; the need for contextual information in interpreting assessment data; the implications of attaching high stakes to assessment; and efficiency and cost-effectiveness in obtaining information.

    Purposes

    The purposes of public examinations and national assessments are significantly different. The purpose of a public examination is to determine whether an individual student possesses certain knowledge and skills. A national assessment is not primarily concerned with identifying the performance of individual students; rather, its purpose is to assess the performance of all or part of the education system. Given this difference, we can still ask whether it is possible to aggregate the data from individual assessments in public examinations to obtain information on the system. To answer that question, we have to consider the more specific purposes of individual assessment and the implications of these purposes for the kind of assessment procedure used.

    Note: For a more extended treatment of this topic, see Kellaghan (1996a).

    In public examinations, information on student performance is used to make decisions about certification and selection, with selection tending to be the more important function (Kellaghan and Greaney 1992; Lockheed 1991). As a consequence, the assessment procedure or examination will attempt to achieve maximum discrimination for those students for whom the probability of selection is high. This is done by excluding items that are easy or of intermediate difficulty; if most students answered an item correctly, the item would not discriminate among the higher-scoring students. However, tests made up solely of more difficult questions will not cover the whole curriculum or even attempt to do so. The result is that public examinations may provide information on students' achievements on only limited aspects of a curriculum.

    The purpose of national assessment is to find out what all students know and do not know. Therefore, the instrument used must provide adequate curriculum coverage. From a policy perspective, the performance of students who do poorly on an assessment may be of greater interest than the performance of those who do well.

    Achievements of Interest

    There is some overlap in the student achievements identified as important by public examinations and national assessments. During the period of basic education, both certification and national assessment are based on information about basic literacy, numeracy, and reasoning skills. If we look at primary certificate (public) examinations, we find that many focus on a number of core subjects, and a glance at several national assessments indicates that they do the same. For example, students' knowledge of a national language and mathematics is included in all national assessment systems. However, no national assessment attempts the coverage found in public examinations at the secondary level, when students tend to select and specialize in subject areas. The subjects offered vary from one examination authority to another, but it is not unusual to find syllabi and examinations in twenty, thirty, or even more subjects.

    National assessments have focused on cognitive areas of development. Thailand (Prawalpruk 1996) and Chile (Himmel 1996) are among the relatively small number of education systems that have attempted to assess affective o