
Shaping Educational Accountability Systems

KATHERINE RYAN

ABSTRACT

The No Child Left Behind Act of 2001 (NCLB) institutionalizes the reliance on accountability and assessment systems as a key mechanism for improving student achievement (Linn, Baker, & Betebenner, 2002). However, there is a fundamental tension between performance measurement systems, which do serve stakeholders and public interests through monitoring, and these kinds of indicators where representations of program quality are oversimplified (Stake, 2001). Evaluators are uniquely situated to make a significant contribution to the dialogue about the merits and shortcomings of educational accountability systems. Suggestions concerning how evaluation can contribute to improving and changing accountability systems are presented.

The No Child Left Behind Act of 2001 (NCLB) institutionalizes the reliance on accountability and assessment systems as a key mechanism for improving student achievement (Linn et al., 2002). Under the leadership of Sandra Mathison and James Sanders, co-chairs of the Ad-hoc Committee on High Stakes Testing, the evaluation community (a group of people identifying themselves as evaluators who share and participate in common evaluative interests and concerns) developed a position paper on the high-stakes testing that is at the center of the state-mandated educational accountability systems that are sweeping across the United States (American Evaluation Association, 2001). In this paper, I suggest the evaluation community has the responsibility and obligation to go beyond position papers to play an integral part in the study of, improvement of, and judgments about the merit and worth of these assessment and accountability systems.

There are numerous difficulties in considering how to evaluate educational accountability systems, including what criteria should be used to evaluate the effectiveness of the system (Linn, 2001). Recently, the first set of standards designed for evaluating the current systems and for guiding the development of improved systems was proposed (Baker, Linn, Herman, & Koretz, 2002).

Katherine Ryan • Associate Professor, Department of Educational Psychology, University of Illinois, 230 Education, 1310 South Sixth Street, Champaign, IL 61820, USA; Tel: (1) 217-333-0719; E-mail: [email protected].

American Journal of Evaluation, Vol. 23, No. 4, 2002, pp. 453–468. All rights of reproduction in any form reserved. ISSN: 1098-2140 © 2002 by American Evaluation Association. Published by Elsevier Science Inc. All rights reserved.


Intended to provide a gauge for determining educational accountability quality, the goal of these standards is very important—a major step forward. While the standards are comprehensive in many respects, attending to system components and test standards, for instance, there is no standard that addresses how to find out and represent various stakeholder groups’ perspectives and values in the decisions about educational accountability systems and results.

Locating accountability in evaluation—the exact shape and substance of evaluation’s relationship to accountability—is one of the issues that crosscuts the discipline. There is a fundamental tension between performance measurement systems, which do serve stakeholders and public interests through monitoring, and the reductive nature of indicators where representations of program quality are oversimplified (Stake, 2001). Scholars have questioned evaluators’ role in these systems (Greene, 1999; Stronach, Halsall, & Hustler, 2002).

Greene’s (1999, 2001a) shifting position represents the kind of conflict evaluators experience when considering accountability and evaluation. While acknowledging the importance of accountability, Greene (1999) criticized performance measurement as inadequate for representing program quality, as at odds with democratically-oriented evaluative processes, and she questioned the role of the evaluator in performance measurement systems. In later work, while maintaining that performance measurement is inadequate, Greene (2001a) suggests that evaluators who have these kinds of concerns need to become involved in accountability evaluation, actively participating in and shaping accountability evaluation. Only by becoming involved can evaluators, first, “summon the best countenance of accountability” (Greene, 2001a, p. 2), changing the rules of the accountability evaluation game, and, at the same time, change the accountability game altogether (Greene, 2001a).

Greene poses an ambitious agenda. I propose a first step. Using evaluative imagination, I suggest that evaluators are uniquely situated to make a significant contribution to the dialogue about the merits and shortcomings of educational accountability systems. The “evaluative imagination,” as I use the term, is the collective wisdom residing within the evaluation community and its practices. By collective wisdom, I mean the assembled or accumulated insights and capacity to make judgments about the merit or worth—the quality of that which is being evaluated. These insights and capacity are reflected and embedded in the practices of evaluation.

Like all practices, the practices of evaluators and evaluation are articulated and tacit (Wenger, 1998). The practices of evaluation and evaluators include a language (e.g., evaluand, summative, and formative evaluation), theories (e.g., Mark, Henry, & Julnes, 2000; Stake, 1967; Weiss, 1998), procedures (elements in conducting an evaluation), rules of thumb (e.g., how to focus evaluation questions), contracts (to specify a scope of work with clients), tools (qualitative and quantitative methods), regulations (Joint Committee standards), and codes (AEA guiding principles). The tacit component involves shared world views (improving society), rules of thumb (when to stop collecting data), intuitions (when to distribute initial reports), how to handle ethical concerns, and the common sense of evaluation (Wenger, 1998).

Examples of the collective wisdom found in evaluation practices that can be especially helpful for educational accountability systems include evaluation theories, expertise in complementary applied social science methods, analytical frameworks, sensitivity to highly politicized contexts, and emerging skills in engaging what Greene (2001a) calls the relational dimensions of evaluation practice. The relational dimensions of evaluative practice involve “attending to the ways people relate to and communicate with one another in a given setting . . . [which] means attending to the moral, ethical, political dimensions of evaluation of that setting” (Greene, 2001a, p. 23).


Building on House and Howe’s (1999) deliberative democratic evaluation, the relational dimensions of evaluation are anchored in the notion that attending to the broader socio-political and value context of the evaluation is critical because evaluation reflects and sustains the kind of institutions in which it is embedded (Greene, 2001a; House & Howe, 1999). Greene says evaluators can contribute to changing the socio-political and value context by engaging in evaluation practices that are committed to democratic principles like justice, fairness, and equity, and by a critical examination of the relationships created and sustained by this context (Greene, 2001a).

In this paper, I examine how the evaluative imagination might be used for shaping educational accountability systems. While my primary focus is educational accountability systems at the school and district levels, I recognize that these systems have implications and consequences for students. I briefly describe the design features that anchor the current wave of educational accountability systems. Next I identify several concerns with the “black box” of educational accountability systems. Drawing on the evaluative imagination, I offer suggestions concerning how evaluation can contribute to improving accountability systems; these suggestions include doing accountability evaluation “right,” developing program theory, and using implementation theory to study the consequences of educational accountability. Finally, I present initial ideas on how evaluation and evaluators might change accountability systems. These suggestions include, for example, locating the pluralistic policy-shaping community in the educational accountability context and addressing stakeholder issues through inclusive dialogues and deliberations.

CONTEXT OF EDUCATIONAL ACCOUNTABILITY SYSTEMS

. . . Although it is commonplace to talk about education in the United States as 50 separate experiments, such a characterization suggests far more systemization, planning, and evaluation than exists . . . . We need a better basis for designing, evaluating, and redesigning assessment and accountability systems than we currently have. (Linn, 2001, p. 2)

Early in the history of evaluation, Cronbach and his colleagues (1980) rigorously examined the idea of accountability based on a rational management model. Such a model of accountability, according to Cronbach et al. (1980), focuses on efficiency, where information flows to the “responsible” or “accountable” manager or policy official who makes a decision that is then executed by subordinates. Cronbach et al.’s statement that “a demand for accountability is a sign of social pathology in the political system” (p. 4) goes to the heart of their position on this issue.

The current educational accountability design is a modern version of the rational management model. Educational accountability systems are constructed around the notion of managerial efficiency, focusing on how an accountability system can be designed to maximize learning. Alternatively, accountability systems can start with some kind of educational goal and ask what resources are needed to achieve the outcome relative to some districts in comparison to others1 (Ladd, 1996). The system design reflects an extant value commitment to efficiency.

While the details of the educational accountability systems differ, they are all characterized by the following features (Ladd, 1996; Linn, 2000). The systems (a) develop challenging content and performance standards that focus on student learning; (b) emphasize the measurement of student achievement as a basis for school accountability; (c) develop a technically complex apparatus as the mechanism for evaluating schools; and (d) introduce rewards, penalties, and interventions as incentives for improving achievement. At the same time, there are substantial differences in the dimensions of the current educational accountability systems (Elmore, Abelman, & Fuhrman, 1996; Linn, 2001). These dimensions can be categorized into four areas related to: (a) defining the concept of accountability; (b) assessing student progress; (c) addressing fairness issues from a technical perspective; and (d) defining the system consequences. The commonalities and differences in educational accountability systems present many issues. Below I offer a brief discussion highlighting key issues, organized around the differences in these systems.

Issues in Defining the Concept of Accountability

One of the biggest problems facing educational accountability systems is inherent to any accountability system. That is, what constitutes acceptable performance? This question involves consideration of accountability goals, levels, standard of accountability (fixed or relative), and how much emphasis is given to current performance and how much to improvement over time. The issue of what constitutes acceptable performance is especially complex in the context of educational accountability, because the frame of reference for interpreting performance in education (absolute or relative, norm-referenced, criterion-referenced, or standards-based interpretation, or some combination) has been and continues to be a topic of debate.

What is viewed as acceptable student performance, and whether this is a relative or absolute phenomenon, varies considerably. For example, Mississippi uses a relative approach involving minimum and maximum values. In Illinois, the standard of accountability is fixed; it is defined as 50% or more of the students meeting or exceeding standards as measured by the Illinois Standards Assessment Test (ISAT), the statewide assessment. The system does not specify why students should be expected to achieve at a prescribed level and rate of achievement, the basis for the standard, and what value would follow from that performance. These are important issues because they are critical for developing, understanding, and implementing a theory for improving student performance.

Issues in Assessing Student Progress

In terms of the assessments used in an accountability system, issues include the following:

• why assessments are assigned particular weights;
• what should actually be assessed (all subjects or just math and reading);
• when to assess (all grades or selected grades);
• how to assess performance (selected response only or a combination of multiple-choice and performance assessments, portfolios);
• how to aggregate scores into an omnibus performance index; and
• whether to have high school graduation exams.

Other concerns involve what indicators should count (dropout rate, attendance rates, special education placements, high school diploma testing, etc.); if so, how much they should count, and whether each indicator is a unique measure or should be combined with achievement measures into an overall performance index. The formulas used to calculate the general education index that is often used to assign sanctions or awards are rarely understood. For example, in Illinois the ISAT statewide assessment is the primary measurement of student outcomes. This large-scale assessment is composed of multiple-choice and, in some content areas (reading, writing, and mathematics), constructed-response items. The reading, writing, and mathematics portions are administered in Grades 3, 6, 8, and 10. The social studies and science assessments are currently composed of all multiple-choice items and are administered in Grades 4 and 7. No other performance indicators (e.g., attendance rates) are part of the designation system, though such indicators may be included in the revised system.
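To make the aggregation step concrete, the sketch below shows one way an omnibus performance index could be computed as a weighted average of subject-level results and compared to a fixed criterion. The subjects, weights, scores, and the 50% cutoff are hypothetical illustrations, not any state's actual formula.

```python
# A minimal sketch, assuming hypothetical weights and scores; not any state's
# actual formula. It combines subject-level results into one omnibus index
# and compares the index to a fixed criterion.

def performance_index(pct_meeting, weights):
    """Weighted average of the percent of students meeting standards."""
    total = sum(weights[subject] for subject in pct_meeting)
    return sum(pct_meeting[s] * weights[s] for s in pct_meeting) / total

school = {"reading": 48.0, "writing": 55.0, "mathematics": 44.0}  # % meeting or exceeding
weights = {"reading": 0.4, "writing": 0.2, "mathematics": 0.4}    # illustrative weights

index = performance_index(school, weights)
print(f"index = {index:.1f}, meets fixed 50% standard: {index >= 50.0}")
```

Even this toy version shows why such indices are hard to interpret: the rating a school receives depends as much on the weights chosen as on the underlying scores.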

Addressing Fairness Issues from a Technical Perspective

To date, the mechanism used for addressing the fairness issues involved in educational accountability systems is a technically complex apparatus. Two dimensions are related to the technical/fairness aspects of the accountability system. The first dimension addresses how the improvement of school performance is judged by changes in the achievement levels of different groups of students. There are three tracking approaches: longitudinal (tracking the same group of students across grades), quasi-longitudinal (comparing this year’s eighth grade students with the previous year’s seventh graders), or cohort (comparing Grades 4 or 8 with previous groups at the same level). The tracking systems are difficult to explain, volatile (e.g., cohort), and these methods give different results when answering the question of who has shown the most improvement (Carlson, 2000; Linn & Haug, 2002).
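The toy comparison below, with fabricated scores, illustrates how the cohort and quasi-longitudinal approaches can point in opposite directions for the same school; a longitudinal gain over matched student records is included for contrast. Which picture is "right" depends on the question being asked.

```python
# Toy illustration with fabricated scores: the three tracking approaches can
# answer "who improved?" differently for the same school.

scores = {  # (year, grade) -> mean scale score of the students tested that year
    (2001, 7): 210.0, (2001, 8): 228.0,
    (2002, 7): 215.0, (2002, 8): 224.0,
}

# Cohort: this year's grade 8 versus last year's grade 8 (different students).
cohort_gain = scores[(2002, 8)] - scores[(2001, 8)]

# Quasi-longitudinal: this year's grade 8 versus last year's grade 7
# (roughly the same cohort, but its membership can shift between years).
quasi_gain = scores[(2002, 8)] - scores[(2001, 7)]

# Longitudinal: the same matched students followed from grade 7 to grade 8.
matched = [(205.0, 221.0), (215.0, 227.0)]  # (grade 7 score, grade 8 score)
longitudinal_gain = sum(g8 - g7 for g7, g8 in matched) / len(matched)

print(f"cohort: {cohort_gain:+.1f}, quasi-longitudinal: {quasi_gain:+.1f}, "
      f"longitudinal: {longitudinal_gain:+.1f}")
```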

The second dimension relates to whether student socio-economic status (SES) or prior achievement is taken into account. More than any other group, local educators express concerns about the fairness of these systems (Elmore et al., 1996). Questions are raised about the transient students who may have just arrived when testing takes place, and about holding special education students accountable. Neither of these groups is likely to be equally distributed within a district. The question becomes whether or not districts should be held responsible for student performance regardless of students’ prior achievement and background characteristics.

There are two views on this issue. One perspective reflects the notion that no one should be held accountable for factors they cannot control (Cronbach et al., 1980). As a consequence, some states have adjusted for socio-economic status. For example, Pennsylvania has created what it calls the “similar schools” score band (10 schools scoring immediately below and above the target school) as a comparison group.

However, there is an equal concern that controlling for SES institutionalizes lower expectations for poor students (Clotfelter & Ladd, 1996; Linn, 2001). Because there is a correlation between SES and ethnicity, this approach has the effect of creating lower standards for African-American and Hispanic students (Linn, 2001). While imperfect, using prior achievement as a predictor in an accountability system is an alternative to a direct control for SES. Because SES is correlated with prior achievement, it provides some control for differences in SES while leaving intact factors that teachers and schools can influence.
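As a rough illustration of the prior-achievement alternative, the sketch below regresses hypothetical current school means on the previous year's means and reads the residual as the performance the school is credited with after prior achievement is taken into account. The data and the simple one-predictor least-squares model are assumptions for illustration only.

```python
# A rough sketch with hypothetical school means: regress current scores on
# prior scores and read the residual as the school's adjusted performance.

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
            sum((xi - mx) ** 2 for xi in x)
    return slope, my - slope * mx

prior = [190.0, 205.0, 220.0, 240.0]    # school mean, previous year (fabricated)
current = [198.0, 207.0, 231.0, 244.0]  # school mean, this year (fabricated)

slope, intercept = fit_line(prior, current)
for p, c in zip(prior, current):
    expected = intercept + slope * p
    print(f"prior={p:.0f} actual={c:.0f} expected={expected:.1f} "
          f"adjusted gain={c - expected:+.1f}")
```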

Both of these approaches are helpful in addressing this complex issue. However, they also illustrate the variety of difficulties encountered when attempting to use a technical solution to what are fundamentally fairness issues.

Addressing Intended and Unintended System Consequences

The intended system consequences involve rewards, such as whether and how increases in student achievement are recognized. The system consequences may also include sanctions or assistance (resources available for schools that fail to make adequate progress). Corruption of indicators, such as manipulating data (changing test scores or giving out answers to tests), primarily selecting indicators that can be easily manipulated (referrals to special education), avoiding constructs that are difficult to measure (citizenship), and avoiding measurement techniques that are difficult to implement (performance assessment and portfolio), are illustrations of unintended consequences. In turn, treatments may be subtly shaped by the need for indicators that are not easily manipulated, like narrowing of the curriculum (Stecher & Chun, 2001).

What draws attention when people consider consequences are the “high stakes” assessments, used for consequences such as grade promotion, certification, or the award of salary increases. Certainly, the consequences of high stakes assessments impact all students, teachers, and schools. While the goal of educational accountability systems is to improve teaching and learning for all, particular groups of students, teachers, and schools (e.g., low achieving, who are often low income) may be disproportionately affected by these consequences. For example, results from a recent investigation suggest high school graduation tests increase the probability that the lowest achieving students will drop out (Jacob, 2001).

There are other unintended consequences reflecting social pathology that are connected to the relationships of individuals and communities in the accountability context. For example, at the direction of the state, the Austin, Texas Public School system was indicted by the county prosecutor’s office for tampering with attendance data (attendance being the benchmark indicator generating district fiscal resources). How students experience these episodes and how this might further shape the school climate are unknown. The extent to which this kind of incident affects the relationships between and among schools, parents, the community, and the state has not yet been examined.

OPENING THE EDUCATIONAL ACCOUNTABILITY BLACK BOX

Although the current educational accountability systems resemble demonstration projects instead of experiments (Hanushek, 1996), their problems resemble those found during the evaluations of Lyndon Johnson’s Great Society programs. Like the “black box” evaluations during the Great Society initiative, problems of the current educational accountability systems include how to define improvement, how to assess student progress, how well the educational intervention and evaluation are implemented, and whether there are any unintended effects from the intervention.

The “black box” nature of these technically complex systems is problematic. However, it is symptomatic of a deeper structural problem with educational accountability systems. While the primary goal of these systems is to improve education, these systems do not provide information to administrators, teachers, parents, or students about what to do and how to improve performance. The standards-based, large scale assessments identify students who exceed, meet, or fail to meet the learning standards with minimal diagnostic information that might provide this kind of guidance. The treatment is testing—the idea is that educational assessments can themselves be the key to improving student achievement by directing teaching and learning. However, there is no information about what this teaching or learning should be beyond broad standards. Teachers and educators could use help in deciding what can be done to improve student learning and how to do it. While stakeholders like teachers are held responsible for improving student performance, the systems do not help them know how to get better.

What can be done to improve student learning and how to do it are really fundamental educational questions related to learning and beliefs about school (Elmore et al., 1996). However, this kind of program and implementation theory is rarely articulated with firm links to instructional, curricular, and assessment practices in the development and implementation of educational accountability systems. Without such links, judgments about performance and these practices are difficult for policymakers or educators, including teachers, to defend. In other words, the linking of proficient performance to performance in the workplace or to other educational outcomes is not specified. As a consequence, these systems are not likely to be viewed by these stakeholder groups as helpful.

Attending to fairness issues from a technical perspective, like taking student socio-economic status (SES) or prior achievement into account and defining the kind of tracking system used to rate improvement, is important. However, these technical solutions to fairness issues mask a more fundamental problem. It is the economic base of districts and the distribution of resources that is the largest concern. Poorer students and districts can be at a disadvantage in educational accountability systems.

It is not any easier to identify a “failed school” than it is a “failed student.” Test scores are not enough evidence that the problem resides in the student or in the school. The inequities of resource distribution are central in considering who has failed and understanding failure. For example, in Illinois, the correlation between school-level achievement and the poverty index (number of students eligible for free or reduced lunches in a school district) is .8 (poverty is related to lower achievement).

Resource distribution is one important key to thinking about how to improve school performance as well. Money matters. Student learning is affected by per pupil spending, especially in districts with minimal resources (Ferguson & Ladd, 1996). When a school or district fails to make adequate progress, the educational accountability systems differ substantially in the kinds and extent of resources available for implementing and sustaining changes (Elmore et al., 1996). The resources can be classified as: (a) human resources (e.g., consultants, field visits, interviews); (b) professional development (training programs to improve educators’ skills); and (c) financial resources (funds provided to troubled schools for extra services). The actual assistance available is, in some cases, modest. Some educational accountability systems provide only human resources to schools for making changes.

There is no technical solution to the unequal distribution of resources because it is not a technical problem. Instead, it is a practical moral problem involving wise and prudent decision making (Mark et al., 2000; Schwandt, 2001). This is a matter of understanding what can be and will be done to improve instruction in general and particularly for low-performing schools.

CONTRIBUTIONS FROM THE EVALUATIVE IMAGINATION

While there are no doubt other areas in which to contribute, the evaluative imagination can make contributions to this discourse for considering what to do and how to improve student performance and for thinking about the practical problems involved in fairness issues. It is the evaluation theories, analytic frameworks, and applied science methods that have much to offer in thinking about how to improve educational accountability systems. While less extensive, sensitivity to highly politicized contexts and attention to the relational dimensions of evaluation provide some resources for considering how to change these accountability systems.

Addressing moral, practical problems like fairness issues is complex and challenging. Much of what is said here will be tentative and has to be tackled more deeply in theoretical, conceptual, and empirical investigations. Issues of fairness are not solved with evaluation theories, analytic frameworks, and applied social science methods, although they may be helpful. The available collective wisdom from the evaluation community for addressing moral, practical problems, although important, is modest. Sensitivity to politicized contexts and attention to the relational dimensions of evaluation offer some ideas about how to begin.

Improving Educational Accountability Systems

Do accountability evaluation right. Educational accountability systems appear to be a variant of Wholey’s accountability evaluation approach—an approach that is much more than standardized test scores and outcomes. Wholey (1994) emphasizes that, until the program definition is clearly articulated, programs are not ready to be evaluated and that indicators cannot be selected. In spite of the fact that the educational outcomes are specified and elaborate performance indices are calculated, it is not clear that the “educational program” is defined or that the logic of the program is sufficiently well developed at this stage to warrant the use of educational outcomes.

In their assisted sense-making approach, Mark et al. (2000) recommend an array of evaluative methods for studying programs and policies. Given the state of knowledge about these educational accountability systems, descriptive and classification studies would be useful. For example, descriptions of district resources, service delivery (involving the amount and type of educational services and how instruction is delivered), and program context (size, demographics, and economic base) would be a major step toward developing the program theories that underlie schools’ current operations. Especially critical to the Mark et al. (2000) approach to description is the use of mixed methods, like structured and less structured observations, interviews, focus groups, and surveys, for developing these descriptions. The mixed methods permit the development of a much richer description in comparison to performance measurement, the most common current descriptive method. The rich descriptions offer the possibility of disentangling some key contextual differences in the implementation of educational programs.

Classification can be used to examine the structures of the current educational accountability systems (Mark et al., 2000). There is probably an underlying typology that differentiates among these systems that can be studied with case study and factor analysis methods. Wholey (personal communication, October 6, 1998) also recommends studying a small number of programs as case studies as a supplement to performance indicators. Such case studies can illustrate what exemplary programs look like and what they do. It is equally important to include case studies of programs with significant problems. In the context of educational accountability systems, case studies across and within states would be useful. While there are studies of accountability systems or dimensions of systems at the state level (e.g., Clotfelter & Ladd, 1996; Guskey, 1994), there are few studies investigating within-state differences (e.g., why one district is clearly successful and another fails).
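A classification study of this kind starts from a systematic coding of system design features. The sketch below is a hypothetical illustration: the feature list, the coded "systems," and the simple matching similarity stand in for the richer coding and the factor or cluster analysis such a study would actually use.

```python
# A hypothetical coding of system design features, illustration only; a real
# classification study would code many more dimensions for actual states and
# apply factor or cluster analysis to the resulting matrix.

features = ["fixed_standard", "ses_adjusted", "cohort_tracking",
            "exit_exam", "cash_rewards"]

systems = {  # fabricated profiles; columns follow the order in `features`
    "System A": [1, 0, 1, 1, 0],
    "System B": [1, 0, 1, 1, 1],
    "System C": [0, 1, 0, 0, 0],
}

def matching_similarity(a, b):
    """Proportion of design features on which two systems agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

names = list(systems)
for i, s1 in enumerate(names):
    for s2 in names[i + 1:]:
        print(f"{s1} vs {s2}: {matching_similarity(systems[s1], systems[s2]):.2f}")
```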

Causal analysis, a third inquiry mode (Mark et al., 2000), could be used to estimate the impact of different educational treatments (e.g., the cost and extent of fiscal and human resources, class size) on student learning. Identifying the cause of the effects is also important for determining the particular contexts in which the treatment will be successful. For example, knowing why a smaller class size improves student achievement is just as important as knowing that it does.
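The fabricated example below illustrates the kind of question causal analysis can address here: not just whether smaller classes raise achievement, but whether the effect differs by context (in this sketch, low- versus high-resource districts). The data and the simple difference-of-means estimator are assumptions for illustration; a real study would rely on far stronger designs and controls.

```python
# Fabricated numbers, illustration only: is the class-size effect the same in
# low- and high-resource districts? A real causal analysis would use a much
# stronger design (e.g., random assignment or rich covariate adjustment).

# Each record: (small_class, low_resource_district, achievement)
records = [
    (1, 1, 62.0), (0, 1, 54.0), (1, 1, 60.0), (0, 1, 55.0),
    (1, 0, 71.0), (0, 0, 70.0), (1, 0, 72.0), (0, 0, 69.0),
]

def effect(rows):
    """Mean achievement difference, small classes minus regular classes."""
    treated = [a for s, _, a in rows if s == 1]
    control = [a for s, _, a in rows if s == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

low = [r for r in records if r[1] == 1]
high = [r for r in records if r[1] == 0]
print(f"effect in low-resource districts:  {effect(low):+.1f} points")
print(f"effect in high-resource districts: {effect(high):+.1f} points")
```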


Develop program theory. The use of program theory to open up the “black box” of educational accountability systems would provide a mechanism for reflecting on what to do to improve performance (Weiss, 1998). In effect, the newest reform in education is testing as a “lever to change classroom instruction and may be implemented with either a high-stakes or low-stakes assessment” or treatment (Heubert & Hauser, 1999, p. 36; also see Linn, 2000). So the treatment is testing—educational assessments can be the key to improving student achievement by directing teaching and learning. Program theory can be used to question whether testing alone is an adequate treatment for improving student achievement. Building on program theory, school improvement plans can be developed not only to specify what schools are going to achieve, but also how they will achieve it, including the resources needed to accomplish educational goals.

Broaden accountability constructs through mixing methods. The content and performance standards that are the foundation for educational accountability systems emphasize high achievement, including complex understanding of subject areas and higher-order thinking (Ladd, 1996). No single assessment method can adequately represent these student outcomes. These kinds of complex understandings are best represented by the problems and activities that students engage in, the kinds of thinking these activities advance, and the kinds of questions students ask and answer, not multiple choice tests. Activities that reveal the connections that students make between what they study in class and the outside world (learning transfer) represent the complex understandings of high achievement. There is a clear articulation between these kinds of complex understanding (what to do to improve student performance) and assessment practices like performance assessment and portfolios. However, these are measurement techniques that are difficult and expensive to implement on a large scale. This kind of educational evaluation is labor intensive and time consuming (Eisner, 2001).

Test scores have a place in the representation of student achievement. However, using multiple methods to assess the same student outcome and multiple methods to measure different student outcomes provides a much broader picture of student achievement (Greene, 2001a). Multiple methods are fairer, giving students multiple opportunities to show what they know and can do. With multiple assessment methods, teachers can disentangle what students know and can do on a test from what students know and can do in class.

Study educational accountability system consequences with implementation theory. Implementation theory, anchored in the notion that if program activities are conducted as planned, the desired results will be achieved, is one approach to studying the intended and unintended consequences of accountability systems (Weiss, 1998). Theory alone is not enough to produce change. Developing a theory of implementation for delivering educational programs focuses on how to improve student performance—the specifics of learning. That is, the focus is on the instruction, curriculum, and assessments that will be provided to students. Whether the plan is communicated and then implemented as intended, as well as the actual quality and extent of the instruction, curriculum, and assessments, are as important as program theory. Including a theory of action specifies the causal links that tie program inputs to program outputs, making explicit the logical connection between program theory and program implementation (Weiss, 1998).

At this stage, the study of unintended consequences can be considered. For example, the intended effect of the implementation of content and performance standards and their measurement with large-scale assessment is to increase student achievement. A potential unintended effect is that teachers may narrow their teaching and classroom assessment practices. Reflecting on these kinds of implementation details leads the way for systematic study of educational accountability consequences.

Changing Educational Accountability Systems

Accountability for educational outcomes should be a shared responsibility of states, school districts, public officials, educators, parents, and students. High standards cannot be established and maintained merely by imposing them on students. (Heubert & Hauser, 1999, p. 5)

Democracy is functioning well when every party learns how a pending decision would affect his or her interests, and feels that the decision process is being suitably sensitive to them. (Cronbach, 1988, p. 7)

The idea that evaluation is a location of democracy is not new, and it is supported by evaluators from differing theoretical perspectives (Cronbach et al., 1980; Greene, 2001a; House & Howe, 1999; Mark et al., 2000; McDonald, 1976). There are several notions about how to enact democratic processes within evaluation. Such issues as who should make policy, how to find out and represent various stakeholder groups’ values, and the relative position of stakeholders within the evaluation context are still widely discussed and debated. Values inquiry, which “attempts to identify the values relevant to social programs and policies and to infuse them into evaluations” (Mark et al., 2000, p. 40), is a major approach for enabling democratic processes. Democratically oriented approaches characterized by inclusion, dialogue, and deliberation are another (Abma, 2001; Greene, 2001b; House & Howe, 1999; Mathison, 2000; Ryan & DeStefano, 2001).

Locate the pluralistic policy-shaping community in the socio-political context. Identifying potential stakeholder groups and audiences is the first step to enacting democratic processes in educational accountability systems. Cronbach and his colleagues (1980) favored a collective vision of how policy is shaped rather than made in the socio-political context. They proposed a pluralistic policy-shaping community (stakeholders who shape policy through interaction) rooted in notions of democratic participation. This community examines policy alternatives, discusses various sides, and, through a process of accommodation, decides collectively what is the best course of action. In principle, there are various groups who are possible stakeholders and audiences in educational accountability systems.2 The policy-shaping community for educational accountability systems can be adapted from the analysis of Cronbach and his colleagues (1980).

Members of the public include two groups: individuals directly affected by the assessment or system consequences (parents, students) and illuminators (journalists, academicians) (Cronbach et al., 1980). Parents and students have a stake in how the assessment or accountability system validation affects their interests, such as whether high school diplomas are awarded based on assessment performance. The illuminators interpret and communicate information about the educational accountability systems. They are interested in disentangling or perhaps re-entangling issues surrounding educational accountability systems. Like Mark et al. (2000), I would add a third group, the general public (ordinary citizens interested in or affected by these concerns and representing multiple viewpoints). There are many other possible stakeholder groups, including combinations of groups. For instance, in Illinois there is a coalition between large school districts and the business community. The coalition is actively lobbying for increased accountability.


The different stakeholder groups and audiences within the accountability system include policymakers at the federal and/or state level, school officials (superintendents and principals), and those involved in direct instruction (teachers). Federal and state policy makers are concerned about educational outcomes—broad goals and standards and how they impact policy. Their assessment interests are focused on the measurement of these educational outcomes. Superintendents and principals are responsible for the administration and daily operations of the educational programs. They are held accountable for seeing that there is progress toward meeting educational standards and outcomes that are now represented as increases in test scores and other indicators.

Teachers and other direct service personnel are responsible for instruction. There is a move toward holding teachers responsible for increases in assessment scores and other indicators by rewarding such improvements with pay increases and bonuses. Teachers (and probably principals and superintendents) are interested in how and whether high-stakes assessments can adequately measure student learning.

The current educational accountability systems basically represent policymakers’ views. If schools are managed well and doing a good job (managerial efficiency), then students will learn more and demonstrate this improvement by higher scores on standardized achievement tests (outcome). Stakeholder groups like teachers, school administrators, parents, and students may have different notions about accountability. For example, one or more of these groups may not agree that standardized test scores can adequately represent how much students know and can do or that simply instituting accountability measures is enough to improve schools and student performance.

There is a moderate amount of evidence which suggests there are differences between policymakers’ views of educational accountability and the views of other stakeholder groups. Findings from surveys and focus groups suggest teachers are concerned about a number of issues related to educational accountability, such as narrowing of the curriculum, devoting time to test preparation instead of broader goals, and spending more instructional time on the content areas tested, and that teachers feel pressure for students to do well on these tests (Chudowsky & Behuniak, 1997; Koretz, Barron, Mitchell, & Stecher, 1996; Stecher & Chun, 2001). A recent, in-depth public opinion poll (ETS, 2001) found all demographic groups (including white, Hispanic, and African-American parents, educators, and 200 education policy makers) thought standardized testing was important but that there should be no over-reliance on tests. Further, all groups were willing to spend real money hiring more teachers, reducing class size, paying teachers more, and building new buildings or repairing old ones.

Examine the relationships among the policy-shaping community members. Evaluators bring a critical perspective to educational accountability systems by going beyond collecting and presenting stakeholders’ opinions to developing a deep understanding of the political and historical context of these systems (House & Howe, 1999). Understanding the political and historical context of these systems involves examining values underlying the system, the way these values mark the relationships that structure these systems (Greene, 2001a), and the power relations. A brief (and incomplete) analysis of the entangled relations in educational accountability systems illustrates what attention to the relational aspects might look like.

Currently, educational accountability systems are structured by a hierarchical approach to improving educational achievement, reflecting a climate of control. Policy makers are committed to using an efficiency model that uses indicators as measures of productivity (gains in standardized achievement scores represent increased productivity of schools). There is a sense that districts and schools need to be monitored closely by federal and state policy makers. School administrators and teachers are under pressure to be “accountable,” particularly to the state and federal levels.

The disaggregation of scores by socio-economic status, race/ethnicity, and language background has highlighted the disparities in educational achievement and opportunities to learn. This is positive, in many respects, because these groups are now targeted as a priority for improved educational outcomes. However, the necessary resources (financial and instructional) for improvement are not always available. The social costs of the policy, such as increases in dropout rate, are receiving less attention (Jacob, 2001). That the public sees standardized testing as important, but thinks there should be no over-reliance on tests, is also a lesser priority (ETS, 2001). The power relations are particularly entangled. Policymakers who serve the public are accountable to the policy-shaping community. The policy-shaping community includes particular groups of students, teachers, and schools (e.g., low income) who may be disproportionately affected by the consequences (e.g., grade promotion).

Evaluators are not separate from these relationships, whether they see themselves as a key player (Greene, 2001a) or more like a member of the press, as a provider of evaluative information and processes (Henry, 2001; Mark, 2002). Evaluators are actors in shaping the character of these relationships. Deciding what should be done in educational accountability systems often involves value choices. Who and what is right? The challenge becomes how to develop an educational accountability system with a more democratic character. An examination of values and preferences is a first step in developing this democratic character with an aim of deciding what ought to be done.

Address stakeholder issues through inclusive dialogues and deliberations about practical problems. Educational accountability systems might be substantially different if the perspectives of this policy-shaping community were included. Take high school graduation exams. The issues surrounding these exams are highly complicated and value-laden. High school graduation exams are related to an increase in the dropout rate (Jacob, 2001). The selection of constructs to be measured for high school exit exams is critical, especially where group differences (gender and ethnic groups) in test scores are at issue (Willingham & Cole, 1997). For example, depending on what constructs are measured (e.g., writing or science) and how constructs are measured (constructed response or multiple choice), the high school exit passage rates for females and males will shift.

In this next section, I present three different approaches for including stakeholders’ perspectives in educational accountability systems. I do not intend to argue for one approach over another—that there is a correct choice among the three. Each approach has promise as well as risks.

Values inquiry can be used to clarify and balance values (Mark et al., 2000). Teachers, principals, and school superintendents could be surveyed to identify various meanings of a high school diploma, the most important aspects of the construct, and whether they regard test scores as adequate measures of what a high school diploma represents. Parents or other interested groups could also participate. While passive, this opens up a conversation about the meaning of a high school diploma, what constructs are important, and how to broaden outcomes. Focus groups and group interviews are additional values inquiry methods (Mark et al., 2000). Within the values inquiry approach, the information may be used, for example, to make decisions about what kinds of outcomes to include in the accountability system. This method has been used with success and can be implemented at any level—federal, state, or local (Mark et al., 2000).


House and Howe (1999) suggest going beyond values inquiry to a more active approach. Stakeholders and others engage in inclusive dialogues where they deliberate together using reason and evidence to reach decisions about what ought to be done. Dialogues are intended for middle democracy settings, where citizens come together on a regular basis to reach collective decisions about public issues (Gutmann & Thompson, 1996). Stakeholder interests are uncovered, constructed, re-constructed, and constituted in the dialogical process. There is particular emphasis on finding stakeholder groups and perspectives (more than usually included) and ensuring these stakeholder groups are constructed and represented in conversations and decisions in some fashion.

Within this framework, dialogue is procedural, an instrument of reason (Schwandt, 2001). The dialogical process may take the form of a debate, while cultivating tolerance and respect for all perspectives. There is a commitment to address issues of concern and to come to a consensus with a solution that is agreeable in part to all. This kind of work may be done in committees, focus groups, one-shot or multiple public forums in combination with surveys, or other methods.

Using the example above, stakeholder views on high school exit exams may emerge and shift through this deliberative process. Through this deliberation process, policy makers might decide that broadening student outcomes for high school exit exams is critical. Schools will be given more resources and 3 years to improve before the exit exam is implemented with any consequences. After listening to members of the business community, parents or teachers may decide high school exit exams are critical for making the high school diploma meaningful. A consensus would be reached; although all may not agree, they could state their position.

This is the optimistic scenario. Implementing dialogues in practice can be a challenge. There is some evidence suggesting dialogues can lead to greater understanding among stakeholder groups (Abma, 2001; Karlsson, 2001; Ryan & DeStefano, 2001). On the other hand, dialogues between stakeholder groups carry risks. Findings from a case study suggest these kinds of dialogues may lead to the entrenchment of stakeholders’ positions and acrimony, and do not always produce consensus (Greene, 2000).

While there is substantial overlap between the two conceptualizations, Greene (2001a, 2001b) shifts from a procedural notion of dialogue towards a substantive notion involving a return to a “moral discourse” in evaluation (Greene, 2001a; Schwandt, 1989, 2001). While there is some empirical evidence describing what a substantive dialogue looks like in practice (Greene, 2001b), this approach has not been fully implemented. Accordingly, my discussion of this approach is primarily confined to a theoretical description—awaiting a complete test in practice.

Substantive dialogue moves to an ethical, moral space about how one ought to be and how to understand one another—making explicit that these are ethical, moral decisions. The evaluator is an interlocutor (Schwandt, 2001) enabling democratic participation in dialogue involving “. . . a value commitment to engagement . . . with the challenges of difference and diversity in practices” (Greene, 2001b, p. 181). The two dialogue methods are similar but the process changes.

With the substantive approach, through dialogue, evaluation becomes a site where stakeholders’ participation is enabled through conversation about issues of concern. Stakeholders and other groups put aside narrow self-interest and address issues among themselves through respectful, reciprocal conversation. Through dialogue, the notion of what constitutes “higher authority” or “legitimate knowledge” is broadened beyond, for example, scientific rationality, legitimizing the voices of marginalized stakeholder groups. While dialogue in evaluation can be justified on its own merit as a democratizing practice, there is an implicit assumption that better and more democratic outcomes are achieved. To date, there is no evidence that this is the case.


I now present a brief example illustrating the optimistic view of how this approach will play out in practice. The issues in the high school exit exam case will be viewed differently depending on people’s values (Cole & Zieky, 2001). When low-achieving students do not pass high school graduation tests, some people or groups will consider this to be necessary for a high school diploma to have meaning. Others will see this as unfair to the student and consider the failure to reflect other factors, such as poor schooling or a lack of school funding. Unlike current decisions reached in educational standards committees that emphasize consensus with minimal opportunities for disagreement (Moss & Schultz, 2001), a substantive dialogue on this topic would aim for all participants to have a better understanding of each other’s positions. Here dissensus is not a problem to overcome; instead, it is a resource for examining taken-for-granted beliefs and practices (Moss & Schultz, 2001). A space may be created for a minority opinion. In addition to the substance of the dialogue (whether there should be high school exit exams), the nature of the school and community relationships that are created by enacting high school exit exams would also be part of the dialogue. The kinds of relationships created by high school exit exams in poor schools would be highlighted and compared to those created in wealthier districts.

CONCLUSION

Educational accountability indicators are an important part of the administration of educational programs. However, these indicators oversimplify complex educational outcomes and there may be an over-reliance on these systems for improving student performance. A recent, noteworthy development is the first set of standards designed for evaluating the current systems and for guiding the development of improved systems (Baker et al., 2002). Standard 3 of these new standards makes it clear that students, administrators, and policymakers have a shared responsibility for achieving the expected system results. There is no standard, however, that addresses how to find out and represent various stakeholder groups’ perspectives and values, who decides, and how to decide what these results should be. Evaluators have the necessary analytic frameworks, methodological expertise, sensitivity to highly politicized contexts, and emerging skills in the relational, moral, and ethical dimensions of evaluation to improve and change that. Enacting democratic processes in educational accountability systems may contribute to the development of new and different public policies or programs.

NOTES

1. I am considering only accountability systems that are developed within existing structures, including giving parents school choice within the public school system. Achieving educational accountability through vouchers to private schools is not addressed.

2. I have identified a slightly different policy-shaping community as the stakeholders and audiences in the high-stakes assessment validation process (Ryan, 2002).

REFERENCES

Abma, T. A. (2001). Reflexive dialogues: A story about the development of injury prevention in two performing arts schools. Evaluation, 7, 238–254.


American Evaluation Association. (2001). American Evaluation Association position statement on high stakes testing in Pre K-12 education. Retrieved March 30, 2002, from http://www.eval.org/hst3.htm

Baker, E. L., Linn, R. L., Herman, J. L., & Koretz, D. (2002). Standards for educational accountability systems (Policy Brief No. 5). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Carlson, D. (2000). All students or the ones we taught? Paper presented at the Thirtieth Annual National Conference on Large Scale Assessment, Council of Chief State School Officers, Snowbird, UT.

Chudowsky, N., & Behuniak, P. (1997). Establishing the consequential validity for large scale performance assessments. Paper presented at the National Council on Measurement in Education, Chicago.

Clotfelter, C. T., & Ladd, H. F. (1996). Recognizing and rewarding success in public schools. In H. F. Ladd (Ed.), Holding schools accountable: Performance-based reform in education (pp. 23–63). Washington, DC: Brookings Institution.

Cole, N. S., & Zieky, M. J. (2001). The new faces of fairness. Journal of Educational Measurement, 38, 369–382.

Cronbach, L. J., Ambron, S. R., Dornbusch, S. M., Hess, R. D., Hornik, R. C., Phillips, D. C., Walker, D. F., & Weiner, S. S. (1980). Toward a reform of program evaluation: Aims, methods, and institutional arrangements. San Francisco, CA: Jossey-Bass.

Cronbach, L. J. (1988). Five perspectives on validity argument. In H. Wainer & H. Braun (Eds.), Test validity. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Educational Testing Service. (2001). A measured response: Americans speak on education reform. Retrieved October 10, 2001, from http://ets.org/aboutets/news/01052401.html

Eisner, E. W. (2001). What does it mean to say a school is doing well? Phi Delta Kappan, 82, 367–372.
Elmore, R. F., Abelman, C. H., & Fuhrman, S. H. (1996). The new accountability in state education reform: From process to performance. In H. F. Ladd (Ed.), Holding schools accountable: Performance-based reform in education (pp. 65–98). Washington, DC: Brookings Institution.

Ferguson, R. F., & Ladd, H. F. (1996). How and why money matters. In H. F. Ladd (Ed.), Holding schools accountable: Performance-based reform in education (pp. 265–298). Washington, DC: Brookings Institution.

Greene, J. C. (1999). The inequality of performance measurements. Evaluation, 5, 160–172.
Greene, J. C. (2000). Voices in conversation? Struggling for inclusion. In K. Ryan & L. DeStefano (Eds.), Evaluation as a democratic process: Promoting inclusion, dialogue, and deliberation (pp. 13–26). New Directions for Evaluation (No. 85). San Francisco, CA: Jossey-Bass.

Greene, J. C. (2001a, January 25). Beyond accountability. Keynote address at the annual meeting of the Southeast Evaluation Association, Tallahassee, FL.

Greene, J. C. (2001b). Dialogue in evaluation: A relational perspective. Evaluation, 7, 181–187.
Guskey, T. R. (Ed.). (1994). High stakes performance assessment: Perspectives on Kentucky’s educational reform. Thousand Oaks, CA: Corwin Press, Inc.
Gutmann, A., & Thompson, D. (1996). Democracy and disagreement. Cambridge, MA: The Belknap Press of Harvard University Press.
Hanushek, E. A. (1996). Comments on chapters two, three, and four. In H. F. Ladd (Ed.), Holding schools accountable: Performance-based reform in education (pp. 128–136). Washington, DC: Brookings Institution.

Henry, G. T. (2001). How modern democracies are shaping evaluation and the emerging challenges for evaluation. American Journal of Evaluation, 22, 419–430.

Heubert, J. P., & Hauser, R. M. (Eds.). (1999). High stakes: Testing for tracking, promotion, and graduation. Washington, DC: National Academy Press.

House, E., & Howe, K. (1999). Values in evaluation and social research. Thousand Oaks, CA: Sage.
Jacob, B. A. (2001). Getting tough? The impact of high school graduation exams. Educational Evaluation and Policy Analysis, 23, 99–121.
Karlsson, O. (2001). Critical dialogue: Its value and meaning. Evaluation, 7, 188–203.


Koretz, D. M., Barron, S., Mitchell, K. J., & Stecher, B. M. (1996). Perceived effects of the Kentucky Instructional Results Information System (MR-792-PCT/FF). Santa Monica, CA: RAND.

Ladd, H. F. (Ed.). (1996). Holding schools accountable: Performance-based reform in education. Washington, DC: Brookings Institution.

Linn, R. (2000). Assessments and accountability. Educational Researcher, 29(2), 4–16.
Linn, R. (2001). The design and evaluation of educational assessment and accountability systems (CSE Tech. Rep. No. 509). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Linn, R. L., Baker, E. L., & Betebenner, D. W. (2002). Accountability systems: Implications of requirements of the No Child Left Behind Act of 2001 (CSE Tech. Rep. No. 567). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Linn, R. L., & Haug, C. (2002). Stability of school-building accountability scores and gains. Educational Evaluation and Policy Analysis, 24(2), 29–36.

Mark, M. M. (2002). Toward a classification of different evaluator roles. In K. E. Ryan & T. A. Schwandt (Eds.), Exploring evaluator role and identity. Greenwich, CT: Information Age Publishing.

Mark, M. M., Henry, G., & Julnes, G. (2000). Evaluation: An integrated framework for understanding, guiding, and improving policies and programs. San Francisco, CA: Jossey-Bass.

Mathison, S. (2000). Deliberation, evaluation, and democracy. In K. Ryan & L. DeStefano (Eds.), Evaluation as a democratic process: Promoting inclusion, dialogue, and deliberation (pp. 85–90). New Directions for Evaluation (No. 85). San Francisco, CA: Jossey-Bass.

McDonald, B. (1976). Evaluation and the control of education. In D. Tawney (Ed.), Curriculum evaluation today: Trends and implications (pp. 125–134). London: Macmillan Education.

Moss, P. A., & Schultz, A. (2001). Educational standards, assessment, and the search for consensus. American Educational Research Journal, 38, 37–70.

Ryan, K. E. (2002). Assessment validation in the context of high-stakes assessment. Educational Measurement: Issues and Practice, 17, 7–15.

Ryan, K. E., & DeStefano, L. (2001). Dialogue as a democratizing evaluation method. Evaluation, 7, 195–210.

Schwandt, T. A. (1989). Recapturing moral discourse in evaluation. Educational Researcher, 18(8), 11–16.

Schwandt, T. A. (2001). A postscript on thinking about dialogue. Evaluation, 7, 264–276.
Stake, R. E. (1967). The countenance of educational evaluation. Teachers College Record, 68, 523–540.
Stake, R. E. (2001). How modern democracies are shaping evaluation and the emerging challenges for evaluation. American Journal of Evaluation, 22, 349–354.
Stecher, B., & Chun, T. (2001). School and classroom practices during two years of educational reform in Washington state (CSE Tech. Rep. No. 550). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Stronach, I., Halsall, R., & Hustler, D. (2002). Future imperfect: Evaluation in dystopian times. In K. E. Ryan & T. A. Schwandt (Eds.), Exploring evaluator role and identity. Greenwich, CT: Information Age Publishing.

Weiss, C. H. (1998). Evaluation: Methods for studying programs and policies. Upper Saddle River, NJ: Prentice-Hall.

Wenger, E. (1998). Communities of practice: Learning, meaning, and identity. Cambridge: Cambridge University Press.

Wholey, J. S. (1994). Assessing the feasibility and likely usefulness of evaluation. In J. Wholey, H. Hatry, & K. Newcomer (Eds.), Handbook of practical program evaluation (pp. 15–39). San Francisco, CA: Jossey-Bass.

Willingham, W. W., & Cole, N. S. (1997). Gender and fair assessment. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.