

ELSEVIER Studies in Educational Evaluation 30 (2004) 325-336

Studies in Educational Evaluation

www.elsevier.com/stueduc

UNPACKING THE EVALUATION PROCESS: A STUDY OF TRANSITIONAL BILINGUAL EDUCATION

Carolyn Huie Hofstetter

University of California, Berkeley

Abstract

This article examines the process of an ongoing, independent evaluation of a transitional Spanish/English bilingual program housed at a large, urban school district in Northern California. The program is designed to enhance Kindergarten through Grade 5 (K-5) students' English language proficiency, as well as their English performance in academic subject areas. The article looks at how the evaluation team addressed various political, methodological, and theoretical challenges, including multiple stakeholders, small sample sizes, difficulty establishing well-defined "treatment" and "control" groups, and questionable instrumentation. Finally, we discuss how participation in a highly dynamic, interactive evaluation process prompted the school district to engage in activities and to make decisions that ultimately served both the evaluation as well as the district as a whole.

This article provides an overview of an ongoing longitudinal, multi-site evaluation of a transitional bilingual education program in an urban school district in northern California. Rather than present a summary of the research and findings, we focus on describing the evaluation process, including decision-points, political considerations, and stakeholder issues. The context is an evaluation study of a Kindergarten through Grade 5 (K-5) bilingual education program for Spanish/English learners and its effects on students' English language acquisition as well as their English performance in academic subject areas, in comparison to similar students in Structured English Immersion (SEI) classes. Some schools had also implemented the well-established reading program, Success for All (SFA), which served as a major confounding influence.

0191-491X/04/$ - see front matter © 2004 Published by Elsevier Science Ltd. doi: 10.1016/j.stueduc.2004.11.005

Systematic evaluation of bilingual education programs in the United States is crucial. The population of English learners is growing rapidly and is increasingly acknowledged in policy discussions. Accountability and high standards are the hallmarks of recent educational policies, including the No Child Left Behind Act of 2001. The goal is to improve teacher effectiveness and to raise student achievement, largely through the selection and implementation of highly effective educational programs. Programs based on scientifically based research (SBR) are desirable, particularly those that meet the following criteria: 1) theoretical base of the program or practice; 2) evidence of effects; and 3) implementation and replicability.

Within this context, we look at various political, methodological, and theoretical challenges encountered in the study, as well as how the evaluation team addressed them. We begin with a description of the policy context surrounding the initial call for the evaluation study, followed by a description of the study itself, including the program overviews, decision points during the evaluation process, and various data and logistical issues. By describing this process, we hope to inform future evaluations of similar programs, and perhaps to provide theoretical implications for evaluation use, misuse and stakeholder involvement.

The Evaluation

The current study stems from a U.S. federal desegregation court case, whereby a large urban school district in Northern California was found guilty of intentional segregation of Latino students. In 1997-98, as part of a court-imposed voluntary integration plan, the school district staff, building on an existing program, developed and implemented a K-5 transitional bilingual education program, known as Academic Language Acquisition (ALA). The federal court mandated an evaluation of the ALA program to determine its effectiveness in influencing English language development among English learners and in maintaining grade-level student achievement. Further, because ALA was implemented as part of a federal case, its presence did not violate California state Proposition 227, which required parental waivers for children to participate in such programs and curtailed the presence of bilingual education programs throughout the state.

In 1999, staff from the school district contracted with evaluators from the University of California, Berkeley to conduct an evaluation of this program. As part of the contract, the evaluation design was to be guided by requirements in the federal consent decree, stating that the evaluation would have the following components: (a) longitudinal in nature; (b) 1999-2000 would be considered the first year of implementation; (c) specific outcome measures in English and Spanish should be used, and (d) creation of a longitudinal data set. To the extent possible, data collection efforts overlapped with efforts conducted by the court compliance monitor. Findings from a validation study conducted by the compliance monitor were incorporated into the evaluation study as appropriate. Outside of these requirements, the evaluation team and school district representatives discussed all aspects of the evaluation.

In its original conception, school district representatives wanted the evaluation to focus entirely on the ALA program versus the alternative program option for English learners, Structured English Immersion (SEI). In early discussions with the evaluation team, however, the school district realized that its primary interest was whether the two programs met their respective goals, rather than comparing approaches to determine which was the most effective. This change in perspective stemmed from several factors. First, despite conceptual and pedagogical overlap between the two programs, ALA and SEI were philosophically different in intent and thus had different purposes and goals, so a direct comparison was problematic. Second, although some comparison would inevitably be made between the two programs, the scope of the evaluation could not encompass this level of data collection and analysis. Third, on examination of the federal consent decree, it was clear that the federal judge overseeing the desegregation case had not explicitly required a program comparison. Thus the evaluation team and the school district representatives decided that each respective program was to be evaluated on its own merits. Comparisons between programs were a secondary goal of the evaluation.

Program Overviews

Academic Language Acquisition (ALA).

The goal of the ALA program is English language proficiency and academic achievement in the core curriculum, utilizing the primary language. Instruction is in both Spanish and English, with decreasing Spanish language instruction and increasing English language instruction over time. In kindergarten and first grade, for example, 70% of instruction is provided in the students' primary language (L1) and 30% in English (L2). As students progress through each grade, the percentage of English increases until the 4th/5th grades, when the program design calls for 85% English and 15% Spanish language instruction. Specifically, there are three types of goals - academic (ACAD), linguistic (LING), and psychosocial, as identified by the school district:

Students will read, write and perform mathematics processes on grade level by 3rd grade in their primary language. (ACAD)
Students will be orally fluent in English by 3rd grade. (LING)
Students will transition into English reading by the end of 3rd grade. (ACAD)

Structured English Immersion (SEI)

English learners whose parents do not request the ALA program, or who are in a school with no established ALA classroom, instead enroll in a mainstream English language classroom, known as Structured English Immersion (SEI). Specialized instruction is predominantly in English. Teachers use English Language Development (ELD), sheltered, and Specially Designed Academic Instruction in English (SDAIE) strategies, which remain consistent across grade levels. Primary language support in Spanish, about 10-15% of daily instruction, is provided only to facilitate English acquisition and academic progress. Additionally, SEI is offered in all schools, including those that have implemented the ALA program. Where a school offers both programs, the number of English learners in SEI classes is disproportionately lower than the number in ALA classes. As noted previously, many SEI goals are similar to those for ALA:

Students will read, write and perform mathematics processes in English on grade level by 3rd grade. (ACAD)
Students will be orally fluent in English by 3rd grade. (LING)
Students will meet the redesignation criteria at the following rates: 25% at the end of 3rd grade, 50% of students in 4th grade, and 100% of students in 5th grade. (ACAD)

Success for All (SFA)

SFA is a nationally known, widely implemented, school-wide, coordinated Kindergarten through Grade 8 (K-8) reading program for elementary schools. Offered in several ALA and SEI schools, particularly those with lower socioeconomic status levels, SFA represents a major confounding influence when examining the ALA program effects. Founded on research-based cooperative learning strategies, SFA groups students according to language and reading level. Students engage in 90 minutes of uninterrupted daily reading instruction, and use specially designed curricula customized by reading level. (For an overview of SFA and summary of research, see Slavin & Madden, 2001). At least 80% of the teaching staff within a school must approve implementation before the program can be adopted at that school. Although components vary by site, depending on school needs and available resources, several core scaffolding elements are common to all implementations, including a reading program, reading assessments administered every eight weeks, reading tutors for one-on-one instruction, a family support team, a program facilitator who oversees implementation of SFA in the school, and extensive teacher training.

The Process

In conceptualizing the study, the evaluation team grappled with several considerations: What questions should the evaluation address? Who were key stakeholders and what were their positions? What sorts of data were available and what needed to be collected? What sort of evaluation would meet client needs, but also be defensible to the broader populace, namely proponents and critics of bilingual education, as well as researchers and evaluators in the field? And, finally, what study design would meet the criteria of scientifically based research (SBR) to address accountability mandates? Each issue is discussed below, providing a contextual backdrop for decisions made by the evaluation team.

Defining Evaluation Questions

One of the first tasks in focusing the study was the need for the evaluation team to identify the client and key stakeholders. One could argue that the primary client was the school district, in essence the district superintendent, even though the evaluation was prompted by a court mandate issued by a federal judge. The judge did not outline evaluation questions; rather, he presented guidelines that the evaluation should follow. Given this, we regarded the district superintendent as the primary client.

The evaluation team identified stakeholders who were primary intended users of the evaluation (Patton, 1997) - people well acquainted with the programs of interest (ALA, SEI), the federal court case, or both - and who were in a position to use the evaluation in some meaningful way. Because of the federal desegregation suit, the district superintendent, assistant superintendent and director of desegregation were interested in findings to determine whether the ALA program met its intended purposes, to ensure that students were not being denied special services in a way that might contradict the intent of the voluntary integration plan. The bilingual education office staff, including the director of bilingual education, shared this interest and wanted independent, hard data to document program outcomes. If there was negative evidence about the implementation and/or impact of the ALA or SEI programs, the bilingual staff appeared amenable to changing their program structure and operations accordingly. Finally, lawyers for both the district and the plaintiff were interested largely in obtaining evidence to support their respective positions.

Members of these groups were present in early discussions with the evaluation team, particularly when defining the purpose and scope of the study. We discussed what types of evaluation questions would meet the court mandates, as well as what information would be of greatest long-term benefit and use to the district, in terms of guiding their needs and decision-making processes. Although there were several directions in which the study could proceed, the stakeholders and the evaluators agreed upon four general questions:

1. What are the criteria to determine the effectiveness of the ALA program and the SEI process?
2. Are the ALA students reaching the stated programmatic goals of the ALA program? Are SEI students reaching the goals of the Structured English Immersion (SEI) process?
3. How do the ALA program and the SEI process compare in helping students increase their English language proficiency, in general and over time?
4. What impact does the presence of a Success for All (SFA) program have on ALA and SEI student performance?

Involving Stakeholders

One of the key factors in conducting the evaluation was constant communication between the evaluation team and key stakeholders. In terms of the school district, three district administrators were integral to the shaping and eventual conduct of the study, each playing different, but essential, roles. The director of desegregation served as the intermediary between the evaluation team and the district superintendent, the federal judge and the court monitor, and the lawyers for the school district and the plaintiff. This person ensured that the requirements of the federal court case were met, served as an interface between the evaluation team and the court monitor, and informed the superintendent about the progress of the study. The director of bilingual education was the conduit between the evaluation team and the schools where data were collected. This person provided entrée into the schools, notifying school principals and teachers about the study, answering questions and concerns about why an evaluation team would be contacting them for visits, interviews, and classroom observations, and also provided the bulk of the program documentation. Finally, the director of educational accountability provided access to the school district databases, including test and demographic data for individual students whose data were analyzed for the quantitative portion of the study. This person provided database documentation and insights that helped guide the analysis, given the status and completeness of the data available.

Depending on the status of the study and the type of information needed, these persons used their internal access to ensure that the needs of the evaluation team were met. As primary intended users of the evaluation, they relied on district staff to support the evaluation through information sharing, obtaining documents, contacting schools when necessary, and handling other logistical considerations. All attended meetings to discuss progress and findings, and reviewed and provided commentary on interim documents and reports to ensure accuracy.

Designing a Defensible Study

Considering the political context of the evaluation, district representatives wanted the evaluation to be defensible to multiple audiences, including district staff, as well as to external audiences, including proponents and opponents of bilingual education, educators of English learners, parents, students, and residents in surrounding communities. The evaluation team wanted the study to meet the basic tenets of evaluation research, including a theoretically-based, conceptual framework, rigorous design and instrumentation, sound statistical and qualitative data analyses, and valid inferences about the programs of interest.

In developing the study design, the evaluation team reviewed earlier evaluations of bilingual education programs and reviews of these evaluations. Many were criticized on theoretical and methodological grounds, such as lacking an explicit theory to guide the study design, small sample size, lack of definable comparison groups (making it difficult to compare students within and across different types of programs), and/or inappropriate study designs and analyses (see, for example, Dannoff, Coles, McLaughlin, & Reynolds, 1978; Development Associates, 1986; Ramirez, Pasta, Yuen, Billings, & Ramey, 1991). Recurrent problems prompted both proponents and opponents of bilingual education programs to identify and to agree on four methodologically acceptable criteria for future evaluations. These criteria guided subsequent research reviews and meta-analyses (see Greene, 1998; Meyer & Fienberg, 1992; Rossell & Baker, 1996), as well as the current study:

1. Studies had to compare students in bilingual programs to a control group of similar students.
2. The design had to ensure that initial differences between treatment and control groups were controlled statistically or through random assignment.
3. Results were to be based on standardized test scores in English.
4. Differences between the scores of treatment and control groups were to be determined by means of appropriate statistical tests.


Conceptual frameworks guided the present study, particularly the questions of interest, theoretical propositions, study design, and ultimately the data collection and analysis schemes. Both ALA and SEI were founded on theoretically grounded research, in fact sharing similar features based on the same pedagogical models. The features included: (a) length of time in program; (b) adequate exposure to second language; (c) emphasis on academic achievement; (d) bilingual instruction through separate monolingual lesson periods; (e) program quality; (f) type of English language instruction; (g) instructional environment; (h) parental involvement; (i) teacher quality, and (j) increasing minority status. The key difference was the extent to which the English language was used in the instructional process. Full descriptions of the programmatic features are available in the evaluation report (Hofstetter & Garcia, 2003).

To obtain definable comparison groups using district data, we analyzed data only from students who met designated standards, rather than all students enrolled in the Academic Language Acquisition (ALA) program or Structured English Immersion (SEI) process at the time of the study. These selection criteria, developed in conjunction with district staff, were: (1) continuous enrollment in an ALA program or SEI classroom, with or without SFA, since kindergarten (1998-2002); (2) specified ethnicity as Hispanic and/or home language as Spanish; (3) enrollment in the school district for 182 days with maximum 15 days absent as an indicator of solid attendance, and (4) longitudinal test data available. While this yielded greater confidence in the purity of the treatment group and the fidelity of results, the inclusion criteria severely reduced the number of students in the study sample and likely limited the generalizability of the findings.
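The inclusion logic above is essentially a conjunction of four filters over the district's student records. As a rough sketch only, assuming a hypothetical table with invented column names (the district's actual database schema is not described here), the selection could look like:

```python
import pandas as pd

def select_analysis_sample(students: pd.DataFrame) -> pd.DataFrame:
    """Apply the four inclusion criteria to a hypothetical student table.

    Assumed (illustrative) columns:
      continuous_since_k     -- True if enrolled in the same program
                                (ALA or SEI) since kindergarten, 1998-2002
      ethnicity, home_language
      days_enrolled, days_absent
      has_longitudinal_tests -- True if test data exist for every year
    """
    mask = (
        students["continuous_since_k"]                      # criterion 1
        & (
            (students["ethnicity"] == "Hispanic")
            | (students["home_language"] == "Spanish")      # criterion 2
        )
        & (students["days_enrolled"] >= 182)
        & (students["days_absent"] <= 15)                   # criterion 3
        & students["has_longitudinal_tests"]                # criterion 4
    )
    return students[mask]
```

Each additional conjunct shrinks the sample, which is exactly the trade-off the article notes: higher treatment-group purity at the cost of sample size and generalizability.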

Another consideration was controlling for students' initial English language proficiency. Earlier research suggested that a major weakness of bilingual education studies was that initial differences in students' English language proficiency are strong predictors of their subsequent performance. As a result, efforts were made to obtain a fairer comparison of the four groups of interest. Several methodological ground rules were developed for conducting the matches, essentially matching on key language proficiency data in kindergarten. Matching was conducted independently by two raters experienced in statistical analysis and matching methods.
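The article does not spell out the matching ground rules, but as an illustration only, a greedy one-to-one nearest-neighbor match on a single kindergarten proficiency score might be sketched as follows (the function, score scale, and tolerance are all assumptions, not the study's actual rules):

```python
def match_on_baseline(treatment, control, tolerance=2.0):
    """Greedy one-to-one matching of treatment to control students
    on a kindergarten English-proficiency score.

    treatment, control: lists of (student_id, baseline_score) tuples.
    Returns a list of (treatment_id, control_id) pairs whose baseline
    scores differ by no more than `tolerance`; unmatched students
    are dropped from the analysis.
    """
    pairs = []
    available = list(control)
    for t_id, t_score in treatment:
        # find the closest still-unmatched control student
        best = min(available, key=lambda c: abs(c[1] - t_score), default=None)
        if best is not None and abs(best[1] - t_score) <= tolerance:
            pairs.append((t_id, best[0]))
            available.remove(best)
    return pairs
```

Having two raters match independently, as the study did, provides a check on arbitrary choices such as the order of matching and the tolerance.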

Quantitative data were supplemented with qualitative data, including school visits, interviews with key stakeholders (e.g., school principal, teachers, instructional staff), document analysis, and classroom observations at multiple grade levels. Although the federal court did not require qualitative information for the evaluation, the evaluation team and the district felt it was important to provide at least some contextual information to help explain the quantitative test results, to the extent possible. From the inception of the study, the evaluation team conducted informal site visits of virtually all the ALA and SEI schools. The goal was to obtain an overview of the scope and implementation of each program, including implementation issues, staff perceptions of program impact, number of student and teacher participants, other literacy initiatives that English learners might participate in, school demographics and other information pertinent to the evaluation study. In addition to visiting schools, the evaluation team worked in conjunction with the district to develop criteria for implemented ALA and SEI programs. This information, along with program validation findings obtained by the court monitor, was used to ensure that programs were implemented as intended.


Reporting Findings

Interim findings were reported to the district at regular intervals and upon request. Throughout the study, the evaluation team circulated drafts of the evaluation proposal and interim reports among key stakeholders and primary intended users for feedback and checks on accuracy. The evaluation team took into consideration suggestions for report changes, most of which related to program information rather than study findings, and retained final decision-making authority over revisions to the documents, most notably the final report.

Throughout the process, there were concerns about the possible use and misuse of the evaluation findings. Thus, the evaluation team included numerous disclaimers on interim documents, noting that they were for review purposes only and that no conclusions about the data were to be drawn until the study was complete. This was of particular concern with the lawyers involved in the case, as well as with critics of bilingual education programs, who were eager to use evaluation results to advance their own positions. Although we considered the viewpoints of such groups, the evaluation team chose not to let them influence the study design or process.

Evaluation Use

Evaluations can be used in a variety of ways, be it conceptually or instrumentally. In this study, for example, information needed for the evaluation prompted the district to create and update important documents for accountability purposes. When the evaluation team sought to identify criteria to examine program effectiveness, we found that standards for the SEI program were not available. The district bilingual education staff, in conjunction with the evaluation team, defined the performance criteria, which were then reviewed by key stakeholders. When the criteria were accepted, they were presented to the district board of education for approval and inclusion in the official district program documentation.

Additionally, the evaluation team was interested in defining criteria for implemented versus developing programs. Through various conversations, the evaluation team and bilingual education staff worked closely to create criteria for implemented ALA and SEI programs. Various program documents were reviewed, and schools were visited to obtain evidence of implementation. This information, in conjunction with program validation findings obtained from the federal court monitor, was used to identify implemented programs for inclusion in the study. Implementation criteria for both programs were similar, with a focus on the following factors: (a) staff understand the foundations of bilingual education or English immersion, depending on their program; (b) staff can articulate these foundations as they relate to their program; (c) staff have experience implementing the program; (d) the program is fully articulated across grade levels in the school; (e) administrative and teacher leadership is on site; (f) staff identify themselves as part of the program; (g) parental support is an important part of the program, and (h) compliance criteria are met. Ultimately, all ALA and SEI schools were considered to have met these general criteria and were included in the study. These criteria were incorporated into the evaluation design and the program documentation. We believe this helped standardize program implementation across school sites.


Time Constraints

Based on the federal court order, the evaluation study would be conducted over three years, starting in the 1999-2000 academic year and continuing through 2001-2002. During evaluation contract negotiations, and upon learning more about the program itself, the evaluation team suggested that the first year (1999-2000) be considered a planning year, particularly since the program was relatively new, having been implemented in 1997-98. This would give the district one year to ensure that the ALA and SEI programs had an opportunity to be implemented in the desired fashion, to the extent possible given the short time frame of the study. Consequently, data analyzed for the study commenced in 2000-2001 and proceeded through 2002-2003 (although the study was subsequently extended for an additional two years).

Evaluation Issues Confronted by the Evaluation Team

As with previous examinations of bilingual education programs, there were several limitations of the current study. Some stemmed from logistical issues, others from the nature of the study itself. Following are selected issues and how the evaluation team addressed them.

Small Sample Size

Imposing multiple criteria related to students' attendance rates, matriculation in the same type of program, and enrollment for three or more consecutive years in the same program reduced the number of students dramatically, such that only descriptive subgroup analyses and t-tests were possible. As a result, the evaluators were limited in the inferences they could draw from the data. The small subgroup sizes prompted the evaluators to set up a matched sample analysis, based on student characteristics at baseline, and to calculate effect sizes to yield more rigorous insights into the students' achievement data.
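As one hedged illustration of the kind of effect-size calculation described above, Cohen's d with a pooled standard deviation can be computed directly from two groups' scores; whether this was the exact formula the team used is not stated in the article:

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Effect size for the difference between two independent groups,
    using the pooled standard deviation. Illustrative sketch only --
    the study's exact effect-size formula is not specified."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # pooled SD weights each group's variance by its degrees of freedom
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd
```

Reporting effect sizes alongside t-tests is a common remedy for small samples, since a nonsignificant test in a tiny subgroup says little about the magnitude of a difference.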

Lack of Assessment Information from School District

Questions about the validity of inferences about English learners based on standardized tests administered in the English language are well known to educators. Despite the need for multiple measures of student test data, only SAT-9 test data were available for analysis; the remaining data were not scored or were regarded as untrustworthy. Given this, the evaluation team opted to also conduct small-scale case studies of selected programs of interest. The evaluation team was aware that program strengths and weaknesses could not be captured in a standardized test score, and also knew that contextual factors were important determinants of student and program performance. For these reasons, the evaluation team set up interviews, classroom observations, and small-scale focus groups at three schools of interest, representing various idealized configurations of the programs of interest, in addition to visiting all the participating schools at least once.


Timeliness of the Data

The evaluation team suffered numerous delays in obtaining district test data. Ultimately, the evaluation team contacted the superintendent of schools and notified her that any delays in obtaining the data would translate into delays in the study itself and its deliverables (of which there were many); this occasionally resulted in faster turn-around times. Further, in some instances there were problems in analyzing the district test data, as many variables had been mislabeled or were documented in such a way that it was impossible for persons unfamiliar with the data set (e.g., the evaluators at the beginning of the study) to understand how they were set up. In these instances, the evaluation team sought assistance from district officials and modified the analyses accordingly.

Difficulty in Making Causal Inferences about Program Effects

Because of potentially confounding variables (e.g., after school educational programs, parental tutoring), potential measurement problems, lack of information about program implementation across schools, and other caveats noted earlier, it is virtually impossible to make causal inferences about the ALA and/or SEI program effects on student performance. For these reasons, the evaluation team was cautious in making any definitive statements about the effectiveness of the respective programs of interest, particularly given the stakeholders with competing interests.

Lessons Learned and Next Steps

The current study is notable in terms of the stakeholder and evaluation use issues. Because there were numerous stakeholder groups, with varying points of view, it was imperative for the evaluation team to remain cognizant and respectful of these perspectives, but not to allow them to consume or overwhelm the evaluation design or findings. In general, this distance between the evaluators and the stakeholders worked to our benefit. Numerous stakeholders knew the study was being conducted but did not contact us directly to express their views.

Constant communication between the evaluators and the different stakeholders, particularly the primary intended users, was key to the successful conduct of the study. The potentially high-stakes nature of the study (determining whether the ALA program would remain in existence) increased the level of interest and participation of the stakeholders, which also facilitated the evaluation itself. District staff made possible our access to the schools and school personnel, and were responsive to our evaluation needs throughout the study. As evaluators, we were in turn attentive to their concerns, posing questions of interest and involving them throughout the evaluation process. We feel this level of interaction and constant contact not only facilitated the conduct of the evaluation, but also aided the district in improving its documentation of the respective ALA and SEI programs, particularly in developing indicators for measuring program implementation and better defining reasonable performance standards.


The final evaluation report has since been released to the school district, which has provided copies to the federal judge, court monitor, lawyers, and numerous stakeholder groups. To date, the report has been well received. There has not been any indication that the ALA and SEI programs of interest will be modified or eliminated; the district representatives agreed that the data at this point were insufficient to warrant such actions.

Interestingly, after the evaluation team reported the findings to the school superintendent and school board members, there was interest in extending the study for two more years, to follow the same cohort of English learner students through Grade 5 (beginning when these students were in Kindergarten). Limited funds are available to support the extension, however, so the case study portion of the study will not be continued. The issue of dwindling subgroup sample sizes remains a major concern for the study in its current conception. We are currently examining the data to see how missing data might be retrieved or imputed, or whether proxy data are available. Further, we may have to investigate alternative analysis possibilities to ensure that the study meets the basic tenets of evaluation research, despite the caveats noted earlier.

Notes

This work was supported through a grant from the San Jose Unified School District, San Jose, California. The author wishes to thank Eugene Garcia, Arizona State University, and Alice Miano for their invaluable contributions to the project. I am also grateful to the District staff, teachers and students who participated in this study. The school district and title of the federal court case have been removed for purposes of anonymity.

Stanford Achievement Test, 9th Edition (SAT-9): The SAT-9 represents one component of the California Standardized Testing and Reporting (STAR) Program. It is a nationally normed test administered annually to all students in grades 2-11 in the state. Tests are administered in several subject areas, including Reading, Mathematics and Language Arts. The SAT-9 has since been replaced by a test linked directly to the state content standards.

References

Dannoff, M.N., Coles, G.J., McLaughlin, D.H., & Reynolds, D.J. (1978). Evaluation of the impact of ESEA Title VII Spanish/English bilingual education programs (Volume III). Palo Alto, CA: American Institutes for Research.

Development Associates (1986). Year 1 report of the longitudinal phase. Arlington, VA: Development Associates.

Greene, J. (1998). A meta-analysis of the effectiveness of bilingual education. Claremont, CA: Tomas Rivera Policy Institute.

Hofstetter, C.H., & Garcia, E.E. (2003). Academic Language Acquisition (ALA) program effectiveness study. Prepared for the San José Unified School District. Berkeley, CA: University of California, Berkeley, Graduate School of Education.


Meyer, M.M., & Fienberg, S.E. (Eds.) (1992). Assessing evaluation studies: The case of bilingual education strategies. Washington, DC: National Academy Press.

No Child Left Behind Act of 2001, Pub. L. No. 107-110, 115 Stat. 1425 (2002). Retrieved August 30, 2002. Available: <http://www.ed.gov/legislation/ESEA02/>.

Patton, M.Q. (1997). Utilization-focused evaluation: The new century text (3rd Ed.). Thousand Oaks, CA: Sage.

Ramirez, D.J., Pasta, D.J., Yuen, S.D., Billings, D.K., & Ramey, D.R. (1991). Final report: Longitudinal study of structured-English immersion strategy, early-exit and late-exit transitional bilingual education programs for language minority children (Volume II). San Mateo, CA: Aguirre International.

Rossell, C.H., & Baker, K. (1996). The effectiveness of bilingual education. Research in the Teaching of English, 30, 7-74.

Slavin, R.E., & Madden, N.A. (Eds.) (2001). Success for all: Research and reform in elementary education. Mahwah, NJ: Erlbaum.

The Author

CAROLYN HUIE HOFSTETTER is an assistant professor in the Graduate School of Education, University of California, Berkeley. Her research focuses on assessment and evaluation issues, including the validity of standardized content-based assessments for English learners and their use in program evaluations. She is also interested in the use and misuse of evaluations, and their implications for evaluation theory.

Correspondence: <chofstet@berkeley.edu>