This article was downloaded by: [UOV University of Oviedo]
On: 15 October 2014, At: 04:04
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Applied Measurement in Education
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/hame20

The Effects of Glossary and Read-Aloud Accommodations on English Language Learners’ Performance on a Mathematics Assessment
Mikyung Kim Wolf (a), Jinok Kim (b) & Jenny Kao (c)
(a) Educational Testing Service
(b) American Institutes for Research
(c) Teachers College, Columbia University
Published online: 05 Oct 2012.

To cite this article: Mikyung Kim Wolf, Jinok Kim & Jenny Kao (2012) The Effects of Glossary and Read-Aloud Accommodations on English Language Learners’ Performance on a Mathematics Assessment, Applied Measurement in Education, 25:4, 347-374, DOI: 10.1080/08957347.2012.714693

To link to this article: http://dx.doi.org/10.1080/08957347.2012.714693

PLEASE SCROLL DOWN FOR ARTICLE

Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any
losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


APPLIED MEASUREMENT IN EDUCATION, 25: 347–374, 2012
Copyright © Taylor & Francis Group, LLC
ISSN: 0895-7347 print / 1532-4818 online
DOI: 10.1080/08957347.2012.714693

The Effects of Glossary and Read-Aloud Accommodations on English Language Learners’ Performance on a Mathematics Assessment

Mikyung Kim Wolf
Educational Testing Service

Jinok Kim
American Institutes for Research

Jenny Kao
Teachers College, Columbia University

Providing a glossary and reading test items aloud are accommodations commonly allowed in many states’ policies for English language learner (ELL) students taking large-scale mathematics assessments. However, little research is available on the effects of these accommodations on ELL students’ performance, and no research has examined how students use them. Using a randomized experimental design and a think-aloud procedure, the present study explored the effects of the two accommodations. Results showed no significant effect of the glossary, mixed effects of read aloud, and significant interaction effects between students’ prior content knowledge and the accommodations, suggesting that accommodation was effective for students with content knowledge. Think-aloud data indicated that students were not familiar with, and did not actively use, the glossary. Implications for the effective use of accommodations and for future research are discussed.

Correspondence should be addressed to Dr. Mikyung Kim Wolf, Educational Testing Service, 660 Rosedale Rd., MS4-R, Princeton, NJ 08541.

The validity of assessments for English language learner (ELL) students has gained much attention since federal legislation mandated the participation of all students, including ELL students, in state accountability systems. A central validity concern in assessing ELL students is the extent to which assessment outcomes demonstrate the students’ content knowledge and skills without interference from their limited English language proficiency. For example, a mathematics assessment is, broadly speaking, intended to measure a student’s mathematical problem-solving ability. However, linguistic complexities in the math assessment may interfere with ELL students’ mathematical problem solving, so that the assessment fails to measure the intended construct for these students.

Testing accommodations have been used to reduce such unintended factors, referred to as construct-irrelevant variance, so that ELL students’ content knowledge can be adequately assessed and appropriate inferences can be drawn from the assessment results. Federal legislation specifically urged states to establish appropriate, valid accommodation guidelines as part of their accountability systems (No Child Left Behind, 2002). However, challenges emerged, partly because research findings on which accommodations are effective have been inconclusive.

Previous research on accommodations for ELL students has examined only a handful of those used in practice, leaving many unexamined (Francis, Rivera, Lesaux, Kieffer, & Rivera, 2006; Sireci, Li, & Scarpati, 2003). For the few accommodations that have been researched, results have been mixed, calling for more rigorous research to identify effective types of accommodations. The lack of research-based evidence on how various accommodations affect ELL students’ test performance has made it difficult for policy makers and practitioners to establish appropriate accommodation policies and practices for ELL students.

In light of this, the purpose of the present study is to examine the effectiveness and validity of accommodations that are commonly allowed for ELL students taking large-scale content assessments. This study focused on two particular accommodations for states’ standards-based math assessments at grade 8: an English glossary and reading the entire test aloud in English. These two accommodations were selected for several reasons. First, they are assumed to help ELL students because they directly address the students’ limited English proficiency. Second, among accommodations that provide direct linguistic support, these two have been identified as frequently allowed in states’ policies. According to a nationwide review of accommodation policies for the 2006–2007 school year (Wolf, Kao, et al., 2008), these two accommodations were allowed in over 40 states for their accountability testing systems. Third, the states that participated in this study expressed interest in these two accommodations.

Previous research on the effects of glossary accommodations for ELL students has yielded mixed results (Abedi, Courtney, & Leon, 2003a; Abedi, Courtney, & Leon, 2003b; Abedi, Hofstetter, Baker, & Lord, 2001; Abedi, Lord, Boscardin, & Miyoshi, 2001). For example, Abedi et al. (2003a) conducted an experimental study in which ELL and non-ELL students at grades 4 and 8 received a customized English dictionary, a bilingual/English glossary, linguistically modified test items, or no accommodation. The researchers found no statistically significant impact of the customized dictionary or the glossary on ELL students’ test scores. In a separate study, Abedi et al. (2003b) examined the effects of “pop-up” glossaries of non-content terms on the performance of grade 4 and grade 8 ELL students on a computer-administered mathematics assessment. Results indicated that the “pop-up” glossary increased the performance of ELL students in both grades without affecting the scores of non-ELL students. On the other hand, in another study involving a grade 8 math test, the test scores of both ELL and non-ELL students increased when they were given an English glossary with extra time (Abedi et al., 2001). These results raised questions about the validity of the accommodation because it affected the test scores of the non-ELL students who received it.

Research on accommodations for ELL students is especially limited for the read-aloud accommodation. Prior research on the effects of read-aloud (oral) administration of test items on test scores has focused primarily on students with disabilities. We found only one study investigating the effect of reading test items aloud on ELL student performance (Anderson, Liu, Swierzbin, Thurlow, & Bielinski, 2000). On a reading test, Anderson et al. (2000) provided grade 8 ELL students from Spanish-speaking backgrounds with dual-language test items (items printed in both English and Spanish), as well as the option of listening to the items read aloud in Spanish on an audiocassette. They found no significant effects of the accommodations, noting that the majority of students did not choose the read-aloud option. Based on our review, the effects of both glossary and read-aloud accommodations for ELL students remain ambiguous and require further investigation.

In this study, we address the following research questions:

1. Does providing glossary and read-aloud accommodations increase ELL students’ performance on a math assessment compared with the standard testing condition?

2. Does providing glossary and read-aloud accommodations leave non-ELL students’ performance unchanged compared with the standard testing condition?

3. To what extent do ELL students have difficulty with the language and content in solving math items?

4. How do ELL students utilize a glossary accommodation?

5. What are students’ perceptions of the helpfulness of glossary and read-aloud accommodations when taking a math assessment?

METHOD

To investigate the research questions, we used both quantitative and qualitative methods. Quantitatively, a randomized experimental design was applied to estimate the effects of the accommodations on ELL and non-ELL students. Qualitatively, think-aloud protocols and retrospective interviews were used to examine ELL students’ use and perceptions of the accommodations.

Participants

A total of 605 students from two states participated in this study (313 ELL and 292 non-ELL). To preserve anonymity, we refer to the two states as State X and State Y. ELL students were identified according to each state’s criteria: the student’s home language and the student’s English proficiency level as determined by English language proficiency test performance.

Because the math test was designed to measure grade 8 standards, students at the end of grade 8 (State X) or the beginning of grade 9 (State Y) were targeted for the sample. Data were collected in State X in spring 2008 with grade 8 students and in State Y in fall 2008 with grade 9 students. In the fall, grade 9 students were considered a more appropriate sample than the then-current eighth graders because the mathematics assessment used in the experiment addressed grade 8 standards.

In State X, 267 grade 8 students (140 ELL and 127 non-ELL) from four schools in one urban school district participated in the testing. In State Y, 338 grade 9 students (173 ELL and 165 non-ELL) from nine schools in four school districts (one urban and three suburban) participated. Of the ELL students who took the math test, 68 (19 from State X and 49 from State Y) also participated in the think-aloud interview. Many of the ELL students were from Spanish-speaking backgrounds (55%); other home languages included Arabic, Bengali, Danish, Hmong, Mandarin Chinese, Punjabi, Tagalog, Turkish, and Vietnamese. Students who spoke no English were excluded from the study.

In addition to the scores on the math test used in this study, existing data were obtained from each state. These data included students’ background information (e.g., gender, race/ethnicity, home language, free or reduced lunch program eligibility, Individualized Education Plan [IEP] status, ELL status, English language proficiency level, ESL program participation), along with their scores on the states’ 2008 standardized reading and math tests and on the 2008 English language proficiency tests.

Instruments

Math Test

An eighth-grade-level mathematics test was developed for the study and designed to be administered in one class period of approximately 45 minutes. It comprised 37 items (35 multiple-choice and 2 open-ended), including released items from the National Assessment of Educational Progress (1990, 1992, 1996) and the Trends in International Mathematics and Science Study (1995), as well as released items from states’ standardized math tests. The test items addressed math standards of number sense/computation and algebra and also included some items covering geometry, measurement, and data analysis. Efforts were made to make the test comparable to the participating states’ standards-based math tests in terms of content and linguistic complexity. To ensure alignment with the participating states’ math content standards, an external review team of doctoral students with expertise in math education, educational psychology, and secondary-level teaching conducted an alignment study using Webb’s alignment tool (Webb, 1997; Webb, Alt, Ely, & Vesperman, 2005). The math test items were also examined for linguistic complexity using a linguistic content analysis protocol developed by the researchers (Wolf et al., 2008; Wolf & Leon, 2009). For every item, the number of words, lexical density, number of academic words, number of academic grammatical features (e.g., passive voice, nominalization), and sentence structure were examined. These ratings were compared with the results of a previous analysis of the states’ current mathematics assessments (Wolf et al., 2008). The results indicated that our math test and the states’ math tests were comparable in linguistic complexity on the features reviewed above.
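The researchers’ linguistic content analysis protocol is not reproduced here, so the following is only a minimal, hypothetical sketch of two of the simpler item-level counts: number of words and lexical density, taken here as the share of tokens that are not function words. The function-word list, the tokenizer, and the names `tokenize` and `linguistic_profile` are illustrative assumptions, not part of the published protocol; counting academic words or grammatical features would need reference lists or parsing not shown here.

```python
import re

# Hypothetical, deliberately short function-word list for illustration only;
# a real protocol would need a far fuller list or a part-of-speech tagger.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "to", "in", "on", "for", "and", "or", "if",
    "is", "are", "was", "has", "have", "be", "by", "with", "each", "which",
    "could", "can", "what", "how", "many", "that", "this", "these", "from",
}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; numbers and symbols such as $24.99 are dropped."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def linguistic_profile(text: str) -> dict:
    """Word count and a simple lexical-density estimate for one item stem."""
    words = tokenize(text)
    content = [w for w in words if w not in FUNCTION_WORDS]
    density = len(content) / len(words) if words else 0.0
    return {
        "words": len(words),
        "content_words": len(content),
        "lexical_density": round(density, 2),
    }


item_2 = ("Of the following, which is the closest approximation of a "
          "15 percent tip on a restaurant check of $24.99?")
print(linguistic_profile(item_2))
# {'words': 17, 'content_words': 7, 'lexical_density': 0.41}
```

Applying the same function to every item of two tests would give comparable per-item profiles, which is the kind of comparison the paragraph above describes, although the actual protocol also rated features this sketch ignores.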

To examine the psychometric quality of our math test, we investigated its internal consistency and its correlation with other measures. Reliability estimates were acceptable (Cronbach’s α = 0.81 for all participants, 0.82 for State X participants, and 0.80 for State Y participants). A correlation analysis between scores on our math test and the states’ math test scores provided supporting evidence of concurrent validity: Pearson correlation coefficients were 0.71 and 0.77 with the State X and State Y standardized math test scores, respectively.
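Both statistics reported above follow standard formulas. The sketch below computes them from a students-by-items score matrix, using Python’s `statistics` module. The toy data are invented for illustration, and nothing here implies which software the authors used.

```python
from statistics import mean, pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a students-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals).
    Population variances are used throughout; the n vs. n - 1 choice cancels.
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_var = sum(pvariance(col) for col in zip(*scores))
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Toy data: 4 students x 3 dichotomously scored items (1 = correct).
toy = [[1, 0, 1], [1, 1, 1], [0, 0, 0], [1, 1, 0]]
print(round(cronbach_alpha(toy), 4))  # 0.6316
```

For concurrent validity, `pearson_r` would be applied to each student’s total score on the study test, paired with that student’s state math scale score.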

Accommodations. To implement the read-aloud accommodation in a standardized manner, an administration script of the test items was created for test proctors to read aloud verbatim to the students. The script development also took the states’ accommodation policies into consideration. The accommodation guidelines of both State X and State Y specified that certain numbers and symbols were not to be read aloud in the states’ mathematics assessments because they were part of the construct being measured. Thus, wherever a number or symbol was not to be read, our script instructed the proctor to pause for as long as it would have taken to read that number or symbol aloud.

For the glossary accommodation, a test booklet with built-in glossaries was created. The test items and layout were the same as in the standard and read-aloud versions. English-to-English glossaries appeared in the right margin next to each item. Only non-content (i.e., non-math) words and phrases were glossed (e.g., above, charges, drawer, of the following). General academic vocabulary words (e.g., consisting, contains, approximation) were also glossed, but specialized or technical terms (e.g., linear, polygons) were not. Glossary definitions were based on the Longman Handy Learner’s Dictionary of American English (2000), with modifications for age-appropriateness and relevance to the test item. The glossary version of the test was reviewed by two eighth-grade math teachers with experience teaching ELL students. The teachers provided feedback on both the glossed words and the test items, and revisions were made based on their feedback.
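As a rough illustration of how a margin glossary could be assembled for one item, the sketch below finds which entries of a term-to-definition mapping occur in an item stem and lists them in order of appearance. The definitions come from Figure 1 (Item 2, plus the Item 5 entry “plumber”, which should not match). The function `margin_glosses` and its whole-word matching rule are hypothetical; the study’s glossaries were prepared and reviewed by hand.

```python
import re


def margin_glosses(stem: str, glossary: dict[str, str]) -> list[tuple[str, str]]:
    """List (term, definition) pairs for glossed terms that occur in a stem.

    Matching is case-insensitive and restricted to whole words, so "tip" does
    not match inside "multiple". Results follow the order of first appearance,
    as a margin glossary printed beside the item might.
    """
    hits = []
    for term, definition in glossary.items():
        m = re.search(r"\b" + re.escape(term) + r"\b", stem, flags=re.IGNORECASE)
        if m:
            hits.append((m.start(), term, definition))
    hits.sort()
    return [(term, definition) for _, term, definition in hits]


# Entries from Figure 1 (Item 2), plus "plumber" from Item 5, which should
# not appear in the output for Item 2.
glossary = {
    "tip": "money for the waiter or waitress",
    "check": "a bill",
    "Of the following": "from the choices",
    "closest approximation": "best guess; nearest amount",
    "plumber": "a person who fixes things",
}
stem = ("Of the following, which is the closest approximation of a "
        "15 percent tip on a restaurant check of $24.99?")
for term, definition in margin_glosses(stem, glossary):
    print(f"{term}: {definition}")
```

Technical terms are excluded simply by leaving them out of the mapping. That mirrors the rule above that only non-content and general academic words were glossed.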

Student Think-Aloud Test and Interview Protocol

A sample of five items was selected from the glossary version of the test to elicit students’ think-aloud responses. Figure 1 displays the stems of the five think-aloud test items. Items were selected to represent different types and ranges of linguistic complexity: some items were more complex than others in terms of the number of academic vocabulary words, grammatical features, and cohesive devices. For example, Item 5 contained more vocabulary and grammatical complexity than the other items.

A retrospective interview protocol focused on five main areas to prompt students as they completed the items on the Student Think-Aloud Test: comprehension (Does the student understand the question?), problem solving (How does the student solve the problem?), difficulty (What is the student’s perceived difficulty of the problem?), accommodation use (Did the student use the glossary words printed with the test items?), and students’ general perceptions of accommodations. These areas were of particular interest for understanding how ELL students comprehended the test items and solved them using a given accommodation. Prior to data collection, the think-aloud procedure and interview protocol were piloted with local middle school students and refined.

Procedure

For the math test, students were randomly assigned to one of three testing conditions: Standard (no accommodation), Read Aloud, and Glossary. Efforts were made to ensure a roughly equal number of ELL and non-ELL students in each condition. Before testing, the researchers trained the teachers or school administrators in the read-aloud condition to use the script, so that the read aloud would be administered uniformly across all schools. The read-aloud condition was administered in a separate room, with an average of 13 students per room.
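The text does not describe the exact randomization mechanics. One common way to randomize within schools while keeping ELL and non-ELL students roughly balanced across conditions is blocked assignment, sketched below as an assumption rather than the authors’ procedure. Students are grouped into school × ELL-status blocks, each block is shuffled, and students are dealt to the three conditions in turn. The function name, the tuple layout, and the seed are hypothetical.

```python
import random
from collections import defaultdict

CONDITIONS = ("Standard", "Read Aloud", "Glossary")


def assign_conditions(students, seed=2008):
    """Randomly assign students to conditions within school x ELL-status blocks.

    `students` is a list of (student_id, school, is_ell) tuples. Each block is
    shuffled and its members are dealt round-robin to the three conditions, so
    condition counts differ by at most one within every block. The seed is an
    arbitrary value chosen so the assignment can be reproduced.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for student_id, school, is_ell in students:
        blocks[(school, is_ell)].append(student_id)

    assignment = {}
    for key in sorted(blocks):
        members = blocks[key]
        rng.shuffle(members)
        start = rng.randrange(len(CONDITIONS))  # rotate which condition gets extras
        for i, student_id in enumerate(members):
            assignment[student_id] = CONDITIONS[(start + i) % len(CONDITIONS)]
    return assignment


roster = [("s01", "School A", True), ("s02", "School A", True),
          ("s03", "School A", False), ("s04", "School B", True)]
print(assign_conditions(roster))
```

Blocking on school also matches the multisite design described under Quantitative Analysis, where school membership enters the models as fixed indicators.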

For the think-aloud procedure, students met one-on-one with a researcher for approximately 20 minutes following the math test. Students were selected on a voluntary basis from those who turned in a parent permission form for the interview. The students were shown a brief video clip demonstrating how to “think aloud,” along with the researcher’s instructions. Students first thought aloud while solving the five items (concurrent verbal report), and then completed an interview (retrospective verbal report; Ericsson & Simon, 1993). Interviews were conducted primarily in English; however, students were encouraged to use their native language (Spanish and Mandarin Chinese only) when they struggled to answer interview questions. All sessions were audio-recorded and later transcribed.

1. How many triangles of the shape and size of the shaded triangle can the trapezoid above be divided into?
   shaded: darker or filled in
   above: on top

2. Of the following, which is the closest approximation of a 15 percent tip on a restaurant check of $24.99?
   Of the following: from the choices
   closest approximation: best guess; nearest amount
   tip: money for the waiter or waitress
   check: a bill

3. A group of students has a total of 29 pencils and everyone has at least one pencil. Six students have 1 pencil each, five students have 3 pencils each, and the rest of the students have 2 pencils each. How many students have only 2 pencils?
   at least one: one or more
   the rest: left over

4. A group of hikers climbed from Salt Flats (elevation -55 feet) to Talon Bluff (elevation 620 feet). What is the difference in elevation between Talon Bluff and Salt Flats?
   hikers: people who walk in mountains
   elevation: how high
   climbed: walked up a mountain
   Salt Flats: name of a place
   Talon Bluff: name of a place

5. A plumber charges customers $48 for each hour worked plus an additional $9 for travel. If h represents the number of hours worked, which of the following expressions could be used to calculate the plumber’s total charge in dollars?
   plumber: a person who fixes things
   charges: asks for money; bill
   customers: people who buy things
   additional: extra
   travel: going somewhere
   represents: stands for
   of the following: from the choices
   calculate: find

FIGURE 1 Stems of the five think-aloud test items with glossary words.

Quantitative Analysis

Multiple regression analyses were conducted to investigate the effects of the glossary and read-aloud accommodations on students’ math performance, relative to a standard condition in which students received no accommodation. The design of this study was a multisite randomized trial, in which students within schools were randomly assigned to one of the three conditions outlined earlier. Although multilevel models typically provide a good analytical approach to multisite randomized trials (see, e.g., Raudenbush & Liu, 2000; Seltzer, 2004; Shadish, 2002), in this study we primarily used multiple regression models because of the small number of schools (four in State X and nine in State Y) and the negligible variability across schools. School membership was controlled for in the multiple regression analyses with binary school indicators. We used multilevel models only to test the robustness of our results to model specification.

The analyses were performed separately for States X and Y because different variables were available in each state’s existing data set (e.g., state-specific English proficiency levels, state assessment scores) and because the state contexts differed. Our quantitative analyses focused on current ELL students and non-ELL students. Students who had been reclassified as fluent in English were excluded from the quantitative analysis, mainly because there were extremely few of them.

Regression analyses were first conducted on the combined sample of ELL and non-ELL students, and a second set of analyses was then conducted separately for ELL and non-ELL students. Analyzing the ELL and non-ELL groups simultaneously in the first set of analyses made the most of the data in terms of statistical power. More importantly, it provided a statistical test of whether, and by how much, the benefit of each accommodation relative to no accommodation differed between ELL and non-ELL students. This test was conducted by incorporating two interaction terms in the regression analysis: the interaction between ELL status and the indicator of the glossary condition, and the interaction between ELL status and the indicator of the read-aloud condition. The second set of analyses allowed us to examine whether ELL students with certain characteristics (e.g., English language proficiency, baseline mathematical content knowledge) benefited from accommodations. This was done by including interaction terms between treatment status and the student characteristic variables.


Initially, we included in the regression model various student background variables that are theoretically considered strong predictors of students’ academic performance. The variables of interest were students’ English language proficiency (ELP) test scores or levels, free or reduced lunch (FRL) program participation, and state content test scores. By entering these variables in the model, we aimed to control for any remaining imbalances in observed preexisting characteristics and to increase the statistical power for estimating the treatment effects. Variables or interactions that were initially included but did not explain much variability in the outcome beyond the predictors already in the model were not included in our final model.
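The exact criterion for “did not explain much variability” is not stated. One standard way to make the idea concrete, offered here only as an assumed example, is the nested-model F test on the change in R² when q predictors are added to a model. The numbers in the usage line are hypothetical and are not results from this study.

```python
def incremental_f(r2_reduced: float, r2_full: float, n: int,
                  p_full: int, q: int) -> float:
    """F statistic for adding q predictors to a nested OLS regression model.

    r2_reduced / r2_full: R-squared without / with the q added predictors.
    n: sample size; p_full: number of predictors in the full model, not
    counting the intercept. Under the null hypothesis that the added
    coefficients are all zero, F follows an F(q, n - p_full - 1) distribution.
    """
    df_resid = n - p_full - 1
    if q <= 0 or df_resid <= 0:
        raise ValueError("need q > 0 and n > p_full + 1")
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid)


# Hypothetical numbers, not results from this study: adding two predictors
# (e.g., FRL status and an ELP score) raises R-squared from 0.50 to 0.51.
print(round(incremental_f(0.50, 0.51, n=200, p_full=10, q=2), 2))  # 1.93
```

An F of about 1.9 on (2, 189) degrees of freedom falls below the conventional 5% critical value (roughly 3 for these degrees of freedom). Under this kind of rule, the two predictors would be dropped.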

Regression Models for the Combined Sample of Both ELL and Non-ELL Students

We estimated the differences in the outcome between the Glossary and Standard conditions and between the Read Aloud and Standard conditions, for both ELL and non-ELL students, using the main and interaction effects of the treatment conditions and ELL status (i.e., ELL vs. non-ELL). The following multiple regression model (1) was used for both State X and State Y:

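$$
\begin{aligned}
Y_i = {} & \beta_0 + \beta_1 \mathrm{Glossary}_i + \beta_2 \mathrm{ReadAloud}_i + \beta_3 \mathrm{ELL}_i + \beta_4 \mathrm{Glossary}_i \times \mathrm{ELL}_i \\
& + \beta_5 \mathrm{ReadAloud}_i \times \mathrm{ELL}_i + \beta_6 \mathrm{Mathscore}_i + \beta_7 \mathrm{Mathscore}_i^2 \\
& + \sum_{k=1}^{K-1} \beta_{k+7} \mathrm{School}_{ki} + \beta_8 \mathrm{Admin}_i + r_i, \qquad r_i \sim N(0, \sigma^2)
\end{aligned}
\tag{1}
$$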

The outcome in the above multiple regression model, Y_i, is the number of items student i answered correctly on the math test developed for the current study. Glossary is a binary indicator of whether student i was assigned to the Glossary accommodation condition. Similarly, ReadAloud is a binary indicator of whether student i was assigned to the Read Aloud accommodation condition. ELL is a binary indicator of whether student i was an ELL student (ELL students coded as 1 and non-ELL students coded as 0). Mathscore refers to the scale score from each state’s standardized math assessment in grade 8; its quadratic term is also included to capture curvature in the relationship. The Mathscore variable was centered at its sample mean, and the quadratic term is the square of the centered values. School_ki are indicators of whether student i was in school k (three indicators are included in the State X model and eight in the State Y model, because in each, one school serves as the baseline). Admin is an indicator of whether student i was in an administration setting where students had less than 45 minutes to complete the test, which occurred in a few classrooms because of various unexpected logistical challenges.
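To make the coding in model (1) concrete, the sketch below builds one student’s predictor values in the order of the model’s terms. It uses an intercept, the two treatment indicators, ELL, both treatment × ELL products, centered Mathscore and its square, K − 1 school indicators with the first school as baseline, and Admin. The `Student` fields, the condition labels, and the example scores are hypothetical. Stacking such rows and fitting ordinary least squares would give estimates of the β coefficients.

```python
from dataclasses import dataclass


@dataclass
class Student:
    condition: str      # hypothetical labels: "Standard", "Glossary", "ReadAloud"
    ell: bool           # True for current ELL students
    math_score: float   # state grade 8 math scale score
    school: str
    short_admin: bool   # True if the session allowed less than 45 minutes


def design_row(s: Student, mean_math: float, schools: list[str]) -> list[float]:
    """Predictor values for model (1), in the order its terms are written.

    The first entry of `schools` is the baseline school, so only K - 1
    school indicators are produced.
    """
    glossary = float(s.condition == "Glossary")
    read_aloud = float(s.condition == "ReadAloud")
    ell = float(s.ell)
    math_c = s.math_score - mean_math          # centered at the sample mean
    school_dummies = [float(s.school == k) for k in schools[1:]]
    return ([1.0, glossary, read_aloud, ell,
             glossary * ell, read_aloud * ell,
             math_c, math_c ** 2]
            + school_dummies
            + [float(s.short_admin)])


schools = ["A", "B", "C", "D"]  # four schools, as in State X; "A" is the baseline
s = Student(condition="Glossary", ell=True, math_score=310.0, school="C",
            short_admin=False)
print(design_row(s, mean_math=300.0, schools=schools))
# [1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 10.0, 100.0, 0.0, 1.0, 0.0, 0.0]
```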


The parameter β1 represents the expected difference in the outcome between the Glossary and Standard conditions for non-ELL students, while β2 represents the expected difference in the outcome between the Read Aloud and Standard conditions for non-ELL students. In randomized studies, these expected differences can be interpreted as the effects of the treatments (i.e., the Glossary and Read Aloud accommodation conditions). The coefficient of ELL, β3, represents the expected difference in the outcome between ELL and non-ELL students in the Standard condition.

Interaction effects of the treatments and ELL status are captured by the parameters β4 and β5. The parameter β4 captures the expected difference in the effect of the Glossary condition between ELL and non-ELL students, while β5 captures the expected difference in the effect of the Read Aloud condition between ELL and non-ELL students. Therefore, the expected difference between the Glossary and Standard conditions for ELL students is β1 + β4, and the expected difference between the Read Aloud and Standard conditions for ELL students is β2 + β5.
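These sums can be derived from model (1) by differencing expected outcomes for students with the same Mathscore, school, and Admin values. Every term that does not involve a treatment indicator, including β0 and β3 ELL_i, is identical in both conditions and cancels:

$$
\begin{aligned}
E[Y \mid \text{Glossary}, \text{ELL}=1] - E[Y \mid \text{Standard}, \text{ELL}=1] &= \beta_1 + \beta_4, \\
E[Y \mid \text{ReadAloud}, \text{ELL}=1] - E[Y \mid \text{Standard}, \text{ELL}=1] &= \beta_2 + \beta_5, \\
E[Y \mid \text{Glossary}, \text{ELL}=0] - E[Y \mid \text{Standard}, \text{ELL}=0] &= \beta_1 .
\end{aligned}
$$

Subtracting the third line from the first leaves β4, the ELL versus non-ELL difference in the glossary effect. The same step applied to the read-aloud contrast leaves β5.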

Note that variables such as state reading test scores, FRL status, and ELP scores or levels were initially included in the model based on our hypotheses but did not explain much variability in the outcome beyond the predictors already in the model. These variables were not included in our final model.

Regression Models for ELL Students Only. The primary purpose of this analysis is to identify potential interactions between the accommodations and ELL student characteristics. We tested various interactions based on our hypotheses and excluded from our final model those that were not significant. No interaction was found in the State X sample. In contrast, significant interaction effects were found in the State Y sample. Multiple regression models (2) and (3) were thus used for State X and State Y, respectively:

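$$
\begin{aligned}
Y_i = {} & \beta_0 + \beta_1 \mathrm{Glossary}_i + \beta_2 \mathrm{ReadAloud}_i + \beta_3 \mathrm{Mathscore}_i + \beta_4 \mathrm{Mathscore}_i^2 \\
& + \beta_5 \mathrm{School1}_i + \beta_6 \mathrm{School2}_i + \beta_7 \mathrm{School3}_i + \beta_8 \mathrm{Admin}_i + r_i, \qquad r_i \sim N(0, \sigma^2)
\end{aligned}
\tag{2}
$$

$$
\begin{aligned}
Y_i = {} & \beta_0 + \beta_1 \mathrm{Glossary}_i + \beta_2 \mathrm{ReadAloud}_i + \beta_3 \mathrm{Mathscore}_i + \beta_4 \mathrm{Mathscore}_i^2 + \sum_{k=1}^{8} \gamma_k \mathrm{School}_{ki} \\
& + \beta_8 \mathrm{Admin}_i + \beta_9 \mathrm{Glossary}_i \times \mathrm{Mathscore}_i + \beta_{10} \mathrm{ReadAloud}_i \times \mathrm{Mathscore}_i + r_i, \qquad r_i \sim N(0, \sigma^2)
\end{aligned}
\tag{3}
$$

where $\gamma_k$ denotes the coefficient of the $k$th school indicator.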


While β1 and β2 represent the main effects of the treatments (i.e., Glossary and Read Aloud), β9 and β10 are the key parameters in the model: they represent the interaction between the Glossary condition and students’ state math test scores and the interaction between the Read Aloud condition and students’ state math test scores, respectively. Because Mathscore is centered at its sample mean, β1 and β2 are the expected treatment effects for a student whose state math score equals the sample mean.

Regression Models for Non-ELL Students Only

In the analysis of non-ELL students, we followed a process similar to the one used for ELL students. We tested various interactions between the given accommodations and student characteristics, and included them when they were significant. The purpose of these analyses is to see whether interactions similar to those found for ELL students also hold in the non-ELL student sample.

Student Verbal Protocol Analysis

Both former and current ELL students were included in the verbal protocol analysis. Former ELL students included those who had been reclassified as fluent and were still under a 2-year monitoring period, as well as those who had exited ELL status more than 2 years earlier.

As described above, a coding scheme for the students’ verbal reports was developed, focusing on five targeted areas of interest: comprehension, problem solving, item difficulty, accommodation use, and general perception of accommodation use. Two researchers coded each transcript with the established scheme using the Atlas.ti¹ qualitative data analysis software, with an average exact agreement of 84.1% between the two raters. Disagreements were discussed until a consensus was reached. Descriptive statistics were computed to identify patterns in the areas of interest listed above.
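Exact agreement is simply the percentage of coded segments on which both raters assigned the same code. The minimal Python sketch below assumes that agreement was computed per transcript and then averaged (the article does not say exactly how the 84.1% average was formed); the helper name `exact_agreement` and the codes themselves are invented for illustration and are not study data.

```python
from statistics import mean
from typing import Sequence


def exact_agreement(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Hypothetical helper: percentage of segments given identical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must code the same, non-empty set of segments")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Invented comprehension codes for two transcripts (rater A, rater B).
transcripts = [
    (["Yes", "Partial", "No", "Yes"], ["Yes", "No", "No", "Yes"]),
    (["Yes", "Yes", "Partial", "Not sure", "No"],
     ["Yes", "Yes", "Partial", "Not sure", "No"]),
]
per_transcript = [exact_agreement(a, b) for a, b in transcripts]
average_agreement = mean(per_transcript)  # average across transcripts
print(per_transcript, average_agreement)
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected index such as Cohen’s kappa would be a different statistic from the one reported here.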

RESULTS

Quantitative Results for Experimental Study: State X

Descriptive Results

Table 1 presents the descriptive statistics for the ELL and non-ELL student samples in State X. The variables include students’ outcome scores (math test scores from the present study, or “Test Score” in tables) by treatment condition, state assessment scores, socioeconomic status as indicated by free or reduced lunch (FRL)

¹ATLAS.ti Scientific Software Development GmbH, Nassauische Str. 58, D-10717 Berlin, Germany.


TABLE 1
State X Descriptive Statistics for ELL and Non-ELL Students by Condition

                           Standard               Glossary               Read Aloud
Variables               n      M       SD      n      M       SD      n      M       SD
ELL Students
  Test Score           44    13.14    4.75    36    10.56    3.36    37    13.95    4.92
  State Math Score     42   208.29   92.74    35   171.26   71.80    35   213.69   85.29
  State Reading Score  42   230.05   63.69    35   228.69   44.95    35   227.60   65.06
  FRL                  42     0.69    0.47    35     0.91    0.39    35     0.77    0.43
  ELP Test Score       37   531.05   55.09    35   519.06   39.67    23   540.57   36.80
  ELP Level            37     3.78    1.32    35     3.31    1.11    23     4.13    0.97
Non-ELL Students
  Test Score           42    21.98    5.85    48    19.52    6.06    37    19.76    6.63
  State Math Score     40   340.95   65.07    46   322.48   62.47    37   318.41   79.16
  State Reading Score  40   330.28   44.97    46   314.30   52.58    37   314.14   44.32
  FRL                  40     0.40    0.50    46     0.39    0.49    37     0.41    0.50

Note. Test score refers to scores from the math test in this study, out of a total of 39 possible points. The state achievement test scale scores in math and reading range from 100 to 500. FRL refers to the proportion of students participating in the free or reduced lunch program. The state ELP test scale score refers to the overall score and ranges from 341 to 666. The state ELP levels range from 1 to 5, with 5 being the highest level of proficiency. Some n sizes are lower because of missing background data.

program, ELP assessment scores, and ELP levels. The distributions of student characteristics and scores were generally similar across conditions. Some differences in student characteristics across conditions were observed, as can occur in randomized studies with small sample sizes. For example, in the ELL student sample (Table 1), students in the Glossary condition tended to have lower scores on the state content and ELP assessments, and were more likely to participate in the free or reduced lunch program, than students in the other conditions. Also, in the non-ELL sample, students in the two accommodation conditions had lower average scores on the state content assessments than students in the Standard condition. Including such variables as covariates in the regression analyses helps avoid bias arising from these preexisting differences. As described earlier, we initially included all of these variables in the model; some were then dropped during model selection because, once the other variables in the model were controlled for, they provided no further adjustment.
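A common way to express baseline imbalances like these on a common scale is the standardized mean difference: the difference in group means divided by the pooled within-group standard deviation. The Python sketch below applies it to two of the state math score contrasts in Table 1. This is our illustration, not an analysis the authors report, and the helper name `standardized_mean_difference` is hypothetical.

```python
from math import sqrt


def standardized_mean_difference(m1: float, sd1: float, n1: int,
                                 m2: float, sd2: float, n2: int) -> float:
    """Hypothetical helper: (m1 - m2) / pooled within-group SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)


# State math scores from Table 1 (State X): M, SD, n for each group.
ell_glossary_vs_standard = standardized_mean_difference(
    171.26, 71.80, 35, 208.29, 92.74, 42)
nonell_readaloud_vs_standard = standardized_mean_difference(
    318.41, 79.16, 37, 340.95, 65.07, 40)
print(round(ell_glossary_vs_standard, 2), round(nonell_readaloud_vs_standard, 2))
```

Both differences are below half a pooled standard deviation, the kind of chance imbalance that the covariate adjustment described above is meant to absorb.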

Results for the combined sample of both ELL and non-ELL students. Table 2 presents the results for 235 ELL and non-ELL students from the multiple regression analysis in Equation 1. All covariates we controlled for showed


TABLE 2
State X Multiple Regression Results for the Combined Sample of ELL and Non-ELL Students (n = 235)

                      Estimate     SE       p
Intercept               20.19     1.41   <.0001
Glossary                −0.82     0.87    0.35
ReadAloud               −2.16     0.97    0.03
ELL                     −3.23     0.99    0.00
Glossary × ELL          −0.72     1.23    0.57
ReadAloud × ELL          2.27     1.29    0.08
Mathscore                0.05     0.00   <.0001
Admin                   −4.83     1.44    0.00
Mathscore²              0.0002    0.00   <.0001
Residual                15.91

the direction of relationships we expected: a positive and significant relationship between math content knowledge and the outcome, with very slight curvature, and lower performance for students in classrooms that ran out of time. The expected difference between the Glossary and Standard conditions was not significant; nor was the interaction between Glossary and student ELL status. That is, in State X there was no effect of the Glossary condition for either ELL or non-ELL students.

As for the Read Aloud condition, the interaction between Read Aloud and student ELL status approached significance (coeff = 2.27, p = .08). This suggests a differential effect of Read Aloud between ELL and non-ELL students in the State X sample. While there is a significant negative effect of Read Aloud for non-ELL students, as indicated by the main effect coefficient (coeff = −2.16, p = .03), the effect for ELL students is negligible (i.e., coeff = −2.16 + 2.27 = 0.11).

Results from separate analyses for ELL and non-ELL students. Since there was no significant interaction between the accommodation conditions and ELL student characteristics in the State X sample, the separate analyses were consistent with the results from the analysis of the combined sample. Table 3 presents the results for 112 current ELL students from the multiple regression analysis shown in Equation 2. The expected differences for both accommodations relative to no accommodation were not significant, indicating no significant effects of the accommodations for ELL students (coeff = −1.34 and p = .12 for Glossary; coeff = 0.53 and p = .57 for Read Aloud).
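Each p value in the regression tables can be checked approximately from its estimate and standard error with a Wald z test, z = estimate / SE, and a two-sided p from the standard normal distribution. The Python sketch below assumes a normal reference distribution (the models may have used t, which differs negligibly with more than 100 residual degrees of freedom) and uses the rounded values printed in Tables 2 and 3, so results are approximate. The helper name `wald_test` is hypothetical. For the Read Aloud row of Table 3 it gives p ≈ .57, in line with the table.

```python
from math import erfc, sqrt


def wald_test(estimate: float, se: float) -> tuple:
    """Hypothetical helper: z = estimate / SE and two-sided normal p-value."""
    z = estimate / se
    p = erfc(abs(z) / sqrt(2.0))  # P(|Z| > |z|) for a standard normal Z
    return z, p


rows = {  # (estimate, SE) as printed for State X
    "Table 3 Glossary": (-1.34, 0.87),
    "Table 3 ReadAloud": (0.53, 0.93),
    "Table 2 ReadAloud x ELL": (2.27, 1.29),
}
results = {name: wald_test(*values) for name, values in rows.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.2f}")
```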

For non-ELL students, the effect of the Glossary condition relative to the Standard condition was not significant. However, the Read Aloud condition showed significantly lower outcome scores relative to the Standard condition, which suggests that, on average, the Read Aloud condition hampered


TABLE 3
State X Multiple Regression Results for Current ELL Students (n = 112)

                      Estimate     SE       p
Intercept               15.67     1.90   <.0001
Glossary                −1.34     0.87    0.12
ReadAloud                0.53     0.93    0.57
Mathscore                0.06     0.01   <.0001
Admin                   −3.09     2.06    0.14
Mathscore²               0.00     0.00   <.0001
Residual                13.51

Note. Coefficients of the binary school indicators were also estimated but are omitted from this table for brevity.

the math test performance of non-ELL students (coeff = −2.51, p = .02, effect size = 0.40).

Quantitative Results for Experimental Study: State Y

Descriptive Results

Table 4 presents the descriptive statistics for the ELL and non-ELL student samples in State Y. As in the State X analysis, variables showing preexisting differences across conditions (e.g., state math test scores, FRL) were controlled for in the analysis.

Results for the combined sample of both ELL and non-ELL students. Table 5 presents the results for 262 students from the multiple regression analysis, using the model shown in Equation 1.

Similar to the State X results, the expected difference between the Glossary and Standard conditions was not significant; nor was the interaction between Glossary and student ELL status. This suggests that, in the State Y sample, there was no effect of the Glossary condition for either ELL or non-ELL students.

As for the Read Aloud condition, the interaction between Read Aloud and student ELL status approached significance (coeff = 1.03, p = .07), suggesting differential effects of the condition for ELL and non-ELL students. There was no effect of the Read Aloud condition on math performance for non-ELL students, with the coefficient close to zero, whereas the effect for ELL students was positive and of non-negligible magnitude (i.e., coeff = −0.0008 + 1.03 ≈ 1.03).


TABLE 4
State Y Descriptive Statistics for Current ELL and Non-ELL Students by Condition

                           Standard               Glossary               Read Aloud
Variables               n      M       SD      n      M       SD      n      M       SD
ELL Students
  Test Score           43    13.09    4.03    43    12.86    5.49    52    13.38    4.69
  State Math Score     43   481.65   64.23    41   500.51   41.48    44   468.52   62.60
  State Reading Score  43   569.09   56.63    41   556.41   56.85    44   554.18   45.43
  FRL                  43     0.88    0.32    43     0.88    0.32    52     0.83    0.38
  ELP Test Score       36   548.17   41.74    38   544.11   34.73    42   541.50   41.57
  ELP Level            36     3.67    0.99    38     3.53    0.73    42     3.50    0.99
Non-ELL Students
  Test Score           51    18.65    6.87    55    19.71    6.06    59    17.97    6.03
  State Math Score     42   548.62   48.33    47   555.17   52.42    45   542.96   55.63
  State Reading Score  42   635.52   47.47    46   648.89   41.10    45   640.78   50.46
  FRL                  51     0.22    0.42    55     0.20    0.40    59     0.17    0.38

Note. Test score refers to the math test in this study, out of a total of 39 possible points. The state achievement test scale scores range from 310 to 890 for math and from 330 to 990 for reading. FRL refers to the proportion of students participating in the free or reduced lunch program. The state ELP test scale score refers to the overall score and ranges from 341 to 666. The state ELP levels range from 1 to 5, with 5 being the highest level of proficiency. Some n sizes are lower because of missing background data.

TABLE 5
State Y Multiple Regression Results for ELL and Non-ELL Students (n = 262)

                      Estimate     SE       p
Intercept               16.81     0.79   <.0001
Glossary                −0.39     0.68    0.57
ReadAloud              −0.0008    0.72    0.99
ELL                     −0.75     0.43    0.08
Glossary × ELL          −0.05     0.59    0.93
ReadAloud × ELL          1.03     0.58    0.07
Mathscore                0.08     0.00   <.0001
Admin                   −1.27     1.32    0.34
Mathscore²              0.0002    0.00   <.0001
Residual                12.72

Results from separate analyses for ELL and non-ELL students. In separate analyses for ELL and non-ELL students, we investigated interactions between accommodations and various student characteristics (e.g., levels of prior content knowledge measured by the state test scores, and levels of English language proficiency). Table 6 presents the results for 128 current ELL students from the


TABLE 6
State Y Multiple Regression Results for Current ELL Students (n = 128)

                          Estimate     SE       Z        p
Intercept                   13.93     0.81    17.12   <.0001
Glossary                     0.67     0.76     0.88    0.38
ReadAloud                    3.00     0.88     3.40   <.001
Mathscore                    0.07     0.01     6.54   <.0001
Admin                       −3.24     1.41    −2.30    0.02
Mathscore²                   0.00     0.00     5.11   <.0001
Glossary × Mathscore         0.04     0.01     3.00   <.01
ReadAloud × Mathscore        0.02     0.01     2.19    0.03
Residual                     8.74

multiple regression analysis, using the model shown in Equation 3. Unlike in the State X sample, we found significant interaction effects in the ELL student sample, an important finding that distinguishes these results from those for the combined sample.

When we focused on the ELL sample, the effect of the Read Aloud condition was notable. The Read Aloud condition showed a significant positive effect on the outcome relative to the Standard condition. The coefficient was 3.00, about two thirds of one standard deviation of the outcome (effect size = 0.65; p < .01). That is, controlling for the other variables in the model, students in the Read Aloud condition tended to answer three more items correctly than students in the Standard condition. This is considered a medium effect size (Cohen, 1988). The main effect of Glossary was not significant (coeff = 0.67; p = .38).
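The effect sizes reported here are coefficients divided by a standard deviation of the outcome, but the text does not say which SD served as the denominator. As a consistency check, the Python sketch below back-computes the SD implied by each reported coefficient and effect size, and labels the effect sizes with Cohen’s conventional benchmarks (0.2 small, 0.5 medium, 0.8 large). The helper names `cohen_label` and `implied_sd` are hypothetical. The implied SDs, about 4.6 points for State Y ELL students and about 6.3 for State X non-ELL students (coefficient −2.51 reported in the text), fall within the Test Score SDs in Table 4 (4.03 to 5.49) and Table 1 (5.85 to 6.63), respectively.

```python
def cohen_label(d: float) -> str:
    """Hypothetical helper: Cohen's conventional benchmarks (0.2, 0.5, 0.8)."""
    d = abs(d)
    if d < 0.2:
        return "below small"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"


def implied_sd(coefficient: float, effect_size: float) -> float:
    """Hypothetical helper: outcome SD implied by a coefficient and its effect size."""
    return abs(coefficient / effect_size)


state_y_ell = implied_sd(3.00, 0.65)       # Read Aloud, State Y ELL (Table 6)
state_x_nonell = implied_sd(-2.51, 0.40)   # Read Aloud, State X non-ELL (text)
print(round(state_y_ell, 2), cohen_label(0.65))
print(round(state_x_nonell, 2), cohen_label(0.40))
```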

Both accommodations had significant interaction effects with students’ math content knowledge, as measured by the state standardized math assessment. The positive directions of the interactions indicate that students with higher levels of content knowledge (i.e., higher scores on the state math assessment) benefited more from the accommodations than students with lower levels of content knowledge (i.e., lower scores on the state math assessment).
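With Mathscore centered, the Glossary effect for a student whose state math score is c points from the centering mean is β1 + β9c, and the Read Aloud effect is β2 + β10c; the quadratic, school, and Admin terms cancel in the difference because they do not interact with condition. The Python sketch below evaluates these expressions at c = −50, 0, and 50 (illustrative values, not from the article) using the Table 6 and Table 7 estimates, and solves for the score at which each effect is zero. It assumes Mathscore is in scale-score points centered at the ELL sample mean; because the interaction coefficients are printed with only two decimals, the results are approximate. The helper names are hypothetical.

```python
# (main effect, interaction with centered Mathscore), State Y ELL students.
TABLE_6 = {"Glossary": (0.67, 0.04), "ReadAloud": (3.00, 0.02)}  # regression
TABLE_7 = {"Glossary": (0.42, 0.04), "ReadAloud": (2.54, 0.02)}  # multilevel


def conditional_effect(main: float, slope: float, math_c: float) -> float:
    """Hypothetical helper: effect vs. Standard at centered math score math_c."""
    return main + slope * math_c


def crossover(main: float, slope: float) -> float:
    """Hypothetical helper: centered math score at which the effect is zero."""
    return -main / slope


for label, table in (("Table 6", TABLE_6), ("Table 7", TABLE_7)):
    for name, (main, slope) in table.items():
        effects = [round(conditional_effect(main, slope, c), 2) for c in (-50, 0, 50)]
        print(label, name, effects, "zero at", round(crossover(main, slope), 1))
```

Under these assumptions the Glossary effect turns negative roughly 10 to 17 points below the centering mean, which matches the pattern in which lower-scoring ELL students did worse with the glossary than without it.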

Because more significant findings emerged for the State Y sample, we also fit a multilevel model to examine whether the results were sensitive to model specification (i.e., treating schools as random rather than fixed effects). The results from the multilevel model, shown in Table 7, are similar to the multiple regression results: a significant main effect of Read Aloud, and positive interaction effects of both accommodations with students’ prior math content knowledge.
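One rough indication of why treating schools as fixed or random made little difference is the size of the school-level intercept variance relative to the residual variances in Table 7. The Python sketch below computes, for each condition, the share of residual outcome variance that lies between schools (a conditional intraclass correlation). This is our illustration, not a statistic the article reports; it treats the Table 7 variance components as exact point values, and the helper name `icc` is hypothetical. All three shares come out below 0.1.

```python
TAU_00 = 0.42  # school intercept variance (Table 7)
RESIDUAL = {"Standard": 8.41, "Glossary": 13.34, "Read Aloud": 4.95}  # Table 7


def icc(between: float, within: float) -> float:
    """Hypothetical helper: share of variance lying between schools."""
    return between / (between + within)


shares = {condition: icc(TAU_00, variance) for condition, variance in RESIDUAL.items()}
for condition, share in shares.items():
    print(f"{condition}: {share:.3f}")
```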

Figure 2 shows the estimated relationships between prior math score on the state’s standardized assessment and the outcome score, separately for each


TABLE 7
State Y Results from Multilevel Models for Current ELL Students (n = 128)

Fixed Effects             Coefficient     SE       p
Intercept                    14.31       0.57   <.0001
Glossary                      0.42       0.83    0.61
ReadAloud                     2.54       0.73    0.00
Mathscore                     0.07       0.01   <.0001
Admin                        −2.87       1.01    0.01
Mathscore²                    0.00       0.00   <.0001
Glossary × Mathscore          0.04       0.02    0.03
ReadAloud × Mathscore         0.02       0.01    0.05

Random Effects            Variance Component     SE       p
Intercept                        0.42           0.78    0.29
Standard Residual                8.41           2.02   <.0001
Glossary Residual               13.34           3.08   <.0001
Read Aloud Residual              4.95           1.16   <.0001

FIGURE 2 The estimated relationships between state math test score and the outcome.

condition (i.e., Standard, Glossary, and Read Aloud). As the figure shows, the fitted line for Read Aloud lies above the fitted line for the Standard condition (i.e., no accommodation), reflecting the significant main effect of Read Aloud. In addition, because of the interaction effect, the gap between the Read Aloud and Standard fitted lines becomes greater for students with higher prior math scores on the state standardized assessment.

For the Glossary condition, the fitted lines reflect only the interaction effect, as there was no significant main effect. On average, ELL students in the Glossary and Standard conditions performed at a similar level. However, students who came in with lower prior math knowledge, based on the state math assessment, performed worse in the Glossary condition than comparable students in the Standard condition, whereas students who came in with higher prior math knowledge performed better in the Glossary condition than comparable students in the Standard condition.

In summary, we found that students’ prior math scores tended to moderate the effects of both accommodations, Glossary and Read Aloud, which benefited students with higher prior math scores more than students with lower prior math scores.

For the non-ELL student sample, we tested interactions between both accommodations and student characteristics in the same way as for the ELL sample. However, the interactions found in the ELL student sample did not hold for non-ELL students. The effects of both accommodations relative to no accommodation were not statistically significant (coeff = −0.16, p = .84 for Glossary; coeff = −0.16, p = .86 for Read Aloud). Thus, the analysis of the non-ELL sample confirmed the findings for non-ELL students from the combined analysis.

Qualitative Results: Students’ Verbal Protocol Analysis

As described earlier, the students’ verbal protocol analysis aimed to identify the difficulties ELL students encountered while taking a math assessment; in particular, it examined whether students’ difficulties stemmed from limited English language proficiency or from a lack of mathematical content knowledge. The qualitative analysis also focused on the students’ use of the given accommodation and their perceptions of the helpfulness of accommodations in taking a math assessment. The results are organized by research question: language and content difficulties, the use of the glossary accommodation along with students’ prior experience with the focal accommodations, and students’ perceptions of the helpfulness of the focal accommodations.

Language Difficulty in Items

To examine the extent to which the students had difficulty understanding the language in the sample items, we asked students to paraphrase what each question was asking in their own words. Based on the students’ think-aloud and retrospective interview responses, four codes were assigned: “Yes: Students comprehended the question,” “Partial comprehension: There were some


TABLE 8
Language Comprehension Results for Each Item by ELL Status: Frequency (%)

Item    Yes          Partial      No           Total
Current ELL
1       40 (76.9)     8 (15.4)     4 (7.7)     52 (100.0)
2       20 (44.4)    10 (22.2)    15 (33.3)    45 (100.0)
3       23 (47.9)    22 (45.8)     3 (6.3)     48 (100.0)
4       26 (51.0)    22 (43.1)     3 (5.9)     51 (100.0)
5       23 (51.1)    16 (35.6)     6 (13.3)    45 (100.0)
Former ELL
1       13 (100.0)    0 (0.0)      0 (0.0)     13 (100.0)
2       11 (84.6)     1 (7.7)      1 (7.7)     13 (100.0)
3       11 (78.6)     3 (21.4)     0 (0.0)     14 (100.0)
4       11 (78.6)     3 (21.4)     0 (0.0)     14 (100.0)
5       13 (92.9)     1 (7.1)      0 (0.0)     14 (100.0)

parts that students were unable to paraphrase or they said they did not understand about certain parts” (in other words, students comprehended the gist of the story in an item but did not adequately paraphrase parts of it), “No: Students did not comprehend the question or were unable to paraphrase the question at all,” and “Not sure: There was not enough evidence to judge students’ comprehension of the language.” There were also a few cases where students did not have enough time to complete each item (as described earlier in the Method section). These cases were coded as missing responses, together with the “Not sure” cases.

Table 8 presents a summary of students’ comprehension of the five sample items by ELL status (current and former ELL students).

As shown in Table 8, current ELL students were less able than former ELL students to paraphrase the items appropriately in their own words, indicating weaker comprehension of the language in the items. Nevertheless, most current ELL students demonstrated at least partial understanding of each item by describing part of it in their own words. All students were also asked whether each item included words that were difficult to understand and, if so, which words. Table 9 summarizes the students’ responses on vocabulary difficulty in each item.

ELL students identified various words as difficult, including technical words, general academic words, proper nouns, and relatively frequent words such as “restaurant” and “climb.” The comprehension difficulty in Item 2 appeared to stem not only from vocabulary but also from ELL students’ cultural and age-related experience (i.e., tipping at a restaurant). Many students indicated that “closest approximation” was hard, and they also struggled to describe the phrase “a 15 percent tip on a check” appropriately. The following


TABLE 9
Vocabulary Difficulty Identified by Students in Each Item: Frequency (%)

Item   n    There Were Hard Words   I Knew All Words   Identified Difficult Words/Phrases*
1      59   13 (22.0)               46 (78.0)          shaded (8), trapezoid (4)
2      54   27 (50.0)               27 (50.0)          closest approximation (27), tip (4), of the following (3), charges (2), percent (2), restaurant (2)
3      56    7 (12.5)               49 (87.5)          at least one (3)
4      56   30 (53.6)               26 (46.4)          Salt Flats (16), Talon Bluff (14), elevation (13), hikers (6), climbed (4)
5      49   20 (40.8)               29 (59.2)          plumber (9), additional (7), expressions (7), represents (5), calculate (5), travel (2)

Note. *The numbers in parentheses indicate the number of students who identified the given word/phrase as difficult.

excerpt shows a student struggling to comprehend the phrase while repeatedly reading the item:

Student 1 (State X Current ELL)

INTERVIEWER: You want to tell me what the question is asking in your own words? How would you explain this question to me?

STUDENT 1: Of the following which is the closest . . . [reads to self again] I think it’s asking like . . . oh like, what is the tip of . . . the . . . [quietly re-reading] the tip of . . . [reading silently to self again] I think they left like . . . tip for on . . . twenty four and uh point ninety nine cent . . . Ah, I can’t think . . . I think they let . . . left their tip . . . of like twenty four ninety nine. Twenty four dollar and ninety nine cents like, I think they left fifteen dollars with it . . .

Sometimes, vocabulary identified as difficult was not a major obstacle to comprehending the items. For instance, although students identified proper nouns such as “Salt Flats” and “Talon Bluff” in Item 4 as difficult, they appropriately described the problem as finding the “difference between the two places.” Moreover, some students demonstrated the ability to infer the meaning of unknown words from context, as in the following excerpt:

Student 2 (State Y Current ELL)

INTERVIEWER: Were there any words that you didn’t know?

STUDENT 2: Yeah, this [pointing to hikers].


INTERVIEWER: You still explained when I asked you what this question was asking. How did you figure out without knowing the word?

STUDENT 2: Because you had to like, you have to . . . I read the front and the back and it says climbed . . . so maybe it’s like, hikers, a person who hikes, climbs.

Content Difficulty in Items

The students’ problem-solving processes were examined through their think-aloud and retrospective interview responses. Four codes were assigned to students’ verbal reports based on their problem-solving processes and answers: (1) correct (demonstrating appropriate mathematical knowledge to solve the given item correctly); (2) incorrect attempt (demonstrating some mathematical knowledge but arriving at an incorrect answer); (3) guess (demonstrating no mathematical knowledge, or choosing the answer based on non-mathematical reasoning); and (4) no attempt (which includes circling a response without providing any reasoning).

Table 10 summarizes the students’ problem-solving results on each item. Overall, current ELL students performed lower than former ELL students on all five items, as shown by the percentage of “Correct” responses, which is consistent with students’ performance on the overall test. A number of current ELL students attempted to solve the given items, demonstrating an understanding of the language in the items, but did not use appropriate mathematical procedures to solve them correctly.
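The comparison of “Correct” percentages can be reproduced from the counts in Table 10. The short Python sketch below recomputes the percentage correct for each item and group and the gap between former and current ELL students; the helper name `percent_correct` is hypothetical. The former ELL group has only 14 students per item, so its percentages move about 7 points with each student and should be read with caution.

```python
# "Correct" counts and item totals from Table 10 (items 1-5).
CORRECT = {"Current ELL": [13, 4, 12, 8, 11], "Former ELL": [6, 4, 7, 6, 10]}
TOTAL = {"Current ELL": [54, 54, 54, 54, 51], "Former ELL": [14, 14, 14, 14, 14]}


def percent_correct(group: str) -> list:
    """Hypothetical helper: percentage correct per item, one decimal place."""
    return [round(100 * c / n, 1) for c, n in zip(CORRECT[group], TOTAL[group])]


current = percent_correct("Current ELL")
former = percent_correct("Former ELL")
gaps = [round(f - c, 1) for c, f in zip(current, former)]
print(current, former, gaps)
```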

For cases of “Incorrect Attempt” and “Guess,” students often performed an incorrect arithmetic computation (e.g., mistakes in addition, multiplication, or

TABLE 10
Problem-Solving Results for Each Item by ELL Status: Frequency (%)

Item    Correct      Incorrect Attempt   Guess        No Attempt   Total
Current ELL
1       13 (24.1)    33 (61.1)            0 (0.0)      8 (14.8)    54 (100.0)
2        4 (7.4)     29 (53.7)           11 (20.4)    10 (18.5)    54 (100.0)
3       12 (22.2)    35 (64.8)            1 (1.9)      6 (11.1)    54 (100.0)
4        8 (14.8)    42 (77.8)            0 (0.0)      4 (7.4)     54 (100.0)
5       11 (21.6)    30 (58.8)            3 (5.9)      7 (13.7)    51 (100.0)
Former ELL
1        6 (42.9)     8 (57.1)            0 (0.0)      0 (0.0)     14 (100.0)
2        4 (28.6)     5 (35.7)            3 (21.4)     2 (14.3)    14 (100.0)
3        7 (50.0)     7 (50.0)            0 (0.0)      0 (0.0)     14 (100.0)
4        6 (42.9)     8 (57.1)            0 (0.0)      0 (0.0)     14 (100.0)
5       10 (71.4)     3 (21.4)            1 (7.1)      0 (0.0)     14 (100.0)


division). Item 2 was the most difficult item for both current and former ELL students to solve correctly. Students who made an “Incorrect Attempt” tended to divide 24.99 by 15. Many students also tried to guess how much tip should be given based on real-life restaurant experiences instead of computing it. For instance, the following excerpt suggests that the student comprehended the language in the item but lacked the content knowledge to solve it correctly:

Student 3 (State X Current ELL)

STUDENT 3: Because um I remember one time I went to a restaurant and . . . we left ten percent tip and it was two dollars and something, so it couldn’t be A and D was too much.

Use of Glossary Accommodation

To examine whether and how students used the glossary provided for each item, students’ verbal reports were categorized into three groups: (1) “No” (the student did not look at the glossary at all); (2) “Looked” (the student said that s/he looked at it but did not use it because s/he knew all the words); and (3) “Looked and Used” (the student used the meaning shown in the glossary). Table 11 summarizes the students’ use of the glossary for each item. “No” was the most common response for every item and a majority response for four of the five items. When prompted for reasons, students responded that they did not need the glossary because they already knew all the words, or that they “forgot” or did not realize the glossary was there.
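The pattern in Table 11 can be checked directly from its counts. The Python sketch below finds the most frequent response for each item, lists the items on which “No” is an outright majority, and pools the counts across items. The dictionary and variable names are illustrative only. “No” is the most frequent response on every item and exceeds half of responses on all items except Item 2 (43.4%); pooled across items it accounts for about 54% of responses.

```python
# Counts from Table 11: (No, Looked, Looked and Used) for items 1-5.
GLOSSARY_USE = {1: (28, 18, 6), 2: (23, 11, 19), 3: (32, 12, 6),
                4: (28, 9, 14), 5: (25, 7, 14)}
LABELS = ("No", "Looked", "Looked and Used")

modal = {}
majority_no = []
for item, counts in GLOSSARY_USE.items():
    n = sum(counts)
    modal[item] = LABELS[counts.index(max(counts))]  # most frequent response
    if counts[0] / n > 0.5:                          # "No" above half of responses
        majority_no.append(item)

pooled_no = (sum(c[0] for c in GLOSSARY_USE.values())
             / sum(sum(c) for c in GLOSSARY_USE.values()))
print(modal, majority_no, round(pooled_no, 3))
```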

Students’ Prior Experience with Glossary/Dictionary and Read-Aloud Accommodations

To understand whether students were familiar with the given accommodations, we asked students whether they had previous experience with glossary, dictionary, and read-aloud accommodations on their state’s standardized math assessments. Overall, students had little experience with the given

TABLE 11
Students’ Use of Glossary for Each Item: Frequency (%)

Item   n    No (Did Not Look)   Looked (But Knew All the Words)   Looked and Used
1      52   28 (53.8)           18 (34.6)                          6 (11.5)
2      53   23 (43.4)           11 (20.8)                         19 (35.8)
3      50   32 (64.0)           12 (24.0)                          6 (12.0)
4      51   28 (54.9)            9 (17.6)                         14 (27.5)
5      46   25 (54.3)            7 (15.2)                         14 (30.4)


accommodations. Almost no students in the sample had previous experience using a built-in glossary or dictionary on a state math assessment. Two students in State X reported that the directions were read aloud and that items were read aloud only when students raised their hands to ask. More students reported having experience with read aloud in State Y (31% of the respondents) than in State X (18% of the respondents).

Perceptions on the Helpfulness of Accommodations

In response to the question of whether the glossary accommodation was helpful, most students (approximately 80% of the respondents) reported that having a glossary was helpful. Some students had mixed feelings about a glossary, stating that it would only “sometimes” be helpful, such as only “if you need it.” With respect to the read-aloud accommodation, about 63% of the students viewed reading aloud favorably. Students were more likely to report mixed or negative perceptions of read aloud than of the glossary, commenting that individuals solve problems at different paces or describing being read aloud to as a distraction.

DISCUSSION

The present study investigated the effectiveness and validity of two accommodations, a glossary and reading aloud the entire test, provided to ELL students during a mathematics assessment. As described earlier, these two accommodation types have been commonly allowed across states’ accountability assessments on the assumption that, by directly addressing ELL students’ linguistic barriers, they would be effective for ELL students.

Regarding the effect of the glossary accommodation, our results indicated no significant difference in ELL students’ performance on the mathematics assessment in either state’s sample compared to the Standard condition (i.e., receiving no accommodation). The students’ verbal protocol analysis provided some insight into this result. The majority of the students who participated in the think-aloud did not use the provided built-in glossary while completing the sample test items. A number of students reported that they “forgot” about the glossary, and all students who were asked reported that they had never been provided a glossary during any mathematics testing. Taken together, the evidence suggests that students’ prior experience, familiarity, and skill in using a glossary may be important factors in the effectiveness of the glossary accommodation.

As for the read-aloud accommodation, the statistical analysis yielded mixed results. In the State X sample, there was no significant effect of the read-aloud accommodation on ELL students’ performance on the mathematics test, compared to students in the Standard condition. However, a significant positive effect of the read-aloud accommodation was detected in the State Y ELL sample. Although the small sample size limits how far these results can be generalized to a larger population, there are some plausible explanations for this difference. First, the verbal protocol analysis revealed that State Y students had more prior experience with the read-aloud accommodation than State X students. Second, under State Y policy, a test script developed by the state’s test publishers was provided for use during read aloud, so State Y appeared to implement the read-aloud accommodation more systematically when it was used. We speculate that the mixed effect of the read-aloud accommodation was related to ELL students’ prior experience, similar to the finding about the glossary accommodation: State Y students were more likely to have received a read-aloud accommodation in the past, and more likely to have received one in a systematic way.

Our analysis, which controlled for various student characteristics, yielded a notable result regarding the interaction between accommodation effects and student characteristics. In the State Y ELL sample, there was a significant interaction between each of the two accommodations (glossary and read aloud) and ELL students’ prior content knowledge, as measured by the state’s mathematics assessment. That is, ELL students who scored higher on their state mathematics assessment benefited more from a given accommodation than ELL students who scored lower. This result suggests that the accommodations help ELL students who have acquired the content knowledge but do not help those who have not. This finding underscores the importance of providing accommodations to ensure that content assessments are accessible to ELL students.
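To make the interaction concrete, consider a simplified single-level linear model in which the accommodation indicator is crossed with a centered prior-achievement score. This is an illustrative assumption, not the hierarchical specification used in the study, and the function name and coefficient values below are hypothetical. Under this form the accommodation effect is not a single number: it equals the main-effect coefficient plus the interaction coefficient times the student’s distance from the mean prior score.

```python
def accommodation_effect(b_acc: float, b_interaction: float,
                         prior_score: float, prior_mean: float) -> float:
    """Return the model-implied accommodation effect at a given prior score.

    Assumes a hypothetical linear model with a centered prior-achievement term:
        score = b0 + b_acc*acc + b_prior*(prior - mean)
                + b_interaction*acc*(prior - mean)
    where acc is 1 for the accommodated condition and 0 for Standard.
    """
    centered = prior_score - prior_mean
    # Difference between the acc=1 and acc=0 predictions at the same prior
    # score; the intercept b0 and the prior slope b_prior cancel out.
    return b_acc + b_interaction * centered


# Made-up coefficients for illustration only, not estimates from this study.
for prior in (-1.0, 0.0, 1.0):
    print(prior, accommodation_effect(0.5, 0.8, prior, 0.0))
```

At the mean prior score the implied effect equals `b_acc`; with a positive interaction coefficient the implied effect grows for students above the mean and shrinks, possibly below zero, for students below it, which is the direction of the pattern reported for State Y.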

It is worth mentioning the results of our preliminary analyses, in which we examined the interaction between the given accommodations and ELL students’ background characteristics, particularly their English language proficiency (ELP). Based on previous literature, we expected that ELL students might benefit from accommodations differentially depending on their ELP levels. However, as described in the methodology leading to our final model, the results showed no such interaction effects with students’ ELP levels. A close look at the data indicated that our sample had a limited range of ELP levels. In the State X sample, there were more students at the medium to high levels (i.e., Levels 3, 4, and 5, out of a possible 5 on State X’s ELP test) than at the lower levels. In the State Y sample, more than half of the ELL students were clustered at one level, Level 4. It appears that our data did not have enough power to detect interaction effects between accommodation and students’ ELP levels.
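The power argument can be illustrated with a standard result for a two-group comparison (for example, one accommodation versus Standard): if a binary condition indicator T, with a share p of students accommodated, is crossed with a continuous moderator M, the ordinary least squares standard error of the T × M coefficient is approximately σ / √(n · p(1 − p) · Var(M)), where Var(M) is the moderator’s variance within groups. The sketch below assumes this simplified single-level form and uses hypothetical ELP distributions rather than the study’s data; it shows how clustering students at one level shrinks Var(M) and inflates the standard error of the interaction.

```python
import math
import statistics


def interaction_se(sigma: float, n: int, p: float, moderator_var: float) -> float:
    """Approximate OLS standard error of a treatment-by-moderator coefficient.

    Assumes a binary treatment (share p treated), a continuous moderator with
    the same variance in both groups, and homoscedastic residual SD sigma.
    """
    return sigma / math.sqrt(n * p * (1 - p) * moderator_var)


# Hypothetical ELP levels (1-5 scale) for two samples of 50 students each.
spread = [1, 2, 3, 4, 5] * 10                # evenly spread across levels
clustered = [3] * 10 + [4] * 30 + [5] * 10   # most students at Level 4

var_spread = statistics.pvariance(spread)
var_clustered = statistics.pvariance(clustered)

se_spread = interaction_se(sigma=1.0, n=50, p=0.5, moderator_var=var_spread)
se_clustered = interaction_se(sigma=1.0, n=50, p=0.5, moderator_var=var_clustered)

# The clustered sample's interaction estimate is sqrt(5), about 2.24, times noisier.
print(round(se_clustered / se_spread, 2))
```

Because the standard error scales with 1/√Var(M), a sample concentrated at one or two ELP levels needs a much larger effect, or many more students, before an accommodation-by-ELP interaction becomes detectable.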

While the analysis of ELL students’ performance in different accommodation conditions was directly concerned with the effectiveness of a given accommodation, it was also intertwined with a validity issue. An “effective” accommodation helps ELL students reduce linguistic barriers that interfere with their ability to demonstrate their content knowledge. Thus, an effective accommodation can increase the validity of test scores for ELL students. In addition, the analysis of non-ELL students’ performance addressed the validity issue from another angle. Providing accommodations should not change the nature of the construct being measured. Thus, providing accommodations to non-ELL students, who would not need linguistic support, should not significantly increase their test scores. One way of addressing these validity concerns was to provide evidence that non-ELL students’ performance remains unchanged across all accommodation conditions.
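The validity reasoning above amounts to comparing accommodation gains across groups, an idea often discussed in the accommodations literature as the interaction hypothesis or “differential boost.” The minimal sketch below is a descriptive illustration only, with a hypothetical class name, function name, and made-up means; it is not the analysis reported in this article, which reported ELL and non-ELL results separately.

```python
from dataclasses import dataclass


@dataclass
class GroupMeans:
    """Mean test scores for one student group under two conditions."""
    standard: float
    accommodated: float

    @property
    def gain(self) -> float:
        # Difference in mean score between the accommodated and Standard conditions.
        return self.accommodated - self.standard


def differential_boost(ell: GroupMeans, non_ell: GroupMeans) -> float:
    """Return how much more ELL students gain than non-ELL students.

    An accommodation that is both effective and valid would show a positive
    ELL gain and a non-ELL gain near zero, making this difference positive.
    """
    return ell.gain - non_ell.gain


# Made-up means for illustration only.
ell = GroupMeans(standard=10.0, accommodated=12.0)
non_ell = GroupMeans(standard=15.0, accommodated=15.0)
print(ell.gain, non_ell.gain, differential_boost(ell, non_ell))
```

In practice each gain would be estimated with covariates and tested against sampling error; the raw differences here only show the logic of checking both groups.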

The results of this study showed no significant difference in non-ELL students’ performance across the accommodation conditions in the State Y sample. However, State X non-ELL students in the Read Aloud condition performed significantly lower than their peers in the Standard condition. One may speculate that reading aloud distracted non-ELL students, who had less trouble reading silently and understanding the questions on their own.

Students’ verbal protocol analysis provided a deeper understanding of ELL students’ problem-solving processes and their difficulties with the language of the items. As expected, ELL students had more difficulty than former ELL students in understanding the language of the sample math items. However, the ELL students in the think-aloud analysis struggled more with the content knowledge needed to solve an item correctly. That is, most ELL students attempted to solve the problems, which suggested that they understood the language of the items, but they did not use appropriate mathematical procedures to solve the items correctly. For example, they used inappropriate operations, such as multiplication instead of division, and vice versa. Students’ content knowledge limitations need to be investigated further to determine whether their limited English language proficiency interfered with their learning of the content.

Although most students identified hard words at the researchers’ request, they tended to ignore the glossary while taking the sample test. Not surprisingly, students listed both general academic and specialized academic words as hard words (e.g., “approximation,” “following,” “additional,” “expression”). It is also notable that some students identified high-frequency, everyday words as hard, including “travel,” “restaurant,” and “climbed.” These results suggest that ELL students need explicit instruction in both academic and social vocabulary, even in secondary-level mathematics classes.

The linguistic complexity ratings of the given sample items were not necessarily related to students’ comprehension of or performance on those items. For instance, Item 1 in the think-aloud test had a relatively high linguistic complexity rating, particularly in the vocabulary and grammar categories. However, students generally comprehended this item well and performed better on it than on other items. It may be that the visual image in the item (e.g., an illustrated shaded triangle and trapezoid) provided extra cues that helped ELL students comprehend and solve it. Item 5, which also had a high linguistic complexity rating with a higher number of academic words, was solved correctly more often than other items. Meanwhile, students performed poorly on Item 2, which had a relatively low linguistic complexity rating and language demand. The verbal protocol analysis revealed not only that students had difficulty applying appropriate mathematical procedures, but also that the story in the problem (i.e., tipping at a restaurant) had little contextual relevance to students’ age and background. This result suggests that socio-cultural-economic background and age-appropriateness should be considered, in addition to language difficulty, when examining potential sources of item difficulty for ELL students.
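With more items, the observation that complexity ratings and item performance were not necessarily related could be summarized by a rank correlation between each item’s complexity rating and its proportion correct. The sketch below computes Spearman’s rho (the Pearson correlation of average ranks) in plain Python; it illustrates one possible check rather than an analysis the study reports, and the ratings and proportions are made-up values. With only a handful of think-aloud items, any such coefficient would be descriptive at best.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up complexity ratings and proportions correct for five items.
complexity = [3.5, 1.5, 2.0, 2.5, 3.0]
prop_correct = [0.8, 0.2, 0.5, 0.4, 0.7]
print(round(spearman(complexity, prop_correct), 2))
```

A rank-based coefficient is used here because complexity ratings are ordinal judgments, so only their ordering, not their spacing, is treated as meaningful.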

The results of this study offer a number of practical implications for policy makers and practitioners to consider in the use of accommodations. As discussed above, students’ familiarity and prior experience with a given accommodation seem to play an important role in its effects. For accommodations to be used effectively and appropriately, they should be part of daily classroom practice. It is thus critical for policy makers to provide clear accommodation guidelines so that practitioners can use the accommodations in their daily instruction and assessment of ELL students. Professional development should also take place to ensure that accommodation policies are communicated across the state, districts, and schools, as well as between content teachers and ELL teachers. Meanwhile, it is important to consider students’ content knowledge as well as their language proficiency when providing accommodations. This study suggests that if students have little content knowledge to begin with, providing accommodations makes little difference regardless of their language proficiency level.

LIMITATIONS AND FURTHER STUDY

Although this study employed a careful research design, a randomized experiment accompanied by an analysis of students’ verbal reports, it has some limitations. A major limitation is the small sample size, which requires readers to be cautious in interpreting and generalizing the results. The results also indicate that the effects of accommodations may be contingent on the specific ELL population and its experience with accommodations. Students in the present study were largely from Spanish-speaking backgrounds, had started school in the United States in the early elementary grades, and were clustered at moderate to high English language proficiency levels. Future research should include a replication of this study with different populations, such as ELL students who have experience using a glossary in testing and ELL students with a wider range of English language proficiency. Research is also needed on ELL students’ opportunity to learn, to better understand their performance on content-area assessments. Additionally, examining non-ELL students’ problem-solving processes through verbal protocol analysis may help identify ELL-specific difficulties in tackling math items.
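The caution about sample size can be made concrete with a common normal approximation for comparing two independent groups: the smallest standardized mean difference detectable at two-sided level α with power 1 − β is about (z₁₋α/₂ + z₁₋β) × √(1/n₁ + 1/n₂). The sketch below assumes this simple form and uses hypothetical group sizes rather than this study’s; it ignores the clustering of students within classrooms and schools, which typically raises the detectable effect, and covariate adjustment, which typically lowers it.

```python
from statistics import NormalDist


def min_detectable_effect(n1: int, n2: int, alpha: float = 0.05,
                          power: float = 0.80) -> float:
    """Approximate minimum detectable standardized mean difference.

    Uses a two-sided z approximation for two independent groups and ignores
    clustering, covariates, and small-sample t corrections.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * (1 / n1 + 1 / n2) ** 0.5


# Hypothetical per-condition group sizes.
for n in (20, 50, 200):
    print(n, round(min_detectable_effect(n, n), 2))
```

Under these assumptions, 20 students per condition gives a minimum detectable effect of roughly 0.89 standard deviations, 50 per condition roughly 0.56, and 200 per condition roughly 0.28, which illustrates why modest accommodation effects are hard to detect in small samples.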

ACKNOWLEDGEMENTS

This study was conducted while the authors were at CRESST/UCLA. The study was supported by the research grant (R305A050004) awarded by the U.S. Department of Education’s Institute of Education Sciences (IES). The findings and opinions expressed herein are those of the authors and do not necessarily reflect the positions or policies of the IES. We would like to thank Joan Herman, Noelle Griffin, and Jamal Abedi for their invaluable feedback, and Sandy Chang and Nichole Rivera for their significant research assistance. Finally, our sincere thanks go to all the students and teachers who participated in this study.

REFERENCES

Abedi, J., Courtney, M., & Leon, S. (2003a). Effectiveness and validity of accommodations for English language learners in large-scale assessments (CSE Tech. Rep. No. 608). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Courtney, M., & Leon, S. (2003b). Research-supported accommodation for English language learners in NAEP (CSE Tech. Rep. No. 586). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Hofstetter, C., Baker, E., & Lord, C. (2001). NAEP math performance and test accommodations: Interactions with student language background (CSE Tech. Rep. No. 536). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Lord, C., Boscardin, C. K., & Miyoshi, J. (2000). The effects of accommodations on the assessment of LEP students in NAEP (CSE Tech. Rep. No. 537). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Anderson, M., Liu, K., Swierzbin, B., Thurlow, M., & Bielinski, J. (2000). Bilingual accommodations for limited English proficient students on statewide reading tests: Phase 2 (Minnesota Report No. 31). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.

Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data. Cambridge, MA: The MIT Press.

Francis, D. J., Rivera, M., Lesaux, N., Kieffer, M., & Rivera, H. (2006). Practical guidelines for the education of English language learners: Research-based recommendations for the use of accommodations in large-scale assessments. Portsmouth, NH: RMC Research Corporation, Center on Instruction. Retrieved from http://www.centeroninstruction.org/files/ELL3-Assessments.pdf

Longman handy learner’s dictionary of American English (new edition). (2000). Harlow, Essex, UK: Pearson Education Limited.

No Child Left Behind Act of 2001, Pub. L. No. 107–110, 115 Stat. 1425 (2002).

Raudenbush, S., & Liu, X. (2000). Statistical power and optimal design for multisite randomized trials. Psychological Methods, 5, 199–213.

Seltzer, M. H. (2004). The use of hierarchical models in analyzing data from experiments and quasi-experiments conducted in field settings. In D. Kaplan (Ed.), The handbook of quantitative methods for the social sciences (pp. 309–330). Thousand Oaks, CA: Sage.

Shadish, W. R. (2002). Revisiting field experimentation: Field notes for the future. Psychological Methods, 7(1), 3–18.

Sireci, S. G., Li, S., & Scarpati, S. (2003). The effects of test accommodations on test performance: A review of the literature (Center for Educational Assessment Research Report No. 485). Amherst, MA: University of Massachusetts, School of Education.

Webb, N. L. (1997). Criteria for alignment of expectations and assessments in mathematics and science education (Monograph No. 6). Council of Chief State School Officers and National Institute for Science Education Research. Madison, WI: University of Wisconsin, Wisconsin Center for Education Research.

Webb, N. L., Alt, M., Ely, R., & Vesperman, B. (2005). The WEB alignment tool: Development, refinement, and dissemination. Report to the Council of Chief State School Officers’ State Collaborative on Assessment & Student Standards, Technical Issues in Large-Scale Assessment Collaborative. Washington, DC: Council of Chief State School Officers.

Wolf, M. K., Herman, J. L., Kim, J., Abedi, J., Leon, S., Griffin, N., . . . Shin, H. W. (2008). Providing validity evidence to improve the assessment of English language learners (CRESST Rep. No. 738). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Wolf, M. K., Kao, J., Griffin, N., Herman, J. L., Bachman, P., Chang, S. M., & Farnsworth, T. (2008). Issues in assessing English language learners: English language proficiency measures and accommodation uses—Practice review (CRESST Rep. No. 732). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Wolf, M. K., & Leon, S. (2009). An investigation of the language demands in content assessments for English language learners. Educational Assessment, 14(3&4), 139–159.
