Applied Measurement in Education, 27: 196–213, 2014
Copyright © Taylor & Francis Group, LLC
ISSN: 0895-7347 print/1532-4818 online
DOI: 10.1080/08957347.2014.905784
Formative Information Using Student Growth Percentiles for the Quantification of English Language Learners’ Progress in Language Acquisition
Husein Taherbhai, Daeryong Seo, and Kimberly O’Malley
Pearson
English language learners (ELLs) are the fastest growing subgroup in American schools. These students, by a provision in the reauthorization of the Elementary and Secondary Education Act, are to be supported in their quest for language proficiency through the creation of systems that more effectively measure ELLs’ progress across years. In the past, ELLs’ progress has been based on students’ prior scores measuring the same construct. To disentangle effectiveness from achievement, the reporting has generally targeted mean-group activity. In contrast, student growth percentiles (SGPs) provide a comparison of students’ growth with others who have the same achievement score history. By examining the construct measured by an English language proficiency test as manifested in student scores in Speaking, Listening, Reading and Writing, this article outlines the use of SGPs in providing information on how much each student needs to grow, which will allow educators to more effectively apply differential formative instructional strategies.
In recent years, the study of English in K–12 settings has taken center stage in the United States because of the growing numbers of English language learners (ELLs) enrolled in schools across the nation (Meyer, Madden, & McGrath, 2004; U.S. Government Accountability Office [GAO], 2006; Van Roekel, 2008). However, many of these students’ academic performances fall well below those of their non-ELL peers, not because of the lack of academic achievement but because of the inadequacy of their English language skills (Abedi, 2008).
Under the No Child Left Behind Act (NCLB) of 2001, each state is required to assess language proficiency via the four recognized English Language Proficiency (ELP) modalities (i.e., Speaking, Listening, Reading, and Writing). As McCarthy (1999) points out, the use of modality results is necessary because it helps educators demarcate the underlying reasons of student differential performances, look for parallels between the processes in the learning of each modality, and use the information constructively in a classroom setting.
According to Abedi (2008), most states use a compensatory model for assessing students’ language acquisition. Unlike conjunctive models where students must achieve “targets” in each of the modalities of ELP to be considered proficient, students assessed by the compensatory model may not be proficient in an aspect of language acquisition that is important when ELLs are mainstreamed into a non-ELL classroom (e.g., Reading), and yet be considered proficient in total language acquisition.
Correspondence should be addressed to Daeryong Seo, Psychometric and Research Services, Pearson, 19500 Bulverde Road, San Antonio, TX 78259. E-mail: [email protected]; or to Husein Taherbhai, 1265 Earlford Drive, Pittsburgh, PA 15227. E-mail: [email protected]
Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/hame.
In recent years, the paradigm shift in educational culture emphasizes the importance of assessment that provides information for formative purposes (Rushton, 2005), where relevant feedback can be used to minimize the existing gap between the actual and desired levels of performance (Nichols, Meyers, & Burling, 2009; Perie, Marion, & Gong, 2009).
While simple descriptive scores do provide a diagnostic aspect to the differential performance of student achievement, they do not provide meaningful information that can be integrated into effective classroom learning and teaching (Ferrara & DeMauro, 2006; Goodman & Hambleton, 2004; Roberts & Gierl, 2010). In its simplest form, the difference in two student scores on a vertical scale can give an indication of growth. However, as Lissitz and Doran (2009) point out, additional information is required to meaningfully interpret that growth. Even when well-designed scales exist, Betebenner (2009) contends that vertical scales are, at their best, quasi-interval because the same change in student score points can lead to different amounts of learning, depending on where the student is on the scale.
In the past, the primary effort in the analysis of growth has been to use prior student achievement to disentangle the effectiveness (e.g., of teachers, of schools) from the aggregate level of achievement (Ballou, Sanders, & Wright, 2004; Betebenner, 2007; Braun, 2005). The use of students’ prior scores (one or more, at different points in time) as indicators of students’ growth is based on empirical evidence in the establishment of the relationship between students’ pretests and outcome variables (e.g., Ho, 2011; Sanders, 2006). Generally speaking, two performance tasks (including one prior score) are necessary, although a larger number of prior scores can provide more accurate results (Sanders, 2006).
Using prior student achievement to quantify teacher and school effectiveness has been brought into the limelight, primarily through value-added models (VAM) (Sanders, Saxton, & Horn, 1997) of which Sanders’ (2006) Tennessee Value-Added Assessment System (TVAAS) and the Educational Value-Added Assessment System (EVAAS) have been very prominent. However, as Betebenner (2007) points out, models suitable for quantifying teacher and school effectiveness through students’ longitudinal data “are generally not well suited for making individual determinations concerning student progress” (p. 3).
Furthermore, it should be noted that most current multilevel approaches to measuring growth consider measurement occasions as nested within students. These approaches use fit lines for the vertical scale with distinct slopes and intercepts for each student. However, the slopes represent an “average” rate of increase for the students across testing occasions, and these rates show statistical artifacts that make lower achieving students increase at rates exceeding those of their higher achieving counterparts (Marsh & Hau, 2002).
While most formative methods provide the same type of intervention for ELLs with the same score in the test, they fail to recognize differential requirements of students within these similar scoring groups. Recognizing that students achieve differently and knowing what these differences are can instill realistic goals for ELLs. One way of providing fair differential growth expectancy for students is through normative information whereby student scores are compared not to an average or aggregate achievement of students who have different achievement trends, but to other students who have an identical pattern of achievement across tests that measure the same construct.
Betebenner’s (2007) student growth percentile (SGP) model, which uses quantile regression, can be used as a formative assessment tool to compare students with identical prior history. The SGP model allows students’ estimated entry scores at each of the predetermined percentiles to be calculated based on their prior scores. Thus, the percentile necessary for growth to attain a predetermined target score can be assessed for each student. The differential propensity of each student for achieving a predetermined target score allows the application of instructional resources in a way that does not expect too much or too little from each student. Creating realistic expectations based on the potential of the student would likely allow for the allocation of scarce instructional resources in the most productive manner and avoid setting some students up for failure.
PURPOSE OF THE ARTICLE
The purpose of this article is to utilize Betebenner’s (2007) SGP model to provide formative information for ELLs’ ongoing progress in English language proficiency (ELP). The method uses the quantile regression model based on students’ prior total ELP scores and the ELP modality scores to estimate each student’s growth percentile score for the total and the ELP modalities.
While the normative comparison of students with their academic peers (i.e., students who have identical prior scores) allows the quantification of students’ potential, there is also a need for examining how much the student needs to achieve to obtain the criterion-referenced target of proficiency. Therefore, aside from comparing SGPs in a normative manner with academic peers, each student’s percentile growth will be examined to evaluate the percentile growth required in attaining a predetermined criterion score, i.e., the target score.
The purpose of this article then can be stated as:
1. quantifying growth percentile score for each ELL based on
a. total ELP scores conditioned on previous years’ ELP scores,
b. modality ELP scores conditioned on previous years’ ELP modality scores; and
2. determining the percentile entry score for each ELL in achieving the target score (i.e., the percentile ranking a student needs to achieve proficiency in total and in each of the four modalities of the ELP examination).
THE STUDENT GROWTH PERCENTILE (SGP) MODEL
The conditional distribution of students using the SGP procedure provides the context within which the students’ current achievement is understood normatively. In other words, students are examined on their current performance with their academic peers by a classification of their achievement in terms of the quantiles of interest. The comparative aspect of the SGP model is established through an examination of the percentiles (as in, say, achievement percentiles), which basically is a normative process that compares students based on the percentile ranking they obtain.
The percentile of a student’s current score within his/her corresponding conditional distribution can be translated to a probability statement of a student achieving the current percentile score given his/her prior achievement scores. In this context, current scores are the scores for which percentile estimates are needed based on the trend obtained from a set of prior year scores measuring the same construct, or a set of scores that have substantive meaning in their use as predictors. Mathematically,
SGP ≡ Pr(CurrentAchievement|PastAchievement) × 100
As can be seen from the above equation, unconditional normative percentiles normatively quantify achievement, while conditional percentiles normatively quantify growth (Betebenner, 2007). In other words, when the conditional aspect in the equation is removed, the SGP would simply be the probability of obtaining percentile ranks from the current administration, which provides, through the students’ percentile rankings, a comparison among students who took the current test. On the other hand, when the percentile probability is conditioned on prior performance, it projects student status that reflects the current ranking vis-à-vis the scores of the students in the previous years (i.e., it reflects the trend [growth] that provides percentiles based on the performance in prior administrations).
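The contrast between unconditional and conditional percentiles can be illustrated with a small sketch. The cohort below is hypothetical, and the rank is taken, by assumption, as the percent of reference scores strictly below the student's score; the article itself does not spell out this convention.

```python
def percentile_rank(score, reference_scores):
    """Percent of reference scores strictly below `score` (assumed convention)."""
    below = sum(1 for s in reference_scores if s < score)
    return 100.0 * below / len(reference_scores)

# Hypothetical cohort of (prior-year score, current-year score) pairs.
cohort = [
    (600, 640), (600, 655), (600, 660), (600, 675),
    (650, 670), (650, 690), (650, 700), (650, 710),
]
student_prior, student_current = 600, 660

# Unconditional: rank among everyone who took the current test (achievement).
all_current = [cur for _, cur in cohort]
achievement_pct = percentile_rank(student_current, all_current)

# Conditional: rank only among academic peers with the same prior score (growth).
peers_current = [cur for prior, cur in cohort if prior == student_prior]
growth_pct = percentile_rank(student_current, peers_current)

print(achievement_pct, growth_pct)
```

In this toy cohort the same current score ranks low against all test takers but at the middle of the student's academic peers, which is the distinction the equation draws.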
Calculation of a student’s growth percentile is associated with the conditional density of the student’s score at time t using the student’s prior scores at times 1, 2, . . . , t−1 as the conditioning variable. By conditioning on a covariate x (i.e., the prior score), the rth conditional quantile function, $Q_y(r \mid x)$, is given by (Betebenner, 2009):

\[ Q_y(r \mid x) = \arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_r\left(y_i - x_i'\beta\right) \]

where $\rho_r(u) = u\,(r - \mathbb{1}\{u < 0\})$ is the quantile check (loss) function.
As can be seen from the above equation, when r = 0.5, then the estimated conditional quantile line is the median regression line.
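As a minimal sketch of why r = 0.5 yields the median, the objective above can be minimized by brute force for an intercept-only model (no prior scores), using the standard check loss. The data are hypothetical toy values, and the helper names are illustrative only.

```python
def check_loss(u, r):
    """Quantile check function: rho_r(u) = u * (r - 1[u < 0])."""
    return u * (r - (1.0 if u < 0 else 0.0))

def best_constant(ys, r):
    """Brute-force argmin over the observed values of sum_i rho_r(y_i - q)."""
    candidates = sorted(set(ys))
    return min(candidates, key=lambda q: sum(check_loss(y - q, r) for y in ys))

scores = [3, 1, 4, 1, 5, 9, 2, 6, 5]  # hypothetical toy data

median_fit = best_constant(scores, 0.5)  # minimizer of the r = 0.5 objective
upper_fit = best_constant(scores, 0.9)   # minimizer of the r = 0.9 objective

print(median_fit, upper_fit)
```

With predictors included, the same loss is minimized over the coefficient vector rather than a single constant, which is what quantile regression software does.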
SGPs provide a number of attractive features from both theoretical and practical perspectives. In a practical sense, aside from the fact that SGPs are familiar (e.g., in the field of Pediatrics) and easily communicable to the layperson, the probabilistic approach allows the stakeholders to establish what is deemed adequate in terms of growth. However, as Betebenner (2007) points out, the classification of SGPs as being “adequate,” “good,” or “enough” is a standard setting procedure, which will differ from one assessment to the other.
From a theoretical perspective, it should be noted that aside from the model being robust to outliers (Betebenner, 2007), SGPs are uncorrelated with prior achievement, which is analogous to least squares-based residuals being uncorrelated with independent variables. Hence, there is no foundation for applying to the SGP model the common complaint about regression creating artifacts that generally provide a faster rate of increase for lower achieving students compared to higher achieving students (Marsh & Hau, 2002).
As with regression analysis, the quantile regression method models a relationship between a set of predictor variables and specific percentiles (or quantiles) of the response variable. It specifies a change in a specified quantile of the response variable produced by one unit change in the predictor variables. Thus, the relative effect on student achievement is reflected by the change in the size of the regression coefficients.
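The least squares half of this analogy can be checked numerically. The sketch below fits a simple one-predictor regression in closed form on hypothetical scores and confirms that the residuals have (numerically) zero covariance with the predictor; it illustrates the analogy only and is not an SGP computation.

```python
from statistics import mean

def ols_fit(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical prior-year and current-year scores.
prior = [580, 600, 620, 640, 660, 680]
current = [612, 615, 650, 655, 690, 688]

intercept, slope = ols_fit(prior, current)
residuals = [y - (intercept + slope * x) for x, y in zip(prior, current)]

# Covariance between the predictor and the residuals.
prior_mean = mean(prior)
cov = sum((x - prior_mean) * e for x, e in zip(prior, residuals)) / len(prior)

print(round(cov, 9))
```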
Through the use of the SGP method as provided by Betebenner (2009), the regression coefficients calculated for each specified percentile (see Table 3) can be used to predict score estimates at the predetermined percentiles with respect to each student’s prior history. Therefore, as shown in Table 4, the score estimates can serve as an entry point for students to perform in a particular percentile. When the focus is on the predetermined cut-score for achieving proficiency, this score can then be used to assess the percentile the student would have to grow in order to attain the target criterion-referenced cut.
METHOD
Participants
The sample of data analyzed for the study consists of 7,195 ELLs who had been administered an ELP assessment since 2007. The large-scale ELP assessment was originally developed to test five grade spans (i.e., K, 1–2, 3–5, 6–8, and 9–12) and has been administered annually to these ELLs since 2007. For our analyses, first graders who had been in the ELP program for five years since 2007 were selected. These students’ fifth year of administration in 2011 was considered to be their most current administration, while the other four years from 2007 to 2010 served as the four prior tests used as indicators of the current year score.
Instrument
The ELP test, comprising four modalities (i.e., Speaking, Listening, Reading, and Writing), is intended to measure English language progress of the students who have a primary home language other than English. Five ELP test scores of each student for five consecutive years (2007 to 2011) were used in this study.
The Scale and Proficiency Cut Scores
Even though most scores obtained in educational assessment fall under the equivariance of monotone scale transformation (Koenker, 2005), percentile rankings do not change in spite of their estimates being a function of varying scales (Betebenner, 2007). Therefore, a vertical scale is not a requirement for SGP analyses, even though in this study, the scores of the ELP test were on a vertical scale.
As per Wei and He (2006) and Betebenner (2007), B-splines were employed to accommodate heteroscedasticity and skewness of the conditional densities associated with the values of the independent variables (i.e., student scores from 2007 to 2010). The B-spline parameterization was used because, according to Harrell (2001), B-splines provide excellent fit and seldom lead to “estimation problems” (p. 20).
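The invariance of percentile rankings under a change of scale can be sketched as follows. The scores are the eight Year 5 total scores listed in Table 4; the logarithmic rescaling is a hypothetical stand-in for any strictly increasing transformation, and the rank convention (percent strictly below) is an assumption.

```python
import math

def percentile_ranks(scores):
    """Percent of scores strictly below each score (assumed convention)."""
    n = len(scores)
    return [100.0 * sum(1 for other in scores if other < s) / n for s in scores]

# Year 5 total scores of the eight example students in Table 4.
scale_scores = [697, 697, 688, 673, 670, 670, 650, 586]

# Hypothetical monotone (strictly increasing) rescaling of the same scores.
rescaled = [math.log(s) for s in scale_scores]

ranks_original = percentile_ranks(scale_scores)
ranks_rescaled = percentile_ranks(rescaled)

print(ranks_original == ranks_rescaled)
```

Because only the ordering of scores enters the ranks, any order-preserving rescaling leaves them unchanged, which is why a vertical scale is not required.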
Generally speaking, proficiency cut scores are set as criterion-based achievement for “passing” the underlying construct that is being measured. Achieving the target cuts could imply various things in different assessments where target cuts are set as a proficient score, above proficient score, and so on. In this article, we use the total proficiency cut as the score that indicates that the student does not need to attend ELL classes because he/she has enough language acquisition to function effectively in academic classes. For the modalities, each target cut indicates proficiency in the particular modality. In practice, the target scores are generally set through a standard setting procedure. The proficiency scale score cuts for the ELP examination used in this article are 674 for the Total, 686 for Reading, 678 for Writing, 657 for Listening, and 652 for Speaking.
Summary Statistics
Summary statistics that included the median and the median absolute deviation (MAD), which are robust measures of univariate location and scale, respectively, were checked for marked deviation.
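A minimal sketch of these two robust statistics, on hypothetical scores with one outlier, is given below. The MAD here is the unscaled median of absolute deviations from the median; whether the authors applied a normal-consistency scaling constant is not stated, so the unscaled form is an assumption.

```python
from statistics import mean, median, pstdev

def mad(values):
    """Unscaled median absolute deviation from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])

# Hypothetical scale scores with a single extreme value.
scores = [560, 570, 575, 580, 585, 590, 900]

location_robust = median(scores)
scale_robust = mad(scores)
location_classic = mean(scores)
scale_classic = pstdev(scores)

print(location_robust, scale_robust)
print(round(location_classic, 1), round(scale_classic, 1))
```

The outlier pulls the mean and the standard deviation upward while leaving the median and the MAD essentially unchanged, which is why gaps between the two pairs are read as a sign of outliers.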
Model Fit Analyses
Histograms of the standardized residuals and two fitted density curves (normal and Kernel) were used to examine regression fit to the data at the predetermined percentiles. Besides the visual inspection, a goodness-of-fit analysis was conducted by comparing the estimated conditional density of the B-spline parameterization with the theoretical density for each predetermined quantile. The expectation from this analysis was to have the percentage of ELL students under each of the predetermined percentiles reflect the scenario of a data set with perfect model fit (i.e., 10% of students under or at the 10th percentile, 20% under or at the 20th percentile, etc.).
Student Growth Percentile (SGP) Analyses
Two aspects of growth percentiles were needed to evaluate ELLs’ progress:
1. Norm-referenced inferences: The examination of each student’s SGP for the total score and the modality scores of the ELP examination to see how well they grew vis-à-vis their academic peers (i.e., those with identical prior scores).
2. Criterion-referenced inferences: The evaluation of the SGP needed for non-proficient ELLs to achieve proficiency and for proficient students to maintain their SGP ranking for the total score and across each individual modality ELP score.
In the first instance, the students who were proficient based on their current total ELP scores (i.e., their Year 5 scores) were identified. Then narrow bands of quantiles were pre-specified for total and modality ELP score estimations of these students, that is, from the 10th percentile all the way to the 90th percentile in increments of 10 percentile points (i.e., 0.10, 0.20, 0.30, etc.). The number of quantiles is generally based on the requirements of the researcher.
In this article, based on the total n-count of 7,195 students’ scale scores, the authors believed that groupings from the 10th to the 90th percentile would provide enough discrimination in analyzing each student’s unique percentile trajectory for formative purposes. The predicted values of the scores at the specified percentiles for each of these students were calculated using Proc QuantReg in SAS 9.2.
Students’ Year 5 scores were then examined vis-à-vis the predicted total ELP and modality scores needed for entry into each percentile. The percentiles that harbored each fifth year modality score and the total ELP score became the students’ growth percentile ranks for the modalities and the total score, respectively. The established proficiency cuts for the total ELP examination and each of the four modalities were then examined for their location (i.e., the percentile in which they resided), to determine the SGP required for students to maintain proficiency or to achieve proficiency with the assumption of the same growth pattern over years (i.e., holding the prior performance constant).
This kind of analysis could be a very useful tool for teachers in the context of formative assessment for it would give them an indication of what growth is expected for each student to maintain or attain proficiency, which in turn would help them formulate their teaching strategies in terms of how much effort would be required for each of these students.
RESULTS
Summary Statistics
Summary statistics that included the median and the MAD were produced (see Table 1). In general, the mean of students’ total and modality ELP scores increases over time. The differences between the mean and median as well as the differences between the standard deviation (SD) and the MAD indicate the presence of outliers. For Year 1, the mean and the median as well as the SD and the MAD are a bit different; for Year 4 and Year 5, only the Speaking modality shows such differences; otherwise, for all other years and modalities, the differences in the mean and median and the SD and MAD are very small.
Model Fit Analyses
Histograms overlaid with two fitted density curves (i.e., the Normal and the Kernel Density Curves) of the standardized residuals were examined at each of the predetermined quantiles. Figure 1 displays one such histogram at the 50th percentile. As seen in the figure, the model seems to fit the data well for the total scores. The model also fits the data well with varying degrees of “absolute” fit for the modalities.
To better discriminate between low and high achieving students’ model fit to the data, Year 4 students’ B-spline scores (2010) were grouped by deciles based on low-to-high performing students. Percentages of the student growth using the current year scores (i.e., the fifth year, 2011 scores) at the predetermined 10th to the 90th percentiles were calculated for each Decile (see Table 2).
Overall, the percentages of students under each percentile were not too far removed from expectations, with only a few more than three percentage points higher than expectation.
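The comparison against expectation can be tabulated mechanically. The sketch below uses the observed percentages transcribed from Table 2 and lists the cells that exceed their expected value by more than three percentage points; the variable names are illustrative only.

```python
EXPECTED = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# Observed percent of students at or below each percentile, by decile (Table 2).
TABLE_2 = {
    1: [15, 24, 31, 42, 50, 58, 68, 76, 87],
    2: [13, 21, 31, 42, 50, 62, 66, 77, 88],
    3: [13, 21, 34, 39, 46, 59, 72, 78, 89],
    4: [11, 21, 29, 41, 49, 56, 66, 81, 87],
    5: [10, 20, 32, 38, 54, 63, 71, 80, 93],
    6: [11, 23, 31, 39, 51, 58, 67, 77, 91],
    7: [12, 22, 32, 41, 50, 60, 70, 78, 93],
    8: [12, 23, 32, 41, 50, 60, 69, 81, 90],
    9: [11, 21, 29, 38, 48, 61, 75, 75, 93],
    10: [10, 23, 33, 44, 58, 58, 70, 83, 92],
}

# Observed minus expected, per decile and percentile.
deviations = {
    decile: [obs - exp for obs, exp in zip(row, EXPECTED)]
    for decile, row in TABLE_2.items()
}

# (decile, percentile) cells more than three points above expectation.
over_by_more_than_3 = [
    (decile, pct)
    for decile, devs in deviations.items()
    for pct, dev in zip(EXPECTED, devs)
    if dev > 3
]
max_abs_deviation = max(abs(d) for devs in deviations.values() for d in devs)

print(over_by_more_than_3)
print(max_abs_deviation)
```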
Diagnostics
As shown in Table 3, the Year 4 score had the highest regression coefficient, relative to the Year 1, Year 2, and Year 3 scores, which indicates that this predictor was the most influential of the four predictors in each of the growth percentiles across the total scores and the modalities. Similar results were found for the modality-based analyses. Here, too, the Year 4 score had the highest regression coefficient, relative to Year 1, Year 2, and Year 3 scores.
It should also be noted that the Year 1 ELP score has almost no influence in the prediction of the Year 5 ELP score (i.e., its coefficient is close to zero). This particular phenomenon is understandable since this is the first year of the five-year period during which the students take the examination. At this stage, students are new to ELP classrooms and the effects of learning may not have been assimilated. Furthermore, as can be seen from the table, the influence of the prior tests decreases as the time between the test administrations increases (i.e., the coefficients are much lower for the first few years compared to the most recent year, Year 4). This is intuitively understandable as students’ current behavior can be best estimated by their most immediate prior behavior. However, it is important to note that even though coefficients for the first few years are not large, they provide a monotonically increasing trend for estimating growth in the current year (i.e., the students’ Year 5 ELP scores).

TABLE 1
Summary Statistics of Five Years’ ELP Assessment: Total and Modality Score

Score       Variable       Mean   Median   SD   MAD
Total       Year 1 Score   581    595      37   15
Total       Year 2 Score   619    618      24   23
Total       Year 3 Score   647    649      26   26
Total       Year 4 Score   665    670      31   28
Total       Year 5 Score   690    692      33   33
Listening   Year 1 Score   591    583      49   38
Listening   Year 2 Score   622    645      41   32
Listening   Year 3 Score   640    638      36   36
Listening   Year 4 Score   663    665      38   43
Listening   Year 5 Score   686    675      41   28
Speaking    Year 1 Score   594    584      62   59
Speaking    Year 2 Score   634    643      50   53
Speaking    Year 3 Score   681    675      53   52
Speaking    Year 4 Score   685    676      53   79
Speaking    Year 5 Score   705    697      46   74
Reading     Year 1 Score   569    560      42   21
Reading     Year 2 Score   621    614      45   53
Reading     Year 3 Score   640    639      39   41
Reading     Year 4 Score   666    668      45   44
Reading     Year 5 Score   692    693      46   44
Writing     Year 1 Score   577    566      40   23
Writing     Year 2 Score   619    622      29   32
Writing     Year 3 Score   645    651      35   35
Writing     Year 4 Score   671    681      49   44
Writing     Year 5 Score   701    705      53   44

FIGURE 1 Histogram for standardized residuals at the 50th percentile: Total score.

TABLE 2
Goodness-of-Fit Analysis: Estimated Percent of Students at or Below Each Percentile by Deciles Based on the Students’ Total Scores

Group (Decile)    10th   20th   30th   40th   50th   60th   70th   80th   90th   (Percentile)
1                 15     24     31     42     50     58     68     76     87
2                 13     21     31     42     50     62     66     77     88
3                 13     21     34     39     46     59     72     78     89
4                 11     21     29     41     49     56     66     81     87
5                 10     20     32     38     54     63     71     80     93
6                 11     23     31     39     51     58     67     77     91
7                 12     22     32     41     50     60     70     78     93
8                 12     23     32     41     50     60     69     81     90
9                 11     21     29     38     48     61     75     75     93
10                10     23     33     44     58     58     70     83     92
Expected Values   10     20     30     40     50     60     70     80     90

TABLE 3
Regression Coefficient Estimates of SGP Model: Total Score

Quantile Regression Coefficients (each S.E. column refers to the estimate immediately to its left)

Percentile   Intercept   S.E.    Year 1 Score   S.E.   Year 2 Score   S.E.   Year 3 Score   S.E.   Year 4 Score   S.E.
10th         30.85       12.37   −0.02          0.01   0.14           0.02   0.31           0.02   0.55           0.02
20th         59.00       12.58   −0.02          0.01   0.12           0.02   0.30           0.02   0.54           0.01
30th         75.61       11.63   −0.03          0.01   0.12           0.02   0.29           0.02   0.54           0.01
40th         87.05       11.54   −0.04          0.01   0.12           0.02   0.29           0.02   0.54           0.01
50th         94.16       11.66   −0.05          0.01   0.12           0.02   0.30           0.02   0.53           0.01
60th         108.65      11.27   −0.06          0.01   0.12           0.02   0.30           0.02   0.53           0.01
70th         103.99      11.62   −0.05          0.01   0.12           0.02   0.31           0.02   0.53           0.02
80th         121.91      13.28   −0.07          0.01   0.12           0.02   0.30           0.02   0.54           0.02
90th         112.27      20.16   −0.07          0.02   0.13           0.03   0.32           0.03   0.54           0.03
The coefficients from Table 3 were used to estimate each student’s growth percentile score at the selected percentiles. Examples of some students’ total scores, necessary to attain membership in a percentile, are provided in Table 4. The table also provides the same information for the students based on their modality performance.
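How coefficients of this kind translate into entry scores can be sketched with a plain linear predictor. This is a simplification under stated assumptions: the study's model uses a B-spline parameterization, whereas the sketch applies the Table 3 coefficients directly to raw prior scores, and the prior scores used (the Year 1 to Year 4 total means from Table 1) describe a hypothetical student, not one of the article's examples.

```python
# Table 3 rows: percentile -> (intercept, Year 1, Year 2, Year 3, Year 4 coefficients).
TABLE_3 = {
    10: (30.85, -0.02, 0.14, 0.31, 0.55),
    20: (59.00, -0.02, 0.12, 0.30, 0.54),
    30: (75.61, -0.03, 0.12, 0.29, 0.54),
    40: (87.05, -0.04, 0.12, 0.29, 0.54),
    50: (94.16, -0.05, 0.12, 0.30, 0.53),
    60: (108.65, -0.06, 0.12, 0.30, 0.53),
    70: (103.99, -0.05, 0.12, 0.31, 0.53),
    80: (121.91, -0.07, 0.12, 0.30, 0.54),
    90: (112.27, -0.07, 0.13, 0.32, 0.54),
}

def entry_scores(prior_scores):
    """Linear-predictor sketch of the score needed to enter each percentile."""
    scores = {}
    for pct, (intercept, *slopes) in TABLE_3.items():
        scores[pct] = intercept + sum(b * s for b, s in zip(slopes, prior_scores))
    return scores

# Hypothetical student whose four prior totals equal the Table 1 yearly means.
hypothetical_priors = (581, 619, 647, 665)
predicted = entry_scores(hypothetical_priors)

for pct in sorted(predicted):
    print(pct, round(predicted[pct], 2))
```

The resulting nine values play the role of one row of Table 4: each is the score a student with that prior history would need in order to sit at the corresponding growth percentile.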
Growth Needed to Achieve Proficiency With Respect to Students’ Total Scores
As can be seen from Table 4, Student ID #1’s Year 5 score of 697 was above the established proficiency cut = 674. This student needs to grow in the same 50th percentile to maintain his/her proficiency status, holding his/her prior performance trend constant.
Similarly, Student ID #2’s growth is at the 50th percentile and this student has also reached proficiency.
By the same token, Student ID #3 has achieved proficiency but fails to meet the adequate growth category since his/her growth percentile score is at the 20th percentile (just below the 30th percentile).
Student ID #4 has missed proficiency by one scale point and his/her SGP is just below the 50th percentile (i.e., at the 40th percentile).
Student ID #5 has a very high growth percentile (70th percentile) but his/her Year 5 score of 670 does not cross over the threshold to proficiency, which requires her/his already high growth level to increase slightly to the 80th percentile.
TABLE 4
Eight Examples of Student’s Total and Modality Predicted Scores Across the Specified Percentiles

Type of ELP Score   ID No.   Year 5 Score   10th   20th   30th   40th   50th   60th   70th   80th   90th   (Percentile)
Total               1        697∗           671    679    685    690    695    700    706    712    724
Total               2        697∗           669    678    683    689    694    699    704    711    723
Total               3        688∗           676    684    689    695    699    704    710    717    728
Total               4        673            649    657    663    668    674    678    684    691    702
Total               5        670            631    640    646    652    657    662    667    674    684
Total               6        670            689    696    702    707    712    717    723    730    741
Total               7        650            641    650    656    661    666    671    677    683    694
Total               8        586            589    599    606    611    616    621    626    633    642
Listening           1        723∗           641    652    660    668    675    683    692    704    724
Listening           2        723∗           645    658    667    674    683    691    699    712    736
Listening           3        675∗           681    693    702    710    721    730    740    755    789
Listening           4        694∗           624    636    644    651    658    665    673    684    699
Listening           5        723∗           621    632    640    647    653    660    669    679    693
Listening           6        660∗           651    662    670    678    686    694    702    714    738
Listening           7        648            624    635    644    652    658    665    674    685    699
Listening           8        648            614    628    636    643    650    657    664    673    690
Speaking            1        671∗           664    687    703    732    742    747    747    747    747
Speaking            2        655∗           619    636    649    660    678    704    747    747    747
Speaking            3        747∗           653    675    689    713    734    747    747    747    747
Speaking            4        747∗           654    674    689    707    722    731    747    747    747
Speaking            5        747∗           617    642    652    678    712    747    747    747    747
Speaking            6        619            659    682    697    728    739    747    747    747    747
Speaking            7        671            652    670    687    708    712    721    747    747    747
Speaking            8        671            629    647    658    673    694    714    747    747    747
Reading             1        682            660    672    681    688    695    703    712    724    746
Reading             2        750∗           670    681    690    697    705    712    721    733    752
Reading             3        706∗           658    671    680    689    696    703    712    723    741
Reading             4        663            640    654    663    672    679    687    697    709    729
Reading             5        646            635    649    658    667    675    684    693    705    718
Reading             6        723∗           673    684    692    700    707    714    723    734    756
Reading             7        619            611    628    638    648    656    664    674    686    700
Reading             8        437            497    520    531    543    552    565    578    594    606
Writing             1        730∗           668    678    687    695    702    712    723    738    764
Writing             2        675            693    704    713    722    730    742    755    771    799
Writing             3        664            655    667    677    685    692    702    712    726    745
Writing             4        654            631    643    652    659    666    673    683    695    716
Writing             5        654            627    637    646    654    660    667    675    687    709
Writing             6        688∗           710    722    731    740    749    762    776    793    819
Writing             7        675            635    648    656    664    671    678    689    701    722
Writing             8        414            517    526    533    537    540    538    542    545    568

∗ indicates individual who passes performance cut score.
Student ID #6 has the same Year 5 score as Student ID #5. However, in comparison to his/her academic peers, Student ID #6’s progress is rather low, placing him/her below the 10th percentile. In terms of attaining proficiency, however, this student can increase his/her score slightly to achieve proficiency at the 10th percentile growth level.
Student ID #7 is not only progressing at a low 20th percentile but he/she needs to grow in the 70th percentile to meet proficiency.
Student ID #8 is very low in achieving proficiency, and his/her growth percentile is below the 10th percentile. As with Student ID #7, this student would have to increase his/her growth percentile to beyond the 90th percentile in comparison with his/her academic peers to achieve proficiency.
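The readings above follow a simple lookup rule that can be sketched in code. The entry scores are the Total rows of Table 4 and the cut of 674 is the article's total proficiency cut; the two helper functions are illustrative names, and reading "percentile needed" as the lowest band whose entry score meets the cut is an assumption that fits the non-proficient students (for students who are already proficient, the article instead speaks of sustaining their current growth, which this sketch does not model).

```python
PERCENTILES = [10, 20, 30, 40, 50, 60, 70, 80, 90]
CUT_TOTAL = 674  # total proficiency cut stated in the article

# Student ID -> (Year 5 total score, entry scores at the 10th..90th percentiles), Table 4.
TABLE_4_TOTAL = {
    1: (697, [671, 679, 685, 690, 695, 700, 706, 712, 724]),
    2: (697, [669, 678, 683, 689, 694, 699, 704, 711, 723]),
    3: (688, [676, 684, 689, 695, 699, 704, 710, 717, 728]),
    4: (673, [649, 657, 663, 668, 674, 678, 684, 691, 702]),
    5: (670, [631, 640, 646, 652, 657, 662, 667, 674, 684]),
    6: (670, [689, 696, 702, 707, 712, 717, 723, 730, 741]),
    7: (650, [641, 650, 656, 661, 666, 671, 677, 683, 694]),
    8: (586, [589, 599, 606, 611, 616, 621, 626, 633, 642]),
}

def growth_band(score, entries):
    """Highest percentile whose entry score was reached; None if below the 10th."""
    reached = [p for p, e in zip(PERCENTILES, entries) if score >= e]
    return max(reached) if reached else None

def percentile_needed(cut, entries):
    """Lowest percentile whose entry score meets the cut; None if beyond the 90th."""
    enough = [p for p, e in zip(PERCENTILES, entries) if e >= cut]
    return min(enough) if enough else None

summary = {
    sid: (growth_band(year5, entries), percentile_needed(CUT_TOTAL, entries))
    for sid, (year5, entries) in TABLE_4_TOTAL.items()
}

for sid, (band, needed) in summary.items():
    print(sid, band, needed)
```

For Students #4 through #8 the pairs returned match the narrative: for example, Student #5 sits in the 70th band and needs the 80th, while Student #8 is below the 10th band and the cut lies beyond the 90th.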
Examination of Growth Needed to Achieve Proficiency With Respect to Both Students’ Total and Modality Scores
Analyzing modality scores in the same manner as we did the total scores in the previous section can help teachers allocate resources in an appropriate manner as befitting those who are at varying need for intervention with respect to the particular modalities.
In examining the modality scores, there are several percentiles at the higher end for the Speaking modality that have the same entry score (i.e., 747; see Table 4). This can happen with the ELL population because proficiency in Speaking is most easily achievable among the different ELP modalities (Menken & Kleyn, 2009). In the data set used in this article, there were a large number of students who had achieved high scores in Speaking (i.e., approximately 48% of the students had scores at or above 747). For the eight students shown as examples in Table 4, all had reached the 747 mark at the 60th or the 70th percentile (see Table 4), which provided the same entry score for the remaining percentiles at the higher end.
The crux of the analyses, however, lies in teasing out the effects of compensatory scoring, under which a student who is proficient or nearly proficient with respect to the total score may not perform well in mainstream classrooms because of his/her lack of proficiency in one or more of the modalities.
Deciphering who is proficient can be easily accomplished by examining the scale scores. However, this type of information would not include how much effort is needed to achieve proficiency. Additional scrutiny of the entry score that is needed for proficiency at each modality percentile can inform us of the percentile growth needed in the modalities not only to achieve overall English language proficiency but also to perform adequately in academic classes.
For example, as shown in Table 5, while Student ID #2 has reached total ELP proficiency (Year 5 score = 697), he/she is lacking in Writing, which is one of the key components of academic success (Robertson, 2009). Granted, this student has not missed proficiency in Writing by much (675, which is very close to the proficient cut score of 678). Nevertheless, the fact that this student grew only to a level below the 10th percentile in comparison to his/her academic peers in Writing is troubling because it shows that the student has not improved his/her writing much in recent years. Overall, the message from this model is that, with a little attention on Writing by his/her teacher, this student could obtain the growth in Writing needed (in this case, to the 10th percentile) to achieve proficiency in that modality.
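The case of Student ID #2 shows how a compensatory decision can mask a modality gap. The sketch below uses the article's cut scores and this student's Year 5 scores from Table 5; treating the same modality cuts as the "targets" of a conjunctive rule is an assumption made for illustration, and the function names are hypothetical.

```python
# Proficiency cut scores stated in the article.
CUTS = {"Total": 674, "Reading": 686, "Writing": 678, "Listening": 657, "Speaking": 652}
MODALITIES = ["Reading", "Writing", "Listening", "Speaking"]

# Year 5 scores for Student ID #2 (Table 5).
student_2 = {"Total": 697, "Reading": 750, "Writing": 675, "Speaking": 655, "Listening": 723}

def compensatory_proficient(scores):
    """Proficient if the total score alone meets the total cut."""
    return scores["Total"] >= CUTS["Total"]

def conjunctive_proficient(scores):
    """Proficient only if every modality meets its own cut (assumed targets)."""
    return all(scores[m] >= CUTS[m] for m in MODALITIES)

def modalities_below_cut(scores):
    """Modalities in which the student has not reached the cut."""
    return [m for m in MODALITIES if scores[m] < CUTS[m]]

compensatory = compensatory_proficient(student_2)
conjunctive = conjunctive_proficient(student_2)
gaps = modalities_below_cut(student_2)

print(compensatory, conjunctive, gaps)
```

Listing the modalities below their cuts alongside the total decision gives a teacher the same signal the narrative draws from Table 5: proficient overall, with Writing still short of the cut.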
On the other hand, Student ID #8 has a very poor Total score (i.e., 586), with an SGP set at below the 10th percentile and the proficiency cut that lies beyond the 90th percentile. While there may be indications that this student needs help across all modalities, he/she is a classic case of the type of student who is proficient in Speaking (the student is proficient in Speaking with SGP in the 30th growth percentile; see Table 5) but who does poorly in academic achievement. See Menken and Kleyn (2009) for a discussion of fluency in speaking being construed as an indication of language proficiency.

TABLE 5
Examples of Teacher Level Information Depicting Individual Student’s Growth and Their Predicted Growth Percentiles to Achieve or Maintain Proficiency: Total and Modality Scale Scores (SS)

| Std ID | Scale | Prof. at SS = | Year 5 SS | SGP | Estimate of Growth for Prof. |
|---|---|---|---|---|---|
| 2 | Total | 674 | 697 | 50th Percentile | Achieved Prof |
| 2 | Reading | 686 | 750 | 80th Percentile | Achieved Prof |
| 2 | Writing | 678 | 675 | Below 10th Percentile | 10th Percentile |
| 2 | Speaking | 652 | 655 | 30th Percentile | Achieved Prof |
| 2 | Listening | 657 | 723 | Above 90th Percentile | Achieved Prof |
| 4 | Total | 674 | 673 | 40th Percentile | 50th Percentile |
| 4 | Reading | 686 | 663 | 30th Percentile | 70th Percentile |
| 4 | Writing | 678 | 654 | 30th Percentile | 70th Percentile |
| 4 | Speaking | 652 | 747 | 70th Percentile | Achieved Prof |
| 4 | Listening | 657 | 694 | 80th Percentile | Achieved Prof |
| 8 | Total | 674 | 586 | Below 10th Percentile | Beyond 90th Percentile |
| 8 | Reading | 686 | 437 | Below 10th Percentile | Beyond 90th Percentile |
| 8 | Writing | 678 | 414 | Below 10th Percentile | Beyond 90th Percentile |
| 8 | Speaking | 652 | 671 | 40th Percentile | Achieved Prof |
| 8 | Listening | 657 | 648 | 50th Percentile | 60th Percentile |

Comments:
Std 2: Total Prof but need some help in Writing where he/she has not achieved Prof. Also need to do better in Writing and Speaking compared to his academic peers.
Std 4: Just one score point less to achieve Total score proficiency but needs extensive remedial in Reading and Writing.
Std 8: The student has a long climb in Total growth to achieve proficiency. Needs extensive remedial in Reading and Writing. Also needs some help in Listening to overcome the 10 points percentile gap.

Note. Prof = Proficiency; Std = Student.
By the same token, Student ID #4 has missed the total ELP proficiency score by just one scale point (673 instead of 674). However, an examination of the modality scores shows the student is significantly lacking in both Reading and Writing skills (i.e., he/she has achieved 30th percentile growth for both Reading and Writing) even though the student has 80th percentile growth for Speaking and Listening. To obtain a label of “true” proficiency that will serve the student well in English-laden academic subjects, the teacher needs to concentrate her/his efforts on the student’s Reading and Writing in the ELP classroom. In other words, this student has to grow in the 60th percentile for Reading and in the 70th percentile for Writing to perform well in academic classes.
Thus, it becomes evident that each student’s modality performance should also be scrutinized in the same manner as was shown in our discussion on Total scores so that, even if students perform at proficiency in overall ELP scores, they must also be proficient in their modality scores to be able to perform well in academic classrooms laden with the language component.
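As a rough illustration of this kind of scrutiny, the Python sketch below compares the Year 5 scale scores reported in Table 5 with the proficiency cuts shown there and lists the modalities in which a student falls short, regardless of the Total score. The function and variable names are illustrative assumptions and are not part of any reporting system described in this article.

```python
# Proficiency cut scores (SS) as listed in the Table 5 column headers.
CUTS = {"Total": 674, "Reading": 686, "Writing": 678,
        "Speaking": 652, "Listening": 657}

# Year 5 scale scores for the three example students in Table 5.
YEAR5 = {
    2: {"Total": 697, "Reading": 750, "Writing": 675,
        "Speaking": 655, "Listening": 723},
    4: {"Total": 673, "Reading": 663, "Writing": 654,
        "Speaking": 747, "Listening": 694},
    8: {"Total": 586, "Reading": 437, "Writing": 414,
        "Speaking": 671, "Listening": 648},
}

MODALITIES = ("Reading", "Writing", "Speaking", "Listening")


def modalities_below_cut(scores, cuts=CUTS):
    """Return the modalities whose scale score is below the proficiency cut."""
    return [m for m in MODALITIES if scores[m] < cuts[m]]


def summarize(scores, cuts=CUTS):
    """Report Total proficiency separately from modality shortfalls."""
    return {
        "total_proficient": scores["Total"] >= cuts["Total"],
        "needs_help_in": modalities_below_cut(scores, cuts),
    }


if __name__ == "__main__":
    for student_id, scores in YEAR5.items():
        print(student_id, summarize(scores))
```

Keeping the Total flag separate from the modality list mirrors the point about compensatory scoring: a student can be proficient on the Total score and still appear in the modality list.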
COMMENTS AND DISCUSSION
As outlined by the United States Department of Education’s Blueprint for Reform (2010), more effective measures of students’ growth are expected for all students. This expectation is all the more important for ELL students because fluency in the language could also affect their performance in academic content areas, particularly where academic learning is associated with the knowledge of the English language.
While ELLs’ overall progress in ELP can be assessed simply by observing the relative cut employed for passing the ELP examination, an evaluation of ELLs’ performance on the four modalities becomes paramount when assessing areas of weakness and strength within the ELP construct for each student. Because of the use of compensatory scoring by many ELP assessments, students’ proficiency in certain modalities could be so low that it could result in poor academic performance (Menken & Kleyn, 2009) even though the student could have achieved a total score of proficiency on an ELP examination.
As stated earlier, the modality in which a student does not perform adequately can easily be obtained from the actual scores and the proficiency cut-offs established for each modality. However, unlike the SGP model, the simple cut-off calculation in each of the four modalities does not give us an indication of how much effort is required to help the student achieve proficiency.
As Betebenner (2007) points out, SGPs focus on normatively quantifying changes in achievement instead of focusing on the magnitude of learning. In analyzing the information provided in Tables 4 and 5, it becomes evident from the use of quantile regression methods that achieving a given level of proficiency can require differential effort from two students with the same current score. When knowledge of comparative growth is lacking, instructional efforts are applied without a fair reckoning of students’ growth potential. As such, students are often instructed based on an average performance criterion that lumps them together in an undifferentiated manner, instead of receiving remedial help that varies for each student based on different levels of achievement across years.
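The normative idea can be made concrete with a deliberately simplified sketch. The SGP model discussed in this article relies on quantile regression; the Python function below does not implement that model. Instead, it assumes a hypothetical list of current scores from peers who share the same prior-score history and returns the percentile rank of one student among them. The peer scores and the function name are invented for illustration only.

```python
from bisect import bisect_left


def empirical_growth_percentile(current_score, peer_scores):
    """Percentile rank (bounded to 1-99) of a score among academic peers.

    `peer_scores` holds the current-year scores of students assumed to
    share the same prior-score history. This is a simple empirical
    stand-in for a quantile regression estimate, not the SGP model itself.
    """
    ranked = sorted(peer_scores)
    # Number of peers scoring strictly below the student.
    below = bisect_left(ranked, current_score)
    percentile = round(100 * below / len(ranked))
    return min(max(percentile, 1), 99)


# Hypothetical peer groups: students who started lower vs. higher.
lower_history_peers = [640, 650, 655, 660, 670, 675, 680, 690, 700, 710]
higher_history_peers = [670, 680, 685, 690, 695, 700, 705, 710, 720, 730]

if __name__ == "__main__":
    # The same current score reflects different growth in each group.
    print(empirical_growth_percentile(675, lower_history_peers))
    print(empirical_growth_percentile(675, higher_history_peers))
```

Under these assumed numbers, the same current score sits at the middle of one peer group and near the bottom of the other, which is the sense in which two students with identical scores can differ in growth.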
Thus, when the SGP model is applied as a formative assessment tool, students can benefit from using individualized, tailor-made remedial activities. Furthermore, providing teachers their students’ differential propensity for achieving proficiency may assist in mollifying those teachers who claim that it is unfair to be held accountable based on a single measure of progress for all students.
As in many educational assessments (e.g., students’ ability estimations), the standard errors for the regression coefficients at the extremes of the percentile groupings seem a bit larger relative to those in the middle of the predetermined percentile scale. However, this relative difference depends on factors such as low student counts at the scale extremes and measurement error. The importance of accuracy and precision, therefore, is dependent on the type of inferences one wishes to draw from the growth percentiles with the understanding that precise comparison cannot be sustained (Betebenner, 2007).
Much like other growth models, students with missing prior scores would have to be eliminated from the SGP analyses. Other SGP models could be created for students with a different consecutive number of prior scores, provided enough n-counts are available for such analyses (see Grady, Lewis, & Gao, 2010 for minimum sample size requirements). It should be kept in mind that while more than one prior score is desirable for more accurate SGP estimates, the net effect on formative information is that it serves as a guideline for teachers’ remedial actions. Any help in that direction is always useful as a starting point for intervention techniques. While estimation of missing scores to account for dwindling n-counts is a possibility (Sanders, 2006), imputing missing values carries the risk of larger sampling errors.
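One way to picture the grouping described above is sketched below in Python. It counts each student's consecutive non-missing prior scores ending at the most recent prior year, groups students by that count, and keeps only the groups that meet a minimum n-count. The student records, the function names, and the `min_n` threshold are hypothetical; the article does not specify a threshold, and Grady, Lewis, and Gao (2010) should be consulted for actual sample size guidance.

```python
from collections import defaultdict


def consecutive_priors(priors):
    """Count non-missing prior scores ending at the most recent prior year.

    `priors` is ordered oldest to newest; None marks a missing score.
    """
    count = 0
    for score in reversed(priors):
        if score is None:
            break
        count += 1
    return count


def cohorts_by_prior_count(students, min_n):
    """Group student IDs by number of consecutive priors.

    Students with no usable prior score are dropped, and cohorts smaller
    than `min_n` (a hypothetical threshold) are excluded.
    """
    groups = defaultdict(list)
    for student_id, priors in students.items():
        k = consecutive_priors(priors)
        if k > 0:
            groups[k].append(student_id)
    return {k: ids for k, ids in groups.items() if len(ids) >= min_n}


# Hypothetical prior-score histories (Years 1-4) for five students.
students = {
    "a": [600, 620, 640, 660],
    "b": [None, 610, 630, 650],
    "c": [590, None, 625, 645],
    "d": [None, None, None, 655],
    "e": [605, 615, None, None],  # most recent prior missing: dropped
}

if __name__ == "__main__":
    print(cohorts_by_prior_count(students, min_n=1))
```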
Projecting how much students need to grow is based on holding the growth pattern constant. But like any dynamic situation, students’ growth trajectories are likely to change, and therefore, it behooves educators to monitor student progress across years as is done by the Colorado Department of Education (see http://ww2.ed.gov/adminis/lead/account/growthmodel/co/coattachment1tutorial.pdf). Appropriate changes can then be made in remedial efforts for each student by an examination of his/her yearly growth toward proficiency.
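The projection logic can be sketched as a simple lookup, assuming that a table of projected next-year scores at each growth percentile is already available for a student. The Python function below returns the lowest growth percentile whose projected score reaches the proficiency cut, using labels patterned on Table 5. The projected scores shown are hypothetical values chosen for illustration, and the function name is an assumption, not part of the SGP methodology.

```python
def growth_needed_for_proficiency(current_score, projected_by_percentile, cut):
    """Return a Table 5 style label for the growth needed to reach `cut`.

    `projected_by_percentile` maps growth percentiles (10, 20, ..., 90) to
    the score a student would reach by growing at that percentile, holding
    the observed growth pattern constant.
    """
    if current_score >= cut:
        return "Achieved Prof"
    for percentile in sorted(projected_by_percentile):
        if projected_by_percentile[percentile] >= cut:
            return f"{percentile}th Percentile"
    return "Beyond 90th Percentile"


# Hypothetical projected Total scores for a student currently at 673.
projected_total = {10: 660, 20: 664, 30: 668, 40: 671, 50: 674,
                   60: 678, 70: 683, 80: 689, 90: 697}

if __name__ == "__main__":
    print(growth_needed_for_proficiency(673, projected_total, cut=674))
```

Because the projections assume a constant growth pattern, a lookup of this kind would need to be rerun each year as new scores arrive.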
Finally, using the SGP method for comparison can seem a daunting task for some educators, particularly for those teachers whose ELLs’ growth requirements for achieving proficiency may seem unattainable. Nevertheless, this concern does not undercut the useful information the method provides (viz., the desirable property of knowing how much growth is required). On the contrary, the SGP method for comparison can facilitate the better allocation of finite resources for those students who are having problems, allowing educators to revisit these problems with creative teaching methods, one-on-one tutorials, motivational strategies, and parental involvement that provides the right type of intervention on a differential basis for each student (Seo & Taherbhai, 2009).
ACKNOWLEDGMENTS
The authors thank the editor, Dr. Kurt F. Geisinger, and the reviewers for their valuable advice and suggestions, which greatly improved the article. Thanks are also due to Dr. Mark G. Robeck for his constructive editorial suggestions. All errors remain the responsibility of the authors.
REFERENCES
Abedi, J. (2008). Classification system for English language learners: Issues and recommendations. Educational Measurement: Issues and Practice, 27, 17–31.
Ballou, D., Sanders, W., & Wright, P. (2004). Controlling for student background in value-added assessment of teachers. Journal of Educational and Behavioral Statistics, 29, 37–65.
Betebenner, D. W. (2007). Estimation of student growth percentiles for the Colorado student assessment program. Retrieved from http://www.cde.state.co.us/cdedocs/Research/PDF/technicalsgppaper_betebenner.pdf
Betebenner, D. W. (2009). Norm- and criterion-referenced student growth. Educational Measurement: Issues and Practice, 28, 42–51.
Braun, H. I. (2005). Using student progress to evaluate teachers: A primer on value-added models. Retrieved from http://www.ets.org/Media/Research/pdf/PICVAM.pdf
Ferrara, S., & DeMauro, G. E. (2006). Standardized assessment of individual achievement in K–12. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 579–621). Westport, CT: American Council on Education/Praeger.
Goodman, D. P., & Hambleton, R. K. (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education, 17, 145–220.
Grady, M., Lewis, D., & Gao, F. (2010). The effect of sample size on student growth percentiles. Retrieved from http://www.ctb.com/ctb.com/control/getAssetListByFilterTypeViewAction?param=393&title=topic&p=library
Harrell, F. E. (2001). Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis. New York, NY: Springer-Verlag.
Ho, A. (2011). Growth and consequences: Will NCLB give way to growth models? Retrieved from http://www.gse.harvard.edu/blog/news_features_releases/2011/01/growth-and-consequences-will-nclb-give-way-to-growth-models.html
Koenker, R. (2005). Quantile regression: Econometrics society monographs. New York, NY: Cambridge University Press.
Lissitz, B., & Doran, H. (2009). Modeling growth for accountability and program evaluation: An introduction for Wisconsin educators. Retrieved from http://marces.org/completed/Lissitz%20(2009)%20Modeling%20Growth%20for%20Accountability.pdf
Marsh, H. W., & Hau, K. T. (2002). Multilevel modeling of longitudinal growth and change: Substantive effects or regression toward the mean artifacts? Multivariate Behavioral Research, 37, 245–282.
McCarthy, C. P. (1999). Reading theory as a microcosm of the four skills. Retrieved from http://iteslj.org/Articles/McCarthy-Reading.html
Menken, K., & Kleyn, T. (2009). The difficult road for long-term English learners. Retrieved from http://www.ascd.org/publications/educational_leadership/apr09/vol66/num07/The_Difficult_Road_for_Long-Term_English_Learners.aspx
Meyer, D., Madden, D., & McGrath, D. (2004). English language learner students in U.S. public schools: 1994 and 2000 (Issue Brief No. 2004-035). Jessup, MD: National Center for Education Statistics.
Nichols, P. D., Meyers, J. L., & Burling, K. S. (2009). A framework for evaluating and planning assessments intended to improve student achievement. Educational Measurement: Issues and Practice, 28, 14–23.
Perie, M., Marion, S., & Gong, B. (2009). Moving toward a comprehensive assessment system: A framework for considering interim assessments. Educational Measurement: Issues and Practice, 28, 5–13.
Roberts, M. R., & Gierl, M. J. (2010). Developing score reports for cognitive diagnostic assessments. Educational Measurement: Issues and Practice, 29, 25–38.
Robertson, K. (2009). Math instruction for English language learners. Retrieved from http://www.readingrockets.org/article/30570/
Rushton, A. (2005). Formative assessment: A key to deep learning? Medical Teacher, 27, 509–513.
Sanders, W. L. (2006). Comparisons among various educational assessment value-added models. Retrieved from http://www.sas.com/govedu/edu/services/vaconferencepaper.pdf
Sanders, W. L., Saxton, A. M., & Horn, S. P. (1997). The Tennessee value-added assessment system: A quantitative outcomes-based approach to educational assessment. In J. Millman (Ed.), Grading teachers, grading schools: Is student achievement a valid measure? (pp. 137–162). Thousand Oaks, CA: Corwin Press.
Seo, D., & Taherbhai, H. (2009). Motivational beliefs and cognitive processes in mathematics achievement, analyzed in the context of cultural differences: A Korean elementary school example. Asia Pacific Education Review, 10, 193–203.
U.S. Department of Education (2010). A blueprint for reform: The reauthorization of the Elementary and Secondary Education Act. Retrieved from http://www2.ed.gov/policy/elsec/leg/blueprint/blueprint.pdf
U.S. Government Accountability Office (2006). No Child Left Behind Act: Assistance from education could help states better measure progress of students with limited English proficiency. Retrieved from http://www.gao.gov/highlights/d06815high.pdf
Van Roekel, D. (2008). English language learners face unique challenges. Retrieved from http://www.nea.org/assets/docs/mf_PB05_ELL.pdf
Wei, Y., & He, X. (2006). Conditional growth charts. Annals of Statistics, 34, 2069–2097.