
Algorithm-guided Individualized Instruction page 1

Supporting Online Material

Appendix SA

The Individualizing Student Instruction (ISI) Intervention

The individualizing student instruction (ISI) intervention integrates several areas of research: child-by-instruction interactions (1-4), evidence-based instruction (5, 6), the “beat-the-odds” literature on effective schools (7-9), and teacher planning and organization (10-13).

Evidence-based instruction uses methods of instruction that have empirical evidence of efficacy. In the “beat-the-odds” literature, researchers identified characteristics that were common to schools with high student achievement but were not evident in less effective schools. These characteristics included a dedicated block of time for teaching reading, teaching reading to smaller groups of children with similar learning needs, and using assessment to guide instruction. Additionally, effective teachers were masterful planners and organizers, so classroom time was used efficiently.

The ISI intervention consisted of two components: (1) the A2i software, which computed recommended amounts and types of instruction for each child based on their assessed vocabulary and reading skills, and which included organization and planning features and a catalogue of instructional activities indexed to the dimensions of instruction; and (2) professional development designed to teach teachers how to individualize reading instruction in their classrooms using the A2i recommendations and planning features. Teachers who fully implemented ISI had dedicated, uninterrupted time for language arts instruction and used multiple student-grouping configurations, including homogeneous reading-skill groups, to address the needs of the individual students in their classrooms. Based on classroom observations, the content of the literacy instruction in these classrooms was attuned to the skill level of the students in each group, and the amounts and types of instruction aligned with the A2i individual student recommendations.

There was an observable system in place (e.g., center chart, daily schedule) for organizing

students into groups and facilitating transitions from one station or center to another. The teacher

followed a daily or weekly lesson plan (e.g., group activity planner or other similar written plan),

the classroom was well organized, transitions were efficient, and instruction was well paced.

Students worked independently at literacy-focused centers, with activities designed to meet their

learning objectives, while the teachers worked with small groups of students to provide more

intensive and scaffolded instruction. Virtually the entire language arts block was spent in

meaningful literacy activities.

Assessment to Instruction Software (A2i).

A2i software has five views (Figures 1, S1 & S2): the Classroom View, the Literacy

Minutes Manager, the Group Activity Planner, Classroom Set-up and the Core Curriculum

Guide. The Classroom View (Figure 1) provides the individual instruction plan for each child for

each of the instructional strategies (teacher-managed meaning-focused, teacher-managed code-

focused, etc.). The algorithms that compute recommended amounts of teacher-managed code-

focused and child-managed meaning-focused instruction are described below. Children are also

placed into groups using an algorithm that relies on their most current letter-word reading score.

Thus, each time updated scores are entered into A2i, students’ recommended group membership

(and amount of each type of instruction) changes. Teachers select the number of groups

(typically 3 to 5 groups). They are encouraged to use the algorithm-recommended student group

assignments but may make changes in the group membership if they wish.
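The grouping rule itself is not specified beyond its reliance on each child's most recent letter-word reading score. As a minimal sketch, assuming a simple hypothetical rule (rank children by score, then cut the ranking into the teacher-chosen number of contiguous, near-equal groups), recommended groups could be computed as follows. The function `recommend_groups` and the example scores are illustrative only and are not A2i's actual implementation.

```python
def recommend_groups(scores, n_groups):
    """Split children into n_groups reading groups by letter-word score.

    Hypothetical rule (not necessarily A2i's): rank children from lowest
    to highest score, then cut the ranked list into contiguous groups
    whose sizes differ by at most one.
    """
    if not 1 <= n_groups <= len(scores):
        raise ValueError("n_groups must be between 1 and the number of children")
    ranked = sorted(scores, key=scores.get)  # child IDs, lowest score first
    base, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)  # first `extra` groups take one more child
        groups.append(ranked[start:start + size])
        start += size
    return groups


# Made-up letter-word grade equivalents for seven children.
scores = {"ana": 0.9, "ben": 1.8, "cal": 1.2, "dee": 2.4, "eli": 1.0, "fay": 1.6, "gus": 2.9}
for number, members in enumerate(recommend_groups(scores, 3), start=1):
    print(f"Group {number}: {members}")
```

Calling the function again with updated scores re-ranks the children, which mirrors how recommended group membership changes when new scores are entered.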


Each reading group has a recommended mean amount for each type of instruction. This

amount changes whenever group membership changes. It is the mean amount for each group that

we ask the teachers to target. The teachers then plan their weekly schedule, including the

language arts block and other literacy activities throughout the day, using the Literacy Minutes

Manager (Figure S1). Teachers are encouraged to set a consistent daily/weekly schedule and

classroom routine, which tends to enhance classroom organization and students’ behavior (7, 8,

13). Once the language arts schedule is completed in the Literacy Minutes Manager, the teacher

is ready to plan daily instruction using the Group Activity Planner (Figure S2). The teacher first

selects the date he or she wishes to plan and selects an instructional block (e.g., teacher-managed

code-focused at 10:35 for Group 3). The page then scrolls down to reveal the teacher-managed

code-focused activities that have been indexed to the school’s core curriculum (Figure S2 top

right). A2i can be used with any evidence-based reading core curriculum or reading activity that

can be mapped onto the dimensions of instruction. Sorting and search features permit the teacher

to locate and select specific core curriculum activities (e.g., Open Court, Level 1, Unit 5).

Activities to be implemented for that day are then checked off. All planned activities turn red in

the curriculum index so that the teacher can follow his or her progression through the curriculum.

Once all the instruction blocks are planned, the teacher clicks the button to print the lesson plan

for the day (Figure S2 bottom). The Classroom Set-up view permits the teacher to select the

number of groups, number of adults in the classroom, and the time available. The Curriculum

Guide provides a catalogue of all of the instructional activities indexed in A2i (e.g., FCRR center activities, Open Court, Reading Mastery) according to type of instruction (e.g., child-managed meaning-focused).
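The group-level target described at the start of this paragraph (teachers aim for each reading group's mean recommended amount) is a simple average over the group's members. A sketch, assuming per-child recommendations for one type of instruction are already available; the function name and all values are hypothetical:

```python
from statistics import mean


def group_targets(recommended, groups):
    """Return the mean recommended minutes for each reading group.

    recommended: child ID -> recommended minutes for one type of instruction
    groups: list of groups, each a list of child IDs
    """
    return [round(mean(recommended[child] for child in group), 1) for group in groups]


# Hypothetical per-child teacher-managed code-focused recommendations (minutes).
tmcf = {"ana": 38.0, "eli": 34.5, "cal": 31.0, "fay": 30.5, "ben": 29.5, "dee": 29.5, "gus": 28.5}
groups = [["ana", "eli", "cal"], ["fay", "ben"], ["dee", "gus"]]
print(group_targets(tmcf, groups))
```

Because the target is a mean over the current members, it has to be recomputed whenever group membership changes.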

About the Algorithms.


The algorithms are based on HLM models from first grade student and classroom data presented

in the Beyond the Reading Wars paper (1).

Setting the A2i Target Outcome (TO). For the A2i algorithms to work, a target spring reading outcome (i.e., grade level and at least nine months of progress) is set based on each student’s initial letter-word reading score. The minimum target reading outcome for this study was a grade equivalent of 2.1. The grade-equivalent metric is based on a 9-month school year and represents an estimate of the performance that an average student is expected to achieve at a specific time in the school year. Thus, a grade equivalent of 1.5 represents the typical performance of a first grader in January, and 2.1 represents the typical performance of a second grader in late September. Grade equivalent was selected as the metric for A2i because it is a meaningful way to portray students’ skill levels and, based on teacher feedback, is more useful for teachers. For students reading above grade level, the target outcome equals their grade equivalent (GE) plus 0.9, the minimum expected reading-skill growth in one school year (e.g., initial GE 1.5 + 0.9 = 2.4). In this way, at least a year’s growth is anticipated for all students, and more than a year’s growth is expected for students who begin the school year with lower reading scores.
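One reading of the target-outcome rule is TO = max(2.1, initial GE + 0.9), which treats a child as reading above grade level whenever initial GE + 0.9 exceeds the 2.1 minimum. That interpretation is our assumption; the sketch below implements it, with hypothetical names:

```python
MIN_TARGET_GE = 2.1    # minimum spring target in this study (grade equivalent)
ONE_YEAR_GROWTH = 0.9  # one 9-month school year, in grade-equivalent units


def target_outcome(initial_lw_ge):
    """Spring letter-word reading target (TO) from the initial grade equivalent.

    Assumed rule: TO = max(2.1, initial GE + 0.9), i.e. at least grade level
    for every child and a full year of growth for children already ahead.
    Rounded to 2 decimals to hide floating-point noise.
    """
    return round(max(MIN_TARGET_GE, initial_lw_ge + ONE_YEAR_GROWTH), 2)


for ge in (0.6, 1.0, 1.5, 2.3):
    print(f"initial GE {ge} -> TO {target_outcome(ge)}")
```

Under this rule a child starting at GE 1.0 is targeted for 1.1 years of growth, while a child starting at 1.5 is targeted for 0.9, matching the example in the text.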

The teacher-managed code-focused and child-managed meaning-focused amounts required to achieve each child’s target outcome are computed using fall letter-word reading and vocabulary scores. There are separate equations for the amount of teacher-managed code-focused and child-managed meaning-focused instruction. SPSS syntax to compute recommended amounts is provided below:

Teacher-managed code-focused Amount (TMCF):

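* TO is a reserved keyword in SPSS, so the target outcome (TO) is stored in the variable target_ge.

COMPUTE TMCFa = ((target_ge - (.2 * lw_ge)) / (.05 + (.05 * lw_ge))) + 13.

COMPUTE TMCF_Recommended = TMCFa - (.82 * M).

EXECUTE.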


M is the current month of the school year starting in September. Thus, M would equal 0 in

September, 1 in October, 2 in November, and so on. TO is the target spring letter-word reading

outcome (described previously). Lw_ge is the most recent letter-word recognition grade

equivalent score. TMCFa is the intermediary amount term. TMCF_Recommended is the amount

in minutes recommended in the Classroom View of A2i. The recommended amount changes each month and each time new scores are entered, and it depends on the child’s target outcome (TO) and current letter-word reading score.
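To check the arithmetic outside SPSS, the TMCF equations can be transcribed directly into Python. This is a sketch; the function and parameter names are our own, and the letter-word score is held fixed across months purely for illustration:

```python
def tmcf_recommended(target, lw_ge, month):
    """Teacher-managed code-focused (TMCF) minutes, transcribed from the SPSS syntax.

    target: TO, the spring target outcome (grade equivalent)
    lw_ge:  most recent letter-word reading grade equivalent
    month:  M, months since September (September = 0, October = 1, ...)
    """
    tmcf_a = (target - 0.2 * lw_ge) / (0.05 + 0.05 * lw_ge) + 13  # intermediary amount
    return tmcf_a - 0.82 * month  # recommendation declines by 0.82 minutes per month


# A child with an initial GE of 1.5 (so TO = 2.4), in September and in December.
print(f"{tmcf_recommended(2.4, 1.5, 0):.1f}")  # 29.8
print(f"{tmcf_recommended(2.4, 1.5, 3):.1f}")  # 27.3
```

With TO = 2.4 and lw_ge = 1.5, the intermediary term is (2.4 - 0.3) / 0.125 + 13 = 29.8 minutes, and each later month subtracts 0.82 minutes.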

Child-managed meaning-focused Amount (CMMF):

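* target_ge holds TO, which cannot be used as an SPSS variable name because it is a reserved keyword.

COMPUTE CMMFa = ((3.76 - target_ge + (1.4 * voc_ae)) / .30) - 14.

COMPUTE CMMFsl = 10 - (.24 * CMMFa).

COMPUTE CMMF_Recommended = CMMFa + .5 * (CMMFsl * M).

EXECUTE.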

Voc_ae is the most recent WJ picture vocabulary age equivalent score in years. CMMFa is the

intermediary amount term and CMMFsl is the intermediary slope term. CMMF_Recommended

is the amount recommended in the Classroom View of A2i. The recommended amount varies with each child’s target outcome (TO), vocabulary score, and the month of the school year (M).
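The CMMF equations transcribe the same way. A sketch with our own names, comparing two hypothetical children who share a target of 2.1 but differ in vocabulary age equivalent (6.0 vs. 8.0 years), in September (M = 0) and January (M = 4):

```python
def cmmf_recommended(target, voc_ae, month):
    """Child-managed meaning-focused (CMMF) minutes, transcribed from the SPSS syntax.

    target: TO, the spring target outcome (grade equivalent)
    voc_ae: most recent WJ Picture Vocabulary age equivalent, in years
    month:  M, months since September (September = 0)
    """
    cmmf_a = (3.76 - target + 1.4 * voc_ae) / 0.30 - 14  # intermediary amount
    cmmf_sl = 10 - 0.24 * cmmf_a                          # intermediary slope
    return cmmf_a + 0.5 * (cmmf_sl * month)


for voc in (6.0, 8.0):
    print(voc, [round(cmmf_recommended(2.1, voc, m), 1) for m in (0, 4)])
```

Because the slope term is 10 - 0.24 × CMMFa, children who start with larger recommended amounts receive smaller monthly increases.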

Teacher-managed meaning-focused Amount (TMMF):

COMPUTE TMMF_Recommended = 16 - M.

TMMF_Recommended is the amount, in minutes, presented in the Classroom View. Note that the amount decreases by one minute each month, based on observations in the Beyond the Reading Wars paper (1).

Child-managed code-focused Amount (CMCF):

COMPUTE CMCF_Recommended = 15.

CMCF_Recommended is the amount, in minutes, presented in the Classroom View. The amount is the same each month.
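The two fixed rules are simple enough to tabulate across the school year. A sketch, with hypothetical function names, assuming M runs from 0 in September to 8 in May:

```python
MONTHS = ["Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May"]


def tmmf_recommended(month):
    """Teacher-managed meaning-focused minutes: 16 - M (September = 0)."""
    return 16 - month


def cmcf_recommended(month):
    """Child-managed code-focused minutes: a constant 15 (month kept for a uniform signature)."""
    return 15


for m, name in enumerate(MONTHS):
    print(f"{name}: TMMF = {tmmf_recommended(m)}, CMCF = {cmcf_recommended(m)}")
```

Under these rules, the teacher-managed meaning-focused recommendation falls from 16 minutes in September to 8 in May, while the child-managed code-focused recommendation stays at 15.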


Professional Development Provided

Professional development topics included: 1) using assessment to guide instruction; 2)

planning for effective instruction using A2i; 3) organizing classrooms using small groups based

on learning goals; 4) implementing effective reading instruction; and 5) using research to inform

instruction. During this first year, professional development efforts focused primarily on topics

1, 2, 3 and 4. Teachers participated in two workshops, spring and fall 2005. Researchers (n = 6),

assigned to schools, met with individual teachers at the schools about every other week to teach

them how to individualize instruction and use A2i during planning time and to act as participant

observers in the classroom. Additionally, teachers met after school in monthly collaborative

professional development groups. Scheduling was carefully monitored so that all teachers

received the same amount of time with the researcher responsible for their training.

Algorithms for teacher-managed code-focused and child-managed meaning-focused

instruction were available. Because we did not, at the time, have research to support first grade

algorithms for teacher-managed meaning-focused and child-managed code-focused instruction,

A2i recommendations showed the mean amounts teachers provided in the Beyond the Reading

Wars study (1).

Several results and design features reduce the likelihood that the treatment effect was due to the professional development alone (i.e., a Hawthorne effect) rather than to the A2i recommendations and planning tools. Although all the teachers in the treatment group received the same professional development protocol and amount of training, the degree to which they individualized instruction and how much they used the software varied. Moreover, the more teachers used A2i from September to May, the greater was their students’ passage comprehension skill growth (see Table S6).


The control teachers were considered to be actively participating in the study. We conducted three videotaped classroom observations (fall, winter, and spring) in both treatment

and control classrooms. All teachers, not just the treatment teachers, received the assessment

results for the tests we administered. We identified children who seemed to be falling behind,

based on our assessments in January 2006, and these children were brought to the teachers’

attention (both treatment and control teachers). Additionally, the control teachers received

introductory information on individualizing instruction using A2i and the purpose of the study

during a meeting at their schools prior to the beginning of the study.

The district was also participating in reading reform initiatives including Reading First

and vocabulary interventions. Reading First is a federally funded initiative designed to improve

student achievement at historically low performing schools. Schools receive funding for

professional development, use an evidence-based core reading curriculum, and conduct student

progress monitoring assessments. Three of the five control schools and two of the treatment

schools were participating in Reading First. Moreover, Reading First-like practices were

mandated throughout the district and the district had just instituted a vocabulary intervention.

Teachers (both treatment and control) were required to provide a dedicated language arts block lasting at least 90 minutes, of which 45 minutes had to be small-group instruction, and to use an evidence-based core reading curriculum. Additionally, throughout the district, schools were required to provide a school-based reading coach (i.e., a reading specialist) and to assess students’ reading skills four times per year using DIBELS (a progress monitoring assessment tool) and at the end of the year using the Stanford Achievement Test, Tenth Edition (SAT-10), a nationally normed standardized test of reading comprehension. Based on our observations of the teachers’


classrooms (both treatment and control), almost all of them were utilizing a dedicated language

arts block with varying amounts of small group instruction.

Six researchers provided professional development. Two had BAs in early childhood and elementary education and were working on master’s degrees, one in reading and one in early childhood education; two had BAs in psychology and were working on master’s degrees, one in community psychology and one in African American studies; and two had advanced degrees and teaching experience.


Appendix SB

Methods

Procedures for random assignment and participant selection

This study utilized a cluster randomized design with a wait-list control group. This means

that all schools and teachers received the A2i software and training but the teachers in the control

group schools had to wait one school year to receive the software and training. Thus, the experiment was conducted in the first year of a two-year study. Schools were matched and paired based on Reading First status, percentage of students eligible for free and reduced lunch (FARL, i.e., poverty status), third-grade mean Florida Comprehensive Assessment Test (FCAT) reading score (a state-mandated test), and first-grade mean SAT10 reading comprehension score (14) (see Table S3). One member of each school pair was randomly assigned to the treatment

condition. All first grade teachers at the schools were invited to participate. Twenty-five of the

teachers in the control schools and twenty-four teachers in the treatment schools agreed to

participate, which represented over 80% of the teachers in the ten schools. After the onset of the

study but prior to the January assessment, two teachers at treatment schools left the study, one

for personal reasons and the other because he was teaching second graders during the language

arts time. Their students’ scores are not included in these analyses. Parental consent was obtained

for 78% of the students in the participating teachers’ classrooms. Children for whom consent

was not obtained participated in the instruction but were not assessed or included in the reported

results.
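The randomization step, in which one member of each matched pair is assigned to treatment, can be sketched as below. The sketch assumes the pairs have already been formed (the matching procedure itself is not reproduced), uses a seeded generator so the draw can be audited, and uses hypothetical school labels rather than the study's actual pairs:

```python
import random


def assign_within_pairs(pairs, seed=None):
    """Randomly assign one school in each matched pair to treatment.

    pairs: (school, school) tuples formed beforehand by matching on
           school characteristics; the matching step itself is not shown.
    Returns a dict mapping each school to "treatment" or "control".
    """
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    assignment = {}
    for first, second in pairs:
        # Fair coin flip within the pair.
        treated, control = (first, second) if rng.random() < 0.5 else (second, first)
        assignment[treated] = "treatment"
        assignment[control] = "control"
    return assignment


# Five hypothetical matched pairs of schools.
pairs = [(f"school {2 * k + 1}", f"school {2 * k + 2}") for k in range(5)]
assignment = assign_within_pairs(pairs, seed=2005)
print(assignment)
```

Randomizing within pairs guarantees that each condition receives exactly one school from every pair, so treatment and control schools are compared within pairs that were similar on the matching variables.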

In general, although assignment was random, control group students began the year with

stronger vocabulary and reading skills. Please see Tables S2 and S3 for scores and descriptive

information for children, teachers and schools. Teacher characteristics were not significantly


different for treatment and control groups except that there were more African American

teachers in the treatment group [t(45)=2.08, p=.043]. Descriptive information was obtained

through parent and teacher questionnaires and through the State of Florida Progress Monitoring

Reporting Network (http://www.fcrr.org/pmrn/).

Measuring Student Achievement

Students’ language and literacy skills were assessed in August 2005 and in January and May 2006 using a battery of language and literacy assessments. The Woodcock-Johnson III Tests of Achievement (15) were chosen because they are psychometrically strong and highly

predictive of performance on state and federal achievement tests, such as the NAEP (16). The

Letter-Word Identification test assesses letter-word reading skills by asking children to recognize

and name increasingly unfamiliar letters and words. The Picture Vocabulary test asks children to

name pictures of increasingly unfamiliar objects. In the Passage Comprehension assessment,

students read sentences and passages of increasing complexity and are asked to supply the

missing word. For example, “The duck is swimming in the ___.”

We present results for the Passage Comprehension and Letter-Word Identification tests.

Students’ end of the year passage comprehension score is of particular interest as an outcome

because it requires students to decode and understand what they are reading and necessitates use

of vocabulary and morphosyntactic knowledge. Letter-Word Identification and Picture

Vocabulary subtest results are used by the A2i algorithms to compute recommended amounts

and types of instruction.

The treatment group teachers first gained access to assessment information and algorithm

recommendations provided by A2i software in September 2005. The control group teachers were

provided written reports of the assessment results for their students. A2i recommendations for


each child were revised using the January results. Again, January scores were provided to the

teachers in the control group. Final scores were also shared.

Accumulating evidence strongly indicates that children’s socioeconomic status (SES) is

closely tied to their academic achievement (17), which is why SES was one of the school

matching variables prior to random assignment to condition. We used students’ eligibility for the

free or reduced price school lunch (FARL) program as a proxy for their SES. Children are eligible for free lunch if their family income is at or below 130 percent of the poverty level, and for reduced-price lunch if it is at or below 185 percent. For a family of four, 130 percent of the poverty level is considered $26,000, and 185 percent is $37,000 (http://www.fns.usda.gov/cnd/lunch/AboutLunch/NSLPFactSheet.pdf). We created two dummy-coded variables (18), one for free lunch and one for reduced-price lunch, on which children were coded 1 if they were eligible and 0 if they were not. Only 5% of the children were eligible for reduced price lunch

compared to 54% eligible for free lunch. The remaining children were not eligible or did not

apply for the program. Data were missing for 146 children. For these children, the school-wide

percentage as a proportion was used (e.g., if school-wide, 96% of children were eligible for

FARL, then the child was coded .96 for eligibility for free lunch making the assumption that the

probability that they were eligible for FARL was .96 and that it was unlikely they were eligible

only for the reduced price lunch plan).
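The coding scheme, including the rule for missing data, can be sketched as follows. We assume the two dummy variables are free-lunch and reduced-price-lunch indicators and that a child with missing data receives 0 on the reduced-price variable, consistent with the assumption stated above; the function name and inputs are hypothetical:

```python
def code_farl(status, school_farl_rate):
    """Return (free_lunch, reduced_lunch) codes for one child.

    status: "free", "reduced", "none" (not eligible or did not apply), or None if missing.
    school_farl_rate: school-wide proportion of children eligible for FARL (0 to 1).
    """
    if status is None:
        # Missing data: use the school-wide proportion as the probability of free-lunch
        # eligibility, and assume the child was not eligible only for reduced-price lunch.
        return (school_farl_rate, 0.0)
    if status not in ("free", "reduced", "none"):
        raise ValueError(f"unknown lunch status: {status!r}")
    return (1.0 if status == "free" else 0.0, 1.0 if status == "reduced" else 0.0)


# Made-up children: (lunch status, school-wide FARL proportion).
children = [("free", 0.96), ("reduced", 0.57), ("none", 0.24), (None, 0.96)]
for status, rate in children:
    print(status, code_farl(status, rate))
```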

Assessing Teachers’ Fidelity to the Intervention -- Individualizing Students’ Instruction in

the Classroom.

A2i software automatically records the teachers’ use of the software. This includes the

number of times they sign on, how long they stay online each visit, and which components of

A2i they utilize and how. The total amount of time (in minutes) teachers used A2i from



September 1 through May 31 was included in the models. One treatment school used Reading Mastery as its core curriculum, which was not fully indexed and available in A2i until November 2005, so the five teachers at this school did not use the Group Activity Planner until then. They

did, however, utilize the student recommendations and recommended groups components.
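The usage total entered in the models is a sum over logged sessions. A sketch, assuming a hypothetical log of sign-on/sign-off timestamps and counting a session by its sign-on date (the actual A2i log format is not described here):

```python
from datetime import datetime

WINDOW_START = datetime(2005, 9, 1)
WINDOW_END = datetime(2006, 6, 1)  # exclusive bound, so May 31, 2006 is the last day counted


def total_a2i_minutes(sessions):
    """Total minutes of A2i use for sessions that began September 1 through May 31.

    sessions: iterable of (sign_on, sign_off) datetime pairs from a hypothetical usage log.
    """
    total = 0.0
    for sign_on, sign_off in sessions:
        if WINDOW_START <= sign_on < WINDOW_END:
            total += (sign_off - sign_on).total_seconds() / 60
    return total


sessions = [
    (datetime(2005, 8, 29, 15, 0), datetime(2005, 8, 29, 15, 30)),   # before the window
    (datetime(2005, 9, 12, 15, 0), datetime(2005, 9, 12, 15, 45)),   # 45 minutes
    (datetime(2006, 1, 20, 14, 10), datetime(2006, 1, 20, 14, 35)),  # 25 minutes
    (datetime(2006, 6, 2, 9, 0), datetime(2006, 6, 2, 9, 20)),       # after the window
]
print(total_a2i_minutes(sessions))  # 70.0
```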

Classrooms were observed every two weeks from December 2005 to February 2006 with

researchers as participant observers. Timed field notes of teacher and student activities as well as

general observations of instruction were completed during every observation. On February 17,

2006, prior to release of the January assessment results, each researcher was asked to assess the

teachers’ implementation from low fidelity (score of 1) to high fidelity (score of 5) (See Table

S4). High fidelity included consistent use of small groups based on learning objectives, amount

of time on instruction generally in line with A2i recommendations, individualized instruction

based on student assessments, and classroom management systems that supported

individualizing student instruction. A teacher with a rating of fair may have used small groups,

but generally all of the groups were doing the same activity at the same time. Low fidelity

indicated that the teacher was using whole class instruction and no small groups, and did not

individualize student instruction based on A2i recommendations. All control teachers received a score of 0 (no intervention).


Appendix SC

Results

Do children who receive algorithm-guided individualized instruction demonstrate stronger

reading skill growth?

Hierarchical Linear Modeling (19, 20) was used because children were nested in

classrooms. Failing to take into account the nested structure of the data may lead to incorrect

standard errors. The hierarchical linear models considered are multiple linear regression models

in which the residual is decomposed into a within-class residual and a between-class residual.

The regression coefficients can be interpreted as for ordinary linear multiple regression models.

Descriptive statistics are provided in Table S3. An exemplar model is presented in equation S1

and HLM results for reading comprehension are provided in Tables S5-S7. Results for letter-

word recognition are provided in Table S8. Note that these results control for students’ fall letter-

word, reading comprehension, and vocabulary scores, gender, free or reduced lunch status and

the percentage of children eligible for FARL school-wide (classroom level). FARL was coded as

two separate variables with 1 representing children with either free or reduced lunch status and 0

representing all other children.

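Level-1 Model (S1)

$$Y_{ij} = \beta_{0j} + \beta_{1j}(\text{Gender})_{ij} + \beta_{2j}(\text{FARL [two variables]})_{ij} + \beta_{3j}(\text{Fall Letter-Word Identification W score})_{ij} + \beta_{4j}(\text{Fall Vocabulary W score})_{ij} + \beta_{5j}(\text{Fall Passage Comprehension W score})_{ij} + r_{ij}$$

where $Y_{ij}$ is the predicted spring Passage Comprehension W score.

Level-2 Model

$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\text{Treatment})_{j} + \gamma_{02}(\text{School \% FARL})_{j} + u_{0j}$$

$$\beta_{1j} = \gamma_{10}, \quad \beta_{2j} = \gamma_{20}, \quad \beta_{3j} = \gamma_{30}, \quad \beta_{4j} = \gamma_{40}, \quad \beta_{5j} = \gamma_{50}$$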

The same model was used for amount of A2i use and fidelity score, which were entered

instead of the dichotomous variable, treatment (Treatment = 1, Control = 0). All continuous

variables, except A2i use and fidelity score, were centered at their grand mean. Although schools were randomly assigned to treatment and control conditions, using covariates increases power (21, 22), so we included students’ gender and FARL status. We also included fall letter-word identification, passage comprehension, and vocabulary W scores at the child level. The treatment variable, or one of the fidelity measures (the total time A2i was used, in minutes, from September through May, or the fidelity rating; Control = 0), was entered at the teacher level. Yij is the predicted spring passage comprehension score for child i in classroom j. γ00 represents the fitted mean spring passage comprehension score for the sample, holding all other variables constant, and γ01 represents the treatment effect. If γ01 is significantly different from zero (p < .05), then a difference in achievement this large between children in treatment and control classrooms would be unlikely (less than a 5% probability) if the treatment had no true effect. HLM (Version 6.03) software was used to create both graphs

(See Figures 2 & S3). For both Figure 2 and Figure S3, the control group scores are presented as points at the fitted mean (A2i use and fidelity scores equal 0), whereas the treatment group scores are represented by lines because fitted mean scores vary depending on minutes of A2i use or fidelity score. Red points and lines represent fitted passage comprehension W scores for first-graders who started first grade with above-average vocabulary scores (modeled at the 75th percentile of the sample, W score = 486, age equivalent = 8.0). The blue points and lines represent fitted scores for children who started first grade with below-average vocabulary W scores (modeled at the 25th percentile of the sample, W score = 474, age equivalent = 6.0).
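Grand-mean centering, applied to every continuous predictor except A2i use and the fidelity score, can be sketched as below. Leaving those two uncentered keeps a value of 0 equivalent to a control classroom, which is why the control group appears at the fitted mean where A2i use and fidelity equal 0. Column names and values are hypothetical:

```python
from statistics import mean


def center_predictors(data, skip=("a2i_minutes", "fidelity")):
    """Grand-mean center every column except those named in skip.

    data: dict mapping a predictor name to its list of values.
    Columns in skip stay uncentered so that 0 still identifies control classrooms.
    """
    centered = {}
    for name, values in data.items():
        if name in skip:
            centered[name] = list(values)
        else:
            grand_mean = mean(values)  # mean over the whole sample
            centered[name] = [v - grand_mean for v in values]
    return centered


data = {"fall_vocab_w": [470, 476, 482, 488], "a2i_minutes": [0, 0, 310, 545]}
centered = center_predictors(data)
print(centered["fall_vocab_w"])
print(centered["a2i_minutes"])
```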

End-of-year results (May 2006) revealed that children in the treatment group made significantly greater gains overall on the passage comprehension test compared to children in the control classrooms, controlling for fall status, gender, and FARL status (see Table S5). The fitted mean difference between students in the treatment and control classrooms was 2.63 W-score points (treatment effect d = .25), which translates into a two-month difference in grade equivalents.

The proportion of variance explained by the final model was computed by subtracting the total

variance in the full model (See Table S5) from the total variance in the unconditional model

(without child or classroom level predictors) and dividing by the total variance in the

unconditional model. The total variance in the unconditional model was 260.29 (between-classroom variance of u0j, τ = 36.76; within-classroom variance of rij, σ² = 223.53). The final model

explained 57% of the variance in children’s spring passage comprehension scores.
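The proportion-of-variance calculation described above can be sketched in a few lines. The unconditional components are the values reported above; the full-model components in the last call are hypothetical placeholders, since the Table S5 values are not reproduced here. The second printed line, the between-classroom share of the unconditional variance (about 0.14), is derived from the reported components:

```python
def proportion_explained(tau_null, sigma2_null, tau_full, sigma2_full):
    """Share of total variance explained: (total_null - total_full) / total_null.

    tau:    between-classroom variance (variance of u0j)
    sigma2: within-classroom variance (variance of rij)
    """
    total_null = tau_null + sigma2_null
    total_full = tau_full + sigma2_full
    return (total_null - total_full) / total_null


# Unconditional-model components reported above.
TAU_NULL, SIGMA2_NULL = 36.76, 223.53
print(f"unconditional total variance: {TAU_NULL + SIGMA2_NULL:.2f}")
print(f"share between classrooms: {TAU_NULL / (TAU_NULL + SIGMA2_NULL):.3f}")

# Hypothetical full-model components, for illustration only (see Table S5 for the real ones).
print(f"{proportion_explained(TAU_NULL, SIGMA2_NULL, 10.0, 120.145):.2f}")
```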

An additional model, which included only the treatment teachers, revealed that the more

time teachers spent using A2i, the greater was their students’ passage comprehension score

growth overall (i.e., a dose-response effect). Results are provided in Table S6.

Additional models, with all teachers included, revealed a significant student fall vocabulary by A2i use interaction (see Table S7 and Figures 2 and S3). In general, students with lower fall vocabulary scores ended the school year with lower reading comprehension scores. However, the more teachers used A2i or the higher their fidelity rubric score, the stronger their students’ reading comprehension skills were by spring, and this effect was greater for students with lower initial vocabulary scores. Children who began first grade with lower vocabulary scores ended the year with reading comprehension scores comparable to those of students with strong initial vocabulary scores, but only in classrooms where the teacher used the A2i software to a greater extent (Table S7 top and Figure 2) or implemented individualized instruction with high fidelity (Table S7 bottom and Figure S3).

When interaction terms are included in models, the size of the treatment effect depends

on teacher fidelity (e.g., A2i use) and student initial status (i.e., fall vocabulary). By comparing

the total variance explained in the final model with and without the fidelity of treatment variable,

the amount of variance explained by the treatment and the interaction term can be computed.

Fifteen percent of the total variance in spring passage comprehension scores was explained by

the amount of time teachers spent using A2i and the interaction term; 43% of the variance was

explained by child fall reading and vocabulary scores, gender, and FARL. The total amount of

variance in spring reading comprehension scores explained by the final model was 58%.

We found highly similar results with letter-word identification as the outcome (Table S8), including a significant vocabulary-by-A2i use interaction that replicated the passage comprehension results. Altogether, these results indicate that using A2i, supported by professional development, increased the extent to which teachers individualized student instruction, which, in turn, led to stronger student reading scores.

References

1. C. M. Connor, F. J. Morrison, E. L. Katch, Scientific Studies of Reading 8, 305 (2004).

2. C. M. Connor, F. J. Morrison, J. N. Petrella, Journal of Educational Psychology 96, 682 (2004).

3. C. M. Connor, F. J. Morrison, L. Slominski, Journal of Educational Psychology 98, 665 (2006).

4. C. M. Connor, F. J. Morrison, P. Underwood, Scientific Studies of Reading (in press).

5. K. Rayner, B. R. Foorman, C. A. Perfetti, D. Pesetsky, M. S. Seidenberg, Psychological Science in the Public Interest 2, 31 (2001).

6. NRP, “National Reading Panel report: Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction” Tech. Report NIH Pub. No. 00-4769 (U.S. Department of Health and Human Services, Public Health Service, National Institutes of Health, National Institute of Child Health and Human Development, 2000).

7. B. M. Taylor, P. D. Pearson, K. Clark, S. Walpole, The Elementary School Journal 101, 121 (2000).

8. R. Wharton-McDonald, M. Pressley, J. M. Hampston, The Elementary School Journal 99, 101 (1998).

9. M. Pressley et al., Scientific Studies of Reading 5, 35 (2001).

10. H. Borko, J. Niles, in Educators' handbook: A research perspective, V. Richardson-Koehler, Ed. (Longman, New York, 1987), pp. 167-187.

11. L. S. Fuchs, D. Fuchs, N. Phillips, The Elementary School Journal 94, 331 (1994).

12. C. E. Cameron, C. M. Connor, F. J. Morrison, Journal of School Psychology 43, 61 (2005).

13. J. E. Brophy, T. L. Good, in Handbook of research on teaching, M. C. Wittrock, Ed. (Macmillan, New York, 1986), pp. 328-375.

14. Harcourt Educational Measurement, Stanford achievement test 10 (SAT10) (Harcourt, Orlando, FL, 2003).

15. N. Mather, R. W. Woodcock, Woodcock-Johnson III tests of achievement: Examiner's manual (Riverside, Itasca, IL, 2001).

16. NAEP, “The nation's report card” (National Center for Education Statistics, 2005).

17. F. J. Morrison, H. J. Bachman, C. M. Connor, Improving literacy in America: Guidelines from research (Yale University Press, New Haven, CT, 2005).

18. T. D. Cook, D. T. Campbell, Quasi-experimentation: Design and analysis issues for field settings (Houghton Mifflin, Boston, 1979).

19. S. W. Raudenbush, A. Bryk, Y. F. Cheong, R. Congdon, M. du Toit, HLM6: Hierarchical linear and nonlinear modeling (Scientific Software International, Lincolnwood, IL, 2004).

20. S. W. Raudenbush, A. S. Bryk, Hierarchical linear models: Applications and data analysis methods, J. de Leeuw, Ed., Advanced quantitative techniques in the social sciences (Sage Publications, Thousand Oaks, CA, ed. 2, 2002).

21. A. Venter, S. E. Maxwell, E. Bolig, Psychological Methods 7, 194 (2002).

22. S. W. Raudenbush, Psychological Methods 2, 173 (1997).


Table S1

Selected Language Arts Instruction Activities within the Dimensions of Instruction

Code-focused (CF)
Teacher Managed (TM): Alphabet Activity; Letter Sight-Sound; Initial Consonant Stripping; Word Segmentation; Phonics activities; Phonological Awareness activities; Sight word reading
Child Managed (CM): Spelling; Independent repeated reading of words; Phonics activities; Phonological awareness activities; Non-word reading activities; Computer activities, code-focused

Meaning-focused (MF)
Teacher Managed (TM): Vocabulary; Teacher Read Aloud; Student Read Aloud, Choral; Group Writing; Writing Instruction; Model Writing; Listening Comprehension; Discussion; Repeated reading of text; Timed reading
Child Managed (CM): Student Read Aloud, Individual; Buddy reading; Sustained Silent Reading; Reading Comprehension worksheets; Student Individual Writing


Table S2

Description of Schools

School | Treatment School? | Reading First? | Total Number of First-Grade Classrooms | Core Curriculum | Children Eligible for Free and Reduced Lunch (%)
A | No | Yes | 3 | Reading Mastery | 93
B | Yes | Yes | 6 | Open Court | 96
C | No | Yes | 6 | Open Court | 88
D | Yes | Yes | 5 | Reading Mastery | 82
E | No | Yes | 5 | Open Court | 57
F | Yes | No | 4 | Open Court | 69
G | Yes | No | 5 | Open Court | 67
H | No | No | 7 | Open Court | 37
I | No | No | 6 | Open Court | 24
J | Yes | No | 5 | Open Court | 29

Note: Eligibility for Free and Reduced Lunch indicates that the family meets federal poverty-level criteria.


Table S3

Descriptive Statistics by Treatment and Control Group, with Means for Variables Used in the HLM Analyses

Student Assessments Treatment M (SD) Control M (SD) Total M (SD) HLM descriptives

Fall WJ Letter-Word Reading W Score 404 (28) 413 (23) 409 (31)

Fall WJ Letter-Word Reading Grade Equivalent 1.4 (.65) 1.6 (.76)

Spring WJ Letter-Word Reading W Score 451 (24.18) 455 (26.66) 439 (27)

Spring WJ Letter-Word Reading Grade Equivalent 2.5 (.73) 2.6 (.87)

Fall WJ Picture Vocabulary W Score 476 (10) 481 (10) 479 (11)

Fall Picture Vocabulary Standard Score (norm mean = 100, SD = 15) 99 (11) 103 (16)

Spring WJ Picture Vocabulary W Score 481 (9.50) 485 (11.04) 484 (11)

Spring Picture Vocabulary Standard Score 100 (9.50) 104 (11.52)

Fall Passage Comprehension W score 447 (20) 452 (22) 450 (22)

Fall Passage Comprehension Grade Equivalent 1.4 (.60) 1.6 (.76)

Spring Passage Comprehension W score 464 (15) 467 (16) 466 (15)

Spring Passage Comprehension Grade Equivalent 2.0 (.65) 2.1 (.73)

Percentage of Students eligible for Free Lunch 52% 34% 54%

Percentage of Students eligible for Reduced Lunch 5% 5% 5%

Percentage of students receiving services for limited English proficiency* 1% 1%

Percentage of Children who are African American* 74% 38% 54%

Percentage of Children who are White* 17% 52% 37%

Teacher** and School Variables Treatment M (SD) Control M (SD) Total M (SD)


Years of teaching experience 12.4 (11.58) 9.27 (7.83) 10.67 (9.72)

Years teaching first grade 4.3 (5.62) 5.30 (5.18) 4.85 (5.35)

Number of teachers with Masters degree 8 4

Number of Teachers who are African American 11 6

Mean percentage of students on FARL school-wide 74.26 53.1 62.82 (24.52)

*Data not available for 18% of the children in the sample. **Data are missing for one treatment teacher.


Table S4

Descriptions of teachers’ fidelity in individualizing their students’ instruction, based on classroom observations.

Fidelity Score Description

0 Control group teachers (regardless of what was observed in the classroom)

1 Poor fidelity. The teacher is not individualizing instruction at all. Most of the instruction is whole class without attention to the individual needs of the students. The classroom is not well organized, transitions are long, and instruction is not well paced for either higher- or lower-performing students. For two (9%) teachers, fidelity was described as poor.

2 Low fidelity. The teacher is using primarily whole class instruction. When small groups are used, they are not always focused on literacy. The classroom has adequate organization, transitions are reasonable, and instruction, while not individualized, is adequately paced. For six (27%) teachers, fidelity was described as low.

3 Fair fidelity. The teacher is using small groups. However, the children in the small groups are generally receiving highly similar amounts and types of instruction (i.e., instruction is not differentiated). The teacher has attended to the grouping recommendations in A2i, but grouping is based more on convenience than on student learning objectives, skill, or ability level. The classroom has adequate organization, transitions during small-group time are reasonable, and instruction, while not fully individualized, is adequately paced. For five (23%) teachers, fidelity was described as fair.

4 Moderate fidelity. The teacher is using small groups, there is some evidence that instruction is individualized, and there is an attempt to meet the A2i-recommended time targets for each group. The teacher is grouping children based on learning objectives, skill, and/or ability rather than convenience. The classroom has adequate organization, transitions are reasonable, and instruction is adequately paced. For four (18%) teachers, fidelity was described as moderate.

5 High fidelity. The teacher uses small groups, and there is good evidence that instruction is individualized and that the amounts and types of instruction align with A2i recommendations. The number of groups is based on effective group size and reflects A2i recommendations. The teacher groups children based on learning objectives, skill, and/or ability rather than convenience. There is an observable system in place for organizing students into groups and transitioning them from one station to another. The teacher uses a lesson plan (e.g., the Group Activity Planner). For five (23%) teachers, fidelity was described as high.
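The percentages in Table S4 follow from the number of teachers at each rubric level; the counts for scores 1 to 5 total 22 teachers. The short Python check below uses only the counts stated in the table. The mean rubric score it computes is a derived summary for illustration, not a statistic reported by the authors.

```python
# Number of teachers at each fidelity rubric level (1 to 5), as stated in Table S4.
TEACHERS_BY_SCORE = {1: 2, 2: 6, 3: 5, 4: 4, 5: 5}

n_teachers = sum(TEACHERS_BY_SCORE.values())

# Percentage of rated teachers at each level, rounded to whole numbers as in the table.
percent_by_score = {score: round(100 * count / n_teachers)
                    for score, count in TEACHERS_BY_SCORE.items()}

# Count-weighted mean rubric score (derived here; not reported in the table).
mean_score = sum(score * count for score, count in TEACHERS_BY_SCORE.items()) / n_teachers

print(n_teachers)            # 22
print(percent_by_score)      # {1: 9, 2: 27, 3: 23, 4: 18, 5: 23}
print(round(mean_score, 2))  # 3.18
```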


Table S5

Results of Hierarchical Linear Modeling demonstrating significant differences between treatment and control students’ achievement on the spring WJ Passage Comprehension Test, controlling for students’ gender, free and reduced lunch status (i.e., meeting the federal definition of poverty), and fall achievement scores. For the Treatment variable, treatment = 1 and control = 0.

Fixed Effects Coefficient Standard Error t-ratio (df) p-value

Mean Spring WJ Passage Comprehension W score (intercept) 466.04 1.07 433.45 (44) < .001

Treatment Effect (Classroom level) 2.63 1.15 2.28 (44) .028

Percentage of students eligible for FARL school-wide -.04 .027 -1.47 (44) .148

Gender (1 = girl, 0 = boy) 0.62 0.81 0.69 (533) .488

Free Lunch Status (1 = eligible, 0 = not) -2.79 1.33 -2.096 (533) .036

Reduced Lunch Status (1=eligible, 0=not) -5.29 1.43 -3.69 (533) < .001

Fall WJ Letter-word W score .14 .02 5.60 (535) < .001

Fall WJ Vocabulary W score .16 .05 3.38 (535) .001

Fall WJ Passage Comprehension W score .31 .04 7.10 (535) < .001

Random Effects Standard Deviation Variance Chi-Square (df) p-value

Between classroom residual 1.81 4.36 63.55 (44) .028

Within classroom residual 10.33 106.87

Deviance = 4088.2
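The random-effects rows of Table S5 can be summarized as a residual intraclass correlation: the share of covariate-adjusted variance that lies between classrooms. The minimal Python sketch below uses the variances reported in the table. The ratio of the treatment coefficient to the within-classroom residual SD is a rough illustrative scale, not the effect size reported by the authors. Recomputing t from the rounded coefficient and standard error gives about 2.29 rather than the tabled 2.28, a difference attributable to rounding.

```python
import math

# Values copied from Table S5 (spring WJ Passage Comprehension, W-score units).
BETWEEN_CLASSROOM_VARIANCE = 4.36   # residual variance of classroom intercepts
WITHIN_CLASSROOM_VARIANCE = 106.87  # residual student-level variance
TREATMENT_COEF = 2.63               # classroom-level treatment effect (treatment = 1)
TREATMENT_SE = 1.15


def residual_icc(between: float, within: float) -> float:
    """Proportion of residual variance that lies between classrooms."""
    return between / (between + within)


icc = residual_icc(BETWEEN_CLASSROOM_VARIANCE, WITHIN_CLASSROOM_VARIANCE)
t_ratio = TREATMENT_COEF / TREATMENT_SE
coef_per_within_sd = TREATMENT_COEF / math.sqrt(WITHIN_CLASSROOM_VARIANCE)

print(f"residual ICC = {icc:.3f}")                                       # residual ICC = 0.039
print(f"t = {t_ratio:.2f}")                                              # t = 2.29
print(f"coefficient / within-classroom SD = {coef_per_within_sd:.2f}")  # ... = 0.25
```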


Table S6

Results of Hierarchical Linear Modeling demonstrating a significant association between the number of minutes teachers used A2i and their students’ achievement on the spring WJ Passage Comprehension Test, controlling for students’ gender, free and reduced lunch status (i.e., meeting the federal definition of poverty), and fall achievement scores.

Fixed Effects Coefficient Standard Error t-ratio (df) p-value


Mean Spring WJ Passage Comprehension W score (intercept) 466.18 1.44 329.47 (20) <.001

A2i minutes used (Classroom level) .02 .007 2.97 (20) .008

Gender (1 = girl, 0 = boy) .27 1.15 .23 (219) .815


Reduced Lunch Status (1=eligible, 0=not) -1.67 2.16 -.77 (219) .442

Fall WJ Letter-word W score .19 .04 4.87 (219) <.001

Fall WJ Vocabulary W score .12 .08 1.47 (219) .138

Fall WJ Passage Comprehension W score .22 .04 5.33 (219) <.001


Deviation

Variance Chi-Square

(df)

p-value




Table S7

Results of Hierarchical Linear Modeling demonstrating differences in student achievement on the spring WJ Passage Comprehension Test between control and treatment groups when teachers’ use of A2i (total minutes from September to May) varies (top) and when treatment teachers’ rubric score varies (bottom), controlling for students’ gender, free and reduced lunch status, and fall achievement.

Fixed Effects Coefficient Standard Error t-ratio (df) p-value



Treatment = A2i minutes used, Control = 0 0.014 .004 3.44 (44) .002

Percentage of students eligible for FARL school-wide -0.03 .025 -1.07 (44) .289

Gender (1 = girl, 0 = boy) 0.59 0.82 0.72 (532) .472


Reduced Lunch Status (1 = eligible, 0 = not) -5.20 1.41 -3.68 (532) <.001

Fall WJ Letter-word W score 0.15 .03 5.79 (532) < .001

Fall WJ Vocabulary W score 0.23 .05 4.48 (532) < .001

A2i X Fall Vocabulary W score interaction -0.001 .0002 -2.77 (532) .006

Fall WJ Passage Comprehension W score 0.30 .04 6.83 (532) < .001


Random Effects Standard Deviation Variance Chi-Square (df) p-value

Between classroom residual 1.70 2.88 56.84 (44) 0.093


Deviance = 4105.3
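Because Table S7 (top) includes an A2i-by-vocabulary interaction, the association between A2i minutes and spring Passage Comprehension depends on a child's fall vocabulary score. The minimal Python sketch below computes that simple slope. It assumes fall vocabulary was entered centered, as the Figure S3 legend values (WJ_VOC_W = -5.070 and 7.930) suggest, and it reuses those two values here. The interaction coefficient is reported to only one significant digit, so the results are illustrative only.

```python
# Coefficients copied from Table S7 (top). A2i minutes = total minutes from
# September to May for treatment classrooms; control classrooms are coded 0.
A2I_MINUTES_COEF = 0.014   # W points per A2i minute when centered vocabulary = 0
A2I_X_VOCAB_COEF = -0.001  # change in that slope per W point of fall vocabulary


def a2i_minutes_slope(vocab_centered: float) -> float:
    """Simple slope of spring Passage Comprehension on A2i minutes at a given
    (assumed centered) fall vocabulary W score."""
    return A2I_MINUTES_COEF + A2I_X_VOCAB_COEF * vocab_centered


# Centered vocabulary values from the Figure S3 legend (assumed to apply here too).
for label, vocab in [("lower vocabulary", -5.07), ("higher vocabulary", 7.93)]:
    slope = a2i_minutes_slope(vocab)
    print(f"{label}: {slope:.4f} W points per minute, {100 * slope:.2f} per 100 minutes")
```

Under these assumptions, the fitted gain per A2i minute is roughly three times larger for the lower-vocabulary child, which is consistent with the negative sign of the interaction.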


Table S7 (bottom)

Fixed Effects Coefficient Standard Error t-ratio (df) p-value



Teacher fidelity (1 = low to 5 = high), Control = 0 0.53 .33 1.60 (44) .117

Percentage of students eligible for FARL school-wide -0.03 .03 -1.14 (44) .262

Gender (1 = girl, 0 = boy) 0.53 0.80 0.66 (532) .507


Reduced Lunch Status (1 = eligible, 0 = not) -5.20 1.42 -3.66 (532) <.001


Fall WJ Vocabulary W score 0.22 .05 4.42 (532) < .001

Fidelity Score X Fall Vocabulary W score interaction -0.04 .02 -2.17 (532) .030



Random Effects Standard Deviation Variance Chi-Square (df) p-value

Between classroom residual 2.34 4.48 68.56 (44) 0.010


Deviance = 4096.2
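Figure S3 plots the fidelity-by-vocabulary interaction from Table S7 (bottom). Under the coding stated in the table (control = 0; treatment teachers scored 1 to 5), the fitted difference between a treatment child and a control child with the same covariate values equals the rubric score times a slope that depends on vocabulary. The minimal Python sketch below assumes centered vocabulary values as in the Figure S3 legend. Its outputs are illustrations computed from rounded coefficients, not results reported by the authors.

```python
# Coefficients copied from Table S7 (bottom).
FIDELITY_COEF = 0.53           # W points per rubric point at centered vocabulary = 0
FIDELITY_X_VOCAB_COEF = -0.04  # change in that slope per W point of fall vocabulary


def fidelity_slope(vocab_centered: float) -> float:
    """W points of spring Passage Comprehension per fidelity rubric point."""
    return FIDELITY_COEF + FIDELITY_X_VOCAB_COEF * vocab_centered


def fitted_gap_vs_control(rubric_score: int, vocab_centered: float) -> float:
    """Fitted treatment-minus-control difference for children with equal covariates.
    Control classrooms are coded 0, so their fidelity terms vanish."""
    if not 0 <= rubric_score <= 5:
        raise ValueError("rubric score must be between 0 (control) and 5")
    return rubric_score * fidelity_slope(vocab_centered)


for label, vocab in [("lower vocabulary", -5.07), ("higher vocabulary", 7.93)]:
    gaps = [round(fitted_gap_vs_control(score, vocab), 2) for score in range(1, 6)]
    print(label, gaps)
```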


Table S8

Results of Hierarchical Linear Modeling demonstrating differences in student achievement on the spring WJ Letter-Word Identification Test between control and treatment groups when teachers’ use of A2i (total minutes from September to May) varies, controlling for students’ gender, free and reduced lunch status, and fall achievement.

Fixed Effects Coefficient Standard Error t-ratio (df) p-value


Mean Spring WJ Letter-word identification W score (intercept) 454.56 1.67 271.13 (44) < .001

Treatment = A2i minutes used, Control = 0 0.02 .005 3.47 (44) .001

Percentage of students eligible for FARL school-wide -.03 .04 -0.96 (44) .341

Gender (1 = girl, 0 = boy) -2.39 1.08 -2.21 (532) .027

Free Lunch Status (1 = eligible, 0 = not) -1.56 2.03 -0.77 (532) .443

Reduced Lunch Status (1 = eligible, 0 = not) -1.70 2.71 -0.628 (532) .530


Fall WJ Vocabulary W score 0.06 .07 .76 (532) .450

A2i X Fall Vocabulary W score interaction -0.001 .0004 -2.72 (532) .007



Random Effects Standard Deviation Variance Chi-Square (df) p-value



Deviance = 4472.22


Figure S1. Classroom View (left): The colored bars represent the amounts of instruction recommended by the algorithms based on students’ vocabulary and letter-word recognition scores. Clicking a child’s name displays the child’s scores and a graph comparing actual scores with targets. Another algorithm assigns children to groups based on their letter-word identification scores; the teacher selects the number of groups and may change group membership. Literacy Minutes Manager (right): This is where teachers scheduled their literacy instruction. Although teachers could schedule literacy instruction throughout the day, they were encouraged to use, at a minimum, a dedicated 90-minute block of time for reading. The school district mandated that a 90-minute block of time be dedicated to reading instruction.
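The Figure S1 caption states that an algorithm assigns children to groups by letter-word identification score and that the teacher chooses the number of groups. The caption does not specify that algorithm, so the following Python sketch is hypothetical. It assumes children are ranked by score and split into contiguous groups of near-equal size. The function name `group_by_score` and the example names and scores are invented for illustration.

```python
from typing import Dict, List


def group_by_score(scores: Dict[str, float], n_groups: int) -> List[List[str]]:
    """Hypothetical grouping rule: rank children from lowest to highest
    letter-word score, then cut the ranking into `n_groups` contiguous groups
    whose sizes differ by at most one."""
    if not 1 <= n_groups <= len(scores):
        raise ValueError("n_groups must be between 1 and the number of children")
    ranked = sorted(scores, key=scores.get)  # names, lowest score first
    base_size, remainder = divmod(len(ranked), n_groups)
    groups: List[List[str]] = []
    start = 0
    for i in range(n_groups):
        # The first `remainder` groups each take one extra child.
        end = start + base_size + (1 if i < remainder else 0)
        groups.append(ranked[start:end])
        start = end
    return groups


# Invented example scores (W units); the teacher picks two groups.
example = {"Ana": 410, "Ben": 455, "Cal": 430, "Dee": 470, "Eli": 398}
groups = group_by_score(example, 2)
print(groups)  # [['Eli', 'Ana', 'Cal'], ['Ben', 'Dee']]

# The teacher may still change membership, e.g., move Cal to the second group.
groups[0].remove("Cal")
groups[1].append("Cal")
```

Contiguous cuts keep children with similar scores together, which matches the caption's description of skill-based grouping. The final lines mirror the teacher's ability to override group membership.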


Figure S2. The daily planning features of A2i, including the Group Activity Planner group and calendar (top left), indexed reading activities (top right), and the printable lesson plan (bottom).


Figure S3. Vocabulary-by-Fidelity Score Interaction. The points represent the fitted mean scores for first graders in the control group, whereas the lines represent fitted mean scores for children in the treatment group, which varied by fidelity score. The red point and line represent fitted mean scores for students who began first grade with fall vocabulary scores at the 75th percentile of the sample (W = 486, Age Equivalent = 8 years), whereas the blue point and line represent fitted scores for first graders with lower fall vocabulary scores at the 25th percentile of the sample (W = 474, Age Equivalent = 6 years). Graph created using HLM version 6.03.

[Figure S3 plot. x-axis: Rubric Score (1 to 5), with the control group at 0; y-axis: Spring Passage Comprehension W Score (460.0 to 475.0). Legend: WJ_VOC_W = -5.070; WJ_VOC_W = 7.930; Control Group.]
