
CHAPTER 5

EFFECT OF FORMAT ON LEARNING DISABLED AND NON-LEARNING DISABLED STUDENTS’ PERFORMANCE ON A HANDS-ON SCIENCE ASSESSMENT

BRIDGET DALTON, CATHERINE COBB MOROCCO, TERRENCE TIVNAN, and PENELOPE RAWSON

Harvard University, Cambridge, MA 02138, U.S.A.

Abstract

Students with learning disabilities often perform poorly on multiple-choice tests that emphasize recall and factual knowledge. This study compared the effect of two alternative assessments - a constructed diagram test and a written questionnaire - on fourth-grade learning disabled (LD) and non-learning disabled (Non-LD) students’ learning. As part of a larger investigation of different approaches to hands-on science learning, 172 students (including 33 LD students) in six urban and two suburban classrooms participated in the study. Results indicate that students’ assessment outcomes are a function of learner status (LD, low, average and high achieving) and level of domain specific knowledge after instruction. After controlling for domain specific knowledge, students with LD, and low and average achieving students obtained higher scores on the constructed diagram test than on the questionnaire. High achieving students were not sensitive to format differences, performing comparably on the two measures. The facilitative effect of the diagram format may have been due to differences in the primary symbol systems (graphic vs. text) and the openness of the response format (constrained vs. open) of the constructed diagram and questionnaire, respectively.

Introduction

Jay, a student with learning disabilities (LD), and his fourth-grade class had just completed an intensive, hands-on science unit on electricity. When asked to define and give an example of a conductor on the post-instruction questionnaire, Jay left the item blank. However, Jay’s performance on the diagram analysis test following instruction (see Figure 1 below) reveals that he understands the function of a conductor and can explain the effect of a conductor versus insulator on a simple circuit, either by drawing the electrical pathway or describing the problem. Further, classroom observation of Jay’s experiments with conductors shows that he is able to build a circuit tester and

systematically test the conductivity of several items (EDC Science and Problemsolving Project, 1990).

[Figure 5.1 reproduces two of Jay’s constructed diagram items. Each presents a circuit diagram with the prompts: “Will the bulb light? If YES, draw the pathway in RED. If NO, what’s the problem?”]

Figure 5.1. Jay’s (LD) post-instruction response to constructed diagram questions on conductivity.

Inquiry-based science instruction is receiving renewed attention because it allows students to develop important concepts and skills in the context of “doing” science, carrying out investigations to answer questions and solve problems. This kind of challenging science curriculum can potentially benefit ALL students, including those who struggle to acquire literacy and formal school learning (Educating Americans for the Twenty-first Century, National Science Board Commission, 1983). The drive to forge new directions in curriculum and instruction has been accompanied by a call to develop assessment tools and techniques that are congruent with the goals of the new curricula and that provide teachers with a more comprehensive view of students’ concepts and inquiry process skills (Champagne, Lovitts, & Calinger, 1990; National Science Foundation, 1991; Resnick, 1987).

To get beyond students’ surface understanding of concepts and assess what is likely to be a complex interaction of content knowledge, process skills, and problem-solving ability, it is necessary to develop alternatives to the multiple-choice and short-answer tests that currently dominate classroom and national science assessments (Hein, 1990; Raizen, Baron, Champagne, Haertel, Mullis, & Oakes, 1989). Researchers and educators have begun to explore the value of alternative science assessments such as hands-on performance, computer simulation, open-ended questionnaires, and figural response items (Blumberg, Epstein, MacDonald, & Mullis, 1986; Champagne, Lovitts, & Calinger, 1990; Harmon, 1990b; Martinez, 1991; Shavelson, Carey, & Webb, 1990). However, we know very little about the relative benefits of the different types of assessment, or whether the benefits vary for different student populations.

As the opening vignette illustrates, the issue of assessment may be critically important for students with learning disabilities. These students often have particular difficulty conveying what they know on group-administered multiple-choice and short-answer tests that emphasize recall and factual knowledge (Putnam, 1992; Salvia & Ysseldyke, 1991).

Difficulties with reading, writing, and/or language (Englert & Thomas, 1987; Graham & MacArthur, 1988; Morocco, Dalton, & Tivnan, 1992; Myklebust, 1973; Torgesen, 1977), as well as poor test-taking skills (Alley & Deshler, 1979; Scruggs & Mastropieri, 1988), contribute to their weak performance on these measures. To reduce the bias of tests that rely heavily on written language proficiency to assess content area knowledge, LD students’ Individual Education Plans may include provisions for adapting testing conditions (e.g., allowing a student to take tests in the resource room, providing additional time) or, less frequently, the test format (e.g., administering the test orally, having the student write responses on the test rather than on a separate sheet, enlarging print size) (Mick, 1989; Tolfa, Scruggs, & Bennion, 1985). However, teachers and researchers alike generally rely on written textual questions to assess the learning of students with LD (Zigmond, Levin, & Laurie, 1985).

In contrast to testing practice, there is strong evidence that visual representations such as drawings and diagrams (Bergerud, Lovitt, & Horton, 1988; Mastropieri, Emerick, & Scruggs, 1988; Woodward & Noell, 1991) and graphic organizers (Darch & Carnine, 1986; Horton, Lovitt, & Bergerud, 1990) improve LD students’ content area learning. Graphics vary widely in visual design (e.g., photographs, line drawings, flow charts, graphs) and in the amount of text embedded in, or accompanying, the graphic. However, they share a reliance on a different symbol system than text, using visual representations and cues to simplify complex concepts and processes, structure relationships and make the abstract more concrete (Winn, 1989). For LD students, who often have difficulty recognizing critical attributes, ignoring irrelevant detail, and connecting and organizing related concepts (for a review, see Stone & Michals, 1986), graphics may help them activate and develop more appropriate schemas for understanding (Darch & Carnine, 1986).

The facilitative effect of graphics on learning raises important questions about their use in assessment. Research on reading comprehension focuses on how students construct meaning from text, and occasionally from graphics that are presented to them in a textbook, a magazine article, etc. (for reviews, see Dole, Duffy, Roehler, & Pearson, 1991; Levie & Lentz, 1982). In an assessment task, students must not only read and interpret the question, which may include textual, graphical, and/or numerical information, but must also generate a response using one or more of these symbol systems to represent their understanding.

A second critical aspect of the response mode in testing is the openness, or the degree to which students must construct their own writing, drawing, or computation, in contrast to “operating” on a presented graphic or selecting from multiple-choice alternatives. Figure 5.2 presents a textual question, along with several fourth-grade students’ unstructured drawing and writing responses showing a range of understanding, writing and drawing competence, and choice of expressive mode. It differs from the constructed diagram item in Figure 5.1 in two important ways: the dominant symbol system used in the question (text vs. diagram) and the openness of the response format (open vs. constrained).

Figure 5.2. A textual question and several fourth-grade students’ unstructured writing and drawing responses.

Dynamic assessment (e.g., Brown & Ferrara, 1985; Campione, 1990) provides a useful framework for thinking about the influence of variations in the symbol systems used in questions and responses and in the openness of the response format on students’ assessment outcomes. If a particular format scaffolds, or supports, students’ performance, it allows them to perform at a higher level than they would otherwise (Vygotsky, 1978). In effect, the scaffold helps define what Vygotsky terms the zone of proximal development, identifying what students have already learned and, perhaps more importantly, what they are in the process of learning. Assessment that is sensitive to individual differences would provide important information to teachers, administrators, and researchers about students’ learning. The implications for teaching and evaluation would be even greater if some students, such as those with LD, are more sensitive to these format differences than others.

For example, consider again Jay’s (LD) performance on the questionnaire vs. constructed diagram questions described in the opening vignette. It is likely that Jay’s teacher would evaluate his progress quite differently in the two assessment contexts, and that her perception of his competence (or lack thereof) would lead her to make different instructional decisions for him. While Jay’s questionnaire performance suggests he “didn’t get it and needs review,” his diagram performance indicates he is ready for more challenging material.

Despite LD students’ difficulty with multiple-choice and short-answer tests, there is little research exploring alternative assessment approaches. In one of the few studies to address this issue, Bergerud, Lovitt, & Horton (1988) found that LD high school students’ performance on a life science test varied as a function of both test format and treatment. When LD students were tested with textual questions alone, the effect of the graphics vs. study guide treatment was comparable; when tested with a combination of textual and graphics questions, the graphics students outperformed the study guide students. Further, students in the graphics treatment tended to be able to answer both textual and graphics questions, while students in the study guide treatment tended to have greater difficulty with graphics questions than textual questions.

Bergerud, Lovitt, & Horton’s (1988) results suggest that LD students’ experience with the symbol system, or their “graphical literacy,” as well as their domain specific knowledge, influences their ability to respond to particular assessment formats. However, in a hands-on science pilot study where LD and Non-LD students had equal experience with graphic representations of electricity concepts and processes, LD students scored significantly higher on a diagram test than on a written questionnaire, and the positive effect of the diagram format was greater for LD students than Non-LD students (Morocco, Dalton, Tivnan, & Rawson, 1992). These results indicate that other learner characteristics, such as verbal ability, spatial reasoning, or written language proficiency, are also likely to be factors influencing students’ response to graphic vs. textual assessment formats.

The complexity of the interrelationship of learner, task, and symbol system is further suggested in assessment research with normally achieving populations. Martinez (1991) compared student responses to multiple-choice textual questions and constructed diagram questions (e.g., the student responds by “operating” on the diagram, drawing arrows to connect components of a food web). While multiple-choice questions were generally easier than constructed diagram questions, it is not clear whether this was due to differences in the symbol system, the openness of the response format, or a combination of the two. To answer a multiple-choice question, the student evaluates and selects from a limited range of possibilities. The task is primarily one of recognition. In contrast, the constructed diagram format is more open, offering a greater range of response possibilities. For example, although the spatial arrangement of the pictures of a rabbit, fox, man and grass might suggest certain relationships in a food web, students are free to connect the components in whatever way makes sense to them, constructing their own meaning from the graphic presentation.

The influence of the response format is also demonstrated in Shavelson, Carey, & Webb’s (1990) study of high school students’ responses to chemistry problems presented as word problems, diagrams, tables, graphs and numerical examples. While students’ responses did not vary as a function of question format (with the exception that graphs were generally more difficult), they did vary according to response format. Other studies have shown that students do best when the symbolic form of the question and response are the same (Dwyer, 1978; Webb, Gold, & Qi, 1989).

Page 6: Effect of format on learning disabled and non-learning disabled students' performance on a hands-on science assessment

304 B. DALTON et al.

Shavelson and colleagues’ current research on the exchangeability of science performance assessments provides additional evidence that assessment format impacts student performance (Pine, Baxter, & Shavelson, 1991; Shavelson, Carey, & Webb, 1990). For example, while students’ average performance on a computer simulation vs. hands-on experiment was comparable, there was substantial variation among students, with some doing much better on the computer simulation, and others doing much better on the hands-on assessment (Pine, Baxter, & Shavelson, 1991).

The above studies highlight the critical need to examine the interaction of assessment format, learner characteristics/aptitudes and learning experiences (Snow, 1989). This chapter presents the results of a comparison of the effect of assessment modality on fourth-grade LD and Non-LD students’ learning in hands-on science. The study is part of a three-year classroom-based project that culminated in a comparison of two hands-on approaches to teaching electricity, supported inquiry science and activity-based science, in six urban and two suburban fourth-grade classrooms over a two-month period (for a full report of the project, see Morocco, Dalton, & Tivnan, 1992). As part of this larger investigation, we developed and field-tested two forms of an alternative assessment to examine students’ understanding and application of electricity concepts: a written questionnaire requiring students to write and, in some cases, draw their responses to textual questions, and a diagram analysis test requiring students to construct a response based on their analysis of figural information. Separate analyses of the two outcome measures clearly demonstrated the superiority of the supported inquiry approach, regardless of differences in learner status (LD vs. low, average and high achieving students) and urban/suburban district. However, the pattern of results for students of different learner/achievement status varied somewhat on the questionnaire and diagram test formats, suggesting that the test format may have influenced individual student outcomes. The results of that first study led us to the following two questions, which guide the current analysis:

• What is the effect of a questionnaire vs. constructed diagram format on the performance of fourth-grade children with learning disabilities (LD), low achievement (Lo-A), average achievement (Ave-A), or above average achievement (Hi-A)?

• Does the effect vary as a function of students’ domain specific knowledge? That is, does the effect vary for students with a strong electricity knowledge base after instruction (More-El) and students with a less developed knowledge base (Less-El)?

We anticipated that both the questionnaire and constructed diagram test would yield important information about the quality of students’ conceptions and misconceptions. We hypothesized that LD students at varying levels of domain specific knowledge would perform more strongly on the diagram test, since it reduces the reading/writing barrier. It also potentially scaffolds students’ performance by offering a visual representation of a situation that constrains the potential responses and, perhaps, provides a schema for interpreting and responding to the problem (Anderson, 1977; Darch & Carnine, 1986). For similar reasons, we thought low achieving students might also benefit from the diagram test format, since these students generally have more written language difficulties.


Method

Teachers and Sites

Eight teachers participated in this study. Six teachers were from five schools in an urban school district serving an ethnically and economically diverse community in a northeastern metropolitan area. The other two teachers were from a school in an affluent suburban school district in the same area. Teachers used a hands-on science curriculum that engaged children in experimenting with batteries and bulbs to develop their understanding of electricity concepts and science process skills. The science unit took place over a two-month period and included twelve 45-60 minute lessons. Research staff observed each class weekly to document fidelity of instruction and to provide technical assistance as needed.

Students

One hundred and seventy-two fourth-grade students, including 33 students with LD, participated in the study (see Tables 5.1 and 5.2). Consultation with school personnel and review of student records verified that all of the LD students were receiving learning disabilities services in one or more academic areas and were functioning in the average to above average range of intelligence. To assess effects on non-LD students of varying achievement levels, we constructed a composite standardized achievement score based on students’ math and spelling performance on the Wide Range Achievement Test-R (Jastak & Wilkinson, 1984) administered prior to instruction. Non-LD students who obtained a composite score below the 35th percentile were classified as “low achieving,” students scoring in the 35th-67th percentile range were classified as “average achieving” and students scoring above the 67th percentile were classified as “high achieving.” To test effects on students with different levels of domain specific knowledge, we constructed a composite electricity knowledge score based on the sum of their mean item scores on the questionnaire and diagram test administered after instruction (maximum score possible is 8.0). Students who obtained a score greater than 6.0 were classified as having a “more” developed electricity knowledge base (More-El), and students obtaining a score equal to or less than 6.0 were classified as having a “less” developed electricity knowledge (Less-El).
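To make these grouping rules concrete, the following minimal sketch (in Python, not the authors’ code) applies the cutoffs described above; the field names is_ld, wrat_composite_percentile, and electricity_composite are hypothetical stand-ins for the student records:

    # Sketch of the grouping rules described in the text; field names are hypothetical.

    def learner_status(is_ld: bool, wrat_composite_percentile: float) -> str:
        """LD designation comes from school records; Non-LD students are split
        by their composite WRAT-R (math + spelling) percentile."""
        if is_ld:
            return "LD"
        if wrat_composite_percentile < 35:
            return "Lo-A"    # low achieving
        if wrat_composite_percentile <= 67:
            return "Ave-A"   # average achieving
        return "Hi-A"        # high achieving

    def domain_knowledge(electricity_composite: float) -> str:
        """Composite = sum of mean item scores on the post-instruction
        questionnaire and diagram test (maximum possible = 8.0)."""
        return "More-El" if electricity_composite > 6.0 else "Less-El"

    # Example: a hypothetical non-LD student at the 40th percentile with a composite of 5.2
    print(learner_status(False, 40), domain_knowledge(5.2))  # Ave-A Less-El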

Table 5.1
Sample Size by Learner Status and Level of Domain Knowledge

                        Learner status
Domain knowledge     LD    Lo-A    Ave-A    Hi-A    Total
Less El              28     23      43       18      112
More El               5      7      25       23       60
Total                33     30      68       41      172


Table 5.2
Characteristics of Fourth-Grade Students with LD, Low, Average and High Achievement

                                        LD         Lo-A       Ave-A      Hi-A       Total
                                        (n = 33)   (n = 30)   (n = 68)   (n = 41)   (n = 172)
CA (months)               Mean          120.3      123.3      117.1      114.6      118.2
                          (SD)          (7.6)      (5.0)      (4.8)      (4.0)      (6.1)
Percent male                            57.6       66.7       51.5       43.9       53.5
Ethnicity (percent)
  White                                 42.4       53.3       69.1       80.5       64.0
  African American                      42.4       36.7       17.7       9.8        23.8
  Latino                                12.1       6.7        8.8        2.4        7.6
  Other                                 3.0        3.3        4.4        7.3        4.7
Percent ESL                             27.3       20.0       22.1       9.8        19.8
Percent urban                           84.9       80.0       66.2       51.2       68.6
Achievement (percentiles)*
  Spelling                Mean          33.6       25.1       59.9       83.3       54.3
                          (SD)          (26.2)     (12.5)     (17.7)     (15.3)     (27.9)
  Math                    Mean          27.0       21.2       45.0       78.8       45.4
                          (SD)          (17.4)     (11.3)     (18.8)     (15.8)     (26.7)
Electricity knowledge (raw scores)†
                          Mean          4.49       4.97       5.41       6.20       5.35
                          (SD)          (1.06)     (1.07)     (1.12)     (0.95)     (1.20)

*WRAT-R administered prior to instruction.
†Composite based on the sum of average item scores on the post-instruction questionnaire and diagram test (the maximum score possible is 8.0).

Table 5.2 presents data on the background characteristics and achievement of students by learner status. Overall, the students were quite diverse in terms of ethnicity and language background, and over two-thirds of them were from urban schools. As we might expect, the learner status groups differed somewhat in relation to these variables. To assess achievement differences, we carried out one-way analyses of variance of students’ prior math and spelling achievement. Results indicated that while LD students’ average performance was higher than that of students in the Lo-A group, these differences were not significant (spelling, p < .071 and math, p < .174). As we would expect, LD and Lo-A students’ academic achievement was significantly lower than that of the Ave-A students (for both math and spelling, p < .000). In addition, Ave-A students’ achievement was lower than that of the Hi-A students (for both math and spelling, p < .000). The similarity of the LD and low achieving groups is consistent with the findings of Ysseldyke and others that LD designations can be inconsistent across schools and districts, and often do not reliably distinguish between LD and low achieving students (Ysseldyke, Algozzine, & Epps, 1983; Ysseldyke, Algozzine, Shinn, & McGue, 1982).

Instruments

We designed a written questionnaire to assess students’ understanding before and after instruction, and a constructed diagram test as an additional post-instruction measure (the questionnaire items and diagram test are presented in Appendix A). Only the post-instruction data are reported here.

There are several key concepts that children may develop as they experiment with batteries and bulbs. First, electricity flows in a simple circuit when a complete pathway has been formed between the battery, bulb and wires. Second, the bulb brightness is a function of the amount of energy and resistance in the circuit. Third, the electricity flow can be turned off by opening, or interrupting, the circuit (e.g., an open switch, insertion of an insulator, faulty circuit components or connections). Fourth, some materials operate as electrical conductors, while others act as insulators. Fifth, different circuit configurations (simple, series, parallel) result in different outcomes and offer different advantages to the user.

Written questionnaire

The written questionnaire included eight multi-part textual questions that asked students to use writing and drawing to convey their understanding and application of concepts. For example, students wrote an explanation and/or drew a picture of how they would test an object for conductivity (see Figure 5.2). Extended writing and drawing were chosen as the response modes to allow students to show what they knew in their own words and visual representations. Our pilot work and Chittenden’s (1990) and Hein’s (1990) research suggest that children’s drawings can be quite revealing of their understanding of science concepts and processes. Further, we thought these open-ended response formats would enable students to express different levels of understanding, rather than right/wrong responses. The value of assessing children’s responses qualitatively is supported by research on children’s science misconceptions that compellingly demonstrates the evolving nature of their understanding (Carey, 1986; Gilbert, Osborne, & Fensham, 1986).

We also knew that the questionnaire format could potentially challenge fourth-grade students, and particularly students with LD. Expository writing is difficult for many students at this level, and especially so for LD students, who are not only less familiar with this text structure, but who also often have difficulty accessing what they know and writing coherent, complete texts (Englert & Thomas, 1987; Graham & MacArthur, 1988; Stoddard & MacArthur, 1993). We expected that the inclusion of drawing, as well as writing, would expand students’ response options. In addition, we emphasized in our directions to students that we were interested in their ideas and thinking, and were not concerned with spelling and punctuation.

Constructed diagram test

The diagram test also included eight questions. Like the questionnaire, it focuses on simple, series, and parallel circuits and conductors/insulators. For seven of the eight questions, students analyze figural material (e.g., a circuit diagram) and predict whether a particular bulb will light by checking “yes” or “no.” If they predict the bulb will light, they construct an explanation by “operating” (Martinez, 1991) on the diagram, drawing the electricity pathway with dashes or a red line. If they predict the bulb will not light, students write a brief explanation of the problem. While the response to the prediction question is constrained, or closed (yes/no), the response to the “why” question is constructed and may be figural (drawing the pathway for “yes” predictions) or written (describing the problem in words for “no” predictions). It is important to note that although the questionnaire and diagram tests were both developed to assess the same underlying concepts, we did not design parallel sets of figural and written items for each question. While some items could be considered parallel, others might not be. The diagram test emphasizes figural analysis and constrained drawing and writing responses, while the questionnaire emphasizes textual analysis and unstructured, open writing and drawing responses.

GENERAL SCORING CRITERIA

Level 0: No response, responds with “I don’t know,” or provides an unscorable response.
Level 1: A naive or descriptive answer that is egocentric, irrelevant or repeats the question.
Level 2: A response that is incorrect or ignorant of the concepts (2A) or one that shows a clear confusion between two concepts, i.e., a misconception (2B).
Level 3: A generally good response, which may contain an inaccuracy, provide only a partial explanation or omit one of the examples called for.
Level 4: A complete and accurate response.

EXAMPLE OF ONE LD STUDENT’S GROWTH: PARALLEL CIRCUITS

7. Marcos had a different string of lights. When he unscrewed one bulb, the rest of the lights stayed on.
7a. Why?
7b. Draw a LARGE diagram to show how the bulbs could be wired in this circuit.

Figure 5.3. General scoring criteria, with a sample rating of one LD student’s growth on a parallel circuit questionnaire item.


1d. What could you do to make the bulb burn brighter? (VERSION B - QUESTION 2C)

The purpose of this question is to assess students’ understanding of the relationship between bulb brightness and other components in the circuit. To increase the brightness of a bulb in a simple circuit, the amount and/or flow of electric current must be increased. This can be done most easily by increasing the voltage (number of batteries or battery voltage) without increasing the resistance (bulbs, wires).

Level 4: Student knows that increasing the voltage will increase the bulb’s brightness and also mentions the batteries should be in series or offers more than one option.
• “get a more powerful battery, a brighter bulb, or two batteries.”
• “use 2 batteries, one on top of another.”

Level 3: Student knows that increasing the voltage will increase the bulb’s brightness, but does not mention the batteries should be in series or does not offer more than one option OR suggests a less likely solution that potentially could affect the bulb’s brightness by manipulating the amount of resistance in the circuit OR gives an incomplete or partially correct response. The correct portion should relate to Level 4 responses. The incorrect portion may correspond to Level 2 responses.
• “add another battery.”
• “get a more powerful battery.”
• make the filament smaller.
• get a thicker strip of wire.
• get more batteries and tie the wire to the part that you plug into the lamp.

Level 2: (2A) Student provides an incorrect explanation regarding the relationship between voltage and resistance, applies an irrelevant use of circuit knowledge OR (2B) evidences a misconception (confuses brightness with number of bulbs and not the amount of the energy source OR equates faster light with more powerful light OR equates the tightness of the connection to the level of brightness).

Level 2A:
• add another wire.
• touch the bottom of the bulb to the battery, but keep the wire wrapped up.
• put it in a dark room.

Level 2B:
• use two bulbs.
• you could buy a brighter light bulb.
• put the battery on a higher speed of light.
• push the wires harder.

Level 1: Student offers a naive or egocentric response, an irrelevant response not related to circuits, OR relies entirely on experience/knowledge of household electricity.
• let it stay on till it burn.
• screw it into a lamps socket and turn on the power meaning turn or flick the light switch.

Level 0: Student does not respond, responds “I don’t know,” or provides an unscorable response.
• if you put a there battery. (Although we might infer that Sara meant “a there” to mean “another” or “a third” battery, it’s too much of a jump in this case to give her credit for understanding the concept.)

Figure 5.4. An example of specific scoring criteria for questionnaire item 2C.


Scoring and Reliability

The written questionnaire and diagram test responses were scored blindly by outside raters (science specialists and graduate students in assessment) using general and item specific criteria. Figure 5.3 presents the general criteria as adapted by Harmon (1990a) from National Assessment of Educational Progress criteria and an example rating of one LD student’s pre- and post-instruction responses to a parallel circuit item. Figure 5.4 presents an example of specific criteria developed for each item. The scoring criteria were developed through several iterations over a two-year period, including: (1) Development of a preliminary set of scoring criteria based on students’ responses on pilot assessments, (2) two reviews by assessment and science education experts, (3) application of scoring criteria to pilot study student protocols, (4) analysis of the appropriateness of the scoring criteria to the range of student responses, and (5) further revision based on the pilot analysis. Raters were trained to use the scoring criteria, practicing with sample student protocols representing a range of responses, comparing ratings and discussing discrepancies.

Coefficient alpha was used to assess the internal consistency of the scales, yielding a coefficient of .86 on the post-instruction questionnaire and .67 on the diagram test. Twenty-five percent of the tests were scored by two raters to assess interrater reliability. The median percent agreement was .69 on the post-instruction questionnaire and .88 on the diagram test. Interrater reliability on total scores was .93 on the post-instruction questionnaire and .94 on the diagram test.
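For readers unfamiliar with coefficient alpha, the sketch below illustrates how an internal-consistency coefficient of this kind can be computed from a students-by-items matrix of rubric scores; the matrix is invented for illustration and this is not the scoring code used in the study:

    import numpy as np

    def cronbach_alpha(scores: np.ndarray) -> float:
        """Coefficient alpha for a (students x items) score matrix:
        alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
        k = scores.shape[1]
        item_variances = scores.var(axis=0, ddof=1)
        total_variance = scores.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_variances.sum() / total_variance)

    # Invented example: 6 students x 8 items scored on the 0-4 rubric.
    demo = np.array([
        [4, 3, 4, 2, 3, 4, 3, 4],
        [2, 2, 3, 1, 2, 2, 1, 2],
        [3, 3, 3, 2, 3, 3, 2, 3],
        [1, 0, 1, 0, 1, 1, 0, 1],
        [4, 4, 4, 3, 4, 4, 3, 4],
        [2, 1, 2, 1, 2, 2, 1, 2],
    ], dtype=float)
    print(round(cronbach_alpha(demo), 2))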

Data Analysis

To test the hypothesis that students’ performance would differ on the two assessments, we compared students’ scores in terms of the percentage point difference in total scores on the diagram vs. questionnaire tests. For example, a student obtaining 60% on the diagram test and 40% on the questionnaire would obtain a positive difference score of 20 percentage points, while a student obtaining the reverse scores would obtain a negative difference score of 20 percentage points.
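The following sketch illustrates the difference-score computation and the 4 (learner status) by 2 (domain knowledge) analysis of variance reported in the Results section, using pandas and statsmodels; the column names and the small data frame are invented for illustration and do not reproduce the study’s data:

    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Hypothetical records (two students per cell) with percent-correct scores
    # on the two post-instruction measures; values are invented for illustration.
    status = ["LD", "LD", "Lo-A", "Lo-A", "Ave-A", "Ave-A", "Hi-A", "Hi-A"] * 2
    domain = ["Less-El", "More-El"] * 8
    diagram = [58, 72, 55, 70, 66, 80, 84, 92, 52, 68, 57, 74, 63, 78, 86, 90]
    questionnaire = [46, 68, 44, 66, 62, 77, 83, 93, 45, 64, 48, 70, 60, 76, 85, 89]

    df = pd.DataFrame({"status": status, "domain": domain,
                       "diagram": diagram, "questionnaire": questionnaire})

    # Percentage point difference score: positive values favor the diagram test.
    df["diff"] = df["diagram"] - df["questionnaire"]

    # 4 (learner status) x 2 (domain knowledge) between-subjects ANOVA on difference scores.
    model = smf.ols("diff ~ C(status) * C(domain)", data=df).fit()
    print(anova_lm(model, typ=2))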

Results

A 4 (learner status) by 2 (domain knowledge) analysis of variance of students’ percent difference scores yielded a main effect for both learner status (F(3, 170) = 3.03; p < .031) and domain knowledge (F(1, 170) = 5.21; p < .024). There was no interaction effect (F(3, 170) = .33; p < .802).

Averaging across domain knowledge, we find that LD, Lo-A and Ave-A students tended to perform higher on the diagram test, while Hi-A students’ performance did not differ on the two measures. LD students’ performance was higher on the diagram test by an average of 8.83 percentage points, a difference greater than zero (p < .007). Lo-A students and Ave-A students also scored higher on the diagram test, obtaining average difference scores of 9.82 percentage points (p < .001) and 3.13 percentage points (p < .042), respectively. Hi-A students, on the other hand, realized an average diagram difference of only 1.20 percentage points, and this difference was not significant (p < .529).

The effect of the diagram test format also varied by level of domain knowledge. Averaging over learner status, we found that students with Less-El after instruction obtained higher scores on the diagram test than on the questionnaire, by an average difference of 8.55 percentage points. Students with More-El also obtained higher scores on the diagram test, averaging a difference of 2.94 percentage points. But this increase was not significantly different from zero (p < .172) and was significantly less than the difference realized by their less knowledgeable peers (p < .000).

Although not a focus of the analysis, we were interested in whether students had a preference for a particular type of assessment format, particularly given some LD students’ feelings of inadequacy in traditional testing contexts. As part of the diagram test, students were asked to indicate whether they preferred the diagram or questionnaire test. Eighty-eight percent of the students reported that they liked the diagram test better, many remarking that it was “fun” and “easier” than the questionnaire.

Discussion

For fourth-grade students in a hands-on science program on electricity, the effect of test format appears to be a function of both learner status and level of domain knowledge. These results suggest that students with LD, low and average academic skills are better able to access and use their knowledge in a constructed diagram format than in an open-ended questionnaire format. In contrast, high achieving students appear to be less sensitive to these format differences, performing comparably on the two types of assessment.

In addition, this study suggests that graphics may be more useful than textual questionnaire items in helping students who have less domain specific expertise to access and use their “fragile” knowledge (Perkins & Martin, 1986). However, as students develop expertise within a specific content domain, the impact of the symbolic representation of the question or the openness of the response format tends to weaken, but not disappear.

There may be several reasons underlying students’ differential performance on the diagram and questionnaire tests. One possible explanation is that the two tests are measuring different aspects of achievement. We designed the two measures to assess students’ understanding and application of the same basic concepts, but there may be unintended differences in the items that are influencing students’ performance.

An alternative explanation is that the diagram test scaffolds students’ performance in two important ways. First, the dominant symbol system of the question is figural, rather than verbal. The simple but realistic graphics highlight the key components and structure of each circuit, providing visual representations of situations directly related to students’ previous hands-on experience building circuits and learning how electricity works. The graphics may have helped students to activate electric circuit schemas relevant to each specific question, as well as to focus their attention on critical attributes of the problem (e.g., the wiring of a series vs. parallel circuit). As Darch and Carnine (1986) explain, the potential of a graphic to activate a more “constrained” schema may be particularly important for students who have difficulty accessing and using prior knowledge appropriately, identifying key information and ignoring irrelevant detail (for a review, see Stone & Michals, 1986).

Second, the response format of the diagram test was more constrained than that of the questionnaire. After predicting whether a particular circuit would work (a closed yes/no response), students constructed a response by either drawing the electrical pathway on the diagram or writing a brief explanation of why a particular light bulb would not light. It seems reasonable to assume that many students would find this an easier task than producing what were often quite complex drawings and written explanations on the questionnaire. Further, the fact that the response option for the majority of the diagram questions was constant, with only the design of the specific electrical circuit changing, may have helped students focus on the relevant attributes of each problem.

The hypothesis that the diagram format is particularly supportive of LD and low achieving students who have difficulty with verbally oriented, more open-ended written assessments appears to be supported by these data. However, it is not clear to what extent differences in the primary symbol system, response openness, or both symbol system and openness contributed to these results. In this study, figural questions and constrained responses were linked, and textual questions and open responses were linked. We are conducting additional research to disambiguate these effects.

The results of this study are consistent with current science assessment research showing that students’ performance in hands-on science is influenced by the characteristics of the assessment (Harmon, 1990b; Martinez, 1991; Pine, Baxter, & Shavelson, 1991; Shavelson, Carey, & Webb, 1990). These findings suggest that young students’ performance may vary as a function of the assessment modality, and that it is important to use a variety of tools to obtain a comprehensive view of their progress, particularly for LD students, low achieving students, or students who have a relatively undeveloped knowledge base. Differences in performance between an open-ended questionnaire (low scaffold) and a constructed diagram test (high scaffold) provide a more complete picture of the effects of specific interventions on diverse learners. For individual students, these differences may help define their zone of proximal development (Vygotsky, 1978), alerting teachers to potentially rich opportunities for instruction and learning.

Acknowledgements - This research was supported by a U.S. Office of Education Special Education Program grant to Catherine Cobb Morocco, Education Development Center, Inc. An earlier version of this chapter was presented at the annual meeting of the American Educational Research Association, San Francisco, CA, April 27, 1992. We wish to thank Karen Worth and Maryellen Harmon for their advice on developing instruments and scoring procedures and to express our appreciation to the teachers and children who participated in this project.

References

Alley, G., & Deshler, D. (1979). Teaching the learning disabled adolescent: Strategies and methods. Denver, CO: Love Publishing.
Anderson, R. (1977). The notion of schemata and the educational enterprise. In R. Anderson, R. Spiro, & W. Montague (Eds.), Schooling and the acquisition of knowledge. Hillsdale, NJ: Erlbaum.
Bergerud, D., Lovitt, T. C., & Horton, S. (1988). The effectiveness of textbook adaptations in life science for high school students with learning disabilities. Journal of Learning Disabilities, 21(2), 70-76.
Blumberg, F., Epstein, M., MacDonald, W., & Mullis, I. (1986). A pilot study of higher-order thinking skills assessment techniques in science and mathematics. Princeton, NJ: National Assessment of Educational Progress.
Brown, A., & Ferrara, R. (1985). Diagnosing zones of proximal development. In J. Wertsch (Ed.), Culture, communication and cognition: Vygotskian perspectives. Cambridge, UK: Cambridge University Press.
Campione, J. C. (1990). Dynamic assessment: Potential for change as a metric of individual readiness. In A. B. Champagne, B. E. Lovitts, & B. J. Calinger (Eds.), Assessment in the service of instruction (pp. 167-177). Washington, DC: American Association for the Advancement of Science.
Carey, S. (1986). Cognitive science and science education. American Psychologist, 41(10), 1123-1130.
Champagne, A. B., Lovitts, B. E., & Calinger, B. J. (1990). Assessment in the service of instruction: Papers from the 1990 AAAS forum for school science. Washington, DC: American Association for the Advancement of Science.
Chittenden, E. (1990). Young children's discussion of science topics. In G. Hein (Ed.), The assessment of hands-on elementary science programs (pp. 220-247). North Dakota: Center for Teaching and Learning.
Darch, C., & Carnine, D. (1986). Teaching content area material to learning disabled students. Exceptional Children, 53, 240-246.
Dole, J. A., Duffy, G. G., Roehler, L. R., & Pearson, P. D. (1991). Moving from the old to the new: Research on reading comprehension instruction. Review of Educational Research, 61(2), 239-264.
Dwyer, F. M. (1978). Strategies for improving visual learning. State College, PA: Learning Services.
EDC Science and Problemsolving Project (1990). Unpublished pilot data. Newton, MA: Education Development Center.
Englert, C., & Thomas, C. (1987). Sensitivity to text structure in reading and writing: A comparison between learning disabled and non-learning disabled students. Learning Disability Quarterly, 10, 93-105.
Gilbert, J. K., Osborne, R. J., & Fensham, P. J. (1986). Children's science and its consequences for teaching. In J. Brown, A. Cooper, T. Horton, F. Toates, & D. Zeldin (Eds.), Science in schools (pp. 302-315). Philadelphia: Open University Press.
Graham, S., & MacArthur, C. A. (1988). Written language of the handicapped. In C. Reynolds & L. Mann (Eds.), Encyclopedia of special education (pp. 1178-1181). New York: Wiley.
Harmon, M. (1990a). Personal communication with authors.
Harmon, M. (1990b). Fair testing issues: Are science education assessments biased? In A. Champagne et al. (Eds.), Assessment in the service of instruction (pp. 29-59). Washington, DC: American Association for the Advancement of Science.
Hein, G. (1990). The assessment of hands-on elementary science programs. North Dakota: Center for Teaching and Learning.
Horton, S. V., Lovitt, T. C., & Bergerud, D. (1990). The effectiveness of graphic organizers for three classifications of secondary students in content area classes. Journal of Learning Disabilities, 23(1), 12-22.
Jastak, S., & Wilkinson, G. S. (1984). Wide range achievement test-revised: Level 1. Wilmington, DE: Jastak Associates.
Levie, W., & Lentz, R. (1982). Effects of text illustrations: A review of research. Educational Communication and Technology Journal, 30(4), 195-232.
Martinez, M. E. (1991). A comparison of multiple-choice and constructed figural response items. Journal of Educational Measurement, 28(2), 131-145.
Mastropieri, M. A., Emerick, I. C., & Scruggs, T. E. (1988). Mnemonic instruction of science concepts. Behavioral Disorders, 14(1), 48-56.
Mick, L. B. (1989). Measurement effects of modifications in minimum competency test formats for exceptional students. Measurement and Evaluation in Counseling and Development, 22, 31-36.
Morocco, C. C., Dalton, B., & Tivnan, T. (1992). The impact of computer-supported writing instruction on fourth-grade students with and without learning disabilities. Reading and Writing Quarterly, 8, 87-113.
Morocco, C. C., Dalton, B., Tivnan, T., & Rawson, P. (1992). Supported inquiry science: Teaching for conceptual change in the urban classroom. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.
Myklebust, H. R. (1973). Development and disorders of written language: Studies of normal and exceptional children. NY: Grune & Stratton.
National Science Board Commission on Precollege Education in Mathematics, Science and Technology (1983). Educating Americans for the twenty-first century.
National Science Foundation (1991). Program solicitation on assessing student learning. Washington, DC: Author.
Perkins, D., & Martin, F. (1986). Fragile knowledge and neglected strategies in novice programmers. In E. Soloway & S. Iyengar (Eds.), Empirical studies of programmers (pp. 213-229). Norwood, NJ: Ablex.
Pine, J., Baxter, G., & Shavelson, R. (1991). Assessments for hands-on elementary science curricula. Paper presented at the annual meeting of the National Science Teachers Association, Houston, TX.
Putnam, M. L. (1992). Characteristics of questions on tests administered by mainstream secondary classroom teachers. Learning Disabilities Research and Practice, 3, 129-136.
Raizen, S. A., Baron, J. B., Champagne, A. B., Haertel, E., Mullis, I. V., & Oakes, J. (1989). Assessment in elementary science education. Washington, DC: The National Center for Improving Science Education.
Resnick, L. B. (1987). Education and learning to think. Washington, DC: National Academy Press.
Salvia, J., & Ysseldyke, J. E. (1991). Assessment in special and remedial education. Boston: Houghton Mifflin.
Scruggs, T. E., & Mastropieri, M. A. (1988). Are learning disabled students 'test-wise'? A review of recent research. Learning Disabilities Focus, 3, 87-97.
Shavelson, R. J., Carey, N. B., & Webb, N. M. (1990). Indicators of science achievement: Options for a powerful policy instrument. Phi Delta Kappan, 71(9), 692-697.
Snow, R. (1989). Aptitude and treatment interactions as a framework for research in individual differences in learning. In P. Ackerman, R. Sternberg, & R. Glaser (Eds.), Learning and individual differences: Advances in theory and research (pp. 13-59). NY: W. H. Freeman.
Stoddard, B., & MacArthur, C. A. (1993). A peer editor strategy: Guiding learning-disabled students in response and revision. Research in the Teaching of English, 27(1), 76-103.
Stone, A., & Michals, D. (1986). Problem-solving skills in learning disabled children. In S. J. Ceci (Ed.), Handbook of cognitive, social and neuropsychological aspects of learning disabilities. Hillsdale, NJ: Lawrence Erlbaum.
Tolfa, D., Scruggs, T. E., & Bennion, K. (1985). Format changes in reading achievement tests: Implications for learning disabled students. Psychology in the Schools, 22, 387-391.
Torgesen, J. K. (1977). Memorization processes in reading-disabled children. Journal of Educational Psychology, 69, 571-578.
Vygotsky, L. (1978). Mind in society. Cambridge, MA: Harvard University Press.
Webb, N. M., Gold, K., & Qi, S. (1989). Mathematical problem-solving processes and performance: Translation among symbolic representations. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.
Winn, W. (1989). The design and use of instructional graphics. In H. Mandl & J. R. Levin (Eds.), Knowledge acquisition from text and pictures (pp. 125-144). North Holland: Elsevier Science Publishers.
Woodward, J., & Noell, J. (1991). Science instruction at the secondary level: Implications for students with learning disabilities. Journal of Learning Disabilities, 24(5), 277-284.
Ysseldyke, J., Algozzine, B., & Epps, S. (1983). A logical and empirical analysis of current practices in classifying students as handicapped. Exceptional Children, 50(2), 160-166.
Ysseldyke, J., Algozzine, B., Shinn, M. R., & McGue, M. (1982). Similarities and differences between low achievers and students classified learning disabled. Journal of Special Education, 16(1), 73-85.
Zigmond, N., Levin, E., & Laurie, T. E. (1985). Managing the mainstream: An analysis of teacher attitudes and student performance in mainstream high school programs. Journal of Learning Disabilities, 18, 535-541.


Appendix A

Pre/Post-Written Questionnaire Items

1. Imagine you have a battery, a bulb and some wire.
1a. First, draw a large diagram of the bulb, showing all the different parts. Label each part of the bulb.
1b. Second, add the battery and wire to your diagram, showing one way to light the bulb. Label all the parts you know.
1c. Describe what happens to make your bulb light up.
1d. What could you do to make your bulb burn brighter?
1e. What would happen if you added another bulb to your set-up?

2a. What is an electrical ‘conductor’?
2b. Give an example.

3a. What is an electrical ‘insulator’?
3b. Give an example.

4. If you want to test whether a dime conducts electricity, what could you do? You may draw to help explain your answer.

5. Carlos and Sue tried to light a bulb with a battery and some wire, but it didn’t work.
5a. What could be the problem? (List as many problems as you can.)
5b. How would you check it?

6. Sue had a string of tree lights. She unscrewed 1 bulb and all the lights went out.
6a. Why?
6b. Draw a large diagram to show how the bulbs could be wired in the circuit.

7. Marcos had a different string of lights. When he unscrewed 1 bulb, the rest of the lights stayed on.
7a. Why?
7b. Draw a large diagram to show how the bulbs could be wired in this circuit.

8. Why is a switch an important part of an electric circuit?

Concept scores:
Simple circuit = questions 1a and 1b.
Conductivity = questions 2a, 2b, and 4.
Series circuit = questions 6a and 6b.
Parallel circuit = questions 7a and 7b.

Diagram Analysis Post-Test