Language Learning ISSN 0023-8333

Evidence on the Effectiveness of Comprehensive Error Correction

in Second Language Writing

Catherine G. Van Beuningen

University of Amsterdam

Nivja H. De Jong

Utrecht University

Folkert Kuiken

University of Amsterdam

This study investigated the effect of direct and indirect comprehensive corrective feedback

(CF) on second language (L2) learners' written accuracy (N = 268). The study set out to explore the value of CF as a revising tool as well as its capacity to support long-term accuracy development. In addition, we tested Truscott's (e.g., 2001, 2007)

claims that (a) correction may have value for nongrammatical errors but not for errors

in grammar; (b) students are inclined to avoid more complex constructions due to error

correction; and (c) the time spent on CF may be more wisely spent on additional writing

practice. Results showed that both direct and indirect comprehensive CF led to improved

accuracy, over what is gained from self-editing without CF (control group 1) and from

sheer writing practice without CF (control group 2), and this was true not only during

revision but also in new pieces of writing (i.e., texts written during posttest and delayed

posttest sessions, 1 and 4 weeks after the delivery of CF). Furthermore, a separate analysis of grammatical and nongrammatical error types revealed that only direct CF

resulted in grammatical accuracy gains in new writing and that pupils' nongrammatical

An earlier version of this article was presented at the Annual Conference of the American Association

of Applied Linguistics in Atlanta in March 2010. The research was conducted as part of the

doctoral dissertation by the first author. We would like to thank the dissertation examiners John

Bitchener, Kees De Glopper, Jan Hulstijn, Lourdes Ortega, Gert Rijlaarsdam, and Rob Schoonen

for useful suggestions. We also thank the anonymous reviewers and the editor of Language Learning

for their invaluable comments on the previous versions of this article.

Correspondence concerning this article should be addressed to Catherine G. Van Beuningen,

Amsterdam Center for Language and Communication, University of Amsterdam, Spuistraat 134,

1012 VB Amsterdam, the Netherlands. Internet: [email protected]

Language Learning 62:1, March 2012, pp. 1–41. © 2011 Language Learning Research Club, University of Michigan. DOI: 10.1111/j.1467-9922.2011.00674.x



accuracy benefited most from indirect CF. Moreover, CF did not result in simplified

writing when structural complexity and lexical diversity in students' new writing were

measured. Our findings suggest that comprehensive CF is a useful educational tool that

teachers can use to help L2 learners improve their written accuracy over time.

Keywords written corrective feedback; error correction; direct and indirect feedback;

second language writing; accuracy development; written complexity

Introduction

Error correction or corrective feedback (CF) is probably the most widely used

feedback form in present-day second language (L2) classrooms. Its usefulness,

however, has been fiercely debated ever since Truscott's (1996) article, in which

he claimed error correction is necessarily ineffective and potentially harmful.

In the decade that followed, he repeatedly presented objections with respect

to the use of CF in L2 writing classes (Truscott, 1996, 1999, 2004, 2007, and

elsewhere).

Truscott's (1996) statement that CF is ineffective relies on both practical

and theoretical arguments. His practical doubts pertain to teachers' capacities

in providing adequate and consistent feedback and to learners' ability and willingness

to use the feedback effectively. His theoretical argument rests on the claim that CF overlooks important insights from second language acquisition

(SLA) theories, including (a) the gradual and complex nature of interlanguage

development, which stands in stark contrast to error correction as a simple

transfer of information; (b) the impossibility for any single form of CF to be ef-

fective across the very differently acquired domains of morphology, syntax, and

lexis, particularly with respect to grammatical features that are integral parts

of a complex system (Truscott, 2007, p. 258) and that would be impervious to

change (even if CF might be proven beneficial for spelling problems and other such simple, discrete errors); (c) the likelihood that any proven benefits might

be at best related to the development of explicit declarative knowledge (e.g.,

DeKeyser, 2003; Ellis, 2004), but never implicit procedural knowledge, which

is all-important for acquisition; thus, CF would promote pseudo-learning or

at best self-editing and revision skills, without fostering true accuracy development;

and (d) the impracticality of tailoring CF to each learner's current level

of L2 development, in Pienemann's (1998) sense of learner readiness, given

that findings in this area to date are incomplete and insufficient to be useful

for teaching practice. In addition to arguing that error correction is ineffective,

Truscott (1996, 2004, 2007) stated that correcting students' writing might be

also counterproductive. One of his arguments was that teachers run the risk



of making their students avoid more complex structures when they emphasize

learners' errors by providing CF. Truscott reasoned that it is the immediate

goal of error correction to make learners aware of the errors they committed and that this awareness creates a motivation for students to avoid the corrected

constructions in future writing (Truscott, 2007). Second, Truscott (1996, 2004)

claimed that CF is a waste of time and suggested that the energy spent on

dealing with corrections (both by teachers and students) could be allocated

more efficiently to alternative activities, such as additional writing practice.

Based on all of the above reasons, Truscott (1996) called for the abandonment

of CF in L2 writing classes, until its usefulness had been proven

by empirical research (Truscott, 1999, 2004, 2007). Although Ferris (1999,

2002, 2004) made a stand for the use of written CF and argued that Truscott's

conclusions were premature, she agreed that evidence from well-designed

studies was necessary before any firm conclusions could be drawn about the

(in)effectiveness of error correction. This call has resulted in an ever-expanding

body of studies exploring the effects of CF on L2 learners' writing.

Framed within a cognitive perspective of L2 acquisition, the present study

reports on a tightly controlled classroom-based quasi-experiment, which in-

cluded pretest, treatment/control, posttest, and delayed posttest sessions, and

compared the effects of direct and indirect comprehensive CF on the written accuracy exhibited by 268 secondary school L2 learners during revision as well

as in new writing, 1 and 4 weeks after the treatment. Two control groups that

did not receive any CF but engaged instead in self-editing and in additional

writing practice, respectively, were also included. The study also examined the

differential value of CF for grammatical and nongrammatical error types as

well as any putative complexity avoidance effects of the CF treatments.

Research into the Effectiveness of Written CF

Early empirical work on the effects of CF on L2 learners' writing can be

categorized into two strands: a set of studies that focused on the role of CF

during the revision process and a second group of investigations that set out to

answer the question of whether correction yields a learning effect when new

pieces of writing are inspected. The revision studies demonstrated that language

students were able to improve the accuracy of a particular piece of writing,

based on the feedback provided (e.g., Ashwell, 2000; Fathman & Whalley,

1990; Ferris, 1997; Ferris & Roberts, 2001). These findings thus showed that

CF is a useful editing tool. However, as Truscott and Hsu (2008) have rightly

argued, these results do not constitute evidence of learning, as only two versions



of the same draft were compared. It thus remained unclear if students' success

in using the feedback during revision would subsequently lead to acquisition of

the corrected forms. One might also speculate that practice with autonomous self-editing of texts might be a better use of instructional time, if CF is really

only beneficial as an editing tool. If one believes that editing is the only source

of benefit of CF, teachers might simply spend more instructional time in class

encouraging the honing of self-editing skills.

The more interesting body of research, therefore, consists of studies that

have investigated the effects of CF on new pieces of writing. The first studies that

did opt to provide insights into the learning potential of CF, however, produced

mixed results (e.g., Chandler, 2003; Kepner, 1991; Polio, Fleck, & Leder, 1998;

Semke, 1984; Sheppard, 1992). Whereas Chandler (2003) concluded that CF

is an effective means of improving the accuracy of students' writing over time,

Kepner (1991), Semke (1984), Polio et al. (1998), and Sheppard (1992) failed

to find an effect for error correction. All of these studies, however, suffered

from serious shortcomings in terms of their design, execution, and/or analyses

(see Guénette, 2007, or Van Beuningen, De Jong, & Kuiken, 2008, for a review

of these methodological issues). These investigations thus did not allow for any

firm conclusions on the role of CF in L2 accuracy development and led both

opponents (e.g., Truscott, 1996) and advocates (e.g., Ferris, 1999) of written error correction to call for more, well-designed CF studies.

The recognition that a focus on grammar learning benefits rather than

on revision skills is needed has led the way into a growing body of tightly

controlled investigations that has begun to explore the long-term effects of CF

on L2 writing by comparing learners' accuracy performance on pretests and

(delayed) posttests. Following the methodology of oral feedback research (e.g.,

Ellis, Loewen, & Erlam, 2006; Lyster, 2004), the majority of these researchers

(e.g., Bitchener, 2008; Bitchener & Knoch, 2008, 2009, 2010a, 2010b; Ellis, Sheen, Murakami, & Takashima, 2008; Sheen, 2007, 2010) have chosen to

investigate the effects of focused correction, whereby CF only targets one

persistently problematic error type at a time (e.g., errors in the use of English

articles). The rationale behind this focused approach is that learners might

be more likely to notice and understand corrections when just one feature is

targeted (Ellis et al., 2008), particularly in light of a limited processing capacity

model of L2 acquisition (Bitchener, 2008; Sheen, 2007). The extant findings

thus far consistently point at robust and durable positive effects of focused

CF on learners' accuracy development (see Xu, 2009, however, for a critical

discussion of the findings by Bitchener, 2008; and see Bitchener, 2009, for a

response).





By comparison, evidence on the language learning potential of unfocused

CF, which involves comprehensive correction of every error in students' writing,

is scarce. To the best of our knowledge, only four studies have investigated whether comprehensive CF yields a learning effect. Two of them (Truscott &

Hsu, 2008; Van Beuningen et al., 2008) compared comprehensive CF to no CF

groups, whereas the other two (Ellis et al., 2008; Sheen, Wright, & Moldawa,

2009) included a focused and a comprehensive CF group in addition to a control

group in their design.

Truscott and Hsu (2008) examined the writing performance of 47 English-

as-a-second-language (ESL) learners, half of whom received comprehensive

corrections and the other half functioned as a control group. Truscott and Hsu

found that although comprehensive CF enabled their learners to improve the

accuracy of a particular text during revision, it did not lead to accuracy gains

when writing a new text. However, Bruton (2009) has noted that a ceiling effect

may explain the findings, as the texts learners wrote during the pretest contained

very few errors to begin with. The second study that compared comprehensive

CF to no CF was conducted as a pilot for the present study (Van Beuningen et

al., 2008). It explored the effects of two types of comprehensive CF and two

control conditions on the writing of 66 L2 learners of Dutch. It was found that

comprehensive error correction not only led to improved accuracy in the revised version of a particular piece of writing but also yielded a clear learning effect;

learners who received comprehensive CF made significantly fewer errors in

newly produced texts than pupils whose errors had not been corrected.

Ellis et al. (2008) found accuracy gains for both their focused and com-

prehensive CF groups and thus concluded both are equally effective. However,

some methodological weaknesses of the study have been discussed by Xu

(2009) and, as the authors themselves acknowledged, students in the focused

CF group received more feedback on the target feature (i.e., English articles) than students in the comprehensive CF group. Sheen et al. (2009), on the other

hand, found the focused approach to be more beneficial than provision of com-

prehensive feedback when both were compared to the control group. However,

the authors pointed out that the correction received by the comprehensive CF

group was rather unsystematic in nature; although some errors were corrected,

others were ignored. It is conceivable that this unsystematic way of correcting

negatively influenced the effect of comprehensive CF in this study. More evi-

dence on the potentially differential value of focused versus comprehensive CF

would be most welcome, not only because of the discrepant results reviewed

here but also because the issue of relative effectiveness is particularly relevant





for teaching practice, given that both might be valuable feedback methodologies

in different ways.

Ultimately, however, we argue that it is important to produce more evidence about the singular contribution of unfocused or comprehensive CF on

the accuracy of new pieces of writing. For one, and notwithstanding the sig-

nificant contribution of the focused CF work to the error correction debate,

the implications that can be drawn from focused CF studies so far are rather

limited, because the targeted linguistic features (i.e., English articles) are typi-

cally selected for maximal simplicity (Ferris, 2010; Truscott, 2010). Xu (2009)

also noted that such a clear focus on just one or two grammatical structures

may lead students to consciously monitor the use of that target feature when

performing on the posttest(s). We are sympathetic to Bruton's (2009) concern

that focused CF studies might have framed CF as written grammar exercises

rather than authentic writing tasks, and we would like to heed Hartshorn et

al.'s (2010) call for research on more authentic CF methodologies, which focus

on the accurate production of all aspects of writing, simultaneously (p. 89).

Most important to us is the fact that the comprehensive approach most closely

resembles the correction method used in actual teaching practice; a teacher's

purpose in correcting his/her pupils' written work is (among other things) to

improve accuracy in general, not just the use of one grammatical feature (Ferris, 2010; Storch, 2010).

Thus, in the present study, we decided to investigate crucial theoretical

issues surrounding the CF debate by closely examining the pedagogical value

of comprehensive error correction (both during revision as well as for longer

term L2 learning as demonstrated in new writing) while adopting the tightly

controlled methodology of recent focused CF studies. Before presenting the

study, we review the extant empirical evidence bearing on the main additional

variables investigated: the relative effectiveness of direct versus indirect CF, the differential effects of CF on grammatical versus nongrammatical errors, and

two potentially harmful side effects of CF.

The Relative Effectiveness of Direct and Indirect CF

Corrective feedback researchers have not only shown interest in the question

of whether correction should be comprehensive or selective in nature. Many

studies have also addressed how questions, exploring the relative effective-

ness of different ways to deliver CF, or CF types. Most of these studies have

categorized their CF methodologies as either direct or indirect. Whereas direct

CF consists of an indication of the error and provision of the correspond-

ing correct L2 form, indirect CF only indicates that an error has been made.





Indirect correction methods can take different forms that vary in their explicit-

ness (e.g., underlining of errors, coding of errors). What they share in common

is that instead of the teacher providing the target form, it is left to the learner to correct his/her own errors.

Various alternative hypotheses concerning the relative effectiveness of di-

rect and indirect CF have been put forward. In support of indirect CF, it has been

suggested that learners will benefit more from it because it engages students in

a more profound form of language processing while they self-edit their writing

(e.g., Ferris, 1995; Lalande, 1982). In this view, the indirect approach requires

pupils to engage in guided learning and problem solving and, as a result, pro-

motes the type of reflection that is more likely to foster long-term acquisition

(Bitchener & Knoch, 2008, p. 415). In support of direct CF, on the other hand,

it has been claimed that the indirect approach might fail because it provides

learners with insufficient information to resolve complex linguistic errors (e.g.,

syntactic errors). Chandler (2003) furthermore argued that whereas direct CF

enables learners to instantly internalize the correct form, learners whose errors

are corrected indirectly do not know if their own hypothesized corrections are

indeed accurate. This delay in access to the target form might level out the

potential advantage of the additional cognitive effort associated with indirect

CF. Moreover, it may be that learners need a certain level of (meta)linguistic competence to be able to self-correct their errors using indirect CF (e.g., Ferris,

2004; Hyland & Hyland, 2006; Sheen, 2007).

Clear empirical evidence on the differential effects of direct and indirect CF

on accuracy development is lacking, as research on the issue has produced con-

flicting results. Some researchers have found no differences between the two CF

types (Frantzen, 1995; Robb, Ross, & Shortreed, 1986), others have reported

an advantage for indirect CF (Ferris, 2006; Lalande, 1982), and yet others have

found direct correction to be most effective in their comparisons (Bitchener & Knoch, 2010b; Chandler, 2003; Van Beuningen et al., 2008). Problems of

interpretation cloud these contradictory results, however. For example, some

of the studies were not designed to compare the two CF methodologies, and in

other studies, the operationalization of direct and indirect CF as distinct treat-

ments was not rigorous enough. Additionally, caution must be exercised when

results are proclaimed on descriptive findings versus statistical significance

tests and on comparisons against the other treatment versus the control group.

Thus, in the study in which we piloted the present research methodology (Van

Beuningen et al., 2008), the difference when the direct and indirect CF treatments

were compared against each other did not reach significance, at a p-value

of .06, but when each treatment was compared to the two control (no CF)





conditions, only the learners receiving direct CF significantly outperformed

pupils in the control groups when writing a new text. Bitchener and Knoch

(2010b), who did report a statistically significant difference between direct and indirect CF, found the advantage to be in favor of the direct approach. However,

the findings between the two studies are not directly comparable, as the former

investigated (direct and indirect) comprehensive CF and the latter investigated

(direct and indirect) focused CF.

The Value of CF for Different Error Types

In his 2007 article, Truscott explained that his case against CF (Truscott, 1996,

1999) was actually a case against grammar correction. He claimed that syntactic

errors in particular might not be amenable to correction, because they are

integral parts of a complex system that, in Truscott's view, is impermeable to

CF. He furthermore suggested that morphological features are equally unlikely to

benefit from CF because their acquisition not only depends on the understanding

of form but also of meaning and use in relation to other words and portions of

the language system. Truscott (2001, 2007) concluded that if CF has any value

for L2 development, this could only be true for errors that involve simple

problems in relatively discrete items (Truscott, 2001, p. 94), such as spelling

errors, and not for errors in grammar.

A number of studies explored the effects of CF on separate error types, and

all reported differing levels of improvement for different types of errors (e.g.,

Bitchener, Young, & Cameron, 2005; Ferris, 2006; Ferris & Roberts, 2001;

Frantzen, 1995; Lalande, 1982; Sheppard, 1992). Ferris (2006), for example,

differentiated among five major error categories: verb errors, noun errors, article

errors, lexical errors, and sentence errors. She found that students receiving CF

only experienced a significant reduction from pretest to posttest in verb errors.

Lalande (1982) distinguished 12 error types and observed that correction only led to a significant decrease in orthographical errors. Bitchener et al. (2005)

investigated how focused CF influenced learners' accuracy development on

three target structures and found that CF had a greater effect on the accuracy of

past simple tense and articles than on the correct usage of prepositions. None

of these studies, however, could test Truscott's claim that grammar correction is

ineffective, because they were not designed to investigate whether CF is

more beneficial for nongrammatical error types than for errors in grammar.

The Potential Harmful Side Effects of CF

Truscott has argued not only that CF is ineffective but also that it has at least

two harmful side effects. One is simplified writing, based on the assumption





that CF encourages learners to avoid situations in which they make errors.

In fact, Truscott (2004, 2007) has argued that accuracy gains found in earlier

correction studies might well be attributable to such avoidance and simplified writing instead of CF. His suggestions are in line with limited capacity models

of attention that also predict a trade-off between accuracy and complexity (e.g.,

Skehan, 1998). Within these models, L2 performance is expected to become

more complex when learners are willing to experiment with the target language.

A focus on accuracy, on the other hand, is seen to reflect a greater degree of

conservatism in which learners will try to achieve greater control over more

stable [interlanguage] elements while avoiding extending their L2 repertoire

(Skehan & Foster, 2001, p. 191).

Few studies have investigated the influence of written CF on linguistic

complexity, and, in our opinion, studies that did (Chandler, 2003; Robb et al.,

1986; Sheppard, 1992) could not come to any warranted conclusions. Shep-

pard (1992), for example, reported a negative effect of CF on the structural

complexity of learners' writing, but in fact his finding was nonsignificant.

Robb et al. (1986) found that CF had a significant positive effect on written

complexity, but they did not include a no CF control group. The same holds for

Chandler (2003), who did not find any effect of CF on the complexity of students'

writing. An additional problem with the latter study is that Chandler based her conclusion on holistic ratings of text quality. In our view, however,

the fact that holistic ratings did not change does not necessarily prove that the

linguistic complexity of learners' writing did not change either.

The second harmful side effect of CF identified by Truscott (1996, 2004)

was the diversion of time and energy away from more productive aspects of writ-

ing instruction, such as sheer writing practice. The only study that has directly

tested this claim by comparing the effects of CF to those of writing practice is

an investigation by Sheen et al. (2009). Their results opposed Truscott's claim by showing that learners did not benefit more from writing practice than from

CF.

Research Questions

The present study, of which the setup, tasks, and procedure were piloted in Van

Beuningen et al. (2008), intended to add to the existing body of CF research

by trying to tackle some of the unsettled issues in the debates reviewed above

and by answering eight research questions.

Given the limited empirical evidence about the value of comprehensive CF,

the first two research questions are:





RQ1 Is comprehensive written CF useful as an editing tool; that is, does it

enable learners to improve the accuracy of an initial text during revision?

RQ2 Does comprehensive written CF yield a learning effect; that is, does it lead to improved accuracy in new texts written 1 week and 4 weeks after

CF has been provided?

Because it seems plausible that learners benefit from taking a critical look

at their own text and revising it, even without teacher intervention, the present

study furthermore opted to test if comprehensive CF has an added value above

self-correction:

RQ3 Is comprehensive CF more beneficial to learners' accuracy development than having the opportunity to correct their own writing (without feedback)?

Given that the extant CF research to date has been unable to come to any

clear conclusions regarding the relative efficacy of direct and indirect CF, we

also asked the following:

RQ4 How effective are direct and indirect (comprehensive) CF relative to each

other and to no CF?

Moreover, as no earlier research has been designed to directly address

Truscott's (2001, 2007) claim that CF might have value for nongrammatical

errors but not for errors in grammar, we strived to test this hypothesis by

distinguishing between grammatical and nongrammatical error types in our

analyses:

RQ5 Are grammatical errors less amenable to correction than other types of

errors (i.e., nongrammatical errors)?

In addition, very few studies have explored the potential harmful effects of

CF, and we therefore asked the following:

RQ6 Is comprehensive CF more beneficial to learners' accuracy development

than writing practice?

RQ7 Does error correction lead to avoidance of lexically and structurally

(more) complex utterances?

Finally, we explored the potential influence in our sample of learners'

educational level on the degree to which they are able to benefit from direct

and indirect CF:





RQ8 What (if any) is the influence of pupils' educational level on CF efficacy?

Research question 8 holds educational and theoretical implications. In the

first place, it would be valuable for teachers to know if learners from different

educational levels are equally receptive to (direct and indirect) comprehensive

CF. Moreover, we presumed learners' educational level to be indicative of

their level of metalinguistic awareness, and the suggestion has been made that

learners with lower levels of (meta)linguistic competence might be less able

to correct their own errors based on indirect CF (e.g., Ferris, 2004; Hyland &

Hyland, 2006; Sheen, 2007). This led us to expect that pupils with a higher

educational level would perhaps profit more from indirect CF than pupils from

a lower level of education.

Method

Setting and Participants

Four Dutch secondary schools with multilingual student populations participated

in the study. Over 80% of those schools' pupils came from non-Dutch

language backgrounds;1 although most pupils were born in the Netherlands,

many of them only started learning Dutch in school (i.e., at age 4). Our sample was very heterogeneous with respect to language background (28 first languages

[L1s]), with Moroccan Arabic (31%), Turkish (16%), and Surinamese

languages (16%: 8% Sranan Tongo, 8% Sarnami Hindustani) being the most

common L1s.

All schools that took part in our study aimed at integrating content and

(second) language instruction by adopting a language-sensitive instructional

approach (e.g., Van Eerde & Hajer, 2005). Following this approach, language

did not only play a central role in language classes but also in classes in which the overriding focus was on content (e.g., biology, mathematics, geography).

The main aim is to cater for the special needs of L2 learners and

learners with limited language proficiency, who might experience problems

understanding and acquiring the content due to the linguistic demands of the

input. The present investigation was conducted during biology classes, and our

tasks treated biology-related topics.

The study's population consisted of six intact classes of pupils in their

second year of higher general secondary education (or havo in Dutch) (N = 134)

and seven intact classes of pupils in their second year of secondary prevocational

education (or vmbo-t in Dutch) (N = 134). In the remainder of this article we will use the contrast "higher level" and "lower level," respectively, to refer





to these two groups of pupils representing distinct educational levels. Pupils'

mean age was 14 (minimum 14, maximum 15). Within classes, participants

were randomly assigned to the four different condition groups incorporated in the study, to prevent a confounding interaction between condition and class.

Furthermore, the male/female and Dutch L1/L2 proportion was kept constant

across the four groups (see Note 1).
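
The within-class random assignment described above can be illustrated with a minimal sketch. It is not the procedure actually used in the study: the function name, the seed, and the round-robin dealing are assumptions made for illustration, and the sketch blocks on class only, without reproducing the additional balancing of male/female and Dutch L1/L2 proportions. The condition labels (DIR, IND, SLF, PRC) are the abbreviations used for the four groups in this article.

```python
import random
from collections import defaultdict

# Abbreviations of the four conditions as used in this article.
CONDITIONS = ["DIR", "IND", "SLF", "PRC"]


def assign_within_classes(pupils, seed=0):
    """Randomly assign pupils to conditions separately within each class.

    pupils: list of (pupil_id, class_id) tuples (hypothetical input format).
    Returns a dict mapping pupil_id to a condition label.
    """
    rng = random.Random(seed)

    # Group pupil ids by class so that randomization happens per class.
    by_class = defaultdict(list)
    for pupil_id, class_id in pupils:
        by_class[class_id].append(pupil_id)

    assignment = {}
    for class_id in sorted(by_class):
        members = by_class[class_id]
        rng.shuffle(members)  # random order within the class
        # Random starting condition, so leftover pupils do not always
        # land in the same condition across classes.
        offset = rng.randrange(len(CONDITIONS))
        for i, pupil_id in enumerate(members):
            assignment[pupil_id] = CONDITIONS[(i + offset) % len(CONDITIONS)]
    return assignment
```

Because every class contributes pupils to all four conditions in near-equal numbers, condition cannot be confounded with class, which is the rationale given above for randomizing within classes.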

Treatments and Controls

The present study featured two experimental treatments and two control con-

ditions. They are described below.

Experimental Group I: Direct CF (DIR)

Pupils in the first experimental group received comprehensive direct CF. The

researcher identified all existing linguistic errors and provided the pupils with

the corresponding target forms, as illustrated in Example 1:

Example 1: direct error correction

De lievenheersbeestje is rood. [The ladybug is red.]

Het lieveheersbeestje

Experimental Group II: Indirect CF (IND)

Texts of pupils in the second experimental group were corrected indirectly.

The researcher provided an indication of each error and its category, but it was

left to the student to derive the corresponding target forms, as illustrated in

Example 2:

Example 2: indirect error correction

De lievenheersbeestje S is rood. [The ladybug is red.]

( = wrong word, S = spelling error)

The error coding system comprised codes for nine different linguistic error

types, classified under three superordinate categories: (a) lexical errors: word

choice; (b) grammatical errors: word form (e.g., verb tense, singular/plural),

word order, incomplete sentences, and addition or omission of a word; and (c)

orthographical errors: spelling, punctuation, and capitalization.
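
As an illustration only, the coding scheme above can be written down as a small lookup table. The sketch rests on two assumptions that the text does not state explicitly: that addition and omission of a word count as separate codes (which brings the total to the nine error types mentioned), and that lexical and orthographical errors together make up the nongrammatical class. The names ERROR_CODES, superordinate_category, and tally_errors are hypothetical.

```python
# Error types grouped under the three superordinate categories.
# Assumption: "addition" and "omission" of a word are separate codes,
# which yields the nine error types mentioned in the text.
ERROR_CODES = {
    "lexical": ["word choice"],
    "grammatical": [
        "word form",
        "word order",
        "incomplete sentence",
        "word addition",
        "word omission",
    ],
    "orthographical": ["spelling", "punctuation", "capitalization"],
}


def superordinate_category(error_type):
    """Return the superordinate category of a coded error type."""
    for category, error_types in ERROR_CODES.items():
        if error_type in error_types:
            return category
    raise ValueError("unknown error type: " + error_type)


def tally_errors(coded_errors):
    """Count grammatical versus nongrammatical errors in one text.

    Assumption: lexical and orthographical errors are both treated as
    nongrammatical.
    """
    counts = {"grammatical": 0, "nongrammatical": 0}
    for error_type in coded_errors:
        if superordinate_category(error_type) == "grammatical":
            counts["grammatical"] += 1
        else:
            counts["nongrammatical"] += 1
    return counts
```

For instance, a text coded with one spelling error, one word order error, and one word choice error would be tallied as one grammatical and two nongrammatical errors.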

Control Group I: Self-Correction (SLF)

Pupils in the first control group did not receive any feedback on their initial

text but were invited to revise their writing without any available CF; that is,





to self-correct their output. Including this control condition enabled us to set

apart effects of error correction from effects of the revision process as such.

Control Group II: Additional Writing Practice (PRC)

Just as the first control group, participants in the second control group did

not receive CF. Unlike the first control group, they were not involved in any

revision activities. Instead, they performed a completely new writing task,

which provided them with the opportunity to practice their writing skills once

more. Inclusion of this second control condition assured an equal distribution of

time-on-task across experimental and control groups, because completion of a

completely new task asked (at least) as much time as revising an already written

text. Moreover, integration of the practice condition enabled us to determine if
CF is more effective than just providing pupils with more practice opportunities

(Truscott, 1996, 2004).

Procedure

The present investigation included four sessions: a pretest session (S1), a treat-

ment/control session (S2), a posttest session (S3), and a delayed-posttest session

(S4). Figure 1 presents an overview of the procedure, which is described in de-

tail below. Our tasks were designed for research purposes only; they were not

part of the standard biology curriculum. However, all tasks were administered

during class periods. The tasks and topics were introduced and explained by

the researcher, who also provided the CF treatments, and the class teacher was

present to maintain order. All tasks and tests were pen-and-paper assignments.

Session 1: Pretest (Week 1)

During the first session (S1), all pupils, irrespective of their group assignment,

were presented with a receptive vocabulary test, a questionnaire concerning

their language background, and the first writing task. Scores on the vocabulary
test were taken to provide an indication of pupils' overall language proficiency.

A vocabulary test was chosen because earlier research demonstrated that vo-

cabulary knowledge is a good predictor of overall language proficiency (e.g.,

Beglar & Hunt, 1999; Zareva, Schwanenflugel, & Nikolova, 2005). Moreover,

a vocabulary test seemed to be the most suitable instrument out of practical

considerations and time restrictions. We included the vocabulary test scores in

our analyses as a covariate to be able to control for individual L2 proficiency

differences. The output on the first assignment served as a baseline measure

of pupils' written accuracy and structural and lexical complexity. Pupils were

given 20 minutes to complete the writing task and were instructed to write a

minimum of 15 lines.





Figure 1 Experimental procedure. DIR = Direct CF, IND = Indirect CF, SLF = Self-correction with no CF, PRC = Additional writing practice with no CF.

Pretest (S1: week 1), all groups: vocabulary test and language background questionnaire; initial text (butterflies).

Treatment/control session (S2: week 2), by group condition:

Group condition        N        Treatment/control session (S2: week 2)
Direct CF (DIR)        32 + 35  Revision based on comprehensive direct CF (same text)
Indirect CF (IND)      29 + 33  Revision based on comprehensive indirect CF (same text)
Self-Correction (SLF)  37 + 34  Revision based on self-correction, no CF (same text)
Practice (PRC)         36 + 32  Writing practice, no revision nor CF (new text: honeybees)

Posttest (S3: week 3), all groups: new text (ladybugs).

Delayed posttest (S4: week 6), all groups: new text (wasps).

Before administering the first assignment, the researcher introduced the

task's topic by means of a 10-minute mini-lesson, to ensure a comparable

minimal amount of background knowledge on the topic among participants

and across the four conditions. Additionally, pupils were told that their main





focus needed to be on communicating content but that their texts would also

be evaluated on linguistic adequacy. To give them an idea of the form-related

features they could attend to while writing, pupils were given a handout listing
common types of errors and an example for each error category. Participants
were not aware of the fact that they could be receiving CF at a later stage. We
chose to draw learners' attention to form across all four groups at this stage for

several reasons. First, we aimed at establishing the unique contribution of CF

to accuracy development. If only the attention of the experimental groups had

been directed toward linguistic form (by means of comprehensive CF provision

during session 2), it would have been unclear if it was the form-oriented focus

in general or the CF as such that led to any observed differences with the

control groups. Second, the handout used in the introduction was reused (in a
slightly adapted version including the error coding system) when instructing
pupils in the indirect CF group on the interpretation of the error codes [see the
Session 2: Treatment/Control (Week 2) subsection]. This setup ensured that

the indirect group did not have an advantage as compared to the other groups

in terms of language input. Finally, by directing every learner's attention to

linguistic form at the start of the experiment, we strove to foster stable (i.e.,

across sessions) and equal (i.e., across the four groups) task representations.

Had the task instructions in session 1 emphasized only content issues, the way
in which pupils in the experimental (but not the control) groups interpreted

the task would have been likely to change over time because students are

known to attune their task representations toward the task demands set by the

teacher (Broekkamp, Van Hout-Wolters, Van den Bergh, & Rijlaarsdam, 2004).

Whereas pupils' initial perceptions of the study tasks' goal would have been to
understand and communicate content (in any scenario where there was no initial
instructional focus on language form), receiving CF could have resulted in a
shift of learners' task representations toward focusing on linguistic accuracy.

Session 2: Treatment/Control (Week 2)

The treatment/control session (S2) took place 1 week after S1. Pupils in the two

CF groups received the directly or indirectly corrected versions of their initial

texts (cf. Examples 1 and 2 earlier). They were asked to copy the initial texts,

revising all errors corrected by the researcher. Before starting their revisions,

pupils in the indirect CF group were given a handout, accompanied by oral

instructions, on how to interpret and use the error codes in their texts. The two

control groups each underwent a different condition involving no CF. Pupils in

the self-correction group were handed the texts they wrote during the pretest

session without any alterations. They were invited to self-correct their pretest





writing by thoroughly reading over their texts and searching for any elements

that needed revising. Even if no such elements were found, pupils were
still obliged to copy their initial texts. Pupils in the practice group, on the
other hand, were not given the opportunity to revise their pretest texts but were
presented with a new writing task instead. The researcher briefly explained
the new task's topic before pupils started writing. All pupils were allocated 20

minutes to finish the task they were presented with, irrespective of the condition

to which they were assigned.

Sessions 3 and 4: Posttest (Week 3) and Delayed Posttest (Week 6)

The first posttest (S3) was administered 1 week after the treatment/control

session (S2), and the delayed posttest took place 4 weeks after S2. During both

posttest sessions, all pupils were given 20 minutes to produce a text (at least

15 lines in length) on a new topic, which was again briefly introduced by the

researcher.

Writing Tasks

The tasks used throughout the study were four writing assignments on a biology-related
topic, namely, the metamorphosis of different insects: butterflies (S1),
honeybees (S2), ladybugs (S3), and wasps (S4) (see Note 2). Whereas three of the tasks
were performed by all pupils, the task on a honeybee's metamorphosis was only
performed by participants in the second control group, who practiced their
writing during the treatment/control session (S2).

All tasks within the series had a comparable presentation and design; they

invited pupils to write an email to a classmate who was absent during the

researcher's explanation of the task's topic. Pupils were asked to explain
the metamorphosis of the relevant insect, based on a series of images depicting
the metamorphosis process. To avoid individual differences in familiarity with
the relevant vocabulary, (potentially) unfamiliar words (e.g., larva, cocoon)

were glossed.

Data Processing and Reliability

All handwritten output was transcribed and coded using the CLAN program

of CHILDES (MacWhinney, 2000). Two research assistants transcribed pupils'

texts, and the first author was responsible for coding them. Texts were coded

for linguistic errors and clause types (i.e., main clause and subordinate clause).

The coding procedure was entirely blind; during coding, the researcher was

unaware of the study group to which the text at hand belonged. Ten percent of

the data was also coded by one of the assistants to be able to establish interrater





Table 1 Average levels of intrarater and interrater agreement for linguistic measures

                 Overall    Grammatical   Nongrammatical   Structural
                 accuracy   accuracy      accuracy         complexity
ICC Intrarater   .998       .977          .996             .974
ICC Interrater   .981       .921          .975             .946

reliability and recoded, 6 months later, by the first author to measure intrarater

reliability.

The reliability results are presented in Table 1. We calculated intraclass

correlation coefficients (ICC) to establish the average levels (over sessions) of
intrarater and interrater agreement for overall accuracy, grammatical accuracy,

nongrammatical accuracy, and structural complexity (cf. next section). The

intrarater ICCs were calculated from an ANOVA two-way mixed-effects model,

and they provide an indication of the variability due to variation within the same

rater. A two-way random-effects model was used to estimate the interrater ICCs,

reporting the proportion of the variability due to variation among raters. As

shown in Table 1, high levels of agreement within the same rater as well as

between raters were found for all four measures.
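To make the two ICC variants concrete, the sketch below computes single-measure coefficients from a two-way ANOVA decomposition, commonly labelled ICC(2,1) for the random-effects model and ICC(3,1) for the mixed-effects model. It is an illustration under assumptions: the text does not state whether single or average measures were reported, the function name `icc_two_way` is hypothetical, and the ratings are invented.

```python
def icc_two_way(ratings):
    """Single-measure ICCs from a two-way ANOVA decomposition.

    ratings: list of rows, one row per text, one column per rater
             (or per rating occasion for intrarater agreement).
    Returns (icc_random, icc_mixed).
    """
    n = len(ratings)       # number of rated texts
    k = len(ratings[0])    # number of raters / rating occasions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Two-way random effects: rater variance counts against agreement
    icc_random = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    # Two-way mixed effects: raters treated as fixed
    icc_mixed = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return icc_random, icc_mixed


# Invented error ratios for five texts, each coded twice
example = [[1.5, 1.6], [0.4, 0.4], [2.1, 2.0], [0.9, 1.0], [1.2, 1.2]]
icc_random, icc_mixed = icc_two_way(example)
```

Values close to 1 indicate that nearly all variability in the scores stems from differences between texts rather than from the rater or the rating occasion.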

Linguistic Measures for the Dependent Variables

Every text was analyzed for linguistic accuracy and, in order to address research

question 7, also for structural complexity and lexical diversity.

As did earlier studies exploring the effectiveness of written CF (e.g.,

Chandler, 2003; Truscott & Hsu, 2008), we used an error ratio to measure

overall accuracy: [number of linguistic errors/total number of words] × 10.
A 10-word ratio rather than the more common 100-word ratio was used because
pupils' texts were relatively short (i.e., around 120 words). To be able to
test Truscott's (2001, 2007) claim that nongrammatical errors might be more
amenable to correction than errors in grammar (our research question 5), we
broke down our overall accuracy measure into a measure of grammatical accuracy
and a measure of nongrammatical accuracy. The first measure was a
ratio calculated on the basis of the sum of the number of article errors, inflectional
errors, word order errors, omissions of necessary elements, additions
of nonnecessary elements, pronominal errors, and other grammatical errors
(i.e., [number of grammatical errors/total number of words] × 10). Lexical
errors, orthographical errors, appropriateness/pragmatic errors, and other nongrammatical
errors (see Note 3), on the other hand, were included in the nongrammatical





accuracy ratio (i.e., [number of nongrammatical errors/total number of words]
× 10) (see Note 4).
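The three accuracy ratios can be illustrated with a short computation. This is a sketch only: the function name `errors_per_10_words` and the error counts are made up for illustration, with the text length set to the approximate 120 words mentioned above.

```python
def errors_per_10_words(n_errors, n_words):
    """Error ratio: (number of errors / total number of words) x 10."""
    if n_words <= 0:
        raise ValueError("a text must contain at least one word")
    return n_errors / n_words * 10


# Hypothetical 120-word text with 5 grammatical and 13 nongrammatical errors
n_words = 120
grammatical = errors_per_10_words(5, n_words)
nongrammatical = errors_per_10_words(13, n_words)
overall = errors_per_10_words(5 + 13, n_words)  # 18 errors in 120 words: 1.5
```

Because every error falls into exactly one of the two subcategories, the grammatical and nongrammatical ratios of a text add up to its overall ratio.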

To measure structural complexity, we used a subordination index: the number
of subordinate clauses as a percentage of the total number of clauses (i.e.,
[number of subclauses/total number of clauses] × 100) (Norris & Ortega, 2009;
Wolfe-Quintero, Inagaki, & Kim, 1998). Lexical diversity was calculated using
Guiraud's Index, a type-token ratio that corrects for text length (types/√tokens)
(Guiraud, 1954).
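Both complexity measures are simple ratios, as the following sketch shows. The function names, the whitespace tokenization, and the lowercasing of word forms are assumptions made for illustration; the study itself derived its counts from CLAN transcripts.

```python
import math


def subordination_index(n_subclauses, n_clauses):
    """Subordinate clauses as a percentage of all clauses."""
    return n_subclauses / n_clauses * 100


def guiraud_index(tokens):
    """Guiraud's Index: number of types divided by the square root of tokens."""
    types = {token.lower() for token in tokens}  # assumption: case-insensitive types
    return len(types) / math.sqrt(len(tokens))


# Invented 9-token sentence with 6 distinct word forms
tokens = "the larva eats the leaf and the larva grows".split()
diversity = guiraud_index(tokens)         # 6 / sqrt(9) = 2.0
complexity = subordination_index(3, 18)   # 3 subclauses out of 18 clauses
```

Dividing by the square root of the token count, rather than by the token count itself, is what dampens the tendency of a plain type-token ratio to fall as texts get longer.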

Statistical Analyses

As our participants came from different classes within different schools, our

data were structured hierarchically. In other words, the performances of pupils
within one class or school were expected to be more similar to each other than

to performances of pupils in other classes and schools; a teacher, for example,

is likely to affect the performances of his/her pupils. This consideration was

important for the choice of appropriate statistical procedures. ANOVA, specif-

ically, relies on the independence of cases and may have been inappropriate

for our data if indeed we had found that observations within classes or schools

were dependent. We therefore applied linear multilevel analyses (e.g., Snijders

& Bosker, 1999) to explore the relationship between observations within classes
and schools. The multilevel procedure enabled us to explicitly model possible
dependencies in the data by including class and school as random factors (see Note 5).

These multilevel analyses showed, however, that class and school never made a

significant contribution to the model. After we had ascertained in this way that

the assumption of independence was met, we proceeded to analyze our data

using AN(C)OVAs because this procedure has proven to have more power in

absence of data dependencies (e.g., Snijders & Bosker, 1999).

We used ANCOVAs to test for between-groups differences on the dif-

ferent dependent variables (i.e., overall accuracy, grammatical accuracy, non-

grammatical accuracy, structural complexity, and lexical diversity) in the treat-

ment/control session (S2), first posttest session (S3), and delayed-posttest ses-

sion (S4). The initial ANCOVA models contained group condition and ed-

ucational level as between-subjects variables and language proficiency as a

covariate. In addition, we incorporated learners' pretest (S1) performance on

the relevant dependent variable as a covariate to account for effects of initial in-

dividual differences. We started out each ANCOVA with a full model, including

all relevant factors, and all possible interactions between those factors. Factors

and interactions between factors that did not explain a significant proportion of

the variance were excluded from the final ANCOVA models.





Table 2 Descriptive statistics for overall language proficiency by educational level and group condition

Educational level  Group condition    N      M     SD
Higher (havo)      DIR               32  76.09  10.52
                   IND               26  74.85  11.80
                   SLF               33  80.45   8.93
                   PRC               33  74.09   8.46
                   Total            124  76.46  10.10
Lower (vmbo-t)     DIR               34  72.24  11.08
                   IND               31  67.97  10.71
                   SLF               31  70.61  11.93
                   PRC               33  71.97   8.54
                   Total            129  70.75  10.63

Note. The maximum possible score on the receptive vocabulary test was 108. The sample
size for the vocabulary test is smaller than the total sample size for participants in the
study because some participants failed to complete the vocabulary test. DIR = direct
CF, IND = indirect CF, SLF = self-correction with no CF, PRC = additional writing
practice with no CF.

Results

We will first present some relevant descriptive and inferential results on the
language proficiency and writing in S1 that help ascertain the comparability
of the four study groups in terms of learners' proficiency profiles and baseline
performance at the onset of the study, prior to delivery of the two treatments
and two control conditions. We will then successively present the findings
regarding the value of comprehensive CF as an editing tool (RQ1), its language
learning potential (RQ2), the effects of direct and indirect CF on grammatical
and nongrammatical accuracy (RQ4 and RQ5), and the influence of CF on the
complexity of learners' writing (RQ7). The comparison between the effects of
CF and those of self-correction (RQ3) and additional writing practice (RQ7),
and the role of pupils' educational level (RQ8), will be discussed throughout

the Results section.

Comparability of Groups at the Onset of the Study

The descriptive statistics for overall language proficiency (i.e., pupils' scores

on the receptive vocabulary test), itemized by educational level and group as-

signment, are included in Table 2. In Tables 3, 4, and 5 readers will find the

descriptive statistics for overall accuracy, grammatical accuracy, and nongram-

matical accuracy scores on the first writing task about butterflies, done in S1





Table 3 Descriptive statistics for overall accuracy^a by educational level, condition, and session

Sessions: S1 = pretest, S2 = treatment/control session, S3 = posttest, S4 = delayed posttest.

Educational level  Group condition    N  S1 M  S1 SD  S2 M  S2 SD  S3 M  S3 SD  S4 M  S4 SD
High (havo)        DIR               32  1.52   0.75  0.37   0.29  1.08   0.66  1.32   0.89
                   IND               29  1.25   0.67  0.42   0.32  1.05   0.46  0.89   0.40
                   SLF               37  1.39   0.60  1.28   0.66  1.61   0.82  1.50   0.70
                   PRC               36  1.29   0.63  1.30   0.72  1.49   0.73  1.41   0.77
                   Total            134  1.37   0.66  0.88   0.71  1.33   0.73  1.31   0.75
Low (vmbo-t)       DIR               35  1.69   0.70  0.32   0.24  1.64   0.76  1.42   0.78
                   IND               33  1.95   0.88  0.74   0.63  1.73   0.97  1.54   0.90
                   SLF               34  1.85   0.75  1.50   0.68  1.97   0.93  1.75   0.84
                   PRC               32  1.79   1.00  2.19   1.00  2.26   1.00  2.08   1.10
                   Total            134  1.82   0.83  1.17   0.99  1.90   0.94  1.70   0.93

Note. DIR = direct CF, IND = indirect CF, SLF = self-correction with no CF, PRC = additional writing practice with no CF.
^a Number of linguistic errors per 10 words.

as a baseline measure of pupils' written accuracy and structural and lexical
complexity, also itemized by educational level. The descriptive results for the
structural complexity and lexical diversity on S1 are shown in Tables 6 and 7,
respectively.

A series of ANOVAs on pupils' baseline texts (S1) with group condition
as a between-subjects variable showed that there were no initial differences
among the four groups in overall accuracy, F(3, 264)




Table 4 Descriptive statistics for grammatical accuracy^a by educational level, condition, and session

Sessions: S1 = pretest, S2 = treatment/control session, S3 = posttest, S4 = delayed posttest.

Educational level  Group condition    N  S1 M  S1 SD  S2 M  S2 SD  S3 M  S3 SD  S4 M  S4 SD
High (havo)        DIR               32  0.43   0.27  0.09   0.10  0.32   0.22  0.39   0.34
                   IND               29  0.30   0.24  0.10   0.11  0.36   0.27  0.37   0.22
                   SLF               37  0.37   0.25  0.37   0.23  0.42   0.30  0.46   0.33
                   PRC               36  0.37   0.27  0.36   0.27  0.44   0.23  0.45   0.23
                   Total            134  0.37   0.26  0.24   0.24  0.39   0.26  0.42   0.29
Low (vmbo-t)       DIR               35  0.55   0.31  0.10   0.11  0.49   0.33  0.39   0.26
                   IND               33  0.47   0.33  0.21   0.23  0.51   0.27  0.49   0.32
                   SLF               34  0.53   0.32  0.46   0.29  0.64   0.41  0.60   0.39
                   PRC               32  0.52   0.36  0.59   0.41  0.69   0.44  0.72   0.46
                   Total            134  0.52   0.33  0.34   0.34  0.58   0.38  0.55   0.38

Note. DIR = direct CF, IND = indirect CF, SLF = self-correction with no CF, PRC = additional writing practice with no CF.
^a Number of grammatical errors per 10 words.

language proficiency, F(3, 253) = 1.96, p = .120, ηp² = .02. These findings
ensured that our four groups were comparable at the onset of the study.
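As a reading aid, a partial eta squared value can be recovered from an F statistic and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the language proficiency test reported above; the function name is hypothetical and the check is an illustration, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared derived from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Language proficiency at the onset of the study: F(3, 253) = 1.96
eta_proficiency = partial_eta_squared(1.96, 3, 253)  # rounds to .02
```

The result agrees, after rounding, with the effect size of .02 reported for this test.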

However, a series of ANCOVAs with educational level
as a between-subjects variable and language proficiency as a covariate revealed
that these two factors did influence learners' pretest writing. We found that
both educational level, F(1, 250) = 14.20, p




Table 5 Descriptive statistics for nongrammatical accuracy^a by educational level, condition, and session

Sessions: S1 = pretest, S2 = treatment/control session, S3 = posttest, S4 = delayed posttest.

Educational level  Group condition    N  S1 M  S1 SD  S2 M  S2 SD  S3 M  S3 SD  S4 M  S4 SD
High (havo)        DIR               32  1.09   0.63  0.28   0.27  0.77   0.56  0.93   0.82
                   IND               29  0.95   0.52  0.32   0.26  0.69   0.39  0.52   0.29
                   SLF               37  1.01   0.55  0.92   0.56  1.20   0.69  1.04   0.59
                   PRC               36  0.93   0.54  0.94   0.63  1.05   0.66  0.97   0.69
                   Total            134  1.00   0.56  0.64   0.56  0.95   0.63  0.89   0.66
Low (vmbo-t)       DIR               35  1.14   0.56  0.23   0.17  1.15   0.65  1.03   0.75
                   IND               33  1.48   0.72  0.52   0.47  1.22   0.82  1.05   0.72
                   SLF               34  1.32   0.63  1.05   0.64  1.33   0.69  1.15   0.70
                   PRC               32  1.28   0.82  1.60   0.85  1.56   0.89  1.36   0.95
                   Total            134  1.30   0.69  0.83   0.78  1.32   0.78  1.15   0.79

Note. DIR = direct CF, IND = indirect CF, SLF = self-correction with no CF, PRC = additional writing practice with no CF.
^a Number of nongrammatical errors per 10 words.

were well guided in incorporating the factors educational level and language

proficiency in later analyses when answering our research questions.

Effects of Comprehensive CF on Overall Written Accuracy

Table 3 includes the descriptive statistics for overall accuracy scores for all
four group conditions, itemized per educational level and session (i.e., treatment/control
session, posttest, and delayed posttest). Figure 2 is a graphic illustration
of the descriptive results in Table 3, collapsed over educational level. It

shows the accuracy development of the four study groups over time. The graph

provides a descriptive preview of this study's main findings, which will be

presented in this section. As can be seen from the graph, all groups performed

similarly at the pretest (S1). In the treatment/control session (S2), however, the

error rate of the CF groups and the self-correction group decreased, whereas

the number of errors committed by the practice group increased. On the two

posttests (S3 and S4), we still see an error rate difference between the control

and experimental groups, in favor of the latter.





Table 6 Descriptive statistics for structural complexity^a by educational level, condition, and session

Sessions: S1 = pretest, S2 = treatment/control session, S3 = posttest, S4 = delayed posttest.

Educational level  Group condition    N   S1 M  S1 SD   S2 M  S2 SD   S3 M  S3 SD   S4 M  S4 SD
High (havo)        DIR               32  16.31  10.46  17.30  10.87  14.67   9.14  10.27   8.13
                   IND               29  16.30   8.98  18.07  10.20  14.36  10.38  10.98   8.25
                   SLF               37  14.74   8.94  14.39   9.30  15.65  10.92  10.94  10.62
                   PRC               36  19.86  10.02  16.13   8.73  15.75   9.32  15.80   8.02
                   Total            134  16.83   9.71  16.35   9.74  15.16   9.88  12.06   9.10
Low (vmbo-t)       DIR               35  20.78   8.17  21.66   8.27  18.65  10.77  15.17   9.64
                   IND               33  19.41  11.66  21.04  10.57  16.70  10.98  10.84   8.95
                   SLF               34  15.03   8.95  16.19   8.62  14.57  11.00  11.25   7.67
                   PRC               32  18.86   8.00  15.19   8.23  13.66   8.65  13.17   8.30
                   Total            134  18.52   9.45  18.57   9.32  15.89  10.45  12.72   8.73

Note. DIR = direct CF, IND = indirect CF, SLF = self-correction with no CF, PRC = additional writing practice with no CF.
^a Subordination Index (i.e., [number of subclauses/total number of clauses] × 100).

Analyses of covariance were used to test for between-groups accuracy dif-

ferences in the treatment/control session (S2), posttest (S3), and delayed posttest

(S4). The initial ANCOVA models contained group condition and educational

level as between-subjects variables and language proficiency and overall ac-

curacy S1 as covariates. Nonsignificant factors and nonsignificant interactions

between them were excluded from the final models. Table 8 offers a preview
of the significant differences observed between groups in post hoc pairwise
comparisons (to be reported in detail below) in overall, grammatical, and nongrammatical
accuracy across the three sessions, together with the associated
Cohen's d effect sizes.
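For readers who want to see how an effect size of this kind is obtained, the sketch below computes Cohen's d from two group means, standard deviations, and sample sizes, assuming the common pooled-standard-deviation formula. The text does not state which variant underlies Table 8, and the values there are based on all pupils, so the illustrative figure computed here from the higher-level rows of Table 3 is not expected to match the tabled value.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Table 3, higher educational level, S2 error ratios:
# SLF (M = 1.28, SD = 0.66, N = 37) versus DIR (M = 0.37, SD = 0.29, N = 32)
d_example = cohens_d(1.28, 0.66, 37, 0.37, 0.29, 32)
```

A positive value here means the self-correction group made more errors per 10 words than the direct CF group, that is, the direct CF group was the more accurate one.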

Revision Effects (S2)

During revision, pupils' language proficiency, F(1, 240) = 2.63, p = .106,
ηp² = .01, and the interaction between condition and language proficiency,
F(3, 240) = 2.40, p = .069, ηp² = .03, did not prove to play a significant role.
These factors were therefore discarded from the final ANCOVA model. The





Table 7 Descriptive statistics for lexical diversity^a by educational level, condition, and session

Sessions: S1 = pretest, S2 = treatment/control session, S3 = posttest, S4 = delayed posttest.

Educational level  Group condition    N  S1 M  S1 SD  S2 M  S2 SD  S3 M  S3 SD  S4 M  S4 SD
High (havo)        DIR               32  7.13   0.85  7.28   0.91  6.87   0.90  6.51   0.80
                   IND               29  7.10   0.77  7.10   0.73  6.69   0.68  6.61   0.63
                   SLF               37  7.09   0.81  7.06   0.78  6.70   0.90  6.31   0.98
                   PRC               36  7.31   0.50  6.72   0.72  6.88   0.71  6.63   0.83
                   Total            134  7.16   0.74  7.03   0.81  6.79   0.80  6.51   0.84
Low (vmbo-t)       DIR               35  7.00   0.81  7.26   0.70  6.51   0.73  6.58   0.69
                   IND               33  6.77   1.06  7.03   1.00  6.64   1.00  6.47   0.89
                   SLF               34  7.08   0.75  7.12   0.82  6.46   0.70  6.37   0.75
                   PRC               32  6.96   0.87  6.53   0.77  6.59   0.87  6.57   0.70
                   Total            134  6.95   0.88  7.00   0.87  6.55   0.74  6.50   0.75

Note. DIR = direct CF, IND = indirect CF, SLF = self-correction with no CF, PRC = additional writing practice with no CF.
^a Guiraud's Index (types/√tokens).

definitive analysis revealed significant effects of group condition, F(3, 259)
= 111.24, p




Table 8 Summary of significant contrasts among four groups with associated Cohen's d effect sizes

Session                         Overall accuracy  d     Grammatical accuracy  d     Nongrammatical accuracy  d
Treatment/control session (S2)  DIR > SLF         1.99  DIR > SLF             1.52  DIR > SLF                1.61
                                DIR > PRC         2.74  DIR > PRC             1.85  DIR > PRC                2.53
                                IND > SLF         1.55  IND > SLF             1.05  IND > SLF                1.33
                                IND > PRC         2.29  IND > PRC             1.37  IND > PRC                2.06
Posttest (S3)                   DIR > SLF         0.67  DIR > PRC             0.55  DIR > SLF                0.53
                                DIR > PRC         0.81                              DIR > PRC                0.65
                                IND > SLF         0.70                              IND > SLF                0.68
                                IND > PRC         0.84                              IND > PRC                0.80
Delayed posttest (S4)           DIR > PRC         0.63  DIR > PRC             0.65  IND > SLF                0.66
                                IND > SLF         0.66                              IND > PRC                0.82
                                IND > PRC         0.94

Note. DIR = direct CF, IND = indirect CF, SLF = self-correction with no CF, PRC = additional writing practice with no CF.
p
