Evidence on the Effectiveness

    Language Learning ISSN 0023-8333

    Evidence on the Effectiveness of Comprehensive Error Correction

    in Second Language Writing

    Catherine G. Van Beuningen

    University of Amsterdam

    Nivja H. De Jong

    Utrecht University

    Folkert Kuiken

    University of Amsterdam

    This study investigated the effect of direct and indirect comprehensive corrective feedback

    (CF) on second language (L2) learners' written accuracy (N = 268). The study

    set out to explore the value of CF as a revising tool as well as its capacity to support

    long-term accuracy development. In addition, we tested Truscott's (e.g., 2001, 2007)

    claims that (a) correction may have value for nongrammatical errors but not for errors

    in grammar; (b) students are inclined to avoid more complex constructions due to error

    correction; and (c) the time spent on CF may be more wisely spent on additional writing

    practice. Results showed that both direct and indirect comprehensive CF led to improved

    accuracy, over what is gained from self-editing without CF (control group 1) and from

    sheer writing practice without CF (control group 2), and this was true not only during

    revision but also in new pieces of writing (i.e., texts written during posttest and delayed

    posttest sessions, 1 and 4 weeks after the delivery of CF). Furthermore, a separate

    analysis of grammatical and nongrammatical error types revealed that only direct CF

    resulted in grammatical accuracy gains in new writing and that pupils' nongrammatical

    accuracy benefited most from indirect CF. Moreover, CF did not result in simplified

    writing when structural complexity and lexical diversity in students' new writing were

    measured. Our findings suggest that comprehensive CF is a useful educational tool that

    teachers can use to help L2 learners improve their written accuracy over time.

    Keywords: written corrective feedback; error correction; direct and indirect feedback;

    second language writing; accuracy development; written complexity

    An earlier version of this article was presented at the Annual Conference of the American

    Association of Applied Linguistics in Atlanta in March 2010. The research was conducted as part of the

    doctoral dissertation by the first author. We would like to thank the dissertation examiners John

    Bitchener, Kees De Glopper, Jan Hulstijn, Lourdes Ortega, Gert Rijlaarsdam, and Rob Schoonen

    for useful suggestions. We also thank the anonymous reviewers and the editor of Language

    Learning for their invaluable comments on the previous versions of this article.

    Correspondence concerning this article should be addressed to Catherine G. Van Beuningen,

    Amsterdam Center for Language and Communication, University of Amsterdam, Spuistraat 134,

    1012 VB Amsterdam, the Netherlands. Internet: [email protected]

    Language Learning 62:1, March 2012, pp. 1-41. © 2011 Language Learning Research Club, University of Michigan. DOI: 10.1111/j.1467-9922.2011.00674.x

    Van Beuningen, De Jong, and Kuiken Effectiveness of Comprehensive CF

    Introduction

    Error correction or corrective feedback (CF) is probably the most widely used

    feedback form in present-day second language (L2) classrooms. Its usefulness,

    however, has been fiercely debated ever since Truscott's (1996) article, in which

    he claimed error correction is necessarily ineffective and potentially harmful.

    In the decade that followed, he repeatedly presented objections with respect

    to the use of CF in L2 writing classes (Truscott, 1996, 1999, 2004, 2007, and

    elsewhere).

    Truscott's (1996) statement that CF is ineffective relies on both practical

    and theoretical arguments. His practical doubts pertain to teachers' capacities

    in providing adequate and consistent feedback and to learners' ability and willingness

    to use the feedback effectively. His theoretical argument rests on the

    claim that CF overlooks important insights from second language acquisition

    (SLA) theories, including (a) the gradual and complex nature of interlanguage

    development, which stands in stark contrast to error correction as a simple

    transfer of information; (b) the impossibility for any single form of CF to be ef-

    fective across the very differently acquired domains of morphology, syntax, and

    lexis, particularly with respect to grammatical features that are integral parts

    of a complex system (Truscott, 2007, p. 258) and that would be impervious to

    change (even if CF might be proven beneficial for spelling problems and other

    such simple, discrete errors); (c) the likelihood that any proven benefits might

    be at best related to the development of explicit declarative knowledge (e.g.,

    DeKeyser, 2003; Ellis, 2004), but never implicit procedural knowledge, which

    is all-important for acquisition; thus, CF would promote pseudo-learning or

    at best self-editing and revision skills, without fostering true accuracy development;

    and (d) the impracticality of tailoring CF to each learner's current level

    of L2 development, in Pienemann's (1998) sense of learner readiness, given

    that findings in this area to date are incomplete and insufficient to be useful

    for teaching practice. In addition to arguing that error correction is ineffective,

    Truscott (1996, 2004, 2007) stated that correcting students' writing might be

    also counterproductive. One of his arguments was that teachers run the risk

    of making their students avoid more complex structures when they emphasize

    learners' errors by providing CF. Truscott reasoned that it is the immediate

    goal of error correction to make learners aware of the errors they committed

    and that this awareness creates a motivation for students to avoid the corrected

    constructions in future writing (Truscott, 2007). Second, Truscott (1996, 2004)

    claimed that CF is a waste of time and suggested that the energy spent on

    dealing with corrections, both by teachers and students, could be allocated

    more efficiently to alternative activities, such as additional writing practice.

    Based on all of the above reasons, Truscott (1996) summoned the aban-

    donment of CF from L2 writing classes, until its usefulness had been proven

    by empirical research (Truscott, 1999, 2004, 2007). Although Ferris (1999,

    2002, 2004) made a stand for the use of written CF and argued that Truscott's

    conclusions were premature, she agreed that evidence from well-designed

    studies was necessary before any firm conclusions could be drawn about the

    (in)effectiveness of error correction. This call has resulted in an ever-expanding

    body of studies exploring the effects of CF on L2 learners' writing.

    Framed within a cognitive perspective of L2 acquisition, the present study

    reports on a tightly controlled classroom-based quasi-experiment, which in-

    cluded pretest, treatment/control, posttest, and delayed posttest sessions, and

    compared the effects of direct and indirect comprehensive CF on the written

    accuracy exhibited by 268 secondary school L2 learners during revision as well

    as in new writing, 1 and 4 weeks after the treatment. Two control groups that

    did not receive any CF but engaged instead in self-editing and in additional

    writing practice, respectively, were also included. The study also examined the

    differential value of CF for grammatical and nongrammatical error types as

    well as any putative complexity avoidance effects of the CF treatments.

    Research into the Effectiveness of Written CF

    Early empirical work on the effects of CF on L2 learners' writing can be

    categorized into two strands: a set of studies that focused on the role of CF

    during the revision process and a second group of investigations that set out to

    answer the question of whether correction yields a learning effect when new

    pieces of writing are inspected. The revision studies demonstrated that language

    students were able to improve the accuracy of a particular piece of writing,

    based on the feedback provided (e.g., Ashwell, 2000; Fathman & Whalley,

    1990; Ferris, 1997; Ferris & Roberts, 2001). These findings thus showed that

    CF is a useful editing tool. However, as Truscott and Hsu (2008) have rightly

    argued, these results do not constitute evidence of learning, as only two versions

    of the same draft were compared. It thus remained unclear if students' success

    in using the feedback during revision would subsequently lead to acquisition of

    the corrected forms. One might also speculate that practice with autonomous

    self-editing of texts might be a better use of instructional time, if CF is really

    only beneficial as an editing tool. If one believes that editing is the only source

    of benefit of CF, teachers might simply spend more instructional time in class

    encouraging the honing of self-editing skills.

    The more interesting body of research, therefore, consists of studies that

    have investigated the effects of CF on new pieces of writing. The first studies that

    did opt to provide insights into the learning potential of CF, however, produced

    mixed results (e.g., Chandler, 2003; Kepner, 1991; Polio, Fleck, & Leder, 1998;

    Semke, 1984; Sheppard, 1992). Whereas Chandler (2003) concluded that CF

    is an effective means of improving the accuracy of students' writing over time,

    Kepner (1991), Semke (1984), Polio et al. (1998), and Sheppard (1992) failed

    to find an effect for error correction. All of these studies, however, suffered

    from serious shortcomings in terms of their design, execution, and/or analyses

    (see Guenette, 2007, or Van Beuningen, De Jong, & Kuiken, 2008, for a review

    of these methodological issues). These investigations thus did not allow for any

    firm conclusions on the role of CF in L2 accuracy development and led both

    opponents (e.g., Truscott, 1996) and advocates (e.g., Ferris, 1999) of written

    error correction to call for more, well-designed CF studies.

    The recognition that a focus on grammar learning benefits rather than

    on revision skills is needed has led the way into a growing body of tightly

    controlled investigations that has begun to explore the long-term effects of CF

    on L2 writing by comparing learners' accuracy performance on pretests and

    (delayed) posttests. Following the methodology of oral feedback research (e.g.,

    Ellis, Loewen, & Erlam, 2006; Lyster, 2004), the majority of these researchers

    (e.g., Bitchener, 2008; Bitchener & Knoch, 2008, 2009, 2010a, 2010b; Ellis,

    Sheen, Murakami, & Takashima, 2008; Sheen, 2007, 2010) have chosen to

    investigate the effects of focused correction, whereby CF only targets one

    persistently problematic error type at a time (e.g., errors in the use of English

    articles). The rationale behind this focused approach is that learners might

    be more likely to notice and understand corrections when just one feature is

    targeted (Ellis et al., 2008), particularly in light of a limited processing capacity

    model of L2 acquisition (Bitchener, 2008; Sheen, 2007). The extant findings

    thus far consistently point at robust and durable positive effects of focused

    CF on learners' accuracy development (see Xu, 2009, however, for a critical

    discussion of the findings by Bitchener, 2008; and see Bitchener, 2009, for a

    response).

    By comparison, evidence on the language learning potential of unfocused

    CF, which involves comprehensive correction of every error in students' writing,

    is scarce. To the best of our knowledge, only four studies have investigated

    whether comprehensive CF yields a learning effect. Two of them (Truscott &

    Hsu, 2008; Van Beuningen et al., 2008) compared comprehensive CF to no CF

    groups, whereas the other two (Ellis et al., 2008; Sheen, Wright, & Moldawa,

    2009) included a focused and a comprehensive CF group in addition to a control

    group in their design.

    Truscott and Hsu (2008) examined the writing performance of 47 English-

    as-a-second-language (ESL) learners, half of whom received comprehensive

    corrections and the other half functioned as a control group. Truscott and Hsu

    found that although comprehensive CF enabled their learners to improve the

    accuracy of a particular text during revision, it did not lead to accuracy gains

    when writing a new text. However, Bruton (2009) has noted that a ceiling effect

    may explain the findings, as the texts learners wrote during the pretest contained

    very few errors to begin with. The second study that compared comprehensive

    CF to no CF was conducted as a pilot for the present study (Van Beuningen et

    al., 2008). It explored the effects of two types of comprehensive CF and two

    control conditions on the writing of 66 L2 learners of Dutch. It was found that

    comprehensive error correction not only led to improved accuracy in the revised

    version of a particular piece of writing but also yielded a clear learning effect;

    learners who received comprehensive CF made significantly fewer errors in

    newly produced texts than pupils whose errors had not been corrected.

    Ellis et al. (2008) found accuracy gains for both their focused and com-

    prehensive CF groups and thus concluded both are equally effective. However,

    some methodological weaknesses of the study have been discussed by Xu

    (2009) and, as the authors themselves acknowledged, students in the focused

    CF group received more feedback on the target feature (i.e., English articles)

    than students in the comprehensive CF group. Sheen et al. (2009), on the other

    hand, found the focused approach to be more beneficial than provision of com-

    prehensive feedback when both were compared to the control group. However,

    the authors pointed out that the correction received by the comprehensive CF

    group was rather unsystematic in nature; although some errors were corrected,

    others were ignored. It is conceivable that this unsystematic way of correcting

    negatively influenced the effect of comprehensive CF in this study. More evi-

    dence on the potentially differential value of focused versus comprehensive CF

    would be most welcome, not only because of the discrepant results reviewed

    here but also because the issue of relative effectiveness is particularly relevant

    for teaching practice, given that both might be valuable feedback methodologies

    in different ways.

    Ultimately, however, we argue that it is important to produce more evidence

    about the singular contribution of unfocused or comprehensive CF on

    the accuracy of new pieces of writing. For one, and notwithstanding the sig-

    nificant contribution of the focused CF work to the error correction debate,

    the implications that can be drawn from focused CF studies so far are rather

    limited, because the targeted linguistic features (i.e., English articles) are typi-

    cally selected for maximal simplicity (Ferris, 2010; Truscott, 2010). Xu (2009)

    also noted that such a clear focus on just one or two grammatical structures

    may lead students to consciously monitor the use of that target feature when

    performing on the posttest(s). We are sympathetic to Bruton's (2009) concern

    that focused CF studies might have framed CF as written grammar exercises

    rather than authentic writing tasks, and we would like to heed Hartshorn et

    al.'s (2010) call for research on more authentic CF methodologies, which focus

    on the accurate production of all aspects of writing, simultaneously (p. 89).

    Most important to us is the fact that the comprehensive approach most closely

    resembles the correction method used in actual teaching practice; a teacher's

    purpose in correcting his/her pupils' written work is (among other things) to

    improve accuracy in general, not just the use of one grammatical feature (Ferris,

    2010; Storch, 2010).

    Thus, in the present study, we decided to investigate crucial theoretical

    issues surrounding the CF debate by closely examining the pedagogical value

    of comprehensive error correction (both during revision as well as for longer

    term L2 learning as demonstrated in new writing) while adopting the tightly

    controlled methodology of recent focused CF studies. Before presenting the

    study, we review the extant empirical evidence bearing on the main additional

    variables investigated: the relative effectiveness of direct versus indirect CF, the

    differential effects of CF on grammatical versus nongrammatical errors, and

    two potentially harmful side effects of CF.

    The Relative Effectiveness of Direct and Indirect CF

    Corrective feedback researchers have not only shown interest in the question

    of whether correction should be comprehensive or selective in nature. Many

    studies have also addressed how questions, exploring the relative effective-

    ness of different ways to deliver CF, or CF types. Most of these studies have

    categorized their CF methodologies as either direct or indirect. Whereas direct

    CF consists of an indication of the error and provision of the correspond-

    ing correct L2 form, indirect CF only indicates that an error has been made.

    Indirect correction methods can take different forms that vary in their explicit-

    ness (e.g., underlining of errors, coding of errors). What they share in common

    is that instead of the teacher providing the target form, it is left to the learner

    to correct his/her own errors.

    Various alternative hypotheses concerning the relative effectiveness of di-

    rect and indirect CF have been put forward. In support of indirect CF, it has been

    suggested that learners will benefit more from it because it engages students in

    a more profound form of language processing while they self-edit their writing

    (e.g., Ferris, 1995; Lalande, 1982). In this view, the indirect approach requires

    pupils to engage in guided learning and problem solving and, as a result, pro-

    motes the type of reflection that is more likely to foster long-term acquisition

    (Bitchener & Knoch, 2008, p. 415). In support of direct CF, on the other hand,

    it has been claimed that the indirect approach might fail because it provides

    learners with insufficient information to resolve complex linguistic errors (e.g.,

    syntactic errors). Chandler (2003) furthermore argued that whereas direct CF

    enables learners to instantly internalize the correct form, learners whose errors

    are corrected indirectly do not know if their own hypothesized corrections are

    indeed accurate. This delay in access to the target form might level out the

    potential advantage of the additional cognitive effort associated with indirect

    CF. Moreover, it may be that learners need a certain level of (meta)linguistic

    competence to be able to self-correct their errors using indirect CF (e.g., Ferris,

    2004; Hyland & Hyland, 2006; Sheen, 2007).

    Clear empirical evidence on the differential effects of direct and indirect CF

    on accuracy development is lacking, as research on the issue has produced con-

    flicting results. Some researchers have found no differences between the two CF

    types (Frantzen, 1995; Robb, Ross, & Shortreed, 1986), others have reported

    an advantage for indirect CF (Ferris, 2006; Lalande, 1982), and yet others have

    found direct correction to be most effective in their comparisons (Bitchener

    & Knoch, 2010b; Chandler, 2003; Van Beuningen et al., 2008). Problems of

    interpretation cloud these contradictory results, however. For example, some

    of the studies were not designed to compare the two CF methodologies, and in

    other studies, the operationalization of direct and indirect CF as distinct treat-

    ments was not rigorous enough. Additionally, caution must be exercised when

    results are proclaimed on descriptive findings versus statistical significance

    tests and on comparisons against the other treatment versus the control group.

    Thus, in the study in which we piloted the present research methodology (Van

    Beuningen et al., 2008), the difference when the direct and indirect CF treatments

    were compared against each other did not reach significance, at a p-value

    of .06, but when each treatment was compared to the two control (no CF)

    conditions, only the learners receiving direct CF significantly outperformed

    pupils in the control groups when writing a new text. Bitchener and Knoch

    (2010b), who did report a statistically significant difference between direct and

    indirect CF, found the advantage to be in favor of the direct approach. However,

    the findings between the two studies are not directly comparable, as the former

    investigated (direct and indirect) comprehensive CF and the latter investigated

    (direct and indirect) focused CF.

    The Value of CF for Different Error Types

    In his 2007 article, Truscott explained that his case against CF (Truscott, 1996,

    1999) was actually a case against grammar correction. He claimed that syntactic

    errors in particular might not be amenable to correction, because they are

    integral parts of a complex system that, in Truscott's view, is impermeable to

    CF. He furthermore suggested that morphological features are equally unlikely to

    benefit from CF because their acquisition not only depends on the understanding

    of form but also of meaning and use in relation to other words and portions of

    the language system. Truscott (2001, 2007) concluded that if CF has any value

    for L2 development, this could only be true for errors that involve simple

    problems in relatively discrete items (Truscott, 2001, p. 94), such as spelling

    errors, and not for errors in grammar.

    A number of studies explored the effects of CF on separate error types, and

    all reported differing levels of improvement for different types of errors (e.g.,

    Bitchener, Young, & Cameron, 2005; Ferris, 2006; Ferris & Roberts, 2001;

    Frantzen, 1995; Lalande, 1982; Sheppard, 1992). Ferris (2006), for example,

    differentiated among five major error categories: verb errors, noun errors, article

    errors, lexical errors, and sentence errors. She found that students receiving CF

    only experienced a significant reduction from pretest to posttest in verb errors.

    Lalande (1982) distinguished 12 error types and observed that correction only

    led to a significant decrease in orthographical errors. Bitchener et al. (2005)

    investigated how focused CF influenced learners' accuracy development on

    three target structures and found that CF had a greater effect on the accuracy of

    past simple tense and articles than on the correct usage of prepositions. None

    of these studies, however, could test Truscott's claim that grammar correction is

    ineffective, because they were not set out by design to investigate whether CF is

    more beneficial for nongrammatical error types than for errors in grammar.

    The Potential Harmful Side Effects of CF

    Truscott has argued not only that CF is ineffective but also that it has at least

    two harmful side effects. One is simplified writing, based on the assumption

    that CF encourages learners to avoid situations in which they make errors.

    In fact, Truscott (2004, 2007) has argued that accuracy gains found in earlier

    correction studies might well be attributable to such avoidance and simplified

    writing instead of CF. His suggestions are in line with limited capacity models

    of attention that also predict a trade-off between accuracy and complexity (e.g.,

    Skehan, 1998). Within these models, L2 performance is expected to become

    more complex when learners are willing to experiment with the target language.

    A focus on accuracy, on the other hand, is seen to reflect a greater degree of

    conservatism in which learners will try to achieve greater control over more

    stable [interlanguage] elements while avoiding extending their L2 repertoire

    (Skehan & Foster, 2001, p. 191).

    Few studies have investigated the influence of written CF on linguistic

    complexity, and, in our opinion, studies that did (Chandler, 2003; Robb et al.,

    1986; Sheppard, 1992) could not come to any warranted conclusions. Shep-

    pard (1992), for example, reported a negative effect of CF on the structural

    complexity of learners' writing, but in fact his finding was nonsignificant.

    Robb et al. (1986) found that CF had a significant positive effect on written

    complexity, but they did not include a no CF control group. The same holds for

    Chandler (2003), who did not find any effect of CF on the complexity of students'

    writing. An additional problem with the latter study is that Chandler

    based her conclusion on holistic ratings of text quality. In our view, however,

    the fact that holistic ratings did not change does not necessarily prove that the

    linguistic complexity of learners' writing did not change either.

    The second harmful side effect of CF identified by Truscott (1996, 2004)

    was the diversion of time and energy away from more productive aspects of writ-

    ing instruction, such as sheer writing practice. The only study that has directly

    tested this claim by comparing the effects of CF to those of writing practice is

    an investigation by Sheen et al. (2009). Their results opposed Truscott's claim

    by showing that learners did not benefit more from writing practice than from

    CF.

    Research Questions

    The present study, of which the setup, tasks, and procedure were piloted in Van

    Beuningen et al. (2008), intended to add to the existing body of CF research

    by trying to tackle some of the unsettled issues in the debates reviewed above

    and by answering eight research questions.

    Given the limited empirical evidence about the value of comprehensive CF,

    the first two research questions are:

    RQ1 Is comprehensive written CF useful as an editing tool, that is, does it

    enable learners to improve the accuracy of an initial text during revision?

    RQ2 Does comprehensive written CF yield a learning effect, that is, does it

    lead to improved accuracy in new texts written 1 week and 4 weeks after

    CF has been provided?

    Because it seems plausible that learners benefit from taking a critical look

    at their own text and revising it, even without teacher intervention, the present

    study furthermore opted to test if comprehensive CF has an added value above

    self-correction:

    RQ3 Is comprehensive CF more beneficial to learners' accuracy development

    than having the opportunity to correct their own writing (without feedback)?

    Given that the extant CF research to date has been unable to come to any

    clear conclusions regarding the relative efficacy of direct and indirect CF, we

    also asked the following:

    RQ4 How effective are direct and indirect (comprehensive) CF relative to each

    other and to no CF?

    Moreover, as no earlier research has been designed to directly address

    Truscott's (2001, 2007) claim that CF might have value for nongrammatical

    errors but not for errors in grammar, we strived to test this hypothesis by

    distinguishing between grammatical and nongrammatical error types in our

    analyses:

    RQ5 Are grammatical errors less amenable to correction than other types of

    errors (i.e., nongrammatical errors)?

    In addition, very few studies have explored the potential harmful effects of

    CF, and we therefore asked the following:

    RQ6 Is comprehensive CF more beneficial to learners' accuracy development

    than writing practice?

    RQ7 Does error correction lead to avoidance of lexically and structurally

    (more) complex utterances?

    Finally, we explored the potential influence in our sample of learners'

    educational level on the degree to which they are able to benefit from direct

    and indirect CF:

    RQ8 What (if any) is the influence of pupils' educational level on CF efficacy?

    Research question 8 holds educational and theoretical implications. In the

    first place, it would be valuable for teachers to know if learners from different

    educational levels are equally receptive to (direct and indirect) comprehensive

    CF. Moreover, we presumed learners' educational level to be indicative of

    their level of metalinguistic awareness, and the suggestion has been made that

    learners with lower levels of (meta)linguistic competence might be less able

    to correct their own errors based on indirect CF (e.g., Ferris, 2004; Hyland &

    Hyland, 2006; Sheen, 2007). This led us to expect that pupils with a higher

    educational level would perhaps profit more from indirect CF than pupils from

    a lower level of education.

    Method

    Setting and Participants

    Four Dutch secondary schools with multilingual student populations participated

    in the study. Over 80% of those schools' pupils came from non-Dutch

    language backgrounds (see Note 1); although most pupils were born in the Netherlands,

    many of them only started learning Dutch in school (i.e., at age 4). Our sample

    was very heterogeneous with respect to language background (28 first languages

    [L1s]), with Moroccan Arabic (31%), Turkish (16%), and Surinamese

    languages (16%: 8% Sranan Tongo, 8% Sarnami Hindustani) being the most

    common L1s.

    All schools that took part in our study aimed at integrating content and

    (second) language instruction by adopting a language-sensitive instructional

    approach (e.g., Van Eerde & Hajer, 2005). Following this approach, language

    played a central role not only in language classes but also in classes in which

    the overriding focus was on content (e.g., biology, mathematics, geography).

    The main aim is to cater for the special needs of L2 learners and

    learners with limited language proficiency, who might experience problems

    understanding and acquiring the content due to the linguistic demands of the

    input. The present investigation was conducted during biology classes, and our

    tasks treated biology-related topics.

    The study's population consisted of six intact classes of pupils in their

    second year of higher general secondary education (or havo in Dutch) (N = 134)

    and seven intact classes of pupils in their second year of secondary prevocational

    education (or vmbo-t in Dutch) (N = 134). In the remainder of this article we

    will use the contrast "higher level" and "lower level," respectively, to refer


    to these two groups of pupils representing distinct educational levels. Pupils'

    mean age was 14 (minimum 14, maximum 15). Within classes, participants

    were randomly assigned to the four different condition groups incorporated in

    the study, to prevent a confounding interaction between condition and class.

    Furthermore, the male/female and Dutch L1/L2 proportion was kept constant

    across the four groups (see Note 1).
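
    The within-class assignment can be sketched in code. The article does not report how the randomization was carried out, so the following is only an illustrative assumption: pupils are shuffled within each class and then dealt out to the four conditions in turn. The class lists, the seed, and the function names are hypothetical, and the sketch does not attempt the additional balancing of gender and Dutch L1/L2 status mentioned above.

```python
import random
from collections import Counter

CONDITIONS = ["DIR", "IND", "SLF", "PRC"]


def assign_within_class(pupils, rng):
    """Randomly assign the pupils of one class to the four conditions.

    The class list is shuffled and pupils are dealt out to the conditions
    in turn, so condition sizes within a class differ by at most one.
    """
    shuffled = list(pupils)
    rng.shuffle(shuffled)
    return {pupil: CONDITIONS[i % len(CONDITIONS)]
            for i, pupil in enumerate(shuffled)}


def assign_all(classes, seed=0):
    """Apply the within-class assignment to every class (blocking on class)."""
    rng = random.Random(seed)
    return {name: assign_within_class(pupils, rng)
            for name, pupils in classes.items()}


# Hypothetical class lists, for illustration only.
classes = {
    "class_A": [f"A{i}" for i in range(21)],
    "class_B": [f"B{i}" for i in range(18)],
}
assignment = assign_all(classes)
for name, mapping in assignment.items():
    print(name, dict(Counter(mapping.values())))
```

    Because every class contributes pupils to every condition, differences between classes cannot be mistaken for differences between conditions, which is the confound the design was meant to prevent.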

    Treatments and Controls

    The present study featured two experimental treatments and two control con-

    ditions. They are described below.

    Experimental Group I: Direct CF (DIR)

    Pupils in the first experimental group received comprehensive direct CF. The

    researcher identified all existing linguistic errors and provided the pupils with

    the corresponding target forms, as illustrated in Example 1:

    Example 1: direct error correction

    De lievenheersbeestje is rood. [The ladybug is red.]

    Het lieveheersbeestje

    Experimental Group II: Indirect CF (IND)

    Texts of pupils in the second experimental group were corrected indirectly.

    The researcher provided an indication of each error and its category, but it was

    left to the student to derive the corresponding target forms, as illustrated in

    Example 2:

    Example 2: indirect error correction

    De lievenheersbeestje S is rood. [The ladybug is red.]

    ( = wrong word, S = spelling error)

    The error coding system comprised codes for nine different linguistic error

    types, classified under three superordinate categories: (a) lexical errors: word

    choice; (b) grammatical errors: word form (e.g., verb tense, singular/plural),

    word order, incomplete sentences, and addition or omission of a word; and (c)

    orthographical errors: spelling, punctuation, and capitalization.

    Control Group I: Self-Correction (SLF)

    Pupils in the first control group did not receive any feedback on their initial

    text but were invited to revise their writing without any available CF, that is,


    to self-correct their output. Including this control condition enabled us to set

    apart effects of error correction from effects of the revision process as such.

    Control Group II: Additional Writing Practice (PRC)

    Just as the first control group, participants in the second control group did

    not receive CF. Unlike the first control group, they were not involved in any

    revision activities. Instead, they performed a completely new writing task,

    which provided them with the opportunity to practice their writing skills once

    more. Inclusion of this second control condition assured an equal distribution of

    time-on-task across experimental and control groups, because completion of a

    completely new task took (at least) as much time as revising an already written

    text. Moreover, integration of the practice condition enabled us to determine if

    CF is more effective than just providing pupils with more practice opportunities

    (Truscott, 1996, 2004).

    Procedure

    The present investigation included four sessions: a pretest session (S1), a treat-

    ment/control session (S2), a posttest session (S3), and a delayed-posttest session

    (S4). Figure 1 presents an overview of the procedure, which is described in de-

    tail below. Our tasks were designed for research purposes only; they were not

    part of the standard biology curriculum. However, all tasks were administered

    during class periods. The tasks and topics were introduced and explained by

    the researcher, who also provided the CF treatments, and the class teacher was

    present to maintain order. All tasks and tests were pen-and-paper assignments.

    Session 1: Pretest (Week 1)

    During the first session (S1), all pupils, irrespective of their group assignment,

    were presented with a receptive vocabulary test, a questionnaire concerning

    their language background, and the first writing task. Scores on the vocabulary

    test were taken to provide an indication of pupils' overall language proficiency.

    A vocabulary test was chosen because earlier research demonstrated that vo-

    cabulary knowledge is a good predictor of overall language proficiency (e.g.,

    Beglar & Hunt, 1999; Zareva, Schwanenflugel, & Nikolova, 2005). Moreover,

    a vocabulary test seemed to be the most suitable instrument in view of practical

    considerations and time restrictions. We included the vocabulary test scores in

    our analyses as a covariate to be able to control for individual L2 proficiency

    differences. The output on the first assignment served as a baseline measure

    of pupils' written accuracy and structural and lexical complexity. Pupils were

    given 20 minutes to complete the writing task and were instructed to write a

    minimum of 15 lines.


    Group condition          N          Treatment/control session (S2: week 2)
    Direct CF (DIR)          32 + 35    Revision based on comprehensive direct CF (same text)
    Indirect CF (IND)        29 + 33    Revision based on comprehensive indirect CF (same text)
    Self-correction (SLF)    37 + 34    Revision based on self-correction, no CF (same text)
    Practice (PRC)           36 + 32    Writing practice, no revision nor CF (new text: honeybees)

    All groups:
    Pretest (S1: week 1)             Vocabulary test and language background questionnaire; initial text (butterflies)
    Posttest (S3: week 3)            New text (ladybugs)
    Delayed posttest (S4: week 6)    New text (wasps)

    Figure 1 Experimental procedure. DIR = Direct CF, IND = Indirect CF, SLF = Self-correction with no CF, PRC = Additional writing practice with no CF.

    Before administering the first assignment, the researcher introduced the

    task's topic by means of a 10-minute mini-lesson, to ensure a comparable

    minimal amount of background knowledge on the topic among participants

    and across the four conditions. Additionally, pupils were told that their main


    focus needed to be on communicating content but that their texts would also

    be evaluated on linguistic adequacy. To give them an idea of the form-related

    features they could attend to while writing, pupils were given a handout listing

    common types of errors and an example for each error category. Participants

    were not aware of the fact that they could be receiving CF at a later stage. We

    chose to draw learners' attention to form across all four groups at this stage for

    several reasons. First, we aimed at establishing the unique contribution of CF

    to accuracy development. If only the attention of the experimental groups had

    been directed toward linguistic form (by means of comprehensive CF provision

    during session 2), it would have been unclear if it was the form-oriented focus

    in general or the CF as such that led to any observed differences with the

    control groups. Second, the handout used in the introduction was reused, in a

    slightly adapted version including the error coding system, when instructing

    pupils in the indirect CF group on the interpretation of the error codes [see the

    Session 2: Treatment/Control (Week 2) subsection]. This setup ensured that

    the indirect group did not have an advantage as compared to the other groups

    in terms of language input. Finally, by directing every learner's attention to

    linguistic form at the start of the experiment, we strove to foster stable (i.e.,

    across sessions) and equal (i.e., across the four groups) task representations.

    Had the task instructions in session 1 emphasized only content issues, the way

    in which pupils in the experimental (but not the control) groups interpreted

    the task would have been likely to change over time because students are

    known to attune their task representations toward the task demands set by the

    teacher (Broekkamp, Van Hout-Wolters, Van den Bergh, & Rijlaarsdam, 2004).

    Whereas pupils' initial perceptions of the study tasks' goal would have been to

    understand and communicate content (in any scenario where there was no initial

    instructional focus on language form), receiving CF could have resulted in a

    shift of learners' task representations toward focusing on linguistic accuracy.

    Session 2: Treatment/Control (Week 2)

    The treatment/control session (S2) took place 1 week after S1. Pupils in the two

    CF groups received the directly or indirectly corrected versions of their initial

    texts (cf. Examples 1 and 2 earlier). They were asked to copy the initial texts,

    revising all errors corrected by the researcher. Before starting their revisions,

    pupils in the indirect CF group were given a handout, accompanied by oral

    instructions, on how to interpret and use the error codes in their texts. The two

    control groups each underwent a different condition involving no CF. Pupils in

    the self-correction group were handed the texts they wrote during the pretest

    session without any alterations. They were invited to self-correct their pretest


    writing by thoroughly reading over their texts and searching for any elements

    that needed revising. Even if no such elements were found, pupils were

    still obliged to copy their initial texts. Pupils in the practice group, on the

    other hand, were not given the opportunity to revise their pretest texts but were

    presented with a new writing task instead. The researcher briefly explained

    the new task's topic before pupils started writing. All pupils were allocated 20

    minutes to finish the task they were presented with, irrespective of the condition

    to which they were assigned.

    Sessions 3 and 4: Posttest (Week 3) and Delayed Posttest (Week 6)

    The first posttest (S3) was administered 1 week after the treatment/control

    session (S2), and the delayed posttest took place 4 weeks after S2. During both

    posttest sessions, all pupils were given 20 minutes to produce a text (at least

    15 lines in length) on a new topic, which was again briefly introduced by the

    researcher.

    Writing Tasks

    The tasks used throughout the study were four writing assignments on a biology-related

    topic, namely, the metamorphosis of different insects: butterflies (S1),

    honeybees (S2), ladybugs (S3), and wasps (S4).² Whereas three of the tasks

    were performed by all pupils, the task on a honeybee's metamorphosis was only

    performed by participants in the second control group, who practiced their

    writing during the treatment/control session (S2).

    All tasks within the series had a comparable presentation and design; they

    invited pupils to write an email to a classmate who was absent during the

    researcher's explanation of the task's topic. Pupils were asked to explain

    the metamorphosis of the relevant insect, based on a series of images depicting

    the metamorphosis process. To avoid individual differences in familiarity with

    the relevant vocabulary, (potentially) unfamiliar words (e.g., larva, cocoon)

    were glossed.

    Data Processing and Reliability

    All handwritten output was transcribed and coded using the CLAN program

    of CHILDES (MacWhinney, 2000). Two research assistants transcribed pupils'

    texts, and the first author was responsible for coding them. Texts were coded

    for linguistic errors and clause types (i.e., main clause and subordinate clause).

    The coding procedure was entirely blind; during coding, the researcher was

    unaware of the study group to which the text at hand belonged. Ten percent of

    the data was also coded by one of the assistants to be able to establish interrater


    Table 1 Average levels of intrarater and interrater agreement for linguistic measures

                      Overall     Grammatical   Nongrammatical   Structural
                      accuracy    accuracy      accuracy         complexity
    ICC Intrarater    .998        .977          .996             .974
    ICC Interrater    .981        .921          .975             .946

    reliability and recoded, 6 months later, by the first author to measure intrarater

    reliability.

    The reliability results are presented in Table 1. We calculated intraclass

    correlation coefficients (ICC) to establish the average levels (over sessions) of

    intrarater and interrater agreement for overall accuracy, grammatical accuracy,

    nongrammatical accuracy, and structural complexity (cf. next section). The

    intrarater ICCs were calculated from an ANOVA two-way mixed-effects model,

    and they provide an indication of the variability due to variation within the same

    rater. A two-way random-effects model was used to estimate the interrater ICCs,

    reporting the proportion of the variability due to variation among raters. As

    shown in Table 1, high levels of agreement within the same rater as well as

    between raters were found for all four measures.
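
    As a rough illustration of how such coefficients are obtained, the sketch below derives two single-measure ICCs from the mean squares of a two-way ANOVA on a texts-by-raters table: a consistency form for the two-way mixed-effects case, commonly labeled ICC(3,1), and an absolute-agreement form for the two-way random-effects case, commonly labeled ICC(2,1). Which exact ICC variants (single or average measures, consistency or absolute agreement) were used in the study is not stated, so that choice, the function names, and the scores below are all assumptions made for illustration.

```python
def two_way_mean_squares(ratings):
    """Mean squares of a two-way ANOVA without replication.

    ratings[i][j] is the score given to text i by rater (or occasion) j.
    """
    n = len(ratings)        # number of texts
    k = len(ratings[0])     # number of raters or rating occasions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return n, k, ms_rows, ms_cols, ms_error


def icc_consistency(ratings):
    """Two-way mixed effects, single measure, consistency: ICC(3,1)."""
    n, k, msr, msc, mse = two_way_mean_squares(ratings)
    return (msr - mse) / (msr + (k - 1) * mse)


def icc_agreement(ratings):
    """Two-way random effects, single measure, absolute agreement: ICC(2,1)."""
    n, k, msr, msc, mse = two_way_mean_squares(ratings)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical error ratios for five texts, each coded twice.
scores = [[1.2, 1.3], [0.8, 0.8], [2.1, 2.0], [1.5, 1.6], [0.4, 0.5]]
print(round(icc_consistency(scores), 3), round(icc_agreement(scores), 3))
```

    The two forms differ only in whether a systematic difference between raters counts against agreement: a rater who is consistently one point stricter leaves the consistency form at 1 but lowers the absolute-agreement form.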

    Linguistic Measures for the Dependent Variables

    Every text was analyzed for linguistic accuracy and, in order to address research

    question 7, also for structural complexity and lexical diversity.

    As did earlier studies exploring the effectiveness of written CF (e.g.,

    Chandler, 2003; Truscott & Hsu, 2008), we used an error ratio to measure

    overall accuracy: [number of linguistic errors/total number of words] × 10.

    A 10-word ratio rather than the more common 100-word ratio was used because

    pupils' texts were relatively short (i.e., around 120 words). To be able to

    test Truscott's (2001, 2007) claim that nongrammatical errors might be more

    amenable to correction than errors in grammar (our research question 5), we

    broke down our overall accuracy measure into a measure of grammatical ac-

    curacy and a measure of nongrammatical accuracy. The first measure was a

    ratio calculated on the basis of the sum of the number of article errors, inflec-

    tional errors, word order errors, omissions of necessary elements, additions

    of nonnecessary elements, pronominal errors, and other grammatical errors

    (i.e., [number of grammatical errors/total number of words] × 10). Lexical

    errors, orthographical errors, appropriateness/pragmatic errors, and other nongrammatical

    errors,³ on the other hand, were included in the nongrammatical


    accuracy ratio (i.e., [number of nongrammatical errors/total number of words] × 10).⁴
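
    The three accuracy ratios can be illustrated with a short sketch. The function name and the error counts are hypothetical; only the formula (errors divided by words, multiplied by 10) and the typical text length of about 120 words come from the description above.

```python
def errors_per_10_words(n_errors, n_words):
    """Error ratio: number of errors per 10 words of running text."""
    if n_words <= 0:
        raise ValueError("a text must contain at least one word")
    return n_errors / n_words * 10


# Hypothetical text of 120 words with 5 grammatical and 13 nongrammatical errors.
n_words = 120
grammatical = errors_per_10_words(5, n_words)
nongrammatical = errors_per_10_words(13, n_words)
overall = errors_per_10_words(5 + 13, n_words)

print(round(grammatical, 2), round(nongrammatical, 2), round(overall, 2))
```

    Because all three ratios share the same denominator, the grammatical and nongrammatical ratios of a text add up to its overall ratio.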

    To measure structural complexity, we used a subordination index: the number

    of subordinate clauses as a percentage of the total number of clauses (i.e.,

    [number of subclauses/total number of clauses] × 100) (Norris & Ortega, 2009;

    Wolfe Quintero, Inagaki, & Kim, 1998). Lexical diversity was calculated using

    Guiraud's Index, a type-token ratio that corrects for text length (types/√tokens)

    (Guiraud, 1954).
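
    Both measures are simple to compute once clauses and word tokens have been counted. The sketch below is an illustration only: the whitespace tokenization and lowercasing are simplifying assumptions (the study's counts came from CLAN-coded transcripts), and the function names and example values are hypothetical.

```python
import math


def subordination_index(n_subclauses, n_clauses):
    """Subordinate clauses as a percentage of all clauses."""
    return n_subclauses / n_clauses * 100


def guiraud_index(tokens):
    """Guiraud's Index: number of types divided by the square root of the number of tokens."""
    types = {token.lower() for token in tokens}
    return len(types) / math.sqrt(len(tokens))


# Hypothetical clause counts: 3 subordinate clauses out of 20 clauses.
print(round(subordination_index(3, 20), 1))

# Hypothetical seven-token Dutch fragment with five distinct word types.
tokens = "de rups eet en de rups groeit".split()
print(round(guiraud_index(tokens), 2))
```

    Dividing by the square root of the token count, rather than by the token count itself, is what makes the index less sensitive to text length than a plain type-token ratio.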

    Statistical Analyses

    As our participants came from different classes within different schools, our

    data were structured hierarchically. In other words, the performances of pupils

    within one class or school were expected to be more similar to each other than

    to performances of pupils in other classes and schools; a teacher, for example,

    is likely to affect the performances of his/her pupils. This consideration was

    important for the choice of appropriate statistical procedures. ANOVA, specif-

    ically, relies on the independence of cases and may have been inappropriate

    for our data if indeed we had found that observations within classes or schools

    were dependent. We therefore applied linear multilevel analyses (e.g., Snijders

    & Bosker, 1999) to explore the relationship between observations within classes

    and schools. The multilevel procedure enabled us to explicitly model possible

    dependencies in the data by including class and school as random factors.⁵

    These multilevel analyses showed, however, that class and school never made a

    significant contribution to the model. After we had ascertained in this way that

    the assumption of independence was met, we proceeded to analyze our data

    using AN(C)OVAs because this procedure has proven to have more power in

    the absence of data dependencies (e.g., Snijders & Bosker, 1999).

    We used ANCOVAs to test for between-groups differences on the dif-

    ferent dependent variables (i.e., overall accuracy, grammatical accuracy, non-

    grammatical accuracy, structural complexity, and lexical diversity) in the treat-

    ment/control session (S2), first posttest session (S3), and delayed-posttest ses-

    sion (S4). The initial ANCOVA models contained group condition and ed-

    ucational level as between-subjects variables and language proficiency as a

    covariate. In addition, we incorporated learners' pretest (S1) performance on

    the relevant dependent variable as a covariate to account for effects of initial in-

    dividual differences. We started out each ANCOVA with a full model, including

    all relevant factors, and all possible interactions between those factors. Factors

    and interactions between factors that did not explain a significant proportion of

    the variance were excluded from the final ANCOVA models.
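
    The logic of adjusting for pretest performance can be shown with a deliberately reduced sketch: a single covariate, a single between-subjects factor, and the pooled within-group regression slope of classical ANCOVA. It is not the model fitted in the study, which also included language proficiency, educational level, and interaction terms; the function name and the data are hypothetical.

```python
def adjusted_means(groups):
    """Covariate-adjusted group means for a one-covariate ANCOVA.

    groups maps a condition label to a list of (covariate, outcome) pairs,
    for example (pretest error ratio, posttest error ratio). Each group's
    outcome mean is shifted along the pooled within-group regression line
    to the grand mean of the covariate.
    """
    all_x = [x for pairs in groups.values() for x, _ in pairs]
    grand_x = sum(all_x) / len(all_x)
    sxy = 0.0
    sxx = 0.0
    means = {}
    for label, pairs in groups.items():
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        means[label] = (mean_x, mean_y)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        sxx += sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx  # pooled within-group slope
    return {label: mean_y - slope * (mean_x - grand_x)
            for label, (mean_x, mean_y) in means.items()}


# Hypothetical data: group B started with more errors than group A.
data = {
    "A": [(1.0, 0.5), (2.0, 1.5), (3.0, 2.5)],
    "B": [(2.0, 2.0), (3.0, 3.0), (4.0, 4.0)],
}
print(adjusted_means(data))
```

    In this toy data set the raw posttest means differ by 1.5 errors per 10 words, but two thirds of that gap reflects the groups' different starting points; after adjustment the difference shrinks to 0.5.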


    Table 2 Descriptive statistics for overall language proficiency by educational level and group condition

    Educational level    Group condition    N      M        SD
    Higher (havo)        DIR                32     76.09    10.52
                         IND                26     74.85    11.80
                         SLF                33     80.45    8.93
                         PRC                33     74.09    8.46
                         Total              124    76.46    10.10
    Lower (vmbo-t)       DIR                34     72.24    11.08
                         IND                31     67.97    10.71
                         SLF                31     70.61    11.93
                         PRC                33     71.97    8.54
                         Total              129    70.75    10.63

    Note. The maximum possible score on the receptive vocabulary test was 108. The sample size for the vocabulary test is smaller than the total sample size for participants in the study because some participants failed to complete the vocabulary test. DIR = direct CF, IND = indirect CF, SLF = self-correction with no CF, PRC = additional writing practice with no CF.

    Results

    We will first present some relevant descriptive and inferential results on the

    language proficiency and writing in S1 that help ascertain the comparability

    of the four study groups in terms of learners' proficiency profiles and baseline

    performance at the onset of the study, prior to delivery of the two treatments

    and two control conditions. We will then successively present the findings

    regarding the value of comprehensive CF as an editing tool (RQ1), its language

    learning potential (RQ2), the effects of direct and indirect CF on grammatical

    and nongrammatical accuracy (RQ4 and RQ5), and the influence of CF on the

    complexity of learners' writing (RQ7). The comparison between the effects of

    CF and those of self-correction (RQ3) and additional writing practice (RQ6),

    and the role of pupils' educational level (RQ8), will be discussed throughout

    the Results section.

    Comparability of Groups at the Onset of the Study

    The descriptive statistics for overall language proficiency (i.e., pupils' scores

    on the receptive vocabulary test), itemized by educational level and group as-

    signment, are included in Table 2. In Tables 3, 4, and 5 readers will find the

    descriptive statistics for overall accuracy, grammatical accuracy, and nongram-

    matical accuracy scores on the first writing task about butterflies, done in S1


    Table 3 Descriptive statistics for overall accuracyᵃ by educational level, condition, and session

    Sessions: S1 = pretest, S2 = treatment/control session, S3 = posttest, S4 = delayed posttest.

    Level          Group   N     S1 M    S1 SD   S2 M    S2 SD   S3 M    S3 SD   S4 M    S4 SD
    High (havo)    DIR     32    1.52    0.75    0.37    0.29    1.08    0.66    1.32    0.89
                   IND     29    1.25    0.67    0.42    0.32    1.05    0.46    0.89    0.40
                   SLF     37    1.39    0.60    1.28    0.66    1.61    0.82    1.50    0.70
                   PRC     36    1.29    0.63    1.30    0.72    1.49    0.73    1.41    0.77
                   Total   134   1.37    0.66    0.88    0.71    1.33    0.73    1.31    0.75
    Low (vmbo-t)   DIR     35    1.69    0.70    0.32    0.24    1.64    0.76    1.42    0.78
                   IND     33    1.95    0.88    0.74    0.63    1.73    0.97    1.54    0.90
                   SLF     34    1.85    0.75    1.50    0.68    1.97    0.93    1.75    0.84
                   PRC     32    1.79    1.00    2.19    1.00    2.26    1.00    2.08    1.10
                   Total   134   1.82    0.83    1.17    0.99    1.90    0.94    1.70    0.93

    Note. DIR = direct CF, IND = indirect CF, SLF = self-correction with no CF, PRC = additional writing practice with no CF.

    ᵃNumber of linguistic errors per 10 words.

    as a baseline measure of pupils' written accuracy and structural and lexical

    complexity, also itemized by educational level. The descriptive results for the

    structural complexity and lexical diversity on S1 are shown in Tables 6 and 7,

    respectively.

    A series of ANOVAs on pupils' baseline texts (S1) with group condition

    as a between-subjects variable showed that there were no initial differences

    among the four groups in overall accuracy, F(3, 264)


    Table 4 Descriptive statistics for grammatical accuracyᵃ by educational level, condition, and session

    Sessions: S1 = pretest, S2 = treatment/control session, S3 = posttest, S4 = delayed posttest.

    Level          Group   N     S1 M    S1 SD   S2 M    S2 SD   S3 M    S3 SD   S4 M    S4 SD
    High (havo)    DIR     32    0.43    0.27    0.09    0.10    0.32    0.22    0.39    0.34
                   IND     29    0.30    0.24    0.10    0.11    0.36    0.27    0.37    0.22
                   SLF     37    0.37    0.25    0.37    0.23    0.42    0.30    0.46    0.33
                   PRC     36    0.37    0.27    0.36    0.27    0.44    0.23    0.45    0.23
                   Total   134   0.37    0.26    0.24    0.24    0.39    0.26    0.42    0.29
    Low (vmbo-t)   DIR     35    0.55    0.31    0.10    0.11    0.49    0.33    0.39    0.26
                   IND     33    0.47    0.33    0.21    0.23    0.51    0.27    0.49    0.32
                   SLF     34    0.53    0.32    0.46    0.29    0.64    0.41    0.60    0.39
                   PRC     32    0.52    0.36    0.59    0.41    0.69    0.44    0.72    0.46
                   Total   134   0.52    0.33    0.34    0.34    0.58    0.38    0.55    0.38

    Note. DIR = direct CF, IND = indirect CF, SLF = self-correction with no CF, PRC = additional writing practice with no CF.

    ᵃNumber of grammatical errors per 10 words.

    language proficiency, F(3, 253) = 1.96, p = .120, ηp² = .02. These findings

    ensured that our four groups were comparable at the onset of the study.

    However, a series of ANCOVAs with educational level

    as a between-subjects variable and language proficiency as a covariate revealed

    that these two factors did influence learners' pretest writing. We found that

    both educational level, F(1, 250) = 14.20, p


    Table 5 Descriptive statistics for nongrammatical accuracyᵃ by educational level, condition, and session

    Sessions: S1 = pretest, S2 = treatment/control session, S3 = posttest, S4 = delayed posttest.

    Level          Group   N     S1 M    S1 SD   S2 M    S2 SD   S3 M    S3 SD   S4 M    S4 SD
    High (havo)    DIR     32    1.09    0.63    0.28    0.27    0.77    0.56    0.93    0.82
                   IND     29    0.95    0.52    0.32    0.26    0.69    0.39    0.52    0.29
                   SLF     37    1.01    0.55    0.92    0.56    1.20    0.69    1.04    0.59
                   PRC     36    0.93    0.54    0.94    0.63    1.05    0.66    0.97    0.69
                   Total   134   1.00    0.56    0.64    0.56    0.95    0.63    0.89    0.66
    Low (vmbo-t)   DIR     35    1.14    0.56    0.23    0.17    1.15    0.65    1.03    0.75
                   IND     33    1.48    0.72    0.52    0.47    1.22    0.82    1.05    0.72
                   SLF     34    1.32    0.63    1.05    0.64    1.33    0.69    1.15    0.70
                   PRC     32    1.28    0.82    1.60    0.85    1.56    0.89    1.36    0.95
                   Total   134   1.30    0.69    0.83    0.78    1.32    0.78    1.15    0.79

    Note. DIR = direct CF, IND = indirect CF, SLF = self-correction with no CF, PRC = additional writing practice with no CF.

    ᵃNumber of nongrammatical errors per 10 words.

    were well guided in incorporating the factors educational level and language

    proficiency in later analyses when answering our research questions.

    Effects of Comprehensive CF on Overall Written Accuracy

    Table 3 includes the descriptive statistics for overall accuracy scores for all

    four group conditions, itemized per educational level and session (i.e., treatment/control

    session, posttest, and delayed posttest). Figure 2 is a graphic illustration

    of the descriptive results in Table 3, collapsed over educational level. It

    shows the accuracy development of the four study groups over time. The graph

    provides a descriptive preview of this study's main findings, which will be

    presented in this section. As can be seen from the graph, all groups performed

    similarly at the pretest (S1). In the treatment/control session (S2), however, the

    error rate of the CF groups and the self-correction group decreased, whereas

    the number of errors committed by the practice group increased. On the two

    posttests (S3 and S4), we still see an error rate difference between the control

    and experimental groups, in favor of the latter.


    Table 6 Descriptive statistics for structural complexityᵃ by educational level, condition, and session

    Sessions: S1 = pretest, S2 = treatment/control session, S3 = posttest, S4 = delayed posttest.

    Level          Group   N     S1 M    S1 SD   S2 M    S2 SD   S3 M    S3 SD   S4 M    S4 SD
    High (havo)    DIR     32    16.31   10.46   17.30   10.87   14.67   9.14    10.27   8.13
                   IND     29    16.30   8.98    18.07   10.20   14.36   10.38   10.98   8.25
                   SLF     37    14.74   8.94    14.39   9.30    15.65   10.92   10.94   10.62
                   PRC     36    19.86   10.02   16.13   8.73    15.75   9.32    15.80   8.02
                   Total   134   16.83   9.71    16.35   9.74    15.16   9.88    12.06   9.10
    Low (vmbo-t)   DIR     35    20.78   8.17    21.66   8.27    18.65   10.77   15.17   9.64
                   IND     33    19.41   11.66   21.04   10.57   16.70   10.98   10.84   8.95
                   SLF     34    15.03   8.95    16.19   8.62    14.57   11.00   11.25   7.67
                   PRC     32    18.86   8.00    15.19   8.23    13.66   8.65    13.17   8.30
                   Total   134   18.52   9.45    18.57   9.32    15.89   10.45   12.72   8.73

    Note. DIR = direct CF, IND = indirect CF, SLF = self-correction with no CF, PRC = additional writing practice with no CF.

    ᵃSubordination Index (i.e., [number of subclauses/total number of clauses] × 100).

    Analyses of covariance were used to test for between-groups accuracy dif-

    ferences in the treatment/control session (S2), posttest (S3), and delayed posttest

    (S4). The initial ANCOVA models contained group condition and educational

    level as between-subjects variables and language proficiency and overall ac-

    curacy S1 as covariates. Nonsignificant factors and nonsignificant interactions

    between them were excluded from the final models. Table 8 offers a preview

    of the significant differences observed between groups in post hoc pairwise

    comparisons (to be reported in detail below) in overall, grammatical, and non-

    grammatical accuracy across the three sessions, together with the associated

    Cohen's d effect sizes.
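
    For readers unfamiliar with the effect size, the sketch below computes Cohen's d with a pooled standard deviation from group descriptives. As input it uses the unadjusted S2 overall-accuracy values of the higher-level SLF and DIR groups from Table 3. How the effect sizes in Table 8 were computed is not described (they may, for instance, rest on data collapsed over educational level or on adjusted means), so this calculation is an assumption and is not expected to reproduce the tabled values. The function name is hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Table 3, higher educational level, S2: SLF (M = 1.28, SD = 0.66, N = 37)
# versus DIR (M = 0.37, SD = 0.29, N = 32). A positive d means that the
# SLF group made more errors per 10 words than the DIR group.
d = cohens_d(1.28, 0.66, 37, 0.37, 0.29, 32)
print(round(d, 2))
```

    With these inputs d comes out at roughly 1.7, that is, the two group means lie well over one and a half pooled standard deviations apart.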

    Revision Effects (S2)

    During revision, pupils' language proficiency, F(1, 240) = 2.63, p = .106,

    ηp² = .01, and the interaction between condition and language proficiency,

    F(3, 240) = 2.40, p = .069, ηp² = .03, did not prove to play a significant role.

    These factors were therefore discarded from the final ANCOVA model. The


    Table 7 Descriptive statistics for lexical diversityᵃ by educational level, condition, and session

    Sessions: S1 = pretest, S2 = treatment/control session, S3 = posttest, S4 = delayed posttest.

    Level          Group   N     S1 M    S1 SD   S2 M    S2 SD   S3 M    S3 SD   S4 M    S4 SD
    High (havo)    DIR     32    7.13    0.85    7.28    0.91    6.87    0.90    6.51    0.80
                   IND     29    7.10    0.77    7.10    0.73    6.69    0.68    6.61    0.63
                   SLF     37    7.09    0.81    7.06    0.78    6.70    0.90    6.31    0.98
                   PRC     36    7.31    0.50    6.72    0.72    6.88    0.71    6.63    0.83
                   Total   134   7.16    0.74    7.03    0.81    6.79    0.80    6.51    0.84
    Low (vmbo-t)   DIR     35    7.00    0.81    7.26    0.70    6.51    0.73    6.58    0.69
                   IND     33    6.77    1.06    7.03    1.00    6.64    1.00    6.47    0.89
                   SLF     34    7.08    0.75    7.12    0.82    6.46    0.70    6.37    0.75
                   PRC     32    6.96    0.87    6.53    0.77    6.59    0.87    6.57    0.70
                   Total   134   6.95    0.88    7.00    0.87    6.55    0.74    6.50    0.75

    Note. DIR = direct CF, IND = indirect CF, SLF = self-correction with no CF, PRC = additional writing practice with no CF.

    ᵃGuiraud's Index (types/√tokens).

    final analysis revealed significant effects of group condition, F(3, 259)

    = 111.24, p


    Table 8 Summary of significant contrasts among four groups with associated Cohen's d effect sizes

    Session                           Overall accuracy         d      Grammatical accuracy     d      Nongrammatical accuracy  d
    Treatment/control session (S2)    DIR > SLF                1.99   DIR > SLF                1.52   DIR > SLF                1.61
                                      DIR > PRC                2.74   DIR > PRC                1.85   DIR > PRC                2.53
                                      IND > SLF                1.55   IND > SLF                1.05   IND > SLF                1.33
                                      IND > PRC                2.29   IND > PRC                1.37   IND > PRC                2.06
    Posttest (S3)                     DIR > SLF                0.67   DIR > PRC                0.55   DIR > SLF                0.53
                                      DIR > PRC                0.81                                   DIR > PRC                0.65
                                      IND > SLF                0.70                                   IND > SLF                0.68
                                      IND > PRC                0.84                                   IND > PRC                0.80
    Delayed posttest (S4)             DIR > PRC                0.63   DIR > PRC                0.65   IND > SLF                0.66
                                      IND > SLF                0.66                                   IND > PRC                0.82
                                      IND > PRC                0.94

    Note. DIR = direct CF, IND = indirect CF, SLF = self-correction with no CF, PRC = additional writing practice with no CF.

    p