10
Proceedings of COLING 2012: Posters, pages 863–872, COLING 2012, Mumbai, December 2012. The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writings Tomoya Mizumoto 1 Yuta Hayashibe 1 Mamoru Komachi 1 Masaaki Nagata 2 Yu ji Matsumoto 1 (1) Nara Institute of Science and Technology, Nara, Japan (2) NTT Communication Science Laboratories, Kyoto, Japan ABSTRACT English as a Second Language (ESL) learners’ writings contain various grammatical errors. Pre- vious research on automatic error correction for ESL learners’ grammatical errors deals with re- stricted types of learners’ errors. Some types of errors can be corrected by rules using heuristics, while others are difficult to correct without statistical models using native corpora and/or learner corpora. Since adding error annotation to learners’ text is time-consuming, it was not until recently that large scale learner corpora became publicly available. However, little is known about the ef- fect of learner corpus size in ESL grammatical error correction. Thus, in this paper, we investigate the effect of learner corpus size on various types of grammatical errors, using an error correction system based on phrase-based statistical machine translation (SMT) trained on a large scale error- tagged learner corpus. We show that the phrase-based SMT approach is effective in correcting frequent errors that can be identified by local context, and that it is difficult for phrase-based SMT to correct errors that need long range contextual information. KEYWORDS: ESL, grammatical error correction, statistical machine translation. 863

The effect of learner corpus size in grammatical error correction of ESL writings

  • Upload
    tmu-jp

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Proceedings of COLING 2012: Posters, pages 863–872,COLING 2012, Mumbai, December 2012.

The Effect of Learner Corpus Sizein Grammatical Error Correction of ESL Writings

Tomoya Mizumoto1 Yuta Hayashibe1

Mamoru Komachi1 Masaaki Nagata2 Yu ji Matsumoto1

(1) Nara Institute of Science and Technology, Nara, Japan(2) NTT Communication Science Laboratories, Kyoto, Japan

[email protected], [email protected], [email protected],

[email protected], [email protected]

ABSTRACTEnglish as a Second Language (ESL) learners’ writings contain various grammatical errors. Pre-vious research on automatic error correction for ESL learners’ grammatical errors deals with re-stricted types of learners’ errors. Some types of errors can be corrected by rules using heuristics,while others are difficult to correct without statistical models using native corpora and/or learnercorpora. Since adding error annotation to learners’ text is time-consuming, it was not until recentlythat large scale learner corpora became publicly available. However, little is known about the ef-fect of learner corpus size in ESL grammatical error correction. Thus, in this paper, we investigatethe effect of learner corpus size on various types of grammatical errors, using an error correctionsystem based on phrase-based statistical machine translation (SMT) trained on a large scale error-tagged learner corpus. We show that the phrase-based SMT approach is effective in correctingfrequent errors that can be identified by local context, and that it is difficult for phrase-based SMTto correct errors that need long range contextual information.

KEYWORDS: ESL, grammatical error correction, statistical machine translation.

863

1 IntroductionEnglish as a Second Language (ESL) learners’ writings contain various kinds of grammatical er-rors. Recent growth in corpus annotation of learner English allows detailed analysis of grammaticalerrors in learners’ writings. Konan-JIEM Learner Corpus (hereafter referred to as KJ Corpus)1 isone such corpus composed of English essays written by Japanese college students. Table 1 showsthe distribution of errors found in KJ Corpus2. The most frequent error type is article errors, fol-lowed by noun number and preposition errors. It is not surprising that frequent types of errorsaccount for the most errors, but it should be noted that there are many different types of errors inlearner corpus.

Thus far, a lot of studies have been made on automated error correction in regard to errors ESLlearners make. However, most previous studies of second language learning deal with one ora few restricted types of learners’ errors. For example, there are studies on preposition errors(Rozovskaya and Roth, 2011), verb selection errors (Liu et al., 2011), tense errors (Tajiri et al.,2012), verb form errors (agreement and tense) (Lee and Seneff, 2008), preposition and articleerrors (Dahlmeier and Ng, 2011) and spelling, article, preposition and word form (agreement andtense) errors (Park and Levy, 2011). Recently, Swanson and Yamangil (2012) presented a detailedanalysis on correcting all types of errors in the Cambridge Learner Corpus, but their task is differentfrom the others in that their goal is to detect errors and select error types given both the originaland corrected text, which is not often available in practice.

Some types of errors like agreement errors can be corrected by simple rules using heuristics, whileothers like preposition errors are difficult to correct without statistical model trained on nativecorpora and/or learner corpora. It was not until recently that large scale learner corpora becamewidely available for grammatical error correction. However, little is known about the effect oflearner corpus size in ESL grammatical error correction.

In this paper, we conduct experiments in error correction targeting all types of errors using a largescale error-annotated learner corpus to see the effect of corpus size in grammatical error correction.We build an error correction system with phrase-based statistical machine translation (SMT) tech-nique. Also, we create a large scale error-tagged corpus of learner English from the web. We thenanalyze the results of error correction by breaking down the error types and discuss the strengthand weakness of the example based approach using a large scale but noisy learner corpus.

The main contribution of this work is two-fold:• To our knowledge, it is the first attempt to use a large scale learner corpus to correct all types

of errors.• We show the effect of learner corpus size on the phrase-based SMT approach and show its

advantages and disadvantages.In the following, we briefly overview related work of grammatical error correction in Section 2.Then we describe our grammatical error correction system and large scale error-annotated learnercorpus in Section 3. Section 4 shows our experimental results and discusses the effect of corpussize on different error types.

2 Related workEven though there are many works on error correction in learners’ English, only a few target mul-tiple various kinds of grammatical errors.

1http://www.gsk.or.jp/catalog/GSK2012-A/catalog_e.html2Spelling errors are excluded from target of annotation in KJ Corpus.

864

Types Proportion (%) Types Proportion (%)

article 19.23 verb other 4.09noun number 13.88 adverb 3.59preposition 13.56 conjunction 2.04tense 8.77 word order 1.34lexical choice of noun 7.04 noun other 1.30lexical choice of verb 6.90 auxiliary verb 0.88pronoun 6.62 other lexical choice 0.74agreement 5.25 relative 0.42adjective 4.30 interrogative 0.04

Table 1: The distribution of errors on KJ Corpus.

First, Brockett et al. (2006) proposed an error correction model with phrase-based SMT. Eventhough their model can deal with all types of errors, they evaluated their method only on nounnumber errors using an artificial data, partly because there was no large scale learner corpus avail-able at the time. We would like to emphasize that our work is the first attempt to use a real worldlarge learner corpus with phrase-based SMT technique. We will show that phrase-based SMTespecially suffers from data sparseness.

Second, Park and Levy (2011) attempted to correct various kinds of errors with a noisy channelmodel using a large scale unannotated corpus of learner English. Ours differs from their work inthat we use a large scale error-tagged corpus annotated by the wisdom of crowds. In addition,they targeted only spelling, article, preposition and word form errors, while we do not restrict errortypes.

Third, Han et al. (2010) developed a preposition correction system using a large scale error-taggedcorpus of learner English. They built a maximum entropy-based model for preposition errorstrained on learner and native corpora. We also take advantage of a large scale error-tagged cor-pus of learner English, but use phrase-based SMT to deal with various kinds of errors and to fullyexploit the learner corpus.

Recently, Dahlmeier and Ng (2012) presented a beam-search decoder for correcting spelling, ar-ticle, preposition, punctuation and noun number errors. They reported that their discriminativemodel achieves considerably better results than an SMT baseline trained on a few hundreds of sen-tences. As we will see later, we observed a similar tendency in preposition error correction whenwe trained a phrase-based SMT system on a small learner corpus. However, in this work, we ex-ploit a large scale error-annotated corpus extracted from the web to overcome the data sparsenessproblem.

3 Using a large scale learner corpus with phrase-based SMT for grammati-cal error correction

3.1 Error correction with phrase-based SMTWe use phrase-based statistical machine translation (Koehn et al., 2003) to conduct unrestrictederror correction. There are several studies about grammatical error correction using phrase-basedstatistical machine translation (Brockett et al., 2006; Mizumoto et al., 2011; Ehsan and Faili, 2012).Although Brockett et al. (2006) corrected English learners’ error using phrase-based statisticalmachine translation, they only targeted mass noun errors. Mizumoto et al. (2011) dealt with un-

865

restricted types of learners’ errors, but their target is not English but Japanese. Ehsan and Faili(2012) applied an SMT framework to English and Persian grammatical error correction, but usedartificially created learner corpora.

The well-known statistical machine translation formulation using a log-linear model (Och and Ney,2002) is defined by:

e = argmaxe

P(e| f ) = argmaxe

M

∑m=1

λmhm(e, f ) (1)

where e represents target sentences (corrected sentences) and f represents source sentences (sen-tences written by learners). hm(e, f ) is a feature function and λm is a model parameter for eachfeature function. This formulation finds a target sentence e that maximizes a weighted linear com-bination of feature functions for source sentence f . A translation model and a language modelcan be used as feature functions. The translation model is commonly represented as conditionalprobability P( f |e) factored into the translation probability between phrases. The language modelis represented as probability P(e). The translation model is learned from sentence-aligned parallelcorpus while the language model is learned from target raw corpus.

3.2 Crowdsourcing annotation of a large scale corpus of learner English

We use data from a language learning social networking service Lang-8 3 to train the error correc-tion system using statistical machine translation. In Lang-8, language learners post their writing onthe Lang-8 site to be corrected by native speakers. We can obtain pairs of learner’s sentence andcorrected sentence in large scale from Lang-8. Mizumoto et al. (2011) first presented an approachto extract a learner corpus from the web, but we differ from them in that we create a learner corpusof English rather than Japanese. Also, unlike (Tajiri et al., 2012), we propose to use metadata ofusers to determine the L1 of English learners. Because our test corpus (KJ Corpus) is written byJapanese college students, we would like to use the same kind of data; it is out side of the scope ofthis paper to see the effect of learners’ L1.

We crawled blog entries found in Lang-8 as of December 2010. We used writings in Lang-8 writtenby Japanese ESL learners for translation model and language model of error correction system withSMT. There are 509,116 sentence pairs in English writings written by Japanese L1 English learners.However, we need to filter noisy sentences because it may be hard to align them if the sentencesare drastically changed from the original learner’s sentences, resulting in degraded performance onphrase-based SMT approach. Therefore, we calculate the edit distance between a learner sentenceand the corrected sentence using a dynamic programming algorithm, and retain sentences whosenumbers of both insertions and deletions is equal to or less than 5 words 4. As a result, we obtain391,699 sentence pairs.

4 Experiment: Effect of learner corpus size in grammatical error correctionWe carried out an experiment on grammatical error correction with SMT-based system using a largescale learner corpus. To see the effect of corpus size, we compare a system using Lang-8 Corpus(large scale learner corpus) with different sizes and a system using KJ Corpus (small scale corpus).In order to get a closer look at the effect of error correction methods, we also experimented on thepreposition error correction task using a maximum entropy model as a discriminative baseline andSMT-based models as our proposal for all error correction.

3http://lang-8.com/4 We use 6 as a distortion-limit for Moses, therefore we chose the edit distance to be smaller than the distortion-limit.

866

4.1 Tools and experimental dataWe used Moses 2010-08-13 5 with default parameters as a decoder and GIZA++ 1.0.5 6 as analignment tool to implement an error correction system with phrase-based SMT. We applied grow-diag-final-and (Och and Ney, 2003) heuristics for phrase extraction. The number of extractedphrases are 1,050,070 (245 MB) using all data of Lang-8 Corpus. We used 3-gram as a languagemodel trained on the corrected text of Lang-8 Corpus.

Next, we built the maximum entropy model (Berger et al., 1996) as a multi-class classifier baselinefor preposition error correction (Sakaguchi et al., 2012). We used the implementation of MaximumEntropy Modeling Toolkit 7 with its default parameters. We incorporated surface, POS, WordNet,parse and language model features described in (Tetreault et al., 2010) and (De Felice and Pulman,2008). POS and parse features were extracted using the Stanford Parser 2.0.2. This system achievesrecall of 18.44, precision of 34.88 and F-measure of 24.12 trained and tested on the CLC FCEdataset (Yannakoudakis et al., 2011), which ranked the 4th out of 13 systems at the HOO 2012Shared Task (Dale et al., 2012).

We use KJ Corpus as a test data. KJ Corpus consist of 170 essays, containing 2,411 sentences.When we experiment on a system using KJ Corpus, we perform 5-fold cross validation.

4.2 Evaluation metricsFor the evaluation metrics, we use automatic evaluation criteria. To be precise, we use recall,precision and F-measure.

Recall and precision for each type of errors are calculated from true positive, false positive and falsenegative based on error tags in KJ Corpus. The word which does not have any tag in KJ Corpusdoes not affect precision for each type of errors 8. For example, let us consider the following:

learner: He talked to me his life of Kyoto, and he took me Kyoto university.correct: He talked to me about his life in Kyoto and he took me to Kyoto university.system: He talked me his life on Kyoto, and he took me to Kyoto university.

In this example, the system deletes preposition “to”, which does not have any tag. Thus, precision= 1/2, recall = 1/2 for preposition errors and precision = 1/3, recall = 1/2 for Total scores.

4.3 Experimental resultsTable 2 shows error correction results for each type of errors on different corpora. We comparedSMT systems trained on KJ Corpus, Lang-8 Corpus with the same amount of data with KJ Corpus,and full Lang-8 Corpus. With very few exceptions, the larger the size of learner corpus, the higherthe accuracy. In addition, using the larger corpus, precision tends to increase more than recall.

Table 3 presents F-measures for each type of error varying the corpus sizes (2K, 10K, 20K, 100K,200K, 300K, All (390K)). As we will see later in the next section, there are two types of errors inwhich learner corpus size matters.

Table 4 shows the performance of preposition error correction. Perhaps not surprising, but it stilldeserves attention that SMT model trained on all Lang-8 Corpus clearly outperformed other two

5http://http://www.statmt.org/moses/6http://code.google.com/p/giza-pp/7https://github.com/lzhang10/maxent8The total score is calculated using all the correction output with and without any tag.

867

Training Corpus KJ Corpus Lang-8 Corpus (2K) Lang-8 Corpus (390K)

Recall Prec F Recall Prec F Recall Prec F

article 0.187 0.531 0.277 0.187 0.571 0.282 0.359 0.761 0.488noun number 0.207 0.603 0.308 0.136 0.671 0.226 0.199 0.710 0.311preposition 0.137 0.375 0.201 0.092 0.319 0.143 0.262 0.585 0.361

tense 0.102 0.170 0.128 0.043 0.088 0.058 0.080 0.149 0.104lexical choice of noun 0.035 0.114 0.054 0.033 0.152 0.054 0.182 0.443 0.258lexical choice of verb 0.070 0.161 0.098 0.065 0.200 0.098 0.192 0.324 0.241

pronoun 0.075 0.220 0.112 0.040 0.143 0.063 0.150 0.367 0.213agreement 0.236 0.604 0.340 0.125 0.483 0.199 0.228 0.469 0.307adjective 0.151 0.326 0.206 0.056 0.286 0.094 0.389 0.522 0.446verb other 0.089 0.139 0.109 0.147 0.333 0.204 0.286 0.419 0.340

adverb 0.265 0.450 0.333 0.214 0.429 0.286 0.292 0.432 0.349conjunction 0.100 0.417 0.161 0.091 0.714 0.161 0.115 0.546 0.190word order 0.500 0.025 0.048 0.667 0.050 0.093 0.750 0.075 0.136noun other 0.182 0.222 0.200 0.143 0.167 0.154 0.571 0.429 0.490

auxiliary verb 0.056 0.167 0.083 0.100 0.400 0.160 0.100 0.400 0.160other lexical choice 0.167 0.200 0.182 0.000 0.000 0.000 0.357 0.455 0.400

relative 0.111 0.250 0.154 0.182 0.667 0.286 0.091 0.500 0.154interrogative 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Total 0.149 0.147 0.148 0.113 0.205 0.146 0.247 0.275 0.260

Table 2: Result for each type of errors by statistical machine translation. Bold face indicates thatone system’s result is equal or greater by more than 0.1 points than the other systems’ result.

systems. MaxEnt does slightly better than SMT when they are trained on the same small corpus.Unfortunately, we were not able to use Lang-8 Corpus since it took too long to train.

4.4 DiscussionWe can classify errors into two types: (1) errors which get better correction by increasing corpussize and (2) errors which have little relationship with corpus size. The first type of errors includesarticle, preposition, lexical choice of noun, lexical choice of verb, adjective, and noun other. On theother hand, the second type of errors comprises noun number, tense, agreement, adverb, conjunc-tion, word order, auxiliary verb, relative and interrogative. We can expect to improve performance(both recall and precision) for errors that require wide coverage lexical knowledge, such as lexicalchoice errors, by using a much larger corpus with phrase-based SMT. In contrast, we may say thaterrors which involve larger context such as tense errors are difficult to correct with phrase-basedSMT. We discuss the result while looking at examples of two of the former type of errors (articleand lexical choice of noun) whose F-measures improve with increasing corpus size, and three ofthe latter type of errors (noun number, tense and agreement), whose F-measures do not change oreven degrade.

Table 5 shows examples of article and lexical choice of noun. These are the examples that phrase-based SMT failed to correct using KJ Corpus. Because we can acquire a lot of pairs of an errorphrase and its correction by increasing the size of the learner corpus, the phrase-based SMT wasable to correct them using Lang-8 Corpus.

Table 6 shows examples of noun number, tense and agreement errors. The first example of noun

868

Training Corpus KJ Lang-8

2K 10K 20K 100K 200K 300K 390K

article 0.277 0.282 *0.390 *0.420 *0.443 *0.459 *0.475 *0.488noun number 0.308 0.226 0.214 0.238 0.270 0.300 0.319 0.311preposition 0.201 0.143 0.192 0.226 *0.333 *0.336 *0.344 *0.362

tense 0.128 0.058 0.066 0.058 0.081 0.096 0.089 0.104lexical choice of noun 0.054 0.054 0.124 0.133 *0.189 *0.216 *0.250 *0.258lexical choice of verb 0.098 0.098 0.087 0.138 *0.196 *0.232 *0.232 *0.241

pronoun 0.112 0.063 0.131 0.150 0.177 0.195 0.213 0.213agreement 0.340 0.197 0.224 0.248 0.260 0.284 0.307 0.307adjective 0.206 0.094 0.165 0.219 *0.413 *0.426 *0.426 *0.446verb other 0.109 0.204 0.240 0.311 0.291 *0.340 0.308 0.340

adverb 0.333 0.286 0.286 0.302 0.333 0.349 0.349 0.349conjunction 0.161 0.161 0.161 0.191 0.161 0.191 0.191 0.191word order 0.048 0.093 0.093 0.091 0.091 0.091 0.091 0.136noun other 0.200 0.154 0.286 0.286 *0.531 *0.490 *0.490 *0.490

auxiliary verb 0.083 0.160 0.160 0.083 0.083 0.160 0.160 0.160other lexical choice 0.182 0.000 0.095 0.095 0.400 0.400 0.400 0.400

relative 0.154 0.285 0.154 0.154 0.154 0.154 0.154 0.154interrogative 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Total 0.148 0.146 0.180 0.200 0.239 0.247 0.254 0.260

Table 3: Results (F-measure) for error correction by SMT varying the learner corpus sizes. As-terisks indicate that the difference of result using Lang-8 Corpus and result using KJ Corpus isstatistically significant (p < 0.01).

System Training corpus Recall Precision F-measure

Maximum entropy-based model KJ Corpus 0.165 0.407 0.235Phrase-based SMT KJ Corpus 0.137 0.375 0.201Phrase-based SMT Lang-8 Corpus (390K) 0.262 0.585 0.362

Table 4: Result for preposition error correction on KJ Corpus.

number was corrected using Lang-8 Corpus with phrase-based SMT since the error is one of thecommon learners’ expressions. The second was not corrected using Lang-8 Corpus with phrase-based SMT because “dools” 9 is slightly displaced from “a big”, and a proper noun “snoopy” isinserted between “dools” and “a big”. It is hard to correct this kind of error with Phrase-basedSMT, even using artificial data such as in Brockett et al. (2006). To solve this problem, we need toconduct generalization using POS or consider dependency relations.

The first example of a tense error was corrected using both KJ Corpus and Lang-8 Corpus withphrase-based SMT. One of the reasons why the baseline system was able to correct the error is thatit requires only local context to correct and is very frequent even in a small leaner corpus. In thesecond example, the system fails to find tense agreement in the complex sentence. Tense error isdifficult to correct for phrase-based SMT since it involves global context (Tajiri et al., 2012).

The first example of agreement error was corrected using Lang-8 Corpus with phrase-based SMT.This is because the phrase pair correcting “Flowers is” to “Flowers are” is frequent and the language

9The word “dools” written by a learner is also a spelling error.

869

learner correct

article I like a chocolate very much. I like chocolate very much.lexical choice of noun my cycle was injured, but i wasn’t. my bicycle was damaged, but i wasn’t.

Table 5: Examples of system output for article and lexical choice of noun error

learner correct

noun number 1 I read various type books. I read various types of books.*noun number 2 There is a big snoopy dools in my room. There is a big snoopy doll in my room.

tense 1 If I ’ll live in saitama, I must have ... If I live in saitama, I must have ...*tense 2 The weather is very sunny, so we were ... The weather was very sunny, so we were ...

agreement 1 Flowers is very beautiful. Flowers are very beautiful.*agreement 2 I think, reading comics are not "reading" I think, reading comics is not "reading"

Table 6: Examples of system results for noun number, tense and agreement errors. Asterisksindicate that the SMT system using full Lang-8 Corpus failed to correct the errors.

model probability of “Flowers are” is also higher than “Flowers is”. The second example is onethat the system failed to correct since the pattern is unseen in the learner corpus and thus the systemhas no way to capture the relation between the subject “reading” and “are”. To solve this problem,it needs to get the subject-verb relation considering a dependency structure.

As for prepostion error correction, we suspect that there are two reasons why the SMT-based modelusing full Lang-8 Corpus outperformed the MaxEnt model. First, due to the small amount oftraining data in KJ Corpus (2,000 sentences), the MaxEnt model failed to build a high performancesystem. Second, the high performance of the SMT system may be attributed to the fact that bothKJ Corpus and Lang-8 Corpus were written by Japanese native speakers. Also, the reason whythe MaxEnt model achieved better result than SMT when trained on the same small corpus ispossibly because KJ Corpus is too small to learn variations in learner English by phrase-basedSMT approach, while a discriminative model can exploit a small dataset using rich features.

Conclusion

We tackled the task of ESL grammatical error correction of all types of errors using a large scalecorpus of learner English with phrase-based SMT technique. Previous research focused on re-stricted types of errors due to the small amount of learner corpora. We overcome this problem bytraining an error correction system on a large scale error tagged corpus extracted from the web.

We found that the size of corpus is critical to improve phrase-based SMT approach. However,the degree of improvement varies across error types. Phrase-based SMT is effective in correctingfrequent errors which require only local context. For example, there is a clear improvement inincreasing the size of learner corpus for correcting article, preposition, lexical choice and adjectiveerrors, while there is little improvement for correcting agreement and tense errors.

Acknowledgement

We would like to thank Yangyang Xi for granting permission to use text from Lang-8.

870

ReferencesBerger, A. L., Pietra, V. J. D., and Pietra, S. A. D. (1996). A Maximum Entropy Approach toNatural Language Processing. Computational Linguistics, 22(1):39–71.

Brockett, C., Dolan, W. B., and Gamon, M. (2006). Correcting ESL Errors Using Phrasal SMTTechniques. In Proceedings of COLING-ACL, pages 249–256.

Dahlmeier, D. and Ng, H. T. (2011). Grammatical Error Correction with Alternating StructureOptimization. In Proceedings of ACL-HLT, pages 915–923.

Dahlmeier, D. and Ng, H. T. (2012). A Beam-Search Decoder for Grammatical Error Correction.In Proceedings of EMNLP, pages 568–578.

Dale, R., Anisimoff, I., and Narroway, G. (2012). HOO 2012: A Report on the Preposition andDeterminer Error Correction Shared Task. In Proceedings of BEA, pages 54–62.

De Felice, R. and Pulman, S. G. (2008). A Classifier-Based Approach to Preposition and Deter-miner Error Correction in L2 English. In Proceedings of COLING, pages 169–176.

Ehsan, N. and Faili, H. (2012). Grammatical and Context-Sensitive Error Correction Using aStatistical Machine Translation Framework. Software: Practice and Experience.

Han, N.-R., Tetreault, J., Lee, S.-H., and Ha, J.-Y. (2010). Using an Error-Annotated LearnerCorpus to Develop an ESL/EFL Error Correction System. In Proceedings of LREC, pages 763–770.

Koehn, P., Och, F. J., and Marcu, D. (2003). Statistical Phrase-Based Translation. In Proceedingsof HLT-NAACL, pages 48–54.

Lee, J. and Seneff, S. (2008). Correcting Misuse of Verb Forms. In Proceedings of ACL-HLT,pages 174–182.

Liu, X., Han, B., and Zhou, M. (2011). Correcting Verb Selection Errors for ESL with thePerceptron. In Proceedings of CICLing, pages 411–423.

Mizumoto, T., Komachi, M., Nagata, M., and Matsumoto, Y. (2011). Mining Revision Log ofLanguage Learning SNS for Automated Japanese Error Correction of Second Language Learners.In Proceedings of IJCNLP, pages 147–155.

Och, F. J. and Ney, H. (2002). Discriminative Training and Maximum Entropy Models for Statis-tical Machine Translation. In Proceedings of ACL, pages 295–302.

Och, F. J. and Ney, H. (2003). A Systematic Comparison of Various Statistical Alignment Models.Computational Linguistics, 29(1):19–51.

Park, Y. A. and Levy, R. (2011). Automated Whole Sentence Grammar Correction Using a NoisyChannel Model. In Proceedings of ACL, pages 934–944.

Rozovskaya, A. and Roth, D. (2011). Algorithm Selection and Model Adaptation for ESL Cor-rection Tasks. In Proceedings of ACL, pages 924–933.

871

Sakaguchi, K., Hayashibe, Y., Kondo, S., Kanashiro, L., Mizumoto, T., Komachi, M., and Mat-sumoto, Y. (2012). Naist at the hoo 2012 shared task. In Proceedings of the Seventh Workshop onBuilding Educational Applications Using NLP, pages 281–288.

Swanson, B. and Yamangil, E. (2012). Correction Detection and Error Type Selection as an ESLEducational Aid. In Proceedings of NAACL: HLT, pages 357–361.

Tajiri, T., Komachi, M., and Matsumoto, Y. (2012). Tense and Aspect Error Correction for ESLLearners Using Global Context. In Proceedings of ACL, pages 198–202.

Tetreault, J., Foster, J., and Chodorow, M. (2010). Using Parse Features for Preposition Selectionand Error Detection. In Proceedings of ACL, pages 353–358.

Yannakoudakis, H., Briscoe, T., and Medlock, B. (2011). A New Dataset and Method for Auto-matically Grading ESOL Texts. In Proceedings of ACL, pages 180–189.

872