Using Contextual Speller Techniques and Language Modeling for ESL Error Correction
Michael Gamon, Jianfeng Gao, Chris Brockett, Alexandre Klementiev, William B. Dolan, Dmitriy Belenko, Lucy Vanderwende
Reporter: Chia-Ying Lee Advisor: Hsin-Hsi Chen
Microsoft Research & University of Illinois
IJCNLP 2008
Introduction
About 750M (74%) people use English as a second language (Crystal 1997)
Non-native writers encounter particular problems (e.g., prepositions)
Challenge: writing errors often have a semantic dimension (e.g., "at school" denotes a place, "in school" denotes a time)
Target Error Types
1. Preposition presence and choice: In the other hand, ... (On the other hand ...)
2. Definite and indefinite determiner presence and choice: I am teacher... (I am a teacher...)
3. Gerund/infinitive confusion: I am interesting in this book. (interested in)
4. Auxiliary verb presence and choice: My teacher does is a good teacher (My teacher is...)
5. Over-regularized verb inflection: I writed a letter (wrote)
6. Adjective/noun confusion: This is a China book (Chinese book)
7. Word order (adjective sequences and nominal compounds): I am a student of university (a university student)
8. Noun pluralization: They have many knowledges (much knowledge)
Problem Definition
Presents a modular system for detecting and correcting errors made by non-native writers.
Focuses on preposition- and determiner-related problems.
Related Work
Turner and Charniak (2007) utilize a language model based on a statistical parser for determiner and preposition selection
De Felice and Pulman (2007) utilize a set of sophisticated syntactic and semantic analysis features to predict 5 common English prepositions
Han et al. (2004, 2006) use a maximum entropy classifier to propose article corrections
Izumi et al. (2003) and Chodorow et al. (2007) present techniques of automatic preposition choice modeling
System Description
0. Preprocessing: tokenization and POS tagging
1. Suggestion Provider (SP): error detection and correction
2. Language Model (LM): discard suggestions whose score is lower than the original's
3. Example Provider (EP): query the web for exemplary sentences
Suggestion Provider (1/3)
Classifiers:
Presence/absence (pa) classifier, e.g., p(article + teacher) = 0.54
Choice (ch) classifier, e.g., p(the) = 0.04, p(a/an) = 0.96
Potential insertion sites are determined heuristically from the sequence of POS tags
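The pa/ch cascade above can be sketched as a two-stage decision: first decide whether an article is needed at a candidate site, then decide which one. This is a minimal illustration only; the classifiers below are hard-coded stand-ins with the slide's example probabilities, not real model outputs.

```python
# Hypothetical two-stage suggestion sketch mirroring the pa/ch cascade.
# The probabilities are illustrative stand-ins from the slide's example.

def pa_classifier(context):
    # Stand-in: probability that an article belongs at this insertion site.
    return 0.54 if context == ("am", "teacher") else 0.1

def ch_classifier(context):
    # Stand-in: distribution over article choices at this site.
    return {"the": 0.04, "a/an": 0.96}

def suggest_article(context, pa_threshold=0.5):
    """Return a suggested article for an insertion site, or None."""
    if pa_classifier(context) < pa_threshold:
        return None                       # pa step: no article needed here
    choices = ch_classifier(context)
    return max(choices, key=choices.get)  # ch step: most probable article

print(suggest_article(("am", "teacher")))  # prints: a/an
```

Splitting presence/absence from choice keeps each classifier's task simple: the ch classifier only ever sees sites where some word is known to be needed.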
Suggestion Provider (2/3)
Features (window of ±6 tokens): relative position, token string, POS tag
Example: 0/I/PRP 1/am/VBP 2/teacher/NN 3/from/IN 4/Korea/NNP 5/./.
Decision tree classifiers (WinMine toolkit, Chickering 2002); performed better than linear SVMs
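The feature encoding can be sketched as below. Note one assumption: positions here are encoded relative to the insertion site (the slide's example shows absolute indices), and the POS-tagged sentence is hard-coded for illustration rather than produced by a tagger.

```python
# Sketch of the position/token/POS feature encoding within a +-6 token window.
# The tagged sentence is hard-coded; a real system would run a POS tagger.

def extract_features(tagged, site, window=6):
    """Encode tokens around an insertion site as relpos/token/POS strings."""
    feats = []
    for i in range(max(0, site - window), min(len(tagged), site + window)):
        token, pos = tagged[i]
        feats.append(f"{i - site}/{token}/{pos}")
    return feats

tagged = [("I", "PRP"), ("am", "VBP"), ("teacher", "NN"),
          ("from", "IN"), ("Korea", "NNP"), (".", ".")]
print(extract_features(tagged, site=2))
# ['-2/I/PRP', '-1/am/VBP', '0/teacher/NN', '1/from/IN', '2/Korea/NNP', '3/./.']
```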
Suggestion Provider (3/3)
Data set:
English Encarta encyclopedia (560k sentences)
A random set of 1M sentences from a Reuters news data set
Prepositions from the NICT Japanese Learners of English corpus: about, as, at, by, for, from, in, like, of, on, since, to, with, than, "other"
Language Model
5-gram model trained on the English Gigaword corpus (LDC2005T12)
120K-word vocabulary
54 million bigrams, 338 million trigrams, 801 million 4-grams, and 12 billion 5-grams
Interpolated Kneser-Ney smoothing (Kneser and Ney 1995) without count cutoff
Scores:
I am teacher from Korea. score = 0.19
I am a teacher from Korea. score = 0.60
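The LM filtering step can be sketched as follows: a suggestion survives only if the language model scores it above the original. The toy bigram scorer with invented log-probabilities stands in for the paper's 5-gram Kneser-Ney model; everything in the table below is an illustrative assumption.

```python
import math

# Sketch of LM filtering: keep a suggestion only when the language model
# prefers it over the original. The bigram table is invented for illustration
# and stands in for the paper's 5-gram Kneser-Ney model over Gigaword.

BIGRAM_LOGP = {
    ("am", "teacher"): math.log(0.02),
    ("am", "a"): math.log(0.30),
    ("a", "teacher"): math.log(0.20),
    ("teacher", "from"): math.log(0.15),
}

def score(tokens, floor=math.log(1e-4)):
    """Sum bigram log-probabilities, backing off to a floor for unseen pairs."""
    return sum(BIGRAM_LOGP.get(bg, floor) for bg in zip(tokens, tokens[1:]))

def lm_filter(original, suggestion):
    """Return the suggestion only if the LM scores it above the original."""
    return suggestion if score(suggestion) > score(original) else None

orig = "I am teacher from Korea .".split()
sugg = "I am a teacher from Korea .".split()
print(lm_filter(orig, sugg))  # the suggestion survives the filter
```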
Example Provider (1/2)
Web search: string query over a small context window
Ranking rules:
Suggestion occurs in the same sentence
Sentence length
Context overlap
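The three ranking cues can be sketched as one sort key. The lexicographic weighting (containment first, then overlap, then length) is my assumption for illustration, not the paper's actual ranking formula.

```python
# Hypothetical ranking sketch for retrieved example sentences, following the
# slide's cues: prefer sentences containing the suggested phrase, then more
# context overlap, then shorter sentences. The weighting is an assumption.

def rank_examples(examples, suggestion, context):
    ctx = set(context.lower().split())

    def key(sentence):
        words = sentence.lower().split()
        contains = suggestion.lower() in sentence.lower()
        overlap = len(ctx & set(words))
        # Tuples sort lexicographically: containment first, overlap, length.
        return (not contains, -overlap, len(words))

    return sorted(examples, key=key)

examples = [
    "Disneyland is in California.",
    "Timothy's wish was to travel to Disneyland in California.",
]
best = rank_examples(examples, "travel to Disneyland",
                     "I want to travel to Disneyland in March")[0]
print(best)  # prints the Timothy sentence, which contains the suggestion
```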
Example Provider (2/2)
Original: I want to travel Disneyland in March.
Suggestion: I want to travel to Disneyland in March.
Top 3 examples:
1. Timothy's wish was to travel to Disneyland in California.
2. Should you travel to Disneyland in California or to Disney World in Florida?
3. The tourists who travel to Disneyland in California can either choose to stay in Disney resorts or in the hotel for Disneyland vacations.
Evaluation (1/5)
Suggestion provider: determiner choice, preposition choice
Language model
Human evaluation
70% for training; 30% for testing
Combined accuracy:
Evaluation (2/5)
Suggestion provider: determiner choice
Baseline: 69.9% (choosing the most frequent class label, none)
State of the art: Turner and Charniak (Penn Tree Bank): 86.74%
Evaluation (3/5)
Suggestion provider: preposition choice
Baseline: 28.94% (using no preposition)
Evaluation (4/5)
Language model filtering reduced the number of preposition corrections by 66.8% and determiner corrections by 50.7%, increasing precision dramatically
Accuracy of preposition suggestions: LM score + classifier probability: 62.32%; LM score alone: 58.36%
Evaluation (5/5)
Human evaluation
CLEC: Chinese Learners of English Corpus (Gui and Yang 2003)
Conclusion and Future Work
Successfully combines contextual-speller-based methods with language model scoring, and provides web-based examples
The system works with reasonable accuracy even on extremely noisy text
Future work: use web counts to build a learned ranker that combines information from the language model and the classifiers
Thank you!