Upload
exascale-infolab
View
562
Download
2
Embed Size (px)
Citation preview
Correct Me If I’m Wrong: Fixing Grammatical
Errors by Preposition Ranking
Roman Prokofyev, Ruslan Mavlyutov, Gianluca Demartini and
Philippe Cudre-Mauroux
eXascale Infolab
University of Fribourg, Switzerland
November 4th, CIKM’14
Shanghai, China
1
Motivations and Task Overview
2
• Grammatical correction is important by itself
• Also as a part of Machine Translation or Speech Recognition
Correction of textual content written by English Learners.
⇒ Rank candidate prepositions by their likelihood of being
correct in order to potentially replace the original.
I am new in android programming.
[to, at, for, …]
State-of-Art in Grammar Correction
Approaches:
• Multi-class classification using pre-defined candidate set
• Statistical Machine Translation
• Language modeling
Features:
• Lexical features, POS tags
• Head verbs/nouns, word dependencies
• N-gram counts from large corpora (Google Web 1-T)
• Confusion matrix
3
Key Ideas
• Usage of a particular preposition is governed by a
particular word/n-gram;
⇒ Task: select/aggregate n-grams that influence
preposition usage;
⇒ Use n-gram association measures to score each
preposition.
4
Processing Pipeline
5
m...1
n-gram
construction
Supervised
Classifier
Sentence
Tokenization
Feature
extractionFeature
extractionFeatures N-gram
Statistics
Documentforeach
foreach
Sentence
Extraction
Corrected
SentenceCorrected
Document
List of
n-grams
Ranked list
of
prepositions
for n-gram
Tokenization and n-gram distance
6
N-gram Type Distance
the force PREP 3gram -2
force be PREP 3gram -1
be PREP you 3gram 0
PREP you . 3gram 1
N-gram Type Distance
be PREP 2gram -1
PREP you 2gram 1
PREP . 2gram 2
N-gram association measures
Motivation:
use association measures to compute a score that will be
proportional to the likelihood of an n-gram appearing
together with a preposition.
Background N-gram collection: Google Books N-grams.
7
N-gram PMI scores by preposition
force be PREP (with: -4.9), (under: -7.86), (at: -9.26), (in: -9.93), …
be PREP you (with: -1.86), (amongst: -1.99), (beside: -2.26), …
PREP you . (behind: -0.71), (beside: -0.82), (around: -0.84), …
N-grams with determiner skips
8
Around 30% prepositions are used in proximity to
determiners (“a”, “the”, etc.)
Determiners Nouns
Correct preposition ranks
“one of the most” “one of most”
PMI-based Features
9
• Average rank of a preposition among the ranks of the
considered n-grams;
• Average PMI score of a preposition among the PMI
scores of the considered n-grams;
• Total number of occurrences of a certain preposition on
the first position in the ranking among the ranks of the
considered n-grams.
Calculated across 2 logical groups (considered n-grams):
• N-gram size;
• N-gram distances.
Central N-grams
10
Distribution of correct preposition counts on top of PMI
rankings with respect to n-gram distance.
Other features
• Confusion matrix values
• POS tags: 5 most frequent tags + “OTHER” catch-all tag;
• Preposition itself: sparse vector of the size of the
candidate preposition set.
11
to in of for on at with from
to 0.958 0.007 0.002 0.011 0.004 0.003 0.005 0.002
in 0.037 0.79 0.01 0.009 0.066 0.036 0.015 0.008
Preposition selection
Supervised Learning algorithm.
• Two-class classification with a confidence score for every
preposition from the candidate set;
• Every candidate preposition will receive its own set of
feature values;
Classifier: random forest.
12
Training/Test Collections
Training collection:
• First Certificate of English (Cambridge exams)
Test collections:
• CoNLL-2013 (50 essays written by NUS students)
• StackExchange (historical edits)
Cambridge FCE CoNLL-2013 StackExhange
N# sentences 27k 1.4k 6k
13
Experiments: overview
1. Feature importance scores
2. N-gram sizes
3. N-gram distances
4. CoNLL and StackExchange evaluation
14
Experiments: Feature Importances
All top features except “confusion matrix” are based on the
PMI scores.
15
Feature name Importance score
Confusion matrix probability 0.28
Top preposition counts (3grams) 0.13
Average rank (distance=0) 0.06
Central n-gram rank 0.06
Average rank (distance=1) 0.05
N-gram Sizes and Distances
Distance restriction Precision Recall F1 score
(0) 0.3077* 0.3908* 0.3442*
(-1,1) 0.3231 0.4166 0.3637
(-2,2) 0.3214 0.4222 0.3648
(-5,5) 0.3223 0.4028 0.3577
None 0.3214 0.3924* 0.3532
16
N-gram sizes Precision Recall F1 score
{2,3}-grams 0.3005* 0.3879* 0.3385*
{3}-grams + skip n-grams (4) 0.2931* 0.4187 0.3447
{2,3}-grams + skip n-grams 0.3231 0.4166 0.3637
* indicates a statistically significant difference to the best performing
approach (in bold)
Test Collection Evaluation
17
Collection Approach Precision Recall F1
score
CoNLL-2013
NARA Team @CoNLL2013 0.2910 0.1254 0.1753
N-gram-based classification 0.2592 0.3611 0.3017
StackExchange
N-gram-based classification 0.1585 0.2185 0.1837
N-gram-based classification
(cross-validation)
0.2704 0.2961 0.2824
Takeaways
• PMI association measures
• + Two-class preposition ranking
⇒ allow to significantly outperform the state of the art.
• Skip n-grams contribute significantly to the final result.
18
Roman Prokofyev (@rprokofyev)
eXascale Infolab (exascale.info), University of Fribourg, Switzerland
http://www.slideshare.net/eXascaleInfolab/
Two-class vs. Multi-class classification
Multi-class:
• input: one feature vector
• output: one of N classes (1-N)
Two-class (still with N possible outcomes)
• input: N feature vectors
• output: 1/0 for each of N feature vectors
19
Training/Test Collections
Training collection:
• Cambridge Learner Corpus: First Certificate of English
(exams of 2000-2001), CLC FCE
Test collections:
• CoNLL-2013 (50 essays written by NUS students)
• StackExchange (StackOverflow + Superuser)
CLC FCE CoNLL-2013 SE
N# sentences 27k 1.4k 6k
N# prepositions 60k 3.2k 15.8k
N# prepositional errors 2.9k 152 6k
% errors 4.8 4.7 38.2
20
Evaluation Metrics
Precision =
Recall =
F1 =
21
valid _ suggested _corrections
total _ suggested _corrections
valid _ suggested _corrections
total _valid _corrections
2·Precision·Recall
Precision+Recall