Showcasing the potential of error-annotated learner corpora for profiling research
Jennifer Thewissen, Centre for English Corpus Linguistics (CECL)
Profiling research
Definition: finding ‘criterial features’ that discriminate between different levels of proficiency (e.g. Hawkins & Buttery, 2010)
CEF levels: C2, C1, B2, B1, A2, A1
Feature we focussed on
Construct of accuracy, viz. errors
Focus on four proficiency levels, viz. B1, B2, C1, C2
Aim: to see whether errors constitute a ‘criterial feature’ that distinguishes these levels
Data & methodology
International Corpus of Learner English (Granger et al., 2009)
L1      Total scripts   Total tokens
FR      74              50060
GE      71              49540
SP      78              51385
Total   223             150985
Threefold analysis
Error annotation, i.e. error tagging phase
CEF rating phase
Error counting phase
Error annotation
Broad error category   Description
F                      Form, spelling errors
G                      Grammatical errors
L                      Lexical errors
X                      Lexico-grammatical errors
Q                      Punctuation errors
W                      Word missing, word redundant, word order
S                      Sentence unclear, incomplete
Error tagging examples
The fast spread of television can transform it into a double-edged (FS) wheapon $weapon$.
I will try to give several (XNUC) proofs $proof$ of the truth of the sentence.
46 error subcategories. Result: a detailed error profile per text
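The two tagged examples can be read mechanically. The sketch below is illustrative only and is not the project's own tooling. It assumes each error is written as "(TAG) erroneous-token $correction$", that the erroneous span is a single token, and that the first letter of a tag gives its broad category. The names `TAG_PATTERN` and `extract_errors` are hypothetical.

```python
import re

# Broad categories keyed by the first letter of a tag (taken from the category table).
BROAD_CATEGORIES = {
    "F": "Form, spelling",
    "G": "Grammatical",
    "L": "Lexical",
    "X": "Lexico-grammatical",
    "Q": "Punctuation",
    "W": "Word missing, word redundant, word order",
    "S": "Sentence unclear, incomplete",
}

# Assumed layout: "(TAG) erroneous_token $correction$" with a one-token error span.
TAG_PATTERN = re.compile(r"\(([A-Z]+)\)\s+(\S+)\s+\$([^$]*)\$")


def extract_errors(text):
    """Return one record per tagged error found in an error-tagged sentence."""
    records = []
    for tag, error, correction in TAG_PATTERN.findall(text):
        records.append({
            "tag": tag,
            "broad": BROAD_CATEGORIES.get(tag[0], "unknown"),
            "error": error,
            "correction": correction,
        })
    return records


examples = [
    "The fast spread of television can transform it into a double-edged (FS) wheapon $weapon$.",
    "I will try to give several (XNUC) proofs $proof$ of the truth of the sentence.",
]
for sentence in examples:
    for record in extract_errors(sentence):
        print(record["tag"], record["broad"], record["error"], "->", record["correction"])
```

Collecting such records per script would give the kind of per-text error profile mentioned above. Multi-word error spans would need a different delimiter convention than the one assumed here.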
The CEF rating procedure
Individual rating of the 223 learner scripts according to the linguistic descriptors in the Common European Framework of Reference for Languages (CEF) (Council of Europe, 2001)
B1, B2, C1 or C2 (with + and – increments)
2 professional raters (+ 1 rater in cases of wide disagreement) (r = 0.70)
Tracking development
CEF score + error profile
Development: progress? stabilisation? regression?
Error counting: potential occasion analysis (GNN)
Learner corpus sample
Error-tagged data: total noun-number errors
POS-tagged data (CLAWS7): total nouns used
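A minimal sketch of the potential occasion idea for noun-number errors (GNN): the number of errors is divided by the number of places where the error could have occurred, here the total number of nouns. It is not the study's actual procedure. It assumes POS-tagged tokens are written as word_TAG, that CLAWS7 noun tags begin with "N", and that the rate is expressed per 100 nouns. The function names and the toy sentence are invented for illustration.

```python
import re


def count_error_tags(error_tagged_text, tag="GNN"):
    """Count occurrences of one error tag, e.g. "(GNN)", in error-tagged text."""
    return len(re.findall(r"\(" + re.escape(tag) + r"\)", error_tagged_text))


def count_nouns(pos_tagged_text):
    """Count tokens whose POS tag starts with "N" (assumed word_TAG format)."""
    count = 0
    for token in pos_tagged_text.split():
        _word, sep, pos = token.rpartition("_")
        if sep and pos.startswith("N"):
            count += 1
    return count


def potential_occasion_rate(errors, occasions):
    """Errors per 100 potential occasions (the per-100 scaling is an assumption)."""
    if occasions == 0:
        raise ValueError("no potential occasions: the rate is undefined")
    return 100.0 * errors / occasions


# Invented toy script, not corpus data.
error_tagged = "She bought two (GNN) book $books$ and a pen ."
pos_tagged = "She_PPHS1 bought_VVD two_MC book_NN1 and_CC a_AT1 pen_NN1 ._."

errors = count_error_tags(error_tagged)
nouns = count_nouns(pos_tagged)
print(errors, nouns, potential_occasion_rate(errors, nouns))
```

Normalising by nouns rather than by total tokens keeps a script that simply uses more nouns from looking less accurate than it is.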
Statistical analyses: ANOVA & Ryan (GNN)
CEF score   N    Ryan-derived groupings
                 Subset 1   Subset 2   Subset 3
C2          28   0.32
C1          67   0.70       0.70
B2          62              0.99       0.99
B1          66                         1.23
GNN = [B1/B2]>[B2/C1]>[C1/C2]
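The summary notation can be derived from the homogeneous subsets. The sketch below assumes three overlapping subsets, {C2, C1}, {C1, B2} and {B2, B1}, which is a reading of the flattened table that agrees with the GNN summary line. It only formats and queries the groupings; it does not perform the ANOVA or the Ryan procedure. The function names are hypothetical.

```python
# Mean GNN values per CEF level, as reported in the table.
means = {"B1": 1.23, "B2": 0.99, "C1": 0.70, "C2": 0.32}

# Assumed homogeneous subsets (levels in the same subset do not differ significantly).
subsets = [["C2", "C1"], ["C1", "B2"], ["B2", "B1"]]


def grouping_notation(means, subsets):
    """Format subsets as e.g. "[B1/B2]>[B2/C1]", highest error rates first."""
    def subset_mean(levels):
        return sum(means[level] for level in levels) / len(levels)

    parts = []
    for levels in sorted(subsets, key=subset_mean, reverse=True):
        ordered = sorted(levels, key=lambda level: means[level], reverse=True)
        parts.append("[" + "/".join(ordered) + "]")
    return ">".join(parts)


def differ_significantly(level_a, level_b, subsets):
    """Two levels differ only if no homogeneous subset contains both."""
    return not any(level_a in s and level_b in s for s in subsets)


print("GNN =", grouping_notation(means, subsets))
print(differ_significantly("B1", "C1", subsets))  # no shared subset
print(differ_significantly("B1", "B2", subsets))  # both in one subset
```

Read this way, adjacent levels never separate for GNN, while levels two steps apart (B1 vs. C1, B2 vs. C2) do.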
Results for profiling research
4 main error developmental patterns
Error developmental pattern           Illustration
Improvement-only pattern              B1>B2>C1>C2
Improvement & stabilisation pattern   e.g. B1>[B2/C1/C2]
Stabilisation-only pattern            [B1/B2/C1/C2]
Partly regressive pattern             B2>B1
Two dominating error patterns
Dominating error pattern   Number of error categories   Examples
B1>[B2/C1/C2]              17 (37%)                     Spelling; uncountable nouns; lexical phrases; adjective number errors; unclear sentences
[B1/B2/C1/C2]              16 (35%)                     Tenses; punctuation confusion; verb complementation; noun complementation
Where do progress and stabilisation mainly occur? Discriminating power of errors
Adjacent proficiency levels   Number of discriminating error types
B1>B2                         20
B2>C1                         3
C1>C2                         2
[B2/C1/C2]                    33
Preliminary observations for profiling research
Some concluding remarks
Errors (negative features): stronger discriminatory power between certain levels (viz. B1 vs. B2) than others (viz. B2 vs. C1 vs. C2)
Need to capture features other than errors (e.g. positive features)
Conclusion for profiling research: errors are useful but they are not enough in and of themselves