Elena Volodina, elena.volodina@svenska.gu.se
Intelligent Computer-Assisted Language Learning
Self-presentation: Elena Volodina
● 1998 PhD in Linguistics (Moscow, Russia)
● 2008 MA in Language Technologies (Gothenburg, Sweden)
● 2010-... Research Engineer (Språkbanken) → 2017-... Researcher (SB-Text)
● Lärka development
● ICALL research
● Second language resources and algorithms
● L2 Swedish infrastructure
● L2 profiles
● ...
“Teachers never tell you their first names because they don't want you
to Google them”
https://spraakbanken.gu.se/eng/personal/elena
Focus on literacy
● Dutch study: average reading comprehension ~B1 level

Velleman, E., van der Geest, T. 2014. Online test tool to determine the CEFR reading comprehension level of text. Procedia Computer Science 27.
Literacy: Sweden
● PIAAC study focusing on literacy
● Sweden among the 5 "best" of 23 countries on average
● Largest discrepancy between native-born and foreign-born citizens
● → high unemployment rate
● → higher risk of deteriorated health
OECD. 2013. OECD Skills Outlook 2013. First Results from the Survey of Adult Skills.
PIAAC. 2013. Survey of Adult Skills (PIAAC).
SCB. 2013. Tema utbildning, rapport 2013:2, Den internationella undersökningen av vuxnas färdigheter. Statistiska centralbyrån.
Societal need
2015: out of 9.9 million citizens, 2.2 million have a foreign background, i.e. 22.2% (Statistiska centralbyrån)
What can we do? Cause versus symptoms
Natural Language Processing (+ technical competence)
Computer-Assisted Language Learning (+ pedagogical competence)

NLP + CALL = ICALL
ICALL development cycle
1. Defining target group (e.g. adults vs kids, healthy vs special needs, …)
2. Defining language skill (writing, speaking, reading, listening, vocabulary, grammar, …)
3. Developing resources
4. Developing tools & algorithms
5. Developing prototype
6. Evaluating prototype
7. Maintenance

Some of these steps constitute research; others are not research, or only arguably so.
Why do we need resources (data)?
● L2 exercises: context-free, understandable, level-appropriate
  https://spraakbanken.gu.se/larkalabb/infl-mc
● Sentence selection need
● Target vocabulary and grammar need
● Vocabulary exercises (L2)
● Inflection exercises (L2)
● Bundled gaps (L2)
● Word-based exercises (L2: e.g. listening)
● …
● Exercises for students of linguistics (L1)

→ Corpus of course books

Produced by experts FOR L2 learners:
→ reading comprehension texts
→ exercises
→ recordings of listening excerpts
COCTAILL corpus
Elena Volodina, Ildikó Pilán, Stian Rødven Eide and Hannes Heidarsson 2014. You get what you annotate: a pedagogically annotated corpus of coursebooks for Swedish as a Second Language. NEALT Proceedings Series 22
COCTAILL
COCTAILL ingredients
How it looks: text topics
https://spraakbanken.gu.se/larkalabb/editor
COCTAILL qualitative explorations: topics across levels
COCTAILL explorations: target skills across levels
From COCTAILL to a graded L2 receptive vocabulary: SVALex
Thomas François, Elena Volodina, Ildikó Pilán, Anaïs Tack. 2016. SVALex: a CEFR-graded lexical resource for Swedish foreign and second language learners. Proceedings of LREC 2016, Slovenia.
http://cental.uclouvain.be/cefrlex/svalex/
From course books to automatic CEFR level assessment in texts & sentences: HitEx
Ildikó Pilán, Elena Volodina, Lars Borin. (2016). Candidate sentence selection for language learning exercises: from a comprehensive framework to an empirical evaluation. TAL Journal: Special issue NLP for learning and Teaching. Volume 57, Number 3.
HitEx

Machine learning (supervised training):
● Training data: course books
● Features: POS tags, lexicons, dependency relations, readability studies, language learning resources
● Output: trained classifier
● Accuracy: 80% correct (texts), 63% correct (sentences)
https://spraakbanken.gu.se/larkalabb/hitex
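The pipeline above, a supervised classifier over hand-crafted features, can be sketched in a few lines. The feature names, the thresholds, and the toy rule standing in for the trained model below are illustrative assumptions, not the actual HitEx feature set.

```python
# Minimal sketch of feature-based sentence-level CEFR classification.
# Features and thresholds are made up for illustration; a real system
# would train an ML model on course-book sentences.

def extract_features(sentence: str) -> dict:
    """Surface features of the kind used in readability studies."""
    tokens = sentence.split()
    long_words = [t for t in tokens if len(t) > 6]  # rough LIX-style cue
    return {
        "n_tokens": len(tokens),
        "avg_word_len": sum(len(t) for t in tokens) / max(len(tokens), 1),
        "long_word_ratio": len(long_words) / max(len(tokens), 1),
    }

def predict_level(features: dict) -> str:
    """Toy threshold rule standing in for a trained classifier."""
    if features["n_tokens"] < 8 and features["long_word_ratio"] < 0.2:
        return "A"  # beginner-range sentence
    if features["n_tokens"] < 18:
        return "B"
    return "C"

feats = extract_features("Jag bor i Göteborg med min familj")
print(predict_level(feats))  # -> "A"
```

In the real setting the classifier is trained, which is exactly why the graded course-book corpus (the training data) matters so much.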
From SVALex & text classification experiments to text evaluation: TextEval
● https://spraakbanken.gu.se/larkalabb/texteval
● Text analysis platform
● Assessment of learner-written language and expert-written texts
● CEFR level (machine learning)
● Highlighting vocabulary by CEFR level (based on graded word lists)
● Out-of-vocabulary items are a challenge
Ildikó Pilán, Elena Volodina and David Alfter. 2016. Coursebook texts as a helping hand for classifying linguistic complexity in language learners' writings. Proceedings of the workshop on Computational Linguistics for Linguistic Complexity (CL4LC), COLING 2016, Osaka, Japan.
https://spraakbanken.gu.se/larkalabb/texteval
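The word-list-based highlighting can be sketched as a simple lookup. The mini lexicon below is a made-up stand-in for a graded resource like SVALex; a real implementation would look up lemmas, not surface forms.

```python
# Sketch of CEFR-level vocabulary highlighting via a graded word list.
# GRADED_LEXICON is a tiny invented sample, not real SVALex data.

GRADED_LEXICON = {"jag": "A1", "bor": "A1", "lägenhet": "A2",
                  "trivas": "B1", "dessförinnan": "C1"}

def highlight(tokens, lexicon):
    """Tag each token with its CEFR level, or OOV if unlisted."""
    return [(t, lexicon.get(t.lower(), "OOV")) for t in tokens]

tagged = highlight(["Jag", "bor", "dessförinnan"], GRADED_LEXICON)
print(tagged)  # OOV items remain a challenge, as noted above
```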
Example text
• Den 6 juni utnämndes till Sveriges officiella nationaldag först år 1983, och blev helgdag 2005. Före 1983 var dagen känd som svenska flaggans dag, men har firats som inofficiell nationaldag sedan 1916 – dessförinnan var den känd som Gustafsdagen. Huvudskälet till firandet är just att den då 27-årige Gustav Vasa valdes till kung av Sverige den 6 juni 1523, varpå Kalmarunionen upplöstes och Sverige blev självständigt. Även 1809 och 1974 års regeringsformer, som båda skrevs under den 6 juni, anges som skäl att högtidlighålla dagen. En vanlig stereotyp är att svenskars nationaldagsfirande varken är särskilt omfattande eller patriotiskt – vanligtvis i jämförelse med norrmännens 17 maj-firande. Jonas Engman, sakkunnig i traditionsfrågor vid Nordiska museet, anser att norrmännen snarare är undantaget. - Vi tittar gärna på Norge och frågar oss varför vi inte gör som norrmännen. Norge är dock nog mer ovanligt i sitt firande, sett till övriga Norden. De har varit med om krig, men det har även finländarna och danskarna. Den nationella identiteten spelade nog en stor roll under upplösningen av unionen med Sverige, säger Engman till TT. En kluven dag. Jonas Engman påpekar att den svenska attityden till den 6 juni präglas av en viss kluvenhet. - Bland annat har arbetarrörelsen, som betonade internationell solidaritet över nationell patriotism, varit inflytelserik här. Nationalismen bröt också ut sent hos oss – under sent 1800-tal – medan många av de andra europeiska nationalstaterna tillkom redan efter Napoleonkrigen. Vi är stolta över Sverige på olika sätt, mycket patriotism finns exempelvis i idrotten, säger han. Rättad: I en tidigare version av texten uppgavs fel antal år sedan första officiella firandet.
SweLL pilot
Elena Volodina, Ildikó Pilán, Ingegerd Enström, Lorena Llozhi, Peter Lundkvist, Gunlög Sundberg, Monica Sandell. 2016. SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies. Proceedings of LREC 2016, Slovenia.
SweLL pilot
Pilot statistics: topics, L1s, age, non-lemmatized items
From SweLL-pilot to a productive vocabulary: SweLLex
Elena Volodina, Ildikó Pilán, Lorena Llozhi, Baptiste Degryse, Thomas François. 2016. SweLLex: second language learners' productive vocabulary. Proceedings of the workshop on NLP4CALL&LA. NEALT Proceedings Series / LiUP.
From SweLLex and SVALex to level classification of new vocabulary: Siwoco
● https://spraakbanken.gu.se/larkalabb/siwoco
● Automatic prediction of single-word lexical complexity
● SVM, logistic regression, MLP classifiers
● Features: word length, syllables, suffix length, gender, homonymy, polysemy, compounds, n-grams, topic distribution
● Validation through crowdsourcing
David Alfter, Elena Volodina. 2018. Towards Single Word Lexical Complexity Prediction. Proceedings of the 13th Workshop on Innovative Use of NLP for Building Educational Applications (BEA), NAACL 2018.
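A few of the listed features can be computed directly from the word form. The syllable count below approximates syllables by vowel groups, which is a crude heuristic and only an assumption about how such a feature might be derived.

```python
# Illustrative feature extraction for single-word complexity prediction,
# loosely following the Siwoco feature list (length, syllables, n-grams).

import re

SWEDISH_VOWELS = "aeiouyåäö"

def count_syllables(word: str) -> int:
    """Approximate syllables as maximal runs of vowels (min 1)."""
    return max(1, len(re.findall(f"[{SWEDISH_VOWELS}]+", word.lower())))

def word_features(word: str) -> dict:
    return {
        "length": len(word),
        "syllables": count_syllables(word),
        "char_bigrams": [word[i:i + 2] for i in range(len(word) - 1)],
    }

print(word_features("bibliotek")["syllables"])  # -> 3
```

Features like these would then feed an SVM or logistic regression classifier, as the slide indicates.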
Second chance: starting NOT from scratch
Grant information
Elena Volodina, Beata Megyesi, Mats Wirén, Lena Granstedt, Julia Prentice, Monica Reichenberg, Gunlög Sundberg. 2016. A Friend in Need? Research agenda for electronic Second Language infrastructure. Proceedings of SLTC 2016, Umeå, Sweden
SweLL promises (main)
1. Deliver a well-annotated (gold standard) corpus of L2 essays
● 600 essays, approx. 100 per CEFR level A1-C1 + 100 for a control L1 learner corpus
● Incl. manual error annotation & manually checked linguistic annotation
● Make available for research (and the public?)
SweLL promises (main)
2. Set up a platform (and workflow) for:
● Continuous upload of new essays
● Manual error annotation
● Automatic linguistic annotation
SweLL promises (main)
3. Set up a platform for browsing L2 essays:
● in concordance fashion (+ parallel view)
● in full-text fashion
SweLL focus (main)
● Adult learners (16+ years)
● Healthy learners
● Written essays (no speech data)
● Where possible, longitudinal data
SweLL promises (side path, rather experimental)
● Design a set of exercises:
● to elicit (structured) responses that would answer some interesting research questions
● to create, this way, a database that could be used for research
● Develop the Lärka platform further for:
● deploying the above exercises
● linking user answers to their individual "profiles" (age, gender, L1s, …)
Data

[Figure: number of articles per year, 2013-2017: essay corpus (SweLL-corpus) creation vs SweLL-based experiments and publications]

Curious "time & effort" fact: data vs experiments
Lifetime of corpora vs tools
● Corpus creation costs both time and money, but:
● Well-documented, representative, reliably annotated and available corpora are used far beyond their initial research purpose
● The Penn Treebank (Marcus et al., 1993; cited 6,813 times) is still used for research (e.g. Pawar & Mago, 2018)
● ICLE (Granger, 1998; cited 358 times) → modern research (e.g. Möller, 2017)
● Whereas tools trained on corpora get outdated as research makes progress
Lifetime of tools
https://spraakbanken.gu.se/larka/archive (2012-2016)
https://spraakbanken.gu.se/larkalabb/ (2016-...)
Tools decay, data stay
Available data
Corpus availability (and the legal hassle)
● A necessary step acc. to GDPR (EU General Data Protection Regulation)
● Names and identities cannot be revealed or traced to the real person
● Everyone has the right to know which databases they are represented in
● Everyone has the right to withdraw from a database
● Hence, we cannot destroy the "Name ↔ ID" mapping keys if we want to have (longitudinal) data
● Anyone can demand access to the data (acc. to the Principle of Public Access to Official Records, Swedish law)
● → however, no right to use the information!
SweLL agreement form: https://goo.gl/5hKuew
GDPR
● Restrictions on the use of personal information to protect "subjects", i.e. physical persons
● Important consequences for learner corpus (L2) projects, IF you want data to be available for research!
● Metadata precautions
● Text de-identification and pseudonymization
● Name-ID mapping key handling
SweLL: L2 infrastructure project
● No information on the country of birth
● Birth year: 5-year spans, e.g. 2000-2004
● No exact date for entering the L2 country
● No information on school or teacher
● Pseudonymization of text data: names, cities, ages, professions, etc.
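The metadata coarsening above, reducing exact birth years to 5-year spans so individual learners cannot be re-identified, can be sketched in one function; the span arithmetic below is a plausible reading of the rule, not the project's actual implementation.

```python
# Minimal sketch of birth-year coarsening for pseudonymized metadata:
# an exact year is reduced to a 5-year span, e.g. 2002 -> "2000-2004".

def coarsen_birthyear(year: int, span: int = 5) -> str:
    start = (year // span) * span  # snap down to the span boundary
    return f"{start}-{start + span - 1}"

print(coarsen_birthyear(2002))  # -> "2000-2004", as in the example above
```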
Example essay (translated into English, with mocked-up errors)
• I live in Guntorp on apartement . I live with my boyfriend . His name is Hans . The apartement mine has a pattio and tree room . Jag enjoy there in Guntorp but a lot of time to goto shop , fortifive minut . I have the bus and the Guntop train . Jag lived in Norway bifore , in Tromsö . It was less than Gunntorp . I enjoy their too becaus I had more friends. I think it is hard to have friends here . But I enjoy better job here . In Tromsö jobbe I only on one website . In Guntorp I work on many website . I am webdevelooper . But Guntorp is closser to Spain than Tromsö . It is important how one lives because I am not in my country . I mess my mother and my father but I live her with my boyfriend .
To-dos (1)
● Test NER on original learner texts:
→ Can NER speed up the process? Noise? What about essays reviewing books and films, political events, etc.?
● Automate pseudonymization for English (partly done for Swedish): lists, consistency of geographical name replacements, etc.
→ Assess risks of introducing errors that were not in the original text and find ways of avoiding them
→ Add a possibility of setting the whole text into a "cultural" context, e.g. Astrid Lindgren's or Hungarian, etc.
● Test replicating grammatical forms (and errors?) in pseudonymized segments
→ e.g. Stadsbibliotekets --> The Volvo's
→ Assess the possibility of projecting MSDs from the original text and evaluate their reliability
● Link to Lärka and crowdsource pseudo-tag corrections by essay writers. Learn from "correction reports"
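The "Stadsbibliotekets --> The Volvo's" idea, carrying the grammatical marking of the original name over onto its pseudonym, can be sketched for the single case of a Swedish genitive -s. The suffix check below is a deliberate simplification; projecting full MSDs, as the to-do list proposes, is considerably harder.

```python
# Sketch of form-preserving pseudonymization: if the original name
# carries a genitive -s, replicate it on the replacement token.

def pseudonymize_with_form(original: str, pseudonym: str) -> str:
    """Copy a trailing genitive -s from the original onto the pseudonym."""
    if original.endswith("s") and not pseudonym.endswith("s"):
        return pseudonym + "s"
    return pseudonym

print(pseudonymize_with_form("Stadsbibliotekets", "Volvo"))  # -> "Volvos"
```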
Reliable and interesting data
Annotation makes data interesting/useful (you get what you annotate)
Annotation…
● …is now the place where linguistics hides in NLP (Fort, 2016)
● Parts of speech
● Base forms of words (lemmas)
● Syntactic and semantic information
● …

Karën Fort. 2016. Collaborative Annotation for Reliable Natural Language Processing. Wiley.
Annotation…
● …can "hide" disciplines other than linguistics
● (so-called) error annotation
● Target skills
● Receptive vs productive skills
● Level of proficiency in a (second/foreign) language
● Text genres
● …
Implications (for L2 corpora)
● Take other disciplines' perspectives into account, at least:
● NLP interests
● Second Language Acquisition research questions (or a minor share of those)
● It is worth investing time and money into a resource, working along:
● Corpus design (representativity, balance, availability)
● Corpus metadata
● Corpus annotation & annotation reliability
NLP needs
● NLP often
● is "applied" to other research disciplines and
● seeks to assist with other disciplines' research questions
● but there is a range of (traditional) questions:
● (automatic) error detection
● (automatic) error correction
● (automatic) essay grading
● (automatic) essay classification (e.g. by level of proficiency, genre, topic, grade…)
● L1 identification
● Linguistic complexity studies (syntax, vocabulary, etc.)
● (semi-automatic) anonymization (pseudonymization)
● Writing support / feedback
● …
SLA needs
● Longitudinal L2 data on underlying mental representations and developmental processes (e.g. Myles, 2005)
• Speech data (e.g. Myles, 2005)
• Task-based data (e.g. Alexopoulou et al., 2017)
• Individual cognitive processes (scores from intelligence tests, motivation test, aptitude tests; Granger & Paquot, 2017)
• …
SweLL corpus design principles
● Representativeness
● (most popular) immigrant languages
● age and gender
● levels of proficiency
● various tasks?
● L2 vs L1 learners/writers
● Balance
● Annotation
● Documentation
Hovy, E.H., Lavid, J.M. 2010. Towards a "Science" of corpus annotation: a new methodological challenge for corpus linguistics.
Pre-annotation decisions
Post-annotation work
Representative data
Corpus design (essays per L1, CEFR level and gender M/F; X = control group)

L1s             A1      A2      B1      B2      C1      Control   Total
Arabic          5/5     5/5     5/5     5/5     5/5     X/X       50
Dari/Persian    5/5     5/5     5/5     5/5     5/5     X/X       50
English         5/5     5/5     5/5     5/5     5/5     X/X       50
Greek           5/5     5/5     5/5     5/5     5/5     X/X       50
Croatian/BKS    5/5     5/5     5/5     5/5     5/5     X/X       50
Sorani          5/5     5/5     5/5     5/5     5/5     X/X       50
Kurmanji        5/5     5/5     5/5     5/5     5/5     X/X       50
Somali          5/5     5/5     5/5     5/5     5/5     X/X       50
Spanish         5/5     5/5     5/5     5/5     5/5     X/X       50
Tigrinya        5/5     5/5     5/5     5/5     5/5     X/X       50
Total           50/50   50/50   50/50   50/50   50/50   50/50     600
Annotation campaign management
Adriane Boyd
1. Building a corpus (data, metadata)
2. Tagset, guidelines, tool
3. Pilot with a corpus sample
4. Qualitative analysis (comparing annotators' decisions)
5. Quantitative analysis (inter-annotator agreement)
6. Annotating corpus (biweekly meetings)
7. Post-campaign: delivery, maintenance

Decision gates:
● Representative? Balanced? Accessible? (yes → continue; no → revise)
● Reliable annotation? Stable annotation? Appropriate tagset? Guidelines? (yes → continue; no → revise)

Hovy et al. 2010. Towards a "Science" of corpus annotation…
1. Building a corpus (data, metadata)
2. Tagset, guidelines, tool
3. Pilot with a corpus sample
4. Qualitative analysis (comparing annotators' decisions)
5. Quantitative analysis (inter-annotator agreement)
6. Mini-reference corpus for annotator training
7. Annotator training (collective, individual): learning curves, checks, updates to tagset and guidelines
8. Annotating corpus (regular checks)
9. Random manual checks by experts
10. Corpus publication, or reviewing/correction; delivery, maintenance

Decision gates:
● Representative? Balanced? Accessible? (yes → continue; no → revise)
● Reliable annotation? Stable annotation? Appropriate tagset? Guidelines? (yes → continue; no → revise)

Fort. 2016. Collaborative annotation…
Annotation quality
● Reliability & stability → through inter-annotator agreement checks
● Reproducibility → agreement of an annotator with themselves (intra-annotator agreement)
● Random manual checks of the annotations by experts or evaluators
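Inter-annotator agreement is commonly quantified with Cohen's kappa, which corrects raw agreement for chance. Below is a stdlib-only sketch for two annotators labelling the same segments; the error codes in the example (ORT, LEX, GRAM) are hypothetical placeholders, not the SweLL tagset.

```python
# Cohen's kappa for two annotators over the same segments:
# kappa = (observed agreement - expected agreement) / (1 - expected).

from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label distribution.
    expected = sum(freq_a[l] * freq_b[l] for l in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

a = ["ORT", "ORT", "LEX", "GRAM", "ORT"]  # hypothetical error codes
b = ["ORT", "LEX", "LEX", "GRAM", "ORT"]
print(round(cohens_kappa(a, b), 2))  # -> 0.69
```

As the quotes later in this talk suggest, the harder question is not computing the number but deciding when it is good enough.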
Error taxonomy
Error annotation
● Don't say the "E-word"! (Julia Prentice, EuroSLA, submitted)
● Negative connotation (SLA)
● "Norm deviations" is not better, though
● Interlanguage phenomenon (Díaz-Negrillo et al., 2009)
● Practice-oriented view as a "non-norm adequate form" (Dobric, 2015)
● Unexpected uses (Gaillat et al., 2014)
● Cross-disciplinary misunderstanding?
● Ideally, counter-balance error annotation with so-called "can-do" annotation
● → would allow for e.g. CAF analysis (Complexity, Accuracy, Fluency) (Wolfe-Quintero et al., 1998)
● → would probably help (a bit) to close the gap between SLA, LCR & NLP
What's in a name? That which we call a rose
by any other name would smell as sweet.
Shakespeare
Error → Correction annotation
Ideal picture (errors + can-do's)

Phenomenon                                            Annotation
Linguistic element absent                             No annotation
Linguistic element present, but in a deviating form   Error-annotated / can-do annotated segment
Linguistic element present in a correct form          Can-do annotated segment
Taxonomy
Taxonomies are like underwear; everyone needs them, but no one wants someone else’s
Anon
Standards are like toothbrushes; everyone likes the idea of them, but no one wants someone else's
Anon
Egon Stemle, EURAC, Italy
SweLL pre-pilot experiment
● ASK versus MERLIN taxonomy
● …was used by project researchers on 2 essays (i.e. producing 4 files each)
● …time was taken
● …experiences were recorded
SweLL pre-pilot experiment
● Summary:
● It takes twice as long to use the MERLIN taxonomy
● The ASK taxonomy (L2 Norwegian) is closer to L2 Swedish
● ASK lacks some useful tags
● Decision: enrich the ASK taxonomy with a few MERLIN tags
Taxonomy ambiguity
Normalization
* I has was
● Re-writing the L2 learner original in a normative way, creating a so-called target hypothesis (Lüdeling et al., 2005)
Normalization
* I has was → I have been? I was? I had?
Normalization: basic principles
• Minimal change• Positive assumption• Lexical and grammatical competence prior to functional and
structural correctness
Minimal change…
Example: * Jag trivs mycket bor med dem. (Eng: I enjoy much live with them.)

Potential target hypotheses:
● Jag trivs mycket bra med dem → minimal change (seemingly) → error: wrong word / spelling?
● Jag trivs mycket med att bo med dem → lexical competence of BO, verb → errors: idiomaticity error (trivs) + wrong verb form (bo)
Why normalization as a separate step?
● It helps to build a better understanding of a learner's linguistic competence
● It can be outsourced to SLA researchers
● Error annotation depends on the change applied to the original text, and as such it is not ERROR annotation but CORRECTION annotation
● Inter-annotator agreement with respect to error codes can be objectively measured only if the annotators are working on the same normalized version
SweLL normalization tool
● Transformation-based
● String matching & calculating diffs
● Linking on the fly (original vs normalized versions)
● Parallel text view
● Coming (if ever):
● drop-down menus for error codes
● drag-and-drop ("spaghetti" view)
● three-tier representation (original, spell-corrected, normalized)
● Desired:
● support for automatic spelling error detection
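The "string matching & calculating diffs" step, aligning a learner original with its normalized version so the two can be linked on the fly, can be sketched with the standard library's `difflib`. This is one plausible way to derive such links, not the tool's actual algorithm, and the example sentence reuses the target hypothesis discussed above.

```python
# Token-level diff between a learner original and its normalization,
# yielding the edited spans that an annotation tool could link.

from difflib import SequenceMatcher

orig = "Jag trivs mycket bor med dem".split()
norm = "Jag trivs mycket bra med dem".split()

matcher = SequenceMatcher(a=orig, b=norm)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":  # report only edited spans
        print(tag, orig[i1:i2], "->", norm[j1:j2])
```

Error codes would then be attached to exactly these edited spans, which is why annotators must share the same normalized version for agreement to be measurable.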
Dan Rosén, developer
Arild Matsson, research engineer
SweLL normalization & error-annotation tool: hands-on demo
• https://spraakbanken.gu.se/swell/dev/
Inter-annotator agreement (IAA), pilot 1
What to compare?
COCTAILL “ingredients”
IAA: How it looks: text example
Freja looks into Jonas's horoscope: You are playful, and if you can choose, you'd spend the day getting to know better somebody you are acquainted with. The evening will be romantic.

And then into her own: The love life is a mess, but otherwise the day will be funny, sensual and entertaining. Don't work yourself up. You will receive compliments from somebody in your surroundings.
IAA: How it looks: text topics to choose from
IAA: How it looks: result: (1) culture and traditions, (2) daily life, (3) relations with other people, (4) religion; myths and legends
Intra- & inter-annotator agreement…
● "…if humans can agree on something at N%, systems will achieve (N-10)%…" (Hovy & Lavid, 2010)
● "In Składnica, a Polish treebank, 20% of the agreed annotations were in fact wrong." (Fort, 2016; Woliński et al., 2011)
● "Whatever measure(s) is/are employed, the annotation manager has to determine the tolerances: when is the agreement good enough?" (Hovy & Lavid, 2010)
● "…perhaps it doesn't matter what the agreement level is, as long as poor agreements are seriously investigated." (Hovy & Lavid, 2010)
Finally
● Central question in manual annotation: how to obtain reliable, useful and consistent annotations?
● Annotation in corpora has a theoretical impact: empirical observations → extension/redefinition of theory
● Annotation in corpora has a practical impact: application within teaching, tool and algorithm building

"The NLP community generally is not very concerned with theoretical linguistic soundness. The Corpus Linguistics community does not seem to seek 'reliability' in the annotation process and results." (Hovy and Lavid, 2010)
Lesson 1
● Do not underestimate the time it takes to collect and prepare data
● Preparing a resource can be a research & development project in itself (e.g. structured input from exercises for VPs and NPs + providing feedback on that)
Time-effect ratio consequences
● Researchers skip compiling their own data
→ use what is available
→ in the end, often targeting English
Lesson 2
● Take time to study legal regulations, so as not to waste previously collected data
→ There are "loopholes", but not without information loss
Question to you
● In your Master's thesis:
→ Do you plan to collect & prepare data yourself?
→ Or is the data already available?
Recommended