Trinity College London
English qualifications for real-world communication
TECHNOLOGY FOR TEACHERS IN
ASSESSMENT – THE IMMEDIATE FUTURE
1&2 November, 2018
Alex Thorp
Lead Academic - Europe
How far down the digital road will EL assessment go?
Overview
1. Back to the start – Introduction
2. Introducing AI – history and definitions
3. AI and language – NLP
4. Chatbots
5. AI and Language assessment
(Speaking focus)
6. Case study – Communicative competence
7. Summary
8. Test evaluation – the 3 c’s
9. Future considerations
Introduction
Introduction – true or false?
Current AI still hugely limited, processing equivalent to a 2 year old
AI and, more particularly, NLP can now offer a fully automated 4-skill assessment solution
AI dates back as far as the 1950s
The human brain provided the model for modern machine learning
That which humans find easy, computers find difficult – and vice versa
Elon Musk labelled AI ‘a fundamental risk to the existence of civilization’
Machine scoring is more reliable than human scoring
I’ve utilized AI this morning!
Spot the odd one out?
Name of tool | Developer | Language learning/testing
Write & Improve | English Language iTutoring | Learning
Write & Improve +Class View | English Language iTutoring | Learning
Write & Improve +Test Zone | English Language iTutoring | Testing
Read & Improve (coming soon) | English Language iTutoring | Learning
Duolingo | Duolingo | Learning / testing
e-Rater | ETS | Testing
Writing Mentor | ETS | Learning
Language Muse Activity Palette | ETS | Learning / testing
AuraLang | AuraLang | Learning
BetterAccentTutor | Better Accent | Learning
TriplePlayPlus | Syracuse Language Systems | Learning
Test of English Language Learning | Pearson | Testing
Intelligent Essay Assessor | Pearson | Testing
IntelliMetric | Vantage Learning | Testing
MyAccess! | Vantage Learning | Learning
Project Essay Grade | MI | Learning / testing
Summary table of identified commercially-available language learning and language testing tools. Gillings et al. 2018
Computers as ‘tutors or tools’?
Introducing AI
Coding is the application of linguistic resource through a range of cognitive processes to generate meaning – often described as competences
Back to basics - Communication cycle
Back to the beginning
29,086 measures barley 37 months. Kushim
A clay tablet with an administrative text from the city of Uruk, c.3400–3000 BC. Probably our first ever recorded code. If Kushim was indeed a person, he may be the first individual in history whose name is known to us!
Y N Harari 2015
Let’s go back
Partial scripts
Numerical partial script became the language of advancement
As societies developed, external codes were required to cope with the sociological demands of supporting larger collectives
Full scripts
On to the era of computers…
Can computers think like humans?
H Simon and A Newell – Pittsburgh 1955. A thinking machine?
Can computers think like humans?
Alan Turing, 1948: 1st chess programme
How to overcome Combinatorial Explosion?
How to give intelligence to make good decisions?
Turing developed rules to guide.
The birth of Classical AI
A problem defined, a set of programmed rules applied (Heuristics)
Could plan complex operations in highly controlled environments
Could deliver maximum efficiency and economy
But classical AI couldn’t engage with its environment
Our world is a little more… chaotic
Enter Machine learning
Systems’ ability to learn for themselves from raw data (training datasets)
Systems learn from first principles – from structure in data – and seek potential solutions to problems
• Image recognition
• Voice recognition
• Optical character recognition
• Advanced customisation
• Intelligent data analysis
• Sensory data analysis
- Model (predicts) based on Parameters
- Input to inform (training data)
- Learner (adjusts parameters through differences in prediction and actual)
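The model / input / learner split above can be sketched in a few lines of Python. This is a minimal illustration only: the straight-line model, the toy data and the names `predict`, `train` and `DATA` are assumptions made for the example, not part of the original talk.

```python
# Toy training data (assumed for illustration): points on the line y = 2x + 1.
DATA = [(0, 1), (1, 3), (2, 5), (3, 7)]


def predict(params, x):
    """Model: predicts y from x using the current parameters (w, b)."""
    w, b = params
    return w * x + b


def train(data, epochs=1000, learning_rate=0.05):
    """Learner: nudges the parameters using prediction-minus-actual."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, actual in data:
            error = predict((w, b), x) - actual
            # Move each parameter a small step against the error.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


w, b = train(DATA)
print(round(w, 3), round(b, 3))
```

Nobody tells the learner that the answer is "2x + 1"; it arrives there only by repeatedly comparing its predictions with the training data.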
1960s – Bayesian methods introduced for probabilistic inference
1980s – back propagation
1990s: Shift from knowledge to a data driven approach – analysis of large amounts of data
>1990s: Support Vector Machines and Recurrent Neural Networks
2010>: ANN and Deep learning
Enter Machine learning
Machine learning: Algorithms that parse data, learn from that data, and then apply what they’ve learned to make informed decisions.
The algorithm needs to be told how to make an accurate prediction
The Moravec Paradox
The things that our brains find difficult to cope with, that require a lot of conscious mental effort, like chess, were simple for AI.
The things that our brains find easy to cope with, that require little conscious mental effort, like making sense of what we see and hear, or movement, were very difficult for AI
“We are prodigious Olympians in perceptual and motor areas… abstract thought though is a new trick.. We’ve not yet mastered it”
(Moravec 1988)
How does ML work? Enter Artificial Neural Networks
You recognised a dog instantaneously, by the firing of choral assemblies of neural networks
Neural Networks consist of the following components
• An input layer, x
• An arbitrary number of hidden layers
• An output layer, ŷ
• A set of weights and biases between each layer, W and b
• A choice of activation function for each hidden layer, σ
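To make the components listed above concrete, here is a hedged sketch of a single forward pass through a network with one hidden layer. The sigmoid activation, the layer sizes and the specific weight values are assumptions chosen for the example; using σ on the output layer as well is also an assumption.

```python
import math


def sigmoid(z):
    """Activation function (sigma): squashes any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One layer: weighted sum of the inputs plus a bias, then activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, W1, b1, W2, b2):
    """Input layer x -> hidden layer -> output layer y_hat."""
    hidden = layer(x, W1, b1)
    return layer(hidden, W2, b2)


# Illustrative (untrained) weights: 2 inputs, 2 hidden units, 1 output.
W1 = [[0.5, -0.4], [0.3, 0.8]]
b1 = [0.0, -0.1]
W2 = [[1.2, -0.7]]
b2 = [0.05]

y_hat = forward([1.0, 0.0], W1, b1, W2, b2)
print(y_hat)
```

Training consists of adjusting W and b; the structure of the forward pass stays the same.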
Enter Artificial Neural Networks
Artificial Neural Networks
[Diagram: a neural network classifying a sentence sample as ‘Sentence’ or ‘Non sentence’. First-layer nodes: Is there a full stop? Is there a capital? Is it at start of para? Is there a subject? Is there an object? Is there a noun? Later nodes: SVOCA? OC? VA?]
Training data
Training data – each time we tell it what it’s looking at, it tweaks the connections to better recognise what it’s looking for.
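The idea of "tweaking the connections" each time the system is told the right answer can be sketched with a single artificial neuron (a perceptron), which is far simpler than a real multi-layer network. The two features (full stop, initial capital), the sample texts and all function names below are assumptions made for the illustration.

```python
def features(text):
    """Two yes/no questions about the sample, encoded as 1.0 or 0.0."""
    text = text.strip()
    return [
        1.0 if text.endswith(".") else 0.0,    # Is there a full stop?
        1.0 if text[:1].isupper() else 0.0,    # Is there a capital?
    ]


def classify(text, weights, bias):
    """Return 1 for 'Sentence', 0 for 'Non sentence'."""
    total = sum(w * v for w, v in zip(weights, features(text))) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20):
    """Each labelled sample nudges the connection weights if the guess was wrong."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in samples:
            error = label - classify(text, weights, bias)
            weights = [w + error * v for w, v in zip(weights, features(text))]
            bias += error
    return weights, bias


# Hypothetical labelled training data: 1 = sentence, 0 = non sentence.
SAMPLES = [
    ("The dog barked.", 1),
    ("She reads books.", 1),
    ("the dog barked", 0),
    ("Running fast", 0),
    ("and then.", 0),
]

weights, bias = train_perceptron(SAMPLES)
print(classify("Birds sing.", weights, bias), classify("birds sing", weights, bias))
```

The neuron only learns the surface pattern in its two features; it has no notion of subject, object or meaning, which is the limitation the later slides return to.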
AI is now booming
• Optimise harvesting
• Interpret medical images
• Grade students
• Identify financial opportunities
• Driverless cars
AI ANN: taught, then develops
Runs tens of thousands of simulations every second and chooses the best one
Enter Deep Learning
Solve intelligence. Use it to make the world a better place. (Mission statement – DeepMind)
Demis Hassabis - CEO
Entering a process (e.g. playing a game) through a ‘learning algorithm’ that changes millions of connections in a neural network to reinforce or stop an action to improve the desired outcome (not task-based algorithm)
Uses Representation Learning –automatically discovers characteristics needed for feature detection or classification of raw data, that is then used to perform a task
Deep learning: ML requires input – DL can learn by itself through a learning algorithm. E.g. automatic light – ML accepts only ‘dark’, DL would learn ‘I can’t see’
Could a DL neural network system go beyond human understanding?
AlphaGo played a completely unpredictable move – can come up with a new idea beyond the remit of human thought….
Let’s ‘Go’
In DL systems, the algorithm learns how to make accurate predictions through its own data processing (ML needs to be told).
AI Limitations
Can find patterns in, and learn from, data, but has no real understanding of what those patterns actually mean; there is no meaningful conceptual thinking.
• Patterns in complex data
• Convert data into meaningful concepts
• Process ‘predictable’ (images / outcomes)
• ‘Understand’ content or images – easily tricked
With no real conceptual understanding of patterns, the hardest challenge of all is the ability that relies on exactly this: language
Prof Al Khalili
• Data engagement beyond human capacity
• Operate autonomously – based on training datasets
AI and language
Recognise these?
Chatbot
NLP
NLU
ASR
NLG
AI
SDS
DMS
Coding is the application of linguistic resource through a range of cognitive processes to generate meaning – often described as competences
Communication cycle
AI in language - NLP
Automated Speech Recognition (ASR) Speech generation
Text recognition Text generation (NLG)
[Response driven]
When was Elvis born?
AI in language – Speech Recognition
Limited until advent of AI and Machine Learning techniques
Collect waveforms (phonetic input)
Fast Fourier Transform = spectrogram
Identifies resonances of production
Labels ‘Formants’, recognising phonemes, words and phrases
Converts to text –‘best fit hypothesis’
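The waveform-to-spectrum step above can be illustrated in a hedged way. The sketch below uses a plain discrete Fourier transform (slower than a true Fast Fourier Transform, but it computes the same spectrum) on one frame of a synthetic 50 Hz tone; a spectrogram is a sequence of such spectra over successive frames. The tone, the sample rate and the function names are assumptions for the example.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each frequency bin up to half the sample rate (plain DFT)."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total))
    return magnitudes


def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest bin, ignoring the constant (0 Hz) bin."""
    magnitudes = dft_magnitudes(samples)
    peak_bin = max(range(1, len(magnitudes)), key=lambda k: magnitudes[k])
    return peak_bin * sample_rate / len(samples)


# Synthetic 'waveform': a pure 50 Hz tone, 160 samples at 800 samples per second.
SAMPLE_RATE = 800
TONE = [math.sin(2 * math.pi * 50 * t / SAMPLE_RATE) for t in range(160)]

print(dominant_frequency(TONE, SAMPLE_RATE))
```

Real speech contains several resonances at once, so a recogniser looks for multiple peaks per frame rather than a single dominant frequency.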
ASR Challenges – Who ate all the cake?
I think David ate all the delicious chocolate cake.
Tonic / Keywords / Onset – Volume / Pitch / Length / Pausing
Remarkable number of variables - immense amount of comparative data to be processed to arrive at correct hypothesis as to meaning beyond denotation.
Yet any communication act is a combination of oral production and non-verbal cues, paralinguistics and contextual parameters.
AI in language – Speech Recognition
Formants – limited with 44 phonemes and syntactic training. If only it were that easy :-)
Requires a ‘Language model’
Automatic Speech Recognition (ASR)
[Diagram: INPUT speech signal (audio) → decoding → OUTPUT orthographic representation. Decoding draws on language models, an acoustic model and lexical data, built from training data.]
Learns with more training data
Text recognition
Fails if can’t parse sentences – higher risk when rules based
Can process sentence meaning (denotative)
Tag sentence structure (syntactic)
Parse tree - tag words with likely part of speech
Phrase structure rules (e.g. parts of speech)
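A rules-based version of the tagging and parsing steps above might look like the following sketch. The tiny lexicon and the single phrase structure rule (sentence = noun phrase + verb + noun phrase) are assumptions invented for the illustration; real systems use far larger grammars or statistical models.

```python
# Hypothetical mini-lexicon mapping words to their likely part of speech.
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "cake": "N", "david": "N",
    "ate": "V", "saw": "V",
}


def tag(sentence):
    """Tag each word with its part of speech, or UNK if it is not in the lexicon."""
    words = sentence.lower().rstrip(".?!").split()
    return [(word, LEXICON.get(word, "UNK")) for word in words]


def parse_noun_phrase(tags, start):
    """Rule NP -> Det N | N. Returns the index after the phrase, or None."""
    i = start
    if i < len(tags) and tags[i] == "Det":
        i += 1
    if i < len(tags) and tags[i] == "N":
        return i + 1
    return None


def is_sentence(sentence):
    """Rule S -> NP V NP. True only if the whole input fits the rule."""
    tags = [pos for _, pos in tag(sentence)]
    i = parse_noun_phrase(tags, 0)
    if i is None or i >= len(tags) or tags[i] != "V":
        return False
    return parse_noun_phrase(tags, i + 1) == len(tags)


print(tag("David ate the cake."))
print(is_sentence("David ate the cake."), is_sentence("David ate the zebra."))
```

The second call fails simply because "zebra" is missing from the lexicon, which illustrates why the risk of failure is higher when the system is purely rules based.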
Natural Language Generation (text)
Fails if can’t access relevant semantic meaning – higher risk when rules based
Produces sentence ‘parsed text’ related to meaning (denotative)
Knowledge Graph generated (Google 70b+ facts end 2016)
Exploits web of semantic information (entities linked through meaningful relationships)
Codifying of language applied
Speech synthesis
• Speech recognition in reverse
• Text broken into phonetic elements
• Speech sound generated
• Rules of phonemic representation manipulable
• ML can extrapolate models from input (training) data
Putting the pieces together
So a computer can…
• Convert our speech to text
• Establish meaning
• Generate a text response
• Convert this text to speech
But can it have a meaningful conversation?
Spoken Dialogue Systems (SDS)
SDS use both speech and NLP technologies to enable extended human-machine conversation.
Determine appropriate system response
Commercially driven to achieve success in constrained conversation to achieve a specific scenario’s goal (Litman et al, 2016). Limited application in assessing interactive language.
• DMS uses ASR and NLU, in conjunction with an internal representation of ‘system state’
SDS – Dialogue Management system
• Limited number of ‘states’ – interaction at any point represented by one ‘state’
• Each utterance moves the interaction from one ‘state’ to another
• Applicable to mapped dialogues (scripted)
System ask: ‘Do you live in a town or the countryside?’
NLU – live – Town → System ask: ‘Which town do you live in?’
NLU – live – Countryside → System ask: ‘How far is the nearest town?’
NLU – live – ? → System say: (not_understood)
‘Finite state machine’ – predictable path of interaction – not spontaneous
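The town / countryside dialogue can be written out as a small finite state machine. This is a sketch under stated assumptions: the state names, the `understand` keyword matcher standing in for NLU, and the table layout are all invented for the example.

```python
# Each state has a system prompt and a table of allowed moves to the next state.
STATES = {
    "ask_home": {
        "prompt": "Do you live in a town or the countryside?",
        "next": {"town": "ask_which_town", "countryside": "ask_distance"},
    },
    "ask_which_town": {"prompt": "Which town do you live in?", "next": {}},
    "ask_distance": {"prompt": "How far is the nearest town?", "next": {}},
    "not_understood": {"prompt": "(not_understood)", "next": {}},
}


def understand(utterance):
    """Toy stand-in for NLU: spot a keyword, or return None if nothing matches."""
    text = utterance.lower()
    for label in ("countryside", "town"):
        if label in text:
            return label
    return None


def next_state(state, utterance):
    """Each utterance moves the interaction from one state to another."""
    label = understand(utterance)
    return STATES[state]["next"].get(label, "not_understood")


state = "ask_home"
print(STATES[state]["prompt"])
state = next_state(state, "I live in a town")
print(STATES[state]["prompt"])
```

Anything the table does not anticipate lands in `not_understood`, which is why such a system suits mapped, scripted dialogues rather than spontaneous ones.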
Summary - AI in language - NLP
Automated Speech Recognition (ASR)
Speech generation
Text recognition Text generation (NLG)
[Response driven]
When was Elvis born?
Chatbots
Can AI simulate human interaction?
Chatbots – several programmes simultaneously analyse the output and generate a wide range of hypothetical responses; the one most likely to prolong the dialogic exchange is chosen:
• Person bot – personality with character and baseline facts
• Rapport bot – find out about you and interests
• Wikibot – seek facts based on conversation content
• A ranking function – choosing the best response
Heriot-Watt University Alana the bot
Prof Oliver Lemon
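The "several bots plus a ranking function" architecture can be sketched as below. This is not how Alana actually ranks responses: the three toy bots, their canned replies and the scoring rule (word overlap with the user's turn, plus a bonus for asking a question) are all assumptions made for the illustration.

```python
import re


def persona_bot(user_text):
    """Person bot: replies from a fixed personality."""
    return "I'm a chatbot who loves languages."


def rapport_bot(user_text):
    """Rapport bot: asks about the user to keep the exchange going."""
    return "What do you enjoy doing at the weekend?"


def wiki_bot(user_text):
    """Wikibot: returns a stored fact if the topic is mentioned, else nothing."""
    facts = {"elvis": "Elvis Presley was born in 1935."}
    for topic, fact in facts.items():
        if topic in user_text.lower():
            return fact
    return None


BOTS = [persona_bot, rapport_bot, wiki_bot]


def words(text):
    return set(re.findall(r"[a-z']+", text.lower()))


def score(user_text, candidate):
    """Hypothetical ranking: topical overlap counts double, a question adds one."""
    overlap = len(words(user_text) & words(candidate))
    asks_question = 1 if candidate.endswith("?") else 0
    return 2 * overlap + asks_question


def respond(user_text):
    """Collect every bot's candidate, then let the ranking function choose."""
    candidates = [bot(user_text) for bot in BOTS]
    candidates = [c for c in candidates if c is not None]
    return max(candidates, key=lambda c: score(user_text, c))


print(respond("When was Elvis born?"))
print(respond("Hello there"))
```

Note that the ranking works on surface patterns only; nothing in it understands the conversation.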
Can AI simulate human interaction?
In communication there is a lot more going on than just words. Whilst AI can recognise complex patterns it cannot understand concepts.
AI still very limited in terms of:
• Pragmatics
• Socio-linguistic competence
• Strategic competence
• Co-constructed dialogue / authentic exchange
Let’s have a chat to a bot
https://www.masswerk.at/eliza/
Eliza the psychotherapist- Heuristic engine
Mitsuku. ML engine. 60b+ messages processed
https://www.pandorabots.com/mitsuku/
Chatbot - task
What went right? Why do you think it worked?
What did not work so well?
What do you think was the cause of the communication break-down?
In pairs, sharing a device, have a chat with Mitsuku (or alternative conversational chatbot)
1: Try a simple interaction
2: Try a more demanding dialogue
AI and language assessment
(speaking)
AI and language assessment (Productive skills)
Machine scores automatically generated
Utilise set criteria and dependent variables (e.g. repeat accuracy, length of production, fluency, vocabulary, grammar and pronunciation)
Compared to reference scores (manually set)
ASR - automated and human correlation
• Correlations improve with longer utterances (Bernstein, 2012; Neumeyer et al, 2000)
• Repeat accuracy = high correlation (0.92) (Graham et al, 2008)
• Repeat accuracy used as predictor of oral proficiency
• Further high correlation studies as predictors (Cook et al, 2011; De Wet et al, 2009)
• Predictive measures for fluency stronger for read speech rather than spontaneous (Cucchiarini et al, 2010)
• Correlations higher for rate of speech and accuracy compared to ‘goodness of pronunciation’ (Müller et al, 2009)
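The correlations quoted above compare machine scores with human reference scores for the same performances. A minimal sketch of that calculation (Pearson's r) follows; the two score lists are invented for the example and do not come from any of the cited studies.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical scores for five test-takers (not real data).
HUMAN_SCORES = [3.0, 4.5, 2.0, 5.0, 3.5]
MACHINE_SCORES = [2.8, 4.0, 2.5, 4.9, 3.9]

print(round(pearson(HUMAN_SCORES, MACHINE_SCORES), 2))
```

A value near 1 means the machine ranks candidates much as the human raters do; it does not by itself show that both are measuring the intended construct.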
AI scores – case studies
 | Pearson PTE | ETS – TOEFL iBT
System | Versant | SpeechRater
Task example | Read aloud; Repeat sentence; Short answer | Opinion on familiar topic; Speak based on reading (total 6 tasks)
Scoring includes | Pronunciation; Fluency; Vocabulary; Sentence mastery | Pronunciation; Fluency; Grammatical facility; Topical coherence; Idea progression (multiple regression scoring)
Correlation | 0.84 - 0.92 | 0.73
Construct | Psycholinguistic (Van Moere, 2012). Predictive: ability to use core language in real time / use lexis to build phrases and clauses and articulate | Direct and immediate interaction (Butler et al, 2000). Contextualised and limited restriction – account for content, coherence and interactive (but task monologic)
Adapted from Litman et al, 2018
Only practice tests subject to further research
Automatic Speech Recognition (ASR) in assessment
• Repeat accuracy
• Length of production
• Fluency (rate of speech)
• Vocabulary – complexity and accuracy
• Grammar – complexity and accuracy
• Pronunciation (compared to reference acoustic model)
• Test task limited: e.g. elicited imitation, reading aloud or short free responses
• Limited opportunity for spontaneous or dialogic speech
• Copes with transactional rather than interactional dialogue
Dialogue Management system – Finite state
At utterance level ‘States’ created (for example):
Syntactic analysis
• Grammar errors
• (NLU: Grammar = No)
Semantic analysis
• Meaning for expected answer
• Detail or gist
Pragmatics
• Politeness
• Contextual coherence
Acoustic input
• Prosodic features
• Fluency
Alternative ‘State tracking’ – DMS gives probability of path
Fewer ASR errors and can resolve ambiguities further in dialogue
Potential application of holistic scales including CEFR. (Shashidhar et al, 2015)
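The ‘state tracking’ alternative, where the DMS holds a probability for each path instead of committing to one, can be sketched as a simple Bayesian update. The two candidate states and the likelihood figures are assumptions invented for the example; they stand in for how well each state explains a noisy ASR result.

```python
def update_belief(belief, likelihoods):
    """Multiply the prior belief by each state's likelihood, then renormalise."""
    unnormalised = {state: belief[state] * likelihoods[state] for state in belief}
    total = sum(unnormalised.values())
    return {state: p / total for state, p in unnormalised.items()}


# Start undecided between the two paths.
belief = {"town": 0.5, "countryside": 0.5}

# Turn 1: ASR output is ambiguous, only slightly favouring 'town' (assumed values).
belief = update_belief(belief, {"town": 0.6, "countryside": 0.4})

# Turn 2: a later utterance is much clearer and resolves the ambiguity.
belief = update_belief(belief, {"town": 0.9, "countryside": 0.1})

print({state: round(p, 3) for state, p in belief.items()})
```

Because no path is discarded after the first unclear utterance, a recognition error early in the dialogue can still be corrected by later evidence.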
Spoken Dialogue Systems (SDS) in assessment (Finite state)
• User tolerance of recognition errors
• Pedagogical value of misrecognised utterances
• Narrow domain scenario-guided conversation
• Useful for constrained and transactional dialogues
• Applicable where semi-scripted dialogue used
• Conversations simple and constrained
• Based on L1 competent model
• Test-takers have limited speaking skills; SDSs are designed to process speech from proficient users
• Most conversational responses are not right or wrong (as required from tutorial dialogue system technology)
• SDS needs to be easily configurable by language experts
• Limited training data (despite machine learning)
The assessor will need to consider the extent to which their construct can accommodate [SDS’s] deviation from authentic dialogue (Litman et al, 2018)
Spoken Dialogue Systems (SDS)
Opportunities for spontaneous yet non-conversational speech, within constrained domain
State tracking SDSs – overcome ASR difficulties during dialogue
Applying AI – case study
Bachman & Palmer (2010): communicative competence model
Communicative Competence
Linguistic competence
Socio-linguistic competence
Discourse competence
Strategic competence
Case study - Communicative competence
Conversational features in co-constructed dialogue
Communicative competence - features
• Higher level contextual user ability – often related to concept
• Semantic and topical relationship – tied to utterance history
• Appropriate conversational functions (e.g. ending dialogue)
• Linguistic devices (referring expressions, prosody etc.)
• Turn taking conventions (linguistic signalling etc.)
• Conversation coordination (confirming understanding, recovering etc.)
Communicative competence
1: In groups of 3 or 4 you will be allocated one of the four competences.
- Identify elements of one competence (e.g. Socio-cultural = register)
- What parameters would need to be measured in spoken performance to assess these elements?
- Discuss if you think the AI systems covered today could be applied to assess each element / the overall competence?
2: Cross-group into groups of 4, with one person covering each competence.
- Share your ideas around the application of AI to the communicative competences
Summary
AI and automated language assessment
[Chart: speaking constructs assessed, plotted against development of AI over time (2010 to 2018). Labels: Linguistic competence; ? ? ?; Fully automated language assessment]
Initial criticism as only narrow constructs could be assessed
Construct or assessment engine?
Choose construct to assess, audit available technologies, and compensate for shortfalls with human intervention
or
Select available technologies, and align to construct they can cover, decide if compensation necessary
To what extent can contained measures be used as indicators or predictors of overarching language proficiency?
Construct – considerations
There is a rethinking of what speaking constructs could be….
• Expand theoretical definition of interactional competence
• Encompass co-constructive and dynamic dialogue
• Engage personal cognitive and contextual factors
• Incorporate digital literacies, human – machine interaction
• Consider narrower / partial constructs as sufficient predictors of proficiency
• Scope for plurilingual and translanguaging competencies
• Inclusion of transferable skills and mediation
There is a long road ahead…
Role of individual agency – impact on identity in test taking experience
‘I deserve to engage with a human’
AI in language assessment - identity
To conclude - some predictions
• Increasing number of collaborations between exam developers and high-tech IT companies
• Increasing use of blended modes of assessment delivery – digital / human
• Inclusion of digital literacies written into assessed constructs (coping with latency, paucity of NVs or paralinguistics, digital interface engagement, mediating NLP shortcomings etc.)
• Blended modes may include tasks of recorded human interaction that machines score – but not actual interaction with the machine (to overcome restrictions with SDSs etc.)
• Commercial opportunities for establishing L2 spoken corpora at differentiated CEFR levels for training datasets
• Development of AI formative assessment engine integrated into course delivery
To conclude - some predictions
And in the long term…
• Localisation to class-level through locally populated datasets driving adaptive assessment on an ongoing and formative basis – mediated through individual devices…
Alex Thorp
Lead Academic, Language (Europe)