50
Machine Learning of Level and Progression in Second/Additional Language Spoken English Kate Knill Speech Research Group, Machine Intelligence Lab Cambridge University Engineering Dept 11 May 2016

Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Machine Learning of Level and Progression in Second/Additional Language Spoken English

Kate Knill Speech Research Group, Machine Intelligence Lab Cambridge University Engineering Dept 11 May 2016

Page 2: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Cambridge ALTA Instititute

•  Virtual institute at University of Cambridge

•  Computing, Linguistics, Engineering, Language Assessment

•  Sponsorship from Cambridge English Language Assessment

•  Work presented was done at CUED – thanks to:

•  Mark Gales, Rogier van Dalen, Kostas Kyriakopoulos, Andrey Malinin, Mohammad Rashid, Yu Wang

Page 3: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Spoken Communication

Pronunciation Prosody

Message Construction Message Realisation Message Reception

Speaker Characteristics Environment/Channel

Page 4: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Spoken Communication

Pronunciation Prosody

Message Construction Message Realisation Message Reception

Speaker Characteristics Environment/Channel

Spoken communication is a very rich communication medium

Page 5: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Spoken Communication Requirements

•  Message Construction should consider: •  Has the speaker generated a coherent message to convey?

•  Is the message appropriate in the context?

•  Is the word sequence appropriate for the message?

Page 6: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Spoken Communication Requirements

•  Message Construction should consider: •  Has the speaker generated a coherent message to convey?

•  Is the message appropriate in the context?

•  Is the word sequence appropriate for the message?

•  Message Realisation should consider: •  Is the pronunciation of the words correct/appropriate?

•  Is the prosody appropriate for the message?

•  Is the prosody appropriate for the environment?

Page 7: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Spoken Communication Requirements

•  Message Construction should consider: •  Has the speaker generated a coherent message to convey?

•  Is the message appropriate in the context?

•  Is the word sequence appropriate for the message?

•  Message Realisation should consider: •  Is the pronunciation of the words correct/appropriate?

•  Is the prosody appropriate for the message?

•  Is the prosody appropriate for the environment?

Page 8: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Spoken Language Versus Written

okay carl uh do you exercise yeah actually um i belong to a gym down here gold’s gym and uh i try to exercise five days a week um and now and then i’ll i’ll get it interrupted by work or just full of crazy hours you know

ASR Output

Page 9: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Spoken Language Versus Written

okay carl uh do you exercise yeah actually um i belong to a gym down here gold’s gym and uh i try to exercise five days a week um and now and then i’ll i’ll get it interrupted by work or just full of crazy hours you know

ASR Output

Meta-Data Extraction Markup Speaker1: / okay carl {F uh} do you exercise / Speaker2: / {DM yeah actually} {F um} i belong to a gym down here / / gold’s gym / / and {F uh} i try to exercise five days a week {F um} / / and now and then [REP i’ll + i’ll] get it interrupted by work or just full of crazy hours {DM you know } /

Page 10: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Spoken Language Versus Written

okay carl uh do you exercise yeah actually um i belong to a gym down here gold’s gym and uh i try to exercise five days a week um and now and then i’ll i’ll get it interrupted by work or just full of crazy hours you know

ASR Output

Meta-Data Extraction Markup Speaker1: / okay carl {F uh} do you exercise / Speaker2: / {DM yeah actually} {F um} i belong to a gym down here / / gold’s gym / / and {F uh} i try to exercise five days a week {F um} / / and now and then [REP i’ll + i’ll] get it interrupted by work or just full of crazy hours {DM you know } /

Written Text Speaker1: Okay Carl do you exercise? Speaker2: I belong to a gym down here, Gold’s Gym, and I try to exercise five days a week and now and then I’ll get it interrupted by work or just full of crazy hours.

Page 11: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Business Language Testing Service (BULATS) Spoken Tests

•  Example of a test of communication skills A.  Introductory Questions: where you are from B.  Read Aloud: read specific sentences C.  Topic Discussion: discuss a company that you admire

D.  Interpret and Discuss Chart/Slide: example above E.  Answer Topic Questions: 5 questions about organising a meeting

Page 12: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Common European Framework of Reference (CEFR)

Level Global Descriptor

C2 Fully operational command of the spoken language

C1 Good operational command of the spoken language

B2 Generally effective command of the spoken language

B1 Limited but effective command of the spoken language

A2 Basic command of the spoken language

A1 Minimal command of the spoken language

Page 13: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Automated assessment of one speaker

Audio

Grade

Page 14: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Automated assessment of one speaker

Audio

Grade

Feature

extraction

Features

Grader

Page 15: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Automated assessment of one speaker

Audio

Grade

Featureextraction

Speechrecogniser

Text

Features

Grader

Page 16: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Outline

Audio

Grade

Featureextraction

Speechrecogniser

Text

Features

Grader

Page 17: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Speech Recognition Challenges

•  Non-native ASR highly challenging •  Heavily accented •  Pronunciation dependent on L1

•  Commercial systems poor!

•  State-of-the-art CUED systems

Training Data Word error rate

Native & C-level non-native English

54%

BULATS speakers 30%

Page 18: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Automatic Speech Recognition Components

Language Model

Acoustic Model

Recognition Engine “The cat sat on …”

Acoustic Model training data

Language Model training data

Pronunciation Lexicon

Page 19: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Forms of Acoustic and Language Models

L2 audio data L2 text data L1 text data

+ L2 Acoustic Model

L2 Language Model

Used to recognise L2 speech

Page 20: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Forms of Acoustic and Language Models

L2 audio data L2 text data L1 text data

+ L2 Acoustic Model

L2 Language Model

Used to recognise L2 speech

Native (L1) audio data

Native (L1) text data

Native Acoustic Model

Native Language Model

Useful to extract features

Page 21: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Speech Recognition System

PLP

LayerBottleneck

Bottleneck

Bottleneck

HMM−GMMTandem

Stacked Hybrid

FusionScore

Log−Posteriors

Log−LikelihoodsAMI Corpus DataBULATS Data

Speaker Dependent

FBank

PLP

•  Joint decoding - frame-level combination

L(ot | si ) = λTLT (ot | si )+λHLH (ot | si )

Page 22: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Recognition Rate vs L1

•  Acoustic models trained on English data from Gujarati L1

scored against crowd-sourced references

Page 23: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Recognition Error Rate vs Learner Progression

0

10

20

30

40

50

A1 A2 B1 B2 C

%WER

CEFR Grade

ReadSpontaneousOverall

Page 24: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Outline

Audio

Grade

Featureextraction

Speechrecogniser

Text

Features

Grader

Page 25: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Outline

Audio

Grade

Featureextraction

Speechrecogniser

Text

Features

Grader

Page 26: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Baseline Features

•  Mainly fluency based:

•  Audio Features: statistics about •  fundamental frequency (f0) •  speech energy and duration

•  Aligned Text Features: statistics about •  silence durations •  number of disfluencies (um, uh, etc) •  speaking rate

•  Text Identity Features: •  number of repeated words (per word) •  number of unique word identities (per word)

Page 27: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Speaking Time vs Learner Progression

0

100

200

300

400

500

600

700

A1 A2 B1 B2 C

Average Speaking Time

(secs)

CEFR Grade

spontaneous speech read speech

Page 28: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Pronunciation Features

•  Hypothesis: poor speakers are weaker at making phonetic distinctions •  less proficient – phone realisation closer to L2 •  more proficient – phone realisation closer to L1

•  Statistical approach – learn phonetic distances from graded data •  single multivariate Gaussian of K-L divergence per phoneme pair •  1081 phoneme pairs JSD(p1(x), p2 (x)) =

12KL(p1(x) || p2 (x))+KL(p2 (x) || p1(x)[ ]

KL(p1(x) || p2 (x)) =12tr(Σ2

−1Σ1 − Ι)+ (µ1 −µ2 )T Σ2

−1( ) µ 1−µ2( )+ logΣ2−1

Σ1−1

⎝⎜⎜

⎠⎟⎟

Page 29: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Pronunciation Features vs Learner Progression

•  Pattern of distances different between candidates of different levels •  Correlation with score: mis-pronounced phones higher K-L distance

•  opposite of expectation that poor speakers have more overlap

Candidate Grade A1 Candidate Grade C2

Page 30: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Statistical Parser Features

•  Parser features from RASP system improve grades for written tests

•  Problem: speech recognition accuracy

•  Smaller subtrees and leaves are fairly robust

Page 31: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Outline

Audio

Grade

Featureextraction

Speechrecogniser

Text

Features

Grader

Page 32: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Outline

Audio

Grade

Featureextraction

Speechrecogniser

Text

Features

Grader

Page 33: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Uses of Automatic Assessment

•  Human graders ✔ very powerful ability to assess spoken language ✖ vary in quality and not always available

•  Automatic graders ✔ more consistent and potentially always available ✖ validity of the grade varies and limited information about context

Page 34: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Uses of Automatic Assessment

•  Human graders ✔ very powerful ability to assess spoken language ✖ vary in quality and not always available

•  Automatic graders ✔ more consistent and potentially always available ✖ validity of the grade varies and limited information about context

•  Use automatic grader •  for grading practice tests/learning process •  in combination with human graders

•  combination: use both grades •  back-off process: detect challenging candidates

Page 35: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Gaussian Process Grader

•  Currently have 1000s candidates to train grader •  limited data compared to ASR frames (100,000s frames) •  useful to have confidence in prediction

Gaussian Process is a natural choice for this configuration

Page 36: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Form of Output

Graders Pearson Correlation Human experts 0.85 Automatic GP 0.83 – 0.86

Page 37: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Effect of Grader Features

Grader Pearson Correlation with Expert Graders

Standard examiners 0.85 Automatic baseline 0.83 + Pronunciation 0.84 + RASP 0.85 + Confidence 0.83 + RASP + Confidence 0.86 Pronunciation features 0.82

Page 38: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Combining Human and Automatic Graders

•  Interpolate between human and automated grades •  higher correlation i.e. more reliable grade produced

•  Content checking can be done by the human grader

Original 0.2 0.4 0.6 0.8 Gaussian process

Interpolation weight

0.85

0.9

0.95

1Correlation

Page 39: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Detecting Outlier Grades

•  Standard (BULATS) graders handle standard speakers very well •  non-standard (outlier) speakers less well handled •  use Gaussian Process variance to automatically detect outliers

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1Rejection rate (i.e., cost)

0.85

0.9

0.95

1

Correlation

Gaussian process

•  Back-off to human experts - reject 10%: performance 0.83 è 0.88

Random rejection

Ideal rejection

Page 40: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Assessing Communication Level

•  Language complexity is related to proficiency •  Future work – look into e.g.

•  McCarthy’s use of chunks “I would say”, “and then” •  Abdulmajeed and Hunston’s “correctness analysis”

0

10000

20000

30000

40000

50000

60000

uniquewords

bigrams trigrams fourgrams

A1 A2 B1 B2

0

200

400

600

800

1000

0 2 4 6 8 10 12 14 16

Numberofphones/word

A1 A2 B1 B2

•  Ignore high-level content and communication skills currently”

Page 41: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Assessing Content

•  Grader correlates well with expert grades •  features do not assess content – primarily fluency features

•  Train a Recurrent Neural Network Language Model for each question •  assess whether the response is consistent with example answers

Page 42: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Topic Classification

•  Experiment details •  280-D LSA topic space •  Supervised (SUP): 490 speakers, 2x crowd-sourced transcriptions •  Semi-supervised (Semi-SUP): + 10005 speakers, ASR transcriptions

•  Increasing quantity of data helps even though high %WER •  RNNLM can handle large data sets unlike K-Nearest Neighbour (KNN)

System HL-dim Training Data

% Error

KNN - SUP 20.8 RNNLM 100 17.5 RNNLM 200 Semi-SUP 9.3

Page 43: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Off-Topic Response Detection

•  Synthesised pool of off-topic responses •  Naïve – select incorrect response from any section •  Directed – select incorrect response from same section

Page 44: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Audio

Grade

Featureextraction

Speechrecogniser

Text

Features

Grader

Spoken Language Assessment

•  Automatically assess: •  Message realisation

•  Fluency, pronunciation

•  Message construction •  Construction & coherence of response •  Relationship to topic

Page 45: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Audio

Grade

Featureextraction

Speechrecogniser

Text

Features

Grader

Spoken Language Assessment

•  Automatically assess: •  Message realisation

•  Fluency, pronunciation Achieved (with room for improvement) •  Message construction

•  Construction & coherence of response •  Relationship to topic

Unsolved – active research areas

Page 46: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Audio

Grade

Featureextraction

Speechrecogniser

Text

Features

Grader

Spoken Language Assessment and Feedback

Error Detection & Correction

•  Automatically assess: •  Message realisation

•  Fluency, pronunciation

•  Message construction •  Construction & coherence of response •  Relationship to topic

•  Provide feedback: •  Feedback to user: realisation, construction •  Feedback to system: adjust to level

Feedback

Page 47: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Recognition Error Rate Versus Learner Progression

Page 48: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Time Alignment and Pronunciation Feedback

Page 49: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Conclusions

•  Automated machine-learning for spoken language assessment •  important to keep costs down •  able to be integrated into the learning process

•  Current level – assessment of fluency •  ongoing research into assessing communication skills:

•  appropriateness and acceptability

•  Error detection and feedback is challenging •  high precision required in detecting where errors have occurred •  supplying feedback in appropriate form for learner

Page 50: Machine Learning of Level and Progression in Second ...mi.eng.cam.ac.uk/~kmk/presentations/UBham_May2016_Knill.pdf · Progression in Second/Additional Language Spoken English Kate

Questions?