ELIS-DSSP, Sint-Pietersnieuwstraat 41, B-9000 Gent
Recognition of foreign names spoken by native speakers
Frederik Stouten & Jean-Pierre Martens
Ghent University
Electronics and Information Systems (ELIS)
Interspeech 07 - August 30th 07
Overview
• Problem statement
• Methodology
  – computing phonological scores
  – foreignizable phonemes
• Experiments
  – baseline system
  – systems with methodology implementation
• Conclusions
Problem statement
• Automatic attendant or car navigation systems
  – the lexicon may contain > 100K words
  – many of them of foreign origin
• A native speaker of Dutch can pronounce Andrew as
  – nativized: A n d r E w
  – intermediate: E n d r u w
  – foreignized: E n d r u
Problem situation
• Standard solutions
  – foreign g2p's + mapping to native phonemes
  – include acoustic models of foreign phonemes
• Our proposal
  – combine the scores of standard acoustic models and a phonologically inspired back-off model
    • both models trained on native speech only
  – use foreign g2p's without phoneme mapping
  – introduce foreignizable phonemes instead of traditional foreign-to-native phoneme mappings
Combining scores
• Two-stream score per acoustic model state q
  – standard model: log pA(x | q)
  – phonological back-off model: log pB(x | q)
$$LL(x|q) = g_{1q} \log p_A(x|q) + g_{2q}\,[\alpha \log p_B(x|q) - \beta]$$
• Control parameters
  – g1q, g2q = state-dependent stream weights (the risk of a foreignized pronunciation differs from state to state)
  – α, β = state-independent scaling coefficients (to give both streams the same overall mean and variance)
  – equidistant samples on the line g1q + g2q = 1 (an overall scale factor on the weights has no effect)
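The score combination can be sketched in Python as below. This is a minimal illustration, not the authors' code: it assumes α and β are obtained by matching the mean and standard deviation of log pB to those of log pA over a pooled set of frame scores, and the helper names `scaling_coefficients` and `two_stream_score` are hypothetical.

```python
import statistics
from typing import Sequence, Tuple

def scaling_coefficients(log_pa: Sequence[float],
                         log_pb: Sequence[float]) -> Tuple[float, float]:
    """Choose alpha, beta so that alpha*log_pB - beta has the same mean and
    (population) standard deviation as log_pA over the given frame scores."""
    alpha = statistics.pstdev(log_pa) / statistics.pstdev(log_pb)
    beta = alpha * statistics.fmean(log_pb) - statistics.fmean(log_pa)
    return alpha, beta

def two_stream_score(log_pa_xq: float, log_pb_xq: float,
                     g1q: float, g2q: float,
                     alpha: float, beta: float) -> float:
    """LL(x|q) = g1q * log pA(x|q) + g2q * [alpha * log pB(x|q) - beta]."""
    return g1q * log_pa_xq + g2q * (alpha * log_pb_xq - beta)

# Usage with toy scores: the back-off stream is rescaled onto the standard one
alpha, beta = scaling_coefficients([-10.0, -12.0, -14.0], [-1.0, -2.0, -3.0])
print(alpha, beta)  # approximately 2.0 and 8.0
print(two_stream_score(-10.0, -1.0, 0.5, 0.5, alpha, beta))
```

Matching the first two moments keeps one stream from dominating merely because its log-likelihoods live on a different numeric scale, so that the weights g1q and g2q alone express how much each stream is trusted.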
Combining scores
• Computation of log pB(x | q)
  – phonological feature space: binary features fi (i = 1, …, 25)
  – map each state to the phonological space
    • select the features of a state on the basis of a forced alignment of the speech with the standard acoustic models
    • select the fi with a large enough mean of P(fi | x) / P(fi) over the state
    • other strategy for foreignizable phonemes (see further)
  – compute the posterior probabilities P(fi | x)
    • configuration of 4 neural networks
  – convert the posterior probabilities to a log-likelihood (Bayes' rule):
$$\log p_B(x|q) = \log \frac{P_B(q|x)}{P_B(q)} + \log p_B(x)$$
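The feature-selection step for one state can be sketched as follows. This is a minimal illustration under stated assumptions: `frame_posteriors` holds the PHF detector outputs P(fi | x) for the frames that the forced alignment assigned to the state, and "large enough" is read as exceeding a fixed threshold. The threshold value 2.0 and the function name are hypothetical; the slides give neither.

```python
from typing import Sequence, Set

def select_positive_features(frame_posteriors: Sequence[Sequence[float]],
                             priors: Sequence[float],
                             threshold: float = 2.0) -> Set[int]:
    """Return the indices i whose mean ratio P(f_i|x) / P(f_i), taken over
    the frames aligned to the state, exceeds the (hypothetical) threshold."""
    if not frame_posteriors:
        raise ValueError("state has no aligned frames")
    n_frames = len(frame_posteriors)
    selected = set()
    for i, prior in enumerate(priors):
        mean_ratio = sum(frame[i] / prior for frame in frame_posteriors) / n_frames
        if mean_ratio > threshold:
            selected.add(i)
    return selected

# Usage: two aligned frames, three features; only feature 0 is clearly 'on'
print(select_positive_features([[0.9, 0.1, 0.5], [0.8, 0.2, 0.5]],
                               [0.3, 0.3, 0.5]))
```

Dividing by the prior P(fi) keeps frequent features (high prior) from being selected for every state simply because they are often detected.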
Combining scores
• Come to the final two-stream score
  – g2q is less dependent on q than log [PB(q | x) / PB(q)]
  – since log pB(x) does not depend on q, the term g2q α log pB(x) can be discarded
$$LL(x|q) = g_{1q} \log p_A(x|q) + g_{2q}\left[\alpha \log \frac{P_B(q|x)}{P_B(q)} - \beta\right]$$
  – computation of log [PB(q | x) / PB(q)]
    • Pq: positive features, 'on' for state q
    • Nq: negative features, absent or 'off' for state q
Combining scores
• Assuming independent phonological features (PHFs), we get
$$\log \frac{P_B(q|x)}{P_B(q)} = \underbrace{\sum_{f_i \in P_q} \log \frac{P(f_i|x)}{P(f_i)}}_{(1)} + \underbrace{\sum_{f_i \in N_q} \log \frac{1 - P(f_i|x)}{1 - P(f_i)}}_{(2)}$$
• Start with only the positive features (term (1))
  – problem: the number of positive features differs from state to state
  – solution: take the average, i.e. wqp × (1) with wqp = 1 / card(Pq)
  – experiments showed that averaging is better
• Add the negative features (term (2))
  – supposed to represent the same probability
  – experiments show a 75% correlation between (1) and (2)
  – keeping (1) + (2) is slightly better than discarding (2)
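Terms (1) and (2) can be computed per frame as in the Python sketch below, which assumes the PHF posteriors P(fi | x) and priors P(fi) are given as lists indexed by i and lie strictly between 0 and 1. Term (1) is averaged with wqp = 1/card(Pq) as described above; the slides do not say whether term (2) is normalized as well, so here it is left as a plain sum (an assumption). The function name is hypothetical.

```python
import math
from typing import Sequence

def phonological_score(post: Sequence[float], prior: Sequence[float],
                       pos: Sequence[int], neg: Sequence[int],
                       use_negative: bool = True) -> float:
    """log[P_B(q|x) / P_B(q)] for one frame x and state q, assuming
    independent PHFs. post[i] = P(f_i|x), prior[i] = P(f_i);
    pos and neg are the index sets P_q and N_q of state q."""
    # Term (1), averaged over the positive features: w_qp = 1 / card(P_q)
    term1 = sum(math.log(post[i] / prior[i]) for i in pos) / len(pos)
    if not use_negative:
        return term1
    # Term (2): plain sum over the negative features (normalization unknown)
    term2 = sum(math.log((1.0 - post[i]) / (1.0 - prior[i])) for i in neg)
    return term1 + term2

# Usage: feature 0 positive, feature 1 negative for this state
score = phonological_score([0.9, 0.2, 0.5], [0.3, 0.4, 0.5], pos=[0], neg=[1])
print(score)  # log(3) + log(4/3) = log(4), about 1.386
```

The result plays the role of log [PB(q | x) / PB(q)] in the final two-stream score.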
Introducing foreignizable phonemes
• Baseline pronunciation of a foreign name
  – take the foreign-language g2p output
  – map the foreign phonemes to their best native equivalents
• Our pronunciation
  – if the native equivalent has different PHFs, keep the information of the original phoneme
    → foreignizable phoneme: /NativePhon/_/ForeignPhon/
  – e.g. /rr/ → /r/_/rr/ (Dutch /r/ originating from English /rr/)
  – 6 such phonemes for English → Dutch
  – use the positive PHFs of /ForeignPhon/ (knowledge based)
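The construction of a foreignizable phoneme can be sketched as follows. Only the /rr/ → /r/ mapping comes from the slide; the feature sets and the /p/ entry are hypothetical toy values, not the 25 PHFs of the actual system, and `foreignize` is an illustrative name.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical toy feature sets (NOT the 25 PHFs of the real system)
PHF: Dict[str, FrozenSet[str]] = {
    "rr": frozenset({"consonant", "approximant"}),   # English /rr/
    "r": frozenset({"consonant", "trill"}),           # Dutch /r/
    "p": frozenset({"consonant", "plosive", "labial"}),
}
# Foreign phoneme -> best native equivalent (only /rr/ -> /r/ is from the slide)
TO_NATIVE: Dict[str, str] = {"rr": "r", "p": "p"}

def foreignize(foreign: str) -> Tuple[str, FrozenSet[str]]:
    """Return the lexicon symbol and the positive PHFs for one foreign phoneme."""
    native = TO_NATIVE[foreign]
    if PHF[native] == PHF[foreign]:
        # Same features: the plain native phoneme is kept
        return native, PHF[native]
    # Different features: /NativePhon/_/ForeignPhon/ with the foreign PHFs
    return f"{native}_{foreign}", PHF[foreign]

print(foreignize("rr"))  # symbol 'r_rr' with the features of English /rr/
print(foreignize("p"))   # symbol 'p' with its own features
```

The back-off stream of the new symbol thus looks for the foreign features, while its standard acoustic models can presumably still be those of the native phoneme.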
Introducing foreignizable phonemes
• Pronunciation variants
  – mix of the standard and the new approach

name          variant        transcription
Alan Presser  baseline       E l @ n _ p r E s @ r
              alternative 1  E l @ n _ p r_rr E s @ r
              alternative 2  E l @ n _ p r E s @ r_rr
              alternative 3  E l @ n _ p r_rr E s @ r_rr
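The variants for Alan Presser are all combinations of keeping each foreignizable position native or foreignized. A minimal Python sketch, assuming the positions that can be foreignized are already known from the foreign g2p output (the `alternatives` list is a hypothetical input format):

```python
from itertools import product
from typing import List, Optional

def pronunciation_variants(phones: List[str],
                           alternatives: List[Optional[str]]) -> List[str]:
    """Return every transcription obtained by keeping each phone native or,
    where an alternative exists, replacing it by its foreignizable phoneme."""
    if len(phones) != len(alternatives):
        raise ValueError("phones and alternatives must have equal length")
    choices = [[p] if alt is None else [p, alt]
               for p, alt in zip(phones, alternatives)]
    return [" ".join(combo) for combo in product(*choices)]

# Usage: "Alan Presser", where both /r/ phones originate from English /rr/
phones = "E l @ n _ p r E s @ r".split()
alternatives: List[Optional[str]] = [None] * len(phones)
alternatives[6] = alternatives[10] = "r_rr"
for variant in pronunciation_variants(phones, alternatives):
    print(variant)
```

With k foreignizable positions this yields 2^k variants (the baseline included), so the lexicon grows quickly for long names; whether the number of variants was capped is not stated on the slides.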
Experiments
• Recognition of English names
  – database from Nuance (Cremelie, N. and ten Bosch, L.)
  – 2050 English name utterances
  – 21 different names
  – 26 native speakers of Dutch
• Recognizer
  – standard acoustic models: cross-word triphones, trained on Dutch read speech
  – PHF detector: neural network configuration, trained on Dutch read speech
  – vocabulary: 21 English names + 1779 Dutch names
  – lexicon: different transcriptions for each name (see next slide)
Baseline system
• No back-off model used
• Effects of different types of transcriptions measured
• Most important findings
  1. English transcriptions alone are much better than Dutch transcriptions alone
     → they model the foreign pronunciations
  2. Dutch transcriptions remain indispensable
     → they model the native pronunciations

lexicon    WER (%)  95% CI (%)
DuAlone    30.3     28.4-32.3
DuMan      23.5     21.6-25.3
EngAlone   23.1     21.2-24.9
EngDu      18.2     16.5-19.9
EngMan     16.8     15.2-18.4
ManAlone   24.7     22.8-26.5
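The slides do not say how the 95% confidence intervals were computed. A normal-approximation (Wald) binomial interval over the 2050 test utterances reproduces the EngDu and EngMan rows exactly and the other rows to within 0.1 percentage point (for example 28.3-32.3 instead of 28.4-32.3 for DuAlone, plausibly because the WER shown is itself rounded). The sketch below assumes one recognition decision per name utterance; it is an illustration, not the authors' procedure.

```python
import math
from typing import Tuple

def wer_confidence_interval(wer_percent: float, n_trials: int,
                            z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation (Wald) interval for an error rate, in percent."""
    p = wer_percent / 100.0
    half_width = z * math.sqrt(p * (1.0 - p) / n_trials)
    return 100.0 * (p - half_width), 100.0 * (p + half_width)

for lexicon, wer in [("EngDu", 18.2), ("EngMan", 16.8)]:
    lo, hi = wer_confidence_interval(wer, 2050)
    print(f"{lexicon}: {lo:.1f}-{hi:.1f}")  # EngDu: 16.5-19.9, EngMan: 15.2-18.4
```

With intervals of about ±1.7 points, the gap between EngDu (18.2%) and EngAlone (23.1%) is well outside the uncertainty, while differences of a few tenths between systems are not.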
Systems with back-off model
• System FOREIGN
  – consider one foreignizable phoneme at a time
  – use the same g1 on all its states: find its optimal value under the condition that g1 = 1 for all other phonemes
  – repeat the process until all foreignizable phonemes have been treated
• System NATIVE
  – same g1 on all states
  – search for the best g1
• System ALL
  – foreignizable phonemes: g1 taken from FOREIGN
  – other phonemes: one shared g1, taken from NATIVE
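The weight search for the three systems can be sketched as below. This is a hypothetical Python outline, not the authors' procedure: `evaluate_wer` stands for a callback that would run the recognizer with the given per-phoneme weights g1 (with g2 = 1 - g1) and return the WER, and the slides do not say on which data the weights were tuned. The grid of equidistant g1 values follows the sampling of g1q + g2q = 1 described under "Combining scores".

```python
from typing import Callable, Dict, List, Sequence

Weights = Dict[str, float]  # phoneme -> g1 on all its states (g2 = 1 - g1)

def grid(n: int = 10) -> List[float]:
    """Equidistant samples of g1 on the line g1 + g2 = 1."""
    return [i / n for i in range(n + 1)]

def tune_foreign(phonemes: Sequence[str], foreignizable: Sequence[str],
                 evaluate_wer: Callable[[Weights], float], n: int = 10) -> Weights:
    """System FOREIGN: tune one foreignizable phoneme at a time while
    keeping g1 = 1 (no back-off) for every other phoneme."""
    best: Weights = {}
    for p in foreignizable:
        def wer_for(g: float) -> float:
            weights = {q: 1.0 for q in phonemes}
            weights[p] = g
            return evaluate_wer(weights)
        best[p] = min(grid(n), key=wer_for)
    return best

def tune_native(phonemes: Sequence[str],
                evaluate_wer: Callable[[Weights], float], n: int = 10) -> float:
    """System NATIVE: one shared g1 for all states."""
    return min(grid(n), key=lambda g: evaluate_wer({q: g for q in phonemes}))

def combine_all(phonemes: Sequence[str], foreign_best: Weights,
                native_g1: float) -> Weights:
    """System ALL: FOREIGN weights for foreignizable phonemes, NATIVE elsewhere."""
    return {q: foreign_best.get(q, native_g1) for q in phonemes}

# Usage with a toy stand-in for the recognizer (a quadratic "WER" surface)
phonemes = ["r_rr", "r", "p"]
target = {"r_rr": 0.3, "r": 0.7, "p": 0.7}
toy_wer = lambda w: sum((w[q] - target[q]) ** 2 for q in phonemes)
foreign = tune_foreign(phonemes, ["r_rr"], toy_wer)
print(combine_all(phonemes, foreign, tune_native(phonemes, toy_wer)))
```

Tuning each foreignizable phoneme with all others at g1 = 1 keeps the search one-dimensional per phoneme, at the price of ignoring interactions between them; each `evaluate_wer` call would be a full decoding pass, so the grid resolution `n` trades tuning cost against precision.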
Systems with back-off model
• Main result: relative improvement of 11% (WER from 18.2% to 16.2%)
• Other results
  – g1 < 0.5 for system FOREIGN
  – g1 > 0.5 for system NATIVE

system     g1          g2          WER (%)
BASELINE   1           0           18.2
FOREIGN    opt.        opt.        16.5
NATIVE     0.7         0.3         17.3
ALL        opt. + 0.7  opt. + 0.3  16.2
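The relative improvement quoted above is the WER reduction relative to the baseline, (WER_baseline - WER_system) / WER_baseline. A small Python check against the table values (the function name is illustrative):

```python
def relative_improvement(wer_baseline: float, wer_system: float) -> float:
    """Relative WER reduction with respect to the baseline, as a fraction."""
    return (wer_baseline - wer_system) / wer_baseline

WER = {"BASELINE": 18.2, "FOREIGN": 16.5, "NATIVE": 17.3, "ALL": 16.2}
for name in ("FOREIGN", "NATIVE", "ALL"):
    ri = 100 * relative_improvement(WER["BASELINE"], WER[name])
    print(f"{name}: RI = {ri:.1f}%")  # FOREIGN: 9.3%, NATIVE: 4.9%, ALL: 11.0%
```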
Latest work
• Seek confirmation of the results on other data
• Autonomata database (STEVIN project)
  – 60000 names, 5000 different names, 240 speakers
  – French + English + Dutch names
  – French + English + Dutch speakers
  – French + English + Dutch g2p outputs per name
  – large relative improvement (RI) by using foreign g2p's on French and English
  – much larger RI with our methodology than here
  – paper submitted to ASRU-2007
Conclusions as of today
• large improvements in foreign name recognition by adding foreign g2p outputs (RI of around 40%)
• substantial extra improvements by adding the new methodology (RI of up to 30%)