Advances in WP2
Nancy Meeting – 6-7 July 2006
www.loquendo.com
2
Recent Work on NN Adaptation in WP2
• State of the art LIN adaptation method implemented and experimented on the benchmarks (m12)
• Innovative LHN adaptation method implemented and experimented on the benchmarks (m21)
• Experimental results on benchmark corpora and Hiwire database with LIN and LHN (m21)
• Further advances on new adaptation methods (m24)
3
LIN Adaptation
Output layer
….
….
….
Input layer
1st hidden layer
2nd hidden layer
Emission Probabilities
Acoustic phonetic Units
Speech Signal parameters….
Speaker
Independent
MLP
SI-MLP
LIN
4
LHN Adaptation
Output layer
….
….
….
Input layer
1st hidden layer
2nd hidden layer
Emission Probabilities
Acoustic phonetic Units
Speech Signal parameters….
Speaker
Independent
MLP
SI-MLP
LHN
5
Results Summary (W.E.R.)
Test set baseline LIN adapted
E.R. LHN adapted
E.R
WSJ0 16kHz bigr LM
10.5 9.4 10.5% 8.4 20.0%
WSJ1 Spoke-3
16kHz bigr LM
54.2 46.5 14.2% 30.6 43.5%
HIWIRE8kHz
11.6 7.8 32.1% 7.2 37.9%
6
Papers presented:
• Roberto Gemello, Franco Mana, Stefano Scanzio, Pietro Laface, Renato De Mori, “Adaptation of Hybrid ANN/HMM models using hidden linear transformations and conservative training”, Proc. of Icassp 2006, Toulouse, France, May 2006
• Dario Albesano, Roberto Gemello, Pietro Laface, Franco Mana, Stefano Scanzio, “Adaptation of Artificial Neural Networks Avoiding Catastrophic Forgetting”, Proc. of IJCNN 2006, Vancouver, Canada, July 2006
7
The “Forgetting” problem in ANN Adaptation
• It is well known, in connectionist learning, that acquiring new information in the adaptation process can damage previously learned information (Catastrophic Forgetting)
• This effect must be taken into account when adapting an ANN with limited amount of data, which do not include enough samples for all the classes.
• The “absent” classes may be forgotten during adaptation as the discriminative training (Error Backpropagation) assigns always zero targets to absent classes
8
“Forgetting” in ANN for ASR
• While Adapting ASR ANN/HMM model, this problem can arise when the adaptation set does not contain examples for some phonemes, due to the limited amount of adaptation data or the limited vocabulary
• The ANN training is discriminative, contrary to that of GMM-HMMs, and absent phonemes will be penalized by assigning to them a zero target during the adaptation
• That induces in the ANN a forgetting of the capability to classify the absent phonemes. Thus, while the HMM models for phonemes with no observations remain un-adapted, the ANN output units corresponding to phonemes with no observations loose their characterization, rather than staying not adapted
9
Example of ForgettingAdaptation examples only of E, U, O (e.g. from words: uno, due, tre); no examples for the other vowels (A, I, ə )The classes with examples adapt themselves, but tend to invade the classes with no examples, that are partially “forgotten”
F1 (kHz)
F2
(kHz)
E
e A
U O
0.00.5
5.0
4.0
3.0
2.0
1.0
1.51.00.5F1 (kHz)
F2
(kHz)
I E
e A
U O
0.00.5
5.0
4.0
3.0
2.0
1.0
1.51.00.5
I
10
“Conservative” Training
• We have introduced “conservative training” to avoid the forgetting of absent phonemes
• The idea is to avoid zero target for the absent phonemes, using for them the output of the Original NN as target;
Let be FP the set of phonemes present in the adaptation set and FA the set of absent ones. The target are assigned according to the following equations:
0
1
0
iPi
iPi
Ai
fcorrectFfTARGET
fcorrectFfTARGET
FfTARGETStandard policy
0
__1
__
iPi
FjjiPi
iAi
fcorrectFfTARGET
fNNORIGINALOUTPUTfcorrectFfTARGET
fNNORIGINALOUTPUTFfTARGET
A
Conservative policy
11
Conservative Training target assignment policy
A1 P1 P2 P3 A2
P2 is the class corresponding to the correct
phoneme
Px: class in the adaptation set Ax: absent
class
Posterior probability computed using the original network
0.03 0.00 0.95 0.00 0.02 0.00 0.00 1.00 0.00 0.00
Standard target assignment policy
12
“Conservative” Training
• In this way, the phonemes that are absent in the adaptation set are “represented” by the response given by the Original NN
• Thus, the absent phonemes are not “absorbed” by the neighboring present phonemes
• The results of adaptation with conservative training are:– Comparable performances on target environment– Preservation of performances on generalist environment– Great improvement of performances in speaker adaptation,
when only few sentences are available
13
Adaptation tasks
– Application data adaptation: Directory Assistance• 9325 Italian city names• 53713 training + 3917 test utterances
– Vocabulary adaptation: Command words• 30 command words• 6189 training + 3094 test utterances
– Channel-Environment adaptation: Aurora-3• 2951 training + 654 test utterances
14
Adaptation Results on different tasks (%WER)
Adaptation Task
Adaptation
Method
Application
Directory
Assistance
Vocabulary
Command Words
Channel-Environment
Aurora-3 CH1
No adaptation 14.6 3.8 24.0
LIN 11.2 3.4 11.0
LIN + CT 12.4 3.4 15.3
LHN 9.6 2.1 9.8
LHN + CT 10.1 2.3 10.4
15
Mitigation of Catastrophic Forgetting using Conservative Training
Models
Adapted on
Application
Directory
Assistance
Vocabulary
Command Words
Channel-Environment
Aurora-3 CH1
Adaptation
Method
LIN 36.3 42.7 108.6
LIN + CT 36.5 35.2 42.1
LHN 40.6 63.7 152.1
LHN + CT 40.7 45.3 44.2
No Adaptation 29.3
Tests using adapted models on Italian continuous speech (% WER)
16
Conclusions
– The new LHN adaptation method, developed within the project, outperforms standard LIN adaptation
– In adaptation tasks with missing classes, Conservative Training reduces the catastrophic forgetting effect, preserving the performance on another generic task
17
Workplan
• Selection of suitable benchmark databases (m6)
• Baseline set-up for the selected databases (m8)
• LIN adaptation method implemented and experimented on the
benchmarks (m12)
• Experimental results on Hiwire database with LIN (m18)
• Innovative NN adaptation methods and algorithms for acoustic
modeling and experimental results (m21)
• Further advances on new adaptation methods (m24)
• Unsupervised Adaptation: algorithms and experimentation (m33)
Recommended