Automatic Grapheme-Phoneme Conversion for Spoken British English Corpora
C. AURAN, C. BOUZON & D.J. HIRST
Laboratoire Parole et Langage, CNRS UMR 6057
Université de Provence
Summary
1. The Aix-MARSEC Project
   • Building Aix-MARSEC
   • Availability of the database
   • Methodology
2. Grapheme-Phoneme Conversion and Alignment
   • The Aix-MARSEC Methodology
   • Integration into PCE
3. Conclusion and Perspectives
The Aix-MARSEC Project
• Automatic grapheme-to-phoneme conversion
• Automatic phoneme level alignment
• Automatic intonation annotation using the Momel-Intsint methodology
• 8 annotation levels aligned: phonemes, syllable constituents,
syllables, words, feet and rhythmic units, tone groups, Intsint coding
• Tagging and parsing alignment under way
An evolution from the SEC and MARSEC corpora
SEC
Spoken English Corpus
• 55,000 words, 339 min. and 18 sec.
• BBC 1980s recordings
• 11 speaking styles
• 53 speakers (17 female and 36 male)
• Orthographic transcription
• Syntactic tagging and parsing
• Prosodic annotation: 14 tonetic stress marks (TSM)
MARSEC
Machine Readable SEC
Aix-MARSEC
Building Aix-MARSEC
• Alignment of words and tone groups with the signal
• Conversion of all the TSM to ASCII characters
Availability of the database
• Online version (www.lpl.univ-aix.fr/~EPGA/):
  - Annotation files (TextGrids)
  - Phoneme data tables
  - Perl and Praat scripts
• CD-Rom version:
  - Annotation files (TextGrids)
  - Phoneme data tables
  - Perl and Praat scripts
  - Sound files (.wav format)
Methodology
[Flow diagram: Orthographic transcription → G2P conversion → Raw phonemic transcription → Elision prediction → Optimised phonemic transcription → Automatic alignment → Aligned phonemic transcription → SC, syllable and word annotation. Further annotation levels shown: TSM annotation, rhythmic annotation.]
Grapheme-Phoneme Conversion and Alignment
G2P Conversion and Alignment
The Aix-MARSEC Methodology
[Flow diagram: Orthographic transcription → G2P conversion → Raw phonemic transcription → Elision prediction → Optimised phonemic transcription → Automatic alignment → Aligned phonemic transcription → SC, syllable and word annotation. First step detailed below: G2P conversion.]
G2P Conversion: General principles
• Dictionary-based method (4 dictionaries used)
• Specific processing for numbers, abbreviations, etc.
• Syntagmatic effects (linking r, definite article)
Raw transcription
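The syntagmatic effects listed above can be illustrated with a minimal sketch. It assumes SAMPA-like symbols, a hypothetical vowel inventory, and a simplified rule set (pre-vocalic "the" as /Di:/, linking r after a final orthographic "r" or "re"); the actual Aix-MARSEC rules are not given in the slides.

```python
# Hypothetical SAMPA-like vowel inventory (an assumption, not the project's symbol set).
VOWELS = {"I", "e", "{", "Q", "V", "U", "@", "i:", "A:", "O:", "u:", "3:",
          "eI", "aI", "OI", "@U", "aU", "I@", "e@", "U@"}


def apply_syntagmatic_effects(words):
    """Adjust citation forms according to the following word.

    words: list of (spelling, phoneme_list) pairs in citation form.
    Returns a new list with the definite article and linking r adjusted.
    """
    result = []
    for i, (spelling, phones) in enumerate(words):
        phones = list(phones)
        following = words[i + 1][1] if i + 1 < len(words) else []
        before_vowel = bool(following) and following[0] in VOWELS
        lower = spelling.lower()
        if lower == "the":
            # Definite article: /Di:/ before a vowel, /D@/ otherwise (simplified).
            phones = ["D", "i:"] if before_vowel else ["D", "@"]
        elif (lower.endswith(("r", "re")) and phones
              and phones[-1] in VOWELS and before_vowel):
            # Linking r: orthographic final r surfaces before a vowel.
            phones.append("r")
        result.append((spelling, phones))
    return result
```

For instance, "far away" would receive a linking /r/ on "far", while "far from" would not.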
G2P Conversion: The 4 dictionaries
• Primary pronunciation dictionary (‘Advanced Learner’s Dictionary’, Oxford University Press; 71,000 entries)
• Complementary dictionary (700 entries)
• “Problematic forms” dictionary (for hesitations, partial words,…; 26 entries)
• “Reduced forms” dictionary (75 entries)
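A dictionary-based conversion of this kind can be sketched as a cascade of lookups. The lookup order, the toy entries and the function name below are assumptions for illustration; the slides do not state how the dictionaries are prioritised. The reduced forms dictionary is left out of this sketch because it depends on the prosodic annotation.

```python
# Toy entries in a SAMPA-like notation; all entries are hypothetical examples.
PRIMARY = {"and": "{nd", "the": "D@", "corpus": "kO:p@s"}
COMPLEMENTARY = {"marsec": "mA:sek"}   # words missing from the primary dictionary
PROBLEMATIC = {"erm": "3:m"}           # hesitations, partial words, ...


def lookup_citation_form(word):
    """Return the first transcription found, or None if the word is unknown.

    Assumed order: problematic forms, then complementary, then primary.
    """
    key = word.lower()
    for dictionary in (PROBLEMATIC, COMPLEMENTARY, PRIMARY):
        if key in dictionary:
            return dictionary[key]
    return None
```

Returning None for unknown words leaves room for the specific processing of numbers, abbreviations and inflected forms.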
G2P Conversion: Specific issues
• Abbreviations
• Numbers
• Sequences of numbers and capitals (Post Codes)
• Genitives and Contractions
• 3rd person and plural forms
• Preterite and past participle forms
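For inflected forms absent from a dictionary, one plausible approach is to derive the suffix from the last phoneme of the stem, following the regular English allomorphy of "-s" and "-ed". The sketch below assumes SAMPA-like symbols and regular forms only; it is not the project's documented procedure.

```python
SIBILANTS = {"s", "z", "S", "Z", "tS", "dZ"}
VOICELESS = {"p", "t", "k", "f", "T", "s", "S", "tS"}


def add_s_suffix(stem):
    """Plural, 3rd person or genitive: /Iz/ after sibilants, /s/ after voiceless, else /z/."""
    last = stem[-1]
    if last in SIBILANTS:
        return stem + ["I", "z"]
    if last in VOICELESS:
        return stem + ["s"]
    return stem + ["z"]


def add_ed_suffix(stem):
    """Preterite or past participle: /Id/ after /t d/, /t/ after voiceless, else /d/."""
    last = stem[-1]
    if last in {"t", "d"}:
        return stem + ["I", "d"]
    if last in VOICELESS:
        return stem + ["t"]
    return stem + ["d"]
```

Both functions take and return lists of phoneme symbols, so that multi-character symbols such as "tS" remain single units.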
[Flow diagram, next step: Raw phonemic transcription → Elision prediction → Optimised phonemic transcription]
Elision Prediction: General principles
• Raw transcription ↔ citation forms
• Continuous speech ↔ specific phenomena (elisions, epenthesis, metathesis, etc.)
Elision prediction: Constraints
- Intonation constraints (TSM)
- Temporal constraints:
  - Minimal threshold: 5 ms
  - Thresholds for specific phonemes (Klatt, 1979): /t/ and /d/ = 55 ms; /@/ = 55 ms; /T/ = 110 ms
  - Lengthening factor z: z < 0 → elision; z ≥ 0 → no elision
- Phonotactic constraints (rules)
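The temporal constraints can be expressed as a small check. This is a minimal sketch: the threshold values come from the slide, but how the duration available to a phoneme is estimated, and how the z factor is computed, are not specified here, so both are simply taken as inputs (an assumption).

```python
MIN_DURATION_MS = 5                                        # minimal threshold for any phoneme
THRESHOLDS_MS = {"t": 55, "d": 55, "@": 55, "T": 110}      # phoneme-specific values from the slide


def below_duration_threshold(phoneme, duration_ms):
    """True when the estimated duration is too short for the phoneme to be realised."""
    if duration_ms < MIN_DURATION_MS:
        return True
    limit = THRESHOLDS_MS.get(phoneme)
    return limit is not None and duration_ms < limit


def elision_by_lengthening(z):
    """Lengthening factor rule: z < 0 signals elision, z >= 0 does not."""
    return z < 0
```

Phonemes without a specific threshold are only subject to the 5 ms minimum.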
Elision prediction: Rules

Principles | Phonemes | Contexts | Constraints | Examples
0 | - | - | < 5 ms | -
1 | d | and | TSM | and then
2 | h | he('s/ll/d) him his her | TSM | in her case
3 | t d | {[t][d]} # {[t][d]} | Th. - except '-ed' | I've got to
4 | t d | C1 + {[t][d]} # C2 – {[h][j]} | Th. | mustn't lose
5 | p k | nasal + {[p][k]} (#) C – {[r][l][j]} | - | glimpse
6 | l | [O:] + [l] (#) C | - | always
7 | T | C + [T] (#) [s] | Th. | twelfths
8 | ptk bdg | [s| z] + {[p| b][t| d][k| g]} (#) [s| z] | - | tourists
9 | @ | [@] + {[l][r]} (#) + reduced vowel {[I][@]} | Th. - */rl/ | camera
10 | @ | # [k@n] ('syll (syll [0…n])) # | TSM - Th. | confront
11 | @ | {[k][p]} + [@] + [n] # | Th. | open

Th.: duration threshold
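Two of the rules above can be sketched as functions over a word's phoneme list. This is an illustrative sketch: it handles only the word-internal case (not the optional word boundary "(#)"), uses a hypothetical vowel inventory, and takes the outcome of the duration constraint as a boolean argument.

```python
# Hypothetical SAMPA-like vowel inventory used to decide what counts as a consonant.
VOWELS = {"I", "e", "{", "Q", "V", "U", "@", "i:", "A:", "O:", "u:", "3:",
          "eI", "aI", "OI", "@U", "aU", "I@", "e@", "U@"}


def apply_rule_6(phones):
    """Rule 6: elide /l/ between /O:/ and a following consonant (e.g. 'always')."""
    out = []
    for i, p in enumerate(phones):
        prev = phones[i - 1] if i > 0 else None
        nxt = phones[i + 1] if i + 1 < len(phones) else None
        if p == "l" and prev == "O:" and nxt is not None and nxt not in VOWELS:
            continue  # drop the /l/
        out.append(p)
    return out


def apply_rule_11(phones, duration_constraint_met=True):
    """Rule 11: elide /@/ in word-final /k@n/ or /p@n/ (e.g. 'open')."""
    if (duration_constraint_met and len(phones) >= 3
            and phones[-1] == "n" and phones[-2] == "@"
            and phones[-3] in {"k", "p"}):
        return phones[:-2] + ["n"]
    return list(phones)
```

With these definitions, "open" /@Up@n/ becomes /@Upn/ when the duration constraint is met, and is left unchanged otherwise.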
Elision prediction: Evaluation

Measure | Value
Recall | 50.51 %
Precision | 74.44 %
Silence | 49.49 %
Noise | 25.56 %
F-measure | 60.18 %

• 4,077 elided phonemes out of 199,770 in the corpus (≈ 2 %)
• Recall: half of all elisions are correctly predicted
• Precision: three quarters of the predicted elisions are correct
• F-measure: global quality of the algorithm
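The reported measures are related by standard formulas: silence is the complement of recall, noise is the complement of precision, and the F-measure is their harmonic mean. The short sketch below (function names are illustrative) can be used to check that the figures on the slide are mutually consistent.

```python
def silence(recall):
    """Share of true elisions that were missed."""
    return 1.0 - recall


def noise(precision):
    """Share of predicted elisions that were wrong."""
    return 1.0 - precision


def f_measure(recall, precision):
    """Harmonic mean of recall and precision."""
    return 2.0 * recall * precision / (recall + precision)


if __name__ == "__main__":
    r, p = 0.5051, 0.7444
    print(f"silence   = {100 * silence(r):.2f} %")
    print(f"noise     = {100 * noise(p):.2f} %")
    print(f"F-measure = {100 * f_measure(r, p):.2f} %")
```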
[Flow diagram, next step: Optimised phonemic transcription → Automatic alignment → Aligned phonemic transcription]
Alignment: General principles
HMM and Viterbi based alignment by Christophe Lévy (LIA, France)
- HMM trained on the TIMIT corpus of American English
- Gaussian Mixture Models (8 components, diagonal covariance matrices, estimated with the Expectation-Maximisation algorithm under the Maximum-Likelihood criterion)
- 12 MFCCs (filter bank analysis) plus energy (13 coefficients), augmented with their delta and delta-delta coefficients: a 39-coefficient vector per speech frame
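The core of a Viterbi forced alignment can be sketched as follows. This is a deliberately simplified illustration, not the aligner used in the project: it assumes one state per phoneme, no transition probabilities, and a precomputed table of frame log-likelihoods (which a real system would obtain from the GMM of each HMM state).

```python
import math


def forced_align(loglik):
    """Assign each frame to a phoneme of a known sequence, left to right.

    loglik[t][s] is the log-likelihood of frame t under phoneme s.
    Requires at least as many frames as phonemes.
    Returns a list giving the phoneme index chosen for each frame.
    """
    n_frames, n_states = len(loglik), len(loglik[0])
    neg_inf = -math.inf
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = loglik[0][0]              # the path must start in the first phoneme
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg_inf
            if move > stay:
                score[t][s] = move + loglik[t][s]
                back[t][s] = s - 1
            else:
                score[t][s] = stay + loglik[t][s]
                back[t][s] = s
    # Backtrack from the last phoneme at the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path
```

Phoneme boundaries are then read off as the frames where the returned index changes.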
Alignment: Evaluation
- Mean absolute error: 22 ms
- Mean error: -6.29 ms
- Kurtosis: 8.15 (narrow distribution)
- Skewness: -0.94 (left bias)
Acceptance threshold | Optimised transcription
64 ms | 93.25 %
32 ms | 82.02 %
20 ms | 68.37 %
16 ms | 59.97 %
15 ms | 57.40 %
10 ms | 42.43 %
5 ms | 23.72 %
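Evaluation figures of this kind can be computed from pairs of automatic and manual boundary times. The sketch below is illustrative only: the sign convention (automatic minus manual) and the use of an inclusive threshold are assumptions, and the boundary times in the usage example are made up.

```python
from statistics import mean


def boundary_errors(auto_ms, manual_ms):
    """Signed error for each boundary: automatic minus manual (assumed convention)."""
    return [a - m for a, m in zip(auto_ms, manual_ms)]


def mean_absolute_error(errors):
    return mean(abs(e) for e in errors)


def acceptance_rate(errors, threshold_ms):
    """Percentage of boundaries whose absolute error is within the threshold."""
    within = sum(1 for e in errors if abs(e) <= threshold_ms)
    return 100.0 * within / len(errors)


if __name__ == "__main__":
    errs = boundary_errors([100, 210, 295, 430], [105, 200, 300, 400])  # toy data
    print(mean(errs), mean_absolute_error(errs), acceptance_rate(errs, 10))
```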
Integration into PCE
Integration: Motivations
Double focus, for phoneticians and phonologists:
• Segmental phenomena: formant charts
• Prosodic phenomena: tonal alignment
Common basis: phoneme level alignment
Integration: 2 possible policies
• Direct integration: exact Aix-MARSEC methodology
  - Requires word level manual alignment
• Alternative integration: adaptation of the Aix-MARSEC methodology
  - Optional elisions predicted on the basis of phonotactic rules only, with the decision taken during the alignment phase
Conclusions and Perspectives
• A fully automatic methodology that is easy to extend
• Diverse types of exploitation, phonological or phonetic, segmental or prosodic (formant charts, temporal, intonational and metrical studies, …)
• Full interactivity with other ProZEd modules (Momel-Intsint, …)
• Realistic integration into PCE (2 options)
Well… This time it’s for good !!
Presentation available from
www.lpl.univ-aix.fr/~EPGA/
14 ASCII prosodic annotation symbols (Roach, 1994):
_ low level
~ high level
< step-down
> step-up
/’ (high) rise-fall
‘/ high fall-rise
\ high fall
/ high rise
, low rise
‘ low fall
,\ (low rise-fall – not used)
\, low fall-rise
* stressed but unaccented
| minor intonation unit boundary
|| major intonation unit boundary
Reduced forms processing
• Creation of a reduced forms dictionary based on O’Connor (1967) and Faure (1975)
• Reduction constraint: TSM absence
• Aim: improving G2P conversion
Example:
- TSM: ‘/and → converted into /{nd/
- No TSM: and → converted into /@nd/
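The reduction constraint can be sketched as a choice between a full and a reduced form depending on whether the token carries a tonetic stress mark. The token format (TSM symbols prefixed to the word), the character set treated as TSM symbols, and the toy dictionary entries are assumptions made for this illustration.

```python
# Characters treated as TSM symbols, including the typographic quotes
# found in this transcript (an assumption about the token format).
TSM_CHARS = set("_~<>/\\,'*\u2018\u2019")

FULL_FORMS = {"and": "{nd"}      # toy entry
REDUCED_FORMS = {"and": "@nd"}   # toy entry


def split_tsm(token):
    """Split a token into its leading TSM symbols and the bare word."""
    i = 0
    while i < len(token) and token[i] in TSM_CHARS:
        i += 1
    return token[:i], token[i:]


def transcribe_token(token):
    """Use the reduced form only when the word carries no TSM."""
    tsm, word = split_tsm(token)
    key = word.lower()
    if not tsm and key in REDUCED_FORMS:
        return REDUCED_FORMS[key]
    return FULL_FORMS.get(key)
```

A word that begins with an apostrophe would be misparsed by this sketch, so a real implementation would need a stricter token format.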