Diagnostic Assessment of Childhood Apraxia of Speech Using Automatic Speech Recognition (ASR) Systems
Lawrence D. Shriberg1
John-Paul Hosom2
Jordan R. Green3
1Waisman Center, University of Wisconsin - Madison
2Center for Spoken Language Understanding, Oregon Health & Science University
3Department of Special Education & Communication Disorders, University of Nebraska - Lincoln
This research is supported by NIDCD grants DC000496 and DC006722
http://www.waisman.wisc.edu/phonology/Index.htm
2
Acknowledgments
Phonology Project, Waisman Center, University of Wisconsin - Madison
Roger Brown, Catherine Coffey, Peter Flipsen Jr. (a), Jordan Green (b), Sheryl Hall, Katherina Hauner, Heather Karlsson, Ray Kent, Yunjung Kim, Joan Kwiatkowski, Jane McSweeny, Connie Nadler, Alison Scheer, Christie Tilkens, David Wilson
Collaborative Projects
• Thomas Campbell & colleagues: University of Pittsburgh
• John-Paul Hosom & colleagues: Oregon Health & Science University
• Barbara Lewis & colleagues: Case Western Reserve University
• Christopher Moore & colleagues: University of Washington
• Rhea Paul & colleagues: Yale Child Studies Center
• Bruce Pennington & colleagues: University of Colorado
• Joanne Roberts & colleagues: University of North Carolina
• Bruce Tomblin & colleagues: University of Iowa
(a) University of Tennessee, Knoxville; (b) University of Nebraska
3
Complex Disease Model for Childhood Speech Sound Disorders (SSD) of Unknown Origin
[Figure: five-level model diagram. Labels recoverable from the slide:]
Levels: I. Etiological Processes; II. Explanatory Processes; III. Nosological Entity; IV. Trait Markers (phenotypes, endophenotypes); V. Diagnostic Markers
Risk and Protective Factors: Genetic, Environmental
Cognitive-Linguistic; Auditory-Perceptual; Speech Motor Control; Psycho-social
Phonological Attunement
Speech Delay – Genetic (SD-GEN); Speech Delay – Otitis Media with Effusion (SD-OME); Speech Delay – Speech Motor Involvement (SD-SMI); Speech Delay – Developmental Psychosocial Involvement (SD-DPI); Speech Errors (SE)
Subtype labels: SD-GEN, SD-OME, SD-AOS, SD-DYS, SD-DPI, SE-/s/, SE-/r/
Marker cells: > Omissions, < Distortions, < Backing; > M1 values; < F3–F2; Speech markers, > I-S Gap, > Backing; > Severity; 8 speech markers, Lex. Stress Ratio, > Coeff. Var. Ratio
*Shriberg, Austin, et al. (1997)
4
Complex Disease Model for Childhood Speech Sound Disorders (SSD) of Unknown Origin
[Figure: the complex disease model diagram repeated, with the cells in the marker rows labeled SDCS*.]
*SDCS: Speech Disorders Classification System (Shriberg, Austin, et al., 1997)
8
Diagnostic Markers and Automatic Speech Recognition
• Childhood Apraxia of Speech is a controversial disorder because there is no consensus on the features that define it or on its etiologic conditions (Guyette & Diedrich, 1981; Shriberg et al., 1997)
• “suspected Apraxia of Speech” (sAOS) proposed as interim term (Shriberg et al., 1997)
• Two proposed markers for sAOS:
  - Lexical Stress Ratio (LSR) (Shriberg et al., 2003a)
  - Coefficient of Variation Ratio (CVR) (Shriberg et al., 2003b)
• This work: Pilot study for complete automation of these markers, to address inherent human variability. Aim was to replicate results of prior work.
• Techniques from automatic speech recognition (ASR)
9
Outline of Talk
• Complex Disease Model for Childhood Speech-Sound Disorders of Unknown Origin
• Diagnostic Markers for sAOS
• Applying ASR to the Lexical Stress Ratio (LSR)
  - The Lexical Stress Ratio
  - Identifying Vowel Boundaries Using ASR
  - Computing the LSR
  - Results
• Applying ASR to Coefficient of Variation Ratio (CVR)
• Summary & Conclusion
10
Applying ASR to the Lexical Stress Ratio: The Lexical Stress Ratio
• LSR (Shriberg et al., 2003a) measures “inappropriate lexical stress” observed in children with sAOS
• Inappropriate lexical stress: excessive stress on a syllable, or lack of stress on a syllable that is normally stressed
• Three factors used to measure lexical stress: frequency area, amplitude area, and duration of the first and second vowels in trochaic words
• Combine values to a single dimension, defined as stress
• Both high and low LSR values associated with sAOS.
• This work: pilot study on 2 subjects from original LSR study.
11
Applying ASR to the Lexical Stress Ratio: Identifying Vowel Boundaries Using ASR
• Primary issue in automating LSR: determine boundaries of both vowels in known, isolated, two-syllable words (e.g. “ladder”)
• Vowel boundaries determined by “forced alignment”
[Figure: forced alignment of “ladder”, labeled with the phoneme sequence .pau l @ dc d 3r .pau]
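Once forced alignment has produced time-stamped phoneme labels, picking out the two vowels is a simple filtering step. The sketch below is illustrative only: the `(label, start, end)` tuple layout, the `VOWELS` set, and all time values are assumptions made for this example, not the output format of the aligner used in the study.

```python
from typing import List, Tuple

Segment = Tuple[str, float, float]  # (label, start_sec, end_sec); hypothetical layout

# Assumed vowel inventory for this sketch: "@" and "3r" are the two vowel
# labels in the slide's "ladder" example (.pau l @ dc d 3r .pau).
VOWELS = {"@", "3r"}

def vowel_boundaries(segments: List[Segment], vowels=VOWELS) -> List[Segment]:
    """Return the aligned segments whose label is a vowel, in time order."""
    ordered = sorted(segments, key=lambda seg: seg[1])
    return [seg for seg in ordered if seg[0] in vowels]

# Made-up alignment times for "ladder"; real values come from the aligner.
alignment = [
    (".pau", 0.00, 0.12), ("l", 0.12, 0.20), ("@", 0.20, 0.38),
    ("dc", 0.38, 0.43), ("d", 0.43, 0.46), ("3r", 0.46, 0.70),
    (".pau", 0.70, 0.85),
]
v1, v2 = vowel_boundaries(alignment)
print(v1, v2)
```

Because the word is known in advance, exactly two vowel segments are expected; a different count would be a sign of an alignment or labeling problem.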
12
Applying ASR to the Lexical Stress Ratio: Computing the LSR
• Vowel duration (D) = (end time of vowel)–(begin time of vowel)
• Amplitude area (AA) = (average amplitude of vowel (dB))×D
• Frequency area (FA) = (average F0 of vowel)×D
[Figure: display panels showing amplitude (amp), F0, phoneme labels (phon), spectrogram (spec), and waveform (wave)]
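The three per-vowel measures defined above can be sketched directly from their definitions. This is a minimal illustration: the function name, the frame-level input lists, and the assumption that the F0 list contains only voiced frames inside the vowel are choices made for this example.

```python
from statistics import mean

def vowel_measures(start_sec, end_sec, amp_db_frames, f0_hz_frames):
    """Return (D, AA, FA) for one vowel.

    D  = vowel duration in seconds
    AA = average amplitude (dB) over the vowel, times D
    FA = average F0 (Hz) over the vowel, times D
    The frame lists are assumed to cover only the vowel interval.
    """
    d = end_sec - start_sec
    aa = mean(amp_db_frames) * d
    fa = mean(f0_hz_frames) * d
    return d, aa, fa

# Made-up frame values for a 200-ms vowel.
d, aa, fa = vowel_measures(0.20, 0.40, [60.0, 62.0], [250.0, 270.0])
print(d, aa, fa)
```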
13
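Applying ASR to the Lexical Stress Ratio: Computing the LSR
\[
\mathrm{LSR}_i = C_1 S_{1,i} + C_2 S_{2,i} + C_3 S_{3,i}
\]
\[
S_{1,i} = \frac{1}{N}\sum_{n=1}^{N}\frac{FA_{v1,n,i}}{FA_{v2,n,i}}, \qquad
S_{2,i} = \frac{1}{N}\sum_{n=1}^{N}\frac{AA_{v1,n,i}}{AA_{v2,n,i}}, \qquad
S_{3,i} = \frac{1}{N}\sum_{n=1}^{N}\frac{D_{v1,n,i}}{D_{v2,n,i}}
\]
\[
C_1 = 0.490, \qquad C_2 = 0.507, \qquad C_3 = 0.303
\]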
v1 = first vowel, v2 = second vowel, N = number of utterances, i = subject
• LSR computed as described in Shriberg et al. (2003a):Ratios of the frequency area, amplitude area, and duration between the first and second vowel are computed and combined into a single score.
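A minimal sketch of combining the three ratios into one score follows. It assumes that each ratio is the mean, over a subject's utterances, of the first-vowel value divided by the second-vowel value, and that the weights are 0.490 (frequency area), 0.507 (amplitude area), and 0.303 (duration). Both assumptions are readings of this slide and should be checked against Shriberg et al. (2003a). The dictionary keys are hypothetical names.

```python
from statistics import mean

# Assumed weights for the frequency-area, amplitude-area, and duration ratios.
WEIGHTS = {"fa": 0.490, "aa": 0.507, "d": 0.303}

def lexical_stress_ratio(utterances, weights=WEIGHTS):
    """Compute one subject's LSR.

    utterances: list of (v1, v2) pairs, one per word token, where v1 and v2
    are dicts holding the keys 'fa', 'aa', and 'd' for the first and second
    vowel.
    """
    score = 0.0
    for key, weight in weights.items():
        # Mean over utterances of the first-to-second vowel ratio (assumed form).
        ratio = mean(v1[key] / v2[key] for v1, v2 in utterances)
        score += weight * ratio
    return score

# Made-up tokens: equal vowels give ratios of 1, so the LSR equals the weight sum.
equal = {"fa": 50.0, "aa": 12.0, "d": 0.2}
print(lexical_stress_ratio([(equal, equal), (equal, equal)]))
```

Under these assumptions, a word whose two vowels are acoustically identical scores 1.3 (the sum of the weights); stronger first-syllable stress pushes the score higher.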
14
Applying ASR to the Lexical Stress Ratio: Results
• Standard error of the mean estimated from data published by Shriberg et al. (2003a): 0.023
• Difference in results for Subject 1 is within the estimated standard error of the mean; not so for Subject 2
• Manually correcting a single gross error from the forced alignment procedure changed the automatic LSR result for Subject 2 to 0.88 (within the standard error of the mean)

Participant   Reported LSR   Automatic LSR   Absolute Difference
Subject 1     1.65           1.63            0.02
Subject 2     0.89           0.83            0.06
15
Outline of Talk
• Complex Disease Model…
• Diagnostic Markers for sAOS
• Applying ASR to the Lexical Stress Ratio (LSR)
• Applying ASR to Coefficient of Variation Ratio (CVR)
  - The Coefficient of Variation Ratio
  - Identifying Speech/Pause Regions Using ASR
  - Computing the CVR
  - Results
• Summary & Conclusion
16
Applying ASR to the Coefficient of Variation Ratio: The Coefficient of Variation Ratio
• CVR (Shriberg et al., 2003b) measures reduction in normal temporal variation of speech, as observed in children with sAOS
• Measurement of CVR depends on duration of speech events and duration of pause events
• Because of reduced variability of speech-event durations in children with sAOS, these children have higher CVR values relative to the control group
\[
\mathrm{CVR} = \frac{CV_{\text{pause}}}{CV_{\text{speech}}} = \frac{\sigma_p / \mu_p}{\sigma_s / \mu_s}
\]
σ_p = standard deviation of pause events
μ_p = mean duration of pause events
σ_s = standard deviation of speech events
μ_s = mean duration of speech events
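The ratio can be sketched in a few lines: the coefficient of variation of pause-event durations divided by that of speech-event durations. The function names and durations below are made up for illustration, and the use of the sample standard deviation (`statistics.stdev`) rather than the population form is an assumption.

```python
from statistics import mean, stdev

def coefficient_of_variation(durations):
    """Standard deviation of the durations divided by their mean."""
    return stdev(durations) / mean(durations)

def cvr(pause_durations, speech_durations):
    """Coefficient of Variation Ratio: CV of pause events over CV of speech events."""
    return (coefficient_of_variation(pause_durations)
            / coefficient_of_variation(speech_durations))

# Made-up event durations in seconds.
pauses = [0.2, 0.4, 0.6]
speech = [1.0, 1.5, 2.0]
print(cvr(pauses, speech))
```

Less variable speech-event durations shrink the denominator, which is why reduced temporal variation shows up as a higher CVR.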
17
Applying ASR to the Coefficient of Variation Ratio: The Coefficient of Variation Ratio
• In Shriberg et al. 2003b, speech/pause events detected by:
  (1) displaying speech amplitude envelope
  (2) human identification of pause event with largest amplitude
  (3) speech/pause classification using threshold from Step (2)
  (4) removing speech/pause regions with duration < 100 msec
• Preliminary results show good agreement between this Matlab-based algorithm and manual measurements from spectrograms (Green et al., this conference)
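Steps (3) and (4) of the threshold procedure can be sketched as follows. This is not the Matlab algorithm itself, only an illustration under stated assumptions: the envelope is sampled in fixed frames (10 ms here), frames above the threshold count as speech, and an event shorter than 100 ms is absorbed into the event before it. The slide does not say how removed regions are reassigned, so that last rule is a guess.

```python
def classify_frames(envelope, threshold):
    """Label each amplitude-envelope frame as 'speech' or 'pause'."""
    return ["speech" if a > threshold else "pause" for a in envelope]

def to_events(labels, frame_ms):
    """Collapse runs of identical frame labels into [label, duration_ms] events."""
    events = []
    for label in labels:
        if events and events[-1][0] == label:
            events[-1][1] += frame_ms
        else:
            events.append([label, frame_ms])
    return events

def drop_short_events(events, min_ms=100):
    """Absorb events shorter than min_ms into the preceding event (assumed rule),
    merging neighbours that end up with the same label."""
    merged = []
    for label, dur in events:
        if merged and (dur < min_ms or merged[-1][0] == label):
            merged[-1][1] += dur
        else:
            merged.append([label, dur])
    return merged

# Made-up envelope: 300 ms speech, a 50 ms dip, 200 ms speech, 300 ms pause.
envelope = [5] * 30 + [1] * 5 + [5] * 20 + [1] * 30
events = drop_short_events(to_events(classify_frames(envelope, threshold=2), frame_ms=10))
print(events)
```

Durations are kept as integer milliseconds so that the 100 ms comparison is not affected by floating-point rounding.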
18
Applying ASR to the Coefficient of Variation Ratio: Identifying Speech/Pause Regions Using ASR
• Can be difficult to identify speech/pause from only energy or amplitude envelope, so investigated speech/pause detection using ASR
• ASR system trained using 300 utterances from 3 children with speech delay of unknown origin
• All training data phonetically labeled by hand, time-aligned at the phoneme level
• ASR system trained to classify 8 broad-phonemic classes related to speech (e.g. “nasal”), instead of phonemes
• Grammar used by ASR system imposed constraints on sequences of phonemic classes to be consistent with English syllable structure
19
Applying ASR to the Coefficient of Variation Ratio: Computing the CVR
• ASR results (broad phonetic classes with English syllable structure) mapped to “speech” and “pause” events
• CVR computed as in Shriberg et al. (2003b):
\[
\mathrm{CVR} = \frac{\sigma_p / \mu_p}{\sigma_s / \mu_s}
\]
[Figure: display panels showing phone class, speech/pause labels, waveform, and spectrogram]
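The mapping from recognizer output to events might look like the sketch below. The class names other than “nasal”, the use of `.pau` as the only pause label, and the `(label, start, end)` layout are assumptions for illustration; the slides do not list the 8 classes or say how each was mapped.

```python
def to_speech_pause(segments, pause_labels=frozenset({".pau"})):
    """Map (label, start_sec, end_sec) class segments to merged speech/pause events.

    Any label not in pause_labels is treated as speech (an assumption), and
    adjacent segments of the same kind are merged into one event.
    """
    events = []
    for label, start, end in segments:
        kind = "pause" if label in pause_labels else "speech"
        if events and events[-1][0] == kind:
            events[-1] = (kind, events[-1][1], end)
        else:
            events.append((kind, start, end))
    return events

# Made-up recognizer output.
segments = [(".pau", 0.0, 0.3), ("nasal", 0.3, 0.4), ("vowel", 0.4, 0.6), (".pau", 0.6, 1.0)]
print(to_speech_pause(segments))
```

The event durations (end minus start) then feed the CVR computation, one list for pauses and one for speech.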
20
Applying ASR to the Coefficient of Variation Ratio: Results
• ASR-method CVR values within 3% of reported values

Participant   Method     Average CV of pause events   Average CV of speech events   CVR
Subject 1     reported   0.581                        0.407                         1.43
Subject 1     ASR        0.565                        0.398                         1.42
Subject 2     reported   0.545                        0.503                         1.08
Subject 2     ASR        0.509                        0.460                         1.11
21
Summary & Conclusion
• Agreement between published results and current pilot-study results indicates potential for ASR-based LSR
• Improvements necessary to automation of LSR: Train speech recognizer on children’s speech data
• ASR-based CVR results for two subjects considered to be comparable to reported CVR results.
• Algorithms still require refinement; improvements possible
• Need to evaluate generalization of methods to other subjects
22
References
• Green, J., Beukelman, D., Ball, L., Ullman, C., and Maassen K. (2004). “Development and Evaluation of a Computer-based System to Measure and Analyze Pause and Speech Events,” Conference on Motor Speech: Motor Speech Disorders, Speech Motor Control, Albuquerque, NM.
• Guyette, T. W. and Diedrich, W. M. (1981). "A Critical Review of Developmental Apraxia of Speech," in Speech and Language: Advances in Basic Research and Practice, 5, pp. 1-45.
• Hawley, M. (2003). “Speech Training And Recognition for Dysarthric Users of Assistive Technology (STARDUST) ”, Wales International Conference on Electronic Assistive Technology, Cardiff, Wales, July 2003.
• Hosom, J. P. (2000). Automatic Time Alignment of Phonemes Using Acoustic-Phonetic Information. Ph.D. thesis, Oregon Graduate Institute of Science and Technology, Beaverton, Oregon.
• Kasi, K. and Zahorian, S. A. (2002). “Yet Another Algorithm for Pitch Tracking,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2002, Orlando, FL, 1, pp. 361-364.
23
References
• Marquardt, T. P., Sussman, H. M., Snow, T., and Jacks, A. (2002). "The Intelligibility of the syllable in developmental apraxia of speech," in Journal of Communication Disorders, 35, pp. 31-49.
• Shriberg, L. D., Austin, D., Lewis, B. A., McSweeny, J. L., and Wilson, D. L. (1997). "The Speech Disorders Classification System (SDCS): Extensions and Lifespan Reference Data," in Journal of Speech, Language, and Hearing Research, 40, pp. 723-740.
• Shriberg, L. D., Campbell, T. F., Karlsson, H. B., Brown, R. L., McSweeny, J. L., & Nadler, C. J. (2003a). “A Diagnostic Marker for Childhood Apraxia of Speech: The Lexical Stress Ratio,” in Special Issue: Diagnostic Markers for Child Speech-Sound Disorders, Clinical Linguistics & Phonetics, 17.7, pp. 549-574.
• Shriberg, L. D., Green, J. R., Campbell, T. F., McSweeny, J. L., & Scheer, A. (2003b). “A Diagnostic Marker for Childhood Apraxia of Speech: The Coefficient of Variation Ratio,” in Special Issue: Diagnostic Markers for Child Speech-Sound Disorders, Clinical Linguistics & Phonetics, 17.7, pp. 575-595.
26
Applying ASR to the Lexical Stress Ratio: Speech Data
• Data from Shriberg et al.’s 2003a study (LSR corpus):
  - 24 children with speech delay (control data)
  - 11 children with sAOS
  - Recordings of elicited samples of 8 trochaic words
  - Average age: 6 yrs, 4 mo. for children with speech delay; 7 yrs, 1 mo. for children with sAOS
• This study:
  - Pilot study
  - 2 children from LSR corpus
27
Applying ASR to the Lexical Stress Ratio: Measuring F0 and Amplitude
• Fundamental frequency (F0) measured by computing auto-correlation of relative changes in energy between 200 and 700 Hz (Hosom, 2000)
• Post-processing of F0 contour: dynamic programming method to correct pitch-doubling and pitch-halving errors (e.g. Kasi and Zahorian, 2002)
• 3 Hz average absolute error on single speaker under quiet conditions (cf. electro-glottography measurements)
• Amplitude computed as the log energy (in decibels) of the signal using a 20-msec Hamming window
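The amplitude measure can be sketched as frame-wise log energy under a Hamming window. The 20-msec window comes from the slide; the 10-msec hop, the energy floor, and the function names are assumptions for this example, and the dB values are relative (no calibration reference is implied).

```python
import math

def hamming(n):
    """Standard Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def log_energy_db(samples, sample_rate, win_sec=0.020, hop_sec=0.010, floor=1e-10):
    """Return one log-energy value (dB) per frame.

    Each frame is win_sec long, weighted by a Hamming window; hop_sec and
    floor are illustrative choices, not taken from the study.
    """
    n = int(round(win_sec * sample_rate))
    hop = int(round(hop_sec * sample_rate))
    window = hamming(n)
    energies = []
    for start in range(0, len(samples) - n + 1, hop):
        energy = sum((window[k] * samples[start + k]) ** 2 for k in range(n))
        energies.append(10.0 * math.log10(max(energy, floor)))
    return energies

# Made-up signal: one second of a 200 Hz sine sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * t / rate) for t in range(rate)]
frames_db = log_energy_db(tone, rate)
print(len(frames_db), frames_db[0])
```

Averaging these frame values over a vowel's interval gives the average amplitude used in the amplitude-area measure.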
28
Applying ASR to the Coefficient of Variation Ratio: Speech Data
• Data from Shriberg et al.’s 2003b study (CVR corpus):
  - 30 children with normal speech acquisition (control data)
  - 30 children with speech delay (control data)
  - 15 children with sAOS
  - Recordings of conversational speech
  - Inclusionary criteria for children with sAOS based on a set of provisional speech and prosody-voice markers considered consistent with sAOS
• This study:
  - Pilot study
  - 2 children from CVR corpus
29
Applying ASR to the Coefficient of Variation Ratio: Results
• Replicated Shriberg et al.’s CV computation using threshold of amplitude envelope
• Differences between replicated and reported values for average CVs of pause and speech < 0.01; smallest reported standard error of the mean = 0.015
• Comparison of results of individual samples from the ASR and Matlab methods:
(A) Matlab CVR method yields some speech events that are “interrupted” by low-amplitude speech. ASR-based CVR may be less sensitive to these interruptions.
(B) Other differences in the way the ASR and Matlab techniques detect speech and pause events
30
Why LSR = state-of-art system for adult speech, CVR = broad classes trained on child speech?
• LSR forced alignment could be
  (a) state-of-art system trained on adult speech
  (b) not-state-of-art system trained on children’s speech
• Phoneme identities are given, so mismatch between adult and child speech not as severe as in ASR
• Chose (a) because of ease of implementation
• For CVR task of speech/pause detection, existing forced alignment system
  (a) would require software modifications for ASR
  (b) was not required to identify phonemes
• Implemented broad-phonetic-category ASR because of presumed robustness and ease of implementation.
31
Long-Term GoalsLong-Term Goals
• ASR may also allow automatic measurement of other prosodic factors, such as syllable duration, inter-stress intervals, and linguistic rhythm
• Multiple measures of sAOS may be combined for improved sensitivity and specificity
• Evaluate specific factors that influence diagnosis