CONCLUSIONS
• We presented a machine learning approach to analyze and describe motions of the human tongue in dynamic US
• Results show that our proposed descriptors can be employed to perform different classification tasks effectively
• Future work includes applying the method to data with more varied articulations
Feature Extraction
2) Spatio-temporal gestural descriptors
• These descriptors are designed to explicitly encode changes in tongue motion over time
• We perform principal component analysis on the x and y components of all displacement fields (for all k in all n studies)
• We represent D(n,k) using the principal coefficients C of the first M principal components:
  Pk = [ C1x … CMx  C1y … CMy ]
• Our spatio-temporal gestural descriptor is then encoded as the concatenation:
  [ P1 P2 … Pk … PK-1 ]
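A minimal numpy-only sketch of this encoding; the PCA below (via SVD) stands in for whatever implementation the authors used, and the field sizes and M are illustrative:

```python
import numpy as np

def pca_coefficients(fields, M):
    """Project each flattened displacement-field component onto the
    first M principal components of the whole collection."""
    X = np.array([f.ravel() for f in fields])          # one row per field
    Xc = X - X.mean(axis=0)                            # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt = PCs
    return Xc @ Vt[:M].T                               # coefficients C

# Toy example: K-1 = 4 displacement fields on a 5x5 grid (x-component)
rng = np.random.default_rng(0)
fields_x = [rng.standard_normal((5, 5)) for _ in range(4)]
Cx = pca_coefficients(fields_x, M=2)                   # shape (4, 2)
# Pk = [Ck1x ... CkMx  Ck1y ... CkMy]; concatenating all Pk over
# k = 1 ... K-1 yields the spatio-temporal gestural descriptor
```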
• To capture regional velocity differences that may exist, we divide the image domain into 3 regions and compute the distributions of the x- and y-components of D(n,k)
• The entries of each histogram constitute a feature vector, e.g. Vx-P is the vector for the x-component in the posterior region
• Concatenating all feature vectors yields our velocity-based descriptor
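A sketch of these regional velocity histograms; the vertical-band region split, bin count, and displacement range here are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def velocity_descriptor(Dx, Dy, bins=8, vrange=(-2.0, 2.0)):
    """Split the field into 3 bands (posterior, blade, dorsum) and
    histogram the x- and y-displacement components in each band."""
    h, w = Dx.shape
    edges = [0, w // 3, 2 * w // 3, w]
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        for comp in (Dx, Dy):                      # Vx-*, Vy-* per region
            hist, _ = np.histogram(comp[:, lo:hi], bins=bins, range=vrange)
            feats.append(hist / hist.sum())        # normalized histogram
    return np.concatenate(feats)                   # velocity-based descriptor

# Toy displacement field components on a 20x30 grid
rng = np.random.default_rng(1)
Dx = rng.uniform(-1, 1, (20, 30))
Dy = rng.uniform(-1, 1, (20, 30))
v = velocity_descriptor(Dx, Dy)                    # 3 regions x 2 comps x 8 bins
```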
How does Dynamic Time Warping [4] work?
1. Spectral analysis extracts features that relate to pitch, as well as onset times of beats/notes
2. A t × t similarity matrix S is constructed, where Sij is the cosine distance between the features of signals Am and An at timesteps i and j
3. The lowest-cost path through S is found using dynamic programming
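Step 3 can be sketched with standard dynamic programming; this is a generic minimal implementation (names illustrative), not the exact variant of [4]:

```python
def lowest_cost_path(S):
    """Accumulate costs through S with steps (i-1,j), (i,j-1), (i-1,j-1),
    then backtrack from the end to recover the alignment path."""
    n, m = len(S), len(S[0])
    INF = float("inf")
    acc = [[INF] * m for _ in range(n)]
    acc[0][0] = S[0][0]
    for i in range(n):
        for j in range(m):
            if i == j == 0:
                continue
            best = min(acc[i-1][j] if i else INF,
                       acc[i][j-1] if j else INF,
                       acc[i-1][j-1] if i and j else INF)
            acc[i][j] = S[i][j] + best
    # Backtrack from the bottom-right corner to (0, 0)
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i and j: candidates.append((acc[i-1][j-1], i - 1, j - 1))
        if i:       candidates.append((acc[i-1][j],   i - 1, j))
        if j:       candidates.append((acc[i][j-1],   i,     j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    return path[::-1], acc[n-1][m-1]
```

Each (i, j) pair on the returned path asserts that the i-th frame of one signal matches the j-th frame of the other.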
[1] Rastadmehr et al.: Increased midsagittal tongue velocity as indication of articulatory compensation in patients with lateral partial glossectomies. Head & Neck 30(6) (2008) 718–726
[2] Kocjancic, T.: Ultrasound study of tongue movements in childhood apraxia of speech. In: Ultrafest V. (2010) 1–2
[3] Herold et al.: Analysis of vowel-consonant-vowel sequences in patients with partial glossectomies using 2D ultrasound imaging. In: Ultrafest V. (2010) 1–2
[4] Turetsky, R., Ellis, D.: Ground-truth transcriptions of real music from force-aligned MIDI syntheses. In: 4th ISMIR. (2003) 135–141
[5] Metz et al.: Nonrigid registration of dynamic medical imaging data using nD+t B-splines and a groupwise optimization approach. Medical Image Analysis 15(2) (2011) 238–249
[6] Wu, J.: A Fast Dual Method for HIK SVM Learning. In: ECCV. (2010) 552–565
A Machine Learning Approach to Tongue Motion Analysis in 2D Ultrasound Image Sequences
Lisa Tang1, Ghassan Hamarneh1 and Tim Bressmann2
1 Medical Image Analysis Lab, School of Computing Science, Simon Fraser University
2 Department of Speech-Language Pathology, Faculty of Medicine, University of Toronto
[Figure: spectrogram (frequency vs. time) of an audio signal, and the similarity matrix S over the frame indices of Am and An; the lowest-cost path through S matches the i-th frame of An to the j-th frame of Am (entry Sij).]
Data Normalization
1. One patient study is chosen as the reference, and its audio signal Am is chosen as the template audio signal
2. For each other patient study n, we seek a mapping Tm: Am → An that aligns audio signal Am to An using Dynamic Time Warping [4]
3. We then compute from Tm the K indices that indicate frame correspondences
• Reading speeds vary across subjects, so the same word is articulated at different times
• We thus need to resolve temporal correspondences of the US sequences across subjects, i.e. extract a subset of US frames from each sequence in which the same sounds were spoken
• To find temporal correspondences across studies Um and Un, we use their audio recordings Am and An
[Figure: the mapping Tm aligns audio signals Am and An, establishing correspondences between US frames U(m,1) … U(m,K) and U(n,1) … U(n,K).]
Motion Characterization
• We characterize tongue motions via groupwise registration of the K extracted US frames
• We employ the 2D+time registration algorithm of [5]
• Registration accuracy has been confirmed using expert-delineated tongue contours
• This generates a set of displacement fields {D(n,k) : k = 1 … K−1}, each of which maps points in frame k to corresponding points in frame k+1 in ℝ²
[Figure: the sequence of displacement fields D(n,1) … D(n,K−1) between consecutive frames.]
1) Velocity-based descriptors
INTRODUCTION
• Analysis of ultrasound (US) tongue sequences and accompanying audio recordings enables speech research
• Current goal: develop a procedure for tongue motion analysis
• Ultimate goal: develop reliable and robust indicators that quantify what constitutes normal and abnormal tongue movement
• Such indicators would aid the development of treatment strategies for speech impediments
• In contrast to previous tongue motion analyses, e.g. [1-3], we propose a method that does not require segmentations
• We analyze tongue motion captured in the US data via 3 classification tasks, described below
• Given a training set of paired data {(ai, bi)}, where ai is the feature vector of motion sample i and bi its label, we train a Support Vector Machine (SVM) that predicts the label of each sample in a test set based on the sample's features
• The distance between ai and aj is measured with the histogram intersection kernel [6]:
  K(ai, aj) = Σf=1..F min(aif, ajf)
  where F is the length of ai
• We then train the SVM using Intersection Coordinate Descent [6], a deterministic algorithm that was shown to be fast and accurate
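A minimal sketch of the histogram intersection kernel; the toy vectors are illustrative, and the fast coordinate-descent training of [6] is not reproduced here:

```python
def hik(a, b):
    """Histogram intersection kernel: sum of elementwise minima."""
    return sum(min(x, y) for x, y in zip(a, b))

# Gram matrix for a toy training set of F = 3-dimensional feature vectors
samples = [[0.2, 0.5, 0.3], [0.1, 0.7, 0.2], [0.6, 0.2, 0.2]]
gram = [[hik(a, b) for b in samples] for a in samples]
```

Such a precomputed Gram matrix can be fed to any kernel-SVM solver that accepts precomputed kernels; [6] instead exploits the kernel's structure for fast exact training.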
EXPERIMENTAL RESULTS
Analysis
Task 1
• Objective: Examine how tongue velocity varies in different regions of the tongue as subjects spoke
• Setup: Subjects recited a passage of over 50 words
• Motion samples: Displacement field computed between two US frames
• Analysis: Abnormal vs. normal tongue motion
• Results (classification accuracies): Vx-B 86%, Vy-B 90%, Vx-D 81%, Vy-D 89%, Vx-P 90%, Vy-P 91%, all features combined 94%

Task 2
• Objective: Examine whether the spatio-temporal descriptors can be used to predict utterance type
• Setup: Subjects recited 3 utterances 5 times, each a vowel-consonant-vowel (VCV) sequence: /aka/, /ishi/, /ushu/
• Motion samples: The sequence of displacement fields generated from an entire VCV sequence
• Analysis: /aka/ vs. /ishi/ vs. /ushu/
• Results (classification accuracies): 74–86% across the pairwise (/ishi/ vs. /ushu/, /aka/ vs. /ishi/) and 3-class comparisons

Task 3
• Objective: Examine whether the spatio-temporal descriptors can be used to predict abnormal tongue motion
• Setup: Same as Task 2
• Motion samples: Same as Task 2
• Analysis: Abnormal vs. normal tongue motion
• Results (classification accuracies): /aka/ 84%, /ishi/ 86%, /ushu/ 84%
[Figure: US frame Un divided into three regions (posterior, blade, dorsum), yielding the velocity feature vectors {Vx-P, Vy-P}, {Vx-B, Vy-B}, {Vx-D, Vy-D}.]