Xkl: A Tool For Speech Analysis
Eric Truslow
Adviser: Helen Hanson
Outline
• Introduction to speech analysis
  – Production mechanism
  – Models of speech production
• Background about Xkl
• Design
  – Pitch Detection
  – Labeling
  – Portability
• Future Work
Speech Production
[Figure: Periodic Source → Vocal Tract Frequency Response → Output]
Nasal cavities contribute too
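Speech Model: Basic
[Block diagram]
• Voiced branch: Impulse Train Generator (input: Pitch Period) → Glottal Pulse Model → × Gain
• Unvoiced branch: Random Noise Generator → × Gain
• Voiced/Unvoiced Decision selects which branch drives the Vocal Tract Model
• Vocal Tract Model (input: Vocal Tract Parameters)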
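The basic model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from Xkl: the function names `excitation` and `resonator` are hypothetical, the glottal pulse model is omitted, and a single two-pole resonator (the second-order form commonly used in Klatt-style formant synthesizers) stands in for the whole vocal tract model.

```python
import math
import random


def excitation(n_samples, voiced, pitch_period, gain=1.0, seed=0):
    """Source signal: an impulse train if voiced, white noise otherwise.

    pitch_period is in samples; gain scales either branch.
    """
    if voiced:
        return [gain if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [gain * rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2].

    Assumption: one resonance stands in for the full vocal tract model.
    The coefficient a is chosen so that the gain at 0 Hz is 1.
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    c = -r * r
    a = 1.0 - b - c
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


# Voiced example: 100 Hz source at 8 kHz, shaped by one 500 Hz formant.
voiced_source = excitation(800, voiced=True, pitch_period=80)
vowel_like = resonator(voiced_source, freq_hz=500.0, bandwidth_hz=60.0, sample_rate=8000)
```

Switching `voiced` to `False` feeds noise through the same filter, which mirrors the voiced/unvoiced decision in the diagram. A fuller model would cascade several resonators, one per formant.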
Speech Model: Klatt
Parameters
• Source characterization
  – Voiced or unvoiced
  – Frequency of periodic source
  – Energy distribution of a noise source
• Vocal tract model
  – Resonant frequencies (formants), antiresonant frequencies, and bandwidths
Background - Xkl
• Developed in-house at MIT by Dennis Klatt in the 1980s; originally a command-line tool on VAX systems.
• Later ported to UNIX, where an X11/Motif GUI was added.
• Currently runs on Linux.
• Praat has become a very versatile alternative to Xkl, but Xkl has functionality that Praat does not.
Xkl – Features
• Allows users to easily examine speech signals in fine detail.
• Automatically computes DFT and spectrogram.
• Can perform a variety of computations not available in other packages:
  – Averages spectra over time or waveforms
  – Smooths spectra
Spectrogram and DFT in Xkl
[Figure: Xkl windows showing the spectrogram, and the DFT with smoothed spectrum]
Design Requirements
Users surveyed wanted:
1. Pitch period estimator
2. An improved labeling system
3. Portability
   – Compatibility with multiple operating systems
   – Support for more audio file formats
Pitch Detection
• Pitch: how rapidly the vocal tract is excited with periodic pulses.
• Carries lexical and prosodic information.
• During computation we must decide whether speech is voiced or unvoiced.
  – Errors in computation often occur during transitions between sounds.
  – Errors depend on the type of pitch detector being used.
Pitch Detection: Design
• There are many different pitch detectors.
• Praat's was chosen because it
  – Outperforms other detectors (SNR, HNR)
  – Is readily available
Pitch Detection: Algorithm
[Figure: Tone 4]
[Block diagram: Praat Pitch Detector]
Remove Hanning Window Sidelobe → Compute Global Peak Value → Process Frame To Obtain Local Optimal Choices → Find Path with Globally Minimum Cost
• Time domain, autocorrelation method
• Frame processing determines the strongest pitch candidates, including unvoiced.
• Viterbi algorithm minimizes the global cost over the candidates.
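The two stages named above (per-frame autocorrelation candidates, then a minimum-cost path) can be sketched as follows. This is a simplified illustration in the spirit of the method described in [2], not Praat's or Xkl's actual code. All function names and the parameter values (`voicing_threshold`, `octave_cost`, `octave_jump_cost`, `voiced_unvoiced_cost`) are assumptions chosen for the example, and peak interpolation and the global-peak silence test are left out.

```python
import math


def hann(n):
    """Hann (Hanning) window of n samples; n must be at least 2."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def autocorr(x, max_lag):
    """Raw autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def frame_candidates(frame, sample_rate, f_min=75.0, f_max=600.0,
                     n_best=3, voicing_threshold=0.45, octave_cost=0.01):
    """Return [(frequency_hz, strength), ...]; frequency 0.0 means unvoiced."""
    w = hann(len(frame))
    mean = sum(frame) / len(frame)
    xw = [(s - mean) * wi for s, wi in zip(frame, w)]
    max_lag = min(len(frame) - 1, int(sample_rate / f_min))
    min_lag = max(1, int(sample_rate / f_max))
    r_x = autocorr(xw, max_lag)
    r_w = autocorr(w, max_lag)
    candidates = [(0.0, voicing_threshold)]  # the unvoiced candidate
    if r_x[0] <= 0.0:
        return candidates  # silent frame: only the unvoiced candidate
    # Divide by the window's own autocorrelation to undo the window's taper.
    r = [(r_x[k] / r_x[0]) / (r_w[k] / r_w[0]) for k in range(max_lag + 1)]
    peaks = []
    for k in range(min_lag, max_lag):
        if r[k] > r[k - 1] and r[k] >= r[k + 1]:
            # Small bonus for shorter lags, to discourage octave-down errors.
            strength = r[k] - octave_cost * math.log2(f_min * k / sample_rate)
            peaks.append((strength, k))
    for strength, lag in sorted(peaks, reverse=True)[:n_best]:
        candidates.append((sample_rate / lag, strength))
    return candidates


def transition_cost(f1, f2, octave_jump_cost=0.35, voiced_unvoiced_cost=0.14):
    """Cost of moving between candidates in adjacent frames."""
    if f1 == 0.0 and f2 == 0.0:
        return 0.0
    if f1 == 0.0 or f2 == 0.0:
        return voiced_unvoiced_cost
    return octave_jump_cost * abs(math.log2(f1 / f2))


def best_path(frames):
    """Viterbi search: one frequency per frame, minimizing
    sum(transition costs) - sum(candidate strengths)."""
    cost = [-s for _, s in frames[0]]
    back = []
    for prev, cur in zip(frames, frames[1:]):
        new_cost, pointers = [], []
        for f2, s2 in cur:
            options = [cost[i] + transition_cost(f1, f2)
                       for i, (f1, _) in enumerate(prev)]
            i_best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[i_best] - s2)
            pointers.append(i_best)
        cost = new_cost
        back.append(pointers)
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [frames[-1][j][0]]
    for t in range(len(back) - 1, -1, -1):
        j = back[t][j]
        path.append(frames[t][j][0])
    return path[::-1]
```

In use, `frame_candidates` would be called on each overlapping frame of the signal and the resulting list of candidate lists passed to `best_path`. The transition costs are what keep an isolated octave error in one frame from being selected when its neighbors agree on a different pitch.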
Labeling
• Support for reading and saving TextGrid files, for interaction with Praat [1].
  – Tiers for grouping labels
• Want labels to be displayed in the same window as the waveform
  – Different from Xkl's separated window layout
Portability
• PortAudio
  – A cross-platform audio library
  – Supports most operating systems
  – Simplifies software maintenance
• Runs on OS X, since OS X runs X11 natively
• Added support for opening Microsoft .wav files.
Future Work
• Deploy to users for feedback
• Finalize
  – Labeling
  – Pitch Contour
• Fix bugs and add small features
Software Used
Eclipse – Integrated Development Environment.
Doxygen – A documentation generation system.
SVN – A version control system.
Open Motif – A widget toolkit for the X Window System.
GDB – The GNU debugger.
GNU build system on OS X.
PortAudio – A multiplatform audio library.
Thank you for your attention.
Special thanks to:
• Professor Helen Hanson
• Dr. Stefanie Shattuck-Hufnagel (MIT)
• Dennis H. Klatt
• Survey Participants
• ECE Department
Questions?
References
[1] Paul Boersma & David Weenink (2009). Praat: doing phonetics by computer (Version 5.1.05) [Computer program]. Retrieved May 1, 2009, from http://www.praat.org/
[2] Paul Boersma (1993). Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. http://www.fon.hum.uva.nl/paul/papers/Proceedings_1993.pdf