10/20/2009
Digital Microphone Array
Design, Implementation and Speech Recognition Experiments
Erich Zwyssig
EADS IW UK Ltd.
CSTR - The University of Edinburgh
19th October 2009
Outline
• Motivation
• Background
• Digital Microphone Array – Background
• Digital Microphone Array – Building
• ASR – Methodology
• ASR – Setup
• Results
• Conclusions
Motivation
• Meetings should be more
– efficient
– productive
• AMI/AMIDA consortium
– One research topic: the Instrumented Meeting Room
Instrumented Meeting Room
• Recording Devices
– Audio
– Video
– …
• Distant Speech Recognition
– People don’t like to wear head-mounted microphones
Background
• Distant Speech Recognition (DSR) combines:
– acoustic array processing
– automatic speech recognition (ASR).
• Problems
– Reverberation
– Noise
Distant Speech Recognition
• A complete DSR system includes:
– microphone array
– algorithm to track the active speaker(s)
– beamforming algorithm to focus on the desired speaker(s)
– post-filtering to further enhance the beamformer output
– speech recognition engine
– speaker adaptation component
Microphone Array
• Mono
– Directivity through mechanics/acoustics
• Stereo
– add two signals
• Array
– delay-sum
– superdirective beamforming
• linear endfire arrays
Microphone Array
• delay-sum beamforming
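The delay-sum principle fits in a few lines. The Python below is a minimal illustration, not the mdm-tools beamformer used in the experiments. It assumes a far-field source, a uniform linear array, a speed of sound of 343 m/s and whole-sample delays (practical beamformers use fractional delays). The names `steering_delays` and `delay_and_sum` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at about 20 degrees C)

def steering_delays(num_mics, spacing_m, angle_deg, fs):
    """Far-field arrival delay, in whole samples, at each microphone of a
    uniform linear array relative to microphone 0. angle_deg is measured
    from the array axis (0 = endfire, 90 = broadside)."""
    delays = []
    for m in range(num_mics):
        tau = m * spacing_m * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND
        delays.append(round(tau * fs))
    return delays

def delay_and_sum(channels, delays):
    """Delay-sum beamformer: time-align every channel on the steered
    direction by undoing its arrival delay, then average the channels."""
    offset = min(delays)                      # allow negative delays
    rel = [d - offset for d in delays]
    length = len(channels[0]) - max(rel)
    return [sum(ch[t + d] for ch, d in zip(channels, rel)) / len(channels)
            for t in range(length)]
```

For example, 4 microphones 5 cm apart at fs = 16 kHz, steered to endfire, give delays of [0, 2, 5, 7] samples. Sound from the steered direction adds coherently, while noise that is uncorrelated between microphones is averaged down. For M microphones and spatially white noise the SNR gain is at most 10·log10(M) dB.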
Digital MEMS microphone array
• MEMS (Micro-Electro-Mechanical Systems)
• Ultra-small microphones
– withstand reflow soldering in automated manufacturing
– cheap
MEMS devices
• Accelerometers
– Phones
– Game Consoles
• Microphones
• Chemical sensors
• etc.
MEMS microphone
• 20 years of research
• Market size
– 2004: $2 million
– 2006: $140 million
– 2011: $922 million (est.)
• Currently about 20 suppliers
• Main (novel) part is the membrane
MEMS microphone
• Two principles
– frequency modulation scheme (capacitance C modulates an oscillator)
– pre-charged capacitor: Q = V · C = const. (C modulates V)
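For the pre-charged scheme, an idealised parallel-plate model shows why holding the charge constant makes the output voltage follow the membrane motion. This model is an assumption: a real membrane bends rather than moving as a rigid plate. Here A is the plate area, d the membrane-to-backplate gap and ε0 the permittivity:

```latex
Q = C\,V = \text{const.}, \qquad C = \frac{\varepsilon_0 A}{d}
\quad\Longrightarrow\quad
V = \frac{Q}{C} = \frac{Q\,d}{\varepsilon_0 A},
\qquad
\Delta V = \frac{Q}{\varepsilon_0 A}\,\Delta d
```

Under this model the voltage change is linear in the membrane displacement Δd, with a sensitivity set by the stored charge Q.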
Digital microphone array - System
• Digital MEMS microphone array
• Digital Signal Processing (DSP)
• Interface (IF)
• Personal Computer (PC)
Digital microphone array – Background
• Analogue-to-Digital Converter (ADC)
• Oversampling ADC
• Digital Signal Processing (DSP)
• Interfaces (IF)
ADC
• Analogue-to-digital conversion
– Nyquist converter
– Oversampling converter
• Building blocks
– Low-Pass Filter (LPF)
– Analogue-to-Digital Converter (ADC)
– Digital Signal Processing (DSP)
Nyquist converter
• Sample at the Nyquist rate (twice the signal bandwidth)
– e.g. 16 kHz for 8 kHz audio bandwidth
– 16-bit hi-fi audio needs about 96 dB stopband attenuation
• in the analogue anti-aliasing low-pass filter -> practically impossible
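To see why the analogue filter is the bottleneck, the sketch below computes the minimum order of a Butterworth anti-aliasing filter that reaches 96 dB of attenuation. The Butterworth type is only a convenient textbook example, and the 7 kHz cutoff is hypothetical. The comparison is between a 16 kHz Nyquist converter, where full attenuation is needed by 8 kHz, and a 64× oversampling converter, where it is only needed near 64 fs - fs/2.

```python
import math

def butterworth_order(cutoff_hz, stop_hz, atten_db):
    """Smallest Butterworth order n with |H(f)|^2 = 1 / (1 + (f/fc)^(2n))
    giving at least atten_db of attenuation at stop_hz, for a -3 dB
    cutoff fc = cutoff_hz."""
    ratio = stop_hz / cutoff_hz
    return math.ceil(math.log10(10 ** (atten_db / 10) - 1) / (2 * math.log10(ratio)))

FS = 16_000           # Nyquist-rate example from the slide
CUTOFF = 7_000        # hypothetical passband edge
ATTEN = 96            # dB, roughly 16-bit dynamic range

# Nyquist converter: everything above fs/2 aliases, so full attenuation at 8 kHz.
n_nyquist = butterworth_order(CUTOFF, FS / 2, ATTEN)
# 64x oversampling converter: only content near 64*fs folds back into the
# 0..fs/2 band, so full attenuation is needed at 64*fs - fs/2.
n_oversampled = butterworth_order(CUTOFF, 64 * FS - FS / 2, ATTEN)
print(n_nyquist, n_oversampled)
```

Under these assumptions the Nyquist design needs order 83, whereas order 3 suffices at 64× oversampling. The remaining band limiting moves into the digital decimation filters.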
Oversampling converter
• Trade resolution for sampling frequency
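The trade can be quantified with the textbook ideal-quantiser formula. Peak SQNR is about 6.02·N + 1.76 dB for N bits, and oversampling by OSR adds 10·log10(OSR) dB, about 3 dB (half a bit) per doubling. Sigma-delta modulators, commonly used in oversampling converters, add noise shaping of order L. Their in-band gain then grows as (2L + 1)·10·log10(OSR), minus a fixed penalty. The function name is hypothetical, and the figures are ideal upper bounds, not measurements of the array.

```python
import math

def ideal_sqnr_db(bits, osr=1, order=0):
    """Ideal peak SQNR (dB) for a full-scale sine.

    bits:  quantiser resolution N
    osr:   oversampling ratio (1 = Nyquist-rate converter)
    order: noise-shaping order L (0 = plain oversampling, no shaping)
    """
    base = 6.02 * bits + 1.76
    # Penalty from shaping the noise spectrum (0 dB when order == 0)
    shaping_penalty = 10 * math.log10(math.pi ** (2 * order) / (2 * order + 1))
    # In-band noise falls by (2L + 1) * 10*log10(OSR) dB
    return base - shaping_penalty + (2 * order + 1) * 10 * math.log10(osr)

print(f"16-bit, Nyquist rate:       {ideal_sqnr_db(16):5.1f} dB")
for order in (0, 1, 2):
    print(f"1-bit, OSR 64, order {order}: {ideal_sqnr_db(1, 64, order):5.1f} dB")
```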
Digital Signal Processing
• FIR filter
– e.g. moving average filter
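A moving-average filter is the simplest FIR filter: all L coefficients equal 1/L. The sketch below computes it in two ways. The first is a direct FIR convolution. The second uses a running sum: add the newest sample, then subtract the one leaving the window. The recursive form is the idea that integrator and comb stages build on. Function names are hypothetical.

```python
def moving_average(x, length):
    """Direct-form FIR: y[n] = sum_k h[k] x[n-k] with h[k] = 1/length,
    treating samples before the start as zero."""
    h = [1.0 / length] * length
    y = []
    for n in range(len(x)):
        y.append(sum(c * x[n - k] for k, c in enumerate(h) if n - k >= 0))
    return y

def moving_average_recursive(x, length):
    """Same filter, computed with a running sum:
    add the newest sample, subtract the one that just left the window."""
    y, running = [], 0.0
    for n, v in enumerate(x):
        running += v
        if n >= length:
            running -= x[n - length]
        y.append(running / length)
    return y
```

The direct form costs L multiplications per output sample. The recursive form costs one addition, one subtraction and one scaling, whatever the value of L.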
Digital Signal Processing
• Differentiator
• Integrator
• CIC Filter
(Cascaded integrator-comb filter)
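A CIC decimator applies the running-sum idea at scale. N integrators run at the input rate, the signal is decimated by R, and N combs (differential delay M = 1 here) run at the output rate. No multipliers are needed. The result equals N cascaded length-R moving sums followed by decimation, with a gain of (R·M)^N. In hardware the registers must grow by N·log2(R·M) bits to avoid overflow, which Python integers hide. This is an illustrative sketch: the stage count used on the array is not given on the slides, so the N = 3 in the example is an assumption.

```python
def cic_decimate(x, R, N):
    """CIC decimator: N integrators at the input rate, decimation by R,
    then N comb stages (differential delay M = 1) at the output rate.
    Works on integers, as a hardware implementation would."""
    y = list(x)
    for _ in range(N):                    # integrator: y[n] = y[n-1] + x[n]
        acc, out = 0, []
        for v in y:
            acc += v
            out.append(acc)
        y = out
    y = y[R - 1::R]                       # keep every R-th sample
    for _ in range(N):                    # comb: y[m] = x[m] - x[m-1]
        prev, out = 0, []
        for v in y:
            out.append(v - prev)
            prev = v
        y = out
    return y

def moving_sums_then_decimate(x, R, N):
    """Reference: N cascaded length-R moving sums, then the same decimation."""
    y = list(x)
    for _ in range(N):
        y = [sum(y[max(0, n - R + 1):n + 1]) for n in range(len(y))]
    return y[R - 1::R]
```

With R = 8 (64 fs to 8 fs) and N = 3, a constant input of 1 settles at the DC gain 8³ = 512.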
Interfaces
• PDM (Pulse-Density Modulation)
• I2C
• I2S
• AC’97
• USB
• Firewire
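PDM is the bit stream the digital MEMS microphones deliver to the DSP: the density of ones encodes the signal level. Digital MEMS microphones typically generate it with an internal sigma-delta modulator. The sketch below uses a first-order modulator purely as an illustration; the modulator order inside the microphones used here is not stated. Averaging the bits recovers the level, which is what the CIC/FIR decimation chain does with proper filtering. Names are hypothetical.

```python
def pdm_encode(samples):
    """First-order delta-sigma modulator.

    samples: floats in [-1, 1] at the oversampled rate.
    Returns a list of bits (1 stands for +1, 0 for -1): the PDM stream."""
    integrator = 0.0
    feedback = 0.0          # previous output as +/-1 (0 before the first bit)
    bits = []
    for x in samples:
        integrator += x - feedback       # accumulate the error so far
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits

def pdm_density(bits):
    """Mean of the bit stream mapped to +/-1; tracks the average input level."""
    return sum(2 * b - 1 for b in bits) / len(bits)
```

Because the integrator stays bounded, the accumulated difference between input and output stays small. The longer the averaging window, the closer the bit density comes to the input level.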
Digital microphone array – building
• System Design
• Signal Processing
• DSP implementation
• USB interface
System Design
• Interfaces define the system
– Microphone to DSP -> PDM
– DSP to PC -> ????
• DSP to IF -> AC’97
• IF to PC -> USB
Digital MEMS Microphone Array System
Signal Processing
• Hi-fi audio ADCs typically run at 64 fs
• Need to downsample to fs
• Two options
– Downsample using one filter (requires an FIR filter of order 3400)
– Downsample in steps
• CIC from 64 fs to 8 fs
• FIR from 8 fs to 4 fs to 2 fs to fs (using half-band filters)
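The stepwise plan can be sketched as follows, assuming fs = 16 kHz, so that 64 fs = 1.024 MHz. Half-band filters suit the 2× stages because every even offset from the centre tap, other than the centre itself, has a coefficient of exactly zero. Nearly half the multiplications can therefore be skipped, and only every second output needs computing. The 15-tap Hamming-windowed design is a hypothetical stand-in: the slides do not give the actual filter lengths or coefficients.

```python
import math

FS = 16_000   # assumed target rate fs
STAGES = [("CIC", 8), ("half-band FIR", 2), ("half-band FIR", 2), ("half-band FIR", 2)]

def rate_plan(fs=FS):
    """Output rate after each decimation stage, starting from 64 * fs."""
    rate, plan = 64 * fs, []
    for name, factor in STAGES:
        rate //= factor
        plan.append((name, rate))
    return plan

def halfband_taps(half_length=7):
    """Hamming-windowed half-band low-pass (cutoff at a quarter of the input
    rate), 2*half_length+1 taps. The ideal response 0.5*sinc(n/2) is exactly
    zero for every even n != 0, so those taps are set to 0.0 explicitly."""
    taps = []
    for n in range(-half_length, half_length + 1):
        if n == 0:
            ideal = 0.5
        elif n % 2 == 0:
            ideal = 0.0
        else:
            ideal = math.sin(math.pi * n / 2) / (math.pi * n)
        window = 0.54 + 0.46 * math.cos(math.pi * n / half_length)
        taps.append(ideal * window)
    return taps

def decimate_by_2(x, taps):
    """Filter with `taps` and keep every second output sample; only the
    kept outputs are computed and zero-valued taps are skipped."""
    nonzero = [(k, c) for k, c in enumerate(taps) if c != 0.0]
    return [sum(c * x[n - k] for k, c in nonzero if n - k >= 0)
            for n in range(0, len(x), 2)]

for name, rate in rate_plan():
    print(f"after {name:14s}: {rate} Hz")
```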
Signal Processing
DSP implementation
• Microphone Interface
• DSP
• FIFO
• AC’97 Interface
• Controller
• Clocking
DSP implementation
USB Interface
• (TI) TAS1020B USB streaming controller
– First trials (stereo)
• (TI) TUSB3200A USB streaming controller
– 8052 core
– DMA
– Full AC’97 interface support
HW design flow
• DSP design
• HDL design
• Simulation
• Synthesis
• Debugging
DSP design
• Matlab©
– specify filter
• e.g. stopband attenuation
– specify constraints
• e.g. bit width
– export coefficients for Xilinx
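The "specify constraints, e.g. bit width" step amounts to rounding each coefficient to a fixed-point integer before export. The Python sketch below assumes signed coefficients in [-1, 1), one sign bit, bit_width - 1 fractional bits, and saturation at the range limits. The Matlab flow may use a different format or scaling. The helper names and example coefficients are hypothetical.

```python
def quantise_coefficients(coeffs, bit_width):
    """Round real coefficients to signed fixed-point integers with
    bit_width bits (1 sign bit, bit_width - 1 fractional bits),
    saturating values outside the representable range."""
    scale = 2 ** (bit_width - 1)
    lo, hi = -scale, scale - 1
    return [max(lo, min(hi, round(c * scale))) for c in coeffs]

def max_quantisation_error(coeffs, quantised, bit_width):
    """Largest absolute difference between each real coefficient and
    its fixed-point value; at most half an LSB if nothing saturated."""
    scale = 2 ** (bit_width - 1)
    return max(abs(c - q / scale) for c, q in zip(coeffs, quantised))

coeffs = [0.5, -0.25, 0.1, 0.999]
q = quantise_coefficients(coeffs, 16)
print(q, max_quantisation_error(coeffs, q, 16))
```

Each extra bit halves the worst-case coefficient error, which moves the achievable stopband floor down by roughly 6 dB. That is why bit width and stopband attenuation are specified together.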
HDL Design
• Verilog HDL example (counter)
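// functional (behavioural) description; module and port wrapper added
// so the example compiles: up-counter with asynchronous active-low reset
module counter #(parameter width = 8) (
  input  wire             clk,
  input  wire             resetn,  // asynchronous reset, active low
  input  wire             ena,     // count enable
  output reg  [width-1:0] count
);
  always @(posedge clk or negedge resetn)
  begin
    if (~resetn)
      count <= 0;              // clear on reset
    else if (ena)
      count <= count + 1;      // increment while enabled
    // otherwise count keeps its value
  end
endmodule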
Simulation
• ModelSim™ XE example
Synthesis
• Xilinx© ISE® Example
Debugging
• Setup: logic analyser and digital sampling oscilloscope
Digital Microphone Array Limitations
• Windows XP/Vista
– No more than stereo over USB
• Xubuntu (Linux)
– Limited to 7 channels over USB
• Mac
– to be determined
ASR Methodology
• Beamforming (mdm-tools)
– noise removal
– speaker tracking
– beamforming (and post-filtering)
• HMMs trained with HTK on the WSJCAM0 database
– 53 male and 39 female speakers with British English accents
– 11,000 tied-state triphones
– three emitting states per triphone
– 6 Gaussian mixture components per state
– 52-element feature vectors: 13 static coefficients (MFCCs plus the 0th cepstral coefficient) with their 1st, 2nd and 3rd order derivatives (13 × 4 = 52)
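The 52-element vector is built by appending regression (delta) coefficients computed over neighbouring frames. Below is a sketch of the HTK-style regression formula d_t = Σ_{k=1..W} k·(c_{t+k} - c_{t-k}) / (2·Σ_{k=1..W} k²), with the first and last frames replicated at the edges. The window W = 2 and the function names are assumptions; HTK computes these internally from its configuration.

```python
def deltas(frames, window=2):
    """Regression coefficients of each feature dimension over +/- window
    frames, replicating the first/last frame at the utterance edges."""
    T = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(T):
        row = []
        for i in range(len(frames[0])):
            num = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, T - 1)][i]
                behind = frames[max(t - k, 0)][i]
                num += k * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out

def stack_with_derivatives(static, order=3, window=2):
    """Append 1st..order-th derivatives: D static -> D * (order + 1) values."""
    blocks = [static]
    for _ in range(order):
        blocks.append(deltas(blocks[-1], window))   # delta of the previous block
    return [[v for b in blocks for v in b[t]] for t in range(len(static))]
```

With 13 static coefficients and order = 3, each frame becomes 13 × 4 = 52 values.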
ASR Adaptation
• MAP
– Too little adaptation data
• MLLR
– Means only
– Means and variances (constrained)
• Channel, gender and individual speaker
– Channel: analogue vs. digital
– Gender: female vs. male
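In the standard formulation, MLLR shares one affine transform among many Gaussians (one transform per regression class). A small amount of adaptation data therefore suffices, which is why it is used here rather than MAP. Means-only MLLR maps each mean through an extended mean vector ξ. Constrained MLLR applies a single transform to both means and covariances, which is equivalent to transforming the features. The number of regression classes used in these experiments is not stated on the slides.

```latex
% Means-only MLLR (one transform W per regression class)
\hat{\mu} = A\,\mu + b = W\,\xi,
\qquad \xi = \begin{bmatrix} 1 \\ \mu \end{bmatrix},
\qquad W = \begin{bmatrix} b & A \end{bmatrix}

% Constrained MLLR: one transform for means and covariances
\hat{\mu} = A'\mu - b', \qquad \hat{\Sigma} = A'\,\Sigma\,A'^{\top}

% Equivalent feature-space form, with A = A'^{-1} and b = A'^{-1} b'
\hat{o}_t = A\,o_t + b
```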
ASR Setup
• WSJCAM0 prompts
– Adaptation sentences (17, approx. 1 min)
– 5k-vocabulary sentences (38, approx. 7 min)
– 20k-vocabulary sentences (38, approx. 7 min)
• 12 participants
– 6 female / 6 male
– All UK English speakers
• Recorded with
– AMI/AMIDA analogue microphone array
– Newly developed digital microphone array
Recording setup
ASR adaptation scenarios
Results
[Figure: word error rate (WER, %, axis 0 to 60) for the analogue and digital arrays under three conditions: no adaptation, MLLR channel adaptation, and MLLR speaker and channel adaptation]
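Word error rate counts substitutions S, deletions D and insertions I in the best alignment of the recogniser output against the N-word reference: WER = (S + D + I) / N. Below is a minimal Python sketch using word-level edit distance. Dedicated scoring tools such as HTK's HResults may use weighted alignments, so individual counts can differ slightly. The function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: minimum number of word substitutions, deletions and
    insertions turning the reference into the hypothesis, divided by the
    number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j]: minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                                   # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j                                   # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + sub)   # substitution or match
    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)
```

Because insertions are counted, WER can exceed 100 %.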
Conclusions
• The newly designed digital microphone array performs comparably to the analogue one
• Speaking rate had no effect
• Two categories of speakers
– those the system recognises well (sheep) and those it does not (goats)
• Adaptation narrows the gap between speakers and between channels
Acknowledgements
• Steve and Mike, for the use of the AMI/AMIDA setup
• Knowles Electronics and Bernafon Ltd., for supplying the digital MEMS microphones
• Wolfson Microelectronics plc, for access to DSP filters, the Keil µVision compiler and an EEPROM programmer
Thanks
• To you, the audience, for coming here today
• Any questions?
Demo
• IF 3.07 (Instrumented Meeting Room) for the next 30 min
References
• E. Zwyssig, “Digital Microphone Array: Design, Implementation and Speech Recognition Experiments”, MSc thesis, The University of Edinburgh, Aug. 2009
• E. Zwyssig, M. Lincoln, S. Renals, “A digital microphone array for distant speech recognition”, submitted to ICASSP 2010, Dallas, February 2010
Distant Speech Recognition
• ASR
– HMM
– GMM
– Viterbi
– LM
• RSR
– Adaptation
• MAP
• MLLR
– FE
• CMN
• CVN
• VTLN
• DSR
– Microphone array
– Beamforming
– Wiener filter
• Noise removal
• AMI/AMIDA projects
– develop a smart meeting room
• Current limitations
– portability
– lack of cheap commodity HW
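CMN and CVN are simple to sketch. A fixed linear channel, such as a microphone or a room response much shorter than the analysis window, multiplies the short-time spectrum. It therefore adds an approximately constant offset to the cepstrum, and subtracting the per-utterance mean removes that offset. CVN additionally scales each coefficient to unit variance. Below is a minimal per-utterance version; the function name is hypothetical.

```python
import math

def cmvn(frames):
    """Per-utterance cepstral mean (CMN) and variance (CVN) normalisation.

    frames: list of T feature vectors (lists of D floats).
    Returns frames where every coefficient has zero mean and unit variance
    over the utterance (zero-variance coefficients are only mean-normalised)."""
    T, D = len(frames), len(frames[0])
    means = [sum(f[i] for f in frames) / T for i in range(D)]
    stds = [math.sqrt(sum((f[i] - means[i]) ** 2 for f in frames) / T)
            for i in range(D)]
    return [[(f[i] - means[i]) / stds[i] if stds[i] > 0 else f[i] - means[i]
             for i in range(D)]
            for f in frames]
```

Adding the same constant to every frame, as a stationary channel would in the cepstral domain, leaves the normalised output unchanged.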
ASR – Flow (adaptive)
ASR – Methodology