Speech Recognition Front End
[Figure: block diagram of the front end; box labels: Speech, Pre-emphasis, Windowing, Spectral Analysis, Temporal Features, Frequency Features, Consolidate Features, Enhance Features, Feature Vectors]
This week’s focus is on the spectral analysis
Spectral Analysis
• Goal: Find useful frequency-related features
• Approaches
– Without Fourier Analysis:
  • Apply a recursive band-pass bank of filters
  • Use linear predictive coding (LPC)
– With Fourier Analysis:
  • Calculate a Fourier transform
  • Warp results based on the MEL scale
• Applications: Auditory models mimicking human hearing
– Eliminate noise by removing non-voice frequencies
– Detect formants present in signal
– Perform Cepstral analysis to detect pitch and recognize speech
– Auditory nerves stop responding to extended occurrences of the same frequency
  • Idea: Deemphasize frequencies present for extended periods.
  • Results: Effective for speech recognition in noisy environments
This week’s emphasis will be on Fourier Analysis
The Fourier Transform Family
• Fourier Series: A weighted sum of sinusoidal functions that models an arbitrary continuous function that repeats periodically without end
• Fourier Transform: A linear operation that maps an arbitrary function with infinite range into a spectrum of its frequency components
• Discrete Fourier Transform (DFT): A Fourier Transform applied to a discrete, infinitely repeating periodic series of complex numbers
• Discrete Time Fourier Transform (DTFT): A Fourier Transform applied to an aperiodic discrete series of complex numbers that extends from −∞ to +∞
• Fast Fourier Transform (FFT): A fast way to calculate the DFT
The number e
• $e = \lim_{n \to \infty} (1 + 1/n)^n$
• When n = 1: $e \approx 2$
  When n = 2: $e \approx (1 + 1/2)^2 = 9/4 = 2.25$
  When n = 3: $e \approx (1 + 1/3)^3 = 64/27 = 2.37037$
• When n is extremely large, the expression approaches the value e = 2.718281828 …
• What does this have to do with sound? Answer: The future slides will tell.
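The limit definition can be checked numerically. The following is a minimal sketch, not part of the original slides; the class name `EulerLimit` and method `approx` are hypothetical names chosen for illustration.

```java
final class EulerLimit {
    // Approximates e with (1 + 1/n)^n for a given n >= 1.
    static double approx(long n) {
        return Math.pow(1.0 + 1.0 / n, n);
    }

    public static void main(String[] args) {
        // Larger n moves the approximation closer to Math.E.
        for (long n : new long[] {1, 2, 3, 1000, 1000000}) {
            System.out.println("n = " + n + ": " + approx(n));
        }
        System.out.println("Math.E = " + Math.E);
    }
}
```

The convergence is slow: the error shrinks roughly in proportion to 1/n, which is one reason the series form of e is usually preferred for computation.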
Quick Calculus Review
• The derivative of a function at a point is the slope of the function at that point (change in y over change in x).
• The derivative of $x^2$ is $2x$ (Notation: $(x^2)' = 2x$)
$\lim_{\Delta x \to 0} \frac{(x+\Delta x)^2 - x^2}{\Delta x} = \lim_{\Delta x \to 0} \frac{x^2 + 2x\Delta x + \Delta x^2 - x^2}{\Delta x} = \lim_{\Delta x \to 0} (2x + \Delta x) = 2x$
• Tables of derivatives proved by mathematicians exist
• We will need these:
– $(x^n)' = n x^{n-1}$
– $(\sin x)' = \cos x$, $(\cos x)' = -\sin x$
– $(e^x)' = e^x$, $(e^{ax})' = a e^{ax}$
Complex Numbers
• Extends the number line to a plane
– Horizontal axis: Real numbers
– Vertical axis: Imaginary numbers
– Rectangular Notation: a + bi
  • a along the real axis
  • b along the imaginary axis
• Operations
– Addition: (a+bi) + (c+di) = (a+c) + (b+d)i
– Multiplication: (a+bi) * (c+di) = (ac – bd) + (ad + bc)i
– Division: (a+bi)/(c+di) is solved by multiplying numerator and denominator by the conjugate of c+di, which equals c-di
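The three operations can be sketched in a few lines of Java. This is an illustrative sketch only: the class `ComplexOps` and the convention of storing a + bi as a two-element array `{a, b}` are assumptions made here, not something the slides prescribe.

```java
final class ComplexOps {
    // A complex number a + bi is stored as {a, b} (an assumed convention).
    static double[] add(double[] x, double[] y) {
        return new double[] {x[0] + y[0], x[1] + y[1]};
    }

    // (a+bi)(c+di) = (ac - bd) + (ad + bc)i
    static double[] multiply(double[] x, double[] y) {
        return new double[] {x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0]};
    }

    // Multiply numerator and denominator by the conjugate c - di;
    // the denominator then becomes the real number c^2 + d^2.
    static double[] divide(double[] x, double[] y) {
        double denom = y[0] * y[0] + y[1] * y[1];
        double[] num = multiply(x, new double[] {y[0], -y[1]});
        return new double[] {num[0] / denom, num[1] / denom};
    }

    public static void main(String[] args) {
        double[] p = multiply(new double[] {1, 2}, new double[] {3, 4});
        System.out.println(p[0] + " + " + p[1] + "i");
    }
}
```

Dividing the product by one factor should return the other factor, which makes a convenient sanity check.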
Polar Notation
• Polar form gives the distance and angle from the origin
• Rectangular Form: 4 + 3i
• Convert to Polar Form: (5, 36.87°)
– $M = \sqrt{4^2 + 3^2} = 5$
– $\theta = \arctan(3/4) \approx 36.87°$
• Convert to Rectangular
– $a + ib = M(\cos\theta + i\sin\theta)$
Note: At 90 and 270 degrees the real part is 0, so arctan(b/a) divides by zero
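One common way around the divide-by-zero case is the two-argument arctangent, which takes the imaginary and real parts separately. The sketch below uses the standard `Math.atan2` and `Math.hypot`; the class `Polar` and its method names are hypothetical.

```java
final class Polar {
    // Returns {magnitude, angle in radians} for a + bi.
    // Math.atan2(b, a) handles a == 0, so 90 and 270 degrees need no special case.
    static double[] toPolar(double a, double b) {
        return new double[] {Math.hypot(a, b), Math.atan2(b, a)};
    }

    // Returns {a, b} for M(cos(theta) + i sin(theta)).
    static double[] toRectangular(double m, double theta) {
        return new double[] {m * Math.cos(theta), m * Math.sin(theta)};
    }

    public static void main(String[] args) {
        double[] p = toPolar(4, 3);
        System.out.println("M = " + p[0] + ", angle = " + Math.toDegrees(p[1]) + " degrees");
    }
}
```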
Maclaurin Series for e, sin, cos
• A Maclaurin series estimates any well-behaved function in terms of polynomials
$f(x) = f(0)x^0/0! + f'(0)x^1/1! + \dots + f^{(n)}(0)x^n/n! + \dots$
• Try it out, say for the third derivative at x = 0: differentiating the series three times and setting x = 0 gives
$0 + 0 + 0 + (3 \cdot 2 \cdot 1)\, f^{(3)}(0)/(3 \cdot 2 \cdot 1) + 0 + 0 + \dots = f^{(3)}(0)$
All the derivatives match at x = 0.
• Series that we will need
– $e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + \dots$
– $\sin x = x - x^3/3! + x^5/5! - x^7/7! + \dots$
– $\cos x = 1 - x^2/2! + x^4/4! - x^6/6! + \dots$
• Another way to calculate e: $e = 1 + 1 + 1/2! + 1/3! + \dots$
Note: 0! = 1
Sine, Cosine and e
$e^{i\theta} = 1 + i\theta + (i\theta)^2/2! + (i\theta)^3/3! + (i\theta)^4/4! + (i\theta)^5/5! + (i\theta)^6/6! + (i\theta)^7/7! + \cdots$
(Multiply terms to eliminate higher powers of i)
$= 1 + i\theta - \theta^2/2! - i\theta^3/3! + \theta^4/4! + i\theta^5/5! - \theta^6/6! - i\theta^7/7! + \cdots$
(Gather real and imaginary terms together)
$= (1 - \theta^2/2! + \theta^4/4! - \theta^6/6! + \cdots) + i(\theta - \theta^3/3! + \theta^5/5! - \theta^7/7! + \cdots)$
(Substitute cos and sin for the series)
$e^{i\theta} = \cos\theta + i\sin\theta$ (This is called Euler's formula)
From Previous Slide
$e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + \dots$
$\sin x = x - x^3/3! + x^5/5! - x^7/7! + \dots$
$\cos x = 1 - x^2/2! + x^4/4! - x^6/6! + \dots$
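Euler's formula can also be checked numerically by summing the series for e^{iθ} term by term and comparing against cos θ and sin θ. This is a sketch under assumptions: the class `EulerSeries`, the method `expI`, and the choice of 20 terms are illustrative only.

```java
final class EulerSeries {
    // Sums the first `terms` terms of e^{i theta} = sum over n of (i theta)^n / n!.
    // Returns {real part, imaginary part}.
    static double[] expI(double theta, int terms) {
        double re = 0, im = 0;
        double termRe = 1, termIm = 0; // current term, starting with n = 0
        for (int n = 0; n < terms; n++) {
            re += termRe;
            im += termIm;
            // Next term = current term * (i theta) / (n + 1); note (a + bi) * i = -b + ai.
            double nextRe = -termIm * theta / (n + 1);
            double nextIm = termRe * theta / (n + 1);
            termRe = nextRe;
            termIm = nextIm;
        }
        return new double[] {re, im};
    }

    public static void main(String[] args) {
        double theta = 1.0;
        double[] z = expI(theta, 20);
        System.out.println("series: " + z[0] + " + " + z[1] + "i");
        System.out.println("direct: " + Math.cos(theta) + " + " + Math.sin(theta) + "i");
    }
}
```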
Key Formulae and Identities
Euler's Formula: $e^{ix} = \cos(x) + i\sin(x)$
Trigonometric Identities:
$\cos(x) = \cos(-x)$ and $\sin(x) = -\sin(-x)$
$\cos(x) = (e^{ix} + e^{-ix})/2$ and $\sin(x) = (e^{ix} - e^{-ix})/(2i)$
$\sin^2(x) + \cos^2(x) = 1$
$\sin(x+y) = \sin(x)\cos(y) + \cos(x)\sin(y)$
$\cos(x+y) = \cos(x)\cos(y) - \sin(x)\sin(y)$
Quick Linear Algebra Review
• Linear algebra extends Euclidean space beyond three dimensions.
• <3,4,5> represents a vector going from point (0,0,0) to point (3,4,5).
• Inner product: sum the products of corresponding coordinates.
• Two vectors are orthogonal (perpendicular, roughly speaking) if their inner (dot) product equals 0.
– Example: <1,0,0> • <0,1,0> = 1*0 + 0*1 + 0*0 = 0
– Example: <3,1> • <-1,3> = 3*-1 + 1*3 = 0
• Two functions are orthogonal between a and b if $\int_a^b f(x)g(x)\,dx = 0$
• A set of functions is mutually orthogonal if $\int_a^b f_i(x)f_j(x)\,dx = 0$ when $i \ne j$, and equals some $c > 0$ when $i = j$.
• Why do we need this? Orthogonal function sets can be used to decompose or construct signals.
Basis to span a space
• Consider the orthogonal basis <1,0,0>, <0,1,0>, <0,0,1>
– These form a basis of three-dimensional space.
– Why? Any 3-dimensional vector is a linear combination of these
– Example: <4,3,2> = 4 * <1,0,0> + 3 * <0,1,0> + 2 * <0,0,1>
• Consider the orthogonal basis vectors: <1,2>, <-2,1>
– They are orthogonal because: <1,2> • <-2,1> = 0
• Consider the basis vectors: $<1/\sqrt{5}, 2/\sqrt{5}>$, $<-2/\sqrt{5}, 1/\sqrt{5}>$
– Also orthogonal because the inner (dot) product is 0
– $<1/\sqrt{5}, 2/\sqrt{5}>$ has a length of unity: $\sqrt{(1/\sqrt{5})^2 + (2/\sqrt{5})^2} = 1$
– $<-2/\sqrt{5}, 1/\sqrt{5}>$ also has a length of unity (same distance calculation)
• Orthonormal basis vectors: orthogonal and of unit length
Orthogonal and Orthonormal
• Experiment (intuitive example, not mathematically precise)
• Goal: construct <4,7> from basis vectors
– Orthogonal Basis: <1,2> and <-2,1>
– <1,2> • <4,7> = 18 and <-2,1> • <4,7> = -1
– 18 <1,2> + (-1)<-2,1> = <20, 35>, which is five times <4,7>
• Another experiment
– Orthonormal basis: $<1/\sqrt{5}, 2/\sqrt{5}>$, $<-2/\sqrt{5}, 1/\sqrt{5}>$
– $<1/\sqrt{5}, 2/\sqrt{5}>$ • <4,7> = $18/\sqrt{5}$ and $<-2/\sqrt{5}, 1/\sqrt{5}>$ • <4,7> = $-1/\sqrt{5}$
– $(18/\sqrt{5})<1/\sqrt{5}, 2/\sqrt{5}> + (-1/\sqrt{5})<-2/\sqrt{5}, 1/\sqrt{5}>$ = <20/5, 35/5> = <4,7>
• Conclusion: Correlating a vector with each orthonormal basis vector (taking the dot product) yields the multiple of that basis vector needed to rebuild the vector.
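Both experiments can be reproduced with a small helper that sums (v • b) b over the basis vectors. A minimal sketch, assuming hypothetical names `BasisDemo`, `dot`, and `reconstruct`:

```java
final class BasisDemo {
    // Inner (dot) product: sum of products of corresponding coordinates.
    static double dot(double[] u, double[] v) {
        double sum = 0;
        for (int i = 0; i < u.length; i++) sum += u[i] * v[i];
        return sum;
    }

    // Sums (v . b) * b over all basis vectors b.
    // This returns v itself only when the basis is orthonormal.
    static double[] reconstruct(double[] v, double[][] basis) {
        double[] out = new double[v.length];
        for (double[] b : basis) {
            double coefficient = dot(b, v);
            for (int i = 0; i < out.length; i++) out[i] += coefficient * b[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] v = {4, 7};
        double s = Math.sqrt(5);
        double[] a = reconstruct(v, new double[][] {{1, 2}, {-2, 1}});
        double[] b = reconstruct(v, new double[][] {{1 / s, 2 / s}, {-2 / s, 1 / s}});
        System.out.println("orthogonal:  <" + a[0] + ", " + a[1] + ">");
        System.out.println("orthonormal: <" + b[0] + ", " + b[1] + ">");
    }
}
```

With the orthogonal basis the result is scaled by the squared length of the basis vectors (5 here); normalizing the basis removes that factor.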
Fourier Series
• A Fourier series is a sum (possibly but not necessarily infinite) of sines and cosines that models a continuous signal.
• Fourier modeling allows us to decompose a signal, perform processing, and recombine the results to solve an original problem
Fourier Cosine Series
• The set of functions: $\{\cos(2\pi k F_0 t)\}$ where k is an integer ≥ 0
– Mutually orthogonal from –T to T, where $T = 1/F_0 > 0$ is the period
– $\int_{-T}^{T} \cos(2\pi k_1 t/T)\cos(2\pi k_2 t/T)\,dt = 0$ if $k_1 \ne k_2$; $\ne 0$ if $k_1 = k_2$
– Proof requires some Calculus: namely integration
• $x(t) = a_0\cos(0 \cdot 2\pi F_0 t) + a_1\cos(1 \cdot 2\pi F_0 t) + a_2\cos(2 \cdot 2\pi F_0 t) + \dots$
$x(t) = a_0 + \sum_{k=1}^{\infty} a_k\cos(2\pi k F_0 t)$ where $F_0 = 1/T$
• Comment: The series doesn't include phases; if we add phases we have twice as many unknowns to compute
[Figure: plots of cos(πx/3) and cos(2πx/3), and of their product cos(πx/3) * cos(2πx/3), whose integral is taken]
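The orthogonality of the cosine set can be checked without calculus by approximating the integral with a sum. The sketch below assumes a period T = 6, so that k = 1 and k = 2 give cos(πx/3) and cos(2πx/3), and integrates over [−T, T] with the midpoint rule; the class and method names are hypothetical.

```java
final class CosineOrthogonality {
    // Midpoint-rule estimate of the integral over [-T, T] of
    // cos(2 pi k1 t / T) * cos(2 pi k2 t / T) dt.
    static double inner(int k1, int k2, double T, int steps) {
        double dt = 2 * T / steps;
        double sum = 0;
        for (int s = 0; s < steps; s++) {
            double t = -T + (s + 0.5) * dt; // midpoint of the s-th slice
            sum += Math.cos(2 * Math.PI * k1 * t / T)
                 * Math.cos(2 * Math.PI * k2 * t / T) * dt;
        }
        return sum;
    }

    public static void main(String[] args) {
        double T = 6; // cos(pi x / 3) is k = 1, cos(2 pi x / 3) is k = 2
        System.out.println("k1 = 1, k2 = 2: " + inner(1, 2, T, 10000));
        System.out.println("k1 = 1, k2 = 1: " + inner(1, 1, T, 10000));
    }
}
```

The first value should be near 0 and the second near T, matching the "zero unless the indices agree" rule.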
A General Orthogonal Function Set
• Euler Equation: $e^{i\varphi} = \cos(\varphi) + i\sin(\varphi)$
– Radius = magnitude (always unity); φ = phase.
• Consider the function set: $\{e^{i\omega_k t}\}$
– Angular frequency: $\omega_k = 2\pi k F_0 = 2\pi k/T_0$
– $F_0$, $T_0$: fundamental frequency and period.
– k = speed at which $e^{i\omega_k t}$ traverses the circle
– Orthogonal because $\int_{-\infty}^{\infty} e^{j\omega_n t} e^{j\omega_m t}\,dt = 0$ whenever $n \ne -m$
Notes
1. The book uses j instead of i
2. Electrical engineers prefer j
3. Mathematicians prefer i
4. Get used to both!
5. In the diagram, φ = 2πF0
Orthogonality Example
• Left: Correlating the top signal with the middle one gives the bottom signal, whose area ≠ 0
• Right: Correlating the top signal with the middle one gives the bottom signal, whose area = 0
Putting it all together
• $\{e^{i\omega_k t}\}$ is an orthogonal basis for signals
– Each function $e^{i\omega_k t}$ is a basis function
– We can use the basis functions to synthesize signals
• Synthesize (Fourier series)
– Source: frequency magnitudes, Sink: time signal
– $x(t) = \frac{1}{T}\sum_{k=0}^{T} a_k e^{ik\omega_0 t}$ where x(t) = signal at time t
– T = # of basis functions (possibly infinite); $a_k$ = magnitude of $\omega_k$
• For computer processing, we need a discrete counterpart
– Why? We don't want to deal with infinite points or basis functions
– $x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\, e^{i2\pi kn/N}$
– k determines how fast each term traverses the circle (higher k, faster)
– N basis functions and N frequencies
• Note: For periodic functions, we can use [0,T] instead of [−∞,∞]
Fourier Analysis
• Goal: Compute coefficients given the signal.
• Synthesis equation: $x(t) = \sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}$
• Multiply both sides by $e^{-ik\omega_0 t}$:
$x(t)\,e^{-ik\omega_0 t} = \left(\sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}\right) e^{-ik\omega_0 t}$
• Integrate over the period $[0, T_0]$:
$\int_0^{T_0} x(t)\,e^{-ik\omega_0 t}\,dt = \int_0^{T_0} \left(\sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}\right) e^{-ik\omega_0 t}\,dt = \sum_{m=-\infty}^{\infty} a_m \int_0^{T_0} e^{i(m-k)\omega_0 t}\,dt$
• Every integral in the sum is zero except when m = k, which leaves
$\int_0^{T_0} x(t)\,e^{-ik\omega_0 t}\,dt = a_k \int_0^{T_0} dt = a_k\, t\,\Big|_0^{T_0} = a_k T_0$
• The answer (value of coefficient k): $a_k = \frac{1}{T_0}\int_0^{T_0} x(t)\,e^{-ik\omega_0 t}\,dt$
• Note: $1/T_0$ is simply a constant that scales the result
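The analysis formula a_k = (1/T0) ∫ x(t) e^{-ikω0 t} dt over one period can be approximated with a sum. The following sketch is an illustration only: the class `FourierCoefficient`, the nested `Signal` interface, and the midpoint-rule integration are assumptions introduced here.

```java
final class FourierCoefficient {
    // A real-valued signal x(t); a hypothetical helper type for this sketch.
    interface Signal {
        double at(double t);
    }

    // Estimates a_k = (1/T0) * integral over [0, T0] of x(t) e^{-i k w0 t} dt,
    // with w0 = 2 pi / T0, using the midpoint rule. Returns {real, imaginary}.
    static double[] coefficient(Signal x, int k, double t0, int steps) {
        double w0 = 2 * Math.PI / t0;
        double dt = t0 / steps;
        double re = 0, im = 0;
        for (int s = 0; s < steps; s++) {
            double t = (s + 0.5) * dt;
            // e^{-i k w0 t} = cos(k w0 t) - i sin(k w0 t)
            re += x.at(t) * Math.cos(k * w0 * t) * dt;
            im -= x.at(t) * Math.sin(k * w0 * t) * dt;
        }
        return new double[] {re / t0, im / t0};
    }

    public static void main(String[] args) {
        Signal x = t -> 3 * Math.cos(2 * Math.PI * t); // period T0 = 1
        double[] a1 = coefficient(x, 1, 1.0, 10000);
        System.out.println("a_1 = " + a1[0] + " + " + a1[1] + "i");
    }
}
```

For x(t) = 3 cos(2πt), the amplitude 3 is split evenly between k = 1 and k = −1, so a_1 comes out near 1.5 and every other coefficient near 0.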
Discrete Version
• Definition: Continuous Fourier Transform and Inverse
– Transform: $X(\omega) = \int_{-\infty}^{\infty} x(t)\,e^{-i\omega t}\,dt$
– Inverse: $x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)\,e^{i\omega t}\,d\omega$
• Convert from continuous version:
– Evaluate at N equally spaced points (period now is N)
– Use sums to approximate the integral
– Note: x(t) = value at time t; x[n] is x(t) evaluated at the n-th of the N sample points
• Discrete Fourier Transform and Inverse
– Transform: $X[k] = \sum_{n=0}^{N-1} x[n]\,e^{-i2\pi kn/N}$
– Inverse: $x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\,e^{i2\pi kn/N}$
• Note: X[k] is a complex number representing magnitude/phase
• Conclusion: We can go between time and frequency domains
Signal Plot
• The phases are shown in the spectrum plot in the complex plane.
• The phase affects how the time domain signal looks.
• The amplitude of the spectrum plot remains constant regardless of phase.
Fourier Transform of Square Wave
Fourier Transforms exhibit the property of duality
• A square wave in frequency corresponds to a sinc function in time, and vice versa
• Convolution in time = multiplication in frequency, and vice versa
• Proof with calculus, for x(t) = 1 on [-1/2, 1/2] and 0 elsewhere:
$\int_{-\infty}^{\infty} x(t)\,e^{-j\omega t}\,dt = \int_{-1/2}^{1/2} e^{-j\omega t}\,dt = -\frac{1}{j\omega}\,e^{-j\omega t}\,\Big|_{-1/2}^{1/2}$
$= \frac{1}{j\omega}\left(e^{j\omega/2} - e^{-j\omega/2}\right) = \frac{\sin(\omega/2)}{\omega/2}$
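Complex DFT by Correlation
// time and the returned array hold N complex numbers as interleaved pairs of doubles.
double[] DFT(double[] time, int N) {
    double[] freq = new double[2 * N];
    for (int k = 0; k < N; k++) {
        for (int i = 0; i < N; i++) {
            // e^(-i 2 PI k i / N) = cos(2 PI k i / N) - i sin(2 PI k i / N)
            double real = Math.cos(2 * Math.PI * k * i / N);
            double imag = -Math.sin(2 * Math.PI * k * i / N);
            // Complex multiply: (a + bi)(c + di) = (ac - bd) + (ad + bc)i
            freq[2 * k]     += time[2 * i] * real - time[2 * i + 1] * imag;
            freq[2 * k + 1] += time[2 * i] * imag + time[2 * i + 1] * real;
        }
    }
    return freq;
}
Note: even indices = real part, odd indices = imaginary part
Complexity: O(N²) because of the double loop of N each
Example: For 512 samples, the inner statements run 262,144 times
Evaluation: Too slow, but FFT is O(N lg N)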
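The FFT Algorithm
• The FFT algorithm is based on divide-and-conquer
[Figure: recursion tree that splits a problem of size n into two of size n/2, then four of size n/4; each level costs O(n) and there are O(log n) levels]
• The running time complexity is O(n log n)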
Why do we need FFT?
• Correlation algorithm is O(N²)
• Too slow to be practical even on today's processors
• Optimized FFT is O(N lg N), which is far faster for large N
• Assume 512 elements in a window (C = constant cost per step)
– N steps: C * 512
– N² steps: C * 512 * 512 = C * 262,144
– N lg N steps: C * 512 * 9 = C * 4,608 (about 57 times fewer than N²)
Theory for Optimization
Base Case: for N = 1, X[0] = x[0]
Recursive Relationship
$X[k] = \sum_{t=0}^{N-1} x[t]\,e^{-i2\pi kt/N}$
$= \sum_{t=0}^{N/2-1} x[2t]\,e^{-i2\pi k(2t)/N} + \sum_{t=0}^{N/2-1} x[2t+1]\,e^{-i2\pi k(2t+1)/N}$
$= \sum_{t=0}^{N/2-1} x[2t]\,e^{-i2\pi kt/(N/2)} + e^{-i2\pi k/N}\sum_{t=0}^{N/2-1} x[2t+1]\,e^{-i2\pi kt/(N/2)}$
$= F_k^{even} + e^{-i2\pi k/N}\,F_k^{odd}$
Note: work at each step is O(N); there are lg(N) levels
Simple Recursive FFT Solution
// Assumes a Complex class that provides plus, minus, and times.
Complex[] fft(Complex[] x) {
    int N = x.length;
    Complex[] y = new Complex[N];
    if (N == 1) { y[0] = x[0]; return y; } // Base case
    Complex[] even = new Complex[N/2];
    Complex[] odd = new Complex[N/2];
    for (int m = 0; m < N/2; m++) { even[m] = x[2*m]; odd[m] = x[2*m+1]; }
    Complex[] q = fft(even), r = fft(odd);
    for (int k = 0; k < N/2; k++) {
        double exp = -2 * k * Math.PI / N;
        Complex wk = new Complex(Math.cos(exp), Math.sin(exp));
        y[k] = q[k].plus(wk.times(r[k]));
        y[k + N/2] = q[k].minus(wk.times(r[k]));
    }
    return y;
}
Note: $e^{-i2\pi k/N} = -e^{-i2\pi (k+N/2)/N}$
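The recursive routine relies on a `Complex` type that standard Java does not supply. One possible minimal version is sketched below; it is an assumption about what such a class might look like, not the class the slides were written against. Only the constructor and the `plus`, `minus`, and `times` methods that the FFT code calls are included.

```java
final class Complex {
    final double re; // real part
    final double im; // imaginary part

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    Complex plus(Complex o) {
        return new Complex(re + o.re, im + o.im);
    }

    Complex minus(Complex o) {
        return new Complex(re - o.re, im - o.im);
    }

    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    Complex times(Complex o) {
        return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
    }

    @Override
    public String toString() {
        return re + " + " + im + "i";
    }

    public static void main(String[] args) {
        System.out.println(new Complex(1, 2).times(new Complex(3, 4)));
    }
}
```

Making the fields final keeps each value immutable, which is simple to reason about but allocates a new object per operation.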
Inefficiencies
• The Complex class causes many jumps and puts pressure on the hardware cache
• Declaring and copying arrays at every step slows things down by at least half
• Repetitive calculations of sines and cosines are extremely slow
• A shift (N>>1) can be up to ten times faster than a division (N/2)
• The overhead of creating an activation record for every recursive call is high
The computations still are an order of magnitude slower than needed
Eliminating the Recursion
• The numbers in the rectangles are the array indices
• You see the original indices as we pass through each level of recursion
• Can you see a pattern?
Level 0: 000 001 010 011 100 101 110 111
Level 1: 000 010 100 110 | 001 011 101 111
Level 2: 000 100 010 110 | 001 101 011 111
Butterfly algorithm
Butterfly Code
// Reorder x into bit-reversed index order (x is assumed here to be a double[]).
int j = N>>1, k;
for (int i = 1; i < N-1; i++) {
    if (i < j) { double temp = x[i]; x[i] = x[j]; x[j] = temp; } // swap x[i] and x[j]
    k = N>>1;
    while (k >= 2 && j >= k) { j -= k; k >>= 1; }
    j += k;
}
Flip bits from left to right:
• Most significant bit: SwapBit(x, x + lg N)
• Second most significant bit: SwapBit(x, x + lg(N/2))
• Third most significant bit: SwapBit(x, x + lg(N/4))
• kth most significant bit: SwapBit(x, x + lg(N/2^(k-1)))
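The reordering pattern is a bit reversal of each index. As a cross-check on any in-place reordering loop, the reversed index can be computed directly with the standard `Integer.reverse`. This is a sketch with hypothetical names (`BitReversal`, `reverse`, `order`), and it assumes N is a power of two with N ≥ 2.

```java
final class BitReversal {
    // Reverses the lowest `bits` bits of index (bits must be between 1 and 31).
    static int reverse(int index, int bits) {
        // Integer.reverse flips all 32 bits; shift the result back down.
        return Integer.reverse(index) >>> (32 - bits);
    }

    // Returns the bit-reversed ordering of 0..n-1, for n a power of two, n >= 2.
    static int[] order(int n) {
        int bits = Integer.numberOfTrailingZeros(n); // lg n
        int[] out = new int[n];
        for (int i = 0; i < n; i++) out[i] = reverse(i, bits);
        return out;
    }

    public static void main(String[] args) {
        // For n = 8 this is the final row of the index table: 0 4 2 6 1 5 3 7
        System.out.println(java.util.Arrays.toString(order(8)));
    }
}
```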
Sin and Cosine Table Look Up
• $e^{i2\pi k/N} = \cos(2\pi k/N) + i\sin(2\pi k/N)$
• We can store the sines in an array (sinX[]): sin(2π0/N), sin(2π1/N), sin(2π2/N), sin(2π3/N), …, sin(2π(N-1)/N)
• cos(2πk/N) = sinX[(k+(N>>2))%N], because a cosine is a sine shifted by a quarter period
Compute the values ahead of time and save repetitive calculations
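A table of this kind might be built as follows. The sketch assumes N is a multiple of 4 so that the quarter-period offset N>>2 is exact; the class `SineTable` and its methods are hypothetical names.

```java
final class SineTable {
    private final double[] sinX; // sinX[k] = sin(2 pi k / n)
    private final int n;

    // n is assumed to be a multiple of 4 (true for any power of two >= 4).
    SineTable(int n) {
        this.n = n;
        this.sinX = new double[n];
        for (int k = 0; k < n; k++) sinX[k] = Math.sin(2 * Math.PI * k / n);
    }

    double sin(int k) {
        return sinX[k % n];
    }

    // cos(x) = sin(x + pi/2), and pi/2 is a quarter of the table.
    double cos(int k) {
        return sinX[(k + (n >> 2)) % n];
    }

    public static void main(String[] args) {
        SineTable table = new SineTable(16);
        System.out.println(table.cos(2) + " vs " + Math.cos(2 * Math.PI * 2 / 16));
    }
}
```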
Optimized FFT – after butterfly code
// Perform the fft calculations.
// Remember that complex numbers require pairs of doubles.
int kInc = N>>1; // Number of 2PIki/N steps for odd/even entries.
for (int stage = 1; stage <= M; stage++) // M = lg N
{
    int fftSubGroupGap = 2<<stage; // 4, 8, 16, ... – subgroup distance
    int gap = fftSubGroupGap>>1; // 2, 4, 8, ... – odd/even distance
    // Outer loop: each sub-fft group; inner loop: combine group elements
    for (int even = 0; even < complex.length; even += fftSubGroupGap)
    {
        int k = 0; // Index into the trigonometric lookup table.
        for (int element = even; element < (even+gap); element += 2)
        {
            // ***** See Next Slide *****
            k += kInc; // position for next look up.
        }
    }
    kInc >>= 1; // The next stage uses twice as many angles, half as far apart.
}
Multiplication Portion
// Look up e^(-2PIki/N), avoiding trig calculations here.
realW = sines[(k+(N>>2))%N]; // cos(2PIk/N)
imagW = -sines[k%N]; // -sin(2PIk/N)
// Complex multiplication of the odd entry of the subgroup
// with (e^(-2PIi/N))^k = (cos(2PI/N) - i * sin(2PI/N))^k
j = (element + gap);
tempReal = realW * complex[j] - imagW * complex[j+1];
tempImag = realW * complex[j+1] + imagW * complex[j];
// Adjust the odd entry (subtract: the fft is periodic).
complex[j] = complex[element] - tempReal;
complex[j+1] = complex[element+1] - tempImag;
// Adjust the even entry.
complex[element] += tempReal;
complex[element+1] += tempImag;
Final Notes
• Standard Fast Fourier Transform
– Requires N to be a power of 2 for the recursion to work
– Can pad the array with zeroes to extend the frequency domain
• Can it work if N is not a power of 2?
– Yes, but special slower processing is needed
• How do we know if it works?
– For real input, the magnitudes mirror around N/2: Point N/2-1 = Point N/2+1, Point N/2-2 = Point N/2+2, Point N/2-k = Point N/2+k, etc.
– Note: Points 0 and N/2 have no mirror partner, so don't check these
– The FFT Inverse should restore the time domain signal
– Compare to the slower correlation DFT calculation
– Try some simple impulses and check the results
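Two of these checks, the mirror test and the impulse test, might be automated along the following lines. This is a sketch only: the names `FftChecks`, `dft`, `magnitude`, and `mirrorsAroundMiddle` are hypothetical, and a plain O(N²) DFT of a real signal stands in for whichever FFT is under test.

```java
final class FftChecks {
    // Naive O(N^2) DFT of a real signal; returns {real parts, imaginary parts}.
    static double[][] dft(double[] x) {
        int n = x.length;
        double[] re = new double[n], im = new double[n];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double angle = -2 * Math.PI * k * t / n;
                re[k] += x[t] * Math.cos(angle);
                im[k] += x[t] * Math.sin(angle);
            }
        }
        return new double[][] {re, im};
    }

    static double magnitude(double[][] spectrum, int k) {
        return Math.hypot(spectrum[0][k], spectrum[1][k]);
    }

    // True when |X[N/2 - k]| matches |X[N/2 + k]| for k = 1 .. N/2 - 1.
    // Points 0 and N/2 have no partner and are skipped.
    static boolean mirrorsAroundMiddle(double[][] spectrum, double tolerance) {
        int n = spectrum[0].length;
        for (int k = 1; k < n / 2; k++) {
            double low = magnitude(spectrum, n / 2 - k);
            double high = magnitude(spectrum, n / 2 + k);
            if (Math.abs(low - high) > tolerance) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] signal = {1, 2, 3, 4, 0, -1, -2, 5};
        System.out.println("mirror check: " + mirrorsAroundMiddle(dft(signal), 1e-9));
        // An impulse at time 0 should give magnitude 1 at every frequency.
        double[][] impulse = dft(new double[] {1, 0, 0, 0, 0, 0, 0, 0});
        System.out.println("impulse |X[3]| = " + magnitude(impulse, 3));
    }
}
```

The mirror property holds only for real-valued input, so a failure on complex input does not by itself indicate a bug.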