Speech Recognition Front End Pre-emphasis Temporal Features Consolidate Features Frequency Features...

Speech Recognition Front End

[Block diagram of the front end. Boxes: Speech, Pre-emphasis, Windowing, Temporal Features, Spectral Analysis, Frequency Features, Consolidate Features, Enhance Features, Feature Vectors]

This week’s focus is on the spectral analysis

Spectral Analysis

• Goal: Find useful frequency-related features

• Approaches

– Without Fourier analysis:
  • Apply a recursive band-pass filter bank
  • Use linear predictive coding (LPC)

– With Fourier analysis:
  • Calculate a Fourier transform
  • Warp the results based on the mel scale

• Applications: Auditory models mimicking human hearing

– Eliminate noise by removing non-voice frequencies
– Detect formants present in the signal
– Perform cepstral analysis to detect pitch and recognize speech
– Auditory nerves stop responding to extended occurrences of the same frequency
  • Idea: De-emphasize frequencies present for extended periods
  • Results: Effective for speech recognition in noisy environments

This week’s emphasis will be on Fourier Analysis

The Fourier Transform Family

• Fourier Series: A weighted sum of sinusoidal functions that models an arbitrary periodic continuous function

• Fourier Transform: A linear operation that maps an arbitrary function of infinite extent into a spectrum of its frequency components

• Discrete Fourier Transform (DFT): A Fourier transform applied to a discrete, infinitely repeating (periodic) series of complex numbers

• Discrete Time Fourier Transform (DTFT): A Fourier transform applied to an aperiodic discrete series of complex numbers that extends from −∞ to +∞

• Fast Fourier Transform (FFT): A fast way to calculate the DFT

The number e

• $e = \lim_{n \to \infty} (1 + 1/n)^n$

• When n = 1: e ≈ 2
  When n = 2: e ≈ $(1 + 1/2)^2$ = 9/4 = 2.25
  When n = 3: e ≈ $(1 + 1/3)^3$ = 64/27 ≈ 2.37037

• When n is extremely large, the expression approaches the value e = 2.718281828…

• What does this have to do with sound?Answer: The future slides will tell.
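The limit above can be checked numerically. The following is a minimal sketch; the class and method names (`EulerLimit`, `approximate`) are illustrative and not part of the slides.

```java
// Sketch: evaluate (1 + 1/n)^n for growing n and compare with Math.E.
// Class and method names are hypothetical, chosen for illustration only.
final class EulerLimit {
    // Returns (1 + 1/n)^n, which approaches e as n grows.
    static double approximate(long n) {
        return Math.pow(1.0 + 1.0 / n, n);
    }

    public static void main(String[] args) {
        long[] samples = {1, 2, 3, 1_000, 1_000_000};
        for (long n : samples) {
            System.out.println("n = " + n + " -> " + approximate(n));
        }
        System.out.println("Math.E = " + Math.E);
    }
}
```

The convergence is slow: the error shrinks roughly in proportion to 1/n.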

Quick Calculus Review

• The derivative of a function at a point is the slope of the function at that point (change in y over change in x).

• The derivative of $x^2$ is $2x$ (notation: if $f(x) = x^2$ then $f'(x) = 2x$):
  $\lim_{\Delta x \to 0} \frac{(x+\Delta x)^2 - x^2}{\Delta x} = \lim_{\Delta x \to 0} \frac{x^2 + 2x\Delta x + \Delta x^2 - x^2}{\Delta x} = \lim_{\Delta x \to 0} (2x + \Delta x) = 2x$

• Tables of derivatives proved by mathematicians exist

• We will need these:

– $\frac{d}{dx} x^n = n x^{n-1}$

– $\frac{d}{dx} \sin x = \cos x$, $\frac{d}{dx} \cos x = -\sin x$

– $\frac{d}{dx} e^x = e^x$, $\frac{d}{dx} e^{ax} = a e^{ax}$

Complex Numbers

• Extends the number line to a plane

– Horizontal axis: real numbers
– Vertical axis: imaginary numbers
– Rectangular notation: a + bi
  • a along the real axis
  • b along the imaginary axis

• Operations

– Addition: (a+bi) + (c+di) = (a+c) + (b+d)i
– Multiplication: (a+bi) * (c+di) = (ac – bd) + (ad + bc)i
– Division: (a+bi)/(c+di) is solved by multiplying the numerator and denominator by the conjugate of c+di, which equals c–di
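The three operations can be sketched directly from the rectangular form. This is an illustration only: the class name `ComplexOps` and the convention of returning a two-element array {real, imaginary} are assumptions, not something the slides prescribe.

```java
// Sketch of complex arithmetic on (a + bi) and (c + di).
// Results are returned as {real, imaginary}; this layout is an assumption.
final class ComplexOps {
    static double[] add(double a, double b, double c, double d) {
        return new double[] {a + c, b + d};
    }

    static double[] multiply(double a, double b, double c, double d) {
        return new double[] {a * c - b * d, a * d + b * c};
    }

    // Multiply numerator and denominator by the conjugate (c - di):
    // (a + bi)(c - di) = (ac + bd) + (bc - ad)i, and (c + di)(c - di) = c^2 + d^2.
    static double[] divide(double a, double b, double c, double d) {
        double denominator = c * c + d * d;
        return new double[] {(a * c + b * d) / denominator,
                             (b * c - a * d) / denominator};
    }
}
```

For example, (1+2i)(3+4i) = (3 − 8) + (4 + 6)i = −5 + 10i, and (1+2i)/(3+4i) = (11 + 2i)/25.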

Polar Notation

• Rectangular form: 4 + 3i

• Convert to polar form: (5, 36.87°)

– $M = \sqrt{4^2 + 3^2} = 5$
– $\theta = \arctan(3/4) \approx 36.87°$

• Convert to rectangular

– $a + ib = M(\cos\theta + i \sin\theta)$

Polar form gives the distance and angle from the origin.

Note: At 90 and 270 degrees, arctan(b/a) divides by zero because a = 0.
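One common way to sidestep the divide-by-zero case is the two-argument arctangent, which Java provides as `Math.atan2`. The sketch below assumes angles in radians and uses hypothetical names (`PolarForm`, `toPolar`, `toRectangular`).

```java
// Sketch: rectangular <-> polar conversion.
// Math.atan2(b, a) handles a == 0, so 90 and 270 degrees need no special case.
final class PolarForm {
    // Returns {magnitude, angle in radians} for a + bi.
    static double[] toPolar(double a, double b) {
        return new double[] {Math.hypot(a, b), Math.atan2(b, a)};
    }

    // Returns {a, b} for magnitude m and angle theta (radians).
    static double[] toRectangular(double m, double theta) {
        return new double[] {m * Math.cos(theta), m * Math.sin(theta)};
    }
}
```

Calling `toPolar(4, 3)` gives a magnitude of 5 and an angle of about 36.87 degrees once converted with `Math.toDegrees`.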

Maclaurin Series for e, sin, cos

• A Maclaurin series estimates any well-behaved function in terms of polynomials:
  $f(x) = f(0)\frac{x^0}{0!} + f'(0)\frac{x^1}{1!} + \dots + f^{(n)}(0)\frac{x^n}{n!} + \dots$

• Try it out, say for the third derivative at x = 0. Differentiating the series three times and setting x = 0 makes every term vanish except the $x^3$ term:
  $f^{(3)}(0) = 0 + 0 + 0 + (3 \cdot 2 \cdot 1)\frac{f^{(3)}(0)}{3 \cdot 2 \cdot 1} + 0 + 0 + \dots$
  All the derivatives match at x = 0.

• Series that we will need
• $e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + \dots$
• $\sin x = x - x^3/3! + x^5/5! - x^7/7! + \dots$
• $\cos x = 1 - x^2/2! + x^4/4! - x^6/6! + \dots$

• Another way to calculate e: $e = 1 + 1 + 1/2! + 1/3! + \dots$

Note: 0! = 1

Sine, Cosine and e

$e^{i\theta} = 1 + i\theta + (i\theta)^2/2! + (i\theta)^3/3! + (i\theta)^4/4! + (i\theta)^5/5! + (i\theta)^6/6! + (i\theta)^7/7! + \dots$

(Multiply terms to eliminate higher powers of i)
$= 1 + i\theta - \theta^2/2! - i\theta^3/3! + \theta^4/4! + i\theta^5/5! - \theta^6/6! - i\theta^7/7! + \dots$

(Gather real and imaginary terms together)
$= (1 - \theta^2/2! + \theta^4/4! - \theta^6/6! + \dots) + i(\theta - \theta^3/3! + \theta^5/5! - \theta^7/7! + \dots)$

(Substitute cos and sin for the series)
$e^{i\theta} = \cos\theta + i \sin\theta$ (This is called Euler's formula)

From the previous slide:
$e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + \dots$
$\sin x = x - x^3/3! + x^5/5! - x^7/7! + \dots$
$\cos x = 1 - x^2/2! + x^4/4! - x^6/6! + \dots$
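The derivation can be checked numerically by summing the series for $e^{i\theta}$ term by term and comparing the result with cos θ and sin θ. This is a sketch under the assumption that a truncated sum is accurate enough for small θ; the names `EulerSeries` and `expI` are illustrative.

```java
// Sketch: sum the first `terms` terms of e^{i*theta} = sum (i*theta)^n / n!
// and return {real part, imaginary part}.
final class EulerSeries {
    static double[] expI(double theta, int terms) {
        double re = 0.0, im = 0.0;
        double termRe = 1.0, termIm = 0.0;   // (i*theta)^0 / 0! = 1
        for (int n = 0; n < terms; n++) {
            re += termRe;
            im += termIm;
            // Next term = current term * (i*theta) / (n + 1).
            // (x + iy) * i*theta = -y*theta + i*x*theta
            double nextRe = -termIm * theta / (n + 1);
            double nextIm =  termRe * theta / (n + 1);
            termRe = nextRe;
            termIm = nextIm;
        }
        return new double[] {re, im};
    }
}
```

With θ = π the sum approaches −1 + 0i, which is the well-known special case $e^{i\pi} = -1$.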

Key Formulae and Identities

Euler's formula: $e^{ix} = \cos(x) + i \sin(x)$

Trigonometric identities: $\cos(x) = \cos(-x)$ and $\sin(x) = -\sin(-x)$

$\cos(x) = (e^{ix} + e^{-ix})/2$ and $\sin(x) = (e^{ix} - e^{-ix})/(2i)$

$\sin^2(x) + \cos^2(x) = 1$

$\sin(x+y) = \sin(x)\cos(y) + \cos(x)\sin(y)$

$\cos(x+y) = \cos(x)\cos(y) - \sin(x)\sin(y)$

Quick Linear Algebra Review

• Linear algebra extends Euclidean space beyond three dimensions.

• <3,4,5> represents a vector going from the point (0,0,0) to the point (3,4,5).

• Two vectors are orthogonal (perpendicular, roughly speaking) if their inner (dot) product equals 0.
– Example: <1,0,0> • <0,1,0> = 1*0 + 0*1 + 0*0 = 0
– Example: <3,1> • <-1,3> = 3*-1 + 1*3 = 0

• Two functions are orthogonal between a and b if $\int_a^b f(x) g(x)\,dx = 0$

• A set of functions is mutually orthogonal if $\int_a^b f_i(x) f_j(x)\,dx = 0$ when i ≠ j, and equals some c > 0 when i = j.

• Why do we need this? Orthogonal function sets can be used to decompose or construct signals.

Inner product: sum the products of corresponding coordinates

Basis to span a space

• Consider the orthogonal basis <1,0,0>, <0,1,0>, <0,0,1>
– These form a basis for a three-dimensional space.
– Why? Any three-dimensional vector is a linear combination of these
– Example: <4,3,2> = 4 * <1,0,0> + 3 * <0,1,0> + 2 * <0,0,1>

• Consider the orthogonal basis vectors <1,2>, <-2,1>
– They are orthogonal because <1,2> • <-2,1> = 0

• Consider the basis vectors <1/√5, 2/√5>, <-2/√5, 1/√5>
– Also orthogonal because the inner (dot) product is 0
– <1/√5, 2/√5> has a length of unity: $\sqrt{(1/\sqrt{5})^2 + (2/\sqrt{5})^2} = 1$
– <-2/√5, 1/√5> also has a length of unity (same distance calculation)

• Orthonormal basis vectors: orthogonal and of unit length

Orthogonal and Orthonormal

• Experiment (intuitive example, not mathematically precise)
• Goal: construct <4,7> from basis vectors
– Orthogonal basis: <1,2> and <-2,1>
– <1,2> • <4,7> = 18 and <-2,1> • <4,7> = -1
– 18 <1,2> + (-1) <-2,1> = <20, 35>, which is five times <4,7>

• Another experiment
– Orthonormal basis: <1/√5, 2/√5>, <-2/√5, 1/√5>
– <1/√5, 2/√5> • <4,7> = 18/√5 and <-2/√5, 1/√5> • <4,7> = -1/√5
– (18/√5) <1/√5, 2/√5> + (-1/√5) <-2/√5, 1/√5> = <20/5, 35/5> = <4,7>

• Conclusion: Correlating a vector with each orthonormal basis vector (taking the inner product) yields the multiple of that basis vector needed to reconstruct the vector.
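Both experiments can be reproduced with a few lines of code. The sketch below uses hypothetical names (`BasisProjection`, `dot`, `reconstruct`): it correlates the target with each basis vector and sums the scaled basis vectors.

```java
// Sketch: reconstruct a vector from its inner products with a set of basis vectors.
// With an orthonormal basis the result equals the target; with a merely
// orthogonal basis it is scaled by the squared length of the basis vectors.
final class BasisProjection {
    static double dot(double[] u, double[] v) {
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) {
            sum += u[i] * v[i];
        }
        return sum;
    }

    static double[] reconstruct(double[] target, double[][] basis) {
        double[] result = new double[target.length];
        for (double[] b : basis) {
            double coefficient = dot(b, target);   // correlation with this basis vector
            for (int i = 0; i < result.length; i++) {
                result[i] += coefficient * b[i];
            }
        }
        return result;
    }
}
```

The factor of five in the first experiment is the squared length of <1,2> and <-2,1>; dividing each coefficient by that squared length would recover <4,7> without normalizing the basis first.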

Fourier Series

• A Fourier series is a sum (possibly, but not necessarily, infinite) of sines and cosines that models a continuous signal.

• Fourier modeling allows us to decompose a signal, perform processing, and recombine the results to solve an original problem

Fourier Decomposition

The top signal decomposes into nine cosine and sine waves

Fourier Square Wave Synthesis

Fourier Cosine Series

• The set of functions $\{\cos(2\pi k F_0 t)\}$, where k is an integer > 0

– Mutually orthogonal from –T to T for 0 ≤ t < ∞; T > 0

– $\int_{-L}^{L} \cos(2\pi k_1 x/P) \cos(2\pi k_2 x/P)\,dx = 0$ if $k_1 \ne k_2$; $\ne 0$ if $k_1 = k_2$

– Proof requires some calculus, namely integration

• $x(t) = a_0 \cos(0 \cdot 2\pi F_0 t) + a_1 \cos(1 \cdot 2\pi F_0 t) + a_2 \cos(2 \cdot 2\pi F_0 t) + \dots$
  $x(t) = a_0 + \sum_{k=1}^{\infty} a_k \cos(2\pi k F_0 t)$, where $F_0 = 1/T$

• Comment: The series doesn't include phases; if we add phases, we have twice as many unknowns to compute

[Figure: plots of cos(πx/3) and cos(2πx/3), and of the integrand cos(πx/3) · cos(2πx/3)]
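The orthogonality claim can be checked numerically without doing the integration by hand. The sketch below approximates the integral with the midpoint rule over one full period, taking P = 6 and L = 3 so that it matches cos(πx/3) and cos(2πx/3); the class name and the choice of integration rule are assumptions for illustration.

```java
// Sketch: midpoint-rule approximation of the integral over [-P/2, P/2] of
// cos(2*pi*k1*x/P) * cos(2*pi*k2*x/P) dx.
final class CosineOrthogonality {
    static double integrate(int k1, int k2, double period, int steps) {
        double half = period / 2.0;
        double dx = period / steps;
        double sum = 0.0;
        for (int s = 0; s < steps; s++) {
            double x = -half + (s + 0.5) * dx;   // midpoint of the s-th slice
            sum += Math.cos(2 * Math.PI * k1 * x / period)
                 * Math.cos(2 * Math.PI * k2 * x / period) * dx;
        }
        return sum;
    }
}
```

With k1 = 1 and k2 = 2 the result is zero to within rounding; with k1 = k2 = 1 it is P/2 = 3, which is the non-zero case.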

A General Orthogonal Function Set

• Euler equation: $e^{i\varphi} = \cos(\varphi) + i \sin(\varphi)$
– Radius = magnitude (always unity); φ = phase.

• Consider the function set $\{e^{i\omega_k t}\}$

– Angular frequency: $\omega_k = 2\pi k F_0 = 2\pi k/T_0$

– $F_0$, $T_0$: fundamental frequency and period.

– k = speed at which $e^{i\omega_k t}$ traverses the circle

– Orthogonal because $\int_{-\infty}^{\infty} e^{j\omega_n t} e^{j\omega_m t}\,dt = 0$ whenever n ≠ -m

Notes

1. The book uses j instead of i

2. Electrical engineers prefer j

3. Mathematicians prefer i

4. Get used to both!

5. In the diagram, φ = 2πF0

Orthogonality Example

• Left: Correlating the top signal with the middle signal gives the bottom signal, whose area ≠ 0
• Right: Correlating the top signal with the middle signal gives the bottom signal, whose area = 0

Putting it all together

• $\{e^{i\omega_k t}\}$ is an orthogonal basis for signals
– Each function $e^{i\omega_k t}$ is a basis function
– We can use the basis functions to synthesize signals

• Synthesize (Fourier series)
– Source: frequency magnitudes; sink: time signal
– $x(t) = \frac{1}{T} \sum_{k=0}^{T} a_k e^{ik\omega_0 t}$, where x(t) = signal at time t
– T = number of basis functions (possibly infinite); $a_k$ = magnitude of $\omega_k$

• For computer processing, we need a discrete counterpart
– Why? We don't want to deal with infinitely many points or basis functions
– $x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] e^{i2\pi kn/N}$
– k determines how fast each term traverses the circle (higher k, faster)
– N basis functions and N frequencies

• Note: For periodic functions, we can use [0, T] instead of [-∞, ∞]

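Fourier Analysis

• Goal: Compute the coefficients given the signal.

• Synthesis equation: $x(t) = \sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}$

• Multiply both sides by $e^{-ik\omega_0 t}$:
  $x(t) e^{-ik\omega_0 t} = \left(\sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}\right) e^{-ik\omega_0 t}$

• Integrate over one period, from 0 to $T_0$:
  $\int_0^{T_0} x(t) e^{-ik\omega_0 t}\,dt = \int_0^{T_0} \left(\sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}\right) e^{-ik\omega_0 t}\,dt$

• Move the sum outside the integral and combine the exponentials:
  $\int_0^{T_0} x(t) e^{-ik\omega_0 t}\,dt = \sum_{m=-\infty}^{\infty} a_m \int_0^{T_0} e^{i(m-k)\omega_0 t}\,dt$

• By orthogonality, every integral in the sum is zero except when m = k, so only one term survives:
  $\int_0^{T_0} x(t) e^{-ik\omega_0 t}\,dt = a_k \int_0^{T_0} dt = a_k\, t\,\big|_0^{T_0} = a_k T_0$

• The answer (value of coefficient k): $a_k = \frac{1}{T_0} \int_0^{T_0} x(t) e^{-ik\omega_0 t}\,dt$

• Note: $1/T_0$ is simply a constant that scales the result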
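The analysis formula for $a_k$ can be tried on a signal whose coefficients are known in advance. The sketch below approximates the integral with a midpoint sum; the names `FourierCoefficient` and `coefficient`, and the choice of `DoubleUnaryOperator` to represent x(t), are assumptions made for the example.

```java
// Sketch: a_k = (1/T0) * integral over [0, T0] of x(t) * e^{-i k w0 t} dt,
// approximated with the midpoint rule. Returns {real part, imaginary part}.
final class FourierCoefficient {
    static double[] coefficient(java.util.function.DoubleUnaryOperator x,
                                int k, double period, int steps) {
        double w0 = 2 * Math.PI / period;     // fundamental angular frequency
        double dt = period / steps;
        double re = 0.0, im = 0.0;
        for (int s = 0; s < steps; s++) {
            double t = (s + 0.5) * dt;
            double value = x.applyAsDouble(t);
            // e^{-i k w0 t} = cos(k w0 t) - i sin(k w0 t)
            re += value * Math.cos(k * w0 * t) * dt;
            im -= value * Math.sin(k * w0 * t) * dt;
        }
        return new double[] {re / period, im / period};
    }
}
```

For x(t) = 3 cos(2ω₀t), the identity cos(x) = (e^{ix} + e^{-ix})/2 predicts a₂ = a₋₂ = 1.5 and zero for every other k, which is what the sum returns to within rounding.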

Discrete Version

• Definition: Continuous Fourier transform and inverse

– Transform: $X(\omega) = \int_{-\infty}^{\infty} x(t) e^{-i\omega t}\,dt$

– Inverse: $x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega) e^{i\omega t}\,d\omega$

• Convert from the continuous version:
– Evaluate at N equally spaced points (the period is now N)
– Use sums to approximate the integrals
– Note: x(t) = value at time t; x[n] is x(t) evaluated at the nth sample point, so the angle in the exponent becomes 2πkn/N

• Discrete Fourier transform and inverse

– Transform: $X[k] = \sum_{n=0}^{N-1} x[n] e^{-i2\pi kn/N}$

– Inverse: $x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] e^{i2\pi kn/N}$

• Note: X[k] is a complex number representing magnitude and phase
• Conclusion: We can go between the time and frequency domains
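A direct translation of the transform pair shows the round trip between the two domains. This is a sketch rather than an efficient implementation; keeping the real and imaginary parts in separate arrays, and the names `DftRoundTrip` and `transform`, are choices made for this example.

```java
// Sketch: direct O(N^2) evaluation of the DFT and its inverse.
// Input and output are {real[], imaginary[]} pairs.
final class DftRoundTrip {
    static double[][] transform(double[] re, double[] im, boolean inverse) {
        int n = re.length;
        double sign = inverse ? 1.0 : -1.0;   // e^{+i...} for the inverse, e^{-i...} otherwise
        double[] outRe = new double[n];
        double[] outIm = new double[n];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double angle = sign * 2 * Math.PI * k * t / n;
                double c = Math.cos(angle), s = Math.sin(angle);
                // (re + i*im) * (c + i*s)
                outRe[k] += re[t] * c - im[t] * s;
                outIm[k] += re[t] * s + im[t] * c;
            }
            if (inverse) {                     // the 1/N factor belongs to the inverse
                outRe[k] /= n;
                outIm[k] /= n;
            }
        }
        return new double[][] {outRe, outIm};
    }
}
```

Transforming {1, 2, 3, 4} gives X[0] = 10 and X[1] = −2 + 2i, and applying the inverse to the result returns the original four samples.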

Signal Plot

• The phases are shown in the spectrum plot in the complex plane.

• The phase affects how the time domain signal looks.

• The amplitude of the spectrum plot remains constant regardless of phase.

Fourier Transform of Square Wave

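Fourier transforms exhibit the property of duality

• A square wave in frequency corresponds to a sinc function in time, and vice versa
• Convolution in time = multiplication in frequency, and vice versa
• Proof with calculus, for x(t) = 1 on [-1/2, 1/2] and 0 elsewhere, writing ω for kω0:

$\int_{-\infty}^{\infty} x(t) e^{-j\omega t}\,dt = \int_{-1/2}^{1/2} e^{-j\omega t}\,dt = -\frac{1}{j\omega} e^{-j\omega t}\,\Big|_{-1/2}^{1/2}$

$= -\frac{1}{j\omega}\left(e^{-j\omega/2} - e^{j\omega/2}\right) = \frac{1}{j\omega}\left(e^{j\omega/2} - e^{-j\omega/2}\right) = \frac{\sin(\omega/2)}{\omega/2}$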

Complex DFT by Correlation

// time holds N complex samples as interleaved pairs of doubles.
double[] DFT(double[] time, int N)
{   double[] freq = new double[2*N];
    double real, imag;
    for (int k=0; k<N; k++)
    {   for (int i=0; i<N; i++)
        {   // Basis function e^(-i2PIki/N) = cos(2PIki/N) - i*sin(2PIki/N)
            real =  Math.cos(2*Math.PI*k*i/N);
            imag = -Math.sin(2*Math.PI*k*i/N);
            // Complex multiply: (a + bi)(c + di) = (ac - bd) + (ad + bc)i
            freq[2*k]   += (time[2*i]*real - time[2*i+1]*imag);
            freq[2*k+1] += (time[2*i]*imag + time[2*i+1]*real);
        }
    }
    return freq;
}

Note: even indices = real part, odd indices = imaginary part

Complexity: O(N²) because of the double loop of N iterations each
Example: For 512 samples, the inner loop body runs 262,144 times
Evaluation: Too slow, but the FFT is O(N lg N)

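The FFT Algorithm

• The FFT algorithm is based on divide-and-conquer

[Figure: recursion tree. A problem of size n splits into two of size n/2, then four of size n/4; each level costs O(n) and there are O(log n) levels.]

The running time complexity is O(n log n)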

Why do we need FFT?

• The correlation algorithm is O(N²)

• Too slow to be practical even on today's processors

• An optimized FFT is O(N lg N), which is orders of magnitude faster

• Assume 512 elements in a window
– O(N) = C * 512
– O(N²) = C * 512 * 512 = C * 262,144
– O(N lg N) = C * 512 * 9 = C * 4,608
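The counts for a 512-element window can be reproduced for other window sizes with a small helper. This is only an illustration of the arithmetic, ignoring the constant C; the names are hypothetical and N is assumed to be a power of two.

```java
// Sketch: operation counts for the correlation DFT and the FFT,
// leaving out the constant factor C. Assumes n is a power of two.
final class FftCost {
    static long quadratic(int n) {
        return (long) n * n;
    }

    static long nLogN(int n) {
        int lg = 31 - Integer.numberOfLeadingZeros(n);   // lg n for a power of two
        return (long) n * lg;
    }

    public static void main(String[] args) {
        for (int n : new int[] {512, 1024, 4096}) {
            System.out.println("N=" + n + "  N^2=" + quadratic(n) + "  N lg N=" + nLogN(n));
        }
    }
}
```

For N = 512 the ratio between the two counts is about 57, and it grows as the window gets longer.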

Theory for Optimization

Base case (N = 1): $X[0] = x[0]$

Recursive relationship:

$X[k] = \sum_{t=0}^{N-1} x[t]\, e^{-i2\pi kt/N}$

$= \sum_{t=0}^{N/2-1} x[2t]\, e^{-i2\pi k(2t)/N} + \sum_{t=0}^{N/2-1} x[2t+1]\, e^{-i2\pi k(2t+1)/N}$

$= \sum_{t=0}^{N/2-1} x[2t]\, e^{-i2\pi kt/(N/2)} + \sum_{t=0}^{N/2-1} x[2t+1]\, e^{-i2\pi k(2t+1)/N}$

$= \sum_{t=0}^{N/2-1} x[2t]\, e^{-i2\pi kt/(N/2)} + e^{-i2\pi k/N} \sum_{t=0}^{N/2-1} x[2t+1]\, e^{-i2\pi kt/(N/2)}$

$= F_k^{even} + e^{-i2\pi k/N} \cdot F_k^{odd}$

Note: work at each step is O(N); there are lg(N) levels

Simple Recursive FFT Solution

// Assumes a Complex class with plus(), minus(), and times() methods.
Complex[] fft(Complex[] x)
{   int N = x.length;
    Complex[] y = new Complex[N];
    if (N == 1) { y[0] = x[0]; return y; }   // base case

    // Split into even-indexed and odd-indexed samples.
    Complex[] even = new Complex[N/2];
    Complex[] odd  = new Complex[N/2];
    for (int m=0; m<N/2; m++)
    {   even[m] = x[2*m];
        odd[m]  = x[2*m+1];
    }
    Complex[] q = fft(even), r = fft(odd);

    // Combine the two half-size transforms.
    for (int k=0; k<N/2; k++)
    {   double exp = -2*k*Math.PI/N;
        Complex wk = new Complex(Math.cos(exp), Math.sin(exp));
        y[k]     = q[k].plus(wk.times(r[k]));
        y[k+N/2] = q[k].minus(wk.times(r[k]));
    }
    return y;
}

Note: $e^{-i2\pi k/N} = -e^{-i2\pi (k+N/2)/N}$
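The recursive method relies on a `Complex` class that the slides do not show. The sketch below pairs a minimal, hypothetical `Complex` (only the three operations the method uses) with the same recursion so that it can be run on its own; the wrapper class name `RecursiveFft` is also an assumption.

```java
// Sketch: a minimal immutable Complex type plus the recursive FFT.
// Both classes are illustrative stand-ins, not the course's own code.
final class Complex {
    final double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }
    Complex plus(Complex o)  { return new Complex(re + o.re, im + o.im); }
    Complex minus(Complex o) { return new Complex(re - o.re, im - o.im); }
    Complex times(Complex o) {
        return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
    }
}

final class RecursiveFft {
    // Requires x.length to be a power of two.
    static Complex[] fft(Complex[] x) {
        int n = x.length;
        if (n == 1) {
            return new Complex[] {x[0]};        // base case: X[0] = x[0]
        }
        Complex[] even = new Complex[n / 2];
        Complex[] odd = new Complex[n / 2];
        for (int m = 0; m < n / 2; m++) {
            even[m] = x[2 * m];
            odd[m] = x[2 * m + 1];
        }
        Complex[] q = fft(even);
        Complex[] r = fft(odd);
        Complex[] y = new Complex[n];
        for (int k = 0; k < n / 2; k++) {
            double angle = -2 * Math.PI * k / n;
            Complex wk = new Complex(Math.cos(angle), Math.sin(angle));
            Complex t = wk.times(r[k]);         // twiddle factor times the odd half
            y[k] = q[k].plus(t);
            y[k + n / 2] = q[k].minus(t);
        }
        return y;
    }
}
```

For the input {1, 2, 3, 4} (imaginary parts zero) the output is 10, −2+2i, −2, −2−2i, which matches the direct DFT definition.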

Inefficiencies

• The Complex class causes many jumps and puts pressure on the hardware cache

• Declaring and copying arrays at every step slows things down at least by half

• Repetitive calculations of sines and cosines are extremely slow

• N>>1 is ten times faster than N/2

• Overhead associated with activation record creation due to the recursive calls is very slow

The computations still are an order of magnitude slower than needed

Eliminating the Recursion

• The numbers in the rectangles are the array indices

• You see the original indices as we pass through each level of recursion

• Can you see a pattern ?

000 001 010 011 100 101 110 111

000 010 100 110 001 011 101 111

000 100 010 110 001 101 011 111

Butterfly algorithm

Butterfly Code

// In-place bit-reversal reordering of x[0..N-1]; temp has the element type of x.
int j = N>>1, k;
for (int i=1; i<N-1; i++)
{   if (i < j) { temp = x[i]; x[i] = x[j]; x[j] = temp; }   // swap x[i] and x[j]
    k = N>>1;
    while (k>=2 && j>=k) { j -= k; k >>= 1; }
    j += k;
}

• Most significant bit: SwapBit(x, x + lg N)

• Second most significant bit: SwapBit(x, x + lg(N/2))

• Third most significant bit: SwapBit(x, x + lg(N/4))

• kth most significant bit: SwapBit(x, x + lg(N/2^(k-1)))

Flip bits from left to right
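The pattern in the index tables is that each final position is the original index with its bits reversed. The sketch below computes that permutation directly and also runs the incremental loop from the slide on an array of indices, so the two can be compared; the class and method names are illustrative.

```java
// Sketch: two ways to obtain the bit-reversed ordering for N = 2^bits elements.
final class BitReversal {
    // Reverse the lowest `bits` bits of index.
    static int reverseBits(int index, int bits) {
        int reversed = 0;
        for (int b = 0; b < bits; b++) {
            reversed = (reversed << 1) | (index & 1);
            index >>= 1;
        }
        return reversed;
    }

    // Permutation computed by reversing each index explicitly.
    static int[] direct(int n) {
        int bits = Integer.numberOfTrailingZeros(n);   // lg n for a power of two
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = reverseBits(i, bits);
        }
        return order;
    }

    // Same permutation produced by the incremental swap loop.
    static int[] incremental(int n) {
        int[] x = new int[n];
        for (int i = 0; i < n; i++) {
            x[i] = i;
        }
        int j = n >> 1;
        for (int i = 1; i < n - 1; i++) {
            if (i < j) {
                int temp = x[i]; x[i] = x[j]; x[j] = temp;
            }
            int k = n >> 1;
            while (k >= 2 && j >= k) { j -= k; k >>= 1; }
            j += k;
        }
        return x;
    }
}
```

For N = 8 both methods give 0, 4, 2, 6, 1, 5, 3, 7, which is the bottom row of the index table read as decimal numbers.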

Sin and Cosine Table Look Up

• $e^{i2\pi k/N} = \cos(2\pi k/N) + i \sin(2\pi k/N)$

• We can store the sines in an array (sinX[]):
  sin(2π·0/N), sin(2π·1/N), sin(2π·2/N), sin(2π·3/N), …, sin(2π(N-1)/N)

• cos(2πk/N) = sinX[(k+(N>>2))%N]

Compute the values ahead of time and save repetitive calculations
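A sketch of the lookup table follows. It assumes N is a multiple of 4 so that a quarter turn is a whole number of table entries; the names `SineTable`, `build`, and `cosine` are hypothetical.

```java
// Sketch: precompute sin(2*pi*k/N) once, then read cosines from the same table
// by shifting the index a quarter turn, since cos(a) = sin(a + pi/2).
final class SineTable {
    static double[] build(int n) {
        double[] sines = new double[n];
        for (int k = 0; k < n; k++) {
            sines[k] = Math.sin(2 * Math.PI * k / n);
        }
        return sines;
    }

    // Assumes sines.length is a multiple of 4.
    static double cosine(double[] sines, int k) {
        int n = sines.length;
        return sines[(k + (n >> 2)) % n];
    }
}
```

The modulo wraps the shifted index back into the table for the last quarter of the circle.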

Optimized FFT – after butterfly code

// Perform the fft calculations.
// Remember that complex numbers require pairs of doubles.
// kInc is set once, before the stage loop, and halved after each stage.
int kInc = N>>1;   // Number of 2PIk/N table steps for odd/even entries.
for (int stage=1; stage<=M; stage++)   // M = lg N
{   int fftSubGroupGap = 2<<stage;     // 4, 8, 16, ... : subgroup distance
    int gap = fftSubGroupGap>>1;       // 2, 4, 8, ...  : odd/even distance

    // Outer loop: each sub-fft group; inner loop: combine group elements
    for (int even=0; even<complex.length; even+=fftSubGroupGap)
    {   int k = 0;   // Index into the trigonometric lookup table.
        for (int element=even; element<(even+gap); element+=2)
        {
            // ***** See Next Slide *****
            k += kInc;   // position for next look up.
        }
    }
    kInc >>= 1;
}

Multiplication Portion

// Look up e^(-2PIki/N), avoiding trig calculations here.
realW =  sines[(k+(N>>2))%N];   //  cos(2PIk/N)
imagW = -sines[k%N];            // -sin(2PIk/N)

// Complex multiplication of the odd entry of the subgroup
// with (e^(-2PIi/N))^k = (cos(2PI/N) - i*sin(2PI/N))^k
j = (element + gap);
tempReal = realW * complex[j]   - imagW * complex[j+1];
tempImag = realW * complex[j+1] + imagW * complex[j];

// Adjust the odd entry (subtract: the fft is periodic).
complex[j]   = complex[element]   - tempReal;
complex[j+1] = complex[element+1] - tempImag;

// Adjust the even entry.
complex[element]   += tempReal;
complex[element+1] += tempImag;
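For reference, the pieces (bit reversal, stages, and the multiply-and-combine step) can be assembled into one small in-place routine. The sketch below deliberately simplifies the slides' approach: it keeps real and imaginary parts in separate arrays instead of interleaved pairs, uses `Integer.reverse` for the reordering, and calls `Math.cos`/`Math.sin` directly rather than a lookup table. The class name `IterativeFft` is hypothetical.

```java
// Sketch: in-place iterative radix-2 FFT. re.length must be a power of two.
final class IterativeFft {
    static void fft(double[] re, double[] im) {
        int n = re.length;
        int bits = Integer.numberOfTrailingZeros(n);

        // Step 1: reorder the input into bit-reversed index order.
        for (int i = 0; i < n; i++) {
            int j = Integer.reverse(i) >>> (32 - bits);
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }

        // Step 2: combine sub-transforms of size 2, 4, 8, ..., n.
        for (int size = 2; size <= n; size <<= 1) {
            int half = size >> 1;
            for (int start = 0; start < n; start += size) {
                for (int m = 0; m < half; m++) {
                    double angle = -2 * Math.PI * m / size;
                    double wr = Math.cos(angle), wi = Math.sin(angle);
                    int e = start + m;      // entry from the "even" half
                    int o = e + half;       // matching entry from the "odd" half
                    double tr = wr * re[o] - wi * im[o];
                    double ti = wr * im[o] + wi * re[o];
                    re[o] = re[e] - tr;
                    im[o] = im[e] - ti;
                    re[e] += tr;
                    im[e] += ti;
                }
            }
        }
    }
}
```

Swapping the direct trigonometric calls for table lookups changes only the two lines that compute wr and wi.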

Final Notes

• Standard Fast Fourier Transform
– Requires N to be a power of 2 for the recursion to work
– Can pad the array with zeroes to extend the frequency domain

• Can it work if N is not a power of 2?
– Yes, but special slower processing is needed

• How do we know if it works?
– Point N/2-1 = Point N/2+1, Point N/2-2 = Point N/2+2, and in general Point N/2-k = Point N/2+k (for real-valued input, the magnitudes mirror around N/2)
– Note: Points 0 and N/2 don't match, so don't check these
– The inverse FFT should restore the time domain signal
– Compare to the slower correlation DFT calculation
– Try some simple impulses and check the results
