DIGITAL PROCESSING OF SPEECH AND IMAGE SIGNALS

Embed Size (px)

DESCRIPTION

Fundamentals of speech signal analysis

Citation preview

  • Lecture

    DIGITAL PROCESSING

    OF

    SPEECH AND IMAGE SIGNALS

    RWTH Aachen, WS 2006/7

    Prof. Dr.-Ing. H. Ney, Dr.rer.nat. R. SchluterLehrstuhl fur Informatik 6

    RWTH Aachen

    1. System Theory and Fourier Transform

    2. Discrete Time Systems

    3. Spectral Analysis

    4. Fourier Transform and Image Processing

    5. LPC Analysis

    6. Wavelets

    7. Coding

    8. Image Segmentation and Contour-Finding

    Completions: L. Welling, A. Eiden; April 1997Completions: J. Dahmen, F. Hilger, S. Koepke; Mai 2000Completions: F. Hilger, D. Keysers; Juli 2001Translation: M. Popovic, R. Schluter; April 2003Corrections: D. Stein; October 2006

  • Literature:

    A. V. Oppenheim, R. W. Schafer: Discrete Time Signal Processing,Prentice Hall, Englewood Cliffs, NJ, 1989.

    A. Papoulis: Signal Analysis, McGraw-Hill, New York, NY, 1977. A. Papoulis: The Fourier Integral and its Applications, McGraw-HillClassic Textbook Reissue Series, McGraw-Hill, New York, NY, 1987.

    W. K. Pratt: Digital Image Processing, Wiley & Sons Inc, New York,NY, 1991.

    Further reading:

    T. K. Moon, W. C. Stirling: Mathematical Methods and Algorithmsfor Signal Processing. Prentice Hall, Upper Saddle River, NJ, 2000.

    J. R. Deller, J. G. Proakis, J. H. L. Hansen: Discrete-Time Processingof Speech Signals, Macmillan Publishing Company, New York, NY,1993.

    W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery: Nu-merical Recipes in C, Cambridge Univ. Press, Cambridge, 1992.

    L. Rabiner, B. H. Juang: Fundamentals of Speech Recognition, Pren-tice Hall, Englewood Cliffs, NJ, 1993.

    T. Lehmann, W. Oberschelp, E. Pelikan, R. Repges: Bildverarbeitungfur die Medizin, Springer Verlag, Berlin, 1997.

    L. Berg: Lineare Gleichungssysteme mit Bandstruktur, VEB DeutscherVerlag der Wissenschaften, Berlin, 1986.

  • Contents

    1 System Theory and Fourier Transform 1

    1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 2

    1.2 Linear time-invariant Systems . . . . . . . . . . . . . . . . 11

    1.3 Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . 16

    1.4 Properties of the Fourier Transform . . . . . . . . . . . . . 25

    1.5 Parseval Theorem . . . . . . . . . . . . . . . . . . . . . . . 33

    1.6 Autocorrelation Function . . . . . . . . . . . . . . . . . . . 34

    1.7 Existence of the Fourier Transform . . . . . . . . . . . . . 35

    1.8 -Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

    1.9 Motivation for Fourier Series . . . . . . . . . . . . . . . . . 41

    1.10 Time Duration and Band Width . . . . . . . . . . . . . . . 45

    2 Discrete Time Systems 51

    2.1 Motivation and Goal . . . . . . . . . . . . . . . . . . . . . 52

    2.2 Digital Simulation using Discrete Time Systems . . . . . . 53

    2.3 Examples of Discrete Time Systems . . . . . . . . . . . . . 56

    2.4 Sampling Theorem (Nyquist Theorem) and Reconstruction 61

    2.5 Logarithmic Scale and dB . . . . . . . . . . . . . . . . . . 70

    2.6 Quantization . . . . . . . . . . . . . . . . . . . . . . . . . 72

    2.7 Fourier Transform and zTransform . . . . . . . . . . . . . 74

    2.8 System Representation and Examples . . . . . . . . . . . . 78

    2.9 Discrete Time Signal Fourier Transform Theorem . . . . . 88

    2.10 Discrete Fourier Transform: DFT . . . . . . . . . . . . . . 90

    2.11 DFT as Matrix Operation . . . . . . . . . . . . . . . . . . 98

    2.12 From Continuous Fourier Transform to Matrix Representa-tion of Discrete Fourier Transform . . . . . . . . . . . . . . 102

    2.13 Frequency Resolution and Zero Padding . . . . . . . . . . 104

    2.14 Finite Convolution . . . . . . . . . . . . . . . . . . . . . . 105

    2.15 Fast Fourier Transform (FFT) . . . . . . . . . . . . . . . . 108

    2.16 FFT Implementation . . . . . . . . . . . . . . . . . . . . . 118

    i

  • 2.17 Cyclic Matrices and Fourier Transform . . . . . . . . . . . 124

    3 Spectral analysis 1313.1 Features for Speech Recognition . . . . . . . . . . . . . . . 1323.2 Short Time Analysis and Windowing . . . . . . . . . . . . 1353.3 Autocorrelation Function and Power Spectral Density . . . 1593.4 Spectrograms . . . . . . . . . . . . . . . . . . . . . . . . . 1653.5 Filter Bank Analysis . . . . . . . . . . . . . . . . . . . . . 1683.6 Mel-frequency scale . . . . . . . . . . . . . . . . . . . . . . 1713.7 Cepstrum . . . . . . . . . . . . . . . . . . . . . . . . . . . 1733.8 Statistical Interpretation of the Cepstrum Transformation 1833.9 Energy in acoustic Vector . . . . . . . . . . . . . . . . . . 185

    4 Fourier Transform and Image Processing 1874.1 Spatial Frequencies and Fourier Transform for Images . . . 1884.2 Discrete Fourier Transform for Images . . . . . . . . . . . 1964.3 Fourier Transform in Computer Tomography . . . . . . . . 1974.4 Fourier Transform and RST Invariance . . . . . . . . . . . 199

    5 LPC Analysis 2075.1 Principle of LPC Analysis . . . . . . . . . . . . . . . . . . 2085.2 LPC: Covariance Method . . . . . . . . . . . . . . . . . . . 2125.3 LPC: Autocorrelation Method . . . . . . . . . . . . . . . . 2135.4 LPC: Interpretation in Frequency Domain . . . . . . . . . 2165.5 LPC: Generative Model . . . . . . . . . . . . . . . . . . . 2215.6 LPC: Alternative Representations . . . . . . . . . . . . . . 223

    6 Outlook: Wavelet Transform 2256.1 Motivation: from Fourier to Wavelet Transform . . . . . . 2266.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 2276.3 Discrete Wavelet Transform . . . . . . . . . . . . . . . . . 229

    7 Coding (appendix available as separate document) 233

    8 Image Segmentation and Contour-Finding 237

    ii

  • List of Figures

    1.1 Oscillograms of three time functions composed as sum of 20partial oscillations. a) n = 0, b) n =

    2 , c) n statistical. 3

    1.2 Amplitude spectrum of a time function composed as sum of20 partial tones. . . . . . . . . . . . . . . . . . . . . . . . . 3

    1.3 from left to right: original photo, low-pass and high-passfiltered version . . . . . . . . . . . . . . . . . . . . . . . . 3

    1.4 Phase manipulation for portion of a speech signal (vowel o)sampled at 8kHz, 25ms analysis window (200 samples), 512point FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

    1.5 Phase manipulation for portion of a speech signal (consonantn) sampled at 8kHz, 25ms analysis window (200 samples),512 point FFT . . . . . . . . . . . . . . . . . . . . . . . . 5

    1.6 Phase manipulation for a Heavisidefunction (stepfunction) 6

    1.7 Schematic representation of the physiological mechanism ofspeech production . . . . . . . . . . . . . . . . . . . . . . . 8

    2.1 Digital photo . . . . . . . . . . . . . . . . . . . . . . . . . 58

    2.2 Gradient image . . . . . . . . . . . . . . . . . . . . . . . . 58

    2.3 Several real cases of Laplace Operator subtraction from orig-inal image. a) Original image b) Original image minusLaplace Operator (negative values are set to 0 and valuesabove the grey scale are set to the highest grade of grey) . 60

    2.4 Ideal reconstruction of a band-limited signal (from Oppen-heim, Schafer)a) original signal b) sampled signal c) reconstructed signal 64

    iii

  • 2.5 Sampling of band-limited signal with different sampling rates:b) sampling rate higher than Nyquist rate - exact reconstruc-tion possiblec) sampling rate equal to Nyquist rate - exact reconstructionpossibled) sampling rate smaller than Nyquist rate - aliasing - exactreconstruction not possible . . . . . . . . . . . . . . . . . . 65

    2.6 Amplitude spectrum of the voiceless phoneme s from theword ist . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

    2.7 Logarithmic amplitude spectrum of the phoneme s . . . 71

    2.8 Amplitude spectrum of the voiced phoneme ae from theword Ah . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

    2.9 Logarithmic amplitude spectrum of the phoneme ae . . . 71

    2.10 Amplitude spectrum of a speech pause . . . . . . . . . . . 71

    2.11 Logarithmic amplitude spectrum of a speech pause . . . . 71

    2.12 Hanning window . . . . . . . . . . . . . . . . . . . . . . . 103

    2.13 Example of a linear convolution of two finite length signals:a) two signals;b) signal x[n-k] for different values of n:i) n < 0, no overlap with h[k], therefore convolution y[n] =0ii) n between 0 and Nh +Nx 2, convolution 6= 0iii) n > Nh +Nx 2, no overlap with h[k], convolution y[n]= 0c) resulting convolution y[n]. . . . . . . . . . . . . . . . . . 106

    2.14 Flow diagram for decomposition of one N -DFT to two N/2DFTs with N = 8 . . . . . . . . . . . . . . . . . . . . . . . 110

    2.15 Flow diagram of an 8pointFFT using Butterfly operations. 111

    2.16 Flow diagram of an 8pointFFT using Butterfly operations. 120

    2.17 Input and output arrays of an FFT. a) The input array con-tains N (N is power of 2) complex input values in one realarray of the length 2N . with alternating real and imagi-nary parts. b) The output array contains complex Fourierspectrum at N frequency values. Again alternating real andimaginary parts. The array begins with the zero-frequencyand then goes up to the highest frequency followed withvalues for the negative frequencies. . . . . . . . . . . . . . 122

    iv

  • 3.1 Example for the application of the Discrete Fourier Trans-form (DFT). . . . . . . . . . . . . . . . . . . . . . . . . . . 138

    3.2 a) signal v[n]; b) DFT-spectrum V [k]; c) Fourier spectrumV (ej). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

    3.3 a) signal v[n]; b) DFT-spectrum V [k]; c) Fourier spectrumV (ej). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

    3.4 a) DFT of length N = 64; b) DFT of length N = 128; c)Fourier spectrum V (ej). . . . . . . . . . . . . . . . . . . . 151

    3.5 Influence of the window function:above: speech signal (vowel a); central: 512 point FFTusing rectangle window; below: 512 point FFT using Ham-ming window . . . . . . . . . . . . . . . . . . . . . . . . . 158

    3.6 Fourier Transform of a voiced speech segment:a) signal progression, b) high resolution Fourier Transform,c) low resolution Fourier Transform with short Hammingwindow (50 sampled values), d) low resolution Fourier Trans-form using autocorrelation function (19 coefficients), e) lowresolution Fourier Transform using autocorrelation function(13 coefficients) . . . . . . . . . . . . . . . . . . . . . . . . 162

    3.7 Signal progression and autocorrelation function of voiced(left) and unvoiced (right) speech segment . . . . . . . . . 163

    3.8 Temporal progression of speech signal and four autocorrela-tion coefficients . . . . . . . . . . . . . . . . . . . . . . . . 164

    3.9 a) wide-band spectrogram: short time window, high timeresolution (vertical lines), no frequency resolution; for voicedsignals provides information on formant structure b) narrow-band spectrogram: long time window, no time resolution,high frequency resolution (horizontal lines); for voiced sig-nals provides information on fundamental frequency (pitch) 166

    3.10 Wide-band and narrow-band spectrogram and speech am-plitude for the sentence Every salt breeze comes from thesea. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

    3.11 Above: logarithmized power spectrum of a spoken vowel(schematic).Below: corresponding cepstrum (inverse Fouriertransformof the logarithmized power spectrum). . . . . . . . . . . . 177

    v

  • 3.12 Cepstral smoothing: speech signal (vowel a), windowedspeech signal (Hamming window), spectrum obtained fromthe whole cepstrum (blue) and smoothed spectrum obtainedfrom the first 13 cepstral coefficients (red). . . . . . . . . . 178

    3.13 Homomorph analysis of a speech segment: signal progres-sion, homomorph smoothed spectrum using 13 and 19 cep-stral coefficients . . . . . . . . . . . . . . . . . . . . . . . . 179

    4.1 TVimage (analog) . . . . . . . . . . . . . . . . . . . . . . 1934.2 Digitized TVimage . . . . . . . . . . . . . . . . . . . . . . 1934.3 Amplitude spectrum of Figure 4.2 . . . . . . . . . . . . . . 1934.4 Low-pass filtered . . . . . . . . . . . . . . . . . . . . . . . 1934.5 High-pass filtered . . . . . . . . . . . . . . . . . . . . . . . 1944.6 High-pass enhancement . . . . . . . . . . . . . . . . . . . . 194

    5.1 LPCanalysis of one speech segmenta) signal progression, b) prediction error (K=12), c) LPCspectrum with K=12 coefficients, d) spectrum of the predic-tion error (K=12), e) LPCspectrum with K=18 coefficients 219

    5.2 LPCSpectra for different prediction orders K . . . . . . . 220

    List of Tables

    2.1 Fourier transform pairs . . . . . . . . . . . . . . . . . . . . 872.2 Fourier transform Theorems . . . . . . . . . . . . . . . . . 88

  • Chapter 1

    System Theory and FourierTransform

    Overview:1.1 Introduction

    1.2 Linear time-invariant Systems

    1.3 Fourier Transform

    1.4 Properties of Fourier Transform

    1.5 Parseval Theorem

    1.6 Autocorrelation Function

    1.7 Existence of the Fourier Transform

    1.8 Function

    1.9 Fourier Series

    1.10 Duration and Band Width

    Digital Processing of Speech and Image Signals WS 2006/2007

    1

  • 1.1 Introduction

    What distinguishes the Fourier Transform (FT) fromother transformations?

    1. Mathematical property of linear time-invariant systems:

    FT decomposes the time signal into eigenfunctionseigenfunctions keep their form by passing thelinear time-invariant system

    A x = x

    Magnitude of FT: shift invariant2. Physical observation:

    Human ear produces sort of FT, essentially only magnitude of FT(strictly speaking: short-time FT)

    Example:Time functions with different evolution can sound equally.The human ear either senses sense phase differences of partialtones of the complete sound of stationary processes very weakly,or does not sense them at all.

    Fourier transform in speech processing:

    Calculation of the spectral components of speech Basic method for obtaining observations (features) forspeech recognition

    Digital Processing of Speech and Image Signals WS 2006/2007

    2

  • 0

    0

    = 0

    0

    0

    = pi/2

    0

    0

    random

    Figure 1.1: Oscillograms of three timefunctions composed as sum of 20 partialoscillations. a) n = 0, b) n =

    pi2, c) n

    statistical.

    0

    Figure 1.2: Amplitude spectrum of a timefunction composed as sum of 20 partialtones.

    Figure 1.3: from left to right: original photo, low-pass and high-pass filtered version

    Digital Processing of Speech and Image Signals WS 2006/2007

    3

  • amplitude spectrum

    0

    original signal

    0

    0

    inverse FT for phase (f) = 0

    0

    0

    Inverse FT for phase (f) = pi2

    0

    0

    Inverse FT for random phase (f)

    0

    0

    Figure 1.4: Phase manipulation for portion of a speech signal (vowel o) sampled at 8kHz,25ms analysis window (200 samples), 512 point FFT

    Digital Processing of Speech and Image Signals WS 2006/2007

    4

  • amplitude spectrum

    0

    original signal

    0

    0

    inverse FT for phase (f) = 0

    0

    0

    inverse FT for phase (f) = pi2

    0

    0

    inverse FT for random phase (f)

    0

    0

    Figure 1.5: Phase manipulation for portion of a speech signal (consonant n) sampled at8kHz, 25ms analysis window (200 samples), 512 point FFT

    Digital Processing of Speech and Image Signals WS 2006/2007

    5

  • amplitude spectrum

    0

    original signal

    0

    inverse FT for phase (f) = 0

    0

    0

    inverse FT for phase (f) = pi2

    0

    0

    inverse FT for random phase (f)

    0

    0

    Figure 1.6: Phase manipulation for a Heavisidefunction (stepfunction)

    Digital Processing of Speech and Image Signals WS 2006/2007

    6

  • Why Fourier?

    Roughly:

    Production, description and algorithmic operations on signals (func-tions or measurement curves over the time axis) can be described verywell in Fourier domain (frequency domain).

    Deeper reason:

    Production, description and algorithmic operations on signals are largelybased on linear time-invariant (LTI) operations.

    Fourier Transform: simple representation of LTI-operations (later:convolution theorem)

    Why continuous?

    Real world is continuous Computer (digital = time discrete = sampled)model of the real world

    Digital Processing of Speech and Image Signals WS 2006/2007

    7

  • t AT

    glottal

    pulsesvocal

    tract

    filter

    radiation

    from lips

    and nose

    t

    T

    speech [a:]a)

    b)

    0 4 0 4 0 4 0 4

    1/T

    |E(f)| [dB] |V(f)| [dB][a:] |A(f)| [dB] |S(f)| [dB]

    * * =

    [a:]

    MUSCLEFORCE

    LUNGVOLUME

    PHARYNXCAVITY

    NASALCAVITY

    MOUTHCAVITY

    TONGUEHUMP

    VELUM

    VOCALCORDS

    TRACHEA ANDBRONCHI

    LARYNXTUBE

    NOSEOUTPUT

    MOUTHOUTPUT

    Figure 1.7: Schematic representation of the physiological mechanism of speech production

    Digital Processing of Speech and Image Signals WS 2006/2007

    8

  • signal (speech, image)

    feature extraction(signal analysis)

    feature vector(pattern vector)

    (pattern)comparison

    decision

    reference data(vectors, features)

    Examples:

    Spoken language Written numbers (letters) Cell recognition (red blood cells)

    Digital Processing of Speech and Image Signals WS 2006/2007

    9

  • Examples of applications of Fourier Transform:

    Electrical switchgears Recognition and coding

    Speech and general acoustic signals

    Image signals

    Time series analysis: Astronomical measurement curves

    Stock-market course

    . . .

    Computer tomography Solving differential equations Description of image production in optical systems

    Digital Processing of Speech and Image Signals WS 2006/2007

    10

  • 1.2 Linear time-invariant Systems

    Example:

    speech production

    electrical systems

    S

    h(t)input signal

    x(t)output signal

    y(t)

    symbolic:

    {t y(t)} = S {t x(t)}simplified:

    y(t) = S {x(t)} Note: the complete time domain of the function is important, notindividual positions in time t.

    more exact:

    y = S {x}LTISystem: (LTI = Linear Time-Invariant)

    Linear:Additive:

    S {x1 + x2} = S {x1}+ S {x2}

    Homogeneous:

    S { x} = S {x} , IR

    Time-invariant:{t y(t t0)} = S {t x(t t0)} , t0 IR

    Digital Processing of Speech and Image Signals WS 2006/2007

    11

  • Mathematical theorem:

    Linearity and time invariance result in convolution representation Output signal y(t) of LTI system S with input signal x(t):

    y(t) =

    x(t ) h() d

    =

    x() h(t ) d

    = x(t) h(t)

    h: impulse response of the system S

    1/

    e (t)

    t t

    x (t)

    i

    system response h (t) to excitation e (t):h(t) = S {e(t)}

    signal x(t) is represented as sum of amplitude weighted and timeshifted elementary functions e(t):

    x(t) = lim0

    [i

    x(i) e (t i) ]

    Digital Processing of Speech and Image Signals WS 2006/2007

    12

  • Hence the following holds for the output signal y(t):

    y(t) = S {x(t)}

    = S

    {lim0

    i

    x(i) e(t i) }

    = lim0

    [S

    {i

    x(i) e (t i) } ]

    additivity:

    = lim0

    [i

    S { x(i) e(t i) }]

    homogeneity (for x(i) and ):

    = lim0

    [i

    x(i) S { e (t i) } ]

    time invariance:

    = lim0

    [i

    x(i) h (t i) ]

    limiting case 0 :

    di

    h(t) h(t)result:

    y(t) =

    x() h(t ) d = x(t) h(t)

    h(t): impulse response of the system

    Digital Processing of Speech and Image Signals WS 2006/2007

    13

  • Examples of LTI-operations:

    Oscillatory systems (electrical or mechanical) withexternal excitation:

    x(t) h() y(t)

    y(t) =

    h(t ) x() d

    y(t) + 2y(t) + 2y(t) = x(t)

    , : parameters depending on the oscillatory system

    More general electrical engineering systems:high-pass, low-pass, band-pass

    Sliding average value:

    x(t) S y(t) := x(t)

    x(t) =1

    T

    +T/2T/2

    x(t+ ) d

    Differentiator:x(t) S y(t) := x(t)

    Comb filter: hypothesized period Tx(t) S y(t) := x(t) x(t T )

    In general: linear differential equations with coefficients ck and dlk

    cky(k)(t) =

    l

    dlx(l)(t)

    [ + further constraints ]

    Digital Processing of Speech and Image Signals WS 2006/2007

    14

  • Example of a non-linear system:

    system: y(t) = x2(t)

    x(t) = A cos(t)

    = y(t) = A2 cos2(t) = A2

    2(1 + cos(2t))

    frequency doubling

    Digital Processing of Speech and Image Signals WS 2006/2007

    15

  • 1.3 Fourier Transform

    Sinusoidal oscillation:

    x(t) = A sin ( t + )

    amplitude A

    phase / null phase

    angular frequency = 2 f

    j2 = 1, j C

    cos

    sin

    Im

    Re

    1

    1

    complex representation: ej = cos + j sin , IR

    cos =ej + ej

    2and sin =

    ej ej2j

    dimension:

    DIM() DIM(t) = 1

    DIM() =1

    DIM(t)=

    1

    [sec]= [Hz]

    Digital Processing of Speech and Image Signals WS 2006/2007

    16

  • LTI-System

    y(t) =

    x(t )h()d = x(t) h(t)

    Determine the following specific input signal:x(t) = A ej(t+)

    For this input signal the output signal becomes:

    y(t) =

    A ej((t)+)h()d

    = A ej(t+)

    h()ejd

    H() = F {h()}

    = x(t) H()

    Definition of the Fourier transform:

    H() =

    h()ejd = F {h()} = F { h()}

    ( decomposition into ej)

    H() is called transfer function of the system

    Remark about x(t) = A ej(t+):

    The shape of the input signal x(t), i.e. its frequency (eigenfunc-tion) remains invariant

    Amplitude (intensity) and phase (time shift) are depending onH() (eigenvalue)

    ( analogy to the problem of eigenvalues in linear algebra)

    Digital Processing of Speech and Image Signals WS 2006/2007

    17

  • Remarks

    FT is complex:H() = Re {H()} + j Im {H()} = |H()| ej()

    Amplitude (spectrum):

    |H()| =Re {H()}2 + Im {H()}2

    Phase (spectrum):

    () =

    arctan

    (Im {H()}Re {H()}

    )Re {H()} > 0

    arctan

    (Im {H()}Re {H()}

    )+ Re {H()} < 0

    2Re {H()} = 0,Im {H()} > 0

    2

    Re {H()} = 0,Im {H()} < 0

    Digital Processing of Speech and Image Signals WS 2006/2007

    18

  • Examples of Fourier transforms:

    1. Rectangle function

    h(t) = rect(t

    T) =

    {1, |t| T/20, |t| > T/2

    H() =

    h(t)ejtdt =

    T2

    T2

    ejtdt =1

    j[ej

    T2 ej T2

    ]

    =2

    sin(

    T

    2) =

    T sin(T

    2)

    T

    2

    (here: Im {H()} = 0)

    t

    h(t)

    H()

    Digital Processing of Speech and Image Signals WS 2006/2007

    19

  • 2. Double-sided exponential

    h(t) = e|t| with > 0

    H() =

    h(t)ejtdt

    =

    0

    e(+j)tdt+

    0

    e(j)tdt

    =

    [e(+j)t

    (+ j) +e(j)t

    ( j)]0

    = 0 + 0 1(+ j) 1

    ( j)=

    j + + j2 + 2

    =2

    2 + 2

    Imaginary part equals 0 Infinite spectrum No zeros

    h(t)

    t

    H( )

    If h(t) is symmetric (i.e. h(t) = h(t)), imaginary parts drop awayand the real part is sufficient

    Digital Processing of Speech and Image Signals WS 2006/2007

    20

  • 3. Damped oscillations

    h(t) = e|t| cos(t) with > 0

    H() =

    h(t)ejtdt

    =

    0

    e(+j)t cos(t)dt+

    0

    e(j)t cos(t)dt

    =

    0

    e(+j)tejt + ejt

    2dt+

    0

    e(j)tejt + ejt

    2dt

    = . . . (elementary calculation)

    =

    2 + ( )2 +

    2 + ( + )2

    Limiting case:

    H()|= = 1

    +

    2 + (2)2

    = tends towards or if tends towards 0

    H( )

    | |

    h(t)

    t

    Digital Processing of Speech and Image Signals WS 2006/2007

    21

  • 4. Modulated rectangle function (truncated cosine)

    h(t) =

    {cos( t), |t| T/2

    0, |t| > T/2

    H() =

    h(t)ejtdt

    =

    T2

    T2

    cos( t)ejtdt

    = . . . (elementary calculation)

    =T

    2

    sin(( ) T

    2

    )( ) T

    2

    +

    sin

    (( + )

    T

    2

    )( + )

    T

    2

    | |

    h(t)

    t

    H( )

    h(t)

    t

    | |

    H( )

    Digital Processing of Speech and Image Signals WS 2006/2007

    22

  • Fourier Transform pairs (u = /2)

    Rectangle function

    1

    -1/2 1/2

    -1/2 1/2

    Triangle function

    1

    Exponential function

    e-|x|

    Gaussian function

    e-x2

    Unit impulse

    (x)

    1

    Sinc function

    Squared sinc function

    sin(piu)piu

    1

    2+(2piu)22

    piu2

    e-pi

    1

    Digital Processing of Speech and Image Signals WS 2006/2007

    23

  • Inverse Fouriertransform

    H() =

    h(t)ejtdt

    assumption: h(t) =1

    2

    H()ejtd

    with: H() =

    h()ejd

    inserting H() in h(t):

    h(t) =1

    2lim

    ,T

    TT

    h() ej(t) d

    d=

    1

    2lim

    limT

    TT

    ej(t) d h() d

    = lim

    limT

    1

    TT

    sin ((t ))t h() d

    = lim

    1

    sin ((t ))t h() d

    = h(t)

    due to:

    lim

    1

    sin(t)

    th(t) dt = h(0)

    formal expression:

    h(t) =

    12

    ej(t) d

    = (t )

    h() d

    ( distribution theory, see there for stronger proof)Digital Processing of Speech and Image Signals WS 2006/2007

    24

  • 1.4 Properties of the Fourier Transform

    Symmetry

    H() =

    h(t) ejt dt = F {h(t)}

    h(t) =1

    2

    H() ejt d = F1 {H()}

    F 2{h(t)} = F{H()} = 2h(t)

    F1 F{h(t)} = F1{H()} = h(t)

    Time domain and frequency domain are correlated symmetrically. Properties of FT are valid in both domains, especially the convolutiontheorem (see later).

    Digital Processing of Speech and Image Signals WS 2006/2007

    25

  • Theorems for the Fourier transform

    H() =

    ejt h(t) dt

    consider the equation:

    H() = F {h(t)}more exact:

    { H()} = F {t h(t)}

    1. Linearity: integral operator is linear

    2. Inverse scaling, similarity principle:

    h(t) ejt dt =1

    ||

    h() ej d

    F{h(t)} = 1|| H(

    ), IR\{0}

    Note:

    Absolute value, because integral boundaries are swapped for < 0.

    3. Shift: h(t t0)

    h(t t0) ejt dt = ejt0

    h(t t0) ej(tt0) dt

    = ejt0

    h() ej d

    Digital Processing of Speech and Image Signals WS 2006/2007

    26

  • = F{h(t t0)} = ejt0H() t0 IR

    with H() = F{h(t)}

    important:

    | F{h(t t0)} | = | F{h(t)} | , because

    |ejt0| = |eju| = | cosu j sinu|=

    cos2 u + sin2u

    = 1

    4. Symmetry and antisymmetry:

    h(t) = h(t) results in Im{H()} = 0

    h(t) = h(t) results in Re{H()} = 0

    5. Complex conjugation: suppose that h(t) is a complex function

    h(t) ejt dt =

    h(t) ejt dt

    =

    h(t) ejt dt = H()

    F{h(t)} = H() = F{h(t)}

    Special case: h(t) is real, so h(t) = h(t)

    = H() = H() = | H() | = | H() | = | H() |

    Digital Processing of Speech and Image Signals WS 2006/2007

    27

  • 6. Differentiation:

    dh

    dt=

    t

    12

    H() ejt d

    =

    1

    2

    H() j ejt d

    F{dh(t)dt

    } = j F{h(t)}

    Interpretation: differentiation = enhancement of high frequencies(due to the multiplication with )

    7. Integration:

    F{t

    h()d} = 1

    jF{h(t)}

    Proof: similar to differentiation or inversion

    8. Modulation principle:

    F{h(t) cos(0t)} =

    h(t) cos(0t) e

    jt dt

    =1

    2

    h(t) ej0t ejt dt +

    h(t) ej0t ejt dt

    =

    1

    2

    h(t) ej(0)t dt +

    h(t) ej(+0)t dt

    =

    1

    2[ H( 0) + H( + 0) ]

    and similarly

    F{ h(t) sin(0t) } = 12j

    [ H( 0) H( + 0) ]

    Digital Processing of Speech and Image Signals WS 2006/2007

    28

  • h(t), H() x(t) X()

    y(t) Y()

    Convolution theorem

    Convolution in time domain corresponds to multiplication in frequencydomain

    Time domain: y(t) = x(t) h(t) =

    x(t ) h() d

    Frequency domain:

    Y () =

    ejt

    h() x(t ) d dt

    =

    h()

    x(t ) ejt dt d

    =

    h() X() ej d (shifting)

    = X()

    h() ej d

    = X() H()

    Digital Processing of Speech and Image Signals WS 2006/2007

    29

  • Likewise, multiplication in time domain corresponds to convolution infrequency domain (note the factor 12):

    Time domain: y(t) = a(t) b(t)Frequency domain:

    Y () =

    a(t) b(t) ejt dt

    =

    a(t)1

    2

    B()ejt ejt d dt

    =1

    2

    B()

    a(t)ej()t dt d

    =1

    2

    A( ) B()d

    =1

    2A() B()

    Motivation for the Fourier transform:FT gives the simplest representation of the system operation, be-cause every LTI-System can be interpreted as convolution of the inputsignal x(t) and the impulse response of the system h(t). Convolutioncan be then efficiently calculated using FT and convolution theorem.

    Mathematical: eigenfunctions

    Digital Processing of Speech and Image Signals WS 2006/2007

    30

  • Example: Oscillator with excitation

    x(t) Oscillator y(t)

    y(t) + 2 y(t) + 2 y(t) = x(t)

    x(t) =1

    2

    +

    X()ejtd

    y(t) =1

    2

    +

    Y ()ejtd

    y(t) =1

    2

    +

    Y ()j ejtd

    y(t) =1

    2

    +

    Y ()[2] ejtd

    +

    [2 + 2j + 2]Y ()ejtd =+

    X()ejtd

    +

    {[2 + 2j + 2] Y ()X()}

    =0

    ejtd = 0 t

    In this way we obtain the transfer function of an oscillator:

    H() =Y ()

    X()=

    1

    2 + 2j + 2

    Digital Processing of Speech and Image Signals WS 2006/2007

    31

  • h(t) =1

    2

    +

    H()ejtd

    (can be given explicitly)

    y(t) =

    +

    x(t) h(t )d

    Note:y(t) does not contain the component which corresponds to the homoge-neous differential equation of the oscillator.

    FourierTransform

    Convolution with h(t)

    Multiplication with H() = F{h(t)}

    Inverse FourierTransform

    x(t)

    X()

    y(t)

    Y()

    Digital Processing of Speech and Image Signals WS 2006/2007

    32

  • 1.5 Parseval Theorem

    Convolution theorem:

    F1 {H() X()} =

    h(t) x( t) dt

    ()1

    2

    H() X() ej d = (h x) ()

    We make two special assumptions:

    i) x(t) := h(t), then: X() = H()ii) = 0

    Inserting in () results in:

    1

    2

    H()H() d =

    h(t)h(t) dt

    1

    2

    |H()|2 d =

    |h(t)|2 dt = E

    Energy E in time domain = Energy E in frequency domain(up to the factor

    1

    2; aid: use normalization factor

    12

    for both

    directions of Fourier Transform)

    Physical aspect: energy conservation Mathematical aspect: unitary (orthogonal) representation in vectorspace

    |H()|2 is called power spectral density.

    Digital Processing of Speech and Image Signals WS 2006/2007

    33

  • 1.6 Autocorrelation Function

    Autocorrelation function

    Autocorrelation function of time continuoussignal or function h(t) is defined as:

    R(t) =

    h() h(t+ )d

    The following equation is valid:R(t) = h(t) h(t) which results in R(t) = R(t)

    Fourier transform gives: (Wiener-Khinchin Theorem)F{R(t)} = H() H() = |H()|2

    Thus: Fourier transform connects autocorrelationfunction R(t) and power spectral density |H()|2

    |H()|2 =

    R(t) ejt dt =

    R(t) cos(t) dt

    Remark:autocorrelation is a special case of the cross correlation between sig-nals x() and h(t)

    Ch,x =

    h() x(t+ )d

    Digital Processing of Speech and Image Signals WS 2006/2007

    34

  • 1.7 Existence of the Fourier Transform

    Conditions for h(t) for the existence of the Fourier transform

    H() =

    ejt h(t) dt , h(t) =1

    2

    ejt H() d

    When are those equations valid?

    Sufficient conditions:

    1. h(t) is absolutely integrable:

    |h(t)|dt

  • Impulse response:

    y(t) =

    h(t )() d

    = h(t) (t)

    = h(t)

    Consequence:

    h(t) 1

    (t) dt = 1

    A function like (t) does not exist. But it is possible to define thefunctional for each function t h(t):

    [t h(t)] (h) := h(0)

    1.8 -Function

    Starting point: definition of the -function as a boundary case ofa function (t):

    lim0

    +

    f(t) (t) dt = f(0) (1.1)

    Possible realizations of (t)

    a) (t) =

    1

    2t [,+]

    0 otherwise

    b) (t) =1

    2 + t2

    Digital Processing of Speech and Image Signals WS 2006/2007

    36

  • c) (t) =1

    sin (t/)

    t

    d) (t) =122

    et2

    22

    During inversion of the Fourier transform we have formally ob-tained:

    (t) =1

    2

    +

    ejt d = lim

    1

    sin (t)

    t(1.2)

    Fourier transform F{(t)}:

    F{(t)} =+

    ejt(t) dt

    due to (1.1) the following holds:

    F{(t)} = ejt|t=0 = 1

    Another derivation using (1.2):

    (t) =1

    2

    +

    ejt F{(t)} d general

    =1

    2

    +

    ejt d according to (1.2)

    Comparison results in:

    F{(t)} = 1

    Digital Processing of Speech and Image Signals WS 2006/2007

    37

  • From this we obtain the following equations:

    From symmetry property:

    F{1} = 2 ()

    From shifting theorem:

    F{ej0t 1} = 2 ( 0)

    cos (0 t) =1

    2

    [ej0t + ej0t

    ]=

    1

    2

    +

    ( 0) ejt d ++

    ( + 0) ejt d

    =

    1

    2

    +

    [ ( 0) + ( + 0) ] ejt d

    F{ cos (0 t) } = [ ( 0) + ( + 0) ]

    Note: another derivation:

    consider damped oscillations

    1

    2e|t| cos (0t)

    in the limit 0 .

    Digital Processing of Speech and Image Signals WS 2006/2007

    38

  • Comb function

    define comb function (pulse train, sequence of -impulses):

    x(t) =+

    n=(t nT )

    Fourier transform of comb function:

    X() =

    +

    x(t) ejt dt

    =

    +

    +n=

    (t nT ) ejt dt

    =+

    n=

    +

    (t nT ) ejt dt

    =+

    n=ejnT

    = . . . (see Papoulis 1962, p. 44)

    =2

    T

    +n=

    ( n2T)

    in words:

    -impulse sequence with period T in time domain

    produces

    -impulse sequence with period 1T in frequency domain(i.e. 2T in -frequency domain)

    comb function is transformed to comb function

    Digital Processing of Speech and Image Signals WS 2006/2007

    39

  • Comb function

    cos(0t)

    sin(0t)

    -2pi-4pi-6pi 2pi 4pi 6piT T T T T T

    T 3T-T-3T 6T-6T

    (t-nT)n=-

    12j((-0)+(+0))

    ((-0)+(+0))12

    (-n2pi/T)n=-

    2piT

    0 0

    0

    0

    Digital Processing of Speech and Image Signals WS 2006/2007

    40

  • 1.9 Motivation for Fourier Series

    x : IR IRt x(t)

    Consider a periodical function x with period T :

    x(t) = x(t+ T ) for each t IRthen also x(t) = x(t+ kT ) for k Z

    Examples:

    Constant function:x0(t) = A0

    Harmonic oscillator:

    x1(t) = A1 cos (2

    Tt + 1) , A1 > 0

    All higher harmonic:

    xn(t) = An cos (n2

    Tt + n) , An > 0

    therefore

    x(t) =n=0

    An cos (n 0 t + n) with 0 =2

    T, An 0

    is periodical with period T = 20

    Another notation:

    x(t) =

    n=Bn e

    j n 0 t where Bn is a complex number

    Digital Processing of Speech and Image Signals WS 2006/2007

    41

  • Line spectrum representation

    Real measured signal has always a widespread spectrum.

    Reasons:

    Strictly periodical signal (almost) never exists Period can fluctuate

    Wave form within one period can fluctuate

    Only a finite section of the signal is analyzed(window function)

    Only a strictly periodical signal has a sharp line spectrum

    Remarks:

    Fourier series are actually not strictly related to periodical functions:a finite interval of IR is sufficient (the signal is then interpreted asinfinitely prolonged).

    By transition from the finite interval to the complete real axis theFourier series becomes Fourier integral.

    Digital Processing of Speech and Image Signals WS 2006/2007

    42

  • Calculation of Fourier coefficients:

    Consider a periodical function x(t) with period T = 20 approach:

    x(t) =+

    n=an e

    j n0 t a C

    multiplication with ej m0 t where m IN and integration over oneperiod result in:

    +T/2T/2

    x(t) ej m 0 t dt =+

    n=an

    +T/2T/2

    ej (nm) 0 t dt

    Due to orthogonality holds:+T/2T/2

    ej (nm) 0 t dt ={T if n = m0 if n 6= m

    Then:T/2

    T/2

    x(t) ej m 0 t dt = am T

    Result:

    an =1

    T

    +T/2T/2

    x(t) ej n 0 t dt

    =1

    T

    +T/2T/2

    x(t) cos (n 0 t) dt j 1T

    +T/2T/2

    x(t) sin (n 0 t) dt

    Digital Processing of Speech and Image Signals WS 2006/2007

    43

  • Spectrum of a periodical function

    If x(t) is periodical with the period T = 20 , then

    x(t) =+

    n=an e

    j n0 t, an C

    The Fourier transform X() is:

    X() = F{x(t)}

    =+

    n=an F{ej n 0 t}

    = 2( n0)

    = 2 +

    n=an ( n0)

    Note:This derivation is formal, because the Fourier integral does notexist in the usual sense;strict derivation within the scope of distribution theory.

    In words:a periodic function with the period T has a Fourier transform in theform of a line spectrum with the distance 0 =

    2T between the com-

    ponents.

    Digital Processing of Speech and Image Signals WS 2006/2007

    44

  • 1.10 Time Duration and Band Width

    1. Similarity principle:

    F{h(t)} = 1|| H(

    )

    t

    h( t)0

  • 2. Special case: h(t) with

    Im {H()} = 0 ( h(t) symmetrical )and

    Re {H()} 0

    h(t) has maximum for t = 0:

    h(t) =1

    2

    H() cos(t) d 12

    H() d = h(0)

    define:

    T =1

    h(0)

    h(t) dt

    B =1

    H(0)

    H() d

    from

    T =H(0)

    h(0)and B = 2

    h(0)

    H(0)

    follows

    T B = 2

    Digital Processing of Speech and Image Signals WS 2006/2007

    46

  • 3. In general: normalized impulse h(t) IR with

    h2(t) dt = 1, h(t) IR

    T 2 :=

    h2(t) t2 dt

    B2 :=

    |H() |2 2 d = 2

    [h(t)]2 dt

    Results in uncertainty relation:

    T B

    2

    Proof: Cauchy-Schwarz inequality

    | xT y | ||x|| ||y||

    [ t h(t) ] h(t) dt

    2

    =1

    4

    [ t h(t)]2 dt

    = T 2

    [ h(t) ]2 dt

    B2

    2From: partial integration

    u(t) v(t) dt = u(t) v(t)

    u(t) v(t) dt

    [ h(t) h(t) ]

    u(t)

    tv(t)

    dt =1

    2h(t)2 t

    1

    2h2(t) 1 dt

    [ h(t) h(t) ] t dt = 0 12

    Digital Processing of Speech and Image Signals WS 2006/2007

    47

  • Equality sign is valid for linear dependency:

    h(t) = t h(t)dh

    h= t dt

    log(h) = 12 t2 + const., > 0

    = Optimum T B =

    2for Gauss impulse

    h(t) =2

    e12 t2

    Variance: 2 =1

    Quantum Physics: similar statement about position and impulseof a particle

    Digital Processing of Speech and Image Signals WS 2006/2007

    48

  • 4. Finite positive signal

    g(t)

    { 0 0 t T= 0 t < 0 or T < t

    0 T

    g(t)

    t

    The following is valid for the amplitude spectrum |G()|:

    |G()| =

    +

    g(t) ejtdt

    +

    |g(t)| |ejt|dt

    =

    +

    |g(t)| dt

    because g(t) 0= G(0)

    Define the band width B as:

    |G(B)|2 = G2(0)

    2and |G(B)|2 |G()|2 for || < B

    Then:

    T B 2

    Digital Processing of Speech and Image Signals WS 2006/2007

    49

  • Proof:

    The following inequalities are valid:

    a2 + b2 (a b)2

    2 a, b IR

    | sin|+ | cos| 1 IR

    For the Fourier-Transform of g(t) holds:

    Re{G()} =T0

    g(t) cost dt

    Im{G()} = T0

    g(t) sint dt

    For 0 t 2holds: cost 0, sint 0

    and therefore: cost+ sint = | cost|+ | sint| 1

    Re{G()} Im{G()} =T0

    g(t) [cost+ sint] dt

    T0

    g(t) 1 dt

    = G(0)

    |G()|2 = Re2{G()}+ Im2{G()}

    [Re{G()} Im{G()}]2

    2

    12G2(0) |G(B)|2

    Digital Processing of Speech and Image Signals WS 2006/2007

    50

  • Chapter 2

    Discrete Time Systems

    Overview:2.1 Motivation and Goal

    2.2 Digital Simulation using Discrete Time Systems

    2.3 Examples of Discrete Time Systems

    2.4 Sampling Theorem and Reconstruction

    2.5 Logarithmic Scale and dB

    2.6 Quantization

    2.7 Fourier Transform and zTransform

    2.8 System Representation and Examples

    2.9 Discrete Time Signal Fourier Transform Theorems

    2.10 Discrete Fourier Transform (DFT)

    2.11 DFT as Matrix Operation

    2.12 From continuous FT to Matrix Representation of DFT

    2.13 Frequency Resolution and Zero Padding

    2.14 Finite Convolution

    2.15 Fast Fourier Transform (FFT)

    2.16 FFT Implementation

    Digital Processing of Speech and Image Signals WS 2006/2007

    51

  • 2.1 Motivation and Goal

    If we want to process a continuous time signal x(t) with a computer, wehave to sample it at discrete equidistant time points

    tn = n TSwhere TS is called sampling period.

    Terminology:time discrete is often called digital, where this adjective often(but not always) denotes the amplitude quantization,i.e. the quantization of the value x(n TS).

    Advantages of digital processing in comparison to analog components:

    independent of analog components and technical difficulties with re-spect to their realization;

    in principle arbitrary high accuracy; also non-linear methods are possible,in principle even every mathematical method.

    Digital Processing of Speech and Image Signals WS 2006/2007

    52

  • 2.2 Digital Simulation using Discrete Time Systems

    Task definition:

    Given:Analog system with input signal x(t) and output signal y(t);Sampling with sampling period TS

    Wanted:Discrete System with input signal x[n] and output signal y[n], suchthat

    x[n] = x(nTS)

    results in

    y[n] = y(nTS)

    For which signals is such a digital simulation possible? The sampling theorem gives (most of) the answer.

    Digital Processing of Speech and Image Signals WS 2006/2007

    53

  • LTI System (analog to continuous time case):

    Linearity:Homogeneity:

    S { x[n]} = S {x[n]}

    Additivity:

    S {x1[n] + x2[n]} = S {x1[n]} + S {x2[n]}

    Shift invariance:S {x[n n0]} = y[n n0], n0 whole number

    Digital Processing of Speech and Image Signals WS 2006/2007

    54

  • Representation of an LTI System as discrete convolution:

    Unit impulse:

    [n] =

    {1, n = 00, n 6= 0

    The signal x[n] is represented with amplitude weighted and time shiftedunit impulses [n]. The system reacts on [n] with h[n]:

    h[n] = S {[n]}Input signal:

    x[n] =

    k=x[k] [n k]

    Output signal:

    y[n] = S

    { k=

    x[k] [n k]}

    Additivity

    =

    k=S { x[k] [n k] }

    Homogeneity

    =

    k=x[k] S { [n k] }

    Time invariance

    =

    k=x[k] h[n k]

    Input signal x[n] and output signal y[n] of a discrete time LTI system arelinked through discrete convolution.h[n] is called impulse response like in continuous time case.

    Digital Processing of Speech and Image Signals WS 2006/2007

    55

  • 2.3 Examples of Discrete Time Systems

    Difference calculation:y[n] = x[n] x[n n0]

    1-2-1-averaging:y[n] = 0.5 x[n 1] + x[n] + 0.5 x[n+ 1]

    sliding window averaging (smoothing)

    y[n] =1

    2M + 1

    Mk=M

    x[n k]

    weighted averaging: instead of constant weight

    h[n] =1

    2M + 1

    arbitrary weights can be used:

    y[n] =M

    k=Mh[k] x[n k]

    Note: the only difference from general case isfinite length of the convolution kernel h[n].

    First order difference equation:(recursive averaging, averaging with memory)

    y[n] y[n 1] = x[n]

    (Digital) resonator (second order difference equation)y[n] y[n 1] y[n 2] = x[n]

    Image processing:Gradient calculation and image enhancement(Roberts Operator, Laplace Operator)

    Digital Processing of Speech and Image Signals WS 2006/2007

    56

  • Roberts Cross Operator

    gray values x[i, j]

    i i+1

    j

    j+1

    2

    |x[i, j]|2 = (x[i, j] x[i+ 1, j + 1])2 + (x[i, j + 1] x[i+ 1, j])2

    Note: non-linear operation

    simplified:

    |x[i, j]| = |x[i, j] x[i+ 1, j + 1]|+ |x[i, j + 1] x[i+ 1, j]|

    Digital Processing of Speech and Image Signals WS 2006/2007

    57

  • Figure 2.1: Digital photo

    Figure 2.2: Gradient image

    Digital Processing of Speech and Image Signals WS 2006/2007

    58

  • Laplace Operator discrete approximation of the second derivation

    2x[i, j] = 2ix[i, j] +2jx[i, j]= x[i+ 1, j] 2x[i, j] + x[i 1, j] +

    x[i, j + 1] 2x[i, j] + x[i, j 1]= x[i+ 1, j] + x[i 1, j] + x[i, j + 1] + x[i, j 1] 4x[i, j]

    1 -2 1

    1

    1

    1

    1

    1

    1-4

    -2 i-1 i i+1

    j+1jj-1

    Image enhancement:

    y[i, j] = x[i, j]2x[i, j]= h[i, j] x[i, j]

    Digital Processing of Speech and Image Signals WS 2006/2007

    59

  • Figure 2.3: Several real cases of Laplace Operator subtraction from original image. a)Original image b) Original image minus Laplace Operator (negative values are set to 0and values above the grey scale are set to the highest grade of grey)

    Digital Processing of Speech and Image Signals WS 2006/2007

    60

  • 2.4 Sampling Theorem (Nyquist Theorem) and Re-

    construction

    The following will be analyzed and derived respectively:

    How should we choose the sampling period TS, if we want to represent acontinuous signal x(t) with its sample values x(nTS) so that the signal x(t)can be exactly reconstructed from its sample values?

    Fourier transform of the continuous time signal x(t):

    X() = F { x(t) } =

    x(t) ejt dt

    x(t) = F1 { X() } = 12

    X() ejt d (2.1)

    Signal x(t) has limited bandwidth with upper limit B, which means:X() = 0 for all || BNote: X(B) = 0

    X() in domain B < < B can be represented as Fourier Series:

    X() =

    n=an exp(jn

    B) (2.2)

    The coefficients an are given by:

    an =1

    2B

    BB

    X() exp(jn

    B) d (2.3)

    Comparison of the equations (2.1) and (2.3) shows that the coefficientsan are given by the values of the inverse Fourier transform of x(t) atpoints

    tn =n

    B(2.4)

    The band limitation of X() has to be considered for the integrationlimits in (2.1). Result:

    an = x(n

    B)

    B(2.5)

    Digital Processing of Speech and Image Signals WS 2006/2007

    61

  • Inserting Eq. (2.5) into Eq. (2.2) and then in Eq. (2.1) results in:

    x(t) =1

    2

    BB

    B

    n=

    x(n

    B) exp(jn

    B) exp(jt) d

    After swapping summation and integration and subsequent integra-tion:

    x(t) =

    n=x(n

    B)sin(B (t n

    B))

    B (t nB

    )

    Reconstruction of the signal x(t) from sample values is possible ifequidistant sample values x(

    n

    B) = x(n Ts) have the distance TS

    TS =

    B(2.6)

    The sampling period TS corresponds to the sampling frequency S:

    S =2

    TS

    Equation (2.6) shows that if the sampling frequency isS := 2 B

    the original signal x(t) can be reconstructed exactly.

    In the Fourier series representation of X() in equation (2.2), theperiod 2 B has been supposed.B is the highest frequency component of the signal x(t).

    Digital Processing of Speech and Image Signals WS 2006/2007

    62

  • Since X() is equal to zero for || B, the period 2 B can besubstituted with every period 2 B where B B. The previousderivation is also valid for this B.

    WhenB =

    TS

    then:

    x(t) =

    n=x(nTS)

    sin( (t nTS)/TS) (t nTS)/TS

    (reconstruction formula)

    Note: limt0sin(t)t = 1 (lHopitals rule)

    The condition B B results in:

    TS B

    (2.7)

    for the sampling period TS and in:

    S 2 B (2.8)for the sampling frequency S.

    The equations (2.7) and (2.8) are denoted as sampling theorem. Thesampling frequency has to be at least twice as high as the upper limitfrequency of the signal B where X() = 0 for || B. If and onlyif this condition is satisfied, an exact reconstruction (without approx-imation) of a continuous signal x(t) from its sample values x(nTS) ispossible.

    Note: The sampling frequency S = 2 B is also calledNyquist frequency.

    Digital Processing of Speech and Image Signals WS 2006/2007

    63

  • tx(t)

    t

    xs(t)

    xr(t)

    T

    T

    a)

    b)

    c)

    Figure 2.4: Ideal reconstruction of a band-limited signal (from Oppenheim, Schafer)a) original signal b) sampled signal c) reconstructed signal

    Digital Processing of Speech and Image Signals WS 2006/2007

    64

  • X()

    a)

    b)

    -S S

    . . . . . .

    XS1() S > 2,

    XS2()

    c)

    , S = 2

    -S S

    . . . . . .

    (Nyquist rate)

    XS3()

    , S < 2 (aliasing)

    . . . . . .

    SS

    d)

    Figure 2.5: Sampling of band-limited signal with different sampling rates:b) sampling rate higher than Nyquist rate - exact reconstruction possiblec) sampling rate equal to Nyquist rate - exact reconstruction possibled) sampling rate smaller than Nyquist rate - aliasing - exact reconstruction not possible

    Digital Processing of Speech and Image Signals WS 2006/2007

    65

  • Another proof using delta- and comb-function:

    Sampling of the continuous signal x(t) with S =2TS

    Band limitation: X() = 0 for || B(always possible: analog to low-pass with T () = 0 for || B)

    Sampling procedureMultiplication of a function with a comb-function in time domain

    xs(t) = Ts x(t) +

    n=(t nTs)

    results in a convolution with a comb-function in frequency domain:

    Xs() = Ts 12

    X() 2Ts

    +n=

    ( 2n

    Ts

    )

    =

    +

    X()+

    n=

    ( 2n

    Ts

    )d

    =+

    n=X

    ( n2

    Ts

    )

    = sampled signal has periodical Fourier spectrum(Analogy to Fourier series: periodical signal has line spectrum, i.e.discrete spectrum)

    No overlap if:

    B S B2B S

    Digital Processing of Speech and Image Signals WS 2006/2007

    66

  • In so-called digital simulation, the signal x(t) is represented by itssampled values x(n TS) measured at equidistant time points withdistance TS. With a proper sampling period TS an exact reconstruc-tion of the signal x(t) from the sampled values x(n TS) is possible.

    If it is possible to exactly reconstruct the signal x(t) from the sampledvalues x(nTS), then it is possible to perform a discrete time processingof the sampled values x(n TS) on a computer, which is equivalent tothe continuous time processing of the signal x(t) (digital simulation).

    Continuous time processing:

    y(t) =

    x() h(t ) d

    Discrete time processing: Sampling period TS

    x[n] := x(nTS)

    y(nTS) =

    k=x(kTS) h(nTS kTS) TS, h[n] = h(nTS)

    y[n] =

    k=x[k] h[n k]

    As a result of the convolution theorem (convolution in time domaincorresponds to multiplication in frequency domain), the band limitedinput signal gives an also band limited output signal which is exactlydetermined by its sampled values.

    Digital Processing of Speech and Image Signals WS 2006/2007

    67

  • Important:

    In the domain || < S/2 the Fourier transform of a continuous timesignal x(t) is identical with the Fouriertransform of the correspondingsampled discrete time signal x(nTS):

    X() =

    x(t) exp(jt) dt

    for || S/2 is identical to

    TS XS() = TS

    n=x(nTS) exp(jTSn)

    = TS

    n=x(nTS) exp(j2

    Sn)

    Inverse Fourier transform of discrete time signal:

    x(nTS) =1

    S

    S/2S/2

    XS() exp(jTSn) d

    One period:

    S2

    S2

    2S

    The Fourier transform of a discrete time signal is periodic in withthe period 2/TS = S.

    The Fourier transform of a discrete time signal iscontinuous in .

    Digital Processing of Speech and Image Signals WS 2006/2007

    68

  • Frequency normalization

    Define the normalized frequency N :

    N : = 2

    S

    Definition: ( now denotes a normalized frequency)

    Fourier transform of discrete time signal x[n]:

    X(ej) =+

    n=x[n] exp(jn)

    Note the notation X(ej).

    Inverse Fourier transform of discrete time signal x[n]:

    x[n] =1

    2

    X(ej) exp(jn) d

    Digital Processing of Speech and Image Signals WS 2006/2007

    69

  • 2.5 Logarithmic Scale and dB

    Why?

    large dynamic range for the amplitude values of a signal

    x(t) = A cos t

    A := amplitude(pressure, velocity, inclination, current, voltage, ... )linear variable

    A0 := reference amplitudepredefined value for calibration

    dB := decibel

    A[dB] 20 lg AA0

    , lg log10

    = 10 lg A2

    A20, A2 = quadratic variable = energy, intensity

    because of 210 = 1024 = 103:

    1 bit more =

    {factor 2 for amplitude = 6 dB= factor 4 for intensity

    3 dB = factor 2 for intensity

    Digital Processing of Speech and Image Signals WS 2006/2007

    70

  • 01

    2

    3

    4

    5

    6

    7

    0 1000 2000 3000 4000 5000 6000 7000 8000

    A

    f / Hz

    Phonem: s

    Figure 2.6: Amplitude spectrum of thevoiceless phoneme s from the wordist

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    2

    0 1000 2000 3000 4000 5000 6000 7000 8000

    log

    A

    f / Hz

    Phonem: s

    Figure 2.7: Logarithmic amplitude spec-trum of the phoneme s

    0

    2

    4

    6

    8

    10

    12

    0 1000 2000 3000 4000 5000 6000 7000 8000

    A

    f / Hz

    Phonem: ae

    Figure 2.8: Amplitude spectrum of thevoiced phoneme ae from the wordAh

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    2

    2.5

    0 1000 2000 3000 4000 5000 6000 7000 8000

    log

    A

    f / Hz

    Phonem: ae

    Figure 2.9: Logarithmic amplitude spec-trum of the phoneme ae

    0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    0 1000 2000 3000 4000 5000 6000 7000 8000

    A

    f / Hz

    Pause

    Figure 2.10: Amplitude spectrum of aspeech pause

    -3

    -2.5

    -2

    -1.5

    -1

    -0.5

    0

    0 1000 2000 3000 4000 5000 6000 7000 8000

    log

    A

    f / Hz

    Pause

    Figure 2.11: Logarithmic amplitudespectrum of a speech pause

    Digital Processing of Speech and Image Signals WS 2006/2007

    71

  • 2.6 Quantization

    Uniform quantization

    -XMAX XMAX

    Quantisation: x = Q(x)

    B bits correspond to 2B quantisation levels

    Boundaries: x0, x1, . . . , xk, . . . , xK where K = 2B

    Width of one quantisation level using uniform quantisation:

    = 2 XMAX2B

    Quantisation error:

    2e =

    +

    (x x)2 p(x) dx =Kk=1

    xkxk1

    (x xk)2 p(x) dx

    for uniform quantisation:

    a) xk xk1 = = const(k)b) xk =

    12(xk1 + xk)

    uniform distribution with p(x) = const(x) results in:

    2e =k

    212 1K

    =212

    =X2MAX3 22B

    Digital Processing of Speech and Image Signals WS 2006/2007

    72

  • signal-to-noise ratio in dB (general definition):

    SNR[dB] := 10 lg2x2n

    2x = power of the signal x2n = power of the noise n

    SNR = signal-to-noise ratio

    signal-to-quantisation noise ratio (special case):

    SNR[dB] := 10 lg2x2e

    2e = power of the noise caused by quantisation errors

    uniform quantisation using B bits:

    SNR[dB] = 6.02 B + 4.77 20 lg XMAXx

    if signal amplitude has Gaussian distribution, only 0.064% of sampleshave amplitude greater than 4x:

    SNR[dB] = 6.02 B 7.2 for XMAX = 4x

    Digital Processing of Speech and Image Signals WS 2006/2007

    73

  • 2.7 Fourier Transform and zTransform

    Transfer function and Fourier transform

    Eigenfunctions of discrete linear time invariant systems (analog to timecontinuous case):

    x[n] = ej n < n < ( is dimensionless here)

    Proof:

    x[n] = ej n

    y[n] =

    k=h[k] ej (nk)

    = ej n

    k=h[k] ej k

    Define: H(ej ) =

    k=h[k] ej k

    Remark:The Fourier transform of a discrete time signal is already introduced asFourier series during the derivation of sampling theorem and reconstruc-tion formula (equation (2.2)).

    Result: y[n] = ej n H(ej )

    Digital Processing of Speech and Image Signals WS 2006/2007

    74

  • ztransform:

    Fourier transform of a discrete time signal: x[n]

    X(ej) =+

    n=x[n] ejn

    periodic in

    is normalized frequency, thence:

    <

    X is evaluated on the unit circle (ej)

    Generalization: X is evaluated for any complex values z. That results in ztransform:

    X(z) =+

    n=x[n] zn

    Reasons for ztransform1. analytically simpler, function theory methods are applicable

    2. better handling of convergence problem:

    convergence of finite signal, i.e. x[n] = 0 for each n > N0

    convergence of infinite signal depends on z

    Inverse ztransform:

    x[n] =1

    2j

    X(z) zn1 dz

    formally: z = ej dz = jzd

    x[n] =1

    2

    20

    X(ej) ejn d

    Digital Processing of Speech and Image Signals WS 2006/2007

    75

  • Example of Fourier transform and ztransform:

    Truncated geometric series

    x[n] =

    {an 0 n N 10 otherwise

    ztransform

    X(z) =N1n=0

    an zn =N1n=0

    (a z1)n =1 (a z1)N1 a z1

    =1

    zN1zN aNz a

    Fourier transform

    ztransform results in Fourier transformation using substitution

    z = ej

    X(ej) =1 aN ejN1 a ej

    special case for a = 1 (discrete time rectangle):

    = exp

    (j(N 1)

    2

    ) sin(N2

    )sin(2

    )

    Digital Processing of Speech and Image Signals WS 2006/2007

    76

  • Proof for the ztransform inversion

    Statement:

    x[k] =1

    2j

    X(z) zk1 dz

    Cauchy integration rule1

    2j

    zkdz =

    {1 k = 10 k 6= 1

    1

    2j

    X(z) zk1dz =

    1

    2j

    n

    x[n] zn+k1dz

    =n

    x[n]1

    2j

    zn+k1dz

    6= 0 only for n = k= x[k]

    Fourier:z = ej = dz = j ej d

    Then:

    x[n] =1

    2j

    +

    X(ej) (ej)n1 j ejd

    Integration path is unit circle because of ej

    =1

    2

    +

    X(ej) ejn d

    Digital Processing of Speech and Image Signals WS 2006/2007

    77

  • 2.8 System Representation and Examples

    Example 1: Difference calculation

    Difference equationy[n] = x[n] x[n n0], n0 integral number

    Fourier transform gives:

    n=y[n] ejn =

    n=

    x[n] ejn

    n=x[n n0] ejn

    Y (ej) = X(ej)

    n=

    (x[n] ejn ejn0

    )= X(ej) ejn0 X(ej)

    Then follows:H(ej) =

    Y (ej)

    X(ej)

    = 1 ejn0

    |H(ej)|2 = (1 cos(n0))2 + sin2(n0)= 1 2cos(n0) + cos2(n0) + sin2(n0)= 2 (1 cos(n0))

    |H(ei)|2

    0

    1

    2

    3

    4

    5

    n0

    Digital Processing of Speech and Image Signals WS 2006/2007

    78

  • Example 2: First order difference equation

    Delay

    y[n]x[n]

    +

    y[n-1]

    x[n] + y[n 1] = y[n] y[n] y[n 1] = x[n]

    Method 1: Estimation of transfer function H(ej)from impulse response h[n]:

    From the Eq. above with y[n] = h[n] and x[n] = [n] follows:h[n] = [n] + h[n 1]

    = [n] + [n 1] + 2 [n 2] + =

    {n, n 00, otherwise

    Fourier spectrum/transfer function H(ej)

    H(ej) =+

    k=h[k] ejk

    =+k=0

    k ejk

    =+k=0

    ( ej

    )k=

    1

    1 ej for || < 1

    Digital Processing of Speech and Image Signals WS 2006/2007

    79

  • Method 2: Estimation of transfer function H(ej) usingFourier transform of difference equation:

    Difference equation:y[n] y[n 1] = x[n]

    Fouriertransform:Y (ej) ej Y (ej) = X(ej)

    Result:

    H(ej) =Y (ej)

    X(ej)

    =1

    1 ej

    Digital Processing of Speech and Image Signals WS 2006/2007

    80

  • Example 3: Linear difference equations (with constant coefficients)

    Difference equation:

    y[n] =Ii=0

    b[i] x[n i]Jj=1

    a[j] y[n j]

    z-transform:

    Y (z) = X(z)Ii=0

    b[i]zi Y (z)Jj=1

    a[j]zj

    Result:

    H(z) =Y (z)

    X(z)

    =

    Ii=0

    b[i] zi

    1 +Jj=1

    a[j] zj

    =+

    n=h[n] zn

    Using the definition of H(z) we can optain the impulse response as afunction of the coefficients of the difference equation in the above term.

    Remark:

    If we factorise denominator and numerator polynoms into linear fac-tors, we can obtain a zero-pole-representation of a discrete time LTIsystem:

    H(i) =Ii=1(z vi)Jj=1(z wj)

    with zeros vi C and poles wj C.

    Digital Processing of Speech and Image Signals WS 2006/2007

    81

  • in general:

    h[n] has infinite number of non-zero values

    = IIRfilter: Infinite Impulse Response

    but if: a[j] 0 jh[n] identical to zero outside of a finite interval

    h[n] =

    {b[n] n = 0, . . . , I0 otherwise

    = FIRfilter: Finite Impulse Response

    Digital Processing of Speech and Image Signals WS 2006/2007

    82

  • Example 4:Impulse response as truncated geometric series

    h[n] =

    {an 0 n M a IR0 otherwise

    H(z) =Mn=0

    an zn =1 aM+1 z(M+1)

    1 a z1

    system operation:

    y[n] =

    k=h[k] x[n k]

    =Mk=0

    akx[n k]

    or as difference equation (recursively)

    y[n] a y[n 1] = x[n] aM+1x[nM 1]

    Digital Processing of Speech and Image Signals WS 2006/2007

    83

  • For this example we consider the zero-pole-representation:

    H(z) =1 (za)(M+1)1 (za)1

    > 0

    Zeros: zk = a ej 2pikM+1 k = 0, 1, . . . ,MPole: z0 = a

    (cancelled by zero z0 = a)

    M=11

    Re

    Im

    a

    Digital Processing of Speech and Image Signals WS 2006/2007

    84

  • Example 5:Fibonacci numbers

    Difference equation:

    n 0 h[n+ 2] = h[n+ 1] + h[n]h[0] = h[1] = 1

    n < 0 h[n] = 0

    H(z) =

    n=h[n]zn

    = 1 + z1 +n=0

    h[n+ 2]z(n+2)

    = 1 + z1 +n=0

    h[n+ 1]z(n+2) +n=0

    h[n]z(n+2)

    = 1 + z1 + z1n=0

    h[n+ 1]z(n+1) + z2n=0

    h[n]zn

    = 1 + z1 (1 +n=1

    h[n]zn) H(z)

    + z2n=0

    h[n]zn H(z)

    = 1 + z1H(z) + z2H(z)

    H(z)(1 z1 z2) = 1H(z) =

    1

    1 z1 z2=

    15

    ( a1 az1

    b

    1 bz1)

    where a =1 +

    5

    2and b =

    152

    Digital Processing of Speech and Image Signals WS 2006/2007

    85

  • For a and b the following holds:

    1

    1 (az )=

    n=0

    ( az

    )n=

    n=0

    anzn

    That results in:

    H(z) =n=0

    15

    (an+1 bn+1 ) zn

    !=

    n=0

    h[n] zn

    h[n] =15

    (an+1 bn+1 )

    Digital Processing of Speech and Image Signals WS 2006/2007

    86

  • Table 2.1: Fourier transform pairs

    signal Fouriertransform

    1. [n] 1

    2. [n n0] ejn0

    3. 1 ( < n

  • 2.9 Discrete Time Signal Fourier Transform Theo-

    rem

    Basically there is no difference between FT theorem for the continuoustime and the discrete time case because summation has the same proper-ties as integration. Only differentiation and difference calculation are notcompletely analog, because it is not possible to form a derivative in thediscrete time case.

    Table 2.2: Fourier transform Theorems

    signal Fouriertransformx[n], y[n] X(ej), Y (ej)

    1. ax[n] + by[n] aX(ej) + bY (ej)

    2. x[n nd], ejndX(ej)nd is integral number

    3. ej0nx[n] X(ej(0))

    4. x[n] X(ej)X(ej) if x[n] is real

    5. nx[n] jdX(ej)

    d

    6. x[n] y[n] X(ej)Y (ej)

    7. x[n]y[n]1

    2

    pipi

    X(ej)Y (ej())d

    8. x[n] x[n 1] (1 ej)X(ej)|1 ej|2 = 2(1 cos)

    Parseval theorem

    9.

    n=

    |x[n]|2 = 12

    pipi

    |X(ej)|2d

    10.

    n=

    x[n]y[n] =1

    2

    pipi

    X(ej)Y (ej)d

    Digital Processing of Speech and Image Signals WS 2006/2007

    88

  • Example 1 corresponding to Theorem 5:

    X(ej) =+

    k=x[k] ejk

    d

    dX(ej) =

    d

    d

    (+

    k=x[k] ejk

    )

    =+

    k=

    d

    d

    (x[k] ejk

    )=

    +k=

    x[k] (jk) ejk

    j dd

    X(ej) =+

    k=k x[k] ejk

    F {n x[n]} = j dd

    F {x[n]}

    Example 2 corresponding to Theorem 8:

    F {x[n] x[n 1]} =+

    k=x[k] ejk

    +k=

    x[k 1] ejk

    =+

    k=x[k] ejk

    +k=

    x[k] ejk ej

    = X(ej)(1 ej)

    = |F {x[n] x[n 1]} |2 = |F {x[n]} |2 |1 ej|2= |F {x[n]} |2 2 (1 cos())

    Digital Processing of Speech and Image Signals WS 2006/2007

    89

  • 2.10 Discrete Fourier Transform: DFT

    The Fourier transform for discrete time signals and systems has been ex-plained on the previous pages. For discrete time signals with finite lengththere is also another Fourier representation called Discrete Fourier Trans-form (DFT).

    The DFT plays a central role in digital signal processing.

    Decisive reasons:

    fast algorithms exist for DFT calculation(Fast Fourier Transform, FFT).

    discrete frequencies k can be better represented in the computer thancontinuous frequencies .

    Digital Processing of Speech and Image Signals WS 2006/2007

    90

  • Assume a discrete time signal x[n] with finite length (see also chapter 3.2on page 135):

    x[n] =

    {x[n] 0 n N 10 otherwise

    Note: For a continuous time signal it is impossible in the strict sense to beband-limited and time-limited (truncation effect = Windowing).

    The discrete time signal Fourier transform for x[n] is:

    X(ej) =N1n=0

    x[n] exp(jn)

    is a continuous variable. The period is 2. Frequency discretisationis made by sampling along the frequency axis.

    The Fourier transform X(ej) is evaluated at

    k =2

    Nk where k = 0, 1, . . . , N 1

    Define:X[k] : = X(ej)| = k

    N=8

    Re

    ImC

    Digital Processing of Speech and Image Signals WS 2006/2007

    91

  • Discrete Fourier Transform (DFT):

    X[k] =N1n=0

    x[n] exp(j 2N

    k n), k = 0, 1, . . . , N 1

    Inverse DFT:

    x[n] =1

    N

    N1k=0

    X[k] exp(j2

    Nk n), n = 0, 1, . . . , N 1

    Remark:

    This equation can be proven by inserting the equation for X[k] in theequation for x[n] and using the orthogonality:

    1

    N

    N1n=0

    exp

    (j2

    Nkn

    )=

    {1 k = m N, m is integral number0 otherwise

    Note:

    Consider the analogy between inverse DFT (above) and inverseFourier transform of discrete time signal:

    x[n] =1

    2

    20

    X(ej) ejn d

    Under the given conditions the integral is equal to the sum(without approximation!).

    Digital Processing of Speech and Image Signals WS 2006/2007

    92

  • Remarks:

    DFT coefficients X[k] are not an approximation of the discrete timesignal Fourier transform X(ej). On the contrary:

    X[k] = X(ej)| = k Number of the coefficients X[k] depends on the signal length N . Afiner sampling of the discrete time signal Fourier transform is possibleby appending zeros to the signal x[n] (ZeroPadding).

    x[n]

    nN-1

    Digital Processing of Speech and Image Signals WS 2006/2007

    93

  • Interpretation of Fourier coefficients

    Fourier transform X(ej) of the time discrete signal x[n]

    |X(e )|

    pipi

    j

    Evaluation at N discrete sampling points

    k =2

    Nk

    yields the DFT coefficients X[k].

    At first k lies in the domain k = N2+ 1, . . . , 0, . . . ,

    N

    2.

    |X(e )|, |X[k]|

    pipi

    j

    -N/2+1 N/21 20-1 k

    Digital Processing of Speech and Image Signals WS 2006/2007

    94

  • Because of the periodicity of X(ej) the coefficients X[k] can also beobtained by shifting the sampling points with negative frequency intothe positive frequency domain (by one period).

    Then k = 0, . . . , N/2, . . . , N 1.

    X[k] =N1n=0

    x[n] exp(j 2N

    k n)

    |X(e )|, |X[k]|

    pipi

    j

    1 20 N-1 k

    Interpretation of coefficients for general signal x[n]:k = 0 f = 0

    1 k N2 1 0 < f < fS

    2

    k =N

    2 fS

    2N

    2+ 1 k N 1 fS

    2< f < 0

    Digital Processing of Speech and Image Signals WS 2006/2007

    95

  • Symmetric relations by real signals:

    For DFT coefficients X[k] of a real signal x[n] the following holds:X[k] = X[N k]

    Re(X[k]) = Re(X[N k])Im(X[k]) = Im(X[N k])

    For the amplitude spectrum |X[k] | the following holds:|X[k] |2 = Re2{X[k]} + Im2{X[k]}

    = |X[N k] |2

    Digital Processing of Speech and Image Signals WS 2006/2007

    96

  • Realization of DFT:

    /* PI = 3.14159265358979 */

    /* x: input signal */

    /* N: length of input signal */

    /* Xre, Xim: real and imaginary part of DFT coefficients */

    void dft (int N, float x[], float Xre[], float Xim[]) {

    int n, k;

    float SumRe, SumIm;

    for (k=0; k

  • 2.11 DFT as Matrix Operation

    Notation with unit roots

    X[k] =N1n=0

    x[n] exp (2jN

    k n)

    =N1n=0

    x[n] W knN

    where WN := exp (2jN

    )

    N=12

    W =10N

    W 1N

    W 2NW 3N

    Periodicity of WNunit root:

    exp (j k) = exp (j2Nk) = (WN)

    k

    WN := exp (j2N)

    Digital Processing of Speech and Image Signals WS 2006/2007

    98

  • Note:

    1. W rN = Wr mod NN

    2. W kNN = (WNN )

    k = 1k = 1 k Z

    3. W 2N = [exp (2j

    N)]2 = exp (2j

    N2)

    = exp ( 2jN/2

    ) = WN/2 N even

    4. WN/2N = exp (

    2j

    N

    N

    2) = exp (j) = 1

    5. Wr+N/2N = W

    N/2N W

    rN = W rN

    Digital Processing of Speech and Image Signals WS 2006/2007

    99

  • DFT as matrix multiplication

    X[k] =N1n=0

    x[n] exp (2jN

    k n)

    =N1n=0

    W knN x[n]

    =N1n=0

    {WN}kn x[n]

    with the matrix {WN} and the matrix elements:

    {WN}kn := W knN

    Inversion:

    x[n] =1

    N

    N1k=0

    X[k] exp (2j

    Nk n)

    =1

    N

    N1k=0

    (W1N )kn X[k]

    =1

    N

    N1k=0

    {W1N }kn X[k]

    Therefore for the matrix {WN}1 holds:

    {WN}1 := 1N{W1N }

    Digital Processing of Speech and Image Signals WS 2006/2007

    100

  • DFT matrix operation: properties

    DFT: invertible linear mapping

    N complex signal values N complex Fourier components

    N real signal values N2complex Fourier components

    (due to symmetry)

    in words:DFT causes no information loss in the signal.

    Parseval theorem for DFTgeneral Fourier:

    N1n=0

    |x[n]|2 = 12

    +|X(ej)|2d

    special DFT: (recalculate for yourself!)

    N1n=0

    |x[n]|2 = 1N

    N1k=0

    |X[k]|2

    in words:

    Disregarding the factor1

    N, the DFT is a norm conserving (= energy

    conserving) transformation (mathematical terminology: unitary).

    Digital Processing of Speech and Image Signals WS 2006/2007

    101

  • 2.12 From Continuous Fourier Transform to Matrix

    Representation of Discrete Fourier Transform

    Assumption: band-limited signal x(t)

    Fourier transform of the continuous time signal x(t):

    X() = F{x(t)} =

    x(t) ejtdt (2.9)

    For the exact reconstruction (without approximation) of the continuoustime signal from sampled values, the samples x[n] = x(n Ts) must havethe distance of at most

    Ts =

    B

    (sampling theorem).

    This results in the Fourier transform of the discrete time signal x[n]:

    X(ej) =

    x[n] ejn (2.10)

    where is frequency normalised on T s

    Functions (2.9) and (2.10) agree in interval [S/2,+S/2] = [B,+B].X

    ()||

    BB SS

    Digital Processing of Speech and Image Signals WS 2006/2007

    102

  • 00.2

    0.4

    0.6

    0.8

    1

    0 N-10

    0.2

    0.4

    0.6

    0.8

    1

    0 N-1

    Figure 2.12: Hanning window

    The signal x[n] is further decomposed by applying a window function w[n](windowing):

    w[n] =

    {. . . n = 0, . . . , N 10 otherwise

    Windowed signal y[n]:y[n] = w[n] x[n]

    can be analyzed using Fourier transform or DFT.

    Y (ejk) =N1n=0

    y[n] ejk

    DFT:

    k =2k

    Nwhere k = 0, . . . , N 1

    Y [k] =N1n=0

    y[n] e2piNkn

    Matrix representation:

    K = N

    Y [0]...

    Y [k]...

    Y [K 1]

    =

    ...

    e2pijNnk

    ...

    y[0]...

    y[n]...

    y[N 1]

    Digital Processing of Speech and Image Signals WS 2006/2007

    103

  • 2.13 Frequency Resolution and Zero Padding

    Task: signal x[n] with finite length N is given.

    Wanted: Fourier transform X(ejk) at

    k =2

    Kk, where k = 0, 1, . . . , K 1 and K > N

    Inserting the definitions:

    X(ejk) =N1n=0

    x[n] exp (2jK

    k n)

    =K1n=0

    x[n] exp (2jK

    k n)

    where x[n] =

    {x[n] n = 0, . . . , N 10 n = N, . . . ,K 1

    i.e. Zero Padding (appending zeros).

    Matrix representation:

    X[0]...

    ...X[K 1]

    =

    W nkK

    x[0]...

    ...x[N 1]

    0...0

    n = 0

    n = N 1n = N

    n = K 1

    Note:Zero Padding does not introduce any additional information into the sig-nal. This is only a trick so that DFT and particularly FFT (Fast FourierTransform) can be performed with a

    higher frequency resolution .

    Digital Processing of Speech and Image Signals WS 2006/2007

    104

  • 2.14 Finite Convolution

    Input signal and convolution kernel have finite duration

    Consider finite convolution:

    Impulse response: h[n] 0 for n 6 {0, 1, 2, . . . , Nh 1} Input signal: x[n] 0 for n 6 {0, 1, 2, . . . , Nx 1} Output signal:

    y[n] =

    k=h[k] x[n k]

    =

    Nh1k=0

    h[k] x[n k]

    k

    k

    h[k]

    x[-k]

    N-1

    -(N-1)x

    h

    0

    n=0

    Altogether: Nx +Nh 1 positions with overlap Therefore only Nx + Nh 1 values of output signal can be differentfrom zero:

    y[n] =

    0 n > Nx +Nh 2. . . n = 0, 1, . . . , Nx +Nh 20 n < 0

    Digital Processing of Speech and Image Signals WS 2006/2007

    105

  • k0 Nh-1=12

    1

    k0 Nx-1=4

    x[k]1 0.8

    0.60.4

    0.2

    k0-Nx

    x[n-k] n=-1,

    k0

    x[n-k] n=m (m>0 & m Nh +Nx 2, no overlap with h[k], convolution y[n] = 0c) resulting convolution y[n].

    Digital Processing of Speech and Image Signals WS 2006/2007

    106

  • Finite convolution using DFT

    Convolution theorem:

    y[n] =

    k=h[k] x[n k]

    Fourier:

    Y (ej) = H(ej) X(ej), 0 2Also valid for sample frequencies:

    k :=2

    Nk, k = 0, . . . , N 1 for any N

    Notation: Y [k] = H[k] X[k]

    Question: How to choose the length N of the DFT ? Reminder: different lengths

    x[n]: Nx non-zero values

    h[n]: Nh non-zero values

    y[n]: Ny = Nx +Nh 1 non-zero values Answer:

    The convolution theorem is certainly correct for any N > 0.

    If we want to calculate the output signal completely from Y [k],we have to know Y [k] for

    at least N = Nx +Nh 1frequency values k = 0, 1, . . . , N 1.

    In words: for the DFT length N must be valid:

    N Nx + Nh 1Method: Zero Padding, i.e. appending zeros.

    Note: The FFT will be introduced on the next pages. A comparison ofcosts for realization of the finite convolution by DFT and FFT can befound at the end of paragraph 2.15.

    Digital Processing of Speech and Image Signals WS 2006/2007

    107

  • 2.15 Fast Fourier Transform (FFT)

    Principle of FFT:Calculation of the DFT can be done by successive decomposition intosmaller DFT calculations. In this way, the number of elementary oper-ations (multiplications and additions) is dramatically reduced:

    FFT:

    N 2 N2ldN operations

    N = 1024 :2 NldN

    =2 102410

    = 200 factor of velocity gain

    The matrix is decomposed into a product of sparse matrices, therefore Nwith lot of prime factors is convenient (not necessarily only powers of two).

    Terminology for different variants of FFT:

    in time in frequency in place: yes/no radix 2 radix 4 decomposition to prime factors instead of N = 2n

    History

    1965 Cooley and Tukey1942 Danielson and Lanczos1905 Runge1805 Gauss

    Digital Processing of Speech and Image Signals WS 2006/2007

    108

  • Algorithms which are based on a decomposition of the signal x[n] are calleddecimationintime algorithms.

    The case N = 2 is considered in the following.

    X[k] =N1n=0

    x[n] exp(j 2N

    k n) where k = 0, 1, . . . , N 1

    =N1n=0

    x[n] W nkN where WnkN = exp(j

    2

    Nk n)

    Decomposition of the sum over n into the sums over even and odd n:

    X[k] =

    N/21r=0

    x[2r] W 2rkN +

    N/21r=0

    x[2r + 1] W(2r+1)kN

    =

    N/21r=0

    x[2r] (W 2N)rk + W kN

    N/21r=0

    x[2r + 1] (W 2N)rk

    Because of

    W 2N = exp(2j2

    N) = exp(j 2

    N/2) = WN/2

    for k = 0, . . . , N 1 holds:

    X[k] =

    N/21r=0

    x[2r] W rkN/2 + WkN

    N/21r=0

    x[2r + 1] (WN/2)rk

    = G[k] + W kN H[k]

    Each of the two sums corresponds to the DFT with the length N/2.The first sum is a N/2DFT of the even indexed signal values x[n],the second sum is a N/2DFT of the odd indexed values.

    The DFT of the length N can be obtained by getting the two N/2DFTs together, with the factor W kN .

    Digital Processing of Speech and Image Signals WS 2006/2007

    109

  • Complexity:The complexity O(N 2) of one-dimensional FT can be reduced by adequateresorting values from two FTs with length N2 and complexity O(2 (N2 )2) =N2

    2 . By successive application of this resorting the complexity can be re-duced to O(N logN).The case N = 23 = 8 is considered in the following.

    X[4] can be obtained from H[4] and G[4] according to previous equa-tion.

    Because of the DFTlength N2 = 4:H[4] = H[0] and G[4] = G[0]

    And then:

    X[4] = G[0] + W 4N H[0]

    The values X[5], X[6] and X[7] can be obtained analogously.

    Flow diagram for decomposition of one N -DFT into two N/2DFTs:

    x[n] X[k]

    N/2-point

    DFT

    N/2-point

    DFT

    x[0]

    x[2]

    x[4]

    x[6]

    x[1]

    x[3]

    x[5]

    x[7]

    G[0]

    G[1]

    G[2]

    G[3]

    H[0]

    H[1]

    H[2]

    H[3]

    X[0]

    X[1]

    X[2]

    X[3]

    X[4]

    X[5]

    X[6]

    X[7]

    W0N

    W1N

    W2N

    W3N

    W4N

    W5N

    W6N

    W7N

    Figure 2.14: Flow diagram for decomposition of one N -DFT to two N/2DFTs withN = 8

    Digital Processing of Speech and Image Signals WS 2006/2007

    110

  • Further analogous decomposition, until only DFTs with the lengthN = 2 remain (so called Butterfly Operation)

    Resulting flow diagram of the FFT:

    x[n] X[k]x[0]

    x[4]

    x[2]

    x[6]

    x[1]

    x[5]

    x[3]

    x[7]

    W0N

    W0N

    W0N

    W0N

    -1

    -1

    -1

    -1

    W0N

    W2N

    -1

    -1

    -1

    -1

    W0N

    W1N

    W2N

    W3N

    X[0]

    X[1]

    X[2]

    X[3]

    X[4]

    X[5]

    X[6]

    X[7]

    W0N

    W2N

    -1

    -1

    -1

    -1

    Figure 2.15: Flow diagram of an 8pointFFT using Butterfly operations.

    Digital Processing of Speech and Image Signals WS 2006/2007

    111

  • Complexity reduction

    Number of complex multiplications in FFTis N/2 ldN .

    Comparison:Direct application of the DFT definition needsN 2 complex multiplications.

    Example: N = 1024 = 210

    N 2

    N/2 ldN 200

    Complexity reduction by factor 200

    FFT with the base 2 is not minimal according to number of addi-tions, FFT with the base 4 can be better.

    Digital Processing of Speech and Image Signals WS 2006/2007

    112

  • Matrix representation of the FFT principle

    The complex Fourier matrix can be decomposed into the product ofr = ldN matrices, each of them having only two non-zero elements ineach column.

    The following graph shows the decomposition of the Fourier matrix inthe case of inverse transformation.

    w corresponds to W1N

    X = |wnk|X = T3 T2 T1 TS xThis is how the decomposition into r + 1 = 4 matrices looks like(w4 = 1, w8 = 1):

    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 w w2 w3 w4 w5 w6 w7 1 w w2 w3 w4 w5 w6 w7

    1 w2 w4 w6 w8 w10 w12 w14 1 w2 w4 w6 1 w2 w4 w6

    1 w3 w6 w9 w12 w15 w18 w21 1 w3 w6 w w4 w7 w2 w5

    |wnk| = 1 w4 w8 w12 w16 w20 w24 w28 = 1 w4 1 w4 1 w4 1 w41 w5 w10 w15 w20 w25 w30 w35 1 w5 w2 w7 w4 w w6 w3

    1 w6 w12 w18 w24 w30 w36 w42 1 w6 w4 w2 1 w6 w4 w2

    1 w7 w14 w21 w28 w35 w42 w49 1 w7 w6 w5 w4 w3 w2 w

    Digital Processing of Speech and Image Signals WS 2006/2007

    113

  • Signal flow diagram

    The calculation operations which correspond to the matrix representationof FFT can be showed in a signal flow diagram.

    T3 T2 T1 TS1 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 00 1 0 0 0 w 0 0 0 1 0 w2 0 0 0 0 1 -1 0 0 0 0 0 0 0 0 0 0 1 0 0 00 0 1 0 0 0 w2 0 1 0 -1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 00 0 0 1 0 0 0 w3 0 1 0 w2 0 0 0 0 0 0 1 -1 0 0 0 0 0 0 0 0 0 0 1 01 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 00 1 0 0 0 w 0 0 0 0 0 0 0 1 0 w2 0 0 0 0 1 -1 0 0 0 0 0 0 0 1 0 00 0 1 0 0 0 w2 0 0 0 0 0 1 01 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 00 0 0 1 0 0 0 w3 0 0 0 0 0 1 0 w2 0 0 0 0 0 0 1 -1 0 0 0 0 0 0 0 1

    TS T1 T2 T3

    -1

    -1

    -1

    -1

    -1

    -1

    -1

    2

    2

    -1

    2

    3

    1

    1

    1

    1

    X[0]

    X[1]

    X[2]

    X[3]

    X[4]

    X[5]

    X[6]

    X[7]

    x[0]

    x[1]

    x[2]

    x[3]

    x[4]

    x[5]

    x[6]

    x[7]

    Digital Processing of Speech and Image Signals WS 2006/2007

    114

  • Matrices T1, T2 and T3 contain exactly two non-zero elements in eachrow.

    Non-zero elements are realizing the Butterfly Operation.

    Matrix T1: step width of the Butterfly Operation is 1Matrix T2: step width of the Butterfly Operation is 2Matrix T3: step width of the Butterfly Operation is 4

    step widths can be found: in signal flow diagram

    distance between the non-zero elements in T1, T2 and T3

    Digital Processing of Speech and Image Signals WS 2006/2007

    115

  • Butterfly Operation

    Signal flow diagram and matrix representation of the FFT are basedon the following basic operation:

    Xm-1[p]

    Xm-1[q]

    Xm[p]

    Xm[q]-1

    WrN

    For two input values Xm1[p] and Xm1[q] this operation producestwo output values Xm[p] and Xm[q]. The output values are thereby alinear combination of the input values.

    Because of the flow graph, the operation is calledButterfly Operation.

    Xm[p] = Xm1[p] +W rN Xm1[q]Xm[q] = Xm1[p]W rN Xm1[q]

    [Xm[p]Xm[q]

    ]=

    [1 W rN1 W rN

    ] [Xm1[p]Xm1[q]

    ]

    Digital Processing of Speech and Image Signals WS 2006/2007

    116

  • Bit Reversal

    The matrix representation of the FFT uses a sorting matrix, i.e. thesignal which is to be transformed is at first resorted.

    Example for N = 8:

    n Binary representation Reversed n

    0 000 000 01 001 100 42 010 010 23 011 110 64 100 001 15 101 101 56 110 011 37 111 111 7

    Bit Reversal is a necessary part of the FFTAlgorithm.

    Bit Reversal for N = 23

    Digital Processing of Speech and Image Signals WS 2006/2007

    117

  • 2.16 FFT Implementation

    Fortran versionC adapted from: Oppenheim, Schafer p. 608

    C SUBROUTINE FFT DecimationInTime (X, ld n) **********************************

    C ****************************************************************************

    PARAMETER PI = 3.14159265358979

    PARAMETER N max = 2048

    COMPLEX X(N max) ! array for input AND output

    COMPLEX Temp ! temporary storage

    COMPLEX W uni ! root of unity

    COMPLEX W pow ! powers of W uni

    INTEGER N, ld N, ip, iq, iqbeq, j, k, i exp, istp

    N = 2**ld n

    IF (N.GT.N max) STOP

    C BIT Reversed Sorting *********************************************************

    j = 1

    DO i = 1, N-1

    IF (i.LT.j) THEN ! swap X(j) and X(i)

    Temp = X(j)

    X(j) = X(i)

    X(i) = Temp

    ENDIF

    k = N/2

    DO WHILE (k.LT.j)

    j = j - k

    k = k / 2

    ENDDO

    j = j + k

    ENDDO

    C End of Bit Reversed Sorting **************************************************

    C FFT Butterfly Operations *****************************************************

    DO i=1, ld N

    i exp = 2**i ! exponent

    istp = i exp/2 ! stepsize

    W pow = (1.0,0.0)

    W uni = CMPLX (COS (PI/FLOAT(istp)), -SIN(PI/FLOAT(istp)))

    DO ipbeg = 1, istp

    DO ip = ipbeg, N, i exp

    iq = ip + istp

    Temp = X(iq) * W pow

    X(iq) = X(iq) - Temp

    X(ip) = X(iq) + Temp

    ENDDO

    W pow = W pow * W uni

    ENDDO

    ENDDO

    C End of FFT Butterfly Operations **********************************************

    RETURN

    END

    Digital Processing of Speech and Image Signals WS 2006/2007

    118

  • Explanations about Fortran Program

    Two program parts:

    1. Bit Reversal

    2. Butterfly Operations

    3 loops with variables i, ipbeg, ip are controlling the execution ofthe Butterfly operations

    outer loop i:i specifies the level of the FFT

    With exception of the first level, Butterfly operations are nested.Therefore two loops are used for the Butterfly operations withinone level.

    middle loop, ipbeg:ipbeg: goes over the nested Butterfly operationsi=1: ipbeg=1i=2: ipbeg=1,2i=3: ipbeg=1,2,3,4iqbeg: specifies the sequence of starting points for inner loop

    inner loop, ip:ip: specifies the first element of the Butterfly operationistp: step width of the Butterfly operationiq=ip+istp: specifies the second element for Butterfly operationinner loop is started once per nesting

    Digital Processing of Speech and Image Signals WS 2006/2007

    119

  • x[0]

    x[4]

    x[2]

    x[6]

    x[1]

    x[5]

    x[3]

    x[7]

    W0N

    W0N

    W0N

    W0N

    -1

    -1

    -1

    -1

    W0N

    W2N

    -1

    -1

    -1

    -1

    W0N

    W1N

    W2N

    W3N

    X[0]

    X[1]

    X[2]

    X[3]

    X[4]

    X[5]

    X[6]