Time Domain Methods1 (1)


  • 8/3/2019 Time Domain Methods1 (1)


    Time-Domain Methods for Speech Processing

    Introduction


    Speech Processing Methods

    Time-Domain Method: operates directly on the waveform of the speech signal.

    Frequency-Domain Method: operates on some form of spectral representation.


    Time-Domain Measurements

    Average zero-crossing rate, energy, and the autocorrelation function.

    Very simple to implement.

    Provide a useful basis for estimating important features of the speech signal, e.g.,

        Voiced/unvoiced classification

        Pitch estimation


    Time-Domain Methods for Speech Processing

    Time-Dependent Processing of Speech


    Time-Dependent Nature of Speech

    This is a test.



    Short-Time Behavior of Speech

    Assumption

        The properties of the speech signal change slowly with time.

    Analysis Frames

        Short segments of the speech signal.

        Usually overlap one another.
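
    The framing idea can be sketched as follows. This is a minimal illustration, not the method used in the slides: the function name `split_into_frames` and the parameters `frame_len` and `hop` are assumptions introduced here, and trailing samples that do not fill a whole frame are simply dropped.

```python
def split_into_frames(x, frame_len, hop):
    """Split x into frames of frame_len samples, advancing hop samples each time.

    Frames overlap whenever hop < frame_len. Samples at the end that do not
    fill a complete frame are dropped (an assumption of this sketch).
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(x):
        frames.append(x[start:start + frame_len])
        start += hop
    return frames


signal = list(range(10))                      # stand-in for speech samples
frames = split_into_frames(signal, frame_len=4, hop=2)
print(frames)
```

    With `frame_len=4` and `hop=2`, each frame shares half of its samples with the previous one.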


    Time-Dependent Analyses

    Analyzing each frame may produce either a single number or a set of numbers, e.g.,

        Energy (a single number)

        Vocal tract parameters (a set of numbers)

    This will produce a new time-dependent sequence.


    General Form

    $$Q_n = \sum_{m=-\infty}^{\infty} T[x(m)]\, w(n-m)$$

    n: Frame index

    x(m): Speech signal

    T[]: A linear or nonlinear transformation.

    w(m): A window function (finite or infinite).


    Qn is a sequence of local weighted average values of the sequence T[x(m)].

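
    As a rough sketch of the general form, the sum over m can be evaluated directly for a finite window. The function name `short_time_analysis` is hypothetical, and the sketch assumes x(m) = 0 for m < 0 and a window that is nonzero only for 0 ≤ k ≤ N-1.

```python
def short_time_analysis(x, transform, window):
    """Evaluate Q_n = sum_m T[x(m)] * w(n - m) for a finite window w(0..N-1).

    Assumes x(m) = 0 for m < 0, so terms with m < 0 are skipped.
    """
    N = len(window)
    q = []
    for n in range(len(x)):
        total = 0.0
        for k in range(N):            # k = n - m, so m = n - k
            m = n - k
            if m >= 0:
                total += transform(x[m]) * window[k]
        q.append(total)
    return q


x = [1.0, -2.0, 3.0, -4.0]
rect = [1.0, 1.0]                     # rectangular window, N = 2

energy = short_time_analysis(x, lambda v: v * v, rect)   # T[x] = x^2
magnitude = short_time_analysis(x, abs, rect)            # T[x] = |x|
print(energy, magnitude)
```

    Swapping the transformation T[ ] changes the measurement while the windowing step stays the same.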


    Example

    Energy:

    $$E = \sum_{m=-\infty}^{\infty} x^2(m)$$

    Short-Time Energy:

    $$E_n = \sum_{m=n-N+1}^{n} x^2(m)$$


    Example: Short-Time Energy in the General Form

    $$E_n = \sum_{m=n-N+1}^{n} x^2(m) = \sum_{m=-\infty}^{\infty} T[x(m)]\, w(n-m)$$

    where

    $$T[x(m)] = x^2(m), \qquad w(m) = \begin{cases} 1, & 0 \le m \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
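
    For the rectangular window, the short-time energy is a sum over the most recent N samples, so it can be updated with a running sum. The sketch below is one possible way to do this, not something stated in the slides; the name `short_time_energy` is hypothetical, and x(m) is taken as 0 for m < 0.

```python
def short_time_energy(x, N):
    """E_n = sum of x(m)^2 for m = n-N+1 .. n (rectangular window of length N).

    Uses a running sum: add the newest squared sample and subtract the one
    that just left the window. Samples before the start count as zero.
    """
    energies = []
    running = 0.0
    for n, sample in enumerate(x):
        running += sample * sample
        if n >= N:
            running -= x[n - N] ** 2
        energies.append(running)
    return energies


x = [1.0, 2.0, 3.0, 4.0, 5.0]
energies = short_time_energy(x, N=3)
print(energies)
```

    For long signals with non-integer samples, a running sum can accumulate rounding error, so recomputing the sum per frame may be preferable.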


    General Short-Time-Analysis Scheme

    [Block diagram: x(n) → T[ ] → linear filter → Qn. The linear filter is a lowpass filter, depending on the choice of window.]


    Time-Domain Methods for Speech Processing

    Short-Time Energy and Average Magnitude


    Applications

    Silence Detection

    Segmentation

    Lip Sync


    Short-Time Energy

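    $$\begin{aligned}
    E_n &= \sum_{m=-\infty}^{\infty} [x(m)\, w(n-m)]^2 \\
        &= \sum_{m=-\infty}^{\infty} x^2(m)\, w^2(n-m) \\
        &= \sum_{m=-\infty}^{\infty} x^2(m)\, h(n-m) \\
        &= x^2(n) * h(n)
    \end{aligned}$$

    where $h(n) = w^2(n)$ and $*$ denotes convolution.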
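
    The equivalence between the windowed sum and filtering x^2(n) with h(n) = w^2(n) can be checked numerically. This is only a sketch: the helper names `convolve` and `energy_direct` are hypothetical, the window values are arbitrary, and x(m) is taken as 0 for m < 0.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def energy_direct(x, w):
    """E_n = sum_m [x(m) w(n-m)]^2, taking x(m) = 0 for m < 0."""
    out = []
    for n in range(len(x)):
        out.append(sum((x[n - k] * w[k]) ** 2
                       for k in range(len(w)) if n - k >= 0))
    return out


x = [0.5, -1.0, 2.0, 0.0, -0.5, 1.5]
w = [0.2, 0.6, 1.0, 0.6]                 # arbitrary illustrative window
h = [v * v for v in w]                   # h(n) = w^2(n)
x_squared = [v * v for v in x]

via_filter = convolve(x_squared, h)[:len(x)]   # outputs for n = 0 .. len(x)-1
direct = energy_direct(x, w)
max_diff = max(abs(p - q) for p, q in zip(via_filter, direct))
print(max_diff)
```

    The two results agree up to floating-point rounding, which is why the short-time energy can be viewed as the squared signal passed through a linear filter.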


    Block Diagram Representation

    [Block diagram: x(n) → [ ]^2 → x^2(n) → h(n) → En]

    [Block diagram: x(n) → | | → |x(n)| → w(n) → Mn]

    $$h(n) = w^2(n)$$


    What is the effect of windows?


    The Effects of Windows

    Window length

    Window function


    Rectangular Window

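    $$h(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

    $$H(e^{j\omega}) = \frac{\sin(\omega N/2)}{\sin(\omega/2)}\, e^{-j\omega(N-1)/2}$$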
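
    The closed-form frequency response of the rectangular window can be compared against a direct evaluation of the sum of e^{-jωn} over 0 ≤ n ≤ N-1. The sketch below assumes the frequency variable is the normalized radian frequency ω; the function names are hypothetical and N = 8 is an arbitrary choice.

```python
import cmath
import math


def rect_dtft(omega, N):
    """Direct evaluation of H(e^{j omega}) = sum_{n=0}^{N-1} e^{-j omega n}."""
    return sum(cmath.exp(-1j * omega * n) for n in range(N))


def rect_closed_form(omega, N):
    """sin(omega N / 2) / sin(omega / 2) * e^{-j omega (N-1)/2}."""
    if abs(math.sin(omega / 2)) < 1e-12:
        return complex(N)          # limiting value at omega = 0 (mod 2*pi)
    ratio = math.sin(omega * N / 2) / math.sin(omega / 2)
    return ratio * cmath.exp(-1j * omega * (N - 1) / 2)


N = 8
omega = 0.3
difference = abs(rect_dtft(omega, N) - rect_closed_form(omega, N))
first_zero = 2 * math.pi / N       # first zero of the magnitude response
print(difference, abs(rect_dtft(first_zero, N)))
```

    The magnitude is N at ω = 0 and first reaches zero at ω = 2π/N, so a longer window gives a narrower mainlobe.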

    [Figure: magnitude response |H(e^{jω})| of the rectangular window, plotted from -2π to 2π, with the mainlobe width, the peak sidelobe, and the zeros at ±2π/N marked. Plot label: N=88.]

    Rectangular Window: Discussion

    What is this? (annotation on the expression for H(e^{jω}))

    Discuss the effect of window duration.

    Discuss the effect of mainlobe width and sidelobe peak.


    Commonly Used Windows

    [Figure: the rectangular, Bartlett, Hanning, Hamming, and Blackman windows plotted against n from 0 to 20, with amplitude from 0 to 1.]

    Rectangular:

    $$w(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

    Bartlett (Triangular):

    $$w(n) = \begin{cases} 2n/(N-1), & 0 \le n \le (N-1)/2 \\ 2 - 2n/(N-1), & (N-1)/2 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

    Hanning:

    $$w(n) = \begin{cases} 0.5 - 0.5\cos[2\pi n/(N-1)], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

    Hamming:

    $$w(n) = \begin{cases} 0.54 - 0.46\cos[2\pi n/(N-1)], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

    Blackman:

    $$w(n) = \begin{cases} 0.42 - 0.5\cos[2\pi n/(N-1)] + 0.08\cos[4\pi n/(N-1)], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
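
    The five window definitions can be written out directly. This is a sketch using the standard textbook coefficients (including 0.08 for the second Blackman cosine term) and assuming N ≥ 2; the function names are introduced here for illustration.

```python
import math


def rectangular(N):
    return [1.0] * N


def bartlett(N):
    half = (N - 1) / 2
    return [2 * n / (N - 1) if n <= half else 2 - 2 * n / (N - 1)
            for n in range(N)]


def hanning(N):
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def hamming(N):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def blackman(N):
    return [0.42
            - 0.5 * math.cos(2 * math.pi * n / (N - 1))
            + 0.08 * math.cos(4 * math.pi * n / (N - 1))
            for n in range(N)]


N = 21                                   # samples n = 0 .. 20
for name, window in [("rectangular", rectangular(N)),
                     ("bartlett", bartlett(N)),
                     ("hanning", hanning(N)),
                     ("hamming", hamming(N)),
                     ("blackman", blackman(N))]:
    # Print the endpoint and midpoint values of each window.
    print(name, round(window[0], 4), round(window[N // 2], 4))
```

    All of the tapered windows reach 1 at the midpoint. They differ in how quickly they fall toward the ends, with the Hamming window stopping at 0.08 rather than 0.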


    Commonly Used Windows

    Among the rectangular, Bartlett, Hanning, Hamming, and Blackman windows of a given length, the rectangular window has the least mainlobe width.


    Examples: Short-Time Energy

    [Figure: short-time energy computed with a rectangular window and with a Hamming window.]


    Examples: Average Magnitude

    [Figure: average magnitude computed with a rectangular window and with a Hamming window.]


    The Effects of Window Length

    Increasing the window length N decreases the bandwidth.

    If N is too small, e.g., less than one pitch period, En and Mn will fluctuate very rapidly.

    If N is too large, e.g., on the order of several pitch periods, En and Mn will change very slowly.


    The Choice of Window Length

    No single value of N is entirely satisfactory.

    This is because the duration of a pitch period varies from about 2 ms for a high-pitched female or a child, up to 25 ms for a very low-pitched male.


    Sampling Rate

    The bandwidth of both En and Mn is just that of the lowpass filter.

    So, they need not be sampled as frequently as speech signals.

    For example:

        Frame size = 20 ms

        Sample period = 10 ms
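
    The example figures can be turned into sample counts once a speech sampling rate is fixed. The slides do not give one, so the 8000 Hz used below is purely an assumption for illustration.

```python
sample_rate_hz = 8000          # assumed speech sampling rate (not from the slides)
frame_ms = 20                  # frame size from the example
hop_ms = 10                    # sample period of En and Mn from the example

frame_len = sample_rate_hz * frame_ms // 1000    # samples per frame
hop_len = sample_rate_hz * hop_ms // 1000        # samples between frame starts
feature_rate_hz = 1000 / hop_ms                  # En / Mn values per second
overlap_fraction = 1 - hop_len / frame_len       # shared portion of adjacent frames

print(frame_len, hop_len, feature_rate_hz, overlap_fraction)
```

    Under this assumption, En and Mn are produced 100 times per second instead of 8000, and adjacent frames overlap by half.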


    Main Applications of En and Mn

    To provide the basis for distinguishing voiced speech segments from unvoiced segments.

    Silence detection.


    Differences of En and Mn

    Short-time energy:

    $$E_n = \sum_{m=-\infty}^{\infty} [x(m)\, w(n-m)]^2$$

    Squaring emphasizes large sample-to-sample variations in x(n).

    Short-time average magnitude:

    $$M_n = \sum_{m=-\infty}^{\infty} |x(m)|\, w(n-m)$$

    The dynamic range (max/min) of Mn is approximately the square root of that of En.

    The differences in level between voiced and unvoiced regions are not as pronounced as with En.
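
    The square-root relationship between the two dynamic ranges can be illustrated with two artificial frames. This is a toy sketch with made-up constant-magnitude frames and a rectangular window; for real speech the relationship holds only approximately.

```python
import math


def frame_energy(frame):
    """Short-time energy of one frame with a rectangular window."""
    return sum(v * v for v in frame)


def frame_magnitude(frame):
    """Short-time average magnitude of one frame with a rectangular window."""
    return sum(abs(v) for v in frame)


loud = [1.0, -1.0] * 4       # made-up high-level frame
quiet = [0.1, -0.1] * 4      # made-up low-level frame, 10 times smaller

energy_ratio = frame_energy(loud) / frame_energy(quiet)
magnitude_ratio = frame_magnitude(loud) / frame_magnitude(quiet)
print(energy_ratio, magnitude_ratio)
```

    A tenfold difference in amplitude appears as roughly a factor of 100 in energy but only a factor of 10 in average magnitude.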


    FIR and IIR

    All the windows that we have discussed are FIR filters.

    Each of them is a lowpass filter.

    The window can also be an IIR filter.


    IIR Example

    $$h(n) = \begin{cases} a^n, & n \ge 0 \\ 0, & n < 0 \end{cases}$$

    $$H(z) = \frac{1}{1 - a z^{-1}}$$

    Recursive formulas:

    Short-Time Energy:

    $$E_n = a E_{n-1} + x^2(n)$$

    Short-Time Average Magnitude:

    $$M_n = a M_{n-1} + |x(n)|$$
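
    The recursive formulas need only one stored value per measurement. The sketch below assumes zero initial conditions and a decay factor 0 < a < 1 for a stable filter. Neither assumption is stated in the slides, and the function names are hypothetical.

```python
def recursive_energy(x, a):
    """E_n = a * E_{n-1} + x(n)^2, starting from E_{-1} = 0."""
    out = []
    prev = 0.0
    for sample in x:
        prev = a * prev + sample * sample
        out.append(prev)
    return out


def recursive_magnitude(x, a):
    """M_n = a * M_{n-1} + |x(n)|, starting from M_{-1} = 0."""
    out = []
    prev = 0.0
    for sample in x:
        prev = a * prev + abs(sample)
        out.append(prev)
    return out


x = [1.0, 0.0, 0.0, 2.0]
a = 0.5                                   # assumed decay factor, 0 < a < 1
energy = recursive_energy(x, a)
magnitude = recursive_magnitude(x, a)

# Same energy from the exponential weighting h(n) = a^n written as a direct sum.
direct = [sum(a ** (n - m) * x[m] ** 2 for m in range(n + 1))
          for n in range(len(x))]
print(energy, magnitude, direct)
```

    The recursion and the direct exponentially weighted sum give the same values, but the recursion costs one multiply and one add per sample.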


    Time-Domain Methods for Speech Processing

    Short-Time Average Zero-Crossing Rate


    Voiced and Unvoiced Signals

    Voiced: Th/i/s

    Unvoiced: Thi/s/


    The Short-Time Average Zero-Crossing Rate

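    $$Z_n = \sum_{m=-\infty}^{\infty} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| w(n-m)$$

    where

    $$\operatorname{sgn}[x(m)] = \begin{cases} 1, & x(m) \ge 0 \\ -1, & x(m) < 0 \end{cases}$$

    $$w(m) = \begin{cases} \dfrac{1}{2N}, & 0 \le m \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

    [Block diagram: x(n) → sgn[ ] → First Difference → | | → Lowpass Filter → Zn]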
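
    A direct evaluation of the zero-crossing rate with the 1/(2N) rectangular window might look like the following sketch. The function names are hypothetical, and the term for m = 0 is skipped because x(-1) is not available, which is an assumption of this sketch.

```python
def sgn(value):
    """sgn[x] = 1 for x >= 0 and -1 for x < 0."""
    return 1 if value >= 0 else -1


def zero_crossing_rate(x, N):
    """Z_n = sum_m |sgn x(m) - sgn x(m-1)| * w(n - m), with w = 1/(2N) on 0..N-1.

    Each sign change contributes |±2| * 1/(2N) = 1/N, so Z_n is the number of
    crossings in the window divided by N.
    """
    rates = []
    for n in range(len(x)):
        total = 0
        for m in range(max(1, n - N + 1), n + 1):
            total += abs(sgn(x[m]) - sgn(x[m - 1]))
        rates.append(total / (2 * N))
    return rates


x = [1, -1, 1, -1, 1, 1, 1, 1]      # rapid sign changes, then none
rates = zero_crossing_rate(x, N=4)
print(rates)
```

    The rate peaks while the window covers the alternating samples and falls off as the window moves into the constant-sign part.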


    Distribution of Zero-Crossings


    Example


    Block diagram of the voiced/unvoiced classification.


    Frame-by-frame processing of the speech signal.


    Zero-Crossing Rate

    A zero crossing is said to occur if successive samples have different algebraic signs.

    The rate at which zero crossings occur is a simple measure of the frequency content of a signal.

    The zero-crossing rate is a measure of the number of times in a given time interval or frame that the amplitude of the speech signal passes through a value of zero.


    Distribution of zero-crossings for unvoiced and voiced speech


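    A definition for the zero-crossing rate is:

    $$Z_n = \sum_{m=-\infty}^{\infty} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| w(n-m)$$

    where

    $$\operatorname{sgn}[x(m)] = \begin{cases} 1, & x(m) \ge 0 \\ -1, & x(m) < 0 \end{cases}$$

    and

    $$w(m) = \begin{cases} \dfrac{1}{2N}, & 0 \le m \le N-1 \\ 0, & \text{otherwise} \end{cases}$$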


    Original speech signal for the word "four".


    Summary

    The energy of a speech signal is a parameter for classifying its voiced and unvoiced parts.

    The voiced part of speech has high energy because of its periodicity, and the unvoiced part has low energy.


    Zero-crossing rate is an important parameter for voiced/unvoiced classification.

    It is also used as a part of the front-end processing in automatic speech recognition systems.

    Voiced speech is produced by excitation of the vocal tract by the periodic flow of air at the glottis, and usually shows a low zero-crossing count.

    Unvoiced speech is produced when the vocal tract is constricted narrowly enough to cause turbulent airflow, which results in noise and shows a high zero-crossing count.


    The results suggest that zero-crossing rates are low for the voiced part and high for the unvoiced part, whereas the energy is high for the voiced part and low for the unvoiced part.
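
    As a toy illustration of this conclusion, a frame can be labeled by thresholding its energy and zero-crossing rate. This is not the classifier from the slides. The thresholds, the 8000 Hz sampling rate, and the synthetic test frames are all assumptions made for this sketch, and the rate here is counted as crossings per adjacent sample pair.

```python
import math
import random

ENERGY_THRESHOLD = 10.0    # hypothetical threshold, chosen for this toy data
ZCR_THRESHOLD = 0.2        # hypothetical threshold, crossings per sample pair


def frame_energy(frame):
    return sum(v * v for v in frame)


def frame_zcr(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    crossings = sum(1 for prev, cur in zip(frame, frame[1:])
                    if (prev >= 0) != (cur >= 0))
    return crossings / (len(frame) - 1)


def classify(frame):
    """Label a frame 'voiced' if energy is high and zero-crossing rate is low."""
    if frame_energy(frame) > ENERGY_THRESHOLD and frame_zcr(frame) < ZCR_THRESHOLD:
        return "voiced"
    return "unvoiced"


fs = 8000                  # assumed sampling rate
n_samples = 160            # 20 ms at the assumed rate

# Synthetic stand-ins: a strong 100 Hz sinusoid and weak uniform noise.
voiced_like = [math.sin(2 * math.pi * 100 * n / fs) for n in range(n_samples)]
rng = random.Random(0)
unvoiced_like = [rng.uniform(-0.1, 0.1) for _ in range(n_samples)]

print(classify(voiced_like), classify(unvoiced_like))
```

    Real speech would need thresholds tuned on data, and a silent frame would also fall into the "unvoiced" label here because this sketch has no separate silence class.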