Demo End of Speech




     PCS Research & Advanced Technology Labs

    Speech Lab

    How to deal with the noise in real systems?

    Hsiao-Chun Wu

    Motorola PCS Research and Advanced

    Technology Labs, Speech Laboratory

    [email protected]

    Phone: (815) 884-3071

November 14, 2000

    Why do we need to study noise?

Noise exists everywhere, and it degrades the performance of signal processing in real systems. Since noise cannot be avoided by system engineers, modern “noise-processing” technology has been researched and designed to overcome this problem. Hence many related research areas have emerged, such as signal detection, signal enhancement/noise suppression, and channel equalization.


How to deal with noise? Cut it off!

• Spectral Truncation
  – Spectral Subtraction
• Time Truncation
  – Signal Detection
• Spatial and/or Temporal Filtering
  – Equalization
  – Array Signal Separation (Blind Source Separation)

Spectral truncation (spectral subtraction):

  R(f) = S̃(f) = S(f) + N(f),   Ŝ(f) = S̃(f) − Ñ(f) ≈ S(f)

Time truncation (signal detection):

  r(τ) ≈ n(τ),  τ ∈ T_noise

Spatial/temporal filtering:

  r(t) = s̃(t) = w(t) ∗ h(t) ∗ s(t) ≈ s(t),   R(f) = S̃(f) = W(f) H(f) S(f) ≈ S(f)
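As a concrete illustration of the spectral-truncation idea, here is a minimal magnitude-domain spectral subtraction sketch in Python, assuming the noise spectrum is estimated by averaging noise-only frames; the function name and frame handling are illustrative, not part of the original system:

```python
import numpy as np

# Hedged sketch of magnitude spectral subtraction: estimate the noise
# magnitude spectrum from noise-only frames, subtract it from the noisy
# spectrum, floor at zero, and reuse the noisy phase.
def spectral_subtraction(noisy_frame, noise_frames):
    N = np.fft.rfft(np.asarray(noise_frames), axis=-1)   # spectra of noise-only frames
    noise_mag = np.abs(N).mean(axis=0)                   # average noise magnitude
    X = np.fft.rfft(noisy_frame)                         # noisy speech spectrum
    clean_mag = np.maximum(np.abs(X) - noise_mag, 0.0)   # subtract, half-wave rectify
    # Resynthesize with the noisy phase (standard in spectral subtraction)
    return np.fft.irfft(clean_mag * np.exp(1j * np.angle(X)), n=len(noisy_frame))
```

With a noise-only input and a matching noise estimate, the output is driven to zero, which is exactly the truncation the slide describes.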


Session 1. On-line Automatic End-of-speech Detection Algorithm (Time Truncation)

1. Project goal.
2. Review of current methods.
3. Introduction to the voice-metric-based end-of-speech detector.


    1. Project Goal:

    • Problem

    – Digit-dial recognition with unknown digit string length

    • Solution 1

– fixed-length window such as 10 seconds? (inconvenient for users)

    • Solution 2

    – Dynamic termination of data capture? (need a robust detection

    algorithm)


• Research and design a robust dynamic termination mechanism for speech recognizers.

    – a new on-line automatic end-of-speech detection algorithm with small

    computational complexity.

    • Design a more robust front end to improve the recognition accuracy for

    speech recognizers.

– the new algorithm also reduces excessive feature extraction from redundant noise.


2. Review of Current Methods:

Most speech detection algorithms fall into three categories.

• Frame energy detection
  – short-term frame energy (20 msec) can be used for speech/noise classification.
  – it is not robust at high background noise levels.
• Zero-crossing rate detection
  – short-term zero-crossing rate can also be used for speech/noise classification.
  – it is not robust across a wide variety of noise types.
• Higher-order-spectral detection
  – short-term higher-order spectra can be used for speech/noise classification.
  – it implies a heavy computational complexity, and its threshold is difficult to pre-determine.
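The first two feature types above are cheap enough to sketch directly; this is an illustrative Python version (frame length and decision thresholds omitted):

```python
import numpy as np

# Illustrative versions of the two classic short-term features from the
# review: frame energy and zero-crossing rate over one frame (e.g. 20 ms).
def frame_energy(frame):
    frame = np.asarray(frame, dtype=float)
    return float(np.sum(frame ** 2) / len(frame))   # mean-square energy

def zero_crossing_rate(frame):
    frame = np.asarray(frame, dtype=float)
    signs = np.sign(frame)
    signs[signs == 0] = 1                           # treat exact zeros as positive
    return float(np.mean(signs[1:] != signs[:-1]))  # fraction of sign changes
```

Their low cost is why these features were the historical first choice, despite the robustness problems listed above.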


3. Introduction to Voice Metric Based End-of-speech Detector:

• End-of-speech detection using voice metric features is based on the Mel-energies. Voice metric features are robust over a wide variety of background noise. Originally, the voice-metric-based speech/noise classifier was applied in the IS-127 CELP speech coder standard. We modify and enhance the voice-metric features to design a new end-of-speech detector for the Motorola voice recognition front end (VR LITE III).
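A minimal sketch of the stopping rule such a detector could use, assuming a per-frame score compared against a threshold and a silence-duration limit (the 1.85-second silence threshold appears in the simulation section; the frame period, names, and logic here are hypothetical, not the VR LITE III implementation):

```python
# Hedged sketch of an end-of-speech stopping rule: once speech has started,
# data capture stops after the per-frame score stays below a threshold for a
# given silence duration. The score stands in for the voice metric.
def end_of_speech(scores, threshold, frame_period=0.02, silence_limit=1.85):
    silence = 0.0
    started = False
    for i, s in enumerate(scores):
        if s >= threshold:
            started, silence = True, 0.0   # speech frame: reset silence counter
        elif started:
            silence += frame_period        # noise frame after speech started
            if silence >= silence_limit:
                return i                   # frame index where capture stops
    return None                            # no end of speech detected
```

The counter-reset on every speech frame is what makes the rule tolerate pauses between digits shorter than the silence limit.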


[Figure: voice metric score table]



[Block diagram of the end-of-speech detector added to the original VR LITE front end: raw data, FFT, Mel spectrum, voice metric (voice metric scores), SNR estimate, pre- and post-S/N classifiers, EOS buffer, threshold adaptation, speech-start decision, and silence duration threshold; when the end-of-speech detector outputs “yes,” data capture stops.]


[Flow chart: speech input → segmentation of speech into frames → front end with end-of-speech detector → feature vector → frame buffer → VR LITE recognition engine; for each frame i, if end of speech is detected (“yes”) data capture terminates, otherwise (“no”) processing continues with the next frame i+1.]


[Figure: example captured waveforms, 6.51 seconds and 3.7 seconds.]


[Figure: waveform of the string “2-2-9-1-7-8” in a car at 55 mph (time axis in seconds), marking the end point, a correct detection with its correct-detection time error, and a false detection with its false-detection time error.]


4. Simulation Results: (Simulation is done over the Motorola digit-string database, including 16 speakers and 15,166 variable-length digit strings in 7 different conditions. The silence threshold is 1.85 seconds.)

A. Receiver Operating Curve (ROC): The ROC curve is the relationship between the end-of-speech detection rate and the false (early) detection rate. We compare two different methods, namely, (1) the new voice-metric based end-of-speech detector and (2) the old speech/noise flag based end-of-speech detector.
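The two rates on the ROC axes can be tallied per utterance as follows (a hedged sketch; the outcome labels are hypothetical bookkeeping, not the deck's scoring code):

```python
# One ROC point from labeled per-utterance outcomes: the detection rate counts
# every utterance where the detector fired at all, while the false (early)
# detection rate counts only firings before the true end point.
def roc_point(outcomes):
    # outcomes: list of "correct", "false", or "miss" per utterance (hypothetical labels)
    n = len(outcomes)
    detect_rate = sum(o in ("correct", "false") for o in outcomes) / n
    false_rate = sum(o == "false" for o in outcomes) / n
    return detect_rate, false_rate
```

Sweeping the detector's threshold and re-tallying gives the full curve.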


[Figure: ROC curve, detection rate (%) versus false detection rate (%).]


• B. String-accuracy-convergence (SAC) curve: The SAC curve is the relationship between the string recognition accuracy and the false (early) detection rate. We compare two different methods, namely, (1) the new voice-metric based end-of-speech detector and (2) the old speech/noise flag based end-of-speech detector.


[Figure: SAC curve, string recognition accuracy (%) versus false detection rate (%).]


C. Table of detection results: (This table illustrates the result among the Madison sub-database, including data files with 1.85 seconds or more of silence after end of speech.)

Condition | Average Time Error | Average False-Detection Time Error | Average Correct-Detection Time Error | False Detection Rate | String Numbers | Total Detection Rate
Overall | 1.98 sec | 1.68 sec | 1.85 sec | 0.47% | 7,418 | 86.08%
Office Close-talk | 1.97 sec | 0 sec | 1.93 sec | 0% | 907 | 9…


(This table illustrates the result over the small database collected by Motorola PCS CSSRL. All digit strings are recorded in a 15-second fixed window.)

Condition | Average Time Error | Average False-Detection Time Error | Average Correct-Detection Time Error | False Detection Rate | String Numbers | Total Detection Rate | String Recognition Accuracy (w/ EOS) | String Recognition Accuracy (w/o EOS)
Overall | 1.82 seconds | 0 seconds | 1.82 seconds | 0% | 121 | 96.69% | 50.41% | 29.75%
Office Close-talk | 1.5 seconds | 0 seconds | 1.5 seconds | 0% | 21 | 100% | 66.67% | 61.90%
Office Arm-length | 1.? seconds | 0 seconds | 1.? seconds | 0% | 20 | 100% | 65.00% | 65.00%
Café Close-talk | 1.76 seconds | 0 seconds | 1.76 seconds | 0% | … | … | … | …


Analysis of the Simulation Result: Why didn't EOS detection work well in babble noise?


Optimal Detection Decision

• Bayes classifier
• Likelihood Ratio Test

  L(x) = log f_s(x|H_s) − log f_n(x|H_n)

  Decide H_s (speech) if L(x) ≥ T_Bayes; otherwise decide H_n (noise).
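As a toy numeric instance of the likelihood ratio test, take both conditional densities to be zero-mean Gaussians with different variances (an assumption for illustration; the slide does not fix the densities):

```python
import math

# Log-likelihood ratio test with Gaussian hypotheses: speech frames are modeled
# as higher-variance than noise frames (sigma values are illustrative).
def log_gauss(x, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - x ** 2 / (2 * sigma ** 2)

def lrt_decide(x, sigma_s=3.0, sigma_n=1.0, T=0.0):
    L = log_gauss(x, sigma_s) - log_gauss(x, sigma_n)   # log f_s(x) - log f_n(x)
    return "speech" if L >= T else "noise"
```

Large-amplitude samples favor the high-variance (speech) hypothesis; samples near zero favor noise.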


Digit “one” in close-talking mic, quiet office


Digit “one” in hands-free mic, 55 mi/h car


Digit “one” in far-talking mic, cafeteria


5. Conclusion:

• The new voice-metric based end-of-speech detector is robust over a wide variety of background noise.
• The new voice-metric based end-of-speech detector brings only a small increase in computational complexity and can be implemented in real time.
• The new voice-metric based end-of-speech detector can improve recognition performance by discarding the extra noise due to the fixed data capture window.
• The new voice-metric based end-of-speech detector needs further improvement in the babble noise environment.


Session 2. Speech Enhancement Algorithms: Blind Source Separation Methods (Spatial and Temporal Filtering)

1. Motivation and research goal.
2. Statement of the “blind source separation” problem.
3. Principles of blind source separation.
4. Criteria for blind source separation.
5. Application to blind channel equalization for digital communication systems.
6. Simulation and comparison.
7. Summary and conclusion.


1. Motivation:

• Mimic the human auditory system to differentiate the subject signals from other sounds, such as interfering sources and background noise, for clear recognition of the subject contents.
• ‘One of the most striking facts about our ears is that we have two of them--and yet we hear one acoustic world; only one voice per speaker.’ (E. C. Cherry and W. K. Taylor. Some further experiments on the recognition of speech, with one and two ears. Journal of the Acoustic Society of America, 26:554-559, 1954)
• The ‘‘cocktail party effect’’--the ability to focus one's listening attention on a single talker among a cacophony of conversations and background noise--has been recognized for some time. This specialized listening ability may be because of characteristics of the human speech production system, the auditory system, or high-level perceptual and language processing.


Research Goal:

Design a preprocessor with digital signal processing speech enhancement algorithms. The input signals are collected through multiple sensor (microphone) arrays. After the embedded signal processing algorithms run, we obtain clearly separated signals at the output.


[Diagram: Audio Input → Blind Source Separation Algorithms → Enhanced Output]


2. Problem Statement of Blind Source Separation:

What is “Blind Source Separation”?

[Diagram: Signal 1 … Signal M arriving at Sensor 1 … Sensor N as the received input signals.]

Given the N linearly mixed received input signals, we need to recover the M statistically independent sources as much as possible (N ≥ M).


Formulation of Blind Source Separation Problem:

A received signal vector from the array, X(t), is the original source vector S(t) passed through the channel distortion H(t), such that X(t) = H(t) ⊗ S(t), where

  X(t) = [x_1(t), …, x_N(t)]^T,   S(t) = [s_1(t), …, s_M(t)]^T

and H(t) is the N × M matrix of channel responses h_ij(t), 1 ≤ i ≤ N, 1 ≤ j ≤ M.

We need to estimate a separator W(t), an N × N matrix of filters w_pq(t), such that

  Ŝ(t) = [ŝ_1(t), …, ŝ_M(t), 0, …, 0]^T = W(t) ⊗ X(t),  with ŝ_i(t) ≈ s_i(t).
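The memoryless special case of this model is easy to sketch: with an instantaneous 2 × 2 mixture and H known, W = H⁻¹ separates exactly; blind methods must find such a W without knowing H (the mixing matrix below is hypothetical):

```python
import numpy as np

# Instantaneous (no-convolution) special case of X = H ⊗ S with N = M = 2.
rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(2, 1000))   # two independent sources
H = np.array([[1.0, 0.6],
              [0.4, 1.0]])               # hypothetical mixing matrix
X = H @ S                                # sensor (received) signals
W = np.linalg.inv(H)                     # ideal separator for this toy case
S_hat = W @ X                            # recovered sources
```

The whole difficulty of BSS is reaching this W from X alone, using only the statistical independence of the sources.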


3. Principles of Blind Source Separation:

The independence measurement: Shannon's Mutual information.

  I(y_1, …, y_N) = Σ_{i=1}^{N} H(y_i) − H(y_1, …, y_N) ≥ 0

  I(y_1, …, y_N) = E[log f(y_1, …, y_N)] − Σ_{i=1}^{N} E[log f_i(y_i)]
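The identity I = Σ H(y_i) − H(y_1, …, y_N) ≥ 0, with equality exactly at independence, can be checked numerically on a toy discrete example (fair bits, fully dependent versus independent):

```python
import math

# Discrete Shannon entropy in bits.
def entropy(p):
    return -sum(q * math.log2(q) for q in p if q > 0)

# Two identical fair bits (fully dependent): joint outcomes (0,0), (1,1), each 0.5
H_joint_dep = entropy([0.5, 0.5])                                     # 1 bit
I_dep = entropy([0.5, 0.5]) + entropy([0.5, 0.5]) - H_joint_dep       # 1 bit

# Two independent fair bits: four joint outcomes, each 0.25
H_joint_ind = entropy([0.25] * 4)                                     # 2 bits
I_ind = entropy([0.5, 0.5]) + entropy([0.5, 0.5]) - H_joint_ind       # 0 bits
```

Driving this mutual information of the separator outputs to zero is what the criteria on the next slide aim at.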


4. Criteria to Separate Independent Sources:

• Constrained Entropy (Wu, IJCNN99):
  – J = −log|det(W)| − Σ_{i=1}^{N} log f_i(y_i)
• Hadamard Measure (Wu, ICA99):
  – J = log det[diag(Y Y^T)] − log det(Y Y^T)
• Frobenius Norm (Wu, NNSP97):
  – J = ‖Y Y^T − diag(Y Y^T)‖_F^2
• Quadratic Gaussianity (Wu, NNSP99):
  – J = Σ_{i=1}^{N} ∫ (f_i(y_i) − f_G(y_i))^2 dy_i


5. Application to Blind Single Channel Equalization for Digital Communication Systems:

We apply the minimization of the modified constrained entropy to adapt an equalizer w(t) = [w_0, w_1, …] for a digital channel h(t). Assume a PAM signal constellation with symbols s(t), passing through a digital channel

  h(t) = [c(t, 0.11) + 0.8 c(t−1, 0.11) − 0.4 c(t−3, 0.11)] W_6T(t),

where c(t, β) is the raised-cosine function with roll-off factor β and W_6T(t) = rect(t/6T) is a rectangular window. The input signal to the equalizer is

  x(t) = h(t) ∗ s(t) + n(t),

where n(t) is the background noise. We applied generalized anti-Hebbian learning to adapt w(t) such that

  w(t) ∗ h(t) ≈ δ(t − τ).
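The goal w(t) ∗ h(t) ≈ δ(t − τ) can be illustrated on a toy FIR channel by fitting a least-squares (zero-forcing style) equalizer; the channel taps below are illustrative minimum-phase values, not the slide's raised-cosine channel, and the least-squares fit stands in for the anti-Hebbian adaptation:

```python
import numpy as np

h = np.array([1.0, 0.8, 0.15])     # illustrative minimum-phase channel taps
L, delay = 24, 4                   # equalizer length and target delay tau
# Convolution matrix: column i holds h shifted by i, so C @ w == h * w
C = np.zeros((L + len(h) - 1, L))
for i in range(L):
    C[i:i + len(h), i] = h
d = np.zeros(L + len(h) - 1)
d[delay] = 1.0                     # desired combined response: delta at the delay
w, *_ = np.linalg.lstsq(C, d, rcond=None)
combined = C @ w                   # w * h, approximately a delayed delta
```

The combined response has a near-unit tap at the delay and near-zero residual intersymbol interference elsewhere, which is exactly the δ(t − τ) condition.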


[Figure: signal-to-interference ratio (dB) versus signal-to-noise ratio (dB).]


[Figure: bit error rate versus signal-to-noise ratio (dB).]


6. Simulation and Comparison:

We compare simulation results among our generalized anti-Hebbian learning, the SDIF algorithm, and Lee's Infomax method (Lee, IJCNN97) over three real recordings downloaded from the Salk Institute, University of California at San Diego.


New VR LITE Front-end: Blind Source Separation + End-of-speech Detection

Schemes | Average Detection Time Error | Average False-Detection Time Error | Average Correct-Detection Time Error | Number of Strings | False Detection Rate | Total Detection Rate
EOS only | 0.256 seconds | 0.155 seconds | 0.317 seconds | 1? | 7.1 | …


7. Conclusion and Future Research:

• The computational complexity of blind source separation needs to be reduced.
• Test BSS for EOS detection under microphone arrays of the same kind.
• Incorporate other array signal processing (beamformer?) techniques to improve speech detection and recognition.