_________________________ Speech and Music Discrimination using Gaussian Mixture Model
Seminar Program
Project Team: Dr. Deep Sen (Supervisor), CHOI Arthur, Tsz Kin (3015809), CHENG Derek, Ka Chun (3015631)
Slide 2
_________________________ Speech and Music Discrimination using
GMM
Slide 3
_________________________ Motivations
- Much research has been done on HMMs, but comparatively little on GMMs
- GMMs reduce complexity compared with HMMs
- Our feature extraction methods will also reduce complexity
- Multimedia file search and storage are still under development
- Fits the University requirement
Slide 4
_________________________ Speech and Music Discrimination using
GMM
_________________________ Speech and Music Discrimination using GMM
Approaches
- Deterministic signals can be analysed as completely specified functions of time
- Non-deterministic (random) signals must be analysed probabilistically
[Tele3013 notes]
Slide 7
_________________________ Speech and Music Discrimination using GMM
Procedures
1. Read a signal
2. Segment it into small frames
3. Extract features from each frame
4. Classify each frame
Slide 8
_________________________ Speech and Music Discrimination using
GMM Feature Extractions
Slide 9
_________________________ Speech and Music Discrimination using
GMM Classification
Slide 10
_________________________ [Figure labels: silence, music, speech]
Slide 11
_________________________ Speech and Music Discrimination using GMM
Segmentation
Reasons
- Get a better estimation result
- Achieve real-time behaviour
Problems and solutions
- Frames too big: classification accuracy decreases
- Frames too small: feature extraction accuracy decreases
- Chosen frame size: ~20 ms
[Figure: music signal]
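The segmentation step can be sketched as follows. This is a minimal Python illustration, not the project's Matlab code; the function name `split_into_frames`, the 8 kHz sample rate and the choice of non-overlapping frames are assumptions made for the example.

```python
def split_into_frames(samples, sample_rate, frame_ms=20.0):
    """Split a sequence of samples into consecutive, non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# One second of (silent) audio at an assumed 8 kHz sample rate.
signal = [0.0] * 8000
frames = split_into_frames(signal, sample_rate=8000)
```

At 8 kHz a 20 ms frame holds 160 samples, so one second of audio yields 50 frames; each frame is then passed to the feature extraction stage.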
Slide 12
_________________________ Speech and Music Discrimination using
GMM
Slide 13
_________________________ 4 Hz modulation energy
Speech energy has a characteristic energy modulation peak around the 4 Hz syllabic rate. [Houtgast & Steeneken 1985]
Reasons
- Accurately separates speech signals and music signals (~94%)
- Easy to implement in Matlab
- Novel and robust
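One simple way to measure this feature is to take the short-time energy of each 20 ms frame as an envelope and ask how much of the envelope's spectral energy sits at 4 Hz. The Python sketch below does that with a direct DFT; it is an illustration under assumptions (8 kHz sample rate, non-overlapping frames, a synthetic amplitude-modulated tone as test input, and the hypothetical helper names shown), not the method used in the project.

```python
import cmath
import math


def frame_energies(samples, frame_len):
    """Short-time energy: sum of squared samples in each non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [sum(s * s for s in samples[i * frame_len:(i + 1) * frame_len])
            for i in range(n_frames)]


def modulation_energy_ratio(envelope, envelope_rate, target_hz=4.0):
    """Fraction of the mean-removed envelope's spectral energy at target_hz."""
    n = len(envelope)
    mean = sum(envelope) / n
    centred = [e - mean for e in envelope]
    powers = []
    for k in range(1, n // 2 + 1):  # DFT bins above DC, up to Nyquist
        acc = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                  for t, c in enumerate(centred))
        powers.append(abs(acc) ** 2)
    total = sum(powers)
    target_bin = int(round(target_hz * n / envelope_rate))
    if total == 0.0 or not 1 <= target_bin <= n // 2:
        return 0.0
    return powers[target_bin - 1] / total


def am_tone(mod_hz, sample_rate=8000, seconds=2.0, carrier_hz=1000.0):
    """Synthetic test signal: a carrier whose amplitude varies at mod_hz."""
    n = int(sample_rate * seconds)
    return [(1.0 + 0.8 * math.sin(2.0 * math.pi * mod_hz * t / sample_rate))
            * math.sin(2.0 * math.pi * carrier_hz * t / sample_rate)
            for t in range(n)]


FRAME_LEN = 160  # 20 ms at 8 kHz, so the envelope is sampled at 50 Hz
ratio_4hz = modulation_energy_ratio(frame_energies(am_tone(4.0), FRAME_LEN), 50.0)
ratio_10hz = modulation_energy_ratio(frame_energies(am_tone(10.0), FRAME_LEN), 50.0)
```

A signal modulated at the syllabic rate gives a ratio close to 1, while one modulated at 10 Hz gives a ratio close to 0, which is the contrast the feature relies on.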
Slide 14
_________________________ Speech and Music Discrimination using GMM
[Figure panels: Music Signal, Speech Signal]
Slide 15
_________________________ Speech and Music Discrimination using
GMM
Slide 16
_________________________ [Figure: Energy vs. Time; panels: Music Signal, Speech Signal]
Slide 17
_________________________ Speech and Music Discrimination using GMM
Zero-Crossing Count (ZCC)
The zero-crossing count is the total number of times that a signal crosses the x-axis (changes sign) within a given time interval.
- Speech signals: high ZCC
- Music signals: low ZCC
Reasons
- The ZCC of a speech signal is significantly higher
- Very easy to implement in Matlab
- Mature and robust
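The zero-crossing count of one frame can be computed as in the Python sketch below. Treating a sample of exactly zero as non-negative is a convention chosen for this example, and the sine frames are synthetic test inputs rather than real speech or music.

```python
import math


def zero_crossing_count(frame):
    """Count sign changes between consecutive samples (zero counts as non-negative)."""
    count = 0
    for prev, cur in zip(frame, frame[1:]):
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count


def sine_frame(freq_hz, sample_rate=8000, n_samples=160):
    """A 20 ms frame of a pure tone at an assumed 8 kHz sample rate."""
    return [math.sin(2.0 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


low = zero_crossing_count(sine_frame(100.0))
high = zero_crossing_count(sine_frame(1000.0))
```

The count grows with the dominant frequency of the frame, so the 1000 Hz frame gives a much larger value than the 100 Hz frame.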
Slide 18
_________________________ Speech and Music Discrimination using
GMM
Slide 19
_________________________
Slide 20
_________________________ Spectral Roll-off Point
The spectral roll-off point measures the skewness of the spectrum.
Reasons
- Music usually has more energy in the high frequency range
- Useful for separating different kinds of speech later
Slide 21
_________________________ Speech and Music Discrimination using GMM
Spectral Roll-off Point
Spectral Roll-off Point = SR [equation not preserved in the slide text]
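A commonly used definition, assumed here because the slide's own formula is not reproduced in the text, takes the roll-off point as the lowest frequency below which a fixed fraction of the total spectral magnitude lies. The Python sketch below uses a direct DFT and a fraction of 0.95; both the fraction and the function names are assumptions for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed with a direct (slow) DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_rolloff_hz(frame, sample_rate, fraction=0.95):
    """Lowest bin frequency at which the cumulative magnitude reaches `fraction` of the total."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0
    threshold = fraction * total
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= threshold:
            return k * sample_rate / len(frame)
    return (len(mags) - 1) * sample_rate / len(frame)


def tone(freq_hz, sample_rate=8000, n_samples=160):
    """Synthetic 20 ms test frame containing a single tone."""
    return [math.sin(2.0 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


rolloff_low = spectral_rolloff_hz(tone(500.0), 8000)
rolloff_high = spectral_rolloff_hz(tone(3000.0), 8000)
```

A frame whose energy sits in the high frequencies has a higher roll-off point than one dominated by low frequencies, which is what makes the feature useful for separating signal classes.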
Slide 22
_________________________ Speech and Music Discrimination using GMM
[Figure: power vs. frequency; panels: Music Signal, Speech Signal]
Slide 23
_________________________ Speech and Music Discrimination using GMM
Entropy Modulation
Music appears to be more ordered than a speech signal [J. Pinquier, J.L. Rouas, R. Andre-Obrecht 2002]
- Higher entropy means less order (more disorder)
- Higher dynamism means a higher rate of change
Reasons
- Accurately separates speech signals and music signals (~90%)
- Novel and robust
Slide 24
_________________________ Speech and Music Discrimination using GMM
[Figure panels: Music Signal, Speech Signal]
Slide 25
_________________________ Speech and Music Discrimination using
GMM [J. Ajmera, I.A. McCowan, H.Bourlard 2002]
Slide 26
_________________________ Speech and Music Discrimination using GMM
[Figure labels: instantaneous entropy, average entropy]
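The slides do not state how these quantities are computed. The Python sketch below assumes a posterior-based formulation, where each frame is described by a vector of class probabilities: the instantaneous entropy is the Shannon entropy of that vector, the average entropy is its mean over a window, and dynamism is the mean squared change of the vector between consecutive frames. All function names and the example probabilities are hypothetical.

```python
import math


def instantaneous_entropy(posteriors):
    """Shannon entropy, in bits, of one frame's probability vector."""
    return -sum(p * math.log2(p) for p in posteriors if p > 0.0)


def average_entropy(frames_posteriors):
    """Mean of the instantaneous entropy over a window of frames."""
    return (sum(instantaneous_entropy(p) for p in frames_posteriors)
            / len(frames_posteriors))


def dynamism(frames_posteriors):
    """Mean squared change of the probability vector between consecutive frames."""
    diffs = [sum((a - b) ** 2 for a, b in zip(p, q))
             for p, q in zip(frames_posteriors, frames_posteriors[1:])]
    return sum(diffs) / len(diffs)


# Illustrative (made-up) windows: one with confident, changing decisions,
# one with flat, unchanging probabilities.
confident = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
flat = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
```

In this formulation the flat window has the maximum entropy of 1 bit and zero dynamism, while the confident window has zero entropy and high dynamism.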
Slide 27
_________________________ Speech and Music Discrimination using GMM
Pulse Metric
The beat of a piece of music is one of the clearest features of the music. [K.D. Martin, E.D. Scheirer, B.L. Vercoe 1998]
Slide 28
_________________________ Speech and Music Discrimination using GMM
Other Features
- Spectral Centroid
- Spectral Flux
- Silence Ratio
- Short-Time Energy Ratio
- Volume Dynamic Change
- Number of Segments
- Segment Duration
- etc.
Slide 29
_________________________ Introduction to Gaussian Mixture Model (GMM)
- Differentiates speech and music from a sound source
- Used in speech processing, mostly for speech recognition, speaker identification and voice conversion
- Models densities and represents general spectral features
Slide 30
Why did we choose GMM?
- Low complexity
- Rate independence
- Bit scalability
- Short computation time
Slide 31
What is a Gaussian Mixture Model?
- A Gaussian Mixture Model consists of a set of local Gaussian modes and an integrating network.
- Different Gaussian distributions represent different domains of the feature space and have different output characteristics.
- A GMM tries to describe a complex system using a combination of all the Gaussian clusters, instead of using a single model.
Slide 32
Gaussian mixtures or clusters
- Used to describe a complex system instead of using a single model
- Represent a dataset by a set of means and covariances
Slide 33
Gaussian Mixture Model
A Gaussian Mixture Model with $M$ components is represented by:
$$p(\mathbf{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, b_i(\mathbf{x})$$
where
- $\mathbf{x}$ is the P-dimensional input vector
- $w_i$ are the mixture weights
- $b_i(\mathbf{x})$ are the component densities
Slide 34
Clustering
- Clustering is a technique from pattern classification
- A technique to group samples
- Each P-dimensional feature vector is considered as a point in space, and all points near each other are clustered together
Slide 35
Clustering
[Figure] The grey circle represents the variance of the distribution
Slide 36
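Gaussian component density
Each component density $b_i(\mathbf{x})$ is a P-variate Gaussian function of the form:
$$b_i(\mathbf{x}) = \frac{1}{(2\pi)^{P/2} \, |\Sigma_i|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{T} \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right)$$
where
- $\boldsymbol{\mu}_i$ is the mean vector
- $\Sigma_i$ is the covariance matrix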
Slide 37
Covariance matrix
- Indicates the dispersion of the distribution
- In mathematics, it is defined as the matrix whose ij-th element is the covariance of $x_i$ and $x_j$, for $i, j = 1, \dots, d$
Slide 38
Covariance matrix
- The diagonal components of the covariance matrix are the variances of the individual random variables
- The off-diagonal components are the covariances of two random variables, $x_i$ and $x_j$
- The matrix is symmetric
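These properties can be checked on a small example. The Python sketch below computes a sample covariance matrix; dividing by N - 1 (the unbiased estimate) and the toy data are choices made for this illustration.

```python
def covariance_matrix(vectors):
    """Sample covariance matrix (divides by N - 1) of a list of d-dimensional vectors."""
    n = len(vectors)
    d = len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(d)]
    return [[sum((v[i] - means[i]) * (v[j] - means[j]) for v in vectors) / (n - 1)
             for j in range(d)]
            for i in range(d)]


# Toy data in which the second variable is always twice the first.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov = covariance_matrix(data)
```

Here the diagonal holds the two variances (1 and 4), the off-diagonal entries hold the covariance of the two variables (2), and the matrix equals its own transpose.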
Slide 39
Full covariance matrix
- The most powerful Gaussian model, as it fits the data best
- Drawbacks: needs a lot of data to estimate the parameters, and is costly in high-dimensional feature spaces
Slide 40
Diagonal covariance matrix
- Good compromise between quality and model size
- Gaussian components can act together to model the overall probability density function
- Acting together, the components are capable of modelling the correlations between elements of the feature vector
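With a diagonal covariance matrix the determinant is the product of the variances and the quadratic form is a sum of per-dimension terms, so the component density is cheap to evaluate. A minimal Python sketch, working in the log domain to avoid underflow (the function name is hypothetical):

```python
import math


def log_gaussian_diag(x, mean, variances):
    """Log of a P-variate Gaussian density with diagonal covariance.

    `variances` holds the diagonal of the covariance matrix.
    """
    log_det = sum(math.log(v) for v in variances)
    mahalanobis = sum((xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, variances))
    return -0.5 * (len(x) * math.log(2.0 * math.pi) + log_det + mahalanobis)


# Standard normal in one dimension, evaluated at its mean.
peak = log_gaussian_diag([0.0], [0.0], [1.0])
```

The cost is linear in the dimension P, whereas a full covariance matrix requires a determinant and a matrix inverse.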
Slide 41
Review the Gaussian mixture density
- The mixture weights must satisfy the conditions $\sum_{i=1}^{M} w_i = 1$ and $w_i \ge 0$
- Three sets of parameters compose the Gaussian mixture density: mean vectors, covariance matrices and mixture weights
Slide 42
Expectation-maximization (EM)
- Estimates the mean vectors, covariance matrices and mixture weights
- Iteratively updates the distribution of each Gaussian component and the conditional probabilities
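A minimal Python sketch of standard EM for a one-dimensional mixture is given below, assuming a fixed number of components and hand-picked starting values. It illustrates the two alternating updates and is not the project's implementation; the synthetic data, seed and function names are assumptions.

```python
import math
import random


def gaussian_pdf(x, mean, variance):
    """Univariate Gaussian density."""
    return (math.exp(-0.5 * (x - mean) ** 2 / variance)
            / math.sqrt(2.0 * math.pi * variance))


def em_fit(data, weights, means, variances, n_iter=50):
    """Run EM and return the fitted parameters plus the log-likelihood history."""
    weights, means, variances = list(weights), list(means), list(variances)
    history = []
    for _ in range(n_iter):
        # E-step: conditional probability of each component given each sample.
        responsibilities = []
        log_likelihood = 0.0
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            log_likelihood += math.log(total)
            responsibilities.append([p / total for p in joint])
        history.append(log_likelihood)
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(len(weights)):
            n_j = sum(r[j] for r in responsibilities)
            weights[j] = n_j / len(data)
            means[j] = sum(r[j] * x for r, x in zip(responsibilities, data)) / n_j
            variances[j] = sum(r[j] * (x - means[j]) ** 2
                               for r, x in zip(responsibilities, data)) / n_j
    return weights, means, variances, history


random.seed(0)  # synthetic data drawn from two well-separated Gaussians
data = ([random.gauss(-2.0, 0.5) for _ in range(200)]
        + [random.gauss(3.0, 1.0) for _ in range(200)])
weights, means, variances, history = em_fit(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

The recorded log-likelihood never decreases from one iteration to the next, and the fitted means settle near the centres of the two generating distributions.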
Slide 43
Idea of Expectation-maximization
Instead of starting with a random configuration of all components and improving upon this configuration with expectation-maximization, we start with the optimal one-component mixture. We then repeat two steps until convergence:
i) Insert a new component
ii) Apply EM until convergence
Slide 44
Convergence Theorem
Because the sequence of likelihoods is monotonically increasing and bounded, the likelihood will converge to a local maximum.
Slide 45
EM algorithm
Let $L_k$ denote the log-likelihood of the dataset under the k-component mixture $f_k$.
1. Compute the optimal one-component mixture $f_1$. Set k=1
2. Find the optimal new component $\phi(\mathbf{x}; \theta^{*})$ and the corresponding mixture weight $\alpha^{*}$ while keeping $f_k$ fixed
Slide 46
EM algorithm
3. Set $f_{k+1}(\mathbf{x}) = (1 - \alpha^{*}) f_k(\mathbf{x}) + \alpha^{*} \phi(\mathbf{x}; \theta^{*})$ and k=k+1
4. Update $f_k$ with EM until convergence
Slide 47
Speech/music discrimination by using GMM
An interesting feature of GMMs is that the component densities of the mixture may represent:
- Different phonetic events, when modelling speech
- Different portions of the sound, when used to model the spectra of sound from a musical instrument
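One way to use GMMs for the discrimination itself is to train one mixture per class and assign a segment to the class whose mixture gives the higher total log-likelihood. The Python sketch below assumes diagonal covariances and two-dimensional feature vectors; the model parameters are made-up placeholders, not trained values from the project, and the function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, variances):
    """Log of a Gaussian density with diagonal covariance."""
    log_det = sum(math.log(v) for v in variances)
    mahalanobis = sum((xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, variances))
    return -0.5 * (len(x) * math.log(2.0 * math.pi) + log_det + mahalanobis)


def gmm_log_likelihood(x, gmm):
    """log p(x) for a GMM given as a list of (weight, mean, variances) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify(frames, models):
    """Return the label whose model gives the highest summed log-likelihood."""
    scores = {label: sum(gmm_log_likelihood(x, gmm) for x in frames)
              for label, gmm in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical models over an abstract 2-D feature vector.
models = {
    "speech": [(0.5, [60.0, 1.0], [100.0, 0.25]),
               (0.5, [40.0, 2.0], [100.0, 0.25])],
    "music": [(1.0, [15.0, 3.0], [50.0, 0.5])],
}

label_a, _ = classify([[55.0, 1.2], [45.0, 1.8]], models)
label_b, _ = classify([[14.0, 3.1]], models)
```

Summing log-likelihoods over the frames of a segment treats the frames as independent, which is a simplifying assumption of this sketch.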
Slide 48
Achievement
- Identified the optimized frame size
- Obtained robust features
- Performed a few tests
- Implemented some Matlab code
- Studied Gaussian Mixture Models (GMMs) and some of their mathematical expressions
Slide 49
Next year planning
- Comprehensive and more in-depth research on GMMs
- Model the sound source based on GMMs
- Evaluate the effect of noise
- Matlab implementation for speech/music separation
Slide 50
Next year planning
- Investigate a novel classification method: Support Vector Machine (SVM)
- Differentiate male and female speech
- Differentiate classical and non-classical music
- Generate a final thesis report
Slide 51
_________________________ Speech and Music Discrimination using
GMM
Slide 52
_________________________ Resources
- Internet, Microsoft Sound Recorder, Matlab
- Neural Networks for Pattern Recognition (Bishop 1996)
- Processing and Perception of Speech and Music (Morgan 2000)
- Research papers
Slide 53
_________________________ Speech and Music Discrimination using GMM
Management Plan
Dec-Feb 04: Matlab implementations
Investigate noise effect
Research on Support Vector Machine
Experiments
Jan 05: Separating classical and non-classical music
Feb 05: Separating male and female speech
Mar-Jun 05: Separate chamber music and orchestral music. Separate baby speech (if time permits)
Slide 54
Ben Gold, Nelson Morgan, Speech and Audio Signal Processing: Processing and Perception of Speech and Music (2000), John Wiley & Sons, Inc., USA.
Joseph F. Hair, Jr., Rolph E. Anderson, Ronald L. Tatham, William C. Black, Multivariate Data Analysis, 4th Edition (1995), Prentice-Hall International, Inc., USA.
Keinosuke Fukunaga, Computer Science and Scientific Computing: Introduction to Statistical Pattern Recognition, 2nd Edition (1990), Academic Press, Inc., California, USA. ISBN 0-12-269851-7
Marty J. Schmidts, Understanding and Using Statistics (1975), D.C. Heath and Company, Canada. ISBN 0-669-94490-4
Norman L. Johnson, Samuel Kotz, Distributions in Statistics: Continuous Univariate Distributions, vol. 1 (1970), Houghton Mifflin Company, Boston, USA.
Richard A. Johnson, Dean W. Wichern, Applied Multivariate Statistical Analysis (1992), Prentice-Hall, Inc., New Jersey, USA. ISBN 0-13-041400-X
Richard J. Harris, A Primer of Multivariate Statistics (1975), Academic Press Inc., New York, USA. ISBN 0-12-327250-5
Thomas D. Rossing, The Science of Sound (1982), Addison-Wesley Publishing Company Inc., USA. ISBN 0-201-06505-3
Thomas D. Rossing, Neville H. Fletcher, Principles of Vibration and Sound (1995), Springer-Verlag New York Inc. ISBN 0-387-94336-6
K. El-Maleh, M. Klein, G. Petrucci, P. Kabal, Speech/music discrimination for multimedia applications (2000), in ICASSP'00.
T. Houtgast, H.J.M. Steeneken, A review of the MTF-concept in room acoustics (1985), J. Acoust. Soc. Am. 77, 1069-1077.
J. Ajmera, I. McCowan, H. Bourlard, Robust HMM-based speech/music segmentation (2002), in Proceedings of ICASSP-02.
J.J. Burred, A. Lerch, Hierarchical Automatic Audio Signal Classification (2004), Journal of the Audio Engineering Society.
J. Pinquier, J. Rouas, R. Andre-Obrecht, Robust speech / music classification in audio documents (2002), 7th International Conference on Spoken Language Processing (ICSLP), pp. 2005-2008.
K.D. Martin, E.D. Scheirer, B.L. Vercoe, Music Content Analysis through Models of Audition (1998), ACM Multimedia'98 Workshop on Content Processing of Music for Multimedia Applications, Bristol, UK.
Thank you