ENVIRONMENTAL NATURAL SOUND DETECTION AND CLASSIFICATION USING CONTENT-BASED RETRIEVAL (CBR) AND MFCC
Project Mentor :- Shiladitya Pujari
Project group members :-
Parth Sinha (20093043)
Pankaj Kumar (20093013)
Manas Sarkar (20093030)
Ruchasri Nath (20093055)
MAIN TOPICS
Objective
Methodology
Result
Future scope & conclusion
OBJECTIVE
To develop an environmental sound detection and classification technique (using Content Based Retrieval and MFCC) so that a computer system can recognize and classify sounds more accurately.
To make computer systems more intelligent and reliable in understanding their environment by means of this technique.
DESCRIPTION OF TERMS
MFCC
CBR
WHAT ARE MFCCS?
In sound processing, the Mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear Mel scale of frequency.
Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum").
The difference between the cepstrum and the Mel-frequency cepstrum is that in the MFC, the frequency bands are equally spaced on the Mel scale, which approximates the human auditory system's response more closely than the linearly-spaced frequency bands used in the normal cepstrum. This frequency warping can allow for better representation of sound, for example, in audio compression.
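The Mel-scale spacing described above can be illustrated with a short sketch. The slides do not say which Mel formula was used, so the widely used mapping m = 2595 log10(1 + f/700) is an assumption here, and the function names are illustrative only.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mels using the common 2595*log10(1 + f/700) mapping (assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spaced_frequencies(f_low, f_high, count):
    """Return `count` frequencies (Hz) that are equally spaced on the Mel scale."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (count - 1)
    return [mel_to_hz(m_low + i * step) for i in range(count)]
```

Bands that are equally spaced in mels become progressively wider in Hz, which is the frequency warping the slide refers to.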
MFCCs are commonly derived as follows:
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the Mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the Mel frequencies.
4. Take the discrete cosine transform of the list of Mel log powers, as if it were a signal.
5. The MFCCs are the amplitudes of the resulting spectrum.
MFCCs are commonly used as features in speech recognition systems, such as systems that automatically recognize numbers spoken into a telephone. They are also common in speaker recognition, which is the task of recognizing people from their voices.
MFCCs are also increasingly finding uses in music information retrieval applications such as genre classification, audio similarity measures, etc.
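The five derivation steps listed earlier can be sketched for a single frame as follows. This is a minimal illustration, not the project's implementation: the Mel formula, the filter count, the number of coefficients, the small log floor, and all function names are assumptions, and a naive DFT is used for brevity where a real system would use an FFT. The frame is assumed to be windowed already.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)  # assumed Mel formula

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Step 1: power of the DFT for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]

def mel_filterbank(num_filters, n_bins, sample_rate):
    """Step 2: triangular overlapping filters, centres equally spaced in mels."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # num_filters + 2 edge points, expressed as fractional bin positions
    edges = [mel_to_hz(top * i / (num_filters + 1)) / nyquist * (n_bins - 1)
             for i in range(num_filters + 2)]
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))   # rising slope
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct_ii(values, num_coeffs):
    """Step 4: unnormalized DCT-II, keeping the first num_coeffs outputs."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(num_coeffs)]

def mfcc(frame, sample_rate, num_filters=20, num_coeffs=12):
    power = power_spectrum(frame)                                        # step 1
    bank = mel_filterbank(num_filters, len(power), sample_rate)
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]  # step 2
    log_energies = [math.log(e + 1e-10) for e in energies]               # step 3 (floor avoids log(0))
    return dct_ii(log_energies, num_coeffs)                              # steps 4-5
```

For example, `mfcc(frame, 8000)` on a 256-sample frame returns 12 coefficients. Scaling the frame's amplitude changes only coefficient 0, because a gain adds the same constant to every log energy.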
CBR
Content Based Retrieval means that the search and retrieval are based on analysis of the actual contents of the data (here, sound) rather than on metadata such as keywords, tags and/or descriptions associated with the sounds.
In our project we will use a multimedia database that provides Content Based Retrieval.
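As a rough illustration of content-based retrieval, the sketch below ranks stored sounds by the distance between feature vectors instead of looking up keywords. The in-memory list stands in for the multimedia database; its labels and feature values are made up, and Euclidean distance is an assumed similarity measure.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_features, database, top_k=3):
    """Return the top_k stored sounds closest to the query in feature space."""
    ranked = sorted(database,
                    key=lambda item: euclidean(query_features, item["features"]))
    return ranked[:top_k]

# Hypothetical entries: labels and feature values are illustrative only.
DATABASE = [
    {"label": "rain", "features": [0.9, 0.1, 0.3]},
    {"label": "bird", "features": [0.2, 0.8, 0.5]},
    {"label": "thunder", "features": [0.8, 0.2, 0.1]},
]
```

Calling `retrieve([0.85, 0.15, 0.25], DATABASE, top_k=1)` returns the "rain" entry. No tag or description is consulted, only the stored features.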
METHODOLOGY (1)
The major steps involved in the method are as follows:
Extraction of feature for classifying highly diversified natural sounds.
Making clusters according to their feature similarity.
Finding a match for a particular sound query from the cluster.
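The clustering and matching steps above could look like the following sketch. The slides do not name a clustering algorithm, so k-means is an assumption, as are the deterministic initialization from the first k points and the function names.

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(c) / len(points) for c in zip(*points)]

def k_means(points, k, iterations=20):
    """Group feature vectors into k clusters by similarity (k-means, assumed)."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic start
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[idx].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

def nearest_cluster(query, centroids):
    """Index of the cluster whose centroid is closest to the query features."""
    return min(range(len(centroids)), key=lambda i: distance(query, centroids[i]))
```

A query sound is first assigned to its nearest cluster, and the detailed match can then be searched within that cluster only.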
METHODOLOGY (2)
First, we take an input sound (an audio signal of any format).
Then some preprocessing is done to normalize the signal.
Features are extracted from the audio signal.
Next is the classification phase, which consists of two phases:
Training phase
Testing phase
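The slides do not specify how the signals are normalized. One common choice, shown here purely as an assumption, is to remove the DC offset and scale the signal to a peak amplitude of 1 so that recordings made at different levels become comparable.

```python
def normalize(samples):
    """Remove the mean (DC offset) and scale to a peak absolute value of 1."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    peak = max(abs(s) for s in centered)
    if peak == 0:
        return centered  # constant input: nothing to scale
    return [s / peak for s in centered]
```

For instance, `normalize([1, 3, 5])` gives `[-1.0, 0.0, 1.0]`.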
METHODOLOGY (3)
Fig: Mel Frequency Cepstral Coefficient pipeline
PROCESS DESCRIPTION
Sampling
Sampling is the process of converting a continuous signal into a discrete signal. It can be done for signals varying in space, time, or any other dimension, and similar results are obtained in two or more dimensions.
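As a small illustration of sampling, the sketch below evaluates a continuous sine wave at discrete instants n / sample_rate. The tone frequency, sample rate, and function name are illustrative assumptions.

```python
import math

def sample_sine(freq_hz, sample_rate, duration_s, amplitude=1.0):
    """Sample a continuous sine wave at evenly spaced instants n / sample_rate."""
    count = int(round(sample_rate * duration_s))
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(count)]
```

Sampling a 1000 Hz tone at 8000 Hz for one millisecond yields 8 samples, covering exactly one period of the tone.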
Pre-emphasis
In the processing of electronic audio signals, pre-emphasis refers to a process designed to increase (within a frequency band) the magnitude of some (usually higher) frequencies with respect to the magnitude of other (usually lower) frequencies, in order to improve the overall signal-to-noise ratio (SNR) by minimizing the adverse effects of phenomena such as attenuation distortion in later parts of the system.
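In digital MFCC front ends, pre-emphasis is often implemented as a first-order high-pass filter, y[n] = x[n] - a * x[n-1]. The slides do not give the filter or its coefficient, so both the filter form and the value a = 0.97 below are assumptions.

```python
def pre_emphasis(samples, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    if not samples:
        return []
    out = [samples[0]]  # the first sample has no predecessor
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - alpha * prev)
    return out
```

A constant (low-frequency) input is almost cancelled after the first sample, while a rapidly alternating (high-frequency) input is nearly doubled, which is the boost of higher frequencies described above.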
Windowing
In signal processing, a window function (also known as a tapering function) is a mathematical function that is zero-valued outside of some chosen interval. For instance, a function that is constant inside the interval and zero elsewhere is called a rectangular window, which describes the shape of its graphical representation.
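The sketch below builds the rectangular window mentioned above alongside a Hamming window, a tapered window frequently used before the FFT in MFCC extraction. The slides do not say which window the project used, so the Hamming choice is an assumption.

```python
import math

def rectangular_window(n):
    """Constant inside the interval (and implicitly zero outside it)."""
    return [1.0] * n

def hamming_window(n):
    """Hamming window: tapers towards both ends to reduce spectral leakage."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(frame, window):
    """Multiply a frame by a window, sample by sample."""
    return [x * w for x, w in zip(frame, window)]
```

Applying `hamming_window(len(frame))` to a frame leaves its centre almost unchanged and attenuates its edges to about 8% of their original amplitude.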
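Fast Fourier Transform
A fast Fourier transform (FFT) is an algorithm that efficiently computes the discrete Fourier transform (DFT) of a sequence. FFTs are of great importance to a wide variety of applications, from digital signal processing and solving partial differential equations to algorithms for quick multiplication of large integers.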
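A minimal sketch of one well-known FFT algorithm, the recursive radix-2 Cooley-Tukey method, is given below. The slides do not say which FFT variant was used, so this choice is an assumption. The input length must be a power of two.

```python
import cmath

def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])   # DFT of even-indexed samples
    odd = fft(values[1::2])    # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out
```

Taking `abs()` of each output bin gives the magnitude spectrum of the frame.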
Absolute Value
In mathematics, the absolute value (or modulus) |a| of a real number a is the numerical value of a without its sign. The absolute value of a number may be thought of as its distance from zero. For a complex value, such as an FFT output bin, the absolute value is its magnitude.
PROCESS DESCRIPTION (CONTINUED)
Discrete cosine transform (DCT)
A DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but it uses only real numbers. DCTs are equivalent to DFTs of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even), where in some variants the input and/or output data are shifted by half a sample. There are eight standard DCT variants, of which four are commonly used.
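The relationship between the DCT and a DFT of twice the length can be checked numerically. The sketch below computes an unnormalized DCT-II directly and again through a naive DFT of the even-symmetric (mirrored) input. DCT-II is assumed because it is the variant typically used for MFCCs; the slides do not name one.

```python
import cmath
import math

def dct_ii(values):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * k * (n + 1/2) / N)."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n)]

def dct_ii_via_dft(values):
    """Same result from a DFT of length 2N on the mirrored, even-symmetric input."""
    n = len(values)
    mirrored = list(values) + list(reversed(values))
    out = []
    for k in range(n):
        y_k = sum(v * cmath.exp(-2j * cmath.pi * k * t / (2 * n))
                  for t, v in enumerate(mirrored))
        # The half-sample shift appears as the phase factor exp(-i*pi*k / (2N)).
        out.append((cmath.exp(-1j * cmath.pi * k / (2 * n)) * y_k).real / 2.0)
    return out
```

Both functions return the same coefficients up to floating-point rounding, and the result is purely real, as the slide states.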
Linear Discriminant Analysis (LDA)
Linear discriminant analysis (LDA) and the related Fisher's linear discriminant are methods used in statistics, pattern recognition and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification.
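For two classes and two features, Fisher's linear discriminant has a closed form: the projection direction is w = Sw^-1 (mean_b - mean_a), where Sw is the within-class scatter matrix. The sketch below is restricted to that 2-D, two-class case for brevity. The midpoint threshold and the function names are assumptions, and real MFCC vectors would have more dimensions.

```python
def mean_vector(points):
    return [sum(c) / len(points) for c in zip(*points)]

def scatter_2d(points, mean):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around `mean`."""
    sxx = sxy = syy = 0.0
    for x, y in points:
        dx, dy = x - mean[0], y - mean[1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    return sxx, sxy, syy

def fisher_lda_2d(class_a, class_b):
    """Return (w, threshold) with w = Sw^-1 (mean_b - mean_a)."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    axx, axy, ayy = scatter_2d(class_a, ma)
    bxx, bxy, byy = scatter_2d(class_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy  # within-class scatter
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # Explicit inverse of the symmetric 2x2 matrix applied to the mean difference.
    w = [(syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det]
    proj_a = w[0] * ma[0] + w[1] * ma[1]
    proj_b = w[0] * mb[0] + w[1] * mb[1]
    return w, 0.5 * (proj_a + proj_b)  # midpoint threshold (assumed)

def classify(point, w, threshold):
    """Project onto w and compare with the threshold."""
    return "b" if w[0] * point[0] + w[1] * point[1] > threshold else "a"
```

The projection reduces each 2-D feature vector to a single number, which illustrates both uses mentioned above: dimensionality reduction and linear classification.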
TRAINING AND TESTING
Fig: Flowchart of Testing Session
Fig: Flowchart of Training Session
RESULT
Using the above-mentioned approaches (MFCC and CBR) for the sound detection and classification system, we find that the recognition rate is very high.
Although the recognition rate is high, the rejection rate is not yet satisfactory.
That is, if the sound being tested is already present in the database, the matching process is very accurate. If the sound is not present in the database, however, the system does not reject it (or stop the matching); instead, it matches it with the closest sounds in terms of features.
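The behaviour described above can be reproduced with a plain nearest-neighbour match, which always returns some label. The sketch also shows one possible remedy, a distance threshold, which is a hypothetical addition that was not part of or evaluated in the project. The database contents and the threshold value are made up.

```python
import math

def nearest_match(query, database):
    """Return (label, distance) of the closest stored sound; never rejects."""
    best_label, best_dist = None, math.inf
    for label, features in database.items():
        d = math.sqrt(sum((q - f) ** 2 for q, f in zip(query, features)))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist

def match_with_rejection(query, database, max_distance):
    """Hypothetical variant: return None when even the best match is too far away."""
    label, dist = nearest_match(query, database)
    return label if dist <= max_distance else None

# Made-up feature vectors, for illustration only.
SOUNDS = {"rain": [0.9, 0.1], "bird": [0.2, 0.8]}
```

With these values, the query `[5.0, 0.0]` resembles neither stored sound, yet `nearest_match` still reports "rain", whereas `match_with_rejection(..., max_distance=0.5)` returns `None`.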
FUTURE SCOPE AND CONCLUSION
Future scope and applications
Environmental monitoring
Speaker recognition
Genre classification
Audio similarity measures
Robotic awareness
Conclusion
This method of environmental sound detection and classification was developed using the MFCC pipeline to extract the features of a sound and CBR to retrieve sound features from the multimedia database. The method can be applied in robotics, where sound detection and recognition may be possible up to a satisfactory level. If the method is properly combined with computer vision, human-computer interaction can be improved considerably. MFCC is an efficient feature extraction method because its design emphasizes human auditory perception. Using more than one feature of a sound may improve the performance of the method, and applying a clustering technique can boost accuracy. Another useful feature available today is Audio Spectrum Projection, provided by the MPEG-7 specification. Including this feature may further improve the performance of the method.