Multimedia Database Management System - Chapter 5
Indexing and Retrieval of Audio
Rachmat Wahid Saleh Insani, S.Kom
Introduction
• Audio is classified into three types: speech, music, and noise.
• Different audio types are processed and indexed in different ways.
• Query audio pieces are similarly classified, processed, and indexed.
• Audio pieces are retrieved based on similarity between the query index and the audio index in the database.
Objectives
• Main audio properties and features.
• Audio classification.
• Main speech recognition techniques.
• General approach in indexing and retrieval.
• Temporal and content relationship between media types.
Main Audio Properties and Features
• Time domain
• Frequency domain
Features Derived in the Time Domain
A signal is represented as amplitude varying with time.
• Average energy
• Zero crossing rate
• Silence ratio
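Average energy: $E = \frac{1}{N} \sum_{n=0}^{N-1} x(n)^2$
Zero crossing rate: $ZC = \frac{1}{2N} \sum_{n=1}^{N} \left| \operatorname{sgn} x(n) - \operatorname{sgn} x(n-1) \right|$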
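The three time-domain features can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the `threshold` and `min_run` parameters, and the choice to treat a sample of exactly zero as positive are all assumptions made for the example. Silence is taken here to mean a run of at least `min_run` consecutive samples whose absolute amplitude is below `threshold`.

```python
def sgn(v):
    """Sign function for the zero crossing rate: 1 for v >= 0, otherwise -1."""
    return 1 if v >= 0 else -1


def average_energy(x):
    """Mean of the squared sample values, E = (1/N) * sum of x(n)^2."""
    return sum(s * s for s in x) / len(x)


def zero_crossing_rate(x):
    """Sum of |sgn x(n) - sgn x(n-1)| over consecutive samples, divided by 2N.

    With 0-indexed samples x[0..N-1] there are only N-1 consecutive pairs,
    so the sum here runs over n = 1 .. N-1.
    """
    n = len(x)
    total = sum(abs(sgn(x[i]) - sgn(x[i - 1])) for i in range(1, n))
    return total / (2 * n)


def silence_ratio(x, threshold, min_run=1):
    """Fraction of samples that fall inside silent runs.

    A silent run is at least `min_run` consecutive samples with
    |x(n)| < threshold (both parameters are illustrative assumptions).
    """
    silent = 0
    run = 0
    for s in x:
        if abs(s) < threshold:
            run += 1
        else:
            if run >= min_run:
                silent += run
            run = 0
    if run >= min_run:  # close a run that reaches the end of the signal
        silent += run
    return silent / len(x)
```

For example, `zero_crossing_rate([1.0, -1.0, 1.0, -1.0])` counts three sign changes, each contributing 2 to the sum, giving 6 / (2 × 4) = 0.75.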
Features Derived from the Frequency Domain
• Sound spectrum
• Bandwidth
• Energy Distribution
• Harmonicity
• Pitch
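As a rough sketch of how some of these features might be computed, the Python below derives a magnitude spectrum with a naive discrete Fourier transform and then reads three quantities off it. Several choices are assumptions made for illustration: bandwidth is measured over bins whose magnitude is at least `rel_threshold` times the peak, energy distribution is summarised by the spectral centroid, and pitch is approximated very crudely by the strongest non-DC frequency bin. Harmonicity is not covered. The naive DFT is O(N²) and only suitable for short frames.

```python
import cmath
import math


def magnitude_spectrum(x, sample_rate):
    """Return (frequencies, magnitudes) for DFT bins 0 .. N/2 of the frame x."""
    n = len(x)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def bandwidth(freqs, mags, rel_threshold=0.1):
    """Highest minus lowest frequency among bins above a relative threshold."""
    peak = max(mags)
    if peak == 0:
        return 0.0  # silent frame: no active bins
    active = [f for f, m in zip(freqs, mags) if m >= rel_threshold * peak]
    return max(active) - min(active)


def spectral_centroid(freqs, mags):
    """Power-weighted mean frequency, one summary of energy distribution."""
    power = [m * m for m in mags]
    total = sum(power)
    if total == 0:
        return 0.0
    return sum(f * p for f, p in zip(freqs, power)) / total


def dominant_frequency(freqs, mags):
    """Frequency of the strongest non-DC bin, a crude stand-in for pitch."""
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return freqs[k]
```

A higher centroid indicates that more of the energy sits in the upper part of the spectrum. Real pitch estimators usually do more than pick the strongest bin, since the fundamental is not always the loudest component.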
Audio Classification
Why is audio classification important?
- Different audio types require different processing, indexing, and retrieval techniques.
- Different audio types have different significance to different applications.
- Speech is an important audio type for which successful speech recognition techniques are available.
- The audio type itself is very useful information for some applications.
- After classification, the search space during retrieval is reduced to a particular audio class.
Audio Classification
• The classification discussed here distinguishes two main types of sound: speech and music.
Main Characteristics
Music
• Music has a frequency range of 16 to 20,000 Hz.
• Music has a low silence ratio.
• Music has regular beats.
Speech
• Speech has a frequency range of 100 to 7,000 Hz.
• Speech has a high silence ratio.
• Speech has no regular beats.
Audio Classification Frameworks
• Step by Step Classification
• Feature Vector Based Audio Classification
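A step-by-step classifier can be pictured as a chain of tests, each on one feature, applied in a fixed order. The sketch below uses the speech and music characteristics listed earlier. The order of the tests, the function name, and the `silence_cutoff` value are hypothetical choices for illustration only; the 7,000 Hz limit comes from the speech frequency range given above.

```python
def classify_step_by_step(highest_frequency_hz, silence_ratio, has_regular_beats,
                          max_speech_frequency_hz=7000.0, silence_cutoff=0.2):
    """Classify an audio piece as "speech" or "music" one feature at a time.

    silence_cutoff is a made-up illustrative threshold, not a tuned value.
    """
    # Step 1: content above the speech frequency range suggests music.
    if highest_frequency_hz > max_speech_frequency_hz:
        return "music"
    # Step 2: regular beats are characteristic of music, not speech.
    if has_regular_beats:
        return "music"
    # Step 3: a high silence ratio is characteristic of speech.
    if silence_ratio >= silence_cutoff:
        return "speech"
    return "music"
```

Each step filters out the cases it can decide on its own, so later and possibly more expensive tests only run on the pieces that remain ambiguous.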
Feature Vector Based Audio Classification
Audio pieces of the same class are located close to each other in the feature space and audio pieces of different classes are located far apart in the feature space.
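One simple way to exploit this clustering, shown here as an assumption rather than as the method the slides prescribe, is a nearest-centroid classifier: average the training vectors of each class, then assign a new piece to the class whose centroid is closest in Euclidean distance. The feature values in the usage note are invented for illustration.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(labelled_vectors):
    """Map each class label to the centroid of its training vectors."""
    return {label: centroid(vecs) for label, vecs in labelled_vectors.items()}


def classify(centroids, vector):
    """Return the label whose centroid is nearest to `vector`."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))
```

With hypothetical two-component vectors of (silence ratio, normalised bandwidth), training on `{"speech": [[0.6, 0.1], [0.5, 0.2]], "music": [[0.1, 0.8], [0.05, 0.9]]}` places the query `[0.55, 0.15]` in the speech class. In practice the features would need to be scaled to comparable ranges before distances are meaningful.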
Automatic Speech Recognition
An ASR system collects models or feature vectors for all possible speech units. A speech unit can be a phoneme, a word, or a phrase.
Automatic Speech Recognition Factors
• A phoneme spoken by different speakers, or by the same speaker at different times, produces different features in terms of duration, amplitude, and frequency components.
• These differences are exacerbated by background or environmental noise.
• Normal speech is continuous and difficult to separate into individual phonemes.
• Phonemes vary with their location in a word.
Speech Recognition Performance
Speech recognition performance is normally measured by recognition error rate. The lower the error rate, the higher the performance.
Performance is affected by the following factors:
- Subject matter: this may vary from a set of digits, a newspaper article, to general news.
- Types of speech: read or spontaneous conversation.
- Size of the vocabulary: it ranges from dozens to a few thousand words.
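The slides do not say how the recognition error rate is computed. A widely used definition, assumed here, is the word error rate: the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard edit-distance table; the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = cur[j - 1] + 1  # extra word in the output
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)
```

For the reference "the cat sat on the mat" and the output "the cat sat on mat", one word is deleted out of six, giving an error rate of 1/6. Because insertions are counted, this measure can exceed 1 when the output is much longer than the reference.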
Indexing and Retrieval of Structured Music and Sound Effects
• Structured music is represented by a set of commands.
• The most common structured music is MIDI.
• A new standard of structured audio is MPEG-4 Structured Audio.
• These formats contain structure and note descriptions.
Indexing and Retrieval of Sample Based Music
• Based on extracted sound features.
• Based on pitches of music notes.
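To illustrate the pitch-based option, one commonly described idea is to reduce a melody to its pitch contour: each note is recorded only as going up (U), down (D), or staying the same (S) relative to the previous note. A query melody is then matched against the contours stored in the database. Whether the slides intend exactly this scheme is an assumption; the function names, the `tolerance` parameter, the use of MIDI note numbers as pitch values, and the plain substring match are all illustrative choices.

```python
def pitch_contour(pitches, tolerance=0.0):
    """Encode a pitch sequence as a string of U (up), D (down), S (same)."""
    symbols = []
    for previous, current in zip(pitches, pitches[1:]):
        if current - previous > tolerance:
            symbols.append("U")
        elif previous - current > tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def search_by_contour(database, query_pitches, tolerance=0.0):
    """Return names of pieces whose contour contains the query contour."""
    query = pitch_contour(query_pitches, tolerance)
    return [name for name, pitches in database.items()
            if query in pitch_contour(pitches, tolerance)]
```

Because only the direction of each pitch change is kept, a query sung in a different key still matches: against a database `{"tune_a": [60, 62, 64, 62, 60], "tune_b": [60, 60, 67, 67, 69, 69, 67]}`, the query `[50, 52, 50]` has contour "UD" and matches only `tune_a`. An approximate string match would tolerate query errors better than the exact substring test used here.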