CS 414 - Spring 2014
CS 414 Multimedia Systems Design
Lecture 15: MP3 and MP4 Audio
Klara Nahrstedt
Spring 2014
Slide 2
Administrative
HW1 posted on February 24 (Monday)
HW1 deadline on March 3 (Monday)
Midterm on March 7 (Friday), in class
Slide 3
Outline
H.264/H.265 Arithmetic Coding
MP3 Audio Encoding
MP4 Audio
Reading: Media Coding book, Sections 7.7.2-7.7.5
Recommended paper on MP3: Davis Pan, "A Tutorial on MPEG/Audio Compression," IEEE Multimedia, pp. 60-74, 1995
Recommended book on JPEG/MPEG audio/video fundamentals: Haskell, Puri, Netravali, "Digital Video: An Introduction to MPEG-2," Chapman and Hall, 1996
Slide 4
H.264/H.265 Entropy Encoding (Limitations of Huffman Coding)
Diverges from the entropy lower limit when the probability of a particular symbol becomes high, because it always uses an integral number of bits per symbol
Must send the code book with the data, which lowers overall efficiency
Must determine the frequency distribution, which must remain stable over the data set
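To make the first limitation concrete, here is a minimal Python sketch that compares the entropy of a heavily skewed source with the average Huffman code length. The two-symbol alphabet, the probabilities 0.95 and 0.05, and the helper names are illustrative assumptions, not part of the lecture.

```python
import heapq
import math

def entropy(probs):
    """Shannon entropy in bits per symbol: the lower limit for any lossless code."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def huffman_lengths(probs):
    """Return {symbol: code length in bits} for a Huffman code over probs."""
    if len(probs) == 1:
        return {s: 1 for s in probs}
    # Heap items: (probability, unique tiebreak, {symbol: depth so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

def average_length(probs, lengths):
    """Expected code length in bits per symbol."""
    return sum(probs[s] * lengths[s] for s in probs)

probs = {"a": 0.95, "b": 0.05}  # assumed, highly skewed source
lengths = huffman_lengths(probs)
print(f"entropy = {entropy(probs):.3f} bits/symbol")
print(f"Huffman = {average_length(probs, lengths):.3f} bits/symbol")
```

With these assumed probabilities the entropy is roughly 0.29 bits per symbol, yet Huffman coding cannot go below 1 bit per symbol, which is the gap that arithmetic coding closes.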
Slide 5
H.264/H.265 Entropy Coding (Arithmetic Coding)
Each symbol is coded by considering the prior data
Encoded data must be read from the beginning; no random access is possible
Each real number (< 1) is represented as a binary fraction
  0.5 = 2^-1 (binary fraction = 0.1)
  0.25 = 2^-2 (binary fraction = 0.01)
  0.625 = 0.5 + 0.125 (binary fraction = 0.101)
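The two ideas on this slide can be sketched in a few lines of Python: converting a number below 1 into a binary fraction, and narrowing an interval symbol by symbol so that each step depends on the prior data. The symbol probabilities (3/4 and 1/4), the message "ab", and the function names are assumptions chosen for illustration; a real H.264/H.265 coder (CABAC) uses adaptive binary models and integer arithmetic instead.

```python
from fractions import Fraction

def to_binary_fraction(x, max_bits=16):
    """Write 0 <= x < 1 as a binary fraction string, e.g. 0.625 -> '0.101'."""
    x = Fraction(x)
    bits = []
    while x > 0 and len(bits) < max_bits:
        x *= 2
        if x >= 1:
            bits.append("1")
            x -= 1
        else:
            bits.append("0")
    return "0." + ("".join(bits) or "0")

def encode_interval(message, probs):
    """Return the final [low, high) interval after coding every symbol of message."""
    # Cumulative lower bound of each symbol's sub-interval, in dict order.
    cumulative, total = {}, Fraction(0)
    for symbol, p in probs.items():
        cumulative[symbol] = total
        total += p
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        # Each symbol shrinks the interval left by all previous symbols.
        low += width * cumulative[symbol]
        width *= probs[symbol]
    return low, low + width

probs = {"a": Fraction(3, 4), "b": Fraction(1, 4)}  # assumed model
low, high = encode_interval("ab", probs)
print(low, high, to_binary_fraction(Fraction(5, 8)))
```

Under these assumptions the message "ab" maps to the interval [9/16, 3/4). Any number inside it identifies the message, for example 0.625, whose binary fraction 0.101 needs only three bits.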
Slide 6
Slide 7
Slide 8
AUDIO COMPRESSION
Slide 9
Why Audio Compression is Needed
Data rate = sampling rate * quantization bits * channels (+ control information)
For example (digital audio): 44100 Hz, 16 bits, and 2 channels generate about 1.4 Mbit of data per second, 84 Mbit per minute, and 5 Gbit per hour
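The data-rate formula can be checked directly. The short Python sketch below ignores the control information term, and the function name is an assumption.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bits per second (control information ignored)."""
    return sample_rate_hz * bits_per_sample * channels

rate = pcm_bit_rate(44100, 16, 2)          # CD-quality stereo
print(f"{rate / 1e6:.2f} Mbit/s")          # 1.41 Mbit/s
print(f"{rate * 60 / 1e6:.1f} Mbit/min")   # 84.7 Mbit/min
print(f"{rate * 3600 / 1e9:.2f} Gbit/h")   # 5.08 Gbit/h
```

Dividing by 8 gives the same figures in bytes: roughly 176 kB per second, or about 635 MB per hour.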
Slide 10
MPEG-1 Audio
Lossy compression of audio
In the late 1980s, ISO's MPEG group started to standardize
  TV broadcasting
  Use of audio on CD-ROM (later DVD)
MPEG-1 Audio: 1992
MPEG-2 Audio: 1994
MPEG-1 Audio Layers I, II, III
MPEG-1 Audio Layer II
Called MP2
Dominant standard for audio broadcasting
  DAB digital radio and DVB digital television
Came out of MUSICAM codecs with bit rates 64-196 kbps
  MUSICAM audio coding is the basis for MPEG-1 and MPEG-2 audio
Sampling rates: 32, 44.1, 48 kHz
Bit rates: 32, 48, 56, 64, 80, 96, ..., 384 kbps
Format: mono, stereo, dual channel
MP2 is a sub-band audio encoder in the time domain
Slide 13
MPEG-1 Audio Layer III
MPEG-1 Layer III is called the MP3 format
Popular for Internet applications
Goal is to compress to 128 kbps, but audio can be compressed to higher or lower bit rates, with correspondingly higher or lower quality
Utilization of psychoacoustics
  Psychoacoustics: the scientific study of sound perception
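To put the 128 kbps target in perspective, the sketch below compares it with uncompressed CD-quality audio. The four-minute track length and the function name are assumptions for illustration.

```python
def stream_size_bytes(bit_rate_bps, duration_s):
    """Size in bytes of a constant-bit-rate stream of the given duration."""
    return bit_rate_bps * duration_s // 8

cd_rate_bps = 44100 * 16 * 2   # uncompressed stereo PCM
mp3_rate_bps = 128_000         # target bit rate named on the slide
duration_s = 4 * 60            # assumed track length

cd_bytes = stream_size_bytes(cd_rate_bps, duration_s)
mp3_bytes = stream_size_bytes(mp3_rate_bps, duration_s)
ratio = cd_rate_bps / mp3_rate_bps
print(cd_bytes, mp3_bytes, round(ratio, 1))
```

For this assumed track the PCM data is about 42 MB and the 128 kbps stream about 3.8 MB, a compression ratio of roughly 11:1.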
Slide 14
MPEG Audio Encoding Steps
Slide 15
Slide 16
MPEG Audio Filter Bank
Filter bank divides the input into multiple sub-bands (32 equal-width frequency sub-bands)
Sub-band i is defined by the filter output sample for sub-band i at time t, where
  C[n] is one of 512 coefficients
  x[n] is an audio input sample from a 512-sample buffer
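For reference, Pan's tutorial (listed in the outline) writes the analysis filter bank as s_t[i] = sum over k = 0..63 and j = 0..7 of M[i][k] * (C[k + 64j] * x[k + 64j]), with M[i][k] = cos((2i + 1)(k - 16) * pi / 64). The Python sketch below follows that structure. The window returned by `placeholder_window` is a hypothetical stand-in for the 512 coefficients C[n] tabulated in the standard, so the outputs illustrate the data flow only and are not those of a conforming encoder.

```python
import math

NUM_SUBBANDS = 32
BUFFER_LEN = 512

# Analysis matrix M[i][k] = cos((2i + 1)(k - 16) * pi / 64).
M = [[math.cos((2 * i + 1) * (k - 16) * math.pi / 64) for k in range(64)]
     for i in range(NUM_SUBBANDS)]

def placeholder_window(n=BUFFER_LEN):
    """Hypothetical smooth window; NOT the C[n] table from the MPEG standard."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (j + 0.5) / n) for j in range(n)]

def subband_samples(x, C):
    """Return the 32 sub-band outputs s_t[i] for one 512-sample buffer x."""
    if len(x) != BUFFER_LEN or len(C) != BUFFER_LEN:
        raise ValueError("x and C must each hold 512 values")
    # Window the buffer and fold it into 64 partial sums (the inner sum over j).
    y = [sum(C[k + 64 * j] * x[k + 64 * j] for j in range(8)) for k in range(64)]
    # Modulate the partial sums into the 32 sub-bands (the outer sum over k).
    return [sum(M[i][k] * y[k] for k in range(64)) for i in range(NUM_SUBBANDS)]

C = placeholder_window()
x = [math.sin(0.1 * n) for n in range(BUFFER_LEN)]  # assumed test tone
print(len(subband_samples(x, C)))
```

In an encoder this computation runs once for every 32 new input samples shifted into the buffer, producing one output sample per sub-band each time.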
Slide 17
MPEG Audio Psycho-acoustic Model
Compresses by removing acoustically irrelevant parts of audio signals
Takes advantage of the human auditory system's inability to hear quantization noise under auditory masking
Auditory masking: occurs whenever the presence of a strong audio signal makes a temporal or spectral neighborhood of weaker audio signals imperceptible
Slide 18
Slide 19
Loudness and Pitch (Review of Psychoacoustic Effects)
More sensitive to loudness at mid frequencies than at other frequencies
  Intermediate frequencies: [500 Hz, 5000 Hz]
  Human hearing frequencies: [20 Hz, 20000 Hz]
Perceived loudness of a sound changes based on the frequency of that sound
  The basilar membrane reacts more to intermediate frequencies than to other frequencies
Slide 20
Masking Effects (Review of Psychoacoustic Effects)
Frequency masking
Temporal masking
Slide 21
MPEG/audio divides the audio signal into frequency sub-bands that approximate critical bands. Each sub-band is then quantized according to the audibility of quantization noise within that band.
Slide 22
MPEG Audio Bit Allocation
This process determines the number of code bits allocated to each sub-band, based on information from the psycho-acoustic model
Algorithm:
1. Compute the mask-to-noise ratio: MNR = SNR - SMR (signal-to-noise ratio minus signal-to-mask ratio)
   The standard provides tables that give estimates for the SNR resulting from quantizing to a given number of quantizer levels
2. Get the MNR for each sub-band
3. Search for the sub-band with the lowest MNR
4. Allocate code bits to this sub-band. Whenever a sub-band is allocated more code bits, look up the new estimate of SNR and repeat from step 1, until no more code bits can be allocated
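The loop above can be sketched as a greedy allocator in Python. Several details here are assumptions: the SNR is approximated as 6.02 dB per allocated bit instead of being read from the standard's tables, bits are handed out one at a time, the per-band cap of 15 bits is arbitrary, and the SMR values in the example are invented.

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15, snr_per_bit_db=6.02):
    """Greedy allocation: repeatedly give one more bit to the sub-band with the lowest MNR."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Steps 1-2: MNR = SNR - SMR for every band that can still accept bits.
        candidates = [(bits[b] * snr_per_bit_db - smr_db[b], b)
                      for b in range(len(smr_db))
                      if bits[b] < max_bits_per_band]
        if not candidates:
            break
        # Step 3: the band whose noise is least well masked has the lowest MNR.
        _, worst = min(candidates)
        # Step 4: allocate a bit; the next pass recomputes that band's MNR.
        bits[worst] += 1
    return bits

smr_db = [20.0, 5.0, -3.0, 12.0]   # hypothetical signal-to-mask ratios per band
print(allocate_bits(smr_db, total_bits=6))
```

With these invented inputs the band with the highest SMR receives the most bits, while the band whose SMR is negative (its quantization noise is already masked) receives none.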
Slide 23
Audio Quality
Bit rate
  With too low a bit rate, we get compression artifacts
    Ringing
    Pre-echo: a sound is heard before it occurs. It is most noticeable in impulsive sounds from percussion instruments such as cymbals
    Occurs in transform-based audio compression algorithms
Quality of encoder and encoding parameters
  Constant bit rate encoding
  Variable bit rate encoding
Slide 24
MP3 Audio Format
Source: http://wiki.hydrogenaudio.org/images/e/ee/Mp3filestructure.jpg
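An MP3 file is a sequence of frames, each starting with a 4-byte header. As a rough illustration of that structure, the Python sketch below decodes the main header fields. It handles MPEG-1 Layer III only, rejects free-format streams, and ignores the CRC and side information; the function name and the returned dictionary keys are assumptions.

```python
# Lookup tables for MPEG-1 Layer III; index 0 (free format) and 15 are not handled.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]

def parse_mp3_header(header: bytes):
    """Decode a 4-byte MPEG-1 Layer III frame header into a dict."""
    if len(header) != 4:
        raise ValueError("header must be exactly 4 bytes")
    word = int.from_bytes(header, "big")
    if word >> 21 != 0x7FF:                      # 11-bit frame sync, all ones
        raise ValueError("frame sync not found")
    version = (word >> 19) & 0b11                # 0b11 means MPEG-1
    layer = (word >> 17) & 0b11                  # 0b01 means Layer III
    if version != 0b11 or layer != 0b01:
        raise ValueError("this sketch handles MPEG-1 Layer III only")
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("unsupported bit rate or sampling rate index")
    padding = (word >> 9) & 1
    return {
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0b11],
        # 1152 samples per frame: length = 144 * bit rate / sampling rate + padding
        "frame_length_bytes": 144 * bitrate * 1000 // sample_rate + padding,
    }

print(parse_mp3_header(bytes.fromhex("FFFB9064")))
```

The example header decodes to 128 kbps, 44100 Hz, joint stereo, with a frame length of 417 bytes.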
Slide 25
MPEG Audio Comments
Precision of 16 bits per sample is needed to get a good SNR
The noise we are getting is quantization noise from the digitization process
For each added bit, we get about 6 dB better SNR
The masking effect means that we can raise the noise floor around a strong sound, because the noise will be masked away
Raising the noise floor is the same as using fewer bits, and using fewer bits is the same as compression
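The 6 dB per bit rule follows from each extra bit doubling the number of quantizer levels, since 20 * log10(2) is about 6.02 dB. A minimal Python check, assuming an ideal uniform quantizer and using a hypothetical function name:

```python
import math

def quantization_snr_db(bits):
    """Approximate SNR of an ideal uniform quantizer: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 12, 16):
    print(bits, "bits ->", round(quantization_snr_db(bits), 1), "dB")

# Gain from one additional bit, independent of the starting point.
gain_per_bit_db = quantization_snr_db(16) - quantization_snr_db(15)
print(round(gain_per_bit_db, 2), "dB per bit")
```

By this estimate 16 bits give roughly 96 dB. Conversely, if masking lets the noise floor in a band rise by about 48 dB, that band can be coded with 8 fewer bits per sample.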
Slide 26
MPEG-4 Audio (AAC's Improvements over MP3)
Advanced Audio Coding (AAC) in MPEG-4
More sample frequencies (8-96 kHz)
Arbitrary bit rates and variable frame length
Higher efficiency and simpler filterbank
  Uses pure MDCT (modified discrete cosine transform)
  MDCT is also used in Windows Media Audio
Slide 27
MPEG-4 Audio
Variety of applications
  General audio signals
  Speech signals
  Synthetic audio
  Synthesized speech (structured audio)
Slide 28
MPEG-4 Audio Part 3
Includes a variety of audio coding technologies
  Lossy speech coding (e.g., CELP)
    CELP: code-excited linear prediction speech coding
  General audio coding (AAC)
  Lossless audio coding
  Text-to-Speech interface
  Structured Audio (e.g., MIDI)
Slide 29
MPEG-4 Part 14
Called MP4, with file extension .mp4
Multimedia container format
Stores digital video and audio streams and allows streaming over the Internet
Container or wrapper format
  Meta-file format whose specification describes how different data elements and metadata coexist in a computer file
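As a rough illustration of how a container organizes its data elements, an MP4 file is a sequence of boxes, each starting with a 4-byte big-endian size and a 4-byte type code. The Python sketch below walks the top-level boxes of an in-memory buffer. The function name is an assumption, nested boxes are not descended into, and the sample bytes are synthetic and do not come from a real file.

```python
import struct

def iter_boxes(data: bytes, offset=0, end=None):
    """Yield (box_type, payload_offset, payload_size) for each top-level box."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > end:
                raise ValueError("truncated 64-bit box header")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the data.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("malformed box")
        yield box_type.decode("ascii", "replace"), offset + header, size - header
        offset += size

# Synthetic example: an 'ftyp' box with 8 payload bytes, then an empty 'moov' box.
sample = (struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00\x00\x00\x00"
          + struct.pack(">I4s", 8, b"moov"))
print(list(iter_boxes(sample)))
```

In a real file the audio and video samples typically sit in an `mdat` box, while the metadata that describes how to find and decode them sits in `moov`.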
Slide 30
Conclusion
MPEG Audio is an integral part of the MPEG standard, to be considered together with video
MPEG-4 Audio represents a major extension of MPEG-1 Audio in terms of capabilities
Slide 31
ADDITIONAL SLIDES
Slide 32
Criteria for a Good Standard
Achieve desired outcome
Be comprehensible
Allow efficient implementation
Support competition
Give benchmark tests
Be supported by industry
Be good for end users
Two models:
  Implement first, then standardize
  Standardize first, then implement
Slide 33
History of MPEG Audio MP3
First psychoacoustic masking codec was proposed in 1979 at AT&T Bell Labs, Murray Hill
MP3 is based on OCF (optimum coding in the frequency domain) and PXFM (perceptual transform coding)
MPEG-1 Audio Layer III public release: 1993
MPEG-2 Audio Layer III public release: 1995
Slide 34
MPEG Audio MP3
1997: mp3.com offers thousands of MP3s created by independent artists for free
1999: Napster, MP3 peer-to-peer file sharing
Problem: copyright infringement
Authorized services: Amazon.com, Rhapsody, Juno Records, ...
Slide 35
Fletcher-Munson Contours
Each contour represents an equal perceived loudness
Perception sensitivity (loudness) is not linear across all frequencies and intensities
Slide 36
MPEG-4 Audio (Successor of MP3)
Advanced Audio Coding (AAC) is now part of MPEG-4 Audio
Supports up to 48 full-bandwidth audio channels
Default audio format for iPhone, iPad, Nintendo, PlayStation, Nokia, Android, BlackBerry
Introduced in 1997 as MPEG-2 Part 7
Updated in 1999 and included in MPEG-4
Slide 37
MPEG-4 Audio
Bit rate 2-64 kbps
Scalable for variable rates
MPEG-4 defines a set of coders
  Parametric coding techniques: low bit rates 2-6 kbps, 8 kHz sampling frequency
  Code Excited Linear Prediction: medium bit rates 6-24 kbps, 8 and 16 kHz sampling rates
  Time-frequency techniques: high-quality audio at 16 kbps and higher bit rates, sampling rate > 7 kHz
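The bit-rate ranges above overlap, so more than one coder family can apply at a given rate. The small Python sketch below encodes the ranges from this slide as a lookup. The function name and the decision to return every matching family are assumptions; a real encoder would also weigh the signal type and sampling rate.

```python
def candidate_coders(bit_rate_kbps):
    """Return the MPEG-4 coder families whose bit-rate range covers the given rate."""
    coders = []
    if 2 <= bit_rate_kbps <= 6:
        coders.append("parametric")
    if 6 <= bit_rate_kbps <= 24:
        coders.append("CELP")
    if bit_rate_kbps >= 16:
        coders.append("time-frequency")
    return coders

for rate in (4, 6, 20, 64):
    print(rate, "kbps ->", candidate_coders(rate))
```

At 20 kbps, for example, both CELP and time-frequency coding are candidates, which is where the choice between a speech coder and a general audio coder matters.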