CS 414 - Multimedia Systems Design, Lecture 15: MP3 and MP4 Audio. Klara Nahrstedt, Spring 2014

  • Slide 2
  • Administrative
      ◦ HW1 posted on February 24 (Monday)
      ◦ HW1 due on March 3 (Monday)
      ◦ Midterm on March 7 (Friday), in class
  • Slide 3
  • Outline
      ◦ H.264/H.265 arithmetic coding
      ◦ MP3 audio encoding
      ◦ MP4 audio
      ◦ Reading: Media Coding book, Sections 7.7.2-7.7.5
      ◦ Recommended paper on MP3: Davis Pan, "A Tutorial on MPEG/Audio Compression," IEEE MultiMedia, pp. 60-74, 1995
      ◦ Recommended book on JPEG/MPEG audio/video fundamentals: Haskell, Puri, Netravali, Digital Video: An Introduction to MPEG-2, Chapman and Hall, 1996
  • Slide 4
  • H.264/H.265 Entropy Encoding (Limitations of Huffman Coding)
      ◦ Huffman coding always uses an integral number of bits per symbol, so its average code length diverges from the lower limit (the entropy) when the probability of a particular symbol becomes high
      ◦ The code book must be sent with the data, which lowers overall efficiency
      ◦ The frequency distribution must be determined in advance and must remain stable over the data set
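The integral-bit limitation can be seen numerically. The following Python sketch (illustrative, not from the lecture) builds Huffman code lengths with a heap and compares the average code length with the entropy for a hypothetical, highly skewed three-symbol source; the function names are made up for this example.

```python
import heapq
import itertools
import math

def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a Huffman code over probs."""
    tie_breaker = itertools.count()  # keeps heap comparisons away from the dicts
    heap = [(p, next(tie_breaker), {sym: 0}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, lens1 = heapq.heappop(heap)
        p2, _, lens2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every codeword inside them.
        merged = {sym: n + 1 for sym, n in {**lens1, **lens2}.items()}
        heapq.heappush(heap, (p1 + p2, next(tie_breaker), merged))
    return heap[0][2]

def entropy_bits(probs):
    """Shannon entropy in bits per symbol: the lower limit for any code."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

probs = {"a": 0.9, "b": 0.05, "c": 0.05}  # hypothetical, highly skewed source
lengths = huffman_code_lengths(probs)
avg_len = sum(probs[s] * lengths[s] for s in probs)
print(f"Huffman: {avg_len:.3f} bits/symbol, entropy: {entropy_bits(probs):.3f}")
```

Every codeword needs at least one bit, so the Huffman average (1.1 bits/symbol here) stays far above the entropy (about 0.57 bits/symbol). Arithmetic coding avoids this by not assigning a whole number of bits to each symbol.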
  • Slide 5
  • H.264/H.265 Entropy Coding (Arithmetic Coding)
      ◦ Each symbol is coded in the context of the data that precedes it
      ◦ Encoded data must be read from the beginning; no random access is possible
      ◦ Each real number below 1 is represented as a binary fraction: 0.5 = 2^-1 (binary fraction 0.1); 0.25 = 2^-2 (binary fraction 0.01); 0.625 = 0.5 + 0.125 (binary fraction 0.101)
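A minimal sketch of the interval-narrowing idea, assuming a fixed, known probability model and a message length that is sent separately. Real coders (for example CABAC in H.264) use adaptive context models and finite-precision integer arithmetic rather than exact fractions. All function names are hypothetical. The final interval is turned into the shortest binary fraction that lies inside it, the same representation as the 0.625 = binary 0.101 example on the slide.

```python
import math
from fractions import Fraction

def symbol_ranges(probs):
    """Assign each symbol a [low, high) slice of [0, 1) in dict order."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges

def encode_interval(message, probs):
    """Narrow [0, 1) once per symbol; the final interval identifies the message."""
    ranges = symbol_ranges(probs)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        s_low, s_high = ranges[sym]
        low, high = low + width * s_low, low + width * s_high
    return low, high

def shortest_binary_fraction(low, high):
    """Fewest bits b1..bk such that binary 0.b1..bk lies in [low, high)."""
    k = 1
    while True:
        scale = 2 ** k
        numerator = math.ceil(low * scale)
        if Fraction(numerator, scale) < high:
            return format(numerator, f"0{k}b"), Fraction(numerator, scale)
        k += 1

def decode(value, length, probs):
    """Decoding must start from the first symbol: there is no random access."""
    ranges = symbol_ranges(probs)
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        width = high - low
        target = (value - low) / width
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= target < s_high:
                out.append(sym)
                low, high = low + width * s_low, low + width * s_high
                break
    return "".join(out)

probs = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}
low, high = encode_interval("AB", probs)           # interval [1/4, 3/8)
bits, value = shortest_binary_fraction(low, high)  # "01", i.e. binary 0.01 = 0.25
print(bits, decode(value, 2, probs))
```

Two symbols were transmitted with two bits; each step of the decoder depends on the interval left by the previous step, which is why the stream has to be read from the start.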
  • Slide 8
  • AUDIO COMPRESSION
  • Slide 9
  • Why Audio Compression Is Needed
      ◦ Data rate = sampling rate × quantization bits × channels (+ control information)
      ◦ Example (digital audio): 44,100 Hz, 16 bits, 2 channels generate about 1.4 Mbit of data per second, about 84.7 Mbit (about 10.6 MB) per minute, and about 5.1 Gbit (about 635 MB) per hour
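The data-rate formula, checked with the slide's numbers in a small Python sketch (the function name is hypothetical and control information is ignored). The last line compares the result with the 128 kbps bit rate commonly targeted by MP3.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bits per second (control data ignored)."""
    return sample_rate_hz * bits_per_sample * channels

rate = pcm_bit_rate(44_100, 16, 2)
per_minute_bytes = rate * 60 // 8
per_hour_bytes = rate * 3600 // 8
print(rate)              # 1411200 bits/s, about 1.4 Mbit/s
print(per_minute_bytes)  # 10584000 bytes, about 10.6 MB
print(per_hour_bytes)    # 635040000 bytes, about 635 MB
print(rate / 128_000)    # 11.025: ratio to a 128 kbps stream
```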
  • Slide 10
  • MPEG-1 Audio
      ◦ Lossy compression of audio
      ◦ In the late 1980s, ISO's MPEG group started standardization work for TV broadcasting and for audio on CD-ROM (later DVD)
      ◦ MPEG-1 Audio: 1992; MPEG-2 Audio: 1994
      ◦ MPEG-1 Audio defines Layers I, II, and III
  • Slide 11
  • MPEG-1 Audio Encoding Characteristics
      ◦ Precision: 16 bits
      ◦ Sampling frequency: 32 kHz, 44.1 kHz, 48 kHz
      ◦ 3 compression layers: Layer 1, Layer 2, Layer 3 (MP3); target bit rates below are per channel
      ◦ Layer 3: 32-320 kbps, target 64 kbps
      ◦ Layer 2: 32-384 kbps, target 128 kbps
      ◦ Layer 1: 32-448 kbps, target 192 kbps
  • Slide 12
  • MPEG-1 Audio Layer II
      ◦ Called MP2
      ◦ Dominant standard for audio broadcasting: DAB digital radio and DVB digital television
      ◦ Came out of the MUSICAM codecs (bit rates 64-196 kbps); MUSICAM audio coding is the basis for MPEG-1 and MPEG-2 Audio
      ◦ Sampling rates: 32, 44.1, 48 kHz
      ◦ Bit rates: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384 kbps
      ◦ Formats: mono, stereo, dual channel
      ◦ MP2 is a sub-band audio encoder operating in the time domain
  • Slide 13
  • MPEG-1 Audio Layer III
      ◦ MPEG-1 Layer III is called the MP3 format
      ◦ Popular for Internet applications
      ◦ Typically targets 128 kbps, but higher or lower bit rates can be used, yielding correspondingly higher or lower quality
      ◦ Uses psychoacoustics, the scientific study of sound perception
  • Slide 14
  • MPEG Audio Encoding Steps
  • Slide 16
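  • MPEG Audio Filter Bank
      ◦ The filter bank divides the input into 32 equal-width frequency sub-bands
      ◦ Sub-band i (i = 0, ..., 31) is defined by s_t[i] = \sum_{k=0}^{63} \sum_{j=0}^{7} M[i][k] \, (C[k+64j] \, x[k+64j]), with M[i][k] = \cos\left[\frac{(2i+1)(k-16)\pi}{64}\right]
      ◦ s_t[i]: filter output sample for sub-band i at time t; C[n]: one of the 512 coefficients of the analysis window defined in the standard; x[n]: audio input sample read from a 512-sample buffer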
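A direct, unoptimized Python sketch of the analysis formula for s_t[i]. It assumes the buffer holds the newest sample at index 0, and because the standard's 512 window coefficients C[n] are not reproduced in these slides, it uses a clearly hypothetical placeholder window. The output is therefore structurally, not numerically, equivalent to a real MPEG-1 encoder; all names are made up for this example.

```python
import math

NUM_SUBBANDS = 32
BUFFER_LEN = 512

# Matrixing coefficients M[i][k] = cos((2i + 1)(k - 16) pi / 64).
M = [[math.cos((2 * i + 1) * (k - 16) * math.pi / 64) for k in range(64)]
     for i in range(NUM_SUBBANDS)]

def placeholder_window():
    # HYPOTHETICAL stand-in for the standard's 512-entry C[n] table; a real
    # encoder must use the tabulated coefficients from ISO/IEC 11172-3.
    return [math.sin(math.pi * (n + 0.5) / BUFFER_LEN) / BUFFER_LEN
            for n in range(BUFFER_LEN)]

def analyze_block(x, c):
    """Compute s_t[i] for i = 0..31 from a 512-sample buffer x and window c."""
    z = [c[n] * x[n] for n in range(BUFFER_LEN)]                   # windowing
    y = [sum(z[k + 64 * j] for j in range(8)) for k in range(64)]  # sum over j
    return [sum(M[i][k] * y[k] for k in range(64)) for i in range(NUM_SUBBANDS)]

def analyze(signal, c):
    """Shift 32 new samples into the buffer per step; emit 32 sub-band samples."""
    buf = [0.0] * BUFFER_LEN
    frames = []
    for start in range(0, len(signal) - 31, 32):
        new = list(signal[start:start + 32])
        # Assumed ordering: newest sample at index 0; the oldest 32 drop off.
        buf = new[::-1] + buf[:BUFFER_LEN - 32]
        frames.append(analyze_block(buf, c))
    return frames
```

Each step consumes 32 new input samples and produces one sample per sub-band, so the total output rate equals the input rate (critical sampling). Summing over j before matrixing is just a regrouping of the double sum that saves multiplications.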
  • Slide 17
  • MPEG Audio Psychoacoustic Model
      ◦ Compresses by removing acoustically irrelevant parts of the audio signal
      ◦ Exploits the human auditory system's inability to hear quantization noise under auditory masking
      ◦ Auditory masking occurs whenever the presence of a strong audio signal makes weaker audio signals in its temporal or spectral neighborhood imperceptible
  • Slide 19
  • Loudness and Pitch (Review of Psychoacoustic Effects)
      ◦ Human hearing spans roughly 20 Hz to 20,000 Hz
      ◦ Hearing is more sensitive to loudness at intermediate frequencies (about 500 Hz to 5,000 Hz) than at other frequencies
      ◦ The perceived loudness of a sound therefore depends on its frequency: the basilar membrane reacts more strongly to intermediate frequencies than to others
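The frequency dependence of hearing sensitivity can be illustrated with the absolute threshold of hearing (threshold in quiet). The sketch below uses a widely cited curve fit, often attributed to Terhardt, of the kind psychoacoustic models commonly use; it is an approximation rather than data from this lecture, and the function name is hypothetical.

```python
import math

def threshold_in_quiet_db_spl(f_hz):
    """Approximate absolute hearing threshold in dB SPL at frequency f_hz."""
    f_khz = f_hz / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)

for f in (100, 1_000, 3_300, 15_000):
    print(f, "Hz:", round(threshold_in_quiet_db_spl(f), 1), "dB SPL")
```

The threshold dips below 0 dB SPL around 3 to 4 kHz and rises steeply toward low and very high frequencies, consistent with the greater mid-frequency sensitivity described on the slide.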
  • Slide 20
  • Masking Effects (Review of Psychoacoustic Effects): frequency masking; temporal masking
  • Slide 21
  • MPEG/audio divides the audio signal into frequency sub-bands that approximate critical bands, then quantizes each sub-band according to how audible the quantization noise is within that band
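How close is the approximation? The sketch below assumes a 44.1 kHz input and converts the edges of the 32 equal-width sub-bands (about 689 Hz each) to the Bark critical-band scale, using a common analytic approximation often attributed to Zwicker and Terhardt. Function names are hypothetical.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate in Bark (analytic approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def subband_widths_in_bark(sample_rate_hz=44_100, num_subbands=32):
    """Width in Bark of each equal-width sub-band covering 0 .. fs/2."""
    band_hz = (sample_rate_hz / 2) / num_subbands  # about 689 Hz at 44.1 kHz
    return [hz_to_bark((i + 1) * band_hz) - hz_to_bark(i * band_hz)
            for i in range(num_subbands)]

widths = subband_widths_in_bark()
print(f"lowest sub-band: {widths[0]:.2f} Bark, highest: {widths[-1]:.3f} Bark")
```

The lowest sub-band spans roughly six critical bands while the highest covers only a small fraction of one, so the equal-width sub-bands match critical bands only loosely, especially at low frequencies.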
  • Slide 22
  • MPEG Audio Bit Allocation
      ◦ Determines the number of code bits allocated to each sub-band, based on information from the psychoacoustic model
      ◦ Algorithm:
          1. Compute the mask-to-noise ratio, MNR = SNR - SMR (in dB), where SMR is the signal-to-mask ratio from the psychoacoustic model; the standard provides tables that estimate the SNR resulting from quantizing to a given number of quantizer levels
          2. Get the MNR for each sub-band
          3. Search for the sub-band with the lowest MNR
          4. Allocate code bits to this sub-band. When a sub-band receives more code bits, look up the new SNR estimate, recompute its MNR, and repeat from step 3 until no more code bits can be allocated
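A simplified Python sketch of this greedy loop. Two assumptions stand in for details the slide leaves to the standard: the SNR table is replaced by the rule of thumb of about 6.02 dB per bit, and each added bit per sample costs 12 code bits (Layer I frames carry 12 samples per sub-band). Scale factors and side information are ignored, and all names are hypothetical.

```python
def allocate_bits(smr_db, bit_pool, samples_per_band=12, max_bits=15,
                  snr_db_per_bit=6.02):
    """Greedy bit allocation that always serves the sub-band with the lowest MNR.

    smr_db: signal-to-mask ratio per sub-band (from the psychoacoustic model).
    Returns the number of bits per sample chosen for each sub-band.
    """
    bits = [0] * len(smr_db)

    def mnr(i):
        # MNR = SNR - SMR, with SNR approximated as ~6.02 dB per bit (assumption).
        return snr_db_per_bit * bits[i] - smr_db[i]

    while True:
        open_bands = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not open_bands or bit_pool < samples_per_band:
            break
        worst = min(open_bands, key=mnr)  # lowest MNR: most audible noise
        bits[worst] += 1                  # one more bit for each of its samples
        bit_pool -= samples_per_band
    return bits

print(allocate_bits([20.0, 5.0, -10.0], bit_pool=60))  # [4, 1, 0]
```

The sub-band with the strongest signal relative to its mask receives most of the bits, and the band whose content is already below the mask (negative SMR) receives none.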
  • Slide 23
  • Audio Quality vs. Bit Rate
      ◦ With too low a bit rate, we get compression artifacts:
          ▪ Ringing
          ▪ Pre-echo: a sound is heard before it actually occurs; most noticeable with impulsive sounds from percussion instruments such as cymbals; occurs in transform-based audio compression algorithms
      ◦ Quality also depends on the encoder and its encoding parameters: constant bit rate (CBR) vs. variable bit rate (VBR) encoding
  • Slide 24
  • MP3 Audio Format (figure of the MP3 file structure; source: http://wiki.hydrogenaudio.org/images/e/ee/Mp3filestructure.jpg)
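As a complement to the file-structure figure, a minimal Python sketch that decodes the 4-byte header at the start of each MP3 frame. It handles only MPEG-1 Layer III, assumes the input already starts at a frame sync (real files may begin with an ID3 tag, so a parser has to scan for the sync pattern), and the function and table names are hypothetical.

```python
# MPEG-1 Layer III tables (index 0 = free format, 15 = invalid; unsupported here).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "single channel"]

def parse_mpeg1_layer3_header(header):
    """Decode the 4-byte frame header of an MPEG-1 Layer III (MP3) frame."""
    if len(header) < 4:
        raise ValueError("need 4 header bytes")
    word = int.from_bytes(header[:4], "big")
    if ((word >> 21) & 0x7FF) != 0x7FF:
        raise ValueError("missing 11-bit frame sync")
    version_bits = (word >> 19) & 0x3  # 0b11 = MPEG-1
    layer_bits = (word >> 17) & 0x3    # 0b01 = Layer III
    if version_bits != 0b11 or layer_bits != 0b01:
        raise ValueError("this sketch only handles MPEG-1 Layer III")
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0x3]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved values are not supported")
    padding = (word >> 9) & 0x1
    return {
        "crc_protected": ((word >> 16) & 0x1) == 0,
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0x3],
        # 1152 samples per frame / 8 bits per byte = 144
        "frame_length_bytes": 144 * bitrate * 1000 // sample_rate + padding,
    }

print(parse_mpeg1_layer3_header(bytes([0xFF, 0xFB, 0x90, 0x00])))
```

For MPEG-1 Layer III the frame length is 144 × bit rate / sample rate bytes, plus one byte when the padding bit is set; 128 kbps at 44.1 kHz gives 417 bytes.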
  • Slide 25
  • MPEG Audio Comments
      ◦ A precision of 16 bits per sample is needed to get a good SNR
      ◦ The noise in question is quantization noise from the digitization process
      ◦ Each added bit improves the SNR by about 6 dB
      ◦ Masking means that the noise floor around a strong sound can be raised, because the noise will be masked
      ◦ Raising the noise floor is the same as using fewer bits, and using fewer bits is the same as compression
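The 6 dB-per-bit rule can be checked empirically. The Python sketch below (illustrative; the test tone and its parameters are arbitrary choices, and the function name is hypothetical) quantizes a near-full-scale sine with a uniform rounding quantizer and measures the resulting SNR.

```python
import math

def quantization_snr_db(bits, num_samples=50_000, freq=0.0123456, amplitude=0.99):
    """Measured SNR (dB) of a sine after uniform quantization to `bits` bits."""
    step = 2.0 / (2 ** bits)  # full scale [-1, 1) divided into 2**bits steps
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        x = amplitude * math.sin(2 * math.pi * freq * n)
        q = math.floor(x / step + 0.5) * step  # round to the nearest level
        signal_power += x * x
        noise_power += (q - x) ** 2
    return 10 * math.log10(signal_power / noise_power)

for b in (8, 12, 16):
    print(b, "bits:", round(quantization_snr_db(b), 1), "dB")
```

For a full-scale sine the textbook estimate is about 6.02 N + 1.76 dB for N bits, so 16 bits give roughly 98 dB. Dropping one bit in a sub-band lowers its SNR by about 6 dB, which is the noise-floor raising that masking permits.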
  • Slide 26
  • MPEG-4 Audio (AAC's Improvements over MP3)
      ◦ Advanced Audio Coding (AAC) in MPEG-4
      ◦ More sampling frequencies (8-96 kHz)
      ◦ Arbitrary bit rates and variable frame length
      ◦ Higher efficiency and a simpler filter bank: uses a pure MDCT (modified discrete cosine transform) instead of MP3's hybrid polyphase filter bank plus MDCT
      ◦ The MDCT is also used in other codecs, e.g., Windows Media Audio
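A direct O(N^2) Python sketch of the MDCT and its inverse, using a sine window and 50% overlap-add. It illustrates time-domain aliasing cancellation (perfect reconstruction) only. It is not AAC's actual filter bank, which adds block switching and other tools. The 2/N scaling of the inverse is one normalization choice that makes the sine-window version reconstruct exactly. All names are hypothetical.

```python
import math

def mdct(frame):
    """MDCT: 2N input samples -> N coefficients."""
    two_n = len(frame)
    n_half = two_n // 2
    return [sum(frame[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                for n in range(two_n))
            for k in range(n_half)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (scaled by 2/N)."""
    n_half = len(coeffs)
    return [2.0 / n_half * sum(coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                               for k in range(n_half))
            for n in range(2 * n_half)]

def sine_window(two_n):
    # Satisfies w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley condition).
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]

def analyze_synthesize(signal, n_half):
    """Window, MDCT, IMDCT, window again, and overlap-add with hop size N."""
    w = sine_window(2 * n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * (2 * n_half)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        frame = [padded[start + i] * w[i] for i in range(2 * n_half)]
        y = imdct(mdct(frame))
        for i in range(2 * n_half):
            out[start + i] += y[i] * w[i]
    return out[n_half:n_half + len(signal)]

signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(40)]
rec = analyze_synthesize(signal, 8)
print(max(abs(a - b) for a, b in zip(signal, rec)))  # tiny floating-point error
```

Each frame of 2N windowed samples yields only N coefficients, yet overlap-adding consecutive inverse transforms cancels the aliasing and recovers the input; this is what lets a codec quantize a critically sampled spectrum.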
  • Slide 27
  • MPEG-4 Audio: variety of applications
      ◦ General audio signals
      ◦ Speech signals
      ◦ Synthetic audio
      ◦ Synthesized speech (structured audio)
  • Slide 28
  • MPEG-4 Audio (Part 3) includes a variety of audio coding technologies
      ◦ Lossy speech coding, e.g., CELP (code-excited linear prediction)
      ◦ General audio coding (AAC)
      ◦ Lossless audio coding
      ◦ Text-to-speech interface
      ◦ Structured audio (e.g., MIDI)
  • Slide 29
  • MPEG-4 Part 14
      ◦ Called MP4, with file extension .mp4
      ◦ Multimedia container format: stores digital video and audio streams and allows streaming over the Internet
      ◦ A container (or wrapper) format is a meta-file format whose specification describes how different data elements and metadata coexist in a computer file
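MP4 files follow the ISO base media file format, in which data lives in boxes (also called atoms) that begin with a 4-byte big-endian size and a 4-byte type code. A minimal Python sketch that lists top-level boxes, assuming well-formed input and not descending into nested boxes; the function name is hypothetical.

```python
import struct

def list_top_level_boxes(data):
    """Return (type, offset, size) for each top-level box in an MP4 byte string."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header_len = 8
        if size == 1:  # a 64-bit "largesize" follows the type code
            if offset + 16 > len(data):
                break
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header_len = 16
        elif size == 0:  # the box extends to the end of the file
            size = len(data) - offset
        if size < header_len:
            raise ValueError("corrupt box size")
        boxes.append((box_type.decode("latin-1"), offset, size))
        offset += size
    return boxes

sample = struct.pack(">I4s4sI", 16, b"ftyp", b"isom", 0x200) + struct.pack(">I4s", 8, b"free")
print(list_top_level_boxes(sample))  # [('ftyp', 0, 16), ('free', 16, 8)]
```

A typical file starts with an "ftyp" box, followed by boxes such as "moov" (metadata) and "mdat" (media data).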
  • Slide 30
  • Conclusion
      ◦ MPEG Audio is an integral part of the MPEG standard and should be considered together with video
      ◦ MPEG-4 Audio is a major extension of MPEG-1 Audio in terms of capabilities
  • Slide 31
  • ADDITIONAL SLIDES
  • Slide 32
  • Criteria for a Good Standard
      ◦ Achieve the desired outcome
      ◦ Be comprehensible
      ◦ Allow efficient implementation
      ◦ Support competition
      ◦ Provide benchmark tests
      ◦ Be supported by industry
      ◦ Be good for end users
      ◦ Two models: implement first, then standardize; or standardize first, then implement
  • Slide 33
  • History of MPEG Audio MP3
      ◦ The first psychoacoustic masking codec was proposed in 1979 at AT&T Bell Labs, Murray Hill
      ◦ MP3 is based on OCF (Optimum Coding in the Frequency Domain) and PXFM (Perceptual Transform Coding)
      ◦ MPEG-1 Audio Layer III public release: 1993
      ◦ MPEG-2 Audio Layer III public release: 1995
  • Slide 34
  • MPEG Audio MP3
      ◦ 1997: mp3.com offers thousands of MP3s created by independent artists for free
      ◦ 1999: Napster, peer-to-peer MP3 file sharing; problem: copyright infringement
      ◦ Authorized services: Amazon.com, Rhapsody, Juno Records, ...
  • Slide 35
  • Fletcher-Munson Contours: each contour connects sounds of equal perceived loudness; perception sensitivity (loudness) is not linear across frequencies and intensities
  • Slide 36
  • MPEG-4 Audio (Successor of MP3)
      ◦ Advanced Audio Coding (AAC), now part of MPEG-4 Audio
      ◦ Supports up to 48 full-bandwidth audio channels
      ◦ Default audio format for iPhone, iPad, Nintendo, PlayStation, Nokia, Android, BlackBerry
      ◦ Introduced in 1997 as MPEG-2 Part 7; updated in 1999 and included in MPEG-4
  • Slide 37
  • MPEG-4 Audio
      ◦ Bit rates of 2-64 kbps, scalable for variable rates
      ◦ MPEG-4 defines a set of coders:
          ▪ Parametric coding techniques: low bit rates (2-6 kbps), 8 kHz sampling frequency
          ▪ Code-excited linear prediction: medium bit rates (6-24 kbps), 8 and 16 kHz sampling rates
          ▪ Time/frequency techniques: high-quality audio at 16 kbps and higher bit rates, sampling rate > 7 kHz