PAC/AAC audio coding standard
A. [email protected] Institute of Technology, ECE8873, Spring 2004
Overview
Audio Recording
Coding: ultimate goal
AAC Encoder Block Diagram
Principles of Psychoacoustics
Perceptual Entropy
Quantization and Coding
Samples
Introduction
"If a tree falls in the forest with no one around to hear it, does it make a sound?"
Audio Recording
Edison, 1877
Audio Recording
Philips, 1978
A/D Converter
PCM
Coding
Ultimate Goal: reduce the number of bits needed to represent the data.
Bitrate = Fsa x Wordlength (Fsa: sampling frequency; per channel)
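As a rough worked example of the bitrate relation, the sketch below computes the PCM bitrate and the compression ratio against a coded bitrate. The 16-bit wordlength is an assumption (the slides do not state it); the 48 kHz stereo source and the 128 kbps AAC target are taken from the Samples slide. The function name is hypothetical.

```python
def pcm_bitrate_bps(sample_rate_hz, wordlength_bits, channels=1):
    """Uncompressed PCM bitrate in bits per second."""
    return sample_rate_hz * wordlength_bits * channels

# Assumed: 16-bit samples (not stated in the slides).
pcm = pcm_bitrate_bps(48_000, 16, channels=2)
aac = 128_000  # coded bitrate from the Samples slide
print(f"PCM: {pcm / 1000:.0f} kbps, ratio vs. 128 kbps AAC: {pcm / aac:.1f}:1")
```

Under these assumptions the coder has to discard roughly eleven of every twelve bits without audible damage, which is what the rest of the deck is about.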
AAC Encoder Block Diagram
[Figure: AAC encoder block diagram. Blocks: input s(n); Perceptual Model; Gain Control; MDCT; TNS; Multi-Channel (M/S, Intensity); Prediction (z^-1); Scale Factor Extract; Quant; Entropy Coding; Iterative Rate Control Loop; Side information coding; Bitstream, channel.]
Principles of Psychoacoustics
Source localization.
Two ears are necessary.
The brain uses intensity differences and time delays between the two perceived signals.
Principles of Psychoacoustics
Absolute Hearing Threshold
[Figure: absolute hearing threshold curve, separating the inaudible region from the audible region.]
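The threshold curve is often approximated in the perceptual coding literature by a closed-form fit in dB SPL. The sketch below uses that commonly quoted approximation; the formula does not appear on the slide, so treat it and the function names as assumptions rather than the author's model.

```python
import math

def absolute_threshold_db_spl(f_hz):
    """Commonly used fit of the threshold in quiet (assumed, not from the slides)."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def is_audible(f_hz, level_db_spl):
    """A lone tone is treated as audible if it lies above the threshold in quiet."""
    return level_db_spl > absolute_threshold_db_spl(f_hz)

for f in (100, 1000, 3300, 15000):
    print(f, "Hz:", round(absolute_threshold_db_spl(f), 1), "dB SPL")
```

The fit is lowest around 3 to 4 kHz, where the ear is most sensitive, and rises steeply toward both ends of the spectrum.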
Principles of Psychoacoustics
Human Ear Loudness characteristic
Robinson and Dadson equi-loudness contours.
Principles of Psychoacoustics
Critical Bands
Concept introduced by Harvey Fletcher in 1940.
Frequency-to-place transform.
Function of frequency that quantifies the cochlear filter passbands.
Example: The critical band for a 1 kHz tone is about 160 Hz wide. A narrow-band noise centered at 1 kHz is perceived with the same loudness as long as its width is below 160 Hz.
$BW_c(f) = 25 + 75\,[1 + 1.4\,(f/1000)^2]^{0.69}$ (Hz)
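A quick numerical check of the critical bandwidth expression, assuming it reads BW_c(f) = 25 + 75 [1 + 1.4 (f/1000)^2]^0.69 with f in Hz. The function name is hypothetical.

```python
def critical_bandwidth_hz(f_hz):
    """Critical bandwidth in Hz at center frequency f_hz (formula as assumed above)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

for f in (100, 1000, 4000, 10000):
    print(f, "Hz ->", round(critical_bandwidth_hz(f), 1), "Hz wide")
```

At 1 kHz this gives about 162 Hz, in line with the "about 160 Hz" example, and the bands widen quickly at higher frequencies.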
Principles of Psychoacoustics
Simultaneous Masking: Frequency
[Figure: simultaneous masking in frequency, showing inaudible and audible regions around a masker.]
Principles of Psychoacoustics
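Simplified Paradigms:
Tone Masking Noise: $TH_N = E_T - 14.5 - B$ (dB)
Noise Masking Tone: $TH_T = E_N - K$ (dB)
$E_T$, $E_N$: level of the tone or noise masker; $B$: critical band number (Bark).
K = 3 dB ... 5 dB (constant)
[Figure: masker and masked threshold for each paradigm, drawn within 1 Bark.]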
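The two simplified paradigms can be sketched numerically, assuming they read TH_N = E_T - 14.5 - B for a tone masking noise and TH_T = E_N - K for noise masking a tone, with all levels in dB. The function names and the default K = 4 dB (a value inside the 3 to 5 dB range given) are illustrative assumptions.

```python
def threshold_tone_masking_noise(e_tone_db, bark_band):
    """Noise threshold under a tonal masker: TH_N = E_T - 14.5 - B (dB)."""
    return e_tone_db - 14.5 - bark_band

def threshold_noise_masking_tone(e_noise_db, k_db=4.0):
    """Tone threshold under a noise masker: TH_T = E_N - K (dB), K assumed 4 dB."""
    return e_noise_db - k_db

# An 80 dB masker in Bark band 10, for both paradigms.
print(threshold_tone_masking_noise(80.0, 10))  # 55.5
print(threshold_noise_masking_tone(80.0))      # 76.0
```

The gap between the two results reflects the asymmetry of masking: at equal level, noise is a much stronger masker than a tone.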
Principles of Psychoacoustics
Spread of Masking
[Figure: masking threshold spreading around the masker; 1 Bark scale marker.]
Principles of Psychoacoustics
Masking: Temporal
Perceptual Entropy
Perceptual Entropy: an objective metric of perceptually relevant information, introduced by J. Johnston.
The perceived information from an audio signal is only a fraction of the total information emanated by the source.
Perceptual Entropy
Procedure:
1. Window and transform to frequency.
2. Masking Threshold is computed using perceptual rules.
3. A determination is made of the number of bits required to quantize the spectrum without injecting perceptible noise.
Perceptual Entropy
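s(n) -> Hann Window -> MDCT -> Determine nature (Noise-like / Tone-like) -> Apply Thresholding rules
Spectral Flatness Measure: $\mathrm{SFM} = \mu_g / \mu_a$ (geometric mean over arithmetic mean of the power spectrum)
Coefficient of 'Tonality': $\alpha = \min\left(\frac{\mathrm{SFM}_{dB}}{-60},\ 1\right)$
Offset: $O_i = \alpha\,(14.5 + i) + (1 - \alpha)\,5.5$ (dB)
JND Estimates: $T_i = 10^{\log_{10}(C_i) - (O_i/10)}$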
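A minimal sketch of the tonality-driven threshold computation, assuming the slide's formulas read SFM = geometric mean / arithmetic mean of the power spectrum, alpha = min(SFM_dB / -60, 1), O_i = alpha (14.5 + i) + (1 - alpha) 5.5 dB, and T_i = 10^(log10(C_i) - O_i/10). All function names are hypothetical, and C_i is simply passed in as a given per-band energy.

```python
import math

def spectral_flatness_db(power):
    """SFM in dB: 10*log10(geometric mean / arithmetic mean); power values must be > 0."""
    n = len(power)
    geo = math.exp(sum(math.log(p) for p in power) / n)
    arith = sum(power) / n
    return 10.0 * math.log10(geo / arith)

def tonality(sfm_db):
    """Coefficient of tonality: near 0 for noise-like, 1 for tone-like spectra."""
    return min(sfm_db / -60.0, 1.0)

def offset_db(alpha, band):
    """Blend of the tone-masking-noise and noise-masking-tone offsets for band i."""
    return alpha * (14.5 + band) + (1.0 - alpha) * 5.5

def jnd_threshold(c_band, offset):
    """Threshold T_i obtained by lowering the band energy C_i by the offset in dB."""
    return 10.0 ** (math.log10(c_band) - offset / 10.0)

flat = [1.0, 1.0, 1.0, 1.0]     # noise-like: flat power spectrum
peaky = [1e-9, 1e-9, 1.0, 1e-9] # tone-like: one dominant line
for name, spec in (("flat", flat), ("peaky", peaky)):
    a = tonality(spectral_flatness_db(spec))
    print(name, "alpha =", round(a, 3), "offset =", round(offset_db(a, 10), 2), "dB")
```

A flat spectrum yields alpha = 0 and the 5.5 dB offset, while a single dominant line drives alpha to 1 and selects the larger 14.5 + i offset.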
Perceptual Entropy
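$$PE = \sum_{i=1}^{25} \sum_{\omega = bl_i}^{bh_i} \left[ \log_2\left( 2\left| \mathrm{nint}\left( \frac{\mathrm{Re}(\omega)}{\sqrt{6T_i/k_i}} \right) \right| + 1 \right) + \log_2\left( 2\left| \mathrm{nint}\left( \frac{\mathrm{Im}(\omega)}{\sqrt{6T_i/k_i}} \right) \right| + 1 \right) \right] \quad \text{(bits/sample)}$$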
i: index of critical band; bl_i, bh_i: lower and upper bounds of band i; k_i: number of transform components in band i; T_i: masking threshold in band i; nint: rounding to the nearest integer.
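The PE sum can be sketched directly from the symbol definitions above, assuming each spectral line contributes log2(2 |nint(Re / sqrt(6 T_i / k_i))| + 1) plus the same term for the imaginary part. The function names and the data layout (a list of complex lines plus inclusive band edges) are illustrative assumptions, and no extra normalization is applied to the double sum.

```python
import math

def nint_abs(x):
    """|nint(x)|: magnitude of x rounded to the nearest integer (halves round up)."""
    return math.floor(abs(x) + 0.5)

def perceptual_entropy(spectrum, bands, thresholds):
    """Double sum over bands and spectral lines.

    spectrum:   list of complex transform components
    bands:      list of (bl_i, bh_i) inclusive index bounds
    thresholds: list of masking thresholds T_i, one per band
    """
    total = 0.0
    for (lo, hi), t in zip(bands, thresholds):
        k = hi - lo + 1                 # number of components in the band
        step = math.sqrt(6.0 * t / k)   # threshold-derived quantizer step
        for w in spectrum[lo:hi + 1]:
            total += math.log2(2 * nint_abs(w.real / step) + 1)
            total += math.log2(2 * nint_abs(w.imag / step) + 1)
    return total

# One band of six lines with T = 1, so the step is exactly 1.
spec = [3 + 0j] + [0j] * 5
print(perceptual_entropy(spec, [(0, 5)], [1.0]))  # log2(7), about 2.81
```

Lines that fall well below the threshold round to zero and cost no bits, which is the "no one can hear it" case in numerical form.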
Returning
"If a tree falls in the forest with no one around to hear it, does it make a sound?"
From a Perceptual Coding standpoint, if no one can hear it, THERE IS NO TREE.
AAC Encoder Block Diagram
[Figure: AAC encoder block diagram (repeated). Blocks: input s(n); Perceptual Model; Gain Control; MDCT; TNS; Multi-Channel (M/S, Intensity); Prediction (z^-1); Scale Factor Extract; Quant; Entropy Coding; Iterative Rate Control Loop; Side information coding; Bitstream, channel.]
Quantization and Coding
Power-law quantizer
Huffman Coding (table can be chosen)
Global Gain -> quantization step size
Scale Factors -> noise shaping factor
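To illustrate what a power-law quantizer does, here is a simplified sketch. The 3/4 compression exponent and the 0.4054 rounding offset follow values commonly described for AAC encoders, but they are assumptions here, as are the function names; this is not a bit-exact AAC quantizer.

```python
def quantize(x, step, offset=0.4054):
    """Power-law quantizer: compress the magnitude with exponent 3/4, then round.

    step plays the role of the quantization step size set by the global gain
    and scale factors; offset is an assumed rounding constant.
    """
    q = int((abs(x) / step) ** 0.75 + offset)
    return -q if x < 0 else q

def dequantize(q, step):
    """Inverse: expand the magnitude with exponent 4/3 and rescale."""
    mag = abs(q) ** (4.0 / 3.0) * step
    return -mag if q < 0 else mag

for x in (0.3, 2.0, 16.0, -100.0):
    q = quantize(x, 1.0)
    print(x, "->", q, "->", round(dequantize(q, 1.0), 2))
```

Because of the power law, large coefficients are quantized more coarsely than small ones, so the quantization noise grows with signal level instead of staying constant.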
Quantization and Coding
while NOISE_CTL
    FINDING_RATE = 1;
    while FINDING_RATE
        Nr_bits = get_bits_needed();
        if (Nr_bits > max_bits)
            adjust_global_gain();
        else
            FINDING_RATE = 0;
        end
    end
    q_noise = get_quant_noise_level();
    if (q_noise > Th(band))
        adjust_band_scale_factor();
    else
        NOISE_CTL = 0;
    end
end
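The nested loops above can be turned into a small runnable toy. Everything concrete here is an assumption made for illustration: a uniform quantizer instead of the power-law one, log2(2|q| + 1) as a stand-in for the Huffman bit count, mean squared error as the noise measure, 2^(1/4) gain steps, and an iteration cap on the outer loop.

```python
import math

GAIN_STEP = 2 ** 0.25  # assumed step for global gain and scale factor changes

def quantize_band(coeffs, step):
    return [int(round(c / step)) for c in coeffs]

def bits_needed(qbands):
    # Stand-in for the entropy coder's bit count (assumption, not Huffman).
    return sum(math.log2(2 * abs(q) + 1) for band in qbands for q in band)

def rate_noise_loop(bands, thresholds, max_bits, max_iters=64):
    """Returns (global_step, band_gain, quantized_bands, noise_ok)."""
    global_step = 1.0
    band_gain = [1.0] * len(bands)  # per-band scale factors
    q = []
    for _ in range(max_iters):      # outer loop: noise control
        while True:                 # inner loop: find a step that fits the bit budget
            q = [quantize_band(b, global_step / g)
                 for b, g in zip(bands, band_gain)]
            if bits_needed(q) <= max_bits:
                break
            global_step *= GAIN_STEP
        noisy = []
        for i, (b, g) in enumerate(zip(bands, band_gain)):
            step = global_step / g
            err = sum((c - qi * step) ** 2 for c, qi in zip(b, q[i])) / len(b)
            if err > thresholds[i]:
                noisy.append(i)
        if not noisy:
            return global_step, band_gain, q, True
        for i in noisy:             # amplify bands whose noise exceeds the threshold
            band_gain[i] *= GAIN_STEP
    return global_step, band_gain, q, False

bands = [[10.0, -3.0], [0.5, 0.2]]
step, gains, q, ok = rate_noise_loop(bands, [0.1, 0.1], max_bits=100)
print(ok, round(step, 3), [round(g, 3) for g in gains], q)
```

The two goals pull against each other: raising a band's scale factor lowers its noise but costs bits, which can force the inner loop to coarsen the global step again. The iteration cap and the returned flag cover the case where no setting satisfies both.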
Samples
Castanets
Original 48kHz Stereo
128kbps AAC Stereo (48kHz)
Piano
Timpani
References
[1] Ted Painter and Andreas Spanias. Perceptual coding of digital audio. Proceedings of the IEEE, 88(4):449-513, April 2000.
[2] Karlheinz Brandenburg. MP3 and AAC explained. AES 17th International Conference on High Quality Audio Coding, 1999.
[3] J.D. Johnston and A.J. Ferreira. Sum-Difference Stereo Transform Coding. Proc. ICASSP, 1992.
[4] Deepen Sinha and James D. Johnston. Audio Compression at low bit rates using a Signal Adaptive switched Filterbank. Proc. ICASSP, 1996, pp. 1053-1056.