© Fraunhofer IDMT
Multimedia Standards
Summer Semester 2017
Lecture 7
Prof. Dr.-Ing. Karlheinz Brandenburg
Contact:
Dipl.-Inf. Thomas Köllmer
AUDIO CODING
Psychoacoustic Fundamentals
MPEG Audio Coding
Speech Coding
Capabilities of the human ear
Frequency range: approx. 16 Hz up to 8-25 kHz, depending on the listener (typically 16 Hz to 20 kHz)
Frequency resolution: approx. 640 steps
Dynamic range: approx. 120-130 dB
Dynamic resolution: better than 1 dB
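The dynamic range can be related to the word length of uniform PCM with the usual rule of thumb of about 6.02 dB of signal-to-noise ratio per bit (plus 1.76 dB for a full-scale sine). The following minimal sketch applies that rule; the function name is illustrative, and real converters fall short of this ideal figure.

```python
import math

def bits_for_dynamic_range(range_db: float) -> int:
    """Smallest uniform PCM word length whose ideal SNR covers range_db.

    Rule of thumb for a full-scale sine and a uniform quantizer:
    SNR ~= 6.02 dB * bits + 1.76 dB.
    """
    return math.ceil((range_db - 1.76) / 6.02)

for db in (120, 130):
    print(f"{db} dB -> {bits_for_dynamic_range(db)} bits")
```

Under these assumptions, the 120-130 dB range of the ear corresponds to roughly 20-22 bits.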
Psychoacoustic Fundamentals
Human Hearing
Source: Ars Auditus; http://www.dasp.uni-wuppertal.de/index.php?id=57, 2010
Figure labels: outer ear, middle ear, inner ear; pinna, ear canal, ear drum; ossicles, eustachian tube; archways (semicircular canals), cochlea with organ of Corti
Psychoacoustic Fundamentals
Schematic drawing of the organ of Corti
Source: Zwicker & Fastl, “Psychoacoustics: Facts and Models”
Psychoacoustic Fundamentals
Preprocessing of sound in the peripheral system
Source: Zwicker & Fastl, “Psychoacoustics: Facts and Models”
Psychoacoustic Fundamentals
Information processing in the auditory system
Source: Zwicker & Fastl, “Psychoacoustics: Facts and Models”
Psychoacoustic Fundamentals
Sound perception
Source: Zwicker & Fastl, “Psychoacoustics: Facts and Models”
Psychoacoustic Fundamentals
Threshold in quiet or absolute threshold
Source: Zwicker & Fastl, “Psychoacoustics: Facts and Models”
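Coder implementations often replace the measured curve in the figure with an analytic approximation. The sketch below uses Terhardt's formula (result in dB SPL, frequency in kHz inside the formula); it is swapped in from the literature and is not taken from the slide.

```python
import math

def threshold_in_quiet_db(f_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing in dB SPL."""
    f = f_hz / 1000.0  # the formula is written in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

for f in (50, 100, 1000, 3300, 10000, 16000):
    print(f"{f:>6} Hz: {threshold_in_quiet_db(f):6.1f} dB")
```

The approximation reproduces the main features of the curve: a steep rise towards low frequencies, the dip of highest sensitivity around 3-4 kHz, and the rise above about 10 kHz.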
Psychoacoustic Fundamentals
Critical bands (“Frequenzgruppen”) in human hearing:
Different interpretations that produce the same segmentation
Each critical band corresponds to a constant distance along the cochlea
Tones in a critical band above the threshold in quiet: their energy adds up
Tones in a critical band below the threshold in quiet: their energy adds up and might become audible
Approximate “formula” for the width of the critical bands:
for frequencies < 500 Hz: constant width of approx. 100 Hz
for frequencies > 500 Hz: approx. 0.2 × center frequency
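The rule of thumb above translates directly into a small helper. A minimal sketch, assuming the two-piece approximation exactly as stated on the slide (the measured curve on the next slide is smoother around 500 Hz):

```python
def critical_bandwidth_hz(f_center_hz: float) -> float:
    """Approximate critical bandwidth: ~100 Hz below 500 Hz, ~0.2 * f above."""
    if f_center_hz < 500.0:
        return 100.0
    return 0.2 * f_center_hz

for f in (100, 400, 1000, 4000, 10000):
    print(f"{f:>6} Hz -> critical bandwidth ~ {critical_bandwidth_hz(f):6.0f} Hz")
```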
Psychoacoustic Fundamentals
Critical bandwidth as a function of frequency
Approximations for low and high frequency ranges are indicated by broken lines.
Source: Zwicker & Fastl, “Psychoacoustics: Facts and Models”
Psychoacoustic Fundamentals
Pure tones masked by white (broad-band) noise
Source: Zwicker & Fastl, “Psychoacoustics: Facts and Models”
Psychoacoustic Fundamentals
Narrow band noise masking a tone at different center frequencies
Source: Zwicker & Fastl, “Psychoacoustics: Facts and Models”
Psychoacoustic Fundamentals
Narrow-band noise masking a tone at varying levels (center frequency: 1 kHz)
Source: Zwicker & Fastl, “Psychoacoustics: Facts and Models”
Psychoacoustic Fundamentals
Masking neighboring bands
Source: U. Zölzer, “Digitale Audiosignalverarbeitung”
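Encoders commonly model the spread of masking into neighboring bands with a spreading function on the Bark (critical-band rate) scale. The sketch below uses Schroeder's spreading function and Zwicker & Terhardt's analytic Bark approximation; both are swapped in from the literature and are not taken from the slide. The masker frequency of 1 kHz is an arbitrary choice.

```python
import math

def hz_to_bark(f_hz: float) -> float:
    """Zwicker & Terhardt's analytic approximation of the critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def spreading_db(delta_bark: float) -> float:
    """Schroeder's spreading function in dB.

    delta_bark = z(maskee) - z(masker); positive values mean the maskee lies above the masker.
    """
    x = delta_bark + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

masker_bark = hz_to_bark(1000.0)
for f in (700.0, 1000.0, 1500.0, 2000.0):
    dz = hz_to_bark(f) - masker_bark
    print(f"{f:6.0f} Hz  dz = {dz:+.2f} Bark  spreading = {spreading_db(dz):6.1f} dB")
```

The function is asymmetric: masking decays more slowly towards higher frequencies than towards lower ones.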
Psychoacoustic Fundamentals
Masking in the time domain / temporal masking effects
Depends on various factors
Duration of the masking signal
Intensity and spectrum of the masker
Time and frequency of both signals
Source: Zwicker & Fastl, “Psychoacoustics: Facts and Models”
Psychoacoustic Fundamentals
Temporal masking effects
Post-masking: corresponds to the expected decay of the masker's effect after the masker is switched off
Pre-masking: occurs in the time before the masker is switched on
Quick build-up time for loud maskers
Slower build-up time for faint test sounds
Frequency resolution ⇒ blurring in time: the frequency resolution of the ear implies masking in time
Because the ear processes loud signals faster than quiet ones, coding noise shortly before a loud onset can be masked; noise that spreads further ahead of the onset is heard as a pre-echo
Pre-masking: 1-5 ms
Post-masking: ~100 ms
Psychoacoustic Fundamentals
Pre-Echo: Example without Pre-Echo
Psychoacoustic Fundamentals
Pre-Echo: Example
Audio Examples
Example 10: Castanets, original
Example 11: Castanets, coded with a block size of 2048 samples
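Why a 2048-sample block produces an audible pre-echo can be estimated by comparing the block duration with the pre-masking interval of 1-5 ms from the temporal-masking slide. A rough sketch, assuming a sampling rate of 44.1 kHz (the example does not state one) and the simplification that quantization noise spreads over the whole block:

```python
PRE_MASKING_MS = (1.0, 5.0)  # pre-masking range from the temporal-masking slide

def block_duration_ms(block_size: int, sample_rate_hz: float) -> float:
    return 1000.0 * block_size / sample_rate_hz

def pre_echo_likely(block_size: int, sample_rate_hz: float) -> bool:
    """Simplification: noise may appear anywhere in the block, so a block longer than
    the longest pre-masking interval can put audible noise before a transient."""
    return block_duration_ms(block_size, sample_rate_hz) > PRE_MASKING_MS[1]

for n in (2048, 256, 128):
    ms = block_duration_ms(n, 44100.0)
    print(f"{n:5d} samples @ 44.1 kHz = {ms:5.1f} ms, pre-echo likely: {pre_echo_likely(n, 44100.0)}")
```

This is the motivation for switching to short blocks around transients, as Layer-3 does with its short MDCT blocks.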
Demo: The "13 dB-miracle"
Original signal
Original + white noise, SNR = 13.6 dB
Original + noise at threshold, SNR = 13.6 dB
Difference (modulated white noise)
Difference (noise at threshold)
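The SNR in the demo is a global energy ratio, so it says nothing about where the noise sits in frequency. The sketch below only reproduces the arithmetic for the white-noise case, scaling noise to a target SNR of 13.6 dB; shaping the noise to the masking threshold would require a psychoacoustic model and is not shown. The test signal (a 440 Hz sine at 48 kHz) is an arbitrary choice.

```python
import math
import random

def power(x):
    return sum(v * v for v in x) / len(x)

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from two sample sequences."""
    return 10.0 * math.log10(power(signal) / power(noise))

def add_white_noise(signal, target_snr_db, seed=0):
    """Add white Gaussian noise scaled so that the overall SNR equals target_snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    gain = math.sqrt(power(signal) / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    noise = [gain * n for n in noise]
    return [s + n for s, n in zip(signal, noise)], noise

sig = [math.sin(2 * math.pi * 440 * t / 48000) for t in range(48000)]
noisy, noise = add_white_noise(sig, 13.6)
print(f"SNR = {snr_db(sig, noise):.1f} dB")
```

White noise at this SNR is clearly audible, whereas noise of the same total power shaped to the masking threshold is not; that is the point of the demo.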
The "13 dB-miracle"
The McGurk Effect
Psychoacoustic Fundamentals
Block diagram of a perceptual audio encoder
Audio in → analysis filter bank → quantization and coding → serial bitstream multiplexing → bitstream
Calculation of the masking threshold based on psychoacoustics, controlling the quantization and coding
The Basic Paradigm of T/F Domain Audio Coding
Digital audio input → filter bank → bit or noise allocation → quantized samples → bitstream formatting → encoded bitstream
Psychoacoustic model → signal-to-mask ratio → bit or noise allocation
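The bit or noise allocation step can be illustrated with a greedy loop driven by the signal-to-mask ratio. This is a simplified sketch, not the allocation procedure of any MPEG layer: it assumes each additional bit per sample improves the SNR by about 6.02 dB and that every band carries 12 samples (as in a Layer-1 block), and it keeps giving bits to the band with the largest noise-to-mask ratio (NMR = SMR - SNR) until the noise is masked everywhere or the budget is spent.

```python
def allocate_bits(smr_db, total_bits, samples_per_band=12, max_bits=15, db_per_bit=6.02):
    """Greedy SMR-driven bit allocation (simplified, not the MPEG procedure).

    smr_db: signal-to-mask ratio per band in dB (from the psychoacoustic model)
    total_bits: bit budget for the samples of all bands
    Returns the number of bits per sample allocated to each band.
    """
    alloc = [0] * len(smr_db)
    remaining = total_bits
    while True:
        # Noise-to-mask ratio per band, assuming SNR ~ db_per_bit * bits
        candidates = [
            (smr - db_per_bit * bits, band)
            for band, (smr, bits) in enumerate(zip(smr_db, alloc))
            if bits < max_bits and samples_per_band <= remaining
        ]
        if not candidates:
            break  # budget exhausted or every band at max_bits
        worst_nmr, band = max(candidates)
        if worst_nmr <= 0.0:
            break  # quantization noise is below the mask in every band
        alloc[band] += 1
        remaining -= samples_per_band
    return alloc

print(allocate_bits([20.0, 5.0, -3.0], total_bits=1000))
print(allocate_bits([20.0, 20.0], total_bits=24))
```

With a generous budget, a band stops receiving bits once its noise falls below the mask; with a tight budget, the loop stops early and the remaining NMR shows where noise becomes audible.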
MPEG Audio Coding
History of Audio Coding
1979 - the “Critical Band Coder”
1982 - “classic ATC” for music
1985 - MSC
1987 - OCF
1990 - MUSICAM
1990 - ASPEC
1992 - MPEG-1
1996 - PAC
1997 - MPEG-2 AAC
1999 - MPEG-4 AAC
2002 - HE-AAC
2005 - MPEG-4 ALS
2006 - MPEG-D MPEG Surround (MPS)
2010 - MPEG-D Spatial Audio Object Coding (SAOC)
2012 - MPEG-D Unified Speech and Audio Coding (USAC)
2015 - MPEG-H High Efficiency Coding and Media Delivery in Heterogeneous Environments
MPEG-1 Audio
Main building blocks
Perceptual model:
- using psychoacoustics, mostly proprietary
Filter bank:
- subdivides the input signal into spectral components
- more lines ⇒ more coding gain
- longer impulse response ⇒ pre-echo artifacts
Quantization & coding:
- this is the step that introduces quantization noise
- the spectral shape of the quantization noise determines its audibility
- can be designed so that the encoding method is left open
MPEG-1 Audio
Structure of the Encoder
MPEG-1 Audio
Structure of the Decoder
MPEG-1 Audio
Short description of the layers
Layer-1: frame length 384 samples (approx. 8 ms at 48 kHz); spectral resolution: 32 subbands; quantization: block companding (12 samples)
Layer-2: frame length 1152 samples (approx. 24 ms at 48 kHz); spectral resolution: 32 subbands; quantization: block companding (12 samples); use of scale factor select information
Layer-3: frame length 1152 samples (approx. 24 ms at 48 kHz); spectral resolution: 576 spectral lines; quantization: non-uniform with Huffman coding; use of scale factor select information
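The frame durations above follow from samples per frame divided by the sampling rate. The same numbers give the frame size in bytes; the sketch below uses the commonly cited MPEG-1 formulas (Layer-1 frames are counted in 4-byte slots, Layers 2 and 3 in single bytes), with integer truncation and an optional padding slot.

```python
SAMPLES_PER_FRAME = {1: 384, 2: 1152, 3: 1152}

def frame_duration_ms(samples_per_frame: int, sample_rate_hz: int) -> float:
    return 1000.0 * samples_per_frame / sample_rate_hz

def frame_length_bytes(layer: int, bitrate_bps: int, sample_rate_hz: int, padding: int = 0) -> int:
    """Size of one MPEG-1 audio frame in bytes, header included.

    Layer 1: 384 samples / 8 bits = 48 bytes per (bit/s per Hz) = 12 four-byte slots.
    Layers 2 and 3: 1152 samples / 8 bits = 144 bytes per (bit/s per Hz).
    """
    if layer == 1:
        return (12 * bitrate_bps // sample_rate_hz + padding) * 4
    return 144 * bitrate_bps // sample_rate_hz + padding

for layer, n in SAMPLES_PER_FRAME.items():
    print(f"Layer-{layer}: {frame_duration_ms(n, 48000):.0f} ms at 48 kHz")
print(frame_length_bytes(3, 128000, 44100), "bytes per Layer-3 frame at 128 kbit/s, 44.1 kHz")
```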
MPEG-1 Audio
Joint stereo mode for a further increase of the compression ratio
Different possibilities:
Mid/Side (M/S) stereo coding:
Two channels are coded, but instead of left and right, one channel carries the sum of the two original channels (mid) and the other carries their difference (side)
Intensity coding:
Either both channels are coded separately (“stereo” mode) or intensity stereo coding is used
For higher frequencies, only a mono signal is transmitted, together with scaling information so that it is reproduced close to the original stereo position
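A minimal sketch of the sum/difference idea, using the M = (L+R)/2, S = (L-R)/2 convention that appears on the Layer-3 slide. As far as we know, Layer-3 itself normalizes by 1/√2 instead of 1/2; the scaling does not change the principle that strongly correlated channels yield a small side signal. The sample values are made up.

```python
def ms_encode(left, right):
    """Mid/side transform with the (L+R)/2, (L-R)/2 convention."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Inverse transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

left = [0.50, 0.40, -0.20, 0.10]
right = [0.48, 0.41, -0.19, 0.10]
mid, side = ms_encode(left, right)
print(side)  # small values for highly correlated channels, so cheap to code
```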
MPEG-1 Audio Layer-1
Philips trademark: PASC
Processing of frames with 384 PCM samples each
The signal is split into 32 bands by a polyphase filter bank
32 frequency bands of 12 samples each
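Block companding can be sketched as follows: the 12 samples of one subband share a scale factor, and the normalized samples are quantized uniformly. Assumptions: the scale factor table 2^(1 - i/3), i = 0..62, which is our understanding of the Layer-1/Layer-2 table; a simple midtread quantizer with 2^bits - 1 levels instead of the exact quantization formula of the standard; and subband samples within ±2.

```python
import random

# Scale factors 2.0, 1.587..., 1.26..., 1.0, ... (descending), indices 0..62
SCALEFACTORS = [2.0 * 2.0 ** (-i / 3.0) for i in range(63)]

def choose_scalefactor(block):
    """Index of the smallest table scale factor that is >= the block's peak magnitude."""
    peak = max(abs(x) for x in block)
    best = 0
    for i, sf in enumerate(SCALEFACTORS):
        if sf >= peak:
            best = i  # table is descending, so later indices are smaller factors
        else:
            break
    return best

def quantize_block(block, bits):
    """Normalize a 12-sample block by its scale factor, then quantize with 2**bits - 1 levels."""
    assert bits >= 2, "this simplified quantizer needs at least 2 bits"
    idx = choose_scalefactor(block)
    steps = (2 ** bits - 1) // 2  # number of positive levels
    codes = [round(x / SCALEFACTORS[idx] * steps) for x in block]
    return idx, codes

def dequantize_block(idx, codes, bits):
    steps = (2 ** bits - 1) // 2
    return [c / steps * SCALEFACTORS[idx] for c in codes]

rng = random.Random(0)
block = [rng.uniform(-0.3, 0.3) for _ in range(12)]  # one 12-sample subband block
idx, codes = quantize_block(block, bits=8)
restored = dequantize_block(idx, codes, bits=8)
print(idx, max(abs(a - b) for a, b in zip(block, restored)))
```

Only the scale factor index and the short codes are transmitted, so all 12 samples share the cost of describing their level.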
MPEG-1 Audio Layer-1
MPEG-1 Audio Layer-2
Trademark: MUSICAM
Processing of frames with 1152 PCM samples each
36 subband samples per band, grouped into 3 blocks of 12 samples each
Like Layer-1, Layer-2 transmits bit allocation, scale factors, and samples
Additionally: scale factor select information and packing of bits
Theoretical minimum coding/decoding delay: approx. 35 ms
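Scale factor select information (SCFSI) lets Layer-2 send fewer than three scale factors when the three 12-sample blocks of a subband have similar levels. The sketch below encodes the four transmission patterns (0: three scale factors; 1: one shared by blocks 1 and 2 plus one for block 3; 2: one for all blocks; 3: one for block 1 plus one shared by blocks 2 and 3). It only merges exactly equal indices; a real encoder also merges nearly equal ones according to its own rules.

```python
def scfsi_pattern(sf):
    """Pick SCFSI for three scale factor indices; returns (scfsi, transmitted indices)."""
    a, b, c = sf
    if a == b == c:
        return 2, [a]
    if a == b:
        return 1, [a, c]
    if b == c:
        return 3, [a, b]
    return 0, [a, b, c]

def expand(scfsi, transmitted):
    """Decoder side: rebuild the three scale factors from SCFSI and the transmitted ones."""
    if scfsi == 0:
        return list(transmitted)
    if scfsi == 1:
        return [transmitted[0], transmitted[0], transmitted[1]]
    if scfsi == 2:
        return [transmitted[0]] * 3
    return [transmitted[0], transmitted[1], transmitted[1]]

for sf in [(10, 10, 10), (10, 10, 12), (8, 10, 10), (8, 9, 10)]:
    print(sf, scfsi_pattern(sf))
```

With 6 bits per scale factor and 2 bits of SCFSI, pattern 2 needs 8 bits per subband and channel instead of 20 for pattern 0.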
MPEG-1 Audio Layer-2
Bitstream structure of Layer-2
Structure of the Layer-2 subband samples
MPEG-1 Audio Layer 3
Standard frame length: 1152 samples (24 ms at 48 kHz)
Frequency resolution: 576 spectral lines (long blocks) or 192 spectral lines (short blocks)
Quantization: non-uniform with Huffman coding; use of scale factor select information
MP3 does not need a file header: every frame carries its own header, so a decoder can start playing at any frame
This allows MP3 streaming
Theoretical minimum delay of the encoder/decoder: approx. 59 ms
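The non-uniform quantization of Layer 3 is usually described as a power law: the encoder compresses magnitudes with an exponent of 0.75 (with a rounding offset of 0.0946), and the decoder expands them with an exponent of 4/3, so larger values get coarser steps. A sketch under that description; the step size is passed in directly here, whereas in the bitstream it is derived from the global gain and the scale factors.

```python
import math

def quantize_l3(xr, step):
    """Power-law quantization: ix = nint((|xr| / step) ** 0.75 - 0.0946), sign kept separately."""
    out = []
    for x in xr:
        mag = math.floor((abs(x) / step) ** 0.75 - 0.0946 + 0.5)  # nint() with the offset
        out.append(-mag if x < 0 else mag)
    return out

def dequantize_l3(ix, step):
    """Decoder side: |xr| = |ix| ** (4/3) * step, with the sign restored."""
    return [math.copysign(abs(i) ** (4.0 / 3.0) * step, i) for i in ix]

print(quantize_l3([0.0, 1.0, -1.0, 100.0], step=1.0))
print([round(v, 2) for v in dequantize_l3([1, 2, 3, 4, 5], step=1.0)])
```

The second line shows the reconstruction levels moving further apart as the amplitude grows, which keeps the quantization noise roughly proportional to the signal level.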
MPEG-1 Audio Layer-3
Two different MDCT block lengths: a long block of 18 samples or a short block of 6 samples per subband
“Joint stereo” with mid/side and intensity coding:
M/S: instead of the left and right channels (L/R), the mid and side channels (M = (L+R)/2, S = (L-R)/2) are transmitted
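The MDCT turns 2N overlapping input samples into N spectral lines (N = 18 for long blocks, N = 6 for short blocks), and the time-domain aliasing it introduces cancels when neighbouring blocks are overlap-added. The following sketch checks this for N = 18 using the textbook MDCT definition, a sine window, and orthonormal scaling; Layer-3's actual window shapes, block switching, and the cascade with the 32-band polyphase filter bank are not modelled.

```python
import math
import random

def sine_window(length):
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def _basis(n, k, half):
    return math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))

def mdct(block, window):
    """2N windowed input samples -> N spectral lines (orthonormal scaling)."""
    half = len(block) // 2
    scale = math.sqrt(2.0 / half)
    return [scale * sum(window[n] * block[n] * _basis(n, k, half) for n in range(2 * half))
            for k in range(half)]

def imdct(coeffs, window):
    """N spectral lines -> 2N windowed output samples (still containing aliasing)."""
    half = len(coeffs)
    scale = math.sqrt(2.0 / half)
    return [scale * window[n] * sum(coeffs[k] * _basis(n, k, half) for k in range(half))
            for n in range(2 * half)]

def analysis_synthesis(signal, half=18):
    """50 % overlapped MDCT/IMDCT with overlap-add; aliasing cancels between neighbours."""
    window = sine_window(2 * half)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * half + 1, half):
        frame = imdct(mdct(signal[start:start + 2 * half], window), window)
        for i, v in enumerate(frame):
            out[start + i] += v
    return out

rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(6 * 18)]
y = analysis_synthesis(x)
# The first and last 18 samples are covered by only one block and are not reconstructed.
print(max(abs(a - b) for a, b in zip(x[18:-18], y[18:-18])))
```

The printed reconstruction error is at the level of floating-point rounding, even though each block on its own contains aliasing.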
MPEG-1 Audio Layer-3
Constant part:
Fixed number of bytes, independent of the bit rate (side information: 17 bytes in mono, 32 bytes in stereo)
Header (ISO standard, as in Layer I and Layer II)
Additional information for the frame (e.g., pointer to the start of the variable part)
Additional information per granule (e.g., selection of the Huffman tables)
Variable part:
Also called “main info”
Scale factors
Huffman-coded frequency lines
Additional data
In Layer III, the bit rate can be switched dynamically
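For a frame of a given size, the room left for the variable part follows from subtracting the constant part. A worked sketch, assuming a 4-byte header, an optional 2-byte CRC, and the side-information sizes from the slide; the 417-byte example corresponds to 128 kbit/s at 44.1 kHz without padding (144 × 128000 / 44100, truncated).

```python
HEADER_BYTES = 4
CRC_BYTES = 2
SIDE_INFO_BYTES = {1: 17, 2: 32}  # mono, stereo (from the slide)

def main_data_bytes(frame_bytes: int, channels: int, crc: bool = False) -> int:
    """Bytes in a Layer-3 frame slot left for scale factors, Huffman data and additional data."""
    return frame_bytes - HEADER_BYTES - (CRC_BYTES if crc else 0) - SIDE_INFO_BYTES[channels]

print(main_data_bytes(417, channels=2))            # stereo, no CRC
print(main_data_bytes(417, channels=1))            # mono, no CRC
print(main_data_bytes(417, channels=2, crc=True))  # stereo with CRC
```

Because of the bit reservoir, the main data stored in a frame's slot does not necessarily belong to that same frame.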
MPEG-1 Audio Layer-3 – Bit Reservoirs
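The bit reservoir lets a frame use bytes that earlier, easier frames left unused, which helps with demanding frames such as transients. A simplified simulation, assuming a constant per-frame main-data budget and a cap of 511 bytes, which corresponds to the 9-bit main_data_begin back pointer of MPEG-1 Layer-3; the per-frame demands are made-up numbers.

```python
MAX_RESERVOIR = 511  # limit of the 9-bit main_data_begin back pointer (MPEG-1 Layer-3)

def simulate_reservoir(demands, frame_budget, max_reservoir=MAX_RESERVOIR):
    """Grant each frame its demand if its own budget plus saved bytes allow it.

    demands: bytes of main data each frame would like to use
    frame_budget: main-data bytes per frame at a constant bit rate
    Returns a list of (granted_bytes, reservoir_after) per frame.
    """
    reservoir = 0
    history = []
    for want in demands:
        available = frame_budget + reservoir
        granted = min(want, available)
        # unused bytes carry over, but the back pointer limits how far back data may start
        reservoir = min(available - granted, max_reservoir)
        history.append((granted, reservoir))
    return history

for granted, saved in simulate_reservoir([200, 250, 381, 900, 300], frame_budget=381):
    print(f"granted {granted:4d} bytes, reservoir {saved:3d} bytes")
```

The fourth frame can use 693 bytes although its own slot holds only 381, because the first two frames saved bytes.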
Layer 3 Block Diagram
MPEG-1 Audio Layer 3
Bit Stream Syntax
MPEG-1 Audio Bit Stream Syntax: Layer-1, -2, and -3 Compression
The two layer bits in the header indicate which layer is used
The higher the layer, the better the compression, but the more processing power is required
00 reserved
01 Layer III
10 Layer II
11 Layer I
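The layer bits sit in the 32-bit frame header right after the syncword and the ID bit. The sketch below splits a header into its fields following the MPEG-1 field order (syncword 12 bits, ID 1, layer 2, protection_bit 1, bitrate_index 4, sampling_frequency 2, padding_bit 1, private_bit 1, mode 2, mode_extension 2, copyright 1, original 1, emphasis 2); the dictionary keys are illustrative names. The example bytes FF FB 90 64 are a typical Layer-3 header.

```python
LAYER_FROM_BITS = {0b00: None, 0b01: 3, 0b10: 2, 0b11: 1}  # None = reserved
SAMPLE_RATES = {0b00: 44100, 0b01: 48000, 0b10: 32000}     # MPEG-1; 0b11 is reserved

def parse_header(header: bytes) -> dict:
    """Split a 4-byte MPEG-1 audio frame header into its fields (MSB first)."""
    word = int.from_bytes(header[:4], "big")
    if word >> 20 != 0xFFF:
        raise ValueError("no syncword at the start of the header")
    return {
        "id": (word >> 19) & 0x1,
        "layer": LAYER_FROM_BITS[(word >> 17) & 0x3],
        "protection_bit": (word >> 16) & 0x1,
        "bitrate_index": (word >> 12) & 0xF,
        "sample_rate": SAMPLE_RATES.get((word >> 10) & 0x3),
        "padding_bit": (word >> 9) & 0x1,
        "private_bit": (word >> 8) & 0x1,
        "mode": (word >> 6) & 0x3,
        "mode_extension": (word >> 4) & 0x3,
        "copyright": (word >> 3) & 0x1,
        "original": (word >> 2) & 0x1,
        "emphasis": word & 0x3,
    }

print(parse_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
```

For these bytes the parser reports Layer 3, 44.1 kHz, bitrate index 9 (128 kbit/s in the Layer-3 table) and mode 1 (joint stereo); protection_bit = 1 means that no CRC follows the header.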
MPEG-1 Audio Bit Stream Syntax
MPEG-1 Audio Bit Stream Syntax – header()
MPEG-1 Audio Bit Stream Syntax – header()
MPEG-1 Audio Bit Stream Syntax – header()
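Because every frame starts with its own header, a decoder that joins a stream somewhere in the middle can scan for the 12-bit syncword and start decoding from there. A minimal sketch; a robust decoder would additionally check that another valid header follows one frame length later, since the same bit pattern can also occur inside audio data.

```python
def find_sync(data: bytes, start: int = 0) -> int:
    """Offset of the next 12-bit MPEG-1 syncword (0xFFF) at or after start, or -1."""
    for i in range(start, len(data) - 1):
        if data[i] == 0xFF and (data[i + 1] & 0xF0) == 0xF0:
            return i
    return -1

# Stream joined mid-way: two bytes of leftover data, a false 0xFF, then a real header
stream = bytes([0x12, 0x34, 0xFF, 0x00, 0xFF, 0xFB, 0x90, 0x64])
print(find_sync(stream))
```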
MPEG-1 Audio Bit Stream Syntax – audio_data()
Audio Data Layer-1
MPEG-1 Audio Bit Stream Syntax – audio_data()
Audio Data Layer-2
MPEG-1 Audio Bit Stream Syntax – audio_data()
Audio Data Layer-2
MPEG-1 Audio Bit Stream Syntax – audio_data()
Audio Data Layer-3
MPEG-1 Audio Bit Stream Syntax – audio_data()
Audio Data Layer-3
MPEG Audio Layer-3: Huffman-Code Tables
MPEG Audio Layer-3: Huffman-Code Tables
MPEG Audio Layer-3: Huffman-Code Tables
MPEG Audio Layer-3: Huffman-Code Tables
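To show how such tables are used, the sketch below codes pairs of quantized values with a hypothetical 6-entry prefix code (not one of the Layer-3 tables), followed by a sign bit for each nonzero value (1 = negative). Real Layer-3 decoding additionally selects one of several tables per region (signalled in the side information), uses escape values with extra linbits for large magnitudes, and codes the high-frequency count1 region in quadruples.

```python
# Hypothetical toy code table for value pairs (|x|, |y|); NOT one of the standard's tables.
TOY_PAIR_CODES = {
    "1": (0, 0),
    "011": (1, 1),
    "010": (1, 0),
    "001": (0, 1),
    "0001": (0, 2),
    "0000": (2, 0),
}
CODE_FOR_PAIR = {pair: code for code, pair in TOY_PAIR_CODES.items()}

def encode_pairs(pairs):
    """Codeword for (|x|, |y|), then one sign bit per nonzero value."""
    bits = []
    for x, y in pairs:
        bits.append(CODE_FOR_PAIR[(abs(x), abs(y))])
        for v in (x, y):
            if v != 0:
                bits.append("1" if v < 0 else "0")
    return "".join(bits)

def decode_pairs(bits, n_pairs):
    """Read n_pairs pairs from a bit string; returns (pairs, number of bits consumed)."""
    pos, out = 0, []
    for _ in range(n_pairs):
        code = ""
        while code not in TOY_PAIR_CODES:  # prefix-free, so the first match is the codeword
            if pos >= len(bits):
                raise ValueError("bit string ended inside a codeword")
            code += bits[pos]
            pos += 1
        pair = []
        for v in TOY_PAIR_CODES[code]:
            if v != 0:
                if pos >= len(bits):
                    raise ValueError("missing sign bit")
                if bits[pos] == "1":
                    v = -v
                pos += 1
            pair.append(v)
        out.append(tuple(pair))
    return out, pos

stream = encode_pairs([(0, 0), (1, -1), (2, 0)])
print(stream, decode_pairs(stream, 3))
```

Frequent pairs such as (0, 0) get the shortest codewords, which is where the coding gain of the Huffman stage comes from.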
More on Audio Coding will be covered in the dedicated lecture series!
http://www.tu-ilmenau.de/mt/lehrveranstaltungen/lehre-fuer-master-mt/audio-coding/
Organisational issues
Preliminary list of lectures (an updated version is on the website)
Regular date: Tuesday, 17:00, K-Hs1; alternate date: Thursday, 13:00, K-Hs2
CW* 14: Introduction
CW 15: Standardization I, Standardization II
CW 16, CW 17: (no entry)
CW 18: Video Coding I
CW 19: Video Coding II, Video Coding III
CW 20: (no entry)
CW 21: Psychoacoustic Fundamentals
CW 22: Metadata Standards
CW 23: MPEG Audio I, MPEG Audio II
CW 24: Speech Coding
CW 25, CW 26: (no entry)
CW 27: System Standards I
CW 28: System Standards II, System Standards III
* ISO 8601, Representation of dates and times, ch. 2.2.10: calendar week number: ordinal number which identifies a calendar week within its calendar year according to the rule that the first calendar week of a year is that one which includes the first Thursday of that year and that the last calendar week of a calendar year is the week immediately preceding the first calendar week of the next calendar year