mms ss17 psychoacoustics mpeg audio · © Fraunhofer IDMT 2 Psychoacoustic Fundamentals MPEG Audio Coding Speech Coding AUDIO CODING

© Fraunhofer IDMT


Multimedia Standards

SS 2017

Lecture 7

Prof. Dr.-Ing. Karlheinz Brandenburg

[email protected]

Contact:

Dipl.-Inf. Thomas Köllmer [email protected]


AUDIO CODING

Psychoacoustic Fundamentals

MPEG Audio Coding

Speech Coding


Capabilities of the human ear

Frequency Range: approx. 16 Hz up to 8-25 kHz (typically 16 Hz to 20 kHz)

Frequency Resolution: approx. 640 steps

Dynamic Range: approx. 120-130 dB

Dynamic Resolution: better than 1 dB



Human Hearing

Source: Ars Auditus; http://www.dasp.uni-wuppertal.de/index.php?id=57, 2010

[Figure: anatomy of the ear. Labels: outer ear, middle ear, inner ear, pinna, ear canal, ear drum, ossicles, Eustachian tube, archways (semicircular canals), cochlea with organ of Corti]



Schematic drawing of the organ of Corti

Source: Zwicker & Fastl, “Psychoacoustics: Facts and Models”



Preprocessing of sound in the peripheral system




Information processing in the auditory system




Sound perception




Threshold in quiet or absolute threshold




Critical bands (“Frequenzgruppen”) in human hearing:

Different interpretations that produce the same segmentation

Constant distance in the cochlea

Tones in a critical band above the threshold in quiet: their energy adds up

Tones in a critical band below the threshold in quiet: their energy adds up and might become audible

“Formula” for the width of the critical bands

for frequencies < 500 Hz: constant width of 100 Hz

for frequencies > 500 Hz: 0.2 × frequency
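
The rule of thumb above can be written as a small function. This is only a sketch of the two-piece approximation from the slide, not a full critical-band model. The function name `critical_bandwidth_hz` is chosen here for illustration, and the value at exactly 500 Hz, which the slide leaves open, is assumed to be 100 Hz, where both branches agree.

```python
def critical_bandwidth_hz(frequency_hz):
    """Approximate critical bandwidth using the two-piece rule of thumb."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    if frequency_hz < 500:
        return 100.0            # constant width below 500 Hz
    return 0.2 * frequency_hz   # 20 % of the frequency from 500 Hz upwards


for f in (100, 500, 1000, 4000):
    print(f, "Hz ->", critical_bandwidth_hz(f), "Hz wide")
```

Both branches meet at 500 Hz, so the approximation is continuous there.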



Critical bandwidth as a function of frequency

Approximations for low and high frequency ranges are indicated by broken lines.




Pure tones masked by white (broad-band) noise




Narrow band noise masking a tone at different center frequencies




Narrow band noise masking a tone at varying levels (center frequency: 1 kHz)



Source: U. Zölzer, “Digitale Audiosignalverarbeitung”


Masking neighboring bands



Masking in the time domain / temporal masking effects

Depends on various factors

Duration of the masking signal

Intensity and spectrum of the masker

Time and frequency of both signals

Source: Zwicker & Fastl, “Psychoacoustics: Facts and Models”



Temporal masking effects

Post-Masking: corresponds to the expected decay of the masker's effect

Pre-Masking: appears in the time before the masker is switched on

Quick build-up time for loud maskers

Slower build-up time for faint test sounds

Frequency resolution ⇒ Blurring in time

Frequency resolution in the ear ⇒ Masking in time

Because the ear processes transitions from quiet to loud signals quickly, pre-masking is short and pre-echoes can become audible

Pre-Masking: 1-5 ms

Post-Masking: approx. 100 ms



Pre-Echo: Example without Pre-Echo



Pre-Echo: Example


Audio Examples

Example 10:

Castanets original

Example 11:

Castanets coded with a block size of 2048 samples
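
To see why a block size of 2048 samples is a problem for castanets, compare the block duration with the pre-masking time of 1-5 ms quoted earlier. The sketch below assumes a sampling rate of 48 kHz, which is not stated for this example. It also assumes the worst case, in which quantization noise spreads over the whole transform block and so can precede an attack by up to one block duration. The function names are illustrative.

```python
def block_duration_ms(block_size, sample_rate_hz):
    """Duration of one transform block in milliseconds."""
    return 1000 * block_size / sample_rate_hz


def exceeds_pre_masking(block_size, sample_rate_hz, pre_masking_ms=5.0):
    """True if noise spread over the block can start before pre-masking does."""
    return block_duration_ms(block_size, sample_rate_hz) > pre_masking_ms


# Assumed sampling rate of 48 kHz; 2048 is the block size from Example 11.
for size in (2048, 128):
    print(size, round(block_duration_ms(size, 48000), 2),
          exceeds_pre_masking(size, 48000))
```

Under these assumptions the 2048-sample block lasts far longer than the pre-masking window, while a block of 128 samples would stay inside it.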


Demo: The "13 dB-miracle"

Original signal

Original + white noise, SNR = 13.6 dB

Original + noise at threshold, SNR = 13.6 dB

Difference (modulated white noise)

Difference (noise at threshold)
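
The white-noise variant of the demo can be approximated by scaling a noise signal until the signal-to-noise ratio reaches 13.6 dB. The sketch below uses a 440 Hz sine at 48 kHz as a stand-in for the original signal; both values and all function names are assumptions made for illustration. Shaping the noise to the masking threshold, as in the second variant, needs a psychoacoustic model and is not shown.

```python
import math
import random


def power(samples):
    """Mean power of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB."""
    return 10.0 * math.log10(power(signal) / power(noise))


def scale_noise_to_snr(signal, noise, target_snr_db):
    """Scale the noise so that snr_db(signal, scaled) equals the target."""
    gain = 10.0 ** ((snr_db(signal, noise) - target_snr_db) / 20.0)
    return [gain * n for n in noise]


rng = random.Random(0)  # fixed seed so the example is repeatable
signal = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
white = [rng.gauss(0.0, 1.0) for _ in signal]
noise = scale_noise_to_snr(signal, white, 13.6)
noisy = [s + n for s, n in zip(signal, noise)]
print("SNR in dB:", round(snr_db(signal, noise), 2))
```

The point of the demo is that the same SNR can be clearly audible or inaudible depending only on how the noise is distributed in frequency and time.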


The "13 dB-miracle"

The McGurk Effect



Block diagram of a perceptual audio encoder

[Block diagram. Signal path: audio in, analysis filter bank, quantization and coding, serial bitstream multiplexing, bitstream. Side chain: calculation of masking threshold based on psychoacoustics.]


The Basic Paradigm of T/F Domain Audio Coding

[Block diagram. Blocks: filter bank, bit or noise allocation, bitstream formatting, psychoacoustic model. Signals: digital audio input, signal-to-mask ratio, quantized samples, encoded bitstream.]
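
The role of the signal-to-mask ratio (SMR) in bit allocation can be illustrated with a strongly simplified rule: a band needs enough bits to push the quantization noise below the masking threshold, and a uniform quantizer gains roughly 6.02 dB of SNR per bit. The sketch below is an assumption-laden illustration, not the allocation loop of any MPEG encoder. It ignores the bit budget, and the names `bits_for_band` and `allocate` are hypothetical.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gain per bit of a uniform quantizer


def bits_for_band(smr_db):
    """Bits needed so that quantization noise falls below the mask."""
    if smr_db <= 0:
        return 0  # signal is already masked: nothing has to be transmitted
    return math.ceil(smr_db / DB_PER_BIT)


def allocate(smr_per_band_db):
    """Per-band bit allocation from per-band signal-to-mask ratios."""
    return [bits_for_band(smr) for smr in smr_per_band_db]


print(allocate([-4.0, 3.0, 13.6, 30.0]))
```

A real encoder additionally has to fit the result into the available bit rate, typically by iterating over the bands.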


MPEG Audio Coding


History of Audio Coding

1979 - the “Critical Band Coder”

1982 - “classic ATC” for Music

1985 - MSC

1987 - OCF

1990 - MUSICAM

1990 - ASPEC

1992 - MPEG-1

1996 - PAC

1997 - MPEG-2 AAC

1999 - MPEG-4 AAC

2002 - HE-AAC

2005 - MPEG-4 ALS

2006 - MPEG-D MPEG Surround (MPS)

2010 - MPEG-D Spatial Audio Object Coding (SAOC)

2012 - MPEG-D Unified Speech and Audio Coding (USAC)

2015 - MPEG-H (High Efficiency Coding and Media Delivery in Heterogeneous Environments)


MPEG-1 Audio: Main building blocks

Perceptual model:
- using psychoacoustics, mostly proprietary

Filter bank:
- subdividing the input signal into spectral components
- more lines ⇒ more coding gain
- longer impulse response ⇒ pre-echo artifacts

Quantization & coding:
- this is the step introducing quantization noise
- spectral shape of quantization noise determines the audibility
- can be designed to leave encoding methods optional


MPEG-1 Audio

Structure of the Encoder


MPEG-1 Audio

Structure of the Decoder


MPEG-1 Audio

Short description of the layers

Layer-1:
- Frame length: 384 samples (approx. 8 ms)
- Spectral resolution: 32 subbands
- Quantization: block companding (12 samples)

Layer-2:
- Frame length: 1152 samples (approx. 24 ms)
- Spectral resolution: 32 subbands
- Quantization: block companding (12 samples)
- Usage of the scale factor select information

Layer-3:
- Frame length: 1152 samples (approx. 24 ms)
- Spectral resolution: 576 spectral lines
- Quantization: non-uniform with Huffman coding
- Usage of the scale factor select information
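
The approximate frame durations follow directly from the frame length and the sampling rate. The sketch below assumes a sampling rate of 48 kHz, which reproduces the 8 ms and 24 ms quoted above. The dictionary and function names are illustrative.

```python
# Frame lengths in PCM samples, as listed for the three layers.
FRAME_SAMPLES = {"Layer-1": 384, "Layer-2": 1152, "Layer-3": 1152}


def frame_duration_ms(layer, sample_rate_hz=48000):
    """Duration of one frame in milliseconds for the given layer."""
    return 1000 * FRAME_SAMPLES[layer] / sample_rate_hz


for layer in FRAME_SAMPLES:
    print(layer, frame_duration_ms(layer), "ms at 48 kHz")
```

At 44.1 kHz the same frames last slightly longer, for example about 26.1 ms for 1152 samples.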


MPEG-1 Audio

Joint Stereo Mode for an additional increase of the compression rate

Different possibilities:

Mid/Side (M/S) stereo coding:

Two channels are coded; the left channel contains the sum of both original channels, the right channel contains the difference

Intensity coding:

Either both channels are coded separately (“stereo” mode) or “intensity stereo” coding is used

For higher frequencies only a mono signal is transmitted, which the decoder pans close to the original stereo position


MPEG-1 Audio Layer-1

Trademark by Philips: PASC

Processing of frames with 384 PCM samples each

Signal is split into 32 bands by a polyphase filter bank

32 frequency bands of 12 samples each
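
The idea behind block companding on 12 subband samples can be sketched as follows: one scale factor is shared by the whole block, and the normalized samples are quantized uniformly. This is a simplified illustration. The standard selects the scale factor from a fixed table and uses its own quantizer definitions, neither of which is reproduced here, and the function names are hypothetical.

```python
def compand_block(samples, bits):
    """Quantize one block of subband samples with a shared scale factor."""
    if bits < 2:
        raise ValueError("need at least 2 bits per sample")
    scale = max(abs(s) for s in samples) or 1.0  # avoid division by zero
    levels = 2 ** (bits - 1) - 1                 # symmetric integer range
    codes = [round(s / scale * levels) for s in samples]
    return scale, codes


def expand_block(scale, codes, bits):
    """Reconstruct the samples from scale factor and integer codes."""
    levels = 2 ** (bits - 1) - 1
    return [c * scale / levels for c in codes]


block = [0.02 * n - 0.1 for n in range(12)]  # 12 samples of one subband
scale, codes = compand_block(block, bits=4)
restored = expand_block(scale, codes, bits=4)
print(codes)
```

Because the scale factor is sent once per block, quiet blocks keep their relative precision without spending more bits per sample.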



Trademark: MUSICAM

Processing of frames with 1152 PCM samples each

36 samples per subband, grouped into 3 blocks of 12 samples each

Like Layer-1, Layer-2 transmits bit allocation, scale factors and samples

Additionally: scale factor select information and packing of bits

Theoretical minimum of the coding/decoding delay: approx. 35 ms



Bitstream structure Layer-2

Structure of Layer-2 subband samples


MPEG-1 Audio Layer 3

Standard frame length: 1152 samples (24 ms @ 48 kHz)

Frequency resolution: 576 / 192 spectral lines (long / short blocks)

Quantization: non-uniform with Huffman coding; use of scale factor select information

One benefit of the MP3 format is that it has no file-level header: each frame carries its own header, so no header at the start of the file is needed to play the music.

Allows MP3 streaming

Theoretical minimum delay of the coder/decoder: approx. 59 ms
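
The frequency resolution figures can be turned into an approximate spacing between spectral lines. The sketch below assumes that the 576 or 192 lines are spread uniformly over the range from 0 Hz to half the sampling rate, and it uses the 48 kHz sampling rate quoted for the frame length. The function name is illustrative.

```python
def line_spacing_hz(sample_rate_hz, num_lines):
    """Spacing of uniformly distributed spectral lines up to Nyquist."""
    return (sample_rate_hz / 2) / num_lines


for lines in (576, 192):
    print(lines, "lines ->", round(line_spacing_hz(48000, lines), 2), "Hz")
```

The finer resolution gives more coding gain for stationary signals, while the coarser one trades frequency resolution for better time resolution.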



Two different MDCT block lengths: a long block of 18 samples or a short block of 6 samples

“Joint Stereo” with Mid/Side and intensity coding:

M/S: instead of the left and right channels (L/R), the Mid and Side channels (M = (L+R)/2, S = (L-R)/2) are transmitted.
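
The M/S mapping and its inverse are simple enough to write down directly. The sketch below applies the formulas M = (L+R)/2 and S = (L-R)/2 sample by sample, and reconstructs L = M+S and R = M-S. The function names are chosen for illustration, and quantization between encoding and decoding is left out.

```python
def ms_encode(left, right):
    """Convert left/right channel samples to mid/side."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Convert mid/side samples back to left/right."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, 0.25, -1.0]
right = [0.5, -0.25, 0.0]
mid, side = ms_encode(left, right)
print(mid, side)
print(ms_decode(mid, side))
```

When both channels are nearly identical, the side channel is close to zero and needs very few bits, which is where the compression gain comes from.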



Constant part:

Fixed number of bytes: 17 in mono, 32 in stereo, independent of the bit rate

Header (ISO standard, as in Layer I and Layer II)

Additional information for the frame (e.g. pointer to the variable part)

Additional information per granule (e.g. selection of the Huffman tables)

Variable part:

Also called “main info”

Scale factors

Huffman-coded frequency lines

Additional data

In Layer III the bit rates can be switched dynamically
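
A rough byte budget shows how much room a frame leaves for the variable part. The sketch below relies on facts from ISO/IEC 11172-3 that are not on the slide: a Layer III frame has 144 × bit rate / sampling rate bytes (plus an optional padding byte), and the header is 4 bytes long. It assumes that the 17 or 32 bytes are counted in addition to the header. It ignores the optional CRC and the bit reservoir, and the function names are illustrative.

```python
HEADER_BYTES = 4                                # 32-bit frame header
SIDE_INFO_BYTES = {"mono": 17, "stereo": 32}    # constant part per frame


def frame_bytes(bit_rate_bps, sample_rate_hz, padding=0):
    """Size of one Layer III frame in bytes (1152 samples per frame)."""
    return 144 * bit_rate_bps // sample_rate_hz + padding


def variable_part_bytes(bit_rate_bps, sample_rate_hz, mode, padding=0):
    """Bytes left in the frame after header and constant part."""
    total = frame_bytes(bit_rate_bps, sample_rate_hz, padding)
    return total - HEADER_BYTES - SIDE_INFO_BYTES[mode]


print(frame_bytes(128000, 48000))
print(variable_part_bytes(128000, 48000, "stereo"))
```

Since the constant part does not shrink with the bit rate, it takes up a larger share of each frame at low bit rates.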


MPEG-1 Audio Layer-3 – Bit Reservoirs


Layer 3 Block Diagram

MPEG-1 Audio Layer 3


Bit Stream Syntax


MPEG-1 Audio: Bit Stream Syntax, Layer-1, -2, and -3 Compression

Layer bits indicate the layer used

The higher the layer, the better the compression, but the more processing power is required

00 reserved

01 Layer III

10 Layer II

11 Layer I
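
The table of layer bits can be applied to a frame header as in the following sketch. The bit positions are taken from ISO/IEC 11172-3 rather than from the text above: the 32-bit header starts with a 12-bit syncword of all ones, followed by one ID bit and then the two layer bits. The function name and the example header values are illustrative.

```python
LAYER_BY_BITS = {
    0b00: "reserved",
    0b01: "Layer III",
    0b10: "Layer II",
    0b11: "Layer I",
}


def layer_from_header(header):
    """Return the layer named by a 32-bit MPEG-1 audio frame header."""
    if (header >> 20) & 0xFFF != 0xFFF:
        raise ValueError("syncword not found")
    # Skip the remaining 17 low bits, then keep the two layer bits.
    return LAYER_BY_BITS[(header >> 17) & 0b11]


for header in (0xFFFB9064, 0xFFFD0000, 0xFFFF0000):
    print(hex(header), layer_from_header(header))
```

Note the inverted ordering: the bit pattern 11 stands for Layer I and 01 for Layer III.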


MPEG-1 Audio Bit Stream Syntax


MPEG-1 Audio Bit Stream Syntax – header()


MPEG-1 Audio Bit Stream Syntax – audio_data()

Audio Data Layer-1



Audio Data Layer-2



Audio Data Layer-2



Audio Data Layer-3



Audio Data Layer-3


MPEG Audio Layer-3: Huffman-Code Tables


More on Audio Coding will be covered in the dedicated lecture series!

http://www.tu-ilmenau.de/mt/lehrveranstaltungen/lehre-fuer-master-mt/audio-coding/


Organisational issues

Preliminary list of lectures (updated version is on the website)

Regular date: Tuesday, 17:00, K-Hs1
Alternate date: Thursday, 13:00, K-Hs2

CW* 14: Introduction
CW 15: Standardization I, Standardization II
CW 16: -
CW 17: -
CW 18: Video Coding I
CW 19: Video Coding II, Video Coding III
CW 20: -
CW 21: Psychoacoustic Fundamentals
CW 22: Metadata Standards
CW 23: MPEG Audio I, MPEG Audio II
CW 24: Speech Coding
CW 25: -
CW 26: -
CW 27: System Standards I
CW 28: System Standards II, System Standards III

* ISO 8601, Representation of dates and times, ch. 2.2.10: calendar week number: ordinal number which identifies a calendar week within its calendar year according to the rule that the first calendar week of a year is that one which includes the first Thursday of that year and that the last calendar week of a calendar year is the week immediately preceding the first calendar week of the next calendar year
