Week 6 · February 16, 2012 · Human Psychoacoustics · ese250/week6/Week6_S'12(Psychoaco…

Week 6 – Psychoacoustics ESE 250 – S’12 Kod & DeHon 1

ESE250:

Digital Audio Basics

Week 6

February 16, 2012

Human Psychoacoustics

Course Map

[Figure: course map diagram. Numbers correspond to course weeks (labels shown: 2, 5, 6, 11, 12, 13); the current lecture is marked “Today”.]


Where are we?

• Week 2: Received signal is sampled & quantized
    q = PCM[ r ]
• Week 3: Quantized signal is coded
    c = code[ q ]
• Week 4: Sampled signal is first transformed into the frequency domain
    Q = DFT[ q ]
• Week 5: Signal is oversampled & low-pass filtered
    Q = LPF[ DFT(q + n) ]
• Week 6: Transformed signal is analyzed using human psychoacoustic models
• Week 7: Acoustically interesting signal is “perceptually coded”
    C = MP3[ Q ]

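[Figure: block diagram of the signal chain. Block labels: Oversample, DFT, LPF, Perceptual Coding, Store / Transmit, Decode, Produce. Signal labels: r(t), q + n, Q + N, Q, C, p(t). Week annotations: Week 3, Week 4, Week 5, Week 6.]

[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]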
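The first stages of this chain can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function names `pcm_quantize` and `dft`, the 8-bit depth, the 64-sample frame, and the test tone are all assumptions chosen for the example.

```python
import cmath
import math


def pcm_quantize(samples, bits=8):
    """Round samples in [-1, 1] to the nearest of 2**bits - 1 uniform levels."""
    half_levels = 2 ** (bits - 1) - 1
    return [round(s * half_levels) / half_levels for s in samples]


def dft(x):
    """Naive O(N^2) discrete Fourier transform, Q = DFT[q]."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical received signal r: 5 cycles of a sine over a 64-sample frame.
N = 64
r = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]
q = pcm_quantize(r)   # Week 2: q = PCM[r]
Q = dft(q)            # Week 4: Q = DFT[q]

# Locate the strongest component in the lower half of the spectrum.
peak_bin = max(range(N // 2), key=lambda k: abs(Q[k]))
```

With these assumed values the quantization error stays within half a quantization step, and `peak_bin` lands on bin 5, the frequency of the test tone.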



The Physical Ear

• External Sound Waves
  guided by outer ear into auditory canal
• Excite Inner Ear
  through mechanical linkage connecting ear drum to cochlea
• Initiates signal processing
  frequency domain analysis via analog computation

[R. Munkong and B.-H. Juang. IEEE Sig. Proc. Mag., 25(3):98–117, 2008]

http://www.youtube.com/watch?v=dyenMluFaUw


The Cognitive Ear

• Modern Psychoacoustics
  Benefits greatly from
    o decades of neural recording
    o contemporary brain imaging technology
  Still in its infancy
    o huge neural populations
    o intricate recruitment of brain anatomy
• General Consensus
  Frequency representation is encoded at neural-cochlear interface
  Processed through an array of modules
    o spatially distributed
    o hierarchically arranged (fan-outs, fan-ins)
    o “inwardly” abstract, e.g. outer module: “noise/music”; inner module: “guitar/saxophone”

[R. Munkong and B.-H. Juang. IEEE Sig. Proc. Mag., 25(3):98–117, 2008]


Power Spectrum Model of Hearing

• Rough Picture (main content of today’s lecture):
  Critical Bands: auditory system contains a finite array of adaptively tunable, overlapping bandpass filters
  Frequency Bins: humans process a signal’s component (against noisy background) in the one filter with closest center frequency
  Masking: certain signal components in a given band are “favored” and others are filtered out
• Established through decades of psychoacoustic experiments

B.C.J. Moore. Int.Rev.Neurobiol., 70:49–86, 2005.


Auditory Thresholds

• H. Fletcher (1940)
  played pure tones varying
    o frequency, f [Hz]
    o intensity, I, in dyn · cm⁻² ($1\ \mathrm{dyn \cdot cm^{-2}} = 10^{-5}\ \mathrm{N \cdot cm^{-2}} = 10^{-1}\ \mathrm{Pa}$)
    o phase changes tend to be inaudible
  large listener population
    o young
    o acute
• Recorded extreme thresholds
  faintest audible
  greatest tolerable

[H. Fletcher. Rev. Mod. Phys., 12(1):47–65, 1940].

• Results:
  pain-free hearing range extends at most over 20 Hz – 20 kHz
  with sensitivity $\approx 2 \cdot 10^{-4} \cdot 10^{-1}\ \mathrm{Pa} = 2 \cdot 10^{-5}\ \mathrm{Pa} = 20\ \mu\mathrm{Pa}$



Auditory Thresholds [H. Fletcher. Rev. Mod. Phys., 12(1):47–65, 1940].

• Standard pressure
    $p_0 = 2 \cdot 10^{-5}$ Pa
• Compare to ambient sea-level pressure:
    1 atmosphere ≈ $10^5$ Pa ≈ $10^3$ millibar ≈ 30 inHg
• Q: why use log-log scale?
• A1: dynamic range
• A2: “loudness” is a power function
• Sound Pressure Level
    $L_{SPL} = 20 \log_{10}(p / p_0)$ dB
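The sound pressure level formula is easy to try out numerically. The sketch below assumes the standard reference pressure of 20 µPa ($2 \cdot 10^{-5}$ Pa); the names `P0`, `spl_db`, and `pressure_from_spl` are illustrative, not taken from the lecture.

```python
import math

P0 = 2e-5  # assumed reference pressure in Pa (20 µPa), the standard SPL reference


def spl_db(pressure_pa, p0=P0):
    """Sound pressure level in dB: L_SPL = 20 log10(p / p0)."""
    return 20.0 * math.log10(pressure_pa / p0)


def pressure_from_spl(level_db, p0=P0):
    """Invert the SPL formula to recover pressure in Pa."""
    return p0 * 10.0 ** (level_db / 20.0)
```

By construction `spl_db(P0)` is 0 dB, and every factor of 10 in pressure adds 20 dB, which is how the logarithmic scale compresses the ear's wide dynamic range.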

10 Pa


Aside: Scales for Power Laws

• Assume a power law relationship
    Response = P(Stimulus), where $r = P(s) = k s^n$
• Rescaling: take $q = \ln r$ and $t = \ln s$
    $\Rightarrow q = \ln k + n \ln s = K + n t$, with $K = \ln k$
• Change of coordinates yields a linear relationship between the new stimulus-response variables
    $q = K + n t$
• Linearity renders the JND (“just noticeable difference”) unit uniform in stimulus and response coordinates, facilitating perceptual processing algorithms

(We’ll re-examine the origin of this model later in the lecture.)
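The change of coordinates above turns exponent estimation into a straight-line fit. As a minimal sketch, assuming noise-free synthetic data and a hypothetical helper `fit_power_law`, ordinary least squares on (ln s, ln r) recovers both k and n:

```python
import math


def fit_power_law(stimuli, responses):
    """Least-squares line through (ln s, ln r); returns (k, n) for r = k * s**n."""
    t = [math.log(s) for s in stimuli]
    q = [math.log(r) for r in responses]
    t_mean = sum(t) / len(t)
    q_mean = sum(q) / len(q)
    slope = sum((ti - t_mean) * (qi - q_mean) for ti, qi in zip(t, q)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    intercept = q_mean - slope * t_mean  # K = ln k
    return math.exp(intercept), slope


# Synthetic data generated from r = 2 * s**0.5 (illustrative values only).
stimuli = [1.0, 2.0, 4.0, 8.0, 16.0]
responses = [2.0 * s ** 0.5 for s in stimuli]
k, n = fit_power_law(stimuli, responses)
```

The slope of the fitted line is the exponent n and the intercept is K = ln k, so the fit returns values close to k = 2 and n = 0.5 for this data.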


Masking: Noise Masking Tone

• Dots show pure tone magnitude (in dB)
• required to be audible above noise
    of the magnitude on the middle curve
    centered at that frequency
    with bandwidth at least wider than the bars of the next figures


Masking

• Masking Paradigms
  “Masker” masking “maskee”
  Noise Masking Tone
    o previous plot
  Tone Masking Noise
    o pure tone of 80 dB SPL at 1 kHz
    o just masks
    o “critical band” noise of 56 dB SPL centered at 1 kHz
  Masker-to-Maskee ratio
    o constant for fixed relative frequency and varying amplitude
    o changes with varying relative frequency
  Noise Masking Noise
    o more care required to describe bandwidth pair
• Temporal Masking
  Masker effect persists for tenths of a second
  Masker effect is “acausal”
    o on ~2/100 s timescales

[T. Painter and A. Spanias. Proc. IEEE, 88(4):451–512, 2000.]

[Figure annotation: 1 “Bark” frequency interval]
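The tone-masking-noise numbers on this slide give a concrete masker-to-maskee ratio. The sketch below assumes both levels are in dB SPL; the function names are hypothetical.

```python
def masker_to_maskee_ratio_db(masker_spl_db, maskee_spl_db):
    """Level difference in dB between a masker and the signal it just masks."""
    return masker_spl_db - maskee_spl_db


def db_to_intensity_ratio(level_difference_db):
    """Convert a level difference in dB to a ratio of intensities (powers)."""
    return 10.0 ** (level_difference_db / 10.0)


# Tone-masking-noise example from the slide: 80 dB tone, 56 dB noise at 1 kHz.
ratio_db = masker_to_maskee_ratio_db(80.0, 56.0)
intensity_ratio = db_to_intensity_ratio(ratio_db)
```

Here the ratio is 24 dB, meaning the masking tone carries roughly 250 times the intensity of the noise it just hides.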


Masking: Tone Masking Noise

[Figure: value (in dB) above quiet threshold such that a signal at the abscissa frequency can be heard in presence of a 200 Hz tone]

• … masker at fundamental can somewhat mask maskees at the harmonics …
• … but the “spreading curve” is traditionally depicted over the fundamental only


Interlude A: Why Masking?

• Ecology of auditory scene
  Ensemble of sound sources, with specific placements and times of production, is called the auditory scene
  Animal communication takes place within a noisy environment
    o communication crucial for animal society; auditory capability emerged through evolutionary selective pressure
    o distinct production and reception strategies allow multiple species to “share” the same bandwidth in the same location
• Masking: animal cocktail party skills
  Many varied studies: e.g., birds respond selectively to conspecific song when mixed with other species’
    o response levels weaken as masked song similarity increases
    o response levels remain strong as multiple “chorus” increases
    o babies distinguish parents’ calls in presence of louder non-related conspecifics’
    o trained non-relatives distinguish familiar from unfamiliar conspecifics’ songs
  Human selective attention can be disrupted by “cognitive” distractors
    o e.g. inserting listener’s name into “wrong” channel

[H. Brumm and H. Slabbekoorn. Adv.Study Beh.,35:151–209, 2005]


Interlude B: Why Frequency?

• Why not some other harmonic series?
  Fourier’s analysis shows harmonic analysis could be based on an arbitrary smooth periodic fundamental
• Why does the animal receiver use sinusoids?
• Hamiltonian Mechanics
  Simplest physical model of vibrating masses
  Coupled spring-mass-damper mechanics produce sinusoidal harmonics

[Figure: spring-mass-damper with mass m, displacement x, damping b, spring constant k]

… all sound is produced by vibrating masses …
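To see why a spring-mass-damper yields sinusoids, consider a single unforced, underdamped unit obeying m x'' + b x' + k x = 0. This equation and the parameter values below are assumptions added for illustration (the slide shows only the m, x, b, k diagram). The sketch checks numerically that an exponentially decaying cosine satisfies it.

```python
import math


def damped_frequency(m, b, k):
    """Angular frequency of the underdamped solution of m x'' + b x' + k x = 0."""
    return math.sqrt(k / m - (b / (2.0 * m)) ** 2)


def displacement(t, m, b, k):
    """One underdamped solution: an exponentially decaying cosine."""
    return math.exp(-b * t / (2.0 * m)) * math.cos(damped_frequency(m, b, k) * t)


def ode_residual(t, m, b, k, h=1e-4):
    """Evaluate m x'' + b x' + k x with central differences; near zero for a solution."""
    x_prev, x_now, x_next = (displacement(t + d, m, b, k) for d in (-h, 0.0, h))
    velocity = (x_next - x_prev) / (2.0 * h)
    acceleration = (x_next - 2.0 * x_now + x_prev) / (h * h)
    return m * acceleration + b * velocity + k * x_now


# Illustrative parameters (not from the lecture): light damping, stiff spring.
m, b, k = 1.0, 0.2, 100.0
residual = ode_residual(0.3, m, b, k)
```

The residual is zero up to finite-difference error, and with light damping the oscillation frequency stays close to the undamped value sqrt(k/m).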


Pitch JND

• JND = “just noticeable difference”
  change in stimulus that “just” elicits perceptual notice
  where “just” means
    o smaller variations of stimulus cannot be discerned
• Human Pure Tone Data
  JND below 1000 Hz is
    o roughly constant
    o ~ 3 Hz
  JND above 1000 Hz is
    o roughly log-log linear
    o $\log[\mathrm{JND}(f_2)] - \log[\mathrm{JND}(f_1)] \approx n\,(\log f_2 - \log f_1)$
  Questions:
    o if not power law, then what?
    o what units?
• Suggests that as frequency increases, broader frequency bands are “assigned” to the same length of cochlear tissue


Question: what is n?

e.g. $f_1 = 2000$, $f_2 = 4000$:
$6 = 10 - 4 \approx n \log_{10} 2 \Rightarrow n \approx 20$
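The slope estimate above can be reproduced directly. This sketch assumes that 10 and 4 are the plot readings of Log[Jnd] at f2 and f1 respectively; `loglog_slope` is a hypothetical helper name.

```python
import math


def loglog_slope(f1, f2, log_jnd1, log_jnd2):
    """Slope n in Log[Jnd(f2)] - Log[Jnd(f1)] ~ n (Log10[f2] - Log10[f1])."""
    return (log_jnd2 - log_jnd1) / (math.log10(f2) - math.log10(f1))


# Values read from the slide: Log[Jnd] is 4 at f1 = 2000 and 10 at f2 = 4000.
n = loglog_slope(2000.0, 4000.0, 4.0, 10.0)
```

Because the frequencies differ by one octave, the denominator is Log10[2] ≈ 0.301, and the slope comes out just under 20.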


Critical Bands

Decades of empirical study
• reveal that human audio frequency perception
• is quantized into < 30 “critical bands”
• of perceptually near-identical pitch classes
• corresponding to ~equal length bands of cochlear tissue (neurons)


Critical Bands: Evidence [H. Fletcher. Rev. Mod. Phys., 12(1):47–65, 1940].

• E.g., Noise-Masking-Tone paradigm
  Masking Level: $I_m / I_f$, ratio of just-masked intensity to masker-signal intensity
    o $I_m$: sound intensity of tone at frequency f which is just perceptible in presence of noise
    o $I_f$: noise intensity at frequency f (obtained from derivative of noise power spectrum at f)
  Since sound energy level is given by the area under the power spectrum
    o as masking noise bandwidth increases
    o masking level should increase
  Should grow with bandwidth of noise-masker signal
  But instead flattens out outside of critical band
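The flattening can be illustrated with an idealized model in which only the noise power that falls inside one auditory filter contributes to masking. The rectangular filter shape, the flat noise density, and the 160 Hz bandwidth below are simplifying assumptions for illustration, not values from the lecture.

```python
def effective_masker_power(noise_density, masker_bandwidth_hz, critical_bandwidth_hz):
    """Noise power that falls inside one auditory filter (idealized as rectangular)."""
    return noise_density * min(masker_bandwidth_hz, critical_bandwidth_hz)


# Illustrative numbers only: flat noise density of 1 unit/Hz, 160 Hz critical band.
CRITICAL_BANDWIDTH_HZ = 160.0
bandwidths = [20.0, 40.0, 80.0, 160.0, 320.0, 640.0]
powers = [effective_masker_power(1.0, w, CRITICAL_BANDWIDTH_HZ) for w in bandwidths]
```

The effective power doubles with each doubling of bandwidth until the masker fills the band, then stays constant, which mirrors the plateau seen in the masking data.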


Critical Bands: Evidence

• Alternative Evidence
  Tone-masking-Noise (Fig. a & c)
    o noise audibility threshold for small bandwidth noise remains constant
    o until tone frequency locus falls away from critical bandwidth
  Noise-masking-Tone (Fig. b & d)
    o same effect
    o with masker and maskee roles reversed

[T. Painter and A. Spanias. Proc. IEEE, 88(4):451–512, 2000.]


Psychophysical Scales

• Perceptual continua seem to come in two “flavors”
  “how much” (magnitude sensitivity)
  “what kind” or “where” (positional/locational sensitivity)
• Magnitude-sensitive perceptual responses
  Generally exhibit a geometric sensitivity
    Response = P(Stimulus), where $r = P(s) \approx k s^n$
  Psychophysically “natural” scale: “just noticeable difference” (JND) units
    o a “unit” change in stimulus just elicits perceptual notice across entire range
    o smaller variations of stimulus cannot be discerned across entire range
  Question can be posed in terms of scalar change of coordinates
    o identify a rescaling, q = C(r), t = C(s)
    o such that q is linear in the (rescaled) stimulus variable, t
• In contrast, spatial- and/or frequency-sensitive responses do not seem to exhibit a regular “jnd” principle

[S. S. Stevens. Psychological Review, 64(3):153–181, 1957]


Modeling Empirical Regularity

• Premise: Response = P(Stimulus), where
  stimulus is externally measured (here: acoustic wave amplitude, A)
  response is subjectively reported (here: perceived “loudness,” L)
• Empirical Observation: L = P(A)
  Case 1: Response exhibits incremental regularity
    o At each amplitude, A, an increment in stimulus, $\Delta A$
    o elicits a proportionate increment of loudness, $\Delta L$
    o Quantitative summary: $\Delta L = p\,\Delta A$
  Case 2: Response exhibits ratio regularity
    o At each amplitude, A, a percentage stimulus increment, $\Delta A / A$
    o elicits a proportionately increased response increment, $\Delta L / L$
    o Quantitative summary: $\Delta L / L = p\,\Delta A / A$
• Mathematical Idealization: take the limit $\Delta A \to 0$
  Case 1:
    o $dL = p\,dA \Rightarrow \int dL = p \int dA$
    o $\Rightarrow L = p\,(A - A_0) \Rightarrow P(A) = p\,(A - A_0)$
  Case 2:
    o $dL/L = p\,dA/A \Rightarrow \int dL/L = p \int dA/A$
    o $\Rightarrow \ln L = p\,(\ln A - \ln A_0) = \ln[(A/A_0)^p] \Rightarrow P(A) = (A/A_0)^p$
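The Case 2 derivation can be checked numerically: accumulating many small steps of the ratio rule dL/L = p dA/A should reproduce the power law (A/A0)^p. This is a rough sketch using forward Euler steps; the exponent 0.6, the amplitude range, and the step count are arbitrary illustrative choices.

```python
def integrate_ratio_rule(p, a0, a1, steps=20_000):
    """Accumulate dL/L = p * dA/A from a0 to a1, starting at L(a0) = 1."""
    loudness = 1.0
    amplitude = a0
    da = (a1 - a0) / steps
    for _ in range(steps):
        loudness += loudness * p * da / amplitude  # dL = L * p * dA / A
        amplitude += da
    return loudness


# Illustrative values only: exponent p = 0.6, amplitude raised tenfold.
p, a0, a1 = 0.6, 1.0, 10.0
numeric = integrate_ratio_rule(p, a0, a1)
closed_form = (a1 / a0) ** p
```

The two results agree to within the discretization error of the Euler scheme, and the agreement tightens as `steps` grows.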


The Bark Scale

• “Bark” units
  Uniform JND scale for frequency
  Maps appropriate frequency intervals into their respective critical band number
  Unit name (Zwicker ’61)
    o commemorates Barkhausen
    o credited with introducing first sound loudness scale
• Frequency-to-Bark function
  $B(f) = \alpha_1 \arctan(\beta_1 f) + \alpha_2 \arctan(\beta_2 f)$
  First Principles vs. Empirical Modeling
    o Question 1: how were the constant parameters, $\alpha_1, \alpha_2, \beta_1, \beta_2$, determined?
    o Question 2: how might the functional form have been determined?
    o Question 3: what are the necessary and sufficient features of any smooth scalar change of coordinates?

[E. Zwicker. J. Acoust. Soc.Am., 33(2):248, February 1961]
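The slide leaves the constants of B(f) as a question. One commonly cited arctangent approximation, usually attributed to Zwicker and Terhardt, is sketched below as an assumption. It is not necessarily the parameterization intended on the slide, and its second term uses a squared argument, which differs from the form printed above. The function names are hypothetical.

```python
import math


def hz_to_bark(frequency_hz):
    """Commonly cited arctangent approximation of the critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * frequency_hz) + 3.5 * math.atan(
        (frequency_hz / 7500.0) ** 2
    )


def critical_band_number(frequency_hz):
    """Index of the Bark-wide band containing the frequency (band 0 starts at 0 Hz)."""
    return int(hz_to_bark(frequency_hz))
```

Under this approximation 1 kHz maps to about 8.5 Bark and 20 kHz to roughly 24.6 Bark, consistent with the audible range being covered by fewer than 30 critical bands.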


To Read Further

• Human Auditory System Science from Audio Signal Processing Viewpoint
  R. Munkong and B.-H. Juang. Auditory perception and cognition. IEEE Signal Processing Magazine, 25(3):98–117, 2008.
• Human Auditory Signal Processing Models
  B.C.J. Moore. Basic psychophysics of human spectral processing. International Review of Neurobiology, 70:49–86, 2005.
• Use of Psychoacoustic Models in Audio Signal Processing
  T. Painter and A. Spanias. Perceptual coding of digital audio. Proceedings of the IEEE, 88(4):451–512, 2000.
• Ethology of Animal Communication
  H. Brumm and H. Slabbekoorn. Acoustic communication in noise. Advances in the Study of Behavior, 35:151–209, 2005.


ESE250: Digital Audio Basics

End Week 6 Lecture

Human Psychoacoustics
