Week 6 – Psychoacoustics ESE 250 – S’12 Kod & DeHon 1
ESE250:
Digital Audio Basics
Week 6
February 16, 2012
Human Psychoacoustics
Course Map
[Figure: course map. Numbers correspond to course weeks; today's topic is marked "Today".]
Where are we?
• Week 2: Received signal is sampled & quantized
   q = PCM[ r ]
• Week 3: Quantized signal is coded
   c = code[ q ]
• Week 4: Sampled signal first transformed into frequency domain
   Q = DFT[ q ]
• Week 5: Signal oversampled & low-pass filtered
   Q = LPF[ DFT(q+n) ]
• Week 6: Transformed signal analyzed using human psychoacoustic models
• Week 7: Acoustically interesting signal is “perceptually coded”
   C = MP3[ Q ]
[Figure: block diagram of the audio pipeline. Blocks: Oversample, DFT, LPF, Perceptual Coding, Store/Transmit, Decode, Produce. Signal labels: r(t), q + n, Q + N, Q, C, p(t). Blocks are annotated with Weeks 3 to 6.]
[Painter & Spanias. Proc.IEEE, 88(4):451–512, 2000]
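The first stages of this pipeline (q = PCM[ r ], Q = DFT[ q ]) can be sketched in a few lines. This is only an illustration under stated assumptions: the test signal, the 8-bit depth, and the helper names `pcm_quantize` and `dft` are hypothetical and are not taken from the course materials.

```python
import cmath
import math

def pcm_quantize(samples, bits=8):
    """Uniformly quantize samples in [-1, 1] to signed integer levels (q = PCM[r])."""
    levels = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * levels) for s in samples]

def dft(x):
    """Direct O(N^2) discrete Fourier transform (Q = DFT[q])."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]

# Hypothetical received signal r: a cosine with 3 cycles over 32 samples.
N = 32
r = [math.cos(2 * math.pi * 3 * k / N) for k in range(N)]
q = pcm_quantize(r, bits=8)
Q = dft(q)

# The strongest bin in the lower half of the spectrum is the tone's frequency bin.
peak_bin = max(range(N // 2), key=lambda m: abs(Q[m]))
print(peak_bin)
```

Quantization adds a small error to each sample, but the spectral peak still falls in the bin of the original tone.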
The Physical Ear
• External sound waves
   are guided by the outer ear into the auditory canal
• Excite inner ear
   through the mechanical linkage connecting ear drum to cochlea
• Initiates signal processing
   frequency-domain analysis via analog computation
[R. Munkong and B.-H. Juang. IEEE Sig. Proc. Mag., 25(3):98–117, 2008]
The Cognitive Ear
• Modern Psychoacoustics
   Benefits greatly from
      o decades of neural recording
      o contemporary brain imaging technology
   Still in its infancy
      o huge neural populations
      o intricate recruitment of brain anatomy
• General Consensus
   Frequency representation is encoded at the neural-cochlear interface
   Processed through an array of modules
      o spatially distributed
      o hierarchically arranged (fan-outs, fan-ins)
      o “inwardly” abstract, e.g. outer module: “noise/music”; inner module: “guitar/saxophone”
[R. Munkong and B.-H. Juang. IEEE Sig. Proc. Mag., 25(3):98–117, 2008]
Power Spectrum Model of Hearing
• Rough Picture (main content of today’s lecture):
Critical Bands: Auditory system contains finite array
of adaptively tunable, overlapping bandpass filters
Frequency Bins: humans process a signal’s
component (against noisy background) in the one
filter with closest center frequency
Masking: certain signal components in a given band
are “favored” and others are filtered out
• Established through decades of psychoacoustic
experiments
B.C.J. Moore. Int.Rev.Neurobiol., 70:49–86, 2005.
Auditory Thresholds
• H. Fletcher (1940) played pure tones
   varying
      o frequency, f [Hz]
      o intensity, I [dyn · cm^-2], where $1\ \mathrm{dyn \cdot cm^{-2}} = 10^{-5}\ \mathrm{N \cdot cm^{-2}} = 10^{-1}\ \mathrm{Pa}$
      o phase changes tend to be inaudible
   to a large listener population
      o young
      o acute
• Recorded extreme thresholds
   faintest audible
   greatest tolerable
[H. Fletcher. Rev. Mod. Phys., 12(1):47–65, 1940].
• Results: pain-free hearing range
   extends at most over 20 Hz – 20 kHz
   with sensitivity $\sim 2 \cdot 10^{-4}\ \mathrm{dyn \cdot cm^{-2}} = 2 \cdot 10^{-4} \cdot 10^{-1}\ \mathrm{Pa} = 20\ \mu\mathrm{Pa}$
Auditory Thresholds [H. Fletcher. Rev. Mod. Phys., 12(1):47–65, 1940].
• Standard pressure
   $p_0 = 2 \cdot 10^{-5}$ Pa (20 µPa)
• Compare to ambient sea-level pressure:
   1 atmosphere
      ~ $10^5$ Pa
      ~ $10^3$ millibar
      ~ 30 inHg
• Q: why use log-log scale?
• A1: dynamic range
• A2: “loudness” is a power function
Sound Pressure Level
   $L_{SPL} = 20 \log_{10}(p/p_0)$ dB
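As a quick numerical check of the sound pressure level formula, the sketch below evaluates 20 log10(p/p0) for a few pressures. It assumes the standard reference pressure p0 = 20 µPa; the function name `spl_db` and the chosen test pressures are illustrative only.

```python
import math

P0_PA = 2e-5  # assumed reference pressure: 20 micropascal

def spl_db(p_pa, p0_pa=P0_PA):
    """Sound pressure level in dB: 20 * log10(p / p0)."""
    return 20.0 * math.log10(p_pa / p0_pa)

threshold_db = spl_db(2e-5)   # a tone at the reference pressure itself
one_dyn_db = spl_db(0.1)      # 1 dyn/cm^2 = 0.1 Pa
atmosphere_db = spl_db(1e5)   # roughly one atmosphere of pressure

print(round(threshold_db, 2), round(one_dyn_db, 2), round(atmosphere_db, 2))
```

The span from 0 dB at the reference pressure to almost 194 dB at atmospheric pressure illustrates the "dynamic range" argument for a logarithmic scale.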
Aside: Scales for Power Laws
• Assume a power law relationship
   Response = P(Stimulus)
   where $r = P(s) = k s^n$
• Rescaling: take $q = \ln r$ and $t = \ln s$
   $\Rightarrow q = \ln k + n \ln s = K + n t$, where $K = \ln k$
• Change of coordinates yields a linear relationship
   between the new stimulus-response variables
   $q = K + n t$
• Linearity renders the JND (“just noticeable difference”) unit
   uniform in stimulus and response coordinates
   facilitating perceptual processing algorithms
(We’ll re-examine the origin of this model later in the lecture.)
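The log-log rescaling can be tried numerically: if r = k s^n, a straight-line fit through the points (ln s, ln r) should recover n as the slope and ln k as the intercept. The sketch below uses synthetic data with made-up values k = 2 and n = 0.6; the function name `fit_power_law` is hypothetical.

```python
import math

def fit_power_law(stimuli, responses):
    """Least-squares line through (ln s, ln r); returns (k, n) for r = k * s**n."""
    t = [math.log(s) for s in stimuli]
    q = [math.log(r) for r in responses]
    t_mean = sum(t) / len(t)
    q_mean = sum(q) / len(q)
    covariance = sum((ti - t_mean) * (qi - q_mean) for ti, qi in zip(t, q))
    variance = sum((ti - t_mean) ** 2 for ti in t)
    slope = covariance / variance          # estimate of the exponent n
    intercept = q_mean - slope * t_mean    # estimate of K = ln k
    return math.exp(intercept), slope

# Synthetic, noise-free data from an assumed power law r = 2 * s**0.6.
stimuli = [1.0, 2.0, 4.0, 8.0, 16.0]
responses = [2.0 * s ** 0.6 for s in stimuli]
k_hat, n_hat = fit_power_law(stimuli, responses)
print(k_hat, n_hat)
```

With real psychophysical data the points would scatter around the line, and the fitted slope would only estimate the exponent.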
Masking
[H. Fletcher. Rev. Mod. Phys., 12(1):47–65, 1940].
• Dots show the pure-tone magnitude (in dB) required to be audible above noise
   of the magnitude on the middle curve
   centered at that frequency
   with bandwidth at least as wide as the bars of the next figures
[Figure: Noise Masking Tone]
Masking
• Masking Paradigms: “masker” masking “maskee”
   Noise Masking Tone
      o previous plot
   Tone Masking Noise
      o a pure tone of 80 dB SPL at 1 kHz
      o just masks “critical band” noise of 56 dB SPL centered at 1 kHz
   Masker-to-Maskee ratio
      o constant for fixed relative frequency and varying amplitude
      o changes with varying relative frequency
   Noise Masking Noise
      o more care required to describe the bandwidth pair
• Temporal Masking
   Masker effect persists for tenths of a second
   Masker effect is “acausal”
      o on ~2/100 s timescales
[T. Painter and A. Spanias. Proc. IEEE, 88(4):451–512, 2000.]
[Figure annotation: 1 “Bark” frequency interval]
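The tone-masking-noise numbers on this slide (an 80 dB SPL tone just masking 56 dB SPL noise) give a masker-to-maskee ratio that can be worked out directly. The sketch below does the arithmetic and, assuming the ratio really is constant across masker amplitude as the slide states, predicts the masked noise level for another masker level. The function name `masked_level_db` and the 70 dB example are hypothetical.

```python
masker_db = 80.0   # pure tone at 1 kHz (from the slide)
maskee_db = 56.0   # critical-band noise centered at 1 kHz (from the slide)

# Masker-to-maskee ratio in dB, and as linear intensity and pressure ratios.
ratio_db = masker_db - maskee_db
intensity_ratio = 10 ** (ratio_db / 10)
pressure_ratio = 10 ** (ratio_db / 20)

def masked_level_db(new_masker_db, ratio=ratio_db):
    """Noise level just masked by a tone, assuming a constant masker-to-maskee ratio."""
    return new_masker_db - ratio

print(ratio_db, round(intensity_ratio), round(pressure_ratio, 2), masked_level_db(70.0))
```

The constant-ratio assumption applies only for a fixed relative frequency between masker and maskee; the ratio changes when that relative frequency changes.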
Masking
[H. Fletcher. Rev. Mod. Phys., 12(1):47–65, 1940].
[Figure: Tone Masking Noise. Ordinate: value (in dB) above the quiet threshold at which a signal at the abscissa frequency can be heard in the presence of a 200 Hz tone.]
• … a masker at the fundamental can somewhat mask maskees at the harmonics …
• … but the “spreading curve” is traditionally depicted over the fundamental only
Interlude A: Why Masking?
• Ecology of auditory scene
   The ensemble of sound sources (with specific placements and times of production) is called the auditory scene
   Animal communication takes place within a noisy environment
      o communication is crucial for animal society; auditory capability emerged through evolutionary selective pressure
      o distinct production and reception strategies allow multiple species to “share” the same bandwidth in the same location
• Masking: animal cocktail party skills
   Many varied studies: e.g., birds respond selectively to conspecific song when mixed with other species’
      o response levels weaken as masked song similarity increases
      o response levels remain strong as multiple “chorus” increases
      o babies distinguish parents’ calls in presence of louder non-related conspecifics’
      o trained non-relatives distinguish familiar from unfamiliar conspecifics’ songs
   Human selective attention can be disrupted by “cognitive” distractors
      o e.g. inserting the listener’s name into the “wrong” channel
[H. Brumm and H. Slabbekoorn. Adv.Study Beh.,35:151–209, 2005]
Interlude B: Why Frequency?
• Why not some other harmonic series?
Fourier’s analysis shows
harmonic analysis could be based on
arbitrary smooth periodic fundamental
• Why does the animal receiver use
sinusoids?
• Hamiltonian Mechanics
Simplest physical model of vibrating
masses
Coupled spring-mass-damper mechanics
Produce sinusoidal harmonics
[Figure: spring-mass-damper with mass m, displacement x, damper b, spring k]
… all sound is produced by vibrating masses …
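To see that a spring-mass-damper really does produce a (decaying) sinusoid, one can integrate m x'' + b x' + k x = 0 numerically and compare the result with the textbook underdamped solution. This is a minimal sketch with made-up parameters (a 5 Hz oscillator with light damping); the function names `simulate` and `analytic` are hypothetical.

```python
import math

def simulate(m, b, k, x0=1.0, v0=0.0, dt=1e-4, t_end=1.0):
    """Integrate m x'' + b x' + k x = 0 with classical RK4; returns x(t_end)."""
    def accel(x, v):
        return -(b * v + k * x) / m

    x, v = x0, v0
    for _ in range(round(t_end / dt)):
        k1x, k1v = v, accel(x, v)
        k2x, k2v = v + 0.5 * dt * k1v, accel(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = v + 0.5 * dt * k2v, accel(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = v + dt * k3v, accel(x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return x

def analytic(m, b, k, t, x0=1.0):
    """Underdamped closed form for x(0) = x0, x'(0) = 0: an exponentially decaying sinusoid."""
    sigma = b / (2 * m)                        # decay rate
    omega_d = math.sqrt(k / m - sigma ** 2)    # damped angular frequency
    return x0 * math.exp(-sigma * t) * (
        math.cos(omega_d * t) + (sigma / omega_d) * math.sin(omega_d * t)
    )

# Assumed parameters: unit mass, light damping, natural frequency of 5 Hz.
m, b, k = 1.0, 0.5, (2 * math.pi * 5) ** 2
x_num = simulate(m, b, k, t_end=1.0)
x_ref = analytic(m, b, k, t=1.0)
print(x_num, x_ref)
```

The two values agree closely, which is consistent with the slide's point that the simplest mechanical vibration model has sinusoidal solutions.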
Pitch JND
• JND = “just noticeable difference”
   change in stimulus that “just” elicits perceptual notice
   where “just” means
      o smaller variations of stimulus cannot be discerned
• Human Pure Tone Data
   JND below 1000 Hz is
      o roughly constant
      o ~ 3 Hz
   JND above 1000 Hz is
      o roughly log-log linear
      o $\log \mathrm{JND}(f_2) - \log \mathrm{JND}(f_1) \sim n\,(\log f_2 - \log f_1)$
   Questions:
      o if not power law, then what?
      o what units?
• Suggests that as frequency increases, broader frequency bands are “assigned” to the same length of cochlear tissue
[H. Fletcher. Rev. Mod. Phys., 12(1):47–65, 1940].
Question: what is n?
   e.g. $f_1 = 2000$, $f_2 = 4000$
   $6 = 10 - 4 \sim n \log_{10} 2$
   $\Rightarrow n \sim 20$
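The slope estimate in this example is a one-line computation. The sketch below reproduces it, taking the slide's readings 4 and 10 as the log-scale JND values at 2000 Hz and 4000 Hz; the function name `loglog_slope` is hypothetical.

```python
import math

def loglog_slope(f1_hz, log_jnd1, f2_hz, log_jnd2):
    """Slope n of log JND versus log10 frequency between two plot readings."""
    return (log_jnd2 - log_jnd1) / (math.log10(f2_hz) - math.log10(f1_hz))

# Readings taken from the slide's worked example: 10 - 4 = 6 over one octave.
n = loglog_slope(2000.0, 4.0, 4000.0, 10.0)
print(round(n, 2))
```

The result is just under 20, matching the slide's rounded answer.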
Critical Bands
Decades of empirical study reveal that human audio frequency perception
• is quantized into < 30 “critical bands”
• of perceptually near-identical pitch classes
• corresponding to ~equal-length bands of cochlear tissue (neurons)
Critical Bands: Evidence [H. Fletcher. Rev. Mod. Phys., 12(1):47–65, 1940].
• E.g., Noise-Masking-Tone paradigm
   Masking Level: $I_m / I_f$
      ratio of just-masked intensity to masker-signal intensity
      o $I_m$: sound intensity of the tone at frequency f which is just perceptible in the presence of noise
      o $I_f$: noise intensity at frequency f (obtained from derivative of noise power spectrum at f)
   Since sound energy level is given by the area under the power spectrum
      o as masking noise bandwidth increases
      o masking level should increase
   Should grow with the bandwidth of the noise-masker signal
   But instead flattens out outside of the critical band
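The flattening can be illustrated with a deliberately crude model: treat the auditory filter as an ideal rectangular bandpass of one critical bandwidth, and assume only the noise power falling inside it contributes to masking. Both the rectangular filter shape and the 160 Hz bandwidth used below are illustrative assumptions, not values taken from the slide, and `effective_masker_power` is a hypothetical helper.

```python
def effective_masker_power(noise_density, noise_bandwidth_hz, critical_bandwidth_hz):
    """Noise power passed by an idealized rectangular auditory filter.

    Flat noise of the given spectral density contributes power in proportion
    to its bandwidth, but only up to the width of the critical band.
    """
    return noise_density * min(noise_bandwidth_hz, critical_bandwidth_hz)

CRITICAL_BANDWIDTH_HZ = 160.0  # assumed value, for illustration only
bandwidths = [40.0, 80.0, 160.0, 320.0, 640.0]
powers = [effective_masker_power(1.0, bw, CRITICAL_BANDWIDTH_HZ) for bw in bandwidths]
print(powers)
```

In this model the masking power grows with the noise bandwidth until the bandwidth reaches the critical band and then stays constant, which is the qualitative shape of the experimental curve described above.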
Critical Bands: Evidence
• Alternative Evidence
   Tone-masking-Noise (Fig. a & c)
      o noise audibility threshold for small-bandwidth noise remains constant
      o until the tone frequency locus falls away from the critical bandwidth
   Noise-masking-Tone (Fig. b & d)
      o same effect, with masker and maskee roles reversed
[T. Painter and A. Spanias. Proc. IEEE, 88(4):451–512, 2000.]
Psychophysical Scales
• Perceptual continua seem to come in two “flavors”
   “how much” (magnitude sensitivity)
   “what kind” or “where” (positional/locational sensitivity)
• Magnitude-sensitive perceptual responses
   Generally exhibit a geometric sensitivity
      Response = P(Stimulus) where $r = P(s) \sim k s^n$
   Psychophysically “natural” scale: “just noticeable difference” (JND) units
      o a “unit” change in stimulus just elicits perceptual notice across entire range
      o smaller variations of stimulus cannot be discerned across entire range
   Question can be posed in terms of scalar change of coordinates
      o Identify a rescaling, q = C(r), t = C(s)
      o Such that q is linear in the (rescaled) stimulus variable, t
• In contrast, spatial- and/or frequency-sensitive responses do not seem to exhibit a regular “jnd” principle
[S. S. Stevens. Psychological Review, 64(3):153–181, 1957]
Modeling Empirical Regularity
• Premise: Response = P(Stimulus) where
   stimulus is externally measured (here: acoustic wave amplitude, A)
   response is subjectively reported (here: perceived “loudness,” L)
• Empirical Observation: L = P(A)
   Case 1: Response exhibits incremental regularity
      o At each amplitude, A, an increment in stimulus, $\Delta A$
      o Elicits a proportionate increment of loudness, $\Delta L$
      o Quantitative Summary: $\Delta L = p\,\Delta A$
   Case 2: Response exhibits ratio regularity
      o At each amplitude, A, a percentage stimulus increment, $\Delta A / A$
      o Elicits a proportionately increased response increment, $\Delta L / L$
      o Quantitative Summary: $\Delta L / L = p\,\Delta A / A$
• Mathematical Idealization: take the limit, $\Delta A \to 0$
   Case 1:
      o $dL = p\,dA \Rightarrow \int dL = p \int dA$
      o $\Rightarrow L = p\,(A - A_0) \Rightarrow P(A) = p\,(A - A_0)$
   Case 2:
      o $dL/L = p\,dA/A \Rightarrow \int dL/L = p \int dA/A$
      o $\Rightarrow \ln L = p\,(\ln A - \ln A_0) = \ln[(A/A_0)^p] \Rightarrow P(A) = (A/A_0)^p$
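The Case 2 limit can be checked numerically: repeatedly applying the ratio rule with smaller and smaller stimulus increments should approach the power law (A/A0)^p. The sketch below does this with simple forward steps; the exponent 0.6, the amplitude range, and the function name `integrate_ratio_rule` are illustrative assumptions.

```python
def integrate_ratio_rule(p, a0, a_end, steps):
    """Apply delta_L / L = p * delta_A / A in equal stimulus steps, starting from L(a0) = 1."""
    amplitude, loudness = a0, 1.0
    delta_a = (a_end - a0) / steps
    for _ in range(steps):
        loudness += loudness * p * delta_a / amplitude
        amplitude += delta_a
    return loudness

p, a0, a_end = 0.6, 1.0, 10.0          # assumed example values
coarse = integrate_ratio_rule(p, a0, a_end, steps=10)
fine = integrate_ratio_rule(p, a0, a_end, steps=100_000)
exact = (a_end / a0) ** p              # closed-form power law from the derivation
print(coarse, fine, exact)
```

With only 10 steps the result overshoots noticeably, while the fine-step result agrees with the closed form to several digits, as the limiting argument predicts.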
The Bark Scale
• “Bark” units
   Uniform JND scale for frequency
   Maps appropriate frequency intervals into their respective critical band number
   Unit name (Zwicker ’61)
      o commemorates Barkhausen
      o credited with introducing the first sound loudness scale
• Frequency-to-Bark function
   $B(f) = \alpha_1 \arctan(\beta_1 f) + \alpha_2 \arctan(\beta_2 f)$
   First Principles vs. Empirical Modeling
      o Question 1: how were the constant parameters, $\alpha_1, \alpha_2, \beta_1, \beta_2$, determined?
      o Question 2: how might the functional form have been determined?
      o Question 3: what are the necessary and sufficient features of any smooth scalar change of coordinates?
[E. Zwicker. J. Acoust. Soc.Am., 33(2):248, February 1961]
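For experimentation, a frequency-to-Bark mapping can be coded directly. The sketch below uses a widely quoted approximation attributed to Zwicker and Terhardt, 13 arctan(0.00076 f) + 3.5 arctan((f/7500)^2); note that its second term squares the frequency ratio, so it is not necessarily the exact form or the constants intended on this slide. The function name `hz_to_bark` is hypothetical.

```python
import math

def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz.

    Uses the commonly cited two-arctangent approximation; the constants are
    empirical fits, not derived from first principles.
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

frequencies_hz = [0.0, 100.0, 1000.0, 4000.0, 20000.0]
barks = [hz_to_bark(f) for f in frequencies_hz]
print([round(b, 2) for b in barks])
```

The mapping increases monotonically with frequency and stays below 30 across the audible range, in line with the "< 30 critical bands" figure quoted earlier.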
To Read Further
• Human Auditory System Science from Audio Signal Processing Viewpoint
   R. Munkong and B.-H. Juang. Auditory perception and cognition. IEEE Signal Processing Magazine, 25(3):98–117, 2008.
• Human Auditory Signal Processing Models
   B.C.J. Moore. Basic psychophysics of human spectral processing. International Review of Neurobiology, 70:49–86, 2005.
• Use of Psychoacoustic Models in Audio Signal Processing
   T. Painter and A. Spanias. Perceptual coding of digital audio. Proceedings of the IEEE, 88(4):451–512, 2000.
• Ethology of Animal Communication
   H. Brumm and H. Slabbekoorn. Acoustic communication in noise. Advances in the Study of Behavior, 35:151–209, 2005.
ESE250:
Digital Audio Basics
End Week 6 Lecture
Human
Psychoacoustics