
Assessing the contribution of spectral cues to recognition of frequency-lowered consonants

Kelly Fitz°, Christophe Micheyl*

Dania Rashiq°, Susie Valentine°, Tao Zhang°

°Starkey Hearing Technologies, *U. of Minnesota, Dept. of Psychology

Methods

Results

Motivation

Do enhanced spectral cues reduce confusions among frequency-lowered fricatives in hearing-impaired listeners with training?

Listeners match frequency-lowered fricatives across vowel contexts.

3I-2AFC (three-interval, two-alternative forced-choice) task

Reference interval contains the target consonant (e.g. /ath/).

Target interval contains the same consonant, but a different vowel from the reference (e.g. /ith/).

Non-target interval contains a different consonant, but the same vowel as the target (e.g. /is/).

4 unvoiced fricative consonants /f/, /th/, /s/, /sh/

2 talkers (male and female)

3 vowels /a/, /i/, /u/
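A minimal sketch of how one 3I-2AFC trial could be assembled from the stimulus set above. The helper names, the (talker, vowel, consonant) token tuples, and the choice to keep one talker across all three intervals are assumptions for illustration. The poster does not say how trials were drawn.

```python
import itertools
import random

# Stimulus set as listed above; vowel + consonant spells the token, e.g. "ath".
CONSONANTS = ["f", "th", "s", "sh"]
VOWELS = ["a", "i", "u"]
TALKERS = ["male", "female"]

# The six consonant pairs, e.g. ("f", "th"), ("s", "sh").
PAIRS = list(itertools.combinations(CONSONANTS, 2))


def make_trial(target, foil, talker, rng):
    """Assemble one hypothetical 3I-2AFC trial.

    Tokens are (talker, vowel, consonant) tuples. Assumption: the same
    talker is used in all three intervals.
    """
    v_ref, v_test = rng.sample(VOWELS, 2)   # two different vowel contexts
    reference = (talker, v_ref, target)     # e.g. /ath/
    target_tok = (talker, v_test, target)   # same consonant, new vowel: /ith/
    nontarget = (talker, v_test, foil)      # other consonant, target's vowel: /is/
    comparisons = [target_tok, nontarget]
    rng.shuffle(comparisons)                # random order of the two choices
    correct = comparisons.index(target_tok) + 1  # 1 -> "Sound 1", 2 -> "Sound 2"
    return {"reference": reference, "comparisons": comparisons, "correct": correct}


def label(token):
    """Spell a token the way the poster does, e.g. ('female', 'a', 'th') -> 'ath'."""
    _, vowel, consonant = token
    return vowel + consonant


# Usage: one trial matching /th/ against /s/ for the female talker.
trial = make_trial("th", "s", "female", random.Random(0))
```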

Materials

Headphone presentation (HD600)

60 dB SPL presentation in test ear

Speech-shaped noise at 20 dB SNR

Masking noise at 50 dB SPL in non-test ear

Linear gain targets derived from CamEQ
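A minimal sketch of mixing a token with noise at the 20 dB SNR listed above. It assumes the SNR is the ratio of broadband RMS levels computed over the whole token. The helper names and the stand-in signals are hypothetical: a tone replaces the VCV token, and white noise replaces the speech-shaped noise.

```python
import numpy as np


def rms(x):
    """Root-mean-square value of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` at the requested SNR (broadband RMS ratio, in dB).

    Returns the mixture and the scaled noise. Assumptions: SNR is computed
    over the whole token, and `noise` is at least as long as `speech`.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = np.asarray(noise[: len(speech)], dtype=float)
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    scaled = gain * noise
    return speech + scaled, scaled


# Hypothetical stand-ins: a 500 Hz tone for the token, white noise for the masker.
fs = 16000
t = np.arange(fs) / fs
speech = 0.1 * np.sin(2 * np.pi * 500 * t)
noise = np.random.default_rng(0).standard_normal(2 * fs)
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=20.0)
```

The sketch does not show calibration to 60 dB SPL in the test ear or the CamEQ-derived linear gain, because both depend on the playback chain.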

Hierarchical Bayesian model

Treatments

Oracle labeling

Estimate power spectrum in 125 ms frames

Compute power in a one-octave neighborhood of each of two spectral peaks

Synthesize narrowband noise components
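The analysis steps listed under Treatments might look roughly like the sketch below. The poster gives none of the following details, so all of them are assumptions:

- frames are non-overlapping and Hann-windowed;
- peaks are searched for between 2 and 8 kHz;
- the second peak must lie at least one octave from the first;
- a "one-octave neighborhood" means half an octave on either side of each peak.

We read "oracle labeling" as phoneme labels supplied by hand rather than by a classifier, so no classifier appears here.

```python
import numpy as np


def frame_spectra(x, fs, frame_s=0.125):
    """Power spectrum of consecutive, non-overlapping 125 ms Hann-windowed frames."""
    n = int(round(frame_s * fs))
    n_frames = len(x) // n
    window = np.hanning(n)
    frames = np.asarray(x[: n_frames * n], dtype=float).reshape(n_frames, n) * window
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / np.sum(window**2)
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    return freqs, power


def two_peaks(freqs, spectrum, lo_hz=2000.0, hi_hz=8000.0, min_sep_oct=1.0):
    """Strongest bin in [lo_hz, hi_hz], then the strongest bin at least
    `min_sep_oct` octaves away from it (hypothetical peak-picking rule).
    lo_hz must be > 0 so the octave distance is defined."""
    idx = np.flatnonzero((freqs >= lo_hz) & (freqs <= hi_hz))
    first = idx[np.argmax(spectrum[idx])]
    far = idx[np.abs(np.log2(freqs[idx] / freqs[first])) >= min_sep_oct]
    if far.size == 0:
        return float(freqs[first]), None
    second = far[np.argmax(spectrum[far])]
    return float(freqs[first]), float(freqs[second])


def octave_power(freqs, spectrum, center_hz):
    """Summed power within half an octave either side of center_hz."""
    mask = (freqs >= center_hz / np.sqrt(2)) & (freqs <= center_hz * np.sqrt(2))
    return float(np.sum(spectrum[mask]))
```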

Frequency lowering candidates

Thresholds > 65 dB HL above 4 kHz.

Thresholds ≤ 50 dB HL below 1.5 kHz.

Audiogram slope ≥ 25 dB/octave in at least one octave.
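The candidacy criteria can be expressed as a small check over an audiogram. This sketch rests on three assumptions:

- "above 4 kHz" and "below 1.5 kHz" are taken as strict;
- the slope criterion is evaluated over octave-spaced pairs of tested frequencies;
- the helper `choose_test_ear` treats the ear with the lower mean threshold as the better ear (a hypothetical rule).

```python
def meets_criteria(audiogram):
    """Check the frequency-lowering candidacy criteria for one ear.

    `audiogram` maps audiometric frequency (Hz) to threshold (dB HL).
    Assumptions: "above 4 kHz" / "below 1.5 kHz" are strict, and the slope
    is taken over octave-spaced frequency pairs (f, 2f) that were both tested.
    """
    high = [thr for f, thr in audiogram.items() if f > 4000]
    low = [thr for f, thr in audiogram.items() if f < 1500]
    # dB change across one octave, for every tested pair (f, 2f)
    slopes = [audiogram[2 * f] - audiogram[f] for f in audiogram if 2 * f in audiogram]
    return (
        bool(high) and all(thr > 65 for thr in high)
        and bool(low) and all(thr <= 50 for thr in low)
        and any(s >= 25 for s in slopes)
    )


def mean_threshold(audiogram):
    """Mean threshold across tested frequencies (dB HL)."""
    return sum(audiogram.values()) / len(audiogram)


def choose_test_ear(left, right):
    """Return "left", "right", or None. If both ears qualify, pick the better
    ear, taken here (hypothetically) as the one with the lower mean threshold."""
    ok_left, ok_right = meets_criteria(left), meets_criteria(right)
    if ok_left and ok_right:
        return "left" if mean_threshold(left) <= mean_threshold(right) else "right"
    if ok_left:
        return "left"
    if ok_right:
        return "right"
    return None
```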

Participants

[Figure: Audiometric thresholds. Hearing threshold (dB HL, 0 to 120) vs. frequency (250 to 8000 Hz).]

Frequency lowering can restore audibility of critical high-frequency cues to patients with severe high-frequency hearing loss.

[1] J. Robinson, T. Stainsby, T. Baer, and B. C. J. Moore, “Evaluation of a frequency transposition algorithm using wearable hearing aids,” Int J Audiol, pp. 1–10, Apr. 2009.

20 practice trials each session

2 tests × 144 trials per test × 4 days in one week

At least two inactive weeks between treatments

Training*

The latent ability variable, x, was modeled as the sum of a constant term, main effects, interactions, and an error term (Bayesian ANOVA).

Factors were subject, treatment, consonant pair, and training.

Training modeled as a linear function of the session number.

All terms were modeled as Gaussian random variables whose means and variances were assigned Gaussian and half-t priors [4], respectively, and learned from the data.

Posterior distributions of model parameters were computed using Markov chain Monte Carlo (MCMC) methods in rjags [5].
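To make the model structure concrete, the sketch below draws latent abilities from the prior of a Bayesian-ANOVA-style model with the factors named above. It is a Python stand-in, not the authors' JAGS model, and it simplifies in several ways:

- effects are zero-mean Gaussians;
- the half-t prior is placed on each effect's standard deviation;
- only two-way interactions are included;
- the prior means, scales, and degrees of freedom are assumptions.

```python
import math
import random


def half_t(rng, df=4, scale=1.0):
    """One draw from a half-t distribution: |Z / sqrt(chi2_df / df)| * scale."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return abs(z / math.sqrt(chi2 / df)) * scale


def simulate_latent_ability(subjects, treatments, pairs, sessions, rng):
    """One prior-predictive draw of latent ability x for every design cell.

    x = constant + subject + treatment + pair
        + (subject:treatment) + (subject:pair) + (treatment:pair)
        + slope * session + error
    """
    const = rng.gauss(1.0, 1.0)  # assumed prior for the constant term
    sd = {k: half_t(rng) for k in ("subj", "treat", "pair", "st", "sp", "tp", "err")}
    subj = {s: rng.gauss(0.0, sd["subj"]) for s in subjects}
    treat = {t: rng.gauss(0.0, sd["treat"]) for t in treatments}
    pair = {p: rng.gauss(0.0, sd["pair"]) for p in pairs}
    st = {(s, t): rng.gauss(0.0, sd["st"]) for s in subjects for t in treatments}
    sp = {(s, p): rng.gauss(0.0, sd["sp"]) for s in subjects for p in pairs}
    tp = {(t, p): rng.gauss(0.0, sd["tp"]) for t in treatments for p in pairs}
    slope = rng.gauss(0.0, 0.1)  # linear training effect per session (assumed prior)
    return {
        (s, t, p, k): const + subj[s] + treat[t] + pair[p]
        + st[(s, t)] + sp[(s, p)] + tp[(t, p)]
        + slope * k + rng.gauss(0.0, sd["err"])
        for s in subjects
        for t in treatments
        for p in pairs
        for k in sessions
    }


# Usage: a small hypothetical design with sessions numbered 1..4.
x = simulate_latent_ability(["S1", "S2"], ["one", "two"], ["F-S", "S-SH"],
                            range(1, 5), random.Random(0))
```

Fitting the model means inverting this generative direction. An MCMC sampler such as JAGS does that; this sketch does not.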

[Figure: Welch mean-square spectra for /s/ in “forks”. Power (dB FS, -60 to -35) vs. frequency (2 to 8 kHz).]

Effect of spectral cue preservation

[Figure: Spectrograms for the one-component method, the two-component method, and the two-component method with classification. Frequency (500 Hz to 1.5 kHz) vs. time (s); amplitude (linear).]

Schematic spectra (amplitude vs. frequency), illustrated with “forks”:

One component: one fixed-frequency component. Removes spectral cues.

Two components: two variable-frequency components. Compresses spectral cues.

Two w/classification: two fixed-frequency components, with one or two components used according to phoneme class. Exaggerates spectral cues.
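The three treatments differ in where the narrowband noise components are placed. The sketch below synthesizes them from per-frame peak frequencies and powers. Powers are treated as target mean-square values. Every numeric parameter here is hypothetical:

- the 500 to 1500 Hz lowered region, read off the spectrogram axes;
- the fixed frequencies;
- the 1/3-octave bandwidth;
- the log-frequency compression map.

The rule for which phoneme classes get two components is also hypothetical.

```python
import numpy as np

# Hypothetical parameters, not taken from the poster.
LOW_HZ, HIGH_HZ = 500.0, 1500.0   # lowered region
FIXED_HZ = (700.0, 1200.0)        # fixed component frequencies
BW_OCT = 1 / 3                    # component bandwidth in octaves


def narrowband_noise(center_hz, n, fs, rng, bw_oct=BW_OCT):
    """Unit-RMS noise band `bw_oct` octaves wide around center_hz (FFT masking)."""
    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n, d=1 / fs)
    half = 2 ** (bw_oct / 2)
    spec[(f < center_hz / half) | (f > center_hz * half)] = 0
    y = np.fft.irfft(spec, n)
    return y / np.sqrt(np.mean(y**2))


def compress(src_hz, src_lo=2000.0, src_hi=8000.0):
    """Log-frequency map of [src_lo, src_hi] onto [LOW_HZ, HIGH_HZ]."""
    r = np.clip(np.log(src_hz / src_lo) / np.log(src_hi / src_lo), 0.0, 1.0)
    return float(LOW_HZ * (HIGH_HZ / LOW_HZ) ** r)


def lowered_components(treatment, peaks_hz, powers, label=None):
    """(center_hz, mean_square) pairs to synthesize for one analysis frame."""
    total = sum(powers)
    if treatment == "one":        # one fixed component: spectral cues removed
        return [(FIXED_HZ[0], total)]
    if treatment == "two":        # two variable components: cues compressed
        return [(compress(f), p) for f, p in zip(peaks_hz, powers)]
    if treatment == "two_class":  # fixed components chosen by phoneme class
        # Hypothetical rule: /s/ and /sh/ get two components, others one.
        if label in ("s", "sh"):
            return list(zip(FIXED_HZ, powers))
        return [(FIXED_HZ[0], total)]
    raise ValueError(f"unknown treatment: {treatment}")


def synthesize_frame(components, n, fs, rng):
    """Sum of narrowband noises, each scaled to its target mean-square value."""
    out = np.zeros(n)
    for center_hz, mean_square in components:
        out += np.sqrt(mean_square) * narrowband_noise(center_hz, n, fs, rng)
    return out
```

Frames would still need to be overlap-added or cross-faded into a continuous signal. That step is omitted.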

The authors gratefully acknowledge the generous assistance of Sandy Jobes, and of the participants in this study.

Frequency lowering sometimes introduces confusions among lowered consonants, e.g., /s/ and /sh/ [1].

You are hearing ‘s’.

Which sound contains ‘s’?

Presentation

* A previous study showed no evidence that listeners without training made use of enhanced spectral cues.

Nine participants, aged 68 to 87 years (mean 74.9)

Tested in the better ear if both ears met the criteria.

Listener selects “Sound 1” or “Sound 2” using a computer touch screen.

Feedback (correct/incorrect) is provided.


[2] A. Gelman and J. Hill, Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, 2007.

[3] J. N. Rouder, R. D. Morey, P. L. Speckman, and M. S. Pratte, “Detecting chance: a solution to the null sensitivity problem in subliminal priming,” Psychon Bull Rev, vol. 14, no. 4, Aug. 2007.

[4] A. Gelman, A. Jakulin, M. G. Pittau, and Y.-S. Su, “A weakly informative default prior distribution for logistic and other regression models,” Ann. Appl. Stat., vol. 2, no. 4, pp. 1360–1383, Dec. 2008.

[5] M. Plummer, “rjags: Bayesian Graphical Models Using MCMC,” R package version 3.5, 2011. http://mcmc-jags.sourceforge.net

“Mass at Chance” [3]

Correct responses modeled as binomial.

Probability of correct response defined as the cumulative-normal transform of latent ability, x (similar to d’).

Latent ability, x, had a Gaussian distribution truncated below zero so that performance could not be worse than chance (p = 0.5).
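The response model above can be written as three small functions:

- a cumulative-normal link from latent ability to proportion correct;
- a draw from a Gaussian truncated below at zero;
- the binomial likelihood of the observed number of correct responses.

This follows the poster's description literally. The function names are illustrative, and the sampler is a plain inverse-CDF method, not necessarily how JAGS samples it.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_correct(x):
    """Cumulative-normal link: x >= 0 gives p >= 0.5 (chance)."""
    return STD_NORMAL.cdf(x)


def sample_truncated_ability(mu, sigma, rng):
    """Draw x ~ Normal(mu, sigma) truncated below at zero (inverse-CDF sampling)."""
    dist = NormalDist(mu, sigma)
    lo = dist.cdf(0.0)                      # probability mass that is cut away
    u = lo + (1.0 - lo) * rng.random()      # uniform on [cdf(0), 1)
    u = min(max(u, 1e-12), 1.0 - 1e-12)     # inv_cdf needs 0 < u < 1
    return max(0.0, dist.inv_cdf(u))        # clamp tiny negative rounding errors


def binomial_loglik(k, n, x):
    """Log-probability of k correct responses in n trials given latent ability x."""
    p = p_correct(x)
    ll = math.log(math.comb(n, k)) + k * math.log(p)
    if n > k:
        if p >= 1.0:
            return -math.inf  # an error is impossible when p == 1
        ll += (n - k) * math.log1p(-p)
    return ll


# Usage: one truncated draw and its likelihood for 100 correct out of 144 trials.
x = sample_truncated_ability(0.5, 1.0, random.Random(0))
ll = binomial_loglik(100, 144, x)
```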

Overall Performance

[Figure: Proportion correct (0.5 to 1) for subjects S1, S2, S4, S5, S7 and the mean, one panel per consonant pair (F-TH, F-S, F-SH, TH-S, TH-SH, S-SH); treatments: No treatment, One component, Two components, Two w/classification.]

95% Bayesian confidence intervals for proportion correct scores on the matching task for each subject, treatment, and consonant pair.

Performance under all treatments (including no-treatment) was highly variable within and between subjects.

Data so far show no general benefit of any of the treatments relative to no-treatment...

...but this is not the only measure of benefit due to frequency lowering.

Most subjects showed substantial benefit on a fricative detection test (S-test).

Five subjects have completed all four treatments (data collection in progress).

Data were analyzed using a hierarchical Bayesian model [2].

No effect of training was found; data shown are collapsed across sessions.

Effect of preserving or enhancing spectral cues was highly variable within and between subjects.

Statistically significant differences among treatments (based on 95% Bayesian confidence intervals) were only observed for a few subjects and consonants.

Treatments preserving or enhancing spectral cues produce fewer confusions for consonant pairs that include /sh/ than the treatment that removes spectral cues.
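The significance criterion above amounts to checking whether the 95% interval of a posterior difference excludes zero. A minimal sketch, assuming the posterior is available as MCMC draws of latent ability for two treatments that are paired by iteration. Synthetic draws stand in for real output.

```python
import random
from statistics import quantiles


def interval95(draws):
    """Central 95% interval (2.5th to 97.5th percentile) of posterior draws."""
    cuts = quantiles(draws, n=40, method="inclusive")  # cut points every 2.5%
    return cuts[0], cuts[-1]


def compare_treatments(draws_a, draws_b):
    """95% interval for a - b from paired posterior draws.

    Returns (lower, upper, excludes_zero).
    """
    if len(draws_a) != len(draws_b):
        raise ValueError("draws must be paired by MCMC iteration")
    diff = [a - b for a, b in zip(draws_a, draws_b)]
    lower, upper = interval95(diff)
    return lower, upper, lower > 0 or upper < 0


# Synthetic draws for illustration only (not study data).
rng = random.Random(1)
one_component = [rng.gauss(0.8, 0.1) for _ in range(4000)]
two_components = [rng.gauss(1.0, 0.1) for _ in range(4000)]
result = compare_treatments(two_components, one_component)
```

Pairing draws by iteration keeps the posterior correlation between the two treatments' estimates. Subtracting independently shuffled draws would discard it.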

d’ score on S-test (fricative detection task)

Subject   Unprocessed   One-component   Two-component   Two w/classification
S1        0.79          3.42            3.19            1.98
S2        0.73          1.42            1.48            1.03
S4        1.06          2.44            3.57            2.79
S5        1.13          2.23            2.79            3.09
S7        1.58          1.20            1.48            1.23
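The table reports d' values. For reference, a yes/no d' is z(H) - z(F), the difference of the inverse-normal-transformed hit and false-alarm rates. The poster does not describe the S-test procedure, so treating it as yes/no is an assumption. The sketch also tabulates each subject's change in d' relative to unprocessed speech, using values transcribed from the table.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf


def d_prime(hit_rate, fa_rate):
    """Yes/no sensitivity: d' = z(H) - z(F). Rates must lie strictly in (0, 1)."""
    return _z(hit_rate) - _z(fa_rate)


# d' values transcribed from the table:
# (unprocessed, one-component, two-component, two w/classification)
S_TEST = {
    "S1": (0.79, 3.42, 3.19, 1.98),
    "S2": (0.73, 1.42, 1.48, 1.03),
    "S4": (1.06, 2.44, 3.57, 2.79),
    "S5": (1.13, 2.23, 2.79, 3.09),
    "S7": (1.58, 1.20, 1.48, 1.23),
}


def gains(row):
    """Change in d' for each lowering treatment relative to unprocessed."""
    unprocessed, *treated = row
    return tuple(round(v - unprocessed, 2) for v in treated)


# Subjects whose d' rose under every lowering treatment.
improved_everywhere = [s for s, row in S_TEST.items() if all(g > 0 for g in gains(row))]
```

With these values, S1, S2, S4, and S5 have a higher d' than unprocessed under all three treatments. S7 does not improve under any of them.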

[Figure: Difference in performance vs. one-component (-1 to 1) for subjects S1, S2, S4, S5, S7 and the mean, one panel per consonant pair (F-TH, F-S, F-SH, TH-S, TH-SH, S-SH); treatments: Two components, Two w/classification.]

95% Bayesian confidence intervals for change in performance (similar to change in d’) for treatments with spectral cues compared to the treatment without them (one component).

[Figure: Difference in performance vs. one-component (-0.3 to 0.3) for SH pairs, F pairs, S pairs, and TH pairs under Two components and Two w/classification. Annotations: reduced confusions compared to no spectral cues; variation between subjects; variation within subjects.]

Three frequency-lowering treatments differ in degree of spectral contrast.