
Proceedings of the International Symposium on Musical Acoustics, March 31st to April 3rd 2004 (ISMA2004), Nara, Japan

Auditory Grouping in the Perception of Roughness Induced by Subharmonics: Empirical Findings and a Qualitative Model

Chen-Gia Tsai

Department of Musicology, Humboldt University Berlin, Germany [email protected]

Abstract

Quasi-periodic sounds with subharmonics at (2n-1)f0/2 (where f0 is the perceived pitch, n = 1, 2, 3...) can be produced by musical instruments such as the saxophone, the trombone, the violin and the Chinese membrane flute. Lower subharmonics in a natural sound are always too weak to evoke the pitch f0/2, but upper subharmonics (>11f0/2) can be strong enough to affect the sound quality. Subharmonics are common in human vocalizations and have been identified as a source of roughness. However, this type of roughness cannot be explained by existing psychoacoustic models and appears to contradict the theory of consonance-dissonance. The present study provided a qualitative model of roughness induced by subharmonics that takes higher-order mechanisms of auditory grouping into account. The key assumption was that interference between components at nf0 lying in the same critical band is largely reduced once they are grouped by a robust pitch sensation of f0. Roughness induced by subharmonics reflects a limitation of the pitch-based grouping mechanism: the perceived pitch is too high to group the subharmonics.

1. Introduction

Auditory roughness is an important parameter that induces unpleasant qualities in a sound. Since its introduction by Helmholtz [1], roughness has been attributed to rapid beating in peripheral auditory channels, or critical bands. Beyond psychoacoustic research, roughness has been studied extensively by clinicians as an indicator of pathological voices.

Roughness evaluations of human voices have provided new data for examining psychoacoustic models of roughness. In Reuter’s dissertation [2], a psychoacoustic model for roughness calculation [3] was applied to pathological voices. Strikingly, the computed values showed only a moderate correlation with the perceived roughness.

A conflict between psychoacoustic and clinical studies of human voices can be found in a specific type of roughness: roughness induced by subharmonics. Subharmonics are spectral components at multiples of a low integer fraction of the perceived pitch f0. This paper focused on subharmonics at (2n-1)f0/2, although subharmonics at multiples of f0/n can also occur in human vocalizations. Typically, lower subharmonics are too weak to evoke the ‘subharmonic pitch’ f0/2, but upper subharmonics are strong enough to induce a rough sound quality. This study addressed empirical findings that contradict existing psychoacoustic models of roughness and provided a new model that qualitatively describes the perception of roughness induced by subharmonics.
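To make the definition concrete, the following minimal Python sketch lists the components of an idealized sound whose spectrum contains every multiple of f0/k, and labels each one as a harmonic (an integer multiple of f0) or a subharmonic. The function and parameter names are illustrative only; k plays the role of the integer divisor written as n in "f0/n", renamed to avoid a clash with the n in (2n-1)f0/2. With k = 2 the subharmonics fall at (2n-1)f0/2, and all components together are multiples of f0/2.

```python
from fractions import Fraction

def component_frequencies(f0, k, n_max):
    """Return (frequency_hz, label) for every multiple of f0/k up to n_max * f0.

    A component whose frequency is an integer multiple of f0 is labelled
    'harmonic'; every other one is labelled 'subharmonic'. k = 2 gives the
    subharmonics at (2n-1)f0/2 discussed in the text.
    """
    components = []
    for m in range(1, k * n_max + 1):
        ratio = Fraction(m, k)  # frequency in units of f0, kept exact
        label = "harmonic" if ratio.denominator == 1 else "subharmonic"
        components.append((float(ratio * f0), label))
    return components

print(component_frequencies(200, 2, 3))
# [(100.0, 'subharmonic'), (200.0, 'harmonic'), (300.0, 'subharmonic'),
#  (400.0, 'harmonic'), (500.0, 'subharmonic'), (600.0, 'harmonic')]
```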

2. Background

2.1. Neural correlates of musical dissonance

Helmholtz explained the perception of musical dissonance in terms of beats evoked by adjacent harmonics of two simultaneously sounding musical tones. These beats could result in intermittent neural activity. This approach implies that the degree of dissonance depends on the spectral content of the tones.

Recent research on the neurophysiological basis of dissonance perception has mainly concerned temporal coding at various levels of the auditory nervous system, such as the inferior colliculus [4] and the primary auditory cortex [5]. However, such phase-locked firing patterns may reflect no more than the preservation of temporal information at lower levels of auditory processing. Their relationship to the unpleasant sensation associated with dissonance or roughness, which is likely coded in the prefrontal cortex, remains to be demonstrated.

A case study of a lesion in the auditory cortex showed a dissociation between harmony perception and roughness perception [6]. The authors therefore suggested that pitch relationships govern harmony perception in the vertical dimension, with roughness playing a secondary role. In other words, harmonics of two musical tones lying in the same critical band need not produce unpleasant beats.

2.2. ‘Pleasant beating’ in low-pitched singing

It is also unclear whether beats arising from the harmonics of a single musical tone can induce unpleasant qualities. According to psychoacousticians, a harmonic-rich voice with f0 < 200 Hz should have a high roughness value, because its upper harmonics interfere with each other within critical bands. Voice clinicians, however, do not judge such low-pitched voices as rough. In singing practice, the voices of a professional bass are characterized by a formant around 3 kHz. Although the unresolved harmonics clustering around this singer’s formant could give the voice a rough quality, audiences favor bright low-pitched voices over dull ones that are free of roughness.
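The psychoacoustic argument can be checked with simple arithmetic: the following sketch counts how many harmonics of a given f0 fall inside one critical band. It assumes the Zwicker and Terhardt (1980) approximation of critical bandwidth, CB(f) = 25 + 75[1 + 1.4(f/kHz)^2]^0.69 Hz, which the text itself does not specify; the function names are hypothetical. Near the 3 kHz singer’s formant, a 100 Hz voice places about five harmonics in one band, where adjacent harmonics beat at the difference frequency f0, whereas a 600 Hz voice places only one.

```python
import math

def critical_bandwidth(fc):
    """Approximate critical bandwidth (Hz) at centre frequency fc (Hz),
    using the Zwicker & Terhardt (1980) formula (an assumption here)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (fc / 1000.0) ** 2) ** 0.69

def harmonics_in_band(f0, fc):
    """Return the harmonics n*f0 (n >= 1) inside the critical band centred on fc."""
    half = critical_bandwidth(fc) / 2.0
    n_lo = max(1, math.ceil((fc - half) / f0))
    n_hi = math.floor((fc + half) / f0)
    return [n * f0 for n in range(n_lo, n_hi + 1)]

print(len(harmonics_in_band(100, 3000)))  # 5 unresolved harmonics near 3 kHz
print(harmonics_in_band(600, 3000))       # [3000]
```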

2.3. ‘Unpleasant consonance’: roughness induced by subharmonics

According to Helmholtz, consonant intervals are pleasant because very few beats are produced in auditory channels. However, human voices with subharmonics appear to contradict this theory.

Fig. 1 displays two spectrograms of human voices with subharmonics. The prominent pitch f0 is represented as the first strong spectral component in each spectrogram (marked with arrows). In Fig. 1a, the frequencies of the subharmonics (2n-1)f0/2 are integer multiples of f0/2. Therefore, a sudden appearance of these subharmonics could be regarded as adding a tone one octave below to the voice. More generally, the frequencies of subharmonics in a voice are always multiples of a low integer fraction of the fundamental frequency, such as f0/2, f0/3, f0/4 or even f0/6 (Fig. 1b). Because the corresponding subharmonic pitch and the fundamental frequency stand in strictly simple integer ratios, it is puzzling that such ‘consonances’ in voices have a rough quality.

Figure 1: Human voices with subharmonics.

2.4. Roughness vs. aperiodicity

In phoniatrics, several indicators have been used to describe pathological voice quality, including roughness, breathiness and hoarseness. Roughness has been related to ‘aperiodicity features’ of human voices, which arise from instability of the glottis. This approach parallels the evaluation of psychoacoustic roughness in amplitude- or frequency-modulated tones.

In music, however, the link between roughness and modulation appears fairly weak. A soprano’s voice can be rapidly and deeply modulated yet still sound beautiful, whereas steady but rough tones with subharmonics occur in wind instruments (e.g. the trombone and the bassoon, see [7]) and bowed string instruments [8]. Although such tones are generally avoided in Western classical music because of their unpleasant quality, they are used deliberately in Russian lament [9], jazz and Chinese membrane flute music [10].

It is important to note that stationary violin tones with subharmonics have been associated with a ‘rustle’ quality [11], which appears similar to roughness. This stands in sharp contrast to the general belief that roughness is always induced by rapidly modulated signals, and it raises the question of whether roughness is multidimensional. Bergan and Titze [12] investigated the effect of amplitude and frequency modulation on the perception of pitch and roughness in voices with subharmonics. In the introduction of their paper, the authors noted that roughness induced by subharmonics may be distinguishable from roughness due to aperiodicity.

3. Auditory modeling

3.1. Auditory grouping and interference reduction

As the spectral distribution of subharmonics may be a new dimension of roughness, psychoacoustic models based on the notion of critical bandwidth are unsuitable for roughness induced by subharmonics. I suggest that this type of roughness cannot be explained without taking into account the pitch-based grouping mechanism in auditory scene analysis.

Auditory scene analysis deals with how the auditory system organizes an auditory scene: it breaks a sound mixture into elements and groups proximate elements into discrete objects [13]. Grouping is thought to be governed by ‘grouping rules’ such as harmonicity, coherent modulation, common onset and spatial location.

A qualitative model of roughness induced by subharmonics is proposed here, based on two assumptions. First, the grouping rule of harmonicity is modified so that components at nf0 are grouped only when the pitch sensation of f0 is robust. In other words, if the pitch strength of f0 is low, components at nf0 will not be grouped despite their harmonicity. Second, unpleasant beats between components lying in the same critical band are largely reduced once these components are grouped. This assumption is supported by the fact that bright, low-pitched singing can have a low degree of roughness.
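The two assumptions can be expressed as a toy counting rule: list every pair of components closer than one critical bandwidth, and discount a pair only when both members are harmonics that have been grouped by a robust pitch f0. The following sketch is a hypothetical illustration, not a roughness calculation: the 0..1 pitch-strength scale, the 0.5 threshold, the "closer than one critical bandwidth" criterion and the Zwicker-Terhardt bandwidth formula are all assumptions. It reproduces the qualitative pattern in the text: a harmonic-only spectrum yields no interfering pairs when its pitch is robust but does when the pitch is weak, and adding subharmonics restores interference even with a robust pitch.

```python
from itertools import combinations

def critical_bandwidth(fc):
    """Zwicker & Terhardt (1980) approximation of the critical bandwidth (Hz)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (fc / 1000.0) ** 2) ** 0.69

def interfering_pairs(components, pitch_strength, threshold=0.5):
    """Toy version of the two assumptions (all numbers are hypothetical).

    components     -- list of (frequency_hz, label), label 'harmonic' or 'subharmonic'
    pitch_strength -- strength of the pitch f0 on an arbitrary 0..1 scale
    threshold      -- assumed minimum strength for harmonicity-based grouping

    Assumption 1: harmonics are grouped only if the pitch of f0 is robust.
    Assumption 2: two components closer than one critical bandwidth beat
    unpleasantly unless both belong to the grouped set.
    """
    grouped = pitch_strength >= threshold
    pairs = []
    for (f1, l1), (f2, l2) in combinations(sorted(components), 2):
        both_grouped = grouped and l1 == "harmonic" and l2 == "harmonic"
        if not both_grouped and abs(f2 - f1) < critical_bandwidth((f1 + f2) / 2):
            pairs.append((f1, f2))
    return pairs

# Hypothetical spectra: harmonics of 200 Hz up to 4 kHz, with and without
# subharmonics at odd multiples of 100 Hz.
harmonic_only = [(200.0 * n, "harmonic") for n in range(1, 21)]
with_subs = harmonic_only + [(100.0 * (2 * n - 1), "subharmonic") for n in range(1, 21)]

print(len(interfering_pairs(harmonic_only, pitch_strength=0.9)))     # 0: all grouped
print(len(interfering_pairs(harmonic_only, pitch_strength=0.1)) > 0)  # True
print(len(interfering_pairs(with_subs, pitch_strength=0.9)) > 0)      # True
```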

3.2. Model description

3.2.1. Stage 1: Pitch extraction

The definition of subharmonics and harmonics demands a definition of pitch, which is not obvious when the subharmonic pitch f0/2 competes with f0. Figs. 2a and 2b display the spectra of a pleasant throat-singing voice (kargyraa) and a rough voice with subharmonics. Psychoacoustic models of roughness cannot explain why the throat-singing voice is less rough than the voice with subharmonics. The perceived drone pitch of the throat-singing voice corresponds to the frequency of the first component, whereas the perceived pitch of the rough voice corresponds to the frequency of the second component. This can be related to the degree of predominance of the even-numbered components at lower frequencies. For the throat-singing voice, no such predominance is notable (Fig. 2a). For the rough voice, the lowest six even-numbered components dominate (Fig. 2b), so the pitch f0 tends to be extracted from them; they are the harmonics at nf0. Fig. 2c displays the spectrum of a saxophone tone with subharmonics, in which the predominance of the lower harmonics is also noticeable.

Figure 2: Pitch extraction and the predominance of even-numbered components. (a) Spectrum of a throat-singing voice (kargyraa). (b) Spectrum of a rough voice. (c) Spectrum of a saxophone tone.

Although subharmonics are often thought of as weaker than their flanking harmonics, it is important to note that the predominance of harmonics depends on the frequency range. Typically, lower subharmonics are much weaker than their flanking harmonics and are partially masked by them. This harmonic predominance is less pronounced at higher frequencies.

Figs. 2b and 2c show that the subharmonics above 6f0 are comparable in magnitude to their flanking harmonics. However, these upper subharmonics are unresolved components (rank > 12 with respect to f0/2) and cannot evoke a robust pitch sensation. Consequently, the pitch strength of f0/2 is fairly low.
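Stage 1 can be illustrated with a simplified subharmonic-to-harmonic amplitude ratio computed separately for low and high ranks, where rank counts multiples of f0/2. This is only loosely inspired by the subharmonic-to-harmonic ratio discussed in [15] and is not that paper’s algorithm. The spectrum is invented to mimic the qualitative description of Figs. 2b and 2c; the resolved-rank limit of 12 follows the text, while the decision threshold of 0.5 and the function names are assumptions.

```python
def shr(amplitudes, lo, hi):
    """Subharmonic-to-harmonic amplitude ratio over ranks lo..hi (inclusive).

    amplitudes[m - 1] is the linear amplitude of the component at m * f0 / 2,
    so odd m are subharmonics and even m are harmonics of f0.
    """
    sub = sum(amplitudes[m - 1] for m in range(lo, hi + 1) if m % 2 == 1)
    harm = sum(amplitudes[m - 1] for m in range(lo, hi + 1) if m % 2 == 0)
    return sub / harm

def extracted_pitch(amplitudes, f0, max_resolved_rank=12, threshold=0.5):
    """Hypothetical decision rule: choose f0/2 only if the resolved
    subharmonics (rank <= max_resolved_rank) are strong enough."""
    return f0 / 2 if shr(amplitudes, 1, max_resolved_rank) >= threshold else f0

# Hypothetical spectrum: weak subharmonics below 6*f0 (ranks 1-12),
# subharmonics comparable to their flanking harmonics above it (ranks 13-24).
amps = [(0.05 if m % 2 else 1.0) if m <= 12 else (0.4 if m % 2 else 0.5)
        for m in range(1, 25)]
print(round(shr(amps, 1, 12), 3))       # 0.05
print(round(shr(amps, 13, 24), 3))      # 0.8
print(extracted_pitch(amps, f0=200.0))  # 200.0, i.e. the pitch f0 wins
```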

3.2.2. Stage 2: Sifting with harmonic sieve

In this stage a harmonic sieve is constructed according to the pitch f0. This harmonic sieve consists of a series of harmonic ‘holes’ at nf0. Harmonics pass the sieve, while subharmonics are rejected by it.
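A harmonic sieve of this kind can be sketched as a nearest-hole test: a component passes if it lies within a small relative tolerance of some n*f0. The function name and the 3 % hole width are hypothetical choices for illustration. Applied to components at all multiples of f0/2, the sieve built on f0 passes the even-numbered components and rejects the odd-numbered ones, i.e. the subharmonics.

```python
def harmonic_sieve(freqs, f0, tolerance=0.03):
    """Split component frequencies into those passing a harmonic sieve built
    on f0 and those rejected by it.

    Each 'hole' is centred on n * f0 (n >= 1) with an assumed relative width
    of +/- tolerance; the 3 % default is illustrative, not taken from the text.
    """
    passed, rejected = [], []
    for f in freqs:
        n = max(1, round(f / f0))  # index of the nearest hole
        if abs(f - n * f0) <= tolerance * n * f0:
            passed.append(f)
        else:
            rejected.append(f)
    return passed, rejected

components = [100.0 * m for m in range(1, 13)]  # multiples of f0/2 with f0 = 200 Hz
harmonics, subharmonics = harmonic_sieve(components, f0=200.0)
print(harmonics)     # [200.0, 400.0, 600.0, 800.0, 1000.0, 1200.0]
print(subharmonics)  # [100.0, 300.0, 500.0, 700.0, 900.0, 1100.0]
```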

3.2.3. Stage 3: Grouping harmonics

In this stage, the components that have passed the harmonic sieve are grouped as a single entity, which is the ‘pure’ part of the sound because the unpleasant beats between harmonics are largely reduced. Rejected by the harmonic sieve, subharmonics remain ungrouped, evoking many entities in the auditory scene.
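Stage 3 then amounts to fusing the sieve’s output into auditory entities. The sketch takes the passed and rejected frequency lists as input; the entity labels and the boolean pitch_is_robust flag (standing in for the first assumption of the model) are hypothetical. With a robust pitch, three harmonics and three subharmonics yield four entities: one ‘pure’ part plus three separate subharmonics.

```python
def group_entities(passed, rejected, pitch_is_robust=True):
    """Stage 3 sketch: return a list of (kind, frequencies) auditory entities.

    Components that passed the harmonic sieve are fused into one 'pure part'
    when the pitch f0 is robust (first assumption of the model); otherwise
    each remains on its own. Every rejected subharmonic stays a separate entity.
    """
    entities = []
    if pitch_is_robust and passed:
        entities.append(("pure part", sorted(passed)))
    else:
        entities.extend(("ungrouped harmonic", [f]) for f in sorted(passed))
    entities.extend(("subharmonic", [f]) for f in sorted(rejected))
    return entities

entities = group_entities([200.0, 400.0, 600.0], [100.0, 300.0, 500.0])
for kind, freqs in entities:
    print(kind, freqs)
print(len(entities))  # 4
```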

3.2.4. Stage 4: Higher-order grouping

Although subharmonics are not grouped by the pitch f0/2, the auditory system still recognizes that harmonics and subharmonics arise from the same source. This implies a higher-order grouping.

4. Discussion

4.1. Cancellation filtering

The present model differs from previous roughness models in stages 2 and 3, where component segregation and grouping take place. Interference between unresolved components is reduced through the sifting in stage 2. To segregate harmonics lying in the same critical band, one must assume a mechanism of ‘f0-guided cancellation filtering’ within auditory channels. A temporal model of harmonic segregation was proposed in [14]. This model offers a putative neural mechanism supporting the idea that beats induced by unresolved harmonics are cancelled at a higher level of the auditory processing hierarchy.
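The core operation behind f0-guided cancellation is a delay-and-subtract (comb) filter, y[t] = x[t] - x[t - T] with T = 1/f0, which is the basic building block of the cancellation model in [14]. The following pure-Python sketch shows only that filtering step, not the neural model, and assumes the sampling rate is an integer multiple of f0; the helper name is hypothetical. Components at nf0 repeat every T and cancel, whereas components at (2n-1)f0/2 change sign over one period of f0 and therefore pass, doubled in amplitude.

```python
import math

def cancellation_filter(x, f0, fs):
    """Delay-and-subtract filter tuned to f0: y[t] = x[t] - x[t - T], T = fs/f0 samples.

    Assumes fs/f0 is an integer (otherwise the delay is rounded and the
    cancellation is only approximate). The first T samples are dropped
    because x[t - T] is undefined there.
    """
    lag = round(fs / f0)
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

fs, f0, n_samples = 16000, 200, 1600  # 0.1 s of signal, lag = 80 samples
t = [i / fs for i in range(n_samples)]
# Ten harmonics at n*f0 and ten subharmonics at (2n-1)*f0/2, unit amplitude each.
harmonics = [sum(math.cos(2 * math.pi * n * f0 * ti) for n in range(1, 11)) for ti in t]
subharmonics = [sum(math.cos(2 * math.pi * (2 * n - 1) * (f0 / 2) * ti) for n in range(1, 11))
                for ti in t]

y_harm = cancellation_filter(harmonics, f0, fs)
y_sub = cancellation_filter(subharmonics, f0, fs)
print(max(abs(v) for v in y_harm))  # close to 0: the harmonics are cancelled
print(max(abs(v) for v in y_sub))   # about twice the peak of the subharmonic signal
```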

4.2. Lower vs. upper subharmonics

While subharmonics have been identified as a source of roughness, the present model distinguishes between the roles of lower and upper subharmonics. It has been suggested that pitch can be determined in terms of the subharmonic-to-harmonic ratio [15]. However, the upper subharmonics contribute little to pitch perception: they cannot elicit a pitch that would group them, and therefore they induce roughness. In contrast, lower subharmonics contribute to the pitch sensation of f0/2. When these lower subharmonics become stronger, the perceived pitch falls and the roughness of the sound is reduced. This effect has been verified in a perceptual experiment [10].

4.3. Subharmonics as auditory impurities

Stimuli with subharmonics shed new light on auditory scene analysis by introducing the notion of auditory impurity: a sound with subharmonics is perceived as a ‘pure’ part plus impurities. As impurities, subharmonics are more or less segregated from the ‘pure’ part of the sound, which is composed of the well-grouped harmonics, but they are still bound to it through a higher-order grouping. This grouping, distinguishable from grouping based on pitch, may stem from experience and learning. Since we often hear sounds with subharmonics emitted by a single oscillator, such as the glottis or a musical instrument, the auditory system may have learned to bind subharmonics with harmonics.

4.4. Sounds of self-sustained oscillators

The pitch sensation is of great importance for grouping all components emitted by the same oscillator. However, this grouping mechanism has some limitations. First, self-sustained oscillators with a torus or strange attractor in phase space produce inharmonic components. Second, even for oscillators with a one-dimensional attractor (a limit cycle) that produce periodic sounds, pitch-based grouping sometimes fails because of peculiar spectral features. For example, the sound of an oscillator that has undergone period doubling can have weak odd-numbered components at lower frequencies. The pitch f0, which is extracted from the lower even-numbered components (the harmonics), is too high to group all components. The pitch sensation of f0/2 could accomplish this task, but the auditory system fails to perceive this pitch when the lower odd-numbered components (the subharmonics) are weak and masked by adjacent harmonics. From this standpoint, roughness induced by subharmonics reflects a performance limit of pitch-based auditory grouping.
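The period-doubling case can be simulated crudely by letting successive cycles of a tone alternate in amplitude, loosely in the spirit of the ‘alternate cycles’ stimuli of [15]. The signal model, the modulation depth a and the helper names are assumptions for illustration. A single-bin DFT shows that even a small alternation creates components at odd multiples of f0/2 (100 Hz and 300 Hz for f0 = 200 Hz), and that these remain much weaker than the component at f0 when a is small.

```python
import cmath
import math

def period_doubled(f0, fs, n_samples, a):
    """Sine at f0 whose cycle amplitudes alternate between 1 + a and 1 - a.

    A crude stand-in for a sound after period doubling; assumes fs is an
    integer multiple of f0 so that every cycle spans the same number of samples.
    """
    period = fs // f0  # samples per cycle of f0
    return [(1 + a * (1 if (i // period) % 2 == 0 else -1))
            * math.sin(2 * math.pi * f0 * i / fs)
            for i in range(n_samples)]

def magnitude_at(x, freq, fs):
    """Magnitude of the DFT of x evaluated at a single frequency (Hz)."""
    return abs(sum(v * cmath.exp(-2j * math.pi * freq * i / fs) for i, v in enumerate(x)))

fs, f0, n = 8000, 200, 8000  # 1 s of signal, so 100 Hz falls exactly on a DFT bin
for a in (0.0, 0.1):
    x = period_doubled(f0, fs, n, a)
    ref = magnitude_at(x, f0, fs)
    # Levels of the components at f0/2 and 3*f0/2 relative to the component at f0.
    print(a, magnitude_at(x, f0 / 2, fs) / ref, magnitude_at(x, 3 * f0 / 2, fs) / ref)
```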

5. Conclusions

This study has reviewed evidence against psychoacoustic models of roughness based on the notion of critical bandwidth. A qualitative model of the perception of roughness induced by subharmonics was proposed, based on two assumptions: (1) grouping requires a robust pitch sensation; (2) unpleasant beating caused by components in the same critical band is largely reduced by this grouping. Since lower subharmonics in rough voices or musical tones are always much weaker than their flanking harmonics, the pitch of f0/2 is very weak. Therefore, the subharmonics are not grouped and are perceived as impurities. The new model takes into consideration two higher-level mechanisms: (1) grouping harmonics across critical bands, and (2) binding the subharmonics with the well-grouped harmonics to form a single auditory entity.

Future research should be dedicated to developing a computational model of roughness induced by subharmonics. This requires estimating the relative strength of pitch candidates, as the performance of grouping strongly depends on pitch strength.

References

[1] von Helmholtz, H. L. F. On the Sensations of Tone, Dover, New York, 1954/1877.

[2] Reuter, R. Untersuchung der Rauhigkeit menschlicher Stimmen auf der Grundlage der nichtlinearen Dynamik und der Psychoakustik. PhD thesis, Technical University Berlin, 2000.

[3] Aures, W. “Ein Berechnungsverfahren der Rauhigkeit”, Acustica 58:268-281, 1985.

[4] McKinney, M. F., Tramo, M. J., and Delgutte, B. “Neural correlates of musical dissonance in the inferior colliculus”, in: Physiological and Psychophysical Bases of Auditory Function, Shaker Publishing BV, 83-89, 2001.

[5] Fishman, Y., Reser, D. H., Arezzo, J. C., and Steinschneider, M. “Complex tone processing in primary auditory cortex of the awake monkey. I. Neural ensemble correlates of roughness”, J. Acoust. Soc. Am. 108(1):235-246, 2000.

[6] Tramo, M. J., Cariani, P. A., Delgutte, B., and Braida, L. D. “Neurobiological foundations for the theory of harmony in western tonal music”, in: The Biological Foundations of Music, The New York Academy of Sciences, 92-116, 2001.

[7] Gibiat, V., and Castellengo, M. “Period doubling occurrences in wind instruments musical performance”, Acustica 86:746-754, 2000.

[8] Kimura, M. “How to produce subharmonics on the violin”, J. New Music Res. 28(2):178-184, 1999.

[9] Mazo, M., Ericson, D., and Harvery, T. “Emotion and expression: Temporal data on voice quality in Russian lament”, in: Vocal Fold Physiology: Voice Quality Control, Singular, San Diego, 173-178, 1995.

[10] Tsai, C.-G. The Chinese Membrane Flute (dizi): Physics and Perception of its Tones. PhD thesis, Humboldt University Berlin, 2003.

[11] Stepanek, J., and Otcenasek, Z. “Rustle as an attribute of timbre of stationary violin tones”, J. Catgut Acoust. Soc. 3(8):32-38, 1999.

[12] Bergan, C. C., and Titze, I. R. “Perception of pitch and roughness in vocal signals with subharmonics”, J. Voice 15:165-175, 2001.

[13] Bregman, A. S. Auditory Scene Analysis, MIT Press, 1990.

[14] de Cheveigné, A. “Concurrent vowel identification. III. A neural model of harmonic interference cancellation”, J. Acoust. Soc. Am. 101:2857-2865, 1997.

[15] Sun, X., and Xu, Y. “Perceived pitch of synthesized voice with alternate cycles”, J. Voice 16(4):443-459, 2002.
