13
ACTA ACUSTICA UNITED WITH ACUSTICA Vol. 101 (2015) 1039 – 1051 DOI 10.3813/AAA.918898 The Role of Spectral-Envelope Characteristics in Perceptual Blending of Wind-Instrument Sounds Sven-Amin Lembke, Stephen McAdams Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT), Schulich School of Music, McGill University, 555 Sherbrooke Street West, Montréal, Québec, Canada H3A 1E3. [email protected] Summary Certain combinations of musical instruments lead to perceptually more blended timbres than others. Orchestra- tion commonly seeks these combinations and can benefit from generalized acoustical descriptions of perceptu- ally relevant features that allow the prediction of blend. Previous research on correlating such instrument-specific features with the perception of blend shows an important role of spectral-envelope characteristics, leaving unan- swered, however, whether global or local characteristics are more important (e.g., spectral centroid or formant structure). This paper reports how wind instruments can be characterized through pitch-generalized spectral- envelope descriptions that exhibit their formant structure and how this is represented in an auditory model. Two experiments employing blend-production and blend-rating tasks study the perceptual relevance of formants to blend, involving dyads of a recorded instrument sound and a parametrically varied synthesized sound. Frequency relationships between formants influence blend critically, as does the degree of formant prominence. In addition, multiple linear regression relying primarily on local spectral-envelope characteristics explains 87% of the vari- ance in blend ratings. A perceptual model for the contribution of spectral characteristics to perceived blend is proposed. PACS no. 43.66.Ba, 43.66.Jh, 43.75.Cd 1. Introduction Knowledge of instrument timbre leads composers to se- lect certain instruments over others to fulfill a desired purpose in orchestrating a musical work. One such pur- pose is achieving a blended combination of instruments. The blending of instrumental timbres is thought to depend mainly on factors such as the synchrony between note onsets, partial tones aligned along the harmonic series, and specific combinations of instruments [1]. Whereas the first two factors depend on compositional decisions and their precise execution during musical performance, the third factor strongly relies on instrument-specific acous- tical characteristics. A representative characterization of these features would thus facilitate explaining and theoriz- ing perceptual eects related to blend. In agreement with past research [1, 2, 3], blend is defined as the perceptual fu- sion of concurrent sounds, with a corresponding decrease in the distinctness of individual sounds. It can involve dif- ferent practical applications, such as augmenting a domi- nant timbre by adding other subordinate timbres or creat- ing an entirely novel, emergent timbre [4]. This paper ad- dresses only the first scenario, as the latter likely involves more than two instruments. Received 15 June 2014, accepted 11 May 2015. Along a perceptual continuum, maximum blend is most likely only achieved for concurrent sounds in pitch unison or octaves. Even though other intervals may be rightly as- sumed to make two instruments more distinct, certain in- strument combinations would still exhibit higher degrees of blend than others. On the opposite extreme of this continuum, a strong distinctness of individual instruments leads to the perception of a heterogeneous, non-blended sound. Assuming auditory fusion to rely on low-level pro- cesses (related to auditory scene analysis, see [5]), in- creasingly strong and congruent perceptual cues for blend should counteract even deliberate attempts to identify in- dividual sounds. Previous research on timbre perception has shown a dominant importance of spectral properties. Timbre sim- ilarity has been linked to spectral-envelope characteristics [6]. Similarity-based behavioral groupings of stimuli re- flect a categorization into distinct spectral-envelope types [7] or the exchange of spectral envelopes between synthe- sized instruments results in an analogous inversion of po- sitions in multidimensional timbre space [8]. Furthermore, Strong & Clark [9] reported increasing confusion in in- strument identification (e.g., oboe with trumpet) whenever prominent spectral-envelope traits are disfigured, mak- ing instruments resemble each other more. With regard to blending, Kendall & Carterette [2] established a link between timbre similarity and blend, by relating closer © S. Hirzel Verlag · EAA 1039

The Role of Spectral-Envelope Characteristics in ... · Figure 1. Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels). Estimates are based

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The Role of Spectral-Envelope Characteristics in ... · Figure 1. Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels). Estimates are based

ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015) 1039 ndash 1051

DOI 103813AAA918898

The Role of Spectral-Envelope Characteristics inPerceptual Blending of Wind-Instrument Sounds

Sven-Amin Lembke Stephen McAdamsCentre for Interdisciplinary Research in Music Media and Technology (CIRMMT) Schulich School of MusicMcGill University 555 Sherbrooke Street West Montreacuteal Queacutebec Canada H3A 1E3sven-aminlembkemailmcgillca

SummaryCertain combinations of musical instruments lead to perceptually more blended timbres than others Orchestra-tion commonly seeks these combinations and can benefit from generalized acoustical descriptions of perceptu-ally relevant features that allow the prediction of blend Previous research on correlating such instrument-specificfeatures with the perception of blend shows an important role of spectral-envelope characteristics leaving unan-swered however whether global or local characteristics are more important (eg spectral centroid or formantstructure) This paper reports how wind instruments can be characterized through pitch-generalized spectral-envelope descriptions that exhibit their formant structure and how this is represented in an auditory model Twoexperiments employing blend-production and blend-rating tasks study the perceptual relevance of formants toblend involving dyads of a recorded instrument sound and a parametrically varied synthesized sound Frequencyrelationships between formants influence blend critically as does the degree of formant prominence In additionmultiple linear regression relying primarily on local spectral-envelope characteristics explains 87 of the vari-ance in blend ratings A perceptual model for the contribution of spectral characteristics to perceived blend isproposed

PACS no 4366Ba 4366Jh 4375Cd

1 Introduction

Knowledge of instrument timbre leads composers to se-lect certain instruments over others to fulfill a desiredpurpose in orchestrating a musical work One such pur-pose is achieving a blended combination of instrumentsThe blending of instrumental timbres is thought to dependmainly on factors such as the synchrony between noteonsets partial tones aligned along the harmonic seriesand specific combinations of instruments [1] Whereas thefirst two factors depend on compositional decisions andtheir precise execution during musical performance thethird factor strongly relies on instrument-specific acous-tical characteristics A representative characterization ofthese features would thus facilitate explaining and theoriz-ing perceptual effects related to blend In agreement withpast research [1 2 3] blend is defined as the perceptual fu-sion of concurrent sounds with a corresponding decreasein the distinctness of individual sounds It can involve dif-ferent practical applications such as augmenting a domi-nant timbre by adding other subordinate timbres or creat-ing an entirely novel emergent timbre [4] This paper ad-dresses only the first scenario as the latter likely involvesmore than two instruments

Received 15 June 2014accepted 11 May 2015

Along a perceptual continuum maximum blend is mostlikely only achieved for concurrent sounds in pitch unisonor octaves Even though other intervals may be rightly as-sumed to make two instruments more distinct certain in-strument combinations would still exhibit higher degreesof blend than others On the opposite extreme of thiscontinuum a strong distinctness of individual instrumentsleads to the perception of a heterogeneous non-blendedsound Assuming auditory fusion to rely on low-level pro-cesses (related to auditory scene analysis see [5]) in-creasingly strong and congruent perceptual cues for blendshould counteract even deliberate attempts to identify in-dividual sounds

Previous research on timbre perception has shown adominant importance of spectral properties Timbre sim-ilarity has been linked to spectral-envelope characteristics[6] Similarity-based behavioral groupings of stimuli re-flect a categorization into distinct spectral-envelope types[7] or the exchange of spectral envelopes between synthe-sized instruments results in an analogous inversion of po-sitions in multidimensional timbre space [8] FurthermoreStrong amp Clark [9] reported increasing confusion in in-strument identification (eg oboe with trumpet) wheneverprominent spectral-envelope traits are disfigured mak-ing instruments resemble each other more With regardto blending Kendall amp Carterette [2] established a linkbetween timbre similarity and blend by relating closer

copy S Hirzel Verlag middot EAA 1039

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

timbre-space proximity between pairs of single-instrumentsounds to higher blend ratings for the same sounds form-ing dyads lsquoDarkerrsquo timbres have been hypothesized to befavorable to blend [4 10] quantified through the globalspectral-envelope descriptor spectral centroid with lsquodarkrsquoreferring to lower centroids Strong blend was found to bebest explained by a low centroid composite ie the cen-troid sum of the sounds forming a dyad

By contrast with global descriptors attempts to ex-plain blending through local spectral-envelope character-istics focus on prominent spectral maxima also termedformants [11 12] in this context Reuter [3] reported be-havioral findings in favor of timbre blend occurring when-ever formant regions between two instruments coincideHis explanation argues that this coincidence avoids in-complete masking [13] which inversely hypothesizes thatthe non-coincidence of formant locations prevents audi-tory fusion due to incomplete mutual masking of the pre-sumedly salient formants between instruments facilitatingthe detection of their distinct identities

As prominent signifiers of spectral envelopes formantshave been employed widely to describe wind instruments[3 7 8 12 14 15 16 17 18 19] Like the formantstructure found in the human voice [11 12 20] formantsin wind instruments are located at absolute frequency re-gions which remain largely unaffected by pitch change[14 17 18] This invariance may in fact allow for the gen-eralized acoustical description for these instruments andtogether with assessing its potential constraints (eg in-strument register dynamic marking) it will be of value tomusical applications (eg [21]) Furthermore it is mean-ingful to assess how such prominent spectral features arerepresented at an intermediary stage between acousticsand perception ie at a sensorineural level simulated bycomputational models of the human auditory system Au-ditory models can account for effects related to spectralmasking ie to what neural excitation pattern a spectrumof a single or compound sound leads For instance ex-citation patterns typically involve an asymmetric upwardspread in frequency but the shape of excitation still variesboth as a function of frequency and excitation level TheAuditory Image Model (AIM) simulates different stages ofthe peripheral auditory system covering the transductionof acoustical signals into neural responses and the sub-sequent temporal integration across auditory filters yield-ing the stabilized auditory image (SAI) which providesthe closest representation relating to acoustical spectral-envelope traits for human-voice and musical-instrumentsounds [22] AIMrsquos most recent development employsdynamic compressive gammachirp (DCGC) filterbankswhich account for both frequency and level dependencyof basilar excitation by adapting filter shape accordingly[23] AIM may therefore aid in assessing the relevanceof hypotheses concerning blend as previous theories hadalso employed representations or explanations which tookspectral-masking effects into account [4 3]

This paper addresses whether pitch-invariant spectral-envelope characterization is relevant to blending Section 2

introduces the chosen approach to spectral-envelope de-scription its corresponding representation through audi-tory models and how in the perceptual investigation thespectral description is operationalized in terms of paramet-ric variations of formant frequency location Section 3 out-lines the design of two behavioral experiments that inves-tigate the relevance of local variations of formant structureto blend perception with their specific methods and find-ings presented in Sections 4 and 5 respectively Finallythe combined results from acoustical and perceptual in-vestigations are discussed in Section 6 leading to the es-tablishment of a spectral model for blend in Section 7

2 Spectral-envelope characteristics

A corpus of wind instrument recordings was used to es-tablish a generalized acoustical description for each in-strument The orchestral instrument samples were drawnfrom the Vienna Symphonic Library1 (VSL) supplied asstereo WAV files (441 kHz sampling rate 16-bit dynamicresolution) with only left-channel data considered Theinvestigated instruments comprised (French) horn bas-soon C trumpet B clarinet oboe and flute with theavailable audio samples spanning their respective pitchranges in semitone increments Because the primary fo-cus concerned spectral aspects all selected samples con-sisted of long sustained notes without vibrato As spectralenvelopes commonly exhibit significant variation acrossdynamic markings all samples included only mezzofortemarkings representing an intermediate level of instrumentdynamics

21 Spectral-envelope description

Past investigations of pitch-invariant spectral-envelopecharacteristics pursued comprehensive assessments ofspectral analyses encompassing extended pitch ranges ofinstruments [14 17 18] The spectral-envelope descrip-tion employed in this paper was based on an empirical es-timation technique relying on the initial computation ofpower-density spectra for the sustained portions of sounds(excluding onset and offset) followed by the detectionof partial tones ie their frequencies and power levelsA curve-fitting procedure employing a cubic smoothingspline (piecewise polynomial of order 3) applied to thecomposite distribution of partial tones over all pitchesyielded the spectral-envelope estimates The procedurebalanced the contrary aims of achieving a detailed splinefit and a linear regression involving iterative minimizationof deviations between the estimate and the composite dis-tribution until an optimal criterion was met These pitch-generalized spectral-envelope estimates then served as thebasis for the identification and categorization of formantsThe main formant represented the most prominent spectralmaximum with decreasing magnitude towards both lowerand higher frequencies or if not available the most promi-

1 URL httpvslcoat Last accessed April 12 2014

1040

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10

Powerlevel(dB)

horn38 pitches

spectral-env estimate

composite distribution

formant maximum

formant bounds 3 dB

formant bounds 6 dB

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10oboe

34 pitches

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10clarinet

42 pitches

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10

Powerlevel(dB)

Frequency (Hz)

bassoon41 pitches

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10

Frequency (Hz)

trumpet31 pitches

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10

Frequency (Hz)

flute39 pitches

Figure 1 Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels) Estimates are based on thecomposite distribution of partial tones compiled from the specified number of pitches across the range of each instrument

nent spectral plateau ie the point exhibiting the flat-test slope along a region of decreasing magnitude towardshigher frequencies Furthermore descriptors for the mainformant F were derived from the estimated spectral enve-lope They comprised the frequencies of the formant max-imum Fmax as well as upper and lower bounds (eg Frarr

3dBand F larr

3dB) at which the power magnitude had decreased byeither 3 dB or 6 dB relative to Fmax

The spectral-envelope estimates for all investigated in-struments generally suggested pitch-invariant trends asshown in Figure 1 A narrower spread of the partial tones(circles) around the estimate (curve) argues for a strongerpitch-invariant trend The lower-pitched instruments hornand bassoon (left panels) exhibited strong tendencies forprominent spectral-envelope traits ie formants Higher-pitched instruments yielded two different kinds of descrip-tion Oboe and trumpet (middle panels) displayed mod-erately weaker pitch-invariant trends nonetheless exhibit-ing main formants with that of the trumpet being of con-siderable frequency extent compared to more locally con-strained ones reported for the other instruments Althoughstill following an apparent pitch-invariant trend the re-maining instruments clarinet and flute (right panels) dis-played only weakly pronounced formant structure withthe identified formants more resembling local spectralplateaus Furthermore the unique acoustical trait of theclarinet concerning its low chalumeau register preventedany valid assumption of pitch invariance to be made forthe lower frequency range This register is characterizedby a marked attenuation of the lower even-order partialswhose locations accordingly varied as a function of pitchFigure 1 also displays the associated formant descriptors(vertical lines) from which it can be shown that the iden-tified main formant for the clarinet (top-right panel) waslocated above the pitch-variant low frequencies

22 Auditory-model representation

If pitch-invariant spectral-envelope characteristics are per-ceptually relevant they should also become apparent ina representation closer to perception like the output ofa computational auditory model Using the AIM SAIswere derived from the DCGC basilar-membrane modelcomprising 50 filter channels equidistantly spaced alongequivalent-rectangular-bandwidth (ERB) rate [24] andcovering the audible range up to 5 kHz2 A time-averagedSAI magnitude profile was obtained by computing the me-dians across time frames per filter channel which resem-bled the auditory excitation pattern [22]

A strong similarity among SAIs across an extendedrange of pitches was taken as an indicator for pitch-invariant tendencies Pearson correlation matrices for allpossible pitch combinations were computed comparingthe profiles of SAI magnitudes over filter channels In ad-dition this approach also aided in identifying the limitsof pitch invariance as adjacent regions exhibiting weakercorrelations delimited instrument registers where SAIsvaried as a function of pitch Three representative casesare illustrated in Figure 2 For horn (left panel) and bas-soon (not shown) broad regions of pitch-invariant SAIprofiles became apparent (dark square) spanning largeparts of their ranges up to pitches of about D4 Oboe(middle panel) and trumpet (not shown) exhibited moreconstrained and fragmented regions of high SAI similar-ity contrasted by increasingly pitch-variant SAI profilesabove A4 For these four instruments pitch-invariant char-acterization appeared to be more prevalent and stable in

2 As band-limited analysis economized computational cost and no promi-nent formants above 5 kHz were found the audio samples were sub-sampled by a factor of 4 to a sampling rate of 11025 Hz only for thepurposes of analysis with AIM

1041

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Figure 2 (Color online) SAI correlation matrices for horn oboe and clarinet (left-to-right) Correlations (Pearson r) between SAImagnitudes across 50 filter channels consider all possible pitch combinations with obtained r falling within [0 1] (see legend far-right)

lower pitch regions from which low-pitched instrumentsin particular would benefit All of these instruments lostpitch-invariant tendencies in their high registers The re-maining instruments clarinet (right panel) and flute (notshown) lacked widespread pitch-invariant SAI charac-teristics as strong patterns of correlation were only ob-tained between directly adjacent pitches (diagonal) andnot across wider pitch regions

23 Parametric variation of main-formant fre-quency

In order to study the contribution of local variations ofspectral characteristics a synthesis model was employedthat provided parametric control over separate spectral-envelope components The synthesis infrastructure reliedon a source-filter model and was realized for real-timemodification of the control parameters [25] based onwhich the spectral envelope remained invariant to pitchchanges During synthesis the filter structure was fed aharmonic source signal of variable fundamental frequencycontaining harmonics up to 5 kHz The filter structure con-sisted of two independent filters modeling the main for-mant on the one hand and the remaining spectral-enveloperegions on the other A parameter allowing the main for-mant to be shifted in frequency relative to the remain-ing regions was implemented as an absolute deviation ΔFin Hz from a predefined origin ie ΔF = 0 Analoguemodels for each instrument were designed for ΔF = 0 bymatching the frequency response of the composite filterstructure to the spectral-envelope estimates as illustratedin Figure 3 for the horn (dashed black line) superimposedover its corresponding estimate (solid grey line) The ana-logues were not meant to deliver realistic emulations ofthe instruments per se but rather to achieve a good fitbetween the main formants of the analogue and spectral-envelope estimate Limiting differences in shape betweenmain formants helped to deduce the measured perceptualdifferences that resulted from frequency relationships be-tween them It should be noted that the synthesis filterstructure for the clarinet excluded its pitch-variant lower

500 1000 1500 2000 2500 3000 3500 4000 4500 5000-50

-40

-30

-20

-10

0

Powerlevel(dB)

Frequency (Hz)

spectral-envelope estimate

formant bounds 3 dB

synthesis filter for ΔF= 0

synthesis filters for ΔFne 0

-II -I 0 +I+II

Figure 3 Spectral-envelope estimate of horn and filter magni-tude responses of its synthesis analogue The original analogueis modeled for ΔF = 0 the other responses are variations ofΔF The top axis displays the equivalent scale for the five ΔF levelsinvestigated in Experiment B

frequency region (see Section 21) It only modeled theformant above that region as well as the remaining spec-tral envelope towards higher frequencies in order to orientthe investigation toward specifically testing the relevanceof the identified albeit less pronounced formant

3 General methods

The perceptual relevance of main-formant frequency toblending was tested for sound dyads All dyads compriseda sampled instrument and its synthesized analogue modelIn a given dyad the instrument sample was constant andits synthesized analogue was variable with respect to theparameter ΔF Variations with ΔF gt 0 shifted the mainformant of the synthesized sound higher in frequency rel-ative to the sampled instrumentrsquos main formant and ac-cordingly ΔF lt 0 corresponded to shifts toward lowerfrequencies Two perceptual experiments were conductedto investigate how ΔF variations relate to blend In Ex-periment A participants controlled ΔF directly and were

1042

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

asked to find the ΔF that gave optimal blend whereasin Experiment B listeners provided direct blend ratingsfor predefined ΔF variations Using the instruments pre-sented in Section 2 the robustness of perceptual effectswas assessed over the two experimental tasks for differ-ent pitches unison and non-unison intervals and stimuluscontexts Given six instruments and various pitches andintervals it was impractical to test all possible combina-tions An exploratory approach was chosen instead withnot all instruments being tested across all factors Whereasthis could reveal blend-related dependencies concerningΔF across the factors of interest it did not allow gener-alizing across all instruments to the same degree as well asdetermining perceptual thresholds Still each factor wasstudied with at least three instruments for greater gener-alizability Pitches were chosen to represent common reg-isters of the individual instruments Non-unison intervalsincluded both smaller and larger intervals

The methods both experiments share in common arepresented in this section before addressing their specificsand results in the following sections

31 ParticipantsDue to the demanding experimental tasks participantsof both experiments were musically experienced listen-ers They were recruited primarily from the SchulichSchool of Music McGill University Their backgroundswere assessed through self-reported degree of formalmusical training accumulated across several disciplineseg instrumental performance composition music the-ory andor sound recording All participants passed a stan-dardized hearing test [26 27] No participant took part inboth experiments

32 StimuliAll stimuli involved dyads comprising one sampled(drawn from VSL) and one synthesized sound For a sam-ple at any given pitch the spectral envelope was ap-proximated by the pitch-generalized description from Sec-tion 21 which resulted in the main formants of sampledand synthesized sounds resembling each other for ΔF = 0With regard to the temporal envelope both instrumentswere synchronized in their note onsets followed by thesustain portion and ending with an artificial 100-ms linearamplitude decay ramp applied to both instrument soundsThe sampled sound retained its original onset character-istics whereas across all modeled analogues the synthe-sized onsets were characterized by a constant 100-ms lin-ear amplitude ramp Stimuli were presented over a stan-dard two-channel stereophonic loudspeaker setup insidean Industrial Acoustics Company double-walled soundbooth with the instruments simulated as being capturedby a stereo main microphone at spatially distinct locationsinside a mid-sized moderately reverberant room (see [25]for details)

33 ProcedureExperimental conditions were presented in randomized or-der within blocks of repetitions A specific condition could

not occur twice in succession between blocks The mainexperiments were in each case preceded by 10 practice tri-als under the guidance of the experimenter to familiar-ize participants with the task and with representative ex-amples of stimulus variations Dyads were played repeat-edly throughout experimental trials allowing participantsto pause playback at any time

34 Data analysis

With respect to investigated factors Experiment A evalu-ated the influence of the factors instrument register and in-terval type Experiment B assessed pitch-invariant percep-tual performance across a number of factors and further-more correlated the perceptual data with spectral-envelopetraits Separate analyses of variance (ANOVAs) were con-ducted for each instrument testing for statistically signifi-cant main effects within factors and interaction effects be-tween them A criterion significance level α = 05 waschosen and if multiple analyses on split factor levels orindividual post-hoc analyses were conducted Bonferronicorrections were applied In repeated-measures ANOVAsthe Greenhouse-Geisser correction (ε) was applied when-ever the assumption of sphericity was violated In additionExperiment A also considered one-sample t-tests againsta mean of zero for testing differences to ΔF = 0 Statis-tical effect sizes η2

p and r are reported for ANOVAs andt-tests respectively The analyses considered participant-based averages for trial repetitions of identical conditions

4 Experiment A

41 Method

411 Participants

The experiment was conducted with 17 participants 6 fe-male and 11 male with a median age of 27 years (range20-57) Fifteen participants reported more than 10 yearsof formal musical training with 10 indicating experi-ence with wind instruments Participants were remuner-ated with 15 Canadian dollars

412 Stimuli

Table I lists the 17 investigated dyad conditions (columnentries of bottom row) All instruments included unisonintervals (0 semitones ST) With regard to additional fac-tors three levels of the Interval factor compared unisonintervals to consonant (7 or minus3 ST) and dissonant (6 orminus2 ST) non-unison intervals Two levels of the Registerfactor contrasted low (A2 C4 or E3) to high (D5 or B5) in-strument registers for unison dyads with the high-registerpitches being derived from the pitch-variant regions iden-tified in Section 22 The sampled sound remained at theindicated reference pitch whereas the synthesized soundvaried relative to it to form the non-unison intervals Alldyads had constant durations of 4900 ms The level bal-ance between instruments was variable and determined bythe participant

1043

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table I Seventeen dyad conditions from Experiment A across instruments pitches and intervals (top-to-bottom) Intervals in semitonesrelative to the specified reference pitch

horn bassoon oboe trumpet clarinet fluteC3 A2 D5 C4 C4 B5 E3 D5 C4

0 6 7 0 -2 -3 0 0 0 6 7 0 0 -2 -3 0 0

ΔF

-II -I +II+I0

A

B

Γ= -100 Hz fslider

high

low

Figure 4 ΔF variations investigated in Experiments A and B AParticipants controlled fslider which provided a constant range of700 Hz (white arrows) Γ (eg minus100 Hz) represented a random-ized roving parameter preventing the range from always beingcentered on ΔF = 0 B Participants rated four dyads varyingin ΔF drawn from the low or high context The contexts repre-sented subsets of four of the total of five predefined ΔF levels

413 Procedure

A production task required participants to adjust ΔF di-rectly in order to achieve the maximum attainable blendwith the produced value serving as the dependent vari-able User control was provided via a two-dimensionalgraphical interface including controls for ΔF and thelevel balance between instruments The slider controls forΔF= fslider + Γ provided a constant range of 700 Hzwith fslider isin [minus350+350] and including a randomizedroving offset Γ isin [minus100+100] between trials As visu-alized in Figure 4 (top) minimal or maximal Γ limitedthe range covered by all trials to 500 Hz (solid thick greyline) with all possible ΔF deviations spanning a range of900 Hz (dashed thick line) Participants completed a totalof 88 experimental trials (22 conditions3 times 4 repetitions)taking about 50 minutes and including a 5-minute breakafter about 44 trials

42 Results

421 General trends

For all six instruments participants associated optimalblend with deviations ΔF le 0 Figure 6 (diamonds inlower part) illustrates the means for optimal ΔF fromwhich two different patterns become apparent among in-struments ΔF are displayed relative to a scale of equiv-alent variations tested in Experiment B with the scalevalue 0 corresponding to ΔF = 04 The grey lines indicate

3 Only 17 conditions investigated ΔF the remainder studied other for-mant properties that lie outside the focus of this article For instancerelative main-formant magnitude variations ΔL were also investigatedwhich will be addressed in a future publication4 ΔF were linearly interpolated to a scale of equi-distant levels eg minusI 0 and +I corresponding to the numerical scale values -1 0 and 1 re-spectively

each instrumentrsquos respective slider range For the instru-ments horn bassoon oboe and trumpet (left panel) opti-mal blend was found in direct proximity to ΔF = 0 Forthe unison intervals of horn and bassoon ΔF did not differsignificantly from zero ΔF for the other two instrumentswere located slightly lower [t(16) le minus56 p lt 0001rge 82] By contrast optimal ΔF for the clarinet and flute(right panel) were relatively distant from ΔF = 0 in linewith significant underestimations [t(16)leminus38 p le 0015rge 69] In summary ΔF values leading to optimal blendwere limited to cases in which the formant of the synthe-sized instrument was at or below that of the sampled in-strument

422 Instrument register and interval type

The influence of instrument register on the optimal ΔFwas investigated for trumpet bassoon and clarinet atpitches corresponding to instrument-specific low and highregisters One-way repeated-measures ANOVAs for Reg-ister yielded moderately strong main effects for trumpetand bassoon [F (1 16) ge 192 p le 0005 η2

p ge 55] dueto an increase of the optimal ΔF in the high registerA less pronounced effect was obtained for the clarinet[F (1 16)=53p= 0358η2

p = 25]The investigation of possible differences in optimal ΔF

between interval types involved comparisons between uni-son and non-unison intervals as well as a distinction be-tween consonant and dissonant for the latter One-wayrepeated-measures ANOVAs on Interval conducted forbassoon clarinet and trumpet only led to a weak maineffect for the trumpet [F (2 32)= 37p= 0347η2

p = 19]Post-hoc tests for the three possible comparisons yieldeda single significant difference between the interval sizes 0and 6 ST [t(16) = minus35 p = 0033 r = 65]

5 Experiment B

51 Method

511 Participants

The experiment was completed by 20 participants 9 fe-male and 11 male with a median age of 22 years (range18ndash35) Fifteen participants reported more than 10 yearsof formal musical training with 11 indicating experi-ence with wind instruments Participants were remuner-ated with 20 Canadian dollars

512 Stimuli

Table II lists the 22 investigated dyad conditions The In-terval factor investigated two levels comparing unison to

1044

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table II Twenty-two dyad conditions from Experiment B across instruments pitches and intervals (top-to-bottom) Intervals in semi-tones relative to the specified reference pitch

horn bassoon oboe trumpet clarinet fluteC3 B 3 A2 D4 C4 G 4 E5 C4 B 4 E3 A4 C4 G 4 E5

0 6 0 6 0 -2 0 -2 0 0 0 0 6 0 6 0 -2 0 -2 0 0 0

non-unison (6 or -2 ST dissonant) intervals Depending onthe instrument the Pitch factor involved two (horn bas-soon trumpet clarinet) or three (oboe flute) levels In thecase of horn bassoon trumpet and clarinet there weretwo levels of Interval for each level of Pitch In additionthis experiment included two factors that were related toΔF variations alone which applied to all conditions listedin Table II The first was synonymous with ΔF as it ex-plored a total of five ΔF levels including ΔF = 0 and twosets of predefined moderate and extreme deviations aboveand below it ie the ΔF levels hereafter labeled 0 plusmnI and plusmnII The second factor grouped the five levels contex-tually into two subsets of four which are denoted as lowand high contexts and defined in Figure 4 (bottom)

Employing the formant descriptors from Section 21the investigated ΔF levels were expressed on a commonscale of spectral-envelope description which provided abetter basis of comparison than taking equal frequencydifferences in Hz as the frequency extent of formantsacross instruments varied considerably Figure 5 providesexamples for all resulting ΔF levels of the horn The fourlevels ΔF = 0 were defined as frequency distances be-tween the formant maximum Fmax and measures relatedto the location and width of its bounds (eg Frarr

6dB andΔF3dB respectively) For example the positive deviationΔF (+I) was the distance between the formant maximumand its upper bound minus 10 of the width betweenthe 3 dB bounds If spectral-envelope descriptions lackedlower bounds (eg trumpet clarinet flute) the frequencylocated below Fmax that exhibited the lowest magnitudewas taken as a substitute value

Unlike the dyads in Experiment A the synthesizedsound always remained at the reference pitch whereas thesampled sound varied its pitch for non-unison intervalsbecause this tested the assumption of pitch-invariant de-scription for the recorded sounds more thoroughly Thedyads had a constant duration of 4700 ms In additionthe conditions listed in Table II including the associatedfive ΔF levels per condition had predetermined values forthe level balance between sounds and had also been equal-ized for loudness The first author determined the level bal-ance aiming for good balance between both sounds whilemaintaining discriminability between ΔF levels whichwas subsequently verified by the second author Loudnessequalization was conducted subjectively in a separate pi-lot experiment anchored to a global reference dyad for allconditions and ΔF levels For all stimuli gain levels weredetermined that equalized stimulus loudness to the globalreference These gain levels were based on median val-ues from at least five participants which were determined

200 400 600 800 1000 1200-12

-10

-8

-6

-4

-2

0

Powerlevel(dB)

Frequency (Hz)

ΔF3dB

F3dB

larrF3dB

rarrFmax

F6dB

larrF6dB

rarr

-II -I 0 +I +II

Figure 5 ΔF levels from Experiment B defined relative toa spectral-envelope estimatersquos formant maximum and boundsΔF (plusmnI) fall 10 inside of ΔF3dBrsquos extent ΔF (+II) aligns withFrarr

6dB whereas ΔF (minusII) aligns with either 80 middot F larr6dB or 150 Hz

whichever is closer to Fmax

either after the corresponding interquartile ranges first fellbelow 4 dB or after running a maximum of 10 participants

513 ProcedureA relative-rating task required participants to compare ΔFlevels for a given condition from Table II In each ex-perimental trial participants were presented four dyadsand asked to provide four corresponding ratings The fourdyads represented one of the twoΔF contexts labeled highand low in Figure 4 A continuous rating scale was em-ployed which spanned from most blended to least blended(values 1 to 0 respectively) and served as the dependentvariable Participants needed to assign two dyads to thescale extremes (eg most and least) the remaining twodyads were positioned along the scale continuum relativeto the chosen extremes Playback could be switched freelybetween the four dyads with the visual order of the se-lection buttons and rating scales for individual dyads ran-domized between trials Participants completed 120 trials(30 conditions5 times 2 contextstimes 2 repetitions) taking about75 minutes including two 5-minute breaks after about 40and 80 trials

52 Results

521 General trendsGroup medians of ratings aggregated across the factorsPitch Interval and Context illustrate how blend varies as a

5 Only the 22 conditions investigating ΔF are reported here

1045

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

function of ΔF Figure 6 suggests that participants mainlyassociated higher degrees of blend with the levels ΔF le 0whereas much lower ratings were obtained for ΔF gt 0 Interms of higher degrees of blend two typical rating pro-files as a function of ΔF emerged (shown as the ideal-ized dashed-and-dotted curves in Figure 6) 1) For the in-struments horn bassoon oboe and trumpet (left panel)medium to high blend ratings were obtained at and be-low ΔF = 0 above which ratings decreased markedlyresembling the profile of a plateau 2) The instrumentsclarinet and flute (right panel) exhibited a monotonicallydecreasing and approximately linear rating profile as ΔFincreases in which ΔF = 0 did not appear to assume a no-table role These differences in plateau vs linear profilesfor the two instrument subsets are analyzed more closelyin the following sections also taking into account potentialeffects due to the other factors

522 Blend and pitch invariance

Spectral characteristics that remain stable with pitch varia-tion such as formants may have a pitch-invariant percep-tual relevance To test this whenever the profiles of blendratings over ΔF remained largely unaffected by differentpitches intervals and ΔF contexts the perceptual resultswere assumed to be pitch-invariant For instance Figure 7suggests this tendency for the horn in which the plateauprofile was maintained over all factorial manipulations Asa first step the main effects across ΔF were tested to con-firm that ratings served as reliable indicators of perceptualdifferences Given these main effects perceptual robust-ness to pitch variation was fulfilled if no ΔF times Pitch orΔF times Interval interaction effects were found across bothΔF contexts An absence of main effects betweenΔF con-texts would indicate further perceptual robustness

As the Context factor only involved ΔF levels com-mon to both the high and low contexts namely 0 andplusmnI (see Figure 4 bottom) the ratings for these levels re-quired range normalization and separate analyses from theremaining factors For the instruments involving the In-terval factor these were conducted on split levels of thatfactor Furthermore the experimental task imposed the us-age of the rating-scale extremes which resulted in severalviolations of normality due to skewed distributions for thedyads selected as extremes As a result all main and in-teraction effects were tested with a battery of five indepen-dent repeated-measures ANOVAs on the raw and trans-formed ratings The data transformations included non-parametric approaches of rank transformation [28] andprior alignment of lsquonuisancersquo factors [29]6 The statistics

6 Given the unavailability of non-parametric alternatives for repeated-measures three-way ANOVAs that include tests for interaction effectsan approach was chosen that assesses tests over multiple variants ofdependent-variable transformations presuming that the most conserva-tive test in the ANOVA battery minimizes accepting false positives Ranktransformation is a common approach in non-parametric tests such as theone-way Friedman test [28] Issues with tests for interaction effects los-ing power in the presence of strong main effects were addressed throughlsquoalignmentrsquo of the raw data prior to rank transformation [29] For in-stance a test for the interaction AtimesB would align its lsquonuisancersquo factors

-II -I 0 +I +II

0

1

ΔF

Blendrating

Produced optimal blend

-II -I 0 +I +IIΔF

Produced optimal blend

horn

bassoon

oboe

trumpet

profile

clarinet

flute

profile

Figure 6 Perceptual results for the different instruments groupedaccording to two typical response patterns (left and right panels)Experiment A (diamonds bottom) mean ΔF for produced opti-mal blend transformed to a continuous scale of ΔF levels Thegrey lines indicate slider ranges (compare to Figure 4 top) Ex-periment B (curves) median blend ratings across ΔF levels andidealized profile

-II -I 0 +I

0

1

context lowinterval unison

Blendrating

-I 0 +I +II

0

1

context highinterval unison

pitch 1

pitch 2

-II -I 0 +I

0

1

context lowinterval non-unison

Blendrating

ΔF-I 0 +I +II

0

1

context high

interval non-unison

ΔF

Figure 7 Medians and interquartile ranges of blend ratingsfor horn across ΔF levels and the factorial manipulationsPitchtimesContexttimesInterval

for the most liberal and conservative p-values are reported(eg conserv|liberal) with the conservative finding beingassumed valid if statistical significance is in doubt

Strong main effects were found for all instrumentswhich indicated clear differences in perceived blendamong the investigated ΔF levels Table III lists ANOVAstatistics for the range between strongest (clarinet) andweakest (bassoon) main effects among the instrumentswhich reflects analogous differences in the utilized rating-scale ranges in Figure 6 Furthermore the rating profiles ofthe instruments horn bassoon oboe and trumpet fulfilledthe criteria for pitch-invariant robustness as they appeared

by removing the main effects for A and B The four data transformationsprocessed the raw data with or without alignment and for two rankingmethods The first method computed global ranks across the entire dataset ie across participants and conditions whereas the second methodevaluated within-participant ranks across conditions per participant

1046

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table III Range of strongest to weakest ANOVA main effects along ΔF across all six instruments

low context high contextEffect Stat conserv liberal conserv liberal

F 860 826 1651 1651Clarinet df 16307 21391 357 357(strongest) ε 54 69 - -

p lt0001 lt0001 lt0001 lt0001η2p 82 81 90 90

F 164 168 126 152Bassoon df 357 357 14271 357(weakest) ε - - 48 -

p lt0001 lt0001 0005 0001η2p 46 47 40 44

unaffected by pitch-related variation There was only oneexception from a complete absence of effects interactingwith ΔF a moderate main effect for Context for trum-pet was found only at non-unison intervals [F (1 19) =1048|2504 p = 0043|0001 η2

p = 355|569] By con-trast the rating profiles for clarinet and flute did not exhibitpitch invariance as they clearly violated the criteria acrossboth ΔF contexts The interaction effects with ΔF and amain effect for Context leading to their disqualification aredescribed in Table IV

The instruments displaying pitch invariance were thesame ones with plateau rating profiles possibly attributinga special role to ΔF = 0 as defining a boundary governingblend To further support this assumption by joint analysisof the four instruments two hierarchical cluster analyseswere employed that grouped ΔF levels based on their sim-ilarity in perceptual ratings or auditory-model representa-tions The first cluster analysis reinterpreted rating differ-ences between ΔF as a dissimilarity measure This mea-sure considered effect sizes of statistically significant non-parametric post-hoc analyses (Wilcoxon signed-rank test)for pairwise comparisons between ΔF levels ie greaterstatistical effects between two ΔF levels were expressedas being more dissimilar in the perceived degree of blendFor non-significant differences dissimilarity was assumedto be zero The second analysis relied on correlation coef-ficients (Pearson r) between dyad SAI profiles across ΔFlevels (see Figure 9 for examples) The dissimilarity mea-sure considered the complement value 1minus r and as all cor-relations fall within the range [0 1] no special treatmentfor negative correlations was required Both cluster anal-yses employed complete-linkage algorithms The dissimi-larity input matrices were obtained by averaging 30 inde-pendent data sets aggregated across the four instrumentsand the factors Context Pitch and Interval As shown inFigure 8 both analyses led to analogous solutions in whichthe two levels ΔF gt 0 are maximally dissimilar to acompact cluster associating the three levels ΔF le 0 Inother words ΔF associated with low and high degrees ofblend group into two distinct clusters clearly relating tothe plateau profile where ΔF = 0 defines the boundary tohigher degrees of blend

Table IV ANOVA effects for clarinet and flute suggesting an ab-sence of pitch invariance lowast The column header low context doesnot apply in this case

Clarinet low context high contextEffect Stat conserv liberal conserv liberal

F 29 34 38 44ΔF times df 357 357 20387 357Interval ε - - 68 -

p 044 024 030 008η2p 13 15 17 19

Flute low context high contextEffect Stat conserv liberal conserv liberal

F 28 43 44 72ΔF times df 39750 6114 6114 6114Pitch ε 66 - - -

p 031 0006 0005 lt0001η2p 13 18 19 28

F 49 157 - -df 119 119 - -

Contextlowast ε - - - -p 039 0008 - -η2p 21 45 - -

-II -I 0 +I +II

01

02

03

04

05

Dissimilarity

ΔF-II -I 0 +I +II

01

02

ΔF

Figure 8 Dendrograms of ΔF -level groupings for the pitch-invariant instruments Dissimilarity measures are derived fromperceptual ratings (left) and auditory-modelled dyad SAI profiles(right)

523 Blend and its spectral correlatesExplaining blending between instruments with the helpof spectral-envelope characteristics could eventually al-

1047

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table V Variables entering stepwise-regression algorithm to ob-tain models reported in Table VI a Not for pitch-variant subsetinter-variable correlation |r| gt 7 b Computed as described in[30] c Accounting for pitch d Accounting for perceivable beat-ing between partial tones

No Variable Description

1 ΔLrarr3dB derivate of Frarr

3dB Equation (1)2 ΔLF1vsF2 ΔL betw formants F1 amp F2

3 ΔSslopeF1ab spectral slope above Frarr

3dB4 ΔSslope

b global spectral slope5 |ΔScentroid

b absolute centroid difference6 Scentroid

b centroid composite7 ERBrate

c reference pitch in Table II8 I(non)unison interval category (binary)9 IST

a interval size in semitones10 Clohi ΔF context (binary)11 mixratio balance betw instruments12 AMdepth

bd amplitude modulation depth

low the prediction of blend through these instrument-specific traits In addition it could help understand the wayin which spectral characteristics contribute perceptuallyGiven this aim multiple linear regression was employed tomodel the median blend ratings through a number of vari-ables or regressors The regression models assessed therelative contributions of regressors describing both globaland local spectral-envelope traits Global descriptors in-volved the commonly reported spectral centroid and spec-tral slope [30] whereas local descriptors concerned theformant characterization discussed in Section 21 Becausea dyad yielded two descriptor values across its constituentsounds the regressor measure had to associate the two insome way For the spectral centroid two measures wereconsidered namely composite (sum) and absolute differ-ence [4] Although the centroid composite relates to thelsquodarkerrsquo-timbre hypothesis mentioned in the Introductionthe centroid difference had still been found to best explainblend in non-unison intervals [4] which left some uncer-tainty as to which of these two measures was more appro-priate in explaining blend in general The remaining spec-tral regressors were implemented as polarity-preservingdifferences between descriptors with the sampled instru-ment serving as the reference (ref) and the synthesized in-strument being variable (var) across ΔF For example thedifference of descriptor values dx for instrument x wouldcorrespond to Δd = dref minus dvar

Regression models for two separate subsets of the per-ceptual data were explored pitch-invariant (horn bassoonoboe trumpet) and pitch-variant instruments (clarinetflute) The datasets comprised all conditions tested acrossfactors and instruments with N = 118 and N = 54 forthe pitch-invariant and -variant subsets respectively Re-gressor variables were pooled from the spectral-envelopedescriptors and additional variables that were included toaccount for potential confounding factors eg pitch in-terval If these factors did not contribute as regressorsthis would further support a perceptual relevance of pitch

invariance The initial pool of regressors comprised 32variables subsequently reduced to a pre-selected set thatexhibited inter-variable correlations |r| lt 7 in order toavoid pronounced collinearity among regressors The pre-selection was determined by first identifying the vari-able that in simple linear regression exhibited the high-est R2 and subsequently adding all remaining variablesthat yielded permissible inter-variable correlations Ta-ble V lists the pre-selected variables entered into the re-gression which comprised spectral-envelope descriptors(nos 1-6) and variables representing other potential fac-tors of influence (nos 7-12) Stepwise multiple-regressionalgorithms with both forward-selection and backward-elimination schemes were considered which converged onoptimum models by iteratively adding or eliminating re-gressors respectively Models with similar combinationsof regressors to the optimum models were explored aswell In anticipation of reporting the results the inclusionof a binary regressor for ΔF context Clohi benefited all in-vestigated regression models as it corrected the systematicoffset of scaled ratings between the low and high contexts(see Figure 7)

In simple linear regression the strongest spectral-enve-lope descriptors all concerned local formant characteriza-tion and did not involve the global descriptors Among theformant descriptors the highest correlations were obtainedfor the main-formant upper bound Frarr

3dB applied to bothpitch-invariant [R2(118) = 656 p lt 0001] and pitch-variant subsets [R2(54) = 713 p lt 0001] Notably theformant maximum Fmax did not perform better than Frarr

3dBlikely due to differing skewness properties between instru-ment formants (see Figure 1) At the same time the utilityof Frarr

3dB implies that it could assume an important role inexplaining blend as perhaps the perceptually most salientfeature of formants It performed slightly better for thepitch-variant than for the pitch-invariant subset becauseΔF rarr

3dB essentially follows a strictly monotonic functionacross the investigated ΔF levels which apparently mod-els the linear blend profile better To improve the model-ing for the plateau blend profile derivate descriptors ofFrarr

3dB were explored The most effective derivate ΔLrarr3dB re-

lated the upper-bound frequencies Frarr3dB of the two instru-

ments to a corresponding change in level across the ref-erence instrumentrsquos spectral-envelope magnitude functionLref (f ) in dB as formalized in Equation 1 In other wordsthis measure evaluated magnitude differences relative tothe reference instrument at frequencies appearing to be ofparticular perceptual relevance to both instruments whichtherefore may relate to spectral-masking effects (eg in-complete masking [13])

ΔLrarr3dB = Lref F rarr

3dB|ref minus Lref F rarr3dB|var (1)

The obtained solutions from stepwise multiple regressionyielded identical models for both instrument subsets in-volving the regressors ΔLrarr

3dB absolute spectral-centroiddifference |ΔScentroid| and context Clohi A slight gain inperformance was achieved by substituting the |ΔScentroid|based on audio signals from individual sounds with a vari-

1048

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table VI Multiple-regression models best predicting timbre-blend ratings for two instrument subsets

Pitch-invariant subset R2adj=87

F (3 116)=2724 p lt0001

Regressors βstd t p

ΔLrarr3dB 100 286 lt0001

|ΔScentroid| 26 77 lt0001Clohi 27 79 lt0001

Pitch-variant subset R2adj=88

F (3 52)=1342 p lt0001

Regressors βstd t p

ΔLrarr3dB 103 200 lt0001

|ΔScentroid| 16 33 0018Clohi 34 68 lt0001

ant computed on the pitch-generalized spectral-envelopeestimates Table VI displays these optimized regressionmodels for pitch-invariant and pitch-variant subsets bothleading to about 87 explained variance (adjusted R2)The standardized regression-slope coefficients βstd indi-cate the relative contribution of regressors with the rel-ative weights being very similar for both subsets In thesemodels ΔLrarr

3dB acted as the strongest predictor for theblend ratings contributing about five times more than|ΔScentroid| which furthermore did not perform better thanClohi These findings clearly argue for local spectral-envelope descriptors to be more meaningful than globalones in explaining blending in the investigated dyadsMoreover the remaining global descriptor spectral slopeappeared to play no role Furthermore finding both in-strument subsets to be modeled equally well through thesame spectral-envelope descriptors points to a general util-ity of pitch-generalized descriptions for all instrumentsDespite the findings in Section 522 arguing against pitch-invariant perceptual robustness for clarinet and flute theobtained regression models excluded the Pitch and Intervalvariables thus appearing to be less relevant to explainingthe blend ratings

6 General discussion

Orchestrators would benefit from acoustical descriptionsof instruments that correspond to the perceptual processesinvolved in achieving blended timbres Section 2 sug-gests that common orchestral wind instruments are rea-sonably well described through pitch-generalized spectral-envelope estimates which furthermore show the instru-ments horn bassoon oboe and trumpet to be charac-terized by prominent formant structure Auditory mod-els employing stabilized auditory images (SAI) confirmthat for strong formant characterization and for lower tomiddle pitch ranges the pitch-invariant characterization isstable In higher instrument registers however SAI pro-files indicate limitations to pitch-invariant characteriza-tion Other instruments like clarinet and flute yield SAI

profiles clearly varying as a function of pitch implyingthat this pitch dependency may also extend to perception

The perceptual investigation in Sections 3 to 5 con-firms the acoustical implications showing that strong for-mant characterization results in main formants becomingperceptually relevant to blending Given a dyad in whicha putative main formant is variable in frequency relativeto a fixed reference formant the investigated instrumentsdisplay two archetypical profiles based on their formantprominence For the pitch-variant clarinet and flute blendincreases as a monotonic quasi-linear function if the vari-able formant moves from above to below the referenceFor pitch-invariant instruments the frequency alignmentbetween the variable formant and the reference (ΔF= 0)functions as a boundary delimiting a region of higher de-grees of blend at and below the reference and contrastedby a marked decrease in blend when the variable formantexceeds it which overall resembles a plateau profile Thepitch invariance even extends to different interval typesas the plateau profile remains unaffected in non-unisonintervals regardless of their degree of consonance How-ever the findings suggest that the perceptual relevance ofspectral-envelope estimates diminishes in higher instru-ment registers The limited sampling of conditions acrossthe investigated factors prevented a more comprehensiveevaluation of all instruments Whereas the findings allowgeneral relationships for formant frequency to be deducedmore comprehensive investigations are needed to attainmore generalizable quantification of absolute frequencyranges and thresholds

In correlating acoustical and perceptual factors spec-tral-envelope characteristics alone explain up to 87 ofthe variance in blend ratings In addition local spectraltraits seem to be more powerful acoustical predictors ofblend than global traits The formant descriptor for theupper formant bound Frarr

3dB when expressed as a derivatedescriptor ΔLrarr

3dB acts as the strongest predictor for theblend ratings regardless of whether instruments belongto the pitch-invariant group or not With regard to clar-inet and flute the departures from pitch invariance foundin Section 522 contradict the utility of pitch-generalizedspectral-envelope description in predicting blend ratingsas reported in Section 523 Taking both findings into ac-count this for one argues that the descriptor Frarr

3dB stillsucceeds in explaining blend well even for clarinet andflute On the other hand the same instruments displaya greater perceptual sensitivity to the Pitch and Intervalfactors likely associated with their less pronounced for-mant structure Overall strong formant prominence leadsto more drastic changes in blend

The prediction of blend using ΔLrarr3dB still presumes

that one of the instruments serves as a reference formantas the employed difference descriptors are anchored tothe sampled instrument The dependence on a referenceleaves some ambiguity because an arbitrary combinationof two instruments would lead to contradictory predictionsof blend if both instruments were given equal importancein serving as the reference Given the context of both ex-

1049

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

periments it can be assumed that the sampled instrumentacting as a constant anchor had been biased into servingas the reference by combining it with a variable synthe-sized instrument In addition a possible perceptual expla-nation could concern audio samples of instruments playingwithout vibrato generally still exhibiting coherent micro-modulations of partial tones These modulations have beenshown to contribute to a stronger and more stable audi-tory image [31] and may thus bias the more stable imagetoward acting as the reference especially as the synthe-sized partials remain static over time Even in the contextof blending in musical performance one instrument as-sumes the role of the leading voice in which it possiblyserves as the reference whereas an accompanying instru-ment avoids exceeding the lead instrumentrsquos main-formantfrequency Likewise returning to the notion of blend lead-ing to augmented timbres [4] the dominant timbre to beembellished by another would seem predestined to func-tion as the reference based on which the less dominanttimbre should not exhibit formant frequencies exceedingthose of the reference

Finally the results allow a reassessment of previousexplanations for blend The lsquodarkerrsquo-timbre hypothesis[4] is directly reflected in the obtained linear blend pro-file in which lower ΔF increases blend and at the sametime causes a decrease in the spectral-centroid compos-ite By contrast this hypothesis is not well explained bythe plateau profile as blend ratings remain similarly highfor ΔF le 0 The alternative hypothesis of coincidence offormant regions [3] would have predicted stronger blendratings for ΔF = 0 than for all other levels which in theperceptual results is only achieved for the levels ΔF gt 0While the hypothesis only achieves partial fulfillment withrespect to spectral variations ΔF it finds more agreementin the corresponding SAI representations As shown in twoexample cases for horn in Figure 9 the dyad SAI pro-files for the levels ΔF gt 0 are distinguishable from theremaining levels through clear deviations between 1 and2 kHz7 and located just above the hornrsquos estimated Frarr

3dBRemarkably the formant shifts related to ΔFlt 0 (Fig-ure 3) are not reflected in the corresponding dyad SAIprofiles (Figure 9) which instead exhibit direct align-ment below 500 Hz for all three levels ΔF le 0 There-fore only ΔF gt 0 seems to lead to incomplete masking[13] revealing the presence of the synthesized instrumentwhereas the spectral-envelope variations ΔF lt 0 evokelittle change compared to the dyad excitation pattern forΔF = 0

Of still greater importance the auditory system as mod-eled by AIM using the DCGC seemingly involves a high-

7 Concerning the output from the AIM a misalignment between actualsinusoidal frequencies and the corresponding SAI peaks was observedThrough personal communication with the developer of the utilized AIMimplementation this was explained as being an inherent property of thedynamic-compression filters A correction function was derived by com-puting SAIs for various sinusoidal frequencies and fitting the two fre-quency scales yielding the linear function fSAI = 117 middot f + 282 Hz[r2(50)= 999 p lt 0001] In Figure 9 the correction manifests itself inthe compressed frequency extent for the SAIs

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

001

002

SAImagnitude

0

-II

-I

+I

+II

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

002

004

006

008

Frequency (Hz)

Figure 9 SAI profiles of dyads for allΔF levels (Experiment B)depicting two experimental conditions for horn Top pitch 1 uni-son bottom pitch 2 non-unison the grid lines correspond topartial-tone locations

pass characteristic that attenuates spectral-envelope re-gions below 500 Hz affecting the perceived magnitudesof the respective partial tones (grid lines) This impliesthat in the region below 500 Hz frequency deviations be-tween main formants no longer affect the achieved degreeof blend as reflected both in Figure 9 and the perceptualfindings Horn and bassoon would especially benefit fromthis as their main formants are centered around 500 HzOboe and trumpet both exhibiting higher Frarr

3dB can be as-sumed to benefit to a lesser degree In summary main for-mants located in proximity to 500 Hz will benefit more ontop of which lower pitches would also increase the num-ber of partial tones falling into this favorable frequencyregion This reflects tendencies for pitch-invariant traits inSAI correlations (see Section 22) to be more pronouncedat lower pitch ranges and for instruments of lower registerwhich would lend support to the lsquodarker-timbrersquo hypothe-sis in terms of limiting the spectral centroid in frequency

7 Conclusion

Evidence from acoustical and psychoacoustical descrip-tions of wind instruments and from perceptual validationshows that relative location and prominence of main for-mants affect the perception of timbre blend critically Fur-thermore these pitch-invariant spectral characteristics ex-plain and predict the perception of blend to a promis-ingly high degree Remaining discrepancies between theacoustic and perceptual domains can be explained throughapparent constraints of the simulated auditory systemIn conclusion a perceptual model for the contributionof local spectral-envelope characteristics to blending isproposed keeping in mind that it would serve as aninstrument-specific component in a more complex gen-eral perceptual model involving compositional and per-formance factors as initially discussed in Section 1 Themain factors influencing the perception of spectral blendare summarized1 Frequency relationships between upper bounds of main

formants are critical to blend Among several instru-

1050

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

ments one is expected to serve as the reference (eglead instrument dominant timbre) above which thepresence of other instrumentsrsquo formants would stronglyresult in decreased blend

2 Prominence of the main formants governs whetherthese relationships lead to plateau or linear blend pro-files and in the first case pitch-invariant perceptual ro-bustness extends to non-unison intervals

3 Spectral-envelope relationships below 500 Hz may benegligible due to constraints of the auditory system Atthe same time blend decreases at higher pitches due toa degraded perceptual robustness of formants

This hypothetical model still requires further investigationconcerning a more systematic study on 1) the apparentconstraints of the auditory system as modeled by AIM2) how in musical practice one instrument may functionas the reference 3) establishing a more specific descrip-tion of formant prominence and 4) addressing the contri-bution of loudness balance between instruments to blendThese future investigations will further validate and refinethe proposed perceptual model and will improve compu-tational prediction tools for the instrument-specific spec-tral component of blend Orchestration practice will ben-efit from these research efforts even beyond aiming forblend as knowledge of favorable instrument relationshipsalso informs orchestrators as to how to avoid it

References

[1] G J Sandell Concurrent timbres in orchestration a per-ceptual study of factors determining blend PhD disserta-tion Northwestern University 1991

[2] R A Kendall E C Carterette Identification and blend oftimbres as a basis for orchestration Contemp Music Rev9 (1993) 51ndash67

[3] C Reuter Die auditive Diskrimination von Orchesterin-strumenten - Verschmelzung und Heraushoumlrbarkeit vonInstrumentalklangfarben im Ensemblespiel (The auditorydiscrimination of orchestral instruments fusion and distin-guishability of instrumental timbres in ensemble playing)P Lang Frankfurt am Main 1996

[4] G J Sandell Roles for spectral centroid and other factorsin determining ldquoblended instrument pairings in orchestra-tion Music Percept 13 (1995) 209ndash246

[5] A S Bregman Auditory scene analysis the perceptual or-ganization of sound MIT Press Cambridge MA 1990

[6] S McAdams S Winsberg S Donnadieu G De Soete JKrimphoff Perceptual scaling of synthesized musical tim-bres common dimensions specificities and latent subjectclasses Psychol Res 58 (1995) 177ndash192

[7] L Wedin G Goude Dimension analysis of the perceptionof instrumental timbre Scand J Psychol 13 (1972) 228ndash240

[8] J M Grey J W Gordon Perceptual effects of spectralmodifications on musical timbres J Acoust Soc Am 63(1978) 1493ndash1500

[9] W Strong M Clark Synthesis of wind-instrument tonesJ Acoust Soc Am 41 (1967) 39ndash52

[10] D Tardieu S McAdams Perception of dyads of impulsiveand sustained instrument sounds Music Percept 30 (2012)117ndash128

[11] L Hermann Phonophotographische Untersuchungen(Phonophotographic investigations) Pfluumlger Archiv fuumlrdie Gesamte Physiologie des Menschen und der Thiere 58(1894) 264ndash279

[12] C Stumpf Die Sprachlaute (The speech sounds) SpringerBerlin 1926

[13] J P Fricke Zur Anwendung digitaler Klangfarbenfilter beiAufnahme und Wiedergabe (On the application of digitaltimbre filters during recording and playback) Proc 14thTonmeistertagung Munich Germany 1986 135ndash148

[14] K E Schumann Physik der Klangfarben (Physics of tim-bres) Professorial dissertation Universitaumlt Berlin Berlin1929

[15] E L Saldanha J F Corso Timbre cues and the identifica-tion of musical instruments J Acoust Soc Am 36 (1964)2021ndash2026

[16] W Strong M Clark Perturbations of synthetic orchestralwind-instrument tones J Acoust Soc Am 41 (1967) 277ndash285

[17] D Luce J Clark Physical correlates of brass-instrumenttones J Acoust Soc Am 42 (1967) 1232ndash1243

[18] D A Luce Dynamic spectrum changes of orchestral in-struments J Audio Eng Soc 23 (1975) 565ndash568

[19] J Meyer Acoustics and the performance of music 5th edSpringer New York 2009

[20] G Fant Acoustic theory of speech production MoutonThe Hague 1960

[21] C Reuter Klangfarbe und Instrumentation Geschichte ndashUrsachen ndash Wirkung (Timbre and instrumentation historyndash causes ndash effect) P Lang Frankfurt am Main 2002

[22] R van Dinther R D Patterson Perception of acousticscale and size in musical instrument sounds J AcoustSoc Am 120 (2006) 2158ndash2176

[23] T Irino R D Patterson A dynamic compressive gam-machirp auditory filterbank IEEE Trans Audio SpeechLang Process 14 (2006) 2222ndash2232

[24] B C Moore B R Glasberg Suggested formulae for cal-culating auditory-filter bandwidths and excitation patternsJ Acoust Soc Am 74 (1983) 750ndash753

[25] S-A Lembke S McAdams A spectral-envelope synthe-sis model to study perceptual blend between wind instru-ments Proc 11th Congregraves Franccedilais drsquoAcoustique IOAannual meeting Acoustics 2012 Nantes France 20121025ndash1030

[26] F Martin C Champlin Reconsidering the limits of normalhearing J Am Acad Audiol 11 (2000) 64ndash66

[27] ISO 389ndash8 Acoustics Reference zero for the calibrationof audiometric equipmentmdashPart 8 Reference equivalentthreshold sound pressure levels for pure tones and circum-aural earphones Tech Rept International Organization forStandardization Geneva Switzerland 2004

[28] W J Conover R L Iman Rank transformations as abridge between parametric and nonparametric statisticsAm Stat 35 (1981) 124ndash129

[29] J J Higgins S Tashtoush An aligned rank transform testfor interaction Nonlinear World 1 (1994) 201ndash221

[30] G Peeters B L Giordano P Susini N Misdariis SMcAdams The Timbre Toolbox extracting audio descrip-tors from musical signals J Acoust Soc Am 130 (2011)2902ndash2916

[31] S McAdams Spectral fusion spectral parsing and the for-mation of auditory images PhD dissertation Stanford Uni-versity 1984

1051

Page 2: The Role of Spectral-Envelope Characteristics in ... · Figure 1. Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels). Estimates are based

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

timbre-space proximity between pairs of single-instrumentsounds to higher blend ratings for the same sounds form-ing dyads lsquoDarkerrsquo timbres have been hypothesized to befavorable to blend [4 10] quantified through the globalspectral-envelope descriptor spectral centroid with lsquodarkrsquoreferring to lower centroids Strong blend was found to bebest explained by a low centroid composite ie the cen-troid sum of the sounds forming a dyad

By contrast with global descriptors attempts to ex-plain blending through local spectral-envelope character-istics focus on prominent spectral maxima also termedformants [11 12] in this context Reuter [3] reported be-havioral findings in favor of timbre blend occurring when-ever formant regions between two instruments coincideHis explanation argues that this coincidence avoids in-complete masking [13] which inversely hypothesizes thatthe non-coincidence of formant locations prevents audi-tory fusion due to incomplete mutual masking of the pre-sumedly salient formants between instruments facilitatingthe detection of their distinct identities

As prominent signifiers of spectral envelopes formantshave been employed widely to describe wind instruments[3 7 8 12 14 15 16 17 18 19] Like the formantstructure found in the human voice [11 12 20] formantsin wind instruments are located at absolute frequency re-gions which remain largely unaffected by pitch change[14 17 18] This invariance may in fact allow for the gen-eralized acoustical description for these instruments andtogether with assessing its potential constraints (eg in-strument register dynamic marking) it will be of value tomusical applications (eg [21]) Furthermore it is mean-ingful to assess how such prominent spectral features arerepresented at an intermediary stage between acousticsand perception ie at a sensorineural level simulated bycomputational models of the human auditory system Au-ditory models can account for effects related to spectralmasking ie to what neural excitation pattern a spectrumof a single or compound sound leads For instance ex-citation patterns typically involve an asymmetric upwardspread in frequency but the shape of excitation still variesboth as a function of frequency and excitation level TheAuditory Image Model (AIM) simulates different stages ofthe peripheral auditory system covering the transductionof acoustical signals into neural responses and the sub-sequent temporal integration across auditory filters yield-ing the stabilized auditory image (SAI) which providesthe closest representation relating to acoustical spectral-envelope traits for human-voice and musical-instrumentsounds [22] AIMrsquos most recent development employsdynamic compressive gammachirp (DCGC) filterbankswhich account for both frequency and level dependencyof basilar excitation by adapting filter shape accordingly[23] AIM may therefore aid in assessing the relevanceof hypotheses concerning blend as previous theories hadalso employed representations or explanations which tookspectral-masking effects into account [4 3]

This paper addresses whether pitch-invariant spectral-envelope characterization is relevant to blending Section 2

introduces the chosen approach to spectral-envelope de-scription its corresponding representation through audi-tory models and how in the perceptual investigation thespectral description is operationalized in terms of paramet-ric variations of formant frequency location Section 3 out-lines the design of two behavioral experiments that inves-tigate the relevance of local variations of formant structureto blend perception with their specific methods and find-ings presented in Sections 4 and 5 respectively Finallythe combined results from acoustical and perceptual in-vestigations are discussed in Section 6 leading to the es-tablishment of a spectral model for blend in Section 7

2 Spectral-envelope characteristics

A corpus of wind instrument recordings was used to es-tablish a generalized acoustical description for each in-strument The orchestral instrument samples were drawnfrom the Vienna Symphonic Library1 (VSL) supplied asstereo WAV files (441 kHz sampling rate 16-bit dynamicresolution) with only left-channel data considered Theinvestigated instruments comprised (French) horn bas-soon C trumpet B clarinet oboe and flute with theavailable audio samples spanning their respective pitchranges in semitone increments Because the primary fo-cus concerned spectral aspects all selected samples con-sisted of long sustained notes without vibrato As spectralenvelopes commonly exhibit significant variation acrossdynamic markings all samples included only mezzofortemarkings representing an intermediate level of instrumentdynamics

21 Spectral-envelope description

Past investigations of pitch-invariant spectral-envelopecharacteristics pursued comprehensive assessments ofspectral analyses encompassing extended pitch ranges ofinstruments [14 17 18] The spectral-envelope descrip-tion employed in this paper was based on an empirical es-timation technique relying on the initial computation ofpower-density spectra for the sustained portions of sounds(excluding onset and offset) followed by the detectionof partial tones ie their frequencies and power levelsA curve-fitting procedure employing a cubic smoothingspline (piecewise polynomial of order 3) applied to thecomposite distribution of partial tones over all pitchesyielded the spectral-envelope estimates The procedurebalanced the contrary aims of achieving a detailed splinefit and a linear regression involving iterative minimizationof deviations between the estimate and the composite dis-tribution until an optimal criterion was met These pitch-generalized spectral-envelope estimates then served as thebasis for the identification and categorization of formantsThe main formant represented the most prominent spectralmaximum with decreasing magnitude towards both lowerand higher frequencies or if not available the most promi-

1 URL httpvslcoat Last accessed April 12 2014

1040

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10

Powerlevel(dB)

horn38 pitches

spectral-env estimate

composite distribution

formant maximum

formant bounds 3 dB

formant bounds 6 dB

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10oboe

34 pitches

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10clarinet

42 pitches

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10

Powerlevel(dB)

Frequency (Hz)

bassoon41 pitches

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10

Frequency (Hz)

trumpet31 pitches

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10

Frequency (Hz)

flute39 pitches

Figure 1 Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels) Estimates are based on thecomposite distribution of partial tones compiled from the specified number of pitches across the range of each instrument

nent spectral plateau ie the point exhibiting the flat-test slope along a region of decreasing magnitude towardshigher frequencies Furthermore descriptors for the mainformant F were derived from the estimated spectral enve-lope They comprised the frequencies of the formant max-imum Fmax as well as upper and lower bounds (eg Frarr

3dBand F larr

3dB) at which the power magnitude had decreased byeither 3 dB or 6 dB relative to Fmax

The spectral-envelope estimates for all investigated in-struments generally suggested pitch-invariant trends asshown in Figure 1 A narrower spread of the partial tones(circles) around the estimate (curve) argues for a strongerpitch-invariant trend The lower-pitched instruments hornand bassoon (left panels) exhibited strong tendencies forprominent spectral-envelope traits ie formants Higher-pitched instruments yielded two different kinds of descrip-tion Oboe and trumpet (middle panels) displayed mod-erately weaker pitch-invariant trends nonetheless exhibit-ing main formants with that of the trumpet being of con-siderable frequency extent compared to more locally con-strained ones reported for the other instruments Althoughstill following an apparent pitch-invariant trend the re-maining instruments clarinet and flute (right panels) dis-played only weakly pronounced formant structure withthe identified formants more resembling local spectralplateaus Furthermore the unique acoustical trait of theclarinet concerning its low chalumeau register preventedany valid assumption of pitch invariance to be made forthe lower frequency range This register is characterizedby a marked attenuation of the lower even-order partialswhose locations accordingly varied as a function of pitchFigure 1 also displays the associated formant descriptors(vertical lines) from which it can be shown that the iden-tified main formant for the clarinet (top-right panel) waslocated above the pitch-variant low frequencies

22 Auditory-model representation

If pitch-invariant spectral-envelope characteristics are per-ceptually relevant they should also become apparent ina representation closer to perception like the output ofa computational auditory model Using the AIM SAIswere derived from the DCGC basilar-membrane modelcomprising 50 filter channels equidistantly spaced alongequivalent-rectangular-bandwidth (ERB) rate [24] andcovering the audible range up to 5 kHz2 A time-averagedSAI magnitude profile was obtained by computing the me-dians across time frames per filter channel which resem-bled the auditory excitation pattern [22]

A strong similarity among SAIs across an extendedrange of pitches was taken as an indicator for pitch-invariant tendencies Pearson correlation matrices for allpossible pitch combinations were computed comparingthe profiles of SAI magnitudes over filter channels In ad-dition this approach also aided in identifying the limitsof pitch invariance as adjacent regions exhibiting weakercorrelations delimited instrument registers where SAIsvaried as a function of pitch Three representative casesare illustrated in Figure 2 For horn (left panel) and bas-soon (not shown) broad regions of pitch-invariant SAIprofiles became apparent (dark square) spanning largeparts of their ranges up to pitches of about D4 Oboe(middle panel) and trumpet (not shown) exhibited moreconstrained and fragmented regions of high SAI similar-ity contrasted by increasingly pitch-variant SAI profilesabove A4 For these four instruments pitch-invariant char-acterization appeared to be more prevalent and stable in

2 As band-limited analysis economized computational cost and no promi-nent formants above 5 kHz were found the audio samples were sub-sampled by a factor of 4 to a sampling rate of 11025 Hz only for thepurposes of analysis with AIM

1041

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Figure 2 (Color online) SAI correlation matrices for horn oboe and clarinet (left-to-right) Correlations (Pearson r) between SAImagnitudes across 50 filter channels consider all possible pitch combinations with obtained r falling within [0 1] (see legend far-right)

lower pitch regions from which low-pitched instrumentsin particular would benefit All of these instruments lostpitch-invariant tendencies in their high registers The re-maining instruments clarinet (right panel) and flute (notshown) lacked widespread pitch-invariant SAI charac-teristics as strong patterns of correlation were only ob-tained between directly adjacent pitches (diagonal) andnot across wider pitch regions

23 Parametric variation of main-formant fre-quency

In order to study the contribution of local variations ofspectral characteristics a synthesis model was employedthat provided parametric control over separate spectral-envelope components The synthesis infrastructure reliedon a source-filter model and was realized for real-timemodification of the control parameters [25] based onwhich the spectral envelope remained invariant to pitchchanges During synthesis the filter structure was fed aharmonic source signal of variable fundamental frequencycontaining harmonics up to 5 kHz The filter structure con-sisted of two independent filters modeling the main for-mant on the one hand and the remaining spectral-enveloperegions on the other A parameter allowing the main for-mant to be shifted in frequency relative to the remain-ing regions was implemented as an absolute deviation ΔFin Hz from a predefined origin ie ΔF = 0 Analoguemodels for each instrument were designed for ΔF = 0 bymatching the frequency response of the composite filterstructure to the spectral-envelope estimates as illustratedin Figure 3 for the horn (dashed black line) superimposedover its corresponding estimate (solid grey line) The ana-logues were not meant to deliver realistic emulations ofthe instruments per se but rather to achieve a good fitbetween the main formants of the analogue and spectral-envelope estimate Limiting differences in shape betweenmain formants helped to deduce the measured perceptualdifferences that resulted from frequency relationships be-tween them It should be noted that the synthesis filterstructure for the clarinet excluded its pitch-variant lower

500 1000 1500 2000 2500 3000 3500 4000 4500 5000-50

-40

-30

-20

-10

0

Powerlevel(dB)

Frequency (Hz)

spectral-envelope estimate

formant bounds 3 dB

synthesis filter for ΔF= 0

synthesis filters for ΔFne 0

-II -I 0 +I+II

Figure 3 Spectral-envelope estimate of horn and filter magni-tude responses of its synthesis analogue The original analogueis modeled for ΔF = 0 the other responses are variations ofΔF The top axis displays the equivalent scale for the five ΔF levelsinvestigated in Experiment B

frequency region (see Section 21) It only modeled theformant above that region as well as the remaining spec-tral envelope towards higher frequencies in order to orientthe investigation toward specifically testing the relevanceof the identified albeit less pronounced formant

3 General methods

The perceptual relevance of main-formant frequency toblending was tested for sound dyads All dyads compriseda sampled instrument and its synthesized analogue modelIn a given dyad the instrument sample was constant andits synthesized analogue was variable with respect to theparameter ΔF Variations with ΔF gt 0 shifted the mainformant of the synthesized sound higher in frequency rel-ative to the sampled instrumentrsquos main formant and ac-cordingly ΔF lt 0 corresponded to shifts toward lowerfrequencies Two perceptual experiments were conductedto investigate how ΔF variations relate to blend In Ex-periment A participants controlled ΔF directly and were

1042

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

asked to find the ΔF that gave optimal blend whereasin Experiment B listeners provided direct blend ratingsfor predefined ΔF variations Using the instruments pre-sented in Section 2 the robustness of perceptual effectswas assessed over the two experimental tasks for differ-ent pitches unison and non-unison intervals and stimuluscontexts Given six instruments and various pitches andintervals it was impractical to test all possible combina-tions An exploratory approach was chosen instead withnot all instruments being tested across all factors Whereasthis could reveal blend-related dependencies concerningΔF across the factors of interest it did not allow gener-alizing across all instruments to the same degree as well asdetermining perceptual thresholds Still each factor wasstudied with at least three instruments for greater gener-alizability Pitches were chosen to represent common reg-isters of the individual instruments Non-unison intervalsincluded both smaller and larger intervals

The methods both experiments share in common arepresented in this section before addressing their specificsand results in the following sections

31 ParticipantsDue to the demanding experimental tasks participantsof both experiments were musically experienced listen-ers They were recruited primarily from the SchulichSchool of Music McGill University Their backgroundswere assessed through self-reported degree of formalmusical training accumulated across several disciplineseg instrumental performance composition music the-ory andor sound recording All participants passed a stan-dardized hearing test [26 27] No participant took part inboth experiments

32 StimuliAll stimuli involved dyads comprising one sampled(drawn from VSL) and one synthesized sound For a sam-ple at any given pitch the spectral envelope was ap-proximated by the pitch-generalized description from Sec-tion 21 which resulted in the main formants of sampledand synthesized sounds resembling each other for ΔF = 0With regard to the temporal envelope both instrumentswere synchronized in their note onsets followed by thesustain portion and ending with an artificial 100-ms linearamplitude decay ramp applied to both instrument soundsThe sampled sound retained its original onset character-istics whereas across all modeled analogues the synthe-sized onsets were characterized by a constant 100-ms lin-ear amplitude ramp Stimuli were presented over a stan-dard two-channel stereophonic loudspeaker setup insidean Industrial Acoustics Company double-walled soundbooth with the instruments simulated as being capturedby a stereo main microphone at spatially distinct locationsinside a mid-sized moderately reverberant room (see [25]for details)

33 ProcedureExperimental conditions were presented in randomized or-der within blocks of repetitions A specific condition could

not occur twice in succession between blocks The mainexperiments were in each case preceded by 10 practice tri-als under the guidance of the experimenter to familiar-ize participants with the task and with representative ex-amples of stimulus variations Dyads were played repeat-edly throughout experimental trials allowing participantsto pause playback at any time

34 Data analysis

With respect to investigated factors Experiment A evalu-ated the influence of the factors instrument register and in-terval type Experiment B assessed pitch-invariant percep-tual performance across a number of factors and further-more correlated the perceptual data with spectral-envelopetraits Separate analyses of variance (ANOVAs) were con-ducted for each instrument testing for statistically signifi-cant main effects within factors and interaction effects be-tween them A criterion significance level α = 05 waschosen and if multiple analyses on split factor levels orindividual post-hoc analyses were conducted Bonferronicorrections were applied In repeated-measures ANOVAsthe Greenhouse-Geisser correction (ε) was applied when-ever the assumption of sphericity was violated In additionExperiment A also considered one-sample t-tests againsta mean of zero for testing differences to ΔF = 0 Statis-tical effect sizes η2

p and r are reported for ANOVAs andt-tests respectively The analyses considered participant-based averages for trial repetitions of identical conditions

4 Experiment A

41 Method

411 Participants

The experiment was conducted with 17 participants 6 fe-male and 11 male with a median age of 27 years (range20-57) Fifteen participants reported more than 10 yearsof formal musical training with 10 indicating experi-ence with wind instruments Participants were remuner-ated with 15 Canadian dollars

412 Stimuli

Table I lists the 17 investigated dyad conditions (columnentries of bottom row) All instruments included unisonintervals (0 semitones ST) With regard to additional fac-tors three levels of the Interval factor compared unisonintervals to consonant (7 or minus3 ST) and dissonant (6 orminus2 ST) non-unison intervals Two levels of the Registerfactor contrasted low (A2 C4 or E3) to high (D5 or B5) in-strument registers for unison dyads with the high-registerpitches being derived from the pitch-variant regions iden-tified in Section 22 The sampled sound remained at theindicated reference pitch whereas the synthesized soundvaried relative to it to form the non-unison intervals Alldyads had constant durations of 4900 ms The level bal-ance between instruments was variable and determined bythe participant

1043

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table I Seventeen dyad conditions from Experiment A across instruments pitches and intervals (top-to-bottom) Intervals in semitonesrelative to the specified reference pitch

horn bassoon oboe trumpet clarinet fluteC3 A2 D5 C4 C4 B5 E3 D5 C4

0 6 7 0 -2 -3 0 0 0 6 7 0 0 -2 -3 0 0

ΔF

-II -I +II+I0

A

B

Γ= -100 Hz fslider

high

low

Figure 4 ΔF variations investigated in Experiments A and B AParticipants controlled fslider which provided a constant range of700 Hz (white arrows) Γ (eg minus100 Hz) represented a random-ized roving parameter preventing the range from always beingcentered on ΔF = 0 B Participants rated four dyads varyingin ΔF drawn from the low or high context The contexts repre-sented subsets of four of the total of five predefined ΔF levels

413 Procedure

A production task required participants to adjust ΔF di-rectly in order to achieve the maximum attainable blendwith the produced value serving as the dependent vari-able User control was provided via a two-dimensionalgraphical interface including controls for ΔF and thelevel balance between instruments The slider controls forΔF= fslider + Γ provided a constant range of 700 Hzwith fslider isin [minus350+350] and including a randomizedroving offset Γ isin [minus100+100] between trials As visu-alized in Figure 4 (top) minimal or maximal Γ limitedthe range covered by all trials to 500 Hz (solid thick greyline) with all possible ΔF deviations spanning a range of900 Hz (dashed thick line) Participants completed a totalof 88 experimental trials (22 conditions3 times 4 repetitions)taking about 50 minutes and including a 5-minute breakafter about 44 trials

42 Results

421 General trends

For all six instruments participants associated optimalblend with deviations ΔF le 0 Figure 6 (diamonds inlower part) illustrates the means for optimal ΔF fromwhich two different patterns become apparent among in-struments ΔF are displayed relative to a scale of equiv-alent variations tested in Experiment B with the scalevalue 0 corresponding to ΔF = 04 The grey lines indicate

3 Only 17 conditions investigated ΔF the remainder studied other for-mant properties that lie outside the focus of this article For instancerelative main-formant magnitude variations ΔL were also investigatedwhich will be addressed in a future publication4 ΔF were linearly interpolated to a scale of equi-distant levels eg minusI 0 and +I corresponding to the numerical scale values -1 0 and 1 re-spectively

each instrumentrsquos respective slider range For the instru-ments horn bassoon oboe and trumpet (left panel) opti-mal blend was found in direct proximity to ΔF = 0 Forthe unison intervals of horn and bassoon ΔF did not differsignificantly from zero ΔF for the other two instrumentswere located slightly lower [t(16) le minus56 p lt 0001rge 82] By contrast optimal ΔF for the clarinet and flute(right panel) were relatively distant from ΔF = 0 in linewith significant underestimations [t(16)leminus38 p le 0015rge 69] In summary ΔF values leading to optimal blendwere limited to cases in which the formant of the synthe-sized instrument was at or below that of the sampled in-strument

422 Instrument register and interval type

The influence of instrument register on the optimal ΔFwas investigated for trumpet bassoon and clarinet atpitches corresponding to instrument-specific low and highregisters One-way repeated-measures ANOVAs for Reg-ister yielded moderately strong main effects for trumpetand bassoon [F (1 16) ge 192 p le 0005 η2

p ge 55] dueto an increase of the optimal ΔF in the high registerA less pronounced effect was obtained for the clarinet[F (1 16)=53p= 0358η2

p = 25]The investigation of possible differences in optimal ΔF

between interval types involved comparisons between uni-son and non-unison intervals as well as a distinction be-tween consonant and dissonant for the latter One-wayrepeated-measures ANOVAs on Interval conducted forbassoon clarinet and trumpet only led to a weak maineffect for the trumpet [F (2 32)= 37p= 0347η2

p = 19]Post-hoc tests for the three possible comparisons yieldeda single significant difference between the interval sizes 0and 6 ST [t(16) = minus35 p = 0033 r = 65]

5 Experiment B

51 Method

511 Participants

The experiment was completed by 20 participants 9 fe-male and 11 male with a median age of 22 years (range18ndash35) Fifteen participants reported more than 10 yearsof formal musical training with 11 indicating experi-ence with wind instruments Participants were remuner-ated with 20 Canadian dollars

512 Stimuli

Table II lists the 22 investigated dyad conditions The In-terval factor investigated two levels comparing unison to

1044

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table II Twenty-two dyad conditions from Experiment B across instruments pitches and intervals (top-to-bottom) Intervals in semi-tones relative to the specified reference pitch

horn bassoon oboe trumpet clarinet fluteC3 B 3 A2 D4 C4 G 4 E5 C4 B 4 E3 A4 C4 G 4 E5

0 6 0 6 0 -2 0 -2 0 0 0 0 6 0 6 0 -2 0 -2 0 0 0

non-unison (6 or -2 ST dissonant) intervals Depending onthe instrument the Pitch factor involved two (horn bas-soon trumpet clarinet) or three (oboe flute) levels In thecase of horn bassoon trumpet and clarinet there weretwo levels of Interval for each level of Pitch In additionthis experiment included two factors that were related toΔF variations alone which applied to all conditions listedin Table II The first was synonymous with ΔF as it ex-plored a total of five ΔF levels including ΔF = 0 and twosets of predefined moderate and extreme deviations aboveand below it ie the ΔF levels hereafter labeled 0 plusmnI and plusmnII The second factor grouped the five levels contex-tually into two subsets of four which are denoted as lowand high contexts and defined in Figure 4 (bottom)

Employing the formant descriptors from Section 21the investigated ΔF levels were expressed on a commonscale of spectral-envelope description which provided abetter basis of comparison than taking equal frequencydifferences in Hz as the frequency extent of formantsacross instruments varied considerably Figure 5 providesexamples for all resulting ΔF levels of the horn The fourlevels ΔF = 0 were defined as frequency distances be-tween the formant maximum Fmax and measures relatedto the location and width of its bounds (eg Frarr

6dB andΔF3dB respectively) For example the positive deviationΔF (+I) was the distance between the formant maximumand its upper bound minus 10 of the width betweenthe 3 dB bounds If spectral-envelope descriptions lackedlower bounds (eg trumpet clarinet flute) the frequencylocated below Fmax that exhibited the lowest magnitudewas taken as a substitute value

Unlike the dyads in Experiment A the synthesizedsound always remained at the reference pitch whereas thesampled sound varied its pitch for non-unison intervalsbecause this tested the assumption of pitch-invariant de-scription for the recorded sounds more thoroughly Thedyads had a constant duration of 4700 ms In additionthe conditions listed in Table II including the associatedfive ΔF levels per condition had predetermined values forthe level balance between sounds and had also been equal-ized for loudness The first author determined the level bal-ance aiming for good balance between both sounds whilemaintaining discriminability between ΔF levels whichwas subsequently verified by the second author Loudnessequalization was conducted subjectively in a separate pi-lot experiment anchored to a global reference dyad for allconditions and ΔF levels For all stimuli gain levels weredetermined that equalized stimulus loudness to the globalreference These gain levels were based on median val-ues from at least five participants which were determined

200 400 600 800 1000 1200-12

-10

-8

-6

-4

-2

0

Powerlevel(dB)

Frequency (Hz)

ΔF3dB

F3dB

larrF3dB

rarrFmax

F6dB

larrF6dB

rarr

-II -I 0 +I +II

Figure 5 ΔF levels from Experiment B defined relative toa spectral-envelope estimatersquos formant maximum and boundsΔF (plusmnI) fall 10 inside of ΔF3dBrsquos extent ΔF (+II) aligns withFrarr

6dB whereas ΔF (minusII) aligns with either 80 middot F larr6dB or 150 Hz

whichever is closer to Fmax

either after the corresponding interquartile ranges first fellbelow 4 dB or after running a maximum of 10 participants

513 ProcedureA relative-rating task required participants to compare ΔFlevels for a given condition from Table II In each ex-perimental trial participants were presented four dyadsand asked to provide four corresponding ratings The fourdyads represented one of the twoΔF contexts labeled highand low in Figure 4 A continuous rating scale was em-ployed which spanned from most blended to least blended(values 1 to 0 respectively) and served as the dependentvariable Participants needed to assign two dyads to thescale extremes (eg most and least) the remaining twodyads were positioned along the scale continuum relativeto the chosen extremes Playback could be switched freelybetween the four dyads with the visual order of the se-lection buttons and rating scales for individual dyads ran-domized between trials Participants completed 120 trials(30 conditions5 times 2 contextstimes 2 repetitions) taking about75 minutes including two 5-minute breaks after about 40and 80 trials

52 Results

521 General trendsGroup medians of ratings aggregated across the factorsPitch Interval and Context illustrate how blend varies as a

5 Only the 22 conditions investigating ΔF are reported here

1045

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

function of ΔF Figure 6 suggests that participants mainlyassociated higher degrees of blend with the levels ΔF le 0whereas much lower ratings were obtained for ΔF gt 0 Interms of higher degrees of blend two typical rating pro-files as a function of ΔF emerged (shown as the ideal-ized dashed-and-dotted curves in Figure 6) 1) For the in-struments horn bassoon oboe and trumpet (left panel)medium to high blend ratings were obtained at and be-low ΔF = 0 above which ratings decreased markedlyresembling the profile of a plateau 2) The instrumentsclarinet and flute (right panel) exhibited a monotonicallydecreasing and approximately linear rating profile as ΔFincreases in which ΔF = 0 did not appear to assume a no-table role These differences in plateau vs linear profilesfor the two instrument subsets are analyzed more closelyin the following sections also taking into account potentialeffects due to the other factors

522 Blend and pitch invariance

Spectral characteristics that remain stable with pitch varia-tion such as formants may have a pitch-invariant percep-tual relevance To test this whenever the profiles of blendratings over ΔF remained largely unaffected by differentpitches intervals and ΔF contexts the perceptual resultswere assumed to be pitch-invariant For instance Figure 7suggests this tendency for the horn in which the plateauprofile was maintained over all factorial manipulations Asa first step the main effects across ΔF were tested to con-firm that ratings served as reliable indicators of perceptualdifferences Given these main effects perceptual robust-ness to pitch variation was fulfilled if no ΔF times Pitch orΔF times Interval interaction effects were found across bothΔF contexts An absence of main effects betweenΔF con-texts would indicate further perceptual robustness

As the Context factor only involved ΔF levels com-mon to both the high and low contexts namely 0 andplusmnI (see Figure 4 bottom) the ratings for these levels re-quired range normalization and separate analyses from theremaining factors For the instruments involving the In-terval factor these were conducted on split levels of thatfactor Furthermore the experimental task imposed the us-age of the rating-scale extremes which resulted in severalviolations of normality due to skewed distributions for thedyads selected as extremes As a result all main and in-teraction effects were tested with a battery of five indepen-dent repeated-measures ANOVAs on the raw and trans-formed ratings The data transformations included non-parametric approaches of rank transformation [28] andprior alignment of lsquonuisancersquo factors [29]6 The statistics

6 Given the unavailability of non-parametric alternatives for repeated-measures three-way ANOVAs that include tests for interaction effectsan approach was chosen that assesses tests over multiple variants ofdependent-variable transformations presuming that the most conserva-tive test in the ANOVA battery minimizes accepting false positives Ranktransformation is a common approach in non-parametric tests such as theone-way Friedman test [28] Issues with tests for interaction effects los-ing power in the presence of strong main effects were addressed throughlsquoalignmentrsquo of the raw data prior to rank transformation [29] For in-stance a test for the interaction AtimesB would align its lsquonuisancersquo factors

-II -I 0 +I +II

0

1

ΔF

Blendrating

Produced optimal blend

-II -I 0 +I +IIΔF

Produced optimal blend

horn

bassoon

oboe

trumpet

profile

clarinet

flute

profile

Figure 6 Perceptual results for the different instruments groupedaccording to two typical response patterns (left and right panels)Experiment A (diamonds bottom) mean ΔF for produced opti-mal blend transformed to a continuous scale of ΔF levels Thegrey lines indicate slider ranges (compare to Figure 4 top) Ex-periment B (curves) median blend ratings across ΔF levels andidealized profile

-II -I 0 +I

0

1

context lowinterval unison

Blendrating

-I 0 +I +II

0

1

context highinterval unison

pitch 1

pitch 2

-II -I 0 +I

0

1

context lowinterval non-unison

Blendrating

ΔF-I 0 +I +II

0

1

context high

interval non-unison

ΔF

Figure 7 Medians and interquartile ranges of blend ratingsfor horn across ΔF levels and the factorial manipulationsPitchtimesContexttimesInterval

for the most liberal and conservative p-values are reported(eg conserv|liberal) with the conservative finding beingassumed valid if statistical significance is in doubt

Strong main effects were found for all instrumentswhich indicated clear differences in perceived blendamong the investigated ΔF levels Table III lists ANOVAstatistics for the range between strongest (clarinet) andweakest (bassoon) main effects among the instrumentswhich reflects analogous differences in the utilized rating-scale ranges in Figure 6 Furthermore the rating profiles ofthe instruments horn bassoon oboe and trumpet fulfilledthe criteria for pitch-invariant robustness as they appeared

by removing the main effects for A and B The four data transformationsprocessed the raw data with or without alignment and for two rankingmethods The first method computed global ranks across the entire dataset ie across participants and conditions whereas the second methodevaluated within-participant ranks across conditions per participant

1046

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table III Range of strongest to weakest ANOVA main effects along ΔF across all six instruments

low context high contextEffect Stat conserv liberal conserv liberal

F 860 826 1651 1651Clarinet df 16307 21391 357 357(strongest) ε 54 69 - -

p lt0001 lt0001 lt0001 lt0001η2p 82 81 90 90

F 164 168 126 152Bassoon df 357 357 14271 357(weakest) ε - - 48 -

p lt0001 lt0001 0005 0001η2p 46 47 40 44

unaffected by pitch-related variation There was only oneexception from a complete absence of effects interactingwith ΔF a moderate main effect for Context for trum-pet was found only at non-unison intervals [F (1 19) =1048|2504 p = 0043|0001 η2

p = 355|569] By con-trast the rating profiles for clarinet and flute did not exhibitpitch invariance as they clearly violated the criteria acrossboth ΔF contexts The interaction effects with ΔF and amain effect for Context leading to their disqualification aredescribed in Table IV

The instruments displaying pitch invariance were thesame ones with plateau rating profiles possibly attributinga special role to ΔF = 0 as defining a boundary governingblend To further support this assumption by joint analysisof the four instruments two hierarchical cluster analyseswere employed that grouped ΔF levels based on their sim-ilarity in perceptual ratings or auditory-model representa-tions The first cluster analysis reinterpreted rating differ-ences between ΔF as a dissimilarity measure This mea-sure considered effect sizes of statistically significant non-parametric post-hoc analyses (Wilcoxon signed-rank test)for pairwise comparisons between ΔF levels ie greaterstatistical effects between two ΔF levels were expressedas being more dissimilar in the perceived degree of blendFor non-significant differences dissimilarity was assumedto be zero The second analysis relied on correlation coef-ficients (Pearson r) between dyad SAI profiles across ΔFlevels (see Figure 9 for examples) The dissimilarity mea-sure considered the complement value 1minus r and as all cor-relations fall within the range [0 1] no special treatmentfor negative correlations was required Both cluster anal-yses employed complete-linkage algorithms The dissimi-larity input matrices were obtained by averaging 30 inde-pendent data sets aggregated across the four instrumentsand the factors Context Pitch and Interval As shown inFigure 8 both analyses led to analogous solutions in whichthe two levels ΔF gt 0 are maximally dissimilar to acompact cluster associating the three levels ΔF le 0 Inother words ΔF associated with low and high degrees ofblend group into two distinct clusters clearly relating tothe plateau profile where ΔF = 0 defines the boundary tohigher degrees of blend

Table IV ANOVA effects for clarinet and flute suggesting an ab-sence of pitch invariance lowast The column header low context doesnot apply in this case

Clarinet low context high contextEffect Stat conserv liberal conserv liberal

F 29 34 38 44ΔF times df 357 357 20387 357Interval ε - - 68 -

p 044 024 030 008η2p 13 15 17 19

Flute low context high contextEffect Stat conserv liberal conserv liberal

F 28 43 44 72ΔF times df 39750 6114 6114 6114Pitch ε 66 - - -

p 031 0006 0005 lt0001η2p 13 18 19 28

F 49 157 - -df 119 119 - -

Contextlowast ε - - - -p 039 0008 - -η2p 21 45 - -

-II -I 0 +I +II

01

02

03

04

05

Dissimilarity

ΔF-II -I 0 +I +II

01

02

ΔF

Figure 8 Dendrograms of ΔF -level groupings for the pitch-invariant instruments Dissimilarity measures are derived fromperceptual ratings (left) and auditory-modelled dyad SAI profiles(right)

523 Blend and its spectral correlatesExplaining blending between instruments with the helpof spectral-envelope characteristics could eventually al-

1047

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table V Variables entering stepwise-regression algorithm to ob-tain models reported in Table VI a Not for pitch-variant subsetinter-variable correlation |r| gt 7 b Computed as described in[30] c Accounting for pitch d Accounting for perceivable beat-ing between partial tones

No Variable Description

1 ΔLrarr3dB derivate of Frarr

3dB Equation (1)2 ΔLF1vsF2 ΔL betw formants F1 amp F2

3 ΔSslopeF1ab spectral slope above Frarr

3dB4 ΔSslope

b global spectral slope5 |ΔScentroid

b absolute centroid difference6 Scentroid

b centroid composite7 ERBrate

c reference pitch in Table II8 I(non)unison interval category (binary)9 IST

a interval size in semitones10 Clohi ΔF context (binary)11 mixratio balance betw instruments12 AMdepth

bd amplitude modulation depth

low the prediction of blend through these instrument-specific traits In addition it could help understand the wayin which spectral characteristics contribute perceptuallyGiven this aim multiple linear regression was employed tomodel the median blend ratings through a number of vari-ables or regressors The regression models assessed therelative contributions of regressors describing both globaland local spectral-envelope traits Global descriptors in-volved the commonly reported spectral centroid and spec-tral slope [30] whereas local descriptors concerned theformant characterization discussed in Section 21 Becausea dyad yielded two descriptor values across its constituentsounds the regressor measure had to associate the two insome way For the spectral centroid two measures wereconsidered namely composite (sum) and absolute differ-ence [4] Although the centroid composite relates to thelsquodarkerrsquo-timbre hypothesis mentioned in the Introductionthe centroid difference had still been found to best explainblend in non-unison intervals [4] which left some uncer-tainty as to which of these two measures was more appro-priate in explaining blend in general The remaining spec-tral regressors were implemented as polarity-preservingdifferences between descriptors with the sampled instru-ment serving as the reference (ref) and the synthesized in-strument being variable (var) across ΔF For example thedifference of descriptor values dx for instrument x wouldcorrespond to Δd = dref minus dvar

Regression models for two separate subsets of the per-ceptual data were explored pitch-invariant (horn bassoonoboe trumpet) and pitch-variant instruments (clarinetflute) The datasets comprised all conditions tested acrossfactors and instruments with N = 118 and N = 54 forthe pitch-invariant and -variant subsets respectively Re-gressor variables were pooled from the spectral-envelopedescriptors and additional variables that were included toaccount for potential confounding factors eg pitch in-terval If these factors did not contribute as regressorsthis would further support a perceptual relevance of pitch

invariance The initial pool of regressors comprised 32variables subsequently reduced to a pre-selected set thatexhibited inter-variable correlations |r| lt 7 in order toavoid pronounced collinearity among regressors The pre-selection was determined by first identifying the vari-able that in simple linear regression exhibited the high-est R2 and subsequently adding all remaining variablesthat yielded permissible inter-variable correlations Ta-ble V lists the pre-selected variables entered into the re-gression which comprised spectral-envelope descriptors(nos 1-6) and variables representing other potential fac-tors of influence (nos 7-12) Stepwise multiple-regressionalgorithms with both forward-selection and backward-elimination schemes were considered which converged onoptimum models by iteratively adding or eliminating re-gressors respectively Models with similar combinationsof regressors to the optimum models were explored aswell In anticipation of reporting the results the inclusionof a binary regressor for ΔF context Clohi benefited all in-vestigated regression models as it corrected the systematicoffset of scaled ratings between the low and high contexts(see Figure 7)

In simple linear regression the strongest spectral-enve-lope descriptors all concerned local formant characteriza-tion and did not involve the global descriptors Among theformant descriptors the highest correlations were obtainedfor the main-formant upper bound Frarr

3dB applied to bothpitch-invariant [R2(118) = 656 p lt 0001] and pitch-variant subsets [R2(54) = 713 p lt 0001] Notably theformant maximum Fmax did not perform better than Frarr

3dBlikely due to differing skewness properties between instru-ment formants (see Figure 1) At the same time the utilityof Frarr

3dB implies that it could assume an important role inexplaining blend as perhaps the perceptually most salientfeature of formants It performed slightly better for thepitch-variant than for the pitch-invariant subset becauseΔF rarr

3dB essentially follows a strictly monotonic functionacross the investigated ΔF levels which apparently mod-els the linear blend profile better To improve the model-ing for the plateau blend profile derivate descriptors ofFrarr

3dB were explored The most effective derivate ΔLrarr3dB re-

lated the upper-bound frequencies Frarr3dB of the two instru-

ments to a corresponding change in level across the ref-erence instrumentrsquos spectral-envelope magnitude functionLref (f ) in dB as formalized in Equation 1 In other wordsthis measure evaluated magnitude differences relative tothe reference instrument at frequencies appearing to be ofparticular perceptual relevance to both instruments whichtherefore may relate to spectral-masking effects (eg in-complete masking [13])

ΔLrarr3dB = Lref F rarr

3dB|ref minus Lref F rarr3dB|var (1)

The obtained solutions from stepwise multiple regressionyielded identical models for both instrument subsets in-volving the regressors ΔLrarr

3dB absolute spectral-centroiddifference |ΔScentroid| and context Clohi A slight gain inperformance was achieved by substituting the |ΔScentroid|based on audio signals from individual sounds with a vari-

1048

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table VI Multiple-regression models best predicting timbre-blend ratings for two instrument subsets

Pitch-invariant subset R2adj=87

F (3 116)=2724 p lt0001

Regressors βstd t p

ΔLrarr3dB 100 286 lt0001

|ΔScentroid| 26 77 lt0001Clohi 27 79 lt0001

Pitch-variant subset R2adj=88

F (3 52)=1342 p lt0001

Regressors βstd t p

ΔLrarr3dB 103 200 lt0001

|ΔScentroid| 16 33 0018Clohi 34 68 lt0001

ant computed on the pitch-generalized spectral-envelopeestimates Table VI displays these optimized regressionmodels for pitch-invariant and pitch-variant subsets bothleading to about 87 explained variance (adjusted R2)The standardized regression-slope coefficients βstd indi-cate the relative contribution of regressors with the rel-ative weights being very similar for both subsets In thesemodels ΔLrarr

3dB acted as the strongest predictor for theblend ratings contributing about five times more than|ΔScentroid| which furthermore did not perform better thanClohi These findings clearly argue for local spectral-envelope descriptors to be more meaningful than globalones in explaining blending in the investigated dyadsMoreover the remaining global descriptor spectral slopeappeared to play no role Furthermore finding both in-strument subsets to be modeled equally well through thesame spectral-envelope descriptors points to a general util-ity of pitch-generalized descriptions for all instrumentsDespite the findings in Section 522 arguing against pitch-invariant perceptual robustness for clarinet and flute theobtained regression models excluded the Pitch and Intervalvariables thus appearing to be less relevant to explainingthe blend ratings

6 General discussion

Orchestrators would benefit from acoustical descriptionsof instruments that correspond to the perceptual processesinvolved in achieving blended timbres Section 2 sug-gests that common orchestral wind instruments are rea-sonably well described through pitch-generalized spectral-envelope estimates which furthermore show the instru-ments horn bassoon oboe and trumpet to be charac-terized by prominent formant structure Auditory mod-els employing stabilized auditory images (SAI) confirmthat for strong formant characterization and for lower tomiddle pitch ranges the pitch-invariant characterization isstable In higher instrument registers however SAI pro-files indicate limitations to pitch-invariant characteriza-tion Other instruments like clarinet and flute yield SAI

profiles clearly varying as a function of pitch implyingthat this pitch dependency may also extend to perception

The perceptual investigation in Sections 3 to 5 con-firms the acoustical implications showing that strong for-mant characterization results in main formants becomingperceptually relevant to blending Given a dyad in whicha putative main formant is variable in frequency relativeto a fixed reference formant the investigated instrumentsdisplay two archetypical profiles based on their formantprominence For the pitch-variant clarinet and flute blendincreases as a monotonic quasi-linear function if the vari-able formant moves from above to below the referenceFor pitch-invariant instruments the frequency alignmentbetween the variable formant and the reference (ΔF= 0)functions as a boundary delimiting a region of higher de-grees of blend at and below the reference and contrastedby a marked decrease in blend when the variable formantexceeds it which overall resembles a plateau profile Thepitch invariance even extends to different interval typesas the plateau profile remains unaffected in non-unisonintervals regardless of their degree of consonance How-ever the findings suggest that the perceptual relevance ofspectral-envelope estimates diminishes in higher instru-ment registers The limited sampling of conditions acrossthe investigated factors prevented a more comprehensiveevaluation of all instruments Whereas the findings allowgeneral relationships for formant frequency to be deducedmore comprehensive investigations are needed to attainmore generalizable quantification of absolute frequencyranges and thresholds

In correlating acoustical and perceptual factors spec-tral-envelope characteristics alone explain up to 87 ofthe variance in blend ratings In addition local spectraltraits seem to be more powerful acoustical predictors ofblend than global traits The formant descriptor for theupper formant bound Frarr

3dB when expressed as a derivatedescriptor ΔLrarr

3dB acts as the strongest predictor for theblend ratings regardless of whether instruments belongto the pitch-invariant group or not With regard to clar-inet and flute the departures from pitch invariance foundin Section 522 contradict the utility of pitch-generalizedspectral-envelope description in predicting blend ratingsas reported in Section 523 Taking both findings into ac-count this for one argues that the descriptor Frarr

3dB stillsucceeds in explaining blend well even for clarinet andflute On the other hand the same instruments displaya greater perceptual sensitivity to the Pitch and Intervalfactors likely associated with their less pronounced for-mant structure Overall strong formant prominence leadsto more drastic changes in blend

The prediction of blend using ΔLrarr3dB still presumes

that one of the instruments serves as a reference formantas the employed difference descriptors are anchored tothe sampled instrument The dependence on a referenceleaves some ambiguity because an arbitrary combinationof two instruments would lead to contradictory predictionsof blend if both instruments were given equal importancein serving as the reference Given the context of both ex-

1049

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

periments it can be assumed that the sampled instrumentacting as a constant anchor had been biased into servingas the reference by combining it with a variable synthe-sized instrument In addition a possible perceptual expla-nation could concern audio samples of instruments playingwithout vibrato generally still exhibiting coherent micro-modulations of partial tones These modulations have beenshown to contribute to a stronger and more stable audi-tory image [31] and may thus bias the more stable imagetoward acting as the reference especially as the synthe-sized partials remain static over time Even in the contextof blending in musical performance one instrument as-sumes the role of the leading voice in which it possiblyserves as the reference whereas an accompanying instru-ment avoids exceeding the lead instrumentrsquos main-formantfrequency Likewise returning to the notion of blend lead-ing to augmented timbres [4] the dominant timbre to beembellished by another would seem predestined to func-tion as the reference based on which the less dominanttimbre should not exhibit formant frequencies exceedingthose of the reference

Finally the results allow a reassessment of previousexplanations for blend The lsquodarkerrsquo-timbre hypothesis[4] is directly reflected in the obtained linear blend pro-file in which lower ΔF increases blend and at the sametime causes a decrease in the spectral-centroid compos-ite By contrast this hypothesis is not well explained bythe plateau profile as blend ratings remain similarly highfor ΔF le 0 The alternative hypothesis of coincidence offormant regions [3] would have predicted stronger blendratings for ΔF = 0 than for all other levels which in theperceptual results is only achieved for the levels ΔF gt 0While the hypothesis only achieves partial fulfillment withrespect to spectral variations ΔF it finds more agreementin the corresponding SAI representations As shown in twoexample cases for horn in Figure 9 the dyad SAI pro-files for the levels ΔF gt 0 are distinguishable from theremaining levels through clear deviations between 1 and2 kHz7 and located just above the hornrsquos estimated Frarr

3dBRemarkably the formant shifts related to ΔFlt 0 (Fig-ure 3) are not reflected in the corresponding dyad SAIprofiles (Figure 9) which instead exhibit direct align-ment below 500 Hz for all three levels ΔF le 0 There-fore only ΔF gt 0 seems to lead to incomplete masking[13] revealing the presence of the synthesized instrumentwhereas the spectral-envelope variations ΔF lt 0 evokelittle change compared to the dyad excitation pattern forΔF = 0

Of still greater importance the auditory system as mod-eled by AIM using the DCGC seemingly involves a high-

7 Concerning the output from the AIM a misalignment between actualsinusoidal frequencies and the corresponding SAI peaks was observedThrough personal communication with the developer of the utilized AIMimplementation this was explained as being an inherent property of thedynamic-compression filters A correction function was derived by com-puting SAIs for various sinusoidal frequencies and fitting the two fre-quency scales yielding the linear function fSAI = 117 middot f + 282 Hz[r2(50)= 999 p lt 0001] In Figure 9 the correction manifests itself inthe compressed frequency extent for the SAIs

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

001

002

SAImagnitude

0

-II

-I

+I

+II

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

002

004

006

008

Frequency (Hz)

Figure 9 SAI profiles of dyads for allΔF levels (Experiment B)depicting two experimental conditions for horn Top pitch 1 uni-son bottom pitch 2 non-unison the grid lines correspond topartial-tone locations

pass characteristic that attenuates spectral-envelope re-gions below 500 Hz affecting the perceived magnitudesof the respective partial tones (grid lines) This impliesthat in the region below 500 Hz frequency deviations be-tween main formants no longer affect the achieved degreeof blend as reflected both in Figure 9 and the perceptualfindings Horn and bassoon would especially benefit fromthis as their main formants are centered around 500 HzOboe and trumpet both exhibiting higher Frarr

3dB can be as-sumed to benefit to a lesser degree In summary main for-mants located in proximity to 500 Hz will benefit more ontop of which lower pitches would also increase the num-ber of partial tones falling into this favorable frequencyregion This reflects tendencies for pitch-invariant traits inSAI correlations (see Section 22) to be more pronouncedat lower pitch ranges and for instruments of lower registerwhich would lend support to the lsquodarker-timbrersquo hypothe-sis in terms of limiting the spectral centroid in frequency

7 Conclusion

Evidence from acoustical and psychoacoustical descrip-tions of wind instruments and from perceptual validationshows that relative location and prominence of main for-mants affect the perception of timbre blend critically Fur-thermore these pitch-invariant spectral characteristics ex-plain and predict the perception of blend to a promis-ingly high degree Remaining discrepancies between theacoustic and perceptual domains can be explained throughapparent constraints of the simulated auditory systemIn conclusion a perceptual model for the contributionof local spectral-envelope characteristics to blending isproposed keeping in mind that it would serve as aninstrument-specific component in a more complex gen-eral perceptual model involving compositional and per-formance factors as initially discussed in Section 1 Themain factors influencing the perception of spectral blendare summarized1 Frequency relationships between upper bounds of main

formants are critical to blend Among several instru-

1050

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

ments one is expected to serve as the reference (eglead instrument dominant timbre) above which thepresence of other instrumentsrsquo formants would stronglyresult in decreased blend

2 Prominence of the main formants governs whetherthese relationships lead to plateau or linear blend pro-files and in the first case pitch-invariant perceptual ro-bustness extends to non-unison intervals

3 Spectral-envelope relationships below 500 Hz may benegligible due to constraints of the auditory system Atthe same time blend decreases at higher pitches due toa degraded perceptual robustness of formants

This hypothetical model still requires further investigationconcerning a more systematic study on 1) the apparentconstraints of the auditory system as modeled by AIM2) how in musical practice one instrument may functionas the reference 3) establishing a more specific descrip-tion of formant prominence and 4) addressing the contri-bution of loudness balance between instruments to blendThese future investigations will further validate and refinethe proposed perceptual model and will improve compu-tational prediction tools for the instrument-specific spec-tral component of blend Orchestration practice will ben-efit from these research efforts even beyond aiming forblend as knowledge of favorable instrument relationshipsalso informs orchestrators as to how to avoid it

References

[1] G J Sandell Concurrent timbres in orchestration a per-ceptual study of factors determining blend PhD disserta-tion Northwestern University 1991

[2] R A Kendall E C Carterette Identification and blend oftimbres as a basis for orchestration Contemp Music Rev9 (1993) 51ndash67

[3] C Reuter Die auditive Diskrimination von Orchesterin-strumenten - Verschmelzung und Heraushoumlrbarkeit vonInstrumentalklangfarben im Ensemblespiel (The auditorydiscrimination of orchestral instruments fusion and distin-guishability of instrumental timbres in ensemble playing)P Lang Frankfurt am Main 1996

[4] G J Sandell Roles for spectral centroid and other factorsin determining ldquoblended instrument pairings in orchestra-tion Music Percept 13 (1995) 209ndash246

[5] A S Bregman Auditory scene analysis the perceptual or-ganization of sound MIT Press Cambridge MA 1990

[6] S McAdams S Winsberg S Donnadieu G De Soete JKrimphoff Perceptual scaling of synthesized musical tim-bres common dimensions specificities and latent subjectclasses Psychol Res 58 (1995) 177ndash192

[7] L Wedin G Goude Dimension analysis of the perceptionof instrumental timbre Scand J Psychol 13 (1972) 228ndash240

[8] J M Grey J W Gordon Perceptual effects of spectralmodifications on musical timbres J Acoust Soc Am 63(1978) 1493ndash1500

[9] W Strong M Clark Synthesis of wind-instrument tonesJ Acoust Soc Am 41 (1967) 39ndash52

[10] D Tardieu S McAdams Perception of dyads of impulsiveand sustained instrument sounds Music Percept 30 (2012)117ndash128

[11] L Hermann Phonophotographische Untersuchungen(Phonophotographic investigations) Pfluumlger Archiv fuumlrdie Gesamte Physiologie des Menschen und der Thiere 58(1894) 264ndash279

[12] C Stumpf Die Sprachlaute (The speech sounds) SpringerBerlin 1926

[13] J P Fricke Zur Anwendung digitaler Klangfarbenfilter beiAufnahme und Wiedergabe (On the application of digitaltimbre filters during recording and playback) Proc 14thTonmeistertagung Munich Germany 1986 135ndash148

[14] K E Schumann Physik der Klangfarben (Physics of tim-bres) Professorial dissertation Universitaumlt Berlin Berlin1929

[15] E L Saldanha J F Corso Timbre cues and the identifica-tion of musical instruments J Acoust Soc Am 36 (1964)2021ndash2026

[16] W Strong M Clark Perturbations of synthetic orchestralwind-instrument tones J Acoust Soc Am 41 (1967) 277ndash285

[17] D Luce J Clark Physical correlates of brass-instrumenttones J Acoust Soc Am 42 (1967) 1232ndash1243

[18] D A Luce Dynamic spectrum changes of orchestral in-struments J Audio Eng Soc 23 (1975) 565ndash568

[19] J Meyer Acoustics and the performance of music 5th edSpringer New York 2009

[20] G Fant Acoustic theory of speech production MoutonThe Hague 1960

[21] C Reuter Klangfarbe und Instrumentation Geschichte ndashUrsachen ndash Wirkung (Timbre and instrumentation historyndash causes ndash effect) P Lang Frankfurt am Main 2002

[22] R van Dinther R D Patterson Perception of acousticscale and size in musical instrument sounds J AcoustSoc Am 120 (2006) 2158ndash2176

[23] T Irino R D Patterson A dynamic compressive gam-machirp auditory filterbank IEEE Trans Audio SpeechLang Process 14 (2006) 2222ndash2232

[24] B C Moore B R Glasberg Suggested formulae for cal-culating auditory-filter bandwidths and excitation patternsJ Acoust Soc Am 74 (1983) 750ndash753

[25] S-A Lembke S McAdams A spectral-envelope synthe-sis model to study perceptual blend between wind instru-ments Proc 11th Congregraves Franccedilais drsquoAcoustique IOAannual meeting Acoustics 2012 Nantes France 20121025ndash1030

[26] F Martin C Champlin Reconsidering the limits of normalhearing J Am Acad Audiol 11 (2000) 64ndash66

[27] ISO 389ndash8 Acoustics Reference zero for the calibrationof audiometric equipmentmdashPart 8 Reference equivalentthreshold sound pressure levels for pure tones and circum-aural earphones Tech Rept International Organization forStandardization Geneva Switzerland 2004

[28] W J Conover R L Iman Rank transformations as abridge between parametric and nonparametric statisticsAm Stat 35 (1981) 124ndash129

[29] J J Higgins S Tashtoush An aligned rank transform testfor interaction Nonlinear World 1 (1994) 201ndash221

[30] G Peeters B L Giordano P Susini N Misdariis SMcAdams The Timbre Toolbox extracting audio descrip-tors from musical signals J Acoust Soc Am 130 (2011)2902ndash2916

[31] S McAdams Spectral fusion spectral parsing and the for-mation of auditory images PhD dissertation Stanford Uni-versity 1984

1051

Page 3: The Role of Spectral-Envelope Characteristics in ... · Figure 1. Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels). Estimates are based

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10

Powerlevel(dB)

horn38 pitches

spectral-env estimate

composite distribution

formant maximum

formant bounds 3 dB

formant bounds 6 dB

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10oboe

34 pitches

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10clarinet

42 pitches

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10

Powerlevel(dB)

Frequency (Hz)

bassoon41 pitches

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10

Frequency (Hz)

trumpet31 pitches

0 1000 2000 3000 4000 5000

-40

-30

-20

-10

0

10

Frequency (Hz)

flute39 pitches

Figure 1 Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels) Estimates are based on thecomposite distribution of partial tones compiled from the specified number of pitches across the range of each instrument

nent spectral plateau ie the point exhibiting the flat-test slope along a region of decreasing magnitude towardshigher frequencies Furthermore descriptors for the mainformant F were derived from the estimated spectral enve-lope They comprised the frequencies of the formant max-imum Fmax as well as upper and lower bounds (eg Frarr

3dBand F larr

3dB) at which the power magnitude had decreased byeither 3 dB or 6 dB relative to Fmax

The spectral-envelope estimates for all investigated in-struments generally suggested pitch-invariant trends asshown in Figure 1 A narrower spread of the partial tones(circles) around the estimate (curve) argues for a strongerpitch-invariant trend The lower-pitched instruments hornand bassoon (left panels) exhibited strong tendencies forprominent spectral-envelope traits ie formants Higher-pitched instruments yielded two different kinds of descrip-tion Oboe and trumpet (middle panels) displayed mod-erately weaker pitch-invariant trends nonetheless exhibit-ing main formants with that of the trumpet being of con-siderable frequency extent compared to more locally con-strained ones reported for the other instruments Althoughstill following an apparent pitch-invariant trend the re-maining instruments clarinet and flute (right panels) dis-played only weakly pronounced formant structure withthe identified formants more resembling local spectralplateaus Furthermore the unique acoustical trait of theclarinet concerning its low chalumeau register preventedany valid assumption of pitch invariance to be made forthe lower frequency range This register is characterizedby a marked attenuation of the lower even-order partialswhose locations accordingly varied as a function of pitchFigure 1 also displays the associated formant descriptors(vertical lines) from which it can be shown that the iden-tified main formant for the clarinet (top-right panel) waslocated above the pitch-variant low frequencies

22 Auditory-model representation

If pitch-invariant spectral-envelope characteristics are per-ceptually relevant they should also become apparent ina representation closer to perception like the output ofa computational auditory model Using the AIM SAIswere derived from the DCGC basilar-membrane modelcomprising 50 filter channels equidistantly spaced alongequivalent-rectangular-bandwidth (ERB) rate [24] andcovering the audible range up to 5 kHz2 A time-averagedSAI magnitude profile was obtained by computing the me-dians across time frames per filter channel which resem-bled the auditory excitation pattern [22]

A strong similarity among SAIs across an extendedrange of pitches was taken as an indicator for pitch-invariant tendencies Pearson correlation matrices for allpossible pitch combinations were computed comparingthe profiles of SAI magnitudes over filter channels In ad-dition this approach also aided in identifying the limitsof pitch invariance as adjacent regions exhibiting weakercorrelations delimited instrument registers where SAIsvaried as a function of pitch Three representative casesare illustrated in Figure 2 For horn (left panel) and bas-soon (not shown) broad regions of pitch-invariant SAIprofiles became apparent (dark square) spanning largeparts of their ranges up to pitches of about D4 Oboe(middle panel) and trumpet (not shown) exhibited moreconstrained and fragmented regions of high SAI similar-ity contrasted by increasingly pitch-variant SAI profilesabove A4 For these four instruments pitch-invariant char-acterization appeared to be more prevalent and stable in

2 As band-limited analysis economized computational cost and no promi-nent formants above 5 kHz were found the audio samples were sub-sampled by a factor of 4 to a sampling rate of 11025 Hz only for thepurposes of analysis with AIM

1041

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Figure 2 (Color online) SAI correlation matrices for horn oboe and clarinet (left-to-right) Correlations (Pearson r) between SAImagnitudes across 50 filter channels consider all possible pitch combinations with obtained r falling within [0 1] (see legend far-right)

lower pitch regions from which low-pitched instrumentsin particular would benefit All of these instruments lostpitch-invariant tendencies in their high registers The re-maining instruments clarinet (right panel) and flute (notshown) lacked widespread pitch-invariant SAI charac-teristics as strong patterns of correlation were only ob-tained between directly adjacent pitches (diagonal) andnot across wider pitch regions

23 Parametric variation of main-formant fre-quency

In order to study the contribution of local variations ofspectral characteristics a synthesis model was employedthat provided parametric control over separate spectral-envelope components The synthesis infrastructure reliedon a source-filter model and was realized for real-timemodification of the control parameters [25] based onwhich the spectral envelope remained invariant to pitchchanges During synthesis the filter structure was fed aharmonic source signal of variable fundamental frequencycontaining harmonics up to 5 kHz The filter structure con-sisted of two independent filters modeling the main for-mant on the one hand and the remaining spectral-enveloperegions on the other A parameter allowing the main for-mant to be shifted in frequency relative to the remain-ing regions was implemented as an absolute deviation ΔFin Hz from a predefined origin ie ΔF = 0 Analoguemodels for each instrument were designed for ΔF = 0 bymatching the frequency response of the composite filterstructure to the spectral-envelope estimates as illustratedin Figure 3 for the horn (dashed black line) superimposedover its corresponding estimate (solid grey line) The ana-logues were not meant to deliver realistic emulations ofthe instruments per se but rather to achieve a good fitbetween the main formants of the analogue and spectral-envelope estimate Limiting differences in shape betweenmain formants helped to deduce the measured perceptualdifferences that resulted from frequency relationships be-tween them It should be noted that the synthesis filterstructure for the clarinet excluded its pitch-variant lower

500 1000 1500 2000 2500 3000 3500 4000 4500 5000-50

-40

-30

-20

-10

0

Powerlevel(dB)

Frequency (Hz)

spectral-envelope estimate

formant bounds 3 dB

synthesis filter for ΔF= 0

synthesis filters for ΔFne 0

-II -I 0 +I+II

Figure 3 Spectral-envelope estimate of horn and filter magni-tude responses of its synthesis analogue The original analogueis modeled for ΔF = 0 the other responses are variations ofΔF The top axis displays the equivalent scale for the five ΔF levelsinvestigated in Experiment B

frequency region (see Section 21) It only modeled theformant above that region as well as the remaining spec-tral envelope towards higher frequencies in order to orientthe investigation toward specifically testing the relevanceof the identified albeit less pronounced formant

3 General methods

The perceptual relevance of main-formant frequency toblending was tested for sound dyads All dyads compriseda sampled instrument and its synthesized analogue modelIn a given dyad the instrument sample was constant andits synthesized analogue was variable with respect to theparameter ΔF Variations with ΔF gt 0 shifted the mainformant of the synthesized sound higher in frequency rel-ative to the sampled instrumentrsquos main formant and ac-cordingly ΔF lt 0 corresponded to shifts toward lowerfrequencies Two perceptual experiments were conductedto investigate how ΔF variations relate to blend In Ex-periment A participants controlled ΔF directly and were

1042

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

asked to find the ΔF that gave optimal blend whereasin Experiment B listeners provided direct blend ratingsfor predefined ΔF variations Using the instruments pre-sented in Section 2 the robustness of perceptual effectswas assessed over the two experimental tasks for differ-ent pitches unison and non-unison intervals and stimuluscontexts Given six instruments and various pitches andintervals it was impractical to test all possible combina-tions An exploratory approach was chosen instead withnot all instruments being tested across all factors Whereasthis could reveal blend-related dependencies concerningΔF across the factors of interest it did not allow gener-alizing across all instruments to the same degree as well asdetermining perceptual thresholds Still each factor wasstudied with at least three instruments for greater gener-alizability Pitches were chosen to represent common reg-isters of the individual instruments Non-unison intervalsincluded both smaller and larger intervals

The methods both experiments share in common arepresented in this section before addressing their specificsand results in the following sections

31 ParticipantsDue to the demanding experimental tasks participantsof both experiments were musically experienced listen-ers They were recruited primarily from the SchulichSchool of Music McGill University Their backgroundswere assessed through self-reported degree of formalmusical training accumulated across several disciplineseg instrumental performance composition music the-ory andor sound recording All participants passed a stan-dardized hearing test [26 27] No participant took part inboth experiments

32 StimuliAll stimuli involved dyads comprising one sampled(drawn from VSL) and one synthesized sound For a sam-ple at any given pitch the spectral envelope was ap-proximated by the pitch-generalized description from Sec-tion 21 which resulted in the main formants of sampledand synthesized sounds resembling each other for ΔF = 0With regard to the temporal envelope both instrumentswere synchronized in their note onsets followed by thesustain portion and ending with an artificial 100-ms linearamplitude decay ramp applied to both instrument soundsThe sampled sound retained its original onset character-istics whereas across all modeled analogues the synthe-sized onsets were characterized by a constant 100-ms lin-ear amplitude ramp Stimuli were presented over a stan-dard two-channel stereophonic loudspeaker setup insidean Industrial Acoustics Company double-walled soundbooth with the instruments simulated as being capturedby a stereo main microphone at spatially distinct locationsinside a mid-sized moderately reverberant room (see [25]for details)

33 ProcedureExperimental conditions were presented in randomized or-der within blocks of repetitions A specific condition could

not occur twice in succession between blocks The mainexperiments were in each case preceded by 10 practice tri-als under the guidance of the experimenter to familiar-ize participants with the task and with representative ex-amples of stimulus variations Dyads were played repeat-edly throughout experimental trials allowing participantsto pause playback at any time

34 Data analysis

With respect to investigated factors Experiment A evalu-ated the influence of the factors instrument register and in-terval type Experiment B assessed pitch-invariant percep-tual performance across a number of factors and further-more correlated the perceptual data with spectral-envelopetraits Separate analyses of variance (ANOVAs) were con-ducted for each instrument testing for statistically signifi-cant main effects within factors and interaction effects be-tween them A criterion significance level α = 05 waschosen and if multiple analyses on split factor levels orindividual post-hoc analyses were conducted Bonferronicorrections were applied In repeated-measures ANOVAsthe Greenhouse-Geisser correction (ε) was applied when-ever the assumption of sphericity was violated In additionExperiment A also considered one-sample t-tests againsta mean of zero for testing differences to ΔF = 0 Statis-tical effect sizes η2

p and r are reported for ANOVAs andt-tests respectively The analyses considered participant-based averages for trial repetitions of identical conditions

4 Experiment A

41 Method

411 Participants

The experiment was conducted with 17 participants 6 fe-male and 11 male with a median age of 27 years (range20-57) Fifteen participants reported more than 10 yearsof formal musical training with 10 indicating experi-ence with wind instruments Participants were remuner-ated with 15 Canadian dollars

412 Stimuli

Table I lists the 17 investigated dyad conditions (columnentries of bottom row) All instruments included unisonintervals (0 semitones ST) With regard to additional fac-tors three levels of the Interval factor compared unisonintervals to consonant (7 or minus3 ST) and dissonant (6 orminus2 ST) non-unison intervals Two levels of the Registerfactor contrasted low (A2 C4 or E3) to high (D5 or B5) in-strument registers for unison dyads with the high-registerpitches being derived from the pitch-variant regions iden-tified in Section 22 The sampled sound remained at theindicated reference pitch whereas the synthesized soundvaried relative to it to form the non-unison intervals Alldyads had constant durations of 4900 ms The level bal-ance between instruments was variable and determined bythe participant

1043

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table I Seventeen dyad conditions from Experiment A across instruments pitches and intervals (top-to-bottom) Intervals in semitonesrelative to the specified reference pitch

horn bassoon oboe trumpet clarinet fluteC3 A2 D5 C4 C4 B5 E3 D5 C4

0 6 7 0 -2 -3 0 0 0 6 7 0 0 -2 -3 0 0

ΔF

-II -I +II+I0

A

B

Γ= -100 Hz fslider

high

low

Figure 4 ΔF variations investigated in Experiments A and B AParticipants controlled fslider which provided a constant range of700 Hz (white arrows) Γ (eg minus100 Hz) represented a random-ized roving parameter preventing the range from always beingcentered on ΔF = 0 B Participants rated four dyads varyingin ΔF drawn from the low or high context The contexts repre-sented subsets of four of the total of five predefined ΔF levels

413 Procedure

A production task required participants to adjust ΔF di-rectly in order to achieve the maximum attainable blendwith the produced value serving as the dependent vari-able User control was provided via a two-dimensionalgraphical interface including controls for ΔF and thelevel balance between instruments The slider controls forΔF= fslider + Γ provided a constant range of 700 Hzwith fslider isin [minus350+350] and including a randomizedroving offset Γ isin [minus100+100] between trials As visu-alized in Figure 4 (top) minimal or maximal Γ limitedthe range covered by all trials to 500 Hz (solid thick greyline) with all possible ΔF deviations spanning a range of900 Hz (dashed thick line) Participants completed a totalof 88 experimental trials (22 conditions3 times 4 repetitions)taking about 50 minutes and including a 5-minute breakafter about 44 trials

42 Results

421 General trends

For all six instruments participants associated optimalblend with deviations ΔF le 0 Figure 6 (diamonds inlower part) illustrates the means for optimal ΔF fromwhich two different patterns become apparent among in-struments ΔF are displayed relative to a scale of equiv-alent variations tested in Experiment B with the scalevalue 0 corresponding to ΔF = 04 The grey lines indicate

3 Only 17 conditions investigated ΔF the remainder studied other for-mant properties that lie outside the focus of this article For instancerelative main-formant magnitude variations ΔL were also investigatedwhich will be addressed in a future publication4 ΔF were linearly interpolated to a scale of equi-distant levels eg minusI 0 and +I corresponding to the numerical scale values -1 0 and 1 re-spectively

each instrumentrsquos respective slider range For the instru-ments horn bassoon oboe and trumpet (left panel) opti-mal blend was found in direct proximity to ΔF = 0 Forthe unison intervals of horn and bassoon ΔF did not differsignificantly from zero ΔF for the other two instrumentswere located slightly lower [t(16) le minus56 p lt 0001rge 82] By contrast optimal ΔF for the clarinet and flute(right panel) were relatively distant from ΔF = 0 in linewith significant underestimations [t(16)leminus38 p le 0015rge 69] In summary ΔF values leading to optimal blendwere limited to cases in which the formant of the synthe-sized instrument was at or below that of the sampled in-strument

422 Instrument register and interval type

The influence of instrument register on the optimal ΔFwas investigated for trumpet bassoon and clarinet atpitches corresponding to instrument-specific low and highregisters One-way repeated-measures ANOVAs for Reg-ister yielded moderately strong main effects for trumpetand bassoon [F (1 16) ge 192 p le 0005 η2

p ge 55] dueto an increase of the optimal ΔF in the high registerA less pronounced effect was obtained for the clarinet[F (1 16)=53p= 0358η2

p = 25]The investigation of possible differences in optimal ΔF

between interval types involved comparisons between uni-son and non-unison intervals as well as a distinction be-tween consonant and dissonant for the latter One-wayrepeated-measures ANOVAs on Interval conducted forbassoon clarinet and trumpet only led to a weak maineffect for the trumpet [F (2 32)= 37p= 0347η2

p = 19]Post-hoc tests for the three possible comparisons yieldeda single significant difference between the interval sizes 0and 6 ST [t(16) = minus35 p = 0033 r = 65]

5 Experiment B

51 Method

511 Participants

The experiment was completed by 20 participants 9 fe-male and 11 male with a median age of 22 years (range18ndash35) Fifteen participants reported more than 10 yearsof formal musical training with 11 indicating experi-ence with wind instruments Participants were remuner-ated with 20 Canadian dollars

512 Stimuli

Table II lists the 22 investigated dyad conditions The In-terval factor investigated two levels comparing unison to

1044

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table II Twenty-two dyad conditions from Experiment B across instruments pitches and intervals (top-to-bottom) Intervals in semi-tones relative to the specified reference pitch

horn bassoon oboe trumpet clarinet fluteC3 B 3 A2 D4 C4 G 4 E5 C4 B 4 E3 A4 C4 G 4 E5

0 6 0 6 0 -2 0 -2 0 0 0 0 6 0 6 0 -2 0 -2 0 0 0

non-unison (6 or -2 ST dissonant) intervals Depending onthe instrument the Pitch factor involved two (horn bas-soon trumpet clarinet) or three (oboe flute) levels In thecase of horn bassoon trumpet and clarinet there weretwo levels of Interval for each level of Pitch In additionthis experiment included two factors that were related toΔF variations alone which applied to all conditions listedin Table II The first was synonymous with ΔF as it ex-plored a total of five ΔF levels including ΔF = 0 and twosets of predefined moderate and extreme deviations aboveand below it ie the ΔF levels hereafter labeled 0 plusmnI and plusmnII The second factor grouped the five levels contex-tually into two subsets of four which are denoted as lowand high contexts and defined in Figure 4 (bottom)

Employing the formant descriptors from Section 21the investigated ΔF levels were expressed on a commonscale of spectral-envelope description which provided abetter basis of comparison than taking equal frequencydifferences in Hz as the frequency extent of formantsacross instruments varied considerably Figure 5 providesexamples for all resulting ΔF levels of the horn The fourlevels ΔF = 0 were defined as frequency distances be-tween the formant maximum Fmax and measures relatedto the location and width of its bounds (eg Frarr

6dB andΔF3dB respectively) For example the positive deviationΔF (+I) was the distance between the formant maximumand its upper bound minus 10 of the width betweenthe 3 dB bounds If spectral-envelope descriptions lackedlower bounds (eg trumpet clarinet flute) the frequencylocated below Fmax that exhibited the lowest magnitudewas taken as a substitute value

Unlike the dyads in Experiment A the synthesizedsound always remained at the reference pitch whereas thesampled sound varied its pitch for non-unison intervalsbecause this tested the assumption of pitch-invariant de-scription for the recorded sounds more thoroughly Thedyads had a constant duration of 4700 ms In additionthe conditions listed in Table II including the associatedfive ΔF levels per condition had predetermined values forthe level balance between sounds and had also been equal-ized for loudness The first author determined the level bal-ance aiming for good balance between both sounds whilemaintaining discriminability between ΔF levels whichwas subsequently verified by the second author Loudnessequalization was conducted subjectively in a separate pi-lot experiment anchored to a global reference dyad for allconditions and ΔF levels For all stimuli gain levels weredetermined that equalized stimulus loudness to the globalreference These gain levels were based on median val-ues from at least five participants which were determined

200 400 600 800 1000 1200-12

-10

-8

-6

-4

-2

0

Powerlevel(dB)

Frequency (Hz)

ΔF3dB

F3dB

larrF3dB

rarrFmax

F6dB

larrF6dB

rarr

-II -I 0 +I +II

Figure 5 ΔF levels from Experiment B defined relative toa spectral-envelope estimatersquos formant maximum and boundsΔF (plusmnI) fall 10 inside of ΔF3dBrsquos extent ΔF (+II) aligns withFrarr

6dB whereas ΔF (minusII) aligns with either 80 middot F larr6dB or 150 Hz

whichever is closer to Fmax

either after the corresponding interquartile ranges first fellbelow 4 dB or after running a maximum of 10 participants

513 ProcedureA relative-rating task required participants to compare ΔFlevels for a given condition from Table II In each ex-perimental trial participants were presented four dyadsand asked to provide four corresponding ratings The fourdyads represented one of the twoΔF contexts labeled highand low in Figure 4 A continuous rating scale was em-ployed which spanned from most blended to least blended(values 1 to 0 respectively) and served as the dependentvariable Participants needed to assign two dyads to thescale extremes (eg most and least) the remaining twodyads were positioned along the scale continuum relativeto the chosen extremes Playback could be switched freelybetween the four dyads with the visual order of the se-lection buttons and rating scales for individual dyads ran-domized between trials Participants completed 120 trials(30 conditions5 times 2 contextstimes 2 repetitions) taking about75 minutes including two 5-minute breaks after about 40and 80 trials

52 Results

521 General trendsGroup medians of ratings aggregated across the factorsPitch Interval and Context illustrate how blend varies as a

5 Only the 22 conditions investigating ΔF are reported here

1045

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

function of ΔF Figure 6 suggests that participants mainlyassociated higher degrees of blend with the levels ΔF le 0whereas much lower ratings were obtained for ΔF gt 0 Interms of higher degrees of blend two typical rating pro-files as a function of ΔF emerged (shown as the ideal-ized dashed-and-dotted curves in Figure 6) 1) For the in-struments horn bassoon oboe and trumpet (left panel)medium to high blend ratings were obtained at and be-low ΔF = 0 above which ratings decreased markedlyresembling the profile of a plateau 2) The instrumentsclarinet and flute (right panel) exhibited a monotonicallydecreasing and approximately linear rating profile as ΔFincreases in which ΔF = 0 did not appear to assume a no-table role These differences in plateau vs linear profilesfor the two instrument subsets are analyzed more closelyin the following sections also taking into account potentialeffects due to the other factors

522 Blend and pitch invariance

Spectral characteristics that remain stable with pitch varia-tion such as formants may have a pitch-invariant percep-tual relevance To test this whenever the profiles of blendratings over ΔF remained largely unaffected by differentpitches intervals and ΔF contexts the perceptual resultswere assumed to be pitch-invariant For instance Figure 7suggests this tendency for the horn in which the plateauprofile was maintained over all factorial manipulations Asa first step the main effects across ΔF were tested to con-firm that ratings served as reliable indicators of perceptualdifferences Given these main effects perceptual robust-ness to pitch variation was fulfilled if no ΔF times Pitch orΔF times Interval interaction effects were found across bothΔF contexts An absence of main effects betweenΔF con-texts would indicate further perceptual robustness

As the Context factor only involved ΔF levels com-mon to both the high and low contexts namely 0 andplusmnI (see Figure 4 bottom) the ratings for these levels re-quired range normalization and separate analyses from theremaining factors For the instruments involving the In-terval factor these were conducted on split levels of thatfactor Furthermore the experimental task imposed the us-age of the rating-scale extremes which resulted in severalviolations of normality due to skewed distributions for thedyads selected as extremes As a result all main and in-teraction effects were tested with a battery of five indepen-dent repeated-measures ANOVAs on the raw and trans-formed ratings The data transformations included non-parametric approaches of rank transformation [28] andprior alignment of lsquonuisancersquo factors [29]6 The statistics

6 Given the unavailability of non-parametric alternatives for repeated-measures three-way ANOVAs that include tests for interaction effectsan approach was chosen that assesses tests over multiple variants ofdependent-variable transformations presuming that the most conserva-tive test in the ANOVA battery minimizes accepting false positives Ranktransformation is a common approach in non-parametric tests such as theone-way Friedman test [28] Issues with tests for interaction effects los-ing power in the presence of strong main effects were addressed throughlsquoalignmentrsquo of the raw data prior to rank transformation [29] For in-stance a test for the interaction AtimesB would align its lsquonuisancersquo factors

-II -I 0 +I +II

0

1

ΔF

Blendrating

Produced optimal blend

-II -I 0 +I +IIΔF

Produced optimal blend

horn

bassoon

oboe

trumpet

profile

clarinet

flute

profile

Figure 6 Perceptual results for the different instruments groupedaccording to two typical response patterns (left and right panels)Experiment A (diamonds bottom) mean ΔF for produced opti-mal blend transformed to a continuous scale of ΔF levels Thegrey lines indicate slider ranges (compare to Figure 4 top) Ex-periment B (curves) median blend ratings across ΔF levels andidealized profile

-II -I 0 +I

0

1

context lowinterval unison

Blendrating

-I 0 +I +II

0

1

context highinterval unison

pitch 1

pitch 2

-II -I 0 +I

0

1

context lowinterval non-unison

Blendrating

ΔF-I 0 +I +II

0

1

context high

interval non-unison

ΔF

Figure 7 Medians and interquartile ranges of blend ratingsfor horn across ΔF levels and the factorial manipulationsPitchtimesContexttimesInterval

for the most liberal and conservative p-values are reported(eg conserv|liberal) with the conservative finding beingassumed valid if statistical significance is in doubt

Strong main effects were found for all instrumentswhich indicated clear differences in perceived blendamong the investigated ΔF levels Table III lists ANOVAstatistics for the range between strongest (clarinet) andweakest (bassoon) main effects among the instrumentswhich reflects analogous differences in the utilized rating-scale ranges in Figure 6 Furthermore the rating profiles ofthe instruments horn bassoon oboe and trumpet fulfilledthe criteria for pitch-invariant robustness as they appeared

by removing the main effects for A and B The four data transformationsprocessed the raw data with or without alignment and for two rankingmethods The first method computed global ranks across the entire dataset ie across participants and conditions whereas the second methodevaluated within-participant ranks across conditions per participant

1046

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table III Range of strongest to weakest ANOVA main effects along ΔF across all six instruments

low context high contextEffect Stat conserv liberal conserv liberal

F 860 826 1651 1651Clarinet df 16307 21391 357 357(strongest) ε 54 69 - -

p lt0001 lt0001 lt0001 lt0001η2p 82 81 90 90

F 164 168 126 152Bassoon df 357 357 14271 357(weakest) ε - - 48 -

p lt0001 lt0001 0005 0001η2p 46 47 40 44

unaffected by pitch-related variation There was only oneexception from a complete absence of effects interactingwith ΔF a moderate main effect for Context for trum-pet was found only at non-unison intervals [F (1 19) =1048|2504 p = 0043|0001 η2

p = 355|569] By con-trast the rating profiles for clarinet and flute did not exhibitpitch invariance as they clearly violated the criteria acrossboth ΔF contexts The interaction effects with ΔF and amain effect for Context leading to their disqualification aredescribed in Table IV

The instruments displaying pitch invariance were thesame ones with plateau rating profiles possibly attributinga special role to ΔF = 0 as defining a boundary governingblend To further support this assumption by joint analysisof the four instruments two hierarchical cluster analyseswere employed that grouped ΔF levels based on their sim-ilarity in perceptual ratings or auditory-model representa-tions The first cluster analysis reinterpreted rating differ-ences between ΔF as a dissimilarity measure This mea-sure considered effect sizes of statistically significant non-parametric post-hoc analyses (Wilcoxon signed-rank test)for pairwise comparisons between ΔF levels ie greaterstatistical effects between two ΔF levels were expressedas being more dissimilar in the perceived degree of blendFor non-significant differences dissimilarity was assumedto be zero The second analysis relied on correlation coef-ficients (Pearson r) between dyad SAI profiles across ΔFlevels (see Figure 9 for examples) The dissimilarity mea-sure considered the complement value 1minus r and as all cor-relations fall within the range [0 1] no special treatmentfor negative correlations was required Both cluster anal-yses employed complete-linkage algorithms The dissimi-larity input matrices were obtained by averaging 30 inde-pendent data sets aggregated across the four instrumentsand the factors Context Pitch and Interval As shown inFigure 8 both analyses led to analogous solutions in whichthe two levels ΔF gt 0 are maximally dissimilar to acompact cluster associating the three levels ΔF le 0 Inother words ΔF associated with low and high degrees ofblend group into two distinct clusters clearly relating tothe plateau profile where ΔF = 0 defines the boundary tohigher degrees of blend

Table IV ANOVA effects for clarinet and flute suggesting an ab-sence of pitch invariance lowast The column header low context doesnot apply in this case

Clarinet low context high contextEffect Stat conserv liberal conserv liberal

F 29 34 38 44ΔF times df 357 357 20387 357Interval ε - - 68 -

p 044 024 030 008η2p 13 15 17 19

Flute low context high contextEffect Stat conserv liberal conserv liberal

F 28 43 44 72ΔF times df 39750 6114 6114 6114Pitch ε 66 - - -

p 031 0006 0005 lt0001η2p 13 18 19 28

F 49 157 - -df 119 119 - -

Contextlowast ε - - - -p 039 0008 - -η2p 21 45 - -

-II -I 0 +I +II

01

02

03

04

05

Dissimilarity

ΔF-II -I 0 +I +II

01

02

ΔF

Figure 8 Dendrograms of ΔF -level groupings for the pitch-invariant instruments Dissimilarity measures are derived fromperceptual ratings (left) and auditory-modelled dyad SAI profiles(right)

523 Blend and its spectral correlatesExplaining blending between instruments with the helpof spectral-envelope characteristics could eventually al-

1047

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table V Variables entering stepwise-regression algorithm to ob-tain models reported in Table VI a Not for pitch-variant subsetinter-variable correlation |r| gt 7 b Computed as described in[30] c Accounting for pitch d Accounting for perceivable beat-ing between partial tones

No Variable Description

1 ΔLrarr3dB derivate of Frarr

3dB Equation (1)2 ΔLF1vsF2 ΔL betw formants F1 amp F2

3 ΔSslopeF1ab spectral slope above Frarr

3dB4 ΔSslope

b global spectral slope5 |ΔScentroid

b absolute centroid difference6 Scentroid

b centroid composite7 ERBrate

c reference pitch in Table II8 I(non)unison interval category (binary)9 IST

a interval size in semitones10 Clohi ΔF context (binary)11 mixratio balance betw instruments12 AMdepth

bd amplitude modulation depth

low the prediction of blend through these instrument-specific traits In addition it could help understand the wayin which spectral characteristics contribute perceptuallyGiven this aim multiple linear regression was employed tomodel the median blend ratings through a number of vari-ables or regressors The regression models assessed therelative contributions of regressors describing both globaland local spectral-envelope traits Global descriptors in-volved the commonly reported spectral centroid and spec-tral slope [30] whereas local descriptors concerned theformant characterization discussed in Section 21 Becausea dyad yielded two descriptor values across its constituentsounds the regressor measure had to associate the two insome way For the spectral centroid two measures wereconsidered namely composite (sum) and absolute differ-ence [4] Although the centroid composite relates to thelsquodarkerrsquo-timbre hypothesis mentioned in the Introductionthe centroid difference had still been found to best explainblend in non-unison intervals [4] which left some uncer-tainty as to which of these two measures was more appro-priate in explaining blend in general The remaining spec-tral regressors were implemented as polarity-preservingdifferences between descriptors with the sampled instru-ment serving as the reference (ref) and the synthesized in-strument being variable (var) across ΔF For example thedifference of descriptor values dx for instrument x wouldcorrespond to Δd = dref minus dvar

Regression models for two separate subsets of the per-ceptual data were explored pitch-invariant (horn bassoonoboe trumpet) and pitch-variant instruments (clarinetflute) The datasets comprised all conditions tested acrossfactors and instruments with N = 118 and N = 54 forthe pitch-invariant and -variant subsets respectively Re-gressor variables were pooled from the spectral-envelopedescriptors and additional variables that were included toaccount for potential confounding factors eg pitch in-terval If these factors did not contribute as regressorsthis would further support a perceptual relevance of pitch

invariance The initial pool of regressors comprised 32variables subsequently reduced to a pre-selected set thatexhibited inter-variable correlations |r| lt 7 in order toavoid pronounced collinearity among regressors The pre-selection was determined by first identifying the vari-able that in simple linear regression exhibited the high-est R2 and subsequently adding all remaining variablesthat yielded permissible inter-variable correlations Ta-ble V lists the pre-selected variables entered into the re-gression which comprised spectral-envelope descriptors(nos 1-6) and variables representing other potential fac-tors of influence (nos 7-12) Stepwise multiple-regressionalgorithms with both forward-selection and backward-elimination schemes were considered which converged onoptimum models by iteratively adding or eliminating re-gressors respectively Models with similar combinationsof regressors to the optimum models were explored aswell In anticipation of reporting the results the inclusionof a binary regressor for ΔF context Clohi benefited all in-vestigated regression models as it corrected the systematicoffset of scaled ratings between the low and high contexts(see Figure 7)

In simple linear regression the strongest spectral-enve-lope descriptors all concerned local formant characteriza-tion and did not involve the global descriptors Among theformant descriptors the highest correlations were obtainedfor the main-formant upper bound Frarr

3dB applied to bothpitch-invariant [R2(118) = 656 p lt 0001] and pitch-variant subsets [R2(54) = 713 p lt 0001] Notably theformant maximum Fmax did not perform better than Frarr

3dBlikely due to differing skewness properties between instru-ment formants (see Figure 1) At the same time the utilityof Frarr

3dB implies that it could assume an important role inexplaining blend as perhaps the perceptually most salientfeature of formants It performed slightly better for thepitch-variant than for the pitch-invariant subset becauseΔF rarr

3dB essentially follows a strictly monotonic functionacross the investigated ΔF levels which apparently mod-els the linear blend profile better To improve the model-ing for the plateau blend profile derivate descriptors ofFrarr

3dB were explored The most effective derivate ΔLrarr3dB re-

lated the upper-bound frequencies Frarr3dB of the two instru-

ments to a corresponding change in level across the ref-erence instrumentrsquos spectral-envelope magnitude functionLref (f ) in dB as formalized in Equation 1 In other wordsthis measure evaluated magnitude differences relative tothe reference instrument at frequencies appearing to be ofparticular perceptual relevance to both instruments whichtherefore may relate to spectral-masking effects (eg in-complete masking [13])

ΔLrarr3dB = Lref F rarr

3dB|ref minus Lref F rarr3dB|var (1)

The obtained solutions from stepwise multiple regressionyielded identical models for both instrument subsets in-volving the regressors ΔLrarr

3dB absolute spectral-centroiddifference |ΔScentroid| and context Clohi A slight gain inperformance was achieved by substituting the |ΔScentroid|based on audio signals from individual sounds with a vari-

1048

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table VI Multiple-regression models best predicting timbre-blend ratings for two instrument subsets

Pitch-invariant subset R2adj=87

F (3 116)=2724 p lt0001

Regressors βstd t p

ΔLrarr3dB 100 286 lt0001

|ΔScentroid| 26 77 lt0001Clohi 27 79 lt0001

Pitch-variant subset R2adj=88

F (3 52)=1342 p lt0001

Regressors βstd t p

ΔLrarr3dB 103 200 lt0001

|ΔScentroid| 16 33 0018Clohi 34 68 lt0001

ant computed on the pitch-generalized spectral-envelopeestimates Table VI displays these optimized regressionmodels for pitch-invariant and pitch-variant subsets bothleading to about 87 explained variance (adjusted R2)The standardized regression-slope coefficients βstd indi-cate the relative contribution of regressors with the rel-ative weights being very similar for both subsets In thesemodels ΔLrarr

3dB acted as the strongest predictor for theblend ratings contributing about five times more than|ΔScentroid| which furthermore did not perform better thanClohi These findings clearly argue for local spectral-envelope descriptors to be more meaningful than globalones in explaining blending in the investigated dyadsMoreover the remaining global descriptor spectral slopeappeared to play no role Furthermore finding both in-strument subsets to be modeled equally well through thesame spectral-envelope descriptors points to a general util-ity of pitch-generalized descriptions for all instrumentsDespite the findings in Section 522 arguing against pitch-invariant perceptual robustness for clarinet and flute theobtained regression models excluded the Pitch and Intervalvariables thus appearing to be less relevant to explainingthe blend ratings

6 General discussion

Orchestrators would benefit from acoustical descriptionsof instruments that correspond to the perceptual processesinvolved in achieving blended timbres Section 2 sug-gests that common orchestral wind instruments are rea-sonably well described through pitch-generalized spectral-envelope estimates which furthermore show the instru-ments horn bassoon oboe and trumpet to be charac-terized by prominent formant structure Auditory mod-els employing stabilized auditory images (SAI) confirmthat for strong formant characterization and for lower tomiddle pitch ranges the pitch-invariant characterization isstable In higher instrument registers however SAI pro-files indicate limitations to pitch-invariant characteriza-tion Other instruments like clarinet and flute yield SAI

profiles clearly varying as a function of pitch implyingthat this pitch dependency may also extend to perception

The perceptual investigation in Sections 3 to 5 con-firms the acoustical implications showing that strong for-mant characterization results in main formants becomingperceptually relevant to blending Given a dyad in whicha putative main formant is variable in frequency relativeto a fixed reference formant the investigated instrumentsdisplay two archetypical profiles based on their formantprominence For the pitch-variant clarinet and flute blendincreases as a monotonic quasi-linear function if the vari-able formant moves from above to below the referenceFor pitch-invariant instruments the frequency alignmentbetween the variable formant and the reference (ΔF= 0)functions as a boundary delimiting a region of higher de-grees of blend at and below the reference and contrastedby a marked decrease in blend when the variable formantexceeds it which overall resembles a plateau profile Thepitch invariance even extends to different interval typesas the plateau profile remains unaffected in non-unisonintervals regardless of their degree of consonance How-ever the findings suggest that the perceptual relevance ofspectral-envelope estimates diminishes in higher instru-ment registers The limited sampling of conditions acrossthe investigated factors prevented a more comprehensiveevaluation of all instruments Whereas the findings allowgeneral relationships for formant frequency to be deducedmore comprehensive investigations are needed to attainmore generalizable quantification of absolute frequencyranges and thresholds

In correlating acoustical and perceptual factors spec-tral-envelope characteristics alone explain up to 87 ofthe variance in blend ratings In addition local spectraltraits seem to be more powerful acoustical predictors ofblend than global traits The formant descriptor for theupper formant bound Frarr

3dB when expressed as a derivatedescriptor ΔLrarr

3dB acts as the strongest predictor for theblend ratings regardless of whether instruments belongto the pitch-invariant group or not With regard to clar-inet and flute the departures from pitch invariance foundin Section 522 contradict the utility of pitch-generalizedspectral-envelope description in predicting blend ratingsas reported in Section 523 Taking both findings into ac-count this for one argues that the descriptor Frarr

3dB stillsucceeds in explaining blend well even for clarinet andflute On the other hand the same instruments displaya greater perceptual sensitivity to the Pitch and Intervalfactors likely associated with their less pronounced for-mant structure Overall strong formant prominence leadsto more drastic changes in blend

The prediction of blend using ΔLrarr3dB still presumes

that one of the instruments serves as a reference formantas the employed difference descriptors are anchored tothe sampled instrument The dependence on a referenceleaves some ambiguity because an arbitrary combinationof two instruments would lead to contradictory predictionsof blend if both instruments were given equal importancein serving as the reference Given the context of both ex-

1049

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

periments it can be assumed that the sampled instrumentacting as a constant anchor had been biased into servingas the reference by combining it with a variable synthe-sized instrument In addition a possible perceptual expla-nation could concern audio samples of instruments playingwithout vibrato generally still exhibiting coherent micro-modulations of partial tones These modulations have beenshown to contribute to a stronger and more stable audi-tory image [31] and may thus bias the more stable imagetoward acting as the reference especially as the synthe-sized partials remain static over time Even in the contextof blending in musical performance one instrument as-sumes the role of the leading voice in which it possiblyserves as the reference whereas an accompanying instru-ment avoids exceeding the lead instrumentrsquos main-formantfrequency Likewise returning to the notion of blend lead-ing to augmented timbres [4] the dominant timbre to beembellished by another would seem predestined to func-tion as the reference based on which the less dominanttimbre should not exhibit formant frequencies exceedingthose of the reference

Finally the results allow a reassessment of previousexplanations for blend The lsquodarkerrsquo-timbre hypothesis[4] is directly reflected in the obtained linear blend pro-file in which lower ΔF increases blend and at the sametime causes a decrease in the spectral-centroid compos-ite By contrast this hypothesis is not well explained bythe plateau profile as blend ratings remain similarly highfor ΔF le 0 The alternative hypothesis of coincidence offormant regions [3] would have predicted stronger blendratings for ΔF = 0 than for all other levels which in theperceptual results is only achieved for the levels ΔF gt 0While the hypothesis only achieves partial fulfillment withrespect to spectral variations ΔF it finds more agreementin the corresponding SAI representations As shown in twoexample cases for horn in Figure 9 the dyad SAI pro-files for the levels ΔF gt 0 are distinguishable from theremaining levels through clear deviations between 1 and2 kHz7 and located just above the hornrsquos estimated Frarr

3dBRemarkably the formant shifts related to ΔFlt 0 (Fig-ure 3) are not reflected in the corresponding dyad SAIprofiles (Figure 9) which instead exhibit direct align-ment below 500 Hz for all three levels ΔF le 0 There-fore only ΔF gt 0 seems to lead to incomplete masking[13] revealing the presence of the synthesized instrumentwhereas the spectral-envelope variations ΔF lt 0 evokelittle change compared to the dyad excitation pattern forΔF = 0

Of still greater importance the auditory system as mod-eled by AIM using the DCGC seemingly involves a high-

7 Concerning the output from the AIM a misalignment between actualsinusoidal frequencies and the corresponding SAI peaks was observedThrough personal communication with the developer of the utilized AIMimplementation this was explained as being an inherent property of thedynamic-compression filters A correction function was derived by com-puting SAIs for various sinusoidal frequencies and fitting the two fre-quency scales yielding the linear function fSAI = 117 middot f + 282 Hz[r2(50)= 999 p lt 0001] In Figure 9 the correction manifests itself inthe compressed frequency extent for the SAIs

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

001

002

SAImagnitude

0

-II

-I

+I

+II

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

002

004

006

008

Frequency (Hz)

Figure 9 SAI profiles of dyads for allΔF levels (Experiment B)depicting two experimental conditions for horn Top pitch 1 uni-son bottom pitch 2 non-unison the grid lines correspond topartial-tone locations

pass characteristic that attenuates spectral-envelope re-gions below 500 Hz affecting the perceived magnitudesof the respective partial tones (grid lines) This impliesthat in the region below 500 Hz frequency deviations be-tween main formants no longer affect the achieved degreeof blend as reflected both in Figure 9 and the perceptualfindings Horn and bassoon would especially benefit fromthis as their main formants are centered around 500 HzOboe and trumpet both exhibiting higher Frarr

3dB can be as-sumed to benefit to a lesser degree In summary main for-mants located in proximity to 500 Hz will benefit more ontop of which lower pitches would also increase the num-ber of partial tones falling into this favorable frequencyregion This reflects tendencies for pitch-invariant traits inSAI correlations (see Section 22) to be more pronouncedat lower pitch ranges and for instruments of lower registerwhich would lend support to the lsquodarker-timbrersquo hypothe-sis in terms of limiting the spectral centroid in frequency

7 Conclusion

Evidence from acoustical and psychoacoustical descrip-tions of wind instruments and from perceptual validationshows that relative location and prominence of main for-mants affect the perception of timbre blend critically Fur-thermore these pitch-invariant spectral characteristics ex-plain and predict the perception of blend to a promis-ingly high degree Remaining discrepancies between theacoustic and perceptual domains can be explained throughapparent constraints of the simulated auditory systemIn conclusion a perceptual model for the contributionof local spectral-envelope characteristics to blending isproposed keeping in mind that it would serve as aninstrument-specific component in a more complex gen-eral perceptual model involving compositional and per-formance factors as initially discussed in Section 1 Themain factors influencing the perception of spectral blendare summarized1 Frequency relationships between upper bounds of main

formants are critical to blend Among several instru-

1050

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

ments one is expected to serve as the reference (eglead instrument dominant timbre) above which thepresence of other instrumentsrsquo formants would stronglyresult in decreased blend

2 Prominence of the main formants governs whetherthese relationships lead to plateau or linear blend pro-files and in the first case pitch-invariant perceptual ro-bustness extends to non-unison intervals

3 Spectral-envelope relationships below 500 Hz may benegligible due to constraints of the auditory system Atthe same time blend decreases at higher pitches due toa degraded perceptual robustness of formants

This hypothetical model still requires further investigationconcerning a more systematic study on 1) the apparentconstraints of the auditory system as modeled by AIM2) how in musical practice one instrument may functionas the reference 3) establishing a more specific descrip-tion of formant prominence and 4) addressing the contri-bution of loudness balance between instruments to blendThese future investigations will further validate and refinethe proposed perceptual model and will improve compu-tational prediction tools for the instrument-specific spec-tral component of blend Orchestration practice will ben-efit from these research efforts even beyond aiming forblend as knowledge of favorable instrument relationshipsalso informs orchestrators as to how to avoid it

References

[1] G J Sandell Concurrent timbres in orchestration a per-ceptual study of factors determining blend PhD disserta-tion Northwestern University 1991

[2] R A Kendall E C Carterette Identification and blend oftimbres as a basis for orchestration Contemp Music Rev9 (1993) 51ndash67

[3] C Reuter Die auditive Diskrimination von Orchesterin-strumenten - Verschmelzung und Heraushoumlrbarkeit vonInstrumentalklangfarben im Ensemblespiel (The auditorydiscrimination of orchestral instruments fusion and distin-guishability of instrumental timbres in ensemble playing)P Lang Frankfurt am Main 1996

[4] G J Sandell Roles for spectral centroid and other factorsin determining ldquoblended instrument pairings in orchestra-tion Music Percept 13 (1995) 209ndash246

[5] A S Bregman Auditory scene analysis the perceptual or-ganization of sound MIT Press Cambridge MA 1990

[6] S McAdams S Winsberg S Donnadieu G De Soete JKrimphoff Perceptual scaling of synthesized musical tim-bres common dimensions specificities and latent subjectclasses Psychol Res 58 (1995) 177ndash192

[7] L Wedin G Goude Dimension analysis of the perceptionof instrumental timbre Scand J Psychol 13 (1972) 228ndash240

[8] J M Grey J W Gordon Perceptual effects of spectralmodifications on musical timbres J Acoust Soc Am 63(1978) 1493ndash1500

[9] W Strong M Clark Synthesis of wind-instrument tonesJ Acoust Soc Am 41 (1967) 39ndash52

[10] D Tardieu S McAdams Perception of dyads of impulsiveand sustained instrument sounds Music Percept 30 (2012)117ndash128

[11] L Hermann Phonophotographische Untersuchungen(Phonophotographic investigations) Pfluumlger Archiv fuumlrdie Gesamte Physiologie des Menschen und der Thiere 58(1894) 264ndash279

[12] C Stumpf Die Sprachlaute (The speech sounds) SpringerBerlin 1926

[13] J P Fricke Zur Anwendung digitaler Klangfarbenfilter beiAufnahme und Wiedergabe (On the application of digitaltimbre filters during recording and playback) Proc 14thTonmeistertagung Munich Germany 1986 135ndash148

[14] K E Schumann Physik der Klangfarben (Physics of tim-bres) Professorial dissertation Universitaumlt Berlin Berlin1929

[15] E L Saldanha J F Corso Timbre cues and the identifica-tion of musical instruments J Acoust Soc Am 36 (1964)2021ndash2026

[16] W Strong M Clark Perturbations of synthetic orchestralwind-instrument tones J Acoust Soc Am 41 (1967) 277ndash285

[17] D Luce J Clark Physical correlates of brass-instrumenttones J Acoust Soc Am 42 (1967) 1232ndash1243

[18] D A Luce Dynamic spectrum changes of orchestral in-struments J Audio Eng Soc 23 (1975) 565ndash568

[19] J Meyer Acoustics and the performance of music 5th edSpringer New York 2009

[20] G Fant Acoustic theory of speech production MoutonThe Hague 1960

[21] C Reuter Klangfarbe und Instrumentation Geschichte ndashUrsachen ndash Wirkung (Timbre and instrumentation historyndash causes ndash effect) P Lang Frankfurt am Main 2002

[22] R van Dinther R D Patterson Perception of acousticscale and size in musical instrument sounds J AcoustSoc Am 120 (2006) 2158ndash2176

[23] T Irino R D Patterson A dynamic compressive gam-machirp auditory filterbank IEEE Trans Audio SpeechLang Process 14 (2006) 2222ndash2232

[24] B C Moore B R Glasberg Suggested formulae for cal-culating auditory-filter bandwidths and excitation patternsJ Acoust Soc Am 74 (1983) 750ndash753

[25] S-A Lembke S McAdams A spectral-envelope synthe-sis model to study perceptual blend between wind instru-ments Proc 11th Congregraves Franccedilais drsquoAcoustique IOAannual meeting Acoustics 2012 Nantes France 20121025ndash1030

[26] F Martin C Champlin Reconsidering the limits of normalhearing J Am Acad Audiol 11 (2000) 64ndash66

[27] ISO 389ndash8 Acoustics Reference zero for the calibrationof audiometric equipmentmdashPart 8 Reference equivalentthreshold sound pressure levels for pure tones and circum-aural earphones Tech Rept International Organization forStandardization Geneva Switzerland 2004

[28] W J Conover R L Iman Rank transformations as abridge between parametric and nonparametric statisticsAm Stat 35 (1981) 124ndash129

[29] J J Higgins S Tashtoush An aligned rank transform testfor interaction Nonlinear World 1 (1994) 201ndash221

[30] G Peeters B L Giordano P Susini N Misdariis SMcAdams The Timbre Toolbox extracting audio descrip-tors from musical signals J Acoust Soc Am 130 (2011)2902ndash2916

[31] S McAdams Spectral fusion spectral parsing and the for-mation of auditory images PhD dissertation Stanford Uni-versity 1984

1051

Page 4: The Role of Spectral-Envelope Characteristics in ... · Figure 1. Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels). Estimates are based

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Figure 2 (Color online) SAI correlation matrices for horn oboe and clarinet (left-to-right) Correlations (Pearson r) between SAImagnitudes across 50 filter channels consider all possible pitch combinations with obtained r falling within [0 1] (see legend far-right)

lower pitch regions from which low-pitched instrumentsin particular would benefit All of these instruments lostpitch-invariant tendencies in their high registers The re-maining instruments clarinet (right panel) and flute (notshown) lacked widespread pitch-invariant SAI charac-teristics as strong patterns of correlation were only ob-tained between directly adjacent pitches (diagonal) andnot across wider pitch regions

23 Parametric variation of main-formant fre-quency

In order to study the contribution of local variations ofspectral characteristics a synthesis model was employedthat provided parametric control over separate spectral-envelope components The synthesis infrastructure reliedon a source-filter model and was realized for real-timemodification of the control parameters [25] based onwhich the spectral envelope remained invariant to pitchchanges During synthesis the filter structure was fed aharmonic source signal of variable fundamental frequencycontaining harmonics up to 5 kHz The filter structure con-sisted of two independent filters modeling the main for-mant on the one hand and the remaining spectral-enveloperegions on the other A parameter allowing the main for-mant to be shifted in frequency relative to the remain-ing regions was implemented as an absolute deviation ΔFin Hz from a predefined origin ie ΔF = 0 Analoguemodels for each instrument were designed for ΔF = 0 bymatching the frequency response of the composite filterstructure to the spectral-envelope estimates as illustratedin Figure 3 for the horn (dashed black line) superimposedover its corresponding estimate (solid grey line) The ana-logues were not meant to deliver realistic emulations ofthe instruments per se but rather to achieve a good fitbetween the main formants of the analogue and spectral-envelope estimate Limiting differences in shape betweenmain formants helped to deduce the measured perceptualdifferences that resulted from frequency relationships be-tween them It should be noted that the synthesis filterstructure for the clarinet excluded its pitch-variant lower

500 1000 1500 2000 2500 3000 3500 4000 4500 5000-50

-40

-30

-20

-10

0

Powerlevel(dB)

Frequency (Hz)

spectral-envelope estimate

formant bounds 3 dB

synthesis filter for ΔF= 0

synthesis filters for ΔFne 0

-II -I 0 +I+II

Figure 3 Spectral-envelope estimate of horn and filter magni-tude responses of its synthesis analogue The original analogueis modeled for ΔF = 0 the other responses are variations ofΔF The top axis displays the equivalent scale for the five ΔF levelsinvestigated in Experiment B

frequency region (see Section 21) It only modeled theformant above that region as well as the remaining spec-tral envelope towards higher frequencies in order to orientthe investigation toward specifically testing the relevanceof the identified albeit less pronounced formant

3 General methods

The perceptual relevance of main-formant frequency toblending was tested for sound dyads All dyads compriseda sampled instrument and its synthesized analogue modelIn a given dyad the instrument sample was constant andits synthesized analogue was variable with respect to theparameter ΔF Variations with ΔF gt 0 shifted the mainformant of the synthesized sound higher in frequency rel-ative to the sampled instrumentrsquos main formant and ac-cordingly ΔF lt 0 corresponded to shifts toward lowerfrequencies Two perceptual experiments were conductedto investigate how ΔF variations relate to blend In Ex-periment A participants controlled ΔF directly and were

1042

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

asked to find the ΔF that gave optimal blend whereasin Experiment B listeners provided direct blend ratingsfor predefined ΔF variations Using the instruments pre-sented in Section 2 the robustness of perceptual effectswas assessed over the two experimental tasks for differ-ent pitches unison and non-unison intervals and stimuluscontexts Given six instruments and various pitches andintervals it was impractical to test all possible combina-tions An exploratory approach was chosen instead withnot all instruments being tested across all factors Whereasthis could reveal blend-related dependencies concerningΔF across the factors of interest it did not allow gener-alizing across all instruments to the same degree as well asdetermining perceptual thresholds Still each factor wasstudied with at least three instruments for greater gener-alizability Pitches were chosen to represent common reg-isters of the individual instruments Non-unison intervalsincluded both smaller and larger intervals

The methods both experiments share in common arepresented in this section before addressing their specificsand results in the following sections

31 ParticipantsDue to the demanding experimental tasks participantsof both experiments were musically experienced listen-ers They were recruited primarily from the SchulichSchool of Music McGill University Their backgroundswere assessed through self-reported degree of formalmusical training accumulated across several disciplineseg instrumental performance composition music the-ory andor sound recording All participants passed a stan-dardized hearing test [26 27] No participant took part inboth experiments

32 StimuliAll stimuli involved dyads comprising one sampled(drawn from VSL) and one synthesized sound For a sam-ple at any given pitch the spectral envelope was ap-proximated by the pitch-generalized description from Sec-tion 21 which resulted in the main formants of sampledand synthesized sounds resembling each other for ΔF = 0With regard to the temporal envelope both instrumentswere synchronized in their note onsets followed by thesustain portion and ending with an artificial 100-ms linearamplitude decay ramp applied to both instrument soundsThe sampled sound retained its original onset character-istics whereas across all modeled analogues the synthe-sized onsets were characterized by a constant 100-ms lin-ear amplitude ramp Stimuli were presented over a stan-dard two-channel stereophonic loudspeaker setup insidean Industrial Acoustics Company double-walled soundbooth with the instruments simulated as being capturedby a stereo main microphone at spatially distinct locationsinside a mid-sized moderately reverberant room (see [25]for details)

33 ProcedureExperimental conditions were presented in randomized or-der within blocks of repetitions A specific condition could

not occur twice in succession between blocks The mainexperiments were in each case preceded by 10 practice tri-als under the guidance of the experimenter to familiar-ize participants with the task and with representative ex-amples of stimulus variations Dyads were played repeat-edly throughout experimental trials allowing participantsto pause playback at any time

34 Data analysis

With respect to investigated factors Experiment A evalu-ated the influence of the factors instrument register and in-terval type Experiment B assessed pitch-invariant percep-tual performance across a number of factors and further-more correlated the perceptual data with spectral-envelopetraits Separate analyses of variance (ANOVAs) were con-ducted for each instrument testing for statistically signifi-cant main effects within factors and interaction effects be-tween them A criterion significance level α = 05 waschosen and if multiple analyses on split factor levels orindividual post-hoc analyses were conducted Bonferronicorrections were applied In repeated-measures ANOVAsthe Greenhouse-Geisser correction (ε) was applied when-ever the assumption of sphericity was violated In additionExperiment A also considered one-sample t-tests againsta mean of zero for testing differences to ΔF = 0 Statis-tical effect sizes η2

p and r are reported for ANOVAs andt-tests respectively The analyses considered participant-based averages for trial repetitions of identical conditions

4 Experiment A

41 Method

411 Participants

The experiment was conducted with 17 participants 6 fe-male and 11 male with a median age of 27 years (range20-57) Fifteen participants reported more than 10 yearsof formal musical training with 10 indicating experi-ence with wind instruments Participants were remuner-ated with 15 Canadian dollars

412 Stimuli

Table I lists the 17 investigated dyad conditions (columnentries of bottom row) All instruments included unisonintervals (0 semitones ST) With regard to additional fac-tors three levels of the Interval factor compared unisonintervals to consonant (7 or minus3 ST) and dissonant (6 orminus2 ST) non-unison intervals Two levels of the Registerfactor contrasted low (A2 C4 or E3) to high (D5 or B5) in-strument registers for unison dyads with the high-registerpitches being derived from the pitch-variant regions iden-tified in Section 22 The sampled sound remained at theindicated reference pitch whereas the synthesized soundvaried relative to it to form the non-unison intervals Alldyads had constant durations of 4900 ms The level bal-ance between instruments was variable and determined bythe participant

1043

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table I Seventeen dyad conditions from Experiment A across instruments pitches and intervals (top-to-bottom) Intervals in semitonesrelative to the specified reference pitch

horn bassoon oboe trumpet clarinet fluteC3 A2 D5 C4 C4 B5 E3 D5 C4

0 6 7 0 -2 -3 0 0 0 6 7 0 0 -2 -3 0 0

ΔF

-II -I +II+I0

A

B

Γ= -100 Hz fslider

high

low

Figure 4 ΔF variations investigated in Experiments A and B AParticipants controlled fslider which provided a constant range of700 Hz (white arrows) Γ (eg minus100 Hz) represented a random-ized roving parameter preventing the range from always beingcentered on ΔF = 0 B Participants rated four dyads varyingin ΔF drawn from the low or high context The contexts repre-sented subsets of four of the total of five predefined ΔF levels

413 Procedure

A production task required participants to adjust ΔF di-rectly in order to achieve the maximum attainable blendwith the produced value serving as the dependent vari-able User control was provided via a two-dimensionalgraphical interface including controls for ΔF and thelevel balance between instruments The slider controls forΔF= fslider + Γ provided a constant range of 700 Hzwith fslider isin [minus350+350] and including a randomizedroving offset Γ isin [minus100+100] between trials As visu-alized in Figure 4 (top) minimal or maximal Γ limitedthe range covered by all trials to 500 Hz (solid thick greyline) with all possible ΔF deviations spanning a range of900 Hz (dashed thick line) Participants completed a totalof 88 experimental trials (22 conditions3 times 4 repetitions)taking about 50 minutes and including a 5-minute breakafter about 44 trials

42 Results

421 General trends

For all six instruments participants associated optimalblend with deviations ΔF le 0 Figure 6 (diamonds inlower part) illustrates the means for optimal ΔF fromwhich two different patterns become apparent among in-struments ΔF are displayed relative to a scale of equiv-alent variations tested in Experiment B with the scalevalue 0 corresponding to ΔF = 04 The grey lines indicate

3 Only 17 conditions investigated ΔF the remainder studied other for-mant properties that lie outside the focus of this article For instancerelative main-formant magnitude variations ΔL were also investigatedwhich will be addressed in a future publication4 ΔF were linearly interpolated to a scale of equi-distant levels eg minusI 0 and +I corresponding to the numerical scale values -1 0 and 1 re-spectively

each instrumentrsquos respective slider range For the instru-ments horn bassoon oboe and trumpet (left panel) opti-mal blend was found in direct proximity to ΔF = 0 Forthe unison intervals of horn and bassoon ΔF did not differsignificantly from zero ΔF for the other two instrumentswere located slightly lower [t(16) le minus56 p lt 0001rge 82] By contrast optimal ΔF for the clarinet and flute(right panel) were relatively distant from ΔF = 0 in linewith significant underestimations [t(16)leminus38 p le 0015rge 69] In summary ΔF values leading to optimal blendwere limited to cases in which the formant of the synthe-sized instrument was at or below that of the sampled in-strument

422 Instrument register and interval type

The influence of instrument register on the optimal ΔFwas investigated for trumpet bassoon and clarinet atpitches corresponding to instrument-specific low and highregisters One-way repeated-measures ANOVAs for Reg-ister yielded moderately strong main effects for trumpetand bassoon [F (1 16) ge 192 p le 0005 η2

p ge 55] dueto an increase of the optimal ΔF in the high registerA less pronounced effect was obtained for the clarinet[F (1 16)=53p= 0358η2

p = 25]The investigation of possible differences in optimal ΔF

between interval types involved comparisons between uni-son and non-unison intervals as well as a distinction be-tween consonant and dissonant for the latter One-wayrepeated-measures ANOVAs on Interval conducted forbassoon clarinet and trumpet only led to a weak maineffect for the trumpet [F (2 32)= 37p= 0347η2

p = 19]Post-hoc tests for the three possible comparisons yieldeda single significant difference between the interval sizes 0and 6 ST [t(16) = minus35 p = 0033 r = 65]

5 Experiment B

51 Method

511 Participants

The experiment was completed by 20 participants 9 fe-male and 11 male with a median age of 22 years (range18ndash35) Fifteen participants reported more than 10 yearsof formal musical training with 11 indicating experi-ence with wind instruments Participants were remuner-ated with 20 Canadian dollars

512 Stimuli

Table II lists the 22 investigated dyad conditions The In-terval factor investigated two levels comparing unison to

1044

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table II Twenty-two dyad conditions from Experiment B across instruments pitches and intervals (top-to-bottom) Intervals in semi-tones relative to the specified reference pitch

horn bassoon oboe trumpet clarinet fluteC3 B 3 A2 D4 C4 G 4 E5 C4 B 4 E3 A4 C4 G 4 E5

0 6 0 6 0 -2 0 -2 0 0 0 0 6 0 6 0 -2 0 -2 0 0 0

non-unison (6 or -2 ST dissonant) intervals Depending onthe instrument the Pitch factor involved two (horn bas-soon trumpet clarinet) or three (oboe flute) levels In thecase of horn bassoon trumpet and clarinet there weretwo levels of Interval for each level of Pitch In additionthis experiment included two factors that were related toΔF variations alone which applied to all conditions listedin Table II The first was synonymous with ΔF as it ex-plored a total of five ΔF levels including ΔF = 0 and twosets of predefined moderate and extreme deviations aboveand below it ie the ΔF levels hereafter labeled 0 plusmnI and plusmnII The second factor grouped the five levels contex-tually into two subsets of four which are denoted as lowand high contexts and defined in Figure 4 (bottom)

Employing the formant descriptors from Section 21the investigated ΔF levels were expressed on a commonscale of spectral-envelope description which provided abetter basis of comparison than taking equal frequencydifferences in Hz as the frequency extent of formantsacross instruments varied considerably Figure 5 providesexamples for all resulting ΔF levels of the horn The fourlevels ΔF = 0 were defined as frequency distances be-tween the formant maximum Fmax and measures relatedto the location and width of its bounds (eg Frarr

6dB andΔF3dB respectively) For example the positive deviationΔF (+I) was the distance between the formant maximumand its upper bound minus 10 of the width betweenthe 3 dB bounds If spectral-envelope descriptions lackedlower bounds (eg trumpet clarinet flute) the frequencylocated below Fmax that exhibited the lowest magnitudewas taken as a substitute value

Unlike the dyads in Experiment A the synthesizedsound always remained at the reference pitch whereas thesampled sound varied its pitch for non-unison intervalsbecause this tested the assumption of pitch-invariant de-scription for the recorded sounds more thoroughly Thedyads had a constant duration of 4700 ms In additionthe conditions listed in Table II including the associatedfive ΔF levels per condition had predetermined values forthe level balance between sounds and had also been equal-ized for loudness The first author determined the level bal-ance aiming for good balance between both sounds whilemaintaining discriminability between ΔF levels whichwas subsequently verified by the second author Loudnessequalization was conducted subjectively in a separate pi-lot experiment anchored to a global reference dyad for allconditions and ΔF levels For all stimuli gain levels weredetermined that equalized stimulus loudness to the globalreference These gain levels were based on median val-ues from at least five participants which were determined

200 400 600 800 1000 1200-12

-10

-8

-6

-4

-2

0

Powerlevel(dB)

Frequency (Hz)

ΔF3dB

F3dB

larrF3dB

rarrFmax

F6dB

larrF6dB

rarr

-II -I 0 +I +II

Figure 5 ΔF levels from Experiment B defined relative toa spectral-envelope estimatersquos formant maximum and boundsΔF (plusmnI) fall 10 inside of ΔF3dBrsquos extent ΔF (+II) aligns withFrarr

6dB whereas ΔF (minusII) aligns with either 80 middot F larr6dB or 150 Hz

whichever is closer to Fmax

either after the corresponding interquartile ranges first fellbelow 4 dB or after running a maximum of 10 participants

513 ProcedureA relative-rating task required participants to compare ΔFlevels for a given condition from Table II In each ex-perimental trial participants were presented four dyadsand asked to provide four corresponding ratings The fourdyads represented one of the twoΔF contexts labeled highand low in Figure 4 A continuous rating scale was em-ployed which spanned from most blended to least blended(values 1 to 0 respectively) and served as the dependentvariable Participants needed to assign two dyads to thescale extremes (eg most and least) the remaining twodyads were positioned along the scale continuum relativeto the chosen extremes Playback could be switched freelybetween the four dyads with the visual order of the se-lection buttons and rating scales for individual dyads ran-domized between trials Participants completed 120 trials(30 conditions5 times 2 contextstimes 2 repetitions) taking about75 minutes including two 5-minute breaks after about 40and 80 trials

52 Results

521 General trendsGroup medians of ratings aggregated across the factorsPitch Interval and Context illustrate how blend varies as a

5 Only the 22 conditions investigating ΔF are reported here

1045

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

function of ΔF Figure 6 suggests that participants mainlyassociated higher degrees of blend with the levels ΔF le 0whereas much lower ratings were obtained for ΔF gt 0 Interms of higher degrees of blend two typical rating pro-files as a function of ΔF emerged (shown as the ideal-ized dashed-and-dotted curves in Figure 6) 1) For the in-struments horn bassoon oboe and trumpet (left panel)medium to high blend ratings were obtained at and be-low ΔF = 0 above which ratings decreased markedlyresembling the profile of a plateau 2) The instrumentsclarinet and flute (right panel) exhibited a monotonicallydecreasing and approximately linear rating profile as ΔFincreases in which ΔF = 0 did not appear to assume a no-table role These differences in plateau vs linear profilesfor the two instrument subsets are analyzed more closelyin the following sections also taking into account potentialeffects due to the other factors

522 Blend and pitch invariance

Spectral characteristics that remain stable with pitch varia-tion such as formants may have a pitch-invariant percep-tual relevance To test this whenever the profiles of blendratings over ΔF remained largely unaffected by differentpitches intervals and ΔF contexts the perceptual resultswere assumed to be pitch-invariant For instance Figure 7suggests this tendency for the horn in which the plateauprofile was maintained over all factorial manipulations Asa first step the main effects across ΔF were tested to con-firm that ratings served as reliable indicators of perceptualdifferences Given these main effects perceptual robust-ness to pitch variation was fulfilled if no ΔF times Pitch orΔF times Interval interaction effects were found across bothΔF contexts An absence of main effects betweenΔF con-texts would indicate further perceptual robustness

As the Context factor only involved ΔF levels com-mon to both the high and low contexts namely 0 andplusmnI (see Figure 4 bottom) the ratings for these levels re-quired range normalization and separate analyses from theremaining factors For the instruments involving the In-terval factor these were conducted on split levels of thatfactor Furthermore the experimental task imposed the us-age of the rating-scale extremes which resulted in severalviolations of normality due to skewed distributions for thedyads selected as extremes As a result all main and in-teraction effects were tested with a battery of five indepen-dent repeated-measures ANOVAs on the raw and trans-formed ratings The data transformations included non-parametric approaches of rank transformation [28] andprior alignment of lsquonuisancersquo factors [29]6 The statistics

6 Given the unavailability of non-parametric alternatives for repeated-measures three-way ANOVAs that include tests for interaction effectsan approach was chosen that assesses tests over multiple variants ofdependent-variable transformations presuming that the most conserva-tive test in the ANOVA battery minimizes accepting false positives Ranktransformation is a common approach in non-parametric tests such as theone-way Friedman test [28] Issues with tests for interaction effects los-ing power in the presence of strong main effects were addressed throughlsquoalignmentrsquo of the raw data prior to rank transformation [29] For in-stance a test for the interaction AtimesB would align its lsquonuisancersquo factors

-II -I 0 +I +II

0

1

ΔF

Blendrating

Produced optimal blend

-II -I 0 +I +IIΔF

Produced optimal blend

horn

bassoon

oboe

trumpet

profile

clarinet

flute

profile

Figure 6 Perceptual results for the different instruments groupedaccording to two typical response patterns (left and right panels)Experiment A (diamonds bottom) mean ΔF for produced opti-mal blend transformed to a continuous scale of ΔF levels Thegrey lines indicate slider ranges (compare to Figure 4 top) Ex-periment B (curves) median blend ratings across ΔF levels andidealized profile

-II -I 0 +I

0

1

context lowinterval unison

Blendrating

-I 0 +I +II

0

1

context highinterval unison

pitch 1

pitch 2

-II -I 0 +I

0

1

context lowinterval non-unison

Blendrating

ΔF-I 0 +I +II

0

1

context high

interval non-unison

ΔF

Figure 7 Medians and interquartile ranges of blend ratingsfor horn across ΔF levels and the factorial manipulationsPitchtimesContexttimesInterval

for the most liberal and conservative p-values are reported(eg conserv|liberal) with the conservative finding beingassumed valid if statistical significance is in doubt

Strong main effects were found for all instrumentswhich indicated clear differences in perceived blendamong the investigated ΔF levels Table III lists ANOVAstatistics for the range between strongest (clarinet) andweakest (bassoon) main effects among the instrumentswhich reflects analogous differences in the utilized rating-scale ranges in Figure 6 Furthermore the rating profiles ofthe instruments horn bassoon oboe and trumpet fulfilledthe criteria for pitch-invariant robustness as they appeared

by removing the main effects for A and B The four data transformationsprocessed the raw data with or without alignment and for two rankingmethods The first method computed global ranks across the entire dataset ie across participants and conditions whereas the second methodevaluated within-participant ranks across conditions per participant

1046

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table III Range of strongest to weakest ANOVA main effects along ΔF across all six instruments

low context high contextEffect Stat conserv liberal conserv liberal

F 860 826 1651 1651Clarinet df 16307 21391 357 357(strongest) ε 54 69 - -

p lt0001 lt0001 lt0001 lt0001η2p 82 81 90 90

F 164 168 126 152Bassoon df 357 357 14271 357(weakest) ε - - 48 -

p lt0001 lt0001 0005 0001η2p 46 47 40 44

unaffected by pitch-related variation There was only oneexception from a complete absence of effects interactingwith ΔF a moderate main effect for Context for trum-pet was found only at non-unison intervals [F (1 19) =1048|2504 p = 0043|0001 η2

p = 355|569] By con-trast the rating profiles for clarinet and flute did not exhibitpitch invariance as they clearly violated the criteria acrossboth ΔF contexts The interaction effects with ΔF and amain effect for Context leading to their disqualification aredescribed in Table IV

The instruments displaying pitch invariance were thesame ones with plateau rating profiles possibly attributinga special role to ΔF = 0 as defining a boundary governingblend To further support this assumption by joint analysisof the four instruments two hierarchical cluster analyseswere employed that grouped ΔF levels based on their sim-ilarity in perceptual ratings or auditory-model representa-tions The first cluster analysis reinterpreted rating differ-ences between ΔF as a dissimilarity measure This mea-sure considered effect sizes of statistically significant non-parametric post-hoc analyses (Wilcoxon signed-rank test)for pairwise comparisons between ΔF levels ie greaterstatistical effects between two ΔF levels were expressedas being more dissimilar in the perceived degree of blendFor non-significant differences dissimilarity was assumedto be zero The second analysis relied on correlation coef-ficients (Pearson r) between dyad SAI profiles across ΔFlevels (see Figure 9 for examples) The dissimilarity mea-sure considered the complement value 1minus r and as all cor-relations fall within the range [0 1] no special treatmentfor negative correlations was required Both cluster anal-yses employed complete-linkage algorithms The dissimi-larity input matrices were obtained by averaging 30 inde-pendent data sets aggregated across the four instrumentsand the factors Context Pitch and Interval As shown inFigure 8 both analyses led to analogous solutions in whichthe two levels ΔF gt 0 are maximally dissimilar to acompact cluster associating the three levels ΔF le 0 Inother words ΔF associated with low and high degrees ofblend group into two distinct clusters clearly relating tothe plateau profile where ΔF = 0 defines the boundary tohigher degrees of blend

Table IV ANOVA effects for clarinet and flute suggesting an ab-sence of pitch invariance lowast The column header low context doesnot apply in this case

Clarinet low context high contextEffect Stat conserv liberal conserv liberal

F 29 34 38 44ΔF times df 357 357 20387 357Interval ε - - 68 -

p 044 024 030 008η2p 13 15 17 19

Flute low context high contextEffect Stat conserv liberal conserv liberal

F 28 43 44 72ΔF times df 39750 6114 6114 6114Pitch ε 66 - - -

p 031 0006 0005 lt0001η2p 13 18 19 28

F 49 157 - -df 119 119 - -

Contextlowast ε - - - -p 039 0008 - -η2p 21 45 - -

-II -I 0 +I +II

01

02

03

04

05

Dissimilarity

ΔF-II -I 0 +I +II

01

02

ΔF

Figure 8 Dendrograms of ΔF -level groupings for the pitch-invariant instruments Dissimilarity measures are derived fromperceptual ratings (left) and auditory-modelled dyad SAI profiles(right)

523 Blend and its spectral correlatesExplaining blending between instruments with the helpof spectral-envelope characteristics could eventually al-

1047

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table V Variables entering stepwise-regression algorithm to ob-tain models reported in Table VI a Not for pitch-variant subsetinter-variable correlation |r| gt 7 b Computed as described in[30] c Accounting for pitch d Accounting for perceivable beat-ing between partial tones

No Variable Description

1 ΔLrarr3dB derivate of Frarr

3dB Equation (1)2 ΔLF1vsF2 ΔL betw formants F1 amp F2

3 ΔSslopeF1ab spectral slope above Frarr

3dB4 ΔSslope

b global spectral slope5 |ΔScentroid

b absolute centroid difference6 Scentroid

b centroid composite7 ERBrate

c reference pitch in Table II8 I(non)unison interval category (binary)9 IST

a interval size in semitones10 Clohi ΔF context (binary)11 mixratio balance betw instruments12 AMdepth

bd amplitude modulation depth

low the prediction of blend through these instrument-specific traits In addition it could help understand the wayin which spectral characteristics contribute perceptuallyGiven this aim multiple linear regression was employed tomodel the median blend ratings through a number of vari-ables or regressors The regression models assessed therelative contributions of regressors describing both globaland local spectral-envelope traits Global descriptors in-volved the commonly reported spectral centroid and spec-tral slope [30] whereas local descriptors concerned theformant characterization discussed in Section 21 Becausea dyad yielded two descriptor values across its constituentsounds the regressor measure had to associate the two insome way For the spectral centroid two measures wereconsidered namely composite (sum) and absolute differ-ence [4] Although the centroid composite relates to thelsquodarkerrsquo-timbre hypothesis mentioned in the Introductionthe centroid difference had still been found to best explainblend in non-unison intervals [4] which left some uncer-tainty as to which of these two measures was more appro-priate in explaining blend in general The remaining spec-tral regressors were implemented as polarity-preservingdifferences between descriptors with the sampled instru-ment serving as the reference (ref) and the synthesized in-strument being variable (var) across ΔF For example thedifference of descriptor values dx for instrument x wouldcorrespond to Δd = dref minus dvar

Regression models for two separate subsets of the per-ceptual data were explored pitch-invariant (horn bassoonoboe trumpet) and pitch-variant instruments (clarinetflute) The datasets comprised all conditions tested acrossfactors and instruments with N = 118 and N = 54 forthe pitch-invariant and -variant subsets respectively Re-gressor variables were pooled from the spectral-envelopedescriptors and additional variables that were included toaccount for potential confounding factors eg pitch in-terval If these factors did not contribute as regressorsthis would further support a perceptual relevance of pitch

invariance The initial pool of regressors comprised 32variables subsequently reduced to a pre-selected set thatexhibited inter-variable correlations |r| lt 7 in order toavoid pronounced collinearity among regressors The pre-selection was determined by first identifying the vari-able that in simple linear regression exhibited the high-est R2 and subsequently adding all remaining variablesthat yielded permissible inter-variable correlations Ta-ble V lists the pre-selected variables entered into the re-gression which comprised spectral-envelope descriptors(nos 1-6) and variables representing other potential fac-tors of influence (nos 7-12) Stepwise multiple-regressionalgorithms with both forward-selection and backward-elimination schemes were considered which converged onoptimum models by iteratively adding or eliminating re-gressors respectively Models with similar combinationsof regressors to the optimum models were explored aswell In anticipation of reporting the results the inclusionof a binary regressor for ΔF context Clohi benefited all in-vestigated regression models as it corrected the systematicoffset of scaled ratings between the low and high contexts(see Figure 7)

In simple linear regression the strongest spectral-enve-lope descriptors all concerned local formant characteriza-tion and did not involve the global descriptors Among theformant descriptors the highest correlations were obtainedfor the main-formant upper bound Frarr

3dB applied to bothpitch-invariant [R2(118) = 656 p lt 0001] and pitch-variant subsets [R2(54) = 713 p lt 0001] Notably theformant maximum Fmax did not perform better than Frarr

3dBlikely due to differing skewness properties between instru-ment formants (see Figure 1) At the same time the utilityof Frarr

3dB implies that it could assume an important role inexplaining blend as perhaps the perceptually most salientfeature of formants It performed slightly better for thepitch-variant than for the pitch-invariant subset becauseΔF rarr

3dB essentially follows a strictly monotonic functionacross the investigated ΔF levels which apparently mod-els the linear blend profile better To improve the model-ing for the plateau blend profile derivate descriptors ofFrarr

3dB were explored The most effective derivate ΔLrarr3dB re-

lated the upper-bound frequencies Frarr3dB of the two instru-

ments to a corresponding change in level across the ref-erence instrumentrsquos spectral-envelope magnitude functionLref (f ) in dB as formalized in Equation 1 In other wordsthis measure evaluated magnitude differences relative tothe reference instrument at frequencies appearing to be ofparticular perceptual relevance to both instruments whichtherefore may relate to spectral-masking effects (eg in-complete masking [13])

ΔLrarr3dB = Lref F rarr

3dB|ref minus Lref F rarr3dB|var (1)

The obtained solutions from stepwise multiple regressionyielded identical models for both instrument subsets in-volving the regressors ΔLrarr

3dB absolute spectral-centroiddifference |ΔScentroid| and context Clohi A slight gain inperformance was achieved by substituting the |ΔScentroid|based on audio signals from individual sounds with a vari-

1048

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table VI Multiple-regression models best predicting timbre-blend ratings for two instrument subsets

Pitch-invariant subset R2adj=87

F (3 116)=2724 p lt0001

Regressors βstd t p

ΔLrarr3dB 100 286 lt0001

|ΔScentroid| 26 77 lt0001Clohi 27 79 lt0001

Pitch-variant subset R2adj=88

F (3 52)=1342 p lt0001

Regressors βstd t p

ΔLrarr3dB 103 200 lt0001

|ΔScentroid| 16 33 0018Clohi 34 68 lt0001

ant computed on the pitch-generalized spectral-envelopeestimates Table VI displays these optimized regressionmodels for pitch-invariant and pitch-variant subsets bothleading to about 87 explained variance (adjusted R2)The standardized regression-slope coefficients βstd indi-cate the relative contribution of regressors with the rel-ative weights being very similar for both subsets In thesemodels ΔLrarr

3dB acted as the strongest predictor for theblend ratings contributing about five times more than|ΔScentroid| which furthermore did not perform better thanClohi These findings clearly argue for local spectral-envelope descriptors to be more meaningful than globalones in explaining blending in the investigated dyadsMoreover the remaining global descriptor spectral slopeappeared to play no role Furthermore finding both in-strument subsets to be modeled equally well through thesame spectral-envelope descriptors points to a general util-ity of pitch-generalized descriptions for all instrumentsDespite the findings in Section 522 arguing against pitch-invariant perceptual robustness for clarinet and flute theobtained regression models excluded the Pitch and Intervalvariables thus appearing to be less relevant to explainingthe blend ratings

6 General discussion

Orchestrators would benefit from acoustical descriptionsof instruments that correspond to the perceptual processesinvolved in achieving blended timbres Section 2 sug-gests that common orchestral wind instruments are rea-sonably well described through pitch-generalized spectral-envelope estimates which furthermore show the instru-ments horn bassoon oboe and trumpet to be charac-terized by prominent formant structure Auditory mod-els employing stabilized auditory images (SAI) confirmthat for strong formant characterization and for lower tomiddle pitch ranges the pitch-invariant characterization isstable In higher instrument registers however SAI pro-files indicate limitations to pitch-invariant characteriza-tion Other instruments like clarinet and flute yield SAI

profiles clearly varying as a function of pitch implyingthat this pitch dependency may also extend to perception

The perceptual investigation in Sections 3 to 5 con-firms the acoustical implications showing that strong for-mant characterization results in main formants becomingperceptually relevant to blending Given a dyad in whicha putative main formant is variable in frequency relativeto a fixed reference formant the investigated instrumentsdisplay two archetypical profiles based on their formantprominence For the pitch-variant clarinet and flute blendincreases as a monotonic quasi-linear function if the vari-able formant moves from above to below the referenceFor pitch-invariant instruments the frequency alignmentbetween the variable formant and the reference (ΔF= 0)functions as a boundary delimiting a region of higher de-grees of blend at and below the reference and contrastedby a marked decrease in blend when the variable formantexceeds it which overall resembles a plateau profile Thepitch invariance even extends to different interval typesas the plateau profile remains unaffected in non-unisonintervals regardless of their degree of consonance How-ever the findings suggest that the perceptual relevance ofspectral-envelope estimates diminishes in higher instru-ment registers The limited sampling of conditions acrossthe investigated factors prevented a more comprehensiveevaluation of all instruments Whereas the findings allowgeneral relationships for formant frequency to be deducedmore comprehensive investigations are needed to attainmore generalizable quantification of absolute frequencyranges and thresholds

In correlating acoustical and perceptual factors spec-tral-envelope characteristics alone explain up to 87 ofthe variance in blend ratings In addition local spectraltraits seem to be more powerful acoustical predictors ofblend than global traits The formant descriptor for theupper formant bound Frarr

3dB when expressed as a derivatedescriptor ΔLrarr

3dB acts as the strongest predictor for theblend ratings regardless of whether instruments belongto the pitch-invariant group or not With regard to clar-inet and flute the departures from pitch invariance foundin Section 522 contradict the utility of pitch-generalizedspectral-envelope description in predicting blend ratingsas reported in Section 523 Taking both findings into ac-count this for one argues that the descriptor Frarr

3dB stillsucceeds in explaining blend well even for clarinet andflute On the other hand the same instruments displaya greater perceptual sensitivity to the Pitch and Intervalfactors likely associated with their less pronounced for-mant structure Overall strong formant prominence leadsto more drastic changes in blend

The prediction of blend using ΔLrarr3dB still presumes

that one of the instruments serves as a reference formantas the employed difference descriptors are anchored tothe sampled instrument The dependence on a referenceleaves some ambiguity because an arbitrary combinationof two instruments would lead to contradictory predictionsof blend if both instruments were given equal importancein serving as the reference Given the context of both ex-

1049

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

periments it can be assumed that the sampled instrumentacting as a constant anchor had been biased into servingas the reference by combining it with a variable synthe-sized instrument In addition a possible perceptual expla-nation could concern audio samples of instruments playingwithout vibrato generally still exhibiting coherent micro-modulations of partial tones These modulations have beenshown to contribute to a stronger and more stable audi-tory image [31] and may thus bias the more stable imagetoward acting as the reference especially as the synthe-sized partials remain static over time Even in the contextof blending in musical performance one instrument as-sumes the role of the leading voice in which it possiblyserves as the reference whereas an accompanying instru-ment avoids exceeding the lead instrumentrsquos main-formantfrequency Likewise returning to the notion of blend lead-ing to augmented timbres [4] the dominant timbre to beembellished by another would seem predestined to func-tion as the reference based on which the less dominanttimbre should not exhibit formant frequencies exceedingthose of the reference

Finally the results allow a reassessment of previousexplanations for blend The lsquodarkerrsquo-timbre hypothesis[4] is directly reflected in the obtained linear blend pro-file in which lower ΔF increases blend and at the sametime causes a decrease in the spectral-centroid compos-ite By contrast this hypothesis is not well explained bythe plateau profile as blend ratings remain similarly highfor ΔF le 0 The alternative hypothesis of coincidence offormant regions [3] would have predicted stronger blendratings for ΔF = 0 than for all other levels which in theperceptual results is only achieved for the levels ΔF gt 0While the hypothesis only achieves partial fulfillment withrespect to spectral variations ΔF it finds more agreementin the corresponding SAI representations As shown in twoexample cases for horn in Figure 9 the dyad SAI pro-files for the levels ΔF gt 0 are distinguishable from theremaining levels through clear deviations between 1 and2 kHz7 and located just above the hornrsquos estimated Frarr

3dBRemarkably the formant shifts related to ΔFlt 0 (Fig-ure 3) are not reflected in the corresponding dyad SAIprofiles (Figure 9) which instead exhibit direct align-ment below 500 Hz for all three levels ΔF le 0 There-fore only ΔF gt 0 seems to lead to incomplete masking[13] revealing the presence of the synthesized instrumentwhereas the spectral-envelope variations ΔF lt 0 evokelittle change compared to the dyad excitation pattern forΔF = 0

Of still greater importance the auditory system as mod-eled by AIM using the DCGC seemingly involves a high-

7 Concerning the output from the AIM a misalignment between actualsinusoidal frequencies and the corresponding SAI peaks was observedThrough personal communication with the developer of the utilized AIMimplementation this was explained as being an inherent property of thedynamic-compression filters A correction function was derived by com-puting SAIs for various sinusoidal frequencies and fitting the two fre-quency scales yielding the linear function fSAI = 117 middot f + 282 Hz[r2(50)= 999 p lt 0001] In Figure 9 the correction manifests itself inthe compressed frequency extent for the SAIs

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

001

002

SAImagnitude

0

-II

-I

+I

+II

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

002

004

006

008

Frequency (Hz)

Figure 9 SAI profiles of dyads for allΔF levels (Experiment B)depicting two experimental conditions for horn Top pitch 1 uni-son bottom pitch 2 non-unison the grid lines correspond topartial-tone locations

pass characteristic that attenuates spectral-envelope re-gions below 500 Hz affecting the perceived magnitudesof the respective partial tones (grid lines) This impliesthat in the region below 500 Hz frequency deviations be-tween main formants no longer affect the achieved degreeof blend as reflected both in Figure 9 and the perceptualfindings Horn and bassoon would especially benefit fromthis as their main formants are centered around 500 HzOboe and trumpet both exhibiting higher Frarr

3dB can be as-sumed to benefit to a lesser degree In summary main for-mants located in proximity to 500 Hz will benefit more ontop of which lower pitches would also increase the num-ber of partial tones falling into this favorable frequencyregion This reflects tendencies for pitch-invariant traits inSAI correlations (see Section 22) to be more pronouncedat lower pitch ranges and for instruments of lower registerwhich would lend support to the lsquodarker-timbrersquo hypothe-sis in terms of limiting the spectral centroid in frequency

7 Conclusion

Evidence from acoustical and psychoacoustical descrip-tions of wind instruments and from perceptual validationshows that relative location and prominence of main for-mants affect the perception of timbre blend critically Fur-thermore these pitch-invariant spectral characteristics ex-plain and predict the perception of blend to a promis-ingly high degree Remaining discrepancies between theacoustic and perceptual domains can be explained throughapparent constraints of the simulated auditory systemIn conclusion a perceptual model for the contributionof local spectral-envelope characteristics to blending isproposed keeping in mind that it would serve as aninstrument-specific component in a more complex gen-eral perceptual model involving compositional and per-formance factors as initially discussed in Section 1 Themain factors influencing the perception of spectral blendare summarized1 Frequency relationships between upper bounds of main

formants are critical to blend Among several instru-

1050

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

ments one is expected to serve as the reference (eglead instrument dominant timbre) above which thepresence of other instrumentsrsquo formants would stronglyresult in decreased blend

2 Prominence of the main formants governs whetherthese relationships lead to plateau or linear blend pro-files and in the first case pitch-invariant perceptual ro-bustness extends to non-unison intervals

3 Spectral-envelope relationships below 500 Hz may benegligible due to constraints of the auditory system Atthe same time blend decreases at higher pitches due toa degraded perceptual robustness of formants

This hypothetical model still requires further investigationconcerning a more systematic study on 1) the apparentconstraints of the auditory system as modeled by AIM2) how in musical practice one instrument may functionas the reference 3) establishing a more specific descrip-tion of formant prominence and 4) addressing the contri-bution of loudness balance between instruments to blendThese future investigations will further validate and refinethe proposed perceptual model and will improve compu-tational prediction tools for the instrument-specific spec-tral component of blend Orchestration practice will ben-efit from these research efforts even beyond aiming forblend as knowledge of favorable instrument relationshipsalso informs orchestrators as to how to avoid it

References

[1] G J Sandell Concurrent timbres in orchestration a per-ceptual study of factors determining blend PhD disserta-tion Northwestern University 1991

[2] R A Kendall E C Carterette Identification and blend oftimbres as a basis for orchestration Contemp Music Rev9 (1993) 51ndash67

[3] C Reuter Die auditive Diskrimination von Orchesterin-strumenten - Verschmelzung und Heraushoumlrbarkeit vonInstrumentalklangfarben im Ensemblespiel (The auditorydiscrimination of orchestral instruments fusion and distin-guishability of instrumental timbres in ensemble playing)P Lang Frankfurt am Main 1996

[4] G J Sandell Roles for spectral centroid and other factorsin determining ldquoblended instrument pairings in orchestra-tion Music Percept 13 (1995) 209ndash246

[5] A S Bregman Auditory scene analysis the perceptual or-ganization of sound MIT Press Cambridge MA 1990

[6] S McAdams S Winsberg S Donnadieu G De Soete JKrimphoff Perceptual scaling of synthesized musical tim-bres common dimensions specificities and latent subjectclasses Psychol Res 58 (1995) 177ndash192

[7] L Wedin G Goude Dimension analysis of the perceptionof instrumental timbre Scand J Psychol 13 (1972) 228ndash240

[8] J M Grey J W Gordon Perceptual effects of spectralmodifications on musical timbres J Acoust Soc Am 63(1978) 1493ndash1500

[9] W Strong M Clark Synthesis of wind-instrument tonesJ Acoust Soc Am 41 (1967) 39ndash52

[10] D Tardieu S McAdams Perception of dyads of impulsiveand sustained instrument sounds Music Percept 30 (2012)117ndash128

[11] L Hermann Phonophotographische Untersuchungen(Phonophotographic investigations) Pfluumlger Archiv fuumlrdie Gesamte Physiologie des Menschen und der Thiere 58(1894) 264ndash279

[12] C Stumpf Die Sprachlaute (The speech sounds) SpringerBerlin 1926

[13] J P Fricke Zur Anwendung digitaler Klangfarbenfilter beiAufnahme und Wiedergabe (On the application of digitaltimbre filters during recording and playback) Proc 14thTonmeistertagung Munich Germany 1986 135ndash148

[14] K E Schumann Physik der Klangfarben (Physics of tim-bres) Professorial dissertation Universitaumlt Berlin Berlin1929

[15] E L Saldanha J F Corso Timbre cues and the identifica-tion of musical instruments J Acoust Soc Am 36 (1964)2021ndash2026

[16] W Strong M Clark Perturbations of synthetic orchestralwind-instrument tones J Acoust Soc Am 41 (1967) 277ndash285

[17] D Luce J Clark Physical correlates of brass-instrumenttones J Acoust Soc Am 42 (1967) 1232ndash1243

[18] D A Luce Dynamic spectrum changes of orchestral in-struments J Audio Eng Soc 23 (1975) 565ndash568

[19] J Meyer Acoustics and the performance of music 5th edSpringer New York 2009

[20] G Fant Acoustic theory of speech production MoutonThe Hague 1960

[21] C Reuter Klangfarbe und Instrumentation Geschichte ndashUrsachen ndash Wirkung (Timbre and instrumentation historyndash causes ndash effect) P Lang Frankfurt am Main 2002

[22] R van Dinther R D Patterson Perception of acousticscale and size in musical instrument sounds J AcoustSoc Am 120 (2006) 2158ndash2176

[23] T Irino R D Patterson A dynamic compressive gam-machirp auditory filterbank IEEE Trans Audio SpeechLang Process 14 (2006) 2222ndash2232

[24] B C Moore B R Glasberg Suggested formulae for cal-culating auditory-filter bandwidths and excitation patternsJ Acoust Soc Am 74 (1983) 750ndash753

[25] S-A Lembke S McAdams A spectral-envelope synthe-sis model to study perceptual blend between wind instru-ments Proc 11th Congregraves Franccedilais drsquoAcoustique IOAannual meeting Acoustics 2012 Nantes France 20121025ndash1030

[26] F Martin C Champlin Reconsidering the limits of normalhearing J Am Acad Audiol 11 (2000) 64ndash66

[27] ISO 389ndash8 Acoustics Reference zero for the calibrationof audiometric equipmentmdashPart 8 Reference equivalentthreshold sound pressure levels for pure tones and circum-aural earphones Tech Rept International Organization forStandardization Geneva Switzerland 2004

[28] W J Conover R L Iman Rank transformations as abridge between parametric and nonparametric statisticsAm Stat 35 (1981) 124ndash129

[29] J J Higgins S Tashtoush An aligned rank transform testfor interaction Nonlinear World 1 (1994) 201ndash221

[30] G Peeters B L Giordano P Susini N Misdariis SMcAdams The Timbre Toolbox extracting audio descrip-tors from musical signals J Acoust Soc Am 130 (2011)2902ndash2916

[31] S McAdams Spectral fusion spectral parsing and the for-mation of auditory images PhD dissertation Stanford Uni-versity 1984

1051

Page 5: The Role of Spectral-Envelope Characteristics in ... · Figure 1. Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels). Estimates are based

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

asked to find the ΔF that gave optimal blend whereasin Experiment B listeners provided direct blend ratingsfor predefined ΔF variations Using the instruments pre-sented in Section 2 the robustness of perceptual effectswas assessed over the two experimental tasks for differ-ent pitches unison and non-unison intervals and stimuluscontexts Given six instruments and various pitches andintervals it was impractical to test all possible combina-tions An exploratory approach was chosen instead withnot all instruments being tested across all factors Whereasthis could reveal blend-related dependencies concerningΔF across the factors of interest it did not allow gener-alizing across all instruments to the same degree as well asdetermining perceptual thresholds Still each factor wasstudied with at least three instruments for greater gener-alizability Pitches were chosen to represent common reg-isters of the individual instruments Non-unison intervalsincluded both smaller and larger intervals

The methods both experiments share in common arepresented in this section before addressing their specificsand results in the following sections

31 ParticipantsDue to the demanding experimental tasks participantsof both experiments were musically experienced listen-ers They were recruited primarily from the SchulichSchool of Music McGill University Their backgroundswere assessed through self-reported degree of formalmusical training accumulated across several disciplineseg instrumental performance composition music the-ory andor sound recording All participants passed a stan-dardized hearing test [26 27] No participant took part inboth experiments

32 StimuliAll stimuli involved dyads comprising one sampled(drawn from VSL) and one synthesized sound For a sam-ple at any given pitch the spectral envelope was ap-proximated by the pitch-generalized description from Sec-tion 21 which resulted in the main formants of sampledand synthesized sounds resembling each other for ΔF = 0With regard to the temporal envelope both instrumentswere synchronized in their note onsets followed by thesustain portion and ending with an artificial 100-ms linearamplitude decay ramp applied to both instrument soundsThe sampled sound retained its original onset character-istics whereas across all modeled analogues the synthe-sized onsets were characterized by a constant 100-ms lin-ear amplitude ramp Stimuli were presented over a stan-dard two-channel stereophonic loudspeaker setup insidean Industrial Acoustics Company double-walled soundbooth with the instruments simulated as being capturedby a stereo main microphone at spatially distinct locationsinside a mid-sized moderately reverberant room (see [25]for details)

33 ProcedureExperimental conditions were presented in randomized or-der within blocks of repetitions A specific condition could

not occur twice in succession between blocks The mainexperiments were in each case preceded by 10 practice tri-als under the guidance of the experimenter to familiar-ize participants with the task and with representative ex-amples of stimulus variations Dyads were played repeat-edly throughout experimental trials allowing participantsto pause playback at any time

34 Data analysis

With respect to investigated factors Experiment A evalu-ated the influence of the factors instrument register and in-terval type Experiment B assessed pitch-invariant percep-tual performance across a number of factors and further-more correlated the perceptual data with spectral-envelopetraits Separate analyses of variance (ANOVAs) were con-ducted for each instrument testing for statistically signifi-cant main effects within factors and interaction effects be-tween them A criterion significance level α = 05 waschosen and if multiple analyses on split factor levels orindividual post-hoc analyses were conducted Bonferronicorrections were applied In repeated-measures ANOVAsthe Greenhouse-Geisser correction (ε) was applied when-ever the assumption of sphericity was violated In additionExperiment A also considered one-sample t-tests againsta mean of zero for testing differences to ΔF = 0 Statis-tical effect sizes η2

p and r are reported for ANOVAs andt-tests respectively The analyses considered participant-based averages for trial repetitions of identical conditions

4 Experiment A

41 Method

411 Participants

The experiment was conducted with 17 participants 6 fe-male and 11 male with a median age of 27 years (range20-57) Fifteen participants reported more than 10 yearsof formal musical training with 10 indicating experi-ence with wind instruments Participants were remuner-ated with 15 Canadian dollars

412 Stimuli

Table I lists the 17 investigated dyad conditions (columnentries of bottom row) All instruments included unisonintervals (0 semitones ST) With regard to additional fac-tors three levels of the Interval factor compared unisonintervals to consonant (7 or minus3 ST) and dissonant (6 orminus2 ST) non-unison intervals Two levels of the Registerfactor contrasted low (A2 C4 or E3) to high (D5 or B5) in-strument registers for unison dyads with the high-registerpitches being derived from the pitch-variant regions iden-tified in Section 22 The sampled sound remained at theindicated reference pitch whereas the synthesized soundvaried relative to it to form the non-unison intervals Alldyads had constant durations of 4900 ms The level bal-ance between instruments was variable and determined bythe participant

1043

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table I Seventeen dyad conditions from Experiment A across instruments pitches and intervals (top-to-bottom) Intervals in semitonesrelative to the specified reference pitch

horn bassoon oboe trumpet clarinet fluteC3 A2 D5 C4 C4 B5 E3 D5 C4

0 6 7 0 -2 -3 0 0 0 6 7 0 0 -2 -3 0 0

ΔF

-II -I +II+I0

A

B

Γ= -100 Hz fslider

high

low

Figure 4 ΔF variations investigated in Experiments A and B AParticipants controlled fslider which provided a constant range of700 Hz (white arrows) Γ (eg minus100 Hz) represented a random-ized roving parameter preventing the range from always beingcentered on ΔF = 0 B Participants rated four dyads varyingin ΔF drawn from the low or high context The contexts repre-sented subsets of four of the total of five predefined ΔF levels

413 Procedure

A production task required participants to adjust ΔF di-rectly in order to achieve the maximum attainable blendwith the produced value serving as the dependent vari-able User control was provided via a two-dimensionalgraphical interface including controls for ΔF and thelevel balance between instruments The slider controls forΔF= fslider + Γ provided a constant range of 700 Hzwith fslider isin [minus350+350] and including a randomizedroving offset Γ isin [minus100+100] between trials As visu-alized in Figure 4 (top) minimal or maximal Γ limitedthe range covered by all trials to 500 Hz (solid thick greyline) with all possible ΔF deviations spanning a range of900 Hz (dashed thick line) Participants completed a totalof 88 experimental trials (22 conditions3 times 4 repetitions)taking about 50 minutes and including a 5-minute breakafter about 44 trials

42 Results

421 General trends

For all six instruments participants associated optimalblend with deviations ΔF le 0 Figure 6 (diamonds inlower part) illustrates the means for optimal ΔF fromwhich two different patterns become apparent among in-struments ΔF are displayed relative to a scale of equiv-alent variations tested in Experiment B with the scalevalue 0 corresponding to ΔF = 04 The grey lines indicate

3 Only 17 conditions investigated ΔF the remainder studied other for-mant properties that lie outside the focus of this article For instancerelative main-formant magnitude variations ΔL were also investigatedwhich will be addressed in a future publication4 ΔF were linearly interpolated to a scale of equi-distant levels eg minusI 0 and +I corresponding to the numerical scale values -1 0 and 1 re-spectively

each instrumentrsquos respective slider range For the instru-ments horn bassoon oboe and trumpet (left panel) opti-mal blend was found in direct proximity to ΔF = 0 Forthe unison intervals of horn and bassoon ΔF did not differsignificantly from zero ΔF for the other two instrumentswere located slightly lower [t(16) le minus56 p lt 0001rge 82] By contrast optimal ΔF for the clarinet and flute(right panel) were relatively distant from ΔF = 0 in linewith significant underestimations [t(16)leminus38 p le 0015rge 69] In summary ΔF values leading to optimal blendwere limited to cases in which the formant of the synthe-sized instrument was at or below that of the sampled in-strument

422 Instrument register and interval type

The influence of instrument register on the optimal ΔFwas investigated for trumpet bassoon and clarinet atpitches corresponding to instrument-specific low and highregisters One-way repeated-measures ANOVAs for Reg-ister yielded moderately strong main effects for trumpetand bassoon [F (1 16) ge 192 p le 0005 η2

p ge 55] dueto an increase of the optimal ΔF in the high registerA less pronounced effect was obtained for the clarinet[F (1 16)=53p= 0358η2

p = 25]The investigation of possible differences in optimal ΔF

between interval types involved comparisons between uni-son and non-unison intervals as well as a distinction be-tween consonant and dissonant for the latter One-wayrepeated-measures ANOVAs on Interval conducted forbassoon clarinet and trumpet only led to a weak maineffect for the trumpet [F (2 32)= 37p= 0347η2

p = 19]Post-hoc tests for the three possible comparisons yieldeda single significant difference between the interval sizes 0and 6 ST [t(16) = minus35 p = 0033 r = 65]

5 Experiment B

51 Method

511 Participants

The experiment was completed by 20 participants 9 fe-male and 11 male with a median age of 22 years (range18ndash35) Fifteen participants reported more than 10 yearsof formal musical training with 11 indicating experi-ence with wind instruments Participants were remuner-ated with 20 Canadian dollars

512 Stimuli

Table II lists the 22 investigated dyad conditions The In-terval factor investigated two levels comparing unison to

1044

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table II Twenty-two dyad conditions from Experiment B across instruments pitches and intervals (top-to-bottom) Intervals in semi-tones relative to the specified reference pitch

horn bassoon oboe trumpet clarinet fluteC3 B 3 A2 D4 C4 G 4 E5 C4 B 4 E3 A4 C4 G 4 E5

0 6 0 6 0 -2 0 -2 0 0 0 0 6 0 6 0 -2 0 -2 0 0 0

non-unison (6 or -2 ST dissonant) intervals Depending onthe instrument the Pitch factor involved two (horn bas-soon trumpet clarinet) or three (oboe flute) levels In thecase of horn bassoon trumpet and clarinet there weretwo levels of Interval for each level of Pitch In additionthis experiment included two factors that were related toΔF variations alone which applied to all conditions listedin Table II The first was synonymous with ΔF as it ex-plored a total of five ΔF levels including ΔF = 0 and twosets of predefined moderate and extreme deviations aboveand below it ie the ΔF levels hereafter labeled 0 plusmnI and plusmnII The second factor grouped the five levels contex-tually into two subsets of four which are denoted as lowand high contexts and defined in Figure 4 (bottom)

Employing the formant descriptors from Section 21the investigated ΔF levels were expressed on a commonscale of spectral-envelope description which provided abetter basis of comparison than taking equal frequencydifferences in Hz as the frequency extent of formantsacross instruments varied considerably Figure 5 providesexamples for all resulting ΔF levels of the horn The fourlevels ΔF = 0 were defined as frequency distances be-tween the formant maximum Fmax and measures relatedto the location and width of its bounds (eg Frarr

6dB andΔF3dB respectively) For example the positive deviationΔF (+I) was the distance between the formant maximumand its upper bound minus 10 of the width betweenthe 3 dB bounds If spectral-envelope descriptions lackedlower bounds (eg trumpet clarinet flute) the frequencylocated below Fmax that exhibited the lowest magnitudewas taken as a substitute value

Unlike the dyads in Experiment A the synthesizedsound always remained at the reference pitch whereas thesampled sound varied its pitch for non-unison intervalsbecause this tested the assumption of pitch-invariant de-scription for the recorded sounds more thoroughly Thedyads had a constant duration of 4700 ms In additionthe conditions listed in Table II including the associatedfive ΔF levels per condition had predetermined values forthe level balance between sounds and had also been equal-ized for loudness The first author determined the level bal-ance aiming for good balance between both sounds whilemaintaining discriminability between ΔF levels whichwas subsequently verified by the second author Loudnessequalization was conducted subjectively in a separate pi-lot experiment anchored to a global reference dyad for allconditions and ΔF levels For all stimuli gain levels weredetermined that equalized stimulus loudness to the globalreference These gain levels were based on median val-ues from at least five participants which were determined

200 400 600 800 1000 1200-12

-10

-8

-6

-4

-2

0

Powerlevel(dB)

Frequency (Hz)

ΔF3dB

F3dB

larrF3dB

rarrFmax

F6dB

larrF6dB

rarr

-II -I 0 +I +II

Figure 5 ΔF levels from Experiment B defined relative toa spectral-envelope estimatersquos formant maximum and boundsΔF (plusmnI) fall 10 inside of ΔF3dBrsquos extent ΔF (+II) aligns withFrarr

6dB whereas ΔF (minusII) aligns with either 80 middot F larr6dB or 150 Hz

whichever is closer to Fmax

either after the corresponding interquartile ranges first fellbelow 4 dB or after running a maximum of 10 participants

513 ProcedureA relative-rating task required participants to compare ΔFlevels for a given condition from Table II In each ex-perimental trial participants were presented four dyadsand asked to provide four corresponding ratings The fourdyads represented one of the twoΔF contexts labeled highand low in Figure 4 A continuous rating scale was em-ployed which spanned from most blended to least blended(values 1 to 0 respectively) and served as the dependentvariable Participants needed to assign two dyads to thescale extremes (eg most and least) the remaining twodyads were positioned along the scale continuum relativeto the chosen extremes Playback could be switched freelybetween the four dyads with the visual order of the se-lection buttons and rating scales for individual dyads ran-domized between trials Participants completed 120 trials(30 conditions5 times 2 contextstimes 2 repetitions) taking about75 minutes including two 5-minute breaks after about 40and 80 trials

52 Results

521 General trendsGroup medians of ratings aggregated across the factorsPitch Interval and Context illustrate how blend varies as a

5 Only the 22 conditions investigating ΔF are reported here

1045

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

function of ΔF Figure 6 suggests that participants mainlyassociated higher degrees of blend with the levels ΔF le 0whereas much lower ratings were obtained for ΔF gt 0 Interms of higher degrees of blend two typical rating pro-files as a function of ΔF emerged (shown as the ideal-ized dashed-and-dotted curves in Figure 6) 1) For the in-struments horn bassoon oboe and trumpet (left panel)medium to high blend ratings were obtained at and be-low ΔF = 0 above which ratings decreased markedlyresembling the profile of a plateau 2) The instrumentsclarinet and flute (right panel) exhibited a monotonicallydecreasing and approximately linear rating profile as ΔFincreases in which ΔF = 0 did not appear to assume a no-table role These differences in plateau vs linear profilesfor the two instrument subsets are analyzed more closelyin the following sections also taking into account potentialeffects due to the other factors

522 Blend and pitch invariance

Spectral characteristics that remain stable with pitch varia-tion such as formants may have a pitch-invariant percep-tual relevance To test this whenever the profiles of blendratings over ΔF remained largely unaffected by differentpitches intervals and ΔF contexts the perceptual resultswere assumed to be pitch-invariant For instance Figure 7suggests this tendency for the horn in which the plateauprofile was maintained over all factorial manipulations Asa first step the main effects across ΔF were tested to con-firm that ratings served as reliable indicators of perceptualdifferences Given these main effects perceptual robust-ness to pitch variation was fulfilled if no ΔF times Pitch orΔF times Interval interaction effects were found across bothΔF contexts An absence of main effects betweenΔF con-texts would indicate further perceptual robustness

As the Context factor only involved ΔF levels com-mon to both the high and low contexts namely 0 andplusmnI (see Figure 4 bottom) the ratings for these levels re-quired range normalization and separate analyses from theremaining factors For the instruments involving the In-terval factor these were conducted on split levels of thatfactor Furthermore the experimental task imposed the us-age of the rating-scale extremes which resulted in severalviolations of normality due to skewed distributions for thedyads selected as extremes As a result all main and in-teraction effects were tested with a battery of five indepen-dent repeated-measures ANOVAs on the raw and trans-formed ratings The data transformations included non-parametric approaches of rank transformation [28] andprior alignment of lsquonuisancersquo factors [29]6 The statistics

6 Given the unavailability of non-parametric alternatives for repeated-measures three-way ANOVAs that include tests for interaction effectsan approach was chosen that assesses tests over multiple variants ofdependent-variable transformations presuming that the most conserva-tive test in the ANOVA battery minimizes accepting false positives Ranktransformation is a common approach in non-parametric tests such as theone-way Friedman test [28] Issues with tests for interaction effects los-ing power in the presence of strong main effects were addressed throughlsquoalignmentrsquo of the raw data prior to rank transformation [29] For in-stance a test for the interaction AtimesB would align its lsquonuisancersquo factors

-II -I 0 +I +II

0

1

ΔF

Blendrating

Produced optimal blend

-II -I 0 +I +IIΔF

Produced optimal blend

horn

bassoon

oboe

trumpet

profile

clarinet

flute

profile

Figure 6 Perceptual results for the different instruments groupedaccording to two typical response patterns (left and right panels)Experiment A (diamonds bottom) mean ΔF for produced opti-mal blend transformed to a continuous scale of ΔF levels Thegrey lines indicate slider ranges (compare to Figure 4 top) Ex-periment B (curves) median blend ratings across ΔF levels andidealized profile

-II -I 0 +I

0

1

context lowinterval unison

Blendrating

-I 0 +I +II

0

1

context highinterval unison

pitch 1

pitch 2

-II -I 0 +I

0

1

context lowinterval non-unison

Blendrating

ΔF-I 0 +I +II

0

1

context high

interval non-unison

ΔF

Figure 7 Medians and interquartile ranges of blend ratingsfor horn across ΔF levels and the factorial manipulationsPitchtimesContexttimesInterval

for the most liberal and conservative p-values are reported(eg conserv|liberal) with the conservative finding beingassumed valid if statistical significance is in doubt

Strong main effects were found for all instrumentswhich indicated clear differences in perceived blendamong the investigated ΔF levels Table III lists ANOVAstatistics for the range between strongest (clarinet) andweakest (bassoon) main effects among the instrumentswhich reflects analogous differences in the utilized rating-scale ranges in Figure 6 Furthermore the rating profiles ofthe instruments horn bassoon oboe and trumpet fulfilledthe criteria for pitch-invariant robustness as they appeared

by removing the main effects for A and B The four data transformationsprocessed the raw data with or without alignment and for two rankingmethods The first method computed global ranks across the entire dataset ie across participants and conditions whereas the second methodevaluated within-participant ranks across conditions per participant

1046

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table III Range of strongest to weakest ANOVA main effects along ΔF across all six instruments

low context high contextEffect Stat conserv liberal conserv liberal

F 860 826 1651 1651Clarinet df 16307 21391 357 357(strongest) ε 54 69 - -

p lt0001 lt0001 lt0001 lt0001η2p 82 81 90 90

F 164 168 126 152Bassoon df 357 357 14271 357(weakest) ε - - 48 -

p lt0001 lt0001 0005 0001η2p 46 47 40 44

unaffected by pitch-related variation There was only oneexception from a complete absence of effects interactingwith ΔF a moderate main effect for Context for trum-pet was found only at non-unison intervals [F (1 19) =1048|2504 p = 0043|0001 η2

p = 355|569] By con-trast the rating profiles for clarinet and flute did not exhibitpitch invariance as they clearly violated the criteria acrossboth ΔF contexts The interaction effects with ΔF and amain effect for Context leading to their disqualification aredescribed in Table IV

The instruments displaying pitch invariance were thesame ones with plateau rating profiles possibly attributinga special role to ΔF = 0 as defining a boundary governingblend To further support this assumption by joint analysisof the four instruments two hierarchical cluster analyseswere employed that grouped ΔF levels based on their sim-ilarity in perceptual ratings or auditory-model representa-tions The first cluster analysis reinterpreted rating differ-ences between ΔF as a dissimilarity measure This mea-sure considered effect sizes of statistically significant non-parametric post-hoc analyses (Wilcoxon signed-rank test)for pairwise comparisons between ΔF levels ie greaterstatistical effects between two ΔF levels were expressedas being more dissimilar in the perceived degree of blendFor non-significant differences dissimilarity was assumedto be zero The second analysis relied on correlation coef-ficients (Pearson r) between dyad SAI profiles across ΔFlevels (see Figure 9 for examples) The dissimilarity mea-sure considered the complement value 1minus r and as all cor-relations fall within the range [0 1] no special treatmentfor negative correlations was required Both cluster anal-yses employed complete-linkage algorithms The dissimi-larity input matrices were obtained by averaging 30 inde-pendent data sets aggregated across the four instrumentsand the factors Context Pitch and Interval As shown inFigure 8 both analyses led to analogous solutions in whichthe two levels ΔF gt 0 are maximally dissimilar to acompact cluster associating the three levels ΔF le 0 Inother words ΔF associated with low and high degrees ofblend group into two distinct clusters clearly relating tothe plateau profile where ΔF = 0 defines the boundary tohigher degrees of blend

Table IV ANOVA effects for clarinet and flute suggesting an ab-sence of pitch invariance lowast The column header low context doesnot apply in this case

Clarinet low context high contextEffect Stat conserv liberal conserv liberal

F 29 34 38 44ΔF times df 357 357 20387 357Interval ε - - 68 -

p 044 024 030 008η2p 13 15 17 19

Flute low context high contextEffect Stat conserv liberal conserv liberal

F 28 43 44 72ΔF times df 39750 6114 6114 6114Pitch ε 66 - - -

p 031 0006 0005 lt0001η2p 13 18 19 28

F 49 157 - -df 119 119 - -

Contextlowast ε - - - -p 039 0008 - -η2p 21 45 - -

-II -I 0 +I +II

01

02

03

04

05

Dissimilarity

ΔF-II -I 0 +I +II

01

02

ΔF

Figure 8 Dendrograms of ΔF -level groupings for the pitch-invariant instruments Dissimilarity measures are derived fromperceptual ratings (left) and auditory-modelled dyad SAI profiles(right)

523 Blend and its spectral correlatesExplaining blending between instruments with the helpof spectral-envelope characteristics could eventually al-

1047

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table V Variables entering stepwise-regression algorithm to ob-tain models reported in Table VI a Not for pitch-variant subsetinter-variable correlation |r| gt 7 b Computed as described in[30] c Accounting for pitch d Accounting for perceivable beat-ing between partial tones

No Variable Description

1 ΔLrarr3dB derivate of Frarr

3dB Equation (1)2 ΔLF1vsF2 ΔL betw formants F1 amp F2

3 ΔSslopeF1ab spectral slope above Frarr

3dB4 ΔSslope

b global spectral slope5 |ΔScentroid

b absolute centroid difference6 Scentroid

b centroid composite7 ERBrate

c reference pitch in Table II8 I(non)unison interval category (binary)9 IST

a interval size in semitones10 Clohi ΔF context (binary)11 mixratio balance betw instruments12 AMdepth

bd amplitude modulation depth

low the prediction of blend through these instrument-specific traits In addition it could help understand the wayin which spectral characteristics contribute perceptuallyGiven this aim multiple linear regression was employed tomodel the median blend ratings through a number of vari-ables or regressors The regression models assessed therelative contributions of regressors describing both globaland local spectral-envelope traits Global descriptors in-volved the commonly reported spectral centroid and spec-tral slope [30] whereas local descriptors concerned theformant characterization discussed in Section 21 Becausea dyad yielded two descriptor values across its constituentsounds the regressor measure had to associate the two insome way For the spectral centroid two measures wereconsidered namely composite (sum) and absolute differ-ence [4] Although the centroid composite relates to thelsquodarkerrsquo-timbre hypothesis mentioned in the Introductionthe centroid difference had still been found to best explainblend in non-unison intervals [4] which left some uncer-tainty as to which of these two measures was more appro-priate in explaining blend in general The remaining spec-tral regressors were implemented as polarity-preservingdifferences between descriptors with the sampled instru-ment serving as the reference (ref) and the synthesized in-strument being variable (var) across ΔF For example thedifference of descriptor values dx for instrument x wouldcorrespond to Δd = dref minus dvar

Regression models for two separate subsets of the per-ceptual data were explored pitch-invariant (horn bassoonoboe trumpet) and pitch-variant instruments (clarinetflute) The datasets comprised all conditions tested acrossfactors and instruments with N = 118 and N = 54 forthe pitch-invariant and -variant subsets respectively Re-gressor variables were pooled from the spectral-envelopedescriptors and additional variables that were included toaccount for potential confounding factors eg pitch in-terval If these factors did not contribute as regressorsthis would further support a perceptual relevance of pitch

invariance The initial pool of regressors comprised 32variables subsequently reduced to a pre-selected set thatexhibited inter-variable correlations |r| lt 7 in order toavoid pronounced collinearity among regressors The pre-selection was determined by first identifying the vari-able that in simple linear regression exhibited the high-est R2 and subsequently adding all remaining variablesthat yielded permissible inter-variable correlations Ta-ble V lists the pre-selected variables entered into the re-gression which comprised spectral-envelope descriptors(nos 1-6) and variables representing other potential fac-tors of influence (nos 7-12) Stepwise multiple-regressionalgorithms with both forward-selection and backward-elimination schemes were considered which converged onoptimum models by iteratively adding or eliminating re-gressors respectively Models with similar combinationsof regressors to the optimum models were explored aswell In anticipation of reporting the results the inclusionof a binary regressor for ΔF context Clohi benefited all in-vestigated regression models as it corrected the systematicoffset of scaled ratings between the low and high contexts(see Figure 7)

In simple linear regression the strongest spectral-enve-lope descriptors all concerned local formant characteriza-tion and did not involve the global descriptors Among theformant descriptors the highest correlations were obtainedfor the main-formant upper bound Frarr

3dB applied to bothpitch-invariant [R2(118) = 656 p lt 0001] and pitch-variant subsets [R2(54) = 713 p lt 0001] Notably theformant maximum Fmax did not perform better than Frarr

3dBlikely due to differing skewness properties between instru-ment formants (see Figure 1) At the same time the utilityof Frarr

3dB implies that it could assume an important role inexplaining blend as perhaps the perceptually most salientfeature of formants It performed slightly better for thepitch-variant than for the pitch-invariant subset becauseΔF rarr

3dB essentially follows a strictly monotonic functionacross the investigated ΔF levels which apparently mod-els the linear blend profile better To improve the model-ing for the plateau blend profile derivate descriptors ofFrarr

3dB were explored The most effective derivate ΔLrarr3dB re-

lated the upper-bound frequencies Frarr3dB of the two instru-

ments to a corresponding change in level across the ref-erence instrumentrsquos spectral-envelope magnitude functionLref (f ) in dB as formalized in Equation 1 In other wordsthis measure evaluated magnitude differences relative tothe reference instrument at frequencies appearing to be ofparticular perceptual relevance to both instruments whichtherefore may relate to spectral-masking effects (eg in-complete masking [13])

ΔLrarr3dB = Lref F rarr

3dB|ref minus Lref F rarr3dB|var (1)

The obtained solutions from stepwise multiple regressionyielded identical models for both instrument subsets in-volving the regressors ΔLrarr

3dB absolute spectral-centroiddifference |ΔScentroid| and context Clohi A slight gain inperformance was achieved by substituting the |ΔScentroid|based on audio signals from individual sounds with a vari-

1048

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table VI Multiple-regression models best predicting timbre-blend ratings for two instrument subsets

Pitch-invariant subset R2adj=87

F (3 116)=2724 p lt0001

Regressors βstd t p

ΔLrarr3dB 100 286 lt0001

|ΔScentroid| 26 77 lt0001Clohi 27 79 lt0001

Pitch-variant subset R2adj=88

F (3 52)=1342 p lt0001

Regressors βstd t p

ΔLrarr3dB 103 200 lt0001

|ΔScentroid| 16 33 0018Clohi 34 68 lt0001

ant computed on the pitch-generalized spectral-envelopeestimates Table VI displays these optimized regressionmodels for pitch-invariant and pitch-variant subsets bothleading to about 87 explained variance (adjusted R2)The standardized regression-slope coefficients βstd indi-cate the relative contribution of regressors with the rel-ative weights being very similar for both subsets In thesemodels ΔLrarr

3dB acted as the strongest predictor for theblend ratings contributing about five times more than|ΔScentroid| which furthermore did not perform better thanClohi These findings clearly argue for local spectral-envelope descriptors to be more meaningful than globalones in explaining blending in the investigated dyadsMoreover the remaining global descriptor spectral slopeappeared to play no role Furthermore finding both in-strument subsets to be modeled equally well through thesame spectral-envelope descriptors points to a general util-ity of pitch-generalized descriptions for all instrumentsDespite the findings in Section 522 arguing against pitch-invariant perceptual robustness for clarinet and flute theobtained regression models excluded the Pitch and Intervalvariables thus appearing to be less relevant to explainingthe blend ratings

6 General discussion

Orchestrators would benefit from acoustical descriptionsof instruments that correspond to the perceptual processesinvolved in achieving blended timbres Section 2 sug-gests that common orchestral wind instruments are rea-sonably well described through pitch-generalized spectral-envelope estimates which furthermore show the instru-ments horn bassoon oboe and trumpet to be charac-terized by prominent formant structure Auditory mod-els employing stabilized auditory images (SAI) confirmthat for strong formant characterization and for lower tomiddle pitch ranges the pitch-invariant characterization isstable In higher instrument registers however SAI pro-files indicate limitations to pitch-invariant characteriza-tion Other instruments like clarinet and flute yield SAI

profiles clearly varying as a function of pitch implyingthat this pitch dependency may also extend to perception

The perceptual investigation in Sections 3 to 5 con-firms the acoustical implications showing that strong for-mant characterization results in main formants becomingperceptually relevant to blending Given a dyad in whicha putative main formant is variable in frequency relativeto a fixed reference formant the investigated instrumentsdisplay two archetypical profiles based on their formantprominence For the pitch-variant clarinet and flute blendincreases as a monotonic quasi-linear function if the vari-able formant moves from above to below the referenceFor pitch-invariant instruments the frequency alignmentbetween the variable formant and the reference (ΔF= 0)functions as a boundary delimiting a region of higher de-grees of blend at and below the reference and contrastedby a marked decrease in blend when the variable formantexceeds it which overall resembles a plateau profile Thepitch invariance even extends to different interval typesas the plateau profile remains unaffected in non-unisonintervals regardless of their degree of consonance How-ever the findings suggest that the perceptual relevance ofspectral-envelope estimates diminishes in higher instru-ment registers The limited sampling of conditions acrossthe investigated factors prevented a more comprehensiveevaluation of all instruments Whereas the findings allowgeneral relationships for formant frequency to be deducedmore comprehensive investigations are needed to attainmore generalizable quantification of absolute frequencyranges and thresholds

In correlating acoustical and perceptual factors spec-tral-envelope characteristics alone explain up to 87 ofthe variance in blend ratings In addition local spectraltraits seem to be more powerful acoustical predictors ofblend than global traits The formant descriptor for theupper formant bound Frarr

3dB when expressed as a derivatedescriptor ΔLrarr

3dB acts as the strongest predictor for theblend ratings regardless of whether instruments belongto the pitch-invariant group or not With regard to clar-inet and flute the departures from pitch invariance foundin Section 522 contradict the utility of pitch-generalizedspectral-envelope description in predicting blend ratingsas reported in Section 523 Taking both findings into ac-count this for one argues that the descriptor Frarr

3dB stillsucceeds in explaining blend well even for clarinet andflute On the other hand the same instruments displaya greater perceptual sensitivity to the Pitch and Intervalfactors likely associated with their less pronounced for-mant structure Overall strong formant prominence leadsto more drastic changes in blend

The prediction of blend using ΔLrarr3dB still presumes

that one of the instruments serves as a reference formantas the employed difference descriptors are anchored tothe sampled instrument The dependence on a referenceleaves some ambiguity because an arbitrary combinationof two instruments would lead to contradictory predictionsof blend if both instruments were given equal importancein serving as the reference Given the context of both ex-

1049

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

periments it can be assumed that the sampled instrumentacting as a constant anchor had been biased into servingas the reference by combining it with a variable synthe-sized instrument In addition a possible perceptual expla-nation could concern audio samples of instruments playingwithout vibrato generally still exhibiting coherent micro-modulations of partial tones These modulations have beenshown to contribute to a stronger and more stable audi-tory image [31] and may thus bias the more stable imagetoward acting as the reference especially as the synthe-sized partials remain static over time Even in the contextof blending in musical performance one instrument as-sumes the role of the leading voice in which it possiblyserves as the reference whereas an accompanying instru-ment avoids exceeding the lead instrumentrsquos main-formantfrequency Likewise returning to the notion of blend lead-ing to augmented timbres [4] the dominant timbre to beembellished by another would seem predestined to func-tion as the reference based on which the less dominanttimbre should not exhibit formant frequencies exceedingthose of the reference

Finally the results allow a reassessment of previousexplanations for blend The lsquodarkerrsquo-timbre hypothesis[4] is directly reflected in the obtained linear blend pro-file in which lower ΔF increases blend and at the sametime causes a decrease in the spectral-centroid compos-ite By contrast this hypothesis is not well explained bythe plateau profile as blend ratings remain similarly highfor ΔF le 0 The alternative hypothesis of coincidence offormant regions [3] would have predicted stronger blendratings for ΔF = 0 than for all other levels which in theperceptual results is only achieved for the levels ΔF gt 0While the hypothesis only achieves partial fulfillment withrespect to spectral variations ΔF it finds more agreementin the corresponding SAI representations As shown in twoexample cases for horn in Figure 9 the dyad SAI pro-files for the levels ΔF gt 0 are distinguishable from theremaining levels through clear deviations between 1 and2 kHz7 and located just above the hornrsquos estimated Frarr

3dBRemarkably the formant shifts related to ΔFlt 0 (Fig-ure 3) are not reflected in the corresponding dyad SAIprofiles (Figure 9) which instead exhibit direct align-ment below 500 Hz for all three levels ΔF le 0 There-fore only ΔF gt 0 seems to lead to incomplete masking[13] revealing the presence of the synthesized instrumentwhereas the spectral-envelope variations ΔF lt 0 evokelittle change compared to the dyad excitation pattern forΔF = 0

Of still greater importance the auditory system as mod-eled by AIM using the DCGC seemingly involves a high-

7 Concerning the output from the AIM a misalignment between actualsinusoidal frequencies and the corresponding SAI peaks was observedThrough personal communication with the developer of the utilized AIMimplementation this was explained as being an inherent property of thedynamic-compression filters A correction function was derived by com-puting SAIs for various sinusoidal frequencies and fitting the two fre-quency scales yielding the linear function fSAI = 117 middot f + 282 Hz[r2(50)= 999 p lt 0001] In Figure 9 the correction manifests itself inthe compressed frequency extent for the SAIs

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

001

002

SAImagnitude

0

-II

-I

+I

+II

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

002

004

006

008

Frequency (Hz)

Figure 9 SAI profiles of dyads for allΔF levels (Experiment B)depicting two experimental conditions for horn Top pitch 1 uni-son bottom pitch 2 non-unison the grid lines correspond topartial-tone locations

pass characteristic that attenuates spectral-envelope re-gions below 500 Hz affecting the perceived magnitudesof the respective partial tones (grid lines) This impliesthat in the region below 500 Hz frequency deviations be-tween main formants no longer affect the achieved degreeof blend as reflected both in Figure 9 and the perceptualfindings Horn and bassoon would especially benefit fromthis as their main formants are centered around 500 HzOboe and trumpet both exhibiting higher Frarr

3dB can be as-sumed to benefit to a lesser degree In summary main for-mants located in proximity to 500 Hz will benefit more ontop of which lower pitches would also increase the num-ber of partial tones falling into this favorable frequencyregion This reflects tendencies for pitch-invariant traits inSAI correlations (see Section 22) to be more pronouncedat lower pitch ranges and for instruments of lower registerwhich would lend support to the lsquodarker-timbrersquo hypothe-sis in terms of limiting the spectral centroid in frequency

7 Conclusion

Evidence from acoustical and psychoacoustical descrip-tions of wind instruments and from perceptual validationshows that relative location and prominence of main for-mants affect the perception of timbre blend critically Fur-thermore these pitch-invariant spectral characteristics ex-plain and predict the perception of blend to a promis-ingly high degree Remaining discrepancies between theacoustic and perceptual domains can be explained throughapparent constraints of the simulated auditory systemIn conclusion a perceptual model for the contributionof local spectral-envelope characteristics to blending isproposed keeping in mind that it would serve as aninstrument-specific component in a more complex gen-eral perceptual model involving compositional and per-formance factors as initially discussed in Section 1 Themain factors influencing the perception of spectral blendare summarized1 Frequency relationships between upper bounds of main

formants are critical to blend Among several instru-

1050

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

ments one is expected to serve as the reference (eglead instrument dominant timbre) above which thepresence of other instrumentsrsquo formants would stronglyresult in decreased blend

2 Prominence of the main formants governs whetherthese relationships lead to plateau or linear blend pro-files and in the first case pitch-invariant perceptual ro-bustness extends to non-unison intervals

3 Spectral-envelope relationships below 500 Hz may benegligible due to constraints of the auditory system Atthe same time blend decreases at higher pitches due toa degraded perceptual robustness of formants

This hypothetical model still requires further investigationconcerning a more systematic study on 1) the apparentconstraints of the auditory system as modeled by AIM2) how in musical practice one instrument may functionas the reference 3) establishing a more specific descrip-tion of formant prominence and 4) addressing the contri-bution of loudness balance between instruments to blendThese future investigations will further validate and refinethe proposed perceptual model and will improve compu-tational prediction tools for the instrument-specific spec-tral component of blend Orchestration practice will ben-efit from these research efforts even beyond aiming forblend as knowledge of favorable instrument relationshipsalso informs orchestrators as to how to avoid it

References

[1] G J Sandell Concurrent timbres in orchestration a per-ceptual study of factors determining blend PhD disserta-tion Northwestern University 1991

[2] R A Kendall E C Carterette Identification and blend oftimbres as a basis for orchestration Contemp Music Rev9 (1993) 51ndash67

[3] C Reuter Die auditive Diskrimination von Orchesterin-strumenten - Verschmelzung und Heraushoumlrbarkeit vonInstrumentalklangfarben im Ensemblespiel (The auditorydiscrimination of orchestral instruments fusion and distin-guishability of instrumental timbres in ensemble playing)P Lang Frankfurt am Main 1996

[4] G J Sandell Roles for spectral centroid and other factorsin determining ldquoblended instrument pairings in orchestra-tion Music Percept 13 (1995) 209ndash246

[5] A S Bregman Auditory scene analysis the perceptual or-ganization of sound MIT Press Cambridge MA 1990

[6] S McAdams S Winsberg S Donnadieu G De Soete JKrimphoff Perceptual scaling of synthesized musical tim-bres common dimensions specificities and latent subjectclasses Psychol Res 58 (1995) 177ndash192

[7] L Wedin G Goude Dimension analysis of the perceptionof instrumental timbre Scand J Psychol 13 (1972) 228ndash240

[8] J M Grey J W Gordon Perceptual effects of spectralmodifications on musical timbres J Acoust Soc Am 63(1978) 1493ndash1500

[9] W Strong M Clark Synthesis of wind-instrument tonesJ Acoust Soc Am 41 (1967) 39ndash52

[10] D Tardieu S McAdams Perception of dyads of impulsiveand sustained instrument sounds Music Percept 30 (2012)117ndash128

[11] L Hermann Phonophotographische Untersuchungen(Phonophotographic investigations) Pfluumlger Archiv fuumlrdie Gesamte Physiologie des Menschen und der Thiere 58(1894) 264ndash279

[12] C Stumpf Die Sprachlaute (The speech sounds) SpringerBerlin 1926

[13] J P Fricke Zur Anwendung digitaler Klangfarbenfilter beiAufnahme und Wiedergabe (On the application of digitaltimbre filters during recording and playback) Proc 14thTonmeistertagung Munich Germany 1986 135ndash148

[14] K E Schumann Physik der Klangfarben (Physics of tim-bres) Professorial dissertation Universitaumlt Berlin Berlin1929

[15] E L Saldanha J F Corso Timbre cues and the identifica-tion of musical instruments J Acoust Soc Am 36 (1964)2021ndash2026

[16] W Strong M Clark Perturbations of synthetic orchestralwind-instrument tones J Acoust Soc Am 41 (1967) 277ndash285

[17] D Luce J Clark Physical correlates of brass-instrumenttones J Acoust Soc Am 42 (1967) 1232ndash1243

[18] D A Luce Dynamic spectrum changes of orchestral in-struments J Audio Eng Soc 23 (1975) 565ndash568

[19] J Meyer Acoustics and the performance of music 5th edSpringer New York 2009

[20] G Fant Acoustic theory of speech production MoutonThe Hague 1960

[21] C Reuter Klangfarbe und Instrumentation Geschichte ndashUrsachen ndash Wirkung (Timbre and instrumentation historyndash causes ndash effect) P Lang Frankfurt am Main 2002

[22] R van Dinther R D Patterson Perception of acousticscale and size in musical instrument sounds J AcoustSoc Am 120 (2006) 2158ndash2176

[23] T Irino R D Patterson A dynamic compressive gam-machirp auditory filterbank IEEE Trans Audio SpeechLang Process 14 (2006) 2222ndash2232

[24] B C Moore B R Glasberg Suggested formulae for cal-culating auditory-filter bandwidths and excitation patternsJ Acoust Soc Am 74 (1983) 750ndash753

[25] S-A Lembke S McAdams A spectral-envelope synthe-sis model to study perceptual blend between wind instru-ments Proc 11th Congregraves Franccedilais drsquoAcoustique IOAannual meeting Acoustics 2012 Nantes France 20121025ndash1030

[26] F Martin C Champlin Reconsidering the limits of normalhearing J Am Acad Audiol 11 (2000) 64ndash66

[27] ISO 389ndash8 Acoustics Reference zero for the calibrationof audiometric equipmentmdashPart 8 Reference equivalentthreshold sound pressure levels for pure tones and circum-aural earphones Tech Rept International Organization forStandardization Geneva Switzerland 2004

[28] W J Conover R L Iman Rank transformations as abridge between parametric and nonparametric statisticsAm Stat 35 (1981) 124ndash129

[29] J J Higgins S Tashtoush An aligned rank transform testfor interaction Nonlinear World 1 (1994) 201ndash221

[30] G Peeters B L Giordano P Susini N Misdariis SMcAdams The Timbre Toolbox extracting audio descrip-tors from musical signals J Acoust Soc Am 130 (2011)2902ndash2916

[31] S McAdams Spectral fusion spectral parsing and the for-mation of auditory images PhD dissertation Stanford Uni-versity 1984

1051

Page 6: The Role of Spectral-Envelope Characteristics in ... · Figure 1. Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels). Estimates are based

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table I Seventeen dyad conditions from Experiment A across instruments pitches and intervals (top-to-bottom) Intervals in semitonesrelative to the specified reference pitch

horn bassoon oboe trumpet clarinet fluteC3 A2 D5 C4 C4 B5 E3 D5 C4

0 6 7 0 -2 -3 0 0 0 6 7 0 0 -2 -3 0 0

ΔF

-II -I +II+I0

A

B

Γ= -100 Hz fslider

high

low

Figure 4 ΔF variations investigated in Experiments A and B AParticipants controlled fslider which provided a constant range of700 Hz (white arrows) Γ (eg minus100 Hz) represented a random-ized roving parameter preventing the range from always beingcentered on ΔF = 0 B Participants rated four dyads varyingin ΔF drawn from the low or high context The contexts repre-sented subsets of four of the total of five predefined ΔF levels

413 Procedure

A production task required participants to adjust ΔF di-rectly in order to achieve the maximum attainable blendwith the produced value serving as the dependent vari-able User control was provided via a two-dimensionalgraphical interface including controls for ΔF and thelevel balance between instruments The slider controls forΔF= fslider + Γ provided a constant range of 700 Hzwith fslider isin [minus350+350] and including a randomizedroving offset Γ isin [minus100+100] between trials As visu-alized in Figure 4 (top) minimal or maximal Γ limitedthe range covered by all trials to 500 Hz (solid thick greyline) with all possible ΔF deviations spanning a range of900 Hz (dashed thick line) Participants completed a totalof 88 experimental trials (22 conditions3 times 4 repetitions)taking about 50 minutes and including a 5-minute breakafter about 44 trials

42 Results

421 General trends

For all six instruments participants associated optimalblend with deviations ΔF le 0 Figure 6 (diamonds inlower part) illustrates the means for optimal ΔF fromwhich two different patterns become apparent among in-struments ΔF are displayed relative to a scale of equiv-alent variations tested in Experiment B with the scalevalue 0 corresponding to ΔF = 04 The grey lines indicate

3 Only 17 conditions investigated ΔF the remainder studied other for-mant properties that lie outside the focus of this article For instancerelative main-formant magnitude variations ΔL were also investigatedwhich will be addressed in a future publication4 ΔF were linearly interpolated to a scale of equi-distant levels eg minusI 0 and +I corresponding to the numerical scale values -1 0 and 1 re-spectively

each instrumentrsquos respective slider range For the instru-ments horn bassoon oboe and trumpet (left panel) opti-mal blend was found in direct proximity to ΔF = 0 Forthe unison intervals of horn and bassoon ΔF did not differsignificantly from zero ΔF for the other two instrumentswere located slightly lower [t(16) le minus56 p lt 0001rge 82] By contrast optimal ΔF for the clarinet and flute(right panel) were relatively distant from ΔF = 0 in linewith significant underestimations [t(16)leminus38 p le 0015rge 69] In summary ΔF values leading to optimal blendwere limited to cases in which the formant of the synthe-sized instrument was at or below that of the sampled in-strument

422 Instrument register and interval type

The influence of instrument register on the optimal ΔFwas investigated for trumpet bassoon and clarinet atpitches corresponding to instrument-specific low and highregisters One-way repeated-measures ANOVAs for Reg-ister yielded moderately strong main effects for trumpetand bassoon [F (1 16) ge 192 p le 0005 η2

p ge 55] dueto an increase of the optimal ΔF in the high registerA less pronounced effect was obtained for the clarinet[F (1 16)=53p= 0358η2

p = 25]The investigation of possible differences in optimal ΔF

between interval types involved comparisons between uni-son and non-unison intervals as well as a distinction be-tween consonant and dissonant for the latter One-wayrepeated-measures ANOVAs on Interval conducted forbassoon clarinet and trumpet only led to a weak maineffect for the trumpet [F (2 32)= 37p= 0347η2

p = 19]Post-hoc tests for the three possible comparisons yieldeda single significant difference between the interval sizes 0and 6 ST [t(16) = minus35 p = 0033 r = 65]

5 Experiment B

51 Method

511 Participants

The experiment was completed by 20 participants 9 fe-male and 11 male with a median age of 22 years (range18ndash35) Fifteen participants reported more than 10 yearsof formal musical training with 11 indicating experi-ence with wind instruments Participants were remuner-ated with 20 Canadian dollars

512 Stimuli

Table II lists the 22 investigated dyad conditions The In-terval factor investigated two levels comparing unison to

1044

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table II Twenty-two dyad conditions from Experiment B across instruments pitches and intervals (top-to-bottom) Intervals in semi-tones relative to the specified reference pitch

horn bassoon oboe trumpet clarinet fluteC3 B 3 A2 D4 C4 G 4 E5 C4 B 4 E3 A4 C4 G 4 E5

0 6 0 6 0 -2 0 -2 0 0 0 0 6 0 6 0 -2 0 -2 0 0 0

non-unison (6 or -2 ST dissonant) intervals Depending onthe instrument the Pitch factor involved two (horn bas-soon trumpet clarinet) or three (oboe flute) levels In thecase of horn bassoon trumpet and clarinet there weretwo levels of Interval for each level of Pitch In additionthis experiment included two factors that were related toΔF variations alone which applied to all conditions listedin Table II The first was synonymous with ΔF as it ex-plored a total of five ΔF levels including ΔF = 0 and twosets of predefined moderate and extreme deviations aboveand below it ie the ΔF levels hereafter labeled 0 plusmnI and plusmnII The second factor grouped the five levels contex-tually into two subsets of four which are denoted as lowand high contexts and defined in Figure 4 (bottom)

Employing the formant descriptors from Section 21the investigated ΔF levels were expressed on a commonscale of spectral-envelope description which provided abetter basis of comparison than taking equal frequencydifferences in Hz as the frequency extent of formantsacross instruments varied considerably Figure 5 providesexamples for all resulting ΔF levels of the horn The fourlevels ΔF = 0 were defined as frequency distances be-tween the formant maximum Fmax and measures relatedto the location and width of its bounds (eg Frarr

6dB andΔF3dB respectively) For example the positive deviationΔF (+I) was the distance between the formant maximumand its upper bound minus 10 of the width betweenthe 3 dB bounds If spectral-envelope descriptions lackedlower bounds (eg trumpet clarinet flute) the frequencylocated below Fmax that exhibited the lowest magnitudewas taken as a substitute value

Unlike the dyads in Experiment A the synthesizedsound always remained at the reference pitch whereas thesampled sound varied its pitch for non-unison intervalsbecause this tested the assumption of pitch-invariant de-scription for the recorded sounds more thoroughly Thedyads had a constant duration of 4700 ms In additionthe conditions listed in Table II including the associatedfive ΔF levels per condition had predetermined values forthe level balance between sounds and had also been equal-ized for loudness The first author determined the level bal-ance aiming for good balance between both sounds whilemaintaining discriminability between ΔF levels whichwas subsequently verified by the second author Loudnessequalization was conducted subjectively in a separate pi-lot experiment anchored to a global reference dyad for allconditions and ΔF levels For all stimuli gain levels weredetermined that equalized stimulus loudness to the globalreference These gain levels were based on median val-ues from at least five participants which were determined

200 400 600 800 1000 1200-12

-10

-8

-6

-4

-2

0

Powerlevel(dB)

Frequency (Hz)

ΔF3dB

F3dB

larrF3dB

rarrFmax

F6dB

larrF6dB

rarr

-II -I 0 +I +II

Figure 5 ΔF levels from Experiment B defined relative toa spectral-envelope estimatersquos formant maximum and boundsΔF (plusmnI) fall 10 inside of ΔF3dBrsquos extent ΔF (+II) aligns withFrarr

6dB whereas ΔF (minusII) aligns with either 80 middot F larr6dB or 150 Hz

whichever is closer to Fmax

either after the corresponding interquartile ranges first fellbelow 4 dB or after running a maximum of 10 participants

513 ProcedureA relative-rating task required participants to compare ΔFlevels for a given condition from Table II In each ex-perimental trial participants were presented four dyadsand asked to provide four corresponding ratings The fourdyads represented one of the twoΔF contexts labeled highand low in Figure 4 A continuous rating scale was em-ployed which spanned from most blended to least blended(values 1 to 0 respectively) and served as the dependentvariable Participants needed to assign two dyads to thescale extremes (eg most and least) the remaining twodyads were positioned along the scale continuum relativeto the chosen extremes Playback could be switched freelybetween the four dyads with the visual order of the se-lection buttons and rating scales for individual dyads ran-domized between trials Participants completed 120 trials(30 conditions5 times 2 contextstimes 2 repetitions) taking about75 minutes including two 5-minute breaks after about 40and 80 trials

52 Results

521 General trendsGroup medians of ratings aggregated across the factorsPitch Interval and Context illustrate how blend varies as a

5 Only the 22 conditions investigating ΔF are reported here

1045

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

function of ΔF Figure 6 suggests that participants mainlyassociated higher degrees of blend with the levels ΔF le 0whereas much lower ratings were obtained for ΔF gt 0 Interms of higher degrees of blend two typical rating pro-files as a function of ΔF emerged (shown as the ideal-ized dashed-and-dotted curves in Figure 6) 1) For the in-struments horn bassoon oboe and trumpet (left panel)medium to high blend ratings were obtained at and be-low ΔF = 0 above which ratings decreased markedlyresembling the profile of a plateau 2) The instrumentsclarinet and flute (right panel) exhibited a monotonicallydecreasing and approximately linear rating profile as ΔFincreases in which ΔF = 0 did not appear to assume a no-table role These differences in plateau vs linear profilesfor the two instrument subsets are analyzed more closelyin the following sections also taking into account potentialeffects due to the other factors

522 Blend and pitch invariance

Spectral characteristics that remain stable with pitch varia-tion such as formants may have a pitch-invariant percep-tual relevance To test this whenever the profiles of blendratings over ΔF remained largely unaffected by differentpitches intervals and ΔF contexts the perceptual resultswere assumed to be pitch-invariant For instance Figure 7suggests this tendency for the horn in which the plateauprofile was maintained over all factorial manipulations Asa first step the main effects across ΔF were tested to con-firm that ratings served as reliable indicators of perceptualdifferences Given these main effects perceptual robust-ness to pitch variation was fulfilled if no ΔF times Pitch orΔF times Interval interaction effects were found across bothΔF contexts An absence of main effects betweenΔF con-texts would indicate further perceptual robustness

As the Context factor only involved ΔF levels com-mon to both the high and low contexts namely 0 andplusmnI (see Figure 4 bottom) the ratings for these levels re-quired range normalization and separate analyses from theremaining factors For the instruments involving the In-terval factor these were conducted on split levels of thatfactor Furthermore the experimental task imposed the us-age of the rating-scale extremes which resulted in severalviolations of normality due to skewed distributions for thedyads selected as extremes As a result all main and in-teraction effects were tested with a battery of five indepen-dent repeated-measures ANOVAs on the raw and trans-formed ratings The data transformations included non-parametric approaches of rank transformation [28] andprior alignment of lsquonuisancersquo factors [29]6 The statistics

6 Given the unavailability of non-parametric alternatives for repeated-measures three-way ANOVAs that include tests for interaction effectsan approach was chosen that assesses tests over multiple variants ofdependent-variable transformations presuming that the most conserva-tive test in the ANOVA battery minimizes accepting false positives Ranktransformation is a common approach in non-parametric tests such as theone-way Friedman test [28] Issues with tests for interaction effects los-ing power in the presence of strong main effects were addressed throughlsquoalignmentrsquo of the raw data prior to rank transformation [29] For in-stance a test for the interaction AtimesB would align its lsquonuisancersquo factors

-II -I 0 +I +II

0

1

ΔF

Blendrating

Produced optimal blend

-II -I 0 +I +IIΔF

Produced optimal blend

horn

bassoon

oboe

trumpet

profile

clarinet

flute

profile

Figure 6 Perceptual results for the different instruments groupedaccording to two typical response patterns (left and right panels)Experiment A (diamonds bottom) mean ΔF for produced opti-mal blend transformed to a continuous scale of ΔF levels Thegrey lines indicate slider ranges (compare to Figure 4 top) Ex-periment B (curves) median blend ratings across ΔF levels andidealized profile

-II -I 0 +I

0

1

context lowinterval unison

Blendrating

-I 0 +I +II

0

1

context highinterval unison

pitch 1

pitch 2

-II -I 0 +I

0

1

context lowinterval non-unison

Blendrating

ΔF-I 0 +I +II

0

1

context high

interval non-unison

ΔF

Figure 7 Medians and interquartile ranges of blend ratingsfor horn across ΔF levels and the factorial manipulationsPitchtimesContexttimesInterval

for the most liberal and conservative p-values are reported(eg conserv|liberal) with the conservative finding beingassumed valid if statistical significance is in doubt

Strong main effects were found for all instrumentswhich indicated clear differences in perceived blendamong the investigated ΔF levels Table III lists ANOVAstatistics for the range between strongest (clarinet) andweakest (bassoon) main effects among the instrumentswhich reflects analogous differences in the utilized rating-scale ranges in Figure 6 Furthermore the rating profiles ofthe instruments horn bassoon oboe and trumpet fulfilledthe criteria for pitch-invariant robustness as they appeared

by removing the main effects for A and B The four data transformationsprocessed the raw data with or without alignment and for two rankingmethods The first method computed global ranks across the entire dataset ie across participants and conditions whereas the second methodevaluated within-participant ranks across conditions per participant

1046

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table III Range of strongest to weakest ANOVA main effects along ΔF across all six instruments

low context high contextEffect Stat conserv liberal conserv liberal

F 860 826 1651 1651Clarinet df 16307 21391 357 357(strongest) ε 54 69 - -

p lt0001 lt0001 lt0001 lt0001η2p 82 81 90 90

F 164 168 126 152Bassoon df 357 357 14271 357(weakest) ε - - 48 -

p lt0001 lt0001 0005 0001η2p 46 47 40 44

unaffected by pitch-related variation There was only oneexception from a complete absence of effects interactingwith ΔF a moderate main effect for Context for trum-pet was found only at non-unison intervals [F (1 19) =1048|2504 p = 0043|0001 η2

p = 355|569] By con-trast the rating profiles for clarinet and flute did not exhibitpitch invariance as they clearly violated the criteria acrossboth ΔF contexts The interaction effects with ΔF and amain effect for Context leading to their disqualification aredescribed in Table IV

The instruments displaying pitch invariance were thesame ones with plateau rating profiles possibly attributinga special role to ΔF = 0 as defining a boundary governingblend To further support this assumption by joint analysisof the four instruments two hierarchical cluster analyseswere employed that grouped ΔF levels based on their sim-ilarity in perceptual ratings or auditory-model representa-tions The first cluster analysis reinterpreted rating differ-ences between ΔF as a dissimilarity measure This mea-sure considered effect sizes of statistically significant non-parametric post-hoc analyses (Wilcoxon signed-rank test)for pairwise comparisons between ΔF levels ie greaterstatistical effects between two ΔF levels were expressedas being more dissimilar in the perceived degree of blendFor non-significant differences dissimilarity was assumedto be zero The second analysis relied on correlation coef-ficients (Pearson r) between dyad SAI profiles across ΔFlevels (see Figure 9 for examples) The dissimilarity mea-sure considered the complement value 1minus r and as all cor-relations fall within the range [0 1] no special treatmentfor negative correlations was required Both cluster anal-yses employed complete-linkage algorithms The dissimi-larity input matrices were obtained by averaging 30 inde-pendent data sets aggregated across the four instrumentsand the factors Context Pitch and Interval As shown inFigure 8 both analyses led to analogous solutions in whichthe two levels ΔF gt 0 are maximally dissimilar to acompact cluster associating the three levels ΔF le 0 Inother words ΔF associated with low and high degrees ofblend group into two distinct clusters clearly relating tothe plateau profile where ΔF = 0 defines the boundary tohigher degrees of blend

Table IV ANOVA effects for clarinet and flute suggesting an ab-sence of pitch invariance lowast The column header low context doesnot apply in this case

Clarinet low context high contextEffect Stat conserv liberal conserv liberal

F 29 34 38 44ΔF times df 357 357 20387 357Interval ε - - 68 -

p 044 024 030 008η2p 13 15 17 19

Flute low context high contextEffect Stat conserv liberal conserv liberal

F 28 43 44 72ΔF times df 39750 6114 6114 6114Pitch ε 66 - - -

p 031 0006 0005 lt0001η2p 13 18 19 28

F 49 157 - -df 119 119 - -

Contextlowast ε - - - -p 039 0008 - -η2p 21 45 - -

-II -I 0 +I +II

01

02

03

04

05

Dissimilarity

ΔF-II -I 0 +I +II

01

02

ΔF

Figure 8 Dendrograms of ΔF -level groupings for the pitch-invariant instruments Dissimilarity measures are derived fromperceptual ratings (left) and auditory-modelled dyad SAI profiles(right)

523 Blend and its spectral correlatesExplaining blending between instruments with the helpof spectral-envelope characteristics could eventually al-

1047

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table V Variables entering stepwise-regression algorithm to ob-tain models reported in Table VI a Not for pitch-variant subsetinter-variable correlation |r| gt 7 b Computed as described in[30] c Accounting for pitch d Accounting for perceivable beat-ing between partial tones

No Variable Description

1 ΔLrarr3dB derivate of Frarr

3dB Equation (1)2 ΔLF1vsF2 ΔL betw formants F1 amp F2

3 ΔSslopeF1ab spectral slope above Frarr

3dB4 ΔSslope

b global spectral slope5 |ΔScentroid

b absolute centroid difference6 Scentroid

b centroid composite7 ERBrate

c reference pitch in Table II8 I(non)unison interval category (binary)9 IST

a interval size in semitones10 Clohi ΔF context (binary)11 mixratio balance betw instruments12 AMdepth

bd amplitude modulation depth

low the prediction of blend through these instrument-specific traits In addition it could help understand the wayin which spectral characteristics contribute perceptuallyGiven this aim multiple linear regression was employed tomodel the median blend ratings through a number of vari-ables or regressors The regression models assessed therelative contributions of regressors describing both globaland local spectral-envelope traits Global descriptors in-volved the commonly reported spectral centroid and spec-tral slope [30] whereas local descriptors concerned theformant characterization discussed in Section 21 Becausea dyad yielded two descriptor values across its constituentsounds the regressor measure had to associate the two insome way For the spectral centroid two measures wereconsidered namely composite (sum) and absolute differ-ence [4] Although the centroid composite relates to thelsquodarkerrsquo-timbre hypothesis mentioned in the Introductionthe centroid difference had still been found to best explainblend in non-unison intervals [4] which left some uncer-tainty as to which of these two measures was more appro-priate in explaining blend in general The remaining spec-tral regressors were implemented as polarity-preservingdifferences between descriptors with the sampled instru-ment serving as the reference (ref) and the synthesized in-strument being variable (var) across ΔF For example thedifference of descriptor values dx for instrument x wouldcorrespond to Δd = dref minus dvar

Regression models for two separate subsets of the per-ceptual data were explored pitch-invariant (horn bassoonoboe trumpet) and pitch-variant instruments (clarinetflute) The datasets comprised all conditions tested acrossfactors and instruments with N = 118 and N = 54 forthe pitch-invariant and -variant subsets respectively Re-gressor variables were pooled from the spectral-envelopedescriptors and additional variables that were included toaccount for potential confounding factors eg pitch in-terval If these factors did not contribute as regressorsthis would further support a perceptual relevance of pitch

invariance The initial pool of regressors comprised 32variables subsequently reduced to a pre-selected set thatexhibited inter-variable correlations |r| lt 7 in order toavoid pronounced collinearity among regressors The pre-selection was determined by first identifying the vari-able that in simple linear regression exhibited the high-est R2 and subsequently adding all remaining variablesthat yielded permissible inter-variable correlations Ta-ble V lists the pre-selected variables entered into the re-gression which comprised spectral-envelope descriptors(nos 1-6) and variables representing other potential fac-tors of influence (nos 7-12) Stepwise multiple-regressionalgorithms with both forward-selection and backward-elimination schemes were considered which converged onoptimum models by iteratively adding or eliminating re-gressors respectively Models with similar combinationsof regressors to the optimum models were explored aswell In anticipation of reporting the results the inclusionof a binary regressor for ΔF context Clohi benefited all in-vestigated regression models as it corrected the systematicoffset of scaled ratings between the low and high contexts(see Figure 7)

In simple linear regression the strongest spectral-enve-lope descriptors all concerned local formant characteriza-tion and did not involve the global descriptors Among theformant descriptors the highest correlations were obtainedfor the main-formant upper bound Frarr

3dB applied to bothpitch-invariant [R2(118) = 656 p lt 0001] and pitch-variant subsets [R2(54) = 713 p lt 0001] Notably theformant maximum Fmax did not perform better than Frarr

3dBlikely due to differing skewness properties between instru-ment formants (see Figure 1) At the same time the utilityof Frarr

3dB implies that it could assume an important role inexplaining blend as perhaps the perceptually most salientfeature of formants It performed slightly better for thepitch-variant than for the pitch-invariant subset becauseΔF rarr

3dB essentially follows a strictly monotonic functionacross the investigated ΔF levels which apparently mod-els the linear blend profile better To improve the model-ing for the plateau blend profile derivate descriptors ofFrarr

3dB were explored The most effective derivate ΔLrarr3dB re-

lated the upper-bound frequencies Frarr3dB of the two instru-

ments to a corresponding change in level across the ref-erence instrumentrsquos spectral-envelope magnitude functionLref (f ) in dB as formalized in Equation 1 In other wordsthis measure evaluated magnitude differences relative tothe reference instrument at frequencies appearing to be ofparticular perceptual relevance to both instruments whichtherefore may relate to spectral-masking effects (eg in-complete masking [13])

ΔLrarr3dB = Lref F rarr

3dB|ref minus Lref F rarr3dB|var (1)

The obtained solutions from stepwise multiple regressionyielded identical models for both instrument subsets in-volving the regressors ΔLrarr

3dB absolute spectral-centroiddifference |ΔScentroid| and context Clohi A slight gain inperformance was achieved by substituting the |ΔScentroid|based on audio signals from individual sounds with a vari-

1048

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table VI Multiple-regression models best predicting timbre-blend ratings for two instrument subsets

Pitch-invariant subset R2adj=87

F (3 116)=2724 p lt0001

Regressors βstd t p

ΔLrarr3dB 100 286 lt0001

|ΔScentroid| 26 77 lt0001Clohi 27 79 lt0001

Pitch-variant subset R2adj=88

F (3 52)=1342 p lt0001

Regressors βstd t p

ΔLrarr3dB 103 200 lt0001

|ΔScentroid| 16 33 0018Clohi 34 68 lt0001

ant computed on the pitch-generalized spectral-envelopeestimates Table VI displays these optimized regressionmodels for pitch-invariant and pitch-variant subsets bothleading to about 87 explained variance (adjusted R2)The standardized regression-slope coefficients βstd indi-cate the relative contribution of regressors with the rel-ative weights being very similar for both subsets In thesemodels ΔLrarr

3dB acted as the strongest predictor for theblend ratings contributing about five times more than|ΔScentroid| which furthermore did not perform better thanClohi These findings clearly argue for local spectral-envelope descriptors to be more meaningful than globalones in explaining blending in the investigated dyadsMoreover the remaining global descriptor spectral slopeappeared to play no role Furthermore finding both in-strument subsets to be modeled equally well through thesame spectral-envelope descriptors points to a general util-ity of pitch-generalized descriptions for all instrumentsDespite the findings in Section 522 arguing against pitch-invariant perceptual robustness for clarinet and flute theobtained regression models excluded the Pitch and Intervalvariables thus appearing to be less relevant to explainingthe blend ratings

6 General discussion

Orchestrators would benefit from acoustical descriptionsof instruments that correspond to the perceptual processesinvolved in achieving blended timbres Section 2 sug-gests that common orchestral wind instruments are rea-sonably well described through pitch-generalized spectral-envelope estimates which furthermore show the instru-ments horn bassoon oboe and trumpet to be charac-terized by prominent formant structure Auditory mod-els employing stabilized auditory images (SAI) confirmthat for strong formant characterization and for lower tomiddle pitch ranges the pitch-invariant characterization isstable In higher instrument registers however SAI pro-files indicate limitations to pitch-invariant characteriza-tion Other instruments like clarinet and flute yield SAI

profiles clearly varying as a function of pitch implyingthat this pitch dependency may also extend to perception

The perceptual investigation in Sections 3 to 5 con-firms the acoustical implications showing that strong for-mant characterization results in main formants becomingperceptually relevant to blending Given a dyad in whicha putative main formant is variable in frequency relativeto a fixed reference formant the investigated instrumentsdisplay two archetypical profiles based on their formantprominence For the pitch-variant clarinet and flute blendincreases as a monotonic quasi-linear function if the vari-able formant moves from above to below the referenceFor pitch-invariant instruments the frequency alignmentbetween the variable formant and the reference (ΔF= 0)functions as a boundary delimiting a region of higher de-grees of blend at and below the reference and contrastedby a marked decrease in blend when the variable formantexceeds it which overall resembles a plateau profile Thepitch invariance even extends to different interval typesas the plateau profile remains unaffected in non-unisonintervals regardless of their degree of consonance How-ever the findings suggest that the perceptual relevance ofspectral-envelope estimates diminishes in higher instru-ment registers The limited sampling of conditions acrossthe investigated factors prevented a more comprehensiveevaluation of all instruments Whereas the findings allowgeneral relationships for formant frequency to be deducedmore comprehensive investigations are needed to attainmore generalizable quantification of absolute frequencyranges and thresholds

In correlating acoustical and perceptual factors spec-tral-envelope characteristics alone explain up to 87 ofthe variance in blend ratings In addition local spectraltraits seem to be more powerful acoustical predictors ofblend than global traits The formant descriptor for theupper formant bound Frarr

3dB when expressed as a derivatedescriptor ΔLrarr

3dB acts as the strongest predictor for theblend ratings regardless of whether instruments belongto the pitch-invariant group or not With regard to clar-inet and flute the departures from pitch invariance foundin Section 522 contradict the utility of pitch-generalizedspectral-envelope description in predicting blend ratingsas reported in Section 523 Taking both findings into ac-count this for one argues that the descriptor Frarr

3dB stillsucceeds in explaining blend well even for clarinet andflute On the other hand the same instruments displaya greater perceptual sensitivity to the Pitch and Intervalfactors likely associated with their less pronounced for-mant structure Overall strong formant prominence leadsto more drastic changes in blend

The prediction of blend using ΔLrarr3dB still presumes

that one of the instruments serves as a reference formantas the employed difference descriptors are anchored tothe sampled instrument The dependence on a referenceleaves some ambiguity because an arbitrary combinationof two instruments would lead to contradictory predictionsof blend if both instruments were given equal importancein serving as the reference Given the context of both ex-

1049

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

periments it can be assumed that the sampled instrumentacting as a constant anchor had been biased into servingas the reference by combining it with a variable synthe-sized instrument In addition a possible perceptual expla-nation could concern audio samples of instruments playingwithout vibrato generally still exhibiting coherent micro-modulations of partial tones These modulations have beenshown to contribute to a stronger and more stable audi-tory image [31] and may thus bias the more stable imagetoward acting as the reference especially as the synthe-sized partials remain static over time Even in the contextof blending in musical performance one instrument as-sumes the role of the leading voice in which it possiblyserves as the reference whereas an accompanying instru-ment avoids exceeding the lead instrumentrsquos main-formantfrequency Likewise returning to the notion of blend lead-ing to augmented timbres [4] the dominant timbre to beembellished by another would seem predestined to func-tion as the reference based on which the less dominanttimbre should not exhibit formant frequencies exceedingthose of the reference

Finally the results allow a reassessment of previousexplanations for blend The lsquodarkerrsquo-timbre hypothesis[4] is directly reflected in the obtained linear blend pro-file in which lower ΔF increases blend and at the sametime causes a decrease in the spectral-centroid compos-ite By contrast this hypothesis is not well explained bythe plateau profile as blend ratings remain similarly highfor ΔF le 0 The alternative hypothesis of coincidence offormant regions [3] would have predicted stronger blendratings for ΔF = 0 than for all other levels which in theperceptual results is only achieved for the levels ΔF gt 0While the hypothesis only achieves partial fulfillment withrespect to spectral variations ΔF it finds more agreementin the corresponding SAI representations As shown in twoexample cases for horn in Figure 9 the dyad SAI pro-files for the levels ΔF gt 0 are distinguishable from theremaining levels through clear deviations between 1 and2 kHz7 and located just above the hornrsquos estimated Frarr

3dBRemarkably the formant shifts related to ΔFlt 0 (Fig-ure 3) are not reflected in the corresponding dyad SAIprofiles (Figure 9) which instead exhibit direct align-ment below 500 Hz for all three levels ΔF le 0 There-fore only ΔF gt 0 seems to lead to incomplete masking[13] revealing the presence of the synthesized instrumentwhereas the spectral-envelope variations ΔF lt 0 evokelittle change compared to the dyad excitation pattern forΔF = 0

Of still greater importance the auditory system as mod-eled by AIM using the DCGC seemingly involves a high-

7 Concerning the output from the AIM a misalignment between actualsinusoidal frequencies and the corresponding SAI peaks was observedThrough personal communication with the developer of the utilized AIMimplementation this was explained as being an inherent property of thedynamic-compression filters A correction function was derived by com-puting SAIs for various sinusoidal frequencies and fitting the two fre-quency scales yielding the linear function fSAI = 117 middot f + 282 Hz[r2(50)= 999 p lt 0001] In Figure 9 the correction manifests itself inthe compressed frequency extent for the SAIs

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

001

002

SAImagnitude

0

-II

-I

+I

+II

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

002

004

006

008

Frequency (Hz)

Figure 9 SAI profiles of dyads for allΔF levels (Experiment B)depicting two experimental conditions for horn Top pitch 1 uni-son bottom pitch 2 non-unison the grid lines correspond topartial-tone locations

pass characteristic that attenuates spectral-envelope re-gions below 500 Hz affecting the perceived magnitudesof the respective partial tones (grid lines) This impliesthat in the region below 500 Hz frequency deviations be-tween main formants no longer affect the achieved degreeof blend as reflected both in Figure 9 and the perceptualfindings Horn and bassoon would especially benefit fromthis as their main formants are centered around 500 HzOboe and trumpet both exhibiting higher Frarr

3dB can be as-sumed to benefit to a lesser degree In summary main for-mants located in proximity to 500 Hz will benefit more ontop of which lower pitches would also increase the num-ber of partial tones falling into this favorable frequencyregion This reflects tendencies for pitch-invariant traits inSAI correlations (see Section 22) to be more pronouncedat lower pitch ranges and for instruments of lower registerwhich would lend support to the lsquodarker-timbrersquo hypothe-sis in terms of limiting the spectral centroid in frequency

7 Conclusion

Evidence from acoustical and psychoacoustical descrip-tions of wind instruments and from perceptual validationshows that relative location and prominence of main for-mants affect the perception of timbre blend critically Fur-thermore these pitch-invariant spectral characteristics ex-plain and predict the perception of blend to a promis-ingly high degree Remaining discrepancies between theacoustic and perceptual domains can be explained throughapparent constraints of the simulated auditory systemIn conclusion a perceptual model for the contributionof local spectral-envelope characteristics to blending isproposed keeping in mind that it would serve as aninstrument-specific component in a more complex gen-eral perceptual model involving compositional and per-formance factors as initially discussed in Section 1 Themain factors influencing the perception of spectral blendare summarized1 Frequency relationships between upper bounds of main

formants are critical to blend Among several instru-

1050

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

ments one is expected to serve as the reference (eglead instrument dominant timbre) above which thepresence of other instrumentsrsquo formants would stronglyresult in decreased blend

2 Prominence of the main formants governs whetherthese relationships lead to plateau or linear blend pro-files and in the first case pitch-invariant perceptual ro-bustness extends to non-unison intervals

3 Spectral-envelope relationships below 500 Hz may benegligible due to constraints of the auditory system Atthe same time blend decreases at higher pitches due toa degraded perceptual robustness of formants

This hypothetical model still requires further investigationconcerning a more systematic study on 1) the apparentconstraints of the auditory system as modeled by AIM2) how in musical practice one instrument may functionas the reference 3) establishing a more specific descrip-tion of formant prominence and 4) addressing the contri-bution of loudness balance between instruments to blendThese future investigations will further validate and refinethe proposed perceptual model and will improve compu-tational prediction tools for the instrument-specific spec-tral component of blend Orchestration practice will ben-efit from these research efforts even beyond aiming forblend as knowledge of favorable instrument relationshipsalso informs orchestrators as to how to avoid it

References

[1] G J Sandell Concurrent timbres in orchestration a per-ceptual study of factors determining blend PhD disserta-tion Northwestern University 1991

[2] R A Kendall E C Carterette Identification and blend oftimbres as a basis for orchestration Contemp Music Rev9 (1993) 51ndash67

[3] C Reuter Die auditive Diskrimination von Orchesterin-strumenten - Verschmelzung und Heraushoumlrbarkeit vonInstrumentalklangfarben im Ensemblespiel (The auditorydiscrimination of orchestral instruments fusion and distin-guishability of instrumental timbres in ensemble playing)P Lang Frankfurt am Main 1996

[4] G J Sandell Roles for spectral centroid and other factorsin determining ldquoblended instrument pairings in orchestra-tion Music Percept 13 (1995) 209ndash246

[5] A S Bregman Auditory scene analysis the perceptual or-ganization of sound MIT Press Cambridge MA 1990

[6] S McAdams S Winsberg S Donnadieu G De Soete JKrimphoff Perceptual scaling of synthesized musical tim-bres common dimensions specificities and latent subjectclasses Psychol Res 58 (1995) 177ndash192

[7] L Wedin G Goude Dimension analysis of the perceptionof instrumental timbre Scand J Psychol 13 (1972) 228ndash240

[8] J M Grey J W Gordon Perceptual effects of spectralmodifications on musical timbres J Acoust Soc Am 63(1978) 1493ndash1500

[9] W Strong M Clark Synthesis of wind-instrument tonesJ Acoust Soc Am 41 (1967) 39ndash52

[10] D Tardieu S McAdams Perception of dyads of impulsiveand sustained instrument sounds Music Percept 30 (2012)117ndash128

[11] L Hermann Phonophotographische Untersuchungen(Phonophotographic investigations) Pfluumlger Archiv fuumlrdie Gesamte Physiologie des Menschen und der Thiere 58(1894) 264ndash279

[12] C Stumpf Die Sprachlaute (The speech sounds) SpringerBerlin 1926

[13] J P Fricke Zur Anwendung digitaler Klangfarbenfilter beiAufnahme und Wiedergabe (On the application of digitaltimbre filters during recording and playback) Proc 14thTonmeistertagung Munich Germany 1986 135ndash148

[14] K E Schumann Physik der Klangfarben (Physics of tim-bres) Professorial dissertation Universitaumlt Berlin Berlin1929

[15] E L Saldanha J F Corso Timbre cues and the identifica-tion of musical instruments J Acoust Soc Am 36 (1964)2021ndash2026

[16] W Strong M Clark Perturbations of synthetic orchestralwind-instrument tones J Acoust Soc Am 41 (1967) 277ndash285

[17] D Luce J Clark Physical correlates of brass-instrumenttones J Acoust Soc Am 42 (1967) 1232ndash1243

[18] D A Luce Dynamic spectrum changes of orchestral in-struments J Audio Eng Soc 23 (1975) 565ndash568

[19] J Meyer Acoustics and the performance of music 5th edSpringer New York 2009

[20] G Fant Acoustic theory of speech production MoutonThe Hague 1960

[21] C Reuter Klangfarbe und Instrumentation Geschichte ndashUrsachen ndash Wirkung (Timbre and instrumentation historyndash causes ndash effect) P Lang Frankfurt am Main 2002

[22] R van Dinther R D Patterson Perception of acousticscale and size in musical instrument sounds J AcoustSoc Am 120 (2006) 2158ndash2176

[23] T Irino R D Patterson A dynamic compressive gam-machirp auditory filterbank IEEE Trans Audio SpeechLang Process 14 (2006) 2222ndash2232

[24] B C Moore B R Glasberg Suggested formulae for cal-culating auditory-filter bandwidths and excitation patternsJ Acoust Soc Am 74 (1983) 750ndash753

[25] S-A Lembke S McAdams A spectral-envelope synthe-sis model to study perceptual blend between wind instru-ments Proc 11th Congregraves Franccedilais drsquoAcoustique IOAannual meeting Acoustics 2012 Nantes France 20121025ndash1030

[26] F Martin C Champlin Reconsidering the limits of normalhearing J Am Acad Audiol 11 (2000) 64ndash66

[27] ISO 389ndash8 Acoustics Reference zero for the calibrationof audiometric equipmentmdashPart 8 Reference equivalentthreshold sound pressure levels for pure tones and circum-aural earphones Tech Rept International Organization forStandardization Geneva Switzerland 2004

[28] W J Conover R L Iman Rank transformations as abridge between parametric and nonparametric statisticsAm Stat 35 (1981) 124ndash129

[29] J J Higgins S Tashtoush An aligned rank transform testfor interaction Nonlinear World 1 (1994) 201ndash221

[30] G Peeters B L Giordano P Susini N Misdariis SMcAdams The Timbre Toolbox extracting audio descrip-tors from musical signals J Acoust Soc Am 130 (2011)2902ndash2916

[31] S McAdams Spectral fusion spectral parsing and the for-mation of auditory images PhD dissertation Stanford Uni-versity 1984

1051

Page 7: The Role of Spectral-Envelope Characteristics in ... · Figure 1. Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels). Estimates are based

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table II Twenty-two dyad conditions from Experiment B across instruments pitches and intervals (top-to-bottom) Intervals in semi-tones relative to the specified reference pitch

horn bassoon oboe trumpet clarinet fluteC3 B 3 A2 D4 C4 G 4 E5 C4 B 4 E3 A4 C4 G 4 E5

0 6 0 6 0 -2 0 -2 0 0 0 0 6 0 6 0 -2 0 -2 0 0 0

non-unison (6 or -2 ST dissonant) intervals Depending onthe instrument the Pitch factor involved two (horn bas-soon trumpet clarinet) or three (oboe flute) levels In thecase of horn bassoon trumpet and clarinet there weretwo levels of Interval for each level of Pitch In additionthis experiment included two factors that were related toΔF variations alone which applied to all conditions listedin Table II The first was synonymous with ΔF as it ex-plored a total of five ΔF levels including ΔF = 0 and twosets of predefined moderate and extreme deviations aboveand below it ie the ΔF levels hereafter labeled 0 plusmnI and plusmnII The second factor grouped the five levels contex-tually into two subsets of four which are denoted as lowand high contexts and defined in Figure 4 (bottom)

Employing the formant descriptors from Section 21the investigated ΔF levels were expressed on a commonscale of spectral-envelope description which provided abetter basis of comparison than taking equal frequencydifferences in Hz as the frequency extent of formantsacross instruments varied considerably Figure 5 providesexamples for all resulting ΔF levels of the horn The fourlevels ΔF = 0 were defined as frequency distances be-tween the formant maximum Fmax and measures relatedto the location and width of its bounds (eg Frarr

6dB andΔF3dB respectively) For example the positive deviationΔF (+I) was the distance between the formant maximumand its upper bound minus 10 of the width betweenthe 3 dB bounds If spectral-envelope descriptions lackedlower bounds (eg trumpet clarinet flute) the frequencylocated below Fmax that exhibited the lowest magnitudewas taken as a substitute value

Unlike the dyads in Experiment A the synthesizedsound always remained at the reference pitch whereas thesampled sound varied its pitch for non-unison intervalsbecause this tested the assumption of pitch-invariant de-scription for the recorded sounds more thoroughly Thedyads had a constant duration of 4700 ms In additionthe conditions listed in Table II including the associatedfive ΔF levels per condition had predetermined values forthe level balance between sounds and had also been equal-ized for loudness The first author determined the level bal-ance aiming for good balance between both sounds whilemaintaining discriminability between ΔF levels whichwas subsequently verified by the second author Loudnessequalization was conducted subjectively in a separate pi-lot experiment anchored to a global reference dyad for allconditions and ΔF levels For all stimuli gain levels weredetermined that equalized stimulus loudness to the globalreference These gain levels were based on median val-ues from at least five participants which were determined

200 400 600 800 1000 1200-12

-10

-8

-6

-4

-2

0

Powerlevel(dB)

Frequency (Hz)

ΔF3dB

F3dB

larrF3dB

rarrFmax

F6dB

larrF6dB

rarr

-II -I 0 +I +II

Figure 5 ΔF levels from Experiment B defined relative toa spectral-envelope estimatersquos formant maximum and boundsΔF (plusmnI) fall 10 inside of ΔF3dBrsquos extent ΔF (+II) aligns withFrarr

6dB whereas ΔF (minusII) aligns with either 80 middot F larr6dB or 150 Hz

whichever is closer to Fmax

either after the corresponding interquartile ranges first fellbelow 4 dB or after running a maximum of 10 participants

513 ProcedureA relative-rating task required participants to compare ΔFlevels for a given condition from Table II In each ex-perimental trial participants were presented four dyadsand asked to provide four corresponding ratings The fourdyads represented one of the twoΔF contexts labeled highand low in Figure 4 A continuous rating scale was em-ployed which spanned from most blended to least blended(values 1 to 0 respectively) and served as the dependentvariable Participants needed to assign two dyads to thescale extremes (eg most and least) the remaining twodyads were positioned along the scale continuum relativeto the chosen extremes Playback could be switched freelybetween the four dyads with the visual order of the se-lection buttons and rating scales for individual dyads ran-domized between trials Participants completed 120 trials(30 conditions5 times 2 contextstimes 2 repetitions) taking about75 minutes including two 5-minute breaks after about 40and 80 trials

52 Results

521 General trendsGroup medians of ratings aggregated across the factorsPitch Interval and Context illustrate how blend varies as a

5 Only the 22 conditions investigating ΔF are reported here

1045

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

function of ΔF Figure 6 suggests that participants mainlyassociated higher degrees of blend with the levels ΔF le 0whereas much lower ratings were obtained for ΔF gt 0 Interms of higher degrees of blend two typical rating pro-files as a function of ΔF emerged (shown as the ideal-ized dashed-and-dotted curves in Figure 6) 1) For the in-struments horn bassoon oboe and trumpet (left panel)medium to high blend ratings were obtained at and be-low ΔF = 0 above which ratings decreased markedlyresembling the profile of a plateau 2) The instrumentsclarinet and flute (right panel) exhibited a monotonicallydecreasing and approximately linear rating profile as ΔFincreases in which ΔF = 0 did not appear to assume a no-table role These differences in plateau vs linear profilesfor the two instrument subsets are analyzed more closelyin the following sections also taking into account potentialeffects due to the other factors

522 Blend and pitch invariance

Spectral characteristics that remain stable with pitch varia-tion such as formants may have a pitch-invariant percep-tual relevance To test this whenever the profiles of blendratings over ΔF remained largely unaffected by differentpitches intervals and ΔF contexts the perceptual resultswere assumed to be pitch-invariant For instance Figure 7suggests this tendency for the horn in which the plateauprofile was maintained over all factorial manipulations Asa first step the main effects across ΔF were tested to con-firm that ratings served as reliable indicators of perceptualdifferences Given these main effects perceptual robust-ness to pitch variation was fulfilled if no ΔF times Pitch orΔF times Interval interaction effects were found across bothΔF contexts An absence of main effects betweenΔF con-texts would indicate further perceptual robustness

As the Context factor only involved ΔF levels com-mon to both the high and low contexts namely 0 andplusmnI (see Figure 4 bottom) the ratings for these levels re-quired range normalization and separate analyses from theremaining factors For the instruments involving the In-terval factor these were conducted on split levels of thatfactor Furthermore the experimental task imposed the us-age of the rating-scale extremes which resulted in severalviolations of normality due to skewed distributions for thedyads selected as extremes As a result all main and in-teraction effects were tested with a battery of five indepen-dent repeated-measures ANOVAs on the raw and trans-formed ratings The data transformations included non-parametric approaches of rank transformation [28] andprior alignment of lsquonuisancersquo factors [29]6 The statistics

6 Given the unavailability of non-parametric alternatives for repeated-measures three-way ANOVAs that include tests for interaction effectsan approach was chosen that assesses tests over multiple variants ofdependent-variable transformations presuming that the most conserva-tive test in the ANOVA battery minimizes accepting false positives Ranktransformation is a common approach in non-parametric tests such as theone-way Friedman test [28] Issues with tests for interaction effects los-ing power in the presence of strong main effects were addressed throughlsquoalignmentrsquo of the raw data prior to rank transformation [29] For in-stance a test for the interaction AtimesB would align its lsquonuisancersquo factors

-II -I 0 +I +II

0

1

ΔF

Blendrating

Produced optimal blend

-II -I 0 +I +IIΔF

Produced optimal blend

horn

bassoon

oboe

trumpet

profile

clarinet

flute

profile

Figure 6 Perceptual results for the different instruments groupedaccording to two typical response patterns (left and right panels)Experiment A (diamonds bottom) mean ΔF for produced opti-mal blend transformed to a continuous scale of ΔF levels Thegrey lines indicate slider ranges (compare to Figure 4 top) Ex-periment B (curves) median blend ratings across ΔF levels andidealized profile

-II -I 0 +I

0

1

context lowinterval unison

Blendrating

-I 0 +I +II

0

1

context highinterval unison

pitch 1

pitch 2

-II -I 0 +I

0

1

context lowinterval non-unison

Blendrating

ΔF-I 0 +I +II

0

1

context high

interval non-unison

ΔF

Figure 7 Medians and interquartile ranges of blend ratingsfor horn across ΔF levels and the factorial manipulationsPitchtimesContexttimesInterval

for the most liberal and conservative p-values are reported(eg conserv|liberal) with the conservative finding beingassumed valid if statistical significance is in doubt

Strong main effects were found for all instrumentswhich indicated clear differences in perceived blendamong the investigated ΔF levels Table III lists ANOVAstatistics for the range between strongest (clarinet) andweakest (bassoon) main effects among the instrumentswhich reflects analogous differences in the utilized rating-scale ranges in Figure 6 Furthermore the rating profiles ofthe instruments horn bassoon oboe and trumpet fulfilledthe criteria for pitch-invariant robustness as they appeared

by removing the main effects for A and B The four data transformationsprocessed the raw data with or without alignment and for two rankingmethods The first method computed global ranks across the entire dataset ie across participants and conditions whereas the second methodevaluated within-participant ranks across conditions per participant

1046

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table III Range of strongest to weakest ANOVA main effects along ΔF across all six instruments

low context high contextEffect Stat conserv liberal conserv liberal

F 860 826 1651 1651Clarinet df 16307 21391 357 357(strongest) ε 54 69 - -

p lt0001 lt0001 lt0001 lt0001η2p 82 81 90 90

F 164 168 126 152Bassoon df 357 357 14271 357(weakest) ε - - 48 -

p lt0001 lt0001 0005 0001η2p 46 47 40 44

unaffected by pitch-related variation There was only oneexception from a complete absence of effects interactingwith ΔF a moderate main effect for Context for trum-pet was found only at non-unison intervals [F (1 19) =1048|2504 p = 0043|0001 η2

p = 355|569] By con-trast the rating profiles for clarinet and flute did not exhibitpitch invariance as they clearly violated the criteria acrossboth ΔF contexts The interaction effects with ΔF and amain effect for Context leading to their disqualification aredescribed in Table IV

The instruments displaying pitch invariance were thesame ones with plateau rating profiles possibly attributinga special role to ΔF = 0 as defining a boundary governingblend To further support this assumption by joint analysisof the four instruments two hierarchical cluster analyseswere employed that grouped ΔF levels based on their sim-ilarity in perceptual ratings or auditory-model representa-tions The first cluster analysis reinterpreted rating differ-ences between ΔF as a dissimilarity measure This mea-sure considered effect sizes of statistically significant non-parametric post-hoc analyses (Wilcoxon signed-rank test)for pairwise comparisons between ΔF levels ie greaterstatistical effects between two ΔF levels were expressedas being more dissimilar in the perceived degree of blendFor non-significant differences dissimilarity was assumedto be zero The second analysis relied on correlation coef-ficients (Pearson r) between dyad SAI profiles across ΔFlevels (see Figure 9 for examples) The dissimilarity mea-sure considered the complement value 1minus r and as all cor-relations fall within the range [0 1] no special treatmentfor negative correlations was required Both cluster anal-yses employed complete-linkage algorithms The dissimi-larity input matrices were obtained by averaging 30 inde-pendent data sets aggregated across the four instrumentsand the factors Context Pitch and Interval As shown inFigure 8 both analyses led to analogous solutions in whichthe two levels ΔF gt 0 are maximally dissimilar to acompact cluster associating the three levels ΔF le 0 Inother words ΔF associated with low and high degrees ofblend group into two distinct clusters clearly relating tothe plateau profile where ΔF = 0 defines the boundary tohigher degrees of blend

Table IV ANOVA effects for clarinet and flute suggesting an ab-sence of pitch invariance lowast The column header low context doesnot apply in this case

Clarinet low context high contextEffect Stat conserv liberal conserv liberal

F 29 34 38 44ΔF times df 357 357 20387 357Interval ε - - 68 -

p 044 024 030 008η2p 13 15 17 19

Flute low context high contextEffect Stat conserv liberal conserv liberal

F 28 43 44 72ΔF times df 39750 6114 6114 6114Pitch ε 66 - - -

p 031 0006 0005 lt0001η2p 13 18 19 28

F 49 157 - -df 119 119 - -

Contextlowast ε - - - -p 039 0008 - -η2p 21 45 - -

-II -I 0 +I +II

01

02

03

04

05

Dissimilarity

ΔF-II -I 0 +I +II

01

02

ΔF

Figure 8 Dendrograms of ΔF -level groupings for the pitch-invariant instruments Dissimilarity measures are derived fromperceptual ratings (left) and auditory-modelled dyad SAI profiles(right)

523 Blend and its spectral correlatesExplaining blending between instruments with the helpof spectral-envelope characteristics could eventually al-

1047

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table V Variables entering stepwise-regression algorithm to ob-tain models reported in Table VI a Not for pitch-variant subsetinter-variable correlation |r| gt 7 b Computed as described in[30] c Accounting for pitch d Accounting for perceivable beat-ing between partial tones

No Variable Description

1 ΔLrarr3dB derivate of Frarr

3dB Equation (1)2 ΔLF1vsF2 ΔL betw formants F1 amp F2

3 ΔSslopeF1ab spectral slope above Frarr

3dB4 ΔSslope

b global spectral slope5 |ΔScentroid

b absolute centroid difference6 Scentroid

b centroid composite7 ERBrate

c reference pitch in Table II8 I(non)unison interval category (binary)9 IST

a interval size in semitones10 Clohi ΔF context (binary)11 mixratio balance betw instruments12 AMdepth

bd amplitude modulation depth

low the prediction of blend through these instrument-specific traits In addition it could help understand the wayin which spectral characteristics contribute perceptuallyGiven this aim multiple linear regression was employed tomodel the median blend ratings through a number of vari-ables or regressors The regression models assessed therelative contributions of regressors describing both globaland local spectral-envelope traits Global descriptors in-volved the commonly reported spectral centroid and spec-tral slope [30] whereas local descriptors concerned theformant characterization discussed in Section 21 Becausea dyad yielded two descriptor values across its constituentsounds the regressor measure had to associate the two insome way For the spectral centroid two measures wereconsidered namely composite (sum) and absolute differ-ence [4] Although the centroid composite relates to thelsquodarkerrsquo-timbre hypothesis mentioned in the Introductionthe centroid difference had still been found to best explainblend in non-unison intervals [4] which left some uncer-tainty as to which of these two measures was more appro-priate in explaining blend in general The remaining spec-tral regressors were implemented as polarity-preservingdifferences between descriptors with the sampled instru-ment serving as the reference (ref) and the synthesized in-strument being variable (var) across ΔF For example thedifference of descriptor values dx for instrument x wouldcorrespond to Δd = dref minus dvar

Regression models for two separate subsets of the per-ceptual data were explored pitch-invariant (horn bassoonoboe trumpet) and pitch-variant instruments (clarinetflute) The datasets comprised all conditions tested acrossfactors and instruments with N = 118 and N = 54 forthe pitch-invariant and -variant subsets respectively Re-gressor variables were pooled from the spectral-envelopedescriptors and additional variables that were included toaccount for potential confounding factors eg pitch in-terval If these factors did not contribute as regressorsthis would further support a perceptual relevance of pitch

invariance The initial pool of regressors comprised 32variables subsequently reduced to a pre-selected set thatexhibited inter-variable correlations |r| lt 7 in order toavoid pronounced collinearity among regressors The pre-selection was determined by first identifying the vari-able that in simple linear regression exhibited the high-est R2 and subsequently adding all remaining variablesthat yielded permissible inter-variable correlations Ta-ble V lists the pre-selected variables entered into the re-gression which comprised spectral-envelope descriptors(nos 1-6) and variables representing other potential fac-tors of influence (nos 7-12) Stepwise multiple-regressionalgorithms with both forward-selection and backward-elimination schemes were considered which converged onoptimum models by iteratively adding or eliminating re-gressors respectively Models with similar combinationsof regressors to the optimum models were explored aswell In anticipation of reporting the results the inclusionof a binary regressor for ΔF context Clohi benefited all in-vestigated regression models as it corrected the systematicoffset of scaled ratings between the low and high contexts(see Figure 7)

In simple linear regression the strongest spectral-enve-lope descriptors all concerned local formant characteriza-tion and did not involve the global descriptors Among theformant descriptors the highest correlations were obtainedfor the main-formant upper bound Frarr

3dB applied to bothpitch-invariant [R2(118) = 656 p lt 0001] and pitch-variant subsets [R2(54) = 713 p lt 0001] Notably theformant maximum Fmax did not perform better than Frarr

3dBlikely due to differing skewness properties between instru-ment formants (see Figure 1) At the same time the utilityof Frarr

3dB implies that it could assume an important role inexplaining blend as perhaps the perceptually most salientfeature of formants It performed slightly better for thepitch-variant than for the pitch-invariant subset becauseΔF rarr

3dB essentially follows a strictly monotonic functionacross the investigated ΔF levels which apparently mod-els the linear blend profile better To improve the model-ing for the plateau blend profile derivate descriptors ofFrarr

3dB were explored The most effective derivate ΔLrarr3dB re-

lated the upper-bound frequencies Frarr3dB of the two instru-

ments to a corresponding change in level across the ref-erence instrumentrsquos spectral-envelope magnitude functionLref (f ) in dB as formalized in Equation 1 In other wordsthis measure evaluated magnitude differences relative tothe reference instrument at frequencies appearing to be ofparticular perceptual relevance to both instruments whichtherefore may relate to spectral-masking effects (eg in-complete masking [13])

ΔLrarr3dB = Lref F rarr

3dB|ref minus Lref F rarr3dB|var (1)

The obtained solutions from stepwise multiple regressionyielded identical models for both instrument subsets in-volving the regressors ΔLrarr

3dB absolute spectral-centroiddifference |ΔScentroid| and context Clohi A slight gain inperformance was achieved by substituting the |ΔScentroid|based on audio signals from individual sounds with a vari-

1048

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table VI Multiple-regression models best predicting timbre-blend ratings for two instrument subsets

Pitch-invariant subset R2adj=87

F (3 116)=2724 p lt0001

Regressors βstd t p

ΔLrarr3dB 100 286 lt0001

|ΔScentroid| 26 77 lt0001Clohi 27 79 lt0001

Pitch-variant subset R2adj=88

F (3 52)=1342 p lt0001

Regressors βstd t p

ΔLrarr3dB 103 200 lt0001

|ΔScentroid| 16 33 0018Clohi 34 68 lt0001

ant computed on the pitch-generalized spectral-envelopeestimates Table VI displays these optimized regressionmodels for pitch-invariant and pitch-variant subsets bothleading to about 87 explained variance (adjusted R2)The standardized regression-slope coefficients βstd indi-cate the relative contribution of regressors with the rel-ative weights being very similar for both subsets In thesemodels ΔLrarr

3dB acted as the strongest predictor for theblend ratings contributing about five times more than|ΔScentroid| which furthermore did not perform better thanClohi These findings clearly argue for local spectral-envelope descriptors to be more meaningful than globalones in explaining blending in the investigated dyadsMoreover the remaining global descriptor spectral slopeappeared to play no role Furthermore finding both in-strument subsets to be modeled equally well through thesame spectral-envelope descriptors points to a general util-ity of pitch-generalized descriptions for all instrumentsDespite the findings in Section 522 arguing against pitch-invariant perceptual robustness for clarinet and flute theobtained regression models excluded the Pitch and Intervalvariables thus appearing to be less relevant to explainingthe blend ratings

6 General discussion

Orchestrators would benefit from acoustical descriptionsof instruments that correspond to the perceptual processesinvolved in achieving blended timbres Section 2 sug-gests that common orchestral wind instruments are rea-sonably well described through pitch-generalized spectral-envelope estimates which furthermore show the instru-ments horn bassoon oboe and trumpet to be charac-terized by prominent formant structure Auditory mod-els employing stabilized auditory images (SAI) confirmthat for strong formant characterization and for lower tomiddle pitch ranges the pitch-invariant characterization isstable In higher instrument registers however SAI pro-files indicate limitations to pitch-invariant characteriza-tion Other instruments like clarinet and flute yield SAI

profiles clearly varying as a function of pitch implyingthat this pitch dependency may also extend to perception

The perceptual investigation in Sections 3 to 5 con-firms the acoustical implications showing that strong for-mant characterization results in main formants becomingperceptually relevant to blending Given a dyad in whicha putative main formant is variable in frequency relativeto a fixed reference formant the investigated instrumentsdisplay two archetypical profiles based on their formantprominence For the pitch-variant clarinet and flute blendincreases as a monotonic quasi-linear function if the vari-able formant moves from above to below the referenceFor pitch-invariant instruments the frequency alignmentbetween the variable formant and the reference (ΔF= 0)functions as a boundary delimiting a region of higher de-grees of blend at and below the reference and contrastedby a marked decrease in blend when the variable formantexceeds it which overall resembles a plateau profile Thepitch invariance even extends to different interval typesas the plateau profile remains unaffected in non-unisonintervals regardless of their degree of consonance How-ever the findings suggest that the perceptual relevance ofspectral-envelope estimates diminishes in higher instru-ment registers The limited sampling of conditions acrossthe investigated factors prevented a more comprehensiveevaluation of all instruments Whereas the findings allowgeneral relationships for formant frequency to be deducedmore comprehensive investigations are needed to attainmore generalizable quantification of absolute frequencyranges and thresholds

In correlating acoustical and perceptual factors spec-tral-envelope characteristics alone explain up to 87 ofthe variance in blend ratings In addition local spectraltraits seem to be more powerful acoustical predictors ofblend than global traits The formant descriptor for theupper formant bound Frarr

3dB when expressed as a derivatedescriptor ΔLrarr

3dB acts as the strongest predictor for theblend ratings regardless of whether instruments belongto the pitch-invariant group or not With regard to clar-inet and flute the departures from pitch invariance foundin Section 522 contradict the utility of pitch-generalizedspectral-envelope description in predicting blend ratingsas reported in Section 523 Taking both findings into ac-count this for one argues that the descriptor Frarr

3dB stillsucceeds in explaining blend well even for clarinet andflute On the other hand the same instruments displaya greater perceptual sensitivity to the Pitch and Intervalfactors likely associated with their less pronounced for-mant structure Overall strong formant prominence leadsto more drastic changes in blend

The prediction of blend using ΔLrarr3dB still presumes

that one of the instruments serves as a reference formantas the employed difference descriptors are anchored tothe sampled instrument The dependence on a referenceleaves some ambiguity because an arbitrary combinationof two instruments would lead to contradictory predictionsof blend if both instruments were given equal importancein serving as the reference Given the context of both ex-

1049

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

periments it can be assumed that the sampled instrumentacting as a constant anchor had been biased into servingas the reference by combining it with a variable synthe-sized instrument In addition a possible perceptual expla-nation could concern audio samples of instruments playingwithout vibrato generally still exhibiting coherent micro-modulations of partial tones These modulations have beenshown to contribute to a stronger and more stable audi-tory image [31] and may thus bias the more stable imagetoward acting as the reference especially as the synthe-sized partials remain static over time Even in the contextof blending in musical performance one instrument as-sumes the role of the leading voice in which it possiblyserves as the reference whereas an accompanying instru-ment avoids exceeding the lead instrumentrsquos main-formantfrequency Likewise returning to the notion of blend lead-ing to augmented timbres [4] the dominant timbre to beembellished by another would seem predestined to func-tion as the reference based on which the less dominanttimbre should not exhibit formant frequencies exceedingthose of the reference

Finally the results allow a reassessment of previousexplanations for blend The lsquodarkerrsquo-timbre hypothesis[4] is directly reflected in the obtained linear blend pro-file in which lower ΔF increases blend and at the sametime causes a decrease in the spectral-centroid compos-ite By contrast this hypothesis is not well explained bythe plateau profile as blend ratings remain similarly highfor ΔF le 0 The alternative hypothesis of coincidence offormant regions [3] would have predicted stronger blendratings for ΔF = 0 than for all other levels which in theperceptual results is only achieved for the levels ΔF gt 0While the hypothesis only achieves partial fulfillment withrespect to spectral variations ΔF it finds more agreementin the corresponding SAI representations As shown in twoexample cases for horn in Figure 9 the dyad SAI pro-files for the levels ΔF gt 0 are distinguishable from theremaining levels through clear deviations between 1 and2 kHz7 and located just above the hornrsquos estimated Frarr

3dBRemarkably the formant shifts related to ΔFlt 0 (Fig-ure 3) are not reflected in the corresponding dyad SAIprofiles (Figure 9) which instead exhibit direct align-ment below 500 Hz for all three levels ΔF le 0 There-fore only ΔF gt 0 seems to lead to incomplete masking[13] revealing the presence of the synthesized instrumentwhereas the spectral-envelope variations ΔF lt 0 evokelittle change compared to the dyad excitation pattern forΔF = 0

Of still greater importance the auditory system as mod-eled by AIM using the DCGC seemingly involves a high-

7 Concerning the output from the AIM a misalignment between actualsinusoidal frequencies and the corresponding SAI peaks was observedThrough personal communication with the developer of the utilized AIMimplementation this was explained as being an inherent property of thedynamic-compression filters A correction function was derived by com-puting SAIs for various sinusoidal frequencies and fitting the two fre-quency scales yielding the linear function fSAI = 117 middot f + 282 Hz[r2(50)= 999 p lt 0001] In Figure 9 the correction manifests itself inthe compressed frequency extent for the SAIs

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

001

002

SAImagnitude

0

-II

-I

+I

+II

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

002

004

006

008

Frequency (Hz)

Figure 9 SAI profiles of dyads for allΔF levels (Experiment B)depicting two experimental conditions for horn Top pitch 1 uni-son bottom pitch 2 non-unison the grid lines correspond topartial-tone locations

pass characteristic that attenuates spectral-envelope re-gions below 500 Hz affecting the perceived magnitudesof the respective partial tones (grid lines) This impliesthat in the region below 500 Hz frequency deviations be-tween main formants no longer affect the achieved degreeof blend as reflected both in Figure 9 and the perceptualfindings Horn and bassoon would especially benefit fromthis as their main formants are centered around 500 HzOboe and trumpet both exhibiting higher Frarr

3dB can be as-sumed to benefit to a lesser degree In summary main for-mants located in proximity to 500 Hz will benefit more ontop of which lower pitches would also increase the num-ber of partial tones falling into this favorable frequencyregion This reflects tendencies for pitch-invariant traits inSAI correlations (see Section 22) to be more pronouncedat lower pitch ranges and for instruments of lower registerwhich would lend support to the lsquodarker-timbrersquo hypothe-sis in terms of limiting the spectral centroid in frequency

7 Conclusion

Evidence from acoustical and psychoacoustical descrip-tions of wind instruments and from perceptual validationshows that relative location and prominence of main for-mants affect the perception of timbre blend critically Fur-thermore these pitch-invariant spectral characteristics ex-plain and predict the perception of blend to a promis-ingly high degree Remaining discrepancies between theacoustic and perceptual domains can be explained throughapparent constraints of the simulated auditory systemIn conclusion a perceptual model for the contributionof local spectral-envelope characteristics to blending isproposed keeping in mind that it would serve as aninstrument-specific component in a more complex gen-eral perceptual model involving compositional and per-formance factors as initially discussed in Section 1 Themain factors influencing the perception of spectral blendare summarized1 Frequency relationships between upper bounds of main

formants are critical to blend Among several instru-

1050

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

ments one is expected to serve as the reference (eglead instrument dominant timbre) above which thepresence of other instrumentsrsquo formants would stronglyresult in decreased blend

2 Prominence of the main formants governs whetherthese relationships lead to plateau or linear blend pro-files and in the first case pitch-invariant perceptual ro-bustness extends to non-unison intervals

3 Spectral-envelope relationships below 500 Hz may benegligible due to constraints of the auditory system Atthe same time blend decreases at higher pitches due toa degraded perceptual robustness of formants

This hypothetical model still requires further investigationconcerning a more systematic study on 1) the apparentconstraints of the auditory system as modeled by AIM2) how in musical practice one instrument may functionas the reference 3) establishing a more specific descrip-tion of formant prominence and 4) addressing the contri-bution of loudness balance between instruments to blendThese future investigations will further validate and refinethe proposed perceptual model and will improve compu-tational prediction tools for the instrument-specific spec-tral component of blend Orchestration practice will ben-efit from these research efforts even beyond aiming forblend as knowledge of favorable instrument relationshipsalso informs orchestrators as to how to avoid it

References

[1] G J Sandell Concurrent timbres in orchestration a per-ceptual study of factors determining blend PhD disserta-tion Northwestern University 1991

[2] R A Kendall E C Carterette Identification and blend oftimbres as a basis for orchestration Contemp Music Rev9 (1993) 51ndash67

[3] C Reuter Die auditive Diskrimination von Orchesterin-strumenten - Verschmelzung und Heraushoumlrbarkeit vonInstrumentalklangfarben im Ensemblespiel (The auditorydiscrimination of orchestral instruments fusion and distin-guishability of instrumental timbres in ensemble playing)P Lang Frankfurt am Main 1996

[4] G J Sandell Roles for spectral centroid and other factorsin determining ldquoblended instrument pairings in orchestra-tion Music Percept 13 (1995) 209ndash246

[5] A S Bregman Auditory scene analysis the perceptual or-ganization of sound MIT Press Cambridge MA 1990

[6] S McAdams S Winsberg S Donnadieu G De Soete JKrimphoff Perceptual scaling of synthesized musical tim-bres common dimensions specificities and latent subjectclasses Psychol Res 58 (1995) 177ndash192

[7] L Wedin G Goude Dimension analysis of the perceptionof instrumental timbre Scand J Psychol 13 (1972) 228ndash240

[8] J M Grey J W Gordon Perceptual effects of spectralmodifications on musical timbres J Acoust Soc Am 63(1978) 1493ndash1500

[9] W Strong M Clark Synthesis of wind-instrument tonesJ Acoust Soc Am 41 (1967) 39ndash52

[10] D Tardieu S McAdams Perception of dyads of impulsiveand sustained instrument sounds Music Percept 30 (2012)117ndash128

[11] L Hermann Phonophotographische Untersuchungen(Phonophotographic investigations) Pfluumlger Archiv fuumlrdie Gesamte Physiologie des Menschen und der Thiere 58(1894) 264ndash279

[12] C Stumpf Die Sprachlaute (The speech sounds) SpringerBerlin 1926

[13] J P Fricke Zur Anwendung digitaler Klangfarbenfilter beiAufnahme und Wiedergabe (On the application of digitaltimbre filters during recording and playback) Proc 14thTonmeistertagung Munich Germany 1986 135ndash148

[14] K E Schumann Physik der Klangfarben (Physics of tim-bres) Professorial dissertation Universitaumlt Berlin Berlin1929

[15] E L Saldanha J F Corso Timbre cues and the identifica-tion of musical instruments J Acoust Soc Am 36 (1964)2021ndash2026

[16] W Strong M Clark Perturbations of synthetic orchestralwind-instrument tones J Acoust Soc Am 41 (1967) 277ndash285

[17] D Luce J Clark Physical correlates of brass-instrumenttones J Acoust Soc Am 42 (1967) 1232ndash1243

[18] D A Luce Dynamic spectrum changes of orchestral in-struments J Audio Eng Soc 23 (1975) 565ndash568

[19] J Meyer Acoustics and the performance of music 5th edSpringer New York 2009

[20] G Fant Acoustic theory of speech production MoutonThe Hague 1960

[21] C Reuter Klangfarbe und Instrumentation Geschichte ndashUrsachen ndash Wirkung (Timbre and instrumentation historyndash causes ndash effect) P Lang Frankfurt am Main 2002

[22] R van Dinther R D Patterson Perception of acousticscale and size in musical instrument sounds J AcoustSoc Am 120 (2006) 2158ndash2176

[23] T Irino R D Patterson A dynamic compressive gam-machirp auditory filterbank IEEE Trans Audio SpeechLang Process 14 (2006) 2222ndash2232

[24] B C Moore B R Glasberg Suggested formulae for cal-culating auditory-filter bandwidths and excitation patternsJ Acoust Soc Am 74 (1983) 750ndash753

[25] S-A Lembke S McAdams A spectral-envelope synthe-sis model to study perceptual blend between wind instru-ments Proc 11th Congregraves Franccedilais drsquoAcoustique IOAannual meeting Acoustics 2012 Nantes France 20121025ndash1030

[26] F Martin C Champlin Reconsidering the limits of normalhearing J Am Acad Audiol 11 (2000) 64ndash66

[27] ISO 389ndash8 Acoustics Reference zero for the calibrationof audiometric equipmentmdashPart 8 Reference equivalentthreshold sound pressure levels for pure tones and circum-aural earphones Tech Rept International Organization forStandardization Geneva Switzerland 2004

[28] W J Conover R L Iman Rank transformations as abridge between parametric and nonparametric statisticsAm Stat 35 (1981) 124ndash129

[29] J J Higgins S Tashtoush An aligned rank transform testfor interaction Nonlinear World 1 (1994) 201ndash221

[30] G Peeters B L Giordano P Susini N Misdariis SMcAdams The Timbre Toolbox extracting audio descrip-tors from musical signals J Acoust Soc Am 130 (2011)2902ndash2916

[31] S McAdams Spectral fusion spectral parsing and the for-mation of auditory images PhD dissertation Stanford Uni-versity 1984

1051

Page 8: The Role of Spectral-Envelope Characteristics in ... · Figure 1. Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels). Estimates are based

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

function of ΔF Figure 6 suggests that participants mainlyassociated higher degrees of blend with the levels ΔF le 0whereas much lower ratings were obtained for ΔF gt 0 Interms of higher degrees of blend two typical rating pro-files as a function of ΔF emerged (shown as the ideal-ized dashed-and-dotted curves in Figure 6) 1) For the in-struments horn bassoon oboe and trumpet (left panel)medium to high blend ratings were obtained at and be-low ΔF = 0 above which ratings decreased markedlyresembling the profile of a plateau 2) The instrumentsclarinet and flute (right panel) exhibited a monotonicallydecreasing and approximately linear rating profile as ΔFincreases in which ΔF = 0 did not appear to assume a no-table role These differences in plateau vs linear profilesfor the two instrument subsets are analyzed more closelyin the following sections also taking into account potentialeffects due to the other factors

522 Blend and pitch invariance

Spectral characteristics that remain stable with pitch varia-tion such as formants may have a pitch-invariant percep-tual relevance To test this whenever the profiles of blendratings over ΔF remained largely unaffected by differentpitches intervals and ΔF contexts the perceptual resultswere assumed to be pitch-invariant For instance Figure 7suggests this tendency for the horn in which the plateauprofile was maintained over all factorial manipulations Asa first step the main effects across ΔF were tested to con-firm that ratings served as reliable indicators of perceptualdifferences Given these main effects perceptual robust-ness to pitch variation was fulfilled if no ΔF times Pitch orΔF times Interval interaction effects were found across bothΔF contexts An absence of main effects betweenΔF con-texts would indicate further perceptual robustness

As the Context factor only involved ΔF levels com-mon to both the high and low contexts namely 0 andplusmnI (see Figure 4 bottom) the ratings for these levels re-quired range normalization and separate analyses from theremaining factors For the instruments involving the In-terval factor these were conducted on split levels of thatfactor Furthermore the experimental task imposed the us-age of the rating-scale extremes which resulted in severalviolations of normality due to skewed distributions for thedyads selected as extremes As a result all main and in-teraction effects were tested with a battery of five indepen-dent repeated-measures ANOVAs on the raw and trans-formed ratings The data transformations included non-parametric approaches of rank transformation [28] andprior alignment of lsquonuisancersquo factors [29]6 The statistics

6 Given the unavailability of non-parametric alternatives for repeated-measures three-way ANOVAs that include tests for interaction effectsan approach was chosen that assesses tests over multiple variants ofdependent-variable transformations presuming that the most conserva-tive test in the ANOVA battery minimizes accepting false positives Ranktransformation is a common approach in non-parametric tests such as theone-way Friedman test [28] Issues with tests for interaction effects los-ing power in the presence of strong main effects were addressed throughlsquoalignmentrsquo of the raw data prior to rank transformation [29] For in-stance a test for the interaction AtimesB would align its lsquonuisancersquo factors

-II -I 0 +I +II

0

1

ΔF

Blendrating

Produced optimal blend

-II -I 0 +I +IIΔF

Produced optimal blend

horn

bassoon

oboe

trumpet

profile

clarinet

flute

profile

Figure 6 Perceptual results for the different instruments groupedaccording to two typical response patterns (left and right panels)Experiment A (diamonds bottom) mean ΔF for produced opti-mal blend transformed to a continuous scale of ΔF levels Thegrey lines indicate slider ranges (compare to Figure 4 top) Ex-periment B (curves) median blend ratings across ΔF levels andidealized profile

-II -I 0 +I

0

1

context lowinterval unison

Blendrating

-I 0 +I +II

0

1

context highinterval unison

pitch 1

pitch 2

-II -I 0 +I

0

1

context lowinterval non-unison

Blendrating

ΔF-I 0 +I +II

0

1

context high

interval non-unison

ΔF

Figure 7 Medians and interquartile ranges of blend ratingsfor horn across ΔF levels and the factorial manipulationsPitchtimesContexttimesInterval

for the most liberal and conservative p-values are reported(eg conserv|liberal) with the conservative finding beingassumed valid if statistical significance is in doubt

Strong main effects were found for all instrumentswhich indicated clear differences in perceived blendamong the investigated ΔF levels Table III lists ANOVAstatistics for the range between strongest (clarinet) andweakest (bassoon) main effects among the instrumentswhich reflects analogous differences in the utilized rating-scale ranges in Figure 6 Furthermore the rating profiles ofthe instruments horn bassoon oboe and trumpet fulfilledthe criteria for pitch-invariant robustness as they appeared

by removing the main effects for A and B The four data transformationsprocessed the raw data with or without alignment and for two rankingmethods The first method computed global ranks across the entire dataset ie across participants and conditions whereas the second methodevaluated within-participant ranks across conditions per participant

1046

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table III Range of strongest to weakest ANOVA main effects along ΔF across all six instruments

low context high contextEffect Stat conserv liberal conserv liberal

F 860 826 1651 1651Clarinet df 16307 21391 357 357(strongest) ε 54 69 - -

p lt0001 lt0001 lt0001 lt0001η2p 82 81 90 90

F 164 168 126 152Bassoon df 357 357 14271 357(weakest) ε - - 48 -

p lt0001 lt0001 0005 0001η2p 46 47 40 44

unaffected by pitch-related variation There was only oneexception from a complete absence of effects interactingwith ΔF a moderate main effect for Context for trum-pet was found only at non-unison intervals [F (1 19) =1048|2504 p = 0043|0001 η2

p = 355|569] By con-trast the rating profiles for clarinet and flute did not exhibitpitch invariance as they clearly violated the criteria acrossboth ΔF contexts The interaction effects with ΔF and amain effect for Context leading to their disqualification aredescribed in Table IV

The instruments displaying pitch invariance were thesame ones with plateau rating profiles possibly attributinga special role to ΔF = 0 as defining a boundary governingblend To further support this assumption by joint analysisof the four instruments two hierarchical cluster analyseswere employed that grouped ΔF levels based on their sim-ilarity in perceptual ratings or auditory-model representa-tions The first cluster analysis reinterpreted rating differ-ences between ΔF as a dissimilarity measure This mea-sure considered effect sizes of statistically significant non-parametric post-hoc analyses (Wilcoxon signed-rank test)for pairwise comparisons between ΔF levels ie greaterstatistical effects between two ΔF levels were expressedas being more dissimilar in the perceived degree of blendFor non-significant differences dissimilarity was assumedto be zero The second analysis relied on correlation coef-ficients (Pearson r) between dyad SAI profiles across ΔFlevels (see Figure 9 for examples) The dissimilarity mea-sure considered the complement value 1minus r and as all cor-relations fall within the range [0 1] no special treatmentfor negative correlations was required Both cluster anal-yses employed complete-linkage algorithms The dissimi-larity input matrices were obtained by averaging 30 inde-pendent data sets aggregated across the four instrumentsand the factors Context Pitch and Interval As shown inFigure 8 both analyses led to analogous solutions in whichthe two levels ΔF gt 0 are maximally dissimilar to acompact cluster associating the three levels ΔF le 0 Inother words ΔF associated with low and high degrees ofblend group into two distinct clusters clearly relating tothe plateau profile where ΔF = 0 defines the boundary tohigher degrees of blend

Table IV ANOVA effects for clarinet and flute suggesting an ab-sence of pitch invariance lowast The column header low context doesnot apply in this case

Clarinet low context high contextEffect Stat conserv liberal conserv liberal

F 29 34 38 44ΔF times df 357 357 20387 357Interval ε - - 68 -

p 044 024 030 008η2p 13 15 17 19

Flute low context high contextEffect Stat conserv liberal conserv liberal

F 28 43 44 72ΔF times df 39750 6114 6114 6114Pitch ε 66 - - -

p 031 0006 0005 lt0001η2p 13 18 19 28

F 49 157 - -df 119 119 - -

Contextlowast ε - - - -p 039 0008 - -η2p 21 45 - -

-II -I 0 +I +II

01

02

03

04

05

Dissimilarity

ΔF-II -I 0 +I +II

01

02

ΔF

Figure 8 Dendrograms of ΔF -level groupings for the pitch-invariant instruments Dissimilarity measures are derived fromperceptual ratings (left) and auditory-modelled dyad SAI profiles(right)

523 Blend and its spectral correlatesExplaining blending between instruments with the helpof spectral-envelope characteristics could eventually al-

1047

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table V Variables entering stepwise-regression algorithm to ob-tain models reported in Table VI a Not for pitch-variant subsetinter-variable correlation |r| gt 7 b Computed as described in[30] c Accounting for pitch d Accounting for perceivable beat-ing between partial tones

No Variable Description

1 ΔLrarr3dB derivate of Frarr

3dB Equation (1)2 ΔLF1vsF2 ΔL betw formants F1 amp F2

3 ΔSslopeF1ab spectral slope above Frarr

3dB4 ΔSslope

b global spectral slope5 |ΔScentroid

b absolute centroid difference6 Scentroid

b centroid composite7 ERBrate

c reference pitch in Table II8 I(non)unison interval category (binary)9 IST

a interval size in semitones10 Clohi ΔF context (binary)11 mixratio balance betw instruments12 AMdepth

bd amplitude modulation depth

low the prediction of blend through these instrument-specific traits In addition it could help understand the wayin which spectral characteristics contribute perceptuallyGiven this aim multiple linear regression was employed tomodel the median blend ratings through a number of vari-ables or regressors The regression models assessed therelative contributions of regressors describing both globaland local spectral-envelope traits Global descriptors in-volved the commonly reported spectral centroid and spec-tral slope [30] whereas local descriptors concerned theformant characterization discussed in Section 21 Becausea dyad yielded two descriptor values across its constituentsounds the regressor measure had to associate the two insome way For the spectral centroid two measures wereconsidered namely composite (sum) and absolute differ-ence [4] Although the centroid composite relates to thelsquodarkerrsquo-timbre hypothesis mentioned in the Introductionthe centroid difference had still been found to best explainblend in non-unison intervals [4] which left some uncer-tainty as to which of these two measures was more appro-priate in explaining blend in general The remaining spec-tral regressors were implemented as polarity-preservingdifferences between descriptors with the sampled instru-ment serving as the reference (ref) and the synthesized in-strument being variable (var) across ΔF For example thedifference of descriptor values dx for instrument x wouldcorrespond to Δd = dref minus dvar

Regression models for two separate subsets of the per-ceptual data were explored pitch-invariant (horn bassoonoboe trumpet) and pitch-variant instruments (clarinetflute) The datasets comprised all conditions tested acrossfactors and instruments with N = 118 and N = 54 forthe pitch-invariant and -variant subsets respectively Re-gressor variables were pooled from the spectral-envelopedescriptors and additional variables that were included toaccount for potential confounding factors eg pitch in-terval If these factors did not contribute as regressorsthis would further support a perceptual relevance of pitch

invariance The initial pool of regressors comprised 32variables subsequently reduced to a pre-selected set thatexhibited inter-variable correlations |r| lt 7 in order toavoid pronounced collinearity among regressors The pre-selection was determined by first identifying the vari-able that in simple linear regression exhibited the high-est R2 and subsequently adding all remaining variablesthat yielded permissible inter-variable correlations Ta-ble V lists the pre-selected variables entered into the re-gression which comprised spectral-envelope descriptors(nos 1-6) and variables representing other potential fac-tors of influence (nos 7-12) Stepwise multiple-regressionalgorithms with both forward-selection and backward-elimination schemes were considered which converged onoptimum models by iteratively adding or eliminating re-gressors respectively Models with similar combinationsof regressors to the optimum models were explored aswell In anticipation of reporting the results the inclusionof a binary regressor for ΔF context Clohi benefited all in-vestigated regression models as it corrected the systematicoffset of scaled ratings between the low and high contexts(see Figure 7)

In simple linear regression the strongest spectral-enve-lope descriptors all concerned local formant characteriza-tion and did not involve the global descriptors Among theformant descriptors the highest correlations were obtainedfor the main-formant upper bound Frarr

3dB applied to bothpitch-invariant [R2(118) = 656 p lt 0001] and pitch-variant subsets [R2(54) = 713 p lt 0001] Notably theformant maximum Fmax did not perform better than Frarr

3dBlikely due to differing skewness properties between instru-ment formants (see Figure 1) At the same time the utilityof Frarr

3dB implies that it could assume an important role inexplaining blend as perhaps the perceptually most salientfeature of formants It performed slightly better for thepitch-variant than for the pitch-invariant subset becauseΔF rarr

3dB essentially follows a strictly monotonic functionacross the investigated ΔF levels which apparently mod-els the linear blend profile better To improve the model-ing for the plateau blend profile derivate descriptors ofFrarr

3dB were explored The most effective derivate ΔLrarr3dB re-

lated the upper-bound frequencies Frarr3dB of the two instru-

ments to a corresponding change in level across the ref-erence instrumentrsquos spectral-envelope magnitude functionLref (f ) in dB as formalized in Equation 1 In other wordsthis measure evaluated magnitude differences relative tothe reference instrument at frequencies appearing to be ofparticular perceptual relevance to both instruments whichtherefore may relate to spectral-masking effects (eg in-complete masking [13])

ΔLrarr3dB = Lref F rarr

3dB|ref minus Lref F rarr3dB|var (1)

The obtained solutions from stepwise multiple regressionyielded identical models for both instrument subsets in-volving the regressors ΔLrarr

3dB absolute spectral-centroiddifference |ΔScentroid| and context Clohi A slight gain inperformance was achieved by substituting the |ΔScentroid|based on audio signals from individual sounds with a vari-

1048

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table VI Multiple-regression models best predicting timbre-blend ratings for two instrument subsets

Pitch-invariant subset R2adj=87

F (3 116)=2724 p lt0001

Regressors βstd t p

ΔLrarr3dB 100 286 lt0001

|ΔScentroid| 26 77 lt0001Clohi 27 79 lt0001

Pitch-variant subset R2adj=88

F (3 52)=1342 p lt0001

Regressors βstd t p

ΔLrarr3dB 103 200 lt0001

|ΔScentroid| 16 33 0018Clohi 34 68 lt0001

ant computed on the pitch-generalized spectral-envelopeestimates Table VI displays these optimized regressionmodels for pitch-invariant and pitch-variant subsets bothleading to about 87 explained variance (adjusted R2)The standardized regression-slope coefficients βstd indi-cate the relative contribution of regressors with the rel-ative weights being very similar for both subsets In thesemodels ΔLrarr

3dB acted as the strongest predictor for theblend ratings contributing about five times more than|ΔScentroid| which furthermore did not perform better thanClohi These findings clearly argue for local spectral-envelope descriptors to be more meaningful than globalones in explaining blending in the investigated dyadsMoreover the remaining global descriptor spectral slopeappeared to play no role Furthermore finding both in-strument subsets to be modeled equally well through thesame spectral-envelope descriptors points to a general util-ity of pitch-generalized descriptions for all instrumentsDespite the findings in Section 522 arguing against pitch-invariant perceptual robustness for clarinet and flute theobtained regression models excluded the Pitch and Intervalvariables thus appearing to be less relevant to explainingthe blend ratings

6 General discussion

Orchestrators would benefit from acoustical descriptionsof instruments that correspond to the perceptual processesinvolved in achieving blended timbres Section 2 sug-gests that common orchestral wind instruments are rea-sonably well described through pitch-generalized spectral-envelope estimates which furthermore show the instru-ments horn bassoon oboe and trumpet to be charac-terized by prominent formant structure Auditory mod-els employing stabilized auditory images (SAI) confirmthat for strong formant characterization and for lower tomiddle pitch ranges the pitch-invariant characterization isstable In higher instrument registers however SAI pro-files indicate limitations to pitch-invariant characteriza-tion Other instruments like clarinet and flute yield SAI

profiles clearly varying as a function of pitch implyingthat this pitch dependency may also extend to perception

The perceptual investigation in Sections 3 to 5 con-firms the acoustical implications showing that strong for-mant characterization results in main formants becomingperceptually relevant to blending Given a dyad in whicha putative main formant is variable in frequency relativeto a fixed reference formant the investigated instrumentsdisplay two archetypical profiles based on their formantprominence For the pitch-variant clarinet and flute blendincreases as a monotonic quasi-linear function if the vari-able formant moves from above to below the referenceFor pitch-invariant instruments the frequency alignmentbetween the variable formant and the reference (ΔF= 0)functions as a boundary delimiting a region of higher de-grees of blend at and below the reference and contrastedby a marked decrease in blend when the variable formantexceeds it which overall resembles a plateau profile Thepitch invariance even extends to different interval typesas the plateau profile remains unaffected in non-unisonintervals regardless of their degree of consonance How-ever the findings suggest that the perceptual relevance ofspectral-envelope estimates diminishes in higher instru-ment registers The limited sampling of conditions acrossthe investigated factors prevented a more comprehensiveevaluation of all instruments Whereas the findings allowgeneral relationships for formant frequency to be deducedmore comprehensive investigations are needed to attainmore generalizable quantification of absolute frequencyranges and thresholds

In correlating acoustical and perceptual factors spec-tral-envelope characteristics alone explain up to 87 ofthe variance in blend ratings In addition local spectraltraits seem to be more powerful acoustical predictors ofblend than global traits The formant descriptor for theupper formant bound Frarr

3dB when expressed as a derivatedescriptor ΔLrarr

3dB acts as the strongest predictor for theblend ratings regardless of whether instruments belongto the pitch-invariant group or not With regard to clar-inet and flute the departures from pitch invariance foundin Section 522 contradict the utility of pitch-generalizedspectral-envelope description in predicting blend ratingsas reported in Section 523 Taking both findings into ac-count this for one argues that the descriptor Frarr

3dB stillsucceeds in explaining blend well even for clarinet andflute On the other hand the same instruments displaya greater perceptual sensitivity to the Pitch and Intervalfactors likely associated with their less pronounced for-mant structure Overall strong formant prominence leadsto more drastic changes in blend

The prediction of blend using ΔLrarr3dB still presumes

that one of the instruments serves as a reference formantas the employed difference descriptors are anchored tothe sampled instrument The dependence on a referenceleaves some ambiguity because an arbitrary combinationof two instruments would lead to contradictory predictionsof blend if both instruments were given equal importancein serving as the reference Given the context of both ex-

1049

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

periments it can be assumed that the sampled instrumentacting as a constant anchor had been biased into servingas the reference by combining it with a variable synthe-sized instrument In addition a possible perceptual expla-nation could concern audio samples of instruments playingwithout vibrato generally still exhibiting coherent micro-modulations of partial tones These modulations have beenshown to contribute to a stronger and more stable audi-tory image [31] and may thus bias the more stable imagetoward acting as the reference especially as the synthe-sized partials remain static over time Even in the contextof blending in musical performance one instrument as-sumes the role of the leading voice in which it possiblyserves as the reference whereas an accompanying instru-ment avoids exceeding the lead instrumentrsquos main-formantfrequency Likewise returning to the notion of blend lead-ing to augmented timbres [4] the dominant timbre to beembellished by another would seem predestined to func-tion as the reference based on which the less dominanttimbre should not exhibit formant frequencies exceedingthose of the reference

Finally the results allow a reassessment of previousexplanations for blend The lsquodarkerrsquo-timbre hypothesis[4] is directly reflected in the obtained linear blend pro-file in which lower ΔF increases blend and at the sametime causes a decrease in the spectral-centroid compos-ite By contrast this hypothesis is not well explained bythe plateau profile as blend ratings remain similarly highfor ΔF le 0 The alternative hypothesis of coincidence offormant regions [3] would have predicted stronger blendratings for ΔF = 0 than for all other levels which in theperceptual results is only achieved for the levels ΔF gt 0While the hypothesis only achieves partial fulfillment withrespect to spectral variations ΔF it finds more agreementin the corresponding SAI representations As shown in twoexample cases for horn in Figure 9 the dyad SAI pro-files for the levels ΔF gt 0 are distinguishable from theremaining levels through clear deviations between 1 and2 kHz7 and located just above the hornrsquos estimated Frarr

3dBRemarkably the formant shifts related to ΔFlt 0 (Fig-ure 3) are not reflected in the corresponding dyad SAIprofiles (Figure 9) which instead exhibit direct align-ment below 500 Hz for all three levels ΔF le 0 There-fore only ΔF gt 0 seems to lead to incomplete masking[13] revealing the presence of the synthesized instrumentwhereas the spectral-envelope variations ΔF lt 0 evokelittle change compared to the dyad excitation pattern forΔF = 0

Of still greater importance the auditory system as mod-eled by AIM using the DCGC seemingly involves a high-

7 Concerning the output from the AIM a misalignment between actualsinusoidal frequencies and the corresponding SAI peaks was observedThrough personal communication with the developer of the utilized AIMimplementation this was explained as being an inherent property of thedynamic-compression filters A correction function was derived by com-puting SAIs for various sinusoidal frequencies and fitting the two fre-quency scales yielding the linear function fSAI = 117 middot f + 282 Hz[r2(50)= 999 p lt 0001] In Figure 9 the correction manifests itself inthe compressed frequency extent for the SAIs

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

001

002

SAImagnitude

0

-II

-I

+I

+II

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

002

004

006

008

Frequency (Hz)

Figure 9 SAI profiles of dyads for allΔF levels (Experiment B)depicting two experimental conditions for horn Top pitch 1 uni-son bottom pitch 2 non-unison the grid lines correspond topartial-tone locations

pass characteristic that attenuates spectral-envelope re-gions below 500 Hz affecting the perceived magnitudesof the respective partial tones (grid lines) This impliesthat in the region below 500 Hz frequency deviations be-tween main formants no longer affect the achieved degreeof blend as reflected both in Figure 9 and the perceptualfindings Horn and bassoon would especially benefit fromthis as their main formants are centered around 500 HzOboe and trumpet both exhibiting higher Frarr

3dB can be as-sumed to benefit to a lesser degree In summary main for-mants located in proximity to 500 Hz will benefit more ontop of which lower pitches would also increase the num-ber of partial tones falling into this favorable frequencyregion This reflects tendencies for pitch-invariant traits inSAI correlations (see Section 22) to be more pronouncedat lower pitch ranges and for instruments of lower registerwhich would lend support to the lsquodarker-timbrersquo hypothe-sis in terms of limiting the spectral centroid in frequency

7 Conclusion

Evidence from acoustical and psychoacoustical descrip-tions of wind instruments and from perceptual validationshows that relative location and prominence of main for-mants affect the perception of timbre blend critically Fur-thermore these pitch-invariant spectral characteristics ex-plain and predict the perception of blend to a promis-ingly high degree Remaining discrepancies between theacoustic and perceptual domains can be explained throughapparent constraints of the simulated auditory systemIn conclusion a perceptual model for the contributionof local spectral-envelope characteristics to blending isproposed keeping in mind that it would serve as aninstrument-specific component in a more complex gen-eral perceptual model involving compositional and per-formance factors as initially discussed in Section 1 Themain factors influencing the perception of spectral blendare summarized1 Frequency relationships between upper bounds of main

formants are critical to blend Among several instru-

1050

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

ments one is expected to serve as the reference (eglead instrument dominant timbre) above which thepresence of other instrumentsrsquo formants would stronglyresult in decreased blend

2 Prominence of the main formants governs whetherthese relationships lead to plateau or linear blend pro-files and in the first case pitch-invariant perceptual ro-bustness extends to non-unison intervals

3 Spectral-envelope relationships below 500 Hz may benegligible due to constraints of the auditory system Atthe same time blend decreases at higher pitches due toa degraded perceptual robustness of formants

This hypothetical model still requires further investigationconcerning a more systematic study on 1) the apparentconstraints of the auditory system as modeled by AIM2) how in musical practice one instrument may functionas the reference 3) establishing a more specific descrip-tion of formant prominence and 4) addressing the contri-bution of loudness balance between instruments to blendThese future investigations will further validate and refinethe proposed perceptual model and will improve compu-tational prediction tools for the instrument-specific spec-tral component of blend Orchestration practice will ben-efit from these research efforts even beyond aiming forblend as knowledge of favorable instrument relationshipsalso informs orchestrators as to how to avoid it

References

[1] G J Sandell Concurrent timbres in orchestration a per-ceptual study of factors determining blend PhD disserta-tion Northwestern University 1991

[2] R A Kendall E C Carterette Identification and blend oftimbres as a basis for orchestration Contemp Music Rev9 (1993) 51ndash67

[3] C Reuter Die auditive Diskrimination von Orchesterin-strumenten - Verschmelzung und Heraushoumlrbarkeit vonInstrumentalklangfarben im Ensemblespiel (The auditorydiscrimination of orchestral instruments fusion and distin-guishability of instrumental timbres in ensemble playing)P Lang Frankfurt am Main 1996

[4] G J Sandell Roles for spectral centroid and other factorsin determining ldquoblended instrument pairings in orchestra-tion Music Percept 13 (1995) 209ndash246

[5] A S Bregman Auditory scene analysis the perceptual or-ganization of sound MIT Press Cambridge MA 1990

[6] S McAdams S Winsberg S Donnadieu G De Soete JKrimphoff Perceptual scaling of synthesized musical tim-bres common dimensions specificities and latent subjectclasses Psychol Res 58 (1995) 177ndash192

[7] L Wedin G Goude Dimension analysis of the perceptionof instrumental timbre Scand J Psychol 13 (1972) 228ndash240

[8] J M Grey J W Gordon Perceptual effects of spectralmodifications on musical timbres J Acoust Soc Am 63(1978) 1493ndash1500

[9] W Strong M Clark Synthesis of wind-instrument tonesJ Acoust Soc Am 41 (1967) 39ndash52

[10] D Tardieu S McAdams Perception of dyads of impulsiveand sustained instrument sounds Music Percept 30 (2012)117ndash128

[11] L Hermann Phonophotographische Untersuchungen(Phonophotographic investigations) Pfluumlger Archiv fuumlrdie Gesamte Physiologie des Menschen und der Thiere 58(1894) 264ndash279

[12] C Stumpf Die Sprachlaute (The speech sounds) SpringerBerlin 1926

[13] J P Fricke Zur Anwendung digitaler Klangfarbenfilter beiAufnahme und Wiedergabe (On the application of digitaltimbre filters during recording and playback) Proc 14thTonmeistertagung Munich Germany 1986 135ndash148

[14] K E Schumann Physik der Klangfarben (Physics of tim-bres) Professorial dissertation Universitaumlt Berlin Berlin1929

[15] E L Saldanha J F Corso Timbre cues and the identifica-tion of musical instruments J Acoust Soc Am 36 (1964)2021ndash2026

[16] W Strong M Clark Perturbations of synthetic orchestralwind-instrument tones J Acoust Soc Am 41 (1967) 277ndash285

[17] D Luce J Clark Physical correlates of brass-instrumenttones J Acoust Soc Am 42 (1967) 1232ndash1243

[18] D A Luce Dynamic spectrum changes of orchestral in-struments J Audio Eng Soc 23 (1975) 565ndash568

[19] J Meyer Acoustics and the performance of music 5th edSpringer New York 2009

[20] G Fant Acoustic theory of speech production MoutonThe Hague 1960

[21] C Reuter Klangfarbe und Instrumentation Geschichte ndashUrsachen ndash Wirkung (Timbre and instrumentation historyndash causes ndash effect) P Lang Frankfurt am Main 2002

[22] R van Dinther R D Patterson Perception of acousticscale and size in musical instrument sounds J AcoustSoc Am 120 (2006) 2158ndash2176

[23] T Irino R D Patterson A dynamic compressive gam-machirp auditory filterbank IEEE Trans Audio SpeechLang Process 14 (2006) 2222ndash2232

[24] B C Moore B R Glasberg Suggested formulae for cal-culating auditory-filter bandwidths and excitation patternsJ Acoust Soc Am 74 (1983) 750ndash753

[25] S-A Lembke S McAdams A spectral-envelope synthe-sis model to study perceptual blend between wind instru-ments Proc 11th Congregraves Franccedilais drsquoAcoustique IOAannual meeting Acoustics 2012 Nantes France 20121025ndash1030

[26] F Martin C Champlin Reconsidering the limits of normalhearing J Am Acad Audiol 11 (2000) 64ndash66

[27] ISO 389ndash8 Acoustics Reference zero for the calibrationof audiometric equipmentmdashPart 8 Reference equivalentthreshold sound pressure levels for pure tones and circum-aural earphones Tech Rept International Organization forStandardization Geneva Switzerland 2004

[28] W J Conover R L Iman Rank transformations as abridge between parametric and nonparametric statisticsAm Stat 35 (1981) 124ndash129

[29] J J Higgins S Tashtoush An aligned rank transform testfor interaction Nonlinear World 1 (1994) 201ndash221

[30] G Peeters B L Giordano P Susini N Misdariis SMcAdams The Timbre Toolbox extracting audio descrip-tors from musical signals J Acoust Soc Am 130 (2011)2902ndash2916

[31] S McAdams Spectral fusion spectral parsing and the for-mation of auditory images PhD dissertation Stanford Uni-versity 1984

1051

Page 9: The Role of Spectral-Envelope Characteristics in ... · Figure 1. Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels). Estimates are based

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table III Range of strongest to weakest ANOVA main effects along ΔF across all six instruments

low context high contextEffect Stat conserv liberal conserv liberal

F 860 826 1651 1651Clarinet df 16307 21391 357 357(strongest) ε 54 69 - -

p lt0001 lt0001 lt0001 lt0001η2p 82 81 90 90

F 164 168 126 152Bassoon df 357 357 14271 357(weakest) ε - - 48 -

p lt0001 lt0001 0005 0001η2p 46 47 40 44

unaffected by pitch-related variation There was only oneexception from a complete absence of effects interactingwith ΔF a moderate main effect for Context for trum-pet was found only at non-unison intervals [F (1 19) =1048|2504 p = 0043|0001 η2

p = 355|569] By con-trast the rating profiles for clarinet and flute did not exhibitpitch invariance as they clearly violated the criteria acrossboth ΔF contexts The interaction effects with ΔF and amain effect for Context leading to their disqualification aredescribed in Table IV

The instruments displaying pitch invariance were thesame ones with plateau rating profiles possibly attributinga special role to ΔF = 0 as defining a boundary governingblend To further support this assumption by joint analysisof the four instruments two hierarchical cluster analyseswere employed that grouped ΔF levels based on their sim-ilarity in perceptual ratings or auditory-model representa-tions The first cluster analysis reinterpreted rating differ-ences between ΔF as a dissimilarity measure This mea-sure considered effect sizes of statistically significant non-parametric post-hoc analyses (Wilcoxon signed-rank test)for pairwise comparisons between ΔF levels ie greaterstatistical effects between two ΔF levels were expressedas being more dissimilar in the perceived degree of blendFor non-significant differences dissimilarity was assumedto be zero The second analysis relied on correlation coef-ficients (Pearson r) between dyad SAI profiles across ΔFlevels (see Figure 9 for examples) The dissimilarity mea-sure considered the complement value 1minus r and as all cor-relations fall within the range [0 1] no special treatmentfor negative correlations was required Both cluster anal-yses employed complete-linkage algorithms The dissimi-larity input matrices were obtained by averaging 30 inde-pendent data sets aggregated across the four instrumentsand the factors Context Pitch and Interval As shown inFigure 8 both analyses led to analogous solutions in whichthe two levels ΔF gt 0 are maximally dissimilar to acompact cluster associating the three levels ΔF le 0 Inother words ΔF associated with low and high degrees ofblend group into two distinct clusters clearly relating tothe plateau profile where ΔF = 0 defines the boundary tohigher degrees of blend

Table IV ANOVA effects for clarinet and flute suggesting an ab-sence of pitch invariance lowast The column header low context doesnot apply in this case

Clarinet low context high contextEffect Stat conserv liberal conserv liberal

F 29 34 38 44ΔF times df 357 357 20387 357Interval ε - - 68 -

p 044 024 030 008η2p 13 15 17 19

Flute low context high contextEffect Stat conserv liberal conserv liberal

F 28 43 44 72ΔF times df 39750 6114 6114 6114Pitch ε 66 - - -

p 031 0006 0005 lt0001η2p 13 18 19 28

F 49 157 - -df 119 119 - -

Contextlowast ε - - - -p 039 0008 - -η2p 21 45 - -

-II -I 0 +I +II

01

02

03

04

05

Dissimilarity

ΔF-II -I 0 +I +II

01

02

ΔF

Figure 8 Dendrograms of ΔF -level groupings for the pitch-invariant instruments Dissimilarity measures are derived fromperceptual ratings (left) and auditory-modelled dyad SAI profiles(right)

523 Blend and its spectral correlatesExplaining blending between instruments with the helpof spectral-envelope characteristics could eventually al-

1047

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table V Variables entering stepwise-regression algorithm to ob-tain models reported in Table VI a Not for pitch-variant subsetinter-variable correlation |r| gt 7 b Computed as described in[30] c Accounting for pitch d Accounting for perceivable beat-ing between partial tones

No Variable Description

1 ΔLrarr3dB derivate of Frarr

3dB Equation (1)2 ΔLF1vsF2 ΔL betw formants F1 amp F2

3 ΔSslopeF1ab spectral slope above Frarr

3dB4 ΔSslope

b global spectral slope5 |ΔScentroid

b absolute centroid difference6 Scentroid

b centroid composite7 ERBrate

c reference pitch in Table II8 I(non)unison interval category (binary)9 IST

a interval size in semitones10 Clohi ΔF context (binary)11 mixratio balance betw instruments12 AMdepth

bd amplitude modulation depth

low the prediction of blend through these instrument-specific traits In addition it could help understand the wayin which spectral characteristics contribute perceptuallyGiven this aim multiple linear regression was employed tomodel the median blend ratings through a number of vari-ables or regressors The regression models assessed therelative contributions of regressors describing both globaland local spectral-envelope traits Global descriptors in-volved the commonly reported spectral centroid and spec-tral slope [30] whereas local descriptors concerned theformant characterization discussed in Section 21 Becausea dyad yielded two descriptor values across its constituentsounds the regressor measure had to associate the two insome way For the spectral centroid two measures wereconsidered namely composite (sum) and absolute differ-ence [4] Although the centroid composite relates to thelsquodarkerrsquo-timbre hypothesis mentioned in the Introductionthe centroid difference had still been found to best explainblend in non-unison intervals [4] which left some uncer-tainty as to which of these two measures was more appro-priate in explaining blend in general The remaining spec-tral regressors were implemented as polarity-preservingdifferences between descriptors with the sampled instru-ment serving as the reference (ref) and the synthesized in-strument being variable (var) across ΔF For example thedifference of descriptor values dx for instrument x wouldcorrespond to Δd = dref minus dvar

Regression models for two separate subsets of the per-ceptual data were explored pitch-invariant (horn bassoonoboe trumpet) and pitch-variant instruments (clarinetflute) The datasets comprised all conditions tested acrossfactors and instruments with N = 118 and N = 54 forthe pitch-invariant and -variant subsets respectively Re-gressor variables were pooled from the spectral-envelopedescriptors and additional variables that were included toaccount for potential confounding factors eg pitch in-terval If these factors did not contribute as regressorsthis would further support a perceptual relevance of pitch

invariance The initial pool of regressors comprised 32variables subsequently reduced to a pre-selected set thatexhibited inter-variable correlations |r| lt 7 in order toavoid pronounced collinearity among regressors The pre-selection was determined by first identifying the vari-able that in simple linear regression exhibited the high-est R2 and subsequently adding all remaining variablesthat yielded permissible inter-variable correlations Ta-ble V lists the pre-selected variables entered into the re-gression which comprised spectral-envelope descriptors(nos 1-6) and variables representing other potential fac-tors of influence (nos 7-12) Stepwise multiple-regressionalgorithms with both forward-selection and backward-elimination schemes were considered which converged onoptimum models by iteratively adding or eliminating re-gressors respectively Models with similar combinationsof regressors to the optimum models were explored aswell In anticipation of reporting the results the inclusionof a binary regressor for ΔF context Clohi benefited all in-vestigated regression models as it corrected the systematicoffset of scaled ratings between the low and high contexts(see Figure 7)

In simple linear regression the strongest spectral-enve-lope descriptors all concerned local formant characteriza-tion and did not involve the global descriptors Among theformant descriptors the highest correlations were obtainedfor the main-formant upper bound Frarr

3dB applied to bothpitch-invariant [R2(118) = 656 p lt 0001] and pitch-variant subsets [R2(54) = 713 p lt 0001] Notably theformant maximum Fmax did not perform better than Frarr

3dBlikely due to differing skewness properties between instru-ment formants (see Figure 1) At the same time the utilityof Frarr

3dB implies that it could assume an important role inexplaining blend as perhaps the perceptually most salientfeature of formants It performed slightly better for thepitch-variant than for the pitch-invariant subset becauseΔF rarr

3dB essentially follows a strictly monotonic functionacross the investigated ΔF levels which apparently mod-els the linear blend profile better To improve the model-ing for the plateau blend profile derivate descriptors ofFrarr

3dB were explored The most effective derivate ΔLrarr3dB re-

lated the upper-bound frequencies Frarr3dB of the two instru-

ments to a corresponding change in level across the ref-erence instrumentrsquos spectral-envelope magnitude functionLref (f ) in dB as formalized in Equation 1 In other wordsthis measure evaluated magnitude differences relative tothe reference instrument at frequencies appearing to be ofparticular perceptual relevance to both instruments whichtherefore may relate to spectral-masking effects (eg in-complete masking [13])

ΔLrarr3dB = Lref F rarr

3dB|ref minus Lref F rarr3dB|var (1)

The obtained solutions from stepwise multiple regressionyielded identical models for both instrument subsets in-volving the regressors ΔLrarr

3dB absolute spectral-centroiddifference |ΔScentroid| and context Clohi A slight gain inperformance was achieved by substituting the |ΔScentroid|based on audio signals from individual sounds with a vari-

1048

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table VI Multiple-regression models best predicting timbre-blend ratings for two instrument subsets

Pitch-invariant subset R2adj=87

F (3 116)=2724 p lt0001

Regressors βstd t p

ΔLrarr3dB 100 286 lt0001

|ΔScentroid| 26 77 lt0001Clohi 27 79 lt0001

Pitch-variant subset R2adj=88

F (3 52)=1342 p lt0001

Regressors βstd t p

ΔLrarr3dB 103 200 lt0001

|ΔScentroid| 16 33 0018Clohi 34 68 lt0001

ant computed on the pitch-generalized spectral-envelopeestimates Table VI displays these optimized regressionmodels for pitch-invariant and pitch-variant subsets bothleading to about 87 explained variance (adjusted R2)The standardized regression-slope coefficients βstd indi-cate the relative contribution of regressors with the rel-ative weights being very similar for both subsets In thesemodels ΔLrarr

3dB acted as the strongest predictor for theblend ratings contributing about five times more than|ΔScentroid| which furthermore did not perform better thanClohi These findings clearly argue for local spectral-envelope descriptors to be more meaningful than globalones in explaining blending in the investigated dyadsMoreover the remaining global descriptor spectral slopeappeared to play no role Furthermore finding both in-strument subsets to be modeled equally well through thesame spectral-envelope descriptors points to a general util-ity of pitch-generalized descriptions for all instrumentsDespite the findings in Section 522 arguing against pitch-invariant perceptual robustness for clarinet and flute theobtained regression models excluded the Pitch and Intervalvariables thus appearing to be less relevant to explainingthe blend ratings

6 General discussion

Orchestrators would benefit from acoustical descriptionsof instruments that correspond to the perceptual processesinvolved in achieving blended timbres Section 2 sug-gests that common orchestral wind instruments are rea-sonably well described through pitch-generalized spectral-envelope estimates which furthermore show the instru-ments horn bassoon oboe and trumpet to be charac-terized by prominent formant structure Auditory mod-els employing stabilized auditory images (SAI) confirmthat for strong formant characterization and for lower tomiddle pitch ranges the pitch-invariant characterization isstable In higher instrument registers however SAI pro-files indicate limitations to pitch-invariant characteriza-tion Other instruments like clarinet and flute yield SAI

profiles clearly varying as a function of pitch implyingthat this pitch dependency may also extend to perception

The perceptual investigation in Sections 3 to 5 con-firms the acoustical implications showing that strong for-mant characterization results in main formants becomingperceptually relevant to blending Given a dyad in whicha putative main formant is variable in frequency relativeto a fixed reference formant the investigated instrumentsdisplay two archetypical profiles based on their formantprominence For the pitch-variant clarinet and flute blendincreases as a monotonic quasi-linear function if the vari-able formant moves from above to below the referenceFor pitch-invariant instruments the frequency alignmentbetween the variable formant and the reference (ΔF= 0)functions as a boundary delimiting a region of higher de-grees of blend at and below the reference and contrastedby a marked decrease in blend when the variable formantexceeds it which overall resembles a plateau profile Thepitch invariance even extends to different interval typesas the plateau profile remains unaffected in non-unisonintervals regardless of their degree of consonance How-ever the findings suggest that the perceptual relevance ofspectral-envelope estimates diminishes in higher instru-ment registers The limited sampling of conditions acrossthe investigated factors prevented a more comprehensiveevaluation of all instruments Whereas the findings allowgeneral relationships for formant frequency to be deducedmore comprehensive investigations are needed to attainmore generalizable quantification of absolute frequencyranges and thresholds

In correlating acoustical and perceptual factors spec-tral-envelope characteristics alone explain up to 87 ofthe variance in blend ratings In addition local spectraltraits seem to be more powerful acoustical predictors ofblend than global traits The formant descriptor for theupper formant bound Frarr

3dB when expressed as a derivatedescriptor ΔLrarr

3dB acts as the strongest predictor for theblend ratings regardless of whether instruments belongto the pitch-invariant group or not With regard to clar-inet and flute the departures from pitch invariance foundin Section 522 contradict the utility of pitch-generalizedspectral-envelope description in predicting blend ratingsas reported in Section 523 Taking both findings into ac-count this for one argues that the descriptor Frarr

3dB stillsucceeds in explaining blend well even for clarinet andflute On the other hand the same instruments displaya greater perceptual sensitivity to the Pitch and Intervalfactors likely associated with their less pronounced for-mant structure Overall strong formant prominence leadsto more drastic changes in blend

The prediction of blend using ΔLrarr3dB still presumes

that one of the instruments serves as a reference formantas the employed difference descriptors are anchored tothe sampled instrument The dependence on a referenceleaves some ambiguity because an arbitrary combinationof two instruments would lead to contradictory predictionsof blend if both instruments were given equal importancein serving as the reference Given the context of both ex-

1049

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

periments it can be assumed that the sampled instrumentacting as a constant anchor had been biased into servingas the reference by combining it with a variable synthe-sized instrument In addition a possible perceptual expla-nation could concern audio samples of instruments playingwithout vibrato generally still exhibiting coherent micro-modulations of partial tones These modulations have beenshown to contribute to a stronger and more stable audi-tory image [31] and may thus bias the more stable imagetoward acting as the reference especially as the synthe-sized partials remain static over time Even in the contextof blending in musical performance one instrument as-sumes the role of the leading voice in which it possiblyserves as the reference whereas an accompanying instru-ment avoids exceeding the lead instrumentrsquos main-formantfrequency Likewise returning to the notion of blend lead-ing to augmented timbres [4] the dominant timbre to beembellished by another would seem predestined to func-tion as the reference based on which the less dominanttimbre should not exhibit formant frequencies exceedingthose of the reference

Finally the results allow a reassessment of previousexplanations for blend The lsquodarkerrsquo-timbre hypothesis[4] is directly reflected in the obtained linear blend pro-file in which lower ΔF increases blend and at the sametime causes a decrease in the spectral-centroid compos-ite By contrast this hypothesis is not well explained bythe plateau profile as blend ratings remain similarly highfor ΔF le 0 The alternative hypothesis of coincidence offormant regions [3] would have predicted stronger blendratings for ΔF = 0 than for all other levels which in theperceptual results is only achieved for the levels ΔF gt 0While the hypothesis only achieves partial fulfillment withrespect to spectral variations ΔF it finds more agreementin the corresponding SAI representations As shown in twoexample cases for horn in Figure 9 the dyad SAI pro-files for the levels ΔF gt 0 are distinguishable from theremaining levels through clear deviations between 1 and2 kHz7 and located just above the hornrsquos estimated Frarr

3dBRemarkably the formant shifts related to ΔFlt 0 (Fig-ure 3) are not reflected in the corresponding dyad SAIprofiles (Figure 9) which instead exhibit direct align-ment below 500 Hz for all three levels ΔF le 0 There-fore only ΔF gt 0 seems to lead to incomplete masking[13] revealing the presence of the synthesized instrumentwhereas the spectral-envelope variations ΔF lt 0 evokelittle change compared to the dyad excitation pattern forΔF = 0

Of still greater importance the auditory system as mod-eled by AIM using the DCGC seemingly involves a high-

7 Concerning the output from the AIM a misalignment between actualsinusoidal frequencies and the corresponding SAI peaks was observedThrough personal communication with the developer of the utilized AIMimplementation this was explained as being an inherent property of thedynamic-compression filters A correction function was derived by com-puting SAIs for various sinusoidal frequencies and fitting the two fre-quency scales yielding the linear function fSAI = 117 middot f + 282 Hz[r2(50)= 999 p lt 0001] In Figure 9 the correction manifests itself inthe compressed frequency extent for the SAIs

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

001

002

SAImagnitude

0

-II

-I

+I

+II

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

002

004

006

008

Frequency (Hz)

Figure 9 SAI profiles of dyads for allΔF levels (Experiment B)depicting two experimental conditions for horn Top pitch 1 uni-son bottom pitch 2 non-unison the grid lines correspond topartial-tone locations

pass characteristic that attenuates spectral-envelope re-gions below 500 Hz affecting the perceived magnitudesof the respective partial tones (grid lines) This impliesthat in the region below 500 Hz frequency deviations be-tween main formants no longer affect the achieved degreeof blend as reflected both in Figure 9 and the perceptualfindings Horn and bassoon would especially benefit fromthis as their main formants are centered around 500 HzOboe and trumpet both exhibiting higher Frarr

3dB can be as-sumed to benefit to a lesser degree In summary main for-mants located in proximity to 500 Hz will benefit more ontop of which lower pitches would also increase the num-ber of partial tones falling into this favorable frequencyregion This reflects tendencies for pitch-invariant traits inSAI correlations (see Section 22) to be more pronouncedat lower pitch ranges and for instruments of lower registerwhich would lend support to the lsquodarker-timbrersquo hypothe-sis in terms of limiting the spectral centroid in frequency

7 Conclusion

Evidence from acoustical and psychoacoustical descrip-tions of wind instruments and from perceptual validationshows that relative location and prominence of main for-mants affect the perception of timbre blend critically Fur-thermore these pitch-invariant spectral characteristics ex-plain and predict the perception of blend to a promis-ingly high degree Remaining discrepancies between theacoustic and perceptual domains can be explained throughapparent constraints of the simulated auditory systemIn conclusion a perceptual model for the contributionof local spectral-envelope characteristics to blending isproposed keeping in mind that it would serve as aninstrument-specific component in a more complex gen-eral perceptual model involving compositional and per-formance factors as initially discussed in Section 1 Themain factors influencing the perception of spectral blendare summarized1 Frequency relationships between upper bounds of main

formants are critical to blend Among several instru-

1050

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

ments one is expected to serve as the reference (eglead instrument dominant timbre) above which thepresence of other instrumentsrsquo formants would stronglyresult in decreased blend

2 Prominence of the main formants governs whetherthese relationships lead to plateau or linear blend pro-files and in the first case pitch-invariant perceptual ro-bustness extends to non-unison intervals

3 Spectral-envelope relationships below 500 Hz may benegligible due to constraints of the auditory system Atthe same time blend decreases at higher pitches due toa degraded perceptual robustness of formants

This hypothetical model still requires further investigationconcerning a more systematic study on 1) the apparentconstraints of the auditory system as modeled by AIM2) how in musical practice one instrument may functionas the reference 3) establishing a more specific descrip-tion of formant prominence and 4) addressing the contri-bution of loudness balance between instruments to blendThese future investigations will further validate and refinethe proposed perceptual model and will improve compu-tational prediction tools for the instrument-specific spec-tral component of blend Orchestration practice will ben-efit from these research efforts even beyond aiming forblend as knowledge of favorable instrument relationshipsalso informs orchestrators as to how to avoid it

References

[1] G J Sandell Concurrent timbres in orchestration a per-ceptual study of factors determining blend PhD disserta-tion Northwestern University 1991

[2] R A Kendall E C Carterette Identification and blend oftimbres as a basis for orchestration Contemp Music Rev9 (1993) 51ndash67

[3] C Reuter Die auditive Diskrimination von Orchesterin-strumenten - Verschmelzung und Heraushoumlrbarkeit vonInstrumentalklangfarben im Ensemblespiel (The auditorydiscrimination of orchestral instruments fusion and distin-guishability of instrumental timbres in ensemble playing)P Lang Frankfurt am Main 1996

[4] G J Sandell Roles for spectral centroid and other factorsin determining ldquoblended instrument pairings in orchestra-tion Music Percept 13 (1995) 209ndash246

[5] A S Bregman Auditory scene analysis the perceptual or-ganization of sound MIT Press Cambridge MA 1990

[6] S McAdams S Winsberg S Donnadieu G De Soete JKrimphoff Perceptual scaling of synthesized musical tim-bres common dimensions specificities and latent subjectclasses Psychol Res 58 (1995) 177ndash192

[7] L Wedin G Goude Dimension analysis of the perceptionof instrumental timbre Scand J Psychol 13 (1972) 228ndash240

[8] J M Grey J W Gordon Perceptual effects of spectralmodifications on musical timbres J Acoust Soc Am 63(1978) 1493ndash1500

[9] W Strong M Clark Synthesis of wind-instrument tonesJ Acoust Soc Am 41 (1967) 39ndash52

[10] D Tardieu S McAdams Perception of dyads of impulsiveand sustained instrument sounds Music Percept 30 (2012)117ndash128

[11] L Hermann Phonophotographische Untersuchungen(Phonophotographic investigations) Pfluumlger Archiv fuumlrdie Gesamte Physiologie des Menschen und der Thiere 58(1894) 264ndash279

[12] C Stumpf Die Sprachlaute (The speech sounds) SpringerBerlin 1926

[13] J P Fricke Zur Anwendung digitaler Klangfarbenfilter beiAufnahme und Wiedergabe (On the application of digitaltimbre filters during recording and playback) Proc 14thTonmeistertagung Munich Germany 1986 135ndash148

[14] K E Schumann Physik der Klangfarben (Physics of tim-bres) Professorial dissertation Universitaumlt Berlin Berlin1929

[15] E L Saldanha J F Corso Timbre cues and the identifica-tion of musical instruments J Acoust Soc Am 36 (1964)2021ndash2026

[16] W Strong M Clark Perturbations of synthetic orchestralwind-instrument tones J Acoust Soc Am 41 (1967) 277ndash285

[17] D Luce J Clark Physical correlates of brass-instrumenttones J Acoust Soc Am 42 (1967) 1232ndash1243

[18] D A Luce Dynamic spectrum changes of orchestral in-struments J Audio Eng Soc 23 (1975) 565ndash568

[19] J Meyer Acoustics and the performance of music 5th edSpringer New York 2009

[20] G Fant Acoustic theory of speech production MoutonThe Hague 1960

[21] C Reuter Klangfarbe und Instrumentation Geschichte ndashUrsachen ndash Wirkung (Timbre and instrumentation historyndash causes ndash effect) P Lang Frankfurt am Main 2002

[22] R van Dinther R D Patterson Perception of acousticscale and size in musical instrument sounds J AcoustSoc Am 120 (2006) 2158ndash2176

[23] T Irino R D Patterson A dynamic compressive gam-machirp auditory filterbank IEEE Trans Audio SpeechLang Process 14 (2006) 2222ndash2232

[24] B C Moore B R Glasberg Suggested formulae for cal-culating auditory-filter bandwidths and excitation patternsJ Acoust Soc Am 74 (1983) 750ndash753

[25] S-A Lembke S McAdams A spectral-envelope synthe-sis model to study perceptual blend between wind instru-ments Proc 11th Congregraves Franccedilais drsquoAcoustique IOAannual meeting Acoustics 2012 Nantes France 20121025ndash1030

[26] F Martin C Champlin Reconsidering the limits of normalhearing J Am Acad Audiol 11 (2000) 64ndash66

[27] ISO 389ndash8 Acoustics Reference zero for the calibrationof audiometric equipmentmdashPart 8 Reference equivalentthreshold sound pressure levels for pure tones and circum-aural earphones Tech Rept International Organization forStandardization Geneva Switzerland 2004

[28] W J Conover R L Iman Rank transformations as abridge between parametric and nonparametric statisticsAm Stat 35 (1981) 124ndash129

[29] J J Higgins S Tashtoush An aligned rank transform testfor interaction Nonlinear World 1 (1994) 201ndash221

[30] G Peeters B L Giordano P Susini N Misdariis SMcAdams The Timbre Toolbox extracting audio descrip-tors from musical signals J Acoust Soc Am 130 (2011)2902ndash2916

[31] S McAdams Spectral fusion spectral parsing and the for-mation of auditory images PhD dissertation Stanford Uni-versity 1984

1051

Page 10: The Role of Spectral-Envelope Characteristics in ... · Figure 1. Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels). Estimates are based

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

Table V Variables entering stepwise-regression algorithm to ob-tain models reported in Table VI a Not for pitch-variant subsetinter-variable correlation |r| gt 7 b Computed as described in[30] c Accounting for pitch d Accounting for perceivable beat-ing between partial tones

No Variable Description

1 ΔLrarr3dB derivate of Frarr

3dB Equation (1)2 ΔLF1vsF2 ΔL betw formants F1 amp F2

3 ΔSslopeF1ab spectral slope above Frarr

3dB4 ΔSslope

b global spectral slope5 |ΔScentroid

b absolute centroid difference6 Scentroid

b centroid composite7 ERBrate

c reference pitch in Table II8 I(non)unison interval category (binary)9 IST

a interval size in semitones10 Clohi ΔF context (binary)11 mixratio balance betw instruments12 AMdepth

bd amplitude modulation depth

low the prediction of blend through these instrument-specific traits In addition it could help understand the wayin which spectral characteristics contribute perceptuallyGiven this aim multiple linear regression was employed tomodel the median blend ratings through a number of vari-ables or regressors The regression models assessed therelative contributions of regressors describing both globaland local spectral-envelope traits Global descriptors in-volved the commonly reported spectral centroid and spec-tral slope [30] whereas local descriptors concerned theformant characterization discussed in Section 21 Becausea dyad yielded two descriptor values across its constituentsounds the regressor measure had to associate the two insome way For the spectral centroid two measures wereconsidered namely composite (sum) and absolute differ-ence [4] Although the centroid composite relates to thelsquodarkerrsquo-timbre hypothesis mentioned in the Introductionthe centroid difference had still been found to best explainblend in non-unison intervals [4] which left some uncer-tainty as to which of these two measures was more appro-priate in explaining blend in general The remaining spec-tral regressors were implemented as polarity-preservingdifferences between descriptors with the sampled instru-ment serving as the reference (ref) and the synthesized in-strument being variable (var) across ΔF For example thedifference of descriptor values dx for instrument x wouldcorrespond to Δd = dref minus dvar

Regression models for two separate subsets of the per-ceptual data were explored pitch-invariant (horn bassoonoboe trumpet) and pitch-variant instruments (clarinetflute) The datasets comprised all conditions tested acrossfactors and instruments with N = 118 and N = 54 forthe pitch-invariant and -variant subsets respectively Re-gressor variables were pooled from the spectral-envelopedescriptors and additional variables that were included toaccount for potential confounding factors eg pitch in-terval If these factors did not contribute as regressorsthis would further support a perceptual relevance of pitch

invariance The initial pool of regressors comprised 32variables subsequently reduced to a pre-selected set thatexhibited inter-variable correlations |r| lt 7 in order toavoid pronounced collinearity among regressors The pre-selection was determined by first identifying the vari-able that in simple linear regression exhibited the high-est R2 and subsequently adding all remaining variablesthat yielded permissible inter-variable correlations Ta-ble V lists the pre-selected variables entered into the re-gression which comprised spectral-envelope descriptors(nos 1-6) and variables representing other potential fac-tors of influence (nos 7-12) Stepwise multiple-regressionalgorithms with both forward-selection and backward-elimination schemes were considered which converged onoptimum models by iteratively adding or eliminating re-gressors respectively Models with similar combinationsof regressors to the optimum models were explored aswell In anticipation of reporting the results the inclusionof a binary regressor for ΔF context Clohi benefited all in-vestigated regression models as it corrected the systematicoffset of scaled ratings between the low and high contexts(see Figure 7)

In simple linear regression the strongest spectral-enve-lope descriptors all concerned local formant characteriza-tion and did not involve the global descriptors Among theformant descriptors the highest correlations were obtainedfor the main-formant upper bound Frarr

3dB applied to bothpitch-invariant [R2(118) = 656 p lt 0001] and pitch-variant subsets [R2(54) = 713 p lt 0001] Notably theformant maximum Fmax did not perform better than Frarr

3dBlikely due to differing skewness properties between instru-ment formants (see Figure 1) At the same time the utilityof Frarr

3dB implies that it could assume an important role inexplaining blend as perhaps the perceptually most salientfeature of formants It performed slightly better for thepitch-variant than for the pitch-invariant subset becauseΔF rarr

3dB essentially follows a strictly monotonic functionacross the investigated ΔF levels which apparently mod-els the linear blend profile better To improve the model-ing for the plateau blend profile derivate descriptors ofFrarr

3dB were explored The most effective derivate ΔLrarr3dB re-

lated the upper-bound frequencies Frarr3dB of the two instru-

ments to a corresponding change in level across the ref-erence instrumentrsquos spectral-envelope magnitude functionLref (f ) in dB as formalized in Equation 1 In other wordsthis measure evaluated magnitude differences relative tothe reference instrument at frequencies appearing to be ofparticular perceptual relevance to both instruments whichtherefore may relate to spectral-masking effects (eg in-complete masking [13])

ΔLrarr3dB = Lref F rarr

3dB|ref minus Lref F rarr3dB|var (1)

The obtained solutions from stepwise multiple regressionyielded identical models for both instrument subsets in-volving the regressors ΔLrarr

3dB absolute spectral-centroiddifference |ΔScentroid| and context Clohi A slight gain inperformance was achieved by substituting the |ΔScentroid|based on audio signals from individual sounds with a vari-

1048

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table VI Multiple-regression models best predicting timbre-blend ratings for two instrument subsets

Pitch-invariant subset R2adj=87

F (3 116)=2724 p lt0001

Regressors βstd t p

ΔLrarr3dB 100 286 lt0001

|ΔScentroid| 26 77 lt0001Clohi 27 79 lt0001

Pitch-variant subset R2adj=88

F (3 52)=1342 p lt0001

Regressors βstd t p

ΔLrarr3dB 103 200 lt0001

|ΔScentroid| 16 33 0018Clohi 34 68 lt0001

ant computed on the pitch-generalized spectral-envelopeestimates Table VI displays these optimized regressionmodels for pitch-invariant and pitch-variant subsets bothleading to about 87 explained variance (adjusted R2)The standardized regression-slope coefficients βstd indi-cate the relative contribution of regressors with the rel-ative weights being very similar for both subsets In thesemodels ΔLrarr

3dB acted as the strongest predictor for theblend ratings contributing about five times more than|ΔScentroid| which furthermore did not perform better thanClohi These findings clearly argue for local spectral-envelope descriptors to be more meaningful than globalones in explaining blending in the investigated dyadsMoreover the remaining global descriptor spectral slopeappeared to play no role Furthermore finding both in-strument subsets to be modeled equally well through thesame spectral-envelope descriptors points to a general util-ity of pitch-generalized descriptions for all instrumentsDespite the findings in Section 522 arguing against pitch-invariant perceptual robustness for clarinet and flute theobtained regression models excluded the Pitch and Intervalvariables thus appearing to be less relevant to explainingthe blend ratings

6 General discussion

Orchestrators would benefit from acoustical descriptionsof instruments that correspond to the perceptual processesinvolved in achieving blended timbres Section 2 sug-gests that common orchestral wind instruments are rea-sonably well described through pitch-generalized spectral-envelope estimates which furthermore show the instru-ments horn bassoon oboe and trumpet to be charac-terized by prominent formant structure Auditory mod-els employing stabilized auditory images (SAI) confirmthat for strong formant characterization and for lower tomiddle pitch ranges the pitch-invariant characterization isstable In higher instrument registers however SAI pro-files indicate limitations to pitch-invariant characteriza-tion Other instruments like clarinet and flute yield SAI

profiles clearly varying as a function of pitch implyingthat this pitch dependency may also extend to perception

The perceptual investigation in Sections 3 to 5 con-firms the acoustical implications showing that strong for-mant characterization results in main formants becomingperceptually relevant to blending Given a dyad in whicha putative main formant is variable in frequency relativeto a fixed reference formant the investigated instrumentsdisplay two archetypical profiles based on their formantprominence For the pitch-variant clarinet and flute blendincreases as a monotonic quasi-linear function if the vari-able formant moves from above to below the referenceFor pitch-invariant instruments the frequency alignmentbetween the variable formant and the reference (ΔF= 0)functions as a boundary delimiting a region of higher de-grees of blend at and below the reference and contrastedby a marked decrease in blend when the variable formantexceeds it which overall resembles a plateau profile Thepitch invariance even extends to different interval typesas the plateau profile remains unaffected in non-unisonintervals regardless of their degree of consonance How-ever the findings suggest that the perceptual relevance ofspectral-envelope estimates diminishes in higher instru-ment registers The limited sampling of conditions acrossthe investigated factors prevented a more comprehensiveevaluation of all instruments Whereas the findings allowgeneral relationships for formant frequency to be deducedmore comprehensive investigations are needed to attainmore generalizable quantification of absolute frequencyranges and thresholds

In correlating acoustical and perceptual factors spec-tral-envelope characteristics alone explain up to 87 ofthe variance in blend ratings In addition local spectraltraits seem to be more powerful acoustical predictors ofblend than global traits The formant descriptor for theupper formant bound Frarr

3dB when expressed as a derivatedescriptor ΔLrarr

3dB acts as the strongest predictor for theblend ratings regardless of whether instruments belongto the pitch-invariant group or not With regard to clar-inet and flute the departures from pitch invariance foundin Section 522 contradict the utility of pitch-generalizedspectral-envelope description in predicting blend ratingsas reported in Section 523 Taking both findings into ac-count this for one argues that the descriptor Frarr

3dB stillsucceeds in explaining blend well even for clarinet andflute On the other hand the same instruments displaya greater perceptual sensitivity to the Pitch and Intervalfactors likely associated with their less pronounced for-mant structure Overall strong formant prominence leadsto more drastic changes in blend

The prediction of blend using ΔLrarr3dB still presumes

that one of the instruments serves as a reference formantas the employed difference descriptors are anchored tothe sampled instrument The dependence on a referenceleaves some ambiguity because an arbitrary combinationof two instruments would lead to contradictory predictionsof blend if both instruments were given equal importancein serving as the reference Given the context of both ex-

1049

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

periments it can be assumed that the sampled instrumentacting as a constant anchor had been biased into servingas the reference by combining it with a variable synthe-sized instrument In addition a possible perceptual expla-nation could concern audio samples of instruments playingwithout vibrato generally still exhibiting coherent micro-modulations of partial tones These modulations have beenshown to contribute to a stronger and more stable audi-tory image [31] and may thus bias the more stable imagetoward acting as the reference especially as the synthe-sized partials remain static over time Even in the contextof blending in musical performance one instrument as-sumes the role of the leading voice in which it possiblyserves as the reference whereas an accompanying instru-ment avoids exceeding the lead instrumentrsquos main-formantfrequency Likewise returning to the notion of blend lead-ing to augmented timbres [4] the dominant timbre to beembellished by another would seem predestined to func-tion as the reference based on which the less dominanttimbre should not exhibit formant frequencies exceedingthose of the reference

Finally the results allow a reassessment of previousexplanations for blend The lsquodarkerrsquo-timbre hypothesis[4] is directly reflected in the obtained linear blend pro-file in which lower ΔF increases blend and at the sametime causes a decrease in the spectral-centroid compos-ite By contrast this hypothesis is not well explained bythe plateau profile as blend ratings remain similarly highfor ΔF le 0 The alternative hypothesis of coincidence offormant regions [3] would have predicted stronger blendratings for ΔF = 0 than for all other levels which in theperceptual results is only achieved for the levels ΔF gt 0While the hypothesis only achieves partial fulfillment withrespect to spectral variations ΔF it finds more agreementin the corresponding SAI representations As shown in twoexample cases for horn in Figure 9 the dyad SAI pro-files for the levels ΔF gt 0 are distinguishable from theremaining levels through clear deviations between 1 and2 kHz7 and located just above the hornrsquos estimated Frarr

3dBRemarkably the formant shifts related to ΔFlt 0 (Fig-ure 3) are not reflected in the corresponding dyad SAIprofiles (Figure 9) which instead exhibit direct align-ment below 500 Hz for all three levels ΔF le 0 There-fore only ΔF gt 0 seems to lead to incomplete masking[13] revealing the presence of the synthesized instrumentwhereas the spectral-envelope variations ΔF lt 0 evokelittle change compared to the dyad excitation pattern forΔF = 0

Of still greater importance the auditory system as mod-eled by AIM using the DCGC seemingly involves a high-

7 Concerning the output from the AIM a misalignment between actualsinusoidal frequencies and the corresponding SAI peaks was observedThrough personal communication with the developer of the utilized AIMimplementation this was explained as being an inherent property of thedynamic-compression filters A correction function was derived by com-puting SAIs for various sinusoidal frequencies and fitting the two fre-quency scales yielding the linear function fSAI = 117 middot f + 282 Hz[r2(50)= 999 p lt 0001] In Figure 9 the correction manifests itself inthe compressed frequency extent for the SAIs

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

001

002

SAImagnitude

0

-II

-I

+I

+II

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

002

004

006

008

Frequency (Hz)

Figure 9 SAI profiles of dyads for allΔF levels (Experiment B)depicting two experimental conditions for horn Top pitch 1 uni-son bottom pitch 2 non-unison the grid lines correspond topartial-tone locations

pass characteristic that attenuates spectral-envelope re-gions below 500 Hz affecting the perceived magnitudesof the respective partial tones (grid lines) This impliesthat in the region below 500 Hz frequency deviations be-tween main formants no longer affect the achieved degreeof blend as reflected both in Figure 9 and the perceptualfindings Horn and bassoon would especially benefit fromthis as their main formants are centered around 500 HzOboe and trumpet both exhibiting higher Frarr

3dB can be as-sumed to benefit to a lesser degree In summary main for-mants located in proximity to 500 Hz will benefit more ontop of which lower pitches would also increase the num-ber of partial tones falling into this favorable frequencyregion This reflects tendencies for pitch-invariant traits inSAI correlations (see Section 22) to be more pronouncedat lower pitch ranges and for instruments of lower registerwhich would lend support to the lsquodarker-timbrersquo hypothe-sis in terms of limiting the spectral centroid in frequency

7 Conclusion

Evidence from acoustical and psychoacoustical descrip-tions of wind instruments and from perceptual validationshows that relative location and prominence of main for-mants affect the perception of timbre blend critically Fur-thermore these pitch-invariant spectral characteristics ex-plain and predict the perception of blend to a promis-ingly high degree Remaining discrepancies between theacoustic and perceptual domains can be explained throughapparent constraints of the simulated auditory systemIn conclusion a perceptual model for the contributionof local spectral-envelope characteristics to blending isproposed keeping in mind that it would serve as aninstrument-specific component in a more complex gen-eral perceptual model involving compositional and per-formance factors as initially discussed in Section 1 Themain factors influencing the perception of spectral blendare summarized1 Frequency relationships between upper bounds of main

formants are critical to blend Among several instru-

1050

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

ments one is expected to serve as the reference (eglead instrument dominant timbre) above which thepresence of other instrumentsrsquo formants would stronglyresult in decreased blend

2 Prominence of the main formants governs whetherthese relationships lead to plateau or linear blend pro-files and in the first case pitch-invariant perceptual ro-bustness extends to non-unison intervals

3 Spectral-envelope relationships below 500 Hz may benegligible due to constraints of the auditory system Atthe same time blend decreases at higher pitches due toa degraded perceptual robustness of formants

This hypothetical model still requires further investigationconcerning a more systematic study on 1) the apparentconstraints of the auditory system as modeled by AIM2) how in musical practice one instrument may functionas the reference 3) establishing a more specific descrip-tion of formant prominence and 4) addressing the contri-bution of loudness balance between instruments to blendThese future investigations will further validate and refinethe proposed perceptual model and will improve compu-tational prediction tools for the instrument-specific spec-tral component of blend Orchestration practice will ben-efit from these research efforts even beyond aiming forblend as knowledge of favorable instrument relationshipsalso informs orchestrators as to how to avoid it

References

[1] G J Sandell Concurrent timbres in orchestration a per-ceptual study of factors determining blend PhD disserta-tion Northwestern University 1991

[2] R A Kendall E C Carterette Identification and blend oftimbres as a basis for orchestration Contemp Music Rev9 (1993) 51ndash67

[3] C Reuter Die auditive Diskrimination von Orchesterin-strumenten - Verschmelzung und Heraushoumlrbarkeit vonInstrumentalklangfarben im Ensemblespiel (The auditorydiscrimination of orchestral instruments fusion and distin-guishability of instrumental timbres in ensemble playing)P Lang Frankfurt am Main 1996

[4] G J Sandell Roles for spectral centroid and other factorsin determining ldquoblended instrument pairings in orchestra-tion Music Percept 13 (1995) 209ndash246

[5] A S Bregman Auditory scene analysis the perceptual or-ganization of sound MIT Press Cambridge MA 1990

[6] S McAdams S Winsberg S Donnadieu G De Soete JKrimphoff Perceptual scaling of synthesized musical tim-bres common dimensions specificities and latent subjectclasses Psychol Res 58 (1995) 177ndash192

[7] L Wedin G Goude Dimension analysis of the perceptionof instrumental timbre Scand J Psychol 13 (1972) 228ndash240

[8] J M Grey J W Gordon Perceptual effects of spectralmodifications on musical timbres J Acoust Soc Am 63(1978) 1493ndash1500

[9] W Strong M Clark Synthesis of wind-instrument tonesJ Acoust Soc Am 41 (1967) 39ndash52

[10] D Tardieu S McAdams Perception of dyads of impulsiveand sustained instrument sounds Music Percept 30 (2012)117ndash128

[11] L Hermann Phonophotographische Untersuchungen(Phonophotographic investigations) Pfluumlger Archiv fuumlrdie Gesamte Physiologie des Menschen und der Thiere 58(1894) 264ndash279

[12] C Stumpf Die Sprachlaute (The speech sounds) SpringerBerlin 1926

[13] J P Fricke Zur Anwendung digitaler Klangfarbenfilter beiAufnahme und Wiedergabe (On the application of digitaltimbre filters during recording and playback) Proc 14thTonmeistertagung Munich Germany 1986 135ndash148

[14] K E Schumann Physik der Klangfarben (Physics of tim-bres) Professorial dissertation Universitaumlt Berlin Berlin1929

[15] E L Saldanha J F Corso Timbre cues and the identifica-tion of musical instruments J Acoust Soc Am 36 (1964)2021ndash2026

[16] W Strong M Clark Perturbations of synthetic orchestralwind-instrument tones J Acoust Soc Am 41 (1967) 277ndash285

[17] D Luce J Clark Physical correlates of brass-instrumenttones J Acoust Soc Am 42 (1967) 1232ndash1243

[18] D A Luce Dynamic spectrum changes of orchestral in-struments J Audio Eng Soc 23 (1975) 565ndash568

[19] J Meyer Acoustics and the performance of music 5th edSpringer New York 2009

[20] G Fant Acoustic theory of speech production MoutonThe Hague 1960

[21] C Reuter Klangfarbe und Instrumentation Geschichte ndashUrsachen ndash Wirkung (Timbre and instrumentation historyndash causes ndash effect) P Lang Frankfurt am Main 2002

[22] R van Dinther R D Patterson Perception of acousticscale and size in musical instrument sounds J AcoustSoc Am 120 (2006) 2158ndash2176

[23] T Irino R D Patterson A dynamic compressive gam-machirp auditory filterbank IEEE Trans Audio SpeechLang Process 14 (2006) 2222ndash2232

[24] B C Moore B R Glasberg Suggested formulae for cal-culating auditory-filter bandwidths and excitation patternsJ Acoust Soc Am 74 (1983) 750ndash753

[25] S-A Lembke S McAdams A spectral-envelope synthe-sis model to study perceptual blend between wind instru-ments Proc 11th Congregraves Franccedilais drsquoAcoustique IOAannual meeting Acoustics 2012 Nantes France 20121025ndash1030

[26] F Martin C Champlin Reconsidering the limits of normalhearing J Am Acad Audiol 11 (2000) 64ndash66

[27] ISO 389ndash8 Acoustics Reference zero for the calibrationof audiometric equipmentmdashPart 8 Reference equivalentthreshold sound pressure levels for pure tones and circum-aural earphones Tech Rept International Organization forStandardization Geneva Switzerland 2004

[28] W J Conover R L Iman Rank transformations as abridge between parametric and nonparametric statisticsAm Stat 35 (1981) 124ndash129

[29] J J Higgins S Tashtoush An aligned rank transform testfor interaction Nonlinear World 1 (1994) 201ndash221

[30] G Peeters B L Giordano P Susini N Misdariis SMcAdams The Timbre Toolbox extracting audio descrip-tors from musical signals J Acoust Soc Am 130 (2011)2902ndash2916

[31] S McAdams Spectral fusion spectral parsing and the for-mation of auditory images PhD dissertation Stanford Uni-versity 1984

1051

Page 11: The Role of Spectral-Envelope Characteristics in ... · Figure 1. Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels). Estimates are based

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

Table VI Multiple-regression models best predicting timbre-blend ratings for two instrument subsets

Pitch-invariant subset R2adj=87

F (3 116)=2724 p lt0001

Regressors βstd t p

ΔLrarr3dB 100 286 lt0001

|ΔScentroid| 26 77 lt0001Clohi 27 79 lt0001

Pitch-variant subset R2adj=88

F (3 52)=1342 p lt0001

Regressors βstd t p

ΔLrarr3dB 103 200 lt0001

|ΔScentroid| 16 33 0018Clohi 34 68 lt0001

ant computed on the pitch-generalized spectral-envelopeestimates Table VI displays these optimized regressionmodels for pitch-invariant and pitch-variant subsets bothleading to about 87 explained variance (adjusted R2)The standardized regression-slope coefficients βstd indi-cate the relative contribution of regressors with the rel-ative weights being very similar for both subsets In thesemodels ΔLrarr

3dB acted as the strongest predictor for theblend ratings contributing about five times more than|ΔScentroid| which furthermore did not perform better thanClohi These findings clearly argue for local spectral-envelope descriptors to be more meaningful than globalones in explaining blending in the investigated dyadsMoreover the remaining global descriptor spectral slopeappeared to play no role Furthermore finding both in-strument subsets to be modeled equally well through thesame spectral-envelope descriptors points to a general util-ity of pitch-generalized descriptions for all instrumentsDespite the findings in Section 522 arguing against pitch-invariant perceptual robustness for clarinet and flute theobtained regression models excluded the Pitch and Intervalvariables thus appearing to be less relevant to explainingthe blend ratings

6 General discussion

Orchestrators would benefit from acoustical descriptionsof instruments that correspond to the perceptual processesinvolved in achieving blended timbres Section 2 sug-gests that common orchestral wind instruments are rea-sonably well described through pitch-generalized spectral-envelope estimates which furthermore show the instru-ments horn bassoon oboe and trumpet to be charac-terized by prominent formant structure Auditory mod-els employing stabilized auditory images (SAI) confirmthat for strong formant characterization and for lower tomiddle pitch ranges the pitch-invariant characterization isstable In higher instrument registers however SAI pro-files indicate limitations to pitch-invariant characteriza-tion Other instruments like clarinet and flute yield SAI

profiles clearly varying as a function of pitch implyingthat this pitch dependency may also extend to perception

The perceptual investigation in Sections 3 to 5 con-firms the acoustical implications showing that strong for-mant characterization results in main formants becomingperceptually relevant to blending Given a dyad in whicha putative main formant is variable in frequency relativeto a fixed reference formant the investigated instrumentsdisplay two archetypical profiles based on their formantprominence For the pitch-variant clarinet and flute blendincreases as a monotonic quasi-linear function if the vari-able formant moves from above to below the referenceFor pitch-invariant instruments the frequency alignmentbetween the variable formant and the reference (ΔF= 0)functions as a boundary delimiting a region of higher de-grees of blend at and below the reference and contrastedby a marked decrease in blend when the variable formantexceeds it which overall resembles a plateau profile Thepitch invariance even extends to different interval typesas the plateau profile remains unaffected in non-unisonintervals regardless of their degree of consonance How-ever the findings suggest that the perceptual relevance ofspectral-envelope estimates diminishes in higher instru-ment registers The limited sampling of conditions acrossthe investigated factors prevented a more comprehensiveevaluation of all instruments Whereas the findings allowgeneral relationships for formant frequency to be deducedmore comprehensive investigations are needed to attainmore generalizable quantification of absolute frequencyranges and thresholds

In correlating acoustical and perceptual factors spec-tral-envelope characteristics alone explain up to 87 ofthe variance in blend ratings In addition local spectraltraits seem to be more powerful acoustical predictors ofblend than global traits The formant descriptor for theupper formant bound Frarr

3dB when expressed as a derivatedescriptor ΔLrarr

3dB acts as the strongest predictor for theblend ratings regardless of whether instruments belongto the pitch-invariant group or not With regard to clar-inet and flute the departures from pitch invariance foundin Section 522 contradict the utility of pitch-generalizedspectral-envelope description in predicting blend ratingsas reported in Section 523 Taking both findings into ac-count this for one argues that the descriptor Frarr

3dB stillsucceeds in explaining blend well even for clarinet andflute On the other hand the same instruments displaya greater perceptual sensitivity to the Pitch and Intervalfactors likely associated with their less pronounced for-mant structure Overall strong formant prominence leadsto more drastic changes in blend

The prediction of blend using ΔLrarr3dB still presumes

that one of the instruments serves as a reference formantas the employed difference descriptors are anchored tothe sampled instrument The dependence on a referenceleaves some ambiguity because an arbitrary combinationof two instruments would lead to contradictory predictionsof blend if both instruments were given equal importancein serving as the reference Given the context of both ex-

1049

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

periments it can be assumed that the sampled instrumentacting as a constant anchor had been biased into servingas the reference by combining it with a variable synthe-sized instrument In addition a possible perceptual expla-nation could concern audio samples of instruments playingwithout vibrato generally still exhibiting coherent micro-modulations of partial tones These modulations have beenshown to contribute to a stronger and more stable audi-tory image [31] and may thus bias the more stable imagetoward acting as the reference especially as the synthe-sized partials remain static over time Even in the contextof blending in musical performance one instrument as-sumes the role of the leading voice in which it possiblyserves as the reference whereas an accompanying instru-ment avoids exceeding the lead instrumentrsquos main-formantfrequency Likewise returning to the notion of blend lead-ing to augmented timbres [4] the dominant timbre to beembellished by another would seem predestined to func-tion as the reference based on which the less dominanttimbre should not exhibit formant frequencies exceedingthose of the reference

Finally the results allow a reassessment of previousexplanations for blend The lsquodarkerrsquo-timbre hypothesis[4] is directly reflected in the obtained linear blend pro-file in which lower ΔF increases blend and at the sametime causes a decrease in the spectral-centroid compos-ite By contrast this hypothesis is not well explained bythe plateau profile as blend ratings remain similarly highfor ΔF le 0 The alternative hypothesis of coincidence offormant regions [3] would have predicted stronger blendratings for ΔF = 0 than for all other levels which in theperceptual results is only achieved for the levels ΔF gt 0While the hypothesis only achieves partial fulfillment withrespect to spectral variations ΔF it finds more agreementin the corresponding SAI representations As shown in twoexample cases for horn in Figure 9 the dyad SAI pro-files for the levels ΔF gt 0 are distinguishable from theremaining levels through clear deviations between 1 and2 kHz7 and located just above the hornrsquos estimated Frarr

3dBRemarkably the formant shifts related to ΔFlt 0 (Fig-ure 3) are not reflected in the corresponding dyad SAIprofiles (Figure 9) which instead exhibit direct align-ment below 500 Hz for all three levels ΔF le 0 There-fore only ΔF gt 0 seems to lead to incomplete masking[13] revealing the presence of the synthesized instrumentwhereas the spectral-envelope variations ΔF lt 0 evokelittle change compared to the dyad excitation pattern forΔF = 0

Of still greater importance the auditory system as mod-eled by AIM using the DCGC seemingly involves a high-

7 Concerning the output from the AIM a misalignment between actualsinusoidal frequencies and the corresponding SAI peaks was observedThrough personal communication with the developer of the utilized AIMimplementation this was explained as being an inherent property of thedynamic-compression filters A correction function was derived by com-puting SAIs for various sinusoidal frequencies and fitting the two fre-quency scales yielding the linear function fSAI = 117 middot f + 282 Hz[r2(50)= 999 p lt 0001] In Figure 9 the correction manifests itself inthe compressed frequency extent for the SAIs

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

001

002

SAImagnitude

0

-II

-I

+I

+II

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

002

004

006

008

Frequency (Hz)

Figure 9 SAI profiles of dyads for allΔF levels (Experiment B)depicting two experimental conditions for horn Top pitch 1 uni-son bottom pitch 2 non-unison the grid lines correspond topartial-tone locations

pass characteristic that attenuates spectral-envelope re-gions below 500 Hz affecting the perceived magnitudesof the respective partial tones (grid lines) This impliesthat in the region below 500 Hz frequency deviations be-tween main formants no longer affect the achieved degreeof blend as reflected both in Figure 9 and the perceptualfindings Horn and bassoon would especially benefit fromthis as their main formants are centered around 500 HzOboe and trumpet both exhibiting higher Frarr

3dB can be as-sumed to benefit to a lesser degree In summary main for-mants located in proximity to 500 Hz will benefit more ontop of which lower pitches would also increase the num-ber of partial tones falling into this favorable frequencyregion This reflects tendencies for pitch-invariant traits inSAI correlations (see Section 22) to be more pronouncedat lower pitch ranges and for instruments of lower registerwhich would lend support to the lsquodarker-timbrersquo hypothe-sis in terms of limiting the spectral centroid in frequency

7 Conclusion

Evidence from acoustical and psychoacoustical descrip-tions of wind instruments and from perceptual validationshows that relative location and prominence of main for-mants affect the perception of timbre blend critically Fur-thermore these pitch-invariant spectral characteristics ex-plain and predict the perception of blend to a promis-ingly high degree Remaining discrepancies between theacoustic and perceptual domains can be explained throughapparent constraints of the simulated auditory systemIn conclusion a perceptual model for the contributionof local spectral-envelope characteristics to blending isproposed keeping in mind that it would serve as aninstrument-specific component in a more complex gen-eral perceptual model involving compositional and per-formance factors as initially discussed in Section 1 Themain factors influencing the perception of spectral blendare summarized1 Frequency relationships between upper bounds of main

formants are critical to blend Among several instru-

1050

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

ments one is expected to serve as the reference (eglead instrument dominant timbre) above which thepresence of other instrumentsrsquo formants would stronglyresult in decreased blend

2 Prominence of the main formants governs whetherthese relationships lead to plateau or linear blend pro-files and in the first case pitch-invariant perceptual ro-bustness extends to non-unison intervals

3 Spectral-envelope relationships below 500 Hz may benegligible due to constraints of the auditory system Atthe same time blend decreases at higher pitches due toa degraded perceptual robustness of formants

This hypothetical model still requires further investigationconcerning a more systematic study on 1) the apparentconstraints of the auditory system as modeled by AIM2) how in musical practice one instrument may functionas the reference 3) establishing a more specific descrip-tion of formant prominence and 4) addressing the contri-bution of loudness balance between instruments to blendThese future investigations will further validate and refinethe proposed perceptual model and will improve compu-tational prediction tools for the instrument-specific spec-tral component of blend Orchestration practice will ben-efit from these research efforts even beyond aiming forblend as knowledge of favorable instrument relationshipsalso informs orchestrators as to how to avoid it

References

[1] G J Sandell Concurrent timbres in orchestration a per-ceptual study of factors determining blend PhD disserta-tion Northwestern University 1991

[2] R A Kendall E C Carterette Identification and blend oftimbres as a basis for orchestration Contemp Music Rev9 (1993) 51ndash67

[3] C Reuter Die auditive Diskrimination von Orchesterin-strumenten - Verschmelzung und Heraushoumlrbarkeit vonInstrumentalklangfarben im Ensemblespiel (The auditorydiscrimination of orchestral instruments fusion and distin-guishability of instrumental timbres in ensemble playing)P Lang Frankfurt am Main 1996

[4] G J Sandell Roles for spectral centroid and other factorsin determining ldquoblended instrument pairings in orchestra-tion Music Percept 13 (1995) 209ndash246

[5] A S Bregman Auditory scene analysis the perceptual or-ganization of sound MIT Press Cambridge MA 1990

[6] S McAdams S Winsberg S Donnadieu G De Soete JKrimphoff Perceptual scaling of synthesized musical tim-bres common dimensions specificities and latent subjectclasses Psychol Res 58 (1995) 177ndash192

[7] L Wedin G Goude Dimension analysis of the perceptionof instrumental timbre Scand J Psychol 13 (1972) 228ndash240

[8] J M Grey J W Gordon Perceptual effects of spectralmodifications on musical timbres J Acoust Soc Am 63(1978) 1493ndash1500

[9] W Strong M Clark Synthesis of wind-instrument tonesJ Acoust Soc Am 41 (1967) 39ndash52

[10] D Tardieu S McAdams Perception of dyads of impulsiveand sustained instrument sounds Music Percept 30 (2012)117ndash128

[11] L Hermann Phonophotographische Untersuchungen(Phonophotographic investigations) Pfluumlger Archiv fuumlrdie Gesamte Physiologie des Menschen und der Thiere 58(1894) 264ndash279

[12] C Stumpf Die Sprachlaute (The speech sounds) SpringerBerlin 1926

[13] J P Fricke Zur Anwendung digitaler Klangfarbenfilter beiAufnahme und Wiedergabe (On the application of digitaltimbre filters during recording and playback) Proc 14thTonmeistertagung Munich Germany 1986 135ndash148

[14] K E Schumann Physik der Klangfarben (Physics of tim-bres) Professorial dissertation Universitaumlt Berlin Berlin1929

[15] E L Saldanha J F Corso Timbre cues and the identifica-tion of musical instruments J Acoust Soc Am 36 (1964)2021ndash2026

[16] W Strong M Clark Perturbations of synthetic orchestralwind-instrument tones J Acoust Soc Am 41 (1967) 277ndash285

[17] D Luce J Clark Physical correlates of brass-instrumenttones J Acoust Soc Am 42 (1967) 1232ndash1243

[18] D A Luce Dynamic spectrum changes of orchestral in-struments J Audio Eng Soc 23 (1975) 565ndash568

[19] J Meyer Acoustics and the performance of music 5th edSpringer New York 2009

[20] G Fant Acoustic theory of speech production MoutonThe Hague 1960

[21] C Reuter Klangfarbe und Instrumentation Geschichte ndashUrsachen ndash Wirkung (Timbre and instrumentation historyndash causes ndash effect) P Lang Frankfurt am Main 2002

[22] R van Dinther R D Patterson Perception of acousticscale and size in musical instrument sounds J AcoustSoc Am 120 (2006) 2158ndash2176

[23] T Irino R D Patterson A dynamic compressive gam-machirp auditory filterbank IEEE Trans Audio SpeechLang Process 14 (2006) 2222ndash2232

[24] B C Moore B R Glasberg Suggested formulae for cal-culating auditory-filter bandwidths and excitation patternsJ Acoust Soc Am 74 (1983) 750ndash753

[25] S-A Lembke S McAdams A spectral-envelope synthe-sis model to study perceptual blend between wind instru-ments Proc 11th Congregraves Franccedilais drsquoAcoustique IOAannual meeting Acoustics 2012 Nantes France 20121025ndash1030

[26] F Martin C Champlin Reconsidering the limits of normalhearing J Am Acad Audiol 11 (2000) 64ndash66

[27] ISO 389ndash8 Acoustics Reference zero for the calibrationof audiometric equipmentmdashPart 8 Reference equivalentthreshold sound pressure levels for pure tones and circum-aural earphones Tech Rept International Organization forStandardization Geneva Switzerland 2004

[28] W J Conover R L Iman Rank transformations as abridge between parametric and nonparametric statisticsAm Stat 35 (1981) 124ndash129

[29] J J Higgins S Tashtoush An aligned rank transform testfor interaction Nonlinear World 1 (1994) 201ndash221

[30] G Peeters B L Giordano P Susini N Misdariis SMcAdams The Timbre Toolbox extracting audio descrip-tors from musical signals J Acoust Soc Am 130 (2011)2902ndash2916

[31] S McAdams Spectral fusion spectral parsing and the for-mation of auditory images PhD dissertation Stanford Uni-versity 1984

1051

Page 12: The Role of Spectral-Envelope Characteristics in ... · Figure 1. Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels). Estimates are based

ACTA ACUSTICA UNITED WITH ACUSTICA Lembke McAdams Spectral-envelope characteristicsVol 101 (2015)

periments it can be assumed that the sampled instrumentacting as a constant anchor had been biased into servingas the reference by combining it with a variable synthe-sized instrument In addition a possible perceptual expla-nation could concern audio samples of instruments playingwithout vibrato generally still exhibiting coherent micro-modulations of partial tones These modulations have beenshown to contribute to a stronger and more stable audi-tory image [31] and may thus bias the more stable imagetoward acting as the reference especially as the synthe-sized partials remain static over time Even in the contextof blending in musical performance one instrument as-sumes the role of the leading voice in which it possiblyserves as the reference whereas an accompanying instru-ment avoids exceeding the lead instrumentrsquos main-formantfrequency Likewise returning to the notion of blend lead-ing to augmented timbres [4] the dominant timbre to beembellished by another would seem predestined to func-tion as the reference based on which the less dominanttimbre should not exhibit formant frequencies exceedingthose of the reference

Finally the results allow a reassessment of previousexplanations for blend The lsquodarkerrsquo-timbre hypothesis[4] is directly reflected in the obtained linear blend pro-file in which lower ΔF increases blend and at the sametime causes a decrease in the spectral-centroid compos-ite By contrast this hypothesis is not well explained bythe plateau profile as blend ratings remain similarly highfor ΔF le 0 The alternative hypothesis of coincidence offormant regions [3] would have predicted stronger blendratings for ΔF = 0 than for all other levels which in theperceptual results is only achieved for the levels ΔF gt 0While the hypothesis only achieves partial fulfillment withrespect to spectral variations ΔF it finds more agreementin the corresponding SAI representations As shown in twoexample cases for horn in Figure 9 the dyad SAI pro-files for the levels ΔF gt 0 are distinguishable from theremaining levels through clear deviations between 1 and2 kHz7 and located just above the hornrsquos estimated Frarr

3dBRemarkably the formant shifts related to ΔFlt 0 (Fig-ure 3) are not reflected in the corresponding dyad SAIprofiles (Figure 9) which instead exhibit direct align-ment below 500 Hz for all three levels ΔF le 0 There-fore only ΔF gt 0 seems to lead to incomplete masking[13] revealing the presence of the synthesized instrumentwhereas the spectral-envelope variations ΔF lt 0 evokelittle change compared to the dyad excitation pattern forΔF = 0

Of still greater importance the auditory system as mod-eled by AIM using the DCGC seemingly involves a high-

7 Concerning the output from the AIM a misalignment between actualsinusoidal frequencies and the corresponding SAI peaks was observedThrough personal communication with the developer of the utilized AIMimplementation this was explained as being an inherent property of thedynamic-compression filters A correction function was derived by com-puting SAIs for various sinusoidal frequencies and fitting the two fre-quency scales yielding the linear function fSAI = 117 middot f + 282 Hz[r2(50)= 999 p lt 0001] In Figure 9 the correction manifests itself inthe compressed frequency extent for the SAIs

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

001

002

SAImagnitude

0

-II

-I

+I

+II

500 1000 1500 2000 2500 3000 3500 4000 4500 50000

002

004

006

008

Frequency (Hz)

Figure 9 SAI profiles of dyads for allΔF levels (Experiment B)depicting two experimental conditions for horn Top pitch 1 uni-son bottom pitch 2 non-unison the grid lines correspond topartial-tone locations

pass characteristic that attenuates spectral-envelope re-gions below 500 Hz affecting the perceived magnitudesof the respective partial tones (grid lines) This impliesthat in the region below 500 Hz frequency deviations be-tween main formants no longer affect the achieved degreeof blend as reflected both in Figure 9 and the perceptualfindings Horn and bassoon would especially benefit fromthis as their main formants are centered around 500 HzOboe and trumpet both exhibiting higher Frarr

3dB can be as-sumed to benefit to a lesser degree In summary main for-mants located in proximity to 500 Hz will benefit more ontop of which lower pitches would also increase the num-ber of partial tones falling into this favorable frequencyregion This reflects tendencies for pitch-invariant traits inSAI correlations (see Section 22) to be more pronouncedat lower pitch ranges and for instruments of lower registerwhich would lend support to the lsquodarker-timbrersquo hypothe-sis in terms of limiting the spectral centroid in frequency

7 Conclusion

Evidence from acoustical and psychoacoustical descrip-tions of wind instruments and from perceptual validationshows that relative location and prominence of main for-mants affect the perception of timbre blend critically Fur-thermore these pitch-invariant spectral characteristics ex-plain and predict the perception of blend to a promis-ingly high degree Remaining discrepancies between theacoustic and perceptual domains can be explained throughapparent constraints of the simulated auditory systemIn conclusion a perceptual model for the contributionof local spectral-envelope characteristics to blending isproposed keeping in mind that it would serve as aninstrument-specific component in a more complex gen-eral perceptual model involving compositional and per-formance factors as initially discussed in Section 1 Themain factors influencing the perception of spectral blendare summarized1 Frequency relationships between upper bounds of main

formants are critical to blend Among several instru-

1050

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

ments one is expected to serve as the reference (eglead instrument dominant timbre) above which thepresence of other instrumentsrsquo formants would stronglyresult in decreased blend

2 Prominence of the main formants governs whetherthese relationships lead to plateau or linear blend pro-files and in the first case pitch-invariant perceptual ro-bustness extends to non-unison intervals

3 Spectral-envelope relationships below 500 Hz may benegligible due to constraints of the auditory system Atthe same time blend decreases at higher pitches due toa degraded perceptual robustness of formants

This hypothetical model still requires further investigationconcerning a more systematic study on 1) the apparentconstraints of the auditory system as modeled by AIM2) how in musical practice one instrument may functionas the reference 3) establishing a more specific descrip-tion of formant prominence and 4) addressing the contri-bution of loudness balance between instruments to blendThese future investigations will further validate and refinethe proposed perceptual model and will improve compu-tational prediction tools for the instrument-specific spec-tral component of blend Orchestration practice will ben-efit from these research efforts even beyond aiming forblend as knowledge of favorable instrument relationshipsalso informs orchestrators as to how to avoid it

References

[1] G J Sandell Concurrent timbres in orchestration a per-ceptual study of factors determining blend PhD disserta-tion Northwestern University 1991

[2] R A Kendall E C Carterette Identification and blend oftimbres as a basis for orchestration Contemp Music Rev9 (1993) 51ndash67

[3] C Reuter Die auditive Diskrimination von Orchesterin-strumenten - Verschmelzung und Heraushoumlrbarkeit vonInstrumentalklangfarben im Ensemblespiel (The auditorydiscrimination of orchestral instruments fusion and distin-guishability of instrumental timbres in ensemble playing)P Lang Frankfurt am Main 1996

[4] G J Sandell Roles for spectral centroid and other factorsin determining ldquoblended instrument pairings in orchestra-tion Music Percept 13 (1995) 209ndash246

[5] A S Bregman Auditory scene analysis the perceptual or-ganization of sound MIT Press Cambridge MA 1990

[6] S McAdams S Winsberg S Donnadieu G De Soete JKrimphoff Perceptual scaling of synthesized musical tim-bres common dimensions specificities and latent subjectclasses Psychol Res 58 (1995) 177ndash192

[7] L Wedin G Goude Dimension analysis of the perceptionof instrumental timbre Scand J Psychol 13 (1972) 228ndash240

[8] J M Grey J W Gordon Perceptual effects of spectralmodifications on musical timbres J Acoust Soc Am 63(1978) 1493ndash1500

[9] W Strong M Clark Synthesis of wind-instrument tonesJ Acoust Soc Am 41 (1967) 39ndash52

[10] D Tardieu S McAdams Perception of dyads of impulsiveand sustained instrument sounds Music Percept 30 (2012)117ndash128

[11] L Hermann Phonophotographische Untersuchungen(Phonophotographic investigations) Pfluumlger Archiv fuumlrdie Gesamte Physiologie des Menschen und der Thiere 58(1894) 264ndash279

[12] C Stumpf Die Sprachlaute (The speech sounds) SpringerBerlin 1926

[13] J P Fricke Zur Anwendung digitaler Klangfarbenfilter beiAufnahme und Wiedergabe (On the application of digitaltimbre filters during recording and playback) Proc 14thTonmeistertagung Munich Germany 1986 135ndash148

[14] K E Schumann Physik der Klangfarben (Physics of tim-bres) Professorial dissertation Universitaumlt Berlin Berlin1929

[15] E L Saldanha J F Corso Timbre cues and the identifica-tion of musical instruments J Acoust Soc Am 36 (1964)2021ndash2026

[16] W Strong M Clark Perturbations of synthetic orchestralwind-instrument tones J Acoust Soc Am 41 (1967) 277ndash285

[17] D Luce J Clark Physical correlates of brass-instrumenttones J Acoust Soc Am 42 (1967) 1232ndash1243

[18] D A Luce Dynamic spectrum changes of orchestral in-struments J Audio Eng Soc 23 (1975) 565ndash568

[19] J Meyer Acoustics and the performance of music 5th edSpringer New York 2009

[20] G Fant Acoustic theory of speech production MoutonThe Hague 1960

[21] C Reuter Klangfarbe und Instrumentation Geschichte ndashUrsachen ndash Wirkung (Timbre and instrumentation historyndash causes ndash effect) P Lang Frankfurt am Main 2002

[22] R van Dinther R D Patterson Perception of acousticscale and size in musical instrument sounds J AcoustSoc Am 120 (2006) 2158ndash2176

[23] T Irino R D Patterson A dynamic compressive gam-machirp auditory filterbank IEEE Trans Audio SpeechLang Process 14 (2006) 2222ndash2232

[24] B C Moore B R Glasberg Suggested formulae for cal-culating auditory-filter bandwidths and excitation patternsJ Acoust Soc Am 74 (1983) 750ndash753

[25] S-A Lembke S McAdams A spectral-envelope synthe-sis model to study perceptual blend between wind instru-ments Proc 11th Congregraves Franccedilais drsquoAcoustique IOAannual meeting Acoustics 2012 Nantes France 20121025ndash1030

[26] F Martin C Champlin Reconsidering the limits of normalhearing J Am Acad Audiol 11 (2000) 64ndash66

[27] ISO 389ndash8 Acoustics Reference zero for the calibrationof audiometric equipmentmdashPart 8 Reference equivalentthreshold sound pressure levels for pure tones and circum-aural earphones Tech Rept International Organization forStandardization Geneva Switzerland 2004

[28] W J Conover R L Iman Rank transformations as abridge between parametric and nonparametric statisticsAm Stat 35 (1981) 124ndash129

[29] J J Higgins S Tashtoush An aligned rank transform testfor interaction Nonlinear World 1 (1994) 201ndash221

[30] G Peeters B L Giordano P Susini N Misdariis SMcAdams The Timbre Toolbox extracting audio descrip-tors from musical signals J Acoust Soc Am 130 (2011)2902ndash2916

[31] S McAdams Spectral fusion spectral parsing and the for-mation of auditory images PhD dissertation Stanford Uni-versity 1984

1051

Page 13: The Role of Spectral-Envelope Characteristics in ... · Figure 1. Estimated spectral-envelope descriptions for all six instruments (labelled in individual panels). Estimates are based

Lembke McAdams Spectral-envelope characteristics ACTA ACUSTICA UNITED WITH ACUSTICAVol 101 (2015)

ments one is expected to serve as the reference (eglead instrument dominant timbre) above which thepresence of other instrumentsrsquo formants would stronglyresult in decreased blend

2 Prominence of the main formants governs whetherthese relationships lead to plateau or linear blend pro-files and in the first case pitch-invariant perceptual ro-bustness extends to non-unison intervals

3 Spectral-envelope relationships below 500 Hz may benegligible due to constraints of the auditory system Atthe same time blend decreases at higher pitches due toa degraded perceptual robustness of formants

This hypothetical model still requires further investigationconcerning a more systematic study on 1) the apparentconstraints of the auditory system as modeled by AIM2) how in musical practice one instrument may functionas the reference 3) establishing a more specific descrip-tion of formant prominence and 4) addressing the contri-bution of loudness balance between instruments to blendThese future investigations will further validate and refinethe proposed perceptual model and will improve compu-tational prediction tools for the instrument-specific spec-tral component of blend Orchestration practice will ben-efit from these research efforts even beyond aiming forblend as knowledge of favorable instrument relationshipsalso informs orchestrators as to how to avoid it

References

[1] G J Sandell Concurrent timbres in orchestration a per-ceptual study of factors determining blend PhD disserta-tion Northwestern University 1991

[2] R A Kendall E C Carterette Identification and blend oftimbres as a basis for orchestration Contemp Music Rev9 (1993) 51ndash67

[3] C Reuter Die auditive Diskrimination von Orchesterin-strumenten - Verschmelzung und Heraushoumlrbarkeit vonInstrumentalklangfarben im Ensemblespiel (The auditorydiscrimination of orchestral instruments fusion and distin-guishability of instrumental timbres in ensemble playing)P Lang Frankfurt am Main 1996

[4] G J Sandell Roles for spectral centroid and other factorsin determining ldquoblended instrument pairings in orchestra-tion Music Percept 13 (1995) 209ndash246

[5] A S Bregman Auditory scene analysis the perceptual or-ganization of sound MIT Press Cambridge MA 1990

[6] S McAdams S Winsberg S Donnadieu G De Soete JKrimphoff Perceptual scaling of synthesized musical tim-bres common dimensions specificities and latent subjectclasses Psychol Res 58 (1995) 177ndash192

[7] L Wedin G Goude Dimension analysis of the perceptionof instrumental timbre Scand J Psychol 13 (1972) 228ndash240

[8] J M Grey J W Gordon Perceptual effects of spectralmodifications on musical timbres J Acoust Soc Am 63(1978) 1493ndash1500

[9] W Strong M Clark Synthesis of wind-instrument tonesJ Acoust Soc Am 41 (1967) 39ndash52

[10] D Tardieu S McAdams Perception of dyads of impulsiveand sustained instrument sounds Music Percept 30 (2012)117ndash128

[11] L Hermann Phonophotographische Untersuchungen(Phonophotographic investigations) Pfluumlger Archiv fuumlrdie Gesamte Physiologie des Menschen und der Thiere 58(1894) 264ndash279

[12] C Stumpf Die Sprachlaute (The speech sounds) SpringerBerlin 1926

[13] J P Fricke Zur Anwendung digitaler Klangfarbenfilter beiAufnahme und Wiedergabe (On the application of digitaltimbre filters during recording and playback) Proc 14thTonmeistertagung Munich Germany 1986 135ndash148

[14] K E Schumann Physik der Klangfarben (Physics of tim-bres) Professorial dissertation Universitaumlt Berlin Berlin1929

[15] E L Saldanha J F Corso Timbre cues and the identifica-tion of musical instruments J Acoust Soc Am 36 (1964)2021ndash2026

[16] W Strong M Clark Perturbations of synthetic orchestralwind-instrument tones J Acoust Soc Am 41 (1967) 277ndash285

[17] D Luce J Clark Physical correlates of brass-instrumenttones J Acoust Soc Am 42 (1967) 1232ndash1243

[18] D A Luce Dynamic spectrum changes of orchestral in-struments J Audio Eng Soc 23 (1975) 565ndash568

[19] J Meyer Acoustics and the performance of music 5th edSpringer New York 2009

[20] G Fant Acoustic theory of speech production MoutonThe Hague 1960

[21] C Reuter Klangfarbe und Instrumentation Geschichte ndashUrsachen ndash Wirkung (Timbre and instrumentation historyndash causes ndash effect) P Lang Frankfurt am Main 2002

[22] R van Dinther R D Patterson Perception of acousticscale and size in musical instrument sounds J AcoustSoc Am 120 (2006) 2158ndash2176

[23] T Irino R D Patterson A dynamic compressive gam-machirp auditory filterbank IEEE Trans Audio SpeechLang Process 14 (2006) 2222ndash2232

[24] B C Moore B R Glasberg Suggested formulae for cal-culating auditory-filter bandwidths and excitation patternsJ Acoust Soc Am 74 (1983) 750ndash753

[25] S-A Lembke S McAdams A spectral-envelope synthe-sis model to study perceptual blend between wind instru-ments Proc 11th Congregraves Franccedilais drsquoAcoustique IOAannual meeting Acoustics 2012 Nantes France 20121025ndash1030

[26] F Martin C Champlin Reconsidering the limits of normalhearing J Am Acad Audiol 11 (2000) 64ndash66

[27] ISO 389ndash8 Acoustics Reference zero for the calibrationof audiometric equipmentmdashPart 8 Reference equivalentthreshold sound pressure levels for pure tones and circum-aural earphones Tech Rept International Organization forStandardization Geneva Switzerland 2004

[28] W J Conover R L Iman Rank transformations as abridge between parametric and nonparametric statisticsAm Stat 35 (1981) 124ndash129

[29] J J Higgins S Tashtoush An aligned rank transform testfor interaction Nonlinear World 1 (1994) 201ndash221

[30] G Peeters B L Giordano P Susini N Misdariis SMcAdams The Timbre Toolbox extracting audio descrip-tors from musical signals J Acoust Soc Am 130 (2011)2902ndash2916

[31] S McAdams Spectral fusion spectral parsing and the for-mation of auditory images PhD dissertation Stanford Uni-versity 1984

1051