

Anthropological Science

Advance Publication
Review

Non-invasive documentation of primate voice production using electroglottography

Christian T. HERBST 1*, Jacob C. DUNN 2,3

1 Primate Research Institute, Kyoto University, Inuyama 484-8506, Japan
2 Department of Animal and Environmental Biology, Faculty of Science & Technology, Anglia Ruskin University, Cambridge, UK
3 Division of Biological Anthropology, University of Cambridge, Cambridge, UK

Received 31 October 2017; accepted 1 February 2018

Abstract Electroglottography (EGG) is a low-cost, non-invasive method for documenting laryngeal sound production during vocalization. The EGG signal represents relative vocal fold contact area and thus delivers physiological evidence of vocal fold vibration. While the method has received much attention in human voice research over the last five decades, it has seen very little application in other mammals. Here, we give a concise overview of mammalian vocal production principles. We explain how mammalian voice production physiology and the dynamics of vocal fold vibration can be documented qualitatively and quantitatively with EGG, and we summarize and discuss key issues from research with humans. Finally, we review the limited number of studies applying EGG to non-human mammals, both in vivo and in vitro. The potential of EGG for non-invasive assessment of non-human primate vocalization is demonstrated with novel in vivo data of Cebus albifrons and Ateles chamek vocalization. These examples illustrate the great potential of EGG as a new minimally invasive tool in primate research, which can provide important insight into the ‘black box’ that is vocal production. A better understanding of vocal fold vibration across a range of taxa can provide us with a deeper understanding of several important elements of speech evolution, such as the universality of vocal production mechanisms, the independence of source and filter, the evolution of vocal control, and the relevance of non-linear phenomena.

Key words: electroglottography, EGG, primate sound production, vocalization of non-human mammals, in vivo

* Correspondence to: Christian T. Herbst, Primate Research Institute, Kyoto University, Inuyama 484-8506, Japan. E-mail: [email protected]
Published online 24 March 2018 in J-STAGE (www.jstage.jst.go.jp) DOI: 10.1537/ase.180201

© 2018 The Anthropological Society of Nippon

Mammalian voice production

Vocal communication in non-human primates has long been of interest to both academic researchers and the broader public. This interest exists for two principal reasons. Firstly, we have an intrinsic curiosity about how these animals, which are so closely related to us, communicate with one another. Secondly, and perhaps more importantly, understanding vocal communication in our primate relatives provides us with important insight into the evolution of human speech, a topic that has fascinated humans for centuries.

Speech does not fossilize, and studying the evolution of this complex yet critical aspect of human behavior using proxies from the fossil record (e.g. the shape of the skull or hyoid (Fitch, 2000a; Nishimura, 2003)) has led to little consensus (Fitch, 2000b; Hauser et al., 2002). A more powerful approach is provided by the comparative method, the primary tool used by Darwin to analyze evolutionary phenomena (Darwin, 1859, 1871). Comparative analyses use data from extant species to draw inferences about extinct ancestors and evolutionary processes. Several important advances in our understanding of the evolution of speech have been made using comparative data (Ghazanfar and Hauser, 1999; Fitch, 2000b; Fitch et al., 2016; Ghazanfar et al., 2012; Takahashi et al., 2013).

Humans, non-human primates, most other mammals (Herbst et al., 2012), and even birds (Elemans et al., 2015) produce sound according to a universal physical principle, described by the myoelastic aerodynamic (MEAD) theory (van den Berg, 1958; Titze, 1980). Steady airflow, coming from the lungs, is converted into a sequence of airflow pulses by the passively vibrating vocal folds (or other laryngeal or syringeal tissue), resulting in self-sustained oscillation. The acoustic pressure waveform generated by this sequence of flow pulses excites the vocal tract, which filters the pulses acoustically, and the result is radiated from the mouth (and/or the nose) (Story, 2002). The latter phenomenon, involving the individual contributions of the laryngeal sound source and the vocal tract to the quality of the emitted sound, has been described in the source–filter theory of sound production (Fant, 1960) and its non-linear extension (Flanagan, 1968; Titze, 2008). The source–filter theory thus predicts that both the laryngeal sound source and the vocal tract have distinct influences on the generated sound.
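The linear version of the source–filter idea can be illustrated with a minimal Python sketch. It is not taken from the paper: the impulse-train source, the damped-sinusoid resonances, and all function names and parameter values are assumptions chosen only to show that the filter changes the spectral shaping while the periodicity set by the source is preserved.

```python
import numpy as np

def glottal_pulse_train(fo_hz, duration_s, fs_hz):
    """Idealized sound source: one unit impulse per vibratory cycle (assumption)."""
    n = int(duration_s * fs_hz)
    source = np.zeros(n)
    period = int(round(fs_hz / fo_hz))
    source[::period] = 1.0
    return source

def resonator_impulse_response(freq_hz, bandwidth_hz, fs_hz, length=512):
    """Impulse response of one damped resonance, standing in for a formant."""
    t = np.arange(length) / fs_hz
    return np.exp(-np.pi * bandwidth_hz * t) * np.sin(2 * np.pi * freq_hz * t)

def apply_vocal_tract(source, formants, fs_hz):
    """Linear source-filter model: convolve the source with each resonance in turn."""
    out = np.asarray(source, dtype=float)
    for freq_hz, bandwidth_hz in formants:
        h = resonator_impulse_response(freq_hz, bandwidth_hz, fs_hz)
        out = np.convolve(out, h)[: len(source)]
    return out

fs = 16000
src = glottal_pulse_train(200.0, 0.2, fs)
# Two hypothetical vocal tract settings applied to the same source.
a = apply_vocal_tract(src, [(500.0, 80.0), (1500.0, 120.0)], fs)
b = apply_vocal_tract(src, [(800.0, 80.0), (1200.0, 120.0)], fs)
```

Both outputs repeat with the period of the source (80 samples here) once the initial transient has passed, but their waveforms differ. The non-linear extension cited above, in which the filter acts back on the source, is not covered by this sketch.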

This universal sound production mechanism is facilitated by common basic vocal anatomy among mammals, certain specializations such as air sacs or vocal membranes notwithstanding (e.g. Charlton et al., 2013; Dunn et al., 2015). Similarities in vocal anatomy allow for homologous functionality of sound output.

The mammalian vocal organ comprises three subsystems: the respiratory system, the larynx, and the supraglottal vocal tract. On a physical level, these subsystems constitute the power source, the sound source, and the sound modifiers, respectively (Howard and Murphy, 2007). Table 1 summarizes the most basic characteristics of the emitted vocal sound, and how they are controlled through the three voice subsystems. Note that the overview given in Table 1 is a gross simplification of a complex system. A more comprehensive discussion is provided in Herbst (2017).

Bioacoustic research in non-humans typically focuses on only three main parameters of the generated sound (out of the five listed in Table 1):

• Fundamental frequency (fo), i.e. the repetition rate of the tissue vibrations constituting the sound source, has been suggested to be an indicator of interspecific (Fletcher, 2005) and intraspecific body size (Seyfarth and Cheney, 1986); an indicator of sexual dimorphism (Fouquet et al., 2016); a cue to mate quality, motivation, and emotion; and a cue for individual recognition of conspecifics. The lack of periodicity (resulting in irregular/chaotic signals) may be an indicator of physical condition, status, or motivation (Wilden et al., 1998; Fitch et al., 2002).

• The intensity of the radiated sound, typically measured via the sound pressure level (SPL), may be an indicator of age, body size, and breeding status (Sanvito and Galimberti, 2003), as well as of physical condition and motivation (Wyman et al., 2008).

• Finally, the formant structure of the radiated sound (determined by the combination of the spectral properties of the laryngeal sound source and the spectral characteristics of the supraglottal vocal tract, centrally influenced by the vocal tract’s resonances) plays a central role in vocal communication. The average frequency spacing (Reby and McComb, 2003) between the individual formants is an indicator of the vocalizing animal’s vocal tract length (Fitch, 1997), which has been shown to be a cue to body size (Reby et al., 2005; Charlton et al., 2012).

These and many other studies, mainly focusing on acoustic output, have provided important insights into the signaling function of mammalian vocal communication. In contrast, little is known about the actual voice production process in non-human mammals. The respective physical and physiological mechanisms and functional constraints of laryngeal sound generation have not yet been comprehensively investigated, mostly due to experimental difficulties in vivo.
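The link between formant spacing and vocal tract length mentioned in the list above can be made concrete with a short worked sketch. It assumes the textbook idealization of the vocal tract as a uniform tube closed at the glottis and open at the mouth, for which neighboring resonances are spaced by c/(2L). The function names, the speed of sound of 350 m/s, and the example formant values are illustrative assumptions, not values from the studies cited.

```python
def formant_spacing_hz(formants_hz):
    """Average frequency spacing between adjacent formants."""
    ordered = sorted(formants_hz)
    if len(ordered) < 2:
        raise ValueError("at least two formants are needed")
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)

def vocal_tract_length_m(formants_hz, speed_of_sound_m_s=350.0):
    """Uniform-tube estimate: spacing = c / (2 L), hence L = c / (2 * spacing)."""
    return speed_of_sound_m_s / (2.0 * formant_spacing_hz(formants_hz))

# Hypothetical formants of an idealized 17.5 cm tube: 500, 1500, 2500 Hz.
example_length_m = vocal_tract_length_m([500.0, 1500.0, 2500.0])
```

Real vocal tracts are not uniform tubes, so an estimate of this kind is only a first approximation.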

The typical bioacoustic research paradigm treats the vocal production system as a ‘black box.’ Only sound output is analyzed, and the underlying voice production mechanisms are inferred hypothetically, based on empirical knowledge from humans. The physiological and physical framework is bypassed, thus limiting the understanding gained from such research, and potentially leading to inappropriate conclusions about sound generation of the species studied. It is therefore important to understand the physiological and physical vocal production mechanisms of vocalizations at the sound source level. This is particularly important in non-human primates, when asking questions about vocal production with a view to understanding the evolution of human speech.

Electroglottography: method

Direct investigation of the laryngeal sound source is best accomplished via laryngeal endoscopy. Several imaging techniques exist, the foremost being videostroboendoscopy (Bless et al., 2009), videokymography (Svec and Schutte, 2012), and high-speed video (HSV) endoscopy (Deliyski and Hillman, 2010). These methods are, however, invasive, and even in humans only 90–95% of the population tolerate the procedure (Markus Hess, personal communication). In non-human primates and other mammals, the method is virtually impossible to apply in vivo, some experiments in situ with anesthetized animals notwithstanding (Berke et al., 1987; Döllinger et al., 2005).

A non-invasive, relatively low-cost alternative is electroglottography (EGG). This method was introduced by Fabre in 1957 as a bio-impedance measurement (Fabre, 1957). A high-frequency, low-voltage current is passed between two electrodes placed on either side of the thyroid cartilage at the level of the vocal folds. Changes in the vocal fold contact area (VFCA) during vocal fold vibration result in admittance variations, and the resulting EGG signal is proportional to the relative VFCA (Hampala et al., 2016).

The landmarks within a stereotypical EGG signal from human normophonic speech are illustrated in Figure 1 (taken from Hampala et al., 2016):
(a) initial contact of the lower vocal fold margins;
(b) initial contact of the upper vocal fold margins;
(c) maximum vocal fold contact reached (glottis not necessarily fully closed);
(d) de-contacting phase initiated by separation of the lower vocal fold margins;
(e) upper margins start to separate; and
(f) glottis is open, the contact area is minimal.

Table 1. Greatly simplified model of vocal sound quality control in humans. Note that a wide variety of synergies and secondary effects exist, e.g. the positive correlation between subglottal pressure and fundamental frequency, or the enhancement of high-frequency components via proper vocal-tract adjustments in singing.

Feature | Voice component | Control
Sound intensity | Power source | Tracheal/subglottal air pressure
Fundamental frequency (fo) | Sound source | Vocal fold length and tension
Degree of high-frequency energy | Sound source | Vocal fold geometry, morphology and adduction
‘Breathiness’ (i.e. noise components) | Sound source | Vocal fold adduction, vocal fold geometry (pathologies/lesions)
Formant structure, vowel color (in humans) | Vocal tract | Vocal-tract anatomy, articulation (tongue, jaw opening, lips, vertical larynx position)

The EGG signal thus constitutes indirect physiological evidence of vocal fold vibration dynamics. There is no noteworthy influence of vocal tract acoustics, and no influence at all from room acoustics or ambient background noise, which makes the method ideal for recordings outside of laboratory conditions, where no sound-proofed environment is available. Because the signal carries almost no vocal tract information, EGG is not suitable for assessing vowels in human speech or formant structures in any mammalian vocalization. It is, however, a perfect candidate for assessing the fo of the generated sound. This is illustrated in Figure 2A, B, and G, where fundamental frequency data from simultaneously recorded acoustic and EGG signals are compared.
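How fo might be read from an EGG recording can be sketched with a plain autocorrelation estimator. The source does not state which fo algorithm was used for Figure 2, so the method, the function name, and the search range below are assumptions for illustration only.

```python
import numpy as np

def estimate_fo_autocorr(signal, fs_hz, fo_min_hz=50.0, fo_max_hz=1000.0):
    """Estimate fo as the lag of the strongest autocorrelation peak.

    The search is limited to lags corresponding to fo_min_hz..fo_max_hz
    (hypothetical defaults; adapt them to the species studied).
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove any DC offset of the EGG signal
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags only
    lag_min = int(fs_hz / fo_max_hz)
    lag_max = min(int(fs_hz / fo_min_hz), len(ac) - 1)
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs_hz / best_lag

# Synthetic stand-in for a short EGG frame: 200 Hz sinusoid sampled at 10 kHz.
fs = 10000
t = np.arange(1000) / fs
fo_estimate = estimate_fo_autocorr(np.sin(2 * np.pi * 200 * t), fs)
```

In practice the estimator would be applied frame by frame to obtain an fo contour, and the same function could be run on the acoustic signal for a comparison like the one in Figure 2G.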

Informed quantitative interpretation of the EGG waveform can produce deeper insights into the voice production mechanics of the analyzed vocalization. An example is given in Figure 2C–F. The graphs displayed in Figure 2C and D are so-called wavegrams, a recently introduced visualization technique for EGG signals (Herbst et al., 2010) (source code available from http://homepage.univie.ac.at/christian.herbst/index.php?page=wavegram). For wavegram generation, the analyzed signal is segmented into individual glottal vibratory cycles. For each cycle, both the duration and the EGG signal amplitude are normalized, and the normalized amplitude is linearly color-coded (low amplitudes in light color, high amplitude in dark color). The resulting color-strips are vertically oriented and then consecutively plotted along the x-axis from left to right, representing the overall time of the analyzed signal. The y-axis is mapped onto normalized intra-cycle progress, and the z-axis shows the time-varying relative VFCA as recorded by EGG. Wavegrams can either be constructed from the EGG signal (Figure 2C) or its first mathematical derivative (dEGG, i.e. the rate of change of the relative VFCA; see Figure 2D).
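The wavegram construction steps described above can be sketched as follows. This is not the authors' published code (see the URL above for that); it is a simplified illustration in which the cycle boundaries are assumed to be already known and are passed in as sample indices. The function name and the `points_per_cycle` parameter are hypothetical.

```python
import numpy as np

def wavegram_matrix(egg, cycle_starts, points_per_cycle=100):
    """Build a wavegram-like matrix: one column per glottal cycle.

    Rows correspond to normalized intra-cycle progress (0..1), columns to
    consecutive cycles, and values to the amplitude normalized per cycle.
    """
    egg = np.asarray(egg, dtype=float)
    if len(cycle_starts) < 2:
        raise ValueError("need at least two cycle boundaries")
    columns = []
    for start, stop in zip(cycle_starts[:-1], cycle_starts[1:]):
        cycle = egg[start:stop]
        # Duration normalization: resample every cycle to the same length.
        src = np.linspace(0.0, 1.0, num=len(cycle))
        dst = np.linspace(0.0, 1.0, num=points_per_cycle)
        resampled = np.interp(dst, src, cycle)
        # Amplitude normalization: map each cycle to the range 0..1.
        span = resampled.max() - resampled.min()
        if span > 0:
            resampled = (resampled - resampled.min()) / span
        else:
            resampled = np.zeros_like(resampled)
        columns.append(resampled)
    return np.column_stack(columns)

# Three synthetic cycles of different length and amplitude.
short = np.sin(np.linspace(0, 2 * np.pi, 40, endpoint=False))
long_ = 3 * np.sin(np.linspace(0, 2 * np.pi, 60, endpoint=False))
demo = wavegram_matrix(np.concatenate([short, long_, short]), [0, 40, 100, 140], 50)
```

The resulting matrix could then be displayed with a light-to-dark color map to obtain the color-strip layout described in the text. Passing the first derivative of the signal instead of the signal itself would give a dEGG variant.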

The vocalization analyzed in Figure 2 is a glissando (i.e. a gradual variation of fo) sung by an amateur soprano singer. During the glissando, the soprano unintentionally introduced a so-called vocal ‘register break,’ i.e. an abrupt variation of sound spectral characteristics, caused by an abrupt variation in the mechanism of vocal fold vibration. This register break, occurring around t = 4 s, is clearly evident in the EGG waveforms extracted at t = 3 and t = 5 s (Figure 2E and F, respectively). The duration of contact increased from about 34% of the glottal cycle before the register break to about 67% after the register break. In this example, calculation of the contact duration was performed from the positive and negative peaks in the dEGG signal, which have been shown to be roughly (but not precisely) representative of glottal closure and opening events (Herbst et al., 2014). This relative contact duration and its development over time can also be recognized in the dEGG wavegram (Figure 2D), indicated by the horizontal dark and light lines.

The relative contact duration, typically termed EGG contact quotient (Orlikoff, 1991), can also be calculated algorithmically, either (a) based on positive and negative maxima in the dEGG signal (as described in the previous paragraph); or (b) with threshold-based or hybrid approaches. Note that the different methods of estimating the contact duration result in different data (Sapienza et al., 1998; Henrich et al., 2004; Herbst and Ternström, 2006; Kankare et al., 2012). With algorithmic calculation, a larger corpus of data can be analyzed and is thus accessible to statistical appraisal, though the results should be interpreted with great care (Herbst et al., 2017).
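The two families of contact quotient estimates named above can be sketched for a single, already segmented glottal cycle. The function names, the 50% threshold, and the rectangular test cycle are assumptions for illustration; they do not reproduce any specific published algorithm.

```python
import numpy as np

def cq_from_degg(cycle):
    """Contact quotient from the dEGG peaks of one cycle.

    Contacting is taken at the positive dEGG peak, de-contacting at the
    negative peak; the modulo handles cycles that start during contact.
    """
    cycle = np.asarray(cycle, dtype=float)
    degg = np.diff(cycle)
    contacting = int(np.argmax(degg))
    decontacting = int(np.argmin(degg))
    return ((decontacting - contacting) % len(cycle)) / len(cycle)

def cq_from_threshold(cycle, threshold=0.5):
    """Contact quotient as the fraction of the cycle above a relative threshold."""
    cycle = np.asarray(cycle, dtype=float)
    span = cycle.max() - cycle.min()
    if span == 0:
        raise ValueError("cycle has no amplitude variation")
    normalized = (cycle - cycle.min()) / span
    return float(np.mean(normalized > threshold))

# Idealized rectangular cycle: contact during 40 of 100 samples.
ideal_cycle = np.zeros(100)
ideal_cycle[20:60] = 1.0
cq_peaks = cq_from_degg(ideal_cycle)
cq_threshold = cq_from_threshold(ideal_cycle)
```

On this idealized cycle both estimates coincide at 0.4. On real EGG waveforms with gradual contacting and de-contacting slopes they generally diverge, and the threshold variant additionally depends on the chosen threshold level, which is consistent with the caution expressed in the text.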

The EGG contact quotient is the main quantitative EGG analysis parameter used in human voice research (Howard, 1995; Verdolini et al., 1998; Schutte and Miller, 2001; Henrich et al., 2005; Guzmán et al., 2016). Other, less frequently applied and less rigorously evaluated quantitative EGG analysis parameters are the speed quotient and the relative contact rise time (Orlikoff, 1991), amongst others.

The EGG contact quotient is in most cases roughly equivalent to the closed quotient as derived from high-speed video endoscopy or glottal airflow analysis (Echternach et al., 2010; La and Sundberg, 2014), but important exceptions exist where the EGG contact quotient is meaningless and should not be computed at all (Herbst et al., 2017). Work with human singers suggests that the EGG contact quotient may be used to partly infer glottal configuration and thus activation of intrinsic laryngeal muscles in certain cases, but the relation is, unfortunately, not straightforward (Herbst et al., 2011, 2017).

For the example shown in Figure 2 it can be hypothesized that, based on the observed EGG contact quotients, the first part (t = 0 s to ~4 s) was produced in the so-called ‘falsetto register’ (sometimes called laryngeal mechanism M2 (Henrich, 2006)), while the second part was produced in the so-called ‘chest register’ (laryngeal mechanism M1). During phonation in the chest register the thyroarytenoid muscle is typically more contracted as compared to the falsetto register (Hirano et al., 1969; Chhetri et al., 2012).

Figure 1. Schematic illustration of EGG waveform for one glottal cycle (Baken and Orlikoff, 2000; Hampala et al., 2016) (see text).


Electroglottography: application in non-human primates

In humans, EGG has regularly been applied to voice research, clinical work, and singing voice pedagogy for about 40 years, and this millennium has seen a considerable increase in related scientific publications. This is most likely owing to the attractiveness of EGG as a low-cost, non-invasive method. In contrast, application of the method to non-human mammals has been extremely rare. Most studies involving non-human mammals (mostly dogs, but also sheep and cows (Berke et al., 1989; Alipour et al., 1996; Verdolini et al., 1998; Alipour and Jaiswal, 2008, 2009)) have been conducted in vitro or in anesthetized animals, for the purpose of duplicating the human model (most likely for ethical reasons, avoiding having to investigate human larynges), targeting medical and basic voice science questions in humans.

Figure 2. Glissando (gradual fo variation) produced on vowel /a/ by a human female in laboratory conditions (sound-treated room, negligible background noise) during simultaneous recording of acoustic and EGG signal. Around t = 4 s, an abrupt change of laryngeal mechanism occurred. (A, B) Narrow-band spectrograms of acoustic and EGG signal, respectively, with the calculated fo superimposed as orange circles and cyan triangles; (C, D) EGG and dEGG wavegrams (see text); (E, F) representative EGG waveform and first derivative (dEGG), extracted at t = 3 s and t = 5 s, respectively. The arrows across panels E and D indicate the positive and negative peaks in the dEGG waveform, respectively; (G) scatter plot of fo from acoustic vs. EGG signal. A linear regression fit through the data points resulted in a perfect correlation (R² = 1).

Only recently has EGG been used in vitro in several studies with excised larynges (Garcia and Herbst, 2018), specifically targeting questions of animal bioacoustics in a number of non-human primates (Herbst et al., 2012; Garcia et al., 2017), along with a prototypical application of the method in birds (Elemans et al., 2015; Rasmussen et al., in preparation). In vivo, the methodology has, to the knowledge of the authors, only been applied twice before. In a pioneering investigation involving two adult female Sykes’s monkeys (Cercopithecus albogularis) (Brown and Cannito, 1995), the authors suggested that acoustic variation between sound emissions was principally due to different underlying laryngeal modes of vocalization. In a very recent study, EGG was applied as the central method of data acquisition with an operant conditioning approach (Koda et al., in preparation), studying an adult female Japanese macaque (Macaca fuscata) (Herbst et al., 2016, in preparation). That latter work was complemented with in vitro data from an excised Japanese macaque larynx. It provides a first SPL-calibrated documentation of three of the animal’s call types (coo, growl, and chirp), showing that the Japanese macaque voice range is comparable to that of humans 7–10 years old. EGG evidence suggested that growls, coos, and chirps were produced by distinct laryngeal vibratory mechanisms, analogous to what is known from human vocal registers (recall Figure 2). EGG data also revealed that the investigated Japanese macaque most likely varied the degree of vocal fold adduction in vivo, resulting in variations of the spectral characteristics within the emitted coo calls, ranging from ‘breathy’ (sound containing broadband noise components) to ‘non-breathy.’ This is again analogous to what is found in humans (recall Table 1), further corroborating the notion that humans and non-human primates share a universal physical and physiological sound-production mechanism (Herbst et al., in preparation).

Here, we present first qualitative insights into another recent in vivo EGG study on non-human primates, conducted at La Senda Verde Wildlife Sanctuary in Bolivia (further publications are forthcoming, e.g. Herbst and Dunn, 2018).

Spontaneous vocalizations of 12 semidomesticated New World monkeys, stemming from six different species, were simultaneously documented with acoustic and EGG recordings. The two data acquisition strategies (recording either spontaneous voluntary vocalizations or vocalizations during temporary immobilization by veterinary staff) are illustrated in Figure 3.

Drawing from the large corpus of data we collected, an example of the causal dependency between vocal fold vibration and sound generation in a white-fronted capuchin (Cebus albifrons) is illustrated in Figure 4. The acoustic excitation event per glottal cycle, indicated by the negative peak in the acoustic signal, occurs around the incident of vocal fold contacting (marked with vertical red lines in Figure 4). This phenomenon is also typically seen in humans (Fant, 1979).
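Aligning the two signals for such a comparison requires compensating for the acoustic travel time from the larynx to the microphone; the caption of Figure 4 reports a shift of 1.06 ms for a mouth-to-microphone distance of 30 cm and an estimated vocal tract length of 6 cm. The arithmetic can be sketched as below. The speed of sound is not stated in the source; 340 m/s is an assumption, chosen here because it reproduces the reported value, and the function names are hypothetical.

```python
def acoustic_delay_ms(mouth_to_mic_m, vocal_tract_length_m, speed_of_sound_m_s=340.0):
    """Travel time of sound from the larynx to the microphone, in milliseconds."""
    path_m = mouth_to_mic_m + vocal_tract_length_m
    return 1000.0 * path_m / speed_of_sound_m_s

def delay_in_samples(delay_ms, fs_hz):
    """Number of samples by which the acoustic signal lags the EGG signal."""
    return int(round(delay_ms * fs_hz / 1000.0))

# Distances from the Figure 4 caption: 30 cm to the microphone, 6 cm vocal tract.
delay_ms = acoustic_delay_ms(0.30, 0.06)
```

With these inputs the delay rounds to 1.06 ms. Shifting the acoustic signal earlier by the corresponding number of samples aligns the excitation events with the EGG landmarks.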

The superior potential of EGG, as compared to acoustic recordings, for understanding laryngeal sound generation is exemplified in Figure 5. A vocalization produced by a 3-year-old spider monkey (Ateles chamek) was documented with both acoustic and EGG recordings. When considering the acoustic signal alone, the acoustic waveform and the acoustic spectrogram of the highlighted segments would suggest regular (t = 154–172 ms) and irregular or chaotic (t = 355–374 ms) vocal fold vibration, respectively. However, when looking at the EGG signal it becomes evident that both segments are actually subharmonic (period-doubling) in nature. The EGG evidence reflects the physiological ‘ground truth’ at the laryngeal level, while the acoustic signal is likely polluted by artefactual background noise.
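One simple way to look for period doubling in a signal segment is to compare spectral energy at half the nominal fo with the energy at fo itself. The sketch below is an illustrative assumption, not the analysis used for Figure 5; the function name, the Hann window, and the synthetic test signal are all hypothetical choices.

```python
import numpy as np

def subharmonic_ratio(signal, fs_hz, fo_hz):
    """Spectral magnitude at fo/2 relative to the magnitude at fo.

    fo_hz is the nominal fundamental frequency before period doubling.
    Values well above zero hint at a subharmonic (period-doubled) regime.
    """
    x = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs_hz)

    def magnitude_near(freq_hz):
        # Magnitude of the FFT bin closest to the requested frequency.
        return spectrum[int(np.argmin(np.abs(freqs - freq_hz)))]

    return magnitude_near(fo_hz / 2.0) / magnitude_near(fo_hz)

# Synthetic example: 400 Hz oscillation with and without a 200 Hz component.
fs = 8000
t = np.arange(800) / fs
regular = np.sin(2 * np.pi * 400 * t)
doubled = regular + 0.3 * np.sin(2 * np.pi * 200 * t)
ratio_regular = subharmonic_ratio(regular, fs, 400.0)
ratio_doubled = subharmonic_ratio(doubled, fs, 400.0)
```

Applied to the EGG signal, a check of this kind is unaffected by ambient noise in the acoustic recording, which is the advantage the text describes.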

Figure 3. Illustration of in vivo EGG data acquisition in New World monkeys at La Senda Verde wildlife sanctuary, Bolivia. Two investigative paradigms were employed. (A) During spontaneous voluntary interaction between the monkeys and a local animal keeper, EGG electrodes were applied manually (left panel). The communicating animal keeper was seated in front of a glass door, behind which the recording equipment was situated (middle panel). The microphone was attached to the outer frame of the glass door, at a known distance to the monkey. (B) In cases where voluntary interaction was not possible, the monkeys were gently immobilized for a short duration (maximum 5 min) by two animal keepers, and a third animal keeper applied the EGG electrodes manually (right panel). SPL-calibrated measurements were possible through a known mouth-to-microphone distance (microphone attached to the edge of the brown table in the right panel). In most situations, the animals were highly vocal during the short period of immobilization.

Figure 4. Typical EGG waveform (bottom: three glottal cycles displayed) and corresponding acoustic signal (top graph) recorded from a white-fronted capuchin, time-shifted by 1.06 ms to compensate for the acoustic delay of the microphone signal, as introduced by a mouth-to-microphone distance of 30 cm and an estimated vocal tract length of 6 cm. Two events of acoustic excitation (the negative minima in the acoustic signal) are indicated with vertical red lines, corresponding to the incidents of maximum increase of vocal fold contact within the EGG signal.

Figure 5. Different interpretation of laryngeal dynamics based on acoustic and EGG signal. (A, B) Simultaneously acquired acoustic and EGG waveforms of female spider monkey vocalization; (C, D) narrow-band spectrograms of acoustic and EGG signals. The highlighted sequences (154–172 and 355–374 ms, respectively) are illustrated in more detail in panels E–H; (E, F) detailed acoustic and EGG waveforms of segment extracted at 154–172 ms; (G, H) detailed acoustic and EGG waveforms of segment extracted at 355–374 ms.

Summary

In this text we have given an overview of EGG, describing the method and its current application in humans and non-human mammals. We have documented three investigative paradigms for EGG data acquisition in non-human primates: operant conditioning, voluntary communication, and short periods of immobilization. We have demonstrated that, in comparison to acoustic recordings, EGG can be very useful for gaining deeper insights into the vocal production mechanism in non-human primates. Indeed, in some cases, EGG allows interpretations that would not be possible from analysis of acoustic recordings alone, including (the likely common) cases when acoustic data are contaminated with noise. The method thus enables a better understanding of the entire sound-production ‘loop,’ from emotional impetus/communicative situation, muscular action, physical sound production, to sound output.

We argue that EGG is a powerful new minimally invasive tool that can provide important insight into the ‘black box’ that is vocal production. A better understanding of vocal fold vibration across a range of taxa can lead to better comprehension of several important elements of speech evolution, such as the universality of vocal production mechanisms, the independence of source and filter, the evolution of vocal control, and the relevance of non-linear phenomena. Future studies should apply EGG in vivo and/or in vitro across a range of species in order to improve our knowledge about vocal production in non-humans. Such comparative studies are likely to provide important insight into the evolution of human speech.

Acknowledgments

This publication has been partially supported by an ‘APART’ grant, awarded to C.T.H. by the Austrian Academy of Sciences, and supported by the Research Units for Exploring Future Horizons of Kyoto University.

References

Alipour F. and Jaiswal S. (2008) Phonatory characteristics of excised pig, sheep, and cow larynges. Journal of the Acoustical Society of America, 123: 4572–4581.

Alipour F. and Jaiswal S. (2009) Glottal airflow resistance in excised pig, sheep, and cow larynges. Journal of Voice: Official Journal of the Voice Foundation, 23: 40–50.

Alipour F., Scherer R., and Patel V. (1996) An experimental study of pulsatile flow in canine larynges. National Center for Voice and Speech Status and Progress Report, 9: 47–52.

Baken R.J. and Orlikoff R.F. (2000) Clinical Measurement of Speech and Voice, 2nd edn. Cengage Learning, San Diego, CA.

Berke G., Moore D., Hanson D., Hantke D., Gerratt B., and Burstein F. (1987) Laryngeal modeling: theoretical, in vitro, in vivo. The Laryngoscope, 97: 871–881.

Berke G., Moore D.M., Gerratt B.R., Hanson D.G., and Natividad M. (1989) Effect of superior laryngeal nerve stimulation on phonation in an in vivo canine model. American Journal of Otolaryngology, 10: 181–187.

Bless D.M., Patel R., and Connor N. (2009) Laryngeal imaging: stroboscopy, high-speed digital imaging, and kymography. In: Fried M.P. and Ferlito A. (eds.), The Larynx, Vol. I. Plural Publishing, San Diego. pp. 181–210.

Brown C.H. and Cannito M.P. (1995) Modes of vocal variation in Sykes’s monkey (Cercopithecus albogularis) squeals. Journal of Comparative Psychology, 109: 398–415.

Charlton B.D., Swaisgood R.R., Zhihe Z., and Snyder R.J. (2012) Giant pandas attend to androgen-related variation in male bleats. Behavioral Ecology and Sociobiology, 66: 969–974.

Charlton B.D., Frey R., McKinnon A.J., Fritsch G., Fitch W.T., and Reby D. (2013) Koalas use a novel vocal organ to produce unusually low-pitched mating calls. Current Biology, 23: R1035–R1036.

Chhetri D.K., Neubauer J., and Berry D.A. (2012) Neuromuscular control of fundamental frequency and glottal posture at phonation onset. Journal of the Acoustical Society of America, 131: 1401–1412.

Darwin C. (1859) On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. John Murray, London.

Darwin C. (1871) The Descent of Man, and Selection in Relation to Sex. John Murray, London.

Deliyski D.D. and Hillman R.E. (2010) State of the art laryngeal imaging: research and clinical implications. Current Opinion in Otolaryngology & Head and Neck Surgery, 18: 147–152.

Döllinger M., Berry D.A., and Berke G.S. (2005) Medial surface dynamics of an in vivo canine vocal fold during phonation. Journal of the Acoustical Society of America, 117: 3174–3183.

Dunn J.C., Halenar L.B., Davies T.G., Cristobal-Azkarate J., Reby D., Sykes D., Dengg S., Fitch W.T., and Knapp L.A. (2015) Evolutionary trade-off between vocal tract and testes dimensions in howler monkeys. Current Biology, 25: 2839–2844.

Echternach M., Dippold S., Sundberg J., Arndt S., Zander M.F., and Richter B. (2010) High-speed imaging and electroglottography measurements of the open quotient in untrained male voices’ register transitions. Journal of Voice: Official Journal of the Voice Foundation, 24: 644–650.

Elemans C.P.H., Rasmussen J.H., Herbst C.T., Düring D.N., Zollinger S.A., Brumm H., Srivastava K., Svane N., Ding M., Larsen O.N., Sober S.J., and Švec J.G. (2015) Universal mechanisms of sound production and control in birds and mammals. Nature Communications, 6: 8978.

Fabre P. (1957) Un procédé électrique percutané d’inscription de l’accolement glottique au cours de la phonation: glottographie de haute fréquence; premiers résultats [A non-invasive electric method for measuring glottal closure during phonation: high frequency glottography]. Bulletin de l’Académie Nationale de Médecine, 141: 66–69.

Fant G. (1960) Acoustic Theory of Speech Production. Mouton and Company, Gravenhage.

Fant G. (1979) Glottal source and waveform analysis. Speech Transmission Laboratory, Quarterly Progress and Status Reports, 20: 85–107.

Fitch W.T. (1997) Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. Journal of the Acoustical Society of America, 102: 1213–1222.

Fitch W.T. (2000a) Skull dimensions in relation to body size in nonhuman mammals: the causal bases for acoustic allometry. Zoology, 103: 40–58.

Fitch W.T. (2000b) The evolution of speech: a comparative review. Trends in Cognitive Sciences, 4: 258–267.

Fitch W.T., Neubauer J., and Herzel H. (2002) Calls out of chaos: the adaptive significance of nonlinear phenomena in mammalian vocal production. Animal Behaviour, 63: 407–418.

Fitch W.T., de Boer B., Mathur N., and Ghazanfar A.A. (2016) Monkey vocal tracts are speech-ready. Science Advances, 2: e1600723.

Flanagan J. (1968) Source–system interaction in the vocal tract. Annals of the New York Academy of Sciences, 155: 9–17.

Fletcher N.H. (2005) Acoustic systems in biology: from insects to elephants. Acoustics Australia, 33: 83–88.

Fouquet M., Pisanski K., Mathevon N., and Reby D. (2016) Seven and up: individual differences in male voice fundamental frequency emerge before puberty and remain stable throughout adulthood. Royal Society Open Science, 3: 160395.

Garcia M. and Herbst C.T. (2018) Excised larynx experimentation: history, current developments, and prospects for bioacoustical research. Anthropological Science, in press.

Garcia M., Herbst C.T., Bowling D.L., Dunn J.C., and Fitch W.T. (2017) Acoustic allometry revisited: morphological determinants of fundamental frequency in primate vocal production. Scientific Reports, 7: 10450.

Ghazanfar A.A. and Hauser M.D. (1999) The neuroethology of primate vocal communication: substrates for the evolution of speech. Trends in Cognitive Sciences, 3: 377–384.

Ghazanfar A.A., Takahashi D.Y., Mathur N., and Fitch W.T. (2012) Cineradiography of monkey lip-smacking reveals putative precursors of speech dynamics. Current Biology, 22: 1176–1182.

Guzmán M., Castro C., Madrid S., Olavarria C., Leiva M., Muñoz D., Jaramillo E., and Laukkanen A.-M. (2016) Air pressure and contact quotient measures during different semioccluded postures in subjects with different voice conditions. Journal of Voice: Official Journal of the Voice Foundation, 30: 759.e1–759.e10.

Hampala V., Garcia M., Svec J.G., Scherer R.C., and Herbst C.T. (2016) Relationship between the electroglottographic signal and vocal fold contact area. Journal of Voice: Official Journal of the Voice Foundation, 30: 161–171.

Hauser M.D., Chomsky N., and Fitch W. (2002) The faculty of language: what is it, who has it, and how did it evolve? Science, 298: 1569–1579.

Henrich N. (2006) Mirroring the voice from Garcia to the present day: some insights into singing voice registers. Logopedics, Phoniatrics, Vocology, 31: 3–14.

Henrich N., D’Alessandro C., Doval B., and Castellengo M. (2004) On the use of the derivative of electroglottographic signals for characterization of nonpathological phonation. Journal of the Acoustical Society of America, 115: 1321–1332.

Henrich N., D’Alessandro C., Doval B., and Castellengo M. (2005) Glottal open quotient in singing: measurements and correlation with laryngeal mechanisms, vocal intensity, and fundamental frequency. Journal of the Acoustical Society of America, 117: 1417–1430.

Herbst C.T. (2017) A review of singing voice sub-system interactions: towards an extended physiological model of ‘support.’ Journal of Voice: Official Journal of the Voice Foundation, 31: 249.e13–249.e19.

Herbst C.T. and Ternström S. (2006) A comparison of different methods to measure the EGG contact quotient. Logopedics Phoniatrics Vocology, 31: 126–138.

Herbst C.T. and Dunn J.C. (2018) Fundamental frequency estimation of low-quality electroglottographic signals. Journal of Voice: Official Journal of the Voice Foundation (in press).

Herbst C.T., Fitch W.T., and Švec J.G. (2010) Electroglottographic wavegrams: a technique for visualizing vocal fold dynamics noninvasively. Journal of the Acoustical Society of America, 128: 3070–3078.

Herbst C.T., Qiu Q., Schutte H.K., and Švec J.G. (2011) Membranous and cartilaginous vocal fold adduction in singing. Journal of the Acoustical Society of America, 129: 2253–2262.

Herbst C.T., Stoeger A.S., Frey R., Lohscheller J., Titze I.R., Gumpenberger M., and Fitch W.T. (2012) How low can you go? Physical production mechanism of elephant infrasonic vocalizations. Science, 337: 595–599.

Herbst C.T., Lohscheller J., Švec J.G., Henrich Bernadoni N., Weissengruber G., and Fitch W.T. (2014) Glottal opening and closing events investigated by electroglottography and super-high-speed video recordings. The Journal of Experimental Biology, 217: 955–963.

Herbst C.T., Koda H., Kunieda T., Suzuki J., and Nishimura T. (2016) Electroglottographic assessment of in vivo Japanese macaque sound production. 10th International Conference on Voice Physiology and Biomechanics (ICVPB), Universidad Tecnica Federico Santa Maria, Vina del Mar, Chile, pp. 110–111.

Herbst C.T., Schutte H.K., Bowling D.L., and Švec J.G. (2017) Comparing chalk with cheese—the EGG contact quotient is only a limited surrogate of the closed quotient. Journal of Voice: Official Journal of the Voice Foundation, 31: 401–409.

Herbst C.T., Koda H., Kunieda T., Suzuki J., Garcia M., Fitch W.T., and Nishimura T. (in preparation) Acoustic and electroglottographic assessment of Japanese macaque sound production: in vivo and in vitro. Journal of Experimental Biology.

Hirano M., Ohala J., and Vennard W. (1969) The function of laryngeal muscles in regulating fundamental frequency and intensity of phonation. Journal of Speech, Language, and Hearing Research, 12: 616–628.

Howard D.M. (1995) Variation of electrolaryngographically derived closed quotient for trained and untrained adult female singers. Journal of Voice: Official Journal of the Voice Foundation, 9: 163–172.

Howard D.M. and Murphy D.T. (2007) Voice Science, Acoustics, and Recording. Plural Publishing, San Diego.

Kankare E., Laukkanen A.-M., Ilomäki I., Miettinen A., and Pylkkänen T. (2012) Electroglottographic contact quotient in different phonation types using different amplitude threshold levels. Logopedics Phoniatrics Vocology, 37: 127–132.

Koda H., Kunieda T., and Nishimura T. (in preparation) From hand to mouth: greater effort in motor preparation is required for voluntary control of vocalization than for touching in monkeys. Biology Letters.

Lã F.M.B. and Sundberg J. (2014) Contact quotient versus closed quotient: a comparative study on professional male singers. Journal of Voice: Official Journal of the Voice Foundation, 29: 148–154.

Nishimura T. (2003) Comparative morphology of the hyo-laryngeal complex in anthropoids: two steps in the evolution of the descent of the larynx. Primates, 44: 41–49.

Orlikoff R.F. (1991) Assessment of the dynamics of vocal fold contact from the electroglottogram: data from normal male subjects. Journal of Speech and Hearing Research, 34: 1066–1072.

Rasmussen J.H., Herbst C.T., and Elemans C.P.H. (in preparation) Quantifying syringeal kinematics in vitro using electroglottography. Journal of Experimental Biology.

Reby D. and McComb K. (2003) Anatomical constraints generate honesty: acoustic cues to age and weight in the roars of red deer stags. Animal Behaviour, 65: 519–530.

Reby D., McComb K., Cargnelutti B., Darwin C., Fitch W.T., and Clutton-Brock T. (2005) Red deer stags use formants as assessment cues during intrasexual agonistic interactions. Proceedings of the Royal Society B Biological Sciences, 272: 941–947.

Sanvito S. and Galimberti F. (2003) Source level of male vocalisations in the genus Mirounga: repeatability and correlates. Bioacoustics, 14: 47–59.

Sapienza C., Stathopoulos E.T., and Dromey C. (1998) Approximations of open quotient and speed quotient from glottal airflow and EGG waveforms: effects of measurement criteria and sound pressure level. Journal of Voice: Official Journal of the Voice Foundation, 12: 31–43.

Schutte H.K. and Miller D.G. (2001) Measurement of closed quotient in a female singing voice by electroglottography and videokymography. In: Schutte H.K. (ed.), Proceedings of the 5th International Conference Advances in Quantitative Laryngology, Groningen.

Seyfarth R.M. and Cheney D.L. (1986) Vocal development in vervet monkeys. Animal Behaviour, 34: 1640–1658.

Story B. (2002) An overview of the physiology, physics and modeling of the sound source for vowels. Acoustical Science and Technology, 23: 195–206.

Svec J.G. and Schutte H.K. (2012) Kymographic imaging of laryngeal vibrations. Current Opinion in Otolaryngology & Head and Neck Surgery, 20: 458–465.

Takahashi D.Y., Narayanan D.Z., and Ghazanfar A.A. (2013) Coupled oscillator dynamics of vocal turn-taking in monkeys. Current Biology, 23: 2162–2168.

Titze I.R. (1980) Comments on the myoelastic-aerodynamic theory of phonation. Journal of Speech & Hearing Research, 23: 495–510.

Titze I.R. (2008) Nonlinear source-filter coupling in phonation: theory. Journal of the Acoustical Society of America, 123: 2733–2749.

van den Berg J. (1958) Myoelastic-aerodynamic theory of voice production. Journal of Speech, Language, and Hearing Re-search, 1: 227–244.

Verdolini K., Chan R., Hess M., and Bierhals W. (1998) Correspondence of electroglottographic closed quotient to vocal fold impact stress in excised canine larynges. Journal of Voice: Official Journal of the Voice Foundation, 12: 415–423.

Wilden I., Herzel H., Peters G., and Tembrock G. (1998) Subharmonics, biphonation, and deterministic chaos in mammal vocalization. Bioacoustics: International Journal of Animal Sound and its Recording, 9: 171–196.

Wyman M.T., Mooring M.S., McCowan B., Penedo M.C., and Hart L.A. (2008) Amplitude of bison bellows reflects male quality, physical condition and motivation. Animal Behaviour, 76: 1625–1639.