9
Special Issue: Cognition & Language Volition and learning in primate vocal behaviour Asif A. Ghazanfar a, b, c, * , Diana A. Liao a , Daniel Y. Takahashi a a Princeton Neuroscience Institute, Princeton University, Princeton, NJ, U.S.A. b Department of Psychology, Princeton University, Princeton, NJ, U.S.A. c Department of Ecology & Evolutionary Biology, Princeton University, Princeton, NJ, U.S.A. article info Article history: Received 10 October 2018 Initial acceptance 19 November 2018 Final acceptance 12 December 2018 Available online 1 March 2019 MS. number: SI-18-00738 Keywords: cooperative breeding human speech marmoset primate prosociality vocal communication vocal development vocal learning volitional control Investigating nonhuman primate vocal communication is often with the intention of elucidating their similarities with human speech and thus reconstructing the evolutionary history of this important behaviour. However, putative parallels between primate and human vocal behaviours have, in some respects, remained elusive. Here, we review two lines of research in marmoset monkeys on volitional vocal control and vocal learning during development that could bridge our understanding of the rela- tionship between primate vocalizations and human speech. Regarding volitional control, we review how changes in vocal output are not solely due to changes in arousal levels and their effect on the vocal apparatus; extrinsic factors like the vocalizations from other conspecics also have an important inu- ence. With regard to vocal learning, we describe not only how infant marmoset vocalizations undergo dramatic acoustic changes during development that are not wholly explained by physical growth, but also how, as in humans, contingent vocal responses from parents inuence the rate of vocal develop- ment. We argue that the similarities in the vocal systems of marmoset monkeys and humans may be due to their shared cooperative breeding strategy and prosociality. © 2019 The Association for the Study of Animal Behaviour. Published by Elsevier Ltd. All rights reserved. Evolution typically tinkers with pre-existing phenotypes to nd workable solutions to new challenges (Jacob, 1977). Therefore, it seems quite plausible that the vocalizations of our hominid an- cestors represent the precursors to human speech. Investigating the putative evolutionary processes via the fossil record has, un- fortunately, been unproductive and sometimes misleading, pri- marily because the directly relevant phenotypesdbehaviours and soft tissues such as the brain and parts of the vocal apparatusddo not fossilize. An alternative approach to understanding the evolu- tion of human communication is via the comparative method. Extant primates have adopted a rich diversity of social and vocal behaviours, and thus investigating them provides a means for us to reconstruct our own evolutionary history (Ghazanfar & Takahashi, 2014b). Thus, a pivotal question for understanding speech evolu- tion is to what extent the vocalizations of extant primates represent the precursors to speech. Do their vocal behaviours share features with human speech behaviour that would indicate that the last common ancestor also shared these features? From a mechanistic perspective, there are certainly aspects of vocal behaviour that humans and primates share. The anatomy of voice production is broadly similar in humans and other primates (Ghazanfar & Rendall, 2008). Voice production involves a sound source, generally the larynx, powered by respiratory pressure from the lungs, and coupled to a sound lter represented by the vocal- tract airways (the oral and nasal cavities) above the larynx. These basic components of the vocal apparatus behave and interact in complex ways to generate a wide range of sounds. Indeed, the evidence suggests that both macaque monkeys and baboons are, biomechanically, capable of producing most human speech sounds (Boe et al., 2017; Fitch, de Boer, Mathur, & Ghazanfar, 2016). In terms of dynamics, movements of the mouth can pattern vocal acoustics (affecting their amplitude and/or frequency content) and provide visual cues that enhance the perception of vocal signals. Across all languages studied to date, both the mouth motion and the acoustic envelope of speech typically exhibits a 3e8 Hz rhythm that is, for the most part, related to the rate of syllable production (Chandrasekaran, Trubanova, Stillittano, Caplier, & Ghazanfar, 2009; Greenberg, Carvey, Hitchcock, & Chang, 2003); this 3e8 Hz rhythm is critical to speech perception (Ghitza & Greenberg, 2009; for review, see ; Giraud & Poeppel, 2012). The orofacial movements and vocalizations of primates also exhibit this rhythmic structure * Correspondence: A.A. Ghazanfar, Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, U.S.A. E-mail address: [email protected] (A. A. Ghazanfar). Contents lists available at ScienceDirect Animal Behaviour journal homepage: www.elsevier.com/locate/anbehav https://doi.org/10.1016/j.anbehav.2019.01.021 0003-3472/© 2019 The Association for the Study of Animal Behaviour. Published by Elsevier Ltd. All rights reserved. Animal Behaviour 151 (2019) 239e247 SPECIAL ISSUE: COGNITION & LANGUAGE

Volition and learning in primate vocal behaviour · Special Issue: Cognition & Language Volition and learning in primate vocal behaviour Asif A. Ghazanfar a, b, c, *, Diana A. Liao

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Volition and learning in primate vocal behaviour · Special Issue: Cognition & Language Volition and learning in primate vocal behaviour Asif A. Ghazanfar a, b, c, *, Diana A. Liao

lable at ScienceDirect

Animal Behaviour 151 (2019) 239e247

SPECIAL ISSUE: COGNITION & LANGUAGE

Contents lists avai

Animal Behaviour

journal homepage: www.elsevier .com/locate/anbehav

Special Issue: Cognition & Language

Volition and learning in primate vocal behaviour

Asif A. Ghazanfar a, b, c, *, Diana A. Liao a, Daniel Y. Takahashi a

a Princeton Neuroscience Institute, Princeton University, Princeton, NJ, U.S.A.b Department of Psychology, Princeton University, Princeton, NJ, U.S.A.c Department of Ecology & Evolutionary Biology, Princeton University, Princeton, NJ, U.S.A.

a r t i c l e i n f o

Article history:Received 10 October 2018Initial acceptance 19 November 2018Final acceptance 12 December 2018Available online 1 March 2019MS. number: SI-18-00738

Keywords:cooperative breedinghuman speechmarmosetprimateprosocialityvocal communicationvocal developmentvocal learningvolitional control

* Correspondence: A.A. Ghazanfar, Princeton NeurUniversity, Princeton, NJ 08544, U.S.A.

E-mail address: [email protected] (A. A. Ghaza

https://doi.org/10.1016/j.anbehav.2019.01.0210003-3472/© 2019 The Association for the Study of A

Investigating nonhuman primate vocal communication is often with the intention of elucidating theirsimilarities with human speech and thus reconstructing the evolutionary history of this importantbehaviour. However, putative parallels between primate and human vocal behaviours have, in somerespects, remained elusive. Here, we review two lines of research in marmoset monkeys on volitionalvocal control and vocal learning during development that could bridge our understanding of the rela-tionship between primate vocalizations and human speech. Regarding volitional control, we review howchanges in vocal output are not solely due to changes in arousal levels and their effect on the vocalapparatus; extrinsic factors like the vocalizations from other conspecifics also have an important influ-ence. With regard to vocal learning, we describe not only how infant marmoset vocalizations undergodramatic acoustic changes during development that are not wholly explained by physical growth, butalso how, as in humans, contingent vocal responses from parents influence the rate of vocal develop-ment. We argue that the similarities in the vocal systems of marmoset monkeys and humans may be dueto their shared cooperative breeding strategy and prosociality.© 2019 The Association for the Study of Animal Behaviour. Published by Elsevier Ltd. All rights reserved.

Evolution typically tinkers with pre-existing phenotypes to findworkable solutions to new challenges (Jacob, 1977). Therefore, itseems quite plausible that the vocalizations of our hominid an-cestors represent the precursors to human speech. Investigatingthe putative evolutionary processes via the fossil record has, un-fortunately, been unproductive and sometimes misleading, pri-marily because the directly relevant phenotypesdbehaviours andsoft tissues such as the brain and parts of the vocal apparatusddonot fossilize. An alternative approach to understanding the evolu-tion of human communication is via the comparative method.Extant primates have adopted a rich diversity of social and vocalbehaviours, and thus investigating them provides a means for us toreconstruct our own evolutionary history (Ghazanfar & Takahashi,2014b). Thus, a pivotal question for understanding speech evolu-tion is towhat extent the vocalizations of extant primates representthe precursors to speech. Do their vocal behaviours share featureswith human speech behaviour that would indicate that the lastcommon ancestor also shared these features?

oscience Institute, Princeton

nfar).

nimal Behaviour. Published by Els

From a mechanistic perspective, there are certainly aspects ofvocal behaviour that humans and primates share. The anatomy ofvoice production is broadly similar in humans and other primates(Ghazanfar & Rendall, 2008). Voice production involves a soundsource, generally the larynx, powered by respiratory pressure fromthe lungs, and coupled to a sound filter represented by the vocal-tract airways (the oral and nasal cavities) above the larynx. Thesebasic components of the vocal apparatus behave and interact incomplex ways to generate a wide range of sounds. Indeed, theevidence suggests that both macaque monkeys and baboons are,biomechanically, capable of producing most human speech sounds(Bo€e et al., 2017; Fitch, de Boer, Mathur, & Ghazanfar, 2016). Interms of dynamics, movements of the mouth can pattern vocalacoustics (affecting their amplitude and/or frequency content) andprovide visual cues that enhance the perception of vocal signals.Across all languages studied to date, both the mouth motion andthe acoustic envelope of speech typically exhibits a 3e8 Hz rhythmthat is, for the most part, related to the rate of syllable production(Chandrasekaran, Trubanova, Stillittano, Caplier, & Ghazanfar,2009; Greenberg, Carvey, Hitchcock, & Chang, 2003); this 3e8 Hzrhythm is critical to speech perception (Ghitza & Greenberg, 2009;for review, see ; Giraud & Poeppel, 2012). The orofacial movementsand vocalizations of primates also exhibit this rhythmic structure

evier Ltd. All rights reserved.

Page 2: Volition and learning in primate vocal behaviour · Special Issue: Cognition & Language Volition and learning in primate vocal behaviour Asif A. Ghazanfar a, b, c, *, Diana A. Liao

A. A. Ghazanfar et al. / Animal Behaviour 151 (2019) 239e247240

SPECIAL ISSUE: COGNITION & LANGUAGE

(Bergman, 2013; Ghazanfar, Takahashi, Mathur, & Fitch, 2012;Lameira et al., 2015; Pomberger, Risueno-Segovia, L€oschner, &Hage, 2018; Terleph, Malaivijitnond, & Reichard, 2018; Toyoda,Maruhashi, Malaivijitnond, & Koda, 2017; for reviews, seeGhazanfar & Takahashi, 2014a, 2014b). In macaques, this rhythmicpatterning of visible orofacial movements influences perception(Ghazanfar, Morrill, & Kayser, 2013) and shows a similar develop-mental trajectory with that of humans (Morrill, Paukner, Ferrari, &Ghazanfar, 2012).

In spite of these and other putative homologies at the mecha-nistic level of vocal production and perception (Ghazanfar &Takahashi, 2014b; Lameira, Maddieson, & Zuberbühler, 2014),there are two major features at the ‘cognitive’ level of humanspeech that have long been thought absent in primate vocalcommunication: volitional control and learning during develop-ment. With regard to the volitional control of vocal production, ithas been believed that primate vocalizations are driven solely byinternal states and thus, have little or no relationship with humanspeech (which is assumed to be purely voluntary) (Jürgens, 2009;Owren, Amoss, & Rendall, 2011; Tomasello, 2008). While mon-keys can be trained to vocally respond to arbitrary visual or audi-tory cues (Coud�e et al., 2011; Gemba, Kyuhou, Matsuzaki, & Amino,1999; Hage, Gavrilov, & Nieder, 2013), it is difficult to demonstratevolition under natural or seminatural conditions. With regard tovocal learning, for over half a century now, primate vocalizationsare thought to undergo little or no experience-dependent acousticchanges during development (Egnor & Hauser, 2004). Since theyseem so fundamental to our human communication system, thesetwo apparent paradoxes must be dealt with if we want to under-stand the origins of human speech. In this review, we will use thevocal behaviour of marmoset monkeys (Callithrix jacchus)da NewWorld species and thus an outgroup relative to the Old World,human lineagedto investigate volitional control of vocal outputand vocal learning.

WHY MARMOSET MONKEYS?

The common marmoset monkey is a small-bodied species(300e400 g, on average) that is native to northeastern Brazil. Theylive in social groups of ~9e15 individuals, many of whom arerelated to each other. Marmosets and other closely related speciesin the callitrichid family are quite flexible in their vocal output,especially when compared to Old World primates like macaquesand apes. They readily adjust, without any training, the timing oftheir vocal production to the timing of conspecific calls (Ghazanfaret al., 2001, 2002; Liao, Zhang, Cai, & Ghazanfar, 2018; Takahashi,Narayanan, & Ghazanfar, 2013) and to avoid intermittent back-ground noise (Egnor, Wickelgren, & Hauser, 2007; Roy, Miller,Gottsch, & Wang, 2011). Marmosets also take turns when theyvocalize, exhibiting contingent and repeated exchanges of vocali-zations between any two individualsdrelated or unrelateddforextended periods. That is, their behaviour is not simply a call-and-response behaviour among mates or competitors (Takahashi et al.,2013). This turn-taking behaviour by marmosets has the sameuniversal features and coupled oscillator properties as humanconversational turn taking, albeit on a different timescale(Ghazanfar & Takahashi, 2014b; Levinson, 2016; Takahashi et al.,2013).

Another noteworthy phenotype of marmoset monkeys is thatthey are cooperative breeders and typically produce dizygotic twins(Harris et al., 2014). Both parents, as well as older siblings andnonkin, help care for offspring by carrying them and sharing food.This is not common among primates, as only humans andmembersof the callitrichid family exhibit this reproductive strategy. In termsof comparative developmental studies among human and

primates, this suggests that marmosets may be a more compellinganalogous species than the phylogenetically closer, but sociallydissimilar, Old World apes and monkeys (Elowson, Snowdon, &Lazaro-Perea, 1998). The cooperative breeding behaviour byhumans and marmosets lead to prosocial cognitive processes(Burkart et al., 2009, 2014; Snowdon & Cronin, 2007), includingthose related to vocal communication (Borjon & Ghazanfar, 2014).This provides a plausible explanation for their human-like vocalturn-taking behaviours (Takahashi et al., 2013; see Ghazanfar &Takahashi, 2014b, for review) and for their cooperative adjust-ment of vocal amplitude in accord with social distance from con-specifics (Choi, Takahashi, & Ghazanfar, 2015; i.e. they raise theirvoices when they hear that conspecifics are far away, as humans do:Pelegrín-García, Smits, Brunskog, & Jeong, 2011).

Given these parallels with human vocal behaviour and life his-tory strategy, perhaps marmoset monkeys could provide insightsinto volitional versus arousal-based vocal production and vocallearning.

VOLITIONAL VERSUS AROUSAL-BASED VOCALIZATIONS

One prevailing view is that primate vocalizations are inextri-cably linked to internal states like arousal and thus cannot repre-sent precursors to human speech as such vocalizations are notvolitional (Jürgens, 2009; Owren et al., 2011; Tomasello, 2008). Forexample, the presence of different predators (e.g. eagles andleopards) triggers the production of different alarm calls by vervetmonkeys, Chlorocebus pygerythrus, suggesting that external cuesdetermine which vocalizations are produced (Seyfarth, Cheney, &Marler, 1980). However, vocalizations that are acoustically similarto vervet alarm calls may be elicited by events unrelated to thepresence of predators (e.g. aggressive interactions with conspe-cifics). These contextsdpresence of a predator and territorialaggressiondare highly arousing, with the participating animalsvisibly alert and animated. Thus it was argued that vocal productionof these calls are linked to the arousal state of the animal asopposed to voluntary control (Price et al., 2015). Along similar lines,the severity of conspecific aggression modulates the acoustic fea-tures of victim screams in chimpanzees, Pan troglodytes (Slocombe& Zuberbühler, 2007). But within specific categories of aggression(e.g. direct physical contact), victims produce longer and louderscreams when socially high-ranking individuals are present nearthe conflict (Slocombe& Zuberbühler, 2007). This audience effect isproposed to recruit aid from other conspecifics, indicating a degreeof social intelligence in perceiving hierarchical relationships.However, the presence of higher-ranked individuals might alsoaffect the arousal level of the chimpanzee at the time of the conflict.

Is this complexity really absent from human speech? Not really.Like primate vocalizations, speech production is linked to physio-logically measured arousal states (Linden, 1987; Lynch et al., 1980;MacPherson, Abur, & Stepp, 2017), and the acoustic structure ofspeech is also influenced by arousal levels (Banse & Scherer, 1996).For example, during speech produced under increased cognitiveload, humans show heightened autonomic arousal that correlateswith changes in voice quality (MacPherson et al., 2017). Thus, a keyevolutionary question is not really about whether vocal productionin any species is purely volitional or purely driven by internal states,but the extent to which those two factors interact with each otherduring communication across different contexts.

It is challenging to figure out how internal states like arousalrelate to the production of primate vocalizations without a directmeasure of such physiological states in controlled contexts (Fischer& Price, 2017). One must simultaneously measure arousal states(e.g. heart rate, pupil dilation, skin conductance) while a primate isproducing different types of vocalizations. This is difficult to do

Page 3: Volition and learning in primate vocal behaviour · Special Issue: Cognition & Language Volition and learning in primate vocal behaviour Asif A. Ghazanfar a, b, c, *, Diana A. Liao

A. A. Ghazanfar et al. / Animal Behaviour 151 (2019) 239e247 241

SPECIAL ISSUE: COGNITION & LANGUAGE

under field conditions. In the wild, it is also difficult to account forthe many variables that could be affecting an individual's arousallevel at any moment in time. By contrast, in captivity, where directphysiological measures are easier to acquire, it is difficult to elicitdifferent types of vocalizations because the number of contextualcues that trigger vocalizations is more limited relative to morenaturalistic conditions (Marler, 1965; Seyfarth & Cheney, 2017). Tocircumvent these problems, a recent study used noninvasive sur-face electromyography to measure heart rates in marmoset mon-keys while they produced different vocalizations; differentvocalizations (‘phees’, ‘trill-phees’ and ‘trills’) were elicited bysystematically manipulating social context (Liao et al., 2018). Thecontext manipulation was inspired by the finding that wild pygmymarmosets, Cebuella pygmaea, will switch among different affili-ative call types according to their physical distance from their socialgroup (de la Torre & Snowdon, 2002). In brief, four different con-texts were tested as follows: (1) Alone: a single marmoset was byitself in a corner of an experiment room; (2) Occluded-Far: twomarmosets were placed in opposing corners with an acousticallytransparent, visually opaque curtain blocking sight; (3) Visible-Far:two marmosets were placed in opposing corners without an opa-que curtain, allowing visual contact; and (4) Visible-Close: twomarmosets were placed in the same corner with visual, but nottactile, contact (Liao et al., 2018) (Fig. 1a). These context changesrepresent changes in ‘social distance’, referring to physical distancefrom a partner, and whether the partner could be seen or not, inaccord with these four experimental conditions. One marmoset of

(a)

(c)

Visible FaenolA

Curtain

raFdedulccO

0.1 0.2 0.3

5

10

15

Freq

uen

cy (

kHz)

0.2 0.6 1

5

10

15

0.5 1 1.5 2

5

10

15

Time (s)Alone OccFar VisFar

Context

0

0.2

0.4

0.6

0.8

1

Prop

orti

on

Phee

Trill-phee

Trill

Figure 1. Intrinsic and extrinsic factors influence vocal output in adult marmoset monkeyorder of decreasing social distance: Alone e a marmoset was alone in one corner; OccludedVisible Far e two marmosets were at opposing corners without a curtain; Visible Close e twpairs of surface electrodes were applied to the dorsal and ventral thorax of the marmoset. (c)contact calls. All exemplars are from the same marmoset. The right panel shows the change inerror of the mean (SEM). (d) Heart rate differences by call type are separated into their resptypes binned by call response latency from an individual marmoset monkey. Bins are 1 s e

the pair was selected to noninvasively record electromyography(EMG) signals from; pairs of AgeAgCl surface electrodes were sewnonto a soft elastic band that was secured around the marmoset'sthorax (Fig. 1b).

Under these conditions, decreases in social distance reliablyelicited changes in the acoustic parameters of marmoset monkeyvocalizations that, ultimately, translated into producing differentproportions of three affiliative contact call types (Liao et al., 2018)(Fig. 1c). Measures of heart rate, a proxy for arousal, as a function ofsocial distance revealed that arousal levels generally decreasedwith decreasing social distance. Thus, on average, arousal levelswere correlated with changes in different acoustic features and theproduction of different vocalizations. These findings are consistentwith many vocalization studies of primates that infer arousal levelchanges based on an animal's behaviour as a function of the pres-ence and/or distance from a conspecific or predator. Vocalizationsget longer and louder and are produced at a higher rate withincreasing arousal (Fichtel & Hammerschmidt, 2002; Fischer,Hammerschmidt, & Todt, 1995; Meise, Keller, Cowlishaw, &Fischer, 2011; Scherer, Johnstone, & Klasmeyer, 2003; Slocombe &Zuberbühler, 2007; Yamaguchi, Izumi, & Nakamura, 2010). How-ever, at social distances that could elicit all three call types (phees,trill-phees and trills), their production was not reliably associatedwith different arousal levels (Fig. 1d) (Liao et al., 2018). This sug-gested that extrinsic factors also play a role in driving the produc-tion of vocalizations.

(b)

(d)

(e)

r Visible Close

VisClose

PheeTrillPheeTrill

Phee Trill-phee Trill

OccFar

Phee Trill-phee Trill

VisFar

Phee Trill-phee Trill

VisClose

Call type

5.6

5.8

6

6.2

6.4

6.6

Hea

rt r

ate

(Hz)

5.6

5.8

6

6.2

6.4

6.6

5.6

5.8

6

6.2

6.4

6.6

OccFar VisFar

12

0.2

0.6

1

Prop

orti

on

Prop

orti

on

21840 0 4 8

0.2

0.6

1

Response interval

s. (a) Schematic of room and marmoset placement in four different social contexts. InFar e two marmosets were at opposing corners with an opaque curtain between them;o marmosets were in the same corner. (b) Depiction of electromyography set-up whereThe three panels on the left are exemplar spectrograms for the phee, trill-phee and trillproportion of these three contact call types by context. Error bars capture the standardective contexts. Error bars capture the SEM. (e) Change in proportion of the three callach.

Page 4: Volition and learning in primate vocal behaviour · Special Issue: Cognition & Language Volition and learning in primate vocal behaviour Asif A. Ghazanfar a, b, c, *, Diana A. Liao

A. A. Ghazanfar et al. / Animal Behaviour 151 (2019) 239e247242

SPECIAL ISSUE: COGNITION & LANGUAGE

Among the many external factors that may be influencing pri-mate vocal production is the behaviour of conspecifics. Liao et al.(2018) first investigated how changes in social distance affectedthe timing of vocalizations within an individual and across pairs. Incontexts where marmosets were interacting with anotherconspecific, vocal responses to a conspecific's vocalizations werevariable in their timing and there were positive correlations be-tween duration, frequency and amplitude with increasing responselatency: the slower the response, the longer, higher-pitched, moretonal and louder the vocalization. These acoustic characteristicstranslated into different call types: when responses were quickdless than a few secondsdthe call types produced were more likelyto be trills or trill-phees; when more time passed, the call type wasmore likely to be a phee call (Fig. 1e). Tying it all together arefluctuating levels of arousal. Short latency vocal responses (e.g.trills) were more likely in a low arousal state, while long latencyvocal responses (e.g. phees) weremore likely in a high arousal state.Thus, the timing of a vocal response relative to a conspecific's signaland the state of arousal both influence the type of vocalization thatis produced (Liao et al., 2018). We suspect the same is true for otherprimates where their vocal behaviour was originally interpreted aseither purely arousal-based (triggered by intrinsic signals) or voli-tional (e.g. triggered by external signals).

As the role of arousal is a means of allocating metabolic energy(i.e. preparing the body for action (Pfaff, 2006)), variability in vocaloutput as function of contextdfrom changes in acoustic features tothe production of different call typesdmight result from theadaptive interplay between multiple behavioural drives thatimpact fitness. Vocal communication facilitates social coordination;one advantage of vocal communication is the ability to broadcastsignals over long distances. In visually challenging habitats, reliablereciprocal exchanges of calls between conspecifics allows for themaintenance of social bonds (Kulahci, Rubenstein, & Ghazanfar,2015). However, vocal production, particularly the production ofloud, long and tonal contact calls like marmoset phee calls, isenergetically demanding (Borjon, Takahashi, Cervantes, &Ghazanfar, 2016; Teramoto, Takahashi, Holmes, & Ghazanfar,2017), eliciting high metabolic costs (Gillooly & Ophir, 2010). Vi-sual contact could supplement vocal fidelity, allowing energy sav-ings through the production of quieter, shorter and noisiervocalizations (i.e. trills). This would allow energy to be allocated forother essential behaviours such as foraging, infant care and mating.Another type of cost-savings related to different call types ispredator evasion. For example, pygmy marmoset calls used forshort-distance communication (i.e. trills) are highly distortedbeyond 10 m from the vocalizer while long-distance calls can bedetected with some fidelity at 40 m (de la Torre & Snowdon, 2002).This suggests that in addition to energetic costs, there are alsodefensive costs. The probability of being detected and located bypredators increases with the production of longer and louder callsby monkeys. Thus, marmoset monkeys adapt to dynamic envi-ronments by balancing the constraints of broadcasting social in-formation to group members with other adaptive behaviours (e.g.finding food, territorial defence via scent marking, predatoravoidance, and more). It is likely that other primates negotiatethrough similar trade-offs.

While the evidence (at least from marmoset monkeys) suggeststhat both internal states and external signals influence vocal pro-duction, the possibility remains that this still does not indicatevolitional control of vocal production. That is, external cues like thetiming of a conspecific's vocalization (as above) may in fact betriggers that elicit involuntary vocal responses in listeners. Datafrom vocal exchange experiments suggest that this is unlikely to bethe case. For example, marmoset monkeys will adjust the ampli-tude of their contact calls as a function of perceived distance from a

conspecific (Choi et al., 2015). If the conspecific's contact call is loud(indicating close proximity), then the vocal replies are considerablylower in intensity (and vice versa). Marmoset vocalizations alsocontain in their acoustic structure indexical cues that reflect iden-tity and gender (Miller, Mandel, &Wang, 2010; Norcross, Newman,& Cofrancesco, 1999), and these identity cues used during vocalexchanges affect both the number of calls and their timing (Miller&Thomas, 2012). These vocal exchange data suggest that marmosetmonkeys are selective in who they respond to and how theyrespond, indicating a degree of voluntary control as opposed to areflexive, trigger-like response to conspecific vocalizations.

However, it could be argued that the best evidence for volitionalcontrol would consist of training a marmoset monkey to vocallyrespond to a completely arbitrary cue. For example, a number ofstudies demonstrate that macaque monkeys can be conditioned tovocalize in response to experimenter cues, suggesting that theyhave some control over what to vocalize towards andwhen to do so(Aitken, 1981; Coud�e et al., 2011; Hage et al., 2013; Hihara, Yamada,Iriki, & Okanoya, 2003; Sutton, Larson, Taylor, & Lindeman, 1973).To date, no similar study that trained marmosets to vocalize on cuehas been published.

VOCAL LEARNING

The vocal behaviours of primates and other animals reflect theinterplay between their arousal states, sensorimotor coordination,biomechanical conditions and social contexts. To capture the shapeof the developmental trajectory of any vocal behaviour, one mustsample early and densely (Adolph, Robinson, Young, & Gill-Alvarez,2008). This is especially true for marmoset monkeys that develop12 times faster than humans (de Castro Le~ao, Duarte D�oria Neto, &Bernardete Cordeiro de Sousa, 2009; Schultz-Darken, Braun, &Emborg, 2016). Contrary to what is typical for other nonhumanprimates (Egnor & Hauser, 2004), marmoset infants exhibit a dra-matic change in vocal production in the first few months of post-natal life (Takahashi et al., 2015). At postnatal day 1 (P1),vocalizations are more numerous and more variable in their spec-trotemporal structure than in later weeks (Fig. 2a). The number andvariability of calls diminish over 2 months, approaching maturevocal output with exclusive production of the context-appropriatecontact (‘phee’) calls (Takahashi et al., 2015). Prior reports in othermonkey species showed muchmore subtle developmental changesin vocal production and were typically attributed to the passiveconsequences of growth in vocalization-related structures (Egnor&Hauser, 2004). For example, as the vocal folds get bigger and/or thevocal tract gets longer, vocalizations will be produced with acous-tics in a lower frequency range (Ghazanfar & Rendall, 2008).However, the not-so-subtle changes in infant marmoset vocaliza-tions could not be attributed solely to growth alone (Takahashiet al., 2015).

A subset of infant marmoset calls serve as scaffolding formature, adult-like calls (Takahashi et al., 2015). Infant marmosetsproduce a mixture of mature and immature vocalizations: adult-like calls (‘twitters’, ‘trills’ and ‘phees’) and immature versions ofthe contact phee call (‘cries’, ‘subharmonic phees’, ‘phee-cries’). By2 months of age, however, they only produce the mature-soundingcontact calls when out of sight of conspecifics. This suggests thattwo different vocal learning processes may be at work: change inusage (Elowson et al., 1998; Seyfarth & Cheney, 1986) and trans-formation of immature calls into mature versions (Takahashi et al.,2015) (Fig. 2b). Twitters and trills are produced frequently bymarmosets of all ages (Bezerra & Souto, 2008; Pistorio, Vintch, &Wang, 2006; Takahashi et al., 2015), but in adults they are typi-cally produced only when in visual contact with conspecifics (Liaoet al., 2018). Thus, twitters and trills undergo change in usage in the

Page 5: Volition and learning in primate vocal behaviour · Special Issue: Cognition & Language Volition and learning in primate vocal behaviour Asif A. Ghazanfar a, b, c, *, Diana A. Liao

0302010

0

20

Time (s)

Freq

uen

cy (

kHz)

(a) P1

P13

P36

P60

0

20

0

20

0

20

(b)

0 0.8

0

10

200 0.4

0

10

20

0 0.8

0 0.8

0 0.8

Freq

uen

cy (

kHz)

Freq

uen

cy (

kHz)

Time (s)

Time (s)

Time (s)Time (s)

0

10

20

Time (s)Different context

Twitter Trill

Cry

Phee-cry

Phee

0 0.8Time (s)

Subharmonic0

10

20

0

10

20

0 0.40

10

20

0 0.8

Freq

uen

cy (

kHz)

Time (s)Time (s)

Change in usage Change in form

(d)

–28

–24

–20

–16

Wie

ner

en

trop

y (d

B)

5 10 15 20 25 30Postnatal day

35

High contingentLow contingent

P = 0.0022

0 0.1 0.2 0.3 0.4

r = -0.879, P = 0.002

Contingent parentalresponses

Phee

/cry

rat

ioze

ro-c

ross

ing

day

15

20

25

30

35

40

10

(c)

Figure 2. Vocal development and learning in marmoset monkeys. (a) Vocalizations over postnatal days P1eP60 from one infant. (b) Twitters and trills change usage whereas cries,phee-cries and subharmonic-phees transition to phee calls. (c) The more contingent parental vocal responses an infant hears following its own vocalizations, the faster it willpermanently transform high-entropy (immature-sounding) calls into low-entropy (mature-sounding) contact calls. The zero-crossing day is an index of vocal development,referring to the postnatal day onwhich an infant produces calls with a 50:50 phee/cry ratio. (d) Vocal production learning by infant marmoset monkeys. Twin infants received eitherhigh-contingency playbacks (100%) or low-contingency playbacks (10%). Wiener entropy (in dB), a measure of tonality, changes faster over postnatal days for high- versus low-contingency infants.

A. A. Ghazanfar et al. / Animal Behaviour 151 (2019) 239e247 243

SPECIAL ISSUE: COGNITION & LANGUAGE

first 2 months; infants stop producing them when out of visualcontact with a conspecific. By contrast, cries, phee-cries andsubharmonic-phees are only produced by infants (Takahashi et al.,2015). Because these infant-only calls share some features with the

mature contact phee call (e.g. a common duration that is distinctfrom trills and twitter syllables), they represent immature phees,consistent with vocal transformations observed in preverbal hu-man infants (Kent & Murray, 1982; Scheiner, Hammerschmidt,

Page 6: Volition and learning in primate vocal behaviour · Special Issue: Cognition & Language Volition and learning in primate vocal behaviour Asif A. Ghazanfar a, b, c, *, Diana A. Liao

A. A. Ghazanfar et al. / Animal Behaviour 151 (2019) 239e247244

SPECIAL ISSUE: COGNITION & LANGUAGE

Jurgens, & Zwirner, 2002) and songbirds (Tchernichovski, Mitra,Lints, & Nottebohm, 2001), but contrasting with prior reports ondeveloping primates (Egnor & Hauser, 2004). If growth alonecannot fully explain the acoustic changes that accompany thetransformation of infant marmoset contact vocalizations, what elsecan account for them?

Studies of naturalistic human infanteparent interactions(Bloom, Russell, & Wassenberg, 1987; Gros-Louis, West, & King,2014; Hsu & Fogel, 2001; Masataka, 1993) as well as experi-mental studies (Goldstein, King, & West, 2003; Goldstein &Schwade, 2008) reveal that contingent parental responses influ-ence the acoustic structure of subsequent infant vocalizations,making them sound more mature (i.e. speech-like). Along similarlines, marmoset infants increasingly produce longer andmore tonal(low entropy) calls after hearing their parents' contact calls(Takahashi, Fenley, & Ghazanfar, 2016). This suggests thatmarmoset parents could have a real-time influence on the pro-duction of infant vocalizations. To assess the effect of marmosetparenteinfant vocal interactions, vocal exchanges were recordedwhen infants and their mother or father were in auditory, but notvisual, contact (Takahashi et al., 2015). For each infant, the day onwhich the ratio of mature contact calls to immature contact callswas 50:50 was marked as the infant's transition day. Transitionswere typically sharp, but their timing varied substantially acrossinfants (~10e40 days). If parental responses to infant vocalizationsaffected the timing of this transition, then this would explain, atleast partially, its variability across infants. Parental influence couldbe via contingent responses and/or simply the number of adultvocalizations the infant has heard. Fig. 2c shows a significant cor-relation with the timing of the transition day and the number ofcontingent responses provided by parents. Proportions ofnoncontingent parental calls (91.5% of all calls on average) were notsignificantly correlated with this timing. Thus, contingent vocalresponses from parents seem to influence the vocal development ofinfant marmoset monkeys (Takahashi et al., 2015).

Taken together, these data suggest that developing marmosetmonkeysdunlike every other nonhuman primate investigated thusfardmay be vocal learners (Margoliash & Tchernichovski, 2015).However, a viable alternative hypothesis is that, instead of aninstance of vocal learning, marmoset parents are simply respond-ing more to healthier infants who develop their vocalizations morequickly than others. An experiment was designed to explicitly testwhether or not contingent vocal feedback can increase the rate atwhichmarmoset infants begin producing mature-sounding contactcalls (Takahashi, Liao,&Ghazanfar, 2017). Sincemarmoset monkeystypically give birth to dizygotic twins (Harris et al., 2014), the in-fluence of genetics and the perinatal environment on vocal devel-opment could be better controlled (Zhang & Ghazanfar, 2016)Starting on P1, infants were provided different levels of contingentfeedback using closed-loop, computer-driven playbacks of parentalcontact calls in almost daily 30 min sessions for 2 months. Onerandomly selected twin was given the best possible simulated‘parent’ who provided 100% contingent vocal feedback; the otherinfant received only 10% contingent vocal feedback. Infantmarmoset monkeys who received more contingent feedbacklearned faster than their twin (as measured by entropy; Fig. 2d).They did this not through imitation of the parental calls but ratherthrough the experience-dependent increase in the control of thevocal apparatus. Calls with high entropy are related to poormuscular control ofdand coordination betweendrespiration andvocal fold tension (Takahashi et al., 2015; Teramoto et al., 2017;Zhang & Ghazanfar, 2016). Thus, more contingent vocal feedbackresults in faster development of this respiratory and laryngealcontrol and coordination (Teramoto et al., 2017). Other acousticfeatures (e.g. dominant frequency and duration) were unaffected by

experience and this was consistent with the predictions of an in-tegrated framework for marmoset vocal development (Teramotoet al., 2017) and experimental data (Zhang & Ghazanfar, 2018)that indicate that changes in some acoustic features are, indeed,mostly explained by growth.

It is generally accepted that there are three varieties of vocallearning: comprehension, usage and production (Janik & Slater,2000). Comprehension learning is when an animal learns torespond appropriatelydvia experiencedto vocal signals. Forexample, infant vervet monkeys learn adaptive responses to alarmcalls by watching what adult conspecifics do (Seyfarth & Cheney,1986), infant macaques learn via experience to recognize theirmother's voice (Fischer, 2004), and Diana monkeys, Cercopithecusdiana, learn to respond adaptively to the alarm calls of other species(Zuberbühler, 2000). Usage learning is when an animal learns inwhich context(s) to produce a pre-existing call in its repertoire. Forinstance, infant and juvenile vervet monkeys produce adult-like‘raptor’ alarm calls but do so to the wrong birds; they eventuallylearn to associate their alarm calling to the appropriate bird species(Seyfarth & Cheney, 1986). Similarly, infant marmoset monkeysproduce some call types in contexts that are different from thoseelicited during adult call production (Elowson, Snowdon, & Sweet,1992), and learning the appropriate context is experience depen-dent (Gültekin & Hage, 2017). The third variety of vocal lear-ningdproduction learningdis the experience-dependent changein the acoustic structure of vocalizations (Janik & Slater, 2000).Increasingly, however, the literature has limited the definition ofvocal production learning to learning novel vocalizations viaimitation (e.g. Fitch, 2010). Indeed, some have limited it evenfurther to include only imitation of those vocalizations that involvechanges in laryngeal/syringeal control by the forebrain (Petkov &Jarvis, 2012). Such a narrow definition fails to capture the diversemeans by which humans and other animals learn to produce theircommunication signals (Tchernichovski & Marcus, 2014).

Infant marmoset monkeys exhibit vocal production learning viasocial reinforcement from parents. They do this not throughimitation but rather through the experience-dependent increase inthe control of the vocal apparatus that allows them to moreconsistently produce tonal (low entropy) phee calls. Early in life,infant marmosets produce immature versions of their contact pheecall. Relative to the mature contact call, these immature versionsare higher in spectral entropy, dominant frequency and amplitudemodulation frequency, and shorter in duration (Takahashi et al.,2015, 2016; Zhang & Ghazanfar, 2016). The production of theseimmature contact calls is related to a small lung capacity (Zhang &Ghazanfar, 2018) and poor muscular control ofdand coordinationbetweendrespiration and vocal fold tension (Takahashi et al., 2015;Teramoto et al., 2017; Zhang & Ghazanfar, 2016). These immaturecalls disappear later in development; they are not produced in anyother contexts. The developmental timing of this control and co-ordination of vocal apparatus elements is what is linked to expe-rience: more contingent vocal feedback results in fasterdevelopment of respiratory and laryngeal control and coordination(Teramoto et al., 2017). These data support the notion that forms ofvocal production learning extend beyond imitation (Lipkind et al.,2013; Tchernichovski & Marcus, 2014). Importantly, parental in-teractions are essential for the vocal behaviour of offspring in thelong term. Infant marmosets with limited parental contact exhibitvocal behaviours as subadults that remain immature relative tonormally reared marmosets (Gültekin & Hage, 2017).

The social reinforcement-based vocal learning by infantmarmoset monkeys is consistent with findings in experimentalstudies of early (prelinguistic) vocal development in humans(Goldstein et al., 2003; Goldstein & Schwade, 2008) and songlearning in birds (e.g. brown-headed cowbirds, Molothrus ater:

Page 7: Volition and learning in primate vocal behaviour · Special Issue: Cognition & Language Volition and learning in primate vocal behaviour Asif A. Ghazanfar a, b, c, *, Diana A. Liao

A. A. Ghazanfar et al. / Animal Behaviour 151 (2019) 239e247 245

SPECIAL ISSUE: COGNITION & LANGUAGE

West & King, 1988; zebra finches, Taeniopygia guttata: Chen,Matheson, & Sakata, 2016). In human infants, for instance, imma-ture vocalizations (e.g. cries and cooing sounds, among manyothers) gradually transform into the consistent, context-dependentproduction of speech-like babbling output (Kent & Murray, 1982;Scheiner et al., 2002). The speed of this transformation is influ-enced by contingent parental feedback (Bloom et al., 1987;Goldstein et al., 2003; Goldstein & Schwade, 2008; Gros-Louiset al. 2014; Hsu & Fogel, 2001; Masataka, 1993). Another similardevelopment pattern occurs during locomotion learning byhumans. Human toddlers alternate between crawling and walking,and only later in development will start to walk consistently with amore adult-like gait (Adolph, Vereijken, & Shrout, 2003; Adolph &Robinson, 2013). This locomotor transition is also influenced byparental social feedback (Tamis-LeMonda et al., 2008). Similarly,marmoset infants who received greater contingent vocal feedbackthan their twin began to consistently produce mature-soundingphee calls earlier in life (Takahashi et al., 2017).

COOPERATIVE BREEDING AS THE IMPORTANT COMMONFACTOR BETWEEN HUMANS AND MARMOSET MONKEYS

Why do humans and marmoset monkeys exhibit similar pat-terns of vocal development in the early postnatal period? As some40 million years have passed since the Old World and New Worldprimate lineages split (Steiper & Young, 2006), we suggest thatvocal learning arose as a by-product of the convergent evolution ofa cooperative breeding system. Cooperative breeding is only foundin about 3% of mammals (Hrdy, 2005). Of those mammals,marmoset monkeys and others in the callitrichid family are theonly nonhuman primates known to exhibit this strategy (Burkartet al., 2009; Hrdy, 2005). For marmosets, the rearing of infants isgreatly reliant on a concerted effort among the breeding female,breeding male, nonbreeding siblings and other familiar but unre-lated group members. Marmoset caregivers compete to carry in-fants (Santos, French, & Otta, 1997; Snowdon & Cronin, 2007) andfrequently provision food to offspring (Burkart & van Schaik, 2010;Yamamoto & Lopes, 2004). This cooperative breeding framework,in which nonparents within a social group spontaneously care foroffspring other than their own, drives a more general tendency tobe prosocial in other domains (Burkart et al., 2014), including vocalcommunication (Borjon & Ghazanfar, 2014).

How does this lead to vocal production learning in infant mar-mosets and humans? Care of infants is probably the most impor-tant context in which cooperation with unrelated individualsoccurs. There is a strong correlation between the amount of infantcare provided by others and the reproductive success of a mother(Ross & MacLarnon, 2000). In a noisy, busy environment wherecaregiver attention is a limited resource, and where nonmaternalcaregivers may have higher thresholds than mothers to providecare, evolution may select for a vocal learning capacity that helpsinfants produce mature-sounding (e.g. loud, long and tonal) vo-calizations more quickly and thus more reliably attract caregiverattention (Zuberbühler, 2012). A related hypothesis is that humaninfant vocalizations that sound more speech-like evolved to exploitpre-existing auditory predispositions in adult receivers (Locke,2006). The fact that parents of both human and marmoset infantsare more likely to give contingent responses to infant vocalizationswhen those vocalizations sound more adult-like (Gros-Louis, West,Goldstein, & King, 2006; Takahashi et al., 2016) is consistent withthis ‘receiver predisposition’ idea (Locke, 2006). We thereforesuggest that the vocal learning mechanism evolved to speed up theproduction of mature-sounding vocalizations (those that exploitthe receiver predispositions) using social feedback because suchvocalizations are more likely to elicit caregiver attention.

Acknowledgments

This work was supported by a National Institutes of Health grant(NIH R01NS054898, A.A.G.) and a National Science FoundationGraduate Research Fellowship (D.A.L.).

References

Adolph, K. E., & Robinson, S. R. (2013). The road to walking: What learning to walktells us about development. In P. Zelazo (Ed.), Oxford handbook of developmentalpsychology (pp. 403e443). New York, NY: Oxford University Press.

Adolph, K. E., Robinson, S. R., Young, J. W., & Gill-Alvarez, F. (2008). What is theshape of developmental change? Psychological Review, 115, 527e543.

Adolph, K. E., Vereijken, B., & Shrout, P. E. (2003). What changes in infant walkingand why. Child Development, 74, 475e497.

Aitken, P. G. (1981). Cortical control of conditioned and spontaneous vocal behaviorin rhesus monkeys. Brain and Language, 13, 171e184.

Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression.Journal of Personality and Social Psychology, 70, 614e636.

Bergman, T. J. (2013). Speech-like vocalized lip-smacking in geladas. Current Biology,23, R268eR269.

Bezerra, B. M., & Souto, A. (2008). Structure and usage of the vocal repertoire ofCallithrix jacchus. International Journal of Primatology, 29, 671e701.

Bloom, K., Russell, A., & Wassenberg, K. (1987). Turn taking affects the quality ofinfant vocalizations. Journal of Child Language, 14, 211e227.

Bo€e, L.-J., Berthommier, F., Legou, T., Captier, G., Kemp, C., Sawallis, T. R., et al. (2017).Evidence of a vocalic proto-system in the baboon (Papio papio) suggests pre-hominin speech precursors. PLoS One, 12, e0169321.

Borjon, J. I., & Ghazanfar, A. A. (2014). Convergent evolution of vocal cooperationwithout convergent evolution of brain size. Brain, Behavior and Evolution, 84,93e102.

Borjon, J. I., Takahashi, D. Y., Cervantes, D. C., & Ghazanfar, A. A. (2016). Arousaldynamics drive vocal production in marmoset monkeys. Journal of Neurophys-iology, 116, 753e764.

Burkart, J. M., Allon, O., Amici, F., Fichtel, C., Finkenwirth, C., Heschl, A., et al. (2014).The evolutionary origin of human hyper-cooperation. Nature Communications,5, 4747.

Burkart, J. M., Hrdy, S. B., & van Schaik, C. P. (2009). Cooperative breeding andhuman cognitive evolution. Evolutionary Anthropology, 18, 175e186.

Burkart, J. M., & van Schaik, C. P. (2010). Cognitive consequences of cooperativebreeding in primates? Animal Cognition, 13, 1e19.

de Castro Le~ao, A., Duarte D�oria Neto, A., & Bernardete Cordeiro de Sousa, M. (2009).New developmental stages for common marmosets (Callithrix jacchus) usingmass and age variables obtained by K-means algorithm and self-organizingmaps (SOM). Computers in Biology and Medicine, 39, 853e859.

Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A., & Ghazanfar, A. A.(2009). The natural statistics of audiovisual speech. PLoS Computational Biology,5, e1000436.

Chen, Y., Matheson, L. E., & Sakata, J. T. (2016). Mechanisms underlying the socialenhancement of vocal learning in songbirds. Proceedings of the National Acad-emy of Sciences of the United States of America, 113(24), 6641e6646.

Choi, J. Y., Takahashi, D. Y., & Ghazanfar, A. A. (2015). Cooperative vocal control inmarmoset monkeys via vocal feedback. Journal of Neurophysiology, 114,274e283.

Coud�e, G., Ferrari, P. F., Rod�a, F., Maranesi, M., Borelli, E., Veroni, V., et al. (2011).Neurons controlling voluntary vocalization in the macaque ventral premotorcortex. PLoS One, 6, e26822.

Egnor, S. E. R., & Hauser, M. D. (2004). A paradox in the evolution of primate vocallearning. Trends in Neurosciences, 27, 649e654.

Egnor, S. E. R., Wickelgren, J. G., & Hauser, M. D. (2007). Tracking silence: Adjustingvocal production to avoid acoustic interference. Journal of Comparative Physi-ology A-Neuroethology Sensory Neural and Behavioral Physiology, 193, 477e483.

Elowson, A. M., Snowdon, C. T., & Lazaro-Perea, C. (1998). Babbling' and socialcontext in infant monkeys: Parallels to human infants. Trends in Cognitive Sci-ences, 2, 31e37.

Elowson, A. M., Snowdon, C. T., & Sweet, C. J. (1992). Ontogeny of trill and J-callvocalizations in the pygmy marmoset, Cebuella pygmaea. Animal Behaviour, 43,703e715.

Fichtel, C., & Hammerschmidt, K. (2002). Responses of redfronted lemurs toexperimentally modified alarm calls: Evidence for urgency-based changes incall structure. Ethology, 108, 763e778.

Fischer, J. (2004). Emergence of individual recognition in young macaques. AnimalBehaviour, 67, 655e661.

Fischer, J., Hammerschmidt, K., & Todt, D. (1995). Factors affecting acoustic variationin Barbary-macaque (Macaca sylvanus) disturbance calls. Ethology, 101, 51e66.

Fischer, J., & Price, T. (2017). Meaning, intention, and inference in primate vocalcommunication. Neuroscience & Biobehavioral Reviews, 82, 22e31.

Fitch, W. T. (2010). The evolution of language. Cambridge, U.K.: Cambridge UniversityPress.

Fitch, W. T., de Boer, B., Mathur, N., & Ghazanfar, A. A. (2016). Monkey vocal tractsare speech-ready. Science Advances, 2, e1600723.

Page 8: Volition and learning in primate vocal behaviour · Special Issue: Cognition & Language Volition and learning in primate vocal behaviour Asif A. Ghazanfar a, b, c, *, Diana A. Liao

A. A. Ghazanfar et al. / Animal Behaviour 151 (2019) 239e247246

SPECIAL ISSUE: COGNITION & LANGUAGE

Gemba, H., Kyuhou, S., Matsuzaki, R., & Amino, Y. (1999). Cortical field potentialsassociated with audio-initiated vocalization in monkeys. Neuroscience Letters,272, 49e52.

Ghazanfar, A. A., Flombaum, J. I., Miller, C. T., & Hauser, M. D. (2001). The units ofperception in the antiphonal calling behavior of cotton-top tamarins (Saguinusoedipus): Playback experiments with long calls. Journal of Comparative Physi-ology A: Sensory, Neural, and Behavioral Physiology, 187, 27e35.

Ghazanfar, A. A., Morrill, R. J., & Kayser, C. (2013). Monkeys are perceptually tuned tofacial expressions that exhibit a theta-like speech rhythm. Proceedings of theNational Academy of Sciences of the United States of America, 110, 1959e1963.

Ghazanfar, A. A., & Rendall, D. (2008). Evolution of human vocal production. CurrentBiology, 18, R457eR460.

Ghazanfar, A. A., Smith-Rohrberg, D., Pollen, A. A., & Hauser, M. D. (2002). Temporalcues in the antiphonal long-calling behaviour of cottontop tamarins. AnimalBehaviour, 64, 427e438.

Ghazanfar, A. A., & Takahashi, D. Y. (2014a). Facial expressions and the evolution ofthe speech rhythm. Journal of Cognitive Neuroscience, 26(6), 1196e1207.

Ghazanfar, A. A., & Takahashi, D. Y. (2014b). The evolution of speech: Vision, rhythm,cooperation. Trends in Cognitive Sciences, 18, 543e553.

Ghazanfar, A. A., Takahashi, D. Y., Mathur, N., & Fitch, W. T. (2012). Cineradiographyof monkey lip-smacking reveals putative precursors of speech dynamics. Cur-rent Biology, 22, 1176e1182.

Ghitza, O., & Greenberg, S. (2009). On the possible role of brain rhythms in speechperception: Intelligibility of time-compressed speech with periodic and aperi-odic insertions of silence. Phonetica, 66, 113e126.

Gillooly, J. F., & Ophir, A. G. (2010). The energetic basis of acoustic communication.Proceedings of the Royal Society B: Biological Sciences, 277, 1325e1331.

Giraud,A.-L.,&Poeppel,D. (2012).Cortical oscillationsandspeechprocessing:Emergingcomputational principles and operations. Nature Neuroscience, 15, 511e517.

Goldstein, M. H., King, A. P., & West, M. J. (2003). Social interaction shapes babbling:Testing parallels between birdsong and speech. Proceedings of the NationalAcademy of Sciences of the United States of America, 100, 8030e8035.

Goldstein, M. H., & Schwade, J. A. (2008). Social feedback to infants' babbling fa-cilitates rapid phonological learning. Psychological Science, 19, 515e523.

Greenberg, S., Carvey, H., Hitchcock, L., & Chang, S. (2003). Temporal properties ofspontaneous speech: A syllable-centric perspective. Journal of Phonetics, 31,465e485.

Gros-Louis, J., West, M. J., Goldstein, M. H., & King, A. P. (2006). Mothers providedifferential feedback to infants' prelinguistic sounds. International Journal ofBehavioral Development, 30, 509e516.

Gros-Louis, J., West, M. J., & King, A. P. (2014). Maternal responsiveness and thedevelopment of directed vocalizing in social interactions. Infancy, 19, 385e408.

Gültekin, Y. B., & Hage, S. R. (2017). Limiting parental feedback disrupts vocaldevelopment in marmoset monkeys. Nature Communications, 8, 14046.

Hage, S. R., Gavrilov, N., & Nieder, A. (2013). Cognitive control of distinct vocaliza-tions in rhesus monkeys. Journal of Cognitive Neuroscience, 25, 1692e1701.

Harris, R. A., Tardif, S. D., Vinar, T., Wildman, D. E., Rutherford, J. N., Rogers, J., et al.(2014). Evolutionary genetics and implications of small size and twinning incallitrichine primates. Proceedings of the National Academy of Sciences of theUnited States of America, 111, 1467e1472.

Hihara, S., Yamada, H., Iriki, A., & Okanoya, K. (2003). Spontaneous vocal differen-tiation of coo-calls for tools and food in Japanese monkeys. NeuroscienceResearch, 45, 383e389.

Hrdy, S. B. (2005). Evolutionary context of human development: The cooperativebreeding model. In C. Carter, L. Ahnert, K. Grossmann, S. Hrdy, M. Lamb,S. Porges, et al. (Eds.), Attachment and bonding: A new synthesis; from the 92ndDahlem Workshop Report (pp. 9e32). Cambridge, MA: MIT Press.

Hsu, H.-C., & Fogel, A. (2001). Infant Vocal development in a dynamicmothereinfant communication system. Infancy, 2, 87e109.

Jacob, F. (1977). Evolution and tinkering. Science, 196, 1161e1166.Janik, V. M., & Slater, P. J. (2000). The different roles of social learning in vocal

communication. Animal Behaviour, 60, 1e11.Jürgens, U. (2009). The neural control of vocalization in mammals: A review. Journal

of Voice, 23, 1e10.Kent, R. D., & Murray, A. D. (1982). Acoustic features of infant vocalic utterances at 3,

6, and 9 months. Journal of the Acoustical Society of America, 72, 353e365.Kulahci, I. G., Rubenstein, D. I., & Ghazanfar, A. A. (2015). Lemurs groom-at-a-

distance through vocal networks. Animal Behaviour, 110, 179e186.Lameira, A. R., Hardus, M. E., Bartlett, A. M., Shumaker, R. W., Wich, S. A., &

Menken, S. B. (2015). Speech-like rhythm in a voiced and voiceless orangutancall. PLoS One, 10, e116136.

Lameira, A. R., Maddieson, I., & Zuberbühler, K. (2014). Primate feedstock for theevolution of consonants. Trends in Cognitive Sciences, 18, 60e62.

Levinson, S. C. (2016). Turn-taking in human communication: Origins and impli-cations for language processing. Trends in Cognitive Sciences, 20(1), 6e14.

Liao, D. A., Zhang, Y. S., Cai, L. X., & Ghazanfar, A. A. (2018). Internal states andextrinsic factors both determine monkey vocal production. Proceedings of theNational Academy of Sciences of the United States of America, 115, 3978e3983.

Linden, W. (1987). A microanalysis of autonomic activity during human speech.Psychosomatic Medicine, 49, 562e578.

Lipkind, D., Marcus, G. F., Bemis, D. K., Sasahara, K., Jacoby, N., Takahasi, M., et al.(2013). Stepwise acquisition of vocal combinatorial capacity in songbirds andhuman infants. Nature, 498, 104e108.

Locke, J. L. (2006). Parental selection of vocal behavior: Crying, cooing, babbling andthe evolution of language. Human Nature, 17, 155e168.

Lynch, J. J., Thomas, S. A., Long, J. M., Malinow, K. L., Chickadonz, G., & Katcher, A. H.(1980). Human speech and blood pressure. Journal of Nervous and Mental Dis-ease, 168, 526e534.

MacPherson, M. K., Abur, D., & Stepp, C. E. (2017). Acoustic measures of voice andphysiologic measures of autonomic arousal during speech as a function ofcognitive load. Journal of Voice, 31(4), 504.e1e504.e9.

Margoliash, D., & Tchernichovski, O. (2015). Marmoset kids actually listen. Science,349, 688e689.

Marler, P. (1965). Communication in monkeys and apes. In I. DeVore, & K. R. L. Hall(Eds.), Primate behavior (pp. 544e586). NewYork,NY:Holt, Rinehardt&Winston.

Masataka, N. (1993). Effects of contingent and noncontingent maternal stimulationon the vocal behaviour of three- to five-month old Japanese infants. Journal ofChild Language, 20, 303e312.

Meise, K., Keller, C., Cowlishaw, G., & Fischer, J. (2011). Sources of acoustic variation:Implications for production specificity and call categorization in chacma ba-boon (Papio ursinus) grunts. Journal of the Acoustical Society of America, 129,1631e1641.

Miller, C. T., Mandel, K., & Wang, X. (2010). The communicative content of thecommon marmoset phee call during antiphonal calling. American Journal ofPrimatology, 72, 974e980.

Miller, C. T., & Thomas, A. W. (2012). Individual recognition during bouts ofantiphonal calling in common marmosets. Journal of Comparative Physiology A,198, 337e346.

Morrill, R. J., Paukner, A., Ferrari, P. F., & Ghazanfar, A. A. (2012). Monkey lip-smacking develops like the human speech rhythm. Developmental Science, 15,557e568.

Norcross, J., Newman, J. D., & Cofrancesco, L. (1999). Context and sex differencesexist in the acoustic structure of phee calls by newly-paired common mar-mosets (Callithrix jacchus). American Journal of Primatology, 49, 165e181.

Owren, M. J., Amoss, R. T., & Rendall, D. (2011). Two organizing principles of vocalproduction: Implications for nonhuman and human primates. American Journalof Primatology, 73, 530e544.

Pelegrín-García, D., Smits, B., Brunskog, J., & Jeong, C.-H. (2011). Vocal effort withchanging talker-to-listener distance in different acoustic environments. Journalof the Acoustical Society of America, 129, 1981e1990.

Petkov, C. I., & Jarvis, E. D. (2012). Birds, primates, and spoken language origins:Behavioral phenotypes and neurobiological substrates. Frontiers in EvolutionaryNeuroscience, 4, 12.

Pfaff, D. (2006). Brain arousal and information theory. Cambridge, MA: HarvardUniversity Press.

Pistorio, A. L., Vintch, B., & Wang, X. (2006). Acoustic analysis of vocal developmentin a New World primate, the common marmoset (Callithrix jacchus). Journal ofthe Acoustical Society of America, 120, 1655e1670.

Pomberger, T., Risueno-Segovia, C., L€oschner, J., & Hage, S. R. (2018). Precise motorcontrol enables rapid flexibility in vocal behavior of marmoset monkeys. Cur-rent Biology, 28, 788e794, e783.

Price, T., Wadewitz, P., Cheney, D., Seyfarth, R., Hammerschmidt, K., & Fischer, J.(2015). Vervets revisited: A quantitative analysis of alarm call structure andcontext specificity. Scientific Reports, 5, 13220.

Ross, C., & MacLarnon, A. (2000). The evolution of non-maternal care in anthropoidprimates: A test of the hypotheses. Folia Primatologica, 71, 93e113.

Roy, S., Miller, C. T., Gottsch, D., & Wang, X. (2011). Vocal control by the commonmarmoset in the presence of interfering noise. Journal of Experimental Biology,214, 3619e3629.

Santos, C. V., French, J. A., & Otta, E. (1997). Infant carrying behavior in callitrichidprimates: Callithrix and Leontopithecus. International Journal of Primatology, 18,889e907.

Scheiner, E., Hammerschmidt, K., Jurgens, U., & Zwirner, P. (2002). Acoustic analysesof developmental changes and emotional expression in the preverbal vocali-zations of infants. Journal of Phonetics, 16, 509e529.

Scherer, K. R., Johnstone, T., & Klasmeyer, G. (2003). Vocal expression of emotion. InR. J. Davidson, K. R. Scherer, & H. H. Goldsmith (Eds.), Handbook of affectivesciences (pp. 433e456). Oxford, U.K: Oxford University Press.

Schultz-Darken, N., Braun, K. M., & Emborg, M. E. (2016). Neurobehavioral devel-opment of common marmoset monkeys. Developmental Psychobiology, 58(2),141e158.

Seyfarth, R. M., & Cheney, D. L. (1986). Vocal development in vervet monkeys. An-imal Behaviour, 34, 1640e1658.

Seyfarth, R. M., & Cheney, D. L. (2017). The origin of meaning in animal signals.Animal Behaviour, 124, 339e346.

Seyfarth, R. M., Cheney, D. L., & Marler, P. (1980). Monkey responses to threedifferent alarm calls: Evidence of predator classification and semanticcommunication. Science, 210(4471), 801e803.

Slocombe, K. E., & Zuberbühler, K. (2007). Chimpanzees modify recruitmentscreams as a function of audience composition. Proceedings of the NationalAcademy of Sciences of the United States of America, 104, 17228e17233.

Snowdon, C. T., & Cronin, K. A. (2007). Cooperative breeders do cooperate. Behav-ioural Processes, 76, 138e141.

Steiper, M. E., & Young, N. M. (2006). Primate molecular divergence dates.MolecularPhylogenetics and Evolution, 41, 384e394.

Sutton, D., Larson, C., Taylor, E. M., & Lindeman, R. C. (1973). Vocalization in rhesusmonkeys: Conditionability. Brain Research, 52, 225e231.

Takahashi, D. Y., Fenley, A. R., & Ghazanfar, A. A. (2016). Early development of turn-taking with parents shapes vocal acoustics in infant marmoset monkeys. Phil-osophical Transactions of the Royal Society B, 371, 20150370.

Page 9: Volition and learning in primate vocal behaviour · Special Issue: Cognition & Language Volition and learning in primate vocal behaviour Asif A. Ghazanfar a, b, c, *, Diana A. Liao

A. A. Ghazanfar et al. / Animal Behaviour 151 (2019) 239e247 247

SPECIAL ISSUE: COGNITION & LANGUAGE

Takahashi, D. Y., Fenley, A. R., Teramoto, Y., Narayanan, D. Z., Borjon, J. I., Holmes, P.,et al. (2015). The developmental dynamics of marmoset monkey vocal pro-duction. Science, 349, 734e738.

Takahashi, D. Y., Liao, D. A., & Ghazanfar, A. A. (2017). Vocal learning via socialreinforcement by infant marmoset monkeys. Current Biology, 27,1844e1852.

Takahashi, D. Y., Narayanan, D. Z., & Ghazanfar, A. A. (2013). Coupled oscillatordynamics of vocal turn-taking in monkeys. Current Biology, 23, 2162e2168.

Tamis-LeMonda, C. S., Adolph, K. E., Lobo, S. A., Karasik, L. B., Ishak, S., &Dimitropoulou, K. A. (2008). When infants take mothers' advice: 18-month-olds integrate perceptual and social information to guide motor action. Devel-opmental Psychology, 44, 734e746.

Tchernichovski, O., & Marcus, G. (2014). Vocal learning beyond imitation: Mecha-nisms of adaptive vocal development in songbirds and human infants. CurrentOpinion in Neurobiology, 28, 42e47.

Tchernichovski, O., Mitra, P. P., Lints, T., & Nottebohm, F. (2001). Dynamics of thevocal imitation process: How a zebra finch learns its song. Science, 291,2564e2569.

Teramoto, Y., Takahashi, D., Holmes, P., & Ghazanfar, A. A. (2017). Vocal developmentin a Waddington landscape. eLife, 6, e20782.

Terleph, T. A., Malaivijitnond, S., & Reichard, U. H. (2018). An analysis of white-handed gibbon male song reveals speech-like phrases. American Journal ofPhysical Anthropology, 166(3), 649e660.

Tomasello, M. (2008). Origins of human communication. Cambridge, MA: MIT Press.

de la Torre, S., & Snowdon, C. T. (2002). Environmental correlates of vocalcommunication of wild pygmy marmosets, Cebuella pygmaea. Animal Behaviour,63, 847e856.

Toyoda, A., Maruhashi, T., Malaivijitnond, S., & Koda, H. (2017). Speech-like orofacialoscillations in stump-tailed macaque (Macaca arctoides) facial and vocal signals.American Journal of Physical Anthropology, 164, 435e439.

West, M. J., & King, A. P. (1988). Female visual displays affect the development ofmale song in the cowbird. Nature, 334, 244e246.

Yamaguchi, C., Izumi, A., & Nakamura, K. (2010). Time course of vocal modulationduring isolation in common marmosets (Callithrix jacchus). American Journal ofPrimatology, 72, 681e688.

Yamamoto, M. E., & Lopes, F.dA. (2004). Effect of removal from the family group onfeeding behavior by captive Callithrix jacchus. International Journal of Primatol-ogy, 25, 489e500.

Zhang, Y. S., & Ghazanfar, A. A. (2016). Perinatally influenced autonomic nervoussystem fluctuations drive infant vocal sequences. Current Biology, 26,1249e1260.

Zhang, Y. S., & Ghazanfar, A. A. (2018). Vocal development through morphologicalcomputation. PLoS Biology, 16, e2003933.

Zuberbühler, K. (2000). Interspecies semantic communication in two forest pri-mates. Proceedings of the Royal Society B: Biological Sciences, 267, 713e718.

Zuberbühler, K. (2012). Cooperative breeding and the evolution of vocal flexibility.In M. Tallerman, & K. R. Gibson (Eds.), The Oxford handbook of language evolution(pp. 71e81). New York, NY: Oxford University Press.