57
1 Cracking the emotion Cracking the emotion code in human speech code in human speech Yi Xu Yi Xu University College London University College London

Cracking the emotion code in human speech Yi Xu University College London

  • Upload
    morna

  • View
    58

  • Download
    0

Embed Size (px)

DESCRIPTION

Cracking the emotion code in human speech Yi Xu University College London. Acknowledgment. Dr. Suthatip Chuenwattanapranithi Prof. Bundit Thipakorn Department of Computer Engineering, King Mongkut’s University of Technology, Bangkok, Thailand. Emotional expressions in speech: Why do we care?. - PowerPoint PPT Presentation

Citation preview

Page 1: Cracking the emotion code in human speech Yi Xu University College London

1

Cracking the emotion code in Cracking the emotion code in human speechhuman speech

Yi XuYi Xu

University College LondonUniversity College London

Cracking the emotion code in Cracking the emotion code in human speechhuman speech

Yi XuYi Xu

University College LondonUniversity College London

Page 2: Cracking the emotion code in human speech Yi Xu University College London

2

AcknowledgmentAcknowledgmentAcknowledgmentAcknowledgment

Dr. Suthatip Chuenwattanapranithi

Prof. Bundit Thipakorn

Department of Computer Engineering, King Mongkut’s University of Technology,

Bangkok, Thailand

Page 3: Cracking the emotion code in human speech Yi Xu University College London

3

Emotional expressions in speech: Emotional expressions in speech: Why do we care?Why do we care?Emotional expressions in speech: Emotional expressions in speech: Why do we care?Why do we care?

Human speech conveys not only linguistic messages, but also emotional information

Understanding how emotions are expressed in speech is important for its own sake

But it’s also important for understanding speech in general, e.g., how can we know how much of the F0 and spectral variability in a sentence is or is not reflecting an emotional content?

Page 4: Cracking the emotion code in human speech Yi Xu University College London

4

Emotional cues in speech have been Emotional cues in speech have been difficult to identify: difficult to identify: No distinctive No distinctive features recognizedfeatures recognized

Emotional cues in speech have been Emotional cues in speech have been difficult to identify: difficult to identify: No distinctive No distinctive features recognizedfeatures recognized Current practice: Examine as many acoustic parameters Current practice: Examine as many acoustic parameters

as possible and measure their correlations with multiple as possible and measure their correlations with multiple emotionsemotions

Scherer (2003:233): Synthetic compilation of the review of Synthetic compilation of the review of empirical data on acoustic patterning of basic emotionsempirical data on acoustic patterning of basic emotions

Page 5: Cracking the emotion code in human speech Yi Xu University College London

5

Why have emotions in speech Why have emotions in speech been so difficult to study?been so difficult to study?

Why have emotions in speech Why have emotions in speech been so difficult to study?been so difficult to study?

1. Hard to disentangle emotional coding from speech coding in the complex acoustic signal

2. Lack of clear, theory-based hypotheses

Page 6: Cracking the emotion code in human speech Yi Xu University College London

6

What is emotion?What is emotion?What is emotion?What is emotion? It is often believed that only humans have

emotions, and emotional intelligence makes humans stand out from other animals

Whether this belief is right depends on knowing what exactly emotions are

In general, it is believed that emotions are, first and foremost, internal feelings we experience

Therefore, emotional expressions are for displaying our internal feelings

Page 7: Cracking the emotion code in human speech Yi Xu University College London

7

Current theories of emotionCurrent theories of emotionCurrent theories of emotionCurrent theories of emotion Discrete emotion theories — There exist a set of

basic or fundamental emotions such as anger, fear, joy, sadness, disgust and surprise

Dimensional theories — All emotions can be placed into a space defined by a set of dimensions, e.g., valence (positive/negative), activation (active/rest), and power/control

Following these theories, emotional expressions are displays of either discrete emotions, or internal feelings described by the dimensions

Page 8: Cracking the emotion code in human speech Yi Xu University College London

8

A new theoretical perspectiveA new theoretical perspectiveA new theoretical perspectiveA new theoretical perspective

1. Internal feelings are an evolutionarily engineered mechanism to quickly mobilize all the reactions needed to cope with interactions with other individuals (either conspecific or cross-species), including the act of generating including the act of generating emotional expressionsemotional expressions

2. Vocal emotional expressions are evolutionarily engineered to elicit behaviors that may benefit the elicit behaviors that may benefit the vocalizervocalizer

Page 9: Cracking the emotion code in human speech Yi Xu University College London

9

Morton (1977): Many animals express Morton (1977): Many animals express dominance by trying to appear as large dominance by trying to appear as large as possibleas possible

Morton (1977): Many animals express Morton (1977): Many animals express dominance by trying to appear as large dominance by trying to appear as large as possibleas possible

What are they trying to do?What are they trying to do?

Cock fightCock fight

Page 10: Cracking the emotion code in human speech Yi Xu University College London

10

Permanent size markers based on the Permanent size markers based on the principle of body size projection principle of body size projection

Permanent size markers based on the Permanent size markers based on the principle of body size projection principle of body size projection

The visual strategyThe visual strategy

Page 11: Cracking the emotion code in human speech Yi Xu University College London

11

Permanent size markers based on the Permanent size markers based on the principle of body size projection principle of body size projection

Permanent size markers based on the Permanent size markers based on the principle of body size projection principle of body size projection

The visual strategyThe visual strategy

Page 12: Cracking the emotion code in human speech Yi Xu University College London

12

Height = frequencyHeight = frequency

Thick = harshThick = harsh

Thin = tonalThin = tonal

Arrow = changeableArrow = changeable

MortonMorton (1977) (1977): : The body size projection The body size projection principle principle (motivational-structural rules)

MortonMorton (1977) (1977): : The body size projection The body size projection principle principle (motivational-structural rules)

Animal sounds express hostility & fear/appeasement through Animal sounds express hostility & fear/appeasement through pitch and voice quality — based on body-size projection pitch and voice quality — based on body-size projection

Page 13: Cracking the emotion code in human speech Yi Xu University College London

13

OhalaOhala (1984) (1984): Human smile : Human smile ---- similar to monkey similar to monkey submission face submission face ---- is to understate body size is to understate body size

OhalaOhala (1984) (1984): Human smile : Human smile ---- similar to monkey similar to monkey submission face submission face ---- is to understate body size is to understate body size

Submission Aggression

Lip corner retraction Lip protrusion (O-face)

Monkey facial expressionsMonkey facial expressions (van Hooff, 1962, cited by Ohala, (van Hooff, 1962, cited by Ohala,

1984)1984)::

Page 14: Cracking the emotion code in human speech Yi Xu University College London

14

OhalaOhala (1984) (1984): Extending body-size projection to : Extending body-size projection to formant frequenciesformant frequencies

OhalaOhala (1984) (1984): Extending body-size projection to : Extending body-size projection to formant frequenciesformant frequencies

Submission Aggression

Retracting lip corners increases formant frequenciesRetracting lip corners increases formant frequencies

Page 15: Cracking the emotion code in human speech Yi Xu University College London

15

OhalaOhala (1984) (1984): Human smile : Human smile ---- similar to monkey similar to monkey submission face submission face ---- is to understate body size is to understate body size

OhalaOhala (1984) (1984): Human smile : Human smile ---- similar to monkey similar to monkey submission face submission face ---- is to understate body size is to understate body size

Submission Aggression

Lip corner retraction Lip protrusion (O-face)

Monkey facial expressionsMonkey facial expressions (van Hooff, 1962, cited by Ohala, (van Hooff, 1962, cited by Ohala, 1984)1984)

Page 16: Cracking the emotion code in human speech Yi Xu University College London

16

Fitch & RebyFitch & Reby (2001) (2001): Red deer larynx is : Red deer larynx is mobilely descended to exaggerate mobilely descended to exaggerate body sizebody size

Fitch & RebyFitch & Reby (2001) (2001): Red deer larynx is : Red deer larynx is mobilely descended to exaggerate mobilely descended to exaggerate body sizebody size

Male Red deer roars during mating competitionMale Red deer roars during mating competition

Page 17: Cracking the emotion code in human speech Yi Xu University College London

17

Tracheal elongation in birdsTracheal elongation in birds(Fitch, 1999)(Fitch, 1999)

Tracheal elongation in birdsTracheal elongation in birds(Fitch, 1999)(Fitch, 1999)

Crested guinea fowl

European spoonbill

Trumpeter swan

Trumpet manucode

Page 18: Cracking the emotion code in human speech Yi Xu University College London

18

Elongated and coiled tracheae of Elongated and coiled tracheae of birds-of-paradise birds-of-paradise (Clench, 1978)(Clench, 1978)

Elongated and coiled tracheae of Elongated and coiled tracheae of birds-of-paradise birds-of-paradise (Clench, 1978)(Clench, 1978)

Page 19: Cracking the emotion code in human speech Yi Xu University College London

19

Permanently descended larynx Permanently descended larynx in humanin humanPermanently descended larynx Permanently descended larynx in humanin human

Human (as well as Chimpanzee) males have lower larynx than females; their vocal folds are also longer than females’ (Fitch, 1994)

The dimorphism occurs at puberty (Negus, 1949; Goldstein, 1980), i.e., just at a time when males have the need to attract females (Feinberg et al., 2005)

Female listeners found male speech with lower pitch and denser spectrum more attractive (Feinberg et al., 2005)

Page 20: Cracking the emotion code in human speech Yi Xu University College London

20

The size code hypothesis of The size code hypothesis of emotional speechemotional speech (Chuenwattanapranithi, (Chuenwattanapranithi,

Xu, Thipakorn & Maneewongvatanaa, 2008)Xu, Thipakorn & Maneewongvatanaa, 2008)

The size code hypothesis of The size code hypothesis of emotional speechemotional speech (Chuenwattanapranithi, (Chuenwattanapranithi,

Xu, Thipakorn & Maneewongvatanaa, 2008)Xu, Thipakorn & Maneewongvatanaa, 2008)

• Vocal expression of angeranger is a display of agressivenessagressiveness and vocal expression of happinesshappiness a display of sociabilitysociability

• Anger and happiness are vocally conveyed by acoustically exaggerating or understating the body size of the speaker

Page 21: Cracking the emotion code in human speech Yi Xu University College London

21

A perceptual test using a 3D A perceptual test using a 3D articulatory synthesizerarticulatory synthesizer

(Chuenwattanapranithi et al. 2008)(Chuenwattanapranithi et al. 2008)

A perceptual test using a 3D A perceptual test using a 3D articulatory synthesizerarticulatory synthesizer

(Chuenwattanapranithi et al. 2008)(Chuenwattanapranithi et al. 2008)

Difference in larynx height = 7 mm

(Birkholz & Jackèl, 2003)

Page 22: Cracking the emotion code in human speech Yi Xu University College London

22

Manipulation of larynx height and lip Manipulation of larynx height and lip protrusion resulted in difference in protrusion resulted in difference in formant frequenciesformant frequencies

Static, VTL

Dynamic, VTL+F0

Dynamic, VTL+F0

Page 23: Cracking the emotion code in human speech Yi Xu University College London

23

Condition 1Condition 1 (a, b) (a, b) — Static low/high larynx; fixed F0

Condition 2Condition 2 (c, d) (c, d) — Static low/high larynx; static high/low F0

Condition 3Condition 3 (e, f) (e, f) — Dynamically lowered /raised larynx; fixed F0

Condition 4Condition 4 (g, h) (g, h) — Dynamically lowered /raised larynx; dynamically lowered /raised F0

Results of emotion and Results of emotion and body size perception body size perception

% angry% angry % larger size% larger size

*

*

**

* *

Page 24: Cracking the emotion code in human speech Yi Xu University College London

24

Experiment 2 — Effect of lip Experiment 2 — Effect of lip protrusion and Fprotrusion and F0 0 on perception of on perception of

emotion and body sizeemotion and body size

Experiment 2 — Effect of lip Experiment 2 — Effect of lip protrusion and Fprotrusion and F0 0 on perception of on perception of

emotion and body sizeemotion and body size Dynamically protruded / spread lips (by 7 mm);

dynamically lowered /raised F0 (by 5 Hz)

92 undergraduate Thai students as subjects (72 males and 22 females)

* *

Page 25: Cracking the emotion code in human speech Yi Xu University College London

25

InterpretationInterpretationInterpretationInterpretation Listeners are highly sensitive to formant and F0

variability that may signal body size of the speaker

They can also use the size information to determine whether the speaker is happy or angry, as long as the acoustic cues are dynamic

The finding is consistent with the size code hypothesis:

Anger and happiness are conveyed in speech by Anger and happiness are conveyed in speech by

exaggerating or understating the body size of the speaker, exaggerating or understating the body size of the speaker,

to communicate threat or appeasementto communicate threat or appeasement

Page 26: Cracking the emotion code in human speech Yi Xu University College London

26

Why dynamic?Why dynamic?Why dynamic?Why dynamic? It is as if listeners can “tell” when the acoustic cues

indicate “true” body size and when they are used as a code

From the perspective of encoding, it seems that showing anger and joy is not to sound convincingly large or small, but to show an effort to do so

This is exactly what speakers do when encoding lexical contrasts conveyed by tonal and segmental phonemes (Xu, 1997, 1999; Xu and Liu, 2007; Xu and Wang, 2001)

Page 27: Cracking the emotion code in human speech Yi Xu University College London

27

Target Approximation as basic Target Approximation as basic mechanism of speech codingmechanism of speech coding (Xu 2009)

Target Approximation as basic Target Approximation as basic mechanism of speech codingmechanism of speech coding (Xu 2009)

time

F0

Interval 1 Interval 2

[rise

]

[low]

Approaching [rise]

Approaching [low]

Each phonetic unit has a target, which is an ideal articulatory and/or acoustic pattern.

Surface movement trajectories are the result of sequential approximation of successive targets

Initial state

Page 28: Cracking the emotion code in human speech Yi Xu University College London

28

Encoding emotions by modifying Encoding emotions by modifying linguistic targetslinguistic targets

Initial state

Articulatory state

Target Final state

The target approximation process is intrinsically dynamic, and it is intrinsic to speech

Emotions are probably encoded through modification of the existing linguistic targets

Emotion-loaded Target

Page 29: Cracking the emotion code in human speech Yi Xu University College London

29

A new studyA new study — Direct modification — Direct modification of real speech based on the size code of real speech based on the size code

(Xu & Kelly, in press)(Xu & Kelly, in press)

A new studyA new study — Direct modification — Direct modification of real speech based on the size code of real speech based on the size code

(Xu & Kelly, in press)(Xu & Kelly, in press)

Conditions:1) Original speech: English numerals: 1, 2, 3, … 10

2) Spectral density:original +10% -10%

3) F0: original +10 Hz -10 Hz

Tasks:

1. Body size: Larger / Smaller

2. Emotion: Angry / happy

Page 30: Cracking the emotion code in human speech Yi Xu University College London

30

Static stimuli “nine”Static stimuli “nine”Static stimuli “nine”Static stimuli “nine”

Original

Expanded / Condensed spectrum

Raised / Lowered F0

Expanded spectrum + Raised F0

Condensed spectrum + Lowered F0

Page 31: Cracking the emotion code in human speech Yi Xu University College London

31

Dynamic stimuli “nine”Dynamic stimuli “nine”Dynamic stimuli “nine”Dynamic stimuli “nine”

Original

Expanded / Condensed spectrum

Raised / Lowered F0

Expanded spectrum + Raised F0

Condensed spectrum + Lowered F0

Page 32: Cracking the emotion code in human speech Yi Xu University College London

32

Results:Results: SizeSize judgment affected by judgment affected by

parameterparameter and direction of manipulation and direction of manipulationResults:Results: SizeSize judgment affected by judgment affected by

parameterparameter and direction of manipulation and direction of manipulation

Page 33: Cracking the emotion code in human speech Yi Xu University College London

33

Results:Results: EmotionEmotion judgment affected by judgment affected by

parameterparameter and direction of manipulation and direction of manipulationResults:Results: EmotionEmotion judgment affected by judgment affected by

parameterparameter and direction of manipulation and direction of manipulation

Page 34: Cracking the emotion code in human speech Yi Xu University College London

34

Results:Results: SizeSize judgment affected by judgment affected by

mannermanner and direction of manipulation and direction of manipulationResults:Results: SizeSize judgment affected by judgment affected by

mannermanner and direction of manipulation and direction of manipulation

Page 35: Cracking the emotion code in human speech Yi Xu University College London

35

Results:Results: EmotionEmotion judgment affected by judgment affected by

mannermanner and direction of manipulation and direction of manipulationResults:Results: EmotionEmotion judgment affected by judgment affected by

mannermanner and direction of manipulation and direction of manipulation

Page 36: Cracking the emotion code in human speech Yi Xu University College London

36

Vocal emotional expressions are evolutionarily engineered to elicit behaviors that may benefit the vocalizer

They influence the behavior of the receivers by manipulating the vocal signal along a set of bio-informational dimensions:

Extending the size code hypothesis:Extending the size code hypothesis: A bio-informational dimensions theory A bio-informational dimensions theory

(Xu, Kelly & Smillie, in press)(Xu, Kelly & Smillie, in press)

Extending the size code hypothesis:Extending the size code hypothesis: A bio-informational dimensions theory A bio-informational dimensions theory

(Xu, Kelly & Smillie, in press)(Xu, Kelly & Smillie, in press)

Page 37: Cracking the emotion code in human speech Yi Xu University College London

37

The bio-informational dimensions The bio-informational dimensions (Xu et al., in press)(Xu et al., in press)

The bio-informational dimensions The bio-informational dimensions (Xu et al., in press)(Xu et al., in press)

1.1. Size projectionSize projection = Size code

2.2. Dynamicity:Dynamicity: Controls how vigorous the vocalization sounds, depending on whether it is beneficial for the vocalizer to appear strong or weak.

3.3. Audibility: Audibility: Controls how far a vocalization can be transmitted from the vocalizer, depending on whether and how much it is beneficial for the vocalizer to be heard over long distance.

4.4. Association: Association: Controls associative use of sounds typically accompanying a non-emotional biological function in circumstances beyond the original ones. For example, the disgust vocalization seems to mirror the sounds made when a person orally rejects unpleasant food (Darwin, 1872).

Page 38: Cracking the emotion code in human speech Yi Xu University College London

38

An initial test of bio-informational An initial test of bio-informational dimensionsdimensions (Xu et al., in press)(Xu et al., in press)

An initial test of bio-informational An initial test of bio-informational dimensionsdimensions (Xu et al., in press)(Xu et al., in press)

• Original speech: English sentence “I owe you a yoyo” with focus on “owe”

• Parameter manipulations:

Formant shift ratio

Pitch median (Hz)

Pitch range factor

Duration factor

1.2 400 4 1.1

1.078 200 1.17 0.9

0.956 100 0.341

0.833 50 0.1

Page 39: Cracking the emotion code in human speech Yi Xu University College London

39

Emotion Best score (%)

Formant shift

Pitch median

Pitch range

Duration ratio

Happy 73.3 1.2*** 200 4.0*** 0.9***

Depressed 71.1 0.96 100*** 0.1*** 1.1***

Grief-stricken 60 0.83*** 400** 0.34*** 1.1**

Scared 48.9 0.83** 400*** 1.17 0.9

Angry 48.9 0.83 50*** 0.1 1.1

ResultsResults (Xu et al., in press)(Xu et al., in press)ResultsResults (Xu et al., in press)(Xu et al., in press)

Page 40: Cracking the emotion code in human speech Yi Xu University College London

40

Key findings Key findings (Xu et al., in press)(Xu et al., in press)Key findings Key findings (Xu et al., in press)(Xu et al., in press) Happiness is located near the small end of the size-projection

dimension and high end of dynamicity dimension

Two types of sadness: depressed, corresponding to most commonly reported sadness, grief-stricken, with lengthened vocal tract, suggesting that it is demandingdemanding (not beggingbegging) for sympathy

Fear has lengthened vocal tract and relatively large pitch range. Combined with high median pitch, it sends a mixed signal: I may be small (high pitch), but I am willing to fight (long vocal tract). This separates fear from submission, counter Morton (1977).

While submission probably indeed signals total surrender, a fear expression signals a demand for the aggressor to back off. This makes evolutionary sense, because a total surrender to a predator can only mean one thing: to be eaten.

Page 41: Cracking the emotion code in human speech Yi Xu University College London

41

Parallel encoding of emotional and Parallel encoding of emotional and non-emotional meaningsnon-emotional meanings

Parallel encoding of emotional and Parallel encoding of emotional and non-emotional meaningsnon-emotional meanings

Speakers convey to the listeners not only words, but also many layers of additional information.

They do so by controlling Target Approximation parameters specified by encoding schemes of various communicative functions, including the emotional functions.

Page 42: Cracking the emotion code in human speech Yi Xu University College London

42

Target Approximation as basic Target Approximation as basic mechanism of speech codingmechanism of speech coding (Xu 2005)

Target Approximation as basic Target Approximation as basic mechanism of speech codingmechanism of speech coding (Xu 2005)

time

F0

Syllable 1 Syllable 2

[rise

]

Onset F0

[low]

Approaching [rise]

Approaching [low]

The TA model assume that each syllable has a local pitch target, and surface F0 contours are the result of sequential approximation of successive targets

This also means that TA parameters can be manipulated to encode further information

Page 43: Cracking the emotion code in human speech Yi Xu University College London

43

Mandarin: Statement vs. question + focus Mandarin: Statement vs. question + focus + tone + tone (Liu & Xu, 2005)

Mandarin: Statement vs. question + focus Mandarin: Statement vs. question + focus + tone + tone (Liu & Xu, 2005)

Page 44: Cracking the emotion code in human speech Yi Xu University College London

44

Bloo laineYou’re going to ming dales with E

English:English: Statement vs. question + focus Statement vs. question + focus + word stress+ word stress (Liu & Xu, 2007) English:English: Statement vs. question + focus Statement vs. question + focus + word stress+ word stress (Liu & Xu, 2007)

Page 45: Cracking the emotion code in human speech Yi Xu University College London

45

Page 46: Cracking the emotion code in human speech Yi Xu University College London

46

SummarySummarySummarySummaryWWe are now starting to crack the emotion code:e are now starting to crack the emotion code:1. Vocal expression of emotion is not for the sake of

revealing one’s internal feelings

Rather, it is likely to be an evolutionarily engineered way of eliciting behaviors from the listener that may benefit the vocalizer

2. This is done mainly by projecting a large body size to scare away the listener, or a small body size to attract the listener

3. Size projection is also accompanied by other vocal manipulations to enhance the beneficial effects

4. All of this is done in parallel with the transmission of linguistic information

Page 47: Cracking the emotion code in human speech Yi Xu University College London

47

What about voice quality?What about voice quality?What about voice quality?What about voice quality?MortonMorton (1977) (1977)::

Without a harsh voice Without a harsh voice

quality, speech does not quality, speech does not

sound obviously angrysound obviously angry

Gobl & Ni Chasaide (2003): Gobl & Ni Chasaide (2003):

harsh/tense voice harsh/tense voice anger anger

Original

Enlarged Reduced

Page 48: Cracking the emotion code in human speech Yi Xu University College London

48

On-going studies:On-going studies:On-going studies:On-going studies:

1. Happiness and associability — With Lucy Noble

2. Fear versus submission — With Fatema Rattansey

3. Grief-stricken sadness vs. depressed sadness — With Katherine Lawson

4. Phonetic cues of politeness in Japanese —

With Marito Onishi

Page 49: Cracking the emotion code in human speech Yi Xu University College London

49

qTA — A quantitative qTA — A quantitative implementation of PENTAimplementation of PENTA

(Prom-on, Xu & Thipakorn 2009)(Prom-on, Xu & Thipakorn 2009)

qTA — A quantitative qTA — A quantitative implementation of PENTAimplementation of PENTA

(Prom-on, Xu & Thipakorn 2009)(Prom-on, Xu & Thipakorn 2009)Interactive at: www.phon.ucl.ac.uk/home/yi/qTA/

Page 50: Cracking the emotion code in human speech Yi Xu University College London

50

NaturalNatural (blue)(blue) vs. model generatedvs. model generated (red)(red) FF00 for Mandarinfor Mandarin

NaturalNatural (blue)(blue) vs. model generatedvs. model generated (red)(red) FF00 for Mandarinfor Mandarin

F0 of 5 repetitions by 1

speaker

Page 51: Cracking the emotion code in human speech Yi Xu University College London

51

NaturalNatural (blue)(blue) vs. model generatedvs. model generated (red)(red) FF00 for Englishfor English

NaturalNatural (blue)(blue) vs. model generatedvs. model generated (red)(red) FF00 for Englishfor English

Page 52: Cracking the emotion code in human speech Yi Xu University College London

52

Birkholz, P.; Jackèl, D.: A three-dimensional model of the vocal tract for speech synthesis. Proc. The 15th International Congress of Phonetic Sciences, pp. 2597-2600 (Barcelona, Spain 2003).

Chuenwattanapranithi, S., Xu, Y., Thipakorn, B. and Maneewongvatana, S. (2008). Encoding emotions in speech with the size code — A perceptual investigation. Phonetica 65: 210-230.

Clench, M. H. (1978). Tracheal elongation in birds-of-paradise. Condor 80: 423-430.Feinberg, D. R.; Jones, B. C.; Little, A. C.; Burt, D. M.; Perrett, D. I.: Manipulations of fundamental and

formant frequencies influence the attractiveness of human male voices. Animal Behavior 69: 561-568 (2005).

Fitch, W. T.: Vocal tract length perception and the evolution of language. Ph. D. Dissertation (Brown University, 1994).

Fitch, W. T. (1999). Acoustic exaggeration of size in birds by tracheal elongation: Comparative and theoretical analyses. Journal of Zoology (London) 248: 31-49.

Fitch, W. T. and Reby, D. (2001). The Descended Larynx Is Not Uniquely Human. Proceedings of the Royal Society, Biological Sciences 268(1477): 1669-1675.

Frick, R. W. (1985). Communicating emotion: The role of prosodic features. Psychological Bulletin 97: 412-429.

Goldstein, U. (1980). An articulatory model for the vocal tracts of growing children. Ph.D. dissertation, Massachusetts Institute of Technology.

Morton, E. W. (1977). On the occurrence and significance of motivation-structural rules in some bird and mammal sounds. American Naturalist 111: 855-869.

Negus, V. E.: The comparative anatomy and physiology of the larynx (Hafner, New York 1949).Ohala, J. J. (1984). An ethological perspective on common cross-language utilization of F0 of voice.

Phonetica 41: 1-16.

ReferencesReferencesReferencesReferences

Page 53: Cracking the emotion code in human speech Yi Xu University College London

53

Scherer, K. R. (1982). Methods of research on vocal communication: paradigms and parameters. In Handbook of Methods in Nonverbal Behavior Research. K. R. Scherer and P. Ekman. Cambridge, UK: Cambridge University Press pp. 136-198.

Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech Communication 40: 227-256.

Scherer, K. R. (1999). Universality of emotional expression. In Encyclopedia of Human Emotions. D. Levinson, J. Ponzetti and P. Jorgenson. New York: Macmillan pp. 669-674.

Scherer, K. R., Banse, R. and Wallbott, H. G. (2001). Emotion inferences from vocal expression correlate across languages and cultures. Journal of Cross-Cultural Psychology 32: 76-92.

van Bezooijen, R., Otto, S. and Heenan, T. A. (1983). Recognition of vocal expressions of emotions: A three-nation study to identify universal characteristics. Journal of Cross-Cultural Psychology 14: 387-406.

Xu, Y.: Effects of tone and focus on the formation and alignment of F0 contours. Journal of Phonetics 27: 55-105 (1999).

Xu, Y. (1997). Contextual tonal variations in Mandarin. Journal of Phonetics 25: 61-83.Xu, Y. (2005). Speech Melody as Articulatorily Implemented Communicative Functions. Speech

Communication 46: 220-251.Xu, Y. (2007). Speech as articulatory encoding of communicative functions. In Proceedings of The 16th

International Congress of Phonetic Sciences, Saarbrucken: 25-30.Xu, Y., Kelly, A. and Smillie, C. (forthcoming). Emotional expressions as communicative signals. To

appear in S. Hancil and D. Hirst (eds.) Prosody and Iconicity.Xu, Y.; Liu, F.: Determining the temporal interval of segments with the help of F0 contours. Journal of

Phonetics 35: 398-420 (2007).Xu, Y. and Wang, Q. E. (2001). Pitch targets and their realization: Evidence from Mandarin Chinese.

Speech Communication 33: 319-337.

ReferencesReferencesReferencesReferences

Page 54: Cracking the emotion code in human speech Yi Xu University College London

54

Vocal expression of disgustVocal expression of disgustVocal expression of disgustVocal expression of disgust

Disgust is one of the most easily recognized emotions from non-speech vocalization (Scott et al. 2004)

The disgust vocalizations are reminiscent of sound of vomit

Kohler (2007): ‘negative’ – expression of dislike, by weakening sonorous features of the accented syllable, initial consonant lengthening at the expense of the nucleus, e.g. it STINKS! (‘force accent’ [5,6]).

Page 55: Cracking the emotion code in human speech Yi Xu University College London

55

Interpretation Interpretation (cont.)(cont.)Interpretation Interpretation (cont.)(cont.) Listeners also heard vowels with longer vocal tract and

lower F0 as angry, and those with shorter vocal tract

and higher F0 as happy

This is also consistent with previous findings:

1. The smile during speech is audible to human listeners (Auberge & Cathiard, 2003)

2. Speech produced with a smiling or frowning face is perceived as happiness or unhappiness (Tartter & Braun, 1994)

3. Speech produced with lowered F0 is perceived as more

dominant (Ohala, 1984)

Page 56: Cracking the emotion code in human speech Yi Xu University College London

Emotional Intonation patternsEmotional Intonation patternsEmotional Intonation patternsEmotional Intonation patterns

Bananas aren’t poisonous!

I’m simply trying to get you to understand!

Page 57: Cracking the emotion code in human speech Yi Xu University College London

57

normalnormalPitch

Range:

longshortshortlong

DurationStrength

strongweakweakstrong

Encoding further information by Encoding further information by manipulatingmanipulating TATA parametersparameters

Encoding further information by Encoding further information by manipulatingmanipulating TATA parametersparameters

Pitch target:

[high] [rise] [low] [fall]

Low + narrowHigh + wide

[high] [rise] [low] [fall]