Speech and speaker normalization (in vowel normalization) Venice International University Phonetic and technological aspects of speaker characteristics Prof. Dr. J. Harrington Presented by Clara Tillmanns [email protected] 18.10.2007

Speech and speaker normalization (in vowel normalization)

Venice International University

Phonetic and technological aspects of speaker characteristics

Prof. Dr. J. Harrington

Presented by

Clara Tillmanns

[email protected]

18.10.2007

Clara Tillmanns - Speech and speaker normalization

Contents

1. Speech and speaker normalization in vowel normalization: definition

2. Influencing parameters and instruments for vowel normalization

3. Theories

4. Studies: Johnson 1990 and 1999

5. Recapitulation

Definition

Normalization.

We know there is extensive variation in speech. How is it, then, that listeners agree in their perception of vowels?

Fig. 1: Scatter plot of first and second formant values of American English vowels. From Peterson & Barney 1952

Which information influences this decision?

And which mechanism leads to the decision?

Contents

1. Speech and speaker normalization: definition

2. Influencing parameters and instruments for vowel normalization

- Context
- Formant ratio
- F0
- Visual information
- Auditory gestalts

3. Theories

4. Studies: Johnson 1990 and 1999

5. Recapitulation

Influencing parameters and instruments for vowel normalization

Extrinsic (syllable external) and intrinsic (syllable internal) factors:

- Context (extrinsic): vocalic, prosodic, tonal
- Formant ratio
- F0
- Visual information
- Auditory gestalts

Influencing parameters and instruments for vowel normalization

Context:

Perceived vowel quality is influenced

- by the formant frequencies of context vowels (Ladefoged & Broadbent 1957)
- by the F0 range of the carrier phrase (Johnson 1990)

Tones: The pitch range of a context utterance influences the perception of Mandarin Chinese tones (Leather 1983)

Influencing parameters and instruments for vowel normalization

Extrinsic (syllable external) and intrinsic (syllable internal) factors:

- Context (extrinsic): vocalic, prosodic, tonal
- Formant ratio: relative patterns
- F0: gender
- Visual information
- Auditory gestalts

Influencing parameters and instruments for vowel normalization

Formant ratio

Vowels are relative patterns, not absolute frequencies
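The idea of vowels as relative patterns can be illustrated with a small numeric sketch. The formant values below are illustrative round numbers for an [i]-like vowel from a male and a female talker (an assumption made for this example, not data from the talk), and `log_ratio` and `relative_difference` are hypothetical helper names. The sketch only shows that the F2/F1 ratio differs less between the two talkers than the absolute F2 values do; it is not a full formant-ratio theory.

```python
import math

# Illustrative formant values in Hz (assumed for this sketch, not data from the talk).
talkers = {
    "man": {"F1": 270.0, "F2": 2290.0},
    "woman": {"F1": 310.0, "F2": 2790.0},
}


def log_ratio(f_low, f_high):
    """Natural log of the ratio between two formant frequencies."""
    return math.log(f_high / f_low)


def relative_difference(a, b):
    """Difference between two values as a proportion of the smaller one."""
    return abs(a - b) / min(a, b)


man, woman = talkers["man"], talkers["woman"]

# Absolute F2 differs considerably between the two talkers ...
f2_difference = relative_difference(man["F2"], woman["F2"])

# ... while the F2/F1 ratio (the "relative pattern") differs less.
ratio_difference = relative_difference(
    man["F2"] / man["F1"], woman["F2"] / woman["F1"]
)

print(f"relative F2 difference:    {f2_difference:.3f}")
print(f"relative F2/F1 difference: {ratio_difference:.3f}")
print(
    "log ratios:",
    round(log_ratio(man["F1"], man["F2"]), 3),
    "vs",
    round(log_ratio(woman["F1"], woman["F2"]), 3),
)
```

With other assumed values the size of the effect will differ; the point is only the comparison between absolute and relative measures.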

Influencing parameters and instruments for vowel normalization

Formant ratio

Fig. 2: Spectrogram of a man and a woman saying “cat”. The three lowest vowel formants (vocal tract resonant frequencies) are marked as F1, F2 and F3 (Johnson 2004)

Influencing parameters and instruments for vowel normalization

F0

Miller 1953: doubled F0 and found a vowel category shift for most American English vowels

Fujisaki & Kawashima 1968: found F1 boundary shifts from 100 Hz to 200 Hz for F0 shifts of 200 Hz

Influencing parameters and instruments for vowel normalization

Extrinsic (syllable external) and intrinsic (syllable internal) factors:

- Context (extrinsic): vocalic, prosodic, tonal
- Formant ratio: relative patterns
- F0: gender
- Visual information: gender / age, articulatory gestures
- Auditory gestalts

Influencing parameters and instruments for vowel normalization

Visual information

- Gender: boundary shift much like the F0 shift (Strand & Johnson 1996)
- Age
- Vowel quality: boundary shift through differing visual phonetic information (Johnson 1999)
- Sociocultural: speech intelligibility is reduced when the voice is associated with an Asian-looking face (Rubin 1992)

Influencing parameters and instruments for vowel normalization

Auditory gestalts - “secondary cues”

Duration

Formant frequency movement trajectories:

- Lehiste & Meltzer 1973:
  - Fixed-duration vowels synthesized with steady-state formant frequencies: 51% correct
  - Mixed lists of the original vowels from men, women and children: 79% correct
- Hillenbrand & Nearey 1999:
  - Flat-formant vowels were correctly identified 74% of the time, while vowels synthesized with the original formant frequency trajectories were correctly identified 89% of the time.

Contents

1. Speech and speaker normalization in vowel normalization: definition

2. Influencing parameters and instruments for vowel normalization

3. Theories

3.1 Vocal tract normalization (VTN)

3.2 Talker normalization (TN)

4. Studies: Johnson 1990 and 1999

5. Recapitulation

Theories - VTN

“Vocal tract normalization theories consider that listeners perceptually evaluate vowels on a talker specific coordinate system.” (Johnson 2004)

• Context vowels (reference)

• Visual information about the size of the vocal tract
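A talker-specific coordinate system can be sketched as follows. The sketch rescales each talker's formants relative to that talker's own mean and spread (per-talker z-scoring, a procedure commonly attributed to Lobanov 1971). It is swapped in purely as an illustration and is not claimed to be the mechanism Johnson (2004) describes. The function name `talker_normalize`, the formant values, and the uniform scale factor of 1.2 for the second talker are all hypothetical.

```python
from statistics import mean, pstdev


def talker_normalize(tokens):
    """Z-score F1 and F2 within one talker.

    tokens maps a vowel label to an (F1, F2) pair in Hz for a single talker.
    """
    f1_values = [f1 for f1, _ in tokens.values()]
    f2_values = [f2 for _, f2 in tokens.values()]
    f1_mean, f1_sd = mean(f1_values), pstdev(f1_values)
    f2_mean, f2_sd = mean(f2_values), pstdev(f2_values)
    return {
        vowel: ((f1 - f1_mean) / f1_sd, (f2 - f2_mean) / f2_sd)
        for vowel, (f1, f2) in tokens.items()
    }


# Hypothetical talker A (values in Hz, assumed for illustration only).
talker_a = {"i": (270.0, 2290.0), "a": (730.0, 1090.0), "u": (300.0, 870.0)}

# Hypothetical talker B: the same vowels with every formant scaled by 1.2,
# a crude stand-in for a shorter vocal tract.
talker_b = {v: (1.2 * f1, 1.2 * f2) for v, (f1, f2) in talker_a.items()}

norm_a = talker_normalize(talker_a)
norm_b = talker_normalize(talker_b)

for vowel in talker_a:
    print(vowel, norm_a[vowel], norm_b[vowel])
```

In this constructed case the two talkers differ only by a uniform scale factor, so their normalized values coincide. Differences between talkers that go beyond such a shift or rescaling would remain after this kind of normalization.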

Theories - VTN

But: Talkers may differ from each other at the level of their articulatory habits of speech:

“Perception may not be able to depend on vocal tract normalization to “remove” talker differences by removing vocal tract differences.” (Johnson 2004)

Does speaker/speech variation depend on anatomical differences only?

Theories - VTN

Cross-linguistic gender differences

Bladon, Henton and Pickering (1984):

- The differences between men and women vary from language to language.
- Cultural factors are involved in defining and shaping male or female speech.
- Anatomy does not completely determine the vowel formant frequencies.

Theories - VTN

Fig. 3: Spectral shift needed to normalize male and female spectra. From Bladon, Henton & Pickering (1984)

Theories - VTN

“This seems to suggest that talkers choose different styles of speaking as social, dialectal gender markers.

A speaker normalization that removes vocal tract differences will fail to account for the linguistic categorical similarity of vowels that are different due to different habits of articulation.”

(Johnson 2004)

Theories - TN

Talker normalization is subject to expectations:

Magnuson & Nusbaum (1994) compared 1-voice with 2-voice instructions in a mixed-talker and blocked-talker experiment.

The advantage of the blocked-talker condition disappeared when subjects didn’t know about the different F0s of the two voices.

Talker normalization is an active process:

Kato & Kakehi (1988), listener adaptation to talker voice: increase in recognition accuracy over the course of 5 stimuli presented in noise

Theories - TN

“In this approach, cognitive categories are represented as collections of the stored cognitive representations of experienced instances of the category, rather than as normalized abstract representations from which category-internal structure has been removed” (Johnson 2004)
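The exemplar idea in this quotation can be sketched in a few lines. In the sketch, each category is simply the set of stored tokens, and a new token is labelled by summing a similarity score over all stored exemplars of each category. The exponential similarity function, the `sensitivity` parameter, the function name `classify`, and all formant values are assumptions chosen for illustration, not details taken from Johnson (2004).

```python
import math
from collections import defaultdict

# Stored exemplars: (category label, (F1, F2) in Hz).
# All values are hypothetical and stand in for remembered tokens
# from two different talkers per category.
exemplars = [
    ("hood", (440.0, 1020.0)),
    ("hood", (470.0, 1160.0)),
    ("hud", (640.0, 1190.0)),
    ("hud", (760.0, 1400.0)),
]


def classify(token, stored, sensitivity=0.01):
    """Label a token by summed similarity to the stored exemplars.

    Similarity decays exponentially with Euclidean distance in F1-F2 space;
    no talker normalization is applied before the comparison.
    """
    activation = defaultdict(float)
    for label, formants in stored:
        distance = math.dist(token, formants)
        activation[label] += math.exp(-sensitivity * distance)
    return max(activation, key=activation.get)


print(classify((460.0, 1050.0), exemplars))
print(classify((650.0, 1200.0), exemplars))
```

Because the stored tokens keep their talker-specific detail, the category itself carries the variation that a normalization step would otherwise have to remove.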

Contents

1. Speech and speaker normalization in vowel normalization: definition

2. Influencing parameters and instruments for vowel normalization

3. Theories

4. Studies

4.1 Johnson 1990

4.2 Johnson 1999

5. Recapitulation

Studies

“The role of perceived speaker identity in F0 normalization of vowels” (Johnson 1990)

Presentation of vowels from a “hood”-“hud” continuum in two different intonational contexts, which were judged to have been produced by different speakers even though the F0 of the test word was identical in the two contexts.

Shift in identification as a result of the intonational context, which was interpreted as evidence for the role of perceived speaker identity in vowel normalization

Studies

“Auditory-visual integration of talker gender in vowel perception” (Johnson 1999)

Exp. 1 found that the gender of auditory-visually presented stimuli shifts the phoneme boundary of a vowel continuum

Exp. 2 found that visual phonetic information is integrated in the boundary shift

Exp. 3 showed that listeners integrate abstract gender information with phonetic information in speech perception

Contents

1. Speech and speaker normalization in vowel normalization: definition

2. Influencing parameters and instruments for vowel normalization

3. Theories

4. Studies: Johnson 1990 and 1999

5. Recapitulation

Recapitulation

- Strong internal and external influences on the perception (of vowels)

- Explanation must integrate repeated learning

- Information on speaker identity influences the perception (of vowels)

- But: Is the perception of speaker identity influenced by certain components of the speech signal?

- Can speaker identity be manipulated?

References

Bladon, R. A., Henton, C. G. & Pickering, J. B. (1984) Towards an auditory theory of speaker normalization. Language & Communication 4, 59-69.

Fujisaki, H. & Kawashima, T. (1968) The roles of pitch and higher formants in the perception of vowels. IEEE Transactions on Audio and Electroacoustics AU-16, 73-77.

Hillenbrand, J. M. & Nearey, T. M. (1999) Identification of synthesized /hVd/ utterances: Effects of formant contour. J. Acoust. Soc. Am. 105, 3509-3523.

Ladefoged, P. & Broadbent, D. E. (1957) Information conveyed by vowels. J. Acoust. Soc. Am. 29, 98-104.

Leather, J. (1983) Speaker normalization in the perception of lexical tone. Journal of Phonetics 11, 373-382.

Lehiste, I. & Meltzer, D. (1973) Vowel and speaker identification in natural and synthetic speech. Language and Speech 16, 356-364.

Johnson, K., Strand, E. A. & D’Imperio, M. (1999) Auditory-visual integration of talker gender in vowel perception. Journal of Phonetics 27, 359-384.

Johnson, K. (2004) Speaker normalization in speech perception. Ohio State University.

Johnson, K. (1990) The role of perceived speaker identity in F0 normalization of vowels. J. Acoust. Soc. Am. 88, 642-654.

Kato, K. & Kakehi, K. (1988) Listener adaptability to individual speaker differences in monosyllabic speech perception. J. Acoust. Soc. of Japan 44, 180-186.

Magnuson, J. & Nusbaum, H. (1994) Are representations used for talker identification available for talker normalization? Proceedings of the International Conference on Spoken Language Processing.

Miller, R. L. (1953) Auditory tests with synthetic vowels. J. Acoust. Soc. Am. 25, 114-121.

Peterson, G. E. & Barney, H. L. (1952) Control methods used in the study of vowels. J. Acoust. Soc. Am. 24, 175-184.

Rubin, D. L. (1992) Non-language factors affecting undergraduates’ judgments of non-native English-speaking teaching assistants. Research in Higher Education 33, 4.

Strand, E. A. & Johnson, K. (1996) Gradient and visual speaker normalization in the perception of fricatives. In Natural language processing and speech technology: results of the 3rd KONVENS conference, Bielefeld, (D. Gibbon, Ed.), Berlin: Mouton de Gruyter (pp. 14-26).