Modelling Polish Intonation for Speech Synthesis

Dominika Oliver

23 May 2002

Plan

Aims & Objectives

Reasons

Methodology

Building TTS systems

Basic building blocks:

Pre-processing: analysis of raw and labelled text into identifiable words. Text normalisation (abbreviations, dates, money, time indications, addresses, telephone numbers, bank accounts, etc.), tokenisation, mapping tokens to words, resolving mark-up languages.

Linguistic module: from words to segments. Orthographic-to-phonetic conversion of words (morphological analysis, grapheme-to-phoneme (g2p) conversion, syllabification, stress assignment). Sentence analysis (resolving pronunciation ambiguities; syntactic, lexical and semantic analysis).
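To make the pre-processing step concrete, here is a toy Python sketch of text normalisation for Polish input. The abbreviation table, the digit-by-digit reading of numbers and all function names are illustrative assumptions, not part of any real system; a real normaliser would also have to choose inflected forms (case, gender) and read ordinary numbers as numerals.

```python
import re

# Hypothetical abbreviation table (illustrative only; real Polish
# normalisation must also pick the correct grammatical case).
ABBREVIATIONS = {"np.": "na przykład", "ul.": "ulica", "tel.": "telefon"}

# Polish names of the digits 0-9, used to read digit strings one by one,
# as is common for telephone and bank-account numbers.
DIGITS = ["zero", "jeden", "dwa", "trzy", "cztery",
          "pięć", "sześć", "siedem", "osiem", "dziewięć"]

def tokenize(text):
    """Split on whitespace, keeping abbreviation dots attached to tokens."""
    return text.split()

def normalise_token(token):
    """Map one token to a list of pronounceable words."""
    if token in ABBREVIATIONS:
        return ABBREVIATIONS[token].split()
    if re.fullmatch(r"\d+", token):
        return [DIGITS[int(d)] for d in token]
    return [token]

def normalise(text):
    words = []
    for token in tokenize(text):
        words.extend(normalise_token(token))
    return words

print(normalise("tel. 112"))
# → ['telefon', 'jeden', 'jeden', 'dwa']
```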

Building TTS systems (cd)

Phonetic module: F0 and durations (and anything else appropriate for waveform synthesis). Prosodic modelling: generation of the intonation contour by an intonation model (prosodic phrase, accent and F0 prediction).

Acoustic module (waveform synthesis): conversion into a digital speech signal, from segments, F0 and durations to a waveform. There are many techniques for this: concatenative synthesis (diphone, unit selection), formant synthesis and articulatory synthesis.
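The modules above form a chain in which each stage consumes the previous stage's output. The sketch below shows only that data flow; every stage is a hypothetical stand-in (letters treated as phones, a fixed duration, a linearly declining F0, and a plain sine tone in place of any real waveform synthesis technique).

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    phone: str        # segment label
    duration: float   # seconds
    f0: float         # Hz, one flat value per segment in this toy

def preprocess(text):
    """Pre-processing stand-in: lower-case and split into words."""
    return text.lower().split()

def linguistic(words):
    """Linguistic-module stand-in: treats every letter as one phone (no real g2p)."""
    return [ch for word in words for ch in word]

def phonetic(phones, start_f0=120.0, end_f0=100.0, dur=0.08):
    """Phonetic-module stand-in: fixed durations, F0 declining linearly."""
    n = len(phones)
    step = (start_f0 - end_f0) / max(n - 1, 1)
    return [Segment(p, dur, start_f0 - step * i) for i, p in enumerate(phones)]

def acoustic(segments, rate=16000):
    """Acoustic-module stand-in: a sine tone at each segment's F0."""
    samples, phase = [], 0.0
    for seg in segments:
        for _ in range(round(seg.duration * rate)):
            phase += 2 * math.pi * seg.f0 / rate
            samples.append(math.sin(phase))
    return samples

segments = phonetic(linguistic(preprocess("Ala ma")))
waveform = acoustic(segments)
```

Swapping any one stand-in for a real component (a g2p module, a prosody model, a diphone synthesiser) leaves the interfaces between stages unchanged, which is the point of the modular design.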

Terminology

Stress - a lexically specified distinction between strong and weak syllables; a stressed syllable is louder and longer than an unstressed one

Tone - a lexically specified pitch movement; a property of a syllable

Accent - a post-lexical pitch movement, linked to a stressed syllable

Pitch accent - a lexical pitch movement; a property of a word

Intonation in TTS

Intonation prediction can be split into two tasks:

Prediction of accents (and/or tones): this is done on a per-syllable basis, identifying which syllables are to be accented and, if the theory requires it, what type of accent each receives.

Realisation of the F0 contour: given the accents/tones, generate an F0 contour.
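As an illustration of the first task only, the sketch below predicts accents with a deliberately crude rule set. It assumes (i) the Polish default of lexical stress on the penultimate syllable, (ii) a small hypothetical list of unaccented function words, and (iii) that the last accent in the phrase is the nucleus. It is not the model proposed in this work; a realisation step would take its output as input.

```python
# Hypothetical, incomplete list of function words that receive no accent.
FUNCTION_WORDS = {"to", "jest", "w", "na", "i", "że", "się"}

def stressed_index(syllables):
    """Polish default stress: penultimate syllable (the only one for monosyllables)."""
    return max(len(syllables) - 2, 0)

def predict_accents(words):
    """words: list of words, each a list of syllables.

    Returns (syllable, label) pairs, label being None, "pre" or "nuc".
    """
    out = []
    for word in words:
        accent_at = None
        if "".join(word).lower() not in FUNCTION_WORDS:
            accent_at = stressed_index(word)
        for i, syl in enumerate(word):
            out.append([syl, "pre" if i == accent_at else None])
    # The last accented syllable in the phrase becomes the nuclear accent.
    for item in reversed(out):
        if item[1] is not None:
            item[1] = "nuc"
            break
    return [tuple(item) for item in out]

phrase = [["To"], ["jest"], ["naj", "lep", "sza"], ["po", "ra"], ["dnia"]]
print(predict_accents(phrase))
```

On this phrase the rules accent "lep" and "po" as pre-nuclear and "dnia" as nuclear.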

Why is it important?

In rendering natural-sounding speech from raw text, one of the many tasks is generating natural-sounding intonation.

A number of intonation theories have been used in various systems to perform this task.

As the quality of speech synthesis improves, greater demands are placed on the intonation system to produce more varied intonation tunes.

Models of Intonation

Linear or tone-sequence models - generate values from left to right as a sequence of values or movements:

British school - based on auditory analysis
Pierrehumbert 1980 - predominantly acoustic analysis
Dutch school ('t Hart, Collier and Cohen 1990) - perceptual data
Tilt (Taylor 1998) - phonetic

Superpositional or hierarchical models - generate a contour by modelling factors separately (phone, syllable, word, phrase, sentence) and then combining the partial models:

Fujisaki 1983, Grønnum 1992, Möbius et al. 1993
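Fujisaki's model is the classic superpositional case: log F0 is the sum of a baseline, phrase components (impulse responses of a critically damped second-order system) and accent components (step responses, capped at a ceiling). The sketch below implements those standard equations; the default constants (alpha = 3/s, beta = 20/s, gamma = 0.9) are commonly quoted values, and the command times and amplitudes in the usage line are arbitrary.

```python
import math

def phrase_component(t, alpha=3.0):
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_component(t, beta=20.0, gamma=0.9):
    """Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds):
    """F0 in Hz at time t (s).

    ln F0(t) = ln Fb + sum Ap * Gp(t - T0) + sum Aa * (Ga(t - T1) - Ga(t - T2))
    fb: baseline F0 (Hz); phrase_cmds: [(Ap, T0)]; accent_cmds: [(Aa, T1, T2)].
    """
    ln_f0 = math.log(fb)
    for ap, t0 in phrase_cmds:
        ln_f0 += ap * phrase_component(t - t0)
    for aa, t1, t2 in accent_cmds:
        ln_f0 += aa * (accent_component(t - t1) - accent_component(t - t2))
    return math.exp(ln_f0)

# One phrase command at 0 s and one accent command from 0.3 s to 0.6 s,
# sampled every 10 ms for 1.5 s.
contour = [fujisaki_f0(i / 100, 100.0, [(0.5, 0.0)], [(0.4, 0.3, 0.6)])
           for i in range(150)]
```

Because the components add in the log domain, each factor can be modelled separately and the partial contours simply combined, which is what distinguishes this family from the linear models.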

Techniques of intonation modelling: using Tilt & ToBI

Tilt and ToBI typify two major classes of intonation systems.

Tilt comes from a data-driven approach, attempting to form an abstraction of the natural contour directly from the waveform.

ToBI takes a more linguistic or phonological approach, specifying a small set of discrete labels which identify the intonational space of accents and tones.

Both also serve as prosodic labelling systems.

ToBI (Pierrehumbert, 1980)

Autosegmental-metrical approach: pitch movements are decomposed into pitch levels.

Intonation phrases are modelled as sequences of high (H) and low (L) pitch levels.

ToBI offers a well-defined intonation phonology for labelled speech. It is the most widely available standard labelling system.

The ToBI labelling system itself does not define a mechanism to go from the labels to an F0 contour, or the reverse. However, there are both hand-written rule systems (e.g. M. Anderson, J. Pierrehumbert and M. Liberman 1984) and statistically trained methods (e.g. A. Black and A. Hunt 1996) which do this task.

Machine readable. Increased descriptive power: transcriptions can be compared across dialects and languages (ToBI for English, GToBI for German, SCToBI for Serbo-Croatian, ToDI for Dutch, etc.).

Tilt (Taylor 1998)

Tilt is a phonetic model of intonation that represents intonation as a sequence of continuously parameterised events (pitch accents or boundary tones).

These parameters, called tilt parameters, are determined directly from the F0 contour.

They are: duration, amplitude and tilt.

Tilt imposes no categorical classification on events.

Tilt (cd)

Duration is the sum of the rise and fall durations.

Amplitude is the sum of the magnitudes of the rise and fall amplitudes.

Tilt parameter - expresses the overall shape of the event: the difference between the rise and fall amplitudes divided by their sum.

The tilt parameter ranges from -1 to 1: -1 is a pure fall, 1 a pure rise, and 0 equal portions of rise and fall.
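The three quantities can be computed from the rise and fall of an event as follows. The tilt described on the slide corresponds to the amplitude ratio (`tilt_amp` below); Taylor's published formulation also defines a duration ratio and averages the two, which is what this sketch returns as `tilt`. Function and argument names are illustrative.

```python
def tilt_parameters(a_rise, a_fall, d_rise, d_fall):
    """Return (amplitude, duration, tilt) for one intonational event.

    a_rise, a_fall: F0 excursions of the rise and fall (Hz; magnitudes are used)
    d_rise, d_fall: durations of the rise and fall (seconds)
    """
    ar, af = abs(a_rise), abs(a_fall)
    amplitude = ar + af
    duration = d_rise + d_fall
    # Amplitude tilt: difference of the amplitudes divided by their sum.
    tilt_amp = (ar - af) / amplitude if amplitude else 0.0
    # Duration tilt: the same ratio computed on the durations.
    tilt_dur = (d_rise - d_fall) / duration if duration else 0.0
    # Combined tilt: the average of the two ratios, still within [-1, 1].
    tilt = 0.5 * tilt_amp + 0.5 * tilt_dur
    return amplitude, duration, tilt

print(tilt_parameters(40, 0, 0.2, 0))       # pure rise
print(tilt_parameters(20, 20, 0.1, 0.1))    # symmetric rise-fall
```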

Examples of intonation control

Information provided by intonation: focus or given/new information, emotions, word emphasis, syntactic disambiguation.

Examples from MARY TTS (DFKI): Gehen wir nach Hause!/? ('Let's go home!/?')

Der Zug fährt nach Frankfurt, oder? ('The train goes to Frankfurt, doesn't it?')

Ist die Nummer 180? Nein, die Nummer ist 100 80. ('Is the number 180? No, the number is 100 80.')

Prosodic Labelling Systems

ToBI (Tones and Break Indices)

ToBI is an intonational labelling standard for speech databases, based to some extent on Janet Pierrehumbert's thesis (Pierrehumbert 1980).

Labels are assigned on the basis of the speech waveform and the F0 trace. The labelling scheme consists of four tiers:
(1) orthographic tier - the words spoken
(2) break-index tier - the degree of juncture between words
(3) tone tier - the intonation
(4) miscellaneous tier - comments

Prosodic Labelling Systems

ToBI (cd)

Discrete pitch accent types: H*, H+!H*, L*, L*+H and L+H*

Phrase accent types: H- and L-

Boundary tones (L%, H%), shown with the preceding phrase accent: L-L%, L-H%, H-L% and H-H%

Break indices: 0, 1, 3 and 4 (2 reserved for special cases)
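A ToBI transcription maps naturally onto parallel tiers. The sketch below stores one record per word and checks labels against the inventory listed above (writing the downstepped accent as H+!H*). The class and function names are hypothetical, time alignment is omitted, and the checks are simplified: real ToBI also allows downstepped !H*, uncertainty marks and break-index diacritics. The tone labels on the example utterance are chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PITCH_ACCENTS = {"H*", "L*", "L+H*", "L*+H", "H+!H*"}
PHRASE_ACCENTS = {"H-", "L-"}
PHRASE_FINAL = {"L-L%", "L-H%", "H-L%", "H-H%"}
ALL_TONES = PITCH_ACCENTS | PHRASE_ACCENTS | PHRASE_FINAL
BREAK_INDICES = {0, 1, 2, 3, 4}

@dataclass
class Word:
    text: str                                       # orthographic tier
    break_index: int                                # break-index tier
    tones: List[str] = field(default_factory=list)  # tone tier
    comment: Optional[str] = None                   # miscellaneous tier

def validate(words: List[Word]) -> List[str]:
    """Return a list of error messages (empty if the transcription passes)."""
    errors = []
    for w in words:
        if w.break_index not in BREAK_INDICES:
            errors.append(f"{w.text}: bad break index {w.break_index}")
        for tone in w.tones:
            if tone not in ALL_TONES:
                errors.append(f"{w.text}: unknown tone {tone}")
        # Simplification: an intonation-phrase break must carry a final tone.
        if w.break_index == 4 and not any(t in PHRASE_FINAL for t in w.tones):
            errors.append(f"{w.text}: break index 4 without a phrase-final tone")
    if words and words[-1].break_index != 4:
        errors.append("utterance does not end with break index 4")
    return errors

utterance = [
    Word("Marianna", 1, ["H*"]),
    Word("made", 1),
    Word("the", 0),
    Word("marmalade", 4, ["H*", "L-L%"]),
]
print(validate(utterance))  # → []
```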

Prosodic Labelling Systems (cd)

Tilt

A Tilt labelling of an utterance consists of a sequence of four basic intonational event types: pitch accents, boundary tones, connections and silence (labelled a, b, c and sil).

Polish synthesis (examples)

What is available: Festival (University of Edinburgh, CSTR)

RealSpeak (ScanSoft)

Spiker (IVO Software)

SynTalk (Neurosoft)

Polish intonation model

British school (Jassem 1984, Demenko 1999)

The description of accent and intonation at the linguistic level is based on the main features of a British-English system developed essentially by O'Connor and Arnold (1973) and Jassem (1984): an intonational phrase is defined as a sequence of (optional) pre-nuclear, (obligatory) nuclear and (optional) post-nuclear accents.

O'Connor & Arnold: [prehead [head [[nucleus] tail]]]

Jassem: [anacrusis] [[prenuclear intonation [nuclear intonation]]]

e.g. To jest naj'lepsza 'pora "dnia. ('This is the best time of day.')

To jest naj'lepsza po"radnia. ('This is the best advice centre.')

"Co mówiłeś? ('What were you saying?')
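The O'Connor & Arnold bracketing can be read off a phrase annotated like the examples above. In this sketch the input format is an assumption: a list of syllables, with ' prefixed to pre-nuclear accented syllables and " to the nuclear one. The hypothetical `split_phrase` function returns the prehead, head, nucleus and tail.

```python
PRE, NUC = "'", '"'

def split_phrase(syllables):
    """Split a marked syllable list into (prehead, head, nucleus, tail).

    Syllables may be prefixed with ' (pre-nuclear accent) or " (nuclear accent).
    """
    marks = [s[0] if s and s[0] in (PRE, NUC) else "" for s in syllables]
    bare = [s.lstrip(PRE + NUC) for s in syllables]
    nuc = marks.index(NUC)  # raises ValueError if the phrase has no nucleus
    first = next(i for i, m in enumerate(marks) if m)  # first accented syllable
    return bare[:first], bare[first:nuc], bare[nuc:nuc + 1], bare[nuc + 1:]

print(split_phrase(["To", "jest", "naj", "'lep", "sza", "'po", "ra", '"dnia']))
print(split_phrase(['"Co', "mó", "wi", "łeś"]))
```

The first phrase has an empty tail, while in the question the nucleus falls on the first syllable, so there is no prehead or head and everything after it is tail.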

Intro - Polish intonation structure

A Polish phrase contains only one ictic accent, also referred to as the nuclear accent.

Pre-ictic accents are referred to as pre-nuclear accents, and post-ictic accents as post-nuclear accents.

The pre-nuclear and nuclear accents are mainly determined by specific pitch relations, whilst the post-nuclear accent (if any) is essentially durational.

Page 21: Modelling Polish Intonation for Speech Synthesis

Intro - Polish intonation structure (cd)

Two classes of pre-nuclear accents: H (high) and L (low)

Nine classes of nuclear accents have been distinguished: HL, ML, xL, HM, LM, MH, MM, LH and LHL, where H is high, M medium, L low and xL extra-low, relative to the particular speaker's average and mean low pitch; e.g. LH means "rising from low to high".

e.g. ``Znowu ten wariat. (HL) ('That lunatic again.')

,, Znowu ten wariat? (LH) ('That lunatic again?')
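To generate F0 from such labels, each class can be read as a sequence of pitch levels and each level mapped to a target relative to the speaker's mean pitch. The semitone offsets below are hypothetical placeholders, not values from Jassem or Demenko, and the class list takes LH (used in the example) as one of the nine.

```python
import re

# The nine nuclear classes, taking LH (used in the example) as one of them.
NUCLEAR_CLASSES = {"HL", "ML", "xL", "HM", "LM", "MH", "MM", "LH", "LHL"}

# Hypothetical level offsets in semitones relative to the speaker's mean F0.
LEVEL_SEMITONES = {"H": 4.0, "M": 0.0, "L": -4.0, "xL": -7.0}

def parse_nuclear_class(label):
    """Split a class label such as 'LHL' or 'xL' into its pitch levels."""
    if label not in NUCLEAR_CLASSES:
        raise ValueError(f"unknown nuclear class {label!r}")
    return re.findall(r"xL|H|M|L", label)

def nuclear_targets_hz(label, speaker_mean_hz):
    """One F0 target (Hz) per level, in temporal order."""
    return [speaker_mean_hz * 2 ** (LEVEL_SEMITONES[lv] / 12)
            for lv in parse_nuclear_class(label)]

print(nuclear_targets_hz("LH", 200.0))
```

With these placeholder offsets, a speaker with a mean of 200 Hz gets about 159 Hz followed by about 252 Hz for LH. Working in semitones keeps the same class description usable across speakers with different ranges.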

Platform

Festival is a speech synthesis system developed at the Centre for Speech Technology Research (CSTR), University of Edinburgh.

Multilingual text-to-speech (English, Spanish, German, Welsh, Catalan, Polish); allows the addition of new languages.

Synthesis research and development environment. Development tools support extracting information from speech databases in a form suitable for building models (accent prediction, F0 generation, duration, vowel reduction, homograph disambiguation, phrase break assignment and unit selection).

Free software

Platform (cd) - direct route from research to use

Multilingual text-to-speech: for those who have little interest in the internal workings of the system and just want speech output.

Synthesis for language systems: for applications that generate text from known forms. In this type of system, telephone numbers, addresses, etc. can be explicitly marked, and the language and even intonational forms can be specified. This form of access requires more knowledge of the synthesis internals, but not of its low-level details.

Synthesis development environment: in this mode, new synthesis modules (intonation, waveform synthesisers, etc.) can be developed and compared in a software environment that provides the right basic tools, so that development can concentrate on the theory rather than the implementation.

Intonation in Festival

Task:

Prediction of accents & realisation of the F0 contour

Method:

Statistical and rule-based

Tilt, ToBI

Intonation in Festival (cd)

ToBI: accent and boundary types are predicted by a CART (classification and regression tree), and the F0 contour is then generated by a statistically trained method.

Three F0 values are predicted for each syllable: at the start, mid-vowel and end. They are predicted using linear regression on a number of features, including ToBI accent type, phrase position and syllable position, with context.

Although a three-point prediction system cannot capture all the variability in natural intonation, it has been found experimentally to be sufficient to produce reasonable F0 contours (Black 1998).
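The three-point scheme amounts to one linear model per target point applied to the same feature vector. The sketch below uses a hypothetical feature set (one-hot accent type plus relative position in the phrase) and invented weights purely to show the mechanics; it is not Festival's feature set or its trained coefficients. The resulting targets would then be joined into a continuous contour, for example by linear interpolation.

```python
# Hypothetical feature set and weights, chosen for illustration only.
ACCENTS = ["none", "H*", "L*", "L+H*", "L*+H", "H+!H*"]

MODELS = {  # one weight vector per target: intercept, 6 accent flags, position
    "start": [110.0, 0.0, 5.0, -5.0, 10.0, 0.0, 10.0, -20.0],
    "mid":   [110.0, 0.0, 30.0, -15.0, 25.0, -10.0, 15.0, -20.0],
    "end":   [110.0, 0.0, 10.0, -10.0, 20.0, 10.0, 0.0, -20.0],
}

def syllable_features(accent, index, n_syllables):
    """Feature vector: intercept, one-hot accent type, relative position in phrase."""
    vec = [1.0]
    vec += [1.0 if accent == a else 0.0 for a in ACCENTS]
    vec.append(index / max(n_syllables - 1, 1))
    return vec

def predict_targets(vec, models=MODELS):
    """Apply one linear model per target point; returns F0 values in Hz."""
    return {point: sum(w * x for w, x in zip(weights, vec))
            for point, weights in models.items()}

def predict_phrase(accents):
    """Three F0 targets (start, mid, end) for every syllable in the phrase."""
    n = len(accents)
    return [predict_targets(syllable_features(a, i, n))
            for i, a in enumerate(accents)]

targets = predict_phrase(["H*", "none", "none", "none", "none"])
```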

Intonation in Festival (cd)

The Tilt intonation theory takes a bottom-up approach. Its intention is to build a parameterisation of the F0 contour that is abstract enough to be predictable in a text-to-speech system.

It has been shown that a good representation of a natural F0 contour can be made automatically from the raw signal (though it is better if the accents and boundaries are hand-labelled). Dusterhoff 1997 further shows how that parameterisation can be predicted from text.
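Going the other way, a Tilt event can be turned back into an F0 curve by splitting amplitude and duration into rise and fall parts and drawing each part with a smooth monotonic shape. This sketch assumes the amplitude and duration tilts are equal, so a single tilt value inverts both, and uses a piecewise-quadratic rise/fall shape. The function names and the `start_f0` argument (the F0 at the start of the event) are illustrative; this is not Festival's own code.

```python
def tilt_to_rise_fall(amplitude, duration, tilt):
    """Invert the tilt equations, assuming amplitude tilt == duration tilt."""
    a_rise = amplitude * (1 + tilt) / 2
    a_fall = amplitude * (1 - tilt) / 2
    d_rise = duration * (1 + tilt) / 2
    d_fall = duration * (1 - tilt) / 2
    return a_rise, a_fall, d_rise, d_fall

def half_shape(t, a, d):
    """Monotonic movement from 0 to a over d seconds (piecewise quadratic)."""
    if d <= 0 or t >= d:
        return a
    if t <= 0:
        return 0.0
    x = t / d
    return 2 * a * x * x if x < 0.5 else a - 2 * a * (1 - x) ** 2

def event_f0(t, start_f0, amplitude, duration, tilt):
    """F0 (Hz) at time t (s) from the start of one tilt event."""
    a_rise, a_fall, d_rise, d_fall = tilt_to_rise_fall(amplitude, duration, tilt)
    if t <= d_rise:
        return start_f0 + half_shape(t, a_rise, d_rise)
    return start_f0 + a_rise - half_shape(t - d_rise, a_fall, d_fall)

# A symmetric rise-fall accent: 50 Hz excursion over 0.4 s.
curve = [event_f0(i / 100, 100.0, 50.0, 0.4, 0.0) for i in range(41)]
```

With tilt 0 the event rises 25 Hz over 0.2 s and falls back over the next 0.2 s; tilt 1 or -1 degenerates to a pure rise or pure fall.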

Future work: pilot study

Immediate plans: ToBI description of the Polish intonation phrase (Polish intonation database, Karpiński 2000)

Future work: synthesis assessment (visually impaired users)

Potential applications: free Polish-English talking dictionary (EU project)