Generation of F0 Contours Using a Model-Constrained Data-Driven Method

Atsuhiro Sakurai (Texas Instruments Japan, Tsukuba R&D Center)
Nobuaki Minematsu (Dept. of Comm. Eng., The Univ. of Tokyo, Japan)
Keikichi Hirose (Dept. of Frontier Eng., The Univ. of Tokyo, Japan)
Corpus-Based Intonation Modeling

• Rule-based approach: ad-hoc rules derived from experience
  – Human-dependent, labor-expensive
• Corpus-based approach: a mapping from linguistic to prosodic features, statistically derived from a database
  – Automatic, with the potential to improve as larger corpora become available
  – The F0 model: a parametric model that reduces degrees of freedom and improves learning efficiency
F0 Model

\ln F_0(t) = \ln F_{\min} + \sum_{i=1}^{I} A_{p_i} G_{p_i}(t - T_{0i}) + \sum_{j=1}^{J} A_{a_j} \{ G_{a_j}(t - T_{1j}) - G_{a_j}(t - T_{2j}) \}

G_{p_i}(t) = \begin{cases} \alpha_i^2 \, t \exp(-\alpha_i t) & (t \ge 0) \\ 0 & (t < 0) \end{cases}

G_{a_j}(t) = \begin{cases} \min\left[1 - (1 + \beta_j t)\exp(-\beta_j t),\ \gamma_j\right] & (t \ge 0) \\ 0 & (t < 0) \end{cases}
F0 Model
• Characteristics of the F0 Model:
– Direct representation of physical F0 contours
– Relatively good correspondence with syntactic structure
– Ability to express an F0 contour with a small number of parameters
Better training efficiency by reducing degrees of freedom
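As an illustration, the superpositional model above can be evaluated directly in code. This is a minimal sketch: the constants `alpha`, `beta`, and `gamma` are commonly cited illustrative values, not the ones used in this work.

```python
import math

def fujisaki_f0(t, ln_fmin, phrase_cmds, accent_cmds,
                alpha=3.0, beta=20.0, gamma=0.9):
    """Evaluate ln F0(t) of the superpositional F0 model.

    phrase_cmds: list of (Ap, T0)      -- magnitude and onset of each phrase command
    accent_cmds: list of (Aa, T1, T2)  -- amplitude, onset, and reset of each accent command
    """
    def Gp(x):
        # phrase control mechanism: impulse response of a second-order system
        return alpha * alpha * x * math.exp(-alpha * x) if x >= 0 else 0.0

    def Ga(x):
        # accent control mechanism: step response, ceiling-limited at gamma
        return min(1.0 - (1.0 + beta * x) * math.exp(-beta * x), gamma) if x >= 0 else 0.0

    ln_f0 = ln_fmin
    for Ap, T0 in phrase_cmds:
        ln_f0 += Ap * Gp(t - T0)
    for Aa, T1, T2 in accent_cmds:
        ln_f0 += Aa * (Ga(t - T1) - Ga(t - T2))
    return ln_f0
```

Before any command onset the contour sits at the baseline ln Fmin; each phrase command adds a decaying hump and each accent command a plateau between its onset and reset.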
Training/Generation Mechanism

1) Training phase: linguistic features + F0 model parameters from the prosodic database are fed to the training module, which produces the intonation model.
2) Generation phase: the intonation model maps linguistic features to F0 model parameters.
Parameter Prediction Using a Neural Network

• Neural networks are good at non-linear mappings
• The generalizing ability of neural networks can deal with imperfect or inconsistent databases (prosodic databases labeled by hand)
• Feedback loops can be used to capture the relation between accentual phrases (partial recurrent networks)
Neural Network Structure

[Figure: three architectures, each with input, hidden, and output layers — (a) Elman network (context layer fed back from the hidden layer), (b) Jordan network (state layer fed back from the output layer), (c) multi-layer perceptron (MLP).]
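As a concrete illustration of architecture (a), here is a minimal Elman-style forward pass in NumPy. The layer sizes, random initialization, and linear output layer are assumptions for the sketch, not the configuration reported in the paper; the input and output dimensions merely echo the eight input features and six output features listed on the following slides.

```python
import numpy as np

rng = np.random.default_rng(0)

class ElmanNet:
    """Minimal Elman (simple recurrent) network: the hidden activation
    computed for the previous accentual phrase is fed back through a
    context layer at the next step."""

    def __init__(self, n_in, n_hidden, n_out):
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.context = np.zeros(n_hidden)  # copy of the previous hidden state

    def step(self, x):
        h = np.tanh(self.W_in @ x + self.W_ctx @ self.context)
        self.context = h.copy()  # context layer stores h for the next phrase
        return self.W_out @ h    # linear output: predicted F0-model parameters

net = ElmanNet(n_in=8, n_hidden=10, n_out=6)
# one step per accentual phrase; earlier phrases influence later predictions
outputs = [net.step(rng.normal(size=8)) for _ in range(3)]
```

The context layer is what distinguishes this from the plain MLP in (c): it lets the prediction for one accentual phrase depend on the phrases before it.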
Input Features

Input feature                                  Max. value
Position of accentual phrase within utterance  18
Number of morae in accentual phrase            15
Accent type of accentual phrase                9
Number of words in accentual phrase            8
Part-of-speech of first word                   2
Conjugation form of first word                 17
POS category of last word                      2
Conjugation form of last word                  17
Input Features - Example

Chiisana unagiyani nekkinoyoona monoga minagiru (小さな うなぎ屋に 熱気のような ものが みなぎる)

For "unagiyani":
  Position of accentual phrase within utterance: 2
  No. of morae in accentual phrase: 5
  Accent type of accentual phrase: 0
  No. of words in accentual phrase: 2
  POS, conjugation type/category of first word: noun/0
  POS, conjugation type/category of last word: particle/0
Output Features

[Figure: phrase and accent commands (Ap, Aa) on a time axis with command times t0, t1, t2, mora boundaries tA, tB, tC, tD, the accent nucleus, and the resulting waveform.]

Output feature
Phrase command magnitude (Ap)
Accent command amplitude (Aa)
Phrase command delay (t0off = tA - t0)
Delay of accent command onset (t1off = tA - t1 or tB - t1)
Delay of accent command reset (t2off = tC - t2)
Phrase command flag

tA, tB, tC, tD: mora boundaries; t0, t1, t2: F0 model parameters
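Because the network predicts relative delays rather than absolute times, the absolute command times have to be recovered from the mora boundaries. The helper below is hypothetical (not from the paper); it simply inverts the offset definitions above.

```python
def command_times(tA, tB, tC, t0_off, t1_off, t2_off, onset_ref="tA"):
    """Recover absolute F0-model command times from the predicted delays.

    t0 = tA - t0off                       (phrase command onset)
    t1 = tA - t1off  or  tB - t1off       (accent command onset)
    t2 = tC - t2off                       (accent command reset)

    tA, tB, tC are mora boundaries; onset_ref selects which boundary
    the accent-onset delay was measured from.
    """
    t0 = tA - t0_off
    t1 = (tA if onset_ref == "tA" else tB) - t1_off
    t2 = tC - t2_off
    return t0, t1, t2
```

Predicting boundary-relative delays keeps the output features in a small, learnable range regardless of where the phrase sits in the utterance.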
Parameter Prediction Using Binary Regression Trees

• Background
  – Neural networks provide no additional information on the modeling
  – Binary regression trees provide human interpretability
  – The knowledge obtained from binary regression trees could be used as feedback in other kinds of modeling
• Outline
  – Input and output features equal to the neural network case
  – Tree-growing stop criterion: minimum number of examples per leaf node
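To make the stop criterion concrete, here is a minimal, self-contained regression-tree grower in plain Python (all names are hypothetical helpers, not the authors' implementation). It refuses any split that would leave fewer than `min_leaf` examples in a node, mirroring the minimum-examples-per-leaf criterion, and predicts the leaf mean.

```python
def grow_tree(X, y, min_leaf=30):
    """Grow a binary regression tree on feature rows X (lists of numbers),
    minimizing the sum of squared errors at each split."""
    n = len(y)
    mean = sum(y) / n
    best = None  # (sse, feature index, threshold)
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= thr]
            right = [y[i] for i in range(n) if X[i][f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # stop criterion: both children need >= min_leaf examples
            sse = sum((v - sum(left) / len(left)) ** 2 for v in left) + \
                  sum((v - sum(right) / len(right)) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:  # no admissible split: make a leaf
        return {"leaf": mean}
    _, f, thr = best
    L = [i for i in range(n) if X[i][f] <= thr]
    R = [i for i in range(n) if X[i][f] > thr]
    return {"feature": f, "thr": thr,
            "left": grow_tree([X[i] for i in L], [y[i] for i in L], min_leaf),
            "right": grow_tree([X[i] for i in R], [y[i] for i in R], min_leaf)}

def predict(tree, row):
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["thr"] else tree["right"]
    return tree["leaf"]

# toy check: a step function is recovered by a single clean split
X = [[float(i)] for i in range(20)]
y = [0.0] * 10 + [1.0] * 10
tree = grow_tree(X, y, min_leaf=5)
```

Raising `min_leaf` yields smaller, more interpretable trees at the cost of coarser predictions, which is exactly the trade-off the stop-criterion experiment below explores.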
Neural network example

[Figure: utterance mhtsdj02 — waveform, phonetic labels, F0 contour (40–800 Hz), and prosodic commands, each over 0.0–3.0 s.]
Binary regression tree example

[Figure: utterance mhtsdj02 — waveform, phonetic labels, F0 contour (40–800 Hz), and prosodic commands, each over 0.0–3.0 s.]
Experimental Results (1): MSE Error for Neural Networks

MSE = \frac{1}{N} \sum_{i=1}^{N} \left[ \log F_0(i) - \log F_0'(i) \right]^2
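A minimal sketch of this error measure (mean square error between natural and generated F0 values in the log domain, sampled at N points):

```python
import math

def log_f0_mse(f0_ref, f0_gen):
    """MSE between natural (f0_ref) and generated (f0_gen) F0 contours,
    computed in the log-frequency domain as in the formula above."""
    n = len(f0_ref)
    return sum((math.log(a) - math.log(b)) ** 2
               for a, b in zip(f0_ref, f0_gen)) / n
```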
Neural net configuration  Elements in hidden layer  Mean square error
MLP                       10                        0.218
MLP                       20                        0.217
Jordan                    10                        0.220
Jordan                    20                        0.215
Elman                     10                        0.214
Elman                     20                        0.232

Best configuration: elman-10
Experimental Results (2): MSE Error for Binary Regression Trees
Stop criterion  Mean square error
10              0.215
20              0.222
30              0.210
40              0.220
50              0.217
60              0.220

Best configuration: stop-30
Experimental Results (3): Comparison with Rule-Based Parameter Prediction

Method                            MSE
Neural network (elman-10)         0.214
Binary regression tree (stop-30)  0.210
Rule set I                        0.221
Rule set II                       0.193

Rule set I: phrase and accent commands derived from rules (including phrase command flag)
Rule set II: phrase and accent commands derived from rules (excluding phrase command flag)
Experimental Results (4): Listening Tests

Number of listeners: 8
Number of sentences

            Neural network  Binary regression trees  Rule-based
Preference  28              39                       13
Conclusions

• Advantages of data-driven intonation modeling:
  – No need for ad-hoc expertise
  – Fast and straightforward learning
• Difficulties:
  – Prediction errors
  – Difficulty in finding cause-effect relations for prediction errors
• Future work:
  – Explore other learning methods
  – Address the data scarcity problem