18
Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Embed Size (px)

Citation preview

Page 1: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Speech Parameter Generation From HMM Using Dynamic Features

Keiichi Tokuda, Takao Kobayashi, Satoshi Imai

ICASSP 1995

Reporter: Huang-Wei Chen

Page 2: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Introduction • From the fact of the HMM can model sequence of speech

spectra by well-defined algorithm, and can be applied to speech recognition systems, we surmise that, if there is a method for speech parameter generation from HMMs, it will be useful for speech synthesis by rule.

• For example, it is feasible to synthesis speech with var-ious voice by using speaker adaptation technique in HMM-based speech recognition, and synthesis units can be selected automatically based on the model clustering and splitting methods used in HMM-based speech recognition.

Page 3: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Introduction • From this point of view, this paper proposes an algorithm

for speech parameter generation from continuous HMMs which include the dynamic features.

• It is shown that the parameter generation from HMMs using the dynamic features results in searching for the optimum state sequence and solving a set of linear equ-ation for each possible state sequence.

• We derive a fast algorithm for the solution by analogy of the RLS algorithm* for adaptive filtering.

*RLS algorithm: The Recursive Least Squares (RLS) adaptive filter is an algorithm which recursively finds the filter coefficients that minimize a weighted linear least squares cost function relating to the input signals. -Wikipedia

Page 4: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Problem • Let be the vector sequence of speech parameter and be

the state sequence of an HMM .• In this paper, we assume that the vector of speech para-

meter at frame t consist of the static feature vector (e.g., cepstral coefficients) and the dynamic feature vector (e.g., delta cepstral coefficients), that is,

where

and is defined as

Page 5: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Problem• To simplify the discussion, we assume that and are

statistically independent.• The problem is to determine the parameter sequence

which maximizes

for a given HMM .• However, since the problem is difficult to solve, we

consider the optimum sequence in a similar manner of the Viterbi algorithm, that is , we maximize

with respect to . Since we have to determine q and c simul-taneously, in contrast to the Viterbi algorithm the dynamic programming methods cannot be used.

Page 6: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Solution of the problem • To solve the problem, first we consider maximizing for a

given state sequence q with respect to c. The probability is written as

• For given q, maximizing with respect to c is equivalent to maximizing with respect to c be-cause the probability is not depend on O.

• The probability is written by

Page 7: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Solution of the problem• Without loss of generality we assume that the distribut-

ions of are single Gaussian mixture because mixture components can be considered to be a special form of sub-state in which the transition probabilities are the mixture weights. Therefore, the output probability at state j is given by

where and are the M-by-1 mean vector and the M-by-M covariance matrix of at state j respectively; are those of respectively, and denotes the Gaussian distribution.

Page 8: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Solution of the problem• Thus, the logarithm of is written as

where

Page 9: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Solution of the problem

and denotes the M-by-M identity matrix. We assume that

where denotes the M-by-1 zero vector.

Page 10: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Solution of the problem• To maximize with respect to c, by setting , we obtain a

set of equations

where

• For direct solution , we need operations on the assumption that . When and are diagonal, it becomes .

• To obtain q and c which maximize , we have to solve for every possible state sequence.

Page 11: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Solution of the problem• By using special properties of , we can derive a fast

algorithm for determination of q and c.Consider replacing the value of the mean vector and covariance matrix, , at a frame t with . The corresponding set of equations can be written as

where

Page 12: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Solution of the problem• The relation of (1) and (2) is similar to the time update

property of the set of the equations for the RLS adaptive filtering.

• Consequently, we can derive a fast algorithm which obtains from recursively by the analogy of the deri-vation of the standard RLS algorithm.

• To replace with we can use following equations instead of equations (3)-(5):

Page 13: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Solution of the problem• The algorithm is summarized in the following table:

1. Set D, d, w by (3)-(5) to replace with

2. Set D, d, w by (6)-(8) to replace with

3. Substitute and obtained by the previous iteration to c and P, respectively, and calculate

• It is noted that • The computational complexity of the algorithm

becomes , when and are diagonal, the complexity is .

Page 14: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Solution of the problem-flow chat• By using recursive algorithm, we can search for the optimum

state sequence keeping c optimal in the sense that is maximized with respect to c.

• For each state sequence, we can use the recursive algorithm instead of solving directly, therefore, the computation complexity is reduced.

• The overall procedure for parameter generation from HMMs is as following:1. Solve the set of equation for an initial state sequence, and obtain

c and P.

2. Replace the state of a frame t with according to a certain strategy, and use the propose algorithm to obtain and .

3. If the value of is not smaller than that of , discard the replacement.

4. Repeat 2 & 3 until a certain condition is satisfied.

Page 15: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Solution of the problem-in the beginning

• For the initial state sequence, can be solved as follows:1. On the assumption that and for , the solution of is given

as and .

2. By putting the values of and , back with the original value for using the algorithm T times, we can obtain c and P for the initial state sequence.

• The initial state sequence should be given appropriately; a reasonable way is to select the initial state sequence which maximizes . And for a given T, we can obtain such sequence by using Viterbi algorithm.

Page 16: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Speech synthesis based on HMM• We suppose that mel-cepstrum is used as speech parameter.

However the LPC-derived mel-cepstrum is not proper for synthesizing speech because it does not represent the original spectrum obtained by the LPC analysis.

• To synthesize speech, we require pitch information besides spectral information. Consequently, speech parameters in the proposed algorithm should include pitch information of speech signal.

• To generate high quality speech we have to investigate the issues such as the choice of model size (number of states), choice of output distribution (number of mixtures, diagonal or full covariance matrix), and the choice of units of model (phoneme or syllable, context-dependent or independent).

Page 17: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Speech synthesis based on HMM• In the conventional HMMs, the probability of state

occupancy decreases exponentially with time. This type of state duration probability does not provide an adequate representation of the temporal structure of speech.

• To control temporal structure appropriately, we should use HMMs with state duration models.

Page 18: Speech Parameter Generation From HMM Using Dynamic Features Keiichi Tokuda, Takao Kobayashi, Satoshi Imai ICASSP 1995 Reporter: Huang-Wei Chen

Example • Without the dynamic

features, the parameter sequence which max-imizes becomes a sequence of the mean vectors: .

• From the example, it is seen that incorporating the dynamic features is essential to generate speech parameters from HMMs.