[IEEE 2009 Second International Conference on Developments in eSystems Engineering (DESE) - Abu Dhabi, UAE (2009.12.14-2009.12.16)] 2009 Second International Conference on Developments


Error Concealment of EVRC Speech Decoder using Residual Redundancy

Ahmed J. Jameel, Hadeel Adnan, You Xiaohu and Abir Hussain

Department of Telecommunications Engineering, College of Engineering, Ahlia University, Bahrain

National Mobile Communications Research Laboratory, Department of Radio Engineering, Southeast University, Nanjing 210096, China

[email protected]

Abstract

In digital mobile communication systems, speech coding is essential for bandwidth efficiency. Speech coding algorithms typically extract speech parameters that are highly sensitive to transmission errors. In this paper, we exploit the residual redundancy that remains after Enhanced Variable Rate Codec (EVRC) encoding for error concealment. The average residual redundancies of the quantized parameters are used in the error concealment process as a priori knowledge of the source. Simulation results show that error concealment improves the SNRs of these parameters, especially for the Pitch delay, Delta delay and Fixed Codebook (FCB) gain. The results also show that the more redundancy an encoded parameter contains, the greater the improvement obtained with the error concealment scheme.

1. Introduction

The Enhanced Variable Rate Codec (EVRC) is specified in the standard "Speech Service Option 3 for Wideband Spread Spectrum Digital Systems" [5], which has been employed in both IS-95 cellular systems and ANSI J-STD-008 PCS (Personal Communications Systems). The EVRC coder is a multi-rate ACELP coder based on the relaxation CELP (RCELP) paradigm [1]. Unlike a conventional CELP codec, RCELP matches a modified residual signal, namely a time-warped version of the original residual that conforms to a simplified pitch contour. As a result, pitch information is transmitted once per frame rather than once per subframe, so more bits can be allocated to fixed codebook encoding and to channel coding. However, at high compression rates the encoded bit stream becomes extremely vulnerable to errors, and the synthesized speech at the receiver may suffer intolerable degradation, especially under poor channel conditions. Error control techniques such as error correction and error concealment are therefore needed to guarantee high quality of the decoded speech.

Softbit speech decoding (SBSD) [2] is an error concealment approach that reduces the subjective effects of residual bit errors, i.e., errors that have not been eliminated by channel decoding. In [2], SBSD is applied to PCM, ADPCM and GSM coded speech. In this paper, we apply SBSD to EVRC Rate 1 encoded speech and exploit the residual redundancies [3,4] of the considered parameters as a priori knowledge. In the rest of this paper, the terms error concealment and SBSD are used interchangeably.

This paper is organized as follows. Section 2 analyzes the residual redundancy of the EVRC parameters. Section 3 describes the error concealment procedure. Section 4 presents simulation results, and Section 5 concludes the paper.

2. EVRC Residual Redundancy

Each EVRC frame contains 10 Line Spectral Pair (LSP) parameters, which model the short-term spectrum of the signal and are quantized with a weighted split vector quantizer. EVRC coding also uses adaptive and fixed codebooks, which model the voiced and unvoiced components of the speech excitation, respectively. The adaptive codebook is represented by a 7-bit pitch delay, a 5-bit delta delay, and three 3-bit adaptive codebook gains per frame. Similarly, the fixed codebook is represented by three 35-bit shapes and three 5-bit gains. The bit allocation for each set of parameters is shown in Table 1 [5].

Table 1: Bit Allocations of Rate 1.

| Parameter | Number of bits |
|---|---|
| LPCFLAG | 1 |
| LSP | 6 + 6 + 9 + 7 = 28 |
| Pitch Delay | 7 |
| Delta Delay | 5 |
| ACB Gain | 3 × 3 = 9 |
| FCB Shape | 3 × 35 = 105 |
| FCB Gain | 3 × 5 = 15 |
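As a quick arithmetic check on Table 1, the per-field allocations can be summed to obtain the number of bits per Rate 1 frame that the table accounts for. The dictionary below is only a transcription of Table 1; its name and the helper function are illustrative, not part of the EVRC specification.

```python
# Bit allocation of an EVRC Rate 1 frame, transcribed from Table 1.
RATE1_BITS = {
    "LPCFLAG": 1,
    "LSP": 6 + 6 + 9 + 7,   # four split-VQ indices
    "Pitch Delay": 7,
    "Delta Delay": 5,
    "ACB Gain": 3 * 3,      # three 3-bit gains per frame
    "FCB Shape": 3 * 35,    # three 35-bit shapes per frame
    "FCB Gain": 3 * 5,      # three 5-bit gains per frame
}


def total_bits(allocation):
    """Sum the bits of all fields listed in a bit-allocation table."""
    return sum(allocation.values())


if __name__ == "__main__":
    for name, bits in RATE1_BITS.items():
        print(f"{name:12s} {bits:4d}")
    print(f"{'Total':12s} {total_bits(RATE1_BITS):4d}")
```

The fields listed in Table 1 add up to 170 bits, of which the FCB shapes alone take 105.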

2009 Second International Conference on Developments in eSystems Engineering

978-0-7695-3912-6/09 $26.00 © 2009 IEEE

DOI 10.1109/DeSE.2009.53


Our goal is to quantify the residual redundancy in the CELP parameters present in every frame: the LSPs, the pitch delay, the delta delay, the adaptive codebook (ACB) gains and the fixed codebook (FCB) gains. To do this, we estimate the entropy rate of these parameters. Because the parameters have different bit lengths, for consistency we quantify the redundancy in the three most significant bits (MSBs) of each EVRC parameter.

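For each set of parameters, let the random process $\{U_{i,j}\}$ represent the three most significant bits of the $i$-th (quantized) EVRC parameter in the $j$-th frame, and let $\mathbf{U}_j = [U_{1,j}, U_{2,j}, \ldots, U_{l,j}]$, where $l$ denotes the number of parameters per frame. We assume that the process $\{\mathbf{U}_j\}_{j=1}^{\infty}$ is block stationary; the per-frame entropy rate of this process is then given by

$$H(\mathbf{U}) = \lim_{j \to \infty} H(\mathbf{U}_j \mid \mathbf{U}_{j-1}, \ldots, \mathbf{U}_1) \qquad (1)$$

where

$$H(\mathbf{U}_j \mid \mathbf{U}_{j-1}, \ldots, \mathbf{U}_1) = -\sum_{\mathbf{u}_1, \ldots, \mathbf{u}_j} P(\mathbf{U}_j = \mathbf{u}_j, \ldots, \mathbf{U}_1 = \mathbf{u}_1) \log_2 P(\mathbf{U}_j = \mathbf{u}_j \mid \mathbf{U}_{j-1} = \mathbf{u}_{j-1}, \ldots, \mathbf{U}_1 = \mathbf{u}_1).$$

Here, $H(\mathbf{U})$ represents the minimum number of bits per frame required to describe $\{\mathbf{U}_j\}_{j=1,2,\ldots}$ without incurring distortion; alternatively, if the process is encoded at a rate $R$ bits/frame, then the quantity $R - H(\mathbf{U})$ is the residual redundancy incurred by the encoding.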

To estimate the residual redundancy of the EVRC parameters, a probabilistic model for their generation is needed. Following [3], we model the parameters as a stationary first-order Markov chain and compute the entropy rate from the relative frequencies of transitions. We assume that the parameters in different frames are independent:

$$P(\mathbf{U}_j = \mathbf{u}_j \mid \mathbf{U}_{j-1} = \mathbf{u}_{j-1}, \ldots, \mathbf{U}_1 = \mathbf{u}_1) = P(\mathbf{U}_j = \mathbf{u}_j) \qquad (2)$$

and that, within a frame,

$$P(U_{i,j} = u_{i,j} \mid U_{i-1,j} = u_{i-1,j}, \ldots, U_{1,j} = u_{1,j}) = P_A^{(i)}(u_{i,j} \mid u_{i-1,j}) \qquad (3)$$

for $i = 2, 3, \ldots, l$ and $j = 1, 2, \ldots$. Note that for $i = 1$, equation (3) becomes $P_A^{(1)}(u_{1,j})$. This assumption is supported by a number of observations.
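Under assumptions (2) and (3), the entropy rate (1) takes a simple form. The short derivation below is our own illustration, using only the chain rule of entropy, of how the model simplifies (1); it is not a formula stated elsewhere in this paper:

$$
\begin{aligned}
H(\mathbf{U}) &= \lim_{j \to \infty} H(\mathbf{U}_j \mid \mathbf{U}_{j-1}, \ldots, \mathbf{U}_1) = H(\mathbf{U}_j) && \text{by (2) and stationarity} \\
&= \sum_{i=1}^{l} H(U_{i,j} \mid U_{i-1,j}, \ldots, U_{1,j}) && \text{chain rule} \\
&= H(U_{1,j}) + \sum_{i=2}^{l} H(U_{i,j} \mid U_{i-1,j}) && \text{by (3)}
\end{aligned}
$$

Since conditioning cannot increase entropy, this is at most $\sum_{i=1}^{l} H(U_{i,j})$, the entropy the frame would have if its parameters were independent.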

First, in practice, detected errors are often masked by repeating or interpolating parameter values from adjacent frames. Second, the LSP parameters within one EVRC frame are ordered (LSP1 < LSP2 < … < LSP10), which suggests intraframe dependency between the LSPs.

Third, there is little change in the observed entropy rate of the process if it is computed assuming a higher-order Markov process, e.g., a second- or third-order Markov chain [4].

Ultimately, the most important judge of the "fit" of this model is the performance of a decoder designed around it, which [4] shows to be quite good.

To estimate the residual redundancy of the various parameters, a large training speech database was used: the EVRC algorithm was run on every frame of speech to obtain the parameters. The relative frequencies of transitions between the values of the three high-order bits of each parameter were compiled to obtain the Markov transition probabilities, and the entropy of the resulting Markov chains was computed to estimate the redundancy of each parameter in each frame. Let $H^*$ be the process entropy rate (in bits/frame) if the parameters were independent ($H^*$ does not depend on $j$ because the process is block stationary). Note that we can write

$$H^* = \sum_{i=1}^{l} H(U_{i,j})$$

and

$$T = D + M,$$

where $D = 36 - H^*$ denotes the frame redundancy due to the non-uniform distribution of the parameters and $M = H^* - H(\mathbf{U})$ denotes the frame redundancy due to the memory between the parameters. Their sum $T = 36 - H(\mathbf{U})$ is the total residual redundancy per frame.
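The estimation procedure described above (relative transition frequencies of the 3-bit MSB indices, followed by entropies) can be sketched for a single parameter as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the input is assumed to be a sequence of integer indices in {0, ..., 7} ordered as the first-order Markov model prescribes, and plug-in (maximum-likelihood) entropy estimates are used. The per-symbol quantity `bits - H*` mirrors the frame-level $D = 36 - H^*$.

```python
import math
from collections import Counter


def entropy_bits(counts):
    """Plug-in entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def markov_redundancy(indices, bits=3):
    """Estimate (D, M, T) in bits/symbol for a sequence of `bits`-bit indices.

    D: redundancy due to the non-uniform marginal distribution (bits - H*).
    M: redundancy due to first-order memory (H* - H(U_k | U_{k-1})).
    T: total residual redundancy, D + M.
    """
    if len(indices) < 2:
        raise ValueError("need at least two symbols to count transitions")
    h_star = entropy_bits(Counter(indices))  # entropy if symbols were independent
    transitions = Counter(zip(indices[:-1], indices[1:]))  # transition counts
    predecessors = Counter(indices[:-1])
    n = len(indices) - 1
    # First-order conditional entropy: -sum_{a,b} p(a, b) * log2 p(b | a)
    h_cond = -sum((c / n) * math.log2(c / predecessors[a])
                  for (a, _b), c in transitions.items())
    d = bits - h_star
    m = h_star - h_cond
    return d, m, d + m
```

For a sequence that simply alternates between two indices, $H^* = 1$ bit and every transition is deterministic, so the sketch returns D = 2, M = 1 and T = 3 bits per symbol: all three MSBs are redundant once the previous value is known.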

The results are compiled in Table 2, which gives the values of D, M and T for each individual parameter as well as for the entire frame. Table 2 shows that about 16% (5.869 of 36 bits) of the examined bits in each frame are redundant.
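The entries of Table 2 can be cross-checked: T should equal D + M, and the redundancy percentage should be T divided by the number of MSBs examined for that parameter. The MSB counts used below are our inference (three MSBs per index and 12 indices per frame, consistent with $D = 36 - H^*$), not values printed in the table, so this is a hedged consistency check rather than part of the authors' procedure.

```python
# Table 2 rows: (D, M, T, reported redundancy in %, MSBs examined per frame).
# Assumption: three MSBs per index, with 4 LSP indices, 1 pitch delay,
# 1 delta delay, 3 ACB gains and 3 FCB gains per frame (12 indices, 36 bits).
TABLE2 = {
    "LSP":         (0.035, 0.103, 0.138, 1.15, 12),
    "Pitch Delay": (0.564, 0.698, 1.262, 42.06, 3),
    "Delta Delay": (0.528, 0.629, 1.157, 38.57, 3),
    "ACB Gain":    (0.184, 0.289, 0.473, 5.25, 9),
    "FCB Gain":    (1.356, 1.483, 2.839, 31.54, 9),
}


def check_rows(table, pct_tol=0.02):
    """List rows where D + M differs from T, or T/bits differs from the reported %."""
    problems = []
    for name, (d, m, t, pct, bits) in table.items():
        if abs((d + m) - t) > 1e-9:
            problems.append((name, "D + M != T"))
        if abs(100.0 * t / bits - pct) > pct_tol:
            problems.append((name, "percentage mismatch"))
    return problems


def frame_totals(table):
    """Sum D, M, T and the examined MSBs over all parameters."""
    d = sum(row[0] for row in table.values())
    m = sum(row[1] for row in table.values())
    t = sum(row[2] for row in table.values())
    bits = sum(row[4] for row in table.values())
    return d, m, t, bits


if __name__ == "__main__":
    print(check_rows(TABLE2))
    d, m, t, bits = frame_totals(TABLE2)
    print(f"D={d:.3f} M={m:.3f} T={t:.3f} redundancy={100 * t / bits:.1f}%")
```

Under these assumed MSB counts every row agrees with the reported percentage to within rounding (a few entries look truncated rather than rounded, e.g. 1.262/3 is 42.07%), and the column sums reproduce the Total row: D = 2.667, M = 3.202, T = 5.869, or 16.3% of 36 bits.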

Table 2: Residual Redundancy (in bits/frame).

| Parameter | D | M | T | Redundancy |
|---|---|---|---|---|
| LSP | 0.035 | 0.103 | 0.138 | 1.15% |
| Pitch Delay | 0.564 | 0.698 | 1.262 | 42.06% |
| Delta Delay | 0.528 | 0.629 | 1.157 | 38.57% |
| ACB Gain | 0.184 | 0.289 | 0.473 | 5.25% |
| FCB Gain | 1.356 | 1.483 | 2.839 | 31.54% |
| Total | 2.667 | 3.202 | 5.869 | 16.3% |

3. Error Concealment Procedure

Consider the transmission of codec parameters over a noisy channel as depicted in Figure 1. At the encoder side, the speech parameters are first extracted from the input samples. At time index $k$, a specific parameter $\tilde{v}_k \in \mathbb{R}$ is first quantized according to $Q[\tilde{v}_k] = v_k^{(i)}$ with $v_k^{(i)} \in \mathcal{Q}_T$ ($\mathcal{Q}_T$: quantization table), where $i \in \{0, 1, \ldots, 2^M - 1\}$ is the quantization index of the quantized parameter $v_k^{(i)}$ and $M$ is the number of bits assigned to the quantized parameter. Using bit mapping (BM), a bit combination consisting of $M$ bits is assigned to index $i$:

$$x_k^{(i)} = \big(x_k^{(i)}(0), x_k^{(i)}(1), \ldots, x_k^{(i)}(m), \ldots, x_k^{(i)}(M-1)\big) \qquad (4)$$

with $x_k^{(i)}(m) \in \{-1, +1\}$. This bit combination is then encoded by the channel encoder and transmitted over the channel. Assume that the decoded bit combination $\hat{x}_k$ at the output of the channel decoder is

$$\hat{x}_k = \big(\hat{x}_k(0), \hat{x}_k(1), \ldots, \hat{x}_k(m), \ldots, \hat{x}_k(M-1)\big). \qquad (5)$$
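Equation (4) does not say which bit mapping BM is used. A minimal sketch, assuming a natural binary mapping with the most significant bit first and the antipodal convention binary 0 to +1, binary 1 to -1 (both are our assumptions, not taken from the EVRC specification); the function names are hypothetical:

```python
def bit_mapping(index, num_bits):
    """Map quantizer index i to a tuple x^(i) of num_bits antipodal bits (+1/-1).

    Assumption (illustrative only): natural binary code, MSB first,
    binary 0 -> +1 and binary 1 -> -1.
    """
    if not 0 <= index < 2 ** num_bits:
        raise ValueError("index out of range for num_bits")
    bits = []
    for m in range(num_bits):
        b = (index >> (num_bits - 1 - m)) & 1  # m = 0 is the most significant bit
        bits.append(1 if b == 0 else -1)
    return tuple(bits)


def codebook(num_bits):
    """All 2^M bit combinations, listed in order of quantizer index i."""
    return [bit_mapping(i, num_bits) for i in range(2 ** num_bits)]
```

For M = 3, index 5 (binary 101) maps to (-1, +1, -1). Any other one-to-one mapping (e.g. a Gray code) would work with the decoding equations that follow, as long as encoder and decoder agree on it.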

Figure 1. Softbit Speech Decoding Technique. (Block diagram: speech encoder with quantizer Q and bit mapping BM; channel; softbit speech decoder with parameter transition probability, a posteriori probability, a priori knowledge and estimator blocks.)

To perform error concealment, the channel decoder must be a soft-output decoder. In our research, the SOVA algorithm is used to decode a rate-1/2 turbo code with constraint length 5, and the L-value of bit $x_k^{(i)}(m)$ at the output of the decoder is

$$L_k(m) = \log \frac{P\big(x_k^{(i)}(m) = +1 \mid \tilde{Y}\big)}{P\big(x_k^{(i)}(m) = -1 \mid \tilde{Y}\big)} \qquad (6)$$

where $x_k^{(i)}(m)$ denotes the transmitted bit and $\tilde{Y}$ is the received sequence. The hard decision of bit $\hat{x}_k(m)$ is given by

$$\hat{x}_k(m) = \operatorname{sign}\big[L_k(m)\big] \qquad (7)$$

and the bit error probability is defined as

$$p_{e,k}(m) = \frac{1}{1 + \exp\big(|L_k(m)|\big)}. \qquad (8)$$

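Thus, the probability of transition from a transmitted bit $x_k^{(i)}(m)$ to the known decoded bit $\hat{x}_k(m)$ is

$$P\big(\hat{x}_k(m) \mid x_k^{(i)}(m)\big) = \begin{cases} 1 - p_{e,k}(m) & \text{if } \hat{x}_k(m) = x_k^{(i)}(m) \\ p_{e,k}(m) & \text{if } \hat{x}_k(m) \neq x_k^{(i)}(m) \end{cases} \qquad (9)$$

If we consider the channel to be memoryless, the transition probability from any bit combination $x_k^{(i)}$, $i \in \{0, 1, \ldots, 2^M - 1\}$, to the decoded bit combination $\hat{x}_k$ is

$$P\big(\hat{x}_k \mid x_k^{(i)}\big) = \prod_{m=0}^{M-1} P\big(\hat{x}_k(m) \mid x_k^{(i)}(m)\big). \qquad (10)$$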
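Equations (7) to (10) turn the decoder's L-values for one parameter into the channel term $P(\hat{x}_k \mid x_k^{(i)})$ for every candidate index. The sketch below assumes the bit combinations are supplied as a list of ±1 tuples indexed by $i$ (the `codebook` argument, a hypothetical input) and that the sign convention of (6) holds, i.e. a positive L-value favors +1; an L-value of exactly 0 is mapped to +1, which is a tie-breaking choice of this sketch.

```python
import math


def hard_decision(l_values):
    """Eq. (7): x_hat(m) = sign(L(m)); L = 0 is mapped to +1 in this sketch."""
    return tuple(1 if l >= 0 else -1 for l in l_values)


def bit_error_prob(l_value):
    """Eq. (8): p_e(m) = 1 / (1 + exp(|L(m)|))."""
    return 1.0 / (1.0 + math.exp(abs(l_value)))


def channel_transition_probs(l_values, codebook):
    """Eqs. (9)-(10): P(x_hat | x^(i)) for each bit combination x^(i) in codebook."""
    x_hat = hard_decision(l_values)
    p_e = [bit_error_prob(l) for l in l_values]
    probs = []
    for x_i in codebook:
        p = 1.0
        for m, (x_bit, x_hat_bit) in enumerate(zip(x_i, x_hat)):
            # Eq. (9) per bit; the product over m is eq. (10) (memoryless channel)
            p *= (1.0 - p_e[m]) if x_bit == x_hat_bit else p_e[m]
        probs.append(p)
    return probs
```

With L-values of 0 every bit is a coin flip ($p_e = 0.5$), so all four 2-bit combinations get probability 0.25; with large L-values of opposite sign the combination (+1, -1) dominates. When the codebook contains all $2^M$ combinations, the returned probabilities sum to 1.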

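For a memoryless channel, the a posteriori probability of the transmitted bit combination with index $i$ is

$$P\big(x_k^{(i)} \mid \hat{x}_k, \hat{X}_{k-1}\big) = \frac{P\big(\hat{x}_k \mid x_k^{(i)}\big)\, P\big(x_k^{(i)} \mid \hat{X}_{k-1}\big)\, P\big(\hat{X}_{k-1}\big)}{P\big(\hat{x}_k \mid \hat{X}_{k-1}\big)\, P\big(\hat{X}_{k-1}\big)} = \frac{P\big(\hat{x}_k \mid x_k^{(i)}\big)\, P\big(x_k^{(i)} \mid \hat{X}_{k-1}\big)}{P\big(\hat{x}_k \mid \hat{X}_{k-1}\big)} \qquad (11)$$

where $\hat{X}_{k-1} = (\hat{x}_0, \ldots, \hat{x}_{k-1})$ are the decoded bit combinations from time index 0 to $k-1$. Since we treat each quantized parameter as a first-order Markov source, we have

$$P\big(x_k^{(i)} \mid \hat{X}_{k-1}\big) = \sum_{j=0}^{2^M-1} P\big(x_k^{(i)} \mid x_{k-1}^{(j)}\big)\, P\big(x_{k-1}^{(j)} \mid \hat{X}_{k-1}\big) = \sum_{j=0}^{2^M-1} P\big(x_k^{(i)} \mid x_{k-1}^{(j)}\big)\, P\big(x_{k-1}^{(j)} \mid \hat{x}_{k-1}, \hat{X}_{k-2}\big) \qquad (12)$$

and

$$P\big(\hat{x}_k \mid \hat{X}_{k-1}\big) = \sum_{l=0}^{2^M-1} P\big(\hat{x}_k \mid x_k^{(l)}\big)\, P\big(x_k^{(l)} \mid \hat{x}_{k-1}, \hat{X}_{k-2}\big). \qquad (13)$$

Hence (11) can be expressed as

$$P\big(x_k^{(i)} \mid \hat{x}_k, \hat{X}_{k-1}\big) = \frac{P\big(\hat{x}_k \mid x_k^{(i)}\big) \sum_{j=0}^{2^M-1} P\big(x_k^{(i)} \mid x_{k-1}^{(j)}\big)\, P\big(x_{k-1}^{(j)} \mid \hat{x}_{k-1}, \hat{X}_{k-2}\big)}{\sum_{l=0}^{2^M-1} P\big(\hat{x}_k \mid x_k^{(l)}\big) \sum_{j=0}^{2^M-1} P\big(x_k^{(l)} \mid x_{k-1}^{(j)}\big)\, P\big(x_{k-1}^{(j)} \mid \hat{x}_{k-1}, \hat{X}_{k-2}\big)} \qquad (14)$$

and the a posteriori probability of any bit combination at time index $k$ can be derived by recursive computation.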

The transition probability $P\big(x_k^{(i)} \mid x_{k-1}^{(j)}\big)$ is a priori knowledge of the source; it depends on the inter-frame correlation of the encoded bit sequence. We used a large training sequence to obtain the statistical values of $P\big(x_k^{(i)} \mid x_{k-1}^{(j)}\big)$.

Once the a posteriori probabilities $P\big(x_k^{(i)} \mid \hat{x}_k, \hat{X}_{k-1}\big)$ have been computed, the parameter value itself can be estimated. The estimation error criterion is in principle arbitrary, but it should reflect the impact of parameter errors on subjective speech quality. For a wide range of speech codec parameters, the minimum mean square error (MMSE) criterion is appropriate. We use the MMSE criterion to estimate the transmitted parameters as follows:

$$\hat{v}_k = \sum_{i=0}^{2^M-1} v_k^{(i)}\, P\big(x_k^{(i)} \mid \hat{x}_k, \hat{X}_{k-1}\big) \qquad (15)$$
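One recursion step of (12) to (15) can be written compactly: propagate the previous a posteriori probabilities through the a priori transition matrix (12), weight the result by the channel term (10), normalize by (13) as in (14), and form the MMSE estimate (15). This is a hedged sketch, not the authors' implementation: `transition[j][i]` is assumed to hold $P(x_k^{(i)} \mid x_{k-1}^{(j)})$ as estimated from training data, `channel[i]` holds $P(\hat{x}_k \mid x_k^{(i)})$, and `values` is the quantization table; all names are hypothetical.

```python
def sbsd_step(prev_posterior, transition, channel, values):
    """One softbit-decoding step for a first-order Markov parameter.

    prev_posterior[j] = P(x_{k-1}^(j) | x_hat_{k-1}, X_hat_{k-2})
    transition[j][i]  = P(x_k^(i) | x_{k-1}^(j))   (a priori knowledge)
    channel[i]        = P(x_hat_k | x_k^(i))        (eq. 10)
    values[i]         = v_k^(i), entries of the quantization table
    Returns (posterior, v_est).
    """
    n = len(values)
    # Eq. (12): a priori prediction P(x_k^(i) | X_hat_{k-1})
    prior = [sum(prev_posterior[j] * transition[j][i] for j in range(n))
             for i in range(n)]
    # Numerator of eq. (14)
    unnormalized = [channel[i] * prior[i] for i in range(n)]
    # Eq. (13): normalization constant P(x_hat_k | X_hat_{k-1})
    norm = sum(unnormalized)
    if norm <= 0.0:
        raise ValueError("channel and prior probabilities have no overlap")
    posterior = [u / norm for u in unnormalized]
    # Eq. (15): MMSE estimate as the a posteriori mean of the table values
    v_est = sum(v * p for v, p in zip(values, posterior))
    return posterior, v_est
```

At $k = 0$ there is no history; passing the marginal distribution $P(x_0^{(i)})$ as `prev_posterior` together with an identity matrix as `transition` makes the prior equal to that marginal. When the channel term is uninformative (all entries equal), the posterior falls back to the a priori prediction, which is exactly how residual redundancy conceals errors on a bad channel.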

4. Simulation Results

To measure speech distortion, we use the segmental signal-to-noise ratio (SSNR) and the log spectral distance (LSD) [6]:


$$\mathrm{SSNR} = \frac{1}{N_f} \sum_{i=0}^{N_f-1} 10 \log_{10} \frac{\sum_{n=0}^{N-1} s_i^2(n)}{\sum_{n=0}^{N-1} \big(s_i(n) - \hat{s}_i(n)\big)^2} \qquad (16)$$

and

$$\mathrm{LSD} = \frac{1}{N_f} \sum_{i=0}^{N_f-1} \int_0^{\pi} \Big(\log S_i(\omega) - \log \hat{S}_i(\omega)\Big)\, d\omega \qquad (17)$$

where $N_f$ denotes the total number of frames, $N$ the number of speech samples in each frame, $s_i(n)$ and $\hat{s}_i(n)$ the original and reconstructed speech in the $i$-th frame, and $S_i(\omega)$ and $\hat{S}_i(\omega)$ the corresponding power spectral density functions, respectively. Under these criteria, bit errors in the important (i.e., error-sensitive) parameters can cause a severe SSNR reduction and LSD increase, while errors in other code bits result in little distortion.
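The segmental SNR of (16) is straightforward to compute from time-domain samples. A minimal sketch, assuming equal-length signals given as lists of floats and a frame length of 160 samples (20 ms at 8 kHz) as an illustrative default; unlike (16), which averages over all $N_f$ frames, this sketch skips frames whose signal or error energy is zero because their log ratio is undefined. The LSD of (17) additionally needs a spectral estimate per frame and is not covered here.

```python
import math


def segmental_snr(original, decoded, frame_len=160):
    """Eq. (16): average over frames of 10*log10(signal energy / error energy).

    Samples that do not fill a whole frame are ignored, and frames with zero
    signal or zero error energy are skipped (a choice of this sketch).
    """
    if len(original) != len(decoded):
        raise ValueError("signals must have equal length")
    frame_snrs = []
    for start in range(0, len(original) - frame_len + 1, frame_len):
        s = original[start:start + frame_len]
        s_hat = decoded[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, s_hat))
        if signal_energy > 0.0 and error_energy > 0.0:
            frame_snrs.append(10.0 * math.log10(signal_energy / error_energy))
    if not frame_snrs:
        raise ValueError("no frame with finite SNR")
    return sum(frame_snrs) / len(frame_snrs)
```

Because the average is taken over per-frame dB values, one frame at 20 dB and one at 40 dB give an SSNR of 30 dB, whereas a global SNR over both frames would be dominated by the noisier frame.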

The signal-to-noise ratio of each parameter protected by error concealment (parameter SNR, PSNR) is measured for channel qualities $E_b/N_0$ ranging from -3.0 to 3.0 dB, where $E_b$ is the energy per information bit and $N_0/2$ is the double-sided noise power spectral density. All results are based on an additive white Gaussian noise (AWGN) channel, and the PSNR results are shown in Figures 2 and 3. At low $E_b/N_0$, error concealment improves the parameter SNR by 0.8-1 dB (equivalently, 0.5-1.6 dB in $E_b/N_0$) for the Pitch delay and FCB Gain, by 0.5-0.7 dB (0.5-1.2 dB in $E_b/N_0$) for the LSPs, by 0.9-1.5 dB (0.5-1.5 dB in $E_b/N_0$) for the Delta delay, and by 0.7-1.3 dB (0.6-1.4 dB in $E_b/N_0$) for the ACB Gain. In Figure 4, speech quality is evaluated by log spectral distortion; at poor channel quality, error concealment improves the LSD of EVRC encoded speech by 1-1.8 dB.
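To reproduce the channel setting, the AWGN noise level follows from $E_b/N_0$ and the code rate. A minimal sketch, assuming antipodal (BPSK) signalling with unit symbol energy, which the paper does not state explicitly: with code rate $R$, $E_s = R E_b$, so the per-dimension noise variance is $\sigma^2 = N_0/2 = 1/(2 R\, E_b/N_0)$, and the channel L-value of a received sample $y$ is $2y/\sigma^2$. These are the soft inputs to the channel decoder, not the decoder-output L-values of (6); the function names are hypothetical.

```python
import math
import random


def noise_std(ebn0_db, code_rate):
    """Per-dimension noise standard deviation for unit-energy BPSK at Eb/N0 (dB)."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return math.sqrt(1.0 / (2.0 * code_rate * ebn0))


def awgn_channel(symbols, ebn0_db, code_rate, rng=random):
    """Add white Gaussian noise to +/-1 symbols and return channel L-values 2y/sigma^2."""
    sigma = noise_std(ebn0_db, code_rate)
    received = [x + rng.gauss(0.0, sigma) for x in symbols]
    return [2.0 * y / sigma ** 2 for y in received]
```

For the rate-1/2 code at $E_b/N_0 = 0$ dB this gives $\sigma = 1$; across the simulated range of -3 to 3 dB, $\sigma$ falls from about 1.41 to about 0.71. Passing a seeded `random.Random` instance as `rng` makes runs repeatable.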

5. Conclusion

In this paper, we have studied error concealment for the most important parameters of the EVRC speech decoder. The residual redundancies of these parameters are analyzed and exploited in the error concealment process as a priori knowledge of the source. Simulation results show that error concealment improves the SNRs of these parameters, especially for the Pitch delay, Delta delay and Fixed Codebook (FCB) gain. The results also show that the more redundancy an encoded parameter contains, the greater the improvement obtained with the error concealment scheme. This is because greater residual redundancy means stronger correlation between the encoded parameters, and hence a higher probability that some information about a corrupted parameter can be recovered by observing its neighbors.

6. References

[1] W. B. Kleijn, P. Kroon, and D. Nahumi, "The RCELP Speech-Coding Algorithm," European Transactions on Telecommunications, vol. 5, no. 5, pp. 573-582, Sept./Oct. 1994.

[2] T. Fingscheidt and P. Vary, "Softbit Speech Decoding: A New Approach to Error Concealment," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 3, pp. 240-251, March 2001.

[3] F. Alajaji, N. Phamdo, and T. E. Fuja, "Channel Codes that Exploit the Residual Redundancy in CELP-Encoded Speech," IEEE Transactions on Speech and Audio Processing, vol. 4, no. 5, pp. 325-336, Sept. 1996.

[4] A. J. Jameel, You Xiaohu, and Gao Xiqi, "Joint Source-Channel Decoding of EVRC Speech Encoder Using Residual Redundancy," Journal of Southeast University, vol. 18, no. 2, pp. 103-107, June 2002.

[5] IS-127, "Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems," July 19, 1996.

[6] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice Hall, 1993.

Figure 2. SNR performance of Delta delay and ACB Gain. (Parameter SNR (dB) versus Eb/N0 (dB) from -3 to 3 dB, with and without error concealment.)


Figure 3. SNR performance of LSPs, Pitch delay and FCB Gain. (Parameter SNR (dB) versus Eb/N0 (dB) from -3 to 3 dB, with and without error concealment.)

Figure 4. Log Spectral Distortion of EVRC encoded speech. (Log spectral distortion (dB) versus Eb/N0 (dB) from -3 to 3 dB, with and without error concealment.)
