Error Concealment of EVRC Speech Decoder using Residual Redundancy
Ahmed J. Jameel, Hadeel Adnan, You Xiaohu and Abir Hussain
Department of Telecommunications Engineering, College of Engineering, Ahlia University, Bahrain
National Mobile Communications Research Laboratory, Department of Radio Engineering, Southeast University, Nanjing 210096, China
[email protected]
Abstract
In digital mobile communication systems, speech coding is very important to increase the bandwidth efficiency. Usually, speech coding algorithms determine speech parameters, which are highly sensitive to transmission errors. In this paper, we use the residual redundancy remaining after using the Enhanced Variable Rate Codec (EVRC) algorithm for error concealment. Average residual redundancies of the quantized parameters are exploited in the error concealment process as a priori knowledge of the source. The simulation results show that the use of error concealment improves the SNRs of these parameters, especially for Pitch delay, Delta delay and Fixed Codebook (FCB) gain. The results also show that the more redundancy exists in the encoded parameter, the more improvement we could obtain by using the error concealment scheme.
1. Introduction
The Enhanced Variable Rate Codec (EVRC) is a standard for the “Speech Service Option 3 for Wideband Spread Spectrum Digital System”, which has been employed in both IS-95 cellular systems and ANSI J-STD-008 PCS (Personal Communications Systems). The EVRC coder is a multi-rate ACELP coder based on the relaxation CELP (RCELP) paradigm [1]. Unlike conventional CELP codecs, RCELP attempts to match a modified speech residual, namely a time-warped version of the original residual that conforms to a simplified pitch contour. As a result, the pitch information is transmitted once per frame instead of once per subframe. Consequently, more bits are allocated to the fixed codebook encoding and to the channel coding. However, at a high compression rate, the encoded bit stream becomes extremely vulnerable to errors, and the quality of the synthesized speech at the receiver may suffer intolerable degradation, especially under poor channel conditions. Therefore, error control techniques such as error correction and error concealment should be used to guarantee a high quality for the decoded speech.
Softbit speech decoding (SBSD) [2] is an error concealment approach that reduces the subjective effects of residual bit errors, i.e., the errors that have not been eliminated by channel decoding. In [2], applications of SBSD to PCM, ADPCM and GSM coded speech are studied. In this paper, we apply this error concealment technique to EVRC Rate 1 encoded speech, and the residual redundancies [3,4] are exploited as a priori knowledge of the considered parameters. In the rest of this paper, the terms error concealment and SBSD are used interchangeably.
This paper is organized as follows. In Section 2, the residual redundancy of the EVRC system is analyzed. In Section 3, the error concealment procedure is described. Section 4 concentrates on simulation results, and we conclude in Section 5.
2. EVRC Residual Redundancy
One frame of EVRC contains 10 Line Spectral Pair (LSP) parameters, which model the signal’s short-term spectrum and are quantized with a weighted split vector LSP quantizer. EVRC coding also makes use of adaptive and fixed codebooks, which model the voiced and unvoiced excitations of human speech, respectively. The adaptive codebook is represented by a 7-bit pitch delay, a 5-bit delta delay, and three 3-bit adaptive codebook gains per frame. Similarly, the fixed codebook is represented by three 35-bit shapes and three 5-bit gain parameters. The bit allocation for each set of parameters is shown in Table 1 [5].
Parameter      Number of bits
LPCFLAG        1
LSP            6+6+9+7 = 28
Pitch Delay    7
Delta Delay    5
ACB Gain       3 × 3 = 9
FCB Shape      3 × 35 = 105
FCB Gain       3 × 5 = 15
Table 1: Bit Allocations of Rate 1.
2009 Second International Conference on Developments in eSystems Engineering
978-0-7695-3912-6/09 $26.00 © 2009 IEEE
DOI 10.1109/DeSE.2009.53
Our goal is to quantify the residual redundancy in the CELP parameters that are present in every frame: the LSPs, the pitch gains, the pitch delays, the adaptive codebook gains, and fixed codebook gains; to do this we will estimate the entropy rate of these parameters. All the parameters are of different bit lengths. For consistency we chose to quantify the redundancy in the three most significant bits (MSBs) of each EVRC parameter.
For each set of parameters, let the random process $\{U_{i,j}\}_{j=1}^{\infty}$ represent the three most significant bits of the $i$-th (quantized) EVRC parameter in frame $j$, and let $U_j = [U_{1,j}, U_{2,j}, \ldots, U_{l,j}]$, where $l$ denotes the number of parameters per frame. We assume that the process $\{U_j\}_{j=1}^{\infty}$ is block stationary; then the per-frame entropy rate of this process is given by:
\[ H(U) = \lim_{j \to \infty} H(U_j \mid U_{j-1}, U_{j-2}, \ldots, U_1) \tag{1} \]
where
\[ H(U_j \mid U_{j-1}, \ldots, U_1) = -\sum_{u_1, \ldots, u_j} P(U_1 = u_1, \ldots, U_j = u_j) \, \log_2 \left[ P(U_j = u_j \mid U_{j-1} = u_{j-1}, U_{j-2} = u_{j-2}, \ldots, U_1 = u_1) \right]. \]
Here, $H(U)$ represents the minimum number of bits per frame required to describe $\{U_j\}_{j=1,2,\ldots}$ without incurring distortion; alternatively, if the process is encoded at a rate $R$, then the quantity $R - H(U)$ is the residual redundancy incurred by the encoding.
To estimate the residual redundancy of the EVRC parameters, it is necessary to provide a probabilistic model for their generation. We model the parameters as a stationary first-order Markov chain and compute the entropy rate from the relative frequencies of the transitions [3]. We assume that the parameters in different frames are independent:
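\[ P(U_j = u_j \mid U_{j-1} = u_{j-1}, U_{j-2} = u_{j-2}, \ldots, U_1 = u_1) = P(U_j = u_j), \tag{2} \]
and within a frame as
\[ P(U_{i,j} = u_{i,j} \mid U_{i-1,j} = u_{i-1,j}, U_{i-2,j} = u_{i-2,j}, \ldots, U_{1,j} = u_{1,j}) = P_A^{(i)}(u_{i,j} \mid u_{i-1,j}) \tag{3} \]
for $i = 2, 3, \ldots, l$ and $j = 1, 2, \ldots$. Note that for $i = 1$, equation (3) becomes $P_A^{(1)}(u_{1,j})$. This assumption is supported by a number of observations.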
In practice, detected errors are often masked by repeating or interpolating parameter values from adjacent frames. A second feature is the ordered nature of the LSP parameters within one EVRC frame (LSP1 < LSP2 < … < LSP10), which suggests intraframe dependency between the LSPs.
There is little change in the observed entropy rate of the process if it is computed assuming a higher-order Markov process, e.g., a second- or third-order Markov chain [4].
Ultimately, of course, the most important judge of the “fit” of this model is the performance of a decoder designed to fit this model, which we have seen in [4] is quite good.
In order to estimate the residual redundancy of the various parameters, a large training database of speech was used; every frame of speech was encoded with the EVRC algorithm to obtain the parameters. The relative frequencies of the transitions between the values of the three high-order bits of each parameter were compiled to extract Markov transition probabilities. The entropy of the resulting Markov chains was computed to arrive at an estimate of the redundancy in each parameter in each frame. Let $H^* = \sum_{i=1}^{l} H(U_{i,j})$ be the process entropy rate (in bits/frame) if the parameters were independent ($H^*$ is independent of $j$ since (1) is independent of $j$). Note that we can write
\[ \rho_T = \rho_D + \rho_M \]
where $\rho_D = 36 - H^*$ denotes the frame redundancy due to the non-uniform distribution of the parameters and $\rho_M = H^* - H(U)$ denotes the frame redundancy due to the memory between the parameters.
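The estimation procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the input layout (a list of integer symbols, each holding the three MSBs of one parameter along whichever chain is being modeled) are hypothetical and not taken from the paper, and the redundancy terms are named rho_d, rho_m and rho_t after the distribution, memory and total redundancies.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def redundancy_terms(symbols, bits=3):
    """Estimate (rho_d, rho_m, rho_t) in bits for one symbol sequence.

    symbols: integers in [0, 2**bits), e.g. the 3 MSBs of one quantized
    parameter (hypothetical input layout, assumed for this sketch).
    """
    n = len(symbols)

    # Entropy if the symbols were independent (relative frequencies).
    marginal = Counter(symbols)
    h_star = entropy([c / n for c in marginal.values()])

    # First-order Markov entropy rate H(U_t | U_{t-1}), estimated from
    # the relative frequencies of the observed transitions.
    pairs = Counter(zip(symbols[:-1], symbols[1:]))
    prev = Counter(symbols[:-1])
    total = n - 1
    h_markov = 0.0
    for (a, _b), c in pairs.items():
        h_markov -= (c / total) * math.log2(c / prev[a])

    rho_d = bits - h_star       # redundancy from non-uniform distribution
    rho_m = h_star - h_markov   # redundancy from memory
    return rho_d, rho_m, rho_d + rho_m
```

For example, a strictly alternating sequence of two symbols uses only 1 of the 3 available bits and is fully predictable from the previous symbol, so the sketch reports 2 bits of distribution redundancy and 1 bit of memory redundancy.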
The results are compiled in Table 2, in which we provide the values of $\rho_D$, $\rho_M$ and $\rho_T$ for each individual parameter as well as for the entire frame. Table 2 shows that about 16% of the 36 bits considered in each frame (the three MSBs of each quantized value) are redundant.
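The percentages in Table 2 can be checked with simple arithmetic: each one is the total redundancy of a parameter divided by the number of MSBs considered for it. The sketch below assumes three MSBs per quantized value and takes the number of values per frame from Table 1 (four LSP codebooks, one pitch delay, one delta delay, three ACB gains, three FCB gains); these counts are inferred for the example rather than stated explicitly in the text.

```python
MSB = 3  # most significant bits considered per quantized value

# name: (total redundancy in bits/frame from Table 2, values per frame)
# The values-per-frame counts are inferred from Table 1 (an assumption).
table2 = {
    "LSP": (0.138, 4),
    "Pitch Delay": (1.262, 1),
    "Delta Delay": (1.157, 1),
    "ACB Gain": (0.473, 3),
    "FCB Gain": (2.839, 3),
}


def redundancy_percent(rho_t, values, msb=MSB):
    """Redundancy as a percentage of the MSBs considered."""
    return 100.0 * rho_t / (msb * values)


total_bits = MSB * sum(v for _, v in table2.values())
total_rho = sum(r for r, _ in table2.values())

for name, (rho_t, values) in table2.items():
    print(f"{name:12s} {redundancy_percent(rho_t, values):6.2f}%")
print(f"{'Total':12s} {100.0 * total_rho / total_bits:6.2f}%")
```

Under these assumptions the bit count per frame comes to 36, which matches the constant used in the definition of the distribution redundancy.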
Parameter      $\rho_D$   $\rho_M$   $\rho_T$   Redundancy
LSP            0.035      0.103      0.138      1.15%
Pitch Delay    0.564      0.698      1.262      42.06%
Delta Delay    0.528      0.629      1.157      38.57%
ACB Gain       0.184      0.289      0.473      5.25%
FCB Gain       1.356      1.483      2.839      31.54%
Total          2.667      3.202      5.869      16.3%
Table 2: Residual Redundancy (in bits/frame).
3. Error Concealment Procedure
Consider the transmission of codec parameters over a noisy channel as described in Figure 1. At the encoder side, the speech parameters are first extracted from the input samples. At time index $k$, a specific parameter $\tilde{v}_k \in \mathbb{R}$ is first quantized according to $Q[\tilde{v}_k] = v_k^i$ with $v_k^i \in QT$ ($QT$: quantization table), where $i$ is the quantization index of the quantized parameter $v_k^i$, $i \in \{0, 1, \ldots, 2^M - 1\}$, and $M$ is the number of bits assigned to the quantized parameter. Using bit mapping (BM), a bit combination consisting of $M$ bits is assigned to index $i$:
\[ x_k^i = \left( x_k^i(0), x_k^i(1), \ldots, x_k^i(m), \ldots, x_k^i(M-1) \right) \tag{4} \]
with $x_k^i(m) \in \{-1, +1\}$. This bit combination is then encoded by the channel encoder and transmitted through the channel. Assume the decoded bit combination $\hat{x}_k$ at the output of the channel decoder is:
\[ \hat{x}_k = \left( \hat{x}_k(0), \hat{x}_k(1), \ldots, \hat{x}_k(m), \ldots, \hat{x}_k(M-1) \right). \tag{5} \]
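As a rough illustration of the quantization and bit-mapping steps of equation (4), the following Python sketch picks the nearest entry of a quantization table and maps its index to M bipolar bits. The table values, the natural-binary mapping (MSB first) and the convention that binary 1 maps to +1 are assumptions made for the example; they are not taken from the EVRC standard.

```python
def quantize(value, table):
    """Return the index i of the table entry closest to value."""
    return min(range(len(table)), key=lambda i: abs(table[i] - value))


def bit_mapping(index, num_bits):
    """Map index i to num_bits bipolar bits in {-1, +1}, MSB first.

    Natural binary with 1 -> +1 and 0 -> -1 (an assumed convention).
    """
    bits = []
    for m in range(num_bits):
        bit = (index >> (num_bits - 1 - m)) & 1
        bits.append(1 if bit else -1)
    return bits


# Hypothetical 3-bit quantization table, for illustration only.
table = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
i = quantize(0.6, table)
x = bit_mapping(i, 3)
```

Here the value 0.6 is closest to the table entry 0.5, so the index is 2 and the bit pattern is the bipolar form of binary 010.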
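[Figure 1: block diagram. Speech encoder: quantizer (Q) and bit mapping (BM), followed by the channel. Softbit speech decoder: parameter transition probability, a posteriori probability (using a priori knowledge) and estimator. Signal path: $\tilde{v}_0$, $v_0$, $x_0$, channel, $\hat{x}_0$ with $p_{e0}$, $v_0^{(est)}$.]
Figure 1. Softbit Speech Decoding Technique.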
To perform error concealment, the channel decoder should be a soft-output channel decoder. In our research, the SOVA algorithm is used for a rate-1/2 Turbo code with constraint length 5, and the L-value of bit $x_k^i(m)$ at the output of the decoder is:
\[ L_k(m) = \log \frac{P(x_k^i(m) = +1 \mid \tilde{Y})}{P(x_k^i(m) = -1 \mid \tilde{Y})} \tag{6} \]
where $x_k^i(m)$ denotes the transmitted bit and $\tilde{Y}$ is the received bit sequence. The hard decision of bit $\hat{x}_k(m)$ is given by:
\[ \hat{x}_k(m) = \mathrm{sign}[L_k(m)] \tag{7} \]
The bit error probability is defined as:
\[ p_{e_k}(m) = \frac{1}{1 + \exp |L_k(m)|} \tag{8} \]
Thus, the probability of transition from a transmitted bit $x_k^i(m)$ to the known decoded bit $\hat{x}_k(m)$ is:
\[ P(\hat{x}_k(m) \mid x_k^i(m)) = \begin{cases} 1 - p_{e_k}(m) & \text{if } \hat{x}_k(m) = x_k^i(m) \\ p_{e_k}(m) & \text{if } \hat{x}_k(m) \neq x_k^i(m) \end{cases} \tag{9} \]
If we consider the channel to be memoryless, the transition probability from any bit combination $x_k^i$, $i \in \{0, 1, \ldots, 2^M - 1\}$, to the decoded bit combination $\hat{x}_k$ is:
\[ P(\hat{x}_k \mid x_k^i) = \prod_{m=0}^{M-1} P(\hat{x}_k(m) \mid x_k^i(m)) \tag{10} \]
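Equations (7) to (10) translate directly into code. The sketch below is a minimal Python illustration, assuming the L-values of one parameter are already available as a list of floats (in the paper they come from the SOVA decoder, which is not modeled here); the function names are hypothetical.

```python
import math


def hard_decision(l_value):
    """Equation (7): sign of the L-value, as a bipolar bit."""
    return 1 if l_value >= 0 else -1


def bit_error_prob(l_value):
    """Equation (8): bit error probability from the L-value magnitude."""
    return 1.0 / (1.0 + math.exp(abs(l_value)))


def transition_prob(l_values, candidate_bits):
    """Equations (9) and (10): probability of the decoded bit pattern
    given that candidate_bits (bipolar, length M) was transmitted,
    assuming a memoryless channel."""
    prob = 1.0
    for l_val, x in zip(l_values, candidate_bits):
        p_e = bit_error_prob(l_val)
        prob *= (1.0 - p_e) if hard_decision(l_val) == x else p_e
    return prob
```

An L-value of zero gives a bit error probability of 0.5 (no reliability information), while a large magnitude drives it toward zero, so unreliable bits spread the transition probability over more candidate patterns.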
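For a memoryless channel, the a posteriori probability of the transmitted bit combination with index $i$ is:
\[ P(x_k^i \mid \hat{x}_k, \hat{X}_{k-1}) = \frac{P(\hat{x}_k \mid x_k^i) \, P(x_k^i \mid \hat{X}_{k-1}) \, P(\hat{X}_{k-1})}{P(\hat{x}_k \mid \hat{X}_{k-1}) \, P(\hat{X}_{k-1})} = \frac{P(\hat{x}_k \mid x_k^i) \, P(x_k^i \mid \hat{X}_{k-1})}{P(\hat{x}_k \mid \hat{X}_{k-1})} \tag{11} \]
where $\hat{X}_{k-1}$ are the decoded bit combinations from time index $0$ to $k-1$. Since we take each quantized parameter as a first-order Markov source, we have
\[ P(x_k^i \mid \hat{X}_{k-1}) = \sum_{j=0}^{2^M - 1} P(x_k^i \mid x_{k-1}^j) \, P(x_{k-1}^j \mid \hat{X}_{k-1}) = \sum_{j=0}^{2^M - 1} P(x_k^i \mid x_{k-1}^j) \, P(x_{k-1}^j \mid \hat{x}_{k-1}, \hat{X}_{k-2}) \tag{12} \]
and
\[ P(\hat{x}_k \mid \hat{X}_{k-1}) = \sum_{l=0}^{2^M - 1} P(\hat{x}_k \mid x_k^l) \, P(x_k^l \mid \hat{x}_{k-1}, \hat{X}_{k-2}) \tag{13} \]
Using (12) and (13), (11) can be expressed as
\[ P(x_k^i \mid \hat{x}_k, \hat{X}_{k-1}) = \frac{P(\hat{x}_k \mid x_k^i) \sum_{j=0}^{2^M - 1} P(x_k^i \mid x_{k-1}^j) \, P(x_{k-1}^j \mid \hat{x}_{k-1}, \hat{X}_{k-2})}{\sum_{l=0}^{2^M - 1} \sum_{j=0}^{2^M - 1} P(\hat{x}_k \mid x_k^l) \, P(x_k^l \mid x_{k-1}^j) \, P(x_{k-1}^j \mid \hat{x}_{k-1}, \hat{X}_{k-2})} \tag{14} \]
and we can derive the a posteriori probability of any bit combination at time index $k$ by a recursive computation.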
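One step of the recursion in equation (14) can be sketched as follows. The Python function below is an illustration only: the argument names are hypothetical, and the way the recursion is started at time index 0 (for example from the marginal distribution of the source) is an assumption that the text does not specify.

```python
def posterior_update(trans_probs, prior_matrix, prev_posterior):
    """One recursion step of equation (14).

    trans_probs[i]     : P(decoded pattern at k | pattern i sent), eq. (10)
    prior_matrix[i][j] : P(pattern i at k | pattern j at k-1), the
                         a priori knowledge obtained from training data
    prev_posterior[j]  : a posteriori probability of pattern j at k-1
    Returns the a posteriori probabilities of all patterns at time k.
    """
    n = len(trans_probs)

    # Prediction step, eq. (12): propagate the previous posterior
    # through the first-order Markov model of the source.
    predicted = [
        sum(prior_matrix[i][j] * prev_posterior[j] for j in range(n))
        for i in range(n)
    ]

    # Weight by the channel transition probabilities (numerator of
    # eq. (14)) and normalize by their sum (the denominator).
    unnormalized = [trans_probs[i] * predicted[i] for i in range(n)]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]
```

The returned list is fed back as prev_posterior for the next frame, which is what makes the computation recursive: the full history of decoded bit combinations never has to be stored.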
The probability $P(x_k^i \mid x_{k-1}^j)$ is a priori knowledge of the source, and it depends on the inter-frame correlation of the encoded bit sequence. We used a large training sequence to obtain the statistical value of $P(x_k^i \mid x_{k-1}^j)$.
Once the a posteriori terms $P(x_k^i \mid \hat{x}_k, \hat{X}_{k-1})$ specified above have been computed, the parameter value itself can be estimated. The estimation error criterion can in principle be chosen freely, but it should reflect the impact of parameter errors on the subjective speech quality.
For a wide range of speech codec parameters, the Minimum Mean Square Error (MMSE) criterion is appropriate. We use the MMSE criterion to estimate the transmitted parameters as follows:
\[ \hat{v}_k = \sum_{i=0}^{2^M - 1} v_k^i \cdot P(x_k^i \mid \hat{x}_k, \hat{X}_{k-1}) \tag{15} \]
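The MMSE estimate of equation (15) is simply the mean of the quantization table under the a posteriori distribution. The short Python sketch below contrasts it with picking the single most probable table entry; the second function is included only for comparison and is not part of the scheme described in the text. Function names and example values are hypothetical.

```python
def mmse_estimate(table, posterior):
    """Equation (15): posterior-weighted mean of the table entries."""
    return sum(v * p for v, p in zip(table, posterior))


def most_probable_entry(table, posterior):
    """For comparison only: the table entry with the largest
    a posteriori probability."""
    best = max(range(len(table)), key=lambda i: posterior[i])
    return table[best]


# Hypothetical 2-bit table and a posteriori probabilities.
table = [0.0, 1.0, 2.0, 3.0]
posterior = [0.1, 0.6, 0.2, 0.1]
v_mmse = mmse_estimate(table, posterior)
v_best = most_probable_entry(table, posterior)
```

With these example numbers the MMSE estimate is 1.3, which lies between table entries, whereas the most probable entry is 1.0. When the channel is very unreliable the posterior flattens and the MMSE estimate moves toward the a priori mean, which gives a graceful muting effect.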
4. Simulation Results
To measure the speech distortion, the segmental signal-to-noise ratio (SSNR) and the log spectral distance (LSD) [6] are investigated:
\[ \mathrm{SSNR} = \frac{1}{N_f} \sum_{i=0}^{N_f - 1} 10 \log_{10} \frac{\sum_{n=0}^{N-1} s_i^2(n)}{\sum_{n=0}^{N-1} \left( s_i(n) - \hat{s}_i(n) \right)^2} \tag{16} \]
and
\[ \mathrm{LSD} = \sum_{i=0}^{N_f - 1} \int_{0}^{\pi} \left| \log S_i(w) - \log \hat{S}_i(w) \right| \, dw, \tag{17} \]
where $N_f$ denotes the total number of frames, $N$ stands for the number of speech samples in each frame, $s_i(n)$ and $\hat{s}_i(n)$ are the original and reconstructed speech in the $i$-th frame, and the corresponding power spectral density functions are denoted by $S_i(w)$ and $\hat{S}_i(w)$, respectively. Based on these criteria, bit errors in the important parameters (i.e., the error-sensitive parameters) can cause a severe SSNR reduction and LSD increase, while errors in other code bits result in little distortion.
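Both distortion measures can be sketched in plain Python. The SSNR function follows equation (16) frame by frame. The LSD function is only one common discretization, assuming the power spectral densities are already sampled on a uniform frequency grid and averaging the absolute difference of their values in dB; the log base and the normalization are assumptions and may differ from those used for the reported results. The small constant eps is also an addition of this sketch, to avoid division by zero.

```python
import math


def segmental_snr_db(original, decoded, frame_len, eps=1e-12):
    """Equation (16): mean over frames of the per-frame SNR in dB."""
    n_frames = len(original) // frame_len
    total = 0.0
    for i in range(n_frames):
        s = original[i * frame_len:(i + 1) * frame_len]
        s_hat = decoded[i * frame_len:(i + 1) * frame_len]
        signal = sum(x * x for x in s)
        error = sum((x - y) ** 2 for x, y in zip(s, s_hat))
        total += 10.0 * math.log10((signal + eps) / (error + eps))
    return total / n_frames


def log_spectral_distance_db(psd_frames, psd_hat_frames):
    """Mean absolute difference of the PSDs in dB over all frames and
    frequency bins (assumed discretization and normalization)."""
    total = 0.0
    count = 0
    for frame, frame_hat in zip(psd_frames, psd_hat_frames):
        for s, s_hat in zip(frame, frame_hat):
            total += abs(10.0 * math.log10(s) - 10.0 * math.log10(s_hat))
            count += 1
    return total / count
```

Averaging the per-frame SNR in dB, rather than computing one SNR over the whole signal, keeps loud frames from dominating the result, which is why the segmental form is preferred for speech.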
The signal-to-noise ratio of each parameter (PSNR) protected by error concealment is measured for channel quality ($E_b/N_0$) ranging from -3.0 to 3.0 dB, where $E_b$ is the energy per information bit and $N_0/2$ is the double-sided noise spectral density. The results are shown in Figures 2 and 3. All our results are based on an additive white Gaussian noise (AWGN) channel. The simulation results show that, at low $E_b/N_0$, error concealment yields the following improvements: 0.8-1 dB in parameter SNR (or 0.5-1.6 dB in $E_b/N_0$) for Pitch delay and FCB Gain; 0.5-0.7 dB in parameter SNR (or 0.5-1.2 dB in $E_b/N_0$) for LSPs; 0.9-1.5 dB in parameter SNR (or 0.5-1.5 dB in $E_b/N_0$) for Delta delay; and 0.7-1.3 dB in parameter SNR (or 0.6-1.4 dB in $E_b/N_0$) for ACB Gain. In Figure 4, the quality of the speech is evaluated by log spectral distortion. The results show a 1-1.8 dB improvement in LSD at poor channel quality when error concealment is used for EVRC encoded speech.
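For readers who want to set up a comparable experiment, the sketch below shows two helper computations in Python. Both rest on assumptions not stated in the text: the noise level is derived for unit-energy BPSK symbols and a code rate of 1/2 (the rate quoted for the Turbo code), and the parameter SNR is taken as the usual ratio of parameter power to estimation-error power. The function names are hypothetical.

```python
import math


def awgn_sigma(ebn0_db, code_rate=0.5):
    """Noise standard deviation per real dimension for unit-energy
    BPSK symbols: sigma^2 = N0/2 = 1 / (2 * code_rate * Eb/N0)."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return math.sqrt(1.0 / (2.0 * code_rate * ebn0))


def parameter_snr_db(true_values, estimates):
    """Parameter SNR in dB: parameter power over error power
    (assumed definition; requires at least one nonzero error)."""
    signal = sum(v * v for v in true_values)
    error = sum((v - e) ** 2 for v, e in zip(true_values, estimates))
    return 10.0 * math.log10(signal / error)
```

For instance, at 0 dB with rate 1/2 the noise standard deviation under these assumptions is exactly 1, and it grows as the channel quality drops toward -3 dB.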
5. Conclusion
In this paper, we have studied the error concealment procedures used to protect the most important parameters of the EVRC speech decoder. Residual redundancies of these parameters are analyzed and exploited in the error concealment process as a priori knowledge of the source. The simulation results show that the use of error concealment improves the SNRs of these parameters, especially for Pitch delay, Delta delay and Fixed Codebook (FCB) gain. The results also show that the more redundancy exists in the encoded parameter, the more improvement we could obtain by using the error concealment scheme. This is because the more residual redundancy exists, the stronger the correlation between the encoded parameters, and the higher the probability that some information about the corrupted coefficients can be gained by observing their neighbors.
6. References
[1] W.B. Kleijn, P. Kroon, and D. Nahumi, “The RCELP Speech-Coding Algorithm,” European Transactions on Telecommunications, vol. 5, no. 5, pp. 573-582, Sept./Oct. 1994.
[2] T. Fingscheidt and P. Vary, “Softbit Speech Decoding: A new approach to error concealment,” IEEE Trans. Speech & Audio Process., vol. 9, no. 3, pp. 240-251, March 2001.
[3] F. Alajaji, N. Phamdo, and T. E. Fuja, “Channel Codes that Exploit the Residual Redundancy in CELP-Encoded Speech,” IEEE Transactions on Speech and Audio Processing, vol. 4, no. 5, pp. 325-336, Sept. 1996.
[4] Ahmed J. Jameel, You Xiaohu and Gao Xiqi, “Joint Source-Channel Decoding of EVRC Speech Encoder Using Residual Redundancy,” Journal of Southeast University, vol. 18, no. 2, pp. 103-107, June 2002.
[5] Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems, IS-127, July 19, 1996.
[6] L. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice Hall: Englewood Cliffs, NJ, 1993.
[Figure 2: plot of parameter SNR (dB) versus Eb/No (dB), from -3 to 3 dB. Curves: SNR of Delta delay with and without error concealment; SNR of ACB Gain with and without error concealment.]
Figure 2. SNR performance of Delta delay and ACB Gain.
[Figure 3: plot of parameter SNR (dB) versus Eb/No (dB), from -3 to 3 dB. Curves: SNR of LSPs, Pitch delay and FCB Gain, each with and without error concealment.]
Figure 3. SNR performance of LSPs, Pitch delay and FCB Gain.
[Figure 4: plot of log spectral distortion (dB) versus Eb/No (dB), from -3 to 3 dB. Curves: LSD of speech with and without error concealment.]
Figure 4. Log Spectral Distortion of EVRC encoded speech.