
Signal Processing 81 (2001) 1899–1908. www.elsevier.com/locate/sigpro

Two new algorithms based on product system for discrete cosine transform

Zhaoli Guo (a), Baochang Shi (b, ∗), Nengchao Wang (b)

(a) National Laboratory of Coal Combustion, Huazhong University of Science and Technology, Wuhan 430074, People's Republic of China

(b) Department of Mathematics, Huazhong University of Science and Technology, Wuhan 430074, People's Republic of China

Received 4 January 2000; received in revised form 11 April 2001

Abstract

In this paper we present a product system and give a representation for cosine functions in terms of this system. Based on this representation, two new algorithms are designed for computing the discrete cosine transform. Both algorithms have a regular recursive structure and good numerical stability, and are easy to implement on parallel computers. Furthermore, this paper also provides a framework for designing fast algorithms for other discrete transforms. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Discrete cosine transform; Product system

1. Introduction

The discrete cosine transform (DCT), which is a robust approximation of the optimal Karhunen–Loève transform (KLT) for a first-order Markov source with high correlation coefficient, is widely used for speech, signal and image processing. In the last few years, a number of fast cosine-transform (FCT) algorithms have been developed [1,2,5,6]. These algorithms possess the same efficiency in terms of operation counts, and all of them have a regular and simple structure. All of the algorithms mentioned above contain two stages, i.e., the butterfly stage and the permutation and recursive addition (PRA) stage. Differences between these FCT algorithms can be found in both stages. Lee's FCT [6], which contains secant multipliers in the butterfly stage, has a larger round-off error, but its PRA stage is quite simple and easy to implement. On the other hand, an index mapping is required in Hou's algorithm [5] to transform the DCT into a phase-modulated DFT, which may not be performed in place. Moreover, the PRA stage of that algorithm is much more complicated than that of Lee's algorithm and may lead to a larger round-off error [8]. In [2], Chan proposed a new version of Hou's algorithm which can be performed in place. However, the butterfly stage of this algorithm contains two butterfly structures, i.e., a 2-point and a 4-point butterfly structure, and the computation of the 4-point structure is more complicated than that of the 2-point one. In

∗ Corresponding author. Tel.: +86-27-875-498-92. E-mail address: [email protected] (B. Shi).

0165-1684/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0165-1684(01)00081-0


recent years, several other improved versions of Hou's algorithm were also developed [3,4,7]. This class of algorithms has the same drawbacks as the original algorithm proposed by Hou, namely, the PRA stage is inconvenient to implement and may lead to a larger round-off error, and they are not suitable for parallel computing because at the PRA stage the additions must be performed recursively at each step. In [1], a new FCT algorithm was designed using the successive-doubling method. That algorithm has a regular 2-point butterfly structure, and its PRA stage overcomes the drawbacks of Hou's algorithm, so it might be the best one among the algorithms mentioned above considering round-off error and regularity of structure. However, the input data must be rearranged before the butterfly stage, and the PRA stage has no obvious recursive structure. In this paper, we propose two new FCT algorithms based on a product system. The algorithms have

the same advantages as the one proposed by Arguello and Zapata [1], but need no rearrangement before the butterfly stage, and their recursive structures are more obvious and regular. Furthermore, the method used in this paper may be used to design fast algorithms for other discrete transforms. The paper is organized as follows. In Section 2, a product system is introduced, and a discrete transform is defined according to the system. In Section 3, two FCT algorithms are designed based on the fast algorithms for the discrete transform defined in Section 2. Finally, some concluding remarks are drawn in Section 4.

2. A product system

In what follows we assume that $N = 2^m$, $m \ge 2$.

Recall that the product system determined by a given sequence of functions $\varphi_i(\theta)$ ($\theta \in [a, b]$, $i = 0, 1, 2, \ldots$) is the sequence of functions defined by

$$\psi_j(\theta) := \prod_{i=0}^{\infty} \varphi_i(\theta)^{j_i}, \quad \theta \in [a, b], \; j = 0, 1, 2, \ldots, \tag{1}$$

where $j_i \in \{0, 1\}$ is the $i$th binary bit of $j$.

Let $\varphi_i(\theta) = 2\cos(2^i\theta)$ and $[a, b] = [0, \pi]$; then we obtain a special product system, which we call the cosine product system (CPS). The first $N$ functions of the CPS can be written as $\psi_j^N(\theta)$ ($j = 0, 1, \ldots, N-1$), and they are discretized on the interval $[0, \pi]$ as

$$\psi_{jn}^N := \psi_j^N(\theta_n^N) = \prod_{i=0}^{m-1} 2^{j_i} \cos(j_i 2^i \theta_n^N), \tag{2}$$

where $\theta_n^N = (2n+1)\pi/2N$, $n = 0, 1, \ldots, N - 1$.

It is easy to verify that the $\psi_{jn}^N$ have the following properties:

$$\begin{aligned} \psi_{2j,\,n}^N &= \psi_{j,\,n}^{N/2}, & j &= 0, 1, \ldots, N/2 - 1, \\ \psi_{j,\,N-n-1}^N &= (-1)^j \psi_{j,\,n}^N, & j &= 0, 1, \ldots, N - 1, \\ \psi_{2j+1,\,n}^N &= 2\cos(\theta_n^N)\,\psi_{j,\,n}^{N/2}, & j &= 0, 1, \ldots, N/2 - 1, \end{aligned} \qquad n = 0, 1, \ldots, N/2 - 1. \tag{3}$$
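The definitions (1)–(3) are easy to check numerically. The following sketch (assuming NumPy; the function name `cps_matrix` is ours, not the paper's) builds the discretized CPS of Eq. (2) and verifies the three properties in (3):

```python
import numpy as np

def cps_matrix(N):
    """Build the discretized CPS with entries psi_{jn}^N from Eq. (2)."""
    m = int(np.log2(N))
    theta = (2 * np.arange(N) + 1) * np.pi / (2 * N)      # theta_n^N
    Psi = np.ones((N, N))
    for j in range(N):
        for i in range(m):
            if (j >> i) & 1:                               # j_i, the i-th binary bit of j
                Psi[j] *= 2 * np.cos(2**i * theta)
    return Psi

N = 8
Psi, Psi_half = cps_matrix(N), cps_matrix(N // 2)
theta = (2 * np.arange(N) + 1) * np.pi / (2 * N)

# Properties (3), checked on the left half of the grid, n = 0..N/2-1:
for j in range(N // 2):
    assert np.allclose(Psi[2 * j, :N // 2], Psi_half[j])                  # even rows
    assert np.allclose(Psi[2 * j + 1, :N // 2],
                       2 * np.cos(theta[:N // 2]) * Psi_half[j])          # odd rows
for j in range(N):
    assert np.allclose(Psi[j, ::-1], (-1)**j * Psi[j])                    # reflection
print("properties (3) verified for N =", N)
```

The reflection property follows because $\theta_{N-n-1}^N = \pi - \theta_n^N$ and only the $i = 0$ factor changes sign under $\theta \mapsto \pi - \theta$.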

By (2) we can define a discrete CPS transform (DCPST):

$$Y(j) = \sum_{n=0}^{N-1} x(n)\,\psi_{jn}^N, \quad j = 0, 1, \ldots, N - 1, \tag{4a}$$


or

$$Y = \Psi_N x, \tag{4b}$$

where $Y = (Y(0), Y(1), \ldots, Y(N-1))^T$, $x = (x(0), x(1), \ldots, x(N-1))^T$, and $\Psi_N$ is the transform matrix.

Let us denote by $\bar{I}$ the anti-diagonal unit matrix, i.e., the matrix with ones on the anti-diagonal and zeros elsewhere, and for any square matrix $A$ define $\bar{A} = A\bar{I}$ and $\tilde{A} = \bar{I}A\bar{I}$, which are the matrix $A$ with its columns reversed, and with both its rows and columns reversed, respectively. With these notations we can obtain the recursive relation between $\Psi_N$ and $\Psi_{N/2}$ from (3):

$$\Psi_2 = \begin{bmatrix} 1 & 1 \\ \sqrt{2} & -\sqrt{2} \end{bmatrix}, \qquad \Psi_N = P_N \begin{bmatrix} \Psi_{N/2} & \bar{\Psi}_{N/2} \\ \Psi_{N/2} D_{N/2} & -\Psi_{N/2} \bar{D}_{N/2} \end{bmatrix}, \tag{5}$$

where $P_N^T$ is the even–odd permutation matrix, which sorts the even-indexed components of a vector before the odd-indexed ones,

$$P_N^T = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 & 0 & 0 \\ \vdots & & & & & & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & & & & & & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 \end{bmatrix},$$

and $D_{N/2} = \operatorname{diag}(2\cos(\theta_0^N), 2\cos(\theta_1^N), \ldots, 2\cos(\theta_{N/2-1}^N))$.
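The recursion (5) can be confirmed numerically for a small size. In the sketch below (NumPy assumed; names are ours), `PT` realizes $P_N^T$ as the even–odd sorting permutation:

```python
import numpy as np

def cps_matrix(N):
    """Psi_N with entries psi_{jn}^N from Eq. (2)."""
    m = int(np.log2(N))
    theta = (2 * np.arange(N) + 1) * np.pi / (2 * N)
    Psi = np.ones((N, N))
    for j in range(N):
        for i in range(m):
            if (j >> i) & 1:
                Psi[j] *= 2 * np.cos(2**i * theta)
    return Psi

N = 8
Psi, Psi2 = cps_matrix(N), cps_matrix(N // 2)
theta = (2 * np.arange(N // 2) + 1) * np.pi / (2 * N)   # theta_n^N for n < N/2
D = np.diag(2 * np.cos(theta))                          # D_{N/2}
J = np.fliplr(np.eye(N // 2))                           # anti-diagonal Ibar_{N/2}

# P_N^T sorts the even-indexed rows before the odd-indexed ones.
PT = np.eye(N)[list(range(0, N, 2)) + list(range(1, N, 2))]
rhs = np.block([[Psi2, Psi2 @ J],                       # [Psi,   Psi-bar    ]
                [Psi2 @ D, -Psi2 @ D @ J]])             # [Psi D, -Psi D-bar ]
assert np.allclose(Psi, PT.T @ rhs)                     # Eq. (5)
print("recursion (5) verified for N =", N)
```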

From Eq. (5), two forms of recursive decomposition of the matrix $\Psi_N$ can be obtained.

Form I:

$$\Psi_N = Q_N (I_{N/4} \otimes B_4)(I_{N/8} \otimes B_8) \cdots (I_2 \otimes B_{N/2}) B_N B'_N, \tag{6}$$

where "$\otimes$" denotes the Kronecker product, $Q_N = P_N (I_2 \otimes P_{N/2}) \cdots (I_{N/4} \otimes P_4)$ is the bit-reversal matrix,

$$B'_N = \begin{bmatrix} I_{N/2} & \bar{I}_{N/2} \\ \bar{I}_{N/2} D_{N/2} & -\tilde{D}_{N/2} \end{bmatrix}, \qquad B_N = \begin{bmatrix} B'_{N/2} & \\ & B''_{N/2} \end{bmatrix} \quad \text{with} \quad B''_N = \begin{bmatrix} I_{N/2} & \bar{I}_{N/2} \\ -\bar{I}_{N/2} D_{N/2} & \tilde{D}_{N/2} \end{bmatrix}.$$

The detailed proof may be found in Appendix A.
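As an illustrative check of Form I, the following sketch (NumPy assumed; all function names are ours) assembles the factors of (6) for $N = 8$ and compares their product with the CPS matrix built directly from (2). Note that the lower blocks of `Bp` and `Bpp` use the reversed diagonals $\bar{I}_{M/2}D_{M/2}$ and $\tilde{D}_{M/2} = \bar{I}_{M/2}D_{M/2}\bar{I}_{M/2}$:

```python
import numpy as np

def cps_matrix(N):
    m = int(np.log2(N))
    theta = (2 * np.arange(N) + 1) * np.pi / (2 * N)
    Psi = np.ones((N, N))
    for j in range(N):
        for i in range(m):
            if (j >> i) & 1:
                Psi[j] *= 2 * np.cos(2**i * theta)
    return Psi

def DJ(M):
    """Return D_{M/2} and the anti-identity Ibar_{M/2} for block size M."""
    h = M // 2
    theta = (2 * np.arange(h) + 1) * np.pi / (2 * M)
    return np.diag(2 * np.cos(theta)), np.fliplr(np.eye(h))

def Bp(M):    # B'_M
    D, J = DJ(M)
    h = M // 2
    return np.block([[np.eye(h), J], [J @ D, -J @ D @ J]])

def Bpp(M):   # B''_M: signs of the lower blocks flipped
    D, J = DJ(M)
    h = M // 2
    return np.block([[np.eye(h), J], [-J @ D, J @ D @ J]])

def B(M):     # B_M = diag(B'_{M/2}, B''_{M/2})
    h = M // 2
    Z = np.zeros((h, h))
    return np.block([[Bp(h), Z], [Z, Bpp(h)]])

def bitrev(N):  # bit-reversal permutation matrix Q_N
    m = int(np.log2(N))
    return np.eye(N)[[int(format(i, f"0{m}b")[::-1], 2) for i in range(N)]]

N = 8
form1 = bitrev(N) @ np.kron(np.eye(2), B(4)) @ B(8) @ Bp(8)   # Eq. (6)
assert np.allclose(cps_matrix(N), form1)
print("Form I factorization (6) verified for N =", N)
```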


Fig. 1. Data flow of DBFCT.

This factorization leads to an in-place fast algorithm with a regular structure for computing the DCPST defined by (4). The form of $B_N$ indicates that the algorithm contains two types of 2-point butterfly structures, and we therefore call it the double-butterfly algorithm in this paper. The signal-flow graph of the algorithm for $N = 16$ is shown in Fig. 1.

Form II:

$$\Psi_N = Q'_N (I_{N/2} \otimes B'_2)(I_{N/4} \otimes B'_4) \cdots (I_2 \otimes B'_{N/2}) B'_N, \tag{7}$$

where $Q'_N = P'_N (I_2 \otimes P'_{N/2}) \cdots (I_{N/4} \otimes P'_4)$ and $P'_N = (I_{N/4} \otimes I^r_4) P_N$ with $I^r_4 = \operatorname{diag}(1, 1, 1, -1)$. The detailed proof may be found in Appendix B.

Eq. (7) gives another fast recursive DCPST algorithm containing only one type of 2-point butterfly structure (so in this paper we refer to it as the single-butterfly algorithm), but its permutation is a little more complicated than that of the algorithm defined by (6). The signal-flow graph of the algorithm for $N = 16$ is shown in Fig. 2.
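Form II can be checked in the same spirit. The sketch below (NumPy assumed; names ours) builds $P'_M$, composes $Q'_N$, and verifies (7) for $N = 8$:

```python
import numpy as np

def cps_matrix(N):
    m = int(np.log2(N))
    theta = (2 * np.arange(N) + 1) * np.pi / (2 * N)
    Psi = np.ones((N, N))
    for j in range(N):
        for i in range(m):
            if (j >> i) & 1:
                Psi[j] *= 2 * np.cos(2**i * theta)
    return Psi

def Bp(M):    # B'_M with the reversed-diagonal lower blocks
    h = M // 2
    theta = (2 * np.arange(h) + 1) * np.pi / (2 * M)
    D = np.diag(2 * np.cos(theta))
    J = np.fliplr(np.eye(h))
    return np.block([[np.eye(h), J], [J @ D, -J @ D @ J]])

def perm_P(M):
    """P_M: transpose sorts even-indexed entries before odd-indexed ones."""
    PT = np.eye(M)[list(range(0, M, 2)) + list(range(1, M, 2))]
    return PT.T

def Pp(M):    # P'_M = (I_{M/4} x I^r_4) P_M
    Ir4 = np.diag([1.0, 1.0, 1.0, -1.0])
    return np.kron(np.eye(M // 4), Ir4) @ perm_P(M)

N = 8
Qp = Pp(8) @ np.kron(np.eye(2), Pp(4))       # Q'_N = P'_N (I_2 x P'_{N/2})
form2 = Qp @ np.kron(np.eye(4), Bp(2)) @ np.kron(np.eye(2), Bp(4)) @ Bp(8)
assert np.allclose(cps_matrix(N), form2)     # Eq. (7)
print("Form II factorization (7) verified for N =", N)
```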

3. Two new FCT algorithms

It is well known that for a given input data sequence $x(n)$, $0 \le n \le N - 1$, the DCT output sequence $X(k)$ is defined by

$$X(k) = \sqrt{\frac{2}{N}}\,\varepsilon(k) \sum_{n=0}^{N-1} x(n) \cos(k\theta_n^N), \tag{8}$$

where $\varepsilon(0) = 1/\sqrt{2}$ and $\varepsilon(k) = 1$ for $0 < k < N$. We will drop $\varepsilon(k)$ and $\sqrt{2/N}$, since they only affect the amplitudes of $X(k)$. In what follows, we will focus on this simplified DCT version:

$$X = C_N x, \tag{9}$$


Fig. 2. Data flow of SBFCT.

where $X = (X(0), X(1), \ldots, X(N-1))^T$, $x = (x(0), x(1), \ldots, x(N-1))^T$, and the $N \times N$ DCT coefficient matrix $C_N$ is defined by $C_N(k, n) = \cos(k\theta_n^N)$.
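Since $C_N(k, n) = \cos(k\theta_n^N)$ with $\theta_n^N = (2n+1)\pi/2N$, the simplified transform (9) is the standard unnormalized DCT-II. If SciPy is available, this can be checked directly, since `scipy.fft.dct` with `type=2` computes $2\sum_n x(n)\cos(\pi k(2n+1)/2N)$:

```python
import numpy as np
from scipy.fft import dct

N = 16
theta = (2 * np.arange(N) + 1) * np.pi / (2 * N)
C = np.cos(np.outer(np.arange(N), theta))     # C_N(k, n) = cos(k * theta_n^N)

rng = np.random.default_rng(0)
x = rng.random(N)
# Up to the dropped factors sqrt(2/N) and eps(k), Eq. (9) is the DCT-II:
assert np.allclose(C @ x, dct(x, type=2) / 2)
print("C_N matches the unnormalized DCT-II for N =", N)
```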

Now we give the relation between the DCT and the DCPST. First we have the following result:

Lemma. There exists a lower triangular matrix $L_N$ such that

$$[1, \cos(\theta), \ldots, \cos((N-1)\theta)]^T = L_N\,[\psi_0^N(\theta), \psi_1^N(\theta), \ldots, \psi_{N-1}^N(\theta)]^T. \tag{10}$$

Proof. From the definition we know that for $N = 2$, $\psi_0^2(\theta) = 1$ and $\psi_1^2(\theta) = 2\cos(\theta)$, and it is easy to verify that the matrix $L_2 = \operatorname{diag}(1, 0.5)$ satisfies Eq. (10). Now assume that (10) is true for $N/2$, i.e., for a given $k$, $0 \le k \le N/2 - 1$,

$$\cos(k\theta) = \sum_{j=0}^{N/2-1} L_{kj}^{N/2}\,\psi_j^{N/2}(\theta)$$

holds, where the real numbers $L_{kj}^{N/2}$ ($0 \le j \le N/2 - 1$) are the elements of the matrix $L_{N/2}$. It is easy to verify that

$$\cos(N\theta/2) = 0.5\,\psi_{N/2}^N(\theta) \tag{11a}$$

and

$$\psi_j^N(\theta) = \psi_j^{N/2}(\theta), \qquad \psi_{j+N/2}^N(\theta) = 2\cos(N\theta/2)\,\psi_j^{N/2}(\theta), \qquad 0 \le j \le N/2 - 1,$$

so

$$\cos(k\theta) = \sum_{j=0}^{N/2-1} L_{kj}^{N/2}\,\psi_j^N(\theta). \tag{11b}$$


Also notice that

$$\begin{aligned} \cos[(N/2 + k)\theta] &= 2\cos(N\theta/2)\cos(k\theta) - \cos[(N/2 - k)\theta] \\ &= \sum_{j=0}^{N/2-1} L_{k,j}^{N/2}\, 2\cos(N\theta/2)\,\psi_j^{N/2}(\theta) - \sum_{j=0}^{N/2-1} L_{N/2-k,\,j}^{N/2}\,\psi_j^{N/2}(\theta) \\ &= \sum_{j=0}^{N/2-1} L_{kj}^{N/2}\,\psi_{N/2+j}^N(\theta) - \sum_{j=0}^{N/2-1} L_{N/2-k,\,j}^{N/2}\,\psi_j^N(\theta), \qquad 1 \le k \le N/2 - 1. \end{aligned} \tag{11c}$$

From Eqs. (11) we can conclude that (10) holds for $N$ if we let

$$L_N = \begin{bmatrix} 1 & & & \\ & L_{N/2-1} & & \\ & & 0.5 & \\ & -\bar{I}_{N/2-1}\,L_{N/2-1} & & L_{N/2-1} \end{bmatrix}, \tag{12}$$

where $L_{N/2-1}$ is the submatrix of $L_{N/2}$ obtained by eliminating its first row and column, and all blank blocks are zero. This completes the proof.

From the lemma it is easy to obtain the following theorem:

Theorem. The DCT matrix $C_N$ can be expressed in terms of the DCPST matrix $\Psi_N$ as

$$C_N = L_N \Psi_N. \tag{13}$$
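The lemma's recursion (12) and the theorem (13) can be verified together in a short sketch (NumPy assumed; `L_matrix` and `cps_matrix` are our illustrative names):

```python
import numpy as np

def cps_matrix(N):
    m = int(np.log2(N))
    theta = (2 * np.arange(N) + 1) * np.pi / (2 * N)
    Psi = np.ones((N, N))
    for j in range(N):
        for i in range(m):
            if (j >> i) & 1:
                Psi[j] *= 2 * np.cos(2**i * theta)
    return Psi

def L_matrix(N):
    """Lower triangular L_N built by the recursion (12)."""
    if N == 2:
        return np.diag([1.0, 0.5])
    core = L_matrix(N // 2)[1:, 1:]   # L_{N/2} minus its first row and column
    h = N // 2
    L = np.zeros((N, N))
    L[0, 0] = 1.0
    L[1:h, 1:h] = core
    L[h, h] = 0.5
    L[h + 1:, 1:h] = -core[::-1]      # -Ibar_{N/2-1} @ core
    L[h + 1:, h + 1:] = core
    return L

N = 8
theta = (2 * np.arange(N) + 1) * np.pi / (2 * N)
C = np.cos(np.outer(np.arange(N), theta))
assert np.allclose(C, L_matrix(N) @ cps_matrix(N))   # Theorem, Eq. (13)
print("C_N = L_N Psi_N verified for N =", N)
```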

Thus, by (13), the computation of the DCT can be split into two steps: (a) the DCPST step,

$$Y = \Psi_N x, \tag{14}$$

and (b) the recursive-addition step,

$$X = L_N Y. \tag{15}$$

The first step can be computed by the fast DCPST algorithms proposed in Section 2; in what follows we discuss fast algorithms for computing (15). Let $L'_N = L_N E_N$, where $E_N = \operatorname{diag}(1, 2, \ldots, 2)$; then from (12) we have

$$L'_N = \begin{bmatrix} L'_{N/2} & \\ -\hat{I}_{N/2}\,L'_{N/2} & L'_{N/2} \end{bmatrix} = \begin{bmatrix} I_{N/2} & \\ -\hat{I}_{N/2} & I_{N/2} \end{bmatrix} \begin{bmatrix} L'_{N/2} & \\ & L'_{N/2} \end{bmatrix} =: R_N (I_2 \otimes L'_{N/2}), \tag{16}$$

where

$$\hat{I}_{N/2} = \begin{bmatrix} 0 & \\ & \bar{I}_{N/2-1} \end{bmatrix}$$

and $L'_2 = I_2$. Therefore, we can decompose $L'_N$ as

$$L'_N = R_N (I_2 \otimes R_{N/2})(I_4 \otimes R_{N/4}) \cdots (I_{N/4} \otimes R_4) \tag{17}$$

and so

$$L_N = R_N (I_2 \otimes R_{N/2})(I_4 \otimes R_{N/4}) \cdots (I_{N/4} \otimes R_4)\,E_N^{-1}. \tag{18}$$

This implies that we have obtained a fast algorithm to compute Eq. (15).
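A small numerical check of the factorization (18) (NumPy assumed; names ours):

```python
import numpy as np

def L_matrix(N):
    """Lower triangular L_N built by the recursion (12)."""
    if N == 2:
        return np.diag([1.0, 0.5])
    core = L_matrix(N // 2)[1:, 1:]
    h = N // 2
    L = np.zeros((N, N))
    L[0, 0] = 1.0
    L[1:h, 1:h] = core
    L[h, h] = 0.5
    L[h + 1:, 1:h] = -core[::-1]
    L[h + 1:, h + 1:] = core
    return L

def R(M):
    """R_M = [[I, 0], [-Ihat, I]], with Ihat anti-diagonal except index 0."""
    h = M // 2
    Ihat = np.zeros((h, h))
    Ihat[1:, 1:] = np.fliplr(np.eye(h - 1))
    return np.block([[np.eye(h), np.zeros((h, h))], [-Ihat, np.eye(h)]])

N = 8
RQ = R(8) @ np.kron(np.eye(2), R(4))          # R_N (I_2 x R_{N/2}) ... (I_{N/4} x R_4)
Einv = np.diag([1.0] + [0.5] * (N - 1))       # E_N^{-1}
assert np.allclose(L_matrix(N), RQ @ Einv)    # Eq. (18)
print("factorization (18) of L_N verified for N =", N)
```

Since each $R_M$ factor consists only of identity blocks and a negated reversal, the recursive-addition stage needs no multiplications beyond the diagonal scaling $E_N^{-1}$.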


Fig. 3. Simplified data flow of the SD algorithm.

We can show that the multiplication by $E_N^{-1}$ in (18) can be performed in the last butterfly step. In fact, owing to the special form of $E_N^{-1}$ and to the fact that neither $Q_N$ nor $Q'_N$ changes the amplitude of any element of a vector, we can conclude that $E_N^{-1} Q_N = Q_N E_N^{-1}$ and $E_N^{-1} Q'_N = Q'_N E_N^{-1}$, so the DCT matrix can be written in recursive form as

$$C_N = R_N (I_2 \otimes R_{N/2})(I_4 \otimes R_{N/4}) \cdots (I_{N/4} \otimes R_4)\, Q_N E_N^{-1} (I_{N/4} \otimes B_4)(I_{N/8} \otimes B_8) \cdots (I_2 \otimes B_{N/2}) B_N B'_N \tag{19}$$

and

$$C_N = R_N (I_2 \otimes R_{N/2})(I_4 \otimes R_{N/4}) \cdots (I_{N/4} \otimes R_4)\, Q'_N E_N^{-1} (I_{N/2} \otimes B'_2)(I_{N/4} \otimes B'_4) \cdots (I_2 \otimes B'_{N/2}) B'_N. \tag{20}$$
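As a sketch (not the optimized in-place implementation, which operates on vectors rather than explicit matrices), the factorization (19) can be assembled factor by factor and compared with the directly computed DCT matrix:

```python
import numpy as np

def theta_grid(M, count):
    return (2 * np.arange(count) + 1) * np.pi / (2 * M)

def Bp(M):                        # B'_M, lower blocks from the reversed diagonals
    h = M // 2
    D = np.diag(2 * np.cos(theta_grid(M, h)))
    J = np.fliplr(np.eye(h))
    return np.block([[np.eye(h), J], [J @ D, -J @ D @ J]])

def Bpp(M):                       # B''_M: lower-block signs flipped
    h = M // 2
    D = np.diag(2 * np.cos(theta_grid(M, h)))
    J = np.fliplr(np.eye(h))
    return np.block([[np.eye(h), J], [-J @ D, J @ D @ J]])

def B(M):                         # B_M = diag(B'_{M/2}, B''_{M/2})
    h = M // 2
    Z = np.zeros((h, h))
    return np.block([[Bp(h), Z], [Z, Bpp(h)]])

def Q(N):                         # bit-reversal permutation matrix Q_N
    m = int(np.log2(N))
    return np.eye(N)[[int(format(i, f"0{m}b")[::-1], 2) for i in range(N)]]

def R(M):                         # R_M from Eq. (16)
    h = M // 2
    Ihat = np.zeros((h, h))
    Ihat[1:, 1:] = np.fliplr(np.eye(h - 1))
    return np.block([[np.eye(h), np.zeros((h, h))], [-Ihat, np.eye(h)]])

N = 8
Einv = np.diag([1.0] + [0.5] * (N - 1))                  # E_N^{-1}
C_fast = (R(8) @ np.kron(np.eye(2), R(4))                # recursive-addition stage
          @ Q(8) @ Einv                                  # permutation and scaling
          @ np.kron(np.eye(2), B(4)) @ B(8) @ Bp(8))     # double-butterfly stage

C_direct = np.cos(np.outer(np.arange(N), theta_grid(N, N)))
assert np.allclose(C_fast, C_direct)
print("DBFCT factorization (19) reproduces C_N for N =", N)
```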

Eqs. (19) and (20) give two FCT algorithms. Since the main difference between them lies in the DCPST step, we call them the double-butterfly FCT (DBFCT) and single-butterfly FCT (SBFCT) algorithms, respectively, according to their DCPST characteristics. The signal-flow graphs are shown in Figs. 1 and 2, respectively.

It is interesting to compare the DBFCT and SBFCT algorithms with the successive-doubling (SD) algorithm proposed by Arguello and Zapata [1]. The SD algorithm contains three stages: permutation of the input sequence, the butterfly stage, and the PRA stage. Fig. 3 shows the simplified data flow of the SD algorithm for N = 16. From Figs. 1–3, we can see that the DBFCT and SBFCT algorithms share the same advantages as the SD algorithm, such as small round-off error, regular structure, and in-place operation in the butterfly and recursive-addition stages. The three algorithms also have the same number of multiplications and additions/subtractions. However, there are also some differences between them. First, unlike the SD algorithm, the DBFCT and SBFCT algorithms require no data rearrangement before the butterfly stage, which makes them less time-consuming. Second, the structure of the recursive-addition stage in the DBFCT and SBFCT algorithms is more regular than that in the SD algorithm, which facilitates implementing the algorithms on parallel computers or in VLSI technology. Third, in the recursive-addition stage of the SD algorithm both addition and subtraction operations are involved, and the data indices must be distinguished during the computation, whereas only subtraction operations are involved in the recursive-addition stage of the DBFCT and SBFCT algorithms, so there is no need to distinguish the data indices. These properties indicate that the DBFCT and SBFCT algorithms might be more efficient than the SD algorithm.

To quantify the comparison, some numerical experiments were carried out. The input data were randomly generated between 0 and 1, and the DBFCT, SBFCT, and SD algorithms were implemented on a PC. The time costs of each algorithm for N = 32, 64, 128, 256, 512 and 1024 are listed in Table 1. The numerical results show that for each case the DBFCT and SBFCT algorithms take almost the same time, and both are less time-consuming than the SD algorithm.

Table 1
Time costs of the SD, DBFCT, and SBFCT algorithms (μs)

N      SD     DBFCT   SBFCT
32     50     45      45
64     109    98      98
128    242    216     216
256    529    476     477
512    1158   1040    1040
1024   2530   2273    2274

4. Conclusions

The DBFCT and SBFCT algorithms presented in this paper were designed based on one product system, the cosine product system (CPS). Like other FCT algorithms, these two algorithms have regular butterfly structures, but their recursive-addition computation is much simpler and easier to implement on parallel computers. These important properties of both algorithms owe much to the natural recursive structure of the CPS. In fact, many other transforms widely used in signal processing can also be obtained from appropriate product systems. For example, if one takes $\varphi_i(\theta) = \operatorname{sgn}[\cos(2^i \pi \theta)]$ and $[a, b] = [0, 1]$, the resulting product system is just the well-known Walsh system; and $\varphi_i(\theta) = \exp(\mathrm{j}\,2^i\theta)$ ($\mathrm{j} = \sqrt{-1}$) with $[a, b] = [0, 2\pi]$ leads to the Fourier transform. From this point of view, the cosine transform, Walsh transform and Fourier transform may be placed in one class. Furthermore, it is possible to design other useful transforms for different problems by using product systems. This paper has just presented such an example.
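The Walsh case can be illustrated numerically. The sketch below samples the product system generated by $\varphi_i(t) = \operatorname{sgn}[\cos(2^i \pi t)]$ at the midpoints of $[0, 1]$ (the factor $\pi$ inside the cosine is our assumption, as the symbol is lost in the printed formula) and checks that the rows form an orthogonal $\pm 1$ system, i.e., a Walsh–Hadamard matrix up to row ordering:

```python
import numpy as np

def walsh_product_system(N):
    """Sample phi_i(t) = sgn(cos(2^i * pi * t)) products, Eq. (1) style,
    at the midpoints t_n = (2n + 1) / (2N) of [0, 1]."""
    m = int(np.log2(N))
    t = (2 * np.arange(N) + 1) / (2 * N)
    W = np.ones((N, N))
    for j in range(N):
        for i in range(m):
            if (j >> i) & 1:                 # j_i, the i-th binary bit of j
                W[j] *= np.sign(np.cos(2**i * np.pi * t))
    return W

W = walsh_product_system(8)
assert np.array_equal(np.abs(W), np.ones((8, 8)))   # entries are +/-1
assert np.allclose(W @ W.T, 8 * np.eye(8))          # rows are orthogonal
print("product system yields an orthogonal +/-1 (Walsh-type) system")
```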

Acknowledgements

This work was subsidized by the Special Funds for Major State Basic Research Projects (G1999022207) and the National Natural Science Foundation of China (60073044). The authors sincerely thank the reviewers for the helpful comments which greatly improved the work.

Appendix A. Proof for (6)

From (5) we can get

$$\Psi_N = P_N (I_2 \otimes \Psi_{N/2})\, D'_N K'_N,$$


where

$$D'_N = \begin{bmatrix} I_{N/2} & \\ & \bar{D}_{N/2} \end{bmatrix} \quad \text{and} \quad K'_N = \begin{bmatrix} I_{N/2} & \bar{I}_{N/2} \\ \bar{I}_{N/2} & -I_{N/2} \end{bmatrix}.$$

So we can decompose $\Psi_N$ as

$$\Psi_N = Q_N (I_{N/2} \otimes D'_2)(I_{N/2} \otimes K'_2)(I_{N/4} \otimes D'_4)(I_{N/4} \otimes K'_4) \cdots (I_2 \otimes D'_{N/2})(I_2 \otimes K'_{N/2})\, D'_N K'_N. \tag{A1}$$

Let

$$K''_N = \begin{bmatrix} I_{N/2} & \bar{I}_{N/2} \\ -\bar{I}_{N/2} & I_{N/2} \end{bmatrix};$$

then it is easy to verify that $K'_N \bar{I}_N = K''_N$, so that $K'_N \bar{I}_N A = K''_N A$ for any $N \times N$ matrix $A$. With this identity we have

$$(I_2 \otimes K'_{N/2})\, D'_N = \begin{bmatrix} K'_{N/2} & \\ & K'_{N/2} \bar{D}_{N/2} \end{bmatrix} = \begin{bmatrix} K'_{N/2} & \\ & K''_{N/2} \end{bmatrix} \begin{bmatrix} I_{N/2} & \\ & \tilde{D}_{N/2} \end{bmatrix} =: K_N \check{D}_N,$$

and so

$$\begin{aligned} (I_{2^i} \otimes K'_{2^{m-i}})(I_{2^{i-1}} \otimes D'_{2^{m-i+1}}) &= (I_{2^{i-1}} \otimes (I_2 \otimes K'_{2^{m-i}}))(I_{2^{i-1}} \otimes D'_{2^{m-i+1}}) \\ &= I_{2^{i-1}} \otimes ((I_2 \otimes K'_{2^{m-i}})\, D'_{2^{m-i+1}}) = I_{2^{i-1}} \otimes (K_{2^{m-i+1}} \check{D}_{2^{m-i+1}}) \\ &= (I_{2^{i-1}} \otimes K_{2^{m-i+1}})(I_{2^{i-1}} \otimes \check{D}_{2^{m-i+1}}). \end{aligned} \tag{A2}$$

Applying (A2) to (A1) and noting that $\check{D}_N K'_N = B'_N$, $\check{D}_N K''_N = B''_N$, and $D'_2 = \check{D}_2$, the transform matrix $\Psi_N$ can be factorized as

$$\begin{aligned} \Psi_N &= Q_N (I_{N/2} \otimes \check{D}_2)(I_{N/4} \otimes K_4)(I_{N/4} \otimes \check{D}_4)(I_{N/8} \otimes K_8) \cdots (I_2 \otimes \check{D}_{N/2})\, K_N \check{D}_N K'_N \\ &= Q_N (I_{N/4} \otimes B_4)(I_{N/8} \otimes B_8) \cdots (I_2 \otimes B_{N/2})\, B_N B'_N, \end{aligned}$$

and the proof is completed.

Appendix B. Proof for (7)

Let

$$M_N = (I_{N/2} \otimes D'_2)(I_{N/2} \otimes K'_2)(I_{N/4} \otimes D'_4)(I_{N/4} \otimes K'_4) \cdots (I_2 \otimes D'_{N/2})(I_2 \otimes K'_{N/2})\, D'_N K'_N.$$

Obviously, $M_N = (I_2 \otimes M_{N/2})\, D'_N K'_N$, and so

$$\begin{aligned} \bar{M}_N &= (I_2 \otimes M_{N/2})\, D'_N K'_N \bar{I}_N = \begin{bmatrix} M_{N/2} & \bar{M}_{N/2} \\ M_{N/2} D_{N/2} & -M_{N/2} \bar{D}_{N/2} \end{bmatrix} \begin{bmatrix} & \bar{I}_{N/2} \\ \bar{I}_{N/2} & \end{bmatrix} \\ &= \begin{bmatrix} M_{N/2} & \bar{M}_{N/2} \\ -M_{N/2} D_{N/2} & M_{N/2} \bar{D}_{N/2} \end{bmatrix} = \begin{bmatrix} I_{N/2} & \\ & -I_{N/2} \end{bmatrix} M_N; \end{aligned}$$

therefore,

$$\begin{aligned} M_N &= \begin{bmatrix} M_{N/2} & \\ & \bar{M}_{N/2} \end{bmatrix} \begin{bmatrix} I_{N/2} & \\ & \bar{I}_{N/2} \bar{D}_{N/2} \end{bmatrix} K'_N \\ &= (I^r_4 \otimes I_{N/4}) \begin{bmatrix} M_{N/2} & \\ & M_{N/2} \end{bmatrix} \begin{bmatrix} I_{N/2} & \\ & \tilde{D}_{N/2} \end{bmatrix} K'_N = (I^r_4 \otimes I_{N/4})(I_2 \otimes M_{N/2})\, B'_N,$$ \end{aligned}$$

and thus $M_N$ can be written in recursive form as

$$M_N = \check{I}_N (I_2 \otimes \check{I}_{N/2}) \cdots (I_{N/4} \otimes \check{I}_4)(I_{N/2} \otimes B'_2)(I_{N/4} \otimes B'_4) \cdots (I_2 \otimes B'_{N/2})\, B'_N, \tag{B1}$$


where $\check{I}_N = I^r_4 \otimes I_{N/4}$. Substituting (B1) into (A1), a new recursive decomposition of $\Psi_N$ is obtained:

$$\Psi_N = Q_N I'_N (I_{N/2} \otimes B'_2)(I_{N/4} \otimes B'_4) \cdots (I_2 \otimes B'_{N/2})\, B'_N, \tag{B2}$$

where $I'_N = \check{I}_N (I_2 \otimes \check{I}_{N/2}) \cdots (I_{N/4} \otimes \check{I}_4)$. Letting $Q'_N \equiv Q_N I'_N$, one can show that

$$Q'_N = P'_N (I_2 \otimes P'_{N/2}) \cdots (I_{N/4} \otimes P'_4). \tag{B3}$$

In fact, since the matrix $Q_N$ performs the bit-reversal operation, the identity $Q_N \check{I}_N = I^r_N Q_N$ always holds, where $I^r_N = I_{N/4} \otimes I^r_4$ (so that $P'_N = I^r_N P_N$). Therefore

$$\begin{aligned} Q'_N = Q_N I'_N &= Q_N \check{I}_N (I_2 \otimes I'_{N/2}) = I^r_N Q_N (I_2 \otimes I'_{N/2}) = I^r_N P_N (I_2 \otimes Q_{N/2})(I_2 \otimes I'_{N/2}) \\ &= P'_N (I_2 \otimes (Q_{N/2} I'_{N/2})) = P'_N (I_2 \otimes Q'_{N/2}) = \cdots = P'_N (I_2 \otimes P'_{N/2}) \cdots (I_{N/4} \otimes P'_4). \end{aligned}$$

We complete the proof by inserting (B3) into (B2).

References

[1] F. Arguello, E.L. Zapata, Fast cosine transform based on the successive doubling method, Electron. Lett. 26 (19) (1990) 1616–1618.
[2] S.C. Chan, K.L. Ho, Direct methods for computing discrete sinusoidal transform, IEE Proc. 136 (6) (1990) 433–442.
[3] S.C. Chan, K.L. Ho, A new two-dimensional fast cosine transform algorithm, IEEE Trans. Signal Process. 39 (2) (1991) 481–485.
[4] Z. Cvetkovic, M.V. Popovic, New fast recursive algorithms for the computation of discrete cosine and sine transforms, IEEE Trans. Signal Process. 40 (8) (1992) 2083–2086.
[5] H.S. Hou, A fast recursive algorithm for computing the discrete cosine transform, IEEE Trans. ASSP ASSP-35 (10) (1987) 1455–1461.
[6] B.G. Lee, A new algorithm to compute the discrete cosine transform, IEEE Trans. ASSP ASSP-32 (6) (1984) 1243–1245.
[7] P. Lee, F.-Y. Huang, Restructured recursive DCT and DST algorithms, IEEE Trans. Signal Process. 42 (7) (1994) 1600–1609.
[8] I.D. Yun, S.U. Lee, On the fixed-point error analysis of several fast IDCT algorithms, IEEE Trans. Circuits Systems II: Analog Digital Signal Process. 42 (11) (1995) 685–692.

Zhaoli Guo received the B.S. degree in Mathematics from the Huazhong University of Science and Technology (HUST), People's Republic of China, in 1997, and the Ph.D. degree in Computer Science from HUST in 2000. He is currently working at the National Laboratory of Coal Combustion. His research interests include digital signal processing, parallel computation, and numerical algorithms. He has published around 15 journal and conference papers in these areas.

Baochang Shi received the B.S. degree in Mathematics from the Huazhong University of Science and Technology (HUST), People's Republic of China, in 1986, and the Ph.D. degree in Systems Engineering from HUST in 1996. Since 1997 he has been a professor in the Mathematics Department of HUST, and he joined the Institute of Parallel Computing of HUST in 1996. His research interests include digital signal processing and parallel computation. He has published around 50 journal and conference papers in these areas.

Nengchao Wang received the B.S. degree in Mathematics from Fudan University, People's Republic of China, in 1964, and then began to work in the Department of Computer Science of the Huazhong University of Science and Technology (HUST). In 1982 he joined the Mathematics Department of HUST as a professor. He is currently the head of the Institute of Parallel Computing of HUST. Over the past five years, his research group has received several national projects. He is a member of the editorial board of the journal Approximation Theory and its Applications. His research interests include digital signal processing, parallel computation, and numerical algorithms. He has published around 80 journal and conference papers in these areas.