Unifying Performance Metric
of Viterbi Decoders
by
Yong-dian Jian
Computer Science and Information Engineering
National Taiwan University, Taipei, 2004
Professor Mong-kai Ku and Professor Feipei Lai
Abstract
Convolutional codes and Viterbi decoders are extensively used in error control
systems. The survivor memory management (SMM) unit of a Viterbi decoder is
critical in determining the throughput, hardware area, and coding gain of the
whole system. Many SMM architectures have been proposed, but a unifying metric
for comparing their coding gain performance has been lacking. In this thesis, we
define a metric, the average traceback depth (ATBD), to unify the diversity of
different SMM architectures. The ATBD metric can be used to equalize different
SMM architectures and predict their optimal traceback depth (TBD), where
optimality is in terms of coding gain performance and hardware cost. We perform
extensive computer simulations with three popular convolutional codes (DVB,
DCII and UMTS) and many SMM architectures to verify the validity of the ATBD
metric. Simulation results show that the difference between the optimal TBD and
the ATBD is at most 10%. With this unifying metric, we can estimate the hardware
cost of different SMM architectures under a fixed coding gain. Moreover, because
the ATBD metric is very simple to calculate, system architects can use it to
quickly evaluate the tradeoff among hardware cost, throughput, and coding gain.
Table of Contents

1 Introduction
  1.1 Motivation
  1.2 Related Work
  1.3 Outline

2 Convolutional Code & Viterbi Algorithm
  2.1 Convolutional Codes
  2.2 The Viterbi Algorithm
  2.3 Punctured Convolutional Codes
  2.4 Communication Channel Models

3 Viterbi Decoder Architecture
  3.1 Overview
  3.2 Register Exchange Architecture
  3.3 Modified Register Exchange Architecture
    3.3.1 Comment
  3.4 Traceback Method
  3.5 One-Pointer Traceback
  3.6 Multiple Pointer Traceback
  3.7 Traceforward
  3.8 Sliding Block
  3.9 Best State Traceback Architecture
  3.10 Comparison of Architectures

4 Performance Analysis and Metric
  4.1 Simulation Setup
  4.2 Coding Gain Analysis of Viterbi Algorithm
    4.2.1 Simple Traceback Architecture
    4.2.2 Register Exchange Architecture
    4.2.3 Summary
  4.3 Equalization of SMM Architectures
    4.3.1 DCII & UMTS
    4.3.2 Best State Architecture
  4.4 Hardware Equalization
  4.5 Coding Gain Estimation

5 Conclusion & Future Work
  5.1 Conclusion
  5.2 Future Work

A Acronyms & Abbreviations
B Glossary of Notation
List of Figures

2.1 Rate-1/2 convolutional encoder with generator polynomials (G0=5, G1=7, K=3)
2.2 Rate-1/3 linear convolutional encoder with generator polynomials (G0=161, G1=135, G2=171, K=7)
2.3 State diagram for encoder in Figure 2.1
2.4 Trellis diagram of Figure 2.3
2.5 Information flow
2.6 Trellis diagram of Figure 2.3
2.7 Trellis diagram for rate-2/3 punctured convolutional code
2.8 Additive channel noise model
3.1 Viterbi decoder system block diagram
3.2 Register exchange architecture
3.3 Register contents for register exchange operations
3.4 Timing diagram of register exchange architecture
3.5 Register contents for register exchange operations
3.6 Basic implementation of TB method
3.7 Register contents for traceback operations
3.8 Memory organization of one-pointer traceback architecture
3.9 Timing diagram of one-pointer traceback architecture with DTR=0.5
3.10 Timing diagram of multiple pointer traceback architecture
3.11 Traceforward unit
3.12 Memory configuration of traceforward unit
3.13 Timing diagram of traceforward algorithm
3.14 Block diagram of traceforward algorithm
3.15 Hybrid Viterbi algorithm for selecting the shortest path through a trellis of finite length (four-state trellis example)
3.16 Block decoding using the SBVD method: (a) forward processing and (b) equal forward/backward processing
3.17 Continuous stream processing using the SBVD method
3.18 Best state traceback timing diagram
4.1 Computer simulation setup
4.2 Eb/No vs. BER for different coding systems
4.3 BER of DVB with simple traceback architecture
4.4 Coding gain of DVB with simple traceback architecture
4.5 Coding gain comparison of different SMM architectures (DVB)
4.6 Coding gain comparison equalized by ATBD (DVB)
4.7 Coding gain comparison of different SMM architectures (UMTS)
4.8 Coding gain comparison equalized by ATBD (UMTS)
4.9 Coding gain performance of best state architecture
4.10 Coding gain performance of best state architecture equalized by ATBD
List of Tables

2.1 Summary of three convolutional codes
2.2 Puncture code definition
3.1 Comparison of traceback architectures
3.2 Comparison of multiple pointer traceback architectures
3.3 Bandwidth requirement of basic SMM architectures
3.4 ATBD of architectures
3.5 Speed and bandwidth related comparisons of architectures
3.6 Memory size comparison of architectures
3.7 Hardware requirement comparison
3.8 Hardware unit description
4.1 Coding gain analysis of DVB with simple TB architecture
4.2 Coding gain analysis of DCII with simple TB architecture
4.3 Coding gain analysis of UMTS with simple TB architecture
4.4 Category of SMM architectures
4.5 Equalized area of architectures, 10^-3 mm^2
Chapter 1
Introduction
1.1 Motivation
In recent years, the demand for high-speed digital communication has pushed
data rates to 10 Gb/s in Ethernet, 56 Mb/s in WLAN and 2 Mb/s in mobile
communications [1, 2]. These high rate requirements pose throughput, cost and
power problems for the design of channel decoders. We need a coding scheme that
yields a high-speed, low-power, robust and realizable communication system.
Channel coding transforms the incoming data symbols so as to increase the
resistance of a digital communication system to channel noise. Convolutional
codes with Viterbi decoding are a popular candidate for channel coding. They
are widely used in applications such as satellite communication, COFDM, GSM,
UMTS, Ethernet, and magnetic disks and tapes.
The quality of a Viterbi decoder design is mainly measured by three criteria:
coding gain, throughput and power dissipation. High coding gain results in a low
data transfer error probability. High throughput is necessary for high-speed
applications. The design of Viterbi decoders with high coding gain and
throughput is challenged, however, by the need for low power, since Viterbi
decoders are often placed in battery-powered communication systems.
Single-chip Viterbi decoder (VD) design has been a very active research area
over the past 15 years, yet the design of high-throughput large-state Viterbi
decoders has remained largely unexplored. In Viterbi decoders, the performance
bottleneck lies in the Add-Compare-Select (ACS) unit and the survivor memory
management (SMM) unit. The
SMM unit relates to the hardware realization of Viterbi algorithm, and it affects the
throughput and hardware complexity of the Viterbi decoder. The traceback depth
(TBD) is the number of traceback steps before decoding the first bit. Generally
speaking, longer TBD results in smaller BER. But to different SMM architectures,
the same TBD does not result in the same BER performance. Although many
high-speed architectures and implementations of SMM unit have been reported in
the past, we lack general metrics to equalize the optimal traceback depth (TBD) of
different SMM architectures. The optimal TBD is defined as the TBD level such
that the coding gain stops increasing significantly beyond this level.
In this thesis, we define metrics to quickly evaluate the throughput performance
of different SMM architectures. We also define a metric, the average TBD (ATBD),
to equalize the diversity of different SMM architectures. It helps us predict
the optimal TBD and optimize the hardware cost and coding gain performance
of different SMM architectures. This metric brings the following benefits.
First, by unifying different SMM architectures, we can determine the optimal
TBD, so that no hardware resources are wasted. Second, for analysis
purposes, we can put different SMM architectures on the same baseline to predict
their throughput and turnaround time. Third, once we set the optimal TBDs
of different SMM architectures and fix the coding gain performance, we can
further compare their hardware cost and power dissipation.

To find the metric mentioned above, we first perform extensive computer
simulations on the simplest traceback architecture (TBS) (described in Section
3.4). These simulation results then serve as the baseline of our work.
Furthermore, we extend our simulations to different convolutional codes and SMM
architectures.
By analyzing the computer simulation results, we define a metric, the average
TBD (ATBD), to unify the diversity of different SMM architectures. Hence the
optimal TBD of different SMM architectures can be efficiently predicted. System
architects can quickly evaluate the tradeoff among hardware cost, throughput and
coding gain performance with these metrics.
1.2 Related Work
The theories of convolutional codes and the Viterbi algorithm are very mature;
the fundamental results can be found in [3, 4, 5, 6, 7, 8].

The algorithm is not everything, however. For the implementation of the Viterbi
algorithm, many VLSI techniques have been used to boost the throughput or lower
the power dissipation of Viterbi decoders. Many papers discuss ACS architectures
[9, 10, 11] and SMM architectures [12, 13, 14, 15, 16, 17]. Hardware
implementation reports of Viterbi decoders appear frequently in the literature
[18, 19, 20, 21, 22, 23]. In addition, we have to consider quantization and
precision problems [24]. Finally, Figure 1 of [22] summarizes the performance
of various published Viterbi decoders.
1.3 Outline
This thesis is organized as follows:
Chapter 2 provides the background on convolutional codes and the Viterbi
algorithm. We also present some channel noise models and the corresponding
generation mechanisms.

Chapter 3 surveys many SMM architectures and discusses the implementation
issues in Viterbi decoders. We define some performance metrics to measure the
performance of SMM architectures. The major hardware components used in every
SMM architecture are also summarized in the last section.
Chapter 4 gives the details of the simulation platform and the simulation
results. The BER performance of convolutional codes with the Viterbi algorithm
is analyzed, and the ATBD metric is defined to equalize the TBD of different
SMM architectures. A simple hardware area estimation is also given in this
chapter.
Chapter 5 summarizes and concludes this work.
Chapter 2
Convolutional Code & Viterbi Algorithm
2.1 Convolutional Codes
Convolutional codes offer an approach to error control substantially different
from that of traditional block codes, such as BCH or Reed-Solomon codes [6, 7, 8].
The input of a convolutional encoder is a stream of information symbols; the
encoder converts the entire data stream, regardless of its length, into a
single code word by means of a linear digital filter. Figure 2.1 shows a typical
rate-1/2 convolutional encoder. The rate of this encoder follows from the fact
that the encoder outputs two bits for every input bit.
When a new input bit arrives, the contents in the registers are shifted right,
such that the oldest bit is removed. Each information bit stays within the encoder
for a fixed amount of time. The constraint length K of a convolutional code is
the maximum number of bits in a single output stream that can be affected by any
input bit. In Figure 2.1, the maximum number of bits affected by any input bit
is three. The output bits are determined by selectively summing the information
remembered by the encoder. For example, assume that the input to the encoder is
1, 0, 1, 0, ..., and that the registers are initialized to 0. At time interval
0, the first input bit '1' arrives. The value of y00 is (0+1) modulo 2 = 1, and
the value of y10 is (0+0+1) modulo 2 = 1. At time interval 1, the first input
bit is shifted into r1 while the second input bit '0' arrives. The value of y01
is (0+0) modulo 2 = 0, and the value of y11 is (0+1+0) modulo 2 = 1. The same
encoding operation continues to generate the output bits.
Convolutional encoders can be viewed in a number of different ways. For
example, they may be considered as finite impulse response (FIR) digital filters
or as finite state machines (FSMs). Both of these approaches and their
corresponding analytical tools yield interesting insights into the structure of
convolutional codes.
As a FIR filter, the encoding operation can be described by two generator poly-
nomials, and each of them corresponds to an output bit stream. The generator
polynomials in Figure 2.1 are (the operator D represents a delay)
G(0)(D) = D2 + 1
G(1)(D) = D2 + D + 1
The interpretation of these polynomials is that each output is given by the
modulo-2 sum of the corresponding bits in the registers. Generator polynomials
are often coded
in octal in literature. A convolutional code system can be uniquely specified by a
constraint length and a set of generator polynomials. For example, the convolutional
code system in Figure 2.1 can be represented as (G0=5, G1=7, K=3).
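To make the FIR view concrete, the worked example above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the thesis; the generator masks 0b101 and 0b111 are octal 5 and 7, and the convention that the mask's most significant bit taps the current input bit is an assumption of this sketch.

```python
# Sketch of the rate-1/2 encoder of Figure 2.1 (G0=5, G1=7, K=3).
def conv_encode(bits, gens=(0b101, 0b111), k=3):
    """Encode a bit sequence; each generator is a K-bit tap mask over
    [input, r1, ..., r_{K-1}], with the MSB tapping the current input."""
    state = 0                                  # shift register r1..r_{K-1}
    out = []
    for x in bits:
        reg = (x << (k - 1)) | state           # current input + registers
        for g in gens:
            out.append(bin(reg & g).count("1") % 2)  # modulo-2 sum of taps
        state = reg >> 1                       # shift right; oldest bit drops
    return out

print(conv_encode([1, 0, 1, 0]))   # -> [1, 1, 0, 1, 0, 0, 0, 1]
```

The printed sequence reproduces the hand-computed outputs above: y00=1, y10=1 at time 0 and y01=0, y11=1 at time 1.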
Figure 2.2 shows another example, a rate-1/3 convolutional encoder. This is the
base system used in the DigiCipher II (DCII) standard, one of the satellite TV
broadcasting standards in the United States.
In order to analyze the behavior of the convolutional encoder, we can view it as
a finite state machine (FSM). For example, the encoder in Figure 2.1 is equivalent
to the FSM in Figure 2.3. Each state corresponds to a unique value of the
convolutional encoder's registers. Given the current state (XY), after the bit X is removed from
Figure 2.1: Rate-1/2 convolutional encoder with generator polynomials (G0=5, G1=7, K=3)

Figure 2.2: Rate-1/3 linear convolutional encoder with generator polynomials (G0=161, G1=135, G2=171, K=7)
the registers, the next state can be either (Y0) (corresponding to input bit ”0”) or
(Y1) (corresponding to input bit ”1”). Each branch in the state diagram has a label
of the form X/YZ, where X is the input bit that causes the state transition and
YZ is the corresponding pair of output bits. From the FSM, a new representation
called the trellis diagram is derived, shown in Figure 2.4. This structure is
especially suitable for the interpretation of the Viterbi algorithm.
To generalize the previous explanation, a rate-a/b convolutional code with
constraint length K has 2^{a(K-1)} states. Each state has 2^a incoming and outgoing
Figure 2.3: State diagram for encoder in Figure 2.1

Figure 2.4: Trellis diagram of Figure 2.3
branches, and each branch represents a b-bit code word. Hence the number of
states grows exponentially with a and K, and the hardware complexity increases
exponentially as well. In practice, we realize complex code rates (e.g. 6/7,
7/8) by puncturing a simple-rate (e.g. 1/2, 1/3) convolutional code. This
approach enhances the modularity of the system: we gain the flexibility to vary
the data rate over the channel merely by adding a puncturing unit at the encoder
and a depuncturing unit at the decoder. By contrast, directly implementing a
variable-rate codec is less flexible and more expensive.
In this work, we use three popular convolutional code systems. The first is used
in Digital Video Broadcasting Standard (DVB) [25, 26, 27], which is the European
standard for digital broadcasting. The second is used in DigiCipher II (DCII), which
is one of the main satellite TV broadcasting standards in the United States. The
third is used in Universal Mobile Telecommunications System (UMTS), which is
used in Code-Division Multiple Access (CDMA) applications.
Table 2.1: Summary of three convolutional codes

Code         K   G0   G1   G2   dfree
DVB          7   133  171  -    10
DCII         7   161  135  171  15
UMTS (1/2)   9   561  753  -    12
UMTS (1/3)   9   557  663  711  18
Table 2.1 summarizes the convolutional codes mentioned above. Constraint
lengths K less than 5 are too small to provide any substantial coding gain,
while systems with K greater than 9 are typically too complex to implement as a
parallel architecture on a single VLSI device. The parameter dfree, the minimum
free distance, is the minimum Hamming distance between all pairs of complete
convolutional code words. In general, a larger constraint length or a lower code
rate results in a larger dfree, and a convolutional code with larger dfree has
more coding gain. For more details on convolutional codes, please refer to
[6, 7, 8].
2.2 The Viterbi Algorithm
Consider the decoding problem presented in Figure 2.5. An information sequence
x is encoded to form a convolutional code word y, which is then transmitted
across a noisy channel. The convolutional decoder takes the received vector r
and generates an estimate y′ of the transmitted code word.

Figure 2.5: Information flow
The maximum likelihood (ML) decoder selects, by definition, the esti-
mate y′ that maximizes the probability p(r|y′), while the maximum a posteriori
(MAP) decoder selects the estimate that maximizes p(y′|r). If the distribution of
the source words x is uniform, then the two decoders are identical; in general, they
can be related by Bayes’ rule
p(r|y)p(y) = p(y|r)p(r) (2.1)
The development of the ML decoder is pursued in this section. Suppose that a
rate-a/b convolutional encoder is in use, and we have an input sequence x
composed of L a-bit blocks:

x = (x_0^{(0)}, x_0^{(1)}, ..., x_0^{(a-1)}, x_1^{(0)}, x_1^{(1)}, ..., x_1^{(a-1)}, ..., x_{L-1}^{(a-1)})
The output sequence y will consist of L b-bit blocks (one for each input block) as
well as m additional blocks, where m is the length of the longest shift register in the
encoder.
y = (y_0^{(0)}, y_0^{(1)}, ..., y_0^{(b-1)}, y_1^{(0)}, y_1^{(1)}, ..., y_1^{(b-1)}, ..., y_{L+m-1}^{(b-1)})
A noise-corrupted version r of the transmitted code word arrives at the receiver,
where the decoder generates a maximum likelihood estimate y′ of the transmitted
10
2.2. THE VITERBI ALGORITHM
sequence. r and y′ have the following form:

r = (r_0^{(0)}, r_0^{(1)}, ..., r_0^{(b-1)}, r_1^{(0)}, r_1^{(1)}, ..., r_1^{(b-1)}, ..., r_{L+m-1}^{(b-1)})

y′ = (y′_0^{(0)}, y′_0^{(1)}, ..., y′_0^{(b-1)}, y′_1^{(0)}, y′_1^{(1)}, ..., y′_1^{(b-1)}, ..., y′_{L+m-1}^{(b-1)})
A few assumptions about the channel need to be made to facilitate the analysis.
We assume that the channel is memoryless, i.e., that the noise process affecting a
given bit in the received word r is independent of the noise process affecting all of
the other received bits. Since the probability of joint, independent events is simply
the product of the probabilities of the individual events, it follows that
p(r|y′) = ∏_{i=0}^{L+m-1} [ p(r_i^{(0)}|y′_i^{(0)}) p(r_i^{(1)}|y′_i^{(1)}) ... p(r_i^{(b-1)}|y′_i^{(b-1)}) ]

        = ∏_{i=0}^{L+m-1} ∏_{j=0}^{b-1} p(r_i^{(j)}|y′_i^{(j)})   (2.2)
There are two sets of product indices, one corresponding to the block numbers
(subscripts) and the other corresponding to bits within the blocks (superscripts).
Equation (2.2) is sometimes called the likelihood function for y′. Since logarithms
are monotonically increasing, the estimate that maximizes p(r|y′) is also the estimate
that maximizes log p(r|y′). By taking the logarithm of each side of Equation (2.2),
we obtain the log likelihood function
log p(r|y′) = Σ_{i=0}^{L+m-1} Σ_{j=0}^{b-1} log p(r_i^{(j)}|y′_i^{(j)})   (2.3)
By inspecting Equation (2.3), we may notice that the summands are
probabilities, i.e., real numbers. In hardware implementations of the Viterbi
decoder, the summands in Equation (2.3) are usually converted to a more easily
manipulated form called the bit metrics:
M(r_i^{(j)}|y′_i^{(j)}) = a [ log p(r_i^{(j)}|y′_i^{(j)}) + b ]   (2.4)
11
CHAPTER 2. CONVOLUTIONAL CODE & VITERBI ALGORITHM
a and b are chosen such that the bit metrics are small positive integers that
can be easily manipulated by digital logic circuits. The path metric for a code
word y′ is then computed as follows:
M(r|y′) = Σ_{i=0}^{L+m-1} Σ_{j=0}^{b-1} M(r_i^{(j)}|y′_i^{(j)})   (2.5)
If a in Equation (2.4) is positive and real, while b is simply real, then the
code word y′ that maximizes p(r|y′) also maximizes M(r|y′).

At times it is useful to focus on the contribution made to the path metric by a
single block of r and y′. Recall that a single block corresponds to a single
branch in the trellis. The kth branch metric (BM) for a code word y′ is
defined as the sum of the bit metrics for the kth block of r given y′:
M(r_k|y′_k) = Σ_{j=0}^{b-1} M(r_k^{(j)}|y′_k^{(j)})   (2.6)
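As a concrete special case not spelled out in the text: with hard-decision inputs and a suitable choice of a and b in Equation (2.4), the bit metric reduces to Hamming distance, so the branch metric of Equation (2.6) is just a count of disagreeing bits. The following sketch assumes that common convention.

```python
def branch_metric(r_block, y_block):
    """Branch metric (Eq. 2.6): sum of per-bit metrics over one b-bit
    branch; here the bit metric is Hamming distance (hard decisions)."""
    return sum(int(r != y) for r, y in zip(r_block, y_block))

print(branch_metric([1, 0], [1, 1]))   # -> 1
print(branch_metric([0, 0], [1, 1]))   # -> 2
```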
After explaining the Viterbi algorithm from a theoretical point of view, let us illus-
trate the above algorithm by an example.
By inspecting the example shown in Figure 2.3, we can redraw the evolution
of this four-state FSM by using the trellis diagram and present it in Figure 2.6.
Figure 2.6: Trellis diagram of Figure 2.3

Each node corresponds to a unique state, and each branch corresponds to a state
transition. Given a known starting state, every input sequence corresponds to a
unique path through the trellis. At the Viterbi decoder, upon receiving an
encoded symbol, each branch is assigned a weight referred to as the branch
metric, a measure of the likelihood of the transition given the noisy
observations at the receiver. Branch metrics are typically calculated using a
distance measure, so that more likely transitions are assigned smaller weights.
Given the unique mapping between a trellis path and an input sequence, the most
likely path (shortest path) through the trellis corresponds to the most likely
input sequence. The Viterbi algorithm is an efficient method for finding the
shortest path through the trellis.
The first phase of the Viterbi algorithm is to recursively compute the shortest
path to time t+1 in terms of that to time t. At time t, each state j is
assigned a state metric Γ^j_t, defined as the accumulated metric along the
shortest path leading to that state. The state metric at time t+1 can be
recursively calculated in terms of the state metrics of the previous iteration
as follows:
Γ^j_{t+1} = min_i { Γ^i_t + λ^{i,j}_t }   (2.7)
where i is a predecessor state of j and λ^{i,j}_t is the branch metric on the
transition from state i to state j. The recursive update given in Equation
(2.7) is the well-known add-compare-select (ACS) operation. By inspecting
Equation (2.7), we see that every ACS operation at state j not only updates the
state metric but also produces a decision identifying the winning predecessor
state. To facilitate the second phase of the Viterbi algorithm, we keep these
decisions in memory.
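Equation (2.7) can be sketched directly. The function below performs one ACS iteration over all states and returns both the updated metrics and the per-state predecessor decisions that are kept for traceback; the dictionary-based trellis tables in the demo are illustrative placeholders, not a specific code from the thesis.

```python
def acs_step(metrics, branch, predecessors):
    """One add-compare-select update (Eq. 2.7).

    metrics[i]       -- state metric of state i at time t
    branch[(i, j)]   -- branch metric on the i -> j transition
    predecessors[j]  -- list of predecessor states of state j
    Returns (new_metrics, decisions); decisions[j] is the winning
    predecessor of j, stored in memory for the traceback phase."""
    new_metrics, decisions = [], []
    for j, preds in enumerate(predecessors):
        best, dec = min((metrics[i] + branch[(i, j)], i) for i in preds)
        new_metrics.append(best)    # add + compare + select
        decisions.append(dec)
    return new_metrics, decisions

# Two-state illustration: both survivors come from state 0.
m, d = acs_step([0, 3], {(0, 0): 1, (1, 0): 0, (0, 1): 2, (1, 1): 1},
                [[0, 1], [0, 1]])
print(m, d)   # -> [1, 2] [0, 0]
```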
The second phase of the Viterbi algorithm involves tracing back and decoding
the shortest path through the trellis, which is recursively defined by the decisions
from the ACS updates. We usually refer to this function as Survivor Memory
Management (SMM). The shortest path leading to a state is referred to as the
survivor path for that state. A property of the trellis which is utilized for survivor
path decoding is that if the survivor paths from all possible states at time t
are traced back, then with high probability all the paths merge at time (t−L),
where L is the traceback depth (or survivor path length). Once the paths
have merged onto the survivor path, the path is independent of the starting
state and of future ACS iterations.
Based on the merging property of the survivor path, the traceback method for
survivor path decoding proceeds as follows. At time t an arbitrary starting
state is chosen and the survivor path is traced back to time (t − L), at which
point the input symbol corresponding to the transition at time (t − L) is
decoded. So far, the simplest architecture of the Viterbi decoder has been
presented. A full discussion of SMM architectures is given in Chapter 3.
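The two phases just described can be combined into a short end-to-end sketch. The following hard-decision decoder for the (G0=5, G1=7, K=3) code of Figure 2.1 mirrors the ACS recursion of Equation (2.7) and then traces back over the stored decisions. It is an illustrative simplification (whole-block traceback from the best final state, assuming the encoder starts in state 0), not one of the SMM architectures discussed in Chapter 3.

```python
def viterbi_decode(received, n_bits, gens=(0b101, 0b111), K=3):
    """Hard-decision Viterbi decoding for the rate-1/2 (5, 7), K=3 code.
    received: flat list of hard bits, 2 per information bit."""
    n_states = 1 << (K - 1)

    def step(s, b):
        """Trellis transition: from state s with input b, return the
        next state and the pair of output bits."""
        reg = (b << (K - 1)) | s
        return reg >> 1, [bin(reg & g).count("1") % 2 for g in gens]

    INF = float("inf")
    metrics = [0.0] + [INF] * (n_states - 1)   # encoder starts in state 0
    decisions = []                             # survivor memory
    for t in range(n_bits):
        r = received[2 * t:2 * t + 2]
        new_metrics = [INF] * n_states
        dec = [0] * n_states
        for s in range(n_states):
            for b in (0, 1):
                ns, y = step(s, b)
                bm = (y[0] != r[0]) + (y[1] != r[1])   # Hamming branch metric
                if metrics[s] + bm < new_metrics[ns]:  # compare-select
                    new_metrics[ns] = metrics[s] + bm
                    dec[ns] = s
        metrics = new_metrics
        decisions.append(dec)

    # Traceback: start from the best final state and follow the stored
    # decisions; with the register ordering used in step() (r1 is the
    # state's MSB), the input bit of a transition is the MSB of the
    # state it leads into.
    s = metrics.index(min(metrics))
    bits = []
    for dec in reversed(decisions):
        bits.append(s >> (K - 2))
        s = dec[s]
    return bits[::-1]

rx = [1, 1, 0, 1, 0, 0, 0, 1]     # encoding of 1,0,1,0 (see Section 2.1)
print(viterbi_decode(rx, 4))      # -> [1, 0, 1, 0]
```

Flipping a single received bit still yields the correct decision, since competing trellis paths are at least two Hamming units farther from the received word.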
2.3 Punctured Convolutional Codes
Puncturing is defined as the systematic deletion of one or more bits in every
code word. The use of puncturing facilitates the design of variable-code-rate
Viterbi decoders. Given a fixed encoder structure, higher-rate convolutional
codes are obtained by periodically deleting bits from the output streams of the
convolutional encoder. For example, let C be a rate-1/2 convolutional code, and
let x be the source information sequence corresponding to a code word y ∈ C:
x = (x_0, x_1, x_2, ...)

y = (y_0^{(0)} y_0^{(1)}, y_1^{(0)} y_1^{(1)}, y_2^{(0)} y_2^{(1)}, y_3^{(0)} y_3^{(1)}, y_4^{(0)} y_4^{(1)}, ...)
If every fourth bit of y is deleted, the resulting punctured code word yP has
the form

yP = (y_0^{(0)} y_0^{(1)}, y_1^{(0)} E, y_2^{(0)} y_2^{(1)}, y_3^{(0)} E, y_4^{(0)} y_4^{(1)}, ...)   (2.8)
E's have been inserted to mark the locations of the deleted bits, though
nothing is actually transmitted for these bits. Since yP has three code-word
bits for every two information bits, yP is a code word in a rate-2/3 punctured
code CP. If the receiver inserts erasures at the points where bits have been
punctured, the rate-1/2 Viterbi decoder may be used instead of a more
complicated rate-2/3 one. For example, we can puncture the rate-1/2
convolutional code shown in Figure 2.1; the resulting code is shown in Figure
2.7 (a), and the corresponding rate-2/3 trellis is derived in Figure 2.7 (b).
It must be noted that the dfree of a punctured convolutional code is smaller
than that of the original code. Hence punctured convolutional codes boost the
data rate at the expense of degraded coding gain.
Figure 2.7: Trellis diagram for rate-2/3 punctured convolutional code
To decode a punctured convolutional code, we need an extra depuncturing block
and a synchronization block in the Viterbi decoder. The depuncturing block
processes the input data and inserts dummy cycles to restore the rate-1/2
timing relationships before sending the data stream to the ACS unit. The
synchronization block corrects the timing ambiguity in the depuncturing
process. With these extra blocks, we can support all the different coding rates
with a single rate-1/2 Viterbi decoder engine.
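A minimal sketch of transmitter-side puncturing and receiver-side depuncturing for the rate-2/3 pattern described above (every fourth bit of the rate-1/2 stream deleted). The use of None as the erasure marker, standing in for the E symbols of Equation (2.8), is an assumption of this sketch.

```python
def puncture(stream, pattern=(1, 1, 1, 0)):
    """Delete the bits whose position (mod len(pattern)) is marked 0."""
    return [b for i, b in enumerate(stream) if pattern[i % len(pattern)]]

def depuncture(stream, pattern=(1, 1, 1, 0)):
    """Re-insert an erasure marker (None) at each punctured position,
    restoring the rate-1/2 timing for the ACS unit."""
    out, bits, i = [], iter(stream), 0
    while True:
        if pattern[i % len(pattern)]:
            try:
                out.append(next(bits))
            except StopIteration:
                break
        else:
            out.append(None)   # erasure: should contribute zero branch metric
        i += 1
    return out

tx = puncture([1, 1, 0, 1, 0, 0, 0, 1])
print(tx)               # -> [1, 1, 0, 0, 0, 0]
print(depuncture(tx))   # -> [1, 1, 0, None, 0, 0, 0, None]
```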
Table 2.2 shows the punctured code rates and the puncture matrix for all three
standards.
Table 2.2: Puncture code definition
Standard Rate Puncture Matrix
DVB 1/2 N/A
2/3 [11],[10]
3/4 [110],[101]
4/5 [1111],[1000]
5/6 [11010],[10101]
6/7 [111010],[100101]
7/8 [1111010],[1000101]
DCII 1/2 [0],[0],[1]
3/4 [100],[001],[110]
5/11 [00111],[11010],[11111]
2/3 [11],[00],[01]
4/5 [0111],[0010],[1000]
7/8 [0000000],[0000001],[1111111]
3/5 [001],[010],[111]
5/6 [00111],[00000],[11001]
UMTS (1/2) 1/2 N/A
2/3 [11],[10]
3/4 [111],[100]
4/5 [1101],[1010]
5/6 [10110],[11001]
6/7 [110110],[101001]
7/8 [1101011],[1010100]
2.4 Communication Channel Models
In the design of communication systems for transmitting information through
physical channels, it is convenient to construct mathematical models that
reflect the most important characteristics of the transmission medium. The
mathematical model of the channel is then used in the design of the channel
encoder and modulator at the transmitter, and the demodulator and channel
decoder at the receiver.
Figure 2.8: Additive channel noise model
The simplest mathematical model for a communication channel is the additive
noise channel, which is illustrated in Figure 2.8. Physically, the additive noise
process may arise from electronic components and amplifiers at the receiver of the
communication system or from interference encountered in transmission. In this
model, the transmitted signal s(t) is corrupted by a random process n(t). This type
of noise is usually characterized statistically as a Gaussian noise process. Hence, the
resulting mathematical model for the channel is usually called the additive Gaussian
noise channel. Because of its simplicity and mathematical tractability, this channel
model is used extensively in the communication system analysis and design. Channel
attenuation is easily incorporated into the model. The received signal is
r(t) = αs(t) + n(t) (2.9)
where α is the attenuation factor.
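The additive channel of Equation (2.9) can be sketched in a few lines of Python. The function name, the BPSK mapping, and the unit-signal-power assumption are illustrative choices, not part of the text; this is a minimal sketch, not a definitive implementation.

```python
import numpy as np

def awgn_channel(s, snr_db, alpha=1.0, rng=None):
    """Attenuate signal s and add white Gaussian noise: r = alpha*s + n.
    Assumes unit average signal power when converting SNR to noise variance."""
    rng = np.random.default_rng() if rng is None else rng
    noise_var = 10.0 ** (-snr_db / 10.0)
    n = rng.normal(0.0, np.sqrt(noise_var), size=len(s))
    return alpha * np.asarray(s, dtype=float) + n

# BPSK symbols (+1/-1) through the channel
bits = np.array([1, 0, 1, 1, 0])
s = 1.0 - 2.0 * bits          # map bit 0 -> +1, bit 1 -> -1
r = awgn_channel(s, snr_db=6.0)
hard = (r < 0).astype(int)    # hard decisions back to bits
```

At high SNR the hard decisions recover the transmitted bits; at low SNR the channel decoder must correct the resulting errors.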
Although AWGN provides a satisfactory model in most situations, it is not capable of
modeling some phenomena such as multipath fading [7] (intersymbol interference)
and shadowing [28]. These phenomena severely degrade the performance of data
transmission. In communication systems with a land mobile terminal via satellite,
shadowing is caused by large obstacles in the signal path. Multipath fading is
caused by reflection of the satellite signal at a large number of points. Rician,
Rayleigh, Nakagami and lognormal processes are used for describing these special
phenomena. The Rician fading model may be used to model both the microcellular
environment and the mobile satellite fading channel. In these environments, there is
no obstacle in the signal path. The Rayleigh process is a widely accepted statistical
model for the received signal envelope in macrocellular mobile radio channels, where
there is no direct line-of-sight (LOS) radio propagation path. The more general
Nakagami fading model, parameterized by the fading severity parameter m, has been
shown to fit well to some urban multipath propagation data. Lognormal process is
usually used to model the shadowing effect by large obstacles such as buildings and
mountains. For details of the above random processes, please refer to [7].
In addition to this fading only model, the received signal envelope in mobile radio
environments may suffer from shadowing, due to the topographical variation of the
transmission path. Therefore, for describing the random signal envelope variations
in microcellular mobile radio systems, a channel model characterized by Nakagami
fading and lognormal shadowing, so called Nakagami-lognormal (NLN) channel [29],
is appropriate. Certainly, there are many other channel models suitable for more
specific situations.
In addition to the mathematical model of the channel, we have to know how to
generate these channel processes systematically. Some popular methods are summarized below.
White Gaussian Random Process Method [30] In this method, it is assumed
that all computer models (Rayleigh, Rician, log-normal) for the fading chan-
nels are based on the manipulation of a white Gaussian random process. The
white Gaussian random process can be approximated by a sum of sinusoids
with random phase angle.
Inverse Transform Method [31] This method originates from probability the-
ory: a random variable x with continuous cdf F (x) can be generated by ap-
plying F−1 to a uniform random variable on [0, 1]. Some modified procedures
for generating correlated Rayleigh and Nakagami-m fading signal pairs have
also been developed [32].
Direct Probe No mathematical model is better than the real world. A good ap-
proach is to extract the signals by probing the real channel and to record
them on tape [33]. The only a priori assumption is that the channel is non-
frequency selective, which for low-rate data transmission can be easily verified
for most mobile fading channels.
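As an illustration of the inverse transform method above, the Rayleigh envelope has cdf F(x) = 1 − exp(−x²/(2σ²)), which inverts in closed form. The function name, the σ parameterization, and the sample count below are illustrative assumptions.

```python
import numpy as np

def rayleigh_inverse_transform(n, sigma=1.0, rng=None):
    """Generate Rayleigh-distributed envelope samples by inverting the cdf
    F(x) = 1 - exp(-x^2 / (2*sigma^2)): x = sigma * sqrt(-2 ln(1 - u))."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(0.0, 1.0, size=n)
    return sigma * np.sqrt(-2.0 * np.log(1.0 - u))

samples = rayleigh_inverse_transform(100000, sigma=1.0,
                                     rng=np.random.default_rng(1))
# Mean of a Rayleigh(sigma) variable is sigma * sqrt(pi/2)
```

The same recipe applies to any fading distribution whose cdf can be inverted; correlated-sample variants require the modified procedures of [32].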
Chapter 3
Viterbi Decoder Architecture
3.1 Overview
The Viterbi decoder can be divided into five blocks: the depuncture unit, the
branch metric (BM) unit, the Add-Compare-Select (ACS) unit, the survivor mem-
ory management (SMM) unit and the synchronization unit. The block diagram
of a constraint length K, punctured rate-m/n Viterbi decoder (based on a rate-1/b
Viterbi decoder) is shown in Figure 3.1.
The demodulator output goes to the depuncture unit where dummy symbols are
inserted to restore the rate-m/n encoded data rate to rate-1/b source data rate. In
the BM unit, b input soft decisions and the puncture matrix from the depuncture
logic are used to generate 2b different branch metrics. The ACS unit selects the
surviving branches according to Equation (2.7) and generates the decision vectors.
According to the decision vectors generated by the ACS unit, the SMM unit finds the
survivor sequence to produce the decoded output bit stream. Many high speed or low
power ACS architectures have been proposed [9, 10, 11]. The synchronization unit
finishes the decoding process by resolving the timing and phase ambiguity introduced
by the channel. Various synchronization schemes are presented in the literature [34,
35, 36, 37, 38]. The SMM unit is the most critical part in the design of Viterbi
decoder. It affects the coding gain performance, hardware cost and throughput of
Figure 3.1: Viterbi decoder system block diagram
the Viterbi decoder. We will focus our discussion on the SMM unit in the latter
sections.
In the Viterbi decoding process, the decision vectors generated by the ACS unit
are stored in memory, so that the information bits can be retrieved later. To achieve
the maximum likelihood decoding of the convolutional code, the Viterbi algorithm
calls for arbitrarily long path memories, but this requirement is impractical for
actual hardware implementation. Upon examining the Viterbi algorithm
closely, however, the required memory can be cut to a manageable size with negligible loss in
coding gain performance. An unmerged path will accumulate more distance metric
than the merged one with high probability. In other words, upon merging to the
survivor path, an incorrect path with a very long unmerged span will have very low
probability of having a smaller metric than the correct path. Consequently, with
very high probability, the best path to all the states will have diverged from the
correct path within a short span, typically a few times the constraint length. So the
path memory only needs to be long enough to ensure that all paths have merged. This
length is called the traceback depth (TBD), L, in Viterbi decoders.
Definition 1 Traceback Depth (TBD) is the number of traceback operations per-
formed before decoding the first bit.
Forney [5] showed through ensemble coding arguments that the probability
of truncation error decreases exponentially with TBD. At low signal-to-noise ratios,
his experimental result shows that the probability of truncation error is negligible
for L ≥ 5.8m (Rate-1/2), where m is the maximal memory order (the length of the
longest shift register in the encoder).
There are two basic approaches to survivor memory management: the register
exchange (RE) method and the traceback (TB) method. In both techniques a shift
register is associated with every trellis node throughout the decoding operations. The
RE-based method is suitable for high-speed applications at the expense of complex
routing and high power dissipation. The routing complexity grows exponentially with the constraint length. For
Viterbi decoder implementation with medium to high constraint lengths, the TB
based method is the preferred architecture for the SMM unit because of its lower
hardware requirements. The TB-based method has commonly been used in low-throughput
or low-power applications. It permits the design of a very compact RAM that provides
significant area advantages. We will show the details of the RE and TB architectures
in Section 3.2 and Section 3.4.
3.2 Register Exchange Architecture
Figure 3.2 shows the architecture of the register exchange method. A register ex-
change architecture consists of a two-dimensional array of one-bit registers and mul-
tiplexers. The decision sequences of all 2^{K−1} (in this case, K = 3) states are stored in
every column of registers. These registers are interconnected in precisely the same fashion
as the ACS circuits. Associated with every trellis state is a register which contains
the survivor path leading to that state. The decision sequence d^S_{n−4,n} of the survivor
path to state S from time (n−4) to n is given by the recursive update:

d^S_{n−4,n} = (d^{S′}_{n−5,n−1} << 1) | d^S_n (3.1)

where S′ is the predecessor state of S as determined by its decision d^S_n from the
ACS update. Each survivor path is uniquely specified and stored as the sequence of
decisions along the survivor path.
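The update (3.1) can be sketched as follows. The trellis convention used here (the decision bit selects the predecessor's top bit, and the appended input bit equals the LSB of the new state) and the example decision vectors are assumptions for illustration, not taken from the text.

```python
# Minimal register-exchange sketch for a radix-2 trellis with 2^(K-1) states.
# decisions[s] is the ACS decision for state s: it selects which of the two
# predecessors, (s >> 1) or (s >> 1) | TOP_BIT, survives.
K = 3                      # constraint length
NSTATES = 1 << (K - 1)

def re_step(regs, decisions):
    """One register-exchange update: copy the surviving predecessor's
    register and append the input bit (LSB of the current state), per (3.1)."""
    new = [0] * NSTATES
    for s in range(NSTATES):
        pred = (decisions[s] << (K - 2)) | (s >> 1)
        new[s] = (regs[pred] << 1) | (s & 1)   # shift left, append new bit
    return new

regs = [0] * NSTATES
# Hypothetical ACS decision vectors for three trellis steps
for decisions in [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]:
    regs = re_step(regs, decisions)
# regs[s] now holds the last 3 decoded bits of the survivor path to state s
```

After L such iterations, the register of the minimum-metric state holds the survivor path decisions, as described above.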
Each time a new branch is processed by the ACS, the register values are inter-
changed corresponding to the decision values. A new symbol is added to one end
of each register, and the oldest symbol in each register is delivered to the output
decision device. Figure 3.3 shows the trellis network of the RE architectures. Every
register has two candidate predecessors, one is connected with solid trellis and the
other is connected with dash one. The solid trellis corresponds to the decision value
generated by the ACS unit. At the moment of copying, the decision value is also
appended to the end of the registers. After L iterations, the state register with the
Figure 3.2: Register exchange architecture
minimum path metric contains the information of the survivor path. (It may seem
that no traceback operations are performed, so the definition of TBD would not
apply here. But forward register exchange and traceback operations are essentially
equivalent with regard to the Viterbi algorithm.)
Figure 3.3: Register contents for register exchange operations
Figure 3.4 shows another timing diagram of the RE architecture. At any time
interval, new decision values are appended to the registers and the MSB of the
zeroth register is taken as the decoded bit.
Figure 3.4: Timing diagram of register exchange architecture
For high-speed implementations, this technique requires all the shiftings to be
done in parallel. As a result, every stage will have a routing channel for interchang-
ing the decision values. It is the routing complexity that makes the RE method
impractical for large constraint lengths, or for high-puncture-rate situations which
have a high TBD and a large number of states.
Variations of this architecture can be used to improve the register exchange
method. We will show one in Section 3.3.
3.3 Modified Register Exchange Architecture
The RE method is based on successive RE operations between two origin states
and two destination states. It is the value of the registers that is interchanged along
the trellis. In this section, the presented method uses the "pointer" concept: state
indexes can be used as inputs to the register exchange network instead of the values
of the registers [17]. Instead of moving the values between registers, the pointer
to the source register is altered to point to the destination register. In the Viterbi
decoder, every state is assigned a register and a pointer; here the pointer to the
register simply carries the current state.
Figure 3.5: Register contents for register exchange operations
For example, if (PM^i_{t−1} + BM^{i,p}_t) is greater than (PM^j_{t−1} + BM^{j,p}_t), then the path
from j to p is the survivor path for p. The pointer to register j which carries the
value j is shifted to the left, and the bit which causes the survivor path transition
from j to p (in this case "0") is appended to the LSB. Therefore, the pointer which
carried the value j now carries the value p, thereby pointing to register p. Then,
the decoded bit is appended to the content of the register whose pointer value is
changed from j to p. It must be noted that the register has a fixed physical location;
only the value of its pointer changed, and a bit is appended to the corresponding
register for each code word received. But a problem arises when j is the predecessor
of p and q at the same time. Which value should the pointer j take, p or q? In other
words, which of the two paths, originating from the same source state, should be
the survivor path, and how should the other path terminate? The ACS unit then
needs to produce a new decision bit, called the termination bit.
The solution to this problem is simple. If both paths originating from state
j are considered to be the survivor paths for the destination states p and q, the BMs
of both paths are compared. Then, the pointer, which carries the value j, changes
to the value of the destination pointer whose path has the smaller BM. The value
of the pointer of state i changes to the other destination pointer. Simultaneously,
the path from state i receives a termination high signal. This indicates that the
path from state i is terminated, and no decision bits are appended to its register
anymore. This prevents the presence of duplicated pointer values.
3.3.1 Comment
In the process of implementing the modified RE architecture, we find a problem
hidden in this architecture. In that paper, the update mechanism of register pointers
does not consider a common case. For example, if j is the predecessor of p and q at the
same time, then according to the update scheme above, we have to find the smaller
of the branch metrics BM^{j,p}_t and BM^{j,q}_t. But when BM^{j,p}_t and BM^{j,q}_t are equal,
we have trouble determining which of them will be the successor. Choosing either of
the states may break the essence of the Viterbi algorithm, because we have no chance
to recover from the error we made at this time instance. In the traditional RE architecture,
this kind of problem is resolved by keeping both paths alive and deferring the
decision (the essence of the Viterbi algorithm). But it seems difficult to solve this
problem in the modified RE architecture.
3.4 Traceback Method
The traceback (TB) method is a backward processing algorithm for deriving the
survivor path from a starting state and the path decisions. The survivor memory
does not store the whole surviving paths but the decisions of branch selection in
every step. With these decision vectors output by ACS unit, the TB algorithm can
then trace the survivor path in the opposite direction to the ACS update, hence the
name "traceback" algorithm. Given the current state S_n and its decision d^{S_n}_n, the TB
algorithm estimates the previous state S_{n−1} by:

S_{n−1} = f(S_n, d^{S_n}_n) (3.2)

One may notice that the current state decision d^{S_n}_n is read from the decision
memory by using the current state S_n and time index n as an address. The function
f(·) is a mapping function depending on the structure of the trellis. For the radix-2
trellis, Equation (3.2) simplifies to

S_{n−1} = d^{S_n}_n (S_n >> 1) (3.3)

which corresponds to the concatenation of the decision bit and the 1-bit right shift of
the current shift register state. The radix-4 fully parallel architecture for a radix-2
trellis is the lookahead scheme which obeys ideal linear scaling for a two-fold increase
in throughput [18]. The two levels of traceback can be done with one mapping
function in Equation (3.4). This technique provides speedup of traceback operations
with extra lookahead hardware. But throughput doubling relies on implementing
a 4-way ACS operation at the same rate as a 2-way ACS. A 4-way ACS circuit
has been presented that achieves this goal to within 17% overhead, resulting in an
overall speedup of a factor of 1.7.
S_{n−2} = d^S_{n−2,n} (S_n >> 2) (3.4)
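The radix-2 step of (3.3) and its radix-4 lookahead form (3.4) can be sketched as below. The MSB-insertion convention and the value K = 3 are illustrative assumptions; the concatenation is realized with a shift and an OR.

```python
K = 3  # constraint length; states are K-1 = 2 bits wide (assumed example)

def prev_state(s, d):
    """One radix-2 traceback step, as in (3.3): prepend the decision
    bit to the 1-bit right shift of the current state."""
    return (d << (K - 2)) | (s >> 1)

def prev_state_radix4(s, d2):
    """Two traceback steps at once, as in (3.4): d2 packs two decision
    bits (the later step's bit in the high position)."""
    return (d2 << (K - 3)) | (s >> 2)

# e.g. from state 0b10 with decision 1, the previous state is 0b11
```

Two applications of `prev_state` give the same result as one `prev_state_radix4` call, which is exactly the lookahead property the text describes.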
A straightforward implementation of the traceback method is shown in Fig-
ure 3.6. A total of (2^{K−1} × L) bits of memory is needed (K is the constraint length
and L is the TBD).
Figure 3.6: Basic implementation of TB method
At cycle t, the ACS decision vector is written to memory location (t mod
L). The algorithm then traces back for L cycles to get the merged survivor path.
The decoded bit is then derived from the end state of the survivor path (the starting
state is at cycle t and the end state is at cycle t − L). So far, we have decoded one
bit, and the same procedure is repeated for the next bit. In later discussions, we
will refer to this straightforward TB architecture as the simple TB architecture (TBS).
Figure 3.7 shows the values of the registers for the TB method. This algorithm
is straightforward, and the memory size grows linearly with L and exponentially with
K. However, this algorithm is undesirable with regard to circuit implementation due
to the high number of memory accesses required. It poses a serious problem for the
throughput of the Viterbi decoder.
Figure 3.7: Register contents for traceback operations
Taking a closer look at the TB algorithm, there are three elemental operations
used in the decoding process:
Traceback read (TB): Reading the path decision in the current state, then com-
bining this information with the current state to find the previous state.
Decode Read (DC): This operation is similar to the TB operation. Instead of just
finding the previous state, it also decodes the symbol and sends it to the
bit-order reversing circuit (because the direction of decoding is reverse to the
stream of symbols).
Writing New Data (WR): This operation writes the decision vectors output by
ACS unit into the memory.
The TB algorithm consists of repetitions of the above three operations. In every
ACS iteration, the path decisions are written to a circular memory bank. Then the
traceback read operation executes for L iterations such that all survivor sequences
are merged to one common path. Finally the common path is scanned by the DC
operations to retrieve the information bits. In the simple TB architecture, we have
to do one WR operation and L TB operations to decode one bit.
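The TBS schedule above (one WR plus L TB reads per decoded bit) can be sketched as a toy loop. The circular-buffer indexing, the function names, and the radix-2 state-update convention are illustrative assumptions.

```python
# Toy sketch of the simple TB (TBS) schedule for a radix-2 trellis.
K, L = 3, 8                     # assumed constraint length and TBD
NSTATES = 1 << (K - 1)

def prev_state(s, d):
    """One traceback step: prepend the decision bit to (s >> 1)."""
    return (d << (K - 2)) | (s >> 1)

def tbs_decode_one(mem, t, start_state):
    """Trace back L steps from start_state at cycle t through a circular
    buffer of decision vectors; the decoded bit is the LSB of the end state."""
    s = start_state
    for i in range(L):                 # L traceback (TB) reads
        d = mem[(t - i) % L][s]        # decision for state s at that cycle
        s = prev_state(s, d)
    return s & 1                       # decoded information bit (DC)

all_zero = [[0] * NSTATES for _ in range(L)]   # hypothetical decision memory
bit = tbs_decode_one(all_zero, t=7, start_state=3)
```

Decoding n bits this way costs n writes and n·L traceback reads, which is exactly the memory-access burden the following TRAIR discussion quantifies.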
In order to evaluate the throughput performance of different SMM architec-
tures, we define a performance index, traceback read to ACS iteration ratio
(TRAIR). This ratio corresponds to the read/write speed requirement of the memory. A low
TRAIR facilitates the design of the Viterbi decoder. In the design of a high-
speed Viterbi decoder, it is desirable to bring the TRAIR as close to 1 as possible
to reduce the memory throughput requirement. In the simple TB architecture, the
ratio between the memory reads and the ACS iterations is L, which is not appropriate
for high-speed implementations. In the later sections, we will present more SMM
architectures suitable for high-speed implementation.
3.5 One-Pointer Traceback
In Section 3.4, the basics of the TB algorithm were studied. Due to its high
TRAIR, the TBS architecture is not suitable for high-speed implementation.
However, with some minor modifications to the algorithm, we can derive different
architectures suitable for high-speed Viterbi decoding applications [13].
Instead of tracing back L steps and decoding one symbol in an iteration, we
can put the merged survivor path to better use. That is, the decoding process can
decode a series of symbols from the known merged path. We define the ratio between
the length of the decode block and the traceback block as the "Decode to Traceback Ratio"
(DTR). The value of DTR ranges from 1/L to 1. Instead of decoding only one
symbol per L traceback operations, (DTR × L) symbols are decoded in the new
architecture.
Figure 3.8 shows the memory organization of the one-pointer TB architecture.
In order to achieve continuous decoding operations, three kinds of operations (WR,
TB and DC) must be completed at the same cycle.
Figure 3.9 presents the timing diagram of the one-pointer TB architecture with
DTR=0.5. As shown in the figure, for every ACS iteration, the traceback unit
Figure 3.8: Memory organization of one-pointer traceback architecture
Figure 3.9: Timing diagram of one-pointer traceback architecture with DTR=0.5
has to perform three read operations and one write operation. In a straightforward
implementation, this architecture requires a single port RAM that runs four times
faster than the ACS cycle time. It results in a very high-speed requirement for the
RAM design. For high-speed Viterbi decoders, we can make modifications to achieve
a 1:1 (memory access/ACS iteration) ratio so as to reduce the speed requirement of
memory.
The DTR is important in determining the characteristics of the one-pointer TB
architecture. By varying the DTR, we can derive architectures with different mem-
ory speed requirement, memory size, throughput and coding gain performance. We
define several performance metrics to evaluate different SMM architectures. First,
we consider the TBD, L. TBD is defined as the number of traceback operations per-
formed before decoding the first bit. This parameter directly affects the throughput
and coding gain performance of the Viterbi decoder. Compared with the simple
TB architecture defined in Section 3.4, the decoding process in the one-pointer TB
architecture is an extension of the survivor path merging process. For the first de-
coded bits in the decoding block, the TB depth is L. For the last decoded bits in
the decoding block, the equivalent TB depth is (L + DTR ∗ L). In other words,
the probability that all the paths have merged into a common path is higher for the
last decoded symbol in the block than the first symbol. Therefore, the meaning of
TBD in the one-pointer TB architecture differs from that of the simple TB archi-
tecture. Note that the one-pointer TB architecture is equivalent to the simple TB
architecture when DTR equals 1L.
We can try to define some metrics. For example, the average traceback
depth (ATBD) is given by:

ATBD = (L + (L + DTR ∗ L)) / 2 = (1 + DTR/2) L (3.5)
For the one-pointer TB architecture, the TRAIR is given by:

TRAIR = (L + DTR ∗ L) / (DTR ∗ L) = 1 + 1/DTR (3.6)
The total decision memory length M is given by:
M = (DTR ∗ L) + L + (DTR ∗ L) = (1 + 2DTR)L (3.7)
These parameters determine the memory organization, throughput, and the hard-
ware area of the resulting TB architecture. But from Equation (3.5), the value of
ATBD also depends on DTR. To make the comparison fair, the decision memory
length (M) should be normalized, so that the ATBD of configurations with differ-
ent values of DTR are the same. Assume two configurations A and B each with
parameters DTRa, DTRb. La and Lb denote their TB depths. The value Lb that
will give configuration B equivalent ATBD to configuration A is given by:
Lb = ((2 + DTRa) / (2 + DTRb)) La (3.8)
and the adjusted memory size is given by
Mb = (1 + 2DTRb) · ((2 + DTRa) / (2 + DTRb)) La (3.9)
This adjustment is especially important for high-puncture-rate codes, since their
TB depths are long and the impact on the memory size is significant.
Table 3.1: Comparison of traceback architectures
DTR TRAIR Memory Size ATBD Adjusted Memory Size
1 2 3L 1.5L 3.0L
1/2 3 2L 1.25L 2.4L
1/4 5 1.5L 1.125L 2.0L
1/7 8 1.28L 1.071L 1.8L
The comparison for architectures with different values of DTR is shown in Ta-
ble 3.1. The memory size decreases with the value of DTR, but at the expense of
larger TRAIR. The adjusted memory size is shown in the last column. As shown
in the table, once the ATBD of different configurations are made to be equivalent,
the memory size reduction is less significant for the smaller value of DTR.
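The entries of Table 3.1 follow directly from (3.5)-(3.9). A small sketch reproducing them, with exact rational arithmetic so no rounding creeps in (the function name and the choice of DTR=1 as the normalization reference are illustrative, the latter matching the table):

```python
from fractions import Fraction as F

def one_pointer_metrics(dtr, dtr_ref=F(1)):
    """ATBD, TRAIR and memory size of a one-pointer TB architecture
    (eqs. 3.5-3.7), plus memory normalized via eq. 3.9 so that all
    configurations have the ATBD of the reference DTR. Units of L."""
    atbd = 1 + dtr / 2                       # eq. (3.5)
    trair = 1 + 1 / dtr                      # eq. (3.6)
    mem = 1 + 2 * dtr                        # eq. (3.7)
    adj_mem = mem * (2 + dtr_ref) / (2 + dtr)  # eqs. (3.8)-(3.9)
    return atbd, trair, mem, adj_mem

for dtr in (F(1), F(1, 2), F(1, 4), F(1, 7)):
    atbd, trair, mem, adj = one_pointer_metrics(dtr)
    print(f"DTR={dtr}: TRAIR={trair}, mem={float(mem):.2f}L, "
          f"ATBD={float(atbd):.3f}L, adjusted={float(adj):.1f}L")
```

Running this reproduces, e.g., TRAIR 3, memory 2L, ATBD 1.25L and adjusted memory 2.4L for DTR = 1/2, in agreement with Table 3.1.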
Memory access speed is usually the limiting factor of the SMM unit throughput
in high-speed Viterbi decoder implementations. To simplify the memory design, it is
desirable to have a memory access rate equivalent to the ACS iteration rate. Addi-
tional hardware is needed to deal with the higher memory bandwidth requirement.
We will show two architectures to further resolve this problem. In order to minimize
the chip area, lower memory array size must be weighed against hardware needed
to support the high memory access rate.
3.6 Multiple Pointer Traceback
In the multiple pointer traceback architecture [13], the problem of the high mem-
ory access rate is solved by dividing the memory array into several banks and access-
ing them in parallel with multiple read and write pointers. Consider the DTR=0.5
memory configuration with the TRAIR equal to 3. The total number of memory
accesses needed for each ACS iteration is four, counting read and write operations.
Three read and one write pointers are necessary to achieve a memory cycle time that
is the same as the ACS iteration time. The timing diagram of this configuration
is shown in Figure 3.10. To accommodate running four pointers at the same time,
total memory size is increased to three times the traceback depth (TBD), L.
The memory is divided into six banks, and each of them is of length L/2. Each of
the two traceback pointers traces back L/2 stages to find the merged survivor path,
and the decoding pointer produces the decoded output by using the survivor path
found by the traceback pointers alternately. After every L stages, a new traceback
front is started from the fixed state (such as the zeroth state). Table 3.2 shows the
comparison of different multiple-pointer architectures. The memory size and the
adjusted memory size increase significantly as DTR decreases.
Compared with DTR=0.5 one-pointer traceback configuration, this 3-pointer
configuration requires an extra length-L memory array, and five more sets of address
decoders to achieve the lower memory access rate. Because this configuration is de-
rived from the DTR=0.5 configuration, it has the same ATBD of 1.25L. According
Figure 3.10: Timing diagram of multiple pointer traceback architecture
Table 3.2: Comparison multiple pointer traceback architectures
DTR TRAIR Memory Size ATBD Adjusted Memory Size
1 2 3L 1.5L 3.0L
1/2 3 → 2 3L 1.25L 3.6L
1/4 5 → 2 5L 1.125L 6.7L
1/7 8 → 2 8L 1.071L 12.9L
to the memory requirements, other configurations based on different values of DTR
are also possible.
3.7 Traceforward
In Section 3.6, the problem of the high memory access rate relative to the ACS
iteration rate was resolved by using multiple pointers, that is, by speeding up the traceback operations.
The above algorithm approaches this problem by parallelization, i.e. carrying out
more operations in one ACS iteration. However, by inspecting the basic traceback
algorithm, another effective traceback algorithm can be found. In the traceback
algorithm, the ”starting decode state” is found by tracing back the decision memory
for L times. But, as mentioned in Section 3.2, we can find this ”starting decode
state” by employing the register exchange structure [15]. Because the direction of
this computation is in the same direction as the ACS operation, this architecture is
named ”traceforward” architecture.
Figure 3.11 shows how the register exchange structure can be applied to find the
”starting decode state” for the traceforward algorithm. Suppose that there are L
levels of register exchange structure with the state indexes as the register content.
At cycle t, the state registers are initialized to their index number. Then the ACS
decision is applied to the register exchange trellis structure with the states exchanged
in the same fashion as the decision bits in the register exchange algorithm. After
L cycles, the state indexes at the output of this register exchange structure will
all merge to one common state index, the ”starting decode state.” In the actual
implementation, since only the starting state is needed for every L cycles, only one
stage of register exchange structure is necessary, as highlighted in the ”Implemented
Circuit" box of Figure 3.11. The routing of the traceforward unit is obtained by
replacing the ACS path metrics routing with the state routing, and the branch
metrics routing with the ACS decision routing.
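The traceforward idea above (a register-exchange network whose registers carry state indexes, converging to the starting decode state) can be sketched as follows. The radix-2 trellis convention, K = 3, and the all-zero example decisions are assumptions for illustration.

```python
# Sketch of the traceforward unit: propagate state *indexes* forward
# through the trellis. After enough cycles all registers hold the same
# index -- the "starting decode state".
K = 3
NSTATES = 1 << (K - 1)

def tf_step(idx, decisions):
    """One forward step: each state inherits the index carried by its
    surviving predecessor (decision bit selects the predecessor)."""
    new = [0] * NSTATES
    for s in range(NSTATES):
        pred = (decisions[s] << (K - 2)) | (s >> 1)
        new[s] = idx[pred]
    return new

idx = list(range(NSTATES))             # at cycle t: each register holds its own index
for decisions in [[0, 0, 0, 0]] * 4:   # hypothetical all-zero ACS decisions
    idx = tf_step(idx, decisions)
# With all-zero decisions every survivor traces to state 0, so all
# registers converge to index 0 -- the starting decode state.
```

Only the final converged value is needed every L cycles, which is why a single stage of this structure suffices in hardware.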
The memory configuration of the traceforward algorithm is shown in Figure 3.12. This
configuration is based on the one-pointer traceback algorithm with DTR=1. At
Figure 3.11: Traceforward unit
Figure 3.12: Memory configuration of traceforward unit
first glance, the total memory size is 3L. But we can reduce the memory size to 2L
by folding the WR memory with the DC memory. Although no traceback operation
is executing in the traceforward memory, it acts as a buffer to wait for the starting
state generated from the traceforward unit.
Figure 3.13: Timing diagram of traceforward algorithm
Figure 3.13 gives an example of the traceforward algorithm with L = 4. Due to
the folding of DC memory and WR memory, the running direction of the DC and
the WR pointers will change during the traceback operation. The numbers in the
figure represent the time cycles. From time cycle 1 to 4, the ACS decision vectors
are written to the memory. From time cycle 5 to 8, along with the ACS decision
updates, the traceforward unit computes the starting state for the decode operation.
At time cycle 8, the starting decode state for the decisions generated from time cycle
1 to 4 is ready, so from time cycle 9 to 12, the decode operation will generate four
decoded bits. Because the decoding direction is reverse to the input direction, the
decoded bits are sent to the bit order reversing (last in first out) circuit to restore
the correct order. The same process repeats through the whole decoding process.
The latency of the above example is 3L.
Figure 3.14: Block diagram of traceforward algorithm
The block diagram of the traceforward implementation is shown in Figure 3.14.
There are several reasons for the selection of DTR=1 over DTR=0.5 in the trace-
forward algorithm implementation. First, the size of the decode region equals L for
the DTR=1 configuration, so only one traceforward unit is necessary, while for the
DTR=0.5 configuration, two interleaved traceforward units are necessary to pro-
duce the starting decode state every L/2 cycles. Second, due to the folding of DC
and WR memory, the total memory is 2L, the same as the DTR=0.5 configuration.
Third, with the same L, the ATBD of the DTR=1 configuration is 20% larger
than that of the DTR=0.5 configuration. The drawbacks of the DTR=1 configuration
are that it requires a longer LIFO to rearrange the order of the decoded bits, and that
the decoding latency is longer. Compared with the algorithms discussed so far, the
traceforward algorithm is unique due to the fact that it is based on a DTR=1 con-
figuration rather than DTR=0.5 configuration. In terms of hardware area tradeoff,
it requires a complicated traceforward unit, and a complicated read/write pointer
scheme. However, the routing overhead is kept to a minimum in this algorithm, and
an efficient ACS layout topology can be readily applied to reduce the traceforward
unit chip area, with the branch metrics routing replaced by the decision routing,
and the path metrics routing replaced by the state routing.
3.8 Sliding Block
To achieve unlimited concurrency, and hence throughput, in an area-efficient manner,
a sliding block Viterbi decoder (SBVD) is implemented that combines the filtering
characteristics of a sliding block decoder with the computational efficiency of the
Viterbi algorithm. The SBVD approach reduces the problem of decoding a continuous
input stream to decoding independent overlapping blocks without constraining
the encoding process [19].
As shown in Figure 3.15, a general hybrid Viterbi algorithm can be derived that
combines forward and backward processing of the interval. At some trellis iteration
m, within the interval, the shortest path must pass through one of the four possible
states. Forward processing of the interval n − L to m yields four survivor paths
corresponding to the shortest paths from n − L to each state at time m. Similarly,
Figure 3.15: Hybrid Viterbi algorithm for selecting the shortest path through a trellis of finite length (four-state trellis example)
backward processing of the interval n+L to m yields the shortest paths from n+L to
each state at time m. For a given state at time m, the shortest path over the interval
through this state must be the concatenation of the forward and backward state
metrics. Hence, selecting the state Sm at time m with the smallest concatenated
state metric yields the starting state for tracing back the shortest path. If m = n
then Sn can be decoded directly; otherwise traceback from m to n is required. The
hybrid Viterbi algorithm subsumes the forward only and backward only algorithms,
which correspond to the special cases of m = n + L and m = n − L, respectively.
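The selection step can be sketched in a few lines. This is an illustrative sketch, not the thesis implementation; the function name `select_start_state` and the example metrics are assumptions, and the forward and backward state metrics are taken as already computed by the two ACS recursions.

```python
# Sketch of the hybrid Viterbi starting-state selection: fwd[s] is the
# forward state metric (shortest path from n-L to state s at time m) and
# bwd[s] the backward state metric (shortest path from n+L to s at time m).

def select_start_state(fwd, bwd):
    """Return the state with the smallest concatenated metric fwd[s] + bwd[s]."""
    concat = [f + b for f, b in zip(fwd, bwd)]
    return min(range(len(concat)), key=concat.__getitem__)

# Four-state trellis example, as in Figure 3.15:
fwd = [7, 3, 9, 5]
bwd = [4, 6, 1, 2]
print(select_start_state(fwd, bwd))  # state 3: concatenated metric 5 + 2 = 7
```

Traceback from the selected state Sm toward n then recovers the decoded bits, exactly as described above.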
In the basic mode, the SBVD requires 2L trellis iterations per decoded output;
hence the relative area efficiency is 1/(2L). The relative area efficiency can be
increased by decoding a block of length M rather than just a single bit. That is,
apply the hybrid Viterbi algorithm to the interval n−M/2−L to n+M/2+L and
decode the interval n−M/2 to n+M/2. Each block of M decoded outputs requires
M + 2L trellis iterations and hence the corresponding DTR is M/(M + 2L). The
SBVD architecture can be applied to high-speed Viterbi decoding of convolutional
codes with any number of states. The hardware overhead, compared to sequential
Viterbi decoding, is independent of the number of states, but grows with
the ratio of the survivor path length L to the block length M. This relatively
small overhead enables codes of large constraint lengths to be decoded as efficiently
as codes of shorter lengths without a theoretical speed limit.
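The DTR formula above is easy to sanity-check numerically; the helper below is a sketch with an assumed name, not code from the thesis.

```python
def sbvd_dtr(M, L):
    """Decoding-to-trellis ratio of the SBVD: M decoded bits per M + 2L iterations."""
    return M / (M + 2 * L)

for M in (1, 64, 128, 256):
    print(M, round(sbvd_dtr(M, L=64), 3))
# M = 1 recovers the basic mode's efficiency of about 1/(2L); M = 2L gives
# DTR = 0.5, and the DTR approaches 1 as the block length M grows.
```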
Figure 3.16: Block decoding using the SBVD method: (a) forward processing and (b) equal forward/backward processing
Any choice of m for the hybrid Viterbi algorithm can be used for block decoding.
Two schemes of practical interest are shown in Figure 3.16: Figure 3.16(a) shows
forward only processing, and Figure 3.16(b) shows equal forward/backward
processing. In either case, the SBVD is a true maximum likelihood
algorithm, selecting from the set of all possible paths the one that is closest to the
observed output over the finite observation interval.
Figure 3.17: Continuous stream processing using the SBVD method.
In a high-speed implementation, it is efficient to choose M = 2L. The reason
is similar to that discussed for the traceforward architecture in Section 3.7.
Decoding of a continuous input stream using the SBVD method is analogous to
overlap-add filtering, as shown in Figure 3.17. The input stream is blocked into
input symbol vectors of length 2L, successive pairs of which are decoded using the
SBVD method to produce output vectors of length 2L.
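The overlap-add analogy can be made concrete with a little index bookkeeping. The sketch below uses an assumed helper name; the negative start of the first window would be handled by the initial metric reset in a real decoder.

```python
def sbvd_windows(n_blocks, L):
    """Yield (input_start, input_end, output_start, output_end) for each block.

    Each output block of 2L bits is decoded from an input window of 4L
    symbols, i.e. L symbols of overlap on each side of the block.
    """
    for i in range(n_blocks):
        out_lo, out_hi = i * 2 * L, (i + 1) * 2 * L
        yield out_lo - L, out_hi + L, out_lo, out_hi

for window in sbvd_windows(3, L=64):
    print(window)
# Consecutive input windows overlap by 2L symbols, analogous to the
# overlap-add processing of Figure 3.17.
```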
In the original paper, the implementation of this architecture adopts fully parallel
hardware instead of memory traceback, because the constraint length of the
target convolutional code is only 3. But the hardware cost becomes very high for larger
constraint lengths such as 7 or 9, because the number of BM units and ACS units is
proportional to the TBD. Hence we adopt the memory based implementation in
the later discussion.
3.9 Best State Traceback Architecture
In the punctured Viterbi decoder design, a large TB depth L is necessary to deal
with the high puncture rate (i.e. the 7/8 code). In the discussion up to now, it was
assumed that the traceback operation started from an arbitrarily chosen state. However,
it is possible to start from the state that is closest to the desired survivor path by
utilizing the information contained in the path metrics. For the state with the lowest
path metric, the probability is high that the state is already very close to the survivor
path. With this modification, we only have to trace back along one instead of 2^(K-1)
survivor paths. Simulation shows that the best state traceback can cut the TB
depth in half while maintaining the same coding gain performance [12]. In high-speed
parallel Viterbi decoder implementations, the TB operation usually starts
from an arbitrary state due to the difficulty of finding the best state at high speed.
As noted earlier, for the high puncture rate situation, the TB depth requirement
calls for a large RAM, which will adversely impact the area. Using the best-state
TB architecture can cut the traceback depth in half while maintaining the coding
gain. Because a parallel comparator is very expensive in chip area, we may replace
it by a smaller two-way serial comparator structure to find the best state. A bank
of buffer RAM is added, so the extra delay time can be used to carry out the serial
compare operation.
Figure 3.18 shows the timing diagram for the best state traceback algorithm
based on the DTR=0.5 one-pointer TB memory configuration. An additional mem-
ory block of length L/2 is added to allow the time for the serial comparison. The
total latency is 2.5L. However, since the best state traceback reduces the TB depth,
Figure 3.18: Best state traceback timing diagram
the overall latency is lower than the blind zero-state traceback. For a rate-7/8 punc-
tured code, a traceback depth of 64 gives satisfactory coding gain performance. The
selection of L = 64 leaves 32 clock cycles for the serial comparator to finish the
minimum path metric search. To find the minimum among 64 states, a 2-way
serial comparison structure is used. The number of comparisons needed is 31
within each group of 32 states (the two groups are searched in parallel), plus 1
between the 2 final candidates, for a total of exactly 32 cycles.
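The cycle budget of the two-way serial search can be modeled directly. The following is a behavioral sketch, not RTL, and the function name is an assumption: the two halves of the 64 path metrics are scanned by two serial comparators running in parallel, and one final comparison picks the winner.

```python
def serial_best_state(metrics):
    """Return (best_state, cycles) for a two-way serial minimum search."""
    half = len(metrics) // 2
    lo = min(range(half), key=lambda s: metrics[s])                # 31 serial compares
    hi = min(range(half, len(metrics)), key=lambda s: metrics[s])  # 31 compares, concurrent with the first
    cycles = (half - 1) + 1  # the two scans overlap in time; +1 for the final compare
    best = lo if metrics[lo] <= metrics[hi] else hi
    return best, cycles

metrics = [50] * 64
metrics[37] = 3            # make state 37 the best state
print(serial_best_state(metrics))  # (37, 32): fits the 32-cycle budget left by L = 64
```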
This algorithm can also be applied to the multiple-pointer traceback algorithm. For
the traceforward algorithm, applying best state decoding requires a large 64-to-1
selector to select the starting decode state computed by the traceforward unit,
which is impractical in terms of chip area.
3.10 Comparison of Architectures
We summarize the memory speed, memory size requirements and ATBD
of the SMM architectures in this section. The convolutional code we deal
with is of constraint length K and rate 1/2 (the number of states is S = 2^(K-1)).
L denotes the TBD of the SMM architecture. The "sb-bit" is the number of bits
used in soft decision decoding; we usually set it to three or four. The
"acs-bit" is the number of bits used to accumulate the path metric. To keep the
accumulated path metric from overflowing, we usually set the "acs-bit" to 8 or 9.
We show the bandwidth requirements of the basic register exchange and traceback
architectures in Table 3.3. From this table, we can see the fundamental difference
between the two methods. The bandwidth requirement of the RE method is much
higher than that of the TB method, but the RE method is usually implemented with
hard-wired logic while the TB method is usually implemented with memory. The
delay time of the RE method is less than that of the TB method by 2L clock cycles. The routing
complexity of the RE method grows exponentially with K. We conclude that RE
based architectures are suitable for high-speed Viterbi decoders or small constraint
length convolutional codes. If the constraint length is large, TB based architectures
facilitate the design because of their lower memory bandwidth requirement and lower
routing complexity.
Table 3.4 summarizes the ATBD of the SMM architectures. We will use it to
equalize the BER performance and hardware area later.
Table 3.5 shows the throughput related comparisons of the mentioned architectures.
The "TRAIR" column gives the memory speed requirement of each architecture,
and the "Delay Cycles" column gives the decoding delay in clock cycles.
Table 3.6 summarizes the memory/register requirement of each mentioned
architecture. The "RE" architecture uses registers instead of memory.
Table 3.3: Bandwidth requirement of basic SMM architectures
Architecture RE TB
Write Bandwidth L× S S
Read Bandwidth L× S TRAIR×K
Total 2× L× S S + TRAIR×K
Wiring Complexity High Low
Delay L 3L
Table 3.4: ATBD of architectures
Architecture ATBD
RE L
TBS L
DTR=0.5 1.25L
DTR=1 1.5L
TF 1.5L
SB 2.5L
Tables 3.7 and 3.8 show the major hardware components of the architectures.
(The length of a word is S bits.) We will use them to perform hardware area estimation
in the next chapter.
Table 3.5: Speed and bandwidth related comparisons of architectures
Architecture TRAIR Delay Cycles Norm Delay Cycles
RE 1:1 L L
TBS L:1 1 1
1-Ptr (DTR=0.5) 3:1 1.5L 1.2L
M-Ptr (DTR=0.5) 2:1 1.5L 1.2L
1-Ptr (DTR=1) 2:1 3L 2L
TF 1:1 3L 2L
SB 1:1 4.5L 1.8L
Table 3.6: Memory size comparison of architectures
Architecture ATBD Memory Size Normalized Memory Size
TBS L L SL
1-Ptr (DTR=0.5) 1.25L 2L 1.6L
M-Ptr (DTR=0.5) 1.25L 3L 2.4L
1-Ptr (DTR=1) 1.5L 3L 2L
TF 1.5L 2L 1.33L
SB 2.5L 6L 2.4L
Table 3.7: Hardware requirement comparison
Architecture Hardware Requirement
RE (LS) reg and mux, complex routing
TBS
1-Ptr BM Unit TB Unit
M-Ptr S ACS Units 2 LIFO Units Extra TB Units
TF Traceforward Unit
1-Ptr BS S-way Comparator
SB 2S ACS Units, 1 BM Unit, 2 TB Units
S-way Comparator
Table 3.8: Hardware unit description
Name Description
BM Unit 2 Inv, 4 Add (sb-bit)
ACS Unit 2 Add, 1 Sub, 1 Mux, 1 reg(acs-bit)
TB Unit (K-1) reg, 1 mux
TF Unit (K-1)S reg, 2(K-1)S mux
LIFO Unit L reg
S-Way Cmp (S-1) 2-way Cmp
2-way Cmp Sub (acs-bit), (K-1) Mux, (K-1) reg
Chapter 4
Performance Analysis and Metric
4.1 Simulation Setup
Figure 4.1: Computer simulation setup
The computer simulation setup is shown in Figure 4.1. As shown in the illustration,
the rectangular blocks are functional units and the rounded-corner blocks are
adjustable parameters. All the parameters of this simulation environment can be
modified to examine their effects on the coding gain performance of the Viterbi decoder.
The solid lines indicate the direction of data flow, and the dashed lines indicate the
simulation parameters. The input bit stream is fed to the convolutional encoder,
and the encoded code word is then punctured according to the puncturing matrix.
Both the constraint length and the generator polynomials of the convolutional codes
are fully configurable. Here we can apply the generator polynomials listed in
Table 2.1. Note that we can achieve variable code rates by puncturing the rate-1/2
and rate-1/3 convolutional codes. The modulator sends the I and Q channel analog
output to the channel. The channel model consists of an additive white Gaussian
noise source that adds noise to the I and Q channels according to the SNR setting. An
ideal QPSK receiver demodulates the I and Q channel output to produce the 4-bit
quantized soft decisions for the Viterbi decoder model. A bit error rate counter then
compares the output of the Viterbi decoder to a delayed version of the input bit
stream to get the bit error rate. The simulation stops when the error count exceeds
10000 or the maximum number of iterations is reached. Finally, the "BER Meter"
calculates statistics such as the BER and the coding gain relative to plain QPSK.
The efficiency of a code is measured by the received energy per bit to noise ratio
(γb = Eb/N0) required to achieve a specific system bit error rate. The power of the
Gaussian channel noise must be adjusted in the simulation for a fair comparison
between different punctured codes. Assume that the modulated signal has
symbol energy Es, and each symbol carries B information bits. If the punctured
code rate is R, the Eb/N0 of the received signal is given by:

γb = Eb/N0 = (Es/N0) · (1/(BR))
For QPSK modulation with signal amplitude A, B = 2 bits/symbol, so Eb/N0 becomes:

γb = Eb/N0 = (A²/N0) · (1/R)

where γb is usually given in dB. The noise power for punctured code rate R is then
given by:

N0 = (A²/R) · 10^(−γb/10)    (4.1)
The simulations shown in the later sections use Equation (4.1) to compute the white
Gaussian noise.
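A quick numerical reading of Equation (4.1), with illustrative values only; the helper name is an assumption of this sketch.

```python
def noise_power(A, R, gamma_b_dB):
    """Equation (4.1): N0 = (A^2 / R) * 10^(-gamma_b / 10)."""
    return (A ** 2 / R) * 10 ** (-gamma_b_dB / 10)

for R in (1 / 2, 3 / 4, 7 / 8):
    print(R, noise_power(A=1.0, R=R, gamma_b_dB=5.0))
# For a fixed Eb/N0 target, a higher code rate R carries less energy per
# information bit, so the simulator must inject less noise power N0.
```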
Figure 4.2: Eb/No vs. BER for different coding systems
In order to analyze the coding gain performance of convolutional codes, we have
to know the performance of plain QPSK. The average bit error probability
of QPSK is as follows [39, 8]:

BER = (1/2) erfc(√(Eb/N0))    (4.2)

where erfc(·) is the complementary error function:

erfc(x) = (2/√π) ∫_x^∞ e^(−t²) dt    (4.3)
(Note that erfc(·) is a strictly decreasing function, which means the inverse function
of erfc(·) exists.) The first curve of Figure 4.2 depicts the BER curve of plain
QPSK. The second curve is for the rate-1/2 DVB convolutional code and a Viterbi decoder
with L=48.
To calculate the coding gain, we fix a specific BER and draw a horizontal
line in Figure 4.2. There will be two intersection points, and each of them represents
the required SNR for that curve. The coding gain is the difference between the two
points. For example, the coding gain is 4.7 dB at a BER of 10^−4.
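Equation (4.2) is directly available through the standard library's `erfc`; a minimal check (variable names assumed):

```python
import math

def qpsk_ber(eb_n0):
    """Equation (4.2): average QPSK bit error probability for a linear Eb/N0."""
    return 0.5 * math.erfc(math.sqrt(eb_n0))

for snr_db in (0, 4, 8):
    print(snr_db, qpsk_ber(10 ** (snr_db / 10)))
# The steep fall of this curve with SNR is the first (QPSK) curve of Figure 4.2.
```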
In practice, however, we usually do not have a figure like Figure 4.2. Suppose
we perform computer simulations with a certain system configuration in which the
SNR is γ. Upon completion of the simulation, we have the corresponding BER,
β. Applying β to Equation (4.2), we have

β = (1/2) erfc(√γqpsk)

Applying the inverse error function to both sides,

erfcinv(2β) = √γqpsk
⇒ (erfcinv(2β))² = γqpsk

where erfcinv(·) is the inverse function of erfc(·). Finally, the corresponding coding
gain equals

CodingGain = γqpsk − γ    (4.4)

We will use Equation (4.4) to calculate the coding gain later.
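The procedure above can be sketched end to end. The standard library has no inverse erfc, so this sketch inverts `math.erfc` by bisection (an implementation choice of the sketch, not of the thesis; any root finder would do), and it assumes γ and the result are both expressed in dB.

```python
import math

def erfcinv(y):
    """Solve erfc(x) = y for 0 < y < 1 by bisection (erfc is strictly decreasing)."""
    lo, hi = 0.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if math.erfc(mid) > y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def coding_gain_db(beta, gamma_db):
    """Equation (4.4): required QPSK Eb/N0 (in dB) minus the simulated SNR (in dB)."""
    gamma_qpsk = erfcinv(2 * beta) ** 2
    return 10 * math.log10(gamma_qpsk) - gamma_db

# A simulated BER of 1e-4 reached at an SNR of 3.7 dB:
print(round(coding_gain_db(1e-4, 3.7), 2))  # about 4.7 dB, matching the example above
```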
4.2 Coding Gain Analysis of Viterbi Algorithm
4.2.1 Simple Traceback Architecture
The performance of the Viterbi decoder with the simple TB architecture is analyzed
in this section. Computer simulations were performed to investigate the
performance of the aforementioned convolutional codes as well as their punctured codes. To
keep the analysis simple, we first adopt the DVB convolutional code and the
simple TB architecture as our simulation configuration. The relation between TBD
and BER is shown in Figure 4.3, where every curve represents a specific system
configuration. System designers can get an overview of the BER performance of
the DVB convolutional code from this figure.
To have a clear view of the benefits of using convolutional codes and the Viterbi
algorithm, we usually convert the BER to coding gain relative to plain QPSK
by using Equation (4.4). (Note that coding gain is measured in dB.) The
resulting figures are shown in Figure 4.4, where the x axis again denotes TBD but the y
axis denotes coding gain (dB).
There are several interesting phenomena in Figure 4.4. First, every curve in
Figure 4.4 can be divided into two regions: the linear growth region and the saturation
region. Between the two regions there is a saturation level of TBD (STBD) beyond
which the coding gain stops increasing significantly (we will define "significantly"
precisely later). The STBD mainly depends on the puncture rate; the SNR does not
affect it very much. A higher rate punctured code needs a longer TBD to make all
paths converge with high probability. (We can see that the BER curves of higher
rate punctured codes have smaller curvature.) But we cannot arbitrarily increase
the TBD, because a longer TBD implies higher hardware cost. We have to consider the
tradeoff between coding gain performance and hardware cost. The STBD can be seen
as the optimal TBD because it strikes a balance between performance
and cost.
Second, the SNR affects the slope of the curves and the maximum achievable coding
gain. Under the same puncture rate, a higher SNR yields a larger coding gain. But
the variation of SNR does not affect the determination of the optimal TBD. For this
reason, we will set aside the effects of SNR in the later discussions.
Third, some system configurations (e.g. the curve with SNR=3.5dB and Rate-7/8)
cannot achieve positive coding gain even if the TBD were pushed to infinity.
Therefore, it is better not to use high puncture rate convolutional codes when
the channel condition is bad.
Table 4.1 summarizes the saturation (optimal) TBDs of the simple TB architecture
in a channel with SNR=5.0dB AWGN. The "Max CG" row shows the maximum
achievable coding gain. The "Opt. TBD" row shows the optimal TBDs of the corresponding
configurations. We define the STBD as the least TBD such that the corresponding
coding gain is within 0.3dB of the maximum achievable coding gain.

Definition 2 Saturation TBD (STBD) is the least TBD such that the coding gain
is within 0.3dB of the maximum achievable coding gain.

When the hardware implementation of Viterbi decoders is taken into consideration,
we can adopt the STBDs listed in Table 4.1 as our TBD to achieve a balance
between coding gain performance and hardware cost. We will try to find the relationship
between the simple TB and other SMM architectures in the later sections. The simulation
results shown in this section will serve as our baseline in the later discussions.
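Definition 2 translates directly into a small search over a simulated curve (hypothetical helper and toy data, not the thesis tooling):

```python
def saturation_tbd(curve, margin=0.3):
    """Least TBD whose coding gain is within `margin` dB of the maximum.

    curve: list of (tbd, coding_gain_dB) pairs, sorted by increasing tbd.
    """
    max_gain = max(gain for _, gain in curve)
    for tbd, gain in curve:
        if gain >= max_gain - margin:
            return tbd

# Toy curve with a linear-growth region followed by a saturation region:
curve = [(20, 1.0), (40, 4.8), (60, 5.4), (80, 5.55), (100, 5.6)]
print(saturation_tbd(curve))  # 60, since 5.4 >= 5.6 - 0.3
```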
The same coding gain analysis is also applied to the DCII and UMTS convolutional
codes, and the results are summarized in Tables 4.2 and 4.3. Although the generator
polynomials of DCII and DVB are different, their results are very similar due to
their identical constraint length. On the other hand, the results of UMTS differ
from those of DVB because their constraint lengths are different. The maximum
coding gains and optimal TBDs of UMTS are larger than those of DVB.
Table 4.1: Coding gain analysis of DVB with simple TB architecture
Rate 1/2 2/3 3/4 4/5 5/6 7/8
Opt. TBD 44 68 80 104 120 160
Max CG 5.6 5.2 4.3 3.9 3.4 2.6
Table 4.2: Coding gain analysis of DCII with simple TB architecture
Rate 5/11 1/2 3/5 2/3 3/4 4/5 5/6 7/8
Opt. TBD 44 44 56 64 88 104 128 160
Max CG 5.7 5.7 5.1 4.9 4.7 3.8 3.5 2.5
Table 4.3: Coding gain analysis of UMTS with simple TB architecture
Rate 1/2 2/3 3/4 4/5 5/6 7/8
Opt. TBD 64 88 112 136 160 208
Max CG 6.7 5.8 5.1 4.7 4.2 3.4
4.2.2 Register Exchange Architecture
We apply the same analysis to the RE architecture in this section. In our simulation
results, the BER performance of the RE architecture is very similar to that of the simple
TB architecture. With the same system configuration, the STBD and the maximum
coding gain of the two architectures are almost the same, but the simple TB architecture
[Six panels, (a) Rate-1/2 through (f) Rate-7/8: BER versus traceback depth, with one curve per SNR setting from 3.5 to 6.0 dB]
Figure 4.3: BER of DVB with simple traceback architecture
[Six panels, (a) Rate-1/2 through (f) Rate-7/8: coding gain (dB) versus traceback depth, with one curve per SNR setting from 3.5 to 6.0 dB]
Figure 4.4: Coding gain of DVB with simple traceback architecture
outperforms the RE architecture by about 0.2 dB when the coding gain is in the linear
growth region. Aside from this minor difference, the observations and conclusions we
made in the last section also apply.
Modified Register Exchange Architecture
The coding gain performance of the modified RE architecture presented in Section
3.3 was also examined, but the performance is not good enough. The BER of this
architecture is at least 0.1 for every system configuration. Thus we do not list the
resulting figures. For the reasons behind this poor performance, refer to Section 3.3.
4.2.3 Summary
Although the TB and RE architectures implement the Viterbi algorithm in different
manners, their TBD-to-BER curves are almost the same. We believe this is
because both SMM architectures possess the same ATBD.
4.3 Equalization of SMM Architectures
In this section, we find the optimal TBD of different SMM architectures
by equalizing them according to ATBD. We can then compare the hardware cost
of different SMM architectures under a fixed BER or coding gain performance.
First, we divide the mentioned SMM architectures into categories, where each category
represents a specific value of ATBD. Table 4.4 shows the member architectures
of the four categories.
Before equalizing the different SMM architectures, consider the comparison of the
original coding gain performance in Figure 4.5. The maximum achievable coding
gains of all categories are almost the same, but they require different levels of TBD
to achieve the maximum coding gain. The category with the larger ATBD achieves
maximum coding gain with a smaller TBD.
Table 4.4: Category of SMM architectures
Category ATBD SMM Architecture
1 L Simple Traceback/ Register Exchange
2 1.25L 1-Pointer Traceback (DTR=0.5)
3 1.5L Traceforward/1-Pointer Traceback (DTR=1)
4 2.5L Sliding Block
Figure 4.6 shows the equalized version of Figure 4.5. The STBDs of all curves are very
close in every subfigure, and the error is at most 10%. We can therefore take ATBD
as a unifying metric across different SMM architectures. The advantage of ATBD is
that anyone can calculate it with pencil and paper in a few seconds. System
architects can use it to quickly evaluate the coding gain performance and hardware cost
of the corresponding architectures.
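As a concrete illustration of how simple the calculation is, the multipliers of Table 3.4 can be tabulated directly (abbreviations as in the tables; the dictionary and function names are assumptions of this sketch):

```python
ATBD_FACTOR = {
    "RE": 1.0, "TBS": 1.0,        # category 1
    "1P DTR=0.5": 1.25,           # category 2
    "TF": 1.5, "1P DTR=1": 1.5,   # category 3
    "SB": 2.5,                    # category 4
}

def atbd(arch, L):
    """Average traceback depth of an SMM architecture with traceback depth L."""
    return ATBD_FACTOR[arch] * L

# Equalization example: a sliding-block decoder with L = 24 has the same
# ATBD (and hence roughly the same coding gain) as simple traceback with L = 60.
print(atbd("SB", 24), atbd("TBS", 60))  # 60.0 60.0
```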
4.3.1 DCII & UMTS
We change the convolutional code to DCII and UMTS to further check the validity
of the ATBD metric. The simulation results of DCII are very similar to those of DVB,
so we do not list the figures again.
It is worth listing the results of UMTS because its constraint length differs
from that of DVB and DCII. Figure 4.7 shows the coding gain comparison of UMTS,
and the equalized version is shown in Figure 4.8. The performance of the ATBD metric is
also very satisfactory here. The conclusions we made in the last section also apply.
4.3.2 Best State Architecture
In this section, we discuss the equalization of the best state architecture. The
third and fourth curves in every subfigure of Figure 4.9 show the coding gain perfor-
[Six panels, (a) Rate-1/2 through (f) Rate-7/8 at SNR = 5 dB: coding gain versus traceback depth for the TBS/RE, 1P DTR=0.5, TF/1P DTR=1.0 and SB architectures]
Figure 4.5: Coding gain comparison of different SMM architectures (DVB)
[Six panels, (a) Rate-1/2 through (f) Rate-7/8 at SNR = 5 dB: coding gain versus ATBD for the TBS/RE, 1P DTR=0.5, TF/1P DTR=1.0 and SB architectures]
Figure 4.6: Coding gain comparison equalized by ATBD (DVB)
[Six panels, (a) Rate-1/2 through (f) Rate-7/8 at SNR = 5 dB: coding gain versus traceback depth for the TBS/RE, 1P DTR=0.5, TF/1P DTR=1.0 and SB architectures]
Figure 4.7: Coding gain comparison of different SMM architectures (UMTS)
[Six panels, (a) Rate-1/2 through (f) Rate-7/8 at SNR = 5 dB: coding gain versus ATBD for the TBS/RE, 1P DTR=0.5, TF/1P DTR=1.0 and SB architectures]
Figure 4.8: Coding gain comparison equalized by ATBD (UMTS)
mance of the simple and one-pointer best state architectures. The normalized version of
Figure 4.9 is shown in Figure 4.10. We normalize the architectures with best state
traceback with an extra factor of 1.5 on the ATBD. The error of the estimation is also below
10%. This means we can equalize SMM architectures with best state traceback
as well.
[Six panels, (a) Rate-1/2 through (f) Rate-7/8 at SNR = 5 dB: coding gain versus traceback depth for the TBS, 1P, TBS BS and 1P BS architectures]
Figure 4.9: Coding gain performance of best state architecture
[Six panels, (a) Rate-1/2 through (f) Rate-7/8 at SNR = 5 dB: coding gain versus ATBD for the TBS, 1P, TBS BS and 1P BS architectures]
Figure 4.10: Coding gain performance of best state architecture equalized by ATBD
4.4 Hardware Equalization
In this section, we equalize the hardware area of different architectures using
the optimal TBDs listed in Tables 4.1, 4.2 and 4.3. The major hardware components
of every SMM architecture were summarized in Section 3.10. We use the library
databook of the TSMC 0.25 µm, 2.5-volt SAGE standard cell process provided by
Artisan [40] to estimate the area of each architecture. We consider only the TB-based
architectures, because the area of an RE-based architecture depends heavily on its
complex routing wires. The equalized hardware areas under different constraint
lengths and puncture rates are shown in Table 4.5. Each cell lists the normalized
area index followed by the actual area in parentheses. The estimates are not exact,
because they omit routing areas and some pipeline registers; the point, however, is
to make the comparison between architectures fair. The TBDs must be equalized
to the same coding gain performance first. After this equalization, the estimates of
hardware requirements, especially memory requirements, become more reliable.
Table 4.5: Equalized area of architectures; each cell gives the normalized area index, with the actual area (10^-3 mm^2) in parentheses.

Architecture      K=5, 1/2    DVB, 1/2    DVB, 7/8    UMTS, 1/2    UMTS, 7/8
Opt. ATBD         32          44          160         64           208
TBS               1.00 (69)   1.00 (263)  1.00 (338)  1.00 (1068)  1.00 (1360)
1P (DTR=1)        1.05 (72)   1.08 (283)  1.22 (411)  1.11 (1186)  1.28 (1743)
1P (DTR=0.5)      1.03 (71)   1.05 (276)  1.13 (382)  1.07 (1139)  1.17 (1590)
MP (DTR=0.5)      1.08 (74)   1.11 (292)  1.31 (442)  1.16 (1234)  1.40 (1898)
1P BS (DTR=1)     1.40 (96)   1.52 (401)  1.53 (517)  1.61 (1724)  1.63 (2222)
TF                1.19 (82)   1.29 (340)  1.28 (433)  1.39 (1485)  1.37 (1866)
SB                1.75 (121)  1.99 (523)  2.01 (679)  2.14 (2286)  2.20 (2292)
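The normalized index in each cell is simply the architecture's estimated area divided by the TBS baseline of the same column. As a sanity check, the following sketch reproduces the DVB rate-1/2 column of Table 4.5 (areas in 10^-3 mm^2, taken from the table):

```python
# Estimated cell areas for the DVB rate-1/2 column of Table 4.5,
# in units of 10^-3 mm^2; TBS is the normalization baseline.
areas = {
    "TBS": 263,
    "1P (DTR=1)": 283,
    "1P (DTR=0.5)": 276,
    "MP (DTR=0.5)": 292,
    "1P BS (DTR=1)": 401,
    "TF": 340,
    "SB": 523,
}

baseline = areas["TBS"]
# Normalized area index: area relative to the simple traceback baseline
index = {arch: round(a / baseline, 2) for arch, a in areas.items()}
```

Rounding to two decimals reproduces the indices printed in the table (e.g. TF gives 1.29 and SB gives 1.99).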
4.5 Coding Gain Estimation

In this section we present a method to estimate the BER performance of different
SMM architectures. The BER can be predicted by integrating over that of the
simple TB architecture. For example, the one-pointer TB architecture decodes every
bit with between L and (1 + DTR)L TB steps. Recall the BER simulations of the
simple TB architecture shown in Figure 4.3. Let B_Simple(L, P) denote the BER
function in that figure, and let B̂_1P(L, DTR, P) denote the estimated BER of the
one-pointer TB architecture, where P stands for the remaining parameters. Equation
(4.5) gives the estimate B̂_1P(·).
B̂_1P(L, DTR, P) = (1 / (DTR × L)) ∫_L^{(1+DTR)L} B_Simple(l, P) dl        (4.5)
Because this method integrates the BER of the simple TB architecture, we call
it INTBER in later discussions. The error of the INTBER method is at most
0.1 dB, which means we can predict the BER performance of the one-pointer TB
architecture with the INTBER metric in this case. The prediction for other
architectures is similar: once we know the number of TB steps for every decoded
bit, the BER performance can be estimated precisely.
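Equation (4.5) is just the mean value of B_Simple over the interval of traceback depths used per bit, so it can be evaluated numerically. A minimal sketch, assuming `b_simple` is an interpolation of the simulated curve from Figure 4.3 (the exponential curve below is purely illustrative, not the thesis data):

```python
import numpy as np

def intber_estimate(b_simple, L, dtr, n=2001):
    """Estimate the one-pointer TB architecture's BER (Eq. 4.5) by
    averaging the simple-TB BER curve over the traceback depths
    l in [L, (1 + DTR) * L] actually used to decode each bit."""
    l = np.linspace(L, (1.0 + dtr) * L, n)
    # The mean of B_Simple over the interval equals the integral
    # divided by its length DTR * L, which is exactly Eq. (4.5).
    return float(np.mean(b_simple(l)))

# Hypothetical BER curve: decays with traceback depth, then saturates
b_simple = lambda l: 1e-2 * np.exp(-l / 20.0) + 1e-5

ber_1p = intber_estimate(b_simple, L=40, dtr=0.5)
```

Since INTBER is an average, the estimate always lies between the simple-TB BER at depth L and at depth (1 + DTR)L.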
Chapter 5
Conclusion & Future Work
5.1 Conclusion
First, we analyze the performance of many SMM architectures. We show the
memory bandwidth requirement, memory size requirement, coding delay and hard-
ware components of every SMM architecture. These indexes can be used to tell
whether a new architecture is high-speed or low-cost.
Second, we construct a software simulation environment to evaluate the per-
formance of different convolutional code systems. The system accepts arbitrary
constraint lengths, generator polynomials, puncture rates and noise levels. New SMM
architectures can also be inserted into the system with minor modifications. The
BER and coding gain performance of the Viterbi algorithm are examined under many
kinds of conditions. The simulations show that the BER (coding gain) stops
decreasing (increasing) significantly once the TBD reaches an optimal level. This
optimal TBD depends on the convolutional code, puncture rate and SMM ar-
chitecture. A larger constraint length or higher puncture rate results in a longer
optimal TBD. We take the optimal TBD as the balance point between hardware
cost and coding gain performance.
Third, we propose the ATBD metric to predict the optimal TBD of different
SMM architectures; that is, ATBD lets us equalize the differences between SMM
architectures. The error of ATBD is at most 10%. Most importantly, given a new
SMM architecture, its ATBD is very simple to determine. System architects can
use ATBD to quickly estimate the hardware cost or memory requirement of their
systems. The effect of best-state traceback is also examined: simulations show that
tracing back from the best state cuts the optimal TBD by 1/3 to 1/2.
Fourth, we decompose every SMM architecture and list the major hardware
components used in it. The TSMC SAGE standard cell library databook is used to
estimate the corresponding hardware areas. The error of this high-level estimation is
acceptable because the omitted parts of every SMM architecture are similar. Accord-
ing to the analyses in Chapters 3 and 4, the traceforward architecture is the better
choice: its TRAIR is 1:1, and it provides a longer equivalent TBD for the same
amount of memory. An extra TF unit is used to find the starting state for the
decoding operation; the routing complexity of the TF unit is exactly the same as
that of the ACS unit. In addition, the RE and SB architectures are suitable for
high-speed implementations when the constraint length is small (3 or 5).
5.2 Future Work
In this thesis we use only the AWGN channel model and QPSK modulation.
The validity of the ATBD metric could be further verified with other channel models
and modulation schemes, such as fading channels and trellis-coded modulation.
The precision of ATBD could also be improved by inspecting the Viterbi algorithm
more closely. The formulation of the Viterbi algorithm is based on probability and
maximum likelihood, so instead of treating every traceback step equally, we might
assign a different weight to each decoded bit on the survivor path. As mentioned
before, however, the new metric must not be too complex to calculate, or the idea
of fast evaluation is lost.
Appendix A
Acronyms & Abbreviations
ATBD Average Traceback Depth
AWGN Additive White Gaussian Noise
BER Bit Error Rate
BM Branch Metric
BS Best State
COFDM Coded Orthogonal Frequency Division Multiplexing
DCII DigiCipher II
DSS Digital Satellite System
DTR Decode to Traceback Ratio
DVB Digital Video Broadcasting
FEC Forward Error Correction
FSM Finite State Machine
LOS Line-of-Sight
ML Maximum Likelihood
MAP Maximum a Posteriori
NLN Nakagami-lognormal
PM Path Metric
QPSK Quadrature Phase-Shift Keying
RE Register Exchange
SB Sliding Block
SBVD Sliding Block Viterbi Decoder
SMM Survivor Memory Management
STBD Saturation TBD
TB Traceback
TBD Traceback Depth
TBS Simple Traceback Architecture
TRAIR Traceback Read to ACS Iteration Ratio
TF Traceforward
UMTS Universal Mobile Telecommunications System
VA Viterbi Algorithm
WPAN Wireless Personal Area Network
Appendix B
Glossary of Notation
K constraint length
L traceback depth
Rs average symbol rate
rate-a/b b output bits per a input bits
G generating polynomial of convolutional codes
x source symbol
y encoded symbol
r estimate of the encoded symbol
y′ estimate of the source symbol
t time index
S state index
Γ_t^S path metric of state S at time t
λ_t^{i,j} branch metric from state i to state j at time t
C Convolutional Code
m length of the longest shift register in convolutional encoder
d decision vector
k decode to traceback ratio
Eb bit energy
Es symbol energy
Bibliography
[1] IEEE, IEEE 802.3ae Standard. IEEE, 2002. [Online]. Available:
http://standards.ieee.org/getieee802/
[2] UMTS Specification Standard. [Online]. Available: http://www.3gpp.org
[3] A. J. Viterbi, “Error bounds for convolutional codes and an asymptotically
optimum decoding algorithm,” IEEE Transactions on Information Theory, pp.
260–269, April 1967.
[4] G. D. Forney, Jr., “The Viterbi algorithm,” Proceedings of the IEEE, vol. 61,
no. 3, pp. 268–278, March 1973.
[5] ——, “Convolutional codes II: Maximum-likelihood decoding,” Information
and Control, vol. 25, pp. 222–266, July 1974.
[6] S. B. Wicker, Error Control Systems for Digital Communication and Storage,
1st ed. Prentice Hall, 1995.
[7] J. G. Proakis, Digital Communications, 4th ed. McGraw-Hill, 2001.
[8] P. Sweeney, Error Control Coding: From Theory to Practice, 1st ed. Wiley, 2002.
[9] M. Boo, F. Arguello, J. D. Bruguera, R. Doallo, and E. Zapata, “High-
performance VLSI architecture for the Viterbi algorithm,” IEEE Transactions
on Communications, vol. 45, pp. 168–176, February 1997.
[10] K. Page and P. Chau, “Improved architectures for the add-compare-select oper-
ation in long constraint length Viterbi decoding,” IEEE Journal of Solid-State
Circuits, vol. 33, pp. 151–155, January 1998.
[11] I. Lee and J. Sonntag, “A new architecture for fast Viterbi algorithm,” IEEE
Transactions on Communications, pp. 1624–1628, October 2003.
[12] R. J. McEliece and I. M. Onyszchuk, “Truncation effects in Viterbi decoding,”
IEEE Conference on Military Communications, pp. 541–545, October 1989.
[13] G. Feygin and P. Gulak, “Architectural tradeoffs for survivor sequence mem-
ory management in Viterbi decoders,” IEEE Transactions on Communications,
vol. 41, no. 3, pp. 425–429, March 1993.
[14] P. J. Black and T. H.-Y. Meng, “Hybrid survivor path architectures for Viterbi
decoders,” IEEE, pp. I433–I436, 1993.
[15] G. Fettweis, “Algebraic survivor memory management design for Viterbi detec-
tors,” IEEE Transactions on Communications, vol. 43, no. 9, pp. 2458–2463,
September 1995.
[16] E. Boutillon and N. Demassieux, “High speed low power architecture for mem-
ory management in a Viterbi decoder,” IEEE, pp. 284–287, 1996.
[17] D. A. El-Dib and M. I. Elmasry, “Modified register-exchange Viterbi decoder
for low-power wireless communications,” IEEE Transactions on Circuits and
Systems I, pp. 371–378, February 2004.
[18] P. J. Black and T. H.-Y. Meng, “A 140-Mb/s, 32-state, radix-4 Viterbi decoder,”
IEEE Journal of Solid-State Circuits, vol. 27, pp. 1877–1885, December 1992.
[19] ——, “A 1-Gb/s, four-state, sliding block Viterbi decoder,” IEEE Journal of
Solid-State Circuits, vol. 32, pp. 797–805, June 1997.
[20] Y.-N. Chan, H. Suzuki, and K. K. Parhi, “A 2-Mb/s 256-state 10-mW rate-1/3
Viterbi decoder,” IEEE Journal of Solid-State Circuits, vol. 35, pp. 826–834,
June 2000.
[21] T. Gemmeke, M. Gansen, and T. G. Noll, “Implementation of scalable power
and area efficient high-throughput Viterbi decoders,” IEEE Journal of Solid-
State Circuits, vol. 37, pp. 941–948, July 2002.
[22] X. Liu and M. C. Papaefthymiou, “Design of a 20-Mb/s 256-state Viterbi de-
coder,” IEEE Transactions on VLSI Systems, vol. 11, pp. 965–975, December
2003.
[23] E. Yeo, S. A. Augsburger, W. T. Davis, and B. Nikolic, “A 500-Mb/s soft-
output Viterbi decoder,” IEEE Journal of Solid-State Circuits, pp. 1234–1241,
July 2003.
[24] I. Onyszchuk, K.-M. Cheung, and O. Collins, “Quantization loss in convolu-
tional decoding,” IEEE Transactions on Communications, vol. 41, no. 2, pp.
261–265, February 1993.
[25] D. A. Luthi et al., “A single-chip concatenated FEC decoder,” in IEEE 1995
Custom Integrated Circuits Conference, 1995, pp. 285–288.
[26] T. Kamada et al., “An area effective standard cell based channel decoder
LSI for digital satellite TV broadcasting,” in VLSI Signal Processing IX, 1996,
pp. 337–346.
[27] M. Hass and F. Kuttner, “Advanced two IC chipset for DVB on satellite recep-
tion,” IEEE Transactions on Consumer Electronics, pp. 341–345, August 1996.
[28] W. P. E. Lutz and E. Plochinger, “Land mobile satellite communications:
channel model, modulation and error control,” in 7th International Conference
on Digital Satellite Communications, May 1986.
[29] T. T. Tjhung and C. C. Chai, “Fade statistics in Nakagami-lognormal channels,”
IEEE Transactions on Communications, pp. 1769–1772, December 1999.
[30] C. Loo and N. Secord, “Computer models for fading channels with applications
to digital transmission,” IEEE Transactions on Vehicular Technology, pp. 700–
707, November 1991.
[31] A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed.
McGraw-Hill, 1991.
[32] C. Tellambura and A. D. S. Jayalath, “Generation of bivariate Rayleigh and
Nakagami-m fading envelopes,” IEEE Communications Letters, pp. 170–172,
May 2000.
[33] J. Hagenauer and E. Lutz, “Forward error correction coding for fading com-
pensation in mobile satellite channels,” IEEE Journal on Selected Areas in
Communications, vol. SAC-5, no. 2, February 1987.
[34] U. Mengali, R. Pellizzoni, and A. Spalvieri, “Soft-decision-based node synchro-
nization for Viterbi decoders,” IEEE Transactions on Communications, vol. 43,
pp. 2532–2539, September 1995.
[35] D. J. Sodha, “Code synchronization for convolutional codes,” in Proceedings of
the Canadian Conference on Electrical and Computer Engineering, vol. 1, Septem-
ber 1994, pp. 344–347.
[36] O. J. Joeressen and H. Meyr, “Node synchronization for punctured convolu-
tional codes of rate (n−1)/n,” in Proceedings of 1994 IEEE GLOBECOM, vol. 3,
1994, pp. 1279–1283.
[37] Q. Pan and M. P. C. Fossorier, “Code invariances and self-synchronized Viterbi
decoding,” IEEE Transactions on Communications, vol. 51, pp. 1082–1092,
July 2003.
[38] G. Lorden, R. J. McEliece, and L. Swanson, “Node synchronization for the
Viterbi decoder,” IEEE Transactions on Communications, pp. 524–531, May
1984.
[39] S. Haykin, Communication Systems, 4th ed. Wiley, 2001.
[40] Artisan, TSMC .25um Process 2.5-Volt SAGE Standard Cell Library Databook,
2000.