Unifying Performance Metric
of Viterbi Decoders
by
Yong-dian Jian
Computer Science and Information Engineering
National Taiwan University, Taipei, 2004
Professor Mong-kai Ku and Professor Feipei Lai
Abstract
Convolutional codes and Viterbi decoders are extensively used in error control
systems. The survivor memory management (SMM) unit of a Viterbi decoder is
critical in determining the throughput, hardware area, and coding gain of the
whole system. Many SMM architectures have been proposed, but a unifying metric
for comparing their coding gain performance has been lacking. In this thesis, we
define a metric, the average traceback depth (ATBD), to unify the diversity of
different SMM architectures. The ATBD metric can be used to equalize different
SMM architectures and predict their optimal traceback depth (TBD), where
optimality is in terms of coding gain performance and hardware cost. We perform
extensive computer simulations with three popular convolutional codes (DVB,
DCII and UMTS) and many SMM architectures to verify the validity of the ATBD
metric. Simulation results show that the difference between the optimal TBD and
the ATBD is at most 10%. With this unifying metric, we can estimate the hardware
cost of different SMM architectures under a fixed coding gain. Moreover, because
the ATBD metric is very simple to calculate, system architects can use it to
quickly evaluate the tradeoff among hardware cost, throughput, and coding gain.
Table of Contents

1 Introduction
  1.1 Motivation
  1.2 Related Work
  1.3 Outline

2 Convolutional Code & Viterbi Algorithm
  2.1 Convolutional Codes
  2.2 The Viterbi Algorithm
  2.3 Punctured Convolutional Codes
  2.4 Communication Channel Models

3 Viterbi Decoder Architecture
  3.1 Overview
  3.2 Register Exchange Architecture
  3.3 Modified Register Exchange Architecture
    3.3.1 Comment
  3.4 Traceback Method
  3.5 One-Pointer Traceback
  3.6 Multiple Pointer Traceback
  3.7 Traceforward
  3.8 Sliding Block
  3.9 Best State Traceback Architecture
  3.10 Comparison of Architectures

4 Performance Analysis and Metric
  4.1 Simulation Setup
  4.2 Coding Gain Analysis of Viterbi Algorithm
    4.2.1 Simple Traceback Architecture
    4.2.2 Register Exchange Architecture
    4.2.3 Summary
  4.3 Equalization of SMM Architectures
    4.3.1 DCII & UMTS
    4.3.2 Best State Architecture
  4.4 Hardware Equalization
  4.5 Coding Gain Estimation

5 Conclusion & Future Work
  5.1 Conclusion
  5.2 Future Work

A Acronyms & Abbreviations
B Glossary of Notation
List of Figures

2.1 Rate-1/2 convolutional encoder with generator polynomials (G0=5, G1=7, K=3)
2.2 Rate-1/3 linear convolutional encoder with generator polynomials (G0=161, G1=135, G2=171, K=7)
2.3 State diagram for encoder in Figure 2.1
2.4 Trellis diagram of Figure 2.3
2.5 Information flow
2.6 Trellis diagram of Figure 2.3
2.7 Trellis diagram for rate-2/3 punctured convolutional code
2.8 Additive channel noise model
3.1 Viterbi decoder system block diagram
3.2 Register exchange architecture
3.3 Register contents for register exchange operations
3.4 Timing diagram of register exchange architecture
3.5 Register contents for register exchange operations
3.6 Basic implementation of TB method
3.7 Register contents for traceback operations
3.8 Memory organization of one-pointer traceback architecture
3.9 Timing diagram of one-pointer traceback architecture with DTR=0.5
3.10 Timing diagram of multiple pointer traceback architecture
3.11 Traceforward unit
3.12 Memory configuration of traceforward unit
3.13 Timing diagram of traceforward algorithm
3.14 Block diagram of traceforward algorithm
3.15 Hybrid Viterbi algorithm for selecting the shortest path through a trellis of finite length (four-state trellis example)
3.16 Block decoding using the SBVD method: (a) forward processing and (b) equal forward/backward processing
3.17 Continuous stream processing using the SBVD method
3.18 Best state traceback timing diagram
4.1 Computer simulation setup
4.2 Eb/No vs. BER for different coding systems
4.3 BER of DVB with simple traceback architecture
4.4 Coding gain of DVB with simple traceback architecture
4.5 Coding gain comparison of different SMM architectures (DVB)
4.6 Coding gain comparison equalized by ATBD (DVB)
4.7 Coding gain comparison of different SMM architectures (UMTS)
4.8 Coding gain comparison equalized by ATBD (UMTS)
4.9 Coding gain performance of best state architecture
4.10 Coding gain performance of best state architecture equalized by ATBD
List of Tables

2.1 Summary of three convolutional codes
2.2 Puncture code definition
3.1 Comparison of traceback architectures
3.2 Comparison of multiple pointer traceback architectures
3.3 Bandwidth requirement of basic SMM architectures
3.4 ATBD of architectures
3.5 Speed and bandwidth related comparisons of architectures
3.6 Memory size comparison of architectures
3.7 Hardware requirement comparison
3.8 Hardware unit description
4.1 Coding gain analysis of DVB with simple TB architecture
4.2 Coding gain analysis of DCII with simple TB architecture
4.3 Coding gain analysis of UMTS with simple TB architecture
4.4 Category of SMM architectures
4.5 Equalized area of architectures, 10^-3 mm^2
Chapter 1
Introduction
1.1 Motivation
In recent years, the demand for high-speed digital communication has pushed
data rates to 10 Gb/s in Ethernet, 56 Mb/s in WLAN and 2 Mb/s in mobile
communications [1, 2]. These high rate requirements pose throughput, cost and
power problems for the design of channel decoders. We need a coding scheme that
yields a high-speed, low-power, robust and realizable communication system.
Channel coding transforms the incoming data symbols so as to increase the
resistance of a digital communication system to channel noise. Convolutional
codes with Viterbi decoding are a popular candidate for channel coding. They
are widely used in applications such as satellite communication, COFDM, GSM,
UMTS, Ethernet, and magnetic disks and tapes.
The quality of a Viterbi decoder design is mainly measured by three criteria:
coding gain, throughput and power dissipation. High coding gain results in a low
data transfer error probability. High throughput is necessary for high-speed
applications. The design of Viterbi decoders with high coding gain and
throughput is challenged, however, by the need for low power, since Viterbi
decoders are often placed in battery-powered communication systems.
Single-chip Viterbi decoder (VD) design has been a very active research area
over the past 15 years, yet the design of high-throughput large-state Viterbi
decoders has remained largely unexplored. In Viterbi decoders, the performance
bottleneck lies in the Add-Compare-Select (ACS) unit and the survivor memory
management (SMM) unit. The
SMM unit relates to the hardware realization of Viterbi algorithm, and it affects the
throughput and hardware complexity of the Viterbi decoder. The traceback depth
(TBD) is the number of traceback steps before decoding the first bit. Generally
speaking, longer TBD results in smaller BER. But to different SMM architectures,
the same TBD does not result in the same BER performance. Although many
high-speed architectures and implementations of SMM unit have been reported in
the past, we lack general metrics to equalize the optimal traceback depth (TBD) of
different SMM architectures. The optimal TBD is defined as the TBD level such
that the coding gain stops increasing significantly beyond this level.
In this thesis, we define metrics to quickly evaluate the throughput performance
of different SMM architectures. We also define a metric, the average TBD (ATBD),
to equalize the diversity of different SMM architectures. It helps us predict
the optimal TBD and optimize the hardware cost and coding gain performance
of different SMM architectures. This metric brings the following benefits.
First, by unifying different SMM architectures, we can determine the optimal
TBD, so that no hardware resources are wasted. Second, for analysis
purposes, we can put different SMM architectures on the same baseline to predict
their throughput and turnaround time. Third, once we set the optimal TBDs
of different SMM architectures and fix the coding gain performance, we can
further compare their hardware cost and power dissipation.

To find the metric mentioned above, we first perform extensive computer
simulations on the simplest traceback architecture (TBS) (described in Section
3.4). These simulation results then serve as the baseline of our work.
Furthermore, we extend our simulations to different convolutional codes and SMM
architectures.
By analyzing the computer simulation results, we define a metric, the average
TBD (ATBD), to unify the diversity of different SMM architectures. Hence the
optimal TBD of different SMM architectures can be efficiently predicted. System
architects can quickly evaluate the tradeoff among hardware cost, throughput and
coding gain performance with these metrics.
1.2 Related Work
The theories of convolutional codes and the Viterbi algorithm are very mature;
the fundamental results can be found in [3, 4, 5, 6, 7, 8].

The algorithm is not everything, however. For the implementation of the Viterbi
algorithm, many VLSI techniques have been used to boost the throughput or lower
the power dissipation of Viterbi decoders. Many papers discuss ACS architectures
[9, 10, 11] and SMM architectures [12, 13, 14, 15, 16, 17]. Hardware
implementation reports of Viterbi decoders appear frequently in the literature
[18, 19, 20, 21, 22, 23]. In addition, we have to consider quantization and
precision problems [24]. Finally, Figure 1 of [22] summarizes the performance
of various published Viterbi decoders.
1.3 Outline
This thesis is organized as follows:
Chapter 2 provides the background on convolutional codes and the Viterbi
algorithm. We also present some channel noise models and the corresponding
generation mechanisms.

Chapter 3 surveys many SMM architectures and discusses the implementation
issues in Viterbi decoders. We define some performance metrics to measure the
performance of SMM architectures. The major hardware components used in every
SMM architecture are also summarized in the last section.
Chapter 4 gives the details of the simulation platform and the simulation
results. The BER performance of convolutional codes with the Viterbi algorithm
is analyzed, and the ATBD metric is defined to equalize the TBD of different
SMM architectures. A simple hardware area estimation is also given in this
chapter.
Chapter 5 summarizes and concludes this work.
Chapter 2
Convolutional Code & Viterbi Algorithm
2.1 Convolutional Codes
Convolutional codes offer an approach to error control substantially different
from that of traditional block codes, such as BCH or Reed-Solomon codes [6, 7, 8].
The input of a convolutional encoder is a stream of information symbols; the
encoder converts the entire data stream, regardless of its length, into a
single code word by means of a linear digital filter. Figure 2.1 shows a typical
rate-1/2 convolutional encoder. The rate of this encoder follows from the fact
that the encoder outputs two bits for every input bit.
When a new input bit arrives, the contents in the registers are shifted right,
such that the oldest bit is removed. Each information bit stays within the encoder
for a fixed amount of time. The constraint length K of a convolutional code is
the maximum number of bits in a single output stream that can be affected by any
input bit. In Figure 2.1, the maximum number of bits affected by any input bit
is three. The output bits are determined by selectively summing the information
remembered by the encoder. For example, assume that the input to the encoder is
1, 0, 1, 0, ..., and that the registers are initialized to 0. At time interval
0, the first input bit '1' arrives. The value of y00 is (0+1) modulo 2 = 1, and
the value of y10 is (0+0+1) modulo 2 = 1. At time interval 1, the first input
bit is shifted into r1 while the second input bit '0' arrives. The value of y01
is (0+0) modulo 2 = 0, and the value of y11 is (0+1+0) modulo 2 = 1. The same
encoding operation continues to generate the output bits.
Convolutional encoders can be viewed in a number of different ways. For
example, they may be considered as finite impulse response (FIR) digital filters
or as finite state machines (FSMs). Both of these approaches and their
corresponding analytical tools yield interesting insights into the structure of
convolutional codes.
As a FIR filter, the encoding operation can be described by two generator poly-
nomials, and each of them corresponds to an output bit stream. The generator
polynomials in Figure 2.1 are (the operator D represents a delay)
G(0)(D) = D2 + 1
G(1)(D) = D2 + D + 1
The interpretation of these polynomials is that each output is given by the
modulo-2 sum of the corresponding bits in the registers. Generator polynomials
are often coded
in octal in literature. A convolutional code system can be uniquely specified by a
constraint length and a set of generator polynomials. For example, the convolutional
code system in Figure 2.1 can be represented as (G0=5, G1=7, K=3).
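To make the FIR view concrete, the worked example above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the thesis; the generator masks 0b101 and 0b111 are octal 5 and 7, and the convention that the mask's most significant bit taps the current input bit is an assumption of this sketch.

```python
# Sketch of the rate-1/2 encoder of Figure 2.1 (G0=5, G1=7, K=3).
def conv_encode(bits, gens=(0b101, 0b111), k=3):
    """Encode a bit sequence; each generator is a K-bit tap mask over
    [input, r1, ..., r_{K-1}], with the MSB tapping the current input."""
    state = 0                                  # shift register r1..r_{K-1}
    out = []
    for x in bits:
        reg = (x << (k - 1)) | state           # current input + registers
        for g in gens:
            out.append(bin(reg & g).count("1") % 2)  # modulo-2 sum of taps
        state = reg >> 1                       # shift right; oldest bit drops
    return out

print(conv_encode([1, 0, 1, 0]))   # -> [1, 1, 0, 1, 0, 0, 0, 1]
```

The printed sequence reproduces the hand-computed outputs above: y00=1, y10=1 at time 0 and y01=0, y11=1 at time 1.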
Figure 2.2 shows another example, a rate-1/3 convolutional encoder. This is the
base system used in the DigiCipher II (DCII) standard, one of the satellite TV
broadcasting standards in the United States.
In order to analyze the behavior of the convolutional encoder, we can view it as
a finite state machine (FSM). For example, the encoder in Figure 2.1 is equivalent
to the FSM in Figure 2.3. Each state corresponds to a unique value of the
convolutional encoder's registers. Given the current state (XY), after the bit X is removed from
Figure 2.1: Rate-1/2 convolutional encoder with generator polynomials (G0=5, G1=7, K=3)

Figure 2.2: Rate-1/3 linear convolutional encoder with generator polynomials (G0=161, G1=135, G2=171, K=7)
the registers, the next state can be either (Y0) (corresponding to input bit ”0”) or
(Y1) (corresponding to input bit ”1”). Each branch in the state diagram has a label
of the form X/YZ, where X is the input bit that causes the state transition and
YZ is the corresponding pair of output bits. From the FSM, a new representation
called the trellis diagram is derived, shown in Figure 2.4. This structure is
especially suitable for the interpretation of the Viterbi algorithm.
To generalize the previous explanation, a rate-a/b convolutional code with
constraint length K has 2^{a(K-1)} states. Each state has 2^a incoming and outgoing
Figure 2.3: State diagram for encoder in Figure 2.1

Figure 2.4: Trellis diagram of Figure 2.3
branches, and each branch represents a b-bit code word. Hence the number of
states grows exponentially with a and K, and the hardware complexity increases
exponentially as well. In practice, we realize complex code rates (e.g. 6/7,
7/8) by puncturing a simple-rate (e.g. 1/2, 1/3) convolutional code. This
approach enhances the modularity of the system: we gain the flexibility to vary
the data rate over the channel merely by adding a puncturing unit at the encoder
and a depuncturing unit at the decoder. By contrast, directly implementing a
variable-rate codec is less flexible and more expensive.
In this work, we use three popular convolutional code systems. The first is used
in Digital Video Broadcasting Standard (DVB) [25, 26, 27], which is the European
standard for digital broadcasting. The second is used in DigiCipher II (DCII), which
is one of the main satellite TV broadcasting standards in the United States. The
third is used in Universal Mobile Telecommunications System (UMTS), which is
used in Code-Division Multiple Access (CDMA) applications.
Table 2.1: Summary of three convolutional codes

Code         K   G0   G1   G2   dfree
DVB          7   133  171  -    10
DCII         7   161  135  171  15
UMTS (1/2)   9   561  753  -    12
UMTS (1/3)   9   557  663  711  18
Table 2.1 summarizes the convolutional codes mentioned above. Constraint
lengths K less than 5 are too small to provide any substantial coding gain,
while systems with K greater than 9 are typically too complex to implement as a
parallel architecture on a single VLSI device. The parameter dfree, the minimum
free distance, is the minimum Hamming distance between all pairs of complete
convolutional code words. In general, a larger constraint length or a lower code
rate results in a larger dfree, and a convolutional code with larger dfree has
more coding gain. For more details on convolutional codes, please refer to
[6, 7, 8].
2.2 The Viterbi Algorithm
Consider the decoding problem presented in Figure 2.5. An information sequence
x is encoded to form a convolutional code word y, which is then transmitted
across a noisy channel. The convolutional decoder takes the received vector r
and generates an estimate y′ of the transmitted code word.

Figure 2.5: Information flow
The maximum likelihood (ML) decoder selects, by definition, the esti-
mate y′ that maximizes the probability p(r|y′), while the maximum a posteriori
(MAP) decoder selects the estimate that maximizes p(y′|r). If the distribution of
the source words x is uniform, then the two decoders are identical; in general, they
can be related by Bayes’ rule
p(r|y)p(y) = p(y|r)p(r) (2.1)
The development of the ML decoder is pursued in this section. Suppose that a
rate-a/b convolutional encoder is in use, and we have an input sequence x
composed of L a-bit blocks:

x = (x_0^{(0)}, x_0^{(1)}, ..., x_0^{(a-1)}, x_1^{(0)}, x_1^{(1)}, ..., x_1^{(a-1)}, ..., x_{L-1}^{(a-1)})
The output sequence y will consist of L b-bit blocks (one for each input block) as
well as m additional blocks, where m is the length of the longest shift register in the
encoder.
y = (y_0^{(0)}, y_0^{(1)}, ..., y_0^{(b-1)}, y_1^{(0)}, y_1^{(1)}, ..., y_1^{(b-1)}, ..., y_{L+m-1}^{(b-1)})
A noise-corrupted version r of the transmitted code word arrives at the receiver,
where the decoder generates a maximum likelihood estimate y′ of the transmitted
10
2.2. THE VITERBI ALGORITHM
sequence. r and y′ have the following form:

r = (r_0^{(0)}, r_0^{(1)}, ..., r_0^{(b-1)}, r_1^{(0)}, r_1^{(1)}, ..., r_1^{(b-1)}, ..., r_{L+m-1}^{(b-1)})

y′ = (y′_0^{(0)}, y′_0^{(1)}, ..., y′_0^{(b-1)}, y′_1^{(0)}, y′_1^{(1)}, ..., y′_1^{(b-1)}, ..., y′_{L+m-1}^{(b-1)})
A few assumptions about the channel need to be made to facilitate the analysis.
We assume that the channel is memoryless, i.e., that the noise process affecting a
given bit in the received word r is independent of the noise process affecting all of
the other received bits. Since the probability of joint, independent events is simply
the product of the probabilities of the individual events, it follows that
p(r|y′) = ∏_{i=0}^{L+m-1} [ p(r_i^{(0)}|y′_i^{(0)}) p(r_i^{(1)}|y′_i^{(1)}) ... p(r_i^{(b-1)}|y′_i^{(b-1)}) ]

        = ∏_{i=0}^{L+m-1} ∏_{j=0}^{b-1} p(r_i^{(j)}|y′_i^{(j)})   (2.2)
There are two sets of product indices, one corresponding to the block numbers
(subscripts) and the other corresponding to bits within the blocks (superscripts).
Equation (2.2) is sometimes called the likelihood function for y′. Since logarithms
are monotonically increasing, the estimate that maximizes p(r|y′) is also the estimate
that maximizes log p(r|y′). By taking the logarithm of each side of Equation (2.2),
we obtain the log likelihood function
log p(r|y′) = Σ_{i=0}^{L+m-1} Σ_{j=0}^{b-1} log p(r_i^{(j)}|y′_i^{(j)})   (2.3)
By inspecting Equation (2.3), we may notice that the summands are
probabilities, i.e., real numbers. In hardware implementations of the Viterbi
decoder, the summands in Equation (2.3) are usually converted to a more easily
manipulated form called the bit metrics:
M(r_i^{(j)}|y′_i^{(j)}) = a [ log p(r_i^{(j)}|y′_i^{(j)}) + b ]   (2.4)
11
CHAPTER 2. CONVOLUTIONAL CODE & VITERBI ALGORITHM
a and b are chosen such that the bit metrics are small positive integers that
can be easily manipulated by digital logic circuits. The path metric for a code
word y′ is then computed as follows:
M(r|y′) = Σ_{i=0}^{L+m-1} Σ_{j=0}^{b-1} M(r_i^{(j)}|y′_i^{(j)})   (2.5)
If a in Equation (2.4) is positive and real, while b is simply real, then the
code word y′ that maximizes p(r|y′) also maximizes M(r|y′).

At times it is useful to focus on the contribution made to the path metric by a
single block of r and y′. Recall that a single block corresponds to a single
branch in the trellis. The kth branch metric (BM) for a code word y′ is
defined as the sum of the bit metrics for the kth block of r given y′:
M(r_k|y′_k) = Σ_{j=0}^{b-1} M(r_k^{(j)}|y′_k^{(j)})   (2.6)
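As a concrete special case not spelled out in the text: with hard-decision inputs and a suitable choice of a and b in Equation (2.4), the bit metric reduces to Hamming distance, so the branch metric of Equation (2.6) is just a count of disagreeing bits. The following sketch assumes that common convention.

```python
def branch_metric(r_block, y_block):
    """Branch metric (Eq. 2.6): sum of per-bit metrics over one b-bit
    branch; here the bit metric is Hamming distance (hard decisions)."""
    return sum(int(r != y) for r, y in zip(r_block, y_block))

print(branch_metric([1, 0], [1, 1]))   # -> 1
print(branch_metric([0, 0], [1, 1]))   # -> 2
```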
After explaining the Viterbi algorithm from a theoretical point of view, let us illus-
trate the above algorithm by an example.
By inspecting the example shown in Figure 2.3, we can redraw the evolution
of this four-state FSM by using the trellis diagram and present it in Figure 2.6.
Figure 2.6: Trellis diagram of Figure 2.3

Each node corresponds to a unique state, and each branch corresponds to a state
transition. Given a known starting state, every input sequence corresponds to a
unique path through the trellis. At the Viterbi decoder, upon receiving an
encoded symbol, each branch is assigned a weight referred to as the branch
metric, a measure of the likelihood of the transition given the noisy
observations at the receiver. Branch metrics are typically calculated using a
distance measure, so that more likely transitions are assigned smaller weights.
Given the unique mapping between a trellis path and an input sequence, the most
likely path (shortest path) through the trellis corresponds to the most likely
input sequence. The Viterbi algorithm is an efficient method for finding the
shortest path through the trellis.
The first phase of the Viterbi algorithm is to recursively compute the shortest
path to time t+1 in terms of that to time t. At time t, each state j is
assigned a state metric Γ^j_t, defined as the accumulated metric along the
shortest path leading to that state. The state metric at time t+1 can be
recursively calculated in terms of the state metrics of the previous iteration
as follows:
Γ^j_{t+1} = min_i { Γ^i_t + λ^{i,j}_t }   (2.7)
where i is a predecessor state of j and λ^{i,j}_t is the branch metric on the
transition from state i to state j. The recursive update given in Equation
(2.7) is the well-known add-compare-select (ACS) operation. By inspecting
Equation (2.7), we see that every ACS operation at state j not only updates the
state metric but also produces a decision identifying the winning predecessor
state. To facilitate the second phase of the Viterbi algorithm, we keep these
decisions in memory.
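Equation (2.7) can be sketched directly. The function below performs one ACS iteration over all states and returns both the updated metrics and the per-state predecessor decisions that are kept for traceback; the dictionary-based trellis tables in the demo are illustrative placeholders, not a specific code from the thesis.

```python
def acs_step(metrics, branch, predecessors):
    """One add-compare-select update (Eq. 2.7).

    metrics[i]       -- state metric of state i at time t
    branch[(i, j)]   -- branch metric on the i -> j transition
    predecessors[j]  -- list of predecessor states of state j
    Returns (new_metrics, decisions); decisions[j] is the winning
    predecessor of j, stored in memory for the traceback phase."""
    new_metrics, decisions = [], []
    for j, preds in enumerate(predecessors):
        best, dec = min((metrics[i] + branch[(i, j)], i) for i in preds)
        new_metrics.append(best)    # add + compare + select
        decisions.append(dec)
    return new_metrics, decisions

# Two-state illustration: both survivors come from state 0.
m, d = acs_step([0, 3], {(0, 0): 1, (1, 0): 0, (0, 1): 2, (1, 1): 1},
                [[0, 1], [0, 1]])
print(m, d)   # -> [1, 2] [0, 0]
```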
The second phase of the Viterbi algorithm involves tracing back and decoding
the shortest path through the trellis, which is recursively defined by the decisions
from the ACS updates. We usually refer to this function as Survivor Memory
Management (SMM). The shortest path leading to a state is referred to as the
survivor path for that state. A property of the trellis which is utilized for survivor
path decoding is that if the survivor paths from all possible states at time t
are traced back, then with high probability all the paths merge at time (t−L),
where L is the traceback depth (or survivor path length). Once the paths
have merged onto the survivor path, the path is independent of the starting
state and of future ACS iterations.
Based on the merging property of the survivor path, the traceback method for
survivor path decoding proceeds as follows. At time t an arbitrary starting
state is chosen and the survivor path is traced back to time (t − L), at which
point the input symbol corresponding to the transition at time (t − L) is
decoded. So far, the simplest architecture of the Viterbi decoder has been
presented. A full discussion of SMM architectures is given in Chapter 3.
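The two phases just described can be combined into a short end-to-end sketch. The following hard-decision decoder for the (G0=5, G1=7, K=3) code of Figure 2.1 mirrors the ACS recursion of Equation (2.7) and then traces back over the stored decisions. It is an illustrative simplification (whole-block traceback from the best final state, assuming the encoder starts in state 0), not one of the SMM architectures discussed in Chapter 3.

```python
def viterbi_decode(received, n_bits, gens=(0b101, 0b111), K=3):
    """Hard-decision Viterbi decoding for the rate-1/2 (5, 7), K=3 code.
    received: flat list of hard bits, 2 per information bit."""
    n_states = 1 << (K - 1)

    def step(s, b):
        """Trellis transition: from state s with input b, return the
        next state and the pair of output bits."""
        reg = (b << (K - 1)) | s
        return reg >> 1, [bin(reg & g).count("1") % 2 for g in gens]

    INF = float("inf")
    metrics = [0.0] + [INF] * (n_states - 1)   # encoder starts in state 0
    decisions = []                             # survivor memory
    for t in range(n_bits):
        r = received[2 * t:2 * t + 2]
        new_metrics = [INF] * n_states
        dec = [0] * n_states
        for s in range(n_states):
            for b in (0, 1):
                ns, y = step(s, b)
                bm = (y[0] != r[0]) + (y[1] != r[1])   # Hamming branch metric
                if metrics[s] + bm < new_metrics[ns]:  # compare-select
                    new_metrics[ns] = metrics[s] + bm
                    dec[ns] = s
        metrics = new_metrics
        decisions.append(dec)

    # Traceback: start from the best final state and follow the stored
    # decisions; with the register ordering used in step() (r1 is the
    # state's MSB), the input bit of a transition is the MSB of the
    # state it leads into.
    s = metrics.index(min(metrics))
    bits = []
    for dec in reversed(decisions):
        bits.append(s >> (K - 2))
        s = dec[s]
    return bits[::-1]

rx = [1, 1, 0, 1, 0, 0, 0, 1]     # encoding of 1,0,1,0 (see Section 2.1)
print(viterbi_decode(rx, 4))      # -> [1, 0, 1, 0]
```

Flipping a single received bit still yields the correct decision, since competing trellis paths are at least two Hamming units farther from the received word.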
2.3 Punctured Convolutional Codes
Puncturing is defined as the systematic deletion of one or more bits in every
code word. The use of puncturing facilitates the design of variable-code-rate
Viterbi decoders. Given a fixed encoder structure, higher-rate convolutional
codes are obtained by periodically deleting bits from the output streams of the
convolutional encoder. For example, let C be a rate-1/2 convolutional code, and
let x be the source information sequence corresponding to a code word y ∈ C:
x = (x_0, x_1, x_2, ...)

y = (y_0^{(0)} y_0^{(1)}, y_1^{(0)} y_1^{(1)}, y_2^{(0)} y_2^{(1)}, y_3^{(0)} y_3^{(1)}, y_4^{(0)} y_4^{(1)}, ...)
If every fourth bit of y is deleted, the resulting punctured code word yP has
the form

yP = (y_0^{(0)} y_0^{(1)}, y_1^{(0)} E, y_2^{(0)} y_2^{(1)}, y_3^{(0)} E, y_4^{(0)} y_4^{(1)}, ...)   (2.8)
E's have been inserted to mark the locations of the deleted bits, though
nothing is actually transmitted for these bits. Since yP has three code-word
bits for every two information bits, yP is a code word in a rate-2/3 punctured
code CP. If the receiver inserts erasures at the points where bits have been
punctured, the rate-1/2 Viterbi decoder may be used instead of a more
complicated rate-2/3 one. For example, we can puncture the rate-1/2
convolutional code shown in Figure 2.1; the resulting code is shown in Figure
2.7 (a), and the corresponding rate-2/3 trellis is derived in Figure 2.7 (b).
It must be noted that the dfree of a punctured convolutional code is smaller
than that of the original code. Hence punctured convolutional codes boost the
data rate at the expense of degraded coding gain.
Figure 2.7: Trellis diagram for rate-2/3 punctured convolutional code
To decode a punctured convolutional code, we need an extra depuncturing block
and a synchronization block in the Viterbi decoder. The depuncturing block
processes the input data and inserts dummy cycles to restore the rate-1/2
timing relationships before sending the data stream to the ACS unit. The
synchronization block corrects the timing ambiguity in the depuncturing
process. With these extra blocks, we can support all the different coding rates
with a single rate-1/2 Viterbi decoder engine.
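A minimal sketch of transmitter-side puncturing and receiver-side depuncturing for the rate-2/3 pattern described above (every fourth bit of the rate-1/2 stream deleted). The use of None as the erasure marker, standing in for the E symbols of Equation (2.8), is an assumption of this sketch.

```python
def puncture(stream, pattern=(1, 1, 1, 0)):
    """Delete the bits whose position (mod len(pattern)) is marked 0."""
    return [b for i, b in enumerate(stream) if pattern[i % len(pattern)]]

def depuncture(stream, pattern=(1, 1, 1, 0)):
    """Re-insert an erasure marker (None) at each punctured position,
    restoring the rate-1/2 timing for the ACS unit."""
    out, bits, i = [], iter(stream), 0
    while True:
        if pattern[i % len(pattern)]:
            try:
                out.append(next(bits))
            except StopIteration:
                break
        else:
            out.append(None)   # erasure: should contribute zero branch metric
        i += 1
    return out

tx = puncture([1, 1, 0, 1, 0, 0, 0, 1])
print(tx)               # -> [1, 1, 0, 0, 0, 0]
print(depuncture(tx))   # -> [1, 1, 0, None, 0, 0, 0, None]
```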
Table 2.2 shows the punctured code rates and the puncture matrix for all three
standards.
Table 2.2: Puncture code definition
Standard Rate Puncture Matrix
DVB 1/2 N/A
2/3 [11],[10]
3/4 [110],[101]
4/5 [1111],[1000]
5/6 [11010],[10101]
6/7 [111010],[100101]
7/8 [1111010],[1000101]
DCII 1/2 [0],[0],[1]
3/4 [100],[001],[110]
5/11 [00111],[11010],[11111]
2/3 [11],[00],[01]
4/5 [0111],[0010],[1000]
7/8 [0000000],[0000001],[1111111]
3/5 [001],[010],[111]
5/6 [00111],[00000],[11001]
UMTS (1/2) 1/2 N/A
2/3 [11],[10]
3/4 [111],[100]
4/5 [1101],[1010]
5/6 [10110],[11001]
6/7 [110110],[101001]
7/8 [1101011],[1010100]
2.4 Communication Channel Models
In the design of communication systems for transmitting information through
physical channels, it is convenient to construct mathematical models that
reflect the most important characteristics of the transmission medium. The
mathematical model of the channel is then used in the design of the channel
encoder and modulator at the transmitter, and the demodulator and channel
decoder at the receiver.
Figure 2.8: Additive channel noise model
The simplest mathematical model for a communication channel is the additive
noise channel, which is illustrated in Figure 2.8. Physically, the additive noise
process may arise from electronic components and amplifiers at the receiver of the
communication system or from interference encountered in transmission. In this
model, the transmitted signal s(t) is corrupted by a random process n(t). This type
of noise is usually characterized statistically as a Gaussian noise process. Hence, the
resulting mathematical model for the channel is usually called the additive Gaussian
noise channel. Because of its simplicity and mathematical tractability, this channel
model is used extensively in the communication system analysis and design. Channel
attenuation is easily incorporated into the model. The received signal is
r(t) = αs(t) + n(t) (2.9)
where α is the attenuation factor.
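The additive channel of Equation (2.9) can be sketched in a few lines of Python. The function name, the BPSK mapping, and the unit-signal-power assumption are illustrative choices, not part of the text; this is a minimal sketch, not a definitive implementation.

```python
import numpy as np

def awgn_channel(s, snr_db, alpha=1.0, rng=None):
    """Attenuate signal s and add white Gaussian noise: r = alpha*s + n.
    Assumes unit average signal power when converting SNR to noise variance."""
    rng = np.random.default_rng() if rng is None else rng
    noise_var = 10.0 ** (-snr_db / 10.0)
    n = rng.normal(0.0, np.sqrt(noise_var), size=len(s))
    return alpha * np.asarray(s, dtype=float) + n

# BPSK symbols (+1/-1) through the channel
bits = np.array([1, 0, 1, 1, 0])
s = 1.0 - 2.0 * bits          # map bit 0 -> +1, bit 1 -> -1
r = awgn_channel(s, snr_db=6.0)
hard = (r < 0).astype(int)    # hard decisions back to bits
```

At high SNR the hard decisions recover the transmitted bits; at low SNR the channel decoder must correct the resulting errors.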
Although AWGN provides a satisfactory model in most situations, it is not capable of
modeling some phenomena such as multipath fading [7] (intersymbol interference)
and shadowing [28]. These phenomena severely degrade the performance of data
transmission. In communication systems with a land mobile terminal via satellite,
shadowing is caused by large obstacles in the signal path. Multipath fading is
caused by reflection of the satellite signal at a large number of points. Rician,
Rayleigh, Nakagami and lognormal processes are used for describing these special
phenomena. The Rician fading model may be used to model both the microcellular
environment and the mobile satellite fading channel. In these environments, there is
no obstacle in the signal path. The Rayleigh process is a widely accepted statistical
model for the received signal envelope in macrocellular mobile radio channels, where
there is no direct line-of-sight (LOS) radio propagation path. The more general
Nakagami fading model, parameterized by the fading severity parameter m, has been
shown to fit well to some urban multipath propagation data. Lognormal process is
usually used to model the shadowing effect by large obstacles such as buildings and
mountains. For details of the above random processes, please refer to [7].
In addition to this fading only model, the received signal envelope in mobile radio
environments may suffer from shadowing, due to the topographical variation of the
transmission path. Therefore, for describing the random signal envelope variations
in microcellular mobile radio systems, a channel model characterized by Nakagami
fading and lognormal shadowing, so called Nakagami-lognormal (NLN) channel [29],
is appropriate. Certainly, there are many other channel models suitable for more
specific situations.
In addition to the mathematical model of the channel, we have to know how to
generate these channel processes systematically. Some popular methods are summarized below.
White Gaussian Random Process Method [30] In this method, it is assumed
that all computer models (Rayleigh, Rician, log-normal) for the fading chan-
nels are based on the manipulation of a white Gaussian random process. The
white Gaussian random process can be approximated by a sum of sinusoids
with random phase angle.
Inverse Transform Method [31] This method originates from probability the-
ory: a random variable x with continuous cdf F (x) can be generated by ap-
plying F−1 to a uniform random variable on [0, 1]. Some modified procedures
for generating correlated Rayleigh and Nakagami-m fading signal pairs have
also been developed [32].
Direct Probe No mathematical model is better than the real world. A good ap-
proach is to extract the signals by probing the real channel and to record
them on tape [33]. The only a priori assumption is that the channel is non-
frequency selective, which for low-rate data transmission can be easily verified
for most mobile fading channels.
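As an illustration of the inverse transform method above, the Rayleigh envelope has cdf F(x) = 1 − exp(−x²/(2σ²)), which inverts in closed form. The function name, the σ parameterization, and the sample count below are illustrative assumptions.

```python
import numpy as np

def rayleigh_inverse_transform(n, sigma=1.0, rng=None):
    """Generate Rayleigh-distributed envelope samples by inverting the cdf
    F(x) = 1 - exp(-x^2 / (2*sigma^2)): x = sigma * sqrt(-2 ln(1 - u))."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(0.0, 1.0, size=n)
    return sigma * np.sqrt(-2.0 * np.log(1.0 - u))

samples = rayleigh_inverse_transform(100000, sigma=1.0,
                                     rng=np.random.default_rng(1))
# Mean of a Rayleigh(sigma) variable is sigma * sqrt(pi/2)
```

The same recipe applies to any fading distribution whose cdf can be inverted; correlated-sample variants require the modified procedures of [32].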
Chapter 3
Viterbi Decoder Architecture
3.1 Overview
The Viterbi decoder can be divided into five blocks: the depuncture unit, the
branch metric (BM) unit, the Add-Compare-Select (ACS) unit, the survivor mem-
ory management (SMM) unit and the synchronization unit. The block diagram
of a constraint length K, punctured rate-m/n Viterbi decoder (based on a rate-1/b
Viterbi decoder) is shown in Figure 3.1.
The demodulator output goes to the depuncture unit where dummy symbols are
inserted to restore the rate-m/n encoded data rate to rate-1/b source data rate. In
the BM unit, b input soft decisions and the puncture matrix from the depuncture
logic are used to generate 2b different branch metrics. The ACS unit selects the
surviving branches according to Equation (2.7) and generates the decision vectors.
According to the decision vectors generated by the ACS unit, the SMM unit finds the
survivor sequence to produce the decoded output bit stream. Many high speed or low
power ACS architectures have been proposed [9, 10, 11]. The synchronization unit
finishes the decoding process by resolving the timing and phase ambiguity introduced
by the channel. Various synchronization schemes are presented in the literature [34,
35, 36, 37, 38]. The SMM unit is the most critical part in the design of Viterbi
decoder. It affects the coding gain performance, hardware cost and throughput of
Figure 3.1: Viterbi decoder system block diagram
the Viterbi decoder. We will focus our discussion on the SMM unit in the latter
sections.
In the Viterbi decoding process, the decision vectors generated by the ACS unit
are stored in memory, so that the information bits can be retrieved later. To achieve
the maximum likelihood decoding of the convolutional code, the Viterbi algorithm
calls for arbitrarily long path memories, but this requirement is impractical for
actual hardware implementation. Upon examining the Viterbi algorithm
closely, however, the required memory can be cut to a manageable size with negligible loss in
coding gain performance. An unmerged path will accumulate more distance metric
than the merged one with high probability. In other words, upon merging to the
survivor path, an incorrect path with a very long unmerged span will have very low
probability of having a smaller metric than the correct path. Consequently, with
very high probability, the best path to all the states will have diverged from the
correct path within a short span, typically a few times the constraint length. So the
path memory only needs to be long enough to ensure that all paths have merged. This
length is called the traceback depth (TBD), L, in Viterbi decoders.
Definition 1 Traceback Depth (TBD) is the number of traceback operations per-
formed before decoding the first bit.
Forney [5] showed through ensemble coding arguments that the probability
of truncation error decreases exponentially with TBD. At low signal-to-noise ratios,
his experimental result shows that the probability of truncation error is negligible
for L ≥ 5.8m (Rate-1/2), where m is the maximal memory order (the length of the
longest shift register in the encoder).
There are two basic approaches to survivor memory management: the register
exchange (RE) method and the traceback (TB) method. In both techniques a shift
register is associated with every trellis node throughout the decoding operations. The
RE-based method is suitable for high-speed applications at the expense of complex
routing and high power dissipation. The routing complexity grows exponentially with the constraint length. For
Viterbi decoder implementation with medium to high constraint lengths, the TB
based method is the preferred architecture for the SMM unit because of its lower
hardware requirements. The TB-based method has commonly been used in low-throughput
or low-power applications. It permits the design of a very compact RAM that provides
significant area advantages. We will show the details of the RE and TB architectures
in Section 3.2 and Section 3.4.
3.2 Register Exchange Architecture
Figure 3.2 shows the architecture of the register exchange method. A register ex-
change architecture consists of a two-dimensional array of one-bit registers and mul-
tiplexers. The decision sequences of all 2^{K−1} (in this case, K = 3) states are stored in
every column of registers. These registers are interconnected in precisely the same fashion
as the ACS circuits. Associated with every trellis state is a register which contains
the survivor path leading to that state. The decision sequence d^S_{n−4,n} of the survivor
path to state S from time (n−4) to n is given by the recursive update:

d^S_{n−4,n} = (d^{S′}_{n−5,n−1} << 1) | d^S_n (3.1)

where S′ is the predecessor state of S as determined by its decision d^S_n from the
ACS update. Each survivor path is uniquely specified and stored as the sequence of
decisions along the survivor path.
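The update (3.1) can be sketched as follows. The trellis convention used here (the decision bit selects the predecessor's top bit, and the appended input bit equals the LSB of the new state) and the example decision vectors are assumptions for illustration, not taken from the text.

```python
# Minimal register-exchange sketch for a radix-2 trellis with 2^(K-1) states.
# decisions[s] is the ACS decision for state s: it selects which of the two
# predecessors, (s >> 1) or (s >> 1) | TOP_BIT, survives.
K = 3                      # constraint length
NSTATES = 1 << (K - 1)

def re_step(regs, decisions):
    """One register-exchange update: copy the surviving predecessor's
    register and append the input bit (LSB of the current state), per (3.1)."""
    new = [0] * NSTATES
    for s in range(NSTATES):
        pred = (decisions[s] << (K - 2)) | (s >> 1)
        new[s] = (regs[pred] << 1) | (s & 1)   # shift left, append new bit
    return new

regs = [0] * NSTATES
# Hypothetical ACS decision vectors for three trellis steps
for decisions in [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]:
    regs = re_step(regs, decisions)
# regs[s] now holds the last 3 decoded bits of the survivor path to state s
```

After L such iterations, the register of the minimum-metric state holds the survivor path decisions, as described above.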
Each time a new branch is processed by the ACS, the register values are inter-
changed corresponding to the decision values. A new symbol is added to one end
of each register, and the oldest symbol in each register is delivered to the output
decision device. Figure 3.3 shows the trellis network of the RE architectures. Every
register has two candidate predecessors, one is connected with solid trellis and the
other is connected with dash one. The solid trellis corresponds to the decision value
generated by the ACS unit. At the moment of copying, the decision value is also
appended to the end of the registers. After L iterations, the state register with the
Figure 3.2: Register exchange architecture
minimum path metric contains the information of the survivor path. (It may seem
that no traceback operations are performed, so the definition of TBD would not
apply here. But forward register exchange and traceback operations are essentially
equivalent with regard to the Viterbi algorithm.)
Figure 3.3: Register contents for register exchange operations
Figure 3.4 shows another timing diagram of the RE architecture. At any time
interval, new decision values are appended to the registers and the MSB of the
zeroth register is taken as the decoded bit.
Figure 3.4: Timing diagram of register exchange architecture
For high-speed implementations, this technique requires all the shiftings to be
done in parallel. As a result, every stage will have a routing channel for interchang-
ing the decision values. It is the routing complexity that makes the RE method
impractical for large constraint lengths, or for high-puncture-rate situations which
have a high TBD and a large number of states.
Variations of this architecture can be used to improve the register exchange
method. We will show one in Section 3.3.
3.3 Modified Register Exchange Architecture
The RE method is based on successive RE operations between two origin states
and two destination states. It is the value of the registers that is interchanged along
the trellis. In this section, the presented method uses the "pointer" concept: state
indexes can be used as inputs to the register exchange network instead of the values
of the registers [17]. Instead of moving the values between registers, the pointer
to the source register is altered to point to the destination register. In the Viterbi
decoder, every state is assigned a register and a pointer; here the pointer to the
register simply carries the current state.
Figure 3.5: Register contents for register exchange operations
For example, if (PM^i_{t−1} + BM^{i,p}_t) is greater than (PM^j_{t−1} + BM^{j,p}_t), then the path
from j to p is the survivor path for p. The pointer to register j which carries the
value j is shifted to the left, and the bit which causes the survivor path transition
from j to p (in this case "0") is appended to the LSB. Therefore, the pointer which
carried the value j now carries the value p, thereby pointing to register p. Then,
the decoded bit is appended to the content of the register whose pointer value is
changed from j to p. It must be noted that the register has a fixed physical location;
only the value of its pointer changed, and a bit is appended to the corresponding
register for each code word received. But a problem arises when j is the predecessor
of p and q at the same time. Which value should the pointer j take, p or q? In other
words, which of the two paths, originating from the same source state, should be
the survivor path, and how should the other path terminate? The ACS unit then
needs to produce a new decision bit, called the termination bit.
The solution to this problem is simple. If both paths originating from state
j are considered to be the survivor paths for the destination states p and q, the BMs
of both paths are compared. Then, the pointer, which carries the value j, changes
to the value of the destination pointer whose path has the smaller BM. The value
of the pointer of state i changes to the other destination pointer. Simultaneously,
the path from state i receives a termination high signal. This indicates that the
path from state i is terminated, and no decision bits are appended to its register
anymore. This prevents the presence of duplicated pointer values.
3.3.1 Comment
In the process of implementing the modified RE architecture, we find a problem
hidden in this architecture. In that paper, the update mechanism of register pointers
does not consider a common case. For example, if j is the predecessor of p and q at the
same time, then according to the update scheme above, we have to find the smaller
of the branch metrics BM^{j,p}_t and BM^{j,q}_t. But when BM^{j,p}_t and BM^{j,q}_t are equal,
we have trouble determining which of them will be the successor. Choosing either of
the states may break the essence of the Viterbi algorithm, because we have no chance
to recover from the error we made at this time instance. In the traditional RE architecture,
this kind of problem is resolved by keeping both paths alive and deferring the
decision (the essence of the Viterbi algorithm). But it seems difficult to solve this
problem in the modified RE architecture.
3.4 Traceback Method
The traceback (TB) method is a backward processing algorithm for deriving the
survivor path from a starting state and the path decisions. The survivor memory
does not store the whole surviving paths but the decisions of branch selection in
every step. With these decision vectors output by ACS unit, the TB algorithm can
then trace the survivor path in the opposite direction to the ACS update, hence the
name "traceback" algorithm. Given the current state S_n and its decision d^{S_n}_n, the TB
algorithm estimates the previous state S_{n−1} by:

S_{n−1} = f(S_n, d^{S_n}_n) (3.2)

One may notice that the current state decision d^{S_n}_n is read from the decision
memory by using the current state S_n and time index n as an address. The function
f(·) is a mapping function depending on the structure of the trellis. For the radix-2
trellis, Equation (3.2) simplifies to

S_{n−1} = d^{S_n}_n (S_n >> 1) (3.3)

which corresponds to the concatenation of the decision bit and the 1-bit right shift of
the current shift register state. The radix-4 fully parallel architecture for a radix-2
trellis is the lookahead scheme which obeys ideal linear scaling for a two-fold increase
in throughput [18]. The two levels of traceback can be done with one mapping
function in Equation (3.4). This technique provides speedup of traceback operations
with extra lookahead hardware. But throughput doubling relies on implementing
a 4-way ACS operation at the same rate as a 2-way ACS. A 4-way ACS circuit
has been presented that achieves this goal to within 17% overhead, resulting in an
overall speedup of a factor of 1.7.
S_{n−2} = d^S_{n−2,n} (S_n >> 2) (3.4)
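The radix-2 step of (3.3) and its radix-4 lookahead form (3.4) can be sketched as below. The MSB-insertion convention and the value K = 3 are illustrative assumptions; the concatenation is realized with a shift and an OR.

```python
K = 3  # constraint length; states are K-1 = 2 bits wide (assumed example)

def prev_state(s, d):
    """One radix-2 traceback step, as in (3.3): prepend the decision
    bit to the 1-bit right shift of the current state."""
    return (d << (K - 2)) | (s >> 1)

def prev_state_radix4(s, d2):
    """Two traceback steps at once, as in (3.4): d2 packs two decision
    bits (the later step's bit in the high position)."""
    return (d2 << (K - 3)) | (s >> 2)

# e.g. from state 0b10 with decision 1, the previous state is 0b11
```

Two applications of `prev_state` give the same result as one `prev_state_radix4` call, which is exactly the lookahead property the text describes.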
A straightforward implementation of the traceback method is shown in Fig-
ure 3.6. A total of (2^{K−1} × L) bits of memory is needed (K is the constraint length
and L is the TBD).
Figure 3.6: Basic implementation of TB method
At cycle t, the ACS decision vector is written to memory location (t mod
L). The algorithm then traces back for L cycles to get the merged survivor path.
The decoded bit is then derived from the end state of the survivor path (the starting
state is at cycle t and the end state is at cycle t − L). So far, we have decoded one
bit, and the same procedure is repeated for the next bit. In later discussions, we
will refer to this straightforward TB architecture as the simple TB architecture (TBS).
Figure 3.7 shows the values of the registers for the TB method. This algorithm
is straightforward, and the memory size grows linearly with L and exponentially with
K. However, this algorithm is undesirable with regard to circuit implementation due
to the high number of memory accesses required. It poses a serious problem for the
throughput of the Viterbi decoder.
Figure 3.7: Register contents for traceback operations
Taking a closer look at the TB algorithm, there are three elemental operations
used in the decoding process:
Traceback read (TB): Reading the path decision in the current state, then com-
bining this information with the current state to find the previous state.
Decode Read (DC): This operation is similar to the TB operation. Instead of just
finding the previous state, it also decodes the symbol and sends it to the
bit-order reversing circuit (because the direction of decoding is reverse to the
stream of symbols).
Writing New Data (WR): This operation writes the decision vectors output by
ACS unit into the memory.
The TB algorithm consists of repetitions of the above three operations. In every
ACS iteration, the path decisions are written to a circular memory bank. Then the
traceback read operation executes for L iterations such that all survivor sequences
are merged to one common path. Finally the common path is scanned by the DC
operations to retrieve the information bits. In the simple TB architecture, we have
to do one WR operation and L TB operations to decode one bit.
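The TBS schedule above (one WR plus L TB reads per decoded bit) can be sketched as a toy loop. The circular-buffer indexing, the function names, and the radix-2 state-update convention are illustrative assumptions.

```python
# Toy sketch of the simple TB (TBS) schedule for a radix-2 trellis.
K, L = 3, 8                     # assumed constraint length and TBD
NSTATES = 1 << (K - 1)

def prev_state(s, d):
    """One traceback step: prepend the decision bit to (s >> 1)."""
    return (d << (K - 2)) | (s >> 1)

def tbs_decode_one(mem, t, start_state):
    """Trace back L steps from start_state at cycle t through a circular
    buffer of decision vectors; the decoded bit is the LSB of the end state."""
    s = start_state
    for i in range(L):                 # L traceback (TB) reads
        d = mem[(t - i) % L][s]        # decision for state s at that cycle
        s = prev_state(s, d)
    return s & 1                       # decoded information bit (DC)

all_zero = [[0] * NSTATES for _ in range(L)]   # hypothetical decision memory
bit = tbs_decode_one(all_zero, t=7, start_state=3)
```

Decoding n bits this way costs n writes and n·L traceback reads, which is exactly the memory-access burden the following TRAIR discussion quantifies.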
In order to evaluate the throughput performance of different SMM architec-
tures, we define a performance index, traceback read to ACS iteration ratio
(TRAIR). This ratio corresponds to the read/write speed requirement of the memory. A low
TRAIR facilitates the design of the Viterbi decoder. In the design of a high-
speed Viterbi decoder, it is desirable to bring the TRAIR as close to 1 as possible
to reduce the memory throughput requirement. In the simple TB architecture, the
ratio between the memory reads and the ACS iterations is L, which is not appropriate
for high-speed implementations. In the later sections, we will present more SMM
architectures suitable for high-speed implementation.
3.5 One-Pointer Traceback
In Section 3.4, the basics of the TB algorithm were studied. Due to its high
TRAIR, the TBS architecture is not suitable for high-speed implementation.
However, with some minor modifications to the algorithm, we can derive different
architectures suitable for high-speed Viterbi decoding applications [13].
Instead of tracing back L steps and decoding one symbol in an iteration, we
can put the merged survivor path to better use. That is, the decoding process can
decode a series of symbols from the known merged path. We define the ratio between
the length of the decode block and the traceback block as the "Decode to Traceback Ratio"
(DTR). The value of DTR ranges from 1/L to 1. Instead of decoding only one
symbol per L traceback operations, (DTR × L) symbols are decoded in the new
architecture.
Figure 3.8 shows the memory organization of the one-pointer TB architecture.
In order to achieve continuous decoding operations, three kinds of operations (WR,
TB and DC) must be completed at the same cycle.
Figure 3.9 presents the timing diagram of the one-pointer TB architecture with
DTR=0.5. As shown in the figure, for every ACS iteration, the traceback unit
Figure 3.8: Memory organization of one-pointer traceback architecture
Figure 3.9: Timing diagram of one-pointer traceback architecture with DTR=0.5
has to perform three read operations and one write operation. In a straightforward
implementation, this architecture requires a single port RAM that runs four times
faster than the ACS cycle time. It results in a very high-speed requirement for the
RAM design. For high-speed Viterbi decoders, we can make modifications to achieve
a 1:1 (memory access/ACS iteration) ratio so as to reduce the speed requirement of
memory.
The DTR is important in determining the characteristics of the one-pointer TB
architecture. By varying the DTR, we can derive architectures with different mem-
ory speed requirement, memory size, throughput and coding gain performance. We
define several performance metrics to evaluate different SMM architectures. First,
we consider the TBD, L. TBD is defined as the number of traceback operations per-
formed before decoding the first bit. This parameter directly affects the throughput
and coding gain performance of the Viterbi decoder. Compared with the simple
TB architecture defined in Section 3.4, the decoding process in the one-pointer TB
architecture is an extension of the survivor path merging process. For the first de-
coded bits in the decoding block, the TB depth is L. For the last decoded bits in
the decoding block, the equivalent TB depth is (L + DTR ∗ L). In other words,
the probability that all the paths have merged into a common path is higher for the
last decoded symbol in the block than the first symbol. Therefore, the meaning of
TBD in the one-pointer TB architecture differs from that of the simple TB archi-
tecture. Note that the one-pointer TB architecture is equivalent to the simple TB
architecture when DTR equals 1L.
We can try to define some metrics. For example, the average traceback
depth (ATBD) is given by:

ATBD = (L + (L + DTR ∗ L)) / 2 = (1 + DTR/2) L (3.5)
For the one-pointer TB architecture, the TRAIR is given by:

TRAIR = (L + DTR ∗ L) / (DTR ∗ L) = 1 + 1/DTR (3.6)
The total decision memory length M is given by:
M = (DTR ∗ L) + L + (DTR ∗ L) = (1 + 2DTR)L (3.7)
These parameters determine the memory organization, throughput, and the hard-
ware area of the resulting TB architecture. But from Equation (3.5), the value of
ATBD also depends on DTR. To make the comparison fair, the decision memory
length (M) should be normalized, so that the ATBD of configurations with differ-
ent values of DTR are the same. Assume two configurations A and B each with
parameters DTRa, DTRb. La and Lb denote their TB depths. The value Lb that
will give configuration B equivalent ATBD to configuration A is given by:
Lb = ((2 + DTRa) / (2 + DTRb)) La (3.8)
and the adjusted memory size is given by
Mb = (1 + 2DTRb) · ((2 + DTRa) / (2 + DTRb)) La (3.9)
This adjustment is especially important for high-puncture-rate codes, since their
TB depths are long and the impact on the memory size is significant.
Table 3.1: Comparison of traceback architectures
DTR TRAIR Memory Size ATBD Adjusted Memory Size
1 2 3L 1.5L 3.0L
1/2 3 2L 1.25L 2.4L
1/4 5 1.5L 1.125L 2.0L
1/7 8 1.28L 1.071L 1.8L
The comparison for architectures with different values of DTR is shown in Ta-
ble 3.1. The memory size decreases with the value of DTR, but at the expense of
larger TRAIR. The adjusted memory size is shown in the last column. As shown
in the table, once the ATBD of different configurations are made to be equivalent,
the memory size reduction is less significant for the smaller value of DTR.
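The entries of Table 3.1 follow directly from (3.5)-(3.9). A small sketch reproducing them, with exact rational arithmetic so no rounding creeps in (the function name and the choice of DTR=1 as the normalization reference are illustrative, the latter matching the table):

```python
from fractions import Fraction as F

def one_pointer_metrics(dtr, dtr_ref=F(1)):
    """ATBD, TRAIR and memory size of a one-pointer TB architecture
    (eqs. 3.5-3.7), plus memory normalized via eq. 3.9 so that all
    configurations have the ATBD of the reference DTR. Units of L."""
    atbd = 1 + dtr / 2                       # eq. (3.5)
    trair = 1 + 1 / dtr                      # eq. (3.6)
    mem = 1 + 2 * dtr                        # eq. (3.7)
    adj_mem = mem * (2 + dtr_ref) / (2 + dtr)  # eqs. (3.8)-(3.9)
    return atbd, trair, mem, adj_mem

for dtr in (F(1), F(1, 2), F(1, 4), F(1, 7)):
    atbd, trair, mem, adj = one_pointer_metrics(dtr)
    print(f"DTR={dtr}: TRAIR={trair}, mem={float(mem):.2f}L, "
          f"ATBD={float(atbd):.3f}L, adjusted={float(adj):.1f}L")
```

Running this reproduces, e.g., TRAIR 3, memory 2L, ATBD 1.25L and adjusted memory 2.4L for DTR = 1/2, in agreement with Table 3.1.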
Memory access speed is usually the limiting factor of the SMM unit throughput
in high-speed Viterbi decoder implementations. To simplify the memory design, it is
desirable to have a memory access rate equivalent to the ACS iteration rate. Addi-
tional hardware is needed to deal with the higher memory bandwidth requirement.
We will show two architectures to further resolve this problem. In order to minimize
the chip area, lower memory array size must be weighed against hardware needed
to support the high memory access rate.
3.6 Multiple Pointer Traceback
In the multiple pointer traceback architecture [13], the problem of the high mem-
ory access rate is solved by dividing the memory array into several banks and access-
ing them in parallel with multiple read and write pointers. Consider the DTR=0.5
memory configuration with the TRAIR equal to 3. The total number of memory
accesses needed for each ACS iteration is four, counting read and write operations.
Three read and one write pointers are necessary to achieve a memory cycle time that
is the same as the ACS iteration time. The timing diagram of this configuration
is shown in Figure 3.10. To accommodate running four pointers at the same time,
total memory size is increased to three times the traceback depth (TBD), L.
The memory is divided into six banks, and each of them is of length L/2. Each of
the two traceback pointers traces back L/2 stages to find the merged survivor path,
and the decoding pointer produces the decoded output by using the survivor path
found by the traceback pointers alternately. After every L stages, a new traceback
front is started from the fixed state (such as the zeroth state). Table 3.2 shows the
comparison of different multiple-pointer architectures. The memory size and the
adjusted memory size increase significantly as DTR decreases.
Compared with DTR=0.5 one-pointer traceback configuration, this 3-pointer
configuration requires an extra length-L memory array, and five more sets of address
decoders to achieve the lower memory access rate. Because this configuration is de-
rived from the DTR=0.5 configuration, it has the same ATBD of 1.25L. According
Figure 3.10: Timing diagram of multiple pointer traceback architecture
Table 3.2: Comparison multiple pointer traceback architectures
DTR TRAIR Memory Size ATBD Adjusted Memory Size
1 2 3L 1.5L 3.0L
1/2 3 → 2 3L 1.25L 3.6L
1/4 5 → 2 5L 1.125L 6.7L
1/7 8 → 2 8L 1.071L 12.9L
to the memory requirements, other configurations based on different values of DTR
are also possible.
3.7 Traceforward
In Section 3.6, the problem of the high memory access rate relative to the ACS
iteration rate was resolved by using multiple pointers, that is, by speeding up the traceback operations.
The above algorithm approaches this problem by parallelization, i.e. carrying out
more operations in one ACS iteration. However, by inspecting the basic traceback
algorithm, another effective traceback algorithm can be found. In the traceback
algorithm, the ”starting decode state” is found by tracing back the decision memory
for L times. But, as mentioned in Section 3.2, we can find this ”starting decode
state” by employing the register exchange structure [15]. Because the direction of
this computation is in the same direction as the ACS operation, this architecture is
named ”traceforward” architecture.
Figure 3.11 shows how the register exchange structure can be applied to find the
”starting decode state” for the traceforward algorithm. Suppose that there are L
levels of register exchange structure with the state indexes as the register content.
At cycle t, the state registers are initialized to their index number. Then the ACS
decision is applied to the register exchange trellis structure with the states exchanged
in the same fashion as the decision bits in the register exchange algorithm. After
L cycles, the state indexes at the output of this register exchange structure will
all merge to one common state index, the ”starting decode state.” In the actual
implementation, since only the starting state is needed for every L cycles, only one
stage of register exchange structure is necessary, as highlighted in the ”Implemented
Circuit" box of Figure 3.11. The routing of the traceforward unit is obtained by
replacing the ACS path metrics routing with the state routing, and the branch
metrics routing with the ACS decision routing.
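The traceforward idea above (a register-exchange network whose registers carry state indexes, converging to the starting decode state) can be sketched as follows. The radix-2 trellis convention, K = 3, and the all-zero example decisions are assumptions for illustration.

```python
# Sketch of the traceforward unit: propagate state *indexes* forward
# through the trellis. After enough cycles all registers hold the same
# index -- the "starting decode state".
K = 3
NSTATES = 1 << (K - 1)

def tf_step(idx, decisions):
    """One forward step: each state inherits the index carried by its
    surviving predecessor (decision bit selects the predecessor)."""
    new = [0] * NSTATES
    for s in range(NSTATES):
        pred = (decisions[s] << (K - 2)) | (s >> 1)
        new[s] = idx[pred]
    return new

idx = list(range(NSTATES))             # at cycle t: each register holds its own index
for decisions in [[0, 0, 0, 0]] * 4:   # hypothetical all-zero ACS decisions
    idx = tf_step(idx, decisions)
# With all-zero decisions every survivor traces to state 0, so all
# registers converge to index 0 -- the starting decode state.
```

Only the final converged value is needed every L cycles, which is why a single stage of this structure suffices in hardware.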
The memory configuration of the traceforward algorithm is shown in Figure 3.12. This
configuration is based on the one-pointer traceback algorithm with DTR=1. At
Figure 3.11: Traceforward unit
Figure 3.12: Memory configuration of traceforward unit
first glance, the total memory size is 3L. But we can reduce the memory size to 2L
by folding the WR memory with the DC memory. Although no traceback operation
is executing in the traceforward memory, it acts as a buffer to wait for the starting
state generated from the traceforward unit.
Figure 3.13: Timing diagram of traceforward algorithm
Figure 3.13 gives an example of the traceforward algorithm with L = 4. Due to
the folding of DC memory and WR memory, the running direction of the DC and
the WR pointers will change during the traceback operation. The numbers in the
figure represent the time cycles. From time cycle 1 to 4, the ACS decision vectors
are written to the memory. From time cycle 5 to 8, along with the ACS decision
updates, the traceforward unit computes the starting state for the decode operation.
At time cycle 8, the starting decode state for the decisions generated from time cycle
1 to 4 is ready, so from time cycle 9 to 12, the decode operation will generate four
decoded bits. Because the decoding direction is reverse to the input direction, the
decoded bits are sent to the bit order reversing (last in first out) circuit to restore
the correct order. The same process repeats through the whole decoding process.
The latency of the above example is 3L.
Figure 3.14: Block diagram of traceforward algorithm
The block diagram of the traceforward implementation is shown in Figure 3.14.
There are several reasons for the selection of DTR=1 over DTR=0.5 in the trace-
forward algorithm implementation. First, the size of the decode region equals L for
the DTR=1 configuration, so only one traceforward unit is necessary, while for the
DTR=0.5 configuration, two interleaved traceforward units are necessary to pro-
duce the starting decode state every L/2 cycles. Second, due to the folding of DC
and WR memory, the total memory is 2L, the same as the DTR=0.5 configuration.
Third, with the same L, the ATBD of the DTR=1 configuration is 20% larger
than that of the DTR=0.5 configuration. The drawbacks of the DTR=1 configuration
are that it requires a longer LIFO to rearrange the order of the decoded bits, and that
the decoding latency is longer. Compared with the algorithms discussed so far, the
traceforward algorithm is unique due to the fact that it is based on a DTR=1 con-
figuration rather than DTR=0.5 configuration. In terms of hardware area tradeoff,
it requires a complicated traceforward unit, and a complicated read/write pointer
scheme. However, the routing overhead is kept to a minimum in this algorithm, and
an efficient ACS layout topology can be readily applied to reduce the traceforward
unit chip area, with the branch metrics routing replaced by the decision routing,
and the path metrics routing replaced by the state routing.
3.8 Sliding Block
To achieve unlimited concurrency, and hence throughput, in an area-efficient manner,
a sliding block Viterbi decoder (SBVD) is implemented that combines the filtering
characteristics of a sliding block decoder with the computational efficiency of the
Viterbi algorithm. The SBVD approach reduces the problem of decoding a continuous
input stream to decoding independent overlapping blocks without constraining
the encoding process [19].
As shown in Figure 3.15, a general hybrid Viterbi algorithm can be derived that
combines forward and backward processing of the interval. At some trellis iteration
m, within the interval, the shortest path must pass through one of the four possible
states. Forward processing of the interval n − L to m yields four survivor paths
corresponding to the shortest paths from n − L to each state at time m. Similarly,
Figure 3.15: Hybrid Viterbi algorithm for selecting the shortest path through a trellis of finite length (four-state trellis example)
backward processing of the interval n+L to m yields the shortest paths from n+L to
each state at time m. For a given state at time m, the shortest path over the interval
through this state must be the concatenation of the forward and backward state
metrics. Hence, selecting the state Sm at time m with the smallest concatenated
state metric yields the starting state for tracing back the shortest path. If m = n
then Sn can be decoded directly; otherwise traceback from m to n is required. The
hybrid Viterbi algorithm subsumes the forward only and backward only algorithms,
which correspond to the special cases of m = n + L and m = n − L, respectively.
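The selection step can be sketched in a few lines. This is an illustrative sketch, not the thesis implementation; the function name `select_start_state` and the example metrics are assumptions, and the forward and backward state metrics are taken as already computed by the two ACS recursions.

```python
# Sketch of the hybrid Viterbi starting-state selection: fwd[s] is the
# forward state metric (shortest path from n-L to state s at time m) and
# bwd[s] the backward state metric (shortest path from n+L to s at time m).

def select_start_state(fwd, bwd):
    """Return the state with the smallest concatenated metric fwd[s] + bwd[s]."""
    concat = [f + b for f, b in zip(fwd, bwd)]
    return min(range(len(concat)), key=concat.__getitem__)

# Four-state trellis example, as in Figure 3.15:
fwd = [7, 3, 9, 5]
bwd = [4, 6, 1, 2]
print(select_start_state(fwd, bwd))  # state 3: concatenated metric 5 + 2 = 7
```

Traceback from the selected state Sm toward n then recovers the decoded bits, exactly as described above.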
In the basic mode, the SBVD requires 2L trellis iterations per decoded output;
hence the relative area efficiency is 1/(2L). The relative area efficiency can be
increased by decoding a block of length M rather than just a single bit. That is,
apply the hybrid Viterbi algorithm to the interval n−M/2−L to n+M/2+L and
decode the interval n−M/2 to n+M/2. Each block of M decoded outputs requires
M + 2L trellis iterations and hence the corresponding DTR is M/(M + 2L). The
SBVD architecture can be applied to high-speed Viterbi decoding of convolutional
codes with any number of states. The hardware overhead, compared to sequential
Viterbi decoding, is independent of the number of states, but grows with
the ratio of the survivor path length L to the block length M. This relatively
small overhead enables codes of large constraint lengths to be decoded as efficiently
as codes of shorter lengths without a theoretical speed limit.
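The DTR formula above is easy to sanity-check numerically; the helper below is a sketch with an assumed name, not code from the thesis.

```python
def sbvd_dtr(M, L):
    """Decoding-to-trellis ratio of the SBVD: M decoded bits per M + 2L iterations."""
    return M / (M + 2 * L)

for M in (1, 64, 128, 256):
    print(M, round(sbvd_dtr(M, L=64), 3))
# M = 1 recovers the basic mode's efficiency of about 1/(2L); M = 2L gives
# DTR = 0.5, and the DTR approaches 1 as the block length M grows.
```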
Figure 3.16: Block decoding using the SBVD method: (a) forward processing and (b) equal forward/backward processing
Any choice of m for the hybrid Viterbi algorithm can be used for block decoding.
Two schemes of practical interest are shown in Figure 3.16: Figure 3.16(a) shows
forward only processing, and Figure 3.16(b) shows equal forward/backward
processing. In either case, the SBVD is a true maximum likelihood
algorithm, selecting from the set of all possible paths the one that is closest to the
observed output over the finite observation interval.
Figure 3.17: Continuous stream processing using the SBVD method.
In a high-speed implementation, it is efficient to choose M = 2L. The reason
is similar to that discussed for the traceforward architecture in Section 3.7.
Decoding of a continuous input stream using the SBVD method is analogous to
overlap-add filtering, as shown in Figure 3.17. The input stream is blocked into
input symbol vectors of length 2L, successive pairs of which are decoded using the
SBVD method to produce output vectors of length 2L.
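The overlap-add analogy can be made concrete with a little index bookkeeping. The sketch below uses an assumed helper name; the negative start of the first window would be handled by the initial metric reset in a real decoder.

```python
def sbvd_windows(n_blocks, L):
    """Yield (input_start, input_end, output_start, output_end) for each block.

    Each output block of 2L bits is decoded from an input window of 4L
    symbols, i.e. L symbols of overlap on each side of the block.
    """
    for i in range(n_blocks):
        out_lo, out_hi = i * 2 * L, (i + 1) * 2 * L
        yield out_lo - L, out_hi + L, out_lo, out_hi

for window in sbvd_windows(3, L=64):
    print(window)
# Consecutive input windows overlap by 2L symbols, analogous to the
# overlap-add processing of Figure 3.17.
```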
In the original paper, the implementation of this architecture adopts fully parallel
hardware instead of memory traceback, because the constraint length of the
target convolutional code is only 3. But the hardware cost becomes very high for larger
constraint lengths such as 7 or 9, because the number of BM units and ACS units is
proportional to the TBD. Hence we adopt the memory based implementation in
the later discussion.
3.9 Best State Traceback Architecture
In the punctured Viterbi decoder design, a large TB depth L is necessary to deal
with the high puncture rate (i.e. the 7/8 code). In the discussion up to now, it was
assumed that the traceback operation started from an arbitrarily chosen state. However,
it is possible to start from the state that is closest to the desired survivor path by
utilizing the information contained in the path metrics. For the state with the lowest
path metric, the probability is high that the state is already very close to the survivor
path. With this modification, we only have to trace back along one instead of 2^(K-1)
survivor paths. Simulation shows that the best state traceback can cut the TB
depth in half while maintaining the same coding gain performance [12]. In high-speed
parallel Viterbi decoder implementations, the TB operation usually starts
from an arbitrary state due to the difficulty of finding the best state at high speed.
As noted earlier, for the high puncture rate situation, the TB depth requirement
calls for a large RAM, which will adversely impact the area. Using the best-state
TB architecture can cut the traceback depth in half while maintaining the coding
gain. Because a parallel comparator is very expensive in chip area, we may replace
it by a smaller two-way serial comparator structure to find the best state. A bank
of buffer RAM is added, so the extra delay time can be used to carry out the serial
compare operation.
Figure 3.18 shows the timing diagram for the best state traceback algorithm
based on the DTR=0.5 one-pointer TB memory configuration. An additional mem-
ory block of length L/2 is added to allow the time for the serial comparison. The
total latency is 2.5L. However, since the best state traceback reduces the TB depth,
Figure 3.18: Best state traceback timing diagram
the overall latency is lower than the blind zero-state traceback. For a rate-7/8 punc-
tured code, a traceback depth of 64 gives satisfactory coding gain performance. The
selection of L = 64 leaves 32 clock cycles for the serial comparator to finish the
minimum path metric search. To find the minimum among 64 states, a 2-way
serial comparison structure is used. The number of comparisons needed is 31
within each group of 32 states (the two groups are searched in parallel), plus 1
between the 2 final candidates, for a total of exactly 32 cycles.
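The cycle budget of the two-way serial search can be modeled directly. The following is a behavioral sketch, not RTL, and the function name is an assumption: the two halves of the 64 path metrics are scanned by two serial comparators running in parallel, and one final comparison picks the winner.

```python
def serial_best_state(metrics):
    """Return (best_state, cycles) for a two-way serial minimum search."""
    half = len(metrics) // 2
    lo = min(range(half), key=lambda s: metrics[s])                # 31 serial compares
    hi = min(range(half, len(metrics)), key=lambda s: metrics[s])  # 31 compares, concurrent with the first
    cycles = (half - 1) + 1  # the two scans overlap in time; +1 for the final compare
    best = lo if metrics[lo] <= metrics[hi] else hi
    return best, cycles

metrics = [50] * 64
metrics[37] = 3            # make state 37 the best state
print(serial_best_state(metrics))  # (37, 32): fits the 32-cycle budget left by L = 64
```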
This algorithm can also be applied to the multiple-pointer traceback algorithm. For
the traceforward algorithm, applying best state decoding requires a large 64-to-1
selector to select the starting decode state computed by the traceforward unit,
which is impractical in terms of chip area.
3.10 Comparison of Architectures
We summarize the memory speed, memory size requirements and ATBD
of the SMM architectures in this section. The convolutional code we deal
with is of constraint length K and rate 1/2 (the number of states is S = 2^(K-1)).
L denotes the TBD of the SMM architecture. The "sb-bit" is the number of bits
used in soft decision decoding; we usually set it to three or four. The
"acs-bit" is the number of bits used to accumulate the path metric. To keep the
accumulated path metric from overflowing, we usually set the "acs-bit" to 8 or 9.
We show the bandwidth requirements of the basic register exchange and traceback
architectures in Table 3.3. From this table, we can see the fundamental difference
between the two methods. The bandwidth requirement of the RE method is much
higher than that of the TB method, but the RE method is usually implemented with
hard-wired logic while the TB method is usually implemented with memory. The
delay time of the RE method is less than that of the TB method by 2L clock cycles. The routing
complexity of the RE method grows exponentially with K. We conclude that RE
based architectures are suitable for high-speed Viterbi decoders or small constraint
length convolutional codes. If the constraint length is large, TB based architectures
facilitate the design because of their lower memory bandwidth requirement and lower
routing complexity.
Table 3.4 summarizes the ATBD of the SMM architectures. We will use it to
equalize the BER performance and hardware area later.
Table 3.5 shows the throughput related comparisons of the mentioned architectures.
The "TRAIR" column gives the memory speed requirement of each architecture,
and the "Delay Cycles" column gives the decoding delay in clock cycles.
Table 3.6 summarizes the memory/register requirement of each mentioned
architecture. The "RE" architecture uses registers instead of memory.
Table 3.3: Bandwidth requirement of basic SMM architectures
Architecture RE TB
Write Bandwidth L× S S
Read Bandwidth L× S TRAIR×K
Total 2× L× S S + TRAIR×K
Wiring Complexity High Low
Delay L 3L
Table 3.4: ATBD of architectures
Architecture ATBD
RE L
TBS L
DTR=0.5 1.25L
DTR=1 1.5L
TF 1.5L
SB 2.5L
Tables 3.7 and 3.8 show the major hardware components of the architectures.
(The length of a word is S bits.) We will use them to perform hardware area estimation
in the next chapter.
Table 3.5: Speed and bandwidth related comparisons of architectures
Architecture TRAIR Delay Cycles Norm Delay Cycles
RE 1:1 L L
TBS L:1 1 1
1-Ptr (DTR=0.5) 3:1 1.5L 1.2L
M-Ptr (DTR=0.5) 2:1 1.5L 1.2L
1-Ptr (DTR=1) 2:1 3L 2L
TF 1:1 3L 2L
SB 1:1 4.5L 1.8L
Table 3.6: Memory size comparison of architectures
Architecture ATBD Memory Size Normalized Memory Size
TBS L L SL
1-Ptr (DTR=0.5) 1.25L 2L 1.6L
M-Ptr (DTR=0.5) 1.25L 3L 2.4L
1-Ptr (DTR=1) 1.5L 3L 2L
TF 1.5L 2L 1.33L
SB 2.5L 6L 2.4L
Table 3.7: Hardware requirement comparison
Architecture Hardware Requirement
RE (LS) reg and mux, complex routing
TBS
1-Ptr BM Unit TB Unit
M-Ptr S ACS Units 2 LIFO Units Extra TB Units
TF Traceforward Unit
1-Ptr BS S-way Comparator
SB 2S ACS Units, 1 BM Unit, 2 TB Units
S-way Comparator
Table 3.8: Hardware unit description
Name Description
BM Unit 2 Inv, 4 Add (sb-bit)
ACS Unit 2 Add, 1 Sub, 1 Mux, 1 reg(acs-bit)
TB Unit (K-1) reg, 1 mux
TF Unit (K-1)S reg, 2(K-1)S mux
LIFO Unit L reg
S-Way Cmp (S-1) 2-way Cmp
2-way Cmp Sub (acs-bit), (K-1) Mux, (K-1) reg
Chapter 4
Performance Analysis and Metric
4.1 Simulation Setup
Figure 4.1: Computer simulation setup
The computer simulation setup is shown in Figure 4.1. As shown in the illustration,
the rectangular blocks are functional units and the rounded-corner blocks are
adjustable parameters. All the parameters of this simulation environment can be
modified to examine their effects on the coding gain performance of the Viterbi decoder.
The solid lines indicate the direction of data flow, and the dashed lines indicate the
simulation parameters. The input bit stream is fed to the convolutional encoder,
and the encoded code word is then punctured according to the puncturing matrix.
Both the constraint length and the generator polynomials of the convolutional codes
are fully configurable. Here we can apply the generator polynomials listed in
Table 2.1. Note that we can achieve variable code rates by puncturing the rate-1/2
and rate-1/3 convolutional codes. The modulator sends the I and Q channel analog
output to the channel. The channel model consists of an additive white Gaussian
noise source that adds noise to the I and Q channels according to the SNR setting. An
ideal QPSK receiver demodulates the I and Q channel output to produce the 4-bit
quantized soft decisions for the Viterbi decoder model. A bit error rate counter then
compares the output of the Viterbi decoder to a delayed version of the input bit
stream to get the bit error rate. The simulation stops when the error count exceeds
10000 or the maximum number of iterations is reached. Finally, the "BER Meter"
calculates statistics such as the BER and the coding gain relative to plain QPSK.
The efficiency of a code is measured by the received energy per bit to noise ratio
(γb = Eb/N0) required to achieve a specific system bit error rate. The power of the
Gaussian channel noise must be adjusted in the simulation for a fair comparison
between different punctured codes. Assume that the modulated signal has
symbol energy Es, and each symbol carries B information bits. If the punctured
code rate is R, the Eb/N0 of the received signal is given by:

γb = Eb/N0 = (Es/N0) · (1/(BR))
For QPSK modulation with signal amplitude A, B = 2 bits/symbol, so Eb/N0 becomes:

γb = Eb/N0 = (A²/N0) · (1/R)

where γb is usually given in dB. The noise power for punctured code rate R is then
given by:

N0 = (A²/R) · 10^(−γb/10)    (4.1)
The simulations shown in the later sections use Equation (4.1) to compute the white
Gaussian noise.
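A quick numerical reading of Equation (4.1), with illustrative values only; the helper name is an assumption of this sketch.

```python
def noise_power(A, R, gamma_b_dB):
    """Equation (4.1): N0 = (A^2 / R) * 10^(-gamma_b / 10)."""
    return (A ** 2 / R) * 10 ** (-gamma_b_dB / 10)

for R in (1 / 2, 3 / 4, 7 / 8):
    print(R, noise_power(A=1.0, R=R, gamma_b_dB=5.0))
# For a fixed Eb/N0 target, a higher code rate R carries less energy per
# information bit, so the simulator must inject less noise power N0.
```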
Figure 4.2: Eb/No vs. BER for different coding systems
In order to analyze the coding gain performance of convolutional codes, we have
to know the performance of plain QPSK. The average bit error probability
of QPSK is as follows [39, 8]:

BER = (1/2) erfc(√(Eb/N0))    (4.2)

where erfc(·) is the complementary error function:

erfc(x) = (2/√π) ∫_x^∞ e^(−t²) dt    (4.3)
(Note that erfc(·) is a strictly decreasing function, which means the inverse function
of erfc(·) exists.) The first curve of Figure 4.2 depicts the BER curve of plain
QPSK. The second curve is for the rate-1/2 DVB convolutional code and a Viterbi decoder
with L=48.
To calculate the coding gain, we fix a specific BER and draw a horizontal
line in Figure 4.2. There will be two intersection points, and each of them represents
the required SNR for that curve. The coding gain is the difference between the two
points. For example, the coding gain is 4.7 dB at a BER of 10^−4.
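Equation (4.2) is directly available through the standard library's `erfc`; a minimal check (variable names assumed):

```python
import math

def qpsk_ber(eb_n0):
    """Equation (4.2): average QPSK bit error probability for a linear Eb/N0."""
    return 0.5 * math.erfc(math.sqrt(eb_n0))

for snr_db in (0, 4, 8):
    print(snr_db, qpsk_ber(10 ** (snr_db / 10)))
# The steep fall of this curve with SNR is the first (QPSK) curve of Figure 4.2.
```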
In practice, however, we usually do not have a figure like Figure 4.2. Suppose
we perform computer simulations with a certain system configuration in which the
SNR is γ. Upon completion of the simulation, we have the corresponding BER,
β. Applying β to Equation (4.2), we have

β = (1/2) erfc(√γqpsk)

Applying the inverse error function to both sides,

erfcinv(2β) = √γqpsk
⇒ (erfcinv(2β))² = γqpsk

where erfcinv(·) is the inverse function of erfc(·). Finally, the corresponding coding
gain equals

CodingGain = γqpsk − γ    (4.4)

We will use Equation (4.4) to calculate the coding gain later.
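The procedure above can be sketched end to end. The standard library has no inverse erfc, so this sketch inverts `math.erfc` by bisection (an implementation choice of the sketch, not of the thesis; any root finder would do), and it assumes γ and the result are both expressed in dB.

```python
import math

def erfcinv(y):
    """Solve erfc(x) = y for 0 < y < 1 by bisection (erfc is strictly decreasing)."""
    lo, hi = 0.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if math.erfc(mid) > y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def coding_gain_db(beta, gamma_db):
    """Equation (4.4): required QPSK Eb/N0 (in dB) minus the simulated SNR (in dB)."""
    gamma_qpsk = erfcinv(2 * beta) ** 2
    return 10 * math.log10(gamma_qpsk) - gamma_db

# A simulated BER of 1e-4 reached at an SNR of 3.7 dB:
print(round(coding_gain_db(1e-4, 3.7), 2))  # about 4.7 dB, matching the example above
```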
4.2 Coding Gain Analysis of Viterbi Algorithm
4.2.1 Simple Traceback Architecture
The performance of the Viterbi decoder with the simple TB architecture is analyzed
in this section. Computer simulations were performed to investigate the
performance of the aforementioned convolutional codes as well as their punctured codes. To
keep the analysis simple, we first adopt the DVB convolutional code and the
simple TB architecture as our simulation configuration. The relation between TBD
and BER is shown in Figure 4.3, where every curve represents a specific system
configuration. System designers can get an overview of the BER performance of
the DVB convolutional code from this figure.
To have a clear view of the benefits of using convolutional codes and the Viterbi
algorithm, we usually convert the BER to coding gain relative to plain QPSK
by using Equation (4.4). (Note that coding gain is measured in dB.) The
resulting figures are shown in Figure 4.4, where the x axis again denotes TBD but the y
axis denotes coding gain (dB).
There are several interesting phenomena in Figure 4.4. First, every curve in
Figure 4.4 can be divided into two regions: the linear growth region and the saturation
region. Between the two regions there is a saturation level of TBD (STBD) beyond
which the coding gain stops increasing significantly (we will define "significantly"
precisely later). The STBD mainly depends on the puncture rate; the SNR does not
affect it very much. A higher rate punctured code needs a longer TBD to make all
paths converge with high probability. (We can see that the BER curves of higher
rate punctured codes have smaller curvature.) But we cannot arbitrarily increase
the TBD, because a longer TBD implies higher hardware cost. We have to consider the
tradeoff between coding gain performance and hardware cost. The STBD can be seen
as the optimal TBD because it strikes a balance between performance
and cost.
Second, the SNR affects the slope of the curves and the maximum achievable coding
gain. Under the same puncture rate, a higher SNR yields a larger coding gain. But
the variation of SNR does not affect the determination of the optimal TBD. For this
reason, we will set aside the effects of SNR in the later discussions.
Third, some system configurations (e.g. the curve with SNR=3.5dB and Rate-7/8)
cannot achieve positive coding gain even if the TBD were pushed to infinity.
Therefore, it is better not to use high puncture rate convolutional codes when
the channel condition is bad.
Table 4.1 summarizes the saturation (optimal) TBDs of the simple TB architecture
in a channel with SNR=5.0dB AWGN. The "Max CG" row shows the maximum
achievable coding gain. The "Opt. TBD" row shows the optimal TBDs of the corresponding
configurations. We define the STBD as the least TBD such that the corresponding
coding gain is within 0.3dB of the maximum achievable coding gain.

Definition 2 Saturation TBD (STBD) is the least TBD such that the coding gain
is within 0.3dB of the maximum achievable coding gain.

When the hardware implementation of Viterbi decoders is taken into consideration,
we can adopt the STBDs listed in Table 4.1 as our TBD to achieve a balance
between coding gain performance and hardware cost. We will try to find the relationship
between the simple TB and other SMM architectures in the later sections. The simulation
results shown in this section will serve as our baseline in the later discussions.
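Definition 2 translates directly into a small search over a simulated curve (hypothetical helper and toy data, not the thesis tooling):

```python
def saturation_tbd(curve, margin=0.3):
    """Least TBD whose coding gain is within `margin` dB of the maximum.

    curve: list of (tbd, coding_gain_dB) pairs, sorted by increasing tbd.
    """
    max_gain = max(gain for _, gain in curve)
    for tbd, gain in curve:
        if gain >= max_gain - margin:
            return tbd

# Toy curve with a linear-growth region followed by a saturation region:
curve = [(20, 1.0), (40, 4.8), (60, 5.4), (80, 5.55), (100, 5.6)]
print(saturation_tbd(curve))  # 60, since 5.4 >= 5.6 - 0.3
```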
The same coding gain analysis is also applied to the DCII and UMTS convolutional
codes, and the results are summarized in Tables 4.2 and 4.3. Although the generator
polynomials of DCII and DVB are different, their results are very similar due to
their identical constraint length. On the other hand, the results of UMTS differ
from those of DVB because their constraint lengths are different. The maximum
coding gains and optimal TBDs of UMTS are larger than those of DVB.
Table 4.1: Coding gain analysis of DVB with simple TB architecture
Rate 1/2 2/3 3/4 4/5 5/6 7/8
Opt. TBD 44 68 80 104 120 160
Max CG 5.6 5.2 4.3 3.9 3.4 2.6
Table 4.2: Coding gain analysis of DCII with simple TB architecture
Rate 5/11 1/2 3/5 2/3 3/4 4/5 5/6 7/8
Opt. TBD 44 44 56 64 88 104 128 160
Max CG 5.7 5.7 5.1 4.9 4.7 3.8 3.5 2.5
Table 4.3: Coding gain analysis of UMTS with simple TB architecture
Rate 1/2 2/3 3/4 4/5 5/6 7/8
Opt. TBD 64 88 112 136 160 208
Max CG 6.7 5.8 5.1 4.7 4.2 3.4
4.2.2 Register Exchange Architecture
We apply the same analysis to the RE architecture in this section. In our simulation
results, the BER performance of the RE architecture is very similar to that of the simple
TB architecture. With the same system configuration, the STBD and the maximum
coding gain of the two architectures are almost the same, but the simple TB architecture
[Six panels, (a) Rate-1/2 through (f) Rate-7/8: BER versus traceback depth, with one curve per SNR setting from 3.5 to 6.0 dB]
Figure 4.3: BER of DVB with simple traceback architecture
[Six panels, (a) Rate-1/2 through (f) Rate-7/8: coding gain (dB) versus traceback depth, with one curve per SNR setting from 3.5 to 6.0 dB]
Figure 4.4: Coding gain of DVB with simple traceback architecture
outperforms the RE architecture by about 0.2 dB when the coding gain is in the linear
growth region. Aside from this minor difference, the observations and conclusions we
made in the last section also apply.
Modified Register Exchange Architecture
The coding gain performance of the modified RE architecture presented in Section
3.3 was also examined, but the performance is not good enough. The BER of this
architecture is at least 0.1 for every system configuration. Thus we do not list the
resulting figures. For the reasons behind this poor performance, refer to Section 3.3.
4.2.3 Summary
Although the TB and RE architectures implement the Viterbi algorithm in different
manners, their TBD-to-BER curves are almost the same. We believe this is
because both SMM architectures possess the same ATBD.
4.3 Equalization of SMM Architectures
In this section, we find the optimal TBD of different SMM architectures
by equalizing them according to ATBD. We can then compare the hardware cost
of different SMM architectures under a fixed BER or coding gain performance.
First, we divide the mentioned SMM architectures into categories, where each category
represents a specific value of ATBD. Table 4.4 shows the member architectures
of the four categories.
Before equalizing the different SMM architectures, consider the comparison of the
original coding gain performance in Figure 4.5. The maximum achievable coding
gains of all categories are almost the same, but they require different levels of TBD
to achieve the maximum coding gain. The category with the larger ATBD achieves
maximum coding gain with a smaller TBD.
Table 4.4: Category of SMM architectures
Category ATBD SMM Architecture
1 L Simple Traceback/ Register Exchange
2 1.25L 1-Pointer Traceback (DTR=0.5)
3 1.5L Traceforward/1-Pointer Traceback (DTR=1)
4 2.5L Sliding Block
Figure 4.6 shows the equalized version of Figure 4.5. The STBDs of all curves are very
close in every subfigure, and the error is at most 10%. We can therefore take ATBD
as a unifying metric across different SMM architectures. The advantage of ATBD is
that anyone can calculate it with pencil and paper in a few seconds. System
architects can use it to quickly evaluate the coding gain performance and hardware cost
of the corresponding architectures.
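As a concrete illustration of how simple the calculation is, the multipliers of Table 3.4 can be tabulated directly (abbreviations as in the tables; the dictionary and function names are assumptions of this sketch):

```python
ATBD_FACTOR = {
    "RE": 1.0, "TBS": 1.0,        # category 1
    "1P DTR=0.5": 1.25,           # category 2
    "TF": 1.5, "1P DTR=1": 1.5,   # category 3
    "SB": 2.5,                    # category 4
}

def atbd(arch, L):
    """Average traceback depth of an SMM architecture with traceback depth L."""
    return ATBD_FACTOR[arch] * L

# Equalization example: a sliding-block decoder with L = 24 has the same
# ATBD (and hence roughly the same coding gain) as simple traceback with L = 60.
print(atbd("SB", 24), atbd("TBS", 60))  # 60.0 60.0
```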
4.3.1 DCII & UMTS
We change the convolutional code to DCII and UMTS to further check the validity
of the ATBD metric. The simulation results of DCII are very similar to those of DVB,
so we do not list the figures again.
It is worth listing the results of UMTS because its constraint length differs
from that of DVB and DCII. Figure 4.7 shows the coding gain comparison of UMTS,
and the equalized version is shown in Figure 4.8. The performance of the ATBD metric is
also very satisfactory here. The conclusions we made in the last section also apply.
4.3.2 Best State Architecture
In this section, we discuss the equalization of the best state architecture. The
third and fourth curves in every subfigure of Figure 4.9 show the coding gain perfor-
[Six panels, (a) Rate-1/2 through (f) Rate-7/8 at SNR = 5 dB: coding gain versus traceback depth for the TBS/RE, 1P DTR=0.5, TF/1P DTR=1.0 and SB architectures]
Figure 4.5: Coding gain comparison of different SMM architectures (DVB)
[Six panels, (a) Rate-1/2 through (f) Rate-7/8 at SNR = 5 dB: coding gain versus ATBD for the TBS/RE, 1P DTR=0.5, TF/1P DTR=1.0 and SB architectures]
Figure 4.6: Coding gain comparison equalized by ATBD (DVB)
[Six panels, (a) Rate-1/2 through (f) Rate-7/8 at SNR = 5 dB: coding gain versus traceback depth for the TBS/RE, 1P DTR=0.5, TF/1P DTR=1.0 and SB architectures]
Figure 4.7: Coding gain comparison of different SMM architectures (UMTS)
[Six panels, (a) Rate-1/2 through (f) Rate-7/8 at SNR = 5 dB: coding gain versus ATBD for the TBS/RE, 1P DTR=0.5, TF/1P DTR=1.0 and SB architectures]
Figure 4.8: Coding gain comparison equalized by ATBD (UMTS)
mance of the simple and one-pointer best state architectures. The normalized version of
Figure 4.9 is shown in Figure 4.10. We normalize the architectures with best state
traceback with an extra factor of 1.5 on the ATBD. The error of the estimation is also below
10%. This means we can equalize SMM architectures with best state traceback
as well.
[Six panels, (a) Rate-1/2 through (f) Rate-7/8 at SNR = 5 dB: coding gain versus traceback depth for the TBS, 1P, TBS BS and 1P BS architectures]
Figure 4.9: Coding gain performance of best state architecture
[Six panels, (a) Rate-1/2 through (f) Rate-7/8 at SNR = 5 dB: coding gain versus ATBD for the TBS, 1P, TBS BS and 1P BS architectures]
Figure 4.10: Coding gain performance of best state architecture equalized by ATBD
4.4 Hardware Equalization
In this section, we equalize the hardware area of different architectures using
the optimal TBDs listed in Tables 4.1, 4.2 and 4.3. The major hardware components
of every SMM architecture were summarized in Section 3.10. We use the library
databook of the TSMC 0.25 µm, 2.5-volt SAGE standard cell process provided by
Artisan [40] to estimate the area of each architecture. We consider only the TB-based
architectures, because the area of an RE-based architecture depends heavily on its
complex routing wires. The equalized hardware areas under different constraint
lengths and puncture rates are shown in Table 4.5. Each cell lists the normalized
area index followed by the actual area in parentheses. The estimates are not exact,
because they omit routing areas and some pipeline registers; the point, however, is
to make the comparison between architectures fair. The TBDs must be equalized
to the same coding gain performance first. After this equalization, the estimates of
hardware requirements, especially memory requirements, become more reliable.
Table 4.5: Equalized area of architectures; each cell gives the normalized area index, with the actual area (10^-3 mm^2) in parentheses.

Architecture      K=5, 1/2    DVB, 1/2    DVB, 7/8    UMTS, 1/2    UMTS, 7/8
Opt. ATBD         32          44          160         64           208
TBS               1.00 (69)   1.00 (263)  1.00 (338)  1.00 (1068)  1.00 (1360)
1P (DTR=1)        1.05 (72)   1.08 (283)  1.22 (411)  1.11 (1186)  1.28 (1743)
1P (DTR=0.5)      1.03 (71)   1.05 (276)  1.13 (382)  1.07 (1139)  1.17 (1590)
MP (DTR=0.5)      1.08 (74)   1.11 (292)  1.31 (442)  1.16 (1234)  1.40 (1898)
1P BS (DTR=1)     1.40 (96)   1.52 (401)  1.53 (517)  1.61 (1724)  1.63 (2222)
TF                1.19 (82)   1.29 (340)  1.28 (433)  1.39 (1485)  1.37 (1866)
SB                1.75 (121)  1.99 (523)  2.01 (679)  2.14 (2286)  2.20 (2292)
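The normalized index in each cell is simply the architecture's estimated area divided by the TBS baseline of the same column. As a sanity check, the following sketch reproduces the DVB rate-1/2 column of Table 4.5 (areas in 10^-3 mm^2, taken from the table):

```python
# Estimated cell areas for the DVB rate-1/2 column of Table 4.5,
# in units of 10^-3 mm^2; TBS is the normalization baseline.
areas = {
    "TBS": 263,
    "1P (DTR=1)": 283,
    "1P (DTR=0.5)": 276,
    "MP (DTR=0.5)": 292,
    "1P BS (DTR=1)": 401,
    "TF": 340,
    "SB": 523,
}

baseline = areas["TBS"]
# Normalized area index: area relative to the simple traceback baseline
index = {arch: round(a / baseline, 2) for arch, a in areas.items()}
```

Rounding to two decimals reproduces the indices printed in the table (e.g. TF gives 1.29 and SB gives 1.99).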
4.5 Coding Gain Estimation

In this section we present a method to estimate the BER performance of different
SMM architectures. The BER can be predicted by integrating over that of the
simple TB architecture. For example, the one-pointer TB architecture decodes every
bit with between L and (1 + DTR)L TB steps. Recall the BER simulations of the
simple TB architecture shown in Figure 4.3. Let B_Simple(L, P) denote the BER
function in that figure, and let B̂_1P(L, DTR, P) denote the estimated BER of the
one-pointer TB architecture, where P stands for the remaining parameters. Equation
(4.5) gives the estimate B̂_1P(·).
B̂_1P(L, DTR, P) = (1 / (DTR × L)) ∫_L^{(1+DTR)L} B_Simple(l, P) dl        (4.5)
Because this method integrates the BER of the simple TB architecture, we call
it INTBER in later discussions. The error of the INTBER method is at most
0.1 dB, which means we can predict the BER performance of the one-pointer TB
architecture with the INTBER metric in this case. The prediction for other
architectures is similar: once we know the number of TB steps for every decoded
bit, the BER performance can be estimated precisely.
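Equation (4.5) is just the mean value of B_Simple over the interval of traceback depths used per bit, so it can be evaluated numerically. A minimal sketch, assuming `b_simple` is an interpolation of the simulated curve from Figure 4.3 (the exponential curve below is purely illustrative, not the thesis data):

```python
import numpy as np

def intber_estimate(b_simple, L, dtr, n=2001):
    """Estimate the one-pointer TB architecture's BER (Eq. 4.5) by
    averaging the simple-TB BER curve over the traceback depths
    l in [L, (1 + DTR) * L] actually used to decode each bit."""
    l = np.linspace(L, (1.0 + dtr) * L, n)
    # The mean of B_Simple over the interval equals the integral
    # divided by its length DTR * L, which is exactly Eq. (4.5).
    return float(np.mean(b_simple(l)))

# Hypothetical BER curve: decays with traceback depth, then saturates
b_simple = lambda l: 1e-2 * np.exp(-l / 20.0) + 1e-5

ber_1p = intber_estimate(b_simple, L=40, dtr=0.5)
```

Since INTBER is an average, the estimate always lies between the simple-TB BER at depth L and at depth (1 + DTR)L.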
Chapter 5
Conclusion & Future Work
5.1 Conclusion
First, we analyze the performance of many SMM architectures. We show the
memory bandwidth requirement, memory size requirement, coding delay and hard-
ware components of every SMM architecture. These indexes can be used to tell
whether a new architecture is high-speed or low-cost.
Second, we construct a software simulation environment to evaluate the per-
formance of different convolutional code systems. The system accepts arbitrary
constraint lengths, generator polynomials, puncture rates and noise levels. New SMM
architectures can also be inserted into the system with minor modifications. The
BER and coding gain performance of the Viterbi algorithm are examined under many
kinds of conditions. The simulations show that the BER (coding gain) stops
decreasing (increasing) significantly once the TBD reaches an optimal level. This
optimal TBD depends on the convolutional code, puncture rate and SMM ar-
chitecture. A larger constraint length or higher puncture rate results in a longer
optimal TBD. We take the optimal TBD as the balance point between hardware
cost and coding gain performance.
Third, we propose the ATBD metric to predict the optimal TBD of different
SMM architectures; that is, ATBD lets us equalize the differences between SMM
architectures. The error of ATBD is at most 10%. Most importantly, given a new
SMM architecture, its ATBD is very simple to determine. System architects can
use ATBD to quickly estimate the hardware cost or memory requirement of their
systems. The effect of best-state traceback is also examined: simulations show that
tracing back from the best state cuts the optimal TBD by 1/3 to 1/2.
Fourth, we decompose every SMM architecture and list the major hardware
components used in it. The TSMC SAGE standard cell library databook is used to
estimate the corresponding hardware areas. The error of this high-level estimation is
acceptable because the omitted parts of every SMM architecture are similar. Accord-
ing to the analyses in Chapters 3 and 4, the traceforward architecture is the better
choice: its TRAIR is 1:1, and it provides a longer equivalent TBD for the same
amount of memory. An extra TF unit is used to find the starting state for the
decoding operation; the routing complexity of the TF unit is exactly the same as
that of the ACS unit. In addition, the RE and SB architectures are suitable for
high-speed implementations when the constraint length is small (3 or 5).
5.2 Future Work
In this thesis we use only the AWGN channel model and QPSK modulation.
The validity of the ATBD metric could be further verified with other channel models
and modulation schemes, such as fading channels and trellis-coded modulation.
The precision of ATBD could also be improved by inspecting the Viterbi algorithm
more closely. The formulation of the Viterbi algorithm is based on probability and
maximum likelihood, so instead of treating every traceback step equally, we might
assign a different weight to each decoded bit on the survivor path. As mentioned
before, however, the new metric must not be too complex to calculate, or the idea
of fast evaluation is lost.
Appendix A
Acronyms & Abbreviations
ATBD Average Traceback Depth
AWGN Additive White Gaussian Noise
BER Bit Error Rate
BM Branch Metric
BS Best State
COFDM Coded Orthogonal Frequency Division Multiplexing
DCII DigiCipher II
DSS Digital Satellite System
DTR Decode to Traceback Ratio
DVB Digital Video Broadcasting
FEC Forward Error Correction
FSM Finite State Machine
LOS Line-of-Sight
ML Maximum Likelihood
MAP Maximum a Posteriori
NLN Nakagami-lognormal
PM Path Metric
QPSK Quadrature Phase-Shift Keying
RE Register Exchange
SB Sliding Block
SBVD Sliding Block Viterbi Decoder
SMM Survivor Memory Management
STBD Saturation TBD
TB Traceback
TBD Traceback Depth
TBS Simple Traceback Architecture
TRAIR Traceback Read to ACS Iteration Ratio
TF Traceforward
UMTS Universal Mobile Telecommunications System
VA Viterbi Algorithm
WPAN Wireless Personal Area Network
Appendix B
Glossary of Notation
K constraint length
L traceback depth
Rs average symbol rate
rate-a/b b output bits per a input bits
G generating polynomial of convolutional codes
x source symbol
y encoded symbol
r estimate of the encoded symbol
y′ estimate of the source symbol
t time index
S state index
Γ_t^S path metric of state S at time t
λ_t^{i,j} branch metric from state i to state j at time t
C Convolutional Code
m length of the longest shift register in convolutional encoder
d decision vector
k decode to traceback ratio
Eb bit energy
Es symbol energy
Bibliography
[1] IEEE, IEEE 802.3ae Standard. IEEE, 2002. [Online]. Available:
http://standards.ieee.org/getieee802/
[2] UMTS Specification Standard. [Online]. Available: http://www.3gpp.org
[3] A. J. Viterbi, “Error bounds for convolutional codes and an asymptotically
optimum decoding algorithm,” IEEE Transactions on Information Theory, pp.
260–269, April 1967.
[4] G. D. Forney, Jr., “The Viterbi algorithm,” Proceedings of the IEEE, vol. 61,
no. 3, pp. 268–278, March 1973.
[5] ——, “Convolutional codes II: Maximum-likelihood decoding,” Information
and Control, vol. 25, pp. 222–266, July 1974.
[6] S. B. Wicker, Error Control Systems for Digital Communication and Storage,
1st ed. Prentice Hall, 1995.
[7] J. G. Proakis, Digital Communications, 4th ed. McGraw-Hill, 2001.
[8] P. Sweeney, Error Control Coding: From Theory to Practice, 1st ed. Wiley, 2002.
[9] M. Boo, F. Arguello, J. D. Bruguera, R. Doallo, and E. Zapata, “High-
performance VLSI architecture for the Viterbi algorithm,” IEEE Transactions
on Communications, vol. 45, pp. 168–176, February 1997.
[10] K. Page and P. Chau, “Improved architectures for the add-compare-select oper-
ation in long constraint length Viterbi decoding,” IEEE Journal of Solid-State
Circuits, vol. 33, pp. 151–155, January 1998.
[11] I. Lee and J. Sonntag, “A new architecture for fast Viterbi algorithm,” IEEE
Transactions on Communications, pp. 1624–1628, October 2003.
[12] R. J. McEliece and I. M. Onyszchuk, “Truncation effects in Viterbi decoding,”
IEEE Conference on Military Communications, pp. 541–545, October 1989.
[13] G. Feygin and P. Gulak, “Architectural tradeoffs for survivor sequence mem-
ory management in Viterbi decoders,” IEEE Transactions on Communications,
vol. 41, no. 3, pp. 425–429, March 1993.
[14] P. J. Black and T. H.-Y. Meng, “Hybrid survivor path architectures for Viterbi
decoders,” IEEE, pp. I433–I436, 1993.
[15] G. Fettweis, “Algebraic survivor memory management design for Viterbi detec-
tors,” IEEE Transactions on Communications, vol. 43, no. 9, pp. 2458–2463,
September 1995.
[16] E. Boutillon and N. Demassieux, “High speed low power architecture for mem-
ory management in a Viterbi decoder,” IEEE, pp. 284–287, 1996.
[17] D. A. El-Dib and M. I. Elmasry, “Modified register-exchange Viterbi decoder
for low-power wireless communications,” IEEE Transactions on Circuits and
Systems I, pp. 371–378, February 2004.
[18] P. J. Black and T. H.-Y. Meng, “A 140-Mb/s, 32-state, radix-4 Viterbi decoder,”
IEEE Journal of Solid-State Circuits, vol. 27, pp. 1877–1885, December 1992.
[19] ——, “A 1-Gb/s, four-state, sliding block Viterbi decoder,” IEEE Journal of
Solid-State Circuits, vol. 32, pp. 797–805, June 1997.
[20] Y.-N. Chan, H. Suzuki, and K. K. Parhi, “A 2-Mb/s 256-state 10-mW rate-1/3
Viterbi decoder,” IEEE Journal of Solid-State Circuits, vol. 35, pp. 826–834,
June 2000.
[21] T. Gemmeke, M. Gansen, and T. G. Noll, “Implementation of scalable power
and area efficient high-throughput Viterbi decoders,” IEEE Journal of Solid-
State Circuits, vol. 37, pp. 941–948, July 2002.
[22] X. Liu and M. C. Papaefthymiou, “Design of a 20-Mb/s 256-state Viterbi de-
coder,” IEEE Transactions on VLSI Systems, vol. 11, pp. 965–975, December
2003.
[23] E. Yeo, S. A. Augsburger, W. T. Davis, and B. Nikolic, “A 500-Mb/s soft-
output Viterbi decoder,” IEEE Journal of Solid-State Circuits, pp. 1234–1241,
July 2003.
[24] I. Onyszchuk, K.-M. Cheung, and O. Collins, “Quantization loss in convolu-
tional decoding,” IEEE Transactions on Communications, vol. 41, no. 2, pp.
261–265, February 1993.
[25] D. A. Luthi et al., “A single-chip concatenated FEC decoder,” in IEEE 1995
Custom Integrated Circuits Conference, 1995, pp. 285–288.
[26] T. Kamada et al., “An area effective standard cell based channel decoder
LSI for digital satellite TV broadcasting,” in VLSI Signal Processing IX, 1996,
pp. 337–346.
[27] M. Hass and F. Kuttner, “Advanced two IC chipset for DVB on satellite recep-
tion,” IEEE Transactions on Consumer Electronics, pp. 341–345, August 1996.
[28] W. P. E. Lutz and E. Plochinger, “Land mobile satellite communications:
channel model, modulation and error control,” in 7th International Conference
on Digital Satellite Communications, May 1986.
[29] T. T. Tjhung and C. C. Chai, “Fade statistics in Nakagami-lognormal channels,”
IEEE Transactions on Communications, pp. 1769–1772, December 1999.
[30] C. Loo and N. Secord, “Computer models for fading channels with applications
to digital transmission,” IEEE Transactions on Vehicular Technology, pp. 700–
707, November 1991.
[31] A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed.
McGraw-Hill, 1991.
[32] C. Tellambura and A. D. S. Jayalath, “Generation of bivariate Rayleigh and
Nakagami-m fading envelopes,” IEEE Communications Letters, pp. 170–172,
May 2000.
[33] J. Hagenauer and E. Lutz, “Forward error correction coding for fading com-
pensation in mobile satellite channels,” IEEE Journal on Selected Areas in
Communications, vol. SAC-5, no. 2, February 1987.
[34] U. Mengali, R. Pellizzoni, and A. Spalvieri, “Soft-decision-based node synchro-
nization for Viterbi decoders,” IEEE Transactions on Communications, vol. 43,
pp. 2532–2539, September 1995.
[35] D. J. Sodha, “Code synchronization for convolutional codes,” in Proceedings of
the Canadian Conference on Electrical and Computer Engineering, vol. 1, Septem-
ber 1994, pp. 344–347.
[36] O. J. Joeressen and H. Meyr, “Node synchronization for punctured convolu-
tional codes of rate (n−1)/n,” in Proceedings of 1994 IEEE GLOBECOM, vol. 3,
1994, pp. 1279–1283.
[37] Q. Pan and M. P. C. Fossorier, “Code invariances and self-synchronized Viterbi
decoding,” IEEE Transactions on Communications, vol. 51, pp. 1082–1092,
July 2003.
[38] G. Lorden, R. J. McEliece, and L. Swanson, “Node synchronization for the
Viterbi decoder,” IEEE Transactions on Communications, pp. 524–531, May
1984.
[39] S. Haykin, Communication Systems, 4th ed. Wiley, 2001.
[40] Artisan, TSMC .25um Process 2.5-Volt SAGE Standard Cell Library Databook,
2000.