
An Efficient Implementation of Adaptive Prefix Coding

Yakov Nekrich
Dept. of Computer Science, University of Bonn
yasha@cs.uni-bonn.de

The goal of prefix coding is to assign codewords to the elements of the input alphabet A so that no codeword is a prefix of another and the total length of the encoded message S is minimized. In static prefix coding, the symbol frequencies are known in advance. In adaptive (or dynamic) prefix coding, every symbol s_i is encoded before the next symbol s_{i+1} is read. That is, when we encode a symbol s_i, only the frequencies of the symbols in the already encoded prefix s_1 s_2 ... s_{i-1} of the message S are known.
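
As a small illustration of the prefix-free property (the example and code below are ours, not taken from the paper), consider the code a -> 0, b -> 10, c -> 110, d -> 111 over the alphabet {a, b, c, d}. Since no codeword is a prefix of another, at most one codeword can match at any position of the bit stream, so a single left-to-right pass decodes the message unambiguously:

    /* Toy decoder for the prefix-free code a->0, b->10, c->110, d->111.
     * Illustrative only: the codewords and the bit string are made up. */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        const char *codeword[4] = { "0", "10", "110", "111" };  /* a, b, c, d */
        const char *bits = "110010111";                         /* "cabd" encoded */
        size_t pos = 0, len = strlen(bits);

        while (pos < len) {
            int matched = 0;
            for (int s = 0; s < 4; s++) {
                size_t l = strlen(codeword[s]);
                /* prefix-freeness guarantees at most one codeword matches here */
                if (pos + l <= len && strncmp(bits + pos, codeword[s], l) == 0) {
                    putchar('a' + s);                           /* decoded symbol */
                    pos += l;
                    matched = 1;
                    break;
                }
            }
            if (!matched) break;                                /* invalid input */
        }
        putchar('\n');                                          /* prints: cabd */
        return 0;
    }

An adaptive coder uses the same kind of decoding loop, except that the code is recomputed from the growing frequency counts, so encoder and decoder must update their counts in lockstep.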

While static prefix encoding can be implemented in O(1) time per symbol, most adaptive algorithms take O(M) time, where M is the number of bits in the encoded message. Recently, [1] described an algorithm that encodes a message of m symbols in O(m) time and has a good worst-case upper bound on the length of the encoding. However, the algorithm of [1] uses time-consuming multiplications and divisions; therefore it is not well suited to practical implementation. In this paper we present an algorithm that allows us to maintain a code and encode a message in constant amortized time per symbol. Our algorithm uses only comparisons, additions, and bit shifts; hence, it can be implemented efficiently. A complete description of our algorithm will be given in the full version of this paper.
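
Since the complete algorithm is deferred to the full version, the sketch below is only our illustration of the kind of operations involved, not the paper's method: it computes Shannon codeword lengths l(a) = ceil(log2(n / f(a))) using shifts and comparisons alone and assigns a canonical prefix code from those lengths (the names shannon_length and build_code are hypothetical):

    /* Illustrative sketch only (not the algorithm of this paper): Shannon
     * codeword lengths via shifts and comparisons, plus canonical code
     * assignment. Degenerate cases (e.g. a one-symbol alphabet) are ignored. */
    #include <stdio.h>
    #include <stdlib.h>

    #define ALPHA 256                        /* byte alphabet */

    typedef struct { int sym; int len; unsigned long code; } Entry;

    /* Smallest l with (f << l) >= n, i.e. l = ceil(log2(n / f)). */
    static int shannon_length(unsigned long f, unsigned long n) {
        int l = 0;
        while ((f << l) < n) l++;            /* shifts and comparisons only */
        return l;
    }

    static int by_length(const void *a, const void *b) {
        const Entry *x = a, *y = b;
        return (x->len != y->len) ? x->len - y->len : x->sym - y->sym;
    }

    /* Build a canonical prefix code for the n symbols counted in freq[].
     * Shannon lengths satisfy Kraft's inequality, so the sequential
     * canonical assignment below always yields a valid prefix-free code. */
    static int build_code(const unsigned long freq[ALPHA], unsigned long n,
                          Entry table[ALPHA]) {
        int m = 0;
        for (int s = 0; s < ALPHA; s++) {
            if (freq[s] == 0) continue;
            table[m].sym = s;
            table[m].len = shannon_length(freq[s], n);
            m++;
        }
        if (m == 0) return 0;
        qsort(table, m, sizeof(Entry), by_length);
        unsigned long code = 0;
        int prev = table[0].len;
        for (int i = 0; i < m; i++) {        /* code_i = (code_{i-1}+1) << (len_i-len_{i-1}) */
            code <<= (table[i].len - prev);
            table[i].code = code;
            prev = table[i].len;
            code++;
        }
        return m;
    }

    int main(void) {
        const char *msg = "abracadabra";     /* stand-in for an already seen prefix */
        unsigned long freq[ALPHA] = {0}, n = 0;
        Entry table[ALPHA];

        for (const char *p = msg; *p; p++) { freq[(unsigned char)*p]++; n++; }

        int m = build_code(freq, n, table);
        for (int i = 0; i < m; i++)
            printf("%c: length %d, codeword %lu\n",
                   table[i].sym, table[i].len, table[i].code);
        return 0;
    }

Because Shannon lengths satisfy Kraft's inequality, the canonical assignment always succeeds. An adaptive coder along these lines would mirror the same counts in the decoder and, to keep the amortized cost per symbol constant, rebuild the table only periodically rather than after every symbol; the details of our algorithm, in particular the quantization that gives it its name, are left to the full version of the paper.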

Below we give the results of experiments on the files of the Calgary compression corpus and the large files of the Canterbury corpus. Our algorithm, referred to below as quantized Shannon coding, is compared with the algorithm GEO [2] and with adaptive arithmetic coding.

Table 1. Encoding and Decoding Times (in seconds)

File      Size (in bytes)   Encoding                                 Decoding
                            Quantized Shannon  Arithmetic   GEO     Quantized Shannon  Arithmetic   GEO
bib                111261                0.02        0.12   0.02                 0.01        0.13   0.03
book1              768771                0.12        0.79   0.15                 0.12        0.88   0.21
book2              610856                0.08        0.64   0.13                 0.09        0.70   0.15
geo                102400                0.02        0.11   0.03                 0.02        0.12   0.04
news               377109                0.06        0.41   0.09                 0.05        0.45   0.11
obj2               246814                0.05        0.28   0.06                 0.04        0.31   0.09
trans               93695                0.02        0.10   0.03                 0.01        0.11   0.03
E.coli            4638690                0.60        4.05   0.83                 0.71        4.83   1.11
world             2473400                0.42        2.80   0.52                 0.42        3.06   0.69
bible             4047392                0.65        4.37   0.78                 0.66        4.80   1.05

The average compression rate for the above test files is 5.524 bits per symbol for quantized Shannon coding, 4.859 for adaptive arithmetic coding, and 4.948 for GEO. The compression gain of arithmetic coding over quantized Shannon coding varies between 0.51 and 1.08 bits per symbol; the average gain is 0.665 bits per symbol. The compression gain of GEO over quantized Shannon coding is between 0.38 and 1.02 bits per symbol; the average gain is 0.576 bits per symbol. We conclude that quantized Shannon coding achieves faster encoding and decoding than both arithmetic coding and GEO at the cost of somewhat weaker compression. Thus our method offers an interesting trade-off between compression ratio and speed in adaptive coding.

References

[1] M. Karpinski, Y. Nekrich, "A Fast Algorithm for Adaptive Prefix Coding", Proc. IEEE Int. Symp. on Information Theory, 2006.
[2] A. Turpin, A. Moffat, "On-line Adaptive Canonical Prefix Coding with Bounded Compression Loss", IEEE Trans. on Information Theory, 47 (2001), 88-98.
