Hamming-Distance-Based Binary Representation of Numbers

Minghai Qin
Western Digital Research, San Jose, CA, USA
Abstract—Numbers are represented as binary arrays in computer storage. We propose a length-n binary representation of the numbers from 0 to 2^n − 1 based on Hamming distances such that for any i ∈ {0, . . . , 2^n − 1}, if a constant number of bits (out of n) are flipped, the normalized L1 distance from the distorted number i_error to i vanishes as n tends to infinity. More precisely, max_i |i_error − i| / 2^n = O(1/√n). A pair of encoder and decoder with O(n) time complexity is presented to establish the Hamming-distance-based bijection between {0, 1}^n and {0, 1, . . . , 2^n − 1}.
I. INTRODUCTION
Real-valued and integer-valued numbers are stored in computers as binary arrays. In current computer programming languages, integers in different ranges can be represented by 16 bits (short or unsigned short in C++), 32 bits (int or unsigned int in C++), 64 bits (long long or unsigned long long in C++), etc. [1]. Real numbers are usually represented according to the IEEE 754 standard for floating-point arithmetic [1] by 16 bits (half precision), 32 bits (single precision, float in C++), 64 bits (double precision, double in C++), 128 bits (quadruple precision, long double in C++), etc. When the range of real numbers is small and known, fixed-point arithmetic [1] can also be used to represent all real numbers of interest by partitioning the whole range into 2^n intervals, each of which is represented by a length-n binary array.
One common assumption underlying all currently used representations is that all bits are error-free. This assumption is valid when the binary arrays are stored in storage devices (tapes, hard-disk drives, NAND flash solid-state drives, etc.) protected by error-correction codes [2] (e.g., BCH codes or low-density parity-check codes), or in fast computer memory (e.g., DRAM or cache) where the probability of a bit error is negligible. Consider, however, a storage-class memory (e.g., ReRAM or phase-change RAM) that makes a better trade-off between cost and performance (e.g., latency and throughput) and might be used in future computing units such as general-purpose GPUs: it provides fast and unified access to load/write numbers, but its memory cells are prone to errors. Adding error-correction codes for protection can induce around 20% storage overhead and thus result in a corresponding bandwidth reduction and latency increase. Therefore, we are interested in exploring the distortion brought by bit errors when numbers are represented as binary arrays.
In this paper, we study n-bit binary representations of the integers from 0 to 2^n − 1 such that when any one bit of the binary array is in error, the distortion from the resultant integer to the original integer is small. The representation can be used for either signed or unsigned integers, and for the fixed-point arithmetic of real numbers. Note that Gray codes [3], which are widely used to encode constellation points in communication systems, provide a mapping between the integers from 0 to 2^n − 1 and all length-n binary arrays such that adjacent integers have binary representations differing in one bit. However, Gray codes are not a satisfactory solution to the problem this paper addresses, since there are n possible bit-error locations in a length-n binary array and all the resulting distortions are required to be small. We provide a Hamming-distance-based binary representation of the numbers from 0 to 2^n − 1 such that the normalized distortion of a 1-bit error vanishes as n tends to infinity. This is in contrast to the simple unsigned binary expansion, where the normalized distortion is a constant 1/2 if the leftmost bit is in error.
II. PRELIMINARIES
A. IEEE Standard for Floating-Point Arithmetic
The IEEE 754 standard for floating-point arithmetic provides guidelines to represent a real number r as (−1)^s × c × 2^q by 16, 32, 64, 128, or 256 bits, where s is a 1-bit sign, c is a significand, and q is an exponent. For the 16-bit representation, 5 bits are assigned to the exponent with a bias equal to 15, and the remaining 11 bits are used for the 10-bit significand and the 1-bit sign.
In the presence of bit errors, the IEEE standard for floating-point arithmetic is not a proper representation of real numbers. The major weakness is due to the large range the IEEE 754 standard can represent. In particular, if the most significant bit in the exponent is erroneous, the value of the number can be inadvertently set to a very large value. For example, the binary string 0 01101 0101010101 represents (−1)^0 × 2^(13−15) × 1.3330078125 ≈ 0.33, but if the second bit is flipped and the string becomes 0 11101 0101010101, it represents (−1)^0 × 2^(29−15) × 1.3330078125 = 21840. Such a drastic distortion is unlikely to be acceptable for any application.
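The effect of this single exponent-bit flip can be checked numerically. The following sketch (Python; the helper name is ours) decodes the two 16-bit strings from the example above using the standard struct module's half-precision format:

```python
import struct

def half_bits_to_float(bits16):
    # Interpret a 16-bit integer as an IEEE 754 half-precision float.
    return struct.unpack('<e', bits16.to_bytes(2, 'little'))[0]

x = 0b0011010101010101  # 0 01101 0101010101
y = x ^ (1 << 14)       # flip the second bit (the MSB of the exponent)

print(half_bits_to_float(x))  # 0.333251953125
print(half_bits_to_float(y))  # 21840.0
```

A single bit flip moves the represented value from about 0.33 to 21840, five orders of magnitude away.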
B. Fixed-Point Arithmetic
In order to avoid the large distortion incurred by the IEEE 754 standard for floating-point arithmetic, fixed-point arithmetic can be used to directly quantize real numbers between the minimum value and the maximum value when they are available. Let the minimum and maximum values be denoted by v_min and v_max, respectively. To convert a real number into a binary array, the interval [v_min, v_max] can be quantized into 2^n consecutive subintervals of the same size with boundaries

v_min = b_0 < b_1 < · · · < b_{2^n} = v_max,

where n is the number of bits used to represent a real number and the length of every subinterval is ∆ = (v_max − v_min) / 2^n. For every value v ∈ [v_min, v_max], if v lies in the ith interval, i.e., b_i ≤ v < b_{i+1}, then v is represented by the n-bit binary array of the integer i as (b_0, b_1, . . . , b_{n−1}). It can be seen that fixed-point arithmetic is closely related to the n-bit representation of numbers in {0, 1, . . . , 2^n − 1}. If the binary representation is the unsigned integer representation in C++, which will be referred to as the unsigned binary expansion, then the real number can be reconstructed from the binary array as
v_rec = v_min + ((v_max − v_min) / 2^n) × (1/2 + ∑_{j=0}^{n−1} b_j 2^j).
The reconstruction error |v − v_rec| in the absence of bit errors is then upper bounded by the size of an interval, ∆. However, if the leftmost bit is in error, the distance from the resultant integer to the original integer is |i_rec − i| = 2^{n−1}, which is half of the total number of representable integers. The reconstruction error of the fixed-point number is thus (v_max − v_min)/2, i.e., a normalized error of 1/2 of the range, independent of n.
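As a concrete sketch of this fixed-point scheme (the function and variable names are ours, not from the paper), the quantizer and the midpoint reconstruction above can be written as:

```python
def quantize(v, vmin, vmax, n):
    # Index of the subinterval [b_i, b_{i+1}) containing v.
    i = int((v - vmin) / (vmax - vmin) * 2**n)
    return min(i, 2**n - 1)  # clamp v == vmax into the last subinterval

def reconstruct(i, vmin, vmax, n):
    # Midpoint of the i-th subinterval, matching the formula above.
    return vmin + (vmax - vmin) / 2**n * (0.5 + i)

vmin, vmax, n = -1.0, 1.0, 8
i = quantize(0.3, vmin, vmax, n)
delta = (vmax - vmin) / 2**n
assert abs(reconstruct(i, vmin, vmax, n) - 0.3) <= delta

# Flipping the leftmost (most significant) bit of i shifts the
# reconstruction by half the full range, independent of n:
err = abs(reconstruct(i ^ 2**(n - 1), vmin, vmax, n)
          - reconstruct(i, vmin, vmax, n))
assert err == (vmax - vmin) / 2
```

The second assertion illustrates the point above: under the unsigned binary expansion, a single error in the leftmost bit always costs half the range.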
III. HAMMING-DISTANCE-BASED BINARY
REPRESENTATION OF NUMBERS
In this section, we provide formal definitions and analysis for the n-bit binary representation of integers in {0, 1, . . . , 2^n − 1} in the presence of bit errors. We propose a binary representation, called the Hamming-distance-based representation, with the remarkable feature that the normalized error vanishes as n increases, provided that only a constant number of bits in the binary representation are erroneous.
Definition 1. Let n be the number of bits to represent an integer in F = {0, 1, . . . , 2^n − 1}. Then F is representable according to a bijection f : {0, 1}^n ↔ F. We define the distortion d(f, b_1, b_2) under the bijection f between two binary arrays b_1, b_2 ∈ {0, 1}^n as the normalized L1 distance of their corresponding integer values, i.e.,

d(f, b_1, b_2) = |f(b_1) − f(b_2)| / 2^n,  b_1, b_2 ∈ {0, 1}^n. (1)

Let d_H(x_1, x_2) be the Hamming distance [4] between two binary arrays x_1, x_2 ∈ {0, 1}^n. We define the maximum distance-1 distortion under f of a binary array b as the maximum distortion between b and its neighbors at Hamming distance 1 from b, i.e.,

d_max,1(f, b) = max_{x : d_H(b,x) = 1} d(f, b, x). (2)

We then define the distance-1 distortion under the bijection f as

d_max,1(f) = max_{b ∈ {0,1}^n} d_max,1(f, b). (3)

For a constant number k, the distortion measures d_max,k(f, b) and d_max,k(f) are generalized by considering neighbors at Hamming distance less than or equal to k, i.e., d_max,k(f, b) = max_{x : d_H(b,x) ≤ k} d(f, b, x) and d_max,k(f) = max_{b ∈ {0,1}^n} d_max,k(f, b).
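For small n, the distance-1 distortion in Definition 1 can be evaluated by brute force. A sketch (helper names are ours), with a bijection given as a dict from binary tuples to integers:

```python
from itertools import product

def dmax1(f, n):
    # Maximum distance-1 distortion: check every array b and every
    # single-bit flip x of b, following Eqs. (2) and (3).
    worst = 0.0
    for b, val in f.items():
        for i in range(n):
            x = b[:i] + (1 - b[i],) + b[i + 1:]
            worst = max(worst, abs(val - f[x]) / 2**n)
    return worst

# The unsigned binary expansion for n = 2:
n = 2
unsigned = {b: int(''.join(map(str, b)), 2) for b in product((0, 1), repeat=n)}
print(dmax1(unsigned, n))  # 0.5
```

This matches the value computed by hand in Example 1 for the unsigned binary expansion.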
Example 1. Let n = 2 and F = {0, 1, 2, 3}. F can then be used as the basis of the fixed-point representation for any scaled and shifted values of the set {0, 1/3, 2/3, 1}. If the bijection f is the unsigned binary expansion that maps {(00), (01), (10), (11)} ↔ F = {0, 1, 2, 3} in the same order, then we can calculate the distortion of (00) as follows:

d_max,1(f, (00)) = max {d(f, (00), (01)), d(f, (00), (10))}
                 = max {|0 − 1| / 2^n, |0 − 2| / 2^n}
                 = 1/2.

According to Definition 1, d_max,1(f) ≥ d_max,1(f, (00)) = 1/2, and it can be shown that d_max,1(f) = 1/2.

If the bijection f is the length-two Gray code that maps {(00), (01), (11), (10)} ↔ F in the same order, then

d_max,1(f, (00)) = max {d(f, (00), (01)), d(f, (00), (10))}
                 = max {|0 − 1| / 2^n, |0 − 3| / 2^n}
                 = 3/4,

thus d_max,1(f) ≥ 3/4. This means that the Gray code mapping is worse than the unsigned binary expansion under the defined distortion measure. Also note that as n increases, d_max,1(f) remains 1/2 for the unsigned binary expansion, while d_max,1(f) converges to 1 for Gray codes.
Remark 1. A Gray code G : {0, 1}^n ↔ {0, 1, . . . , 2^n − 1} maps the binary arrays to an integer set such that if i and j are adjacent integers, their Gray code representations have Hamming distance 1, which is the smallest possible. The goal of this section is to propose a "reverse Gray"-like mapping, where binary arrays at Hamming distance 1 should have small distance between their integer representations. It can also be understood as follows: for an integer i represented as b ∈ {0, 1}^n, Gray codes guarantee that at least two of the integers represented by the Hamming-distance-1 neighbors of b have distance 1 from i (if i ≠ 0 or 2^n − 1), while nothing can be guaranteed for the rest. In other words, Gray codes would be an optimal bijection f in Definition 1 if the operation "max" in Eq. (2) were changed to "min".
The next lemma gives an upper bound on d_max,k(f) as a function of d_max,1(f). It can be proved from the fact that the set of all binary arrays equipped with the Hamming distance forms a metric space [5].

Lemma 1.

d_max,k(f) ≤ k · d_max,1(f). (4)
We present the Hamming-distance-based bijection as follows.

Construction 1. Let b_1, b_2 ∈ {0, 1}^n, and define b_1 ⪰ b_2 if either of the following two conditions is satisfied: 1) w_H(b_1) > w_H(b_2); 2) w_H(b_1) = w_H(b_2) and b_1 is lexicographically greater than or equal to b_2, where w_H(x) is the Hamming weight of a binary array x and the dictionary order of the binary elements is 0 before 1.

The Hamming-distance-based bijection is defined such that ∀ b_1, b_2 ∈ {0, 1}^n, f(b_1) ≥ f(b_2) ⇔ b_1 ⪰ b_2.
Remark 2. It can be shown that the first condition in Construction 1 is sufficient to guarantee the property in Theorem 1. The second condition enables enumerative encoding and decoding functions for f with linear time complexity, discussed in Section IV.
Example 2. Let n = 3 and F = {0, 1, . . . , 7}. The Hamming-distance-based bijection can be defined as

{(000), (001), (010), (100), (011), (101), (110), (111)} ↔ F

in the corresponding order. That is, binary arrays with lower Hamming weights are mapped to smaller numbers in F. It can be observed that d_max,1(f) = 1/2. One way to achieve it is a bit error on the first bit of (001) (representing 1 ∈ F), which ends up as (101) (representing 5 ∈ F). The normalized distortion is then (5 − 1) / 2^3 = 1/2. For n = 3, the Hamming-distance-based bijection has the same d_max,1(f) as the unsigned binary expansion. But for larger n, the distortion of the Hamming-distance-based bijection converges to 0, while it remains 1/2 for the unsigned binary expansion.
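Since Construction 1 orders arrays by Hamming weight and then lexicographically, the bijection for small n can be generated directly by sorting, which is one way to sanity-check Example 2 (a brute-force sketch with our own helper name, not the O(n) method of Section IV):

```python
from itertools import product

def hamming_bijection(n):
    # Sort all length-n binary arrays by (Hamming weight, lexicographic
    # order); the rank of b in this order is f(b) per Construction 1.
    arrays = sorted(product((0, 1), repeat=n), key=lambda b: (sum(b), b))
    return {b: i for i, b in enumerate(arrays)}

f = hamming_bijection(3)
print(f[(0, 0, 1)], f[(1, 0, 1)])  # 1 5, matching Example 2
```

This exhaustive construction takes O(n · 2^n) time and is only practical for small n; Section IV gives the O(n) encoder and decoder.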
Theorem 1. Let f be the Hamming-distance-based bijection and let k be a constant number. Then

lim_{n→∞} d_max,k(f) = 0.

Proof: According to Lemma 1, we only need to prove

lim_{n→∞} d_max,1(f) = 0.

By the construction of the Hamming-distance-based bijection, the first binary array is the all-zero array (i.e., its Hamming weight is 0), the next n binary arrays all have Hamming weight 1, and so on. Thus we can sequentially partition all 2^n binary arrays into n + 1 groups, where the group for weight w consists of all binary arrays with Hamming weight w and has size C(n, w) (the binomial coefficient), for w = 0, 1, . . . , n. If any bit in a binary array is flipped, it is changed to another binary array in one of the two adjacent groups. The absolute distortion (between their corresponding integer representations in F) is therefore at most the total size of the two groups, i.e.,

d_max,1(f, b) ≤ (C(n, w) + C(n, w+1)) / 2^n

if w_H(b) = w < n/2; otherwise,

d_max,1(f, b) ≤ (C(n, w) + C(n, w−1)) / 2^n.

Thus,

lim_{n→∞} d_max,1(f) = lim_{n→∞} max_{b ∈ {0,1}^n} d_max,1(f, b)
                     ≤ lim_{n→∞} 2 · C(n, n/2) / 2^n
                     = lim_{n→∞} 2 · √(2/(πn)) · 2^n / 2^n
                     = lim_{n→∞} 2 · √(2/(πn))
                     = 0,

where the first equality follows from the definition, the inequality follows since C(n, w) is maximized at w = n/2, and the subsequent equality follows from Stirling's approximation C(n, n/2) ≈ √(2/(πn)) · 2^n.
Remark 3. According to Theorem 1, the distortion decreases to 0 as n increases, which is a much more promising result than for the unsigned binary expansion. It should also be noted that d_max,1(f) decreases on the order of O(1/√n). We conjecture that this convergence speed is already optimal.
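The decay claimed in Theorem 1 can also be checked empirically by brute force for moderate n (a sketch with our own helper names; the enumeration is exponential in n, so only small n are feasible):

```python
from itertools import product
from math import comb

def hamming_bijection(n):
    # Construction 1 by sorting: weight first, then lexicographic order.
    arrays = sorted(product((0, 1), repeat=n), key=lambda b: (sum(b), b))
    return {b: i for i, b in enumerate(arrays)}

def dmax1(f, n):
    # Distance-1 distortion of Definition 1, by exhaustive search.
    worst = 0.0
    for b, val in f.items():
        for i in range(n):
            x = b[:i] + (1 - b[i],) + b[i + 1:]
            worst = max(worst, abs(val - f[x]) / 2**n)
    return worst

for n in (4, 8, 12):
    d = dmax1(hamming_bijection(n), n)
    bound = 2 * comb(n, n // 2) / 2**n  # bound from the proof of Theorem 1
    print(n, round(d, 4), '<=', round(bound, 4))
```

For each even n, the measured d_max,1(f) stays below the 2·C(n, n/2)/2^n bound used in the proof, and well below the constant 1/2 of the unsigned binary expansion.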
Fig. 1 shows the distortion d_max,1(f, b) of the Hamming-distance-based representation for each f(b) ∈ {0, 1, . . . , 2^n − 1} with n = 16. It can be observed that the maximum distortion (approximately 0.27) appears in the middle, where w_H(b) is close to n/2. This is an apparent improvement over the unsigned binary expansion, for which d_max,1(f, b) = 1/2 for every b ∈ {0, 1}^n.
Fig. 1. Maximum distortion d_max,1(f, b) of each b ∈ {0, 1}^n for n = 16.
To show practical implications of the Hamming-distance-based representation, we compare in Fig. 2 the accuracy of a 10-category classification machine learning task on the dataset CIFAR-10 [6] in the presence of bit errors in the trained parameters. A nine-layer convolutional neural network [7], [8] is trained with 2,786,890 real-valued parameters, where each parameter is represented by 16 bits. The total of 44,590,240 bits are flipped at random with probability p, called the raw bit error rate (RBER). It can be observed that the accuracy is slightly improved by the Hamming-distance-based bijection. The limited improvement is due to the fact that n = 16 is not sufficiently large.
Fig. 2. Accuracy of CIFAR-10 for unsigned binary expansion and Hamming-distance-based representation.
IV. ENCODER AND DECODER
In this section, we provide a pair of encoder and decoder for Construction 1 such that the time complexity is O(n), where n is the number of bits to represent an integer in F = {0, 1, . . . , 2^n − 1}.
A. Encoder {0, 1}^n → F

Suppose b ∈ {0, 1}^n has Hamming weight w and the positions of its 1s are 0 ≤ p_0 < p_1 < · · · < p_{w−1} ≤ n − 1. According to Construction 1, f(b) is greater than all f(b′) with w_H(b′) < w; the total number of such b′ is ∑_{i=0}^{w−1} C(n, i). We then need to count the arrays b′ with w_H(b′) = w that are lexicographically smaller than b. This is done by scanning b from left to right to search for 1s: when the ith 1 is encountered at position p_i, the lexicographic order of b increases by C(n − 1 − p_i, w − i). Therefore, the encoder performs the calculation

f(b) = ∑_{i=0}^{w−1} C(n, i) + ∑_{i=0}^{w−1} C(n − 1 − p_i, w − i). (5)

The implementation of the encoder is straightforward from Eq. (5). Note that if all values of C(a, b), 1 ≤ a ≤ n, 0 ≤ b ≤ a, are pre-calculated and stored with space complexity O(n^3) (assuming that one number occupies O(n) space), then the encoding time complexity is O(n) (assuming a single addition can be done in O(1) time).
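Eq. (5) translates directly into code. A sketch of the encoder (for clarity it calls math.comb instead of indexing the pre-computed table mentioned above):

```python
from math import comb

def encode(bits):
    # f(b) per Eq. (5): the rank of b under the ordering of Construction 1.
    n = len(bits)
    w = sum(bits)
    # All arrays of strictly smaller Hamming weight come first.
    rank = sum(comb(n, i) for i in range(w))
    ones_seen = 0
    for p, bit in enumerate(bits):
        if bit:  # the i-th 1, at position p, contributes C(n-1-p, w-i)
            rank += comb(n - 1 - p, w - ones_seen)
            ones_seen += 1
    return rank

# Matches the n = 3 mapping of Example 2:
print([encode(b) for b in
       [(0,0,0),(0,0,1),(0,1,0),(1,0,0),(0,1,1),(1,0,1),(1,1,0),(1,1,1)]])
# [0, 1, 2, 3, 4, 5, 6, 7]
```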
B. Decoder F → {0, 1}^n

Let m be the integer to be mapped to a length-n binary array. First we find the Hamming weight of the decoded binary array: this is done by subtracting C(n, i), for i = 0, 1, . . ., from m until the remaining value is less than C(n, w); the weight of the decoded array is then w. Let m′ = m − ∑_{i=0}^{w−1} C(n, i); we then need to find the weight-w binary array with lexicographic order m′. The process is the reverse of encoding: each position is scanned from left to right and tested to see whether a 1 in that position would keep the lexicographic order within m′. If so, the position is set to 1; otherwise, the position remains 0. The details of the decoder are described in Algorithm 1.
Algorithm 1 Decoder F → {0, 1}^n
1: Input: an integer m ∈ {0, 1, . . . , 2^n − 1}
2: Output: a length-n binary array b
3: w ← 0
4: b ← all-zero array of length n
5: while m − C(n, w) ≥ 0 do
6:   m ← m − C(n, w)
7:   w ← w + 1
8: end while
9: for i = 0 to n − 1 do
10:   if m = 0 then
11:     Set the last w bits in b to 1 and return b
12:   end if
13:   if m ≥ C(n − 1 − i, w) then
14:     b_i ← 1
15:     m ← m − C(n − 1 − i, w)
16:     w ← w − 1
17:   end if
18: end for
19: return b
The decoding time complexity is O(n) if C(a, b), 1 ≤ a ≤ n, 0 ≤ b ≤ a, is pre-calculated and stored with space complexity O(n^3) (assuming a single subtraction can be done in O(1) time).
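A sketch of Algorithm 1 in code (again calling math.comb directly), together with a round-trip check against an encoder implementing Eq. (5):

```python
from math import comb

def decode(m, n):
    # Algorithm 1: find the Hamming weight w, then place the 1s greedily
    # from left to right.
    w = 0
    while m - comb(n, w) >= 0:
        m -= comb(n, w)
        w += 1
    b = [0] * n
    for i in range(n):
        if m == 0:
            for j in range(n - w, n):  # remaining 1s go to the last w bits
                b[j] = 1
            return b
        if m >= comb(n - 1 - i, w):
            m -= comb(n - 1 - i, w)
            b[i] = 1
            w -= 1
    return b

def encode(bits):
    # Eq. (5), included here for the round-trip check.
    n, w = len(bits), sum(bits)
    rank, seen = sum(comb(n, i) for i in range(w)), 0
    for p, bit in enumerate(bits):
        if bit:
            rank += comb(n - 1 - p, w - seen)
            seen += 1
    return rank

n = 8
assert all(encode(decode(m, n)) == m for m in range(2**n))
print(decode(5, 3))  # [1, 0, 1], matching Example 2
```

The round-trip assertion verifies the bijection exhaustively for n = 8.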
V. CONCLUSIONS
In this paper, we provide a Hamming-distance-based binary representation of integer and fixed-point numbers such that the distortion brought by bit errors is reduced compared to the unsigned binary expansion. An encoder and a decoder with O(n) time complexity are also described.
REFERENCES
[1] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach. Elsevier Science, 2011.
[2] W. Ryan and S. Lin, Channel Codes: Classical and Modern. Cambridge University Press, 2009.
[3] F. Gray, "Pulse code communication," U.S. Patent 2,632,058, filed November 1947.
[4] R. W. Hamming, "Error detecting and error correcting codes," Bell Systems Tech. J., vol. 29, no. 2, pp. 147–160, 1950.
[5] V. Bryant, Metric Spaces: Iteration and Application. Cambridge University Press, ISBN 0-521-31897-1, 1985.
[6] Wolfram Research, "CIFAR-10," https://doi.org/10.24097/wolfram.83212.data, 2016, [Online; accessed 07-November-2017].
[7] K. Jarrett, K. Kavukcuoglu, and M. Ranzato, "What is the best multi-stage architecture for object recognition?" in IEEE Int. Conf. Computer Vision, Kyoto, Japan, September 2009, pp. 2146–2153.
[8] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification withdeep convolutional neural networks,” in Advances in Neural InformationProcessing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q.Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097–1105.