
Randomized methods in lossless compression of hyperspectral data

Qiang Zhang,a V. Paúl Pauca,b and Robert Plemmonsc

aWake Forest School of Medicine, Department of Biostatistical Sciences, Winston-Salem, North Carolina 27157
[email protected]
bWake Forest University, Department of Computer Science, Winston-Salem, North Carolina 27109
cWake Forest University, Departments of Mathematics and Computer Science, Winston-Salem, North Carolina 27109

Abstract. We evaluate recently developed randomized matrix decomposition methods for fast lossless compression and reconstruction of hyperspectral imaging (HSI) data. Simple random projection methods have been shown to be effective for lossy compression without severely affecting the performance of object identification and classification. We build upon these methods to develop a new double-random projection method that may enable security in the transmission of compressed data. For HSI data, the distribution of elements in the resulting residual matrix, i.e., the original data minus its low-rank representation, exhibits low entropy relative to the original data, which favors high compression ratios. We show both theoretically and empirically that randomized methods combined with residual-coding algorithms can lead to effective lossless compression of HSI data, and we conduct numerical tests on real large-scale HSI data that show promising results. In addition, we show that randomized techniques are applicable to encoding on resource-constrained on-board sensor systems, where the core matrix-vector multiplications can be easily implemented on computing platforms such as graphics processing units or field-programmable gate arrays. © 2013 Society of Photo-Optical Instrumentation Engineers (SPIE) [DOI: 10.1117/1.JRS.7.074598]

Keywords: random projections; hyperspectral imaging; dimensionality reduction; lossless compression; singular value decomposition.

Paper 12486SS received Jan. 3, 2013; revised manuscript received Apr. 18, 2013; accepted for publication Jun. 14, 2013; published online Jul. 30, 2013.

    1 Introduction

Hyperspectral image (HSI) data are measurements of the electromagnetic radiation reflected from an object or a scene (i.e., materials in the image) at many narrow wavelength bands. Spectral information is important in many fields, such as environmental remote sensing, monitoring chemical/oil spills, and military target discrimination; for comprehensive discussions, see Refs. 1-3. HSI data are being gathered by sensors of increasing spatial, spectral, and radiometric resolution, leading to the collection of truly massive datasets. The transmission, storage, and processing of these large datasets present significant difficulties in practical situations as new-generation sensors are used. For example, for aircraft or for increasingly popular unmanned aerial vehicles carrying hyperspectral scanning imagers, the imaging time is limited by the data capacity and computational capability of the on-board equipment, since within 5 to 10 s, hundreds to thousands of pixels of hyperspectral data are collected and often preprocessed.1 For real-time on-board processing, it would be desirable to design algorithms capable of compressing such amounts of data within 5 to 10 s, before the next section of the scene is scanned. This requirement makes it difficult to apply algorithms such as JPEG2000,4 three-dimensional (3-D)-SPIHT,5 or 3-D-SPECK,6 unless they are deployed on acceleration platforms such as digital signal processors,7 graphics processing units (GPUs), or field-programmable gate arrays (FPGAs). For example, Christophe and Pearlman8 reported over 2 min of processing time using 3-D-SPIHT with random access for a 512 × 512 × 224 HSI dataset, including 30 s for the discrete wavelet transformation.

Dimensionality reduction methods can provide a means to deal with the computational difficulties of hyperspectral data. These methods often use projections to compress a high-dimensional data space represented by a matrix A into a lower-dimensional space represented by a matrix B, which is then factorized. For HSI processing, hundreds of bands of images can be grouped into a 3-D data array, also called a tensor or a datacube, which can be unfolded into a matrix A from which B is obtained and then factorized. Such factorizations are referred to as low-rank matrix factorizations, resulting in a low-rank matrix approximation to the original HSI data matrix A.2,9-11

However, dimensionality reduction techniques provide lossy compression, as the original data are not exactly represented by, or reconstructed from, the lower-dimensional space. Recent efforts to provide lossless compression exploit the correlation structure within HSI data, encoding the residuals (original data minus approximation) after stripping off the correlated parts.12,13 Given the large number of pixels, such correlations are often restricted to spatially or spectrally local areas, whereas dimensionality reduction techniques essentially exploit the global correlation structure. In this paper, we propose the use of randomized dimensionality reduction techniques for efficiently capturing global correlation structures, combined with residual encoding, as in Ref. 13, to provide lossless compression. The success of this approach requires that the distribution of the residual data have low entropy relative to the original, and as shall be observed in the experimental section, this appears to be the case for HSI data.

The most popular methods for low-rank factorization employ the singular value decomposition (SVD), e.g., Ref. 14, and lead to popular data analysis methods such as principal component analysis (PCA).15 Compared with algorithms that employ fixed basis functions, such as the 3-D wavelets in JPEG2000, 3-D-SPIHT, and 3-D-SPECK, the bases given by the SVD or PCA are data driven and provide a more compact representation of the original data. Moreover, by the optimality of the truncated SVD's (TSVD) low-rank approximation,14 the Frobenius norm of the residual matrix is also optimal, and a low entropy in its distribution may be expected. Both the SVD and PCA can be used to represent an n-band hyperspectral dataset with a data size equivalent to only k bands, where k ≪ n. For applications of the SVD and PCA in HSI, see Refs. 16-19. The main disadvantage of using the SVD is its computation time: $O(mn^2)$ floating-point operations (flops) for an $m \times n$ matrix with $m \geq n$ (Ref. 20). With recent technology, HSI datasets can easily reach the million-pixel or even gigapixel level, rendering the use of a full SVD impractical in real scenarios.

The recent development of probabilistic methods for approximating singular vectors and singular values has provided a way to circumvent the computational complexity of the SVD, though at the cost of optimality in the approximation.21 These methods begin by randomly projecting the original matrix to obtain a lower-dimensional matrix, while keeping the range of the original matrix asymptotically intact. The much smaller projected matrix is then factorized using a full matrix decomposition such as the SVD, and the resulting singular vectors are backprojected to the original space. Compared with deterministic methods, probabilistic methods often offer lower computational cost while still achieving high-accuracy approximations (see Ref. 21 and the references therein).

Chen et al.22 have recently provided an extensive study of the effects of linear projections on the performance of target detection and classification in HSI. In their tests, they found that the dimensionality of hyperspectral data can typically be reduced to 1/5 to 1/3 of the original without severely affecting the performance of classical target detection and classification algorithms. Compressive sensing approaches for HSI also take advantage of redundancy along the spectral dimension11,17,23-25 and involve random projection of the data onto a lower-dimensional space. For example, Fowler17 proposed an approach that exploits the use of compressive projections in sensors that integrate dimensionality reduction and signal acquisition to effectively shift the computational burden of PCA from the encoder platform to the decoder site. This technique, termed compressive-projection PCA (CPPCA), couples random projections at the encoder with a Rayleigh-Ritz process for approximating eigenvectors at the decoder. In its use of random projections, this technique possesses a certain duality with the newer randomized SVD (rSVD) approaches recently proposed.19 However, CPPCA recovers coefficients of a known sparsity pattern in an unknown basis; accordingly, it requires the additional step of eigenvector recovery.

In this paper, we present several randomized algorithms designed for on-board lossy and lossless compression of HSI. Our goals include processing hundreds of pixels of hyperspectral data within a time frame of 5 s and achieving a lossless compression ratio (CR) close to 3. The remainder of the paper is structured as follows. In Sec. 2, we present several fast randomized methods for lossless compression and reconstruction, suitable for on-board and off-board (receiving station) processing. In Sec. 3, we apply the methods to a large HSI dataset to demonstrate their efficiency and effectiveness. We conclude with some observations in Sec. 4.

    2 Methodology

Randomized algorithms have recently drawn a large amount of interest,21 and here we exploit this approach specifically for efficient on-board lossless compression and data transfer and off-board reconstruction of HSI data. For lossless compression, the process is as follows:

1. Calculate a low-rank approximation of the original data using randomized algorithms.
2. Encode the residual (original data minus approximation) using standard integer or floating-point coding algorithms.

We present several randomized algorithms for efficient low-rank approximation. They can be written in fewer than 10 lines of pseudo-code, can be easily implemented on PC platforms, and may be ported to platforms such as GPUs or FPGAs. As readers will see, all of the large-scale computations involve only matrix-vector multiplications, and the more computationally intensive SVD computations involve only small matrices.

In the encoding and decoding algorithms that follow, it is assumed that HSI data are collected in blocks of size $n_x \times n_y \times n$, where $n_x$ and $n_y$ are the numbers of pixels along the spatial dimensions and $n$ is the number of spectral bands. During compression, each block is first unfolded into a two-dimensional array of size $m \times n$, where $m = n_x n_y$, by stacking each slice of size $n_x \times n_y$ into a one-dimensional array of size $m \times 1$. The compact representation for each block can then be stored on board. See Sec. 3 for a more extensive discussion of compressing HSI data in blocks as the entire dataset is being gathered.
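As an illustration of this unfolding, the following sketch (in Python/NumPy; the paper's own implementation is in Matlab, and the block sizes here are hypothetical) shows one way a block could be reshaped to and from its $m \times n$ form:

```python
import numpy as np

# Hypothetical block sizes for illustration only (not from a real sensor).
nx, ny, n = 64, 64, 220
cube = np.random.rand(nx, ny, n)   # stands in for one HSI data block

# Unfold to m x n: each band (an nx x ny slice) is stacked into a
# length-m column, so row i of A holds the spectrum of pixel i.
m = nx * ny
A = cube.reshape(m, n)

# Folding back is the inverse reshape.
assert np.array_equal(A.reshape(nx, ny, n), cube)
```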

We now define terms and notation. The SVD of a matrix $A \in \mathbb{R}^{m \times n}$ is defined as $A = U \Sigma V^T$, where $U$ and $V$ are orthonormal, with columns denoted $u_i$ and $v_i$, respectively, and $\Sigma$ is a diagonal matrix with entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_p \geq 0$, where $p = \min(m, n)$. For some $k \leq p$, the TSVD rank-$k$ approximation of $A$ is the matrix $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T = U_k \Sigma_k V_k^T$, where $U_k$ and $V_k$ contain the first $k$ columns of $U$ and $V$, respectively. The residual matrix obtained from approximating $A$ with $A_k$ is $R = A - A_k$. By the Eckart-Young theorem,14 $A_k$ is the optimal rank-$k$ approximation of $A$, minimizing the Frobenius norm of $R$.
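For concreteness, here is a minimal NumPy sketch of the TSVD and its residual, with a small random matrix standing in for $A$; the two printed quantities coincide because $\|A - A_k\|_F^2 = \sum_{i>k} \sigma_i^2$:

```python
import numpy as np

A = np.random.rand(500, 220)                 # illustrative matrix only
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 6
Ak = (U[:, :k] * s[:k]) @ Vt[:k, :]          # A_k = U_k Sigma_k V_k^T
R = A - Ak                                   # residual of the TSVD
# Eckart-Young: no rank-k matrix has a smaller Frobenius-norm residual.
print(np.linalg.norm(R, 'fro')**2, np.sum(s[k:]**2))
```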

    2.1   Single-Random Projection Method 

Computing low-rank approximations of a large matrix using the SVD is prohibitive in most real-world applications. Randomized projections onto lower-dimensional spaces provide a feasible way around this problem. Let $P = (p_{ij})_{m \times k_1}$ be a matrix of size $m \times k_1$ with random independent and identically distributed (i.i.d.) entries drawn from $N(0, 1)$. We define the random projection of the row space of $A$ onto a lower $k_1$-dimensional subspace as

$$B = P^T A. \qquad (1)$$

If $P$ is of size $n \times k_1$, then $B = AP$ is a similar random projection of the column space of $A$. Given a target rank $k$, Vempala26 uses such $P$ matrices in an efficient algorithm for computing a rank-$k$ approximation of $A$. The algorithm consists of the following three simple steps:


1. Compute the random projection $B = \frac{1}{\sqrt{k_1}} P_1^T A$ for some $k_1 \geq c \log n / \epsilon^2$.
2. Compute the SVD $B = \sum_i \lambda_i \hat{u}_i \hat{v}_i^T$.
3. Return $\tilde{A}_k \leftarrow A \left( \sum_{i=1}^{k} \hat{v}_i \hat{v}_i^T \right) = A \hat{V}_k \hat{V}_k^T$.
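A minimal NumPy sketch of these three steps (illustrative, not the authors' code; the seed and sizes are assumptions):

```python
import numpy as np

def vempala_lowrank(A, k, k1, seed=0):
    """Rank-k approximation via one random projection (steps 1-3 above)."""
    m, _ = A.shape
    rng = np.random.default_rng(seed)
    P1 = rng.standard_normal((m, k1))                 # i.i.d. N(0,1) entries
    B = (P1.T @ A) / np.sqrt(k1)                      # step 1: project the row space
    _, _, Vt = np.linalg.svd(B, full_matrices=False)  # step 2: SVD of the small B
    Vk = Vt[:k].T                                     # top-k right singular vectors
    return (A @ Vk) @ Vk.T                            # step 3: A V_k V_k^T
```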

It is also shown in Ref. 26 that, with high probability, the error between $\tilde{A}_k$ and $A$ is bounded by

$$\|A - \tilde{A}_k\|_F^2 \leq \|A - A_k\|_F^2 + 2\epsilon \|A_k\|_F^2, \qquad (2)$$

where $A_k$ is the optimal rank-$k$ approximation provided by the TSVD. This bound shows that the approximation $\tilde{A}_k$ is near optimal for small $\epsilon$.

During HSI remote-sensing data acquisition, Vempala's algorithm may enable lossy compression by efficiently computing and storing $A\hat{V}_k$ and $\hat{V}_k$ on board as the data are being gathered. The storage requirement of $A\hat{V}_k$ and $\hat{V}_k$ is proportional to $(m + n)k$, compared with $mn$ for the original data. For lossless compression, the residual $R = A - \tilde{A}_k$ may be compressed with an integer or floating-point coding algorithm and also stored on board.

Encoding and decoding procedures using Vempala's algorithm are presented in Algorithms 1 and 2, respectively. For lossy compression, $\hat{R}$ may be ignored. Clearly, there is a tradeoff between the target rank, which determines the size of $A\hat{V}_k$ and $\hat{V}_k$, and the compressibility of the residual $R$, which also depends on the type of data being compressed. Figure 1 illustrates this tradeoff, assuming that the entropy of the residual decreases as a scaled power law of the form $(k/\alpha)^{-s}$ for $s = 0.1, 0.2, \ldots, 2$ and a constant $\alpha$.

[Figure 1: Theoretical compressibility curves (compression ratio versus desired rank) when the entropy of the residual decreases as $(k/\alpha)^{-s}$ for $k = 2, \ldots, 300$, $s = 0.1, \ldots, 2$, and a constant $\alpha = 2$. The dashed line indicates a compression ratio of 1 (the original data).]

The matrix $P_1$ plays an important role in the efficient low-rank approximation of $A$. $P_1$ could be fairly large, depending on the prespecified value of $\epsilon$; for example, for $\epsilon = 0.15$, $c = 5$, and $n = 220$, $P_1$ requires $k_1 \geq 1199$ columns. However, $P_1$ is needed only once in the compression process and may be generated in blocks (see Sec. 3). In addition, the distribution of the random entries in $P_1$ is symmetric, being drawn from a normal distribution; Zhang et al.27 relax this requirement to allow any distribution with a finite variance. For faster implementation, a circulant random matrix could also be effective,27,28 requiring storage of only one random vector.
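As a sketch of the circulant idea: a circulant matrix is determined by a single random vector $c$, and its action can be applied through FFTs without ever forming the matrix. The function below is our own illustrative naming, not the construction of Refs. 27 and 28; it keeps the first $k_1$ rows of the product, playing the role of $P_1^T A$:

```python
import numpy as np

def circulant_project(A, k1, seed=0):
    """Apply the first k1 rows of an implicit m x m circulant matrix to A.
    The circulant C satisfies C @ x = ifft(fft(c) * fft(x)), so only the
    random vector c is ever stored (a hedged sketch, not the paper's code)."""
    m = A.shape[0]
    rng = np.random.default_rng(seed)
    c = rng.standard_normal(m)        # the only stored randomness
    fc = np.fft.fft(c)
    # circulant times each column of A, then keep the first k1 rows
    CA = np.real(np.fft.ifft(fc[:, None] * np.fft.fft(A, axis=0), axis=0))
    return CA[:k1, :]                 # plays the role of P1^T A
```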

Algorithm 1: On-Board Random Projections Encoder.

Input: HSI data block of size $n_x \times n_y \times n$, unfolded into an $m \times n$ array $A$; target rank $k$; approximation tolerance $\epsilon$.
Output: $\hat{V}_k$, $W$, $\hat{R}$.

1. Compute $B = \frac{1}{\sqrt{k_1}} P_1^T A$, for some $k_1 \geq c \log n / \epsilon^2$.
2. Compute the SVD of $B$: $B = \sum_i \lambda_i \hat{u}_i \hat{v}_i^T$.
3. Construct the rank-$k$ approximation: $\tilde{A}_k = W \hat{V}_k^T$, where $W = A \hat{V}_k$.
4. Compute the residual: $R = A - \tilde{A}_k$.
5. Encode the residual as $\hat{R}$ with a parallel coding algorithm.
6. Store $\hat{V}_k$, $W$, and $\hat{R}$.

Algorithm 2: Off-Board Random Projection Decoder.

Input: $\hat{V}_k$, $W$, $\hat{R}$.
Output: The original matrix $A$.

1. Decode $R$ from $\hat{R}$ with a parallel decoding algorithm.
2. Compute the rank-$k$ approximation: $\tilde{A}_k = W \hat{V}_k^T$.
3. Reconstruct the original: $A = \tilde{A}_k + R$.
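The following sketch mimics Algorithms 1 and 2 end to end, with the residual kept raw instead of entropy coded. In the paper the data are 16-bit integers, so storing $R$ exactly makes the scheme lossless; in this floating-point sketch the reconstruction matches only to round-off:

```python
import numpy as np

def rp_encode(A, k, k1, seed=0):
    """Sketch of Algorithm 1 without the residual entropy coder."""
    m, _ = A.shape
    rng = np.random.default_rng(seed)
    P1 = rng.standard_normal((m, k1))
    B = (P1.T @ A) / np.sqrt(k1)
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    Vk = Vt[:k].T
    W = A @ Vk
    R = A - W @ Vk.T          # in the paper, R is then entropy coded
    return Vk, W, R

def rp_decode(Vk, W, R):
    """Sketch of Algorithm 2: low-rank part plus residual."""
    return W @ Vk.T + R

A = np.random.rand(1000, 220)              # illustrative block
Vk, W, R = rp_encode(A, k=6, k1=50)
assert np.allclose(rp_decode(Vk, W, R), A)  # lossless up to float round-off
```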



    2.2  Double-Random Projections Method 

A variant of the above low-rank approximation approach may be derived by introducing a second random projection, this time of the column space:

$$B_2 = A P_2, \qquad (3)$$

where $P_2 \in \mathbb{R}^{n \times k_2}$ has i.i.d. entries drawn from $N(0, 1)$ and $B_2 \in \mathbb{R}^{m \times k_2}$. Substituting the rank-$k$ approximation $A \hat{V}_k \hat{V}_k^T$ for $A$ in Eq. (3) yields

$$B_2 \approx A \hat{V}_k \hat{V}_k^T P_2. \qquad (4)$$

Notice that $\hat{V}_k^T P_2$ has full row rank, hence its Moore-Penrose pseudo-inverse satisfies

$$(\hat{V}_k^T P_2)(\hat{V}_k^T P_2)^\dagger = I_k. \qquad (5)$$

Multiplying both sides of Eq. (4) by $(\hat{V}_k^T P_2)^\dagger$ gives

$$B_2 (\hat{V}_k^T P_2)^\dagger \approx A \hat{V}_k. \qquad (6)$$

A new rank-$k$ approximation of $A$ can then be obtained as

$$\hat{A}_k = B_2 (\hat{V}_k^T P_2)^\dagger \hat{V}_k^T \approx A \hat{V}_k \hat{V}_k^T \approx A. \qquad (7)$$

As in Vempala's algorithm, the quality of this approximation depends on choosing a sufficiently large value of $k_2 \geq 2k + 1$ (see Ref. 27 for a more detailed discussion). We refer to this method as the double-random projection (DRP) approach to low-rank approximation.
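A short sketch of Eq. (7), assuming $\hat{V}_k$ (here `Vk`, of size $n \times k$) has already been obtained, e.g., from the single-projection step above; names and sizes are illustrative:

```python
import numpy as np

def drp_approx(A, Vk, k2, seed=1):
    """DRP rank-k approximation of Eq. (7); requires k2 >= 2k + 1."""
    n = A.shape[1]
    rng = np.random.default_rng(seed)
    P2 = rng.standard_normal((n, k2))
    B2 = A @ P2                          # second (column-space) projection
    M = np.linalg.pinv(Vk.T @ P2)        # (V_k^T P_2)^dagger, size k2 x k
    return (B2 @ M) @ Vk.T               # B_2 (V_k^T P_2)^dagger V_k^T
```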

During HSI remote-sensing data acquisition, the DRP approach may enable lossy compression by efficiently computing and storing $B_2$, $\hat{V}_k$, and $P_2$ on board as the data are being gathered. The storage requirement for these factors is proportional to $(m + n)k_2 + nk$. For lossless compression, the residual $R = A - \hat{A}_k$ may be compressed with an integer or floating-point coding algorithm and also stored on board. Encoding and decoding procedures based on DRP are presented in Algorithms 3 and 4, respectively. For lossy compression, $\hat{R}$ may be ignored, as in the single-random projection case.

Algorithm 3: On-Board Double-Random Projections Encoder.

Input: HSI data block of size $n_x \times n_y \times n$, unfolded into an $m \times n$ array $A$; target rank $k$; approximation tolerance $\epsilon$.
Output: $B_2$, $\hat{V}_k$, $\hat{R}$.

1. Compute $B_1 = \frac{1}{\sqrt{k_1}} P_1^T A$ and $B_2 = A P_2$, for some $k_1 \geq c \log n / \epsilon^2$ and $k_2 \geq 2k + 1$.
2. Compute the SVD of $B_1$: $B_1 = \sum_i \lambda_i \hat{u}_i \hat{v}_i^T$.
3. Compute the rank-$k$ approximation: $\hat{A}_k = B_2 (\hat{V}_k^T P_2)^\dagger \hat{V}_k^T$.
4. Compute the residual: $R = A - \hat{A}_k$.
5. Code the residual as $\hat{R}$ with a parallel coding algorithm.
6. Store $B_2$, $\hat{V}_k$, and $\hat{R}$.

Algorithm 4: Off-Board Double-Random Projections Decoder.

Input: $B_2$, $\hat{V}_k$, $P_2$, $\hat{R}$.
Output: The original matrix $A$.

1. Decode $R$ from $\hat{R}$ with a parallel decoding algorithm.
2. Compute the low-rank approximation: $\hat{A}_k = B_2 (\hat{V}_k^T P_2)^\dagger \hat{V}_k^T$.
3. Reconstruct the original: $A = \hat{A}_k + R$.

At a slight loss of precision and an increased storage requirement, the DRP encoding and decoding algorithms offer the additional advantage of secure data transfer if $P_2$ is used as a shared key between the remote-sensing aircraft and the ground. It remains to be seen whether this cipher can be easily broken; for now, we regard it as providing lightweight security. In this case, $P_2$ could be generated and transmitted securely only once between the ground and the aircraft; subsequent communication would not require transmission of $P_2$. Unlike the single-random projection approach, interception of the factors $B_2$, $\hat{V}_k$, and $\hat{R}$ would not easily lead to a reconstruction of the original data without $P_2$.

    2.3   Randomized Singular Value Decomposition 

The rSVD algorithm described by Halko et al.21 computes approximate matrix factorizations via random projections, separating the process into two stages. In the first stage, $A$ is projected into an $l$-dimensional space by computing

$$Y = A \Omega, \qquad (8)$$

where $\Omega$ is a matrix of size $n \times l$ with random entries drawn from $N(0, 1)$. Then, for a given $\epsilon > 0$, a matrix $Q \in \mathbb{R}^{m \times l}$ whose columns form an orthonormal basis for the range of $Y$ is obtained such that

$$\|A - Q Q^T A\|_2^2 \leq \epsilon. \qquad (9)$$

See Algorithms 4.1 and 4.2 in Ref. 21 for how $Q$ and $l$ may be computed adaptively. In the second stage, the SVD of the reduced matrix $Q^T A \in \mathbb{R}^{l \times n}$ is computed as $\tilde{U} \hat{\Sigma} \hat{V}^T$. Since $l \ll n$, it is generally computationally feasible to compute the SVD of the reduced matrix. The matrix $A$ can then be approximated as

$$A \approx (Q \tilde{U}) \hat{\Sigma} \hat{V}^T = \hat{U} \hat{\Sigma} \hat{V}^T, \qquad (10)$$



where $\hat{U} = Q \tilde{U}$ and $\hat{V}$ are orthonormal matrices. As such, Eq. (10) is an approximate SVD of $A$, and the range of $\hat{U}$ approximates the range of $A$. See Ref. 21 for details on the choice of $l$, along with extensive numerical experiments using rSVD methods and a detailed error analysis of the two-stage method described above.
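A compact sketch of the two-stage procedure, with a fixed sketch size $l$ standing in for the adaptive choice of $Q$ and $l$ in Algorithms 4.1 and 4.2 of Ref. 21, and a plain QR factorization as the range finder; the names are our own:

```python
import numpy as np

def rsvd(A, l, seed=0):
    """Two-stage randomized SVD sketch (after Halko et al.)."""
    _, n = A.shape
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, l))
    Y = A @ Omega                             # stage 1: random projection
    Q, _ = np.linalg.qr(Y)                    # orthonormal basis, m x l
    B = Q.T @ A                               # small l x n matrix
    Ut, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ Ut, s, Vt                      # U_hat, Sigma_hat, V_hat^T
```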

The rSVD approach may also be used to specify HSI encoding and decoding compression algorithms, as shown in Algorithms 5 and 6. For lossy compression, $Q$ and $B$ need to be computed and stored on board; the storage requirement for these factors is proportional to $(m + n)l$. As in the previous cases, for lossless compression the residual may be calculated and compressed using an integer or floating-point coding algorithm.

Compared with the previous single- and double-random projection approaches, rSVD requires the computation of $Q$ but is also able to push the SVD calculation to the decoder. Since $l$ is in practice much smaller than $k_1$ and $k_2$, the encoder can store $Q$ and $B$ directly without any loss in approximation accuracy. Perhaps the key benefit of rSVD is that the low-rank approximation factors $\hat{U}$, $\hat{\Sigma}$, and $\hat{V}$ can be used directly for subsequent analysis such as PCA, clustering, etc.

Algorithm 5: Randomized SVD Encoder.

Input: HSI data block of size $n_x \times n_y \times n$, unfolded into an $m \times n$ array $A$; approximation tolerance $\epsilon$.
Output: $Q$, $B$, $\hat{R}$.

1. Calculate $Y = A \Omega$, for some $l > k$.
2. Apply Algorithm 4.2 in Ref. 21 to obtain $Q$ from $Y$.
3. Compute $B = Q^T A$.
4. Compute the residual: $R = A - Q B$.
5. Code $R$ as $\hat{R}$ with a parallel coding algorithm.
6. Store $Q$, $B$, and $\hat{R}$.

Algorithm 6: Randomized SVD Decoder.

Input: $Q$, $B$, and $\hat{R}$.
Output: The original matrix $A$ and its rank-$k$ approximate SVD $\hat{U}$, $\hat{\Sigma}$, $\hat{V}$.

1. Decode $R$ from $\hat{R}$ with a parallel decoding algorithm.
2. Compute the SVD: $B = \tilde{U} \hat{\Sigma} \hat{V}^T$.
3. Compute $\hat{U} = Q \tilde{U}$.
4. Compute the low-rank approximation: $\hat{A}_l = \hat{U} \hat{\Sigma} \hat{V}^T$.
5. Reconstruct the original: $A = \hat{A}_l + R$.

    2.4   Randomized Singular Value Decomposition by DRP 

The DRP approach can also be applied in the rSVD calculation by introducing

$$B_1 = P_1^T A, \qquad (11)$$

where $P_1$ is of size $m \times k_1$ with entries drawn from $N(0, 1)$. Replacing $A$ with the rSVD approximation $Q Q^T A$ leads to



$$B_1 \approx P_1^T Q Q^T A. \qquad (12)$$

Multiplying both sides by the pseudo-inverse of $P_1^T Q$, we have

$$(P_1^T Q)^\dagger B_1 \approx Q^T A. \qquad (13)$$

With this slight modification, the rSVD calculation in the encoder can proceed using $(P_1^T Q)^\dagger B_1$ instead of $Q^T A$. The corresponding encoding algorithm is given in Algorithm 7; the decoder remains the same as in the rSVD case.

Algorithm 7: Randomized SVD by DRP Encoder.

Input: HSI data block of size $n_x \times n_y \times n$, unfolded into an $m \times n$ array $A$; approximation tolerance $\epsilon$.
Output: $Q$, $W$, and $\hat{R}$.

1. Calculate $B_1 = \frac{1}{\sqrt{k_1}} P_1^T A$ and $Y = A \Omega$, for some $k_1 \geq c \log n / \epsilon^2$ and $l > k$.
2. Apply Algorithm 4.2 in Ref. 21 to obtain $Q$ from $Y$.
3. Compute the residual: $R = A - Q W$, where $W = (P_1^T Q)^\dagger B_1$.
4. Code $R$ as $\hat{R}$ with a parallel coding algorithm.
5. Store $Q$, $W$, and $\hat{R}$.
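A sketch of the Eq. (13) shortcut inside an rSVD-style encoder (illustrative NumPy, again with a fixed $l$ and QR in place of the adaptive range finder of Ref. 21):

```python
import numpy as np

def rsvd_drp_encode(A, l, k1, seed=0):
    """Recover Q^T A from B1 = P1^T A via the pseudo-inverse of P1^T Q
    (Eq. 13), so the encoder never forms Q^T A from the full A."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    P1 = rng.standard_normal((m, k1)) / np.sqrt(k1)
    Omega = rng.standard_normal((n, l))
    B1 = P1.T @ A                      # row-space sketch
    Q, _ = np.linalg.qr(A @ Omega)     # range finder, as in the rSVD
    W = np.linalg.pinv(P1.T @ Q) @ B1  # (P1^T Q)^dagger B1 ~ Q^T A
    R = A - Q @ W                      # residual, entropy coded in the paper
    return Q, W, R
```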

    3 Numerical Experiments

We have tested the encoding algorithms presented in Sec. 2 on a large, publicly available HSI dataset, Indian Pines, collected by AVIRIS over a 25 × 6 mi² portion of Northwest Tippecanoe County, Indiana, on June 12, 1992. The sensor has a spectral range of 0.45 to 2.5 μm over 220 bands, and the full dataset consists of a 2,678 × 614 × 220 image cube stored as unsigned 16-bit integers. Figure 2 shows the 100th band in grayscale.

A remote-sensing aircraft carrying hyperspectral scanning imagers can collect such a data cube in blocks of hundreds to thousands of pixels, each gathered within a few seconds.1 The size of each data block is determined by factors such as the ground sample distance and the flight speed.

To simulate this process, we unfolded the Indian Pines data cube into a large matrix T of size 1,644,292 × 220 and then divided T into nine blocks $A_i$, each of size $m \times n$ with m = 182,699 and n = 220. For simplicity, the last pixel in the original dataset was ignored. Each block $A_i$ was then compressed sequentially using the encoding algorithms of Sec. 2. In all cases, $A_i$ is converted from unsigned 16-bit integers to double precision before compression, and the compressed representation is converted back to unsigned 16-bit integers for storage.

All algorithms were implemented in Matlab, and the tests were performed on a PC platform with eight 3.2-GHz Intel Xeon cores and 12 GB of memory. In the implementation of Algorithm 1, the random matrix $P_1 \in \mathbb{R}^{m \times k_1}$ could be large, since m = 182,699 and the oversampling requirement $k_1 \geq c \log n / \epsilon^2$ can lead to relatively large $k_1$, e.g., $k_1 = 1199$ when $c = 5$ and $\epsilon = 0.15$. To reduce the memory requirement, we represent $P_1$ implicitly in column blocks as $P_1 = [P_1^{(1)}\ P_1^{(2)}\ \cdots\ P_1^{(\nu)}]$ and implement the matrix multiplication $P_1^T A$ as a series of products $(P_1^{(j)})^T A$, generating and storing only one block of $P_1$ at a time.
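A sketch of this block-generation strategy (NumPy for illustration; the `block` parameter controlling how many columns of $P_1$ exist in memory at once is our own):

```python
import numpy as np

def project_in_blocks(A, k1, block=200, seed=0):
    """Compute B = P1^T A without holding the full m x k1 matrix P1:
    P1 is generated in column blocks P1^(j), each contributing its own
    rows of B, so peak memory stays at one block of P1."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    B = np.empty((k1, n))
    for start in range(0, k1, block):
        stop = min(start + block, k1)
        P1j = rng.standard_normal((m, stop - start))  # one column block of P1
        B[start:stop, :] = P1j.T @ A                  # its share of P1^T A
    return B / np.sqrt(k1)
```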

    3.1   Compressibility of HSI Data 

As suggested by the compressibility curves in Fig. 1, the effectiveness of low-rank approximation and residual encoding depends on (1) the compressibility of the data and (2) the effectiveness of dimensionality reduction in lowering the entropy of the residual as a function of the desired rank k. The first point can be demonstrated by computing high-accuracy approximate singular vectors and singular values of the entire Indian Pines dataset using the rSVD algorithm.



Figure 3 shows the first eight singular vectors folded as images of size 2,678 × 614, and Fig. 4 shows the corresponding singular values up to the 20th value. As can be observed, a great deal of the information is encoded in the first six singular vectors and singular values, with the seventh singular vector appearing more like noise.

To address the second point, we compare the histogram of the original dataset with that of the residual produced by the rSVD encoder in Algorithm 5 with target rank k = 6.

[Figure 2: The grayscale image of the 100th band.]

[Figure 3: The first eight singular vectors, $\hat{u}_i$, shown as images.]


Figure 5(a) shows the values in the original dataset to lie in the range [0, 0.4]. After rSVD encoding, the residual values roughly follow a Laplacian distribution in the range [-0.1, 0.1], as seen in Fig. 5(b). Moreover, 95.42% of the residual values lie within [-0.0015, 0.0015] (notice the log scale on the y-axis). This suggests that the entropy of the residual is significantly smaller than the entropy of the original dataset and that, as a consequence, the residual may be effectively encoded for lossless compression. Figure 5(c) shows the probability of observing a residual value r greater than a given value x, i.e., p(r > x), again indicating that the residuals are densely distributed around zero.

    3.2  Lossless Compression Through Randomized Dimensionality Reduction 

We use the entropy of the residuals produced by each encoding algorithm as the information-theoretic lower bound, i.e., the minimum number of bits required to code the residuals, to estimate the amount of space needed to store a compressed residual. The entropy of the distribution of residual values is defined as

$$h(R) = -\int p(x) \log(p(x))\, dx, \qquad (14)$$

where $p(x)$ is the probability density function of the residual values. We estimate $h(R)$ by computing and scaling histograms of residual values [as in Fig. 5(b)].
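A sketch of this histogram-based estimate, assuming the residual has been quantized back to integers (as when it is stored in 16-bit format), so a discrete Shannon entropy stands in for the integral of Eq. (14):

```python
import numpy as np

def residual_entropy_bits(R):
    """Empirical entropy of a residual array, in bits per sample."""
    _, counts = np.unique(np.rint(R).astype(np.int64), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```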

[Figure 4: The singular value spectrum of the full Indian Pines dataset, up to the 20th value.]

[Figure 5: (a) The distribution of the original Indian Pines hyperspectral imaging (HSI) data values. (b) The distribution of residuals after subtracting the truncated SVD (TSVD) approximation from the original data. (c) The cumulative distribution of those residuals.]



We assume that, like the original data, the low-rank representation and the corresponding residual are stored in signed 16-bit integer format. The CR is then calculated by dividing the amount of storage needed for the original data by the amount of storage needed for the compressed data. As an example, for Algorithm 1, the outputs $\hat{V}_k$ and $W = A \hat{V}_k$ require space proportional to $(m + n)k$. If the entropy of the residual is $h(R)$ bits, then the lossless CR obtained using Algorithm 1 is

$$\mathrm{CR} = \frac{16mn}{16nk + 16mk + h(R)mn}. \qquad (15)$$
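Eq. (15) as a one-line helper; the 5 bits/sample residual entropy in the example is hypothetical:

```python
def lossless_cr(m, n, k, h_bits):
    """Eq. (15): 16-bit factors V_k (n x k) and W = A V_k (m x k), plus
    h(R) bits per residual sample, relative to 16 bits per original sample."""
    return 16 * m * n / (16 * n * k + 16 * m * k + h_bits * m * n)

# e.g., one Indian Pines block with an assumed 5 bits/sample residual:
print(lossless_cr(m=182_699, n=220, k=6, h_bits=5.0))  # ~ 2.9
```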

Figure 6 shows the lossless CRs obtained using all four encoding algorithms of Sec. 2 as a function of data block $A_i$. The target rank is k = 6 in all cases, and the numbers of columns in $P_1$ and $P_2$ are $k_1 = 1{,}000$ and $k_2 = 2k + 1 = 13$, respectively. Notice that the CRs are above 2.5 and close to or around 3, while Wang et al.13 indicated 3 as a good CR for HSI data. Readers should be aware that Fig. 6 shows only the theoretical upper bounds on the lossless CRs, whereas those in Ref. 13 are actual ratios. The CRs produced by the DRP variants are slightly lower than their counterparts; this is expected, as the advantage of DRP (Algorithm 3) lies in its easily implemented lightweight data security. Finally, CRs above 4.5 may be achieved, as shown in Fig. 6, for the last data block. This block corresponds to segments of homogeneous vegetation, seen on the right side of Fig. 2, which has been extensively tested by classification algorithms.29

Besides the theoretical upper bounds on the CRs presented in Fig. 6, we also combine the randomized methods with some popular lossless compression algorithms for coding the residuals. The chosen residual-coding methods are the Lempel-Ziv-Welch (LZW) algorithm,30 Huffman coding,31 arithmetic coding,32 and JPEG2000.33 Table 1 presents the mean lossless CRs over the nine blocks of HSI data, where columns correspond to the randomized methods and rows correspond to the coding algorithms. The highest CR of 2.430 is achieved by the combination of the rSVD method and the JPEG2000 algorithm. Given the rapid development of coding algorithms, and the relatively limited and rudimentary algorithms tested here, the CR could be further raised by incorporating more advanced algorithms in future work.

[Figure 6: The lossless compression ratios (CRs) for each data block using Algorithms 1 (RP), 3 (DRP), 5 (rSVD), and 7 (rSVD-DRP).]


    3.3  Optimal Compressibility 

Optimal CRs using the randomized dimensionality reduction methods of Sec. 2 depend on the appropriate selection of parameters such as the target rank and approximation error tolerances. A full optimality study for the Indian Pines dataset is beyond the scope of this paper. However, some optimality information can be gleaned by observing how the CR changes as a function of the target rank k (with other parameters fixed). Notice from Eq. (15) that the amount of storage needed for the low-rank representation increases with k, while the entropy of the residual decreases. The two terms in the denominator thus yield an optimal k, which is often data dependent. Figure 7 shows this result for the Indian Pines dataset: the different curves correspond to different data blocks of the original dataset, and the solid red curve is the mean across all blocks. Our choice of k = 6 is seen to be near optimal.
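A sketch of such a scan over k, reusing the hypothetical helpers `vempala_lowrank`, `residual_entropy_bits`, and `lossless_cr` sketched earlier (all our own naming, not the paper's code):

```python
import numpy as np

def best_rank(A, ks=range(2, 40), k1=1000):
    """Scan target ranks, recomputing factors and residual entropy at
    each k, and return the k with the highest estimated lossless CR."""
    m, n = A.shape
    crs = []
    for k in ks:
        Ak = vempala_lowrank(A, k, k1)          # low-rank approximation
        h = residual_entropy_bits(A - Ak)       # entropy of the residual
        crs.append(lossless_cr(m, n, k, h))     # Eq. (15)
    return list(ks)[int(np.argmax(crs))], crs
```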

We can learn several things from the curves in Fig. 7. First, HSI data are compressible, but their compressibility depends on the right choice of k in the presented algorithms. Some hints for choosing k can be obtained from the singular values and singular vectors; for example, in Fig. 3, the singular vectors after the sixth look more and more like noise, which tells us that most of the information is contained in the first six singular vectors. Second, we have empirically demonstrated that the entropy of the residuals approximately follows the power law $(k/\alpha)^{-s}$, as illustrated in Fig. 1; hence the CR curve has a sharp peak around the optimal k, which is relatively easy to locate compared with flatter curves. Further tests are needed to develop robust methods for obtaining near-optimal CRs; the adaptive selection of the rank parameter in the rSVD calculation21 can serve as an important first step in this direction. Third, since the rSVD algorithm is near optimal in terms of the Frobenius norm, i.e., in the mean-squared-error sense, the similar curves produced by the other randomized algorithms demonstrate that they share the near optimality of the rSVD algorithm.

Table 1: Lossless compression ratios (CRs) of hyperspectral imaging (HSI) data for combinations of randomized methods with coding algorithms.

                      Algorithm 1   Algorithm 3   Algorithm 5   Algorithm 7
  LZW                     1.438         1.338         1.569         1.563
  Huffman coding          2.353         2.022         2.328         2.316
  Arithmetic coding       2.362         2.017         2.326         2.313
  JPEG2000                2.414         2.189         2.430         2.419

[Figure 7: CRs of the Indian Pines HSI dataset as a function of the target rank k, shown in four panels for Algorithm 1 (RP), Algorithm 3 (DRP), Algorithm 5 (rSVD), and Algorithm 7 (rSVD-DRP), with one curve per block (Blocks 1-9) and their mean.]



To further examine this suboptimality, we compare the Frobenius norms and entropies of the residuals produced by the four randomized algorithms with those produced by the exact TSVD. Figure 8(a) shows, for the nine blocks of HSI data, the ratio of the Frobenius norm of the residual from the exact TSVD to that from each algorithm, while Fig. 8(b) shows the corresponding ratios of the entropies. A ratio of 1 represents exact optimality, and higher ratios are closer to optimal than lower ones. In terms of the Frobenius norm, three of the four algorithms are fairly close to optimal, while Algorithm 3, the DRP algorithm, is less so. In terms of the entropy, all four algorithms are fairly close to optimal, which explains why the CRs of the four algorithms are all fairly close to each other. Interestingly, in Fig. 8(b) we observe ratios even higher than 1, meaning that in some cases the entropies of the residuals from these algorithms can be even lower than those from the exact TSVD.

    3.4   Time Performance of Randomized Dimensionality Reduction 

If lossy compression of HSI data is preferred, randomized dimensionality reduction methods can perform in near real time. Figure 9 shows the amount of time (in seconds) that each encoder of Sec. 2 takes to process each data block $A_i$, ignoring the residuals. Notice that all encoders take less than 5 s for each of the nine data blocks. The computation times of the RP encoder (Algorithm 1) and the DRP encoder (Algorithm 3) do not appear to be significantly different; both take less than 2.4 s per data block, averaging about 2.3 s over all nine blocks. This translates to a mean throughput of 182,699 × 220 × 8 / 2.3 ≈ 140 MB/s (recall that the original unsigned 16-bit integers are converted to double precision, i.e., 8 bytes per sample, before processing). The green curve, corresponding to the rSVD encoder (Algorithm 5), shows the best performance, while the black curve, corresponding to the rSVD-DRP encoder (Algorithm 7), is the slowest, but still takes less than 5 s per block; the extra time is spent in step 3, computing the pseudo-inverse of $P_1^T Q$. Efficient non-Matlab implementations of the encoding algorithms presented in this paper, on platforms such as GPUs, would be expected to perform in real time. For lossless compression, our tests show that the low-entropy residuals may be effectively compressed with conventional tools such as gzip in less than 4 s per data block, or with better-performing tools such as JPEG2000, which can compress each block within 4.5 s. For the Huffman and arithmetic coding algorithms, computation would take significantly longer without the assistance of special acceleration platforms such as GPUs or FPGAs.

[Figure 8: (a) For the nine blocks of HSI data, the ratios of the Frobenius norm of the residuals from the exact TSVD to those from each of Algorithms 1, 3, 5, and 7. (b) The corresponding ratios of the entropies of the residuals.]

For comparison, we also ran 3-D-SPECK and 3-D-SPIHT on a 512 × 512 × 128 subset, and both algorithms needed over 2 min to provide lossless compression. Christophe and Pearlman also reported over 2 min of processing time using 3-D-SPIHT with random access for a similarly sized dataset.8

    4 Conclusions and Discussions

As HSI datasets grow in size, compression and dimensionality reduction for analytical purposes become increasingly critical for storage, data transmission, and subsequent postprocessing. This paper shows the potential of randomized algorithms for efficient and effective compression and reconstruction of massive HSI datasets. Building upon random projection and rSVD algorithms, we have further developed a DRP method, usable as a standalone encoding algorithm or in combination with the rSVD algorithm. The DRP algorithm slightly sacrifices CR while adding lightweight encryption security.

We have demonstrated that for a large HSI dataset, such as the Indian Pines dataset, theoretical CRs close to 3 are possible, while empirical CRs can be as high as 2.43 based on testing a limited number of coding algorithms. We have also used the rSVD algorithm to estimate near-optimal target ranks simply by inspecting the approximate singular vectors. Choosing optimal parameters for dimensionality reduction using randomized methods is a topic of future research; the adaptive rank selection method described in Ref. 21 offers an initial step in this direction. In terms of the suboptimality of the randomized algorithms, we have compared them with the exact TSVD in terms of the Frobenius norm and the entropy of the residuals, both of which appear to be near optimal empirically.

The presented randomized algorithms can be regarded as lossy compression algorithms, which need to be combined with residual-coding algorithms for lossless compression. We have shown empirically that the entropy of the residual (original data minus low-rank approximation) decreases significantly for HSI data. Conventional entropy-based methods for integer coding are expected to perform well on these low-entropy residuals. Integrating advanced residual-coding algorithms with the randomized algorithms is an important topic for future study.

One concern for residual coding is speed. In this regard, recent developments in floating-point coding34 have shown throughputs reaching as high as 75 Gb/s on a GPU; on an eight-Xeon-core computer, we have observed throughputs near 20 Gb/s. Both of these throughputs should be sufficient for coding the required HSI residual data. Saving the residuals back as 16-bit integers can further reduce the computation time.

[Figure 9: The computation time per data block for lossy compression using Algorithms 1 (RP), 3 (DRP), 5 (rSVD), and 7 (rSVD-DRP).]



    Acknowledgments

Research by R. Plemmons and Q. Zhang is supported by the U.S. Air Force Office of Scientific Research (AFOSR), under Grant FA9550-11-1-0194.

    References

1. M. T. Eismann, Hyperspectral Remote Sensing, SPIE Press, Bellingham, WA (2012).
2. H. F. Grahn and P. Geladi, Techniques and Applications of Hyperspectral Image Analysis, John Wiley & Sons Ltd., West Sussex, England (2007).
3. J. Bioucas-Dias et al., "Hyperspectral unmixing overview: geometrical, statistical, and sparse regression-based approaches," IEEE J. Sel. Top. Appl. Earth Obs. Rem. Sens. 5(2), 354-379 (2012), http://dx.doi.org/10.1109/JSTARS.2012.2194696.
4. J. Zhang et al., "Evaluation of JP3D for lossy and lossless compression of hyperspectral imagery," in 2009 IEEE Int. Geoscience and Remote Sensing Symposium, IGARSS 2009, Vol. 4, pp. IV-474, IEEE, Cape Town, South Africa (2009).
5. B. Kim, Z. Xiong, and W. Pearlman, "Low bit-rate scalable video coding with 3-D set partitioning in hierarchical trees (3-D SPIHT)," IEEE Trans. Circuits Syst. Video Technol. 10(8), 1374-1387 (2000), http://dx.doi.org/10.1109/76.889025.
6. X. Tang, S. Cho, and W. Pearlman, "Comparison of 3D set partitioning methods in hyperspectral image compression featuring an improved 3D-SPIHT," in Proc. Data Compression Conf., 2003, DCC 2003, p. 449, IEEE, Snowbird, UT (2003).
7. Y. Langevin and O. Forni, "Image and spectral image compression for four experiments on the ROSETTA and Mars Express missions of ESA," in Int. Symp. Optical Science and Technology, pp. 364-373, SPIE Press, Bellingham, WA (2000).
8. E. Christophe and W. Pearlman, "Three-dimensional SPIHT coding of volume images with random access and resolution scalability," J. Image Video Process. 13, Article 2 (2008), http://dx.doi.org/10.1155/2008/248905.
9. J. C. Harsanyi and C. Chang, "Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach," IEEE Trans. Geosci. Rem. Sens. 32(4), 779-785 (1994), http://dx.doi.org/10.1109/36.298007.
10. A. Castrodad et al., "Learning discriminative sparse models for source separation and mapping of hyperspectral imagery," IEEE Trans. Geosci. Rem. Sens. 49(11), 4263-4281 (2011), http://dx.doi.org/10.1109/TGRS.2011.2163822.
11. C. Li et al., "A compressive sensing and unmixing scheme for hyperspectral data processing," IEEE Trans. Image Process. 21(3), 1200-1210 (2012), http://dx.doi.org/10.1109/TIP.2011.2167626.
12. X. Tang and W. Pearlman, "Three-dimensional wavelet-based compression of hyperspectral images," in Hyperspectral Data Compression, pp. 273-308, Springer, New York (2006).
13. H. Wang, S. Babacan, and K. Sayood, "Lossless hyperspectral-image compression using context-based conditional average," IEEE Trans. Geosci. Rem. Sens. 45(12), 4187-4193 (2007), http://dx.doi.org/10.1109/TGRS.2007.906085.
14. G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., The Johns Hopkins University Press, Baltimore, MD (1996).
15. I. Jolliffe, Principal Component Analysis, 2nd ed., Springer, New York (2002).
16. Q. Du and J. Fowler, "Hyperspectral image compression using JPEG2000 and principal component analysis," IEEE Geosci. Rem. Sens. Lett. 4(2), 201-205 (2007), http://dx.doi.org/10.1109/LGRS.2006.888109.
17. J. Fowler, "Compressive-projection principal component analysis," IEEE Trans. Image Process. 18(10), 2230-2242 (2009), http://dx.doi.org/10.1109/TIP.2009.2025089.
18. P. Drineas and M. W. Mahoney, "A randomized algorithm for a tensor-based generalization of the SVD," Linear Algebra Appl. 420(2-3), 553-571 (2007), http://dx.doi.org/10.1016/j.laa.2006.08.023.
19. J. Zhang et al., "Randomized SVD methods in hyperspectral imaging," J. Elect. Comput. Eng., Article 3, in press (2012), http://dx.doi.org/10.1155/2012/409357.
20. L. Trefethen and D. Bau, Numerical Linear Algebra, Lecture 31, SIAM, Philadelphia, PA (1997).
21. N. Halko, P. G. Martinsson, and J. A. Tropp, "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions," SIAM Rev. 53(2), 217-288 (2011), http://dx.doi.org/10.1137/090771806.
22. Y. Chen, N. Nasrabadi, and T. Tran, "Effects of linear projections on the performance of target detection and classification in hyperspectral imagery," J. Appl. Rem. Sens. 5(1), 053563 (2011), http://dx.doi.org/10.1117/1.3659894.
23. Q. Zhang et al., "Joint segmentation and reconstruction of hyperspectral data with compressed measurements," Appl. Opt. 50(22), 4417-4435 (2011), http://dx.doi.org/10.1364/AO.50.004417.
24. M. Gehm et al., "Single-shot compressive spectral imaging with a dual-disperser architecture," Opt. Express 15(21), 14013-14027 (2007), http://dx.doi.org/10.1364/OE.15.014013.
25. A. Wagadarikar et al., "Single disperser design for coded aperture snapshot spectral imaging," Appl. Opt. 47(10), B44-B51 (2008), http://dx.doi.org/10.1364/AO.47.000B44.
26. S. Vempala, The Random Projection Method, Vol. 65, American Mathematical Society, Providence, RI (2004).
27. Q. Zhang, V. P. Pauca, and R. Plemmons, "Image reconstruction from double random projections," (2013), to be submitted.
28. W. Bajwa et al., "Toeplitz-structured compressed sensing matrices," in IEEE/SP 14th Workshop on Statistical Signal Processing, 2007, SSP'07, pp. 294-298, IEEE, Madison, WI (2007).
29. R. Archibald and G. Fann, "Feature selection and classification of hyperspectral images with support vector machines," IEEE Geosci. Rem. Sens. Lett. 4(4), 674-677 (2007), http://dx.doi.org/10.1109/LGRS.2007.905116.
30. J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding," IEEE Trans. Inf. Theor. 24(5), 530-536 (1978), http://dx.doi.org/10.1109/TIT.1978.1055934.
31. K. Skretting, J. H. Husøy, and S. O. Aase, "Improved Huffman coding using recursive splitting," in Proc. Norwegian Signal Processing, NORSIG, IEEE, Norway (1999).
32. M. Nelson and J.-L. Gailly, The Data Compression Book, 2nd ed., M & T Books, New York, NY (1995).
33. T. Acharya and P.-S. Tsai, JPEG2000 Standard for Image Compression: Concepts, Algorithms and VLSI Architectures, John Wiley & Sons Ltd., Hoboken, NJ (2005).
34. M. O'Neil and M. Burtscher, "Floating-point data compression at 75 Gb/s on a GPU," in Proc. Fourth Workshop on General Purpose Processing on Graphics Processing Units, p. 7, ACM, New York, NY (2011).

Biographies and photographs of the authors are not available.