
QUALITY AND INFORMATION CONTENT OF CHRIS HYPER-SPECTRAL DATA

B. Aiazzi, S. Baronti, P. Marcoionni, I. Pippi, and M. Selva

Inst. of Applied Physics "Nello Carrara", IFAC-CNR, Via Panciatichi, 64, 50127 Florence, Italy

ABSTRACT

The work focuses on evaluating the quality and estimating the information content of CHRIS hyper-spectral images. Quality is assessed through the characterisation of the noise, while information is estimated by means of an operative definition according to which the information content of a data set is given by the amount of information that cannot be predicted from the data that have already been acquired and, thus, by the entropy of the prediction errors. The noise model is first verified and the parameters of the model are then estimated. Afterwards, lossless data compression is exploited to measure the entropy of the prediction errors through their bit-rate. The information content of the data is estimated by taking into account that the bit-rate achieved by the reversible compression process is due to the contributions of both the noise, whose relevance is null to a user, and the hypothetically noise-free data. Since our goal is to estimate the amount of information of the ideal noise-free data, an entropy-variance model is assumed for the ideal image. Once all the parameters of the model have been estimated, the entropy of the noise-free source is derived. Results are reported and discussed for hyper-spectral data sets acquired by the CHRIS spectrometer. Information is first assessed before and after the radiometric correction process, in order to evaluate any effect introduced in the processed data. Then different areas of the same image are processed in order to assess the noise model. Eventually, the procedure is utilised to characterise data sets acquired at different times, in order to verify and assess any potential operational change that may have occurred in the instrument set-up or in the processing chain.

Key words: CHRIS hyperspectral data; Correlation analysis; Generalised Gaussian PDF; Information theoretic assessment; Lossless compression; Noise estimation; Noise modelling; Parametric entropy modelling.

1. INTRODUCTION

Information-theoretic assessment is a branch of image analysis aimed at defining and measuring the quality of digital images (Ref. 1). In the field of optical remote sensing, the fidelity of the data produced by the sensor to the underlying radiance field is another concern (Ref. 2), which can be related to the concept of quality as well. Although this term may be intended as the capability to fulfil the user's expectations, and thus cannot be defined in a universal context, being application dependent, quality may be related to the information content of the data. Hence, it can be defined in an objective manner jointly from the signal-to-noise ratio (SNR) and from the entropy of the digitised images (Ref. 3). On the other hand, the availability of an objective quality measurement may be helpful also for application tasks, especially when hyperspectral images are concerned. Due to the huge amount of data, an extraction of features reflecting the main aspects of spectral behaviour may be preferable to a straightforward classification (Ref. 4). A preliminary information-theoretic assessment may suggest which bands are more significant, i.e., potentially capable of conveying an amount of information larger than that of other bands, so as to reduce the volume of the data to be processed without noticeable penalty.

This work is based on a model suitable for quantifying the information content of digitised multi-dimensional signals, more specifically hyperspectral images. Accurate estimates of the entropy of an image source can only be obtained provided that the data are uncorrelated. Hence, data decorrelation must be considered in order to suppress, or largely reduce, the correlation existing in natural images. Indeed, entropy is a measure of statistical information, that is, of the uncertainty of the symbols emitted by a source. Hence, any observation noise introduced by the imaging sensor will result in an increment in entropy, which is accompanied by a decrement of the information content useful in application contexts, according to Shannon's Information Theory (Ref. 5). An estimation of the noise must be preliminarily carried out in order to quantify its contribution to the overall source entropy rate. By assuming an additive noise independent of the signal and spatially stationary, i.e., statistically homogeneous, the noise parameters (variance and correlation coefficients across track, along track, and along wavelength) can be estimated on the homogeneous areas of the signal. Once the standard deviation and the correlation coefficients (CCs) of the noise have been measured, the bit rate produced by the reversible encoder can be utilised to yield an estimate of the true information content of the multispectral source, i.e., of the entropy that the source would have if it were noise-free. To this purpose, a model is devised from rate-distortion theory describing how the relationship between entropy and variance of an uncorrelated non-Gaussian source changes when a stationary white Gaussian random process is superimposed. Such a model can be inverted to yield the entropy of the noise-free source from that of the observed source and from the estimated parameters of the noise.

_____________________________________________________ Proc. of the 3rd ESA CHRIS/Proba Workshop, 21–23 March, ESRIN, Frascati, Italy, (ESA SP-593, June 2005)

The remainder of this paper is organised as follows. Sect. 2 presents the information-theoretic procedure step by step: the assumed noise model, source decorrelation by DPCM, and parametric entropy modelling of memoryless information sources via generalised Gaussian densities. Sect. 3 reports experimental results on several sets of CHRIS hyperspectral images. Concluding remarks are drawn in Sect. 4.

2. INFORMATION ASSESSMENT PROCEDURE

2.1. Noise modelling

This section focuses on modelling the noise affecting digitised observed signal samples. Unlike coherent or systematic disturbances, which may occur in some kinds of data, the noise is assumed to be due to a fully stochastic process. Let us assume for the noise an additive signal-independent non-Gaussian model:

g(i) = f(i) + n(i) (1)

in which g(i) is the recorded noisy signal level at position i and f(i) is the noise-free signal. Both g(i) and f(i) are regarded as non-stationary, non-Gaussian, autocorrelated stochastic processes. The term n(i) is a zero-mean process, independent of f, stationary and autocorrelated. Let its variance σn² and correlation coefficient (CC) ρ be constant.

Let us assume for the stationary zero-mean noise a first-order Markov model, uniquely defined by ρ and σn²:

n(i) = ρ · n(i − 1) + εn(i)    (2)

in which εn(i) is an uncorrelated random process having variance

σεn² = σn² · (1 − ρ²).    (3)

The variance of (1) can be easily calculated as

σg²(i) = σf²(i) + σn²    (4)

thanks to the independence between signal and noise components and to the spatial stationarity of the latter. From (2) it stems that the autocorrelation of n(i) is an exponentially decaying function of the correlation coefficient:

Rnn(m) ≜ E[n(i) n(i + m)] = ρ^|m| · σn².    (5)

The zero-mean additive signal-independent correlated noise model (2) is relatively simple and mathematically tractable. Its accuracy has been validated for 2D and 3D signals produced by incoherent systems (Ref. 6), by measuring the exponential decay of the correlation function (5).

The noise samples n(i) may be estimated on homogeneous signal segments, in which f(i) is constant, by taking the difference between g(i) and its local average ḡ(i), computed on a sliding window of length 2m + 1.

Once the CC of the noise, ρ, and the most homogeneous image pixels have been found by means of robust bivariate regression procedures (Ref. 6), the noise samples may be estimated in the following way. If (2) and (5) are utilised to calculate the correlation of the noise affecting g and ḡ on a homogeneous window, the estimated noise sample at the i-th position can be written as

n̂(i) = √{(2m + 1) / [(2m + 1) − (1 + 2ρ(1 − ρ^m)/(1 − ρ))]} · [g(i) − ḡ(i)].    (6)

Eq. (6) is a generalisation of the corresponding expression reported in (Ref. 7). In fact, when ρ = 0, both equations give the same result, if the equation in (Ref. 7) is written to represent 1-D signals.
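As an illustrative sketch (not the authors' code), Eq. (6) can be implemented for a 1-D signal as follows; the AR(1) noise generator and the window half-length m are assumptions chosen for the demo:

```python
import math
import random

def estimate_noise(g, rho, m):
    """Estimate noise samples via Eq. (6): a sliding-window mean is
    subtracted from g(i) and the result is rescaled to compensate for
    the bias introduced by the noise correlation (CC rho).
    Assumes the noise-free signal f(i) is constant within each window."""
    w = 2 * m + 1
    # correction term accounting for the correlation of the noise
    corr = 1.0 + 2.0 * rho * (1.0 - rho ** m) / (1.0 - rho)
    scale = math.sqrt(w / (w - corr))
    noise = []
    for i in range(m, len(g) - m):
        local_mean = sum(g[i - m:i + m + 1]) / w
        noise.append(scale * (g[i] - local_mean))
    return noise

def ar1_noise(n_samples, rho, sigma, seed=0):
    """First-order Markov (AR(1)) noise as in Eq. (2), stationary variance sigma^2."""
    rng = random.Random(seed)
    out = [rng.gauss(0.0, sigma)]
    eps_sigma = sigma * math.sqrt(1.0 - rho ** 2)
    for _ in range(n_samples - 1):
        out.append(rho * out[-1] + rng.gauss(0.0, eps_sigma))
    return out
```

On a constant signal corrupted by AR(1) noise with ρ = 0.5 and σn = 2, the sample standard deviation of the estimated noise samples should come out close to 2.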

The resulting set {n̂(i)} is made available to find the noise PDF, either empirical (histogram) or parametric, via proper modelling.

2.2. Source de-correlation via DPCM

Differential Pulse Code Modulation (DPCM) is usually employed for reversible data compression. DPCM basically consists of a prediction followed by entropy coding of the resulting prediction errors. For the sake of clarity, we will develop the analysis for a 1D fixed DPCM and will extend its results to the case of 2D and 3D adaptive prediction (Ref. 8, 9, 10, 11).

Let ĝ(i) denote the prediction at pixel i obtained as a linear regression of the values of P previous pixels:

ĝ(i) = Σ_{j=1}^{P} φ(j) · g(i − j)    (7)

in which {φ(j), j = 1, …, P} are the coefficients of the linear predictor and are constant throughout the image.

By replacing the additive noise model (1) in (7) one obtains:

ĝ(i) = f̂(i) + Σ_{j=1}^{P} φ(j) · n(i − j)    (8)

in which

f̂(i) = Σ_{j=1}^{P} φ(j) · f(i − j)    (9)

represents the prediction for the noise-free signal as formulated from its previous samples. Prediction errors of g are

eg(i) ≜ g(i) − ĝ(i) = ef(i) + n(i) − Σ_{j=1}^{P} φ(j) · n(i − j)    (10)

in which ef(i) ≜ f(i) − f̂(i) is the error the predictor would produce starting from noise-free data. Both eg(i) and ef(i) are zero-mean, uncorrelated, non-stationary processes. The zero-mean property stems from an assumption of local first-order stationarity within the (P + 1)-pixel window comprising the current pixel and its prediction support.
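A minimal sketch of fixed-predictor DPCM for a 1-D integer signal (the predictor coefficients and the test signal below are illustrative, not the adaptive 2D/3D predictors cited above):

```python
import math
from collections import Counter

def dpcm_residuals(g, phi):
    """Prediction errors e_g(i) = g(i) - round(g_hat(i)), where the linear
    prediction g_hat(i) = sum_j phi(j) * g(i - j) follows Eq. (7)."""
    P = len(phi)
    return [g[i] - round(sum(phi[j] * g[i - 1 - j] for j in range(P)))
            for i in range(P, len(g))]

def entropy_bits(symbols):
    """First-order (memoryless) entropy of the residual source, in bit/sample."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

For a linear ramp, the second-order predictor φ = (2, −1) is exact, so the residuals are all zero and the memoryless rate of the residual source drops to 0 bit/sample.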

Eq. (10) may be written as

eg(i) = ef(i) + en(i)    (11)

in which

en(i) ≜ n(i) − n̂(i) = n(i) − Σ_{j=1}^{P} φ(j) · n(i − j)    (12)

is the error produced when the correlated noise is predicted. The term en(i) is assumed to be zero-mean, stationary and independent of ef(i), since f and n are assumed to be independent of each other. Thus, the relationship among the variances of the three types of prediction errors becomes

σeg²(i) = σef²(i) + σen².    (13)

From the noise model (2) it is easily noticed that the term σen² is lower bounded by σεn², which means that σen² ≥ σn² · (1 − ρ²). The optimum MMSE predictor for a first-order Markov model like (2) is φ(1) = ρ and φ(j) = 0, j = 2, …, P; it yields σen² = σn² · (1 − ρ²) = σεn², as can easily be verified. Thus, the residual variance of the noise after de-correlation may be approximated from the estimated variance of the correlated noise, σn², and from its estimated CC, ρ, as

σen² ≅ σn² · (1 − ρ²)    (14)

the approximation being more accurate the closer the predictor comes to the optimal MMSE performance.
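The relation in Eq. (14) can be checked numerically: simulate AR(1) noise as in Eq. (2), apply the optimum one-tap MMSE predictor φ(1) = ρ, and compare the residual variance with σn²(1 − ρ²). The parameter values below are illustrative:

```python
import math
import random

def ar1(n, rho, sigma, seed=0):
    """AR(1) noise with stationary variance sigma^2, as in Eq. (2)."""
    rng = random.Random(seed)
    eps = sigma * math.sqrt(1.0 - rho ** 2)
    x = [rng.gauss(0.0, sigma)]
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, eps))
    return x

rho, sigma = 0.6, 1.0
noise = ar1(200000, rho, sigma, seed=42)
# residual of the optimum MMSE predictor phi(1) = rho
resid = [noise[i] - rho * noise[i - 1] for i in range(1, len(noise))]
var_resid = sum(e * e for e in resid) / len(resid)
predicted = sigma ** 2 * (1.0 - rho ** 2)   # Eq. (14): 0.64 for these values
```

With 200 000 samples the measured residual variance agrees with the prediction to within about one percent.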

2.3. Entropy modelling

Given a stationary memoryless source S uniquely defined by its PDF p(x), having zero mean and variance σ², linearly quantised with a step size Δ, the minimum bit-rate needed to encode one of its symbols is (Ref. 12):

R ≅ h(S) − log₂ Δ    (15)

in which h(S) is the differential entropy of S, defined as

h(S) = −∫_{−∞}^{∞} p(x) log₂ p(x) dx = (1/2) log₂(c · σ²)    (16)

with 0 < c ≤ 2πe a positive constant accounting for the shape of the PDF and attaining its maximum for a Gaussian function. Such a constant will be referred to in the following as the entropy factor. The approximation in (15) holds for σ ≫ Δ, but is still acceptable for σ > Δ (Ref. 13).
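For a Gaussian source, c = 2πe, so with Δ = 1 Eqs. (15)-(16) predict R ≈ ½ log₂(2πe σ²). The sketch below (sample size and σ are illustrative) compares this model rate with the empirical entropy of integer-rounded Gaussian samples:

```python
import math
import random
from collections import Counter

def rate_model(sigma, c=2 * math.pi * math.e, delta=1.0):
    """Eqs. (15)-(16): R ~= h(S) - log2(delta) = 0.5*log2(c*sigma^2) - log2(delta)."""
    return 0.5 * math.log2(c * sigma ** 2) - math.log2(delta)

def empirical_rate(samples):
    """First-order entropy of the integer-quantised samples (delta = 1)."""
    counts = Counter(round(x) for x in samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

rng = random.Random(7)
data = [rng.gauss(0.0, 1.0) for _ in range(200000)]
model = rate_model(1.0)        # ~2.05 bit/sample
measured = empirical_rate(data)
```

With σ = Δ the approximation is coarse (the measured value is slightly higher, about 2.10 bit/sample); it tightens as σ/Δ grows, consistent with the remark after Eq. (16).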

Now, the minimum average bit-rate Rg necessary to reversibly encode an integer-valued sample of g may be approximated as in Eq. (15), in which the prediction errors are regarded as an uncorrelated source G ≡ {eg(i)} and are linearly quantised with a step size Δ = 1:

Rg ≅ h(G) = (1/2) log₂(cg · σeg²)    (17)

in which σeg² is the average variance of eg(i). By averaging (13) and replacing it into (17), Rg may be written as

Rg = (1/2) log₂[cg · (σef² + σen²)]    (18)

where σef² is the average variance of σef²(i). If σef² = 0, then (18) reduces to

Rg ≡ Rn = (1/2) log₂(cn · σen²)    (19)

in which cn = 2πe is the entropy factor of the PDF of en, if n is Gaussian. Analogously, if σen² = 0, then (18) becomes:

Rg ≡ Rf = (1/2) log₂(cf · σef²)    (20)

in which cf ≤ 2πe is the entropy factor of the prediction errors of the noise-free image, which are generally non-Gaussian.

The average entropy of the noise-free signal f in the case of correlated noise is obtained by replacing (14) in (20), to yield

Rf = (1/2) log₂{cf · [σeg² − (1 − ρ²) · σn²]}.    (21)

Since σeg² can be measured during compression by averaging σeg²(i), cf is the only unknown parameter, and its determination is crucial for the estimation accuracy of Rf. cf is obtained by modelling ef as a Generalised Gaussian density (GGD) function whose parameters are to be estimated. After reporting the definition of the GGD, the next two sections show how the model parameters of GGDs can be estimated and how correlated noise can be modelled by GGDs.
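Once the noise parameters and cf are available, Eq. (21) is a one-line computation. The sketch below uses illustrative numbers (not measured CHRIS values) and, for simplicity, the Gaussian entropy factor 2πe in place of the estimated cf:

```python
import math

def noise_free_rate(var_eg, var_n, rho, c_f):
    """Eq. (21): entropy estimate of the hypothetically noise-free signal,
    given the average variance of the prediction errors (var_eg), the noise
    variance (var_n), the noise CC (rho) and the entropy factor c_f of e_f."""
    var_ef = var_eg - (1.0 - rho ** 2) * var_n   # Eqs. (13)-(14)
    return 0.5 * math.log2(c_f * var_ef)

C_GAUSS = 2 * math.pi * math.e   # Gaussian entropy factor, upper bound for c_f

# Illustrative measured quantities for a hypothetical band
rf = noise_free_rate(var_eg=100.0, var_n=25.0, rho=0.5, c_f=C_GAUSS)
```

With these numbers the noise removes (1 − 0.25) · 25 = 18.75 variance units from the 100 measured for the prediction errors, giving rf ≈ 5.22 bit/sample; a smaller noise variance would leave the rate higher, as Eq. (21) dictates.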

2.4. Generalised Gaussian PDF

A model suitable for describing unimodal non-Gaussian amplitude distributions may be achieved by varying the parameters ν (shape factor) and σ (standard deviation) of the Generalised Gaussian density (GGD) (Ref. 14, 15, 16), which is defined as

pGG(x) = [ν · η(ν, σ) / (2 · Γ(1/ν))] · exp{−[η(ν, σ) · |x|]^ν}    (22)

in which

η(ν, σ) = (1/σ) · [Γ(3/ν)/Γ(1/ν)]^{1/2}    (23)

and Γ(·) is the Gamma function, i.e., Γ(z) = ∫₀^∞ t^{z−1} e^{−t} dt, z > 0. Since Γ(n) = (n − 1)!, when ν = 1 a Laplacian law is obtained; ν = 2 yields a Gaussian distribution. As limit cases, for ν → 0, pGG(x) becomes an impulse function, yet having extremely heavy tails and thus nonzero variance σ², whereas for ν → ∞, pGG(x) approaches a uniform distribution having variance σ² as well. The shape parameter ν rules the exponential rate of decay: the larger ν, the flatter the PDF; the smaller ν, the more peaked the PDF. Fig. 1(a) shows the trend of the GG function for different values of ν.

The matching between a GGD and the empirical data distribution can be obtained following a maximum likelihood (ML) approach (Ref. 17), which has the disadvantage of a cumbersome numerical solution. In this work, three simple and effective methods, suitable for real-time applications and based on fitting a parametric function of the modelled source to statistics calculated from the observed data, are reviewed hereafter.

2.4.1. Higher-order moment method

This method falls into the category of moment methods, being based on matching the moments of the data set with those of the assumed distribution, in order to estimate variance and shape factor. For a GGD, the ratio of the second-order moment to the square root of the fourth-order moment is a steadily increasing function of the shape factor ν (Ref. 18):

FHM(ν) = Γ(3/ν) / √[Γ(1/ν) · Γ(5/ν)].    (24)

Given N i.i.d. zero-mean random variables {x₁, x₂, …, x_N} following a GGD p(x), let μ₂ = (1/N) Σᵢ xᵢ² be the estimated second-order moment and μ₄ = (1/N) Σᵢ xᵢ⁴ the estimated fourth-order moment. The parameter ν is estimated by inverting (24), that is, by solving

ν̂ = FHM⁻¹(μ₂ / √μ₄).    (25)

In practical implementations, the values of (24) are calculated at uniform step sizes and stored in a look-up table whose entries are the corresponding values of μ₂/√μ₄.

2.4.2. Mallat’s method

Another moment-based method was briefly introduced by Mallat (Ref. 19) and then developed in greater detail by Sharifi and Leon-Garcia (Ref. 20). Again, the ratio of the mean absolute value to the standard deviation of a GGD is a steadily increasing function of the shape factor ν:

FM(ν) = Γ(2/ν) / √[Γ(1/ν) · Γ(3/ν)].    (26)

Given N i.i.d. zero-mean random variables obeying a GGD p(x), let m₁ = (1/N) Σᵢ |xᵢ| be the estimate of the mean absolute value and σ̂² = (1/N) Σᵢ xᵢ² the sample variance. The parameter ν is estimated by inverting (26), that is, by solving

ν̂ = FM⁻¹(m₁/σ̂).    (27)

Again, the values of (26) are pre-calculated and stored in a look-up table indexed by the values of m₁/σ̂.

Figure 1. (a) Unity-variance GG density plotted for several values of ν; (b) shape functions FHM (24), FM (26) and FH (29) of a GG PDF as functions of the shape factor ν.

2.4.3. Entropy matching method

The method developed by the authors (Ref. 21) relies on fitting the entropy of the modelled source to that of the empirical data. Since entropy is not a moment (the accuracy of moment estimation requires larger and larger sample sizes as the order increases), this method is particularly suitable for small sample sizes.

Given a GGD (22), its differential entropy (16), hGG, is a function of both σ and ν:

hGG(σ, ν) = −log₂[ν · η(ν, σ) / (2 · Γ(1/ν))] + 1/(ν ln 2).    (28)

Let H = −Σ_l p_l log₂ p_l be the entropy of the memoryless source, in which the p_l's are the probabilities of the integer-valued data obtained after quantising with Δ = 1. Replacing (23) in (28) and equating (28) to H yields

H − log₂ σ = −log₂[ν · Γ(3/ν)^{1/2} / (2 · Γ(1/ν)^{3/2})] + 1/(ν ln 2) ≜ FH(ν).    (29)

Since (29) is a steadily increasing function for 0 < ν < 2, once H and σ have been calculated from the sample data, the estimated shape factor ν̂ is found by inverting (29):

ν̂ = FH⁻¹(H − log₂ σ).    (30)

The values of (29) are calculated at non-uniform step sizes (increasing with ν) and stored in a look-up table whose entries are the values of H − log₂ σ.

Fig. 1(b) shows the trends of the higher-moment, Mallat's and entropy-matching functions against the GGD shape factor ν. In particular, the entropy-matching function measures the entropy of a stationary memoryless source that emits unity-variance GG-distributed symbols quantised with a unity step size (Ref. 21). It attains its maximum for ν = 2, FH(2) = log₂√(2πe) ≈ 2.05, and yields log₂(e√2) ≈ 1.94 when ν = 1. Eventually, lim_{ν→∞} FH(ν) = log₂(2√3) ≈ 1.79.

The entropy-matching method cannot be used for ν close to 2, because the function (29) cannot be univocally inverted around ν = 2. Since the slope of the function to be inverted is related to the accuracy of the inversion, the moments method is preferable to Mallat's method around the Gaussian case. The latter, however, is more accurate for small ν. The entropy method is more accurate than the other two methods for ν < 1, especially when ν < 0.5. The moments method fails for ν < 0.15, because its function turns out to be practically flat in that case. Conversely, the entropy method yields the most accurate results for very small ν, provided that the function is non-uniformly sampled.
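A sketch of the entropy function FH of Eq. (29), again via `lgamma`; the limit values quoted above can be verified directly:

```python
import math

LN2 = math.log(2.0)

def f_h(nu):
    """Eq. (29): entropy of a unity-variance GG source quantised with step 1."""
    log_num = math.log(nu) + 0.5 * math.lgamma(3.0 / nu)
    log_den = math.log(2.0) + 1.5 * math.lgamma(1.0 / nu)
    return -(log_num - log_den) / LN2 + 1.0 / (nu * LN2)
```

As expected, f_h(2) = ½ log₂(2πe) ≈ 2.05 (the Gaussian maximum), f_h(1) = log₂(e√2) ≈ 1.94 (Laplacian), and f_h(ν) approaches log₂(2√3) ≈ 1.79 as ν grows.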

2.5. GG modelling of correlated noise

Once the estimated samples of correlated noise n̂(i) have been found by means of (6), their amplitude can be easily matched by a GGD. However, the n̂(i) are not realisations of a memoryless source if the noise is correlated. In that case, the decorrelated noise residues en(i) are better suited for describing the entropy of the noise through a parametric model.

Eq. (2) shows that if the noise residue εn(i) is Gaussian, the noise n(i) will be Gaussian as well, regardless of correlation. Hence, if n(i) is estimated and found to be Gaussian, εn(i) will be white and Gaussian, thereby having parametric entropy exactly known (19). Conversely, if n(i) is not Gaussian, but its amplitude is described by a GGD with standard deviation σn and shape factor νn ≠ 2, then εn(i) is also GG-distributed, with variance σn²(1 − ρ²) and a shape factor νεn ≠ νn that depends on both νn and ρ (Ref. 18). A relationship found between the fourth-order moments of n(i) and εn(i) yields:

νεn = Fμ4⁻¹[(Fμ4(νn) − 6ρ²/(1 + ρ²)) · (1 + ρ²)/(1 − ρ²)]    (31)

where

Fμ4(ν) ≜ Γ(5/ν) · Γ(1/ν) / Γ²(3/ν).    (32)

Therefore, once νn and ρ have been estimated, the shape factor of the correlated noise residue, νεn, is found by calculating Fμ4(νn) (32), replacing its value in (31), and inverting Fμ4 by means of a look-up table.
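A sketch of Eqs. (31)-(32) with a look-up-table inversion of Fμ4; grid bounds and step are assumptions:

```python
import math

def f_mu4(nu):
    """Eq. (32): normalised fourth-order moment (kurtosis) of a GGD."""
    return math.exp(math.lgamma(5.0 / nu) + math.lgamma(1.0 / nu)
                    - 2.0 * math.lgamma(3.0 / nu))

def residue_shape(nu_n, rho, lo=0.2, hi=5.0, steps=961):
    """Eq. (31): shape factor of the decorrelated noise residue eps_n."""
    target = (f_mu4(nu_n) - 6.0 * rho ** 2 / (1.0 + rho ** 2)) \
             * (1.0 + rho ** 2) / (1.0 - rho ** 2)
    grid = [lo + k * (hi - lo) / (steps - 1) for k in range(steps)]
    return min(grid, key=lambda nu: abs(f_mu4(nu) - target))
```

For Gaussian noise (νn = 2, kurtosis Fμ4 = 3) the target reduces algebraically to 3 for any ρ, so the residue comes out Gaussian too, consistent with the remark above; Laplacian noise gives Fμ4(1) = 6.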

2.6. Information-theoretic assessment

Let us assume that the real-valued eg(i) may be modelled as a GGD. From (17) the entropy function is

(1/2) log₂(cg) = Rg − log₂(σeg) = FH(νeg)    (33)

in which νeg is the shape factor of eg(i), the average rate of which, Rg, has been set equal to the entropy H of the discrete source. νeg is found by inverting either the entropy function FH, or any other shape function; in the latter case the eg(i) produced by DPCM, instead of its variance, is directly used. Eventually, the parametric PDF of the uncorrelated observed source eg(i) is available.

The term eg(i) is obtained by adding a sample of white non-Gaussian noise of variance σen², approximately equal to (1 − ρ²) · σn², to a sample of the noise-free uncorrelated non-Gaussian signal ef(i). Furthermore, ef(i) and en(i) are independent of each other.

Therefore, the GG PDF of eg previously found will be given by the linear convolution of the unknown p_ef(x) with a GG PDF having variance σen² and shape factor νen. By assuming that the PDF of the noise-free residue, p_ef(x), is GG as well, its shape factor νef can be obtained starting from the forward relationship

pGG[σeg, νeg](x) = pGG[√(σeg² − σen²), νef](x) ⊗ pGG[σen, νen](x)    (34)

by de-convolving the PDF of the noise residue from that of the noisy-signal residue.

Figure 2. Flowchart of the information-theoretic assessment procedure for a digital signal.

In a practical implementation, the estimated value of νef is found such that the direct convolution on the right-hand side of (34) yields a GGD whose shape factor matches νeg as closely as possible.

Eventually, the estimated shape factor νef is used to determine the entropy function

(1/2) log₂(cf) = FH(νef)    (35)

which is replaced in (21) to yield the entropy of the noise-free signal, Rf.

Fig. 2 summarises the overall procedure.

Extension of the procedure to two-dimensional (2D) and three-dimensional (3D) signals, i.e., to digital images and sequences of digital images, is straightforward. In the former case, 2D prediction is used to find eg; two correlation coefficients, ρx and ρy, are estimated for the noise, whose variance after decorrelation is approximated as σn²(1 − ρx²)(1 − ρy²), by assuming a separable 2D Markov model. Analogously, Eq. (6), defining the estimated value of a sample of correlated noise, is extended as

n̂(i, j) = √{N² / [N² − (1 + 2ρx(1 − ρx^m)/(1 − ρx)) · (1 + 2ρy(1 − ρy^m)/(1 − ρy))]} · [g(i, j) − ḡ(i, j)]

and generalises the model reported in (Ref. 7), as previously discussed in Sect. 2.1. N = 2m + 1 is the length of the side of the square window on which the average ḡ(i, j) is calculated.

The 3D extension is more critical, because a sequence of images may have noise variances and spatial CCs that differ from image to image. Moreover, it is often desirable to estimate the entropy of the individual images of the sequence. Therefore, each image is de-correlated both spatially and along the third dimension by using 3D prediction.

3. EXPERIMENTAL RESULTS

3.1. CHRIS hyperspectral data

Figure 3. Colour composite of CHRIS data acquired on 18 September 2003: red 661 nm, green 502 nm, blue 442 nm.

The proposed information-theoretic procedure was run on several hyperspectral sequences collected by CHRIS over the San Rossore test site, in Central Italy, on different dates. The sequences are constituted by 18 bands with a mean width of 14.7 nm, in the range 442–1015 nm, and a ground resolution of about 17 m. The size of each image is 744 pixels across track and 748 along track. The data have a dynamic range of 19 bit. The RCI (Restored Corrected Images) data have been provided by the Sira Electro-Optics Ltd. company, whereas the L1B data have been radiometrically corrected at IFAC-CNR; an efficient de-striping algorithm has also been applied to mitigate the striping effect due to the diversity in gain and offset of the imaging sensor elements. Three main objectives were considered. The first was aimed at evaluating the quality of the radiometric corrections performed at IFAC-CNR.

The second was devoted to assessing the noise model on different classes of data, while the third consisted in verifying the variations of the noise parameters for different observations.

3.1.1. Radiometric correction assessment

The first step of the procedure concerns the estimation of the noise parameters for each band. Fig. 4 reports the plots of the noise standard deviation σn before and after radiometric corrections. The amount of noise is larger at shorter wavelengths. The de-striping process has the effect of reducing the contribution of the noise, which is apparent at shorter wavelengths. As a consequence, the curve tends to go down and becomes flatter in wavelength.

Figure 4. Noise standard deviation σn of the CHRIS test image plotted vs. wavelength (L1B and RCI).

Figure 5. Spectral CC ρλ of the noise plotted vs. wavelength (L1B and RCI).

A similar trend in wavelength appears in Fig. 5, where the spectral CC of the noise, ρλ, is plotted. Analogously to the standard deviation, ρλ is larger in the VIS than in the other parts of the spectrum. As expected, the de-striping process tends to reduce the spectral correlation, because it processes each band independently of the others.

In a similar way, we can justify the trend of the plots of the CCs of the noise across and along track, reported in Figs. 6 and 7, respectively. In fact, filtering across track has the effect of reducing the correlation along track and, conversely, of increasing the correlation across track.

Eventually, the shape factor of the GG-modelled noise is reported in Fig. 8 for the L1B data. Fig. 8 shows that, after radiometric correction and de-striping, the noise of CHRIS images is Gaussian to a very good approximation.

0.4 0.5 0.6 0.7 0.8 0.9 1−0.05

0.05

0.15

0.25

0.35

0.45

0.55

0.65

Wavelength (µm)C

HR

IS −

Noi

se C

C a

cros

s tr

ack

ρx L1B

RCI

Figure 6. Across-track CC ρx plotted vs. wavelength.

0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Wavelength (µm)

CH

RIS

− N

oise

CC

alo

ng tr

ack

ρy L1B

RCI

Figure 7. Along-track CC ρy plotted vs. wavelength.

0.4 0.5 0.6 0.7 0.8 0.9 11

1.25

1.5

1.75

2

2.25

2.5

2.75

3

Wavelength (µm)

CH

RIS

cal

ibra

ted

− N

oise

sha

pe fa

ctor

ν

Noise Shape FactorLinear regression

Figure 8. Noise shape factor ν vs. wavelength.


Figure 9. Estimated information Rg of observed radiance data plotted vs. wavelength (L1B and RCI data).


Figure 10. Estimated information Rf of ideal noise-free radiance data plotted vs. wavelength (L1B and RCI data).


Figure 11. Estimated information Rn due to noise for the observed radiance data plotted vs. wavelength (L1B and RCI data).

After the noise parameters have been determined, the information-theoretic assessment procedure has been run. Figs. 9, 10, and 11 report the scores relative to information content varying with wavelength. Rg and Rf are perfectly in trend, whereas Rn has been slightly reduced by the filtering step of the de-striping process.
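The split of the coded rate into a noise contribution and a noise-free contribution can be roughly illustrated as follows. The additive decomposition Rf ≈ Rg − Rn and the high-rate entropy formula for unit-step-quantised Gaussian noise are simplifications of the paper's parametric model, and the numbers are hypothetical:

```python
import math

def gaussian_noise_rate(sigma_n):
    """High-rate approximation of the entropy (bit/pel) of Gaussian noise
    quantised with unit step: Rn ~ 0.5 * log2(2*pi*e*sigma_n**2)."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * sigma_n ** 2)

def noise_free_rate(rg, sigma_n):
    """Crude additive split Rf ~ Rg - Rn (simplified illustration only)."""
    return rg - gaussian_noise_rate(sigma_n)

# Hypothetical band coded at Rg = 10.5 bit/pel with sigma_n = 40 counts
print(round(noise_free_rate(10.5, 40.0), 2))  # 3.13
```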

Although the useful information content is unchanged after the calibration procedure, the user can extract information more easily from the calibrated images, because the de-striping process has diminished the noise level, which tends to make the information content less intelligible.

It is noticeable that the results obtained for both the RCI and calibrated data are very similar, confirming, as expected, that the calibration process does not change the information content of the acquired data.

3.1.2. Variability with landscape

Although the information assessment procedure is able to find homogeneous areas occurring anywhere in the image, it turned out that the regions selected by the program were located mostly on the sea. That happened in particular when a restricted number of regions was selected in order to improve the reliability of the estimation.

To investigate this aspect, two subimages were initially considered. The first, named Sea, consisted of sea pixels only, while the second, named Land, consisted of land pixels only. The information assessment procedure was run on both subimages. As expected, no difference was found between the whole image and the Sea subimage. Concerning the Land subimage, we found that, depending on the band, the standard deviation of the noise, σn, substantially increased with respect to the Sea image. Initially we attributed this behaviour to a possible bias in σn caused by the presence of texture on areas that were believed homogeneous but probably were not. This conjecture was also supported by a general increase in the values of the correlation coefficients, as appears in Fig. 12, where ρx and ρy for the Land subimage are reported.
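The across- and along-track correlation coefficients discussed above can be sketched as sample lag-1 correlations of a residual patch; the function name and the white-noise demonstration below are illustrative, not code from the paper:

```python
import numpy as np

def lag1_cc(patch):
    """Across-track (rho_x) and along-track (rho_y) lag-1 correlation
    coefficients of a 2-D residual patch, e.g. the noise estimated on a
    homogeneous region."""
    p = patch - patch.mean()
    rho_x = np.corrcoef(p[:, :-1].ravel(), p[:, 1:].ravel())[0, 1]
    rho_y = np.corrcoef(p[:-1, :].ravel(), p[1:, :].ravel())[0, 1]
    return rho_x, rho_y

# White noise: both coefficients should be close to zero
rng = np.random.default_rng(0)
rx, ry = lag1_cc(rng.normal(size=(256, 256)))
print(round(rx, 3), round(ry, 3))
```

Residual texture leaking into the "homogeneous" regions behaves like a low-pass component and drives both coefficients up, which is the bias conjectured for the Land subimage.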

In order to verify the presence of such a bias, we decided to estimate σn also with the bit-plane algorithm reported in (Ref. 7). This approach is less accurate than the procedure that adopts homogeneous regions, but it has the advantage of being independent of region homogeneity.
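The bit-plane algorithm of (Ref. 7) is not reproduced here; the sketch below only illustrates the underlying idea under simplifying assumptions: least-significant bit planes dominated by noise look random (first-order entropy close to 1 bit), and if the k lowest planes are pure noise they behave like a uniform variable over 2**k levels. The function names and the 0.99 entropy threshold are hypothetical choices.

```python
import numpy as np

def random_bit_planes(img, threshold=0.99):
    """Count least-significant bit planes whose first-order entropy is
    close to 1 bit, i.e. planes that look random. Simplified illustration
    only; not the actual algorithm of (Ref. 7)."""
    k = 0
    for b in range(16):
        plane = (img >> b) & 1
        p1 = float(plane.mean())
        if p1 == 0.0 or p1 == 1.0:
            break
        h = -(p1 * np.log2(p1) + (1.0 - p1) * np.log2(1.0 - p1))
        if h < threshold:
            break
        k += 1
    return k

def sigma_from_planes(k):
    """If the k lowest planes are pure noise, model them as uniform over
    2**k levels: sigma = sqrt((4**k - 1) / 12)."""
    return float(np.sqrt((4.0 ** k - 1.0) / 12.0))

# Constant signal plus uniform noise over 16 levels -> 4 random planes
img = 512 + np.random.default_rng(2).integers(0, 16, size=(64, 64))
k = random_bit_planes(img)
print(k, round(sigma_from_planes(k), 2))  # 4 4.61
```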

Surprisingly, the results obtained by the bit-plane procedure confirmed that σn is somewhat greater on the Land than on the Sea image, thus justifying the behaviour of σn reported in Fig. 13, where σn is plotted for the Land, Sea, River1, and River2 subimages, the last two being narrow rectangular areas around the Arno river that appears in the bottom part of Fig. 3.

The River1 and River2 subimages have different geometric sizes. The difference in size was deliberately chosen in order to force the estimation algorithm to work on homogeneous areas clustered in different conditions. In particular, the more reliable estimation (on River2) is obtained on the smaller subimage. In fact, on River2 the algorithm is able to cluster the homogeneous areas of the river in all bands and thus derive an estimate that is extremely similar to that of the Sea image.

Concerning the explanation of the bias, it might be due to


Figure 12. CCs ρx and ρy estimated on the Land subimage and plotted vs. wavelength.


Figure 13. Noise standard deviation plotted vs. wavelength for different landscapes; some values of σn for the Land subimage are out of scale and have been clipped.

a non-linear processing of the image that occurred when passing from the raw L0 data to the instrument-corrected RCI data. A not perfectly additive noise component, which could be modelled by introducing a multiplicative signal-dependent noise term, might also contribute to this effect. In fact, given the relatively large dynamic range, even a slight multiplicative noise component could easily explain the increase of σn on land areas.
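A small simulation can show how a slight multiplicative component would inflate the apparent noise level on bright areas. The model g = f + f·n_mult + n_add, the 1% multiplicative level, and the radiance values below are hypothetical choices, not measurements from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_add = 30.0   # additive noise standard deviation (hypothetical)
alpha = 0.01       # 1% multiplicative component (hypothetical)

def observed_sigma(f, n=100_000):
    """Sample standard deviation of the total noise f*n_mult + n_add
    on a flat area of radiance f."""
    noise = f * alpha * rng.normal(size=n) + sigma_add * rng.normal(size=n)
    return float(noise.std())

# Sea-like (dark) vs. land-like (bright) radiance
print(round(observed_sigma(1_000.0), 1))    # ~31.6  = sqrt(30**2 + 10**2)
print(round(observed_sigma(10_000.0), 1))   # ~104.4 = sqrt(30**2 + 100**2)
```

The same 1% component that is negligible on dark sea pixels roughly triples the apparent σn on pixels ten times brighter, consistent with the behaviour observed on the Land subimage.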

In order to clarify this point, further analysis should be carried out on the raw data, since the RCI images have been corrected by a sensor-dependent gain whose effect is to modify the statistics of the acquired data.

The presence of a slight multiplicative noise component could easily be taken into account by the noise model, but it would also have implications for the estimation of the entropy of the noise. Actually, since σn is mainly estimated from the contribution of sea areas, and is thus probably underestimated, Rf might consequently be overestimated and the reported scores might represent an upper bound.

3.1.3. Temporal variation

A preliminary analysis was also carried out to investigate the temporal variation of the data. The analysis was performed on the images acquired in July 2003, September 2003, and January 2004. The different meteorological conditions that occurred during the observations obviously impose some caution in the interpretation of the results. In particular, as appears in Fig. 14, the behaviour of the estimated σn in the image acquired in January is influenced by the conditions of the sea, which was particularly rough. Notwithstanding this caveat, some non-negligible differences appear in the plots of σn at all wavelengths. Further investigations should be carried out once additional information on the acquisition process is available.


Figure 14. σn vs. wavelength for three different dates.


Figure 15. SNR vs. wavelength for the three acquisitions.

Eventually, the Signal to Noise Ratio (SNR),

SNR(dB) ≜ 10 · log10[(σg² − σn²) / σn²],   (36)

is reported in Fig. 15 for the three different observations. Apart from the shorter wavelengths in January's image, the three plots present a similar trend and confirm the reliability of the procedure. It should be noticed that the SNR is signal dependent, so a direct comparison of its values across the three plots is not significant.
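Eq. 36 translates directly into code: σg is the standard deviation of the observed band, σn the estimated noise standard deviation, so σg² − σn² estimates the variance of the noise-free signal. The function name below is illustrative:

```python
import math

def snr_db(sigma_g, sigma_n):
    """SNR of Eq. 36: 10 * log10((sigma_g**2 - sigma_n**2) / sigma_n**2)."""
    return 10.0 * math.log10((sigma_g ** 2 - sigma_n ** 2) / sigma_n ** 2)

# A band whose observed spread is ten times the noise level
print(round(snr_db(1000.0, 100.0), 2))  # 19.96
```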

4. CONCLUDING REMARKS

A procedure for the information-theoretic assessment of digitised multi-dimensional signals has been described. It relies on robust noise estimation and on parametric entropy modelling to calculate the information of the ideal noise-free signal starting from the digitised observed signal. From the code rate and the estimated noise variance and CCs, a model was suggested to upper-bound the amount of information generated by an ideally noise-free process of sampling and digitisation. The procedure has been applied to CHRIS hyperspectral image data to estimate the noise parameters and the information content of each spectral band. The results for RCI and L1B data have been found consistent. Some discrepancy was found in the estimation of noise parameters on areas of different texture and mean grey value; possible explanations have been preliminarily discussed. Concerning the variations found in the noise parameters of multitemporal observations, further and deeper investigations are needed to understand the possible causes.

ACKNOWLEDGEMENTS

The authors wish to warmly thank their former co-authors L. Alparone, of the University of Florence, and A. Barducci, of IFAC-CNR, for the valuable discussions on hyperspectral data analysis.

REFERENCES

1. Huck, F. O., Fales, C. L., Alter-Ganterberg, R., Park, S. K., and Rahman, Z. Information-theoretic assessment of sampled imaging systems. J. Optical Engin., 38(5):742–762, May 1999.

2. Park, S. K. and Rahman, Z. Fidelity analysis of sampled imaging systems. J. Optical Engin., 38(5):786–800, May 1999.

3. Aiazzi, B., Alparone, L., Barducci, A., Baronti, S., and Pippi, I. Assessment of noise variance and information content of multi-/hyper-spectral imagery. ISPRS Internat. Archives Photogramm. Remote Sensing, 32(7-4-3W6):167–174, July 1999.

4. Benediktsson, J. A., Sveinsson, J. R., and Arnason, K. Classification and feature extraction of AVIRIS data. IEEE Trans. Geosci. Remote Sensing, 33(5):1194–1205, Sep. 1995.

5. Shannon, C. E. and Weaver, W. The Mathematical Theory of Communication. University of Illinois Press, Urbana, IL, 1949.

6. Aiazzi, B., Alparone, L., Barducci, A., Baronti, S., and Pippi, I. Information-theoretic assessment of sampled hyperspectral imagers. IEEE Trans. Geosci. Remote Sensing, 39(7):1447–1458, July 2001.

7. Aiazzi, B., Alparone, L., Barducci, A., Baronti, S., and Pippi, I. Estimating noise and information of multispectral imagery. J. Optical Engin., 41(3):656–668, Mar. 2002.

8. Aiazzi, B., Alparone, L., and Baronti, S. Fuzzy logic-based matching pursuits for lossless predictive coding of still images. IEEE Trans. Fuzzy Systems, 10(4):473–483, Aug. 2002.

9. Aiazzi, B., Alparone, L., and Baronti, S. Near-lossless image compression by relaxation-labelled prediction. Signal Processing, 82(11):1619–1631, Nov. 2002.

10. Aiazzi, B., Alba, P., Alparone, L., and Baronti, S. Lossless compression of multi/hyper-spectral imagery based on a 3-D fuzzy prediction. IEEE Trans. Geosci. Remote Sensing, 37(5):2287–2294, Sep. 1999.

11. Aiazzi, B., Alparone, L., and Baronti, S. Near-lossless compression of 3-D optical data. IEEE Trans. Geosci. Remote Sensing, 39(11):2547–2557, Nov. 2001.

12. Jayant, N. S. and Noll, P. Digital Coding of Waveforms: Principles and Applications to Speech and Video. Prentice Hall, Englewood Cliffs, NJ, 1984.

13. Roger, R. E. and Arnold, J. F. Reversible image compression bounded by noise. IEEE Trans. Geosci. Remote Sensing, 32(1):19–24, Jan. 1994.

14. Farvardin, N. and Modestino, J. W. Optimum quantizer performance for a class of non-Gaussian memoryless sources. IEEE Trans. Inform. Theory, IT-30(5):485–497, Sep. 1984.

15. Varanasi, M. K. and Aazhang, B. Parametric generalized Gaussian density estimation. J. Acoust. Soc. Amer., 86(4):1404–1415, 1989.

16. Birney, K. A. and Fischer, T. R. On the modeling of DCT and subband image data for compression. IEEE Trans. Image Processing, 4(2):186–193, Feb. 1995.

17. Muller, F. Distribution shape of two-dimensional DCT coefficients of natural images. Electronics Lett., 29(22):1935–1936, 1993.

18. Niehsen, W. Generalized Gaussian modeling of correlated signal sources. IEEE Trans. Signal Processing, 47(1):217–219, Jan. 1999.

19. Mallat, S. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Machine Intell., PAMI-11(7):674–693, July 1989.

20. Sharifi, K. and Leon-Garcia, A. Estimation of shape parameter for generalized Gaussian distributions in subband decompositions of video. IEEE Trans. Circuits Syst. Video Technol., 5(1):52–56, 1995.

21. Aiazzi, B., Alparone, L., and Baronti, S. Estimation based on entropy matching for generalized Gaussian PDF modeling. IEEE Signal Processing Lett., 6(6):138–140, June 1999.