Correcting the Smoothing Effect

Mathematical Geology, Vol. 37, No. 1, January 2005 ( C© 2005)DOI: 10.1007/s11004-005-8748-7

Correcting the Smoothing Effect of OrdinaryKriging Estimates1

Jorge Kazuo Yamamoto2

The smoothing effect of ordinary kriging is a well-known dangerous effect associated with this estima-tion technique. Consequently kriging estimates do not reproduce both histogram and semivariogrammodel of sample data. A four-step procedure for correcting the smoothing effect of ordinary krigingestimates is shown to be efficient for the reproduction of histogram and semivariogram without lossof local accuracy. Furthermore, this procedure provides a unique map sharing both local and globalaccuracies. Ordinary kriging with a proper correction for smoothing effect can be revitalized as areliable estimation method that allows a better use of the available information.

KEY WORDS: smoothing effect, global accuracy, interpolation variance, ordinary kriging.

INTRODUCTION

Ordinary kriging has been established as a reliable method of data interpolation be-cause it is an exact estimator and provides estimates with minimum error variancein the least square sense. These characteristics guarantee local accuracy for krigingestimates. However, ordinary kriging estimates present a serious drawback wellknown by geostatisticians as the smoothing effect in which small values are usuallyoverestimated and large values underestimated. The smoothing effect is inherent inthe ordinary kriging estimator as well as in any other estimator based on a weightedaverage such as inverse distance weighting. As a consequence of the smoothingeffect ordinary kriging estimates do not reproduce either the histogram or thespatial variability as given by the semivariogram function. On the other hand, due tothe disadvantages of ordinary kriging estimation pointed before, conditional simu-lation has been recognized as an alternative for reproduction of both histogram andsemivariogram, i.e., guaranteeing the global accuracy. However, conditional simu-lation provides multiple equiprobable images of the phenomenon under study, and

1Received 9 February 2001; accepted 2 August 2004.2Department of Environmental and Sedimentary Geology, Institute of Geosciences, University of SaoPaulo, Rua do Lago, 562, CEP 05.508-900, Sao Paulo, SP, Brazil; e-mail: [email protected]

69

0882-8121/05/0100-0069/1 C© 2005 International Association for Mathematical Geology

70 Yamamoto

the problem is to choose a single representative image. Besides, global accuracydoes not assure local accuracy, because both are conflicting objectives as stated byCaers (2000) and Journel, Kyriakidis, and Mao (2000). Unfortunately, accordingto Olea (1999), stochastic realizations are not error-free rendering of a spatialphenomenon under study and the errors are, for any realization, larger than thosefor kriging estimation, which is the least attractive feature of stochastic simulation.

Therefore, there is no ready solution for obtaining a unique representativeimage that shares both global and local accuracies. Thus, both approaches callfor a correction for displaying properly a spatial phenomenon through a uniquerepresentative image. For ordinary kriging, several authors have proposed solutionsin order to correct the smoothing effect (Guertin, 1984; Journel, Kyriakidis,and Mao, 2000; Olea and Pawlowsky, 1996) all based on postprocessing of theresulting kriged image. Guertin (1984) proposed a correction function based onan isofactorial representation of the bivariate distribution of the true value againstthe estimated one. Olea and Pawlowsky (1996) have proposed a procedure calledcompensated kriging, which is carried out in two steps: a linear regression isestablished using results from a cross-validation procedure; the coefficients of theregression line are then used to correct ordinary kriging estimates. This procedure,according to Olea and Pawlowsky (1996), produces compensated estimates withintermediate properties between those obtained using conventional kriging andconditional simulation. Journel, Kyriakidis, and Mao (2000) have proposed a so-lution based on Yao’s spectral postprocessing algorithm, which has been proved tobe efficient in reproducing the semivariogram model. However, according to theseauthors, this semivariogram reproduction is achieved at a loss of local accuracy,confirming that global accuracy and local accuracy are conflicting objectives.

Goovaerts (1998) showed that by incorporating local criteria in stochasticsimulation one obtains a response with some desirable features of estimation suchas smaller prediction errors while reproducing the pattern of spatial variability asgiven by the semivariogram. Caers (2000) proposed addition of local accuracy toconditional simulation, which increased the connectivity of extreme values anddid not degrade too much from semivariogram reproduction.

In this paper, we present an approach for correcting the smoothing effect ofordinary kriging estimates, which was derived partially from that one suggestedby Olea and Pawlowsky (1996). The proposed approach also uses cross validationresults in order to derive true errors that are subsequently used for correctingordinary kriging estimates.

MEASURING UNCERTAINTY ASSOCIATED WITH ORDINARYKRIGING ESTIMATES

The basic idea for correcting the smoothing effect of ordinary kriging es-timates might be based on a reliable measurement of the uncertainty associated

Correcting the Smoothing Effect of Ordinary Kriging Estimates 71

with them. It is well known that the kriging variance cannot be used for un-certainty assessment because it does not measure the local data dispersion andjust only a configuration index of data points used to make the ordinary krigingestimate. On the other hand, as shown by this author (Yamamoto, 2000), theinterpolation variance measures the local data dispersion since it is data-valuedependent and semivariogram function dependent through the ordinary krigingweights.

The ordinary kriging estimate (z∗OK(xo)) results from the following weighted

average of neighboring data values:

z∗OK(xo) =

n∑

i=1

λiz(xi) (1)

where {λi, i = 1, n} are the kriging weights resulting from solution of a normalsystem of equations.

The interpolation variance associated with the ordinary kriging estimate iscomputed as:

s2o =

n∑

i=1

λi[z(xi) − z∗OK(xo)]2 (2)

It is important to recall that the kriging weights {λi, i = 1, n} must be all positiveor constrained to be positive in order to ensure positiveness of the interpolationvariance.

Through cross-validation it is possible to obtain for each sampling point thetrue error:

TrueError = [z∗OK(xo) − z(xo)] (3)

where z(xo) is the actual value of the random variable. Note that the true errorcan be positive if the estimate value is greater than actual value and negative ifvice-versa. Besides, the interpolation standard deviation presents a positive linearrelationship with the absolute true error (Yamamoto, 2000, p. 503).

SMOOTHING EFFECT

Estimates based on weighted average formula such as ordinary kriging(1) present a reduced variability, which is often referred to as the smoothingeffect, according to Isaaks and Srivastava (1989, p. 420). Moreover, the use ofmore sample values usually tends to increase the smoothness of the estimates(Isaaks and Srivastava, 1989, p. 420). As a consequence of the smoothing effect,small values are usually overestimated while large values are underestimated,which characterizes a conditional bias of the resulting estimator (Goovaerts, 1997,

72 Yamamoto

p. 182). The conditional bias given that the estimated value z∗(x) is greater than athreshold value zc is, after Journel, Kyriakidis, and Mao (2000):

E{Z∗OK(x) − Z(x)|Z∗(x) > zc} �= 0

The smoothing effect appears as a serious drawback for representing patternsof extreme attribute values, such as zones of higher permeabilities or richer zonesin metal (Goovaerts, 1997, p. 370). Furthermore, the smoothing effect is uneven inspace, being zero at data locations and increasing as location x gets farther awayfrom data locations (Journel, Kyriakidis, and Mao, 2000, p. 791).

The smoothing effect is due to a deficit of variance of the kriging estimatorand for simple kriging (SK) is equal to the kriging variance as shown by Journel,Kyriakidis, and Mao (2000, p. 791):

Var{Z(x)} − Var{Z∗SK(x)} = σ 2

SK(x)

Similarly, the deficit of variance for ordinary kriging (OK) can be derivedfrom the following expression (Yamamoto, 2000, p. 507):

Var{Z(x)} − Var{Z∗OK(x)} = σ 2

OK(x) + 2µ ≥ 0

where µ is the Lagrange multiplier. Yamamoto (2000, p. 493) also showed thatthe deficit of variance could be expressed in terms of the interpolation variance as:

Var{Z(x)} − Var{Z∗OK(x)} = E

{S2

o

} ≥ 0 (4)

which can be interpreted as the variance smoothing of the kriging estimatorZ∗

OK(x). This expression makes sense as a measure of the deficit of variancebecause the greater the interpolation variance S2

o taken in average over all possibledata values for a given data configuration, the greater the variance smoothing ofthe kriging estimator (Yamamoto, 2000, p. 493). Actually expression (4) showsthat the sample variance is equal to the variance of ordinary kriging estimatesplus the variance smoothing of the kriging estimator. For instance, if all samplelocations are estimated by ordinary kriging, then the variance of ordinary krigingestimates is equal to the sample variance because the interpolation variance ondata location is equal to zero.

This paper shows how expression (4) approaches the sample variance andhow the interpolation variance can be used for correcting the smoothing effect ofordinary kriging estimates. Since uncertainty around an estimate can be positiveor negative, it is important to choose the correct signal for correcting ordinarykriging estimates, as we can see in the following section.


CORRECTION FOR SMOOTHING EFFECT

The proposed procedure for correcting the smoothing effect is based on afour-step approach:

1. Run the cross-validation procedure in order to achieve for each data pointthe interpolation standard deviation (square root of expression (2)) and thetrue error (3). These variables are then transformed into another variablenamed number of interpolation standard deviations as follows:

NSo = −TrueError

sowith so �= 0 (5)

Note that NSo is either positive or negative. The minus sign makes com-pensation of the smoothing effect. If the true error is positive it meansthat the estimated value is greater than the actual value and, therefore, anerror quantity must be subtracted. On the other hand, a negative true errormeans that the estimated value is less than the actual value and so an errorquantity must be added. Therefore, as output from the cross-validationprocedure one obtains for each location a new variable (NSo ), i.e., numberof interpolation standard deviations. Note that the interpolation standarddeviation for a data location as computed by cross-validation procedure isalways nonzero;

2. Run the ordinary kriging procedure in order to determine the number ofinterpolation standard deviations (NSo ) at all nodes x to be corrected. It isimportant to mention that NSo must be estimated with a minimum numberof neighbor points in order to avoid smoothness.

3. Run the ordinary kriging procedure to make estimates z∗OK(xo) at all nodes

x based on the same number of neighbor points as used in step 1). In thisrun, one also obtains for each node the interpolation standard deviation(so);

4. Run a postprocessing program for correcting the ordinary kriging estimatesas follows:

z∗∗OK(xo) = z∗

OK + NSo (xo) × so (6)

where NSo is the number of interpolation standard deviations at node xo asdetermined in step 3). An important remark regarding expression (6) is thatcorrected estimates will tend to align along 45◦ line in a scattergram relatingreference values against corrected estimates. Therefore, if the deficit of variancecan be replaced properly and the corrected estimates relate to the actual values, thecorrected estimates will reproduce the histogram and the semivariogram as well.

The correction provided by expression (6) must be checked if the correctedvalue is within or outside the data range of the subset of neighboring points. The

74 Yamamoto

minimum and maximum values are determined:

zmin = min[z(xi), i = 1, n]

zmax = max[z(xi), i = 1, n]

If the corrected value falls outside the permissible range [zmin, zmax], thenthe postprocessing procedure offers two alternative corrections. The first simplyassigns the maximum or minimum values according to:

if z∗∗OK(xo) > zmax then z∗∗

OK(xo) = zmax, or

if z∗∗OK(xo) < zmin then z∗∗

OK(xo) = zmin

(7)

The second correction computes an optimum factor in such a way that thevariance of corrected values will approach the sample variance. Thus, the correctedestimate is computed as follows:

if z∗∗OK(xo) > zmax then z∗∗

OK(xo) = z∗OK(xo) + delta ∗ factor, or

if z∗∗OK(xo) > smax then z∗∗

OK(xo) = smax

if z∗∗OK(xo) < zmin then z∗∗

OK(xo) = z∗OK(xo) + delta ∗ factor

if z∗∗OK(xo) < smin then z∗∗

OK(xo) = smin

(8)

where delta = −(z∗OK(xo) − zmin) if the signal of NSo is negative, and delta =

zmax − z∗OK(xo) if the signal of NSo is positive; smin and smax are minimum and

maximum values for the sample distribution. Thus, according to this procedurecorrected estimates are firstly checked with the range of neighboring data, ifthey fall outside then a corrected estimate is computed again. The new correctedestimate is checked with the range of sample data, if they fall outside the maximumor minimum value is assigned to the corrected estimate according to expression (8).

The last correction concerns the mean of corrected estimates, which shouldcome close to the sample mean. After computing all corrected estimates, the meanis computed and compared with the sample mean. If they are different, then aconstant K is computed as follows:

K = E[Z(x)] − E[Z∗∗OK(x)]

this constant is added to all corrected estimates and checked again if they falloutside the sample data range as follows:

z∗∗OK(xo) = z∗∗

OK(xo) +K

if z∗∗OK(xo) < smin then z∗∗

OK(xo) = smin

if z∗∗OK(xo) > smax then z∗∗

OK(xo) = smax

(9)


Fig. 1. Image for the exhaustive data set [sdxk3.txt] (left) and histogram (right).

Adding a constant K to all corrected estimates (but data points) makes themean of corrected estimates equal to the sample mean without changing thesample variance according to a well known property of variance (Var [Z(x) +K] = Var [Z(x)]). It is important to notice that our target in reproducing bothhistogram and semivariogram is the sample statistics.

Throughout case studies the proposed method will become clearer.

Important remark. The neighboring parameters for both cross-validation(step 1) and ordinary kriging (step 3) must exactly be the same, because the localerror as given by the number of interpolation standard deviations (NSo ) dependsdirectly on the number of neighboring points. On the other hand, the number ofinterpolation standard deviations must be estimated by ordinary kriging (step 2)using as minimum as possible number of neighboring points in order to avoid thesmoothing effect in its determination for each node x of the regular grid. In thisstudy the number of standard deviations was estimated using always 4 nearest datapoints.

CASE STUDIES

In order to show how the proposed method works, two data sets will beconsidered as case studies. The first data set comes from a synthetic 2-D datacomposed of 48 × 42 values on a regular grid, which will be considered as theexhaustive data and named as sdxk3.txt. The second is a public domain data setnamed true.dat, which is available in Deutsch and Journel (1992). This data setcomprising 50 × 50 values on a regular grid will be used as the exhaustive data.

Figures 1 and 2 present images and histograms for the exhaustive data sets(sdxk3.txt and true.dat, respectively).

From these exhaustive data sets two subsets were drawn using stratifiedrandom sampling scheme. From the first data set (sdxk3.txt) a subset with 110

76 Yamamoto

Fig. 2. Image for the exhaustive data set [true.dat] (left) and histogram (right).

points (5.46% of total data) was drawn according to location map presented inFigure 3. Hereafter, this data subset will be named subset A. Another subset with72 points (i.e. 2.88% of total data) was drawn from the second data set (Fig. 4),which will be named as subset B.

Summary statistics for both the exhaustive data set (sdxk3.txt) and the subsetA are presented in Table 1.

Summary statistics for both the exhaustive data set and its data subset B arepresented in Table 2.

Fig. 3. Location map for the subset A.


Fig. 4. Location map for the subset B.

Figures 5 and 6 present histogram and P-P plot for subsets A and B,respectively. As we can see both subsets are reasonable samples from theirrespective exhaustive data sets. There is a difference between subsets A and B.The former presents mean and standard deviation less than those of the exhaustivedata set. Whereas, the subset B presents greater mean and standard deviation thanrespective values of the exhaustive data set. Numbers given in brackets on theupper left corner of P-P plots represent the average distance to 1:1 line. This

Table 1. Summary Statistics for the Exhaustive Data Setand Its Subset A

Summary statistics Exhaustive Subset A

Nb. of data 2016 110Mean 3.113 2.952Standard deviation 2.204 2.076Coefficient of variation 0.708 0.703Maximum 15.924 10.857Upper quartile 4.120 4.062Median 2.641 2.522Lower quartile 1.470 1.333Minimum 0.178 0.223

78 Yamamoto

Table 2. Summary Statistics for the Exhaustive Data Setand Its Subset B

Summary statistics Exhaustive Subset B

Nb. of data 2500 72Mean 2.320 2.449Standard deviation 2.705 2.811Coefficient of variation 1.166 1.148Maximum 22.460 16.760Upper quartile 2.815 3.195Median 1.343 1.480Lower quartile 0.634 0.720Minimum 0.110 0.180

average distance is computed as:

aver = 100 × cos(45)

n

n∑

i=1

abs(P exhi − P obsi),

where Pexhi and Pobsi are percentiles of the exhaustive and observed distributions.Since average distance is multiplied by 100, it can be interpreted as a percentageerror. If the average distance is 0% it means that the reference and observeddistributions are absolutely similar. Average distances between exhaustive data setsand uniform distributions generated within the same data range of the exhaustivedata set on a P-P plot are equal to 22.122 and 27.819%, respectively for sdxk3.txtand true.dat files. For data subsets regarding this study, the average distancesare equal to 1.271 and 1.039%, respectively for data subsets A and B. Averagedistances around 1–2% are perfectly acceptable errors coming from two similardistributions.

Fig. 5. Histogram of the subset A and P-P plot between the exhaustive data set and sample data.


Fig. 6. Histogram of the subset B and P-P plot between the exhaustive data set and sample data.

Now, let us compute experimental semivariograms and fit semivariogrammodels for both data subsets. For the subset A, directional semivariograms werecomputed along 135◦ and 45◦, respectively along directions of maximum andminimum continuity (Fig. 7). For the subset B, it was only possible to compute anomnidirectional semivariogram (Fig. 8), due to the number of data points and itsspatial distribution (Fig. 3).

In order to give a comprehensive view of the smoothing effect as well as howthe proposed method works, we considered a varying neighborhood from 4 to40 points, i.e. with increasing smoothness. The nearest data points were searchedaccording to the quadrant criterion, i.e., selecting 1 to 10 nearest points in eachquadrant. Ordinary kriging estimates were made on regular grids of 48 × 42 and50 × 50 nodes, respectively for subsets A and B, i.e. such as locations of their

Fig. 7. Experimental semivariograms and the semivariogram model for the subset A.

80 Yamamoto

Fig. 8. Experimental semivariogram and the semivariogram model for the subset B.

respective exhaustive data sets. Results of this survey are presented in Tables 3and 4, and in Figures 9 and 10, respectively for subsets A and B.

As long as the variance of ordinary kriging estimates (Var [Z∗OK]) decreases,

the deficit of variance (E{S2o}) increases compensating the smoothing effect.

Moreover, for the subset A the sum E{S2o} + Var [Z∗

OK] is always approximatelyequal to the variance of sample data. For the subset B the sum E{S2

o} + Var [Z∗OK]

does not approach to the sample variance, but explains from 71.7 to 87.8% of thesample variance.

Examining variances of corrected estimates we verify that expression(7) raises variability beyond the sample variance for smaller neighborhood andreduces variability for larger neighborhood. On the other hand, resulting variancesafter expression (8) are around the sample variance, but they decrease slowly fromsmaller to larger neighborhood, especially for subset A. Obviously the additionalcorrection provided by expression (9) is the main cause for variance decreasing,which is clear in Tables 3 and 4. Factor values increase with the number ofneighboring data points.

Now let us analyze occurrences beyond permissible data range according tosummaries presented in Tables 5 and 6. Number of nodes within regular gridsvaries according to the convex hull generated from each sample data. Thus, forthe subset A from a total of 2016 nodes, 1830 were estimated. Whereas for thesubset B, from a total of 2500 nodes, 2188 were estimated.

According to these tables, corrected values of ordinary kriging estimatesbased on 4 nearest neighbor points presented between 27 and 36% of outliers,i.e, corrected values falling outside the permissible data range. That is the reasonwhy corrected estimates have a higher variability and they are greater than sample


Tabl

e3.

Var

ianc

esfo

rO

rdin

ary

Kri

ging

and

Cor

rect

edE

stim

ates

and

Num

ber

ofC

ases

Out

side

Perm

issi

ble

Ran

geC

ompu

ted

for

Dat

aSu

bset

A

Nb.

poin

tsV

ar[Z

∗ OK

]E

{S2 o}

Var

[Z∗∗ O

K]a

Var

[Z∗∗ O

K]b

Fact

orz∗∗ O

K>

zm

axz∗∗ O

K<

zm

inz∗∗ O

K>

s max

z∗∗ O

K<

s min

42.

005

2.28

34.

288

4.29

80.

6335

630

40

08

1.45

72.

822

4.27

94.

289

0.83

238

160

00

121.

129

3.17

64.

305

4.28

90.

9318

813

30

016

0.91

83.

342

4.26

04.

287

1.00

154

104

65

200.

785

3.46

84.

253

4.28

91.

1410

194

718

240.

686

3.54

94.

235

4.27

31.

2285

776

4428

0.62

43.

611

4.23

54.

267

1.23

8061

643

320.

582

3.65

64.

238

4.28

21.

5058

5310

5036

0.54

93.

682

4.23

14.

169

2.99

4547

3845

400.

522

3.73

24.

254

4.04

72.

9939

3835

36

aV

aria

nce

ofco

rrec

ted

estim

ates

afte

rex

pres

sion

(7).

bC

orre

cted

estim

ates

afte

rex

pres

sion

(8).

82 Yamamoto

Tabl

e4.

Var

ianc

esfo

rO

rdin

ary

Kri

ging

and

Cor

rect

edE

stim

ates

and

Num

ber

ofC

ases

Out

side

Perm

issi

ble

Ran

geC

ompu

ted

for

Dat

aSu

bset

B

Nb.

poin

tsV

ar[Z

∗ OK

]E

{S2 o}

Var

[Z∗∗ O

K]a

Var

[Z∗∗ O

K]b

Fact

orz∗∗ O

K>

zm

axz∗∗ O

K<

zm

inz∗∗ O

K>

s max

z∗∗ O

K<

s min

43.

344

2.32

45.

688

7.89

70.

7535

226

50

08

3.01

03.

357

6.36

77.

898

0.98

195

214

00

122.

532

4.11

36.

645

7.86

21.

3812

424

826

193

162.

219

4.53

86.

757

7.87

61.

5991

215

3321

220

1.89

24.

932

6.82

47.

867

1.57

7022

233

222

241.

682

5.15

26.

834

7.86

41.

6765

175

3417

528

1.53

05.

388

6.91

87.

879

1.71

5916

434

164

321.

405

5.48

66.

891

7.86

21.

9355

135

3513

536

1.29

75.

610

6.90

77.

864

1.97

5510

935

109

401.

196

5.74

36.

939

7.85

41.

9656

111

3611

1

aV

aria

nce

ofco

rrec

ted

estim

ates

afte

rex

pres

sion

(7).

bC

orre

cted

estim

ates

afte

rex

pres

sion

(8).


Fig. 9. Graph showing variation of components of the deficit of variance as function ofnearest neighbor points for data subset A and variances for corrected estimates. Legend:empty circles = Var [Z∗

OK]; triangles = E{S2o }; diamonds = E{S2

o } + Var [Z∗OK];

squares = Var [Z∗∗OK]a according to expression (7); asterisk = Var [Z∗∗

OK]b accordingto expression (8); horizontal line indicates variance of the sample data equalto 4.310.

Fig. 10. Graph showing variation of components of the deficit of variance as function ofnearest neighbor points for data subset B and variances for corrected estimates. Legend:empty circles˜= Var [Z∗

OK]; triangles = E{S2o }; diamonds = E{S2

o } + Var + [Z∗OK]; squares˜=

Var [Z∗∗OK]a according to expression (7); asterisk = Var [Z∗∗

OK]b according to expression (8);horizontal line indicates variance of the sample data equal to 7.901.

84 Yamamoto

Table 5. Number of Occurrences Recorded for Expressions (6), (7) or (8),and (9) – Subset A

Expres. (6) Expres. (7) or (8) Expres. (9)

Nb. points Nb. % Nb. % Nb. %

4 1170 63.9 660 36.1 0 08 1432 78.3 398 21.7 0 0

12 1509 82.5 321 17.5 0 016 1572 85.9 258 14.1 11 0.620 1635 89.3 195 10.7 25 1.424 1668 91.1 162 8.9 50 2.728 1689 92.3 141 7.9 49 2.732 1719 93.9 111 6.1 69 3.836 1738 95.0 92 5.0 83 4.540 1753 95.8 77 4.2 71 3.9

variances when outliers are corrected according to expression (7). However, when 8or more neighbor points are used more than 78% of corrected estimates were basedon expression (6), confirming the effectiveness of the proposed method. Therefore,the proposed method can recover the sample variance. Moreover, as it was statedpreviously the principle of this method is based on projecting estimates to 45◦

line by adding or subtracting an error quantity that recovers the original samplevariance and reproducing consequently histogram and semivariogram. Hereafter,only corrected estimates using expressions (8) and (9) will be considered for thisstudy.

Table 6. Number of Occurrences Recorded for Expressions (6), (7) or (8),and (9) – Subset B

Expres. (6) Expres. (7) or (8) Expres. (9)

Nb. points Nb. % Nb. % Nb. %

4 1598 73.0 590 27.0 0 08 1779 81.3 409 18.7 0 0

12 1816 83.0 372 17.0 219 10.016 1882 86.0 306 14.0 245 11.220 1896 86.7 292 13.3 255 11.724 1948 89.0 240 11.0 209 9.628 1965 89.8 223 10.2 198 9.032 1998 91.3 190 8.7 170 7.836 2024 92.5 164 7.5 144 6.640 2021 92.4 167 7.6 147 6.7


Table 7. Statistical Summaries for Distributions of Ordinary Kriging Estimates and CorrectedEstimates Computed From Subset A

OK estimates Corrected estimates

Nb.points Mean

Standarddeviation

Coefficientof variation

ConstantK Mean

Standarddeviation


4 2.936 1.416 0.482 −0.307 2.953 2.073 0.7028 2.932 1.207 0.412 −0.308 2.954 2.071 0.701

12 2.939 1.062 0.362 −0.277 2.954 2.071 0.70116 2.938 0.958 0.326 −0.275 2.955 2.071 0.70120 2.941 0.886 0.301 −0.231 2.958 2.071 0.70024 2.941 0.828 0.282 −0.201 2.959 2.067 0.69928 2.939 0.790 0.269 −0.218 2.959 2.066 0.69832 2.938 0.763 0.260 −0.190 2.958 2.069 0.70036 2.935 0.741 0.252 −0.158 2.956 2.042 0.69140 2.936 0.722 0.246 −0.156 2.956 2.012 0.681

Histogram Reproduction

Firstly, let us check reproduction of histograms of corrected estimates. Forcomparative purposes histograms of smoothed ordinary kriging estimates arealso considered. Tables 7 and 8 present statistical summaries of ordinary krigingestimates and corrected estimates, respectively for subsets A and B. For illustrationpurposes, we will show only figures taken from 20 neighboring points. Figure 8presents histograms for both ordinary kriging and corrected estimates.

Table 8. Statistical Summaries for Distributions of Ordinary Kriging Estimates and CorrectedEstimates Computed From Subset B


Nb.Points Mean

Standarddeviation


ConstantK Mean

Standarddeviation


4 2.187 1.829 0.836 −0.060 2.450 2.810 1.1478 2.240 1.735 0.774 0.041 2.449 2.810 1.147

12 2.275 1.591 0.699 0.095 2.448 2.804 1.14516 2.302 1.490 0.647 0.064 2.448 2.806 1.14620 2.318 1.376 0.593 0.082 2.448 2.805 1.14624 2.323 1.297 0.558 0.085 2.448 2.804 1.14628 2.332 1.237 0.531 0.055 2.448 2.807 1.14632 2.330 1.185 0.509 0.087 2.448 2.804 1.14536 2.332 1.139 0.488 0.083 2.448 2.804 1.14640 2.330 1.094 0.469 0.104 2.448 2.803 1.145

86 Yamamoto

Fig. 11. Subset A – histograms for ordinary kriging estimates (left) and for corrected estimates (right)both computed from 20 neighboring points.

Comparing statistical summaries of computed estimates with those of samplesubsets we observe that ordinary kriging estimates reproduce only the mean values,while standard deviations are much less than sample standard deviations. Besides,as long as the number of points increases the standard deviation of ordinary krigingestimates decreases because of the smoothing effect. Corrected estimates presentmean values and standard deviations very close to sample ones.

Histograms of ordinary kriging estimates are distributed into few classesaround the mean values (Figs. 11 and 12) and, therefore, very different fromsample histograms. Histogram of corrected estimates (Fig. 11) for subset A isvery similar to sample histogram (Fig. 5). There are some differences comingfrom first and third classes. In this case, histogram reproduction is not perfect,but acceptable, because it keeps histogram asymmetry and number of sampledclasses. For subset B the histogram of corrected estimates (Fig. 12) is very similarto the sample histogram (Fig. 6).

For a quantitative comparison between ordinary kriging estimates and cor-rected estimates, we present average distances measured on P-P plots such asshown in Figures 5 and 6. Tables 9 and 10 present average distances for the

Fig. 12. Subset B – histograms for ordinary kriging estimates (left) and for corrected estimates(right) both computed from 20 neighboring points.


Table 9. Average Distances Measured on P-P Plots for the Subset A

Exhaustive data set Sample data subset

Nb. points OK estimates Corrected estimates OK estimates Corrected estimates

4 3.106 1.302 3.318 1.1598 4.314 1.113 4.516 1.060

12 5.190 0.846 5.385 0.89616 5.815 0.557 6.008 0.74420 6.307 0.496 6.500 0.88824 6.673 0.596 6.866 0.96728 6.899 0.621 7.091 0.97332 7.072 0.854 7.257 1.16636 7.189 0.953 7.370 1.21540 7.312 1.047 7.489 1.284

exhaustive distribution and sample distributions, respectively for subsets A and B.Figures 13 and 14 present P-P plots of the exhaustive and sample distribu-tions.

Average distances for ordinary kriging estimates are always greater than thosecomputed for sample distributions (Figs. 5 and 6). Besides, average distances(for ordinary kriging estimates) increase with number of points showing that asnumber of points increases distributions get further from both exhaustive andsample distributions.

For corrected estimates average distances are much less than for ordinarykriging estimates and they are around those computed for sample distributionsaccording to data presented in Figures 5 and 6. Furthermore, average distancescomputed between sample and corrected estimates distributions are around 1%,

Table 10. Average Distances Measured on P-P Plots for the Subset B

Exhaustive data set Sample data subset

Nb. Points OK estimates Corrected estimates OK estimates Corrected estimates

4 2.082 1.196 1.829 0.7848 2.534 1.198 2.279 0.860

12 3.114 1.224 2.843 0.83016 3.514 1.461 3.184 0.89220 3.880 1.440 3.525 0.86224 4.137 1.532 3.793 0.99528 4.327 1.471 3.986 0.95232 4.503 1.603 4.168 1.16036 4.635 1.621 4.322 1.21940 4.761 1.663 4.487 1.294

88 Yamamoto

Fig. 13. P-P plots comparing ordinary kriging estimates (empty circles) and corrected estimates (fullcircles) with the exhaustive distribution (left) and with the sample distribution (right), for the subset A.

showing that the proposed method reproduces sample distribution within anacceptable tolerance.

Semivariogram Reproduction

Now it is clear that corrected estimates will reproduce the underlying spatialcorrelation given by a semivariogram model. This conclusion is supported byall data presented so far. Once again for illustrative purposes we show onlysemivariograms computed for ordinary kriging and corrected estimates determined

Fig. 14. P-P plots comparing ordinary kriging estimates (empty circles) and corrected estimates (fullcircles) with the exhaustive distribution (left) and with the sample distribution (right), for the subset B.


Fig. 15. Subset A: Experimental semivariograms computed for ordinarykriging estimates (empty symbols) and for corrected estimates (full symbols).Legend: circles = N45◦ and squares = N135◦.

from 20 neighboring points (Figs. 15 and 16). Corrected estimates reproduce semi-variogram models, because the proposed method recovers the sample variance.

Examining Local Accuracy and Conditional Bias of Corrected Estimates

The proposed method can reproduce both histogram and semivariogram, butwe have to show that it can be done without loss of local accuracy. That is animportant issue for correcting the smoothing effect of ordinary kriging estimates.

Fig. 16. Subset B: Experimental semivariogram computed for ordinary krigingestimates (empty symbol) and for corrected estimates (full symbol).

90 Yamamoto

Table 11. Measures of Local Accuracy Between Reference and Estimated Values for the Subset A


Nb. points ρ ρRank RMS Slope ρ ρRank RMS Slope

4 0.676 0.683 0.947 0.978 0.625 0.651 1.138 0.6198 0.658 0.663 1.014 1.118 0.647 0.675 1.079 0.641

12 0.636 0.638 1.130 1.226 0.662 0.686 1.075 0.65616 0.616 0.626 1.179 1.315 0.661 0.686 1.071 0.65620 0.604 0.618 1.232 1.395 0.656 0.683 1.064 0.65124 0.600 0.618 1.255 1.483 0.659 0.687 1.045 0.65428 0.592 0.613 1.278 1.531 0.660 0.688 1.026 0.65632 0.584 0.610 1.300 1.564 0.655 0.690 1.005 0.65036 0.574 0.602 1.324 1.583 0.657 0.691 0.991 0.65940 0.566 0.602 1.330 1.600 0.659 0.694 0.979 0.671

Now we will compare estimated grids with the exhaustive ones. Measures of localaccuracy as given by correlation coefficients between reference and estimatedvalues, RMS errors, and slopes of regression lines are presented in Tables 11 and12, respectively for subsets A and B. For the vast majority of number of points (forboth subsets), correlation coefficients of corrected estimates are greater than thosefor ordinary kriging estimates. It means that corrected estimates are improved interms of local accuracy when compared with ordinary kriging estimates. Therefore,the proposed method makes correction without loss of local accuracy. In the sameway, RMS errors for corrected estimates are usually less than those for ordinarykriging estimates. It reconfirms that the proposed method does not present lossof local accuracy. Therefore, corrected estimates are closer to reference values.Slopes of regression lines of reference values as function of ordinary kriging

Table 12. Measures of Local Accuracy Between Reference and Estimated Values for the Subset B


Nb. points ρ ρRank RMS Slope ρ ρRank RMS Slope

4 0.785 0.805 1.132 1.096 0.720 0.803 1.498 0.6448 0.757 0.756 1.406 1.114 0.760 0.818 1.402 0.684

12 0.740 0.703 1.661 1.189 0.780 0.813 1.175 0.70416 0.727 0.681 1.829 1.248 0.777 0.807 1.211 0.70020 0.715 0.663 1.998 1.313 0.782 0.818 1.194 0.70624 0.705 0.651 2.101 1.369 0.780 0.821 1.177 0.70428 0.701 0.646 2.178 1.427 0.785 0.824 1.149 0.70832 0.701 0.647 2.222 1.488 0.783 0.832 1.086 0.70736 0.694 0.632 2.284 1.530 0.779 0.832 1.102 0.70340 0.691 0.623 2.358 1.584 0.775 0.826 1.121 0.700


estimates increase as the number of neighboring points increase. On the otherhand, slopes for corrected estimates are practically the same around a mean value.To understand this behavior let us analyze the expression used for computing theslope of a regression line:

Slope = ρ ∗ SRef

SEst(10)

where ρ is the correlation coefficient, SRef is the standard deviation for referencevalues and SEst is the standard deviation for estimated values. Thus, when thestandard deviation of estimate values goes down the slope goes up. This iswhat happens with ordinary kriging estimates, because as the number of pointsincreases the standard deviation decreases. However, the standard deviation ofcorrected estimates is approximately constant and, therefore, the slope remainsconstant.

Examining slopes of regression lines we verify that for ordinary krigingestimates slopes of regression lines are always greater than 1 and for correctedestimates are always less than 1. Theoretically we may think a slope equal to1 means no conditional bias. As correlation coefficients between reference andestimated values are expected to be less than 1, and the standard deviation ofreference values is fixed for a given data set, we can achieve slopes equal to 1only by varying the standard deviation of estimated values. For a slope equalto 1 the standard deviation of estimated values must be equal to the correlationcoefficient times the standard deviation of reference values. It means that thestandard deviation of estimated values will always be less than the standarddeviation of reference values and, consequently, presenting smoothing effect.

We can achieve global accuracy without loss of local accuracy, but theconditional bias is a consequence of the estimation process and it remains evenafter correcting the smoothing effect.

Figures 17 and 18 present scattergrams comparing estimates and referencevalues, respectively for subsets A and B. Scattergrams for corrected estimatespresent data spreading more symmetrically around 45◦ line.

Illustrative Example of a Run

A computer program integrating both cross-validation and ordinary krigingwas developed to get in a single run corrected estimates from ordinary kriging ones.Figures 19 and 20 present resulting images of ordinary kriging estimates and cor-rected estimates, respectively for subsets A and B. Compared to the ordinary krig-ing estimates, the corrected estimates emphasize high and low values illustratingthe smoothing effect removal as provided by this algorithm. Pattern reproductionwill depend on the number of sampling points. Actually, the greater the number of

92 Yamamoto

Fig. 17. Scattergrams of reference values with ordinary kriging estimates (left column) and betweenreference values and corrected estimates (right column) for the subset A. Thick lines represent leastsquare regression lines.

sampling points the better the pattern reproduction. For studied cases, consideringthe sampling levels (5.46 and 2.88%) resulting images are very good.

CONCLUSIONS

The proposed method for correcting the smoothing effect of ordinary krigingestimates shares local and global accuracies and so these are no more conflictingobjectives. It is important to note that even after adding an error quantity to

Fig. 18. Scattergrams of reference values with ordinary kriging estimates (left column) and betweenreference values and corrected estimates (right column) for the subset B. Thick lines represent leastsquare regression lines.


Fig. 19. Image of ordinary kriging estimates (left) and image of corrected estimates (right) for thesubset A.

the OK estimate, the resulting corrected estimate still presents local accuracy.Indeed, this is achieved because the interpolation variance measures accuratelythe deficit of variance due to the smoothing effect of OK estimators. Besides,the error quantity is dependent on the kriging estimator as long as it is firstlyderived from cross-validation and applied subsequently during ordinary krigingestimation.

Throughout the case studies, it was shown that corrected estimates presentdesirable properties of global accuracy, such as histogram and semivariogramreproduction, without loss of local accuracy. The worth of provided correction forthe smoothing effect is that its result is addressed to a single image, as opposed tostochastic simulation realizations. However, as we could see the conditional biasremains because it is inherent in weighted average estimates such as in ordinarykriging ones. It is important to realize that conditional bias will always exist when

Fig. 20. Image of ordinary kriging estimates (left) and image of corrected estimates (right) for thesubset B.

94 Yamamoto

the correlation coefficient between reference data values and estimated ones is lessthan 1, which is the general case for experimental data.

Certainly, the ordinary kriging considering smoothing removal will be revi-talized as a reliable method of estimation in the sense of local and global accuraciesand, therefore, allowing selection of attribute values at multiple locations, becausecorrected estimates reflect the underlying geological texture as measured by thesemivariogram.

ACKNOWLEDGMENTS

Results presented in this paper derive from the author’s main research supportedby a Brazilian sponsoring agency to which he would like to express his gratitude:National Council for Scientific and Technological Development – CNPq (Process304612/89-8).

REFERENCES

Caers, J., 2000, Adding local accuracy to direct sequential simulation: Math. Geol., v. 32, no. 7,p. 815–850.

Deutsch, C. V., and Journel, A. G., 1992, GSLIB geostatistical software library and user’s guide:Oxford University Press, New York, 340 p.

Goovaerts, P., 1997, Geostatistics for natural resources evaluation: Oxford University Press, New York,483 p.

Goovaerts, P., 1998, Accounting for estimation optimality criteria in simulated annealing: Math. Geol.,v. 30, no. 5, p. 511–534.

Guertin, K., 1984, Correcting conditional bias, in Verly, G., David, M., Journel, A. G., and Marechal,A., eds., Geostatistics for natural resources characterization, Part 1: D. Reidel, Dordrecht, Holland,p. 245–260.

Isaaks, E. H., and Srivastava, R. M., 1989, An introduction to applied geostatistics: Oxford UniversityPress, New York, 561 p.

Journel, A., Kyriakidis, P. C., and Mao, S., 2000, Correcting the smoothing effect of estimators:A spectral postprocessor: Math. Geol., v. 32, no. 7, p. 787–813.

Olea, R., 1999, Geostatistics for engineers and earth scientists: Kluwer Academic, Boston, 303 p.Olea, R., and Pawlowsky, V., 1996, Compensating for estimation smoothing in kriging: Math. Geol.,

v. 28, no. 4, p. 407–417.Yamamoto, J. K., 2000, An alternative measure of the reliability of ordinary kriging estimates: Math.

Geol., v. 32, no. 4, p. 489–509.

Documents

Correcting the Smoothing Effect