Mathematical Geology, Vol. 36, No. 8, November 2004 (© 2004)

Estimating Variogram Uncertainty1

B. P. Marchant2 and R. M. Lark2

The variogram is central to any geostatistical survey, but the precision of a variogram estimated from sample data by the method of moments is unknown. It is important to be able to quantify variogram uncertainty to ensure that the variogram estimate is sufficiently accurate for kriging. In previous studies theoretical expressions have been derived to approximate uncertainty in both estimates of the experimental variogram and fitted variogram models. These expressions rely upon various statistical assumptions about the data and are largely untested. They express variogram uncertainty as functions of the sampling positions and the underlying variogram. Thus the expressions can be used to design efficient sampling schemes for estimating a particular variogram. Extensive simulation tests show that for a Gaussian variable with a known variogram, the expression for the uncertainty of the experimental variogram estimate is accurate. In practice however, the variogram of the variable is unknown and the fitted variogram model must be used instead. For sampling schemes of 100 points or more this has only a small effect on the accuracy of the uncertainty estimate. The theoretical expressions for the uncertainty of fitted variogram models generally overestimate the precision of fitted parameters. The uncertainty of the fitted parameters can be determined more accurately by simulating multiple experimental variograms and fitting variogram models to these. The tests emphasize the importance of distinguishing between the variogram of the field being surveyed and the variogram of the random process which generated the field. These variograms are not necessarily identical. Most studies of variogram uncertainty describe the uncertainty associated with the variogram of the random process. Generally however, it is the variogram of the field being surveyed which is of interest. For intensive sampling schemes, estimates of the field variogram are significantly more precise than estimates of the random process variogram. It is important, when designing efficient sampling schemes or fitting variogram models, that the appropriate expression for variogram uncertainty is applied.

KEY WORDS: ergodic, nonergodic, error, simulation tests.

INTRODUCTION

The variogram characterizes the structure of spatial correlation of a variable and is

central to any geostatistical survey. It expresses the variance of the difference be-

tween two observations of the variable as a function of the lag vector that separates

them. A variogram estimate, expressed as a mathematical function, is required to

1 Received 12 November 2003; accepted 18 May 2004.
2 Silsoe Research Institute, Wrest Park, Silsoe, Bedford, MK45 4HS, United Kingdom; e-mail:

[email protected]


0882-8121/04/1100-0867/1 © 2004 International Association for Mathematical Geology



krige or simulate a spatially correlated variable (Webster and Oliver, 2001). How-

ever, both of these techniques assume that the variogram of the variable is known,

whereas in reality the variogram must be estimated from the available data. Therefore there is some unavoidable uncertainty associated with the variogram estimate.

In this paper, we discuss methods of quantifying this uncertainty for variograms

estimated by the method of moments.

Variogram uncertainty has been considered previously in a number of dif-

ferent contexts. Webster and Oliver (1992) measured the uncertainty of vari-

ograms estimated from different sampling schemes to determine whether the sam-

pling schemes were adequate for variogram estimation. Muller and Zimmerman

(1999) and Bogaert and Russo (1999) have suggested techniques for designing sample schemes where the sample points are positioned to minimize the value of a theoretical expression of variogram uncertainty. The same theoretical expressions are used to fit variogram models in a way that accounts for

the difference in accuracy of the experimental semivariance at each lag distance

(Cressie, 1985). Some measure of variogram uncertainty is also important when

considering the reliability of simulated or kriged estimates derived from the esti-

mated variogram (Brooker, 1986; Todini, 2001; Todini, Pellegrini, and Mazzetti,

2001).

We draw attention to three possible problems with previous approaches for

estimating variogram uncertainty. The reliability of the theoretical expressions of variogram uncertainty used by Muller and Zimmerman (1999) and Bogaert and Russo (1999) has not been tested comprehensively. Yet the expressions are only

approximate and rely upon certain statistical assumptions. Furthermore, generally

it is the error in estimating the variogram of the field being surveyed which is

of interest. However, the theoretical expressions used by Muller and Zimmerman

(1999) and Bogaert and Russo (1999) quantify the expected error in the exper-

imental variogram as an approximation to the variogram of the random process

which generated the field. Finally, the theoretical expressions to determine the

uncertainty depend on the variogram of the random process. When applying these expressions, the variogram of the random process must be approximated by a model fitted to the experimental variogram. Thus this approach to the estimation

of variogram uncertainty is circular.

Here, through experiments on simulated data sets, we assess the impact of

each of these concerns. We follow Brus and de Gruijter (1994) in referring to

the variogram averaged over all realizations of the underlying random process as

the ergodic variogram, and the exhaustive variogram of the single realization or

field being sampled as the nonergodic variogram. Journel and Huijbregts (1978)

refer to these as the theoretical and local variograms respectively. First, we test

the accuracy of the theoretical expressions for the uncertainty of the method of moments variogram as an estimate to a known ergodic variogram. Second,

we consider the uncertainty associated with an experimental variogram estimate



to a nonergodic variogram when (for the purpose of assessing uncertainty) the

ergodic variogram is known. In addition we compare the magnitude of the errors when using the experimental variogram as an estimate of the ergodic and nonergodic variogram. Third, we test the accuracy of the theoretical expressions for the uncertainty of the method of moments variogram estimates to an unknown ergodic variogram. In this case the uncertainty expressions are calculated using a model fitted to the experimental variogram, rather than the correct ergodic

variogram.

We denote the experimental variogram estimate by $\hat{\gamma}(h)$, the ergodic variogram by $\gamma(h)$ and the nonergodic variogram by $\gamma_{NE}(h)$. We assume that the variograms are isotropic and therefore functions of the scalar separation distance $h$. We now present the three problems being addressed in more detail and describe previous studies of them.

Uncertainty of Estimates to a Known Ergodic Variogram

Previous studies of variogram uncertainty have mostly concentrated on es-

timates of the ergodic variogram. Cressie (1985), Ortiz and Deutsch (2002), and

Pardo-Iguzquiza and Dowd (2001a) suggested similar expressions for the covari-

ance matrix of experimental variogram estimates to the ergodic variogram. These

expressions are functions of both the sampling scheme and the ergodic variogram.

The elements of the main diagonal of the covariance matrix represent the variance

of the experimental variogram estimates at each separating distance. For conve-

nience, we refer to the standard error at each separating distance as the ergodic

error. The ergodic error is the result of two different types of fluctuation. We are most concerned with the sampling error, that is the expected difference between the variogram estimate $\hat{\gamma}(h)$ and the nonergodic variogram of the realization being sampled, $\gamma_{NE}(h)$. However, the ergodic error also includes the effect of fluctuations between the ergodic variogram $\gamma(h)$ and the nonergodic variogram $\gamma_{NE}(h)$. Pardo-Iguzquiza and Dowd (2001a) also consider how the uncertainty of the experimental variogram may be incorporated into the uncertainty of fitted variogram parameters. This leads to an expression for the covariance matrix of variogram parameters fitted by generalized least squares (GLS).

Few previous tests of the reliability of expressions of variogram uncertainty

have been carried out. Pardo-Iguzquiza and Dowd (2001a) applied their expression

to a particular case study and confirmed qualitatively that variogram uncertainty

varied with lag distance in the manner they expected. Ortiz and Deutsch (2002)

applied two different methods of simulation to test their expressions of variogram

uncertainty. One method simulated multiple values of a random variable at sets

of two pairs of locations. The observed covariances between the variogram es-

timates from each pair were in good agreement with their expressions. In the




second test they simulated multiple realizations of the random variable at a set

of sampling points and calculated the experimental variogram for each realization. This was referred to as the global simulation method. The observed variances of semivariances from the global method were generally less than those

predicted by their expressions. Difficulties in simulating realizations that hon-

oured the variogram function, particularly for long lag distances, were blamed

for these discrepancies. McBratney and Webster (1986) used a similar method

of simulation to establish confidence intervals on experimental variogram

estimates.

Expected Error in Estimates to the Nonergodic Variogram for a Known Ergodic Variogram

Munoz-Pardo (1987) derived expressions for the uncertainty of estimates

to the nonergodic variogram. He separated the two components of fluctuation

within the ergodic error to approximate the expected error in approximating the nonergodic variogram $\gamma_{NE}(h)$ by the experimental variogram $\hat{\gamma}(h)$. We refer to

this quantity, which may be thought of as the sampling error, as the nonergodic

error. It is this quantity that is of interest when optimizing sample schemes for

variogram estimation. Muller and Zimmerman (1999) and Bogaert and Russo (1999) designed optimal sample schemes for variogram estimation by minimizing the ergodic error. Therefore we investigated both the reliability of Munoz-Pardo's (1987) expressions and the difference between the ergodic error and the nonergodic

error.

Prior to our investigations, Munoz-Pardo's (1987) expressions had not been

validated comprehensively. Munoz-Pardo (1987) used his expressions to calcu-

late the expected sampling error of variogram estimates for different sampling

schemes, ergodic variograms, and field sizes. He found that the ratio of variogram range, that is the distance over which the variable is spatially correlated, to field size was a critical factor in determining the nonergodic error. Other authors have

attempted to establish confidence bands on nonergodic variograms using simu-

lated data. Webster and Oliver (1992) carried out extensive simulation tests in

order to estimate the nonergodic error when applying different sampling schemes.

Motivated by the findings of Munoz-Pardo (1987), they examined data sets with different ratios of variogram range to field size. They also varied the basic structure of the ergodic variogram used to simulate the data. They found that between

150 and 225 sampling points are required to estimate the variogram accurately.

In each of their simulation tests they sampled the same region several times by translating the sampling grid across the region. Although they ensured that the

same point was not sampled by two different versions of the grid, they might have

underestimated the expected error of variogram estimates because of correlation




between the samples. We discuss this correlation and the effect it has on the error

estimates later in this paper. Webster and Oliver (1992) briefly compared their

observed confidence intervals with those of Munoz-Pardo (1987), and saw some similarities.

Uncertainty of Estimates to an Unknown Ergodic Variogram

All of the expressions of variogram uncertainty described above are functions

of the ergodic variogram. However, in any real survey the ergodic variogram would

not be known; it would be approximated by the estimated variogram. The effect

of this approximation has neither been accounted for in the theoretical studies nor estimated from simulated data.

THEORY

Estimating the Variogram

In geostatistics we regard the value of a variable at a location $x$ as a realization of a random function $Z(x)$. This random function is assumed to be intrinsically stationary. This is a weak form of second-order stationarity and

is met if two conditions hold. The first is that the expected value of the ran-

dom function, E[Z(x)], is constant for all x. Secondly, the variance of the dif-

ferences between the value of the variable at two different locations depends

only on the lag vector separating the two locations and not on the absolute lo-

cations. In general, this variance may be a function of both the direction and

length of the lag vector. In this study isotropic variograms only are considered.

These are purely a function of the length of the vector which we denote h.

Thus the relationship between values from different locations is described by the variogram

$$\gamma(h) = \frac{1}{2} E\left[\{Z(x) - Z(x + h)\}^2\right]. \tag{1}$$

The variogram is estimated from variable values observed at sampled points, $x_s$, $s = 1, \ldots, n$. The method of moments estimator is half the average of squared differences between observations separated by distance $h$. Pairs of observations are divided amongst different bins based upon their separating distance. If the observations are on a regular sampling grid, then bins consisting of pairs with exactly the same separating distance may be chosen. Otherwise a small tolerance must be placed on the separating distances associated with each bin. The experimental variogram $\hat{\gamma}(h_j)$, $j = 1, \ldots, k$, is then estimated by

$$\hat{\gamma}(h_j) = \frac{1}{2n(h_j)} \sum_{i=1}^{n(h_j)} \{z_{i1}(h_j) - z_{i2}(h_j)\}^2, \tag{2}$$

where $n(h_j)$ is the number of pairs in the bin centred on separating distance $h_j$, and $z_{i1}(h_j)$, $z_{i2}(h_j)$ are the $i$th pair of observed values in this bin.
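As an illustration of Equation (2), the following is a minimal Python sketch of the method of moments estimator. It is not the authors' code: the function name `experimental_variogram`, its arguments, and the toy transect are assumptions made for this example, and a small numerical slack is added to the tolerance so that exact grid distances are matched despite floating-point rounding.

```python
import math

def experimental_variogram(coords, values, lag_centres, tol=0.0):
    """Method-of-moments semivariances for each lag bin.

    coords: list of (x, y) sample positions; values: observations at those
    positions; lag_centres: bin centres h_j; tol: half-width of each bin.
    Returns (semivariances, pair_counts); a bin with no pairs gives None.
    """
    sums = [0.0] * len(lag_centres)
    counts = [0] * len(lag_centres)
    n = len(values)
    for a in range(n):
        for b in range(a + 1, n):
            dist = math.dist(coords[a], coords[b])
            for j, h in enumerate(lag_centres):
                # 1e-9 is slack for rounding only, not part of the bin width.
                if abs(dist - h) <= tol + 1e-9:
                    sums[j] += (values[a] - values[b]) ** 2
                    counts[j] += 1
    # Half the mean squared difference in each bin.
    gamma_hat = [s / (2 * c) if c else None for s, c in zip(sums, counts)]
    return gamma_hat, counts

# Four points on a unit-interval transect with linearly increasing values.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [0.0, 1.0, 2.0, 3.0]
gamma_hat, counts = experimental_variogram(coords, values, [1.0, 2.0, 3.0])
```

With a zero tolerance on a regular grid, as in this toy case, each bin contains only pairs with exactly the stated separation, which mirrors the regular-grid case described above.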

Kriging and simulation require that the variogram is expressed as a mathemat-

ical function or model. This function must obey several mathematical constraints

to describe random variation and to avoid negative variances. This is achieved

typically by fitting a suitable function to the experimental variogram. The mathematical constraints, and the most commonly used authorized functions which

obey them, are described by Webster and Oliver (2001). In practice, the model

type may be chosen by visual inspection of the experimental variogram or, after

fitting the model, by more formal criteria such as the Akaike Information Criterion

(McBratney and Webster, 1986). Webster and Oliver (2001) recommend that the

model type should be chosen by a procedure which combines visual and statistical

assessment.

Each function has a few parameters that are selected to fit the function to the experimental variogram. Different methods are used to estimate these parameters. Some practitioners do so by eye, but most prefer more objective methods. Cressie (1985) describes three mathematical techniques for fitting the parameter values. The simplest is the least squares method. If $\theta$ is the vector of $p$ variogram parameters, $\gamma(h; \theta)$ the corresponding parameterised variogram function, and $k$ the number of experimental variogram bins, then the method of least squares chooses $\theta$ that minimizes

$$\sum_{i=1}^{k} \{\hat{\gamma}(h_i) - \gamma(h_i; \theta)\}^2. \tag{3}$$

However, the reliability of each experimental semivariance $\hat{\gamma}(h_i)$ varies according to the number of point pairs used to describe it and the actual value of $\gamma(h_i)$. Therefore it is better to use weighted least squares and minimize

$$\sum_{i=1}^{k} w_i \{\hat{\gamma}(h_i) - \gamma(h_i; \theta)\}^2, \tag{4}$$

where $w_i$ is a weighting function. The weighting function may be set proportional to $n(h_i)$ or, in order to account for the inverse relation between the reliability of an estimate of variance and the variance itself,

$$w_i = \frac{n(h_i)}{\gamma(h_i)^2}, \tag{5}$$

may be specified.
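The weighted least squares criterion of Equations (4) and (5) can be sketched in Python as follows. This is a hedged illustration rather than the fitting procedure used in the paper: the function names, the coarse grid search, and the synthetic data are hypothetical, and the semivariance in the denominator of the weight is assumed to be the value of the candidate model. The exponential model used here is the one the paper specifies in Equation (25).

```python
import math

def exponential_model(h, c0, c1, r):
    """Exponential variogram; the value at h = 0 is zero by definition."""
    if h == 0:
        return 0.0
    return c0 + c1 * (1.0 - math.exp(-h / r))

def wls_objective(lags, gamma_hat, pair_counts, params):
    """Weighted sum of squares of Equation (4) with the weights of Equation (5).

    Assumption: the semivariance in the weight's denominator is taken from
    the candidate model, so the weight changes with the parameters.
    """
    total = 0.0
    for h, g_hat, n_pairs in zip(lags, gamma_hat, pair_counts):
        g_model = exponential_model(h, *params)
        weight = n_pairs / g_model ** 2
        total += weight * (g_hat - g_model) ** 2
    return total

def grid_search(lags, gamma_hat, pair_counts, c0_grid, c1_grid, r_grid):
    """Return the (c0, c1, r) triple on the grid with the smallest objective."""
    best_params, best_value = None, float("inf")
    for c0 in c0_grid:
        for c1 in c1_grid:
            for r in r_grid:
                value = wls_objective(lags, gamma_hat, pair_counts, (c0, c1, r))
                if value < best_value:
                    best_params, best_value = (c0, c1, r), value
    return best_params, best_value

# Hypothetical experimental variogram that lies exactly on a known model.
lags = [5.0, 10.0, 20.0, 40.0]
pair_counts = [40, 36, 30, 20]
gamma_hat = [exponential_model(h, 0.0, 1.0, 16.0) for h in lags]
best_params, best_value = grid_search(
    lags, gamma_hat, pair_counts,
    c0_grid=[0.0, 0.1, 0.2], c1_grid=[0.8, 1.0, 1.2], r_grid=[8.0, 16.0, 24.0],
)
```

A grid search is used only to keep the sketch free of external dependencies; in practice a numerical optimizer would normally replace it.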

The most rigorous of the three techniques described by Cressie (1985) is generalized least squares (GLS). The GLS technique accounts for the accuracy of each bin estimate of the experimental variogram, and the correlation between each estimate. The chosen parameter values minimize

$$(\hat{\gamma}(h) - \gamma(h; \theta))^T \, \Sigma^{-1}(h; \theta) \, (\hat{\gamma}(h) - \gamma(h; \theta)). \tag{6}$$

Here, $h$ is the length $k$ vector of lag bin centres and $\Sigma(h; \theta)$ is the $k \times k$ covariance matrix of $\hat{\gamma}(h)$. This matrix will be discussed in more detail later in this section. The direct method of minimizing Equation (6) has been shown to be inconsistent (Muller and Zimmerman, 1999). To account for this the following iterative scheme is used

$$\theta_{m+1} = \arg\min_{\theta} \, (\hat{\gamma}(h) - \gamma(h; \theta))^T \, \Sigma^{-1}(h; \theta_m) \, (\hat{\gamma}(h) - \gamma(h; \theta)), \qquad \hat{\theta} = \lim_{m \to \infty} \theta_m. \tag{7}$$

Here, $\theta_m$ is the estimate of $\theta$ after $m - 1$ iterations of Equation (7). This iterative scheme requires $\theta_1$, an initial estimate of the parameter values. This initial estimate may be chosen by weighted least squares [Eqs. (4) and (5)]. The procedure then converges to the GLS parameter estimate in an asymptotically efficient and consistent manner.

Assessing Variogram Uncertainty

Several authors (Cressie, 1985; Ortiz and Deutsch, 2002; Pardo-Iguzquiza

and Dowd, 2001a) have derived similar expressions for the uncertainty of the

experimental variogram. In each case they express this uncertainty in terms of $\Sigma$, the covariance matrix of the experimental variogram. The $pq$th element of this matrix is

$$[\Sigma]_{pq} = \mathrm{Cov}[\hat{\gamma}(h_p), \hat{\gamma}(h_q)], \tag{8}$$

and the diagonal elements are the variances of semivariances. The expected value of $\hat{\gamma}(h)$ for each lag distance $h$ is equal to $\gamma(h)$. Therefore we refer to the standard deviations of the semivariance at each lag bin, that is the square root of each element on the main diagonal, as the ergodic error and to $\Sigma$ as the ergodic covariance matrix.

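From the definition of covariance

$$[\Sigma]_{pq} = E[\hat{\gamma}(h_p)\hat{\gamma}(h_q)] - \gamma(h_p)\gamma(h_q), \tag{9}$$

$$[\Sigma]_{pq} = \frac{1}{4n(h_p)n(h_q)} \sum_{i=1}^{n(h_p)} \sum_{j=1}^{n(h_q)} E\left[\{z_{i1}(h_p) - z_{i2}(h_p)\}^2 \{z_{j1}(h_q) - z_{j2}(h_q)\}^2\right] - \gamma(h_p)\gamma(h_q). \tag{10}$$

Munoz-Pardo (1987) showed that if $Z(x)$ is multivariate Gaussian with an isotropic ergodic variogram $\gamma(h)$, then

$$E\left[\{z_{i1}(h_p) - z_{i2}(h_p)\}^2 \{z_{j1}(h_q) - z_{j2}(h_q)\}^2\right] = 2C_{ij}(h_p, h_q) + 4\gamma(h_p)\gamma(h_q). \tag{11}$$

The function $C_{ij}(h_p, h_q)$ describes the covariance between $[z_{i1}(h_p) - z_{i2}(h_p)]$ and $[z_{j1}(h_q) - z_{j2}(h_q)]$ and may be written

$$C_{ij}(h_p, h_q) = \left\{\gamma(|x_{i1} - x_{j1}|) + \gamma(|x_{i2} - x_{j2}|) - \gamma(|x_{i1} - x_{j2}|) - \gamma(|x_{i2} - x_{j1}|)\right\}^2, \tag{12}$$

where $x_{i1}$, $x_{i2}$, $x_{j1}$, and $x_{j2}$ are the sample points at which the values $z_{i1}(h_p)$, $z_{i2}(h_p)$, $z_{j1}(h_q)$, and $z_{j2}(h_q)$ are measured, and $|\cdot|$ denotes the distance between the sample points. Therefore the $pq$th element of the ergodic covariance matrix is written

$$[\Sigma]_{pq} = \frac{1}{2n(h_p)n(h_q)} \sum_{i=1}^{n(h_p)} \sum_{j=1}^{n(h_q)} C_{ij}(h_p, h_q). \tag{13}$$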

Pardo-Iguzquiza and Dowd (2001b) provide Fortran code to calculate this ex-

pression. To calculate Equation (12), the program requires the ergodic variogram

function as an input. This is best approximated from the fitted variogram model.
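Equations (12) and (13) can be evaluated directly by looping over the pairs in each lag bin. The sketch below is an illustrative Python version, not the Fortran program of Pardo-Iguzquiza and Dowd (2001b). The function names, the exponential variogram with $c_0 = 0$, $c_1 = 1$, $r = 16$ used as input, and the requirement that every bin contains at least one pair are assumptions made for the example.

```python
import math

def exponential_model(h, c0=0.0, c1=1.0, r=16.0):
    """Ergodic variogram supplied as input; zero at h = 0."""
    if h == 0:
        return 0.0
    return c0 + c1 * (1.0 - math.exp(-h / r))

def lag_pairs(coords, h, tol=1e-9):
    """Index pairs (a, b), a < b, whose separation matches lag h."""
    n = len(coords)
    return [(a, b) for a in range(n) for b in range(a + 1, n)
            if abs(math.dist(coords[a], coords[b]) - h) <= tol]

def ergodic_covariance(coords, lags, variogram):
    """Covariance matrix of the experimental variogram, Equation (13).

    Assumes every lag bin contains at least one pair of points.
    """
    def g(a, b):
        return variogram(math.dist(coords[a], coords[b]))

    bins = [lag_pairs(coords, h) for h in lags]
    k = len(lags)
    sigma = [[0.0] * k for _ in range(k)]
    for p in range(k):
        for q in range(k):
            total = 0.0
            for i1, i2 in bins[p]:
                for j1, j2 in bins[q]:
                    # Bracketed term of Equation (12), squared below.
                    term = g(i1, j1) + g(i2, j2) - g(i1, j2) - g(i2, j1)
                    total += term ** 2
            sigma[p][q] = total / (2 * len(bins[p]) * len(bins[q]))
    return sigma

# Five points on a transect at unit interval, lags of 1 and 2 units.
transect = [(float(i), 0.0) for i in range(5)]
sigma = ergodic_covariance(transect, [1.0, 2.0], exponential_model)
```

For a single pair of points separated by $h$, this sketch returns a variance of $2\gamma(h)^2$, which is the variance of half the square of a Gaussian difference whose own variance is $2\gamma(h)$.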

If the distribution of semivariances is multivariate Gaussian, it is completely defined by the ergodic variogram $\gamma(h)$ and the covariance matrix $\Sigma$. Furthermore, standard statistical theory (Gathwaite, Joliffe, and Jones, 1995) states that the quantity

$$(\hat{\gamma}(h) - \gamma(h))^T \, \Sigma^{-1}(h) \, (\hat{\gamma}(h) - \gamma(h)), \tag{14}$$

has a chi squared distribution with $k$ degrees of freedom. Confidence sets for the variogram, with confidence $(1 - \alpha)$, where $\alpha$ is the significance level, are given by




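Similarly,

$$E[\gamma_{NE}(h_p)\gamma_{NE}(h_q)] \approx \frac{1}{2N(h_p)N(h_q)} \sum_{i=1}^{N(h_p)} \sum_{j=1}^{N(h_q)} C_{ij} + \gamma(h_p)\gamma(h_q), \tag{21}$$

and

$$E[\hat{\gamma}(h_p)\hat{\gamma}(h_q)] \approx \frac{1}{2n(h_p)n(h_q)} \sum_{i=1}^{n(h_p)} \sum_{j=1}^{n(h_q)} C_{ij} + \gamma(h_p)\gamma(h_q). \tag{22}$$

Therefore, substituting Equations (20), (21), and (22) into Equation (17) gives

$$[\Sigma_{NE}]_{pq} \approx \frac{1}{2n(h_p)n(h_q)} \sum_{r=1}^{n(h_p)} \sum_{s=1}^{n(h_q)} C_{rs}(h_p, h_q) + \frac{1}{2N(h_p)N(h_q)} \sum_{r=1}^{N(h_p)} \sum_{s=1}^{N(h_q)} C_{rs}(h_p, h_q) - \frac{1}{2n(h_p)N(h_q)} \sum_{r=1}^{n(h_p)} \sum_{s=1}^{N(h_q)} C_{rs}(h_p, h_q) - \frac{1}{2N(h_p)n(h_q)} \sum_{r=1}^{N(h_p)} \sum_{s=1}^{n(h_q)} C_{rs}(h_p, h_q). \tag{23}$$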

This expression may be calculated numerically in a similar manner to Equation (13). It is more computationally expensive however since the covariances between $N(N - 1)/2$ pairs of points must be considered.

The most common method of estimating the uncertainty of fitted variogram parameter estimates is by calculating the inverse of the information matrix (Pardo-Iguzquiza and Dowd, 2001a). The $p \times p$ information matrix, $M$, that corresponds to parameter vector $\theta$ (of length $p$) fitted by GLS is

$$M = J^T \Sigma^{-1} J. \tag{24}$$

Here, $J$ is the $k \times p$ Jacobian matrix in which the $ij$th element is $[J]_{ij} = \partial\gamma(h_i)/\partial\theta_j$, evaluated at the GLS estimate of $\theta$. A result from nonlinear inversion theory (Menke, 1984) says that $M^{-1}$ is a leading order Taylor series approximation to the covariance matrix of the parameter estimates. Since this is a leading order approximation it is only accurate for estimates of $\theta$ that are themselves accurate. The approximation assumes that the parameter estimates have a multivariate Gaussian distribution. In this case, and under the assumption that the variogram estimation technique is unbiased, the distribution of the vector of parameter estimates, $\hat{\theta}$, is completely defined. The mean value is given by the parameter vector of the simulated variable, and the covariance matrix by $M^{-1}$.

Ortiz and Deutsch (2002) assessed variogram parameter uncertainty by a

more arbitrary criterion. They examined the experimental variogram covariance matrix, $\Sigma$, and fitted variograms to what they judged to be "extreme" realizations of the experimental variogram. These fitted variograms were themselves assumed to have "extreme" parameter values.

SIMULATION EXPERIMENTS

Simulated Fields and Sampling Schemes

The characteristics of the simulated fields and sampling schemes matched

those used by Webster and Oliver (1992). Fields were generated with one of two

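ergodic variogram models. The first was the exponential variogram model

$$\gamma(h) = c_0 + c_1\{1 - \exp(-h/r)\} \quad \text{for } h > 0, \tag{25}$$

$$\gamma(0) = 0, \tag{26}$$

where $c_0$ is the nugget variance, $c_1$ the sill of the spatially structured variance, and $r$ the distance parameter of the model. The chosen parameter values were $c_0 = 0$, $c_1 = 1$, and $r = 16$. The other was the spherical variogram model

$$\gamma(h) = c_0 + c_1\left\{\frac{3h}{2a} - \frac{1}{2}\left(\frac{h}{a}\right)^3\right\} \quad \text{for } 0 < h \le a, \tag{27}$$

$$\gamma(h) = c_0 + c_1 \quad \text{for } h > a, \tag{28}$$

$$\gamma(0) = 0, \tag{29}$$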

where $c_0$ and $c_1$ have the same meaning as in Equation (25) and $a$ is the distance parameter. The parameter values were $c_0 = 1/3$, $c_1 = 2/3$, and $a = 50$. The distance parameter, $a$, for the spherical model is the range of the spatial dependence, whereas the exponential model has effective range $3r$. Thus both of the models applied had approximately the same effective range.

Each field was generated using unconditioned sequential Gaussian simulation (Deutsch and Journel, 1998) and consisted of either 120 × 120 = 14,400 or 256 × 256 = 65,536 values on a square grid at unit interval. Henceforth we refer to the four sets of simulated fields as Sets 1 to 4. Sets 1 and 2 are the sets of large fields with exponential and spherical variograms respectively. Sets 3 and 4 are the sets of small fields with exponential and spherical variograms respectively.

Table 1. The Number of Sample Points in Each Scheme and the Corresponding Distance Between These Points

Sample points    25    49    100    144    225    400
Interval         20    15     10      8      7      5
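The two variogram models and the parameter values quoted above can be checked numerically with a short Python sketch. The function names are hypothetical, and the spherical model is written in its standard form, which is assumed here to be what Equations (27) to (29) express.

```python
import math

def exponential_variogram(h, c0, c1, r):
    """Exponential model: c0 + c1 * (1 - exp(-h / r)) for h > 0, zero at h = 0."""
    if h == 0:
        return 0.0
    return c0 + c1 * (1.0 - math.exp(-h / r))

def spherical_variogram(h, c0, c1, a):
    """Standard spherical model: reaches its sill c0 + c1 exactly at h = a."""
    if h == 0:
        return 0.0
    if h >= a:
        return c0 + c1
    ratio = h / a
    return c0 + c1 * (1.5 * ratio - 0.5 * ratio ** 3)

# Parameter values quoted in the text for the two simulated models.
exp_at_effective_range = exponential_variogram(48.0, 0.0, 1.0, 16.0)  # h = 3r
sph_at_range = spherical_variogram(50.0, 1.0 / 3.0, 2.0 / 3.0, 50.0)  # h = a
```

At the effective range $3r = 48$ the exponential model has reached $1 - e^{-3}$, about 95%, of its sill, while the spherical model reaches its sill exactly at $a = 50$, which is why the two effective ranges are described as approximately equal.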

The smaller fields were simulated 1000 times, and the larger ones 100 times. The smaller fields, where the effective range was almost half of the length of the field, were sampled using regular square grids with the sample sizes and sampling intervals listed in Table 1. For the larger fields, the effective range of the simulated

variogram was less than a fifth of the length of the field. If the whole field had

been sampled using a square grid it would have provided little information about

the structured part of the variogram, unless the grid was very dense. Therefore the

field was sampled along transects. The combinations of sample sizes and sample

intervals were the same as those listed in Table 1, for example, a sample size of

25 points was split into 5 transects, with each point separated by distance 5.

Each of the four sets of fields was sampled with six different sampling

schemes. The exact position of the sampling grids or transects was chosen at

random, but the same positions were used for each realization within the same

field set. All of the theoretical expressions for variogram uncertainty described

previously are functions of both the ergodic variogram and the sampling scheme

used. Therefore, in the three simulation tests described below, each combination

of sampling scheme and field type was tested independently. In each case, the

theoretical expressions of variogram uncertainty were calculated. Then each of

the realizations was sampled, and from these data an experimental variogram was

estimated, and a variogram model fitted by GLS. The fitted variogram model was of the same type as that of the simulated variable. The errors in the variogram estimates were then compared with the expected values. Also, a further simulated approximation of the covariance matrix of fitted parameter values was made by simulating Gaussian realizations of the experimental variogram $\hat{\gamma}(h)$ directly, using the estimated experimental variogram covariance matrix $\Sigma$. A model was fitted to

each realization by GLS and the covariance matrix of these simulated parameter

estimates was calculated.

Experiment 1

The first experiment considered the covariance matrices calculated from Equations (13) and (24), which describe the uncertainty of method of moments estimates to the ergodic variogram. These theoretical uncertainty estimates were calculated for each combination of test set and sampling scheme. The ergodic variogram values required by Equations (13) and (24) were taken from the variogram used to simulate the relevant data set.

The covariance matrices of the experimental variograms and fitted parameters for the sets of simulated data were then derived. The sampling scheme being tested was applied to each realization of the random variable. Experimental variograms were calculated for each realization. The tolerance on the lag bins was set at zero. A variogram model of the same type as the simulated variable was fitted to the experimental variogram by a single iteration of the GLS procedure [Eq. (7)]. Limits were placed on the possible fitted values for each data set in order to prevent negative variances and ensure that the range of spatial correlation was not greater than half the length of the field. Variogram estimates for lags greater than half the length of a region are known to be unreliable (Webster and Oliver, 2001). The limits are listed in Table 2. For Sets 1 and 3 the fitting procedure was then repeated with the minimum value of $c_0$ equal to −0.3. Such a model would not be fitted in reality since the variance is negative for small lag distances. It is included here so that the effect of the $c_0 = 0$ constraint on the uncertainty estimates may be separated from other sources of error.

Table 2. The Constraints Placed on the Fitted Parameters for Each Data Set

Data set    c0 min    c0 max    c1 min    c1 max    a or r min    a or r max
Set 1       0.0       0.7       0.1       1.6       7.0           30.0
Set 2       0.0       0.7       0.1       1.6       21.0          80.0
Set 3       0.0       0.7       0.1       1.6       7.0           30.0
Set 4       0.0       0.7       0.1       1.6       21.0          80.0

Figure 1. Comparison between expected ergodic errors and those observed for the Set 1 data set. The continuous lines show the expected ergodic errors for the marked sample size. The observed ergodic errors from 400, 100 (+), and 25 sample points are shown by distinct symbols.

[Figure caption, beginning not recovered] The continuous lines show the expected ergodic errors for the marked sample size. The simulated standard errors from 400, 100 (+), and 25 sample points are shown by distinct symbols.

[Figure caption, beginning not recovered] The continuous lines show the expected ergodic errors for the marked sample size. The ergodic errors from 400, 100 (+), and 25 sample points are shown by distinct symbols.

In Figures 1 to 4 the expected ergodic error is compared with that observed from the simulated data sets. There is good agreement for all data sets. The expected variance of fitted parameter estimates ($c_0$, $c_1$, $a$ or $r$) is compared with the simulated variance of these estimates in Tables 3 to 6. Here, the minimum permissible value of $c_0$ for Set 1 and Set 3 is −0.3. For sample schemes of fewer than 100 points, the simulated values are less than the expected values. This is due to the theoretical

Table 3. Comparison of Theoretical Variances of Fitted Variogram Parameters, With Variances Observed From Multiple Simulated Fields and Multiple Simulated Experimental Variograms, for the Set 1 Data Set

        Theoretical                     Simulated field                 Simulated variogram
Size    c0        c1        r           c0        c1        r           c0        c1        r
25      2.34e01   1.99e01   6.27e03     1.39e-01  4.36e-01  8.56e02     1.34e-01  5.67e-01  1.37e03
49      1.67e00   1.31e00   8.46e02     1.06e-01  2.72e-01  7.81e02     1.11e-01  2.99e-01  6.75e02
100     1.09e-01  1.33e-01  1.44e02     5.45e-02  1.92e-01  4.65e02     7.60e-02  1.45e-01  4.83e02
144     3.47e-02  7.87e-02  8.45e01     2.83e-02  1.78e-01  3.06e02     4.44e-02  9.11e-02  2.22e02
225     1.39e-02  5.59e-02  5.41e01     1.87e-02  9.95e-02  1.55e02     2.23e-02  9.42e-02  1.72e02
400     2.96e-03  4.23e-02  2.69e01     5.10e-03  4.91e-02  3.52e01     5.82e-03  3.70e-02  3.66e02




Figure 6. A histogram showing a distribution of the fitted values of $c_1$, for the Set 4 data set sampled with a 400 point square grid. The continuous line shows the distribution predicted by Equation (13). The expected value of $c_1$ = 2/3.

Experiment 3

In Experiments 1 and 2, the ergodic variogram of the simulated variable is used

to calculate the expected variogram errors. In practice, this would be unknown.

Instead it would have to be approximated by the model fitted to the experimental

variogram. The third experiment investigates the effect that this has on the accuracy

of the confidence limits.

Figure 7. A histogram showing a distribution of the fitted values of $a$, for the Set 4 data set sampled with a 400 point square grid. The continuous line shows the distribution predicted by Equation (13). The expected value of $a$ = 50.

For each variogram estimated in Experiment 1, the value of

$$(\hat{\gamma}(h) - \gamma(h))^T \, \Sigma^{-1} \, (\hat{\gamma}(h) - \gamma(h)), \tag{30}$$

was calculated. Here, $\gamma(h)$ is the ergodic variogram of the simulated variable calculated at the vector of lag distances $h$, $\hat{\gamma}(h)$ is the vector of estimated experimental variogram values and $\Sigma$ is the covariance matrix of the experimental variogram estimates, calculated from Equation (13), using $\gamma(h)$. The covariance matrix is then recalculated, using the variogram fitted to $\hat{\gamma}(h)$. Then Equation (30) is recalculated using this new matrix. For each test set and sampling scheme combination, the distributions of the two sets of values of Equation (30) should form chi squared distributions of order $k$, where $k$ is the number of experimental lag bins, and the confidence limits may be calculated from Equation (15).
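The statistic of Equation (30) is a quadratic form in the inverse covariance matrix, and it can be computed without forming the inverse explicitly. The following Python sketch does so with a hand-written Cholesky factorisation. It is a hedged illustration only: the function names and the two-bin example are hypothetical, and the covariance matrix is assumed to be symmetric and positive definite.

```python
import math

def cholesky(matrix):
    """Lower-triangular factor L with L L^T = matrix.

    Assumes the matrix is symmetric and positive definite.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def quadratic_form(gamma_hat, gamma, sigma):
    """Value of (gamma_hat - gamma)^T sigma^{-1} (gamma_hat - gamma)."""
    diff = [a - b for a, b in zip(gamma_hat, gamma)]
    lower = cholesky(sigma)
    # Forward substitution solves L y = diff; the statistic is then y . y.
    y = []
    for i, d in enumerate(diff):
        s = sum(lower[i][m] * y[m] for m in range(i))
        y.append((d - s) / lower[i][i])
    return sum(v * v for v in y)

# Hypothetical two-bin example with correlated semivariance estimates.
statistic = quadratic_form([1.0, 1.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]])
```

The resulting value would then be compared with a quantile of the chi squared distribution with $k$ degrees of freedom, taken from tables or a statistics library; that step is not shown here.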

Tables 7 to 10 give the observed percentage of ergodic experimental variogram estimates lying within each theoretical confidence limit. The theoretical

confidence limits appear to be reasonable for covariance matrices calculated with

fitted variogram estimates and for covariance matrices calculated with the actual

ergodic variogram. In general, the confidence limits resulting from the actual

ergodic variogram are slightly more accurate. This is particularly noticeable for

sample schemes with fewer than 100 points.

DISCUSSION

Extensive simulation tests have shown that, for an isotropic Gaussian random

variable with known ergodic variogram, the covariance matrix of experimental




Figure 8. Comparison between expected nonergodic errors and those observed for the Set 1 data set.

The continuous lines show the expected nonergodic errors [calculated from Eq. (23)] for the marked

sample size. Separate markers denote the nonergodic errors from 400 sample points, from 100 sample points (+), and from 25 sample points.

Table 7. Percentage of Estimates to the Ergodic Variogram Lying Within the Theoretical Confidence Limits for the Set 1 Data Set

                        Theoretical confidence limit (%)
Sample size  Variogram     99     98     95     90     80     70     50     30     10
         25  Fitted      92.0   91.0   85.0   77.0   68.0   58.0   45.0   27.0    9.0
         25  Ergodic     99.0   98.0   94.0   87.0   79.0   72.0   56.0   36.0   12.0
         49  Fitted      94.0   92.0   92.0   87.0   77.0   67.0   49.0   31.0   13.0
         49  Ergodic     97.0   96.0   93.0   92.0   84.0   74.0   61.0   39.0   17.0
        100  Fitted      97.0   95.0   93.0   86.0   73.0   65.0   52.0   30.0   10.0
        100  Ergodic     95.0   95.0   93.0   90.0   83.0   80.0   57.0   31.0   10.0
        144  Fitted      98.0   97.0   94.0   86.0   81.0   68.0   52.0   33.0   12.0
        144  Ergodic     98.0   98.0   95.0   93.0   88.0   82.0   62.0   44.0   15.0
        225  Fitted      97.0   96.0   93.0   87.0   78.0   64.0   56.0   37.0   11.0
        225  Ergodic     98.0   96.0   94.0   91.0   75.0   67.0   54.0   32.0   12.0
        400  Fitted     100.0   96.0   93.0   89.0   74.0   64.0   44.0   28.0   11.0
        400  Ergodic     97.0   94.0   89.0   86.0   78.0   73.0   58.0   36.0   16.0

Note. Theoretical confidence limits calculated with the fitted and ergodic variograms are treated separately.




Figure 9. Comparison between expected nonergodic errors and those observed for the Set 2 data set.

The continuous lines show the expected nonergodic errors [calculated from Eq. (23)] for the marked

sample size. Separate markers denote the nonergodic errors from 400 sample points, from 100 sample points (+), and from 25 sample points.




Table 8. Percentage of Estimates to the Ergodic Variogram Lying Within the Theoretical Confidence Limits for the Set 2 Data Set

                        Theoretical confidence limit (%)
Sample size  Variogram     99     98     95     90     80     70     50     30     10
         25  Fitted      93.0   91.0   88.0   85.0   72.0   58.0   44.0   27.0    7.0
         25  Ergodic     97.0   97.0   89.0   87.0   78.0   73.0   56.0   38.0    8.0
         49  Fitted      97.0   93.0   91.0   83.0   72.0   63.0   51.0   27.0   11.0
         49  Ergodic     99.0   98.0   95.0   91.0   83.0   73.0   53.0   36.0   14.0
        100  Fitted      95.0   93.0   87.0   79.0   74.0   64.0   50.0   29.0   18.0
        100  Ergodic     98.0   96.0   92.0   89.0   83.0   73.0   58.0   37.0   16.0
        144  Fitted      97.0   96.0   92.0   88.0   74.0   69.0   52.0   33.0   10.0
        144  Ergodic     97.0   97.0   95.0   93.0   87.0   80.0   62.0   46.0   13.0
        225  Fitted      95.0   91.0   89.0   82.0   74.0   62.0   45.0   28.0   12.0
        225  Ergodic     97.0   96.0   92.0   87.0   76.0   69.0   60.0   35.0   11.0
        400  Fitted      97.0   96.0   92.0   84.0   78.0   61.0   47.0   31.0   13.0
        400  Ergodic     98.0   96.0   94.0   85.0   79.0   73.0   58.0   39.0   18.0

Note. Theoretical confidence limits calculated with the fitted and ergodic variograms are treated separately.




Figure 10. Comparison between expected nonergodic errors and those observed for the Set 3 data

set. The continuous lines show the expected nonergodic errors [calculated from Eq. (23)] for the

marked sample size. Separate markers denote the nonergodic errors from 400 sample points, from 100 sample points (+), and from 25 sample points.




Table 9. Percentage of Estimates to the Ergodic Variogram Lying Within the Theoretical Confidence Limits for the Set 3 Data Set

                        Theoretical confidence limit (%)
Sample size  Variogram     99     98     95     90     80     70     50     30     10
         25  Fitted      93.4   91.3   87.0   82.2   72.4   66.0   50.5   32.6   11.5
         25  Ergodic     97.0   95.2   93.0   90.6   83.9   78.3   60.4   38.1   11.9
         49  Fitted      93.3   91.5   88.8   81.9   74.1   65.0   49.7   30.6   10.3
         49  Ergodic     96.5   95.0   91.9   88.4   82.8   75.5   59.4   40.5   14.1
        100  Fitted      95.1   92.8   89.4   83.9   75.1   66.3   47.9   30.4   11.1
        100  Ergodic     96.8   95.1   92.5   89.4   80.9   73.3   58.4   39.2   14.8
        144  Fitted      95.2   92.0   88.7   84.4   76.6   66.4   46.8   29.9   10.8
        144  Ergodic     96.5   95.0   92.4   89.4   81.5   74.6   60.3   41.0   15.4
        225  Fitted      95.5   93.7   90.1   85.9   75.9   67.2   49.3   31.5   10.3
        225  Ergodic     96.0   93.9   90.4   85.4   78.7   71.6   60.8   42.3   17.1
        400  Fitted      96.8   96.2   91.4   86.6   80.1   72.0   47.8   28.5   11.8
        400  Ergodic     98.0   96.0   94.0   85.0   79.0   73.0   58.0   39.0   18.0

Note. Theoretical confidence limits calculated with the fitted and ergodic variograms are treated separately.




Figure 11. Comparison between expected nonergodic errors and those observed for the Set 4 data

set. The continuous lines show the expected nonergodic error [calculated from Eq. (23)] for the

marked sample size. Separate markers denote the nonergodic errors from 400 sample points, from 100 sample points (+), and from 25 sample points.

Table 10. Percentage of Estimates to the Ergodic Variogram Lying Within the Theoretical Confidence Limits for the Set 4 Data Set

                        Theoretical confidence limit (%)
Sample size  Variogram     99     98     95     90     80     70     50     30     10
         25  Fitted      92.3   90.2   87.0   80.7   72.4   63.9   47.4   28.5   11.6
         25  Ergodic     94.8   93.3   90.3   86.9   81.1   74.3   57.5   36.9   11.1
         49  Fitted      92.3   90.3   86.1   80.7   71.2   62.0   46.5   31.1   11.0
         49  Ergodic     95.2   93.5   90.5   86.6   81.5   75.2   60.1   40.8   14.0
        100  Fitted      90.2   88.3   82.8   78.2   67.6   61.3   43.0   26.5   10.0
        100  Ergodic     95.5   94.7   90.4   85.7   78.5   71.9   56.7   38.7   12.7
        144  Fitted      90.0   87.6   83.1   76.1   66.6   59.5   43.3   26.6    8.8
        144  Ergodic     95.9   94.4   91.5   88.1   82.1   74.2   59.4   38.9   15.1
        225  Fitted      89.1   87.7   83.1   77.6   68.1   60.8   43.8   28.7   11.3
        225  Ergodic     94.6   93.0   89.6   85.6   79.1   73.6   60.0   39.8   16.8
        400  Fitted      91.3   88.9   83.4   78.7   69.4   58.5   42.9   24.9    8.7
        400  Ergodic     96.8   94.5   90.7   86.0   76.9   71.9   55.7   40.9   18.2

Note. Theoretical confidence limits calculated with the fitted and ergodic variograms are treated separately.




Figure 12. Comparison of the expected ergodic (continuous line) and nonergodic (dotted line)

errors for the Set 1 data set.

variogram estimates is approximated accurately by Equation (13). Pardo-Iguzquiza

and Dowd (2001a) approximated the covariance matrix of the parameters of fitted
variogram models by calculating the inverse of the information matrix [Eq. (24)].
This method is seen to overestimate the precision of the parameter estimates.
This might be due to a number of factors. First, this expression for parameter
uncertainty is based on a leading order Taylor series expansion centered on the
actual ergodic variogram. Thus it is accurate only when the uncertainty is small. In
Tables 3 to 6, the uncertainty estimates are seen to improve as the sample size, and
therefore the precision of estimates, increases. Secondly, this method assumes that
the distribution of fitted variogram parameters is Gaussian. The distributions of
parameters fitted to the simulated data sets were seen to deviate from Gaussian. The
method also assumes that the parameters may take any value. In the simulation
tests it was necessary to place constraints on the parameter values for practical
reasons, as would be the case in a real survey. Finally, for some realizations an

inappropriate choice of variogram model might have caused larger deviations from

the expected parameter values.
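The first-order argument behind an information-matrix approximation of this kind can be sketched as follows. Equation (24) is not reproduced in this section, so the notation is an assumption made for illustration: $\theta$ is the vector of variogram parameters, $\theta_0$ its true value, $J$ the matrix of partial derivatives of the model semivariances with respect to the parameters evaluated at $\theta_0$, and $\Sigma$ the covariance matrix of the experimental variogram from Equation (13). The sketch is the standard generalized least squares linearization, which may differ in detail from the expression used by Pardo-Iguzquiza and Dowd (2001a).

```latex
% Leading-order Taylor expansion of the model about the true parameters
\gamma(\mathbf{h}; \theta) \approx \gamma(\mathbf{h}; \theta_0) + J(\theta - \theta_0)

% Generalized least squares solution of the linearized problem
\hat{\theta} - \theta_0 \approx (J^{T}\Sigma^{-1}J)^{-1} J^{T}\Sigma^{-1}
    \bigl(\hat{\gamma}(\mathbf{h}) - \gamma(\mathbf{h}; \theta_0)\bigr)

% Covariance of the fitted parameters under the linearization
\mathrm{Cov}(\hat{\theta}) \approx (J^{T}\Sigma^{-1}J)^{-1}
```

The terms neglected in the first line grow with the size of the variogram errors, which is consistent with the approximation improving as the sample size increases. The last line also places no bounds on $\hat{\theta}$, so it cannot reflect constraints imposed during fitting.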






As an alternative to calculating the inverse of the information matrix, the

uncertainty of fitted parameter values may be assessed by simulating multiple

experimental variograms using Equation (13) and then fitting variogram models to

these. This process is computationally more expensive, but for a small number of lag bins it is practical. The results in Tables 3 to 6 show that this simulation method is

more accurate than using the information matrix. Also, there is no need to assume

a particular distribution of parameter estimates, and constraints on the parameter

values can be accounted for.
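A minimal Python sketch of this simulation approach is given below. It is an illustration under several assumptions that are not taken from the paper: a spherical model without a nugget, ordinary least squares fitting with a grid search over the range (the study refers to a GLS procedure), ten hypothetical lag bins, and a hypothetical covariance matrix with 10% standard deviations and geometrically decaying correlations in place of one computed from Equation (13). The values c1 = 2/3 and a = 50 are the expected values quoted for the Set 4 data set.

```python
import math
import random

def spherical(h, c1, a):
    """Spherical variogram with sill c1 and range a (no nugget, by assumption)."""
    if h >= a:
        return c1
    r = h / a
    return c1 * (1.5 * r - 0.5 * r ** 3)

def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][c] * low[j][c] for c in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def fit_model(lags, gamma_hat, a_grid):
    """Least squares fit of (c1, a): the range is restricted to a_grid and the
    sill is constrained to be non-negative."""
    best = None
    for a in a_grid:
        shape = [spherical(h, 1.0, a) for h in lags]
        c1 = sum(s * g for s, g in zip(shape, gamma_hat)) / sum(s * s for s in shape)
        c1 = max(c1, 0.0)
        sse = sum((g - c1 * s) ** 2 for s, g in zip(shape, gamma_hat))
        if best is None or sse < best[0]:
            best = (sse, c1, a)
    return best[1], best[2]

lags = [10.0 * i for i in range(1, 11)]          # hypothetical lag bin centres
c1_true, a_true = 2.0 / 3.0, 50.0
model = [spherical(h, c1_true, a_true) for h in lags]

# Hypothetical covariance matrix of the experimental variogram (illustrative).
sd = [0.1 * g for g in model]
cov = [[sd[i] * sd[j] * 0.5 ** abs(i - j) for j in range(10)] for i in range(10)]
low = cholesky(cov)
a_grid = [20.0 + 2.0 * i for i in range(41)]     # range constrained to [20, 100]

rng = random.Random(0)
fits = []
for _ in range(500):
    # Draw one experimental variogram: model values plus correlated Gaussian errors.
    z = [rng.gauss(0.0, 1.0) for _ in lags]
    gamma_sim = [model[i] + sum(low[i][j] * z[j] for j in range(i + 1))
                 for i in range(10)]
    fits.append(fit_model(lags, gamma_sim, a_grid))

c1_sorted = sorted(c1 for c1, _ in fits)
a_sorted = sorted(a for _, a in fits)
print("c1, 5th to 95th percentile:", c1_sorted[25], c1_sorted[474])
print("a,  5th to 95th percentile:", a_sorted[25], a_sorted[474])
```

Because the spread of the parameters is read directly from the fitted values, no Gaussian assumption is needed, and the bounds on the range enter simply through the search grid.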

Simulation tests have also shown that the expected nonergodic errors are

approximated accurately by Munoz-Pardo's (1987) expression [Eq. (23)]. These
nonergodic errors are due purely to sampling. Estimates of the ergodic variogram
have a component of uncertainty due to the fluctuations of the random variable, in
addition to this sampling error. When the large fields studied in Set 1 and Set 2 were
sampled with a 25 point scheme, the ergodic and nonergodic errors were almost

identical. When more sampling points were used, the nonergodic error became

less than the ergodic error. The difference between the nonergodic and ergodic






errors was more pronounced for the smaller fields considered in Set 3 and Set 4,

particularly over large lag distances.
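The distinction between the two kinds of error can be illustrated with a small Python sketch. It is a toy analogue under stated assumptions rather than a reproduction of the experiments: a one-dimensional Gaussian sequence with exponential covariance (an AR(1) process) stands in for the two-dimensional fields, the variogram computed from every point of a realization plays the role of the nonergodic variogram, and all numerical settings are hypothetical.

```python
import math
import random

def simulate_field(n, phi, rng):
    """Stationary Gaussian AR(1) sequence with unit variance and covariance phi**|k|."""
    z = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - phi * phi)
    for _ in range(n - 1):
        z.append(phi * z[-1] + scale * rng.gauss(0.0, 1.0))
    return z

def semivariance(z, lag, step=1):
    """Method-of-moments semivariance at `lag` from points spaced `step` apart
    (lag must be a multiple of step so that both ends of each pair are sampled)."""
    diffs = [z[i + lag] - z[i] for i in range(0, len(z) - lag, step)]
    return 0.5 * sum(d * d for d in diffs) / len(diffs)

n, phi, lag, n_real = 100, 0.9, 20, 400      # hypothetical settings
gamma_ergodic = 1.0 - phi ** lag             # variogram of the random process
rng = random.Random(1)

rmse = {}
for step in (20, 10, 4):                     # sparse to intensive sampling
    sq_erg = sq_non = 0.0
    for _ in range(n_real):
        z = simulate_field(n, phi, rng)
        g_field = semivariance(z, lag)            # from every point of the realization
        g_sample = semivariance(z, lag, step)     # from the sample only
        sq_erg += (g_sample - gamma_ergodic) ** 2
        sq_non += (g_sample - g_field) ** 2
    rmse[step] = (math.sqrt(sq_erg / n_real), math.sqrt(sq_non / n_real))
    print("spacing", step, "ergodic error", rmse[step][0],
          "nonergodic error", rmse[step][1])
```

In this toy setting the error relative to the realization's own variogram shrinks as the sampling becomes more intensive, while the error relative to the process variogram retains the contribution from fluctuations between realizations.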

These results reflect that for the small fields, the sample grids covered the
entire field effectively. Thus most of the variation of the variable within the region,
particularly over large lag distances, was accounted for. Therefore the nonergodic
error was much smaller than the ergodic error, which also had to account for
fluctuations of the random variable over other realizations. For the larger fields,
the sample points were more sparse. Therefore there were parts of the field that were
unsampled and the variation in these was not accounted for. Thus the nonergodic
error was more similar to the ergodic error since both estimators have to account
for behavior and fluctuations of the variable in unsampled regions. In the case of
the ergodic estimator this "unsampled region" consisted of all other realizations
of the variable.

These simulation tests were computationally very expensive. Each realization
was sampled at every point to calculate a definitive nonergodic variogram.
Webster and Oliver's (1992) study of nonergodic variogram uncertainty used a






method requiring far fewer computations. However, we feel that our method was
worthwhile since it was more accurate. To illustrate this, Figure 16 shows the expected
nonergodic error for the Set 4 data set calculated using Webster and Oliver's
method. The values seen here agree with those found in 1992. They are, however,
significantly less than both the nonergodic errors observed in our simulations and
the expected nonergodic errors from Munoz-Pardo's (1987) expression [Eq. (23)].

Webster and Oliver (1992) estimated the nonergodic error for a particular
sampling grid design and separation distance $h$ by the standard deviation of $\hat{\gamma}(h)$
values derived from translations of the grid over a single realization. Since they
came from the same realization, these estimates of $\gamma_{NE}(h)$ were not independent.
The covariance between $\hat{\gamma}(h; g_p)$, the semivariance estimated from translated grid
$g_p$, and $\hat{\gamma}(h; g_q)$, the semivariance estimated from translated grid $g_q$, is given by

$$\mathrm{Cov}\bigl(\hat{\gamma}(h; g_p), \hat{\gamma}(h; g_q)\bigr) = \frac{1}{2n^{2}(h)} \sum_{i=1}^{n(h)} \sum_{j=1}^{n(h)} C_{ij}(h; g_p, g_q), \qquad (31)$$




Figure 16. The expected nonergodic error calculated using Webster and Oliver's (1992) method.
Separate markers show the results for 400 sample points, for 100 sample points (+), and for 25 sample points. The
lines show the expected nonergodic error using Munoz-Pardo's (1987) method [Eq. (23)].

where $C_{ij}(h; g_p, g_q)$ describes the covariance between $[z^{i}_{1}(h) - z^{i}_{2}(h)]$ sampled
from grid $g_p$ and $[z^{j}_{1}(h) - z^{j}_{2}(h)]$ sampled from grid $g_q$. Values of $C_{ij}(h; g_p, g_q)$
may be calculated from Equation (12). If the translated grids are well separated,
or $h$ is small, then the covariances between semivariances estimated from
different grids are small and Webster and Oliver's method gives a good approximation
of the nonergodic error. However, there are only a small number of translations
of large sample grids over small regions which do not have a point in common.
The positions of sample points within some of these grids are close enough to cause
significant correlation between the estimated semivariances. This leads to the
nonergodic error being underestimated, as illustrated in Figure 16. Our method did
not contain such a bias since the nonergodic error estimates came from different
realizations of the random process and were therefore uncorrelated.

The difference between the nonergodic and ergodic errors can have implications
for both the design of efficient sampling schemes and variogram model
fitting. The ergodic covariance matrix has been used previously to optimize sample




Equation (13). Then Equation (23) could be calculated from this model, and used
for one final iteration of the fitting procedure [Eq. (7)]. More theoretical work is
required to ensure that such an approach is consistent.

A major disadvantage of calculating the nonergodic covariance matrix is the

extra computational work required. The method described in this paper requires all

the covariances between pairs of pairs within a very concentrated sample scheme

to be calculated. For each entry of the covariance matrix it is only the average of the

covariance between pairs from each bin that is needed. Therefore the computational

load may be reduced by subsampling of these pairs.
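The subsampling idea can be illustrated with a small Python sketch. Everything specific in it is an assumption made for illustration: a one-dimensional transect of 200 points, an exponential variogram with unit sill and distance parameter 10, lag bins of 5 and 10, and the Gaussian result that the covariance between two half squared increments equals half the squared covariance of the increments. It is not Equation (23) itself, only an instance of replacing a full average over pairs of pairs with an average over a random subsample.

```python
import math
import random

def gamma_exp(h, sill=1.0, a=10.0):
    """Exponential variogram (hypothetical parameters)."""
    return sill * (1.0 - math.exp(-abs(h) / a))

def increment_cov(p, q):
    """Covariance between z(a) - z(b) and z(c) - z(d), written with variograms."""
    (a, b), (c, d) = p, q
    return gamma_exp(a - d) + gamma_exp(b - c) - gamma_exp(a - c) - gamma_exp(b - d)

def pair_term(p, q):
    """Covariance between the half squared increments of pairs p and q,
    assuming a Gaussian variable."""
    return 0.5 * increment_cov(p, q) ** 2

def full_average(pairs_p, pairs_q):
    """Average of pair_term over every pair of pairs."""
    total = sum(pair_term(p, q) for p in pairs_p for q in pairs_q)
    return total / (len(pairs_p) * len(pairs_q))

def subsampled_average(pairs_p, pairs_q, n_draws, rng):
    """Average of pair_term over n_draws randomly chosen pairs of pairs."""
    total = 0.0
    for _ in range(n_draws):
        total += pair_term(rng.choice(pairs_p), rng.choice(pairs_q))
    return total / n_draws

n = 200
pairs_lag5 = [(i, i + 5) for i in range(n - 5)]      # 195 pairs in one bin
pairs_lag10 = [(i, i + 10) for i in range(n - 10)]   # 190 pairs in another bin

rng = random.Random(2)
exact = full_average(pairs_lag5, pairs_lag10)                    # 37050 terms
approx = subsampled_average(pairs_lag5, pairs_lag10, 8000, rng)  # 8000 terms
print("full average:      ", exact)
print("subsampled average:", approx)
```

The number of draws controls the trade-off between computation and the Monte Carlo error of the averaged entry; for a very concentrated sample scheme the full double sum grows with the square of the number of pairs, while the subsample size can be held fixed.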

In Experiment 3, the effect upon the experimental variogram confidence limits
of using the fitted variogram rather than the correct ergodic variogram was seen
to be small for sample schemes of 100 or more points. It therefore appears that the
circular approach in calculating variogram uncertainty is valid.

CONCLUSIONS

This study has demonstrated that for a known ergodic variogram, it is possible
to accurately determine the expected difference between the experimental
semivariances calculated from a particular sampling scheme and the corresponding
ergodic and nonergodic variogram values. Ergodic errors may be estimated
by Pardo-Iguzquiza and Dowd's (2001a) method [Eq. (13)] and nonergodic errors
by Munoz-Pardo's (1987) expression [Eq. (23)]. The ergodic error is significantly
less demanding to compute than the nonergodic error. For large fields the difference
between the two error expressions is negligible. However, for small regions,
say with length around twice the range of spatial correlation of the variable, the
nonergodic error is significantly less than the ergodic error.

Previously Muller and Zimmerman (1999) and Bogaert and Russo (1999)
have used the ergodic error expressions to compute optimal sampling schemes for
variogram estimation. If the aim of these schemes is to approximate the variogram
of the single region being sampled with maximum precision, then a nonergodic
expression of variogram uncertainty would be more appropriate. On smaller fields
use of the ergodic expression leads to more intensive sampling than is required.
Our results have also suggested that the GLS variogram fitting procedure may be
improved (in the sense that the fitted variogram better matches the nonergodic
variogram) if the nonergodic error is incorporated into the final iteration of the
procedure.

It should be noted that if these expressions are used to determine variogram
uncertainty in a real survey, there will be additional uncertainty because the ergodic
variogram is unknown. Further simulation tests suggested that the additional
uncertainty from using the estimated variogram rather than the true ergodic variogram
is small for sample schemes of more than 100 points.




ACKNOWLEDGMENTS

This work was supported by the Biotechnology and Biological Sciences Research
Council of the U.K. through Grant 204/D1 5335 and by the Home-Grown
Cereals Authority of the U.K. through grant 2453.

REFERENCES

Bogaert, P., and Russo, D., 1999, Optimal spatial sampling design for the estimation of the variogram based on a least squares approach: Water Resour. Res., v. 35, no. 4, p. 1275–1289.

Brooker, P. I., 1986, A parametric study of robustness of Kriging variance as a function of range and relative nugget effect for a spherical semivariogram: Math. Geology, v. 18, no. 5, p. 477–488.

Brus, D. J., and de Gruijter, J. J., 1994, Estimation of nonergodic variograms and their sampling variance by design-based sampling strategies: Math. Geology, v. 26, no. 4, p. 437–453.

Cressie, N., 1985, Fitting variogram models by weighted least squares: Math. Geology, v. 17, no. 5, p. 563–586.

Deutsch, C. V., and Journel, A. G., 1998, GSLIB: Geostatistical software library and user's guide, 2nd ed.: Oxford University Press, New York, 369 p.

Gathwaite, P. H., Joliffe, I. T., and Jones, B., 1995, Statistical inference: Prentice Hall, London, 290 p.

Journel, A. G., and Huijbregts, C. J., 1978, Mining geostatistics: Academic Press, London, 600 p.

McBratney, A. B., and Webster, R., 1986, Choosing functions for semi-variograms of soil properties and fitting them to sampling estimates: J. Soil Sci., v. 37, no. 4, p. 617–639.

Menke, W., 1984, Geophysical data analysis: Discrete inversion theory: Academic Press, San Diego, CA, 285 p.

Muller, W. G., and Zimmerman, D. L., 1999, Optimal designs for variogram estimation: Environmetrics, v. 10, no. 1, p. 23–37.

Munoz-Pardo, J. F., 1987, Approche Geostatistique de la variabilite spatiale des Milieux Geophysiques: These Docteur-Ingenieur, Universite de Grenoble et l'Institut National Polytechnique de Grenoble, 254 p.

Ortiz, C. J., and Deutsch, C. V., 2002, Calculation of uncertainty in the variogram: Math. Geology, v. 34, no. 2, p. 169–183.

Pardo-Iguzquiza, E., and Dowd, P. A., 2001a, Variance-covariance matrix of the experimental variogram: Assessing variogram uncertainty: Math. Geology, v. 33, no. 4, p. 397–419.

Pardo-Iguzquiza, E., and Dowd, P. A., 2001b, VARIOG2D: A computer program for estimating the semi-variogram and its uncertainty: Comput. Geosciences, v. 27, no. 5, p. 549–561.

Todini, E., 2001, Influence of parameter estimation uncertainty in Kriging: Part 1 - Theoretical development: Hydrol. Earth Sci. Syst., v. 5, no. 2, p. 215–223.

Todini, E., Pellegrini, F., and Mazzetti, C., 2001, Influence of parameter estimation uncertainty in Kriging: Part 2 - Test and case study applications: Hydrol. Earth Sci. Syst., v. 5, no. 2, p. 225–232.

Webster, R., and Oliver, M. A., 1992, Sample adequately to estimate variograms of soil properties: J. Soil Sci., v. 43, no. 1, p. 177–192.

Webster, R., and Oliver, M. A., 2001, Geostatistics for environmental scientists: John Wiley & Sons, Chichester, 271 p.
