

P. Goovaerts, Department of Civil and Environmental Engineering, The University of Michigan, EWRE Bldg, Room 117, Ann Arbor, MI 48109-2125, USA; e-mail: [email protected], Tel.: +001-734-9360141, Fax: +001-734-7632275

Biol Fertil Soils (1998) 27: 315–334 © Springer-Verlag 1998

REVIEW ARTICLE

P. Goovaerts

Geostatistical tools for characterizing the spatial variability of microbiological and physico-chemical soil properties

Received: 6 January 1998

Abstract This paper reviews the main applications of geostatistics to the description and modeling of the spatial variability of microbiological and physico-chemical soil properties. First, basic geostatistical tools such as the correlogram and semivariogram are introduced to characterize the spatial variability of each attribute separately as well as their spatial interactions. Then, the key issue of fitting permissible models to experimental semivariograms is addressed for the univariate and multivariate situations. Capitalizing on this model of spatial dependence, the value of a soil property can be predicted at unsampled locations using only observations of this particular property (kriging) or incorporating additional information provided by other correlated properties (cokriging). Factorial kriging allows one to discriminate the different sources of spatial variation in soil on the basis of the scale at which they operate, and it often enhances relations between soil attributes which were blurred in a traditional correlation analysis where the different sources of variation are mixed. Geostatistics can also be used to assess the risk of exceeding critical values (regulatory thresholds, soil quality criteria) at unsampled locations, and to simulate the spatial distribution of attribute values. All the different tools are illustrated using two transects of 100 pH and electrical conductivity values measured in pasture and forest.

Key words Geostatistics · Spatial variability · Kriging · Scale-dependent correlation · Stochastic simulation

Introduction

Soil scientists are aware that soil properties vary spatially, and they have already recorded strong fluctuations even over short distances (Trangmar et al. 1985; Warrick et al. 1986). Behind a locally erratic aspect, some spatial structure is often discerned and may be related to the combined action of several physical, chemical or biological processes that act at different spatial scales. The characterization of the spatial variability of soil attributes is essential to achieve a better understanding of complex relations between soil properties and environmental factors. Also, a model of spatial dependence between soil data can later be used to estimate attributes at unsampled locations, leading, for example, to better recommendations for the application of fertilizers or pesticides.

Geostatistics provides a set of statistical tools for incorporating the spatial coordinates of soil observations in data processing, allowing for description and modeling of spatial patterns, prediction at unsampled locations, and assessment of the uncertainty attached to these predictions. Since the publication of the first applications of geostatistics to soil data in the early 1980s (Burgess and Webster 1980a, b; Webster and Burgess 1980; Burgess et al. 1981), geostatistical methods have become popular in soil science, as illustrated by the increasing number of studies reported in the literature (Goovaerts 1998). The use of geostatistics is not confined to physico-chemical soil properties; studies have also been conducted on the spatial patterns of soil micro-organisms (Wollum and Cassel 1984; Webster and Boag 1992), plants (Vieira et al. 1983; Sutherland et al. 1991), and other organisms living at the surface of the soil (Rossi et al. 1992).

The greater accessibility of geostatistical software (e.g., Englund and Sparks 1991; Pannatier 1996; Deutsch and Journel 1998) has increased the risk that geostatistical tools are used without a good command of the underlying theory, in particular in the field of semivariogram modeling. Moreover, in several situations, inappropriate or non-optimal approaches are adopted because scientists are unaware of recent developments, such as factorial kriging, indicator geostatistics or stochastic simulation.

Fig. 1 Transects of 100 pH values measured in the topsoil of a pasture and a forest plot

Fig. 2 Histograms of soil pH values measured in a pasture and a forest plot

This paper aims at providing soil scientists with an applied introduction to basic descriptive and predictive geostatistical tools such as the semivariogram and kriging, while presenting recent developments in multivariate geostatistics and the modeling of local and spatial uncertainty. The different tools will be used for a step-by-step analysis of a one-dimensional set of soil data, which consists of two transects of 100 topsoil pH and electrical conductivity values measured every meter in pasture and forest.

Description of spatial patterns

Figure 1 shows two series of 100 topsoil pH values measured every meter along transects, one in a pasture and the other in a forest (Goovaerts et al. 1989). Too often, analysis of such data amounts to plotting the histogram and computing summary statistics such as mean and standard deviation (Fig. 2). By so doing, one ignores critical information, that is, the spatial location of pH measurements. For example, the comparison of standard deviations suggests that the variability of pH is similar for both land uses. The shape of pH transects indicates, however, that pH values vary over shorter distances in the forest than in the pasture! This spatial feature, which may be important for interpretation, is not captured if one ignores the spatial information. Geostatistics provides a set of statistical tools for detecting and quantifying the major scales of spatial variability.

The correlogram

Let $z(\mathbf{u}_\alpha)$, $\alpha = 1, 2, \ldots, n$, denote the set of $n = 100$ pH values measured in a pasture, where $\mathbf{u}_\alpha$ is the vector of spatial coordinates of the $\alpha$th observation. Figure 1 (top graph) shows that the distribution of pH values along the transect is not fully random in that observations that are close to each other on the ground tend to be more alike than those further apart. The similarity between adjacent pH values can be depicted by plotting each observation $z(\mathbf{u}_\alpha)$ versus the one measured 1 m away, $z(\mathbf{u}_\alpha + \mathbf{h})$ with $|\mathbf{h}| = 1$ m. In this example, 99 pairs of pH measurements $(z(\mathbf{u}_\alpha), z(\mathbf{u}_\alpha + \mathbf{h}))$ can be formed from the initial set of 100 values, and the resulting plot is called an h-scattergram (Fig. 3; left top graph). Readers are likely to be familiar with the scattergram, which is a plot of all pairs of values related to two different attributes measured at the same locations. By analogy, the h-scattergram is a plot of all pairs of measurements of the same attribute $z$ (i.e., pH) at locations separated by a given distance $|\mathbf{h}|$ in a particular direction. By convention, the value at the start of the vector $\mathbf{h}$, $z(\mathbf{u}_\alpha)$, is called the tail value, whereas the value at the end, $z(\mathbf{u}_\alpha + \mathbf{h})$, is the head value.

Fig. 3 Scattergrams between soil pH values separated by a distance of 1 m, 2 m, ..., 10 m. Note how the correlation between observations decreases as the separation distance increases

The shape of the cloud of points on the h-scattergram indicates that there is some correlation between adjacent pH values, and this can be measured using the linear correlation coefficient which is traditionally used to assess correlation between different attributes. The correlation is 0.64 and agrees with our visual impression that the pH value at any location is related to the pH measured 1 m away. Intuitively, one would expect that the relation between pH values weakens for a separation distance of 2 m, which is confirmed by the second h-scattergram of Fig. 3 (right top graph): the correlation drops from 0.64 to 0.51. The increasing inflation of the cloud of points with increasing distance reflects the decreasing similarity of measurements farther apart: the correlation at 10 m is only 0.19!

Fig. 4 Correlograms and semivariograms for the transects of 100 soil pH values measured in pasture (solid line) and forest (dashed line)

The plot of the estimated correlation coefficients as a function of the separation distance is called the experimental correlogram. Figure 4 (top graph) shows the correlograms for pH measured in a pasture (solid line) and in a forest (dashed line), respectively. For the pasture, the correlation becomes negligible at a separation distance of about 12 m, which is referred to as the range and is interpreted as the distance beyond which two pH values can be considered as statistically independent. The decline in correlation is much sharper in the forest in that any two pH values 3 m apart can already be considered as independent. The correlogram thus allows one to quantify in terms of correlation the visual impression that small-scale fluctuations prevail for the transect of forest pH values displayed at the bottom of Fig. 1.
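The construction of the experimental correlogram from successive h-scattergrams can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transect is synthetic (an autoregressive series standing in for the pH data, which are not reproduced here), observations are assumed regularly spaced, and the function names `pearson` and `correlogram` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Linear correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlogram(z, max_lag):
    """Correlation between tail values z(u) and head values z(u+h), h = 1..max_lag.

    Assumes a regularly spaced transect, so a lag of h steps is a slice offset.
    """
    return [pearson(z[:-h], z[h:]) for h in range(1, max_lag + 1)]


# Synthetic stand-in for a 100-point transect (NOT the pH data of Fig. 1).
random.seed(0)
z = [6.0]
for _ in range(99):
    z.append(6.0 + 0.7 * (z[-1] - 6.0) + random.gauss(0.0, 0.2))

rho = correlogram(z, 10)
for h, r in enumerate(rho, start=1):
    print(f"h = {h:2d} m  pairs = {len(z) - h:3d}  rho = {r:5.2f}")
```

Each correlogram value is simply the correlation coefficient of one h-scattergram, so the number of pairs shrinks by one with every additional lag step.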

The semivariogram

In the geostatistical literature, spatial patterns are usually described in terms of dissimilarity (instead of similarity) between observations as a function of the separation distance. The average dissimilarity between data separated by a vector $\mathbf{h}$ is measured by the experimental semivariogram $\hat{\gamma}(\mathbf{h})$, which is computed as half the average squared difference between the components of every data pair:

$$\hat{\gamma}(\mathbf{h}) = \frac{1}{2N(\mathbf{h})} \sum_{\alpha=1}^{N(\mathbf{h})} \left[ z(\mathbf{u}_\alpha) - z(\mathbf{u}_\alpha + \mathbf{h}) \right]^2, \tag{1}$$

where $N(\mathbf{h})$ is the number of data pairs for a given distance, e.g., $N(\mathbf{h}) = 99$ for $|\mathbf{h}| = 1$ m for the transects of Fig. 1.
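As a minimal sketch of Eq. (1) for a regularly spaced transect, the following Python function returns the number of pairs and the semivariogram value for each lag. The function name `semivariogram` and the toy data are assumptions made for illustration only.

```python
def semivariogram(z, max_lag):
    """Return (h, N(h), gamma(h)) for h = 1..max_lag on a regular transect.

    gamma(h) is half the average squared difference between values h steps apart.
    """
    out = []
    for h in range(1, max_lag + 1):
        sq = [(z[a] - z[a + h]) ** 2 for a in range(len(z) - h)]
        out.append((h, len(sq), sum(sq) / (2 * len(sq))))
    return out


# Toy alternating series: neighbours always differ, values two steps apart never do.
toy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
for h, n_pairs, gamma in semivariogram(toy, 2):
    print(f"h = {h}  N(h) = {n_pairs}  gamma = {gamma}")
```

For a transect of 100 values the first lag is computed from 99 pairs, as stated in the text.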

Figure 4 (bottom graph) shows the semivariograms of pH measured in pasture (solid line) and forest (dashed line), respectively. In both cases, semivariogram values increase with the separation distance, reflecting our intuitive feeling that two pH values close to each other on the ground are more alike, and thus their squared difference is smaller, than those further apart. The two semivariograms stop increasing at a given distance, called the range, which corresponds to the range identified on the correlogram, and again it can be interpreted as the distance of dependence or zone of influence of pH measurements. Beyond the range, the two semivariograms fluctuate around the same sill value, which is roughly equal to the sample variance computed on Fig. 2. In other words, the overall variation of pH measurements across the transect is similar for both land uses, yet the larger values of the forest semivariogram at short distances indicate that most of this variability is present over a few meters in forest. The discontinuity at the origin of the semivariogram (i.e., zero separation distance) is called the nugget effect and arises from measurement errors or spatial sources of variation at distances smaller than the shortest sampling interval or both (Journel and Huijbregts 1978, p. 39). Depending on the sampling scale, a semivariogram can appear as a pure nugget effect even if the variation is locally spatially structured, and so it might be useful to conduct a preliminary survey to approximate the major scales of spatial variation (Oliver and Webster 1986a).

In combination with a good knowledge about the phenomenon and the study area, such a spatial description can improve our understanding of the underlying physical mechanisms controlling spatial patterns. For example, McBratney and Webster (1981) related the periodicity displayed by the correlogram of wheat yield measured in a 0.4 ha area to an earlier ploughing. Robertson et al. (1988) found nitrogen availability in a 0.5 ha old field to be highly patterned at scales of 20–40 m, which corresponds to the scales over which plant community composition varies during early succession, suggesting that the community structure is strongly influenced by the spatial heterogeneity of available nutrients. Goovaerts (1994a) found a remarkable similarity between the shapes of semivariograms for eight soil chemical properties and six banana leaf contents measured across a 100 km² area, and he was able to relate this common spatial pattern to the geographical distribution of soil types in the study area. Other examples in earth sciences can be found in Oliver and Webster (1986b). In the present study, the short-range variability of forest pH values probably reflects the local heterogeneity of ground vegetation compared with the uniformity of grass cover.

Fig. 5 Grids of soil pH values measured on 1 ha pasture and forest plots, and the corresponding semivariograms computed in the NS (solid line) and EW (dashed line) directions. For pasture, the variability is said to be anisotropic in that it depends on the direction

The information collected during soil surveys rarely reduces to a single transect, but a two-dimensional domain is typically sampled. It is thus necessary to characterize the spatial variability along different directions. For example, Fig. 5 (left column) shows two grids of pH values which were measured in the same plots as the two transects of Fig. 1. Semivariograms have been computed along the N-S (solid line) and E-W (dashed line) directions. Data pairs have been grouped into classes of distances (lags) and angles, $[h \pm \Delta h]$ and $[\theta \pm \Delta\theta]$, so that each semivariogram value is computed from a sufficient number of data pairs (i.e., $N(\mathbf{h}) > 30$). In the example of Fig. 5, the two classes of angles are $0^\circ \pm 45^\circ$ and $90^\circ \pm 45^\circ$, whereas the lag tolerance is 5 m. These graphs show distinct spatial patterns of acidity on both land uses: pH appears to vary more continuously in pasture as illustrated by the smaller nugget effect and larger range of the corresponding semivariogram. Another important feature is that the variability of pH values in pasture is anisotropic, that is, it depends on the direction. The continuous increase of the N-S semivariogram reflects a trend in pH along that direction; the large pH values in the southern part of the plot are in fact due to the liming of the neighboring plot a few hours before sampling.
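The grouping of data pairs into lag and angle classes can be sketched as follows. This is an illustrative Python implementation, not the procedure used to produce Fig. 5: the function name, the azimuth convention (clockwise from north), the treatment of pairs lying exactly on a class boundary, and the synthetic grid with a pure north-south trend are all assumptions.

```python
import math


def directional_semivariogram(points, lag, lag_tol, azimuth_deg, ang_tol_deg, n_lags):
    """Experimental semivariogram along one direction.

    points: list of (x, y, z) tuples; azimuth is measured clockwise from north (+y).
    A pair enters lag class k when its distance lies in
    [(k+1)*lag - lag_tol, (k+1)*lag + lag_tol) and its direction is within
    ang_tol_deg of azimuth_deg. Returns a list of (lag centre, N, gamma).
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    target = azimuth_deg % 180.0
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            dx, dy = xj - xi, yj - yi
            dist = math.hypot(dx, dy)
            # Fold directions onto [0, 180): h and -h give the same squared difference.
            az = math.degrees(math.atan2(dx, dy)) % 180.0
            dev = abs(az - target)
            dev = min(dev, 180.0 - dev)
            if dev > ang_tol_deg:
                continue
            for k in range(n_lags):
                if -lag_tol <= dist - (k + 1) * lag < lag_tol:
                    sums[k] += (zi - zj) ** 2
                    counts[k] += 1
    return [
        ((k + 1) * lag, counts[k], sums[k] / (2 * counts[k]) if counts[k] else None)
        for k in range(n_lags)
    ]


# Synthetic 5 x 5 grid whose value equals the northing: a pure N-S trend.
grid = [(float(x), float(y), float(y)) for x in range(5) for y in range(5)]
ns = directional_semivariogram(grid, 1.0, 0.5, 0.0, 22.5, 2)
ew = directional_semivariogram(grid, 1.0, 0.5, 90.0, 22.5, 2)
print("N-S:", ns)
print("E-W:", ew)
```

On this synthetic grid the N-S semivariogram keeps increasing with the lag while the E-W semivariogram stays at zero, which is the kind of directional contrast that signals anisotropy or a trend. With an angular tolerance of exactly 45°, pairs oriented along a diagonal would be counted in both direction classes in this sketch.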

The cross correlogram

In most situations, several properties are measured on each soil sample, and geostatistics is increasingly used to process such multivariate spatial soil information (Wackernagel 1988; Goovaerts 1992). For example, in addition to pH, electrical conductivity was measured at each of the 100 locations along the transect in pasture (Fig. 6). Typically, the relationship between two soil properties is assessed by plotting the scattergram of the two variables and computing the corresponding correlation coefficient (Fig. 7; left top graph). Once again, such an approach ignores the spatial coordinates of the measurements.

Instead of looking at the relation between pH and conductivity measured at the same location, one may assess the correlation between adjacent observations of both attributes by plotting each pH value $z(\mathbf{u}_\alpha)$ versus the conductivity value measured 1 m away, $y(\mathbf{u}_\alpha + \mathbf{h})$ with $|\mathbf{h}| = 1$ m. The so-called cross h-scattergram is displayed at the top of Fig. 7 (right graph). Surprisingly, the correlation between the two attributes increases with the separation distance, an effect referred to as the lag effect (Journel and Huijbregts 1978, p. 48). Such an effect can be observed in geochemistry, where different rates of precipitation may cause enrichment in some minerals to lag behind that of others along the direction of hydrothermal flow. In this particular example, the lag effect is not backed by any physical interpretation and most likely reflects sample fluctuations.

Figure 7 shows cross h-scattergrams for a separation distance up to 10 m. The plot of the estimated correlation coefficient as a function of the separation distance is called the experimental cross correlogram (Fig. 8, top graph). Note that two cross correlograms can be computed, depending on whether a pH value is compared with the conductivity value east of it (easterly direction) or west of it (westerly direction). In both cases, the correlation between pH and electrical conductivity tends to decrease with the separation distance, and becomes negligible around 15 m.
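The asymmetry of the cross correlogram can be illustrated with a short Python sketch. The data below are artificial (the second attribute is the first one shifted by two steps, which mimics a lag effect), and the function names and the choice to call increasing index "forward" are assumptions made for this example.

```python
import math


def pearson(x, y):
    """Linear correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlogram(z, y, max_lag):
    """Map each lag h to (forward, backward) cross correlations.

    forward  pairs z(u) with y(u + h); backward pairs z(u) with y(u - h).
    """
    n = len(z)
    out = {}
    for h in range(0, max_lag + 1):
        forward = pearson(z[: n - h], y[h:])
        backward = pearson(z[h:], y[: n - h])
        out[h] = (forward, backward)
    return out


# Artificial example: y repeats z two steps later, so y(u + 2) = z(u).
z = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]
y = [0.0, 0.0] + z[:-2]
cc = cross_correlogram(z, y, 3)
for h, (fwd, bwd) in cc.items():
    print(f"h = {h}  forward = {fwd:5.2f}  backward = {bwd:5.2f}")
```

At lag zero both directions coincide with the ordinary correlation coefficient; at other lags the two values generally differ, which is why two cross correlograms are reported for a transect.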


Fig. 6 Transect of 100 pH and electrical conductivity values measured in the topsoil of a pasture

Fig. 7 Scattergrams between soil pH and electrical conductivity values separated by a distance of 0 m, 1 m, 2 m, ..., 10 m. Note how the correlation between observations decreases as the separation distance increases

The cross semivariogram

Instead of looking at the similarity between pH and conductivity values separated by a vector $\mathbf{h}$, say $z(\mathbf{u}_\alpha)$ and $y(\mathbf{u}_\alpha + \mathbf{h})$, one may consider how these two attributes jointly vary over a vector $\mathbf{h}$, that is, the relation between the increments $[z(\mathbf{u}_\alpha) - z(\mathbf{u}_\alpha + \mathbf{h})]$ and $[y(\mathbf{u}_\alpha) - y(\mathbf{u}_\alpha + \mathbf{h})]$. If both attributes are positively correlated, an increase (decrease) in pH values from $\mathbf{u}_\alpha$ to $\mathbf{u}_\alpha + \mathbf{h}$ tends to be associated with an increase (decrease) in conductivity values. Conversely, a negative correlation between attributes would entail that an increase (decrease) in pH values tends to be associated with a decrease (increase) in conductivity values.

A measure of the joint variability of two continuous attributes $z$ and $y$ is the experimental cross semivariogram which is computed as:

$$\hat{\gamma}_{ZY}(\mathbf{h}) = \frac{1}{2N(\mathbf{h})} \sum_{\alpha=1}^{N(\mathbf{h})} \left[ z(\mathbf{u}_\alpha) - z(\mathbf{u}_\alpha + \mathbf{h}) \right] \cdot \left[ y(\mathbf{u}_\alpha) - y(\mathbf{u}_\alpha + \mathbf{h}) \right] \tag{2}$$

The rescaling of cross semivariogram values by the corresponding direct semivariogram values yields the codispersion coefficient $\hat{\nu}_{ZY}(\mathbf{h})$ which can be interpreted as a linear correlation coefficient between the spatial increments of both attributes (Goovaerts 1997a):

$$\hat{\nu}_{ZY}(\mathbf{h}) = \frac{\hat{\gamma}_{ZY}(\mathbf{h})}{\sqrt{\hat{\gamma}_{ZZ}(\mathbf{h}) \cdot \hat{\gamma}_{YY}(\mathbf{h})}} \in [-1, +1] \tag{3}$$

Figure 8 shows the experimental cross semivariogram (middle graph) and codispersion function (bottom graph) between pH and electrical conductivity in pasture. Note that unlike the cross correlogram, the cross semivariogram is identical in the easterly and westerly directions. An interesting feature is the change in sign of cross semivariogram values, which indicates that the two attributes vary in opposite ways at short distances (small scale), whereas at larger distances their increments are positively related. Such scale-dependent relations may reflect the existence of different physico-chemical processes controlling the fluctuations in pH and electrical conductivity values at different scales. Changes in the correlation between soil attributes as a function of the spatial scale have been reported in several studies, e.g., see Wackernagel (1988), Goulard and Voltz (1992), Goovaerts (1992, 1994a), Goovaerts and Webster (1994), and Dobermann et al. (1995, 1997).
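The experimental cross semivariogram of Eq. (2) and the codispersion coefficient of Eq. (3) can be sketched for a regular transect as follows. The function names and the small artificial series are assumptions made for illustration; they are not the pasture data.

```python
import math


def cross_semivariogram(z, y, h):
    """Half the average product of the increments of z and y over a lag of h steps."""
    n = len(z) - h
    return sum((z[a] - z[a + h]) * (y[a] - y[a + h]) for a in range(n)) / (2 * n)


def codispersion(z, y, h):
    """Cross semivariogram rescaled by the two direct semivariograms, in [-1, +1]."""
    g_zy = cross_semivariogram(z, y, h)
    g_zz = cross_semivariogram(z, z, h)  # the direct semivariogram is the special case y = z
    g_yy = cross_semivariogram(y, y, h)
    return g_zy / math.sqrt(g_zz * g_yy)


# Artificial series: y is an exact negative multiple of z, so increments are opposed.
z = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
y = [-2.0 * v for v in z]
w = [0.5, 0.1, 0.9, 0.3, 0.8, 0.2]

print(codispersion(z, y, 1))
# Reversing both series emulates the opposite direction along the transect.
print(cross_semivariogram(z, w, 1), cross_semivariogram(z[::-1], w[::-1], 1))
```

Because each term is a product of two increments taken over the same vector, reversing the direction flips the sign of both increments and leaves the product unchanged, which is why the cross semivariogram is the same in both directions.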


Fig. 8 Experimental cross correlogram and semivariogram for the transect of 100 soil pH and electrical conductivity values measured in pasture

The indicator semivariogram

The different tools introduced so far allow the quantification of the spatial variability over the full range of attribute values. In most applications, such description should be supplemented by the characterization of the spatial distribution of specific classes of values. For example, knowledge of whether large values are clustered or scattered in space can improve our interpretation of underlying physical processes. In many environmental studies, a few random hot spots of large pollutant concentrations caused by local human activities coexist with a natural background of small values (Leonte and Schofield 1996; Mohammadi et al. 1997). Similarly, many microbiological processes in soil, such as denitrification, show hot spots of high activity which may be related to the patchy distribution of organic material in soil (Parkin 1987; Robertson et al. 1988).

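The characterization of the spatial distribution of z-values above or below a given threshold value $z_k$ requires a prior coding of each observation $z(\mathbf{u}_\alpha)$ into an indicator datum $i(\mathbf{u}_\alpha; z_k)$, defined as:

$$i(\mathbf{u}_\alpha; z_k) = \begin{cases} 1 & \text{if } z(\mathbf{u}_\alpha) \le z_k \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Indicator semivariograms can then be computed by substituting indicator data $i(\mathbf{u}_\alpha; z_k)$ for z-data $z(\mathbf{u}_\alpha)$ in Eq. (1):

$$\hat{\gamma}_I(\mathbf{h}; z_k) = \frac{1}{2N(\mathbf{h})} \sum_{\alpha=1}^{N(\mathbf{h})} \left[ i(\mathbf{u}_\alpha; z_k) - i(\mathbf{u}_\alpha + \mathbf{h}; z_k) \right]^2 \tag{5}$$

The indicator variogram value $2\hat{\gamma}_I(\mathbf{h}; z_k)$ measures how often two z-values separated by a vector $\mathbf{h}$ are on opposite sides of the threshold value $z_k$. In other words, $2\hat{\gamma}_I(\mathbf{h}; z_k)$ measures the transition frequency between two classes of z-values as a function of $\mathbf{h}$. The greater $\hat{\gamma}_I(\mathbf{h}; z_k)$, the less connected in space are the small or large values.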
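The indicator coding of Eq. (4) and the indicator semivariogram of Eq. (5) can be sketched as below for a regular transect. The function names and the two toy series are assumptions chosen so that the transition frequencies are easy to check by hand.

```python
def indicator(z, zk):
    """Code each value as 1 if it does not exceed the threshold zk, 0 otherwise."""
    return [1 if v <= zk else 0 for v in z]


def indicator_semivariogram(z, zk, h):
    """Half the proportion of pairs h steps apart that straddle the threshold zk."""
    i = indicator(z, zk)
    n = len(i) - h
    return sum((i[a] - i[a + h]) ** 2 for a in range(n)) / (2 * n)


# Alternating series: every neighbouring pair straddles the threshold 6.
scattered = [5.0, 7.0, 5.0, 7.0, 5.0]
# Clustered series: small values first, large values last, one transition only.
clustered = [5.0, 5.0, 5.0, 7.0, 7.0, 7.0]

print(2 * indicator_semivariogram(scattered, 6.0, 1))  # transition frequency, lag 1
print(2 * indicator_semivariogram(clustered, 6.0, 1))
```

The squared indicator difference is 1 only when the two values fall on opposite sides of the threshold, so twice the semivariogram is the fraction of straddling pairs: the scattered series gives a much larger value than the clustered one.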

Figure 9 shows the indicator semivariograms computed for the lower and upper quartiles of the distributions of pH values in pasture and forest. For forest soils, the two indicator semivariograms have a very large nugget, which indicates that small and large pH values are almost randomly distributed along the transect. Indicator semivariograms for pasture measurements are clearly different: the shorter range of the upper quartile semivariogram reflects the existence of short high-valued (pH larger than 6.92) segments along the transect, whereas small pH values are better connected in space as indicated by smaller values of the lower quartile semivariogram.

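Indicator cross semivariograms $\hat{\gamma}_I(\mathbf{h}; z_k, z_{k'})$ can be computed by substituting indicator data defined for two different threshold values $z_k$ and $z_{k'}$ for z- and y-data in Eq. (2):

$$\hat{\gamma}_I(\mathbf{h}; z_k, z_{k'}) = \frac{1}{2N(\mathbf{h})} \sum_{\alpha=1}^{N(\mathbf{h})} \left[ i(\mathbf{u}_\alpha; z_k) - i(\mathbf{u}_\alpha + \mathbf{h}; z_k) \right] \cdot \left[ i(\mathbf{u}_\alpha; z_{k'}) - i(\mathbf{u}_\alpha + \mathbf{h}; z_{k'}) \right] \tag{6}$$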

Webster and Boag (1992) used a combination of indicator direct and cross semivariograms to characterize the spatial distribution of the number of cyst nematodes in topsoil. They found that as the threshold value $z_k$ increased, the range of indicator direct semivariograms shortened progressively, whereas the nugget effect of the cross semivariograms decreased, which indicates a border effect with infestation increasing gradually from the outside of patches towards their centres. Biologically, the distribution can be explained by the nematode being spread by cultivation from initial fortuitous foci of infestation or control by nematophagous fungi or both.


Fig. 9 Experimental indicator semivariograms for soil pH values measured in pasture and forest. The two threshold values correspond to the lower (solid line) and upper (dashed line) quartiles of the sample distributions of Fig. 2

Indicator semivariograms can also be used to characterize the spatial patterns of categorical variables, such as texture or water table classes, which take only a limited number of states which might be ordered or not. Let $S$ be a categorical attribute with possible states $s_k$, $k = 1, 2, \ldots, K$. The $K$ states are exhaustive and mutually exclusive in the sense that one and only one state $s_k$ occurs at each data location $\mathbf{u}_\alpha$. The pattern of variability of a category $s_k$ can be characterized by semivariograms of type (5) defined on an indicator coding of the presence of that category:

$$i(\mathbf{u}_\alpha; s_k) = \begin{cases} 1 & \text{if } s(\mathbf{u}_\alpha) = s_k \\ 0 & \text{otherwise} \end{cases} \tag{7}$$

The indicator variogram value $2\hat{\gamma}_I(\mathbf{h}; s_k)$ measures how often two locations a vector $\mathbf{h}$ apart belong to different categories $s_{k'} \neq s_k$. The smaller $2\hat{\gamma}_I(\mathbf{h}; s_k)$, the greater the spatial connectivity of category $s_k$. For example, categorical indicator semivariograms have been used to characterize the spatial distribution of water table classes (Bierkens and Burrough 1993a, b) and the spatial arrangement of soil types (Goovaerts 1994b).
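For a categorical attribute, the same computation applies once each category has been coded by Eq. (7). The sketch below uses hypothetical class labels ("A", "B") on a short regular transect; the function name is an assumption.

```python
def categorical_indicator_semivariogram(s, sk, h):
    """Indicator semivariogram of category sk for a lag of h steps on a regular transect."""
    i = [1 if state == sk else 0 for state in s]  # presence/absence coding of sk
    n = len(i) - h
    return sum((i[a] - i[a + h]) ** 2 for a in range(n)) / (2 * n)


# Hypothetical sequence of classes along a transect.
classes = ["A", "A", "B", "B", "A"]
print(categorical_indicator_semivariogram(classes, "B", 1))
print(categorical_indicator_semivariogram(classes, "A", 1))
```

With only two categories the two indicator codings are complementary, so both categories share the same semivariogram; with more categories each one has its own.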

Semivariogram modeling

One of the main applications of geostatistics is the prediction of attribute values at unsampled locations. Prediction is made possible by the existence of spatial dependence between observations as assessed by the correlogram or semivariogram. A key step between description and prediction is the modeling of the spatial distribution of attribute values. Most of geostatistics is based on the concept of random function, whereby the set of unknown values is regarded as a set of spatially dependent random variables. Each measurement $z(\mathbf{u}_\alpha)$ is thus interpreted as a particular realization of a random variable $Z(\mathbf{u}_\alpha)$. Interested readers should refer to textbooks such as Isaaks and Srivastava (1989, pp. 196–236) or Goovaerts (1997a, pp. 59–74) for a detailed presentation of the theory of random functions. An important characteristic of the random function is its semivariogram which must be modeled from the experimental values.

The univariate case

Let $[\hat{\gamma}(\mathbf{h}_k),\ k = 1, \ldots, K]$ be the set of experimental semivariogram values computed for a finite number of lags, say the 25 semivariogram values displayed at the bottom of Fig. 4 for either pasture or forest pH measurements. A continuous function must be fitted to these values so as to deduce semivariogram values for any possible lag $\mathbf{h}$ required by prediction algorithms and also to smooth out sample fluctuations. The difficulty is that only functions that are conditionally negative definite can be considered as semivariogram models, in order to ensure the non-negativity of the variance of the prediction error, see later expression (21). In practice, only a few models are known to be permissible, and Fig. 10 shows the ones that are included in most geostatistical software. The three top models are bounded in that they reach a sill either at a given range value (spherical model) or asymptotically (exponential, Gaussian model):

• Spherical model with range $a$

$$g(h) = \mathrm{Sph}\left(\frac{h}{a}\right) = \begin{cases} 1.5 \cdot \dfrac{h}{a} - 0.5 \cdot \left(\dfrac{h}{a}\right)^3 & \text{if } h < a \\ 1 & \text{otherwise} \end{cases} \tag{8}$$

• Exponential model with distance parameter $b$

$$g(h) = 1 - \exp\left(-\frac{h}{b}\right) \tag{9}$$

• Gaussian model with distance parameter $b$

$$g(h) = 1 - \exp\left(-\frac{h^2}{b^2}\right) \tag{10}$$

The practical range is defined as the distance at which the model value is at 95% of the sill, that is $3b$ for the exponential model and $\sqrt{3}\,b$ for the Gaussian model.
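The three bounded models of Eqs. (8) to (10), written with a unit sill, translate directly into code. The sketch below is illustrative: the function names and the parameter value `b = 4.0` are assumptions, and the check on the practical range simply evaluates the exponential and Gaussian models at $3b$ and $\sqrt{3}\,b$.

```python
import math


def spherical(h, a):
    """Spherical model with range a and unit sill, Eq. (8)."""
    if h < a:
        return 1.5 * h / a - 0.5 * (h / a) ** 3
    return 1.0


def exponential(h, b):
    """Exponential model with distance parameter b and unit sill, Eq. (9)."""
    return 1.0 - math.exp(-h / b)


def gaussian(h, b):
    """Gaussian model with distance parameter b and unit sill, Eq. (10)."""
    return 1.0 - math.exp(-(h ** 2) / (b ** 2))


b = 4.0  # hypothetical distance parameter
print("exponential at 3b:      ", exponential(3.0 * b, b))
print("gaussian at sqrt(3) * b:", gaussian(math.sqrt(3.0) * b, b))
print("spherical at a/2, a:    ", spherical(5.0, 10.0), spherical(10.0, 10.0))
```

Both asymptotic models evaluate to $1 - e^{-3}$, about 0.95, at their practical range, whereas the spherical model reaches its sill exactly at $h = a$.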


Fig. 10 Bounded semivariogram models with the same practical range (top graph), and power models for different values of the parameter $\omega$ (bottom graph)

The bottom graph shows power models for different values of the parameter $\omega$:

$$g(h) = h^{\omega} \quad \text{with } 0 < \omega < 2 \tag{11}$$

These functions allow one to model different behaviours at infinity (bounded, unbounded) and at the origin (linear, quadratic) of the semivariogram. Several authors (McBratney and Webster 1986; Wackernagel 1995; Goovaerts 1997a) have warned users of the risk of numerical instability when using the Gaussian model for spatial interpolation (kriging). A common recommendation is to add a small nugget effect to the Gaussian model to destroy its extreme extrapolative properties and avoid the generation of artifacts in interpolated maps. A better solution is never to use the Gaussian model. If the experimental semivariogram appears to have both an asymptote and an inflexion, it can be modeled using the following function:

$$g(h) = 1 - \exp\left(-\frac{h^{\omega}}{b}\right) \quad \text{with } 0 < \omega < 2 \tag{12}$$

where the parameter $\omega$ can be adjusted (Wackernagel 1995, p. 110; Webster 1997). Note that for $\omega = 2$, one retrieves the Gaussian model (10). A theoretically sounder alternative is to use Whittle's elementary correlation (McBratney and Webster 1986).

Fig. 11 Experimental semivariograms for soil pH values measured along the pasture and forest transects with the model fitted

Typically, two or more permissible models must be combined to fit the shape of the experimental semivariogram. For example, the semivariogram of pH on pasture has been modeled using a combination of a nugget effect and two spherical models of range 6 m and 26 m (Fig. 11; top graph). The semivariogram of forest pH has been modeled using a nugget effect and an exponential model of practical range 4 m. Combinations of permissible models are permissible as long as the contribution of each basic model is positive, that is the nested model is written as:

$$\gamma(h) = \sum_{l=0}^{L} b_l\, \gamma_l(h) \quad \text{with } b_l \ge 0 \tag{13}$$

where $b_l$ is the positive sill or slope of the corresponding basic semivariogram model $\gamma_l(h)$.
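The nested model (13) can be sketched as a weighted sum of basic structures. The snippet below assembles the pasture pH model from a nugget effect and two spherical structures of range 6 m and 26 m, with the coefficients taken from expression (18) later in the paper. The helper names (`nugget`, `nested`, `ph_pasture`) are hypothetical, and lags are assumed to be scalar distances in meters.

```python
def spherical(h, a):
    """Spherical model (8) with unit sill and range a."""
    r = abs(h) / a
    return 1.5 * r - 0.5 * r ** 3 if r < 1.0 else 1.0


def nugget(h):
    """Nugget effect: 0 at h = 0 and 1 for any positive lag."""
    return 0.0 if h == 0 else 1.0


def nested(h, structures):
    """Nested model (13): sum of b_l * gamma_l(h), with every b_l >= 0."""
    if any(b < 0 for b, _ in structures):
        raise ValueError("each contribution b_l must be non-negative")
    return sum(b * gamma(h) for b, gamma in structures)


# (b_l, gamma_l) pairs for pH on pasture, coefficients from expression (18)
ph_pasture = [
    (0.007, nugget),
    (0.01, lambda h: spherical(h, 6.0)),
    (0.0065, lambda h: spherical(h, 26.0)),
]

for lag in (0.0, 3.0, 6.0, 26.0):
    print(lag, round(nested(lag, ph_pasture), 5))
```

Beyond the longest range (26 m) the model value equals the total sill, that is the sum of the three contributions.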

The way in which these permissible models are chosen and their parameters (range, sill) are estimated is still controversial (McBratney and Webster 1986; Goovaerts 1997a, pp. 97–107). Several methods have been proposed, ranging from full black-box procedures in which the choice of model and its fitting are automatic to visual approaches where the model is selected so that the fit is satisfactory from a graphical point of view, as in the examples of Fig. 11. An intermediate approach consists of an automatic (least-squares) estimation of parameters of models chosen by the user. In this semi-automatic procedure, the objective is usually to minimize a weighted sum of squares (WSS) of differences between experimental $\hat{\gamma}(h_k)$ and model $\gamma(h_k)$ semivariogram values:

$$\mathrm{WSS} = \sum_{k=1}^{K} \omega(h_k) \cdot [\hat{\gamma}(h_k) - \gamma(h_k)]^{2} \tag{14}$$

The weight $\omega(h_k)$ given to each lag $h_k$ is often taken proportional to the number $N(h_k)$ of data pairs that contribute to the estimate $\hat{\gamma}(h_k)$. The implicit assumption is that the reliability of an experimental semivariogram value increases with statistical mass. An alternative that gives more weight to the first lags consists of dividing the number of data pairs by the squared model value: $N(h_k)/[\gamma(h_k)]^{2}$ (Cressie 1985).
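The criterion (14) and its two weighting schemes can be illustrated with a short sketch. The experimental values below are synthetic (an exponential model plus made-up perturbations), the number of pairs assumes a regular 100-point transect with unit spacing, and the function and variable names are hypothetical. The sketch only evaluates WSS for two candidate models chosen by the user; it does not perform the least-squares optimization of their parameters.

```python
import math


def exponential(h, b):
    return 1.0 - math.exp(-h / b)


def spherical(h, a):
    r = h / a
    return 1.5 * r - 0.5 * r ** 3 if r < 1.0 else 1.0


def wss(lags, gamma_hat, n_pairs, model, cressie=False):
    """Weighted sum of squares (14).

    Weights are N(h_k) by default, or N(h_k) / gamma(h_k)^2 when
    cressie=True (the alternative attributed to Cressie 1985 in the text).
    """
    total = 0.0
    for h, g_hat, n in zip(lags, gamma_hat, n_pairs):
        g_mod = model(h)
        weight = n / g_mod ** 2 if cressie else n
        total += weight * (g_hat - g_mod) ** 2
    return total


# Synthetic example: lag k on a 100-point transect has 100 - k data pairs.
lags = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
n_pairs = [100 - int(h) for h in lags]
noise = [0.02, -0.01, 0.015, -0.02, 0.01, 0.0]  # made-up fluctuations
gamma_hat = [exponential(h, 2.0) + e for h, e in zip(lags, noise)]

candidates = {
    "exponential, b = 2": lambda h: exponential(h, 2.0),
    "spherical, a = 6": lambda h: spherical(h, 6.0),
}
for name, model in candidates.items():
    print(name,
          wss(lags, gamma_hat, n_pairs, model),
          wss(lags, gamma_hat, n_pairs, model, cressie=True))
```

Because the synthetic values were generated from the exponential model, that candidate yields the smaller WSS under both weighting schemes.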

The value of the WSS criterion is frequently used to rank alternative models, say to compare the goodness-of-fit of a spherical model versus an exponential one. In many situations, the user tends to increase the number of basic structures in the nested model (13) to lower the WSS value as much as possible. However, one should avoid overfitting experimental semivariograms: the more complicated model generally does not lead to more accurate estimates. McBratney and Webster (1986) proposed using the Akaike Information Criterion to achieve a satisfactory compromise between goodness-of-fit and parsimony of the model.

Semivariogram modeling is rarely a goal per se. The ultimate objective is usually to estimate the value of a soil property at an unsampled place. Cross validation allows one to compare the impact of different semivariogram models on interpolation results (McBratney and Webster 1986; Isaaks and Srivastava 1989, pp. 351–368). The idea consists of removing one datum at a time from the data set and re-estimating this value from remaining data using different semivariogram models. Interpolated and actual values are compared, and the model that yields the most accurate predictions is retained. A measure of accuracy is the mean absolute error of prediction (MAE), which is computed as the average absolute difference between actual and estimated values at the $n$ sampled locations:

$$\mathrm{MAE} = \frac{1}{n} \sum_{\alpha=1}^{n} \left| z(u_\alpha) - z^{*}(u_\alpha) \right| \tag{15}$$
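A leave-one-out cross validation can be sketched as follows, assuming a one-dimensional transect, NumPy, and ordinary kriging (presented later in the paper, system (20)) as the interpolator. The data are synthetic and both candidate models (a nugget effect plus an exponential structure with two different distance parameters) are hypothetical; all names are illustrative.

```python
import numpy as np


def make_model(nugget, sill, b):
    """Nugget plus exponential model; returns a function gamma(h)."""
    def gamma(h):
        h = np.abs(h)
        return np.where(h > 0, nugget + sill * (1.0 - np.exp(-h / b)), 0.0)
    return gamma


def ok_estimate(x, z, x0, gamma):
    """Ordinary kriging estimate at x0 from all data (x, z)."""
    n = len(x)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = gamma(x[:, None] - x[None, :])
    A[:n, n] = -1.0          # Lagrange parameter enters with a minus sign
    A[n, :n] = 1.0           # weights sum to one
    rhs = np.append(gamma(x - x0), 1.0)
    weights = np.linalg.solve(A, rhs)[:n]
    return float(weights @ z)


def loo_mae(x, z, gamma):
    """Leave-one-out mean absolute error of prediction (15)."""
    errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i   # drop datum i, re-estimate it
        errors.append(abs(z[i] - ok_estimate(x[keep], z[keep], x[i], gamma)))
    return float(np.mean(errors))


# Synthetic transect (not the paper's data).
x = np.arange(0.0, 40.0, 2.0)
z = 6.7 + 0.15 * np.sin(x / 4.0)

for label, gamma in [("b = 1", make_model(0.1, 0.9, 1.0)),
                     ("b = 8", make_model(0.1, 0.9, 8.0))]:
    print(label, loo_mae(x, z, gamma))
```

The model with the smaller MAE would be retained under this criterion.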

In the presence of anisotropic (direction-dependent) variation, one must build a model that describes how the semivariogram changes as the distance and direction change. Two types of anisotropy are usually distinguished: geometric and zonal. In the first case, the directional semivariograms have the same shape and sill but different range values that plot as an ellipse as a function of the azimuth of the direction. An anisotropy that involves sill values varying with direction is said to be zonal. These two models of spatial anisotropy are presented in textbooks such as Isaaks and Srivastava (1989, pp. 377–390) and Goovaerts (1997a, pp. 90–95).

Regardless of the fitting procedure adopted, one must keep in mind that the objective is to build a permissible semivariogram model that captures the major spatial features of the attribute under study. The use of least-squares criteria amounts to reducing semivariogram modeling to an exercise in fitting a curve to experimental values, which I think is too restrictive. Although experimental semivariogram values play an important role in this process, ancillary information such as provided by physical knowledge of the area and phenomenon may be of great interest. For example, strong prior qualitative information may lead one to adopt an anisotropic model even if data sparsity prevents seeing anisotropy from the experimental semivariograms computed in different directions.

Bivariate case

Modeling the coregionalization between two variables $Z$ and $Y$ involves choosing and fitting functions to the two direct semivariograms $\gamma_{ZZ}(h)$ and $\gamma_{YY}(h)$ plus the cross semivariogram $\gamma_{ZY}(h)$. The difficulty lies in the fact that the three models cannot be built independently from one another. The easiest approach consists of modeling the three semivariograms as linear combinations of the same set of basic semivariogram models $\gamma_l(h)$:

$$\begin{aligned}
\gamma_{ZZ}(h) &= \sum_{l=0}^{L} b^{l}_{ZZ}\, \gamma_l(h) \\
\gamma_{YY}(h) &= \sum_{l=0}^{L} b^{l}_{YY}\, \gamma_l(h) \\
\gamma_{ZY}(h) &= \sum_{l=0}^{L} b^{l}_{ZY}\, \gamma_l(h)
\end{aligned}$$

The so-called linear model of coregionalization (LMC) is permissible if: (1) the $(L+1)$ basic models $\gamma_l(h)$ are permissible, and (2) the coefficients satisfy the following necessary and sufficient conditions:

$$b^{l}_{ZZ} \ge 0, \quad b^{l}_{YY} \ge 0 \quad \forall\, l \tag{16}$$

$$\left| b^{l}_{ZY} \right| \le \sqrt{b^{l}_{ZZ}\, b^{l}_{YY}} \quad \forall\, l \tag{17}$$

In practice, the modeling is performed in two steps:
1. Both direct semivariograms are first modeled as linear combinations of selected basic structures $\gamma_l(h)$.
2. The same basic structures are then fitted to the cross semivariogram under the constraint (17).

This approach was used to fit visually the following model to the (cross) semivariograms of pH and electrical conductivity displayed in Fig. 12:

$$\begin{aligned}
\gamma_{pH}(h) &= 0.007\, \gamma_0(h) + 0.01\, \mathrm{Sph}(h/6\ \mathrm{m}) + 0.0065\, \mathrm{Sph}(h/26\ \mathrm{m}) \\
\gamma_{EC}(h) &= 100\, \gamma_0(h) + 155\, \mathrm{Sph}(h/26\ \mathrm{m}) \\
\gamma_{pH\text{-}EC}(h) &= -0.4\, \gamma_0(h) + 1.0\, \mathrm{Sph}(h/26\ \mathrm{m})
\end{aligned} \tag{18}$$

where $\gamma_0(h)$ is a nugget effect model, and $\mathrm{Sph}(h/6\ \mathrm{m})$ is the spherical model (8) with range $a = 6$ m. For each of the three basic models, the coefficients of the direct semivariograms are non-negative and the inequality (17) is satisfied: $|-0.4| \le \sqrt{0.007 \cdot 100}$, $|0| \le \sqrt{0.01 \cdot 0}$, and $|1| \le \sqrt{0.0065 \cdot 155}$.

Fig. 12 Experimental direct and cross semivariograms for soil pH and electrical conductivity along the pasture transect, with the linear model of coregionalization fitted
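The permissibility check of conditions (16) and (17) for the fitted model (18) can be reproduced with a few lines of Python. The dictionary layout and the function name are illustrative choices; the coefficients are those of expression (18), with a zero coefficient wherever a structure is absent from a semivariogram.

```python
import math

# (b_ZZ, b_YY, b_ZY) for each basic structure of model (18),
# with Z = pH and Y = electrical conductivity.
lmc = {
    "nugget": (0.007, 100.0, -0.4),
    "Sph(6 m)": (0.01, 0.0, 0.0),
    "Sph(26 m)": (0.0065, 155.0, 1.0),
}


def is_permissible(b_zz, b_yy, b_zy, tol=1e-12):
    """Conditions (16) and (17) for one basic structure."""
    if b_zz < 0 or b_yy < 0:
        return False
    return abs(b_zy) <= math.sqrt(b_zz * b_yy) + tol


for name, coeffs in lmc.items():
    print(name, is_permissible(*coeffs))
```

The small tolerance only guards against rounding when the inequality holds as an equality, as for the 6 m structure.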

The requirement that all semivariograms must share the same set of basic structures may sound like a severe limitation on the linear model of coregionalization. Variables that are well cross-correlated are, however, likely to show similar patterns of spatial variability. In addition, there is no need for the direct and cross semivariograms to include all the basic structures; for example, the semivariogram of electrical conductivity and the cross semivariogram do not include the short-range (6 m) spherical structure.

Multivariate case

Soil studies typically involve more than two attributes, and it is often necessary to model the coregionalization for a large number of variables. Checking the permissibility of the LMC becomes cumbersome because the simple constraints (16) and (17) do not suffice for more than two variables. Mathematically speaking, one must check that, for each structure $\gamma_l(h)$, the matrix of $b$-coefficients is positive semi-definite. A symmetric matrix is positive semi-definite if its determinant and all its principal minor determinants are non-negative. Consider, for example, the linear model of coregionalization for three variables $Z$, $Y$ and $X$:

$$\begin{bmatrix}
\gamma_{ZZ}(h) & \gamma_{ZY}(h) & \gamma_{ZX}(h) \\
\gamma_{YZ}(h) & \gamma_{YY}(h) & \gamma_{YX}(h) \\
\gamma_{XZ}(h) & \gamma_{XY}(h) & \gamma_{XX}(h)
\end{bmatrix}
= \sum_{l=0}^{L}
\begin{bmatrix}
b^{l}_{ZZ} & b^{l}_{ZY} & b^{l}_{ZX} \\
b^{l}_{YZ} & b^{l}_{YY} & b^{l}_{YX} \\
b^{l}_{XZ} & b^{l}_{XY} & b^{l}_{XX}
\end{bmatrix} \gamma_l(h)$$

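For each structure $\gamma_l(h)$, the matrix of coefficients must satisfy the following seven inequalities:

• All diagonal elements are non-negative:

$$b^{l}_{ZZ} \ge 0 \qquad b^{l}_{YY} \ge 0 \qquad b^{l}_{XX} \ge 0$$

• All principal minor determinants of order 2 are non-negative:

$$\begin{vmatrix} b^{l}_{ZZ} & b^{l}_{ZY} \\ b^{l}_{YZ} & b^{l}_{YY} \end{vmatrix} = b^{l}_{ZZ}\, b^{l}_{YY} - [b^{l}_{YZ}]^{2} \ge 0$$

$$\begin{vmatrix} b^{l}_{YY} & b^{l}_{YX} \\ b^{l}_{XY} & b^{l}_{XX} \end{vmatrix} = b^{l}_{YY}\, b^{l}_{XX} - [b^{l}_{XY}]^{2} \ge 0$$

$$\begin{vmatrix} b^{l}_{ZZ} & b^{l}_{ZX} \\ b^{l}_{XZ} & b^{l}_{XX} \end{vmatrix} = b^{l}_{ZZ}\, b^{l}_{XX} - [b^{l}_{XZ}]^{2} \ge 0$$

• The determinant of order 3 is non-negative:

$$\begin{vmatrix}
b^{l}_{ZZ} & b^{l}_{ZY} & b^{l}_{ZX} \\
b^{l}_{YZ} & b^{l}_{YY} & b^{l}_{YX} \\
b^{l}_{XZ} & b^{l}_{XY} & b^{l}_{XX}
\end{vmatrix}
= b^{l}_{ZZ}\left(b^{l}_{YY}\, b^{l}_{XX} - [b^{l}_{YX}]^{2}\right)
- b^{l}_{ZY}\left(b^{l}_{YZ}\, b^{l}_{XX} - b^{l}_{XZ}\, b^{l}_{YX}\right)
+ b^{l}_{ZX}\left(b^{l}_{YZ}\, b^{l}_{XY} - b^{l}_{XZ}\, b^{l}_{YY}\right) \ge 0$$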
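The seven inequalities can be checked mechanically by enumerating all principal minors of a coefficient matrix. The sketch below does so in plain Python for small symmetric matrices; the two 3 × 3 matrices are hypothetical examples, not coefficients from the paper. The second matrix satisfies the diagonal and order-2 conditions but has a negative determinant, which illustrates why the pairwise constraints (16) and (17) do not suffice for more than two variables.

```python
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def principal_minors(m):
    """Yield (indices, determinant) for every principal submatrix of m."""
    n = len(m)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            sub = [[m[i][j] for j in idx] for i in idx]
            yield idx, det(sub)


def is_psd(m, tol=1e-12):
    """Symmetric m is positive semi-definite if no principal minor is negative."""
    return all(d >= -tol for _, d in principal_minors(m))


# Hypothetical coefficient matrices for one structure (variables Z, Y, X).
ok_matrix = [[1.0, 0.5, 0.2],
             [0.5, 1.0, 0.3],
             [0.2, 0.3, 1.0]]
bad_matrix = [[1.0, 0.9, -0.9],
              [0.9, 1.0, 0.9],
              [-0.9, 0.9, 1.0]]

for name, m in (("ok_matrix", ok_matrix), ("bad_matrix", bad_matrix)):
    print(name, is_psd(m))
```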

An iterative procedure that fits the linear model of coregionalization directly under such a constraint of positive semi-definiteness has been developed by Goulard (1989). This technique is increasingly used to model the coregionalization of soil properties (Goovaerts 1992; Goulard and Voltz 1992; Voltz and Goulard 1994; Webster et al. 1994; Dobermann et al. 1995, 1997).

Spatial prediction

Most geostatistical studies in soil science aim at estimating soil properties at unsampled places and mapping them. Kriging is a generic name adopted by the geostatisticians for a family of generalized least-squares regression algorithms (Webster 1996). There are many different kriging algorithms, and most of them are reviewed in Goovaerts (1998) with references to soil applications. In this paper, only the most common ordinary kriging and cokriging are presented.

Ordinary kriging

Consider the problem of estimating the value of a continuous soil attribute $z$ at an unsampled location $u$ using only data on this attribute $[z(u_\alpha), \alpha = 1, \ldots, n]$. Ordinary kriging (OK) estimates this value as a linear combination of neighboring observations:

$$z^{*}_{OK}(u) = \sum_{\alpha=1}^{n(u)} \lambda_\alpha(u)\, z(u_\alpha) \tag{19}$$

Typically, the 10–20 observations closest to the location $u$ being estimated are retained because they screen the influence of those farther away, that is $n(u) < n$. The key issue is the determination of the weight $\lambda_\alpha(u)$ assigned to each observation.

As for other linear regression procedures, ordinary kriging weights are chosen so as to minimize the estimation or error variance $\sigma^{2}_{E}(u) = \mathrm{Var}[Z^{*}(u) - Z(u)]$ under the constraint of unbiasedness of the estimator. These weights are obtained by solving a system of linear equations which is known as the “ordinary kriging system”:

$$\begin{cases}
\displaystyle\sum_{\beta=1}^{n(u)} \lambda_\beta(u)\, \gamma(u_\alpha - u_\beta) - \mu(u) = \gamma(u_\alpha - u) & \alpha = 1, \ldots, n(u) \\[2ex]
\displaystyle\sum_{\beta=1}^{n(u)} \lambda_\beta(u) = 1
\end{cases} \tag{20}$$

Unbiasedness of the estimator is ensured by constraining the weights to sum to one, which requires the definition of the Lagrange parameter $\mu(u)$. The only information required by the kriging system are semivariogram values for different lags, and these are readily derived from the semivariogram model fitted to experimental values (recall previous section).

In addition to an estimate for the unknown $z$-value, ordinary kriging provides an error variance which is computed as:

$$\sigma^{2}_{OK}(u) = \sum_{\alpha=1}^{n(u)} \lambda_\alpha(u)\, \gamma(u_\alpha - u) - \mu(u) \tag{21}$$

Under stringent hypotheses of normality and homoscedasticity, the kriging variance can be combined with the estimated value to derive a confidence interval; for example, the 95% confidence interval is taken as:

$$\mathrm{Prob}\left\{ Z(u) \in \left[ z^{*}_{OK}(u) - 2\sigma_{OK}(u),\; z^{*}_{OK}(u) + 2\sigma_{OK}(u) \right] \right\} = 0.95 \tag{22}$$
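Expressions (19) to (22) can be sketched in a few lines of NumPy for a one-dimensional transect. The semivariogram is the nested pasture pH model of expression (18); the observations are synthetic (20 made-up values every 5 m, not the paper's measurements), and the function names and the neighborhood size are illustrative assumptions.

```python
import numpy as np


def sph(h, a):
    """Spherical model (8) with unit sill, vectorized."""
    r = np.minimum(np.abs(h) / a, 1.0)
    return 1.5 * r - 0.5 * r ** 3


def gamma_ph(h):
    """Nested pasture pH model: nugget + Sph(6 m) + Sph(26 m)."""
    h = np.abs(h)
    model = 0.007 + 0.01 * sph(h, 6.0) + 0.0065 * sph(h, 26.0)
    return np.where(h > 0, model, 0.0)


def ordinary_kriging(x, z, x0, gamma, n_neigh=10):
    """Return the OK estimate (19) and error variance (21) at location x0."""
    idx = np.argsort(np.abs(x - x0))[:n_neigh]   # closest observations
    xs, zs = x[idx], z[idx]
    n = len(xs)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = gamma(xs[:, None] - xs[None, :])
    A[:n, n] = -1.0      # -mu(u) term of system (20)
    A[n, :n] = 1.0       # unbiasedness: weights sum to one
    g0 = gamma(xs - x0)
    sol = np.linalg.solve(A, np.append(g0, 1.0))
    lam, mu = sol[:n], sol[n]
    estimate = float(lam @ zs)
    variance = max(float(lam @ g0 - mu), 0.0)    # guard against rounding
    return estimate, variance


# Synthetic observations (not the paper's data).
x = np.arange(0.0, 100.0, 5.0)
z = 6.7 + 0.15 * np.sin(x / 8.0)

for x0 in (10.0, 12.5, 110.0):
    est, var = ordinary_kriging(x, z, x0, gamma_ph)
    half = 2.0 * np.sqrt(var)                    # interval (22)
    print(x0, round(est, 3), round(var, 5),
          (round(est - half, 3), round(est + half, 3)))
```

At a data location (here 10 m) the estimate reproduces the observation and the variance vanishes, whereas the variance is largest when extrapolating beyond the last datum (110 m).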

Figure 13 (left column) shows the ordinary kriging estimates and error variances for pH along the transect in pasture. Only 20 pH values out of the 100 original data were considered here so that estimated values can be later compared with actual measurements to assess the prediction performances of kriging. The estimation was performed every meter using the ten closest pH values and the semivariogram model displayed at the top of Fig. 11. The ordinary kriging estimator is exact in that it honors observations at data locations, and the kriging variance is zero there. The error variance increases away from the data and reaches a maximum value beyond the extreme right datum (extrapolation situation).

Besides providing an error variance, kriging has several advantages over other interpolation techniques such as the inverse distances algorithm: (1) it accounts for the pattern of spatial variability (range, anisotropy) through the semivariogram model, and (2) it allows estimation of the target attribute on a support that is different from the data support, e.g., pH values measured on soil cores can be used to predict average pH values on larger surfaces, such as 1 ha plots, which are better suited for land management (Burgess and Webster 1980b).

Ordinary cokriging

When measurements are sparse or poorly correlated in space, the estimation of the primary attribute of interest is generally improved by taking into account secondary information originating from other correlated continuous attributes. The ordinary kriging estimate (19) is readily extended to incorporate that additional information. In the simplest case of a single secondary attribute $Y$, the so-called ordinary cokriging (OCK) estimate is written as a linear combination of both neighboring primary and secondary data:

$$z^{*}_{OCK}(u) = \sum_{\alpha=1}^{n(u)} \lambda_\alpha(u)\, z(u_\alpha) + \sum_{\alpha'=1}^{n'(u)} \nu_{\alpha'}(u)\, y(u_{\alpha'}) \tag{23}$$

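where secondary data have been measured at possibly different locations $u_{\alpha'}$. Like ordinary kriging, the objective is to minimize the error variance under unbiasedness constraint, which yields a very complex system of $(n(u) + n'(u) + 2)$ linear equations:

$$\begin{cases}
\displaystyle\sum_{\beta=1}^{n(u)} \lambda_\beta(u)\, \gamma_{ZZ}(u_\alpha - u_\beta) + \sum_{\beta'=1}^{n'(u)} \nu_{\beta'}(u)\, \gamma_{ZY}(u_\alpha - u_{\beta'}) - \mu_Z(u) = \gamma_{ZZ}(u_\alpha - u) & \alpha = 1, \ldots, n(u) \\[2ex]
\displaystyle\sum_{\beta=1}^{n(u)} \lambda_\beta(u)\, \gamma_{YZ}(u_{\alpha'} - u_\beta) + \sum_{\beta'=1}^{n'(u)} \nu_{\beta'}(u)\, \gamma_{YY}(u_{\alpha'} - u_{\beta'}) - \mu_Y(u) = \gamma_{YZ}(u_{\alpha'} - u) & \alpha' = 1, \ldots, n'(u) \\[2ex]
\displaystyle\sum_{\beta=1}^{n(u)} \lambda_\beta(u) = 1 \\[2ex]
\displaystyle\sum_{\beta'=1}^{n'(u)} \nu_{\beta'}(u) = 0
\end{cases} \tag{24}$$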

Fig. 13 Ordinary kriging and cokriging estimates of soil pH along the pasture transect, with the corresponding error variances. The information available consists of 20 pH values (black dots) and 100 electrical conductivity values

There are now two Lagrange parameters to account for the constraints on primary and secondary data weights, and the input information comprises the values of direct and cross semivariograms for different lags. The cokriging variance is computed as:

$$\sigma^{2}_{OCK}(u) = \sum_{\alpha=1}^{n(u)} \lambda_\alpha(u)\, \gamma_{ZZ}(u_\alpha - u) + \sum_{\alpha'=1}^{n'(u)} \nu_{\alpha'}(u)\, \gamma_{YZ}(u_{\alpha'} - u) - \mu_Z(u) \tag{25}$$
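A possible way to assemble the cokriging system (24) and the variance (25) is sketched below with NumPy for a one-dimensional transect. The direct and cross semivariograms are those of the linear model of coregionalization (18), with pH as primary and electrical conductivity as secondary attribute. The data are synthetic, and the function names, the neighborhood size and the layout of the matrix are assumptions made for illustration only.

```python
import numpy as np


def sph(h, a):
    r = np.minimum(np.abs(h) / a, 1.0)
    return 1.5 * r - 0.5 * r ** 3


def nug(h):
    """Nugget effect: 0 at zero lag, 1 elsewhere."""
    return (np.abs(h) > 0).astype(float)


# Linear model of coregionalization (18): Z = pH, Y = electrical conductivity.
def g_zz(h):
    return 0.007 * nug(h) + 0.01 * sph(h, 6.0) + 0.0065 * sph(h, 26.0)


def g_yy(h):
    return 100.0 * nug(h) + 155.0 * sph(h, 26.0)


def g_zy(h):
    return -0.4 * nug(h) + 1.0 * sph(h, 26.0)


def ordinary_cokriging(xz, z, xy, y, x0, k=10):
    """Solve system (24); return estimate (23), variance (25) and weights."""
    iz = np.argsort(np.abs(xz - x0))[:k]     # k closest primary data
    iy = np.argsort(np.abs(xy - x0))[:k]     # k closest secondary data
    xz, z, xy, y = xz[iz], z[iz], xy[iy], y[iy]
    n, m = len(xz), len(xy)
    A = np.zeros((n + m + 2, n + m + 2))
    A[:n, :n] = g_zz(xz[:, None] - xz[None, :])
    A[:n, n:n + m] = g_zy(xz[:, None] - xy[None, :])
    A[n:n + m, :n] = g_zy(xy[:, None] - xz[None, :])
    A[n:n + m, n:n + m] = g_yy(xy[:, None] - xy[None, :])
    A[:n, n + m] = -1.0              # -mu_Z in the primary equations
    A[n:n + m, n + m + 1] = -1.0     # -mu_Y in the secondary equations
    A[n + m, :n] = 1.0               # primary weights sum to one
    A[n + m + 1, n:n + m] = 1.0      # secondary weights sum to zero
    rhs = np.concatenate([g_zz(xz - x0), g_zy(xy - x0), [1.0, 0.0]])
    sol = np.linalg.solve(A, rhs)
    lam, nu, mu_z = sol[:n], sol[n:n + m], sol[n + m]
    estimate = float(lam @ z + nu @ y)
    variance = float(lam @ g_zz(xz - x0) + nu @ g_zy(xy - x0) - mu_z)
    return estimate, variance, lam, nu


# Synthetic data: 100 secondary values every meter, 20 primary values every 5 m.
xy = np.arange(0.0, 100.0, 1.0)
y = 60.0 + 10.0 * np.sin(xy / 8.0)
xz = xy[::5]
z = 6.7 + 0.15 * np.sin(xz / 8.0)

est, var, lam, nu = ordinary_cokriging(xz, z, xy, y, 12.0)
print(round(est, 3), round(var, 5), round(lam.sum(), 6), round(nu.sum(), 6))
```

The two printed sums allow a quick check of the unbiasedness constraints on the primary and secondary weights.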

Figure 13 (right column) shows the ordinary cokriging estimates and the corresponding error variances for pH along the transect in pasture. The information available consists of 20 pH values plus the 100 electrical conductivity data shown at the bottom of Fig. 6. The estimation was performed every meter using the ten closest pH values and the ten closest electrical conductivity values. The linear model of coregionalization of Fig. 12 was used to create the cokriging system. Accounting for this additional information yields a more detailed estimated transect and smaller error variances than kriging does. The absolute estimation error was also computed at the 80 locations where the pH value was ignored, and results show that cokriging yields a 10% reduction of the error: 0.0978 instead of 0.1031 for kriging. The small magnitude of the reduction is explained mainly by the weak correlation ($r = 0.33$) between pH and electrical conductivity. The benefit of cokriging over kriging increases as the correlation between the two attributes increases and as the secondary information becomes more abundant than the primary information. Cokriging is thus particularly useful when a cheap-to-measure attribute is strongly correlated with the attribute of interest.

Many examples of cokriging can be found in the soil literature. For example, Yates and Warrick (1987) estimated gravimetric moisture content in a 1 ha plot using the bare soil surface temperature and the percent sand content as secondary variables. On a much larger scale, Gallichand and Marcotte (1993) mapped the clay percentage for a study area of 33500 ha using 485 clay content values and 3488 values of saturated hydraulic conductivity. Several authors (Leenaers et al. 1990; Odeh et al. 1997) have shown how the readily available digital elevation model can be used to improve the prediction of soil attributes through cokriging. Finally, Gotway and Hartford (1996) demonstrated that accounting for corn yield measurements through cokriging improved the prediction of residual nitrate in the field, leading to better recommendations for variable-rate N fertilizer application. Beware that several applications of cokriging in the soil literature do not use a linear model of coregionalization, and even worse, the permissibility of the model fitted is rarely checked!

Factorial kriging

Any soil property measured at a certain location in space is the outcome of several physical, chemical or biological processes. Some of them operate only at microscopic scales (e.g., biological activities), whereas others such as weathering of parent material, erosion or solute transport may act over larger distances. The combined effect of different sources of spatial variation that operate at distinct scales yields nested semivariograms, such as the semivariogram of pH on pasture (Fig. 11, top graph) which was modeled as the sum of a nugget effect and two spherical models of range 6 m and 26 m:

$$\gamma_{pH}(h) = 0.007\, \gamma_0(h) + 0.01\, \mathrm{Sph}(h/6\ \mathrm{m}) + 0.0065\, \mathrm{Sph}(h/26\ \mathrm{m}) \tag{26}$$

Factorial kriging or kriging analysis (Matheron 1982; Wackernagel 1988, 1995; Goovaerts 1992) is a variant of kriging which aims at estimating and mapping the different sources of spatial variability identified on the experimental semivariogram. For example, on the basis of the semivariogram model (26), the transect of pH values measured in pasture can be viewed as the sum of different microscopic (nugget effect), short-range (6 m), and long-range (26 m) spatial components:

$$z(u_\alpha) = z^{0}(u_\alpha) + z^{1}(u_\alpha) + z^{2}(u_\alpha) \quad \forall\, \alpha$$

Assuming that these components are independent, the transect of original pH values is decomposed into local (microscopic + short-range) and regional (long-range) components using the approach described in Goovaerts (1992) (Fig. 14, left column). A similar decomposition is performed for the transect of electrical conductivity values on the basis of the semivariogram model of Fig. 12 (middle graph).

Although spatial components are mathematical constructions with no a priori physical meaning, they can help to identify the main sources of spatial variation and improve our understanding of underlying physical processes in soil. Goovaerts (1994a) used factorial kriging to separate local variation in soil and vegetation properties due to field-to-field differences from regional variation related to the presence of different soil types. Webster et al. (1994) used a similar approach to map local variation in heavy metal concentrations which are partly due to anthropic activities and regional changes caused by the geology. Dobermann et al. (1995) investigated field-scale variation of soil fertility in the humid tropics, and factorial kriging allowed the discrimination of short-range variation in P, Ca, Mg and K topsoil concentrations, which is due to patchy inputs of nutrients from buffalo excrement, from long-range variation related to vertical and lateral soil water flow along the topographic slope.

Scale-dependent relations

An interesting application of factorial kriging is the detection of scale-dependent correlations between soil attributes, i.e. correlations that change as a function of the spatial scale. For example, the scattergram of pH versus electrical conductivity values indicates that these two soil attributes are weakly correlated along the transect in pasture (Fig. 15, top graph). Original measurements, however, result from a combination of different factors operating at different spatial scales; some of them such as measurement errors or microscale variation in biological activities likely reduce artificially the overall correlation between attributes. This effect is made clear by looking at the relation between the local or regional components of pH and electrical conductivity shown in Fig. 14. Whereas the two attributes are slightly negatively correlated at a local scale, they display a strong positive correlation at the regional scale (Fig. 15, bottom graph), which agrees with the change of sign of the cross semivariogram noticed on Figure 8.

Such scale-dependent relations are frequent in soil science where many properties are controlled by the same physical processes that operate at different spatial scales and influence these properties in different ways. Goovaerts and Webster (1994) found a strong positive correlation between topsoil cobalt and copper concentrations once large micro-scale variations due to measurement and procedural errors were filtered from the original data. They also showed that the classification by soil association accounts for a large proportion of the variance at the regional scale, suggesting that the parent material contributes substantially more to the trace element content of the soil than had been thought earlier on the basis of a classical analysis where all different sources of variation were mixed. Accounting for the spatial scale in the study of correlation can thus improve our understanding of complex relations between soil attributes and their environment.

Instead of computing the correlation coefficient between previously estimated spatial components, the correlation between two attributes $z$ and $y$ at a particular scale $l$ can be assessed directly from the parameters of the linear model of coregionalization as:

$$\rho^{l}_{ZY} = \mathrm{Corr}[Z^{l}(u), Y^{l}(u)] = \frac{b^{l}_{ZY}}{\sqrt{b^{l}_{ZZ} \cdot b^{l}_{YY}}} \tag{27}$$

Application of expression (27) to the linear model of coregionalization (18) yields the following structural correlation coefficients $\rho^{l}_{ZY}$ for the pair pH-electrical conductivity: −0.31 and 0.99 at the local (up to 6 m) and regional (26 m) scales, respectively. These values are somewhat larger than those of Fig. 15, which are based on estimated values.
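The structural correlation coefficients can be recomputed from the coefficients of model (18). The sketch below assumes that the local scale pools the nugget effect and the 6 m structure, by summing their coefficients before applying expression (27); this pooling is an interpretation, not a procedure stated in the text, although it gives values close to the reported −0.31 and 0.99. The dictionary and function names are illustrative.

```python
import math

# (b_ZZ, b_YY, b_ZY) per structure of model (18): Z = pH, Y = conductivity.
coeffs = {
    "nugget": (0.007, 100.0, -0.4),
    "sph6": (0.01, 0.0, 0.0),
    "sph26": (0.0065, 155.0, 1.0),
}


def structural_correlation(structures):
    """Expression (27) applied to the summed coefficients of the structures."""
    b_zz = sum(coeffs[s][0] for s in structures)
    b_yy = sum(coeffs[s][1] for s in structures)
    b_zy = sum(coeffs[s][2] for s in structures)
    return b_zy / math.sqrt(b_zz * b_yy)


local = structural_correlation(["nugget", "sph6"])   # up to 6 m (assumed pooling)
regional = structural_correlation(["sph26"])         # 26 m
print(round(local, 2), round(regional, 3))           # -0.31 0.996
```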

Multivariate factorial kriging is a variant of cokriging which aims at estimating and mapping the different sources of spatial variability which are common to the set of attributes. The maps of the so-called regionalized factors have been used to summarize major features of multivariate soil data sets at different spatial scales identified on the experimental semivariogram (Goovaerts 1992; Dobermann et al. 1995). More recently, Monestiez et al. (1997) used factorial kriging to map the short-range (120 km) and long-range (300 km) spatial genetic structures of wild populations of perennial ryegrass in France.

Fig. 14 Decomposition of the pasture transect of pH and electrical conductivity values into local and regional components on the basis of the nested semivariogram models of Fig. 12

Advanced geostatistical methods

Although geostatistics is still mainly used for description of spatial patterns and prediction of unsampled soil attributes, the last 5 years have seen the development of new techniques which allow soil scientists to tackle advanced issues such as the assessment of the uncertainty about soil quality parameters or soil pollutant concentrations, or the stochastic simulation of the spatial distribution of attribute values. These topics are briefly addressed in this section, and the reader should refer to Goovaerts (1997a, pp. 259–436) or Goovaerts (1998) for a more detailed presentation.

Assessing the local uncertainty

Mapping metal concentrations or other soil properties is often a preliminary step towards decision making such as the delineation of polluted areas or the identification of zones that are suitable for crop growth. In many applications, the focus is on the exceedence of particular values such as regulatory thresholds for heavy metal contamination or critical values of limiting factors (e.g., depth to parent material, soil acidity) for soil quality. There is necessarily some error attached to the kriging estimate, and this must be accounted for in the decision rule. Ignoring this uncertainty may, for example, lead one to declare safe a contaminated location on the basis of a wrong estimate of pollutant concentration which is slightly below the regulatory threshold.

Fig. 15 Scattergrams of pH vs electrical conductivity values: original concentrations and spatial components displayed on Fig. 14

Fig. 16 Kriged estimates of the probability for the soil pH along the pasture transect to be smaller than 6.68. The information available consists of 20 pH values (black dots) plus the indicator semivariogram model

Geostatistics is increasingly used to estimate and map the risk of exceeding specific threshold values in soil science (Webster and Oliver 1989; Smith et al. 1993; Goovaerts and Journel 1995; Goovaerts et al. 1997; Mohammadi et al. 1997). The most straightforward approach is based on a coding of each observation $z(u_\alpha)$ into an indicator of non-exceedence of the target threshold value $z_k$; recall expression (4)

$$i(u_\alpha; z_k) = \begin{cases} 1 & \text{if } z(u_\alpha) \le z_k \\ 0 & \text{otherwise} \end{cases}$$

The probability that the attribute value $z$ does not exceed $z_k$ at the unsampled location $u$ is then estimated from the indicator transforms using a kriging estimator similar to the one developed for continuous attributes. For example, ordinary indicator kriging builds the probability estimate as a linear combination of neighboring indicator data:

$$\left[ \mathrm{Prob}\{ Z(u) \le z_k \mid (n) \} \right]^{*} = \sum_{\alpha=1}^{n(u)} \lambda_\alpha(u; z_k)\, i(u_\alpha; z_k) \tag{28}$$

where the weights $\lambda_\alpha(u; z_k)$ are obtained by solving a system of linear equations identical to system (20) except that semivariogram values are now derived from the model $\gamma_I(h; z_k)$ fitted to the experimental indicator semivariogram (5).
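The indicator approach can be sketched by coding the data at the threshold and reusing the ordinary kriging system (20) with an indicator semivariogram. In the snippet below the indicator model (nugget plus spherical structure) and the data are hypothetical; only the threshold 6.68 comes from the text. Clipping the estimate to [0, 1] is a simple correction assumed here, since kriging weights can be negative.

```python
import numpy as np


def indicator(z, zk):
    """Indicator coding: 1 where z <= zk, 0 otherwise."""
    return (np.asarray(z) <= zk).astype(float)


def gamma_ind(h, nugget=0.05, sill=0.20, a=15.0):
    """Hypothetical indicator semivariogram: nugget plus spherical model."""
    h = np.abs(h)
    r = np.minimum(h / a, 1.0)
    return np.where(h > 0, nugget + sill * (1.5 * r - 0.5 * r ** 3), 0.0)


def indicator_kriging(x, z, x0, zk, gamma, n_neigh=10):
    """Estimate Prob{Z(x0) <= zk} by ordinary kriging of indicator data (28)."""
    idx = np.argsort(np.abs(x - x0))[:n_neigh]
    xs, ind = x[idx], indicator(z[idx], zk)
    n = len(xs)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = gamma(xs[:, None] - xs[None, :])
    A[:n, n] = -1.0
    A[n, :n] = 1.0
    weights = np.linalg.solve(A, np.append(gamma(xs - x0), 1.0))[:n]
    # The raw estimate may fall slightly outside [0, 1]; clip it.
    return float(np.clip(weights @ ind, 0.0, 1.0))


# Synthetic pH observations (not the paper's data).
x = np.arange(0.0, 100.0, 5.0)
z = 6.7 + 0.15 * np.sin(x / 8.0)

for x0 in (0.0, 22.0, 30.0):
    print(x0, round(indicator_kriging(x, z, x0, 6.68, gamma_ind), 3))
```

At a data location the estimated probability is either 0 or 1, depending on whether the observation exceeds the threshold.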

Figure 16 shows the probability that the soil pH is no greater than 6.68 along the transect in pasture. These probabilities were estimated every meter using ordinary indicator kriging and the indicator semivariogram model displayed in Fig. 16. Note that the modeling of indicator semivariograms follows the same rules as that of semivariograms of continuous attributes.

Fig. 17 Transects of 100 pH values measured in the topsoil of a pasture (top graph), and the transects of estimated or simulated values constructed from only 20 observations depicted by black dots. Three different realizations were generated using stochastic simulation

Prediction of probabilities is generally improved by incorporating additional information, e.g., by cokriging of indicator transforms of both primary and secondary information. Goovaerts and Journel (1995) showed how probabilities of deficiency in copper and cobalt can be derived from the calibration of a soil map and combined with precise measurements of metal concentration to map the risk of deficiency of these metals in the soil. They have also introduced the concept of loss function which, in combination with probability maps, allowed one to map the economic impact of declaring wrongly that a location is deficient or sufficient in these metals. Similar applications to the delineation of contaminated soils are presented in Colin et al. (1996) and Goovaerts et al. (1997).

Stochastic simulation

The comparison of the transects of measured and estimated pH values displayed in Fig. 17 indicates that kriging smooths out local details of the spatial variation of pH values in pasture. Such smoothing results from the least-squares criterion of the kriging algorithm and leads to an unfortunate overestimation of small values and underestimation of large values. In stochastic simulation, the aim is not the minimization of the error variance but to generate a set of values that reproduces statistics such as the sample histogram or the semivariogram model while honoring data at their locations. Mathematically speaking, each set of values is viewed as a realization of the random function.

For example, Fig. 17 shows three realizations of the spatial distribution of pH values generated using the same information as in kriging, that is, 20 pH values depicted by black dots and the semivariogram model of Fig. 11 (top graph). The transects of simulated values look more “realistic” than the transect of kriging estimates because they reproduce the spatial variability modeled from the sample information. Stochastic simulation is thus increasingly preferred to kriging for all applications where the spatial variability of the measured field must be preserved, such as the delineation of contaminated areas (Desbarats 1996; Goovaerts 1997b) or the modeling of solute transport in the vadose zone (Vanderborght et al. 1997). Like estimation, simulation can be accomplished using a growing variety of techniques, and most of them are described in Goovaerts (1997a, pp. 376–424).

Each of the three realizations shown in Fig. 17 is a plausible representation of the unique and unknown distribution of pH values along the transect in that each simulated transect honors the 20 data and reproduces approximately the sample histogram and semivariogram model. Differences between the three realizations thus provide a measure of spatial uncertainty: features such as segments of high pH are deemed certain if seen on most of the realizations, and their probability of occurrence (i.e., probability that a given threshold is jointly exceeded at a series of locations) can be computed as long as the realizations are equiprobable. Unlike stochastic simulation, indicator kriging provides only a measure of local uncertainty in that it estimates the probability that the threshold is exceeded at a single location; recall expression (28).
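One of the simplest ways to generate conditional realizations is sketched below. It is not necessarily the algorithm used for Fig. 17: the sketch assumes a multi-Gaussian random function with known mean (taken here as the sample mean), converts the nested pH model (26) into a covariance $C(h) = C(0) - \gamma(h)$, and draws from the conditional distribution through a Cholesky factorization. The data are synthetic and all names are illustrative.

```python
import numpy as np


def sph(h, a):
    r = np.minimum(np.abs(h) / a, 1.0)
    return 1.5 * r - 0.5 * r ** 3


def cov(h):
    """Covariance C(h) = sill - gamma(h) for the nested pH model (26)."""
    h = np.abs(h)
    return (0.007 * (h == 0)
            + 0.01 * (1.0 - sph(h, 6.0))
            + 0.0065 * (1.0 - sph(h, 26.0)))


def conditional_realizations(x_data, z_data, x_grid, mean, n_real, seed=0):
    """Draw realizations on x_grid that honor the data (Gaussian assumption)."""
    rng = np.random.default_rng(seed)
    c_dd = cov(x_data[:, None] - x_data[None, :])
    c_gd = cov(x_grid[:, None] - x_data[None, :])
    c_gg = cov(x_grid[:, None] - x_grid[None, :])
    w = np.linalg.solve(c_dd, c_gd.T).T            # simple kriging weights
    cond_mean = mean + w @ (z_data - mean)
    cond_cov = c_gg - w @ c_gd.T
    # Symmetrize and add a tiny jitter so the factorization is stable
    # at data locations, where the conditional variance is zero.
    cond_cov = 0.5 * (cond_cov + cond_cov.T) + 1e-10 * np.eye(len(x_grid))
    low = np.linalg.cholesky(cond_cov)
    noise = rng.standard_normal((len(x_grid), n_real))
    return cond_mean[:, None] + low @ noise


# Synthetic conditioning data: 20 values every 5 m on a 100 m transect.
x_grid = np.arange(0.0, 100.0, 1.0)
x_data = x_grid[::5]
z_data = 6.7 + 0.15 * np.sin(x_data / 8.0)

sims = conditional_realizations(x_data, z_data, x_grid, z_data.mean(), n_real=3)
print(sims.shape)                                   # (100, 3)
print(np.abs(sims[::5] - z_data[:, None]).max())    # close to zero
```

Differences between the columns of `sims` away from the data illustrate the spatial uncertainty discussed above; statistics of interest can then be computed over many such realizations.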

The impact of a given scenario, such as application of a particular amount of fertilizer, can be investigated from a simulated map that reproduces aspects of the pattern of spatial dependence or other statistics deemed consequential for the problem at hand (e.g., connectivity of large values, spatial correlation with secondary attributes). Moreover, the availability of many equiprobable realizations allows one to assess the uncertainty about the consequences of this particular scenario, such as leaching of fertilizer and contamination of groundwater, which results from our imperfect knowledge of the spatial distribution of soil attribute values.

Conclusions

Any quantitative processing of soil information is incomplete as long as it takes no account of the spatial coordinates of observations. Geostatistical characterization of the spatial variability through semivariograms or correlograms generally brings new insight into the way soil attributes are influenced by the environment, such as geographical distribution of soil types or topography. Indicator semivariograms provide additional information about the spatial distribution of specific classes of values of continuous attributes as well as categorical soil attributes. Cross semivariograms and cross correlograms complete the description by assessing spatial relations between pairs of attributes.

A key step in any geostatistical analysis is the fitting of permissible models to the experimental semivariogram values. In this paper, basic rules for univariate and multivariate semivariogram modeling have been recalled, with the hope that they will reduce the use of unstable (i.e., Gaussian) or non-permissible models in soil studies. Permissible models are, however, only one ingredient of the modeling process, and whenever physical knowledge of the area and phenomenon under study is available, it should be accounted for in the building of a model of spatial variability.
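
One way to see the numerical instability associated with the Gaussian model is to compare the condition number of the covariance matrix it produces with that of a spherical model on the same set of locations. The sketch below assumes 20 regularly spaced locations, a unit sill, a range of 10 and no nugget effect; all of these values are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_cov(h, sill, a):
    """Gaussian covariance with practical range a (no nugget effect)."""
    return sill * np.exp(-3.0 * (h / a) ** 2)

def spherical_cov(h, sill, a):
    """Spherical covariance with range a (no nugget effect)."""
    r = np.minimum(np.abs(h) / a, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)

# Twenty locations one unit apart along a transect.
x = np.arange(20, dtype=float)
h = np.abs(x[:, None] - x[None, :])

# A large condition number means that solving the kriging system
# amplifies small perturbations such as rounding errors.
cond_gaussian = np.linalg.cond(gaussian_cov(h, sill=1.0, a=10.0))
cond_spherical = np.linalg.cond(spherical_cov(h, sill=1.0, a=10.0))
print(f"Gaussian:  {cond_gaussian:.2e}")
print(f"Spherical: {cond_spherical:.2e}")
```

With these settings the Gaussian matrix is many orders of magnitude worse conditioned than the spherical one, so kriging weights computed from it are very sensitive to small numerical perturbations.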

The existence of a model of spatial variability (dependence) allows one to tackle the problem of estimating soil attribute values at unsampled locations. Geostatistics offers a palette of linear least-squares (kriging) algorithms to incorporate different types of information in the estimation process. Kriging algorithms can also be modified to decompose original observations into specific spatial components, enabling the spatial fluctuations and interactions between soil attributes to be studied at different scales.

Recent developments in indicator geostatistics allow a straightforward assessment of the probability of exceeding critical values, such as regulatory thresholds in soil pollution or criteria for soil quality. Another way to model uncertainty is to generate many images (realizations) that all honor the data and reproduce aspects of the patterns of spatial dependence or other statistics deemed consequential for the problem at hand. In the future, more attention should be devoted to the use of these various measures of local or spatial uncertainty in decision making, such as determination of the amount of fertilizer to be applied or the delineation of polluted areas targeted for remediation.

Processes such as weathering of parent material, soil erosion or solute transport are dynamic, leading to variation of soil attribute values in both space and time. Temporal fluctuations are particularly important for microbiological processes and cannot be ignored in statistical modeling of soil variability. Although not introduced in the present paper, spatial geostatistical tools such as the semivariogram or kriging can be extended to the space-time domain.

References

Bierkens MFP, Burrough PA (1993a) The indicator approach to categorical soil data. I. Theory. J Soil Sci 44:361–368

Bierkens MFP, Burrough PA (1993b) The indicator approach to categorical soil data. II. Application to mapping and land use suitability analysis. J Soil Sci 44:369–381

Burgess TM, Webster R (1980a) Optimal interpolation and isarithmic mapping of soil properties. I. The variogram and punctual kriging. J Soil Sci 31:315–331

Burgess TM, Webster R (1980b) Optimal interpolation and isarithmic mapping of soil properties. II. Block kriging. J Soil Sci 31:333–341

Burgess TM, Webster R, McBratney AB (1981) Optimal interpolation and isarithmic mapping of soil properties. IV. Sampling strategy. J Soil Sci 32:643–659

Colin P, Froidevaux R, Garcia M, Nicoletis S (1996) Integrating geophysical data for mapping the contamination of industrial sites by polycyclic aromatic hydrocarbons: a geostatistical approach. In: Rouhani S, Srivastava RM, Desbarats AJ, Cromer MV, Johnson AI (eds) Geostatistics for environmental and geotechnical applications. American Society for Testing and Materials STP 1283, Philadelphia, pp 69–87

Cressie N (1985) Fitting variogram models by weighted least squares. Math Geol 17:563–586

Desbarats AJ (1996) Modeling spatial variability using geostatistical simulation. In: Rouhani S, Srivastava RM, Desbarats AJ, Cromer MV, Johnson AI (eds) Geostatistics for environmental and geotechnical applications. American Society for Testing and Materials STP 1283, Philadelphia, pp 32–48

Deutsch CV, Journel AG (1998) GSLIB: geostatistical software library and user’s guide, 2nd edn. Oxford University Press, New York

Dobermann A, Goovaerts P, George T (1995) Sources of soil variation in an acid Ultisol of the Philippines. Geoderma 68:173–191

Dobermann A, Goovaerts P, Neue HU (1997) Scale-dependent correlations among soil properties in two tropical lowland rice fields. Soil Sci Soc Am J 61:1483–1496

Englund E, Sparks A (1991) Geo-EAS 1.2.1 user’s guide. EPA report #600/8-91/008, EPA-EMSL, Las Vegas, Nev

Gallichand J, Marcotte D (1993) Mapping clay content for subsurface drainage in the Nile Delta. Geoderma 58:165–179

Goovaerts P (1992) Factorial kriging analysis: a useful tool for exploring the structure of multivariate spatial soil information. J Soil Sci 43:597–619

Goovaerts P (1994a) Study of spatial relationships between two sets of variables using multivariate geostatistics. Geoderma 62:93–107

Goovaerts P (1994b) Comparison of coIK, IK, and mIK performances for modeling conditional probabilities of categorical variables. In: Dimitrakopoulos R (ed) Geostatistics for the next century. Kluwer, Dordrecht, pp 18–29

Goovaerts P (1997a) Geostatistics for natural resources evaluation. Oxford University Press, New York

Goovaerts P (1997b) Kriging vs stochastic simulation for risk analysis in soil contamination. In: Soares A, Gómez-Hernández J, Froidevaux R (eds) geoENV I – geostatistics for environmental applications. Kluwer, Dordrecht, pp 247–258

Goovaerts P (1998) Geostatistics in soil science: state-of-the-art and perspectives. Geoderma (in press)

Goovaerts P, Journel AG (1995) Integrating soil map information in modelling the spatial variation of continuous soil properties. Eur J Soil Sci 46:397–414

Goovaerts P, Webster R (1994) Scale-dependent correlation between topsoil copper and cobalt concentrations in Scotland. Eur J Soil Sci 45:79–95

Goovaerts P, Gérard G, Frankart R (1989) Etude de la variabilité spatiale de quelques propriétés chimiques du sol en Fagne de Chimay, Belgique. Pédologie 39:191–207

Goovaerts P, Webster R, Dubois J-P (1997) Assessing the risk of soil contamination in the Swiss Jura using indicator geostatistics. Environ Ecol Stat 4:31–48

Gotway CA, Hartford AH (1996) Geostatistical methods for incorporating auxiliary information in the prediction of spatial variables. J Agric Biol Environ Stat 1:17–39

Goulard M (1989) Inference in a coregionalization model. In: Armstrong M (ed) Geostatistics. Kluwer, Dordrecht, pp 397–408

Goulard M, Voltz M (1992) Linear coregionalization model: tools for estimation and choice of cross-variogram matrix. Math Geol 24:269–286

Isaaks EH, Srivastava RM (1989) An introduction to applied geostatistics. Oxford University Press, New York

Journel AG, Huijbregts CJ (1978) Mining geostatistics. Academic Press, New York

Leenaers H, Okx JP, Burrough PA (1990) Employing elevation data for efficient mapping of soil pollution on floodplains. Soil Use Manag 6:105–114

Leonte D, Schofield N (1996) Evaluation of a soil contaminated site and clean-up criteria: a geostatistical approach. In: Rouhani S, Srivastava RM, Desbarats AJ, Cromer MV, Johnson AI (eds) Geostatistics for environmental and geotechnical applications. American Society for Testing and Materials STP 1283, Philadelphia, pp 133–145

Matheron G (1982) Pour une analyse krigeante de données régionalisées. Centre de Géostatistique, Report N-732, Fontainebleau

McBratney AB, Webster R (1981) Detection of ridge and furrow pattern by spectral analysis of crop yield. Int Stat Rev 49:45–52

McBratney AB, Webster R (1986) Choosing functions for semi-variograms of soil properties and fitting them to sampling estimates. Soil Sci 37:617–639

Mohammadi J, Van Meirvenne M, Goovaerts P (1997) Mapping cadmium concentration and the risk of exceeding a local sanitation threshold using indicator geostatistics. In: Soares A, Gómez-Hernández J, Froidevaux R (eds) geoENV I – geostatistics for environmental applications. Kluwer, Dordrecht, pp 327–337

Monestiez P, Goulard M, Charmet G, Balfourier F (1997) Analysing spatial genetic structures by multivariate geostatistics: study of wild populations of perennial ryegrass. In: Baafi EY, Schofield NA (eds) Geostatistics Wollongong ’96. Kluwer, Dordrecht, pp 1197–1208

Odeh IOA, McBratney AB, Slater BK (1997) Predicting soil properties from ancillary information: non-spatial models compared with geostatistical and combined methods. In: Baafi EY, Schofield NA (eds) Geostatistics Wollongong ’96. Kluwer, Dordrecht, pp 1008–1019

Oliver MA, Webster R (1986a) Combining nested and linear sampling for determining the scale and form of spatial variation of regionalized variables. Geogr Anal 18:227–242

Oliver MA, Webster R (1986b) Semi-variograms for modelling the spatial pattern of landform and soil properties. Earth Surface Proc Landforms 11:491–504

Pannatier Y (1996) Variowin: software for spatial data analysis in 2-D. Springer, New York Berlin Heidelberg

Parkin TB (1987) Soil microsites as a source of denitrification variability. Soil Sci Soc Am J 51:1194–1199

Robertson GP, Huston MA, Evans FC, Tiedje JM (1988) Spatial variability in a successional plant community: patterns of nitrogen availability. Ecology 69:1517–1524

Rossi RE, Mulla DJ, Journel AG, Franz EH (1992) Geostatistical tools for modeling and interpreting ecological spatial dependence. Ecol Monogr 62:277–314

Smith JL, Halvorson JJ, Papendick RI (1993) Using multiple-variable indicator kriging for evaluating soil quality. Soil Sci Soc Am J 57:743–749

Sutherland RA, Kessel C van, Pennock D (1991) Spatial variability of nitrogen-15 natural abundance. Soil Sci Soc Am J 55:1339–1347

Trangmar BB, Yost RS, Uehara G (1985) Application of geostatistics to spatial studies of soil properties. Adv Agron 38:45–94

Vanderborght J, Jacques D, Mallants D, Tseng PH, Feyen J (1997) Analysis of solute redistribution in heterogeneous soil. II. Numerical simulation of solute transport. In: Soares A, Gómez-Hernández J, Froidevaux R (eds) geoENV I – geostatistics for environmental applications. Kluwer, Dordrecht, pp 283–295

Vieira SR, Hatfield JL, Nielsen DR, Biggar JW (1983) Geostatistical theory and application to variability of some agronomical properties. Hilgardia 5:1–75

Voltz M, Goulard M (1994) Spatial interpolation of soil moisture retention curves. Geoderma 62:109–123

Wackernagel H (1988) Geostatistical techniques for interpreting multivariate spatial information. In: Chung CF, Fabbri AG, Sinding-Larsen R (eds) Quantitative analysis of mineral and energy resources. Reidel, Dordrecht, pp 393–409

Wackernagel H (1995) Multivariate geostatistics: an introduction with applications. Springer, Berlin Heidelberg New York

Warrick AW, Myers DE, Nielsen DR (1986) Geostatistical methods applied to soil science. In: Methods of soil analysis, part 1, 2nd edn. Physical and mineralogical methods. Agronomy Monograph 9, pp 53–82

Webster R (1996) What is kriging? Aspects Appl Biol 46:57–66

Webster R (1997) Software review. Eur J Soil Sci 48:173–175

Webster R, Boag B (1992) Geostatistical analysis of cyst nematodes in soil. J Soil Sci 43:583–595

Webster R, Burgess TM (1980) Optimal interpolation and isarithmic mapping of soil properties. III. Changing drift and universal kriging. J Soil Sci 31:505–524

Webster R, Oliver MA (1989) Optimal interpolation and isarithmic mapping of soil properties. VI. Disjunctive kriging and mapping the conditional probability. J Soil Sci 40:497–512

Webster R, Atteia O, Dubois J-P (1994) Coregionalization of trace metals in the soil in the Swiss Jura. Eur J Soil Sci 45:205–218

Wollum AG, Cassel DK (1984) Spatial variability of Rhizobium japonicum in two North Carolina soils. Soil Sci Soc Am J 48:1082–1086

Yates SR, Warrick AW (1987) Estimating soil water content using cokriging. Soil Sci Soc Am J 51:23–30