Kriging: An Introduction to Concepts and Applications · 2020-07-02 · math. function. Demo #1 Map the data, Geostatistical Wizard, ESDA, Configure options. Variography (Modeling)

Kriging: An Introduction to Concepts and Applications

Nicholas M. Giner – Esri

Agenda

• What is interpolation?

• Interpolation applications

• Spatial autocorrelation

• Deterministic vs. Geostatistical interpolators

• Building up interpolation

• Kriging theory

• Empirical Bayesian Kriging (EBK)

• EBK Regression

• EBK 3D

• Areal Interpolation

What is interpolation?

• Process of predicting values at unknown locations using values at known locations

• Transforms measurements of a continuous phenomenon into a continuous surface

• Interpolation predicts within the sampled region; extrapolation predicts outside it


Interpolation applications

• Many continuous phenomena (z)

- Elevation

- Soil (pH, nutrient levels, porosity)

- Precipitation / Snowfall

- Temperature

- Windspeed

- Air pollution / Air quality

- Ozone

- Water quality

- Mining

- Heavy metal concentrations

- Environmental contaminants

- Noise

- Disease occurrence

Spatial autocorrelation

• Tobler’s First Law of Geography

- “…everything is related to everything else, but near things are more related than distant things”

• O’Sullivan and Unwin, 2003

- “If geography is worth studying at all, it must be because phenomena do not vary randomly across space”

Deterministic vs. Geostatistical interpolators

• Deterministic interpolators

- Based on mathematical functions, not statistical theory

- Model parameters are determined by the user

- Does not include randomness

- No estimates of prediction error (uncertainty/accuracy/confidence)

- Examples: Inverse Distance Weighting (IDW), Spline, Global Polynomial Interpolation

• Geostatistical interpolators

- Based on mathematical functions AND statistical theory

- Model parameters are estimated based on the data (spatial autocorrelation)

- Includes randomness to approximate the variation present in geographic data

- Produces estimates of prediction error (uncertainty/accuracy/confidence)

- Example: Kriging

Two components of all interpolators

• Neighborhood definition – distance or number of points

• Estimation function – mathematics used to make the estimation (e.g., determine the weights)

Building up interpolation

Source: Geographic Information Analysis – O’Sullivan and Unwin

https://www.amazon.com/Geographic-Information-Analysis-David-OSullivan/dp/0470288574

• Average of all data points: 49


• Local spatial average: 40.75

- All points in the local neighborhood are weighted equally


• Inverse Distance Weighted (IDW): 41.01

- Closer points have higher weights and more influence



• Inverse Distance Weighted (IDW): 49.8

- More influence from points below simply because they are within the neighborhood and closer in distance


• Kriging: 56.2

- Weights are based on how strongly points are correlated as a function of the distance between them

- Weights can be negative
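The progression from a global average to a local average to IDW can be sketched in Python with NumPy. This is a minimal illustration on hypothetical points (not the O’Sullivan and Unwin example, so it does not reproduce 49, 40.75, or 41.01); the neighborhood size `k` and the IDW power of 2 are assumptions.

```python
import numpy as np

def global_average(values):
    """Average of all data points, ignoring location."""
    return float(np.mean(values))

def nearest(coords, target, k):
    """Indices and distances of the k points closest to target."""
    d = np.linalg.norm(coords - target, axis=1)
    idx = np.argsort(d)[:k]
    return idx, d[idx]

def local_average(coords, values, target, k):
    """Local spatial average: every neighbor gets the same weight."""
    idx, _ = nearest(coords, target, k)
    return float(np.mean(values[idx]))

def idw(coords, values, target, k, power=2.0):
    """Inverse distance weighting: closer neighbors get larger weights."""
    idx, d = nearest(coords, target, k)
    if np.any(d == 0):                      # target sits on a sample point
        return float(values[idx][d == 0][0])
    w = 1.0 / d**power
    return float(np.sum(w * values[idx]) / np.sum(w))  # weights normalized to sum to 1

# Hypothetical sample points (x, y) and measured values
coords = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5]], dtype=float)
values = np.array([10.0, 12.0, 14.0, 40.0, 44.0])
target = np.array([0.2, 0.2])

print(global_average(values))                    # 24.0
print(local_average(coords, values, target, 3))  # 12.0
print(idw(coords, values, target, 3))            # pulled toward 10.0, the closest value
```

Kriging differs from the last step in how the weights are derived: from modeled spatial correlation rather than from distance alone.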

Geostatistics and Kriging

• Geostatistics – statistics of spatially correlated data

• Quantify spatial autocorrelation and incorporate it into the interpolation

• Kriging – “optimal” interpolator, provided the data meet certain conditions (assumptions)

- Based on the foundational work of Danie Krige, who estimated gold ore grades in South African mines in the 1950s, and Georges Matheron, who formalized the theory in the 1960s

- The main idea is that spatial data can be decomposed into two main components

1) Deterministic variation (global trend)

• Can be constant mean or mathematical function

2) Spatially correlated, random variation (local autocorrelation)

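Z(s) = µ + ε(s), where s is a location, µ is the deterministic mean (trend), and ε(s) is the spatially correlated random error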

Prediction = mean + error

What makes it “optimal”?

• Estimates true value, on average (unbiased)

• Lowest expected prediction error

• Can use information about covariates

• Can be generalized to different geometries

• Estimates a prediction distribution at each location (not just one value)

• Kriging assumptions

- Normally distributed

- No trends

- Spatially autocorrelated

- Stationary

Kriging assumption: Normal distribution

• If your input data are normally distributed, you can guarantee that your predicted distribution will be normally distributed

• If not, many transformation options are available (e.g., log or Box-Cox)

[Figures: Histogram, QQ Plot]
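A numerical stand-in for the histogram and QQ-plot check is sketched below, assuming Python with NumPy and the standard-library `statistics.NormalDist`. The data are hypothetical lognormal values, and the log transformation is only one of the available options; after transforming, skewness should move toward 0 and the QQ-plot correlation toward 1.

```python
import numpy as np
from statistics import NormalDist

def skewness(x):
    """Sample skewness; roughly 0 for symmetric (e.g. normal) data."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((x - x.mean()) ** 3) / x.std() ** 3)

def qq_correlation(x):
    """Correlation of a normal QQ plot: sorted data vs. standard normal quantiles.
    Close to 1 means the QQ plot points fall near a straight line."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    probs = (np.arange(1, n + 1) - 0.5) / n          # plotting positions in (0, 1)
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return float(np.corrcoef(theoretical, x)[0, 1])

# Hypothetical right-skewed measurements (e.g. concentrations)
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=1.0, sigma=0.8, size=500)
logged = np.log(raw)  # log transformation

for name, data in [("raw", raw), ("log", logged)]:
    print(name, round(skewness(data), 2), round(qq_correlation(data), 3))
```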

Kriging assumption: No trends

• Systematic patterns and trends across an area can affect the interpolation

• Trade-off with spatial autocorrelation: variation removed as a trend is no longer modeled as spatial autocorrelation
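One common way to handle a trend is to fit and remove a first-order (planar) surface, krige the residuals, and add the trend back. The sketch below shows only the removal step with plain least squares in NumPy on hypothetical data; it is not the trend-removal option of any particular software.

```python
import numpy as np

def remove_first_order_trend(x, y, z):
    """Fit z = b0 + b1*x + b2*y by least squares and return (coefficients, residuals).
    Kriging would then be applied to the residuals, and the trend added back."""
    A = np.column_stack([np.ones_like(x), x, y])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coef, z - A @ coef

# Hypothetical data: a planar trend plus noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 100, 200)
y = rng.uniform(0, 100, 200)
z = 5.0 + 0.3 * x - 0.1 * y + rng.normal(0.0, 1.0, 200)

coef, residuals = remove_first_order_trend(x, y, z)
print(np.round(coef, 2))      # close to [5.0, 0.3, -0.1]
print(round(residuals.mean(), 6))
```

A more flexible trend surface absorbs more of the variation, leaving less for the semivariogram to describe.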

Kriging assumption: Spatial autocorrelation

• How correlated points are, based on how far apart they are from one another

• Once you know the expected correlation between known values as a function of distance, you can predict values at unknown locations

Kriging assumption: Stationarity

• The correlation between points is defined only by the distance between them, not their location

- Mean stationarity

- Local stationarity

Kriging workflow

1) Map your data

2) Exploratory Spatial Data Analysis (ESDA); configure options

3) Variography – describe spatial variation in the data

4) Fit model – summarize spatial variation with a mathematical function

5) Use model to determine weights in search neighborhood

6) Interpolate

7) Evaluate (cross-validation)

8) Repeat steps 2-7
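Steps 4 to 6 can be sketched for the simplest case, simple kriging with a known mean, in Python with NumPy. The exponential model, its sill and range, and the sample values are all assumptions chosen for illustration; for a stationary model without nugget, the covariance is the sill minus the semivariogram, C(h) = sill − γ(h).

```python
import numpy as np

def exponential_cov(h, sill=1.0, range_=10.0):
    """Exponential covariance; with the 3h/range form, correlation drops
    to about 5% at the (practical) range."""
    return sill * np.exp(-3.0 * h / range_)

def simple_kriging(coords, values, target, mean, sill=1.0, range_=10.0):
    """Return (prediction, kriging variance, weights) at one target location."""
    # Covariance among the sample points (n x n)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    C = exponential_cov(d, sill, range_)
    # Covariance between each sample point and the target (n)
    c0 = exponential_cov(np.linalg.norm(coords - target, axis=1), sill, range_)
    w = np.linalg.solve(C, c0)                  # kriging weights (may be negative)
    prediction = mean + w @ (values - mean)     # mean + weighted sum of local errors
    variance = sill - w @ c0                    # prediction error variance
    return float(prediction), float(variance), w

# Hypothetical samples around a known mean of 1.0
coords = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [6.0, 6.0]])
values = np.array([1.2, 0.8, 1.5, 0.9])
pred, var, w = simple_kriging(coords, values, np.array([1.0, 1.0]), mean=1.0)
print(round(pred, 3), round(var, 3), np.round(w, 3))
```

The returned variance is what distinguishes kriging from the deterministic interpolators: every prediction comes with an error estimate.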

Demo #1: Map the data, Geostatistical Wizard, ESDA, configure options

Variography (Modeling)

• Examining and modeling spatial autocorrelation

1) Calculate empirical semivariogram

- Calculate the distance and the squared difference in values between each pair of points

2) Bin the semivariogram

- Group the pairs of locations into a specified range of distances (lags)

3) Average the semivariogram

- Calculate the average distance and the average semivariance (half the squared difference) for each lag

4) Fit a model

- Find the best-fitting curve (model) for the averaged semivariances

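Semivariogram(distance h) = 0.5 × average[(value at location i − value at location j)²] over all pairs (i, j) separated by distance h:

$$\gamma(h) = \frac{1}{2N(h)} \sum_{(i,j)\,:\,\lVert s_i - s_j \rVert \approx h} \bigl(Z(s_i) - Z(s_j)\bigr)^2$$

where N(h) is the number of pairs in the lag containing h.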
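Steps 1 to 3 can be sketched in Python with NumPy. The parameter names `lag_width` and `n_lags` are hypothetical, and each pair is assigned to a lag by integer division of its distance by the lag width.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Return (average distance, average semivariance, pair count) per non-empty lag."""
    i, j = np.triu_indices(len(values), k=1)               # every pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)   # 1) distance per pair
    semi = 0.5 * (values[i] - values[j]) ** 2              # 1) half squared difference
    lag = (dist // lag_width).astype(int)                  # 2) bin pairs into lags
    lags, gammas, counts = [], [], []
    for b in range(n_lags):                                # 3) average within each lag
        in_lag = lag == b
        if in_lag.any():
            lags.append(dist[in_lag].mean())
            gammas.append(semi[in_lag].mean())
            counts.append(int(in_lag.sum()))
    return np.array(lags), np.array(gammas), np.array(counts)

# Hypothetical transect: four equally spaced points with a steady increase
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
values = np.array([0.0, 1.0, 2.0, 3.0])
print(empirical_semivariogram(coords, values, lag_width=1.0, n_lags=4))
# lag 1: gamma 0.5 (3 pairs); lag 2: gamma 2.0 (2 pairs); lag 3: gamma 4.5 (1 pair)
```

Because these values keep increasing with no plateau, this toy transect is dominated by trend rather than by a bounded spatial structure.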

Semivariogram

• Represents half the expected squared difference in data values for pairs of points a given distance apart, regardless of their spatial location

Nugget – semivariance as distance approaches 0 (measurement error and variation at scales shorter than the sampling interval)

Range – distance at which the semivariogram levels off: beyond it, semivariance is constant, there is no more spatial structure in the data, and points are uncorrelated (data correlation)

Sill – constant semivariance value beyond the range (data variance)
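The nugget, range, and sill map directly onto the parameters of a semivariogram model. The sketch below uses the spherical model, one common choice, and fits it with an unweighted least-squares grid search over the range. Both the model choice and the fitting method are assumptions for illustration, not how any particular software fits models.

```python
import numpy as np

def spherical(h, nugget, partial_sill, range_):
    """Spherical semivariogram: rises from the nugget and flattens at the
    sill (nugget + partial_sill) once h reaches the range."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / range_, 1.0)
    gamma = nugget + partial_sill * (1.5 * r - 0.5 * r**3)
    return np.where(h == 0, 0.0, gamma)  # gamma(0) = 0; the nugget is the limit as h -> 0+

def fit_spherical(lags, gammas, candidate_ranges):
    """Grid search over the range; for a fixed range the model is linear in
    nugget and partial sill, so those come from least squares."""
    best = None
    for a in candidate_ranges:
        r = np.minimum(lags / a, 1.0)
        A = np.column_stack([np.ones_like(r), 1.5 * r - 0.5 * r**3])
        params, *_ = np.linalg.lstsq(A, gammas, rcond=None)
        sse = float(np.sum((A @ params - gammas) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(params[0]), float(params[1]), float(a))
    return best[1:]  # (nugget, partial_sill, range)

# Hypothetical binned semivariances generated from a known model
lags = np.arange(1.0, 31.0)
gammas = spherical(lags, nugget=0.1, partial_sill=2.0, range_=15.0)
print(fit_spherical(lags, gammas, np.linspace(1.0, 30.0, 59)))  # approximately (0.1, 2.0, 15.0)
```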

Demo #2: Simple kriging

Validation

• Full validation

- Split data into ~80% training, ~20% testing

• Cross-validation (“Leave-one out”)

- Remove a single known point, use all remaining points to interpolate at that location, then compare measured value to predicted value

• Diagnostics

- Predictions should be unbiased (e.g., over- and under-predictions should cancel each other out)

- Mean Error should be near zero (unbiased)

- Mean Standardized Error should be near zero

- Predictions should be close to known values

- Root Mean Square Error (RMSE) should be as small as possible

- Assessment of model stability and accuracy of standard errors

- Root Mean Square Standardized Error should be close to 1

- Average Standard Error should be close to RMSE
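Leave-one-out cross-validation and the diagnostics above can be sketched in Python with NumPy. The `predict` callable is a hypothetical interface standing in for any interpolator that returns a prediction and a standard error, and the sign convention error = predicted − measured is assumed.

```python
import numpy as np

def leave_one_out(coords, values, predict):
    """Remove each point in turn, predict it from the rest, and collect
    (predictions, standard errors). `predict` is a hypothetical callable:
    predict(train_coords, train_values, target) -> (prediction, std_error)."""
    preds, ses = [], []
    for k in range(len(values)):
        keep = np.arange(len(values)) != k
        p, se = predict(coords[keep], values[keep], coords[k])
        preds.append(p)
        ses.append(se)
    return np.array(preds), np.array(ses)

def cv_diagnostics(measured, predicted, std_errors):
    """Summary statistics listed above; error = predicted - measured."""
    m, p, se = (np.asarray(a, dtype=float) for a in (measured, predicted, std_errors))
    err = p - m
    z = err / se                                                  # standardized errors
    return {
        "mean_error": float(err.mean()),                          # near 0
        "rmse": float(np.sqrt(np.mean(err**2))),                  # as small as possible
        "mean_standardized_error": float(z.mean()),               # near 0
        "rms_standardized_error": float(np.sqrt(np.mean(z**2))),  # near 1
        "average_standard_error": float(se.mean()),               # near RMSE
    }

# Hypothetical check with a naive "predictor": the mean of the remaining points
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
values = np.array([1.0, 2.0, 3.0, 4.0])
preds, ses = leave_one_out(coords, values, lambda c, v, t: (v.mean(), 1.0))
print(cv_diagnostics(values, preds, ses))
```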

Empirical Bayesian Kriging (EBK)

• Automates the most difficult aspects of building a valid kriging model

• Fewer parameters to set manually

• Relaxes the stationarity assumption of kriging

• More accurate estimates of prediction standard errors

• Accounts for the uncertainty in estimating the semivariogram, instead of treating one estimated semivariogram as the true one

How EBK works

1. Divide data into local subsets of a given size (can overlap)

2. For each subset, estimate the semivariogram

3. Use this semivariogram to simulate a new set of values for the points (sim #1)

4. Produce a semivariogram from the simulated points (semiv #1)

5. Repeat steps 3 and 4 many times, resulting in a distribution of semivariograms

6. Mix the local prediction surfaces together to get the final surface
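Steps 2 to 5 for a single subset can be sketched in Python with NumPy. This is a deliberate simplification, not Esri’s implementation. It assumes a nugget-free exponential model, sets the sill to the sample variance, and picks the range by a crude least-squares grid search over unbinned pairs; EBK itself uses more rigorous estimation and weights the simulated semivariograms. Subsetting (step 1) and mixing the local surfaces (step 6) are omitted.

```python
import numpy as np

def exp_semivariogram(h, sill, range_):
    """Exponential semivariogram without nugget (the model choice is an assumption)."""
    return sill * (1.0 - np.exp(-3.0 * h / range_))

def estimate_semivariogram(coords, values, candidate_ranges):
    """Crude estimate: sill = sample variance; range by least squares against
    the half squared differences of all pairs (no binning, for brevity)."""
    i, j = np.triu_indices(len(values), k=1)
    h = np.linalg.norm(coords[i] - coords[j], axis=1)
    semi = 0.5 * (values[i] - values[j]) ** 2
    sill = float(np.var(values))
    sse = [np.sum((exp_semivariogram(h, sill, a) - semi) ** 2) for a in candidate_ranges]
    return sill, float(candidate_ranges[int(np.argmin(sse))])

def semivariogram_distribution(coords, values, n_sims, candidate_ranges, rng):
    """Steps 2-5 for one subset: estimate, simulate, re-estimate, repeat."""
    sill, range_ = estimate_semivariogram(coords, values, candidate_ranges)   # step 2
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    cov = sill - exp_semivariogram(d, sill, range_)                           # C(h) = sill - gamma(h)
    L = np.linalg.cholesky(cov + 1e-8 * sill * np.eye(len(values)))          # tiny jitter for stability
    fitted = []
    for _ in range(n_sims):
        sim = values.mean() + L @ rng.standard_normal(len(values))            # step 3: simulate
        fitted.append(estimate_semivariogram(coords, sim, candidate_ranges))  # step 4: re-estimate
    return fitted                                                             # list of (sill, range)

# Hypothetical subset of 30 points
rng = np.random.default_rng(2)
coords = rng.uniform(0.0, 10.0, size=(30, 2))
values = rng.normal(5.0, 1.0, size=30)
dist = semivariogram_distribution(coords, values, 50, np.linspace(1.0, 20.0, 39), rng)
ranges = [r for _, r in dist]
print(min(ranges), np.median(ranges), max(ranges))
```

The spread of the fitted ranges is the point: it shows how uncertain a single estimated semivariogram is.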

Demo #3: EBK

EBK Regression Prediction

• Combines regression with kriging

• Allows covariates (explanatory variables) to improve predictions

• Both regression models and kriging models are estimated locally

• Uses Principal Components Analysis (PCA)

Kriging

Prediction = mean + error

• Mean is constant, and the error term is estimated from surrounding points

• Estimation focuses on the error terms and does little with the mean

Regression (OLS)

Prediction (DV) = intercept + (v1 * coef1) + (v2 * coef2) + … + (vk * coefk) + error

• Error term is assumed to be random noise (unmodellable)

• Estimation focuses on the mean and does little with the error terms

Regression Kriging

Prediction (DV) = intercept + (v1 * coef1) + (v2 * coef2) + … + (vk * coefk) + error

• Regression equation estimates the mean for kriging

• Error is modeled with the semivariogram, and kriging is performed

• If the semivariogram is flat, you essentially have OLS

• If there are no explanatory variables, you essentially have simple kriging
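The regression kriging decomposition can be sketched in Python with NumPy: OLS estimates the mean, and simple kriging of the OLS residuals models the error term. This is plain global regression kriging with an assumed, known residual covariance. EBK Regression Prediction additionally estimates both models locally, works with principal components of the covariates, and uses EBK’s simulated semivariograms, none of which is shown. The two covariance functions are hypothetical.

```python
import numpy as np

def regression_kriging(X, z, coords, x0, s0, cov_fn):
    """Predict at location s0 with covariate values x0.
    cov_fn(h) is an assumed covariance of the residuals at distance h."""
    A = np.column_stack([np.ones(len(z)), X])                # intercept + covariates
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)             # regression estimates the mean
    resid = z - A @ beta                                     # error terms
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = np.linalg.solve(cov_fn(d), cov_fn(np.linalg.norm(coords - s0, axis=1)))
    trend = float(np.concatenate([[1.0], x0]) @ beta)
    return trend + float(w @ resid), trend                   # (prediction, OLS-only prediction)

def flat_cov(h):
    """Flat semivariogram (pure nugget): no correlation beyond distance 0."""
    return np.where(h == 0, 1.0, 0.0)

def decaying_cov(h):
    """Spatially correlated residuals (hypothetical exponential model)."""
    return np.exp(-3.0 * h / 5.0)

# Hypothetical data: two covariates plus noise
rng = np.random.default_rng(3)
coords = rng.uniform(0.0, 10.0, size=(25, 2))
X = rng.normal(size=(25, 2))
z = 2.0 + X @ np.array([1.5, -0.5]) + rng.normal(0.0, 0.3, size=25)

target_x, target_s = np.array([0.2, 0.1]), np.array([5.0, 5.0])
print(regression_kriging(X, z, coords, target_x, target_s, flat_cov))      # kriging adds nothing
print(regression_kriging(X, z, coords, target_x, target_s, decaying_cov))
```

With `flat_cov`, the kriging weights are all zero away from the data, so the prediction is the OLS prediction, matching the “flat semivariogram” bullet.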

Demo #4: EBK Regression

EBK 3D

• Applies the EBK model to 3D

- Distances are calculated using 3D Euclidean Distance

- Subsets are created in 3D

- Search neighborhoods are 3D

- Vertical trend can be removed

• Elevation Inflation Factor

- Vertical variation happens at a different rate than horizontal variation
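The effect of an elevation inflation factor on 3D distances can be sketched in Python. This assumes, as a simplification, that the factor multiplies the elevation coordinate before 3D Euclidean distances are computed, so one unit of vertical separation counts as `elevation_inflation` horizontal units. The function name and the sample coordinates are hypothetical.

```python
import math

def distance_3d(p, q, elevation_inflation=1.0):
    """3D Euclidean distance between (x, y, z) points, with the vertical
    difference multiplied by an elevation inflation factor."""
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    dz = (p[2] - q[2]) * elevation_inflation
    return math.sqrt(dx * dx + dy * dy + dz * dz)

# Hypothetical samples: 30 m apart horizontally, 4 m apart vertically
a, b = (0.0, 0.0, 0.0), (30.0, 0.0, 4.0)
print(distance_3d(a, b))        # 4 m of depth barely registers
print(distance_3d(a, b, 10.0))  # with factor 10 it counts like 40 m: sqrt(30**2 + 40**2) = 50.0
```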

Demo #5: EBK 3D

Areal Interpolation

• Applies kriging theory to polygon data

• Two main use cases

- Fill missing data

- Downscale from larger polygons to smaller polygons

• Three data inputs

- Average (Gaussian)

- Rate (Binomial)

- Count (Poisson)

Demo #6: Areal Interpolation
