Kriging: An Introduction to Concepts and Applications
Nicholas M. Giner – Esri
Agenda
• What is interpolation?
• Interpolation applications
• Spatial autocorrelation
• Deterministic vs. Geostatistical interpolators
• Building up interpolation
• Kriging theory
• Empirical Bayesian Kriging (EBK)
• EBK Regression
• EBK 3D
• Areal Interpolation
What is interpolation?
• Process of predicting values at unknown locations using values at known locations
• Transforms measurements of a continuous phenomenon into a continuous surface
• Interpolation predicts within region; Extrapolation predicts outside region
Interpolation applications
• Many continuous phenomena (z)
- Elevation
- Soil (pH, nutrient levels, porosity)
- Precipitation / Snowfall
- Temperature
- Windspeed
- Air pollution / Air quality
- Ozone
- Water quality
- Mining
- Heavy metal concentrations
- Environmental contaminants
- Noise
- Disease occurrence
Spatial autocorrelation
• Tobler’s First Law of Geography
- “…everything is related to everything else, but near things are more related than distant things”
• O’Sullivan and Unwin, 2003
- “If geography is worth studying at all, it must be because phenomena do not vary randomly across space”
Deterministic vs. Geostatistical interpolators
• Deterministic interpolators
- Based on mathematical functions, not statistical theory
- Model parameters are determined by the user
- Does not include randomness
- No estimates of prediction error (uncertainty/accuracy/confidence)
- Examples: Inverse Distance Weighting (IDW), Spline, Global Polynomial Interpolation
• Geostatistical interpolators
- Based on mathematical functions, AND statistical theory
- Model parameters are estimated based on the data (spatial autocorrelation)
- Includes randomness to approximate the variation present in geographic data
- Produces estimates of prediction error (uncertainty/accuracy/confidence)
- Example: Kriging
Two components of all interpolators
• Neighborhood definition – distance or number of points
• Estimation function – mathematics used to make the estimation (e.g. determine the weights)
Building up interpolation
Source: Geographic Information Analysis – O’Sullivan and Unwin
• Average of all data points: 49
Building up interpolation
• Local spatial average: 40.75
- All points in the local neighborhood are weighted equally
Building up interpolation
• Inverse Distance Weighted (IDW): 41.01
- Closer points have higher weights and more influence
Source: Geographic Information Analysis – O’Sullivan and Unwin
Building up interpolation
• Inverse Distance Weighted (IDW): 49.8
- More influence from points below simply because they are within the neighborhood and closer in distance
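The first three estimators in this build-up (global average, local average, IDW) can be sketched in a few lines. This is a minimal illustration only: the sample points, the search radius, and the power of 2 are made-up assumptions and do not reproduce the numbers from the O’Sullivan and Unwin figure.

```python
import math


def global_mean(samples):
    """Average of all data points, ignoring location."""
    return sum(z for _, _, z in samples) / len(samples)


def neighbors(samples, x, y, radius):
    """Return (distance, value) pairs for samples within `radius` of (x, y)."""
    found = []
    for sx, sy, z in samples:
        d = math.hypot(sx - x, sy - y)
        if d <= radius:
            found.append((d, z))
    if not found:
        raise ValueError("no samples inside the search neighborhood")
    return found


def local_average(samples, x, y, radius):
    """Equal weight for every point in the neighborhood."""
    near = neighbors(samples, x, y, radius)
    return sum(z for _, z in near) / len(near)


def idw(samples, x, y, radius, power=2.0):
    """Inverse distance weighting: closer points get larger weights."""
    near = neighbors(samples, x, y, radius)
    for d, z in near:
        if d == 0.0:  # target coincides with a sample
            return z
    weights = [1.0 / d ** power for d, _ in near]
    return sum(w * z for w, (_, z) in zip(weights, near)) / sum(weights)


# Hypothetical (x, y, value) samples, not the data from the slides.
samples = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (0.0, 2.0, 30.0), (5.0, 5.0, 100.0)]
print(global_mean(samples))
print(local_average(samples, 0.5, 0.5, radius=3.0))
print(idw(samples, 0.5, 0.5, radius=3.0))
```

Both components named earlier are visible here: `neighbors` is the neighborhood definition, and the weighting inside each function is the estimation function.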
Building up interpolation
• Kriging: 56.2
- Prediction is based on how correlated points are as a function of the distance between them
- There can be negative weights
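How kriging derives weights from a correlation model can be sketched with ordinary kriging. This is a minimal sketch under stated assumptions: the exponential semivariogram, its sill and range values, and the sample points are all hypothetical, and the small linear solver is only there to keep the example self-contained.

```python
import math


def exponential_semivariogram(h, sill=1.0, practical_range=10.0):
    """Exponential model with zero nugget (assumed parameters)."""
    return sill * (1.0 - math.exp(-3.0 * h / practical_range))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ordinary_kriging(samples, x, y, gamma=exponential_semivariogram):
    """Return (prediction, weights) at (x, y) from (x, y, value) samples."""
    n = len(samples)
    # Semivariances between all sample pairs, bordered by the
    # unbiasedness constraint (weights must sum to 1).
    a = [[gamma(math.hypot(xi - xj, yi - yj)) for xj, yj, _ in samples] + [1.0]
         for xi, yi, _ in samples]
    a.append([1.0] * n + [0.0])
    # Semivariances between each sample and the prediction location.
    b = [gamma(math.hypot(xi - x, yi - y)) for xi, yi, _ in samples] + [1.0]
    weights = solve(a, b)[:n]  # last unknown is the Lagrange multiplier
    prediction = sum(w * z for w, (_, _, z) in zip(weights, samples))
    return prediction, weights


# Hypothetical samples, not the data from the slides.
samples = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (0.0, 2.0, 30.0), (5.0, 5.0, 100.0)]
prediction, weights = ordinary_kriging(samples, 0.5, 0.5)
print(prediction, weights)
```

Unlike IDW, the weights come out of a linear system rather than a fixed formula, so nothing forces them to be positive; they are only constrained to sum to 1.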
Geostatistics and Kriging
• Geostatistics - statistics of spatially correlated data
• Quantify spatial autocorrelation and incorporate it into the interpolation
• Kriging – “optimal” interpolator given that data meets certain conditions (assumptions)
- Based on the foundational work by Danie Krige and Georges Matheron in the 1950s-1960s on predicting gold ore grades in South Africa
- Main idea is that spatial data can be decomposed into two main components
1) Deterministic variation (global trend)
• Can be constant mean or mathematical function
2) Spatially correlated, random variation (local autocorrelation)
Z(s) = µ + ε(s)
Prediction = mean + error
What makes it “optimal”?
• Estimates true value, on average (unbiased)
• Lowest expected prediction error
• Can use information about covariates
• Can be generalized to different geometries
• Estimates a prediction distribution at each location (not just one value)
• Kriging assumptions
- Normally distributed
- No trends
- Spatially autocorrelated
- Stationary
Kriging assumption: Normal distribution
• If your input data is normally distributed, you can guarantee that your predicted distribution will be normally distributed
• Many transformation options if not
[Figures: Histogram, QQ Plot]
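A quick numeric stand-in for the histogram and QQ plot is to compare skewness before and after a transformation. The sketch below assumes a log transform, which is only one of the possible transformation options, and uses invented right-skewed values.

```python
import math
import statistics


def skewness(values):
    """Population skewness: near 0 for symmetric (e.g. normal) data."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical right-skewed measurements (all positive, so log is defined).
raw = [1.2, 1.5, 1.9, 2.4, 3.1, 4.8, 7.5, 12.0, 21.0, 40.0]
logged = [math.log(v) for v in raw]

print("skewness before:", skewness(raw))
print("skewness after log:", skewness(logged))
```

If the data are kriged on the transformed scale, the predictions have to be back-transformed to the original units afterwards.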
Kriging assumption: No trends
• Systematic patterns and trends in an area might impact the interpolation
• Trade-off with spatial autocorrelation
Kriging assumption: Spatial autocorrelation
• How correlated are points based on how far apart they are from one another
• Once you know expected correlation in known values given distance, you can predict the value at unknown locations
Kriging assumption: Stationarity
• The correlation between points is defined only by the distance between them, not their location
- Mean stationarity
- Local stationarity
Kriging workflow
1) Map your data
2) Exploratory Spatial Data Analysis (ESDA); configure options
3) Variography – Describe spatial variation in the data
4) Fit model – Summarize spatial variation with a mathematical function
5) Use model to determine weights in search neighborhood
6) Interpolate
7) Evaluate (cross-validation)
8) Repeat Steps 2-7
Demo #1: Map the data, Geostatistical Wizard, ESDA, Configure options
Variography (Modeling)
• Examining and modeling spatial autocorrelation
1) Calculate empirical semivariogram
- Calculate the distance and the squared difference in value between each pair of points
2) Bin the semivariogram
- Group the pairs of locations into a specified range of distances (lags)
3) Average the semivariogram
- Calculate the average distance and half the average squared difference (semivariance) for each lag
4) Fit a model
- Find the best-fit curve (model) through the average semivariances
Semivariogram (distance h) = 0.5 * average[(value at location i - value at location j)²], over all pairs of locations separated by distance h
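Steps 1 to 3 above (pair, bin, average) can be sketched as follows. The function name, the equal-width lag scheme, and the sample data are assumptions made for illustration; model fitting (step 4) is left out.

```python
import math
from itertools import combinations


def empirical_semivariogram(samples, lag_size, n_lags):
    """Return (mean distance, semivariance, pair count) for each non-empty lag."""
    dist_sum = [0.0] * n_lags
    sq_sum = [0.0] * n_lags
    count = [0] * n_lags
    # Step 1: distance and squared difference for every pair of points.
    for (x1, y1, z1), (x2, y2, z2) in combinations(samples, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        # Step 2: assign the pair to a lag bin of width `lag_size`.
        k = int(d // lag_size)
        if k >= n_lags:
            continue  # pair is farther apart than the largest lag
        dist_sum[k] += d
        sq_sum[k] += (z1 - z2) ** 2
        count[k] += 1
    # Step 3: average within each bin; semivariance is half the mean squared difference.
    return [(dist_sum[k] / count[k], 0.5 * sq_sum[k] / count[k], count[k])
            for k in range(n_lags) if count[k] > 0]


# Hypothetical transect of (x, y, value) samples.
samples = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0), (3.0, 0.0, 7.0)]
for mean_dist, semivariance, pairs in empirical_semivariogram(samples, 1.5, 2):
    print(mean_dist, semivariance, pairs)
```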
Semivariogram
• Represents the expected difference in data value for pairs of points that are a given distance apart, regardless of their spatial location
Nugget – semivariance at zero distance (measurement error)
Range – distance at which autocorrelation falls off and the semivariance becomes constant; there is no more spatial structure in the data, and points farther apart than the range are uncorrelated (data correlation)
Sill – constant semivariance value beyond the range (data variance)
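To see how the three parameters shape a fitted curve, here is a sketch of one common model form, the spherical semivariogram. The choice of model and the parameter values are assumptions for illustration; `sill` here means the total sill (nugget plus partial sill).

```python
def spherical_semivariogram(h, nugget, sill, range_):
    """Spherical model: rises from the nugget and levels off at the sill."""
    if h <= 0.0:
        return 0.0  # a point compared with itself
    if h >= range_:
        return sill  # beyond the range: no remaining spatial structure
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


# Hypothetical parameters: nugget 2, sill 10, range 50 distance units.
for h in (0.0, 1.0, 25.0, 50.0, 80.0):
    print(h, spherical_semivariogram(h, nugget=2.0, sill=10.0, range_=50.0))
```

Just above zero distance the curve starts at the nugget, it increases up to the range, and it stays at the sill for every larger distance.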
Demo #2: Simple kriging
Validation
• Full validation
- Split data into ~80% training, ~20% testing
• Cross-validation (“leave-one-out”)
- Remove a single known point, use all remaining points to interpolate at that location, then compare measured value to predicted value
• Diagnostics
- Predictions should be unbiased (e.g. over- and under-predictions should cancel each other out)
- Mean Error should be near zero (unbiased)
- Mean Standardized Error should be near zero
- Predictions should be close to known values
- Root Mean Square Error (RMSE) should be as small as possible
- Assessment of model stability and accuracy of standard errors
- Root Mean Square Standardized should be close to 1
- Average Standard Error close to RMSE
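The diagnostics above can be computed from leave-one-out results as sketched below. The formulas are the usual definitions as assumed here (for example, average standard error as the root of the mean squared standard error), and `predict` is a hypothetical callback that returns a prediction and its standard error; the numbers are invented.

```python
import math


def leave_one_out(samples, predict):
    """Hold out each (x, y, value) sample in turn and predict it from the rest."""
    measured, predicted, std_errors = [], [], []
    for i, (x, y, z) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        estimate, std_error = predict(training, x, y)
        measured.append(z)
        predicted.append(estimate)
        std_errors.append(std_error)
    return measured, predicted, std_errors


def cv_diagnostics(measured, predicted, std_errors):
    """Summary statistics for cross-validation errors."""
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    standardized = [e / s for e, s in zip(errors, std_errors)]
    return {
        "mean_error": sum(errors) / n,                       # near 0
        "mean_standardized_error": sum(standardized) / n,    # near 0
        "rmse": math.sqrt(sum(e * e for e in errors) / n),   # small
        "rms_standardized": math.sqrt(sum(s * s for s in standardized) / n),  # near 1
        "average_standard_error": math.sqrt(sum(s * s for s in std_errors) / n),  # near RMSE
    }


# Hypothetical cross-validation output.
stats = cv_diagnostics([10.0, 12.0, 14.0, 16.0], [11.0, 11.0, 15.0, 15.0], [1.0] * 4)
print(stats)
```

A root mean square standardized value above 1 suggests the standard errors are too small (variability is underestimated); a value below 1 suggests they are too large.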
Empirical Bayesian Kriging (EBK)
• Automates the most difficult aspects of building a valid kriging model
• Not as many parameters
• Relaxes the stationarity assumption of kriging
• More accurate estimates of prediction standard errors
• Accounts for the uncertainty in the estimated semivariogram, instead of treating a single semivariogram as the true one
How EBK works
1. Divide data into local subsets of a given size (can overlap)
2. For each subset, estimate the semivariogram
3. Use this semivariogram to simulate a new set of values for the points (sim #1)
4. Produce a semivariogram from the simulated points (semiv #1)
5. Repeat steps 3 and 4 many times, resulting in a distribution of semivariograms
6. Mix the local prediction surfaces together to get the final surface
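The simulation loop in steps 3 to 5 can be illustrated for a single subset. This is a heavily simplified sketch, not Esri's implementation: it assumes an exponential covariance with made-up parameters, simulates with a Cholesky factor, and summarizes each simulation by one short-lag semivariance instead of fitting a full semivariogram model.

```python
import math
import random


def exponential_covariance(h, sill, practical_range):
    return sill * math.exp(-3.0 * h / practical_range)


def cholesky(a):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def simulate_values(points, sill, practical_range, rnd):
    """Step 3: draw correlated values at the points from the estimated model."""
    cov = [[exponential_covariance(math.hypot(px - qx, py - qy), sill, practical_range)
            for qx, qy in points] for px, py in points]
    lower = cholesky(cov)
    noise = [rnd.gauss(0.0, 1.0) for _ in points]
    return [sum(lower[i][k] * noise[k] for k in range(i + 1))
            for i in range(len(points))]


def short_lag_semivariance(points, values, max_dist):
    """Step 4 (simplified): semivariance of pairs no farther apart than max_dist."""
    squared = [(values[i] - values[j]) ** 2
               for i in range(len(points)) for j in range(i + 1, len(points))
               if math.hypot(points[i][0] - points[j][0],
                             points[i][1] - points[j][1]) <= max_dist]
    return 0.5 * sum(squared) / len(squared)


rnd = random.Random(0)
subset = [(float(x), float(y)) for x in range(5) for y in range(5)]  # one local subset
# Step 5: repeat to obtain a distribution rather than a single estimate.
spread = [short_lag_semivariance(subset, simulate_values(subset, 1.0, 4.0, rnd), 1.0)
          for _ in range(200)]
print(min(spread), sum(spread) / len(spread), max(spread))
```

The spread of the 200 values shows why a single estimated semivariogram should not be treated as exact: data generated from the same model can produce noticeably different empirical semivariances.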
Demo #3: EBK
EBK Regression Prediction
• Combines regression with kriging
• Allows covariates (explanatory variables) to improve predictions
• Both regression models and kriging models are estimated locally
• Uses Principal Components Analysis (PCA)
Kriging
Prediction = mean + error
• Mean is constant and error term is estimated from surrounding points
• Estimation focuses on the error terms, and does little with the mean
Regression (OLS)
Prediction (DV) = intercept + (v1 * coef1) + (v2 * coef2) + … + (vk * coefk) + error
• Error term is assumed to be random noise (unmodellable)
• Estimation focuses on the mean, and does little with the error terms
Regression Kriging
Prediction (DV) = intercept + (v1 * coef1) + (v2 * coef2) + … + (vk * coefk) + error
• Regression equation estimates the mean for kriging
• Error is modeled with the semivariogram, and kriging is performed
• If semivariogram is flat, you essentially have OLS
• If there are no explanatory variables, you essentially have simple kriging
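The regression kriging idea (regression for the mean, kriging for the error) can be sketched with one explanatory variable. This is a global toy version under stated assumptions: EBK Regression Prediction itself works locally and uses PCA, while here the covariance model, its parameters, and the data are invented, and the residuals are simple-kriged with a zero mean.

```python
import math


def fit_ols(xs, ys):
    """Intercept and slope of a one-variable least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def solve(a, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def covariance(p, q, sill, practical_range):
    h = math.hypot(p[0] - q[0], p[1] - q[1])
    return sill * math.exp(-3.0 * h / practical_range)


def regression_kriging(coords, covariate, values, target, target_covariate,
                       sill=1.0, practical_range=3.0):
    # Regression estimates the mean.
    intercept, slope = fit_ols(covariate, values)
    residuals = [v - (intercept + slope * c) for v, c in zip(values, covariate)]
    # Simple kriging (zero mean) of the regression residuals.
    c_matrix = [[covariance(p, q, sill, practical_range) for q in coords] for p in coords]
    c_target = [covariance(p, target, sill, practical_range) for p in coords]
    weights = solve(c_matrix, c_target)
    kriged_error = sum(w * r for w, r in zip(weights, residuals))
    return intercept + slope * target_covariate + kriged_error


# Hypothetical locations, covariate values, and measurements.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
covariate = [1.0, 2.0, 3.0, 4.0, 6.0]
values = [2.1, 3.9, 6.2, 7.8, 12.3]
print(regression_kriging(coords, covariate, values, (0.5, 0.5), 2.5))
```

With a vanishingly small range the residuals are treated as uncorrelated, the kriged error term drops to zero, and the prediction reduces to the OLS line, which mirrors the flat-semivariogram case described above.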
Demo #4: EBK Regression
EBK 3D
• Applies the EBK model to 3D
- Distances are calculated using 3D Euclidean Distance
- Subsets are created in 3D
- Search neighborhoods are 3D
- Vertical trend can be removed
• Elevation Inflation Factor
- Vertical variation happens at a different rate than horizontal variation
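One way to picture the elevation inflation factor is as a stretch applied to the vertical coordinate before 3D Euclidean distances are computed. The sketch below assumes exactly that; the function name and the example factor are illustrative, not values from the tool.

```python
import math


def inflated_distance_3d(p, q, inflation_factor):
    """3D Euclidean distance after multiplying the z difference by a factor."""
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    dz = (p[2] - q[2]) * inflation_factor
    return math.sqrt(dx * dx + dy * dy + dz * dz)


# Two hypothetical points 1 unit apart vertically and 100 units apart horizontally.
a, b = (0.0, 0.0, 0.0), (100.0, 0.0, 1.0)
print(inflated_distance_3d(a, b, 1.0))    # vertical offset barely matters
print(inflated_distance_3d(a, b, 100.0))  # vertical offset counts as much as 100 horizontal units
```

A large factor makes points at different depths or heights look far apart, so the model treats values as changing faster vertically than horizontally.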
Demo #5: EBK 3D
Areal Interpolation
• Applies kriging theory to polygon data
• Two main use cases
- Fill missing data
- Downscale from larger polygons to smaller polygons
• Three data inputs
- Average (Gaussian)
- Rate (Binomial)
- Count (Poisson)
Demo #6: Areal Interpolation