Geostatistical analysis of particulate matter concentration: A case study of Pune. MAP INDIA, 19-21 Jan 2010. Sulochana Shekhar, Associate Professor, NDA, Pune

Geostatistical Analyst for PM Concentration in Pune


Page 1: Geostatistical Analyst for PM Concentration in Pune

Geostatistical analysis of particulate matter concentration: A case study of Pune

MAP INDIA, 19-21 Jan 2010

Sulochana Shekhar, Associate Professor, NDA, Pune

Page 2: Geostatistical Analyst for PM Concentration in Pune

Increasing urbanisation, increasing population

Increase in number of vehicles

Other industrial activities

Increasing air pollution

Increasing health hazards

Need for monitoring the air quality

Environment management

GIS - Geostatistics

Page 3: Geostatistical Analyst for PM Concentration in Pune

[Map: Study area, Pune city. Panels: India - Maharashtra; Maharashtra - Pune district; Pune district - Pune city. Labels: Kirkee Cant, Pune Cant, Hadapsar, Bibewadi, Kasba, Warje, Aundh, Kothrud, Western Ghats, Mula Mutha river, NH-4 to Mumbai. Scale: 1 cm to 150 km.]

On a list of 52 towns and cities ranked by annual average concentration of respirable suspended particulate matter (RSPM) measured in residential areas in 2004, … Delhi comes 16th, Kolkata is 21st and Mumbai is 39th, but Pune is at the 13th spot.

Page 4: Geostatistical Analyst for PM Concentration in Pune
Page 5: Geostatistical Analyst for PM Concentration in Pune

Pune was selected as a demonstration city for the Urban Air Quality Management project, taken up under the USEPA, MoEF and GOI agreement, since 2002.

Air Pollution Air Quality

SO2 < 80 µg/m³

NOx < 80 µg/m³

SPM < 200 µg/m³

RSPM < 100 µg/m³

Page 6: Geostatistical Analyst for PM Concentration in Pune

Air quality trends in Pune (source: PMC Environmental Cell)

Page 7: Geostatistical Analyst for PM Concentration in Pune

It is equally important to know that Pune city's pollution has always been chiefly a matter of concentrations of particulate matter of 10 microns or less in size (1 micron = 10⁻⁶ m), known as PM10. These particles are so small that they cannot be seen with the naked eye, but they enter the human respiratory system and affect it to a great extent. Some of the major effects of particulate pollutants are an increased risk of respiratory death in infants less than 1 year old, deterioration in the rate of lung function development, aggravated asthma, and other respiratory symptoms such as cough and bronchitis in children. The much smaller particles in the PM2.5 size range (2.5 microns or less) seriously affect health, increasing deaths from cardiovascular and respiratory diseases and lung cancer.

Page 8: Geostatistical Analyst for PM Concentration in Pune

• The city's vehicular population is likely to hit the 19 lakh mark next month. In the last nine months alone, over one lakh vehicles have been added, which comes to around 370 vehicles a day on average. Statistics provided by the Regional Transport Office (RTO) show that, as of December 2009, the number of registered vehicles was 18,91,929.

In Pune, over 74 per cent of all vehicles are two-wheelers. Cars and jeeps account for about 14 per cent. Of the more than 18.9 lakh vehicles in the city, 14,10,821 are two-wheelers.

Page 9: Geostatistical Analyst for PM Concentration in Pune

Contrary to popular belief, the growing number of vehicles alone is not responsible for the air pollution in the city. The tar plants, burning of garbage and plastic, resuspended dust from unpaved roadsides and other factors also aggravate air pollution.

Page 10: Geostatistical Analyst for PM Concentration in Pune
Page 11: Geostatistical Analyst for PM Concentration in Pune

Methodology

Page 12: Geostatistical Analyst for PM Concentration in Pune

There are two main groups of interpolation techniques: deterministic and geostatistical.

Deterministic interpolation techniques create surfaces from measured points using specified mathematical formulas. The Inverse Distance Weighted (IDW) and Spline methods are deterministic interpolation methods.

Geostatistical interpolation techniques utilize the statistical properties of the measured points. They quantify the spatial autocorrelation among measured points.

Page 13: Geostatistical Analyst for PM Concentration in Pune

04/08/23 ss

Geostatistics

• It is a branch of applied statistics that deals with spatially distributed properties.
• Geostatistics was devised to treat problems that arise when conventional statistical theory is used in estimating changes in ore grade within a mine.
• However, it is now applicable to many circumstances in different areas of geology and other natural sciences.

Page 14: Geostatistical Analyst for PM Concentration in Pune


Geostatistics

Regionalized variable

Semivariance

Semivariogram

Kriging

Page 15: Geostatistical Analyst for PM Concentration in Pune


Kriging

This estimation procedure is called kriging, named after D. G. Krige, a South African mining engineer and pioneer in the application of statistical techniques to mine evaluation.

A kriging estimate requires prior knowledge in the form of a model of the semivariogram or the spatial covariance.

It depends on mathematical and statistical models. The addition of a statistical model that includes probability separates kriging methods from the deterministic methods.

Page 16: Geostatistical Analyst for PM Concentration in Pune

This is the general formula for interpolators.

Page 17: Geostatistical Analyst for PM Concentration in Pune

The weight, λi:
• In deterministic interpolation, the weight depends solely on the distance to the prediction location.
• In the kriging method, the weights are based not only on the distance but also on the spatial autocorrelation. Thus, the weight λi depends on a model fitted to the measured points, the distance to the prediction location, and the spatial relationships among the measured values around the prediction location.
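The weighted-sum form that both families of interpolators share can be sketched in standard notation. The symbols s_0 (the prediction location) and N (the number of measured points used) are notation assumed here; Z(s_i) is the measured value at location s_i and λ_i is its weight.

```latex
\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i \, Z(s_i)
```

The methods differ only in how the weights λ_i are chosen.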

Page 18: Geostatistical Analyst for PM Concentration in Pune

Creating a prediction surface map with kriging

To make a prediction with the kriging interpolation method, two tasks are necessary:
• Uncover the dependency rules.
• Make the predictions.

To realize these two tasks, kriging goes through a two-step process:
• It creates the semivariogram to estimate the statistical dependence (called spatial autocorrelation) values that depend on the model of autocorrelation (fitting a model).
• It predicts the unknown values (making a prediction).

It is because of these two distinct tasks that it has been said that kriging uses the data twice; the first time to estimate the spatial autocorrelation of the data and the second to make the predictions.

Page 19: Geostatistical Analyst for PM Concentration in Pune

Semivariogram

The semivariogram is defined as:

$\gamma(s_i, s_j) = \tfrac{1}{2}\,\mathrm{var}\big(Z(s_i) - Z(s_j)\big)$

where var is the variance. If two locations, si and sj, are close to each other in terms of the distance measure d(si, sj), you expect them to be similar, so the difference in their values, Z(si) - Z(sj), will be small. As si and sj get farther apart, they become less similar, so the difference in their values, Z(si) - Z(sj), will become larger.

Page 20: Geostatistical Analyst for PM Concentration in Pune

The empirical semivariogram is a graph of the averaged semivariogram values on the y-axis and the distance (or lag) on the x-axis.
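As an illustration of how such a graph can be computed, here is a minimal sketch that averages half the squared differences of all point pairs within distance bins. The function name, the equal-width binning and the four sample points are assumptions made for this example, not the procedure used by ArcGIS.

```python
import math

def empirical_semivariogram(points, values, lag_width, n_lags):
    """Return (lag_centre, semivariance, pair_count) for each non-empty distance bin."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(points[i], points[j])   # separation distance of the pair
            k = int(h // lag_width)               # index of the lag bin
            if k < n_lags:
                sums[k] += 0.5 * (values[i] - values[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * lag_width, sums[k] / counts[k], counts[k])
            for k in range(n_lags) if counts[k] > 0]

# Hypothetical transect: values rise steadily with position.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
vals = [10.0, 12.0, 14.0, 16.0]
bins = empirical_semivariogram(pts, vals, lag_width=1.0, n_lags=4)
for lag, gamma, pairs in bins:
    print(f"lag {lag:.1f}: semivariance {gamma:.2f} from {pairs} pairs")
```

In this toy data the averaged semivariance grows with the lag, which is the behaviour the graph is meant to reveal.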

Page 21: Geostatistical Analyst for PM Concentration in Pune

A typical semivariogram

The semivariogram depicts the spatial autocorrelation of the measured sample points. The variance of the difference increases with distance.

Understanding a semivariogram—Range, sill, and nugget

Page 22: Geostatistical Analyst for PM Concentration in Pune

Covariance function

The covariance function is defined to be: C(si, sj) = cov(Z(si), Z(sj)), where cov is the covariance. Covariance is a scaled version of correlation. So, when two locations, si and sj, are close to each other, you expect them to be similar, and their covariance (a correlation) will be large. As si and sj get farther apart, they become less similar, and their covariance becomes zero. This can be seen in the following figure, which shows the anatomy of a typical covariance function.

Page 23: Geostatistical Analyst for PM Concentration in Pune

ArcGIS provides the following functions from which to choose for modeling the empirical semivariogram:
• Circular
• Spherical
• Exponential
• Gaussian
• Linear
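To give a feel for what such model functions look like, three of them are sketched below in a common textbook parameterization with a nugget, a partial sill and a range. The factor 3 in the exponential and Gaussian forms is a "practical range" convention assumed here; the exact parameterization used by ArcGIS may differ.

```python
import math

def spherical(h, nugget, partial_sill, range_):
    """Rises to the sill exactly at the range, then stays flat."""
    if h <= 0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, nugget, partial_sill, range_):
    """Approaches the sill asymptotically (about 95% of it at the range)."""
    if h <= 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / range_))

def gaussian(h, nugget, partial_sill, range_):
    """Like the exponential model but flatter (parabolic) near the origin."""
    if h <= 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * (h / range_) ** 2))

# Hypothetical parameters: nugget 1, partial sill 4, range 10.
for h in (0.0, 2.5, 5.0, 10.0, 20.0):
    print(h, spherical(h, 1.0, 4.0, 10.0),
          round(exponential(h, 1.0, 4.0, 10.0), 3),
          round(gaussian(h, 1.0, 4.0, 10.0), 3))
```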


Page 24: Geostatistical Analyst for PM Concentration in Pune

Making a prediction

• The first use of the data: uncovering the dependence or autocorrelation in our data.
• The second use of the data: making a prediction using the fitted model.

Page 25: Geostatistical Analyst for PM Concentration in Pune

Search radius

• Establish the search radius or neighborhood by assuming that, as locations get farther from the prediction location, the measured values will have less spatial autocorrelation with the unknown value.
• The search radius controls computational speed: the smaller the search radius, the faster the predictions can be made.
• The specified shape of the neighborhood restricts how far and where to look for the measured values to be used in the prediction.
• The search radius can be fixed or variable.

Page 26: Geostatistical Analyst for PM Concentration in Pune

Geostatistical Wizard: Searching Neighborhood dialog box

Neighbors to include = 5

Search strategy: circle with four quadrants

Radius = 0.1

Coordinates of test point: x = 18.9, y = -34.01

Estimated value = 841.895

Page 27: Geostatistical Analyst for PM Concentration in Pune

Z(s) = µ(s) + ε(s)

where Z(s) is the variable of interest, decomposed into a deterministic trend µ(s) and a random, autocorrelated error term ε(s). The symbol s simply indicates the location; think of it as containing the spatial x- (longitude) and y- (latitude) coordinates. Variations on this formula form the basis for all of the different types of kriging. The autocorrelation between ε(s) and ε(s + h) does not depend on the actual location s, but only on the displacement h between the two.

Kriging Formula

Page 28: Geostatistical Analyst for PM Concentration in Pune

Types of Kriging

Page 29: Geostatistical Analyst for PM Concentration in Pune

Deterministic interpolation techniques can be divided into two groups, global and local.
• Global techniques calculate predictions using the entire dataset.
• Local techniques calculate predictions from the measured points within neighborhoods, which are smaller spatial areas within the larger study area.

Geostatistical Analyst in ArcGIS provides:
• Global Polynomial as a global interpolator, and
• Inverse Distance Weighted, Local Polynomial, and Radial Basis Functions as local interpolators.

Page 30: Geostatistical Analyst for PM Concentration in Pune

To predict a value for any unmeasured location, IDW will use the measured values surrounding the prediction location. Those measured values closest to the prediction location will have more influence on the predicted value than those farther away.
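A minimal sketch of this weighting is given below. The power of 2 and the two sample points are assumptions for illustration; the function name is hypothetical and this is not the ArcGIS implementation.

```python
import math

def idw_predict(points, values, target, power=2.0):
    """Inverse distance weighted prediction at `target`."""
    weights = []
    for p, v in zip(points, values):
        d = math.dist(p, target)
        if d == 0.0:
            return v                      # at a measured point, return its value
        weights.append(1.0 / d ** power)  # nearer points get larger weights
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

# Hypothetical readings: 100 at x = 0 and 200 at x = 2.
pts = [(0.0, 0.0), (2.0, 0.0)]
vals = [100.0, 200.0]
prediction = idw_predict(pts, vals, (0.5, 0.0))
print(round(prediction, 2))
```

The prediction location is three times closer to the first point, so with a power of 2 that point receives nine times the weight of the second.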

Page 31: Geostatistical Analyst for PM Concentration in Pune

Global Polynomial (GP) is a quick deterministic interpolator that is smooth. It is best used for surfaces that change slowly and gradually. However, there is no assessment of prediction errors and it may be too smooth.

Page 32: Geostatistical Analyst for PM Concentration in Pune

When the dataset exhibits short-range variation, Local Polynomial interpolation maps can capture the short-range variation. Local Polynomial interpolation is sensitive to the neighborhood distance. In this method, there is no assessment of prediction errors.

Page 33: Geostatistical Analyst for PM Concentration in Pune

Radial Basis Functions (RBFs) are moderately quick deterministic interpolators that are exact. There is no assessment of prediction errors. RBFs are used for calculating smooth surfaces from a large number of data points. The functions produce good results for gently varying surfaces such as elevation.

Page 34: Geostatistical Analyst for PM Concentration in Pune

Exploring data

"Things that are closer together tend to be more alike than things that are farther apart".

Geostatistical interpolation techniques

Page 35: Geostatistical Analyst for PM Concentration in Pune

Ordinary kriging is a kriging method in which the weights of the values sum to unity. It uses an average of a subset of neighboring points to produce a particular interpolation point.
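The sum-to-unity constraint can be seen in a small sketch of the ordinary kriging system, written in semivariogram form with a Lagrange multiplier. The exponential semivariogram with sill 1 and range 5, the three sample points and all function names are assumptions for illustration only.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def gamma(h, sill=1.0, range_=5.0):
    """Assumed exponential semivariogram with no nugget."""
    return sill * (1.0 - math.exp(-3.0 * h / range_))

def ordinary_kriging(points, values, target):
    n = len(points)
    # Semivariances between data points, bordered by the unbiasedness
    # constraint (weights sum to one) and a Lagrange multiplier.
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]
    prediction = sum(w * v for w, v in zip(weights, values))
    return prediction, weights

# Hypothetical concentrations at three stations.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
vals = [140.0, 150.0, 130.0]
prediction, weights = ordinary_kriging(pts, vals, (0.4, 0.4))
print("weights:", [round(w, 3) for w in weights], "sum:", round(sum(weights), 6))
print("prediction:", round(prediction, 2))
```

Unlike IDW, the weights here come from the fitted semivariogram, so changing the model changes the weights even when the distances stay the same.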

Page 36: Geostatistical Analyst for PM Concentration in Pune

Simple kriging is a kriging method in which the weights of the values do not sum to unity. It uses the average of the entire dataset and produces a smoother result.

Page 37: Geostatistical Analyst for PM Concentration in Pune

Universal kriging is a kriging method often used on data with a significant spatial trend, such as a sloping surface. In universal kriging, the expected values of the sampled points are modeled as a polynomial trend. Kriging is carried out on the difference between this trend and the values of the sampled points.

Page 38: Geostatistical Analyst for PM Concentration in Pune

In general, disjunctive kriging tries to do more than ordinary kriging. Disjunctive kriging requires the bivariate normality assumption and approximations to the functions f_i(Z(s_i)); the assumptions are difficult to verify, and the solutions are mathematically and computationally complicated.

Page 39: Geostatistical Analyst for PM Concentration in Pune

Declustering of data

Often the spatial locations of our data are not randomly or regularly spaced. For various reasons, the data may have been sampled preferentially, with a higher density of sample points in some places than in others. Samples should be taken so that they are representative of the entire surface. However, samples are often taken where the concentration is most severe, thus skewing the view of the surface. Declustering accounts for this skewed representation of the samples by weighting them appropriately so that a more accurate surface can be created.

To get the best result, declustering was done for the data, which were clustered in one part of the study area. The NST (Normal Score Transformation) method was used with the cell and polygon options to decluster the data. NST can be useful for geostatistics because, when the data is dependent, it may be easier to detect and model autocorrelation using the NST.

Page 40: Geostatistical Analyst for PM Concentration in Pune
Page 41: Geostatistical Analyst for PM Concentration in Pune

A standard error map quantifies the uncertainty of the prediction. If the data comes from a normal distribution, the true value will be within prediction ± 2 times the prediction standard errors approximately 95 percent of the time.
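The rule of thumb above amounts to a simple interval calculation, sketched here. The prediction of 140.0 and the standard error of 11.75 are hypothetical inputs chosen for illustration, and the function name is an assumption.

```python
def prediction_interval(prediction, standard_error, k=2.0):
    """Approximate 95% interval: prediction plus or minus k standard errors."""
    return prediction - k * standard_error, prediction + k * standard_error

low, high = prediction_interval(140.0, 11.75)
print(low, high)
```

The interval is only meaningful to the extent that the normality assumption holds for the data.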

Prediction Standard error map

Page 42: Geostatistical Analyst for PM Concentration in Pune
Page 43: Geostatistical Analyst for PM Concentration in Pune

Cross-validation

Cross-validation uses all of the data to estimate the trend and autocorrelation models. It removes each data location, one at a time, and predicts the associated data value. For example, the diagram below shows 10 randomly distributed data points. Cross-validation omits a point (red point) and calculates the value of this location using the remaining nine points (blue points). The predicted and actual values at the location of the omitted point are compared. This procedure is repeated for a second point, and so on.

Cross-validation gives an idea of "how good" the model is.

We can systematically compare each surface with another, eliminating the "worst" of the two being compared, until the two "best" surfaces remain and are compared with one another.
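The leave-one-out procedure described above can be sketched as follows. To keep the example short, a plain inverse distance weighted predictor stands in for kriging, so there is no trend or autocorrelation model to fit first; the function names and the four sample points are assumptions, and the locations are assumed to be distinct.

```python
import math

def idw(points, values, target, power=2.0):
    """Stand-in predictor (inverse distance weighting), not kriging."""
    weights = [1.0 / math.dist(p, target) ** power for p in points]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def leave_one_out(points, values, predict):
    """Omit each point in turn and predict it from the remaining points."""
    results = []
    for i in range(len(points)):
        rest_pts = points[:i] + points[i + 1:]
        rest_vals = values[:i] + values[i + 1:]
        results.append((values[i], predict(rest_pts, rest_vals, points[i])))
    return results

# Hypothetical measurements on the corners of a unit square.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
vals = [130.0, 140.0, 150.0, 160.0]
results = leave_one_out(pts, vals, idw)
for measured, predicted in results:
    print(f"measured {measured:.1f}, predicted {predicted:.1f}")
```

Running the same loop with two different predictors gives two sets of errors that can then be compared surface against surface.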

Page 44: Geostatistical Analyst for PM Concentration in Pune

Therefore the goal should be to have:
• Standardized mean prediction errors near 0
• Small root-mean-squared prediction errors
• Average standard error near root-mean-squared prediction errors
• Standardized root-mean-squared prediction errors near 1
• Spread of the points as close as possible around the dashed gray line
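The summary statistics in this list can be computed from the cross-validation output as sketched below. The definitions are the commonly used ones and are assumed here: errors are predicted minus measured, and the average standard error is the root of the mean squared standard error. The input numbers are hypothetical.

```python
import math

def cv_statistics(measured, predicted, std_errors):
    """Prediction error summary from cross-validation results."""
    n = len(measured)
    errors = [p - m for p, m in zip(predicted, measured)]
    standardized = [e / s for e, s in zip(errors, std_errors)]
    return {
        "mean": sum(errors) / n,
        "rms": math.sqrt(sum(e * e for e in errors) / n),
        "average_standard_error": math.sqrt(sum(s * s for s in std_errors) / n),
        "mean_standardized": sum(standardized) / n,
        "rms_standardized": math.sqrt(sum(z * z for z in standardized) / n),
    }

# Hypothetical measured values, predictions and prediction standard errors.
measured = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 123.0, 127.0]
std_errors = [2.0, 2.0, 3.0, 3.0]
stats = cv_statistics(measured, predicted, std_errors)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

In this constructed case the errors are exactly as large as the standard errors claim, so the standardized root-mean-square is 1 and the average standard error equals the root-mean-square error.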

Page 45: Geostatistical Analyst for PM Concentration in Pune

Optimal model and valid model

The root-mean-squared prediction error may be smaller for a particular model. Therefore, one might conclude that it is the "optimal" model. However, for another model, the root-mean-squared prediction error may be closer to the average estimated prediction standard error. This is a more valid model, because when we predict at a point without data, we have only the estimated standard errors to assess the uncertainty of that prediction. When the average estimated prediction standard errors are close to the root-mean-squared prediction errors from cross-validation, we can be confident that the prediction standard errors are appropriate.

Page 46: Geostatistical Analyst for PM Concentration in Pune

If the average standard errors are close to the root-mean-squared prediction errors, we are correctly assessing the variability in prediction.

If the average standard errors are greater than the root-mean-squared prediction errors, we are overestimating the variability of our predictions.

If the average standard errors are less than the root-mean-squared prediction errors, we are underestimating the variability in our predictions.

If the root-mean-squared standardized errors are greater than 1, we are underestimating the variability in our predictions.

If the root-mean-squared standardized errors are less than 1, we are overestimating the variability in our predictions.
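These rules can be written as two small checks. How close counts as "close" is not stated in the slides, so the 5 per cent tolerance below is an assumption, as are the function names; the sample inputs 11.75, 11.28 and 1.006 are the simple kriging figures from the cross-validation results of this study.

```python
def compare_errors(average_standard_error, rms, tolerance=0.05):
    """Compare the average standard error with the RMS prediction error."""
    ratio = average_standard_error / rms
    if abs(ratio - 1.0) <= tolerance:
        return "variability assessed correctly"
    if ratio > 1.0:
        return "variability overestimated"
    return "variability underestimated"

def check_rmss(rms_standardized, tolerance=0.05):
    """Interpret the root-mean-squared standardized error."""
    if abs(rms_standardized - 1.0) <= tolerance:
        return "variability assessed correctly"
    if rms_standardized > 1.0:
        return "variability underestimated"
    return "variability overestimated"

print(compare_errors(11.75, 11.28))
print(check_rmss(1.006))
```

Note that the two checks point in opposite directions: a large average standard error means overestimation, while a large standardized root-mean-square means underestimation.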

Page 47: Geostatistical Analyst for PM Concentration in Pune

Deterministic interpolation techniques are inappropriate when there are large changes in the surface values within a short horizontal distance and/or when we suspect the sample data is prone to error or uncertainty. They do not allow us to investigate the autocorrelation of the data, which makes them less flexible and more automatic than kriging. These functions make no assumptions about the data. Therefore these methods were not considered for the air quality model.

Deterministic Interpolation Techniques

Page 48: Geostatistical Analyst for PM Concentration in Pune

As per the prediction error statistics, the root-mean-square error of ordinary kriging (OK), 10.75, is smaller than that of simple kriging. But this makes OK only the optimal model.

Geostatistical Interpolation Techniques

Page 49: Geostatistical Analyst for PM Concentration in Pune

As per the prediction error statistics, the root-mean-square error of universal kriging (UK), 10.75, is smaller than that of simple kriging (SK). But this makes UK only the optimal model. For the mean standardized error, UK has a value of -0.024, which is close to zero, whereas SK has a value of -0.13, which is a little farther from zero. The average standard error is greater than the root-mean-squared prediction error in UK, which shows that we are overestimating the variability of our predictions. In both cases the root-mean-squared standardized errors are close to 1, which shows that both methods are reasonably good predictors.

Page 50: Geostatistical Analyst for PM Concentration in Pune

The average standard error is above the root-mean-squared prediction error in universal kriging (UK) and below it in disjunctive kriging (DK). In UK, this shows that we are overestimating the variability of our predictions; in DK, it shows that we are underestimating it. In UK, the root-mean-squared standardized error is close to 1, whereas in DK it is more than 1, which again shows that we are underestimating the variability of our predictions with the disjunctive method.

Page 51: Geostatistical Analyst for PM Concentration in Pune

After declustering the data using the Normal Score Transformation in the simple kriging and disjunctive kriging methods, the cross-validation statistics were compared again to select the best method.

The average standard error is lower than the root-mean-squared prediction error in both SK and DK, which shows that we are underestimating the variability of our predictions. The root-mean-squared standardized errors are more than 1, which also shows that we are underestimating the variability of our predictions in both methods.

Page 52: Geostatistical Analyst for PM Concentration in Pune

The simple kriging method using the Normal Score Transformation (NST) with the polygon option was compared with simple kriging without NST.

As per the prediction error statistics, the root-mean-square error of Simple kriging 6 (SK 6), 11.26, is smaller than that of Simple kriging (SK). The average standard error is closer to the root-mean-squared prediction error in SK 6 than in SK. In both methods the average standard error is close to the root-mean-squared prediction error, which shows that both are reasonably good for prediction (valid models). In SK, the root-mean-squared standardized error is closer to 1, whereas in SK 6 it is a little more than 1. This shows that our predictions are better with the simple kriging method than with the other methods.

Page 53: Geostatistical Analyst for PM Concentration in Pune

The graphs of the simple kriging prediction standard error for SK 6 and SK are compared below to select the better one.

If the errors of the predictions from their true values are normally distributed, the points should lie roughly along the dashed line. If the errors are normally distributed, one can be confident in using methods that rely on normality. In this case, in the Simple kriging_2 graph, the values are relatively closer to the dashed line than in the other methods.

Page 54: Geostatistical Analyst for PM Concentration in Pune

Comparing the results

Page 55: Geostatistical Analyst for PM Concentration in Pune

MV = measured value; UK = universal kriging; OK = ordinary kriging; SK6 = simple kriging 6; SK = simple kriging.

ID              MV       UK       OK       SK6      SK       Value close to MV   Method close to MV
Karve Rd1       147      148.53   148.5    145.71   147.98   147.98              SK
Karve Rd2       158.82   147.62   147.53   145.77   148.17   148.17              SK
Nal Stop        130.75   140.07   139.80   138.87   140.03   138.87              SK 6
Navipeth        134.88   135.54   135.52   136.61   135.37   135.37              SK
Swargate        133.67   134.96   135.03   135.93   135.31   134.96              UK
Mandai          126.25   132.51   132.43   132.20   132.10   132.10              SK
Oasis           147.63   144.66   144.95   144.80   145.08   145.08              SK
Jog Building    140.08   141.94   142.02   141.39   141.39   141.39              SK, SK 6
Koregaon park   150      149.37   149.51   149.35   148.53   149.51              OK
Bosari2         159      159.09   159.12   157.30   156.44   159.09              UK
Bosari          170      164.48   164.52   161.52   162.05   164.52              OK

Comparing the Measured value with the predicted value
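The "method close to MV" column can be reproduced from the tabulated values with a short script, sketched below. The site names and numbers are copied from the table; the function and variable names are assumptions, and differences are rounded to two decimals so that equal predictions count as ties.

```python
# Columns: site, measured value (MV), then predictions by UK, OK, SK6, SK.
rows = [
    ("Karve Rd1", 147.00, 148.53, 148.50, 145.71, 147.98),
    ("Karve Rd2", 158.82, 147.62, 147.53, 145.77, 148.17),
    ("Nal Stop", 130.75, 140.07, 139.80, 138.87, 140.03),
    ("Navipeth", 134.88, 135.54, 135.52, 136.61, 135.37),
    ("Swargate", 133.67, 134.96, 135.03, 135.93, 135.31),
    ("Mandai", 126.25, 132.51, 132.43, 132.20, 132.10),
    ("Oasis", 147.63, 144.66, 144.95, 144.80, 145.08),
    ("Jog Building", 140.08, 141.94, 142.02, 141.39, 141.39),
    ("Koregaon park", 150.00, 149.37, 149.51, 149.35, 148.53),
    ("Bosari2", 159.00, 159.09, 159.12, 157.30, 156.44),
    ("Bosari", 170.00, 164.48, 164.52, 161.52, 162.05),
]
METHODS = ("UK", "OK", "SK6", "SK")

def closest_methods(measured, predictions):
    """Return the method(s) whose prediction is nearest the measured value."""
    errors = {m: round(abs(p - measured), 2) for m, p in zip(METHODS, predictions)}
    best = min(errors.values())
    return sorted(m for m, e in errors.items() if e == best)

closest = {site: closest_methods(mv, preds) for site, mv, *preds in rows}
wins = {m: sum(m in best for best in closest.values()) for m in METHODS}
for site, best in closest.items():
    print(f"{site}: {', '.join(best)}")
print(wins)
```

Counting the tie at Jog Building for both methods, simple kriging is closest at six of the eleven sites and each of the other three methods at two.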

Page 56: Geostatistical Analyst for PM Concentration in Pune

Kriging method     Mean     Root mean square   Average standard error   Mean standardized   Root mean square standardized
Simple Kriging     -1.824   11.28              11.75                    -0.1361             1.006
Simple Kriging 6   -2.086   11.26              11.37                    -0.1649             1.029

The root-mean-squared prediction error was small for the UK and OK models. Therefore, one might conclude that those methods are the "optimal" models. However, in the simple kriging model, the root-mean-squared prediction error is closer to the average estimated prediction standard error. This is a more valid model. Therefore, from cross-validation, we can be confident that the prediction standard errors in SK are appropriate. Both models also satisfy the other criteria:
• Standardized mean prediction errors near 0
• Standardized root-mean-squared prediction errors near 1
• Spread of the points as close as possible around the dashed gray line

Valid Model

Page 57: Geostatistical Analyst for PM Concentration in Pune

Air quality dispersion models have an important place in air quality management. They are essential tools in the development of action plans for improving air quality. There will always be a need for both measurements and models. Models improve the effectiveness of air quality management. Based upon model estimates, it may also be possible to design measurement networks (i.e. (re)locate stations) in a given area. Knowledge about the spatial distribution of the pollutant concentrations in the area is therefore required, and geostatistical (kriging) models are the most appropriate tools to obtain this information.

Page 58: Geostatistical Analyst for PM Concentration in Pune

Today's towns are Tomorrow’s cities: Today's cities are the Future of Mankind.

RS GIS

GPS