Interpolation {x,y} Data with Suavity
Peter K. Ott
Forest Analysis & Inventory Branch
BC Ministry of FLNRO
Victoria, BC
30/11/2015
The Goal
• Given a set of points
$(x_i, y_i),\ i = 1, 2, \ldots, n$,
find a function that passes through the points, allowing prediction of $y_0$ at a new $x_0$
• Regression or smoothing is a related but different problem
Outline
• Example Data
• Linear Interpolation
• Thin Plate Splines (TPS)
• Ordinary Kriging (OK)
• Two implementations
• Conclusion
data fake;
input x y;
cards;
0 41
1 26
2 19
3 18
4 18.5
5 17.5
6 18
7 18.5
8 19
;
run;
ods html;
ods graphics on;
proc sgplot data=fake noautolegend;
scatter y=y x=x /
markerattrs=(symbol=circle size=4pt color=blue);
title 'Y versus X';
run;
ods graphics off;
ods html close;
Our Data
Linear Interpolation (~200 yrs BC)
• Form a straight line between pairs of known points
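• What is $y_0 \mid x_0$, where $x_0$ lies between $x_{-1}$ and $x_{+1}$?
• Slope must be constant, so
$$\frac{y_0 - y_{-1}}{x_0 - x_{-1}} = \frac{y_{+1} - y_0}{x_{+1} - x_0}$$
• Solve for $y_0$:
$$y_0 = \frac{y_{-1}\,(x_{+1} - x_0) + y_{+1}\,(x_0 - x_{-1})}{x_{+1} - x_{-1}}$$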
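As a quick check of the formula, here is a worked example using the first two points of the example data, (0, 41) and (1, 26). The target $x_0 = 0.5$ is chosen only for illustration and does not come from the slides.

```latex
y_0 = \frac{41\,(1 - 0.5) + 26\,(0.5 - 0)}{1 - 0}
    = \frac{20.5 + 13}{1}
    = 33.5
```

The result is the midpoint of the two known y values, as expected for a target halfway between the two known x values.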
data interp0; *denser range of x to be interpolated;
do x=0 to 8 by 0.1;
output;
end;
run;
proc sql;
create table lin_pred(drop=x0) as
select *
from interp0 left join fake(rename=(x=x0))
on put(interp0.x, 6.3) = put(fake.x0, 6.3)
;
quit;
proc print data=lin_pred(obs=34) noobs;
run;
  x     y
0.0    41
0.1     .
0.2     .
0.3     .
0.4     .
0.5     .
0.6     .
0.7     .
0.8     .
0.9     .
1.0    26
1.1     .
1.2     .
1.3     .
1.4     .
1.5     .
1.6     .
1.7     .
1.8     .
1.9     .
2.0    19
2.1     .
2.2     .
2.3     .
2.4     .
2.5     .
2.6     .
2.7     .
2.8     .
2.9     .
3.0    18
3.1     .
3.2     .
3.3     .
proc expand data=lin_pred(keep=x y) out=lin_interp;
convert y=linear / method=join;
id x; *data must be sorted by x;
run;
ods html;
ods graphics on;
proc sgplot data=lin_interp noautolegend;
series y=linear x=x / lineattrs=(pattern=2 thickness=1pt color=red) lineattrs=GraphPrediction;
scatter y=y x=x / markerattrs=(symbol=circle size=4pt color=blue);
title 'Linear interpolated values';
run;
ods graphics off;
ods html close;
Linear Interpolation
Thin Plate Splines (1970s)
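• Want a function $f$ that minimizes:
$$L = \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 + \lambda \int \bigl(f''(x)\bigr)^2 \, dx$$
• or more generally
$$L = \sum_{i=1}^{n} \bigl(y_i - f(\mathbf{x}_i)\bigr)^2 + \lambda \, J(f)$$
• where, for $d = 2$
$$J(f) = \iint \left[ \left(\frac{\partial^2 f}{\partial x_1^2}\right)^2 + 2\left(\frac{\partial^2 f}{\partial x_1 \partial x_2}\right)^2 + \left(\frac{\partial^2 f}{\partial x_2^2}\right)^2 \right] dx_1 \, dx_2$$
• and where $\lambda \ge 0$ is an unknown smoothing parameter that controls the wiggliness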
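• The solution to this problem is a function built from radial basis functions; it can pass through the data without any knots having to be chosen
• One dimension example:
$$\hat{y}_0 \mid x_0 = \alpha_0 + \alpha_1 x_0 + \frac{1}{12} \sum_{i=1}^{n} \beta_i \cdot |x_0 - x_i|^3$$
• Two dimension example:
$$\hat{y}_0 \mid (x_{01}, x_{02}) = \alpha_0 + \alpha_1 x_{01} + \alpha_2 x_{02} + \frac{1}{8\pi} \sum_{i=1}^{n} \beta_i \cdot z_i^2 \log z_i$$
• where
$$z_i = \sqrt{(x_{01} - x_{1i})^2 + (x_{02} - x_{2i})^2}$$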
proc tpspline data=fake;
model y = (x); */ lambda0=1e-15;
*setting lambda0 to zero is necessary for interpolation (the option is commented out on the model statement above);
score data=interp0 out=tps_pred pred; *this will yield the interpolated points and more;
output out=tps_coef pred coef;
run;
proc print data=tps_coef noobs;
run;
*the coefficients output are alpha[0], alpha[1], beta[1], ..., beta[n],
 with the betas aligned with the sorted (unique) x[i];
*Note also that sum(beta[i])=0 and sum(beta[i]*x[i])=0;
x     y      P_y       Coef_y
0    41.0   41.0000    27.8513
1    26.0   26.0000    -8.1071
2    19.0   19.0000    10.5088
3    18.0   18.0000   -15.0530
4    18.5   18.5000     0.2121
5    17.5   17.5000    -0.7953
6    18.0   18.0000    11.9690
7    18.5   18.5000   -11.0810
8    19.0   19.0000     5.3549
.       .         .    -1.3387
.       .         .     0.2231
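The printed coefficients can be checked by hand against the one-dimensional thin plate spline formula. The sketch below reads the Coef_y column as $\alpha_0 = 27.8513$, $\alpha_1 = -8.1071$, followed by $\beta_1, \ldots, \beta_9$ aligned with $x_i = 0, 1, \ldots, 8$, and evaluates the fit at $x_0 = 0$ (a point chosen for illustration, where $|x_0 - x_i|^3 = x_i^3$). Because the coefficients are rounded to four decimals, the results are only approximate.

```latex
\begin{aligned}
\sum_{i=1}^{9} \beta_i
  &= 10.5088 - 15.0530 + 0.2121 - 0.7953 + 11.9690 - 11.0810 + 5.3549 - 1.3387 + 0.2231
   \approx -0.0001 \approx 0 \\
\sum_{i=1}^{9} \beta_i \, |0 - x_i|^3
  &= 10.5088(0) - 15.0530(1) + 0.2121(8) - 0.7953(27) + 11.9690(64) \\
  &\quad - 11.0810(125) + 5.3549(216) - 1.3387(343) + 0.2231(512)
   \approx 157.77 \\
\hat{y}_0 \mid (x_0 = 0)
  &\approx 27.8513 + (-8.1071)(0) + \tfrac{1}{12}(157.77)
   \approx 27.8513 + 13.148 \approx 41.0
\end{aligned}
```

The fitted value reproduces the observed $y = 41$ at $x = 0$ up to rounding, and the sum of the betas is zero up to rounding, consistent with the constraint noted in the code comments.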
proc sql;
create table tps_pred2(drop=x0) as
select *
from tps_pred left join fake(rename=(x=x0))
on put(tps_pred.x, 6.3) = put(fake.x0, 6.3)
;
quit;
ods html;
ods graphics on;
proc sgplot data=tps_pred2 noautolegend;
series y=p_y x=x /
lineattrs=(pattern=2 thickness=1pt color=red) lineattrs=GraphPrediction;
scatter y=y x=x /
markerattrs=(symbol=circle size=4pt color=blue);
title 'Interpolated values';
run;
ods graphics off;
ods html close;
Thin Plate Spline
Ordinary Kriging (1960s)
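• Consider $y_i \mid x_i$ as a multivariate Gaussian process:
$$\{y_i \mid x_i\} = \mathbf{y} \sim N_n(\mu\mathbf{1}, \mathbf{R})$$
• Find the estimator $\hat{y}_0 \mid x_0 = \sum_{i=1}^{n} w_i y_i = \mathbf{w}'\mathbf{y}$
such that:
$E[\hat{y}_0 \mid x_0] = \mu$ (unbiased), and
the prediction error variance, $\mathrm{Var}(\hat{y}_0 - y_0)$, is minimized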
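• It turns out:
$$\mathbf{w} = \mathbf{R}^{-1}\mathbf{c}_0 - (\mathbf{1}'\mathbf{R}^{-1}\mathbf{1})^{-1}\,\mathbf{R}^{-1}\mathbf{1}\mathbf{1}'\mathbf{R}^{-1}\mathbf{c}_0 + (\mathbf{1}'\mathbf{R}^{-1}\mathbf{1})^{-1}\,\mathbf{R}^{-1}\mathbf{1} \quad \text{(ugly)}$$
$$\hat{y}_0 \mid x_0 = \hat{\mu} + \mathbf{c}_0'\,\mathbf{R}^{-1}(\mathbf{y} - \hat{\mu}\mathbf{1}) \quad \text{(better)}$$
• where
$$\hat{\mu} = (\mathbf{1}'\mathbf{R}^{-1}\mathbf{1})^{-1}\,\mathbf{1}'\mathbf{R}^{-1}\mathbf{y} \quad \text{and} \quad \mathbf{c}_0 = \begin{bmatrix} \mathrm{Cov}(y_1, y_0) \\ \mathrm{Cov}(y_2, y_0) \\ \vdots \\ \mathrm{Cov}(y_n, y_0) \end{bmatrix}$$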
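The step from the "ugly" weights to the "better" predictor can be sketched as follows. Here $\mathbf{c}_0$ denotes the covariance vector between the data and the target, $\hat{\mu}$ denotes the estimated mean $(\mathbf{1}'\mathbf{R}^{-1}\mathbf{1})^{-1}\mathbf{1}'\mathbf{R}^{-1}\mathbf{y}$, and $\mathbf{R}$ is taken to be symmetric, as any covariance matrix is.

```latex
\begin{aligned}
\hat{y}_0 \mid x_0 = \mathbf{w}'\mathbf{y}
 &= \mathbf{c}_0'\mathbf{R}^{-1}\mathbf{y}
  - \mathbf{c}_0'\mathbf{R}^{-1}\mathbf{1}\,(\mathbf{1}'\mathbf{R}^{-1}\mathbf{1})^{-1}\mathbf{1}'\mathbf{R}^{-1}\mathbf{y}
  + (\mathbf{1}'\mathbf{R}^{-1}\mathbf{1})^{-1}\mathbf{1}'\mathbf{R}^{-1}\mathbf{y} \\
 &= \mathbf{c}_0'\mathbf{R}^{-1}\mathbf{y}
  - \mathbf{c}_0'\mathbf{R}^{-1}\mathbf{1}\,\hat{\mu}
  + \hat{\mu} \\
 &= \hat{\mu} + \mathbf{c}_0'\mathbf{R}^{-1}(\mathbf{y} - \hat{\mu}\mathbf{1}) \\[6pt]
\mathbf{1}'\mathbf{w}
 &= \mathbf{1}'\mathbf{R}^{-1}\mathbf{c}_0
  - (\mathbf{1}'\mathbf{R}^{-1}\mathbf{1})^{-1}(\mathbf{1}'\mathbf{R}^{-1}\mathbf{1})\,\mathbf{1}'\mathbf{R}^{-1}\mathbf{c}_0
  + (\mathbf{1}'\mathbf{R}^{-1}\mathbf{1})^{-1}(\mathbf{1}'\mathbf{R}^{-1}\mathbf{1})
  = 1
\end{aligned}
```

The second line shows that the weights sum to one, which is what makes the estimator unbiased for a constant mean $\mu$.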
• How do we determine 𝐜𝟎? We’ll need to model the covariance structure as a function of distance, say ℎ
• Tradition is to use semivariances (semivariogram) instead of covariances (covariogram) or correlations (correlogram):
$$\gamma_{ij} = \sigma^2 - \sigma_{ij} = \sigma^2 (1 - \rho_{ij})$$
$$\hat{\gamma}(h) = \frac{1}{2 \cdot n(h)} \sum_{i=1}^{n(h)} \bigl[ y(x_i + h) - y(x_i) \bigr]^2$$
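As a worked check of the empirical semivariance formula, the example data (y = 41, 26, 19, 18, 18.5, 17.5, 18, 18.5, 19 at x = 0, ..., 8) give eight pairs at distance $h = 1$ and a single pair at distance $h = 8$:

```latex
\begin{aligned}
\hat{\gamma}(1)
 &= \frac{1}{2 \cdot 8}\Bigl[(26-41)^2 + (19-26)^2 + (18-19)^2 + (18.5-18)^2 \\
 &\qquad\quad + (17.5-18.5)^2 + (18-17.5)^2 + (18.5-18)^2 + (19-18.5)^2\Bigr] \\
 &= \frac{225 + 49 + 1 + 0.25 + 1 + 0.25 + 0.25 + 0.25}{16}
  = \frac{277}{16} = 17.3125 \\[6pt]
\hat{\gamma}(8)
 &= \frac{(19-41)^2}{2 \cdot 1} = \frac{484}{2} = 242
\end{aligned}
```

These agree with the semivariances that the VARIOGRAM procedure reports for lag classes 1 and 8 (17.313 and 242.000).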
Semivariogram
Features of the (Semi)variogram
• Nugget: discontinuity at the origin. Can’t have this for interpolation with kriging!
• Range: distance it takes for the variogram to level off (reach its asymptote)
• Sill: value of the variogram at the asymptote ($= \sigma^2 = \mathrm{Var}(y_0)$). When a nugget is present, sill = partial sill + nugget
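A sketch of why a zero nugget gives exact interpolation, using the kriging predictor $\hat{y}_0 = \hat{\mu} + \mathbf{c}_0'\mathbf{R}^{-1}(\mathbf{y} - \hat{\mu}\mathbf{1})$ from the Ordinary Kriging slides. The assumptions here are that the target $x_0$ coincides with a data location $x_j$ and that there is no nugget, so that $\mathbf{c}_0$ is exactly column $j$ of $\mathbf{R}$. The unit vector $\mathbf{e}_j$ is notation introduced for this sketch only.

```latex
\begin{aligned}
\mathbf{c}_0 = \mathbf{R}\,\mathbf{e}_j
 \;&\Rightarrow\; \mathbf{c}_0'\mathbf{R}^{-1} = \mathbf{e}_j' \\
 \;&\Rightarrow\; \hat{y}_0 \mid x_j
   = \hat{\mu} + \mathbf{e}_j'(\mathbf{y} - \hat{\mu}\mathbf{1})
   = \hat{\mu} + (y_j - \hat{\mu})
   = y_j
\end{aligned}
```

If a nugget is instead modelled as measurement error added to the diagonal of $\mathbf{R}$, then $\mathbf{c}_0$ is no longer a column of $\mathbf{R}$ and the predictor generally does not reproduce $y_j$.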
Ordinary Kriging
Implementation - two options:
1. Use both proc variogram & proc krige2d
• need to create a second variable (x2) with constant values
2. Use a mixed model procedure (proc mixed)
• does not provide empirical and fitted variograms automatically
data fake2;
set fake;
x2=1; *constant value;
run;
ods html;
ods graphics on;
proc variogram data=fake2 outvar=look;
store out=semivar_store;
directions 90(0); *not really needed;
compute lagdist=1 maxlag=10;
*lagdist should be ~ 2*min norm and maxlag should be ~ max norm among xs;
coordinates xc=x yc=x2;
var y;
model nugget=0 form=auto(mlist=(gau,pow,she) nest=2) choose=(AIC SSE STATUS);
*important that nugget is zero for interpolation;
run;
proc krige2d data=fake2 outest=kr_pred(rename=(gxc=x estimate=y_est));
restore in=semivar_store;
coordinates xc=x yc=x2;
predict var=y;
model storeselect;
grid x=0 to 8 by 0.01 y=1 to 1 by 1;
run;
The VARIOGRAM Procedure
Dependent Variable: y
Empirical Semivariogram at Angle=90

Lag Class   Pair Count   Average Distance   Semivariance
    0           0              .                  .
    1           8              1                17.313
    2           7              2                39.339
    3           6              3                49.146
    4           5              4                58.000
    5           4              5                77.188
    6           3              6                97.542
    7           2              7               138.813
    8           1              8               242.000
    9           0              .                  .
   10           0              .                  .
proc sql;
create table kr_pred2(drop=x0) as
select *, (y_est+1.96*stderr) as cl_upp, (y_est-1.96*stderr) as cl_low
from kr_pred(keep=x y_est stderr) left join fake2(keep=x y rename=(x=x0))
on put(kr_pred.x, 6.3) = put(fake2.x0, 6.3)
;
quit;
proc sgplot data=kr_pred2 noautolegend;
series y=y_est x=x /
lineattrs=(pattern=2 thickness=1pt color=red) lineattrs=graphprediction;
scatter y=y x=x /
markerattrs=(symbol=circle size=4pt color=blue);
series y=cl_upp x=x /
lineattrs=(pattern=2 thickness=1pt color=green) lineattrs=graphprediction;
series y=cl_low x=x /
lineattrs=(pattern=2 thickness=1pt color=green) lineattrs=graphprediction;
title 'Interpolated values';
run;
ods graphics off;
ods html close;
Getting set up for proc mixed
data fake_fmixed;
set fake end=last;
output;
if last then do x=0 to 8 by 0.01;
y=.;
output;
end;
run;
proc print data=fake_fmixed(obs=34) noobs;
run;
   x      y
0.00   41.0
1.00   26.0
2.00   19.0
3.00   18.0
4.00   18.5
5.00   17.5
6.00   18.0
7.00   18.5
8.00   19.0
0.00      .
0.01      .
0.02      .
0.03      .
0.04      .
0.05      .
0.06      .
0.07      .
0.08      .
0.09      .
0.10      .
0.11      .
0.12      .
0.13      .
0.14      .
0.15      .
0.16      .
0.17      .
0.18      .
0.19      .
0.20      .
0.21      .
0.22      .
0.23      .
0.24      .
proc mixed data=fake_fmixed;
model y = / outp=ok_preds; *outputting predictions;
repeated / subject=intercept type=sp(matern)(x);
title 'Ordinary Kriging in Proc Mixed';
run;
proc print data=ok_preds(obs=34) noobs;
run;
ods html;
ods graphics on;
proc sgplot data=ok_preds(where=(resid=.)) noautolegend;
series y=pred x=x /
lineattrs=(pattern=2 thickness=1pt color=red) lineattrs=GraphPrediction;
series y=lower x=x /
lineattrs=(pattern=2 thickness=1pt color=green) lineattrs=graphprediction;
series y=upper x=x /
lineattrs=(pattern=2 thickness=1pt color=green) lineattrs=graphprediction;
title 'Kriged values via Mixed Model';
run;
ods graphics off;
ods html close;
   x      y      Pred   StdErrPred   DF   Alpha      Lower     Upper      Resid
0.00   41.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -22.5257
1.00   26.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -37.5257
2.00   19.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -44.5257
3.00   18.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -45.5257
4.00   18.5   63.5257      59.7589    8    0.05   -74.2785   201.330   -45.0257
5.00   17.5   63.5257      59.7589    8    0.05   -74.2785   201.330   -46.0257
6.00   18.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -45.5257
7.00   18.5   63.5257      59.7589    8    0.05   -74.2785   201.330   -45.0257
8.00   19.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -44.5257
0.00      .   41.0000            .    8    0.05          .         .          .
0.01      .   40.8198       0.0104    8    0.05    40.7957    40.844          .
0.02      .   40.6401       0.0206    8    0.05    40.5925    40.688          .
0.03      .   40.4607       0.0305    8    0.05    40.3904    40.531          .
0.04      .   40.2818       0.0401    8    0.05    40.1894    40.374          .
0.05      .   40.1034       0.0494    8    0.05    39.9895    40.217          .
0.06      .   39.9254       0.0584    8    0.05    39.7906    40.060          .
0.07      .   39.7478       0.0672    8    0.05    39.5929    39.903          .
0.08      .   39.5707       0.0757    8    0.05    39.3962    39.745          .
0.09      .   39.3941       0.0839    8    0.05    39.2006    39.588          .
0.10      .   39.2179       0.0918    8    0.05    39.0062    39.430          .
0.11      .   39.0422       0.0995    8    0.05    38.8128    39.272          .
0.12      .   38.8670       0.1069    8    0.05    38.6206    39.113          .
0.13      .   38.6923       0.1140    8    0.05    38.4295    38.955          .
0.14      .   38.5181       0.1208    8    0.05    38.2394    38.797          .
0.15      .   38.3443       0.1274    8    0.05    38.0505    38.638          .
0.16      .   38.1711       0.1337    8    0.05    37.8628    38.479          .
0.17      .   37.9984       0.1398    8    0.05    37.6761    38.321          .
0.18      .   37.8262       0.1456    8    0.05    37.4906    38.162          .
0.19      .   37.6546       0.1511    8    0.05    37.3061    38.003          .
0.20      .   37.4834       0.1564    8    0.05    37.1229    37.844          .
0.21      .   37.3128       0.1614    8    0.05    36.9407    37.685          .
0.22      .   37.1428       0.1661    8    0.05    36.7597    37.526          .
0.23      .   36.9733       0.1706    8    0.05    36.5798    37.367          .
0.24      .   36.8043       0.1749    8    0.05    36.4010    37.208          .
Comparison of all 4 approaches
Conclusions
• TPSs and OK are both capable of interpolation and smoothing
• TPSs require no distributional assumptions but predictions can be overly “wiggly” when $\lambda = 0$
• OK takes a bit more effort/practice but is powerful when a suitable model is available for the empirical variogram
• Consider TPS and OK over linear interpolation!
Thanks!