15/09/2009 1
Non-linearity and spatial correlation in landslide susceptibility mapping
C. Ballabio, J. Blahut, S. Sterlacchini
University of Milano-Bicocca
GIT 2009
UNIVERSITÀ DEGLI STUDI DI MILANO - BICOCCA
Summary
Landslide susceptibility modeling
Non-linearity issues
A few examples
Application to a case study
Modeling the residual spatial correlation
Introduction
Landslide susceptibility modeling
Usually framed as a classification problem: if y=1 marks an observed occurrence, y=0 a point with no occurrence, and x is a vector of predictor variables, then we want to estimate:
P(y=1|x)=f(x, θ)
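One common choice for f is the logistic function used by logistic regression (LR). The following is a minimal sketch, assuming a linear predictor; the parameter values in theta and the two covariates are hypothetical and are not taken from the presentation.

```python
import math

def landslide_probability(x, theta):
    """Return P(y=1 | x) = f(x, theta) for a logistic model.

    theta[0] is the intercept; theta[1:] are the weights of the covariates in x.
    """
    z = theta[0] + sum(w * xi for w, xi in zip(theta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters and covariates (e.g. scaled slope and curvature).
theta = [-1.0, 2.0, 0.5]
p = landslide_probability([0.8, -0.2], theta)
print(round(p, 3))
```

The output is always strictly between 0 and 1, so it can be mapped directly as a susceptibility value.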
[Figure: scatter plot of two classes in the (X1, X2) plane]
Linearly separable classes
Just find a separating line/plane/hyperplane
Exactly what LDA, QDA and LR do
Even for linearly separable classes, the best separating function may not be linear
[Figure: scatter plot of two classes in the (X1, X2) plane, both axes from -1.0 to 1.0]
What if the separation cannot be performed by linear functions?
How can we separate the two classes by using only X1 and X2?
LDA does not work…
Neither does QDA…
Even far more flexible models fail to separate the classes…
ANNs come close to doing the job, but require a lot of tuning…
Support Vector Machines
Based on Statistical Learning Theory (Vapnik, 1995)
Very good performance in classification tasks
Intrinsic “Occam’s razor” logic: the simplest model is preferred
Easy to avoid overfitting
Not so “black-box”
Support Vector Machines
Widely used in machine learning
Bioinformatics / genetic classification
Spatial mapping
Robotics
Digital soil mapping
Support vector classification
Use the best hyperplane
Use the “kernel trick”
[Figure: scatter plot of two classes in the (X1, X2) plane]
Which is the best hyperplane?
We need a way to define what “optimal separation” is…
[Figure: scatter plot of two classes in the (X1, X2) plane]
Find the widest gap between classes
Fit a plane in the middle of the gap
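The "widest gap" idea can be made concrete: for a hyperplane w·x + b = 0, the geometric margin is the smallest distance from any training point to the plane, and the SVM prefers the hyperplane with the largest margin. The sketch below only compares two hand-picked candidate lines on made-up points; the data and the candidates are assumptions for illustration, and no optimizer is run.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| over all points.

    A negative value means at least one point lies on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Made-up, linearly separable toy data: class -1 near the origin, class +1 further out.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]
labels = [-1, -1, -1, 1, 1, 1]

# Two candidate separating lines; both classify every point correctly.
candidates = {
    "x1 + x2 = 3.5": ((1.0, 1.0), -3.5),
    "x1 = 2": ((1.0, 0.0), -2.0),
}
margins = {name: geometric_margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, best)
```

Both lines separate the classes, but the diagonal one leaves a wider gap, so it is the one a maximum-margin criterion would pick among these two.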
Kernel Function
The kernel linearizes the data in a high-dimensional space
This makes it possible to find a flat separating hyperplane
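The linearization can be seen with an explicit feature map. This is a minimal sketch, assuming made-up points on two concentric circles and a hand-chosen extra feature z = x1² + x2²; a kernel achieves the same effect implicitly, without ever building the lifted coordinates.

```python
import math

def ring(radius, n):
    """n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def feature_map(x1, x2):
    """Lift a 2-D point into 3-D by appending the squared radius."""
    return (x1, x2, x1 * x1 + x2 * x2)

inner = ring(0.3, 12)   # class 0, assumed toy data
outer = ring(0.9, 12)   # class 1, assumed toy data

# In the lifted space the flat plane z = 0.4 separates the classes,
# although no straight line does so in the original (x1, x2) plane.
threshold = 0.4
inner_ok = all(feature_map(*p)[2] < threshold for p in inner)
outer_ok = all(feature_map(*p)[2] > threshold for p in outer)
print(inner_ok, outer_ok)
```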
Kernel Function
Based on the dot product:
$\langle x, x' \rangle = \sum_{i=1}^{l} [x]_i [x']_i$
Simple to compute
But very powerful: it can project data into high-dimensional spaces, the Reproducing Kernel Hilbert Spaces (RKHS)
But… it is not known beforehand which kernel is appropriate…
Kernels
Polynomial: $K(x, x_i) = \langle x, x_i \rangle^{d}$
Linear: $K(x, x_i) = \langle x, x_i \rangle$
Radial basis function: $K(x, x_i) = \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)$
Exponential RBF: $K(x, x_i) = \exp\left(-\frac{\|x - x_i\|}{2\sigma^2}\right)$
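The four kernels named on this slide are short functions of the dot product or of the distance between two points. A minimal sketch in plain Python follows; the function names, the default values of d and sigma, and the two example vectors are assumptions made for illustration.

```python
import math

def dot(x, z):
    """Dot product <x, z> of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))

def squared_distance(x, z):
    return sum((a - b) ** 2 for a, b in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, d=2):
    return dot(x, z) ** d

def rbf_kernel(x, z, sigma=1.0):
    # exp(-||x - z||^2 / (2 sigma^2))
    return math.exp(-squared_distance(x, z) / (2.0 * sigma ** 2))

def exponential_rbf_kernel(x, z, sigma=1.0):
    # exp(-||x - z|| / (2 sigma^2))
    return math.exp(-math.sqrt(squared_distance(x, z)) / (2.0 * sigma ** 2))

x, z = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(x, z), polynomial_kernel(x, z),
      rbf_kernel(x, z), exponential_rbf_kernel(x, z))
```

Both RBF variants equal 1 when the two points coincide and decay toward 0 as they move apart, with sigma controlling how fast.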
SVM with a single Gaussian kernel
Separates the classes almost perfectly
Reproduces the general trend of the data
The Staffora basin study area
Triggering patterns for flows
DEM-derived covariates + geology and land use
[Map: NB probability over the study area, from Low: 0 to High: 1; scale bar 0 to 5 km]
Naïve Bayes (≈ WoE) prediction
Not bad, but we get a lot of high-probability areas
LDA (a.k.a. Maximum Likelihood) prediction
Better than NB, but we still get a lot of high probabilities
[Map: LDA probability over the study area, from Low: 0 to High: 1; scale bar 0 to 5 km]
[Map: SVM probability over the study area, from Low: 0 to High: 1; scale bar 0 to 5 km]
SVM
Just better…
Use cross-validation and ROC curves to compare the models
Far fewer false positives
Predicted in the cross-validation sample
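A ROC curve is built by sweeping a threshold over the predicted probabilities of the held-out sample and recording the false and true positive rates at each step. The sketch below is a plain-Python illustration; the scores and labels are made up, and it assumes there are no tied scores.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs, one per threshold step.

    Assumes distinct scores; labels are 1 (landslide) or 0 (no landslide).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical held-out predictions from one cross-validation fold.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
curve = roc_points(scores, labels)
print(auc(curve))
```

A model with fewer false positives at the same detection rate pushes the curve toward the upper-left corner and raises the area under it.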
Success curves (Fabbri and Chung, 2003)
It’s a ROC with only the true positive rate…
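As read here, a success curve ranks the map cells from most to least susceptible and plots the cumulative share of landslide cells captured against the cumulative share of the study area. The following minimal sketch rests on that reading; the ten cell scores and landslide flags are hypothetical.

```python
def success_curve(scores, landslide):
    """(fraction of area, fraction of landslide cells captured) pairs.

    Cells are visited from the highest to the lowest susceptibility score.
    """
    ranked = sorted(zip(scores, landslide), key=lambda t: -t[0])
    total = sum(landslide)
    n = len(ranked)
    captured = 0
    curve = [(0.0, 0.0)]
    for k, (_, y) in enumerate(ranked, start=1):
        captured += y
        curve.append((k / n, captured / total))
    return curve

# Hypothetical susceptibility scores and landslide flags for ten map cells.
cell_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
cell_landslide = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
cell_curve = success_curve(cell_scores, cell_landslide)
print(cell_curve[2], cell_curve[4])
```

Unlike the ROC curve, the horizontal axis here is the share of the whole area, so cells without landslides are not counted separately as false positives.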
Why use Machine Learning?
Increasing availability of low-cost / high-information topographic surveys
e.g. LiDAR, hyperspectral data
A lot of raw and derived information
An ML system can automatically interpret the data without the need for refinements (automatic mapping systems).
What happens if we use only DEM-derived data?
We still get a decent prediction from SVM, but not from LR/LDA/NB
Residual spatial correlation
[Figure: semivariogram, semivariance against distance (200 to 800)]
Once we predict with SVM, can we derive useful information from the data?
There is still a lot of autocorrelation at low distances
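Autocorrelation at short distances is usually read from an empirical semivariogram: for each distance bin, half the mean squared difference between all pairs of values whose separation falls in that bin. A minimal sketch follows, assuming hypothetical residuals at four locations along a transect; the bin width and coordinates are illustrative only.

```python
import math

def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Semivariance per distance bin; None where a bin holds no pairs."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.dist(coords[i], coords[j])
            k = int(h // bin_width)
            if k < n_bins:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Hypothetical residuals at four points spaced 100 m apart.
coords = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0), (300.0, 0.0)]
residuals = [0.1, 0.2, 0.4, 0.1]
gamma = empirical_semivariogram(coords, residuals, bin_width=150.0, n_bins=3)
print(gamma)
```

Low semivariance in the first bins that rises with distance indicates that nearby residuals are still similar, which is the signal left for a spatial model to capture.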
[Figure: direct and cross semivariograms (semivariance against distance, 500 to 2500) for occurrence, svm.pred and detrended; annotations: Original trend, Residual correlation, Model correlation]
We can implement a Kriging system to model the residual information
Or, we can use MK-SVM to model spatial variation
A one-dimensional example
Combination of cosine functions with different wavelengths λ
Plus some additive Gaussian noise
Sample 30% of the data points
[Figure: three panels plotting y, y2 and y3 against x, for x from 0 to 30]
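A test signal of this kind can be generated in a few lines. This is a minimal sketch; the wavelengths (20 and 3), the amplitudes, the noise level and the random seed are all assumptions, since the slide does not state them.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def cosine(x, wavelength, amplitude):
    return amplitude * math.cos(2 * math.pi * x / wavelength)

xs = [i / 10 for i in range(301)]                   # x from 0 to 30
slow = [cosine(x, 20.0, 1.0) for x in xs]           # assumed long wavelength
fast = [cosine(x, 3.0, 0.4) for x in xs]            # assumed short wavelength
noisy = [a + b + random.gauss(0.0, 0.1) for a, b in zip(slow, fast)]

# Keep a random 30% of the points as the training sample.
sample_idx = sorted(random.sample(range(len(xs)), k=round(0.3 * len(xs))))
sample = [(xs[i], noisy[i]) for i in sample_idx]
print(len(sample))
```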
Multi-kernel analysis
Two Gaussian RBF kernels
MK-SVR is able to separate the two signals, even in the presence of noise.
[Figure: four panels plotting y2, pred 3, y and pred 4 against x, for x from 0 to 30]
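The reason a model with two kernels can return two signals is that its prediction is a sum with one term per kernel. The sketch below illustrates that decomposition with kernel ridge regression, which is swapped in for MK-SVR to keep the code short; the toy signal, kernel widths and ridge value are assumptions. How closely each component matches an underlying signal depends on the kernel widths and weights, which this sketch does not tune.

```python
import math

def rbf(x, z, sigma):
    """Gaussian RBF kernel for scalar inputs."""
    return math.exp(-((x - z) ** 2) / (2.0 * sigma ** 2))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

# Assumed toy signal: a slow plus a fast cosine, without noise.
xs = [i * 0.5 for i in range(61)]
ys = [math.cos(2 * math.pi * x / 20.0) + 0.4 * math.cos(2 * math.pi * x / 3.0)
      for x in xs]

SIGMA_LONG, SIGMA_SHORT, RIDGE = 5.0, 0.5, 1e-6     # assumed hyperparameters

# Combined kernel = long-range kernel + short-range kernel (+ small ridge term).
gram = [[rbf(a, b, SIGMA_LONG) + rbf(a, b, SIGMA_SHORT) + (RIDGE if i == j else 0.0)
         for j, b in enumerate(xs)] for i, a in enumerate(xs)]
alpha = solve(gram, ys)

def component(x, sigma):
    """Part of the prediction carried by the kernel of width sigma."""
    return sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, xs))

long_part = [component(x, SIGMA_LONG) for x in xs]
short_part = [component(x, SIGMA_SHORT) for x in xs]
max_error = max(abs(l + s - y) for l, s, y in zip(long_part, short_part, ys))
print(max_error)
```

The two components always add up to the full prediction, so each can be mapped or plotted on its own as the contribution of one spatial scale.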
Spatial SVM performance
Within-slope predicted probability
Average probability close to maximum probability
Spatial SVM performance
Cross-validation performance
Average probability still close to maximum probability
Spatial SVM rules extraction
Conclusions
SVMs clearly outperform most of the statistical techniques commonly applied to landslide susceptibility prediction
It IS a “black box” technique, but not so much… several algorithms for feature selection and ranking are available
Very good for automatic and real-time mapping
The model can easily be updated if new data are provided
Good for automatic mapping systems