
15/09/2009 1

Non-linearity and spatial correlation in landslide susceptibility mapping

C. Ballabio, J. Blahut, S. Sterlacchini

University of Milano-Bicocca

GIT 2009

UNIVERSITÀ DEGLI STUDI DI MILANO - BICOCCA

Summary

Landslide susceptibility modeling

Non-linearity issues

A few examples

Application to a case study

Modeling the residual spatial correlation


Introduction

Landslide susceptibility modeling

Usually framed as a classification problem: if y=1 is an observed occurrence, y=0 is a point with no occurrence, and x is a vector of predictor variables, then we want to estimate:

P(y=1|x)=f(x, θ)

where θ denotes the parameters of the model f.
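
As a minimal sketch of what f(x, θ) can look like, the snippet below uses the logistic function, which is the form assumed by logistic regression (LR). The covariates (slope angle, curvature) and the coefficient values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math

def susceptibility(x, theta):
    """P(y=1 | x) under a logistic model; theta[0] is the intercept."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: intercept, slope angle (degrees), curvature
theta = [-4.0, 0.15, 0.8]

p_steep = susceptibility([35.0, 0.5], theta)  # steep, convex cell
p_flat = susceptibility([5.0, 0.0], theta)    # gentle, planar cell
print(round(p_steep, 3), round(p_flat, 3))
```

Other classifiers differ only in the shape of f and in how θ is estimated from the observed occurrences.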

Linearly separable classes

Just find a separating line/plane/hyperplane

[Figure: scatter plot of two linearly separable classes; axes X1 and X2, both from -2 to 6]


This is exactly what LDA, QDA and LR do

Even for linearly separable classes, the best separating function may not be linear


What if the separation cannot be performed by linear functions?

How can we separate the two classes by using only X1 and X2?

[Figure: scatter plot of two classes that are not linearly separable; axes X1 and X2, both from -1.0 to 1.0]


LDA does not work…

Neither does QDA…

Even far more flexible models fail to separate the classes…

ANNs come close to doing the job, but require a lot of tuning…


Support Vector Machines

Based on Statistical Learning Theory (Vapnik, 1995)

Very good performance in classification tasks

Intrinsic “Occam’s razor” logic: the simplest model is preferred

Easy to avoid overfitting

Not so “black-box”

Widely used in machine learning:

Bioinformatics / genetic classification

Spatial mapping

Robotics

Digital soil mapping


Support vector classification

Use the best hyperplane

Use the “kernel trick”


Which is the best hyperplane?

We need a way to define what “optimal separation” is…

[Figure: scatter plot of the two classes; axes X1 and X2, both from -2 to 6]

Find the widest gap between classes

Fit a plane in the middle of the gap

[Figure: scatter plot of the two classes with the maximum-margin separating line; axes X1 and X2, both from -2 to 6]
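
The "widest gap" criterion can be made concrete with the geometric margin: the distance from a hyperplane w·x + b = 0 to the closest point of either class. The sketch below only compares two hand-picked candidate lines on hypothetical toy data; a real SVM solver searches over all hyperplanes instead of a fixed list.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance of any point to the hyperplane w.x + b = 0.

    Labels are +1 / -1; a negative result means some point is misclassified.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical toy data: two well-separated groups in the (X1, X2) plane
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0), (5.0, 4.0), (4.0, 5.0)]
labels = [-1, -1, -1, 1, 1, 1]

# Two candidate separating lines; both classify every point correctly
candidates = {
    "diagonal": ((1.0, 1.0), -4.5),  # x1 + x2 = 4.5
    "vertical": ((1.0, 0.0), -2.5),  # x1 = 2.5
}
margins = {name: geometric_margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(best, margins)
```

Both lines separate the classes, but the one with the larger margin leaves more room for unseen points, which is the sense in which it is "optimal".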


Kernel Function

The kernel linearizes the data in a high-dimensional space

This makes it possible to find a flat separating hyperplane
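
A small sketch of this idea, under assumed toy data: a disc of points surrounded by a ring cannot be split by a line in (X1, X2), but after the explicit degree-2 polynomial feature map it can be split by a flat plane. The data, the radii and the plane coefficients are hypothetical; the last lines check that the kernel returns the same inner product without building the mapped vectors.

```python
import math
import random

def phi(x):
    """Explicit feature map of the homogeneous degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly2_kernel(x, z):
    """Same inner product as dot(phi(x), phi(z)), computed in input space."""
    return dot(x, z) ** 2

random.seed(0)

def ring_point(r_min, r_max):
    r = random.uniform(r_min, r_max)
    a = random.uniform(0.0, 2.0 * math.pi)
    return (r * math.cos(a), r * math.sin(a))

inner = [ring_point(0.0, 0.4) for _ in range(50)]  # class 0: central disc
outer = [ring_point(0.7, 1.0) for _ in range(50)]  # class 1: surrounding ring

# In feature space the flat plane z1 + z3 = 0.3 separates the classes,
# because z1 + z3 = x1^2 + x2^2 is the squared radius.
w, b = (1.0, 0.0, 1.0), -0.3
inner_ok = all(dot(w, phi(p)) + b < 0 for p in inner)
outer_ok = all(dot(w, phi(p)) + b > 0 for p in outer)

x, z = inner[0], outer[0]
kernel_matches = math.isclose(
    poly2_kernel(x, z), dot(phi(x), phi(z)), abs_tol=1e-12
)
print(inner_ok, outer_ok, kernel_matches)
```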

Kernel Function

Based on the dot product:

$\langle x, x' \rangle = \sum_{i=1}^{l} [x]_i [x']_i$

Simple to compute

But very powerful: it can project data into high-dimensional spaces, the Reproducing Kernel Hilbert Spaces (RKHS)

But… it is not known beforehand which kernel is appropriate…

Kernels

Linear: $K(x, x_i) = \langle x, x_i \rangle$

Polynomial: $K(x, x_i) = \langle x, x_i \rangle^d$

Radial basis function: $K(x, x_i) = \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)$

Exponential RBF: $K(x, x_i) = \exp\left(-\frac{\|x - x_i\|}{2\sigma^2}\right)$
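
The four kernels listed on this slide can be written directly from their definitions. This is a plain-Python sketch for illustration; the default values of d and σ and the two test points are arbitrary assumptions.

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def sq_dist(x, z):
    return sum((a - b) ** 2 for a, b in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, d=2):
    return dot(x, z) ** d

def rbf_kernel(x, z, sigma=1.0):
    # Gaussian RBF: decays with the squared distance
    return math.exp(-sq_dist(x, z) / (2.0 * sigma ** 2))

def exponential_rbf_kernel(x, z, sigma=1.0):
    # Exponential RBF: decays with the (unsquared) distance
    return math.exp(-math.sqrt(sq_dist(x, z)) / (2.0 * sigma ** 2))

x, z = (1.0, 2.0), (2.0, 0.0)
values = {
    "linear": linear_kernel(x, z),
    "polynomial": polynomial_kernel(x, z),
    "rbf": rbf_kernel(x, z),
    "exponential_rbf": exponential_rbf_kernel(x, z),
}
print(values)
```

Both RBF variants equal 1 when the two points coincide and decay towards 0 as they move apart, with σ controlling how quickly.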


SVM with a single Gaussian kernel

Separates the classes almost perfectly

Reproduces the general trend of the data


The Staffora basin study area

Triggering patterns for flows

DEM-derived covariates + geology and land use


Naïve Bayes (≈ WoE) prediction

Not bad, but we get a lot of high-probability areas

[Figure: map of the NB predicted probability, from Low: 0 to High: 1; easting 1510000 to 1520000, northing 4950000 to 4970000; scale bar 0 to 5 kilometers]


LDA (a.k.a. Maximum Likelihood) prediction

Better than NB, but we still get a lot of high probabilities

[Figure: map of the LDA predicted probability, from Low: 0 to High: 1; easting 1510000 to 1520000, northing 4950000 to 4970000]

SVM

Just better…

[Figure: map of the SVM predicted probability, from Low: 0 to High: 1; easting 1510000 to 1520000, northing 4950000 to 4970000]


Use cross-validation and ROC curves to compare the models

Far fewer false positives predicted in the cross-validation sample

Success curves (Fabbri and Chung, 2003)

It’s a ROC with only the true positive rate…
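
To make the difference between the two curves concrete, the sketch below computes both from the same ranked predictions: the ROC plots the true positive rate against the false positive rate, while the success curve plots it against the fraction of all cells flagged as susceptible. The scores and labels are made-up toy values, and the code assumes there are no tied scores.

```python
def ranked(scores, labels):
    """Labels ordered from the highest to the lowest predicted score."""
    return [y for _, y in sorted(zip(scores, labels), key=lambda t: -t[0])]

def roc_points(scores, labels):
    """(false positive rate, true positive rate) as the threshold is lowered."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for y in ranked(scores, labels):
        tp += y
        fp += 1 - y
        points.append((fp / n_neg, tp / n_pos))
    return points

def success_points(scores, labels):
    """(fraction of cells flagged, true positive rate): no false positive rate."""
    n_pos = sum(labels)
    tp = 0
    points = [(0.0, 0.0)]
    for k, y in enumerate(ranked(scores, labels), start=1):
        tp += y
        points.append((k / len(labels), tp / n_pos))
    return points

def area_under(points):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical predicted probabilities and observed occurrences (1 = landslide)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

auc = area_under(roc_points(scores, labels))
success = success_points(scores, labels)
print(auc, success[-1])
```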


Why use Machine Learning?

Increasing availability of low-cost, information-rich topographic surveys

e.g. LiDAR, hyperspectral data

A lot of raw derived information

An ML system can automatically interpret the data without the need for refinements (automatic mapping systems).


What happens if we use only DEM-derived data?

We still get a decent prediction from SVM, but not from LR/LDA/NB


Residual spatial correlation

Once we predict with SVM, can we derive useful information from the data?

There is still a lot of autocorrelation at short distances

[Figure: semivariogram of the residuals; x-axis distance (200 to 800), y-axis semivariance (0.05 to 0.15)]
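
A semivariogram such as the one on this slide can be estimated with the classical (Matheron) estimator: for each distance bin, half the mean squared difference between all pairs of values that far apart. The sketch below is a minimal version run on a hypothetical one-dimensional transect of smooth "residuals"; the function name, bin width and data are assumptions for illustration, not the values used in the study.

```python
import math

def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Half the mean squared difference of value pairs, per distance bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.dist(coords[i], coords[j])
            k = int(h // bin_width)
            if k < n_bins:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    # Bins with no pairs are reported as None
    return [s / (2.0 * c) if c else None for s, c in zip(sums, counts)]

# Hypothetical residuals along a transect: smooth, hence spatially correlated
coords = [(float(i), 0.0) for i in range(20)]
residuals = [math.sin(i / 3.0) for i in range(20)]

gamma = empirical_semivariogram(coords, residuals, bin_width=2.0, n_bins=4)
print(gamma)
```

A semivariance that is low at short lags and rises with distance indicates that nearby residuals are still similar, i.e. that the model has left spatial structure unexplained.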

We can implement a Kriging system to model the residual information

Or, we can use MK-SVM to model spatial variation

[Figure: matrix of semivariograms (semivariance against distance, 500 to 2500) for occurrence, svm.pred and detrended, with cross panels svm.pred.occurrence, detrended.occurrence and detrended.svm.pred; annotations: original trend, residual correlation, model correlation]

A one-dimensional example

Combination of cosine functions with different λ

Plus some additive Gaussian noise

Sample 30% of the data points

[Figure: three panels plotting y, y2 and y3 against x, for x from 0 to 30]
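
A test signal of this kind could be generated as sketched below, assuming λ denotes the wavelength of each cosine. The slides do not state the wavelengths, amplitudes or noise level actually used, so every numeric value here is a hypothetical placeholder.

```python
import math
import random

random.seed(42)

n = 300
xs = [30.0 * i / (n - 1) for i in range(n)]  # x in [0, 30]

# Hypothetical wavelengths (15 and 2) and amplitudes (1.0 and 0.4)
long_wave = [math.cos(2.0 * math.pi * x / 15.0) for x in xs]
short_wave = [0.4 * math.cos(2.0 * math.pi * x / 2.0) for x in xs]

# Additive Gaussian noise with an assumed standard deviation of 0.1
noise = [random.gauss(0.0, 0.1) for _ in xs]
signal = [a + b + e for a, b, e in zip(long_wave, short_wave, noise)]

# Keep a random 30% of the points as the training sample
k = n * 30 // 100
sample_idx = sorted(random.sample(range(n), k))
x_train = [xs[i] for i in sample_idx]
y_train = [signal[i] for i in sample_idx]
print(len(x_train), x_train[:3])
```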


Multi-kernel analysis

Two Gaussian RBF kernels

MK-SVR is able to separate the two signals, even in the presence of noise.

[Figure: four panels plotting y2, pred 3, y and pred 4 against x, for x from 0 to 30]
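
The way a sum of two kernels splits a signal into components can be sketched without an SVR solver. The example below deliberately swaps in kernel ridge regression, which is not the MK-SVR used in the slides but shares the same additive structure: with K = K_long + K_short, the fitted function is the sum of one component per kernel. The signal is noise-free, and the kernel widths, kernel weights and ridge penalty are hypothetical values picked for this toy case.

```python
import math

def rbf(x, z, sigma):
    """Gaussian RBF kernel for scalar inputs."""
    return math.exp(-((x - z) ** 2) / (2.0 * sigma ** 2))

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rmse(u, v):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)) / len(u))

# Toy signal: long-wavelength trend plus short-wavelength ripple
xs = [0.5 * i for i in range(61)]  # x in [0, 30]
trend = [math.cos(2.0 * math.pi * x / 15.0) for x in xs]
ripple = [0.4 * math.cos(2.0 * math.pi * x / 2.0) for x in xs]
ys = [t + r for t, r in zip(trend, ripple)]

# Hypothetical weights and widths: a wide kernel and a narrow kernel
k_long = [[10.0 * rbf(a, b, 3.0) for b in xs] for a in xs]
k_short = [[0.2 * rbf(a, b, 0.4) for b in xs] for a in xs]
ridge = 1e-3

n = len(xs)
gram = [[k_long[i][j] + k_short[i][j] + (ridge if i == j else 0.0)
         for j in range(n)] for i in range(n)]
alpha = solve(gram, ys)

# Each kernel contributes its own additive component of the fit
f_long = [sum(k * a for k, a in zip(row, alpha)) for row in k_long]
f_short = [sum(k * a for k, a in zip(row, alpha)) for row in k_short]
fitted = [p + q for p, q in zip(f_long, f_short)]
print(rmse(f_long, trend), rmse(f_short, ripple), rmse(fitted, ys))
```

The wide kernel cannot follow the short ripple, so that part of the signal is picked up by the narrow kernel, which is the mechanism the slide attributes to the multi-kernel model.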


Spatial SVM performance

Within slope predicted probability

Average probability close to max probability

Cross-validation performance

Average prob. still close to max probability

Spatial SVM rules extraction

Conclusions

SVMs clearly outperform most of the statistical techniques commonly applied for landslide susceptibility prediction

It IS a “black box” technique, but not so much… several algorithms for feature selection and ranking are available

Very good for automatic and real-time mapping

Can easily update the model if new data are provided

Good for automatic mapping systems

