
Page 1: Kernel Entropy Component Analysis in Remote Sensing Data Clustering.pdf

Kernel Entropy Component Analysis in Remote Sensing Data Clustering

Luis Gómez-Chova¹, Robert Jenssen², Gustavo Camps-Valls¹

¹ Image Processing Laboratory (IPL), Universitat de València, Spain. [email protected], http://www.valencia.edu/chovago

² Department of Physics and Technology, University of Tromsø, Norway. [email protected], http://www.phys.uit.no/~robertj

IGARSS 2011 – Vancouver, Canada


Page 2

Outline

1 Introduction

2 Entropy Component Analysis

3 Kernel Entropy Component Analysis (KECA)

4 KECA Spectral Clustering

5 Experimental Results

6 Conclusions and Open questions

L. Gómez-Chova et al. Kernel Entropy Component Analysis IGARSS 2011 – Vancouver 1/26

Page 3

Motivation

Feature Extraction

Feature selection/extraction is essential before classification or regression:
- to discard redundant or noisy components
- to reduce the dimensionality of the data

Create a subset of new features by combinations of the existing ones

Linear Feature Extraction

Offer interpretability ~ knowledge discovery:
- PCA: projections maximizing the data set variance
- PLS: projections maximally aligned with the labels
- ICA: non-orthogonal projections with maximally independent axes

Fail when data distributions are curved

Nonlinear feature relations

Page 4

Objectives

Objectives

Kernel-based nonlinear data transformation:
- Captures the data's higher-order statistics
- Extracts features suited for clustering

Method

Kernel Entropy Component Analysis (KECA) [Jenssen, 2010]

Based on Information Theory:
- Maximally preserves the entropy of the input data
- Angular clustering maximizes cluster divergence

Out-of-sample extension to deal with test data

Experiments

Cloud screening from ENVISAT/MERIS multispectral images

Page 5

1 Introduction

2 Entropy Component Analysis

3 Kernel Entropy Component Analysis (KECA)

4 KECA Spectral Clustering

5 Experimental Results

6 Conclusions and Open questions

Page 6

Information-Theoretic Learning

Entropy Concept

Entropy of a probability density function (pdf) is a measure of information

Entropy ⇔ Shape of the pdf

Page 7

Information-Theoretic Learning

Divergence Concept

The entropy concept can be extended to obtain a measure of dissimilarity between distributions

Divergence ⇔ Distance between pdfs

Page 8

Entropy Component Analysis

Shannon entropy

H(p) = −∫ p(x) log p(x) dx

How to handle densities? How to compute integrals?

Rényi’s entropies

H_α(p) = (1 / (1 − α)) log ∫ p^α(x) dx

Rényi's entropies contain Shannon entropy as the special case α → 1. We focus on Rényi's quadratic entropy, α = 2.

Rényi’s quadratic entropy

H₂(p) = −log ∫ p²(x) dx = −log V(p)

It can be estimated directly from samples!

Page 9

Entropy Component Analysis

Rényi’s quadratic entropy estimator

Estimated from data D = {x₁, …, x_N} ⊂ ℝ^d generated by the pdf p(x).

Parzen window estimator with a Gaussian (Radial Basis Function, RBF) kernel:

p̂(x) = (1/N) Σ_{x_t ∈ D} K(x, x_t | σ),  with  K(x, x_t) = exp(−‖x − x_t‖² / 2σ²)

Idea: place a kernel over the samples and sum with proper normalization.

The estimator for the information potential V(p) = ∫ p²(x) dx:

V̂(p) = ∫ p̂²(x) dx = ∫ [ (1/N) Σ_{x_t ∈ D} K(x, x_t | σ) ] [ (1/N) Σ_{x_t′ ∈ D} K(x, x_t′ | σ) ] dx
      = (1/N²) Σ_{x_t ∈ D} Σ_{x_t′ ∈ D} ∫ K(x, x_t | σ) K(x, x_t′ | σ) dx
      = (1/N²) Σ_{x_t ∈ D} Σ_{x_t′ ∈ D} K(x_t, x_t′ | √2 σ) = (1/N²) 1ᵀK1
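The estimator above fits in a few lines of NumPy. This is a minimal sketch; the RBF width and the toy data are illustrative choices, not values from the presentation.

```python
import numpy as np

def renyi_quadratic_entropy(X, sigma):
    """Estimate Renyi's quadratic entropy H_2(p) = -log V(p) from samples.

    V(p) is approximated by (1/N^2) * 1^T K 1, where K is the RBF kernel
    matrix with width sqrt(2)*sigma (the convolution of two Gaussian
    Parzen windows of width sigma).
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2 * (np.sqrt(2) * sigma) ** 2))
    N = X.shape[0]
    V = K.sum() / N**2          # information potential, 1^T K 1 / N^2
    return -np.log(V)

# A tightly concentrated sample has lower entropy than a spread-out one.
rng = np.random.default_rng(0)
H_tight = renyi_quadratic_entropy(rng.normal(0, 0.1, (200, 2)), sigma=0.5)
H_wide = renyi_quadratic_entropy(rng.normal(0, 2.0, (200, 2)), sigma=0.5)
```

Note the kernel is evaluated at width √2·σ: the integral of the product of two Parzen kernels is itself a Gaussian kernel at the wider scale, which is what makes the estimate computable directly from the kernel matrix.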

Page 10

Entropy Component Analysis

Rényi’s quadratic entropy estimator

Empirical Rényi entropy estimate resides in the corresponding kernel matrix:

V̂(p) = (1/N²) 1ᵀK1

It can be expressed in terms of the eigenvalues and eigenvectors of K:

K = EDEᵀ

where D is the diagonal matrix of eigenvalues λ₁, …, λ_N and E is the matrix with eigenvectors e₁, …, e_N. Therefore we have

V̂(p) = (1/N²) Σ_{i=1}^{N} (√λᵢ eᵢᵀ1)²

where each term √λᵢ eᵢᵀ1 contributes to the entropy estimate.

ECA dimensionality reduction

Idea: find the smallest set of features that maximally preserves the entropy of the input data (the largest contributions √λᵢ eᵢᵀ1).
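The decomposition can be checked numerically: the per-axis contributions (√λᵢ eᵢᵀ1)² sum back to 1ᵀK1, and ECA keeps the m axes with the largest contributions rather than the largest eigenvalues. A sketch with an illustrative RBF kernel:

```python
import numpy as np

def eca_entropy_contributions(K):
    """Eigendecompose K and return each axis' entropy contribution
    (sqrt(lambda_i) * e_i^T 1)^2, which sums to 1^T K 1 = N^2 V(p)."""
    lam, E = np.linalg.eigh(K)                       # K = E diag(lam) E^T
    contrib = np.clip(lam, 0, None) * (E.T @ np.ones(len(K))) ** 2
    return lam, E, contrib

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)
lam, E, contrib = eca_entropy_contributions(K)

# ECA keeps the m axes with the largest entropy contribution:
m = 2
keep = np.argsort(contrib)[::-1][:m]
```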

Page 11

Entropy Component Analysis

[Figure: example distributions with entropy estimates H(p) = 4.36, 4.74, 5.05, 4.71, 4.81, 4.44]

Page 12

1 Introduction

2 Entropy Component Analysis

3 Kernel Entropy Component Analysis (KECA)

4 KECA Spectral Clustering

5 Experimental Results

6 Conclusions and Open questions

Page 13

Kernel Principal Component Analysis (KPCA)

Principal Component Analysis (PCA)

Find projections of X = [x₁, …, x_N]ᵀ maximizing the variance of the projected data XU:

PCA: maximize Tr{(XU)ᵀ(XU)} = Tr{Uᵀ C_xx U}
     subject to UᵀU = I

Including Lagrange multipliers λ, this is equivalent to the eigenproblem

C_xx uᵢ = λᵢ uᵢ  →  C_xx U = UD

The uᵢ are the eigenvectors of C_xx and are orthonormal: uᵢᵀuⱼ = δᵢⱼ

Page 14

Kernel Principal Component Analysis (KPCA)

Kernel Principal Component Analysis (KPCA)

Find projections maximizing the variance of the mapped data Φ = [φ(x₁), …, φ(x_N)]ᵀ:

KPCA: maximize Tr{(ΦU)ᵀ(ΦU)} = Tr{Uᵀ ΦᵀΦ U}
      subject to UᵀU = I

The covariance matrix ΦᵀΦ is d_H × d_H and U lives in feature space — potentially infinite-dimensional!

KPCA through the kernel trick

Apply the representer theorem: U = ΦᵀA, where A = [α₁, …, α_N]

KPCA: maximize Tr{Aᵀ ΦΦᵀΦΦᵀ A} = Tr{Aᵀ KK A}
      subject to UᵀU = Aᵀ ΦΦᵀ A = Aᵀ K A = I

Including Lagrange multipliers λ, this is equivalent to the eigenproblem

KKαᵢ = λᵢ Kαᵢ  →  Kαᵢ = λᵢ αᵢ

Now the matrix A is N × N! (obtained from the eigendecomposition K = EDEᵀ)
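A minimal sketch of the dual computation on illustrative data: eigendecompose K and form the projected training data as E_m D_m^{1/2}. In practice KPCA usually centres the kernel matrix first; that step is omitted here to mirror the slide.

```python
import numpy as np

def kpca_projections(K, m):
    """Top-m kernel principal projections of the training data,
    Phi U_m = E_m D_m^{1/2}, from the eigendecomposition of K."""
    lam, E = np.linalg.eigh(K)
    idx = np.argsort(lam)[::-1][:m]            # largest eigenvalues first
    return E[:, idx] * np.sqrt(np.clip(lam[idx], 0, None))

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))
K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)
Z = kpca_projections(K, m=2)
```

Each projected column i has squared norm λᵢ, so components come out ordered by (uncentred) variance.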

Page 15

Kernel ECA Transformation

Kernel Entropy Component Analysis (KECA)

KECA: projection of Φ onto the m feature-space principal axes contributing most to the Rényi entropy estimate of the input data:

Φ_eca = ΦU_m = E_m D_m^{1/2}

The projection onto a single principal axis uᵢ in H is uᵢᵀΦᵀ = √λᵢ eᵢᵀ.

The entropy associated with Φ_eca is V_m = (1/N²) 1ᵀK_eca1 = (1/N²) Σ_{i=1}^{m} (√λᵢ eᵢᵀ1)²

Note that Φ_eca is not necessarily based on the top eigenvalues λᵢ, since eᵢᵀ1 also contributes to the entropy estimate.

Out-of-sample extension

Projections for a collection of test data points:

Φ_eca,test = Φ_test U_m = Φ_test Φᵀ E_m D_m^{−1/2} = K_test E_m D_m^{−1/2}
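Putting the two formulas together in a sketch (the RBF kernel, width, and data are illustrative): select axes by entropy contribution, form Φ_eca = E_m D_m^{1/2}, and project test points as K_test E_m D_m^{−1/2}. Applying the out-of-sample formula back to the training points recovers Φ_eca, since K E_m = E_m D_m.

```python
import numpy as np

def rbf(A, B, sigma):
    return np.exp(-((A[:, None] - B[None, :]) ** 2).sum(-1) / (2 * sigma**2))

def keca_fit(X, m, sigma):
    """KECA training features Phi_eca = E_m D_m^{1/2}, with axes ranked
    by their entropy contribution (sqrt(lambda_i) e_i^T 1)^2."""
    K = rbf(X, X, sigma)
    lam, E = np.linalg.eigh(K)
    lam = np.clip(lam, 0, None)
    contrib = lam * E.sum(axis=0) ** 2          # (sqrt(lam_i) e_i^T 1)^2
    idx = np.argsort(contrib)[::-1][:m]         # top entropy contributors
    Em, Dm = E[:, idx], lam[idx]
    return Em * np.sqrt(Dm), Em, Dm

def keca_transform(X_test, X_train, Em, Dm, sigma):
    """Out-of-sample projection: K_test E_m D_m^{-1/2}."""
    return rbf(X_test, X_train, sigma) @ Em / np.sqrt(Dm)

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 4))
Phi_eca, Em, Dm = keca_fit(X, m=2, sigma=1.0)
```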

Page 16

Kernel ECA Transformation

KECA example

[Figure: original data and its PCA, KPCA, and KECA projections]

KECA reveals cluster structure → underlying labels of the data

Nonlinearly related clusters in X → different angular directions in H. An angular clustering based on the kernel features Φ_eca seems reasonable.

Page 17

1 Introduction

2 Entropy Component Analysis

3 Kernel Entropy Component Analysis (KECA)

4 KECA Spectral Clustering

5 Experimental Results

6 Conclusions and Open questions

Page 18

KECA Spectral Clustering

Cauchy-Schwarz divergence

The Cauchy-Schwarz divergence between the pdfs of two clusters is

D_CS(pᵢ, pⱼ) = −log(V_CS(pᵢ, pⱼ)) = −log [ ∫ pᵢ(x) pⱼ(x) dx / √(∫ pᵢ²(x) dx · ∫ pⱼ²(x) dx) ]

Measuring dissimilarity in a probability space is a complex issue. The entropy interpretation in the kernel space leads to the mean vector µ = (1/N) Σ_t φ(x_t):

V(p) = ∫ p²(x) dx = (1/N²) 1ᵀK1 = (1/N²) 1ᵀΦΦᵀ1 = µᵀµ = ‖µ‖²

Divergence via Parzen windowing ⇒ V_CS(pᵢ, pⱼ) = µᵢᵀµⱼ / (‖µᵢ‖ ‖µⱼ‖) = cos∠(µᵢ, µⱼ)

KECA Spectral Clustering

Angular clustering of Φ_eca maximizes the CS divergence between clusters:

J(C₁, …, C_k) = Σ_{i=1}^{k} Σ_{x_t ∈ Cᵢ} cos∠(φ_eca(x_t), µᵢ)
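The cosine V_CS can be computed directly from blocks of the kernel matrix, since µᵢᵀµⱼ = (1/(NᵢNⱼ)) Σ_{s∈Cᵢ} Σ_{t∈Cⱼ} K(x_s, x_t). A sketch with an illustrative RBF kernel and two toy clusters:

```python
import numpy as np

def cs_cosine(K, idx_i, idx_j):
    """V_CS(p_i, p_j) = cos angle between kernel-space cluster means;
    each block mean of K is an inner product of mean vectors."""
    num = K[np.ix_(idx_i, idx_j)].mean()             # mu_i^T mu_j
    den = np.sqrt(K[np.ix_(idx_i, idx_i)].mean()     # ||mu_i||^2
                  * K[np.ix_(idx_j, idx_j)].mean())  # ||mu_j||^2
    return num / den

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.3, (25, 2)), rng.normal(4, 0.3, (25, 2))])
K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)
a, b = np.arange(25), np.arange(25, 50)
```

A cluster compared with itself gives cosine 1 (zero divergence); well-separated clusters give a cosine near 0 (large divergence).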

Page 19

KECA Spectral Clustering

KECA Spectral Clustering Algorithm

1. Obtain Φ_eca by Kernel ECA
2. Initialize the means µᵢ, i = 1, …, k
3. For each training sample, assign the cluster x_t → Cᵢ maximizing cos∠(φ_eca(x_t), µᵢ)
4. Update the mean vectors µᵢ
5. Repeat steps 3 and 4 until convergence

Intuition

A kernel feature space data point φ_eca(x_t) is assigned to the cluster represented by the closest mean vector µᵢ in terms of angular distance
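The steps above amount to k-means with cosine similarity on the KECA features. A sketch; the deterministic initialization and the toy two-direction data are illustrative, not the paper's setup.

```python
import numpy as np

def angular_kmeans(Phi, k, init, n_iter=100):
    """Cosine-similarity k-means: assign each row of Phi to the mean
    vector with the smallest angular distance, then update the means."""
    mu = Phi[np.asarray(init)].astype(float)   # step 2: initial means
    labels = np.full(len(Phi), -1)
    for _ in range(n_iter):
        sim = (Phi @ mu.T) / (np.linalg.norm(Phi, axis=1, keepdims=True)
                              * np.linalg.norm(mu, axis=1) + 1e-12)
        new = sim.argmax(axis=1)               # step 3: angular assignment
        for i in range(k):                     # step 4: update means
            if np.any(new == i):
                mu[i] = Phi[new == i].mean(axis=0)
        if np.array_equal(new, labels):        # step 5: convergence
            break
        labels = new
    return labels

# Two groups pointing along different feature-space directions:
rng = np.random.default_rng(5)
Phi = np.vstack([rng.normal([1, 0], 0.05, (30, 2)),
                 rng.normal([0, 1], 0.05, (30, 2))])
labels = angular_kmeans(Phi, k=2, init=[0, 30])
```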

Page 20

1 Introduction

2 Entropy Component Analysis

3 Kernel Entropy Component Analysis (KECA)

4 KECA Spectral Clustering

5 Experimental Results

6 Conclusions and Open questions

Page 21

Experimental results: Data material

Cloud masking from ENVISAT/MERIS multispectral images

Pixel-wise binary decisions about the presence/absence of clouds

MERIS images taken over Spain and France

Input samples with 13 spectral bands and 6 physically inspired features

[Figure: MERIS images — Barrax (BR-2003-07-14), Barrax (BR-2004-07-14), France (FR-2005-03-19)]

Page 22

Experimental results: Numerical comparison

Experimental setup

- KECA compared with k-means, KPCA + k-means, and Kernel k-means
- Number of clusters fixed to k = 2 (cloud-free and cloudy areas)
- Number of KPCA and KECA features fixed to m = 2 (to stress differences)
- RBF kernel width parameter selected by grid search for all methods

Numerical results

Validation results on 10,000 manually labeled pixels per image

Kappa statistic results over 10 realizations for all images

[Figure: estimated κ statistic vs. number of training samples (200-1000) for BR-2003-07-14 (κ ≈ 0.5-1), BR-2004-07-14 (κ ≈ 0.4-0.8), and FR-2005-03-19 (κ ≈ 0-0.6); curves for KECA, KPCA, Kernel k-means, and k-means]

Page 23

Experimental results: Numerical comparison

Average numerical results

[Figure: average estimated κ statistic vs. number of training samples (200-1000) for KECA, KPCA, Kernel k-means, and k-means]

KECA outperforms k-means (+25%), and Kernel k-means and KPCA (+15%)

In general, the number of training samples positively affects the results

Page 24

Experimental results: Classification maps

Test Site              | k-means             | Kernel k-means      | KPCA                | KECA
Spain (BR-2003-07-14)  | OA=96.25%; κ=0.6112 | OA=96.22%; κ=0.7540 | OA=47.52%; κ=0.0966 | OA=99.41%; κ=0.9541
Spain (BR-2004-07-14)  | OA=96.91%; κ=0.6018 | OA=62.03%; κ=0.0767 | OA=96.66%; κ=0.6493 | OA=97.54%; κ=0.7319
France (FR-2005-03-19) | OA=92.87%; κ=0.6142 | OA=92.64%; κ=0.6231 | OA=80.93%; κ=0.4051 | OA=92.91%; κ=0.6302

Page 25

1 Introduction

2 Entropy Component Analysis

3 Kernel Entropy Component Analysis (KECA)

4 KECA Spectral Clustering

5 Experimental Results

6 Conclusions and Open questions

Page 26

Conclusions and open questions

Conclusions

- Kernel entropy component analysis for clustering remote sensing data
- Nonlinear features preserving the entropy of the input data
- Angular clustering reveals structure in terms of cluster divergence

Out-of-sample extension for test data → mandatory in remote sensing

Good results on cloud screening from MERIS images

KECA code is available at http://www.phys.uit.no/~robertj/

Simple feature extraction toolbox (SIMFEAT) soon at http://isp.uv.es

Open questions and Future work

Pre-images of transformed data in the input space

Learn kernel parameters in an automatic way

Test KECA in more remote sensing applications
