NHDC and PHDC: Local and Global Heat Diffusion Based Classifiers

Haixuan Yang
Group Meeting, Sep 26, 2005


Outline

Introduction
Graph Heat Diffusion Model
NHDC and PHDC algorithms
Connections with other models
Experiments
Conclusions and future work

Introduction

Kondor & Lafferty (NIPS 2002)
  Construct a diffusion kernel on a graph
  Handle discrete attributes
  Apply to a large margin classifier
  Achieve good accuracy on 5 data sets from UCI

Lafferty & Lebanon (JMLR 2005)
  Construct a diffusion kernel on a special manifold
  Handle continuous attributes
  Restrict to text classification
  Apply to SVM
  Achieve good accuracy on WebKB and Reuters

Belkin & Niyogi (Neural Computation 2003)
  Reduce dimension by heat kernel and local distance

Tenenbaum et al. (Science 2000)
  Reduce dimension by local distance

We inherit the ideas:

Local information is relatively accurate in a nonlinear manifold.

The way heat diffuses on a manifold is related to the density of the data on the manifold: the point where heat diffuses rapidly is one that has high density.

For example, in the ideal case when the manifold is the Euclidean space, heat diffuses in the same way as Gaussian density:

The way heat diffuses on a manifold can be understood as a generalization of the Gaussian density from Euclidean space to manifold.
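For reference, the Gaussian form mentioned above is the standard heat kernel on Euclidean space. The notation (dimension m, source point y, time t) is chosen here for illustration and is not taken from the slide:

```latex
K_t(x, y) = (4\pi t)^{-m/2} \exp\left(-\frac{\|x - y\|^2}{4t}\right)
```

This is the density of a Gaussian centered at y with variance 2t in each coordinate.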

Learn local information by k nearest neighbors.

We think differently:

The manifold is unknown in most cases.

Even for a known manifold, the solution of the heat equation is usually unknown.

The explicit form of the approximation to the solution in (Lafferty & Lebanon, JMLR 2005) is a rare case.

Establish the heat diffusion equation directly on a graph that is formed by K nearest neighbors.

The solution on the graph always has an explicit form.

Form a classifier from the solution directly.


Illustration

The first heat diffusion

The second heat diffusion


Heat received from class A: 0.018; heat received from class B: 0.016

Heat received from class A: 0.002; heat received from class B: 0.08

[Figure label: SVM]

Graph Heat Diffusion Model

Given a directed weighted graph G=(V,E,W), where V={1,2,…,n}, E={(i,j): there is an edge from i to j}, and W=(w(i,j)) is the weight matrix.

The edge (i,j) is imagined as a pipe that connects i and j, and w(i,j) is the pipe length.

Let f(i,t) be the heat at node i at time t.

At time t, node i receives an amount of heat M(i,j,t,dt) from its neighbor j during a period dt.

Suppose that M(i,j,t,dt) is proportional to the time period dt.

Suppose that M(i,j,t,dt) is proportional to the heat difference f(j,t)-f(i,t).

Moreover, the heat flows from j to i through the pipe, and therefore the heat diffuses in the pipe in the same way as it does in Euclidean space, as described before.

The difference between f(i,t+dt) and f(i,t) can be expressed as:

It can be expressed in matrix form:

Letting dt tend to zero, the above equation becomes:
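The three formulas themselves are not given in the text. A reconstruction consistent with the stated assumptions (heat flow proportional to dt and to f(j,t)-f(i,t), Gaussian decay along a pipe of length w(i,j)) and with the symbols H, β and γ used later is sketched below. The constant α and the exact form of the decay factor are assumptions:

```latex
% Heat change at node i over a short period dt (alpha: assumed conductivity constant)
f(i, t+dt) - f(i, t) = \alpha \sum_{j:\,(j,i) \in E} e^{-w(i,j)^2/\beta}\,\bigl(f(j,t) - f(i,t)\bigr)\,dt

% Matrix form, with f(t) = (f(1,t), \dots, f(n,t))^T
\frac{f(t+dt) - f(t)}{dt} = \alpha H f(t),
\qquad
H_{ij} =
\begin{cases}
-\sum_{k:\,(k,i) \in E} e^{-w(i,k)^2/\beta}, & j = i,\\
e^{-w(i,j)^2/\beta}, & (j,i) \in E,\\
0, & \text{otherwise.}
\end{cases}

% Limit dt -> 0 and its solution, with gamma = alpha t
\frac{d}{dt} f(t) = \alpha H f(t)
\quad\Longrightarrow\quad
f(t) = e^{\alpha t H} f(0) = e^{\gamma H} f(0)
```

Each row of H sums to zero under this form, so the i-th component of H f(t) is exactly the weighted sum of heat differences in the first line.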

NHDC and PHDC algorithm - Step 1

[Construct neighborhood graph] Define graph G over all data points, both in the training data set and in the test data set.

Add an edge from j to i if j is one of the K nearest neighbors of i.

Set the edge weight w(i,j)=d(i,j) if j is one of the K nearest neighbors of i, where d(i,j) is the Euclidean distance between point i and point j.
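Step 1 can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the author's implementation; the function name `knn_graph` and the use of `None` for a missing edge are choices made here:

```python
import math

def knn_graph(points, k):
    """Return w with w[i][j] = d(i, j) if j is one of the k nearest
    neighbors of i (an edge from j to i), and None otherwise."""
    n = len(points)
    w = [[None] * n for _ in range(n)]
    for i in range(n):
        # Distances from i to every other point, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            w[i][j] = d
    return w

# Hypothetical toy data: three nearby points and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
w = knn_graph(points, k=2)
```

The resulting graph is directed: the far point has point 1 among its two nearest neighbors, but point 1 does not have the far point among its own.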


NHDC and PHDC algorithm - Step 2

[Compute the Heat Kernel] Using equation
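The equation itself is not given in the text. A minimal sketch, assuming off-diagonal entries H[i][j] = exp(-w(i,j)^2/β) for each edge from j to i, diagonal entries that make every row sum to zero, and the heat kernel e^{γH} evaluated by a truncated Taylor series. The helper names, the `None` convention for missing edges and the number of series terms are all choices made here:

```python
import math

def build_H(w, beta):
    """Assumed form: H[i][j] = exp(-w[i][j]^2 / beta) on edges j -> i,
    H[i][i] = -(sum of the off-diagonal entries of row i)."""
    n = len(w)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and w[i][j] is not None:
                H[i][j] = math.exp(-w[i][j] ** 2 / beta)
        H[i][i] = -sum(H[i])  # diagonal is still 0.0 at this point
    return H

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def heat_kernel(H, gamma, terms=30):
    """Truncated Taylor series for exp(gamma * H); adequate for small examples."""
    n = len(H)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    G = [[gamma * h for h in row] for row in H]
    for k in range(1, terms):
        # term becomes (gamma * H)^k / k!
        term = [[x / k for x in row] for row in matmul(term, G)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

# Two nodes joined in both directions by pipes of length 1.
w = [[None, 1.0], [1.0, None]]
H = build_H(w, beta=1.0)
K = heat_kernel(H, gamma=0.5)
```

For larger graphs a library routine such as `scipy.linalg.expm` would be a more robust way to evaluate the matrix exponential than a fixed-length series.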

NHDC and PHDC algorithm - Step 3

[Compute the Heat Distribution] Set f(0): for each class c, the nodes labeled with class c have an initial unit heat at time 0; all other nodes have no heat at time 0.

In PHDC, use the equation $f(t) = e^{\gamma H} f(0)$ to compute the heat distribution.

In NHDC, use the equation $f(t) = H f(0)$.
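The initial heat vector and the NHDC update can be sketched as follows. The matrix H below is a made-up 3-node example with rows summing to zero, and the helper names are hypothetical:

```python
def initial_heat(labels, cls):
    """f(0) for one class: unit heat on nodes labeled cls, zero elsewhere.
    Unlabeled (test) nodes carry the label None."""
    return [1.0 if lab == cls else 0.0 for lab in labels]

def matvec(H, f):
    return [sum(h * x for h, x in zip(row, f)) for row in H]

# Nodes 0 and 1 are labeled training points, node 2 is a test node.
labels = ["A", "B", None]
H = [
    [-0.5, 0.5, 0.0],
    [0.5, -0.5, 0.0],
    [0.9, 0.2, -1.1],
]

# NHDC: f(t) = H f(0), computed once per class.
heat_from_A = matvec(H, initial_heat(labels, "A"))
heat_from_B = matvec(H, initial_heat(labels, "B"))
```

In this example the test node receives more heat from class A than from class B. PHDC would apply the matrix exponential of γH in place of H.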

NHDC and PHDC algorithm - Step 4

[Classify the nodes] For each node in the test data set, classify it to the class from which it receives the most heat.
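The decision rule is an argmax over classes. A small sketch, using the heat values from the illustration earlier in the talk as example data (the node indices and the function name are hypothetical):

```python
def classify(heat_by_class, test_nodes):
    """Assign each test node to the class from which it receives the most heat."""
    return {
        i: max(heat_by_class, key=lambda c: heat_by_class[c][i])
        for i in test_nodes
    }

# Nodes 0 and 1 stand for training points; nodes 2 and 3 are test points.
heat_by_class = {
    "A": [0.0, 0.0, 0.018, 0.002],
    "B": [0.0, 0.0, 0.016, 0.08],
}
predicted = classify(heat_by_class, test_nodes=[2, 3])
```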


Connections with other models

The Parzen window approach (when the window function takes the normal form) is a special case of the NHDC.

It is a non-parametric method for probability density estimation:

The class-conditional density for class k

Assign x to a class whose value is maximal.
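A Gaussian-window Parzen classifier can be sketched as follows. The density formula on the slide is not given in the text, so the standard isotropic Gaussian window of width `sigma` is assumed here, and classes are compared by their class-conditional density estimates:

```python
import math

def parzen_density(x, samples, sigma):
    """Gaussian-window Parzen estimate of a class-conditional density at x."""
    d = len(x)
    norm = (2.0 * math.pi * sigma ** 2) ** (d / 2.0)
    total = sum(
        math.exp(-math.dist(x, s) ** 2 / (2.0 * sigma ** 2)) for s in samples
    )
    return total / (len(samples) * norm)

def parzen_classify(x, samples_by_class, sigma):
    """Assign x to the class whose estimated density at x is largest."""
    return max(
        samples_by_class,
        key=lambda c: parzen_density(x, samples_by_class[c], sigma),
    )

# Hypothetical toy data with two well-separated classes.
samples_by_class = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "B": [(5.0, 5.0), (6.0, 5.0)],
}
label_near_a = parzen_classify((0.5, 0.5), samples_by_class, sigma=1.0)
label_near_b = parzen_classify((5.5, 5.0), samples_by_class, sigma=1.0)
```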


In our model, let K=n-1, then the graph constructed in Step 1 will be a complete graph. The matrix H will be

Heat that xp receives from the data points in class k
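The correspondence can be sketched under one assumption, since the formula for H is not given in the text: off-diagonal entries of the form H_{pj} = exp(-w(p,j)^2/β), with w(p,j) the Euclidean distance from Step 1. The symbols C_k (training points of class k), N_k (their number), σ (window width) and d (data dimension) are introduced here:

```latex
% Heat that x_p receives from class k on the complete graph (K = n-1), NHDC
\sum_{x_j \in C_k} H_{pj} = \sum_{x_j \in C_k} \exp\left(-\frac{\|x_p - x_j\|^2}{\beta}\right)

% Gaussian Parzen estimate of the class-conditional density
\hat{p}(x_p \mid C_k) = \frac{1}{N_k\,(2\pi\sigma^2)^{d/2}} \sum_{x_j \in C_k} \exp\left(-\frac{\|x_p - x_j\|^2}{2\sigma^2}\right)

% With beta = 2 sigma^2 the two agree up to the factor N_k (2 pi sigma^2)^{d/2}
\sum_{x_j \in C_k} H_{pj} = N_k\,(2\pi\sigma^2)^{d/2}\,\hat{p}(x_p \mid C_k)
```

Maximizing the received heat is then the same as maximizing N_k times the estimated density, that is, the Parzen decision rule with class priors estimated by the class frequencies.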

KNN is a special case of the NHDC.

For each test point, assign it to the class with the largest number of points among its K nearest neighbors.

In our model, let β tend to infinity; then the matrix H becomes

Heat that xp receives from the data points in class k

The number of points of class k among its K nearest neighbors.
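The limit can be checked numerically. The sketch below assumes edge factors of the form exp(-d^2/β), which is not stated in the text, and uses made-up distances and labels for the K = 3 nearest neighbors of one test point:

```python
import math

def heat_from_class(neighbor_dists, neighbor_labels, cls, beta):
    """Heat a test node receives from class cls under the assumed factors
    exp(-d^2 / beta) on the edges from its K nearest neighbors."""
    return sum(
        math.exp(-d * d / beta)
        for d, lab in zip(neighbor_dists, neighbor_labels)
        if lab == cls
    )

dists = [0.5, 1.0, 3.0]
labels = ["A", "B", "B"]

small_beta = {c: heat_from_class(dists, labels, c, beta=1.0) for c in "AB"}
large_beta = {c: heat_from_class(dists, labels, c, beta=1e12) for c in "AB"}
```

With a small β the single close neighbor of class A dominates. With a very large β every factor approaches 1, so the heat from each class approaches its neighbor count and the decision becomes the KNN majority vote.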

PHDC can approximate NHDC. If γ is small, then

$e^{\gamma H} \approx I + \gamma H$

Since the identity matrix has no effect on the heat distribution, PHDC and NHDC have similar classification accuracy when γ is small.
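The step from the approximation to the claim about accuracy can be spelled out. Writing p for a test node, whose initial heat is zero by Step 3, and assuming γ > 0, a sketch of the argument is:

```latex
% Series expansion of the heat kernel
e^{\gamma H} = I + \gamma H + \frac{\gamma^2}{2!} H^2 + \cdots \approx I + \gamma H \quad (\gamma \text{ small})

% Heat at a test node p under PHDC, using f_p(0) = 0
\bigl(e^{\gamma H} f(0)\bigr)_p \approx f_p(0) + \gamma\,\bigl(H f(0)\bigr)_p = \gamma\,\bigl(H f(0)\bigr)_p
```

The factor γ scales the heat from every class equally, so the class delivering the most heat to p is unchanged and the PHDC decision matches the NHDC decision to first order in γ.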

[Diagram relating PHDC, NHDC, KNN and PWA (Parzen window approach)]

Experiments

2 artificial data sets: Spiral-100 and Spiral-1000

Compare with the Parzen window approach (the window function takes the normal form), KNN and SVM.

The results are averages over ten-fold cross-validation.

Results (classification accuracy, %)

Data set       NHDC    PHDC    KNN     PWA     SVM
Spiral-100     84      84      67      83      34
Spiral-1000    99.6    99.8    99.3    99.7    68.7
Credit-g       76.1    76.06   75.59   72.35   71.5
Diabetes       76.3    76.22   75.78   74.96   76.6
Glass          72.99   73.12   70.64   71.56   68.1
Iris           97.36   97.79   97.36   97.07   96
Sonar          88.75   89.07   82.86   88.28   84.8
Vehicle        72.90   72.93   71.41   72.45   88.5

Conclusions and future work

Avoid the difficulty of finding an explicit expression for the unknown geometry.

Avoid the difficulty of finding a closed-form heat kernel for some complicated geometries.

Both NHDC and PHDC achieve good accuracy.

There is room to develop the method further.

The assumption in the local heat diffusion is not fully justified.

We are now using a directed graph. Converting it into an undirected graph may be more reasonable, because in reality heat diffuses symmetrically.

Apply it to SVM?