CE 32110 Unsupervised Classification 2010



UNSUPERVISED CLASSIFICATION



Unsupervised Classification

In contrast to supervised classification, unsupervised classification requires only a minimal amount of initial input from the analyst.

It is a process whereby numerical operations are performed that search for natural groupings of the spectral properties of pixels, as examined in multispectral feature space.

The user allows the computer to select the class means and covariance matrices to be used in the classification.

Once the data are classified, the analyst attempts to assign these natural or spectral classes to the information classes of interest.

This may not be easy.

Some of the clusters may be meaningless as they represent mixed classes of earth surface materials.

This ambiguity is resolved by the analyst, who must understand the spectral characteristics of the terrain in order to classify clusters into information classes.



UNSUPERVISED CLASSIFICATION

Clustering is one of the most important tasks in data mining and knowledge discovery.

It attempts to find subsets within a given data set that are similar enough to warrant further analysis.

It organizes a set of objects into groups (or clusters) such that objects in the same group are similar to each other and different from those in other groups.

These groups or clusters should have meaning in the context of a particular problem.



In clustering, two important and fundamental tasks are the definition of

a) proximity (a similarity function) between two data objects, and

b) the overall optimization search strategy, i.e. how to find the best overall grouping according to an optimization criterion.
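As an illustration of item a), the sketch below uses Euclidean distance in feature space as the proximity measure. The slides do not name a specific measure, so the choice of Euclidean distance and the function name euclidean_distance are assumptions made for this example.

```python
import math

def euclidean_distance(p, q):
    """Distance between two pixels, each given as a sequence of band values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two pixels described by their brightness values in two bands.
d = euclidean_distance((10, 10), (20, 20))
print(round(d, 2))
```

A smaller distance means the two pixels are spectrally more similar; other measures could be substituted without changing the rest of the procedure.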

Clustering, commonly known as unsupervised classification, does not need any training data and is especially useful when the user has limited knowledge about the data.



Clustering algorithms partition data into a certain number of clusters (groups, subsets, or categories).

There is no universally agreed upon definition of a cluster. Most researchers describe a cluster by considering

a) the internal homogeneity and

b) the external separation,

i.e. patterns in the same cluster should be similar to each other, while patterns in different clusters should not.

Both the similarity and the dissimilarity should be examinable in a clear and meaningful way.
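One possible way to examine both properties numerically is sketched below: homogeneity as the mean distance of a cluster's members to their centroid, and separation as the distance between two centroids. Both measures, the helper names and the points in cluster_b are assumptions chosen for illustration, not definitions from the lecture.

```python
import math

def centroid(points):
    """Mean vector of a list of equally sized points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def mean_distance_to_centroid(points):
    """Average Euclidean distance of the points to their own centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)

cluster_a = [(10, 10), (20, 20)]
cluster_b = [(30, 20), (40, 30)]  # hypothetical second cluster

homogeneity_a = mean_distance_to_centroid(cluster_a)
separation = math.dist(centroid(cluster_a), centroid(cluster_b))
print(homogeneity_a < separation)
```

A grouping looks reasonable under these measures when the within-cluster spread is small relative to the distance between cluster centres.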



STEPS IN CLUSTER ANALYSIS

Data to cluster: data acquisition, preparation and cleaning.

Variables to use: selection of relevant variables for performing the clustering procedure. Irrelevant or masking variables should be excluded as far as possible.

A proximity measure: designing a proper proximity measure.

The clustering procedure.

Number of clusters: even no cluster is a possible outcome.

Replication, testing and interpretation.




Clustering algorithms used for the unsupervised classification of remotely sensed data generally vary according to the efficiency with which the clustering takes place.

A conceptually simple, but not necessarily efficient, clustering algorithm known as CLUSTER is used below to demonstrate the fundamental logic of unsupervised classification.

This algorithm operates in a two-pass mode.

In the first pass, the algorithm sequentially builds class clusters.

In the second pass, a minimum-distance classifier is applied to the whole data set on a pixel-by-pixel basis, where each pixel is assigned to one of the mean vectors created in pass 1.



CLUSTER Algorithm

Pass 1: Cluster Building

During the first pass, the analyst may be required to supply four types of information:

(i) R, the radius of the cluster,

(ii) C, a distance parameter for merging clusters,

(iii) N, the number of pixels to be evaluated between each merging of the clusters, and

(iv) Cmax, the maximum number of clusters to be identified by the algorithm.



To start the process of building cluster centres, the first pixel of the image is considered to be the cluster centre of the first class.

Then the second pixel is taken up and its membership of the first cluster is found by computing the distance between this point and the cluster centre of class 1.

If the distance between the pixel and the cluster centre of class 1 is less than or equal to R, then this pixel belongs to class 1.

Class 1 now has two points within its cluster, and the cluster centre of class 1 is modified by taking the average value of both pixels.
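The update for the first two pixels can be traced with the values shown in the accompanying figures (Pixel 1 at (10, 10), Pixel 2 at (20, 20), R = 15). This minimal sketch assumes Euclidean distance, which the slides do not state explicitly; the variable names are illustrative.

```python
import math

R = 15.0                  # cluster radius, as in the figure
centre = (10.0, 10.0)     # pixel 1 becomes the centre of class 1
count = 1                 # number of pixels currently in class 1
pixel = (20.0, 20.0)      # pixel 2

if math.dist(pixel, centre) <= R:
    # Pixel joins class 1; update the centre as a running mean.
    count += 1
    centre = tuple(c + (x - c) / count for c, x in zip(centre, pixel))

print(centre, count)
```

The distance between the two pixels is about 14.14, which is within R, so the centre moves to the average of the two pixels.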




[Figure: Pixel 1 (10, 10), Pixel 2 (20, 20) and Pixel 3 (30, 20) plotted in feature space. Axes: Band 1 brightness values and Band 2 brightness values.]



[Figure: Pixel 1 (10, 10) and Pixel 2 (20, 20) plotted in feature space. Axis label: Band 2 brightness values.]



[Figure: Cluster #1 after the 1st iteration, centred at (15, 15), with R = 15. Axis label: Band 5 brightness values.]



CLUSTER Algorithm

Now the third pixel is taken up for examination.

If the distance between this pixel and the cluster centre of class 1 is less than or equal to R, then the pixel belongs to class 1.

Adjust the cluster centre of class 1 by taking the average values of all three pixels.

If the distance of the third pixel exceeds R, then this pixel does not belong to class 1; hence this pixel now becomes the cluster centre of a new class, i.e. class 2.
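The sequential rule above might be sketched as follows for any number of pixels. Two assumptions are made that the slides leave open: distance is Euclidean, and a pixel is tested against its nearest existing centre when more than one cluster exists. The function name build_clusters is hypothetical.

```python
import math

def build_clusters(pixels, radius):
    """Sequentially build clusters; returns a list of [centre, pixel_count]."""
    clusters = []
    for pixel in pixels:
        # Assumption: compare the pixel with its nearest existing centre.
        nearest = min(clusters, key=lambda c: math.dist(pixel, c[0]), default=None)
        if nearest is not None and math.dist(pixel, nearest[0]) <= radius:
            # Pixel joins the cluster; move the centre to the new mean.
            nearest[1] += 1
            nearest[0] = tuple(m + (x - m) / nearest[1]
                               for m, x in zip(nearest[0], pixel))
        else:
            # Pixel is too far from every centre; it starts a new cluster.
            clusters.append([tuple(float(x) for x in pixel), 1])
    return clusters

# The three pixels from the figures, with R = 15.
clusters = build_clusters([(10, 10), (20, 20), (30, 20)], radius=15)
print(clusters)
```

With these values the third pixel lies about 15.81 from the class 1 centre (15, 15), which exceeds R, so it becomes the centre of class 2.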



This process of building clusters continues till N pixels have been examined for their membership of the clusters of different classes.

At this point, the cluster building process stops temporarily and the distances between class clusters are examined for their separability.

The class clusters that have now been identified have to be checked so that the cluster centres of all classes are separated by a minimum value C.

Those clusters which lie at a distance less than C from each other have to be merged together, as they belong to the same cluster.

The new cluster centre of a merged cluster is found by taking the weighted average value of the old cluster centres being merged.
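A minimal sketch of the merging step follows. It assumes that the weights in the weighted average are the pixel counts of the clusters being merged and that distance is Euclidean; neither is stated on the slide. The name merge_close_clusters is hypothetical.

```python
import math

def merge_close_clusters(clusters, min_separation):
    """Merge clusters whose centres are closer than min_separation.

    clusters is a list of (centre, pixel_count) pairs.
    """
    clusters = [(tuple(c), n) for c, n in clusters]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (ci, ni), (cj, nj) = clusters[i], clusters[j]
                if math.dist(ci, cj) < min_separation:
                    total = ni + nj
                    # Assumption: weight each old centre by its pixel count.
                    centre = tuple((a * ni + b * nj) / total
                                   for a, b in zip(ci, cj))
                    clusters[i] = (centre, total)
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan, since a centre has moved
    return clusters

example = [((15.0, 15.0), 2), ((30.0, 20.0), 1)]
print(merge_close_clusters(example, min_separation=20))
print(merge_close_clusters(example, min_separation=10))
```

With a pixel-count weighting, the merged centre equals the mean of all pixels that belonged to the two original clusters.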




Once the cluster centres have been checked for proper separability, the building of clusters resumes from the point where it had stopped.

It is found that the centres of the clusters which have been identified tend to move in the initial phase and, as more points are examined, the positions of the clusters start to stabilize before converging to a fixed position.

This process of cluster building continues till the maximum number of cluster centres (Cmax) has been identified or the end of the image is encountered.

Finally, the separability of each cluster is checked before proceeding to Pass 2.




[Figure: Cluster #1 and Cluster #2 centres, each shown moving from its beginning position to its ending position. Axis label: Band 5 brightness values.]




Pass 2: Classification of Image

Having identified the cluster centres of all the classes, the classification of the image starts.

Each point is assigned a class membership on the basis of a minimum-distance-to-means classifier.
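A minimum-distance-to-means classifier could look like the sketch below, assuming Euclidean distance. The centre values are taken from the three-pixel example for illustration, and the function name classify is an illustrative choice.

```python
import math

def classify(pixels, centres):
    """Return, for each pixel, the index of its nearest cluster centre."""
    return [
        min(range(len(centres)), key=lambda k: math.dist(pixel, centres[k]))
        for pixel in pixels
    ]

centres = [(15.0, 15.0), (30.0, 20.0)]   # cluster centres from Pass 1
pixels = [(10, 10), (20, 20), (30, 20)]
labels = classify(pixels, centres)
print(labels)
```

Unlike Pass 1, this step does not use R: every pixel receives the label of the closest centre, however far away that centre is.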

When the whole image has been classified, the analyst examines the classified image.

Since the classes that have been identified are basically spectral classes and not informational classes, the analyst now has to undertake the process of converting the spectral classes into informational classes.




In this process of conversion it is found that two or more spectral classes may combine together to yield a single information class.

This process is rather tedious, cumbersome, and complex, and hence requires a great amount of expertise on the part of the analyst in merging many spectral classes into one informational class.
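In code, the conversion amounts to a many-to-one lookup from spectral class to information class. The sketch below is purely illustrative: the class labels and the mapping are hypothetical, since in practice they come from the analyst's interpretation.

```python
# Hypothetical mapping chosen by the analyst: spectral classes 0 and 1
# both turn out to represent the same information class.
spectral_to_information = {0: "water", 1: "water", 2: "forest", 3: "urban"}

# Spectral class labels produced by Pass 2 for a few pixels (made up).
spectral_labels = [0, 2, 1, 3, 2]

information_labels = [spectral_to_information[k] for k in spectral_labels]
print(information_labels)
```

The lookup itself is trivial; the expertise lies in deciding which spectral classes belong to which information class.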
