CE 32110 Unsupervised Classification 2010


  • 8/3/2019 CE 32110 Unsupervised Classification 2010


    UNSUPERVISED CLASSIFICATION


    Unsupervised Classification

    In contrast to supervised classification, unsupervised classification requires only a minimal amount of initial input from the analyst.

    It is a process whereby numerical operations are performed that search for natural groupings of the spectral properties of pixels, as examined in multispectral feature space.

    The user allows the computer to select the class means and covariance matrices to be used in the classification.

    Once the data are classified, the analyst attempts to assign these natural or spectral classes to the information classes of interest.

    This may not be easy.

    Some of the clusters may be meaningless because they represent mixed classes of earth surface materials. This ambiguity is resolved by an analyst who understands the spectral characteristics of the terrain well enough to classify the clusters into information classes.


    UNSUPERVISED CLASSIFICATION

    Clustering is one of the most important tasks in data mining and knowledge discovery. It attempts to find subsets within a given data set that are similar enough to warrant further analysis.

    It organizes a set of objects into groups (or clusters) such that objects in the same group are similar to each other and different from those in other groups.

    These groups or clusters should have meaning in the context of a particular problem.


    In clustering, two important and fundamental tasks are the definition of

    a) a proximity (similarity) function between two data objects, and

    b) the overall optimization search strategy, i.e. how to find the best overall grouping according to an optimization criterion.
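To make the two ingredients concrete, here is a minimal sketch that assumes Euclidean distance as the proximity function and the within-cluster sum of squared distances as the optimization criterion. Both are common choices (the latter is the criterion k-means minimizes), not ones prescribed by these notes, and the function names are illustrative. A lower score means tighter clusters.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two pixels given as band-value tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Proximity measure: Euclidean distance in multispectral feature space."""
    return math.sqrt(squared_distance(a, b))

def within_cluster_sse(pixels, labels, centres):
    """Optimization criterion: sum of squared distances from each pixel
    to the centre of the cluster it is assigned to (lower is better)."""
    return sum(squared_distance(p, centres[k]) for p, k in zip(pixels, labels))

# Three two-band pixels; the first two are grouped, the third is alone
pixels = [(10, 10), (20, 20), (30, 20)]
labels = [0, 0, 1]
centres = [(15, 15), (30, 20)]
print(within_cluster_sse(pixels, labels, centres))  # prints 100
```

Comparing this score across alternative groupings is one way to make "best overall grouping" precise; how candidate groupings are generated (the search strategy) is a separate design decision.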

    Clustering, commonly known as unsupervised classification, does not need any training data and is especially useful when the user has limited knowledge about the data.


    Clustering algorithms partition data into a certain number of clusters (groups, subsets, or categories).

    There is no universally agreed-upon definition of a cluster. Most researchers describe a cluster by considering

    a) its internal homogeneity and

    b) its external separation,

    i.e. patterns in the same cluster should be similar to each other, while patterns in different clusters should not.

    Both the similarity and the dissimilarity should be examinable in a clear and meaningful way.
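One way to make both properties examinable is sketched below, assuming Euclidean distance: internal homogeneity is measured as the mean distance of a cluster's members to its centre, and external separation as the smallest distance between any two cluster centres. These particular measures and function names are illustrative choices, not a standard API; `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations

def centre(points):
    """Mean vector of a cluster's member points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def homogeneity(points):
    """Mean distance of members to their centre (smaller = more homogeneous)."""
    c = centre(points)
    return sum(math.dist(p, c) for p in points) / len(points)

def separation(clusters):
    """Smallest distance between any two cluster centres (larger = better separated)."""
    centres = [centre(pts) for pts in clusters]
    return min(math.dist(a, b) for a, b in combinations(centres, 2))

clusters = [[(10, 10), (20, 20)], [(30, 40), (40, 40)]]
for pts in clusters:
    print(f"homogeneity: {homogeneity(pts):.2f}")  # prints 7.07, then 5.00
print(f"separation: {separation(clusters):.2f}")   # prints 32.02
```

A good partition keeps the homogeneity values small relative to the separation value.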


    STEPS IN CLUSTER ANALYSIS

    Data to cluster: data acquisition, preparation and cleaning.

    Variables to use: selection of the relevant variables for performing the clustering procedure. Irrelevant or masking variables should be excluded as far as possible.

    A proximity measure: designing a proper proximity measure.

    The clustering procedure.

    Number of clusters: even no cluster is a possible outcome.

    Replication.

    Testing and interpretation.


    Unsupervised Classification

    Clustering algorithms used for the unsupervised classification of remotely sensed data generally vary according to the efficiency with which the clustering takes place.

    A conceptually simple, but not necessarily efficient, clustering algorithm known as CLUSTER is used below to demonstrate the fundamental logic of unsupervised classification.

    This algorithm operates in a two-pass mode.

    In the first pass, the algorithm sequentially builds class clusters.

    In the second pass, a minimum-distance classifier is applied to the whole data set on a pixel-by-pixel basis, where each pixel is assigned to one of the mean vectors created in pass 1.


    CLUSTER Algorithm

    Pass 1: Cluster Building

    During the first pass, the analyst may be required to supply four types of information:

    (i) R, the radius of the cluster,

    (ii) C, a distance parameter for merging clusters,

    (iii) N, the number of pixels to be evaluated between each merging of the clusters, and

    (iv) Cmax, the maximum number of clusters to be identified by the algorithm.


    CLUSTER Algorithm

    To start building the cluster centres, the first pixel of the image is taken as the cluster centre of the first class. The second pixel is then taken up, and its membership in the first cluster is determined by computing the distance between this pixel and the cluster centre of class 1.

    If the distance between the pixel and the cluster centre of class 1 is less than or equal to R, the pixel belongs to class 1.

    Class 1 now has two pixels within its cluster, and its cluster centre is updated by taking the average value of both pixels.
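The first step can be checked numerically with the pixel values from the example figure: Pixel 1 at (10, 10), Pixel 2 at (20, 20) and R = 15. This minimal sketch assumes Euclidean distance (the notes do not name the metric). The update `c + (p - c) / n` is a running mean, which gives the same result as averaging all member pixels.

```python
import math

R = 15                 # cluster radius from the example figure
centre = (10.0, 10.0)  # Pixel 1 becomes the centre of class 1
members = 1            # number of pixels in class 1 so far

pixel2 = (20, 20)
d = math.dist(pixel2, centre)  # sqrt(10**2 + 10**2), about 14.14

if d <= R:
    # Pixel 2 joins class 1; the centre becomes the mean of both pixels
    members += 1
    centre = tuple(c + (p - c) / members for c, p in zip(centre, pixel2))

print(round(d, 2), centre)  # prints 14.14 (15.0, 15.0)
```

The resulting centre (15, 15) matches "Cluster #1 after 1st iteration" in the figure.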


    [Figure: Pixel 1 (10, 10), Pixel 2 (20, 20) and Pixel 3 (30, 20) plotted in feature space; x-axis: Band 1 brightness values, y-axis: Band 2 brightness values.]


    [Figure: Pixel 1 (10, 10) and Pixel 2 (20, 20) plotted in feature space; x-axis: Band 1 brightness values, y-axis: Band 2 brightness values.]


    [Figure: With R = 15, Cluster #1 after the 1st iteration is centred at (15, 15); x-axis: Band 4 brightness values, y-axis: Band 5 brightness values.]


    CLUSTER Algorithm

    Now the third pixel is taken up for examination. If the distance between this pixel and the cluster centre of class 1 is less than or equal to R, the pixel belongs to class 1, and the cluster centre of class 1 is adjusted by taking the average value of all three pixels.

    If the distance of the third pixel exceeds R, the pixel does not belong to class 1; instead, it becomes the cluster centre of a new class, i.e. class 2.
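Generalizing this single-pixel step into a function gives a sketch like the one below. When several clusters already exist, the notes do not say which centre is tested first; this sketch assumes the pixel is compared with the nearest centre, and again assumes Euclidean distance. `assign_pixel` is a hypothetical helper name. Running it on the three example pixels reproduces the walk-through: Pixel 3 at (30, 20) lies about 15.81 from the centre (15, 15), which exceeds R = 15, so it seeds class 2.

```python
import math

def assign_pixel(pixel, centres, counts, R):
    """Add one pixel to the nearest cluster within radius R, or start a new cluster.

    centres: list of cluster-centre tuples; counts: pixels per cluster.
    Both lists are updated in place. Returns the index of the cluster joined.
    """
    if centres:
        k = min(range(len(centres)), key=lambda i: math.dist(pixel, centres[i]))
        if math.dist(pixel, centres[k]) <= R:
            counts[k] += 1
            n = counts[k]
            # Running mean: equivalent to averaging all member pixels
            centres[k] = tuple(c + (p - c) / n for c, p in zip(centres[k], pixel))
            return k
    centres.append(tuple(float(v) for v in pixel))  # pixel seeds a new class
    counts.append(1)
    return len(centres) - 1

centres, counts = [], []
for px in [(10, 10), (20, 20), (30, 20)]:
    print(px, "-> class", assign_pixel(px, centres, counts, R=15) + 1)
print(centres)  # prints [(15.0, 15.0), (30.0, 20.0)]
```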


    CLUSTER Algorithm

    This cluster-building process continues until N pixels have been examined for their membership in the clusters of the different classes.

    At this point, cluster building stops temporarily and the distances between the class clusters are examined for separability.

    The class clusters identified so far must be checked so that the cluster centres of all classes are separated by at least a minimum value C.

    Clusters lying at a distance less than C from each other have to be merged, as they belong to the same cluster.

    The centre of a merged cluster is found by taking the weighted average of the old cluster centres being merged.
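The separability check and merge can be sketched as follows. The notes call for a weighted average without naming the weights; this sketch assumes each old centre is weighted by its pixel count, assumes Euclidean distance, and keeps merging pairs until all remaining centres are at least C apart. `merge_close_clusters` is a hypothetical name.

```python
import math

def merge_close_clusters(centres, counts, C):
    """Merge clusters whose centres are closer than C.

    The merged centre is the average of the old centres weighted by their
    pixel counts (an assumed choice of weights). Returns new lists.
    """
    centres, counts = list(centres), list(counts)
    merged = True
    while merged:
        merged = False
        for i in range(len(centres)):
            for j in range(i + 1, len(centres)):
                if math.dist(centres[i], centres[j]) < C:
                    ni, nj = counts[i], counts[j]
                    centres[i] = tuple((a * ni + b * nj) / (ni + nj)
                                       for a, b in zip(centres[i], centres[j]))
                    counts[i] = ni + nj
                    del centres[j], counts[j]
                    merged = True
                    break  # restart the scan, since centre i has moved
            if merged:
                break
    return centres, counts

centres, counts = merge_close_clusters([(15, 15), (18, 15), (40, 35)], [2, 1, 3], C=10)
print(centres, counts)  # prints [(16.0, 15.0), (40, 35)] [3, 3]
```

The first two centres are only 3 apart, so they merge into (16, 15), which is pulled towards the larger cluster.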


    CLUSTER Algorithm

    Once the cluster centres have been checked for proper separability, cluster building resumes from the point where it stopped.

    The centres of the identified clusters tend to shift position in the initial phase; as more pixels are examined, their positions start to stabilize before converging to fixed positions. This process of cluster building continues until the maximum number of cluster centres (Cmax) has been identified or the end of the image is reached.

    Finally, the separability of each cluster is checked before proceeding to Pass 2.
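Putting the Pass 1 steps together gives the sketch below, driven by the four analyst-supplied parameters R, C, N and Cmax. It is an illustration of the logic described above, not a reference implementation of CLUSTER. It assumes Euclidean distance, a nearest-centre membership test, and merged centres weighted by pixel counts; the function name `cluster_pass1` is hypothetical.

```python
import math

def cluster_pass1(pixels, R, C, N, Cmax):
    """Sketch of CLUSTER Pass 1 (cluster building).

    pixels: band-value tuples in image order. Returns (centres, counts).
    """
    centres, counts = [], []

    def merge_close():
        # Merge the first pair of centres closer than C until none remain
        while True:
            pair = next(((i, j) for i in range(len(centres))
                         for j in range(i + 1, len(centres))
                         if math.dist(centres[i], centres[j]) < C), None)
            if pair is None:
                return
            i, j = pair
            ni, nj = counts[i], counts[j]
            centres[i] = tuple((a * ni + b * nj) / (ni + nj)
                               for a, b in zip(centres[i], centres[j]))
            counts[i] = ni + nj
            del centres[j], counts[j]

    for idx, px in enumerate(pixels, start=1):
        # Nearest existing centre, or None while no cluster exists yet
        k = min(range(len(centres)),
                key=lambda i: math.dist(px, centres[i]), default=None)
        if k is not None and math.dist(px, centres[k]) <= R:
            counts[k] += 1
            centres[k] = tuple(c + (p - c) / counts[k]
                               for c, p in zip(centres[k], px))
        else:
            centres.append(tuple(float(v) for v in px))  # seed a new class
            counts.append(1)
        if idx % N == 0:
            merge_close()   # pause every N pixels to check separability
        if len(centres) >= Cmax:
            break           # Cmax cluster centres identified
    merge_close()           # final separability check before Pass 2
    return centres, counts

pixels = [(10, 10), (20, 20), (30, 20), (32, 22), (80, 75), (82, 78)]
centres, counts = cluster_pass1(pixels, R=15, C=10, N=3, Cmax=5)
print(centres)  # prints [(15.0, 15.0), (31.0, 21.0), (81.0, 76.5)]
print(counts)   # prints [2, 2, 2]
```

Because pixels are processed in image order, the result depends on that order; this is one reason the notes describe the algorithm as conceptually simple but not necessarily efficient.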


    [Figure: Positions of the centres of Cluster #1 and Cluster #2 at the beginning and at the end of cluster building; x-axis: Band 4 brightness values, y-axis: Band 5 brightness values.]


    Unsupervised Classification

    Pass 2: Classification of Image

    Having identified the cluster centres of all the classes, the classification of the image starts.

    Each pixel is assigned a class membership on the basis of a minimum-distance-to-means classifier.
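A minimum-distance-to-means classifier can be sketched in a few lines: each pixel receives the index of the nearest cluster centre. Euclidean distance is assumed, and the centres and the 2 x 2 image below are illustrative values only (the centres are of the kind Pass 1 might produce).

```python
import math

def classify_min_distance(pixels, centres):
    """Pass 2: assign each pixel to the class whose centre (mean vector) is nearest."""
    return [min(range(len(centres)), key=lambda k: math.dist(px, centres[k]))
            for px in pixels]

# Illustrative centres and a 2 x 2 image with two bands per pixel
centres = [(15.0, 15.0), (31.0, 21.0), (81.0, 76.5)]
image = [[(12, 14), (29, 22)],
         [(78, 80), (16, 18)]]
labels = [classify_min_distance(row, centres) for row in image]
print(labels)  # prints [[0, 1], [2, 0]]
```

Unlike Pass 1, every pixel is labelled here regardless of R, because the classifier only asks which centre is closest.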

    When the whole image has been classified, the analyst examines the classified image.

    Since the classes identified are basically spectral classes and not information classes, the analyst now has to convert the spectral classes into information classes.


    Unsupervised Classification

    In this conversion process, two or more spectral classes may combine to yield a single information class.

    This process is rather tedious, cumbersome and complex, and hence requires a great deal of expertise on the part of the analyst in merging many spectral classes into one information class.
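Once the analyst has decided which spectral classes correspond to which information class, applying that decision is mechanical. In the minimal sketch below, the class names and the mapping itself are hypothetical: choosing the mapping is the expert step described above, and the code only applies it and lists which spectral classes were combined.

```python
# Hypothetical analyst decisions: spectral class index -> information class.
# Spectral classes 1 and 2 are assumed to be two kinds of vegetation.
spectral_to_info = {0: "water", 1: "vegetation", 2: "vegetation"}

def to_information_classes(labels, mapping):
    """Relabel a classified image (rows of spectral class indices)."""
    return [[mapping[k] for k in row] for row in labels]

labels = [[0, 1], [2, 0]]  # spectral classes from Pass 2
info = to_information_classes(labels, spectral_to_info)
print(info)  # prints [['water', 'vegetation'], ['vegetation', 'water']]

# Which spectral classes were combined into each information class
merged = {}
for spectral, info_class in spectral_to_info.items():
    merged.setdefault(info_class, []).append(spectral)
print(merged)  # prints {'water': [0], 'vegetation': [1, 2]}
```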