CE-3219-supervised-classification-2010



IMAGE CLASSIFICATION & ANALYSIS


In my previous session I discussed the role of image transformation in remote sensing digital analysis.

In this session I will discuss the various techniques by which digital satellite data can be converted into information of interest.


    IMAGE CLASSIFICATION & ANALYSIS

An analyst attempts to classify features in an image by using the elements of visual interpretation to identify homogeneous groups of pixels that represent various features or land cover classes of interest.

In digital image classification, the analyst uses the spectral information represented by the digital numbers in one or more spectral bands, and attempts to classify each individual pixel based on this spectral information. This type of classification is termed spectral pattern recognition.

In either case, the objective is to assign all pixels in the image to particular classes or themes. The resulting classified image is a mosaic of pixels, each of which belongs to a particular theme, and is essentially a thematic map of the original image.


TYPES OF CLASS

Information classes are those categories of interest that the analyst is actually trying to identify in the imagery, such as different kinds of crops, different forest types or tree species, different geologic units or rock types, etc.

Spectral classes are groups of pixels that are uniform (or near-similar) with respect to their brightness values in the different spectral channels of the data.

The objective is to match the spectral classes in the data to the information classes of interest. However, it is rare that there is a simple one-to-one match between these two types of classes.



Many times it is found that two or three spectral classes merge to form one informational class, while some classes may not be of any particular interest.

It is the analyst's job to decide on the utility of the different spectral classes and their correspondence to useful information classes.


Common classification procedures can be broken down into two broad subdivisions based on the method used:

i. supervised classification
ii. unsupervised classification


SUPERVISED CLASSIFICATION

In a supervised classification, the analyst identifies in the imagery homogeneous, representative samples of the different surface cover types (information classes) of interest. These samples are referred to as training areas.

The selection of appropriate training areas is based on the analyst's familiarity with the geographical area and knowledge of the actual surface cover types present in the image. Thus, the analyst is supervising the categorization of a set of specific classes.


The numerical information in all spectral bands for the pixels comprising these areas is used to train the computer to recognize spectrally similar areas for each class. The computer uses special programs or algorithms to determine the numerical signatures for each training class.

Once the computer has determined the signatures for each class, each pixel in the image is compared to these signatures and labeled as the class it most closely resembles digitally.

Thus, in a supervised classification, the analyst first identifies the information classes and then determines the spectral classes which represent them.


UNSUPERVISED CLASSIFICATION

In essence, it is the reverse of the supervised classification process. Spectral classes are grouped first, based solely on the numerical information in the data, and are then matched by the analyst to information classes (if possible).

Programs called clustering algorithms are used to determine the natural groupings or structures in the data. Usually, the analyst specifies how many groups or clusters are to be looked for in the data.

In addition to specifying the desired number of classes, the analyst may also specify parameters related to the separation distance amongst the clusters and the variation within each cluster.


The final result of this iterative clustering process may include some clusters that the analyst would like to subsequently combine, or clusters that need to be broken down further; each of these requires a further iteration of the clustering algorithm.

Thus, unsupervised classification is not completely without human intervention. However, it does not start with a pre-determined set of classes as in a supervised classification.


SUPERVISED CLASSIFICATION

In order to carry out a supervised classification, the analyst has to adopt a well defined procedure so as to achieve a satisfactory classification of information. The important aspects of conducting a rigorous and systematic supervised classification of remote sensor data are as follows:

(i) Selection of an appropriate classification scheme.
(ii) Selection of representative areas as training sites.
(iii) Extraction of training data statistics.
(iv) Testing of training data for separability in order to identify the best possible combination of bands for classification.
(v) Selection of an appropriate classification algorithm.
(vi) Classification of the image into the defined classes.
(vii) Evaluation of classification accuracy.


CLASSIFICATION SCHEME

Classification schemes have been developed so that meaningful land use and land cover data, as obtained by interpreting remote sensing data, can readily be incorporated. Some of the important ones are:

U.S. Geological Survey Land Use/Land Cover Classification System,
Michigan Classification System, and
Cowardin Wetland Classification System.


U.S. Geological Survey Land Use/Land Cover Classification System (categories 5 to 9):

5. Water
   51 Streams and canals
   52 Lakes
   53 Reservoirs
   54 Bays and estuaries
6. Wetland
   61 Forested wetland
   62 Non-forested wetland
7. Barren land
   71 Dry salt flats
   72 Beaches
   73 Sandy areas other than beaches
   74 Bare exposed rocks
   75 Strip mines, quarries, and gravel pits
   76 Transitional areas
   77 Mixed barren land
8. Tundra
   81 Shrub and brush tundra
   82 Herbaceous tundra
   83 Bare ground
   84 Mixed tundra
9. Perennial snow and ice
   91 Perennial snowfields
   92 Glaciers


    Training Site Selection

Once a classification scheme has been adopted, the analyst may identify and select sites within the image that are representative of the land cover classes of interest. Training data are of value only if the environment from which they are obtained is relatively homogeneous.

The image coordinates of these sites are identified and used to extract statistics from the multispectral data for each of these areas. For each feature class c, the mean value (μci) in each band and the variance-covariance matrix (Vc) are calculated in a similar manner as explained earlier.

The success of a supervised classification depends upon the training data used to identify the different classes. Hence the selection of training data has to be done meticulously, keeping in mind that each training data set has some specific characteristics. These characteristics are discussed below.


CHARACTERISTICS OF TRAINING SITE SELECTION

Number of pixels: This is an important characteristic regarding the number of pixels to be selected for each information class. There is no firm guideline available, yet in general the analyst must ensure that a sufficient number of pixels is selected.

Size: The training sites identified on the image should be large enough to provide accurate and reliable information regarding the informational class. However, they should not be too big, as large areas may include undesirable variation.

Shape: This is not an important characteristic. However, a regular shape of the selected training area provides ease in extracting the information from the satellite images.


Location: Informational classes generally exhibit some spectral variability; thus it is necessary that the training data are so located that they account for the different types of conditions within the image. It is desirable that the analyst undertakes a field visit to the selected locations to clearly mark out the selected information. In the case of inaccessible or mountainous regions, aerial photographs or maps can provide the basis for accurate delineation of training areas.

Number of training areas: The number of training areas depends upon the number of categories to be mapped, their diversity, and the resources available for delineating training areas. In general, five to ten training samples per class are selected in order to account for the spatial and spectral variability of each informational class. Selection of multiple training areas is also desirable, as some training areas of a class may have to be discarded later. It is usually better to define many small training fields than a few large ones.


Placement: The training area should be placed in such a way that it does not lie close to the edge or boundary of the information class.

Uniformity: This is one of the most critical and important characteristics of any training data for an information class. The training data collected must exhibit uniformity or homogeneity in the information. If the histogram displays one peak, i.e., a unimodal frequency distribution for each spectral class, the training data are acceptable. If it displays a multimodal distribution, there is variability or mixing of information and the training data must be discarded.


IDEALISED SEQUENCE FOR SELECTING TRAINING DATA

No fixed or rigidly defined procedure can be laid out for selecting training data. However, as a guideline, the key steps in selection and evaluation can be enumerated as follows:

(i) Collect information, including maps and aerial photographs, of the area under study. If any previous study has been carried out, acquire the necessary documents, maps, and reports.
(ii) Conduct field trips to acquire first-hand knowledge of selected and representative sites in the study area. The field trips should coincide with the date and time of data acquisition; if that is not possible, they should be at the same time of the year.
(iii) Conduct a preliminary examination of the digital data in order to assess the quality of the image.


(iv) Identify prospective training areas. These locations may be defined with respect to some easily identifiable objects on the image. Further, the same may be identified on the map and aerial photographs if readily available.
(v) Extract the training data areas from the digital image.
(vi) For each informational class, display and inspect the frequency histogram for all bands. In case of a multimodal frequency distribution, identify the training areas responsible and discard them.
(vii) Compute the training data statistics in the form of minimum and maximum, mean, standard deviation, and variance-covariance matrices.
(viii) Ascertain the separability of the informational classes using feature selection.


Example: training data statistics in Band 1, Band 2, Band 3, Band 4, Band 5, and Band 7 for the classes Agriculture - 1 and Barren Land:


Basic statistics:

Layer          1         2         3         4         5         7
Minimum    90.000    81.000    95.000    89.000   111.000    96.000
Maximum   122.000   121.000   142.000   111.000   149.000   140.000
Mean      104.985   101.685   121.108    97.870   128.649   120.696
Std. Dev.   4.080     4.911     6.759     3.693     5.427     6.253

Variance-covariance matrix:

Layer        1        2        3        4        5        6
1       16.650
2       16.706   24.118
3       21.533   29.809   45.680
4        9.714   14.328   20.485   13.641
5       13.946   19.867   28.490   16.153   29.457
6       18.071   25.354   34.853   17.607   30.133   39.097
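As a rough illustration, the per-class statistics above (minimum, maximum, mean, standard deviation, and the variance-covariance matrix) can be extracted from the training pixels with a few lines of numpy; the array of training pixel values below is hypothetical.

```python
import numpy as np

# Hypothetical training pixels for one class: shape (n_pixels, n_bands),
# i.e. the digital numbers of the training-site pixels in each band.
training_pixels = np.array([
    [104, 101, 121,  98, 128, 121],
    [106, 103, 119,  97, 130, 120],
    [103, 100, 122,  99, 127, 122],
    [105, 102, 120,  96, 129, 119],
], dtype=float)

minimum     = training_pixels.min(axis=0)            # per-band minimum
maximum     = training_pixels.max(axis=0)            # per-band maximum
mean_vector = training_pixels.mean(axis=0)           # Mc: per-band mean
std_dev     = training_pixels.std(axis=0, ddof=1)    # per-band standard deviation
cov_matrix  = np.cov(training_pixels, rowvar=False)  # Vc: variance-covariance matrix

print("Mean vector Mc:", mean_vector)
print("Covariance matrix Vc:\n", cov_matrix)
```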


Feature Selection

Once the training statistics have been systematically collected from each band for each class of interest, a judgment must be made to determine those bands that are most effective in discriminating each class from all others. This process is commonly called feature selection.

The goal is to delete from the analysis those bands that provide only redundant spectral information. In this way the dimensionality (i.e., the number of bands to be processed) of the data set may be reduced. This process minimizes the cost of the digital image classification (but hopefully not the accuracy).


Some of the statistical separability measures are:

1) City Block Distance
2) Euclidean Distance
3) Angular Separation
4) Normalized City Block Distance
5) Mahalanobis Distance
6) Divergence
7) Transformed Divergence
8) Bhattacharyya Distance
9) Jeffries-Matusita Distance


City Block Distance, commonly known as Manhattan Distance or Boxcar Distance (Kardi, 2006), is basically a separability measure representing the distance between two points in a city road grid. It examines the absolute differences between the coordinates of two objects a and b, and hence is also known as Absolute Value Distance.

Euclidean Distance is a popular measure of the distance between two points or objects, based on the Pythagorean theorem.


The Normalized City Block measure is better than the City Block distance in the sense that it is proportional to the separation of the class means and inversely proportional to their standard deviations. If the means are equal, however, it will be zero regardless of the class variances, which does not make sense for a statistical classifier based on probabilities.

Angular separation is a similarity measure rather than a distance. It represents the cosine of the angle between two objects; higher values of angular separation indicate closer similarity (Kardi, 2006).


However, all these measures do not account for overlap between class distributions due to their variation, and thus are not good measures of separability in the case of remote sensing data. For this reason, probability-based measures have also been defined.


Feature selection may involve both statistical and/or graphical analysis to determine the degree of between-class separability in the remote sensor training data.

Combinations of bands are normally ranked according to their potential ability to discriminate each class from all others using n bands at a time.

Statistical methods of feature selection are used to quantitatively select the subset of bands (or features) that provides the greatest degree of statistical separability between any two classes, c and d.


The basic problem of spectral pattern recognition is that, given a spectral distribution of data in n bands of remotely sensed data, we must find a discrimination technique that will allow separation of the major land cover categories with a minimum of error and a minimum number of bands.

Generally, the more bands analyzed in a classification, the greater the cost and perhaps the greater the amount of redundant spectral information being used. This problem is demonstrated diagrammatically using just one band and two classes in the figure below.


Figure: Feature selection with one band and two classes. Histograms (number of pixels versus brightness value) of CLASS 1 and CLASS 2 overlap around a one-dimensional decision boundary; pixels of CLASS 2 falling on the CLASS 1 side of the boundary are erroneously assigned to CLASS 1, and pixels of CLASS 1 on the other side are erroneously assigned to CLASS 2.


Examining the histograms in the figure suggests that there is substantial overlap between classes 1 and 2 in band 1 and in band 2. When there is overlap, any decision rule that one could use to separate or distinguish between the two classes must be concerned with two types of error:

1. A pixel may be assigned to a class to which it does not belong (an error of commission).
2. A pixel is not assigned to its appropriate class (an error of omission).

The goal is to select an optimum subset of bands and apply appropriate classification techniques to minimize both types of error in the classification process.


If the training data for each band are normally distributed, as suggested in the figure, it is possible to use either a divergence or a transformed divergence equation to identify the optimum subset of bands to use in the classification procedure.

Divergence was one of the first measures of statistical separability used in the machine processing of remote sensor data, and it is still widely used as a method of feature selection.


It addresses the basic problem of deciding what is the best q-band subset of n bands for use in the supervised classification process. The number of combinations, C, of n bands taken q at a time is

    C = n! / [q! (n - q)!]

Thus, if there are six Thematic Mapper bands and we are interested in the three best bands to use in the classification, this results in

    C = 6! / [3! (6 - 3)!] = 720 / (6 x 6) = 20 combinations

that must be evaluated.
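A small sketch of how the number of q-band subsets can be counted and enumerated in Python (the band numbering below simply assumes the six TM reflective bands 1-5 and 7):

```python
import math
from itertools import combinations

bands = [1, 2, 3, 4, 5, 7]   # the six Thematic Mapper reflective bands
q = 3

# C = n! / [q! (n - q)!]
print(math.comb(len(bands), q))        # -> 20 subsets to evaluate

for subset in combinations(bands, q):  # enumerate the 20 three-band subsets
    print(subset)
```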


If the best two-band combination were desired, it would be necessary to evaluate 15 possible combinations.

Divergence is computed using the mean and covariance matrices of the class statistics collected in the training phase of the supervised classification. The degree of divergence or separability between classes c and d, Divergcd, is computed according to the formula

    Divergcd = 0.5 Tr[(Vc - Vd)(Vc^-1 - Vd^-1)] + 0.5 Tr[(Vc^-1 + Vd^-1)(Mc - Md)(Mc - Md)^T]

where Tr[.] is the trace of a matrix (i.e., the sum of the diagonal elements), Vc and Vd are the covariance matrices for the two classes c and d, and Mc and Md are their mean vectors.
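A minimal numpy sketch of this divergence formula, assuming the class mean vectors and covariance matrices have already been extracted from the training data (the two-band statistics below are hypothetical):

```python
import numpy as np

def divergence(Mc, Md, Vc, Vd):
    """Diverg_cd = 0.5 Tr[(Vc - Vd)(Vc^-1 - Vd^-1)]
                 + 0.5 Tr[(Vc^-1 + Vd^-1)(Mc - Md)(Mc - Md)^T]"""
    Vc_inv, Vd_inv = np.linalg.inv(Vc), np.linalg.inv(Vd)
    d = (Mc - Md).reshape(-1, 1)                   # column vector (Mc - Md)
    term1 = 0.5 * np.trace((Vc - Vd) @ (Vc_inv - Vd_inv))
    term2 = 0.5 * np.trace((Vc_inv + Vd_inv) @ (d @ d.T))
    return term1 + term2

# Hypothetical two-band training statistics for classes c and d
Mc, Vc = np.array([40.0, 60.0]), np.array([[9.0, 2.0], [2.0, 16.0]])
Md, Vd = np.array([55.0, 48.0]), np.array([[25.0, 5.0], [5.0, 36.0]])
print(divergence(Mc, Md, Vc, Vd))
```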


It should be remembered that the size of the covariance matrices Vc and Vd is a function of the number of bands used in the training process (i.e., if six bands were trained upon, both Vc and Vd would be 6 x 6 matrices). Divergence in this case would be used to identify the statistical separability of the two training classes using six bands of training data.

However, this is not the usual goal of applying divergence. What we actually want to know is the optimum subset of q bands. For example, if q = 3, what subset of three bands provides the best separation between these two classes?


But what about the case where there are more than two classes? In this instance, the most common solution is to compute the average divergence, Divergavg. This involves computing the average divergence over all possible pairs of classes, c and d, while holding the subset of bands, q, constant. Then another subset of bands, q, is selected for the m classes and analyzed. The subset of features (bands) having the maximum average divergence may be the superior set of bands to use in the classification algorithm. This can be expressed as:

    Divergavg = [ sum over c = 1 to m-1, sum over d = c+1 to m of Divergcd ] / C

where C is the number of class pairs evaluated.


Using this, the band subset, q, with the highest average divergence would be selected as the most appropriate set of bands for classifying the m classes. Kumar and Silva (1977) suggest that it is possible to take the divergence logic one step further and compute the transformed divergence, TDivergcd, expressed as:

    TDivergcd = 2000 { 1 - exp[ -(Divergcd) / 8 ] }

This statistic gives an exponentially decreasing weight to increasing distances between the classes. It also scales the divergence values to lie between 0 and 2000.
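A one-line sketch of the transformed-divergence scaling, applied to a few illustrative raw divergence values:

```python
import math

def transformed_divergence(diverg_cd):
    """TDiverg_cd = 2000 { 1 - exp[-(Diverg_cd) / 8] }; scaled to lie in 0..2000."""
    return 2000.0 * (1.0 - math.exp(-diverg_cd / 8.0))

# A few illustrative raw divergence values
for d in (2.0, 10.0, 30.0):
    print(d, "->", round(transformed_divergence(d), 1))
```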


There is no need to compute the divergence using all six bands, since this represents the totality of the data set. It is useful, however, to calculate the divergence within individual channels (q = 1), since a single channel might adequately discriminate between all classes of interest.

A transformed divergence value of 2000 suggests excellent between-class separation. Above 1900 provides good separation, while below 1700 is poor.


There are other methods of feature selection also based on determining the separability between two classes at a time. For example, the Bhattacharyya distance assumes that the two classes, c and d, are Gaussian in nature and that the means Mc and Md and covariance matrices Vc and Vd are available. It is computed as:

    Bhatcd = (1/8)(Mc - Md)^T [(Vc + Vd)/2]^-1 (Mc - Md) + (1/2) Ln{ det[(Vc + Vd)/2] / sqrt[det(Vc) det(Vd)] }

To select the best q features (i.e., combination of bands) from the original n bands in an m-class problem, the Bhattacharyya distance is calculated between each of the m(m - 1)/2 pairs of classes for each of the possible ways of choosing q features from n dimensions.


The best q features are those dimensions whose sum of the Bhattacharyya distances between the m(m - 1)/2 pairs of classes is highest.

The JM (Jeffries-Matusita) distance (also sometimes called the Bhattacharyya distance) between a pair of probability distributions (spectral classes) is defined as

    Jij = integral over x of { sqrt[p(x|i)] - sqrt[p(x|j)] }^2 dx

This is seen to be a measure of the average distance between the two class density functions. For normally distributed classes this becomes

    Jij = 2 (1 - e^-B)

where B is the Bhattacharyya distance between the two classes.
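A sketch of the Bhattacharyya distance and the corresponding JM distance for a pair of normally distributed classes, following the formulas above (the training statistics are again hypothetical):

```python
import numpy as np

def bhattacharyya(Mc, Md, Vc, Vd):
    """Bhat_cd = 1/8 (Mc-Md)^T [(Vc+Vd)/2]^-1 (Mc-Md)
               + 1/2 Ln( det[(Vc+Vd)/2] / sqrt[det(Vc) det(Vd)] )"""
    V = 0.5 * (Vc + Vd)
    d = (Mc - Md).reshape(-1, 1)
    term1 = 0.125 * (d.T @ np.linalg.inv(V) @ d).item()
    term2 = 0.5 * np.log(np.linalg.det(V) /
                         np.sqrt(np.linalg.det(Vc) * np.linalg.det(Vd)))
    return term1 + term2

def jeffries_matusita(Mc, Md, Vc, Vd):
    """JM distance for normally distributed classes: J = 2 (1 - e^-B)."""
    return 2.0 * (1.0 - np.exp(-bhattacharyya(Mc, Md, Vc, Vd)))

# Hypothetical two-band training statistics for classes c and d
Mc, Vc = np.array([40.0, 60.0]), np.array([[9.0, 2.0], [2.0, 16.0]])
Md, Vd = np.array([55.0, 48.0]), np.array([[25.0, 5.0], [5.0, 36.0]])
print(bhattacharyya(Mc, Md, Vc, Vd), jeffries_matusita(Mc, Md, Vc, Vd))
```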



SELECTION OF APPROPRIATE CLASSIFICATION ALGORITHM

Various supervised classification methods have been used to assign an unknown pixel to one of the classes. The choice of a particular classifier or decision rule depends on the nature of the input data and the desired output.

Parametric classification algorithms assume that the observed measurement vectors Xc obtained for each class in each spectral band during the training phase of the supervised classification are Gaussian in nature (i.e., they are normally distributed). Nonparametric classification algorithms make no such assumption.

Among the most frequently used classification algorithms are the Minimum Distance, Parallelepiped, and Maximum Likelihood classifiers.



Minimum-Distance to Means Classification

This is one of the simplest and most commonly used decision-rule classifiers. Here the analyst provides the mean vector for each class in each band, μck, from the training data. To perform a minimum distance classification, the distance from each unknown pixel (with brightness value BVijk) to each mean vector μck is computed, using the Euclidean distance based on the Pythagorean theorem; for two bands k and l:

    Dist = sqrt[ (BVijk - μck)^2 + (BVijl - μcl)^2 ]

The unknown pixel is assigned to the class to which it has the smallest distance. This can result in classification accuracies comparable to other, more computationally intensive algorithms such as the maximum likelihood algorithm.
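A minimal sketch of the minimum-distance rule, assuming the class mean vectors have already been derived from training data; the class names, mean vectors, and pixel values are hypothetical:

```python
import numpy as np

# Hypothetical class mean vectors (one row per class, one column per band)
class_names = ["water", "forest", "bare soil"]
class_means = np.array([
    [25.0, 18.0, 10.0,  8.0],
    [40.0, 35.0, 30.0, 90.0],
    [80.0, 75.0, 85.0, 60.0],
])

def minimum_distance_classify(pixel, means):
    """Assign the pixel to the class whose mean vector is closest
    in Euclidean distance (the Pythagorean theorem in n dimensions)."""
    distances = np.sqrt(((means - pixel) ** 2).sum(axis=1))
    return int(np.argmin(distances))

pixel = np.array([42.0, 33.0, 28.0, 85.0])   # brightness values BVijk of one pixel
print(class_names[minimum_distance_classify(pixel, class_means)])  # -> forest
```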



The Parallelepiped Classifier

This algorithm is based on simple Boolean and/or logic. Training data statistics in n spectral bands are used in performing the classification.

Brightness values from each pixel of the multispectral imagery are used to produce an n-dimensional mean vector Mc = (μc1, μc2, μc3, ..., μcn), with μck being the mean value of the training data obtained for class c in band k out of m possible classes, as previously defined. σck is the standard deviation of the training data of class c in band k.

Using a one-standard-deviation threshold, a parallelepiped algorithm decides that BVijk is in class c if, and only if,

    μck - σck <= BVijk <= μck + σck

where c = 1, 2, 3, ..., m, the number of classes, and k = 1, 2, 3, ..., n, the number of bands.


Therefore, if the low and high decision boundaries are defined as

    Lowck = μck - σck

and

    Highck = μck + σck

the parallelepiped algorithm becomes

    Lowck <= BVijk <= Highck

These decision boundaries form an n-dimensional parallelepiped in feature space. If the pixel value lies between the low and high thresholds for a class in all n bands evaluated, it is assigned to that class; otherwise it is assigned to an unclassified category.
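A minimal sketch of the one-standard-deviation parallelepiped rule described above; the training statistics are hypothetical, and overlapping parallelepipeds are resolved here simply by taking the first matching class:

```python
import numpy as np

def parallelepiped_classify(pixel, means, stds):
    """Assign the pixel to the first class c satisfying
    mu_ck - sigma_ck <= BVijk <= mu_ck + sigma_ck in every band k;
    return None (unclassified) if no class matches."""
    low, high = means - stds, means + stds                 # decision boundaries
    inside = np.all((pixel >= low) & (pixel <= high), axis=1)
    hits = np.flatnonzero(inside)
    return int(hits[0]) if hits.size else None

# Hypothetical training statistics (one row per class, one column per band)
means = np.array([[25.0, 18.0], [40.0, 35.0], [80.0, 75.0]])
stds  = np.array([[ 4.0,  3.0], [ 6.0,  5.0], [ 7.0,  6.0]])

print(parallelepiped_classify(np.array([42.0, 33.0]), means, stds))  # -> 1
print(parallelepiped_classify(np.array([60.0, 55.0]), means, stds))  # -> None (unclassified)
```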


    Maximum Likelihood Classifier


The maximum likelihood classifier assigns each pixel to the class for which it has the highest probability of membership, computed from the training statistics of each class; ac denotes the probability (prior probability) of class c existing.

Theoretically, the prior for each class is given equal weightage if no knowledge regarding the existence of the features on the ground is available. If the chance of a particular class existing is greater than the others, then the user can define a set of a priori probabilities for the features and the equation can be slightly modified.

Decide X is in class c if, and only if,

    pc > pi, where i = 1, 2, 3, ..., m possible classes

and

    pc = loge(ac) - [0.5 loge{det(Vc)}] - [0.5 (X - Mc)^T (Vc)^-1 (X - Mc)]

The use of a priori probabilities helps in incorporating the effects of relief and other terrain characteristics. The disadvantage of this classifier is that it requires a large computer memory and computing time, and yet sometimes may not produce the best results.
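A sketch of the maximum likelihood decision rule in this form, assuming equal a priori probabilities; the two-band class statistics are hypothetical:

```python
import numpy as np

def max_likelihood_classify(X, means, covs, priors):
    """Assign X to the class c with the largest discriminant
    pc = ln(ac) - 0.5 ln det(Vc) - 0.5 (X - Mc)^T Vc^-1 (X - Mc)."""
    scores = []
    for Mc, Vc, ac in zip(means, covs, priors):
        d = (X - Mc).reshape(-1, 1)
        pc = (np.log(ac)
              - 0.5 * np.log(np.linalg.det(Vc))
              - 0.5 * (d.T @ np.linalg.inv(Vc) @ d).item())
        scores.append(pc)
    return int(np.argmax(scores))

# Hypothetical two-band training statistics for three classes
means  = [np.array([25.0, 18.0]), np.array([40.0, 35.0]), np.array([80.0, 75.0])]
covs   = [np.array([[16.0, 2.0], [2.0, 9.0]])] * 3
priors = [1/3, 1/3, 1/3]          # equal weightage when no prior knowledge exists

print(max_likelihood_classify(np.array([42.0, 33.0]), means, covs, priors))  # -> 1
```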



CLASSIFICATION ACCURACY ASSESSMENT

No classification task using remote sensing data is complete until an assessment of accuracy is performed. The analyst and the user of a classified map would like to know how accurately the classes on the ground have been identified on the image.

The term accuracy correlates to correctness. In digital image processing, accuracy is a measure of agreement between standard information at a given location and the information at the same location on the classified image.

Generally, the accuracy assessment is based on the comparison of two maps: one based on the analysis of remote sensing data, and a second based on information derived from the actual ground, also known as the reference map.



This reference map is often compiled from detailed information gathered from different sources and is thought to be more accurate than the map to be evaluated. The reference map consists of a network of discrete parcels, each designated by a single label.

The simplest method of evaluation is to compare the two maps with respect to the areas assigned to each class or category. This yields a report of the areal extents of the classes that agree with each other.

The accuracy assessment is presented either as an overall classification accuracy or as a site-specific accuracy. Overall classification accuracy represents the overall agreement between the two maps in terms of the total area for each category; it does not take into account the agreement or disagreement between the two maps at specific locations. The second form of accuracy measure is site-specific accuracy, which is based upon a detailed assessment of agreement between the two maps at specific locations.



Error Matrix

The standard form for reporting site-specific accuracy is the error matrix, also known as the confusion matrix or the contingency table. An error matrix not only identifies the overall error for each category, but also the misclassification for each category.

An error matrix essentially consists of an n by n array, where n is the number of classes or categories on the reference map. Here the rows of the matrix represent the true classes or information on the reference map, while the columns of the matrix represent the classes as identified on the classified map.



Schematic error matrix: the rows are the classes on the reference image (Urban, Crop, Range, Water, Forest, Barren) and the columns are the same classes on the classified image. The diagonal cells hold the correctly identified pixels, and the bottom-right cell holds the total sum of correctly identified pixels. Row totals form the row marginals and column totals the column marginals. Each row shows errors of omission, while each column shows errors of commission.


The values in the last column give the total number of true points per class used for assessing the accuracy. Similarly, the total at the bottom of each column gives the number of points/pixels per class in the classified map.

The diagonal elements of the error matrix indicate the number of points/pixels correctly identified in both the reference and classified maps. The sum of all these diagonal elements is entered in the bottom-right element, i.e., the total number of points/pixels correctly classified in both the reference and classified maps.


The off-diagonal elements of the error matrix provide information on errors of omission and commission, respectively. Errors of omission are found in the off-diagonal cells along each row; for each class the omission error is computed by taking the sum of all the non-diagonal elements along that row and dividing it by the row total of that class.
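A sketch of how overall accuracy and the per-class omission and commission errors can be computed from an error matrix whose rows are reference classes and whose columns are classified-map classes (the small matrix below is hypothetical):

```python
import numpy as np

# Hypothetical 3-class error matrix:
# rows = reference (true) classes, columns = classes on the classified map.
error_matrix = np.array([
    [50,  3,  2],
    [ 4, 60,  6],
    [ 1,  5, 69],
])

diag       = np.diag(error_matrix)
row_totals = error_matrix.sum(axis=1)     # reference points per class
col_totals = error_matrix.sum(axis=0)     # classified points per class

overall_accuracy = diag.sum() / error_matrix.sum()
omission_error   = (row_totals - diag) / row_totals   # off-diagonal share of each row
commission_error = (col_totals - diag) / col_totals   # off-diagonal share of each column

print(overall_accuracy)
print(omission_error)
print(commission_error)
```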


    ACCURACY INDICES


Index: Kappa coefficient of agreement (K)
Description: proportion of agreement after removing the proportion of agreement by chance
Formula: K = (Po - Pe) / (1 - Pe)
Reference: Congalton et al. (1983)

Index: Combined accuracy (CAp)
Description: average of overall accuracy and average producer's accuracy
Formula: CAp = (1/2)[OA + Ap]
Reference: Fung and LeDrew (1988)

Index: Combined accuracy (CAu)
Description: average of overall accuracy and average user's accuracy
Formula: CAu = (1/2)[OA + Au]
Reference: Fung and LeDrew (1988)

Index: Weighted Kappa (Kw)
Description: proportion of weighted disagreement corrected for chance
Formula: Kw = (sum of vij Poij - sum of vij Peij) / (1 - sum of vij Peij)
Reference: Rosenfield & Fitzpatrick-Lins (1986)


Index: Conditional Kappa (K+i)
Description: conditional Kappa computed from the ith column of the error matrix (producer's)
Formula: K+i = (Po(i) - Pe(i)) / (1 - Pe(i))
Reference: Foody (1992), Ma and Raymond (1995)

Index: Tau coefficient (Te)
Description: Tau for classifications based on equal probabilities of class membership
Formula: Te = (Po - 1/q) / (1 - 1/q)

Index: Tau coefficient (Tp)
Description: Tau for classifications based on unequal probabilities of class membership
Formula: Tp = (Po - Pr) / (1 - Pr)

Index: Conditional Tau (Ti+)
Description: conditional Tau computed from the ith row (user's)
Reference: Naesset (1996)

Index: Conditional Tau (T+i)
Description: conditional Tau computed from the ith column (producer's)
Reference: Naesset (1996)

    Error Matrix


Actual class    Heather   Water   Forest 1   Forest 2   Bare soil   Pasture   Total
Heather             826       0          0          5          27         0     858
Water                 0     878          0          0           0         0     878
Forest 1              0       0        720        183           0         7     910
Forest 2             33       0         21        878           2         0     934
Bare soil            61       0          0          0         560         0     621
Pasture               0       0          0          1           0       219     220
Total               920     878        741       1067         589       226    4081

(Columns give the predicted class; the bottom-right value, 4081, is the total number of correctly classified pixels.)





The lower limit of the overall accuracy at the 95% confidence level is computed as

    p = p' - [ 1.645 sqrt( p' q / n ) + 50/n ]

where p = overall accuracy at the 95% confidence level (lower limit), p' = the observed overall accuracy, q = 100 - p', and n = the sample size.

If this value of p exceeds the defined criterion at the lower limit, then it is possible to accept the classification with a 95% confidence limit. Normally, the defined criterion for the confidence limit is set at 85%. For the example given above, the accuracy at the lower limit is 91.6%, and hence the classification is acceptable, as the classified map has met or exceeded the defined accuracy standard.
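A quick check of the lower limit for the example above, assuming p' = 92.3% (the observed overall accuracy) and n = 4421 (the sum of the row totals of the example error matrix):

```python
import math

p_prime = 92.3          # observed overall accuracy (percent)
n = 4421                # total number of reference points in the error matrix
q = 100.0 - p_prime

# p = p' - [1.645 sqrt(p'q/n) + 50/n]
lower_limit = p_prime - (1.645 * math.sqrt(p_prime * q / n) + 50.0 / n)
print(round(lower_limit, 1))   # -> about 91.6, above the 85% criterion
```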


OMISSION & COMMISSION ERRORS


Class      Points            Omission                           Commission
           correct     n    % correct   95% conf. limits      n    % correct   95% conf. limits
Heather        826    858      96.3       95.0 - 97.6        920      89.8       87.8 - 91.8
Water          878    878     100.0       99.4 - 100.0       878     100.0       99.4 - 100.0
Forest 1       720    910      79.1       76.4 - 81.8        741      97.2       95.9 - 98.5
Forest 2       878    934      94.0       92.4 - 95.6       1067      82.3       80.0 - 84.6
Bare soil      560    621      90.2       87.8 - 92.6        589      95.1       93.3 - 96.7
Pasture        219    220      99.5       98.3 - 100.0       226      96.9       94.4 - 99.1

Using 85% as the criterion, it is seen that Forest 1 has failed the test, as both the upper and lower limits of its accuracy at the 95% confidence level are less than 85%. Similarly, when the errors of commission are evaluated, it is found that Forest 2 fails to meet the criterion. This is also evident from the table.

It can be seen that 183 pixels have not been identified as Forest 1 and have instead been identified as the Forest 2 class. This implies that even though the overall classification accuracy has exceeded the defined criterion of acceptability, the training samples of the classes Forest 1 and Forest 2 have not been able to provide correct information to the classification process, and hence the training samples have to be collected with caution.

    KAPPA COEFFICIENT


The above procedure for determining classification accuracy is highly dependent upon the training samples used for classification and for the assessment of classification accuracy.

In order to assess the agreement between two maps, Kappa (K) is used. It is a measure of the difference between the observed agreement between two maps (as reported by the overall accuracy) and the agreement that might be contributed solely by chance matching of the two maps. It attempts to provide a measure of agreement that is adjusted for chance and is expressed as follows:

    K = (Observed - Expected) / (1 - Expected)



Observed is the overall accuracy, while expected is an estimate of the chance agreement to the observed percentage correct. Expected is computed by first taking the products of the row and column totals to estimate the number of pixels assigned to each element of the matrix, given that pixels are assigned by chance to each class. The table below shows the sample computation for the error matrix given earlier (each cell is the product of the corresponding row and column totals).

Actual \ Predicted     Heather    Water   Forest 1   Forest 2   Bare soil   Pasture
Heather (858)           789360   753324     635778     915486      505362    193908
Water (878)             807760   770884     650598     936826      517142    198428
Forest 1 (910)          837200   798980     674310     970970      535990    205660
Forest 2 (934)          859280   820052     692094     996578      550126    211084
Bare soil (621)         571320   545238     460161     662607      365769    140346
Pasture (220)           202400   193160     163020     234740      129580     49720
(Column totals:            920      878        741       1067         589       226)



Total of diagonal elements = 3046621

Total of all elements = 28866023

Expected agreement by chance = Sum of diagonal elements / Total of all elements = 0.126

    K = (0.923 - 0.126) / (1 - 0.126) = 0.904

The value of K = 0.904 means that the classification has achieved an accuracy that is 90% better than would be expected from a random assignment of pixels to classes.
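A sketch that recomputes Kappa directly from the example error matrix using the standard observed/expected formulation; the chance-agreement term here is the sum of (row total x column total) divided by N squared, so the intermediate values differ slightly from the rounded figures tabulated above, but the resulting K (about 0.90) agrees closely with the 0.904 reported.

```python
import numpy as np

# The example error matrix (rows = actual class, columns = predicted class):
# Heather, Water, Forest 1, Forest 2, Bare soil, Pasture
error_matrix = np.array([
    [826,   0,   0,   5,  27,   0],
    [  0, 878,   0,   0,   0,   0],
    [  0,   0, 720, 183,   0,   7],
    [ 33,   0,  21, 878,   2,   0],
    [ 61,   0,   0,   0, 560,   0],
    [  0,   0,   0,   1,   0, 219],
])

N = error_matrix.sum()
row_totals = error_matrix.sum(axis=1)
col_totals = error_matrix.sum(axis=0)

observed = np.trace(error_matrix) / N                # overall accuracy, about 0.923
expected = (row_totals * col_totals).sum() / N**2    # chance agreement
kappa = (observed - expected) / (1 - expected)

print(round(observed, 3), round(expected, 3), round(kappa, 3))
```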


In my next session, I will discuss unsupervised classification techniques.


THANK YOU