CE-3219-supervised-classification-2010



IMAGE CLASSIFICATION & ANALYSIS


In my previous session I discussed the role of image transformation in remote sensing digital analysis.

In this session I will discuss the various techniques by which digital satellite data can be converted into information of interest.


    IMAGE CLASSIFICATION & ANALYSIS

An analyst attempts to classify features in an image by using the elements of visual interpretation to identify homogeneous groups of pixels that represent various features or land cover classes of interest.

In digital image classification, the analyst uses the spectral information represented by the digital numbers in one or more spectral bands, and attempts to classify each individual pixel based on this spectral information. This type of classification is termed spectral pattern recognition.

In either case, the objective is to assign all pixels in the image to particular classes or themes. The resulting classified image is a mosaic of pixels, each of which belongs to a particular theme, and is essentially a thematic map of the original image.


TYPES OF CLASS

Information classes are those categories of interest that the analyst is actually trying to identify in the imagery, such as different kinds of crops, different forest types or tree species, different geologic units or rock types, etc.

Spectral classes are groups of pixels that are uniform (or near-similar) with respect to their brightness values in the different spectral channels of the data.

The objective is to match the spectral classes in the data to the information classes of interest. However, it is rare that there is a simple one-to-one match between these two types of classes.



Many times it is found that two or three spectral classes merge to form one informational class, while some classes may not be of any particular interest.

It is the analyst's job to decide on the utility of the different spectral classes and their correspondence to useful information classes.


Common classification procedures can be broken down into two broad subdivisions based on the method used:

i. supervised classification
ii. unsupervised classification


SUPERVISED CLASSIFICATION

In a supervised classification, the analyst identifies in the imagery homogeneous, representative samples of the different surface cover types (information classes) of interest. These samples are referred to as training areas.

The selection of appropriate training areas is based on the analyst's familiarity with the geographical area and knowledge of the actual surface cover types present in the image. Thus, the analyst is supervising the categorization of a set of specific classes.


The numerical information in all spectral bands for the pixels comprising these areas is used to train the computer to recognize spectrally similar areas for each class. The computer uses special programs or algorithms to determine the numerical signatures for each training class.

Once the computer has determined the signatures for each class, each pixel in the image is compared to these signatures and labeled as the class it most closely resembles digitally.

Thus, in a supervised classification, the analyst first identifies the information classes and then determines the spectral classes which represent them.


UNSUPERVISED CLASSIFICATION

In essence, it is the reverse of the supervised classification process. Spectral classes are grouped first, based solely on the numerical information in the data, and are then matched by the analyst to information classes (if possible).

Programs called clustering algorithms are used to determine the natural groupings or structures in the data. Usually, the analyst specifies how many groups or clusters are to be looked for in the data.

In addition to specifying the desired number of classes, the analyst may also specify parameters related to the separation distance amongst the clusters and the variation within each cluster.


The final result of this iterative clustering process may include some clusters that the analyst would like to subsequently combine, or clusters that need to be broken down further; each of these requires a further iteration of the clustering algorithm.

Thus, unsupervised classification is not completely without human intervention. However, it does not start with a pre-determined set of classes as in a supervised classification.


SUPERVISED CLASSIFICATION

In order to carry out a supervised classification, the analyst has to adopt a well defined procedure so as to achieve a satisfactory classification of information. The important aspects of conducting a rigorous and systematic supervised classification of remote sensor data are as follows:

(i) Selection of an appropriate classification scheme.
(ii) Selection of representative areas as training sites.
(iii) Extraction of training data statistics.
(iv) Testing of training data for separability in order to identify the best possible combination of bands for classification.
(v) Selection of an appropriate classification algorithm.
(vi) Classification of the image into the defined classes.
(vii) Evaluation of classification accuracy.


CLASSIFICATION SCHEME

Classification schemes have been developed so that meaningful land use and land cover data, as obtained by interpreting remote sensing data, can readily be incorporated. Some of the important ones are:

U.S. Geological Survey Land Use/Land Cover Classification System,
Michigan Classification System, and
Cowardin Wetland Classification System.


U.S. Geological Survey Land Use/Land Cover Classification System (categories 5 to 9):

5. Water
   51 Streams and canals
   52 Lakes
   53 Reservoirs
   54 Bays and estuaries
6. Wetland
   61 Forested wetland
   62 Non-forested wetland
7. Barren land
   71 Dry salt flats
   72 Beaches
   73 Sandy areas other than beaches
   74 Bare exposed rocks
   75 Strip mines, quarries, and gravel pits
   76 Transitional areas
   77 Mixed barren land
8. Tundra
   81 Shrub and brush tundra
   82 Herbaceous tundra
   83 Bare ground
   84 Mixed tundra
9. Perennial snow and ice
   91 Perennial snowfields
   92 Glaciers


    Training Site Selection

Once a classification scheme has been adopted, the analyst may identify and select sites within the image that are representative of the land cover classes of interest. Training data are of value only if the environment from which they are obtained is relatively homogeneous.

The image coordinates of these sites are identified and used to extract statistics from the multispectral data for each of these areas. For each feature class c, the mean value (μci) in each band and the variance-covariance matrix (Vc) are calculated in a similar manner as explained earlier.

The success of a supervised classification depends upon the training data used to identify the different classes. Hence the selection of training data has to be done meticulously, keeping in mind that each training data set has some specific characteristics. These characteristics are discussed below.


CHARACTERISTICS OF TRAINING SITE SELECTION

Number of pixels: This is an important characteristic regarding the number of pixels to be selected for each information class. There is no firm guideline available, yet in general the analyst must ensure that a sufficient number of pixels is selected.

Size: The training sites identified on the image should be large enough to provide accurate and reliable information regarding the informational class. However, they should not be too big, as large areas may include undesirable variation.

Shape: This is not an important characteristic. However, a regular shape of the selected training area provides ease in extracting the information from the satellite images.


Location: Informational classes generally exhibit some spectral variability; thus it is necessary that the training data are so located that they account for the different types of conditions within the image. It is desirable that the analyst undertakes a field visit to the selected locations to clearly mark out the selected information. In the case of inaccessible or mountainous regions, aerial photographs or maps can provide the basis for accurate delineation of training areas.

Number of training areas: The number of training areas depends upon the number of categories to be mapped, their diversity, and the resources available for delineating training areas. In general, five to ten training samples per class are selected in order to account for the spatial and spectral variability of each informational class. Selection of multiple training areas is also desirable, as some training areas of a class may have to be discarded later. It is usually better to define many small training fields than a few large ones.


Placement: The training area should be placed in such a way that it does not lie close to the edge or boundary of the information class.

Uniformity: This is one of the most critical and important characteristics of any training data for an information class. The training data collected must exhibit uniformity or homogeneity in the information. If the histogram displays one peak, i.e., a unimodal frequency distribution for each spectral class, the training data are acceptable. If it displays a multimodal distribution, there is variability or mixing of information and the training data must be discarded.


IDEALISED SEQUENCE FOR SELECTING TRAINING DATA

No fixed or rigidly defined procedure can be laid out for selecting training data. However, as a guideline, the key steps in selection and evaluation can be enumerated as follows:

(i) Collect information, including maps and aerial photographs, of the area under study. If any previous study has been carried out, acquire the necessary documents, maps, and reports.
(ii) Conduct field trips to acquire first-hand knowledge of selected and representative sites in the study area. The field trips should coincide with the date and time of data acquisition; if that is not possible, they should be at the same time of the year.
(iii) Conduct a preliminary examination of the digital data in order to assess the quality of the image.


(iv) Identify prospective training areas. These locations may be defined with respect to some easily identifiable objects on the image. Further, the same may be identified on the map and aerial photographs if readily available.
(v) Extract the training data areas from the digital image.
(vi) For each informational class, display and inspect the frequency histogram for all bands. In case of a multimodal frequency distribution, identify the training areas responsible and discard them.
(vii) Compute the training data statistics in the form of minimum and maximum, mean, standard deviation, and variance-covariance matrices.
(viii) Ascertain the separability of the informational classes using feature selection.


Example: training data statistics in Band 1, Band 2, Band 3, Band 4, Band 5, and Band 7 for the classes Agriculture - 1 and Barren Land:


Basic statistics:

Layer          1         2         3         4         5         7
Minimum    90.000    81.000    95.000    89.000   111.000    96.000
Maximum   122.000   121.000   142.000   111.000   149.000   140.000
Mean      104.985   101.685   121.108    97.870   128.649   120.696
Std. Dev.   4.080     4.911     6.759     3.693     5.427     6.253

Variance-covariance matrix:

Layer        1        2        3        4        5        6
1       16.650
2       16.706   24.118
3       21.533   29.809   45.680
4        9.714   14.328   20.485   13.641
5       13.946   19.867   28.490   16.153   29.457
6       18.071   25.354   34.853   17.607   30.133   39.097
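As a rough illustration, the per-class statistics above (minimum, maximum, mean, standard deviation, and the variance-covariance matrix) can be extracted from the training pixels with a few lines of numpy; the array of training pixel values below is hypothetical.

```python
import numpy as np

# Hypothetical training pixels for one class: shape (n_pixels, n_bands),
# i.e. the digital numbers of the training-site pixels in each band.
training_pixels = np.array([
    [104, 101, 121,  98, 128, 121],
    [106, 103, 119,  97, 130, 120],
    [103, 100, 122,  99, 127, 122],
    [105, 102, 120,  96, 129, 119],
], dtype=float)

minimum     = training_pixels.min(axis=0)            # per-band minimum
maximum     = training_pixels.max(axis=0)            # per-band maximum
mean_vector = training_pixels.mean(axis=0)           # Mc: per-band mean
std_dev     = training_pixels.std(axis=0, ddof=1)    # per-band standard deviation
cov_matrix  = np.cov(training_pixels, rowvar=False)  # Vc: variance-covariance matrix

print("Mean vector Mc:", mean_vector)
print("Covariance matrix Vc:\n", cov_matrix)
```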


Feature Selection

Once the training statistics have been systematically collected from each band for each class of interest, a judgment must be made to determine those bands that are most effective in discriminating each class from all others. This process is commonly called feature selection.

The goal is to delete from the analysis those bands that provide only redundant spectral information. In this way the dimensionality (i.e., the number of bands to be processed) of the data set may be reduced. This process minimizes the cost of the digital image classification (but hopefully not the accuracy).


Some of the statistical separability measures are:

1) City Block Distance
2) Euclidean Distance
3) Angular Separation
4) Normalized City Block Distance
5) Mahalanobis Distance
6) Divergence
7) Transformed Divergence
8) Bhattacharyya Distance
9) Jeffries-Matusita Distance


City Block Distance, commonly known as Manhattan Distance or Boxcar Distance (Kardi, 2006), is basically a separability measure representing the distance between two points in a city road grid. It examines the absolute differences between the coordinates of two objects a and b, and hence is also known as Absolute Value Distance.

Euclidean Distance is a popular measure of the distance between two points or objects, based on the Pythagorean theorem.


The Normalized City Block measure is better than the City Block distance in the sense that it is proportional to the separation of the class means and inversely proportional to their standard deviations. If the means are equal, however, it will be zero regardless of the class variances, which does not make sense for a statistical classifier based on probabilities.

Angular separation is a similarity measure rather than a distance. It represents the cosine of the angle between two objects; higher values of angular separation indicate closer similarity (Kardi, 2006).


However, all these measures do not account for overlap between class distributions due to their variation, and thus are not good measures of separability in the case of remote sensing data. For this reason, probability-based measures have also been defined.


Feature selection may involve both statistical and/or graphical analysis to determine the degree of between-class separability in the remote sensor training data.

Combinations of bands are normally ranked according to their potential ability to discriminate each class from all others using n bands at a time.

Statistical methods of feature selection are used to quantitatively select the subset of bands (or features) that provides the greatest degree of statistical separability between any two classes, c and d.


The basic problem of spectral pattern recognition is that, given a spectral distribution of data in n bands of remotely sensed data, we must find a discrimination technique that will allow separation of the major land cover categories with a minimum of error and a minimum number of bands.

Generally, the more bands analyzed in a classification, the greater the cost and perhaps the greater the amount of redundant spectral information being used. This problem is demonstrated diagrammatically using just one band and two classes in the figure below.


Figure: Feature selection with one band and two classes. Histograms (number of pixels versus brightness value) of CLASS 1 and CLASS 2 overlap around a one-dimensional decision boundary; pixels of CLASS 2 falling on the CLASS 1 side of the boundary are erroneously assigned to CLASS 1, and pixels of CLASS 1 on the other side are erroneously assigned to CLASS 2.


Examining the histograms in the figure suggests that there is substantial overlap between classes 1 and 2 in band 1 and in band 2. When there is overlap, any decision rule that one could use to separate or distinguish between the two classes must be concerned with two types of error:

1. A pixel may be assigned to a class to which it does not belong (an error of commission).
2. A pixel is not assigned to its appropriate class (an error of omission).

The goal is to select an optimum subset of bands and apply appropriate classification techniques to minimize both types of error in the classification process.


If the training data for each band are normally distributed, as suggested in the figure, it is possible to use either a divergence or a transformed divergence equation to identify the optimum subset of bands to use in the classification procedure.

Divergence was one of the first measures of statistical separability used in the machine processing of remote sensor data, and it is still widely used as a method of feature selection.


It addresses the basic problem of deciding what is the best q-band subset of n bands for use in the supervised classification process. The number of combinations, C, of n bands taken q at a time is

    C = n! / [q! (n - q)!]

Thus, if there are six Thematic Mapper bands and we are interested in the three best bands to use in the classification, this results in

    C = 6! / [3! (6 - 3)!] = 720 / (6 x 6) = 20 combinations

that must be evaluated.
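A small sketch of how the number of q-band subsets can be counted and enumerated in Python (the band numbering below simply assumes the six TM reflective bands 1-5 and 7):

```python
import math
from itertools import combinations

bands = [1, 2, 3, 4, 5, 7]   # the six Thematic Mapper reflective bands
q = 3

# C = n! / [q! (n - q)!]
print(math.comb(len(bands), q))        # -> 20 subsets to evaluate

for subset in combinations(bands, q):  # enumerate the 20 three-band subsets
    print(subset)
```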


If the best two-band combination were desired, it would be necessary to evaluate 15 possible combinations.

Divergence is computed using the mean and covariance matrices of the class statistics collected in the training phase of the supervised classification. The degree of divergence or separability between classes c and d, Divergcd, is computed according to the formula

    Divergcd = 0.5 Tr[(Vc - Vd)(Vc^-1 - Vd^-1)] + 0.5 Tr[(Vc^-1 + Vd^-1)(Mc - Md)(Mc - Md)^T]

where Tr[.] is the trace of a matrix (i.e., the sum of the diagonal elements), Vc and Vd are the covariance matrices for the two classes c and d, and Mc and Md are their mean vectors.
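A minimal numpy sketch of this divergence formula, assuming the class mean vectors and covariance matrices have already been extracted from the training data (the two-band statistics below are hypothetical):

```python
import numpy as np

def divergence(Mc, Md, Vc, Vd):
    """Diverg_cd = 0.5 Tr[(Vc - Vd)(Vc^-1 - Vd^-1)]
                 + 0.5 Tr[(Vc^-1 + Vd^-1)(Mc - Md)(Mc - Md)^T]"""
    Vc_inv, Vd_inv = np.linalg.inv(Vc), np.linalg.inv(Vd)
    d = (Mc - Md).reshape(-1, 1)                   # column vector (Mc - Md)
    term1 = 0.5 * np.trace((Vc - Vd) @ (Vc_inv - Vd_inv))
    term2 = 0.5 * np.trace((Vc_inv + Vd_inv) @ (d @ d.T))
    return term1 + term2

# Hypothetical two-band training statistics for classes c and d
Mc, Vc = np.array([40.0, 60.0]), np.array([[9.0, 2.0], [2.0, 16.0]])
Md, Vd = np.array([55.0, 48.0]), np.array([[25.0, 5.0], [5.0, 36.0]])
print(divergence(Mc, Md, Vc, Vd))
```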


It should be remembered that the size of the covariance matrices Vc and Vd is a function of the number of bands used in the training process (i.e., if six bands were trained upon, both Vc and Vd would be 6 x 6 matrices). Divergence in this case would be used to identify the statistical separability of the two training classes using six bands of training data.

However, this is not the usual goal of applying divergence. What we actually want to know is the optimum subset of q bands. For example, if q = 3, what subset of three bands provides the best separation between these two classes?


But what about the case where there are more than two classes? In this instance, the most common solution is to compute the average divergence, Divergavg. This involves computing the average divergence over all possible pairs of classes, c and d, while holding the subset of bands, q, constant. Then another subset of bands, q, is selected for the m classes and analyzed. The subset of features (bands) having the maximum average divergence may be the superior set of bands to use in the classification algorithm. This can be expressed as:

    Divergavg = [ sum over c = 1 to m-1, sum over d = c+1 to m of Divergcd ] / C

where C is the number of class pairs evaluated.


Using this, the band subset, q, with the highest average divergence would be selected as the most appropriate set of bands for classifying the m classes. Kumar and Silva (1977) suggest that it is possible to take the divergence logic one step further and compute the transformed divergence, TDivergcd, expressed as:

    TDivergcd = 2000 { 1 - exp[ -(Divergcd) / 8 ] }

This statistic gives an exponentially decreasing weight to increasing distances between the classes. It also scales the divergence values to lie between 0 and 2000.
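A one-line sketch of the transformed-divergence scaling, applied to a few illustrative raw divergence values:

```python
import math

def transformed_divergence(diverg_cd):
    """TDiverg_cd = 2000 { 1 - exp[-(Diverg_cd) / 8] }; scaled to lie in 0..2000."""
    return 2000.0 * (1.0 - math.exp(-diverg_cd / 8.0))

# A few illustrative raw divergence values
for d in (2.0, 10.0, 30.0):
    print(d, "->", round(transformed_divergence(d), 1))
```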


There is no need to compute the divergence using all six bands, since this represents the totality of the data set. It is useful, however, to calculate the divergence within individual channels (q = 1), since a single channel might adequately discriminate between all classes of interest.

A transformed divergence value of 2000 suggests excellent between-class separation. Above 1900 provides good separation, while below 1700 is poor.


There are other methods of feature selection also based on determining the separability between two classes at a time. For example, the Bhattacharyya distance assumes that the two classes, c and d, are Gaussian in nature and that the means Mc and Md and covariance matrices Vc and Vd are available. It is computed as:

    Bhatcd = (1/8)(Mc - Md)^T [(Vc + Vd)/2]^-1 (Mc - Md) + (1/2) Ln{ det[(Vc + Vd)/2] / sqrt[det(Vc) det(Vd)] }

To select the best q features (i.e., combination of bands) from the original n bands in an m-class problem, the Bhattacharyya distance is calculated between each of the m(m - 1)/2 pairs of classes for each of the possible ways of choosing q features from n dimensions.


The best q features are those dimensions whose sum of the Bhattacharyya distances between the m(m - 1)/2 pairs of classes is highest.

The JM (Jeffries-Matusita) distance (also sometimes called the Bhattacharyya distance) between a pair of probability distributions (spectral classes) is defined as

    Jij = integral over x of { sqrt[p(x|i)] - sqrt[p(x|j)] }^2 dx

This is seen to be a measure of the average distance between the two class density functions. For normally distributed classes this becomes

    Jij = 2 (1 - e^-B)

where B is the Bhattacharyya distance between the two classes.
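A sketch of the Bhattacharyya distance and the corresponding JM distance for a pair of normally distributed classes, following the formulas above (the training statistics are again hypothetical):

```python
import numpy as np

def bhattacharyya(Mc, Md, Vc, Vd):
    """Bhat_cd = 1/8 (Mc-Md)^T [(Vc+Vd)/2]^-1 (Mc-Md)
               + 1/2 Ln( det[(Vc+Vd)/2] / sqrt[det(Vc) det(Vd)] )"""
    V = 0.5 * (Vc + Vd)
    d = (Mc - Md).reshape(-1, 1)
    term1 = 0.125 * (d.T @ np.linalg.inv(V) @ d).item()
    term2 = 0.5 * np.log(np.linalg.det(V) /
                         np.sqrt(np.linalg.det(Vc) * np.linalg.det(Vd)))
    return term1 + term2

def jeffries_matusita(Mc, Md, Vc, Vd):
    """JM distance for normally distributed classes: J = 2 (1 - e^-B)."""
    return 2.0 * (1.0 - np.exp(-bhattacharyya(Mc, Md, Vc, Vd)))

# Hypothetical two-band training statistics for classes c and d
Mc, Vc = np.array([40.0, 60.0]), np.array([[9.0, 2.0], [2.0, 16.0]])
Md, Vd = np.array([55.0, 48.0]), np.array([[25.0, 5.0], [5.0, 36.0]])
print(bhattacharyya(Mc, Md, Vc, Vd), jeffries_matusita(Mc, Md, Vc, Vd))
```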



SELECTION OF APPROPRIATE CLASSIFICATION ALGORITHM

Various supervised classification methods have been used to assign an unknown pixel to one of the classes. The choice of a particular classifier or decision rule depends on the nature of the input data and the desired output.

Parametric classification algorithms assume that the observed measurement vectors Xc obtained for each class in each spectral band during the training phase of the supervised classification are Gaussian in nature (i.e., they are normally distributed). Nonparametric classification algorithms make no such assumption.

Among the most frequently used classification algorithms are the Minimum Distance, Parallelepiped, and Maximum Likelihood classifiers.



Minimum-Distance to Means Classification

This is one of the simplest and most commonly used decision-rule classifiers. Here the analyst provides the mean vector for each class in each band, μck, from the training data. To perform a minimum distance classification, the distance from each unknown pixel (with brightness value BVijk) to each mean vector μck is computed, using the Euclidean distance based on the Pythagorean theorem; for two bands k and l:

    Dist = sqrt[ (BVijk - μck)^2 + (BVijl - μcl)^2 ]

The unknown pixel is assigned to the class to which it has the smallest distance. This can result in classification accuracies comparable to other, more computationally intensive algorithms such as the maximum likelihood algorithm.
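A minimal sketch of the minimum-distance rule, assuming the class mean vectors have already been derived from training data; the class names, mean vectors, and pixel values are hypothetical:

```python
import numpy as np

# Hypothetical class mean vectors (one row per class, one column per band)
class_names = ["water", "forest", "bare soil"]
class_means = np.array([
    [25.0, 18.0, 10.0,  8.0],
    [40.0, 35.0, 30.0, 90.0],
    [80.0, 75.0, 85.0, 60.0],
])

def minimum_distance_classify(pixel, means):
    """Assign the pixel to the class whose mean vector is closest
    in Euclidean distance (the Pythagorean theorem in n dimensions)."""
    distances = np.sqrt(((means - pixel) ** 2).sum(axis=1))
    return int(np.argmin(distances))

pixel = np.array([42.0, 33.0, 28.0, 85.0])   # brightness values BVijk of one pixel
print(class_names[minimum_distance_classify(pixel, class_means)])  # -> forest
```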



The Parallelepiped Classifier

This algorithm is based on simple Boolean and/or logic. Training data statistics in n spectral bands are used in performing the classification.

Brightness values from each pixel of the multispectral imagery are used to produce an n-dimensional mean vector Mc = (μc1, μc2, μc3, ..., μcn), with μck being the mean value of the training data obtained for class c in band k out of m possible classes, as previously defined. σck is the standard deviation of the training data of class c in band k.

Using a one-standard-deviation threshold, a parallelepiped algorithm decides that BVijk is in class c if, and only if,

    μck - σck <= BVijk <= μck + σck

where c = 1, 2, 3, ..., m, the number of classes, and k = 1, 2, 3, ..., n, the number of bands.


Therefore, if the low and high decision boundaries are defined as

    Lowck = μck - σck

and

    Highck = μck + σck

the parallelepiped algorithm becomes

    Lowck <= BVijk <= Highck

These decision boundaries form an n-dimensional parallelepiped in feature space. If the pixel value lies between the low and high thresholds for a class in all n bands evaluated, it is assigned to that class; otherwise it is assigned to an unclassified category.
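A minimal sketch of the one-standard-deviation parallelepiped rule described above; the training statistics are hypothetical, and overlapping parallelepipeds are resolved here simply by taking the first matching class:

```python
import numpy as np

def parallelepiped_classify(pixel, means, stds):
    """Assign the pixel to the first class c satisfying
    mu_ck - sigma_ck <= BVijk <= mu_ck + sigma_ck in every band k;
    return None (unclassified) if no class matches."""
    low, high = means - stds, means + stds                 # decision boundaries
    inside = np.all((pixel >= low) & (pixel <= high), axis=1)
    hits = np.flatnonzero(inside)
    return int(hits[0]) if hits.size else None

# Hypothetical training statistics (one row per class, one column per band)
means = np.array([[25.0, 18.0], [40.0, 35.0], [80.0, 75.0]])
stds  = np.array([[ 4.0,  3.0], [ 6.0,  5.0], [ 7.0,  6.0]])

print(parallelepiped_classify(np.array([42.0, 33.0]), means, stds))  # -> 1
print(parallelepiped_classify(np.array([60.0, 55.0]), means, stds))  # -> None (unclassified)
```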


    Maximum Likelihood Classifier


The maximum likelihood classifier assigns each pixel to the class for which it has the highest probability of membership, computed from the training statistics of each class; ac denotes the probability (prior probability) of class c existing.

Theoretically, the prior for each class is given equal weightage if no knowledge regarding the existence of the features on the ground is available. If the chance of a particular class existing is greater than the others, then the user can define a set of a priori probabilities for the features and the equation can be slightly modified.

Decide X is in class c if, and only if,

    pc > pi, where i = 1, 2, 3, ..., m possible classes

and

    pc = loge(ac) - [0.5 loge{det(Vc)}] - [0.5 (X - Mc)^T (Vc)^-1 (X - Mc)]

The use of a priori probabilities helps in incorporating the effects of relief and other terrain characteristics. The disadvantage of this classifier is that it requires a large computer memory and computing time, and yet sometimes may not produce the best results.
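A sketch of the maximum likelihood decision rule in this form, assuming equal a priori probabilities; the two-band class statistics are hypothetical:

```python
import numpy as np

def max_likelihood_classify(X, means, covs, priors):
    """Assign X to the class c with the largest discriminant
    pc = ln(ac) - 0.5 ln det(Vc) - 0.5 (X - Mc)^T Vc^-1 (X - Mc)."""
    scores = []
    for Mc, Vc, ac in zip(means, covs, priors):
        d = (X - Mc).reshape(-1, 1)
        pc = (np.log(ac)
              - 0.5 * np.log(np.linalg.det(Vc))
              - 0.5 * (d.T @ np.linalg.inv(Vc) @ d).item())
        scores.append(pc)
    return int(np.argmax(scores))

# Hypothetical two-band training statistics for three classes
means  = [np.array([25.0, 18.0]), np.array([40.0, 35.0]), np.array([80.0, 75.0])]
covs   = [np.array([[16.0, 2.0], [2.0, 9.0]])] * 3
priors = [1/3, 1/3, 1/3]          # equal weightage when no prior knowledge exists

print(max_likelihood_classify(np.array([42.0, 33.0]), means, covs, priors))  # -> 1
```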



CLASSIFICATION ACCURACY ASSESSMENT

No classification task using remote sensing data is complete until an assessment of accuracy is performed. The analyst and the user of a classified map would like to know how accurately the classes on the ground have been identified on the image.

The term accuracy correlates to correctness. In digital image processing, accuracy is a measure of agreement between standard information at a given location and the information at the same location on the classified image.

Generally, the accuracy assessment is based on the comparison of two maps: one based on the analysis of remote sensing data, and a second based on information derived from the actual ground, also known as the reference map.



This reference map is often compiled from detailed information gathered from different sources and is thought to be more accurate than the map to be evaluated. The reference map consists of a network of discrete parcels, each designated by a single label.

The simplest method of evaluation is to compare the two maps with respect to the areas assigned to each class or category. This yields a report of the areal extents of the classes that agree with each other.

The accuracy assessment is presented either as an overall classification accuracy or as a site-specific accuracy. Overall classification accuracy represents the overall agreement between the two maps in terms of the total area for each category; it does not take into account the agreement or disagreement between the two maps at specific locations. The second form of accuracy measure is site-specific accuracy, which is based upon a detailed assessment of agreement between the two maps at specific locations.



Error Matrix

The standard form for reporting site-specific accuracy is the error matrix, also known as the confusion matrix or the contingency table. An error matrix not only identifies the overall error for each category, but also the misclassification for each category.

An error matrix essentially consists of an n by n array, where n is the number of classes or categories on the reference map. Here the rows of the matrix represent the true classes or information on the reference map, while the columns of the matrix represent the classes as identified on the classified map.



Schematic error matrix: the rows are the classes on the reference image (Urban, Crop, Range, Water, Forest, Barren) and the columns are the same classes on the classified image. The diagonal cells hold the correctly identified pixels, and the bottom-right cell holds the total sum of correctly identified pixels. Row totals form the row marginals and column totals the column marginals. Each row shows errors of omission, while each column shows errors of commission.


The values in the last column give the total number of true points per class used for assessing the accuracy. Similarly, the total at the bottom of each column gives the number of points/pixels per class in the classified map.

The diagonal elements of the error matrix indicate the number of points/pixels correctly identified in both the reference and classified maps. The sum of all these diagonal elements is entered in the bottom-right element, i.e., the total number of points/pixels correctly classified in both the reference and classified maps.


The off-diagonal elements of the error matrix provide information on errors of omission and commission, respectively. Errors of omission are found in the off-diagonal cells along each row; for each class the omission error is computed by taking the sum of all the non-diagonal elements along that row and dividing it by the row total of that class.
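A sketch of how overall accuracy and the per-class omission and commission errors can be computed from an error matrix whose rows are reference classes and whose columns are classified-map classes (the small matrix below is hypothetical):

```python
import numpy as np

# Hypothetical 3-class error matrix:
# rows = reference (true) classes, columns = classes on the classified map.
error_matrix = np.array([
    [50,  3,  2],
    [ 4, 60,  6],
    [ 1,  5, 69],
])

diag       = np.diag(error_matrix)
row_totals = error_matrix.sum(axis=1)     # reference points per class
col_totals = error_matrix.sum(axis=0)     # classified points per class

overall_accuracy = diag.sum() / error_matrix.sum()
omission_error   = (row_totals - diag) / row_totals   # off-diagonal share of each row
commission_error = (col_totals - diag) / col_totals   # off-diagonal share of each column

print(overall_accuracy)
print(omission_error)
print(commission_error)
```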


    ACCURACY INDICES


Index: Kappa coefficient of agreement (K)
Description: proportion of agreement after removing the proportion of agreement by chance
Formula: K = (Po - Pe) / (1 - Pe)
Reference: Congalton et al. (1983)

Index: Combined accuracy (CAp)
Description: average of overall accuracy and average producer's accuracy
Formula: CAp = (1/2)[OA + Ap]
Reference: Fung and LeDrew (1988)

Index: Combined accuracy (CAu)
Description: average of overall accuracy and average user's accuracy
Formula: CAu = (1/2)[OA + Au]
Reference: Fung and LeDrew (1988)

Index: Weighted Kappa (Kw)
Description: proportion of weighted disagreement corrected for chance
Formula: Kw = (sum of vij Poij - sum of vij Peij) / (1 - sum of vij Peij)
Reference: Rosenfield & Fitzpatrick-Lins (1986)


Index: Conditional Kappa (K+i)
Description: conditional Kappa computed from the ith column of the error matrix (producer's)
Formula: K+i = (Po(i) - Pe(i)) / (1 - Pe(i))
Reference: Foody (1992), Ma and Raymond (1995)

Index: Tau coefficient (Te)
Description: Tau for classifications based on equal probabilities of class membership
Formula: Te = (Po - 1/q) / (1 - 1/q)

Index: Tau coefficient (Tp)
Description: Tau for classifications based on unequal probabilities of class membership
Formula: Tp = (Po - Pr) / (1 - Pr)

Index: Conditional Tau (Ti+)
Description: conditional Tau computed from the ith row (user's)
Reference: Naesset (1996)

Index: Conditional Tau (T+i)
Description: conditional Tau computed from the ith column (producer's)
Reference: Naesset (1996)

    Error Matrix


Actual class    Heather   Water   Forest 1   Forest 2   Bare soil   Pasture   Total
Heather             826       0          0          5          27         0     858
Water                 0     878          0          0           0         0     878
Forest 1              0       0        720        183           0         7     910
Forest 2             33       0         21        878           2         0     934
Bare soil            61       0          0          0         560         0     621
Pasture               0       0          0          1           0       219     220
Total               920     878        741       1067         589       226    4081

(Columns give the predicted class; the bottom-right value, 4081, is the total number of correctly classified pixels.)





The lower limit of the overall accuracy at the 95% confidence level is computed as

    p = p' - [ 1.645 sqrt( p' q / n ) + 50/n ]

where p = overall accuracy at the 95% confidence level (lower limit), p' = the observed overall accuracy, q = 100 - p', and n = the sample size.

If this value of p exceeds the defined criterion at the lower limit, then it is possible to accept the classification with a 95% confidence limit. Normally, the defined criterion for the confidence limit is set at 85%. For the example given above, the accuracy at the lower limit is 91.6%, and hence the classification is acceptable, as the classified map has met or exceeded the defined accuracy standard.
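A quick check of the lower limit for the example above, assuming p' = 92.3% (the observed overall accuracy) and n = 4421 (the sum of the row totals of the example error matrix):

```python
import math

p_prime = 92.3          # observed overall accuracy (percent)
n = 4421                # total number of reference points in the error matrix
q = 100.0 - p_prime

# p = p' - [1.645 sqrt(p'q/n) + 50/n]
lower_limit = p_prime - (1.645 * math.sqrt(p_prime * q / n) + 50.0 / n)
print(round(lower_limit, 1))   # -> about 91.6, above the 85% criterion
```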


OMISSION & COMMISSION ERRORS


Class      Points            Omission                           Commission
           correct     n    % correct   95% conf. limits      n    % correct   95% conf. limits
Heather        826    858      96.3       95.0 - 97.6        920      89.8       87.8 - 91.8
Water          878    878     100.0       99.4 - 100.0       878     100.0       99.4 - 100.0
Forest 1       720    910      79.1       76.4 - 81.8        741      97.2       95.9 - 98.5
Forest 2       878    934      94.0       92.4 - 95.6       1067      82.3       80.0 - 84.6
Bare soil      560    621      90.2       87.8 - 92.6        589      95.1       93.3 - 96.7
Pasture        219    220      99.5       98.3 - 100.0       226      96.9       94.4 - 99.1

Using 85% as the criterion, it is seen that Forest 1 has failed the test, as both the upper and lower limits of its accuracy at the 95% confidence level are less than 85%. Similarly, when the errors of commission are evaluated, it is found that Forest 2 fails to meet the criterion. This is also evident from the table.

It can be seen that 183 pixels have not been identified as Forest 1 and have instead been identified as the Forest 2 class. This implies that even though the overall classification accuracy has exceeded the defined criterion of acceptability, the training samples of the classes Forest 1 and Forest 2 have not been able to provide correct information to the classification process, and hence the training samples have to be collected with caution.

    KAPPA COEFFICIENT


The above procedure for determining classification accuracy is highly dependent upon the training samples used for classification and for the assessment of classification accuracy.

In order to assess the agreement between two maps, Kappa (K) is used. It is a measure of the difference between the observed agreement between two maps (as reported by the overall accuracy) and the agreement that might be contributed solely by chance matching of the two maps. It attempts to provide a measure of agreement that is adjusted for chance and is expressed as follows:

    K = (Observed - Expected) / (1 - Expected)



Observed is the overall accuracy, while expected is an estimate of the chance agreement to the observed percentage correct. Expected is computed by first taking the products of the row and column totals to estimate the number of pixels assigned to each element of the matrix, given that pixels are assigned by chance to each class. The table below shows the sample computation for the error matrix given earlier (each cell is the product of the corresponding row and column totals).

Actual \ Predicted     Heather    Water   Forest 1   Forest 2   Bare soil   Pasture
Heather (858)           789360   753324     635778     915486      505362    193908
Water (878)             807760   770884     650598     936826      517142    198428
Forest 1 (910)          837200   798980     674310     970970      535990    205660
Forest 2 (934)          859280   820052     692094     996578      550126    211084
Bare soil (621)         571320   545238     460161     662607      365769    140346
Pasture (220)           202400   193160     163020     234740      129580     49720
(Column totals:            920      878        741       1067         589       226)



Total of diagonal elements = 3046621

Total of all elements = 28866023

Expected agreement by chance = Sum of diagonal elements / Total of all elements = 0.126

    K = (0.923 - 0.126) / (1 - 0.126) = 0.904

The value of K = 0.904 means that the classification has achieved an accuracy that is 90% better than would be expected from a random assignment of pixels to classes.
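A sketch that recomputes Kappa directly from the example error matrix using the standard observed/expected formulation; the chance-agreement term here is the sum of (row total x column total) divided by N squared, so the intermediate values differ slightly from the rounded figures tabulated above, but the resulting K (about 0.90) agrees closely with the 0.904 reported.

```python
import numpy as np

# The example error matrix (rows = actual class, columns = predicted class):
# Heather, Water, Forest 1, Forest 2, Bare soil, Pasture
error_matrix = np.array([
    [826,   0,   0,   5,  27,   0],
    [  0, 878,   0,   0,   0,   0],
    [  0,   0, 720, 183,   0,   7],
    [ 33,   0,  21, 878,   2,   0],
    [ 61,   0,   0,   0, 560,   0],
    [  0,   0,   0,   1,   0, 219],
])

N = error_matrix.sum()
row_totals = error_matrix.sum(axis=1)
col_totals = error_matrix.sum(axis=0)

observed = np.trace(error_matrix) / N                # overall accuracy, about 0.923
expected = (row_totals * col_totals).sum() / N**2    # chance agreement
kappa = (observed - expected) / (1 - expected)

print(round(observed, 3), round(expected, 3), round(kappa, 3))
```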


In my next session, I will discuss unsupervised classification techniques.


THANK YOU