sudeep-gautam
8/3/2019 CE-3219-supervised-classification-2010
IMAGE CLASSIFICATION & ANALYSIS
In my previous session I discussed the role of image transformation in remote sensing digital analysis.
In this session I will discuss the various techniques by which digital satellite data can be converted into information of interest.
IMAGE CLASSIFICATION & ANALYSIS
An analyst attempts to classify features in an image by using the elements of visual interpretation to identify homogeneous groups of pixels that represent various features or land cover classes of interest.
In digital image classification, the analyst uses the spectral information represented by the digital numbers in one or more spectral bands, and attempts to classify each individual pixel based on this spectral information.
This type of classification is termed spectral pattern recognition. In either case, the objective is to assign all pixels in the image to particular classes or themes.
The resulting classified image is composed of a mosaic of pixels, each of which belongs to a particular theme, and is essentially a thematic map of the original image.
TYPES OF CLASS
Information classes are those categories of interest that the analyst is actually trying to identify in the imagery, such as different kinds of crops, different forest types or tree species, different geologic units or rock types, etc.
Spectral classes are groups of pixels that are uniform (or near-similar) with respect to their brightness values in the different spectral channels of the data.
The objective is to match the spectral classes in the data to the information classes of interest.
However, it is rare that there is a simple one-to-one match between these two types of classes.
Many times it is found that two or three spectral classes merge to form one informational class, while some classes may not be of any particular interest.
It is the analyst's job to decide on the utility of the different spectral classes and their correspondence to useful information classes.
IMAGE CLASSIFICATION AND ANALYSIS
Common classification procedures can be broken down into two broad subdivisions based on the method used:
i. supervised classification
ii. unsupervised classification.
SUPERVISED CLASSIFICATION
In a supervised classification, the analyst identifies in the imagery homogeneous representative samples of the different surface cover types (information classes) of interest.
These samples are referred to as training areas.
The selection of appropriate training areas is based on the analyst's familiarity with the geographical area and knowledge of the actual surface cover types present in the image.
Thus, the analyst is supervising the categorization of a set of specific classes.
The numerical information in all spectral bands for the pixels comprising these areas is used to train the computer to recognize spectrally similar areas for each class.
The computer uses special programs or algorithms to determine the numerical signatures for each training class.
Once the computer has determined the signatures for each class, each pixel in the image is compared to these signatures and labeled as the class it most closely resembles digitally.
Thus, in a supervised classification, the analyst first identifies the information classes and then determines the spectral classes which represent them.
UNSUPERVISED CLASSIFICATION
In essence, it is the reverse of the supervised classification process.
Spectral classes are grouped first, based solely on the numerical information in the data, and are then matched by the analyst to information classes (if possible).
Programs called clustering algorithms are used to determine the natural groupings or structures in the data.
Usually, the analyst specifies how many groups or clusters are to be looked for in the data.
In addition to specifying the desired number of classes, the analyst may also specify parameters related to the separation distance among the clusters and the variation within each cluster.
The final result of this iterative clustering process may include some clusters that the analyst would like to subsequently combine, or clusters that should be broken down further; each of these requires a further iteration of the clustering algorithm.
Thus, unsupervised classification is not completely without human intervention. However, it does not start with a pre-determined set of classes as in a supervised classification.
SUPERVISED CLASSIFICATION
In order to carry out supervised classification, the analyst may have to adopt a well-defined procedure so as to achieve a satisfactory classification of information.
The important aspects of conducting a rigorous and systematic supervised classification of remote sensor data are as follows:
(i) Selection of an appropriate classification scheme.
(ii) Selection of representative areas as training sites.
(iii) Extraction of training data statistics.
(iv) Testing of training data for separability in order to identify the best possible combination of bands for classification.
(v) Selection of an appropriate classification algorithm.
(vi) Classification of the image into appropriately defined classes.
(vii) Evaluation of classification accuracy.
CLASSIFICATION SCHEME
Classification schemes have been developed that can readily incorporate meaningful land use and land cover data obtained by interpreting remote sensing data.
Some of the important ones are:
U.S. Geological Survey Land Use/Land Cover Classification System,
Michigan Classification System, and
Cowardin Wetland Classification System.
5. Water
   51 Streams and canals
   52 Lakes
   53 Reservoirs
   54 Bays and estuaries
6. Wetland
   61 Forested wetland
   62 Non-forested wetland
7. Barren land
   71 Dry salt flats
   72 Beaches
   73 Sandy areas other than beaches
   74 Bare exposed rocks
   75 Strip mines, quarries, and gravel pits
   76 Transitional areas
   77 Mixed barren land
8. Tundra
   81 Shrub and brush tundra
   82 Herbaceous tundra
   83 Bare ground
   84 Mixed tundra
9. Perennial snow and ice
   91 Perennial snowfields
   92 Glaciers
Training Site Selection
Once a classification scheme has been adopted, the analyst may identify and select sites within the image that are representative of the land cover classes of interest.
Training data are of value only if the environment from which they are obtained is relatively homogeneous.
The image coordinates of these sites are identified and used to extract statistics from the multispectral data for each of these areas.
For each feature class c, the mean value ($\mu_{ck}$) for each band k and the variance-covariance matrix ($V_c$) are calculated in a similar manner as explained earlier.
The success of a supervised classification depends upon the training data used to identify different classes.
Hence the selection of training data has to be done meticulously, keeping in mind that each training data set has some specific characteristics.
These characteristics are discussed below.
CHARACTERISTICS OF TRAINING SITE SELECTION
Number of pixels: The number of pixels selected for each information class is an important characteristic. There is no fixed guideline available, but in general the analyst must ensure that a sufficient number of pixels is selected.
Size: The training sets identified on the image should be large enough to provide accurate and reliable information regarding the informational class. However, they should not be too big, as large areas may include undesirable variation.
Shape: It is not an important characteristic. However, a regular shape for the selected training area makes it easier to extract the information from the satellite images.
Location: Informational classes generally have small spectral variability, so the training data should be located so that they account for the different types of conditions within the image.
It is desirable that the analyst undertakes a field visit to the desired location to clearly mark out the selected information. In the case of inaccessible or mountainous regions, aerial photographs or maps can provide the basis for accurate delineation of training areas.
Number of training areas: The number of training areas depends upon the number of categories to be mapped, their diversity, and the resources available for delineating training areas.
In general, five to ten training samples per class are selected in order to account for the spatial and spectral variability of the informational class.
Selection of multiple training areas is also desirable, as some training areas of a class may have to be discarded later. It is usually better to define many small training fields than a few large training areas.
Placement: The training area should be placed in such a way that it does not lie close to the edge of the boundary of the information class.
Uniformity: This is one of the most critical and important characteristics of any training data for an information class. The training data collected must exhibit uniformity or homogeneity in the information.
If the histogram displays one peak, i.e., a unimodal frequency distribution for each spectral class, the training data are acceptable. If the display is a multimodal distribution, then there is variability or mixing of information and the training data must be discarded.
IDEALISED SEQUENCE FOR SELECTING TRAINING DATA
No fixed or well-defined procedure can be laid out for selecting training data. However, as a guideline, the key steps in selection and evaluation can be enumerated as follows:
(i) Collect information, including maps and aerial photographs of the area under study. If any previous study has been carried out, then acquire the necessary documents, maps, and reports.
(ii) Conduct field trips to acquire first-hand knowledge of selected and representative sites in the study area. The field trips should coincide with the date and time of data acquisition. If that is not possible, they should be at the same time of the year.
(iii) Conduct a preliminary examination of the digital data in order to assess the quality of the image.
(iv) Identify prospective training areas. These locations may be defined with respect to some easily identifiable objects on the image. Further, the same may be identified on the map and aerial photographs if readily available.
(v) Extract the training data areas from the digital image.
(vi) For each informational class, display and inspect the frequency histogram for all bands. In case of a multimodal frequency distribution, identify the training areas which are responsible for it and discard them.
(vii) Compute the training data statistics in the form of minimum and maximum, mean, standard deviations, and variance-covariance matrices.
(viii) Now ascertain the separability of the informational classes using feature selection.
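Step (vii) above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the pixel values are made up, the function names `band_statistics` and `covariance_matrix` are hypothetical, and the sample (n - 1) form of the variance and covariance is assumed.

```python
from statistics import mean, stdev

# Hypothetical training pixels for one class (made-up brightness values):
# each row is one pixel, each column is one spectral band.
training_pixels = [
    [90, 81, 95],
    [100, 95, 110],
    [104, 101, 121],
    [110, 108, 130],
    [122, 121, 142],
]

def band_statistics(pixels):
    """Per-band minimum, maximum, mean and sample standard deviation."""
    bands = list(zip(*pixels))  # transpose: one tuple of values per band
    return {
        "min": [min(b) for b in bands],
        "max": [max(b) for b in bands],
        "mean": [mean(b) for b in bands],
        "std": [stdev(b) for b in bands],
    }

def covariance_matrix(pixels):
    """Sample variance-covariance matrix (assumes the n - 1 denominator)."""
    bands = list(zip(*pixels))
    means = [mean(b) for b in bands]
    n = len(pixels)
    size = len(bands)
    return [
        [
            sum((x - means[i]) * (y - means[j])
                for x, y in zip(bands[i], bands[j])) / (n - 1)
            for j in range(size)
        ]
        for i in range(size)
    ]

stats = band_statistics(training_pixels)
cov = covariance_matrix(training_pixels)
```

The diagonal of `cov` holds the band variances, so each diagonal element should equal the square of the corresponding standard deviation.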
[Figure: training data for the classes Agriculture - 1 and Barren Land in Band 1, Band 2, Band 3, Band 4, Band 5, and Band 7]
Layer         1        2        3        4        5        7
Minimum      90.000   81.000   95.000   89.000  111.000   96.000
Maximum     122.000  121.000  142.000  111.000  149.000  140.000
Mean        104.985  101.685  121.108   97.870  128.649  120.696
Std. Dev      4.080    4.911    6.759    3.693    5.427    6.253

Variance Covariance Matrix

Layer         1        2        3        4        5        6
1            16.650
2            16.706   24.118
3            21.533   29.809   45.680
4             9.714   14.328   20.485   13.641
5            13.946   19.867   28.490   16.153   29.457
6            18.071   25.354   34.853   17.607   30.133   39.097
Feature Selection:
Once the training statistics have been systematically collected from each band for each class of interest, a judgment must be made to determine those bands that are most effective in discriminating each class from all others.
This process is commonly called feature selection.
The goal is to delete from the analysis those bands that provide only redundant spectral information.
In this way the dimensionality (i.e., the number of bands to be processed) of the data set may be reduced.
This process minimizes the cost of the digital image classification (but, hopefully, not the accuracy).
Some of the statistical separability measures are:
1) City Block Distance
2) Euclidean Distance
3) Angular Separation
4) Normalized City Block Distance
5) Mahalanobis Distance
6) Divergence
7) Transformed Divergence
8) Bhattacharyya Distance
9) Jeffries-Matusita Distance
City Block Distance, commonly known as Manhattan Distance or Boxcar Distance (Kardi, 2006), is basically a separability measure that represents the distance between two points in a city road grid. It examines the absolute differences between the coordinates of two objects a and b, and hence is also known as Absolute Value Distance.
Euclidean Distance is a popular measure of the distance between two points or objects, based on the Pythagorean theorem.
The Normalized City Block measure is better than the City Block distance in the sense that it is proportional to the separation of the class means and inversely proportional to their standard deviations. If the means are equal, however, it will be zero regardless of the class variances, which does not make sense for a statistical classifier based on probabilities.
Angular separation is a similarity measure rather than a distance.
It represents the cosine of the angle between two objects. Higher values of angular separation indicate closer similarity (Kardi, 2006).
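The three simplest measures above can be illustrated with a short Python sketch. It assumes each class is represented only by its mean vector; the function names and the two example vectors are hypothetical and chosen so the arithmetic is easy to follow.

```python
import math

def city_block(a, b):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance from the Pythagorean theorem."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def angular_separation(a, b):
    """Cosine of the angle between two vectors (a similarity, not a distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical two-band mean vectors for two classes.
mean_a = [3.0, 4.0]
mean_b = [6.0, 8.0]

d_city = city_block(mean_a, mean_b)
d_euclid = euclidean(mean_a, mean_b)
similarity = angular_separation(mean_a, mean_b)
```

Because the second vector is simply twice the first, the angular separation is at its maximum even though both distances are non-zero, which shows why it is a similarity measure rather than a distance.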
However, none of these measures accounts for overlap between class distributions due to variation, and thus they are not good measures of separability for remote sensing data.
For this reason, probability-based measures have also been defined.
Feature selection may involve statistical and/or graphical analysis to determine the degree of between-class separability in the remote sensor training data.
Combinations of bands are normally ranked according to their potential ability to discriminate each class from all others using n bands at a time.
Statistical methods of feature selection are used to quantitatively select the subset of bands (or features) that provides the greatest degree of statistical separability between any two classes, c and d.
The basic problem of spectral pattern recognition is this: given a spectral distribution of data in n bands of remotely sensed data, find a discrimination technique that will allow separation of the major land cover categories with a minimum of error and a minimum number of bands.
Generally, the more bands analyzed in a classification, the greater the cost and perhaps the greater the amount of redundant spectral information being used.
This problem is demonstrated diagrammatically using just 1 band and 2 classes in the figure.
[Figure: Feature Selection. Histograms (number of pixels) of CLASS 1 and CLASS 2 separated by a one-dimensional decision boundary. The overlap region contains pixels in CLASS 2 erroneously assigned to CLASS 1 and pixels in CLASS 1 erroneously assigned to CLASS 2.]
Examining the histograms in the figure suggests that there is substantial overlap between classes 1 and 2 in band 1 and in band 2.
When there is overlap, any decision rule that one could use to separate or distinguish between two classes must be concerned with two types of error.
1. A pixel may be assigned to a class to which it does not belong (an error of commission).
2. A pixel is not assigned to its appropriate class (an error of omission).
The goal is to select an optimum subset of bands and apply appropriate classification techniques to minimize both types of error in the classification process.
If the training data for each band are normally distributed, as suggested in the figure, it is possible to use either a divergence or a transformed divergence equation to identify the optimum subset of bands to use in the classification procedure.
Divergence was one of the first measures of statistical separability used in the machine processing of remote sensor data, and it is still widely used as a method of feature selection.
It addresses the basic problem of deciding what is the best q-band subset of n bands for use in the supervised classification process. The number of combinations, C, of n bands taken q at a time is
$$C\binom{n}{q} = \frac{n!}{q!\,(n-q)!}$$
Thus, if there are six Thematic Mapper bands and we are interested in the three best bands to use in the classification, this results in 20 combinations that must be evaluated:
$$C\binom{6}{3} = \frac{6!}{3!\,(6-3)!} = \frac{720}{6\,(6)} = 20 \text{ combinations}$$
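The band subsets that would have to be evaluated can be enumerated directly. The sketch below uses Python's standard `itertools.combinations` and `math.comb`; the six band labels are an assumption made only for illustration.

```python
from itertools import combinations
from math import comb

# Six band labels, assumed here for illustration.
bands = [1, 2, 3, 4, 5, 7]

# Every subset of q bands that a separability measure would be computed for.
three_band_subsets = list(combinations(bands, 3))
two_band_subsets = list(combinations(bands, 2))

# The counts agree with n! / (q! (n - q)!).
n_three = comb(len(bands), 3)
n_two = comb(len(bands), 2)
```

Each tuple in `three_band_subsets` is one candidate band combination, so a feature selection loop would compute the chosen separability measure once per tuple and keep the best-scoring subset.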
If the best two-band combination were desired, it would be necessary to evaluate 15 possible combinations.
Divergence is computed using the mean vectors and covariance matrices of the class statistics collected in the training phase of the supervised classification.
The degree of divergence or "separability" between classes c and d, $Diverg_{cd}$, is computed according to the formula
$$Diverg_{cd} = 0.5\,Tr\left[(V_c - V_d)(V_d^{-1} - V_c^{-1})\right] + 0.5\,Tr\left[(V_c^{-1} + V_d^{-1})(M_c - M_d)(M_c - M_d)^T\right]$$
where Tr[.] is the trace of a matrix (i.e., the sum of the diagonal elements), $V_c$ and $V_d$ are the covariance matrices for the two classes, c and d, and $M_c$ and $M_d$ are the mean vectors.
It should be remembered that the size of the covariance matrices Vc and Vd is a function of the number of bands used in the training process (i.e., if six bands were trained upon, both Vc and Vd would be matrices 6 x 6 in dimension).
Divergence in this case would be used to identify the statistical separability of the two training classes using six bands of training data.
However, this is not the usual goal of applying divergence. What we actually want to know is the optimum subset of q bands. For example, if q = 3, what subset of three bands provides the best separation between these two classes?
But what about the case where there are more than two classes? In this instance, the most common solution is to compute the average divergence, $Diverg_{avg}$.
This involves computing the average over all possible pairs of classes, c and d, while holding the subset of bands, q, constant. Then another subset of bands, q, is selected for the m classes and analyzed.
The subset of features (bands) having the maximum average divergence may be the superior set of bands to use in the classification algorithm. This can be expressed as:
$$Diverg_{avg} = \frac{\sum_{c=1}^{m-1} \sum_{d=c+1}^{m} Diverg_{cd}}{C}$$
where C = m(m - 1)/2 is the number of class pairs.
Using this, the band subset, q, with the highest average divergence would be selected as the most appropriate set of bands for classifying the m classes.
Kumar and Silva (1977) suggest that it is possible to take the divergence logic one step further and compute the transformed divergence, $Diverg^T$, expressed for a pair of classes c and d as:
$$Diverg^T_{cd} = 2000\left\{1 - \exp\left(\frac{-Diverg_{cd}}{8}\right)\right\}$$
This statistic gives an exponentially decreasing weight to increasing distances between the classes. It also scales the divergence values to lie between 0 and 2000.
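For a single band the covariance matrices reduce to variances, which makes the divergence and transformed divergence easy to check by hand. The sketch below assumes that one-band case, writes the covariance term as (Vc - Vd)(1/Vd - 1/Vc) so that it is never negative, and uses made-up class statistics; the function names are hypothetical.

```python
import math

def divergence_1band(mean_c, var_c, mean_d, var_d):
    """Divergence between classes c and d in one band (scalar variances)."""
    cov_term = 0.5 * (var_c - var_d) * (1.0 / var_d - 1.0 / var_c)
    mean_term = 0.5 * (1.0 / var_c + 1.0 / var_d) * (mean_c - mean_d) ** 2
    return cov_term + mean_term

def transformed_divergence(diverg):
    """Rescale divergence to the range 0 to 2000."""
    return 2000.0 * (1.0 - math.exp(-diverg / 8.0))

# Made-up statistics for two classes in one band.
diverg_cd = divergence_1band(mean_c=100.0, var_c=16.0, mean_d=120.0, var_d=25.0)
t_diverg_cd = transformed_divergence(diverg_cd)

# Two identical classes cannot be separated at all.
diverg_same = divergence_1band(100.0, 16.0, 100.0, 16.0)
```

For more than one band the same two terms are traces of matrix products, so a matrix inverse would be needed in place of the reciprocals used here.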
There is no need to compute the divergence using all six bands, since this represents the totality of the data set. It is useful, however, to calculate divergence with individual channels (q = 1), since a single channel might adequately discriminate between all classes of interest.
A transformed divergence value of 2000 suggests excellent between-class separation. Above 1900 provides good separation, while below 1700 is poor.
There are other methods of feature selection that are also based on determining the separability between two classes at a time.
For example, the Bhattacharyya distance assumes that the two classes, c and d, are Gaussian in nature and that the means, $M_c$ and $M_d$, and covariance matrices, $V_c$ and $V_d$, are available. It is computed as:
$$Bhat_{cd} = \frac{1}{8}(M_c - M_d)^T \left(\frac{V_c + V_d}{2}\right)^{-1} (M_c - M_d) + \frac{1}{2}\ln\left[\frac{\det\left(\frac{V_c + V_d}{2}\right)}{\sqrt{\det(V_c)\,\det(V_d)}}\right]$$
To select the best q features (i.e., combination of bands) from the original n bands in an m-class problem, the Bhattacharyya distance is calculated between each of the m(m - 1)/2 pairs of classes for each of the possible ways of choosing q features from n dimensions.
The best q features are those dimensions whose sum of the Bhattacharyya distances between the m(m - 1)/2 pairs of classes is highest.
The JM distance (also sometimes called the Bhattacharyya distance) between a pair of probability distributions (spectral classes) is defined as
$$J_{ij} = \int_x \left\{\sqrt{p(x \mid \omega_i)} - \sqrt{p(x \mid \omega_j)}\right\}^2 dx$$
This is seen to be a measure of the average distance between the two class density functions. For normally distributed classes this becomes
$$J_{ij} = 2\,(1 - e^{-B})$$
where B is the Bhattacharyya distance between the two classes.
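As a rough illustration of the Bhattacharyya and JM distances, the sketch below again assumes a single band, so that the covariance matrices become scalar variances and no matrix inverse or determinant is needed. The class statistics are made up and the function names are hypothetical.

```python
import math

def bhattacharyya_1band(mean_c, var_c, mean_d, var_d):
    """Bhattacharyya distance between two Gaussian classes in one band."""
    var_avg = (var_c + var_d) / 2.0
    mean_term = (mean_c - mean_d) ** 2 / (8.0 * var_avg)
    cov_term = 0.5 * math.log(var_avg / math.sqrt(var_c * var_d))
    return mean_term + cov_term

def jm_distance(bhat):
    """Jeffries-Matusita distance for normally distributed classes."""
    return 2.0 * (1.0 - math.exp(-bhat))

# Made-up statistics for two classes in one band.
bhat_cd = bhattacharyya_1band(100.0, 16.0, 120.0, 25.0)
jm_cd = jm_distance(bhat_cd)

# Identical classes give zero distance.
bhat_same = bhattacharyya_1band(100.0, 16.0, 100.0, 16.0)
```

Unlike divergence, the JM distance saturates at 2 as the classes move apart, so very well separated class pairs do not dominate a sum or average over pairs.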
SELECTION OF APPROPRIATE CLASSIFICATION ALGORITHM
Various supervised classification methods have been used to assign an unknown pixel to one of the classes.
The choice of a particular classifier or decision rule depends on the nature of the input data and the desired output.
Parametric classification algorithms assume that the observed measurement vectors Xc obtained for each class in each spectral band during the training phase of the supervised classification are Gaussian in nature (i.e., they are normally distributed).
Nonparametric classification algorithms make no such assumption.
Among the most frequently used classification algorithms are the Minimum Distance, Parallelepiped, and Maximum Likelihood classifiers.
Minimum-Distance to Means Classification
It is one of the simplest and most commonly used decision rule classifiers. Here the analyst provides the mean vector for each class in each band, $\mu_{ck}$, from the training data. To perform a minimum distance classification, the distance to each mean vector, $\mu_{ck}$, from each unknown pixel ($BV_{ijk}$) is computed using the Euclidean distance based on the Pythagorean theorem:
$$Dist = \sqrt{(BV_{ijk} - \mu_{ck})^2 + (BV_{ijl} - \mu_{cl})^2}$$
where $\mu_{ck}$ and $\mu_{cl}$ are the means of class c in bands k and l.
The unknown pixel is assigned to the class to which it has the smallest distance. This classifier can result in classification accuracies comparable to other more computationally intensive algorithms, such as the maximum likelihood algorithm.
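The minimum-distance decision rule can be sketched as follows. The class names and mean vectors are made up for illustration, the function name is hypothetical, and `math.dist` (Python 3.8 or later) is used for the Euclidean distance.

```python
import math

# Hypothetical class mean vectors in two bands (made-up values).
class_means = {
    "water": [20.0, 10.0],
    "forest": [40.0, 80.0],
    "urban": [90.0, 70.0],
}

def minimum_distance_classify(pixel, means):
    """Assign the pixel to the class whose mean vector is nearest."""
    distances = {name: math.dist(pixel, mean) for name, mean in means.items()}
    return min(distances, key=distances.get)

label_1 = minimum_distance_classify([25.0, 15.0], class_means)
label_2 = minimum_distance_classify([85.0, 75.0], class_means)
```

Note that this rule always assigns some class, however far the pixel lies from every mean; a distance threshold would be needed to leave outlying pixels unclassified.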
The Parallelepiped Classifier
This algorithm is based on simple Boolean "and/or" logic. Training data statistics in n spectral bands are used in performing the classification.
Brightness values from each pixel of the multispectral imagery are used to produce an n-dimensional mean vector, $M_c = (\mu_{c1}, \mu_{c2}, \mu_{c3}, \ldots, \mu_{cn})$, with $\mu_{ck}$ being the mean value of the training data obtained for class c in band k out of m possible classes, as previously defined.
$\sigma_{ck}$ is the standard deviation of the training data for class c in band k out of m possible classes.
Using a one-standard-deviation threshold, a parallelepiped algorithm decides $BV_{ijk}$ is in class c if, and only if,
$$\mu_{ck} - \sigma_{ck} \le BV_{ijk} \le \mu_{ck} + \sigma_{ck}$$
where c = 1, 2, 3, ..., m, the number of classes, and k = 1, 2, 3, ..., n, the number of bands.
Therefore, if the low and high decision boundaries are defined as
$$Low_{ck} = \mu_{ck} - \sigma_{ck}$$
and
$$High_{ck} = \mu_{ck} + \sigma_{ck}$$
the parallelepiped algorithm becomes
$$Low_{ck} \le BV_{ijk} \le High_{ck}$$
These decision boundaries form an n-dimensional parallelepiped in feature space.
If the pixel value lies between the low and the high threshold for a class in all n bands evaluated, it is assigned to that class; otherwise it is assigned to an unclassified category.
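A minimal sketch of the parallelepiped rule with a one-standard-deviation threshold is given below. The class statistics are made up and the function name is hypothetical. Where the boxes of two classes overlap, this sketch simply returns the first matching class, which is an assumption and not a rule stated above.

```python
# Hypothetical per-class (means, standard deviations) in two bands.
class_stats = {
    "water": ([20.0, 10.0], [4.0, 3.0]),
    "forest": ([40.0, 80.0], [5.0, 6.0]),
}

def parallelepiped_classify(pixel, stats):
    """Return the first class whose low/high limits contain the pixel in all bands."""
    for name, (means, stds) in stats.items():
        inside = all(
            mean - std <= value <= mean + std
            for value, mean, std in zip(pixel, means, stds)
        )
        if inside:
            return name
    return "unclassified"

label_in = parallelepiped_classify([22.0, 12.0], class_stats)
label_out = parallelepiped_classify([30.0, 40.0], class_stats)
```

A pixel that falls inside the limits in only some of the bands is left unclassified, which is why this classifier can leave gaps in the output map.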
Maximum Likelihood Classifier
Here $p_i$ is the probability of class i existing.
Theoretically, $p_i$ for each class is given equal weight if no knowledge regarding the existence of the features on the ground is available. If the chance of a particular class existing is greater than that of the others, then the user can define a set of a priori probabilities for the features and the equation can be slightly modified.
Decide X is in class c if, and only if,
$$p_c(a_c) \ge p_i(a_i), \quad i = 1, 2, 3, \ldots, m \text{ possible classes}$$
and
$$p_c(a_c) = \log_e(a_c) - 0.5\,\log_e\{\det(V_c)\} - 0.5\,(X - M_c)^T (V_c)^{-1} (X - M_c)$$
The use of a priori probabilities helps in incorporating the effects of relief and other terrain characteristics. The disadvantage of this classifier is that it requires a large amount of computer memory and computing time, and yet sometimes may not produce the best results.
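The maximum likelihood decision rule with a priori probabilities can be sketched for the two-band case, where the 2 x 2 covariance matrix can be inverted in closed form. The class statistics and priors are made up, and the function names are hypothetical.

```python
import math

def discriminant(pixel, mean, cov, prior):
    """log(prior) - 0.5 log(det V) - 0.5 (X - M)^T V^-1 (X - M) for two bands."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Closed-form inverse of a 2 x 2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [pixel[0] - mean[0], pixel[1] - mean[1]]
    mahalanobis = (
        dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
        + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1])
    )
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * mahalanobis

def maximum_likelihood_classify(pixel, classes):
    """Pick the class with the largest discriminant value."""
    return max(classes, key=lambda name: discriminant(pixel, *classes[name]))

# Hypothetical classes: (mean vector, covariance matrix, a priori probability).
classes = {
    "water": ([20.0, 10.0], [[16.0, 4.0], [4.0, 9.0]], 0.5),
    "forest": ([40.0, 80.0], [[25.0, 5.0], [5.0, 36.0]], 0.5),
}

label_a = maximum_likelihood_classify([22.0, 12.0], classes)
label_b = maximum_likelihood_classify([38.0, 75.0], classes)
```

With equal priors the log(prior) term is the same for every class and does not affect the decision; unequal priors shift the boundary toward the less likely class.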
CLASSIFICATION ACCURACY ASSESSMENT
No classification task using remote sensing data is complete until an assessment of accuracy is performed.
The analyst and the user of a classified map would like to know how accurately the classes on the ground have been identified on the image.
The term accuracy correlates to correctness.
In digital image processing, accuracy is a measure of agreement between standard information at a given location and the information at the same location on the classified image.
Generally, the accuracy assessment is based on the comparison of two maps: one based on the analysis of remote sensing data and the second based on information derived from the actual ground, also known as the reference map.
This reference map is often compiled from detailed information gathered from different sources and is thought to be more accurate than the map to be evaluated.
The reference map consists of a network of discrete parcels, each designated by a single label.
The simplest method of evaluation is to compare the two given maps with respect to the areas assigned to each class or category.
This yields a report of the areal extents of classes which agree with each other.
The accuracy assessment is presented as an overall classification accuracy of the map or as site-specific accuracy.
Overall classification accuracy represents the overall accuracy between two maps in terms of total area for each category. It does not take into account the agreement or disagreement between two maps at specific locations.
The second form of accuracy measure is site-specific accuracy, which is based upon detailed assessment of agreement between the two maps at specific locations.
Error Matrix
The standard form for reporting site-specific accuracy is the error matrix, also known as the confusion matrix or the contingency table.
An error matrix identifies not only the overall error for each category, but also the misclassification for each category.
An error matrix essentially consists of an n by n array, where n is the number of classes or categories on the reference map.
Here the rows of the matrix represent the true classes or information on the reference map, while the columns of the matrix represent the classes as identified on the classified map.
[Figure: Error Matrix. Layout of an error matrix with the reference image classes (Urban, Crop, Range, Water, Forest, Barren) as rows with row marginal totals, and the classified image classes (Urban, Crop, Range, Water, Forest, Barren) as columns with column marginal totals. The diagonal cells hold the correctly identified pixels and the bottom right cell holds the total sum of correctly identified pixels. Each row shows errors of omission while each column shows errors of commission.]
The values in the last column give the total number of true points per class used for assessing the accuracy.
Similarly, the total at the bottom of each column gives the number of points/pixels per class in the classified map.
The diagonal elements of the error matrix indicate the number of points/pixels correctly identified in both the reference and classified maps.
The sum total of all these diagonal elements is entered in the bottom right-hand element, i.e., the total number of points/pixels correctly classified in both the reference and classified maps.
The off-diagonal elements of the error matrix provide information on errors of omission and commission.
Errors of omission are found in the upper right half of the matrix, and for each class the error is computed by taking the sum of all the non-diagonal elements along the row and dividing it by the row total of that class.
ACCURACY INDICES
Kappa coefficient of agreement, K (Congalton et al., 1983)
Proportion of agreement after removing the proportion of agreement by chance.
$$K = \frac{P_o - P_e}{1 - P_e}$$

Combined accuracy, CAu and CAp (Fung and LeDrew, 1988)
CAu: average of overall accuracy and average user's accuracy.
$$CA_u = \frac{1}{2}\left[OA + AA_u\right]$$
CAp: average of overall accuracy and average producer's accuracy.
$$CA_p = \frac{1}{2}\left[OA + AA_p\right]$$

Weighted Kappa, Kw (Rosenfield & Fitzpatrick-Lins, 1986)
Proportion of weighted disagreement corrected for chance.
$$K_w = 1 - \frac{\sum v_{ij}\,P_{o_{ij}}}{\sum v_{ij}\,P_{e_{ij}}}$$
Conditional Kappa, K+i
Conditional Kappa computed from the ith column in the error matrix (Producer's).
$$K_{+i} = \frac{P_{o(+i)} - P_{e(+i)}}{1 - P_{e(+i)}}$$

Tau coefficient, Te and Tp (Foody (1992), Ma and Raymond (1995))
Te: Tau for classifications based on equal probabilities of class membership.
$$T_e = \frac{P_o - \frac{1}{q}}{1 - \frac{1}{q}}$$
Tp: Tau for classifications based on unequal probabilities of class membership.
$$T_p = \frac{P_o - P_r}{1 - P_r}$$

Conditional Tau, Ti+ and T+i (Naesset, 1996)
Ti+: Conditional Tau computed from the ith row (User's).
$$T_{i+} = \frac{P_{o(i+)} - P_i}{1 - P_i}$$
T+i: Conditional Tau computed from the ith column (Producer's).
$$T_{+i} = \frac{P_{o(+i)} - P_i}{1 - P_i}$$
Error Matrix

                              Predicted class
Actual class   Heather   Water   Forest 1   Forest 2   Bare soil   Pasture   Total
Heather            826       0          0          5          27         0     858
Water                0     878          0          0           0         0     878
Forest 1             0       0        720        183           0         7     910
Forest 2            33       0         21        878           2         0     934
Bare soil           61       0          0          0         560         0     621
Pasture              0       0          0          1           0       219     220
Total              920     878        741       1067         589       226    4081
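The marginal totals and percentages correct can be computed directly from the error matrix. The sketch below uses the values of the Heather/Water/Forest example, with rows as actual classes and columns as predicted classes; the variable names are illustrative assumptions.

```python
classes = ["Heather", "Water", "Forest 1", "Forest 2", "Bare soil", "Pasture"]

# Rows: actual (reference) class. Columns: predicted (classified) class.
error_matrix = [
    [826, 0, 0, 5, 27, 0],
    [0, 878, 0, 0, 0, 0],
    [0, 0, 720, 183, 0, 7],
    [33, 0, 21, 878, 2, 0],
    [61, 0, 0, 0, 560, 0],
    [0, 0, 0, 1, 0, 219],
]

row_totals = [sum(row) for row in error_matrix]
col_totals = [sum(col) for col in zip(*error_matrix)]
correct = [error_matrix[i][i] for i in range(len(classes))]
total_points = sum(row_totals)

# Overall accuracy: correctly classified points over all points, in percent.
overall_accuracy = 100.0 * sum(correct) / total_points

# Percent correct on a row basis (omission) and a column basis (commission).
pct_correct_omission = [100.0 * c / r for c, r in zip(correct, row_totals)]
pct_correct_commission = [100.0 * c / t for c, t in zip(correct, col_totals)]
```

Dividing the diagonal by the row total measures how much of each actual class was found, while dividing by the column total measures how much of each predicted class is really that class.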
$$p = p' - \left[1.645\sqrt{\frac{(p')(q)}{n}} + \frac{50}{n}\right]$$
where
p = the accuracy at the lower limit of the 95% confidence level,
p' = the overall accuracy (in percent),
q = 100 - p', and
n = the sample size.
If this value of p at the lower limit exceeds the defined criterion, then it is possible to accept this classification with 95% confidence.
Normally, the defined criterion is set at 85%. For the example given above, the accuracy at the lower limit is 91.6%, and hence the classification is acceptable, as the classified map has met or exceeded the defined accuracy standard.
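The lower-limit calculation can be checked with a few lines of Python. The sketch assumes the example's 4081 correct points out of a sample of 4421; the function name is hypothetical.

```python
import math

def lower_confidence_limit(p_prime, n):
    """Lower limit of accuracy (percent) at the 95% confidence level."""
    q = 100.0 - p_prime
    return p_prime - (1.645 * math.sqrt(p_prime * q / n) + 50.0 / n)

n_points = 4421
overall_accuracy = 100.0 * 4081 / n_points  # overall accuracy in percent
lower_limit = lower_confidence_limit(overall_accuracy, n_points)

# Compare against the 85% acceptance criterion.
accepted = lower_limit >= 85.0
```

Because the correction terms shrink as n grows, a large sample such as this one gives a lower limit only slightly below the overall accuracy.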
OMISSION & COMMISSION ERRORS

                               Omission                            Commission
Class       Points correct   n     % correct   95% conf. limits   n      % correct   95% conf. limits
Heather          826         858     96.3       95.0 - 97.6        920     89.8       87.8 - 91.8
Water            878         878    100.0       99.4 - 100.0       878    100.0       99.4 - 100.0
Forest 1         720         910     79.1       76.4 - 81.8        741     97.2       95.9 - 98.5
Forest 2         878         934     94.0       92.4 - 95.6       1067     82.3       80.0 - 84.6
Bare soil        560         621     90.2       87.8 - 92.6        589     95.1       93.3 - 96.7
Pasture          219         220     99.5       98.3 - 100.0       226     96.9       94.4 - 99.1
Using 85% as the criterion, it is seen that Forest 1 fails the test for errors of omission, as both the upper and lower limits of accuracy at the 95% confidence level are less than 85%.
Similarly, when errors of commission are evaluated, it is found that Forest 2 fails to meet the criterion. This is also evident from the error matrix.
It can be seen that 183 pixels of Forest 1 have not been identified as Forest 1 and have instead been identified as Forest 2. This implies that even though the overall classification accuracy has exceeded the defined criterion of acceptability, the training samples of classes Forest 1 and Forest 2 have not been able to provide the correct information to the classification process, and hence the training samples have to be collected with caution.
KAPPA COEFFICIENT
The above procedure for determining the accuracy of classification is highly dependent upon the training samples used for classification and for the assessment of classification accuracy.
In order to assess the agreement between two maps, Kappa (K) is used. It is a measure of the difference between the observed agreement between two maps (as reported by the overall accuracy) and the agreement that might be contributed solely by chance matching of the two maps.
It attempts to provide a measure of agreement that is adjusted for chance and is expressed as follows:
$$K = \frac{Observed - Expected}{1 - Expected}$$
Observed is the overall accuracy, while Expected is an estimate of the contribution of chance agreement to the observed percentage correct.
Expected is computed by first taking the products of row and column totals to estimate the number of pixels assigned to each element of the matrix, given that pixels are assigned by chance to each class.
The following table shows the sample computation of K for the error matrix given earlier.
Products of row totals and column totals

                     Column total
Row total      920       878       741      1067       589       226
858         789360    753324    635778    915486    505362    193908
878         807760    770884    650598    936826    517142    198428
910         837200    798980    674310    970970    535990    205660
934         859280    820052    692094    996578    550126    211084
621         571320    545238    460161    662607    365769    140346
220         202400    193160    163020    234740    129580     49720
Total of diagonal elements = 3646621
Total of all elements = 19545241
Expected agreement by chance = sum of diagonal elements / total of all elements = 3646621 / 19545241 = 0.187
Observed agreement (overall accuracy) = 4081 / 4421 = 0.923
$$K = \frac{0.923 - 0.187}{1 - 0.187} = 0.905$$
The value of K = 0.905 means that the classification has achieved an accuracy that is about 90% better than would be expected from random assignment of pixels to classes.
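The Kappa computation can be reproduced from the error matrix itself. The sketch below takes the chance agreement as the sum of the row total times column total products on the diagonal, divided by the square of the total number of points (which equals the sum of all the products); the variable names are illustrative assumptions.

```python
# Error matrix of the example: rows are actual classes, columns predicted classes
# (Heather, Water, Forest 1, Forest 2, Bare soil, Pasture).
error_matrix = [
    [826, 0, 0, 5, 27, 0],
    [0, 878, 0, 0, 0, 0],
    [0, 0, 720, 183, 0, 7],
    [33, 0, 21, 878, 2, 0],
    [61, 0, 0, 0, 560, 0],
    [0, 0, 0, 1, 0, 219],
]

row_totals = [sum(row) for row in error_matrix]
col_totals = [sum(col) for col in zip(*error_matrix)]
total_points = sum(row_totals)

# Observed agreement: proportion of points on the diagonal.
observed = sum(error_matrix[i][i] for i in range(len(error_matrix))) / total_points

# Expected (chance) agreement from the diagonal products of the marginal totals.
diagonal_products = sum(r * c for r, c in zip(row_totals, col_totals))
all_products = total_points * total_points
expected = diagonal_products / all_products

kappa = (observed - expected) / (1.0 - expected)
```

Summing every row total times column total product gives the same value as squaring the total number of points, which is a quick check on the table of products.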
In my next session, I will discuss unsupervised classification techniques.
THANK YOU