
Soft Computing
Final project: Classification trees
Hafrún Hauksdóttir and Sveinn Ríkarður Jóelsson
25th April 2005


1 The data

The data was acquired by ROSIS (Reflective Optics System Imaging Spectrometer) in 102 (Center) and 103 (University) spectral bands. The ground truth is split up into training (≈ 10%) and testing (≈ 90%) sets. The images were acquired during a flight campaign over Pavia, northern Italy (40°11' N, 9°9' E), on the 8th of July 2002 from 10:30 a.m. to 12:00 noon.

Tables 1 and 2 show the class names and the number of samples in each class and set. Figures 1 to 4 are the ground truth masks (test/train) used here for the Center and University data respectively.

Table 1: Classes and samples in the Center dataset

    Class  Name        Training  Testing
    1      Water            824    65147
    2      Trees            820     6778
    3      Meadows          824     2266
    4      Bricks           808     1891
    5      Bare Soil        820     5764
    6      Asphalt          816     8432
    7      Bitumen          808     6479
    8      Tiles           1260    41566
    9      Shadow           476     2387
    Total                  7456   140710

Table 2: Classes and samples in the University dataset

    Class  Name              Training  Testing
    1      Asphalt                548     6304
    2      Meadows                540    18146
    3      Gravel                 392     1815
    4      Trees                  524     2912
    5      Metal Sheets           265     1113
    6      Bare Soil              532     4572
    7      Bitumen                375      981
    8      Self-Bl. Bricks        514     3364
    9      Shadow                 231      795
    Total                        3921    40002

Figure 1: Test mask for Center

Figure 2: Train mask for Center

Figure 3: Test mask for University

Figure 4: Train mask for University


2 Principal Component Analysis

One approach to feature extraction is the principal component transform. The goal of feature extraction is to reduce the computational complexity and at the same time improve classification accuracy and generality. The resulting features of the PCT (principal component transform) are linear combinations of the original bands with maximum variance; they are transformations of the spectral bands and do not represent spectral bands of the original data. The principal component transformation is image content dependent and thus gives poor generalization of the classifier. Our goal is to train a classifier based on a small training set from one picture and classify the same picture, so the content dependence of the PCT is acceptable in this case.
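For reference, a standard way to write the transform (this formulation follows the usual definition and is not taken from the report itself): with training pixels x_i, mean µ and covariance Σ, the principal components are the eigenvectors a_k of Σ, and the k-th feature of a pixel x is its projection onto a_k,

\[ \Sigma = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^T, \qquad \Sigma a_k = \lambda_k a_k, \qquad y_k = a_k^T (x - \mu), \]

with the components ordered by decreasing eigenvalue λ_1 ≥ λ_2 ≥ · · ·, so the first few features carry the largest share of the variance.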

Figure 5: The first three principal components of the Center and University data sets (Center above).

The input data is normalized so that it has zero mean and falls in the interval [−1, 1]. The training set is then used to find the principal component transformation of the data. The principal component analysis transforms the training data so that the bands are uncorrelated, and it reduces the number of bands by retaining only those components which contribute more than a specified fraction of the total variation in the training set. The same transformation is then used on the test set.
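A minimal sketch of this preprocessing step, assuming NumPy and scikit-learn; the array names and the 0.01 variance fraction (the threshold quoted in section 4) are illustrative, not taken from the authors' code:

    import numpy as np
    from sklearn.decomposition import PCA

    def normalize(train, test):
        """Zero-mean the bands and scale them into [-1, 1] using training statistics."""
        mean = train.mean(axis=0)
        scale = np.abs(train - mean).max(axis=0)
        return (train - mean) / scale, (test - mean) / scale

    def pca_reduce(train, test, var_fraction=0.01):
        """Keep only the components explaining more than var_fraction of the variance."""
        pca = PCA().fit(train)                              # fitted on the training set only
        n = int((pca.explained_variance_ratio_ > var_fraction).sum())
        return pca.transform(train)[:, :n], pca.transform(test)[:, :n]

    # train_px, test_px: (n_samples, n_bands) arrays of labelled pixels
    # train_pc, test_pc = pca_reduce(*normalize(train_px, test_px))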



3 Morphological operators

Since the data is of an urban area, any structural information is of great value. One way to acquire such information is through morphological operators; see references [6, 1].

When manipulating the data we use iterative morphological operators on selected components of the data in order to generate an erosion and dilation 'history' for the pixels. Figure 6 gives a schematic of the procedure, where 3 principal components have been selected from the dataset and each of these components undergoes 3-fold erosion and dilation, so 18 'structural' bands are generated from the 3 components.


Figure 6: 18 structural channels derived from 3 principal components using 3-fold dilation and erosion
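A sketch of how such an erosion/dilation history could be computed, assuming SciPy and the disk-shaped structuring element of radius 5 pixels mentioned in section 4; function and variable names are illustrative:

    import numpy as np
    from scipy import ndimage

    def disk(radius):
        """Disk-shaped structuring element with the given radius in pixels."""
        y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
        return x * x + y * y <= radius * radius

    def morphological_profile(band, n_iter=3, radius=5):
        """Iteratively erode and dilate one band n_iter times (2 * n_iter output images)."""
        se = disk(radius)
        eroded, dilated, history = band, band, []
        for _ in range(n_iter):
            eroded = ndimage.grey_erosion(eroded, footprint=se)
            dilated = ndimage.grey_dilation(dilated, footprint=se)
            history += [eroded, dilated]
        return np.stack(history)                 # shape: (2 * n_iter, rows, cols)

    # pc_images: (3, rows, cols) principal component images
    # structural = np.concatenate([morphological_profile(b) for b in pc_images])   # 18 bands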



4 CART classifier (partition on input)

The trees we use in this section are grown using split criteria based on the Gini diversity/impurity index and the training data; splitting is stopped when a node is pure or the number of samples to split is less than 10. After training, the trees are pruned using an optimal pruning scheme that first prunes the branches giving the least improvement in error cost. The level of pruning is then selected to minimize the error when classifying the test set.

We use PCA to transform the data down to 3 bands (the only components which cover at least 0.01 of the variance in the data set) and then apply 3 iterative erosion and dilation morphological operators, with a disk-shaped structuring element of radius 5 pixels, on each of the 3 bands, ending up with a total of 21 bands to work on. Figure 6 gives an illustration of this. Finally we compare this approach with simply growing a tree on the whole dataset.

4.1 Center

In figure 7 we plot the training and testing error for all possible levels of pruning for a tree grown on the Center data, as well as the variance of the error between classes. By close inspection of the testing error we select a pruning level of 4, which yields the tree in figure 8, shown here as an example of a tree.
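As an illustrative sketch of this grow-and-prune procedure, assuming scikit-learn (whose cost-complexity pruning stands in for the optimal pruning scheme used in the report); names are illustrative:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def grow_and_prune(X_train, y_train, X_test, y_test):
        """Grow a Gini tree (no split below 10 samples), then keep the pruning
        level that minimizes the test error."""
        base = DecisionTreeClassifier(criterion="gini", min_samples_split=10)
        path = base.cost_complexity_pruning_path(X_train, y_train)   # candidate pruning levels
        best_tree, best_err = None, np.inf
        for alpha in path.ccp_alphas:
            tree = DecisionTreeClassifier(criterion="gini", min_samples_split=10,
                                          ccp_alpha=alpha).fit(X_train, y_train)
            err = 1.0 - tree.score(X_test, y_test)
            if err < best_err:
                best_tree, best_err = tree, err
        return best_tree, best_err

    # X_* are (n_samples, 21) feature matrices: 3 PC bands + 18 structural bands
    # tree, test_error = grow_and_prune(X_train, y_train, X_test, y_test)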

Figure 7: Train/test error (top) and between-class error variance (bottom) as a function of pruning level for Center



Figure 8: Pruned classification tree for Center

4.1.1 Results for Center

In table 3 we compare the results using PCA and morphological filters to simply growing a tree based on the whole dataset. The PCA and morphology seem to offer a few percent of accuracy improvement on this data set. The classified image can be seen in figure 9.


Table 3: Classification results comparison for Center

                  Whole dataset           PCA and morphology
    Class         Training   Testing      Training   Testing
    1             1.0000     0.9189       1.0000     0.9779
    2             0.9707     0.8536       0.9976     0.8320
    3             0.9830     0.9175       0.9939     0.8186
    4             0.9616     0.6520       1.0000     0.8657
    5             0.9646     0.8466       0.9988     0.8853
    6             0.9510     0.9134       0.9902     0.9347
    7             0.9319     0.8946       0.9913     0.7372
    8             0.9857     0.9572       0.9984     0.9468
    9             1.0000     0.9627       1.0000     0.9937
    Total         0.9717     0.9198       0.9966     0.9404


Figure 9: Center classified using PCA and morphological operators

4.2 University

Now we simply repeat the process for the University dataset. The most notable difference was that the accuracies improved much more with the PCA/morphology approach than they did for the Center dataset. A pruning level of 10 (PCA/M version) yielded the lowest testing error (figure 10), so the tree is considerably smaller than the one used above; the tree is drawn in figure 11.



Figure 10: Train/test error as a function of pruning level for University

4.2.1 Results for University

In table 4 we compare the results using PCA and morphological filters to simply growing a tree based on the whole dataset. The PCA and morphology seem to offer significant accuracy improvements on this data set.

Table 4: Classification results comparison for University

                  Whole dataset           PCA and morphology
    Class         Training   Testing      Training   Testing
    1             0.8011     0.7194       0.9617     0.8552
    2             0.8352     0.7574       0.9852     0.9638
    3             0          0            0.9260     0.5388
    4             0.8836     0.9701       0.9656     0.9457
    5             0.9736     0.8706       0.9585     0.8491
    6             0.4699     0.2622       0.9774     0.5658
    7             0.8507     0.8073       0.9787     0.5831
    8             0.8463     0.9230       0.9650     0.7203
    9             0.9610     0.9987       0.9870     0.8855
    Total         0.7235     0.6990       0.9674     0.8461

Finally we classify the whole image in figure 12.



Figure 11: Pruned classification tree for University




Figure 12: University classified using PCA and morphological operators



5 Binary Hierarchical Classifier

The binary hierarchical classifier is a tree classifier where each leaf in the tree represents one class. The tree has C − 1 internal nodes, where C is the number of classes, and gives a hierarchical structure of the classes. The natural hierarchy gives additional information about the similarity between classes.
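A minimal sketch of that structure (only the data structure, not the training procedure, which is described below); the class and field names are illustrative:

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class BHCNode:
        """A leaf holds a single class; an internal node splits its class set
        into a left and a right meta-class."""
        classes: List[int]
        left: Optional["BHCNode"] = None
        right: Optional["BHCNode"] = None

        def is_leaf(self) -> bool:
            return len(self.classes) == 1

    def count_internal(node: BHCNode) -> int:
        """A binary hierarchy over C classes always has C - 1 internal nodes."""
        return 0 if node.is_leaf() else 1 + count_internal(node.left) + count_internal(node.right)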


Figure 13: BHC tree structures. (a) Center data, (b) University data.

Feature extraction is applied in each internal node. Bands are aggregated until a user-defined ratio between the number of training samples in the node and the input dimension is achieved. The tree is then developed using the resulting set of features selected at each node.

The goal is to maximize the Fisher discriminant

\[ \tau(w) = \frac{w^T B w}{w^T W w}, \]

where W is the within sub-class covariance matrix,

\[ W = P(\omega_\alpha)\Sigma_\alpha + P(\omega_\beta)\Sigma_\beta, \]

and B is the covariance between the sub-classes,

\[ B = (\mu_\alpha - \mu_\beta)(\mu_\alpha - \mu_\beta)^T. \]

The Fisher projection that maximizes the discriminant is given by

\[ \tilde{w} = W^{-1}(\mu_\alpha - \mu_\beta). \]

The classes in each node are split in two by the maximum Fisher discriminant. Simulated annealing is used to determine the best split based on the Fisher discriminant: it uses a log-likelihood to estimate the probability of a class belonging to the left or right meta-class, and simultaneously determines the Fisher discriminant that separates these two subsets. This procedure is recursed until the original C classes are obtained at the leaves. The references [3, 2, 4] describe BHC in some detail, and a step-by-step description of the algorithm can be seen in reference [5].
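A sketch of the Fisher projection for one internal node with a fixed assignment of classes to the two meta-classes; the simulated annealing search over assignments described above is not shown, and the function names are illustrative:

    import numpy as np

    def fisher_projection(X_alpha, X_beta):
        """Fisher direction w = W^{-1}(mu_alpha - mu_beta), with W the
        prior-weighted within meta-class covariance."""
        n_a, n_b = len(X_alpha), len(X_beta)
        p_a, p_b = n_a / (n_a + n_b), n_b / (n_a + n_b)          # meta-class priors
        mu_a, mu_b = X_alpha.mean(axis=0), X_beta.mean(axis=0)
        W = p_a * np.cov(X_alpha, rowvar=False) + p_b * np.cov(X_beta, rowvar=False)
        return np.linalg.solve(W, mu_a - mu_b)                   # avoids forming W^{-1} explicitly

    # scores = X @ fisher_projection(X_alpha, X_beta)
    # thresholding the scores sends each sample to the left or right meta-class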


Table 5: Classification results (%) of the University test data when the BHC is trained with a subset of the bands

             BHC            BHC            BHC            BHC
    Class    bands [1:30]   bands [1:70]   bands [1:90]   bands [1:2:103]
    1        59.0           71.0           57.7           64.8
    2        54.5           66.0           54.3           55.7
    3        61.9           66.6           65.0           64.4
    4        84.3           93.9           91.2           94.2
    5        99.6           97.4           97.8           98.8
    6        31.1           65.3           90.9           92.4
    7        85.2           86.0           90.5           85.0
    8        72.9           77.1           88.4           86.6
    9        96.0           99.7           98.6           98.6
    OA       59.40          71.74          68.04          69.88

The best overall accuracy when subsets of the original bands are used is only 71.74%, and it decreases when the number of bands used goes above about 70. Feature extraction is therefore applied as preprocessing. By adding morphological layers to the PC data we boost the accuracy up to 87.50%. With the morphological layers comes additional spatial information. Each PC layer is eroded and dilated three times; thus we increase the dimensionality of the principal component data by a factor of seven.

Table 6: Classification results (%) of the Center test data

    Class    BHC      BHC, 3 PC bands   BHC, 5 PC bands   BHC, 10 PC bands
    1        99.9     91.2              97.1              99.1
    2        90.6     82.7              91.3              92.3
    3        97.3     95.5              87.5              91.2
    4        77.2     79.3              81.7              85.6
    5        85.2     80.3              94.3              92.7
    6        90.7     95.9              79.3              78.7
    7        91.1     77.3              80.5              86.4
    8        95.6     95.7              94.9              92.5
    9        99.1     95.6              91.5              91.2
    OA       96.26    91.29             93.76             87.50

As can be seen in table 6, a good result, 96.26% overall accuracy, is obtained by classifying the Center data directly with the BHC classifier without any preprocessing. By adding morphological layers to the PC image, better classification results are obtained for the University data, but the overall accuracy drops for the Center data. Figures 14 and 15 show the best classification results for the BHC classifier.


Table 7: Classification results (%) of the University test data with PC transformation and additional morphological layers

    Class    BHC, 3 PC bands   BHC, 5 PC bands   BHC, 9 PC bands
    1        74.7              72.7              77.2
    2        93.1              93.6              95.9
    3        59.8              56.7              66.3
    4        98.3              96.6              94.3
    5        98.4              99.3              99.5
    6        58.2              59.9              76.5
    7        47.9              48.5              64.5
    8        74.3              78.6              83.1
    9        99.5              99.9              95.0
    OA       82.66             82.93             87.50

Figure 14: Classification results for the Center data



Figure 15: Classification results for the University data



References

[1] Benediktsson JA, Palmason JA, Sveinsson JR. Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE Trans. on Geoscience and Remote Sensing, 43(3):480–491, 2005.

[2] Ham J, Chen Y, Crawford MM, Ghosh J. Investigation of the random forest framework for classification of hyperspectral data.

[3] Kumar S, Ghosh J, Crawford MM. A hierarchical multiclassifier system for hyperspectral data analysis.

[4] Kumar S, Ghosh J, Crawford MM. Best-bases feature extraction algorithm for classification of hyperspectral data. IEEE Trans. on Geoscience and Remote Sensing, 39(7):1368–1379, 2001.

[5] Kumar S, Ghosh J, Crawford MM. Hierarchical fusion of multiple classifiers for hyperspectral data analyses. Pattern Analysis and Applications, 5:210–220, 2002.

[6] Benediktsson JA, Palmason JA, Sveinsson JR. Classification of hyperspectral ROSIS data from urban areas.
