
International Journal of Remote Sensing, iFirst, 2011, 1–10

Impact of reference datasets and autocorrelation on classification accuracy

SYLVIO MANNEL*†, MARIBETH PRICE‡ and DONG HUA§
†Department of Geosciences, Idaho State University, Pocatello, ID 83209, USA

‡Department of Geology and Geological Engineering, South Dakota School of Mines and Technology, 501 E St Joseph Street, Rapid City, SD 57701, USA

§Department of Forest Resources, University of Minnesota, 1530 Cleveland Avenue North, St Paul, MN 55108-6112, USA

(Received 31 January 2008; in final form 28 September 2009)

Reference data and accuracy assessments via error matrices build the foundation for measuring the success of classifications. An error matrix is often based on the traditional holdout method, which utilizes only one training/test dataset. If the training/test dataset does not fully represent the variability in a population, accuracy may be over- or underestimated. Furthermore, reference data may be flawed by spatial errors or autocorrelation, which may lead to overoptimistic results. For a forest study we first corrected spatially erroneous ground data and then used aerial photography to sample additional reference data around the field-sampled plots (Mannel et al. 2006). These reference data were used to classify forest cover and subsequently determine classification success. Cross-validation randomly separates a dataset into several training/test sets and is well documented to provide a more precise accuracy measure than the traditional holdout method. However, random cross-validation of autocorrelated data may overestimate accuracy, which in our case was between 5% and 8% for a 90% confidence interval. In addition, we observed accuracies differing by up to 35% for different land cover classes depending on which training/test datasets were used. The observed discrepancies illustrate the need for paying attention to autocorrelation and for utilizing more than one permanent training/test dataset, for example through a k-fold holdout method.

*Corresponding author. Email: [email protected]
Now at: Cottey College, 6000 W. Austin, Nevada, MO 64772, USA.

International Journal of Remote Sensing ISSN 0143-1161 print/ISSN 1366-5901 online © 2011 Taylor & Francis
http://www.tandf.co.uk/journals
DOI: 10.1080/01431161.2010.498841

1. Accuracy assessment review

Accuracy assessment and reference data are vital, yet often under-appreciated, issues in the design and validation of image classification. Reference data are used as training and test data to train a classifier and subsequently test classification accuracy. Reference data can be field-sampled ground data, or obtained through other means. It is an established procedure to randomly divide the reference data into one permanent training and test dataset. Based on the test data, an error matrix is usually calculated and reported (Congalton and Green 2008). This traditional holdout method assumes that the test data cover the full spectral range of each class in the data. However, if this assumption is inaccurate and the variability of the land cover class is not fully captured in the test data, the results of the error matrix may over- or underestimate the accuracy (Congalton and Green 2008). The results may depend heavily on just how the training/test data happened to be separated.
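As a minimal sketch of the error matrix idea (hypothetical labels rather than the study's data, and scikit-learn's confusion_matrix assumed purely for convenience), the overall accuracy of one holdout test set is simply the sum of the matrix diagonal divided by the total number of test samples:

    # Sketch: an error (confusion) matrix and overall accuracy for one holdout test set.
    import numpy as np
    from sklearn.metrics import confusion_matrix

    reference = ['pine', 'pine', 'aspen', 'spruce', 'aspen', 'pine']    # test (reference) labels
    classified = ['pine', 'aspen', 'aspen', 'spruce', 'aspen', 'pine']  # classified map labels

    cm = confusion_matrix(reference, classified, labels=['pine', 'aspen', 'spruce'])
    print(cm)                                             # rows: reference, columns: classified
    print('overall accuracy:', np.trace(cm) / cm.sum())   # 5 of 6 test samples correct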

In this work, we focus on two issues: inadequate training/test data and autocorrelation. First, using only one training and test dataset may not be representative of the entire population. Second, cross-validation may be a solution in that it randomly divides reference data into several training/test datasets; however, if autocorrelation exists in the data, simple random selection of training/test data is likely to overestimate the accuracy. Autocorrelation might be present due to large pixel size (Friedl et al. 2000) or points sampled in close proximity via aerial photographs around field-sampled plots (Mannel et al. 2006). Some studies have published warnings about meaningless results in the case of improperly determined training/test samples (Weber 2006, Congalton and Green 2008, Weber et al. 2008) or have pointed to autocorrelation issues within Advanced Very High Resolution Radiometer (AVHRR) reference data (Friedl et al. 2000).

Remote sensing projects rarely explore how prevalent reference data or accuracy assessment issues are in limiting the validity of classification results for different land cover types, sensors or classification methods. Studies such as Friedl et al. (2000) illustrate how easily accuracy can be misrepresented. Their accuracy for a nearest neighbour classification of AVHRR data, for example, dropped from 91.5% to just 37.9% after taking autocorrelation into account. Our work confirms and quantifies limitations due to spatial errors and autocorrelation, as well as insufficient training/test reference samples. In the following we elucidate these issues and suggest ways to correct for them.

Global Positioning System (GPS) errors or georeferencing errors of remote sensing imagery are two possible error sources impacting the spatial validity of reference data (Weber 2006). Georeferencing errors may cause plots that are close to the border of their land cover class to actually end up in the wrong land cover or as a mixed pixel on another remote sensing scene (figure 1(b)). Not checking the spatial validity of reference points can lead to lower accuracies in subsequent classifications (Williams and Hunt 2002). There seems to be a need for more research on how much reference data are impacted by spatial errors and other reference data issues, such as autocorrelation.

Many aspects constrain satisfactory reference data acquisition, such as available finances, time and manpower, but also the spatial extent of a land cover class or the spectral range within the land cover class. However, a large number of reference data points, covering the spectral range of the desired land cover classes, is crucial for a robust, solid and repeatable classification (Congalton 1991). Training data are subject to errors; yet, quality training data are paramount to successful and repeatable classification (Frery et al. 2009). Test data that do not cover the spectral range of a land cover will introduce a bias by underemphasizing classification success or mistakes within the underrepresented spectral range of the land cover class. Using aerial photography, such as Digital Orthoquads (DOQs), to collect additional points around field-measured plots is a strategy to improve classification robustness by increasing the number of reference data points by up to 10-fold (Mannel et al. 2006). However, points within these reference data clusters are sampled close to each other in homogeneous areas and thus may be positively autocorrelated (figure 1).

Figure 1. Black Hills, South Dakota. The circle with a dot is a field-measured medium pine plot; open circles are additional medium pine plots based on Digital Orthoquads (DOQs) and spatially validated on Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data. Field data in boundary regions may lower accuracy (e.g. the original field-sampled pine plot is invalid for AVIRIS data). Additional points were sampled based on (a) the DOQ; the validity of the reference data was checked on (b) the AVIRIS data.

An early account of autocorrelation is given by Tobler's first law of geography, which states that closer areas are usually more similar than areas farther away (Tobler 1970). This is also the case with pixels in spatial proximity to each other. They are seldom uncorrelated or negatively correlated, with the degree of correlation decreasing as distance increases (Kettig and Landgrebe 1976). This is exactly what allows DOQs to be used to sample additional reference points in homogeneous areas close to field-sampled plots (Mannel et al. 2006). Subsampling is a method to reduce autocorrelation by increasing variability within a shorter sampling interval. However, subsampling was not applicable to our reference data because, first, additional DOQ points of the same land cover class were picked in homogeneous areas and, second, they were in close spatial proximity, sometimes within 30 m (approximately one Landsat Thematic Mapper (TM) pixel) of each other. Geographic Information Systems (GIS) and related technologies, such as remote sensing, have resulted in demand for and development of new techniques to quantify autocorrelation (e.g. Anselin and Getis 1992, Getis and Ord 1992). Furthermore, Steele et al. (1998) describe a method for estimating the spatial distribution and variation of map accuracy across remote sensing imagery.

For datasets that consist of spatially correlated data clusters, training and test data that are determined in an entirely random manner can pose a problem, because pixel-based splits may not provide independent training and test data (Friedl et al. 2000). Autocorrelation in test data has been found to inflate estimates of classification accuracy by over 50% in AVHRR data where pixel splits were performed (Friedl et al. 2000). Friedl et al. (2000) suggest separating autocorrelated AVHRR reference data based on sites (or clusters). The impact of autocorrelation and spatial errors on the validity of a reference dataset is still largely unclear.

A second issue with reference data is the spectral validity and robustness of the one chosen training/test dataset. Often it is unclear if, and by how much, accuracies would have differed with a different training/test set. An alternative to dividing the data into just one training/test set is cross-validation or bootstrapping. A k-fold cross-validation averages accuracy over more than one randomly obtained test and training subset. The user determines k, the number of subsets into which the data are divided. In turn, each of the k subsets is used once as test data while the other k−1 subsets are used as training data; the average error across all k trials is then computed. Cross-validation is advantageous because it makes the greatest use of the available data by having several training/test sets, thus allowing all data to be used in the classification. It largely avoids misleading results that are a function of test data that inadequately represent the variability of the data. However, cross-validation determines training and test datasets by separating sample pixels randomly. Random separation of reference data into training/test samples will inflate accuracy if the data are autocorrelated (Friedl et al. 2000). We therefore expect inflated accuracies when applying cross-validation to autocorrelated reference data (measured using Moran's Index).
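The procedure is easy to express in code. The following is a minimal sketch (using scikit-learn and randomly generated placeholder features and labels, not the study's AVIRIS data) of a random, pixel-based k-fold cross-validation; this is precisely the variant that can inflate accuracy when the reference pixels are autocorrelated:

    # Sketch: random k-fold cross-validation with a decision tree (placeholder data).
    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(600, 10))      # placeholder spectral features
    y = rng.integers(0, 7, size=600)    # placeholder labels for seven land cover classes

    accuracies = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
        accuracies.append(clf.score(X[test_idx], y[test_idx]))

    print('mean accuracy over k folds:', np.mean(accuracies))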

A classification method that is often assessed with k-fold cross-validation is decision tree classification (e.g. Plutowski et al. 1994, De'Ath and Fabricius 2000 or Quinlan 2000), but cross-validation could easily be applied to any other classification method. Although decision trees in remote sensing applications were already evaluated in the 1970s (Swain and Hauska 1977), only in the last decade did decision tree classification gradually emerge from business applications into natural science and provide successful land cover classifications (Hansen et al. 1996, Brodley et al. 1999, Lawrence and Wright 2001, Vogelmann et al. 2001). A decision tree consists of branches and nodes. Each non-terminal node is labelled with a question, which splits the cases into subsets according to the answer to that question. Terminal nodes give the predicted class. Decision trees have similarities to other machine learning approaches. They use recursive partitioning algorithms to derive classification rules from training samples, which is often referred to as data mining (Read 2000). Entropy measures are used to compare all possible splits to find the one split that results in the highest dissimilarity among the resulting subsets (Breiman et al. 1984). Possible splits of all variables are examined and the split within a particular variable that produces the smallest entropy measure is chosen to partition the data (Breiman et al. 1984).
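As a rough numerical illustration of that splitting criterion (a sketch only; the exact impurity measure varies between tree implementations, with CART often using the Gini index instead), the weighted entropy of the two subsets produced by a candidate threshold can be computed as follows, and the split with the smallest value is preferred:

    # Sketch: scoring one candidate split by the weighted entropy of its subsets.
    import numpy as np

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def split_entropy(x, y, threshold):
        left, right = y[x <= threshold], y[x > threshold]
        return (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)

    x = np.array([0.10, 0.35, 0.40, 0.70, 0.80, 0.90])                 # one band (toy values)
    y = np.array(['pine', 'pine', 'pine', 'aspen', 'aspen', 'aspen'])
    print(split_entropy(x, y, 0.5))   # 0.0: this threshold separates the classes perfectly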

Much work is still needed to reflect the importance that reference data acquisition holds for the remote sensing community and to fully comprehend the effects of commonly used, yet potentially unreliable, accuracy assessments, such as the traditional holdout method. Studies on reference data and autocorrelation within reference data have not been conclusive or detailed enough for common remote sensing platforms or common land cover types to cause a change in how most remote sensing classifications are being assessed.

This work contributes to the analysis of the effects of spatial autocorrelation in multilayer datasets on classification accuracy and specifically draws attention to the need to (1) confirm the validity of the horizontal positions of reference data, (2) use more than one training/test dataset, and (3) use cluster-based training/test datasets (versus random reference data separation) if autocorrelated clusters are present in the data.

2. Project example

We utilized 135 random field-sampled plots, stratified by tree species and density, to classify a forested area in the Black Hills of western South Dakota (SD), USA. The field-sampled plots were positively autocorrelated, with a Moran's Index of 0.12. Decision trees were used on 20 Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) scenes to classify species composition and tree density.

We georeferenced the 20 m resolution AVIRIS scenes based on 1 m resolution DOQs into the Universal Transverse Mercator (UTM) Zone 13 map projection. Up to four DOQ scenes had to be fused to cover one AVIRIS scene. Each AVIRIS scene was georegistered independently because of limited computer capacity. We first rotated the AVIRIS scenes to north–south orientation and then identified at least 40 Ground Control Points (GCPs) on the AVIRIS image and the DOQ base image. To ensure the best georeferencing results, we took the first four GCPs from near the corners of the AVIRIS image and focused on an even distribution of GCPs rather than quantity. The AVIRIS scenes were georeferenced using a third-order polynomial with nearest neighbour resampling. The average root mean square (RMS) error was 1.3 pixels. The maximum RMS error for a GCP was 4.6 pixels.
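As a small aside, an RMS error over GCP residuals can be computed as in the following sketch (the residual values are made up for illustration, not the study's GCPs):

    # Sketch: RMS error (in pixels) of ground control point residuals.
    import numpy as np

    def rms_error(predicted_xy, reference_xy):
        residuals = np.asarray(predicted_xy, float) - np.asarray(reference_xy, float)
        return np.sqrt(np.mean(np.sum(residuals ** 2, axis=1)))

    # Toy usage with three GCPs (image coordinates in pixels).
    print(rms_error([(10.4, 20.9), (55.1, 80.2), (99.6, 40.3)],
                    [(10.0, 21.0), (54.0, 81.0), (100.0, 40.0)]))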

Prior to classification we overlaid the AVIRIS and DOQ imagery with the field plots to check for spatial errors, which can lead to lower accuracies (Williams and Hunt 2002). Clearly visible tree stands on the 1 m DOQs were visually compared to hue, extent and form on the AVIRIS scenes (Mannel et al. 2006). We then added additional data points within the homogeneous area in close proximity to the existing field data (Mannel et al. 2006). For example, a dense pine field plot might be surrounded by a large homogeneous dense pine area that is visually identifiable by its similar hue values on a DOQ (figure 1(a)). Additional DOQ-based reference points of dense pine would then be collected from this homogeneous area in close proximity to the original field plot (Mannel et al. 2006).

Sampling additional points clustered around existing data points ensures that much of the spectral range of a land cover class is included in a classification, but it also causes spatial autocorrelation. Our final reference data included sample plots measured in the field and additional points selected from DOQs around the field data locations. Moran's Index of all the data, including the clustered additional points, was 0.67.
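Moran's Index can be computed in a few lines; the sketch below assumes a simple inverse-distance spatial weight matrix, which is only one of several common weighting choices and not necessarily the one used in this study (the coordinates and values are toy placeholders):

    # Sketch: Moran's I for point observations using inverse-distance weights.
    import numpy as np

    def morans_i(coords, values):
        coords = np.asarray(coords, dtype=float)
        x = np.asarray(values, dtype=float)
        n = len(x)
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
        w = np.zeros((n, n))
        w[d > 0] = 1.0 / d[d > 0]          # inverse-distance weights, zero on the diagonal
        z = x - x.mean()
        return (n / w.sum()) * (z @ w @ z) / (z @ z)

    # Toy usage: two tight clusters of similar values give a positive index.
    coords = [(0, 0), (0, 30), (30, 0), (500, 500), (500, 530), (530, 500)]
    values = [1, 1, 1, 5, 5, 6]
    print(morans_i(coords, values))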

During this process, it became evident that some original field data were in close spatial proximity to each other and were part of the same cluster. The number of reference clusters (table 1) is lower than the original number of randomly determined field plots for two reasons. First, when checking for horizontal positioning errors, we eliminated some field plots on the border of land cover classes that turned out to be mixed pixels or a different land cover on the AVIRIS or DOQ scene (figure 1(b)). Second, some field plots were close to each other and sampling additional points caused them to merge into one cluster.

Table 1 shows the final number of field samples, the number of additional DOQ-based samples and the number of generated clusters. The land cover classes 'water' and 'bare' were sampled only using DOQs. The land cover class 'bare' was rare; a bare area large enough to be independent of possible georeferencing errors was found only in one sandpit.

Table 1. Number of sites for each land cover class. The field-measured plots were supplemented with additional points based on Digital Orthoquads (DOQs), resulting in large point clusters.

Land cover     Number of field data plots     Number of field- and DOQ-based reference data     Number of clusters*
Meadow         19                             64                                                10
Bare           0                              5                                                 1
Open pine      26                             104                                               14
Dense pine     26                             212                                               19
Spruce         21                             126                                               12
Aspen          17                             84                                                9
Water          0                              106                                               2
Mixed          25                             Not utilized                                      –

*Resulting number of clusters after checking field data plots for autocorrelation and spatial proximity to other land cover classes.

In an effort to prevent autocorrelated training and test data, we selected training and test data by randomly dividing the spatially unrelated point clusters (versus randomly dividing all reference data). An entire cluster was either training or test data. We applied the traditional holdout method to different training/test datasets and noticed discrepancies indicating that the selected training/test data were insufficient for capturing the entire spectral range of a land cover. We then created four trial sets, each with different randomly chosen training and test clusters, to quantify the uncertainties related to using only one training/test dataset. Ideally, a comparison of all possible training/test combinations would have been preferable, but this was time prohibitive without a computerized process.
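A minimal sketch of such cluster-based splitting (placeholder features, labels and cluster identifiers rather than the study's reference data; scikit-learn's GroupShuffleSplit is assumed as one convenient way to keep whole clusters on one side of the split, with roughly one-third of the clusters held out in each of four trials):

    # Sketch: four trials in which whole clusters, not individual pixels, are held out.
    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(1)
    X = rng.normal(size=(400, 10))             # placeholder spectral features
    y = rng.integers(0, 7, size=400)           # placeholder land cover labels
    clusters = rng.integers(0, 60, size=400)   # placeholder cluster id for each point

    gss = GroupShuffleSplit(n_splits=4, test_size=1/3, random_state=0)
    trial_accuracies = []
    for train_idx, test_idx in gss.split(X, y, groups=clusters):
        clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
        trial_accuracies.append(clf.score(X[test_idx], y[test_idx]))

    print('per-trial accuracies:', trial_accuracies)
    print('four-fold holdout mean accuracy:', np.mean(trial_accuracies))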

The test data for each trial were chosen by randomly selecting approximately one-third of the reference clusters. The remaining two-thirds of the clusters became training data. We felt it necessary to select at least one-third of the clusters as test data in order to capture enough data variability. For the 'bare' land cover class we divided individual points (within the one available reference cluster) into training/test data. As in a traditional holdout method, each individual test trial was evaluated with an error matrix. The resulting accuracies for each land cover were compared to the other three trial results to quantify the error range attributable to having different training/test sets. We then averaged the accuracy of all four trials (four-fold holdout method). In order to quantify the error range, we calculated a confidence interval over all four trials. The confidence interval is based on the arcsine transformation, which normalizes the data so that the confidence interval can be determined under a nearly normal distribution (Zar 1984).
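Zar (1984) gives several variants of the angular (arcsine) transformation; the sketch below assumes the simplest form, arcsin(sqrt(p)), with a t-based interval computed on the transformed scale and back-transformed to proportions, which may differ in detail from the exact procedure used here. With the four spruce trial accuracies of table 2 it yields roughly the 62–97% interval reported there:

    # Sketch: 90% confidence interval for per-trial accuracies via the arcsine transformation.
    import numpy as np
    from scipy import stats

    def arcsine_ci(proportions, confidence=0.90):
        t = np.arcsin(np.sqrt(np.asarray(proportions, dtype=float)))  # angular transformation
        n = len(t)
        half = stats.t.ppf(0.5 + confidence / 2, df=n - 1) * t.std(ddof=1) / np.sqrt(n)
        lo, mean, hi = t.mean() - half, t.mean(), t.mean() + half
        return tuple(np.sin(np.clip([lo, mean, hi], 0.0, np.pi / 2)) ** 2)

    print(arcsine_ci([0.57, 0.92, 0.92, 0.84]))   # approximately (0.62, 0.83, 0.97)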

A k-fold cross-validation would be a valid alternative to the traditional holdout method, since it utilizes several training/test datasets. However, cross-validation separates reference data randomly and thus could lead to overoptimistic results if autocorrelation is present, as it is in our DOQ reference data (Moran's Index was 0.67). We applied the cross-validation simply to compare the accuracies of different classification settings within this particular project, knowing that the actual accuracies could be inflated. More importantly, we were interested in quantifying the bias relating to autocorrelation and performed a paired t-test to estimate how consistently random cross-validation would overestimate the accuracy.
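A paired t-test of this kind can be sketched as follows (scipy's ttest_rel is assumed; the pairing below uses the per-class accuracies of table 3 purely as an illustration, and because the exact pairing used in the study is not specified, it need not reproduce the reported p-value of 0.01):

    # Sketch: paired t-test, random cross-validation vs. cluster-based four-fold holdout,
    # using the per-class accuracies of table 3 (in percent) as the paired samples.
    from scipy import stats

    random_cv        = [94, 100, 78, 85, 89, 88, 100]   # meadow, bare, open pine, dense pine,
    fourfold_holdout = [88,  94, 71, 79, 81, 83,  95]   # spruce, aspen, water

    t_stat, p_value = stats.ttest_rel(random_cv, fourfold_holdout)
    print(t_stat, p_value)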

3. Results

Reference data preparation and altering the training/test datasets impacted accuracy assessment outcomes. We found that spatial errors and autocorrelation limit the available reference data. Error matrices using different training/test datasets produced different accuracies. Not adjusting for the autocorrelation of heavily autocorrelated reference data produced inflated accuracies.

After examining the validity of the randomly stratified field plots and combining autocorrelated plots to form spatially independent clusters, we found that the number of spatially independent clusters decreased by 25% to 50% in comparison with the original field dataset. The 19 field-sampled meadow plots, for example, yielded only 10 spatially valid and independent clusters (table 1).


Different training and test datasets produced accuracies that differed by up to 35%, as in the spruce land cover class (table 2). The trial 2 accuracy for the spruce class was 92%. The training/test set of trial 1, on the other hand, reported an accuracy of only 57% for the same class. The 90% confidence interval for this class had a range of 35% (62% to 97%). In a traditional holdout method, only one or the other training/test dataset might have been used. Accuracy assessment through the k-fold holdout method, on the other hand, averages the accuracies of all k trials.

We observed an especially high fluctuation of accuracy for the bare/unvegetated class, which had a very low number of reference samples (table 1). The misclassification of one test data point had a large impact on accuracy; for example, trial 4 was 25% less accurate than trial 3. The different trials showed that overall accuracy also fluctuated. For example, the trial 4 overall accuracy was 12% lower than that of trial 3 (table 2).

Random cross-validation, which did not adjust for autocorrelation, yielded 5–10% higher accuracies than the manual four-fold holdout method, which used reference data divided by clusters (table 3). Within a 90% confidence interval, the overestimation was 5% to 8%. The p-value was 0.01. This is consistent (although less drastic) with previous research by Friedl et al. (2000), who found up to 54% inflated accuracies when not adjusting for autocorrelation in AVHRR data.

Table 2. Total accuracies for different land cover types.

Land cover     Trial 1* (%)   Trial 2* (%)   Trial 3* (%)   Trial 4* (%)   Four-fold holdout method† (%)   lCI‡ (%)   Mean‡ (%)   uCI‡ (%)
Meadow         94             90             89             81             88                              82         89          94
Bare           100            100            100            75             94                              82         98          97
Open pine      69             58             86             71             71                              57         72          84
Dense pine     74             82             87             74             79                              71         80          87
Spruce         57             92             92             84             81                              62         83          97
Aspen          90             84             83             76             83                              76         84          90
Water          87             100            100            95             95                              87         98          100
Total          82             87             91             79             85

*The variability in the four trials suggests that more than one permanent training/test dataset may be necessary for a true accuracy estimation.
†The four-fold holdout method averages the accuracies of the four trials.
‡90% confidence intervals constructed on the four resampling trials using the arcsine transformation (lCI: lower confidence interval; uCI: upper confidence interval).

Table 3. Accuracies based on random points and random clusters.

Land cover     Random cross-validation* (%)   Four-fold holdout method† (%)
Meadow         94                             88
Bare           100                            94
Open pine      78                             71
Dense pine     85                             79
Spruce         89                             81
Aspen          88                             83
Water          100                            95
Total          91                             85

*Random cross-validation of autocorrelated data leads to overestimated accuracy.
†The four-fold holdout method averages the accuracies of the four trials.

4. Discussion

The results of this research illustrate the impact of (a) reference data preparation, (b) one versus several training/test datasets, and (c) autocorrelation on reported classification accuracy. First, our results demonstrated that adjusting for spatial errors and autocorrelation may lead to final reference clusters that are fewer in number than the original field data. Second, a k-fold holdout method provides a better accuracy estimation than the traditional holdout method, because it averages the results of several training/test datasets. Third, not correcting for autocorrelation will artificially inflate accuracy consistently and significantly.

A failure to properly sample or prepare reference data, or the use of only one training/test dataset, complicates comparison of classification results with other projects that use different data, classification methods or sensor types. One of our goals was to quantify how much reference data are affected when adjusting for spatial errors and autocorrelation. During field data acquisition we included small but important tree stands as well as field sites close to the edge of another land cover type. Georeferencing and GPS errors affected the validity of some of the field samples in these stands. The number of spatially independent reference data clusters was up to 48% less than the number of original field plots. These results suggest the importance of checking field sites on remotely sensed images for spatial validity and discarding any that clearly fall in the wrong land cover class. If possible, setting up a field data acquisition scheme that utilizes mainly sites in homogeneous areas far from land cover boundaries will help prevent wasted time collecting unusable field data.

We also assessed the impact of several training and test datasets (k-fold holdout method) versus the traditional holdout method, which uses only one training/test dataset. We found that accuracies diverged consistently and significantly between the k trials, confirming the concern that one permanent training/test set alone rarely covers the variability within the entire dataset. For example, in trial 1 spruce had an accuracy of 57%, whereas different training/test data distributions in trials 2 and 3 showed a 92% classification accuracy. This illustrates the need for implementing more than one training/test set in order to accurately determine the success of a remote sensing classification. Assessing accuracy using several training/test datasets through the k-fold holdout method provided several advantages: (1) avoiding misleading results due to one permanent training/test dataset; (2) allowing all the data to be used as training data in building the final classifier (also an advantage of cross-validation); and (3) allowing the identification of classification parameters and settings that are robust and not specific to a single random training/test dataset.

Another alternative is random k-fold cross-validation. However, autocorrelation may be present within spatially close field data, for example when sampling additional data from DOQs clustered around field-sampled plots. Closely spaced points should be combined into clusters, where one cluster is either test or training data. If autocorrelation is present in the reference data, random cross-validation inflates accuracy, in our case by 5% to 8% within a 90% confidence interval. Random cross-validation can still be useful for comparing different settings within one classification project.


Our study suggests using the k-fold holdout method as an objective way of measuring classification accuracy. If reference data are free of autocorrelation, cross-validation or similar techniques are also a valid alternative. Both methods are easily applicable to most classifications and land cover types. Autocorrelation should be taken into account whenever reference data are in close proximity, for example AVHRR data (Friedl et al. 2000) or DOQ-based reference point clusters (Mannel et al. 2006). Other methods, such as the one proposed by Steele et al. (1998), can further quantify how misclassification probabilities vary with terrain and land use. We encourage additional studies to quantify the limits of reference data and the impact of accuracy assessment methods on reported classification results for different remote sensing data, classification methods and land cover types.

Acknowledgements
The authors would like to acknowledge the efforts of Mark Rumble and the USDA Rocky Mountain Forest Service Research Station for providing vegetation data and support with the reference data collection, Bruce Wylie and Chengquan Huang for assistance with the decision tree analysis, and Doug Baldwin and Krystal Price for collecting and processing field data. Furthermore, we would like to thank Keith Weber for his review and suggestions and Teri Peterson for her help with the statistical analysis. This research was supported by the National Aeronautics and Space Administration (NASA) Food and Fiber Applications of Remote Sensing Program, grant NAG13-99021, the South Dakota School of Mines and Technology, National Science Foundation/Experimental Program to Stimulate Competitive Research (EPSCoR) grants EPS-0091948 and EPS-9720642, and the South Dakota Space Grant Consortium.

References
ANSELIN, L. and GETIS, A., 1992, Spatial statistical analysis and geographic information systems. The Annals of Regional Science, 26, pp. 19–33.
BREIMAN, L., FRIEDMAN, J., OLSHEN, R. and STONE, C., 1984, Classification and Regression Trees (Belmont, CA: Wadsworth).
BRODLEY, C.E., LANE, T. and STOUGH, T.M., 1999, Knowledge discovery and data mining. American Scientist, 87, pp. 54–61.
CONGALTON, R.G., 1991, A review of assessing the accuracy of classifications of remotely sensed data. Remote Sensing of Environment, 37, pp. 35–46.
CONGALTON, R.G. and GREEN, K., 2008, Assessing the Accuracy of Remotely Sensed Data: Principles and Practices (2nd edn), Mapping Sciences Series (Boca Raton, FL: Lewis).
DE'ATH, G. and FABRICIUS, K.E., 2000, Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology, 81, pp. 3178–3192.
FRERY, A.C., FERRERO, S. and BUSTOS, O.H., 2009, The influence of training errors, context and numbers of bands in the accuracy of image classification. International Journal of Remote Sensing, 30, pp. 1425–1440.
FRIEDL, M.A., WOODCOCK, C., GOPAL, S., MUCHONEY, D., STRAHLER, A.H. and BARKER-SCHAAF, C., 2000, A note on procedures for accuracy assessment in land cover maps derived from AVHRR data. International Journal of Remote Sensing, 21, pp. 1073–1077.
GETIS, A. and ORD, K., 1992, The analysis of spatial association by use of distance statistics. Geographical Analysis, 24, pp. 189–206.
HANSEN, M., DUBAYAH, R. and DEFRIES, R., 1996, Classification trees: an alternative to traditional land cover classifiers. International Journal of Remote Sensing, 17, pp. 1075–1081.
KETTIG, R.L. and LANDGREBE, D.A., 1976, Classification of multispectral image data by extraction and classification of homogeneous objects. IEEE Transactions on Geoscience Electronics, 14, pp. 19–26.
LAWRENCE, R.L. and WRIGHT, A., 2001, Rule-based classification systems using classification and regression tree (CART) analysis. Photogrammetric Engineering & Remote Sensing, 67, pp. 1137–1142.
MANNEL, S., HUA, D. and PRICE, M., 2006, A method to obtain large quantities of reference data. International Journal of Remote Sensing, 27, pp. 623–627.
PLUTOWSKI, M., SAKATA, S. and WHITE, H., 1994, Cross-validation estimates IMSE. In Advances in Neural Information Processing Systems, J. Cowan, G. Tesauro and J. Alspector (Eds), pp. 391–398 (San Mateo, CA: Morgan Kaufmann).
QUINLAN, J.R., 2000, See5: an informal tutorial. Available online at: http://www.rulequest.com/see5-win.html (accessed 6 May 2005).
READ, B.J., 2000, Data mining and science? CLRC Rutherford Appleton Laboratory, Chilton, Didcot, Oxon, UK. Available online at: http://www.ercim.eu/publication/ws-proceedings/12th-EDRG/EDRG12_Re.pdf (accessed 10 June 2007).
STEELE, B.M., WINNE, J.C. and REDMOND, R.L., 1998, Estimation and mapping of misclassification probabilities for thematic land cover maps. Remote Sensing of Environment, 66, pp. 192–202.
SWAIN, P.H. and HAUSKA, H., 1977, The decision tree classifier: design and potential. IEEE Transactions on Geoscience Electronics, 15, pp. 142–147.
TOBLER, W.R., 1970, A computer movie simulating urban growth in the Detroit region. Economic Geography, 46, pp. 234–240.
VOGELMANN, J.E., HOWARD, S.M., YANG, L., LARSON, C.R., WYLIE, B.K. and VAN DRIEL, N., 2001, Completion of the 1990s national land cover data set for the conterminous United States from Landsat Thematic Mapper and ancillary data sources. Photogrammetric Engineering & Remote Sensing, 67, pp. 650–662.
WEBER, K.T., 2006, Challenges of integrating geospatial technologies into rangeland research and management. Rangeland Ecology & Management, 59, pp. 38–43.
WEBER, K.T., THEAU, J. and SERR, K., 2008, Effect of co-registration error on patchy target detection using high-resolution imagery. Remote Sensing of Environment, 112, pp. 845–850.
WILLIAMS, A.P. and HUNT, E.R., JR, 2002, Estimation of leafy spurge cover from hyperspectral imagery using mixture tuned matched filtering. Remote Sensing of Environment, 82, pp. 446–456.
ZAR, J.H., 1984, Biostatistical Analysis (2nd edn) (Englewood Cliffs, NJ: Prentice Hall).