This article was downloaded by: [University of Michigan]
On: 22 March 2008
Access Details: [subscription number 788746703]
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954
Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

International Journal of Remote Sensing
Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713722504

Using a hybrid fuzzy classifier (HFC) to map typical grassland vegetation in Xilin River Basin, Inner Mongolia, China
Z. Sha a; Y. Bai b; Y. Xie a; M. Yu b; L. Zhang c
a Department of Geography and Geology, Eastern Michigan University, Ypsilanti, Michigan 48197, USA
b Laboratory of Quantitative Vegetation Ecology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, P. R. China
c Institute of Microbiology, Chinese Academy of Sciences, Beijing 100080, P. R. China

First Published on: 10 March 2008
To cite this Article: Sha, Z., Bai, Y., Xie, Y., Yu, M. and Zhang, L. (2008) 'Using a hybrid fuzzy classifier (HFC) to map typical grassland vegetation in Xilin River Basin, Inner Mongolia, China', International Journal of Remote Sensing, 1 - 21
To link to this article: DOI: 10.1080/01431160701408436
URL: http://dx.doi.org/10.1080/01431160701408436

PLEASE SCROLL DOWN FOR ARTICLE

Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf

This article may be used for research, teaching and private study purposes. Any substantial or systematic reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply or distribution in any form to anyone is expressly forbidden.

The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.


Using a hybrid fuzzy classifier (HFC) to map typical grassland vegetation in Xilin River Basin, Inner Mongolia, China

Z. SHA†, Y. BAI‡, Y. XIE*†, M. YU‡ and L. ZHANG§

†Department of Geography and Geology, Eastern Michigan University, Ypsilanti, Michigan 48197, USA

‡Laboratory of Quantitative Vegetation Ecology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, P. R. China

§Institute of Microbiology, Chinese Academy of Sciences, Beijing 100080, P. R. China

(Received 20 October 2006; in final form 14 April 2007)

Community ecologists and vegetation scientists in grassland research have a strong interest in quantifying biotic communities in detail. However, a satisfactory classification with fine biotic detail is hampered by the coarse resolution of Landsat images, easily accessible though they are. In this paper, a hybrid fuzzy classifier (HFC) is developed for vegetation classification with Landsat ETM+ imagery of the typical grassland in the Xilin River Basin, Inner Mongolia, China. Three vegetation classification systems were created from different perspectives: the botanical system (Bio-classes, which are also the final mapping units for vegetation cover), the combined botanical and spectral system (Bio-S classes), and the spectral system (Spec-classes). The HFC uses fuzzy logic to measure the similarity between Spec-classes, extracted by unsupervised classification, and Bio-S classes, built from the field samples, while taking into account the spectral variation of samples within the same Bio-class. The Bio-S classes, which serve as a bridge for assigning Spec-classes to the target Bio-classes, were then merged to restore the Bio-classes for the final mapping. To assess classification accuracy, the HFC was compared with a conventional supervised classification (CSC). The overall result of the HFC was much better than that of the CSC, with an accuracy of 80.2% compared with 69.0% for the CSC.

1. Introduction

Grassland, an ecosystem sensitive to global climate change, is an important land cover that bears human imprints (Liang et al. 2003). One of the interests of community ecologists and vegetation scientists in grassland research is to quantify biotic communities in meaningful detail (Cerna and Chytry 2005). Remote sensing offers an effective means of studying changes in grassland vegetation cover, especially over large areas (Nordberg and Evertson 2003). Landsat imagery has a medium spatial resolution, between that of high spatial resolution sensors such as IKONOS or QuickBird and that of low spatial resolution sensors such as the Advanced Very High Resolution Radiometer (AVHRR) or the Moderate Resolution Imaging Spectroradiometer (MODIS), and it has been widely used in resource monitoring and assessment.

*Corresponding author. Email: [email protected]

INT. J. REMOTE SENSING

2008, iFirst Article, 1–21

International Journal of Remote Sensing ISSN 0143-1161 print/ISSN 1366-5901 online © 2008 Taylor & Francis

http://www.tandf.co.uk/journals DOI: 10.1080/01431160701408436


Previous studies have shown that vegetation classification on grassland is a challenging task. Numerous factors affect the potential success of image classification using satellite images (Salovaara et al. 2005). The same vegetation type on the ground may have different spectral features in remotely sensed images and, conversely, different vegetation types may possess similar spectra. Mosaic-like patterns of grassland vegetation are also common (Cingolani et al. 2004, Stuart et al. 2006). Inconsistency therefore arises between grassland vegetation classifications made from a botanical point of view (Bio-classes) and those made from spectral considerations (Spec-classes). Because of these complexities, different methods have been developed to classify grassland vegetation from remote sensing images. An unsupervised approach, which is easy to apply and widely available in image processing and statistical software packages, is often used in thematic mapping from imagery (Langley et al. 2001). One disadvantage of unsupervised classification is that the classification process has to be repeated if new data (cases) are added. By contrast, supervised classification methods learn an established classification from a training data set, which contains the predictor variables measured in each sampling unit and the a priori class assignments of the sampling units (Cerna and Chytry 2005). Therefore, once the classifier has been set up, the addition of new data (cases) has no impact on the established classification. The maximum likelihood (ML) classifier, which rests on the statistical distribution of the data, is usually regarded as the classic and most widely used supervised classifier for remotely sensed images (Higdon and Schafer 2001, Sohn and Rebello 2002, Xu et al. 2005). However, the ML classifier can be less successful in complex areas, where its assumption that the data follow Gaussian distributions may not hold.

Recent studies have made considerable progress in land and grassland cover mapping from remote sensing images by developing more powerful classifiers. Stuart et al. (2006) developed continuous classifications using Landsat data to distinguish variations within Neotropical savannas and to characterize the boundaries between savanna areas, the associated gallery forests, seasonally dry forests, and wetland communities. Research has also shown that classification accuracy can be greatly improved by applying expert knowledge (empirical rules) and ancillary data to extract thematic objects from remote sensing images (Shrestha and Zinck 2001, Gad and Kusky 2006). Sohn and colleagues developed supervised and unsupervised spectral angle classifiers (SAC) that take account of the fact that the spectra of the same type of surface object are approximately linearly scaled variations of one another owing to atmospheric and topographic effects (Sohn and Rebello 2002, Sohn and Qi 2005). The normalized difference vegetation index (NDVI) is another means of studying vegetation cover and is widely used in vegetation density mapping. NDVI is a general biophysical parameter that correlates with the photosynthetic activity of vegetation; it indicates the ‘greenness’ of the vegetation rather than the vegetation cover type directly (Wang and Tenhunen 2004).
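The two spectral measures mentioned above can be illustrated with a minimal sketch. It is not code from the cited studies: the function names and the six-band sample spectrum are hypothetical, NDVI is written in its standard form (NIR - Red) / (NIR + Red), and the spectral angle is taken as the angle between two spectra treated as vectors.

```python
import math

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 if both bands are zero."""
    total = nir + red
    return (nir - red) / total if total else 0.0

def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)

# Hypothetical six-band spectrum and a uniformly darkened copy of it.
spectrum = [68.0, 37.0, 53.0, 81.0, 123.0, 58.0]
shaded = [0.5 * value for value in spectrum]

print(spectral_angle(spectrum, shaded))  # close to zero: same surface type
print(ndvi(nir=81.0, red=53.0))          # ETM+ band 4 as NIR, band 3 as red
```

Because a linearly scaled spectrum points in the same direction as the original, its spectral angle stays near zero, which is the property the SAC exploits.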

In addition to the methods mentioned above, artificial neural network (ANN) and fuzzy logic classifications have been frequently reported in grassland and land cover classification in recent years. The ANN method is appropriate for the analysis of nearly any kind of data irrespective of their statistical properties, but at the expense of the interpretability of the results, as it is a black-box approach that hides the underlying prediction process (Cerna and Chytry 2005). Berberoglu et al. (2000) combined ANN and texture analysis on a per-field basis to classify land cover and found that the accuracy could be 15% greater than that achieved using a standard per-pixel ML classification. One disadvantage of ANN, however, is that it can be computationally demanding when large data sets are used to train the network, and sometimes no result is achieved at all, even after a long computation time, because training becomes trapped in a local minimum (e.g. for a back-propagation ANN). A fuzzy classification approach is particularly useful in mixed-class areas and was investigated for the classification of suburban land cover from remote sensing imagery by Zhang and Foody (1998). It is a kind of probability-based classification rather than a crisp classification. Instead of using a per-pixel classifier to produce a crisp or hard classification, Xu et al. (2005) employed a decision tree derived from the regression approach to determine class proportions within a pixel, producing a soft classification from remotely sensed data in land cover classification research. Theoretically, probability-based or soft classification is more reasonable for composite units, since such a unit cannot simply be assigned to one type, only given a probability of belonging to that type.
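The distinction between crisp and soft classification can be made concrete with a small sketch. The class scores below are hypothetical, and simple normalisation is assumed here only for illustration; it is not the method of any of the cited studies.

```python
def soft_label(scores):
    """Normalise non-negative class scores so that they sum to 1."""
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}

def hard_label(scores):
    """Crisp classification: keep only the best-scoring class."""
    return max(scores, key=scores.get)

# Hypothetical scores for one mixed pixel.
pixel_scores = {"Stipa grandis": 3.0, "Leymus chinensis": 1.0}
print(soft_label(pixel_scores))  # {'Stipa grandis': 0.75, 'Leymus chinensis': 0.25}
print(hard_label(pixel_scores))  # Stipa grandis
```

The crisp label discards the 25% share of the second class, which is exactly the information a soft classification retains for composite units.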

In this paper, a hybrid fuzzy classifier (HFC) is developed to map the vegetation cover in the Xilin River Basin, Inner Mongolia Autonomous Region, China. The HFC was designed to meet the challenges of classifying grassland vegetation from Landsat images, which arise from the medium spatial resolution of the images, the spectral complexity of grassland, and heavy human interference. The aim was to build a better vegetation classifier to support ecological studies of vegetation change and to provide scientific data to assist decision making on grassland management and exploitation. The study area and data sources are presented in the next section. The design and technical procedures of the HFC are described systematically in the third section. The accuracy assessment in comparison with the ML classifier is examined in the fourth section. The results, analyses and future improvements to the HFC are discussed in the concluding section.

2. The study area and data sources

Grasslands are the primary natural land cover in northern China, and in the vast semi-arid region of the Eurasian continent as a whole. As an increasing population, expanding residential areas, and intensified grazing pressure have been imposed on this region, grassland degradation has become a major ecological and economic problem in the Inner Mongolia steppe region. As a result, grassland productivity decreases and desertification occurs (Tong et al. 2004, Hea et al. 2005, Li et al. 2005). The Xilin River Basin was chosen as the case study area (see figure 1) for the following reasons. The Xilin River Basin, situated at 43°26′ to 44°29′ N and 115°32′ to 117°12′ E, is one of the most representative steppe zones in China (Li et al. 1988). It has been well preserved since planned utilization and scientific research in the basin were initiated in the early 1950s, when the Xilin Breeding Stock Rangeland was established. Researchers from Nanjing Agriculture University surveyed the grassland and forage grass in 1952 (The Inner Mongolia and Ninxia Survey Team of CAS 1985). The Xilin River Basin was designated as a biological practice field by the University of Inner Mongolia in 1957. A large-scale scientific survey was conducted from 1964 to 1965. A permanent ecosystem observation station (EOB) was established there in 1979 by the Chinese Academy of Sciences (Li et al. 1988). Climate, soil, vegetation, and ecosystem data have been collected systematically since then, providing comprehensive support for grassland and related research activities. This study is one of many research projects based on the data and long-term research goals set up by the EOB.


Landsat Enhanced Thematic Mapper Plus (ETM+) images covering the study area were used in this research. Cloud cover during most of the year was a major limitation on the selection of images. Only cloud-free single-day images of the study area from the period between April 2004 and October 2004 were considered, since the plants usually grow from late April to early October (Wang and Han 2005). As the study area is covered by two image scenes, two ETM+ scenes acquired on 14 August 2004 were obtained and then preprocessed.

3. The classification method and procedures

The HFC developed here was inspired by the fact that inconsistencies exist between classifications from the botanical point of view (Bio-classes) and from the spectral point of view (Spec-classes). Spectral variations, which are often present within a Bio-class, could be used to develop a better classifier. The assignment of these subtle variations to vegetation classes is to a large degree fuzzy. Therefore, integrating the supervised method with unsupervised classification through a fuzzy membership function could provide meaningful data to improve classification accuracy. The technical procedures and methods used to implement the HFC are shown in figure 2 and detailed below. The advantages of this design and the questions it raises are discussed in the discussions and conclusions section, as much of the design will be easier to appreciate by then. The implementation of the HFC comprises three stages: data preparation, image classification, and accuracy assessment.

Figure 1. The research site (Tong et al. 2004).


3.1 Stage I: data preparation

Data preparation comprises five steps (Steps 1 to 5). The main tasks in this stage were to decide on a classification system, to acquire field samples and ETM+ images for classification, and to carry out preprocessing and initial analysis of the image data.

Figure 2. Flowchart of the classification procedures conducted in this study.


3.1.1 Step 1: deciding a grassland classification system from a botanical point of view. Choosing a suitable grassland classification system to support ecological study has been a challenge. Many factors should be taken into consideration, and which factors are considered and how they are evaluated depend on the research goals. Five principles are most commonly employed (Hanson and Dahl 1957): (1) physiognomy, or general appearance of the vegetation; (2) geographic distribution, such as altitudinal and latitudinal zones; (3) floristics, or the kinds of species that make up the community; (4) habitat relations, with emphasis on the causal influence of the environment; and (5) successional status, or the relation to the climax. The purpose of designing a classification system here is to support one research goal at the EOB: to use the long-term, fine-scale and in-depth observations of rangeland ecosystem composition and change to extend the understanding of ecosystem dynamics over a much broader region of grassland. In addition, scientific data could be provided through modern spatial technologies to support informed decision making for better utilization and management of grassland in Inner Mongolia. Priority was therefore given to principles 1, 2, and 3 above. The details of weighting each principle are omitted from this paper, as the main focus here is on how to obtain fine-scale botanical classifications of grassland from Landsat images of limited resolution. As mentioned previously, because of the limited spectral resolution, the actual vegetation classification system was a compromise that sacrifices some detail of the species composition of the communities (see table 1). However, it is the most detailed botanical classification system in Inner Mongolia grassland research that has been derived from remotely sensed imagery.

Table 1. Grassland vegetation classification system based on the botanic types.

Class  Community type (named after dominant species)  Vegetation type
1      Cleistogenes squarrosa                         Typical steppe
2      Stipa grandis                                  Typical steppe
3      Achnatherum splendens                          Meadow
4      Stipa krylovii                                 Typical steppe
5      Artemisia frigida                              Typical steppe
6      Carex pediformis                               Meadow steppe
7      Carex spp.                                     Meadow
8      Caragana microphylla                           Typical steppe
9      Leymus chinensis + Stipa baicalensis           Meadow steppe
10     Leymus chinensis                               Typical steppe
11     Salsola collina (Chenopodium glaucum)          Typical steppe

3.1.2 Step 2: field sampling. To make a reliable classification and build a data set for assessing classification accuracy, actual data were collected at field sites. The original field trip to gather samples was made in late August 2004. Before the field trip, a total of 568 evenly distributed sampling sites covering the research area were selected with the help of the topographical map produced in 2005. Each site was visually inspected and the area of the same vegetation type at that site was measured. An area of about one pixel (90 m²) was then delimited and five vegetation samples were collected: one located roughly at the centre of this area and the other four at a distance of about 10 m from its four corners. An initial vegetation classification (Bio-class) of the samples was identified and recorded in the field. All five samples at a site were re-examined by the biologists at the ecosystem observation station to give a final determination of the vegetation class of that site. In addition, all field samples were geo-coded in the field with a hand-held global positioning system (GPS) receiver to allow further processing in image classification and geographic information systems. Some auxiliary data were used to help analyse the field samples and the image classification. These data included two scenes of Thematic Mapper images from 1998, the digital elevation model (DEM) at a scale of 1 : 25 000 made in 2005, the 1988 soil map of the research area, and the 1988 vegetation coverage.

3.1.3 Step 3: preprocessing ETM+ images. The research area straddles two Landsat scenes: path 124/row 29 and path 124/row 30. Geometric correction using first-order polynomial rectification, with a root mean squared error (RMSE) of less than half a pixel, was carried out on the two scenes with ground control points (GCPs). The GCPs were obtained in the field from GPS and from reference points read from a topographical map covering the same area. Both scenes fitted well with the topographical data and other ancillary data. Although there was no obvious cloud cover in the images, atmospheric haze could not be neglected. Since no in-situ atmospheric measurements were available, a strictly image-based atmospheric correction, as proposed by Chavez (1996), was applied to remove the effect of atmospheric haze from both images. Digital numbers (DNs) from equivalent dark objects in the two scenes (e.g. Xilin River, desertified areas) were found to differ by four DNs or fewer in the infrared bands and two DNs or fewer in the visible bands. The two scenes were then mosaicked into a single image. No noticeable irregularities were found in the mosaicking process. The mosaicked image was then clipped using the boundary polygon of the research area. Roads and urban areas were removed from the image visually with the support of topographical data. Heavily desertified areas were also masked out, as they might influence the classification accuracy. Additionally, before the classification was carried out, farmland and fenced land were removed from the image to avoid possible side effects, as these lands had no typical vegetation spectral signatures. The reflective bands (bands 1, 2, 3, 4, 5 and 7) of Landsat ETM+ were used. The spectral statistics (DNs) of the preprocessed image are listed in table 2. This preprocessed image (Img I) was then used for further classification.
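The haze-removal idea in Step 3 can be sketched as a simple dark-object subtraction. This is an illustration only and not the full image-based procedure of Chavez (1996): it assumes that the haze contribution to a band equals the DN of its darkest object, and the function name and pixel values are hypothetical.

```python
def dark_object_subtract(band, dark_dn=None):
    """Subtract a haze estimate (the dark-object DN) from every pixel of one band.

    band is a list of rows of integer DNs; results are clamped at zero.
    """
    if dark_dn is None:
        # Assumption: the darkest pixel in the band is pure haze signal.
        dark_dn = min(min(row) for row in band)
    return [[max(0, dn - dark_dn) for dn in row] for row in band]

# Hypothetical 2 x 3 window of band 1 DNs.
band1 = [[40, 52, 61],
         [45, 68, 244]]
corrected = dark_object_subtract(band1)
print(corrected)  # [[0, 12, 21], [5, 28, 204]]
```

Applying the same subtraction to both scenes before mosaicking is what makes the DNs of equivalent dark objects comparable across the scene boundary.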

Table 2. Spectral statistics of the preprocessed image.

Band  Min. (DN)  Max. (DN)  Mean (DN)  Std. dev.
1     40         244        68.3       14.67
2     15         195        36.9       10.42
3     13         255        53.3       23.70
4     2          217        80.8       14.37
5     1          255        122.6      30.77
7     1          255        58.1       26.18

3.1.4 Step 4: validating samples. All samples with GPS coordinates were then registered on Img I (from Step 3). The following samples were discarded to avoid possible errors or noise: (1) samples with an estimated unit area smaller than four pixels, since a small area is seldom representative of a vegetation type (Zha et al. 2003); (2) samples too close to main roads (within five to ten pixels); and (3) samples located in fenced farms, as the spectral properties of cultivated farms differ greatly from those of natural lands. As a result, 464 of the 568 samples were kept as validated samples. The 464 validated samples were then randomly divided into a training set and a testing set in a ratio of 3 : 1, giving 348 samples in the training set and 116 in the testing set. This unequal ratio favours the training set; it was chosen because field sampling was very costly and time consuming, so the number of samples was relatively small. In the following sections, all image analyses were based on the training samples, while the accuracy assessment was carried out with the testing samples.
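The 3 : 1 random split can be reproduced with a short sketch. The function name, the integer sample identifiers and the fixed seed are assumptions made for illustration; the paper does not state how the random division was implemented.

```python
import random

def split_samples(sample_ids, train_parts=3, test_parts=1, seed=0):
    """Shuffle sample ids and split them train:test in the given ratio."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * train_parts // (train_parts + test_parts)
    return ids[:n_train], ids[n_train:]

# 464 validated samples, identified here by hypothetical integer ids.
train, test = split_samples(range(464))
print(len(train), len(test))  # 348 116
```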

3.1.5 Step 5: sampling brightness. The training samples and Img I were used to extract the spectral data of each sample. Spectral data for the validated samples were generated with the six reflective bands. The brightness value of each band of each validated sample in the training set was used to produce a brightness matrix of 348 × 7 (sampling-id + six bands).
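The layout of the brightness matrix can be sketched as follows. The data structures are hypothetical: samples are assumed to be already registered to pixel row and column positions, and the image is assumed to be held as one 2-D list of DNs per band.

```python
BANDS = (1, 2, 3, 4, 5, 7)  # reflective ETM+ bands used in the study

def brightness_matrix(samples, image):
    """Build rows of [sample_id, DN_band1, ..., DN_band7] from pixel lookups.

    samples: list of (sample_id, row, col); image: dict band -> 2-D list of DNs.
    """
    return [[sid] + [image[band][r][c] for band in BANDS] for sid, r, c in samples]

# Hypothetical 2 x 2 image with a different DN pattern in each band.
image = {band: [[band * 10 + r * 2 + c for c in range(2)] for r in range(2)]
         for band in BANDS}
samples = [(101, 0, 1), (102, 1, 0)]
matrix = brightness_matrix(samples, image)
print(matrix)  # [[101, 11, 21, 31, 41, 51, 71], [102, 12, 22, 32, 42, 52, 72]]
```

With the 348 training samples as input, the same construction gives the 348 × 7 matrix described above.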

3.2 Stage II: image classifications

This stage consisted of two processes, both based on the data prepared in Stage I: the conventional supervised classification (CSC), with ML as the classifier, and the hybrid fuzzy classification (HFC). A total of ten steps (Step 6 to Step 15) were performed.

3.2.1 Step 6: hierarchical clustering on samples within Bio-classes to create CSC Bio-S classes. The purpose of this step was to reduce the spectral confusion between Bio-classes and to ensure their separability, as spectral variations may exist within each Bio-class of samples. If only the spectral variations within Bio-classes were considered, the number of classes might be several times larger than the 11 Bio-classes determined from the botanical point of view. Therefore, a two-step hierarchical clustering analysis was performed using the Statistical Package for the Social Sciences (SPSS) (see http://www.wright.edu/cats/docs/docroom/spss/), with the 11 Bio-classes as a priori groups and the brightness values of each band (from Step 5) as variables. First, using Euclidean distance as the linkage distance measure and the unweighted pair-group centroid method as the linkage rule (LR), 11 tree-like dendrograms were generated. The Euclidean distance between any two points $P = (p_1, p_2, \ldots, p_n)$ and $Q = (q_1, q_2, \ldots, q_n)$ in Euclidean n-space is defined as:

\[ \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2} = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}, \qquad (1) \]

where $p_i$ and $q_i$ ($i$ = 1, 2, 3, 4, 5, 7) are the spectral values of the six bands of any two samples.
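As a worked instance of equation (1), the distance between two six-band samples can be computed directly. The DN values below are hypothetical and are chosen only so that the arithmetic is easy to follow.

```python
import math

def euclidean(p, q):
    """Equation (1): square root of the summed squared band differences."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical DNs in bands 1, 2, 3, 4, 5 and 7 for two samples.
p = (68, 37, 53, 81, 123, 58)
q = (70, 35, 50, 85, 120, 60)
print(euclidean(p, q))  # sqrt(4 + 4 + 9 + 16 + 9 + 4) = sqrt(46), about 6.78
```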

Secondly, a preset threshold (11.5 in this case, as it appeared to give the best classification accuracy after many trials) was used to cut the axis (LR dissimilarity distance) of all 11 dendrograms. The samples belonging to a Bio-class were clustered and grouped into several subclasses if the spectral variations within this Bio-class were larger than the threshold. These subclasses take both spectral and biological properties into account, so they are referred to as CSC Bio-S classes. In this way, a total of 18 CSC Bio-S classes were generated. The spectral signatures of the 18 CSC Bio-S classes were used to carry out the ML supervised classification.
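The splitting of one Bio-class into Bio-S subclasses can be sketched with a small centroid-linkage routine. This is an assumption-laden stand-in for the SPSS analysis, not a reproduction of it: the function names and sample spectra are hypothetical, and merging simply stops once the closest pair of centroids is farther apart than the threshold. Because centroid linkage can produce inversions, this early stop only approximates cutting a full dendrogram at that height.

```python
import math

def centroid(points):
    """Mean vector of a list of equal-length points."""
    return [sum(column) / len(points) for column in zip(*points)]

def centroid_clusters(points, threshold):
    """Merge the two clusters with the closest centroids until that
    distance exceeds the threshold; return the remaining clusters."""
    clusters = [[point] for point in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical six-band spectra of four training samples from one Bio-class.
samples = [[68, 37, 53, 81, 123, 58], [70, 35, 50, 85, 120, 60],
           [95, 60, 90, 70, 160, 95], [97, 62, 88, 72, 158, 93]]
subclasses = centroid_clusters(samples, threshold=11.5)
print(len(subclasses))  # 2: this Bio-class would yield two Bio-S classes
```

Running the routine once per Bio-class and pooling the resulting subclasses mirrors how the 11 Bio-classes expand into a larger set of Bio-S classes.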


3.2.2 Step 7: ML classification. Once the spectral signatures of the Bio-S classes were established, ML classification (also referred to as the CSC, since it is the classic and most widely used method in supervised image classification) was used to classify Img I. This was carried out with the supervised classification module of ERDAS, with Img I as the input raster file and the spectral signatures of the CSC Bio-S classes (from Step 6) as the input signature file (ERDAS 2000). A thematic image map of the CSC Bio-S classes was produced in this step.
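The principle behind ML classification can be illustrated with a simplified sketch. It is not the ERDAS implementation: for brevity the bands are assumed to be independent (a diagonal covariance matrix), whereas a full ML classifier uses the complete covariance matrix of each class, and the class names and two-band spectra are hypothetical.

```python
import math

def fit_signature(samples):
    """Per-band mean and sample variance of one class (needs at least two samples)."""
    n = len(samples)
    columns = list(zip(*samples))
    means = [sum(column) / n for column in columns]
    variances = [max(sum((v - m) ** 2 for v in column) / (n - 1), 1e-6)
                 for column, m in zip(columns, means)]
    return means, variances

def log_likelihood(pixel, signature):
    """Gaussian log-likelihood of a pixel, assuming independent bands."""
    means, variances = signature
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
               for x, m, var in zip(pixel, means, variances))

def classify(pixel, signatures):
    """Assign the pixel to the class with the highest likelihood."""
    return max(signatures, key=lambda name: log_likelihood(pixel, signatures[name]))

# Hypothetical two-band training spectra for two Bio-S classes.
signatures = {
    "Bio-S A": fit_signature([[10, 20], [12, 22], [11, 19]]),
    "Bio-S B": fit_signature([[40, 50], [42, 49], [39, 52]]),
}
print(classify([11, 21], signatures))  # Bio-S A
print(classify([41, 50], signatures))  # Bio-S B
```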

3.2.3 Step 8: Principal component analysis (PCA). Correlation analysis of Img I revealed possible data redundancy in this image data set. To remove the redundant information, PCA was performed on the brightness matrix (from Step 5), with the six bands as variables. The principal component matrix (PCM) was built with the SPSS software using a maximum likelihood algorithm to transform the brightness of the six bands of each validated sample into two principal components (principal components 1 and 2), forming an HFC brightness matrix with three dimensions (sampling-id + 2 principal components), which retained more than 90% of the information in the original brightness matrix (see table 3).
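The choice of two components can be checked against the variance percentages in table 3. The sketch below accumulates the percentage of variance per component until a target share is reached. The 90% target is the figure quoted in the text; the function name `components_needed` is an assumption made for illustration.

```python
# Percentage of variance per component, as listed in table 3 (initial eigenvalues).
PCT_VARIANCE = [83.40, 14.96, 1.23, 0.31, 0.0929, 0.0]

def components_needed(pct_variance, target_pct):
    """Return (k, cumulative %) for the smallest k whose cumulative share reaches the target."""
    cumulative = 0.0
    for k, pct in enumerate(pct_variance, start=1):
        cumulative += pct
        if cumulative >= target_pct:
            return k, cumulative
    raise ValueError("target share is not reachable with the given components")

k, retained = components_needed(PCT_VARIANCE, 90.0)
print(k)  # 2
```

The cumulative share computed from the rounded percentages (about 98.36%) differs from the 98.37% in table 3 only by rounding.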

3.2.4 Step 9: hierarchical clustering of the HFC brightness matrix to make HFC Bio-

S classes. This process was similar to Step 6 except that the variables at this step

were the two principal components (from Step 8) rather than the six-band spectral

properties of the validated samples. The hierarchical clustering analysis was also

performed on the training set using the SPSS software, with the 11 Bio-classes as

a priori groups. A total of 21 HFC Bio-S classes were generated. All validated samples

grouped by the HFC Bio-S classes were then reprojected to the PCA space using the

PCM (from Step 8) as the transformation parameters.

3.2.5 Step 10: statistical analysis on the HFC Bio-S classes. Statistical analysis was

made on the HFC Bio-S classes with the principal components as variables to generate

MBio-S, a matrix of n1 × dn. The matrix MBio-S synthesized the statistical information

of the principal components for the HFC Bio-S classes in the form of: MBio-S (Cid,

Meancomp-d1, SDcomp-d1, Meancomp-d2, SDcomp-d2, …, Meancomp-dn, SDcomp-dn), where

n1 is the total number of HFC Bio-S classes (21 in this study), dn is the number of

principal components selected for HFC Bio-S classes after the PCA analysis (two

principal components were kept in this study), Cid is the code of the HFC Bio-S

class, Meancomp-dn is the mean value for the principal component (comp-dn) of the

HFC Bio-S class, and SDcomp-dn is the standard deviation for the principal

component (comp-dn) of the HFC Bio-S class.
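A minimal sketch of how rows of the form (Cid, Meancomp-d1, SDcomp-d1, Meancomp-d2, SDcomp-d2) could be assembled from samples already projected into the PCA space. The sample values are hypothetical. The use of the sample standard deviation (n - 1 denominator) and the fallback of 0.0 for single-sample classes are assumptions, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def class_statistics(samples):
    """samples: list of (class_id, pc1, pc2).
    Returns one row (Cid, mean1, sd1, mean2, sd2) per class, sorted by class id."""
    by_class = {}
    for cid, pc1, pc2 in samples:
        by_class.setdefault(cid, []).append((pc1, pc2))
    rows = []
    for cid in sorted(by_class):
        pc1s = [s[0] for s in by_class[cid]]
        pc2s = [s[1] for s in by_class[cid]]
        # stdev needs at least two samples; 0.0 for singletons is an assumption.
        sd1 = stdev(pc1s) if len(pc1s) > 1 else 0.0
        sd2 = stdev(pc2s) if len(pc2s) > 1 else 0.0
        rows.append((cid, mean(pc1s), sd1, mean(pc2s), sd2))
    return rows

# Hypothetical principal-component scores for two Bio-S classes.
samples = [
    ("c1d1", 10.0, 2.0), ("c1d1", 14.0, 4.0),
    ("c1d2", 20.0, 5.0), ("c1d2", 22.0, 9.0),
]
m_bio_s = class_statistics(samples)
```

The same function would apply unchanged to the Spec-classes, since the text states that both matrices share one structure.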

Table 3. PCA analysis using six bands as input variables for the validated samples.

            Initial eigenvalues                    Extraction sums of squared loadings
Component   Total     % of var.   Cumulative %     Total   % of var.   Cumulative %
1           5.00      83.40       83.40            5.00    83.40       83.40
2           0.90      14.96       98.37            0.90    14.97       98.37
3           0.0739    1.23        99.60
4           0.0186    0.31        99.91
5           0.00557   0.0929      100.00
6           0         0           100.00


3.2.6 Step 11: applying a PCA transformation to image. The PCM (from Step 8)

was also applied independently to Img I to transform reflective bands of the image

into a new image (Img II) with the two principal components as the new bands.

3.2.7 Step 12: ISODATA-based unsupervised classification. Img II was then

classified by an unsupervised method (ISODATA algorithm), and a total of 3 × p Spec-classes (63 HFC Spec-classes) was obtained, where p is the total number of

HFC Bio-S classes (ERDAS 2000). Spec-classes were classified only considering the

spectral characteristics of Img II and would be assigned to HFC Bio-S classes in Step 15.

3.2.8 Step 13: statistical analysis on the HFC Spec-classes. Statistical analysis was

also made on the 63 HFC Spec-classes with the principal components as variables.

Similar to Step 10, MSpec, a matrix of n2 × dn, was created to formalize the statistical

information for HFC Spec-classes in the form of: MSpec (Cid, Meancomp-d1, SDcomp-d1,

Meancomp-d2, SDcomp-d2, …, Meancomp-dn, SDcomp-dn), where n2 is the total number of

HFC Spec-classes (63 here), dn is the number of principal components selected for

HFC Spec-classes after the PCA analysis (two in this study), Cid is the code of the

HFC Spec-class, Meancomp-dn is the mean value for a principal component (comp-dn)

of the HFC Spec-class, and SDcomp-dn is the standard deviation for the principal

component (comp-dn) of the HFC Spec-class. The matrices MSpec (from this step)

and MBio-S (from Step 10) had the same structure and are used later to define a fuzzy

membership function in Step 14.

3.2.9 Step 14: defining a fuzzy membership function for fuzzy assignment of the HFC

Spec-classes. In an HFC classification, the definition of a fuzzy membership

function is a key step in preparing the basic data for assigning the HFC Spec-classes

to the HFC Bio-S classes. The fuzzy membership function served as an indicator to

evaluate the similarity between Spec-classes and HFC Bio-S classes and thus was

able to determine the HFC Spec-class assignment. Each HFC Spec-class was assigned to the HFC Bio-S class to which it had the highest similarity, measured by the fuzzy value defined by the fuzzy membership function.

There were three possible conditions for a value of fuzzy membership: (1) if an HFC Spec-class and an HFC Bio-S class had exactly the same statistical spectral properties (recorded in the matrices MBio-S and MSpec), the similarity between the HFC Spec-class and the HFC Bio-S class was defined as 100% matched; (2) conversely, if the two classes differed so much on any principal component that the difference between their mean values (Meancomp-dn, where dn might be 1, 2, …, depending on the principal component) was greater than the sum of the Bio-S class SD and the Spec-class SD, the similarity between the HFC Spec-class and the HFC Bio-S class was defined as 0%; and (3) all other cases had a value between 0% and 100%.
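The three conditions can be sketched as a simple test on the class statistics. This is only an illustration of the case distinction, not of the membership value itself. It assumes that condition (2) compares the absolute difference of the class means with the sum of the two standard deviations on each principal component. The dictionary layout and the function name `similarity_condition` are hypothetical.

```python
def similarity_condition(bio, spec):
    """Classify a (Bio-S class, Spec-class) pair into one of the three cases.
    bio, spec: dicts with 'mean' and 'sd' tuples, one entry per principal component."""
    if bio == spec:
        return "full"        # condition (1): identical statistics, 100% matched
    for m1, s1, m2, s2 in zip(bio["mean"], bio["sd"], spec["mean"], spec["sd"]):
        if abs(m1 - m2) > s1 + s2:
            return "none"    # condition (2): too far apart on this component, 0%
    return "partial"         # condition (3): a value between 0% and 100%

# Hypothetical statistics for one Bio-S class and two Spec-classes.
bio = {"mean": (10.0, 5.0), "sd": (2.0, 1.0)}
spec_far = {"mean": (15.0, 5.0), "sd": (2.0, 1.0)}
spec_near = {"mean": (12.0, 5.5), "sd": (2.0, 1.0)}

print(similarity_condition(bio, spec_far))   # none
print(similarity_condition(bio, spec_near))  # partial
```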

After defining the fuzzy membership function, an n1 × n2 fuzzy similarity matrix

(FSM) was built, where n1 was the total number of HFC Bio-S classes (21) and n2

was the total number of HFC Spec-classes (63). Each element in the FSM was the

similarity measurement between an HFC Bio-S class (indexed as k1) and an HFC

Spec-class (indexed as k2). This element (value) was calculated by the fuzzy

membership function and noted as FSMk1–k2. The fuzzy membership function for

computing FSMk1–k2 is defined as:

$$F(x, y) = \int_{x_1}^{x_2}\int_{y_1}^{y_2} \min\left(Z_1(x, y),\, Z_2(x, y)\right)\,\mathrm{d}y\,\mathrm{d}x, \qquad (2)$$


where x is the axis of principal component 1 and y is the axis of principal component

2 in the PCA space, x1 is the minimum value of principal component 1 and y1 is the

minimum value of principal component 2, while x2 and y2 are the maximum values

of principal components 1 and 2 respectively. The min (Z1, Z2) function selects the

minimum value between Z1 and Z2. The value of Z1(x,y) or Z2(x,y) can be

calculated based on MBio-S or MSpec respectively, according to the following rule:

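$$
Z_i(x, y) =
\begin{cases}
0, & \text{if } \operatorname{abs}(x - Mean_{comp\text{-}d1}) > 0 \text{ or } \operatorname{abs}(y - SD_{comp\text{-}d2}) > 0,\\[6pt]
\dfrac{\operatorname{abs}(x - Mean_{comp\text{-}d1})}{SD_{comp\text{-}d1} \times \left(SD_{comp\text{-}d1} \times SD_{comp\text{-}d2}\right)}, & \text{if } \operatorname{abs}(x - Mean_{comp\text{-}d1}) > 0 \text{ and } \operatorname{abs}(y - SD_{comp\text{-}d2}) > 0 \text{ and } x/y \ge SD_{comp\text{-}d1}/SD_{comp\text{-}d2},\\[6pt]
\dfrac{\operatorname{abs}(y - Mean_{comp\text{-}d2})}{SD_{comp\text{-}d1} \times \left(SD_{comp\text{-}d1} \times SD_{comp\text{-}d2}\right)}, & \text{if } \operatorname{abs}(x - Mean_{comp\text{-}d1}) > 0 \text{ and } \operatorname{abs}(y - SD_{comp\text{-}d2}) > 0 \text{ and } x/y < SD_{comp\text{-}d1}/SD_{comp\text{-}d2},\\[6pt]
\dfrac{1}{SD_{comp\text{-}d1} \times SD_{comp\text{-}d2}}, & \text{if } x = Mean_{comp\text{-}d1} \text{ and } y = SD_{comp\text{-}d2}.
\end{cases}
\qquad (3)
$$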

The definition of the Zi(x,y) function is applicable to both the Bio-S class (i = 1) and the Spec-class (i = 2). Figure 3 illustrates the above definition of the fuzzy membership function with two principal components (PC1 and PC2) as the x and y axes respectively. The bottoms of both conical shapes in the three-dimensional space were determined by the spectral statistical characteristic of the HFC Spec-class (from MSpec in Step 13) and the bio-spectral statistical characteristic of the HFC Bio-S class (from MBio-S in Step 10). The centroids of both bottom rectangles (x and y positions in the PCA space) were defined by Meancomp-d1 and Meancomp-d2, while the width and the height of each bottom rectangle were defined by SDcomp-d1 and SDcomp-d2 respectively. The heights of the conical shapes were inversely related to the areas of the bottom rectangles (SDcomp-d1 × SDcomp-d2) to guarantee that the two shapes have the same volume. The fuzzy membership describes the common portion of conical shape 1 (V1) and conical shape 2 (V2), with the overlapped shadow rectangle as its bottom. The value on the z axis is the dependent variable calculated by function (3) from the independent variables x and y. For conical shape 1 (V1) and conical shape 2 (V2), z is defined separately. The z value is 0 when x and y are located outside the bottom rectangle of V1 or V2. The fuzzy membership measurement is calculated by function (2) over the overlapped shadow rectangle.

Figure 3. Definition of the fuzzy membership function in a three-dimensional space.

3.2.10 Step 15: fuzzy assignment for the Spec-classes. This work is based on the FSM (from Step 14). Each Spec-class (from Step 12) had a set of values describing its similarity to possible Bio-S classes based on the fuzzy membership function, as indicated by the fuzzy value measurements in the FSM (see table 4). Normally, a Spec-class was assigned to the Bio-S class with its highest similarity value, which had the highest probability of being the correct assignment. However, it was found that a Spec-class might, in some cases, correspond to the Bio-S class with its second highest similarity value (table 4). In table 4a, only the highest and second highest fuzzy values are listed; smaller values are omitted. Some Spec-classes could not be assigned at all because they had no similarity to any of the Bio-S classes. These Spec-classes were labelled as unclassified. After all Spec-classes were assigned, an HFC Bio-S map could be produced. As shown in table 4b, only 1 of the Spec-classes could not be assigned, 17 of the Spec-classes could be exclusively assigned, 21 had two fuzzy similarity values, and 24 had more than two similarity values.
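The assignment rule can be sketched as picking the Bio-S class with the largest non-zero fuzzy value and keeping the runner-up as a second candidate. The row below is illustrative only: its values resemble those in table 4a, but the Bio-S codes attached to them are hypothetical, and `assign_spec_class` is not a name from the original workflow.

```python
def assign_spec_class(similarities):
    """similarities: dict of Bio-S class code -> fuzzy similarity for one Spec-class.
    Returns (best, second_best); best is None when the Spec-class is unclassified."""
    ranked = sorted(
        ((value, code) for code, value in similarities.items() if value > 0),
        reverse=True,
    )
    best = ranked[0][1] if ranked else None
    second = ranked[1][1] if len(ranked) > 1 else None
    return best, second

# Hypothetical FSM row for one Spec-class.
fsm_row = {"c10d1": 0.183, "c4d2": 0.074, "c2d1": 0.0}
print(assign_spec_class(fsm_row))        # ('c10d1', 'c4d2')
print(assign_spec_class({"c1d1": 0.0}))  # (None, None)
```

Returning the second candidate alongside the best one keeps the information that the text notes was checked manually but not used in the final classification.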

3.3 Stage III: accuracy assessment and vegetation mapping

The last phase (Steps 16 and 17) evaluates the classification results, compares the CSC with the HFC, and decides which classifier is better suited to mapping the vegetation cover of the research area.

3.3.1 Step 16: restoring the Bio-S map to the Bio-class map. Bio-S classes (from Steps 7 and 15) belonging to the same Bio-class were merged in both the CSC and HFC methods to map the vegetation cover and to conduct an accuracy assessment. As a result, both the 18 CSC Bio-S classes and the 63 HFC Spec-classes were restored to the 11 Bio-classes, or were labelled as unclassified.
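A minimal sketch of the merge, assuming that a Bio-S code of the form cNdM denotes subclass M of Bio-class N. That convention is inferred from the pairings in table 4a (for example c10d1 with Bio-class 10) and is not stated explicitly in the text. The function name `bio_class_of` is hypothetical.

```python
import re

def bio_class_of(bio_s_code):
    """Map a Bio-S code such as 'c10d1' to its parent Bio-class number (10).
    Returns None for anything that is not a Bio-S code, e.g. 'unclassified'."""
    match = re.fullmatch(r"c(\d+)d(\d+)", bio_s_code)
    if match is None:
        return None
    return int(match.group(1))

print(bio_class_of("c10d1"))         # 10
print(bio_class_of("c5d2"))          # 5
print(bio_class_of("unclassified"))  # None
```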

3.3.2 Step 17: accuracy assessment and vegetation mapping. Accuracy assessment

was conducted on the testing set using the Kappa statistic (de Leeuw et al. 2006). Bio-

class maps resulting from each classification method (CSC and HFC) were evaluated

against the field data. Error matrices were then constructed for both classifications.

The overall classification accuracies and Kappa statistic were calculated for each case.

Afterwards, the classified Bio-classes in raster format were transformed into a vector

format using the Erdas ‘Raster to Vector’ module (ERDAS 2000), and the vector map

of vegetation coverage was made using Esri ArcGIS Desktop 9.1 (ESRI 2005).

4. Accuracy assessment

The key assumption of the HFC here was that generalizing the six-band brightness matrix into a small number of principal components would remove noise and thereby improve classification accuracy. On the other hand, this PCA analysis could be an important source of information loss or error propagation. Another process that might be prone to errors is the fuzzy assignment of the HFC Spec-classes to the HFC Bio-S classes. In addition, any new approach has to be checked against the common method (the HFC classifier versus the CSC classifier in this study) to see whether there is any improvement in the final product.


Table 4a. Partial list of the assignments of the fuzzy similarity matrix for Spec-classes and Bio-S classes.

Bio-S class columns (grouped under Bio-classes 1 to 11): c1d1 c1d2 c2d1 c2d2 c2d3 c2d4 c3d1 c4d1 c4d2 c4d3 c5d1 c5d2 c6d1 c7d1 c8d1 c9d1 c10d1 c10d2 c10d3 c11d1 c11d2

Spec-class   Fuzzy values (in column order)   Bio-S class   Bio-class
1            0.074   0.183                    c10d1         10
2            0.148   0.391                    c5d2          5
3            0.211   0.455                    c7d1          7
4            0.069   0.200                    c4d2          4
5            0.098   0.233                    c9d1          9
6            0.304   0.700                    c10d1         10
7            0.196   0.053                    c4d3          4
8            0.335   0.104                    c2d1          2
…            …                                …             …
63           0.119                            c8d1          8


4.1 Assessing the PCA analysis

The PCA analysis (Step 8) was conducted based on the brightness matrix generated

at Step 5. The result illustrated that the first two principal components explained

83.40% and 14.97% of the variance (a total of 98.37%) of the spectral data of Img I

respectively (table 3). The six bands of Img I were then synthesized into the two

principal components (two bands) for the later analyses. The first component was positively associated with bands 1, 2, 3, 5 and 7 (table 5), while the second component mainly explained the information of band 4. From tables 3 and 5, the following transformation functions for PCA can be built:

$$\text{Comp 1 (axis 1)} = 0.442\,TM_1 + 0.444\,TM_2 + 0.444\,TM_3 + 0.167\,TM_4 + 0.442\,TM_5 + 0.431\,TM_7, \qquad (4)$$

$$\text{Comp 2 (axis 2)} = -0.101\,TM_1 + 0.0239\,TM_2 - 0.066\,TM_3 + 0.979\,TM_4 - 0.101\,TM_5 - 0.129\,TM_7, \qquad (5)$$

where TM1, TM2, TM3, TM4, TM5 and TM7 are the reflective values of bands 1, 2, 3, 4, 5 and 7, respectively.
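Equations (4) and (5) are plain weighted sums, so the projection of a sample or pixel into the PCA space can be sketched as follows. The coefficients are those of the two equations. The pixel values and the name `to_pca_space` are hypothetical, and the sketch applies the equations as written, without any centring or scaling of the band values.

```python
BANDS = ("TM1", "TM2", "TM3", "TM4", "TM5", "TM7")
COMP1 = (0.442, 0.444, 0.444, 0.167, 0.442, 0.431)       # equation (4)
COMP2 = (-0.101, 0.0239, -0.066, 0.979, -0.101, -0.129)  # equation (5)

def to_pca_space(pixel):
    """Project one pixel (band name -> reflective value) onto the two components."""
    values = [pixel[b] for b in BANDS]
    comp1 = sum(c * v for c, v in zip(COMP1, values))
    comp2 = sum(c * v for c, v in zip(COMP2, values))
    return comp1, comp2

# Hypothetical pixel with the same value in every band.
pixel = {b: 100.0 for b in BANDS}
comp1, comp2 = to_pca_space(pixel)
print(round(comp1, 2), round(comp2, 2))  # 237.0 60.59
```

The same function, applied to every pixel of the six-band image, would yield a two-band image of the kind described for Step 11.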

The six-dimensional space (six bands) of the validated samples was then transformed to a two-dimensional PCA space using the above functions. An analysis of the value distributions of the principal components revealed that some within-class spectral variations existed in certain Bio-classes. The authors consider this important for further separating biotic communities, because different densities of the Bio-classes and varied water contents of plants may affect the spectral information and induce the within-class variations. This led to the design of a hierarchical clustering analysis on the samples, with the Bio-classes as prior groups. The 11 Bio-classes were then subdivided into 21 Bio-S classes (Step 9). The samples grouped by Bio-S classes were re-projected over the PCA space (see figure 4). In the figure, Bio-classes 1, 5 and 11 had 2 Bio-S classes each, Bio-classes 4 and 10 had 3 Bio-S classes each, and Bio-class 2 had 4 Bio-S classes. Bio-classes 3, 6, 7, 8 and 9 were not divided, as the samples in these 5 classes had little spectral variation. In brief, Bio-class 2 displayed high variation in spectra.

4.2 Assessing the fuzzy assignments for HFC Spec-classes

Based on the matrices MSpec and MBio-S, the FSM was constructed to match Spec-classes with Bio-S classes (from Step 14 and table 5). Partial fuzzy values are listed in table 4 to illustrate the assignment process. Each Spec-class had a set of fuzzy similarity values measuring its similarity to different Bio-S classes, but the FSM only recorded the two highest fuzzy values, which were supposed to be more reliable than the others. In general, a Spec-class was assigned to the Bio-S class corresponding to its highest similarity index (the assignment of Spec-class 1 is a good example). However, there were cases in which the right classifications could have been made if the assignments had gone to the second highest similarity values (table 4).

Table 4b. Distribution summary of fuzzy membership values for all Spec-classes (extracted from the FSM).

Count     Distribution                  Column total
          0      1      2      >2
Total     1      17     21     24       63

Table 5. Component matrix from the PCA analysis.

Component   Band 1    Band 2   Band 3    Band 4   Band 5    Band 7
1           0.99      0.99     0.99      0.37     0.99      0.96
2           -0.0953   0.0226   -0.0625   0.93     -0.0953   -0.12

4.3 Comparison between the HFC and CSC methods

The vegetation maps derived from the HFC and the CSC methods are shown in figure 5. The accuracy of the HFC was much higher than that of the CSC, as shown in tables 6 and 7. The overall accuracy of the HFC method reached 80.2%, while that of the CSC was only 69.0%. The overall Kappa statistic was also lower for the CSC classification than for the HFC (0.63 and 0.77 respectively). In table 7, the ‘*’ marks a fuzzy classification result. The number before the slash (/) is the actual number of samples misclassified in this map class. The number after the slash is the number that would be correctly classified for this map class if the second highest fuzzy similarity value were used to classify. Interestingly, of the 23 samples misclassified by the HFC, 12 fell in Spec-classes that actually matched the Bio-classes corresponding to their second highest similarity values. Although this additional information could not be exploited in the current project, potential exists for improving the accuracy level of the HFC in the future.
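The reported overall accuracies and Kappa values can be reproduced from the marginal totals and the correctly classified counts in tables 6 and 7, using the usual definition of Kappa as (observed agreement - chance agreement) / (1 - chance agreement). In this sketch the diagonal counts are read from the two error matrices. For CSC class 9 the value 3 is assumed, because it agrees with the producer's accuracy of 75.0% and with the overall total, although the listed user's accuracy of 44.4% would suggest 4. The function name `accuracy_and_kappa` is hypothetical.

```python
def accuracy_and_kappa(diagonal, row_totals, col_totals):
    """Overall accuracy and Kappa from the diagonal and the marginals of an error matrix."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must describe the same samples")
    observed = sum(diagonal) / n
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Counts for Bio-classes 1 to 11, taken from table 6 (CSC) and table 7 (HFC).
CSC = dict(
    diagonal=[4, 22, 5, 10, 6, 5, 1, 5, 3, 14, 5],
    row_totals=[5, 32, 5, 14, 8, 5, 1, 6, 9, 24, 7],
    col_totals=[6, 34, 5, 21, 6, 6, 1, 6, 4, 22, 5],
)
HFC = dict(
    diagonal=[5, 24, 6, 14, 5, 2, 2, 6, 5, 14, 10],
    row_totals=[6, 28, 8, 17, 7, 2, 4, 7, 7, 17, 13],
    col_totals=[8, 28, 7, 20, 6, 4, 3, 6, 5, 18, 11],
)

for name, table in (("CSC", CSC), ("HFC", HFC)):
    acc, kappa = accuracy_and_kappa(**table)
    print(name, round(100 * acc, 1), round(kappa, 2))
# CSC 69.0 0.63
# HFC 80.2 0.77
```

Only the diagonal and the marginals are needed for these two statistics, so the positions of the off-diagonal cells do not affect the result.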

5. Discussions and conclusions

5.1 Advantages of the HFC

Compared to the CSC, the HFC is a soft and statistically based classification rather than a hard and pixel-based method, which makes it suitable for image classification in complicated regions where high within-pixel variations may exist. Lo and Choi (2004) developed a hybrid classification method that incorporated the advantages of supervised and unsupervised approaches as well as hard and soft classifications for mapping the land use/cover of the Atlanta metropolitan area using Landsat 7 Enhanced Thematic Mapper Plus (ETM+) data. They applied a supervised fuzzy classification to the mixed pixels, and obtained a slightly better result than other methods (unsupervised ISODATA, supervised fuzzy and supervised maximum likelihood classification methods) in terms of land use/cover classification accuracy. Laba et al. (2002) compared the accuracy of a regional-scale thematic map of land cover with taxonomic resolution classified by conventional and fuzzy methods. Their study showed that the fuzzy method gave an obvious improvement in map accuracy at both low and high taxonomic resolutions. Therefore, in general, fuzzy classification is more suitable for heterogeneous areas, while hard classification can achieve a good classification result for homogeneous areas.

Figure 4. Distribution of the samples grouped by Bio-S classes in the PCA space.

Figure 5. Comparison of the vegetation maps made by the CSC and HFC methods: (a) Img I (preprocessed image; farms, roads and urban areas were removed); (b) vegetation cover by CSC classification; (c) vegetation cover by HFC classification. Legend: 1-Cleistogenes squarrosa, 2-Stipa grandis, 3-Achnatherum splendens, 4-Stipa krylovii, 5-Artemisia frigida, 6-Carex pediformis, 7-Carex spp., 8-Caragana microphylla, 9-Leymus chinensis + Stipa baicalensis, 10-Leymus chinensis, 11-Salsola collina (Chenopodium glaucum).

Table 6. Error matrix for the CSC.

Map class   Non-zero cells (in reference-class order)   Total   User's (%)
1           4    1                                      5       80.0
2           1    22   5    4                            32      68.7
3           5                                           5       100.0
4           1    10   1    2                            14      71.4
5           2    6                                      8       75.0
6           5                                           5       100.0
7           1                                           1       100.0
8           1    5                                      6       83.3
9           5    1    3                                 9       44.4
10          3    6    1    14                           24      58.3
11          2    5                                      7       71.4

Reference class   1      2      3       4      5       6      7       8      9      10     11      Total
Total             6      34     5       21     6       6      1       6      4      22     5       116
Producer's (%)    66.7   64.7   100.0   47.6   100.0   83.3   100.0   83.3   75.0   63.7   100.0

In the research here, the creation of a vegetation map that could reveal more vegetation classes to support ecological studies of grassland change dynamics was desirable. Moreover, it was known from long-term observations and field sampling that much of the research region was dominated by heterogeneous plant communities. Therefore, it was believed that hard, pixel-based image classification might not be the best way to map the vegetation cover at the level of detail required here. This led to the design and development of the HFC. After many rounds of exploration driven by the experience accumulated through field work at the Ecosystem Observation Station, PCA was deployed to summarize the spectral properties of the selected six bands into two principal components, with the botanic classes chosen as prior groups. This generalization apparently eliminated some noise from heterogeneous plants, which helped to extract the most important spectral signatures that separated the botanic classes of interest here.

An overall accuracy of 80.2% (Kappa = 0.77) was obtained by the HFC method. Compared with research conducted in other areas (e.g. Cingolani et al. 2004, Sohn and Qi 2005), this accuracy level may not be remarkable. However, if the complicated vegetation cover and strong human influences in this region are considered, this could be the best classification result the present authors have obtained for this area (the ML classification reached an overall accuracy of only 69.0% and Kappa = 0.63), even when compared with other studies conducted in the same region (Chen et al. 2003).

Table 7. Error matrix for the HFC.

Map class   Non-zero cells (in reference-class order)   Total   User's (%)
1           5      1                                    6       83.3
2           2/1*   24     1/1*   1                      28      85.7
3           1      6      1/1*                          8       75.0
4           14     1/1*   2/1*                          17      82.4
5           1/1*   5      1                             7       71.4
6           2                                           2       100.0
7           1/1*   2      1                             4       50.0
8           1      6                                    7       85.7
9           1      1/1*   5                             7       71.4
10          2/1*   1      14                            17      82.4
11          1/1*   2/2*   10                            13      76.9

Reference class   1      2      3      4      5      6      7      8       9       10     11     Total
Total             8      28     7      20     6      4      3      6       5       18     11     116
Producer's (%)    62.5   85.7   85.7   70.0   83.3   50.0   66.7   100.0   100.0   77.8   90.9


5.2 Impact of sampling on classification

Sampling has a great influence on the outcome and accuracy of vegetation

classification. It is a critical step to select field samples of different classes as training

or reference data when implementing classification using remotely sensed data

(Debba et al. 2005). In the present study, the samples were used to build the

signature set in the CSC, to make the PCA matrix, and to define the fuzzy

membership function in the HFC. Theoretically, it is ideal to select the training

samples that are separable both in vegetation stand structures and remote sensing

spectral signatures (Lu 2005). In reality, however, this is seldom the case either

because of differential ground and image sampling intervals (Gao 2006) or multiple

growing stages of plants during sampling. To map vegetation cover, some

alternative units rather than the units from the botanical community are usually

adopted. For example, ecologically meaningful units or repetitive combinations of

structural types were defined by utilizing spectral information to map mountain

rangeland (Cingolani et al. 2004). One limitation of this method is that the

classification system used may not be the one that biologists prefer most because

they are usually more interested in the plant communities rather than the

combinations of structural types. This causes a dilemma in vegetation classification

using remote sensing images since the spectral data of collected samples from the

field may not sufficiently reflect the variations of plant communities.

A related but distinct approach was adopted in this study. Instead of defining the mapping units directly, the spectral information was extracted from the field samples to define intermediate mapping units, the Bio-S classes, which were later merged into Bio-classes for the final mapping. In other words, the samples were used to help define a middle layer (Bio-S classes) rather than the final mapping units (Bio-classes). In this way, the intermediate units derived primarily from the spectral information contained in the field samples were matched to the final mapping units based on the plant communities requested by the ecologists.

5.3 Number of Bio-S classes in the HFC

In the HFC method, the Bio-S classes acted as a bridge that connected Bio-classes and

Spec-classes. The selection of the number of Bio-S classes may have a critical impact

on the classification result. In the present study, this number was set to around twice

that of the Bio-classes. This decision was based on the statistical result to make the

Bio-S classes as separable as possible in the spectra. As table 4 indicated, Bio-classes 1,

2, 4, 5, 10 and 11 had more than one Bio-S class, while the rest of the Bio-classes had

single Bio-S classes. The number of Bio-S classes within each Bio-class was largely

determined by two factors: (1) the number of the validated samples of the Bio-classes;

and (2) the spectral variations within a Bio-class. Although the Bio-S classes were

spectrally separable, some of them had no significant statistical differences due to the

limited number of samples. Therefore, the size and distribution of field samples had a

direct impact on the number of Bio-S classes that had been generated.

5.4 Future improvement

An HFC method to extract vegetation types from Landsat ETM+ imagery, in order to map vegetation cover characterized by a mixture of plant communities in a typical steppe grassland, has been proposed. The classifier integrated supervised and unsupervised classifications, as well as fuzzy logic. Both the botanical features of vegetation and the spectral variations within vegetation types were considered. The results indicated that the overall accuracy of the HFC was much better than that of the CSC (80.2% and Kappa = 0.77 versus 69.0% and Kappa = 0.63). In summary, the HFC enables the separation of grassland vegetation types from Landsat ETM+ images with a reasonable accuracy and can be applied to grassland vegetation classifications in other regions.

In addition, a second candidate Bio-S class exists for matching with each Spec-class, on the basis of the second highest fuzzy similarity value. In the HFC implementation, manual checking was used to examine whether the wrong assignments of the field samples could be corrected using the second value, and the error was reported in the assessment in table 4. As previously mentioned, the second highest value has not been used in classifying the image in this project. However, designing a systematic approach to take advantage of this information for more accurate classification will be one area of improvement for future work.

Acknowledgements

The authors wish to thank The Center for Ecological Research, Institute of Botany,

Chinese Academy of Sciences (CAS) for the financial support through The One

Hundred Scholars – Distinguished Overseas Scholar Funds. The authors are also

grateful to the research staff and graduate assistants at CAS – Inner Mongolia

Grassland Research Station (IMGERS) who assisted in collecting the field samples

for this research.

References

BERBEROGLU, S., CURRAN, P.J., LLOYD, C.D. and ATKINSON, P.M., 2000, The integration of

spectral and textural information using neural networks for land cover mapping in the

Mediterranean. Computers and Geosciences, 26, pp. 385–396.

CERNA, L. and CHYTRY, M., 2005, Supervised classification of plant communities with

artificial neural networks. Journal of Vegetation Science, 16, pp. 407–414.

CHAVEZ, P.S. Jr., 1996, Image-based atmospheric corrections – revisited and improved.

Photogrammetric Engineering and Remote Sensing, 62, pp. 1025–1036.

CHEN, S., XIAO, X., LIU, J. and ZHUANG, D., 2003, Observation of land use/cover change of

the Xilin River Basin, Inner Mongolia, using multi-temporal Landsat images. In

Proceedings of SPIE series, Ecosystems Dynamics, Ecosystem-Society Interactions, and

Remote Sensing Applications for Semi-Arid and Arid Land, Hangzhou, China, pp.

674–685.

CINGOLANI, A.M., RENISON, D., ZAK, M.R. and CABIDO, M.R., 2004, Mapping vegetation in

a heterogeneous mountain rangeland using landsat data: an alternative method to

define and classify land-cover units. Remote Sensing of Environment, 92, pp. 84–97.

DE LEEUW, J., LIU, X., SCHMIDT, K., SKIDMORE, A.K., JIA, H. and YANG, L., 2006,

Comparing accuracy assessments to infer superiority of image classification methods.

International Journal of Remote Sensing, 27, pp. 223–232.

DEBBA, P., CARRANZA, E.J.M., STEIN, A., VAN RUITENBEEK, F.J.A. and VAN DER MEER, F.D., 2005, Optimal field sampling for targeting minerals using hyperspectral

data. Remote Sensing of Environment, 99, pp. 373–386.

ERDAS, 2000, ERDAS field guide (5th ed.) (Atlanta: ERDAS).

ESRI, 2005, PC ArcGIS desktop help (ver. 9.1), Redland, CA.

GAD, S. and KUSKY, T., 2006, Lithological mapping in the Eastern Desert of Egypt, the

Barramiya area, using Landsat thematic mapper (TM). Journal of African Earth

Sciences, 44, pp. 196–202.

GAO, J., 2006, Quantification of grassland properties: how it can benefit from geoinformatic

technologies? International Journal of Remote Sensing, 27, pp. 1351–1365.


HANSON, H.C. and DAHL, E., 1957, The use of basic principles in the classification of range

vegetation. Journal of Range Management, 10, pp. 26–33.

HEA, C., ZHANG, Q., LI, Y., LI, X. and SHI, P., 2005, Zoning grassland protection area using

remote sensing and cellular automata modelling – a case study in Xilingol steppe

grassland in northern China. Journal of Arid Environments, 63, pp. 814–826.

HIGDON, R. and SCHAFER, D.W., 2001, Maximum likelihood computations for regression

with measurement error. Computational Statistics and Data Analysis, 35, pp. 283–299.

LABA, M., OGURCAK, D., HILL, E., FEGRAUS, E., FIORE, J., DEGLORIA, S.D., GREGORY, S.K.

and BRADEN, J., 2002, Conventional and fuzzy accuracy assessment of the New York

Gap Analysis Project land cover map. Remote Sensing of Environment, 81, pp.

443–455.

LANGLEY, S.K., CHESHIRE, H.M. and HUMES, K.S., 2001, A comparison of single date and

multitemporal satellite image classifications in a semi-arid grassland. Journal of Arid

Environments, 49, pp. 401–411.

LI, B., YON, S.P. and LI, Z.H., 1988, Vegetation and its utilization in Xilinhe Basin. Research

on Grassland Ecosystem, 3, pp. 84–183.

LI, F.R., KANG, L.F., ZHANG, H., ZHAO, L.Y., SHIRATO, Y. and TANIYAMA, I., 2005,

Changes in intensity of wind erosion at different stages of degradation development

in grasslands of Inner Mongolia, China. Journal of Arid Environments, 62, pp.

567–585.

LIANG, E., VENNETIER, M., LIN, J. and SHAO, X., 2003, Relationships between tree increment,

climate and above-ground biomass of grass: a case study in the typical steppe, north

China. Acta Oecologica, 24, pp. 87–94.

LO, C.P. and CHOI, J., 2004, A hybrid approach to urban land use/cover mapping using

Landsat 7 Enhanced Thematic Mapper Plus (ETM+) images. International Journal of

Remote Sensing, 25, pp. 2687–2700.

LU, D., 2005, Integration of vegetation inventory data and Landsat TM image for vegetation

classification in the western Brazilian Amazon. Forest Ecology and Management, 213,

pp. 369–383.

NORDBERG, M.L. and EVERTSON, J., 2003, Monitoring change in mountainous dry-heath

vegetation at a regional scale using multitemporal Landsat TM data. Ambio, 32, pp.

502–509.

SALOVAARA, K.J., THESSLER, S., MALIK, R.N. and TUOMISTO, H., 2005, Classification of

Amazonian primary rain forest vegetation using Landsat ETM+ satellite imagery.

Remote Sensing of Environment, 97, pp. 39–51.

SHRESTHA, D.P. and ZINCK, J.A., 2001, Land use classification in mountainous areas:

integration of image processing, digital elevation data and field knowledge –

application to Nepal. International Journal of Applied Earth Observation and

Geoinformation, 3, pp. 78–85.

SOHN, Y. and REBELLO, N.S., 2002, Supervised and unsupervised spectral angle classifiers.

Photogrammetric Engineering and Remote Sensing, 68, pp. 1271–1280.

SOHN, Y. and QI, J., 2005, Mapping detailed biotic communities in the Upper San Pedro

Valley of southeastern Arizona using Landsat 7 ETM+ data and supervised spectral

angle classifier. Photogrammetric Engineering and Remote Sensing, 71, pp. 709–718.

STUART, N., BARRATT, T. and PLACE, C., 2006, Classifying the neotropical savannas of Belize

using remote sensing and ground survey. Journal of Biogeography, 33, pp. 476–490.

THE INNER MONGOLIA AND NINGXIA SURVEY TEAM OF CAS, 1985, Vegetation in Inner

Mongolia (Beijing: China Science Press).

TONG, C., WU, J., YONG, S. and YONG, W., 2004, A landscape-scale assessment of steppe

degradation in the Xilin River Basin, Inner Mongolia, China. Journal of Arid

Environments, 59, pp. 133–149.

WANG, Z. and HAN, X., 2005, Diurnal variation in methane emissions in relation to plants

and environmental variables in the Inner Mongolia marshes. Atmospheric

Environment, 39, pp. 6295–6305.

WANG, Q. and TENHUNEN, J.D., 2004, Vegetation mapping with multitemporal NDVI in

North Eastern China Transect (NECT). International Journal of Applied Earth

Observation and Geoinformation, 6, pp. 17–31.

XU, M., WATANACHATURAPORN, P., VARSHNEY, P.K. and ARORA, M.K., 2005, Decision tree

regression for soft classification of remote sensing data. Remote Sensing of

Environment, 97, pp. 322–336.

ZHA, Y., GAO, J., NI, S., LIU, Y., JIANG, J. and WEI, Y., 2003, A spectral reflectance-based

approach to quantification of grassland cover from Landsat TM imagery. Remote

Sensing of Environment, 87, pp. 371–375.

ZHANG, J. and FOODY, G.M., 1998, A fuzzy classification of sub-urban land cover from

remotely sensed imagery. International Journal of Remote Sensing, 19, pp. 2721–2738.
