Automated Global Land Cover Mapping – FROM-GLC Version 2: the production of the 30 m circa 2015 global land cover map https://doi.org/10.6084/m9.figshare.5362774.v1

Finer Resolution Observation and Monitoring - …data.ess.tsinghua.edu.cn/data/temp/AutomatedFROMG… · Web view2 Institute of Remote Sensing and Digital Earth, Datun Road, Beijing,



2017.12


Automated Global Land Cover Mapping – FROM-GLC Version 2: the production of the 30 m circa 2015 global land cover map

Peng Gong1*, Jie Wang2, Congcong Li3, Luyan Ji1, Huabing Huang2, Nicholas Clinton4, Yuqi Cheng1, Wenyu Li1, Meinan Zhang1, Yuqi Bai1, Le Yu1, Yali Si1, Haohuan Fu1,6, Lin Gan5,6, Guangwen Yang5,6, Shupeng Shi6, Gregory Biging3, Zhiliang Zhu7

1 Ministry of Education Key Laboratory for Earth System Modeling, Department of Earth System Science, Tsinghua University, Beijing, 100084, China
2 Institute of Remote Sensing and Digital Earth, Datun Road, Beijing, 100101, China
3 Department of Environmental Science, Policy and Management, University of California, Berkeley, CA 94720-3114 USA
4 Google Inc., 1600 Amphitheatre Pkwy, Mountain View, CA 94043, USA
5 Department of Earth System Science, Tsinghua University, Beijing, 100084, China
6 National Supercomputing Center in Wuxi, Jiangsu, China
7 United States Geological Survey, Reston, VA 12201, USA

* Correspondence: [email protected]

Abstract

In 2013, the first set of 30 m resolution global land cover maps, named Finer Resolution Observation and Monitoring of Global Land Cover (FROM-GLC), was produced and released at Tsinghua University (data.ess.tsinghua.edu.cn). That data set was produced by an automated supervised classification approach based on 8,900 single-date Landsat scenes obtained in and before 2010. In this document, we report the procedure used to produce the second generation of 30 m global land cover maps (FROM-GLC Version 2), based on multi-seasonal data from more than 60,000 Landsat scenes acquired between 2013 and 2015. This was made possible by the completion of a new set of all-season training and validation samples. The overall classification accuracy has been assessed at approximately 77% for Level I classes and 72% for Level II classes in the FROM-GLC land cover classification system. These figures represent substantial improvements in accuracy over the first-generation global land cover product, FROM-GLC 2010.


Introduction

The circa 2015 30 m resolution global land cover map is the second generation of the Finer Resolution Observation and Monitoring of Global Land Cover product (FROM-GLC Version 2) and continues the FROM-GLC 2010 land cover map (Gong et al., 2013). It differs from FROM-GLC 2010 in three respects: the collection of multi-seasonal sample sets for training and validation (Li et al., 2017); the use of multi-seasonal instead of single-date Landsat images, primarily acquired by Landsat 8 between 2013 and 2015; and the incorporation of day of year, geographical coordinates and elevation data in the image classification. In the following, we document the mapping procedure and report the results.

Mapping procedure

The map was produced using an automated supervised image classification algorithm, Random Forest. In our earlier global land cover mapping work, this algorithm proved to be one of the most robust when high-dimensional data are used (Li et al., 2014; Feng et al., 2016). In the first-generation 30 m global land cover map for 2010, the overall classification accuracy assessed with an independent test sample set reached approximately 64% (Gong et al., 2013). Several classes were poorly classified, including croplands, shrubs, grassland and impervious surfaces. This was partly due to the use of a single-date image at each location: the lack of multi-seasonal data made it hard to differentiate some of the vegetation classes.

The impervious class cannot be well differentiated from bare land or vegetation classes because of the complexity of surface cover in human settlement areas and the high proportion of spectrally mixed pixels. As a result, large tracts of barren land are misclassified as impervious, which overestimates human settlement areas. This problem could not be easily overcome by the use of multi-seasonal data. We therefore further processed the impervious class using nighttime light images (Source): only impervious areas falling within a mask of sufficient nighttime light are retained as Impervious, which reduces the commission error for the impervious class. The flowchart of the classification procedure is shown in Figure 1.


Figure 1. The general processing procedure involved in the production of the new global land cover map for 2015 (FROM-GLC Version 2).

The classifier was trained using the first global multi-season sample set (Li et al., 2017). Figure 2 shows the distribution of annually dominant land cover classes at Level 1 of FROM-GLC. Two sets of Random Forest classifiers were trained, one with the impervious class and one without it. The classification system was the same as the one used in FROM-GLC 2010 (Gong et al., 2013). The models were then used to classify Landsat images. Impervious pixels were extracted from the results of the classifier trained with the impervious samples and processed further with the nighttime light data. The post-processed impervious mask was then applied to the classification results of the classifier trained without the impervious samples to form the final FROM-GLC Version 2 map.


Figure 2. Distribution of annually dominant samples and proportion of samples by category at FROM-GLC Level 1.

Data used and pre-processing

The primary data source is 30 m resolution Landsat imagery (lta.cr.usgs.gov). The downloaded images are reflectance data, organized in TIF file format. A total of 60,998 Landsat scenes were used, of which 60,429 (99%) were from Landsat 8: 22,614 scenes from 2015, 30,113 from 2014 and 7,702 from 2013. Because some parts of the world are persistently cloudy, we also used a small number of images from previous years. Only 569 scenes were from Landsat 4, 5 and 7.

For every pixel in each Landsat scene, the normalized difference vegetation index (NDVI) was calculated and the day of the year (DOY) was recorded. Where images from multiple dates cover the same pixel location, the maximum NDVI and its corresponding DOY were preserved for subsequent use.
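The maximum-NDVI selection described above can be illustrated with a minimal sketch for a single pixel location. It uses the standard NDVI definition, (NIR - red) / (NIR + red); the function names, the tuple layout and the sample reflectance values are illustrative assumptions, not the authors' code.

```python
def ndvi(red, nir):
    """Standard NDVI; returns 0.0 when both bands are zero to avoid division by zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def max_ndvi_composite(observations):
    """Return (maximum NDVI, DOY of that maximum) for one pixel location.

    observations: iterable of (red, nir, doy) tuples, one per image date
    (a hypothetical layout chosen for this sketch).
    """
    best_ndvi, best_doy = None, None
    for red, nir, doy in observations:
        value = ndvi(red, nir)
        if best_ndvi is None or value > best_ndvi:
            best_ndvi, best_doy = value, doy
    return best_ndvi, best_doy


# Three made-up observations of the same pixel on different days of the year.
pixel_series = [(0.10, 0.30, 45), (0.05, 0.45, 190), (0.08, 0.25, 300)]
peak_ndvi, peak_doy = max_ndvi_composite(pixel_series)
```

In a production setting the same rule would be applied to whole raster stacks rather than one pixel at a time.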


Digital elevation model (DEM) data were also used. For areas between 60° S and 60° N, the 30 m resolution DEM from the Shuttle Radar Topography Mission (SRTM-30) was chosen (e4ftl01.cr.usgs.gov). Beyond ±60° latitude, DEM data derived from ASTER were used (lta.cr.usgs.gov). The DEM data were reprojected to the Universal Transverse Mercator projection and resampled to 30 m, and slope was calculated for each pixel location.

Nighttime light data at 500 m resolution were downloaded from https://earthobservatory.nasa.gov/Features/NightLights/page3.php.

Supervised classification and input features

The Random Forest classifier was configured with 200 trees. Experiments indicated that when the number of trees exceeds 200, the accuracy gain at the global level is less than 1%. Given the computational cost of adding more trees, 200 trees were considered sufficient.

A total of 24 features were used in classifying each single-date image. They include elevation and slope from the DEM, longitude and latitude, the 7 optical spectral bands, the corresponding NDVI, the DOY of the image, and the maximum NDVI with its corresponding DOY and 7 spectral bands. Each DOY is represented by two features, cos(ND) and sin(ND), with ND = 2π × DOY/366. Counting each DOY as two features gives 2 + 2 + 7 + 1 + 2 + 1 + 2 + 7 = 24.
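The DOY encoding and the feature tally can be checked with a short sketch. The formula ND = 2π × DOY/366 is taken from the text; the dictionary keys are illustrative names, and reading the second group of 7 spectral bands as those of the maximum-NDVI date is an assumption made here to reach the stated total of 24.

```python
import math


def doy_features(doy):
    """Encode a day of year as (cos(ND), sin(ND)) with ND = 2*pi*DOY/366."""
    nd = 2.0 * math.pi * doy / 366.0
    return math.cos(nd), math.sin(nd)


# Feature tally following the list in the text; each DOY contributes two values.
# Key names are hypothetical labels for this sketch.
FEATURE_COUNTS = {
    "elevation_and_slope": 2,
    "longitude_and_latitude": 2,
    "spectral_bands": 7,
    "ndvi": 1,
    "doy_cos_sin": 2,
    "max_ndvi": 1,
    "max_ndvi_doy_cos_sin": 2,
    "spectral_bands_at_max_ndvi": 7,  # assumed interpretation
}
total_features = sum(FEATURE_COUNTS.values())

cos_mid, sin_mid = doy_features(183)  # mid-year maps to the opposite side of the circle
```

The cosine and sine pair keeps late December and early January close together in feature space, which a raw DOY value from 1 to 366 would not.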

Our recent all-season sample collection for global land cover mapping research indicates that a global classification model, in which all training samples are used to train a single Random Forest classifier for the entire globe, does not lead to the best overall accuracy. Instead, dividing the world into geographical zones leads to better overall classification accuracy (Li et al., 2017). Dividing the world into regions and training multiple classifiers also saves computation. Hence we divided the world into 16 regions, trained one classifier for each region and applied it to classify the images in that region (Figure 3). To ensure that areas along regional borders are properly classified, sample units falling within 5 degrees of longitude or latitude of a region's border were also included in training the classifier for that region.


Figure 3. Number of samples in each region separately classified.
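The 5-degree border rule for regional training samples can be sketched as follows. The report does not give the region geometry, so this example assumes each region is a simple longitude/latitude box and ignores wrap-around at the antimeridian; the bounds, field names and sample records are hypothetical.

```python
def in_buffered_region(lon, lat, bounds, buffer_deg=5.0):
    """True if (lon, lat) lies inside the region box grown by buffer_deg on every side."""
    west, south, east, north = bounds
    return (west - buffer_deg <= lon <= east + buffer_deg
            and south - buffer_deg <= lat <= north + buffer_deg)


def select_training_samples(samples, bounds, buffer_deg=5.0):
    """Keep the samples that fall inside the region or within the buffer around it."""
    return [s for s in samples
            if in_buffered_region(s["lon"], s["lat"], bounds, buffer_deg)]


# Hypothetical region (west, south, east, north) and three made-up sample units.
region_bounds = (100.0, 20.0, 120.0, 40.0)
samples = [
    {"lon": 110.0, "lat": 30.0, "label": 20},  # inside the region
    {"lon": 123.0, "lat": 41.0, "label": 10},  # outside, but within the 5-degree buffer
    {"lon": 130.0, "lat": 30.0, "label": 90},  # beyond the buffer
]
selected = select_training_samples(samples, region_bounds)
```

Each regional classifier would then be trained only on its own `selected` subset.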

Use of supercomputer

As in the production of FROM-GLC 2010, a supercomputer with 4,000 cores was used to complete the image classification at the global scale. Each core of the computer uses an Intel 2.0 GHz 4800 processing unit. The total data volume of the input features is approximately 33 TB. It took 48 hours to classify the 60,000+ images.

Refinement of impervious with nighttime light data

For each path and row, let the total number of Landsat images be N, each image be denoted Landsat_i (i = 1, 2, …, N) and the corresponding nighttime light data be Nlight. PA_i and UA_i denote the producer's and user's accuracies of the impervious class on image Landsat_i. We processed the initially classified images in the following steps:

Step 1. Generate a mask from the nighttime light data by setting Nlight >= Thres (after some experiments, Thres was set to 10). Calculate PA_i and UA_i for each image Landsat_i. If max{UA_i} < 0.5, which means that the commission error for the impervious class is high on all images, go to Step 3 to correct the impervious class with the nighttime light data. Otherwise, go to Step 2.

Step 2. Rank the UA_i (i = 1, 2, …, N). If there are clouds in the scene with the maximum UA, fill the cloud area with the classification results from the remaining images in rank order of UA_i.

Step 3. Take the impervious results from the image with the greatest UA_i. Remove the impervious areas whose nighttime light value falls below the threshold, that is, outside the mask from Step 1.

Figure 4 displays the final result of the global impervious surface.

Figure 4. The percentage of global impervious surface for circa 2015.
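The nighttime light masking and the Step 1 decision rule can be sketched on a toy grid. The threshold of 10 and the 0.5 cut-off on user's accuracy come from the text; the function names, the nested-list grids and the assumption that the nighttime light data have already been resampled to the classification grid are illustrative.

```python
def refine_impervious(impervious, nlight, thres=10):
    """Keep impervious pixels only where the nighttime light value is >= thres.

    impervious: 2-D list of booleans from the initial classification.
    nlight: 2-D list of nighttime light values on the same grid (assumed co-registered).
    """
    return [
        [is_imp and light >= thres for is_imp, light in zip(imp_row, light_row)]
        for imp_row, light_row in zip(impervious, nlight)
    ]


def needs_nightlight_correction(user_accuracies):
    """Step 1 decision: True when even the best image has UA below 0.5."""
    return max(user_accuracies) < 0.5


# Toy 2 x 3 example with made-up values.
initial = [[True, True, False], [True, False, True]]
lights = [[25, 3, 40], [10, 0, 9]]
refined = refine_impervious(initial, lights)
```

Pixels that were never classified as impervious stay unchanged, so bright but non-impervious areas are not promoted by the mask.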


From multi-seasonal classification results to the annual product

The land cover class for the annual map is selected according to the following ascending order of priority: cloud, ice and snow, bare land, wetland, water, cropland, grassland, shrubs, forest and tundra. The post-processed impervious class has the highest priority of all. The remaining classes are processed in the following steps:

Step 1. Calculate the proportion of vegetation on each Landsat scene, Landsat_i (i = 1, 2, …, N), denoted PV_i. Find the scene with the greatest PV_i, ind = argmax{PV_i}.

Step 2. On Landsat_ind, replace the background value (0), cloud, snow and ice, and bare land with the class from other scenes according to the class priority order.

Step 3. Partition the world into 10° × 10° geographical patches. Combine the Landsat_ind scenes falling into each patch. For pixels located in overlapping areas between neighboring Landsat_ind scenes, the land cover class is selected according to the order of class priority.
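The class priority rule can be illustrated for a single pixel with several candidate labels, as happens in overlapping areas. The ordering follows the text and the Level-1 codes follow Appendix II; treating the background value 0 as the lowest priority is an assumption, and the sketch covers only the priority rule, not the full three-step procedure.

```python
# Level-1 class codes listed from lowest to highest priority:
# background (assumed lowest), cloud, snow/ice, bare land, wetland, water,
# cropland, grassland, shrubland, forest, tundra, impervious surface.
PRIORITY_ORDER = [0, 120, 100, 90, 50, 60, 10, 30, 40, 20, 70, 80]
RANK = {code: rank for rank, code in enumerate(PRIORITY_ORDER)}


def composite_pixel(labels):
    """Return the highest-priority class code among the candidate labels."""
    return max(labels, key=RANK.__getitem__)


cloudy_then_grass = composite_pixel([120, 90, 30])  # grassland outranks cloud and bare land
with_impervious = composite_pixel([20, 80, 10])     # impervious outranks everything
water_vs_wetland = composite_pixel([60, 50])        # water outranks wetland
```

A lookup table keyed by class code keeps the ordering in one place, so a change to the priority list does not require touching the compositing logic.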

Classification results and final annual composite map

Using the multi-seasonal validation sample set developed in Li et al. (2017), we obtained an overall accuracy of 77.3% for the 10 Level I classes. For the 26 Level II classes, the corresponding overall accuracy was 71.6%. Compared with the FROM-GLC Version 1 (2010) data (Gong et al., 2013), this represents an improvement of more than 10% for Level I classes (Table 1) and almost 20% for Level II classes (Appendix I). The two-level classification system is listed in Appendix II. The impervious class is not included in Table 1 because it was further improved with nighttime light data. Since the impervious class accounts for only 0.87% of the total validation sample and its improved UA and PA are approximately 57% and 61%, respectively, the overall accuracy of our FROM-GLC Version 2 circa 2015 global land cover data remains approximately 77%.


Table 1. Confusion matrix among Level I classes in the FROM-GLC Version 2 classification circa 2015

Name          Cropland  Forest  Grassland  Shrubland  Wetland  Water bodies  Tundra  Bareland  Snow/Ice  Cloud     SUM  UA (%)
Cropland          8544    1033       2700        896       82            11       4       136         6     84   13496    63.3
Forest             949   25623       1317       2192       95            28     141        22       131     92   30590    83.8
Grassland         2751    1624      14187       4775      181            11     379       832        23     85   24848    57.1
Shrubland          827    1518       2559       8719       62             1      10       468         1     36   14201    61.4
Wetland             33      32        103         57      139            16       5         9         0      4     398    34.9
Water bodies        35      27         62         57      197          3911      24       125        50     38    4526    86.4
Tundra               0      41        118         15       21            10    1584       130         4     19    1942    81.6
Bareland           217       0       1344        943      113            14      41     20359        42    129   23202    87.8
Snow/Ice             9     196         31         21        5            15       5        16     13228    605   14131    93.6
Cloud              139     340        185        117       21            31      23        81       494  14276   15707    90.9
SUM              13504   30434      22606      17792      916          4048    2216     22178     13979  15368  143041
PA (%)            63.3    84.2       62.8       49.0     15.2          96.6    71.5      91.8      94.6   92.9    77.3

The final value in the PA row (77.3) is the overall accuracy.
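The accuracy figures in Table 1 can be reproduced from the counts with a short sketch. It uses the usual definitions: user's accuracy (UA) is the diagonal count divided by the row total, producer's accuracy (PA) is the diagonal count divided by the column total, and overall accuracy is the diagonal sum divided by the grand total. The variable and function names are illustrative.

```python
CLASSES = ["Cropland", "Forest", "Grassland", "Shrubland", "Wetland",
           "Water bodies", "Tundra", "Bareland", "Snow/Ice", "Cloud"]

# Counts copied from Table 1 (rows and columns in the order of CLASSES).
MATRIX = [
    [8544, 1033, 2700, 896, 82, 11, 4, 136, 6, 84],
    [949, 25623, 1317, 2192, 95, 28, 141, 22, 131, 92],
    [2751, 1624, 14187, 4775, 181, 11, 379, 832, 23, 85],
    [827, 1518, 2559, 8719, 62, 1, 10, 468, 1, 36],
    [33, 32, 103, 57, 139, 16, 5, 9, 0, 4],
    [35, 27, 62, 57, 197, 3911, 24, 125, 50, 38],
    [0, 41, 118, 15, 21, 10, 1584, 130, 4, 19],
    [217, 0, 1344, 943, 113, 14, 41, 20359, 42, 129],
    [9, 196, 31, 21, 5, 15, 5, 16, 13228, 605],
    [139, 340, 185, 117, 21, 31, 23, 81, 494, 14276],
]


def accuracy_metrics(matrix):
    """Return (UA per class, PA per class, overall accuracy) as fractions."""
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    diagonal = [matrix[i][i] for i in range(n)]
    ua = [diagonal[i] / row_sums[i] for i in range(n)]
    pa = [diagonal[j] / col_sums[j] for j in range(n)]
    overall = sum(diagonal) / sum(row_sums)
    return ua, pa, overall


ua, pa, overall = accuracy_metrics(MATRIX)
for name, u, p in zip(CLASSES, ua, pa):
    print(f"{name:<13} UA {u * 100:5.1f}%  PA {p * 100:5.1f}%")
print(f"Overall accuracy {overall * 100:.1f}%")
```

The large gap between UA and PA for Wetland shows how a class can look acceptable from the map user's side while most reference samples of that class are missed.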


Figure 5 shows the composite annual global land cover map from all 60000+ classified images.

FROM-GLC legend: Cropland Forest Grass Shrub Water Impervious surface Bareland Snow/Ice Cloud

Figure 5. Annual composite global land cover map (FROM-GLC Version 2 (2015)).

References

Breiman, L., 2001. Random forests. Machine Learning, 45, 5-32.

Feng D.L., Y.Y. Zhao, L. Yu, et al., 2016. Circa 2014 African land-cover maps compatible with FROM-GLC and GLC2000 classification schemes based on multi-seasonal Landsat data. International Journal of Remote Sensing, 37(19), 4648-4664.

Gong P., J. Wang, L. Yu, et al., 2013. Finer resolution observation and monitoring of global land cover: first mapping results with Landsat TM and ETM+ data. International Journal of Remote Sensing, 34(7), 2607-2654.

Li C.C., J. Wang, L. Wang, et al., 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat Thematic Mapper imagery. Remote Sensing, 6(2), 964-983.

Li C.C., P. Gong, J. Wang, et al., 2017. The first all-season sample set for mapping global land cover with Landsat-8 data. Science Bulletin, 62(7), 508-515.

Acknowledgements

This research was partially supported by the National High Tech Program (2013AA122804), the Special Fund for Meteorology Scientific Research in Public Welfare (GYHY201506010), and National Key R&D Program of China (2016YFA0602200).


Appendix I. Confusion matrix of Level II for FROM-GLC Version 2 (circa 2015 global land cover mapping)

Code 11 12 13 14 15 21 22 23 24 25 26 31 32 33 41 42 51 52 53 60 71 72 90 101 102 120 SUM UA (%)

11 13 0 65 2 1 18 0 2 0 0 0 0 5 0 3 0 2 0 0 0 0 0 0 0 0 1 112 11.6

12 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 100.0

13 70 12 4078 233 116 622 28 58 0 90 0 352 1189 55 486 14 56 0 0 2 0 0 3 2 0 27 7493 54.4

14 0 0 38 120 2 97 0 5 0 0 0 0 18 2 19 0 4 0 0 0 0 0 0 0 0 1 306 39.2

15 18 2 494 14 3265 18 75 7 7 5 1 31 162 886 60 314 14 6 0 9 0 4 133 4 0 55 5584 58.5

21 8 0 374 364 1 14103 29 318 0 1133 1 45 612 5 1505 8 41 0 0 7 15 6 1 0 0 61 18637 75.7

22 1 0 15 7 35 69 570 39 11 260 18 0 34 33 30 55 1 0 0 0 2 3 1 14 0 5 1203 47.4

23 1 0 82 37 1 309 37 6148 99 837 9 23 475 33 509 25 49 0 0 21 28 81 13 102 2 24 8945 68.7

24 0 0 0 0 0 0 9 24 77 5 10 0 3 9 2 4 3 0 0 0 0 1 7 12 0 0 166 46.4

25 0 0 13 10 0 232 22 287 1 961 0 6 37 2 54 0 1 0 0 0 3 2 0 0 0 2 1633 58.9

26 0 0 0 0 0 0 0 1 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 6 16.7

31 0 0 54 0 2 12 0 0 0 0 0 366 102 6 10 0 1 0 0 0 0 0 0 0 0 0 553 66.2

32 3 0 1454 154 91 740 32 559 2 151 0 1114 7300 323 2898 113 171 2 0 11 66 273 35 3 0 54 15549 47.0

33 0 0 129 6 858 9 69 22 24 2 2 36 639 4301 266 1488 3 3 1 0 2 38 797 20 0 31 8746 49.2

41 0 0 510 84 30 1150 49 150 1 30 0 76 1337 112 5305 270 59 1 1 1 3 7 20 0 0 26 9222 57.5

42 0 0 47 3 153 13 123 2 0 0 0 0 235 799 478 2666 1 0 0 0 0 0 448 1 0 10 4979 53.5

51 3 0 24 2 4 16 0 13 0 1 0 4 86 2 57 0 124 1 0 15 0 4 0 0 0 3 359 34.5

52 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 1 11 0 0 0 1 8 0 0 1 25 44.0

53 0 0 0 0 0 0 1 0 1 0 0 0 0 8 0 0 1 1 0 1 0 0 1 0 0 0 14 0.0

60 6 0 13 0 16 10 1 12 3 1 0 2 37 23 39 18 94 100 3 3911 1 23 125 34 16 38 4526 86.4

71 0 0 0 0 0 2 0 1 0 4 0 0 4 0 1 0 0 0 0 0 13 21 0 0 0 0 46 28.3

72 0 0 0 0 0 2 0 27 1 4 0 0 103 11 13 1 20 1 0 10 63 1487 130 4 0 19 1896 78.4

90 0 0 64 5 148 0 0 0 0 0 0 5 429 910 134 809 3 110 0 14 2 39 20359 40 2 129 23202 87.8

101 0 0 0 0 9 2 57 56 35 36 10 0 6 25 9 12 0 1 0 8 2 3 16 12825 155 602 13869 92.5

102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 7 0 0 0 69 179 3 262 68.3

120 1 0 95 8 35 242 17 49 5 24 3 14 123 48 93 24 11 10 0 31 0 23 81 485 9 14276 15707 90.9

SUM 124 15 7549 1049 4767 17666 1119 7780 269 3545 55 2074 12938 7594 11971 5821 660 251 5 4048 200 2016 22178 13616 363 15368 143041

PA (%) 10.5 6.7 54.0 11.4 68.5 79.8 50.9 79.0 28.6 27.1 1.8 17.7 56.4 56.6 44.3 45.8 18.8 4.4 0.0 96.6 6.5 73.8 91.8 94.2 49.3 92.9 71.6


Appendix II. The two-level land cover classification system of FROM-GLC Version 2.

Level-1 name        Code  Level-2 classes (name code)
Cropland            10    Rice paddy 11; Greenhouse 12; Other 13; Orchard 14; Bare farmland 15
Forest              20    Broadleaf, leaf-on 21; Broadleaf, leaf-off 22; Needleleaf, leaf-on 23; Needleleaf, leaf-off 24; Mixed leaf, leaf-on 25; Mixed leaf, leaf-off 26
Grassland           30    Pasture 31; Natural grassland 32; Grassland, leaf-off 33
Shrubland           40    Shrubland, leaf-on 41; Shrubland, leaf-off 42
Wetland             50    Marshland 51; Mudflat 52; Marshland, leaf-off 53
Water               60    60
Tundra              70    Shrub and brush tundra 71; Herbaceous tundra 72
Impervious surface  80    80
Bareland            90    90/92
Snow/Ice            100   Snow 101; Ice 102
Cloud               120   120