Analyzing Satellite Data in R
Richard E. Plant
October 10, 2018
Additional topic to accompany Spatial Data Analysis in Ecology and Agriculture using R,
Second Edition
http://psfaculty.plantsciences.ucdavis.edu/plant/sda2.htm
Chapter and section references are contained in that text, which is referred to as SDA2.
1. Introduction
Remotely sensed data from satellites are now freely and easily available and convenient to work
with, and the ability to work with these data should be in the arsenal of skills of every ecologist.
There are three steps in learning to work with satellite data: 1) getting the data, 2) putting the
data into the form you want, and 3) carrying out the actual analysis. The third step is very project
specific and we will only touch on it in a general way. Some of the tools used in the analysis of
satellite data are discussed in SDA2.
To facilitate the discussion we will make up a pretend problem and look at how to solve it. The
problem is this. One of the conclusions of the analysis of Data Set 2 of SDA2, the Wieslander
(1935) data set of blue oak presence/absence, was that mean annual temperature and
precipitation are strongly associated with blue oak presence or absence, and that temperature and
precipitation are themselves highly correlated. Suppose we want to carry out a follow-up survey
of some sub-region of the original region surveyed by Wieslander and his team to more carefully
measure the climatic conditions at the original locations in order to resolve these relationships.
Our problem is to use satellite data to select and characterize a region in which to conduct this
survey.
In fact, a second survey of a portion of the original area of the original Wieslander survey
actually was carried out (Holzman and Allen-Diaz, 1991). This survey was restricted to lands in
eastern Monterey County and southern San Benito County, California. In Additional Topic 2 we
constructed an interactive mapview map of the interaction between Blue Oak presence QUDO
and mean annual temperature MAT for Data Set 2, and we can use this map to visualize the data
in this area laid over an OpenStreetMap base map (Fig. 1a). Recall from that Additional Topic
that the variable QM displayed on the map takes on values of 0 (i.e., 00) for QUDO = 0 and
MATQ = 0, where MATQ = 0 if the value of MAT is less than the median, and 1 otherwise. The
other values of QM are defined in the obvious way. The map in Fig. 1a shows an interesting
gradient in values of QM. One problem with the area is the obvious proximity to the Pacific
Coast, which might make the data unrepresentative. Scrolling around a bit in the map reveals
another gradient shown in Fig. 1b. The two maps cover the same area, so we can see that in the
map of Fig. 1b the gradient exists across a smaller distance and potentially over more similar
terrain. Also, if we select this area for our follow-up study then after a hard day of field work we
can relax with a glass of wine. For these very good reasons we will pick the area in Fig. 1b as the
site for our proposed study. In the next section we will discuss the acquisition of Landsat and
ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) data describing
this region. Section 3 introduces the analysis of the Landsat and ASTER data using the raster
package (Hijmans, 2016).
(a) (b)
Figure 1. Two maps of the QM data of Data Set 2. (a) the approximate region covered by
Holzman and Allen-Diaz (1991); (b) an ecological gradient centered on the Napa Valley.
2. Acquiring data from the USGS website
The USGS Earth Explorer website is a terrific resource for remote sensing data, most of it
available for free. A very detailed tutorial on the use of the website is available here. We will go
through a brief demonstration of how to download images relating to our planned sampling
campaign. The first thing you must do is to register as an Earth Explorer user. This is a
straightforward and immediate process. If you have not already done it, do it now, and then
follow along with the discussion here. Unlike the case with the auxiliary data in SDA2, you are
going to have to download these data yourself (they are in very large files).
The first step is to enter our search criteria. Using the interactive mapview map displayed in Fig.
1b and moving the cursor to the center of that map shows that the center of the region is at
approximately longitude -122.3°, latitude 38.4°. Moving to the Earth Explorer page, under 1.
Enter Search Criteria we move to the Coordinates tab, switch to Decimal, and press Add
Coordinate. I entered (38.5, -122) as the coordinates (latitude first!). A marker appears in the
appropriate spot. The next step is to enter the Date Range. I entered 01/01/2000 to 12/12/2017.
Now we move on to 2. Select Your Data Set(s) by clicking on that button. We will start by
getting ASTER digital elevation model data. Expand Digital Elevation and check ASTER
GLOBAL DEM. You will get a window that has something you can read and then click OK. Now
move to the bottom and click Results. Four data sets should appear, with an image and a small
row of icons. These are ASTER DEM data whose acquisition date is October 17, 2011. If you
click on the left-most icon, which looks like a bare foot, you will toggle a footprint of the data.
This indicates that the image with coordinates (-122.5, 38.5) is the one we want. To download
this image click on the icon with the downward pointing green arrow, read the little blurb, and
click OK, and the image should be downloaded to your computer. Click on Return to Earth
Explorer. Although this image covers our entire region of interest, for expository purposes we
are going to need another image, so repeat the process for the image with coordinates (-121.5,
38.5). After that, you can play around with the other icons if you wish.
Next we will download Landsat data covering our study region. Return to the Data Sets tab,
uncheck ASTER GLOBAL DEM, and expand Landsat. Expand Landsat Collection 1 Level-1 and
check the box next to Landsat 8 OLI/TIRS C1 Level-1. You should get a window that looks like
Fig. 2, except without the reddish colored square.
If you click on the bare feet, you will come to one that provides the same coverage as that shown
in the figure. As can be seen in the figure, the Landsat ground swath is at an angle to true north.
Among the selections, look for LC08_L1TP_044033_20171207_20171223_01_T1. The Landsat
naming convention specifies that LC08 means Landsat 8, combined OLI/TIRS; L1TP means level
1 precision terrain; 044033 refers to the satellite path (044) and row (033); 20171207 is the
acquisition date; 20171223 is the processing date; 01 is the collection number; and T1 means
category Tier 1. Again click on the downward pointing arrow. You will get a window listing the
download options. Click Download next to the Level-1 GeoTIFF Data Product (873.1 MB). You can
download more than one selection at once, so you could have waited to download the ASTER
file until after you had also selected the Landsat file.
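The naming convention can also be unpacked programmatically, which is convenient when several scenes have been downloaded. The following is a minimal sketch in base R; parse_landsat_id() is a hypothetical helper (not part of any package), and it assumes that the third field combines a three-digit path and a three-digit row.

```r
# Split a Landsat Collection 1 product identifier into its named fields.
# parse_landsat_id() is a hypothetical helper, not part of any package.
parse_landsat_id <- function(id) {
  parts <- strsplit(id, "_", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 7)
  list(
    sensor     = parts[1],                            # LC08: Landsat 8, OLI/TIRS
    level      = parts[2],                            # L1TP: precision terrain
    path       = substr(parts[3], 1, 3),              # assumed: first 3 digits
    row        = substr(parts[3], 4, 6),              # assumed: last 3 digits
    acquired   = as.Date(parts[4], format = "%Y%m%d"),
    processed  = as.Date(parts[5], format = "%Y%m%d"),
    collection = parts[6],
    tier       = parts[7]
  )
}

scene <- parse_landsat_id("LC08_L1TP_044033_20171207_20171223_01_T1")
scene$path
scene$acquired
```

Storing the dates as Date objects makes it easy to sort scenes or to compute the lag between acquisition and processing.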
Fig. 2. Earth Explorer window showing coverage of the selected data set.
The file is in tar.gz format, which is a Unix-based compression format. You will need an app
such as WinZip or 7-Zip to open it. When you do, it will look like Fig. 3. GeoTIFF is “a public
domain metadata standard which allows georeferencing information to be embedded within a
TIFF file.” In the case of our file, the georeferencing information is UTM Zone 10 coordinates.
Fig. 3. Downloaded Landsat data
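As an alternative to a separate unzipping app, the archive can probably be handled from within R using the base function untar(). The sketch below is one way to do this under stated assumptions: extract_bands() is a hypothetical helper, the archive name in the usage comment is a guess at what Earth Explorer delivers, and the band files are assumed to end in _B2.TIF, _BQA.TIF, and so on.

```r
# Extract only selected band files from a Landsat .tar.gz archive.
# extract_bands() is a hypothetical helper built on utils::untar().
extract_bands <- function(archive, suffixes, exdir = "SatelliteData") {
  members <- untar(archive, list = TRUE)        # file names stored in the archive
  pattern <- paste0("_(", paste(suffixes, collapse = "|"), ")\\.TIF$")
  wanted  <- members[grepl(pattern, members)]   # keep only the requested bands
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  untar(archive, files = wanted, exdir = exdir)
  file.path(exdir, wanted)                      # paths of the extracted files
}

# Usage (the archive name is an assumption; use the name of your download):
# extract_bands("LC08_L1TP_044033_20171207_20171223_01_T1.tar.gz",
#               c("B1", "B2", "B3", "B4", "B5", "B6", "BQA"))
```

Extracting only the bands that are needed avoids unpacking the large panchromatic and thermal images.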
Level-1 (abbreviated L1) products consist of a standardized collection of 11 radiometrically and
geometrically corrected images together with a quality assessment (QA) image and a metadata
file. Table 1 shows the wavelength, description, and pixel size of each band. Bands 1 through 9
are produced by the Operational Land Imager (OLI) and bands 10 and 11 are produced by the
Thermal Infrared Sensor (TIRS). We will be using bands 1, 2, 3, 4, 5, and 6, as well as the BQA
file, so extract these and place them in a data folder. I named my folder “SatelliteData” and made it
a subfolder of my basic Data folder, which I set as the working directory with setwd().
Table 1. Landsat 8 band information
Band  Wavelength (µm)  Description      Pixel size (m)
1     0.435-0.451      Coastal/Aerosol  30
2     0.452-0.512      Blue             30
3     0.533-0.590      Green            30
4     0.636-0.673      Red              30
5     0.851-0.879      NIR              30
6     1.566-1.651      SWIR-1           30
7     2.107-2.294      SWIR-2           30
8     0.503-0.676      Panchromatic     15
9     1.363-1.384      Cirrus           30
10    10.60-11.19      TIR-1            100
11    11.50-12.51      TIR-2            100
It happens that in addition to our study region this Landsat scene also contains the fields in Data
Set 4 of SDA2. Fig. 4 shows the boundary of Field 1 of this data set superimposed on the data
from Band 3. This gives an idea of the size and orientation of the pixels.
Figure 4. Plot of the green band of the data together with the boundary and sample points, which
are on a 61 m grid, of Field 1 of Data Set 4 of SDA2.
3. Setting up the data
There are a number of R options available for working with image data and treating them as grid
data in the GIS context. The raster package (Hijmans, 2016) is simple to use and provides full
GIS functionality, so it is my package of choice. The Landsat files can then be input using the
function raster().
> library(raster)
> # blue band
> b2 <-
+ raster("SatelliteData\\LC08_L1TP_044033_20171207_20171223_01_T1_B2.TIF")
> # green band
> b3 <-
+ raster("SatelliteData\\LC08_L1TP_044033_20171207_20171223_01_T1_B3.TIF")
> # red band
> b4 <-
+ raster("SatelliteData\\LC08_L1TP_044033_20171207_20171223_01_T1_B4.TIF")
> # NIR band
> b5 <-
+ raster("SatelliteData\\LC08_L1TP_044033_20171207_20171223_01_T1_B5.TIF")
> # Quality assessment
> QA <-
+ raster("SatelliteData\\LC08_L1TP_044033_20171207_20171223_01_T1_BQA.TIF")
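Because the five file names differ only in the band suffix, they can also be generated rather than typed. The following is a minimal sketch, assuming the files sit in the SatelliteData folder under the working directory; the object names scene_id, bands, and band_files are introduced here for illustration only. The raster layers are read only if the raster package and the files are actually present.

```r
# Build the band file names from the scene identifier instead of typing each one.
scene_id <- "LC08_L1TP_044033_20171207_20171223_01_T1"
bands    <- c(blue = "B2", green = "B3", red = "B4", nir = "B5", qa = "BQA")

# file.path() uses "/" as the separator, which R also accepts on Windows.
band_files <- file.path("SatelliteData", paste0(scene_id, "_", bands, ".TIF"))
names(band_files) <- names(bands)
band_files[["nir"]]

# Read the layers only if the raster package and the files are available.
if (requireNamespace("raster", quietly = TRUE) && all(file.exists(band_files))) {
  layers <- lapply(band_files, raster::raster)
}
```

Using forward slashes through file.path() keeps the code portable across operating systems, whereas the doubled backslashes used above are specific to Windows.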
Let’s first check the projections.
> projection(b2)
[1] "+proj=utm +zone=10 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84
They are UTM, Zone 10. There are two raster data structures available to work with multiple
layers, the RasterStack and the RasterBrick (SDA2, Section 2.4.3). The RasterStack is more
flexible in terms of the types of grid it can work with, while the RasterBrick is more
parsimonious with memory. The latter format works with our data, which have identical grid
structure, so we will use it to create a brick containing the four radiometric bands.
> full.brick <- brick(b2, b3, b4, b5)
Let’s take a look at the structure of the full.brick object (most of the output is deleted).
> str(full.brick)
Formal class 'RasterBrick' [package "raster"] with 12 slots
* * *
.. .. .. .. ..$ : chr [1:4]
"LC08_L1TP_044033_20171207_20171223_01_T1_B2"
"LC08_L1TP_044033_20171207_20171223_01_T1_B3"
"LC08_L1TP_044033_20171207_20171223_01_T1_B4"
"LC08_L1TP_044033_20171207_20171223_01_T1_B5"
From this we see that the layers are numbered in the order in which they were entered, so that the
blue band (B2) is first and the NIR band (B5) is last.
The first step in working with the data is to determine the subset of the Landsat image that
contains our region of interest. The Landsat data are in UTM Zone 10 coordinates and Data Set 2
is in longitude and latitude, so the first thing we must do is to transform this data set. As
discussed in SDA2 Section 2.6.2, we can use the function spTransform() of the sp package
(Pebesma and Bivand, 2005). The sample points at the northwest, east, and south edges of the
region are then identified by clicking on them in the mapview image shown in Fig. 1b.
> data.Set2utm <- spTransform(data.Set2,
+ CRS("+proj=utm +zone=10 +ellps=WGS84"))
> coordinates(data.Set2utm[2205,]) #NW
Longitude Latitude
[1,] 531575.1 4261170
> coordinates(data.Set2utm[2306,]) #E
Longitude Latitude
[1,] 588979 4241924
> coordinates(data.Set2utm[2301,]) #S
Longitude Latitude
[1,] 571962.4 4228598
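We will set the boundaries using these points. The northern boundary is placed 10 km north of the
northwest point, and the western, eastern, and southern boundaries are placed 10 km inside the
corresponding points. Each value is rounded to the nearest kilometer.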
> print(N <-
+ round((coordinates(data.Set2utm[2205,])[2] + 10000)/1000, 0)*1000)
[1] 4271000
> print(W <-
+ round((coordinates(data.Set2utm[2205,])[1] + 10000)/1000, 0)*1000)
[1] 542000
> print(E <-
+ round((coordinates(data.Set2utm[2306,])[1] - 10000)/1000, 0)*1000)
[1] 579000
> print(S <-
+ round((coordinates(data.Set2utm[2301,])[2] + 10000)/1000, 0)*1000)
[1] 4239000
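The four boundary calculations share one pattern: shift a coordinate by a fixed distance and round to the nearest kilometer. A minimal sketch of that arithmetic in base R follows; shift_km() is a hypothetical helper, and the coordinates are simply copied from the output printed above, so neither sp nor Data Set 2 is needed to run it.

```r
# Shift a UTM coordinate by `offset` meters and round to the nearest kilometer.
# shift_km() is a hypothetical helper; the coordinates are those printed above.
shift_km <- function(coord, offset) round((coord + offset) / 1000) * 1000

nw <- c(x = 531575.1, y = 4261170)  # record 2205, northwest
e  <- c(x = 588979,   y = 4241924)  # record 2306, east
s  <- c(x = 571962.4, y = 4228598)  # record 2301, south

bounds <- c(
  W = shift_km(nw[["x"]],  10000),
  E = shift_km(e[["x"]],  -10000),
  S = shift_km(s[["y"]],   10000),
  N = shift_km(nw[["y"]],  10000)
)
bounds
```

Keeping the four values in one named vector makes it harder to mix up which boundary is which when the crop matrix is built.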
Lo and Yeung (2007, p. 157) refer to the GIS operation of cutting out a portion of a raster layer
to make a new layer as “clipping.” The raster package accomplishes this with the function
crop(). The boundaries of a rectangular subset to cut out can be identified using a matrix whose first
row is the minimum and maximum x coordinates and whose second row is the minimum and
maximum y coordinates. Remembering that R assigns data to a matrix by columns, we can create
this matrix as follows.
> print(M <- matrix(c(W,S,E,N), nrow = 2, ncol = 2))
[,1] [,2]
[1,] 542000 579000
[2,] 4239000 4271000
The crop() function is applied to the brick object using the raster function extent().
> region.brick <- crop(full.brick, extent(M))
> extent(region.brick)
class : Extent
xmin : 541995
xmax : 579015
ymin : 4239015
ymax : 4270995
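The extent of the cropped brick differs from the requested boundaries by a few meters, presumably because the crop is made along whole cells. The following base R sketch illustrates the idea under stated assumptions: snap_edge() is a hypothetical helper, and the 30 m cell size with edges at 15 m plus a multiple of 30 m is read off the printed extent rather than taken from the raster documentation.

```r
# Snap a requested boundary to the nearest cell edge of a regular grid.
# snap_edge() is a hypothetical helper; `origin` and `res` are assumptions
# read off the printed extent (edges at 15 m plus multiples of 30 m).
snap_edge <- function(x, origin = 15, res = 30) {
  origin + round((x - origin) / res) * res
}

requested <- c(xmin = 542000, xmax = 579000, ymin = 4239000, ymax = 4271000)
snapped   <- snap_edge(requested)
snapped
```

Under these assumptions the snapped values agree with the extent printed above. Two of the requested boundaries (xmax and ymin) lie exactly midway between cell edges, so for those the result depends on how ties are rounded.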
As a first step in our data exploration we will plot a true color image of the study region that also
shows the locations of the points in Data Set 2. We can use the raster function plotRGB() to
plot the true color image itself.
> plotRGB(region.brick, r = 3, g = 2, b = 1, axes = FALSE, stretch = "lin")
The arguments r, g, and b provide the layers of the RasterBrick that contain the corresponding
colors. The argument stretch provides the type of contrast stretching to apply to the image. The
two options are “lin” and “hist”. I tried them both and thought “lin” looked nicer. The next
step is to plot the sample points. We first select the subset contained in our region of interest and
then plot them using the method discussed in Section 2.6.2 of adding to an existing plot.
> region.pts <- which(coordinates(data.Set2utm)[,1] <= M[1,2] &
+ coordinates(data.Set2utm)[,1] >= M[1,1] &
+ coordinates(data.Set2utm)[,2] <= M[2,2] &
+ coordinates(data.Set2utm)[,2] >= M[2,1])
> data.region <- data.Set2utm[region.pts,]
> plot(data.region, add = TRUE, pch = 16, col = "yellow", cex = 1)
Fig. 5 shows the result.
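The selection of points inside the bounding box can be checked on a few made-up coordinates before applying it to the real data. The sketch below uses a plain coordinate matrix in place of the output of coordinates(); in_box() is a hypothetical helper, and the three test points are invented for illustration.

```r
# Return the row indices of points lying inside the bounding box M,
# where M[1, ] holds (xmin, xmax) and M[2, ] holds (ymin, ymax).
# in_box() is a hypothetical helper operating on a plain coordinate matrix.
in_box <- function(xy, M) {
  which(xy[, 1] >= M[1, 1] & xy[, 1] <= M[1, 2] &
        xy[, 2] >= M[2, 1] & xy[, 2] <= M[2, 2])
}

M <- matrix(c(542000, 4239000, 579000, 4271000), nrow = 2, ncol = 2)

# Toy coordinates (made up): inside, too far south, too far east.
xy <- rbind(c(550000, 4250000),
            c(550000, 4230000),
            c(590000, 4250000))
in_box(xy, M)
```

The second toy point is the useful one: it is rejected only if the southern limit is taken from the second row of M, so it catches an indexing slip in the last condition.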
Figure 5. True color image of the region of interest showing the sample points.
We can obtain a false color image (not shown) by simply shifting the color bands.
> plotRGB(region.brick, r = 4, g = 3, b = 2, axes = FALSE, stretch = "lin")
Prior to any further exploration of the data we need to verify their quality. The information
needed for this is contained in the QA (for “quality assessment”) band, which we crop to the
study region and plot together with the sample points.
> QA <-
+ raster("SatelliteData\\LC08_L1TP_044033_20171207_20171223_01_T1_BQA.TIF")
> region.QA <- crop(QA, extent(M))
> plot(region.QA, axes = TRUE, main = "Region Quality Assessment")
> plot(data.region, add = TRUE, pch = 16, col = "red", cex = 1)
Fig. 6 shows the results.
Figure 6. Plot of the QA band of the study region.
There is a small region of anomalous values in the southeast corner, and although no sample
locations fall within it, some are quite close. We can get an idea of these values using the
function hist(), but printing rather than plotting the result.
> print(QAvals <- hist(region.QA))
$breaks
 [1] 2600 2800 3000 3200 3400 3600 3800 4000 4200 4400 4600 4800 5000 5200 5400 5600 5800 6000 6200
[20] 6400 6600 6800 7000
$counts
 [1] 1313432 2011 0 0 0 0 0 0 0 0 0 0
[13] 0 0 0 0 0 0 0 0 0 1
Now we use unique to see the actual values.
> unique(region.QA@data@values)
[1] 2720 6816 2752 2800 2976 2724
We will not worry about the one cell with a value of 6816. The table of QA values in the Landsat
quality documentation indicates that the values 2720 and 2724 are clear, 2752 indicates medium
cloud confidence, 2800 indicates high cloud confidence, and 2976 indicates cloud shadow. We
will take these results as indicating that the area of the sample points is for our purposes cloud
free. Exercise 2 follows up on the cloud issue.
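The QA values are bit-packed, so they can also be decoded directly rather than looked up in a table. The following is a minimal sketch in base R; decode_bqa() is a hypothetical helper, and the bit positions used (fill, terrain occlusion, saturation, cloud, and the four two-bit confidence fields) are an assumption about the Collection 1 Level-1 BQA layout that should be checked against the quality documentation.

```r
# Decode Landsat 8 Collection 1 Level-1 BQA values into their bit fields.
# decode_bqa() is a hypothetical helper; the bit positions are an assumption
# to be checked against the Landsat quality documentation.
# Confidence codes: 0 = not determined, 1 = low, 2 = medium, 3 = high.
decode_bqa <- function(v) {
  v <- as.integer(v)
  field <- function(shift, mask) bitwAnd(bitwShiftR(v, shift), mask)
  data.frame(
    value       = v,
    fill        = field(0L, 1L) == 1L,   # bit 0
    occlusion   = field(1L, 1L) == 1L,   # bit 1, terrain occlusion
    saturation  = field(2L, 3L),         # bits 2-3, radiometric saturation
    cloud       = field(4L, 1L) == 1L,   # bit 4
    cloud_conf  = field(5L, 3L),         # bits 5-6
    shadow_conf = field(7L, 3L),         # bits 7-8
    snow_conf   = field(9L, 3L),         # bits 9-10
    cirrus_conf = field(11L, 3L)         # bits 11-12
  )
}

qa_table <- decode_bqa(c(2720, 2724, 2752, 2800, 2976, 6816))
qa_table
```

Under the assumed layout the decoded fields are consistent with the interpretation given above: 2752 and 2800 carry medium and high cloud confidence, and 2976 carries high cloud shadow confidence. The same decoding suggests that 2724 differs from 2720 only in the saturation field, and that the single cell with value 6816 has high cirrus confidence.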
As discussed in SDA2 Section 7.5, the most commonly used index of green vegetation is the
normalized difference vegetation index, or NDVI, which is given by
$$\mathrm{NDVI} = \frac{IR - R}{IR + R}.$$
This is easily computed and plotted for our data.
> IR <- region.brick[[4]]
> R <- region.brick[[3]]
> region.NDVI <- ((IR - R) / (IR + R))
> plot(region.NDVI)
Note the double brackets. A RasterBrick can be indexed like a list, and the double brackets extract individual layers.
The plot is not shown. What we are really interested in is distinguishing the regions containing
green vegetation. To visualize these we can first generate a histogram of NDVI values (Fig. 7).
> hist(region.NDVI, xlab = "NDVI", main = "NDVI in Sample Region")
Figure 7. Histogram of NDVI values in the study region.
This indicates that high values of NDVI fall in the range above about 0.25. Using this we can
plot high NDVI regions together with the sample points. We use the raster function calc() to
obtain the cells with high vegetation.
> hi.NDVI <- function(x){
+ x[which(x < 0.25)] <- NA
+ return(x)
+ }
> region.veg <- calc(region.NDVI, hi.NDVI)
> plot(region.veg, main = "High NDVI Regions")
> plot(data.region, add = TRUE, pch = 16, col = "red", cex = 1)
Fig. 8 shows the result.
Figure 8. Regions of high NDVI together with the sample points.
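The NDVI calculation and the masking step are ordinary cell-by-cell arithmetic, which can be seen by applying them to small matrices. In the sketch below the reflectance values are made up for illustration, and mask_low() is a hypothetical stand-in that applies the same rule as hi.NDVI().

```r
# NDVI and a high-vegetation mask computed on plain matrices.
# The reflectance values are made up for illustration.
red <- matrix(c(0.10, 0.20, 0.30, 0.05), nrow = 2)
nir <- matrix(c(0.50, 0.25, 0.30, 0.45), nrow = 2)

ndvi <- (nir - red) / (nir + red)

# Same rule as hi.NDVI(): cells below the threshold become NA.
mask_low <- function(x, threshold = 0.25) {
  x[which(x < threshold)] <- NA
  x
}
veg <- mask_low(ndvi)
veg
```

Cells in which the infrared reflectance is much larger than the red reflectance keep their NDVI value, while the others are set to NA and are therefore left blank when the layer is plotted.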
There are of course many other analyses that we could carry out with the Landsat data. Hijmans
provides an excellent discussion of some of these. Instead, we will move on and briefly cover the
ASTER data.
The first step is to load the ASTER data for our region and check its projection.
> DEM <- raster("SatelliteData\\ASTGTM2_N38W123_dem.tif")
> projection(DEM)
[1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
There is a quality assessment file associated with the ASTER data similar to that associated with
the Landsat data. In Exercise 3 you are asked to verify the quality of this data set. Since the
projection is longitude-latitude, we use the raster function projectRaster() to change to
UTM. Then we can determine the range of elevation values and plot the data (Fig. 9).
> DEM.UTM <- projectRaster(from = DEM, to = b2)
> region.DEM <- crop(DEM.UTM, extent(M))
> range(region.DEM@data@values)
[1] 9.03165 857.58487
> plot(region.DEM)
> plot(data.region, add = TRUE, pch = 16, col = "red", cex = 1)
Figure 9. ASTER Elevation data together with sample points.
Now that the data are loaded and the quality checked, detailed analysis can begin. Since this is
often quite specialized, you can begin with SDA2 and Hijmans’ website as well as the references
discussed in the next section and move on into your own particular application.
4. Further reading
Before reading anything else, you should read Robert Hijmans’ discussion of the topic
mentioned in Section 3. In particular, Hijmans discusses several more steps that can be carried
out in image analysis using the raster package. My favorite reference for remote sensing is
Lillesand and Kiefer (2015), although Jensen (2015) is also excellent. There are of course many
different vegetation indices and transformations, too many to discuss here. One transformation,
however, may not have received as much attention from ecologists and agronomists as it deserves.
This is the tasseled cap (Kauth and Thomas, 1976). The landsat package (Goslee, 2011)
contains the function tasscap() to calculate this, and it is worth a look.
5. Exercises
1. Create Figure 4.
2. The landsat package (Goslee, 2011) contains the function clouds() for estimating cloud
cover based on Landsat bands 1 and 6. Use ?clouds to read about this function. By adjusting the
argument level, estimate the areas of potential cloud cover in the sample region and compare them to
those obtained in Fig. 6.
3. ASTER digital elevation models are created by stereoscopic analysis of multiple images. A
good discussion is given here. The num TIFF file contains the number of stacked images used to
estimate the elevation of a particular pixel. Construct a plot showing the number of images used
to create the pixels representing the study area. Why might these values not be integers?
4. It happens that our region of interest is located entirely within one Landsat and one ASTER
image, but sometimes it is necessary to combine more than one image to obtain full coverage.
This operation is called mosaicking (Lo and Yeung, 2007, p. 157). The raster package contains
the function mosaic() to accomplish this. Use ?mosaic to read about this function and then use
it to mosaic the files ASTGTM2_N38W123_dem.tif and ASTGTM2_N38W122_dem.tif. Then plot
the image and blow it up to try to determine locations where you can identify the “seam”.
6. References
Goslee, S.C. (2011). Analyzing Remote Sensing Data in R: The landsat Package. Journal of
Statistical Software, 43(4), 1-25. URL http://www.jstatsoft.org/v43/i04/.
Hijmans, R. J. (2016). raster: Geographic Data Analysis and Modeling. R package version 2.5-8.
https://CRAN.R-project.org/package=raster.
Jensen, J. R. (2015). Introductory Digital Image Processing: A Remote Sensing Perspective.
Prentice-Hall, Englewood Cliffs, NJ.
Kauth, R. J., and G. S. Thomas (1976). The tasselled cap - a graphic description of the
spectral-temporal development of agricultural crops as seen by landsat. Proceedings of the
Symposium on Machine Processing of Remotely Sensed Data pp. 4B41-4B51. Purdue University, West
Lafayette, Indiana.
Lillesand, T. M., and R. W. Kiefer (2015). Remote Sensing and Image Interpretation. John
Wiley, New York, NY.
Lo, C. P., and A. K. W. Yeung (2007). Concepts and Techniques in Geographic Information
Systems. Pearson Prentice Hall, Upper Saddle River, NJ.
Pebesma, E. J., and R. S. Bivand (2005). Classes and methods for spatial data in R. R News 5(2).
https://cran.r-project.org/doc/Rnews/.
Wieslander, A. E. (1935). A vegetation type map of California. Madroño 3: 140-144.