
Analyzing Satellite Data in R

Richard E. Plant

October 10, 2018

Additional topic to accompany Spatial Data Analysis in Ecology and Agriculture using R,

Second Edition

http://psfaculty.plantsciences.ucdavis.edu/plant/sda2.htm

Chapter and section references are contained in that text, which is referred to as SDA2.

1. Introduction

Remotely sensed data from satellites are now freely and easily available and convenient to work

with, and the ability to work with these data should be in the arsenal of skills of every ecologist.

There are three steps in learning to work with satellite data: 1) getting the data, 2) putting the

data into the form you want, and 3) carrying out the actual analysis. The third step is very project

specific and we will only touch on it in a general way. Some of the tools used in the analysis of

satellite data are discussed in SDA2.

To facilitate the discussion we will make up a pretend problem and look at how to solve it. The

problem is this. One of the conclusions of the analysis of Data Set 2 of SDA2, the Wieslander

(1935) data set of blue oak presence/absence, was that mean annual temperature and

precipitation are strongly associated with blue oak presence or absence, and that temperature and

precipitation are themselves highly correlated. Suppose we want to carry out a follow-up survey

of some sub-region of the original region surveyed by Wieslander and his team to more carefully

measure the climatic conditions at the original locations in order to resolve these relationships.

Our problem is to use satellite data to select and characterize a region in which to conduct this

survey.

In fact, a second survey of a portion of the original area of the original Wieslander survey

actually was carried out (Holzman and Allen-Diaz, 1991). This survey was restricted to lands in

eastern Monterey County and southern San Benito County, California. In Additional Topic 2

(https://psfaculty.plantsciences.ucdavis.edu/plant/additionaltopics_mapview.pdf) we

constructed an interactive mapview map of the interaction between Blue Oak presence QUDO

and mean annual temperature MAT for Data Set 2, and we can use this map to visualize the data

in this area laid over an OpenStreetMap base map (Fig. 1a). Recall from that Additional Topic

that the variable QM displayed on the map takes on values of 0 (i.e., 00) for QUDO = 0 and

MATQ = 0, where MATQ = 0 if the value of MAT is less than the median, and 1 otherwise. The

other values of QM are defined in the obvious way. The map in Fig. 1a shows an interesting

gradient in values of QM. One problem with the area is the obvious proximity to the Pacific

(a) (b)

Figure 1. Two maps of the QM data of Data Set 2. (a) the approximate region covered by

Holzman and Allen-Diaz (1991); (b) an ecological gradient centered on the Napa Valley.

Coast, which might make the data unrepresentative. Scrolling around a bit in the map reveals

another gradient shown in Fig. 1b. The two maps cover the same area, so we can see that in the

map of Fig. 1b the gradient exists across a smaller distance and potentially over more similar

terrain. Also, if we select this area for our follow-up study then after a hard day of field work we

can relax with a glass of wine. For these very good reasons we will pick the area in Fig. 1b as the

site for our proposed study. In the next section we will discuss the acquisition of Landsat and

ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) data describing

this region. Section 3 introduces the analysis of the Landsat and ASTER data using the raster

package (Hijmans, 2016).

2. Acquiring data from the USGS website

The USGS Earth Explorer website (https://earthexplorer.usgs.gov/) is a terrific resource for remote

sensing data, most of it available for free. A very detailed tutorial on the use of the website is

available at https://earthexplorer.usgs.gov/documents/helptutorial.pdf. We will go

through a brief demonstration of how to download images relating to our planned sampling

campaign. The first thing you must do is to register as an Earth Explorer user. This is a

straightforward and immediate process; if you have not already done it, do it now, and then

follow along with the discussion here. Unlike the case with the auxiliary data in SDA2, you are

going to have to download these data yourself (they are in very large files).

The first step is to enter our search criteria. Using the interactive mapview map displayed in Fig.

1b and moving the cursor to the center of that map shows that the center of the region is at

approximately longitude -122.3°, latitude 38.4°. Moving to the Earth Explorer page, under 1.

Enter Search Criteria we move to the Coordinates tab, switch to Decimal, and press Add

Coordinate. I entered (38.5, -122) as the coordinates (latitude first!). A marker appears in the

appropriate spot. The next step is to enter the Date Range. I entered 01/01/2000 to 12/12/2017.

Now we move on to 2. Select Your Data Set(s) by clicking on that button. We will start by

getting ASTER digital elevation model data. Expand Digital Elevation and check ASTER

GLOBAL DEM. You will get a window that has something you can read and then click OK. Now

move to the bottom and click Results. Four data sets should appear, with an image and a small

row of icons. These are ASTER DEM data whose acquisition date is October 17, 2011. If you

click on the left-most icon, which looks like a bare foot, you will toggle a footprint of the data.

This indicates that the image with coordinates (-122.5, 38.5) is the one we want. To download

this image click on the icon with the downward pointing green arrow, read the little blurb, and

click OK, and the image should be downloaded to your computer. Click on Return to Earth

Explorer. Although this image covers our entire region of interest, for expository purposes we

are going to need another image, so repeat the process for the image with coordinates (-121.5,

38.5). After that, you can play around with the other icons if you wish.

Next we will download Landsat data covering our study region. Return to the Data Sets tab,

uncheck ASTER GLOBAL DEM, and expand Landsat. Expand Landsat Collection 1 Level-1 and

check the box next to Landsat 8 OLI/TIRS C1 Level-1. You should get a window that looks like

Fig. 2, except without the reddish colored square.

If you click on the bare feet, you will come to one that provides the same coverage as that shown

in the figure. As can be seen in the figure, the Landsat ground swath is at an angle to true north.

Among the selections, look for LC08_L1TP_044033_20171207_20171223_01_T1. The Landsat

naming convention (https://landsat.usgs.gov/what-are-naming-conventions-landsat-scene-identifiers)

specifies that LC08 means Landsat 8, combined OLI/TIRS; L1TP means Level-1 precision

terrain; 044033 gives the WRS path (044) and row (033); 20171207 is the acquisition date;

20171223 is the processing date; 01 is the collection number; and T1 means category Tier 1.

Again click on the downward pointing arrow. You will get a window listing the download

options. Click Download next to the Level-1 GeoTIFF Data Product (873.1 MB). You can

download more than one selection at once, so you could have waited to download the ASTER

file until after you had also selected the Landsat file.
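The naming convention can be made concrete with a few lines of base R. This is only a sketch: parse.landsat.id() is a hypothetical helper written for this illustration (it is not part of any package), and it assumes the seven underscore-separated fields of a Collection 1 product identifier, with the third field holding the WRS path and row.

```r
# Hypothetical helper: split a Landsat Collection 1 product identifier
# into its named parts (assumes seven underscore-separated fields).
parse.landsat.id <- function(id) {
  parts <- strsplit(id, "_", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 7)
  list(
    sensor     = substr(parts[1], 1, 2),              # "LC" = combined OLI/TIRS
    satellite  = as.integer(substr(parts[1], 3, 4)),  # 8 = Landsat 8
    level      = parts[2],                            # e.g. L1TP
    path       = as.integer(substr(parts[3], 1, 3)),  # WRS path
    row        = as.integer(substr(parts[3], 4, 6)),  # WRS row
    acquired   = as.Date(parts[4], format = "%Y%m%d"),
    processed  = as.Date(parts[5], format = "%Y%m%d"),
    collection = parts[6],
    category   = parts[7]
  )
}

scene <- parse.landsat.id("LC08_L1TP_044033_20171207_20171223_01_T1")
print(scene$path)
print(scene$row)
print(format(scene$acquired))
```

A helper like this is handy mainly when several scenes have been downloaded and need to be sorted by acquisition date or path and row.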

Fig. 2. Earth Explorer window showing coverage of the selected data set.

The file is in tar.gz format, which is a Unix-based compression format. You will need an app

such as WinZip or 7-Zip to open it. When you do, it will look like Fig. 3. GeoTIFF is “a public

domain metadata standard which allows georeferencing information to be embedded within a

TIFF file” (https://en.wikipedia.org/wiki/GeoTIFF). In the case of our file, the georeferencing

information is UTM Zone 10 coordinates.

Fig. 3. Downloaded Landsat data

Level-1 (abbreviated L1) products consist of a standardized collection of 11 radiometrically and

geometrically corrected images together with a quality assessment (QA) image and a metadata

file. Table 1 shows the wavelength, description, and pixel size of each band. Bands 1 through 9

are produced by the Operational Land Imager (OLI) and bands 10 and 11 are produced by the

Thermal Infrared Sensor (TIRS). We will be using bands 1, 2, 3, 4, 5, and 6, as well as the BQA

file, so extract these and place them in a data folder. I named my folder “SatelliteData” and set it

as a subfolder of my basic Data folder, which is accessed with setwd().
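As an alternative to WinZip or 7-Zip, the extraction can probably be done from within R using untar() from the utils package. The sketch below is not run on the real 873 MB download. Instead it builds a small stand-in archive of placeholder text files in a temporary folder, so the folder layout and file contents are assumptions made only for the demonstration. The part that carries over to the real file is the pair of untar() calls, first with list = TRUE to see the contents and then with files = to extract only bands 1 through 6 and the BQA file.

```r
scene <- "LC08_L1TP_044033_20171207_20171223_01_T1"

# Work in a temporary folder so nothing in the real Data folder is touched
work <- tempfile("landsat_demo")
dir.create(file.path(work, "scene"), recursive = TRUE)
old.wd <- setwd(work)

# Build a stand-in archive of tiny placeholder files (NOT real imagery)
members <- paste0(scene, "_", c("B1", "B2", "B7", "B10", "BQA"), ".TIF")
for (m in members) writeLines("placeholder", file.path("scene", m))
tar("scene.tar.gz", files = "scene", compression = "gzip")

# List the archive contents, keep bands 1-6 and BQA, and extract only those
all.files <- untar("scene.tar.gz", list = TRUE)
wanted <- grep("_(B[1-6]|BQA)\\.TIF$", all.files, value = TRUE)
untar("scene.tar.gz", files = wanted, exdir = "SatelliteData")

extracted <- sort(basename(list.files("SatelliteData", recursive = TRUE)))
print(extracted)

setwd(old.wd)
```

The regular expression is written so that B10 and B11 are not matched by the pattern for B1.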

Table 1. Landsat 8 band information

Band   Wavelength (µm)   Description       Pixel size (m)
1      0.435-0.451       Coastal/Aerosol   30
2      0.452-0.512       Blue              30
3      0.533-0.590       Green             30
4      0.636-0.673       Red               30
5      0.851-0.879       NIR               30
6      1.566-1.651       SWIR-1            30
7      2.107-2.294       SWIR-2            30
8      0.503-0.676       Panchromatic      15
9      1.363-1.384       Cirrus            30
10     10.60-11.19       TIR-1             100
11     11.50-12.51       TIR-2             100

It happens that in addition to our study region this Landsat scene also contains the fields in Data

Set 4 of SDA2. Fig. 4 shows the boundary of Field 1 of this data set superimposed on the data

from Band 3. This gives an idea of the size and orientation of the pixels.

Figure 4. Plot of the green band of the data together with the boundary and sample points, which

are on a 61 m grid, of Field 1 of Data Set 4 of SDA2.


3. Setting up the data

There are a number of R options available for working with image data and treating them as grid

data in the GIS context. The raster package (Hijmans, 2016) is simple to use and provides full

GIS functionality, so it is my package of choice. The Landsat files can then be input using the

function raster().

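> library(raster)
> # blue band
> b2 <-
+ raster("SatelliteData\\LC08_L1TP_044033_20171207_20171223_01_T1_B2.TIF")
> # green band
> b3 <-
+ raster("SatelliteData\\LC08_L1TP_044033_20171207_20171223_01_T1_B3.TIF")
> # red band
> b4 <-
+ raster("SatelliteData\\LC08_L1TP_044033_20171207_20171223_01_T1_B4.TIF")
> # NIR band
> b5 <-
+ raster("SatelliteData\\LC08_L1TP_044033_20171207_20171223_01_T1_B5.TIF")
> # Quality assessment
> QA <-
+ raster("SatelliteData\\LC08_L1TP_044033_20171207_20171223_01_T1_BQA.TIF")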

Let’s first check the projections.

> projection(b2)

[1] "+proj=utm +zone=10 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84

They are UTM, Zone 10. There are two raster data structures available to work with multiple

layers, the RasterStack and the RasterBrick (SDA2, Section 2.4.3). The RasterStack is more

flexible in terms of the types of grid it can work with, while the RasterBrick is more

parsimonious with memory. The latter format works with our data, which have identical grid

structure, so we will use it to create a brick containing the four radiometric bands.

> full.brick <- brick(b2, b3, b4, b5)

Let’s take a look at the structure of the full.brick object (most of the output is deleted).

> str(full.brick)

Formal class 'RasterBrick' [package "raster"] with 12 slots

* * *

.. .. .. .. ..$ : chr [1:4]

"LC08_L1TP_044033_20171207_20171223_01_T1_B2"

"LC08_L1TP_044033_20171207_20171223_01_T1_B3"

"LC08_L1TP_044033_20171207_20171223_01_T1_B4"

"LC08_L1TP_044033_20171207_20171223_01_T1_B5"

From this we see that the layers are numbered in the order in which they were entered, so that the

blue band (B2) is first and the NIR band (B5) is last.


The first step in working with the data is to determine the subset of the Landsat image that

contains our region of interest. The Landsat data is in UTM Zone 10 coordinates and Data Set 2

is in longitude and latitude, so the first thing we must do is to transform this data set. As

discussed in SDA2 Section 2.6.2, we can use the function spTransform() of the sp package

(Pebesma and Bivand, 2005). The points are then identified by clicking on them in the mapview

image shown in Fig. 1b.

> data.Set2utm <- spTransform(data.Set2,

+ CRS("+proj=utm +zone=10 +ellps=WGS84"))

> coordinates(data.Set2utm[2205,]) #NW

Longitude Latitude

[1,] 531575.1 4261170

> coordinates(data.Set2utm[2306,]) #E

Longitude Latitude

[1,] 588979 4241924

> coordinates(data.Set2utm[2301,]) #S

Longitude Latitude

[1,] 571962.4 4228598

We will set the boundaries about 10 km from these points.

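> print(N <-
+ round((coordinates(data.Set2utm[2205,])[2] + 10000)/1000, 0)*1000)
[1] 4271000
> print(W <-
+ round((coordinates(data.Set2utm[2205,])[1] + 10000)/1000, 0)*1000)
[1] 542000
> print(E <-
+ round((coordinates(data.Set2utm[2306,])[1] - 10000)/1000, 0)*1000)
[1] 579000
> print(S <-
+ round((coordinates(data.Set2utm[2301,])[2] + 10000)/1000, 0)*1000)
[1] 4239000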
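The rounding to the nearest kilometer can be checked by hand with base R. In this sketch nearest.km() is a hypothetical helper introduced only for illustration, and the coordinates are typed in from the printed output above rather than read from data.Set2utm. Only the N and E boundaries are reproduced.

```r
# Hypothetical helper: round a UTM coordinate (in meters) to the nearest km
nearest.km <- function(x) round(x / 1000, 0) * 1000

# Coordinates copied from the printed output above (easting, northing)
nw.pt <- c(531575.1, 4261170)  # point 2205, NW
e.pt  <- c(588979,   4241924)  # point 2306, E

N <- nearest.km(nw.pt[2] + 10000)  # 10 km north of the NW point
E <- nearest.km(e.pt[1] - 10000)   # 10 km west of the E point
print(c(N = N, E = E))
```

Working in whole kilometers keeps the boundary values easy to read and to type into other software.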

Lo and Yeung (2007, p. 157) refer to the GIS operation of cutting out a portion of a raster layer

to make a new layer as “clipping.” The raster package accomplishes this with the function

crop(). The boundaries of a rectangular subset to cut out can be identified using a matrix whose first

row is the minimum and maximum x coordinates and whose second row is the minimum and

maximum y coordinates. Remembering that R assigns data to a matrix by columns, we can create

this matrix as follows.

> print(M <- matrix(c(W,S,E,N), nrow = 2, ncol = 2))

[,1] [,2]

[1,] 542000 579000

[2,] 4239000 4271000
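Because the column-wise fill order is easy to get wrong, it may help to compare it with a row-wise construction. The sketch below uses only base R. The names M.bycol and M.byrow and the row and column labels are introduced here for illustration and are not needed by extent().

```r
W <- 542000; E <- 579000; S <- 4239000; N <- 4271000

# Column-wise fill, as in the text: columns are (W, S) and (E, N)
M.bycol <- matrix(c(W, S, E, N), nrow = 2, ncol = 2)

# Row-wise fill gives the same matrix and reads as (xmin, xmax), (ymin, ymax)
M.byrow <- matrix(c(W, E, S, N), nrow = 2, byrow = TRUE,
                  dimnames = list(c("x", "y"), c("min", "max")))

print(M.byrow)
print(all(M.bycol == M.byrow))
```

Either form works; the labeled version simply makes it harder to swap S and E by accident.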

The crop() function is applied to the brick object using the raster function extent().

> region.brick <- crop(full.brick, extent(M))


> extent(region.brick)

class : Extent

xmin : 541995

xmax : 579015

ymin : 4239015

ymax : 4270995

As a first step in our data exploration we will plot a true color image of the study region that also

shows the locations of the points in Data Set 2. We can use the raster function plotRGB() to

plot the true color image itself.

> plotRGB(region.brick, r = 3, g = 2, b = 1, axes = FALSE, stretch = "lin")

The arguments r, g, and b provide the layers of the RasterBrick that contain the corresponding

colors. The argument stretch provides the type of contrast stretching to apply to the image. The

two options are “lin” and “hist”. I tried them both and thought “lin” looked nicer. The next

step is to plot the sample points. We first select the subset contained in our region of interest and

then plot them using the method discussed in Section 2.6.2 of adding to an existing plot.

> region.pts <- which(coordinates(data.Set2utm)[,1] <= M[1,2] &

+ coordinates(data.Set2utm)[,1] >= M[1,1] &

+ coordinates(data.Set2utm)[,2] <= M[2,2] &

+ coordinates(data.Set2utm)[,2] >= M[2,1])

> data.region <- data.Set2utm[region.pts,]

> plot(data.region, add = TRUE, pch = 16, col = "yellow", cex = 1)

Fig. 5 shows the result.

Figure 5. True color image of the region of interest showing the sample points.
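The selection of points inside the bounding box can be tested on a handful of coordinates without loading any spatial data. In this sketch in.box() is a hypothetical helper, three of the test points are the NW, E, and S points printed earlier, and the fourth is a made-up point placed inside the box. Note that the x limits come from the first row of M and the y limits from the second row.

```r
# Hypothetical helper: xy is a two-column matrix (x, y); M has rows
# (xmin, xmax) and (ymin, ymax), as constructed in the text
in.box <- function(xy, M) {
  xy[, 1] >= M[1, 1] & xy[, 1] <= M[1, 2] &
    xy[, 2] >= M[2, 1] & xy[, 2] <= M[2, 2]
}

M <- matrix(c(542000, 4239000, 579000, 4271000), nrow = 2, ncol = 2)

xy <- rbind(c(531575.1, 4261170),  # NW point: west of the box
            c(588979,   4241924),  # E point: east of the box
            c(571962.4, 4228598),  # S point: south of the box
            c(560000,   4250000))  # made-up point inside the box

inside <- in.box(xy, M)
print(which(inside))
```

The S point is a useful test case because its x coordinate lies inside the box, so it is excluded only if the southern limit is applied correctly.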

https://homepages.inf.ed.ac.uk/rbf/HIPR2/stretch.htm


We can obtain a false color image (not shown) by simply shifting the color bands.

> plotRGB(region.brick, r = 4, g = 3, b = 2, axes = FALSE, stretch = "lin")
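Both plotRGB() calls use stretch = "lin". The idea of a linear contrast stretch can be sketched in base R as follows. This is not the raster package's own code: lin.stretch() is a hypothetical helper, the 2% and 98% cutoffs are an assumption chosen for illustration, and the digital numbers are simulated.

```r
# Hypothetical helper: linearly rescale values so that the chosen lower and
# upper quantiles map to 0 and 255, clipping anything outside them
lin.stretch <- function(x, lower = 0.02, upper = 0.98) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  y <- (x - q[1]) / (q[2] - q[1])  # quantiles map to 0 and 1
  y <- pmin(pmax(y, 0), 1)         # clip the tails
  round(255 * y)
}

set.seed(1)
dn <- round(rnorm(1000, mean = 9000, sd = 500))  # simulated digital numbers
stretched <- lin.stretch(dn)

print(range(dn))
print(range(stretched))
```

The raw values occupy only a narrow part of the available range, which is why an unstretched image tends to look dark and flat; after stretching they span the full 0 to 255 display range.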

Prior to any further exploration of the data we need to verify its quality. The data for this is

contained in the QA (for “quality assessment”) band.

The code for this is the following.

> QA <-

+ raster("SatelliteData\\LC08_L1TP_044033_20171207_20171223_01_T1_BQA.TIF")

> region.QA <- crop(QA, extent(M))

> plot(region.QA, axes = TRUE, main = "Region Quality Assessment")

> plot(data.region, add = TRUE, pch = 16, col = "red", cex = 1)

Fig. 6 shows the results.

Figure 6. Plot of the QA band of the study region.

There is a small region of anomalous values in the southeast corner, and although no sample

locations are in the band, some are quite close. We can get an idea of these values using the

function hist(), but printing rather than plotting the result.

> print(QAvals <- hist(region.QA))

$breaks

[1] 2600 2800 3000 3200 3400 3600 3800 4000 4200 4400 4600 4800 5000 5200

5400 5600 5800 6000 6200

[20] 6400 6600 6800 7000

$counts

[1] 1313432 2011 0 0 0 0 0 0 0 0 0 0


[13] 0 0 0 0 0 0 0 0 0 1

Now we use unique to see the actual values.

> unique(region.QA@data@values)

[1] 2720 6816 2752 2800 2976 2724

We will not worry about the one cell with a value of 6816. The table of QA values in the Landsat

quality documentation indicates that the values 2720 and 2724 are clear, 2752 indicates medium

cloud confidence, 2800 indicates high cloud confidence, and 2976 indicates cloud shadow. We

will take these results as indicating that the area of the sample points is for our purposes cloud

free. Exercise 2 follows up on the cloud issue.
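The QA values are bit-packed, so instead of looking each one up in the table they can be unpacked arithmetically. The sketch below uses base R only. The bit positions are my reading of the Landsat 8 Collection 1 quality band layout (bit 0 fill, bit 1 terrain occlusion, bits 2-3 radiometric saturation, bit 4 cloud, bits 5-6 cloud confidence, bits 7-8 cloud shadow confidence, bits 9-10 snow/ice confidence, bits 11-12 cirrus confidence) and should be checked against the USGS documentation before being relied on. The function decode.bqa() is a hypothetical helper. Two-bit confidence fields are coded 1 = low, 2 = medium, 3 = high.

```r
# Hypothetical helper: unpack Landsat 8 Collection 1 BQA values.
# The bit layout is an assumption taken from the USGS QA band description.
decode.bqa <- function(v) {
  # extract 'width' bits starting at bit position 'shift'
  bits <- function(shift, width) (v %/% 2^shift) %% 2^width
  data.frame(value      = v,
             fill       = bits(0, 1),
             terrain    = bits(1, 1),
             saturation = bits(2, 2),
             cloud      = bits(4, 1),
             cloud.conf = bits(5, 2),
             shadow     = bits(7, 2),
             snow       = bits(9, 2),
             cirrus     = bits(11, 2))
}

# The values returned by unique() above
qa.vals <- c(2720, 6816, 2752, 2800, 2976, 2724)
qa.table <- decode.bqa(qa.vals)
print(qa.table)
```

Under this layout the decoded fields agree with the interpretation given in the text, and the single cell with value 6816 differs from a clear cell only in its cirrus confidence field.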

As discussed in SDA2 Section 7.5, the most commonly used index of green vegetation is the

normalized difference vegetation index, or NDVI, which is given by

$$\mathrm{NDVI} = \frac{IR - R}{IR + R}.$$

This is easily computed and plotted for our data.

> IR <- region.brick[[4]]

> R <- region.brick[[3]]

> region.NDVI <- ((IR - R) / (IR + R))

> plot(region.NDVI)

Note the double brackets. A RasterBrick can be indexed like a list, and the double brackets extract individual layers.

The plot is not shown. What we are really interested in is distinguishing the regions containing

green vegetation. To visualize these we can first generate a histogram of NDVI values (Fig. 7).

> hist(region.NDVI, xlab = "NDVI", main = "NDVI in Sample Region")

Figure 7. Histogram of NDVI values in the study region.

https://landsat.usgs.gov/collectionqualityband


This indicates that high values of NDVI fall in the range above about 0.25. Using this we can

plot high NDVI regions together with the sample points. We use the raster function calc() to

obtain the cells with high vegetation.

> hi.NDVI <- function(x){

+ x[which(x < 0.25)] <- NA

+ return(x)

+ }

> region.veg <- calc(region.NDVI, hi.NDVI)

> plot(region.veg, main = "High NDVI Regions")


Fig. 8 shows the result.

Figure 8. Regions of high NDVI together with the sample points.
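The NDVI arithmetic and the masking step can be checked on a small matrix before trusting them on a full image. The sketch below uses base R only, and the six pairs of band values are made up for the illustration. The function hi.NDVI() is the same one defined in the text, applied here to an ordinary matrix rather than through calc().

```r
# Made-up near-infrared and red values for a 2 x 3 block of cells
IR <- matrix(c(0.40, 0.30, 0.25, 0.10, 0.50, 0.20), nrow = 2)
R  <- matrix(c(0.10, 0.20, 0.25, 0.10, 0.05, 0.15), nrow = 2)

# NDVI is computed cell by cell and always lies between -1 and 1
NDVI <- (IR - R) / (IR + R)

# Same masking function as in the text: cells below 0.25 become NA
hi.NDVI <- function(x){
  x[which(x < 0.25)] <- NA
  return(x)
}
veg <- hi.NDVI(NDVI)

print(round(NDVI, 2))
print(round(veg, 2))
```

Cells where the two bands are equal have an NDVI of zero and are masked out, and only the cells with a clearly higher near-infrared value survive the threshold.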

There are of course many other analyses that we could carry out with the Landsat data. Hijmans

(https://rspatial.org/rs/index.html) provides an excellent discussion of some of these. Instead, we

will move on and briefly cover the ASTER data.

The first step is to load the ASTER data for our region and check its projection.

> DEM <- raster("SatelliteData\\ASTGTM2_N38W123_dem.tif")
> projection(DEM)
[1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

There is a quality assessment file associated with the ASTER data similar to that associated with

the Landsat data. In Exercise 3 you are asked to verify the quality of this data set. Since the

projection is longitude-latitude, we use the raster function projectRaster() to change to

UTM. Then we can determine the range of elevation values and plot the data (Fig. 9).

> DEM.UTM <- projectRaster(from = DEM, to = b2)

> region.DEM <- crop(DEM.UTM, extent(M))

> range(region.DEM@data@values)

[1] 9.03165 857.58487

> plot(region.DEM)


Figure 9. ASTER Elevation data together with sample points.

Now that the data are loaded and the quality checked, detailed analysis can begin. Since this is

often quite specialized, you can begin with SDA2 and Hijmans’ website as well as the references

discussed in the next section and move on into your own particular application.

4. Further reading

Before reading anything else, you should read Robert Hijmans’ discussion of the topic

mentioned in Section 3. In particular, Hijmans discusses several more steps that can be carried

out in image analysis using the raster package. My favorite reference for remote sensing is


Lillesand and Kiefer (2015), although Jensen (2015) is also excellent. There are of course many

different vegetation indices and transformations – too many to discuss. One transformation,

however, may have not gotten as much attention from ecologists and agronomists as it deserves.

This is the tasseled cap (Kauth and Thomas, 1976). The landsat package (Goslee, 2011)

contains the function tasscap() to calculate this, and it is worth a look.

5. Exercises

1. Create Figure 4.

2. The landsat package (Goslee, 2011) contains the function clouds() for estimating cloud

cover based on Landsat bands 1 and 6. Use ?clouds to read about this function. By adjusting the

argument level, estimate the areas of potential cloud cover in the sample region and compare it to

that obtained in Fig. 6.

3. ASTER digital elevation models are created by stereoscopic analysis of multiple images. A

good discussion is given at

https://www.arcgis.com/home/item.html?id=93545c023ec44b109be1b3425edc72e1.

The num TIFF file contains the number of stacked images used to

estimate the elevation of a particular pixel. Construct a plot showing the number of images used

to create the pixels representing the study area. Why might these values not be integers?

4. It happens that our region of interest is located entirely within one Landsat and one ASTER

image, but sometimes it is necessary to combine more than one image to obtain full coverage.

This operation is called mosaicking (Lo and Yeung, 2007, p. 157). The raster package contains

the function mosaic() to accomplish this. Use ?mosaic to read about this function and then use

it to mosaic the files ASTGTM2_N38W123_dem.tif and ASTGTM2_N38W122_dem.tif. Then plot

the image and blow it up to try to determine locations where you can identify the “seam”.

6. References

Goslee, S. C. (2011). Analyzing Remote Sensing Data in R: The landsat Package. Journal of

Statistical Software, 43(4), 1-25. URL http://www.jstatsoft.org/v43/i04/.

Hijmans, R. J. (2016). raster: Geographic Data Analysis and Modeling. R package version 2.5-8.

https://CRAN.R-project.org/package=raster.

Jensen, J. R. (2015). Introductory Digital Image Processing: A Remote Sensing Perspective.

Prentice-Hall, Englewood Cliffs, NJ.

Kauth, R. J., and G. S. Thomas (1976). The tasselled cap - a graphic description of the

spectral-temporal development of agricultural crops as seen by Landsat. Proceedings of the

Symposium on Machine Processing of Remotely Sensed Data, pp. 4B41-4B51. Purdue University,

West Lafayette, Indiana.

Lillesand, T. M., and R. W. Kiefer (2015). Remote Sensing and Image Interpretation. John

Wiley, New York, NY.

Lo, C. P., and A. K. W. Yeung (2007). Concepts and Techniques in Geographic Information

Systems. Pearson Prentice Hall, Upper Saddle River, NJ.

Pebesma, E. J., and R. S. Bivand (2005). Classes and methods for spatial data in R. R News, 5(2).

URL https://cran.r-project.org/doc/Rnews/.

Wieslander, A. E. (1935). A vegetation type map of California. Madroño 3: 140-144.
