Geospatial Modeling Environment and
Data Assembly, Part III
GIS Cyberinfrastructure Module, Day 5
R Questions?
Objectives
Address any R questions from the tutorial
Become familiar with the Geospatial Modeling Environment (GME)
Complete and export a dataset suitable for species distribution modeling
Data Management
Computing Notes
GME is the replacement for Hawth's Tools, which are no longer updated and are not guaranteed to function with ArcGIS 9.3 and higher
If you are running ArcGIS 9.2 or lower, GME will not run, but you can use Hawth's Tools instead
To obtain GME, follow the installation instructions here: http://www.spatialecology.com/gme/gmedownload.htm
NOTE: You MUST be running ArcGIS 10 and have all GME-associated software installed to run the currently available version of GME
GME Functionality
Why use GME?
It formally replaces Hawth's Tools for ArcGIS versions 9.3.1 and above
Hawth's Tools often function with 9.3.1, but not always. Technical support for Hawth's Tools has also ceased.
GME contains some of the same functions as Hawth's Tools, plus added tools
GME (and Hawth's Tools) conducts analyses that are either not available in ArcToolbox or run more efficiently than the tools in ArcToolbox
GME Functionality
How does GME work?
GME commands are entered in the GUI, but processed in R, using ArcGIS only when necessary.
Older versions are run from within ArcMap, but still process commands in R
GME GUI
The GME interface looks the same whether you are running the stand-alone version or the version that runs from within ArcMap
Version and Use Instructions
Command line
Menu of Commands
Entered commands
Command History
Once you enter a command, it will disappear from the command line
The entered command will appear below the version and instructions information, along with any processing notes or errors.
You cannot cut and paste code from the command history back into the command line
To avoid re-typing potentially long code, I strongly suggest that you write your functions in a text editor (Notepad, Word, etc.) and paste them into GME from there. Any edits can then be made quickly and the revised function re-pasted into GME
Basic Command Setup
GME commands are written like R function calls
They are set up as:
function_name(required input*, optional input*)
*there may be multiple inputs of each type
Type buffer in the command line (no quotes)
You will see all of the required and optional inputs displayed below the version and instruction information
Command Setup
For the buffer function, there are three required inputs: in, out, and distance
There are three optional inputs: units, copyfields, and where
Buffer Example
Open a text editor and modify this function for your data to calculate a 150 ft buffer around linear hydro features:
buffer(in="C:\GISCourse\hydro_l_NE_albers.shp", out="C:\GISCourse\hydro_l_albers_gmebuff.shp",distance=150,units="ft",copyfields=TRUE)
The input shapefile should be your linear hydro shapefile that was clipped to New England and projected to Albers
Can you reason out what the output will be? How is this similar to or different from the ArcToolbox Buffer tool?
Run the command in GME and add the resulting shapefile to your map.
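Assembling commands in a script is one alternative to editing them by hand in a text editor. The sketch below is not part of GME. It is a hypothetical Python helper (`gme_command` is an invented name) that formats a command string in the `function_name(input=value, ...)` pattern shown above, so the result can be pasted into the GME command line. It assumes string inputs are quoted and logical inputs are written as TRUE or FALSE, as in the buffer example.

```python
def gme_command(name, params):
    """Format a GME-style command string from a dict of inputs.

    Hypothetical helper: it only builds text and does not call GME.
    """
    parts = []
    for key, value in params.items():
        if isinstance(value, bool):      # check bool before other numbers
            text = "TRUE" if value else "FALSE"
        elif isinstance(value, str):     # quote paths and other strings
            text = '"' + value + '"'
        else:                            # numbers are written as-is
            text = str(value)
        parts.append(key + "=" + text)
    return name + "(" + ", ".join(parts) + ")"


# A dict is used because "in" is a reserved word in Python.
cmd = gme_command("buffer", {
    "in": r"C:\GISCourse\hydro_l_NE_albers.shp",
    "out": r"C:\GISCourse\hydro_l_albers_gmebuff.shp",
    "distance": 150,
    "units": "ft",
    "copyfields": True,
})
print(cmd)
```

Changing a path or distance then means editing one value and re-running the script, rather than retyping the whole command.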
Data Check
You should have X files, all projected in Albers Equal Area. These should already be clipped to New England. Files provided for you are in black, files you created are in blue.
DEM
Slope
Aspect
2 climate rasters (MAT & MAP)
LULC
Species observation points
Species observation point buffers
Roads
New England Boundary
Data Management
If your current map is very cluttered and disorganized, I strongly recommend starting a new map and adding only the layers you now need
Use the Group function to further organize your layers
Recall: hold down the Control key, click the layers you want to group in the table of contents, then right click and select Group
Open the attribute table of the point and the point buffer layers and remove any unneeded fields from previous processing errors
Data Summarization
Some variables are most informative when considered at the landscape scale.
We have 2 layers to summarize within the point buffers (landscape scale): roads and LULC
Roads: sum length of road within buffer
LULC: calculate percentage of each category within buffer
GME can be used for these analyses
We then need to append these landscape summarized variables along with climate, elevation, slope, and aspect values to the point observations.
Sum Roads in Buffers
The Sum Line Lengths in Polygons function in GME sums the length of the lines from a specified input line file that fall within user-specified polygons.
This is a tool that calculates lengths, thus the input datasets MUST be in a projected coordinate system
Type sumlinelengthsinpolys in the GME command line to see the required and optional tool inputs
Modify this function to run on your data:
sumlinelengthsinpolys(line="C:\Users\Jenica\Documents\UConn\GIS Course\Day 5\roads_albers.shp",poly="C:\Users\Jenica\Documents\UConn\GIS Course\Day 5\IPANE_Phrag_buffer_albers.shp",field="ROAD_SUM")
Once the tool is complete, open the attribute table of your point buffers to see the result
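To make the ROAD_SUM value concrete, here is a minimal Python sketch of the quantity being computed: the total length of line segments that falls inside one buffer. This is not GME's implementation. It assumes the buffer is a perfect circle around a point, and it assumes all coordinates are in a projected system so that distances come out in map units. The function names and coordinates are made up for illustration.

```python
import math


def length_inside_circle(p0, p1, center, radius):
    """Length of the segment p0-p1 that lies inside a circle."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    fx, fy = p0[0] - center[0], p0[1] - center[1]
    a = dx * dx + dy * dy                 # squared segment length
    if a == 0:
        return 0.0                        # zero-length segment
    b = 2 * (dx * fx + dy * fy)
    c = fx * fx + fy * fy - radius * radius
    disc = b * b - 4 * a * c
    if disc <= 0:
        return 0.0                        # the line never enters the circle
    root = math.sqrt(disc)
    t_in = (-b - root) / (2 * a)          # where the line enters the circle
    t_out = (-b + root) / (2 * a)         # where the line leaves the circle
    lo, hi = max(t_in, 0.0), min(t_out, 1.0)   # clip to the segment itself
    return max(0.0, hi - lo) * math.sqrt(a)


def sum_line_length(segments, center, radius):
    """Sum the inside-the-buffer length over all segments."""
    return sum(length_inside_circle(p0, p1, center, radius)
               for p0, p1 in segments)


# Made-up road segments around a buffer of radius 100 map units at (0, 0).
roads = [
    ((-200, 0), (200, 0)),      # crosses the whole buffer
    ((0, 0), (0, 50)),          # lies entirely inside the buffer
    ((300, 300), (400, 400)),   # lies entirely outside the buffer
]
road_sum = sum_line_length(roads, (0, 0), 100)
```

Only the portion of each road inside the buffer is counted, which is why a road that merely passes through still contributes to the sum.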
Thematic Raster Summary
LULC data are an example of thematic data: the numeric categories represent classes rather than real numeric values
We can use the isectpolyrst command in GME to calculate the percentage of each point buffer represented by each LULC class
Enter isectpolyrst in the GME command line to see the required and optional inputs.
Thematic Raster Summary
Required inputs
Optional inputs
Thematic Raster Summary
To run the isectpolyrst command, modify the following function for your file paths:
isectpolyrst(in="C:\Users\Jenica\Documents\UConn\GIS Course\Day 3\IPANE_Phrag_buffer.shp", raster="C:\Users\Jenica\Documents\UConn\GIS Course\Day2\ne_lulc", prefix="lu",thematic=TRUE,proportion=TRUE);
The results should be added automatically to the in file attribute table
Thematic Raster Summary
Open the buffer layer attribute table to view the results
Each field is named luV#, which corresponds to the prefix we defined (lu) plus V#, where there is one # for each LULC category (see LULC layer for definitions of numeric classes)
Keep in mind that the outputs are proportions of each LULC type in each buffer. Raw pixel counts in each class could be obtained by omitting the proportion = TRUE statement. Proportions could then be custom calculated after all geoprocessing is complete and the dataset is exported.
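If you export raw pixel counts instead, the custom proportion calculation is a simple division by the buffer's total pixel count. The Python sketch below illustrates this under assumptions: the field names (luV11, luV41, luV82) and counts are made up, and a buffer with no pixels at all is given None rather than a proportion.

```python
def class_proportions(counts):
    """Convert per-class pixel counts for one buffer into proportions."""
    total = sum(counts.values())
    if total == 0:
        # No raster pixels under this buffer: leave the values undefined.
        return {name: None for name in counts}
    return {name: n / total for name, n in counts.items()}


# Made-up counts for one buffer; field names follow the luV# pattern.
buffer_counts = {"luV11": 5, "luV41": 15, "luV82": 0}
buffer_props = class_proportions(buffer_counts)
```

Here luV11 covers 5 of 20 pixels (0.25) and luV41 covers 15 of 20 (0.75).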
isectpolyrst
This tool can also be used to summarize continuous rasters (e.g., climate) within polygons (e.g., buffers)
What tool have we already used that does this type of summarization?
Use the isectpolyrst command in GME to summarize the MAT and MAP climate rasters within the point buffers
Assessment
Where does your data assembly stand?
We have a point buffer layer with roads and LULC summarized
We need to append those buffer values to the specimen points, along with point values for MAT, MAP, elevation, slope, and aspect
How might you proceed?
Data Organization
Organize your map data Group the following layers that you will need for final assembly:
DEM
Slope
Aspect
MAT
MAP
Processed point buffers (with road_sum and luV# fields)
Specimen observation points
These layers should all be in Albers and clipped to New England
Data Organization
A well organized map before final assembly:
Combining Data
Use the Extract Multi Values to Points tool to append MAT, MAP, DEM, slope, and aspect to the IPANE specimen observation points
You now have 2 data files: IPANE specimen observation points with associated environmental data (MAT, MAP, DEM, slope, aspect) and IPANE point buffers with LULC and road length summaries.
Use a spatial join to append the buffer data to the point observations.
Export the data to a new shapefile
Delete redundant fields (use the Delete Field tool)
Final Dataset
Check your processing results on the map and the attribute table to ensure everything looks correct
Are there any features that seem odd?
Does your data need additional quality control (QC)?
Final Dataset
For example, there are points in my dataset that have values for road_sum and luV#, but all zeros or -1 for the other raster values. Each of our data layers had slightly different extents, so coastal points may have fallen outside the data area for some layers. There isn't much we can do about this, other than make sure missing data are coded as such.
Final Dataset
After exploring your dataset and determining which points need QC, export the data in tabular form
You can export a variety of file types
Text files are very flexible and can be used easily in Excel or R
dBASE files are commonly used in ArcGIS and can be used in R and Excel
Open your exported file in Excel
Do any needed quality control
Enter NA for cells that you feel should be quality controlled
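The same recoding can be scripted on an exported text file instead of being done by hand in Excel. The Python sketch below shows one possible rule, following the earlier example of points whose raster values are all zeros or -1: when every raster field in a row is 0 or -1, those fields are set to NA. The column names, the sample rows, and the rule itself are assumptions for illustration. Check flagged points on the map before relying on any automatic rule.

```python
import csv
import io

# Hypothetical column names for the extracted raster values.
RASTER_FIELDS = ["MAT", "MAP", "DEM", "SLOPE", "ASPECT"]
MISSING_CODES = {0.0, -1.0}   # values treated as "no data" (assumption)


def flag_missing(rows, fields=RASTER_FIELDS):
    """Set raster fields to NA in rows where all of them are 0 or -1."""
    cleaned = []
    for row in rows:
        row = dict(row)                       # copy so the input is untouched
        values = [float(row[f]) for f in fields]
        if all(v in MISSING_CODES for v in values):
            for f in fields:
                row[f] = "NA"
        cleaned.append(row)
    return cleaned


# Made-up sample standing in for an exported text file.
sample = """ID,MAT,MAP,DEM,SLOPE,ASPECT,ROAD_SUM
1,8.5,1150,120,3.2,180,250.0
2,0,0,0,0,-1,75.5
"""
rows = list(csv.DictReader(io.StringIO(sample)))
cleaned = flag_missing(rows)
```

Row 2 keeps its ROAD_SUM value because only the raster fields are recoded. A single 0 or -1 is left alone, since a flat cell can legitimately have a slope of 0.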
Final Dataset
Save your quality controlled table
You now have a dataset that could be used to model your species distribution based on environmental factors. Most statistical modeling (such as species distribution modeling) is conducted external to ArcGIS, which is why we needed to export the dataset to tabular form.
What Have We Learned?!
Geospatial data are complex and must be processed with caution
Designing a workflow before processing can save a lot of headaches
Datums and projections are both critical and sometimes difficult to handle
Many analyses and transformations can be accomplished in ArcGIS, but add-on tools can also be quite helpful
Geospatial processing can be time consuming and should not be approached with short timelines if possible
Practicing consistent data management is the best way to prevent file chaos and wasted memory
Skills Summary
Geospatial Modeling Environment
Function coding
Sum Line Lengths in Polygons
Thematic Raster Summary
Effective Data Organization
Quality Control of Processed Data
Assignments
Read the two papers posted on the course website
Complete sections 1.5, 1.6, 2.1-2.3, and 3.1 of the R tutorial on the course website