Synthesizing and Spatially Linking Disparate Datasets

Catherine Crosier1,2, Gregory J. Newman1, and Thomas J. Stohlgren1,2

1Natural Resource Ecology Laboratory, Colorado State University, Fort Collins, Colorado 80523-1499

2Midcontinent Ecological Science Center, US Geological Survey, Fort Collins, Colorado 80525

Abstract

Extensive surveys to obtain species distribution data are expensive and time consuming. In addition, many land managers do not know what species occur within their management boundaries, let alone adjacent to them. Yet government agencies and non-government organizations have collected a wealth of data that, when synthesized, can maximize the use of existing information on species locations without incurring large costs. We obtained datasets including species lists, plot data, transect data, point data, and individual species polygon data that were collected using various methods at different spatial scales to meet different objectives. We then created a Microsoft Access 2000 database that relates these different datasets and allows us to use them in concert. The database was subsequently linked to ArcView 3.2 GIS. When combined and spatially linked, the data improved the completeness of each individual dataset: species reported in a specific area by one dataset were sometimes not reported by a second dataset for the same location. The creation of a synthesized, spatially linked database allows development of predictive models and maps for the distribution, abundance, and occurrence of non-native species. These products can be used by land managers at all levels to aid in the detection, management, and control of non-native species.

Keywords: Data synthesis, Data synergy, Spatial linking, Non-native species, Ecological databases

Objectives

(1) Synthesize pre-existing datasets, including multi-scale (i.e., 1-m² to entire county) information on non-native vascular plant species in Colorado obtained from individuals, organizations, and agencies at all levels (i.e., federal to local), in a spatially linked database.

(2) Use online GIS technology to disseminate information on non-native species locations to land managers and the public in a user-friendly and easily understood manner.

Methods

We are creating a relational database using Microsoft Access 2000 to synthesize data obtained on non-native species from partners throughout Colorado (Table 1). The NRCS species codes will be used to standardize species names, because different projects use different synonyms for the same species. Then, we will spatially link this relational database through ArcView 3.2 using a SQL connection and nested locations of species. For example, a point may be located within a wildlife refuge, which is in turn located within a county. This connection will allow data to be displayed with maps. These spatially linked datasets can then be used to create predictive models for hotspots of invasion and distributions of target species. Finally, we aim to make our spatial database available on-line using ArcIMS software to allow users to obtain overall numbers and distributions of non-natives at any given spatial scale (Figure 1).
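The name standardization and nested locations described above can be sketched in a few lines. This is a minimal illustration only, using Python's built-in sqlite3 module in place of Microsoft Access and ArcView; the table names, column names, place names, and helper functions are all hypothetical and are not taken from the project database. The species codes shown (CIAR4, BRTE) are meant as examples of NRCS-style codes.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical schema."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE location (
            id        INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            kind      TEXT NOT NULL,              -- 'county', 'refuge', 'plot'
            parent_id INTEGER REFERENCES location(id)
        );
        CREATE TABLE synonym (
            reported_name TEXT PRIMARY KEY,       -- name as written by a partner
            species_code  TEXT NOT NULL           -- standardized species code
        );
        CREATE TABLE observation (
            location_id  INTEGER NOT NULL REFERENCES location(id),
            species_code TEXT NOT NULL,
            source       TEXT NOT NULL
        );
    """)
    # Nested locations: a plot inside a refuge inside a county (made-up names).
    con.executemany("INSERT INTO location VALUES (?, ?, ?, ?)", [
        (1, "Example County", "county", None),
        (2, "Example Refuge", "refuge", 1),
        (3, "Plot A", "plot", 2),
    ])
    # Several reported names can map to one standardized code.
    con.executemany("INSERT INTO synonym VALUES (?, ?)", [
        ("Cirsium arvense", "CIAR4"),
        ("Carduus arvensis", "CIAR4"),
        ("Bromus tectorum", "BRTE"),
    ])
    return con


def add_observation(con, location_id, reported_name, source):
    """Store an observation under its standardized species code."""
    row = con.execute(
        "SELECT species_code FROM synonym WHERE reported_name = ?",
        (reported_name,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no species code known for {reported_name!r}")
    con.execute(
        "INSERT INTO observation VALUES (?, ?, ?)",
        (location_id, row[0], source),
    )


def species_within(con, location_id):
    """List species codes seen at a location or at anything nested inside it."""
    rows = con.execute("""
        WITH RECURSIVE nested(id) AS (
            SELECT ?
            UNION ALL
            SELECT l.id FROM location AS l
            JOIN nested AS n ON l.parent_id = n.id
        )
        SELECT DISTINCT o.species_code
        FROM observation AS o
        JOIN nested AS n ON o.location_id = n.id
        ORDER BY o.species_code
    """, (location_id,)).fetchall()
    return [r[0] for r in rows]
```

Because each location stores only its parent, a record entered at the plot scale can be summarized at the refuge or county scale without duplicating it, which is the idea behind the multi-scale queries mentioned above.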

DATA SOURCE                                                           DATA TYPE
1. Biota of North America Program                                     County species list
2. USGS                                                               Multi-scale plot data
3. USFS - Forest Health Monitoring Program                            Multi-scale plot data
4. Graduate students                                                  Multi-scale plot data
5. San Luis Valley GIS Authority                                      Point, line, and polygon data
6. USFS - Wilderness area surveys                                     Manager survey results
7. Center for the Environmental Management of Military Lands (CEMML)  Vegetation monitoring transects
8. The Nature Conservancy                                             Polygon, point, and transect data
9. Colorado Natural Heritage Program                                  Polygon and point data
10. State of Colorado                                                 Quarter quad data for 20 species
11. CSU Research Scientists                                           Point data and control data
12. National Park Service - NPSpecies                                 Park species lists
13. U.S. Fish and Wildlife Service                                    Refuge manager survey results
14. Bureau of Land Management                                         Point, polygon, and control data

Results

We compared county species lists from all 63 Colorado counties, manager survey results from 14 U.S. Forest Service wilderness areas, and plot data from 358 multi-scale vegetation plots. These disparate datasets have different biases associated with them. For example, the Biota of North America Program (BONAP) dataset, compiled from herbarium records, recorded 252 non-native plants in Boulder County, while the county with the next-highest count, Jefferson County, had 135 species (Figure 2). The higher number for Boulder County may be an artifact of the location of the state’s herbarium at the University of Colorado Boulder and of the amount of research being conducted in the Boulder County vicinity, rather than an actual hotspot of non-native species within the state.

When these datasets were combined, a mean of 14 species (range 2 to 46) could be added to each county list from plot data (16 counties contained plots), for a total of 222 records added to the BONAP dataset (Figure 3). We used Canada thistle (Cirsium arvense) as a single-species case to compare the datasets and evaluate their synergy (Figure 4). This non-native species was identified in 4 plots (located in two counties) and in 8 wilderness areas (spanning 12 counties) where the corresponding county list did not include C. arvense. Because some wilderness areas span multiple counties, a wilderness-area record gives only a likelihood, not a confirmed record, of Canada thistle presence in each of those counties. Thus, datasets may be improved when combined.
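The comparison of plot data against county lists amounts to a set difference per county. The sketch below shows one way to compute it, assuming species names have already been standardized to codes; the function names and input shapes are hypothetical and the data used with them would be the project's own, not anything shown here.

```python
from statistics import mean


def county_additions(county_lists, plot_records):
    """Find species recorded in plots but missing from the county list.

    county_lists: dict mapping county name -> set of species codes.
    plot_records: iterable of (county name, species code) pairs.
    Returns a dict mapping county name -> set of species codes to add.
    """
    additions = {}
    for county, species_code in plot_records:
        known = county_lists.get(county, set())
        if species_code not in known:
            # A set keeps repeated plot records from being counted twice.
            additions.setdefault(county, set()).add(species_code)
    return additions


def summarize(additions):
    """Summarize additions across the counties that gained any species."""
    counts = [len(codes) for codes in additions.values()]
    if not counts:
        return {"records_added": 0, "mean": 0.0, "min": 0, "max": 0}
    return {
        "records_added": sum(counts),
        "mean": mean(counts),
        "min": min(counts),
        "max": max(counts),
    }
```

Note that in this sketch the mean is taken only over counties that gained at least one species; a county with plots but no additions would have to be counted separately to average over all counties containing plots.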

Figure 5: On-line Interface

Introduction

Ecological studies are often conducted over short temporal scales in specific areas. With synthesis of these smaller studies, the number and complexity of questions that can be addressed increases. Using pre-existing datasets also capitalizes on resources already expended, reducing the time and money required. Additionally, lists of probable species, including invasive non-native plants, do not exist for many public land units (e.g., national wildlife refuges), and distribution maps are not available for many non-native species. It is impossible to manage an area without knowing what species may occur there. In Colorado, approximately 50% of the land is publicly owned. Often, the agencies and other organizations managing these lands do not communicate with one another. It is important to know what species occur on adjacent lands because adjacent lands can harbor source populations for invading species.

Figure 1: On-line Spatial Database

Relational Database

• Hierarchical tables

• Normalized database

• Multi-scale analyses

Spatial GIS Database

• Interactive interface

• Spatial analyses

• Multi-scale graphics

Web-based Interface

• User friendly

• Simple

• Outreach tool

Table 1. Data sets obtained (1-6) and promised (7-14)

Figure 4: Dataset Comparisons

Figure 2: Dataset Bias

Number of Non-native Plants (map legend classes): 2-6, 7-10, 11-17, 18-23, 24-28, 29-45, 46-66, 67-110, 111-135, 136-252

Hinsdale County has only 2 recorded non-native plant species, probably because it has been less intensively surveyed.

Boulder County has 252 recorded non-native plant species, but this number may be an artifact of the location of the state’s herbarium at the University of Colorado and the amount of research being conducted in this area.

Future Steps

These results indicate that data synergy can be used to improve the quality of individual datasets. In addition, the synthesized, spatially linked database can be used to create predictive spatial models for hotspots of invasion in the state or in a specific land management unit (Figure 5) and to create trend surfaces for individual species.

Figure 3: County List Additions

View data and distribution maps for single species or invasion trends for the whole state or a subset of your choice.

U.S. Forest Service wilderness areas that identified Cirsium arvense where the county list did not.

Plots that identified Cirsium arvense where the county list did not.

San Luis Valley GIS/GPS

Acknowledgements
