34
Information Management Jornada Basin LTER

Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Embed Size (px)

Citation preview

Page 1: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Information ManagementJornada Basin LTER

Page 2: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Mission

The Jornada Information Management System provides the infrastructure for the curation, protection, access, and analysis of Jornada LTER (JRN) data holdings.

Our mission is to protect and provide access to publically funded research data, tools, and findings that result from JRN research and associated collaborations.

Page 3: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Jornada Information Management System

JIMS is a multi-organization system that contains data and metadata holdings and ancillary information from the Jornada through the LTER, USDA, and AISE as well as collaborative efforts, either through data collection and storage for other sites (e.g., BLM-Malpais Borderlands) or through development of tools to improve data access and analysis (e.g., EcoTrends, The Nature Conservancy Landscape Toolbox).

Page 4: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Information ManagementThe purpose of our information management system is to provide protocols and services for data collection, verification, organization, archive, and distribution. One of the primary tools to insuring long-term usefulness of data and products is detailed metadata that describes the research project and its related datasets. Metadata are shared and integrated amongst services and tools of the Jornada Information Management System.

We provide access to hundreds of datasets linked directly to our program or from other research locations and management agencies in support of multi-user needs. Our intent is to provide datasets, science-based information, tools, and technologies that can be used, through simple or more complex analyses, to address the needs of a diverse user community.

Page 5: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Information management system

Our system consists of six major components: a) Data management implementation/process b) Management of data, spatial maps, and imagery, and

the creation of and access to associated metadatac) Formal data management protocolsd) Tools and resources dedicated to harvest, document,

archive, manage, and make data accessible, and tools to access, analyze, and download the data and metadata

e) Networking and computing servicesf) Support staff

Page 6: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Data Management Process

Our site manager, John Anderson, acts as the liaison between researchers and the information management team. His involvement begins during the Project Design phase with the completion of the Jornada Notification of Proposed Research form by a researcher prior to the start of work. This form alerts the Information Manager to the new study and potential LTER datasets.

Upon initiation of a new study, the researcher completes a Project Documentation form that provides the second level of "metadata" documentation, and arranges for GPS of the data collection sites by the LTER field crew.

Page 7: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Data CollectionIn the data collection phase, the Information Manager (IM) helps researchers design field and laboratory data sheets that facilitate data entry and analysis. The investigator completes a Project & Dataset Documentation forms to provide descriptive metadata of the dataset and its attributes. Both Project and dataset Documentation forms are provided with the dataset when it is requested or obtained from our website.

JIMS data entry programs validate data upon entry. Computer files are subjected to further verification by graphing and/or error-checking programs, and/or examination by the responsible investigator. Final quality assurance rests with the investigator who submits data for inclusion in the Data Management System. Direct communication between researchers and the IM ensures the timely submission and accessibility of data, as required by NSF guidelines.

Page 8: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Databases & Geospatial Analysis

One of the biggest challenges is migrating historic and current data into formats consistent with database rules, and to support geospatial analysis and mapping. Processing data files that have been collected or designed without database protocols is an enormous workload.

Our approach that focuses on continued interactions between researchers and the IM minimizes this workload.

Page 9: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Data Management Process

Page 10: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Management of Data and Metadata

We employ many tools in our system, including SQL Server, GIS Server, geodatabases, Drupal, data entry systems, and open-source geoportals. We are building a system that optimizes the contribution of each tool to the total system. We also optimize the roles of people on the information management team as the system evolves.

We are building, and have pilot tested, an approach to information management that integrates tabular research data with the spatial component of data sample location as a complete dataset package. This approach allows us to easily integrate data from more than one research program or project, and facilitates scientific research.

Page 11: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Data Protocols and Metadata Standards

Procedures are conducted in accordance with recommendations and guidelines developed by the LTER Network Information Managers Committee.

Data access, acknowledgement, and data management policies, which are compliant with those developed by the LTER IMC, can be found at: http://jornada.nmsu.edu/lter/data/policies

EML is being produced for each dataset. Datasets with full metadata are automatically harvested into the LTER Data Portal and subsequently DataOne.

Page 12: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Data Access Policy

Data derived from LTER funding are made freely and publicly available within 2 years after collection.

Data are designated as either “Unrestricted” and � �available online or “Restricted” with release � �authority by the responsible investigator. Restricted datasets are those in preparation for publication or part of student research that is protected to allow them the opportunity to publish.

Page 13: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Tools, Resources and DevelopmentTools

• Website• Geodatabase• Data Catalog• Geoportal

Resources• Networking• Computational• Redundant Backups• Information Management Team

Development• EcoNIS Server• Knowledge, Learning, Analysis System• EcoTrends

Page 14: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Data Flow

Data and Metadata

LTER Data Portal

LTER Data Portal

Jornada Database

LTER NetworkDatabase

JornadaInteractive

Maps

JornadaInteractive

Maps

JornadaGeoportalJornada

Geoportal

Jornada Data Catalog

Jornada Data Catalog

EcoTrends Database

Jornada Geodatabase

Page 15: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Website

Our website offers options for users to access, query, and understand the datasets online before deciding to download. We will continue to add data accessibility and analytical tool capabilities as they are made available from our collaborative projects (EcoTrends, Landscape Toolbox, KLAS). Datasets are delivered to users as downloadable dataset packages from the data catalogs and geoportals through web queries. The dataset package includes metadata files, data files (with coordinates for each data record), and a shapefile (spatial representation of dataset location) if available.

We have all long-term JRN LTER datasets in this system. These datasets are currently available online at http://jornada.nmsu.edu/lter/data, and a listing is available at http://jornada.nmsu.edu/lter/data/long-term.

Page 16: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

GeodatabaseThe Jornada geodatabase runs ESRI ArcSDE spatial data engine on SQL Server 2008. The geodatabase provides storage and access to JRN spatial and tabular research data holdings. The geodatabase is also used to create and manage metadata in ISO format which is subsequently used by the JRN website and geoportal to allow users to visualize, search, and access JRN GIS and tabular data.

Map and image services are created from geodatabase resources, and are provided by the GIS server to the Geoportal and other web-mapping applications. This approach provides a visual display of the cataloged spatial datasets to the user. A geodatabase is used to integrate spatial and tabular research. Geographic coordinates, as well as other key dataset identifier fields, are inserted into CSV files to allow the data to be easily imported into any number of analytical software or databases.

Page 17: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Data Catalog

The website provides access via a data catalog and geoportal to data as well as personnel information, publications, research proposals, reports, and other information about the Jornada Basin LTER and its research activities and collaborations. The website follows the LTER website design recommendations. Recently, the Jornada moved to a Drupal content management system to host websites for all JRN-related research projects and collaborations.

DEIMS has been adopted by several other LTER and iLTER sites (ARC, LUQ, NTL, NWT, PIE, SEV, VCR, European Union, Taiwan) as a common approach to making data available and for generating EML to LTER best practices. EML generated by DEIMS is being harvested into the current LTER Data Portal (PASTA).

Page 18: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

DEIMSDrupal Ecological Information Management System

Content RelationshipsIntegrated Drupal content types

DEIMS

ResearchSites

Publications

Data Sets

Projects

People

Page 19: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

GeoportalWe implemented ESRI open source geoportals into JIMS. The geoportals provide textual searches via keywords as well as the ability to query geographic extent and to map research site locations. Although primarily developed to facilitate the distribution of spatial datasets, the geoportal can also be used to query and deliver a wide range of products, including documents, tabular data, and integration with other data portals using web services. Registered users can save multiple search terms to revisit the site at a later date.

Data providers can manually publish datasets in the portal or the geoportal can be configured to automatically publish datasets when properly formatted ISO metadata are added or updated in a specific internal directory. The interface also allows a user to select a bounding extent to limit or clip spatial datasets and automatically e-mail the customized files to the user as a zip file. The geoportal has the capability to deny access to restricted datasets or grant access to only selected registered users as defined by the portal administrator.

Page 20: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

NetworkingThe Jornada site offices and laboratories located in Wooton Hall on the campus of NMSU are connected to a local area network (LAN) through a firewall to the NMSU network (Gigabit Ethernet) and Internet2. Desktop computers and servers are connected to the LAN using Gigabit Ethernet. We recently increased bandwidth from the field station to the NMSU campus from a single T1 connection (1.54 MB) by adding 20 MB additional bandwidth for wireless access by adding a wireless backhaul to connect a local ISP with the new 100 foot tower at JER Headquarters in the field.

The increased bandwidth supports streaming data and video, and remote education activities (K-12) from the wireless network covering the research site. We plan to continue increasing the wireless coverage (cloud) across research sites to provide Wi-Fi and 900 MHz spread spectrum connectivity for researchers, educators, and scientific instrumentation. This year 3 old windmill towers were instrumented with wireless access points to extend coverage to nearby research sites.

Page 21: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Computational ResourcesJornada servers are configured as XenServer hypervisors within a resource pool and are used to host a large number of virtual servers (50+). The resource pool supports virtual servers running multiple operating systems (Linux, Windows). The resource pool can be configured to provide high availability to ensure the servers are available 24 hours a day, 365 days a year. If one of the physical servers (hypervisors) within a pool fails or is brought down for maintenance, the virtual servers running on the server are automatically transferred to another hypervisor.

To a user connected to services provided by one of the virtual servers, the server will appear to have a slight delay (15-30 seconds), but otherwise the user will see no apparent effect from the virtual server being transferred to another hypervisor. Currently, we have 4 physical servers within the resource pool. Additionally, servers that are not virtualized provide directory (Active Directory, LDAP), backup, and licensing services for the resource pools and desktop computers. Server storage is centralized using a storage area network (SAN) and provides 48 TB of storage capacity. The servers and SAN are connected redundantly to allow for hardware failure without impacting server performance. New SAN switches which increase storage bandwidth from 4 GB to 8 GB has been purchased and will be deployed once the server host bus adapters have been upgraded to support the higher communications speeds.

Page 22: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Redundant BackupsMultiple forms of backup are incorporated to protect data and systems from disaster and to allow for rapid recovery in case a disaster occurs. Servers and switch closets are physically secured and environmentally controlled to provide security and protection. Differential backups are performed nightly on all servers and many desktop computers using a dual drive LTO 4 tape library directly attached to the SAN. Backup media is reused after 3 months. Backup media are stored off-site in case of catastrophe.

Virtual server and storage (SAN) snapshots are performed prior to system upgrades or modifications to allow rapid recovery in the event these alterations produce undesirable results. The data archive volume is backed up routinely to DVDs and hard drives stored off-site. The DVDs are not reused, but are saved indefinitely. We are exploring mechanisms to automate and schedule server snapshots with little or no additional cost. We are also exploring disk-to-disk backups and alternative technologies to replace our tape library and decrease backup windows.

Page 23: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

IM Team

Our information management team consists of four full-time staff jointly supported by the JRN and USDA (Ken Ramsey: Information Manager; Jim Lenz: Network and systems administrator; Valerie LaPlante: Multimedia and Website Administrator; Scott Schrader: Geoportal Administrator).

Student employees and graduate assistants support data entry and computer programming efforts. Team member's skill sets complement each other with some overlap to allow for temporary absence and employee turnover.

Page 24: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

EcoNIS Server

We plan on integrating Drupal and the geoportals to allow seamless access to both systems without requiring users to login separately to them. As GIS enabled data packages are created, the EML files will be updated using Drupal to point to the data package.

Page 25: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

EcoNIS IntegrationEcological Network Information System

GeoportalData Catalog webservices

EcoTrends

XML

webservices

webservices

Integration using SOA

EML and/or ISO formatted XML metadata will be used as exchange formats to populate databases that support systems integrated using web services

Page 26: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Knowledge, Learning, Analysis System (KLAS)

The goal of KLAS is to develop a new scientific approach that takes advantage of the of the big data in ecology and environmental sciences. This approach can be implemented as a knowledge-driven, open access system that "learns" and becomes more efficient and easier to use as data streams increase in variety and size.

The system integrates a data-intensive approach with a hypothesis-data-driven approach. The data-intensive approach examines large amounts of data using statistical analysis, machine learning, and data mining tools (Peters et at. 2014). There are five key components on KLAS:1. Recommendations of similar or complementary

datasets and analytical tools.

2. Caching of intermediate results for future use.

3. Precomputing data analysis such as regressions, decision tree analysis, aggregation, and feature selection.

4. Prefetching data before it is requested based on similar experiments.

5. Filtering algorithms to flag, correct, or delete poor quality or irrelevant data.

We are in an early stage of KLAS, but we expect that it will adapt the scientific method to accommodate vast amounts of data, and make them accessible to a broad range of users via open access with an iterative learning process.

Debra P. C. Peters, Kris M. Havstad, Judy Cushing, Craig Tweedie, Olac Fuentes, and Natalia Villanueva-Rosales (2014). Harnessing the power of big data: infusing the scientific method with machine learning to transform ecology. Ecosphere 5:art67.

Page 27: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

EcoTrends:

-Long-term data from 50 sites(26 LTER, USDA, DOE, USGS)

-Major themes- Biota- Stream and air biogeochemistry- Disturbance responses- Climate- Human economy, population

-Accessibility of derived data on web site for network-level synthesis

-Book on value of long-term data to societal concerns (USDA publisher)

Page 28: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

EcoTrendsWe continue to develop the EcoTrends website by adding datasets from > 50 sites within the US and abroad, deriving new data variables, and improving data accessibility and analytical tools. This web site was migrated from the LNO to a Jornada virtual server in LTER-V. The content was updated during this process to correct problems or omissions that had not been previously identified. We continue to develop the next iteration of the EcoTrends website and have dedicated 4 full-time staff and several students to this project.

We plan on implementing the LTER NIS at the Jornada as soon as possible to explore integration of the new EcoTrends website with the data and metadata web services of the NIS. We are also integrating the P2ERLS website of general information (e.g., ecosystem type, long-term mean precipitation, temperature) from > 300 sites distributed globally (http://www.p2erls.net) with the EcoTrends website (http://www.ecotrends.info).

Page 29: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Milestones and Deliverables

Ongoing JRN participation in LTER network-wide activities includes the LTER Data Portal, All-Site Bibliography, and Climate databases as well as representation and participation at the annual Information Managers (IM) Meeting, and workshops.

These activities are associated with expanding the capability of the JRN to acquire, maintain, and exchange information in a timely fashion to meet our milestones and deliverables, and to share this information with other LTER and non-LTER users via the JRN website and the NIS at the LTER Network Office.

Page 30: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Data AccessibilityRecently, activity and discussion within the LTER Network resulted from the Bob Robbins video illustrating problems he encountered while trying to access data from each web site. JRN responded quickly to these problems by immediately implementing a website redirect to forward users to the current data catalog page. We then created EML using DEIMS to replace JRN EML documents used to search for our datasets on the LTER Data Portal. These documents pointed directly to the appropriate dataset section of the old data catalog until we were able to update our data files and metadata from our new data catalog (DEIMS) to generate updated EML and improve data accessibility.

During this process, we increased the quantity of datasets in the LTER Data Portal as well as the quality and congruency of the EML metadata and data. As a member of the NIS Data Portal tiger team (Ramsey), we worked with the LNO and NIS developers to ensure that the new LTER Data Portal (PASTA) allow users to more easily access LTER datasets and associated metadata.

Page 31: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Updated Goals

2015 2018

2016 2017 2018

October 2015

Deploy the LTER NIS to Jornada

June 2016

Support data subset and data aggregation on

website with dynamically generated metadata

August 2016

Integrate DEIMS and GeoPortals

March 2016

Integrate map and image services into DEIMS and the GeoPortals

September 2018

Integrate NIS data and metadata services into DEIMS, GeoPortals,

and EcoTrends

October 2017

Integrate DEIMS and geoportals with the LTER Unit Registry and controlled vocabulary

Page 32: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

New Goals

[pull]New goals since proposal awarded:• Evaluate Matlab Toolkit for processing

streaming data and data harvests as well as integration with the data catalog (DEIMS)

• Evaluate exposure Jornada’s systems as cloud services to allow science teams to provision servers and storage for their collaboration efforts

Page 33: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Tasks

elev veg temp PPT cover rich NPP type ness

Samplin

g

Locat

ions

G

S n

Time

1858

2009

t

year veg type PPT richness NPP

1990 Mesquite 214.5 2.7 28.6

1991 Mesquite 334.9 3.4 32.5

1992 Mesquite 424.4 4.5 60.6

… … … … …

1990 Black grama 156.6 6.6 77.2

1991 Black grama 313.8 5.3 118.0

1992 Black grama 386.8 6.1 156.7

… … … … …

RelationalDatabases

Jornada

Site …

Site A

TextFiles

TextFiles

temporal

data/metadata

spatial

data

EML

met

adat

a

JIMS mapping tools

Populate databases with research data Produce project and dataset documentation for GIS data from existing metadata in GIS

Associate research site locations (RSL) to data set and project ID numbers in JIMS and import RSL shapefiles into geodatabase

Create attribute level EMLfrom databases for: - research data - GIS data - remote sensing data

Develop visualization, mapping, animation, andanalytical tools NPP map

visualization &analytical tools

Our Future

0

100

200

300

0 5 10 15 20species richness

NP

P (

g/m

2)

mesquite black grama

0

100

200

300

199019921994199619982000200220042006

year

NP

P (

g/m

2) mesquite black grama

EcoTrends

MetadataRepository

Page 34: Information Management Jornada Basin LTER. Mission The Jornada Information Management System provides the infrastructure for the curation, protection,

Information ManagementJornada Basin LTER