
CASA Data Management Handbook

Principles and Policy
Data Description
Real-Time Dissemination
Archival and Storage

J. Brotzge
B. Philips
V. Chandrasekar
M. Zink

Updated: June 2, 2009


Outline

I. Principles and Policy

II. Data Description

III. Real-time Dissemination

IV. Archival, Storage, and Retrieval

This document provides an overview of how data from the IP1 test bed are processed, disseminated, and archived for users. Real-time and archived CASA data are provided to the following users:

- CASA Researchers
- CASA Industry Partners
- Emergency Managers in the test bed region
- National Weather Service Forecast Office, Norman

The goal of data dissemination to researchers is to advance the understanding of CASA’s sensing, distributing, detecting and forecasting capabilities and contribute to the state-of-the-art in the associated disciplines; the goal of disseminating data to operational users is to understand how CASA changes hazardous weather decision making. All users continue to offer feedback for future changes in the end-to-end system design.


I. PRINCIPLES AND POLICY

What follows are the principles and policy of CASA in regard to the sharing and dissemination of data.

a. PRINCIPLES

1. CASA supports NSF’s policy concerning broad dissemination of research results

“37. Sharing of Findings, Data, and Other Research Products

a. NSF expects significant findings from research and education activities it supports to be promptly submitted for publication, with authorship that accurately reflects the contributions of those involved. It expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections and other supporting materials created or gathered in the course of the work. It also encourages awardees to share software and inventions or otherwise act to make the innovations they embody widely useful and usable.

b. Adjustments and, where essential, exceptions may be allowed to safeguard the rights of individuals and subjects, the validity of results, or the integrity of collections or to accommodate legitimate interests of investigators.”

- NSF General Contract 1 (dated 7/20), section 37, which is incorporated by reference into CASA’s Cooperative Agreement with NSF

2. Data generated from the test beds is part of CASA’s research results and also a research tool.

CASA’s goal is to create end-to-end systems which improve forecasts, warning and response. The new data generated through the test beds link CASA’s technology and forecasts to warning and response. In addition, the data will validate the operation of the different system components, such as detection algorithms, MC&C, and sensing strategies, and will provide a basis for research in all of CASA’s thrusts.

3. CASA should maintain its “brand” as data is disseminated for quasi-operational use and for research inside/outside the CASA program.

4. Making data widely available and allowing flexible access to data will help CASA achieve its tech transfer goals.

5. CASA principal partners should have an advantage over non-partners accessing and using CASA data.

6. CASA should validate the integrity of the data before it is made available to institutions and organizations that are outside of CASA’s partnership.


b. DATA POLICY COMPONENTS

Each component below is defined for two groups. "Partners" stands for Core Academic Partners, Principal Partners, Academic Outreach Partners, and Collaborating Partners.

Real Time Data
- Partners: As soon as possible for academic partners. For other groups, as soon as data is sufficiently validated given the user's level of sophistication and involvement in the program.
- Non-Collaborators: One year after data collection, at cost of providing data.

Archived Data
- Partners: As soon as possible for academic partners (see schedule below). For other groups, as soon as data is sufficiently validated given the user's level of sophistication and involvement in the program.
- Non-Collaborators: One year after data collection, at cost of providing data.

Usage Agreement
- Partners: Agreement developed for non-core academic partners addressing re-distribution, liability, acknowledgement, and display. Data should be copyrighted.
- Non-Collaborators: Agreement developed addressing re-distribution, liability, acknowledgement, and display. Data should be copyrighted.

User Data Preferences
- Partners: Incorporated into utility-based scanning strategies, ability to make dynamic data requests of system when available.
- Non-Collaborators: User preferences not incorporated into the system.

Data Display/Web
- Partners: CASA web site used for display of data to general public. Users access archives on QuickPlace (using LEAD or OpenDap).
- Non-Collaborators: Defined in User Agreement.

Acknowledgement
- Partners: Acknowledgement of CASA and NSF in publications and display of data. Copyright on CASA data display. CASA Logo on data display.
- Non-Collaborators: Acknowledgement of CASA and NSF in publications and display of data. Copyright on CASA data display. CASA Logo on data display.


II. Data Description

Data collected by IP1 are provided in real-time to National Weather Service forecasters and to county and city emergency managers, and are now a vital part of their decision-making process within the testbed area. Thus, now more than ever, CASA data and associated metadata must be clear, quality-controlled, and available in real-time whenever possible. For CASA researchers and industry partners, archived data must be well organized, thorough, and easy to access. At the same time, the data must remain protected to address proprietary and security concerns.

What follows describes the overall data organization, the data flow, and the data file format.

a. Data Organization

There are several levels of data created within the IP1 system (see Figure 1) which reflect the stages of processing prior to dissemination to end-users:

Tier 1 data: These are the basic time-series data generated at the radar node. They are collected and saved at the node on request only. Once collected, these data are stored off-line.

Tier2a data: Tier2a data are the basic moment data such as reflectivity, velocity and dual-polarization products that are transmitted routinely from the radar to the System Operations Control Center (SOCC).

Tier2b data: Tier2a files are assigned a new directory structure for compatibility with non-CASA (operational) warning-decision tools such as the Warning Decision Support System – Integrated Information (WDSS-II) and Weatherscope. Data organized within the new directory structure are referred to as Tier2b data.

Tier 4 data: All associated metadata and product images are labeled as Tier4 data, including all web-generated images and products. Tier4 data include:

- Logs of all radar scanning information from each heartbeat cycle
- One-minute images from nowcasting
- Five-minute images of merged reflectivity
- Five-minute images of wind analysis output
- Five-minute images of ARPS forecast output

All Tier2a, Tier2b and Tier4 files are stored daily (UTC) on tape. No Tier2a data are ever discarded. All Tier2b and Tier4 data are saved if collected during precipitating events. At least six months of data are stored on-line. After six months, data are moved off-line, but remain accessible via tape library. After about one year, data on tape are moved off-line. Data from frequently requested cases remain on-line.

More details regarding archival and storage are discussed in Section IV.
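The retention schedule described above can be summarized as a small decision function. This is only an illustrative sketch: the function name, the returned labels, and the exact day counts (183 days for "six months", 365 days for "about one year") are assumptions chosen for the example, not values taken from SOCC software.

```python
from datetime import date

# Illustrative cut-offs; the handbook says "at least six months" and
# "about one year", so these exact day counts are assumptions.
ONLINE_DAYS = 183
TAPE_LIBRARY_DAYS = 365

def storage_location(collected, today, frequently_requested=False):
    """Return where data collected on `collected` would reside on `today`."""
    if frequently_requested:
        # Frequently requested cases remain on-line regardless of age.
        return "on-line"
    age_days = (today - collected).days
    if age_days <= ONLINE_DAYS:
        return "on-line"
    if age_days <= TAPE_LIBRARY_DAYS:
        # Off disk, but still reachable through the tape library.
        return "tape library"
    return "off-line tape"
```

For example, under these assumed cut-offs, data collected on 1 October 2008 would be in the tape library on 2 June 2009.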

b. Data Flow – from Radar to Archival

Radar moments (Tier2a) are generated at the radar from the time series (Tier1) data. These products are transmitted via wireless microwave links to the Systems Operations Control Center (SOCC), courtesy of CASA Industrial Partner OneNet. Data arrive at the SOCC via LDM (Local Data Manager).

Tier2a data that arrive at the SOCC are: i) archived; ii) converted into Tier2b data for use in warning-decision tools and real-time display; and iii) assimilated for use in real-time analysis and numerical weather prediction (NWP) software.

i) All Tier2a data are saved into daily directories, which are then copied to tape.

ii) Tier2b data are generated from the Tier2a data stream using a WDSS-II routine called “casaIngestor”. In effect, the Tier2a data are copied and rearranged into a new directory structure, compatible with COTS display software such as is available from WDSS-II and Weatherscope. Tier2b data also are saved into daily directories, and are copied to tape.

iii) Tier2a data are assimilated directly into the Advanced Regional Prediction System (ARPS) Data Analysis System (ADAS). Real-time 3DVAR surface analyses and 6-hour forecasts are generated from these data and are displayed in near real-time. Surface analyses are generated and displayed within ten minutes of data collection. CASA forecast output is available within an hour after initial data assimilation.

Figure 1: Flow and creation of data within the SOCC.

[Diagram labels: LDM; Tier 2a; casaIngestor; Tier 2b; Tier 2b_archive; Tier 3b; WDSS-II detection algorithms and other products; data assimilation; storage.]

c. File Description

All CASA data are written in NetCDF (Network Common Data Form) format. The format is generally self-describing. The header of a sample file from KCYR is as follows:

netcdf KCYR_20090602-180006.netcdf.gz {
dimensions:
    Radial = UNLIMITED ; // (428 currently)
    Gate = 426 ;
variables:
    double Azimuth(Radial) ;
        Azimuth:Units = "Degrees" ;
    double Elevation(Radial) ;
        Elevation:Units = "Degrees" ;
    double GateWidth(Radial) ;
        GateWidth:Units = "Millimeters" ;
    double StartRange(Radial) ;
        StartRange:Units = "Millimeters" ;
    int StartGate(Radial) ;
        StartGate:Units = "Unitless" ;
    int Time(Radial) ;
        Time:Units = "Seconds" ;
    double TxFrequency(Radial) ;
        TxFrequency:Units = "Hertz" ;
    double TxLength(Radial) ;
        TxLength:Units = "Seconds" ;
    double TxPower(Radial) ;
        TxPower:Units = "dBm" ;
    int AfcSet(Radial) ;
        AfcSet:Units = "Unitless" ;
    int GcfState(Radial) ;
        GcfState:Units = "Unitless" ;
    float Reflectivity(Radial, Gate) ;
        Reflectivity:Units = "dBz" ;
    float Velocity(Radial, Gate) ;
        Velocity:Units = "MetersPerSecond" ;
    float SpectralWidth(Radial, Gate) ;
        SpectralWidth:Units = "MetersPerSecond" ;
    float DifferentialReflectivity(Radial, Gate) ;
        DifferentialReflectivity:Units = "dB" ;
    float DifferentialPhase(Radial, Gate) ;
        DifferentialPhase:Units = "Degrees" ;
    float CrossPolCorrelation(Radial, Gate) ;
        CrossPolCorrelation:Units = "Unitless" ;
    float NormalizedCoherentPower(Radial, Gate) ;
        NormalizedCoherentPower:Units = "Unitless" ;
    float SpecificPhase(Radial, Gate) ;
        SpecificPhase:Units = "DegreePerKm" ;
    float HPropagationPhase(Radial, Gate) ;
        HPropagationPhase:Units = "Radians" ;
    float VPropagationPhase(Radial, Gate) ;
        VPropagationPhase:Units = "Radians" ;
    int GateFlags(Radial, Gate) ;
        GateFlags:Units = "BitField" ;
    float CorrectedReflectivity(Radial, Gate) ;
        CorrectedReflectivity:Units = "dBZ" ;
    float CorrectedDifferentialReflectivity(Radial, Gate) ;
        CorrectedDifferentialReflectivity:Units = "dB" ;

// global attributes:
    :NetCDFRevision = "$Id: mdmd.c,v 1.33 2008-02-28 00:11:57 junyent Exp $" ;
    :RadarName = "cyril.ok" ;
    :Latitude = 34.8739068803885 ;
    :Longitude = -98.2514293120135 ;
    :Height = 423.939392604865 ;
    :MccId = 124344750 ;
    :NumGates = 426 ;
    :ScanFlag = 1 ;
    :ScanType = 2 ;
    :ScanId = 64373 ;
    :PosAccel = 60. ;
    :PosVelocity = 21. ;
    :AntennaGain = 37. ;
    :AntennaBeamwidth = 1.8 ;
}

Actual data content follows the header information within each file. Each variable is listed with its associated units. Global attributes are file-level constants (metadata).
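Because StartRange and GateWidth are stored in millimeters, a user will typically convert them before plotting. The sketch below computes the range to the center of each gate in kilometers for one radial. It assumes that StartRange marks the near edge of the first gate, which the header does not state, and the numeric values in the usage note are hypothetical rather than read from a CASA file.

```python
MM_PER_KM = 1.0e6  # 1 km = 1,000,000 mm

def gate_center_ranges_km(start_range_mm, gate_width_mm, num_gates):
    """Range (km) to the center of each gate along one radial.

    Assumption: start_range_mm is the near edge of gate 0, so the
    center of gate i lies (i + 0.5) gate widths beyond it.
    """
    return [
        (start_range_mm + (i + 0.5) * gate_width_mm) / MM_PER_KM
        for i in range(num_gates)
    ]
```

With a hypothetical 100 m gate width (100000 mm), a StartRange of 0, and the 426 gates of the sample file, the first gate center falls at 0.05 km and the last at 42.55 km.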


d. File Naming and Directory Structure

Each file type has a unique file naming convention and directory structure. The naming convention is abbreviated, in part based on the directory structure architecture (see Fig 2).

Tier 2a – The Tier 2a file naming convention is:

<radar>_<yyyymmdd>-<hhmmss>.netcdf.gz

Example: KCYR_20060820-014841.netcdf.gz, that is, (radar)_(date)-(time).netcdf.gz

The Tier2a directory structure is as follows:

/<radar>/file

Example: /tier2a/KCYR/KCYR_20060820-014841.netcdf.gz
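The Tier 2a convention is regular enough to parse mechanically. The following is a minimal sketch, not part of any CASA tool: the function names and the "/tier2a" root default are hypothetical (the root is taken from the example path), and the timestamp is assumed to be UTC because the archive directories are organized by UTC day.

```python
import re
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Matches <radar>_<yyyymmdd>-<hhmmss>.netcdf.gz
TIER2A_NAME = re.compile(
    r"^(?P<radar>[A-Za-z0-9]+)_(?P<date>\d{8})-(?P<time>\d{6})\.netcdf\.gz$"
)

def parse_tier2a_name(filename):
    """Return (radar, timestamp) for a Tier 2a file name, or raise ValueError."""
    match = TIER2A_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a Tier 2a file name: {filename!r}")
    stamp = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")
    # Assumption: file times are UTC.
    return match["radar"], stamp.replace(tzinfo=timezone.utc)

def tier2a_path(filename, root="/tier2a"):
    """Build the archive path <root>/<radar>/<file> for a Tier 2a file."""
    radar, _ = parse_tier2a_name(filename)
    return str(PurePosixPath(root) / radar / filename)
```

Applied to the example above, `tier2a_path("KCYR_20060820-014841.netcdf.gz")` returns `/tier2a/KCYR/KCYR_20060820-014841.netcdf.gz`.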

Tier 2b – The Tier2b file naming convention is:

<yyyymmdd>-<hhmmss>.netcdf.gz

Example: 20060822-165808.netcdf.gz, that is, (date)-(time).netcdf.gz

The Tier2b directory structure is as follows:

/<radar>/<product>/<tilt>/file

Example: /tier2b/KSAO/Kdp/03.00/20060822-165808.netcdf.gz
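In Tier 2b the radar, product, and tilt move out of the file name and into the directory path. The sketch below builds and parses such paths. It is illustrative only: the function names are hypothetical, the "/tier2b" root comes from the example, and the tilt format (two digits, a point, two decimals) is inferred from the single "03.00" example rather than from a specification.

```python
from datetime import datetime
from pathlib import PurePosixPath

def tier2b_path(radar, product, tilt_deg, stamp, root="/tier2b"):
    """Build <root>/<radar>/<product>/<tilt>/<yyyymmdd>-<hhmmss>.netcdf.gz."""
    tilt = f"{tilt_deg:05.2f}"  # assumed format: 3.0 becomes "03.00"
    name = stamp.strftime("%Y%m%d-%H%M%S") + ".netcdf.gz"
    return str(PurePosixPath(root) / radar / product / tilt / name)

def parse_tier2b_path(path):
    """Split a Tier 2b path into radar, product, tilt (degrees), and time."""
    # The last four components are radar, product, tilt, and file name.
    radar, product, tilt, name = PurePosixPath(path).parts[-4:]
    stamp = datetime.strptime(name, "%Y%m%d-%H%M%S.netcdf.gz")
    return {"radar": radar, "product": product,
            "tilt_deg": float(tilt), "time": stamp}
```

Calling `tier2b_path("KSAO", "Kdp", 3.0, datetime(2006, 8, 22, 16, 58, 8))` reproduces the example path shown above.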

Tier 4 – The Tier 4 directory structure is as follows:

/<radarname>/<year>/<month>/<day>/<multi>/<year>/<month>/<day>

Note that the Tier 4 data file structure may vary with the character of the data.


Figure 2: Data file directory architecture.

[Diagram: the data directory branches into tier2a (site, then files), tier2b (site, then product, then elevation, then files), and tier4 (site, then date, then files).]


e. Quality Control

Several layers of quality control can be applied to the data. These are defined as follows:

RhoHV: Filtering can be applied to the data based upon the dual-polarization variable rhoHV. RhoHV is the correlation between the H and V components. Ideally, this value should be close to 1.

Example: >casaIngestor … -R 0.5.

NormalizedCoherentPower (NCP): Filtering based upon the normalized coherent power (similar to the rhoHV filter).

Example: >casaIngestor … -p 0.6.

Clutter: Filtering based upon the variable GcfState.

Example: >casaIngestor … -F

GateFlags: Filtering based upon a level of accuracy as determined at the node.

Example: >casaIngestor … -G

The variable GcfState is set to 0 if a clear air scan is being done for refractivity. These data do not have clutter removed, as it is the clutter that is used to determine the refractivity. Files with GcfState set to zero are excluded from display and from conversion to Tier2b.

The variable GateFlags is set to either 0 or 1 to identify whether or not a particular sample should be displayed. Data are not used when GateFlags is set to zero. The flag is based upon:

(1) Clutter detection
(2) Signal to noise ratio
(3) Second trip removal
(4) Standard deviation of polarimetric variables

When GateFlags is used for quality control, the dual-polarization filtering (RhoHV, NCP) is no longer needed, because GateFlags already includes dual-polarization information within its accuracy determination.
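The filtering rules in this section can be sketched as plain threshold tests. The code below is not casaIngestor's implementation; it is a hypothetical illustration. It assumes that gates are kept when RhoHV and NCP are at or above their thresholds (the direction is inferred from "ideally, this value should be close to 1"), borrows the 0.5 and 0.6 values from the command-line examples, and treats GcfState as a single file-level value, since the handbook does not say how the per-radial values are combined.

```python
def gate_is_usable(rhohv, ncp, gate_flag=None, rhohv_min=0.5, ncp_min=0.6):
    """Decide whether one gate passes quality control.

    If a gate flag is supplied it takes precedence, because the flag
    already folds in dual-polarization information.
    """
    if gate_flag is not None:
        return gate_flag != 0
    # Assumed direction: keep gates at or above both thresholds.
    return rhohv >= rhohv_min and ncp >= ncp_min

def filter_ray(values, rhohv, ncp, gate_flags=None, missing=None):
    """Replace gates that fail quality control with `missing`."""
    flags = gate_flags if gate_flags is not None else [None] * len(values)
    return [
        v if gate_is_usable(r, p, f) else missing
        for v, r, p, f in zip(values, rhohv, ncp, flags)
    ]

def file_is_displayable(gcf_state):
    """Clear-air refractivity scans (GcfState == 0) are not displayed."""
    return gcf_state != 0
```

Passing `gate_flags` switches the sketch from threshold filtering to flag-based filtering, mirroring the statement that the RhoHV and NCP filters are unnecessary once the gate flag is used.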


III. DATA DISSEMINATION IP1

a. End Users for IP1 data

Principal users of IP1 data include:

- CASA Researchers
- NWS Forecast Office Norman/Warning Decision Training Branch
- Oklahoma Emergency Managers
- CASA Industrial Partners

b. Real-Time Dissemination and Visualization

IP1 has two primary mechanisms for distributing and visualizing CASA data in real-time:

a) WDSS-II (Warning Decision Support System – Integrated Information). A web page has been created that displays all WDSS-II products from CASA in real-time:

http://wdssii.nssl.noaa.gov/web/wdss2/products/radar/casart.shtml

This site is open and may be accessed by the general public.

An array of products is displayed in real-time, each updated every 5 minutes. These products include:

- Merged reflectivity composite of all 4 radars
- Animated movies of merged reflectivity composite (30+ minute loops)
- Individual radar reflectivity and velocity (lowest two elevations)
- Individual radar dual-polarization products (lowest two elevations)
- Individual radar RHIs
- Coincident KTLX and KFDR (NEXRAD) reflectivity and velocity (0.5 deg elevation)

Some sample images from the WDSS-II display are shown in Figure 3.

WDSS-II is a research platform tool developed by the National Severe Storms Laboratory (NSSL) by which radar data may be ingested, quality-controlled, and displayed in real-time. CASA researchers, the NWS, and the CASA Industrial partners are the primary users of the WDSS-II real-time display.


Figure 3: Sample products available from WDSS-II: a) Merged reflectivity composite; b) NEXRAD reflectivity at 0.5 degree elevation; c) SCIT feature detections; and d) Meteorological Command & Control (MC&C) scanning strategy with merged reflectivity composite.


b) OKFirst, Weatherscope – CASA data also may be viewed using Weatherscope. Individual radar products and composite reflectivity are viewed routinely using Weatherscope. In addition, GIS products such as city streets, as well as NEXRAD data may be overlaid for comparison and application. Emergency managers are the primary users viewing the real-time data within Weatherscope.

Figure 4: A snapshot of CASA data as displayed in Weatherscope. This example shows a tornadic thunderstorm collected 8 May 2007 from KSAO (Chickasha).


IV. Archival, Storage and Retrieval

a. Archival and Storage

For research needs, all CASA moment data are archived permanently on tape. A multi-terabyte tape library is available for data storage and archival; it holds 33 tapes with a combined native capacity of 13.2 TB (Fig. 5).

Approximately 28 TB of CASA radar data have been archived since the inception of IP1 in fall of 2006.
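The figures quoted above imply a per-tape capacity and a minimum tape count, which the short calculation below works out. It is a back-of-the-envelope sketch that assumes 1 TB = 1000 GB, equal-capacity tapes, and completely filled tapes; the names are illustrative.

```python
TAPES_IN_LIBRARY = 33
LIBRARY_NATIVE_GB = 13_200   # 13.2 TB, assuming 1 TB = 1000 GB
ARCHIVED_GB = 28_000         # approximately 28 TB archived so far

def native_gb_per_tape(library_gb, tapes):
    """Native capacity of one tape, assuming all tapes are identical."""
    return library_gb // tapes

def tapes_needed(archive_gb, tape_gb):
    """Minimum number of completely filled tapes (ceiling division)."""
    return -(-archive_gb // tape_gb)

tape_gb = native_gb_per_tape(LIBRARY_NATIVE_GB, TAPES_IN_LIBRARY)
min_tapes = tapes_needed(ARCHIVED_GB, tape_gb)
```

Under these assumptions each tape holds 400 GB and the archive needs at least 70 tapes, more than twice the 33 slots in the library, which is consistent with older tapes being moved off-line.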

Figure 5: A photo of the type of tape library utilized for on-line storage of CASA radar data.

b. Data Retrieval

Archived data can be retrieved by all CASA personnel and partners via the CASA IP1 web site (Fig. 6):

http://socc.caps.ou.edu

Archived data can be accessed in several ways:

(1) Nearly all IP1 data collected during severe weather events are left on-line for easy access. A detailed history of each case is included with a link via ftp for immediate downloading of data:

http://socc.caps.ou.edu/cases_09.html

(2) For archived data not associated with a severe weather event, an interactive data request form has been set up for users to request archived data. An electronic copy of this form is sent to the SOCC manager. If the data are on-line, then a link to that data is sent to the user. If the data are off-line, then the SOCC manager will retrieve that data manually, and put it back on-line for the user to retrieve.


(3) For a quick summary of cases collected during a given year, a summary table has been compiled and is available for download from the SOCC web site:

(4) To access a visual archive of previous events, images of the merged reflectivity composite from CASA and NEXRAD have been saved. These images can be readily accessed via the SOCC web site. For example, Feb 10, 2009 data can be seen at:

(5) Finally, a Wikipedia site has been developed for each of the more interesting cases collected from the IP1 testbed. Images and associated materials are collected from various thrusts and shared among all CASA participants:


Fig. 6: The IP1 Systems Operations Control Center (SOCC) web site, designed for easy access for real-time visualization and archive data retrieval.
