Content Framework for Operational Environmental Remote Sensing Data Sets: NPOESS Concepts

Alan M. Goldberg
[email protected]

NOTICE: This technical data was produced for the U.S. Government under Contract No. 50-SPNA-9-00010, and is subject to the Rights in Technical Data - General clause at FAR 52.227-14 (JUN 1987).

© 2004 The MITRE Corporation

Approved for public release; distribution unlimited

This poster (or slide set) describes the main concepts relating data format (HDF5), metadata, and data organization as they are being applied to NPOESS. It covers the motivation for selecting HDF5; the motivation for, and general implementation of, FGDC base metadata and remote sensing extensions; and the current attempt to use lessons learned from HDF-EOS swath and the netCDF Climate & Forecast conventions to guide NPOESS product definition.

NPOESS will collect an unprecedented quantity and variety of satellite environmental data from a constellation of satellites carrying multiple remote sensing and in situ sensors.

NPOESS realized the need for standardization and good metadata in connection with environmental data sets. Global data sets, such as those produced by satellites or used in modeling and simulation, are at the leading edge of information stewardship challenges.

Data sets will be delivered to operational users by pre-subscription through the Interface Data Processing Segment (IDPS) at the four U.S. processing Centrals, and through direct-readout field terminals of the Field Terminal Segment (FTS) worldwide. Near-term and long-term researchers will retrieve data through NOAA’s data archives.

NPOESS committed early to using existing standards in a rational manner to support our mission goals.

– Standards-based data products ease product usability.

– Comprehensive metadata helps maximize the continuing value of large, complex data sets. Standards-based metadata eases product understanding.

– Working from a clean sheet of paper, NPOESS can serve as a “workshop” for best practices in data stewardship.

This paper presents the results of planning for the NPOESS data product delivery system. It presents a view of the optimal data product design, much of which is being implemented in NPOESS.

Motivation

End-to-End Process

[Figure: end-to-end data flow. Labeled elements: Space Segment (NPOESS-unique collection); C3S/DRR; Central element with IDPS (NPOESS-unique processing); FTS with antenna, RF electronics, demodulator, and FT processor element; external interface for other data; user terminal (application processing, display, and dissemination).]

The National Polar-orbiting Operational Environmental Satellite System (NPOESS) is managed by the NOAA Integrated Program Office on behalf of the U.S. Dept. of Commerce, the Dept. of Defense, and NASA. Northrop Grumman Space Technology is the prime contractor, with Raytheon as the subcontractor for ground processing and operations.

Architectural Considerations

Sizing
• Overall data rates have increased by two orders of magnitude
• More products
• Impact on existing user systems and applications
• Decision to “loosely” couple: a simple data interface with flexibility on both sides
• Decided to deliver relatively short-duration data granules, typically containing 30 seconds of data

Users and use patterns: designed to serve operational users, current science users, and future archival researchers
• Operational users need effectively all of the data as soon as possible
• Research users need current environmental information, but usually have time and resources to improve product quality with post-processing
• Archival researchers look for highly selective data sets
• Users are served via Centrals or via field terminals

Sensor complexity
• Multiple versions of the data:
  – the raw bitstreams originating from the sensors (RDRs)
  – calibrated fluxes measured by the sensor (SDRs)
  – environmental variables estimated at the source (EDRs)
• The sensors themselves produce a wide variety of data types
• Various techniques are used to maximize performance within bandwidth

Anticipating change
• Changes to detailed formats, contents, product lists, and interfaces must be accommodated by the NPOESS data product framework throughout the mission lifetime
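The short, fixed-duration granule mentioned under Sizing can be illustrated with a little arithmetic. The sketch below is only an illustration: the 30-second duration comes from the text, while the function names, the epoch argument, and the half-open interval convention are assumptions made for the example.

```python
import math

GRANULE_SECONDS = 30.0  # typical granule duration stated in the text


def granule_index(sample_time_s, epoch_s=0.0, granule_seconds=GRANULE_SECONDS):
    """Return the zero-based number of the granule that contains a sample time."""
    return math.floor((sample_time_s - epoch_s) / granule_seconds)


def granule_bounds(index, epoch_s=0.0, granule_seconds=GRANULE_SECONDS):
    """Return the half-open [start, end) time interval covered by a granule."""
    start = epoch_s + index * granule_seconds
    return start, start + granule_seconds


def granules_for_window(start_s, end_s, epoch_s=0.0, granule_seconds=GRANULE_SECONDS):
    """Return the granule numbers needed to cover the window [start_s, end_s)."""
    if end_s <= start_s:
        return range(0)
    first = granule_index(start_s, epoch_s, granule_seconds)
    # The window end is exclusive, so a window ending exactly on a granule
    # boundary does not pull in the following granule.
    last = math.ceil((end_s - epoch_s) / granule_seconds) - 1
    return range(first, last + 1)
```

A user who wants data for a particular time window can then request only the granules returned by `granules_for_window`, which is one way a loosely coupled interface keeps flexibility on the consumer side.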

Data Products Are More Than Scenes

[Figure: the sensing chain. Labeled elements: environmental source components; sensors (flux manipulation, detection, A/D conversion, filtration, compression, packetization, calibration source, auxiliary sensor data); other subsystems; space segment (data store; CCSDS mux, code, frame, and encryption; comm transmitter); C3S (comm receiver, comm processing, delivered raw data); IDPS (RDR, SDR, and EDR production). The RDR, TDR, SDR, and EDR levels are marked on the chain.]

IDPS produces mission data sets which recreate or estimate signals at four points in the sensing chain. Ancillary data, brought from other systems and used in EDR processing, is captured. Auxiliary data, produced within the NPOESS system to support processing, is kept with the mission data or incorporated in documentation. Metadata is provided for all.

Typical Data Organization

[Figure: typical data organization for RDRs, SDRs, EDRs, and ancillary data. Labeled contents: time series of packet types, binary, headers, multispectral imagery, vector flux, slit spectra, sounding FT spectra, uvw, calibration table, abstract external data (?), data volume, geolocation, thematic layers, quality, imagery, column data.]

First Decision – File Format

The NPOESS program identified several key characteristics for the file format in which data products would be delivered. Hierarchical Data Format version 5 (HDF5) was found to be the best solution within technical and programmatic constraints:

• A single format with proven ability to handle environmental data products (EDRs); more abstract data structures in RDRs, TDRs & SDRs; and other products delivered to users and archives

• Capability to incorporate full metadata

• Backed by the user community and by institutional support, with an adequate practical lifetime; interoperable with DoD standards

• Ability to handle large data sets (such as full orbits) and small data sets (individual granules) with acceptable efficiency

• Ability to handle multiple arrays and array types within the same granule, such as observational data arrays and geolocation arrays

• High efficiency for reading and writing; built-in compression function; capability to “chunk” large data arrays to access prestructured subsets

• Sufficiently self-documenting to permit variable formats

• Available with development tools to expedite file definition and applications

• Acceptable licensing terms

• Supported on all likely user platforms & operating systems

• Support for all likely atomic data types

• Simple data objects and groups which permit application-specific structures to be created

File Structure Implemented in HDF5

[Figure: (A) an HDF5 dataset: a data array plus a header holding the dataspace (rank; dimensions Dim_1=5, Dim_2=4, Dim_3=2), the datatype (int16), the storage layout (chunked; compressed), and attributes (current = 12e-9, temp = 56, time = 32.4). (B) a file holding file attributes and granules, each granule holding granule attributes and datasets.]

HDF5 provides a simple, logical file structure based on a Dataset, comprising a data array and a header which describes the dataspace. Datasets can be structured hierarchically. NPOESS datasets and additional attributes – incorporating the metadata – combine into granules, and granules combine into product files. Granules are concatenated in a file in such a way that they can be addressed either collectively or individually. Individual datasets will be created for elements such as mission data, quality assessment, geospatial location, time, illumination, and viewing geometry. Users may access subsets of the full data using HDF5 utilities.
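The dataset, granule, and file nesting described above, and the idea of addressing granules either individually or collectively, can be modeled in a few lines. This is a plain-Python stand-in under stated assumptions, not the HDF5 library and not the NPOESS implementation: the class names, the `read_granule` and `read_all` methods, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Dataset:
    """A data array plus the header information that describes it."""
    data: List[List[int]]                    # rows along the leading dimension
    datatype: str = "int16"
    attributes: Dict[str, Any] = field(default_factory=dict)

    @property
    def shape(self):
        """Dataspace dimensions of the two-dimensional array."""
        return (len(self.data), len(self.data[0]) if self.data else 0)


@dataclass
class Granule:
    """Named datasets plus the granule-level attributes (including metadata)."""
    datasets: Dict[str, Dataset]
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ProductFile:
    """An ordered run of granules plus file-level attributes."""
    granules: List[Granule]
    attributes: Dict[str, Any] = field(default_factory=dict)

    def read_granule(self, index, name):
        """Individual addressing: one dataset from one granule."""
        return self.granules[index].datasets[name].data

    def read_all(self, name):
        """Collective addressing: the same dataset from every granule,
        joined along the leading dimension in granule order."""
        rows = []
        for granule in self.granules:
            rows.extend(granule.datasets[name].data)
        return rows


# Hypothetical two-granule product with one dataset per granule.
g0 = Granule({"radiance": Dataset([[1, 2], [3, 4]])}, {"granule_id": "G0"})
g1 = Granule({"radiance": Dataset([[5, 6], [7, 8]])}, {"granule_id": "G1"})
product = ProductFile([g0, g1], {"product": "example"})
```

In a real file the same two access patterns would be served by HDF5 groups, datasets, and attributes; the model only shows why keeping each granule self-contained does not prevent reading a whole file as one continuous array.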

Second Decision - Metadata

Based on extensive prior experience in the earth sciences and other sciences, the highest-level NPOESS operational requirements specified that comprehensive metadata would be delivered with the data. In our context, metadata incorporates the following “data about data”:

• Identification

• Content summary

• Content meaning

• Content structure and format

• Acquisition and processing history – provenance

• Distribution and availability

The National Spatial Data Infrastructure (NSDI) provides basic guidance for metadata. The Federal Geographic Data Committee (FGDC) sets content standards. The program establishes compatible extensions to the standard and also defines the representation. Over one hundred metadata items have been defined for NPOESS.

Basic identification metadata is duplicated in a User Block at the front of the file, where it can be read without HDF software.
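Reading the user block without HDF software can be sketched with ordinary file I/O. The HDF5 file format places its 8-byte signature at byte offset 0, 512, 1024, 2048, and so on, so whatever precedes the signature is the user block. Everything else here is an assumption for illustration: the NUL-padded ASCII layout, the key=value text, and the helper names are hypothetical, and the demo file is a stand-in that carries only the signature, not a valid HDF5 file.

```python
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature


def user_block_size(path):
    """Return the number of bytes that precede the HDF5 signature."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= file_size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return offset
            # Candidate offsets are 0, 512, 1024, 2048, ...
            offset = 512 if offset == 0 else offset * 2
    raise ValueError("no HDF5 signature found")


def read_user_block(path):
    """Return the user block as text (assumes NUL-padded ASCII content)."""
    size = user_block_size(path)
    with open(path, "rb") as f:
        raw = f.read(size)
    return raw.rstrip(b"\x00").decode("ascii")


def write_demo_file(path, text, block_size=512):
    """Write a stand-in file: a padded ASCII user block, then the signature."""
    encoded = text.encode("ascii")
    if len(encoded) > block_size:
        raise ValueError("text does not fit in the user block")
    with open(path, "wb") as f:
        f.write(encoded.ljust(block_size, b"\x00") + HDF5_SIGNATURE + b"\x00" * 8)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.h5")
    write_demo_file(demo_path, "Product=EXAMPLE\nGranules=2\n")
    demo_size = user_block_size(demo_path)
    demo_text = read_user_block(demo_path)
```

Because the reader needs nothing beyond byte-level file access, a cataloguing script or a shell tool can identify a product even on a system where no HDF library is installed.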

FGDC Metadata Base & RS Standards

[Figure: FGDC metadata sections. Base Standard: 1 Identification; 2 Data Quality; 3 Spatial Data Organization; 4 Spatial Reference; 5 Entity & Attribute Information; 6 Distribution Information; 7 Metadata Reference Information. RSE Standard: 8 Platform & Mission Information; 9 Instrument Information.]

Concept Analysis Repositories

[Figure: where complete granule metadata is kept. Labels: quasi-static NSDI metadata (eHandbook, external reference); dynamic NSDI metadata; dynamic detail metadata; aggregate metadata, common metadata, and identification (UB) at the file level; unique metadata at the granule level.]

NPOESS will create all applicable metadata identified in the FGDC Base Standard and Remote Sensing Extensions, plus mission-unique items. Comprehensive metadata at the granule level will be captured in different ways, consistent with usage patterns. Metadata which changes infrequently will be collected in an external online reference. Metadata which changes with each granule is saved with that granule. Metadata which describes a file, or all the granules in the file, is extracted and stored with the file.
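The three storage tiers can be illustrated by splitting granule attributes into those shared by every granule (candidates for file-level storage) and those unique to each granule, then rebuilding the complete record on demand. This is a minimal sketch: the function names, attribute keys, and values are invented for the example, and the precedence chosen when merging tiers is an assumption rather than a documented NPOESS rule.

```python
from typing import Any, Dict, List, Tuple

Attrs = Dict[str, Any]


def split_common_metadata(granule_attrs: List[Attrs]) -> Tuple[Attrs, List[Attrs]]:
    """Separate attributes identical in every granule from those that differ."""
    if not granule_attrs:
        return {}, []
    first, rest = granule_attrs[0], granule_attrs[1:]
    common = {
        key: value
        for key, value in first.items()
        if all(key in g and g[key] == value for g in rest)
    }
    unique = [{k: v for k, v in g.items() if k not in common} for g in granule_attrs]
    return common, unique


def complete_metadata(handbook: Attrs, common: Attrs, unique: Attrs) -> Attrs:
    """Rebuild one granule's full metadata from the three tiers.

    Assumed precedence: granule-level values override file-level values,
    which override the external reference.
    """
    merged = dict(handbook)
    merged.update(common)
    merged.update(unique)
    return merged


# Hypothetical content for the external reference and two granules.
handbook = {"platform": "EXAMPLE-SAT", "geometry": "unresampled swath"}
granules = [
    {"sensor": "EXAMPLE-SENSOR", "format_version": "1.0",
     "start_time": "2004-01-01T00:00:00Z", "quality": "good"},
    {"sensor": "EXAMPLE-SENSOR", "format_version": "1.0",
     "start_time": "2004-01-01T00:00:30Z", "quality": "degraded"},
]

common, unique = split_common_metadata(granules)
first_complete = complete_metadata(handbook, common, unique[0])
```

Splitting the record this way stores each fact once, at the level where it actually varies, while any single granule can still be described completely.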

Third Decision – File Organization

Primary characteristics of a design
• Systematic data organization
• Simplifies data maintenance, enhancement, documentation, retrieval, visualization, and exploitation
• NPOESS evaluation looked at lessons learned from EOS, and best practices elsewhere
• Must work with abstract, generalized scientific, and specifically geospatial data sets

Result: Specific derived requirements closely match the “Climate & Forecast Conventions”, primarily developed for use with netCDF. These conventions provide guidelines for consistent, complete, and clear data entity and attribute definition. NPOESS is attempting to implement this data design.

Need clear identification of truly independent and dependent variable contents:
• For most unresampled data, the natural independent variables are the index attributes defining the discrete points in space and time at which the data samples were collected. An index might be a time-series sample number, a detector number, or an energy or spectral bin.
• There is usually an associated independent variable for each index attribute, such as time, direction, position, or energy level. These relate to the indices by calibration, known to be very accurate.
• Some associated independent variables may be multidimensional. E.g., polar geolocation or solar elevation is a deterministic function of spatial indices.
• There is always a primary dependent variable: a function of the independent variable(s). It may be an abstract binary object that is a function of time or place. Usually, it is a well-defined variable which is a function of temporal, spatial, or spectral indices.
• There are often one or more associated dependent variable arrays, such as quality estimates or telemetry values, associated by design with the primary independent variable.
• Optional supplementary arrays, such as calibrations, are functions of one or more index attributes used in the primary and associated data arrays.

Concatenation: One index attribute, usually time or time-like, can often be defined as the index which establishes continuity from one granule to the next.
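The variable roles listed above lend themselves to a simple consistency check: every array declares which index attributes it is a function of, and its shape must agree with the sizes of those indices. The sketch below assumes a hypothetical `Variable` record, invented role names, and example index sizes; it also assumes equal-sized granules when computing a concatenated shape. It is not the Climate & Forecast conventions themselves.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Variable:
    name: str
    role: str                # e.g. "primary", "associated_independent"
    dims: Tuple[str, ...]    # index attributes this array is a function of
    shape: Tuple[int, ...]   # number of samples along each of those indices


def check_granule(index_sizes: Dict[str, int], variables: List[Variable]) -> List[str]:
    """Return a list of problems; an empty list means the layout is consistent."""
    problems = []
    if not any(v.role == "primary" for v in variables):
        problems.append("no primary dependent variable")
    for v in variables:
        if len(v.dims) != len(v.shape):
            problems.append(f"{v.name}: {len(v.dims)} dims but {len(v.shape)} axes")
            continue
        for dim, size in zip(v.dims, v.shape):
            if dim not in index_sizes:
                problems.append(f"{v.name}: unknown index attribute {dim!r}")
            elif index_sizes[dim] != size:
                problems.append(
                    f"{v.name}: axis {dim!r} has {size} samples, "
                    f"expected {index_sizes[dim]}"
                )
    return problems


def concatenated_shape(variable: Variable, concat_dim: str, n_granules: int):
    """Shape after joining equal-sized granules along the continuity index.

    Arrays that are not a function of that index keep their shape.
    """
    return tuple(
        size * n_granules if dim == concat_dim else size
        for dim, size in zip(variable.dims, variable.shape)
    )


# Hypothetical granule layout: 48 scans, 96 pixels per scan, 5 spectral bands.
index_sizes = {"scan": 48, "pixel": 96, "band": 5}
scan_time = Variable("scan_time", "associated_independent", ("scan",), (48,))
latitude = Variable("latitude", "associated_independent", ("scan", "pixel"), (48, 96))
radiance = Variable("radiance", "primary", ("scan", "pixel", "band"), (48, 96, 5))
quality = Variable("quality", "associated_dependent", ("scan", "pixel"), (48, 96))
gain = Variable("gain", "supplementary", ("band",), (5,))
variables = [scan_time, latitude, radiance, quality, gain]

bad_quality = Variable("quality", "associated_dependent", ("scan", "pixel"), (48, 95))
```

Here `scan` plays the time-like role: joining three granules grows `radiance` along that index only, while `gain`, which depends only on `band`, is unchanged.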

Clear Index & Array Definitions

[Figure: a primary index and index attributes with their associated independent variable(s); 2-D independent variable arrays (e.g., lat/lon, sun alt/az, land mask); and the n-dimensional dependent variable (entity) array, the primary array (e.g., flux, brightness, counts, NDVI).]

Putting Together the Framework

[Figure: granule boundaries marked along a time axis across the sensor data arrays, geolocation arrays, sensor data quality flag bits, and supporting timeline data.]

With careful design, and based on lessons learned from previous programs, a comprehensive data product design can be achieved. The design eases development and maintenance by providing a common approach to data and metadata.

Granules form the basic unit of production and cataloguing. They are essentially self-contained.

Each granule contains the primary data arrays, associated data arrays, and descriptive attributes – including metadata – needed to understand the information content.

To facilitate efficient delivery, multiple granules of the same type are combined into product files. Common attributes of all granules in a file may be extracted to the file level as common metadata. Summary and identification attributes are created and added to the file. Basic identification metadata is extracted from the HDF format and saved as an ASCII ‘user block’ at the physical start of the file.

Finally, metadata which changes only rarely is maintained separately as an electronic handbook, and is incorporated in the granules by reference.

[Figure: data product file layout. A user block precedes the HDF file block. The file root holds file attributes (file metadata, common metadata, other file attributes) and a series of granules. Each granule holds granule attributes (granule metadata, other granule attributes) and datasets (index variables, primary array, associated arrays). The eHandbook is shown alongside the data product.]