The Research Data Archive at NCAR Doug Schuster and Steve Worley NCAR


The Research Data Archive at NCAR

Doug Schuster and Steve Worley

NCAR


Topic Outline

- Introduction/History
- Core Data Categories/Featured Datasets
- Archive Management/Tools
- New Supporting IT Infrastructure
- Future Possibilities

1/25/2011 AMS 2011 2


Introduction/History

Data Support Section (Founded 1965)
- Paper -> Punch Cards -> Tapes -> CDs/DVDs -> Hard Drives -> Network Based Storage and Transfer
- KB of observations -> Terabytes of Model Generated Data (Total archive volume over 600 TB)
- Weeks or months for a user to get data -> Users want data access now (over 7000 registered users)
- Pay for Data -> Free and open access to all datasets that aren't subject to source restrictions


Introduction/History

How do we evolve to support the growing needs of data users and generators?
- Stay aware of current research uses
- Strengthen datasets supporting core research data categories
- Update archive management tools
- Rebuild/Augment IT infrastructure
- Educate supporting staff


Core Data Categories

Content to support atmospheric and geosciences research

Some research examples:
- Climate
- Oceanographic
- Hydrologic
- Weather Prediction
- Renewable Energy (Wind/Solar)


Core Data Categories


Operational and Reanalysis model outputs

Meteorological and Oceanographic Observations

Remote Sensing Observations

Topography/Bathymetry, Vegetation, Land Use


Featured Datasets

Platform Observations

Dataset Title | Coverage | Period | Update Frequency
NCEP GDAS observations (PREPBUFR and NetCDF) | Global | 1999 - Present | Daily
RDA Upper Air Database | Global | 1920 - Present | Monthly
NCDC TD3200 U.S. Cooperative Summary of Day | U.S. | 1890 - Present | Monthly
Unidata IDD GTS based observations (NetCDF) | Global | 2002 - Present | Daily
NCEP operational observations (ON-29 Format) | Global | 1975 - 2007 | Fixed
International Comprehensive Ocean-Atmosphere Data Set (ICOADS) | Global | 1662 - Present | Monthly

[Timeline figure: Global Platform Observations, 1662 to 2011]


Featured Datasets

Analysis and Forecast Model Data

Dataset Title | Coverage | Period | Update Frequency
Thorpex Interactive Grand Global Ensemble (TIGGE) | Global | 2006 - Present | Hourly
Unidata IDD (GFS 0.5deg, RUC 20km, NAM 12km) | Global and Regional | 2002 - Present | Daily
NCEP ETA/NAM (40km) | North America | 1995 - Present | Monthly
ECMWF Operational Deterministic (1.25 x 1.25 Deg) | Global | 1985 - Present | Bi-Yearly
NCEP GDAS Final Analysis (1x1 Deg) | Global | 1999 - Present | Daily
NCEP OI Global SST (1x1 Deg) | Global | 1981 - Present | Weekly
NOAA OI Global SST (0.25 x 0.25 Deg) | Global | 1981 - Present | Monthly
Hadley Centre Global Sea Ice and SST | Global | 1850 - Present | Monthly

[Timeline figure: Analysis and Forecast Model Data, 1850 to 2011]


Featured Datasets

High Resolution Re-Analysis

Dataset Title | Coverage | Period | Update Frequency
ERA-40 (T159) | Global | 1957 - 2002 | Static Set
ERA-Interim (N128 Gaussian) | Global | 1989 - Present | Yearly
JRA-25 (1.125 Deg Gaussian) | Global | 1979 - Present | Yearly
NCEP/DOE (T62) | Global | 1979 - Present | Static Set
NCEP/NCAR (T62) | Global | 1948 - Present | Quarterly
NARR (32 x 32 km) | North America | 1979 - Present | Quarterly
CFSR (0.5 x 0.5 Deg) | Global | 1979 - Present | Monthly
NOAA-CIRES 20th Century | Global | 1870 - 2008 | Static Set

[Timeline figure: High Resolution Re-Analysis, 1870 to 2011]


Archive Management

How can we support an archive that continuously grows in volume and complexity with a fixed number of supporting staff?


Archive Management

Common Data Management Tools

Functionality Requirements
- Scalable
- Integrated: one call does all
- Automatable


Archive Management

Common Data Management Tools

Task Completion Requirements
1. Data Acquisition
   - Get Data (daily or irregularly)
2. Data Archival
   - Archive to disk and tape
3. Metadata Collection
   - Collect Metadata
   - Update Metadata Databases
4. Metadata Publishing
   - Update Web Server Pages
   - Update Internal Metadata Access Points
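The four tasks above, combined with the "one call does all" requirement, can be sketched as a single driver function. This is a minimal illustration only, not the RDA implementation: `ArchiveJob`, `run_update`, the step functions, and the file name are all hypothetical, and each step merely records that it ran.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ArchiveJob:
    """State carried through the four tasks for one dataset update (hypothetical)."""
    dataset: str
    files: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


def acquire(job: ArchiveJob) -> None:
    # Task 1: get data. Placeholder: pretend one new file arrived.
    job.files.append(f"{job.dataset}/new_file.grb2")
    job.log.append("acquired")


def archive(job: ArchiveJob) -> None:
    # Task 2: archive to disk and tape. Placeholder: only records the action.
    job.log.append("archived")


def collect_metadata(job: ArchiveJob) -> None:
    # Task 3: collect metadata and update the metadata databases.
    job.log.append("metadata collected")


def publish(job: ArchiveJob) -> None:
    # Task 4: update web server pages and internal metadata access points.
    job.log.append("published")


# The order of this list is the order of the task list on the slide.
STEPS: List[Callable[[ArchiveJob], None]] = [acquire, archive, collect_metadata, publish]


def run_update(dataset: str) -> ArchiveJob:
    """One call runs all four tasks in order, so it can be scheduled unattended."""
    job = ArchiveJob(dataset)
    for step in STEPS:
        step(job)
    return job
```

Keeping the steps in one ordered list is one way to make the workflow automatable: a scheduler only needs to call `run_update` once per dataset.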


Integrated Archival Tools

Step 1: Get Data

[Diagram: data sources feeding the RDA/CISL Servers]

Data sources:
- Model Generated Data: GRIB, NetCDF
- Obs Data: BUFR, ASCII, etc.
- Topography: Vector Image, Binary, etc.
- Remote Sensing Data: Binary

Transfer to RDA/CISL Servers:
- Automated: dsupdt
- Manual: Tape, FTP, etc.


Integrated Archival Tools

Step 2: Archive Data

[Diagram: archival flow on the RDA/CISL Servers]

- Data on the servers: Model Generated Data (GRIB, NetCDF); Obs Data (BUFR, ASCII, etc.); Topography (Vector Image, Binary, etc.); Remote Sensing Data (Binary)
- Example input: Model Generated Data Files, GRIB-2
- Tool: dsarch
- Archive targets: DISK and HPSS, each holding a copy of the Model Generated Data File
- RDA Database: file attribute metadata (Name, Dataset, Location, Format)
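The four file attributes recorded at this step (Name, Dataset, Location, Format) can be pictured as one small database table. The sketch below uses an in-memory SQLite database purely for illustration; the table name, column types, and helper functions are assumptions, not the actual RDA Database schema or the behavior of dsarch.

```python
import sqlite3
from typing import Iterable, List


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with a hypothetical file-attribute table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE file_attributes (
            name     TEXT NOT NULL,
            dataset  TEXT NOT NULL,
            location TEXT NOT NULL,
            format   TEXT NOT NULL,
            PRIMARY KEY (name, location)
        )
        """
    )
    return conn


def register_file(conn: sqlite3.Connection, name: str, dataset: str,
                  locations: Iterable[str], fmt: str) -> None:
    """Record one row per archive copy, e.g. one for DISK and one for HPSS."""
    conn.executemany(
        "INSERT INTO file_attributes (name, dataset, location, format) "
        "VALUES (?, ?, ?, ?)",
        [(name, dataset, loc, fmt) for loc in locations],
    )
    conn.commit()


def locations_of(conn: sqlite3.Connection, name: str) -> List[str]:
    """Return every location that holds a copy of the named file."""
    rows = conn.execute(
        "SELECT location FROM file_attributes WHERE name = ? ORDER BY location",
        (name,),
    )
    return [row[0] for row in rows]
```

A call such as `register_file(conn, "example.grb2", "example_dataset", ["DISK", "HPSS"], "GRIB-2")` would record both archive copies of one file (the file and dataset names here are made up).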


Integrated Archival Tools

Step 3: Collect File Content Metadata/Check Integrity

[Diagram: metadata gathering on the RDA/CISL Servers]

Input: Model Generated File, GRIB-2 Format, containing:
- Temperature (Center, Date, Time, Level, Location)
- Humidity (Center, Date, Time, Level, Location)
- Vorticity (Center, Date, Time, Level, Location)
- Visibility (Center, Date, Time, Level, Location)
- Precip Rate (Center, Date, Time, Level, Location)

Action: Gather Metadata

RDA DB now holds:
- File attribute metadata: Name, Dataset, Location, Format
- File content metadata: T(C,D,T,L,L), RH(C,D,T,L,L), Vort(C,D,T,L,L), Vis(C,D,T,L,L), PcpR(C,D,T,L,L)
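As an illustration of what file content metadata of the form T(C,D,T,L,L) might look like once gathered, the sketch below summarizes already-decoded records per parameter. It assumes the records have been read from the file by some decoder that is not shown; `GridRecord` and `summarize` are hypothetical names, and dates are assumed to be ISO strings so that string comparison orders them correctly.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class GridRecord:
    """One decoded field: parameter(Center, Date, Time, Level, Location)."""
    parameter: str  # short name such as "T" or "RH"
    center: str     # originating center
    date: str       # ISO date, "YYYY-MM-DD"
    time: str       # hour of day, "HH"
    level: str      # vertical level label
    location: str   # grid or region label


def summarize(records: Iterable[GridRecord]) -> Dict[str, dict]:
    """Collapse per-record metadata into one summary entry per parameter."""
    summary: Dict[str, dict] = {}
    for rec in records:
        entry = summary.setdefault(
            rec.parameter,
            {"count": 0, "first": rec.date, "last": rec.date, "levels": set()},
        )
        entry["count"] += 1
        # ISO date strings sort chronologically, so min/max work on strings.
        entry["first"] = min(entry["first"], rec.date)
        entry["last"] = max(entry["last"], rec.date)
        entry["levels"].add(rec.level)
    return summary
```

A summary like this (record count, date range, and levels per parameter) is the kind of information a search tool or subsetting interface could later query without reopening the data file.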


Integrated Archival Tools

Step 4: Publish Metadata and Data

[Diagram: the RDA DB on the RDA/CISL Servers feeding the web server and the computational node]

RDA DB:
- File attribute metadata: Name, Dataset, Location, Format
- File content metadata: T(C,D,T,L,L), RH(C,D,T,L,L), Vort(C,D,T,L,L), Vis(C,D,T,L,L), PcpR(C,D,T,L,L)

RDA Web Server:
- Dynamic File lists
- Data Search tools
- Detailed Content Metadata
- Data Subsetting Interfaces

CISL Computational Node:
- Detailed Metadata for files on disk
- Data Subsetting
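One way to picture a dynamic file list driven by content metadata is a filter over per-file summaries: given a parameter and a date range, return the files worth offering to the user. The sketch below is a hypothetical illustration; `FileEntry` and `file_list` are invented names, and the real web server queries the RDA DB rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class FileEntry:
    """Per-file summary of content metadata (hypothetical layout)."""
    name: str
    parameters: FrozenSet[str]
    first_date: str  # ISO date, "YYYY-MM-DD"
    last_date: str   # ISO date, "YYYY-MM-DD"


def file_list(entries: Iterable[FileEntry], parameter: str,
              start: str, end: str) -> List[str]:
    """Names of files that contain `parameter` and overlap [start, end]."""
    return sorted(
        e.name
        for e in entries
        # Two date ranges overlap when each starts before the other ends.
        if parameter in e.parameters and e.first_date <= end and e.last_date >= start
    )
```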


New Supporting IT/Infrastructure

- Online Disk Upgrades
  - Larger Disk (450 TB)
  - Common Disk Interfaces (webserver and compute nodes)
- Tape Archive Upgrades
  - High Performance Storage System (HPSS)
- Computing Power Upgrades
  - Additional and more powerful servers


New Supporting IT/Infrastructure


Complete User Community

Pros:
- Fast access to online data.
- Access to all RDA metadata.
- Access to RDA data processing services.

Cons:
- Small fraction of RDA online.
- Slow access to offline data.
- Data processing requests take a long time to finish.

NCAR User Community

Pros:
- Access to full RDA.
- Fast computing.

Cons:
- No access to online data.
- Forced to use MSS as a file server: access is too slow.
- No direct access to RDA metadata.


New Supporting IT/Infrastructure


Complete User Community

Improvements:
- Faster access to full RDA.
- Expanded data processing services available.
- Faster turnaround on data processing requests.

NCAR User Community

Improvements:
- Faster access to full RDA.
- Direct access to all RDA metadata.


Future Possibilities


Leverage New IT Infrastructure
- Server side parameter and spatial sub-setting across multiple datasets
  - Model or In-Situ observations
- Data provided in multiple output formats
- Web services based requests (REST, etc.)
- Addition of large and diverse data sets to the RDA.
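To make parameter and spatial sub-setting and a REST-style request concrete, here is a minimal sketch under stated assumptions. Grids are plain nested lists with rows following latitude and columns following longitude; `subset`, `build_request_url`, the query parameter names, and the `example.org` endpoint are all hypothetical and do not describe an existing RDA service.

```python
from typing import Dict, List, Sequence, Tuple
from urllib.parse import urlencode

Grid = List[List[float]]  # rows follow lats, columns follow lons


def subset(fields: Dict[str, Grid], lats: Sequence[float], lons: Sequence[float],
           parameters: Sequence[str], lat_range: Tuple[float, float],
           lon_range: Tuple[float, float]) -> Dict[str, Grid]:
    """Keep only the requested parameters and the grid points inside the box."""
    rows = [i for i, lat in enumerate(lats) if lat_range[0] <= lat <= lat_range[1]]
    cols = [j for j, lon in enumerate(lons) if lon_range[0] <= lon <= lon_range[1]]
    return {p: [[fields[p][i][j] for j in cols] for i in rows] for p in parameters}


def build_request_url(base_url: str, dataset: str, parameters: Sequence[str],
                      lat_range: Tuple[float, float],
                      lon_range: Tuple[float, float], fmt: str) -> str:
    """Encode a subset request as a query string (hypothetical parameter names)."""
    query = urlencode({
        "dataset": dataset,
        "parameters": ",".join(parameters),
        "south": lat_range[0],
        "north": lat_range[1],
        "west": lon_range[0],
        "east": lon_range[1],
        "format": fmt,
    })
    return f"{base_url}?{query}"
```

For example, `build_request_url("https://example.org/subset", "example_dataset", ["T", "RH"], (30.0, 50.0), (-110.0, -90.0), "netcdf")` would describe a two-parameter request over a latitude/longitude box; running the selection on the server means only the requested portion of each file needs to be transferred.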


http://dss.ucar.edu
