ICOADS Archive Practices at NCAR JCOMM ETMC-III 9-12 February 2010 Steven Worley

ICOADS Archive Practices at NCAR

JCOMM ETMC-III

9-12 February 2010

Steven Worley

Topics

• Environment setting
• Data management tools and principles
• ICOADS NCAR Release 2.5 contributions
• Background Collections
• Future Challenges

Environment Setting

ICOADS is part of a larger collection called the Research Data Archive (RDA)

RDA – briefly
• 600+ datasets (atmosphere, ocean, geosciences)
• 4.3M files, 462 TB (primary data)
• 6000+ unique users annually, including ICOADS
• Staff: 7 scientific programmers (M.S. degrees), me, and an administrative assistant

Data management principles

• Always archive 2 copies of observational data
  • 3rd copy at a partner center (disaster recovery)
• Free and open data access world-wide
  • Internet
  • Past – other media, CD-ROMs, tapes, etc.
• Share what we have to build archives
  • E.g. digitization of Maury data in China in exchange for global land surface data

Data Management Tools

• Old System: Specialized software to manage each data input.
  • Inefficient
  • Difficult to scale

• New System: Common RDA tools that homogenize data management.
  • Efficient
  • Scalable

This allows us to upscale the system with a fixed staff. See me afterwards if you have any questions about the details.

[Diagram labels: RDA Metadata Database; Unidata Server; University Server; NWP Server; Online Disk; Tape Storage; GCMD Metadata Server; RDA Data Server; Specialized Software Packages 1, 2, 3 (old system); RDA Data Management Common Tool Set (new system); SCD NCAR]

Data Management tools – a few details

• Common scripting structure to do routine dataset updates (dsupdt)
  • Very tunable
    • Frequency, multiple server priority list, validation
  • Fully integrated with RDADB
    • Users' view is automatically updated and therefore always current
• Common single archiving function (dsarch)
  • Location and copy control (MSS/HPSS storage, and online disk)
  • Fills all DB entries (e.g. file and dataset relationships)
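As a rough illustration of the "multiple server priority list" and "validation" settings listed for dsupdt, the sketch below tries servers in priority order and accepts the first download that passes a validation check. This is not the dsupdt code: the function name, the server names, and the validation rule are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple


def fetch_with_priority(
    servers: Iterable[str],
    fetch: Callable[[str], bytes],
    validate: Callable[[bytes], bool],
) -> Optional[Tuple[str, bytes]]:
    """Return (server, payload) for the first server whose payload validates."""
    for server in servers:
        try:
            payload = fetch(server)
        except OSError:
            # Server unreachable: fall through to the next one in the list.
            continue
        if validate(payload):
            return server, payload
    # No server produced a valid payload.
    return None


def demo_fetch(server: str) -> bytes:
    """Stand-in for a real download (hypothetical server names)."""
    if server == "primary.example":
        raise ConnectionError("server down")  # ConnectionError subclasses OSError
    if server == "mirror-a.example":
        return b""  # empty payload, fails validation below
    return b"IMMA records"


result = fetch_with_priority(
    ["primary.example", "mirror-a.example", "mirror-b.example"],
    demo_fetch,
    validate=lambda payload: len(payload) > 0,
)
print(result)
```

Passing the fetch and validation steps in as functions keeps the priority logic independent of any one protocol or data format.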

Data management tools

• Harvest file level metadata (gatherxml)
  • Handle various formats (GRIB1, GRIB2, netCDF, BUFR, IMMA, ON29, etc.)
  • Save as <xml> and populate DB
  • Benefits
    • Problem detection
    • Versioning, replacement, extension
    • Inventory information
    • Drive better data service for users
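To make the "save as <xml>" step concrete, here is a minimal sketch that summarizes one file's format, time span, and parameters as an XML record. It does not reproduce gatherxml or its schema: the element names, attribute names, and example filename are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Iterable


def file_metadata_xml(
    filename: str, fmt: str, dates: Iterable[date], parameters: Iterable[str]
) -> str:
    """Build a small XML inventory record for one data file (hypothetical schema)."""
    dates = list(dates)
    root = ET.Element("file", name=filename, format=fmt)
    # Record the time span covered by the file.
    period = ET.SubElement(root, "period")
    period.set("start", min(dates).isoformat())
    period.set("end", max(dates).isoformat())
    # One element per distinct parameter, sorted for stable output.
    for name in sorted(set(parameters)):
        ET.SubElement(root, "parameter", name=name)
    return ET.tostring(root, encoding="unicode")


xml_text = file_metadata_xml(
    "example_2008_01.imma",  # hypothetical filename
    "IMMA",
    [date(2008, 1, 31), date(2008, 1, 1)],
    ["SST", "SLP", "SST"],
)
print(xml_text)
```

A record like this can be loaded into a database and compared against earlier harvests, which is one way the problem detection and inventory benefits could be realized.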

Data management tools

• Provide access to data in tape storage archive (dsrqst)
  • Relatively new, not universally available across RDA - yet
  • Delayed mode – with DB control (many details)
  • Why – RDA holds 462 TB
    • 40 TB online – most popular small scale products
    • Access to more products for greater community

ICOADS Release 2.5 contributions @ NCAR

• Data Preparation – format evaluations, translate native formats to IMMA format
  • Moored research buoy delayed mode archives
    • TAO, PIRATA (PMEL, JAMSTEC)
  • World Ocean Database 2005
    • Multiple ocean profile types (NODC)
• Receive/archive ICOADS data processing results
  • NOAA/ESRL does processing - source merging, duplicate elimination, preconditioning deletion and fixes, etc.

ICOADS Release 2.5 contributions @ NCAR

• Create and maintain user data access interfaces
  • File access
    • IMMA and binary (observations, monthly summary statistics)
  • Sub-selection (time, space, parameter)
    • Example coming.
    • Output is ASCII tabular format
    • Runs automatically – nearly all requests completed in 10 minutes
• Keep user metrics
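The sub-selection idea (filter by time, space, and parameter, then write ASCII tabular output) can be sketched as follows. This is only an illustration, not the NCAR interface: the `Observation` class, the tab-separated layout, the "NA" missing-value marker, and the parameter names are assumptions, and the longitude test ignores boxes that cross the dateline.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


@dataclass
class Observation:
    when: date
    lat: float
    lon: float
    values: Dict[str, Optional[float]]  # parameter name -> value (None if missing)


def subselect(
    observations: Iterable[Observation],
    start: date,
    end: date,
    lat_range: Tuple[float, float],
    lon_range: Tuple[float, float],
    parameters: Sequence[str],
) -> List[str]:
    """Return tab-separated lines: a header, then one row per matching observation."""
    lines = ["\t".join(["date", "lat", "lon", *parameters])]
    for ob in observations:
        in_time = start <= ob.when <= end
        in_box = (lat_range[0] <= ob.lat <= lat_range[1]
                  and lon_range[0] <= ob.lon <= lon_range[1])
        if not (in_time and in_box):
            continue
        cells = [ob.when.isoformat(), f"{ob.lat:.2f}", f"{ob.lon:.2f}"]
        for name in parameters:
            value = ob.values.get(name)
            cells.append("NA" if value is None else f"{value:.1f}")
        lines.append("\t".join(cells))
    return lines


sample = [
    Observation(date(2008, 1, 15), 10.0, 150.0, {"SST": 28.4, "SLP": 1010.2}),
    Observation(date(2008, 3, 1), 10.0, 150.0, {"SST": 28.9}),   # outside time window
    Observation(date(2008, 1, 20), 60.0, 150.0, {"SST": 4.1}),   # outside latitude range
]
table = subselect(sample, date(2008, 1, 1), date(2008, 1, 31),
                  (-20.0, 20.0), (120.0, 180.0), ["SST"])
print("\n".join(table))
```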

ICOADS Release 2.5 contributions @ NCAR

• Near-term preliminary extensions to R2.5
  • Beginning with data in 2008 and forward
  • Based on NCEP GTS compilation/merge
  • Runs on day 2 of each month – processes previous month.
  • Create IMMA observations and binary monthly summary statistics
  • Harvest file level metadata
  • Do all archiving of original and processed files
  • Automatically update user interfaces
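The scheduling rule (run on day 2, process the previous month) reduces to a small date calculation. The sketch below is an assumption about how such a rule could be written, not the production script; the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple


def should_run(today: date, run_day: int = 2) -> bool:
    """True on the monthly run day (day 2 in the schedule described above)."""
    return today.day == run_day


def previous_month(run_date: date) -> Tuple[int, int]:
    """Return (year, month) of the month before run_date, handling January."""
    first_of_this_month = run_date.replace(day=1)
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.year, last_of_previous.month


today = date(2010, 1, 2)
if should_run(today):
    year, month = previous_month(today)
    print(f"processing {year}-{month:02d}")
```

Stepping back one day from the first of the month avoids special-casing the year boundary.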

Brief drive through ICOADS @ NCAR

World-wide User Access

File Level Metadata – ICOADS IMMA Example

File Level Metadata – ICOADS IMMA Example

8 pages of information like this

A look at 2009

What is happening in 2009?

World-wide User Access

Similar service for the monthly summary statistics

Who uses the sub-setting interfaces? 2005-2009

58 Countries

Background Collections

• Historical
  • Most complete set of ALL source data used to create ALL ICOADS Releases
    • Beginning in mid-1980s
  • Copies of ALL ICOADS Releases
  • We do not delete any files

Background Collections

• Ongoing / Routine data receipts
  • Format conversions are done at NCDC

Description         Source       Frequency
Marine Surface GTS  NCEP (BUFR)  Monthly
Marine Surface GTS  NCDC (IMMA)  Monthly
SEAS                NCDC (IMMA)  Monthly
Keyed               NCDC (IMMA)  Monthly (nominally)
GCC                 NCDC (IMMA)  Quarterly (nominally)
VOSClim             NCDC (IMMA)  Monthly

Future Challenges

• Eliminate user interface dependency on Java applets – deploy JavaScript instead.
• Support “advanced” ICOADS initiative
  • Bias adjusted / corrected observations
  • Serve as a central DB / handle data ingest
  • Build a user interface
• Continue as a full U.S. partner.