ECMWF A short ODB Training 2007 slide 1

Introduction to Observational DataBase (ODB)

[email protected]@ecmwf.int

25-Apr-2007

Overview

Introduction to ODB

Creating a simple database

Use of the simulobs2odb program

Visualizing data using basic odbviewer

More complex databases

ODB within IFS/4DVAR-system

Manipulating ODB data from Fortran90

A few tools: odbsql, odbdiff, odbcompress, odbdup, odb2netcdf

ODBTk : A GUI-based ODB visualisation toolkit

A separate presentation & demo by Paul Burton

Introduction to ODB

ODB is tailor-made (hierarchical) database software developed at ECMWF to manage very large observational data volumes through the ECMWF IFS/4DVAR system on highly parallel supercomputer systems

ODB also enables flexible post-processing of observational data even on a desktop computer

ODB software is written in C and Fortran90 and is available on virtually any Unix system (and now also on Windows/Cygwin)

The software can be installed from source code (a “tar-ball”), normally in less than an hour

Snapshot of AIRS channel#1873 brightness-T

A snapshot of SATOB/AMV-winds

One month of averaged Br-T : HIRS,channel#4

… Introduction to ODB

An observational database usually contains the following items:

Observation identification, position and time coordinates

Observation value, pressure levels, channel numbers

Various quality control flags

Obs. departures from background and analysis fields

Satellite specific information

Other closely related information

All information can be accessed via the ODB/SQL language and the Fortran90 interface

Direct (read-only) access to ODB data is now also available

No programming effort is needed to “scan” ODB data

Basic components of ODB

ODB/SQL language

Data Definition Language: To describe what data items belong to the database, what their data types are and how they are related (if at all) to each other

Data Query Language: To query and return a subset of data which satisfies certain user specified conditions. This is the key feature of the ODB software !!

Fortran90 interface layer

Data manipulation: create, update & remove data

Execute ODB/SQL queries and retrieve filtered data

Control MPI and OpenMP parallelization

Creating a simple ODB database

We will create a very simple database using text files

The 3 text files describe

Data layout i.e. what data items will go into ODB

Location and time information of observations

Actual observation measurement information for each location at the given pressure levels

Feed these files into the simulobs2odb program

Inspect the data values in the database using odbviewer

Data definition layout : MYDB.ddl

CREATE TABLE hdr AS (
  seqno pk1int,
  obstype pk1int,
  codetype pk1int,
  lat pk9real,
  lon pk9real,
  date yyyymmdd,
  time hhmmss,
  body @LINK,
);

CREATE TABLE body AS (
  entryno pk1int,
  varno pk1int,
  vertco_type pk1int,
  press pk9real,
  obsvalue pk9real,
);

Input file#2 : hdr.txt

#hdr
obstype = 2
codetype = 141
seqno lat lon date time body.len
1 45 -15 20041101 000000 1

Input file#3 : body.txt

#body
entryno varno vertco_type press obsvalue
1 2 1 50000 251.0
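The two input files above can also be generated by a script once there is more than one observation. The following is a minimal Python sketch, not part of the ODB distribution. The file layout is inferred only from the two examples on these slides, and the helper name format_simulobs_table is hypothetical.

```python
from pathlib import Path
import tempfile


def format_simulobs_table(table, constants, columns, rows):
    """Return text laid out like the hdr.txt / body.txt examples.

    The layout is an assumption read off the slides: a '#table' line,
    optional 'name = value' lines, a column-name line, then one line per row.
    """
    lines = [f"#{table}"]
    # Our reading of the 'obstype = 2' lines: a value shared by all rows
    lines += [f"{name} = {value}" for name, value in constants.items()]
    lines.append(" ".join(columns))
    for row in rows:
        lines.append(" ".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


hdr_text = format_simulobs_table(
    "hdr",
    {"obstype": 2, "codetype": 141},
    ["seqno", "lat", "lon", "date", "time", "body.len"],
    [(1, 45, -15, 20041101, "000000", 1)],
)
body_text = format_simulobs_table(
    "body",
    {},
    ["entryno", "varno", "vertco_type", "press", "obsvalue"],
    [(1, 2, 1, 50000, 251.0)],
)

# Write both files into a scratch directory and show their contents
out_dir = Path(tempfile.mkdtemp())
(out_dir / "hdr.txt").write_text(hdr_text)
(out_dir / "body.txt").write_text(body_text)
print(hdr_text + body_text, end="")
```

The time value is passed as a string so that the leading zeros of 000000 survive.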

Running simulobs2odb

Initialize the ODB interactive environment:

use odb

Create the database using the following simple command:

simulobs2odb -l MYDB -i hdr.txt -i body.txt

As a result of these commands, a small database called MYDB has been created. It contains one data pool with two tables, hdr and body, which are linked (related) to each other via the special @LINK data type

It is now easy to extend the database by providing more data, specifying more data items, adding more tables, or all of the above at the same time
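To make the parent/child relationship between hdr and body more concrete, here is a small Python sketch. It assumes that a link boils down to an (offset, length) pair per header row, which is suggested by the body.len column in hdr.txt but is not spelled out on these slides. All data values are hypothetical.

```python
from itertools import accumulate

# Hypothetical data: each header row owns body.len consecutive body rows
hdr = [
    {"seqno": 1, "lat": 45.0, "lon": -15.0, "body.len": 1},
    {"seqno": 2, "lat": 50.0, "lon": 10.0, "body.len": 2},
    {"seqno": 3, "lat": 60.0, "lon": 25.0, "body.len": 0},
]
body = [
    {"entryno": 1, "varno": 2, "press": 50000.0, "obsvalue": 251.0},
    {"entryno": 1, "varno": 2, "press": 85000.0, "obsvalue": 272.5},
    {"entryno": 2, "varno": 2, "press": 50000.0, "obsvalue": 249.8},
]

# Offset of a header's first body row = sum of the lengths before it
lengths = [h["body.len"] for h in hdr]
offsets = [0] + list(accumulate(lengths))[:-1]


def body_rows(i):
    """Body rows that belong to header row i."""
    return body[offsets[i]:offsets[i] + lengths[i]]


def join_hdr_body():
    """Flattened (hdr, body) rows, as a query over both tables would see them."""
    return [
        (h["seqno"], h["lat"], h["lon"], b["press"], b["obsvalue"])
        for i, h in enumerate(hdr)
        for b in body_rows(i)
    ]


for row in join_hdr_body():
    print(row)
```

A header with body.len = 0 simply contributes no rows to the joined result.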

Visualizing with odbviewer

History: odbviewer was originally written as a debugging tool for ODB software development

Linked with the ECMWF graphics package MAGICS/MAGICS++

Displays coverage plots

Also a textual report generator

Displays output of data queries

“Sensitive” to the ODB/SQL language: tries to automatically produce both a coverage plot and a textual report for the user

The textual report itself can be an invaluable source of information for further post-processing tasks

Makes use of the new and more economical tool odbsql

Running odbviewer

Go to the database directory

cd MYDB

Run

odbviewer -q 'SELECT lat,lon,press,obsvalue \
              FROM hdr, body \
              WHERE obstype = 2'

odbviewer coverage plot

Our observation !!

Some odbviewer options

-h                   List of options (gimme some “help” !)
-q 'SQL-stmt'        Provide ODB/SQL statement inline
-v viewname/poolno   Choose SQL name (& optionally pool number)
-p "1-10,12,15"      Choose from a subset of pools
-R                   No radians-to-degrees conversion for (lat,lon)
-r                   Enforce radians-to-degrees conversion
-k                   Show (lat,lon) in degrees even if they were in radians in DB
-c                   Clean start (i.e. recompile all)
-e editor            Choose preferred editor
-e batch             Run in batch mode (same as -e pipe)
-N                   Do not produce a report at all
-I                   Do not show plot immediately
-P projection        Change display projection
-C file.cmap         Supply a color map file
-A plot_area         Choose plotting area
-F                   (en)Force use of the old-style odbviewer over 'odbsql'

More complex databases

In reality, databases usually contain many more tables (>>5) than in the simple example earlier

Each table can contain 10 to 50 data columns

There can also be a sophisticated data hierarchy (see the next slide) to describe potentially quite complex relationships between tables

To provide good parallel performance on supercomputers, data tables are further divided into data pools, which also enables parallel I/O:

They behave like sub-databases within a database

Allows much bigger data sets than otherwise possible
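The idea of data pools as independent sub-databases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the round-robin distribution and the thread pool are arbitrary stand-ins (the slides mention MPI and OpenMP for the real system), and the observation values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_pools(rows, npools):
    """Distribute rows over npools partitions (round-robin is an arbitrary choice)."""
    pools = [[] for _ in range(npools)]
    for i, row in enumerate(rows):
        pools[i % npools].append(row)
    return pools


def pool_stats(pool):
    """Partial result computed independently on one pool: (count, sum of obsvalue)."""
    return len(pool), sum(r["obsvalue"] for r in pool)


# Hypothetical observations
rows = [{"seqno": i, "obsvalue": 250.0 + i} for i in range(10)]
pools = split_into_pools(rows, npools=4)

# Each pool can be handled by a separate worker; partial results are merged afterwards
with ThreadPoolExecutor(max_workers=4) as ex:
    partials = list(ex.map(pool_stats, pools))

count = sum(n for n, _ in partials)
mean = sum(s for _, s in partials) / count
print(count, mean)
```

Because each worker only touches its own pool, no pool needs to know about the others until the partial results are combined.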

Comprehensive data hierarchy

ODB within IFS/4DVAR-system

ECMA/ODB

CCMA/ODB

Output BUFRs

AMSU-A data before screening

AMSU-A data after screening

Under 10% left active !!

Typical ODB usage at ECMWF …

A database can be created interactively or in batch mode

We usually run our in-house BUFR2ODB in batch mode

New observation types can also be fed in via text file

For complete database manipulation the Fortran90 interface is preferred, but any read-only database can also be accessed via a rudimentary client-server interface (C/C++)

Another possibility is to run the new tool, odbsql

No need to use the ODB/SQL compilation system

No need to write a single line of Fortran90

The tool is under development

… Typical ODB usage at ECMWF

Once the database has been created, the application program queries data via precompiled ODB/SQL and places the result data (also known as a view) into a data matrix allocated by the user program

There can be virtually any number of active views at any given time. These can be updated and fed back to the database

Thanks to ODB, the use of WMO BUFR has been minimized at ECMWF in order to enable faster and more robust processing of observations

ECMWF BUFR to ODB conversion

ODBs at ECMWF are normally created by using bufr2odb

Enables efficient MPI-parallel database creation

Allows retrospective inspection of Feedback BUFR data by converting it into ODB (slow & not all data in BUFR)

bufr2odb can also be used interactively, for example:

bufr2odb -i bufr_input_file -I 1-20 -n 4

The preceding example creates 4 pools of the ECMA database from the given BUFR input file, but includes only BUFR subtypes from 1 to 20 (inclusive)

Feedback BUFR to ODB works similarly:

fb2odb -i feedback_bufr_file -n 8 -u 2

Manipulating ODB from Fortran90

Currently Fortran90 is the only way to fill an ODB database

simulobs2odb is also a Fortran90 program underneath

As is odbviewer and practically any other ODB tool

Also: to fetch and update data, Fortran90 is necessary

ODB Fortran90 interface layer offers a comprehensive set of functions to

Open & close database

Attach to & execute precompiled ODB/SQL queries

Load, update & store queried data

An example ODB program

program main
  use odb_module
  implicit none
  integer(4) :: h, rc, nra, nrows, ncols, npools, j, jp
  real(8), allocatable :: x(:,:)
  npools = 0
  h = ODB_open('MYDB', 'OLD', npools=npools)
  ! < data manipulation loop ; see next page >
  rc = ODB_close(h, save=.TRUE.)
end program main

Data manipulation loop

DO jp=1,npools
  ! Execute SQL, allocate space, get data into matrix
  rc = ODB_select(h,'sqlview',nrows,ncols,poolno=jp)
  allocate(x(nrows,0:ncols))
  rc = ODB_get(h,'sqlview',x,nrows,ncols,poolno=jp)
  ! Update data, put back to DB, deallocate space
  call update(x,nrows,ncols) ! Not an ODB-routine
  rc = ODB_put(h,'sqlview',x,nrows,ncols,poolno=jp)
  deallocate(x)
  rc = ODB_cancel(h,'sqlview',poolno=jp)
  ! Use the following only with READONLY-databases
  ! rc = ODB_release(h,poolno=jp)
ENDDO

Compile, link and run

(1) use odb                                        # once per session
(2) odbcomp MYDB.ddl                               # once only; often from file MYDB.sch
(3) odbcomp sqlview.sql                            # recompile only when changed
(4) odbf90 main.F90 update.F90 -lMYDB -o main.x    # link
(5) ./main.x                                       # run

ODB/SQL compilation system

odbsql

A new tool to access ODB data in read-only mode

Does not generate C code, but dives directly into the data

Usually faster than generated C code, with the exception of accessing large amounts of satellite data (being investigated)

The tool is under active development right now

Usage:

odbsql -q 'SELECT column(s) FROM table(s) WHERE …' \
       -s starting_row -n number_of_rows_to_display \
       [-X] [other_options]

ODB/SQL – examples (1)

SET $t2m = 39;    // Scalar parameters, whose values …
SET $synop = 1;   // … can be overridden in Fortran90

CREATE VIEW t2m AS
  SELECT an_depar, fg_depar, lat, lon, obsvalue
    FROM hdr, body
   WHERE obstype = $synop          // Give me synops
     AND varno@body = $t2m         // Give me 2 meter temperatures
     AND obsvalue is not NULL ;    // Don’t want missing data
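For readers without an ODB installation, the logic of this view can be tried out with Python's built-in sqlite3 module. This is an analogue, not ODB/SQL: SQLite has no @LINK, so the sketch adds an explicit hdr_seqno key to body (an assumption), uses named parameters in place of the $-variables, and fills the tables with hypothetical rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE hdr  (seqno INTEGER PRIMARY KEY, obstype INTEGER, lat REAL, lon REAL);
CREATE TABLE body (hdr_seqno INTEGER, varno INTEGER, obsvalue REAL,
                   an_depar REAL, fg_depar REAL);
""")
# Hypothetical rows: one obstype-1 report and one obstype-2 report
con.executemany("INSERT INTO hdr VALUES (?,?,?,?)", [
    (1, 1, 60.2, 24.9),
    (2, 2, 45.0, -15.0),
])
con.executemany("INSERT INTO body VALUES (?,?,?,?,?)", [
    (1, 39, 275.3, 0.2, 0.5),     # varno 39 with a value
    (1, 39, None, None, None),    # varno 39, missing value
    (1, 1, 10100.0, 1.0, 2.0),    # some other varno
    (2, 2, 251.0, 0.1, 0.3),
])

# hdr_seqno stands in for the hdr-to-body link (an assumption of this sketch)
QUERY = """
SELECT an_depar, fg_depar, lat, lon, obsvalue
  FROM hdr JOIN body ON body.hdr_seqno = hdr.seqno
 WHERE obstype = :synop
   AND varno = :t2m
   AND obsvalue IS NOT NULL
"""


def t2m_view(synop=1, t2m=39):
    """Run the query; the defaults mirror the SET statements and can be overridden."""
    return con.execute(QUERY, {"synop": synop, "t2m": t2m}).fetchall()


print(t2m_view())
```

Calling t2m_view(synop=2, t2m=2) reuses the same query text with different parameter values, which is the role the overridable $-variables play in the view definition.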

ODB/SQL – examples (2)

SELECT count(*), avg(obsvalue), stdev(fg_depar)
  FROM hdr, body
 WHERE obstype = $synop && varno = $t2m AND obsvalue IS NOT NULL;

// Observation count per (obstype,codetype)-pair :
SELECT obstype, codetype, count(*)
  FROM hdr ;

SELECT varno, avg(fg_depar), CORR(fg_depar, an_depar)
  FROM body
 WHERE fg_depar is NOT null ;
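What the aggregate functions in these queries compute can be checked by hand on a few numbers. The Python sketch below uses hypothetical departures and makes two assumptions: that CORR is the ordinary Pearson correlation coefficient, and that the sample (n-1) form of the standard deviation is intended. The slides do not state which definitions ODB uses.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical first-guess and analysis departures; None marks missing data
fg_depar = [0.5, -0.2, 0.1, 0.8, None, -0.4]
an_depar = [0.2, -0.1, 0.0, 0.3, None, -0.2]

# Equivalent of: WHERE fg_depar IS NOT NULL
pairs = [(f, a) for f, a in zip(fg_depar, an_depar) if f is not None]
fg = [f for f, _ in pairs]
an = [a for _, a in pairs]


def corr(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# count(*), avg(fg_depar), stdev(fg_depar), CORR(fg_depar, an_depar)
print(len(fg), mean(fg), stdev(fg), corr(fg, an))
```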

odbdiff

Enables comparison of two ODB databases for differences

A very useful tool when trying to identify errors/differences between operational and experimental 4DVAR runs

Usually a non-trivial task

Usage:

odbdiff -q 'SELECT …' /dir1/DATABASE1 /dir2/DATABASE2

By default the command brings up an xdiff window showing the differences

If latitude and longitude are also given in the data query, it also produces a difference plot using the odbviewer tool

odbcompress

Creates very compact databases from existing ones for

archiving purposes, or

a smaller database footprint (disk occupancy)

Makes post-processing considerably faster

The user can choose to

Truncate the data precision, and/or

Leave out columns that are of lesser importance

Typical compression ratios vary between 2.5x and 11x

The high end is achieved for satellite data !!

odbdup/odbmerge

Allows, for example, database sharing between multiple users

Over shared (e.g. NFS, Lustre, GPFS, GFS) disks

Duplicates [merges] database(s) by copying metadata (low in volume), but shares the actual (high volume) binary data

Also enables creation of a time-series database, for example:

odbdup -i "200701*/ECMA.conv" -o USERDB

The previous example creates a new database labelled USERDB, which presumably spans all conventional observations during January 2007

The main point: the user now has access to a whole month of data as if it were a single database !!

odb2netcdf

Translates the result of a given ODB query (or a whole ODB table) into a series of NetCDF files, by default one file for each ODB data pool (i.e. partition)

Usage:

odb2netcdf -q 'SELECT …' [-p pool_number] [-P]

The result files can be viewed with standard NetCDF tools like ncdump and ncview

The files can also be created in the NetCDF packed format (caveat: truncated data precision) if the -P option is used
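The precision caveat of packed data can be illustrated with the common NetCDF packing convention, in which values are stored as short integers together with scale_factor and add_offset attributes. The Python sketch below assumes this convention; whether odb2netcdf -P packs exactly this way is not stated on the slides, and the observation values are hypothetical.

```python
import struct


def pack_int16(values):
    """Pack floats into 16-bit integers using a linear scale_factor/add_offset mapping."""
    lo, hi = min(values), max(values)
    add_offset = (hi + lo) / 2.0
    # Map [lo, hi] onto the signed 16-bit range; fall back to 1.0 for a constant column
    scale_factor = (hi - lo) / 65534.0 or 1.0
    packed = [round((v - add_offset) / scale_factor) for v in values]
    return packed, scale_factor, add_offset


def unpack(packed, scale_factor, add_offset):
    """Recover approximate float values from the packed integers."""
    return [p * scale_factor + add_offset for p in packed]


obsvalue = [251.0, 249.8, 272.5, 263.125, 255.55]   # hypothetical values
packed, sf, off = pack_int16(obsvalue)
restored = unpack(packed, sf, off)

raw_bytes = struct.pack(f"<{len(obsvalue)}d", *obsvalue)   # 8 bytes per value
packed_bytes = struct.pack(f"<{len(packed)}h", *packed)    # 2 bytes per value
max_err = max(abs(a - b) for a, b in zip(obsvalue, restored))
print(len(raw_bytes), len(packed_bytes), max_err)
```

The packed values take a quarter of the space of 8-byte reals, and the worst-case error is about half of scale_factor, which is the truncation the caveat refers to.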

Some interesting facts on ODB

Written mainly in C

Except Fortran90-interface and IFS/4DVAR interface

Except BUFRODB (by Milan Dragosavac, ECMWF)

ODB/SQL is currently converted into C-code

10 lines of SQL generate >> 100 lines of C code

Standalone ODB installation (w/o IFS) is also available

Tested at least on the following machines

SGI/Altix, IBM Power3/4/5, Linux Intel/AMD

Fujitsu VPPs, NEC SX, Cray XT3/4

Automatic binary data conversion guarantees database portability between different machines

… and some ODB “limitations”

ODB software is clearly meant for large-scale computation since, given lots of memory and disk space and fast CPUs:

A single program can handle up to 2^31 ODB databases

A single database can have up to 2^31 data pools

A single database can have any number of tables

A single table in a data pool can have up to 2^31 rows and (by default) 9999 columns

A single ODB/SQL-query over active data pools can retrieve up to 2^31 rows in one go

These really big numbers show that ODB’s potential is on parallel computers. Yet we haven’t forgotten the PCs!

Finally…

ODB software was developed to allow unprecedented amounts of satellite data to pass through the IFS/4DVAR system

The software has been operational at ECMWF since June 2000, but is still evolving

Emphasis is now on graphical post-processing and how to enable fast access to very large amounts of data

Who is using ODB outside ECMWF ? At least …

MeteoFrance, Hungarian MS, SMHI, FMI

Aladin and some HIRLAM nations

Australian Bureau of Meteorology

University of Vienna via re-analysis ERA40 collaboration