
www.medcordex.eu

Med-CORDEX database

Med-CORDEX database =
= netcdf files + their info
= file system + relational database
= XFS + MySQL db
= file server + LAMP server (Linux, Apache, MySQL and PHP)


file server


NetApp FAS3240 HA Storage System

dual controller
RAID-DP technology (two simultaneous disk failures tolerated)

Environment:

dual power supply (one coming from a UPS)
air-conditioned room


LAMP server

HP DL575G7 Linux server

SLES 11 SP2 operating system
no users: the machine is dedicated to acting as a web server (not only for the Med-CORDEX database)

Software: Apache 2.4.6, PHP 5.5.10, Tomcat 7.0.52, JVM 1.7.0_55, MySQL 5.0.96, pure-ftpd 1.0.36

Environment:

dual power supply (one coming from a UPS)
air-conditioned room



paths & filenames

ATMOSPHERIC DATA

• PATH
/MEDCORDEX/<Domain>/<Institution>/<GCMModelName>/<CMIP5ExperimentName>/<CMIP5EnsembleMember>/<RCMModelName>/<RCMVersionID>/<Frequency>/<VariableName>

Our PATH shortcut: /MEDCORDEX/ALL (files are not listable)

• FILENAME
VariableName_Domain_GCMModelName_CMIP5ExperimentName_CMIP5EnsembleMember_RCMModelName_RCMVersionID_Frequency[_StartTime-EndTime].nc

According to "CORDEX Archive Design" by O. B. Christensen, W. J. Gutowski, G. Nikulin and S. Legutke

http://cordex.dmi.dk
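The relation between FILENAME and PATH can be sketched in a few lines of Python. This is only an illustration of the naming scheme above, not the code used on the server: the function names are made up here, and the example file name uses placeholder model names (GCM-X, RCM-Y). It assumes that no token contains an underscore.

```python
from pathlib import PurePosixPath

# Token order taken from the FILENAME pattern on the slide.
TOKEN_NAMES = [
    "VariableName", "Domain", "GCMModelName", "CMIP5ExperimentName",
    "CMIP5EnsembleMember", "RCMModelName", "RCMVersionID", "Frequency",
]


def parse_filename(fname):
    """Split a CORDEX-style file name into its named tokens."""
    if not fname.endswith(".nc"):
        raise ValueError(f"not a .nc file: {fname}")
    parts = fname[:-3].split("_")
    if len(parts) not in (len(TOKEN_NAMES), len(TOKEN_NAMES) + 1):
        raise ValueError(f"unexpected number of tokens in {fname}")
    tokens = dict(zip(TOKEN_NAMES, parts))
    if len(parts) == len(TOKEN_NAMES) + 1:
        # Optional trailing token: StartTime-EndTime
        start, sep, end = parts[-1].partition("-")
        if not (start and sep and end):
            raise ValueError(f"bad StartTime-EndTime token in {fname}")
        tokens["StartTime"], tokens["EndTime"] = start, end
    return tokens


def build_path(tokens, institution, root="/MEDCORDEX"):
    """Assemble the directory PATH; Institution is not part of the file name."""
    return str(PurePosixPath(
        root, tokens["Domain"], institution, tokens["GCMModelName"],
        tokens["CMIP5ExperimentName"], tokens["CMIP5EnsembleMember"],
        tokens["RCMModelName"], tokens["RCMVersionID"],
        tokens["Frequency"], tokens["VariableName"],
    ))


# Placeholder file name, for illustration only.
example = "tas_MED-11_GCM-X_historical_r1i1p1_RCM-Y_v1_day_19500101-19501231.nc"
print(build_path(parse_filename(example), "ENEA"))
# /MEDCORDEX/MED-11/ENEA/GCM-X/historical/r1i1p1/RCM-Y/v1/day/tas
```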


paths & filenames

OCEAN DATA

No standard has been defined yet (AFAIK). Shall we use

http://cmip-pcmdi.llnl.gov/cmip5/output_req.html#req_list ?


paths & filenames

All tokens that form the PATH are derived from the FILENAME, except the Institution, which is the name of the directory where each data provider places its files

e.g. /incoming_MEDCORDEX/ENEA → ENEA

In the db we store all tokens plus one more piece of information: the realm, which is atmosphere or ocean. The realm is deduced from the VariableName.

THUS WE HAVE A CONSTRAINT!

Variable names must ALL be unique, regardless of the realm they belong to!
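Why the uniqueness constraint follows from deducing the realm can be seen in a small Python sketch. The variable lists below are illustrative only (the real tables are not given in the slides), and the function names are hypothetical.

```python
from pathlib import PurePosixPath


def build_realm_lookup(atmosphere_vars, ocean_vars):
    """Map each variable name to its realm, refusing names used in both realms."""
    clash = set(atmosphere_vars) & set(ocean_vars)
    if clash:
        # A shared name would make the realm impossible to deduce.
        raise ValueError(f"variable names used in both realms: {sorted(clash)}")
    lookup = {name: "atmosphere" for name in atmosphere_vars}
    lookup.update({name: "ocean" for name in ocean_vars})
    return lookup


def institution_from_dir(incoming_dir):
    """The Institution token is the last component of the incoming directory."""
    return PurePosixPath(incoming_dir).name


# Illustrative variable lists only.
REALM = build_realm_lookup({"tas", "pr"}, {"tos", "sos"})
print(REALM["tas"])                                      # atmosphere
print(institution_from_dir("/incoming_MEDCORDEX/ENEA"))  # ENEA
```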


uploading files

Data providers with data to upload can use ANY ftp client:

ftp ftp://user:[email protected] /incoming_MEDCORDEX/$INST
mput *.nc (all files into the same flat dir)
put PLEASEGO.txt (any size, even empty)

where $INST is the code of their institution (e.g. ENEA)

They then wait for the automatic daily procedure to start (at 20:00)
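Since any ftp client works, the same upload can also be scripted. The following is a minimal sketch using Python's standard ftplib; the function name is made up here, and `ftp` is assumed to be an already logged-in `ftplib.FTP` object (host and credentials are left to the caller). Sending PLEASEGO.txt last mirrors the order of the commands above, presumably so that the daily procedure only picks up complete batches.

```python
import io
import os


def upload_batch(ftp, inst, nc_files):
    """Upload NetCDF files, then the PLEASEGO.txt marker, to the provider's dir.

    `ftp` is an already logged-in ftplib.FTP (or compatible) object.
    """
    ftp.cwd(f"/incoming_MEDCORDEX/{inst}")
    for path in nc_files:
        with open(path, "rb") as fh:
            # All files go into the same flat directory.
            ftp.storbinary(f"STOR {os.path.basename(path)}", fh)
    # The marker may be empty; it is sent after the data files.
    ftp.storbinary("STOR PLEASEGO.txt", io.BytesIO(b""))
```

A caller would create the connection with `ftplib.FTP(host)` followed by `ftp.login(user, password)` and then pass it to `upload_batch`.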


ingesting files

The "ingesting procedure" runs automatically every day at 20:00

For each dir /incoming_MEDCORDEX/$INST containing PLEASEGO.txt, and for each other file in that dir, the procedure:

1. verifies that it is a netcdf file (ncdump -h works properly)
2. splits the filename into tokens and checks their compliance with the CORDEX standard
3. checks the validity of the variable name (it is already known)
4. creates the right $PATH in /MEDCORDEX
5. moves the file into its $PATH
6. inserts/updates the file's record in the db (including the ncdump -h output)

(continued)
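The six steps above can be sketched roughly as follows. This is an assumption-laden outline, not the actual procedure: sqlite3 stands in for the MySQL db, the table holds only a few of the recorded fields, the compliance check is reduced to counting tokens, and all function names are hypothetical. What happens to PLEASEGO.txt afterwards is not stated in the slides, so the sketch leaves it in place.

```python
import shutil
import sqlite3
import subprocess
from pathlib import Path


def create_table(db):
    """Minimal stand-in for the files table (sqlite3 instead of MySQL)."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT, fname TEXT, size INTEGER, ncdump TEXT, "
        "PRIMARY KEY (path, fname))"
    )


def ncdump_header(path):
    """Return the `ncdump -h` output, or None if ncdump rejects the file."""
    result = subprocess.run(
        ["ncdump", "-h", str(path)], capture_output=True, text=True
    )
    return result.stdout if result.returncode == 0 else None


def ingest_dir(inst_dir, root, db, known_vars, header=ncdump_header):
    """Ingest one provider directory; return the log lines for the email."""
    inst_dir = Path(inst_dir)
    log = []
    if not (inst_dir / "PLEASEGO.txt").exists():
        return log  # provider has not asked for ingestion
    for f in sorted(inst_dir.iterdir()):
        if f.name == "PLEASEGO.txt":
            continue
        dump = header(f)                                  # step 1
        if dump is None:
            log.append(f"{f.name}: not a netcdf file")
            continue
        parts = f.stem.split("_")                         # step 2 (token count only)
        if f.suffix != ".nc" or len(parts) not in (8, 9):
            log.append(f"{f.name}: file name not CORDEX compliant")
            continue
        var, dom, gcm, exp, ens, rcm, ver, freq = parts[:8]
        if var not in known_vars:                         # step 3
            log.append(f"{f.name}: unknown variable {var}")
            continue
        dest = Path(root, dom, inst_dir.name, gcm, exp, ens, rcm, ver, freq, var)
        dest.mkdir(parents=True, exist_ok=True)           # step 4
        target = dest / f.name
        shutil.move(str(f), str(target))                  # step 5
        db.execute(                                       # step 6
            "INSERT OR REPLACE INTO files (path, fname, size, ncdump) "
            "VALUES (?, ?, ?, ?)",
            (str(dest), f.name, target.stat().st_size, dump),
        )
        log.append(f"{f.name}: ingested")
    db.commit()
    return log
```

Passing the header check in as a parameter keeps the sketch testable on a machine without ncdump installed.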


ingesting files

When all of a data provider's files have been processed, an email is sent to him/her with the log of what happened while ingesting his/her data

After ingesting all files of all data providers, the procedure:

1. computes some statistics and publishes them on www.medcordex.eu/stats (taking figures from the db & ftp logs)
2. makes all links in /MEDCORDEX/ALL
3. copies the whole /MEDCORDEX directory to another host
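Step 2 might look like the sketch below. It assumes that the links are symbolic links, one per file, collected in the flat /MEDCORDEX/ALL directory, and that file names are unique across the whole tree; neither detail is confirmed by the slides, and the function name is hypothetical.

```python
import os
from pathlib import Path


def rebuild_all_links(root):
    """Recreate root/ALL as a flat directory of symlinks to every .nc file."""
    root = Path(root)
    all_dir = root / "ALL"
    all_dir.mkdir(exist_ok=True)
    # Drop links left over from the previous run.
    for old in all_dir.iterdir():
        if old.is_symlink():
            old.unlink()
    count = 0
    for nc in sorted(root.rglob("*.nc")):
        if all_dir in nc.parents:
            continue  # skip anything that already lives inside ALL
        # Relative targets keep the links valid if the tree is copied elsewhere.
        os.symlink(os.path.relpath(nc, all_dir), all_dir / nc.name)
        count += 1
    return count
```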


downloading files

• FTP Server (can be accessed by any ftp client)
• THREDDS Data Server (software by unidata.ucar.edu)


                       credentials                         U / D   server
data providers         ready                               U D     FTP
                                                           D       THREDDS
authorized users       web request                         D       FTP THREDDS
HyMeX database users   their own Mistrals db credentials   D       FTP *


downloading data (using any FTP client)

cmd line:
ftp $f/$p/ ; dir ; get filen.nc ("dir" not in /ALL)
ncftp -u $hymex www.medcordex.eu ; cd $p ; get filen.nc
wget $f/$p/file.nc
wget -r $f/$p (recursive get, not in /ALL)

browser:
$f/$p
$f/$p/filen.nc

where:
$f = ftp://user:[email protected]
$p = MEDCORDEX/MED-xx/…/…/….
$p = MEDCORDEX/ALL



downloading data (using THREDDS)


services: (password required only to get netcdf files)

OPeNDAP: use files remotely, download them
HTTP server: download files
netcdf subset: select & download sections of each file
WCS (Web Coverage Service): serves data to WCS clients
WMS (Web Map Service): serves data to WMS clients
NCML (NetCDF Markup Language): to define a CDM dataset
ISO: description of the file in ISO 19115(-2) metadata
UDDC (Unidata Attribute Convention for Data Discovery): provides recommendations for netCDF attributes that can be added to netCDF files


downloading data (using THREDDS)

cmd line:
ncdump -h $t/dodsC/$p/file.nc
cdo showdate $t/dodsC/$p/file.nc
cdo copy $t/dodsC/$p/file.nc local.nc
ferret: use $t/dodsC/$p/file.nc

tested with: netcdf 4.3.1.1, cdo 1.6.4rc6, ferret 6.9

browser: www.medcordex.eu/tds (MEDCORDEX/ALL is invisible)

where:
$p = MEDCORDEX/MED-xx/…/…/….
$p = MEDCORDEX/ALL
$t = https://user:[email protected]:8290/medcordex
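The OPeNDAP address used in the commands above is simply $t, the dodsC service prefix, $p and the file name joined with slashes. A tiny Python helper makes the composition explicit; the function name and the example.org host are placeholders, not the real server address.

```python
def opendap_url(t, p, filename):
    """Compose $t/dodsC/$p/file.nc from its three parts."""
    return f"{t.rstrip('/')}/dodsC/{p.strip('/')}/{filename}"


# Placeholder base URL, for illustration only.
url = opendap_url("https://example.org:8290/medcordex", "MEDCORDEX/ALL", "file.nc")
print(url)  # https://example.org:8290/medcordex/dodsC/MEDCORDEX/ALL/file.nc
```

The resulting string can be passed to `ncdump -h`, `cdo` or ferret exactly as in the command lines above.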


db fields

for each ingested netcdf file the following are recorded:

code
path
fname
size
ncdump
realm

Institution
VariableName
Domain
GCMModelName
CMIP5ExperimentName
CMIP5EnsembleMember
RCMModelName
RCMVersionID
RCMmodel
Frequency
StartTime
EndTime


statistics as of May 22, 2014

          netcdf files     size in GB
CMCC            5896             90.5
CNRM       3°   7803       1°   493.5
ENEA       2°  14023             97.7
GUF        1°  62784       3°   303.6
ICPT            5404            101.1
INSTM            160              0.2
IPSL            1606            113.7
LMD              739       2°   429.0
UCL             1012            101.8

Total          99427           1732.0
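The file total and the 1°/2°/3° rankings can be cross-checked with a few lines of Python. The figures are copied from the table above; the helper name is made up for this sketch.

```python
# (netcdf files, size in GB) per institution, as of May 22, 2014.
STATS = {
    "CMCC": (5896, 90.5),
    "CNRM": (7803, 493.5),
    "ENEA": (14023, 97.7),
    "GUF": (62784, 303.6),
    "ICPT": (5404, 101.1),
    "INSTM": (160, 0.2),
    "IPSL": (1606, 113.7),
    "LMD": (739, 429.0),
    "UCL": (1012, 101.8),
}


def top3(column):
    """Return the three institutions with the largest value in the given column."""
    ranked = sorted(STATS, key=lambda inst: STATS[inst][column], reverse=True)
    return ranked[:3]


total_files = sum(files for files, _ in STATS.values())
print(total_files)  # 99427
print(top3(0))      # ['GUF', 'ENEA', 'CNRM']
print(top3(1))      # ['CNRM', 'LMD', 'GUF']
```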