Module netCDF4

Introduction

Python interface to the netCDF version 4 library. netCDF version 4 has many features not found in earlier versions of the library and is implemented on top of HDF5. This module can read and write files in both the new netCDF 4 and the old netCDF 3 format, and can create files that are readable by HDF5 clients. The API is modelled after Scientific.IO.NetCDF, and should be familiar to users of that module.

Most new features of netCDF 4 are implemented, such as multiple unlimited dimensions, groups and zlib data compression. All the new numeric data types (such as 64 bit and unsigned integer types) are implemented. Compound and variable length (vlen) data types are supported, but the enum and opaque data types are not. Mixtures of compound and vlen data types (compound types containing vlens, and vlens containing compound types) are not supported.

Download

- Latest bleeding-edge code from the subversion repository.
- Latest releases (source code and windows installers).

Requires

- numpy array module http://numpy.scipy.org, version 1.3.0 or later (1.5.1 or higher recommended, required if using python 3).
- Cython is optional - if it is installed setup.py will use it to recompile the Cython source code into C, using conditional compilation to enable features in the netCDF API that have been added since version 4.1.1. If Cython is not installed, these features (such as the ability to rename Group objects) will be disabled to preserve backward compatibility with older versions of the netCDF library.
- For python < 2.7, the ordereddict module http://python.org/pypi/ordereddict.
- The HDF5 C library version 1.8.4-patch1 or higher (1.8.8 or higher recommended) from ftp://ftp.hdfgroup.org/HDF5/current/src. Be sure to build with '--enable-hl --enable-shared'.
- Libcurl, if you want OPeNDAP support.
- HDF4, if you want to be able to read HDF4 "Scientific Dataset" (SD) files.
- The netCDF-4 C library from ftp://ftp.unidata.ucar.edu/pub/netcdf. Version 4.1.1 or higher is required (4.2 or higher recommended). Be sure to build with '--enable-netcdf-4 --enable-shared', and set CPPFLAGS="-I$HDF5_DIR/include" and LDFLAGS="-L $HDF5_DIR/lib", where $HDF5_DIR is the directory where HDF5 was installed. If you want OPeNDAP support, add '--enable-dap'. If you want HDF4 SD support, add '--enable-hdf4' and add the location of the HDF4 headers and library to CPPFLAGS and LDFLAGS.

Install

- Install the requisite python modules and C libraries (see above). It's easiest if all the C libs are built as shared libraries.
- Optionally, set the HDF5_DIR environment variable to point to where HDF5 is installed (the libs in $HDF5_DIR/lib, the headers in $HDF5_DIR/include). If the headers and libs are installed in different places, you can use HDF5_INCDIR and HDF5_LIBDIR to define the locations of the headers and libraries independently.
- Optionally, set the NETCDF4_DIR (or NETCDF4_INCDIR and NETCDF4_LIBDIR) environment variable(s) to point to where the netCDF version 4 library and headers are installed.
- If the locations of the HDF5 and netCDF libs and headers are not specified with environment variables, some standard locations will be searched.
- If HDF5 was built as a static library with szip support, you may also need to set the SZIP_DIR (or SZIP_INCDIR and SZIP_LIBDIR) environment variable(s) to point to where szip is installed. Note that the netCDF library does not support creating szip compressed files, but can read szip compressed files if the HDF5 lib is configured to support szip.
- If the netCDF lib was built as a static library with HDF4 and/or OPeNDAP support, you may also need to set HDF4_DIR, JPEG_DIR and/or CURL_DIR.
- Instead of using environment variables to specify the locations of the required libraries, you can either let setup.py try to auto-detect their locations, or use the file setup.cfg to specify them. To use this method, copy the file setup.cfg.template to setup.cfg, then open setup.cfg in a text editor and follow the instructions in the comments for editing. If you use setup.cfg, environment variables will be ignored.
- If you are using netcdf 4.1.2 or higher, instead of setting all those environment variables defining where libs are installed, you can just set one environment variable, USE_NCCONFIG, to 1. This will tell python to run the netcdf nc-config utility to determine where all the dependencies live.
- Run python setup.py build, then python setup.py install (as root if necessary). If using environment variables to specify build options, be sure to run 'python setup.py build' *without* using sudo. sudo does not pass environment variables. If you run 'setup.py build' first without sudo, you can run 'setup.py install' with sudo.
- Run the tests in the 'test' directory by running python run_all.py.

netCDF4 http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4-modul...

1 of 13 1/29/2014 12:30 PM

Tutorial

1) Creating/Opening/Closing a netCDF file

To create a netCDF file from python, you simply call the Dataset constructor. This is also the method used to open an existing netCDF file. If the file is open for write access (w, r+ or a), you may write any type of data including new dimensions, groups, variables and attributes. netCDF files come in several flavors (NETCDF3_CLASSIC, NETCDF3_64BIT, NETCDF4_CLASSIC, and NETCDF4). The first two flavors are supported by version 3 of the netCDF library. NETCDF4_CLASSIC files use the version 4 disk format (HDF5), but do not use any features not found in the version 3 API. They can be read by netCDF 3 clients only if they have been relinked against the netCDF 4 library. They can also be read by HDF5 clients. NETCDF4 files use the version 4 disk format (HDF5) and use the new features of the version 4 API. The netCDF4 module can read and write files in any of these formats. When creating a new file, the format may be specified using the format keyword in the Dataset constructor. The default format is NETCDF4. To see how a given file is formatted, you can examine the file_format Dataset attribute. Closing the netCDF file is accomplished via the close method of the Dataset instance.

Here's an example:

>>> from netCDF4 import Dataset
>>> rootgrp = Dataset('test.nc', 'w', format='NETCDF4')
>>> print rootgrp.file_format
NETCDF4
>>>
>>> rootgrp.close()

Remote OPeNDAP-hosted datasets can be accessed for reading over http if a URL is provided to the Dataset constructor instead of a filename. However, this requires that the netCDF library be built with OPeNDAP support, via the --enable-dap configure option (added in version 4.0.1).

2) Groups in a netCDF file

netCDF version 4 added support for organizing data in hierarchical groups, which are analogous to directories in a filesystem. Groups serve as containers for variables, dimensions and attributes, as well as other groups. A netCDF4.Dataset creates a special group, called the 'root group', which is similar to the root directory in a unix filesystem. To create Group instances, use the createGroup method of a Dataset or Group instance. createGroup takes a single argument, a python string containing the name of the new group. The new Group instances contained within the root group can be accessed by name using the groups dictionary attribute of the Dataset instance. Only NETCDF4 formatted files support Groups; if you try to create a Group in a netCDF 3 file you will get an error message.

>>> rootgrp = Dataset('test.nc', 'a')
>>> fcstgrp = rootgrp.createGroup('forecasts')
>>> analgrp = rootgrp.createGroup('analyses')
>>> print rootgrp.groups
OrderedDict([('forecasts', <netCDF4.Group object at 0x1b4b7b0>),
             ('analyses', <netCDF4.Group object at 0x1b4b970>)])
>>>

Groups can exist within groups in a Dataset, just as directories exist within directories in a unix filesystem. Each Group instance has a 'groups' attribute dictionary containing all of the group instances contained within that group. Each Group instance also has a 'path' attribute that contains a simulated unix directory path to that group.

Here's an example that shows how to navigate all the groups in a Dataset. The function walktree is a Python generator that is used to walk the directory tree. Note that printing the Dataset or Group object yields summary information about its contents.

>>> fcstgrp1 = fcstgrp.createGroup('model1')
>>> fcstgrp2 = fcstgrp.createGroup('model2')
>>> def walktree(top):
>>>     values = top.groups.values()
>>>     yield values
>>>     for value in top.groups.values():
>>>         for children in walktree(value):
>>>             yield children
>>> print rootgrp
>>> for children in walktree(rootgrp):
>>>     for child in children:
>>>         print child
<type 'netCDF4.Dataset'>
root group (NETCDF4 file format):
    dimensions:
    variables:
    groups: forecasts, analyses
<type 'netCDF4.Group'>
group /forecasts:
    dimensions:
    variables:
    groups: model1, model2
<type 'netCDF4.Group'>
group /analyses:
    dimensions:
    variables:
    groups:
<type 'netCDF4.Group'>
group /forecasts/model1:
    dimensions:
    variables:
    groups:
<type 'netCDF4.Group'>
group /forecasts/model2:
    dimensions:
    variables:
    groups:
>>>
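The traversal order of walktree can be explored without a netCDF file. The following is only a sketch: FakeGroup is a hypothetical stand-in class (not part of netCDF4) that mimics just the 'groups' dictionary and the 'path' attribute described above.

```python
from collections import OrderedDict


class FakeGroup(object):
    """Hypothetical stand-in for a Group: child groups plus a unix-like path."""

    def __init__(self, name, parent=None):
        self.name = name
        self.groups = OrderedDict()
        if parent is None:
            self.path = '/'
        else:
            self.path = parent.path.rstrip('/') + '/' + name
            parent.groups[name] = self


def walktree(top):
    """Yield the children of top, then recurse into each child in turn."""
    children = list(top.groups.values())
    yield children
    for child in children:
        for descendants in walktree(child):
            yield descendants


# Rebuild the group layout used in the tutorial.
root = FakeGroup('root')
forecasts = FakeGroup('forecasts', root)
analyses = FakeGroup('analyses', root)
FakeGroup('model1', forecasts)
FakeGroup('model2', forecasts)

paths = [group.path for children in walktree(root) for group in children]
print(paths)
```

The generator yields one list of children per visited group, so groups without children contribute an empty list and nothing is printed for them.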

3) Dimensions in a netCDF file

netCDF defines the sizes of all variables in terms of dimensions, so before any variables can be created the dimensions they use must be created first. A special case, not often used in practice, is that of a scalar variable, which has no dimensions. A dimension is created using the createDimension method of a Dataset or Group instance. A Python string is used to set the name of the dimension, and an integer value is used to set the size. To create an unlimited dimension (a dimension that can be appended to), the size value is set to None or 0. In this example, both the time and level dimensions are unlimited. Having more than one unlimited dimension is a new netCDF 4 feature; in netCDF 3 files there may be only one, and it must be the first (leftmost) dimension of the variable.

>>> level = rootgrp.createDimension('level', None)
>>> time = rootgrp.createDimension('time', None)
>>> lat = rootgrp.createDimension('lat', 73)
>>> lon = rootgrp.createDimension('lon', 144)

All of the Dimension instances are stored in a python dictionary.

>>> print rootgrp.dimensions
OrderedDict([('level', <netCDF4.Dimension object at 0x1b48030>),
             ('time', <netCDF4.Dimension object at 0x1b481c0>),
             ('lat', <netCDF4.Dimension object at 0x1b480f8>),
             ('lon', <netCDF4.Dimension object at 0x1b48a08>)])
>>>

Calling the python len function with a Dimension instance returns the current size of that dimension. The isunlimited method of a Dimension instance can be used to determine if the dimension is unlimited, or appendable.

>>> print len(lon)
144
>>> print lon.isunlimited()
False
>>> print time.isunlimited()
True
>>>

Printing the Dimension object provides useful summary info, including the name and length of the dimension, and whether it is unlimited.

>>> for dimobj in rootgrp.dimensions.values():
>>>     print dimobj
<type 'netCDF4.Dimension'> (unlimited): name = 'level', size = 0
<type 'netCDF4.Dimension'> (unlimited): name = 'time', size = 0
<type 'netCDF4.Dimension'>: name = 'lat', size = 73
<type 'netCDF4.Dimension'>: name = 'lon', size = 144
>>>

Dimension names can be changed using the renameDimension method of a Dataset or Group instance.

4) Variables in a netCDF file

netCDF variables behave much like python multidimensional array objects supplied by the numpy module. However, unlike numpy arrays, netCDF4 variables can be appended to along one or more 'unlimited' dimensions. To create a netCDF variable, use the createVariable method of a Dataset or Group instance. The createVariable method has two mandatory arguments, the variable name (a Python string), and the variable datatype. The variable's dimensions are given by a tuple containing the dimension names (defined previously with createDimension). To create a scalar variable, simply leave out the dimensions keyword. The variable primitive datatypes correspond to the dtype attribute of a numpy array. You can specify the datatype as a numpy dtype object, or anything that can be converted to a numpy dtype object. Valid datatype specifiers include: 'f4' (32-bit floating point), 'f8' (64-bit floating point), 'i4' (32-bit signed integer), 'i2' (16-bit signed integer), 'i8' (64-bit signed integer), 'i1' (8-bit signed integer), 'u1' (8-bit unsigned integer), 'u2' (16-bit unsigned integer), 'u4' (32-bit unsigned integer), 'u8' (64-bit unsigned integer), or 'S1' (single-character string). The old Numeric single-character typecodes ('f','d','h','s','b','B','c','i','l'), corresponding to ('f4','f8','i2','i2','i1','i1','S1','i4','i4'), will also work. The unsigned integer types and the 64-bit integer type can only be used if the file format is NETCDF4.
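The correspondence between the old Numeric typecodes and the dtype specifiers can be written out as a lookup table. This is an illustrative sketch only: normalize_datatype is a hypothetical helper, not a netCDF4 function, and it encodes nothing beyond the mapping and the NETCDF4-only restriction stated above.

```python
# Old Numeric typecodes and the specifiers the text says they correspond to.
NUMERIC_TYPECODES = ('f', 'd', 'h', 's', 'b', 'B', 'c', 'i', 'l')
DTYPE_SPECIFIERS = ('f4', 'f8', 'i2', 'i2', 'i1', 'i1', 'S1', 'i4', 'i4')
TYPECODE_TO_DTYPE = dict(zip(NUMERIC_TYPECODES, DTYPE_SPECIFIERS))

# Unsigned integers and 64-bit integers: usable only in NETCDF4 files.
NETCDF4_ONLY = frozenset(['u1', 'u2', 'u4', 'u8', 'i8'])


def normalize_datatype(spec, file_format='NETCDF4'):
    """Hypothetical helper: translate a typecode and check the file format."""
    dtype = TYPECODE_TO_DTYPE.get(spec, spec)
    if dtype in NETCDF4_ONLY and file_format != 'NETCDF4':
        raise ValueError('%s requires the NETCDF4 file format' % dtype)
    return dtype


print(normalize_datatype('d'))   # old typecode for 64-bit floating point
print(normalize_datatype('u4'))  # allowed, because the format is NETCDF4
```

Requesting 'u4' or 'i8' with file_format='NETCDF3_CLASSIC' raises ValueError in this sketch.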

The dimensions themselves are usually also defined as variables, called coordinate variables. The createVariable method returns an instance of the Variable class whose methods can be used later to access and set variable data and attributes.

>>> times = rootgrp.createVariable('time','f8',('time',))
>>> levels = rootgrp.createVariable('level','i4',('level',))
>>> latitudes = rootgrp.createVariable('latitude','f4',('lat',))
>>> longitudes = rootgrp.createVariable('longitude','f4',('lon',))
>>> # two dimensions unlimited.
>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',))

All of the variables in the Dataset or Group are stored in a Python dictionary, in the same way as the dimensions:

>>> print rootgrp.variables
OrderedDict([('time', <netCDF4.Variable object at 0x1b4ba70>),
             ('level', <netCDF4.Variable object at 0x1b4bab0>),
             ('latitude', <netCDF4.Variable object at 0x1b4baf0>),
             ('longitude', <netCDF4.Variable object at 0x1b4bb30>),
             ('temp', <netCDF4.Variable object at 0x1b4bb70>)])
>>>

To get summary info on a Variable instance in an interactive session, just print it.

>>> print rootgrp.variables['temp']
<type 'netCDF4.Variable'>
float32 temp(time, level, lat, lon)
    least_significant_digit: 3
    units: K
unlimited dimensions: time, level
current shape = (0, 0, 73, 144)
>>>

Variable names can be changed using the renameVariable method of a Dataset instance.

5) Attributes in a netCDF file

There are two types of attributes in a netCDF file, global and variable. Global attributes provide information about a group, or the entire dataset, as a whole. Variable attributes provide information about one of the variables in a group. Global attributes are set by assigning values to Dataset or Group instance variables. Variable attributes are set by assigning values to Variable instance variables. Attributes can be strings, numbers or sequences. Returning to our example,

>>> import time
>>> rootgrp.description = 'bogus example script'
>>> rootgrp.history = 'Created ' + time.ctime(time.time())
>>> rootgrp.source = 'netCDF4 python module tutorial'
>>> latitudes.units = 'degrees north'
>>> longitudes.units = 'degrees east'
>>> levels.units = 'hPa'
>>> temp.units = 'K'
>>> times.units = 'hours since 0001-01-01 00:00:00.0'
>>> times.calendar = 'gregorian'

The ncattrs method of a Dataset, Group or Variable instance can be used to retrieve the names of all the netCDF attributes. This method is provided as a convenience, since using the built-in dir Python function will return a bunch of private methods and attributes that cannot (or should not) be modified by the user.

>>> for name in rootgrp.ncattrs():
>>>     print 'Global attr', name, '=', getattr(rootgrp,name)
Global attr description = bogus example script
Global attr history = Created Mon Nov 7 10:30:56 2005
Global attr source = netCDF4 python module tutorial

The __dict__ attribute of a Dataset, Group or Variable instance provides all the netCDF attribute name/value pairs in a python dictionary:

>>> print rootgrp.__dict__
OrderedDict([(u'description', u'bogus example script'),
             (u'history', u'Created Thu Mar 3 19:30:33 2011'),
             (u'source', u'netCDF4 python module tutorial')])

Attributes can be deleted from a netCDF Dataset, Group or Variable using the python del statement (i.e. del grp.foo removes the attribute foo from the group grp).

6) Writing data to and retrieving data from a netCDF variable

Now that you have a netCDF Variable instance, how do you put data into it? You can just treat it like an array and assign data to a slice.

>>> import numpy
>>> lats = numpy.arange(-90,91,2.5)
>>> lons = numpy.arange(-180,180,2.5)
>>> latitudes[:] = lats
>>> longitudes[:] = lons
>>> print 'latitudes =\n',latitudes[:]
latitudes =
[-90.  -87.5 -85.  -82.5 -80.  -77.5 -75.  -72.5 -70.  -67.5 -65.  -62.5
 -60.  -57.5 -55.  -52.5 -50.  -47.5 -45.  -42.5 -40.  -37.5 -35.  -32.5
 -30.  -27.5 -25.  -22.5 -20.  -17.5 -15.  -12.5 -10.   -7.5  -5.   -2.5
   0.    2.5   5.    7.5  10.   12.5  15.   17.5  20.   22.5  25.   27.5
  30.   32.5  35.   37.5  40.   42.5  45.   47.5  50.   52.5  55.   57.5
  60.   62.5  65.   67.5  70.   72.5  75.   77.5  80.   82.5  85.   87.5
  90. ]
>>>

Unlike NumPy's array objects, netCDF Variable objects with unlimited dimensions will grow along those dimensions if you assign data outside the currently defined range of indices.

>>> # append along two unlimited dimensions by assigning to slice.
>>> nlats = len(rootgrp.dimensions['lat'])
>>> nlons = len(rootgrp.dimensions['lon'])
>>> print 'temp shape before adding data = ',temp.shape
temp shape before adding data = (0, 0, 73, 144)
>>>
>>> from numpy.random import uniform
>>> temp[0:5,0:10,:,:] = uniform(size=(5,10,nlats,nlons))
>>> print 'temp shape after adding data = ',temp.shape
temp shape after adding data = (5, 10, 73, 144)
>>>
>>> # levels have grown, but no values yet assigned.
>>> print 'levels shape after adding pressure data = ',levels.shape
levels shape after adding pressure data = (10,)
>>>

Note that the size of the levels variable grows when data is appended along the level dimension of the variable temp, even though no data has yet been assigned to levels.

>>> # now, assign data to levels dimension variable.
>>> levels[:] = [1000.,850.,700.,500.,300.,250.,200.,150.,100.,50.]

However, there are some differences between NumPy and netCDF variable slicing rules. Slices behave as usual, being specified as a start:stop:step triplet. Using a scalar integer index i takes the ith element and reduces the rank of the output array by one. Boolean array and integer sequence indexing behaves differently for netCDF variables than for numpy arrays. Only 1-d boolean arrays and integer sequences are allowed, and these indices work independently along each dimension (similar to the way vector subscripts work in fortran). This means that

>>> temp[0, 0, [0,1,2,3], [0,1,2,3]]

returns an array of shape (4,4) when slicing a netCDF variable, but for a numpy array it returns an array of shape (4,). Similarly, a netCDF variable of shape (2,3,4,5) indexed with [0, array([True, False, True]), array([False, True, True, True]), :] would return a (2, 3, 5) array. In NumPy, this would raise an error since it would be equivalent to [0, [0,2], [1,2,3], :], and index sequences of lengths 2 and 3 cannot be paired element by element. While this behaviour can cause some confusion for those used to NumPy's 'fancy indexing' rules, it provides a very powerful way to extract data from multidimensional netCDF variables by using logical operations on the dimension arrays to create slices.
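The difference between the two indexing rules can be illustrated on a small nested list, without netCDF4 or numpy. This is a simplified sketch: orthogonal_select, pointwise_select and bool_to_indices are hypothetical helper names, and the pointwise version ignores NumPy's broadcasting of length-1 index sequences.

```python
def orthogonal_select(grid, rows, cols):
    """netCDF-style: each index list acts independently along its dimension."""
    return [[grid[r][c] for c in cols] for r in rows]


def pointwise_select(grid, rows, cols):
    """NumPy-style fancy indexing: index lists are paired element by element."""
    if len(rows) != len(cols):
        raise IndexError('index sequences must have the same length')
    return [grid[r][c] for r, c in zip(rows, cols)]


def bool_to_indices(mask):
    """A 1-d boolean index selects the positions where the mask is True."""
    return [i for i, flag in enumerate(mask) if flag]


# 5x5 grid whose entries encode their own position as 10*row + column.
grid = [[10 * r + c for c in range(5)] for r in range(5)]
idx = [0, 1, 2, 3]

outer = orthogonal_select(grid, idx, idx)
paired = pointwise_select(grid, idx, idx)

print((len(outer), len(outer[0])))           # a 4 x 4 block
print(paired)                                # only the 4 diagonal entries
print(bool_to_indices([True, False, True]))  # positions of the True values
```

With index lists of different lengths (for example 2 and 3 entries), orthogonal_select returns a 2 x 3 block while pointwise_select raises IndexError, which mirrors the boolean-index case described above.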

For example,

>>> tempdat = temp[::2, [1,3,6], lats>0, lons>0]

will extract time indices 0, 2 and 4, pressure levels 850, 500 and 200 hPa, all Northern Hemisphere latitudes and Eastern Hemisphere longitudes, resulting in a numpy array of shape (3, 3, 36, 71).

>>> print 'shape of fancy temp slice = ',tempdat.shape
shape of fancy temp slice = (3, 3, 36, 71)
>>>

Time coordinate values pose a special challenge to netCDF users. Most metadata standards (such as CF and COARDS) specify that time should be measured relative to a fixed date using a certain calendar, with units specified like hours since YYYY-MM-DD hh:mm:ss. These units can be awkward to deal with, without a utility to convert the values to and from calendar dates. The functions called num2date and date2num are provided with this package to do just that. Here's an example of how they can be used:

>>> # fill in times.
>>> from datetime import datetime, timedelta
>>> from netCDF4 import num2date, date2num
>>> dates = [datetime(2001,3,1)+n*timedelta(hours=12) for n in range(temp.shape[0])]
>>> times[:] = date2num(dates,units=times.units,calendar=times.calendar)
>>> print 'time values (in units %s): ' % times.units+'\n',times[:]
time values (in units hours since 0001-01-01 00:00:00.0):
[ 17533056.  17533068.  17533080.  17533092.  17533104.]
>>>
>>> dates = num2date(times[:],units=times.units,calendar=times.calendar)
>>> print 'dates corresponding to time values:\n',dates
dates corresponding to time values:
[2001-03-01 00:00:00 2001-03-01 12:00:00 2001-03-02 00:00:00
 2001-03-02 12:00:00 2001-03-03 00:00:00]
>>>

num2date converts numeric values of time in the specified units and calendar to datetime objects, and date2num does the reverse. All the calendars currently defined in the CF metadata convention are supported. A function called date2index is also provided which returns the indices of a netCDF time variable corresponding to a sequence of datetime instances.
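The arithmetic behind an 'hours since' time axis can be sketched with the standard datetime module alone. This is not how num2date and date2num are implemented: to_hours and from_hours are hypothetical helpers, the reference date 2001-01-01 is chosen only for illustration, and only the proleptic Gregorian calendar built into datetime is handled.

```python
from datetime import datetime, timedelta

# Stands in for the units string 'hours since 2001-01-01 00:00:00'.
REFERENCE = datetime(2001, 1, 1)


def to_hours(date, reference=REFERENCE):
    """Hypothetical analogue of date2num for 'hours since' units."""
    return (date - reference).total_seconds() / 3600.0


def from_hours(hours, reference=REFERENCE):
    """Hypothetical analogue of num2date for 'hours since' units."""
    return reference + timedelta(hours=hours)


# Same five 12-hourly dates as in the tutorial example.
dates = [datetime(2001, 3, 1) + n * timedelta(hours=12) for n in range(5)]
hours = [to_hours(d) for d in dates]
roundtrip = [from_hours(h) for h in hours]

print(hours)  # 1 March 2001 is 59 days (1416 hours) after the reference date
print(roundtrip == dates)
```

The calendars defined by CF other than the one datetime implements are exactly the cases this sketch cannot cover, which is why the package supplies its own conversion functions.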

7) Reading data from a multi-file netCDF dataset.

If you want to read data from a variable that spans multiple netCDF files, you can use the MFDataset class to read the data as if it were contained in a single file. Instead of using a single filename to create a Dataset instance, create a MFDataset instance with either a list of filenames, or a string with a wildcard (which is then converted to a sorted list of files using the python glob module). Variables in the list of files that share the same unlimited dimension are aggregated together, and can be sliced across multiple files. To illustrate this, let's first create a bunch of netCDF files with the same variable (with the same unlimited dimension). The files must be in NETCDF3_64BIT, NETCDF3_CLASSIC or NETCDF4_CLASSIC format (NETCDF4 formatted multi-file datasets are not supported).

>>> for nfile in range(10):
>>>     f = Dataset('mftest'+repr(nfile)+'.nc','w',format='NETCDF4_CLASSIC')
>>>     f.createDimension('x',None)
>>>     x = f.createVariable('x','i',('x',))
>>>     x[0:10] = numpy.arange(nfile*10,10*(nfile+1))
>>>     f.close()

Now read all the files back in at once with MFDataset:

>>> from netCDF4 import MFDataset
>>> f = MFDataset('mftest*nc')
>>> print f.variables['x'][:]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]
>>>

Note that MFDataset can only be used to read, not write, multi-file datasets.
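Because a wildcard is expanded into a sorted list of files, the file naming scheme determines the aggregation order. The sketch below assumes the sort is a plain lexicographic string sort, as Python's sorted() performs; it uses empty placeholder files in a temporary directory (they are not valid netCDF files) purely to show the resulting order.

```python
import glob
import os
import shutil
import tempfile

tmpdir = tempfile.mkdtemp()
try:
    # Placeholder files only; the names are what matters here.
    for nfile in (0, 1, 2, 10):
        open(os.path.join(tmpdir, 'mftest%d.nc' % nfile), 'w').close()

    files = sorted(glob.glob(os.path.join(tmpdir, 'mftest*nc')))
    names = [os.path.basename(path) for path in files]
finally:
    shutil.rmtree(tmpdir)

print(names)  # 'mftest10.nc' sorts before 'mftest2.nc'

# Zero-padded numbers keep lexicographic and numeric order in agreement.
padded = sorted('mftest%02d.nc' % nfile for nfile in (0, 1, 2, 10))
print(padded)
```

The ten files in the tutorial example (mftest0.nc to mftest9.nc) are unaffected, since single-digit suffixes sort the same way under both orderings. With more files, zero-padding the number or passing an explicit list of filenames avoids surprises.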

8) Efficient compression of netCDF variables

Data stored in netCDF 4 Variable objects can be compressed and decompressed on the fly. The parameters for the compression are determined by the zlib, complevel and shuffle keyword arguments to the createVariable method. To turn on compression, set zlib=True. The complevel keyword regulates the speed and efficiency of the compression (1 being fastest, but lowest compression ratio, 9 being slowest but best compression ratio). The default value of complevel is 4. Setting shuffle=False will turn off the HDF5 shuffle filter, which de-interlaces a block of data before compression by reordering the bytes. The shuffle filter can significantly improve compression ratios, and is on by default. Setting the fletcher32 keyword argument to createVariable to True (it's False by default) enables the Fletcher32 checksum algorithm for error detection. It's also possible to set the HDF5 chunking parameters and endian-ness of the binary data stored in the HDF5 file with the chunksizes and endian keyword arguments to createVariable. These keyword arguments are only relevant for NETCDF4 and NETCDF4_CLASSIC files (where the underlying file format is HDF5) and are silently ignored if the file format is NETCDF3_CLASSIC or NETCDF3_64BIT.
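The idea behind the shuffle filter can be sketched with the standard struct and zlib modules. This is an illustration of byte reordering before compression, not the HDF5 implementation: shuffle and unshuffle are hypothetical helpers, and the sample values are made up.

```python
import struct
import zlib


def shuffle(raw, itemsize):
    """Put byte 0 of every element first, then byte 1 of every element, etc."""
    return b''.join(raw[i::itemsize] for i in range(itemsize))


def unshuffle(shuffled, itemsize):
    """Invert shuffle(): interleave the byte planes back into elements."""
    count = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for i in range(itemsize):
        out[i::itemsize] = shuffled[i * count:(i + 1) * count]
    return bytes(out)


# Made-up, slowly varying 32-bit floats (illustrative data only).
values = [273.15 + 0.01 * i for i in range(10000)]
raw = struct.pack('<%df' % len(values), *values)

plain = zlib.compress(raw, 4)                 # level 4, bytes left as they are
shuffled = zlib.compress(shuffle(raw, 4), 4)  # level 4, bytes reordered first

print('unshuffled: %d bytes, shuffled: %d bytes' % (len(plain), len(shuffled)))
```

For smoothly varying data the high-order bytes of neighbouring values tend to be similar, so grouping them often helps zlib, though the gain depends on the data. The reordering is lossless: unshuffle(shuffle(raw, 4), 4) returns the original bytes.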

If your data only has a certain number of digits of precision (say for example, it is temperature data that was measured with a precision of 0.1 degrees), you can dramatically improve zlib compression by quantizing (or truncating) the data using the least_significant_digit keyword argument to createVariable. The least significant digit is the power of ten of the smallest decimal place in the data that is a reliable value. For example if the data has a precision of 0.1, then setting least_significant_digit=1 will cause the data to be quantized using numpy.around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). Effectively, this makes the compression 'lossy' instead of 'lossless', that is some precision in the data is sacrificed for the sake of disk space.
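The quantization formula can be tried on a single value in plain Python. In this sketch, quantize is a hypothetical helper and the rule used to choose bits (the smallest integer with 2**bits >= 10**least_significant_digit) is an assumption that reproduces bits=4 for a precision of 0.1; the library's exact rule may differ.

```python
import math


def quantize(value, least_significant_digit):
    """Return (round(scale*value)/scale, bits) with scale = 2**bits.

    Assumption: bits is the smallest integer for which 2**bits is at
    least 10**least_significant_digit.
    """
    bits = int(math.ceil(math.log(10.0 ** least_significant_digit, 2)))
    scale = 2.0 ** bits
    return round(scale * value) / scale, bits


original = 273.1578  # made-up temperature in K
quantized, bits = quantize(original, 1)

print(bits)       # 4 for least_significant_digit=1, as in the text
print(quantized)  # a multiple of 1/16 close to the original value
print(abs(quantized - original) <= 0.5 / 2 ** bits)
```

Rounding to a multiple of 1/2**bits zeroes the low-order bits of the scaled value, which is what makes the quantized data easier for zlib to compress; the error is at most half of 1/2**bits, well inside the stated precision of 0.1.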

In our example, try replacing the line

>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',))

with

>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',),zlib=True)

and then

>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',),zlib=True,least_significant_digit=3)

and see how much smaller the resulting files are.

9) Beyond homogeneous arrays of a fixed type - compound data types

Compound data types map directly to numpy structured (a.k.a. 'record') arrays. Structured arrays are akin to C structs, or derived types in Fortran. They allow for the construction of table-like structures composed of combinations of other data types, including other compound types. Compound types might be useful for representing multiple parameter values at each point on a grid, or at each time and space location for scattered (point) data. You can then access all the information for a point by reading one variable, instead of reading different parameters from different variables. Compound data types are created from the corresponding numpy data type using the createCompoundType method of a Dataset or Group instance. Since there is no native complex data type in netcdf, compound types are handy for storing numpy complex arrays. Here's an example:

>>> f = Dataset('complex.nc','w')
>>> size = 3 # length of 1-d complex array
>>> # create sample complex data.
>>> datac = numpy.exp(1j*(1.+numpy.linspace(0, numpy.pi, size)))
>>> # create complex128 compound data type.
>>> complex128 = numpy.dtype([('real',numpy.float64),('imag',numpy.float64)])
>>> complex128_t = f.createCompoundType(complex128,'complex128')
>>> # create a variable with this data type, write some data to it.
>>> f.createDimension('x_dim',None)
>>> v = f.createVariable('cmplx_var',complex128_t,'x_dim')
>>> data = numpy.empty(size,complex128) # numpy structured array
>>> data['real'] = datac.real; data['imag'] = datac.imag
>>> v[:] = data # write numpy structured array to netcdf compound var
>>> # close and reopen the file, check the contents.
>>> f.close(); f = Dataset('complex.nc')
>>> v = f.variables['cmplx_var']
>>> datain = v[:] # read in all the data into a numpy structured array
>>> # create an empty numpy complex array
>>> datac2 = numpy.empty(datain.shape,numpy.complex128)
>>> # .. fill it with contents of structured array.
>>> datac2.real = datain['real']; datac2.imag = datain['imag']
>>> print datac.dtype,datac # original data
complex128 [ 0.54030231+0.84147098j -0.84147098+0.54030231j -0.54030231-0.84147098j]
>>>
>>> print datac2.dtype,datac2 # data from file
complex128 [ 0.54030231+0.84147098j -0.84147098+0.54030231j -0.54030231-0.84147098j]
>>>

Compound types can be nested, but you must create the 'inner' ones first. All of the compound types defined for a Dataset or Group are stored in a Python dictionary, just like variables and dimensions. As always, printing objects gives useful summary information in an interactive session:

>>> print f
<type 'netCDF4.Dataset'>
root group (NETCDF4 file format):
    dimensions: x_dim
    variables: cmplx_var
    groups:
>>> print f.variables['cmplx_var']
<type 'netCDF4.Variable'>
compound cmplx_var(x_dim)
compound data type: [('real', '<f8'), ('imag', '<f8')]
unlimited dimensions: x_dim
current shape = (3,)
>>> print f.cmptypes
OrderedDict([('complex128', <netCDF4.CompoundType object at 0x1029eb7e8>)])
>>> print f.cmptypes['complex128']
<type 'netCDF4.CompoundType'>: name = 'complex128', numpy dtype = [(u'real','<f8'), (u'imag', '<f8')]
>>>

10) Variable-length (vlen) data types.

NetCDF 4 has support for variable-length or "ragged" arrays. These are arrays of variable length sequences having the same type. To create a variable-length data type, use the createVLType method of a Dataset or Group instance.

>>> f = Dataset('tst_vlen.nc','w')
>>> vlen_t = f.createVLType(numpy.int32, 'phony_vlen')

The numpy datatype of the variable-length sequences and the name of the new datatype must be specified. Any of the primitive datatypes can be used (signed and unsigned integers, 32 and 64 bit floats, and characters), but compound data types cannot. A new variable can then be created using this datatype.

>>> x = f.createDimension('x',3)
>>> y = f.createDimension('y',4)
>>> vlvar = f.createVariable('phony_vlen_var', vlen_t, ('y','x'))

Since there is no native vlen datatype in numpy, vlen arrays are represented in python as object arrays (arrays of dtype object). These are arrays whose elements are Python object pointers, and can contain any type of python object. For this application, they must contain 1-D numpy arrays all of the same type but of varying length. In this case, they contain 1-D numpy int32 arrays of random length between 1 and 10.

>>> import random
>>> data = numpy.empty(len(y)*len(x),object)
>>> for n in range(len(y)*len(x)):
>>>     data[n] = numpy.arange(random.randint(1,10),dtype='int32')+1
>>> data = numpy.reshape(data,(len(y),len(x)))
>>> vlvar[:] = data
>>> print 'vlen variable =\n',vlvar[:]
vlen variable =
[[[ 1  2  3  4  5  6  7  8  9 10] [1 2 3 4 5] [1 2 3 4 5 6 7 8]]
 [[1 2 3 4 5 6 7] [1 2 3 4 5 6] [1 2 3 4 5]]
 [[1 2 3 4 5] [1 2 3 4] [1]]
 [[ 1  2  3  4  5  6  7  8  9 10] [ 1  2  3  4  5  6  7  8  9 10]
  [1 2 3 4 5 6 7 8]]]
>>> print f
<type 'netCDF4.Dataset'>
root group (NETCDF4 file format):
    dimensions: x, y
    variables: phony_vlen_var
    groups:
>>> print f.variables['phony_vlen_var']
<type 'netCDF4.Variable'>
vlen phony_vlen_var(y, x)
vlen data type: int32
unlimited dimensions:
current shape = (4, 3)
>>> print f.VLtypes['phony_vlen']
<type 'netCDF4.VLType'>: name = 'phony_vlen', numpy dtype = int32
>>>

Numpy object arrays containing python strings can also be written as vlen variables. For vlen strings, you don't need to create a vlen data type. Instead, simply use the python str builtin instead of a numpy datatype when calling the createVariable method.

>>> z = f.createDimension('z',10)
>>> strvar = f.createVariable('strvar', str, 'z')

In this example, an object array is filled with random python strings with random lengths between 2 and 12 characters, and the data in the object array is assigned to the vlen string variable.

>>> chars = '1234567890aabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> data = numpy.empty(10,'O')
>>> for n in range(10):
>>>     stringlen = random.randint(2,12)
>>>     data[n] = ''.join([random.choice(chars) for i in range(stringlen)])
>>> strvar[:] = data
>>> print 'variable-length string variable:\n',strvar[:]
variable-length string variable:
[aDy29jPt jd7aplD b8t4RM jHh8hq KtaPWF9cQj Q1hHN5WoXSiT MMxsVeq td LUzvVTzj
 5DS9X8S]
>>> print f
<type 'netCDF4.Dataset'>
root group (NETCDF4 file format):
    dimensions: x, y, z
    variables: phony_vlen_var, strvar
    groups:
>>> print f.variables['strvar']
<type 'netCDF4.Variable'>
vlen strvar(z)
vlen data type: <type 'str'>
unlimited dimensions:
current size = (10,)
>>>

All of the code in this tutorial is available in examples/tutorial.py. Unit tests are in the test directory.


Contact: Jeffrey Whitaker <[email protected]>

Copyright: 2008 by Jeffrey Whitaker.

License: Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both the copyright notice and this permission notice appear in supporting documentation. THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Version: 1.0.7

Classes

CompoundType
    A CompoundType instance is used to describe a compound data type.
Dataset
    Dataset(self, filename, mode="r", clobber=True, diskless=False, persist=False, format='NETCDF4')
Dimension
    Dimension(self, group, name, size=None)
Group
    Group(self, parent, name)
MFDataset
    MFDataset(self, files, check=False, aggdim=None, exclude=[])
MFTime
    MFTime(self, time, units=None)
VLType
    A VLType instance is used to describe a variable length (VLEN) data type.
Variable
    Variable(self, group, name, datatype, dimensions=(), zlib=False, complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, fill_value=None)

Functions

chartostring(b)
    Convert a character array to a string array with one less dimension.
date2index(dates, nctime, calendar=None, select='exact')
    Return indices of a netCDF time variable corresponding to the given dates.
date2num(dates, units, calendar='standard')
    Return numeric time values given datetime objects.
getlibversion()
    Return a string describing the version of the netcdf library used to build the module, and when it was built.
num2date(times, units, calendar='standard')
    Return datetime objects given numeric time values.
stringtoarr(a, NUMCHARS, dtype='S')
    Convert a string to a character array of length NUMCHARS.
stringtochar(a)
    Convert a string array to a character array with one extra dimension.

Variables

NC_DISKLESS = 8

__has_nc_inq_path__ = 1

__has_rename_grp__ = 1

__hdf5libversion__ = '1.8.10'

__netcdf4libversion__ = u'4.3.1-rc4'


__package__ = None

default_encoding = 'utf-8'

default_fillvals = {'S1': '\x00', 'U1': '\x00', 'f4': 9.969209...

gregorian = datetime.datetime(1582, 10, 15, 0, 0)

is_native_big = False

is_native_little = True

python3 = False

unicode_error = 'replace'

Function Details

chartostring(b)

Convert a character array to a string array with one less dimension.

Parameters:
    b - Input character array (numpy datatype 'S1' or 'U1'). Will be converted to an array of strings, where each string has a fixed length of b.shape[-1] characters.

Returns:
    A numpy string array with datatype 'SN' or 'UN' and shape b.shape[:-1], where N=b.shape[-1].
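The shape change described above can be illustrated without the library. The following is a minimal sketch that uses nested Python lists in place of numpy character arrays; the helper name join_last_axis is hypothetical and is not part of netCDF4.

```python
def join_last_axis(chars):
    """Join the innermost lists of single characters into strings.

    An input of shape (..., N) becomes an output of shape (...), whose
    elements are N-character strings. Hypothetical helper for illustration.
    """
    # Innermost level reached: a flat list of single characters.
    if all(isinstance(c, str) for c in chars):
        return ''.join(chars)
    # Otherwise recurse into each sub-list, so only the last axis is dropped.
    return [join_last_axis(sub) for sub in chars]


# A 2 x 3 "character array" becomes a length-2 list of 3-character strings.
char_array = [['f', 'o', 'o'], ['b', 'a', 'r']]
strings = join_last_axis(char_array)
print(strings)
```

With real data, chartostring would be called on a numpy array of datatype 'S1' or 'U1' rather than on nested lists.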

date2index(dates, nctime, calendar=None, select='exact')

Return indices of a netCDF time variable corresponding to the given dates.

Parameters:
    dates - A datetime object or a sequence of datetime objects. The datetime objects should not include a time-zone offset.
    nctime - A netCDF time variable object. The nctime object must have a units attribute.
    calendar - Describes the calendar used in the time calculation. Valid calendars are 'standard', 'gregorian', 'proleptic_gregorian', 'noleap', '365_day', '360_day', 'julian', 'all_leap' and '366_day'. 'standard' is a mixed Julian/Gregorian calendar. If calendar is None (the default), its value is given by nctime.calendar, or 'standard' if no such attribute exists.
    select - The index selection method: 'exact', 'before', 'after' or 'nearest'. exact will return the indices perfectly matching the dates given. before and after will return the indices corresponding to the dates just before or just after the given dates if an exact match cannot be found. nearest will return the indices that correspond to the closest dates.

Returns:
    an index (indices) of the netCDF time variable corresponding to the given datetime object(s).
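The four select methods can be sketched over a plain sorted list of datetime objects. This is only an illustration of the selection rules as described above, not the library's implementation: the helper select_index is hypothetical, it assumes the times are sorted in increasing order, and raising ValueError when no index qualifies is an assumption of this sketch.

```python
from bisect import bisect_left
from datetime import datetime


def select_index(times, date, select='exact'):
    """Pick an index into the sorted list `times` for `date` (hypothetical helper)."""
    i = bisect_left(times, date)  # first position where times[i] >= date
    if i < len(times) and times[i] == date:
        return i  # an exact match is returned by every method
    if select == 'exact':
        raise ValueError('no exact match for %s' % date)
    if select == 'before' and i > 0:
        return i - 1  # last time strictly earlier than date
    if select == 'after' and i < len(times):
        return i  # first time strictly later than date
    if select == 'nearest' and times:
        neighbours = [j for j in (i - 1, i) if 0 <= j < len(times)]
        return min(neighbours, key=lambda j: abs(times[j] - date))
    raise ValueError('cannot select %r for %s' % (select, date))


times = [datetime(2000, 1, 1), datetime(2000, 1, 2), datetime(2000, 1, 4)]
query = datetime(2000, 1, 2, 18)  # falls between the second and third entries

print(select_index(times, datetime(2000, 1, 4)))  # exact match
print(select_index(times, query, 'before'))
print(select_index(times, query, 'after'))
print(select_index(times, query, 'nearest'))
```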

date2num(dates, units, calendar='standard')

Return numeric time values given datetime objects. The units of the numeric time values are described by the units argument and the calendar keyword. The datetime objects must be in UTC with no time-zone offset. If there is a time-zone offset in units, it will be applied to the returned numeric values.

Parameters:
    dates - A datetime object or a sequence of datetime objects. The datetime objects should not include a time-zone offset.
    units - a string of the form 'time units since reference time' describing the time units. time units can be days, hours, minutes, seconds, milliseconds or microseconds. reference time is the time origin. Milliseconds and microseconds can only be used with the proleptic_gregorian calendar, or the standard and gregorian calendars if the time origin is after 1582-10-15. A valid choice would be units='milliseconds since 1800-01-01 00:00:00-6:00'.
    calendar - describes the calendar used in the time calculations. All the values currently defined in the CF metadata convention are supported. Valid calendars are 'standard', 'gregorian', 'proleptic_gregorian', 'noleap', '365_day', '360_day', 'julian', 'all_leap' and '366_day'. Default is 'standard', which is a mixed Julian/Gregorian calendar.


Returns:
    a numeric time value, or an array of numeric time values.
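The meaning of a units string can be shown with the standard library alone. The sketch below is not the library's implementation: the helper to_num and the table SECONDS_PER_UNIT are hypothetical, and it assumes the proleptic Gregorian calendar (the only one python's datetime supports), a reference time written as 'YYYY-MM-DD HH:MM:SS', and no time-zone offset in units.

```python
from datetime import datetime

# Seconds in each supported time unit (sub-second units omitted for brevity).
SECONDS_PER_UNIT = {'days': 86400.0, 'hours': 3600.0,
                    'minutes': 60.0, 'seconds': 1.0}


def to_num(date, units):
    """Express `date` in 'time units since reference time' (hypothetical helper)."""
    unit, since, origin = units.split(' ', 2)
    if since != 'since':
        raise ValueError("units must look like '<unit> since <reference time>'")
    reference = datetime.strptime(origin, '%Y-%m-%d %H:%M:%S')
    # Elapsed time from the origin, rescaled from seconds to the requested unit.
    return (date - reference).total_seconds() / SECONDS_PER_UNIT[unit]


print(to_num(datetime(2001, 3, 1, 12), 'hours since 2001-03-01 00:00:00'))
print(to_num(datetime(2001, 3, 2), 'days since 2001-03-01 00:00:00'))
```

For the other calendars, or for units that carry a time-zone offset, date2num itself is needed; plain datetime arithmetic does not cover those cases.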

num2date(times, units, calendar='standard')

Return datetime objects given numeric time values. The units of the numeric time values are described by the units argument and the calendar keyword. The returned datetime objects represent UTC with no time-zone offset, even if the specified units contain a time-zone offset.

Parameters:
    times - numeric time values.
    units - a string of the form 'time units since reference time' describing the time units. time units can be days, hours, minutes, seconds, milliseconds or microseconds. reference time is the time origin. Milliseconds and microseconds can only be used with the proleptic_gregorian calendar, or the standard and gregorian calendars if the time origin is after 1582-10-15. A valid choice would be units='milliseconds since 1800-01-01 00:00:00-6:00'.
    calendar - describes the calendar used in the time calculations. All the values currently defined in the CF metadata convention are supported. Valid calendars are 'standard', 'gregorian', 'proleptic_gregorian', 'noleap', '365_day', '360_day', 'julian', 'all_leap' and '366_day'. Default is 'standard', which is a mixed Julian/Gregorian calendar.

Returns:
    a datetime instance, or an array of datetime instances.

The datetime instances returned are 'real' python datetime objects if the date falls in the Gregorian calendar (i.e. calendar='proleptic_gregorian', or calendar = 'standard' or 'gregorian' and the date is after 1582-10-15). Otherwise, they are 'phony' datetime objects which support some but not all the methods of 'real' python datetime objects. This is because the python datetime module always uses the 'proleptic_gregorian' calendar, even for dates before the switch from the Julian calendar occurred in 1582, and so cannot represent dates in the other calendars. The datetime instances do not contain a time-zone offset, even if the specified units contain one.
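The limitation of python's own datetime objects for early dates can be checked directly. The short sketch below uses only the standard library and shows that datetime extends the Gregorian calendar backwards through October 1582 without any gap, which is why it cannot stand in for the mixed Julian/Gregorian calendar.

```python
from datetime import datetime, timedelta

# Python's datetime has no gap in October 1582: the day after 4 October
# is 5 October, as in the proleptic Gregorian calendar.
day_after = datetime(1582, 10, 4) + timedelta(days=1)
print(day_after)

# In the mixed Julian/Gregorian ('standard') calendar, 4 October 1582 is
# followed directly by 15 October 1582. The same two date labels are
# therefore one day apart there, but eleven days apart for datetime.
gap = datetime(1582, 10, 15) - datetime(1582, 10, 4)
print(gap.days)
```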

stringtoarr(a, NUMCHARS, dtype='S')

Convert a string to a character array of length NUMCHARS.

Parameters:
    a - Input python string.
    NUMCHARS - number of characters used to represent string (if len(a) < NUMCHARS, it will be padded on the right with blanks).
    dtype - type of numpy array to return. Default is 'S', which means an array of dtype 'S1' will be returned. If dtype='U', a unicode array (dtype = 'U1') will be returned.

Returns:
    A rank 1 numpy character array of length NUMCHARS with datatype 'S1' (default) or 'U1' (if dtype='U').

stringtochar(a)

Convert a string array to a character array with one extra dimension.

Parameters:
    a - Input numpy string array with numpy datatype 'SN' or 'UN', where N is the number of characters in each string. Will be converted to an array of characters (datatype 'S1' or 'U1') of shape a.shape + (N,).

Returns:
    A numpy character array with datatype 'S1' or 'U1' and shape a.shape + (N,), where N is the length of each string in a.
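The added dimension can be pictured with nested Python lists standing in for numpy arrays. This is a minimal sketch only; the helper split_chars is hypothetical, is not part of netCDF4, and assumes every string has the same length N.

```python
def split_chars(strings):
    """Split each string into a list of single characters.

    An input of shape (...) holding N-character strings becomes an output
    of shape (..., N). Hypothetical helper for illustration.
    """
    # A single string is the innermost element: expand it into characters.
    if isinstance(strings, str):
        return list(strings)
    # Otherwise recurse, so that one new axis is appended at the end.
    return [split_chars(s) for s in strings]


# A length-2 list of 3-character strings becomes a 2 x 3 "character array".
words = ['foo', 'bar']
chars = split_chars(words)
print(chars)
```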

Variables Details

default_fillvals

Value:


{'S1': '\x00',
 'U1': '\x00',
 'f4': 9.96920996839e+36,
 'f8': 9.96920996839e+36,
 'i1': -127,
 'i2': -32767,
 'i4': -2147483647,
 'i8': -9223372036854775806,

...

Generated by Epydoc 3.0.1 on Thu Nov 14 09:25:36 2013 http://epydoc.sourceforge.net
