Upload
giuseppe-onorevoli
View
29
Download
4
Embed Size (px)
DESCRIPTION
HDF5 Slides
Citation preview
1
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 1
HDF5 Tutorial
37th SPEEDUP Workshop on HPC Albert Cheng, Elena Pourmal
The HDF Group
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 2
Outline 8:00 – 9:00
Introduction to HDF5 data, programming models and tools 9:00 – 9:30
Advanced features
10:00 – 12:00 Introduction to Parallel HDF5
13:15 – 14:15 Caching and buffering in HDF5
14:45 – 16:45 New features in HDF5 1.8.0
2
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 3
Introduction to HDF5 Data, Programming Models
and Tools
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 4
What is HDF?
3
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 5
HDF is…
• A file format for managing any kind of data • Software system to manage data in the format
• Designed for high volume or complex data • Designed for every size and type of system • Open format and software library, tools
• There are two HDF’s: HDF4 and HDF5 • Today we focus on HDF5
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 6
HDF5 The Format
4
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 7
An HDF5 “file” is a container…
lat|lon|temp‐‐‐‐|‐‐‐‐‐|‐‐‐‐‐12|23|3.115|24|4.217|21|3.6
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 8
Structures to organize objects “Groups”
“Datasets”
5
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 9
HDF5 model
• Groups – provide structure among objects • Datasets – where the primary data goes
• Data arrays • Rich set of datatype options • Flexible, efficient storage and I/O
• Attributes, for metadata
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 10
HDF5 The Software
6
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 11
HDF5 Software
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 12
Users of HDF5 Software
“Virtualfilelayer”(VFL)
HDF5ApplicationProgrammingInterface
ModulestoadaptI/Otospecificfeaturesofsystem,ordoI/Oinsomespecialway.
“File”couldbeonparallelsystem,inmemory,network,collecHonoffiles,etc.
ApplicaHons,toolsusethisAPItocreate,read,write,query,etc.Powerusers(consumers)
Mostdataconsumersarehere.ScienHfic/engineeringapplicaHons.Domain‐specificlibraries/API,tools.
7
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 13
HDF5 Philosophy
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 14
Who uses HDF5?
8
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 15
Who uses HDF5?
• Applications that deal with big or complex data • Over 200 different types of apps • 2+million product users world-wide • Academia, government agencies, industry
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 16
Applications with large amounts of data
9
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 17
NASA EOS remote sense data
• HDF format is the standard file format for storing data from NASA's Earth Observing System (EOS) mission.
• Petabytes of data stored in HDF and HDF5 to support the Global Climate Change Research Program.
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 18
Large simulations
• A simulation can have billions of elements • Each element can have dozens of associated
values
10
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 19
Large images Electron tomography
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 20
It is not just about size
11
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 21
Data complexity
Thanks to Mark Miller, LLNL
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 22
Complex relationships within data
Contig Summaries
Discrepancies
Contig Qualities
Coverage Depth
Aligned bases
Reads
Percent match
12
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 23
Different views of data Flight test
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 24
HDF5 Data Model
13
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 25
HDF5 model (recap)
• Groups – provide structure among objects • Datasets – where the primary data goes
• Data arrays • Rich set of datatype options • Flexible, efficient storage and I/O
• Attributes, for metadata • Other objects
• Links (point to data in a file or in another HDF5 file) • Datatypes (can be stored for complex structures
and reused by multiple datatsets)
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 26
HDF5 Dataset
Data Metadata Dataspace
3 Dim_2 = 5 Dim_1 = 4
Time = 32.4 Pressure = 987
Temp = 56
Chunked
Compressed
Dim_3 = 7
IEEE 32-bit float
14
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 27
HDF5 Dataspace
• Two roles • Dataspace contains spatial info about a dataset
stored in a file • Rank and dimensions • Permanent part of dataset
definition
• Dataspace describes application’s data buffer and data elements participating in I/O
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 28
HDF5 Datatype
• Datatype – how to interpret a data element • Permanent part of the dataset definition • Two classes: atomic and compound • Can be stored in a file as an HDF5 object (HDF5
committed datatype) • Can be shared among different datasets
15
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 29
HDF5 Datatype
• HDF5 atomic types include • normal integer & float • user-definable (e.g., 13-bit integer) • variable length types (e.g., strings) • references to objects/dataset regions • enumeration - names mapped to integers • array
• HDF5 compound types • Comparable to C structs (“records”) • Members can be atomic or compound types
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 30
int8 int4 int16
HDF5 dataset: array of records
3
5
16
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 31
Special storage options for dataset
Better subsetting access time; extendable
chunked
Improves storage efficiency, transmission speed
compressed
Arrays can be extended in any direction
extendable
Metadata for Fred
Dataset “Fred” File A
File B
Data for Fred
Metadata in HDF5 file, raw data in a binary file
external
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 32
HDF5 Attribute
• Attribute – data of the form “name = value”, attached to an object by application
• Operations similar to dataset operations, but … • Not extendible • No compression or partial I/O
• Can be overwritten, deleted, added during the “life” of a dataset
17
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 33
HDF5 Group
• A mechanism for organizing collections of related objects
• Every file starts with a root group
• Similar to UNIX directories
• Can have attributes
“/”
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 34
“/” X
temp
temp
/ (root) /X /Y /Y/temp /Y/bar/temp
Path to HDF5 object in a file
Y
bar
18
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 35
Shared HDF5 objects
“/” A B C
P R P
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 36
HDF5 Data Model Example
ENSIGHT Automotive crash simulation
19
Automotive crash simulation
September 9, 2008 37 SPEEDUP Workshop - HDF5 Tutorial
Automotive crash simulation
September 9, 2008 38 SPEEDUP Workshop - HDF5 Tutorial
20
Automotive crash simulation
September 9, 2008 39 SPEEDUP Workshop - HDF5 Tutorial
Solid modeling
September 9, 2008 40 SPEEDUP Workshop - HDF5 Tutorial
21
Solid modeling
September 9, 2008 41 SPEEDUP Workshop - HDF5 Tutorial
HDF5mesh
September 9, 2008 42 SPEEDUP Workshop - HDF5 Tutorial
22
April 28, 2008 LCI Tutorial 43
Mesh Example, in HDFView
September 9, 2008 43 SPEEDUP Workshop - HDF5 Tutorial
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 44
HDF5 Software
23
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 45
HDF5 software stack
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 46
Structure of HDF5 Library
24
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 47
Write – from memory to disk
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 48
Partial I/O
(b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
(a) Hyperslab from a 2D array to the corner of a smaller 2D array
Move just part of a dataset
25
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 49
(c) A sequence of points from a 2D array to a sequence of points in a 3D array.
(d) Union of hyperslabs in file to union of hyperslabs in memory.
Partial I/O
Move just part of a dataset
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 50
Layers – parallel example
Application Parallel computing system (Linux cluster)
Compute node
I/O library (HDF5)
Parallel I/O library (MPI-I/O)
Parallel file system (GPFS)
Switch network/I/O servers
Compute node
Compute node
Compute node
I/O flows through many layers from application to disk.
26
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 51
Virtual I/O layer
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 52
Virtual file I/O layer
• A public API for writing I/O drivers • Allows HDF5 to interface to disk, the network,
memory, or a user-defined device
Network
Network File Family MPI I/O Memory
Virtual file I/O drivers
Memory
Stdio
“Storage”
27
Applications & Domains
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 53
Storage
File on parallel file system File
Split metadata and raw data files
User-defined device
Across the networkor to/from another
application or library HDF5 format
Simulation, visualization, remote sensing…
Examples: Thermonuclear simulations Product modeling Data mining tools Visualization tools
Climate models
Common domain-specific data models
UDM SAF H5Part HDF-EOS IDL Domain-specific APIs
LANL LLNL, SNL Grids COTS NASA
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 54
Portability & Robustness
• Runs almost anywhere • Linux and UNIX workstations • Windows, Mac OS X • Big ASC machines, Crays, VMS systems • TeraGrid and other clusters • Source and binaries available from http://www.hdfgroup.org/HDF5/release/index.html
• QA • Daily regression tests on key platforms • Meets NASA’s highest technology readiness level
28
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 55
Other Software
• The HDF Group • HDFView • Java tools • Command-line utilities • Web browser plug-in • Regression and performance testing software • Parallel h5diff
• 3rd Party (IDL, MATLAB, Mathematica, PyTables, HDF Explorer, LabView)
• Communities (EOS, ASC, CGNS) • Integration with other software (iRODS, OPeNDAP)
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 56
Creating an HDF5 File with HDFView
29
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 57
A B “/” (root)
Example: Create this HDF5 File
4x6 array of integers
Storm
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 58
Demo
• Demonstrate the use of HDFView to create the HDF5 file
• Use h5dump to see the contents of the HDF5 file • Use h5import to add data to the HDF5 file • Use h5repack to change properties of the stored
objects • Use h5diff to compare two files
30
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 59
Introduction to HDF5 Programming Model
and APIs
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 60
Structure of HDF5 Library (recap)
31
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 61
Goals of HDF5 Library
• Provide flexible API to support a wide range of operations on data.
• Support high performance access in serial and parallel computing environments.
• Be compatible with common data models and programming languages.
Because of these goals, the HDF5 API is rich and large
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 62
Operations Supported by the API
• Create groups, datasets, attributes, linkages • Create complex data types • Assign storage and I/O properties to objects • Perform complex subsetting during read/write • Use variety of I/O “devices” (parallel, remote, etc.) • Transform data during I/O • Query about file and structure and properties • Query about object structure, content, properties
32
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 63
Characteristics of the HDF5 API
• For flexibility, the API is extensive • 300+ functions
• This can be daunting… but there is hope • A few functions can do a lot • Start simple • Build up knowledge as more features are needed
• Library functions are categorized by object type • “H5Lite” API supports basic capabilities
Victronix Swiss Army Cybertool 34
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 64
The General HDF5 API
• Currently C, Fortran 90, Java, and C++ bindings. • C routines begin with prefix H5?
? is a character corresponding to the type of object the function acts on
Example APIs:
H5D : Dataset interface e.g., H5Dread H5F : File interface e.g., H5Fopen H5S : dataSpace interface e.g., H5Sclose
33
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 65
Compiling HDF5 Applications
• h5cc – HDF5 C compiler command • Similar to mpicc
• h5fc – HDF5 F90 compiler command • Similar to mpif90
• h5c++ – HDF5 C++ compiler command
To compile: % h5cc h5prog.c % h5fc h5prog.f90
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 66
Compile option: -show
-show: displays the compiler commands and options without executing them % h5cc –show Sample_c.c gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API -DNDEBUG -I/home/packages/szip/static/encoder/Linux2.6-gcc/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -c Sample_c.c
gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o -L/home/packages/hdf5_1.6.6/Linux_2.6/lib /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a -lsz -lz -lm -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib
34
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 67
General Programming Paradigm
• Properties of object are optionally defined • Creation properties • Access property lists • Default values used if none are defined
• Object is opened or created • Object is accessed, possibly many times • Object is closed
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 68
Order of Operations
• An order is imposed on operations by argument dependencies
For Example: A file must be opened before a dataset
-because- the dataset open call requires a file handle as an argument.
• Objects can be closed in any order.
35
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 69
HDF5 Defined Types
For portability, the HDF5 library has its own defined types:
hid_t: object identifiers (native integer) hsize_t: size used for dimensions (unsigned long or unsigned long long) hssize_t: for specifying coordinates and sometimes for dimensions (signed long or signed long long) herr_t: function return value
hvl_t: variable length datatype
For C, include hdf5.h in your HDF5 application.
Example: Create this HDF5 File
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 70
A B “/” (root)
4x6 array of integers
36
Example: Step by Step
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 71
A
4x6 array of integers
B “/” (root)
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 72
“/” (root)
Example: Create a File
37
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 73
Steps to Create a File
1. Decide any special properties the file should have • Creation properties, like size of user block • Access properties, such as metadata cache size
2. Create property lists, if necessary 3. Create the file 4. Close the file and the property lists, as needed
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 74
Code: Create a File
hid_t file_id;
file_id = H5Fcreate("file.h5",H5F_ACC_TRUNC, H5P_DEFAULT,H5P_DEFAULT);
H5F_ACC_TRUNC flag – removes existing file H5P_DEFAULT flags – create regular UNIX file and access it with HDF5 SEC2 I/O file driver
38
Example: Add a Dataset
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 75
4x6 array of integers
A “/” (root)
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 76
Dataset Components
Data Metadata Dataspace
3 Dim_2 = 5 Dim_1 = 4
Time = 32.4 Pressure = 987
Temp = 56
Chunked
Compressed
Dim_3 = 7
IEEE 32-bit float
39
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 77
Dataset Creation Property List
Dataset creation property list: information on how to store data in a file
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 78
Steps to Create a Dataset
1. Define dataset characteristics • Dataspace – 4x6 Datatype – integer • Properties (if needed)
2. Decide where to put it – “root group” • Obtain location identifier
3. Decide link or path – “A” 4. Create link and dataset in file 5. (Eventually) Close everything A
“/” (root)
40
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 79
1 hid_t file_id, dataset_id, dataspace_id; 2 hsize_t dims[2]; 3 herr_t status;
4 file_id = H5Fcreate (”file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
5 dims[0] = 4; 6 dims[1] = 6; 7 dataspace_id = H5Screate_simple (2, dims, NULL);
8 dataset_id = H5Dcreate(file_id,”A",H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);
9 status = H5Dclose (dataset_id); 10 status = H5Sclose (dataspace_id); 11 status = H5Fclose (file_id);
Code: Create a Dataset
Terminate access to dataset, dataspace, file
Create a dataspace rank current dims
Create a dataset
dataspace
datatype
property list (default)
pathname
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 80
A B “/” (root)
Example: Create a Group
4x6 array of integers
file.h5
41
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 81
Steps to Create a Group
1. Decide where to put it – “root group” • Obtain location identifier
2. Decide link or path – “B” 3. Create link and group in file
• Specify number of bytes to store names of objects to be added to group (as a hint) – or use default.
4. (Eventually) Close the group.
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 82
Code: Create a Group
hid_t file_id, group_id; ... /* Open “file.h5” */ file_id = H5Fopen(“file.h5”, H5F_ACC_RDWR,
H5P_DEFAULT);
/* Create group "/B" in file. */ group_id = H5Gcreate(file_id,"/B",0);
/* Close group and file. */ status = H5Gclose(group_id); status = H5Fclose(file_id);
42
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 83
HDF5 Information
HDF Information Center http://www.hdfgroup.org
HDF Help email address [email protected]
HDF users mailing lists [email protected] [email protected]
September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 84
Questions?