42
1 September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 1 HDF5 Tutorial 37 th SPEEDUP Workshop on HPC Albert Cheng, Elena Pourmal The HDF Group September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 2 Outline 8:00 – 9:00 Introduction to HDF5 data, programming models and tools 9:00 – 9:30 Advanced features 10:00 – 12:00 Introduction to Parallel HDF5 13:15 – 14:15 Caching and buffering in HDF5 14:45 – 16:45 New features in HDF5 1.8.0

HDF5 Intro

Embed Size (px)

DESCRIPTION

HDF5 Slides

Citation preview

Page 1: HDF5 Intro

1

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 1

HDF5 Tutorial

37th SPEEDUP Workshop on HPC Albert Cheng, Elena Pourmal

The HDF Group

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 2

Outline 8:00 – 9:00

Introduction to HDF5 data, programming models and tools 9:00 – 9:30

Advanced features

10:00 – 12:00 Introduction to Parallel HDF5

13:15 – 14:15 Caching and buffering in HDF5

14:45 – 16:45 New features in HDF5 1.8.0

Page 2: HDF5 Intro

2

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 3

Introduction to HDF5 Data, Programming Models

and Tools

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 4

What is HDF?

Page 3: HDF5 Intro

3

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 5

HDF is…

•  A file format for managing any kind of data •  Software system to manage data in the format

•  Designed for high volume or complex data •  Designed for every size and type of system •  Open format and software library, tools

•  There are two HDF’s: HDF4 and HDF5 •  Today we focus on HDF5

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 6

HDF5 The Format

Page 4: HDF5 Intro

4

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 7

An HDF5 “file” is a container…

lat|lon|temp‐‐‐‐|‐‐‐‐‐|‐‐‐‐‐12|23|3.115|24|4.217|21|3.6

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 8

Structures to organize objects “Groups”

“Datasets”

Page 5: HDF5 Intro

5

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 9

HDF5 model

•  Groups – provide structure among objects •  Datasets – where the primary data goes

•  Data arrays •  Rich set of datatype options •  Flexible, efficient storage and I/O

•  Attributes, for metadata

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 10

HDF5 The Software

Page 6: HDF5 Intro

6

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 11

HDF5 Software

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 12

Users of HDF5 Software

“Virtualfilelayer”(VFL)

HDF5ApplicationProgrammingInterface

ModulestoadaptI/Otospecificfeaturesofsystem,ordoI/Oinsomespecialway.

“File”couldbeonparallelsystem,inmemory,network,collecHonoffiles,etc.

ApplicaHons,toolsusethisAPItocreate,read,write,query,etc.Powerusers(consumers)

Mostdataconsumersarehere.ScienHfic/engineeringapplicaHons.Domain‐specificlibraries/API,tools.

Page 7: HDF5 Intro

7

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 13

HDF5 Philosophy

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 14

Who uses HDF5?

Page 8: HDF5 Intro

8

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 15

Who uses HDF5?

•  Applications that deal with big or complex data •  Over 200 different types of apps •  2+million product users world-wide •  Academia, government agencies, industry

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 16

Applications with large amounts of data

Page 9: HDF5 Intro

9

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 17

NASA EOS remote sense data

•  HDF format is the standard file format for storing data from NASA's Earth Observing System (EOS) mission.

•  Petabytes of data stored in HDF and HDF5 to support the Global Climate Change Research Program.

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 18

Large simulations

•  A simulation can have billions of elements •  Each element can have dozens of associated

values

Page 10: HDF5 Intro

10

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 19

Large images Electron tomography

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 20

It is not just about size

Page 11: HDF5 Intro

11

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 21

Data complexity

Thanks to Mark Miller, LLNL

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 22

Complex relationships within data

Contig Summaries

Discrepancies

Contig Qualities

Coverage Depth

Aligned bases

Reads

Percent match

Page 12: HDF5 Intro

12

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 23

Different views of data Flight test

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 24

HDF5 Data Model

Page 13: HDF5 Intro

13

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 25

HDF5 model (recap)

•  Groups – provide structure among objects •  Datasets – where the primary data goes

•  Data arrays •  Rich set of datatype options •  Flexible, efficient storage and I/O

•  Attributes, for metadata •  Other objects

•  Links (point to data in a file or in another HDF5 file) •  Datatypes (can be stored for complex structures

and reused by multiple datatsets)

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 26

HDF5 Dataset

Data Metadata Dataspace

3 Dim_2 = 5 Dim_1 = 4

Time = 32.4 Pressure = 987

Temp = 56

Chunked

Compressed

Dim_3 = 7

IEEE 32-bit float

Page 14: HDF5 Intro

14

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 27

HDF5 Dataspace

•  Two roles •  Dataspace contains spatial info about a dataset

stored in a file •  Rank and dimensions •  Permanent part of dataset

definition

•  Dataspace describes application’s data buffer and data elements participating in I/O

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 28

HDF5 Datatype

•  Datatype – how to interpret a data element •  Permanent part of the dataset definition •  Two classes: atomic and compound •  Can be stored in a file as an HDF5 object (HDF5

committed datatype) •  Can be shared among different datasets

Page 15: HDF5 Intro

15

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 29

HDF5 Datatype

•  HDF5 atomic types include •  normal integer & float •  user-definable (e.g., 13-bit integer) •  variable length types (e.g., strings) •  references to objects/dataset regions •  enumeration - names mapped to integers •  array

•  HDF5 compound types •  Comparable to C structs (“records”) •  Members can be atomic or compound types

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 30

int8 int4 int16

HDF5 dataset: array of records

3

5

Page 16: HDF5 Intro

16

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 31

Special storage options for dataset

Better subsetting access time; extendable

chunked

Improves storage efficiency, transmission speed

compressed

Arrays can be extended in any direction

extendable

Metadata for Fred

Dataset “Fred” File A

File B

Data for Fred

Metadata in HDF5 file, raw data in a binary file

external

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 32

HDF5 Attribute

•  Attribute – data of the form “name = value”, attached to an object by application

•  Operations similar to dataset operations, but … •  Not extendible •  No compression or partial I/O

•  Can be overwritten, deleted, added during the “life” of a dataset

Page 17: HDF5 Intro

17

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 33

HDF5 Group

•  A mechanism for organizing collections of related objects

•  Every file starts with a root group

•  Similar to UNIX directories

•  Can have attributes

“/”

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 34

“/” X

temp

temp

/ (root) /X /Y /Y/temp /Y/bar/temp

Path to HDF5 object in a file

Y

bar

Page 18: HDF5 Intro

18

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 35

Shared HDF5 objects

“/” A B C

P R P

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 36

HDF5 Data Model Example

ENSIGHT Automotive crash simulation

Page 19: HDF5 Intro

19

Automotive crash simulation

September 9, 2008 37 SPEEDUP Workshop - HDF5 Tutorial

Automotive crash simulation

September 9, 2008 38 SPEEDUP Workshop - HDF5 Tutorial

Page 20: HDF5 Intro

20

Automotive crash simulation

September 9, 2008 39 SPEEDUP Workshop - HDF5 Tutorial

Solid modeling

September 9, 2008 40 SPEEDUP Workshop - HDF5 Tutorial

Page 21: HDF5 Intro

21

Solid modeling

September 9, 2008 41 SPEEDUP Workshop - HDF5 Tutorial

HDF5mesh

September 9, 2008 42 SPEEDUP Workshop - HDF5 Tutorial

Page 22: HDF5 Intro

22

April 28, 2008 LCI Tutorial 43

Mesh Example, in HDFView

September 9, 2008 43 SPEEDUP Workshop - HDF5 Tutorial

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 44

HDF5 Software

Page 23: HDF5 Intro

23

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 45

HDF5 software stack

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 46

Structure of HDF5 Library

Page 24: HDF5 Intro

24

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 47

Write – from memory to disk

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 48

Partial I/O

(b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array

(a) Hyperslab from a 2D array to the corner of a smaller 2D array

Move just part of a dataset

Page 25: HDF5 Intro

25

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 49

(c) A sequence of points from a 2D array to a sequence of points in a 3D array.

(d) Union of hyperslabs in file to union of hyperslabs in memory.

Partial I/O

Move just part of a dataset

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 50

Layers – parallel example

Application Parallel computing system (Linux cluster)

Compute node

I/O library (HDF5)

Parallel I/O library (MPI-I/O)

Parallel file system (GPFS)

Switch network/I/O servers

Compute node

Compute node

Compute node

I/O flows through many layers from application to disk.

Page 26: HDF5 Intro

26

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 51

Virtual I/O layer

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 52

Virtual file I/O layer

•  A public API for writing I/O drivers •  Allows HDF5 to interface to disk, the network,

memory, or a user-defined device

Network

Network File Family MPI I/O Memory

Virtual file I/O drivers

Memory

Stdio

“Storage”

Page 27: HDF5 Intro

27

Applications & Domains

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 53

Storage

File on parallel file system File

Split metadata and raw data files

User-defined device

Across the networkor to/from another

application or library HDF5 format

Simulation, visualization, remote sensing…

Examples: Thermonuclear simulations Product modeling Data mining tools Visualization tools

Climate models

Common domain-specific data models

UDM SAF H5Part HDF-EOS IDL Domain-specific APIs

LANL LLNL, SNL Grids COTS NASA

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 54

Portability & Robustness

•  Runs almost anywhere •  Linux and UNIX workstations •  Windows, Mac OS X •  Big ASC machines, Crays, VMS systems •  TeraGrid and other clusters •  Source and binaries available from http://www.hdfgroup.org/HDF5/release/index.html

•  QA •  Daily regression tests on key platforms •  Meets NASA’s highest technology readiness level

Page 28: HDF5 Intro

28

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 55

Other Software

•  The HDF Group •  HDFView •  Java tools •  Command-line utilities •  Web browser plug-in •  Regression and performance testing software •  Parallel h5diff

•  3rd Party (IDL, MATLAB, Mathematica, PyTables, HDF Explorer, LabView)

•  Communities (EOS, ASC, CGNS) •  Integration with other software (iRODS, OPeNDAP)

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 56

Creating an HDF5 File with HDFView

Page 29: HDF5 Intro

29

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 57

A B “/” (root)

Example: Create this HDF5 File

4x6 array of integers

Storm

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 58

Demo

•  Demonstrate the use of HDFView to create the HDF5 file

•  Use h5dump to see the contents of the HDF5 file •  Use h5import to add data to the HDF5 file •  Use h5repack to change properties of the stored

objects •  Use h5diff to compare two files

Page 30: HDF5 Intro

30

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 59

Introduction to HDF5 Programming Model

and APIs

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 60

Structure of HDF5 Library (recap)

Page 31: HDF5 Intro

31

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 61

Goals of HDF5 Library

•  Provide flexible API to support a wide range of operations on data.

•  Support high performance access in serial and parallel computing environments.

•  Be compatible with common data models and programming languages.

Because of these goals, the HDF5 API is rich and large

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 62

Operations Supported by the API

•  Create groups, datasets, attributes, linkages •  Create complex data types •  Assign storage and I/O properties to objects •  Perform complex subsetting during read/write •  Use variety of I/O “devices” (parallel, remote, etc.) •  Transform data during I/O •  Query about file and structure and properties •  Query about object structure, content, properties

Page 32: HDF5 Intro

32

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 63

Characteristics of the HDF5 API

•  For flexibility, the API is extensive •  300+ functions

•  This can be daunting… but there is hope •  A few functions can do a lot •  Start simple •  Build up knowledge as more features are needed

•  Library functions are categorized by object type •  “H5Lite” API supports basic capabilities

Victronix Swiss Army Cybertool 34

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 64

The General HDF5 API

•  Currently C, Fortran 90, Java, and C++ bindings. •  C routines begin with prefix H5?

? is a character corresponding to the type of object the function acts on

Example APIs:

H5D : Dataset interface e.g., H5Dread H5F : File interface e.g., H5Fopen H5S : dataSpace interface e.g., H5Sclose

Page 33: HDF5 Intro

33

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 65

Compiling HDF5 Applications

•  h5cc – HDF5 C compiler command •  Similar to mpicc

•  h5fc – HDF5 F90 compiler command •  Similar to mpif90

•  h5c++ – HDF5 C++ compiler command

To compile: % h5cc h5prog.c % h5fc h5prog.f90

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 66

Compile option: -show

-show: displays the compiler commands and options without executing them % h5cc –show Sample_c.c gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API -DNDEBUG -I/home/packages/szip/static/encoder/Linux2.6-gcc/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -c Sample_c.c

gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o -L/home/packages/hdf5_1.6.6/Linux_2.6/lib /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a -lsz -lz -lm -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib

Page 34: HDF5 Intro

34

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 67

General Programming Paradigm

•  Properties of object are optionally defined •  Creation properties •  Access property lists •  Default values used if none are defined

•  Object is opened or created •  Object is accessed, possibly many times •  Object is closed

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 68

Order of Operations

•  An order is imposed on operations by argument dependencies

For Example: A file must be opened before a dataset

-because- the dataset open call requires a file handle as an argument.

•  Objects can be closed in any order.

Page 35: HDF5 Intro

35

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 69

HDF5 Defined Types

For portability, the HDF5 library has its own defined types:

hid_t: object identifiers (native integer) hsize_t: size used for dimensions (unsigned long or unsigned long long) hssize_t: for specifying coordinates and sometimes for dimensions (signed long or signed long long) herr_t: function return value

hvl_t: variable length datatype

For C, include hdf5.h in your HDF5 application.

Example: Create this HDF5 File

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 70

A B “/” (root)

4x6 array of integers

Page 36: HDF5 Intro

36

Example: Step by Step

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 71

A

4x6 array of integers

B “/” (root)

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 72

“/” (root)

Example: Create a File

Page 37: HDF5 Intro

37

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 73

Steps to Create a File

1.  Decide any special properties the file should have •  Creation properties, like size of user block •  Access properties, such as metadata cache size

2.  Create property lists, if necessary 3.  Create the file 4.  Close the file and the property lists, as needed

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 74

Code: Create a File

hid_t file_id;

file_id = H5Fcreate("file.h5",H5F_ACC_TRUNC, H5P_DEFAULT,H5P_DEFAULT);

H5F_ACC_TRUNC flag – removes existing file H5P_DEFAULT flags – create regular UNIX file and access it with HDF5 SEC2 I/O file driver

Page 38: HDF5 Intro

38

Example: Add a Dataset

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 75

4x6 array of integers

A “/” (root)

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 76

Dataset Components

Data Metadata Dataspace

3 Dim_2 = 5 Dim_1 = 4

Time = 32.4 Pressure = 987

Temp = 56

Chunked

Compressed

Dim_3 = 7

IEEE 32-bit float

Page 39: HDF5 Intro

39

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 77

Dataset Creation Property List

Dataset creation property list: information on how to store data in a file

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 78

Steps to Create a Dataset

1. Define dataset characteristics •  Dataspace – 4x6 Datatype – integer •  Properties (if needed)

2. Decide where to put it – “root group” •  Obtain location identifier

3.  Decide link or path – “A” 4.  Create link and dataset in file 5.  (Eventually) Close everything A

“/” (root)

Page 40: HDF5 Intro

40

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 79

1 hid_t file_id, dataset_id, dataspace_id; 2 hsize_t dims[2]; 3 herr_t status;

4 file_id = H5Fcreate (”file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

5 dims[0] = 4; 6 dims[1] = 6; 7 dataspace_id = H5Screate_simple (2, dims, NULL);

8 dataset_id = H5Dcreate(file_id,”A",H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);

9 status = H5Dclose (dataset_id); 10 status = H5Sclose (dataspace_id); 11 status = H5Fclose (file_id);

Code: Create a Dataset

Terminate access to dataset, dataspace, file

Create a dataspace rank current dims

Create a dataset

dataspace

datatype

property list (default)

pathname

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 80

A B “/” (root)

Example: Create a Group

4x6 array of integers

file.h5

Page 41: HDF5 Intro

41

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 81

Steps to Create a Group

1.  Decide where to put it – “root group” •  Obtain location identifier

2.  Decide link or path – “B” 3.  Create link and group in file

•  Specify number of bytes to store names of objects to be added to group (as a hint) – or use default.

4.  (Eventually) Close the group.

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 82

Code: Create a Group

hid_t file_id, group_id; ... /* Open “file.h5” */ file_id = H5Fopen(“file.h5”, H5F_ACC_RDWR,

H5P_DEFAULT);

/* Create group "/B" in file. */ group_id = H5Gcreate(file_id,"/B",0);

/* Close group and file. */ status = H5Gclose(group_id); status = H5Fclose(file_id);

Page 42: HDF5 Intro

42

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 83

HDF5 Information

HDF Information Center http://www.hdfgroup.org

HDF Help email address [email protected]

HDF users mailing lists [email protected] [email protected]

September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 84

Questions?