High level view of HDF5: Data structures and library
HDF Summit, Boeing Seattle, September 19, 2006


[Figure: Mesh example, in HDFView]

HDF5 Data Model

HDF5 data model
• HDF5 file – container for scientific data
• Primary objects
  • Groups
  • Datasets
• Additional ways to organize data
  • Attributes
  • Sharable objects
  • Storage and access properties

Everything else is built from these parts.

HDF5 Dataset

An HDF5 dataset consists of the data itself plus metadata that describes it:
• Dataspace – Rank: 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype – IEEE 32-bit float
• Attributes – time = 32.4, pressure = 987, temp = 56
• Storage info – Chunked, compressed
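To make the pieces concrete, the dataset above can be summarized by a small descriptor. The following is a hypothetical C model, not the HDF5 API (the type and function names are made up for illustration): it records the dataspace and the element size and computes the uncompressed size of the raw data, here 4 x 5 x 7 = 140 elements of 4 bytes each, or 560 bytes. Attributes and storage info are left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_RANK 8  /* hypothetical limit for this sketch only */

/* Hypothetical descriptor mirroring the slide: dataspace plus datatype size. */
typedef struct {
    int    rank;                   /* number of dimensions */
    size_t dims[SKETCH_MAX_RANK];  /* extent of each dimension */
    size_t type_size;              /* bytes per element, e.g. 4 for IEEE 32-bit float */
} dataset_desc;

/* Total number of elements: the product of the dimension sizes. */
size_t desc_num_elements(const dataset_desc *d)
{
    assert(d->rank >= 1 && d->rank <= SKETCH_MAX_RANK);
    size_t n = 1;
    for (int i = 0; i < d->rank; i++)
        n *= d->dims[i];
    return n;
}

/* Uncompressed size of the raw data in bytes. */
size_t desc_raw_bytes(const dataset_desc *d)
{
    return desc_num_elements(d) * d->type_size;
}
```

For the slide's dataset, `dataset_desc d = { 3, {4, 5, 7}, 4 };` gives 140 elements and 560 bytes; chunking and compression would change how those bytes are laid out in the file, not this logical size.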

Dataspaces
• Dataspace – spatial info about a dataset
  • Rank and dimensions
    • Permanent part of dataset definition
  • Subset of points, for partial I/O
    • Needed only during I/O operations
• Apply to datasets in memory or in the file

[Figure: example dataspace with Rank = 2, Dimensions = 4x6]
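The rank and dimensions determine how a multidimensional index maps onto a linear sequence of elements. The sketch below assumes C-style row-major order (the last dimension varies fastest), which is the order the HDF5 C API uses for dataset elements in memory; the function name is hypothetical. In the 4x6 dataspace above, element (2, 3) is at linear offset 2 x 6 + 3 = 15.

```c
#include <assert.h>
#include <stddef.h>

/* Linear (row-major) offset of the element at coords[] in a dataspace
 * of the given rank and dims[]: the last dimension varies fastest. */
size_t row_major_offset(int rank, const size_t dims[], const size_t coords[])
{
    size_t offset = 0;
    for (int i = 0; i < rank; i++) {
        assert(coords[i] < dims[i]);          /* coordinate must be in bounds */
        offset = offset * dims[i] + coords[i];
    }
    return offset;
}
```

The same function works for any rank, so the rank-3 dataset from the previous slide (4 x 5 x 7) maps element (1, 2, 3) to (1 x 5 + 2) x 7 + 3 = 52.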

Datatypes (array elements)
• Datatype – how to interpret a data element
  • Permanent part of the dataset definition
  • Two classes: atomic and compound

Datatypes
• HDF5 atomic types
  • normal integer & float
  • user-definable (e.g., 13-bit integer)
  • variable length types (e.g., strings)
  • pointers: references to objects/dataset regions
  • enumeration: names mapped to integers
  • array
• HDF5 compound types
  • Comparable to C structs
  • Members can be atomic or compound types

HDF5 dataset: array of records
• Dimensionality: 5 x 3
• Datatype (one record): int8 | int4 | int16 | 2x3x2 array of float32
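A compound datatype is described member by member: a name, a byte offset within the record, and the member's type, much as a C struct is. The sketch below is a hypothetical C model of the record on this slide. It assumes int8 and int16 denote 8- and 16-bit integers and that float is IEEE 32-bit, and it leaves out the int4 member, whose width is not clear from the slide and has no plain C equivalent. It uses offsetof to build the member table that a compound definition needs; in the HDF5 C API such offsets are what gets passed to H5Tinsert, but the table type and functions here are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One record of the 5 x 3 dataset; the int4 member is omitted (see text). */
typedef struct {
    int8_t  a;            /* "int8" member, assumed 8-bit integer   */
    int16_t b;            /* "int16" member, assumed 16-bit integer */
    float   c[2][3][2];   /* 2x3x2 array of float32                 */
} record_t;

/* Hypothetical member table: what a compound type definition must record. */
typedef struct {
    const char *name;
    size_t      offset;   /* byte offset of the member within record_t */
    size_t      size;     /* byte size of the member */
} member_info;

static const member_info record_members[] = {
    { "a", offsetof(record_t, a), sizeof(int8_t) },
    { "b", offsetof(record_t, b), sizeof(int16_t) },
    { "c", offsetof(record_t, c), sizeof(float[2][3][2]) },
};

/* Size with padding removed: the sum of the member sizes. */
size_t packed_size(const member_info *m, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += m[i].size;
    return total;
}

/* Look up a member by name; returns NULL if there is none. */
const member_info *find_member(const member_info *m, size_t n, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(m[i].name, name) == 0)
            return &m[i];
    return NULL;
}
```

Any difference between packed_size() and sizeof(record_t) is alignment padding inserted by the C compiler; HDF5 can remove such padding from a compound type with H5Tpack.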

Attributes
• Attribute – data of the form “name = value”, attached to an object
• Operations are scaled-down versions of dataset operations
  • Not extendible
  • No compression
  • No partial I/O
• Optional for the dataset definition
• Can be overwritten, deleted, or added during the “life” of a dataset
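Because attributes are small name = value pairs that can be added, overwritten, or deleted at any time, a fixed-size table is enough to model them. The sketch below is a hypothetical in-memory attribute list holding numeric values (it is not the HDF5 attribute API), exercised with the attribute names from the dataset slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ATTRS 8    /* hypothetical capacity for this sketch */
#define MAX_NAME  32   /* hypothetical maximum name length, including NUL */

typedef struct {
    char   name[MAX_NAME];
    double value;
} attribute;

typedef struct {
    attribute items[MAX_ATTRS];
    int       count;
} attr_list;

/* Index of the named attribute, or -1 if not present. */
static int attr_find(const attr_list *l, const char *name)
{
    for (int i = 0; i < l->count; i++)
        if (strcmp(l->items[i].name, name) == 0)
            return i;
    return -1;
}

/* Add a new attribute or overwrite an existing one; false if the list is full. */
bool attr_set(attr_list *l, const char *name, double value)
{
    assert(name != NULL);
    int i = attr_find(l, name);
    if (i < 0) {
        if (l->count == MAX_ATTRS)
            return false;
        i = l->count++;
        strncpy(l->items[i].name, name, MAX_NAME - 1);
        l->items[i].name[MAX_NAME - 1] = '\0';
    }
    l->items[i].value = value;
    return true;
}

/* Read an attribute's value into *out; false if absent. */
bool attr_get(const attr_list *l, const char *name, double *out)
{
    int i = attr_find(l, name);
    if (i < 0)
        return false;
    *out = l->items[i].value;
    return true;
}

/* Delete an attribute by moving the last entry into its slot; false if absent. */
bool attr_delete(attr_list *l, const char *name)
{
    int i = attr_find(l, name);
    if (i < 0)
        return false;
    l->count--;
    if (i != l->count)
        l->items[i] = l->items[l->count];
    return true;
}
```

Usage: start from `attr_list l = { .count = 0 };`, then `attr_set(&l, "temp", 56)` adds an attribute and a second `attr_set(&l, "temp", 60)` overwrites it in place.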

“Groups”
• A mechanism for collections of related objects
• Every file starts with a root group
• Similar to UNIX directories
• Can have attributes

[Figure: a group hierarchy with root group “/”, members tom, dick, and harry, and objects a, b, c below them]

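HDF5 objects are identified and located by their pathnames

[Figure: hierarchy under “/” with members x and foo; foo contains temp and bar; bar contains another temp]
Pathnames: / (root), /x, /foo, /foo/temp, /foo/bar/temp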

Groups & their members can be shared

[Figure: root “/” with groups tom, dick, and harry; P is a member of both tom and harry, R is a member of dick]
Pathnames: /tom/P, /dick/R, /harry/P
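Both pathnames and sharing follow from groups holding named links to objects: a pathname is resolved by following links from the root one component at a time, and an object is shared when two links point at it. The sketch below is a hypothetical in-memory model (not the HDF5 API; all names are made up for illustration) that rebuilds the tom/dick/harry example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINKS 4  /* hypothetical per-group limit for this sketch */

typedef struct object object;

/* A named link from a group to a member object. */
typedef struct {
    const char *name;
    object     *target;
} link_t;

/* An object: a group has links; a leaf object (e.g. a dataset) has none. */
struct object {
    const char *label;
    link_t      links[MAX_LINKS];
    int         nlinks;
};

/* Follow an absolute path such as "/tom/P" from root; NULL if any step fails. */
object *resolve(object *root, const char *path)
{
    assert(root != NULL && path != NULL);
    if (path[0] != '/')
        return NULL;
    object *cur = root;
    const char *p = path + 1;
    while (*p != '\0') {
        size_t len = strcspn(p, "/");          /* length of the next component */
        object *next = NULL;
        for (int i = 0; i < cur->nlinks; i++) {
            if (strlen(cur->links[i].name) == len &&
                strncmp(cur->links[i].name, p, len) == 0) {
                next = cur->links[i].target;
                break;
            }
        }
        if (next == NULL)
            return NULL;
        cur = next;
        p += len;
        if (*p == '/')
            p++;                               /* skip the separator */
    }
    return cur;
}
```

With tom and harry each holding a link named "P" to the same object, `resolve(&root, "/tom/P")` and `resolve(&root, "/harry/P")` return the same pointer, while `/dick/R` reaches a different object.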

Special Storage Options
• Chunked: better subsetting access time; extendable
• Compressed: improves storage efficiency, transmission speed
• Extendable: arrays can be extended in any direction
• Split file: metadata in one file, raw data in another

[Figure: dataset “Fred”, with the metadata for Fred in File A and the data for Fred in File B]
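With chunked storage a dataset is divided into equal-sized chunks that are stored and read as units, so reading a subset only touches the chunks it overlaps. The sketch below uses hypothetical helper names and handles only the 2D case: it computes which chunk holds a given element and how many chunks a rectangular selection touches. For a 100x100 dataset with 10x10 chunks, element (23, 57) lies in chunk (2, 5), a 5x5 block starting at (23, 53) stays inside one chunk, and a 10x10 block starting at (25, 55) straddles four.

```c
#include <assert.h>
#include <stddef.h>

/* Chunk coordinates of element (row, col) for chunks of chunk_rows x chunk_cols. */
void chunk_of(size_t row, size_t col, size_t chunk_rows, size_t chunk_cols,
              size_t *chunk_row, size_t *chunk_col)
{
    assert(chunk_rows > 0 && chunk_cols > 0);
    *chunk_row = row / chunk_rows;
    *chunk_col = col / chunk_cols;
}

/* Number of chunks overlapped by a rows x cols block whose upper-left
 * corner is (start_row, start_col). */
size_t chunks_touched(size_t start_row, size_t start_col, size_t rows, size_t cols,
                      size_t chunk_rows, size_t chunk_cols)
{
    assert(rows > 0 && cols > 0 && chunk_rows > 0 && chunk_cols > 0);
    size_t first_r = start_row / chunk_rows;
    size_t last_r  = (start_row + rows - 1) / chunk_rows;
    size_t first_c = start_col / chunk_cols;
    size_t last_c  = (start_col + cols - 1) / chunk_cols;
    return (last_r - first_r + 1) * (last_c - first_c + 1);
}
```

Chunking is also what the other options build on: in HDF5, a dataset with an unlimited (extendable) dimension and a compressed dataset both require chunked storage, since new chunks can be appended and each chunk is compressed independently.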

HDF5 Software

HDF5 Software stack

[Figure: layers from top to bottom: Tools & Applications; HDF I/O Library; HDF File]

Structure of HDF5 Library
• Object API (C, Fortran 90, Java, C++)
  • Specify objects and transformation properties
  • Invoke data movement operations and data transformations
• Library internals
  • Performs data transformations and other prep for I/O
  • Configurable transformations (compression, etc.)
• Virtual file I/O (C only)
  • Perform byte-stream I/O operations (open/close, read/write, seek)
  • User-implementable I/O (stdio, network, memory, etc.)

Writing – move from memory to disk

[Figure: data moving from memory to disk]

Partial I/O – move just part of a dataset
(a) Hyperslab from a 2D array to the corner of a smaller 2D array
(b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array

[Figure: each case shown as a transfer between memory and disk]
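Case (b) can be sketched as follows. A hyperslab is described per dimension by a start, a stride, a count of blocks, and a block size, the same four quantities that HDF5's H5Sselect_hyperslab takes; the struct and function below are a hypothetical plain-C model of such a selection, not library code. The function gathers the selected elements of a 2D row-major array into a 1D buffer at a given offset, visiting them in row-major order, and assumes blocks do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Per-dimension selection, named after the start/stride/count/block
 * arguments of H5Sselect_hyperslab (this struct itself is hypothetical). */
typedef struct {
    size_t start;   /* first selected index */
    size_t stride;  /* distance between the starts of consecutive blocks */
    size_t count;   /* number of blocks */
    size_t block;   /* size of each block */
} slab_dim;

/* Copy the elements selected by (r, c) from the row-major nrows x ncols
 * array src into dst, beginning at dst[dst_offset], in row-major order.
 * Assumes blocks do not overlap (stride >= block). Returns elements copied. */
size_t gather_hyperslab_2d(const int *src, size_t nrows, size_t ncols,
                           slab_dim r, slab_dim c,
                           int *dst, size_t dst_offset)
{
    assert(r.stride >= r.block && c.stride >= c.block);
    size_t n = 0;
    for (size_t bi = 0; bi < r.count; bi++)            /* row blocks */
        for (size_t i = 0; i < r.block; i++) {         /* rows in a block */
            size_t row = r.start + bi * r.stride + i;
            assert(row < nrows);
            for (size_t bj = 0; bj < c.count; bj++)    /* column blocks */
                for (size_t j = 0; j < c.block; j++) { /* columns in a block */
                    size_t col = c.start + bj * c.stride + j;
                    assert(col < ncols);
                    dst[dst_offset + n++] = src[row * ncols + col];
                }
        }
    return n;
}
```

For a 6x8 array holding 0..47, selecting rows {start 0, stride 3, count 2, block 2} and columns {start 1, stride 4, count 2, block 2} picks 16 elements (rows 0, 1, 3, 4 and columns 1, 2, 5, 6), which land contiguously in the 1D buffer starting at the requested offset.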

(c) A sequence of points from a 2D array to a sequence of points in a 3D array
(d) Union of hyperslabs in file to union of hyperslabs in memory
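Case (c) differs in that the selection is an explicit list of coordinates rather than a regular pattern (in the HDF5 C API this kind of selection is made with H5Sselect_elements). The hypothetical function below copies the k-th listed point of a 2D array to the k-th listed point of a 3D array, assuming both are stored row-major; it illustrates how source and destination points are paired, not how the library performs the transfer.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n elements from listed 2D points of src (row-major, src_rows x src_cols)
 * to listed 3D points of dst (row-major, dims dst_dims[0..2]).
 * The k-th source point is written to the k-th destination point. */
void copy_points_2d_to_3d(const double *src, size_t src_rows, size_t src_cols,
                          const size_t (*src_pts)[2],
                          double *dst, const size_t dst_dims[3],
                          const size_t (*dst_pts)[3], size_t n)
{
    for (size_t k = 0; k < n; k++) {
        assert(src_pts[k][0] < src_rows && src_pts[k][1] < src_cols);
        assert(dst_pts[k][0] < dst_dims[0] && dst_pts[k][1] < dst_dims[1] &&
               dst_pts[k][2] < dst_dims[2]);
        size_t s = src_pts[k][0] * src_cols + src_pts[k][1];
        size_t d = (dst_pts[k][0] * dst_dims[1] + dst_pts[k][1]) * dst_dims[2]
                   + dst_pts[k][2];
        dst[d] = src[s];
    }
}
```

Both point lists must have the same length, which mirrors the requirement that the memory and file selections of a partial I/O operation contain the same number of elements.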

Layers – parallel example
• Application
• Parallel computing system (Linux cluster): many compute nodes
• I/O library (HDF5)
• Parallel I/O library (MPI-I/O)
• Parallel file system (GPFS)
• Switch network/I/O servers
• Disk architecture & layout of data on disk

I/O flows through many layers from application to disk.
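In the parallel case each compute node typically writes its own part of one shared dataset, which is again a partial I/O selection. The sketch below assumes a simple block-of-rows decomposition (a common choice, not something the slide prescribes) and computes the start row and row count each MPI rank would select; the function name is hypothetical and no MPI calls are made.

```c
#include <assert.h>
#include <stddef.h>

/* Rows [*start, *start + *count) owned by `rank` when nrows rows are split
 * as evenly as possible across nprocs ranks; the first nrows % nprocs
 * ranks each get one extra row. */
void rows_for_rank(size_t nrows, size_t nprocs, size_t rank,
                   size_t *start, size_t *count)
{
    assert(nprocs > 0 && rank < nprocs);
    size_t base = nrows / nprocs;
    size_t extra = nrows % nprocs;
    *count = base + (rank < extra ? 1 : 0);
    *start = rank * base + (rank < extra ? rank : extra);
}
```

Splitting 10 rows over 4 ranks gives row ranges starting at 0, 3, 6, 8 with counts 3, 3, 2, 2: the ranges are contiguous and cover every row exactly once, so each rank's selection can be written independently.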

Virtual I/O layer

[Figure: the three library layers, Object API (C, Fortran 90, Java, C++); Library internals; Virtual file I/O (C only)]

Virtual file I/O layer
• A public API for writing I/O drivers
• Allows HDF5 to interface to disk, the network, memory, or a user-defined device

[Figure: virtual file I/O drivers (Stdio, File Family, MPI I/O, Memory, Network) connecting the library to “Storage” (File, File Family, Memory, Network)]
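The virtual file layer separates what the library asks for (read or write a range of bytes at an address) from where those bytes actually go. The sketch below is a hypothetical, heavily simplified driver interface written as a table of function pointers, with a memory-backed implementation. HDF5's real driver interface (H5FD_class_t) has many more callbacks, and none of the names here come from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal driver interface: byte-range read/write at an address. */
typedef struct {
    int (*read)(void *file, size_t addr, size_t size, void *buf);
    int (*write)(void *file, size_t addr, size_t size, const void *buf);
} io_driver;

/* Memory-backed "file": a growable byte buffer. */
typedef struct {
    unsigned char *data;
    size_t         len;
} mem_file;

static int mem_write(void *f, size_t addr, size_t size, const void *buf)
{
    mem_file *m = f;
    if (size == 0)
        return 0;
    assert(buf != NULL);
    if (addr + size > m->len) {                  /* grow, zero-filling any gap */
        unsigned char *p = realloc(m->data, addr + size);
        if (p == NULL)
            return -1;
        memset(p + m->len, 0, addr + size - m->len);
        m->data = p;
        m->len = addr + size;
    }
    memcpy(m->data + addr, buf, size);
    return 0;
}

static int mem_read(void *f, size_t addr, size_t size, void *buf)
{
    const mem_file *m = f;
    if (size == 0)
        return 0;
    if (addr + size > m->len)
        return -1;                               /* read past end of "file" */
    memcpy(buf, m->data + addr, size);
    return 0;
}

/* Code above the driver layer calls only through this table. */
const io_driver mem_driver = { mem_read, mem_write };
```

A stdio, network, or user-defined driver would supply its own pair of functions with the same signatures, and the calling code would not need to change; that substitution is the point of the layer.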

[Figure: the HDF5 stack from applications down to storage]
• Apps: simulation, visualization, remote sensing…
  • Examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models
• Common application-specific data models: UDM, SAF, hdf5mesh, HDF-EOS, IDL, application-specific APIs (labeled LANL; LLNL, SNL; Grids; COTS; NASA)
• HDF5 data model & API
• HDF5 serial & parallel I/O
• HDF5 virtual file layer (I/O drivers): MPI I/O, Split Files, Stdio, Custom, Stream
• HDF5 format
• Storage: file; file on parallel file system; split metadata and raw data files; user-defined device; across the network or to/from another application or library

Other info
• Runs almost anywhere
  • Most workstations
  • Big ASCI machines, Cray, Compaq
  • TeraGrid and other clusters
• QA
  • Daily regression tests on key platforms
  • Meets NASA’s highest technology readiness level

Other HDF Software
• NCSA HDF
  • Java tools
  • Command-line utilities
  • Regression and performance testing software
• Commercial (IDL, Matlab, HDF Explorer, etc.)
• Community (EOS, ASCI, etc.)
• Integration with other software (SRB, etc.)

Thank you

HDF Information
• HDF Information Center: http://hdfgroup.org/
• HDF Help email address: [email protected]/
• HDF users mailing list: [email protected]/