HDF5 data model
• HDF5 file – container for scientific data
• Primary Objects
  • Groups
  • Datasets
• Additional ways to organize data
  • Attributes
  • Sharable objects
  • Storage and access properties
Everything else is built from these parts.
HDF5 Dataset
• Data
• Metadata
  • Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  • Datatype: IEEE 32-bit float
  • Attributes: time = 32.4, pressure = 987, temp = 56
  • Storage info: chunked, compressed
Dataspaces
• Dataspace – spatial info about a dataset
  • Rank and dimensions
    • Permanent part of dataset definition
  • Subset of points, for partial I/O
    • Needed only during I/O operations
• Apply to datasets in memory or in the file
[Figure: example dataspace with Rank = 2, Dimensions = 4x6]
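
The rank and dimensions described above can be modeled in a few lines. This is a minimal sketch in plain Python, not the HDF5 API: the `Dataspace` class and its field names are hypothetical and only illustrate that a dataspace is fully described by its dimension sizes.

```python
from dataclasses import dataclass
from math import prod
from typing import Tuple


@dataclass(frozen=True)
class Dataspace:
    """Hypothetical stand-in for a dataspace: only the dimension sizes."""
    dims: Tuple[int, ...]

    @property
    def rank(self) -> int:
        # Rank is the number of dimensions.
        return len(self.dims)

    @property
    def num_elements(self) -> int:
        # Total element count is the product of the dimension sizes.
        return prod(self.dims)


# The 4x6 example from the slide.
space = Dataspace(dims=(4, 6))
print(space.rank, space.num_elements)
```

The same class describes the 4 x 5 x 7 dataset from the earlier slide as `Dataspace((4, 5, 7))`.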
Datatypes (array elements)
• Datatype – how to interpret a data element
  • Permanent part of the dataset definition
  • Two classes: atomic and compound
Datatypes
• HDF5 atomic types
  • normal integer & float
  • user-definable (e.g. 13-bit integer)
  • variable length types (e.g. strings)
  • pointers - references to objects/dataset regions
  • enumeration - names mapped to integers
  • array
• HDF5 compound types
  • Comparable to C structs
  • Members can be atomic or compound types
HDF5 dataset: array of records
• Dimensionality: 5 x 3
• Datatype: Record
  • int8
  • int4
  • int16
  • 2x3x2 array of float32
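
To make the "array of records" idea concrete, the sketch below packs one such record with Python's `struct` module. It is an illustration under stated assumptions, not HDF5's on-disk layout: the record is assumed to be packed little-endian with no padding, and the slide's `int4` member (whose exact width is not stated) is assumed to occupy one full byte. The names `RECORD_FORMAT`, `pack_record`, and `unpack_record` are hypothetical.

```python
import struct

# Assumed layout: int8, "int4" stored in one byte, int16, then the
# 2x3x2 array flattened to 12 float32 values. "<" means little-endian
# with no alignment padding.
RECORD_FORMAT = "<bbh12f"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def pack_record(a, b, c, values):
    """Serialize one record; `values` is the flattened 2x3x2 array."""
    if len(values) != 12:
        raise ValueError("expected 2*3*2 = 12 float values")
    return struct.pack(RECORD_FORMAT, a, b, c, *values)


def unpack_record(raw):
    """Inverse of pack_record."""
    a, b, c, *values = struct.unpack(RECORD_FORMAT, raw)
    return a, b, c, values


# A 5 x 3 dataset holds 15 records of this type.
ROWS, COLS = 5, 3
dataset_bytes = ROWS * COLS * RECORD_SIZE

raw = pack_record(1, 2, 300, [float(i) for i in range(12)])
print(RECORD_SIZE, dataset_bytes, unpack_record(raw)[:3])
```

As with a C struct, every element of the dataset has the same member layout, so a reader can locate any member of any record from the record size and the member offsets.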
Attributes
• Attribute – data of the form “name = value”, attached to an object
• Operations scaled down versions of dataset operations
  • Not extendible
  • No compression
  • No partial I/O
• Optional for the dataset definition
• Can be overwritten, deleted, added during the “life” of a dataset
“Groups”
• A mechanism for collections of related objects
• Every file starts with a root group
• Similar to UNIX directories
• Can have attributes
[Figure: root group “/” containing groups tom, dick, and harry, with objects a, b, and c below them]
HDF5 objects are identified and located by their pathnames
• / (root)
• /x
• /foo
• /foo/temp
• /foo/bar/temp
[Figure: root group “/” containing x and foo; foo contains temp and bar; bar contains temp]
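
Pathname lookup of this kind can be sketched as a walk over nested groups. The code below is a plain-Python model, not the HDF5 API: `Group` and `resolve` are hypothetical names, and the leaf strings merely stand in for datasets.

```python
class Group(dict):
    """Hypothetical group: maps member names to groups or datasets."""


def resolve(root, path):
    """Follow an absolute pathname from the root group to an object."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path starting with '/'")
    node = root
    for name in path.split("/"):
        if not name:
            continue  # skip the empty pieces around slashes
        if not isinstance(node, Group):
            raise KeyError(f"{name!r} looked up inside a non-group object")
        node = node[name]
    return node


# The hierarchy from the slide: /x, /foo, /foo/temp, /foo/bar/temp.
root = Group(
    x="dataset /x",
    foo=Group(
        temp="dataset /foo/temp",
        bar=Group(temp="dataset /foo/bar/temp"),
    ),
)

print(resolve(root, "/foo/bar/temp"))
```

The two objects named temp are distinct because their full pathnames differ, which is the point of the slide.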
Groups & their members can be shared
• /tom/P
• /dick/R
• /harry/P
[Figure: root group “/” with groups tom, dick, and harry and members P, R, P]
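
Sharing means that one object can be reached through more than one pathname. The sketch below assumes, purely for illustration, that /tom/P and /harry/P name the same object; the slide's figure does not survive well enough to confirm which members are shared. Plain dictionaries stand in for groups and objects.

```python
# One object, stored once.
shared_p = {"name": "P", "data": [1, 2, 3]}
r = {"name": "R", "data": [9]}

# Two groups hold a link to the same object (an assumption, see above).
root = {
    "tom": {"P": shared_p},
    "dick": {"R": r},
    "harry": {"P": shared_p},
}

# A change made through one pathname is visible through the other,
# because both names refer to a single underlying object.
root["tom"]["P"]["data"].append(4)
same_object = root["tom"]["P"] is root["harry"]["P"]
print(same_object, root["harry"]["P"]["data"])
```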
Special Storage Options
• chunked: Better subsetting access time; extendable
• compressed: Improves storage efficiency, transmission speed
• extendable: Arrays can be extended in any direction
• Split file: Metadata in one file, raw data in another.
[Figure: dataset “Fred”, with metadata for Fred and data for Fred stored in separate files, File A and File B]
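
The subsetting benefit of chunked storage comes from only touching the chunks that overlap a request. The sketch below shows the index arithmetic in plain Python; `chunk_of` and `chunks_touched` are hypothetical helper names, and the 2 x 3 chunk shape is an assumed example, not something HDF5 prescribes.

```python
from itertools import product


def chunk_of(index, chunk_shape):
    """Return the chunk coordinates that hold one element."""
    return tuple(i // c for i, c in zip(index, chunk_shape))


def chunks_touched(start, count, chunk_shape):
    """List the chunks overlapped by a rectangular subset.

    `start` is the first element of the subset and `count` is its
    size along each dimension (each count must be at least 1).
    """
    ranges = [
        range(s // c, (s + n - 1) // c + 1)
        for s, n, c in zip(start, count, chunk_shape)
    ]
    return list(product(*ranges))


# A 4 x 6 dataset stored as 2 x 3 chunks (assumed example sizes).
chunk_shape = (2, 3)
print(chunk_of((3, 4), chunk_shape))
print(chunks_touched((0, 0), (2, 3), chunk_shape))
print(chunks_touched((1, 2), (2, 2), chunk_shape))
```

A subset aligned with a chunk needs one chunk read, while a subset that straddles chunk boundaries touches several, which is why chunk shape is usually chosen to match the expected access pattern.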
HDF5 Software stack
• Tools & Applications
• HDF I/O Library
• HDF File
Structure of HDF5 Library
• Object API (C, Fortran 90, Java, C++)
  • Specify objects and transformation properties
  • Invoke data movement operations and data transformations
• Library internals
  • Performs data transformations and other prep for I/O
  • Configurable transformations (compression, etc.)
• Virtual file I/O (C only)
  • Perform byte-stream I/O operations (open/close, read/write, seek)
  • User-implementable I/O (stdio, network, memory, etc.)
Partial I/O – move just part of a dataset
(a) Hyperslab from a 2D array to the corner of a smaller 2D array
(b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
[Figures: each panel shows the selection on disk and the corresponding selection in memory]
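
Cases (a) and (b) can be sketched with a single helper that enumerates the elements of a regular selection. This is a plain-Python model using nested lists, not the HDF5 API; the parameter names `start`, `stride`, `count`, and `block` follow the usual hyperslab vocabulary, while `hyperslab_coords` and the array sizes are assumptions made for the example.

```python
from itertools import product


def hyperslab_coords(start, stride, count, block):
    """Return the coordinates of a regular hyperslab in row-major order.

    Along each dimension the selection is `count` blocks of `block`
    elements, with consecutive blocks `stride` elements apart.
    """
    per_dim = [
        [s + i * st + j for i in range(c) for j in range(b)]
        for s, st, c, b in zip(start, stride, count, block)
    ]
    return list(product(*per_dim))


# "Disk" array: 4 x 6, where each value encodes its own position.
disk = [[10 * r + c for c in range(6)] for r in range(4)]

# (a) A 2 x 3 hyperslab copied to the corner of a smaller 3 x 4 array.
start = (1, 2)
memory_2d = [[0] * 4 for _ in range(3)]
for r, c in hyperslab_coords(start, (1, 1), (2, 3), (1, 1)):
    memory_2d[r - start[0]][c - start[1]] = disk[r][c]

# (b) A regular series of 1 x 2 blocks copied to a contiguous run
# starting at offset 3 of a 1D array.
offset = 3
memory_1d = [0] * 12
blocks = hyperslab_coords((0, 0), (2, 3), (2, 2), (1, 2))
for k, (r, c) in enumerate(blocks):
    memory_1d[offset + k] = disk[r][c]

print(memory_2d)
print(memory_1d)
```

In both cases the number of selected elements on disk equals the number selected in memory; only the shapes of the two selections differ.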
(c) A sequence of points from a 2D array to a sequence of points in a 3D array.
(d) Union of hyperslabs in file to union of hyperslabs in memory.
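
Cases (c) and (d) can be sketched in the same spirit. The code below is a plain-Python model, not the HDF5 API: the point lists, array sizes, and the `box` helper are all assumptions chosen to keep the example small.

```python
# "Disk" array: 4 x 6, where each value encodes its own position.
disk = [[10 * r + c for c in range(6)] for r in range(4)]

# (c) A sequence of points in a 2D array mapped, in order, to a
# sequence of points in a 2 x 2 x 2 array.
src_points = [(0, 1), (2, 3), (3, 5)]
dst_points = [(0, 0, 0), (1, 0, 1), (1, 1, 1)]
memory_3d = [[[0, 0] for _ in range(2)] for _ in range(2)]
for (r, c), (i, j, k) in zip(src_points, dst_points):
    memory_3d[i][j][k] = disk[r][c]


# (d) A union of two overlapping rectangular hyperslabs. Using a set
# means an element covered by both rectangles is selected only once.
def box(start, count):
    """Coordinates of a contiguous 2D rectangle."""
    return {
        (r, c)
        for r in range(start[0], start[0] + count[0])
        for c in range(start[1], start[1] + count[1])
    }


union = sorted(box((0, 0), (2, 2)) | box((1, 1), (2, 2)))
selected_values = [disk[r][c] for r, c in union]

print(memory_3d)
print(union)
```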
Layers – parallel example
• Application
• I/O library (HDF5)
• Parallel I/O library (MPI-I/O)
• Parallel file system (GPFS)
• Switch network/I/O servers
• Disk architecture & layout of data on disk
[Figure: the application runs on the compute nodes of a parallel computing system (Linux cluster)]
I/O flows through many layers from application to disk.
Virtual I/O layer
• Object API (C, Fortran 90, Java, C++)
• Library internals
• Virtual file I/O (C only)
Virtual file I/O layer
• A public API for writing I/O drivers
• Allows HDF5 to interface to disk, the network, memory, or a user-defined device
• Virtual file I/O drivers: Stdio, File Family, MPI I/O, Memory, Network
• “Storage”: File, File Family, Memory, Network
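
The driver idea can be sketched as one small interface with interchangeable back ends. The code below is a plain-Python illustration, not HDF5's actual driver API (which is a C interface): the `Driver`, `MemoryDriver`, and `FileDriver` classes and the `store_and_load` helper are hypothetical names.

```python
import io
import os
import tempfile
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical byte-stream interface the upper layers rely on."""

    @abstractmethod
    def write(self, offset: int, data: bytes) -> None: ...

    @abstractmethod
    def read(self, offset: int, size: int) -> bytes: ...

    @abstractmethod
    def close(self) -> None: ...


class _StreamDriver(Driver):
    """Shared seek/read/write logic for any seekable binary stream."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, offset, data):
        self._stream.seek(offset)
        self._stream.write(data)

    def read(self, offset, size):
        self._stream.seek(offset)
        return self._stream.read(size)

    def close(self):
        self._stream.close()


class MemoryDriver(_StreamDriver):
    def __init__(self):
        super().__init__(io.BytesIO())


class FileDriver(_StreamDriver):
    def __init__(self, path):
        super().__init__(open(path, "w+b"))


def store_and_load(driver, payload):
    """Caller code is identical no matter which driver is plugged in."""
    driver.write(8, payload)
    return driver.read(8, len(payload))


from_memory = store_and_load(MemoryDriver(), b"raw data")

with tempfile.TemporaryDirectory() as tmp:
    file_driver = FileDriver(os.path.join(tmp, "example.bin"))
    from_file = store_and_load(file_driver, b"raw data")
    file_driver.close()

print(from_memory == from_file)
```

Because the caller only sees the interface, a network or user-defined device could be added as another subclass without changing `store_and_load`.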
[Figure: HDF5 in the application stack, top to bottom]
• Apps: simulation, visualization, remote sensing…
  • Examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models
• Common application-specific data models
  • appl-specific APIs: UDM, SAF, hdf5mesh, HDF-EOS, IDL
  • LANL; LLNL, SNL; Grids; COTS; NASA
• HDF5 data model & API
• HDF5 serial & parallel I/O
• HDF5 virtual file layer (I/O drivers): Stdio, Split Files, MPI I/O, Stream, Custom
• HDF5 format
• Storage
  • File
  • Split metadata and raw data files
  • File on parallel file system
  • Across the network or to/from another application or library
  • User-defined device (?)
Other info
• Runs almost anywhere
  • Most workstations
  • Big ASCI machines, Cray, Compaq
  • TeraGrid and other clusters
• QA
  • Daily regression tests on key platforms
  • Meets NASA’s highest technology readiness level
Other HDF Software
• NCSA HDF
  • Java tools
  • Command-line utilities
  • Regression and performance testing software
• Commercial (IDL, Matlab, HDF Explorer, etc.)
• Community (EOS, ASCI, etc.)
• Integration with other software (SRB, etc.)
HDF Information
• HDF Information Center
  • http://hdfgroup.org/
• HDF Help email address
  • [email protected]/
• HDF users mailing list
  • [email protected]/