Victoria, May 2006
DAL for theorists: Implementation of the SNAP service for the TVO
Claudio Gheller, Giuseppe Fiameni
InterUniversitary Computing Center CINECA, Bologna
Ugo Becciani, Alessandro Costa
Astrophysical Observatory of Catania
The Simple Numerical Access Protocol Service
The Snap service extracts, or "cuts out", rectangular regions (spherical or even irregular regions are planned) of a larger theory dataset, returning a subset of the requested size to the client.
Snap basic components:
• DATA
• SNAP code
• SERVICE
1. Data and Data Model
To analyze the requirements of data produced by numerical simulations, we considered a wide range of applications:
• Particle based Cosmological simulations
• Grid based Cosmological simulations
• Magnetohydrodynamics simulations
• Planck mission simulated data
• ... (thanks to V. Antonuccio, G. Bodo, S. Borgani, N. Lanza, L. Tornatore)
At the moment, we consider only RAW data
1. Data
In general, data produced by numerical simulations are
• Large (GB to TB scale)
• Monolithic (a few files contain large amounts of data)
• Uncompressible
• Non-standard (proprietary formats are the rule)
• Non-portable (dependent on the machine that ran the simulation)
• Poorly annotated (little or no metadata)
• Heterogeneous in units (often code units)
Data: the HDF5 format
HDF5 (http://hdf.ncsa.uiuc.edu) is a possible solution for handling such data.
HDF5 is:
• Portable across most modern platforms
• High performance
• Well supported
• Well documented
• Rich in tools
HDF5 data files are:
• Platform independent (portable)
• Well organized
• Self-describing
• Metadata enriched
• Efficiently accessible
HDF5 drawbacks:
• Requires some expertise and skill to use
• Information can be difficult to access
• Can be subject to major library changes (see HDF4 to HDF5)
Data: our HDF5 implementation
Each file corresponds to a single output time.
The structure is simple: all the data objects are at the root level:
/BmMassDensity Dataset {512, 512, 512}
/BmTemperature Dataset {512, 512, 512}
/BmVelocity Dataset {512, 512, 512, 3}
/DmMassDensity Dataset {512, 512, 512}
/DmPosition Dataset {134217728, 3}
/DmVelocity Dataset {134217728, 3}
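These shapes explain why such files are large: the particle count 134217728 is 512³, the same as the number of grid cells. A minimal C++ sketch of the size arithmetic follows; the 4-byte element size (single-precision float) is an assumption, since the listing does not state the element type, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Number of elements in a dataset with the given HDF5-style dimensions.
std::uint64_t elementCount(const std::vector<std::uint64_t>& dims) {
    return std::accumulate(dims.begin(), dims.end(), std::uint64_t{1},
                           [](std::uint64_t a, std::uint64_t b) { return a * b; });
}

// Storage in bytes, ASSUMING a fixed element size (4 = single-precision float).
std::uint64_t datasetBytes(const std::vector<std::uint64_t>& dims,
                           std::uint64_t bytesPerElement = 4) {
    assert(bytesPerElement > 0);
    return elementCount(dims) * bytesPerElement;
}
```

Under that assumption each scalar grid takes 512 MiB and each {512, 512, 512, 3} or {134217728, 3} dataset takes 1.5 GiB, so the six datasets total 6 GiB per output time; with 8-byte doubles every figure doubles.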
HDF5 metadata make each file completely self-describing.
Structural metadata (strictly required by the library):
• Rank
• Dimensionality
Annotation metadata (required by our implementation):
• Data object name
• Data object description
• Unit
• Formula
Data objects can currently be:
• Structured grids: rank 3 (scalars) or rank 4 (vectors)
• Unstructured points: rank 2 (scalars or vectors)
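This data model lends itself to a mechanical check. The C++ sketch below classifies a data object by the shapes seen in the dataset listing ({nx, ny, nz} grid scalars, {nx, ny, nz, 3} grid vectors, {N, k} particle data) and verifies that the four annotation fields are present. The `DataObject` struct and the function names are hypothetical, and the metadata is assumed to have already been read from the file's HDF5 attributes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory view of one data object (names are illustrative).
struct DataObject {
    std::string name;                 // annotation metadata
    std::string description;
    std::string unit;
    std::string formula;
    std::vector<std::size_t> dims;    // structural metadata: rank = dims.size()
};

enum class Kind { GridScalar, GridVector, Particles, Unsupported };

// Classify by the shapes in the listing: {nx,ny,nz} grid scalar,
// {nx,ny,nz,3} grid vector, {N,k} particle data.
Kind classify(const DataObject& obj) {
    switch (obj.dims.size()) {
        case 3: return Kind::GridScalar;
        case 4: return obj.dims[3] == 3 ? Kind::GridVector : Kind::Unsupported;
        case 2: return Kind::Particles;
        default: return Kind::Unsupported;
    }
}

// True only if all four annotation fields required by the model are non-empty.
bool hasAnnotations(const DataObject& obj) {
    return !obj.name.empty() && !obj.description.empty() &&
           !obj.unit.empty() && !obj.formula.empty();
}
```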
Implementation of the model
The database is currently implemented on a PostgreSQL installation running on Linux.
The Snap Code: overview
The Snap code operates on large data files across different platforms, so it was implemented according to the following requirements:
• Efficiency
• Robustness
• Portability
• Extensibility
We adopted C++ as the programming language, built on the HDF5 format and APIs.
It compiles on Linux (GNU compiler) and AIX (xlC compiler).
Figure: the SNAP service reads a source HDF5 file (Dataset 1 ... Dataset N) and produces a snapped HDF5 file (Dataset 1 ... Dataset M) for download.
The Snap Code
Input:
• Data filename
• Data objects (one or more)
• Spatial units
• Box center
• Box size
• Output filename
• Data object names
Output:
• One or more HDF5 files with the same descriptive metadata as the original dataset.
Goal: select all the data that fall inside a predefined region. At present the region can only be rectangular.
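The inputs listed above map naturally onto a request structure from which the rectangular region is derived. This is a minimal sketch: `SnapRequest`, `Box`, `regionOf` and `contains` are hypothetical names, and it assumes that Box Size is the full edge length along each axis (not a half-width) and that the region is half-open, [lo, hi).

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical container for the Snap inputs; field names are illustrative.
struct SnapRequest {
    std::string dataFile;                 // source HDF5 file
    std::vector<std::string> objects;     // data objects to cut out
    std::string spatialUnits;             // units of center and size
    std::array<double, 3> boxCenter;
    std::array<double, 3> boxSize;        // ASSUMED: full edge lengths
    std::string outputFile;
    std::vector<std::string> outputNames;
};

struct Box { std::array<double, 3> lo, hi; };

// Rectangular region spanned by center -/+ size/2 along each axis.
Box regionOf(const SnapRequest& r) {
    Box b{};
    for (int d = 0; d < 3; ++d) {
        assert(r.boxSize[d] >= 0.0);
        b.lo[d] = r.boxCenter[d] - 0.5 * r.boxSize[d];
        b.hi[d] = r.boxCenter[d] + 0.5 * r.boxSize[d];
    }
    return b;
}

// Half-open membership test: lo <= x < hi on every axis.
bool contains(const Box& b, const std::array<double, 3>& x) {
    for (int d = 0; d < 3; ++d)
        if (x[d] < b.lo[d] || x[d] >= b.hi[d]) return false;
    return true;
}
```

The half-open convention means a point lying exactly on the upper face belongs to the neighbouring region, so adjacent cut-outs never select the same point twice.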
The Snap Code
Mesh-based data:
Selection is performed with the HDF5 hyperslab selection functions, so only the necessary data are loaded into memory.
Selection is extremely fast.
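For mesh-based data the rectangular region has to be turned into the per-axis `start` and `count` arrays taken by the HDF5 C function `H5Sselect_hyperslab`. The sketch below assumes a uniform grid of `n` cells per axis covering [0, boxLength) in the request's spatial units, selects every cell that overlaps the region, and clamps the result to the grid (no periodic wrap). The names are hypothetical, and HDF5 itself is not called, so the arithmetic stands alone.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Per-axis start/count, in the layout H5Sselect_hyperslab expects.
struct Hyperslab {
    std::array<std::uint64_t, 3> start{};
    std::array<std::uint64_t, 3> count{};
};

// ASSUMPTIONS: uniform grid of n cells per axis spanning [0, boxLength);
// lo/hi are the region bounds in the same units; no periodic wrap.
// Every cell that overlaps [lo, hi) is selected, clamped to the grid.
Hyperslab toHyperslab(const std::array<double, 3>& lo,
                      const std::array<double, 3>& hi,
                      std::uint64_t n, double boxLength) {
    assert(n > 0 && boxLength > 0.0);
    const double cell = boxLength / static_cast<double>(n);
    const double cells = static_cast<double>(n);
    Hyperslab h;
    for (int d = 0; d < 3; ++d) {
        // First overlapping cell, and one past the last overlapping cell.
        const double first = std::min(std::max(std::floor(lo[d] / cell), 0.0), cells);
        const double last = std::min(std::max(std::ceil(hi[d] / cell), 0.0), cells);
        h.start[d] = static_cast<std::uint64_t>(first);
        h.count[d] = last > first ? static_cast<std::uint64_t>(last - first) : 0;
    }
    return h;
}
```

The arrays would then be copied into `hsize_t` arrays and passed to `H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL)` before `H5Dread`, so that only the selected cells are read; for a vector dataset such as BmVelocity a fourth axis with start 0 and count 3 would be appended.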
Particle-based data:
Particle positions are loaded into memory.
Particles inside the selected region are identified and their IDs are stored in a linked list.
The other particle-based datasets are loaded into memory, and the list is used to select the target particles.
Selected particles are written to the output file.
The procedure can become “heavy”, since whole datasets must be held in memory.
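The particle procedure can be sketched as two passes over in-memory arrays. In this hypothetical C++ version a `std::vector` of indices stands in for the linked list described above, positions are assumed to be the rows of an {N, 3} dataset already read into memory, and the region test is half-open.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Steps 1-2: scan the positions and collect the indices of the particles
// inside [lo, hi). A std::vector replaces the linked list of the original.
std::vector<std::size_t> selectIds(const std::vector<Vec3>& pos,
                                   const Vec3& lo, const Vec3& hi) {
    std::vector<std::size_t> ids;
    for (std::size_t i = 0; i < pos.size(); ++i) {
        bool inside = true;
        for (int d = 0; d < 3; ++d)
            inside = inside && pos[i][d] >= lo[d] && pos[i][d] < hi[d];
        if (inside) ids.push_back(i);
    }
    return ids;
}

// Step 3: use the same index list to pick the matching rows of any other
// particle dataset (e.g. velocities), preserving the order of `ids`.
template <typename T>
std::vector<T> gather(const std::vector<T>& data, const std::vector<std::size_t>& ids) {
    std::vector<T> out;
    out.reserve(ids.size());
    for (std::size_t i : ids) {
        assert(i < data.size());
        out.push_back(data[i]);
    }
    return out;
}
```

Because `selectIds` returns indices in ascending order, `gather` walks each dataset sequentially; the dominant cost remains holding the full datasets in memory.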
Data geometry and topology: at present we support regular mesh-based data and unstructured data (particles). The data structure is crucial, since it determines how Snap performs the selection.
Future upgrades:
Support for spherical (or even irregular) regions
Support for periodic boundary conditions
Parallel implementation
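Periodic boundary conditions are listed only as a planned upgrade. For particle data, one possible approach is the minimum-image convention, sketched here under the assumption of a cubic periodic domain [0, L) and box edges shorter than L; the function name is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Periodic membership test for a cubic domain [0, L) on every axis.
// Each displacement from the box center is wrapped to the nearest periodic
// image before comparing with the half-size, so a box that crosses a domain
// face also selects particles near the opposite face.
// ASSUMES each box edge is shorter than L.
bool insidePeriodic(const Vec3& x, const Vec3& center, const Vec3& size, double L) {
    assert(L > 0.0);
    for (int d = 0; d < 3; ++d) {
        double dx = x[d] - center[d];
        dx -= L * std::round(dx / L);          // now in [-L/2, L/2]
        const double half = 0.5 * size[d];
        if (dx < -half || dx >= half) return false;
    }
    return true;
}
```

For mesh-based data the same idea would split a wrapped box into up to two hyperslabs per axis.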
Access to the Archive (the service)
The archive can be accessed in two complementary ways:
• Via the web, through a web portal
• Via a web service, from high-level applications
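Figure: access architecture. The Data Archive (data + metadata + apps) is reached via WEB and WEB SERVICE; components shown are the Web Portal, VisIVO, User app. 1, User app. 2, Tomcat + Axis, OGSA-DAI and PHP, Java…, with one path labelled “Recommended”.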