
Victoria, May 2006

DAL for theorists: Implementation of the SNAP service for the TVO

Claudio Gheller, Giuseppe Fiameni (Interuniversity Computing Center CINECA, Bologna)

Ugo Becciani, Alessandro Costa (Astrophysical Observatory of Catania)


The Simple Numerical Access Protocol Service

The Snap service extracts or "cuts out" rectangular (spherical or even irregular) regions of some larger theory dataset, returning a subset of the requested size to the client.

Snap basic components:

• DATA

• SNAP code

• SERVICE


1. Data and Data Model

In order to analyze the needs of data produced by numerical simulations, we have considered a wide spectrum of applications:

• Particle based Cosmological simulations

• Grid based Cosmological simulations

• Magnetohydrodynamics simulations

• Planck mission simulated data

• ...(thanks to V. Antonuccio, G. Bodo, S. Borgani, N. Lanza, L. Tornatore)

At the moment, we consider only RAW data


1. Data

In general, data produced by numerical simulations are

• Large (GB to TB scale)

• Monolithic (a few files contain most of the data)

• Hard to compress

• Non standard (proprietary formats are the rule)

• Non portable (they depend on the machine that ran the simulation)

• Few or no annotations (metadata)

• Heterogeneous in units (often code units)


Data: the HDF5 format

HDF5 (http://hdf.ncsa.uiuc.edu) represents a possible solution to deal with such data

HDF5 is

• Portable between most modern platforms

• High performance

• Well supported

• Well documented

• Rich in tools

HDF5 data files are

• Platform independent (portable)

• Well organized

• Self-describing

• Metadata enriched

• Efficiently accessible

HDF5 drawbacks

• Requires some expertise and skill to be used

• Information is difficult to access

• Can be subject to major library changes (see HDF4 to HDF5)


Data: our HDF5 implementation

Each file represents an output time

The structure is simple: all the data objects are at the root level:

/BmMassDensity Dataset {512, 512, 512}

/BmTemperature Dataset {512, 512, 512}

/BmVelocity Dataset {512, 512, 512, 3}

/DmMassDensity Dataset {512, 512, 512}

/DmPosition Dataset {134217728, 3}

/DmVelocity Dataset {134217728, 3}
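To give a feeling for the volumes behind the listing above, the following sketch computes the element count and in-memory size of a dataset from its dimensions. The 4-byte element size is an assumption (the slides do not state the stored precision), and `elementCount` and `sizeInBytes` are illustrative names, not part of the Snap code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of elements in a dataset with the given dimensions.
std::uint64_t elementCount(const std::vector<std::uint64_t>& dims) {
    assert(!dims.empty());
    std::uint64_t n = 1;
    for (std::uint64_t d : dims) n *= d;
    return n;
}

// Size in bytes, assuming every element occupies bytesPerElement bytes.
// The default of 4 bytes (single precision) is an assumption, not a fact
// taken from the slides.
std::uint64_t sizeInBytes(const std::vector<std::uint64_t>& dims,
                          std::uint64_t bytesPerElement = 4) {
    return elementCount(dims) * bytesPerElement;
}
```

For example, `elementCount({512, 512, 512})` is 134217728, which matches the number of particles in /DmPosition. Under the 4-byte assumption a scalar grid occupies 512 MiB, and a vector grid or a particle array occupies 1.5 GiB.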

HDF5 metadata make the file completely self-describing

Structural metadata (strictly required by the library)

• Rank

• Dimensionality

Annotation metadata (required by our implementation)

• Data object name

• Data object description

• Unit

• Formula

Data objects (at the moment) can be:

• Structured grid: rank 4 (scalars or vectors)

• Unstructured points: rank 2 (scalars or vectors)
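The split between structural and annotation metadata described above could be mirrored in memory roughly as follows. This is only a sketch: the `DataObject` type and its member names are hypothetical and are not taken from the Snap sources.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One data object of the file (hypothetical type, for illustration only).
struct DataObject {
    // Structural metadata, kept by the HDF5 library itself.
    std::vector<std::size_t> dims;  // e.g. {512, 512, 512, 3}

    // Annotation metadata, required by the implementation described here.
    std::string name;
    std::string description;
    std::string unit;
    std::string formula;

    // The rank is the number of dimensions.
    std::size_t rank() const {
        assert(!dims.empty());
        return dims.size();
    }

    // True when all four annotation fields are filled in.
    bool isAnnotated() const {
        return !name.empty() && !description.empty() &&
               !unit.empty() && !formula.empty();
    }
};
```

A check like `isAnnotated()` could be run when a file is ingested into the archive, so that objects with missing units or descriptions are rejected early.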


Data Model schema


Implementation of the model

The database is at present implemented on a PostgreSQL Linux installation.


The Snap Code: overview

The Snap code acts on large datafiles on different platforms. Therefore it has been implemented according to the following requirements:

• Efficiency

• Robustness

• Portability

• Extensibility

The code is written in C++ on top of the HDF5 format and APIs.

It compiles under Linux (GNU compiler) and AIX (xlC compiler).

[Figure: the Snap code reads Dataset 1 to Dataset N from the source HDF5 file and writes Dataset 1 to Dataset M to the snapped HDF5 file.]


The Snap Code

Input:

Data filename

Data objects (one or more)

Spatial Units

Box Center

Box Size

Output filename

Data objects names

Output:

One or more HDF5 files with the same descriptive metadata as the original dataset.

Goal: select all the data that fall inside a pre-defined region. At present the region can only be rectangular.
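The rectangular region defined by the Box Center and Box Size inputs can be sketched as a small helper type. The names `Box` and `makeBox` are hypothetical, and the choice of an inclusive lower bound and exclusive upper bound is an assumption made for the example.

```cpp
#include <array>
#include <cassert>

// Axis-aligned box given by its center and full edge lengths, both in the
// spatial units of the data (hypothetical type, for illustration only).
struct Box {
    std::array<double, 3> center;
    std::array<double, 3> size;

    double lower(int axis) const { return center[axis] - 0.5 * size[axis]; }
    double upper(int axis) const { return center[axis] + 0.5 * size[axis]; }

    // True when p lies inside the box (lower bound inclusive, upper exclusive).
    bool contains(const std::array<double, 3>& p) const {
        for (int a = 0; a < 3; ++a) {
            if (p[a] < lower(a) || p[a] >= upper(a)) return false;
        }
        return true;
    }
};

// Build a box, checking that every edge length is positive.
Box makeBox(const std::array<double, 3>& center,
            const std::array<double, 3>& size) {
    for (double s : size) assert(s > 0.0);
    return Box{center, size};
}
```

With `makeBox({0.5, 0.5, 0.5}, {0.2, 0.2, 0.2})`, the box spans 0.4 to 0.6 along each axis, so the point (0.5, 0.45, 0.55) is inside and (0.7, 0.5, 0.5) is not.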


The Snap Code

Mesh Based Data:

Selection is performed using the HDF5 hyperslab selection functions. Only the necessary data are loaded in memory.

Selection is extremely fast.
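The mesh-based selection can be sketched as follows, under some assumptions: the grid is taken to span [0, domain) along each axis with uniform cells, data are stored in row-major (C) order, and periodic wrapping is ignored. `Slab`, `boxToSlab` and `extractSlab` are hypothetical names. In real HDF5 code the computed start and count would be passed to `H5Sselect_hyperslab` so that only the selected cells are read from disk; here an in-memory array stands in for the file to keep the example self-contained.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Offset and extent of a rectangular sub-block, in cells (hypothetical type).
struct Slab {
    std::array<std::size_t, 3> start;
    std::array<std::size_t, 3> count;
};

// Map a box (center, size) to the cells it overlaps. The box is clipped to
// the grid; an axis with no overlap gets start = 0 and count = 0.
Slab boxToSlab(const std::array<double, 3>& center,
               const std::array<double, 3>& size,
               const std::array<std::size_t, 3>& dims,
               double domain) {
    assert(domain > 0.0);
    Slab s{};
    for (int a = 0; a < 3; ++a) {
        assert(dims[a] > 0);
        const double cell = domain / static_cast<double>(dims[a]);
        const double lo = std::max(0.0, center[a] - 0.5 * size[a]);
        const double hi = std::min(domain, center[a] + 0.5 * size[a]);
        if (hi <= lo) continue;  // no overlap along this axis
        const auto first = static_cast<std::size_t>(std::floor(lo / cell));
        const auto last =
            std::min(static_cast<std::size_t>(std::ceil(hi / cell)), dims[a]);
        s.start[a] = first;
        s.count[a] = last - first;
    }
    return s;
}

// Copy the selected cells out of a row-major scalar field.
std::vector<float> extractSlab(const std::vector<float>& field,
                               const std::array<std::size_t, 3>& dims,
                               const Slab& s) {
    assert(field.size() == dims[0] * dims[1] * dims[2]);
    std::vector<float> out;
    out.reserve(s.count[0] * s.count[1] * s.count[2]);
    for (std::size_t i = s.start[0]; i < s.start[0] + s.count[0]; ++i)
        for (std::size_t j = s.start[1]; j < s.start[1] + s.count[1]; ++j)
            for (std::size_t k = s.start[2]; k < s.start[2] + s.count[2]; ++k)
                out.push_back(field[(i * dims[1] + j) * dims[2] + k]);
    return out;
}
```

On an 8 x 8 x 8 grid with `domain = 1.0`, a box centered at (0.5, 0.5, 0.5) with edge 0.25 maps to start 3 and count 2 along each axis, so 8 cells are extracted. The cost depends on the size of the selected block rather than on the size of the whole dataset, which is consistent with the speed noted above.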

Particle Based Data:

Particle positions are loaded in memory

Particles inside the selected region are identified and their ids are stored (linked list)

Other particle based datasets are loaded in memory and the list is used to select the target particles

Selected particles are written to the output file

The procedure can become "heavy", since whole particle datasets are loaded in memory
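The particle-based steps above can be sketched as two small functions: one that scans the positions and records the ids of the particles inside the box, and one that applies that id list to any other per-particle dataset. The names `selectIds` and `gather` are hypothetical, and in-memory vectors stand in for the HDF5 datasets; the linked list follows the slides.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

using Vec3 = std::array<float, 3>;

// Scan the positions and record the index of every particle that falls
// inside the box [lo, hi) along all three axes.
std::list<std::size_t> selectIds(const std::vector<Vec3>& positions,
                                 const Vec3& lo, const Vec3& hi) {
    std::list<std::size_t> ids;
    for (std::size_t p = 0; p < positions.size(); ++p) {
        bool inside = true;
        for (int a = 0; a < 3; ++a) {
            inside = inside && positions[p][a] >= lo[a] && positions[p][a] < hi[a];
        }
        if (inside) ids.push_back(p);
    }
    return ids;
}

// Apply the same id list to another per-particle dataset (velocity, mass...).
template <typename T>
std::vector<T> gather(const std::vector<T>& dataset,
                      const std::list<std::size_t>& ids) {
    std::vector<T> out;
    out.reserve(ids.size());
    for (std::size_t id : ids) {
        assert(id < dataset.size());
        out.push_back(dataset[id]);
    }
    return out;
}
```

Every position has to be tested, so the scan costs time and memory proportional to the total number of particles, not to the size of the selected region. A contiguous `std::vector` of ids would usually be more compact than a linked list; the list is kept here only to match the description above.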

Data Geometry and Topology: at present we support regular mesh based data and unstructured data (particles). The data structure largely determines how the Snap selection is implemented.

Future upgrades:

Support of spherical (or even irregular) regions

Support of periodic boundary conditions

Parallel implementation


Access to the Archive (the service)

The archive can be accessed in two complementary ways:

• Via web and web portal

• Via web service and high level applications

[Figure: access paths to the Data Archive (data + metadata + apps). Labels: WEB, WEB SERVICE, Web Portal, VisIVO, User app. 1, User app. 2, TomCat+Axis, OGSA-DAI, PHP, Java…]
