
19.07.2012, GRID2012, Dubna

GRID AND HPC SUPPORT

FOR NATIONAL PARTICIPATION

IN LARGE-SCALE COLLABORATIONS

Department of Computational Physics and Information Technologies,

'Horia Hulubei' National Institute for R&D in Physics

and Nuclear Engineering (IFIN-HH),

Magurele, Romania

M. Dulea, S. Constantinescu, M. Ciubancan,

T. Ivanoaica, C. Placinta, I.T. Vasile, D. Ciobanu-Zabet

NATIONAL GRID INFRASTRUCTURE

Resource centres:

• 11 active sites hosted by 7 institutions
• Total number of cores: 5830
• Total disk capacity: 1.8 PB

Network infrastructure provided by the NREN, RoEduNet (min. 10 Gbps)

SUPPORTED VOs AND GRID USAGE

WLCG VOs cover 98% of the national Grid production (reported period: 07.2011-06.2012):

• alice - 54% (supported by 3 centres)
• atlas - 41% (supported by 4 centres)
• lhcb - 3% (supported by 3 centres)

The remaining 2% comes from the other supported VOs:

• envirogrids.vo.eu-egee.org - models & scenarios for the environment of the Black Sea Catchment
• gridifin - provides a framework for running non-HEP applications, supported by the National Grid for Physics and Related Areas (GriNFiC)
• see, seegrid - regional VOs (South East Europe)
• hone, ilc

SUPPORTED VOs AND GRID USAGE (07.2011-06.2012)

Tier-2 normalised CPU time per country (HEPSPEC06-hours):

RO-LCG total: 114,842,196 (a 1.58% share, 10th position out of 33 countries), of which:

• ALICE: 64,488,248 (6th)
• ATLAS: 46,523,872
• LHCb: 3,830,076

SUPPORTED VOs AND GRID USAGE (07.2011-06.2012)

Tier-2 number of jobs:

RO-LCG total: 15,009,823 (a 3.63% share, 9th position out of 33 countries), of which:

• alice: 6,521,359 (first)
• atlas: 8,335,369 (9th), with 90.6% CPU efficiency (4th)
• lhcb: 153,095

Year-over-year growth factors of the normalised CPU time (Tier-2 average vs. RO-LCG):

Years                    Tier-2   RO-LCG
2010-2011 / 2009-2010    1.74     2.16
2011-2012 / 2010-2011    1.44     2.65

COLLABORATIONS AND PROJECTS (DFCTI)

• Coordination of the Romanian HPC consortium that participates in the HP-SEE project, High-Performance Computing Infrastructure for South East Europe's Research Communities (FP7-RI-261499, 2010-2013)
• Coordination of the Romanian Tier-2 Federation RO-LCG, composed of 5 institutions that participate in the Worldwide LHC Computing Grid (WLCG) collaboration with CERN (since 2006)
• IRFU/CEA - IFIN-HH collaboration on Efficient Handling and Processing of Petabyte-Scale Data for the Computing Centres within the French Cloud (HaPPSDaG)
• Coordination of the National Grid for Research in Physics and Related Areas (GriCeFCo), built through EU structural funds (SOP IEC 2.2.3 - Grid)
• RO - LIT-JINR / Dubna collaboration within the Hulubei-Meshcheriakov programme (2005-2013), project: Optimization Investigations of the GRID and Parallel Computing Facilities at LIT-JINR and Magurele Campus

GRID AND HPC INFRASTRUCTURE @ DFCTI

COMPUTING CENTER:

• IFIN_Bio cluster: Myrinet 2000, 256 cores
• IFIN_BC cluster: Infiniband, 1040 cores
• APC InfraStruXure Data Center

LOCAL NETWORK TOPOLOGY

DFCTI hosts 3 grid sites:

• RO-07-NIPNE (alice, atlas, lhcb)
• RO-11-NIPNE (lhcb)
• IFIN GRID (GriNFiC)

DD: dpm disk storage servers; 2 CREAM CEs, VO-Box, etc.

Running atlas analysis jobs required a scalable solution for handling the increasing data transfers between the SE and the WNs. A convenient solution was the stacking of switches, which can provide the required minimum bandwidth when the number of simultaneous transfers grows (see the sizing sketch below).

[Figure: local network topology of the IFIN GRID and RO-07 sites]
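To make the bandwidth requirement concrete, here is a back-of-the-envelope sizing sketch in Python. The transfer counts, per-transfer rate, and oversubscription ratio below are hypothetical illustration values, not measured RO-07 figures:

```python
# Illustrative sizing of the inter-switch (stacking) bandwidth needed so
# that simultaneous SE <-> WN transfers are not throttled by the uplink.
# All figures are hypothetical, not measured RO-07 values.

def required_stack_bandwidth_gbps(n_transfers: int,
                                  per_transfer_gbps: float,
                                  oversubscription: float = 2.0) -> float:
    """Minimum stacking bandwidth for n concurrent transfers, given an
    acceptable oversubscription ratio."""
    aggregate = n_transfers * per_transfer_gbps
    return aggregate / oversubscription

if __name__ == "__main__":
    # e.g. concurrent atlas analysis transfers at ~0.3 Gbps each
    for n in (25, 50, 100, 200):
        need = required_stack_bandwidth_gbps(n, 0.3)
        print(f"{n:4d} transfers -> >= {need:5.1f} Gbps stacking bandwidth")
```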

MONITORING

Provided at 4 levels:

1) utility and support equipment (electric power, UPS, cooling, etc.)
2) data traffic on the main switches (a generic rate-sampling sketch follows below)
3) grid activity: number of jobs of different types, Grid production (normalised CPU time), and the traffic on the main servers (SE, GW, etc.), presented on a common interface: http://www.nipne.ro/RO-07-NIPNE.html
4) service availability (detailed on the next slide)
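Traffic-level monitoring ultimately comes down to sampling byte counters and deriving rates. Below is a minimal, generic sketch of that pattern; the counter source (a Linux sysfs file) and the interface name are illustrative only, not the actual RO-07 tooling, which would read switch counters, e.g. via SNMP:

```python
# Minimal sketch of traffic-rate sampling: poll a byte counter twice
# and derive the throughput over the interval.
import time

def read_tx_bytes(iface: str) -> int:
    # Example counter source; a switch would be queried over SNMP instead.
    with open(f"/sys/class/net/{iface}/statistics/tx_bytes") as f:
        return int(f.read())

def throughput_gbps(iface: str, interval_s: float = 5.0) -> float:
    before = read_tx_bytes(iface)
    time.sleep(interval_s)
    after = read_tx_bytes(iface)
    return (after - before) * 8 / interval_s / 1e9

if __name__ == "__main__":
    print(f"eth0: {throughput_gbps('eth0'):.3f} Gbps")
```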

SERVICE AVAILABILITY MONITORING

SAM tests are provided by IFIN GRID through the ifops VO; the results are transferred to the WMS and published by Nagios.

ifops is dedicated to the monitoring of the GriNFiC sites, including those of RO-LCG.

The GUI allows comparing the results of the ifops tests with those of the ops tests performed by the NGI and published on the EGI monitoring portal, as sketched below.
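As an illustration of that comparison, the sketch below diffs two per-service result sets. The service names and statuses are invented, and the real ifops/ops tests cover many more probes:

```python
# Hypothetical sketch of the ifops-vs-ops comparison performed by the
# GUI: given per-service test outcomes from the two VOs, flag the
# services where they disagree. Data layout is invented.

ifops = {"CREAM-CE": "OK", "SE": "OK",       "BDII": "WARNING"}
ops   = {"CREAM-CE": "OK", "SE": "CRITICAL", "BDII": "WARNING"}

def compare(a: dict, b: dict) -> list[tuple[str, str, str]]:
    """Return (service, ifops_status, ops_status) where they disagree."""
    return [(svc, a[svc], b.get(svc, "MISSING"))
            for svc in sorted(a) if a[svc] != b.get(svc)]

for svc, mine, theirs in compare(ifops, ops):
    print(f"{svc}: ifops={mine} vs ops={theirs}")
```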

NETWORK SUPPORT

... from the sublime to the ridiculous ...

Extensive investigation of the connectivity within HaPPSDaG.

Most of the RO-LCG resources are connected to the NREN's NOC through a 12 km aerial fibre-optic cable (10 Gbps).

perfSONAR tests make no sense when this connection is cut, and this has begun to happen rather frequently (a simple outage-logging sketch follows below).

In the figure: throughput reaching 8.5 Gbps during alice file transfers, just before the link was cut.
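Detecting such cuts is easy to automate. Below is a minimal outage logger, not the site's actual tooling; the monitored hostname and the polling interval are placeholders:

```python
# Simple availability probe: ping an endpoint on the NOC side of the
# link periodically and log every state change with a timestamp.
import datetime
import subprocess
import time

def link_up(host: str) -> bool:
    """One ICMP echo; True if the host answered within 2 s."""
    return subprocess.run(["ping", "-c", "1", "-W", "2", host],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0

def watch(host: str = "noc.example.ro", interval_s: int = 30) -> None:
    state = None
    while True:
        up = link_up(host)
        if up != state:
            stamp = datetime.datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} link {'UP' if up else 'DOWN'}")
            state = up
        time.sleep(interval_s)

if __name__ == "__main__":
    watch()
```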

HPC INFRASTRUCTURE

The development of the HPC infrastructure started in 2006 and benefited from the cooperation with LIT-JINR.

There are now 4 parallel clusters, based on Infiniband, Myrinet 2G, and GbE technology.

The large-scale computations are performed on the IFIN_BC cluster (IBM Blade Center):

Server type       IBM QS22             IBM LS22           IBM HS22                          Total
CPU               IBM PowerXCell 8i    AMD Opteron 2376   Intel Xeon X5650
Clock frequency   3.2 GHz              2.3 GHz            2.67 GHz
Cores / CPU       1x PPE + 8x SPE      4                  6
L2 cache / CPU    512 KB               512 KB             6x 256 KB
FSB frequency     1066 MHz             1000 MHz           3200 MHz
HDD / node        8 GB SSD             76 / 146 GB SAS    500 GB SAS
RAM / node        32 GB                8 GB               24 / 36 / 48 GB
Total RAM         512 GB               80 GB              21x24 + 23x36 + 12x48 = 1908 GB
Nodes             16                   10                 56                                82
CPUs              32                   20                 112                               164
Cores             32x PPE + 256x SPE   80                 672                               1040

Interconnect: Infiniband 4x QDR, 40 Gbps
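As a quick sanity check, the totals quoted in the table follow from the per-type figures; the snippet below reproduces them by plain arithmetic on the table's own numbers:

```python
# Consistency check of the IFIN_BC totals quoted above.
nodes = {"QS22": 16, "LS22": 10, "HS22": 56}
cpus_per_node = 2                          # 32/16, 20/10, 112/56
ram_hs22 = 21 * 24 + 23 * 36 + 12 * 48     # mixed 24/36/48 GB HS22 nodes
cores = 32 * (1 + 8) + 20 * 4 + 112 * 6    # PPE+SPE, Opteron, Xeon

print(sum(nodes.values()))                 # 82 nodes
print(sum(nodes.values()) * cpus_per_node) # 164 CPUs
print(ram_hs22)                            # 1908 GB on the HS22 blades
print(cores)                               # 1040 cores in total
```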

HPC SUPPORT FOR RESEARCH PROJECTS

• Simulation and modeling of large biomolecular systems by means of molecular dynamics codes (e.g. NAMD) (collaboration with the Faculty of Biology, University of Bucharest)
• Modeling of drug - efflux pump inhibitor interaction dynamics: activity modeling and simulation of efflux pump inhibitors based on advanced laser methods (collaboration with INFLPR)
• Dynamics of Bose-Einstein condensates (collaboration with IPB, Belgrade)
• Modeling of seismic events (collaboration with INCDFP)
• Ab-initio investigation of charge transport in nanostructures (collaboration with the Physics Faculty, University of Bucharest)

HPC SUPPORT FOR REGIONAL RESEARCH COMMUNITIES 1/2

High-Performance Computing Infrastructure for South East Europe's Research Communities (HP-SEE) (2010-2013)

Coordinator: GRNET, Greece; 14 partners.

DFCTI leads the Romanian consortium:

• IFIN-HH • UVT • UPB (NCIT) • ISS

HPC SUPPORT FOR REGIONAL RESEARCH COMMUNITIES 2/2

Contribution to the realization of a common integrated HPC infrastructure in SEE (11 HPC resource centres).

Grid middleware is used in order to hide the various ways of accessing the HPC sites provided by the local resource management systems (LRMS); a toy sketch of this idea follows below.

Several middleware solutions can be used, and even this access layer can be generalized by application-specific, graphical portal solutions.

Example: WS-PGRADE can communicate with all the EMI legacy middleware solutions: ARC, dCache, gLite, and UNICORE (https://www.shiwa-workflow.eu/wiki/-/wiki/Main/WS-PGRADE)
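To make the LRMS-hiding idea concrete, here is a toy dispatch table. The site-to-LRMS assignments are hypothetical and the commands are simplified; real middleware additionally handles proxies, data staging, and status polling:

```python
# Toy illustration of an access layer that hides which LRMS a given
# HPC site runs behind a single submit() call. Assignments are invented.
LRMS_SUBMIT = {
    "pbs":   ["qsub", "job.sh"],
    "sge":   ["qsub", "job.sh"],
    "slurm": ["sbatch", "job.sh"],
}

SITE_LRMS = {"IFIN_BC": "pbs", "PARADOX": "pbs", "NCIT": "sge"}

def submit(site: str) -> list[str]:
    """Return the command a portal layer would run for this site."""
    return LRMS_SUBMIT[SITE_LRMS[site]]

print(submit("IFIN_BC"))   # ['qsub', 'job.sh']
```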

ISyMAB - Integrated System for Modeling and data Analysis of complex Biomolecules

The application provides a remote-access framework for NAMD clusters, which offers the users an integrated interface with analysis tools.

Users with access rights can launch jobs through PBS and execute shell scripts on the various HP-SEE clusters (a sketch of such a submission step follows below).

In the figure: the ISyMAB user can choose to work on IFIN_BC or on PARADOX @ IPB, Belgrade. The working directory of the user on any cluster can be synchronized with the ISyMAB directory using the Sync buttons.
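The PBS launch step can be sketched as follows. This is a hypothetical illustration, not ISyMAB's actual code; the queue name, resource request, and NAMD input names are invented:

```python
# Hypothetical sketch of a PBS submission step for a NAMD run: write a
# job script and hand it to qsub. Queue and resources are illustrative.
import subprocess
import textwrap

def submit_namd(config: str, nodes: int = 4, ppn: int = 8) -> str:
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #PBS -N namd_{config}
        #PBS -l nodes={nodes}:ppn={ppn}
        #PBS -q batch
        cd $PBS_O_WORKDIR
        mpirun -np {nodes * ppn} namd2 {config}.conf > {config}.log
        """)
    with open("namd.pbs", "w") as f:
        f.write(script)
    # qsub prints the job id on success
    out = subprocess.run(["qsub", "namd.pbs"], capture_output=True, text=True)
    return out.stdout.strip()

if __name__ == "__main__":
    print("submitted:", submit_namd("apoa1"))
```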

PROSPECTS 1/2

DFCTI will use a new building (under construction): 316 sqm datacenter, 750 kW electric power.

SHORT- AND MID-TERM TASKS:

• Realization of a redundant network connection (in collaboration with the NREN), improving availability, and the integration in LHCONE
• Federated storage for the atlas sites RO-07-NIPNE and RO-02-NIPNE
• Simplifying the structure, operation, and management of the RO-LCG data centres

PROSPECTS 2/2

LONG-TERM TASKS:

Providing long-term computing support for large-scale collaborations, based on international agreements:

• WLCG: implementing the post-LS1 strategy
• ELI-NP project (Extreme Light Infrastructure - Nuclear Physics, http://www.eli-np.ro)
• FAIR-GSI experiments (CBM, PANDA)
• ITER
• EURATOM programme, Integrated Tokamak Modelling Task Force (ITM-TF); etc.

CONCLUSIONS

• During the first years of LHC data taking, RO-LCG provided better production results than the Tier-2 average.
• Scalable solutions were designed and implemented for improving the transfer, processing, and storage of large datasets at the DFCTI centre, in the framework of the computing support for the ATLAS experiment at LHC-CERN.
• A significant contribution to the performance management of the Grid system came from the NGI-independent set of tools implemented for monitoring data transfer, storage efficiency, and service availability in the resource centres; this avoids possible inconveniences related to the end of EGI support.
• An integrated system for modeling, production runs, and data analysis of complex biomolecules was programmed for the molecular dynamics simulations performed within the HP-SEE infrastructure.
• Urgent measures are required for improving the network connectivity, in order to participate in LHCONE.

INVITATION: RO-LCG CONFERENCE

"Grid, Cloud & High Performance Computing in Science", 25-27.10.2012

www.itim-cj.ro/rolcg2012

Deadlines:

• 15.08.2012: full paper submission

ACKNOWLEDGEMENTS

• National contribution to the development of the LCG computing grid for elementary particle physics, project funded by the National Authority for Scientific Research (ANCS) under contract 8EU/2012
• CEA - IFA Partnership, R&D project: Efficient Handling and Processing of Petabyte-Scale Data for the Computing Centres within the French Cloud, contract C1-06/2010, co-funded by ANCS
• Collaboration with LIT-JINR / Dubna in the framework of the Hulubei-Meshcheriakov programme, project: Optimization Investigations of the Grid and Parallel Computing Facilities at LIT-JINR and Magurele Campus
• FP7 HP-SEE project: High-Performance Computing Infrastructure for South East Europe's Research Communities, contract FP7-RI-261499, www.hp-see.eu

The 5th International Conference "Distributed Computing and Grid Technologies in Science and Education"

July 16 - 21, 2012

Dubna, Russia

THANK YOU FOR YOUR ATTENTION!