
IN2P3 Status Report

HTASC, 14 March 2003

Fabio HERNANDEZ et al. from CC-IN2P3
François ETIENNE
[email protected] [email protected]


Outline

• User community
• Update on computing services
• Update on storage services
• Network status
• Grid status


IN2P3 current context

• 18 labs, 1 Computer Center
• 2500 users, 40 experiments

CCIN2P3-CERN connection (bandwidth evolution):
• 1995: 0.5 Mb/s
• 1996: 2 Mb/s
• 1997: 4 Mb/s
• 1999: 6 Mb/s
• 2000: 34 Mb/s
• 2001: 155 Mb/s
• 2003: 1 Gb/s

CCIN2P3-SLAC connection (bandwidth evolution):
• 2001: 30 Mb/s
• 2002: 155 Mb/s
• 2003: 600 Mb/s


RENATER current context

• Deployed: October 2002
• Topology is more grid-like than star-shaped
• Most links run at 2.4 Gbps
• Still 2 main nodes: Paris and Lyon


User community

• Experiments: LHC (Atlas, CMS, Alice, LHCb), BaBar (SLAC), D0 (FNAL), PHENIX (Brookhaven), astrophysics (17 experiments: EROS, SuperNovae, Auger, Virgo…)
• 2500 users from different countries
• Tier A for BaBar
• 20% of the CPU power was consumed by non-French users in 2002
• Starting to provide services to biologists at a local/regional level (4 teams and ~3% of CPU over the last 6 months, WP10 EDG, Heaven cluster)
• User community steadily growing


Experiments CPU request

Requested CPU per experiment, in UI hours (1 UI ~ 5 SI-95):

Aleph        300 000      BIOLOGY    1 000 000  (several teams)
Alice      1 000 000      Lhcb       3 500 000
Ams        3 000 000      NA48         600 000
Antares      500 000      NA50         200 000
Archeops     300 000      Nemo         500 000
Atlas      3 500 000      Ngs-Opera     10 000
Auger      3 000 000      Phenix       400 000
Babar     16 000 000      Planck-S.      5 000
Clas         600 000      Siren      8 000 000
Cmb           25 000      Snovae       300 000
Cms        2 500 000      Star           5 000
D0        15 000 000      Tesla        100 000
Delphi        30 000      Thémis       200 000
Edelweiss    100 000      Virgo        400 000
Eros         500 000      WA98          50 000
Euso          25 000
Glast         50 000
H1           500 000
Hess         500 000
Indra         40 000

Total for the 35-40 experiments above: CPU (UI) ~ 60 000 000 hours (~ 300 Mh SI-95)
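
For the unit conversion in the total: with 1 UI ~ 5 SI-95, the requested ~60 000 000 UI hours correspond to 60 000 000 × 5 = 300 000 000 SI-95 hours, i.e. the ~300 Mh SI-95 quoted above.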


Computing Services

• Supported platforms: Linux, SunOS, AIX
  - Dropped support for HP-UX
  - Currently migrating to RedHat Linux 7.2 and SunOS 5.8
  - Waiting for the remaining users and EDG to drop support for RedHat 6.2
• More CPU power added over the last six months:
  - 72 dual-processor Intel Pentium 1.4 GHz, 2 GB RAM, 120 GB disk (November)
  - 192 dual-processor Intel Pentium 2.4 GHz, 2 GB RAM (February)
• Today the computing capacity (batch + interactive) is:
  - Linux: 920 CPUs
  - SunOS: 62 CPUs
  - AIX: 70 CPUs
  - Total > 1 000 CPUs
• Worker node storage capacity is used for temporary data (reset after job execution)


Storage Services

• Extensive use of AFS for user and group files
• HPSS and a staging system for physics data
• Mix of several platforms/protocols:
  - SunOS, AIX, Tru64
  - SCSI, FibreChannel
  - AFS, NFS, RFIO
• Shared disk capacity (IBM, Hitachi, Sun): ~50 TB
• AFS:
  - User home directories
  - Code, programs and some experimental data
• Xtage:
  - Temporary disk system for data staged from tape


Storage Services (cont.)

• Mass storage (HPSS): 250 TB now, 500 TB expected in December 2003
  - Installed capacity on tape: 700 TB
  - Up to 8.8 TB/day
  - Originally purchased for BaBar but now used by most experiments
  - BaBar Objectivity: 130 TB plus 25 TB of cache disk; others: 120 TB plus 4.4 TB of cache disk
  - STK 9840 (20 GB tapes, fast mount) and STK 9940 (200 GB tapes, slower mount, higher I/O)
• Accessed via RFIO, mainly rfcp; supports files larger than 2 GB (see the API sketch after this list)
• Direct HPSS access from the network through bbFTP
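
Since access to the HPSS data goes through RFIO (rfcp on the command line), a minimal sketch of programmatic access through the RFIO C API is shown below. This is an illustration under assumptions, not something taken from the slides: it relies on the rfio_open / rfio_read / rfio_close calls of the SHIFT/CASTOR RFIO library (header shift.h, linked with -lshift), and the server name and file path are purely hypothetical.

/*
 * Minimal sketch: read a file through the RFIO C API, the programmatic
 * counterpart of rfcp.  Header, library and path are assumptions.
 */
#include <stdio.h>
#include <fcntl.h>
#include <shift.h>        /* rfio_open, rfio_read, rfio_close, rfio_perror */

int main(void)
{
    /* Hypothetical server and path, for illustration only. */
    char path[] = "hpsshost.in2p3.fr:/hpss/some/experiment/run123.dat";
    char buf[64 * 1024];
    int fd, n;

    fd = rfio_open(path, O_RDONLY, 0);
    if (fd < 0) {
        rfio_perror("rfio_open");      /* report the RFIO error */
        return 1;
    }

    /* Stream the file to stdout, 64 KB at a time. */
    while ((n = rfio_read(fd, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    rfio_close(fd);
    return 0;
}

A build line would look roughly like "cc rfio_read.c -lshift", assuming the SHIFT/RFIO headers and libraries are installed on the client.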


Storage Services (cont.)

• Semi-permanent storage
  - Suited for small files (which degrade HPSS performance)
  - Access with NFS or the RFIO API
  - Back-up possible for experiments for which CC-IN2P3 is the "base site" (Auger, Antares)
  - Working on transparent RFIO access
• Back-up and archive: TSM (Tivoli Storage Manager)
  - For home directories, critical experimental data, HPSS metadata, Oracle data
  - TSM allows data archival (Elliot)
  - For back-up of external data (e.g. administrative data of IN2P3, data from biology labs, etc.)


Storage Services (cont.)

Disks
• AFS: 4 TB
• HPSS: 4.4 TB
• Objectivity: 25 TB
• Oracle: 0.4 TB
• Xtage: 1.2 TB
• Semi-permanent: 1.9 TB
• TSM: 0.3 TB
• Local: 10 TB

Tapes
• 1 STK robot: 6 silos, 36 000 slots
  - 12 drives 9940B, 200 GB/tape (7 HPSS, 3 TSM, 2 others)
  - 35 drives 9840, 20 GB/tape (28 HPSS, 4 TSM, 3 others)
  - 8 drives IBM-3490, 0.8 GB/tape (service will stop by end of 2003)
• 1 DLT robot: 400 slots
  - 6 DLT 4000 drives
  - 4 DLT 7000 drives


Network

• International connectivity through:
  - RENATER + GEANT to the US (600 Mbps via ESNET and ABILENE in New York) and to Europe
  - CERN to the US as an alternate path (600 Mbps)
• BaBar is using both links to the US for transferring data between SLAC and Lyon
  - Specific software developed for "filling the pipe" (bbFTP) is extensively used by BaBar and D0, amongst others
• Dedicated 1 Gb/s link between Lyon and CERN since January 2003
• LAN is composed of a mixture of FastEthernet and GigabitEthernet
• Ubiquitous wireless service
• Connectivity to the other IN2P3 laboratories across the country via RENATER-3 (the French academic and research network, 2.4 Gbps links)
  - All labs have a private connection to RENATER POPs


Grid-related activities

• Fully involved in the DataGRID project and partly in DataTag (INRIA)
• One of the 5 major testbed sites
• Currently, the entire "conventional" production environment is accessible through the grid interface
  - Jobs submitted to the grid are managed by BQS, the home-grown batch management system (see the job-description sketch after this list)
  - Grid jobs can use the same pool of resources as normal jobs (~1000 CPUs)
  - Access to mass storage (HPSS) from remote sites is enabled through bbFTP
• Benefits:
  - Tests of DataGRID software in a production environment
  - Scalability tests can be performed
  - Users access exactly the same working environment and data whatever interface they choose to access our facility
  - Operational issues are detected early
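
To make the path from the grid interface down to BQS more concrete, here is a rough sketch of the kind of EDG job description (JDL) a user could submit at the time; it is an assumption for illustration, not taken from the slides. The attribute names follow the EDG JDL conventions, all values are invented, and the exact submission command (dg-job-submit or edg-job-submit) depended on the EDG release. Once matched to the CC-IN2P3 computing element, such a job would be executed by BQS alongside local jobs.

// Hypothetical EDG JDL sketch -- attribute names per EDG conventions,
// executable and file names invented for illustration.
Executable    = "simulate.sh";
Arguments     = "run123";
StdOutput     = "run123.out";
StdError      = "run123.err";
InputSandbox  = {"simulate.sh"};
OutputSandbox = {"run123.out", "run123.err"};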


Grid-related activities (cont.)

• Disadvantages:
  - Local resources are needed for integration with the production environment (AFS, BQS, …)
  - More work is needed to achieve seamless integration between the local and grid worlds
  - Users want us to provide a grid service: how do we provide a service around a "moving target" software project?
• Some experiments are already using the grid interface for "semi-production"
  - Others have expressed interest in using it as soon as it becomes more stable
• Starting in March 2003, the resource broker and associated services for the Applications and Development DataGRID testbeds will be hosted and operated in Lyon


Grid-related activities (cont.)

• Involved in several other grid projects at regional and national levels
• Cooperation agreement signed with IBM to work on grid technology:
  - Exchange of experience
  - Grid technology evaluation
  - Experiments with this technology in a production environment
  - Exploring technologies for storage virtualization
  - …


DataGRID @ CNRS – IN2P3

Coordination of:
• WP6 Integration Testbed
• WP7 Networking
• WP10 Bioinformatics

Participating laboratories:

IPSL Earth Observation (Paris)

BBE Bioinformatics (Lyon)

CREATIS Imaging and signal processing (Lyon)

RESAM High Speed networking (Lyon)

LIP Parallel computing (Lyon)

IBCP Bioinformatics (Lyon)

UREC Networking (Paris –Grenoble)

LIMOS Bioinformatics (Clermont-Ferrand)

LBP Bioinformatics (Clermont-Ferrand)

LPC IN2P3 (Clermont-Ferrand)

LAL IN2P3 (Paris)

Subatech IN2P3 (Nantes)

LLR-X IN2P3 (Paris)

ISN IN2P3 (Grenoble)

CC-IN2P3 IN2P3 (Lyon)

LPNHE IN2P3 (Paris)

CPPM IN2P3 (Marseille)

LAPP IN2P3 (Annecy)