IN2P3 Status Report
HTASC, 14 March 2003
Fabio HERNANDEZ et al. from CC-IN2P3
François [email protected]
Outline
- User community
- Update on computing services
- Update on storage services
- Network status
- Grid status
IN2P3 current context
18 labs, 1 Computer Center, 2500 users, 40 experiments

CCIN2P3-CERN connection: 0.5 Mb/s (1995), 2 Mb/s (1996), 4 Mb/s (1997), 6 Mb/s (1999), 34 Mb/s (2000), 155 Mb/s (2001), 1 Gb/s (2003)
CCIN2P3-SLAC connection: 30 Mb/s (2001), 155 Mb/s (2002), 600 Mb/s (2003)
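For scale, the CERN-link figures above can be turned into a growth factor; a minimal Python sketch (the dictionary simply restates the slide's numbers, with 1 Gb/s written as 1000 Mb/s):

```python
# CCIN2P3-CERN link bandwidth by year, in Mb/s, as quoted on the slide
# (the 2003 entry, 1 Gb/s, is written as 1000 Mb/s).
cern_link_mbps = {1995: 0.5, 1996: 2, 1997: 4, 1999: 6,
                  2000: 34, 2001: 155, 2003: 1000}

growth = cern_link_mbps[2003] / cern_link_mbps[1995]
print(growth)  # 2000.0 -> a factor of 2000 in eight years
```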
RENATER current context
Deployed: Oct. 2002. Topology is more grid- than star-shaped; most links are 2.4 Gbps; still 2 main nodes: Paris and Lyon.
User community
Experiments: LHC (ATLAS, CMS, ALICE, LHCb), BaBar (SLAC), D0 (FNAL), PHENIX (Brookhaven), astrophysics (17 experiments: EROS, SuperNovae, Auger, Virgo…)
2500 users from different countries. Tier A site for BaBar; 20% of the CPU power was consumed by non-French users in 2002.
Starting to provide services to biologists at a local/regional level (4 teams and ~3% of CPU over the last 6 months; WP10 of EDG; Heaven cluster)
User community steadily growing
Experiments CPU request

Experiment — CPU hours (UI, 1 UI ~ 5 SI-95):
Aleph        300 000
Alice      1 000 000
Ams        3 000 000
Antares      500 000
Archeops     300 000
Atlas      3 500 000
Auger      3 000 000
Babar     16 000 000
Clas         600 000
Cmb           25 000
Cms        2 500 000
D0        15 000 000
Delphi        30 000
Edelweiss    100 000
Eros         500 000
Euso          25 000
Glast         50 000
H1           500 000
Hess         500 000
Indra         40 000
Biology (several teams)  1 000 000
Lhcb       3 500 000
NA48         600 000
NA50         200 000
Nemo         500 000
Ngs-Opera     10 000
Phenix       400 000
Planck-S.      5 000
Siren      8 000 000
Snovae       300 000
Star           5 000
Tesla        100 000
Thémis       200 000
Virgo        400 000
WA98          50 000

Total (35-40 experiments above): ~60 000 000 CPU hours (UI), i.e. ~300 Mh SI-95
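The quoted total can be cross-checked against the slide's own conversion factor (1 UI hour ~ 5 SI-95 hours); a minimal sketch:

```python
# Cross-check of the slide's total CPU request, using its stated
# conversion of 1 UI hour ~ 5 SI-95 hours.
UI_TO_SI95 = 5
total_ui_hours = 60_000_000            # ~60 Mh (UI) quoted on the slide

total_si95_mh = total_ui_hours * UI_TO_SI95 / 1_000_000
print(total_si95_mh)                   # 300.0 -> ~300 Mh SI-95, as stated
```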
Computing Services
Supported platforms: Linux, SunOS, AIX (support for HP-UX dropped)
Currently migrating to RedHat Linux 7.2 and SunOS 5.8; waiting for the remaining users and EDG to drop support for RH 6.2
More CPU power added over the last six months:
- 72 bi-processor Intel Pentium 1.4 GHz, 2 GB RAM, 120 GB disk (November)
- 192 bi-processor Intel Pentium 2.4 GHz, 2 GB RAM (February)
Today the computing capacity (batch + interactive) is:
- Linux: 920 CPUs
- SunOS: 62 CPUs
- AIX: 70 CPUs
- Total > 1 000 CPUs
Worker-node storage capacity is used for temporary data (reset after job execution)
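As a sanity check, the additions and platform totals listed above can be tallied ("bi-processor" meaning 2 CPUs per node):

```python
# Tally of the CPU figures quoted on the slide.
# Each new node is a bi-processor machine, i.e. 2 CPUs per node.
new_nodes = {"Pentium 1.4 GHz (November)": 72,
             "Pentium 2.4 GHz (February)": 192}
added_cpus = 2 * sum(new_nodes.values())
print(added_cpus)    # 528 CPUs added over the last six months

per_platform = {"Linux": 920, "SunOS": 62, "AIX": 70}
total_cpus = sum(per_platform.values())
print(total_cpus)    # 1052, consistent with "Total > 1 000 CPUs"
```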
Storage Services
Extensive use of AFS for user and group files; HPSS and a staging system for physics data
Mix of several platforms/protocols:
- SunOS, AIX, Tru64
- SCSI, FibreChannel
- AFS, NFS, RFIO
Shared disk capacity (IBM, Hitachi, Sun): ~50 TB
AFS: user home directories; code, programs and some experimental data
Xtage: temporary disk system for data staged from tape
Storage Services (cont.)
Mass storage (HPSS): 250 TB now, 500 TB expected in Dec. 2003
Installed capacity on tape: 700 TB; up to 8.8 TB/day
Originally purchased for BaBar but now used by most experiments
BaBar Objectivity: 130 TB plus 25 TB of cache disk; others: 120 TB plus 4.4 TB
STK 9840 drives (20 GB tapes, fast mount) and STK 9940 (200 GB tapes, slower mount, higher I/O)
Accessed by RFIO, mainly rfcp; supports files larger than 2 GB
Direct HPSS access from the network through bbFTP
Storage Services (cont.)
Semi-permanent storage:
- Suited for small files (which degrade HPSS performance)
- Access with NFS or the RFIO API
- Back-up possible for experiments for which CC-IN2P3 is the "base site" (Auger, Antares)
- Working on transparent RFIO access
Back-up and archive: TSM (Tivoli Storage Manager)
- For home directories, critical experimental data, HPSS metadata, Oracle data
- TSM allows data archival (Elliot), for back-up of external data (e.g. administrative data of IN2P3, data from biology labs, etc.)
Storage Services (cont.)

Disks:
- AFS: 4 TB
- HPSS: 4.4 TB
- Objectivity: 25 TB
- Oracle: 0.4 TB
- Xtage: 1.2 TB
- Semi-permanent: 1.9 TB
- TSM: 0.3 TB
- Local: 10 TB

Tapes:
1 STK robot (6 silos, 36 000 slots):
- 12 9940B drives, 200 GB/tape (7 HPSS, 3 TSM, 2 others)
- 35 9840 drives, 20 GB/tape (28 HPSS, 4 TSM, 3 others)
- 8 IBM-3490 drives, 0.8 GB/tape (service will stop by end 2003)
1 DLT robot (400 slots):
- 6 DLT 4000 drives
- 4 DLT 7000 drives
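Since the media mix across the 36 000 slots is not given, the STK robot's nominal capacity can only be bounded; the arithmetic below is my own, not from the slide:

```python
# Bounds on the STK robot's nominal tape capacity (my own arithmetic;
# the slide gives 36 000 slots but not the 9940B/9840 media mix).
slots = 36_000
gb_per_9940b = 200   # GB per 9940B tape
gb_per_9840 = 20     # GB per 9840 tape

print(slots * gb_per_9940b / 1000)  # 7200.0 TB if every slot held a 9940B
print(slots * gb_per_9840 / 1000)   # 720.0 TB if every slot held a 9840
```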
Network
International connectivity through:
- RENATER + GEANT to the US (600 Mbps via ESnet and Abilene in New York) and Europe
- CERN to the US as an alternate path (600 Mbps)
BaBar is using both links to the US for transferring data between SLAC and Lyon
Specific software developed for "filling the pipe" (bbFTP) is being extensively used by BaBar and D0, amongst others
Dedicated 1 Gb/s link between Lyon and CERN since January 2003
LAN is a mixture of FastEthernet and GigabitEthernet; ubiquitous wireless service
Connectivity to the other IN2P3 laboratories across the country via RENATER-3 (the French academic and research network, 2.4 Gbps links); all labs have a private connection to RENATER POPs
Grid-related activities
Fully involved in the DataGRID project and partly in DataTag (with INRIA); one of the 5 major testbed sites
Currently the whole "conventional" production environment is accessible through the grid interface:
- Jobs submitted to the grid are managed by BQS, the home-grown batch management system
- Grid jobs can use the same pool of resources as normal jobs (~1 000 CPUs)
- Access to mass storage (HPSS) from remote sites is enabled through bbFTP
Benefits:
- Tests of DataGRID software in a production environment
- Scalability tests can be performed
- Users access exactly the same working environment and data whatever interface they choose to access our facility
- Operational issues are detected early
Grid-related activities (cont.)
Disadvantages:
- Local resources needed for integration of the production environment (AFS, BQS, …)
- More work needed to achieve a seamless integration between the local and grid worlds
- Users want us to provide a grid service: how do we provide a service around a "moving target" software project?
Some experiments are already using the grid interface for "semi-production"; others have expressed interest in using it as soon as it gets more stable
Starting from March 2003, the resource broker and associated services for the Applications and Development DataGRID testbeds will be hosted and operated in Lyon
Grid-related activities (cont.)
Involved in several other grid projects at regional and national levels
Cooperation agreement signed with IBM to work on grid technology:
- Exchange of experiences
- Grid technology evaluation
- Experiments with this technology in a production environment
- Exploring technologies for virtualization of storage
- …
DataGRID @ CNRS - IN2P3
Coordination of:
- WP6 Integration Testbed
- WP7 Networking
- WP10 Bioinformatics
IPSL Earth Observation (Paris)
BBE Bioinformatics (Lyon)
CREATIS Imaging and signal processing (Lyon)
RESAM High Speed networking (Lyon)
LIP Parallel computing (Lyon)
IBCP Bioinformatics (Lyon)
UREC Networking (Paris –Grenoble)
LIMOS Bioinformatics (Clermont-Ferrand)
LBP Bioinformatics (Clermont-Ferrand)
LPC IN2P3 (Clermont-Ferrand)
LAL IN2P3 (Paris)
Subatech IN2P3 (Nantes)
LLR-X IN2P3 (Paris)
ISN IN2P3 (Grenoble)
CC-IN2P3 IN2P3 (Lyon)
LPNHE IN2P3 (Paris)
CPPM IN2P3 (Marseille)
LAPP IN2P3 (Annecy)