Tier 1
Luca dell’Agnello
INFN – CNAF, Bologna
Workshop CCR, Paestum, 9-12 June 2003

INFN – Tier1

INFN computing facility for the HNEP community
Location: INFN-CNAF, Bologna (Italy)
  o One of the main nodes on the GARR network
Ending prototype phase this year
Fully operational next year
Personnel: ~10 FTEs
Multi-experiment
  LHC experiments, Virgo, CDF
  BABAR (3rd quarter 2003)
  Resources dynamically assigned to experiments according to their needs
Main (~50%) Italian resource for LCG
Coordination with Tier0 and other Tier1s (management, security, etc.)
Coordination with Italian Tier2s and Tier3s
Participation in grid test-beds (EDG, EDT, GLUE)
Participation in CMS, ATLAS, LHCb and Alice data challenges
GOC (deployment in progress)

Networking

CNAF interconnected to GARR-B backbone at 1 Gbps.
Giga-PoP co-located.
GARR-B backbone at 2.5 Gbps.
LAN: star topology
  Computing elements connected via FE to rack switch
    o 3 Extreme Summit, 48 FE + 2 GE ports
    o 3 Cisco 3550, 48 FE + 2 GE ports
    o 8 Enterasys, 48 FE + 2 GE ports
  Servers connected to GE switch
    o 1 3Com L2, 24 GE ports
  Uplink via GE to core switch
    o Extreme 7i with 32 GE ports
    o Enterasys ER16 Gigabit switch router
  Disk servers connected via GE to core switch.

LAN TIER1

[Diagram: Tier1 LAN. Switches FarmSW1, FarmSW2, FarmSW3, FarmSWG1, LHCBSW1 and Switch-lanCNAF (all with VLAN tagging enabled), plus SSR2000 and Catalyst 6500. Disk servers Fcds1, Fcds2 and Fcds3, with 8 TB F.C. and 2 TB SCSI storage. NAS2 (131.154.99.192) and NAS3 (131.154.99.193). CNAF LAN at 1 Gbps; 1 Gbps link to GARR.]

VLAN Tagging

Define VLANs across switches
  Independent of switch brand (standard 802.1q)
Adopted solution for complete granularity
  Each switch port is associated with one VLAN identifier
  Each rack switch uplink propagates VLAN information
  VLAN identifiers are propagated across switches
  Each farm has its own VLAN
Avoid recabling (or physically moving) hw to change the topology
Level 2 isolation of farms
  Aid for enforcement of security measures
Possible to define multi-tag ports (for servers)
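
As an illustration of what "VLAN tagging" means on the wire, the following Python sketch builds and parses the 4-byte 802.1Q tag that a tagged port inserts into each Ethernet frame. The tag layout (TPID 0x8100, then 3 priority bits, 1 DEI bit and a 12-bit VLAN identifier) comes from the standard; the farm names and VLAN numbers are hypothetical and do not come from the slides.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def build_tag(vlan_id, priority=0, dei=0):
    """Return the 4-byte 802.1Q tag for the given VLAN identifier."""
    if not 1 <= vlan_id <= 4094:  # 0 and 4095 are reserved
        raise ValueError("VLAN id must be in 1..4094")
    if not 0 <= priority <= 7 or dei not in (0, 1):
        raise ValueError("priority must be 0..7 and dei 0 or 1")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_tag(tag):
    """Decode a 4-byte 802.1Q tag into its three fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"priority": tci >> 13, "dei": (tci >> 12) & 1, "vlan_id": tci & 0x0FFF}


# Hypothetical farm-to-VLAN assignment, one VLAN per farm.
FARM_VLANS = {"cms": 101, "atlas": 102}

tag = build_tag(FARM_VLANS["cms"])
print(tag.hex())        # 81000065
print(parse_tag(tag))   # {'priority': 0, 'dei': 0, 'vlan_id': 101}
```

Because the tag travels with the frame, any switch that implements the standard can forward it, which is why the scheme is independent of the switch brand.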

Computing units (1)

160 1U rack-mountable Intel dual-processor servers
  800 MHz – 2.2 GHz
160 1U dual-processor Pentium IV 2.4 GHz servers to be shipped this month
1 switch per rack
  48 FastEthernet ports
  2 Gigabit uplinks
  Interconnected to core switch via 2 pairs of optical fibers
    o Also 4 UTP cables available
1 network power control per rack
  380 V three-phase power as input
  Outputs 3 independent 220 V lines
  Completely programmable (permits gradual switching on of servers).
  Remotely manageable via web
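
The rack switch figures above allow a quick estimate of how oversubscribed each rack uplink can be. The sketch below is only an illustration: it assumes both Gigabit uplinks carry traffic at the same time, which the slides do not state (they could equally be a redundant pair).

```python
FE_MBPS = 100    # FastEthernet port speed
GE_MBPS = 1000   # Gigabit Ethernet port speed


def oversubscription(access_ports, access_mbps, uplinks, uplink_mbps):
    """Ratio of total access bandwidth to total uplink bandwidth."""
    return (access_ports * access_mbps) / (uplinks * uplink_mbps)


# One rack switch: 48 FE access ports, 2 GE uplinks (both assumed active).
ratio = oversubscription(48, FE_MBPS, 2, GE_MBPS)
print(f"{ratio:.1f}:1")  # 2.4:1
```

With a single active uplink the same calculation gives 4.8:1.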

Computing units (2)

OS: Linux RedHat (6.2, 7.2, 7.3, 7.3.2)
Experiment specific library software
Goal: have generic computing units
  o Experiment specific library software in standard position (e.g. /opt/cms)
Centralized installation system
  LCFG (EDG WP4)
  Integration with central Tier1 db (see below)
  Each farm on a distinct VLAN
    o When moved from one farm to another, a server changes IP address (not name)
  Single dhcp server for all VLANs
  Support for DDNS (cr.cnaf.infn.it) in progress
Queue manager: PBS
  Not possible to have the “Pro” version (only for edu)
  Free version not flexible enough
  Tests of integration with MAUI in progress
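
The per-farm VLAN arrangement implies that a server's address depends on the farm it currently belongs to, while its name stays fixed. A minimal Python sketch of that idea follows; the farm names, the private subnets, the host name and the host numbering are all hypothetical and are not taken from the slides.

```python
import ipaddress

# Hypothetical map: one subnet per farm VLAN.
FARM_SUBNETS = {
    "cms": ipaddress.ip_network("192.168.10.0/24"),
    "lhcb": ipaddress.ip_network("192.168.20.0/24"),
}


def address_for(host_number, farm):
    """Address a server would be handed in the given farm's subnet."""
    subnet = FARM_SUBNETS[farm]
    # Exclude the network and broadcast addresses.
    if not 0 < host_number < subnet.num_addresses - 1:
        raise ValueError("host number outside the subnet")
    return subnet.network_address + host_number


name = "node007"                 # hypothetical name, unchanged by the move
before = address_for(7, "cms")   # 192.168.10.7
after = address_for(7, "lhcb")   # 192.168.20.7
print(name, before, "->", after)
```

Keeping the name stable while the address changes is what makes dynamic DNS updates useful in this setup.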

Tier1 Database

Resource database and management interface
  Hw server characteristics
  Sw server configuration
  Server allocation
  Postgres database as back end
  Web interface (apache+mod_ssl+php)
Possible direct access to db for some applications
  Monitoring system
  nagios
Interface to configure switches and interoperate with LCFG
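
To make the idea of a resource database concrete, here is a small sketch of how hardware characteristics and farm allocation could be stored and queried. It is not the actual Tier1 schema: the table and column names, the sample rows and the VLAN numbers are invented for illustration, and SQLite (in memory) is used in place of Postgres so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE server (
        name    TEXT PRIMARY KEY,
        cpu_mhz INTEGER,          -- hardware characteristics
        os      TEXT              -- software configuration
    );
    CREATE TABLE allocation (
        server TEXT REFERENCES server(name),
        farm   TEXT,
        vlan   INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO server VALUES (?, ?, ?)",
    [("node001", 2400, "RedHat 7.3"), ("node002", 800, "RedHat 6.2")],
)
conn.executemany(
    "INSERT INTO allocation VALUES (?, ?, ?)",
    [("node001", "cms", 101), ("node002", "lhcb", 102)],
)


def servers_in_farm(connection, farm):
    """Names of the servers currently allocated to a farm."""
    rows = connection.execute(
        "SELECT s.name FROM server s "
        "JOIN allocation a ON a.server = s.name "
        "WHERE a.farm = ? ORDER BY s.name",
        (farm,),
    )
    return [row[0] for row in rows]


print(servers_in_farm(conn, "cms"))  # ['node001']
```

A query of this kind is what a monitoring tool or a switch-configuration interface would need in order to know which servers belong to which farm.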

Monitoring/Alarms

Monitoring system developed at CNAF
  Socket server on each computer
  Centralized collector
  ~100 variables collected every 5 minutes
    o Data archived on flat file
    o In progress: XML structure for data archives
  User interface: http://tier1.cnaf.infn.it/monitor/
    o Next release: JAVA interface (collaboration with D. Galli, LHCb)
Critical parameters periodically checked by nagios
  Connectivity (i.e. ping), system load, bandwidth use, ssh daemon, pbs, etc.
  User interface: http://tier1.cnaf.infn.it/nagios/
  In progress: configuration interface
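
The "socket server on each computer plus centralized collector" pattern can be sketched in a few lines of Python. This is not the CNAF code: the JSON wire format, the function names and the sampled variables are assumptions made for the example, and an in-memory list stands in for the flat-file archive.

```python
import json
import os
import socket
import socketserver
import threading
import time

POLL_INTERVAL_S = 300  # "every 5 minutes"; not slept on in this short demo


def sample_variables():
    """A few local variables (the real system collected about 100)."""
    return {
        "host": socket.gethostname(),
        "timestamp": time.time(),
        "cpu_count": os.cpu_count(),
    }


class AgentHandler(socketserver.StreamRequestHandler):
    """Per-node socket server: answers each connection with one JSON line."""

    def handle(self):
        self.wfile.write(json.dumps(sample_variables()).encode() + b"\n")


def start_agent(host="127.0.0.1", port=0):
    """Start an agent in a background thread (port 0 picks a free port)."""
    server = socketserver.ThreadingTCPServer((host, port), AgentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def poll(address, timeout=5.0):
    """Collector side: connect to one agent and read its variables."""
    with socket.create_connection(address, timeout=timeout) as conn:
        with conn.makefile("r") as stream:
            return json.loads(stream.readline())


def collect(addresses, archive):
    """One collection round over all known agents."""
    for address in addresses:
        archive.append(poll(address))


agent = start_agent()
archive = []  # stand-in for the flat-file archive
collect([agent.server_address], archive)
agent.shutdown()
agent.server_close()
print(archive[0]["host"])
```

In a real deployment the collector would repeat `collect` every `POLL_INTERVAL_S` seconds and append each round to persistent storage.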

Remote control

KVM switches permit remote control of server consoles
  2 models under test
Paragon UTM8 (Raritan)
  8 analog (UTP/fiber) output connections
  Supports up to 32 daisy chains of 40 servers (UKVMSPD modules needed)
  Costs: 6 KEuro + 125 Euro/server (UKVMSPD module)
  IP-reach (expansion to support IP transport): 8 KEuro
Autoview 2000R (Avocent)
  1 analog + 2 digital (IP transport) output connections
  Supports connections to up to 16 servers
    o 3 switches needed for a standard rack
  Costs: 4.5 KEuro
NPCs (Network Power Control) permit remote and scheduled power cycling via snmp calls or web
  Bid under evaluation

[Image slide: Raritan]

[Image slide: Avocent]

Storage

Access to on-line data: DAS, NAS, SAN
  32 TB (> 70 TB this month)
  Data served via NFS v3
Test of several hw technologies (EIDE, SCSI, FC)
  Bid for FC switch
Study of large file system solutions (>2 TB) and load balancing/failover architectures
  GFS (load balancing)
    o Problems with lock server (better in hw?)
  GPFS (load balancing, large file systems)
    o Not that easy to install and configure
  HA (failover)
“SAN on WAN” tests (collaboration with CASPUR)
Tests with PVFS (LHCb, Alice)

STORAGE CONFIGURATION

[Diagram: storage configuration. Components shown:]
Client side (gateway or all farms must access storage), via WAN or TIER1 LAN
PROCOM NAS2, Nas2.cnaf.infn.it, 8100 GByte: VIRGO, ATLAS
PROCOM NAS3, Nas3.cnaf.infn.it, 4700 GByte: ALICE, ATLAS
IDE NAS4, Nas4.cnaf.infn.it, 1800 GByte: CDF, LHCB
Fileserver CMS (or more in cluster or HA), diskserv-cms-1.cnaf.infn.it
Fileserver Fcds3.cnaf.infn.it
FC switch (on order), fail-over support
AXUS BROWIE, circa 2200 GByte, 2 FC interfaces
DELL POWERVAULT, 7100 GByte, 2 FC interfaces
RAIDTEC, 1800 GByte, 2 SCSI interfaces
CASTOR server + staging
STK180 with 100 LTO (10 TByte native)

Mass Storage Resources

StorageTek library with 9840 and LTO drives
  180 tapes (100/200 GB each)
StorageTek L5500 with 2000-5000 slots on order
  LTOv2 (200/400 GB each)
  6 I/O drives
  500 tapes ordered
CASTOR as front-end software for archiving
  Direct access for end-users
  Oracle as back-end
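
The tape figures translate into library capacities by simple multiplication. The sketch below reads the paired numbers (100/200 GB and 200/400 GB) as native/compressed capacity per cartridge and uses decimal units; the count of 100 LTO cartridges for the STK180 is the one quoted elsewhere in these slides, and the helper name is just for illustration.

```python
GB_PER_TB = 1000  # decimal units, as used for tape capacities


def library_capacity_tb(tapes, native_gb, compressed_gb):
    """Return (native, compressed) capacity in TB for a set of cartridges."""
    return (tapes * native_gb / GB_PER_TB, tapes * compressed_gb / GB_PER_TB)


stk180 = library_capacity_tb(100, 100, 200)   # 100 LTO cartridges
l5500 = library_capacity_tb(500, 200, 400)    # 500 LTOv2 cartridges ordered
print(stk180)  # (10.0, 20.0)
print(l5500)   # (100.0, 200.0)
```

The 10 TB native result agrees with the "10 TByte native" quoted for the STK180 with 100 LTO tapes.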

TAPE HARDWARE

[Diagram: tape hardware. Components shown:]
TAPE LIBRARY STK180 (10 TB uncompressed using LTO tapes)
  2 internal 9840 tape drives, HVD, 8 MB/s
  2 + 2 internal IBM Ultrium LTO tape drives, 15 MB/s
Servers
  - Windows 2000 server (Dell 1650, RAID 1): LEGATO NSR, Adaptec 2944 UW 32 bit 40 MB/s
  - CASTOR SERVER (Linux 7.2, Compaq 360, RAID 1): CASTOR, Adaptec SCSI 3 39160 LVD 64 bit 160 MB/s dual port
  - TAPESERVER (Linux 7.2, Dell 1650, RAID 1): CASTOR TAPE software, Adaptec SCSI 3 LVD port
  - Solaris 8 server (Ultra 10 Sparc): ACSLS software, Antares SCSI-HVD 32 bit
Connections
  Data access via direct SCSI 3 LVD connection, 80 MB/s (two links)
  Data access via direct SCSI HVD connection
  Robot access via direct SCSI HVD connection
  TCP/IP Gigabit Ethernet (three links)
  TCP/IP 100 Mbit/s Ethernet
2 TB STAGING AREA

CASTOR

Developed and maintained at CERN
Chosen as front-end for archiving
Features
  Needs a staging area on disk (~20% of tape)
  ORACLE database as back-end for full capability (a MySQL interface is also included)
    o ORACLE database is backed up daily
  Every client needs to install the CASTOR client package (works on almost all major OSs, including Windows)
    o Access via rfio command
CNAF setup
  Experiment access from TIER1 farms via rfio with UID/GID protection from single server
  National Archive support via rfio with UID/GID protection from single server
  Grid SE tested and working
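
The "~20% of tape" rule for the staging area can be checked against the hardware figures quoted in these slides. A minimal sketch, assuming the rule is applied to native tape capacity (the slides do not say whether native or compressed capacity is meant):

```python
STAGING_FRACTION = 0.20  # "~20% of tape"


def staging_area_tb(tape_capacity_tb, fraction=STAGING_FRACTION):
    """Disk staging space suggested for a given tape capacity, in TB."""
    return tape_capacity_tb * fraction


# STK180: 10 TB native with LTO tapes.
stk180_staging = staging_area_tb(10)
print(f"{stk180_staging:.1f} TB")  # 2.0 TB
```

This matches the 2 TB staging area attached to the CASTOR server.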

CASTOR at CNAF

[Diagram: STK L180 library with 2 9840 drives and 4 LTO Ultrium drives, attached via SCSI. LEGATO NSR (backup). Robot access via SCSI (ACSLS). CASTOR with 2 TB staging disk. All servers on the LAN.]

New Location

The present location (at CNAF office level) is not suitable, mainly due to:
  Insufficient space.
  Weight (~700 kg per 0.5 m2 for a standard rack with 40 1U servers).
Moving to the final location (early) this summer.
  New hall in the basement (-2nd floor) almost ready.
  ~1000 m2 of total space
    o Computers
    o Electric power system (UPS, MPU)
    o Air conditioning system
  Easily accessible with lorries from the road
  Not suitable for office use (remote control)

Electric Power

220 V single-phase needed for computers.
  4 – 8 kW per standard rack (with 40 dual-processor servers), 16-32 A.
380 V three-phase for other devices (tape libraries, air conditioning, etc.).
To avoid black-outs, Tier1 has standard protection systems.
Installed in the new location:
  UPS (Uninterruptible Power Supply).
    o Located in a separate room (conditioned and ventilated).
    o 800 kVA (~640 kW).
  Electric generator.
    o 1250 kVA (~1000 kW).
  Up to 80-160 racks.
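
The kVA-to-kW conversions and the rack count can be reproduced with a short calculation. The power factor of 0.8 is an assumption inferred from the quoted pairs (800 kVA ~ 640 kW, 1250 kVA ~ 1000 kW), and the estimate deliberately ignores cooling and other non-computing loads.

```python
POWER_FACTOR = 0.8  # assumption, implied by 800 kVA ~ 640 kW


def real_power_kw(apparent_kva, power_factor=POWER_FACTOR):
    """Real power (kW) available from a given apparent power (kVA)."""
    return apparent_kva * power_factor


def racks_supported(available_kw, rack_kw):
    """Whole racks that fit in the available power budget."""
    return int(round(available_kw, 6) // rack_kw)


ups_kw = real_power_kw(800)          # 640.0
generator_kw = real_power_kw(1250)   # 1000.0
racks_min = racks_supported(ups_kw, 8)   # 8 kW racks -> 80
racks_max = racks_supported(ups_kw, 4)   # 4 kW racks -> 160
print(ups_kw, generator_kw, racks_min, racks_max)
```

The 80-160 rack range therefore follows from dividing the UPS real power by the 8 kW and 4 kW per-rack figures.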

Summary & conclusions

INFN-TIER1 is closing the prototype phase
  But still testing new technological solutions
Going to move the resources to the final location
Interoperation with grid projects (EDG, EDT, LCG)
Starting integration with LCG
Participating in CMS DC04
  ~70 computing servers
  ~4M events (40% of Italian commitment)
  15+60 (Tier0) TB of data (July to December 03)
  Analysis of simulated events (January to February 04)
  Interoperation with Tier0 (CERN) and Tier2 (LNL)