HEP Computing Status Sheffield University Matt Robinson Paul Hodgson Andrew Beresford


Page 1

HEP Computing Status

Sheffield University

Matt Robinson

Paul Hodgson

Andrew Beresford

Page 2

Interactive Cluster

• 30 self-built Linux boxes
• AMD Athlon XP CPUs, 256/512 MB RAM
• OS: Scientific Linux 303
• 100 Mbit network
• NIS for authentication; NFS-mounted /home etc.
• System install using kickstart + post-install scripts
• Separate backup machine
• 15 laptops, mostly dual-boot
• Some Macs and one Windows box
• 3 disk servers mounted as /data1, /data2, etc. (a few TB)
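The kickstart + post-install approach above can be sketched as a minimal kickstart fragment. This is illustrative only: the NIS domain, server names, and paths are placeholders, not the actual Sheffield configuration.

```
# ks.cfg fragment (illustrative): unattended SL3 install with NIS + NFS /home
install
lang en_GB
auth --enablenis --nisdomain=hep.example --nisserver=nis.example.ac.uk

%post
# NFS-mount /home from a central server on every node (hypothetical hostname)
echo "homeserver.example:/home  /home  nfs  defaults  0 0" >> /etc/fstab
# further site-specific post-install steps go here
```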

Page 3

Batch Cluster

• 100-CPU farm: Athlon XP 2400/2800
• OS: Scientific Linux 303
• NFS-mounted /home and /data
• OpenPBS batch system for job submission
• Gigabit backbone with 100 Mbit to worker nodes
• Disk server provides 1.3 TB as /data (RAID 5)
• Entire cluster assembled in-house from OEM components for less than 50k
• Hard part was finding an air-conditioned room with sufficient power
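Job submission on an OpenPBS farm like this typically uses a small batch script; a minimal sketch (job name, resource limits, and the analysis binary are illustrative, not from the slides):

```shell
#!/bin/sh
#PBS -N example_job        # job name (illustrative)
#PBS -l nodes=1:ppn=1      # one CPU on one worker node
#PBS -l walltime=01:00:00  # one-hour wall-clock limit

cd "$PBS_O_WORKDIR"        # PBS starts jobs in $HOME by default
./my_analysis > my_analysis.log 2>&1
```

Submitted with `qsub job.sh`; `qstat` then shows its place in the queue.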

Page 4

Cluster Usage

Page 5

Software

• PAW, CERNLIB, etc.
• Geant4
• ROOT
• ATLAS 10.0.1
• FLUKA
• ANSYS, LS-DYNA

Page 6

Comments - Issues

• Have tightened up security in the last year
• Strict firewall policy; limited machine exemptions
• Blocking scripts prevent ssh access after 3 authentication failures within 1 hour
• Cheap disks allow construction of large disk arrays
• Very happy with SL3 for desktop machines
• Use FC3 for laptops (2.6 kernel)
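The blocking-script idea above could be sketched as follows. This is an illustrative sketch, not the actual Sheffield script: the log path, syslog line format, and the TCP-wrappers (`hosts.deny`) mechanism are all assumptions.

```shell
# Hedged sketch: list source IPs with >= N sshd authentication failures
# in a given log file, formatted as /etc/hosts.deny entries.
# Assumes standard syslog lines: "... sshd[pid]: Failed password ... from <ip> ..."
ssh_block_candidates() {
    # $1: path to sshd log; $2: failure threshold (default 3)
    grep 'sshd.*Failed password' "$1" |
        awk '{ for (i = 1; i <= NF; i++) if ($i == "from") print $(i + 1) }' |
        sort | uniq -c |
        awk -v t="${2:-3}" '$1 >= t { print "sshd: " $2 }'
}

# Example use (run from cron, say, once a minute):
# ssh_block_candidates /var/log/secure 3 >> /etc/hosts.deny
```

A real deployment would also need to de-duplicate entries and restrict itself to the last hour of the log, which is elided here for brevity.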

Page 7

The Sheffield LCG Cluster

Page 8

Division of Hardware

• 162 x AMD Opteron 250 (2.4 GHz)
• 4 GB RAM/box (2 GB/CPU)
• 72 GB U320 10K RPM local SCSI disk
• Currently running 32-bit SL303 for maximum compatibility with the grid
• ~2.5 TB storage for experiments
• Middleware: 2.4.0
• Probably the most purple cluster in the grid

Page 9

Looking Sinister

Page 10

Status

Page 11

Usage so far

• We can take quite a bit more.

Page 12

Monitoring

• Ganglia with a modified web frontend to present queue information
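One plausible way to feed batch-queue information into such a frontend is to summarise `qstat` output; a minimal sketch (the function name and output format are my own, not from the slides):

```shell
# Hypothetical sketch: summarise OpenPBS queue state (running/queued job
# counts) from plain `qstat` output, e.g. for display in a web frontend.
pbs_queue_summary() {
    # Reads `qstat` output on stdin; the job-state letter is the 5th column,
    # and the first two lines are the header and separator.
    awk 'NR > 2 { states[$5]++ }
         END { printf "running=%d queued=%d\n", states["R"], states["Q"] }'
}

# Usage: qstat | pbs_queue_summary
```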

Page 13

Installation

• Service nodes connected to VPN and Internet
• PXE installation via VPN allows complete control of dhcpd and named
• RedHat kickstart + post-install script
• ssh servers not exposed
• RGMA always the hardest part
• Stumbled across routing rules
• WN install takes about 30 minutes; can do up to 40 simultaneously
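The PXE setup above relies on dhcpd pointing new nodes at a TFTP boot image; a minimal illustrative dhcpd.conf fragment (all addresses and names are placeholders, not the Sheffield values):

```
subnet 10.0.0.0 netmask 255.255.255.0 {
    range 10.0.0.100 10.0.0.240;   # worker-node address pool
    next-server 10.0.0.1;          # TFTP server holding the boot image
    filename "pxelinux.0";         # PXE bootloader
}
```

Each worker node then chain-loads the kickstart installer over the network, which is what makes a 30-minute, 40-node-wide install practical.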

Page 14

Future plans

• Keep up with middleware updates
• Increase available storage as required in ~3-4 TB steps
• Fix things that break
• Try not to mess anything up by screwing around
• Look toward operating with a 64-bit OS
