Tier1A Status Andrew Sansum 30 January 2003

Tier1A Status

Andrew Sansum

30 January 2003

Overview

• Systems
• Staff
• Projects

Lots of Services

[Diagram of services: DISK FARM; CPU FARM; CDF; Babar; Suns; TESTBEDS; Core Services; AFS; Datastore; Support Systems]

Lots of Operating Systems

• Production Farm
– Redhat 6.2 (Close to end of life)
– Redhat 7.2 (In production / Babar)
– Redhat 7.3 (Close to Trial Service: For LHC)

• CDF Service
– Redhat 7.1 (Kerberised Fermi Distribution)
– Redhat 7.3 (Possible Future release)

• Solaris Service
– Solaris 2.6 / Solaris 8

• EDG Testbed(s)
– Redhat 6.2 -> Redhat 7.3

Lots of EDG Testbeds!

• Production Testbed (CE, SE, 3*WN+NM)
• Development Testbed (CE, SE, 1*WN)
• RGMA Testbed (CE, SE, WN and RB)
• WP5 SE
• WP3/WP5 development systems
• EDG UI
• CE for Redhat 7.2 service

Lots of Grid Testbeds!

[Diagram labels: Tier1A, Babar]

New Hardware

• Disk
– Expect 40TB
– Continue with existing IDE technology, but different manufacturer.

• CPU
– Expect 100 CPUs
– Move to Pentium 4 or possibly AMD

Some New Staff

GridPP Staff: Traylen, Radden, Bly

ESC/PPD System Staff: Wheeler, White, Sansum, Saunders, Ross, Folkes, Strong

Management: Kelsey, Gordon, Sansum, ...

BITD Support: Networking, Operations, User Reg, AFS

Experiment Support Staff (RAL and elsewhere)

Users

Lots of New Projects

• Basic fabric performance monitoring (ganglia)
• Resource CPU accounting (based on PBS accounts/mysql)
• New CA in production
• New batch scheduler (MAUI)
• Deploy new helpdesk (end March)
• Network Performance tests (CERN/Bristol - also maybe WP7)
• Get ready for LCG (February deployment?)
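As an illustration of the CPU accounting item in the list above (PBS accounts feeding MySQL), the per-user totals might be computed along these lines. This is a minimal sketch, not the Tier1A implementation: it assumes the usual PBS accounting-log layout of semicolon-separated `date;record type;job id;key=value ...` fields, sums `resources_used.cput` from end-of-job (`E`) records, and leaves the database load out. The sample records, user names, and function names are hypothetical.

```python
from collections import defaultdict


def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_seconds_by_user(lines):
    """Sum resources_used.cput per user over end-of-job ('E') records."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.strip().split(";", 3)
        if len(fields) != 4 or fields[1] != "E":
            continue  # skip non-end records and malformed lines
        # The fourth field is a space-separated list of key=value pairs.
        attrs = dict(
            item.split("=", 1) for item in fields[3].split() if "=" in item
        )
        user = attrs.get("user")
        cput = attrs.get("resources_used.cput")
        if user and cput:
            totals[user] += hms_to_seconds(cput)
    return dict(totals)


# Hypothetical sample records, for illustration only.
SAMPLE_LOG = [
    "01/30/2003 10:00:00;S;101.batch;user=alice queue=prod",
    "01/30/2003 12:00:00;E;101.batch;user=alice queue=prod "
    "resources_used.cput=01:30:00 resources_used.walltime=02:00:00",
    "01/30/2003 13:00:00;E;102.batch;user=bob queue=prod "
    "resources_used.cput=00:45:10 resources_used.walltime=01:00:00",
    "01/30/2003 14:00:00;E;103.batch;user=alice queue=prod "
    "resources_used.cput=00:30:00 resources_used.walltime=00:40:00",
]

if __name__ == "__main__":
    for user, seconds in sorted(cpu_seconds_by_user(SAMPLE_LOG).items()):
        print(f"{user}: {seconds / 3600:.2f} CPU hours")
```

The returned dictionary could then be written to a database table keyed by user and accounting period; that step is omitted here because the slides do not describe the schema.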

Ganglia Monitoring

• Urgently needed live performance and utilisation monitoring
– RAL Ganglia Monitoring (live)
– RAL Ganglia Monitoring (Static)

• Scalable solution based on multicast
• Very rapidly deployable - reasonable support on all Tier1A Hardware
• See: http://ganglia.sourceforge.net/

New CA Deployed

• Now fully deployed by E-Science Centre (Jens+Alastair Mills)
• In use in UK core GRID
• Several PP have RA’s defined
• Approved by EDG - not yet in distribution.
• Once in EDG - termination date for old CA will be set.

New Scheduler (MAUI)

• With Redhat 7.2 now using MAUI Scheduler over PBS

• Some problems with MAUI scheduling on wallclock time - now corrected.

• Testing algorithms, but essentially have a range of strategies we can apply.

• Will make changes to queue structure in due course

New Helpdesk Software

• Old helpdesk (Remedy) - mail based, unfriendly.

• With additional staff, urgently need to deploy new solution.

• Expect new system to be based on free software (Bugzilla, Request Tracker …)

• Hope that deployed system will also meet needs of Testbed and Tier 2 sites.

• Expect deployment by end of March.

Network Performance Tests

• Simon Metson, Nick White, +….

• Preparing for CMS production. Must be able to move data to CERN at 100-200Mbit/second.

• Currently aggregate 350Mbit/s to Bristol - but under 100Mbit/s to CERN.

• Main problem seems to be within CMS infrastructure
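To put the rates quoted above in perspective, the following sketch converts a sustained link rate into a daily data volume and into the time needed to move a given dataset. It is only back-of-envelope arithmetic: it assumes decimal units (1 TB = 10^12 bytes), a perfectly sustained rate with no protocol overhead, and a 1 TB dataset chosen purely as an example. The function names are hypothetical.

```python
MEGABITS_PER_TERABYTE = 8_000_000  # 10**12 bytes * 8 bits / 10**6
SECONDS_PER_DAY = 86_400


def transfer_hours(terabytes, mbit_per_second):
    """Hours needed to move `terabytes` at a sustained rate in Mbit/s."""
    seconds = terabytes * MEGABITS_PER_TERABYTE / mbit_per_second
    return seconds / 3600


def terabytes_per_day(mbit_per_second):
    """Data volume moved in 24 hours at a sustained rate in Mbit/s."""
    return mbit_per_second * SECONDS_PER_DAY / MEGABITS_PER_TERABYTE


if __name__ == "__main__":
    # Rates taken from the slide: CERN target range and Bristol aggregate.
    for rate in (100, 200, 350):
        print(
            f"{rate} Mbit/s: {terabytes_per_day(rate):.2f} TB/day, "
            f"1 TB in {transfer_hours(1, rate):.1f} hours"
        )
```

Under these assumptions, 100 Mbit/s sustains roughly 1.08 TB per day, so a 1 TB dataset takes about 22 hours; doubling the rate to 200 Mbit/s halves that time.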

BaBar Batch CPU Use at RAL

[Chart: BaBar CPU hours per week (normalised to P450) against week beginning; y-axis 0 to 120,000; series: SP, UK Users, Non-UK Users; MOU level marked]

Full usage at full efficiency of BaBar CPUs = 106,624 Hours/Week; 59,733 according to MOU
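The two capacity figures quoted for the chart can be related with a little arithmetic. The sketch below assumes that "full usage at full efficiency" means every CPU busy 24 hours a day, 7 days a week, so dividing by 168 gives the number of P450-equivalent CPUs; that interpretation and the function names are assumptions, not something stated on the slide.

```python
HOURS_PER_WEEK = 7 * 24  # 168

FULL_CAPACITY_HOURS = 106_624  # P450-normalised CPU hours/week, from the slide
MOU_HOURS = 59_733             # weekly commitment according to the MOU


def equivalent_cpus(cpu_hours_per_week):
    """P450-equivalent CPUs needed to deliver this many hours in one week."""
    return cpu_hours_per_week / HOURS_PER_WEEK


def fraction_of_capacity(cpu_hours_per_week):
    """Share of the full-efficiency capacity represented by this usage."""
    return cpu_hours_per_week / FULL_CAPACITY_HOURS


if __name__ == "__main__":
    print(f"Full capacity: {equivalent_cpus(FULL_CAPACITY_HOURS):.1f} P450 equivalents")
    print(f"MOU level:     {equivalent_cpus(MOU_HOURS):.1f} P450 equivalents")
    print(f"MOU as share of capacity: {fraction_of_capacity(MOU_HOURS):.1%}")
```

On these assumptions the MOU level corresponds to about 56% of the full-efficiency capacity.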

Successes (2002)

• Five additional staff online since January 2002.

• Fully engaged in EDG testbed. Making an impact in EDG: Steve

• Tier1A installation went very well in March/April/May

• Tier A service ramp up excellent:
– Most successful of the Tier A services. SLAC seem pleased - so far.

Challenges

• Complete 2002/2003 tender/deployment
• Carry out major EU tenders for 2003/2004
• Expand use of Tier 1
• Need to evolve strategy to cope with diversity of requirements
• Deploy the LCG Testbed (What/When?)
• Enhance automation / out of hours cover
• Improve reporting to GridPP - accountability