Tier1A Status
Andrew Sansum, 30 January 2003
Overview
• Systems
• Staff
• Projects
Lots of Services
[Diagram: Disk Farm, CPU Farm, CDF, BaBar, Suns, Testbeds, Core Services (AFS, Datastore), Support Systems]
Lots of Operating Systems
• Production Farm
  – Redhat 6.2 (close to end of life)
  – Redhat 7.2 (in production for BaBar)
  – Redhat 7.3 (close to trial service, for LHC)
• CDF Service
  – Redhat 7.1 (Kerberised Fermi distribution)
  – Redhat 7.3 (possible future release)
• Solaris Service
  – Solaris 2.6 / Solaris 8
• EDG Testbed(s)
  – Redhat 6.2, moving to Redhat 7.3
Lots of EDG Testbeds!
• Production Testbed (CE, SE, 3 × WN + NM)
• Development Testbed (CE, SE, 1 × WN)
• R-GMA Testbed (CE, SE, WN and RB)
• WP5 SE
• WP3/WP5 development systems
• EDG UI
• CE for Redhat 7.2 service
Lots of Grid Testbeds!
[Diagram: Tier1A, BaBar]
New Hardware
• Disk
  – Expect 40 TB
  – Continue with existing IDE technology, but from a different manufacturer
• CPU
  – Expect 100 CPUs
  – Move to Pentium 4 or possibly AMD
Some New Staff
• GridPP staff: Traylen, Radden, Bly
• ESC/PPD system staff: Wheeler, White, Sansum, Saunders, Ross, Folkes, Strong
• Management: Kelsey, Gordon, Sansum, ...
• BITD support: networking, operations, user registration, AFS
• Experiment support staff (RAL and elsewhere)
• Users
Lots of New Projects
• Basic fabric performance monitoring (Ganglia)
• Resource CPU accounting (based on PBS accounts/MySQL)
• New CA in production
• New batch scheduler (MAUI)
• Deploy new helpdesk (end March)
• Network performance tests (CERN/Bristol, maybe also WP7)
• Get ready for LCG (February deployment?)
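The CPU accounting item is only described as "based on PBS accounts/MySQL". A minimal sketch of that idea, assuming the standard PBS accounting-log layout (`timestamp;record_type;job_id;key=value ...`, with job-end records of type `E` carrying `resources_used.cput`), is to sum CPU time per Unix group. Here `sqlite3` stands in for MySQL, and the table name, column names and sample log lines are hypothetical.

```python
import sqlite3
from collections import defaultdict


def hms_to_hours(hms: str) -> float:
    """Convert a PBS 'HH:MM:SS' duration (hours may exceed 24) to hours."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h + m / 60 + s / 3600


def cpu_hours_by_group(log_lines):
    """Sum resources_used.cput over job-end ('E') records, keyed by group."""
    totals = defaultdict(float)
    for line in log_lines:
        # Layout: timestamp;record_type;job_id;space-separated key=value pairs
        parts = line.rstrip("\n").split(";", 3)
        if len(parts) != 4 or parts[1] != "E":
            continue  # only job-end records carry the final resource usage
        fields = dict(kv.split("=", 1) for kv in parts[3].split() if "=" in kv)
        cput = fields.get("resources_used.cput")
        if cput:
            totals[fields.get("group", "unknown")] += hms_to_hours(cput)
    return dict(totals)


def store_weekly_totals(conn, week_beginning, totals):
    """Write one row per group (sqlite3 used here as a stand-in for MySQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cpu_usage "
        "(week_beginning TEXT, grp TEXT, cpu_hours REAL)"
    )
    conn.executemany(
        "INSERT INTO cpu_usage VALUES (?, ?, ?)",
        [(week_beginning, g, h) for g, h in sorted(totals.items())],
    )
    conn.commit()


# Hypothetical log excerpt: one queued record and three job-end records.
SAMPLE_LOG = [
    "01/27/2003 09:00:00;Q;101.pbs-server;queue=prod",
    "01/27/2003 12:00:00;E;101.pbs-server;user=alice group=babar "
    "resources_used.cput=01:30:00 resources_used.walltime=01:40:00",
    "01/27/2003 13:00:00;E;102.pbs-server;user=bob group=babar "
    "resources_used.cput=00:30:00 resources_used.walltime=00:35:00",
    "01/27/2003 14:00:00;E;103.pbs-server;user=carol group=cdf "
    "resources_used.cput=02:00:00 resources_used.walltime=02:10:00",
]

if __name__ == "__main__":
    totals = cpu_hours_by_group(SAMPLE_LOG)
    conn = sqlite3.connect(":memory:")
    store_weekly_totals(conn, "2003-01-27", totals)
    print(totals)
```

Keying on the job-end record means each job is counted once, after its usage is final; the simple whitespace split would need refinement for attribute values containing spaces.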
Ganglia Monitoring
• Urgently needed live performance and utilisation monitoring
  – RAL Ganglia Monitoring (live)
  – RAL Ganglia Monitoring (static)
• Scalable solution based on multicast
• Very rapidly deployable, with reasonable support on all Tier1A hardware
• See: http://ganglia.sourceforge.net/
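Because gmond daemons share metrics over multicast, any single gmond holds the state of the whole cluster and returns it as an XML dump to a client connecting on its TCP port (8649 by default). A minimal sketch of reading that dump and flagging busy nodes follows; the host names, the sample XML and the 0.9 load-per-CPU threshold are hypothetical, while `load_one` and `cpu_num` are standard gmond metrics.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host: str, port: int = 8649) -> str:
    """Read the XML dump gmond writes to any client on its TCP port."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def metric_by_host(xml_text: str, metric: str) -> dict:
    """Map HOST NAME -> float value of the named METRIC."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        for m in host.iter("METRIC"):
            if m.get("NAME") == metric:
                result[host.get("NAME")] = float(m.get("VAL"))
    return result


def busy_hosts(xml_text: str, threshold: float = 0.9) -> list:
    """Hosts whose one-minute load per CPU is at or above the threshold."""
    load = metric_by_host(xml_text, "load_one")
    cpus = metric_by_host(xml_text, "cpu_num")
    return sorted(
        h for h, value in load.items()
        if value / max(cpus.get(h, 1.0), 1.0) >= threshold
    )


# Hypothetical, trimmed-down dump with two dual-CPU nodes.
SAMPLE_XML = """<GANGLIA_XML>
<CLUSTER NAME="tier1a-demo">
<HOST NAME="node01"><METRIC NAME="load_one" VAL="1.90"/><METRIC NAME="cpu_num" VAL="2"/></HOST>
<HOST NAME="node02"><METRIC NAME="load_one" VAL="0.25"/><METRIC NAME="cpu_num" VAL="2"/></HOST>
</CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    print(busy_hosts(SAMPLE_XML))
```

Against a live farm, `busy_hosts(fetch_gmond_xml("some-gmond-host"))` would poll one daemon rather than every node, which is what makes the multicast design scale.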
New CA Deployed
• Now fully deployed by the E-Science Centre (Jens + Alastair Mills)
• In use in the UK core Grid
• Several PP groups have RAs (Registration Authorities) defined
• Approved by EDG, but not yet in the EDG distribution
• Once it is in EDG, a termination date for the old CA will be set
New Scheduler (MAUI)
• With Redhat 7.2, now using the MAUI scheduler on top of PBS
• Some problems with MAUI scheduling on wallclock time, now corrected
• Still testing algorithms, but we essentially have a range of strategies we can apply
• Will change the queue structure in due course
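One of the strategies MAUI offers is fairshare, controlled by parameters such as FSINTERVAL (window length), FSDEPTH (number of windows) and FSDECAY (weight applied to older windows). The sketch below is a simplified illustration of that idea, not MAUI's actual priority formula: usage is decayed geometrically across windows, and a group scores positively when its share of recent usage is below its target. Group names, usage figures and target shares are hypothetical.

```python
def decayed_usage(window_usage, decay):
    """Combine per-window usage (index 0 = most recent) with geometric decay."""
    return sum(u * decay ** i for i, u in enumerate(window_usage))


def fairshare_priority(usage_by_group, targets, decay=0.5, weight=1.0):
    """Positive score = group is below its target share and should be favoured.

    usage_by_group: group -> list of CPU hours per window, newest first
    targets: group -> target share in percent
    """
    decayed = {g: decayed_usage(w, decay) for g, w in usage_by_group.items()}
    total = sum(decayed.values()) or 1.0
    return {
        g: weight * (target - 100.0 * decayed.get(g, 0.0) / total)
        for g, target in targets.items()
    }


if __name__ == "__main__":
    usage = {"babar": [60.0, 60.0], "cdf": [40.0, 40.0], "lhc": [0.0, 0.0]}
    targets = {"babar": 50.0, "cdf": 30.0, "lhc": 20.0}
    print(fairshare_priority(usage, targets))
```

With these figures the idle group receives the only positive score, so its queued jobs would be pushed forward until recent usage moves back towards the targets.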
New Helpdesk Software
• Old helpdesk (Remedy): mail-based and unfriendly.
• With the additional staff, we urgently need to deploy a new solution.
• Expect the new system to be based on free software (Bugzilla, Request Tracker …)
• Hope that the deployed system will also meet the needs of the Testbed and Tier 2 sites.
• Expect deployment by end of March.
Network Performance Tests
• Simon Metson, Nick White, +…
• Preparing for CMS production: must be able to move data to CERN at 100-200 Mbit/s.
• Currently achieving 350 Mbit/s aggregate to Bristol, but under 100 Mbit/s to CERN.
• The main problem seems to be within the CMS infrastructure.
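To put the 100-200 Mbit/s CMS target in terms of daily data volume, a rough conversion helps. This is a back-of-envelope sketch assuming a perfectly sustained rate with no protocol overhead and decimal units (1 TB = 10^12 bytes); real transfers would move somewhat less.

```python
SECONDS_PER_DAY = 86_400


def tb_per_day(mbit_per_s: float) -> float:
    """Decimal terabytes moved per day at a sustained rate in Mbit/s."""
    bytes_per_s = mbit_per_s * 1e6 / 8  # bits -> bytes
    return bytes_per_s * SECONDS_PER_DAY / 1e12


if __name__ == "__main__":
    # The CMS target range and the current aggregate rate to Bristol.
    for rate in (100, 200, 350):
        print(f"{rate:>4} Mbit/s -> {tb_per_day(rate):.2f} TB/day")
```

At 100 Mbit/s this gives about 1.08 TB/day and at 200 Mbit/s about 2.16 TB/day, so the current sub-100 Mbit/s rate to CERN falls short of roughly one terabyte per day.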
BaBar Batch CPU Use at RAL
[Chart: BaBar CPU hours per week (normalised to P450), 0 to 120,000, by week beginning; series: SP, UK users, non-UK users]
Full usage of the BaBar CPUs at full efficiency = 106,624 hours/week; the MOU figure is 59,733 hours/week.
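The two weekly figures can be turned into equivalent CPU counts and an MOU fraction. This is a worked sketch assuming both figures use the same P450 normalisation and a 168-hour week; the derived numbers are arithmetic on the quoted figures, not separately reported values.

```latex
\begin{aligned}
\text{full capacity} &= \frac{106{,}624\ \text{CPU-h/week}}{168\ \text{h/week}} \approx 634.7\ \text{P450-equivalent CPUs}\\
\text{MOU commitment} &= \frac{59{,}733\ \text{CPU-h/week}}{168\ \text{h/week}} \approx 355.6\ \text{P450-equivalent CPUs}\\
\frac{\text{MOU}}{\text{full capacity}} &= \frac{59{,}733}{106{,}624} \approx 0.560
\end{aligned}
```

On these assumptions, the MOU corresponds to roughly 56% of what the BaBar CPUs could deliver at full efficiency.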
Successes (2002)
• Five additional staff online since January 2002.
• Fully engaged in EDG testbed. Making an impact in EDG: Steve
• Tier1A installation went very well in March/April/May
• Tier A service ramp-up excellent:
  – Most successful of the Tier A services. SLAC seem pleased, so far.
Challenges
• Complete 2002/2003 tender/deployment
• Carry out major EU tenders for 2003/2004
• Expand use of Tier 1
• Need to evolve strategy to cope with diversity of requirements
• Deploy the LCG Testbed (What/When?)
• Enhance automation / out-of-hours cover
• Improve reporting to GridPP (accountability)