UK Testbed Report GridPP 9 Steve Traylen s.traylen@rl.ac.uk

Page 1: UK Testbed Report GridPP 9 Steve Traylen s.traylen@rl.ac.uk

UK Testbed Report, GridPP 9

Steve Traylen, s.traylen@rl.ac.uk


Topics

• Current test beds.
• Resources at GridPP sites. (Corrections welcome)
  – ScotGrid
  – LondonGrid
  – NorthGrid
  – SouthGrid
  – Tier1/A Grid
• Future test beds and grid facilities.


EDG 2.1.12

• No big changes within EDG 2.1.X, only bug fixes and security updates. Released last November.

• 19 sites in total, with 8 from the UK.
• Software freeze, 4th February.
• New sites freeze, 9th February.
• EDG review on 19th and 20th February.
• EDG will continue as is through March.
• OS security policy based on Fedora Legacy.


LCG 1

• Released October 2003.
• Since then only a few security updates.
• In limited use by experiments, but the deployment procedure is now well established, as are plans for the next data challenge.
• Three sites in the UK out of a total of 28.
• No support for managed or mass storage access.
• Security updates provided by the CERN Linux group.


How to join LCG.

• Procedure is quite formal but appears to work.
  – A How2Start document exists.
• A questionnaire must be filled in stating your intentions.
  – Number of WNs.
  – Storage capacity.
  – VOs you plan to support.
• All install support for LCG is via your primary LCG site; for the UK this is RAL.
• The existing tb-support list can also be used.
• LCG2 is similar, although the GOC will be collecting information via a GridSite-enabled web form.
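
The questionnaire items above (number of WNs, storage capacity, supported VOs) map naturally onto a small record. The following is a minimal sketch in Python, not the actual LCG questionnaire or the GOC web form: the class name `SiteQuestionnaire`, its field names, the example site and the validation rules are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SiteQuestionnaire:
    """Hypothetical record of a site's answers when applying to join LCG."""

    site_name: str
    worker_nodes: int               # number of WNs the site intends to offer
    storage_tb: float               # storage capacity in TB
    supported_vos: List[str] = field(default_factory=list)
    primary_site: str = "RAL"       # UK install support goes via RAL

    def problems(self) -> List[str]:
        """Return a list of obvious gaps in the answers (assumed rules)."""
        issues = []
        if self.worker_nodes <= 0:
            issues.append("number of WNs must be positive")
        if self.storage_tb < 0:
            issues.append("storage capacity cannot be negative")
        if not self.supported_vos:
            issues.append("at least one VO must be listed")
        return issues


if __name__ == "__main__":
    # "ExampleSite" and its numbers are made up for the example.
    answers = SiteQuestionnaire("ExampleSite", worker_nodes=20,
                                storage_tb=3.0, supported_vos=["atlas"])
    print(answers.problems() or "questionnaire looks complete")
```

Keeping the checks in one method means a site can see every gap at once before sending the form to its primary site.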


ScotGrid (Glasgow)

• Previously within EDG.
• EDG2 being installed.
• Various WP2 test machines.
• New CE, SE and UI being arranged.
• Will join LCG2 with these resources and new resources.
• 6 x335 and 29 blades from IBM, funded by eDIKT for bioinformatics, being added.

• A CDF-SAM/JIM front end exists.

• Possible UK eScience front end run by NeSC Hub.

• Fraser Speirs now the tech co-ordinator for ScotGrid.


ScotGrid (Durham)

• EDG 2.1.8 installed.
• Ganglia to be added.
• New GridPP hardware will be installed in EDG/LCG.

• Extra front ends will be made available from ScotGrid.


NorthGrid (Lancaster)

• Predominantly SAMGrid.
• EDG CE, SE, MON, WNs.
• Six-figure sum of hardware, arriving at the end of 2004.
• This shiny farm will appear within LCG and NorthGrid.
• Ganglia is in use and liked for debugging and ease of install.

Installed release: EDG 2.1.8


NorthGrid (Sheffield)

• Most recent full member of the EDG testbed.

• More WNs will now be added.

• Grid jobs could be fed to the existing 68 CPU farm.
• This farm will continue to grow.
• “Ganglia is fantastic.”
• Plans exist for a 400 CPU cluster for use by GridPP.

Installed release: EDG 2.1.12


NorthGrid (Manchester)

• VO services for GridPP, Babar and experimental MICE, CALICE and DESY.

• WWW for GridPP.
• Very active with Babar grid.
• Will join LCG1/2.
• Alessandra will be NorthGrid coordinator.
• Ganglia in use throughout the department. A GridPP Ganglia view is also likely to be created.

• 1000 CPU farm being commissioned this year.


NorthGrid (Liverpool)

• 940 DELL P4s installed.
• Around 80 will be committed to LCG2; this figure can vary once actual demand is known.
• A number of nodes are available for grid front ends.

• Plan to take part in ATLAS and LHCb data challenges as well as Babar grid.


ScotGrid (Edinburgh)

• Joining EDG2 now.
• Plan to join LCG and/or Babar.
• Grid jobs could be fed to a 17 node Babar farm.

• Grid access to some of 150TB of SRIF storage is planned.

• Phil Clark will be coordinating ScotGrid activities in Edinburgh.


LondonGrid (UCL)

• UCL is a full member of the EDG2 testbed.

• SRIF funded cluster with 192 CPUs for Grid and E-Science projects being installed now.
• A large portion will be made available within LCG2.
• Will be used for ATLAS DC2.

Installed release: EDG 2.1.8


LondonGrid (QMUL)

• A full member of EDG2 testbed.

• Plan to front end SRIF farm.

• 100-128 CPUs are in the tender process now.
• Ganglia in use.
• Unhappy with ScalablePBS/Maui. Will consider Sun Grid Engine v6 once released.

Installed release: EDG 2.1.8


LondonGrid (Imperial College)

• Full member of LCG1 and EDG2.

• Eagerly awaiting LCG2 for CMS DC.

• Both CPU and Disk dedicated to LCG and EDG will be increased.

• A UI for LCG2--.
• Monitoring: job tracking map is now an applet.

Installed releases: EDG 2.1.12, LCG1 1.1.4


SouthGrid (Bristol)

• VDT based CE within Babar Grid.

• This fronts an existing 40 WN Babar farm.

• There is a GridPP replica catalogue.

• Ganglia running.
• A 6 month plan to be running Babar SP production lies ahead.
• Lots of work for CMS DC04 such as GMcat.

Installed release: VDT 1.8-14


SouthGrid (Birmingham)

• EDG 2.1.8 testbed installed.

• Possibly integrate with existing farms later.

• New on-site hardware expected, not dedicated to UKHEP.

Installed release: EDG 2.1.8


SouthGrid (RAL PPD)

• A full member of EDG2, but will ramp down to join LCG2.

• WP3 testbed also includes 2 R-GMA Nagios nodes.

• Ganglia in use, Nagios being considered.

• 16+40 CPUs, 5TB disk being arranged.

• Supporting SouthGrid sites in installing EDG and then LCG.

Installed releases: EDG 2.1.12; R-GMA, EDG 2.1.12++


SouthGrid (Oxford)

• A full member of EDG testbed.

• Oxford eScience centre has a Condor and JISC infrastructure testbed.

• Ganglia in use elsewhere, but not on the EDG cluster.

• Hardware being sourced now for 2004 data challenges.

• All resources will appear within SouthGrid.

Installed release: EDG 2.1.12


SouthGrid (Cambridge)

• 20 nodes in total including one for testing and a NM.

• Plan to feed jobs to existing eScience farms.

• 3TB and 20 more CPUs to be deployed in 2 months' time.

• Ganglia in use and liked.

Installed releases: EDG 2.1.12, LCG1 1.1.4


RAL Tier1/a (EDG App)

• RAL runs EDG core services such as the R-GMA catalogue and RLS for some VOs.

Installed release: EDG 2.1.12


RAL Tier1/a (EDG and Others)

• EDG Dev: CE, 2 x SE, MON and RLS.
• R-GMA: CE, SE, MON and IC.
• EDG-SE: 4 x SEs.
• Public access UIs for apptb and devtb.
• Gatekeeper into main production farm for Babar Grid.
• Central SRB MCAT server.


RAL Tier1/a (LCG1/2--)

• LCG1: UI, CE, SE, BDII, WN and West GIIS.
  – Skeleton service to be terminated ASAP.
• LCG2: UI, CE, WN, BDII, PROXY, RB and West GIIS.
  – Babar and DZero VOs added to standard LHC VOs.
  – R-GMA added to LCG2 as proof of concept.
  – Nagios running across LCG2.


Common Problems/Requests

• More reliable or informative monitoring.
• Firewall requirements: one site was nearly charged because of the required complexity.
• Strange Globus TCP traffic considered illegal by a couple of firewalls and blocked, wrongly or rightly.
• Mistakes/omissions in the instructions.
  – Too many possible errors in the system to document.


Arrival of LCG2.

• LCG2 is appearing now.
• With this release it is a good time to commit resources.
  – Data challenges are all eager to start.
  – Tier1 will initially move 70 high end nodes into this.
• Installation is by LCFG, or manual (by hand) instructions exist for installed/shared resources.
• Likely to be in place for some time, perhaps for the remainder of this year.


DCache SRM

• What is it with respect to the LCG?
  – One solution to SRM-fronted disk.
  – DCache is a GridFTP server with a very flexible backend.
  – An SRM implementation exists.
• RAL is packaging, testing and configuring with LCFG.
• Testing is at a brick wall due to minor, but critical, differences in SRM implementation.
  – The SRM does not create directories automagically.
  – As a classic SE, gridftp-mkdir is not implemented.
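
The directory issue above means a client cannot assume that the parent directories of a target file already exist. The following Python sketch only illustrates the missing step: it works out which parent directories would need creating, outermost first. It is not DCache or SRM code. The function names, the `/store/example/...` path and the `mkdir` callable are hypothetical, and no real storage element is contacted.

```python
from pathlib import PurePosixPath
from typing import Callable, List, Set


def missing_parents(target: str, existing: Set[str]) -> List[str]:
    """Parent directories of `target` not in `existing`, outermost first."""
    parents = [str(p) for p in PurePosixPath(target).parents]
    parents.reverse()  # the root comes first after reversing
    return [p for p in parents if p not in ("/", ".") and p not in existing]


def ensure_parents(target: str, existing: Set[str],
                   mkdir: Callable[[str], None]) -> None:
    """Call the (hypothetical) `mkdir` once per missing parent directory."""
    for directory in missing_parents(target, existing):
        mkdir(directory)
        existing.add(directory)


if __name__ == "__main__":
    known = {"/store", "/store/example"}  # made-up starting state
    created: List[str] = []
    ensure_parents("/store/example/data/run1/file.dat", known, created.append)
    print(created)
```

Creating directories outermost first matters because each one can only be made once its own parent exists.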


Plans for EDG testbed.

• Lots of speculation.
• After the review, software may move closer to LCG2++.
• Officially the EDG testbed will become a development/demonstration testbed for EGEE.
• From early April people inside EGEE will coordinate and run this resource.
• This may move to be the SA1 testbed; it will be the same people running it.
• A partial quattor install of EDG is also being worked on.


Testbeds for EGEE

• JRA1, Middleware Engineering and Integration.
  – Similar role to EDG dev testbed for EGEE middleware.
  – Rather more controlled and closed.
  – Resources at CERN, NIKHEF and RAL plus five testers at CERN.
  – Release frequency up to 1/week.
• SA1, EU Grid Operations, Support, and Management.
  – Testing ground for applications with EGEE middleware.
  – Release interval of around 3 months.
• WP3 testbed.
  – Will continue into the future for EGEE in some form.


Conclusions

• GridPP sites still make up the most significant portion of EDG.

• EDG will become smaller and somehow migrate to EGEE. As it stands it is probably too large for the early stages of EGEE.

• Running EDG is very good practice for LCG.

• Significant resources are wanted and best suited to LCG.

• Tier2s appear to be working closely together on the same things.