
RAL Tier1: 2001 to 2011

James Thorne
GridPP 19

30th August 2007

30/08/2007 j.i.thorne@scitech.ac.uk

2001 to 2007

• Sorry GridPP, I’m afraid I can’t do that!


Result of GridPP3 for Tier1

• Good result:
  – Effort increases from 16.5 to 20.4 FTE
  – £6.8M hardware budget (cf. £2.3M in GridPP2)

• Extra fault management/hardware staff as the size of the farm increases

• A good result, but the team remains thinly stretched, and the hardware is just sufficient to meet the experiments’ requirements.
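A quick calculation shows the relative growth behind these figures. This is only a sketch. Its inputs are the FTE and hardware-budget numbers quoted above, and the helper name `relative_change` is illustrative.

```python
def relative_change(old: float, new: float) -> float:
    """Return the fractional change from old to new (0.1 means +10%)."""
    return (new - old) / old


# Figures quoted on the slide (GridPP2 -> GridPP3 for the Tier1).
effort_fte = (16.5, 20.4)
hardware_budget_gbp_m = (2.3, 6.8)

effort_growth = relative_change(*effort_fte)
budget_ratio = hardware_budget_gbp_m[1] / hardware_budget_gbp_m[0]

print(f"Effort: +{effort_growth:.1%}")          # Effort: +23.6%
print(f"Hardware budget: x{budget_ratio:.2f}")  # Hardware budget: x2.96
```

The comparison puts a number on the "thinly stretched" point. The hardware budget roughly triples, but staff effort grows by only about a quarter.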


Planned Tier1 Storage Capacity (TiB)

[Chart: planned tape and disk capacity in TiB, 2008 to 2011 (April); y-axis 0 to 20,000 TiB]
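The capacity plan is given in TiB, which is a binary unit (2^40 bytes). Disk vendors typically quote capacity in decimal TB (10^12 bytes). The two differ by about 10%, and that gap matters when a procurement is compared against the plan. The sketch below shows the conversion. The sample values of 1 and 1000 TiB are illustrative and are not figures read from the chart.

```python
TIB_BYTES = 2**40   # tebibyte (IEC binary prefix)
TB_BYTES = 10**12   # terabyte (SI decimal prefix)


def tib_to_tb(tib: float) -> float:
    """Convert a capacity in TiB to decimal TB."""
    return tib * TIB_BYTES / TB_BYTES


print(f"{tib_to_tb(1):.4f}")     # 1.0995
print(f"{tib_to_tb(1000):.1f}")  # 1099.5
```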


Planned Tier1 CPU Capacity (KSI2K)

[Chart: planned CPU capacity in KSI2K, 2008 to 2011 (April); y-axis 0 to 16,000 KSI2K]


Estimated Rack Count

[Chart: estimated number of disk and CPU racks, 2006 to 2011; y-axis 0 to 120 racks]


Estimated number of Disk Servers

[Chart: estimated number of disk servers, 2006 to 2011; y-axis 0 to 500]


Estimated number of Spinning Drives

[Chart: estimated number of spinning drives, 2006 to 2011; y-axis 0 to 12,000]


Approximate Hardware Value Allocated to Experiments in 2008

• ALICE: 4%
• ATLAS: 53%
• BaBar: 3%
• CMS: 31%
• LHCb: 8%
• Other: 1%
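The shares above are approximate. The sketch checks that they cover the whole allocation and shows how they would divide a given hardware value. The total of 1,000,000 is a hypothetical placeholder, not a figure from the talk, and `split_value` is an illustrative helper.

```python
# Approximate 2008 hardware-value shares from the slide (percent).
shares = {"ALICE": 4, "ATLAS": 53, "BaBar": 3, "CMS": 31, "LHCb": 8, "Other": 1}

assert sum(shares.values()) == 100, "shares should cover the whole allocation"


def split_value(total: float, shares: dict[str, int]) -> dict[str, float]:
    """Divide a total hardware value in proportion to percentage shares."""
    return {name: total * pct / 100 for name, pct in shares.items()}


# Hypothetical total (not a number from the talk).
for name, value in split_value(1_000_000, shares).items():
    print(f"{name:6s} {value:>10,.0f}")
```

ATLAS and CMS together account for 84% of the 2008 allocation.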


Hardware

• CPU
• Disk
• Tape
• Further procurements in FY08, FY09 and FY10


New Machine Room

• Order placed and contractor has started work
• 800 m² can accommodate 300 racks + 5 robots
• 2.3 MW power/cooling capacity (some UPS)
• Office accommodation for all E-Science staff
• Scheduled to be available for September 2008
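The room figures imply some average densities. This is a crude sketch. It assumes that all 800 m² and the full 2.3 MW go to the 300 racks, and it ignores the robots, aisles and any cooling share of the power budget. The slide does not say how the 2.3 MW is split between IT load and cooling, so treat the per-rack power as an upper bound.

```python
FLOOR_AREA_M2 = 800
RACK_CAPACITY = 300
POWER_COOLING_MW = 2.3

# Upper bounds: assumes all floor area and all 2.3 MW go to racks,
# ignoring the 5 robots, aisles and cooling overhead.
area_per_rack_m2 = FLOOR_AREA_M2 / RACK_CAPACITY
kw_per_rack = POWER_COOLING_MW * 1000 / RACK_CAPACITY

print(f"{area_per_rack_m2:.2f} m^2 per rack")  # 2.67 m^2 per rack
print(f"{kw_per_rack:.1f} kW per rack")        # 7.7 kW per rack
```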


Staffing

• Lex Holt has left the Tier1
• James Adams is moving from hardware support to a Fabric Team system administration role
• Plan to recruit:
  – Replacement hardware repair position
  – Two experiment support posts: one ATLAS, one CMS
  – Raja Nandakumar as honorary team member from LHCb
  – Will also shortly commence GridPP3 recruitment


CASTOR

• Operational issues mentioned at GridPP 18 were the tip of the iceberg; the CASTOR 2.1.2 service was found to be inoperable.

• Massive amount of re-engineering carried out since March, with much effort from the CASTOR team.
  – Huge progress
  – Areas of concern

• We are optimistic that CASTOR will be a success


SL4

• 20% of the batch farm is now running SL4
• Negotiating with the LHC experiments to agree the move of their capacity from SL3 to SL4
• Once the LHC migration is completed, the remaining capacity will follow within a few weeks
• Depends on the experiments, but expect the SL3 service to end in September


Reliability

• March: invested a lot of effort without much gain

• Continuing to prioritise reliability and making progress

• Recently exceeded the target; must now maintain it

• Start “Sysadmin On Duty” in September
• Start on-call later this year


RAL-LCG2 Availability/Reliability

[Chart: RAL-LCG2 availability, old reliability and new reliability against target, average and best 8; y-axis 0% to 120%]
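The availability and reliability curves are commonly defined as follows, in the usual WLCG style. Availability is uptime divided by the whole period. Reliability divides the same uptime by the period minus scheduled downtime, so planned interventions are not penalised. The slide does not say how the "old" and "new" reliability metrics differ, and this sketch does not model that. The `Period` fields and the sample month are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Hours in a reporting period, split by site state (hypothetical inputs)."""
    total_h: float
    up_h: float
    scheduled_down_h: float


def availability(p: Period) -> float:
    """Fraction of the whole period the site was up."""
    return p.up_h / p.total_h


def reliability(p: Period) -> float:
    """Fraction of the period the site was up, excluding scheduled downtime."""
    return p.up_h / (p.total_h - p.scheduled_down_h)


# Illustrative month: 720 h, 24 h scheduled downtime, 36 h unscheduled outage.
month = Period(total_h=720, up_h=660, scheduled_down_h=24)
print(f"availability {availability(month):.1%}")  # availability 91.7%
print(f"reliability  {reliability(month):.1%}")   # reliability  94.8%
```

Under these definitions, reliability is never lower than availability. The two are equal only when there is no scheduled downtime.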


CPU Efficiencies

• CPU efficiency much improved
• August fall still being investigated
• March minimum when CASTOR was broken
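CPU efficiency is conventionally the ratio of CPU time to wall-clock time, and the sketch below assumes that definition. A job stalled on storage accumulates wall time without CPU time, which fits the low point while CASTOR was broken. The job records are hypothetical (cpu_seconds, wall_seconds) pairs, not RAL accounting data.

```python
from typing import Iterable, Tuple


def cpu_efficiency(jobs: Iterable[Tuple[float, float]]) -> float:
    """Aggregate CPU efficiency: total CPU seconds / total wall-clock seconds.

    Each job is a (cpu_seconds, wall_seconds) pair. Summing before dividing
    weights long jobs more heavily than averaging per-job ratios would.
    """
    cpu_total = 0.0
    wall_total = 0.0
    for cpu_s, wall_s in jobs:
        cpu_total += cpu_s
        wall_total += wall_s
    if wall_total == 0:
        raise ValueError("no wall-clock time recorded")
    return cpu_total / wall_total


# Hypothetical jobs: two CPU-bound, one mostly waiting on storage I/O.
jobs = [(3500, 3600), (7000, 7200), (600, 3600)]
print(f"{cpu_efficiency(jobs):.1%}")  # 77.1%
```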


CPU Efficiencies


Termination of GridPP use of ADS Service

• GridPP funding and use of the legacy Atlas Datastore (ADS) service are scheduled to end at the end of March 2008.

• RAL will continue to operate the ADS service, and experiments are free to purchase capacity directly from the ADS team.


dCache Closure

• dCache still supported and working
• We will give 6 months’ notice before terminating the dCache service
• No notice of termination yet
• Aiming to end the service by the end of GridPP2 (March 2008); the ADS service cannot be terminated until dCache ceases
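The 6-month notice period and the March 2008 target fix a latest date for giving notice. This sketch assumes the service ends on 31 March 2008. It clamps to the last day of the month when the target month is shorter. The helper `months_before` is illustrative.

```python
import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before d, clamping the day
    to the last day of the target month (e.g. 31 Mar - 6 months -> 30 Sep)."""
    month_index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


# Assumes the target end date is the last day of GridPP2, 31 March 2008.
end_of_service = date(2008, 3, 31)
notice_deadline = months_before(end_of_service, 6)
print(notice_deadline)  # 2007-09-30
```

Under that assumption, and with no notice given by the time of the talk (30 August 2007), about a month remained to issue it.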


Grid Only

• Move to Grid-only access postponed until December 2007

• No new local accounts
• In January 2008:
  – Batch job submission through RB/CE only (no qsub; some exceptions)
  – No local login to UIs (some exceptions)
  – AFS service will end


Conclusions

• Positioning ourselves for LHC production
• A lot of good progress with CASTOR; we expect to meet the needs of the ATLAS M4 run and CMS’s CSA07

• Reliability has finally improved