GridPP4 Project Management
Pete Gronbech, 17 April 2012
GridPP28 Manchester
Since the last meeting
• The LHC is still building up to full running again after the Christmas technical stop.
• Tier-1 running well, and also busy with infrastructure upgrades.
• Tier-2s busy installing new hardware and new networking equipment.
• GridPP4 1st tranche hardware money spent.
• Digital Research Infrastructure (DRI) grant equipment money spent.
Accelerator Update
• This year the collision energy is 8 TeV (beam energy 4 TeV), slightly higher than last year's 3.5 TeV beams.
• First beams started four weeks ago, mostly for testing of safety systems.
• First physics at 2 x 4 TeV was one week ago, starting with 2 x 3 bunches. At this very moment collisions are taking place with 2 x 1092 bunches, already giving more collisions than last year's 2 x 1380 bunches did.
• In a few days the aim is to be at the nominal number of bunches for this year, 2 x 1380 (but with higher luminosities, i.e. more collisions than last year, because of the higher energy and smaller beta* at the interaction point).
• On Friday there will be 3 days of machine development, followed by the first technical stop of the year. Back to production again for data taking at the beginning of May, for an 8-week period.
Tier-1
• CPU hardware delivered and commissioned in time to meet the WLCG pledge.
• Both tranches of disk have been delivered and deployed.
• Upgrade to CASTOR 2.1.11-8 completed.
• Operations very stable following the many upgrades in February.
Tier-2s
• All grants for the 1st tranche of hardware issued and should have been spent:
– sites should have hardware to meet the 2012 pledge.
– all sites have been trying to spend the money this financial year.
• Most sites made significant upgrades and, coupled with the DRI grants, have been able to enhance the infrastructure and networking both within the clusters and across campus to the JANET connections.
• Future MoUs showed shortfalls in storage capacity more than in CPU, which meant an emphasis on disk purchases.
• Prices were inflated and deliveries extended because the flooding in Thailand caused a worldwide disk shortage.
• However, prices for networking equipment came down substantially in January, which partly compensated at some sites.
DRI and GridPP4 Grants
• GridPP4: instructions for JeS issued 9/11/11; grants issued very quickly, some in December 2011.
• DRI: bids solicited 8/11/11. The DRI project team reviewed the responses very quickly, during 18th November to 8th December, and revised them to meet the £3M target once this was known.
• DRI JeS instructions were sent out on 9th December; grants issued early January 2012.
• All equipment on sites by end of March 2012.
UKI CPU contribution (LHC)
[Charts: CPU, March 2012 (GStat2.0); CPU since April 2011; country stats]
UKI VOs
[Charts: UKI VO usage since March 2011, and the previous year]
Non-LHC VOs are getting squeezed.
VO support across sites
UKI Tier-1 & Tier-2 contributions
[Charts: contributions since March 2011, and the previous year]
Storage
[Charts: storage from GStat2.0 — August 2010, March 2011, April 2012]
Quarterly Reported Resources

              Q411 CPU (HS06)   Q411 Disk (TB)
  LondonGrid            69550             3955
  NorthGrid             64081             2520
  ScotGrid              43886             1563
  SouthGrid             37744             2251
  TOTAL                215261            10289
The truth is somewhere in between; the Q112 report will help clarify the situation.
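As a quick sanity check, the per-Tier-2 figures can be summed to reproduce the TOTAL row (an illustrative sketch; the numbers are copied from the table above):

```python
# Sanity-check of the Q411 quarterly reported resources:
# sum the per-Tier-2 figures and compare with the TOTAL row.
# (Illustrative sketch; figures copied from the table above.)
resources = {
    "LondonGrid": {"cpu_hs06": 69550, "disk_tb": 3955},
    "NorthGrid":  {"cpu_hs06": 64081, "disk_tb": 2520},
    "ScotGrid":   {"cpu_hs06": 43886, "disk_tb": 1563},
    "SouthGrid":  {"cpu_hs06": 37744, "disk_tb": 2251},
}

total_cpu = sum(r["cpu_hs06"] for r in resources.values())
total_disk = sum(r["disk_tb"] for r in resources.values())

print(total_cpu, total_disk)  # 215261 10289
```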
GridPP4 ProjectMap Q411
Q411
• Tier-1 staff; service availability for ATLAS, due to CASTOR and network issues.
• ATLAS data availability (92%).
• CMS red metrics are all due to Bristol.
• Data group: number of blog posts low, and the NFSv4 study late.
• Security: delay in running the SSC.
• Execution: number of vacant posts, and the review of service to experiments.
• Outreach: number of news items, press releases and KE meetings low.
• Q112 reports due in at the end of this month, or preferably earlier!
Non LHC Storage Stats so far
  Site               Total (TB)   % of T2 disk   % non-LHC at site
  EDFA-JET                    0          0.01%                 31%
  Birmingham                154         10.58%                  1%
  Bristol                    62          4.26%                  0%
  Cambridge                  49          3.37%                  1%
  Oxford                    421         28.97%                  3%
  RALPPD                    768         52.80%                  1%
  Total                    1454        100.00%                  2%

  UKI-LT2-Brunel            430            16%                  1%
  UKI-LT2-IC-HEP            744            28%                  4%
  UKI-LT2-QMUL              867            32%                  5%
  UKI-LT2-RHUL              498            19%                  1%
  UKI-LT2-UCL-HEP           129             5%                  1%
  Total                    2669           100%                  3%
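The "% of T2 disk" column is just each site's total divided by the group total. A minimal sketch for the London sites (TB figures copied from the table; note the rounded per-site figures sum to 2668 TB against the printed total of 2669 TB, presumably a rounding effect in the source data):

```python
# Recompute the per-site share of LondonGrid disk from the TB totals.
# (Illustrative sketch; TB figures copied from the table above. The
# rounded per-site figures sum to 2668 TB, one less than the printed
# 2669 TB total, presumably due to rounding upstream.)
sites_tb = {
    "UKI-LT2-Brunel": 430,
    "UKI-LT2-IC-HEP": 744,
    "UKI-LT2-QMUL": 867,
    "UKI-LT2-RHUL": 498,
    "UKI-LT2-UCL-HEP": 129,
}

total_tb = sum(sites_tb.values())
shares = {site: round(100 * tb / total_tb) for site, tb in sites_tb.items()}

print(shares)
# {'UKI-LT2-Brunel': 16, 'UKI-LT2-IC-HEP': 28, 'UKI-LT2-QMUL': 32,
#  'UKI-LT2-RHUL': 19, 'UKI-LT2-UCL-HEP': 5}
```

The recomputed shares match the percentage column on the slide.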
Project map - statistics
[Charts: metrics and milestones]
Manpower
• GridPP was running at reduced manpower for the latter part of 2011, ~2 FTE short at the Tier-2s and ~4 FTE at RAL.
• Both the Tier-1 and Tier-2s have now filled the posts, so there should be capacity to do the development work that has been on hold because of the shortages.
Risk register
• Highlighted risks:
– Recruitment and retention: still a concern, but currently more stable.
– Resilience of storage: problems with batches of storage hardware.
– CASTOR: critical and, although more stable now, with serious consequences when it fails.
– Insufficient funding for T2 hardware: increased equipment costs (especially disk) and increased experiment resource requests. Mitigated to a certain extent by the DRI investment.
– Contention for resources: anticipated to be more of an issue as LHC use increases and squeezes the minor VOs.
Timeline
• End of GridPP2: 31 August 2007, followed by GridPP2+.
• Start of GridPP3: 1 April 2008.
• Start of GridPP4: 1 April 2011, continuing through 2012-2014.
• GridPP celebrated its 10th birthday in December 2011.
From the start of GridPP3 to the present time
• At the start of GridPP4, ~27,000 CPUs and ~7 PB of disk were reported.
• Now ~31,000 CPUs and ~27 PB (if GStat is to be believed).
• The UK reported approx. 370 GSI2k hours last year, just ahead of Germany and France, and is still the largest contributor in the EGI grid.
Reporting
• The main LHC experiments will continue to report on Tier-1 and Tier-2 performance, as both analysis and production sites.
• Tier-2 site reporting continues as before, with reports going via the Production Manager.
• Slight modifications to enable better tracking of non-LHC VO storage use.
• Storage, Security, NGI and Dissemination have separate reports.
Summary
• The first accounting period has completed, and the 1st tranche of hardware funding was allocated.
• Last autumn and this spring were particularly busy with GridPP hardware and DRI grants: tendering, quotes, purchasing, and now installations and upgrades.
• Sites should plan to be stable in time for the next data taking in May, although in some cases the load seen on Tier-2s is more aligned with physics conferences than with data taking.
• A reminder that we are in a continuous accounting period, which started at the end of the last one, i.e. from 1st November through to a date to be determined, dependent on STFC capital spend profiling.