UKI-SouthGrid Overview and Oxford Status Report. Pete Gronbech, SouthGrid Technical Coordinator. HEPSYSMAN – RAL, 10th June 2010.


Page 1: UKI-SouthGrid Overview  and Oxford Status Report

UKI-SouthGrid Overview and Oxford Status Report

Pete Gronbech, SouthGrid Technical Coordinator

HEPSYSMAN – RAL, 10th June 2010

Page 2: UKI-SouthGrid Overview  and Oxford Status Report

SouthGrid June 2010

UK Tier 2 reported CPU – Historical View to present

[Chart: CPU reported per month, Jun-09 to Jun-10, in K SPEC int 2000 hours (axis 0 to 4,000,000), for UK-London-Tier2, UK-NorthGrid, UK-ScotGrid and UK-SouthGrid]

Page 3: UKI-SouthGrid Overview  and Oxford Status Report

SouthGrid Sites Accounting as reported by APEL

[Chart: CPU reported per month, May-09 to Jun-10, in K SPEC int 2000 hours (axis 0 to 1,600,000), for JET, BHAM, BRIS, CAM, OX and RALPPD]

Sites upgrading to SL5 and recalibrating their published SI2K values

RALPP seems low, even after my compensation for it publishing 1000 instead of 2500
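
One way to read the compensation mentioned above is as a simple rescaling: if a site published an SI2K value of 1000 where 2500 was correct, its accounted work is multiplied by 2500/1000. The sketch below assumes that interpretation. The function name and the 400000 input are hypothetical and not taken from the slides.

```python
def compensate(reported_hours, published_si2k, correct_si2k):
    """Rescale accounted work to what the correct benchmark value would have given.

    Assumes accounted work scales linearly with the published SI2K value.
    """
    return reported_hours * correct_si2k / published_si2k


# Scaling factor for the case quoted on the slide (1000 published, 2500 correct).
print(compensate(1.0, 1000, 2500))  # 2.5

# Hypothetical month of under-reported work, for illustration only.
print(compensate(400000, 1000, 2500))  # 1000000.0
```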

Page 4: UKI-SouthGrid Overview  and Oxford Status Report


Page 5: UKI-SouthGrid Overview  and Oxford Status Report

Site Resources at Q409

Site          HEPSPEC06    CPU (kSI2K)    Storage (TB)
EFDA-JET           1772            442             1.5
Birmingham         3344            836           114
Bristol            1836            459           110
Cambridge          1772            443           140
Oxford             3332            833           160
RALPPD            12928           3232           633
Totals            24984           6246          1158.5

CPU (kSI2K) is converted from HEPSPEC06 benchmarks.
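
The kSI2K figures in this table are consistent with dividing HEPSPEC06 by 4 (for example 3344 / 4 = 836). The slide does not state the factor, so the constant below is an assumption inferred from the rows, and the helper name is hypothetical. A minimal sketch:

```python
HS06_PER_KSI2K = 4  # assumed factor, inferred from the table rows

# HEPSPEC06 values copied from the Q409 table.
site_hs06 = {
    "Birmingham": 3344,
    "Bristol": 1836,
    "Oxford": 3332,
    "RALPPD": 12928,
}


def hs06_to_ksi2k(hs06):
    """Convert a HEPSPEC06 figure to kSI2K using the assumed factor."""
    return hs06 / HS06_PER_KSI2K


for site, hs06 in site_hs06.items():
    print(f"{site}: {hs06} HS06 = {hs06_to_ksi2k(hs06):.0f} kSI2K")
```

The same division applied to the HEPSPEC06 total, 24984, gives 6246, which matches the Totals row.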

Page 6: UKI-SouthGrid Overview  and Oxford Status Report


Resources at Q110

Page 7: UKI-SouthGrid Overview  and Oxford Status Report

GridPP3 h/w generated MoU for 2010, 2011 and 2012

Storage (TB)
Site       2010     2011     2012
bham        179       95      124
bris         22       27       35
cam         108      135      174
ox          203      255      328
RALPPD      364      440      583

CPU (HS06)
Site       2010     2011     2012
bham       1450     2119     2724
bris        661     1173     1429
cam        1148     1445     1738
ox         2034     2483     2974
RALPPD     6499    13109    16515

Page 8: UKI-SouthGrid Overview  and Oxford Status Report

Oxford

• Preparing a tender to purchase h/w with the 2nd tranche of GridPP3 money.

• Problem with ATLAS pool accounts on the DPM: it worked for some users but not for others. Increasing the number of pool accounts fixed it.

• Ewan is working on KVM and iSCSI for VMs. SL5 has virtio support as both host and guest. Shared storage will give us VMware-style live migration for free, with no VMware tools hassle.

Page 9: UKI-SouthGrid Overview  and Oxford Status Report

Oxford

Grid Cluster setup: SL5 Worker Nodes

[Diagram: T2ce04 (LCG-ce), T2ce05 (LCG-ce), t2torque02, worker nodes T2wn40, T2wn5x, T2wn6x, T2wn7x, T2wn8x, T2wn85; Glite 3.2 SL5]

Page 10: UKI-SouthGrid Overview  and Oxford Status Report

Oxford

Grid Cluster setup: CREAM ce & pilot setup

[Diagram: t2ce02 (CREAM, Glite 3.2 SL5), T2wn41 (glexec enabled), t2scas01, t2ce06 (CREAM, Glite 3.2 SL5), T2wn40-87]

Page 11: UKI-SouthGrid Overview  and Oxford Status Report

Oxford

Grid Cluster setup: NGS integration setup

[Diagram: ngsce-test.oerc.ox.ac.uk, ngs.oerc.ox.ac.uk, worker nodes wn40, wn5x, wn6x, wn7x, wn8x]

ngsce-test is a virtual machine which has the glite ce software installed.

The glite WN software is installed via a tarball in an NFS shared area visible to all the WNs.

PBSpro logs are rsync'ed to ngsce-test to allow the APEL accounting to match which PBS jobs were grid jobs.

Contributed 1.2% of Oxford's total work during Q1.
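
The matching step described above can be illustrated with a small sketch. This is not APEL's implementation. It assumes PBS accounting lines of the form `date time;record type;job id;key=value ...` and a set of local job IDs known to have come through the CE. The sample log lines, job IDs and function names are all made up for illustration.

```python
def parse_pbs_record(line):
    """Split one accounting line 'date time;type;jobid;key=value ...' into fields."""
    timestamp, rec_type, job_id, message = line.rstrip("\n").split(";", 3)
    # Simplified: assumes attribute values contain no spaces.
    attrs = dict(item.split("=", 1) for item in message.split() if "=" in item)
    return {"time": timestamp, "type": rec_type, "job_id": job_id, "attrs": attrs}


def grid_job_records(lines, grid_job_ids):
    """Return end-of-job ('E') records whose job ID is in the set of grid job IDs."""
    records = (parse_pbs_record(line) for line in lines if line.strip())
    return [r for r in records if r["type"] == "E" and r["job_id"] in grid_job_ids]


# Hypothetical, simplified log lines; real records carry many more attributes.
sample_log = [
    "06/10/2010 10:00:00;E;101.ngs;user=alice queue=grid resources_used.walltime=01:00:00",
    "06/10/2010 10:05:00;E;102.ngs;user=bob queue=local resources_used.walltime=00:30:00",
    "06/10/2010 10:06:00;S;103.ngs;user=alice queue=grid",
]
ce_job_ids = {"101.ngs", "103.ngs"}  # hypothetical IDs recorded on the CE side

matched = grid_job_records(sample_log, ce_job_ids)
print([r["job_id"] for r in matched])  # ['101.ngs']
```

Only job 101 is returned: job 102 was not submitted through the CE, and job 103 has only a start record so far.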

Page 12: UKI-SouthGrid Overview  and Oxford Status Report

Oxford Tier-2 Cluster – Jan 2009

Located at Begbroke. Tendering for upgrade.

Decommissioned January 2009

Saving approx 6.6 kW

Originally installed April 04

17th November 2008 Upgrade

26 servers = 208 Job Slots

60TB Disk

22 Servers = 176 Job Slots, 100TB Disk Storage

Page 13: UKI-SouthGrid Overview  and Oxford Status Report

Oxford

Grid Cluster Network setup

[Diagram: three 3com 5500 switches joined by backplane stacking cables (96Gbps full duplex), connecting worker nodes and T2se0n disk pool nodes (20TB each). Dual channel bonded 1 Gbps links to the storage nodes.]

10 gigabit is too expensive, so we will maintain a ratio of 1 gigabit per ~10TB with channel bonding in the new tender.
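
The 1 gigabit per ~10TB ratio can be turned into a quick sizing rule. The sketch below is illustrative only: the function name is hypothetical, rounding up to a whole link is an assumption, and the 72TB case is simply 36 bays of 2TB raw, ignoring any RAID overhead.

```python
import math

TB_PER_GBIT_LINK = 10  # ratio quoted on the slide: 1 gigabit per ~10TB


def bonded_links_needed(capacity_tb):
    """Number of bonded 1 Gbps links that keeps roughly 1 Gbps per 10TB."""
    return max(1, math.ceil(capacity_tb / TB_PER_GBIT_LINK))


print(bonded_links_needed(20))  # 2, matching the dual bonded links on the 20TB nodes
print(bonded_links_needed(72))  # 8, for a hypothetical 72TB raw server
```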

Page 14: UKI-SouthGrid Overview  and Oxford Status Report

Tender for new Kit

• Storage likely to be 36 bay * 2TB Supermicro servers
• Compute node options based on twin-squared Supermicro with
  – AMD 8 core: best value for money
  – AMD 12 core
  – Intel 4 core
  – Intel 6 core
  3GB RAM per core
  Dual SAS local disks
• APC racks, PDUs and UPSs
• 3COM 5500G network switches to extend our existing infrastructure

Page 15: UKI-SouthGrid Overview  and Oxford Status Report


Atlas Monitoring

Page 16: UKI-SouthGrid Overview  and Oxford Status Report


PBSWEBMON

Page 17: UKI-SouthGrid Overview  and Oxford Status Report


Ganglia

Page 18: UKI-SouthGrid Overview  and Oxford Status Report

Command Line

• showq | more
• pbsnodes -l
• qstat -an
• ont2wns df -hl

Page 19: UKI-SouthGrid Overview  and Oxford Status Report


Local Campus Network Monitoring

Page 20: UKI-SouthGrid Overview  and Oxford Status Report


Gridmon

Page 21: UKI-SouthGrid Overview  and Oxford Status Report

Patch levels – Pakiti v1 vs v2

Page 22: UKI-SouthGrid Overview  and Oxford Status Report

Monitoring tools etc

Site    Pakiti                    Ganglia   Pbswebmon   Scas, glexec, argus
JET     No                        Yes       No          No
Bham    Yes                       Yes       No          No, yes, yes
Brist   Yes, v1                   Yes       No          No
Cam     No                        Yes       No          No
Ox      V1 production, v2 test    Yes       Yes         Yes, yes, no
RALPP   V1                        Yes       No          No (but started on scas)

Page 23: UKI-SouthGrid Overview  and Oxford Status Report

Conclusions

• SouthGrid sites' utilisation is improving
• Many have had recent upgrades; others are putting out tenders
• Will be purchasing new hardware in the GridPP3 second tranche
• Monitoring for production running is improving