
UK Testbed Status and EDG Testbed Two

Steve Traylen, s.traylen@rl.ac.uk
GridPP 7, Oxford, 28th April 2003

Outline

• Status of the UK Sites.

• Release of EDG 2.

• UK Certificates.

• Grid monitoring in the UK.


Manchester

EDG Testbed, BaBar Farm and DZero Farm, all running EDG 1.4:
– CE, SE (1.5TB), 80xWN
– CE, SE (5TB), 60xWN
– CE, SE, 9xWN

• GridPP and BaBar VO servers.
• User Interface.
• The plan is for the DZero farm to join LCG.
• SRIF bid in place for significant HEP resources by the end of the year.


UCL

EDG Testbed: CE, SE, 1xWN (EDG 1.4)

• Network Monitors for WP7 development.
• SRIF bid in place for 200 CPUs by the end of the year, to join LCG1.


RAL PPD

EDG Testbed: CE, SE, 9xWN (EDG 1.4)
RGMA Testbed: CE, SE, MON, 1xWN (EDG 2.0)

• User Interface.
• Plans to form part of the Southern Tier-2 Centre within LCG1.
• 50 CPUs and 5TB of disk expected by the end of the year.


Birmingham

EDG Testbed: CE, SE, 1xWN (EDG 1.4)

• Expansion to 60 CPUs and 4TB.
• Expects to participate in LCG1/EDG2.

Liverpool

EDG Testbed: CE, SE, 1xWN (EDG 1.4)

• Currently unmaintained.
• Plans to follow EDG 2, possibly integrating the BaBar farm.


RAL

EDG Testbed: CE, SE, 5xWN (EDG 1.4)
Tier1/a: CE, 230xWN (EDG 1.4)
RGMA Testbed: CE, SE, MON (EDG 2.0)
EDG Dev Testbed: CE, SE, MON, 1xWN (EDG 2.0), plus an SE for the ADS
LCG0 Testbed: CE, SE, 1xWN

• UI within CSF.
• NM for EDG2.
• Top-level MDS for EDG.
• Various WP3 and WP5 dev nodes.
• VOMS for the Dev TB.
• http://ganglia.gridpp.rl.ac.uk/


Cambridge

EDG Testbed: CE, SE, 15xWN (EDG 1.4)

• Farm shared with local NA48 and GANGA users.
• Some RH7.3 WNs for the ongoing Atlas challenge.
• 3TB GridFTP SE.
• Plans to join LCG1/EDG2 with an extra 50 CPUs later this year.
• EDG jobs will soon be fed into the local e-Science farm.
• http://farm002.hep.phy.cam.ac.uk/cavendish/


Bristol

EDG Testbed: CE, SE, 1xWN (EDG 1.4)
RGMA Testbed: CE, SE, MON, 1xWN (EDG 2.0)
CMS/LHCb Farm: CE, SE, 24xWN (CMS-LCG0)
BaBar Farm: CE, SE, 78xWN (EDG 1.4)

• GridPP RC.
• Plans to join EDG2 and LCG1.


Imperial College

EDG Testbed: CE, SE, WNs (EDG 1.4)
BaBar Farm: CE, WNs (EDG 1.4)
CMS-LCG0: CE, SE, WN
RGMA Testbed: CE, SE, MON, 1xWN (EDG 2.0)

• RB and BD-II for EDG 1.4.
• RB and BD-II for EDG 2.0.
• Plans to be in LCG1 and other testbeds.


Queen Mary

EDG Testbed: CE, SE, 1xWN (EDG 1.4), plus a 32xWN e-Science farm

• The CE also feeds EDG jobs to the 32-node e-Science farm.
• Plans to have LCG1/EDG2 running by the end of the year.
• Expansion with SRIF grants (64 WN + 2TB in Jan 2004, 100 WN + 8TB in Dec 2004).
• http://194.36.10.1/ganglia-webfrontend


Oxford

EDG Testbed: CE, SE, 2xWN (EDG 1.4)

• Plans to join EDG2/LCG1.
• Nagios monitoring has been set up (RAL is also evaluating Nagios).
• Planning to send EDG jobs into the 10-WN CDF farm.
• A 128-node cluster is being ordered now.


Glasgow

ScotGRID: CE, SE, 59xWN (EDG 1.4)
RGMA Testbed: CE, SE, MON (EDG 2.0)

• New hardware expected soon.
• WNs are on a private network with outbound NAT in place.
• As ScotGRID grows, it plans to be part of LCG.
• Various WP2 development boxes.


UK Overview

• There are now significant resources within EDG.
• Integrating EDG with an existing farm has been done many times, but it remains difficult.
• Sites are keen to take part in LCG1 or EDG2.
• By the end of the year many HEP farms plan to be contributing resources to LCG1.


EDG 2.0

• Now in a permanent state of imminent release.

• Since 27th May:

– 25 pre releases.

– 295 configuration changes.

– Changes range from a typo fix to a new resource broker.


Criteria for cutting EDG 2.0

• For EDG 2.0 to be cut the following must be satisfied:
– 50 sequential jobs: 98% success (sketched below).
– 250 jobs run through 1 RB: 80% success.
– 5 jobs with a 2GB i/o sandbox: 80% success.
– 25 jobs requiring two proxy renewals: 80% success.
– Upload and register a 1GB file to an SE, then replicate it to a mass storage device.
– Register 1000 files in less than 1000s.
– Match a job against three files on an SE.
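As an illustration of how the first criterion might be checked, here is a minimal sketch (an assumption, not the actual EDG validation scripts) that submits 50 jobs one after another with the standard edg-job-submit and edg-job-status user commands and tallies the success rate; the JDL file name and the exact status strings are illustrative.

    # Minimal sketch (not the real EDG test harness): submit 50 jobs
    # sequentially and compute the success fraction required by the first
    # criterion. "hello.jdl" and the status strings are assumptions.
    import re
    import subprocess
    import time

    def run_one(jdl="hello.jdl"):
        # edg-job-submit prints the job identifier (an https:// URL) on success.
        sub = subprocess.run(["edg-job-submit", jdl], capture_output=True, text=True)
        match = re.search(r"https://\S+", sub.stdout)
        if not match:
            return False
        job_id = match.group(0)
        # Poll edg-job-status until the job reaches a terminal state.
        while True:
            status = subprocess.run(["edg-job-status", job_id],
                                    capture_output=True, text=True).stdout
            if any(s in status for s in ("Done", "Aborted", "Cancelled")):
                return "Done (Success)" in status
            time.sleep(30)

    successes = sum(run_one() for _ in range(50))
    print("%d/50 jobs succeeded (98%% means 49 or 50)" % successes)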


Installation of an EDG2 Testbed

• LCFGng is the recommended installation method; there are no manual install instructions yet.
– Significantly better than LCFG.
• Configuration (site-cfg.h) is less cryptic.
• Less hand installation is required; what remains includes:
– Installing host certificates.
– The PBS server.
– MySQL tables.
– mkgridmap.conf (a hypothetical fragment is sketched below).
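For the mkgridmap.conf step, edg-mkgridmap reads "group" lines mapping a VO's LDAP membership list to a local account (or pool-account prefix) and "auth" lines restricting the map to users known to an authorisation server. The fragment below is a sketch from memory of that format; all hostnames, VO names and account prefixes are placeholders, not the real EDG VO servers.

    # Hypothetical edg-mkgridmap configuration fragment -- hostnames, VO URLs
    # and account prefixes are placeholders for illustration only.
    # Map members of two VOs to pool accounts with the prefixes .atlas and .lhcb.
    group ldap://vo-server.example.org/ou=tbusers,o=atlas,dc=example,dc=org .atlas
    group ldap://vo-server.example.org/ou=tbusers,o=lhcb,dc=example,dc=org  .lhcb
    # Only users registered with the (placeholder) authorisation server are mapped.
    auth  ldap://auth-server.example.org/ou=People,o=testbed,dc=example,dc=org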


Integration after 2.0.

• Use gcc 3.2.2 throughout.
– Currently used by the RB and the APIs the RB uses.
• GridFTP access to Castor.
• Integration of VOMS.
– Currently ongoing in parallel.
– Has no impact on existing software.
• This will be EDG 2.1.


Required Nodes

• CE: gatekeeper, MDS, GIN, …
• SE: GridFTP, WP5 SE, GIN, …
• WN: PBS batch worker plus client tools.
• MON: servlets for a site, GOut for the RB; also collects fabric monitoring information.
– On small sites this can be moved to the CE.
• Generally the configuration is more modular.


LCG1 or EDG2

• Which testbed should I join?

– Significant resources are best suited to LCG1.
– Small, dynamic testbeds can contribute to the continued development of Testbed Two.


UK Certificates

• The UK e-Science CA was added to the production EDG testbed 3 weeks ago.
• The UK HEP CA will stop issuing certificates.

– Existing certificates will still be valid for the remainder of their lifetime.


Ratio of UK HEP to e-Science Certs


e-Science Certs by OU


VO Membership + EDG Guidelines

[Bar chart: total members vs UK members per VO (BaBar, Eobs, ITeam, LHCb, Alice, BioMed, CMS, Atlas, WP6); scale 0 to 120.]


Ganglia

• Ganglia provides time plots of system metrics.
• In use at RAL, Cambridge and QMUL.
• By default: load, network i/o and memory.
• Trivial to add new metrics, e.g. active MySQL connections for CMS (see the sketch below).
• Expansion to the rest of the UK is possible via LCFG objects and instructions, though the WP4 tools might be used instead.
• Data could be collected centrally for a UK-wide view.
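A minimal sketch of adding such a metric is below; it assumes the standard Ganglia gmetric command and a local mysql client are available, and the metric name is made up for illustration.

    # Minimal sketch (assumed setup, not RAL's actual configuration): publish
    # the number of active MySQL connections as a custom Ganglia metric.
    import subprocess

    def count_mysql_connections():
        # SHOW PROCESSLIST returns one row per open MySQL connection.
        out = subprocess.run(
            ["mysql", "--batch", "--skip-column-names", "-e", "SHOW PROCESSLIST"],
            capture_output=True, text=True, check=True).stdout
        return sum(1 for line in out.splitlines() if line.strip())

    def publish(value):
        # gmetric injects a one-off metric value into the local Ganglia daemon,
        # after which it appears on the time plots like any built-in metric.
        subprocess.run(
            ["gmetric", "--name", "mysql_active_connections", "--value", str(value),
             "--type", "uint32", "--units", "connections"], check=True)

    if __name__ == "__main__":
        publish(count_mysql_connections())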


GridPP Map

• Checks HEP sites every 6(?) hours for the following (a sketch of this style of check follows below):

– Ping.

– Globus submission.

– EDG job submission via the Imperial RB.

– EDG job submission via the Lyon RB.

• http://www.gridpp.ac.uk/map/
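The style of check is roughly as sketched here (an assumption, not the map's actual code); the CE hostnames, the queue name, the test.jdl file and the use of edg-job-submit's --resource option to target a specific site are all illustrative.

    # Minimal sketch of per-site probes in the spirit of the GridPP map:
    # ping, a trivial Globus job, and an EDG job submission forced to the site.
    # Hostnames, the queue name and test.jdl are placeholders.
    import subprocess

    SITES = ["ce.site-a.example.ac.uk", "ce.site-b.example.ac.uk"]

    def ok(cmd, timeout=300):
        try:
            return subprocess.run(cmd, capture_output=True, timeout=timeout).returncode == 0
        except (subprocess.TimeoutExpired, FileNotFoundError):
            return False

    for site in SITES:
        ping_ok = ok(["ping", "-c", "3", site])
        # Run a trivial command through the site's Globus gatekeeper.
        globus_ok = ok(["globus-job-run", site, "/bin/hostname"])
        # Force an EDG test job onto this site's CE via a resource broker.
        edg_ok = ok(["edg-job-submit", "--resource",
                     site + ":2119/jobmanager-pbs-workq", "test.jdl"])
        print(site, "ping:", ping_ok, "globus:", globus_ok, "edg-submit:", edg_ok)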


GridPP RB Monitoring @ Imperial

• Publishes service status.
• Publishes times for LDAP queries of resources (a sketch of such a timed query is below).
• http://www.hep.ph.ic.ac.uk/~dguser/diagnostics.html
• Imperial also submits test jobs that are more sophisticated than the map's, e.g. checking for the existence of a CloseSE.
• http://www.hep.ph.ic.ac.uk/~dguser/Qstatus.html
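A sketch of timing one such LDAP query is below (an assumption about the mechanism, not Imperial's script); the hostname is a placeholder, while port 2135 and the mds-vo-name=local,o=grid base DN follow the usual Globus MDS conventions of the time.

    # Minimal sketch: time an ldapsearch against an information index and
    # report how long it took and how many entries came back.
    import subprocess
    import time

    HOST = "mds.example.ac.uk"   # placeholder information-index host
    BASE = "mds-vo-name=local,o=grid"

    start = time.time()
    result = subprocess.run(
        ["ldapsearch", "-x", "-LLL", "-h", HOST, "-p", "2135",
         "-b", BASE, "(objectClass=*)"],
        capture_output=True, text=True, timeout=60)
    elapsed = time.time() - start
    entries = sum(1 for line in result.stdout.splitlines() if line.startswith("dn:"))
    print("%s: %d entries in %.1f s" % (HOST, entries, elapsed))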


Monitoring

• There is currently lots of monitoring, but no central location for it.
• Most monitoring currently shows only the current state.
• The Grid Operations Centre can coordinate much of this.
