Markus Schulz, LCG Deployment
WLCG Middleware Status Report
16th February, 2009
Overview
The three WLCG middleware stacks
• ARC (NDGF)
  - Most sites in northern Europe
  - ~10% of WLCG CPUs
• OSG
  - Most North American sites
  - >25% of WLCG CPUs
• gLite
  - Used by the EGEE infrastructure
Summary and Issues
s have been added by me
ARC middleware status
Michael Grønager, Project Director, NDGF
LCG-LHCC Mini Review
CERN, Geneva, February 16th 2009
LCG-LHCC Mini Review, CERN, February 2009
WLCG sites with ARC
• Tier-1: NDGF
• Tier-2s: Finnish, Norwegian, Slovenian, Swedish
• Tier-3s: Danish, Norwegian, Swedish, Swiss
ARC Status – Current Version
• Current stable release: 0.6.5, “Earth Quake” (December)
• Improved cache scalability
  - ARC supports caching of files used by several jobs. This boosts performance for e.g. analysis, but scalability issues were detected on large clusters.
  - ARC 0.6.5 allows this load to be split across several file servers.
• Optional patch replacing Globus MDS with a new solution, EGIIS, which includes BDII
  - This is deployed at most NDGF-related sites
• Minor issue with LFC fixed
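The idea of splitting cache load across several file servers can be illustrated with a minimal sketch. This is not ARC's actual mechanism, which the slides do not describe; it only assumes that each cached file is identified by a URL and is assigned to one server by hashing that URL. The function name, server names and URLs are hypothetical.

```python
import hashlib


def pick_cache_server(file_url: str, servers: list[str]) -> str:
    """Map a cached file to one of several cache servers by hashing its URL.

    Hypothetical helper for illustration only; not part of ARC.
    """
    digest = hashlib.sha1(file_url.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it
    # modulo the number of servers to get a stable index.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Hypothetical cache servers and input files.
servers = ["cache1.example.org", "cache2.example.org", "cache3.example.org"]
urls = [f"srm://se.example.org/data/file{i}.root" for i in range(9)]

# Every job asking for the same URL is directed to the same server,
# so a file is cached once while the total load is spread out.
placement = {url: pick_cache_server(url, servers) for url in urls}
for url, server in placement.items():
    print(f"{url} -> {server}")
```

Because the mapping depends only on the URL, no shared state is needed between jobs. A plain modulo mapping reshuffles most files when a server is added, so a real deployment would probably want something more stable.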
ARC Status – Next Version
• Next stable release: “Fastelavn”* (February)
• Further scalability improvements included:
  - Support for sharing the load across multiple file system servers
  - Support for distributing multiple uploaders and downloaders across multiple machines
  - These new features make ARC ready for running production on large (5000+ core) machines
• MDS fully replaced by EGIIS and BDII
• Optional publishing of GLUE 1.3 along with the ARC schema (currently in testing, e.g. at NDGF-BENEDICT-T3)
• KnowARC features starting to appear:
  - Optional module for OGF BES submission, based on a new and more modular code base
* aka Mardi Gras
ARC Future
• The production release of ARC (sometimes called ARC classic) will continue to evolve
  - More and more components will be integrated from e.g. the KnowARC project.
• The KnowARC development adds new service interfaces that adhere to standards like GLUE2, BES and JSDL
  - These will be incorporated into the production release of ARC.
• There will be no “migrations” but a gradual incorporation of the novel components into the stable branch, like OGF BES in “Fastelavn”
• ARC components will be included in UMD, and ARC now supports building on ETICS.
Status of OSG Middleware for WLCG
Ruth Pordes, OSG Executive Director
Alain Roy, OSG Software Coordinator
LHCC MiniReview Feb 16th 2009
OSG Middleware Scope & Status
• OSG provides packages for Compute Elements, Storage Elements, VO managers, Worker-Node Client and User Client.
• OSG middleware is tested to allow Applications to interoperate across OSG and EGEE (and NDGF).
• Thus WLCG users are able to transparently use the multiple grids.
• OSG V1.0 was stable during data taking, cosmic runs, ramped-up simulation production and analysis in the second half of 2008.
LHC Mini-Review, Feb 2009
Progress over last 6 months
• Bestman/xrootd Storage Elements now installed at several Tier-2s. Bestman + (NFS/Lustre/Hadoop) installed on Tier-3s and a couple of Tier-2s.
• Addition of WLCG Client utilities (LFC, lcg_utils) enables use of the OSG Client with no need to install both the OSG and EGEE client packages.
• Roll-out of joint gLite/VO services/Globus common interfaces and protocols in security components. Significant testing effort across the projects including SCAS/LCAS, glexec, GUMS.
• EGEE packages continue to be included in OSG s/w stack:
  - VOMS/VOMS-Admin
  - glexec
  - edg-mkgridmap
Software Tools Group
• Part of new OSG project structure in FY09. Led by Alain Roy and Mine Altunay.
• Central hub for all software projects/plans.
• Aims to ensure stakeholders’ needs are met from planning to deployment.
• Single point of contact for software providers.
• Inputs:
  - User/VO/Site requirements
  - Software providers’ timelines/plans
• Outputs:
  - Plans for software stack evolution
• Point of contact with the EGEE EMT and gLite.
External Software Provision
OSG, US ATLAS and US CMS are working closely with software development groups on:
• Timely deployment of new versions of dCache and Bestman for WLCG needs.
• Evolution of the identity systems (looking at backends to Shib, Kerberos) and compatibility.
• Condor changes to support scalability in number of jobs.
• Deployment of perfSONAR network monitoring tools, with Internet2/ESnet.
• Gratia accounting, OIM operations database & tools.
• Use of xrootd.
OSG & US ATLAS are working on generalization of PANDA for other users.
OSG and US CMS are working on generalization of Glide-in WMS for other users.
OSG support for gLite underpinnings
• We continue to supply a subset of the VDT as RPMs:
  - Condor
  - Globus
  - MyProxy
  - GSI OpenSSH
  - GPT
Current Work
• Major focus is on better support for incremental upgrades, roll-back and forward compatibility.
  - Includes a redesign of the packaging to improve native packaging
• Debian 5 support for LIGO
• Software upgrades only if really needed.
  - Not looking yet at Globus 4.2
• Interoperability:
  - Testing of compatibility of CREAM with the OSG Client stack
  - Ensure availability, reliability, installed capacity, accounting software and services all report correctly from OSG to EGEE and to WLCG.
Currently Supported Platforms
• Linux (32 & 64 bit)
  - RHEL 3
  - RHEL 4
  - RHEL 5
  - Debian 4
  - ROCKS 3
  - SuSE Linux 9 (just 64-bit)
  - Scientific Linux 3
  - Scientific Linux 4
• Mac OS 10.4 (client only)
• AIX 5.3 (limited support)
Concerns (nothing new)
Need to continue to ensure modularity/separation of EGEE services and WLCG, to enable OSG to effectively contribute and peer. Need WLCG to work with OSG middleware activities as closely as with the EGEE middleware activities. We are all trying hard here!
Interoperability activities will become more challenging in an EGI era where the number of independent s/w stacks may grow or diverge. OSG committed to work with EGI partners in these areas.
OSG pleased to contribute to the Infrastructure Policy Group. These are pragmatic activities for understanding commonalities and differences. OSG remains nervous at the potential of OGF standards being really successful.
gLite
• The current release is gLite 3.1
  - It is updated almost every week (30+ updates/year)
• Its purpose is to provide a stable platform for production grid usage
• It covers:
  - Data Management
  - Workload Management
  - Information System
  - AAA
• Distributed lifecycle
  - Tools and formal processes
  - Links teams and tasks
  - Monitor progress
• Large code base (~1.6 million lines of code)
[Figure: illustration of a component-based release process. Labels: Component A, Component B, Component C; Build, Integration, Certification; Update 1 (C), Update 2 (B), Update 3 (A, C), Update 4 (B) along a time axis; Release Day; regular release interval.]
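One way to read a component-based release process with a regular release interval is sketched below. It assumes that each component version ships in the first regular update on or after the day its certification finishes. That rule, the day numbers and all names are hypothetical and are not taken from the gLite release tooling. The example data is chosen to reproduce the figure's apparent grouping (Update 1: C, Update 2: B, Update 3: A and C, Update 4: B).

```python
from dataclasses import dataclass


@dataclass
class ComponentVersion:
    name: str
    certified_on_day: int  # hypothetical day on which certification finishes


def plan_updates(versions, release_interval, n_updates):
    """Assign each certified component version to the first regular
    release day that falls on or after its certification day."""
    updates = {k: [] for k in range(1, n_updates + 1)}
    for v in versions:
        # Ceiling division gives the first release slot whose day
        # (slot * release_interval) is >= certified_on_day.
        slot = max(1, -(-v.certified_on_day // release_interval))
        if slot <= n_updates:
            updates[slot].append(v.name)
    return updates


# Hypothetical certification days, with one update every 7 days.
versions = [
    ComponentVersion("C", 5),
    ComponentVersion("B", 12),
    ComponentVersion("A", 18),
    ComponentVersion("C", 20),
    ComponentVersion("B", 25),
]
schedule = plan_updates(versions, release_interval=7, n_updates=4)
print(schedule)  # {1: ['C'], 2: ['B'], 3: ['A', 'C'], 4: ['B']}
```

The point of the sketch is that components move independently: an update contains only what is certified by release day, and nothing waits for a slower component.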
Most Active Areas
• Workload management (access to computing resources)
• Support for multiuser pilot jobs
  - Used by experiments’ frameworks: Dirac, Panda, ALIEN
• Move to next OS platform: SL5
• Continuous evolution of other components
  - FTS, DPM, LFC, …
Workload management
• LCG-RB has been phased out
• WMS-3.1 SL4 major update (accumulates patches from > 8 months)
  - Certified
  - Will be released to production in the next weeks
  - Can handle >30K jobs/day
  - Better support for bulk submission
  - Almost ready to support CREAM-CE (ICE integrated, but needs more testing)
• Support for multiuser pilot jobs
  - SCAS and glexec on WNs are late
  - Now under stress testing
  - Still issues with memory management
  - Fails at 0.03% rate: not good enough for an authorization system
  - Scales to > 10 Hz (ok for most sites)
  - Will start a pilot service during the next week
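To see why a 0.03% failure rate is judged not good enough for an authorization system, it helps to turn the two quoted figures into an absolute number. The sketch below assumes, purely for illustration, that a site sustains the full 10 Hz request rate for a whole day; real sites would see fewer requests.

```python
# Figures quoted on the slide.
FAILURE_RATE = 0.0003   # 0.03% of authorization requests fail
REQUEST_RATE_HZ = 10    # requests per second

# Assumption for illustration: the request rate is sustained all day.
SECONDS_PER_DAY = 24 * 60 * 60


def expected_failures_per_day(rate_hz: float, failure_rate: float) -> float:
    """Expected number of failed requests in one day at a constant rate."""
    return rate_hz * SECONDS_PER_DAY * failure_rate


failures = expected_failures_per_day(REQUEST_RATE_HZ, FAILURE_RATE)
print(f"Requests per day: {REQUEST_RATE_HZ * SECONDS_PER_DAY}")
print(f"Expected failures per day: {failures:.1f}")
```

Under that assumption the service handles 864,000 requests a day and about 259 of them fail, each one potentially a job that cannot be authorized.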
Computing Resource Access (CE)
• In production at all EGEE sites: LCG-CE
  - Legacy service, introduced end 2002
  - Has been improved over the years to handle 50 users and 4K jobs
  - This is good enough for production use
  - Might be problematic for analysis tasks
• CREAM-CE
  - New architecture: Web Service interface, supports BES standard
  - Parameter passing to batch systems
  - Scalability!!!
  - First version was released to production 8 months ago
  - 13 instances in production + 13 in PPS
  - Used by ALICE
  - New version with many bug fixes in final certification state
Scientific Linux 5
• SL5 Worker Node pilot phase has come to an end
  - Experiments encountered no major problems
  - New formal release is being prepared
  - Will arrive in production soon
• Other activities:
  - Multi-compiler support
  - Support for multiple versions
  - Improved rollback support
• Long term:
  - Support of new information system schema (GLUE-2)
  - Introduction of first components of new EGEE Authorization Framework
  - Policy management system
Issues and Outlook
• EGEE-III ends early 2010
  - The new environment for middleware support is under discussion
  - Less CERN involvement in integration and release management
  - Will the new entities be up and running in time?
• gLite Consortium
  - Discussions on formal agreement are taking place
  - Required to organize support for gLite middleware
• Unified Middleware Distribution is forming
  - ARC + gLite + UNICORE
  - Move towards standards-based middleware
  - WLCG has a wider scope
  - Maintaining interoperability might become more difficult
Summary
• All 3 middleware stacks provide stable production environments
  - And are aware of scalability issues and have addressed most of them
• All 3 stacks interoperate with each other
  - And work on improving interoperability and interoperation
• OSG actively supports pilot jobs (glexec/GUMS)
  - gLite will soon (glexec/SCAS)
• Middleware stacks still evolve
  - Major changes were successfully introduced to the production system without interrupting the service
• The transition from EGEE-III to EGI, UMD and the gLite consortium will be challenging