INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

Integration and Testing, SA3

Markus Schulz

CERN IT

JRA1 All-Hands Meeting 22nd - 24th March 2006

Building a Release

The Plan

1. TCG provides prioritized list of functionality for the next release
2. SA3 “shops” for components
3. SA3 builds list of component candidates
4. TCG blesses the list
5. SA3 Integration (functional freeze)
6. SA3 Certification
7. SA1 Preproduction
   1. with users, integrates new components into operations
8. Loop quickly through 5-7 to add patches, remove broken parts
9. SA3 packages release (with contributions from developers & friends)
   1. Documentation (user, config, and admin)
   2. RPM repositories
   3. Configuration tools
10. SA1 & SA3 push out new release
11. SA1, SA3, and software developers support the release
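
The loop through steps 5-7 can be sketched in a few lines of Python. This is only an illustration of the idea of cycling through integration, certification and pre-production while dropping broken parts; the function `run_release_loop`, the `passes` callback and the component names are hypothetical and not part of any SA3 tooling.

```python
from enum import Enum


class Stage(Enum):
    INTEGRATION = 5     # step 5: SA3 integration (functional freeze)
    CERTIFICATION = 6   # step 6: SA3 certification
    PREPRODUCTION = 7   # step 7: SA1 pre-production


def run_release_loop(candidates, passes, max_rounds=3):
    """Cycle through steps 5-7, removing components that fail any stage.

    `passes(component, stage)` is a hypothetical callback that returns True
    when the component gets through the given stage.
    Returns the surviving components and the number of rounds used.
    """
    accepted = list(candidates)
    for round_no in range(1, max_rounds + 1):
        broken = set()
        for stage in Stage:
            for component in accepted:
                if component not in broken and not passes(component, stage):
                    broken.add(component)
        if not broken:
            # A clean pass through all three stages: ready for packaging (step 9).
            return accepted, round_no
        accepted = [c for c in accepted if c not in broken]
    return accepted, max_rounds


# Hypothetical example: one component fails certification and is dropped.
def passes(component, stage):
    return not (component == "broken-ce" and stage is Stage.CERTIFICATION)


accepted, rounds = run_release_loop(["wms", "broken-ce", "fts"], passes)
```

A real loop would also take in patches between rounds; the sketch only covers the "remove broken parts" half of step 8.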

Process to deployment

[Figure: process-to-deployment diagram. Labels: Middleware providers (VDT/OSG, OMII-Europe, JRA1); Integration; SA3; Testing & Certification; SA3; Pre-production service; SA1; Production service; Support, analysis, debugging; Certification activities SA3+SA1]

Release Sequence

The Plan Part II

1. Every second release ends up in production
2. X.1 releases contain all potential packages
3. The process (integrate, certify) weeds out those that are not ready yet
4. The X.2 release candidate is moved to the pre-production service
5. The X.0 and X.1 releases have to be overlapping activities
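
One way to read "every second release ends up in production" is as a convention on the minor version number. The sketch below assumes that odd minor numbers (X.1) are development releases and even ones (X.0, X.2) are headed for production; this reading and the function name `release_role` are assumptions made for illustration, not a stated rule.

```python
def release_role(version: str) -> str:
    """Classify a release by its minor number (assumed odd/even convention)."""
    major, minor, *_ = (int(part) for part in version.split("."))
    if minor % 2 == 0:
        return "production candidate"
    return "development"


roles = {v: release_role(v) for v in ("3.0.0", "3.1.0", "3.2.0")}
```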

Planned Schedule for gLite 3.0.0

Plan for gLite 3.1.0:
– March 31st: code freeze for development release gLite 3.1.0
– April 30th: end of integration
– May 31st: end of certification. Deployment on PPS
– July 31st: release of production version gLite 3.2.0. Start deployment at sites
– September: gLite 3.2.0 installed at sites and usable.

[Timeline, February to June 2006. Phases: Certification, PPS, Deploy in prodn, PRODUCTION!!]
– Tuesday 28/2/06: gLite 3.0.0β exits certification and enters pre-production
– Wednesday 15/3/06: gLite 3.0.0β available to users in the PPS
– Deployment of gLite 3.0.0β in PPS. Continual bug fixing and patches passed to PPS
– Friday 28/4/06: PPS phase ends. gLite 3.0.0 passes from PPS to Production.
– Thursday 1/6/06: LCG Service Challenge 4 (SC4) starts!!

Time to rollout

• Time to upgrade ~constant (~2.5 sites/day)

• Takes a long time to upgrade entire infrastructure

[Chart: LCG-2.6.0 rollout]
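
To make the rollout arithmetic concrete: at a roughly constant rate, the time to upgrade the whole infrastructure grows linearly with the number of sites. The sketch below uses the ~2.5 sites/day figure from the slide; the site count of 200 is a made-up number for illustration only.

```python
import math


def days_to_upgrade(total_sites: int, sites_per_day: float = 2.5) -> int:
    """Days needed to upgrade all sites at a constant rate, rounded up."""
    if sites_per_day <= 0:
        raise ValueError("sites_per_day must be positive")
    return math.ceil(total_sites / sites_per_day)


# Hypothetical infrastructure size; the slide does not give a site count.
days = days_to_upgrade(200)
```

With these assumed numbers the upgrade takes 80 days, which illustrates why a full rollout is slow even when individual sites upgrade steadily.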

Problems

• To get into a steady state
  – No extra time for the merging of the two release preparation systems
  – No time for establishing a new process
• Integration and testing of gLite-3.0 is special
  – 2 stacks (build systems)
  – Multiple test components
  – 2 sets of installation and configuration tools
  – Many changes to the way integration is done
  – Merging teams and procedures
• SC4 requirements
  – Core components still need to integrate core functionality
  – Non-negotiable release date + non-negotiable functionality
• Requirements and prioritization for next releases
  – Become clear only during (pre-)production usage
  – BUT: freeze a few days after startup of PPS...

Problems

• gLite-3.1 currently not well defined
  – Partially due to lack of time
• Time for getting new services production ready is hard to predict
  – This makes pushing 3.1 through the process a frightening task
  – A tick list is needed to check components at certain process state transitions
    Something like the next slide, but tailored for
    • Enter Integration
    • Integration -> Certification
    • Certification -> PPS
    • PPS -> Production
• Still need to move a lot of code into the ETICS build system
• Still need to define the process from code to release
  – Working on gLite-3.0 is a good way to understand what is needed
• Maybe the current concept of releases is not adequate?
  – Component based with infrequent checkpoint releases + upgrades?

Checklist for a new service

• User support procedures (GGUS)
  – Troubleshooting guides + FAQs
  – User guides
• Operations Team Training
  – Site admins
  – CIC personnel
  – GGUS personnel
• Monitoring
  – Service status reporting
  – Performance data
• Accounting
  – Usage data
• Service Parameters
  – Scope: Global/Local/Regional
  – SLAs
  – Impact of service outage
  – Security implications
• Contact Info
  – Developers
  – Support Contact
  – Escalation procedure to developers
• Interoperation
  – Documented issues
• First level support procedures
  – How to start/stop/restart service
  – How to check it’s up
  – Which logs are useful to send to CIC/Developers and where they are
• SFT Tests
  – Client validation
  – Server validation
  – Procedure to analyse these error messages and likely causes
• Tools for CIC to spot problems
  – GIIS monitor validation rules (e.g. only one “global” component)
  – Definition of normal behaviour
    • Metrics
• CIC Dashboard
  – Alarms
• Deployment Info
  – RPM list
  – Configuration details
  – Security audit
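
A tick list tailored to the process state transitions could be represented as a simple mapping from each transition to the checklist items that must be complete before it. The sketch below is only one possible shape: the item names come from the checklist above, but which item is required at which transition is an assumption made for illustration, as are the names `REQUIRED` and `missing_items`.

```python
# Transitions in process order. The assignment of items to transitions is
# hypothetical; the slides do not define it.
REQUIRED = {
    "Enter Integration": {"RPM list", "Developer contact"},
    "Integration -> Certification": {"Configuration details", "Start/stop procedure"},
    "Certification -> PPS": {"SFT tests", "User guides"},
    "PPS -> Production": {"Monitoring", "Accounting", "Security audit"},
}


def missing_items(transition, ticked):
    """Return the items still open before `transition`.

    Items required by earlier transitions are included, so a component
    cannot skip a gate.
    """
    needed = set()
    for name, items in REQUIRED.items():  # dicts keep insertion order
        needed |= items
        if name == transition:
            break
    else:
        raise KeyError(f"unknown transition: {transition}")
    return sorted(needed - set(ticked))


open_items = missing_items(
    "Integration -> Certification",
    {"RPM list", "Configuration details"},
)
```

A component would be allowed through a transition only when `missing_items` returns an empty list for it.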

Current State of Integration

• ETICS and LCG build systems
  – Move to ETICS started
  – Progress is slow because priority is SC4 functionality
  – ETICS team handles building of meta RPMs
    • Define release candidates
  – Repositories for certification, preproduction, and soon production
  – Build for SC4: 32-bit RPMs only
  – Lots of informal communication between:
    • Integrators, Certifiers, and Software Providers
• Next steps (after we are ready to roll for SC4)
  – One build system
  – Define and describe integration process
    • Including synchronization with external dependencies
  – Prepare, with SA3 partners, for building releases for:
    • Multiple Linux distributions and package formats
    • 32/64-bit Intel & AMD
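
Building for several distributions, package formats and architectures multiplies the number of build targets quickly. The sketch below enumerates such a matrix; the distribution names and the `deb` format are placeholders chosen for illustration, not a list of platforms planned by SA3.

```python
from itertools import product

# Hypothetical distributions, each tied to one package format.
DISTRO_FORMAT = {
    "distro-a": "rpm",
    "distro-b": "deb",
}
ARCHITECTURES = ["32-bit", "64-bit"]


def build_targets(distro_format=DISTRO_FORMAT, architectures=ARCHITECTURES):
    """Enumerate every (distribution, package format, architecture) target."""
    return [
        {"distro": distro, "format": distro_format[distro], "arch": arch}
        for distro, arch in product(distro_format, architectures)
    ]


targets = build_targets()
```

Two distributions and two architectures already give four targets, each of which needs its own build and certification run.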

Testing

• We now have an inventory of existing tests (Zdenek and friends)
  – Not 100% complete, but an excellent start
• We have started to identify gaps
  – And have sign-ups for some of them
  – Here the external partners in SA3 will contribute
• On the certification testbed we use
  – Gilbert’s test suite for LCG components
  – Gilbert’s test suite for gLite components
  – Some manually run tests
  – SFT
• “External” tests for SRM interoperation, FTS, ...
• An integration of tests is urgently needed
  – Common reporting and archiving
  – Should be linked to ETICS activities
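
Common reporting and archiving could start from a single record format that every test suite writes, whatever framework produced the result. The sketch below is a minimal illustration of that idea; the record fields, the function names and the sample test names are assumptions, not an existing SA3 or ETICS format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ResultRecord:
    suite: str    # e.g. "SFT", "LCG suite", "gLite suite", "manual"
    test: str     # hypothetical test name
    passed: bool


def summarize(results):
    """Count passed and failed tests per suite."""
    summary = {}
    for record in results:
        counts = summary.setdefault(record.suite, {"passed": 0, "failed": 0})
        counts["passed" if record.passed else "failed"] += 1
    return summary


def archive(results) -> str:
    """Serialize all records to JSON so they can be stored in one place."""
    return json.dumps([asdict(record) for record in results], indent=2)


results = [
    ResultRecord("SFT", "job-submit", True),
    ResultRecord("SFT", "replica-copy", False),
    ResultRecord("manual", "install-check", True),
]
summary = summarize(results)
archived = archive(results)
```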

Testing

• Complexity of the certification testbeds has increased and will keep increasing
  – Different WLMs (CEs, RBs)
  – More services
  – Hopefully soon external partner sites
    • Different platforms (OS, distributions, architecture)
    • Interoperability (different grids, different OS versions)
• Main problem in the testing area is the lack of resources
  – All hands are on deck getting gLite-3.0 out of the door
• Next steps:
  – Gap analysis
  – Plan for an integrated test environment
    • Best within ETICS
  – Explore ways to handle complexity and diversity more efficiently
    • Virtual machines?
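
A gap analysis can be as simple as comparing the list of services against the services that the test inventory actually exercises. The sketch below shows that comparison; the inventory contents and test names are invented for illustration, and only the service acronyms (CE, RB, FTS, SRM) are taken from the slides.

```python
def find_gaps(services, inventory):
    """Return the services that no test in the inventory covers.

    `inventory` maps a test name to the set of services it exercises.
    """
    covered = {service for tested in inventory.values() for service in tested}
    return sorted(set(services) - covered)


# Hypothetical inventory: two tests, covering three of four services.
inventory = {
    "job-submit": {"CE", "RB"},
    "srm-interop": {"SRM"},
}
gaps = find_gaps(["CE", "RB", "FTS", "SRM"], inventory)
```

In this made-up example FTS is the only service without a test, so it would be the first candidate for a sign-up.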

Summary

• With gLite-3.0 we have not yet reached a fully integrated release
  – Work to be done on:
  – Integrating the build systems
  – Integrating all tests
  – Defining a release process
    • Workflow
    • Acceptance criteria
• All activities are focused on meeting the SC4 deadlines
  – This helps to prioritize
  – This slows down the LCG<->gLite integration process
• Consolidation of test and certification activities will take some time
• We have to rethink how we can evolve the production system
  – How to introduce change?
  – What is a release?
  – The current approach 3.0 -> (3.1) -> 3.2 is not very realistic