Virtualised Worker Nodes: Where are we? What next?
Tony Cass (Tony.Cass@CERN.ch), GDB 2012, 12/12/12



Page 2

The HEPiX virtualisation working group

The HEPiX virtualisation working group was formed to facilitate the instantiation of user-generated virtual machine images at HEPiX (and WLCG) sites.

Users were expressing such a wish in 2008/9, but sites were worried about issues such as uncontrolled root access and the maintenance of the traceability logs required by Grid security policies.

Annotation: This, at least, is still an issue.

Page 3

Image endorsement

The HEPiX VWG developed a policy that introduced the concept of image endorsers: people who would guarantee that generated images could be used safely at sites.

Amongst other things, such images would
– have no embedded user credentials, and
– enable sites to contextualise the images to enable the required logging and make other necessary customisations.
  » Sites agree, however, not to modify the software environment of the image.

Sites are free to trust (or not) specific image endorsers but, if they do trust someone in this role, it is expected that any images endorsed by this person can be used at that site without the need for inspection or manual approval.

The HEPiX VWG policy became the basis of an approved JSPG policy document, “Policy Trusted Virtual Machines”.
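The trust model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ImageRecord` fields, the function names and the endorser identities are hypothetical and are not taken from the HEPiX VWG policy text or software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical description of a virtual machine image."""
    image_id: str
    endorser: str                     # who vouches for the image
    has_embedded_credentials: bool    # policy: must be False
    supports_contextualisation: bool  # policy: must be True


def meets_endorsement_criteria(image: ImageRecord) -> bool:
    """Endorser-side check of the two image properties named on the slide."""
    return (not image.has_embedded_credentials) and image.supports_contextualisation


def site_accepts(image: ImageRecord, trusted_endorsers: set) -> bool:
    """Site-side check: trust is placed in the endorser, not in the image.

    An image from a trusted endorser is accepted without further inspection
    or manual approval; anything else is rejected.
    """
    return image.endorser in trusted_endorsers
```

For example, `site_accepts(ImageRecord("img-1", "endorser-a", False, True), {"endorser-a"})` evaluates to `True`, while the same image is rejected by a site whose trusted set contains only `"endorser-b"`. The design point is that the site never inspects the image itself; it only decides which endorsers to trust.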


Page 4

Current Status

The endorsement policy is agreed.

Technical arrangements have been defined for
– image contextualisation
  » these are compatible with EC2/OpenNebula/OpenStack
– exchange of information between the site infrastructure and a running virtual machine
  » e.g. remaining lifetime, that the virtual machine can be terminated, …

A framework for image endorsers to publish and distribute images has been developed.
– This has been integrated with StratusLab’s marketplace at LAL and is being integrated with OpenStack Glance at CERN.

CERNVM images are compatible with the HEPiX VWG policies
– and there has been a security review of the underlying technology.

Annotation: Many thanks to Owen, Michel, Belmiro & Ulrich.

Annotation: HEPiX VWG model (and s/w) endorsed by the EGI federated cloud task force.
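One use of the site-to-VM information exchange listed above is for a process inside the virtual machine to decide whether it still has time to start more work. The sketch below assumes, purely for illustration, that the remaining lifetime reaches the VM as a number of seconds; how that value is actually delivered is not described here, and `can_start_work` and its parameters are hypothetical names.

```python
def can_start_work(remaining_lifetime_s: float,
                   expected_duration_s: float,
                   safety_margin_s: float = 300.0) -> bool:
    """Return True if a task of the expected duration fits in the time left.

    remaining_lifetime_s: seconds until the site may terminate the VM
                          (assumed to be supplied by the site infrastructure)
    expected_duration_s:  estimated run time of the next task
    safety_margin_s:      spare time kept back for clean-up and output upload
    """
    if remaining_lifetime_s < 0 or expected_duration_s < 0 or safety_margin_s < 0:
        raise ValueError("durations must be non-negative")
    return remaining_lifetime_s >= expected_duration_s + safety_margin_s
```

With the default five-minute margin, a VM with one hour left would accept a 30-minute task but refuse a 58-minute one.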

Page 5

Job done then. What now?

Page 6

CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it

How this could be used

[Diagram. Labels: Central Task Queue; Site A; Site B; Site C; Shared Image Repository (VMIC); User; VO service; Instance requests; Commercial cloud; Payload pull; Image maintainer; Cloud bursting.]

Slide courtesy of Ulrich Schwickerath

Page 7

A Vision for Virtualisation in WLCG

Tony Cass, WLCG GDB, 9/9/9

Page 8

Goals

• Enable experiments/users to choose environment for job execution.

• Ensure sites have control/traceability over resource usage.

Page 9

Approach

• Step-by-step: Build on
  – established successes
  – established trust

• But end goal in view. Prepare for this now with
  – technical agreements/developments
  – user behaviour (especially explicit statement of resource requirements)

Page 10

Approach

• Five steps

• Steps 1-3
  – realistic
  – relatively uncontroversial(?)
  – achievable by end-2010?

• Steps 4 & 5
  – kite-flying
  – probably controversial
  – interesting

Page 11

Step 1

• Users can choose between virtual images created at sites.

• Not really any different from now; could be rephrased “sites provide virtual machines for job execution, not real hardware”.

• Key issue is (full) understanding of resource requirements
  – OS type, memory, (range of) #cores, ...
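An explicit statement of resource requirements, as called for above, could be as small as the following Python sketch. The class names, field names and the matching rule are assumptions made for illustration; in particular, treating a VM with more cores than the requested range as a mismatch is a choice of this sketch, not something the slide prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRequest:
    """Hypothetical explicit statement of what a job needs."""
    os_type: str     # e.g. "SL5"
    memory_mb: int   # memory required
    min_cores: int   # smallest acceptable number of cores
    max_cores: int   # largest number of cores the job can use

    def __post_init__(self):
        if not 1 <= self.min_cores <= self.max_cores:
            raise ValueError("need 1 <= min_cores <= max_cores")


@dataclass(frozen=True)
class VmOffer:
    """Hypothetical virtual machine type offered by a site."""
    os_type: str
    memory_mb: int
    cores: int


def satisfies(offer: VmOffer, request: ResourceRequest) -> bool:
    """True if the offered VM has the right OS, enough memory,
    and a core count inside the requested range."""
    return (offer.os_type == request.os_type
            and offer.memory_mb >= request.memory_mb
            and request.min_cores <= offer.cores <= request.max_cores)
```

A request such as `ResourceRequest("SL5", 2048, 1, 4)` would then be satisfied by `VmOffer("SL5", 4096, 2)` but not by an SL6 offer or by one with only 1024 MB of memory.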

Annotation: Not done. Sites may be using virtual machines but this is transparent to users.

Annotation: And I’m not sure we’re any nearer a negotiation on core needs.

Annotation: Let’s just forget this step now.

Page 12

Step 2

• Distribution of virtual machine images between sites (or from CERN...).
  – Image limited to minimalist operating system (SL4/5/6...)

• Requires
  – transparent process for image generation guaranteeing content
  – mechanism for sites to hook into local monitoring and batch scheduling.
  – trusted and verifiable method of image distribution
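The “verifiable” part of the image distribution requirement above could, for example, be approached by comparing a cryptographic digest of the received image file with a digest published by a trusted source. The sketch below uses SHA-256 from the Python standard library. The choice of SHA-256, the function names and the existence of a published digest are assumptions for illustration, not a description of what the HEPiX group implemented.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks so that
    large image files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path: str, published_sha256: str) -> bool:
    """True if the local image file has the published digest.

    The published value is normalised (whitespace stripped, lower-cased)
    before comparison; how it is obtained and why it is trusted is outside
    the scope of this sketch.
    """
    return sha256_of_file(path) == published_sha256.strip().lower()
```

A digest check only shows that the file is the one the publisher described; whether the publisher is trusted is the separate question that the endorsement policy addresses.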

Annotation (repeated three times on the slide): HEPiX

Annotation: Not done but could be.

Page 13

Step 3

• Distributed virtual image includes experiment software environment
  – So users can choose ATLAS version X on OS Y.

• Requires “transparent process for image generation” to be extended to include experiment software.
  – Snapshot of experiment build servers at CERN?

• Removes need for pilot jobs to verify (or create) correct environment.

Annotation: CVMFS delivers this.

Page 14

What about CernVM?

• Instantiation of CernVM machines being discussed between IT and PH teams; could be an option at CERN.

• But scalability and verifiability of CernVM distribution for widespread use as remote batch image is far from evident.
  – Not excluded, but more likely after successful experience with static images.

Annotation: We took too long (not) testing static images!

Annotation: This works…

Page 15

Step 4

• Distributed virtual image includes client to connect directly to experiment pilot job framework (Dirac, PanDA).

• Initially with virtual machine images instantiated according to jobs arriving at sites.

• Later, sites instantiate virtual machines according to observed load and local policy
  – Lots of busy ATLAS machines? Start more...

• Requires some way for pilot job frameworks to know (remaining) lifetime of virtual machine.
  – VM unlikely to be updated (security patches...), so lifetime will be limited.
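The later stage of Step 4, in which a site starts virtual machines according to observed load and local policy, might look like the following Python sketch. The busy-fraction threshold, the per-experiment cap, the bootstrap rule when nothing is running and all names are hypothetical; a real site policy would be considerably richer.

```python
def extra_vms_to_start(running: int, busy: int, max_allowed: int,
                       busy_threshold: float = 0.9, step: int = 1) -> int:
    """Decide how many more VMs of one experiment's image to start.

    running:        VMs of this image currently running at the site
    busy:           how many of those are executing a payload
    max_allowed:    local policy cap for this experiment
    busy_threshold: start more only if at least this fraction is busy
    step:           how many VMs to add per decision
    """
    if not 0 <= busy <= running:
        raise ValueError("need 0 <= busy <= running")
    headroom = max(max_allowed - running, 0)
    if headroom == 0:
        return 0  # local policy cap reached
    if running == 0:
        return min(step, headroom)  # assumption: bootstrap with a first batch
    if busy / running >= busy_threshold:
        return min(step, headroom)  # "Lots of busy ATLAS machines? Start more..."
    return 0
```

For instance, with 10 VMs running, all busy, a cap of 20 and `step=5`, the function returns 5; with only half of them busy it returns 0.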

Annotation: Let’s work on this now

Annotation: Let’s work on this together now

Annotation: from cvmfs

Page 16

Step 4 issues

Moving credentials into VM images

What role for pilot factories?

Can we avoid queues of virtual machine instantiation requests at sites?

How to streamline (minimise…) communications between sites and experiments?

Page 17

Let the discussion begin!

Page 18

Step 5

• Experiment pilot job frameworks replaced by commercial/public domain schedulers.
  – Virtual LSF cluster for ATLAS
  – Virtual SGE cluster for CMS
  – ...
  – ...

Annotation: SLURM today?
