Port BOSS on CAS@Home, Wenjing Wu (IHEP-CC), wuwj@ihep.ac.cn, 2011-5-10

Page 1: 1 port BOSS on CAS@Home Wenjing Wu (IHEP-CC) wuwj@ihep.ac.cn 2011-5-10

port BOSS on CAS@Home

Wenjing Wu (IHEP-CC), wuwj@ihep.ac.cn

2011-5-10

Outline

Volunteer Computing and CAS@Home

How to port BOSS

A big picture

Related technology

Glue together

Volunteer Computing and BOINC

Volunteer Computing (VC) is an established technology that enables users to contribute to important challenges in fundamental science and medicine, by providing idle time on their PCs and even partaking in data analysis via the Internet.

Berkeley Open Infrastructure for Network Computing (BOINC) is middleware that makes the computing resources provided by volunteers available to projects. The system currently has more than 300K registered users and 500K registered hosts.

Middleware: BOINC, XtremWeb, Xgrid, Grid MP

Basic Model for BOINC-Based VC

The BOINC client is distributed to volunteer PCs, laptops, or clusters.

The BOINC client gets the application and input files from the server, runs the application, and sends the results back to the BOINC server.

The scientist deploys and manages the BOINC server and develops the application.

[Diagram: BOINC Server, BOINC Client, and Application, connected by flows labeled APP, Input, Job, Output/CPU time, and Control Message]

Workflow

1. The user submits a job (or a batch of jobs) to the BOINC server.

2. The BOINC server generates work units from the submitted jobs.

3. An active client requests a job from the BOINC server.

4. The BOINC server sends the job, input files, and application to the requesting client.

5. The job is executed on the client side; it is suspended while the client's host machine is "busy" and continues afterwards.

6. When the job finishes, the output data is sent back to the BOINC server.
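The pull-style workflow above can be illustrated with a toy model. This is only a sketch: `ToyServer`, `WorkUnit`, and `run_client_once` are hypothetical names, not part of the BOINC API, and the suspension of a job while the host is busy is left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkUnit:
    """One schedulable piece of a submitted job (hypothetical structure)."""
    job_id: str
    input_file: str
    output: Optional[str] = None


class ToyServer:
    """Stand-in for the BOINC server; not the real BOINC API."""

    def __init__(self):
        self.pending = deque()   # work units waiting for a client
        self.finished = []       # work units whose output came back

    def submit(self, job_id, input_files):
        # The user submits a job; the server turns it into work units,
        # here simply one unit per input file.
        for name in input_files:
            self.pending.append(WorkUnit(job_id, name))

    def request_work(self):
        # A client asks for work; the server hands out the oldest unit,
        # or None when nothing is pending.
        return self.pending.popleft() if self.pending else None

    def report(self, unit):
        # The client sends the output back to the server.
        self.finished.append(unit)


def run_client_once(server):
    """Pull one work unit, 'execute' it, and report the result."""
    unit = server.request_work()
    if unit is None:
        return False
    unit.output = f"processed:{unit.input_file}"  # placeholder for real work
    server.report(unit)
    return True
```

The point of the sketch is that the client always initiates contact: the server never pushes work to a volunteer machine.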

CAS@Home

CAS@Home is a BOINC-based volunteer computing project hosted at IHEP-CC.

Goal: collect a large amount of free computing resources from volunteers to support scientific computing at CAS and other institutes.

Current resources: about 11000 registered hosts and 8000 registered computers; roughly 60% of them are active.

Designed for multiple applications

Current application: Scthread

Applications being ported: BOSS, LAMMPS

Porting BOSS

Challenges:

BOSS is heavily platform dependent, but most volunteer computers are Windows machines.

A big code base: several GB, which takes a long time to download.

Solution:

A virtual machine is used to provide the BOSS running environment.

[Architecture diagram; components and labels recovered from the slide:]

BOSS Job Management System

BOINC Server

SQUID

CVMFS Server

Volunteer computers 1 ... N, each running a BOINC client and a CernVM (executes BOSS jobs)

BOINC client actions: create/start/resume VM; migrate BOSS job files to VM; execute BOSS job from host; query job status; pause/stop VM when job finished

How it works

BOSS jobs are submitted to the BOINC server via the current BOSS job management system.

BOINC clients running on desktops request jobs from the BOINC server and receive them.

Job:

Create/start/resume a virtual machine with the CernVM image

Copy the BOSS job files to the virtual machine's shared folder and run them on CernVM

Get the job status (a file is created in the shared folder to indicate the job status)

Finish the job, then pause/power off/remove the VM

CernVM downloads and caches the BOSS software (this happens only once), then runs the BOSS job
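The shared-folder handshake described above (job files copied in, a status file written back) might look roughly like the following. This is a sketch under assumptions: the file name `job.status`, the status strings, and the function names are all hypothetical, and `guest_run` merely stands in for whatever runs inside CernVM.

```python
import tempfile
from pathlib import Path

STATUS_FILE = "job.status"  # hypothetical name for the status marker file


def stage_job(shared: Path, job_files: dict) -> None:
    """Host side: copy the job files into the shared folder."""
    shared.mkdir(parents=True, exist_ok=True)
    for name, content in job_files.items():
        (shared / name).write_text(content)
    (shared / STATUS_FILE).write_text("queued")


def guest_run(shared: Path) -> None:
    """Stand-in for the guest side: update the status file around the job."""
    status = shared / STATUS_FILE
    status.write_text("running")
    # ... the real BOSS job would execute here, inside the VM ...
    status.write_text("done")


def read_status(shared: Path) -> str:
    """Host side: read the job status from the shared folder."""
    status = shared / STATUS_FILE
    return status.read_text().strip() if status.exists() else "unknown"


def demo() -> str:
    """Run the whole handshake in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp) / "shared"
        stage_job(shared, {"jobOptions.txt": "# hypothetical job options\n"})
        guest_run(shared)
        return read_status(shared)
```

Using a plain file as the status channel keeps the host and guest decoupled: neither side needs a network connection to the other, only access to the same folder.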

Related Technologies

BOINC

CernVM

VirtualBox

BOINC

Client features:

Sticky file: files (such as VM images) remain on the client after the job is done

Report on RPC: the client reports the presence of files to the BOINC server

Locality scheduling: jobs are scheduled to where the files are located
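The idea behind locality scheduling can be shown in a few lines. This is a simplified illustration, not BOINC's actual scheduler: `pick_host` and the host and file names are hypothetical, and the rule used here (prefer the host that already holds the most of the job's files) is an assumption made for the example.

```python
def pick_host(job_files, host_files):
    """Return the host that already holds the most of the job's files.

    host_files maps a host name to the files it reported as present.
    Ties are broken alphabetically by host name; None means no hosts.
    """
    if not host_files:
        return None
    needed = set(job_files)
    # sorted() fixes the iteration order, and max() keeps the first
    # host among those with the highest overlap.
    return max(sorted(host_files),
               key=lambda host: len(needed & set(host_files[host])))
```

With sticky VM images, a rule like this would tend to send new jobs to hosts that do not need to download the image again.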

BOINC Wrapper:

The BOINC wrapper controls the start/suspension/resume/finish of the application and reports CPU time.

An application can instead be rewritten with the BOINC API to do this (no wrapper is needed in that case).

For a virtual machine, the wrapper has to create/start/pause/resume/power off the VM through the hypervisor (VirtualBox).

The BOINC developer has finished a generic VM wrapper.

History of BOINC wrapper

A specific wrapper has been developed for LHC@home; it is VirtualBox and CernVM based. It uses CoPilot to schedule jobs, which differs from our case.

With CoPilot, the LHC@home wrapper does not have to support host/guest file sharing or guest control (i.e., executing commands on the guest machine from the host machine).

The LHC@home wrapper supports only one virtual machine instance running on a host machine.

The BOINC developer is working on the generic VM wrapper, which is still VirtualBox based but supports guest control and multiple VM instances on a host machine.

CernVM

CernVM is a thin virtual machine image dedicated to LHC experiment users. Its basic image is about 250 MB and can run on most hypervisors (VirtualBox, VMware, Xen, KVM, Hyper-V Server).

LHC and other HEP applications can easily be distributed to and run on different platforms (Linux/Windows/Mac) via a hypervisor plus the CernVM image.

The current version is 2.2. It provides three types of SLC5-based Linux system images: Desktop, Basic, and BOINC. Earlier versions support SLC4.

Software distribution and CVMFS

CernVM comes with a network file system, CVMFS, which delivers software to CernVM users.

CVMFS is mounted so that it appears as a local file system to CernVM users. Files are downloaded and cached on demand.

CVMFS works like a software repository. It currently supports four LHC experiments (ALICE, ATLAS, CMS and LHCb) as well as other experiments and projects (LCD, NA61, and H1). BOSS has also been deployed.

CVMFS is HTTP based, so there are no firewall/proxy concerns.
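The "download and cache on demand" behaviour can be pictured with a minimal read-through cache. This is an illustration of the idea only, not how CVMFS is implemented: `OnDemandCache` is a hypothetical class, and the `fetch` callable stands in for an HTTP download from the repository.

```python
class OnDemandCache:
    """Fetch a file the first time it is read, then serve it locally."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable: path -> bytes (stand-in for HTTP GET)
        self._cache = {}      # path -> bytes already downloaded
        self.downloads = 0    # number of actual fetches performed

    def read(self, path):
        if path not in self._cache:
            # Cache miss: download once and remember the content.
            self._cache[path] = self._fetch(path)
            self.downloads += 1
        return self._cache[path]
```

Only the files a job actually touches are ever transferred, which is why a code base of several GB does not have to be downloaded in full by each volunteer.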

CoPilot

CernVM comes with a job scheduling system, CoPilot (pull model), based on the XMPP protocol.

CoPilot Client: runs on any machine and submits jobs to the CoPilot Server

CoPilot Server: central service that schedules and delivers jobs/input files to CoPilot Agents and receives output files

CoPilot Agent: runs on each CernVM machine and executes jobs

VirtualBox

Hypervisor: free, open source, with versions for Linux/Windows/Mac

Rich command-line interface to create and control the status of virtual machines:

Create VM / Modify VM / Start VM

Save VM state / Pause VM

Resume VM

Power off VM / Release VM / Remove VM

List running VMs / existing VM instances

Life cycle: create -> start -> [pause | resume / save_state] -> poweroff -> [remove | release]
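The life cycle above can be written down as a small state machine, together with the `VBoxManage` command lines a wrapper might issue for each action. This is a sketch: the state names and the transition table are one simplified reading of the slide ("release" is left out), the VM name is hypothetical, and the function only builds argument lists without running anything.

```python
# (current state, action) -> next state; a simplified reading of the life cycle.
TRANSITIONS = {
    ("created", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "resume"): "running",
    ("running", "save_state"): "saved",
    ("saved", "start"): "running",      # starting a saved VM restores its state
    ("running", "poweroff"): "off",
    ("paused", "poweroff"): "off",
    ("off", "start"): "running",
    ("off", "remove"): "removed",
}


def next_state(state, action):
    """Return the state after `action`, or raise ValueError if not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a VM that is {state}") from None


def vbox_command(vm_name, action):
    """Build (but do not run) the VBoxManage argument list for an action."""
    if action == "start":
        return ["VBoxManage", "startvm", vm_name, "--type", "headless"]
    if action == "remove":
        return ["VBoxManage", "unregistervm", vm_name, "--delete"]
    controlvm = {
        "pause": "pause",
        "resume": "resume",
        "save_state": "savestate",
        "poweroff": "poweroff",
    }
    return ["VBoxManage", "controlvm", vm_name, controlvm[action]]
```

Checking a requested action against the table before calling the hypervisor lets a wrapper reject impossible requests, such as pausing a VM that was never started, without relying on VirtualBox's error output.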

Glue all pieces together

A BOINC VM wrapper is needed to establish communication between the VM and the client.

The VM wrapper uses guestcontrol (a VirtualBox feature) to execute commands on the guest machine from the host machine.

The VM wrapper receives signals from the client to decide whether to pause or resume VirtualBox.

The VM wrapper copies the BOSS-related files to a shared folder, so they appear in the virtual machine for execution.

With multiple VMs, the wrapper needs to keep track of the status of each VM.
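Tracking several VMs and reacting to client signals could be organised along these lines. This is a sketch under assumptions: `VmTracker`, the signal names `"suspend"` and `"resume"`, and the status strings are hypothetical, and the method only returns the actions a wrapper would issue rather than calling VirtualBox.

```python
class VmTracker:
    """Keep the status of every VM started by the wrapper on one host."""

    def __init__(self):
        self.status = {}  # VM name -> "running" | "paused" | "finished"

    def add(self, vm_name):
        self.status[vm_name] = "running"

    def mark_finished(self, vm_name):
        self.status[vm_name] = "finished"

    def on_signal(self, signal):
        """Translate a client signal into per-VM actions.

        Returns the list of (vm_name, action) pairs the wrapper would issue.
        """
        rules = {
            "suspend": ("running", "pause", "paused"),
            "resume": ("paused", "resume", "running"),
        }
        source, action, target = rules[signal]
        issued = []
        for vm_name, state in self.status.items():
            if state == source:
                issued.append((vm_name, action))
                self.status[vm_name] = target  # value update only, keys unchanged
        return issued
```

Because the tracker records each VM's state, a repeated signal produces no duplicate actions, and finished VMs are never paused or resumed.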

Glue all pieces together

Only a basic CernVM image is downloaded by the client and placed in the project directory. To run multiple instances on a host, each instance needs its own clone of the image.

Input files include the image (CernVM, original size 800 MB), the BOSS scripts, and their associated input files.

BOSS software is distributed via CVMFS; a local Squid is used to cache the repository.
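Giving each instance its own copy of the downloaded image might be sketched as follows. The file naming scheme and the `.vdi` extension are assumptions, and a plain file copy is used only for illustration; a real deployment would probably rely on VirtualBox's own disk cloning, since disk images carry unique identifiers.

```python
import shutil
import tempfile
from pathlib import Path


def clone_images(base_image: Path, instances: int) -> list:
    """Create one copy of the base image per VM instance.

    Copies are named <stem>-<index><suffix> next to the base image and
    are reused if they already exist.
    """
    clones = []
    for index in range(instances):
        target = base_image.with_name(
            f"{base_image.stem}-{index}{base_image.suffix}")
        if not target.exists():
            shutil.copyfile(base_image, target)
        clones.append(target)
    return clones


def demo() -> list:
    """Clone a tiny placeholder image twice in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "cernvm.vdi"   # hypothetical file name
        base.write_bytes(b"placeholder image")
        return [clone.name for clone in clone_images(base, 2)]
```

The base image is downloaded only once; the per-instance copies are made locally, so running more instances costs disk space but no extra network transfer.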

Thanks!

For more information: http://twiki.ihep.ac.cn/twiki/bin/view/CASAtHome/BOSSonBOINC