Distributed Grid Computing at ISIS using the Grid MP System

Tom Griffin, ISIS Facility & University of Manchester / UMIST

What do I mean by ‘Distributed Grid’?

• A way of speeding up large, compute-intensive tasks

• Break large jobs into smaller chunks

• Send these chunks out to (distributed) machines

• Distributed machines do the work

• Collate and merge the results

Spare Cycles Concept

• Typical PC usage is about 10%

• Most PCs not used at all after 5pm

• Even on ‘heavily used’ PCs (Outlook, Word, IE), the CPU is still grossly underutilised

• Everyone wants a fast PC!

• Can we use (“steal?”) their unused CPU cycles?

• SETI@home, World Community Grid (www.worldcommunitygrid.org)

Possible Software Implementations

• Toolkit, e.g. COSM

– Low-level toolkit: source-code-level integration

– So time-consuming work for each application

• Entropia DC Grid

– Trial run at ISIS two years ago, with some success

– Company bought out and in limbo (?)

• United Devices Grid MP

– What we’re currently using

– Quite expensive

• Condor

– Free (academic research project)

– In our experience two years ago, not reliable on Windows

The United Devices System

• Server hardware

– We use two dual-Xeon servers + 280 client licenses

– Could (will) easily cope with more clients

• Software

– Servers run RedHat Linux Advanced Server / DB2

– Clients available for Windows, Linux, SPARCs and Macs

• Programming

– MGSI: Web Services interface (XML, SOAP)

– Accessed with C++ and Java classes etc

• Management Console

– Web browser based

– Can manage services, jobs, devices etc

Visual Introduction to the Grid

Installing and Deploying the System

• Servers

– Complete set up in under 3 hours

– Virtually self maintaining

• Clients

– Windows only so far

– MSI Installer

– approx 20 seconds

– SMS

– MP Agent User

– Install to other OSs looks straightforward

Suitable / Unsuitable Applications

• CPU intensive

• Low to moderate memory use

• Not too much file output

• Coarse grained

• Command line / batch driven

• Licensing issues?

Objects within the Grid

• Program

• Job

• Jobstep

• Data Set

• Data

• Workunit

• Client

How to write Grid Programs

1) Think about how to split your data and merge results

2) Wrap and upload your executable

3) Write the application service

– Pre and Post processing

4) Use the Grid

• Fairly easy to write

• Interface to grid via Web Services

• So far used: C++, Java, Perl, C# (any .NET language)

Wrapping Your Executable

• Executable + any DLLs etc

• Standard data files

• Compression

• Encryption

• Capture screen output

• Set environment variables

• Command line

Application Service

• Pre-processing

1) Partition data

2) Package data partitions

3) Log in to the Grid server

4) Create a Job and Job Step

5) Create a Data Set

6) Create Data objects and upload data packages

7) Create Workunits

8) Set the Job running

• Post-processing

1) Retrieve results

2) Merge results
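
The order of the pre- and post-processing steps can be sketched with a toy, in-memory stand-in for the Grid server. Everything named here (`ToyGrid`, `submit`, `collect` and their members) is hypothetical and is not the MGSI interface; Job, Job Step and Data Set creation are collapsed into the toy object, and the "result" of a Workunit is simply its input with a `done:` tag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy in-memory stand-in for the Grid server. Every name here is
// hypothetical; it is NOT the MGSI API, it only mirrors the step order.
struct ToyGrid {
    bool loggedIn = false;
    bool running = false;
    std::vector<std::string> packages;   // uploaded Data objects
    std::vector<std::size_t> workunits;  // index of the Data each Workunit uses

    void login() { loggedIn = true; }

    std::size_t uploadData(const std::string& package) {
        assert(loggedIn);  // uploading requires a prior login
        packages.push_back(package);
        return packages.size() - 1;
    }

    void createWorkunit(std::size_t dataIndex) {
        assert(dataIndex < packages.size());
        workunits.push_back(dataIndex);
    }

    void startJob() { running = true; }

    // Pretend every client returns its input tagged as done.
    std::vector<std::string> retrieveResults() const {
        assert(running);
        std::vector<std::string> results;
        for (std::size_t w : workunits) {
            results.push_back("done:" + packages[w]);
        }
        return results;
    }
};

// Pre-processing: one package per partition, one Workunit per package.
void submit(ToyGrid& grid, const std::vector<std::string>& partitions) {
    grid.login();                                  // log in to the server
    for (const std::string& p : partitions) {
        grid.createWorkunit(grid.uploadData(p));   // upload data, create Workunit
    }
    grid.startJob();                               // set the Job running
}

// Post-processing: retrieve the results and merge them, one per line.
std::string collect(const ToyGrid& grid) {
    std::string merged;
    for (const std::string& r : grid.retrieveResults()) {
        merged += r + "\n";
    }
    return merged;
}
```

Calling `submit(grid, {"run1", "run2"})` followed by `collect(grid)` walks through the same sequence as the slide: partition, upload, create Workunits, start, retrieve, merge.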

Example Application: HMC

Hybrid Monte Carlo method of global optimisation to solve molecular crystal structures from powder diffraction data

Parametric problem

• e.g. vary parameters such as acceptance ratio, to scan a 3D grid

• each run completely independent of any other

• Send one run to each machine on the grid
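
A parametric scan of this kind amounts to enumerating every point of the 3D grid and emitting one independent run per point. The sketch below assumes each run is described by a command line; the executable name `hmc.exe` and the flags `-a`, `-b`, `-c` are placeholders for illustration, not the real HMC options.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Build one independent run per point of a 3D parameter grid.
// "hmc.exe" and the flag names -a, -b, -c are placeholders.
std::vector<std::string> buildRuns(const std::vector<double>& a,
                                   const std::vector<double>& b,
                                   const std::vector<double>& c) {
    std::vector<std::string> runs;
    runs.reserve(a.size() * b.size() * c.size());
    for (double x : a) {
        for (double y : b) {
            for (double z : c) {
                std::ostringstream cmd;
                cmd << "hmc.exe -a " << x << " -b " << y << " -c " << z;
                runs.push_back(cmd.str());
            }
        }
    }
    return runs;
}
```

Because no run depends on any other, each returned string can be sent to a different machine without any coordination between them.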

Running HMC on the Grid

• Unchanged exe

• User edits or creates an appropriate settings file

• User runs “my” HMC submit program

– Splits bat file into one line per machine

– Uploads chunks to the Grid server

• Grid server distributes Workunits to clients

• User monitors the job with their web browser

• Clients return results to the Grid server

• User runs HMC retrieve program

– Downloads results

More on HMC Submit…

• Split the batch file into lines

• Create a dataset (to hold our data)

• Package data (command line and zmatrix files etc)

• Associate data with dataset

• Upload data packages to Grid server

• Create Workunits from the dataset

• Create a Job to hold the Workunits
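
The first step, splitting the batch file into lines, might look like the following. This is a minimal sketch rather than the actual submit program: the function name `splitBatch` is made up, and it assumes that blank lines carry no work and that the file may have Windows line endings.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Split a batch file into one command line per Workunit, skipping blank
// lines and tolerating Windows-style "\r\n" line endings.
std::vector<std::string> splitBatch(std::istream& batch) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(batch, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // drop the carriage return left by getline
        }
        if (line.find_first_not_of(" \t") == std::string::npos) {
            continue;         // nothing but whitespace: no Workunit
        }
        lines.push_back(line);
    }
    return lines;
}
```

Each returned line would then be packaged together with the data files it refers to (for HMC, the zmatrix files) before upload.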

Yet more…

• Program written in C++

• Uses C++ classes to ‘hide’ SOAP calls

    dsHMC.data_set_gid = mgsi->createDataSet(dsHMC);

    ud::uuid MgsiClient::createDataSet(const DataSet &data_set) throw(MgsiException)
    {
        SOAPMethod request("createDataSet", "urn://ud.com/mgsi");
        request.AddParameter("authkey") << authkey;
        request.AddParameter("data_set") << data_set;
        const SOAPResponse &response = call(request,
            const_cast<SOAPParameter *>(&request.GetParameter((size_t)0)));

        ud::uuid retval;
        response.GetReturnValue() >> retval;
        return retval;
    }

• Auto generated by ‘Axis C++’ from WSDL file

• Also a C++ HTTPs file transfer program

Performance

• Linear: 50 devices ≈ 50 times faster

• Affected by size of Workunit

– Overhead for distribution is ≈ 1 minute

– Risk of device being switched off
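
Why Workunit size matters can be illustrated with a very simple model. It is an assumption made for illustration, not a measurement from the ISIS grid: the serial work is split evenly into one Workunit per device, all Workunits run in parallel, and each pays the same fixed distribution overhead (about 1 minute on the slide). The function name `estimatedSpeedup` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Simple model (an assumption, not a measurement): serialMinutes of work is
// split evenly over `devices` Workunits that run in parallel, and each
// Workunit pays a fixed distribution overhead.
double estimatedSpeedup(double serialMinutes, int devices,
                        double overheadMinutes = 1.0) {
    assert(serialMinutes > 0.0 && devices > 0 && overheadMinutes >= 0.0);
    const double parallelMinutes = serialMinutes / devices + overheadMinutes;
    return serialMinutes / parallelMinutes;
}
```

Under this model, 50 devices each given a 10-hour Workunit (`estimatedSpeedup(30000.0, 50)`) come out at roughly 49.9 times faster, which matches the near-linear behaviour reported above, whereas 50 Workunits of only 1 minute each (`estimatedSpeedup(50.0, 50)`) give a factor of 25.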

Example 2: MD Manager

• Molecular Dynamics simulation(s)

• Program written in C#

– C# classes generated from WSDL (and modified) to hide SOAP

– Wrote generic C# HTTP file transfer classes

• ‘Interactive’ program

• Typical runtime ~10 hours per single simulation

• Need to investigate ‘grids’ of simulations

[Figure: 3 × 3 grid of simulations labelled A to I; axes: Temperature and Pressure]

• But in 3-dimensions

• and with ‘ordering restrictions’

• plus a post processing stage

Who Else Does This?

• Johnson & Johnson

• Novartis

• GSK

• National Physical Laboratory

• Accelrys

• IBM

• World Community Grid

– http://www.worldcommunitygrid.org/

– Currently the Human Proteome Folding project

Problems Encountered & Support

• Technical Problems

– Mercifully few!

– Main issue has been RAM thresholding (now resolved)

– Encryption of certain files causes a problem

• Support

– So far been very good

– Responses to queries always next day (time difference) and always insightful

• Ease of setup / maintenance

– Installed and fully running in ~3 hours

– Next to no maintenance required, other than backup

‘Social’ Issues

• Easiest thing to blame

• Too abstract for some users (no big box)

• Stealing my cycles

• Expansion leads to political problems

Future Developments - Expansion

• Expansion

– Proposal accepted for an additional 400 licenses

– Giving us a total of 480

• Change in licensing model

[Figure: upgrade roadmap. Stages: Upgrade to 280 licences; Upgrade to unlimited licences for 1 year; MP Insight; Unlimited licences forever; 480 permanent licences. Status labels: Completed, Funded, Seeking funding. Costs shown: $50k, $45k, $50k, $83k]

• Bottom Line: Costs

– Setup, server licenses, 80 client licenses + support: $18k (CMSD)

– Total ≈ $250k
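
As a rough check on that total, assuming it is the initial $18k plus the four amounts shown on the roadmap (the slide does not spell out the breakdown):

```latex
\[
18 + 50 + 45 + 50 + 83 = 246 \approx 250 \quad \text{(thousand US dollars)}
\]
```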

Summary

• Grid is here and running smoothly

• Easy to use

• Excellent performance

• Vast amount of compute power available

• Future looks good
