Distributed Grid Computing at ISIS using the Grid MP System Tom Griffin, ISIS Facility & University of Manchester / UMIST

Page 1: Distributed Grid Computing at ISIS using the Grid MP System

Distributed Grid Computing at ISIS using the Grid MP System

Tom Griffin, ISIS Facility & University of Manchester / UMIST

Page 2: Distributed Grid Computing at ISIS using the Grid MP System

What do I mean by ‘Distributed Grid’?

• A way of speeding up large, compute-intensive tasks

• Break large jobs into smaller chunks

• Send these chunks out to (distributed) machines

• Distributed machines do the work

• Collate and merge the results
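The four steps above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the function names are made up for this sketch, the "work" is a sum of squares, and each call to `process_chunk` stands in for what one remote machine would do. Nothing here uses the Grid MP interface.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Step 1: break the job into at most n_chunks contiguous, near-equal chunks.
std::vector<std::vector<double>> split_into_chunks(const std::vector<double>& data,
                                                   std::size_t n_chunks) {
    std::vector<std::vector<double>> chunks;
    if (data.empty() || n_chunks == 0) return chunks;
    if (n_chunks > data.size()) n_chunks = data.size();
    const std::size_t base = data.size() / n_chunks;
    const std::size_t extra = data.size() % n_chunks;  // first `extra` chunks get one more item
    std::size_t begin = 0;
    for (std::size_t i = 0; i < n_chunks; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
        chunks.emplace_back(first, first + static_cast<std::ptrdiff_t>(len));
        begin += len;
    }
    assert(begin == data.size());  // every item landed in exactly one chunk
    return chunks;
}

// Step 3: the work one machine does on its chunk (here: a sum of squares).
double process_chunk(const std::vector<double>& chunk) {
    return std::accumulate(chunk.begin(), chunk.end(), 0.0,
                           [](double acc, double x) { return acc + x * x; });
}

// Steps 2 and 4: hand each chunk to a "machine", then collate and merge.
// The loop is sequential here; on a real grid each iteration runs remotely.
double run_job(const std::vector<double>& data, std::size_t n_machines) {
    double merged = 0.0;
    for (const auto& chunk : split_into_chunks(data, n_machines)) {
        merged += process_chunk(chunk);
    }
    return merged;
}
```

The merged result does not depend on how many machines the job is split across, which is the property that makes a task suitable for this approach.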

Page 3: Distributed Grid Computing at ISIS using the Grid MP System

Spare Cycles Concept

• Typical PC usage is about 10%

• Most PCs not used at all after 5pm

• Even on ‘heavily used’ PCs (Outlook, Word, IE), the CPU is still grossly underutilised

• Everyone wants a fast PC!

• Can we use (“steal?”) their unused CPU cycles?

• SETI@home, World Community Grid (www.worldcommunitygrid.org)

Page 4: Distributed Grid Computing at ISIS using the Grid MP System

Possible Software Implementations

• Toolkit, e.g. COSM
    – Low-level toolkit: source-code-level integration
    – So time-consuming work for each application
• Entropia DC Grid
    – Trial run at ISIS two years ago, with some success
    – Company bought out and in limbo (?)
• United Devices Grid MP
    – What we’re currently using
    – Quite expensive
• Condor
    – Free (academic research project)
    – In our experience two years ago, not reliable on Windows

Page 5: Distributed Grid Computing at ISIS using the Grid MP System

The United Devices System

• Server hardware
    – We use two dual-Xeon servers + 280 client licenses
    – Could (will) easily cope with more clients
• Software
    – Servers run RedHat Linux Advanced Server / DB2
    – Clients available for Windows, Linux, SPARCs and Macs
• Programming
    – MGSI: Web Services interface (XML, SOAP)
    – Accessed with C++ and Java classes etc.
• Management Console
    – Web browser based
    – Can manage services, jobs, devices etc.

Page 6: Distributed Grid Computing at ISIS using the Grid MP System

Visual Introduction to the Grid

Page 7: Distributed Grid Computing at ISIS using the Grid MP System

Installing and Deploying the System

• Servers
    – Complete setup in under 3 hours
    – Virtually self-maintaining
• Clients
    – Windows only so far
    – MSI installer
    – Approx. 20 seconds
    – SMS
    – MP Agent User
    – Installation on other OSs looks straightforward

Page 8: Distributed Grid Computing at ISIS using the Grid MP System

Suitable / Unsuitable Applications

• CPU intensive
• Low to moderate memory use
• Not too much file output
• Coarse grained
• Command line / batch driven
• Licensing issues?

Page 9: Distributed Grid Computing at ISIS using the Grid MP System

Objects within the Grid

• Program
• Job
• Jobstep
• Data Set
• Data
• Workunit
• Client

Page 10: Distributed Grid Computing at ISIS using the Grid MP System

How to write Grid Programs

1) Think about how to split your data and merge results
2) Wrap and upload your executable
3) Write the application service
    – Pre- and post-processing
4) Use the Grid

• Fairly easy to write
• Interface to the Grid via Web Services
• So far used: C++, Java, Perl, C# (any .Net language)

Page 11: Distributed Grid Computing at ISIS using the Grid MP System

Wrapping Your Executable

• Executable + any DLLs etc.
• Standard data files
• Compression
• Encryption
• Capture screen output
• Set environment variables
• Command line

Page 12: Distributed Grid Computing at ISIS using the Grid MP System

Application Service

• Pre-processing
    1) Partition data
    2) Package data partitions
    3) Log in to the Grid server
    4) Create a Job and Job Step
    5) Create a Data Set
    6) Create Data objects and upload data packages
    7) Create Workunits
    8) Set the Job running
• Post-processing
    1) Retrieve results
    2) Merge results

Page 13: Distributed Grid Computing at ISIS using the Grid MP System

Example Application: HMC

Hybrid Monte Carlo method of global optimisation to solve molecular crystal structures from powder diffraction data

Parametric problem
• e.g. vary parameters such as acceptance ratio, to scan a 3D grid
• Each run completely independent of any other
• Send one run to each machine on the grid
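A parametric scan of this kind amounts to enumerating every point on the 3D grid and emitting one independent run per point. The sketch below shows one way to do that. Only "acceptance ratio" is named in the slides; the other two axes (`param_b`, `param_c`), the executable name `hmc.exe` and its flags are hypothetical placeholders, not the real HMC command line.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One independent run: a single point on the 3D parameter grid.
// param_b and param_c are placeholder names for the two unnamed axes.
struct RunParams {
    double acceptance_ratio;
    double param_b;
    double param_c;
};

// n evenly spaced values from lo to hi inclusive (just lo when n == 1).
std::vector<double> linspace(double lo, double hi, int n) {
    assert(n >= 1);
    std::vector<double> out;
    for (int i = 0; i < n; ++i) {
        out.push_back(n == 1 ? lo : lo + (hi - lo) * i / (n - 1));
    }
    return out;
}

// Cartesian product of the three axes: one RunParams per grid point.
std::vector<RunParams> make_grid(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 const std::vector<double>& c) {
    std::vector<RunParams> runs;
    for (double x : a)
        for (double y : b)
            for (double z : c)
                runs.push_back({x, y, z});
    return runs;
}

// Hypothetical command line for one run; the flag names are invented.
std::string to_command_line(const RunParams& p) {
    std::ostringstream os;
    os << "hmc.exe --accept " << p.acceptance_ratio
       << " --b " << p.param_b << " --c " << p.param_c;
    return os.str();
}
```

Because no run depends on another, each generated command line can be sent to a different machine with no coordination between them.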

Page 14: Distributed Grid Computing at ISIS using the Grid MP System

Running HMC on the Grid

• Unchanged exe
• User edits or creates an appropriate settings file
• User runs “my” HMC submit program
    – Splits bat file into one line per machine
    – Uploads chunks to the Grid server
• Grid server distributes Workunits to clients
• User monitors the job with their web browser
• Clients return results to the Grid server
• User runs HMC retrieve program
    – Downloads results

Page 15: Distributed Grid Computing at ISIS using the Grid MP System

More on HMC Submit…

• Split the batch file into lines
• Create a dataset (to hold our data)
• Package data (command line and zmatrix files etc.)
• Associate data with dataset
• Upload data packages to Grid server
• Create Workunits from the dataset
• Create a Job to hold the Workunits
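The first step, splitting the batch file into one command line per Workunit, can be sketched as follows. This is an assumption about how such a splitter might look, not the actual submit program: `WorkunitPackage` and `split_batch` are names invented for this sketch, and the packaging and upload steps are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One chunk of work: a single command line taken from the batch file.
struct WorkunitPackage {
    std::size_t index;         // position in the batch, used to match results later
    std::string command_line;  // the line a client machine will execute
};

// Split batch-file text into one package per non-blank line.
std::vector<WorkunitPackage> split_batch(const std::string& batch_text) {
    std::vector<WorkunitPackage> packages;
    std::istringstream in(batch_text);
    std::string line;
    while (std::getline(in, line)) {
        // Windows batch files end lines with CR LF; drop the trailing CR.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        // Skip lines that are empty or contain only whitespace.
        if (line.find_first_not_of(" \t") == std::string::npos) continue;
        packages.push_back({packages.size(), line});
    }
    // Indices are consecutive, starting at zero.
    assert(packages.empty() || packages.back().index + 1 == packages.size());
    return packages;
}
```

Keeping the index with each command line makes it straightforward to put the downloaded results back in batch order in the retrieve step.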

Page 16: Distributed Grid Computing at ISIS using the Grid MP System

Yet more…

• Program written in C++

• Uses C++ classes to ‘hide’ SOAP calls

    // Caller side: the SOAP call looks like an ordinary method call.
    dsHMC.data_set_gid = mgsi->createDataSet(dsHMC);

    // Wrapper: builds the SOAP request, sends it and unpacks the returned ID.
    ud::uuid MgsiClient::createDataSet(const DataSet &data_set) throw(MgsiException)
    {
        SOAPMethod request("createDataSet", "urn://ud.com/mgsi");
        request.AddParameter("authkey") << authkey;
        request.AddParameter("data_set") << data_set;
        const SOAPResponse &response =
            call(request, const_cast<SOAPParameter *>(&request.GetParameter((size_t)0)));

        ud::uuid retval;
        response.GetReturnValue() >> retval;
        return retval;
    }

• Auto generated by ‘Axis C++’ from WSDL file

• Also a C++ HTTPS file transfer program

Page 17: Distributed Grid Computing at ISIS using the Grid MP System

Performance

• Linear: 50 devices ≈ 50 times faster
• Affected by size of Workunit
    – Overhead for distribution is ≈ 1 minute
    – Risk of device being switched off
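The effect of the per-Workunit overhead on the near-linear speedup can be estimated with a simple model. The model is an assumption made for this sketch, not a measurement from the slides: every device runs one Workunit of equal length in parallel, and each Workunit pays a fixed distribution overhead (defaulting to the ≈ 1 minute quoted above).

```cpp
#include <cassert>

// Estimated speedup over running all Workunits one after another on one machine.
// Assumed model: n_devices Workunits of equal length run fully in parallel,
// and each one costs its compute time plus a fixed distribution overhead.
double estimated_speedup(int n_devices, double workunit_minutes,
                         double overhead_minutes = 1.0) {
    assert(n_devices > 0 && workunit_minutes > 0.0 && overhead_minutes >= 0.0);
    const double serial_time = n_devices * workunit_minutes;      // one machine, no grid
    const double grid_time = workunit_minutes + overhead_minutes; // all devices at once
    return serial_time / grid_time;
}
```

Under this model, 50 devices running one-minute Workunits give only about a 25-fold speedup, because half of each device's time goes on overhead, while 50 devices running 10-hour Workunits give about 49.9-fold. This is why coarse-grained Workunits suit the grid, although longer Workunits also lose more work if a device is switched off part-way through.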

Page 18: Distributed Grid Computing at ISIS using the Grid MP System

Example 2: MD Manager

• Molecular Dynamics simulation(s)
• Program written in C#
    – C# classes generated from WSDL (and modified) to hide SOAP
    – Wrote generic C# HTTP file transfer classes
• ‘Interactive’ program
• Typical runtime ~10 hours per single simulation
• Need to investigate ‘grids’ of simulations

Page 19: Distributed Grid Computing at ISIS using the Grid MP System

[Figure: 3×3 grid of simulations labelled A to I, with axes Temperature and Pressure]

• But in 3 dimensions
• and with ‘ordering restrictions’
• plus a post-processing stage

Page 20: Distributed Grid Computing at ISIS using the Grid MP System
Page 21: Distributed Grid Computing at ISIS using the Grid MP System

Who Else Does This?

• Johnson & Johnson
• Novartis
• GSK
• National Physical Laboratory
• Accelrys
• IBM
• World Community Grid
    – http://www.worldcommunitygrid.org/
    – Currently the Human Proteome Folding project

Page 22: Distributed Grid Computing at ISIS using the Grid MP System

Problems Encountered & Support

• Technical Problems
    – Mercifully few!
    – Main issue has been RAM thresholding (now resolved)
    – Encryption of certain files causes a problem
• Support
    – So far been very good
    – Responses to queries always next day (time difference) and always insightful
• Ease of setup / maintenance
    – Installed and fully running in ~3 hours
    – Next to no maintenance required, other than backup

Page 23: Distributed Grid Computing at ISIS using the Grid MP System

‘Social’ Issues

• Easiest thing to blame
• Too abstract for some users (no big box)
• Stealing my cycles
• Expansion leads to political problems

Page 24: Distributed Grid Computing at ISIS using the Grid MP System

Future Developments - Expansion

• Expansion
    – Proposal accepted for an additional 400 licenses
    – Giving us a total of 480
    – Change in licensing model

[Figure: upgrade roadmap. Stages: Upgrade to 280 licences; Upgrade to unlimited licences for 1 year; MP Insight; Unlimited licences forever; 480 permanent licences. Status key: Completed, Funded, Seeking funding. Costs shown: $50k, $45k, $50k, $83k]

• Bottom Line: Costs
    – Setup, server licenses, 80 client licenses + support: $18k (CMSD)
    – Total ≈ $250k

Page 25: Distributed Grid Computing at ISIS using the Grid MP System

Summary

• Grid is here and running smoothly
• Easy to use
• Excellent performance
• Vast amount of compute power available
• Future looks good