Distributed Monte Carlo Instrument Simulations at ISIS Tom Griffin, ISIS Facility & University of Manchester


Distributed Monte Carlo Instrument Simulations at ISIS

Tom Griffin, ISIS Facility & University of Manchester

• What is Distributed Computing

• The software we use

• VITESS Specifics

• McStas Specifics

• Conclusions

Introduction

What do I mean by ‘Distributed Grid’?

• A way of speeding up large, compute-intensive tasks

• Break large jobs into smaller chunks

• Send these chunks out to (distributed) machines

• Distributed machines do the work

• Collate and merge the results

Spare Cycles Concept

• Typical PC usage is about 10%

• Most PCs not used at all after 5pm

• Even with ‘heavily used’ (Outlook, Word, IE) PCs, the CPU is still grossly underutilised

• Everyone wants a fast PC!

• Can we use (“steal?”) their unused CPU cycles?

• SETI@home, World Community Grid (www.worldcommunitygrid.org)

• Toolkit e.g. COSM
  • Low level toolkit – source code level integration
  • So time-consuming work is needed for each application

• Entropia DC Grid
  • Trial run at ISIS two years ago. Some success
  • Company bought out and in limbo (?)

• United Devices Grid MP
  • What we’re currently using
  • Quite expensive

• Condor
  • Free (academic research project)
  • In our experience 2 years ago, not reliable with Windows

Possible Software Implementations

The United Devices System

• Server hardware
  • We use two dual-Xeon servers + 280 client licenses
  • Could (will) easily cope with more clients

• Software
  • Servers run RedHat Linux Advanced Server / DB2
  • Clients available for Windows, Linux, SPARCs and Macs

• Programming
  • MGSI – Web Services interface – XML, SOAP
  • Accessed with C++ and Java classes etc

• Management Console
  • Web browser based
  • Can manage services, jobs, devices etc

Visual Introduction to the Grid

• CPU intensive

• Low to moderate memory use

• Not too much file output

• Coarse grained

• Command line / batch driven

• Licensing issues?

Suitable / Unsuitable Applications

• Program
  • McStas

• Job
  • wish_simulation

• Jobstep

• Workunit
  • sent to a Device

• Data Set

• Data

Objects within the Grid

1) Think about how to split your data and merge results

2) Wrap and upload your executable

3) Write the application service
  • Pre and Post processing

4) Use the Grid

• Fairly easy to write

• Interface to grid via Web Services

• So far used: C++, Java, Perl, Fortran, C#

How to write Grid Programs

• Executable + any dlls etc

• Standard data files

• Compression

• Encryption

• Capture screen output

• Set Environmental Variables

• Command Line

Wrapping Your Executable

• Pre-processing

1) Partition data

2) Package data partitions

3) Log in to the Grid server

4) Create a Job and Job Step

5) Create a Data Set

6) Create Datas and upload data packages

7) Create Workunits

8) Set the Job running

• Post-processing

1) Retrieve results

2) Merge results

Application Service
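The pre- and post-processing steps above can be sketched as one driver routine. This is only a minimal sketch: `InMemoryGrid` and all of its methods are hypothetical stand-ins invented for illustration and are not the Grid MP / MGSI interface. A real application service would make the corresponding Web Services calls instead.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryGrid:
    """Hypothetical stand-in for a grid server (NOT the real MGSI API)."""
    steps: list = field(default_factory=list)
    workunits: list = field(default_factory=list)

    def do(self, step):
        # Record each step name so the order can be inspected afterwards.
        self.steps.append(step)

    def add_workunit(self, package):
        self.workunits.append(package)

    def run(self, work):
        # Pretend each device runs `work` on one package and returns a result.
        return [work(package) for package in self.workunits]


def partition(total, chunks):
    """Split `total` items into `chunks` near-equal integer parts."""
    base, extra = divmod(total, chunks)
    return [base + (1 if i < extra else 0) for i in range(chunks)]


def application_service(total, chunks, work, merge):
    grid = InMemoryGrid()
    packages = partition(total, chunks)   # pre 1-2) partition and package data
    grid.do("login")                      # pre 3) log in to the grid server
    grid.do("create job and job step")    # pre 4)
    grid.do("create data set")            # pre 5)
    for package in packages:
        grid.do("upload data")            # pre 6) one Data per package
        grid.add_workunit(package)        # pre 7) one Workunit per Data
    grid.do("start job")                  # pre 8)
    results = grid.run(work)              # post 1) retrieve results
    return merge(results), grid.steps     # post 2) merge results


if __name__ == "__main__":
    merged, steps = application_service(10, 3, work=lambda n: n * 2, merge=sum)
    print(merged)
    print(steps[:3])
```

Keeping `work` and `merge` as parameters reflects the point that only the splitting and merging logic is application specific; the grid bookkeeping is the same for every program.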

• Two scenarios:

• Single large simulation run

• Split the neutrons into smaller numbers and execute separately

• Merge results in some way

• Many smaller runs

• Parameter scan

Monte Carlo Speed-up Ideas

• Easy mode of operation: fixed executables + data files

• Executables held on server

• Split command line into bits – divide Ncount

• Vary the random seed

• Create data packages

• Upload data packages

VITESS – Splitting It
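The "divide Ncount" and "vary the random seed" steps can be illustrated with a short sketch. The function name `split_run` and the seed scheme (base seed plus chunk index) are assumptions made for this example; it returns plain (count, seed) pairs rather than guessing at the exact VITESS command-line syntax. Whether consecutive seeds give sufficiently independent streams depends on the random number generator in use.

```python
def split_run(total_neutrons, n_chunks, base_seed=1):
    """Return one (neutron_count, seed) pair per chunk.

    The counts add up to the requested total, and every chunk gets a
    different seed so that chunks do not repeat the same trajectories.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(int(total_neutrons), n_chunks)
    chunks = []
    for i in range(n_chunks):
        # The first `extra` chunks take one more neutron to absorb the remainder.
        count = base + (1 if i < extra else 0)
        chunks.append((count, base_seed + i))
    return chunks


if __name__ == "__main__":
    chunks = split_run(3e7, 20)
    print(len(chunks), chunks[0], chunks[-1])
```

Splitting 3e7 neutrons into 20 chunks gives 1500000 neutrons per chunk, each with its own seed.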

• Use GUI to create instrument – Save As Command

• “Parameter directory” set to “.”

VITESS – Running It

• Submit program parses bat file

• Substitutes ‘V’ and ‘P’

• Removes ‘header’ and ‘footer’

• Creates many new bat files with different ‘--Z’s and

• Submit program creates many bat files

VITESS – Running It

C:\My_GRID\VITESSE\VITESSE\build>Vitess-Submit.exe example_job example.bat req_files 20
logging in to https://bruce.nd.rl.ac.uk:18443/mgsi/rpc_soap.fcgi as tom....
Adding Vitesse dataset....
Adding Vitesse datas....
3e+007 neutrons split into 20 chunks, of -n1500000 neutrons
Total number of Vitesse 'runs' = 20
Uploading data for run #1...
Uploading data for run #2.....
Uploading data for run #19...
Uploading data for run #20...
Adding Vitesse datas to system....
Adding job....
Adding jobstep....
Turning on automatic workunit generation....
Closing jobstep....
All done
Your job_id is 4878

• Web Interface

VITESS – Monitoring It

• Download the ‘chunks’

• Merge Data files

• DetectedNeutrons.dat : concatenate

• vpipes : trajectories & count rate

• Two classes of files
  • 1D
    - Values: sum and divide by number of chunks
    - Errors: square, sum and divide
  • 2D: sum / number of chunks

VITESS – Merging It
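The 1D merge rules can be written out explicitly. The symbols are introduced here and are not from the slides: N chunks of equal neutron count, with chunk k reporting value y_k and error σ_k in a given bin. "Sum and divide" is then the mean over chunks. "Square, sum and divide" is read here as adding the errors in quadrature, which assumes that a square root is taken before dividing by N and that the chunks are statistically independent (different seeds):

```latex
\bar{y} = \frac{1}{N}\sum_{k=1}^{N} y_k,
\qquad
\sigma_{\bar{y}} = \frac{1}{N}\sqrt{\sum_{k=1}^{N} \sigma_k^{2}}
```

For N chunks with similar errors σ, this gives σ/√N, the improvement expected from running N times as many neutrons.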

• Many times faster: linear increase

• Needs verification runs (x3)

• Typically 11 times faster, potentially 30+

• A 12-hour run completes in 1 hour!

• Very large simulations reach random limits

VITESS – Advantages and Problems

VITESS – Some Results

[Figure: Comparison. Neutrons s-1 against time-of-flight (ms), 63.0 to 64.4 ms. Curves: 1 CPU simulation (66 hours) and GRID simulation (6 hours).]

[Further run times shown: 176 hours, 59 hours, 6 hrs 20 mins]

• Different executable for every run

• Executable must be uploaded at run time

• Split –n into chunks

• or run many instances (parameter scan)

• Create data (+ executable) packages

• Upload packages

McStas – Splitting It

• Use McGui to create and compile executable

• Create input file for Submit program

McStas – Running It

• Large run
  • Submit program breaks up –n#####
  • Uploads new command line + data + executable

• Parameter scan
  • Send each run to a separate machine

McStas – Running It
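The parameter-scan mode can be sketched as generating one command line per scan point, each of which would become its own workunit. The executable name `my_instrument.exe` and the parameter names `L1` and `slit` are hypothetical. The `-n` count option and `name=value` parameter form follow the usual command line of a compiled McStas instrument, but treat the exact syntax as an assumption to check against the McStas version in use.

```python
import itertools


def scan_commands(executable, ncount, scan):
    """Build one argument list per point of a parameter scan.

    `scan` maps a parameter name to the list of values to try; every
    combination of values produces one independent run.
    """
    names = sorted(scan)
    commands = []
    for values in itertools.product(*(scan[name] for name in names)):
        args = [executable, "-n", str(ncount)]
        args += [f"{name}={value}" for name, value in zip(names, values)]
        commands.append(args)
    return commands


if __name__ == "__main__":
    scan = {"L1": [1.0, 1.5], "slit": [0.01, 0.02]}  # hypothetical parameters
    for cmd in scan_commands("my_instrument.exe", 1000000, scan):
        print(" ".join(cmd))
```

Unlike the single large run, a scan needs no merging step: each run's output is a separate result.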

• Many output files: separate merge program

• PGPLOT and Matlab implemented
  • Very similar

• PGPLOT
  • 1D – Intensities: sum and divide. Errors: square, sum and divide. Events: sum
  • 2D – Intensities: sum and divide. Errors: square, sum and divide. Events: sum

• Matlab
  • 1D – Same maths, different format
  • 2D – Virtually the same

• ‘Metadata’: leave untouched

McStas – Merging It
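The merge rules for intensities, errors and events can be sketched as follows. The data layout (a list of (intensity, error, events) tuples per chunk) and the name `merge_1d` are assumptions for illustration and not the actual merge program or file format. The sketch also assumes equal-sized chunks with identical binning, and reads "square, sum and divide" as taking the square root of the summed squares before dividing.

```python
import math


def merge_1d(chunks):
    """Merge per-chunk 1D monitor data bin by bin.

    Each chunk is a list of (intensity, error, events) tuples, one per bin.
    """
    n = len(chunks)
    if n == 0:
        raise ValueError("no chunks to merge")
    if len({len(chunk) for chunk in chunks}) != 1:
        raise ValueError("all chunks must have the same number of bins")
    merged = []
    for bins in zip(*chunks):
        intensity = sum(b[0] for b in bins) / n               # sum and divide
        error = math.sqrt(sum(b[1] ** 2 for b in bins)) / n   # quadrature sum, then divide
        events = sum(b[2] for b in bins)                      # events simply add
        merged.append((intensity, error, events))
    return merged


if __name__ == "__main__":
    chunk_a = [(10.0, 3.0, 100), (20.0, 1.0, 50)]
    chunk_b = [(14.0, 4.0, 120), (22.0, 1.0, 70)]
    print(merge_1d([chunk_a, chunk_b]))
```

The same per-bin rule would apply to each pixel of a 2D monitor.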

• Security: Do we trust users?

• 100 times faster[?]

• Linux version much faster than Windows [?]

• How do we merge certain fields?
  • values = '1.44156e+006 10459.9 30748';
  • statistics = 'X0=3.5418; dX=1.52975; Y0=0.000822474; dY=1.0288;';

• Some issue related to randomness of moderator file

McStas – Advantages and Problems

• Expansion
  • Proposal accepted for an additional 400 licenses
  • Giving us a total of 480

• Change in licensing model

Future Developments - Expansion

[Diagram: upgrade path. Stages: Upgrade to 280 licences; Upgrade to unlimited licences for 1 year; MP Insight; Unlimited licences forever; 480 permanent licences. Status labels: Completed, Funded, Seeking funding. Cost labels: $50k, $45k, $50k, $83k.]

• Bottom Line: Costs
  • Setup, server licenses, 80 client licenses + support – $18k – CMSD
  • Total ≈ $250k

• Both VITESS and McStas run well under Grid MP

• Submit & Retrieve programs: a few hours' work

• Merge program: a bit more

• Needs to merge more output formats [?]

• Issues with very large simulations

• More info on Grid MP at www.ud.com

Conclusions