Session presented at Big Data Spain 2012 Conference, 16th Nov 2012, ETSI Telecomunicacion UPM, Madrid. www.bigdataspain.org. More info: http://www.bigdataspain.org/es-2012/conference/cloudMC-a-cloud-computing-map-reduce-implementation-for-radiotherapy/ruben-jimenez-and-hector-miras
CloudMC: A cloud computing map-reduce implementation for radiotherapy
Rubén Jiménez Marrufo, Héctor Miras del Río, Carlos Miras del Río, Carles Gomà Estadella
Big Data Spain, http://www.bigdataspain.org
Madrid, November 16th, 2012
Contents
Introduction
Radiotherapy
Monte Carlo simulations for radiation transport
Monte Carlo parallelization
Clustering vs. Cloud Computing
Cloud Computing for clinical radiation transport
CloudMC
DEMO START
Architecture
Map Reduce
Elasticity
How did Radarc help us?
Results
Is it reinventing the wheel?
Roadmap
DEMO RESULTS
Questions & Answers
Introduction
Héctor Miras del Río: Department of Medical Physics, Virgen Macarena Hospital, Seville, Spain
Rubén Jiménez Marrufo: R&D Division, Icinetic TIC S.L., Seville, Spain
Carlos Miras del Río: R&D Division, Wedoit Innovacion Tecnologica, Seville, Spain
Carles Gomà: Centre for Proton Therapy, Paul Scherrer Institute, Villigen PSI, Switzerland
Introduction
Monte Carlo Simulations
Radiotherapy
Cloud Computing
Radiotherapy
Radiotherapy is the medical use of ionizing radiation, generally as part of cancer treatment, to control or kill malignant cells.
Radiotherapy treatment planning is the process of calculating, prior to radiotherapy, the radiation dose that will be absorbed by the object to be irradiated.
Monte Carlo simulations for radiation transport
+👍 Gold standard algorithms for radiation calculations
- 👎 Extremely computationally intensive and very time-consuming.
Monte Carlo parallelization
Parallelization: run one simulation simultaneously on several nodes and merge the results.
Monte Carlo simulations are highly parallelizable since the primary events are independent.
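The independence of primary events can be illustrated with a toy example. The sketch below is not a radiation transport code: the "simulation" is a hypothetical stand-in that averages exp(-x) over uniform random x, and the seeding scheme (base seed plus node index) is an assumption made for illustration. It only shows that fragments run with different seeds can be merged into one estimate.

```python
import math
import random


def simulate_fragment(histories, seed):
    """Toy stand-in for one node: sum exp(-x) over `histories` random events."""
    rng = random.Random(seed)  # each fragment owns its generator and seed
    total = 0.0
    for _ in range(histories):
        total += math.exp(-rng.random())
    return total, histories


def run_split(total_histories, n, base_seed=12345):
    """Divide the histories among n fragments and merge the partial sums."""
    per_node = total_histories // n
    partials = [simulate_fragment(per_node, base_seed + i) for i in range(n)]
    merged_sum = sum(s for s, _ in partials)
    merged_count = sum(h for _, h in partials)
    return merged_sum / merged_count


if __name__ == "__main__":
    # The exact mean of exp(-x) on [0, 1) is 1 - 1/e, about 0.632.
    print(run_split(100_000, n=10))
```

Because no fragment depends on another, the loop over fragments could run on separate machines without changing the merge step.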
Parallelization: Clustering vs. Cloud Computing
Cloud Computing for clinical radiation calculations
- 100-core cluster ≈ 20 000 €
- Extra-small instance: 0.0142 € / h
- tCPU = 100 h, number of instances n = 100, T(n) = 1.44 h
- Cost / plan: 2 €
- 1000 patients / year, cost / year: 2 000 €
- 160 years of computing time in an extra-small instance
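The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes that instance time is billed pro rata per instance-hour and that one plan is computed per patient. Reading the 160 years as the instance time that the cluster price would buy is an interpretation, not something the slide states.

```python
PRICE_PER_HOUR = 0.0142   # € per extra-small instance hour (from the slide)
N_INSTANCES = 100         # n
CLOCK_TIME_H = 1.44       # T(n) in hours
PLANS_PER_YEAR = 1000     # assumption: one plan per patient
CLUSTER_COST = 20_000     # € for a 100-core cluster

# Every instance is paid for the whole clock time of the simulation.
cost_per_plan = N_INSTANCES * CLOCK_TIME_H * PRICE_PER_HOUR
cost_per_year = cost_per_plan * PLANS_PER_YEAR

# Instance time that the cluster price would buy at the same hourly rate.
hours_for_cluster_price = CLUSTER_COST / PRICE_PER_HOUR
years_for_cluster_price = hours_for_cluster_price / (24 * 365)

print(f"cost per plan: {cost_per_plan:.2f} EUR")     # about 2.04
print(f"cost per year: {cost_per_year:.0f} EUR")     # about 2045
print(f"years of instance time: {years_for_cluster_price:.0f}")  # about 161
```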
CloudMC
CloudMC offers an implementation of map/reduce on the Windows Azure cloud computing platform for the parallelization of MC simulations of radiation therapy dose distributions.
Non-intrusive
Multi-application: PENELOPE, Geant4, EGSnrc
Elasticity: resources are not reserved; a 1-hour simulation costs 1 hour.
CloudMC: DEMO
CloudMC Architecture
[Architecture diagram. Labels: UI, Service Management, Worker Roles, Cloud Storage, Simulation files, Message Queues, Cloud Hosted Services, SQL Azure, Users & Simulation Repositories, Provisioning, MapReduce, Factory, Entities, Services]
CloudMC: MapReduce
Sequence of actions when carrying out a MC simulation on n instances:
1. New simulation
- Simulation metadata is saved on SQL Azure.
- Simulation files are uploaded to Azure Storage.
2. Map
- Generation of n initial independent seeds.
- Mapper: modification of the simulation config to divide the histories by n.
- Provisioning of the n worker roles.
- Sending of n “start” messages.
3. Parallel execution
Every worker role:
  1. Reads a message from the queue and downloads the simulation files.
  2. Executes the “fragmented” simulation.
  3. Sends the results to the storage.
  4. Sends an “end of simulation” message.
4. Reduce
- When the web role has read the n “end of simulation” messages, the Resolver merges the n results uploaded to the storage.
- n-1 worker roles are scaled down.
5. End of simulation
- Finished simulation metadata is saved on SQL Azure.
- The user is notified by mail that the simulation has ended and the results can be downloaded.
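The message flow between the web role and the worker roles (map, parallel execution, reduce) can be mimicked in a single process. The sketch below is only an analogy: threads stand in for worker roles, `queue.Queue` objects for the Azure message queues, a dictionary for cloud storage, and a coin-flip count for the fragmented simulation. All names are hypothetical and none of this is CloudMC code.

```python
import queue
import random
import threading


def worker(start_q, end_q, storage):
    """Stand-in for one worker role."""
    msg = start_q.get()                        # 1. read a message from the queue
    rng = random.Random(msg["seed"])
    hits = sum(rng.random() < 0.5 for _ in range(msg["histories"]))  # 2. toy fragment
    storage[msg["fragment"]] = hits            # 3. send the result to the storage
    end_q.put({"fragment": msg["fragment"]})   # 4. send an "end of simulation" message


def run_simulation(total_histories, n, base_seed=1):
    start_q, end_q, storage = queue.Queue(), queue.Queue(), {}
    per_fragment = total_histories // n

    # Map: provision n workers and send n "start" messages with distinct seeds.
    workers = [
        threading.Thread(target=worker, args=(start_q, end_q, storage))
        for _ in range(n)
    ]
    for w in workers:
        w.start()
    for i in range(n):
        start_q.put({"fragment": i, "seed": base_seed + i, "histories": per_fragment})

    # Reduce: wait for the n end messages, then merge the n stored results.
    for _ in range(n):
        end_q.get()
    for w in workers:
        w.join()
    return sum(storage.values()) / (n * per_fragment)


if __name__ == "__main__":
    print(run_simulation(40_000, n=8))  # fraction of hits, close to 0.5
```

Each fragment's seed travels in its message, so the merged result does not depend on which worker picks up which message.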
CloudMC: Map
Input A: Configuration files
• Simulation parameters
• Histories count
• Geometry & materials files
• …
• MapReduce parameters
Input B: Executable
[Diagram: Executable (Histories: 1015) mapped to several Mapped Executables (Histories: 215)]
Mapper: a parametrized mapper sets the number of histories and the seeds in the input files.
Most MC applications for radiation transport simulation read their configuration from text files.
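Since the configuration lives in text files, a mapper can work as a text rewrite. The sketch below assumes a hypothetical config format with one `HISTORIES <count>` line and one `SEED <value>` line. Real MC codes such as PENELOPE use their own keywords and layouts, and the function and parameter names here are illustrative only.

```python
import re


def map_config(config_text, n, seeds, histories_key="HISTORIES", seed_key="SEED"):
    """Return n config texts, each with histories divided by n and its own seed."""
    if len(seeds) != n:
        raise ValueError("one seed per fragment is required")
    hist_re = re.compile(
        rf"^({re.escape(histories_key)}[ \t]+)(\d+)[ \t]*$", re.MULTILINE
    )
    seed_re = re.compile(
        rf"^({re.escape(seed_key)}[ \t]+)(\d+)[ \t]*$", re.MULTILINE
    )
    match = hist_re.search(config_text)
    if match is None:
        raise ValueError("histories line not found")
    per_fragment = int(match.group(2)) // n

    fragments = []
    for seed in seeds:
        # Rewrite only the two parametrized lines; everything else is untouched.
        text = hist_re.sub(lambda m: f"{m.group(1)}{per_fragment}", config_text)
        text = seed_re.sub(lambda m, s=seed: f"{m.group(1)}{s}", text)
        fragments.append(text)
    return fragments


if __name__ == "__main__":
    config = "TITLE demo\nHISTORIES 1000\nSEED 1\n"
    for fragment in map_config(config, 4, [11, 22, 33, 44]):
        print(fragment)
```

Passing the keywords as parameters mirrors the idea of a parametrized mapper: supporting another application would mean changing parameters rather than code.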
CloudMC: Reduce
The results of MC applications for radiation transport simulation are distribution files (dose, energy or any other magnitude) formatted in columns.
[Diagram: Mapped Executables → Dose distribution files → Output]
Reducer: parametrized reducer to combine columns depending on the column type:
- Magnitude column
- Uncertainty column
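One possible way to combine columns by type is sketched below. The combination rules are assumptions, since the slides do not give them: every fragment is taken to have run the same number of histories, magnitude columns are averaged, and uncertainty columns are treated as absolute standard deviations and combined in quadrature. The "key" column type (for example a coordinate copied unchanged) is a hypothetical addition, and parsing of the column files is omitted.

```python
import math


def reduce_columns(fragment_tables, column_types):
    """Merge row-aligned tables from n fragments into one table."""
    n = len(fragment_tables)
    merged = []
    for rows in zip(*fragment_tables):       # the same row from every fragment
        out = []
        for col, kind in enumerate(column_types):
            values = [row[col] for row in rows]
            if kind == "magnitude":
                out.append(sum(values) / n)   # mean of the n estimates
            elif kind == "uncertainty":
                # standard deviation of the mean of n independent estimates
                out.append(math.sqrt(sum(v * v for v in values)) / n)
            else:                             # "key": identical in all fragments
                out.append(values[0])
        merged.append(out)
    return merged


if __name__ == "__main__":
    fragment_a = [[0.0, 2.0, 0.3], [1.0, 1.0, 0.2]]
    fragment_b = [[0.0, 4.0, 0.4], [1.0, 3.0, 0.2]]
    types = ["key", "magnitude", "uncertainty"]
    for row in reduce_columns([fragment_a, fragment_b], types):
        print(row)
```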
CloudMC: MapReduce DSL
CloudMC uses a MapReduce DSL whose parameters adapt the Mapper and the Reducer to specific MC applications.
Mapper parameters Reducer parameters
CloudMC: Elasticity
Users choose the number of instances to use for each simulation.
CloudMC scales up the worker role to run a simulation and scales it down when the simulation finishes.
Windows Azure Service Management allows role scaling:
👍 REST API
👍 Based on XML config files
👎 Minimum of 1 instance
👎 Impossible to scale down specific instances (multi-tenant)
CloudMC: How did Radarc help us?
[Architecture diagram. Labels: UI, Service Management, Worker Roles, Cloud Storage, Simulation files, Message Queues, User accounts, Cloud Hosted Services, SQL Azure, Users & Simulation Repositories, Provisioning, MapReduce, Factory, Entities, Services, Formula Azure]
≃ 50% generated code:
• ASP.NET MVC 3 UI
• C# App Services
• C# POCO Entities
• EF Code First
• SQL Azure DB
Focus on domain core: map/reduce, provisioning, fault tolerance, etc.
CloudMC: Results
Case study:
- Simulation: I-125 seed in ophthalmic applicator.
- Number of histories: 3·10^9
- MC code: PENELOPE, main program penEasy.
Results:
- Worker instance size: extra-small
- Clock time in 1 instance: 30 h
- Clock time in 64 instances: 48 min (speed-up = 37x)
T(n): Clock time for 1 simulation in n instances.
tCPU: Overall time used only in the simulation of n histories.
Δt0: Non-parallelizable time for 1 instance.
α: Non-parallelizable part of time proportional to n.
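The formula that these symbols belong to did not survive in the slides. A form consistent with the four definitions is T(n) = tCPU / n + (fixed non-parallelizable time) + (overhead per instance) · n, and the sketch below uses it as an assumption rather than as the authors' stated model. The parameter values in the second call are made up for illustration. Only the speed-up check uses the case-study numbers.

```python
import math


def clock_time(n, t_cpu, dt0, a):
    """Assumed model: parallel part plus fixed overhead plus overhead growing with n."""
    return t_cpu / n + dt0 + a * n


def speed_up(t_one_instance, t_n_instances):
    """Ratio of the clock time on 1 instance to the clock time on n instances."""
    return t_one_instance / t_n_instances


def best_instance_count(t_cpu, a):
    """n that minimizes the assumed T(n): set dT/dn = -t_cpu/n**2 + a to zero."""
    return math.sqrt(t_cpu / a)


# Case study: 30 h on 1 instance, 48 min on 64 instances.
case_study = speed_up(30.0, 48 / 60)
print(f"speed-up on 64 instances: {case_study:.1f}x")  # about 37.5, reported as 37x

# Illustrative (made-up) overheads, in hours.
print(clock_time(100, t_cpu=100.0, dt0=0.1, a=0.001))
print(best_instance_count(t_cpu=100.0, a=0.001))
```

Under this assumed model the speed-up stays below n because the two overhead terms do not shrink as instances are added, which is in line with the 37x measured on 64 instances.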
CloudMC: Results
Time vs number of instances study
CloudMC: Is it reinventing the wheel?
Why not use Amazon Elastic MapReduce? (http://aws.amazon.com/es/elasticmapreduce)
• Our mapper and reducer were written for .NET (see http://stackoverflow.com/questions/1190520/is-it-possible-to-write-map-reduce-jobs-for-amazon-elastic-mapreduce-using-net)
Why not use Hadoop on Azure? (http://www.hadooponazure.com)
• First preview released in 2012.
• The cluster size must be reserved.
Roadmap
Testing with more MC applications: Geant4, EGSnrc, etc.
Support packages with specific MapReduce implementations
• Application to different domains
• Use of MEF to provide Mappers and Reducers in simulation packages
SDK to develop specific MapReduce implementation packages.
• Visual Studio templates could facilitate the development of CloudMC packages
Enable multi-tenant environments
• Concurrent simulations require scaling down specific instances, which is not possible on Windows Azure.
Questions
CloudMC soon available at:
https://cloudmontecarlo.cloudapp.net
Thank you for your attention …
@hmiras
@rjimenez