Specialising Generators for High-Performance Monte-Carlo Simulation ... in Haskell

Specialising Simulator Generatorsfor High-Performance Monte-Carlo Methods

Gabriele KellerHugh Chaffey-Millar

Manuel M. T. ChakravartyDon Stewart

Christopher Barner-Kowollik

University of New South Wales

Programming Languages and Systems &Centre for Advanced Macromolecular Design

Don Stewart | Galois Inc Specialising Simulator Generators

Imagine. . .

You are not a computer scientist, but a chemist!• You study new polymers and their efficient production• The search space is too big to try all possible approaches

in the lab

Kinetic models (here, polymerisation of bulk styrene)

Exploration by way of computational chemistry• Turn kinetic models into PDEs• Solve with a deterministic solver, esp. h-p-Galerkin method


Imagine. . .


in the lab




Imagine. . .


in the lab




Solving PDEs gives molecular weight distribution (MWD)

0

2000

4000

6000

8000

10000

12000

14000

0

2500

5000

7500

10000

simulated time/ seconds

0100

200300

400500

600700

800

degreeof

polymerisation

0

7000

14000

concentration/mol L-1

Problems encountered by chemists• Scalability: slow for complex systems (star polymers)• Generality: missing microscopic information:

I multiple chain lengths (i.e., star polymers)I cross-linking densities and branchingI copolymer composition

• Improving both speed and information content seems hard


Solving PDEs gives molecular weight distribution (MWD)

0

2000

4000

6000

8000

10000

12000

14000

0

2500

5000

7500

10000

simulated time/ seconds

0100

200300

400500

600700

800

degreeof

polymerisation

0

7000

14000

concentration/mol L-1

Problems encountered by chemists• Scalability: slow for complex systems (star polymers)• Generality: missing microscopic information:

I multiple chain lengths (i.e., star polymers)I cross-linking densities and branchingI copolymer composition

• Improving both speed and information content seems hard


Back To Being a Programming LanguagesResearcher!

Attack the problem from a different angle• Replace deterministic by stochastic solver• Don’t solve PDEs, use microscopic simulation

Turn the problem into a programming languages problem

Regard kinetic models asprobabilistic rewrite systems

Potential to gain both speed and information content• Microscopic simulation gives microscopic information• Stochastic solvers (Monte-Carlo) are easier to parallelise• We know much about speeding up the execution of RSes














Microscopic simulation of polymerisation kinetics

The soup—aka system state• Multiset of molecules• Subscript is chain length of a polymer• Monomers have no subscript• Star polymers have multiple subscripts

Reactions with rate coefficients—aka probabilistic rewrite rulesReactions:

• Consume one or two molecules• Produce one or two molecules

Rate coefficients:• Reaction probability relative to

molecule concentration

I 7→ I• + I• [kd ]

I• + M 7→ P1 [ki ]

Pn + M 7→ Pn+1 [kp]

Pn + Pm 7→ Dn+m [kt ]


Microscopic simulation of polymerisation kinetics

The soup—aka system state• Multiset of molecules• Subscript is chain length of a polymer• Monomers have no subscript• Star polymers have multiple subscripts

Reactions with rate coefficients—aka probabilistic rewrite rulesReactions:

• Consume one or two molecules• Produce one or two molecules

Rate coefficients:• Reaction probability relative to

molecule concentration

I 7→ I• + I• [kd ]

I• + M 7→ P1 [ki ]

Pn + M 7→ Pn+1 [kp]

Pn + Pm 7→ Dn+m [kt ]


Structure of the simulation

À Compute reaction probabilities, known as rates:I Product of the reaction’s rate coefficient and current

concentration of the reactantsI Stored in reaction rate treeI Determines the probability distribution functionI Rate coefficients are experimentally determined



Á Randomly pick a reactionI Reaction rate tree enables fast reaction selection given a

uniformly distributed random numberI E.g., we might pick Pn + Pm 7→ Pn+m

[NB: we haven’t fixed n and m yet]



Â Randomly pick the moleculesI For monomers, there is nothing to doI For polymers, we need to pick one or more chain lengthsI E.g., For Pn + Pm 7→ Pn+m, pick n and m randomly (from

available lengths)I In some systems, different chain lengths react with different

probabilitiesDon Stewart | Galois Inc Specialising Simulator Generators


Ã Compute reaction productsI Update the concentration of moleculesI E.g., remove Pn and Pm and add Pn+mI Advance system clock (non-trivial)


Specialisation

Reaction kinetics as probabilistic multiset rewriting• Direct implementation of the four simulator steps

=⇒ interpreter for the rewrite system

• Performance is an issue: Let’s compile the rewrite system!

Which part of the simulator is worth compiling?

• Only code that depends on the kinetics model (our“program”)

• The code that updates the system state according to aselected reaction and reactants:

1 Update molecule count—i.e., reactant concentrations2 Adapt reaction probabilities3 Modify reaction rate tree


Specialisation


=⇒ interpreter for the rewrite system• Performance is an issue: Let’s compile the rewrite system!

Which part of the simulator is worth compiling?

• Only code that depends on the kinetics model (our“program”)

• The code that updates the system state according to aselected reaction and reactants:

1 Update molecule count—i.e., reactant concentrations2 Adapt reaction probabilities3 Modify reaction rate tree


Specialisation


=⇒ interpreter for the rewrite system• Performance is an issue: Let’s compile the rewrite system!

Which part of the simulator is worth compiling?• Only code that depends on the kinetics model (our

“program”)• The code that updates the system state according to a

selected reaction and reactants:1 Update molecule count—i.e., reactant concentrations2 Adapt reaction probabilities3 Modify reaction rate tree


Runtime system as template (using CPP to instantiate)

#include "genpolymer.h" // generated by simulator generatorvoid oneReaction () {

// . . . variable declarations omitted. . .

À updateProbabilities ();

Á reactIndex = pickRndReact ();

Â mol1Len = pickRndMol (reactToSpecInd1 (reactIndex));

if (consumesTwoMols (reactIndex))

mol2Len = pickRndMol (reactToSpecInd2 (reactIndex));

Ã switch (reactIndex) // compute reaction productsDO_REACT_BODY // defined in genpolymer.h

incrementMolCnt (resMol1Spec, resMol1Len);

if (resMolCnt == 2)

incrementMolCnt (resMol2Spec, resMol2Len);

advanceSystemTime (); // compute ∆t of this reaction}


Simulator generator in Haskell• Compiles the kinetics model (probabilistic rewrite system)• For example,

I 7→ I• + I• ⇓#define I_Star 2

...

#define DECOMPOSITION 0

...

#define DO_REACT_BODY \

{case DECOMPOSITION:\

resMolCnt = 2;\

resMol1Spec = I_Star;\

resMol2Spec = I_Star;\

break;\

...

genpolymer.h


Haskell

Programming in Haskell• Popular functional language: fast, all functions, no OO!• Pure – no side effects by default• Richly and strongly typed – no runtime type errors• Easy to parallelise, implicit and explicit cheap threads• Algebraic data types – good for symbolic manipulation

Example: data parallel dot product

dotp :: [: Double :] → [: Double :] → Doubledotp v w = sumP (zipWithP (∗) v w)


Specialising Simulator Generator• Input is reaction specification• Custom C runtime for reaction generated from Haskell• All inputs become static constants• C compiler then aggressively optimises the reaction

implementation• Run compiled reaction and inspect results

Portability• Simulator generation easy to retarget: single core,

multicore, clusters• Just port the simulator runtime


Parallelisation

Vanilla Monte-Carlo• Probability distribution function is static• All stochastic events are independent• Embarrassingly parallel

Markov-chain Monte-Carlo• Probability distribution function is dynamic• It is dependent on previous stochastic events• Fully sequential (no parallelism)

Our solution: stirring• Physical reality: only molecules in close proximity can react• Stirring: regularly, exchange and average system state• Optimal stirring frequency dependant on Brownian motion


Parallelisation





Parallelisation





Benchmarks

Specialised versus unspecialised

0

10

20

30

40

50

60

70

107 2*107 3*107 4*107 5*107

Tim

e (s

ec)

Simulator steps

generic simulator (gcc)generic simulator (icc)

specialised simulator (gcc)specialised simulator (icc)

[icc = Intel C Compiler; gcc = GNU C Compiler]


Parallel speedup

1

2

3

4

5

6

7

8

1 2 3 4 5 6 7 8 1

2

3

4

5

6

7

8Re

lativ

e Sp

eedu

p

PE

P4 cluster (1010 particles)Opteron SMP (1010 particles)

P4 cluster (109 particles)Opteron SMP (109 particles)

[P4 cluster = Intel P4, 3.2GHz, with GigaBit Ethernet;Opteron SMP = AMD Athlon 64 3200+, 2.2GHz, with HyperTransport 1.0]


Deterministic versus Monte-Carlo

0

5000

10000

15000

20000

25000

30000

35000

40000

(1)

PREDICIoriginal

0.02

(2)

PREDICIoptimised

0.01

(3)

PREDICIoptimised

0.02

(4)

PREDICIoptimised

0.05

(5)

MC 1010

(6)

MC 109

(7)

MC 1010 / 8

(8)

MC 1010 / 16

(9)

MC 109 / 8

simul

atio

n tim

e / s

econ

ds

10

100

1000

10000

100000

(1) (2) (3) (4) (5) (6) (7) (8) (9)

logarithmic depiction

[PREDICI = commercial coarse-grained h-p-Galerkin simulator;MC with 4 PEs matches PREDICI with 109 particles and accuracy 0.02]


Lessons Learnt

Make it a PL problem• It can be worthwhile to turn seemingly non-PL problems

(shortcomings with deterministic PDE solvers) into PLproblems (multiset rewriting & code specialisation)

Declarative languages for computationally-intensive code• Generative approaches can outperform hand-coded

low-level code• Generators/compilers are a core domain of functional

languages• Why not generate code for your next hard simulator

problem?

Prototyping• Simulator development by prototyping in Haskell (→ paper)


Lessons Learnt






problem?



Lessons Learnt






problem?



Specialisation of inner loops in Monte-Carlo solvers• MC always execute an inner loop very many times• Specialisation is worthwhile whenever significant

parameters are fixed over all or many iterations

Parallelisation of Markov-chain Monte-Carlo• Averaging of parallel system states (i.e., stirring) is

generally applicable in MCMC


Specialisation of inner loops in Monte-Carlo solvers• MC always execute an inner loop very many times• Specialisation is worthwhile whenever significant

parameters are fixed over all or many iterations

Parallelisation of Markov-chain Monte-Carlo• Averaging of parallel system states (i.e., stirring) is

generally applicable in MCMC


Conclusions

Specialising Monte-Carlo simulator generators• Kinetics models as probabilistic rewrite systems• Highly optimised low-level code• Parallelisation of Markov-chain Monte-Carlo• First competitive Monte-Carlo simulator for polymerisation

kinetics, produces microscopic information

Future work• Other polymer structures (meshes)• Application to financial mathematics


Thanks!

Make sure you understand compilers


Documents

Specialising Generators for High-Performance Monte-Carlo Simulation ... in Haskell