51
Detector Simulation Project Prototype SFT meeting, July 11 2011 René Brun 11/07/2011 Detector Simulation Prototype Rene Brun 1

Detector Simulation Project Prototype

Embed Size (px)

DESCRIPTION

Detector Simulation Project Prototype. SFT meeting, July 11 2011 René Brun. My talk. Project context My December talk in a thumbnail Findings in January Progress with our prototype in the past 6 monthes Next steps. Project context. - PowerPoint PPT Presentation

Citation preview

Page 1: Detector Simulation Project Prototype

Detector Simulation Prototype Rene Brun 1

Detector Simulation Project

Prototype SFT meeting, July 11 2011

René Brun

11/07/2011

Page 2: Detector Simulation Project Prototype

My talk

Project context

My December talk in a thumbnail

Findings in January

Progress with our prototype in the past 6 monthes

Next steps

11/07/2011Detector Simulation Prototype Rene Brun 2

Page 3: Detector Simulation Project Prototype

Project context

La Mainaz meeting in Jan 2010->Better synergy between G4&ROOT teams.

Many discussions between April and October 2010.

In November 2010, new Project approuved with more focus on medium and long term.

My understanding that the core G4 team is too busy with the current stuff and unwilling to make a substantial contribution to the project (at least in the short term).

Main work so far between Andrei and me + Federico and Philippe/FNAL interest to join.

Discussions with Atlas (Andi Salzburger) and OpenLab (Alfio).

11/07/2011Detector Simulation Prototype Rene Brun 3

Page 4: Detector Simulation Project Prototype

My December talk in a thumbnail

Increase synergy between G4&ROOT teams.

Particle stack outside G4.

Virtual transporters with concrete instances for fast or/and full simulation, reconstruction,visualization.

Investigation of parallel architectures.

11/07/2011Detector Simulation Prototype Rene Brun 4

Page 5: Detector Simulation Project Prototype

Rene Brun: Next Steps in Detector Simulation 5

Starting Assumptions (I)

The LHC experiments use extensively G4 as main simulation engine. They have invested in validation procedures. Any new project must be coherent with their framework.

One of the reasons why the experiments develop their own fast MC solution is the fact that a full simulation is too slow for several physics analysis. These fast MCs are not in the G4 framework (different control, different geometries, etc), but becoming coherent with the experiments frameworks.

Giving the amount of good work with the G4 physics, it is unthinkable to not capitalize on this work.

16/12/2010

Page 6: Detector Simulation Project Prototype

Rene Brun: Next Steps in Detector Simulation 6

New GEANT in one picture

16/12/2010

EventGenerator

s

GeantEtransporter

EtaPhiGeometrytransporter

G4

transporte

r

G4

physics

InterpretersIO

Graphicsbuild

Abstracttransporte

r

Stackmanag

er

G4

EVEtransporter

Fast MCtransporter

Stdtransporte

r

TGeo

ROOT

Math

GUI

AbstractPhys&X-

secGEANT

Page 7: Detector Simulation Project Prototype

Rene Brun: Next Steps in Detector Simulation 7

Event loop and stacking

16/12/2010

User application

Push primaries

Stack Stack

manager

Current transport

er

Loop over particles

Geometry

navigator

FieldVirtual

transporter

Physics processe

s

Push secondaries

Step manager

Step actions for selected process

User step actions

Current transport

er

Page 8: Detector Simulation Project Prototype

Fast and Full MonteCarlo

We would like an architecture (via the abstract transporters) where fast and full MC can be run together.

To make it possible one must have a separate particle stack.

However, it was clear from the very beginning in January that the particle stack depends strongly on the constraints of parrallelism. Multiple threads cannot update efficiently a tree data structure.

11/07/2011Detector Simulation Prototype Rene Brun 8

Page 9: Detector Simulation Project Prototype

Findings in January

Decide to concentrate on a very small prototype to test our main ideas.

No need to import G4 (at least for some time)

Understanding the geometry of our detectors. We have the real detector geometry of 35 experiments (LHC, LEP, Tevatron, Hera, Babar, etc).

We rapidly concluded that MASSIVE changes are required in the current simulation strategy to take advantage of the new parallel architectures.

In this talk, I will discuss mainly the impact of parrallelism.

11/07/2011Detector Simulation Prototype Rene Brun 9

Page 10: Detector Simulation Project Prototype

Steps/volume in Atlas

11/07/2011Detector Simulation Prototype Rene Brun 10

Huge dynamic range

7100 volume types

29 million instances

Page 11: Detector Simulation Project Prototype

Atlas: volumes sorted/entries

11/07/2011Detector Simulation Prototype Rene Brun 11

50 per cent of the time spent in 50/7100

volumes

Page 12: Detector Simulation Project Prototype

Neighbors/volume in Atlas

11/07/2011Detector Simulation Prototype Rene Brun 12

Volumes with too many neighbors

Page 13: Detector Simulation Project Prototype

Neighbors/volume in CMS

11/07/2011Detector Simulation Prototype Rene Brun 13

Same problem with CMS

Page 14: Detector Simulation Project Prototype

LHCB geometry statistics

11/07/2011Detector Simulation Prototype Rene Brun 14

Better situation with neighbors because of a non

cylindrical geometry

90 per cent of steps in 50/700

volumes

Page 15: Detector Simulation Project Prototype

Conventional Transport

11/07/2011Detector Simulation Prototype Rene Brun 15

oo

o

o

oo

o

o

o

o

ooo

o

oo

o oo

o

o

o

T1

T3

T2

o

o

o

oo

oo

o

o

o

o

ooo

o

oo

oo

oT4

Each particle tracked step by step through hundreds of volumes

when all hits for all tracks are in

memory summable digits

are computed

Page 16: Detector Simulation Project Prototype

Conventional Transport

At each step, the navigator *nav has the state of the particle x,y,z,px,py,pz, the volume instance volume*, etc.

We compute the distance to the next boundary with something likeDist = nav->DistoOut(volume,x,y,z,px,py,pz)

Or the distance to one physics process with, egDistp = nav-

>DistPhotoEffect(volume,x,y,z,px,py,pz)

11/07/2011Detector Simulation Prototype Rene Brun 16

Page 17: Detector Simulation Project Prototype

Voxelisation (naive view)

11/07/2011Detector Simulation Prototype Rene Brun 17

To speedup the « where am I » and

« where I go » a 2d/3d grid of voxels

is computed.Each voxel has a list

of the included volumes (or a

bitmap)

Page 18: Detector Simulation Project Prototype

Voxelisation 2

11/07/2011Detector Simulation Prototype Rene Brun 18

Very often there is a mismatch between the

voxel array and the detector

geometry (too many

volumes/voxel)

Page 19: Detector Simulation Project Prototype

Voxelisation 3

11/07/2011Detector Simulation Prototype Rene Brun 19

One solution is to decrease the

voxel size, but this requires more

memory and contribute to the

level 2 cache misses.

Page 20: Detector Simulation Project Prototype

11/07/2011Detector Simulation Prototype Rene Brun 20

Page 21: Detector Simulation Project Prototype

11/07/2011Detector Simulation Prototype Rene Brun 21

Page 22: Detector Simulation Project Prototype

11/07/2011Detector Simulation Prototype Rene Brun 22

Page 23: Detector Simulation Project Prototype

parallelism

11/07/2011Detector Simulation Prototype Rene Brun 23

From a recent talk by Intel

Page 24: Detector Simulation Project Prototype

If you trust Intel

11/07/2011Detector Simulation Prototype Rene Brun 24

Page 25: Detector Simulation Project Prototype

If you trust Intel 2

11/07/2011Detector Simulation Prototype Rene Brun 25

Page 26: Detector Simulation Project Prototype

Current Situation

We run jobs in parallel, one per core.

Nothing wrong with that except that it does not scale in case of many cores because it requires too much memory.

A multithreaded version may reduce (say by a factor 2 or 3) the amount of required memory, but also at the expense of performance.

A multithreaded version does not fit well with a hierarchy of processors.

So, we have a problem, in particular with the way we have designed some data structures, eg HepMC.

11/07/2011Detector Simulation Prototype Rene Brun 26

Page 27: Detector Simulation Project Prototype

Back to Vectorization !!

We spent a few years between 1985 and 1990 trying to vectorize GEANT3 on CRAY, Cyber205, Eta10 and IBM3090.

We failed!! The gain in using vectors was lost in the memory management on small memory systems (machines had less than 8 Mbytes of memory!). And when looking a posteriori, our strategy was not the right one.

However, one may learn something positive from this failure.

11/07/2011Detector Simulation Prototype Rene Brun 27

Page 28: Detector Simulation Project Prototype

11/07/2011Detector Simulation Prototype Rene Brun 28

Page 29: Detector Simulation Project Prototype

Data Structures & parallelism

11/07/2011Detector Simulation Prototype Rene Brun 29

eventevent

vertices

tracks

C++ pointersspecific to a process

Copying the structure implies a

relocation of all pointers

I/O is a nightmare

Update of the structure from a different thread implies a

lock/mutexx

Page 30: Detector Simulation Project Prototype

Can we make progress?

We need data structures with internal relations only. This can be implemented by using pools and indices.

When looping on collections, one must avoid the navigation in large memory areas killing the cache.

We must generate vectors of reasonable size well matched to the degree of parallelism of the hardware and the amount of memory.

We must find a system to avoid the tail effects

11/07/2011Detector Simulation Prototype Rene Brun 30

Page 31: Detector Simulation Project Prototype

tails, tails, tails

11/07/2011Detector Simulation Prototype Rene Brun 31

Page 32: Detector Simulation Project Prototype

Tails again

11/07/2011Detector Simulation Prototype Rene Brun 32

A killer if one has to wait the end of col(i) before

processing col(i+1)

Average number of objects in

memory

Page 33: Detector Simulation Project Prototype

New Transport Scheme

11/07/2011Detector Simulation Prototype Rene Brun 33

oo

o

o

oo

o

o

o

o

ooo

o

oo

o oo

o

o

o

T1

T3

T2

o

o

o

oo

oo

o

o

o

o

ooo

o

oo

oo

oT4

All particles in the same volume type are transported in

parallel.Particles entering new volumes or generated are

accumulated in the volume basket.

Events for which all hits are

available are digitized in

parallel

Page 34: Detector Simulation Project Prototype

New Transport

At each step, the navigator *nav has the state of the particles *x,*y,*z,*px,*py,*pz, the volume instances volume**, etc.

We compute the distances (array *Dist) to the next boundaries with something likenav->DistoOut(volume,x,y,z,px,py,pz,Dist)

Or the distances to one physics process with, egnav->DistPhotoEffect(volume,x,y,z,px,py,pz,DispP)

11/07/2011Detector Simulation Prototype Rene Brun 34

Page 35: Detector Simulation Project Prototype

New Transport

The new transport system implies many changes in the geometry and physics classes. These classes must be vectorized (a lot of work!).

Meanwhile we can survive and test the principle by implementing a bridge function like

11/07/2011Detector Simulation Prototype Rene Brun 35

MyNavigator::DisttoOut(int n, TGeoVolume **vol, double *x,..) { for int i=0;i<n;i++) { Dist[i] = DisttoOutOld(vol[i],x[i],…); } }

Page 36: Detector Simulation Project Prototype

A better solution

11/07/2011Detector Simulation Prototype Rene Brun 36

Pipeline of objects

CheckpointSynchronization.

Only 1 « gap » every N events

This type of solution required

anyhow for pile-up studies

Page 37: Detector Simulation Project Prototype

A better better solution

11/07/2011Detector Simulation Prototype Rene Brun 37

checkpoints At each checkpoint we have to keep the

non finished objects/events.

We can now digitize with parallelism on events, clear and reuse the slots.

Page 38: Detector Simulation Project Prototype

Generations of baskets

When a particle enters a volume or is generated, it is added to the basket of particles for the volume type.

The navigator selects the basket with the highest score (with a high and low water mark algorithm).

The user has the control on the water marks, but the idea that this should be automatic in function of the number of processors and the total amount of memory available. (see interactive demo)

11/07/2011Detector Simulation Prototype Rene Brun 38

Page 39: Detector Simulation Project Prototype

11/07/2011Detector Simulation Prototype Rene Brun 39

Page 40: Detector Simulation Project Prototype

11/07/2011Detector Simulation Prototype Rene Brun 40

Page 41: Detector Simulation Project Prototype

Vectorizing : work to do

11/07/2011Detector Simulation Prototype Rene Brun 41

Page 42: Detector Simulation Project Prototype

Vectorizing the geometry (ex1)

11/07/2011Detector Simulation Prototype Rene Brun 42

Double_t TGeoPara::Safety(Double_t *point, Bool_t in) const{ // computes the closest distance from given point to this shape. Double_t saf[3]; // distance from point to higher Z face saf[0] = fZ-TMath::Abs(point[2]); // Z

Double_t yt = point[1]-fTyz*point[2]; saf[1] = fY-TMath::Abs(yt); // Y // cos of angle YZ Double_t cty = 1.0/TMath::Sqrt(1.0+fTyz*fTyz);

Double_t xt = point[0]-fTxz*point[2]-fTxy*yt; saf[2] = fX-TMath::Abs(xt); // X // cos of angle XZ Double_t ctx = 1.0/TMath::Sqrt(1.0+fTxy*fTxy+fTxz*fTxz); saf[2] *= ctx; saf[1] *= cty; if (in) return saf[TMath::LocMin(3,saf)]; for (Int_t i=0; i<3; i++) saf[i]=-saf[i]; return saf[TMath::LocMax(3,saf)];}

Huge performance gain expected in this type of code

where shape constants can be computed outside

the loop

Page 43: Detector Simulation Project Prototype

Vectorizing the geometry (ex2)

11/07/2011Detector Simulation Prototype Rene Brun 43

G4double G4Cons::DistanceToIn( const G4ThreeVector& p, const G4ThreeVector& v ) const{ G4double snxt = kInfinity ; // snxt = default return value const G4double dRmax = 100*std::min(fRmax1,fRmax2); static const G4double halfCarTolerance=kCarTolerance*0.5; static const G4double halfRadTolerance=kRadTolerance*0.5;

G4double tanRMax,secRMax,rMaxAv,rMaxOAv ; // Data for cones G4double tanRMin,secRMin,rMinAv,rMinOAv ; G4double rout,rin ;

G4double tolORMin,tolORMin2,tolIRMin,tolIRMin2 ; // `generous' radii squared G4double tolORMax2,tolIRMax,tolIRMax2 ; G4double tolODz,tolIDz ;

G4double Dist,s,xi,yi,zi,ri=0.,risec,rhoi2,cosPsi ; // Intersection point vars

G4double t1,t2,t3,b,c,d ; // Quadratic solver variables G4double nt1,nt2,nt3 ; G4double Comp ;

G4ThreeVector Normal;

// Cone Precalcs

tanRMin = (fRmin2 - fRmin1)*0.5/fDz ; secRMin = std::sqrt(1.0 + tanRMin*tanRMin) ; rMinAv = (fRmin1 + fRmin2)*0.5 ;

if (rMinAv > halfRadTolerance) { rMinOAv = rMinAv - halfRadTolerance ; } else { rMinOAv = 0.0 ; } tanRMax = (fRmax2 - fRmax1)*0.5/fDz ; secRMax = std::sqrt(1.0 + tanRMax*tanRMax) ; rMaxAv = (fRmax1 + fRmax2)*0.5 ; rMaxOAv = rMaxAv + halfRadTolerance ; // Intersection with z-surfaces

tolIDz = fDz - halfCarTolerance ; tolODz = fDz + halfCarTolerance ;

…… //here starts the real algorithm

Huge performance gain expected in this type of code

where shape constants can be computed outside

the loop

All these statements

are independent

of the particle !!!

Page 44: Detector Simulation Project Prototype

Vectorizing the Physics

This is going to be more difficult when extracting the physics classes from G4. However important gains are expected in the functions computing the distance to the next interaction point for each process.

There is a diversity of interfaces and we have now sub-branches per particle type.

11/07/2011Detector Simulation Prototype Rene Brun 44

Page 45: Detector Simulation Project Prototype

Next Steps

Consolidation of the prototype.

Implementation of the sliding objects.

Web site construction with a description of the current status and goals.

Thread safety of TGeo

Vectorization of TGeo (at least a critical subpart)

Discussion with the G4 team about the consequences for the G4 physics classes.

Discussion of the leadership and membership.

11/07/2011Detector Simulation Prototype Rene Brun 45

Page 46: Detector Simulation Project Prototype

Acknowledgements

Several people have contributed to discussions.

Alfio Lazzaro (OpenLab) expert on parallelism

Sverre Jarp (Open Lab) expert on many topics

Philippe Canal

Andi Salsburger for the discussions and interest about the fast MC implementation.

Axel, Bertrand, Fons for valuable comments

11/07/2011Detector Simulation Prototype Rene Brun 46

Page 47: Detector Simulation Project Prototype

Geometry of solids

In parallel to this activity, but complementary.

The project between the core G4 + Andrei Gheata and Jean-Marie Guyader is progressing. Among the goals:

Common interface for the basic solids with bridges for TGeo and the G4 geometry.

Development of the best possible algorithms to find distances, etc.

11/07/2011Detector Simulation Prototype Rene Brun 47

Page 48: Detector Simulation Project Prototype

Comparing Sequential & Parallel versions

A stress suite using our detector data bases has been setup to compare several parameters of the sequential and parallel versions.

We do not want to wait the last minute to test the relative performances.

So far the goal has been to design data structures that do not penalize the parallel version (even on one single thread and no vectorisation).

See an example in the following slide.

11/07/2011Detector Simulation Prototype Rene Brun 48

Page 49: Detector Simulation Project Prototype

11/07/2011Detector Simulation Prototype Rene Brun 49

Page 50: Detector Simulation Project Prototype

11/07/2011Detector Simulation Prototype Rene Brun 50

NewTransport

code

Original idea

G4

Page 51: Detector Simulation Project Prototype

11/07/2011Detector Simulation Prototype Rene Brun 51

Create a new architecturedesigned for

parallelism with full or/and fast

simulation