Upload
amery-marshall
View
37
Download
1
Tags:
Embed Size (px)
DESCRIPTION
Detector Simulation Project Prototype. SFT meeting, July 11 2011 René Brun. My talk. Project context My December talk in a thumbnail Findings in January Progress with our prototype in the past 6 monthes Next steps. Project context. - PowerPoint PPT Presentation
Citation preview
Detector Simulation Prototype Rene Brun 1
Detector Simulation Project
Prototype SFT meeting, July 11 2011
René Brun
11/07/2011
My talk
Project context
My December talk in a thumbnail
Findings in January
Progress with our prototype in the past 6 monthes
Next steps
11/07/2011Detector Simulation Prototype Rene Brun 2
Project context
La Mainaz meeting in Jan 2010->Better synergy between G4&ROOT teams.
Many discussions between April and October 2010.
In November 2010, new Project approuved with more focus on medium and long term.
My understanding that the core G4 team is too busy with the current stuff and unwilling to make a substantial contribution to the project (at least in the short term).
Main work so far between Andrei and me + Federico and Philippe/FNAL interest to join.
Discussions with Atlas (Andi Salzburger) and OpenLab (Alfio).
11/07/2011Detector Simulation Prototype Rene Brun 3
My December talk in a thumbnail
Increase synergy between G4&ROOT teams.
Particle stack outside G4.
Virtual transporters with concrete instances for fast or/and full simulation, reconstruction,visualization.
Investigation of parallel architectures.
11/07/2011Detector Simulation Prototype Rene Brun 4
Rene Brun: Next Steps in Detector Simulation 5
Starting Assumptions (I)
The LHC experiments use extensively G4 as main simulation engine. They have invested in validation procedures. Any new project must be coherent with their framework.
One of the reasons why the experiments develop their own fast MC solution is the fact that a full simulation is too slow for several physics analysis. These fast MCs are not in the G4 framework (different control, different geometries, etc), but becoming coherent with the experiments frameworks.
Giving the amount of good work with the G4 physics, it is unthinkable to not capitalize on this work.
16/12/2010
Rene Brun: Next Steps in Detector Simulation 6
New GEANT in one picture
16/12/2010
EventGenerator
s
GeantEtransporter
EtaPhiGeometrytransporter
G4
transporte
r
G4
physics
InterpretersIO
Graphicsbuild
Abstracttransporte
r
Stackmanag
er
G4
EVEtransporter
Fast MCtransporter
Stdtransporte
r
TGeo
ROOT
Math
GUI
AbstractPhys&X-
secGEANT
Rene Brun: Next Steps in Detector Simulation 7
Event loop and stacking
16/12/2010
User application
Push primaries
Stack Stack
manager
Current transport
er
Loop over particles
Geometry
navigator
FieldVirtual
transporter
Physics processe
s
Push secondaries
Step manager
Step actions for selected process
User step actions
Current transport
er
Fast and Full MonteCarlo
We would like an architecture (via the abstract transporters) where fast and full MC can be run together.
To make it possible one must have a separate particle stack.
However, it was clear from the very beginning in January that the particle stack depends strongly on the constraints of parrallelism. Multiple threads cannot update efficiently a tree data structure.
11/07/2011Detector Simulation Prototype Rene Brun 8
Findings in January
Decide to concentrate on a very small prototype to test our main ideas.
No need to import G4 (at least for some time)
Understanding the geometry of our detectors. We have the real detector geometry of 35 experiments (LHC, LEP, Tevatron, Hera, Babar, etc).
We rapidly concluded that MASSIVE changes are required in the current simulation strategy to take advantage of the new parallel architectures.
In this talk, I will discuss mainly the impact of parrallelism.
11/07/2011Detector Simulation Prototype Rene Brun 9
Steps/volume in Atlas
11/07/2011Detector Simulation Prototype Rene Brun 10
Huge dynamic range
7100 volume types
29 million instances
Atlas: volumes sorted/entries
11/07/2011Detector Simulation Prototype Rene Brun 11
50 per cent of the time spent in 50/7100
volumes
Neighbors/volume in Atlas
11/07/2011Detector Simulation Prototype Rene Brun 12
Volumes with too many neighbors
Neighbors/volume in CMS
11/07/2011Detector Simulation Prototype Rene Brun 13
Same problem with CMS
LHCB geometry statistics
11/07/2011Detector Simulation Prototype Rene Brun 14
Better situation with neighbors because of a non
cylindrical geometry
90 per cent of steps in 50/700
volumes
Conventional Transport
11/07/2011Detector Simulation Prototype Rene Brun 15
oo
o
o
oo
o
o
o
o
ooo
o
oo
o oo
o
o
o
T1
T3
T2
o
o
o
oo
oo
o
o
o
o
ooo
o
oo
oo
oT4
Each particle tracked step by step through hundreds of volumes
when all hits for all tracks are in
memory summable digits
are computed
Conventional Transport
At each step, the navigator *nav has the state of the particle x,y,z,px,py,pz, the volume instance volume*, etc.
We compute the distance to the next boundary with something likeDist = nav->DistoOut(volume,x,y,z,px,py,pz)
Or the distance to one physics process with, egDistp = nav-
>DistPhotoEffect(volume,x,y,z,px,py,pz)
11/07/2011Detector Simulation Prototype Rene Brun 16
Voxelisation (naive view)
11/07/2011Detector Simulation Prototype Rene Brun 17
To speedup the « where am I » and
« where I go » a 2d/3d grid of voxels
is computed.Each voxel has a list
of the included volumes (or a
bitmap)
Voxelisation 2
11/07/2011Detector Simulation Prototype Rene Brun 18
Very often there is a mismatch between the
voxel array and the detector
geometry (too many
volumes/voxel)
Voxelisation 3
11/07/2011Detector Simulation Prototype Rene Brun 19
One solution is to decrease the
voxel size, but this requires more
memory and contribute to the
level 2 cache misses.
11/07/2011Detector Simulation Prototype Rene Brun 20
11/07/2011Detector Simulation Prototype Rene Brun 21
11/07/2011Detector Simulation Prototype Rene Brun 22
parallelism
11/07/2011Detector Simulation Prototype Rene Brun 23
From a recent talk by Intel
If you trust Intel
11/07/2011Detector Simulation Prototype Rene Brun 24
If you trust Intel 2
11/07/2011Detector Simulation Prototype Rene Brun 25
Current Situation
We run jobs in parallel, one per core.
Nothing wrong with that except that it does not scale in case of many cores because it requires too much memory.
A multithreaded version may reduce (say by a factor 2 or 3) the amount of required memory, but also at the expense of performance.
A multithreaded version does not fit well with a hierarchy of processors.
So, we have a problem, in particular with the way we have designed some data structures, eg HepMC.
11/07/2011Detector Simulation Prototype Rene Brun 26
Back to Vectorization !!
We spent a few years between 1985 and 1990 trying to vectorize GEANT3 on CRAY, Cyber205, Eta10 and IBM3090.
We failed!! The gain in using vectors was lost in the memory management on small memory systems (machines had less than 8 Mbytes of memory!). And when looking a posteriori, our strategy was not the right one.
However, one may learn something positive from this failure.
11/07/2011Detector Simulation Prototype Rene Brun 27
11/07/2011Detector Simulation Prototype Rene Brun 28
Data Structures & parallelism
11/07/2011Detector Simulation Prototype Rene Brun 29
eventevent
vertices
tracks
C++ pointersspecific to a process
Copying the structure implies a
relocation of all pointers
I/O is a nightmare
Update of the structure from a different thread implies a
lock/mutexx
Can we make progress?
We need data structures with internal relations only. This can be implemented by using pools and indices.
When looping on collections, one must avoid the navigation in large memory areas killing the cache.
We must generate vectors of reasonable size well matched to the degree of parallelism of the hardware and the amount of memory.
We must find a system to avoid the tail effects
11/07/2011Detector Simulation Prototype Rene Brun 30
tails, tails, tails
11/07/2011Detector Simulation Prototype Rene Brun 31
Tails again
11/07/2011Detector Simulation Prototype Rene Brun 32
A killer if one has to wait the end of col(i) before
processing col(i+1)
Average number of objects in
memory
New Transport Scheme
11/07/2011Detector Simulation Prototype Rene Brun 33
oo
o
o
oo
o
o
o
o
ooo
o
oo
o oo
o
o
o
T1
T3
T2
o
o
o
oo
oo
o
o
o
o
ooo
o
oo
oo
oT4
All particles in the same volume type are transported in
parallel.Particles entering new volumes or generated are
accumulated in the volume basket.
Events for which all hits are
available are digitized in
parallel
New Transport
At each step, the navigator *nav has the state of the particles *x,*y,*z,*px,*py,*pz, the volume instances volume**, etc.
We compute the distances (array *Dist) to the next boundaries with something likenav->DistoOut(volume,x,y,z,px,py,pz,Dist)
Or the distances to one physics process with, egnav->DistPhotoEffect(volume,x,y,z,px,py,pz,DispP)
11/07/2011Detector Simulation Prototype Rene Brun 34
New Transport
The new transport system implies many changes in the geometry and physics classes. These classes must be vectorized (a lot of work!).
Meanwhile we can survive and test the principle by implementing a bridge function like
11/07/2011Detector Simulation Prototype Rene Brun 35
MyNavigator::DisttoOut(int n, TGeoVolume **vol, double *x,..) { for int i=0;i<n;i++) { Dist[i] = DisttoOutOld(vol[i],x[i],…); } }
A better solution
11/07/2011Detector Simulation Prototype Rene Brun 36
Pipeline of objects
CheckpointSynchronization.
Only 1 « gap » every N events
This type of solution required
anyhow for pile-up studies
A better better solution
11/07/2011Detector Simulation Prototype Rene Brun 37
checkpoints At each checkpoint we have to keep the
non finished objects/events.
We can now digitize with parallelism on events, clear and reuse the slots.
Generations of baskets
When a particle enters a volume or is generated, it is added to the basket of particles for the volume type.
The navigator selects the basket with the highest score (with a high and low water mark algorithm).
The user has the control on the water marks, but the idea that this should be automatic in function of the number of processors and the total amount of memory available. (see interactive demo)
11/07/2011Detector Simulation Prototype Rene Brun 38
11/07/2011Detector Simulation Prototype Rene Brun 39
11/07/2011Detector Simulation Prototype Rene Brun 40
Vectorizing : work to do
11/07/2011Detector Simulation Prototype Rene Brun 41
Vectorizing the geometry (ex1)
11/07/2011Detector Simulation Prototype Rene Brun 42
Double_t TGeoPara::Safety(Double_t *point, Bool_t in) const{ // computes the closest distance from given point to this shape. Double_t saf[3]; // distance from point to higher Z face saf[0] = fZ-TMath::Abs(point[2]); // Z
Double_t yt = point[1]-fTyz*point[2]; saf[1] = fY-TMath::Abs(yt); // Y // cos of angle YZ Double_t cty = 1.0/TMath::Sqrt(1.0+fTyz*fTyz);
Double_t xt = point[0]-fTxz*point[2]-fTxy*yt; saf[2] = fX-TMath::Abs(xt); // X // cos of angle XZ Double_t ctx = 1.0/TMath::Sqrt(1.0+fTxy*fTxy+fTxz*fTxz); saf[2] *= ctx; saf[1] *= cty; if (in) return saf[TMath::LocMin(3,saf)]; for (Int_t i=0; i<3; i++) saf[i]=-saf[i]; return saf[TMath::LocMax(3,saf)];}
Huge performance gain expected in this type of code
where shape constants can be computed outside
the loop
Vectorizing the geometry (ex2)
11/07/2011Detector Simulation Prototype Rene Brun 43
G4double G4Cons::DistanceToIn( const G4ThreeVector& p, const G4ThreeVector& v ) const{ G4double snxt = kInfinity ; // snxt = default return value const G4double dRmax = 100*std::min(fRmax1,fRmax2); static const G4double halfCarTolerance=kCarTolerance*0.5; static const G4double halfRadTolerance=kRadTolerance*0.5;
G4double tanRMax,secRMax,rMaxAv,rMaxOAv ; // Data for cones G4double tanRMin,secRMin,rMinAv,rMinOAv ; G4double rout,rin ;
G4double tolORMin,tolORMin2,tolIRMin,tolIRMin2 ; // `generous' radii squared G4double tolORMax2,tolIRMax,tolIRMax2 ; G4double tolODz,tolIDz ;
G4double Dist,s,xi,yi,zi,ri=0.,risec,rhoi2,cosPsi ; // Intersection point vars
G4double t1,t2,t3,b,c,d ; // Quadratic solver variables G4double nt1,nt2,nt3 ; G4double Comp ;
G4ThreeVector Normal;
// Cone Precalcs
tanRMin = (fRmin2 - fRmin1)*0.5/fDz ; secRMin = std::sqrt(1.0 + tanRMin*tanRMin) ; rMinAv = (fRmin1 + fRmin2)*0.5 ;
if (rMinAv > halfRadTolerance) { rMinOAv = rMinAv - halfRadTolerance ; } else { rMinOAv = 0.0 ; } tanRMax = (fRmax2 - fRmax1)*0.5/fDz ; secRMax = std::sqrt(1.0 + tanRMax*tanRMax) ; rMaxAv = (fRmax1 + fRmax2)*0.5 ; rMaxOAv = rMaxAv + halfRadTolerance ; // Intersection with z-surfaces
tolIDz = fDz - halfCarTolerance ; tolODz = fDz + halfCarTolerance ;
…… //here starts the real algorithm
Huge performance gain expected in this type of code
where shape constants can be computed outside
the loop
All these statements
are independent
of the particle !!!
Vectorizing the Physics
This is going to be more difficult when extracting the physics classes from G4. However important gains are expected in the functions computing the distance to the next interaction point for each process.
There is a diversity of interfaces and we have now sub-branches per particle type.
11/07/2011Detector Simulation Prototype Rene Brun 44
Next Steps
Consolidation of the prototype.
Implementation of the sliding objects.
Web site construction with a description of the current status and goals.
Thread safety of TGeo
Vectorization of TGeo (at least a critical subpart)
Discussion with the G4 team about the consequences for the G4 physics classes.
Discussion of the leadership and membership.
11/07/2011Detector Simulation Prototype Rene Brun 45
Acknowledgements
Several people have contributed to discussions.
Alfio Lazzaro (OpenLab) expert on parallelism
Sverre Jarp (Open Lab) expert on many topics
Philippe Canal
Andi Salsburger for the discussions and interest about the fast MC implementation.
Axel, Bertrand, Fons for valuable comments
11/07/2011Detector Simulation Prototype Rene Brun 46
Geometry of solids
In parallel to this activity, but complementary.
The project between the core G4 + Andrei Gheata and Jean-Marie Guyader is progressing. Among the goals:
Common interface for the basic solids with bridges for TGeo and the G4 geometry.
Development of the best possible algorithms to find distances, etc.
11/07/2011Detector Simulation Prototype Rene Brun 47
Comparing Sequential & Parallel versions
A stress suite using our detector data bases has been setup to compare several parameters of the sequential and parallel versions.
We do not want to wait the last minute to test the relative performances.
So far the goal has been to design data structures that do not penalize the parallel version (even on one single thread and no vectorisation).
See an example in the following slide.
11/07/2011Detector Simulation Prototype Rene Brun 48
11/07/2011Detector Simulation Prototype Rene Brun 49
11/07/2011Detector Simulation Prototype Rene Brun 50
NewTransport
code
Original idea
G4
11/07/2011Detector Simulation Prototype Rene Brun 51
Create a new architecturedesigned for
parallelism with full or/and fast
simulation