Progress report on the alignment of the tracking system
A. Bonissent, D. Fouchez, A. Tilquin
CPPM Marseille
•Mechanical constraints from optical measurement
•Inversion of big matrix in a decent execution time
•Pixel barrel as a test bench for the alignment code
March 6, 2003
Mechanical constraints
Aleph case:
•Basic unit was a face composed of six silicon wafers, rigidly glued to each other and to an Omega structure
•During construction, 3-dimensional coordinate measurements were made at each corner and on each wafer
•The invariance of the locations of neighbouring corners was taken into account by projecting the linear system of equations into the orthogonal subspace
40% reduction of the number of degrees of freedom
Stabilisation of the poorly constrained end wafer
Improvement of the final resolution
We investigated such implementation for the ATLAS pixels
Projection factor and execution speed
With similar approach:
Two degrees of freedom per junction between two modules can be eliminated: 2 translations along and perpendicular to the stave
12*3=36 degrees of freedom out of a total of 13*6: 40% reduction
We can expect a gain, since inversion is an N^3 process.
But do not forget the projection in subspace, and the backward projection to the initial space.
[Figure: execution time as a function of 1 - reduction factor, for full inversion and for projection; "worse" and "better" regions marked]
Start to gain with a reduction of at least 60% for 60 modules, and more than 90% for the pixel barrel.
Projection is not justified by execution time
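The trade-off discussed on this slide (a smaller system to solve, at the price of projecting into the subspace and back) can be sketched with a small NumPy example. This is only an illustration under assumptions, not the ALEPH or ATLAS code: the matrix `A`, the vector `b` and the constraint matrix `C` are made-up toy inputs, and the subspace basis is taken from an SVD of `C`.

```python
import numpy as np

def solve_with_constraints(A, b, C):
    """Solve A x = b restricted to the subspace where C x = 0.

    Toy sketch: the subspace basis Z comes from the SVD of C.
    """
    _, s, vt = np.linalg.svd(C)
    rank = int(np.sum(s > 1e-12 * s.max()))
    Z = vt[rank:].T                      # orthonormal basis, shape (N, N - rank)
    A_red = Z.T @ A @ Z                  # projection into the subspace
    b_red = Z.T @ b
    y = np.linalg.solve(A_red, b_red)    # smaller system to solve
    return Z @ y, A_red.shape[0]         # backward projection to the initial space

rng = np.random.default_rng(0)
N = 12
M = rng.normal(size=(N, N))
A = M @ M.T + N * np.eye(N)              # toy symmetric positive-definite matrix
b = rng.normal(size=N)

# Hypothetical constraints: parameters 0 and 1 move together, as do 2 and 3.
C = np.zeros((2, N))
C[0, 0], C[0, 1] = 1.0, -1.0
C[1, 2], C[1, 3] = 1.0, -1.0

x, n_reduced = solve_with_constraints(A, b, C)
print(n_reduced, np.abs(C @ x).max())
```

The two extra matrix products (`Z.T @ A @ Z` and `Z @ y`) are the projection costs the slide warns about: they scale like the full problem, so a modest reduction in dimension does not pay for them.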
Projection and precision
The projection technique is equivalent to completely neglecting the errors on the optical measurements
Measurements have to be much better than the final resolution
Optical measurements:
•Done in the lab with an accuracy of ~1 micron
•But at room temperature and before transportation to CERN! Temperature effects are difficult to evaluate.
•Mechanical junction between modules not rigid enough to guarantee movements smaller than a few microns.
•No mechanical measurement after installation and during operation
The full alignment would have to be done at the beginning:
•To get the correct position
•To verify the stability of the mechanical constraints
Unstable mechanical constraint may introduce artificial distortion
Conclusions: Mechanical constraints not justified (at the beginning)
Matrix inversion
Our last conclusions, using the Millepede package on a 1.4 GHz PC, were:
Pixels Barrel 5 hours
Whole pixels 8 hours
SCT alone 2.5 days
Whole ID 10 days
For only one iteration
•In 2007 we may expect a reduction by a factor of 5 to 10 in execution time due to the increase in PC power.
•But still too long. The number of iterations is at least 3 (~10 is more realistic):
•Non-linearity: 2 iterations
•Minimum check: $\chi^2_i - \chi^2_{i+1}$
•Tests before 2007 will be difficult with this execution time
New ideas and new technologies should be investigated
New investigations
The general problem can be split into 2 independent parts:
•Solve the linear system without inversion, and iterate: faster (~2), more accurate and more robust
•Invert the big matrix at the end to get the errors: not really necessary if, as in ALEPH, the errors due to alignment are small compared to the intrinsic ones
But a factor of 2 is still not enough
The new technology is parallelism
For that purpose we have investigated two freeware packages:
HPL: High Performance Linpack (solver)
ScaLAPACK: Scalable LAPACK (inversion)
Both emulate a massively parallel computer on a PC cluster
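The split between solving the system and inverting the matrix can be illustrated on a single machine with NumPy. This is a sketch on a made-up toy matrix, not the alignment matrix, and it uses serial LAPACK through NumPy rather than HPL or ScaLAPACK. It relies on the standard least-squares result that the inverse of the normal-equations matrix is the covariance matrix of the fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
M = rng.normal(size=(N, N))
A = M @ M.T + N * np.eye(N)       # toy symmetric, well-conditioned matrix
b = rng.normal(size=N)

# Part 1: solve A x = b directly (LU factorisation with partial pivoting).
x_solve = np.linalg.solve(A, b)

# Part 2: invert once at the end, only to obtain the parameter errors.
cov = np.linalg.inv(A)
errors = np.sqrt(np.diag(cov))
x_inv = cov @ b                   # same solution, obtained the expensive way

res_solve = np.linalg.norm(A @ x_solve - b)
res_inv = np.linalg.norm(A @ x_inv - b)
print(res_solve, res_inv)
```

During the iterations only Part 1 is needed; Part 2 can be deferred to the final iteration, or skipped if the alignment errors are known to be negligible.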
HPL: High Performance LinPack
HPL solves a dense linear system in double precision (64 bits) on distributed-memory computers (PCs). It uses:
•Two-dimensional block-cyclic data distribution:
Each PC receives only part of the big matrix
•LU factorisation with row partial pivoting
•Recursive factorisation with pivot search
See http://www.netlib.org/benchmark/hpl/
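The two-dimensional block-cyclic distribution can be sketched in a few lines of Python. This is an illustration of the general mapping rule only, not code taken from HPL; the function names `block_owner` and `elements_per_process` are hypothetical, and the example values (a 2000 x 2000 matrix, 64 x 64 blocks, a 4*4 grid) are picked for illustration.

```python
from collections import Counter

def block_owner(i, j, nb, P, Q):
    """Process-grid coordinates (p, q) owning global matrix element (i, j).

    nb is the blocking factor; the process grid has P rows and Q columns.
    """
    return ((i // nb) % P, (j // nb) % Q)

def elements_per_process(N, nb, P, Q):
    """Number of matrix elements stored on each process for an N x N matrix."""
    rows = Counter((i // nb) % P for i in range(N))
    cols = Counter((j // nb) % Q for j in range(N))
    return {(p, q): rows[p] * cols[q] for p in range(P) for q in range(Q)}

counts = elements_per_process(2000, 64, 4, 4)
print(block_owner(130, 70, 64, 4, 4))
print(min(counts.values()), max(counts.values()))
```

Because blocks are dealt out cyclically, every process holds a nearly equal share of the matrix, which is what lets each PC store only part of it.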
Installation and features
Installation. It needs:
•MPI (Message Passing Interface): freeware
•BLAS (Basic Linear Algebra Subprograms): freeware
Provides:
•A testing program (random matrix generation): accuracy
•A timing program: execution time and estimate of Gflops
•An optimisation program:
•The blocking factor: size of the A_ij sub-matrices (64*64)
•The cluster size: number of PCs or processes
•The process grid: nrows*ncolumns = nmachine (NPC*1)
To investigate performances we have used 16 personal PCs from our lab:
•CPU from 500 MHz up to 1.8 GHz
•Memory from 128 MB up to 256 MB
Remark: overall performances are driven by the least powerful PC
Evolution of performances with cluster size
On a single machine: linear resolution is 2 times faster than Millepede matrix inversion.
Resolution of a system of 2000 linear equations:
•~Scalable with the number of PCs up to 4.
•Above 4, the limitation is due to the communication protocol: the matrix is too small.
Overall gain compared to Millepede is a factor of 10, with an accuracy close to the hardware limit.
Evolution of performances with matrix size
Performances measured with 16 PCs, up to a system of 11000 equations (limited by memory size).
[Figure: execution time versus matrix size; squares: measured, circles: corrected]
•It is an N^3 process up to 8000 equations
•Above 8000, the process slows down due to memory swapping.
•Time renormalisation with the Gflop/s at peak (best point):
$t' = t \cdot \frac{(\mathrm{Gflop/s})_{\mathrm{current}}}{(\mathrm{Gflop/s})_{\mathrm{peak}}}$
Nominal performances with an ideal system (dedicated/enough memory)
Final performances
$t(\mathrm{s}) = 745 \cdot \frac{16 \times 0.5}{n \cdot S} \cdot N^3$
N = matrix size
n = number of PCs
S = speed (GHz) of the slowest PC
Actual performances with our 16-PC cluster and S = 0.5 GHz:
                          HPL          Millepede (1.4 GHz)
Pixel barrel              50 min       8 hours
Whole pixel               1.4 hours    2.6 days
Whole ID (extrapolated)   18 hours     10 days
But still too long:
•More than one iteration will be needed depending on the final accuracy we want.
Future improvements
•The processor grid and blocking size can be optimised (~2):
We used a (16 rows * 1 column) grid; (4*4) or (2*8) could be tried
We used a blocking size of 64; 128, 256, … could be tried
•The BLAS routines are not optimised (~5). According to specialists we may gain a big factor by using either:
•An optimised BLAS version: not freeware
•ATLAS (Automatically Tuned Linear Algebra Software): free, not tested yet because our Linux version is too old
•The power of the machines in five years: at least 5 GHz, compared to 0.5 GHz (~10)
•64-bit floating point arithmetic unit, already available (>2):
First estimate on a DEC machine is a factor of at least 10, but not independent of CPU and memory speed
Total gain: T=200
(18 hours * 10 iterations) / 200 = 0.9 hour for the whole ID
ID alignment execution time is no longer a problem
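The arithmetic behind the total gain and the final estimate can be checked in a few lines of Python. The individual factors (~2, ~5, ~10, >2), the 18 hours and the 10 iterations are the figures quoted on the slides; the pairing of each factor with a specific improvement is an assumption based on the order in which they appear, and the dictionary keys are just descriptive labels.

```python
import math

# Gain factors quoted on the slide (pairing with each item is assumed).
gains = {
    "processor grid and blocking size": 2,
    "optimised BLAS": 5,
    "CPU clock, 0.5 GHz to 5 GHz": 10,
    "64-bit floating point unit": 2,
}

total_gain = math.prod(gains.values())

hours_per_iteration = 18     # whole ID, extrapolated, 16 PCs at 0.5 GHz
iterations = 10
projected_hours = hours_per_iteration * iterations / total_gain

print(total_gain, projected_hours)
```

Since the estimate is a plain product of independent factors, losing any single one (for example the BLAS gain) changes the final time by that same factor.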
Matrix inversion
Main problems:
•Matrix inversion is always slower than linear resolution (~2)
•Accuracy is worse: non-uniformity of the matrix elements
•It is less robust: null eigenvalues
However:
•Computing the errors gives more confidence in the final results
•It needs to be done only once, at the end
The solution is ScaLAPACK (Scalable LAPACK)
The package is a general parallel linear algebra library, including inversion
It needs:
•PBLAS (BLAS level 1-2-3)
•BLACS (Basic Linear Algebra Communication Subprograms)
•MPI (Message Passing Interface)
First investigations
The philosophy of the package is very close to HPL: same data distribution, basic algebra and communication protocol
The installation is easier: pre-compiled library, and MPI already installed
But the example programs are not that clear: no inversion example
However, we succeeded in inverting matrices, but only on a single processor (PII at 0.5 GHz)
Size        Inversion   HPL
1000*1000   69 s        35 s
2000*2000   10.5 min    5.5 min
•A factor of 2 slower than HPL (comparable to Millepede)
•Same N^3 dependency as HPL, apart from a multiplicative factor of 2:
$t(\mathrm{inversion}) = 2 \cdot t(\mathrm{solver})$
Because of scalability, everything we said about HPL is valid for ScaLAPACK
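The two claims on this slide (a factor of about 2 between inversion and solver, and an N^3 dependency) can be checked directly against the four timings in the table. The short Python sketch below uses only those numbers, taking the "HPL" column as the solver time; with only two matrix sizes, the fitted exponent is a rough indication rather than a measurement.

```python
import math

# Timings from the table, converted to seconds.
timings = {
    1000: {"inversion": 69.0, "solver": 35.0},
    2000: {"inversion": 10.5 * 60, "solver": 5.5 * 60},
}

# Ratio t(inversion) / t(solver) at each matrix size.
ratio = {n: t["inversion"] / t["solver"] for n, t in timings.items()}

# Effective exponent p in t ~ N^p, from the two available sizes.
exponent = {
    kind: math.log(timings[2000][kind] / timings[1000][kind]) / math.log(2000 / 1000)
    for kind in ("inversion", "solver")
}

print(ratio)
print(exponent)
```

Both ratios come out close to 2, and both exponents are slightly above 3, which is compatible with an N^3 process plus some overhead on the larger matrix.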
Conclusions on matrix inversion
•Using software such as HPL and/or ScaLAPACK
•A cluster of 16 dedicated PCs
•5 GHz CPU clock
•1 GB of fast memory
•64-bit floating point arithmetic unit
The whole ID will be aligned in one hour or so!!
Incredible? Here is a mail from the expert himself:
"If you were to buy new hardware, my guess is that you could solve this problem 10 times in a day on one multiprocessor machine. The forthcoming AMD Opteron gets roughly 85% of peak, and is a 64 bit machine so that you can put enough memory on it. There's certainly no need for 32 nodes for such a small problem."
R. Clint Whaley
Pixel barrel alignment
To test the alignment procedure we used:
•A sample of multimuons generated by Richard
•1500 events (0 < φ < 2π, |η| < 1.8, 2 < pT < 50 GeV)
•10 muons per event with a common vertex
Vertex spread: 5.6 cm in z and 2 mm in x and y
•Richard’s code to produce the ntuple with perfect geometry for the full pixel barrel (1418 modules) or a ring (100 modules)
•Alignment code running in two passes:
•Preparation of vector and matrix (Pawel’s code) and storage on disk (no vertex fit applied yet)
•Solving the linear system with HPL using a farm of 25 PC’s (no inversion)
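The two-pass structure (accumulate the matrix and vector, then solve) can be sketched as a generic linear least-squares problem. This is not Pawel's code and not the actual track model: the derivatives `d`, the parameter vector `a_true`, the resolution `sigma` and the number of measurements are made-up toy values, the disk storage step is skipped, and a serial NumPy solve stands in for HPL.

```python
import numpy as np

rng = np.random.default_rng(2)
n_par = 6                                   # toy number of alignment parameters
a_true = rng.normal(scale=0.1, size=n_par)  # hypothetical true misalignment
sigma = 0.001                               # assumed measurement resolution

# Pass 1: accumulate the normal-equations matrix A and vector b.
A = np.zeros((n_par, n_par))
b = np.zeros(n_par)
for _ in range(500):
    d = rng.normal(size=n_par)              # derivatives of the residual w.r.t. parameters
    r = d @ a_true + rng.normal(scale=sigma)  # simulated residual
    A += np.outer(d, d) / sigma**2
    b += d * r / sigma**2

# Pass 2: solve the linear system, without inverting A.
a_fit = np.linalg.solve(A, b)
print(np.abs(a_fit - a_true).max())
```

Separating the passes means the expensive accumulation over events is done once, while the solver can be swapped (single machine, HPL on a farm) without touching the first pass.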
Alignment parameters correction
•Structures in the alignment parameters and errors point to remaining problems
•The comparison on a single ring between HPL and standard CERN library shows similar results.
Machinery is ready for a successful test of the pixel barrel alignment
Conclusions
•Mechanical constraints are not justified :
No gain in execution time
Stability not guaranteed at the level of few microns
•Linear system resolution and matrix inversion are no longer a problem
With a cluster of 16 modern PCs, whole ID alignment in 1 hour
•First test of full-scale alignment of the pixel barrel has been done
Some problems remain in the alignment procedure
•Detailed documentation on Pawel’s software and the geometry package is recommended.
However, we are very close to a successful test