Progress report on the alignment of the tracking system A. Bonissent D. Fouchez A.Tilquin CPPM Marseille Mechanical constraints from optical measurement

Progress report on the alignment of the tracking system

A. Bonissent, D. Fouchez, A. Tilquin

CPPM Marseille

•Mechanical constraints from optical measurement

•Inversion of big matrix in a decent execution time

•Pixel barrel as a test bench for the alignment code

March 6, 2003

Mechanical constraints

ALEPH case:

•Basic unit was a face composed of six silicon wafers, rigidly glued to each other and to an Omega structure

•During construction, 3-dimensional coordinate measurements were made at each corner and on each wafer

•The invariance of the locations of neighbouring corners was taken into account by projecting the linear system of equations into the orthogonal subspace

40% reduction of the number of degrees of freedom

Stabilisation of the poorly constrained end wafer

Improvement of the final resolution

We investigated such an implementation for the ATLAS pixels
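The projection can be sketched as follows, under assumptions that are not spelled out on the slide: the alignment problem is a linear system $M a = v$ and the optical constraints are linear, $C a = 0$. The symbols $M$, $a$, $v$, $C$, $P$ and $y$ are illustrative names, not the authors' notation.

```latex
% Columns of P span the null space of C, so that C P = 0.
a = P\,y, \qquad
\left(P^{T} M P\right) y = P^{T} v, \qquad
a = P \left(P^{T} M P\right)^{-1} P^{T} v .
```

In this picture the reduced matrix $P^{T} M P$ loses one row and one column for each independent constraint, which is where the reduction in the number of degrees of freedom would come from.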

Projection factor and execution speed

With a similar approach:

Two degrees of freedom per junction between two modules can be eliminated: 2 translations along and perpendicular to the stave

12*3 = 36 degrees of freedom out of a total of 13*6: 40% reduction

We can expect a gain, since inversion is an N^3 process.

But do not forget the projection into the subspace, and the backward projection to the initial space.
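The expected gain from the N^3 scaling alone can be estimated with a few lines of Python. This is a rough sketch that assumes a pure N^3 cost and deliberately ignores the two projection steps; the function name is hypothetical.

```python
def inversion_cost_ratio(reduction: float) -> float:
    """Cost of inverting the reduced matrix relative to the full one.

    Assumes a pure N^3 scaling and ignores the projection into the
    subspace and the backward projection.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    remaining = 1.0 - reduction  # fraction of degrees of freedom left
    return remaining ** 3


if __name__ == "__main__":
    for r in (0.4, 0.6, 0.9):
        ratio = inversion_cost_ratio(r)
        print(f"reduction {r:.0%}: inversion cost x {ratio:.3f}")
```

Whether the projection pays off depends on how this ratio compares with the cost of the two projections, which this sketch does not model.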

[Figure: full inversion compared with projection as a function of (1 - reduction factor); regions labelled "Worse" and "Better"]

We start to gain with a reduction of at least 60% for 60 modules, and more than 90% for the pixel barrel.

Projection is not justified by execution time

Projection and precision

The projection technique is equivalent to completely neglecting the errors on the optical measurements

Measurements have to be much better than the final resolution

Optical measurements:

•Done in the lab with an accuracy of ~1 micron

•But at room temperature and before transportation to CERN! Temperature effects are difficult to evaluate.

•Mechanical junction between modules not rigid enough to guarantee movements smaller than a few microns.

•No mechanical measurement after installation and during operation

The full alignment would have to be done at the beginning:

•To get the correct position

•To verify the stability of the mechanical constraints

An unstable mechanical constraint may introduce artificial distortions

Conclusions: Mechanical constraints not justified (at the beginning)

Matrix inversion

Our last conclusions, using the Millepede package on a 1.4 GHz PC, were:

Pixel barrel   5 hours
Whole pixels   8 hours
SCT alone      2.5 days
Whole ID       10 days

For only one iteration

•In 2007 we may expect a reduction by a factor of 5 to 10 in execution time, due to the increase in PC power.

•But this is still too long. The number of iterations is at least 3:

  •Non-linearity: 2 iterations

  •Minimum check: $\chi^2_i - \chi^2_{i+1}$

•Tests before 2007 will be difficult with this execution time

~10 is more realistic

New ideas and new technologies should be investigated
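The time budget implied by these numbers can be checked with a short calculation. The sketch below uses only figures quoted in the talk (10 days per iteration for the whole ID, a speed-up of 5 to 10, 3 iterations as the stated minimum and 10 as used later in the talk); the function name is hypothetical.

```python
DAYS_PER_ITERATION = 10.0  # whole ID, one iteration, on the 1.4 GHz PC


def total_days(iterations: int, speedup: float,
               days_per_iteration: float = DAYS_PER_ITERATION) -> float:
    """Total time in days for a number of iterations at a given speed-up."""
    return iterations * days_per_iteration / speedup


if __name__ == "__main__":
    for iterations in (3, 10):
        for speedup in (5, 10):
            days = total_days(iterations, speedup)
            print(f"{iterations} iterations, speed-up x{speedup}: {days:.0f} days")
```

Even in the most favourable case (3 iterations, speed-up of 10) the whole ID takes several days, which is why a different approach is needed.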

New investigations

The general problem can be split into 2 independent parts:

•Solve the linear system without inversion, and iterate
  Faster (~2), more accurate and more robust

•Invert the big matrix at the end to get the errors
  Not really necessary if, as in ALEPH, the errors due to alignment are small compared to the intrinsic ones.

But a factor of 2 is still not enough

The new technology is parallelism

For that purpose we have investigated two freeware packages:

HPL: High Performance Linpack (solver)

ScaLAPACK: Scalable LAPACK (inversion)

Both emulate a massively parallel computer on a PC cluster

HPL: High Performance LinPack

HPL solves a dense linear system in double precision (64 bits) on distributed-memory computers (PCs). It uses:

•Two-dimensional block-cyclic data distribution:

Each PC receives only part of the big matrix

•LU factorisation with row partial pivoting

•Recursive factorisation with pivot search

See http://www.netlib.org/benchmark/hpl/
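To make "LU factorisation with row partial pivoting" concrete, here is a minimal single-process sketch in pure Python. It is not HPL's code (HPL is blocked, recursive and distributed over MPI); the function name `lu_solve` is hypothetical and the routine is only meant to show how a system is solved without ever forming the inverse.

```python
def lu_solve(a, b):
    """Solve a x = b by LU factorisation with row partial pivoting.

    `a` is a list of rows, `b` a list of right-hand-side values.
    The inputs are copied, not modified.
    """
    n = len(a)
    lu = [row[:] for row in a]
    x = b[:]
    for k in range(n):
        # Pivot search: row with the largest absolute value in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            x[k], x[p] = x[p], x[k]
        # Eliminate column k below the diagonal; the multipliers (L) are
        # stored in place and applied to the right-hand side as we go.
        for i in range(k + 1, n):
            factor = lu[i][k] / lu[k][k]
            lu[i][k] = factor
            for j in range(k + 1, n):
                lu[i][j] -= factor * lu[k][j]
            x[i] -= factor * x[k]
    # Back substitution with the upper triangle (U).
    for i in range(n - 1, -1, -1):
        s = x[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / lu[i][i]
    return x


if __name__ == "__main__":
    print(lu_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```

The triple loop is the origin of the N^3 cost; the pivot search keeps the multipliers bounded by 1 in absolute value, which is what makes the factorisation numerically stable in practice.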

Installation and features

Installation. It needs:

•MPI (Message Passing Interface): freeware

•BLAS (Basic Linear Algebra Subprograms): freeware

Provides:

•A testing program (random matrix generation): accuracy

•A timing program: execution time and estimate of Gflops

•An optimisation program:

  •The blocking factor: size of the Aij sub-matrices (64*64)

  •The cluster size: number of PCs or processes

  •The process grid: nrows*ncolumns = nmachine (NPC*1)
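The roles of the blocking factor and the process grid can be illustrated with the usual two-dimensional block-cyclic mapping. The sketch below is a plain-Python illustration with hypothetical function names, not a call into HPL; it assumes the standard mapping in which block (I, J) goes to process (I mod nrows, J mod ncolumns).

```python
def owner(i, j, nb, p_rows, p_cols):
    """Process-grid coordinates that own global matrix element (i, j).

    nb is the blocking factor; p_rows x p_cols is the process grid.
    """
    return (i // nb) % p_rows, (j // nb) % p_cols


def rows_per_process(n, nb, p_rows):
    """Number of matrix rows held by each process row for an n x n matrix."""
    counts = [0] * p_rows
    for i in range(n):
        counts[(i // nb) % p_rows] += 1
    return counts


if __name__ == "__main__":
    # Settings quoted on the slide: 64*64 blocks on a 16*1 process grid.
    print(owner(64, 130, 64, 16, 1))
    print(rows_per_process(2000, 64, 16))
```

With a 2000-equation system, 64-row blocks and 16 processes, each process holds only two blocks of rows, which gives a feeling for why a small matrix leaves little computation per PC relative to communication.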

To investigate performances we have used 16 personal PCs from our lab:

•CPU from 500 MHz up to 1.8 GHz

•Memory from 128 MB up to 256 MB

Remark: overall performances are driven by the least powerful PC

Evolution of performances with cluster size

On a single machine: linear resolution is 2 times faster than Millepede matrix inversion.

Resolution of a system of 2000 linear equations:

•~Scalable with the number of PCs up to 4.

•Above 4, the limitation is due to the communication protocol. The matrix is too small.

Overall gain compared to Millepede is a factor of 10, with an accuracy close to the hardware limit.

Evolution of performances with matrix size

Performances measured with 16 PC’s up to a system of 11000 equations (limited by memory size).

[Figure: performance versus matrix size. Squares: measured; circles: corrected]

•It is an N^3 process up to 8000

•Above 8000, the process slows down due to memory swapping.

•Time renormalization with the Gflop/s at peak (best point):

$$t' = t \cdot \frac{(\mathrm{Gflop/s})_{\mathrm{current}}}{(\mathrm{Gflop/s})_{\mathrm{peak}}}$$

Nominal performances with an ideal system (dedicated/enough memory)

Final performances

[Formula: execution time t(s) as a function of N^3, n and S; recoverable constants: 745, 0.5 and 16]

N = matrix size

n = number of PCs

S = speed (GHz) of the slowest PC

Actual performances with our 16-PC cluster and S = 0.5 GHz

                          HPL         Millepede (1.4 GHz)
Pixel barrel              50 min      8 hours
Whole pixel               1.4 hours   2.6 days
Whole ID (extrapolated)   18 hours    10 days

But still too long:

•More than one iteration will be needed depending on the final accuracy we want.

Future improvements

•The processor grid and blocking size can be optimised (gain ~2):
  Process grid: we used (16 rows * 1 column); alternatives are (4*4) or (2*8)
  Blocking size: we used 64; alternatives are 128, 256, ...

•The BLAS routines are not optimised (gain ~5):
  According to specialists, we may gain a big factor by using either:
  •An optimised BLAS version: not freeware
  •ATLAS (Automatically Tuned Linear Algebra Software): free, not tested yet because our Linux version is too old

•The power of machines in five years: at least 5 GHz, compared with 0.5 GHz (gain ~10)

•64-bit floating-point arithmetic units are already available (gain >2):
  A first estimate on a DEC machine is a factor of at least 10, but this is not independent of CPU and memory speed

Total gain: T = 200

(18 hours * 10 iterations)/200 = 0.9 hour for the whole ID

ID alignment execution time is no longer a problem
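The arithmetic behind T = 200 and the 0.9 hour estimate can be reproduced in a few lines. This sketch assumes that the four quoted factors (~2, ~5, ~10 and >2) are independent and simply multiply; the variable names are illustrative.

```python
from math import prod

# Gain factors quoted on the slide, taken at face value.
gain_factors = [2, 5, 10, 2]

HOURS_PER_ITERATION = 18  # whole ID with HPL on the 16-PC cluster (extrapolated)
ITERATIONS = 10

total_gain = prod(gain_factors)
hours_whole_id = HOURS_PER_ITERATION * ITERATIONS / total_gain

if __name__ == "__main__":
    print(f"total gain T = {total_gain}")
    print(f"whole ID: {hours_whole_id:.1f} hour")
```

If the gains turn out to be correlated (the slide notes that the 64-bit gain is not independent of CPU and memory speed), the product overestimates T and the time scales up accordingly.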

Matrix inversion

Main problems:

•Matrix inversion is always slower than linear resolution (~2)

•Accuracy is worse: non-uniformity of the matrix elements

•It is less robust: null eigenvalues

However:

•Computing the errors gives more confidence in the final results

•It needs to be done only once, at the end

The solution is ScaLAPACK (Scalable LAPACK)

The package is a general parallel linear algebra library, including inversion

It needs:

•PBLAS (BLAS level 1-2-3)

•BLACS (Basic Linear Algebra Communication Subprograms)

•MPI (Message Passing Interface)

First investigations

The philosophy of the package is very close to HPL:
  Same data distribution, basic algebra and communication protocol

The installation is easier:
  Pre-compiled library and MPI already installed

But the example programs are not that clear:
  No inversion example

However, we succeeded in inverting matrices, but only on a single processor (PII at 0.5 GHz)

Size         Inversion   HPL
1000*1000    69 s        35 s
2000*2000    10.5 min    5.5 min

•A factor of 2 slower than HPL (similar to Millepede)

•Same N^3 dependency as HPL, apart from a multiplicative factor of 2

Because of scalability, everything we said about HPL is valid for ScaLAPACK

$$t(\mathrm{inversion}) = 2 \cdot t(\mathrm{solver})$$
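The two claims drawn from the table (a factor of about 2 between inversion and solver, and an N^3 dependence) can be checked directly against the four quoted timings. The sketch below only uses the numbers from the table; the helper name `extrapolate_n3` is hypothetical.

```python
# Timings from the table, in seconds: size -> (inversion, HPL solver).
timings = {
    1000: (69.0, 35.0),
    2000: (10.5 * 60, 5.5 * 60),
}


def extrapolate_n3(t_ref, n_ref, n):
    """Scale a reference time to another matrix size assuming a pure N^3 law."""
    return t_ref * (n / n_ref) ** 3


# Ratio t(inversion) / t(solver) for each matrix size.
ratios = {n: inv / sol for n, (inv, sol) in timings.items()}

if __name__ == "__main__":
    for n, ratio in ratios.items():
        print(f"N = {n}: inversion / solver = {ratio:.2f}")
    predicted = extrapolate_n3(timings[1000][1], 1000, 2000)
    print(f"solver at N = 2000: N^3 prediction {predicted:.0f} s, "
          f"measured {timings[2000][1]:.0f} s")
```

With only two matrix sizes this is a consistency check rather than a fit: the ratios sit just below 2, and the measured 2000*2000 solver time is somewhat above the pure N^3 extrapolation from 1000*1000.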

Conclusions on matrix inversion

•Using software such as HPL and/or ScaLAPACK

•A cluster of 16 dedicated PCs

•5 GHz CPU clock

•1 GB of fast memory

•64-bit floating-point arithmetic unit

The whole ID will be aligned in one hour or so!!

Incredible? Here is a mail from the expert himself:

If you were to buy new hardware, my guess is that you could solve this problem 10 times in a day on one multiprocessor machine. The forthcoming AMD Opteron gets roughly 85% of peak, and is a 64 bit machine so that you can put enough memory on it. There's certainly no need for 32 nodes for such a small problem.

R Clint Whaley

Pixel barrel alignment

To test the alignment procedure we used:

•A sample of multimuons generated by Richard

•1500 events (0 < φ < 2π, |η| < 1.8, 2 < pT < 50 GeV)

•10 muons per event with a common vertex
  Vertex spread: 5.6 cm in z and 2 mm in x and y

•Richard’s code to produce the ntuple with perfect geometry for the full pixel barrel (1418 modules) or a ring (100 modules)

•Alignment code running in two passes:

  •Preparation of the vector and matrix (Pawel’s code) and storage on disk (no vertex fit applied yet)

  •Solving the linear system with HPL using a farm of 25 PCs (no inversion)

Alignment parameters correction

•Structure in the alignment parameters and errors point out remaining problems

•The comparison on a single ring between HPL and standard CERN library shows similar results.

Machinery is ready for a successful test of the pixel barrel alignment

Conclusions

•Mechanical constraints are not justified :

No gain in execution time

Stability not guaranteed at the level of few microns

•Linear system resolution and matrix inversion are no longer a problem

With a cluster of 16 modern PCs, whole ID alignment in 1 hour

•First test of full scale alignment of the pixel barrel has been done

Some problems remain in the alignment procedure

•Detailed documentation on Pawel's software and the geometry package is recommended.

However, we are very close to a successful test
