
Page 1: AliRoot : Experience from 2011

AliRoot: Experience from 2011

07/03/2012, P. Hristov

Page 2: AliRoot : Experience from 2011

AliRoot releases

• 11/2010-09/2011: v4-20-Release
  – 41 revisions
  – used for PbPb 2010 and pp 2011: raw data & MC
• 08/2011-03/2012: v5-01-Release
  – 27 revisions
  – used for pp and PbPb 2011: raw data & MC
• General observations
  – Too many porting requests
  – The code is not very stable => a lot of work is needed to make sure the changes are OK
  – Several “emergency” cases
  – Probably we need better planning and a modified release procedure

Page 3: AliRoot : Experience from 2011

Changes: v4-20 vs v5-01

Page 4: AliRoot : Experience from 2011

“Life cycle”

Page 5: AliRoot : Experience from 2011

Main issues in 2011

• Low job efficiency in the reconstruction (CPU time / wall time)
  – Observed also as an increase in processing time with the number of events
  – Caused by too many file open/close operations
  – Technical fix to avoid the call to fsync
  – Still, the way we use the loaders has to be improved
  – A systematic study of the job efficiency is needed. This concerns the simulation, reconstruction, organized and chaotic analysis
• Run time errors
  – Segmentation violations: missing protections, other causes
  – Floating point exceptions: missing protections (a minimal sketch of such protections follows below)
  – G__exception crisis
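To make the “missing protections” concrete, here is a minimal C++ sketch with invented function and variable names (not taken from AliRoot): an unchecked array index is the typical source of the segmentation violations, and an unchecked denominator of the floating point exceptions.

  #include <vector>

  // Hypothetical helper: mean cluster charge of a track. The two guards
  // marked below are the kind of "missing protections" discussed above:
  // without the index check the function can crash with a segmentation
  // violation, without the division check it can raise a floating point
  // exception on an empty track.
  double MeanClusterCharge(const std::vector<double>& charges,
                           const std::vector<int>& clusterIndices)
  {
    double sum = 0.0;
    int    n   = 0;
    for (int idx : clusterIndices) {
      // protection 1: range check before indexing
      if (idx < 0 || idx >= static_cast<int>(charges.size())) continue;
      sum += charges[idx];
      ++n;
    }
    // protection 2: avoid division by zero
    return (n > 0) ? sum / n : 0.0;
  }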

Page 6: AliRoot : Experience from 2011

Main issues in 2011

• Run time errors
  – Sometimes difficult/impossible to reproduce on a local WN
  – May depend on the other running jobs
  – Sometimes “cured” after resubmission: significant waste of resources, points to memory corruption
    • More intelligent resubmission policy?
  – GRID “gdb runs” used to do “catch throw”
  – As a rule caused by indexes out of range, memory overwriting, etc.
  – Coverity detects some of the problems, so we have to follow the clean-up of the defects (12 in the trunk)
  – Valgrind with memcheck: slow, but detects additional problems. Run it before major productions?

Page 7: AliRoot : Experience from 2011

Main issues in 2011

• CPU consumption
  – “Objective” causes: more central events this year, higher luminosity => more pile-up, more background, etc.
  – “Subjective” causes: unsatisfactory reconstruction parameters, “improved” algorithms, etc.
    • Example: the parameters of the cascade finder
  – “Organization” causes: the running conditions were not known in advance and no reliable tests/estimations were done in preparation of the data processing
  – We have to monitor/profile continuously the CPU per event: done on a dedicated server (see the sketch below)
  – Valgrind with callgrind: slow, but very useful. Run before major productions?
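A minimal sketch of per-event CPU monitoring with ROOT's TStopwatch; the event loop and the ReconstructEvent placeholder are hypothetical, so this only illustrates the idea, it is not the code running on the dedicated server.

  #include "TStopwatch.h"
  #include <cstdio>

  // Hypothetical event loop: time each event separately so that slow
  // events (e.g. central PbPb events with pile-up) are visible in the
  // logs and the CPU per event can be followed from revision to revision.
  void MonitorCpuPerEvent(int nEvents)
  {
    TStopwatch timer;
    for (int iEvent = 0; iEvent < nEvents; ++iEvent) {
      timer.Start(kTRUE);            // reset and start the stopwatch
      // ReconstructEvent(iEvent);   // placeholder for the real work
      timer.Stop();
      printf("event %d: CPU %.2f s, real %.2f s\n",
             iEvent, timer.CpuTime(), timer.RealTime());
    }
  }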

Page 8: AliRoot : Experience from 2011

Differences between 2010 and 2011 data samples

• More detector modules (EMCAL, TRD)
  – more data
• Lower detector “efficiency” (ITS, TPC)
  – higher combinatorics in the reconstruction/matching
• Different trigger: more central events
  – more resources (memory and CPU) needed
• Higher luminosity: more background/pile-up
• HLT compression
  – additional memory and CPU
• => Very high memory consumption (3.5 GB RAM, 4.5 GB virtual): the main problem in the 2011 PbPb data

Page 9: AliRoot : Experience from 2011

Very high memory consumption: Investigations

• syswatch.log from the GRID production: useful
• Profiling with Google performance tools: relatively slow
• Profiling with Instruments: fast, no obvious memory leaks
• Profiling with massif: relatively slow, but very useful
• Suspected pile-up events are not the main or only reason
• Technical solutions
  – Reconstruction of events in decreasing size order: makes the memory allocation flat, but still at the previous level (see the sketch below)
  – tcmalloc: nice reduction on Ubuntu 11.10, no effect on SLC5
  – jemalloc: too late for the production. Tested in February, improvement of the xrootd consumption.
• “Memory pools” implemented by Ruben
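A minimal sketch of the event-reordering idea, assuming the raw event size can be read cheaply from the chunk; the structure and function names are invented for illustration and are not the actual AliRoot implementation.

  #include <algorithm>
  #include <cstddef>
  #include <vector>

  // Hypothetical descriptor of one raw event in a chunk.
  struct RawEventInfo {
    int         index;      // position of the event in the raw file
    std::size_t sizeBytes;  // raw event size, used as a proxy for its memory cost
  };

  // Processing order with the largest events first: the heap grows once
  // at the beginning of the job instead of in steps during processing,
  // so the memory profile stays flat.
  std::vector<int> DecreasingSizeOrder(std::vector<RawEventInfo> events)
  {
    std::sort(events.begin(), events.end(),
              [](const RawEventInfo& a, const RawEventInfo& b) {
                return a.sizeBytes > b.sizeBytes;
              });
    std::vector<int> order;
    order.reserve(events.size());
    for (const RawEventInfo& e : events) order.push_back(e.index);
    return order;
  }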

Page 10: AliRoot : Experience from 2011

Very high memory consumption

• Virtual memory improved by the “memory pools” for TPC and used in production (~400 MB less, no thrashing); a generic sketch of the pool idea is shown below
• Still too high resident memory
• Additional “pools” tested => no effect
• Local files vs AliEn/xrootd access => significant improvement
  – Explanation: the standard malloc is not thread friendly by design; the memory allocated in the xrootd threads is not released
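The slides only name the “memory pools”; the following generic sketch, with invented class names, illustrates the underlying idea (objects allocated once and recycled per event instead of repeated new/delete). It is not the actual TPC implementation.

  #include <cstddef>
  #include <vector>

  // Hypothetical cluster type standing in for the real TPC cluster class.
  struct Cluster { float x, y, z, charge; };

  // Minimal fixed-capacity pool: all objects are allocated once, handed
  // out sequentially, and "freed" by resetting a counter at the end of
  // the event, so the heap size stays constant from event to event.
  class ClusterPool {
  public:
    explicit ClusterPool(std::size_t capacity) : fStore(capacity), fUsed(0) {}

    Cluster* Next() {                                  // next free object
      return (fUsed < fStore.size()) ? &fStore[fUsed++] : nullptr;
    }
    void Reset() { fUsed = 0; }                        // recycle for the next event
    std::size_t Used() const { return fUsed; }

  private:
    std::vector<Cluster> fStore;
    std::size_t          fUsed;
  };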


Page 11: AliRoot : Experience from 2011

Memory comparisons


Page 12: AliRoot : Experience from 2011

Very high memory consumption

• Too big files with rec. points => option to keep only the current (last) event, needed if we test reconstruction of a local raw file (helps also for the G__exception)
• Switch off QA and MUON reconstruction: gain of ~150 MB
• Size of the libraries: loadlibs.C => libITSrec.so takes ~80 MB => no obvious reason
• Rec. points in split mode: no effect
• Option, macros and scripts to reconstruct one big chunk in several consecutive aliroot processes + merging of the ESDs, ESD friends and tags (see the sketch below) => not used
• Test of event ordering with full pools and local raw+OCDB => flatter memory distribution, no gain
• xrootd option to avoid buffers: equivalent to local raw file, no gain
• Run with TPC pools, local raw file, OCDB snapshot, keeping only the rec. points for the current event, QA and MUON switched off => OK
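For the “consecutive aliroot processes + merging” option, one possible way to merge the partial ESD outputs is ROOT's TFileMerger; the file names below are hypothetical and this is only a sketch, not the actual production macro.

  #include "TFileMerger.h"
  #include <cstdio>

  // Merge the ESDs produced by several consecutive aliroot processes
  // over one raw-data chunk (file names are hypothetical).
  void MergeChunkESDs()
  {
    TFileMerger merger(kFALSE);              // kFALSE: do not copy inputs locally
    merger.OutputFile("AliESDs_merged.root");
    merger.AddFile("part1/AliESDs.root");
    merger.AddFile("part2/AliESDs.root");
    merger.AddFile("part3/AliESDs.root");
    if (!merger.Merge()) printf("ESD merging failed\n");
  }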


Page 13: AliRoot : Experience from 2011

Very high memory consumption

• All the changes listed so far made possible the reconstruction of 75% of the PbPb data
• The remaining 25% were processed in “split mode”. Overall success: 95%
  – The raw files were reconstructed in two parts
  – Why does this help if we don’t have memory leaks?
    • Faster resubmission and the possibility to find a site with more resources
    • We still have some memory thrashing => fewer events, lower virtual memory

Page 14: AliRoot : Experience from 2011

Allocations: resident memory (v5-01)

• Total allocations seen by massif: 1.9 GB
  – Why don't we see the “missing 1 GB”? mmap is used at some base level, difficult to monitor
• TPC: ~700 MB (37%)
  – AliTPCtrackerMI::Tracking
  – AliTPCtrackerMI::MakeSeeds5: 190 MB
  – AliTPCtrackerMI::MakeSeeds3: 180 MB
  – AliTPCtrackerMI::LoadClusters: 80+70 MB. Why do we load the clusters twice? OROC+IROC
  – AliTPCclustererMI::AliTPCclustererMI: 50+50 MB
  – AliTPCtrackerSector::Setup: 50+30 MB
  – Why do we call each of these functions twice? OROC+IROC
• OCDB: ~350 MB (18%). Among them:
  – AliTPCCalROC::Streamer: 70 MB
  – AliMUONCalibParamNF::Streamer: 30 MB
  – AliMUONCalibParamNI::Streamer: 45 MB
  – AliMUONCalibParamND::AliMUONCalibParamND: 40 MB
  – The size of the OCDB increased from 250 MB last year to 350 MB this year…
• Raw data: ~140 MB (8%). Some improvements in HLT already in place
• Other allocations: ~700 MB (37%)
  – AliITSQASPDDataMakerRec::InitRaws: 50 MB
  – AliMpDetElement::AddManu: 50 MB
  – AliMpExMap::AliMpExMap: 20 MB
  – AliTRDclusterizer::AddClusterToArray: 40 MB
  – AliTRDtrackerV1::ReadClusters: 40 MB

Page 15: AliRoot : Experience from 2011

Memory: v5-01 vs v5-02

Page 16: AliRoot : Experience from 2011
Page 17: AliRoot : Experience from 2011

Allocations: v5-02-Release

Out of 2.7 GiB:
• 376.0 MiB: AliTPCtrackerMI::NextFreeSeed()
• 373.8 MiB: TStorage::ReAllocChar called by TTree::Fill (ROOT IO)
• 291.7 MiB: AliHLTTPCClusterAccessHLTOUT::AliTPCclusterMIContainer::NextCluster
• 231.9 MiB: AliTPCtrackerRow::SetCluster (2 or 1)
• 107.6 MiB: AliTPCclustererMI::AliTPCclustererMI
• => TPC and HLT need improvement as in v5-01

Page 18: AliRoot : Experience from 2011

Memory consumption: conclusions and questions

• We need significant improvement before we start LHC11h pass 3
• The most obvious candidates for improvement are in TPC and HLT
• TPC reconstruction is a long-standing issue
  – #45933: High memory consumption in the TPC reconstruction of PbPb events. Posted on 15/01/2009 and closed on 22/04/2010 without any improvement
  – It keeps all the clusters in memory together with a lot of heavy objects (seeds, etc.)
  – We have examples of a different reconstruction that might be useful: HLT TPC
  – Is it possible to make at least a minimal reduction of the memory by reconstructing separately, for example, the A and C sides?
  – What are the TPC plans?
• HLT allocations most probably are related to the way TPC uses the clusters in the reconstruction
  – Can we further reduce them?
• Can we optimize the ROOT IO and reduce the memory?
  – What if we do not store the rec. points in trees, but in TClonesArrays (Ruben)? A sketch of the idea is shown below.
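A minimal sketch of the TClonesArray idea: keep the rec. points of the current event in one array that is cleared and reused, instead of filling them into a TTree. TVector3 is used only as a stand-in for the real rec-point classes; this illustrates the question above, it is not an actual AliRoot change.

  #include "TClonesArray.h"
  #include "TVector3.h"

  // Keep the rec. points of the current event in a reusable TClonesArray
  // instead of filling them into a TTree. TVector3 stands in for the real
  // rec-point classes.
  void ProcessEvents(int nEvents)
  {
    TClonesArray recPoints("TVector3", 10000);    // allocated once, reused per event

    for (int iEvent = 0; iEvent < nEvents; ++iEvent) {
      int n = 0;
      // ... reconstruction would create the rec. points in place:
      new (recPoints[n++]) TVector3(1., 2., 3.);  // placement new into the array
      // ... use the rec. points for tracking/matching of this event ...
      recPoints.Clear();                          // recycle the slots for the next event
    }
  }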

Page 19: AliRoot : Experience from 2011

Other issues

• SHUTTLE
  – most of the time: stable operation
  – problem in the merging of raw tags for runs with many empty events: not yet fixed
  – rare crashes of DAs for runs with low statistics
• Visualization
  – Online event display: complete disaster in 2011, now should be OK (thanks to Mihai and Barth)

Page 20: AliRoot : Experience from 2011

Other issues

• Calibration and OCDB
  – a few cases of “forward incompatibility” when new OCDB objects were used with old AliRoot versions
  – unclear calibration strategy: significant delays in the production, manual updates, waste of resources, bad physics performance. The PWGPP calibration group is working on this issue.
  – use case to simulate with an old AliRoot version and the corresponding OCDB objects: significant manual work, no satisfactory solution yet

Page 21: AliRoot : Experience from 2011

Other issues

• Detector description
  – Discrepancies in the material budget were discovered using conversions, hadronic interactions and satellite collisions. A special working group is trying to clarify the situation.
• Generators
  – Misuse of some generators (for example AliGenBox with a flat pseudorapidity instead of a rapidity distribution)
  – Complicated cocktail generators with unexpected issues in the analysis of the MC events (background shapes, estimation of efficiencies, selection of primaries, etc.)
  – Some generators needed a lot of fixes to become operational (for example, TAmpt)
  – How do we validate new tunes, versions, etc.?

Page 22: AliRoot : Experience from 2011

Questions for discussion

• Simulation
  – Improvements in Geant3: how do we treat them?
    • #90939: Request to include the (anti)hypertriton in ALICE GEANT3
    • #84396: Request to include some improvements for light (anti)nuclei in ALICE GEANT3 (“Geant3/Fluka” will not be needed anymore)
  – How do we use Geant4?
    • Validation?
    • Does it become our main simulation package?

Page 23: AliRoot : Experience from 2011

Questions for discussion

• Physics performance & QA
  – During the PWGPP and QA meetings several problems have been discussed, for example the quality of the reconstruction at high pT, the difference between the pT spectra of positive and negative particles, the quality of the vertex finding, etc.
  – How do we address these and other performance problems and make sure the corresponding fixes are in AliRoot?