LBNL, 15 January 2003 Dario Barberis – Università e INFN, Genova 1
ATLAS Software & Computing
Status and Plans
Dario Barberis
University of Genoa (Italy)
DOE/NSF Review – January 2003, LBNL
Foreword
• I have been designated by the ATLAS Management to be the next Computing Coordinator, and the ATLAS Collaboration Board has been asked to endorse this proposal (e-mail vote by Collaboration Board in process)
• Main parts of this talk were prepared with contributions of the outgoing Computing Coordinator, N. McCubbin, and several other members of the Computing Steering Group
• Organizational changes outlined at the end of this talk are still proposals being discussed within the ATLAS Collaboration
Outline
• Data Challenges
• GRID
• Geant4
• LCG
• Computing Organization
• Software development plans
DC0: readiness & continuity tests (December 2001 – June 2002)
• “3 lines” for “full” simulation:
  1) Full chain with new geometry (as of January 2002):
     Generator -> (Objy) -> Geant3 -> (Zebra -> Objy) -> Athena recon. -> (Objy) -> Analysis
  2) Reconstruction of ‘Physics TDR’ data within Athena:
     (Zebra -> Objy) -> Athena rec. -> (Objy) -> Simple analysis
  3) Geant4 robustness test:
     Generator -> (Objy) -> Geant4 -> (Objy)
• “1 line” for “fast” simulation:
  Generator -> (Objy) -> Atlfast -> (Objy)
• Continuity test: everything from the same release (3.0.2) for the full chain
  – we learnt a lot: we underestimated the implications of that statement
  – completed in June 2002
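Each “line” above is a staged pipeline with a persistency boundary (Objy or Zebra files) between steps. A minimal sketch of that pattern, with hypothetical Python stand-ins for the real executables (Geant3, Athena, Atlfast are of course not Python functions):

```python
# Sketch of a DC0-style production "line": each stage reads the previous
# stage's persistent output and writes its own. All stage names/functions
# here are hypothetical stand-ins, not the real production tools.

def run_chain(events, stages):
    """Push every event through the ordered (name, func) stages."""
    store = {"generated": list(events)}   # first persistency boundary
    key = "generated"
    for name, func in stages:
        store[name] = [func(ev) for ev in store[key]]  # write new "dataset"
        key = name                                     # next stage reads it
    return store

# Toy stages tag each event with the step it passed through.
chain = [
    ("geant3_sim",  lambda ev: ev + ["sim"]),
    ("athena_reco", lambda ev: ev + ["reco"]),
    ("analysis",    lambda ev: ev + ["ana"]),
]
out = run_chain([["evt0"], ["evt1"], ["evt2"]], chain)
```

The “continuity test” then amounts to requiring that every stage in the chain is built from the same software release and accepts its predecessor’s output format.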
LBNL, 15 January 2003 Dario Barberis – Università e INFN, Genova 5
ATLAS Computing: DC1
• The ‘Phase 1’ (G3) simulation (Jul-Aug 2002) was a highly successful worldwide exercise from which we learned a lot, e.g. about software distribution, the importance of validation, etc.
• Grid tools were used in Scandinavia (‘NorduGrid’) for their full share of DC1, and in the USA for a significant fraction of theirs. Grid tools have also been used for an extensive ATLAS-EDG test involving 6 sites, aimed at repeating ~1% of the ‘European’ DC1 share.
• At the end of November 2002 we launched ‘Phase 2’, i.e. the “pile-up” (2×10^33 and 10^34 cm⁻²s⁻¹) exercise, following ‘site validation’ (55 sites) and ‘physics validation’. The HLT community specifies the details of which samples are to be piled up. Most sites completed by mid-December; the last few jobs are running right now.
  – About the same CPU needed as for Phase 1
  – 70 TB, 100,000 files
  – Additional countries/institutes joined in
  – Large-scale Grid test since end of November, in preparation for reconstruction
• Reconstruction February-March 2003, using Athena. CPU needed is <10% of that for simulation, but 30 TB of data are collected at 7 simulation production sites.
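The pile-up luminosity settings quoted for Phase 2 (2×10^33 and 10^34 cm⁻²s⁻¹) fix the mean number of overlaid minimum-bias interactions per bunch crossing, μ = L · σ_inel · Δt. A back-of-the-envelope check, assuming σ_inel ≈ 80 mb and 25 ns bunch spacing (illustrative values, not taken from the slides):

```python
# Mean pile-up interactions per bunch crossing: mu = L * sigma_inel * dt.
# sigma_inel ~ 80 mb and dt = 25 ns are assumed illustrative values.
MB_TO_CM2 = 1e-27             # 1 millibarn = 1e-27 cm^2
sigma_inel = 80 * MB_TO_CM2   # inelastic pp cross-section, cm^2
dt = 25e-9                    # LHC bunch spacing, s

def mu(lumi_cm2_s):
    """Average number of overlaid interactions per crossing."""
    return lumi_cm2_s * sigma_inel * dt

low  = mu(2e33)   # "low"-luminosity setting  -> ~4 interactions/crossing
high = mu(1e34)   # design-luminosity setting -> ~20 interactions/crossing
```

This is why the pile-up exercise is so much more demanding in I/O than in CPU: each signal event drags in tens of minimum-bias events.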
[Pie chart: contribution to the overall CPU-time (%) per country; largest shares ~28.7%, 14.3%, 10.9%, 10.7%]

ATLAS DC1 Phase 1: July-August 2002
• 3200 CPUs, 110 kSI95, 71000 CPU days
• 5×10^7 events generated, 1×10^7 events simulated, 3×10^7 single particles
• 30 TB, 35,000 files
• 39 institutes in 18 countries: Australia, Austria, Canada, CERN, Czech Republic, France, Germany, Israel, Italy, Japan, Nordic, Russia, Spain, Taiwan, UK, USA
• Grid tools used at 11 sites
ATLAS Computing: DC1 WGs & people (under the responsibility of the Data Challenge Coordinator, G.Poulard)
• A-Wp1: Event Generator (I. Hinchliffe + 8 physicists)
• A-Wp2: Geant3 Simulation (P. Nevski)
• A-Wp3: Geant4 Simulation (A. Dell'Acqua)
• A-Wp4: Pile-up (M. Wielers)
– "Atlsim" framework (P. Nevski)
– "Athena" framework (P. Calafiura)
• A-Wp5: Detector response
– (not active for DC1)
• A-Wp6: Data Conversion (RD Schaffer + DataBase group)
– Additional people were active for DC0
– + people involved in AthenaRoot I/O conversion
• A-Wp7: Event Filtering (M. Wielers)
• A-Wp8: Reconstruction (D. Rousseau)
• A-Wp9: Analysis (F. Gianotti)
• A-Wp10: Data Management (D. Malon)
• A-Wp13: Tier centres (A. Putzer)
– WG: responsible for production centres
– + contact person in each country
• A-Wp14: Fast simulation (P. Sherwood)
– WG: E. Richter-Was, J. Couchman
• A-Wp11: Tools:
– Bookkeeping & cataloguing (S. Albrand, L. Goossens + 7 other physicists/engineers)
– Production WG: L. Goossens, P. Nevski, S. Vaniachine
• + Virtual Data catalog (S. Vaniachine, P. Nevski)
• + Grid Tools providers (NorduGrid & US)
– Organisation & Documentation WG: A. Nairz, N. Benekos + AMI and Magda people (in close connection with bookkeeping & cataloguing WG)
• A-Wp12: Teams
– "Site" validation (J-F. Laporte)
• All local managers from collaborating institutes
– Physics Validation (J-F. Laporte, F. Gianotti + representatives of HLT and Physics WG’s)
– Production
• WG: P. Nevski, S. O'Neale, L. Goossens, Y. Smirnov, S. Vaniachine
• + local production managers
• (39 sites for DC1/1 and 56 sites for DC1/2)
• + ATLAS-Grid people
Success of DC1 due to effort and commitment of many world-wide sites, actively organized by A. Putzer
ATLAS Computing: DC1
• Currently we are preparing (validating) the Athena-based reconstruction step: Software Release 6 (end January). Aim is that we can launch wide-scale reconstruction a.s.a.p. after Release 6, possibly with wide use of some GRID tools. [The actual reconstruction, which will probably be (re-)done on various sub-samples over the first few months of next year is not strictly part of DC1.]
• Note that our present scheduling of software releases is driven entirely by HLT (High Level Trigger) requirements and schedule. For example, when Release 5 slipped in Fall 2002 by ~1 month compared to the original schedule, we issued two intermediate releases (adding ‘ByteStream’ [raw data format] capability) to minimise effects of delay on HLT schedule.
ATLAS Computing: DC1/HLT/EDM
• In fact, one of the most important benefits of DC1 has been the much enhanced collaboration between the HLT and ‘off-line’ communities, most prominently in the development of the raw-data part of the Event Data Model. (‘ByteStream’, Raw Data Objects, etc.)
• We have not yet focussed on the reconstruction part of the Event Data Model to the same extent, but an assessment of what we have got ‘today’ and (re-)design where appropriate is ongoing.
DC2-3-4-…
• DC2: Q4/2003 – Q2/2004
– Goals
• Full deployment of Event Data Model & Detector Description
• Geant4 becomes the main simulation engine
• Pile-up in Athena
• Test the calibration and alignment procedures
• Use LCG common software
• Use widely GRID middleware
• Perform large scale physics analysis
• Further tests of the computing model
– Scale
• As for DC1: ~10^7 fully simulated events (pile-up too)
• DC3, DC4...
– yearly increase in scale and scope
– increasing use of Grid
– testing rate capability
– testing physics analysis strategy
ATLAS and GRID
• ATLAS has already used GRID for producing DC1 simulations
– Production distributed on 39 sites, GRID used for ~5% of the total amount of data by:
• NorduGrid (8 sites), who produced all their data using GRID
• US Grid Testbed (Arlington, LBNL, Oklahoma), where GRID was used for ~10% of their DC1 share (10%=30k hours)
• EU-DataGrid re-ran 350 DC1 jobs (~10k hours) at some Tier1 prototype sites: CERN, CNAF (Italy), Lyon, RAL, NIKHEF and Karlsruhe (CrossGrid site). This last production was done in the first half of September and was made possible by the work of the ATLAS-EDG task force.
ATLAS GRID plans for the near future
• In preparation for the reconstruction phase (spring 2003) we performed further Grid tests in Nov/Dec:
– Extend the EDG to more ATLAS sites, not only in Europe.
– Test a basic implementation of a worldwide Grid.
– Test the inter-operability between the different Grid flavors.
• Inter-operation = submit a job in region A; the job is run in region B if the input data are in B; the produced data are stored; the job log is made available to the submitter.
• The EU project DataTag has a Work Package devoted specifically to inter-operation, in collaboration with the US iVDGL project; the results of these projects are expected to be taken up by LCG (GLUE framework).
• ATLAS has collaborated with DataTag-iVDGL for interoperability demonstrations in November-December 2002.
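The inter-operation scenario described above is essentially data-driven brokering: a job runs wherever its input dataset resides, while output and log are registered for the submitter. A toy sketch of that routing rule (hypothetical catalogue and site names, not any real middleware API):

```python
# Toy model of Grid inter-operation: submit in region A, run in region B
# if the input data live in B; store the produced data; return the log.
# The replica catalogue and region names below are purely hypothetical.
replica_catalog = {"dc1.sample.007": "region_B"}

def submit(job_id, input_ds, submit_region):
    """Route the job to the region holding its input data, if known."""
    run_region = replica_catalog.get(input_ds, submit_region)
    return {
        "ran_in": run_region,
        "output": f"{job_id}.out@{run_region}",  # produced data are stored
        "log":    f"{job_id}.log",               # log available to submitter
    }

result = submit("job42", "dc1.sample.007", submit_region="region_A")
```

Here the job submitted in region_A follows its input to region_B; a job whose input is uncatalogued simply runs where it was submitted.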
• The DC1 data will be reconstructed (using Athena) in early 2003; the scope and manner of using Grids for distributed reconstruction will depend on the results of the tests started in Nov/December and still ongoing.
• ATLAS is fully committed to LCG and to its Grid middleware selection process. Our “early tester” role has been recognized as very useful for EDG; we are confident that it will be the same for LCG products.
ATLAS Long Term GRID Planning
• Worldwide GRID tests are essential to define in detail the ATLAS distributed Computing Model.
• The principles of cost and resource sharing are described in a paper, presented at the last ATLAS week (October 2002) and endorsed by the ATLAS Collaboration Board:
PRINCIPLES OF COST SHARING FOR THE ATLAS OFFLINE COMPUTING RESOURCES
Prepared by: R. Jones, N. McCubbin, M. Nordberg, L. Perini, G. Poulard, and A. Putzer
• The main implementation of cost sharing is foreseen through in-kind contributions of resources at regional centres, made available for the common ATLAS computing infrastructure
ATLAS Computing: Geant4 evaluation and integration programme
• ATLAS has invested, and is investing, substantial effort in the evaluation of G4, in close collaboration with G4 itself
• Involves essentially all ATLAS sub-detectors
• Provides a reference against which any future simulation will have to compare
• Provides (sufficiently well-)tested code that should, in principle, integrate with no difficulty into a complete detector simulation suite:
  – Striving for:
    • minimal inter-detector coupling
    • minimal coupling between framework and user code
  – With this approach we are finding no problems in interfacing different detectors
• Further integration issues (framework, detector clashes, memory, performance) are being checked.
Example: Geant4 Electron Response in ATLAS Calorimetry

Overall signal characteristics:
• Geant4 reproduces the average electron signal as a function of the incident energy in all ATLAS calorimeters very well (testbeam-setup- or analysis-induced non-linearities typically within ±1%)…
• …but the average signal can be smaller than in G3 and data (1-3% for the 20-700 μm range cut in the HEC);
• signal fluctuations in the EMB are very well simulated;
• electromagnetic FCal: high-energy limit of the resolution function ~5% in G4, ~4% in data and G3;
• TileCal: stochastic term 22% GeV^1/2 in G4/G3, 26% GeV^1/2 in data; high-energy limit very comparable.

[Plots: FCal electron response, ΔE_rec (MC − Data) [%] vs. noise cut level (σ_noise), for Geant3 and Geant4; EMB electron energy resolution, stochastic term (% × √GeV) and high-energy limit (%), for data, Geant3 and Geant4.]

(thanks to P. Loch)
Conclusions on ATLAS Geant4 Physics validation
• Geant4 can simulate relevant features of muon, electron and pion signals in various ATLAS detectors, often better than Geant3;
• remaining discrepancies, especially for hadrons, are being addressed and progress can be expected in the near future;
• ATLAS has a huge amount of the right testbeam data for the calorimeters, inner-detector modules and the muon detectors to evaluate the Geant4 physics models in detail;
• feedback loops to the Geant4 team have, for most systems, been established for quite some time; communication is not a problem.
G4 simulation of full ATLAS detector
• DC0 (end 2001): robustness test with complete Muons, simplified InDet and Calorimeters
– 10^5 events, no crash!
• Now basically all detectors available
– Some parts of the detectors (dead material, toroids) are not there and are being worked on
• Combined simulation starting now
• Full geometry usable early February
• Beta version of the full simulation program to be ready end January, to be tested in realistic production.
ATLAS Computing: Interactions with the LCG Project
• The LCG project is completely central to ATLAS computing. We are committed to it, and, in our planning, we rely on it:
– Participation in RTAGs; ATLAS has provided the convenors for two major RTAGs (Persistency and Simulation);
– Commitment of ATLAS effort into POOL (‘persistency’) project:
• The POOL project is the ATLAS data persistency project!
• LCG products and the release and deployment of the first LCG GRID infrastructure (‘LCG-1’) are now in our baseline planning:
– LCG-1 must be used for our DC2 production end 2003 – early 2004
ATLAS Computing organization (1999-2002)
[Organization chart, boxes: Comp. Oversight Board; Comp. Steering Group; Physics; National Comp. Board; Technical Group; QA group; Arch. team; simulation, reconstruction and database coordinators; detector systems; Event Filter.]
Key ATLAS Computing bodies
– Computing Oversight Board (COB): ATLAS Spokesperson and Deputy, Computing Coordinator, Physics Coordinator, T-DAQ Project Leader. Role: oversight, not executive. Meets ~monthly.
– Computing Steering Group (CSG): membership is the first row and first column of the Detector/Task Matrix, plus the Data Challenge Co-ordinator, Software Controller, Chief Architect, NCB Chair and GRID Coordinator. The top executive body for ATLAS computing. Meets ~monthly.
– National Computing Board (NCB): Representatives of all regions and/or funding agencies, GRID-coordinator and Atlas Management ex-officio. Responsible for all issues which bear on national resources: notably provision of resources for World Wide Computing. Meets every two/three months.
ATLAS Detector/Task matrix (CSG members)

                  Offline Coordinator  Simulation      Reconstruction   Database
Chair             N. McCubbin          A. Dell'Acqua   D. Rousseau      D. Malon
Inner Detector    D. Barberis          F. Luehring     D. Rousseau,     D. Froidevaux
                                                       D. Barberis
Liquid Argon      J. Collot            M. Leltchuk     S. Rajagopalan   H. Ma
Tile Calorimeter  A. Solodkov          V. Tsulaya      F. Merritt       T. LeCompte
Muon System       J. Shank             A. Rimoldi      J. Shank         S. Goldfarb
LVL2 Trigger      S. George            M. Wielers      S. Tapprogge     A. Amorim
Event Filter      V. Vercesi           M. Bosman       F. Touchard      —
Other ATLAS key post-holders
– Computing Steering Group:
  • Chief Architect: D. Quarrie (LBNL)
  • Physics Co-ordinator: F. Gianotti (CERN)
  • Planning Officer: T. Wenaus (BNL/CERN)
  • Chair NCB: A. Putzer (Heidelberg)
  • GRID Coordinator: L. Perini (Milan)
  • Data Challenge Coordinator: G. Poulard (CERN)
  • Software ‘Controller’: J-F. Laporte (Saclay)
– Software Infrastructure Team:
  • Software Librarians: S. O’Neale (Birmingham), A. Undrus (BNL)
  • Release Co-ordinator (rotating): D. Barberis (Genoa)
  • Release tools: Ch. Arnault (Orsay), J. Fulachier (Grenoble)
  • Quality Assurance: S. Albrand (Grenoble), P. Sherwood (UCL)
– LCG ATLAS representatives:
  • POB (Project Oversight Board): T. Åkesson (Deputy Spokesperson), J. Huth (USA), P. Eerola (Nordic Cluster), H. Sakamoto (Japan)
  • SC2 (Software & Computing Committee): N. McCubbin (Computing Coordinator) and D. Froidevaux
  • PEB (Project Execution Board): G. Poulard (Data Challenge Coordinator)
  • GDB (Grid Deployment Board): N. McCubbin (Computing Coordinator), G. Poulard (Data Challenge Coordinator), L. Perini (Grid Coordinator, Deputy)
Proposed new computing organization
DRAFT FOR DISCUSSION
Main positions in proposed new computing organization
• Computing Coordinator
• Leads and coordinates the developments of ATLAS computing in all its aspects: software, infrastructure, planning, resources.
• Coordinates development activities with the TDAQ Project Leader(s), the Physics Coordinator and the Technical Coordinator through the Executive Board and the appropriate boards (COB and TTCC).
• Represents ATLAS computing in the LCG management structure (SC2 and other committees) and at LHC level (LHCC and LHC-4).
• Chairs the Computing Management Board.
• Software Project Leader
• Leads the developments of ATLAS software, as the Chief Architect of the Software Project.
• Is member of the ATLAS Executive Board and COB.
• Participates in the LCG Architects Forum and other LCG activities.
• Chairs the Software Project Management Board and the Architecture Team.
Main boards in proposed new computing organization (1)
• Computing Management Board (CMB):
• Computing Coordinator (chair)
• Software Project Leader
• TDAQ Liaison
• Physics Coordinator
• NCB Chair
• GRID & Operations Coordinator
• Planning & Resources Coordinator
– Responsibilities: coordinate and manage computing activities. Set priorities and take executive decisions.
– Meetings: bi-weekly.
Main boards in proposed new computing organization (2)
• Software Project Management Board (SPMB):
• Software Project Leader (chair)
• Computing Coordinator (ex officio)
• Simulation Coordinator
• Reconstruction, HLT Algorithms & Analysis Tools Coordinator(s)
• Core Services Coordinator
• Software Infrastructure Team Coordinator
• LCG Applications Liaison
• Calibration/Alignment Coordinator
• Sub-detector Software Coordinators
– Responsibilities: coordinate the coherent development of software (both infrastructure and applications).
– Meetings: bi-weekly.
Development plan (1)
• early 2003:
– completion of the first development cycle of OO/C++ software:
• Framework
• Fast Simulation
• Event Data Model
• Geometry
• Reconstruction
– implementation of the complete simulation in Geant4, and Geant4/Athena integration
• reminder: the first cycle of OO development had to prove that the “new s/w can do at least as well as the old one”, and was based on “translation” of algorithms and data structures from Fortran to C++
Development plan (2)
• 2003 – 2005:
  – Second cycle of OO software development (proper design of several components is needed):
    • Event Data Model and Geometry:
      – coherent design across all detectors and data types
      – optimization of data access in memory and on disk
    • Integrated development of alignment/calibration procedures
    • Development and integration of the Conditions Data Base
    • Simulation:
      – optimization of Geant4 (geometry and physics)
      – optimization of detector response
    • On-line/off-line integration: Trigger and Event Filter software
    • Reconstruction: development of a global strategy, based on modular interchangeable components
Major Milestones
[Milestone chart]
• 1 Tbyte database prototype (Done)
• Release of Athena pre-alpha version (Done)
• Athena alpha release (Done)
• Geant3 digi data available (Done)
• Athena beta release (Done)
• Athena accepted (ARC concluded) (Done)
• Athena Lund release (Done)
• Event store architecture design document (Done)
• DC0 production release (Done)
• Decide on database product (Done)
• DC0 Completed - continuity test (Done)
• Full validation of Geant4 physics (Done)
• DC1 Completed (Delay)
• Computing TDR Finished (Align with LCG) (Delay)
• DC2 Completed (followed by annual DCs) (Delay)
• DC3 Completed (Exercise LCG-3) (New)
• Physics readiness report completed (Delay)
• DC4 Completed (New)
• Full chain in real environment (DC5) (Delay)

Green: Done; Gray: Original date; Blue: Current date
Perspectives
• This plan of action is realistic and can succeed if:
• there are sufficient Human Resources
• there is a “critical mass” of people working together in a few key institutions, first of all at CERN
• there is general consensus on where we are heading, and by which means (not always true in the past)