Trigger Software Upgrades
John Baines, Tomasz Bold, Joerg Stelzer, Werner Wiedenmann
Trigger Software Upgrades Meetings

Purpose of these meetings:
• Bring together people working on Phase-I Trigger software upgrades targeted at Run 3
• Coordinate HLT work on frameworks and on the exploitation of new technologies
Organisation & Meetings

• Trigger Core Software
  – Covers both operations & upgrade
  – Meetings: Fridays 15:00 (chairs: Joerg Stelzer, Attila Krasznahorkay, Werner Wiedenmann)
  – Periodic meetings dedicated to Software Upgrades (chaired by Tomasz & John); currently planned: 19 Sep, 5 Dec (s/w weeks), other dates as needed
• DAQ/HLT Software and Operations
  – Covers both operations & upgrade
  – Meetings: Thursdays 14:00 (chairs: Rainer Hauser, Wainer Vandelli)
• Meeting in Copenhagen TDAQ week – parallel session Tuesday 15 July
  – Discussion session focusing on:
    • Online/HLT interface: present and past experience & discussion of implications for new framework requirements
    • Accelerators: how to quantify benefits & cost, including the cost of additional online complexity
Motivation for Trigger Software Upgrades

• Meet physics requirements within online & offline resource constraints:
  ⇒ cleverer selections to maintain HLT rejection
  ⇒ faster code that fully exploits the capability of the farm hardware
• HLT upgrades to match detector & L1 upgrades: FTK, Muon New Small Wheel, L1Topo
• Exploit technology evolution:
  – Increased number of cores ⇒ it may no longer be possible to run one application per core
  – Possible trend towards a larger number of small, low-power cores with less memory, either instead of or in addition to larger CPU cores
  – Availability of more specialised hardware, e.g. GPGPUs
  – Evolution of compilers, libraries etc.
Upgrade Work Packages

• The TDAQ Phase-I TDR defines "Trigger" and "Online" work packages.
• In practice they are closely coupled:

Online work package:
  – HLT Processing Unit
  – Evaluate & exploit new technologies
  – Online core software, infrastructure
  – Configuration, control, monitoring
  – Dataflow, event format
  – Detector software & tools

Trigger work package:
  – Trigger core software
  – Evaluate & exploit new technologies
  – Menus & algorithms
  – Simulation

TDAQ Phase-I upgrade TDR: https://cds.cern.ch/record/1602235

To discuss today: Trigger Core Software, DAQ/HLT Software, Signatures & Menus
Tasks: Trigger Core Software

• Design & implementation of the new offline/HLT framework
  – Requirements, design, prototyping and implementation of the new framework, in collaboration with offline and other experiments
  – Design & implementation of the Steering/Scheduler: a common HLT/offline mechanism for concurrent algorithm scheduling
  – Interface to online software
  – Design & implementation of HLT-specific features/extensions of the new framework
• Exploitation of the new framework
  – Central work to migrate signatures and algorithms
  – Monitoring (especially cost monitoring) able to handle parallel, asynchronous component execution
  – Tools for parallel software validation and debugging
• Infrastructure for offloading work to GPUs, other co-processors or idle cores
• Trigger configuration upgrades: support changes to the Level-1 hardware and HLT software
• Support for FTK: Steering & RegionSelector
Tasks: New Technologies

• Evaluate CPU and co-processor/accelerator developments
• Software optimisation:
  – using profiling tools and techniques, expert code inspection and code redesign
  – make better use of the parallelism provided by CPU architectures
• Look at new compilers, languages and libraries to facilitate optimal use of new hardware and parallel programming techniques
• Define best practices for the implementation of framework & algorithms on the chosen hardware
Tasks: Trigger Menus and Algorithms

• Speed up code, especially detector-specific code for data preparation & reconstruction
• Improve selections:
  – maintain efficiency w.r.t. offline & rejection
  – track offline changes
  – improved robustness w.r.t. pile-up
  – benefit from use of FTK information

Tasks: Simulation

• Ability to simulate the trigger as run online (use of old software versions)
• FTK simulation (fast and full)
• Fast trigger simulation (L1+HLT) based on parameterisation
• Explore a flexible approach in common with the Integrated Simulation Framework
Timescales: Framework, Steering & New Technologies (draft version for discussion)

[Timeline chart, 2014 Q3/Q4 through LS1 to Run 3, with milestones marked at TDR+0, TDR+6 and TDR+12 months: requirements capture complete; design of framework & HLT components complete; framework core functionality complete, incl. HLT components & new-technology support; narrow h/w choices (e.g. use GPU or not); fix PC architecture; prototype with 1 or 2 chains; simple menu, then full menu complete; HLT software commissioning complete; final software (framework & algorithms) complete; commissioning run. Parallel activity tracks: Framework (design & prototype → implement core functionality → extend to full functionality); New Tech. (evaluate → implement infrastructure → exploit new technologies in algorithms); Algs & Menus (speed up code, thread-safety, investigate possibilities for internal parallelisation → implement algorithms in the new framework).]
Today's Meeting

Aims for today's meeting: discuss and start to form a plan on:

1) How to speed up algorithms: code optimisation, vectorisation, internal parallelisation
   • What are the priorities?
   • What tools are there to help?
   • What code re-design is needed (e.g. EDM)?

2) How do we evaluate, choose and exploit future technologies & architectures in the HLT farm?
   • What technologies should we follow?
   • What demonstrators/prototypes are needed?
   • What infrastructure is needed?
   • What do we need to measure?
Additional Material
Timescales: draft version for discussion

[Backup timeline chart, 2014 Q3/Q4 through LS1 to Run 3, with milestones marked at TDR+0, TDR+6 and TDR+12 months: requirements capture complete; framework core functionality complete, incl. HLT components & new-tech support; design of framework & HLT components complete; narrow h/w choices (e.g. use GPU or not); fix PC architecture; NF (new framework) prototype with 1 or 2 chains; simple menu implemented in NF; full menu complete; HLT software commissioning complete; final software (framework & algorithms) complete. Additional FTK and simulation milestones: initial FTK chains, then all FTK chains; FTK fast simulation; trigger fast simulation design complete, then complete, then validated.]
GPUs

Benefits:
• Potential for very large speed-ups for specific algorithms/parts of code (up to ~x30)
  – partly from EDM and code restructuring (factor 2-3?) and partly from use of the GPU
• A lot of interest; a good way to bring in new people

Issues:
• Lower speed-ups for some other algorithms/code
• Overheads to ship conditions & event data to/from the GPU
• Need to rewrite code in a specialist language (CUDA, OpenCL)
• Need to restructure EDM and code to be parallelisable (but useful for CPU as well as GPU)
• Rapidly evolving hardware ⇒ code restructured for specific hardware may be much less efficient on different hardware
• GPGPUs becoming less general-purpose? Trend to more cores, less memory?

Questions:
• Important to evaluate & track this technology, but how much effort should we invest? What can we learn from demonstrators? How complete do they need to be?
• Language: proprietary (e.g. CUDA) or cross-platform (e.g. OpenCL)?
• How to integrate with Athena? What framework infrastructure is needed? APE, dOpenCL etc.
Frameworks

• Desirable to have a common framework for trigger & offline:
  – unique window of opportunity now to influence the framework design
• Requirements capture ongoing:
  – FFReq: joint Trigger + Offline; bi-weekly meetings; Tomasz + Ben (John ex officio)
  – Parallel session at TDAQ week to discuss online constraints
• Prototyping:
  – GaudiHive: based on real algorithms, so far offline code only (CaloHive, IDHive); stalled due to issues with Tools, Services, Incidents
  – TBB scheduler (Tomasz) based on dummy algorithms
• Questions:
  – What can we learn from demonstrators? Do we need real algorithms?
  – What HLT-specific components are needed? Can the offline & HLT schedulers be the same?
Some Issues for Discussion: Optimisation & New Technologies

• Code optimisation:
  – Code profiling, optimisation & thread-safety are a vital first step: how do we motivate & attract more effort for this?
  – Can all/most code used in the trigger (incl. increasing amounts of offline code) be made thread-safe? What do we do if it can't?
  – Restructuring EDM and code is vital for internal parallelisation: is this achievable?
  – What is the correct balance between re-writing and re-use?
• New technologies:
  – GPUs are a speculative activity: how much effort should we put into it?
  – How do we make architecture decisions (e.g. GPU or not)? What input is needed?
  – What do we need to measure with GPU demonstrators?
    • How complete do they need to be?
    • What can we learn from standalone demonstrators, and when must they be integrated in Athena?
Assessment Criteria for Cost/Benefit for GPUs

• Increase in throughput: compare the throughput of a fully occupied CPU node running C++ (e.g. 2x16 cores with hyperthreading) with the same system plus a GPU integrated into Athena via APE. Reference 1: original C++ code. Reference 2: C++ code restructured and optimised to the same level as the GPU code.
• Cost: cost of hardware & support; effort needed to port code to OpenCL/CUDA.
• H/w integration: physical size, heat output, how mounted (PCI, ...).
• S/w integration: interaction with run control, farm monitoring, error reporting.
• Maintenance: how easy it is to maintain the software & to pass on maintenance to others.
• Debugging: how easy/difficult it is to pinpoint errors occurring online/on the Grid so that they can be reported & assigned (by a non-expert) & debugged (by an expert).
Some Issues for Discussion: Frameworks

• What questions do we need framework demonstrators to answer?
  – How complete does a demonstrator have to be?
  – What can be learnt with dummy algorithms & what needs real code?
• How do we make the choice of framework technology (e.g. GaudiHive or another)?
• Is it a framework requirement to minimise the modifications to algorithm code, or can we assume significant algorithm code renewal?
Possible Next Steps

• Code optimisation
• Framework requirements:
  – Complete the framework requirements capture
• Framework demonstrator:
  – Step 1: Simple demonstrator: implement, with a modified GaudiHive scheduler and/or the TBB scheduler, a small menu (a few chains, a few steps per chain) with step-wise execution of dummy algorithms and a menu decision after each step
  – Step 2: Extended prototype: once the problem with Tools, Services and Incidents is solved, implement a small menu running a few real algorithms, to identify any issues using a more realistic prototype
• GPU demonstrator:
  – Calo data prep & TopoCluster
  – ID data prep & ID tracking
  – Muon data prep & muon tracking
  – Integrated into Athena using APE
Work Areas Needing People

Framework & HLT Steering:
• Framework: demonstrator evaluation, requirements capture, design, implementation of HLT-specific components
• Steering:
GPUs: Timing Example

Example of a complete L2 ID chain implemented on a GPU (Dmitry Emeliyanov).
Time (ms) per Tau RoI 0.6x0.6, ttbar events at 2x10^34; C++ on a 2.4 GHz CPU vs CUDA on a Tesla C2050:

Step              CPU (ms)   GPU (ms)   Speedup CPU/GPU
Data Prep.           27         3             9
Seeding              8.3        1.6           5
Seed ext.            156        7.8          20
Triplet merging      7.4        3.4           2
Clone removal        70         6.2          11
CPU-GPU transfer     n/a        0.1          n/a
Total                268        22           12

(Seeding through clone removal constitute the L2 tracking; the first row is the data preparation.)
Data Preparation Code

[Remaining slides: figures only, no recoverable text.]