Trigger Software Upgrades
John Baines, Tomasz Bold, Joerg Stelzer, Werner Wiedenmann
Trigger Software Upgrades Meetings

Purpose of these meetings:
• Bring together people working on Phase-I Trigger software upgrades targeted at Run 3
• Coordinate HLT work on frameworks and on the exploitation of new technologies
Organisation & Meetings

• Trigger Core Software
  – Covers both operations & upgrade
  – Meetings: Fridays 15:00 (chairs: Joerg Stelzer, Attila Krasznahorkay, Werner Wiedenmann)
  – Periodic meetings dedicated to Software Upgrades (chaired by Tomasz & John); currently planned: 19 Sep, 5 Dec (s/w weeks), other dates as needed
• DAQ/HLT Software and Operations
  – Covers both operations & upgrade
  – Meetings: Thursdays 14:00 (chairs: Rainer Hauser, Wainer Vandelli)
• Meeting in Copenhagen TDAQ week – parallel session Tuesday 15 July
  – Discussion session focusing on:
    • Online/HLT interface: present and past experience & discussion of implications for new framework requirements
    • Accelerators: how to quantify benefits & cost, including the cost of additional online complexity
Motivation for Trigger Software Upgrades

• Meet physics requirements within online & offline resource constraints:
  ⇒ cleverer selections to maintain HLT rejection
  ⇒ faster code that fully exploits the capability of the farm hardware
• HLT upgrades to match detector & L1 upgrades: FTK, Muon New Small Wheel, L1Topo
• Exploit technology evolution:
  – Increased number of cores ⇒ it may no longer be possible to run one application per core
  – Possible trend towards a larger number of small, low-power cores with less memory, either instead of or in addition to larger CPU cores
  – Availability of more specialised hardware, e.g. GPGPUs
  – Evolution of compilers, libraries etc.
Upgrade Work Packages

• The TDAQ Phase-I TDR defines "Trigger" and "Online" work packages.
• In practice they are closely coupled:

Online work package:
  – HLT Processing Unit
  – Evaluate & exploit new technologies
  – Online core software, infrastructure
  – Configuration, control, monitoring
  – Dataflow, event format
  – Detector software & tools

Trigger work package:
  – Trigger core software
  – Evaluate & exploit new technologies
  – Menus & algorithms
  – Simulation

TDAQ Phase-I upgrade TDR: https://cds.cern.ch/record/1602235

To discuss today: Trigger Core Software, DAQ/HLT Software, Signatures & Menus
Tasks: Trigger Core Software

• Design & implementation of the new offline/HLT framework
  – Requirements, design, prototyping and implementation of the new framework, in collaboration with offline and other experiments
  – Design & implementation of the Steering/Scheduler: a common HLT/offline mechanism for concurrent algorithm scheduling
  – Interface to online software
  – Design & implementation of HLT-specific features/extensions of the new framework
• Exploitation of the new framework
  – Central work to migrate signatures and algorithms
  – Monitoring (especially cost monitoring) able to handle parallel, asynchronous component execution
  – Tools for parallel software validation and debugging
• Infrastructure for offloading work to GPUs, other co-processors or idle cores
• Trigger configuration upgrades: support changes to the Level-1 hardware and HLT software
• Support for FTK: Steering & RegionSelector
Tasks: New Technologies

• Evaluate CPU and co-processor/accelerator developments
• Software optimisation:
  – using profiling tools and techniques, expert code inspection and code redesign
  – make better use of the parallelism provided by CPU architectures
• Look at new compilers, languages and libraries to facilitate optimal use of new hardware and parallel programming techniques
• Define best practices for the implementation of framework & algorithms on the chosen hardware
Tasks: Trigger Menus and Algorithms

• Speed up code, especially detector-specific code for data preparation & reconstruction
• Improve selections:
  – maintain efficiency w.r.t. offline & rejection
  – track offline changes
  – improved robustness w.r.t. pile-up
  – benefit from use of FTK information

Tasks: Simulation

• Ability to simulate the trigger as run online (use of old software versions)
• FTK simulation (fast and full)
• Fast trigger simulation (L1+HLT) based on parameterisation
• Explore a flexible approach in common with the Integrated Simulation Framework
Timescales: Framework, Steering & New Technologies (draft version for discussion)

[Timeline chart, 2014 Q3/Q4 through LS1 to Run 3, with milestones marked at TDR+0, TDR+6 and TDR+12 months: requirements capture complete; design of framework & HLT components complete; framework core functionality complete, incl. HLT components & new-technology support; narrow h/w choices (e.g. use GPU or not); fix PC architecture; prototype with 1 or 2 chains; simple menu, then full menu complete; HLT software commissioning complete; final software (framework & algorithms) complete; commissioning run. Parallel activity tracks: Framework (design & prototype → implement core functionality → extend to full functionality); New Tech. (evaluate → implement infrastructure → exploit new technologies in algorithms); Algs & Menus (speed up code, thread-safety, investigate possibilities for internal parallelisation → implement algorithms in the new framework).]
Today's Meeting

Aims for today's meeting: discuss and start to form a plan on:

1) How to speed up algorithms: code optimisation, vectorisation, internal parallelisation
   • What are the priorities?
   • What tools are there to help?
   • What code re-design is needed (e.g. EDM)?

2) How do we evaluate, choose and exploit future technologies & architectures in the HLT farm?
   • What technologies should we follow?
   • What demonstrators/prototypes are needed?
   • What infrastructure is needed?
   • What do we need to measure?
Additional Material
Timescales: draft version for discussion

[Backup timeline chart, 2014 Q3/Q4 through LS1 to Run 3, with milestones marked at TDR+0, TDR+6 and TDR+12 months: requirements capture complete; framework core functionality complete, incl. HLT components & new-tech support; design of framework & HLT components complete; narrow h/w choices (e.g. use GPU or not); fix PC architecture; NF (new framework) prototype with 1 or 2 chains; simple menu implemented in NF; full menu complete; HLT software commissioning complete; final software (framework & algorithms) complete. Additional FTK and simulation milestones: initial FTK chains, then all FTK chains; FTK fast simulation; trigger fast simulation design complete, then complete, then validated.]
GPUs

Benefits:
• Potential for very large speed-ups for specific algorithms/parts of code (up to ~x30)
  – partly from EDM and code restructuring (factor 2-3?) and partly from use of the GPU
• A lot of interest; a good way to bring in new people

Issues:
• Lower speed-ups for some other algorithms/code
• Overheads to ship conditions & event data to/from the GPU
• Need to rewrite code in a specialist language (CUDA, OpenCL)
• Need to restructure EDM and code to be parallelisable (but useful for CPU as well as GPU)
• Rapidly evolving hardware ⇒ code restructured for specific hardware may be much less efficient on different hardware
• GPGPUs becoming less general-purpose? Trend to more cores, less memory?

Questions:
• Important to evaluate & track this technology, but how much effort should we invest? What can we learn from demonstrators? How complete do they need to be?
• Language: proprietary (e.g. CUDA) or cross-platform (e.g. OpenCL)?
• How to integrate with Athena? What framework infrastructure is needed? APE, dOpenCL etc.
Frameworks

• Desirable to have a common framework for trigger & offline:
  – unique window of opportunity now to influence the framework design
• Requirements capture ongoing:
  – FFReq: joint Trigger + Offline; bi-weekly meetings; Tomasz + Ben (John ex officio)
  – Parallel session at TDAQ week to discuss online constraints
• Prototyping:
  – GaudiHive: based on real algorithms, so far offline code only (CaloHive, IDHive); stalled due to issues with Tools, Services, Incidents
  – TBB scheduler (Tomasz) based on dummy algorithms
• Questions:
  – What can we learn from demonstrators? Do we need real algorithms?
  – What HLT-specific components are needed? Can the offline & HLT schedulers be the same?
Some Issues for Discussion: Optimisation & New Technologies

• Code optimisation:
  – Code profiling, optimisation & thread-safety are a vital first step: how do we motivate & attract more effort for this?
  – Can all/most code used in the trigger (incl. increasing amounts of offline code) be made thread-safe? What do we do if it can't?
  – Restructuring EDM and code is vital for internal parallelisation: is this achievable?
  – What is the correct balance between re-writing and re-use?
• New technologies:
  – GPUs are a speculative activity: how much effort should we put into it?
  – How do we make architecture decisions (e.g. GPU or not)? What input is needed?
  – What do we need to measure with GPU demonstrators?
    • How complete do they need to be?
    • What can we learn from standalone demonstrators, and when must they be integrated in Athena?
Assessment Criteria for Cost/Benefit for GPUs

• Increase in throughput: compare the throughput of a fully occupied CPU node running C++ (e.g. 2x16 cores with hyperthreading) with the same system plus a GPU integrated into Athena via APE. Reference 1: original C++ code. Reference 2: C++ code restructured and optimised to the same level as the GPU code.
• Cost: cost of hardware & support; effort needed to port code to OpenCL/CUDA.
• H/w integration: physical size, heat output, how mounted (PCI, ...).
• S/w integration: interaction with run control, farm monitoring, error reporting.
• Maintenance: how easy it is to maintain the software & to pass on maintenance to others.
• Debugging: how easy/difficult it is to pinpoint errors occurring online/on the Grid so that they can be reported & assigned (by a non-expert) & debugged (by an expert).
Some Issues for Discussion: Frameworks

• What questions do we need framework demonstrators to answer?
  – How complete does a demonstrator have to be?
  – What can be learnt with dummy algorithms & what needs real code?
• How do we make the choice of framework technology (e.g. GaudiHive or another)?
• Is it a framework requirement to minimise the modifications to algorithm code, or can we assume significant algorithm code renewal?
Possible Next Steps

• Code optimisation
• Framework requirements:
  – Complete the framework requirements capture
• Framework demonstrator:
  – Step 1: Simple demonstrator: implement, with a modified GaudiHive scheduler and/or the TBB scheduler, a small menu (a few chains, a few steps per chain) with step-wise execution of dummy algorithms and a menu decision after each step
  – Step 2: Extended prototype: once the problem with Tools, Services and Incidents is solved, implement a small menu running a few real algorithms, to identify any issues using a more realistic prototype
• GPU demonstrator:
  – Calo data prep & TopoCluster
  – ID data prep & ID tracking
  – Muon data prep & muon tracking
  – Integrated into Athena using APE
Work Areas Needing People

Framework & HLT Steering:
• Framework: demonstrator evaluation, requirements capture, design, implementation of HLT-specific components
• Steering:
GPUs: Timing Example

Example of a complete L2 ID chain implemented on a GPU (Dmitry Emeliyanov).
Time (ms) per Tau RoI 0.6x0.6, ttbar events at 2x10^34; C++ on a 2.4 GHz CPU vs CUDA on a Tesla C2050:

Step              CPU (ms)   GPU (ms)   Speedup CPU/GPU
Data Prep.           27         3             9
Seeding              8.3        1.6           5
Seed ext.            156        7.8          20
Triplet merging      7.4        3.4           2
Clone removal        70         6.2          11
CPU-GPU transfer     n/a        0.1          n/a
Total                268        22           12

(Seeding through clone removal constitute the L2 tracking; the first row is the data preparation.)
Data Preparation Code

[Remaining slides: figures only, no recoverable text.]