Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
The Trigger and Real-time Reconstruction at LHCbCPAD Instrumentation Frontier Workshop 2019
Daniel Craikon behalf of the LHCb collaboration
Massachusetts Institute of Technology
9th December, 2019
Photo: User:Emery / Wikimedia Commons / CC-BY-SA-2.5
The LHCb detector
Located at point 8 of the LHCGeneral-purpose detector in the forward regionSpecialised in studying b- and c-decays
Instrumented in the forwardregion to exploitforward-production of c- andb-hadrons
0/4π
/2π/4π3
π
0/4π
/2π/4π3
π [rad]1θ
[rad]2θ
1θ
2θ
b
b
z
LHCb MC = 14 TeVs
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 1 / 23
The LHCb detector
Instrumentation in theforward region(2 < η < 5)Excellent secondaryvertex reconstructionPrecise tracking beforeand after magnetGood PID separation upto ∼ 100 GeV/c
JINST 3 (2008) S08005
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 2 / 23
https://doi.org/10.1088/1748-0221/3/08/S08005
LHCb timeline
2010 2012 2014 2016 2018 now 2022 2024 2026 2028 2030 2032
LHC HL-LHC
Run I LS 1 Run II LS 2 Run III LS 3 Run IV LS 4 Runs V+
Phase I UpgradeTriggerless readout at 40 MHz
Phase Ib UpgradePossible stepping stone
Phase II UpgradeUpgrade for HL
9 fb−1 50 fb−1 300 fb−1
Belle 250 ab−1
LHCb may be only dedicated B-physics experiment
timetable may shift
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 3 / 23
Real-time reconstruction in Run II
Hardwaretrigger
high pT, ET
1st softwaretrigger
partial reco
2nd softwaretriggerfull reco
ReconstructionAlign + Calib
Analysis40 MHz 1 MHz 100 kHz 5 kHz 5 kHz
“Online”: near detector “Offline”: grid computing
Time from collision: µs ms hours weeks
Run I:
Hardwaretrigger
high pT, ET
1st softwaretrigger
partial reco
9PB bufferReal-time
Align + Calib
2nd softwaretriggerfull reco
Analysis(Turbo)
40 MHz 1 MHz 100 kHz 100 kHz 12 kHz
Online Offline
Time from collision: µs ms hours hours
Run II:
Calibration and alignment of Run I data performed “offline” weeks after data takingTrigger reconstruction different from offline
In Run II, data buffered before final trigger stageAllows for real-time alignment and calibrationOffline-like reconstruction within the triggerMany analyses use “Turbo-stream” data – online reconstruction, full raw event notsaved
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 4 / 23
Real-time reconstruction in Run II
Real-time alignment andcalibration performed forvertex locator, RICHdetectors, trackingstations, calorimeter andmuon stationsWill focus on velo andRICHAlignment particularlyimportant for velo, whichopens and closesbetween fillsGas-filled RICH detectorsalso require frequentcalibration
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 5 / 23
Real-time reconstruction in Run II
Each alignment task performedonce per fillAlignment begins once a largeenough dataset has beencollectedCalibration of RICH gasrefractive index performedregularly to account fortemperature/pressure changeswithin the radiator gas
LHCb Preliminary LHCb Preliminary
align + calib
initial improved
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 6 / 23
Real-time alignment of the Velo
Vertex locator modules sit5 mm from the LHC beamConsists of two retractablehalves (one shown)Modules formed of twosections – one on each velohalfDuring beam injection, veloretracted to 35 mm for safetyClosed once LHC beams arestableMoves every fill→ align everyfill
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 7 / 23
Real-time alignment of the Velo
Alignment of velo based on minimisingresiduals between hits andreconstructed tracksPlot shows x and y translation betweenthe two velo halvesTolerance of ±2µm allowed withoutalignment update (empty markers)Updates may also be caused by otherdegrees of freedom
e.g. offsets or rotations within a velohalf
∆Tx
x
z
100 200Alignment number [a.u.]
20−15−10−
5−0
5
10
15
20m]
µV
aria
tion
[
x-translationy-translationLHCb VELO
Preliminary
Empty markers = no update 17/04/2018 - 21/11/2018
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 8 / 23
Real-time calibration of the RICH
250 mrad
Track
Beam pipe
PhotonDetectors
VELO exit window
SphericalMirror
PlaneMirror
C4F10
0 100 200 z (cm)
MagneticShield
RICH detectors provide particle ID information based onangle of Cherenkov radiationIndex of refraction of the gas radiators sensitive tochanges in temperature, pressure and compositionThese features are monitored but data-driven calibrationalso requiredCompare recorded and expected Cherenkov angles(bottom left)Alignment of mirrors also calibrated (bottom right)
2
FIG. 2. Side view of the LHCb RICH detector upstream of themagnet.
system, and calibrate the refractive index of radiators andthe HPD image with good precision. These factors are alltime-dependent, necessitating real-time calibration andalignment of the LHCb RICH detectors, and the trackingsystem.Calibration and alignmentCalibration of the refractive index of the RICH radi-ators
The refractive index of the gas radiators depends onthe ambient temperature and pressure, and the exactcomposition of the gas mixture; so it can change in time.These quantities are monitored by hardware to computean expected refractive index, but this does not have aprecision that is high enough for the physics analysis,therefore it needs to be further corrected. As shown inFig. 3, the distribution of the difference between the re-constructed and expected Cherenkov angle is fitted to ob-tain the shift, which is then converted to a scale factor ofthe expected refractive index according to studies basedon simulation.
About 50 Hz of events are sent to multiple online re-construction tasks, which run in parallel, and the result-ing histograms are merged at the end of each run. Thena dedicated task is used to fit the histograms merged run-by-run and produce calibration constants to be used bythe RICH reconstruction in the final stage of the softwaretrigger. The maximum run length is one hour.Calibration of the HPD images
The Hybrid Photon Detector is used to detectCherenkov photons. As shown in Fig. 4, the photoelec-tron produced at the photocathode is accelerated by ahigh voltage of up to 20 kV onto a reverse-biased pixel-lated silicon detector, with a de-magnification factor of
delta(Cherenkov Theta) / rad0.008− 0.006− 0.004− 0.002− 0 0.002 0.004 0.006 0.008
Entr
ies
1600
1800
2000
2200
2400
2600
2800
310×Rich1Gas Rec-Exp Cktheta | All photons Entries 3.643921e+08Mean 0.0002294
Std Dev 0.004334 / ndf 2χ 1711 / 90
Gaus Constant 8.645e+05Gaus Mean 0.0006738Gaus Sigma 0.001926p3 1.905e+06p4 1.09e+07p5 2.062e+09− p6 1.113e+11
Run 182551SF 1.025672
Mon Aug 29 05:41:53 2016
Rich1Gas Rec-Exp Cktheta | All photons
FIG. 3. Difference between the reconstructed and expectedCherenkov angle before the calibration.
FIG. 4. Schematic drawing of the Hybrid Photon Detector(HPD).
about 5 [6]. The HPD anode images are affected by themagnetic and electric fields, and have been observed tomove and change their size, possibly due to changes inthese residual fields when the high voltage is cycled eachLHC fill. Such changes could degrade the reconstructionof the Cherenkov angle and affect the PID performance.Therefore the centre and radius of all the HPD imagesare calibrated run-by-run. Figure 5 shows the calibra-tion process. First, the centre of the image is cleaned toeliminate ion feedback. Then a Sobel filter is used to de-tect the edges of the image that are fitted to determinethe centre and the radius of the image, which are used bythe RICH reconstruction in the final stage of the softwaretrigger. As only the raw HPD data needs to be decoded,more than 500 Hz of events are processed run-by-run.Alignment of the RICH mirror system
The Cherenkov photons emitted by the charged parti-cles passing through the RICH detectors are focused ontothe photon-detector plane by the spherical and secondarymirrors. In case of misalignment the centre of Cherenkovring would not correspond to the intersection point of thecharged track, and this would introduce a dependenceof the difference between the measured and expectedCherenkov angle on the azimuthal angle of the ring, as
3
FIG. 5. Calibration process of the HPD image. (a) HPD imagefor a typical run. (b) The centre of the HPD image is cleanedto eliminate ion feedback. (c) Edge of the HPD image detectedby the Sobel filter. (d) Radius and centre of the HPD imagereturned by the fit.
[rad]φ
0 2 4 6
[ra
d]
θ∆
-0.002
-0.0015
-0.001
-0.0005
0
0.0005
0.001
0.0015
0.002
600
700
800
900
1000
1100
310×
[rad]φ
0 2 4 6
[ra
d]
θ∆
-0.002
-0.0015
-0.001
-0.0005
0
0.0005
0.001
0.0015
0.002
600
700
800
900
1000
1100
310×
FIG. 6. Difference between the measured and expectedCherenkov angle, ∆θC plotted as a function of the azimuthalangle φ and fitted with θx cos(φ) + θy sin(φ), for one side ofthe RICH 2 detector [6]. The upper plot is prior to alignment,and shows a dependency of the angle θC on the angle φ. Thebottom plot is after the alignment correction, and ∆θC is uni-form in φ.
Alignment number0 5 10
Abs
olut
e va
lue
[mra
d]
0
0.05
0.1
0.15
0.2
LHCb RICHPreliminary
24/05/2016 - 24/08/2016
average local y-rotationrange of local y-rotationsaverage local z-rotationrange of local z-rotations
Alignment number0 5 10
Abs
olut
e va
lue
[mra
d]
0
0.05
0.1
0.15
0.2
LHCb RICHPreliminary
24/05/2016 - 24/08/2016
average local y-rotationrange of local y-rotationsaverage local z-rotationrange of local z-rotations
FIG. 7. Time dependence of the mirror alignment parametersfor the RICH detector downstream the magnet for the 2016 datasample, (upper) spherical mirror, (bottom) secondary mirror.The shadow regions show the range of alignment parametersfor all the mirror pairs, the points show the average value forall the mirror pairs.
shown in Fig. 6. The alignment constants for each mirrorare determined by the fit of the Cherenkov angle differ-ence as a function of the azimuthal angle on the ring.The correlation between the different mirror pairs is alsotaken into account. The procedure is evaluated by an iter-ative procedure implemented in a dedicated framework,which makes it possible to run the alignment in parallelusing about 1800 nodes of the software trigger farm. Thealignment of the RICH mirror system has been found tobe stable enough to not affect the PID performance, asshown in Fig. 7, and it runs routinely as a monitoringtask.
Time alignment
In order to maximise the photon collection efficiencyof the LHCb RICH detectors, the HPD readout must besynchronised with the LHC bunch crossing to within afew nanoseconds. The initial time alignment was per-formed in the absence of beam using a pulsed laser, andhas been improved further with dedicated timing scandata taken during physics collisions. As shown in Fig. 8,all the HPDs have been time aligned to about 1 ns.
3
FIG. 5. Calibration process of the HPD image. (a) HPD imagefor a typical run. (b) The centre of the HPD image is cleanedto eliminate ion feedback. (c) Edge of the HPD image detectedby the Sobel filter. (d) Radius and centre of the HPD imagereturned by the fit.
[rad]φ
0 2 4 6
[ra
d]
θ∆
-0.002
-0.0015
-0.001
-0.0005
0
0.0005
0.001
0.0015
0.002
600
700
800
900
1000
1100
310×
[rad]φ
0 2 4 6
[ra
d]
θ∆
-0.002
-0.0015
-0.001
-0.0005
0
0.0005
0.001
0.0015
0.002
600
700
800
900
1000
1100
310×
FIG. 6. Difference between the measured and expectedCherenkov angle, ∆θC plotted as a function of the azimuthalangle φ and fitted with θx cos(φ) + θy sin(φ), for one side ofthe RICH 2 detector [6]. The upper plot is prior to alignment,and shows a dependency of the angle θC on the angle φ. Thebottom plot is after the alignment correction, and ∆θC is uni-form in φ.
Alignment number0 5 10
Abs
olut
e va
lue
[mra
d]
0
0.05
0.1
0.15
0.2
LHCb RICHPreliminary
24/05/2016 - 24/08/2016
average local y-rotationrange of local y-rotationsaverage local z-rotationrange of local z-rotations
Alignment number0 5 10
Abs
olut
e va
lue
[mra
d]
0
0.05
0.1
0.15
0.2
LHCb RICHPreliminary
24/05/2016 - 24/08/2016
average local y-rotationrange of local y-rotationsaverage local z-rotationrange of local z-rotations
FIG. 7. Time dependence of the mirror alignment parametersfor the RICH detector downstream the magnet for the 2016 datasample, (upper) spherical mirror, (bottom) secondary mirror.The shadow regions show the range of alignment parametersfor all the mirror pairs, the points show the average value forall the mirror pairs.
shown in Fig. 6. The alignment constants for each mirrorare determined by the fit of the Cherenkov angle differ-ence as a function of the azimuthal angle on the ring.The correlation between the different mirror pairs is alsotaken into account. The procedure is evaluated by an iter-ative procedure implemented in a dedicated framework,which makes it possible to run the alignment in parallelusing about 1800 nodes of the software trigger farm. Thealignment of the RICH mirror system has been found tobe stable enough to not affect the PID performance, asshown in Fig. 7, and it runs routinely as a monitoringtask.
Time alignment
In order to maximise the photon collection efficiencyof the LHCb RICH detectors, the HPD readout must besynchronised with the LHC bunch crossing to within afew nanoseconds. The initial time alignment was per-formed in the absence of beam using a pulsed laser, andhas been improved further with dedicated timing scandata taken during physics collisions. As shown in Fig. 8,all the HPDs have been time aligned to about 1 ns.
NIM A 12 (2016) 041
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 9 / 23
https://doi.org/10.1016/j.nima.2016.12.041
The turbo stream
Save only parts of the eventneeded for offline analysisMultiple persistence levels
Only candidate (∼ 7 kB)Part of event (∼ 16 kB)Full event (∼ 48 kB)cf. Non-turbo (∼ 69 kB)
[Comput.Phys.Commun. 208 (2016) 35], applied in LHCb since 2015
Turbo at LHCb
REAL TIME ANALYSIS
23
It exploits the event topology and saves only a subset of the objects which are relevant for a posterior analysis. One can use several persistence levels:
Used by many analyses, e.g.
]2c) [MeV/+π0D(m2005 2010 2015 2020
)2 cC
andi
date
s / (
0.1
MeV
/
0
1000
2000
3000
4000
5000
6000
310×
Data
+K−K → 0D
Comb. bkg.
LHCb
]2c) [MeV/+π0D(m2005 2010 2015 2020
)2 cC
andi
date
s / (
0.1
MeV
/
0200400600800
1000120014001600180020002200
310×
Data
+π−π → 0D
Comb. bkg.
LHCb
310 410 510
210
310
410
510
610
710
m(µ+µ−) [MeV ]
Can
didates/σ[m
(µ+µ−)]/2
LHCb√s = 13TeV
prompt µ+µ−
µQµQhh+ hµQ
⇒ isolationapplied
prompt-like sample
pT(µ) > 1GeV, p(µ) > 20GeV
]2c) [MeV/++ccΞ(candm3500 3600 3700
2 cC
andi
date
s pe
r 5
MeV
/
0
20
40
60
80
100
120
140
160
180
DataTotalSignalBackground
LHCb 13 TeV
CP violation in charm decays Search for dark photons decaying to dimuons Observation of Ξ++ccPRL 122 (2019) 211803 PRL 120 (2018) 061801 PRL 119 (2017) 112001
JINST 14 (2019) P04006
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 10 / 23
https://doi.org/10.1103/PhysRevLett.122.211803https://doi.org/10.1103/PhysRevLett.120.061801https://doi.org/10.1103/PhysRevLett.119.112001https://doi.org/10.1088/1748-0221/14/04/P04006
The LHCb detector: Run III upgrade
05/07/2018
Why do we want to upgrade for Run III?• We currently level our luminosity at
• Huge gains available if we can run at higher luminosities
• Why do we run at lower luminosity? • Design choices for our physics programme • Detector and trigger limitations
• Note that upgrading for Run 3 is before the HL-LHC era in Run 4 onwards
6
R. Quagliani 20th March 2018 5
LHCb in Run I (and Run II)
LHCb Run I
LHCb upgrade
LHCb limitations and upgrade
Run III target
Run I and II
Huge gain in physics capabilities if able to run at larger luminosity
LHCb will further expand its physics program as GPD.
[CERN-LHCC-2011-001]
☞μ = 1.1-1.8
☞μ = 7.6 ☞50 fb-1
☞3 (Run I) + 5 (Run II) fb-1
New detectors and trigger strategy.
LHCb upgrade and trigger strategy
[Run I]
L ⇡ 4 ⇥ 1032cm�2s�1AAACJnicbVDLSgMxFL3js9bXqEs3QRHcWGaqoG6k6MaFCwWrhU5bM2mqoclMSDJiGeZD3LvxV9wIPhB3foqZ1oW2HgicnHMv994TSs608bxPZ2x8YnJqujBTnJ2bX1h0l5YvdJwoQqsk5rGqhVhTziJaNcxwWpOKYhFyehl2j3L/8pYqzeLo3PQkbQh8HbEOI9hYqeUeBAKbG4J5epKhAEup4ju0gwLDBNXI95rpdjlDaaAEIiJrplvlrP/ROfezlrvulbw+0Cjxf8h6ZUbeXwHAact9CdoxSQSNDOFY67rvSdNIsTKMcJoVg0RTiUkXX9O6pRG2azTS/p0Z2rBKG3ViZV9kUF/93ZFioXVPhLYyv0oPe7n4n1dPTGevkbJIJoZGZDCok3BkYpSHhtpMUWJ4zxJMFLO7InKDFSbGRlu0IfjDJ4+Sarm0X/LPbBiHMEABVmENNsGHXajAMZxCFQg8wBO8wpvz6Dw7787HoHTM+elZgT9wvr4BdH+myQ==AAACJnicbVDLTgIxFO34BHyhLt00EhM3khk0UTeG6MaFC0xEiAyQTinQ0M40bcdAJvMh7t248j/cmPiIceefaAdYKHiSJqfn3Jt77/EEo0rb9qc1Mzs3v7CYSmeWlldW17LrG9cqCCUmZRywQFY9pAijPilrqhmpCkkQ9xipeL2zxK/cEqlo4F/pgSB1jjo+bVOMtJGa2ROXI93FiEUXMXSREDLowwPoasqJgo7diPYLMYxcySHmcSPaK8TDj0q4EzezOTtvDwGniTMmuWJa3N089r9LzeyL2wpwyImvMUNK1Rxb6HqEpKaYkTjjhooIhHuoQ2qG+sisUY+Gd8Zwxygt2A6keb6GQ/V3R4S4UgPumcrkKjXpJeJ/Xi3U7aN6RH0RauLj0aB2yKAOYBIabFFJsGYDQxCW1OwKcRdJhLWJNmNCcCZPniblQv4471yaME7BCCmwBbbBLnDAISiCc1ACZYDBPXgCr+DNerCerXfrY1Q6Y417NsEfWF8/SPCo6Q==AAACJnicbVDLTgIxFO34BHyhLt00EhM3khk0UTeG6MaFC0xEiAyQTinQ0M40bcdAJvMh7t248j/cmPiIceefaAdYKHiSJqfn3Jt77/EEo0rb9qc1Mzs3v7CYSmeWlldW17LrG9cqCCUmZRywQFY9pAijPilrqhmpCkkQ9xipeL2zxK/cEqlo4F/pgSB1jjo+bVOMtJGa2ROXI93FiEUXMXSREDLowwPoasqJgo7diPYLMYxcySHmcSPaK8TDj0q4EzezOTtvDwGniTMmuWJa3N089r9LzeyL2wpwyImvMUNK1Rxb6HqEpKaYkTjjhooIhHuoQ2qG+sisUY+Gd8Zwxygt2A6keb6GQ/V3R4S4UgPumcrkKjXpJeJ/Xi3U7aN6RH0RauLj0aB2yKAOYBIabFFJsGYDQxCW1OwKcRdJhLWJNmNCcCZPniblQv4471yaME7BCCmwBbbBLnDAISiCc1ACZYDBPXgCr+DNerCerXfrY1Q6Y417NsEfWF8/SPCo6Q==AAACJnicbVDLSsNAFJ34rPUVdelmsAhuLEkV1I0U3bhwUcHYQpOWyXTSDp1JwsxELCF/48ZfcSP4QNz5KU7SLrT1wMCZc+7l3nv8mFGpLOvLmJtfWFxaLq2UV9fWNzbNre07GSUCEwdHLBItH0nCaEgcRRUjrVgQxH1Gmv7wMveb90RIGoW3ahQTj6N+SAOKkdJS1zx3OVIDjFh6nUEXxbGIHuAxdBXlRELb6qRHtQymruAQ86yTHtay4iNzbmdds2JVrQJwltgTUgETNLrmq9uLcMJJqDBDUrZtK1ZeioSimJGs7CaSxAgPUZ+0NQ2RXsNLizszuK+VHgwioV+oYKH+7kgRl3LEfV2ZXyWnvVz8z2snKjj1UhrGiSIhHg8KEgZVBPPQYI8KghUbaYKwoHpXiAdIIKx0tGUdgj198ixxatWzqn1jVeoXkzRKYBfsgQNggxNQB1egARyAwSN4Bm/g3XgyXowP43NcOmdMenbAHxjfPyxPpRc=
Run I + II target : 8 fb-1 Run III + IV target: 50 fb-1
A real fill with the upgrade overlaid
Run at 5× higher luminosityTriggerless readout at 40 MHzNew vertex locatorNew tracking (UT, SciFi)
CERN-LHCC-2012-007
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 11 / 23
https://cds.cern.ch/record/1443882
Challenges in Run III
3
Reaching the MHz signal eraRun 3: Luminosity of 2x1033 cm-2s-1, √s = 14 TeV
At increased luminosity, charm (beauty) in 24 % (2 %) ofbunch crossings
Cannot write out charm at 7 MHz
Trigger must distinguish signal from less-interesting signalas well as from backgroundNo longer feasible to have first trigger based oncalorimeters and muon detectors aloneNeed as much information about an event as soon aspossible
6
Change in trigger paradigm
Access as much information about the collision as early as possible
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 12 / 23
LHCb trigger in Run III
x86 CPU farm
Run 1 Run 2
pp Collisions
Hardware L0
EB
HLT1
9 PB bu↵ercalibration
HLT2
Storage
1 TB/s1 TB/s
50 GB/s50 GB/s
50 GB/s50 GB/s
4 GB/s
0.3 GB/s 0.7 GB/s
6 GB/s
6 GB/s
x86 CPU farm
Run 3: Baseline
pp Collisions
EB
HLT1
bu↵er on diskcalibration and alignment
HLT2
Storage
5 TB/s
5 TB/s
10 GB/s
x86 CPU farm
Run 3: GPU-enhanced
pp Collisions
EB+
HLT1 on GPUs
bu↵er on diskcalibration and alignment
HLT2
Storage
5 TB/s
0.2 TB/s
10 GB/s
Figure 2: Evolution of the LHCb trigger system. Real-time calibration and alignment was firstperformed between the HLT stages in Run 2. The FPGA-based hardware stage will be removedin Run 3. Our proposal focuses on adding GPUs to the EB servers and running the entire HLT1sequence on GPUs. This reduces the bandwidth that needs to be transmitted from the EB nodesto the CPU farm from 5 TB/s to 0.2TB/s. The cost savings on networking is expected to be morethan the total cost of the GPUs needed to run HLT1 on the EB servers. Furthermore, the entire(fixed-size) CPU farm would be available for running HLT2.
(the Event Filter Farm or EFF) for processing by a software application called the high-level trigger(HLT), which is divided into two stages. HLT1 partially reconstructs events and selects a subset forfurther processing by HLT2, which performs a more complete reconstruction then executes manyselection algorithms to further reduce the rate at which data are ingested for permanent storage.
In Run 1, the combination of limited CPU in the EFF (20k logical cores), lack of experiencewith the data (a new detector), and suboptimal algorithms limited HLT1 to reconstructing onlya low-fidelity subset of the interesting objects in each event. Similarly, HLT2 was not able toreconstruct all objects, and the lack of data calibrations available in real time meant that o✏inereconstruction was necessary to produce the high quality data required for physics analysis. Despitethese limitations, Williams pioneered the use of ML already in 2011 in the primary HLT2-selectionalgorithm, known as the Topological Trigger (TOPO). About 60% of all LHCb publications to-date were produced using data recorded by the ML-based TOPO. By the end of Run 1, innovationslike the TOPO made it possible for LHCb to process proton-proton collisions at twice its designmaximum rate, while recording signal samples at more than twice the anticipated rates and withhigher than expected purities.
For Run 2, Williams and collaborators from the Yandex corporation reoptimized the TOPO [10].Furthermore, they replaced the primary HLT1 algorithms with ML-based versions. In Run 2, 75%of the data persisted by HLT1 were selected by these ML-based algorithms. Other major changeswere also made to the trigger in Run 2. Increasing the number of (logical) EFF cores to 50k anddeploying faster reconstruction algorithms allowed both HLT1 and HLT2 to execute the o✏ine
4
Hardware trigger to be removed fromRun IIIHLT1 software trigger must perform at30× higher rate with 5× the pileupBuffer will reduce fromO(weeks)→ O(days)Significant increase in data transfer ratesNew trigger setup offers up to ∼ 10×efficiency improvement for some physicschannels
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 13 / 23
Run III baseline HLT1 performance
Significant progress made tooptimise tracking algorithms∼ 4× improvement in throughputfrom vectorising, improvements toevent model and optimisation
Figure 4.2: Throughput of the displaced-track reconstruction sequence, as a function of the product ofthe number of processes and number of threads, for different number of threads per process, as indicatedin the legend. The throughput peak performance is 12400 evt/s/node for 2 processes and 20 threads perprocess. The “non-Hive” line indicates the performance that is achieved without multithreading.
Figure 4.3: Maximum throughput of the displaced-track reconstruction sequence, as a function of the cuton the impact parameter (in µm), for different transverse momentum thresholds in the pattern recognitionalgorithms, as indicated in the legend.
38
2018
-09-
2520
18-0
9-30
2018
-10-
0520
18-1
0-10
2018
-10-
1520
18-1
1-09
2018
-11-
15
2018
-12-
1320
18-1
2-30
2019
-01-
1120
19-0
1-17
2019
-03-
3020
19-0
4-05
2019
-04-
1120
19-0
4-16
2019
-04-
2120
19-0
4-27
2019
-05-
0320
19-0
5-09
2019
-05-
2720
19-0
6-02
2019
-06-
0820
19-0
6-14
2019
-06-
1920
19-0
6-24
2019
-07-
0520
19-0
7-11
2019
-07-
1720
19-0
7-28
2019
-06-
2920
19-0
7-05
2019
-07-
1120
19-0
7-17
0
5
10
15
20
25
30
35
Even
ts p
roce
ssed
per
seco
nd,
assu
min
g 10
00 re
fere
nce
node
s (M
Hz) LHCb Upgrade simulation
Scalar event model, maximal SciFi reconstructionScalar event model, fast SciFi reconstruction with tighter track tolerance criteriaScalar event model, vectorizable SciFi reconstruction with entirely reworked algorithm logicFully SIMD-POD friendly event model, vectorizable SciFi and vectorized vertex detector and PV reconstruction, I/O improvements
multi-threaded processes offer gains overrunning more processes in parallelOptimal CPU architecture underinvestigation – new AMD architecture offerssignificant cost/benefit improvements
LHCB-FIGURE-2019-002LHCB-TDR-017
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 14 / 23
https://cds.cern.ch/record/2684267https://cds.cern.ch/record/2310827
Run III baseline HLT1 performance
Allows for a flexible and configurablesequencePhysics performance of single-track andtwo-track selections studiedLoose (L) and tight (T) versions ofalgorithms simulated with different pTthresholds (500− 1000 MeV/c)(Top) ∼ 1 MHz output rate achievable basedon “minimum bias” simulationTwo-track selection remains efficientSingle-track selection still requires work
Upgrade trigger: Bandwidth strategy proposal Ref: LHCb-PUB-2017-006Public Note Issue: 13 Selection at HLT1 Date: February 22, 2017
500 MeV 750 MeV 1000 MeV
Rat
e (M
Hz)
0
0.5
1
1.5
2
LL LT TL TT LL LT TL TT LL LT TL TT
Hea
vy f
lavo
r pu
rity
0
0.2
0.4
0.6
0.8
11-Track 2-Track 1- OR 2-Track
LL LT TL TT LL LT TL TT LL LT TL TT0
0.2
0.4
0.6
0.8
1
500 MeV 750 MeV 1000 MeV
Eff
icie
ncy
-K+K →0D +π-K+K →+D -π+π-K+K →0D -K+KS0K →0D +π+cΛ →++cΣ
1-Track 2-Track 1- OR 2-Track
LL LT TL TT LL LT TL TT LL LT TL TT0
0.2
0.4
0.6
0.8
1
500 MeV 750 MeV 1000 MeV
Eff
icie
ncy
+K0
D →+B +KS0K →+B -µ+µ*K →0B ν+µ-D →0B -D+D →0B +π-K+K →+B
1-Track 2-Track 1- OR 2-Track
Figure 1: HLT1 inclusive trigger (top) rates and heavy-flavor purities from minimum bias MC, and (middle)charm and (bottom) signal efficiencies on loosely selected MC. In each plot, the vertical light-grey lines separateresults in which tracking pT thresholds of 500, 750, and 1000 MeV are used. The LL, LT, TL, and TT denote loose(L) and tight (T) selections applied in the 1-Track and 2-Track lines, where the first letter is for the 1-Track and thesecond for the 2-Track.
triggered by a b or c hadron. The purity of the 2-Track is 65-80%, depending on the MVA threshold, andon the value of the tracking pT threshold. The 1-Track, however, is only about 25% pure. The low purityof the 1-Track is partially due to its large ghost contamination of 30-40%, based on the assumptionthat 20% of ghost tracks remain after a Run 2-like ghost rejection requirement. For all modes exceptB+→K0SK+, the 2-Track provides most of the signal efficiency, with HLT1 efficiencies for beautyaveraging ∼ 80% at 1MHz output rate. Charm efficiencies of ∼ 60% are achievable at the same rate.If the output rate must be reduced further, a 2-Track only HLT1 would still be capable of providingaverage efficiencies of ∼ 60% for beauty and ∼ 50% for charm signals at 0.5MHz assuming no furtheroptimisation of the 2-Track inclusive trigger. In reality however, the 2-Track would benefit from furtheroptimisation in this operating regime, and for topologies including a K0s exclusive selections would berequired to obtain a high efficiency.
page 4
LHCb-PUB-2017-006
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 15 / 23
https://cds.cern.ch/record/2244313
LHCb trigger in Run III
x86 CPU farm
Run 1 Run 2
pp Collisions
Hardware L0
EB
HLT1
9 PB bu↵ercalibration
HLT2
Storage
1 TB/s1 TB/s
50 GB/s50 GB/s
50 GB/s50 GB/s
4 GB/s
0.3 GB/s 0.7 GB/s
6 GB/s
6 GB/s
x86 CPU farm
Run 3: Baseline
pp Collisions
EB
HLT1
bu↵er on diskcalibration and alignment
HLT2
Storage
5 TB/s
5 TB/s
10 GB/s
x86 CPU farm
Run 3: GPU-enhanced
pp Collisions
EB+
HLT1 on GPUs
bu↵er on diskcalibration and alignment
HLT2
Storage
5 TB/s
0.2 TB/s
10 GB/s
Figure 2: Evolution of the LHCb trigger system. Real-time calibration and alignment was firstperformed between the HLT stages in Run 2. The FPGA-based hardware stage will be removedin Run 3. Our proposal focuses on adding GPUs to the EB servers and running the entire HLT1sequence on GPUs. This reduces the bandwidth that needs to be transmitted from the EB nodesto the CPU farm from 5 TB/s to 0.2TB/s. The cost savings on networking is expected to be morethan the total cost of the GPUs needed to run HLT1 on the EB servers. Furthermore, the entire(fixed-size) CPU farm would be available for running HLT2.
(the Event Filter Farm or EFF) for processing by a software application called the high-level trigger(HLT), which is divided into two stages. HLT1 partially reconstructs events and selects a subset forfurther processing by HLT2, which performs a more complete reconstruction then executes manyselection algorithms to further reduce the rate at which data are ingested for permanent storage.
In Run 1, the combination of limited CPU in the EFF (20k logical cores), lack of experiencewith the data (a new detector), and suboptimal algorithms limited HLT1 to reconstructing onlya low-fidelity subset of the interesting objects in each event. Similarly, HLT2 was not able toreconstruct all objects, and the lack of data calibrations available in real time meant that o✏inereconstruction was necessary to produce the high quality data required for physics analysis. Despitethese limitations, Williams pioneered the use of ML already in 2011 in the primary HLT2-selectionalgorithm, known as the Topological Trigger (TOPO). About 60% of all LHCb publications to-date were produced using data recorded by the ML-based TOPO. By the end of Run 1, innovationslike the TOPO made it possible for LHCb to process proton-proton collisions at twice its designmaximum rate, while recording signal samples at more than twice the anticipated rates and withhigher than expected purities.
For Run 2, Williams and collaborators from the Yandex corporation reoptimized the TOPO [10].Furthermore, they replaced the primary HLT1 algorithms with ML-based versions. In Run 2, 75%of the data persisted by HLT1 were selected by these ML-based algorithms. Other major changeswere also made to the trigger in Run 2. Increasing the number of (logical) EFF cores to 50k anddeploying faster reconstruction algorithms allowed both HLT1 and HLT2 to execute the o✏ine
4
Option to move to aGPU-based HLT1 withGPUs installed on theEvent Builder serversFree up full CPU farmfor HLT2 and save onnetworking betweenevent builders andCPU farmDemonstratedtechnical feasibilityDecision due in early2020
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 16 / 23
Why GPUs?
Moore’s law still holds butsingle-thread performance haslevelled offGains now to be made throughparallelisationGPUs specialised formassively parallel operations(100s–1000s of cores)
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 17 / 23
HLT1
HLT1 involves decoding,clustering and trackreconstruction for all trackingsubdetectorsAlgorithms also performKalman filter and triggerselectionAll stages of the process maybe parallelised
Raw data
GlobalEvent Cut
Velo decodingand clustering
Velo tracking
Simple Kalman filter
Find primary vertices
UT decoding
UT tracking
SciFi decoding
SciFi tracking
ParameterizedKalman filter
Muon decoding
Muon ID
Find sec-ondary vertices
Select events
Selected events
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 18 / 23
The Allen project
Generic configurable framework for GPU-based execution of an algorithm sequenceData passed to GPU deviceAll algorithms executed in orderResults passed back to the host
Process thousands of events in a single sequenceOpportunity for massive parallelisation
Configurable sequences at compile timeConfigurable algorithms at run timeCustom GPU memory management – no dynamic allocationBuilt in validation and monitoringCross-platform compatibility with CPU architecturesNamed for Frances E. AllenImplement HLT1 on GPUs
Photo: User:Rama / Wikimedia Commons / CC-BY-SA-2.0 fr
LHCB-FIGURE-2019-009
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 19 / 23
http://cds.cern.ch/record/2693058/
Allen selection ingredientsPrimary vertices
36
Ingredients for selections
Selections● 1-track selection ● 2-track selection● Based on p, pt, displacement,
vertex criteria and muon identification
0 2000 4000[MeV]
Tp
0
0.2
0.4
0.6
0.8
1
effici
ency
efficiency, not electronsefficiency, electrons
distribution, not electronsT
pdistribution, electrons
Tp
Num
ber o
f eve
nts [
a.u.
]
LHCb simulation GPU R&D
Tracks0 20 40 60 80 100
310×
p [MeV/c]0
0.2
0.4
0.6
0.8
1
1.2
/p [%
]pσ
p resolution
p distribution Num
ber o
f eve
nts [
a.u.
]
LHCb simulation GPU R&D
Momentum
Muon identification
0 20 40track multiplicity of MC PV
0
0.2
0.4
0.6
0.8
1
effici
ency
efficiency
multiplicity distribution
Num
ber o
f eve
nts [
a.u.
]
LHCb simulation GPU R&D
Primary vertices
Impact parameter
Secondary vertices
Tracks
36
Ingredients for selections
Selections● 1-track selection ● 2-track selection● Based on p, pt, displacement,
vertex criteria and muon identification
0 2000 4000[MeV]
Tp
0
0.2
0.4
0.6
0.8
1
effici
ency
efficiency, not electronsefficiency, electrons
distribution, not electronsT
pdistribution, electrons
Tp
Num
ber o
f eve
nts [
a.u.
]
LHCb simulation GPU R&D
Tracks0 20 40 60 80 100
310×
p [MeV/c]0
0.2
0.4
0.6
0.8
1
1.2
/p [%
]pσ
p resolution
p distribution Num
ber o
f eve
nts [
a.u.
]
LHCb simulation GPU R&D
Momentum
Muon identification
0 20 40track multiplicity of MC PV
0
0.2
0.4
0.6
0.8
1
effici
ency
efficiency
multiplicity distribution
Num
ber o
f eve
nts [
a.u.
]
LHCb simulation GPU R&D
Primary vertices
Impact parameter
Secondary vertices
SelectionsOne trackTwo tracksSingle muonTwo muons (displaced)Two muons (high-mass)
Momentum
36
Ingredients for selections
Selections● 1-track selection ● 2-track selection● Based on p, pt, displacement,
vertex criteria and muon identification
0 2000 4000[MeV]
Tp
0
0.2
0.4
0.6
0.8
1
effici
ency
efficiency, not electronsefficiency, electrons
distribution, not electronsT
pdistribution, electrons
Tp
Num
bero
feve
nts [
a.u.
]
LHCb simulation GPU R&D
Tracks0 20 40 60 80 100
310×
p [MeV/c]0
0.2
0.4
0.6
0.8
1
1.2
/p [%
]pσ
p resolution
p distribution Num
ber o
f eve
nts [
a.u.
]
LHCb simulation GPU R&D
Momentum
Muon identification
0 20 40track multiplicity of MC PV
0
0.2
0.4
0.6
0.8
1
effici
ency
efficiency
multiplicity distribution
Num
bero
feve
nts [
a.u.
]
LHCb simulation GPU R&D
Primary vertices
Impact parameter
Secondary vertices
Impact parameter
0 500 1000 1500 2000Rate [kHz]
00.10.20.30.40.50.60.70.80.9
1
Effic
ienc
y
2-Track, Parameterized 2-Track, Simple
1-Track, Parameterized 1-Track, Simple
LHCb simulation
GPU R&D
Figure 6: Efficiency of the 1-Track and 2-Track trigger lines when calculating the IP χ2 from tracksfitted with the simple Kalman filter directly after the Velo tracking versus the parameterizedKalman filter on the Velo-segment of a track using the momentum estimate from the Forwardtracking. Using the B0s → φφ sample.
Figure 7: Throughput of the entire HLT1 sequence on single GPU cards.
Trigger Rate [kHz]
1-Track 249± 182-Track 663± 30High-pT muon 1± 1Displaced dimuon 50± 8High-mass dimuon 101± 12Total 971± 36
Table 1: Rates of the five trigger lines implemented in Allen and the total HLT1 output rate,determined with minimum bias events.
5
Secondary vertices
36
Ingredients for selections
Selections● 1-track selection ● 2-track selection● Based on p, pt, displacement,
vertex criteria and muon identification
0 2000 4000[MeV]
Tp
0
0.2
0.4
0.6
0.8
1
effici
ency
efficiency, not electronsefficiency, electrons
distribution, not electronsT
pdistribution, electrons
Tp
Num
ber o
f eve
nts [
a.u.
]
LHCb simulation GPU R&D
Tracks0 20 40 60 80 100
310×
p [MeV/c]0
0.2
0.4
0.6
0.8
1
1.2
/p [%
]pσ
p resolution
p distribution Num
ber o
f eve
nts [
a.u.
]
LHCb simulation GPU R&D
Momentum
Muon identification
0 20 40track multiplicity of MC PV
0
0.2
0.4
0.6
0.8
1
effici
ency
efficiency
multiplicity distribution
Num
ber o
f eve
nts [
a.u.
]
LHCb simulation GPU R&D
Primary vertices
Impact parameter
Secondary verticesMuon ID
0 20 40 60 80 100310×
p [MeV/c]0
0.2
0.4
0.6
0.8
1
1.2
/p [%
]pσ
p resolution
p distribution Num
ber o
f eve
nts [
a.u.
]
LHCb simulation
GPU R&D
Figure 4: Momentum resolution of tracks passing through the Velo, UT and SciFi detectors versusmomentum, using the combination of signal samples described in the text. Points represent themean, error bars the width of a Gaussian distribution fitted to the resolution in every momentumslice. The momentum distribution is overlaid as histogram.
0 50 100310×
p [MeV]0
0.2
0.4
0.6
0.8
1
Muo
n ID
effi
cien
cy
Muon ID efficiency
p distribution Num
ber o
f eve
nts [
a.u.
]
LHCb simulation
GPU R&D
Figure 5: Muon identification efficiency versus momentum for tracks passing through the Velo,UT and SciFi detectors with respect to all reconstructible muons, using the combination ofsignal samples described in the text. The momentum distribution is overlaid as histogram.
4
LHCB-FIGURE-2019-009
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 20 / 23
http://cds.cern.ch/record/2693058/
Allen performance
Trigger Rate [kHz]1-Track 249± 182-Track 663± 30High-pT muon 1± 1Displaced dimuon 50± 8High-mass dimuon 101± 12Total 971± 36
Total rate reduced 30→ 1 MHzPhysics performance consistent with x86baseline
Signal GEC TIS -OR- TOS TOS GEC× TOSB0 → K ∗0µ+µ− 89 ± 2 85 ± 2 78 ± 3 69 ± 3B0 → K ∗0e+e− 84 ± 3 69 ± 4 62 ± 4 53 ± 3B0s → φφ 83 ± 3 70 ± 3 65 ± 4 54 ± 3D+s → K +K−π+ 82 ± 4 62 ± 5 38 ± 5 32 ± 4Z → µ+µ− 78 ± 1 97 ± 1 97 ± 1 75 ± 1
GEC = global event cut, TIS = trigger independent of signal, TOS= trigger on signal
LHCB-FIGURE-2019-009
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 21 / 23
http://cds.cern.ch/record/2693058/
Allen throughput
39
Throughput on various GPUs
Throughput of the full HLT1 sequence
HLT1 can run on 500 GPUs→ read out full detector Buy GPUs instead of expensive network
0
20
40
60
80
100
120
140
160
Num
ber
of e
vent
s [a
.u.]
6 8 10 12 14 16SciFi raw data volume [kB]
0
20
40
60
80
100
120
140
160
Thro
ughp
ut o
n Te
sla
V100
16G
B [k
Hz]
Data volume distributionThroughput
LHCb simulation
GPU R&D
Figure 8: Throughput of the Allen sequence as a function of the SciFi raw data volume. Theraw data volume distribution is overlaid as histogram.
2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0 22.5Theoretical 32 bit TFLOPS
0
10
20
30
40
50
60
70
80
Alle
n th
roug
hput
[kH
z]
GeForce GTX 670GeForce GTX 680
GeForce GTX 1060 6GBGeForce GTX TITAN X
GeForce GTX 1080 TiTesla T4
GeForce RTX 2080 TiQuadro RTX 2080 Ti
Tesla V100 32GBLHCb simulation
GPU R&D
Figure 9: Allen throughput on various GPUs with respect to their reported peak 32-bit FLOPSperformance.
6
Full HLT1 algorithm can be run on ∼ 500 current GPUsBuy GPUs instead of networkingPerformance scales with GPU so can expect more from 2021 GPUs
Not yet limited by Amdahl’s lawPotential to perform more tasks within HLT1
LHCB-FIGURE-2019-009
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 22 / 23
http://cds.cern.ch/record/2693058/
Summary
Real-time reconstruction and calibration a success story for LHCb in Run IIOffline-quality reconstruction allowed for many trigger selections to be moved to theturbo stream
Selections can be based on offline-quality featuresSmaller event size→ higher event rate for same disk rateTradeoff – full raw event not saved→ cannot rerun reconstruction offlineAlready crucial for charm decays in Run II
LHCb detector and DAQ upgrades for Run IIINo hardware triggerFirst-stage software trigger must perform track reconstruction at bunch-crossing rate
Baseline x86 implementation of first-stage trigger significantly optimised to deal withhigher throughputAllen project offers a GPU-implementation
Generic framework allows for configurable algorithm sequenceFeasibility for possible use in LHCb Run III already demonstrated
Dan Craik (MIT) Trigger & Real-time Reconstruction @ LHCb 2019-12-09 23 / 23