43
Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data CMS Week, CERN — June 25, 2018 1 Jesse Thaler The Future is Open Jet Substructure with CMS Public Data

The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data

CMS Week, CERN — June 25, 2018

1

Jesse Thaler

The Future is OpenJet Substructure with CMS Public Data

Page 2: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 2

CMS and CERN are pioneering the release of research-grade public collider data

2014 2015 2016 2017 2018 2019 2020 2021

2011A 2012B/C2010B

Page 3: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3

Exposing the QCD Splitting Function with CMS Open Data

Andrew Larkoski,1,*

Simone Marzani,2,†

Jesse Thaler,3,‡

Aashish Tripathee,3,§

and Wei Xue3,∥

1Physics Department, Reed College, Portland, Oregon 97202, USA

PRL 119, 132003 (2017) P HY S I CA L R EV I EW LE T T ER Sweek ending

29 SEPTEMBER 2017

2014 2015 2016 2017 2018 2019 2020 2021

First Publication

MODMIT Open Data

2011A 2012B/C2010B

Page 4: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 4

Exposing the QCD Splitting Function with CMS Open Data

Andrew Larkoski,1,*

Simone Marzani,2,†

Jesse Thaler,3,‡

Aashish Tripathee,3,§

and Wei Xue3,∥

1Physics Department, Reed College, Portland, Oregon 97202, USA

PRL 119, 132003 (2017) P HY S I CA L R EV I EW LE T T ER Sweek ending

29 SEPTEMBER 2017

2014 2015 2016 2017 2018 2019 2020 2021

First Publication

MODMIT Open Data

2011A 2012B/C2010B

A Milestone for Public Collider Data

A Milestone for Jet Physics

An Opportunity/Challenge for our Community

Page 5: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 5

Viability of public collider datadepends on interest/enthusiasmof particle physics community

Highlight the opportunitiesExpose the challengesInspire you to help

Goals of this talk:

Page 6: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 6

The Future of Public Collider Data

Jet Substructure and QCD Splittings

Using the CMS Open Data

0

1

2

3

4

5

6

7

8

1

σdσdzg

CMS 2010 Open Data

Theory (MLL; all)

Pythia 8.219

Herwig 7.0.3

Sherpa 2.2.1

pPFCT > 1.0 GeV

AK5; |η| < 2.4

pjetT > 150 GeV

SD: β = 0, zcut = 0.1

0.0 0.1 0.2 0.3 0.4 0.5 0.6

Track zg

0.5

1.0

1.5

2.0

Rati

oto

Pyth

ia

Outline

Recent progress processing the 2011 dataset

Highlights from our 2010 publications

Back to the future with ALEPH data

Page 7: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 7

The Future of Public Collider Data

Jet Substructure and QCD Splittings

Using the CMS Open Data

0

1

2

3

4

5

6

7

8

1

σdσdzg

CMS 2010 Open Data

Theory (MLL; all)

Pythia 8.219

Herwig 7.0.3

Sherpa 2.2.1

pPFCT > 1.0 GeV

AK5; |η| < 2.4

pjetT > 150 GeV

SD: β = 0, zcut = 0.1

0.0 0.1 0.2 0.3 0.4 0.5 0.6

Track zg

0.5

1.0

1.5

2.0

Rati

oto

Pyth

ia

Page 8: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 8

November 2014:!

Run 2010B7 TeV, 32 pb–1

!

>20 TB, no MC!

(First publication: QCD)

April 2016:!

Run 2011A7 TeV, 2.5 fb–1

!

>100 TB, with MC!

(In the pipeline: BSM, ML)

opendata.cern.ch/research/CMS

December 2017:!

Run 2012B/C8 TeV, 11.6 fb–1

!

>1 PB, with MC

Kati Lassila-Perini,Achim Geiser, ...

Page 9: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 9

The MIT Open Data Team

2010QCD:

Aashish Tripathee Wei Xue Simone MarzaniAndrew Larkoski Summer intern:Alexis Romero

CMS advice:Sal Rappoccio

2011BSM:

Wei XueMatt Strassler Yotam Soreq Cari Cesarotti Raffaele D'Agnolo

2011ML:

Radha Mastandrea Preksha Naik …

Today:very preliminary 2011 results(only 16%, still debugging)

Patrick Komiske

Page 10: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 10

Key Challenge: Initial Data Processing

CernVM + CMSSW 5.3.32 (for 2011)!

AOD Format (CMS Root)RAW → RECO → “Analysis Object Data”!

Access via XRootD, write custom EDAnalyzer

For novices, very steep learning curve for using Rootand understanding overall data structure (esp. triggers)

Jet Primary Dataset: 4.7 TB for 3 million AOD eventsDaunting amount of information!

Page 11: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 11

Our Strategy: Simplified Analysis Framework

MOD

MODProducer + MODAnalyzer + FastJet 3.3.1!

MOD Format (ASCII)For 2010: Cross-check with flat Root n-tuples!

Access via External Hard Drive

Text files as educational tool and debugging strategyAccess only essential information, sacrifice flexibility

Jet Primary Dataset: ~500 GB for 3 million MOD eventsNew for 2011: Separate processing for triggers/luminosity

Page 12: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 12

Example MOD Metadata (Simplified)

New for 2011: Effective luminosity information per trigger

BeginFile Version 6 CMS_2011A Data Jet !# File Filename TotalEvents ValidEvents IntLumiDel IntLumiRec File A850-02163E008D77 30284 29057 895036.1 884409.2 !#Block RunNum LumiBlock Events Valid? IntLumiDel IntLumiRec Block 160578 366 53 1 28.6 27.5 Block 160578 367 58 1 28.6 27.5 Block 160578 368 38 1 28.6 27.5 Block 160578 369 40 1 28.6 27.5 Block 160578 370 57 1 28.6 27.5 Block 160578 371 46 1 28.6 27.5 Block 160578 372 37 1 28.6 27.5 ... !# Trig Name Present Valid Fired EffLumiDel EffLumiRec AvePrescale Trig HLT_Jet240_v2 16530 9 1 27228.1 27026.7 3.0 Trig HLT_DiJetAve140U_v4 13754 5 1 11705.3 11582.4 1.0 Trig HLT_Jet240_v1 13754 5 1 18387.4 18205.5 1.0 Trig HLT_Jet150_v2 16530 9 2 2722.8 2702.6 30.0 Trig HLT_Jet190_v2 16530 9 2 8168.4 8108.0 10.0 Trig HLT_Jet190_v1 13754 5 2 6939.9 6872.4 3.0 Trig HLT_DiJetAve15U_v4 13754 5 1 1.0 1.0 7500.0 Trig HLT_DiJetAve110_v2 11772 3 1 358.3 356.0 75.0 ... !EndFile

Page 13: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 13

Example MOD Metadata (Simplified)

New for 2011: Effective luminosity information per trigger

BeginFile Version 6 CMS_2011A Data Jet !# File Filename TotalEvents ValidEvents IntLumiDel IntLumiRec File A850-02163E008D77 30284 29057 895036.1 884409.2 !#Block RunNum LumiBlock Events Valid? IntLumiDel IntLumiRec Block 160578 366 53 1 28.6 27.5 Block 160578 367 58 1 28.6 27.5 Block 160578 368 38 1 28.6 27.5 Block 160578 369 40 1 28.6 27.5 Block 160578 370 57 1 28.6 27.5 Block 160578 371 46 1 28.6 27.5 Block 160578 372 37 1 28.6 27.5 ... !# Trig Name Present Valid Fired EffLumiDel EffLumiRec AvePrescale Trig HLT_Jet240_v2 16530 9 1 27228.1 27026.7 3.0 Trig HLT_DiJetAve140U_v4 13754 5 1 11705.3 11582.4 1.0 Trig HLT_Jet240_v1 13754 5 1 18387.4 18205.5 1.0 Trig HLT_Jet150_v2 16530 9 2 2722.8 2702.6 30.0 Trig HLT_Jet190_v2 16530 9 2 8168.4 8108.0 10.0 Trig HLT_Jet190_v1 13754 5 2 6939.9 6872.4 3.0 Trig HLT_DiJetAve15U_v4 13754 5 1 1.0 1.0 7500.0 Trig HLT_DiJetAve110_v2 11772 3 1 358.3 356.0 75.0 ... !EndFile

Effective Luminosity per Trigger Testing Trigger Consistency

Page 14: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 14

Example MOD Event (Simplified)BeginEvent Version 6 CMS_2011A Data Jet !# /Run2011A/Jet/MOD/12Oct2013-v1/20000/000D4260-D23E-E311-A850-02163E008D77.mod !#Cond RunNum EventNum LumiBlock NPV Timestamp msOffset Cond 160578 38142433 366 4 1300254008 84656 !#Trig Name Prescale_1 Prescale_2 Fired? Trig HLT_DiJetAve30U_v4 1 15 0 Trig HLT_DiJetAve50U_v4 1 3 1 Trig HLT_Jet110_v1 1 1 1 ... !# AK5 px py pz energy jec area no_of_const neu_had_frac ... AK5 -48.53 91.23 922.46 928.25 1.15 0.77 3 0.17 ... AK5 27.14 -27.95 -176.24 180.60 1.11 0.71 14 0.11 ... AK5 6.87 -27.39 -127.71 130.89 1.13 0.59 10 0.14 ... ... # PFC px py pz energy pdgId PFC 3.05 -2.27 -18.08 18.48 211 PFC 3.51 -3.48 -21.66 22.22 211 PFC 2.83 -3.01 -20.00 20.42 -211 PFC 2.89 -2.37 -18.40 18.77 211 PFC 1.21 -1.31 -7.58 7.79 -211 PFC 1.62 -2.72 -12.17 12.58 -211 PFC 7.15 -7.56 -46.86 48.01 22 ... !EndEvent

Crucial: JEC factors, jet quality criteria, and particle flow candidates

Page 15: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 15

Example MOD Event (Simplified)BeginEvent Version 6 CMS_2011A Data Jet !# /Run2011A/Jet/MOD/12Oct2013-v1/20000/000D4260-D23E-E311-A850-02163E008D77.mod !#Cond RunNum EventNum LumiBlock NPV Timestamp msOffset Cond 160578 38142433 366 4 1300254008 84656 !#Trig Name Prescale_1 Prescale_2 Fired? Trig HLT_DiJetAve30U_v4 1 15 0 Trig HLT_DiJetAve50U_v4 1 3 1 Trig HLT_Jet110_v1 1 1 1 ... !# AK5 px py pz energy jec area no_of_const neu_had_frac ... AK5 -48.53 91.23 922.46 928.25 1.15 0.77 3 0.17 ... AK5 27.14 -27.95 -176.24 180.60 1.11 0.71 14 0.11 ... AK5 6.87 -27.39 -127.71 130.89 1.13 0.59 10 0.14 ... ... # PFC px py pz energy pdgId PFC 3.05 -2.27 -18.08 18.48 211 PFC 3.51 -3.48 -21.66 22.22 211 PFC 2.83 -3.01 -20.00 20.42 -211 PFC 2.89 -2.37 -18.40 18.77 211 PFC 1.21 -1.31 -7.58 7.79 -211 PFC 1.62 -2.72 -12.17 12.58 -211 PFC 7.15 -7.56 -46.86 48.01 22 ... !EndEvent

Crucial: JEC factors, jet quality criteria, and particle flow candidates

Trigger Turn-on Behavior Hardest Jet pT

0 100 200 300 400 500 600 700 800

Trigger Jet pT [GeV]

0.7

0.8

0.9

1.0

1.1

1.2

1.3

Cro

ssS

ect

ion

Rati

o

480390

310270

210150

11090

AK5; |η| < 2.4

Preliminary (16%)

Jet60 / Jet30

Jet80 / Jet60

Jet110 / Jet80

Jet150 / Jet110

Jet190 / Jet150

Jet240 / Jet190

Jet300 / Jet240

Jet370 / Jet300

CMS 2011 Open DataCMS 2011 Open Data

New for 2011:Detector Simulated Data (!)

10−810−710−610−510−410−310−210−1100101102103104105106

Dif

f.C

ross

Sect

ion

[pb/G

eV

]

Preliminary (16%)

CMS 2011 Open Data

Pythia 6 Tune Z2 (Simulated)

Pythia 6 Tune Z2 (Truth)

CMS 2011 Open Data

AK5; |η| < 2.4

pjetT > 90 GeV

0 200 400 600 800 1000 1200 1400 1600 1800

pT [GeV]

0.5

1.0

1.5

2.0

Rati

oto

Sim

ula

ted

Page 16: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 16

Basic Jet PropertiesPseudorapidity

Jet Mass (without JMC)

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6

1.8

Dif

f.C

ross

Sect

ion

[nb]

Preliminary (16%)

CMS 2011 Open Data

Pythia 6 Tune Z2 (Simulated)

Pythia 6 Tune Z2 (Truth)

CMS 2011 Open Data

AK5

pjetT > 270 GeV

−4 −3 −2 −1 0 1 2 3 4

η

0.5

1.0

1.5

2.0

Rati

oto

Sim

ula

ted

0

20

40

60

80

100

120

140

160

Dif

f.C

ross

Sect

ion

[pb/G

eV

]

Preliminary (16%)

CMS 2011 Open Data

Pythia 6 Tune Z2 (Simulated)

Pythia 6 Tune Z2 (Truth)

CMS 2011 Open Data

AK5; |η| < 2.4

pjetT > 270 GeV

0 10 20 30 40 50 60 70 80 90

Mass [GeV]

0.5

1.0

1.5

2.0

Rati

oto

Sim

ula

ted

Track Multiplicity

Expected mismatch with truth from strange hadrons (cτ0 ∈ [10,1000] mm)

Mismatch with simulated under study

0

50

100

150

200

Dif

f.C

ross

Sect

ion

[pb]

Preliminary (16%)

CMS 2011 Open Data

Pythia 6 Tune Z2 (Simulated)

Pythia 6 Tune Z2 (Truth)

CMS 2011 Open Data

AK5; |η| < 2.4

pjetT > 270 GeV

0 10 20 30 40 50 60

Track Constituent Multiplicity

0.5

1.0

1.5

2.0

Rati

oto

Sim

ula

ted

Page 17: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 17

Opportunity to streamline,since steps common to almost every collider data project

Preliminary 2011 Processing:!

2 students, ~6 months!(First collider physics project,previous experience in Python/C++,using template from 2010 analysis)

MOD

Page 18: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 18

The Future of Public Collider Data

Jet Substructure and QCD Splittings

Using the CMS Open Data

0

1

2

3

4

5

6

7

8

1

σdσdzg

CMS 2010 Open Data

Theory (MLL; all)

Pythia 8.219

Herwig 7.0.3

Sherpa 2.2.1

pPFCT > 1.0 GeV

AK5; |η| < 2.4

pjetT > 150 GeV

SD: β = 0, zcut = 0.1

0.0 0.1 0.2 0.3 0.4 0.5 0.6

Track zg

0.5

1.0

1.5

2.0

Rati

oto

Pyth

ia

Page 19: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 19

CMS 2010 Run B7 TeV pp31.8 pb–1

!Jet Primary Dataset768,687 events(from 20 million)!HLT_Jet70UHLT_Jet100UHLT_Jet140U

MOD

Page 20: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 20

Anti-kt, R = 0.5|η| < 2.4pT > 20 GeV“Loose” JQC

Page 21: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 21

Anti-kt, R = 0.5|η| < 2.4Hardest pT > 150 GeV“Loose” JQC PFC pT > 1 GeV

Page 22: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 22

Jet Grooming: mMDT/Soft Drop β = 0

![Larkoski, Marzani, Soyez, JDT, 1402.2657; Dasgupta, Fregoso, Marzani, Salam, 1307.0007;

see also Butterworth, Davison, Rubin, Salam, 0802.2470]

zg > zcut θgβ

zg

1–zg

θg

Drop radiation until:

Page 23: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

zg > zcut θgβ

zg

θg

Drop radiation until:

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 23

0.0 0.1 0.2 0.3 0.4 0.5

zg

0.0

0.2

0.4

0.6

0.8

1.0

θg

pPFCT > 1.0 GeV

AK5; |η| < 2.4

pjetT > 150 GeV

mMDT / SDβ=0 : zcut = 0.1

CMS 2010 Open DataCMS 2010 Open Data

0

4

8

12

16

20

Collinear

dPi→ig '

2αs

πCi

θ

dz

z

Soft

Cq = 4/3Cg = 3

Soft/Collinear Behavior

z ≈ zg

zg > zcut θgβ

Drop radiation until:

[Larkoski, Marzani, JDT, 1502.01719; see also Larkoski, JDT, 1307.1699]

(!)z

1–zg

θ

Page 24: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

zg > zcut θgβ

zg

θg

Drop radiation until:

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 24

0.0 0.1 0.2 0.3 0.4 0.5

zg

0.0

0.2

0.4

0.6

0.8

1.0

θg

pPFCT > 1.0 GeV

AK5; |η| < 2.4

pjetT > 150 GeV

mMDT / SDβ=0 : zcut = 0.1

CMS 2010 Open DataCMS 2010 Open Data

0

4

8

12

16

20

Collinear

dPi→ig '

2αs

πCi

θ

dz

z

Soft

Cq = 4/3Cg = 3

Soft/Collinear Behavior

z ≈ zg

zg > zcut θgβ

Drop radiation until:

[Larkoski, Marzani, JDT, 1502.01719; see also Larkoski, JDT, 1307.1699]

(!)z

1–zg

θ

Perfect application of CMS Open Data

Benefits from low trigger thresholds and low pileup

2010 data ⇒ 2014 release ⇒ 2015 idea ⇒ 2017 analysis

Page 25: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 25

0

1

2

3

4

5

6

7

8

9

1

σdσdzg

CMS 2010 Open Data

Theory (MLL)

Pythia 8.219

Herwig 7.0.3

Sherpa 2.2.1

pPFCT > 1.0 GeV

AK5; |η| < 2.4

pjetT > 150 GeV

mMDT / SDβ=0 : zcut = 0.1

0.0 0.1 0.2 0.3 0.4 0.5 0.6

zg

0.5

1.0

1.5

2.0

Rati

oto

Pyth

ia

Initial 2011 Results2010 Analysis

[Larkoski, Marzani, JDT, Tripathee, Xue, 1704.05066]

zg

Exposing the QCD Splitting Function

0

1

2

3

4

5

6

7

8

9

1

σdσdzg

Preliminary (16%)

CMS 2011 Open Data

Pythia 6 Tune Z2 (Simulated)

Pythia 6 Tune Z2 (Truth)

CMS 2011 Open Data

AK5; |η| < 2.4

pjetT > 270 GeV

mMDT / SDβ=0 : zcut = 0.1

0.0 0.1 0.2 0.3 0.4 0.5 0.6

zg

0.5

1.0

1.5

2.0

Rati

oto

Sim

ula

ted

Clear impact from detector(N.B.: no PFC pT cut imposed)

Page 26: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 26

The Future of Public Collider Data

Jet Substructure and QCD Splittings

Using the CMS Open Data

0

1

2

3

4

5

6

7

8

1

σdσdzg

CMS 2010 Open Data

Theory (MLL; all)

Pythia 8.219

Herwig 7.0.3

Sherpa 2.2.1

pPFCT > 1.0 GeV

AK5; |η| < 2.4

pjetT > 150 GeV

SD: β = 0, zcut = 0.1

0.0 0.1 0.2 0.3 0.4 0.5 0.6

Track zg

0.5

1.0

1.5

2.0

Rati

oto

Pyth

ia

Page 27: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 27

V. ADVICE TO THE COMMUNITY

A. Challenges

B. Recommendations

VI. CONCLUSION

As the LHC explores the frontiers of scientific knowl-

edge, its primary legacy will be the measurements and

discoveries made by the LHC detector collaborations. But

there is another potential legacy from the LHC that could be

just as important: granting future generations of physicists

access to unique high-quality data sets from proton-proton

collisions at 7, 8, 13, and 14 TeV.

In our view, the best way to build a legacy data set is to

Jet substructure studies with CMS open data

Aashish Tripathee,1,*

Wei Xue,1,†

Andrew Larkoski,2,‡

Simone Marzani,3,§

and Jesse Thaler1,∥

1

PHYSICAL REVIEW D 96, 074003 (2017)

Page 28: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 28

E.g. ALEPH Confronts the CMS Ridge

[Badea, Baty, Chang, Innocenti, Yen-Jie Lee, Maggi, McGinn, Peters, Sheng, JDT, appeared at Quark Matter 2018]

pp

pp pPb Pb PbPb

vs.e+e–

1990–95 e+e– data

2010 pp surprise!

2018 e+e– analysis

Page 29: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Competes with needs

of the collaboration

Less impact on

the target audience

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 29

Different Options for “Public Data”

Audience

Timing

Event Displays

Instant Quarter Year Decade

Outreach

Archival

Education

Few Years

Research

Page 30: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Balance

openness and

priority

Competes with needs

of the collaboration

Less impact on

the target audience

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 30

Different Options for “Public Data”

Audience

Timing

Event Displays

Instant Quarter Year Decade

Outreach

Archival

Education

Few Years

Research

Page 31: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 31

Data preservation (and outside analyses)require significant resources:

People, time, ideas, and money

To help justify these resources,let me address three common concerns

about public collider data raised by our work

Page 32: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

0.0

0.1

0.2

0.3

0.4

0.5

0.6

θgσ

dθg

CMS 2010 Open Data

Theory (MLL)

Pythia 8.219

Herwig 7.0.3

Sherpa 2.2.1

pPFCT > 1.0 GeV

AK5; |η| < 2.4

pjetT > 150 GeV

mMDT / SDβ=0 : zcut = 0.1

0.01 0.02 0.05 0.10 0.20 0.50 1.00

θg

0.5

1.0

1.5

2.0

Rati

oto

Pyth

ia

0.0

0.1

0.2

0.3

0.4

0.5

0.6

θgσ

dθg

Preliminary (16%)

CMS 2011 Open Data

Pythia 6 Tune Z2 (Simulated)

Pythia 6 Tune Z2 (Truth)

CMS 2011 Open Data

AK5; |η| < 2.4

pjetT > 270 GeV

mMDT / SDβ=0 : zcut = 0.1

0.01 0.02 0.05 0.10 0.20 0.50 1.00

θg

0.5

1.0

1.5

2.0

Rati

oto

Sim

ula

ted

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 32

!

“There is no way you can do an external analysis with thesame degree of sophistication as within the collaboration”

Balance between Sophistication and Exploration

Agreed (mostly)!

But with unexpected theoretical/experimental issues at play, value in exploratory studies

[Tripathee, Xue, Larkoski, Marzani, JDT, 1704.05842]

Page 33: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 33

!

“This work competes with ongoing collaboration analyseswithout the scrutiny of internal review”

Synergy between Internal and External Efforts

[CMS, 1708.09429 ⇒ Phys. Rev. Lett. 120:142302]

Agreed, but fine line between “compete” and “complement”Important to have robust peer review (e.g. dual referees)

0.1 0.2 0.3 0.4 0.5

gd

zdN

je

tN1

0

2

4

6

8

10 Centrality: 50-80%

PbPb

pp smeared

CMS

1.8 0.1 0.2 0.3 0.4 0.50

2

4

6

8

10 Centrality: 30-50%

1.8 0.1 0.2 0.3 0.4 0.50

2

4

6

8

10 Centrality: 10-30%

1.8 0.1 0.2 0.3 0.4 0.50

2

4

6

8

10 Centrality: 0-10%

1.8

-1bµ, PbPb 404 -1 = 5.02 TeV, pp 27.4 pbNN

s

gz0.1 0.2 0.3 0.4 0.5

0

gz0.1 0.2 0.3 0.4 0.5

0

gz0.1 0.2 0.3 0.4 0.5

0

gz0.1 0.2 0.3 0.4 0.5

0

Page 34: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 34

!

“If you really wanted to do this jet substructure measurement,you should have joined CMS as an associate member”

Value of Open-Ended Investigations

Agreed, but what I really wanted to do is figure out theanswer to this question (curiosity-driven research)

Page 35: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 35

The CMS Open Data is a fantastic resource,with many exciting applications

My View

Stress-testing archival data strategies

Enabling exploratory/proof-of-principle studies

Facilitating dialogue between theory and experiment

Educating future scientists

Researching physics in and beyond the standard model

These are only possible with sustainedinvestment in public data initiatives

Page 36: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 36

The Future of Public Collider Data

Jet Substructure and QCD Splittings

Using the CMS Open Data

0

1

2

3

4

5

6

7

8

1

σdσdzg

CMS 2010 Open Data

Theory (MLL; all)

Pythia 8.219

Herwig 7.0.3

Sherpa 2.2.1

pPFCT > 1.0 GeV

AK5; |η| < 2.4

pjetT > 150 GeV

SD: β = 0, zcut = 0.1

0.0 0.1 0.2 0.3 0.4 0.5 0.6

Track zg

0.5

1.0

1.5

2.0

Rati

oto

Pyth

iaSummary

Unique collider data set, ideal for exploratory studies

Exposing the universal singularity structure of gauge theories

Sustained investment from outreach to research to archives

Page 37: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 37

Backup Slides

Page 38: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 38

Additional 2011 PlotsAzimuth

Track Mass

Constituent Multiplicity

0

20

40

60

80

100

120

140

Dif

f.C

ross

Sect

ion

[pb]

Preliminary (16%)

CMS 2011 Open Data

Pythia 6 Tune Z2 (Simulated)

Pythia 6 Tune Z2 (Truth)

CMS 2011 Open Data

AK5; |η| < 2.4

pjetT > 270 GeV

0 10 20 30 40 50 60 70 80 90

Constituent Multiplicity

0.5

1.0

1.5

2.0

Rati

oto

Sim

ula

ted

0

50

100

150

200

Dif

f.C

ross

Sect

ion

[pb/G

eV

]

Preliminary (16%)

CMS 2011 Open Data

Pythia 6 Tune Z2 (Simulated)

Pythia 6 Tune Z2 (Truth)

CMS 2011 Open Data

AK5; |η| < 2.4

pjetT > 270 GeV

0 10 20 30 40 50 60

Track Mass [GeV]

0.5

1.0

1.5

2.0

Rati

oto

Sim

ula

ted

0.0

0.2

0.4

0.6

0.8

1.0

Dif

f.C

ross

Sect

ion

[nb]

Preliminary (16%)

CMS 2011 Open Data

Pythia 6 Tune Z2 (Simulated)

Pythia 6 Tune Z2 (Truth)

CMS 2011 Open Data

AK5; |η| < 2.4

pjetT > 270 GeV

0 π/2 π 3π/2 2π

φ

0.5

1.0

1.5

2.0

Rati

oto

Sim

ula

ted

Page 39: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 39

Sewing Together Simulated Data Samples

0 100 200 300 400 500 600 700

Hardest Jet pT [GeV]

10−3

10−2

10−1

100

101

102

103

104

105

106

107

108

109

Cro

ssS

ect

ion

[pb]

Pythia 6 Tune Z2 (Simulated)AK5; |η| < 2.4

Preliminary (16%)

pGenT ∈ [15,30] GeV

pGenT ∈ [30,50] GeV

pGenT ∈ [50,80] GeV

pGenT ∈ [80,120] GeV

pGenT ∈ [120,170] GeV

pGenT ∈ [170,300] GeV

pGenT ∈ [300,470] GeV

CMS 2011 Open DataCMS 2011 Open Data

0 500 1000 1500 2000 2500 3000

Hardest Jet pT [GeV]

10−10

10−9

10−8

10−7

10−6

10−5

10−4

10−3

10−2

10−1

100

101

102

Cro

ssS

ect

ion

[pb]

Pythia 6 Tune Z2 (Simulated)AK5; |η| < 2.4

Preliminary (16%)

pGenT ∈ [470,600] GeV

pGenT ∈ [600,800] GeV

pGenT ∈ [800,1000] GeV

pGenT ∈ [1000,1400] GeV

pGenT ∈ [1400,1800] GeV

pGenT ≥ 1800 GeV

CMS 2011 Open DataCMS 2011 Open Data

Page 40: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 40

Textbook QCD: Universal Collinear Limit

Collinearsingularity

Softsingularity

dPi→ig '

2αs

πCi

θ

dz

z

x

1→2

2z

1–zθ

Splitting Function

Cq = 4/3Cg = 3

2. . .

. . .

2. . .

. . .

2→n 2→n–1

Page 41: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 41

A Quad-Jet Puzzle in Archival ALEPH Data

[Kile, von Wimmersperg-Toeller, 1706.02242, 1706.02255, 1706.02269]

2: ∆ for 45 GeV < Σ < 61 GeV.

-200

-100

0

100

200

300

400

500

600

10 20 30 40 50 60 70 80 90 100

Σ (GeV/c2)

events

/ 1

.00 G

eV

/c2

LEP2 ALEPH archived dataSherpa MEPS@LO

4fWW

4fZZother

Nfit = 223.28 ± 61.41Mean = 53.97 ± 1.24

Width = 3.55 ± 1.14

0.6

0.8

1

1.2

1.4

10 20 30 40 50 60 70 80 90 100

(a)

-100

-50

0

50

100

150

200

250

300

-100 -80 -60 -40 -20 0 20 40 60 80 100

∆ (GeV/c2)

eve

nts

/ 5

Ge

V/c

2

LEP2 ALEPH archived data

SHERPA MEPS@LO

4fWW

4fZZ

Other

0.6

0.8

1

1.2

1.4

-100 -80 -60 -40 -20 0 20 40 60 80 100

(?!)

Sym-Dijet Mass Average Sym-Dijet Mass Difference

Page 42: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 42

Example of stress-testing archival data strategies

Page 43: The Future is Open · Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 3 Exposing the QCD Splitting Function with CMS Open Data Andrew Larkoski,1,* Simone

Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data 43

New: Public neutrino data!

Derived data from 2008-2012 ⇒ Released May 2018