Identification of the correct hard-scatter vertex at the Large Hadron Collider (LHC)
Pratik Kumar (pratikk), Neel Mani Singh (neelmani)

MOTIVATION
• ATLAS is a particle detector that analyzes proton-proton collisions at the LHC.
• The goal is to identify the correct hard-scatter primary vertex among around 60 collisions.
• A key challenge for the analysis of LHC events is pileup.

CURRENT METHOD
The current technique for identifying the primary vertex selects the vertex with the highest total energy. The total energy is computed as the scalar sum over all particle tracks associated with the vertex. This method performs very poorly when the number of pileup interactions is large, selecting the wrong vertex 40% of the time, as seen in the graph.

DATASET & FEATURES
Our dataset consists of computer-simulated events of Higgs bosons. Each event consists of a list of vertices (60 on average), and each vertex consists of a list of particle tracks. Each track is represented by a direction in 3D space, an origin (given by the vertex it belongs to), and its energy. [...] features that will be used as inputs for a classifier.

Features used:
• sumPt: scalar sum of the transverse momentum of all the tracks.
• sumPtw: weighted sum over the tracks.
• MET: missing transverse energy.
• eta1, eta2, eta3: angle of the top 3 tracks.
• pt1, pt2, pt3: transverse momentum of the top 3 tracks.

MODEL USED
• Logistic Regression (LR)
• Neural Network (NN)
• Balanced Bagging (BB)
• Balanced Bagging with Logistic Regression (BBLR)
• Balanced Bagging with Neural Network (BBNN)
LR did not perform well due to class imbalance in the data. Bagging techniques gave better results.

RESULT
Since we have a class imbalance problem, we have to use a metric that is not biased towards the majority class. We have therefore chosen the weighted F1-score.

Model   F-Score (test)   F-Score (train)
LR      98.63            98.62
NN      96.84            96.72
BBLR    96.37            96.32
BBNN    55.18            55.01

The ROC curve for BBLR indicates that results can be improved by using a different threshold.

VERTEX SELECTION
• Until now, we have treated each vertex as an independent data input.
• For our problem, however, we need to select one vertex from the group of vertices of an experiment. To do this, we evaluate our model per experiment and choose the vertex that gives the highest probability.
• Based on this, we calculate the vertex selection efficiency vs. pileup density.

VERTEX SELECTION EFFICIENCY
The vertex selection efficiency from BBLR shows that our model performs better at high pileup densities than the current technique.

DISCUSSIONS
• The data is inherently unbalanced because of the nature of the experiment, so general training techniques do not work.
• Features other than sumPt have a discriminating effect for different types of collision event. That is why our model works better than the existing approach at high pileup densities, as shown by the vertex selection efficiency.
• Our model performs almost identically on the training and test sets, so there is no sign of overfitting.
• A Neural Network without balanced bagging is unstable on this dataset: it produces widely varying results.

FUTURE WORK
• Neural Networks: can be improved by tuning parameters such as the learning rate and the number of hidden layer units.
• Thresholds: predictions based on different thresholds.
• Features: more features can be extracted from the simulation of the events.
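As an illustration of the feature list under DATASET & FEATURES, here is a minimal Python sketch of how a few of the per-vertex features could be computed. The track layout (tuples of pt, eta, phi), the function name vertex_features, the ranking of the "top 3" tracks by transverse momentum, and the MET definition (magnitude of the vector sum of the track transverse momenta) are all assumptions made for this example, not details taken from the poster. sumPtw is left out because the poster does not specify the weights.

```python
import math


def vertex_features(tracks):
    """Compute a few per-vertex features from a list of tracks.

    Each track is a (pt, eta, phi) tuple. This layout is hypothetical;
    the poster only states that a track has a direction and an energy.
    """
    # Rank tracks by transverse momentum, highest first
    # (assumed meaning of "top 3 tracks").
    ordered = sorted(tracks, key=lambda t: t[0], reverse=True)
    top = ordered[:3]
    # Pad with zeros when a vertex has fewer than three tracks.
    top += [(0.0, 0.0, 0.0)] * (3 - len(top))

    # Vector sum of transverse momenta; its magnitude is used as MET
    # (an assumed definition, see the lead-in).
    px = sum(pt * math.cos(phi) for pt, _, phi in tracks)
    py = sum(pt * math.sin(phi) for pt, _, phi in tracks)

    features = {
        "sumPt": sum(pt for pt, _, _ in tracks),  # scalar sum of pt
        "MET": math.hypot(px, py),
    }
    for i, (pt, eta, _) in enumerate(top, start=1):
        features[f"pt{i}"] = pt
        features[f"eta{i}"] = eta
    return features


# Toy vertex with four tracks (made-up numbers).
tracks = [
    (30.0, 0.5, 0.0),
    (10.0, -1.2, math.pi),
    (20.0, 2.0, 0.0),
    (5.0, 0.1, math.pi),
]
features = vertex_features(tracks)
print(features)
```

Each vertex then becomes one fixed-length row, which is what allows it to be fed to a classifier as an independent input.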
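The per-experiment selection described under VERTEX SELECTION can be sketched roughly as follows: for each event, take the vertex with the highest classifier probability, then measure how often that choice is correct in bins of pileup. The dictionary layout, the function names, the bin width, and the use of an integer pileup count per event are assumptions for illustration; the probabilities stand in for the output of any of the trained models.

```python
from collections import defaultdict


def select_vertices(events):
    """Return, for each event, the index of the highest-probability vertex.

    `events` maps an event id to a list of per-vertex probabilities
    (hypothetical layout, not taken from the poster).
    """
    return {
        event_id: max(range(len(probs)), key=probs.__getitem__)
        for event_id, probs in events.items()
    }


def efficiency_by_pileup(selected, truth, pileup, bin_width=10):
    """Fraction of events with the correct vertex selected, per pileup bin."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for event_id, index in selected.items():
        # Lower edge of the pileup bin this event falls into.
        bin_start = (pileup[event_id] // bin_width) * bin_width
        totals[bin_start] += 1
        hits[bin_start] += int(index == truth[event_id])
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Toy inputs (made-up numbers): classifier probabilities per vertex,
# index of the true hard-scatter vertex, and pileup per event.
events = {
    "evt0": [0.10, 0.80, 0.05],
    "evt1": [0.30, 0.20, 0.60],
    "evt2": [0.70, 0.10, 0.15],
    "evt3": [0.20, 0.55, 0.40],
}
truth = {"evt0": 1, "evt1": 2, "evt2": 1, "evt3": 1}
pileup = {"evt0": 12, "evt1": 18, "evt2": 45, "evt3": 41}

selected = select_vertices(events)
efficiency = efficiency_by_pileup(selected, truth, pileup)
print(selected)    # chosen vertex index per event
print(efficiency)  # selection efficiency per pileup bin
```

Taking the maximum within an event means the decision does not depend on a fixed probability threshold, so exactly one vertex is selected per event even when all probabilities are low.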

Source: cs229.stanford.edu/proj2017/final-posters/5148556.pdf

