TerraFerMA: A Suite of Multivariate Analysis tools

Sherry Towers, SUNY-SB

Version 1.0 has been released! Useable by anyone with access to the CLHEP and Root libraries: www-d0.fnal.gov/~smjt/multiv.html




TerraFerMA

TerraFerMA = Fermilab Multivariate Analysis (aka “FerMA”)

Convenient interface to various multivariate analysis packages (e.g. MLPfit, Jetnet, PDE/GEM, Fisher discriminant, binned likelihood, etc.)

User fills signal and background (and data) “Samples”, which are then used as input to FerMA methods…

Includes a method to sort variables to determine which are the best discriminators between signal and background.


Also includes useful stats tools (correlations, means, RMS’s), and a method to detect outliers.

Using a multivariate package chosen by the user, FerMA yields the probability that a data event is signal or background.

TerraFerMA makes it trivial to compare performance of different multivariate techniques!

Also makes it easy to reduce the number of discriminators used in an analysis!
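The summary statistics mentioned on this slide can be illustrated with a minimal sketch. It is plain Python, not the TerraFerMA interface: the function names are hypothetical, "RMS" is taken here to mean the RMS deviation about the mean, and the n-sigma outlier criterion is an assumption, since the slides do not say how FerMA detects outliers.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def rms_spread(xs):
    """RMS deviation about the mean (population convention; an assumption)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def correlation(xs, ys):
    """Pearson linear correlation coefficient between two variables."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (rms_spread(xs) * rms_spread(ys))

def flag_outliers(xs, n_sigma=3.0):
    """Indices of values further than n_sigma spreads from the mean.

    Hypothetical criterion, chosen only for illustration.
    """
    m, s = mean(xs), rms_spread(xs)
    return [i for i, x in enumerate(xs) if abs(x - m) > n_sigma * s]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(mean(x), rms_spread(x), correlation(x, y))
print(flag_outliers([0.0] * 20 + [100.0]))
```

A pair of discriminators with a correlation coefficient close to ±1 carries largely redundant information, which is one reason to look at these numbers before choosing variables.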


Overview of common multivariate analysis techniques:

Simple techniques: Ignore all correlations between discriminators… For example, simple techniques based on square cuts, or likelihood techniques which obtain a multi-D likelihood from the product of 1-D likelihoods.

Advantage: fast, easy to understand. Easy to tell if modelling of data is sound.

Disadvantage: useful discriminating info is lost if correlations are ignored.

FerMA includes a method to determine optimal square cuts in a multidimensional parameter space.
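The "product of 1-D likelihoods" idea can be sketched as follows. This is an illustration only, not FerMA code: it assumes each 1-D likelihood is a Gaussian fitted to the MC sample, assumes equal signal and background priors, and uses hypothetical function names and made-up toy events.

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def fit_1d(sample):
    """Per-variable (mean, width); each event is a list of variables."""
    params = []
    for j in range(len(sample[0])):
        col = [ev[j] for ev in sample]
        mu = sum(col) / len(col)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col))
        params.append((mu, sigma))
    return params

def product_likelihood(event, params):
    """Multi-D likelihood as a product of 1-D likelihoods (correlations ignored)."""
    like = 1.0
    for x, (mu, sigma) in zip(event, params):
        like *= gauss_pdf(x, mu, sigma)
    return like

def signal_probability(event, sig_params, bkg_params):
    """Probability that the event is signal, assuming equal priors."""
    ls = product_likelihood(event, sig_params)
    lb = product_likelihood(event, bkg_params)
    return ls / (ls + lb)

# Toy MC samples with two discriminators (made-up numbers).
signal = [[1.0, 0.2], [1.5, -0.1], [0.5, 0.3], [1.2, -0.4]]
background = [[-1.0, 0.1], [-0.5, -0.2], [-1.5, 0.4], [-1.2, -0.3]]
sig_params = fit_1d(signal)
bkg_params = fit_1d(background)
print(signal_probability([1.0, 0.0], sig_params, bkg_params))
```

Because each variable is treated independently, any information carried by correlations between the discriminators is discarded, which is the disadvantage noted on the slide.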


More powerful...

More complicated techniques take into account simple (linear) correlations between discriminators:

ANOVA/MANCOVA
H-Matrix
Fisher-discriminant*
Principal component analysis*
Projection correlation transformations*
Optimal Observables
and many, many more…

Advantage: fast, more powerful.

Disadvantage: can be a bit harder to understand, systematics can be harder to assess. Harder to tell if modelling of data is sound.
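As one concrete example of a technique that uses linear correlations, here is a minimal sketch of a two-variable Fisher discriminant, with weights w = W⁻¹(μ_sig − μ_bkg) where W is the sum of the signal and background covariance matrices. It is written in plain Python for illustration; the function names and toy events are hypothetical and this is not the FerMA implementation.

```python
def mean_vec(sample):
    n = len(sample)
    return [sum(ev[0] for ev in sample) / n, sum(ev[1] for ev in sample) / n]

def cov_2d(sample):
    """2x2 covariance matrix of a sample of two-variable events."""
    n = len(sample)
    m = mean_vec(sample)
    sxx = sum((ev[0] - m[0]) ** 2 for ev in sample) / n
    syy = sum((ev[1] - m[1]) ** 2 for ev in sample) / n
    sxy = sum((ev[0] - m[0]) * (ev[1] - m[1]) for ev in sample) / n
    return [[sxx, sxy], [sxy, syy]]

def fisher_weights(signal, background):
    ms, mb = mean_vec(signal), mean_vec(background)
    cs, cb = cov_2d(signal), cov_2d(background)
    # Within-class matrix W = C_sig + C_bkg, inverted analytically (2x2).
    w = [[cs[i][j] + cb[i][j] for j in range(2)] for i in range(2)]
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    inv = [[w[1][1] / det, -w[0][1] / det], [-w[1][0] / det, w[0][0] / det]]
    d = [ms[0] - mb[0], ms[1] - mb[1]]
    return [inv[0][0] * d[0] + inv[0][1] * d[1],
            inv[1][0] * d[0] + inv[1][1] * d[1]]

def fisher_output(event, weights):
    """Linear combination of the two discriminators."""
    return weights[0] * event[0] + weights[1] * event[1]

# Toy MC samples with positively correlated variables (made-up numbers).
signal = [[2.0, 1.0], [3.0, 2.5], [2.5, 1.5], [3.5, 2.0]]
background = [[0.0, 0.5], [1.0, 1.5], [0.5, 0.0], [1.5, 1.0]]
weights = fisher_weights(signal, background)
sig_out = [fisher_output(ev, weights) for ev in signal]
bkg_out = [fisher_output(ev, weights) for ev in background]
print(weights, min(sig_out), max(bkg_out))
```

In this toy case the single Fisher output separates the two samples completely, although neither input variable does so on its own axis as cleanly.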


Probability correlation transformations (ProCor)

ProCor is the default multivariate package in TerraFerMA.

Very fast.

(Relatively) easy to understand.

Essentially, ProCor maps every point in signal (or background) MC onto a multi-dimensional Gaussian PDF.

Mapping is optimal for MC sets with linear correlations between variables.

If mapping is not optimal, ProCor tells you!


Most powerful...

Analytic/binned likelihood
Advantage: easy to understand.
Disadvantage: difficult to implement with many variables.

Neural Networks
Advantage: powerful, reasonably fast.
Disadvantage: Black box! Many parameters of method, and systematics can be difficult to assess.

Kernel Estimation (Gaussian Expansion Method = GEM) (Static-Kernel Probability Density Estimation = PDE)
Advantage: powerful, easy to understand. Unbinned estimate of original PDF. Few parameters of method.
Disadvantage: a bit slow.


Gaussian Expansion Method/Probability Density Estimation

All kernel PDF estimation methods are developed from a very simple idea…

If a data point lies in a region where the clustering of signal MC points is relatively tight, and the clustering of bkgnd MC points is relatively loose, then that point is more likely to be signal.


GEM

Whether the clustering is “relatively tight” can be determined from the local covariance matrix, calculated from nearest neighbours to a point.
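A local covariance matrix built from nearest neighbours can be sketched like this. The sketch assumes plain Euclidean distance in unscaled variables and a fixed number of neighbours k; the slides do not specify either choice, and the function names are hypothetical.

```python
import math

def nearest_neighbours(point, sample, k):
    """The k events in the sample closest to the point (Euclidean distance)."""
    return sorted(sample, key=lambda ev: math.dist(point, ev))[:k]

def local_covariance(point, sample, k):
    """Covariance matrix of the k nearest neighbours of the point."""
    nbrs = nearest_neighbours(point, sample, k)
    n, n_var = len(nbrs), len(point)
    mean = [sum(ev[j] for ev in nbrs) / n for j in range(n_var)]
    return [[sum((ev[i] - mean[i]) * (ev[j] - mean[j]) for ev in nbrs) / n
             for j in range(n_var)]
            for i in range(n_var)]

# Toy sample: a tight cluster near (0, 0) and a loose one near (5, 5).
tight = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0], [0.0, -0.1]]
loose = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [4.0, 5.0], [5.0, 4.0]]
sample = tight + loose

cov_tight = local_covariance([0.0, 0.0], sample, 5)
cov_loose = local_covariance([5.0, 5.0], sample, 5)
print(cov_tight[0][0], cov_loose[0][0])
```

Small diagonal elements indicate tight clustering around the point, large ones indicate loose clustering.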


GEM/PDE

But we also want estimate of probability density...

GEM/PDE uses idea that any continuous function can be modelled from the sum of “kernel” functions (similar to idea behind Fourier series)

GEM/PDE use multi-dimensional Gaussian kernels

Each Gaussian kernel is centred about an MC point… the widths of the Gaussian come from the local covariance matrix at that point.
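In one dimension, the kernel idea reduces to a sum of Gaussians, one per MC point, each with its own local width. The sketch below is an illustration under stated assumptions, not the GEM or PDE code: the local width is taken as the RMS spread of the k nearest MC points, a small floor (`min_width`, hypothetical) guards against zero widths, and equal signal and background priors are assumed.

```python
import math

def build_kernels(sample, k=4, min_width=1e-6):
    """One (centre, width) pair per MC point; width from k nearest points."""
    kernels = []
    for xi in sample:
        nbrs = sorted(sample, key=lambda v: abs(v - xi))[:k]
        m = sum(nbrs) / len(nbrs)
        h = math.sqrt(sum((v - m) ** 2 for v in nbrs) / len(nbrs))
        kernels.append((xi, max(h, min_width)))
    return kernels

def kernel_pdf(x, kernels):
    """Unbinned PDF estimate: average of the Gaussian kernels evaluated at x."""
    total = 0.0
    for centre, h in kernels:
        total += math.exp(-0.5 * ((x - centre) / h) ** 2) / (h * math.sqrt(2.0 * math.pi))
    return total / len(kernels)

def signal_probability(x, sig_kernels, bkg_kernels):
    """Probability that x is signal, assuming equal priors."""
    ps, pb = kernel_pdf(x, sig_kernels), kernel_pdf(x, bkg_kernels)
    return ps / (ps + pb)

# Toy MC: signal clusters tightly near 1, background is spread out.
signal_mc = [0.8, 0.9, 1.0, 1.1, 1.2, 1.05, 0.95]
background_mc = [-2.0, -1.0, 0.0, 1.0, 2.0, -1.5, 1.5]
sig_kernels = build_kernels(signal_mc)
bkg_kernels = build_kernels(background_mc)
print(signal_probability(1.0, sig_kernels, bkg_kernels))
```

Each kernel is individually normalised, so the estimate integrates to one without any binning.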


GEM: 1-D Gaussian


GEM/PDE: 1-D Gaussian


Boring details...


The case for fewer discriminators…

Using a large number of variables indiscriminately can indicate a lack of forethought in the design and conceptualization of an analysis.


Also, each added variable makes it more difficult to determine if modelling of data is sound, and makes the analysis more difficult to understand.

And each added variable adds statistical noise… This can degrade overall discrimination power!


Optimising discrimination…

Maximise S/sqrt(S+B), or:
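A simple way to maximise S/sqrt(S+B) is to scan a cut on the discriminant output and keep the cut with the largest value. The sketch below assumes unweighted event counts and a "keep events at or above the cut" convention; the function names and discriminant values are hypothetical, and a real analysis would normally weight the counts to expected yields.

```python
import math

def significance(n_sig, n_bkg):
    """S / sqrt(S + B), defined as 0 when no events pass."""
    total = n_sig + n_bkg
    return n_sig / math.sqrt(total) if total > 0 else 0.0

def best_cut(sig_outputs, bkg_outputs):
    """Scan every observed output value as a cut; return (cut, significance)."""
    best = (None, -1.0)
    for cut in sorted(set(sig_outputs + bkg_outputs)):
        s = sum(1 for v in sig_outputs if v >= cut)
        b = sum(1 for v in bkg_outputs if v >= cut)
        z = significance(s, b)
        if z > best[1]:
            best = (cut, z)
    return best

# Toy discriminant outputs (made-up numbers).
sig_outputs = [0.9, 0.8, 0.85, 0.7, 0.6, 0.95]
bkg_outputs = [0.1, 0.2, 0.3, 0.65, 0.4, 0.15]
cut, z = best_cut(sig_outputs, bkg_outputs)
print(cut, z)
```

Comparing the best achievable value for different multivariate methods, or for different subsets of variables, gives a single number on which to rank them.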


The curse of too many variables

Signal: 5D Gaussian, μ = (1,0,0,0,0), σ = (1,1,1,1,1)

Bkgnd: 5D Gaussian, μ = (0,0,0,0,0), σ = (1,1,1,1,1)

Only difference between signal and background is in first dimension. Other four dimensions are “useless” discriminators.
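The toy samples on this slide can be generated with a few lines of Python. The sketch assumes the two vectors given for each sample are the means and the widths of independent Gaussians; the sample size, seed, and function names are arbitrary choices made for illustration.

```python
import random

def make_sample(means, widths, n_events, rng):
    """Events drawn from independent Gaussians, one per dimension."""
    return [[rng.gauss(m, w) for m, w in zip(means, widths)]
            for _ in range(n_events)]

def column_mean(sample, j):
    return sum(ev[j] for ev in sample) / len(sample)

rng = random.Random(12345)  # fixed seed so the toy is reproducible
signal = make_sample([1, 0, 0, 0, 0], [1, 1, 1, 1, 1], 5000, rng)
background = make_sample([0, 0, 0, 0, 0], [1, 1, 1, 1, 1], 5000, rng)

# Difference of sample means per dimension: only the first is non-zero
# up to statistical fluctuations.
separation = [column_mean(signal, j) - column_mean(background, j)
              for j in range(5)]
print(separation)
```

The four extra dimensions contribute only fluctuations, which is the statistical noise a multivariate method has to cope with when they are included.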


A “real-world” example…

A Tevatron Run I analysis used a 7-variable NN to discriminate between signal and background.

Were all 7 needed? Ran the signal and background n-tuples through the TerraFerMA interface to the sorting method…


Another “real-world” example…

A Tevatron “physics-object-ID” method uses 9 variables in the analysis.

How many are actually needed?


Summary

Careful examination of discriminators used in a multivariate analysis is always a good idea!

Reduction of number of variables can simplify analysis considerably, and can even increase discrimination power!


TerraFerMA Version 1.0

TerraFerMA documentation: www-d0.fnal.gov/~smjt/ferma.ps

TerraFerMA users’ guide: www-d0.fnal.gov/~smjt/guide.ps

TerraFerMA package: …/ferma.tar.gz (includes an example program in examples/simple/simple.cpp)


TerraFerMA Version 1.0

Soon to be included:

Support Vector Machines

Methods to fit for the fraction of signal and bkgnd in a data sample

Ensembles (many samples grouped together)

Enhanced ability to write out/read in NN weights

Want more? Let me know!


Summary

TerraFerMA is: A platform of very powerful multivariate analysis tools. In all test applications to date, TerraFerMA has significantly improved existing analyses!