View
216
Download
1
Category
Preview:
Citation preview
Advanced Statistical Techniques in Particle
Physics
Conference Summary(Thanks to Bob Cousins!)
Jim LinnemannMSU HEP Seminar
23 April, 2002
Conference Overview Durham, UK
– 5 days, nearly no rain!
• Mixture of “theoretical” and practical– Overview/Tutorial talks– Systematic Comparisons of Methods– New Developments– Problems
• Visiting (tolerant!) Statisticians: – Michael Goldstein– Wolfgang Rolke
• Radical idea: if phenomenologist in collaboration, why not a professional statistician (a la medical research)?
http://www.ippp.dur.ac.uk/statistics/
Tutorials, Overviews, Explanations• Fred James:
– Overview– Goodness of Fit vs. Intervals
• Roger Barlow: – Systematics:
• mistakes, effects, errors
Multidimensional: • Sherry Towers:
– PDE’s– Reducing variables in classification
• Harrison Prosper: – multi- dimensional methods
• Tony Vaiculis: – Support Vector Machines
• Niels Kjaer: Monte Carlo– Interpolating (+ much else)
• Pekka Sinervo:– Significance
• Berkan Aslan (G. Zech);– Goodness of Fit measures
• Glen Cowan, Volker Blobel: – Unfolding
• Paul Harrison: – Blind Analysis
Theory, Practice, and Methods
• Chris Parkes– Combining Lep W results
• Gary Hill, Tyce De Young– Bayes in Amanda tracking
• Rudy Bock, Wolfgang Wittek– Multidimensional methods for
Gamma/hadron separation
• Volker Blobel– Global Alignment Fits
• Alex Read– CLS
• Dean Karlen– Credibility of Conf Intervals
• Raja– Uncertainty of Limits
Problems to Chew On
• Nigel Smith and Dan Tovey– Dark Matter Searches
• Bruce Yabsley– Statistics in Practice at Belle
Fred James
Important not to confuse these problems, e.g., interval estimation and goodness-of-fit testing.
Parameter fitting criterionParameter fitting criterion
– Hypothesis-testing vs parameter-fitting criteria ( cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p.3 )
Weng
Multidimensional Methods• Aspire to full extraction of information• Equivalent to trying to fit
P(signal)/P(background) (Neyman-Pearson)
• Issues in – choice of dimensionality (no one tells you how many!)– methods of approximation– control of bias/variance tradeoff – complexity of fit– Number of free parameters– Amount of training data needed– “ease” of interpretation
• We are following the field; hope for theory to help• See: Elements of Statistical Learning,
– Hastie, Tibshirani, Friedman
Sherry Towers
Wow! Several questions come to my mind…
[In general case, variables deletion is safer than variable addition. –M.G.]
Harrison Prosper• Thumbnail sketch of some methods of interest:
– Fisher Linear Discriminant– Principal Components Analysis– Independent Component Analysis– Self-Organizing Map– Grid Search– Probability Density Estimation– Neural Networks– Support Vector Machines
• Said these all are attempts to solve the single classification problem whose solution is the Bayes discriminator D(x) = P(S|x)/P(B|x) = (L(S)/L(B)) (P(S)/P(B)) … = Neyman-Pearson when P(S)=P(B)
• Multivariate analysis is hard: important to use all the information used by D(x) (which might be lost, e.g., by marginalization). Appears that there is no single optimal approximation.
Unfolding (unsmearing)
• Inherently unstableMeasured smoother than
trueUn-smooth: Enhances noise!
• Nice discussion of regularization, biases, uncertainies
• See talk – and his statistics book
One program:
• Must balance between oscillations and over-smoothed result
• = Bias-variance tradeoff– Same issues in
multidimensional methods
Glen Cowan
Unfolding: Insight• View as matrix
problem• “ill-posed” = singular• Analyze in terms of
Eigenvalues/vectors and condition number
V. Blobel
Statistical error
Truncate eigenfunctions when below error bound
Unfolding Results
High frequencies not measured:
Report fewer bins? (or supply from prior??)
Statistical error in neighboring bins no longer uncorrelated!
Blobel
oversmoothed
Higher modes converge slowly: interate only a few times (d’Agostini)
N.J. Kjaer (I)
Delphi MC
Re-interpretation of data to interpolate on physics paramters
Analogy with Stat Mech MC techniques?
Paul Harrison
Blind Analysis
Cousins: it takes longer, especially first time
“liberating”
By the way, no one can read light green print…
Blind Analysis
• A called shot– Step towards making 3 mean 3
• Many ways to blind– 10% of data; background-only, obscure fit
result
• Creates a mindset– Avoiding biases and subjectivity
R.K.Bock, Durham, March 2002 23
Method details and comments: composite probabilities (2-D)
• intuitive determination of event probabilities by multiplying the probabilities in all 2D projections that can be made from image parameters, using constant bin content for some data
• shown on some IACT data to at least match best existing results (but strict comparisons suffered from moving data sets)
R.K.Bock, Durham, March 2002 24
CP program uses same-content binning
in 2 dimensions
Bins are set up for gammas (red),
probabilities are evaluated for protons (blue)
all possible 2-D projections are used
Method details and comments: composite probabilities (2-D)
R.K. Bock
Interesting idea:
Expansion in dimensionality of correlation
Very Interesting Technique!• Let’s relate it to something we do: say particle ID
in a detector: – In hot part of detector near beam: lots of background,
we tighten particle-ID cuts
– In lower-occupancy part of the detector away from beam, can loosen certain particle-ID cuts without letting in a lot of background
• Use our knowledge of position-dependent occupancy rates in Bayes’s Theorem to calculate the probability that a given particle in a given location is the species of interest.
• If all input P’s are frequentist P’s, the output P(particle type | data) is a frequentist P.
• We can use this posterior frequentist P like any other observable for cuts, weights, etc. If we independently calibrate the signal efficiency/ background rejection of this use, there is nothing circular about using our knowledge of the input occupancies.
• If the input occupancy knowledge is imperfect it will not introduce a bias, but rather make the technique less powerful.
Comments:
Bayes’s Theorem applies to any P satisfying the axioms of probability
• Frequentist P: limiting frequency– Theorem not much use if the unknown is a constant of nature:
P(unknown) = delta-function at unknown value
• Bayesian P: degree of belief– For constant of nature, P(unknown) can be combination of
delta-function and continuous function, reflecting degree of belief
• Is the Amanda technique “Bayesian”?– Not if “Bayesian” implies “not frequentist”, as I think is
common, even though frequency P is emulated in a certain application/limit of degree of belief.
• In any case, instructive example!
Chris Parkes
Practicalities of Combining Analyses:
W Physics Results at LEPNow the stuff you don’t normally see…
An important reminder: pragmatic considerations (sometimes even irrational) can be as important as principles in order to get out a result.
RC: An informative talk about both methodology and sociology!
• LEP experiments contained a sizable fraction of world HEP community, and reached very mature state of analysis.– We have much to learn from them, both theoretical and
practical.
This talk is not for the squeamish or over-idealistic,
but it is a vivid description of the real world in action!
Cousins:
Studies of Intervals
• Byron Roe and Michael Woodroofe: Mini-Boone
• Jan Conrad: Coverage with Systematics
• Rolke and Lopez: Bias correction via double-bootstrap
• Giunti and Laveder: the “power” of confidence intervals
• Punzi: Strong Confidence Intervals
• Giovanni Signorelli et al: Strong C.I. And systematics
Dean Karlen’s Proposal to Evaluate Credibility of Confidence Intervals
• Yesterday evening, generally interested-to-favorable reaction
• Cousins: I’m outlier: I think it will only encourage unthinking “easy” use of Bayes, with more flat (i.e., not degree of belief) priors.
• We evaluate Bayesian intervals with serious frequentist methods.
• Why not evaluate confidence intervals with serious Bayesian methods? One metric-dependent prior constituteth not a sensitivity analysis.
• Who was it who said “How do you know that the outlier isn’t right?”
Alex Read’s Beautiful Talk on CLS • CLs = PV(s+b)/PV(b) (Cowan, PDG Stats)
– PV = P Value = prob(obs), posterior, like P(2)
• Behavior compared to LR Ordering (F-C) is understood and lucidly explained. Application to neutrino oscillations!
• Please see his talk• Cousins comment: The non-standard conditioning
(inequality, not ancillary statistic) of Zech and Roe&W and Read leads to problems with lower end of confidence intervals (see Cousins PRD Comment). Alex recognized this.
• Therefore, Alex now advocates CLS only for limits, and in case of signal, he now would use LR Ordering.
To Use Bayes or not?• Professional Statisticians are much more Bayes-oriented in last 20 years
– Computationally possible– Philosophically coherent
• (solipsistic?? Subjective Bayes…)
• In HEP: want to publish result, not prior– We want to talk about P(theory|data)
• But this requires prior: P(theory)– Likelihoods we can agree on!– Conclusions should be insensitive to a range of priors
• Probably true, with enough data
• Search limits DO depend on priors!– Hard to convince anyone of a single objective prior!!!– Unpleasant properties of naïve frequentist limits, too
• Feldman-Cousins is current consensus
• Systematic errors hard to treat in frequentist– PDG currently recommends Bayes “smearing of likelihood”
• close in spirit to Cousins-Highland mixed Frequentist-Bayesian
Michael Goldstein
• A real pleasure to have you here!• Since subjective Bayes is rarely used in HEP, but
is “known” to be the “coherent” version, it has been very enlightening:
• “Sensitivity Analysis is at the heart of scientific Bayesianism”– How skeptical would the community as a whole have to
be in order not to be convinced.– What prior gives P(hypothesis) > 0.5– What prior gives P(hypothesis) > 0.99, etc
• There’s a split among Bayesians; M.G. is in the group that sees no virtue in objective (“arbitrary”) priors (except as one of many examples of possible prior beliefs in a sensitivity analysis).
Michael Goldstein (cont.)
• Procedures should obey the likelihood principle. Frequentist methods don’t obey it: fundamental flaw.
• Bayesian methods are hard to do right, but they are the only way to attack certain hard problems.
• Bayes Linear Methodology: addresses expectations rather than whole pdf’s.
• HEP problems: appear to map onto a very similar set of abstract problems.
Cousins would add:
• (Coherent) Subjective priors behave like real probabilities under transformations, unlike, e.g., flat priors.
• M.G. represents only one school of Bayesian stats, but I don’t think you will find a school advocating uniform prior for a Poisson mean.
• M.G. portrays Bayesian methods as hard, but worth the effort. This should be stressed in HEP, where the hard part (subjective prior) is dodged, and the math is (indeed) easily cranked out (without backwards thinking) to give an “answer” that I think is without much content unless evaluated by frequentist standards.
• I think M.G.’s point about sensitivity analysis has to be taken to heart in HEP, whether one uses objective or subjective priors.
Cousins’ Last Words (for now!)
• The area under the likelihood function is meaningless.
• Mode of a probability density is metric-dependent, as are shortest intervals.
• A confidence interval is a statement about P(data | parameters), not P(parameters | data)
• Don’t confuse confidence intervals (statements about parameter) with goodness of fit (statement about model itself).
• P(non-SM physics | data) requires a prior; you won’t get it from frequentist statistics.
• The argument for coherence of Bayesian P is based on P = subjective degree of belief.
Recommended