Bayesian Methods, Lectures I+II
U. von Toussaint
Max-Planck-Institute for Plasma Physics
Garching
Recommended Literature:
Introduction: Data Analysis, 2nd edition (D.S. Sivia); Bayesian Logical Data Analysis (Phil Gregory)
General: Bayesian Probability Theory (W. von der Linden); Pattern Recognition and Machine Learning (C.M. Bishop); Information Theory, Inference and Learning Algorithms (D.J.C. MacKay); Bayesian Reasoning (D. Barber)
Review article: Bayesian Inference in Physics, Reviews of Modern Physics 83, p. 943-999 (2011) :-)
Introduction: the Scientific Inference Cycle

I. Experiment → II. Evaluation (Parameter Estimation) → III. Hypothesis building (Model Selection) → IV. Hypothesis verification (Experimental Design) → back to I. Experiment

Prerequisite: consistent reasoning...
Outline

I. Scientific Inference: Inference in Science; Processing of Information
II. Parameter Estimation: Non-Gaussian Likelihoods; Change Points; Integral Eq. / Deconvolution
III. Model Comparison: Basic Concept; Mass Spectroscopy; Background Subtraction
IV. Numerical Interlude: Integration Methods
V. Experimental Design: Basic Concept; Optimized Observation Times; Nuclear Reaction Analysis
VI. 'Non-parametric' Models: Gaussian Processes
Inference in Science

Science: prior information + new data → new knowledge

Prior information (a continuous learning process): old data; calibration measurements, validation data; theoretical considerations; parameters; model.
New data: measurements; calibration; theories.

But: prior information and data are uncertain; new knowledge and derived hypotheses are uncertain; inductive (instead of deductive) reasoning is required → a calculus for "uncertainties" is needed (quantification).
Inference in Science
Science: prior information + new data → new knowledge
a) Deductive reasoning: cause → effects or outcomes
Example: fair coin: p(7 heads | 10 tosses) = ?

b) Inductive reasoning: effects or observations → possible causes
Example: 7 heads out of 10 tosses: p(fair coin) = ?
Inference in Science
Calculus (Cox, Knuth, Skilling, ...): basic requirements (e.g. symmetry, consistency) single out the usual rules of probability theory to handle uncertainty.
Please note: probability here is not restricted to the frequency interpretation: degree of belief about a proposition.

Notation: p(x|I) describes how probability (plausibility) is distributed among the possible choices for x for the case at hand (information I).

Data uncertainties: statistical (counting statistics); measurement uncertainty (ruler); systematic (e.g. misalignment); outliers.
Hypotheses: parameters of interest; nuisance parameters; physical models; future data.
Inference in Science
Bayesian interpretation: p(x|I) describes how probability (plausibility) is distributed among the possible choices for x for the case at hand (information I): p is distributed; x has a single, uncertain value.

Frequentist interpretation: p(x|I) describes how x is distributed throughout an infinite (hypothetical) ensemble: probability = frequency; x is distributed.

Calculus (Cox, Knuth, Skilling, ...): basic requirements (e.g. symmetry, consistency) single out the usual rules of probability theory to handle uncertainty.
Processing of information
Processing of information: Combination of (cond.) probability distributions
Conditional probabilities:
p(A|B): probability of proposition A given the truth of proposition B: quantification of the uncertainty of A.

Probability theory axioms:
- sum rule (OR): p(A|B) + p(Ā|B) = 1
- product rule (AND): p(A,B|I) = p(A|B,I) p(B|I)
→ Bayes' theorem
Processing of information
Bayes' theorem:
p(H|d,I) = p(d|H,I) p(H|I) / p(d|I)   (Posterior = Likelihood × Prior / Evidence)

Prior: knowledge before the experiment (logically)
Likelihood: probability for the data if the hypothesis were true
Posterior: probability that the hypothesis is true, given the data
Evidence: normalization; important for model comparison

The Maximum Likelihood approach (the parameters which maximize the probability for the data) does not yield the most likely parameters*!
*) in general, except e.g. for flat, unbounded priors

Examples: p(wet street | rain, I) ≠ p(rain | wet street, I);  p(female | pregnant, I) ≠ p(pregnant | female, I)
Processing of information
Toy Example: mass of an elementary particle

- Prior knowledge: 0 ≤ m ≤ m_upper = 0.2
- Measurement uncertainty: σ = 0.15
- Measured data: d₁ = 0.09, d₂ = −0.2, d₃ = 0.05

Maximum likelihood: m = −0.02 (< 0!)

Bayes: 1) Assign the prior: flat on 0 ≤ m ≤ m_upper, zero outside
2) Assign the likelihood: p(d|m,σ,I) = ∏_i N(d_i; m, σ²)
3) Compute the posterior using Bayes' theorem
Processing of information
Posterior: contains the complete information.

Summarizing quantities:
- mode: m = 0
- mean: ⟨m⟩ = 0.06
- 95%-interval: [0; 0.145]
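A minimal grid-based numerical sketch of this toy example (not from the original lecture; it assumes the flat prior on [0, 0.2] and the Gaussian likelihood above):

```python
import numpy as np

# Toy example: posterior for the particle mass m with flat prior on
# [0, m_upper] and Gaussian likelihood (sigma = 0.15), data as on the slide.
d = np.array([0.09, -0.2, 0.05])
sigma = 0.15
m = np.linspace(0.0, 0.2, 2001)          # prior support 0 <= m <= 0.2

loglike = (-0.5 * ((d[None, :] - m[:, None]) / sigma) ** 2).sum(axis=1)
post = np.exp(loglike - loglike.max())
post /= np.trapz(post, m)                # normalized posterior p(m|d,I)

mean = np.trapz(m * post, m)
cdf = np.cumsum(post) * (m[1] - m[0])
print(m[np.argmax(post)], mean, m[np.searchsorted(cdf, 0.95)])
# -> mode m = 0, mean ~ 0.06, 95% upper bound ~ 0.145 (cf. the slide)
```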
Non-Gaussian likelihoods: e.g. for d₃ with β = 0.05.
Processing of information
Marginalization: generalized error propagation

Reasoning about a parameter a: (uncertain) prior information p(a|I); (uncertain) measured data d = D ± ε; physical model D = f(a).

Bayes' theorem:
p(a|d) = p(a) · p(d|a) / p(d)
with prior distribution p(a), likelihood distribution p(d|a), and posterior distribution p(a|d).

Adding an additional (nuisance) parameter b:
p(a|d) = ∫db p(a,b|d) = (1/p(d)) ∫db p(a,b) p(d|a,b)
Clear recipe how to tackle a problem – possibly demanding mathematics/numerics
1) Quantify the information at hand in probability distributions
2) Multiply the probability distributions
3) Marginalize the nuisance parameters
4) Analyze the posterior distribution
Outline

II. Parameter Estimation: Non-Gaussian Likelihoods; Change Points; Deconvolution
Beyond Gauss...
Least square approach:
• Computationally very convenient
• Supported by the law of large numbers
• De-facto standard in the scientific community
Model equation: d_i = μ + ξ_i with ξ_i ~ N(0, σ_i²)
Likelihood:
p(d|μ,σ,I) = ∏_i 1/(√(2π) σ_i) · exp[ −(d_i − μ)² / (2σ_i²) ]

which can be rewritten as a Gaussian in μ with mean value and variance
μ = σ² Σ_i d_i/σ_i²,   σ² = 1 / Σ_i (1/σ_i²)
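A small numerical illustration of these combination formulas (the data values here are made up):

```python
import numpy as np

d = np.array([1.2, 0.9, 1.05])      # example data d_i
s = np.array([0.1, 0.2, 0.15])      # per-datum uncertainties sigma_i

w = 1.0 / s**2                      # weights 1/sigma_i^2
var = 1.0 / w.sum()                 # sigma^2 = 1 / sum_i(1/sigma_i^2)
mu = var * (w * d).sum()            # mu = sigma^2 * sum_i(d_i/sigma_i^2)
print(f"mu = {mu:.4f} +- {np.sqrt(var):.4f}")
```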
Least-squares approach:
• The result should be disturbing: the actual data scatter is not taken into account!
Beyond Gauss...
Least-squares approach:
• The result should be disturbing: the data scatter is not taken into account. Transformations of the data with d_i' = d_i + γ(d_i − μ) do not change the estimated mean and variance! Both data sets yield identical results for mean and variance.
• A particular property of least-squares estimation: the uncertainty estimate has to be correct.
Beyond Gauss...
CODATA-Group:
Panel of experts, evaluating every 5-10 years the best estimates (and uncertainties) of the physical constants, e.g. the elementary charge e, Planck's constant h, the gyromagnetic ratio of the proton γ_p, ...
Non-trivial because of the coupling of the constants and the quality of the input data.
Beyond Gauss...
Applied Method:
Least-Squares
• Deviations from the assumed probability distribution are to be expected: outliers → mixture models (Sivia 1996, Press 1998, ...); wrong uncertainty estimates; small experimental inaccuracies; ...
→ Use generalized distributions: A) generalized Gauss

p(d|G,σ,α) = 1/(2√2 σ Γ(1 + 1/α)) · exp[ −|(d − G)/(√2 σ)|^α ]

Includes the standard Gaussian for α = 2.
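A quick sketch checking the normalization and the α = 2 limit of this density (function name and test values are illustrative):

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import norm

def gen_gauss(d, G, sigma, alpha):
    """Generalized Gaussian as written above."""
    z = np.abs((d - G) / (np.sqrt(2.0) * sigma))
    return np.exp(-z**alpha) / (2.0 * np.sqrt(2.0) * sigma * gamma(1.0 + 1.0 / alpha))

d = np.linspace(-3.0, 3.0, 7)
print(np.allclose(gen_gauss(d, 0.0, 1.0, 2.0), norm.pdf(d)))   # -> True
```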
Beyond Gauss...
α needs to be marginalized: p(d|G,σ) = ∫dα p(α) p(d|G,σ,α)

Which prior for α?
• Definitely bounded
• Jeffreys' prior inapplicable (singular Fisher information matrix)
• Flat prior clearly ruled out
Beyond Gauss...
α needs to be marginalized: p(d|G,σ) = ∫dα p(α) p(d|G,σ,α). Which prior for α?

Approach suggested by e.g. Zellner [1]: maximize the information distance between prior and posterior,
p(α) ~ exp(I(α)), with
I(α) = ∫dd p(d|G,σ,α) ln p(d|G,σ,α)
resulting in, for 0 < α < α₀:
p(α) = (1/Z_α₀) exp(−1/α) / Γ(1 + 1/α)
Beyond Gauss...
• The generalized Gauss still assumes perfect knowledge of the uncertainty σ. In reality only a point estimate s of σ is known (see e.g. [3]): write σ_i = s_i/ω_i, i.e.
p(σ_i|s_i,ω_i) = δ(σ_i − s_i/ω_i)

The MaxEnt principle specifies p(ω) (expectation value = 1):
p(ω) = exp(−Σ_i ω_i)

Then:
p(d|G,s) = ∫dω p(ω) ∫dσ ∏_i δ(σ_i − s_i/ω_i) p(d_i|G,σ_i)

resulting in
p(d|G,s) = ∏_i [1/(s_i √(2π))] · [1/(2x_i²)] · [1 − (√π/(2x_i)) exp(1/(4x_i²)) erfc(1/(2x_i))],   with x_i = (d_i − G)/(s_i √2)
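A numerical sanity check of this marginalized likelihood (a sketch; it uses the overflow-safe scaled function erfcx(z) = exp(z²) erfc(z) instead of the literal exp·erfc product):

```python
import numpy as np
from scipy.special import erfcx          # erfcx(z) = exp(z**2) * erfc(z)

def p_marg(d, G, s):
    x = np.abs(d - G) / (s * np.sqrt(2.0))   # likelihood is symmetric in d - G
    bracket = 1.0 - np.sqrt(np.pi) / (2.0 * x) * erfcx(1.0 / (2.0 * x))
    return bracket / (2.0 * x**2 * s * np.sqrt(2.0 * np.pi))

for d in [1e-3, 0.1, 1.0, 10.0, 100.0]:
    print(d, p_marg(d, 0.0, 1.0))
# -> finite value ~ 1/sqrt(2 pi) for d -> 0; ~ 1/d^2 falloff (heavy tails)
```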
Beyond Gauss: Marginalized Uncertainty
Properties of p(d|G,s):
• Heavy tails: ~ 1/x_i²
• Singularities cancel: for x_i → 0 the exp·erfc term behaves as ~ 1 − 2x_i², so the likelihood remains finite.
Beyond Gauss: Marginalized Uncertainty
• Data for Newton's constant of gravitation (offset from the median):
G [10⁻¹¹ m³ kg⁻¹ s⁻²]
LS: 6.674 274 (+67; −67);  GG: 6.674 200 (+71; −103);  GM: 6.674 233 (+49; −88)
Beyond Gauss: Marginalized Uncertainty
• Does the data support the use of LS?
- Birge ratio R_B = √(χ²/(N−1)) = 2.35 (instead of 1)
  Approach by CODATA: multiply the LS uncertainty by 10.
- Posterior estimate of p(α|d,σ): strong deviation from α = 2

Validation: generation of mock data with the same mean and variance but different R_B:
is the large spread just a small-sample effect (N_data = 8)?
Beyond Gauss: Marginalized Uncertainty
• p(α|d,σ) should center around α = 2 for increasing sample size.
Validation: generation of mock data with the same mean and variance and R_B = 1:

N_D   ⟨α⟩    σ
 8    4.26   2.09
16    3.58   1.49
32    2.30   0.46
64    1.96   0.25
Beyond Gauss: Marginalized Uncertainty
•Robustness against outliers of the three approaches:
- use of additional fictitious data point
- response of estimate
Conclusion: LS unacceptable; GM very good; GG very similar.
Beyond Gauss: Marginalized Uncertainty
Extensions to multivariate problems: elementary charge and Planck's constant from the Josephson constant K_J and the von Klitzing constant R_K (K_J = 2e/h; R_K = h/e²) [2]
Beyond Gauss: Marginalized Uncertainty
Comparison of LS, GG, GM
• Both GG and GM take into account deviations from (assumed) Gaussian distribution
• GM exhibits outlier robustness with sensible prior assumptions
→ The GM approach is recommended whenever doubts about the (Gaussian) data distribution exist.
Open issue: - Appropriate error model?
→ Mixture modelling
Beyond Gauss: Summary I
Combining incompatible measurements
• Fraction β of data may suffer from mis-specification: Mixture models
• Assignment is a-priori unknown
Mixture Models: Outlier Detection
• Some data points exhibit a wrong scaling of their uncertainty, e.g. σ → ωσ.
Combining incompatible measurements
Mixture Models: Outlier Detection

Likelihood: each datum is either regular or has a wrongly scaled uncertainty, giving a two-component mixture,
p(d_i|μ,σ,ω,β,I) = (1 − β) p_A(d_i|μ,σ) + β p_A(d_i|μ,ωσ)
with outlier fraction β and scaling factor ω. The form of the likelihood p_A depends on the assumed uncertainty properties.
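A minimal sketch of this mixture classification for the slide's example values (it assumes μ, β and ω are known, purely for illustration; in the full treatment they are inferred):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
N, beta, omega, sigma, mu = 100, 0.1, 10.0, 0.1, 1.0     # slide's example values
is_out = rng.random(N) < beta
d = mu + rng.normal(0.0, sigma * np.where(is_out, omega, 1.0))

like_ok = norm.pdf(d, mu, sigma)                 # regular component
like_out = norm.pdf(d, mu, omega * sigma)        # inflated-uncertainty component
mix = (1.0 - beta) * like_ok + beta * like_out   # mixture likelihood per datum

p_out = beta * like_out / mix                    # Bayes theorem per datum
print("flagged:", int(np.sum(p_out > 0.5)), "true outliers:", int(is_out.sum()))
```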
Mixture Models: Outlier Detection
Example: N=100, β=0.1, ω=10, σ=0.1
Mixture Models: Outlier Detection
Posterior distribution for the mean μ and the outlier probability β:
mean μ and outlier probability β are uncorrelated and independent of s and ω → use of flat priors.
Typically brute-force numerics is needed; here a β-marginalization is possible:
Mixture Models: Outlier Detection
Interlude: the β-distribution and β-function
• Often relevant for probabilities on probabilities (Bernoulli problems)
• Definition of the beta function: B(a,b) = ∫₀¹ dx x^(a−1) (1−x)^(b−1) = Γ(a)Γ(b)/Γ(a+b)
Mixture Models: Outlier Detection
The β-distribution, defined on the domain x ∈ [0,1]:
p(x|a,b) = x^(a−1) (1−x)^(b−1) / B(a,b)
Mixture Models: Outlier Detection
The β-distribution:
• Very flexible distribution on [0,1]
Mixture Models: Outlier Detection
Rewrite in terms of β-distributions:
the coefficients C can be derived from the equality of the two representations and the resulting recurrence relation (υ = 1, ..., n).
Mixture Models: Outlier Detection
Marginal distribution for β:
Mixture Models: Outlier Detection
• Example: N=100, β=0.1, ω=10, σ=0.1, μ=1
Detection of changes in a data series: Are there changes? How many? Where?

Example: time series of the onset of cherry blossom*

Plausible hypotheses:
- constant onset
- linear trend
- change point(s): piecewise linear
Change point analysis
Detection of changes in a data series: Are there changes? How many? Where?
Example: change in mean and variance at an unknown location
Change point analysis
Example: Change in mean and variance at unknown location m
Change point analysis
Completing the square in the exponent (standard trick for Gaussians):
Mean and variance are required but unknown: marginalization→ need prior distributions
Example: Change in mean and variance at unknown location m
Change point analysis
Marginalization over mean μ and variance σ :
Plugging in the prior distributions:
Integration over the mean (limit W → ∞): standard Gaussian integral
Change point analysis
Marginalization over variances σ : again of standard type
Which yields
(factor ½ missing? (UvT))
Prior for change point: flat
Change point analysis
Now all terms are specified: compute posterior prob. for all values of m
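A minimal numerical sketch of this change-point posterior (assuming flat priors on the segment means and Jeffreys priors 1/σ on the segment standard deviations; the per-segment marginalization is the standard textbook result):

```python
import numpy as np
from scipy.special import gammaln

def log_seg(d):
    """Log marginal likelihood of one segment, mean and sigma integrated out."""
    n = len(d)
    Q = np.sum((d - d.mean())**2)                    # residual sum of squares
    return (-0.5 * (n - 1) * np.log(2.0 * np.pi) - 0.5 * np.log(n)
            + gammaln(0.5 * (n - 1)) - 0.5 * (n - 1) * np.log(0.5 * Q))

rng = np.random.default_rng(0)
d = np.concatenate([rng.normal(0.0, 1.0, 80),        # change point at m = 80
                    rng.normal(1.5, 2.0, 40)])

ms = np.arange(2, len(d) - 2)                        # >= 2 points per segment
logp = np.array([log_seg(d[:m]) + log_seg(d[m:]) for m in ms])
post = np.exp(logp - logp.max()); post /= post.sum() # flat prior for m
print("MAP change point:", ms[np.argmax(post)])      # close to 80
```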
Change point analysis
The data have been generated with change point m = 80.
Please note: so far no pdf for the mean and the variance is given. Exercise: derive the equations for the mean and variance distributions in both parts.
Detection of changes in a data series: Are there changes? How many? Where?

Example: time series of the onset of cherry blossom*

Plausible hypotheses: constant onset; linear trend; change point(s): piecewise linear

- Model: piecewise linear, with change point E = x_k
- Likelihood: ~

*) V. Dose, A. Menzel, GCB 10, p. 259 (2004)
Change point analysis
- Posterior for the change point position E:
- How about f, σ in the likelihood? They are nuisance parameters → marginalize them.

Maximum: mid-eighties (~1985); the same result is found for several other species.

Broad probability distribution of the change points: prediction requires weighted averaging, with rapidly increasing confidence limits.
Change point analysis
Linear deconvolution of apparatus function: - Ubiquitous problem in physics: Most measurements suffer from limited resolution
Example: Rutherford Backscattering Analysis
- 2.6 MeV 4He on Co/Si Isotopically pure
apparatus function A
- 2.6 MeV 4He on Cu/Si spectra to be deconvolved
Convolution: d_i = D_i + ε_i = Σ_{j=1..N} A_ij f_j + ε_i
Deconvolution: f_uc = A⁻¹ d
Noise and prior knowledge have to be taken into account...

Deconvolution I
Likelihood function: Poisson (counting experiment):
p(d|D) = ∏_{i=1..N} (D_i^{d_i}/d_i!) exp(−D_i),   with D_i = Σ_j A_ij f_j

Prior knowledge (in this case):
- neighboring channels of f are correlated (unknown correlation length): adaptive kernels
- positive (additive) distribution h: entropic prior (Skilling 1991),
p(h|α,I) ~ exp(α S(h)),   with S = −∫dx h(x) ln[h(x)/m(x)]

Deconvolution I
Combining everything:
Likelihood x prior:
Posterior for f:
Deconvolution I
R. Fischer et al, PRE55,6667 (1997)
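A MAP-level sketch of this combination (Poisson likelihood, entropic prior with the full Skilling form S = Σ(f − m − f ln(f/m)); the apparatus function, test spectrum, and hand-picked α are made up for illustration, unlike the paper's full treatment):

```python
import numpy as np
from scipy.optimize import minimize

n = 50
x = np.arange(n)
A = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 2.0)**2)    # Gaussian apparatus fct.
A /= A.sum(axis=1, keepdims=True)

f_true = 5.0 + 80.0 * np.exp(-0.5 * ((x - 25) / 1.5)**2)   # sharp true spectrum
rng = np.random.default_rng(2)
d = rng.poisson(A @ f_true)                                # counted data

m, alpha = np.full(n, float(d.mean())), 1.0                # default model, weight
def neg_log_post(u):
    f = np.exp(u)                                          # f = exp(u) > 0
    D = A @ f
    loglike = np.sum(d * np.log(D) - D)                    # Poisson (no d_i! term)
    S = np.sum(f - m - f * np.log(f / m))                  # entropic prior S(f)
    return -(loglike + alpha * S)

res = minimize(neg_log_post, np.log(m), method="L-BFGS-B")
f_map = np.exp(res.x)
print("recovered peak:", f_map.max(), "true peak:", f_true.max())
```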
Nonlinear deconvolution: RBS-Depth profile estimation
RBS - Rutherford Backscattering Spectrometry
- Widely used for near-surface layer analysis of solids (2-20 µm)
- Elemental composition and depth profiling of individual elements
- Spectra depend nonlinearly on the elemental composition, due to stopping power and energy-dependent cross sections: D_i = g(c(x))

Deconvolution II

[Figure: scattering geometry (energy-sensitive detector; angles α, β, θ) and RBS spectra (counts vs. channel: initial, 550°C, 650°C, simulation) for Ag (1.2×10¹⁸ at/cm²) and TiN (1.0×10¹⁷ at/cm²) layers on a Si substrate.]
Nonlinear deconvolution: RBS-Depth profile estimation
Erosion / Deposition in ASDEX Upgrade Divertor: Isotope marker experiment: 13C/12C-sample
Deconvolution II
[Sketch: ¹³C marker layer on ¹²C, before and after plasma exposure.]
Evaluation hampered by overlapping signal contributions: quantitative evaluation challenging:
Nonlinear forward modeling coupled with adaptive kernels
Nonlinear deconvolution: RBS-Depth profile estimation *)
Erosion / Deposition in ASDEX Upgrade Divertor: Isotope marker experiment
Deconvolution II
[Sketch: ¹³C marker layer on ¹²C, before and after exposure, with the reconstructed depth profiles.]
Simultaneous erosionand deposition:
Long standing tension with spectroscopic measurements mitigated
U. von Toussaint et al, NJP, 1999
Outline

III. Model Comparison: Basic Concepts; Mass Spectroscopy; Background Subtraction
Model Comparison
How many boxes are in the picture*)?
Desired: Occam's Razor (Prefer simpler models (that fit the data))
*) MacKay, Information Theory
Model Comparison
Posterior for θ
Bayesian Approach:
• Bayesian model comparison always requires alternative models
Model Comparison
*) T. Loredo, Garching, 2003
Mass Spectrometry

Quadrupole mass spectrometry with electron impact ionization: A + e⁻ ⇒ A⁺ + 2e⁻

Versatile tool for neutral gas analysis: fast, high dynamic range, flexible, ...

Electron impact ionization can lead to complex, instrument-dependent fragmentation patterns:
CH₄ + e⁻ ⇒ CH₃⁺ + H + 2e⁻;  CH₂⁺ + 2H + 2e⁻;  CH⁺ + 3H + 2e⁻;  C⁺ + 4H + 2e⁻;  H⁺ + CH₃ + 2e⁻;  ... (+ ¹³C, D) ...

[Figure: ion source sketch (ion current I_ions, voltage ΔU) and QMS signal (counts/sec, log scale) vs. mass/charge (amu/e) for CH₄ at E_e⁻ = 70 eV.]
Mass Spectrometry
"a mass spectrometrist is someone,who figures out what something is,
by smashing it with a hammerand looking at the pieces"Quote from Th. Schwarz-Selinger
Mass Spectrometry
Model: d = C · x, with cracking matrix C (C and x are uncertain/unknown)
Likelihood: with measurement uncertainties S; E = number of species

Prior terms:
- concentrations x: depending on the experiment (e.g. gas mixture)
- cracking matrix C: exponential prior based on the point estimates of Cornu & Massot (1979)

Probability for a set of species E:
p(E|d,S,I) ∝ p(E) ∫dC ∫dx p(C,x|E,I) p(d|C,x,S,E,I)

→ High-dimensional integrals! MCMC integration
Mass Spectrometry
Measured data: [QMS spectra with 4 orders of magnitude dynamic range]
[Figure: relative intensity (a.u., log scale) vs. mass/charge (amu/q) for a CH₄ plasma; data compared with model 1 and model 6.]

Mass Spectrometry

Input data: signal + error of 34 mass channels for 27 different plasma conditions; calibration measurements + error for 11 species.
Prior: cracking estimates for 14 species.
Output: cracking patterns and concentrations (+ errors!) for 14 species (e.g. C₄H₂, C₃H₆); the 'shifted' cracking pattern for the radical CH₃ is very bad.

*) Th. Schwarz-Selinger, H. Kang, U. von Toussaint et al., various publications (2001-2007)
Mass Spectrometry
model comparison with „Occams Razor“CH3, C2H5, H
fitting noise!0 1 2 3 4 5 6
150
1000
0.0
0.5
1.0
1.5
mis
fit:
1/S
2 (
d -
C ⋅
x)2
975
Ba
yes:
ln
P(E
|D,S
,I)
number of radicals
125
How many radicals are in the plasma?
(<χ2 >
/N)-
1
Neural Networks
• Neural Networks: non-linear parameterized mapping y=y(x,w)
• Workhorse: feedforward network
[Diagram: input layer → hidden layer → output layer]
h_i = f₁( Σ_{k=1..K} w_ik⁽¹⁾ x_k + b_i⁽¹⁾ ),   z = f₂( Σ_{i=1..I} w_i⁽²⁾ h_i + b⁽²⁾ )
• Neural Networks are trained by minimizing an error function ED
• Large non-linear optimization problem: Backpropagation algorithm for weight updates
• Sigmoidal activation functions: f(t) = 1/(1 + e⁻ᵗ)
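A minimal sketch of this mapping (sigmoidal f₁, linear f₂; shapes and random weights are purely illustrative):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

K, I = 3, 5                                            # inputs, hidden neurons
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(I, K)), rng.normal(size=I)   # w^(1), b^(1)
w2, b2 = rng.normal(size=I), rng.normal()              # w^(2), b^(2)

def network(x):
    h = sigmoid(W1 @ x + b1)        # h_i = f1(sum_k w_ik^(1) x_k + b_i^(1))
    return w2 @ h + b2              # z = f2(sum_i w_i^(2) h_i + b^(2)), f2 linear

print(network(np.array([0.1, -0.4, 0.7])))
```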
Soft-X-Ray Diagnostic at W7-X
•Currently favoured setup
(A. Weller):
• Present algorithms for
the tomography problem:
- matrix inversion
- regularized inversion (curvature)
- Maximum Entropy approaches
- Bayesian approaches
•Problem: Algorithms either unreliable or too slow for monitoring (~10ms)
• Solution: Use of fast Neural Networks
Neural Networks
•Problem:
How many cells are necessary?
Trade-off between bias and variance
Approaches to control the complexity:
•Early stopping (requires validation data set)
•Regularization
E = E_D + α Ω(w)
Neural Networks
Regularizers: A few examples
• Simple weight decay: Ω(w) = (1/2) Σ_i w_i²
• Modified weight decay: Ω(w) = Σ_i w_i² / (w_i² + γ²)
• Curvature-based regularizers: Ω(w) = (1/2) Σ_{n,k} (∂²y_n/∂x_k²)²
•Regularizers can be interpreted as prior probability distributions over parameters (MacKay):
Bayesian Probability Theory can be applied
•But: „Arbitrary“ choice of prior...
Hyperplane Priors
Interpretation of hidden neuron output:
g_i = Σ_k w_ik⁽¹⁾ x_k + b_i⁽¹⁾ = b_i⁽¹⁾ (Σ_k a_ik x_k + 1)

g_i = 0 defines a (K−1)-dimensional hyperplane in the K-dimensional x-space.
•Orientation of hyperplane depends on network weights
• If there is a prior for hyperplanes → prior for the first layer of hidden neurons
Hyperplane Priors
• Prior distribution for the weights a:
a_i1 x_1 + a_i2 x_2 + ... + a_iK x_K + 1 = 0

• The prior has to be independent of coordinate transforms (translation, rotation):
p(a_i1, ..., a_iK) da_i1 ... da_iK = p(a'_i1, ..., a'_iK) da'_i1 ... da'_iK

• The invariances determine the prior [1]:
p(a_i1, ..., a_iK) = (1/Z) · 1 / (1 + a_i1² + ... + a_iK²)^((K+1)/2)

• Well-known special case, the straight-line prior for y = mx + t:
p(m,t) = (1/Z) · 1 / (1 + m²)^(3/2)
[1] V. Dose, Bayesian Inference and ME Methods, 22nd International Workshop, AIP, 2003
Hyperplane Priors
• b determines the slope of the decision boundary: testable information
• Slope: s_i² = b_i² Σ_k a_ik²
• Prior according to the Maximum Entropy principle:
p(b|a,λ,I) = (1/Z) exp(−(λ/2) Σ_i b_i² Σ_k a_ik²)
• The prior is marginalized over λ employing Jeffreys' prior.
• The prior is finally given by
p(a,b|I) = (1/Z) · [Σ_i b_i² Σ_k a_ik²]^(−N/2) · ∏_{i=1..N} 1/(Σ_k a_ik²)

Uff...
Moving Plasma
• Simulated data set of a horizontal shift of the plasma
• Same noise level as before
• Note the absence of vertical movements
Reliability of reconstruction
•How trustworthy is the reconstruction?
a) Use NN result to compare measured and reconstructed sensor data (necessary but
not sufficient for inverse problems)
b) Additional check:
Different networks differ outside the "known" regime → indication of unexpected events
Complementary verification methods
An extreme example of model comparison: Neural Networks without training data
U. von Toussaint et al, JAO, 2006
Model Comparison
Outlier detection – Background estimation
Form-free background estimation: the analytical form is unknown.
Signature of the background: smooth variation. Find regions without signal contribution and fit a smooth function to this background data.
Suited models: e.g. cubic splines, thin-plate splines, ...

b(x) = Σ_{v=1..E} Φ(x, ξ_v) c_v

with Φ... spline basis; E... number of spline knots (support points); ξ... spline knot positions; c_v... spline coefficients.

Signal-background separation by a minimal knot spacing Δx. Quantification: mixture model

B_i: datum d_i is purely background: d_i = b_i(c, ξ, E) + ε_i
B̄_i: d_i contains a signal contribution: d_i = b_i(c, ξ, E) + ε_i + s_i

(Example: PIXE spectrum)
Outlier detection – Background estimation
Assigning the respective likelihoods: however, the signal contribution s_i is unknown → marginalize it (∫ds_i ...) with an average signal strength s₀, yielding the likelihood for the mixture model.

W. von der Linden, R. Fischer, V. Dose, various publications (2000)
Outlier detection – Background estimation
Result: a datum d_i is classified as pure background for p(B_i|d) > 0.5 and as signal-containing for p(B_i|d) < 0.5.
Outline
IV. Numerical Interlude: Evidence Integrals
Evidence Integrals
Bayesian inference depends on (high-dimensional) integration...
*J. Skilling 2004; MacKay 2006
Relation to statistical physics: partition function
Why is this integration difficult?
Evidence Integrals
Typical situation for expectation values <f>:
Typical situation for evidence calculation:
Evidence Integrals
Thermodynamic integration: slowly introduce the likelihood structure into the integrand (similar: parallel tempering, perfect tempering):

Z(λ) = ∫dx Γ^λ(x) π(x)

ln Z(λ=1) = ∫₀¹ dλ ∂ln Z(λ)/∂λ = ∫₀¹ dλ ∫dx ρ_λ(x) ln Γ(x) = ∫₀¹ dλ ⟨ln Γ(x)⟩_λ

with Z(0) = 1 and Z(1) = Evidence.
Evidence Integrals
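A sketch of thermodynamic integration on a toy problem with a known answer (prior π = N(0,1), likelihood Γ(x) = N(d=1; x, 1); the power posterior is Gaussian here, so direct sampling stands in for the MCMC one would use in general):

```python
import numpy as np

rng = np.random.default_rng(3)
def log_like(x):                                  # ln Gamma(x)
    return -0.5 * (x - 1.0)**2 - 0.5 * np.log(2.0 * np.pi)

lams = np.linspace(0.0, 1.0, 21)
means = []
for lam in lams:
    prec = 1.0 + lam                              # power posterior ~ pi * Gamma^lam
    x = rng.normal(lam / prec, np.sqrt(1.0 / prec), size=20000)
    means.append(log_like(x).mean())              # <ln Gamma(x)>_lambda

lnZ = np.trapz(means, lams)                       # int_0^1 <ln Gamma>_lambda dlambda
print(lnZ, "exact:", -0.25 - 0.5 * np.log(4.0 * np.pi))
```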
Evidence Integrals
Different approach: Nested sampling...see John's talks :-)
Outline
V. Experimental Design: Basic Concept; Nuclear Reaction Analysis
VI. 'Non-parametric' Models: Gaussian Processes
Experimental Design
Design of experiments:
- maximize the information gain of planned experiments
- best performance for various physical scenarios, hardware constraints, parameter constraints, financial budget

How to quantify the strength (weakness) of an experiment? How to quantitatively choose the experimental design parameter(s) (spectral bands, number and position of lines-of-sight, measurement times)?
Experimental Design
Bayesian Decision Theory
Decisions depend on consequences: one might bet on an improbable outcome if the payoff is large.

Utility functions: compare consequences via a utility quantifying the benefits of a decision.
Choice of action: a (e.g. the next accelerator energy E₀); outcome: o (e.g. the next yield d(E₀)); utility U(a,o).

Deciding amidst uncertainty: we are uncertain of what the outcome will be → average over the possible results.

Expected utility:
EU(a) = Σ_outcomes P(o|a) U(a,o)

The best choice maximizes EU: a* = arg max_a EU(a)
Information as utility. Goal: minimize the estimation uncertainty for the parameter(s) θ, i.e. maximize the information gain.

Utility function: Kullback-Leibler divergence K (generalizes the covariance matrix); a = action, d = expected data, θ = parameters:
Utility(d,a) = K(d,a) = ∫dθ p(θ|d,a) ln[ p(θ|d,a) / p(θ|I) ]

Putting it all together:
EU(a) = ∫dd p(d|a) K(d,a) = ∫dθ p(θ) ∫dd p(d|θ,a) K(d,a)

The best choice maximizes EU: a* = arg max_a EU(a).
High-dimensional integration necessary: Markov chain Monte Carlo methods.

Experimental Design
Putting it all together: Without initial data:
Experimental Design
Putting it all together: With previous data d (which has been measured before action a) :
Experimental Design
Keep in mind (marginalization): p(d_new|a,d) = ∫dθ p(θ|d) p(d_new|θ,a)
Putting it all together: D=D(a,d) With previous data d (which has been measured before action a ) :
Experimental Design
Keep in mind (marginalization): p(d_new|a,d) = ∫dθ p(θ|d) p(d_new|θ,a)
And now? Have a closer look...
Putting it all together: D=D(a) Only two terms are relevant: old posterior and new likelihood
Approximate by posterior samples (e.g. via nested sampling or rejection sampling)
Experimental Design
Approximate by posterior samples
Experimental Design
Just 1-dim integrals remain (no high-dim parameter integration!)
Approximate by posterior samples
Experimental Design
Tutorial example:
Model: d_i = a·cos(b·t_i) + ε_i with ε_i ~ N(0, 0.5)
Prior: p(a|I) flat in [0,10]; p(b|I) = c/b for b in [0.04, 25]
Measurement: d₁(t=1.5) = 1.6
Compute the best time t₂ for the next measurement...
Experimental Design
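A sketch of this tutorial example (assuming the noise specification means σ = 0.5; for fixed Gaussian noise the expected information gain reduces to the entropy of the one-dimensional predictive distribution, which is what the loop below maximizes):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, t1, d1 = 0.5, 1.5, 1.6

# prior samples: a flat in [0,10]; b ~ c/b on [0.04,25], i.e. log b uniform
M = 200000
a = rng.uniform(0.0, 10.0, M)
b = np.exp(rng.uniform(np.log(0.04), np.log(25.0), M))

# posterior samples after d1 via weighted resampling with the d1 likelihood
w = np.exp(-0.5 * ((d1 - a * np.cos(b * t1)) / sigma)**2)
idx = rng.choice(M, size=2000, p=w / w.sum())
a_p, b_p = a[idx], b[idx]

def pred_entropy(t):
    """Entropy of the predictive mixture (1/N) sum_j N(a_j cos(b_j t), sigma^2)."""
    y = a_p * np.cos(b_p * t)
    g = np.linspace(y.min() - 4 * sigma, y.max() + 4 * sigma, 400)
    p = np.exp(-0.5 * ((g[:, None] - y[None, :]) / sigma)**2).mean(axis=1)
    p /= sigma * np.sqrt(2.0 * np.pi)
    p /= np.trapz(p, g)                      # guard against discretization error
    return -np.trapz(p * np.log(p + 1e-300), g)

ts = np.linspace(0.0, 6.0, 121)
EU = [pred_entropy(t) for t in ts]           # only 1-dim integrals!
print("best next measurement time t2 ~", ts[int(np.argmax(EU))])
```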
Experimental Design
- Hydrogen isotope depth profiling using NRA (not too many other ways)
- Nuclear reaction d(³He,p)⁴He: background-free; the probed depth range depends on the beam energy
V. Alimov, J. Roth, NIMB, 2006
Experimental Design
Sequential design: which energy E₀ next?
- The depth profile has to be parametrized: piecewise constant, c(x) = {a₀, a₁, a₂, a₃, ...}, or analytically:
c(x) = a₀ exp[−(x/a₁)^(a₂)]
Experimental Design
Compute EU(E) and select the best energy. Integration by posterior sampling; CPU: 1-4 min.
Experimental Design
Optimize EU using posterior sampling (CPU: 1-4 min). Cycle: prediction - verification.
Experimental Design
Quantification of information:
- the value of measurements / diagnostics / experimental setups becomes accessible
- optimized measurement protocols (on-line sequential design)
• Linear and nonlinear problems can be tackled
• The necessary integrations are computationally demanding: high-dimensional integrals in data and parameter space; anything beyond greedy algorithms is (almost) unexplored (n-step-ahead designs)
• A considerable gain in efficiency and/or accuracy has been demonstrated
• Result-driven automated measurement strategies (→ robotics) are conceivable
Outline
VI. 'Non-parametric' Models: Gaussian Processes
Large Scale Models: Methods in Uncertainty Quantification
Methods applied in Uncertainty Quantification:
- Sampling
- Spectral expansion for simulators: Galerkin approach; stochastic collocation; discrete projection
- Surrogate models (emulators)
Emulators
A simulator is a model of a real process, typically implemented as a computer code. Think of it as a function taking inputs x and giving outputs y: x → Code → y(x).

An emulator is a statistical representation of this function, expressing knowledge/beliefs about what the output will be at any given input(s). It is built using prior information and a training set of model runs.

Use of emulators as surrogates for simulators: focus on Gaussian Processes.
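A minimal Gaussian-process emulator sketch (squared-exponential kernel with hand-set hyperparameters; a cheap analytic test function stands in for the simulator):

```python
import numpy as np

def kernel(X1, X2, amp=1.0, ell=0.4):        # squared-exponential covariance
    return amp**2 * np.exp(-0.5 * (X1[:, None] - X2[None, :])**2 / ell**2)

simulator = lambda x: np.sin(3.0 * x) + 0.5 * x   # stand-in for the code
X = np.linspace(0.0, 2.0, 8)                      # training runs (design points)
y = simulator(X)

K = kernel(X, X) + 1e-8 * np.eye(len(X))          # jitter: deterministic code
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

Xs = np.linspace(0.0, 2.0, 200)                   # prediction inputs
Ks = kernel(Xs, X)
mean = Ks @ alpha                                 # GP posterior mean
v = np.linalg.solve(L, Ks.T)
var = kernel(Xs, Xs).diagonal() - np.sum(v**2, axis=0)   # GP posterior variance
print(np.abs(mean - simulator(Xs)).max(), var.max())
```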
Literature: Gaussian Processes, Rasmussen (available online). [GP introduction slides adapted from [4].]
SOLPS-Application
SOLPS data base: 1500 parameters, collected by D. Coster.
Idea: exploit it for initialization and scans.
SOLPS-Data
• Result of the GP interpolation; color code: normalized variance
SOLPS-Application
• The present data base is "insufficient": physics-based line scans (power, density, temperature, ...) do not cover the input space (e.g. compared to an approximate Latin hypercube)
Closed-loop design
• Present data base insufficient + limited human resources
→ closed-loop data base generation (e.g. based on Bayesian experimental design)

• Design cycle:
- determine the best location(s) for the next simulation(s): utility
- recompute the uncertainty estimates
- check the design criteria: exit?
SOLPS-Application
Idea: Select input-space locations with largest information gain
SOLPS-Application
Input-space locations with the largest information gain: at infinity :-(
→ Need to define a region of interest (ROI) and 'integral' measures
Closed-loop design
Utility function: maximize the KL divergence
KL(N₀; N₁) = ½ log det(Σ₁ Σ₀⁻¹) + ½ tr[ Σ₁⁻¹ ( (μ₀−μ₁)(μ₀−μ₁)ᵀ + Σ₀ − Σ₁ ) ]
or (easier): minimize the variance over the ROI
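A direct implementation of this KL expression (illustrative inputs):

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """KL(N0; N1) for multivariate normals, as in the formula above."""
    S1inv = np.linalg.inv(S1)
    dmu = (mu0 - mu1)[:, None]
    term = np.trace(S1inv @ (dmu @ dmu.T + S0 - S1))
    return 0.5 * (np.log(np.linalg.det(S1 @ np.linalg.inv(S0))) + term)

mu0, S0 = np.zeros(2), np.eye(2)
mu1, S1 = np.array([1.0, 0.0]), np.diag([2.0, 0.5])
print(kl_gauss(mu0, S0, mu1, S1))
```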
Closed-loop design
Iterated closed-loop design: [Figure: function + estimated variance; deviation from the GP model.]
Closed-loop design
Reduction of the integrated deviation: [Figure: convergence curves for varying noise and shape.]
Closed-loop design
KL-ROI-measure → Variance measure
Gaussian Processes: Challenges
• Scaling: O(N³): not yet prohibitive
• Limitation: 'global' scale of the covariance matrix → work in progress (hyperparameter optimization, adaptive segmentation)
• Correlated output:
- standard approach: independent scalar response variables; drawback: prediction not satisfactory (co-variance is ignored)
- difficulty: co-kriging is extremely expensive
Conclusion
• Bayesian probability theory provides a consistent and transparent approach to the cycle of scientific inference
• Incorporation of available prior information is straightforward
• The drawback of numerical complexity is mitigated by new algorithms and increasing computing power
• State of the art: parameter estimation, model comparison
• Still (largely) unexplored:
- the potential of experimental design, e.g. robotics, self-adapting 'measurement' strategies on computer simulations (e.g. automated MD potential generation)
- novelty detection in large-scale simulations / experiments
Thank you for your attention
Integrated data analysis (IDA): Soft X-ray ⊗ Thomson scattering = integrated result. Exploitation of nonlinear correlations.
Outlier detection without censoring: mixture approach. [Figure: AUG Thomson scattering profile with short-term fluctuations.]