Probabilistic Inference and Forecasting in Physics

Giulio D'Agostini
Università di Roma La Sapienza e INFN, Roma, Italy

© GdA, WLS-6 12/02/20 1/67

Source file: dagos/WLS/wls_06.pdf




List of contents

◮ Summary and remarks on probabilistic inference/forecasting [3-9]

◮ Normal distribution: inferring µ [10-14]

◮ Comments on priors [15-18]

◮ Normal distribution: forecasting future observations [19-25]

◮ Systematics: intro and exact treatment of offset type [26-42]

◮ Inferring µ from a sample [43-45]

◮ Joint inference of µ and σ: overview, Gibbs sampler and comments on critical issues [46-53]

◮ Fits: general considerations; linear model in JAGS [54-67]

◮ Appendix with details on small-sample inference of µ and σ


Mass −→ Reading

'Mass' ↔ 'µ'; 'Reading' ↔ 'x'

[Figure: the Model links the mass (µ) to the reading (x); the reading to be obtained is uncertain ('?')]


Reading −→ 'true' mass

'Mass': 'µ'; 'Reading': 'x' ⇒ Inference

[Figure: from the observed reading (x), back through the Model, to the uncertain 'true' mass (µ, '?')]

Reading → mass → future reading

'Mass': 'µ'; 'Reading': 'x' ⇒ Forecasting

[Figure: the observed reading constrains µ through the Model, which in turn yields probabilistic forecasts of future readings]

Physicist's language vs probability theory
Some important remarks ('philosophical', if you like)

◮ As we have seen, in probability theory there is no clear distinction between inference and forecasting:
◮ just probabilistic statements on numbers (or 'events') about which we are in a condition of uncertainty.

◮ Similarly, in probability theory there are no causes, strictly speaking, but only conditions.
In general, conditioning does not imply causation!
(to say nothing of spurious correlations. . . )

◮ As we have also seen, in probability theory the most general, complete description of the case is provided by the joint pdf

f(θ, x_f, x | I)

where θ and x_f are the 'uncertain vectors' related to the model parameters and to the future observations;
x is the 'uncertain vector' of the 'data' (in physicist's language).
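The role of the joint pdf f(θ, x_f, x | I) can be sketched numerically on a discrete toy joint: condition on the observed data and then marginalize, exactly as in the formulas that follow. The table below is random and purely illustrative (NumPy is used here only for the sketch; it is not part of the lecture):

```python
# Toy discrete illustration of conditioning and marginalizing a joint pdf,
# mirroring f(theta, x_f | x, I) = f(theta, x_f, x | I) / f(x | I).
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((4, 3, 5))          # axes: (theta, x_f, x); made-up table
joint /= joint.sum()                   # normalize: a proper joint pmf

x_obs = 2                              # index of the observed 'data' value

# f(theta, x_f | x) = f(theta, x_f, x) / f(x)
f_x = joint.sum(axis=(0, 1))[x_obs]    # f(x): marginal probability of the data
cond = joint[:, :, x_obs] / f_x        # 2-D table over (theta, x_f)

# Marginals by summation (the discrete analogue of integration over the pdf):
f_theta_given_x = cond.sum(axis=1)     # f(theta | x)
f_xf_given_x = cond.sum(axis=0)        # f(x_f | x)

assert np.isclose(cond.sum(), 1.0)
assert np.isclose(f_theta_given_x.sum(), 1.0)
```

The same two moves (divide by the marginal of the data, sum out what is not of interest) give every distribution used in the following slides.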

Physicist's language vs probability theory (cont'd)

◮ Given f(θ, x_f, x | I), the 'inference/forecasting' is simply given by

f(θ, x_f | x, I) = f(θ, x_f, x | I) / f(x | I)

while f(θ_i | x, I) and f(x_fi | x, I) are simply obtained by marginalization. Similarly f(θ_i, θ_j | x, I), etc.

◮ The so-called 'chain rule' allows us to rewrite f(θ, x_f, x | I) in many possible ways. The most convenient for us are those which can be mapped into a causal graphical model, e.g.

[Graphical model with nodes λ0, n0, x0, p, λ, n, x, r_S, r_B, T0, T_r, x_B, t, linked as a causal network]
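The chain rule mentioned above can be checked on a small discrete joint: any factorization order rebuilds the same joint table. A minimal NumPy sketch with a random, purely illustrative pmf over three variables (a, b, c):

```python
# Chain rule on a discrete joint: f(a, b, c) can be factored as
# f(a) f(b|a) f(c|a,b), or equally as f(c) f(b|c) f(a|b,c);
# both factorizations reproduce the same joint table.
import numpy as np

rng = np.random.default_rng(2)
joint = rng.random((3, 4, 2))
joint /= joint.sum()                  # joint pmf over (a, b, c)

# Factorization 1: f(a) f(b|a) f(c|a,b)
f_a = joint.sum(axis=(1, 2))
f_b_given_a = joint.sum(axis=2) / f_a[:, None]
f_c_given_ab = joint / joint.sum(axis=2, keepdims=True)
rebuilt1 = f_a[:, None, None] * f_b_given_a[:, :, None] * f_c_given_ab

# Factorization 2 (reversed order): f(c) f(b|c) f(a|b,c)
f_c = joint.sum(axis=(0, 1))
f_b_given_c = joint.sum(axis=0) / f_c[None, :]
f_a_given_bc = joint / joint.sum(axis=0, keepdims=True)
rebuilt2 = f_c[None, None, :] * f_b_given_c[None, :, :] * f_a_given_bc

assert np.allclose(rebuilt1, joint)
assert np.allclose(rebuilt2, joint)
```

Among all these equivalent factorizations, the convenient ones are those whose conditionals follow the causal arrows of the graphical model.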

Inferential nodes vs forecasting nodes

There is indeed a difference between the nodes associated with the parameters of the model ('θ') and the nodes associated with future measurements ('x_f'), as we can also see from the graphical model:

◮ some nodes associated with parameters have no parents, and therefore they need priors;

◮ the nodes associated with future observations do have parents: no priors are needed (see the graphical model in the previous slide)
⇒ observables follow from the model parameters in a (probabilistic) deductive way.

Therefore x_f can be forgotten when we are interested in the inference of the model parameters, and we focus on

f(θ | x, I) = f(θ, x | I) / f(x | I) ∝ f(θ, x | I) = f(x | θ, I) · f0(θ | I)
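That x_f can be forgotten for parameter inference can be verified numerically: marginalizing x_f out of the joint gives the same posterior over θ as likelihood times prior, renormalized. All tables below are random and illustrative, not from the lecture:

```python
# Check that the forecasting node x_f is irrelevant when inferring theta:
# sum x_f out of f(theta, x_f | x) and compare with f(x | theta) * f0(theta).
import numpy as np

n_theta, n_xf, n_x = 5, 4, 6
rng = np.random.default_rng(1)

f0_theta = rng.random(n_theta); f0_theta /= f0_theta.sum()                   # prior f0(theta)
lik = rng.random((n_theta, n_x)); lik /= lik.sum(axis=1, keepdims=True)      # f(x | theta)
pred = rng.random((n_theta, n_xf)); pred /= pred.sum(axis=1, keepdims=True)  # f(x_f | theta)

# Joint from the causal factorization f0(theta) f(x_f | theta) f(x | theta):
joint = f0_theta[:, None, None] * pred[:, :, None] * lik[:, None, :]

x_obs = 3
post_via_joint = joint[:, :, x_obs].sum(axis=1)   # marginalize x_f, condition on x
post_via_joint /= post_via_joint.sum()

post_direct = lik[:, x_obs] * f0_theta            # f(x | theta) * f0(theta)
post_direct /= post_direct.sum()

assert np.allclose(post_via_joint, post_direct)
```

The forecasting node contributes only a factor that sums to one, so it drops out of the inference.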

Sequential use of Bayes' rule

Sticking, to simplify the notation, to 1-D θ and calling x1, x2, etc. the observations in 'order of inclusion' in the analysis (and assuming 'I' implicit):

f(θ | x1) ∝ f(x1 | θ) · f(θ)

f(θ | x1, x2) ∝ f(x1, x2 | θ) · f(θ)
             ∝ f(x2 | x1, θ) · f(x1, θ)
             ∝ f(x2 | x1, θ) · f(x1 | θ) · f(θ)

if x1 and x2 are independent:

             ∝ f(x2 | θ) · f(x1 | θ) · f(θ)

Reasoning easily extended to many x_i.

'Likelihoods' multiply if independent; else ⇒ overall likelihood
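The sequential scheme can be checked on a grid: updating with x1 and then using that posterior as the prior for x2 gives the same result as multiplying both likelihoods at once. A minimal sketch with a Gaussian likelihood and flat prior (the numbers are illustrative):

```python
# Sequential vs batch use of Bayes' rule on a grid of mu values.
import numpy as np

mu_grid = np.linspace(-5.0, 10.0, 2001)
sigma_e = 1.0                          # assumed-known standard deviation

def gauss(x, mu, s):
    """Normal pdf evaluated at x for each mu on the grid."""
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (np.sqrt(2 * np.pi) * s)

prior = np.ones_like(mu_grid)          # flat prior f(mu) on the grid
x1, x2 = 2.3, 1.7                      # two independent observations (made up)

# Sequential: the posterior after x1 becomes the prior for x2
post1 = gauss(x1, mu_grid, sigma_e) * prior
post1 /= post1.sum()
post12_seq = gauss(x2, mu_grid, sigma_e) * post1
post12_seq /= post12_seq.sum()

# Batch: both (independent) likelihoods multiplied at once
post12_batch = gauss(x1, mu_grid, sigma_e) * gauss(x2, mu_grid, sigma_e) * prior
post12_batch /= post12_batch.sum()

assert np.allclose(post12_seq, post12_batch)
```

The normalization constants differ at each step, but they cancel: only the product of the likelihoods with the initial prior matters.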

Inferring µ of the normal distribution
Setting up the problem

[Graphical model: parent nodes µ and σ, child node x]

◮ In general f(x, µ, σ | I)

◮ We start by assuming σ well known; we call it here σe to remember that it is the standard deviation which describes statistical errors.

◮ And let us start from having observed the 'first' value x1 (remember that time order is not important; what matters is the order in which the information is used)

Inferring µ of the normal distribution

◮ σe assumed perfectly known;

◮ x1 observed (≡ 'assumed perfectly known')

[Graphical model: µ and σe as parents of x1]

◮ Our task: f(µ | x1, σe)

◮ In general: f(µ | data, I), where 'data' can be a set of observations

Page 62: Probabilistic Inference and Forecasting in Physicsdagos/WLS/wls_06.pdfSummary and remarks on probabilitic inference/forecasting [3-9] Normal distribution: inferring µ [10-14] Comments

Inferring µ of the normal distribution

µ σe

x1

(Considering implicit the condition σe as well as I )

f (µ | x1) ∝ f (x1 |µ) · f0(µ)

f (µ | x1) =f (x1 |µ) · f0(µ)

f (x1)

=f (x1 |µ) · f0(µ)

∫ +∞−∞ f (x1 |µ) · f0(µ) dµ

c© GdA, WLS-6 12/02/20 12/67

Inferring µ of the normal distribution — Solution for a flat prior

Starting as usual from a flat prior:

f (µ | x1) = (1/(√2π σe)) exp[−(x1−µ)²/(2σe²)] / ∫+∞−∞ (1/(√2π σe)) exp[−(x1−µ)²/(2σe²)] dµ

In the denominator the exponential depends on (x1 − µ)²:
→ the integral over µ is equal to the integral over x1: → 1

f (µ | x1) = (1/(√2π σe)) exp[−(µ−x1)²/(2σe²)]

Note the swap of µ and x1 in the exponent, to emphasize that they now have different roles:

◮ µ is the variable;

◮ x1 is a parameter
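The flat-prior result can be checked numerically: normalizing the likelihood over a dense grid of µ values must reproduce the Gaussian N (x1, σe). A minimal sketch (the values x1 = 2.0 and σe = 0.5 are illustrative, not from the slides):

```python
import numpy as np

# Illustrative numbers (not from the slides)
x1, sigma_e = 2.0, 0.5

# Grid of mu values, wide enough to contain essentially all the probability
mu = np.linspace(x1 - 8 * sigma_e, x1 + 8 * sigma_e, 4001)
dmu = mu[1] - mu[0]

# Likelihood f(x1|mu) seen as a function of mu (a flat prior f0 drops out)
likelihood = np.exp(-(x1 - mu) ** 2 / (2 * sigma_e ** 2))
posterior = likelihood / (likelihood.sum() * dmu)    # normalize over mu

# Closed form: a Gaussian in mu, centred at x1, with std sigma_e
closed = np.exp(-(mu - x1) ** 2 / (2 * sigma_e ** 2)) / (np.sqrt(2 * np.pi) * sigma_e)
```

The grid-normalized posterior and the closed form agree to numerical precision, as the slide's argument about the denominator predicts.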

Inferring µ of the normal distribution

(Figure: from the observation x0 to the inference about µ)

f (µ | x1) = (1/(√2π σe)) exp[−(µ−x1)²/(2σe²)]

Summaries:

E[µ] = x1
σ(µ) = σe

All probability intervals calculated from the pdf
⇒ really probability intervals, and not ‘confidence intervals’(∗)

(∗)The expressions “confidence interval” and “confidence limits” are jeopardized,
having often little to do with ‘confidence’ – sic!
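Since f (µ | x1) is a genuine pdf, any probability interval follows by direct integration of it. A minimal sketch using the Gaussian cdf (the values of x1 and σe are illustrative, not from the slides):

```python
from math import erf, sqrt

# Illustrative numbers (not from the slides)
x1, sigma_e = 8.1, 0.5

def prob_mu_in(a, b, x1, sigma_e):
    """P(a < mu < b) for the posterior mu ~ N(x1, sigma_e) (flat prior)."""
    cdf = lambda z: 0.5 * (1.0 + erf((z - x1) / (sigma_e * sqrt(2.0))))
    return cdf(b) - cdf(a)

p1 = prob_mu_in(x1 - sigma_e, x1 + sigma_e, x1, sigma_e)          # ~0.683
p2 = prob_mu_in(x1 - 2 * sigma_e, x1 + 2 * sigma_e, x1, sigma_e)  # ~0.954
```

These are genuine probability statements about µ, not coverage statements about a procedure.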

Role of the prior

Yes, but the prior?
Remember that (writing σe again)

f (µ | x1, σe) ∝ f (x1 |µ, σe) · f0(µ)

◮ The first factor in the r.h.s. (the ‘likelihood’) prefers a region a few σe ’s around x1.

◮ If f0(µ) is ‘practically flat’ in that region, then it is irrelevant.

◮ Otherwise model it at best and do the math (e.g. by MCMC).

◮ And, please, remember Gauss (well aware of the limitations) . . . and that

“All models are wrong, but some are useful” (G. Box)

And Gauss was the first to realize that the Gaussian is indeed ‘wrong’ !
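The ‘do the math by MCMC’ step can be sketched with a minimal random-walk Metropolis sampler. All numbers here (observation, prior, proposal width, chain length) are illustrative assumptions, and the chain is cross-checked against direct numerical integration of the same posterior:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative numbers (not from the slides): one observation with known
# sigma_e, and a non-flat Gaussian prior on mu
x1, sigma_e = 1.0, 0.5
mu0, sigma0 = 0.0, 0.4

def log_post(m):
    # log f(mu|x1) up to a constant: log-likelihood + log-prior
    return -(x1 - m) ** 2 / (2 * sigma_e ** 2) - (m - mu0) ** 2 / (2 * sigma0 ** 2)

# Random-walk Metropolis sampling of the posterior
samples, cur = [], 0.0
for _ in range(200_000):
    prop = cur + rng.normal(0.0, 0.5)
    if np.log(rng.uniform()) < log_post(prop) - log_post(cur):
        cur = prop
    samples.append(cur)
samples = np.array(samples[20_000:])          # drop burn-in

# Cross-check against direct numerical integration of the same posterior
grid = np.linspace(-2.0, 3.0, 5001)
dens = np.exp(log_post(grid))
dens /= dens.sum() * (grid[1] - grid[0])
mean_exact = (grid * dens).sum() * (grid[1] - grid[0])
```

The same sampler works unchanged for any prior one can evaluate, which is the point of ‘model it at best and do the math’.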

Use of a conjugate prior

As we have already seen, a ‘trick’ developed in order to simplify the calculations is the use of conjugate priors:

Binomial distribution: Beta distribution.
Poisson distribution: Gamma distribution.
Gaussian distribution: Gaussian distribution.

Imagine that our initial prior was of the kind

µ ∼ N (µ◦, σ◦)

then

f (µ | x1, σe , µ◦, σ◦) ∝ exp[−(x1−µ)²/(2σe²)] · exp[−(µ−µ◦)²/(2σ◦²)] ,

resulting in (technical details in the next slide)

f (µ | x1, σe , µ◦, σ◦) = (1/(√2π σA)) exp[−(µ−µA)²/(2σA²)] ,

with

µA = (x1/σe² + µ◦/σ◦²) / (1/σe² + 1/σ◦²)

1/σA² = 1/σe² + 1/σ◦² .
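The update rule above is straightforward to code. A minimal sketch (the function name and the numbers are illustrative):

```python
from math import sqrt

def gaussian_update(x1, sigma_e, mu_prior, sigma_prior):
    """Posterior N(mu_A, sigma_A) for a Gaussian likelihood and Gaussian prior."""
    w_e = 1.0 / sigma_e ** 2        # weight of the observation
    w_0 = 1.0 / sigma_prior ** 2    # weight of the prior
    mu_A = (x1 * w_e + mu_prior * w_0) / (w_e + w_0)
    sigma_A = 1.0 / sqrt(w_e + w_0)
    return mu_A, sigma_A

# With a very vague prior the posterior reduces to N(x1, sigma_e)
mu_A, sigma_A = gaussian_update(x1=2.0, sigma_e=0.5, mu_prior=0.0, sigma_prior=50.0)
```

Note that sigma_A is always smaller than both input standard deviations, since the inverse variances add.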

Other ‘Gaussian tricks’

Here are the details of how to get the previous result:

f (µ) ∝ exp[ −(1/2) · (µ²σ◦² − 2µ x1σ◦² + µ²σe² − 2µµ◦σe²) / (σe² σ◦²) ]

     = exp[ −(1/2) · ( µ² − 2µ (x1σ◦² + µ◦σe²)/(σe² + σ◦²) ) / ( (σe² · σ◦²)/(σe² + σ◦²) ) ]

     = exp[ −(1/2) · (µ² − 2µµA)/σA² ]

     ∝ exp[ −(µ − µA)²/(2σA²) ]

In particular, in the last step the trick of completing the square in the exponent has been used, since adding/removing constant terms in the exponent is equivalent to multiplying/dividing by constant factors. Once we recognize the structure, the normalization is automatic.
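The completing-the-square step can be verified numerically: the product of the two exponentials, divided by the Gaussian in (µ − µA), must be constant in µ — precisely the constant factor dropped in the derivation. A sketch with illustrative numbers:

```python
import numpy as np

# Illustrative numbers (not from the slides)
x1, sigma_e = 2.0, 0.5    # observation and its standard deviation
mu0, sigma0 = 0.0, 1.0    # Gaussian prior on mu

# mu_A and sigma_A from the weighted-average formulas
w_e, w_0 = 1 / sigma_e ** 2, 1 / sigma0 ** 2
mu_A = (x1 * w_e + mu0 * w_0) / (w_e + w_0)
sigma_A = (w_e + w_0) ** -0.5

mu = np.linspace(-1.0, 3.0, 101)
product = (np.exp(-(x1 - mu) ** 2 / (2 * sigma_e ** 2))
           * np.exp(-(mu - mu0) ** 2 / (2 * sigma0 ** 2)))
gauss_A = np.exp(-(mu - mu_A) ** 2 / (2 * sigma_A ** 2))

# Constant factor dropped while completing the square: independent of mu
ratio = product / gauss_A
```

The constant turns out to be exp[−(x1 − µ◦)²/(2(σe² + σ◦²))], i.e. it carries no information about µ.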

Use of a conjugate prior to infer µ of a Gaussian

◮ Unfortunately, the conjugate prior of a Gaussian is not that flexible.

◮ It results in the well-known formula to ‘combine results’ by a weighted average, with weights equal to the inverses of the variances.

◮ In particular

σA < min(σ◦, σe)

→ a measurement improves our knowledge about µ

◮ A flat prior is recovered for σ◦² ≫ σe² (and µ◦ ‘reasonable’).

Predictive distribution

(Graphical model: µ, σe −→ xp , xf)

What shall we observe in a next measurement xf (‘f ’ as ‘future’), given our knowledge on µ based on the previous observation xp?
(Note the new, evocative name for the observation, instead of x1)

Predictive distribution

xp → µ → xf

(Figure: observation of xp → inference of µ, with E(µ) → prediction of xf , with E(xf))

Predictive distribution

Probability theory teaches us how to include the uncertainty concerning µ:

f (x | I ) = ∫+∞−∞ f (x |µ, I ) f (µ | I ) dµ .

Thus, in our case

f (xf | xp) = ∫+∞−∞ f (xf |µ) · f (µ | xp) dµ

            = ∫+∞−∞ (1/(√2π σf)) exp[−(xf − µ)²/(2σf²)] · (1/(√2π σp)) exp[−(µ − xp)²/(2σp²)] dµ

            = (1/(√2π √(σp² + σf²))) exp[−(xf − xp)²/(2 (σp² + σf²))]

In particular, if σp = σf = σ, then

f (xf | xp , σp = σf = σ) = (1/(√2π √2σ)) exp[−(xf − xp)²/(2 (√2σ)²)]
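The predictive result can be cross-checked by Monte Carlo: draw µ from its posterior N (xp, σp), then xf given µ, and compare the spread of xf with √(σp² + σf²). A sketch with illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative numbers (not from the slides)
xp, sigma_p = 3.0, 0.4    # posterior of mu: N(xp, sigma_p)
sigma_f = 0.4             # standard deviation of a future observation given mu

# Propagate the uncertainty on mu: mu ~ f(mu|xp), then xf ~ f(xf|mu)
mu = rng.normal(xp, sigma_p, size=1_000_000)
xf = rng.normal(mu, sigma_f)

# Closed-form prediction: N(xp, sqrt(sigma_p^2 + sigma_f^2))
pred_std = np.sqrt(sigma_p ** 2 + sigma_f ** 2)
```

With σp = σf the predictive standard deviation is √2 σ, the factor appearing on the next slides.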

Problem on the expected x̄f having observed x̄p

Data: x̄p = 8.1234, s = 0.7234, n = 10000

µ = x̄p ± s/√n = 8.1234 ± 0.0072

(based on standard knowledge, including the fact that σe ≈ s with rather good approximation – we shall return to this point later)

Also the question concerning xf (meant as a single observation) is rather easy to answer:

xf = x̄p ± s = 8.12 ± 0.72 (Gaussian)

More interesting is the question concerning x̄f , remembering that an arithmetic average can be considered an equivalent measurement with ‘σe ’ = σ(x̄) = σ(xi)/√n:

x̄f = x̄p ± √2 · s/√n = 8.123 ± 0.010 (Gaussian)
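A quick check of the arithmetic above (the data are those quoted in the slide):

```python
from math import sqrt

# Data quoted in the slide
xp_bar, s, n = 8.1234, 0.7234, 10_000

err_mu = s / sqrt(n)                  # uncertainty on mu from n observations
err_xf = s                            # a single future observation
err_xf_bar = sqrt(2) * s / sqrt(n)    # a future average of n observations

print(round(err_mu, 4), round(err_xf_bar, 3))  # 0.0072 0.01
```

The √2 comes from adding in quadrature the uncertainty on µ and the spread of the future average around µ.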

Expected x̄f having observed x̄p

However, the factor √2 is usually ‘forgotten’

(Glen Cowan, Statistical Data Analysis)

Page 115: Probabilistic Inference and Forecasting in Physicsdagos/WLS/wls_06.pdfSummary and remarks on probabilitic inference/forecasting [3-9] Normal distribution: inferring µ [10-14] Comments

Remark on ‘conventional statistics’Objection:“A method which is ‘classical’ and ‘exact’ cannot be wrong”

c© GdA, WLS-6 12/02/20 25/67

Page 116: Probabilistic Inference and Forecasting in Physicsdagos/WLS/wls_06.pdfSummary and remarks on probabilitic inference/forecasting [3-9] Normal distribution: inferring µ [10-14] Comments

Remark on ‘conventional statistics’Objection:“A method which is ‘classical’ and ‘exact’ cannot be wrong”Uhm. . .

c© GdA, WLS-6 12/02/20 25/67

Page 117: Probabilistic Inference and Forecasting in Physicsdagos/WLS/wls_06.pdfSummary and remarks on probabilitic inference/forecasting [3-9] Normal distribution: inferring µ [10-14] Comments

Remark on ‘conventional statistics’Objection:“A method which is ‘classical’ and ‘exact’ cannot be wrong”Uhm. . .◮ Frequentist ‘gurus’ are champions in misusing terminonology,

thus confusing people (“CL”, “confidence intervals”).

c© GdA, WLS-6 12/02/20 25/67

Page 118: Probabilistic Inference and Forecasting in Physicsdagos/WLS/wls_06.pdfSummary and remarks on probabilitic inference/forecasting [3-9] Normal distribution: inferring µ [10-14] Comments

Remark on ‘conventional statistics’Objection:“A method which is ‘classical’ and ‘exact’ cannot be wrong”Uhm. . .◮ Frequentist ‘gurus’ are champions in misusing terminonology,

thus confusing people (“CL”, “confidence intervals”).◮ Details in GdA, About the proof of the so called

exact classical confidence intervals. Where is the trick?,https://arxiv.org/abs/physics/0605140

c© GdA, WLS-6 12/02/20 25/67

Page 119: Probabilistic Inference and Forecasting in Physicsdagos/WLS/wls_06.pdfSummary and remarks on probabilitic inference/forecasting [3-9] Normal distribution: inferring µ [10-14] Comments

Remark on ‘conventional statistics’Objection:“A method which is ‘classical’ and ‘exact’ cannot be wrong”Uhm. . .◮ Frequentist ‘gurus’ are champions in misusing terminonology,

thus confusing people (“CL”, “confidence intervals”).◮ Details in GdA, About the proof of the so called

exact classical confidence intervals. Where is the trick?,https://arxiv.org/abs/physics/0605140

If you like, the method is exact not because it provides preciselythe correct answer to our problem

c© GdA, WLS-6 12/02/20 25/67

Page 120: Probabilistic Inference and Forecasting in Physicsdagos/WLS/wls_06.pdfSummary and remarks on probabilitic inference/forecasting [3-9] Normal distribution: inferring µ [10-14] Comments

Remark on ‘conventional statistics’

Objection: “A method which is ‘classical’ and ‘exact’ cannot be wrong.” Uhm. . .

◮ Frequentist ‘gurus’ are champions in misusing terminology, thus confusing people (“CL”, “confidence intervals”).
◮ Details in GdA, About the proof of the so called exact classical confidence intervals. Where is the trick?, https://arxiv.org/abs/physics/0605140

If you like, the method is exact not because it provides precisely the correct answer to our problem, but because it results from an exact prescription.

Q. Does the method always produce wrong results?
A. In most routine cases the answer is ‘numerically’ OK. In Frontier Physics cases this is not the case (!). See GdA, Bayesian reasoning versus conventional statistics in High Energy Physics, https://arxiv.org/abs/physics/9811046

Prescriptions?

Objective prescriptions?

Mistrust those who promise you ‘objective’ methods to form up your confidence about the physical world!

Principles?

Too many unnecessary ‘principles’ on the market.

Introducing systematics
Influence quantities

By influence quantities we mean:

→ all kinds of external factors which may influence the result (temperature, atmospheric pressure, etc.);
→ all calibration constants;
→ all possible hypotheses upon which the results may depend (e.g. Monte Carlo parameters).

From a probabilistic point of view, there is no distinction between µ and h: they are all conditional hypotheses for the x, i.e. causes which produce the observed effects. The difference is simply that we are interested in µ rather than in h.

Introducing systematics
Several approaches (within probability theory – no adhockeries!)

Uncertainty due to systematic effects is also included in a natural way in this approach. Let us first define the notation (i is the generic index):

◮ x = {x1, x2, . . . , xnx} is the ‘n-tuple’ (vector) of observables Xi ;
◮ µ = {µ1, µ2, . . . , µnµ} is the n-tuple of true values µi ;
◮ h = {h1, h2, . . . , hnh} is the n-tuple of influence quantities Hi (see ISO GUM).

Taking into account uncertain h
Global inference on f (µ, h)

◮ We can use Bayes’ theorem to make an inference on µ and h. A subsequent marginalization over h yields the p.d.f. of interest:

x ⇒ f (µ, h | x) ⇒ f (µ | x) .

This method, depending on the joint prior distribution f◦(µ, h), can even model possible correlations between µ and h.

Taking into account uncertain h
Conditional inference

◮ Given the observed data, one has a joint distribution of µ for all possible configurations of h:

x ⇒ f (µ | x, h) .

Each conditional result is reweighted with the distribution of beliefs about h, using the well-known law of probability:

f (µ | x) = ∫ f (µ | x, h) f (h) dh .
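This reweighting can be checked numerically with a toy Monte Carlo (a minimal sketch, not from the lecture: the conditional posterior is assumed Gaussian, f (µ | x, h) = N (x − h, σ), with h ∼ N (0, σh), so the marginal must come out N (x, √(σ² + σh²)); all numbers are illustrative):

```python
# Toy check of f(mu|x) = Integral f(mu|x,h) f(h) dh, averaging the
# conditional density over samples of h (all numbers are illustrative).
import math
import random

random.seed(0)
x_obs, sigma, sigma_h = 3.0, 0.2, 0.5

def conditional_pdf(mu, h):
    # assumed conditional posterior: N(x_obs - h, sigma)
    return math.exp(-(mu - (x_obs - h)) ** 2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

hs = [random.gauss(0.0, sigma_h) for _ in range(100_000)]

def marginal_pdf(mu):
    # Monte Carlo estimate of the integral over f(h) dh
    return sum(conditional_pdf(mu, h) for h in hs) / len(hs)

s_tot = math.hypot(sigma, sigma_h)

def exact_pdf(mu):
    # analytic marginal: N(x_obs, sqrt(sigma^2 + sigma_h^2))
    return math.exp(-(mu - x_obs) ** 2 / (2 * s_tot**2)) / (math.sqrt(2 * math.pi) * s_tot)
```

The Monte Carlo average reproduces the analytic marginal within the sampling noise, which is the content of the reweighting formula above.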

Taking into account uncertain h
Conditional inference

[Figure: the sampling distributions f (x | µo, h) and f (x | µo) around the true value µo and the observed value xo, with the resulting posteriors f (µ | xo, h) and f (µ | xo).]

Taking into account uncertain h
Propagation of uncertainties

◮ Essentially, one applies the propagation of uncertainty, whose most general case has been illustrated in the previous section, making use of the following model: one considers a ‘raw result’ on raw values µR for some nominal values of the influence quantities, i.e.

f (µR | x, h◦) ;

then (corrected) true values are obtained as a function of the raw ones and of the possible values of the influence quantities, i.e.

µi = µi (µiR, h) ,

and f (µ) is evaluated by probability rules.

The third form is particularly convenient to make linear expansions which lead to approximate solutions.
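The linear-expansion shortcut can be sketched in one dimension (an illustrative toy, not from the lecture: µ = µR + c·h with independent Gaussian µR and h, so linear propagation predicts Var(µ) = σR² + c²σh²; for a model that is exactly linear, the expansion is of course exact, which makes it a clean test case):

```python
# Monte Carlo check of linear propagation Var(mu) = sR^2 + c^2*sh^2
# for the toy model mu = mu_R + c*h (illustrative values).
import math
import random
import statistics

random.seed(3)
mu_R0, s_R, c, s_h = 5.0, 0.1, 0.4, 0.2
samples = [random.gauss(mu_R0, s_R) + c * random.gauss(0.0, s_h)
           for _ in range(200_000)]
sd = statistics.pstdev(samples)
expected = math.sqrt(s_R**2 + c**2 * s_h**2)
```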

Systematics due to uncertain offset
Model:

◮ The “zero” of the instrument is not usually known exactly, owing to calibration uncertainty.
◮ This can be parametrized assuming that its true value Z is normally distributed around 0 (i.e. the calibration was properly done!) with a standard deviation σZ :

Z ∼ N (0, σZ )

◮ Since the true value of µ is usually independent of the true value of Z , the initial joint probability density function can be written as the product of the marginal ones:

f◦(µ, z) = f◦(µ) f◦(z) = k · 1/(√(2π) σZ) · exp[ −z²/(2σZ²) ] .

◮ X is no longer Gaussian distributed around µ, but around µ + Z :

X ∼ N (µ + Z , σ)

Systematics due to uncertain offset
Application to the single (equivalent) measurement X1, with std σ1

Likelihood:

f (x1 | µ, z) = 1/(√(2π) σ1) · exp[ −(x1 − µ − z)²/(2σ1²) ] .

Joint posterior:

f (µ, z | x1) ∝ 1/(√(2π) σ1) exp[ −(x1 − µ − z)²/(2σ1²) ] · 1/(√(2π) σZ) exp[ −z²/(2σZ²) ]

After joint inference and marginalization:

f (µ | x1) = ∫ 1/(√(2π) σ1) exp[ −(x1 − µ − z)²/(2σ1²) ] · 1/(√(2π) σZ) exp[ −z²/(2σZ²) ] dz
            / ∫∫ 1/(√(2π) σ1) exp[ −(x1 − µ − z)²/(2σ1²) ] · 1/(√(2π) σZ) exp[ −z²/(2σZ²) ] dµ dz .

Integrating we get

f (µ) = f (µ | x1, . . . , f◦(z)) = 1/√(2π (σ1² + σZ²)) · exp[ −(µ − x1)²/(2 (σ1² + σZ²)) ] .

Systematics due to uncertain offset
Technical remark

It may help to know that

∫_{−∞}^{+∞} exp[ b x − x²/a² ] dx = √(a² π) · exp[ a² b²/4 ]
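The formula can be verified numerically (a quick sketch; the a, b values are arbitrary test choices, and the trapezoidal rule is extremely accurate for such fast-decaying integrands):

```python
# Numerical check of  Integral exp(b*x - x^2/a^2) dx = sqrt(a^2*pi) * exp(a^2*b^2/4)
import math

def lhs(a, b, lo=-40.0, hi=40.0, n=16000):
    # trapezoidal rule; the integrand decays like a Gaussian, so truncation
    # at +/- 40 and a step of 0.005 are far more than enough here
    h = (hi - lo) / n
    f = lambda x: math.exp(b * x - x**2 / a**2)
    s = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n))
    return s * h

def rhs(a, b):
    return math.sqrt(a**2 * math.pi) * math.exp(a**2 * b**2 / 4.0)
```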

Systematics due to uncertain offset
Result

f (µ) = f (µ | x1, . . . , f◦(z)) = 1/√(2π (σ1² + σZ²)) · exp[ −(µ − x1)²/(2 (σ1² + σZ²)) ] .

◮ f (µ) is still a Gaussian, but with a larger variance.
◮ The global standard uncertainty is the quadratic combination of that due to the statistical fluctuation of the data sample and the uncertainty due to the imperfect knowledge of the systematic effect:

σtot² = σ1² + σZ² .

◮ This result (a theorem under well stated conditions!) is often used as a ‘prescription’, although there are still some “old-fashioned” recipes which require different combinations of the contributions to be performed.
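A quick simulation of the model (made-up numbers) shows the same quadratic combination: by the symmetry of the Gaussian model, the spread of x1 around the true µ equals the width of the posterior of µ around x1.

```python
# Simulate Z ~ N(0, sigmaZ), X1 ~ N(mu + Z, sigma1) and check that the
# overall spread is sqrt(sigma1^2 + sigmaZ^2)  (illustrative values).
import math
import random
import statistics

random.seed(1)
mu_true, sigma1, sigmaZ = 10.0, 0.4, 0.3
xs = []
for _ in range(200_000):
    z = random.gauss(0.0, sigmaZ)                  # unknown offset
    xs.append(random.gauss(mu_true + z, sigma1))   # observed value

sigma_tot = statistics.pstdev(xs)
expected = math.hypot(sigma1, sigmaZ)              # quadratic combination
```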

Systematics due to uncertain offset
Measuring two quantities with the same instrument

Measuring µ1 and µ2, resulting in x1 and x2. Setting up the model:

Z ∼ N (0, σZ )
X1 ∼ N (µ1 + Z , σ1)
X2 ∼ N (µ2 + Z , σ2)

f (x1, x2 | µ1, µ2, z) = 1/(√(2π) σ1) exp[ −(x1 − µ1 − z)²/(2σ1²) ] × 1/(√(2π) σ2) exp[ −(x2 − µ2 − z)²/(2σ2²) ]
                       = 1/(2π σ1 σ2) exp[ −(1/2) ( (x1 − µ1 − z)²/σ1² + (x2 − µ2 − z)²/σ2² ) ]

Systematics due to uncertain offset
Measuring two quantities with the same instrument

f (µ1, µ2 | x1, x2) = ∫ f (x1, x2 | µ1, µ2, z) f◦(µ1, µ2, z) dz / ∫ . . . dµ1 dµ2 dz

= 1/(2π √((σ1² + σZ²)(σ2² + σZ²)) √(1 − ρ²))
  × exp{ −1/(2 (1 − ρ²)) [ (µ1 − x1)²/(σ1² + σZ²) − 2ρ (µ1 − x1)(µ2 − x2)/√((σ1² + σZ²)(σ2² + σZ²)) + (µ2 − x2)²/(σ2² + σZ²) ] }

where

ρ = σZ²/√((σ1² + σZ²)(σ2² + σZ²)) .

⇒ bivariate normal distribution!

Systematics due to uncertain offset
Summary:

µ1 ∼ N (x1, √(σ1² + σZ²)),   µ2 ∼ N (x2, √(σ2² + σZ²))

ρ = σZ²/√((σ1² + σZ²)(σ2² + σZ²))

Cov(µ1, µ2) = ρ σµ1 σµ2 = ρ √((σ1² + σZ²)(σ2² + σZ²)) = σZ²

Checks, defining S = µ1 + µ2 and D = µ1 − µ2:

D ∼ N (x1 − x2, √(σ1² + σ2²))
S ∼ N (x1 + x2, √(σ1² + σ2² + (2σZ)²))

As more or less intuitively expected from an offset!
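These checks can be reproduced with a short simulation (a sketch with made-up σ’s; it checks the sampling pattern induced by the common offset, which with flat priors is mirrored by the posterior covariance structure):

```python
# Common offset Z: Cov = sigmaZ^2 between the two readings; the offset
# cancels in the difference and counts twice in the sum (illustrative values).
import random

random.seed(2)
N = 200_000
sigma1, sigma2, sigmaZ = 0.5, 0.8, 0.3
x1s, x2s = [], []
for _ in range(N):
    z = random.gauss(0.0, sigmaZ)              # same offset for both readings
    x1s.append(random.gauss(0.0, sigma1) + z)
    x2s.append(random.gauss(0.0, sigma2) + z)

m1, m2 = sum(x1s) / N, sum(x2s) / N
cov = sum((a - m1) * (b - m2) for a, b in zip(x1s, x2s)) / N
var_D = sum((a - b - (m1 - m2)) ** 2 for a, b in zip(x1s, x2s)) / N
var_S = sum((a + b - (m1 + m2)) ** 2 for a, b in zip(x1s, x2s)) / N
```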

An exercise

Two samples of data have been collected with the same instrument. These are the numbers, as they result from a printout (homogeneous quantities, therefore measurement unit omitted):

◮ n1 = 1000, x1 = 10.4012, s1 = 5.7812;
◮ n2 = 2000, x2 = 10.2735, s2 = 5.9324.

We know that the instrument has an offset uncertainty of 0.15.

1. Report the results on µ1, µ2, σ1 and σ2.
2. If you consider the σ’s of the two samples consistent, you might combine the result.
3. Calculate the correlation coefficient between µ1 and µ2.
4. Give also the result on s = µ1 + µ2 and d = µ1 − µ2, including ρ(s, d).
5. Give also the result on z1 = µ1 µ2² and z2 = µ1/µ2, including ρ(z1, z2).
6. Consider also a third data sample, recorded with the same instrument: n3 = 4, x3 = 13.8931, s3 = 4.5371.

Inferring µ from a sample
(Gaussian, independent observations, σ perfectly known)

f (µ | x, σ) ∝ f (x | µ, σ) · f0(µ)
            ∝ ∏i f (xi | µ, σ) · f0(µ) = ∏i 1/(√(2π) σ) e^{−(xi−µ)²/(2σ²)} · f0(µ)
            ∝ exp[ −Σi (xi − µ)²/(2σ²) ] · f0(µ)
            ∝ exp[ −(Σi xi² − 2µ Σi xi + n µ²)/(2σ²) ] · f0(µ)
            ∝ exp[ −(Σi xi²/n − 2µ x̄ + µ²)/(2σ²/n) ] · f0(µ)
            ∝ exp[ −(µ² − 2µ x̄)/(2σ²/n) ] · f0(µ)        (the µ-independent term is absorbed in the proportionality)
            ∝ exp[ −(µ² − 2µ x̄ + x̄²)/(2σ²/n) ] · f0(µ)

Trick: complementing the exponential (completing the square by inserting the µ-independent x̄²).

Inferring µ from a sample
(Gaussian, independent observations, σ perfectly known)

f (µ | x, σ) ∝ exp[ −(µ − x̄)²/(2σ²/n) ] · f0(µ)

In the case of f0(µ) irrelevant (but we know how to act otherwise!) we recognize by eye a Gaussian:

f (µ | x, σ) = 1/(√(2π) σ/√n) · exp[ −(µ − x̄)²/(2 (σ/√n)²) ]

µ is Gaussian around the arithmetic average, with standard deviation σ/√n:

µ ∼ N (x̄, σ/√n)

◮ x̄ is a sufficient statistic (very important concept!)
⇒ x̄ provides the same information about µ as the detailed knowledge of the individual observations x.
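Both statements can be checked numerically (a sketch with made-up data and a flat prior over the grid): the normalized grid posterior matches N (x̄, σ/√n), and a second sample with the same x̄ and n but a different spread yields exactly the same posterior, which is what sufficiency of x̄ means here.

```python
# Grid posterior for mu (flat prior, sigma known) vs the analytic
# N(xbar, sigma/sqrt(n)); made-up data.
import math

sigma = 0.5  # known standard deviation of the observations

def posterior_on_grid(data, grid, step):
    # flat prior: f(mu|x) proportional to exp[-sum (x_i - mu)^2 / (2 sigma^2)]
    w = [math.exp(-sum((x - m) ** 2 for x in data) / (2 * sigma**2)) for m in grid]
    norm = sum(w) * step
    return [v / norm for v in w]

data1 = [9.8, 10.3, 10.1, 9.9, 10.4]    # made-up sample, mean 10.1
data2 = [10.1, 10.1, 10.1, 10.1, 10.1]  # same mean and n, different spread
n, xbar = len(data1), sum(data1) / len(data1)

step = 0.001
grid = [xbar - 1 + step * i for i in range(2001)]
post1 = posterior_on_grid(data1, grid, step)
post2 = posterior_on_grid(data2, grid, step)

s = sigma / math.sqrt(n)                   # analytic posterior std dev
peak = 1.0 / (math.sqrt(2 * math.pi) * s)  # analytic density at mu = xbar
```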

Inferring µ from a sample(Gaussian, independent observations, σ perfectly known)

Exercise

◮ In the last steps we have used the technique ofcomplementing the exponential.

◮ Restart, using a flat prior, from

f (µ | x , σ) ∝ exp

[

−x2 − 2µ x + µ2

2σ2/n

]

and use the ‘Gaussian tricks’ (first and second derivatives ofϕ(µ)) to find E(µ) and Var(µ).

◮ In this case the result is exact, because f (µ | x , σ) is indeedGaussian.(A hint is that d2ϕ(µ)

dµ2 is constant ∀µ)

c© GdA, WLS-6 12/02/20 45/67
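The ‘Gaussian tricks’ can be tried numerically: with ϕ(µ) the log of the posterior kernel, the zero of ϕ′ gives E(µ) and −1/ϕ″ gives Var(µ). A sketch with hypothetical values for x̄, σ and n (not from the text):

```python
# Hypothetical values for the sufficient statistic and the known sigma
xbar, sigma, n = 2.5, 2.0, 20

def phi(mu):
    """log of the posterior kernel of mu (flat prior)."""
    return -(xbar**2 - 2*mu*xbar + mu**2) / (2 * sigma**2 / n)

h = 1e-5
def d1(t): return (phi(t + h) - phi(t - h)) / (2 * h)
def d2(t): return (phi(t + h) - 2*phi(t) + phi(t - h)) / h**2

mode = 0.0 - d1(0.0) / d2(0.0)   # one Newton step: exact, phi is quadratic
var = -1.0 / d2(mode)            # Var(mu) = -1 / phi''(mode)
print(mode, var)                 # close to xbar and sigma**2 / n
```

Since ϕ″ is constant, the single Newton step lands exactly on the mode, confirming the hint in the exercise.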

Joint inference of µ and σ from a sample

f(µ, σ | x) ∝ ∏ᵢ [1/(√(2π) σ)] exp[−(xᵢ − µ)² / (2σ²)] · f₀(µ, σ)

∝ (1/σⁿ) exp[−∑ᵢ (xᵢ − µ)² / (2σ²)] · f₀(µ, σ)

∝ σ⁻ⁿ exp[−(s² + (µ − x̄)²) / (2σ²/n)] · f₀(µ, σ)

(adding and subtracting x̄² inside the average of the (xᵢ − µ)²), with s² = (1/n)∑ᵢ (xᵢ − x̄)², the variance of the sample.

◮ The inference on µ and σ depends only on x̄ and s
(and on the priors, as it has to be!).
⇒ x̄ and s are sufficient statistics.

c© GdA, WLS-6 12/02/20 46/67

Joint inference of µ and σ from a sample
In practice

f(µ, σ | x̄, s) ∝ σ⁻ⁿ exp[−(s² + (µ − x̄)²) / (2σ²/n)] · f₀(µ, σ)

Then

f(µ | x̄, s) = ∫₀⁺∞ f(µ, σ | x̄, s) dσ

f(σ | x̄, s) = ∫₋∞⁺∞ f(µ, σ | x̄, s) dµ

Details in the Appendix, but some remarks are in order:

◮ f(µ | x̄, s) is in general not Gaussian (not even starting from a flat prior!), due to the uncertainty on σ (‘convolution over all possible values’)

◮ It tends to a Gaussian when ‘σ is precisely measured’
⇒ n → ∞

c© GdA, WLS-6 12/02/20 47/67
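That f(µ | x̄, s) is not Gaussian can be seen numerically: marginalizing the joint posterior over σ on a grid (flat priors; the sample summaries below are made up) gives tails heavier than those of N(x̄, s/√n):

```python
import math

n, xbar, s = 10, 5.0, 2.0        # hypothetical sample summaries

def joint(mu, sig):              # kernel of f(mu, sigma | xbar, s), flat priors
    return sig**(-n) * math.exp(-(s**2 + (mu - xbar)**2) / (2 * sig**2 / n))

def marginal(mu):                # integrate sigma out (rectangle rule)
    dsig = 0.005
    return sum(joint(mu, 0.2 + k * dsig) for k in range(2000)) * dsig

scale = s / math.sqrt(n)
ratio = marginal(xbar + 3 * scale) / marginal(xbar)
print(ratio, math.exp(-4.5))     # tail/centre ratio vs the Gaussian value
```

Three ‘standard deviations’ away from x̄, the marginal retains several times more density, relative to its centre, than a Gaussian would.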

Joint inference of µ and σ from a sample
Large sample behaviour starting from uniform priors(∗)
(with ‘std’ for standard deviation, to avoid confusion with the unknown σ)

E(µ) −−‘n→∞’−→ x̄
std(µ) −−n→∞−→ s/√n
µ −−n→∞−→ ∼ N(x̄, s/√n)

E(σ) −−n→∞−→ s
std(σ) −−n→∞−→ s/√(2n)
σ −−n→∞−→ ∼ N(s, s/√(2n))

(∗) The most sensitive one is the prior on σ ⇒ inducing abstract speculations of mathematicians and statisticians who often have little idea of what they are talking about (Gauss was Gauss!).
⇒ See references and links

c© GdA, WLS-6 12/02/20 48/67

Joint inference of µ and σ from a sample
Conditional distributions

Joint distribution:

f(µ, σ | x̄, s) ∝ σ⁻ⁿ exp[−(s² + (µ − x̄)²) / (2σ²/n)] · f₀(µ, σ)

Conditioning µ on a precise value σ = σ∗:

f(µ | x̄, s, σ∗) ∝ σ∗⁻ⁿ exp[−(s² + (µ − x̄)²) / (2σ∗²/n)] · f₀(µ)

∝ exp[−(µ − x̄)² / (2σ∗²/n)] · f₀(µ)

All factors not depending on µ are absorbed in ‘∝’.
In the case of uniform f₀(µ) it turns out that µ is Gaussian around x̄ with standard deviation equal to σ∗/√n.

“Obviously!”: this is equivalent to the choice f₀(σ) = δ(σ − σ∗)

c© GdA, WLS-6 12/02/20 49/67

Joint inference of µ and σ from a sample
Conditional distributions

Joint distribution:

f(µ, σ | x̄, s) ∝ σ⁻ⁿ exp[−(s² + (µ − x̄)²) / (2σ²/n)] · f₀(µ, σ)

Conditioning σ on a precise value µ = µ∗:

f(σ | x̄, s, µ∗) ∝ σ⁻ⁿ exp[−(s² + (µ∗ − x̄)²) / (2σ²/n)] · f₀(σ)

∝ σ⁻ⁿ exp[−K²/σ²] · f₀(σ)

with K² = n (s² + (µ∗ − x̄)²)/2, just a positive constant.
Change of variable σ → τ = 1/σ² (technically convenient):

f(τ | x̄, s, µ∗) ∝ τ^(n/2) exp[−K² τ] · f₀(τ)

We ‘easily’ recognize in τ^(n/2) exp[−K² τ] a Gamma distribution:
→ also f(τ | x̄, s, µ∗) will be a Gamma if a Gamma f₀(τ) is chosen
⇒ Gibbs sampler!

c© GdA, WLS-6 12/02/20 50/67
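One can verify numerically that the kernel τ^(n/2) exp[−K² τ] behaves as a Gamma with shape c = n/2 + 1 and rate r = K², for instance by checking that its mean equals c/r (all numeric values below are made up for illustration):

```python
import math

n, s, xbar, mu_star = 10, 1.5, 3.0, 3.2     # made-up values
K2 = n * (s**2 + (mu_star - xbar)**2) / 2

def kernel(tau):                 # tau^(n/2) * exp(-K2 * tau)
    return tau**(n / 2) * math.exp(-K2 * tau)

# rectangle-rule integration up to tau = 20 (the mass beyond is negligible)
step = 1e-4
taus = [i * step for i in range(1, 200001)]
norm = sum(kernel(t) for t in taus) * step
mean = sum(t * kernel(t) for t in taus) * step / norm
print(mean, (n / 2 + 1) / K2)    # Gamma(c = n/2 + 1, r = K2) has mean c/r
```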

Joint inference of µ and τ (→ σ) from a sample
Sampling the posterior by MCMC using the Gibbs sampler

0) Initialization:
◮ i = 0
◮ choose a suitable Gaussian conjugate for µ (σ₀ → ∞ for ‘flat’);
◮ choose a suitable Gamma conjugate for τ (c = 1, r → 0 for ‘flat’);
◮ choose an arbitrary (but possibly ‘reasonable’) µᵢ;
◮ extract at random τᵢ from f(τ | x̄, s, µᵢ)

Then loop n times:

1) i = i + 1;
extract at random µᵢ from f(µ | x̄, s, τᵢ₋₁);

2) extract at random τᵢ from f(τ | x̄, s, µᵢ);
goto 1)

Try it!

You only need Gaussian and Gamma random number generators (e.g. in R)

c© GdA, WLS-6 12/02/20 51/67
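The loop above can be sketched in Python instead of R, assuming the ‘flat’ choices (σ₀ → ∞, c = 1, r → 0), so that the two full conditionals reduce to a Gaussian and a Gamma (data and ‘true’ values are made up):

```python
import math, random

random.seed(7)
mu_true, sigma_true, n = 3.0, 2.0, 200      # made-up 'true' values
x = [random.gauss(mu_true, sigma_true) for _ in range(n)]

xbar = sum(x) / n
s2 = sum((xi - xbar)**2 for xi in x) / n    # sample variance

mu, chain_mu, chain_sigma = xbar, [], []
for i in range(11000):
    # tau | mu, data ~ Gamma(n/2 + 1, rate K2)   (c = 1, r -> 0 prior)
    K2 = n * (s2 + (mu - xbar)**2) / 2
    tau = random.gammavariate(n / 2 + 1, 1 / K2)   # 2nd argument is the scale
    # mu | tau, data ~ N(xbar, 1/sqrt(n * tau))   (flat prior on mu)
    mu = random.gauss(xbar, 1 / math.sqrt(n * tau))
    if i >= 1000:                                  # discard burn-in
        chain_mu.append(mu)
        chain_sigma.append(1 / math.sqrt(tau))

m_mu = sum(chain_mu) / len(chain_mu)
m_sigma = sum(chain_sigma) / len(chain_sigma)
print(m_mu, m_sigma)             # posterior means, close to 3 and 2
```

Note that `random.gammavariate` is parametrized by shape and scale, so the rate K² enters as 1/K².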

Joint inference of µ and τ (→ σ) with JAGS/rjags
Model (to be written in the model file)

model{
  for (i in 1:length(x)) {
    x[i] ~ dnorm(mu, tau);
  }
  mu ~ dnorm(0.0, 1.0E-6);
  tau ~ dgamma(1.0, 1.0E-6);
  sigma <- 1.0/sqrt(tau);
}

Simulated data

mu.true = 3; sigma.true = 2; sample.n = 20
x = rnorm(sample.n, mu.true, sigma.true)

JAGS calls

library(rjags)                 # provides jags.model() and coda.samples()
data = list(x=x)
inits = list(mu=mean(x), tau=1/var(x))
jm <- jags.model(model, data, inits)
update(jm, 100)                # burn-in
chain <- coda.samples(jm, c("mu","sigma"), n.iter=10000)

c© GdA, WLS-6 12/02/20 52/67

Joint inference of µ and τ (→ σ) with JAGS/rjags
⇒ inf_mu_sigma.R

mu = 2.87, std(mu) = 0.44; sigma = 1.94, std(sigma) = 0.31

c© GdA, WLS-6 12/02/20 53/67

Fits – introduction

◮ In a probabilistic framework the issue of fits is nothing but parametric inference:
◮ set up the model, e.g. µyᵢ = m µxᵢ + c
Note: linearity is between µyᵢ and µxᵢ, not between yᵢ and xᵢ!
◮ apply probability rules;
◮ perform the calculations.

[Graphical model, for each i: θ → µyᵢ; µxᵢ → µyᵢ; µxᵢ → xᵢ; µyᵢ → yᵢ]

→ f(θ | x, y, I)
→ f(m, c | x, y, σ), in the case of a linear fit with “σ’s known a priori” (!)

c© GdA, WLS-6 12/02/20 54/67

Linear fit – introduction

[Graphical model, for each i: θ → µyᵢ; µxᵢ → µyᵢ; µxᵢ → xᵢ; µyᵢ → yᵢ]

◮ Deterministic links between the µx’s and the µy’s.
◮ Probabilistic links between the µx’s and the x’s, and between the µy’s and the y’s (errors on both axes).
◮ ⇒ aim of the fit (σ’s known): {x, y} → θ = (m, c)
◮ If the σx’s and σy’s are unknown and assumed all equal:
{x, y} → θ = (m, c, σx, σy)
◮ etc. . .

c© GdA, WLS-6 12/02/20 55/67

Linear fit – simplest case

f(m, c | x, y, I) ∝ f(x, y | m, c, I) · f₀(m, c)

Simplifying hypotheses:

◮ No error on µx ⇒ µxi = xi: µyi = m xi + c.
◮ Gaussian errors on y, with yi ∼ N(µyi, σi), the σi "known somehow" (or "to be determined in some way").
◮ Independence of the data points.

f(m, c | x, y, σ) ∝ exp[ −∑i (yi − µyi)² / (2σi²) ] · f₀(m, c)
                  ∝ exp[ −(1/2) ∑i (yi − m xi − c)²/σi² ] · f₀(m, c)

⇒ with flat priors the inference depends only on exp[ −(1/2) ∑i (yi − m xi − c)²/σi² ].
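With flat priors, maximizing the posterior above is the same as minimizing the weighted sum of squares, which for a straight line has a closed form (the normal equations). A minimal Python sketch, not from the slides; the helper name `linear_fit` is invented here, and the data are chosen to lie exactly on a line so the answer is known:

```python
import numpy as np

def linear_fit(x, y, sigma):
    """Weighted least squares for y = m*x + c, i.e. the maximum of
    exp[-(1/2) sum_i (y_i - m x_i - c)^2 / sigma_i^2] under flat priors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = 1.0 / np.asarray(sigma, float)**2
    S, Sx, Sy = w.sum(), (w*x).sum(), (w*y).sum()
    Sxx, Sxy = (w*x*x).sum(), (w*x*y).sum()
    delta = S*Sxx - Sx**2
    m = (S*Sxy - Sx*Sy) / delta
    c = (Sxx*Sy - Sx*Sxy) / delta
    return m, c

x = np.arange(1.0, 6.0)
y = 2.0*x + 1.0                      # points exactly on a line
m, c = linear_fit(x, y, np.full(5, 2.0))
print(m, c)                          # -> 2.0 1.0
```

With data exactly on the line the fit recovers (m, c) = (2, 1) whatever the (equal) σ's, as expected from the formulas.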

Least squares and 'Gaussian tricks' on linear fits

f(m, c | x, y, σ) ∝ exp[ −∑i (yi − m xi − c)² / (2σi²) ] · f₀(m, c)

◮ If the prior is irrelevant and the σ's are all equal, then the maximum of the posterior is obtained when the sum of the squares is minimized: ⇒ Least Squares 'Principle'.
◮ You might recognize χ²/2 in the exponent: ⇒ χ² minimization.
◮ As an approximation, one can obtain the best-fit parameters and the covariance matrix by the 'Gaussian trick' ⇒ ϕ(m, c) ∝ χ².

⇒ the same result as the detailed treatment is achieved, simply because the problem is linear! (No guarantee in general!)
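For the linear case the 'Gaussian trick' can be carried out in closed form: the covariance matrix of (m, c) is the inverse of the Hessian of χ²/2. A small Python sketch (illustrative only; equal, known σ's assumed):

```python
import numpy as np

# Hessian of chi^2/2 for y = m*x + c with equal known sigma:
# H = (1/sigma^2) * [[sum x^2, sum x], [sum x, n]];  Cov(m, c) = H^-1
x = np.arange(5.0)                    # x = 0, 1, 2, 3, 4
sigma = 1.0
H = np.array([[np.sum(x*x), np.sum(x)],
              [np.sum(x),   float(x.size)]]) / sigma**2
cov = np.linalg.inv(H)
var_m, var_c, cov_mc = cov[0, 0], cov[1, 1], cov[0, 1]
rho = cov_mc / np.sqrt(var_m * var_c)
print(var_m, var_c, cov_mc, rho)      # 0.1, 0.6, -0.2, rho ≈ -0.816
```

Note the automatic negative correlation between slope and intercept whenever the x's are all positive; it reappears below in the MCMC chain.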

Uncertain standard deviation

In the probabilistic approach it is rather simple: just include σ among the parameters θ to infer.

◮ For example, if we have good reasons to believe that the σ's are all equal, then

f(m, c, σ | x, y) ∝ σ⁻ⁿ exp[ −∑i (yi − m xi − c)² / (2σ²) ] · f₀(m, c, σ)

Even if the prior is flat in all parameters:
◮ methods "based only on the properties of the argument of the exponent" fail, because they miss the contribution from σ⁻ⁿ!
◮ The Gaussian trick applied to the full posterior performs better.

Residuals? OK if there are many points; otherwise we do not take into account the uncertainty on σ and its effect on the probability distribution of m and c.
Note: as long as σ is constant (although unknown) and the prior is flat in m and c, the best estimates of m and c do not depend on σ.
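The role of the σ⁻ⁿ factor can be made concrete: with flat priors, maximizing σ⁻ⁿ exp[−S/(2σ²)] over σ, where S is the residual sum of squares at the best (m, c), gives σ̂² = S/n. A numerical sketch with hypothetical, fixed offsets standing in for noise:

```python
import numpy as np

x = np.arange(1.0, 11.0)
y = 2.0*x + 1.0 + np.array([0.5, -0.3, 0.2, -0.4, 0.1,
                            0.3, -0.2, 0.4, -0.1, -0.5])  # fixed offsets

# best (m, c) from ordinary least squares (flat priors, equal sigma)
m, c = np.polyfit(x, y, 1)
S = np.sum((y - m*x - c)**2)

# profile of the posterior in sigma: log f = -n*log(sigma) - S/(2 sigma^2)
sig = np.linspace(0.05, 2.0, 40000)
logpost = -x.size*np.log(sig) - S/(2*sig**2)
sig_hat = sig[np.argmax(logpost)]
print(sig_hat, np.sqrt(S/x.size))   # the two agree (mode at sqrt(S/n))
```

Dropping the σ⁻ⁿ term would instead push the maximum of the exponential alone towards σ → ∞, which is why methods looking only at the exponent fail here.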

Linear fits with uncertain σ in JAGS

Model

var mu.y[N];
model{
    for (i in 1:N) {
        y[i] ~ dnorm(mu.y[i], tau);
        mu.y[i] <- x[i]*m + c;
    }
    c ~ dnorm(0, 1.0E-6);
    m ~ dnorm(0, 1.0E-6);
    tau ~ dgamma(1.0, 1.0E-6);
    sigma <- 1.0/sqrt(tau);
}

Simulated data

m.true = 2; c.true = 1; sigma.true = 2
x = 1:20
y = m.true * x + c.true + rnorm(length(x), 0, sigma.true)
plot(x, y, col='blue', ylim=c(0, max(y)))
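As a rough picture of what JAGS does under the hood, here is a toy random-walk Metropolis sampler for the same posterior, written as a Python sketch (this is not the slides' code; flat priors on m, c and on σ > 0 are assumed, and start values and step sizes are ad hoc):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(1.0, 21.0)
y = 2.0*x + 1.0 + rng.normal(0.0, 2.0, x.size)   # same simulated setup as above

def logpost(m, c, s):
    # flat priors on m, c and sigma (sigma > 0); Gaussian likelihood
    if s <= 0:
        return -np.inf
    return -x.size*np.log(s) - np.sum((y - m*x - c)**2) / (2*s**2)

m, c, s = 2.0, 1.0, 2.0          # start near the truth (a luxury of a toy example)
lp = logpost(m, c, s)
chain = []
for _ in range(40000):
    mp = m + 0.05*rng.normal()   # random-walk proposals
    cp = c + 0.6*rng.normal()
    sp = s + 0.3*rng.normal()
    lpp = logpost(mp, cp, sp)
    if np.log(rng.uniform()) < lpp - lp:   # Metropolis accept/reject
        m, c, s, lp = mp, cp, sp, lpp
    chain.append((m, c, s))
mm, cm, sm = np.mean(chain[5000:], axis=0)   # discard burn-in
print(mm, cm, sm)   # posterior means, close to 2, 1, 2
```

The posterior means come out close to the true values (2, 1, 2), in line with the JAGS summary shown next.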

Linear fits with uncertain σ in JAGS

[Plot of the simulated data]

Calling JAGS

ns = 10000
jm <- jags.model(model, data, inits)
update(jm, 100)
chain <- coda.samples(jm, c("c","m","sigma"), n.iter=ns)

Linear fits with uncertain σ in JAGS

⇒ linear fit.R; JAGS summary:

c = −0.04 ± 0.96; m = 2.10 ± 0.08; σ = 2.06 ± 0.34

Linear fits with uncertain σ in JAGS

'Check' the result

c <- as.vector(chain[[1]][,1])
m <- as.vector(chain[[1]][,2])
sigma <- as.vector(chain[[1]][,3])
plot(x, y, col='blue', ylim=c(0, max(y)))
abline(mean(c), mean(m), col='red')

Linear fits with uncertain σ in JAGS

Correlation between m and c

plot(m, c, col='cyan')
cat(sprintf("rho(m,c) = %.3f\n", cor(m,c)))

ρ(m, c) = −0.88
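This strong negative correlation is not an accident of the sampling: in the Gaussian approximation with equal σ's it is fixed by the x values alone, ρ(m, c) = −x̄/√⟨x²⟩ (sample mean over the root of the mean of the squares). A one-line check for x = 1, …, 20:

```python
import numpy as np

x = np.arange(1.0, 21.0)
rho = -x.mean() / np.sqrt(np.mean(x**2))
print(round(rho, 2))   # -0.88, matching the value from the JAGS chain
```

Centering the x's (fitting against x − x̄) would remove this correlation entirely.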

Linear fits with uncertain σ in JAGS

Check with R lm() (least squares)

plot(x, y, col='blue', ylim=c(0, max(y)))
abline(mean(c), mean(m), col='red')   # JAGS
abline(lm(y~x), col='black')          # least squares

The lm() line (c = −0.05, m = 2.10) covers the JAGS result perfectly: waste of time? It all depends...
If the purpose was just to get an idea of the trend, then drawing a line with pencil and ruler would have been enough (as suggested to the students of Circuit Lab): m and c ≈ OK: NO FIT: focus on circuits!
Otherwise: ⇒ f(c, m, σ | data points)

Forecasting new µy and new y

Imagine we are interested in "y at xf = 30" (referring to our 'data').

◮ First of all it is important to distinguish
    µy(xf) → µy(µxf)   (no error on x)
    y(xf) → y(µxf)
◮ Then we have to take into account all the uncertainties, including the correlations (not only the covariance matrix!).

Our problem:

f(µyf | data, xf) = ∫ f(µyf | m, c, xf) · f(m, c | data) dc dm

f(yf | data, xf) = ∫ f(yf | µyf) · f(µyf | data, xf) dµyf
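Under the simplifying assumption that σ is exactly known (here σ = 2, the value used to simulate the data), the two integrals collapse to linear error propagation, and the size of the result can be checked by hand. A Python sketch of this hypothetical simplified case (the values quoted later from JAGS are slightly larger precisely because there σ is uncertain):

```python
import numpy as np

x = np.arange(1.0, 21.0)
sigma, xf = 2.0, 30.0
n, xbar = x.size, x.mean()
Sxx = np.sum((x - xbar)**2)

# covariance of (m, c) for known, equal sigma
var_m = sigma**2 / Sxx
var_c = sigma**2 * (1/n + xbar**2/Sxx)
cov_mc = -sigma**2 * xbar / Sxx

# f(mu_yf | data, xf): propagate (m, c) -> mu_yf = m*xf + c
sd_mu = np.sqrt(xf**2*var_m + var_c + 2*xf*cov_mc)
# f(yf | data, xf): add the spread of a single future observation
sd_y = np.sqrt(sd_mu**2 + sigma**2)
print(round(sd_mu, 2), round(sd_y, 2))   # 1.58 2.55
```

Note how the negative covariance term tames what the variances alone would give; forgetting it would badly overstate the forecast uncertainty.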

Forecasting new µy and new y

Including the prediction in the JAGS model

var mu.y[N];
model{
    for (i in 1:N) {
        y[i] ~ dnorm(mu.y[i], tau);
        mu.y[i] <- x[i] * m + c;
    }
    mu.yf <- xf * m + c;     # future 'true value' for x=xf
    yf ~ dnorm(mu.yf, tau);  # future 'observation' for x=xf
    c ~ dnorm(0, 1.0E-6);
    m ~ dnorm(0, 1.0E-6);
    tau ~ dgamma(1.0, 1.0E-6);
    sigma <- 1.0/sqrt(tau);
}

Or we can do the 'integral' by sampling, using the MCMC histories of the quantities of interest (see the previous model, without prediction).
⇒ Left as exercise

Forecasting new µy and new y with JAGS

µy(x = 30) = 63.0 ± 1.7; y(x = 30) = 63.0 ± 2.7
Try with Root ;-) ['data' on the web site]

The End

Appendix on small samples

Inferring µ and σ from a sample
(Gaussian, independent observations)

f(µ, σ | x) ∝ σ⁻ⁿ exp[ −( ⟨x²⟩ − 2µ⟨x⟩ + µ² ) / (2σ²/n) ] · f₀(µ, σ)
            ∝ σ⁻ⁿ exp[ −( ⟨x²⟩ − ⟨x⟩² + ⟨x⟩² − 2µ⟨x⟩ + µ² ) / (2σ²/n) ] · f₀(µ, σ)
            ∝ σ⁻ⁿ exp[ −( s² + (µ − x̄)² ) / (2σ²/n) ] · f₀(µ, σ)

with x̄ = ⟨x⟩ the sample mean and s² = ⟨x²⟩ − ⟨x⟩², the variance of the sample.

◮ The inference on µ and σ depends only on s² and x̄ (and on the priors, as it has to be!).
◮ Evaluate f(µ, σ | x̄, s) and then marginalize:

    f(µ | x̄, s) = ∫₀^∞ f(µ, σ | x̄, s) dσ

    f(σ | x̄, s) = ∫₋∞^+∞ f(µ, σ | x̄, s) dµ

Inferring µ and σ from a sample
(Gaussian, independent observations – prior uniform on σ)

f(µ, σ | x) ∝ σ⁻ⁿ exp[ −( s² + (µ − x̄)² ) / (2σ²/n) ]

Marginalizing¹:

f(µ | x) = ∫₀^∞ f(µ, σ | x) dσ
         ∝ ( (x̄ − µ)² + s² )^(−(n−1)/2)
         ∝ ( 1 + (µ − x̄)²/s² )^(−(n−1)/2)

¹The integral of interest is ∫₀^∞ z⁻ⁿ exp[ −c/(2z²) ] dz = 2^((n−3)/2) Γ((n−1)/2) c^(−(n−1)/2).
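The footnote integral is easy to check numerically; for instance, for n = 5 and c = 2 the closed form gives 2¹ · Γ(2) · 2⁻² = 1/2. A quick sketch (not part of the slides):

```python
import math
import numpy as np

n, cval = 5, 2.0
z = np.linspace(1e-2, 50.0, 200001)
integrand = z**(-n) * np.exp(-cval / (2*z**2))
# trapezoidal rule (the integrand vanishes fast at both ends)
numeric = float(np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(z)))
closed = 2**((n-3)/2) * math.gamma((n-1)/2) * cval**(-(n-1)/2)
print(numeric, closed)   # both ≈ 0.5
```

The substitution t = c/(2z²) reduces the integral to the definition of the Gamma function, which is where the closed form comes from.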

Inferring µ and σ from a sample
(Gaussian, independent observations – prior uniform on σ)

f(µ | x) ∝ ( 1 + (µ − x̄)²/s² )^(−(n−1)/2)    ??

         ∝ ( 1 + (µ − x̄)² / ( (n−2) · s²/(n−2) ) )^(−((n−2)+1)/2)

         ∝ ( 1 + t²/ν )^(−(ν+1)/2)

with

    ν = n − 2,    t = (µ − x̄) / (s/√(n−2)),

that is

    µ = x̄ + (s/√(n−2)) t,

where t is a "Student t" with ν = n − 2 degrees of freedom.

Student t

[Plot: examples of the Student t for ν equal to 1, 2, 5, 10 and 100 (≈ "∞").]

Inferring µ and σ from a sample
(Gaussian, independent observations – prior uniform on µ and σ)

In summary,

    (µ − x̄) / (s/√(n−2)) ∼ Student(ν = n − 2)

    E(µ) = x̄            (n > 3)

    σ(µ) = s/√(n−4)     (n > 4).

The uncertainty on σ increases the probability of the values of µ far from x̄:
◮ not only does the standard uncertainty increase, but the distribution itself changes and, as is 'well known', the t distribution has 'higher' tails.

Page 300: Probabilistic Inference and Forecasting in Physicsdagos/WLS/wls_06.pdfSummary and remarks on probabilitic inference/forecasting [3-9] Normal distribution: inferring µ [10-14] Comments

Inferring µ and σ from a sample(Gaussian, independent observations – prior uniform on µ and σ)

In summary,

µ− x

s/√n − 2

∼ Student(ν = n − 2)

E(µ)(n>3)= x

σ(µ)(n>4)=

s√n − 4

.

The uncertainty on σ increases the probability of the values of µfar from x :

◮ not only the standard uncertainty increases, but thedistribution itself changes and, as ‘well know’ the tdistribution has ‘higher’ tails.

However, when n is very large the Gaussian distribution is recovered(the t-distribution tends to a gaussian), with σ(µ) = s/

√n.

c© GdA, WLS-6 12/02/20 74/67
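These moments follow from the standard ones of the Student t (variance ν/(ν−2) for ν > 2). A quick check with scipy, using made-up n, x̄ and s:

```python
import numpy as np
from scipy.stats import t as student_t

n, xbar, s = 12, 5.0, 1.2     # hypothetical summaries
post = student_t(df=n - 2, loc=xbar, scale=s / np.sqrt(n - 2))

print(post.mean())                          # x̄
print(post.std(), s / np.sqrt(n - 4))       # equal: σ(µ) = s/√(n−4)
```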

Inferring µ and σ from a sample
Misunderstandings and ‘myths’ related to the Student t distribution

Expected value and variance only exist above certain values of n:

    E(µ)  =  x̄             (n > 3)

    σ(µ)  =  s/√(n−4)      (n > 4).

So what?

It is just a reflex of the fact that we have used, for laziness,² priors which are indeed absurd.

◮ In no measurement do we believe that µ and/or σ could be ‘infinite’.

◮ Just plug in some reasonable, although very vague, proper priors, and the problem disappears.

²Flat priors are good for teaching purposes, but when the result clashes with our beliefs it means we have to use priors that match our previous knowledge.
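For instance (a numerical sketch with invented numbers): take n = 3, for which the flat-σ result is a ν = 1, Cauchy-like posterior with no mean or variance, and replace the improper prior by a very vague but proper one, σ uniform on (0, 10 s]. The moments of µ become perfectly finite:

```python
import numpy as np

n, xbar, s = 3, 5.0, 1.2      # hypothetical; ν = n − 2 = 1 with a flat prior on σ

sig = np.linspace(1e-3, 10 * s, 400)      # proper prior: σ uniform on (0, 10 s]
mu = np.linspace(xbar - 60 * s, xbar + 60 * s, 4001)
dmu = mu[1] - mu[0]
M, S = np.meshgrid(mu, sig)

# joint posterior on the grid, up to a constant
logf = -n * np.log(S) - n * (s**2 + (M - xbar) ** 2) / (2 * S**2)
f = np.exp(logf - logf.max())

fmu = f.sum(axis=0)                       # marginal of µ
fmu /= fmu.sum() * dmu

Emu = np.sum(mu * fmu) * dmu
Smu = np.sqrt(np.sum((mu - Emu) ** 2 * fmu) * dmu)
print(Emu, Smu)                           # both finite; E(µ) = x̄ by symmetry
```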

Inferring µ and σ from a sample
(Gaussian, independent observations – prior uniform on µ and σ)

◮ Large n limit:

    E(µ)  →  x̄              (n → ∞)

    σ(µ)  →  s/√n           (n → ∞)

    µ  →  ∼ N(x̄, s/√n)      (n → ∞).
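A numerical illustration of the limit (invented n, x̄, s): for large n the t posterior is barely distinguishable from N(x̄, s/√n) in the bulk of the distribution:

```python
import numpy as np
from scipy.stats import t as student_t, norm

n, xbar, s = 1000, 5.0, 1.2    # hypothetical summaries
x = np.linspace(xbar - 3 * s / np.sqrt(n), xbar + 3 * s / np.sqrt(n), 201)

tpdf = student_t.pdf(x, df=n - 2, loc=xbar, scale=s / np.sqrt(n - 2))
gpdf = norm.pdf(x, loc=xbar, scale=s / np.sqrt(n))

maxreldiff = np.max(np.abs(tpdf / gpdf - 1))
print(maxreldiff)              # at the percent level within ±3 σ(µ)
```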

Inferring µ and σ from a sample
(Gaussian, independent observations – prior uniform on µ and σ)

Marginal f(σ):

f(σ | x̄, s) = ∫₋∞⁺∞ f(µ, σ | x̄, s) dµ

            ∝ σ⁻ⁿ exp[ −n s²/(2σ²) ] · ∫₋∞⁺∞ exp[ −n (x̄ − µ)²/(2σ²) ] dµ

            ∝ σ^(−(n−1)) exp[ −n s²/(2σ²) ]

That is. . . (no special function)
[But if we were to use τ = 1/σ² we would recognize a Gamma. . . ]
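The last proportionality can be checked numerically (a sketch with invented summaries): integrating the joint over µ on a grid and dividing by the claimed σ^(−(n−1)) exp[−n s²/(2σ²)] should give a constant, the Gaussian normalization √(2π/n):

```python
import numpy as np

# Hypothetical summaries (not from the lecture)
n, xbar, s = 10, 5.0, 1.2

mu = np.linspace(xbar - 10 * s, xbar + 10 * s, 2001)
dmu = mu[1] - mu[0]

ratios = []
for sig in np.array([0.6, 0.9, 1.2, 1.8, 2.5]) * s:
    joint = sig ** (-n) * np.exp(-n * (s**2 + (mu - xbar) ** 2) / (2 * sig**2))
    marg = joint.sum() * dmu                   # ∫ f(µ, σ | …) dµ at this σ
    claim = sig ** (-(n - 1)) * np.exp(-n * s**2 / (2 * sig**2))
    ratios.append(marg / claim)

ratios = np.array(ratios)
print(ratios)      # constant in σ, equal to √(2π/n)
```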

Inferring µ and σ from a sample
(Gaussian, independent observations – prior uniform on µ and σ)

[Figure: f(σ/s) versus σ/s, uniform prior in σ; n = 3 (dotted), n = 5 (dashed) and n = 10 (solid).]

    E(σ)    →  s              (n → ∞)

    std(σ)  →  s/√(2n)        (n → ∞)

    σ  →  ∼ N(s, s/√(2n))     (n → ∞).
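The large-n behaviour of the marginal can be verified numerically (a sketch with invented n and s; the log of f(σ) is used to avoid overflow in σ^(−(n−1))):

```python
import numpy as np

n, s = 1000, 1.2     # hypothetical
sig = np.linspace(0.8 * s, 1.2 * s, 4001)
dsig = sig[1] - sig[0]

# log of f(σ) ∝ σ^(−(n−1)) exp(−n s²/(2σ²)), in log form for stability
logf = -(n - 1) * np.log(sig) - n * s**2 / (2 * sig**2)
f = np.exp(logf - logf.max())
f /= f.sum() * dsig

Es = np.sum(sig * f) * dsig
Ss = np.sqrt(np.sum((sig - Es) ** 2 * f) * dsig)
print(Es, Ss, s / np.sqrt(2 * n))    # E(σ) ≈ s, std(σ) ≈ s/√(2n)
```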

Inferring µ and σ from a sample
(Gaussian, independent observations – prior uniform on µ and σ)

Using the “Gaussian trick”

    ϕ(µ, σ) = n ln σ + ( s² + (µ − x̄)² ) / (2σ²/n)

First derivatives:

    ∂ϕ/∂µ  =  (µ − x̄) / (σ²/n)

    ∂ϕ/∂σ  =  n/σ − n ( s² + (µ − x̄)² ) / σ³

From which it follows (equating the derivatives to zero)

    E(µ) = x̄
    E(σ) = s

(They are indeed the modes!)
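The mode can be checked numerically (a sketch with invented summaries and an arbitrary starting point): minimizing ϕ, the minus-log-posterior up to a constant, lands on (x̄, s):

```python
import numpy as np
from scipy.optimize import minimize

n, xbar, s = 10, 5.0, 1.2     # hypothetical summaries

# ϕ = − log(posterior) up to a constant (flat priors)
def phi(p):
    mu, sig = p
    return n * np.log(sig) + n * (s**2 + (mu - xbar) ** 2) / (2 * sig**2)

res = minimize(phi, x0=[4.0, 2.0], method="Nelder-Mead")
print(res.x)      # mode at (x̄, s)
```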

Inferring µ and σ from a sample
(Gaussian, independent observations – prior uniform on µ and σ)

Hessian calculated at µ = x̄ and σ = s (hereafter ‘m’):

    ∂²ϕ/∂µ² |ₘ   =  n/σ² |ₘ  =  n/s²

    ∂²ϕ/∂σ² |ₘ   =  ( −n/σ² + 3 n (s² + (µ − x̄)²)/σ⁴ ) |ₘ  =  2n/s²

    ∂²ϕ/∂µ∂σ |ₘ  =  −2 n (µ − x̄)/σ³ |ₘ  =  0

    ∂²ϕ/∂σ∂µ |ₘ  =  −2 n (µ − x̄)/σ³ |ₘ  =  0

It follows

    std(µ) = s/√n        std(σ) = s/√(2n),

reobtaining the large number limit. And, notice, ρ(µ, σ) = 0.
Q.: Are they independent?
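The Hessian can also be obtained by central finite differences at the mode (a sketch with invented summaries), recovering the analytic entries and the Gaussian-approximation standard deviations:

```python
import numpy as np

n, xbar, s = 10, 5.0, 1.2     # hypothetical summaries

def phi(mu, sig):
    return n * np.log(sig) + n * (s**2 + (mu - xbar) ** 2) / (2 * sig**2)

h = 1e-4    # step for central finite differences at the mode m = (x̄, s)
H = np.empty((2, 2))
H[0, 0] = (phi(xbar + h, s) - 2 * phi(xbar, s) + phi(xbar - h, s)) / h**2
H[1, 1] = (phi(xbar, s + h) - 2 * phi(xbar, s) + phi(xbar, s - h)) / h**2
H[0, 1] = H[1, 0] = (phi(xbar + h, s + h) - phi(xbar + h, s - h)
                     - phi(xbar - h, s + h) + phi(xbar - h, s - h)) / (4 * h**2)

V = np.linalg.inv(H)           # covariance of the Gaussian approximation
print(H)                       # ≈ [[n/s², 0], [0, 2n/s²]]
print(np.sqrt(np.diag(V)))     # ≈ (s/√n, s/√(2n))
```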

Inferring µ and σ from a sample
(Gaussian, independent observations. Expressing the Gaussian in terms of τ = 1/σ²)

f(µ, σ | x) ∝ σ⁻ⁿ exp[ −( s² + (µ − x̄)² ) / (2σ²/n) ] · f₀(µ, σ)

It is technically convenient to use τ = 1/σ²:

f(µ, τ | x) ∝ τ^(n/2) exp[ −(n τ/2) ( s² + (µ − x̄)² ) ] · f₀(µ, τ)

For a fixed µ (and observed s and x̄)

    f(τ | x, µ) ∝ τ^α e^(−β τ) · f₀(τ)

Do you recognize a famous mathematical form?

The other way around, for a fixed τ,

    f(µ | x, τ) ∝ exp[ −(n τ/2) (µ − x̄)² ] · f₀(µ)

⇒ Gibbs sampling
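The two conditionals suggest a minimal Gibbs sampler (a sketch in Python with simulated data, flat f₀, and arbitrary “true” values 5.0 and 1.2): with flat priors, τ | µ is a Gamma with shape n/2 + 1 and rate n(s² + (µ − x̄)²)/2, while µ | τ is Gaussian with mean x̄ and variance 1/(nτ):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (hypothetical "true" values)
x = rng.normal(5.0, 1.2, size=50)
n, xbar = len(x), x.mean()
s2 = np.mean((x - xbar) ** 2)        # s² = Σ (x_i − x̄)² / n

# Gibbs sampling with flat f0(µ, τ):
#   τ | µ, x  ~  Gamma(shape = n/2 + 1, rate = n (s² + (µ − x̄)²) / 2)
#   µ | τ, x  ~  N(x̄, 1/(n τ))
mu, tau = xbar, 1.0
mus, sigmas = [], []
for it in range(20000):
    rate = n * (s2 + (mu - xbar) ** 2) / 2
    tau = rng.gamma(n / 2 + 1, 1.0 / rate)       # numpy uses (shape, scale)
    mu = rng.normal(xbar, 1.0 / np.sqrt(n * tau))
    if it >= 1000:                               # drop a short burn-in
        mus.append(mu)
        sigmas.append(1.0 / np.sqrt(tau))

print(np.mean(mus), np.mean(sigmas))   # close to x̄ and to √s²
```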

Page 327: Probabilistic Inference and Forecasting in Physicsdagos/WLS/wls_06.pdfSummary and remarks on probabilitic inference/forecasting [3-9] Normal distribution: inferring µ [10-14] Comments

Practical introduction to BUGS

◮ Introducing the bug language to build up the models.

◮ Running the model (including data and ‘inits’) in theOpenBUGS GUI.

◮ Analysing the resulting chain in R.

c© GdA, WLS-6 12/02/20 83/67
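As a preview, the model just discussed can be written in the BUGS language roughly as follows (a sketch: the node names and the vague-but-proper prior parameters are illustrative choices, not taken from the lecture; note that `dnorm` in BUGS is parametrized with the precision τ, not the standard deviation):

```
model {
   for (i in 1:n) {
      x[i] ~ dnorm(mu, tau)     # Gaussian with mean mu and precision tau
   }
   mu  ~ dnorm(0, 1.0E-6)       # very vague, but proper, priors
   tau ~ dgamma(0.001, 0.001)
   sigma <- 1/sqrt(tau)         # derived quantity
}
```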