Lecture 8 The Principle of Maximum Likelihood


Page 1: Lecture 8  The Principle of Maximum Likelihood

Lecture 8

The Principle of Maximum Likelihood

Page 2: Lecture 8  The Principle of Maximum Likelihood

Syllabus

Lecture 01 Describing Inverse Problems
Lecture 02 Probability and Measurement Error, Part 1
Lecture 03 Probability and Measurement Error, Part 2
Lecture 04 The L2 Norm and Simple Least Squares
Lecture 05 A Priori Information and Weighted Least Squares
Lecture 06 Resolution and Generalized Inverses
Lecture 07 Backus-Gilbert Inverse and the Trade-off of Resolution and Variance
Lecture 08 The Principle of Maximum Likelihood
Lecture 09 Inexact Theories
Lecture 10 Nonuniqueness and Localized Averages
Lecture 11 Vector Spaces and Singular Value Decomposition
Lecture 12 Equality and Inequality Constraints
Lecture 13 L1, L∞ Norm Problems and Linear Programming
Lecture 14 Nonlinear Problems: Grid and Monte Carlo Searches
Lecture 15 Nonlinear Problems: Newton’s Method
Lecture 16 Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Lecture 17 Factor Analysis
Lecture 18 Varimax Factors, Empirical Orthogonal Functions
Lecture 19 Backus-Gilbert Theory for Continuous Problems; Radon’s Problem
Lecture 20 Linear Operators and Their Adjoints
Lecture 21 Fréchet Derivatives
Lecture 22 Exemplary Inverse Problems, incl. Filter Design
Lecture 23 Exemplary Inverse Problems, incl. Earthquake Location
Lecture 24 Exemplary Inverse Problems, incl. Vibrational Problems

Page 3: Lecture 8  The Principle of Maximum Likelihood

Purpose of the Lecture

Introduce the spaces of all possible data and all possible models, and the idea of likelihood

Use maximization of likelihood as a guiding principle for solving inverse problems

Page 4: Lecture 8  The Principle of Maximum Likelihood

Part 1

The spaces of all possible data, all possible models, and the idea of likelihood

Page 5: Lecture 8  The Principle of Maximum Likelihood

viewpoint

the observed data are one point in the space of all possible observations

or

d^obs is a point in S(d)

Page 6: Lecture 8  The Principle of Maximum Likelihood

[Figure: the data space with axes d1, d2, d3 and origin O; plot of d^obs]

Page 7: Lecture 8  The Principle of Maximum Likelihood

[Figure: the data space with axes d1, d2, d3 and origin O, with the point d^obs marked; plot of d^obs]

Page 8: Lecture 8  The Principle of Maximum Likelihood

now suppose … the data are independent

each is drawn from a Gaussian distribution

with the same mean m1 and variance σ²

(but m1 and σ unknown)

Page 9: Lecture 8  The Principle of Maximum Likelihood

[Figure: the data space with axes d1, d2, d3 and origin O; plot of p(d)]

Page 10: Lecture 8  The Principle of Maximum Likelihood

[Figure: plot of p(d) in the data space with axes d1, d2, d3; a cloud centered on the line d1 = d2 = d3 with radius proportional to σ]

Page 11: Lecture 8  The Principle of Maximum Likelihood

now interpret …

p(d^obs) as the probability that the observed data were in fact observed

L = log p(d^obs), called the likelihood

Page 12: Lecture 8  The Principle of Maximum Likelihood

find the parameters in the distribution

maximize p(d^obs) with respect to m1 and σ

maximize the probability that the observed data were in fact observed

the Principle of Maximum Likelihood

Page 13: Lecture 8  The Principle of Maximum Likelihood

Example
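For reference, under the stated assumptions (N independent data, each Gaussian with mean m1 and variance σ²), the likelihood and its stationarity conditions take the standard form:

p(d^obs) = (2πσ²)^(-N/2) exp[ -(1/(2σ²)) Σ_i (d_i^obs - m1)² ]

L = log p(d^obs) = -(N/2) log 2π - N log σ - (1/(2σ²)) Σ_i (d_i^obs - m1)²

∂L/∂m1 = (1/σ²) Σ_i (d_i^obs - m1) = 0
∂L/∂σ = -N/σ + (1/σ³) Σ_i (d_i^obs - m1)² = 0

these are the two equations solved on the next slides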

Page 14: Lecture 8  The Principle of Maximum Likelihood

solving the two equations

Page 15: Lecture 8  The Principle of Maximum Likelihood

solving the two equations

the usual formula for the sample mean

almost the usual formula for the sample standard deviation
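Explicitly, the solutions of the two equations are the standard estimates

m1^est = (1/N) Σ_i d_i^obs

(σ²)^est = (1/N) Σ_i (d_i^obs - m1^est)²

the factor is 1/N rather than 1/(N-1), which is why the second expression is only almost the usual sample standard deviation formula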

Page 16: Lecture 8  The Principle of Maximum Likelihood

these two estimates are linked to the assumption that the data are Gaussian-distributed

a different p.d.f. might give a different formula
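A minimal numerical sketch of these Gaussian estimates (NumPy assumed; the data values are made up for illustration):

import numpy as np

# illustrative "observed" data: N independent draws assumed Gaussian
d_obs = np.array([1.3, 0.9, 1.1, 1.4, 0.8, 1.2])
N = len(d_obs)

# maximum likelihood estimates under the Gaussian assumption
m1_est = d_obs.sum() / N                          # the sample mean
sigma2_est = ((d_obs - m1_est) ** 2).sum() / N    # note the 1/N factor, not 1/(N-1)

print(m1_est, np.sqrt(sigma2_est))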

Page 17: Lecture 8  The Principle of Maximum Likelihood

[Figure: example of a likelihood surface, L(m1, σ) plotted against m1 and σ, with the maximum likelihood point marked]
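A surface like this can be sketched by evaluating the log-likelihood on a grid of (m1, σ) values and locating its peak; a minimal version (NumPy assumed; data values and grid ranges are arbitrary illustrations):

import numpy as np

d_obs = np.array([1.3, 0.9, 1.1, 1.4, 0.8, 1.2])   # illustrative data
N = len(d_obs)

m1_grid = np.linspace(0.0, 2.0, 201)                # trial means
sigma_grid = np.linspace(0.05, 1.0, 191)            # trial standard deviations
M1, SIG = np.meshgrid(m1_grid, sigma_grid, indexing="ij")

# log-likelihood L(m1, sigma) for independent Gaussian data (additive constant dropped)
resid2 = ((d_obs[None, None, :] - M1[..., None]) ** 2).sum(axis=-1)
L = -N * np.log(SIG) - resid2 / (2.0 * SIG ** 2)

# the maximum likelihood point is the peak of the surface
i, j = np.unravel_index(np.argmax(L), L.shape)
print(m1_grid[i], sigma_grid[j])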

Page 18: Lecture 8  The Principle of Maximum Likelihood

[Figure: two example p.d.f.s in the (d1, d2) plane, panels (A) and (B)]

the likelihood maximization process will fail if the p.d.f. has no well-defined peak

Page 19: Lecture 8  The Principle of Maximum Likelihood

Part 2

Using the maximization of likelihood as a guiding principle for solving inverse problems

Page 20: Lecture 8  The Principle of Maximum Likelihood

linear inverse problem with Gaussian-distributed data

with known covariance [cov d]

assume that Gm = d gives the mean of the data distribution
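Under those assumptions the data p.d.f. has the standard multivariate Gaussian form (written here for reference):

p(d) ∝ exp[ -½ (d - Gm)^T [cov d]^(-1) (d - Gm) ]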

Page 21: Lecture 8  The Principle of Maximum Likelihood

principle of maximum likelihood

maximize L = log p(d^obs), i.e. minimize

E = (d^obs - Gm)^T [cov d]^(-1) (d^obs - Gm)

with respect to m

Page 22: Lecture 8  The Principle of Maximum Likelihood

principle of maximum likelihood

maximize L = log p(d^obs), i.e. minimize

E = (d^obs - Gm)^T [cov d]^(-1) (d^obs - Gm)

this is just weighted least squares

Page 23: Lecture 8  The Principle of Maximum Likelihood

principle of maximum likelihood

when the data are Gaussian-distributed, solve Gm = d with weighted least squares

with weighting [cov d]^(-1)
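For reference, the resulting weighted least squares estimate is the standard one:

m^est = [ G^T [cov d]^(-1) G ]^(-1) G^T [cov d]^(-1) d^obs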

Page 24: Lecture 8  The Principle of Maximum Likelihood

special case of uncorrelated data, each datum with a different variance: [cov d]_ii = σ_di²

minimize

Page 25: Lecture 8  The Principle of Maximum Likelihood

special case of uncorrelated data, each datum with a different variance: [cov d]_ii = σ_di²

minimize

errors weighted by their certainty
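A standard reduction of the weighted error in this special case:

E = Σ_i (d_i^obs - (Gm)_i)² / σ_di²

so data with small σ_di (high certainty) carry the largest weight.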

Page 26: Lecture 8  The Principle of Maximum Likelihood

but what about a priori information?

Page 27: Lecture 8  The Principle of Maximum Likelihood

probabilistic representation of a priori information

probability that the model parameters are near m, given by the p.d.f. pA(m)

Page 28: Lecture 8  The Principle of Maximum Likelihood

probabilistic representation of a priori information

probability that the model parameters are near m, given by the p.d.f. pA(m)

centered at the a priori value <m>

Page 29: Lecture 8  The Principle of Maximum Likelihood

probabilistic representation of a priori information

probability that the model parameters are near m, given by the p.d.f. pA(m)

variance reflects uncertainty in the a priori information
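For a Gaussian choice of prior, pA(m) takes the standard form

pA(m) ∝ exp[ -½ (m - <m>)^T [cov m]^(-1) (m - <m>) ]

with <m> the a priori value and [cov m] the a priori covariance.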

Page 30: Lecture 8  The Principle of Maximum Likelihood

[Figure: two a priori p.d.f.s in the (m1, m2) plane, one certain (narrow) and one uncertain (broad), each centered at (<m1>, <m2>)]

Page 31: Lecture 8  The Principle of Maximum Likelihood

[Figure: an a priori p.d.f. in the (m1, m2) plane centered at (<m1>, <m2>)]

Page 32: Lecture 8  The Principle of Maximum Likelihood

[Figure: a priori information expressing a linear relationship between m1 and m2, and its approximation with a Gaussian p.d.f. centered at (<m1>, <m2>)]

Page 33: Lecture 8  The Principle of Maximum Likelihood

[Figure: an a priori p.d.f. in the (m1, m2) plane that is constant (p = constant) over one region and zero (p = 0) elsewhere]

Page 34: Lecture 8  The Principle of Maximum Likelihood

assessing the information content in pA(m)

do we know a little about m or a lot about m?

Page 35: Lecture 8  The Principle of Maximum Likelihood

Information Gain, S

-S called Relative Entropy
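One standard expression for the information gain, assuming pN(m) denotes the null p.d.f. introduced on the next slide:

S = ∫ pA(m) log[ pA(m) / pN(m) ] dm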

Page 36: Lecture 8  The Principle of Maximum Likelihood

Relative Entropy, S, also called Information Gain

the null p.d.f. represents the state of no knowledge

Page 37: Lecture 8  The Principle of Maximum Likelihood

Relative Entropy, S, also called Information Gain

a uniform p.d.f. might work for the null p.d.f.

Page 38: Lecture 8  The Principle of Maximum Likelihood

probabilistic representation of data

probability that the data are near d, given by the p.d.f. pA(d)

Page 39: Lecture 8  The Principle of Maximum Likelihood

probabilistic representation of data

probability that the data are near d, given by the p.d.f. p(d)

centered at the observed data d^obs

Page 40: Lecture 8  The Principle of Maximum Likelihood

probabilistic representation of data

probability that the data are near d, given by the p.d.f. p(d)

variance reflects uncertainty in the measurements

Page 41: Lecture 8  The Principle of Maximum Likelihood

probabilistic representation of both prior information and observed data

assume observations and a priori information are uncorrelated
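Under that assumption the combined p.d.f. is simply the product of the data p.d.f. and the prior p.d.f.; a sketch of the standard Gaussian form:

p(d, m) = p(d) pA(m) ∝ exp[ -½ (d - d^obs)^T [cov d]^(-1) (d - d^obs) - ½ (m - <m>)^T [cov m]^(-1) (m - <m>) ]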

Page 42: Lecture 8  The Principle of Maximum Likelihood

[Figure: example of the combined p.d.f., plotted with axes model m and datum d, with the observed data d^obs and the p.d.f. maximum marked]

Page 43: Lecture 8  The Principle of Maximum Likelihood

the theory d = g(m) is a surface in the combined space of data and model parameters

on which the estimated model parameters and predicted data must lie

Page 44: Lecture 8  The Principle of Maximum Likelihood

the theory d = g(m) is a surface in the combined space of data and model parameters

on which the estimated model parameters and predicted data must lie

for a linear theory the surface is planar

Page 45: Lecture 8  The Principle of Maximum Likelihood

the principle of maximum likelihood says

maximize p(d, m)

on the surface d = g(m)

Page 46: Lecture 8  The Principle of Maximum Likelihood

[Figure: (A) the (model m, datum d) plane showing the combined p.d.f., the curve d = g(m), the observed data d^obs, the p.d.f. maximum, the estimate m^est, and the prediction d^pre; (B) the p.d.f. p(s) as a function of position s along the curve, with its maximum at s_max]

Page 47: Lecture 8  The Principle of Maximum Likelihood

[Figure: same construction as the previous slide, for a case where the estimate m^est lies approximately at the maximum of the p.d.f.; (A) the (model m, datum d) plane with d^obs, d^pre, and d = g(m); (B) p(s) along the curve, with its maximum at s_max]

Page 48: Lecture 8  The Principle of Maximum Likelihood

[Figure: same construction, for a case where the predicted data d^pre is approximately equal to the observed data d^obs; (A) the (model m, datum d) plane with m^est, the p.d.f. maximum, and d = g(m); (B) p(s) along the curve, with its maximum at s_max]

Page 49: Lecture 8  The Principle of Maximum Likelihood

principle of maximum likelihood with

Gaussian-distributed data
Gaussian-distributed a priori information

minimize
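The quantity to be minimized combines the data misfit and the prior misfit; a standard form, written here as Φ(m):

Φ(m) = (d^obs - Gm)^T [cov d]^(-1) (d^obs - Gm) + (m - <m>)^T [cov m]^(-1) (m - <m>)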

Page 50: Lecture 8  The Principle of Maximum Likelihood

this is just weighted least squares, with F and f as sketched below, so we already know the solution
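Sketching the standard construction, with the symmetric square roots of the inverse covariances:

F = [ [cov d]^(-1/2) G  ;  [cov m]^(-1/2) I ]        (rows stacked)
f = [ [cov d]^(-1/2) d^obs  ;  [cov m]^(-1/2) <m> ]  (rows stacked)

so that Φ(m) = (f - Fm)^T (f - Fm) and m^est = (F^T F)^(-1) F^T f.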

Page 51: Lecture 8  The Principle of Maximum Likelihood

solve Fm=f with simple least squares
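A minimal sketch of that solve (NumPy assumed; G, d_obs, cov_d, m_prior, cov_m are placeholders to be supplied; a Cholesky factor stands in for the symmetric inverse square root, which leaves the minimized quadratic form unchanged):

import numpy as np

def solve_with_prior(G, d_obs, cov_d, m_prior, cov_m):
    # whitening matrices W with W^T W = cov^(-1), from Cholesky factors of the inverse covariances
    Wd = np.linalg.cholesky(np.linalg.inv(cov_d)).T
    Wm = np.linalg.cholesky(np.linalg.inv(cov_m)).T
    # stack the weighted data equations on top of the weighted prior equations
    F = np.vstack([Wd @ G, Wm @ np.eye(G.shape[1])])
    f = np.concatenate([Wd @ d_obs, Wm @ m_prior])
    # simple least squares on Fm = f
    m_est, *_ = np.linalg.lstsq(F, f, rcond=None)
    return m_est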

Page 52: Lecture 8  The Principle of Maximum Likelihood

when [cov d] = σ_d² I and [cov m] = σ_m² I
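In that case the estimate reduces to the familiar damped least squares form (with <m> the a priori value; the <m> term drops out when <m> = 0):

m^est = [ G^T G + ε² I ]^(-1) [ G^T d^obs + ε² <m> ],    with ε² = σ_d² / σ_m²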

Page 53: Lecture 8  The Principle of Maximum Likelihood

this provides an answer to the question

What should be the value of ε² in damped least squares?

The answer:

it should be set to the ratio of the variances of the data and the a priori model parameters

Page 54: Lecture 8  The Principle of Maximum Likelihood

if the a priori information is

Hm = h with covariance [cov h]_A, then Fm = f becomes
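A sketch of the generalized construction, with H and h replacing I and <m>:

F = [ [cov d]^(-1/2) G  ;  [cov h]_A^(-1/2) H ]      (rows stacked)
f = [ [cov d]^(-1/2) d^obs  ;  [cov h]_A^(-1/2) h ]  (rows stacked)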

Page 55: Lecture 8  The Principle of Maximum Likelihood

Gm = d^obs with covariance [cov d]
Hm = h with covariance [cov h]_A

m^est = (F^T F)^(-1) F^T f

with F and f as defined on the previous slide

the most useful formula in inverse theory