Bayesian Inference
Presenting: Assaf Tzabari
2
Agenda
Basic concepts
Conjugate priors
Generalized Bayes rules
Empirical Bayes
Admissibility
Asymptotic efficiency
3
Basic concepts
$x$ - random vector with density $f(x\mid\theta)$
$\theta$ - unknown parameter with prior density $\pi(\theta)$
Joint density of $x$ and $\theta$: $h(x,\theta) = f(x\mid\theta)\,\pi(\theta)$
Marginal density of $x$: $m(x) = \int h(x,\theta)\,d\theta$
Posterior density of $\theta$: $\pi(\theta\mid x) = \dfrac{h(x,\theta)}{m(x)}$
4
Basic concepts (cont.)
Elements of a decision problem:
$A$ - the set of all possible decisions (actions)
$L(\theta, a)$ - loss function defined for all $\theta \in \Theta$, $a \in A$
$\delta(x): X \to A$ - decision rule
Risk function: $R(\theta, \delta) = E_x\!\left[L(\theta, \delta(x))\right]$
Bayes risk function: $r(\pi, \delta) = E^{\pi}\!\left[R(\theta, \delta)\right]$
5
Basic concepts (cont.)
A Bayes rule is a decision rule which minimizes $r(\pi, \delta)$.
A Bayes rule can be found by choosing, for each $x$, an action which minimizes the posterior expected loss:
$\int L(\theta, a)\,\pi(\theta\mid x)\,d\theta$
or, equivalently, which minimizes:
$\int L(\theta, a)\,f(x\mid\theta)\,\pi(\theta)\,d\theta$
6
Basic concepts (cont.)
Example: Bayesian estimation under MSE
$L(\theta, a) = (a - \theta)^2$
The posterior expected loss is $\int (a - \theta)^2\,\pi(\theta\mid x)\,d\theta$
$\dfrac{d}{da}\int (a - \theta)^2\,\pi(\theta\mid x)\,d\theta = 2a - 2E[\theta\mid x] = 0 \;\Rightarrow\; \delta(x) = E[\theta\mid x]$
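As a numerical illustration of this result (not part of the original slides), the following Python sketch minimizes the posterior expected squared-error loss over a grid of actions and checks that the minimizer matches the posterior mean; the example posterior and the grids are arbitrary assumptions.

```python
import numpy as np

# Illustrative discretized posterior on a uniform grid (any normalized density works here).
theta = np.linspace(-5.0, 5.0, 2001)
dtheta = theta[1] - theta[0]
post = np.exp(-0.5 * (theta - 1.3) ** 2 / 0.4)    # unnormalized example posterior
post /= post.sum() * dtheta                       # normalize numerically

# Posterior expected loss E[(a - theta)^2 | x] over a grid of candidate actions a.
actions = np.linspace(-5.0, 5.0, 2001)
exp_loss = [((a - theta) ** 2 * post).sum() * dtheta for a in actions]

a_star = actions[np.argmin(exp_loss)]             # action minimizing posterior expected loss
post_mean = (theta * post).sum() * dtheta         # E[theta | x]
print(a_star, post_mean)                          # the two agree up to grid resolution
```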
7
Conjugate priors
Definition:
A class $P$ of prior distributions is a conjugate family for $F = \{f(x\mid\theta)\}$ if $\pi(\theta\mid x) \in P$ for all $f \in F$, $\pi \in P$.
Example: the class of normal priors is a conjugate family for the class of normal sample densities,
$x\mid\theta \sim N(\theta, \sigma^2),\ \pi(\theta) \sim N(\mu, \tau^2) \;\Rightarrow\; \pi(\theta\mid x) \sim N(\mu(x), \rho^2),$
where $\mu(x) = \dfrac{\sigma^2\mu + \tau^2 x}{\sigma^2 + \tau^2}$ and $\rho^2 = \dfrac{\sigma^2\tau^2}{\sigma^2 + \tau^2}$
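A small Python sketch of this normal-normal update (my illustration; the numeric values of $x$, $\sigma^2$, $\mu$, $\tau^2$ are arbitrary assumptions):

```python
def normal_posterior(x, sigma2, mu, tau2):
    """Posterior parameters for x|theta ~ N(theta, sigma2), theta ~ N(mu, tau2)."""
    mu_post = (sigma2 * mu + tau2 * x) / (sigma2 + tau2)   # mu(x) from the slide
    rho2 = sigma2 * tau2 / (sigma2 + tau2)                 # posterior variance
    return mu_post, rho2

# Example: one observation x = 2.0, sampling variance 1, prior N(0, 4).
print(normal_posterior(2.0, 1.0, 0.0, 4.0))   # -> (1.6, 0.8)
```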
8
Using conjugate priors
Step 1: Find a conjugate prior
Choose a class with the same form as the likelihood functions $l(\theta) = f(x\mid\theta)$
Step 2: Calculate the posterior
Gather the factors involving $\theta$ in $h(x,\theta)$
9
Using conjugate priors (cont.)
Example: Finding a conjugate prior for the Poisson distribution
$x = (x_1, \ldots, x_n)$ where $x_i \sim P(\theta)$ are iid,
$f(x\mid\theta) = \prod_{i=1}^{n} \dfrac{e^{-\theta}\theta^{x_i}}{x_i!} = c(x)\,e^{-n\theta}\theta^{n\bar{x}} \;\Rightarrow\; l(\theta) \propto e^{-n\theta}\theta^{n\bar{x}}$
The factors fit a gamma distribution of $\theta$. Assuming $\theta \sim G(a, b)$ we get:
$\pi(\theta) = c(a,b)\,\theta^{a-1}e^{-\theta/b}$
$h(x,\theta) = f(x\mid\theta)\,\pi(\theta) = c(x)\,c(a,b)\,\theta^{n\bar{x}+a-1}e^{-(n+1/b)\theta} = m(x)\,\pi(\theta\mid x)$
The posterior is a $G\!\left(n\bar{x} + a,\ \dfrac{1}{n + 1/b}\right)$
10
Using conjugate priors (cont.)
Example (cont.): Finding a conjugate prior for the Poisson distribution
$\theta\mid x \sim G(a', b')$ where $a' = n\bar{x} + a$ and $b' = \dfrac{1}{n + 1/b}$
The Bayes estimator under MSE is then $\delta(x) = E[\theta\mid x] = a'b' = \dfrac{n\bar{x} + a}{n + 1/b}$
The ML estimator is $\delta_{ML}(x) = \bar{x}$
[Figure: Gamma prior densities for (a=1, b=2), (a=2, b=2), (a=3, b=2), (a=10, b=0.5)]
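A hedged Python sketch of the Poisson-Gamma update and the resulting estimators (the simulated data and the hyperparameters $a$, $b$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 3.0
x = rng.poisson(theta_true, size=20)      # iid Poisson samples

a, b = 2.0, 2.0                           # Gamma(a, b) prior, b a scale parameter
n = len(x)

# Posterior: Gamma(n*xbar + a, 1/(n + 1/b)); the Bayes estimator under MSE is its mean.
a_post = n * x.mean() + a
b_post = 1.0 / (n + 1.0 / b)
bayes_est = a_post * b_post               # E[theta | x] = (n*xbar + a) / (n + 1/b)
ml_est = x.mean()                         # ML estimator

print(bayes_est, ml_est)
```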
11
Using conjugate priors (cont.)
More conjugate priors for common statistical distributions:
Binomial $x \sim b(p, n)$ and Beta prior:
$\pi(p) \sim B(a, b) \;\Rightarrow\; \pi(p\mid x) \sim B(a + x,\ b + n - x)$
$\delta(x) = E[p\mid x] = \dfrac{a + x}{a + b + n}$
[Figure: Beta prior densities for (a=2, b=2), (a=0.5, b=0.5), (a=2, b=5), (a=5, b=2)]
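A corresponding sketch for the binomial-Beta pair (the values of $x$, $n$, $a$, $b$ are illustrative):

```python
def beta_binomial_posterior_mean(x, n, a, b):
    """Bayes estimate of p under MSE for x ~ b(p, n) with a Beta(a, b) prior."""
    return (a + x) / (a + b + n)          # mean of the Beta(a+x, b+n-x) posterior

print(beta_binomial_posterior_mean(x=7, n=10, a=2.0, b=2.0))   # -> 9/14 ≈ 0.643
```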
12
Using conjugate priors (cont.)
Uniform iid $x = (x_1, \ldots, x_n)$, $x_i \sim U(0, \theta)$ and Pareto prior:
$\pi(\theta) \sim Pa(\theta_0, a) \;\Rightarrow\; \pi(\theta\mid x) \sim Pa\!\left(\max\{\theta_0, x_1, \ldots, x_n\},\ a + n\right)$
$\delta(x) = E[\theta\mid x] = \dfrac{(a + n)\,\max\{\theta_0, x_1, \ldots, x_n\}}{a + n - 1}$
[Figure: Pareto prior densities for a=1, a=2, a=3]
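A similar sketch for the uniform-Pareto pair (the data, $\theta_0$, and $a$ are illustrative):

```python
import numpy as np

def pareto_posterior_mean(x, theta0, a):
    """Bayes estimate of theta under MSE for iid x_i ~ U(0, theta) with a Pa(theta0, a) prior."""
    n = len(x)
    scale = max(theta0, np.max(x))        # posterior is Pa(max{theta0, x_1..x_n}, a + n)
    return (a + n) * scale / (a + n - 1)  # mean of the Pareto posterior

print(pareto_posterior_mean(np.array([0.4, 1.7, 0.9]), theta0=1.0, a=2.0))   # -> 2.125
```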
13
Conjugate priors (cont.)
Advantages:
Easy to calculate the posterior $\pi(\theta\mid x)$
Intuitive
Useful for sequential estimation
Can a conjugate prior be a reasonable approximation to the true prior?
Not always!
14
Conjugate priors (cont.)
Step 1: subjectively determine $\alpha$-fractiles; a point $z(\alpha)$ is an $\alpha$-fractile if $F(z(\alpha)) = \alpha$
Step 2: look for a matching prior and find the Bayes estimator
Example: estimating $\theta$ under MSE based on $x \sim N(\theta, 1)$
If the median is $z(1/2) = 0$ and the quartiles are $z(1/4) = -1$, $z(3/4) = 1$, there are two matching priors: $N(0, 2.19)$ or the Cauchy $C(0, 1)$.
The normal prior gives $\delta_1(x) = \dfrac{2.19}{3.19}\,x$, while the Cauchy prior gives a different (nonlinear) estimator $\delta_2(x)$.
Only $N(0, 2.19)$ is a conjugate prior, but which is a better estimator?
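A sketch of the two-step recipe for this example: solving for the normal prior whose median is 0 and whose quartiles are $\pm 1$, and forming the resulting Bayes estimate (assumes SciPy; the evaluation point is illustrative):

```python
from scipy.stats import norm

# Quartile condition: P(theta <= 1) = 0.75 for theta ~ N(0, tau^2),
# so tau = 1 / z_{0.75}, giving tau^2 ≈ 2.19 as on the slide.
tau = 1.0 / norm.ppf(0.75)
print(tau ** 2)                       # ≈ 2.19

# Bayes estimator under MSE for x ~ N(theta, 1) with this prior: the posterior mean.
def bayes_estimate(x, tau2):
    return tau2 / (1.0 + tau2) * x    # = (2.19 / 3.19) * x

print(bayes_estimate(2.0, tau ** 2))
```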
15
Improper priors
Improper prior - a prior with infinite mass: $\int \pi(\theta)\,d\theta = \infty$
The Bayes risk has no meaning
The posterior usually exists
Useful in the following cases:
Prior information is not available (noninformative priors are usually improper)
The parameter space is restricted
16
Generalized Bayes rules
Definition: If $\pi$ is an improper prior, a generalized Bayes rule, for given $x$, is an action which minimizes
$\int L(\theta, a)\,f(x\mid\theta)\,\pi(\theta)\,d\theta$
or, if $0 < m(x) < \infty$, which minimizes the posterior expected loss.
Example: estimating $\theta > 0$ under MSE based on $x \sim N(\theta, \sigma^2)$
Noninformative (improper) prior: $\pi(\theta) = I_{(0,\infty)}(\theta)$
$\pi(\theta\mid x) = \dfrac{e^{-(x-\theta)^2/2\sigma^2}\,I_{(0,\infty)}(\theta)}{\int_0^{\infty} e^{-(x-\theta)^2/2\sigma^2}\,d\theta}$
$\delta(x) = E[\theta\mid x] = x + \sigma\,\dfrac{e^{-x^2/2\sigma^2}}{\sqrt{2\pi}\,P(Z \le x/\sigma)}$ where $Z \sim N(0,1)$
17
Generalized Bayes rules (cont.)
[Figure: Bayes estimators (one with $\sigma = 2$) and the ML estimator, plotted against $x$]
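A sketch of the generalized Bayes estimator derived above, written with the standard normal pdf/cdf (assumes SciPy; $\sigma$ and the $x$ values are illustrative):

```python
import numpy as np
from scipy.stats import norm

def gen_bayes_positive_mean(x, sigma):
    """Posterior mean of theta under the improper prior I_(0,inf)(theta), x ~ N(theta, sigma^2)."""
    z = x / sigma
    return x + sigma * norm.pdf(z) / norm.cdf(z)   # mean of N(x, sigma^2) truncated to (0, inf)

for x in (-3.0, 0.0, 3.0):
    print(x, gen_bayes_positive_mean(x, sigma=2.0))   # always positive, approaches x for large x
```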
18
Generalized Bayes rules (cont.)
Generalized Bayes rules are useful in solving problems which don’t include prior information
Example: Location parameter estimation under $L(a - \theta)$
$f(x\mid\theta)$ is a location density with location parameter $\theta$ if $f(x\mid\theta) = f(x - \theta)$
Using $\pi(\theta) = 1$ we get $\pi(\theta\mid x) = f(x - \theta)$, and with the substitution $y = x - \theta$,
$E^{\pi(\theta\mid x)}\!\left[L(a - \theta)\right] = \int L(y - K)\,f(y)\,dy = E^{f(y)}\!\left[L(y - K)\right]$, where $K = x - a$
19
Generalized Bayes rules (cont.)
Example (cont.): Location parameter estimation under $L(a - \theta)$
The generalized Bayes rule has the form
$\delta(x) = x - K_0$, where $K_0$ minimizes $E^{f(y)}\!\left[L(y - K)\right]$
This is a group of invariant rules, and the best invariant rule is the generalized Bayes rule with the prior $\pi(\theta) = 1$
20
Generalized Bayes rules (cont.)
Example (cont.): Location parameter estimation under $L(a - \theta)$
Under MSE, $\delta(x)$ is the posterior mean,
$\delta(x) = \int \theta\,f(x - \theta)\,d\theta = x - E^{f(y)}[y]$
For $x = (x_1, \ldots, x_n)$, Pitman's estimator is derived:
$\delta(x) = \dfrac{\int \theta\,f(x_1 - \theta, \ldots, x_n - \theta)\,d\theta}{\int f(x_1 - \theta, \ldots, x_n - \theta)\,d\theta}$
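A sketch of Pitman's estimator by numerical integration on a grid; the location density $f$ (standard Cauchy here) and the data are illustrative assumptions:

```python
import numpy as np

def pitman_estimate(x, f, grid):
    """Pitman estimator: int theta * prod_i f(x_i - theta) dtheta / int prod_i f(x_i - theta) dtheta."""
    like = np.prod([f(xi - grid) for xi in x], axis=0)   # prod_i f(x_i - theta) on the grid
    return (grid * like).sum() / like.sum()              # uniform grid: the d-theta factors cancel

cauchy = lambda y: 1.0 / (np.pi * (1.0 + y ** 2))        # illustrative location density f
x = np.array([1.2, -0.3, 0.8, 2.1])
grid = np.linspace(-20.0, 20.0, 20001)
print(pitman_estimate(x, cauchy, grid))
```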
21
Empirical Bayes
Development of Bayes rules using auxiliary empirical (past or current) data
Methods:
Using past data in constructing the prior
Using past data in estimating the marginal distribution
Dealing simultaneously with several decision problems
$x_1, \ldots, x_n$ - past observations with densities $f(x_i\mid\theta_i)$
$x_{n+1}$ - sample information with density $f(x_{n+1}\mid\theta_{n+1})$
22
Determination of the prior from past data
Assumption: $\theta_1, \ldots, \theta_n, \theta_{n+1}$ are parameters from a common prior $\pi(\theta)$
$\mu_f(\theta),\ \sigma_f^2(\theta)$ - conditional mean and variance of $x_i$ (given $\theta_i = \theta$)
$\mu_m,\ \sigma_m^2$ - marginal mean and variance of $x_i$
Lemma 1: $\mu_m = E^{\pi}\!\left[\mu_f(\theta)\right]$ and $\sigma_m^2 = E^{\pi}\!\left[\sigma_f^2(\theta)\right] + E^{\pi}\!\left[(\mu_f(\theta) - \mu_m)^2\right]$
Result 1: If $\mu_f(\theta) = \theta$ and $\sigma_f^2(\theta) = \sigma_f^2$ is constant, then $\mu_{\pi} = \mu_m$ and $\sigma_{\pi}^2 = \sigma_m^2 - \sigma_f^2$
23
Determination of the prior from past data (cont.)
Step 1: Assume a certain functional form for $\pi(\theta)$ (a conjugate family of priors is convenient)
Step 2: Estimate $\mu_{\pi}$, $\sigma_{\pi}^2$ based on $x_1, \ldots, x_n$ ($x_{n+1}$ can be included too)
If $\mu_f(\theta) = \theta$ and $\sigma_f^2$ is constant, then:
Step 2a: Estimate $\mu_m$, $\sigma_m^2$ from the data, e.g.
$\hat{\mu}_m = \dfrac{1}{n}\sum_{i=1}^{n} x_i$ and $\hat{\sigma}_m^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \hat{\mu}_m)^2$
Step 2b: Use Result 1 to calculate $\mu_{\pi}$, $\sigma_{\pi}^2$
24
Determination of the prior from past data (cont.)
Example: $x_i\mid\theta_i \sim N(\theta_i, 1)$ and $\pi$ is assumed to be normal (conjugate prior).
Estimation of $\mu_{\pi}$ and $\sigma_{\pi}^2$ is needed for determining the prior:
$\hat{\mu}_{\pi} = \bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$ and $\hat{\sigma}_{\pi}^2 = \begin{cases} s^2 - 1 & \text{if } s^2 \ge 1 \\ 0 & \text{else} \end{cases}$ where $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
The prior is $N(\hat{\mu}_{\pi}, \hat{\sigma}_{\pi}^2)$ and the Bayes estimate of $\theta_{n+1}$ under MSE,
$\delta(x_{n+1}) = E[\theta_{n+1}\mid x_{n+1}] = \bar{x} + \dfrac{\hat{\sigma}_{\pi}^2}{1 + \hat{\sigma}_{\pi}^2}(x_{n+1} - \bar{x}) = \bar{x} + \left(1 - \min\!\left(1, \dfrac{1}{s^2}\right)\right)(x_{n+1} - \bar{x})$
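A sketch of this empirical-Bayes recipe for $x_i \sim N(\theta_i, 1)$: the marginal moments are estimated from past data and plugged into the normal prior (the simulated prior and data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(2.0, 1.5, size=200)          # unknown parameters from a common prior
x = rng.normal(theta, 1.0)                      # past observations, x_i | theta_i ~ N(theta_i, 1)

mu_hat = x.mean()                               # estimate of the prior mean
s2 = x.var(ddof=1)                              # marginal variance estimate
tau2_hat = max(s2 - 1.0, 0.0)                   # Result 1: prior variance = marginal variance - 1

x_new = 4.0                                     # new observation x_{n+1}
shrink = tau2_hat / (1.0 + tau2_hat)            # = 1 - min(1, 1/s^2)
theta_new_hat = mu_hat + shrink * (x_new - mu_hat)
print(mu_hat, tau2_hat, theta_new_hat)
```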
25
Estimation of marginal distribution from past data
Assumption: The Bayes rule can be represented in terms of $m(x)$
Advantage: No need to estimate the prior
Step 1: Estimate $m(x)$; $x_1, \ldots, x_n, x_{n+1}$ are a sample from the distribution with density $m(x)$.
E.g. in the discrete case, $\hat{m}(j) = \dfrac{\text{the number of } x_i \text{ equal to } j}{n}$
Step 2: Estimate the Bayes rule, using $\hat{m}(j)$
26
Estimation of marginal distribution from past data (cont.)
Example: The Bayes estimation of $\theta_{n+1}$ when $x_i \sim P(\theta_i)$, under MSE.
$\delta(x_{n+1}) = E[\theta_{n+1}\mid x_{n+1}] = \dfrac{\int \theta\,f(x_{n+1}\mid\theta)\,\pi(\theta)\,d\theta}{m(x_{n+1})} = \dfrac{\int \theta\,\dfrac{e^{-\theta}\theta^{x_{n+1}}}{x_{n+1}!}\,\pi(\theta)\,d\theta}{m(x_{n+1})}$
$= \dfrac{(x_{n+1} + 1)\int \dfrac{e^{-\theta}\theta^{x_{n+1}+1}}{(x_{n+1}+1)!}\,\pi(\theta)\,d\theta}{m(x_{n+1})} = \dfrac{(x_{n+1} + 1)\,m(x_{n+1} + 1)}{m(x_{n+1})}$
and $\hat{\delta}(x_{n+1}) = \dfrac{(x_{n+1} + 1)\,\hat{m}(x_{n+1} + 1)}{\hat{m}(x_{n+1})}$
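A sketch of this marginal-based (Robbins-type) estimate, with $m$ replaced by empirical frequencies of the past counts (the simulated prior and data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.gamma(3.0, 1.0, size=5000)          # unknown rates from a common (here Gamma) prior
x = rng.poisson(theta)                          # past counts x_1, ..., x_n

def m_hat(j):
    return np.mean(x == j)                      # empirical marginal frequency of the value j

def robbins_estimate(x_new):
    # delta(x_new) = (x_new + 1) * m(x_new + 1) / m(x_new), with m replaced by m_hat
    return (x_new + 1) * m_hat(x_new + 1) / m_hat(x_new)

print(robbins_estimate(2), robbins_estimate(5))
```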
27
Compound decision problems
Independent $x_1, \ldots, x_n$ are observed, where the $\theta_i$ are from a common prior
Goal: simultaneously make decisions involving $\theta_1, \ldots, \theta_n$
The loss is $L(\theta_1, \ldots, \theta_n, a)$
Solution: Determine the prior from x1,…, xn using empirical Bayes methods
28
Admissibility of Bayes rules
If a Bayes rule is unique then it is admissible
E.g. under MSE the Bayes rule is unique
Proof: any rule R-better than the Bayes rule must be a Bayes rule itself
Bayes rules with finite Bayes risk are typically admissible:
For discrete $\Theta$, assuming that $\pi$ is positive, the Bayes rule is admissible
For continuous $\Theta$, if $R(\theta, \delta)$ is continuous in $\theta$ for every $\delta$, then the Bayes rule is admissible
29
Admissibility of Bayes rules (cont.)
Generalized Bayes rules can be inadmissible and the verification of their admissibility can be difficult.
Example: generalized Bayes estimator of $\theta$ based on $x \sim N_p(\theta, I_p)$, versus the James-Stein estimator
$\theta = (\theta_1, \ldots, \theta_p)^t,\quad \pi(\theta) = 1,\quad L(\theta, a) = \sum_{i=1}^{p} (\theta_i - a_i)^2$
$\theta\mid x \sim N_p(x, I_p)$, so $\delta^{\pi}(x) = E[\theta\mid x] = x$, which is identical to the ML estimator,
$R(\theta, \delta^{\pi}) = E\!\left[\sum_{i=1}^{p} (x_i - \theta_i)^2\right] = p$
30
Admissibility of Bayes rules (cont.)
Example (cont.): generalized Bayes estimator of $\theta$ versus the James-Stein estimator
The James-Stein estimator is $\delta^{JS}(x) = \left(1 - \dfrac{p - 2}{\|x\|^2}\right)x$
and its risk is $R(\theta, \delta^{JS}) = p - (p - 2)^2\,E^{f(x\mid\theta)}\!\left[\dfrac{1}{\|x\|^2}\right] < p$
For $p > 2$, $\delta^{\pi}(x) = x$ is not admissible
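A Monte Carlo sketch comparing the risks of the generalized Bayes (= ML) estimator and the James-Stein estimator under the summed squared-error loss (dimension, true $\theta$, and the number of replications are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 10
theta = np.ones(p)                                       # illustrative true mean vector
x = rng.normal(theta, 1.0, size=(100000, p))             # x ~ N_p(theta, I_p), many replications

delta_gb = x                                             # generalized Bayes / ML estimator
delta_js = (1.0 - (p - 2) / np.sum(x ** 2, axis=1, keepdims=True)) * x   # James-Stein

risk_gb = np.mean(np.sum((delta_gb - theta) ** 2, axis=1))   # ≈ p
risk_js = np.mean(np.sum((delta_js - theta) ** 2, axis=1))   # < p for p > 2
print(risk_gb, risk_js)
```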
31
Admissibility of Bayes rules (cont.)
Theorem: If $x$ is continuous with a $p$-dimensional exponential density and $\Theta$ is closed, then any admissible estimator is a generalized Bayes rule
$f(x\mid\theta)$ is a $p$-dimensional exponential density if
$f(x\mid\theta) = h(x)\exp\!\left(\sum_{i=1}^{p} \theta_i T_i(x) - B(\theta)\right)$
E.g. the normal distribution
32
Asymptotic efficiency of Bayes estimators
x1 ,…,xn are iid samples with density f(xi|)
Definitions: An estimator $\delta_n(x_1, \ldots, x_n)$ of $\theta$ is asymptotically unbiased if
$\sqrt{n}\,(\delta_n - \theta) \xrightarrow{d} H$ and $E[H] = 0$
An asymptotically unbiased estimator is asymptotically efficient if its asymptotic variance satisfies
$v = \dfrac{1}{n\,I(\theta)}$
$v$ - asymptotic variance
$I(\theta)$ - Fisher information in a single sample
33
Asymptotic efficiency of Bayes estimators (cont.)
Assumptions for the next theorems:
The posterior is a proper, continuous and positive density, and $\int \theta\,\pi(\theta\mid x)\,d\theta$ is finite
The prior can be improper!
The likelihood function $l(\theta) = f(x\mid\theta)$ satisfies regularity conditions
34
Asymptotic efficiency of Bayes estimators (cont.)
Theorem: For large values of $n$, the posterior distribution is approximately
$N\!\left(\hat{\theta}_l,\ \dfrac{1}{n\,I(\hat{\theta}_l)}\right)$, where $\hat{\theta}_l$ maximizes the likelihood
Conclusion: Bayes estimators such as the posterior mean are asymptotically unbiased
The effect of the prior declines as $n$ increases
35
Asymptotic efficiency of Bayes estimators (cont.)
Theorem: If $\delta_n$ is the Bayes estimator under MSE then
$\sqrt{n}\,(\delta_n - \theta) \xrightarrow{d} N\!\left(0,\ \dfrac{1}{I(\theta)}\right)$
Conclusion: The Bayes estimator $\delta_n$ under MSE is asymptotically efficient
36
Asymptotic efficiency of Bayes estimators (cont.)
Example: estimation of $p$ based on a binomial sample $x \sim b(p, n)$, under MSE
The Beta distribution is a conjugate family for the binomial distribution,
$\pi(p) \sim B(a, b) \;\Rightarrow\; \pi(p\mid x) \sim B(a + x,\ b + n - x)$
The Bayes estimator under MSE is $\delta(x) = E[p\mid x] = \dfrac{a + x}{a + b + n}$
The ML estimator is $\hat{p}_{ML} = \arg\max_p \ln f(x\mid p) = \dfrac{x}{n}$
$\delta(x) = E[p\mid x] = \dfrac{a + b}{a + b + n}\cdot\dfrac{a}{a + b} + \dfrac{n}{a + b + n}\,\hat{p}_{ML}$ - a weighted average of the prior mean and the ML estimator
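A sketch of the weighted-average form above, showing the prior's weight $(a+b)/(a+b+n)$ vanishing as $n$ grows (the values of $a$, $b$ and the observed proportion are illustrative assumptions):

```python
def bayes_vs_ml(x, n, a, b):
    p_ml = x / n
    prior_mean = a / (a + b)
    weight_prior = (a + b) / (a + b + n)                 # weight on the prior mean
    bayes = weight_prior * prior_mean + (1 - weight_prior) * p_ml
    return bayes, p_ml

for n in (10, 100, 10000):
    x = int(0.3 * n)                                     # observed proportion fixed at 0.3
    print(n, bayes_vs_ml(x, n, a=2.0, b=2.0))            # Bayes estimate approaches the ML estimate
```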
37
Asymptotic efficiency of Bayes estimators (cont.)
[Figure: Bayes estimator $\delta(x)$ and ML estimator versus $x$, for a=b=2 and for a=b=2000]
If the prior is concentrated, it determines the estimator: "Don't confuse me with the facts!"
38
Asymptotic efficiency of Bayes estimators (cont.)
[Figure: Bayes estimator $\delta(x)$ (a=b=2) and ML estimator versus $x$, for n=10 and for n=1000]
For large samples, the Bayes estimator tends to become independent of the prior
39
Asymptotic efficiency of Bayes estimators (cont.)
More examples of asymptotically efficient Bayes estimators:
Location distributions: if the likelihood function $l(\theta) = f(x\mid\theta)$ satisfies the regularity conditions, then the Pitman estimator after one observation is asymptotically efficient
Exponential distributions: if $f(x\mid\theta) = e^{\theta T(x) - A(\theta)}$ then it satisfies the regularity conditions, and the asymptotic efficiency depends on the prior
40
Conclusions
Bayes rules are designed for problems with prior information, but they are useful in other cases as well
Determining the prior is a crucial step, which affects both admissibility and computational complexity
Bayes estimators under MSE perform well on large samples