Bayes and Classical Approaches: A Comparative Overview

Econ 690

Purdue University

December 23, 2011

Justin L. Tobias (Purdue) Comparative Overview December 23, 2011 1 / 46

Outline

1 Review

2 Exchange Paradox

3 Example #1

4 exchangeability

5 The Likelihood Principle
  Example #1
  Example #2

Review

The Classical (Frequentist) Approach

For both the Bayesian and the frequentist, the object of interest is denoted θ ∈ Θ ⊆ Rk, a parameter which characterizes some feature of the population at large (e.g., the average height of men in the U.S.).

A sample of data is collected, which must then be used to shed light on the value of θ.

A variety of estimators - rules taking the data y and mapping that data into a “guess” regarding the value of θ - are possible. The performance of an estimator is evaluated by the frequentist by analyzing its sampling distribution.


“Frequentist econometrics is tied to the notion of sample and population” (Lancaster, 2004, page 360). A “good” estimator is one which is “close” to θ, where “closeness” is measured by its performance averaged over hypothetical repeated sampling from the population.

Therefore, although a given estimate obtained by a particular researcher may be far away from θ, he or she could take comfort in the fact that other researchers, obtaining different samples from the population of interest, and making use of the same estimator, will obtain estimates that are “close” to θ, at least on average.


To illustrate, consider an unbiased estimator with a continuous sampling distribution.

For any sample of data, the realized estimate can never equal the true parameter of interest θ. Thus, ex post (i.e., after seeing the data), we know that the estimate is not “good” in the sense that, whenever the estimator is applied, there is no chance of realizing an estimate equal to the true parameter θ.

However, viewed according to the ex ante criterion of unbiasedness, the estimator is “close” to the true parameter θ in the sense that if we could apply our estimator over and over again, on average, we would get the “correct” θ.
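This ex ante property is easy to see in a short simulation (a sketch, not from the slides: it assumes a normal population with true mean θ = 5 and uses the sample mean as the estimator). No single realized estimate hits θ exactly, yet the estimates average out to θ across hypothetical repeated samples.

```python
import random

random.seed(42)
theta = 5.0          # true population mean (assumed for illustration)
n, reps = 25, 20000  # sample size and number of hypothetical repeated samples

estimates = []
for _ in range(reps):
    sample = [random.gauss(theta, 2.0) for _ in range(n)]
    estimates.append(sum(sample) / n)   # the estimator: the sample mean

# Ex post: a realized estimate essentially never equals theta exactly,
# since the estimator's sampling distribution is continuous.
print(any(e == theta for e in estimates))
# Ex ante: averaged over repeated samples, the estimator is centered at theta.
print(round(sum(estimates) / reps, 1))  # approximately 5.0
```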


The Bayesian Approach

Unlike the frequentist point of view, the Bayesian perspective provides no role for data that could have been observed, but were not. That is, its focus is ex post - conditioning on the observed data y - rather than ex ante, which involves averaging over the sampling distribution of y|θ.

In fairness, the Bayesian approach requires more inputs, and must specify the full joint distribution of the observables y that become known under sampling and the unobservables θ. This joint distribution is typically decomposed as

p(y , θ) = p(y |θ)p(θ),

into the likelihood p(y|θ) [which, viewed as a function of θ for given y, is denoted L(θ)] and a prior p(θ).


The demands required of the model specification open the Bayesian up to criticism regarding sensitivity of results to the specification of the likelihood and/or the prior. In frequentist econometrics, far less is required - no prior is needed and, in many cases, a complete specification of the sampling distribution p(y|θ) is not necessary; only moment conditions are assumed to hold.

However, in such cases, approximations must be used to characterize the sampling distribution of the estimator, since the model is not rich enough to allow for analytic calculation of this distribution. Thus, flexibility comes at a cost.


Bayesians regard the necessary additional structure as a worthy investment: by construction, Bayesian inference provides exact finite sample results.

In many cases, flexible representations of the likelihood (allowing for fat tails, skewness or multimodality) can go a long way toward easing concerns regarding its specification.

In sufficiently large samples, the influence of the prior generally vanishes, and it is typically wise to conduct a sensitivity analysis over alternate prior choices. (For testing purposes, the prior is far more important.)


Rather than having a variety of competing estimators for the same problem, the Bayesian “estimator” is unique - all information regarding θ is summarized in the posterior distribution p(θ|y). Different features of this posterior can be used for point estimates, e.g., the posterior mean or the posterior mode, depending on the relevant loss function.

Nuisance parameters can cause headaches for frequentist econometrics. In the Bayesian framework, they are handled intuitively and consistently across various models, by simply integrating them out of the posterior distribution.


Asymptotic Comparisons

In finite dimensional models with standard regularity conditions, the following are known:

Asymptotic Interval Agreement: there exist regions Rn such that

∫_{Rn} p(θ|yn) dθ = 1 − α

and Rn has frequentist coverage 1 − α + O(n^{−1}).


Exponential Family Example

Consider a set of scalar independent and identically distributed (iid) observations, denoted here by y1, y2, . . . , yn, from the natural exponential family of distributions:

p(yi |θ) = h(yi ) exp [θyi − g(θ)] , i = 1, 2, . . . , n,

where the functions h and g are known and yn will be used to denote the full vector of n observations.


Exponential Family Example: Sampling Theory

Provided one can interchange the order of integration and differentiation, differentiating the identity ∫ p(y|θ) dy = 1 with respect to θ yields

∫ pθ(y|θ) dy = ∫ [y − gθ(θ)] p(y|θ) dy = 0,

where pθ(y|θ) ≡ ∂p(y|θ)/∂θ and gθ(θ) is defined similarly.

This immediately implies E(y) = gθ(θ0).


By extension,

Var(y) = gθθ(θ0),

where gθθ(·) is analogously defined as the second derivative.

It is straightforward to show that the MLE is determined by solving

gθ(θ̂n) = ȳn,

where ȳn ≡ n^{−1} ∑_{i=1}^{n} yi and θ̂n represents the MLE.


Under suitable regularity conditions, we obtain the following asymptotic approximation to the sampling distribution of ψ̂n ≡ ȳn, the MLE of ψ ≡ gθ(θ):

√n (ψ̂n − ψ0) →d N(0, gθθ(θ0)),

from which large-sample confidence intervals can be constructed.


Bayesian Inference

Consider the prior

p(θ|a, b) ∝ exp[aθ − bg(θ)].

Using the same type of argument, it is straightforward to show, provided the moment exists, that

E(ψ) ≡ μψ = a/b,

with ψ ≡ gθ(θ).


Combining the likelihood and prior, we obtain via Bayes Theorem:

p(θ|yn) ∝ exp [(nȳn + a)θ − (n + b)g(θ)].

It therefore follows that

E(ψ|yn) = (nȳn + a)/(n + b),

which we write equivalently as

E(ψ|yn) = ωn ȳn + (1 − ωn) μψ, where ωn ≡ n/(n + b) and μψ = a/b.


The previous equation immediately reveals that the posterior mean is a weighted average of the sample mean (the MLE) and the prior mean μψ.

Since ωn → 1, we see that the Bayesian posterior mean and the frequentist MLE in this case are asymptotically equivalent.
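The shrinkage can be made concrete with a short script (a sketch, not from the slides: the Poisson member of this family, g(θ) = exp(θ) so that ψ = exp(θ) is the Poisson mean, and the values a = 4, b = 2 are assumed for illustration; in that case the prior above corresponds to a Gamma(a, b) density for ψ, with prior mean μψ = a/b).

```python
# Posterior mean of psi in the natural exponential family with the
# conjugate prior p(theta | a, b): E(psi | y) = (n*ybar + a) / (n + b).
a, b = 4.0, 2.0          # assumed prior hyperparameters
mu_psi = a / b           # prior mean of psi

def posterior_mean(ybar, n):
    return (n * ybar + a) / (n + b)

def omega(n):
    return n / (n + b)

ybar = 3.0               # sample mean (the MLE of psi in this family)
for n in [1, 10, 100, 10000]:
    w = omega(n)
    pm = posterior_mean(ybar, n)
    # Shrinkage identity: posterior mean = w*ybar + (1 - w)*mu_psi.
    assert abs(pm - (w * ybar + (1 - w) * mu_psi)) < 1e-12
    print(n, round(w, 4), round(pm, 4))
# As n grows, omega_n -> 1 and the posterior mean approaches ybar, the MLE.
```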


In order to conduct inference, for which finite sample results are seldom available, the frequentist typically relies on asymptotic approximations to the sampling distribution of the estimator.

But under this large sample metric, the sampling distribution of the Bayes rule is identical, suggesting that, according to his or her own recipe for inference, the prior does not matter.

To question the role of the prior, then, from a sampling theory perspective is to introduce the importance of finite sample results; otherwise concerns about the prior must be misplaced and unwarranted, as the Bayes rule simply represents a different estimator with identical large sample sampling properties.


Exchange Paradox

Another reason why one may pursue a Bayesian approach is that prior information is available (and should be used), while not using such information can lead to undesirable results.

An example of this is the Exchange Paradox (or “two envelope problem”): see, e.g., Christensen and Utts (1992, American Statistician).

An amount of money m is placed in one envelope and 2m is placed in another. One of the envelopes is randomly handed to a player and then opened. The player is permitted to keep the prize or, if desired, choose the other envelope.


The agent opens the envelope, revealing an amount x .

He reasons that the other envelope either contains x/2 or 2x .

Since the envelope was distributed randomly, the chance of each is 1/2.

His expected return from switching is

(1/2)(x/2) + (1/2)(2x) = 5x/4 > x,

so he switches.

However, your “opponent” thinks the same way and switches as well. How can this be reasonable?


A solution to this problem notes that the agent probably has some beliefs over the distribution of prize money, and this should be (and probably is) taken into account when making a decision.

Formally, let M be the prize placed in the first envelope, and X be the (ex ante) random amount placed in your envelope. Upon seeing X = x in your envelope, M = x or M = x/2.

The sampling distribution is

Pr(X = m|M = m) = 1/2, Pr(X = 2m|M = m) = 1/2.


Letting p(·) denote the prior mass function over the amount M placed in the first envelope, Bayes Theorem gives:

Pr(M = x|X = x) = (1/2)p(x) / [(1/2)p(x) + (1/2)p(x/2)] = p(x)/[p(x) + p(x/2)].

Likewise,

Pr(M = x/2|X = x) = p(x/2)/[p(x) + p(x/2)].


If the agent keeps the envelope, he gets x with certainty. If he switches, he gets 2x with probability p(x)/[p(x) + p(x/2)] and x/2 with probability p(x/2)/[p(x) + p(x/2)].

Therefore, the expected return from switching is:

[2x p(x) + (x/2) p(x/2)] / [p(x) + p(x/2)].


It is straightforward to show that, when 2p(x) = p(x/2), the expected return from switching is x. So, when

2p(x) > p(x/2),

it is optimal for the agent to switch. Perhaps most importantly, when will the expected return equal 5x/4?
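A small exact computation makes the resolution concrete (a sketch under an assumed family of priors, not from the slides: prize amounts m = 2^k with geometrically decaying weights q^k). Switching is profitable exactly when 2p(x) > p(x/2), which here amounts to q > 1/2; and the naive 5x/4 return arises only when p(x) = p(x/2), which no proper prior can deliver for every x.

```python
from fractions import Fraction

def expected_other(x_exp, q):
    """E[other envelope | X = 2**x_exp] under the assumed prior
    Pr(M = 2**k) proportional to q**k, k = 0, ..., K (truncated)."""
    K = 40
    prior = {k: Fraction(q) ** k for k in range(K + 1)}   # unnormalized
    x = Fraction(2) ** x_exp
    w_keep = prior.get(x_exp, Fraction(0))      # M = x   -> other holds 2x
    w_half = prior.get(x_exp - 1, Fraction(0))  # M = x/2 -> other holds x/2
    total = w_keep + w_half
    return (2 * x * w_keep + (x / 2) * w_half) / total

# q = 3/5 (> 1/2): 2p(x) > p(x/2), so switching beats keeping x = 32.
assert expected_other(5, Fraction(3, 5)) > Fraction(32)
# q = 2/5 (< 1/2): 2p(x) < p(x/2), so keeping is better.
assert expected_other(5, Fraction(2, 5)) < Fraction(32)
# q = 1/2: 2p(x) = p(x/2) exactly, and the expected return is x itself.
assert expected_other(5, Fraction(1, 2)) == Fraction(32)
print("switching pays only when 2p(x) > p(x/2)")
```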

Example #1

Ex Ante vs. Ex Post: An Example

This example is constructed to illustrate the differences between the ex ante and ex post perspectives. (It also helps to clarify the difference between a Bayesian posterior interval and a frequentist confidence interval.)


Given an unknown parameter θ, −∞ < θ < ∞, suppose Yi (i = 1, 2) are iid binary random variables with probability equally distributed over the points of support θ − 1 and θ + 1. That is,

1 Pr(Yi = θ − 1|θ) = 1/2 and

2 Pr(Yi = θ + 1|θ) = 1/2.

Suppose the prior for θ is constant.



(a) Suppose we observe y1 ≠ y2. Find the posterior distribution of θ.

(b) Suppose we observe y1 = y2. Find the posterior distribution of θ.

(c) Consider the estimator

θ̂ = (1/2)(Y1 + Y2) if Y1 ≠ Y2,
θ̂ = Y1 − 1 if Y1 = Y2.

What is the ex ante probability that this estimator contains (i.e., equals) θ?



(a) If y1 ≠ y2, then one of the values must equal θ − 1 and the other equals θ + 1. Ex post, averaging these two values, it is absolutely certain that θ = (1/2)(y1 + y2), i.e.,

Pr(θ = (y1 + y2)/2 | y1, y2) = 1.

(b) If y1 = y2 = y, say, then the common value y is either θ − 1 or θ + 1. Since the prior does not distinguish among values of θ, ex post it is equally uncertain whether θ = y − 1 or θ = y + 1, i.e.,

Pr(θ = y − 1 | y1, y2) = Pr(θ = y + 1 | y1, y2) = 1/2.


(c) Parts (a) and (b) suggest the ex post (i.e., posterior) probability that our estimator equals θ is either 1 or 1/2, depending on whether y1 ≠ y2 or y1 = y2.

From the ex ante perspective, however,

Pr(θ̂ = θ|θ) = Pr(Y1 ≠ Y2|θ) + Pr(Y1 = Y2 = θ + 1|θ) = 1/2 + 1/4 = 3/4.

The difficult question for the pure frequentist is: why use the realized data to estimate θ and then report the ex ante confidence level 75% instead of the appropriate ex post measure of uncertainty?
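The 75% figure is easy to reproduce by simulation (a sketch with an assumed true value θ = 10, which is not part of the slides): ex ante the estimator equals θ in about three quarters of repeated samples, yet ex post it is right with certainty when y1 ≠ y2 and only half the time when y1 = y2.

```python
import random

random.seed(0)
theta = 10       # true parameter value, assumed for the simulation
reps = 100000

hits = 0
hits_unequal, trials_unequal = 0, 0
hits_equal, trials_equal = 0, 0
for _ in range(reps):
    y1 = theta + random.choice([-1, 1])
    y2 = theta + random.choice([-1, 1])
    est = (y1 + y2) / 2 if y1 != y2 else y1 - 1
    hit = (est == theta)
    hits += hit
    if y1 != y2:
        trials_unequal += 1
        hits_unequal += hit
    else:
        trials_equal += 1
        hits_equal += hit

print(round(hits / reps, 2))                 # ex ante coverage: about 0.75
print(hits_unequal / trials_unequal)         # ex post, y1 != y2: exactly 1.0
print(round(hits_equal / trials_equal, 2))   # ex post, y1 == y2: about 0.5
```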

exchangeability

exchangeability and de Finetti

Consider a sequence of Bernoulli random variables x1, x2, · · · , xn and the joint probability

p(x1, x2, . . . , xn).

Suppose, additionally, that you feel the labels 1, 2, · · · , n are uninformative in the sense that all of the marginal distributions of the random quantities (and marginals of pairs, triples, etc.) are the same. Such a belief implies that

p(x1, . . . , xn) = p(xπ(1), . . . , xπ(n)),

where π is an operator that permutes the labels i = 1, . . . , n.

If the equation above holds for any permutation operator π, then the random quantities x1, x2, · · · , xn are regarded as exchangeable.


What would NOT be an exchangeable sequence?

Consider the case of “streaks” in sports, where a team that has just won its previous game is more likely to win the next and, conversely, a team that has just lost a game is more likely to lose the next. In this case,

p(1, 1, 1, 0, 0, 0)

would not be believed to equal

p(1, 0, 1, 0, 1, 0)

and thus the joint probability is not preserved under permutation. Thus, the sequence would not be regarded as exchangeable.


What would NOT be an exchangeable sequence?

[Bernardo and Smith (2000)] Suppose that x1, . . . , xn denote measurements of a particular substance, all made by the same individual using the same process and equipment. In this case, it may be reasonable to assume that the sequence of measurements is exchangeable.

If, however, the measurements are taken by a series of different labs, then one might be skeptical of assuming exchangeability of the entire sequence. One might, however, still be comfortable with assuming that exchangeability is appropriate for the sequences within each lab. Such beliefs are regarded as partially (or conditionally) exchangeable.


exchangeability and de Finetti: Example

Our common assumption of iid data is a stronger condition than assuming the data are exchangeable.

To see this, we need to show the equivalence of p(x1, x2, . . . , xn) and p(xπ(1), xπ(2), . . . , xπ(n)) when the data are iid:

p(x1, x2, . . . , xn) = ∏_{j=1}^{n} Pr(Xj = xj)
= ∏_{j=1}^{n} Pr(X = xj)
= Pr(X = xπ(1)) Pr(X = xπ(2)) · · · Pr(X = xπ(n))
= p(xπ(1), xπ(2), . . . , xπ(n)).


Suppose

Y = [Y1 Y2 · · · YT]′ ∼ N(0T, Σ),

where

Σ = (1 − α)IT + α ιι′

(with α restricted so that Σ is positive definite) and ι is a T × 1 vector of ones.

Let π(t) (t = 1, 2, · · · , T) be a permutation of 1, 2, · · · , T and write

[Yπ(1) Yπ(2) · · ·Yπ(T )]′ = AY ,

where A is a T × T selection matrix such that, for t = 1, 2, · · · , T, row t of A consists of all zeros except column π(t), which is unity.

Show that these beliefs are exchangeable but the events are not necessarily independent.


Note that AA′ = IT and Aι = ι. Then,

AY ∼ N(0T, Ω),

where

Ω = AΣA′ = (1 − α)AA′ + α(Aι)(Aι)′ = (1 − α)IT + α ιι′ = Σ.

Hence, beliefs regarding Yt (t = 1, 2, · · · , T) are exchangeable. Despite this exchangeability, if α ≠ 0, Yt (t = 1, 2, · · · , T) are not independent.
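The algebra can be checked numerically in a few lines (a sketch with assumed values T = 5, α = 0.3 and an arbitrary permutation, none of which appear in the slides; the equicorrelated covariance Σ has ones on the diagonal and α off the diagonal):

```python
# Verify that an equicorrelated covariance is invariant under permutation:
# if Sigma = (1 - alpha)*I_T + alpha*iota iota', then A Sigma A' = Sigma.
T, alpha = 5, 0.3                       # assumed values for illustration
perm = [2, 0, 4, 1, 3]                  # an arbitrary permutation pi

Sigma = [[1.0 if i == j else alpha for j in range(T)] for i in range(T)]
# Row t of A is all zeros except a one in column perm[t], so (AY)_t = Y_perm[t].
A = [[1.0 if j == perm[i] else 0.0 for j in range(T)] for i in range(T)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(T)) for j in range(T)]
            for i in range(T)]

At = [[A[j][i] for j in range(T)] for i in range(T)]   # A transpose
Omega = matmul(matmul(A, Sigma), At)

# The permuted vector AY has exactly the same covariance matrix as Y.
assert all(abs(Omega[i][j] - Sigma[i][j]) < 1e-12
           for i in range(T) for j in range(T))
print("A Sigma A' equals Sigma: the permuted beliefs match the original")
```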


de Finetti’s Representation Theorem (Loosely)

Suppose that x1, x2, . . . is an infinitely exchangeable sequence of Bernoulli random variables (every finite subsequence is exchangeable). Then the joint probability distribution p(x1, x2, · · · , xn) must take the form:

p(x1, x2, . . . , xn) = ∫₀¹ [ ∏_{i=1}^{n} θ^{xi} (1 − θ)^{1−xi} ] dQ(θ)

for some distribution function Q(θ).
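To see the representation at work, take Q to be a Beta(a, b) distribution (an assumed choice for illustration; it is not specified in the slides). The integral then has the closed form B(a + s, b + n − s)/B(a, b), where s = ∑ xi, so the joint probability depends on the sequence only through its number of successes, which is exactly exchangeability.

```python
from math import gamma
from itertools import product

def beta_fn(a, b):
    return gamma(a) * gamma(b) / gamma(a + b)

def definetti_prob(xs, a=2.0, b=2.0):
    """p(x1,...,xn) = integral of prod theta^xi (1-theta)^(1-xi) dQ(theta)
    with Q = Beta(a, b); the mixture reduces to a Beta-function ratio."""
    n, s = len(xs), sum(xs)
    return beta_fn(a + s, b + n - s) / beta_fn(a, b)

# Two orderings with the same number of successes get the same probability:
p1 = definetti_prob([1, 1, 1, 0, 0, 0])
p2 = definetti_prob([1, 0, 1, 0, 1, 0])
assert abs(p1 - p2) < 1e-15

# ...and the 2^6 sequence probabilities sum to one, as a joint pmf must:
total = sum(definetti_prob(list(x)) for x in product([0, 1], repeat=6))
assert abs(total - 1.0) < 1e-12
print(round(p1, 5))
```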


This result is profound from the viewpoint of subjectivist probability modeling. It:

1 “justifies” thinking about the data as being conditionally independent Bernoulli random variables (conditioned on a parameter θ).

2 provides an explicit justification for the “prior” Q(θ).

3 shows that the assumption of exchangeability implies the Bayesian viewpoint of likelihood times prior.


The Likelihood Principle

Definition

The likelihood principle (loosely) asserts that if two experiments yield proportional likelihood functions for realized data y, i.e., L1(θ) = cL2(θ), where c does not depend on θ, then (under the same prior) identical inferences should be obtained about θ.

This is a controversial statement (see, e.g., Robins and Wasserman, JASA, 2000), although it follows from two seemingly reasonable conditions: conditionality and (weak) sufficiency.

In the Bayesian paradigm, this is not an imposed “principle,” but rather follows as a consequence of Bayes Theorem.


To see this, consider two experiments with realized data y1 and y2, which yield proportional likelihood functions L1(θ) and L2(θ), respectively.

The posterior under experiment 1 is:

p(θ|y1) = p(θ)L1(θ) / ∫ p(θ)L1(θ) dθ = p(θ)cL2(θ) / ∫ p(θ)cL2(θ) dθ = p(θ|y2).

Thus, upon normalization, it is seen that the posteriors under each experiment are identical, and thus identical inferences are obtained.


In frequentist calculations, proportional likelihoods may not yield identical inferences, since the Classical paradigm provides an explicit role for data that could have been observed, but were not.

The following simple example illustrates this point.

The Likelihood Principle Example #1


Consider two researchers, A and B.

Researcher A observes a random variable X having the Gamma distribution G(3, θ−1) with density

f(x|θ) = [θ³/Γ(3)] x² exp(−θx), x > 0.

Researcher B observes a random variable Y having the Poisson distribution Po(2θ) with mass function

f(y|θ) = (2θ)^y exp(−2θ)/y!, y = 0, 1, 2, . . . .

In collecting one observation each, Researcher A observes X = 2 and Researcher B observes Y = 3.


The respective likelihood functions are

LA(θ; x = 2) = [θ³/Γ(3)] · 2² · exp(−2θ) ∝ θ³ exp(−2θ),

LB(θ; y = 3) = (2θ)³ exp(−2θ)/3! ∝ θ³ exp(−2θ).
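A quick numerical check of proportionality (a sketch using only the two likelihoods above; the grid resolution is an arbitrary choice): the ratio LA/LB is constant in θ, and both functions are maximized at θ = 1.5.

```python
from math import exp, gamma

def L_A(theta):   # Gamma G(3, 1/theta) likelihood evaluated at x = 2
    return (theta ** 3 / gamma(3)) * 2 ** 2 * exp(-2 * theta)

def L_B(theta):   # Poisson Po(2*theta) likelihood evaluated at y = 3
    return (2 * theta) ** 3 * exp(-2 * theta) / 6

grid = [k / 1000 for k in range(1, 5001)]          # theta in (0, 5]
ratios = {round(L_A(t) / L_B(t), 10) for t in grid}
assert len(ratios) == 1          # proportional: L_A = c * L_B with c = 3/2
mle_A = max(grid, key=L_A)
mle_B = max(grid, key=L_B)
assert mle_A == mle_B == 1.5     # identical maximum likelihood point estimates
print("c =", ratios.pop(), " MLE =", mle_A)
```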


Since these likelihood functions are proportional, the evidence in the data about θ is the same in both data sets.

Because the priors and loss functions are the same, from the Bayesian standpoint the two researchers would obtain the same posterior distributions and, therefore, make identical inferences.

From the standpoint of maximum likelihood estimation, since the likelihoods are proportional, the maximum likelihood point estimates would be the same: θ̂ = 1.5.

However, the maximum likelihood estimators would be very different:

In the case of Researcher A, the sampling distribution would be continuous, whereas in the case of Researcher B the sampling distribution would be discrete.

The Likelihood Principle Example #2


A second (and more widely cited) example also emphasizes this point.

Suppose researcher A decides to flip a coin n times, and observes l successes (heads) during these n trials.

The likelihood function in this case is the standard binomial pmf:

L(θ) = C(n, l) θ^l (1 − θ)^{n−l},

where C(n, l) denotes the binomial coefficient.


Now, suppose that researcher B decides to flip a coin until she observes l heads. It happens that it takes her exactly n trials to observe l heads.

In this case, the appropriate likelihood is the negative binomial distribution, which describes the distribution over the number of trials (n) necessary to achieve l successes.

Since these two likelihoods are proportional, from the Bayesian point of view these two researchers will obtain identical posterior inferences provided they employ the same prior.


However, inference in the classical case regarding θ will differ across the two experiments. Specifically, suppose n = 12, l = 9. Then, in the binomial case:

Pr(X ≥ 9|H0 : θ = 1/2) = .073.

while in the Negative Binomial case

Pr(X ≥ 9|H0 : θ = 1/2) = .0327.
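Both tail probabilities can be reproduced directly (a sketch, not from the slides; note the assumption that the .0327 figure corresponds to the classic telling of this example, as in Lindley and Phillips (1976), in which the negative binomial sampling stops at the third tail, so the p-value sums the probabilities of seeing 9 or more heads before that third tail):

```python
from math import comb

# Binomial sampling: n = 12 fixed; P(X >= 9 heads | theta = 1/2).
p_binomial = sum(comb(12, k) for k in range(9, 13)) / 2 ** 12

# Negative binomial sampling: flip until the 3rd tail (assumed stopping
# rule matching the .0327 figure); P(9 or more heads | theta = 1/2).
# P(k heads before the 3rd tail) = C(k + 2, 2) * (1/2)**(k + 3).
p_negbin = sum(comb(k + 2, 2) * 0.5 ** (k + 3) for k in range(9, 500))

print(round(p_binomial, 3))   # 0.073
print(round(p_negbin, 4))     # 0.0327
```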

In fact, the classical econometrician cannot make inferences about θ without knowing what the researcher intended to do before conducting the trials!

Depending on the intention of the researcher, you either reject (negative binomial) or fail to reject (binomial).
