
Censoring, Conditionality, and Likelihood



Censoring, conditionality, and likelihood Larry A. WASSERMAN

University of Toronto

Key words and phrases: Conditionality, censoring, relabelling, likelihood. AMS 1980 subject classifications: Primary 62A99; secondary 62A10.

ABSTRACT

The censoring principle is shown to imply a principle similar to the likelihood principle. Further, it is demonstrated that the censoring principle and a weak form of the conditionality principle jointly imply the likelihood principle.

RÉSUMÉ

We show that a principle similar to the likelihood principle follows from the censoring principle. Further, we show that the likelihood principle follows from the censoring principle when it is combined with a weak version of the conditionality principle.

1. INTRODUCTION

Since Birnbaum (1962) showed that conditionality and sufficiency imply the likelihood principle, there has been considerable interest in the logical relationships between various principles of statistical inference. [See Berger and Wolpert (1984) or Dawid (1977) for details.] The censoring principle, due to Pratt (1961, 1962), has received less attention than some other principles, yet it has considerable appeal. He suggests that the censoring principle implies the likelihood principle, but does not present a formal proof. We shall investigate the logical implications of the censoring principle. Further, we shall show that the censoring principle and a weak form of conditionality imply the likelihood principle. These two principles will be seen to play complementary roles in statistical inference.

2. NOTATION AND DEFINITIONS

By a statistical model M we mean

M = (S, {f_θ}),

where S is a finite set and {f_θ} is a set of probability measures defined over P(S), the power set of S, and indexed by the parameter θ ∈ Θ. For simplicity we shall consider only the case where S is finite. Extensions of many statistical principles to continuous sample spaces have been considered. See Berger and Wolpert (1984) and Evans, Fraser, and Monette (1985). However, we should point out that we have not found an obvious way to extend the argument in this paper to the continuous case.

By an experiment E we mean a model M and a realized outcome s:

E = (M, s), s ∈ S.

Page 2: Censoring, Conditionality, and Likelihood

324 WASSERMAN Vol. 14, No. 4

We denote the inference drawn from E by I(E). The form of the function I will be left unspecified. To indicate that experiments E and F are to produce identical inferences about θ we write

I(E) = I(F).

Consider the models M = (S, {f_θ}) and M' = (T, {g_θ}). Suppose there exists a bijection φ: S → T such that

f_θ(s) = g_θ(φ(s)) for all s ∈ S, θ ∈ Θ.

Then M' is obtained from M merely by relabelling sample points. In this case it seems reasonable, if not tautological, that inferences from (M, s) should be identical to inferences from (M', φ(s)). We shall refer to this principle as R. Formally we have

R: I(E) = I(E'),

where E = (M, s) and E' = (M', φ(s)).

REMARK: Godambe (1979) refers to R as M. He considers in detail the relationships between various relabelling principles.

A more subtle but still compelling principle is the censoring principle, which is borne out by the following example, based essentially on Pratt (1962). A statistician is given the following data: 5, 7, and 11. He is told that the upper bound of the measuring instrument used is 10,005. Later he is informed that the upper bound is 10,006, not 10,005. The censoring principle claims that our inferences should not change in light of this new information. From a practical point of view, this principle seems reasonable, since one rarely has full knowledge about the limits of measuring instruments. As long as we are certain that the data we obtained are not censored, we usually do not concern ourselves with what observations could potentially have been censored. Formally, the censoring principle Ce may be defined in the following way. Let M = (S, {f_θ}) and M' = (T, {g_θ}) be two models. Let X and Y be disjoint subsets of S such that S = X ∪ Y. Let t* ∈ T. Suppose there is a map φ: S → T such that the restriction of φ to X is a bijection onto T − {t*} and

φ⁻¹{t*} = Y.

Define

g_θ(t) = f_θ(φ⁻¹(t)), t ≠ t*,

g_θ(t*) = Σ_{s ∈ Y} f_θ(s).

Then Ce asserts that

Ce: I(E) = I(E')

whenever E = (M, s), E' = (M', φ(s)), and s ∈ X.
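The construction of g_θ above can be sketched numerically. The following is a minimal illustration with a made-up model (the sample space, parameter values, and probabilities are hypothetical, not from the paper): censoring the points in Y into the single point t* leaves the probability of each uncensored point unchanged, which is why Ce says an observation in X should lead to the same inference in both models.

```python
def censor(f, X, Y):
    """Build g on T = X ∪ {'t*'} from f on S = X ∪ Y, as in the definition of Ce."""
    g = {s: f[s] for s in X}          # φ restricted to X is the identity here
    g['t*'] = sum(f[s] for s in Y)    # g_θ(t*) = Σ_{s∈Y} f_θ(s)
    return g

# Two parameter values θ ∈ {'a', 'b'}, each giving a distribution over S = {0,1,2,3}.
f = {'a': {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4},
     'b': {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}}

X, Y = [0, 1], [2, 3]
g = {th: censor(f[th], X, Y) for th in f}

for th in f:
    # Uncensored points keep their probabilities, and g_θ is still a probability measure.
    assert all(abs(g[th][s] - f[th][s]) < 1e-9 for s in X)
    assert abs(sum(g[th].values()) - 1.0) < 1e-9
```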

We shall consider another principle which again is motivated by a simple example. Suppose that after completing a statistical analysis for an experimenter, the statistician is informed that the experimenter narrowly avoided an automobile accident the morning the experiment was performed. Thus there was a probability q that the experiment might not have been performed. It seems reasonable to assert that this new information should not change the statistician's analysis. We call this principle C*. To formalize this principle we proceed as follows. Let M = (S, {f_θ}) and define M' = (S', {g_θ}), where


S' = S ∪ {A}

and

g_θ(s) = p·f_θ(s) for s ∈ S, g_θ(A) = q,

where p + q = 1. (Thus A corresponds to the event that the experiment was not performed.) We have

C*: I(E) = I(E'),

where E = (M, s) and E' = (M', s).

It is interesting that Ce and C* play complementary roles as principles of statistical inference. To see this, note that Ce allows us to add or subtract sample points from a model in such a way that the corresponding redistribution of probability does not affect the observed point. On the other hand, C* allows us to add or subtract sample points whose probabilities are constant over θ, without affecting inferences, as long as the redistribution of probability occurs evenly throughout the sample points via multiplication by a constant.
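The C* construction can also be sketched numerically (again with hypothetical numbers): adjoining a point A whose probability q is the same for every θ, and scaling everything else by p = 1 − q, leaves every likelihood ratio between parameter values at the observed point unchanged.

```python
# Sketch of C*: g_θ(s) = p·f_θ(s) for s ∈ S, g_θ(A) = q, with p + q = 1.
p, q = 0.75, 0.25
f = {'a': {0: 0.6, 1: 0.4}, 'b': {0: 0.3, 1: 0.7}}
g = {th: {**{s: p * f[th][s] for s in f[th]}, 'A': q} for th in f}

s = 0                                    # the observed point
ratio_before = f['a'][s] / f['b'][s]     # likelihood ratio under M
ratio_after = g['a'][s] / g['b'][s]      # likelihood ratio under M'
assert abs(ratio_before - ratio_after) < 1e-9
```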

We close this section by defining three more principles. Let M = (S, {f_θ}), and let h be a map with domain S. We call h = h(s) ancillary if Prob(h(s) = h) is independent of θ. Let g_θ(s) = f_θ(s | h) and S_h = {s : h(s) = h}. Define N = (S_h, {g_θ}). According to the conditionality principle C,

C: I(E) = I(F)

when E = (M, s), F = (N, s), and h = h(s) is ancillary. We see immediately that C* is a weak version of C.
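The passage from M to N can be sketched as follows; the model and statistic below are hypothetical, chosen only so that h comes out ancillary.

```python
# Two parameter values θ ∈ {'a', 'b'} over S = {0,1,2,3}, and h(s) = s mod 2.
f = {'a': {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.3},
     'b': {0: 0.2, 1: 0.3, 2: 0.1, 3: 0.4}}
h = lambda s: s % 2

# h is ancillary here: Prob(h(s) = 0) is the same under both parameter values.
prob_h0 = {th: sum(p for s, p in f[th].items() if h(s) == 0) for th in f}
assert abs(prob_h0['a'] - prob_h0['b']) < 1e-9

# Condition on the observed value h = 0: S_h = {0, 2}, g_θ(s) = f_θ(s) / Prob(h = 0).
S_h = [s for s in f['a'] if h(s) == 0]
g = {th: {s: f[th][s] / prob_h0[th] for s in S_h} for th in f}
assert all(abs(sum(g[th].values()) - 1.0) < 1e-9 for th in g)
```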

If M = (S, {f_θ}) and N = (T, {g_θ}) are two models, then the likelihood principle L says that

L: I(E) = I(F)

if E = (M, s), F = (N, t), and if there exists some c > 0 such that

f_θ(s) = c·g_θ(t) for all θ ∈ Θ.

A special case of L occurs if we take c = 1. In this case, the resulting restricted form of L will be denoted L*.

3. LOGICAL IMPLICATIONS OF Ce AND C*

We now prove two theorems based on the principles we have discussed.

THEOREM 1. Ce and R imply L*.

Proof. Consider M = (S, {f_θ}) and N = (T, {g_θ}). Let s ∈ S and t ∈ T. Suppose that f_θ(s) = g_θ(t) for all θ ∈ Θ.

Let S̄ = S − {s} and T̄ = T − {t}, each treated as a single sample point into which the remaining outcomes are censored. Define the new models M' = (S', {f'_θ}) and N' = (T', {g'_θ}) by

S' = {s} ∪ {S̄}, T' = {t} ∪ {T̄},

f'_θ(s) = f_θ(s), g'_θ(t) = g_θ(t),

f'_θ(S̄) = 1 − f_θ(s), g'_θ(T̄) = 1 − g_θ(t).


Let E = (M, s), F = (N, t), and let E' = (M', s), F' = (N', t). By Ce we have that

I(E) = I(E'),

I(F) = I(F').

We are given that f_θ(s) = g_θ(t) for all θ, so that f'_θ(s) = g'_θ(t) for all θ. Furthermore,

f'_θ(S̄) = 1 − f_θ(s) = 1 − g_θ(t) = g'_θ(T̄).

Under the bijection φ: S' ↔ T' given by

φ(s) = t, φ(S̄) = T̄,

we see that E' and F' are equivalent under R, which completes the proof. Q.E.D.

We illustrate the argument as follows: Ce reduces E to E' and F to F', and R then identifies E' with F'.
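Numerically, the two-point reduction in the proof can be sketched like this (the models below are hypothetical, chosen only so that f_θ(s) = g_θ(t) for both parameter values):

```python
# Two models agreeing at the observed points: g_θ(t) = f_θ(s) for θ ∈ {'a', 'b'}.
f = {'a': {'s': 0.2, 'u': 0.5, 'v': 0.3}, 'b': {'s': 0.6, 'u': 0.1, 'v': 0.3}}
g = {'a': {'t': 0.2, 'w': 0.8}, 'b': {'t': 0.6, 'w': 0.4}}

# Censor everything except the observed point into a single point (Ce).
f2 = {th: {'s': f[th]['s'], 'sbar': 1 - f[th]['s']} for th in f}   # M'
g2 = {th: {'t': g[th]['t'], 'tbar': 1 - g[th]['t']} for th in g}   # N'

# The reduced models are relabellings of each other (R): φ(s) = t, φ(S̄) = T̄.
relabel = {'s': 't', 'sbar': 'tbar'}
assert all(abs(f2[th][x] - g2[th][relabel[x]]) < 1e-9
           for th in f2 for x in relabel)
```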

We are now in a position to prove

THEOREM 2. L* and C* imply L.

Proof. Let M = (S, {f_θ}) and N = (T, {g_θ}). Suppose E = (M, s) and F = (N, t) are observed and that

f_θ(s) = c·g_θ(t) for all θ ∈ Θ

for some c > 0. Define E' = (M', s), where M' = (S', {f'_θ}), S' = S ∪ {A}, and

f'_θ(s) = p₁·f_θ(s) for s ∈ S, f'_θ(A) = q₁,

where p₁ + q₁ = 1. In a similar way, define F' from F using p₂ and q₂ in place of p₁ and q₁. Choose p₂ from the unit interval such that p₂ < c. Take p₁ = p₂/c. Now,

f'_θ(s) = p₁·f_θ(s) = p₂·f_θ(s)/c = p₂·c·g_θ(t)/c = p₂·g_θ(t) = g'_θ(t).

It follows from Theorem 1 that

I(E') = I(F').

But by C*,

I(E) = I(E')


and

I(F) = I(F'),

so that

I(E) = I(F),

which completes the proof. Q.E.D.

The argument may be depicted as follows: C* carries E to E' and F to F', and Theorem 1 gives I(E') = I(F').
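The scaling step at the heart of the proof of Theorem 2 can be checked numerically (hypothetical values of c and g_θ(t), not from the paper): with f_θ(s) = c·g_θ(t), choosing p₂ in the unit interval with p₂ < c and setting p₁ = p₂/c makes the two C*-extended models agree exactly at the observed points, so Theorem 1 applies.

```python
c = 2.0
g_t = {'a': 0.1, 'b': 0.3}                  # g_θ(t) for θ ∈ {'a', 'b'}
f_s = {th: c * g_t[th] for th in g_t}       # f_θ(s) = c·g_θ(t)

p2 = 0.5                                    # p2 ∈ (0, 1) with p2 < c
p1 = p2 / c                                 # then p1 ∈ (0, 1) as well
fprime = {th: p1 * f_s[th] for th in f_s}   # f'_θ(s) = p1·f_θ(s)
gprime = {th: p2 * g_t[th] for th in g_t}   # g'_θ(t) = p2·g_θ(t)

# The extended models agree at the observed points for every θ.
assert all(abs(fprime[th] - gprime[th]) < 1e-9 for th in fprime)
```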

4. DISCUSSION

Theorem 1 shows that L* follows logically from Ce and R. This demonstrates the strength of the censoring principle. The consequence is that most frequentist methods, including, for example, significance testing and confidence intervals, are contraindicated. This was pointed out informally by Pratt (1961).

It is a small step to add C* and hence arrive at L. The result is that simple, compelling principles have produced powerful results, which to some may be counterintuitive. Another example of such a result is the recent proof by Evans, Fraser, and Monette (1986) that C implies L. Also, it can be shown directly that C implies Ce (Evans, personal communication).

ACKNOWLEDGEMENTS

I would like to thank Rob Tibshirani and Mike Evans, whose comments and suggestions have proved extremely helpful. I would also like to thank two referees whose suggestions led to significant improvements in the presentation of this work.

REFERENCES

Berger, J.O., and Wolpert, R.L. (1984). The Likelihood Principle. IMS Lecture Notes-Monograph Series, Vol. 6. IMS, Hayward, Calif.

Birnbaum, A. (1962). On the foundations of statistical inference (with discussion). J. Amer. Statist. Assoc., 57, 269-306.

Birnbaum, A. (1972). More on concepts of statistical evidence. J. Amer. Statist. Assoc., 67, 858-861.

Dawid, A.P. (1977). Conformity of inference patterns. Recent Developments in Statistics (J.R. Barra et al., eds.). North-Holland, Amsterdam.

Evans, M., Fraser, D., and Monette, G. (1985). On regularity for statistical models. Canad. J. Statist., 13, 137-144.

Evans, M., Fraser, D., and Monette, G. (1986). On principles and arguments to likelihood. Canad. J. Statist., 14, 181-199.

Godambe, V.P. (1979). On Birnbaum's axiom of mathematically equivalent experiments. J. Roy. Statist. Soc. Ser. B, 41, 107-110.

Pratt, J.W. (1961). Review of Lehmann's Testing Statistical Hypotheses. J. Amer. Statist. Assoc., 56, 163-166.

Pratt, J.W. (1962). Contribution to discussion of Birnbaum's paper. J. Amer. Statist. Assoc., 57, 269-306.

Received 8 June 1985
Revised 16 January 1986
Accepted 16 April 1986

Department of Preventive Medicine and Biostatistics
4th Floor, McMurrich Building
University of Toronto
Toronto, Ontario M5S 1A8