Synthese (2009) 166:91–111
DOI 10.1007/s11229-007-9259-5

How and how not to make predictions with temporal Copernicanism

Kevin Nelson

Received: 13 April 2007 / Accepted: 14 September 2007 / Published online: 11 October 2007
© Springer Science+Business Media B.V. 2007

Abstract Gott (Nature 363:315–319, 1993) considers the problem of obtaining a probabilistic prediction for the duration of a process, given the observation that the process is currently underway and began a time t ago. He uses a temporal Copernican principle according to which the observation time can be treated as a random variable with uniform probability density. A simple rule follows: with a 95% probability,

\[ 39t > T - t > \frac{1}{39}t \]

where T is the unknown total duration of the process and hence T − t is its unknown future duration. Gott claims that this rule is of very general application. In response, I argue that we are usually only entitled to assume approximate temporal Copernicanism. That amounts to taking a probability distribution for the observation time that is, while not necessarily uniform, at least a smooth function. I work from that assumption to carry out Bayesian updating of the probability for process duration, as expressed by my Eq. 11. I find that for a wide range of conditions, processes that have already been underway a long time are likely to last a long time into the future, a qualitative conclusion that is intuitively plausible. Otherwise, however, too much depends on the specifics of various circumstances to permit any simple general rule. In particular, the simple rule proposed by Gott holds only under a very restricted set of conditions.

Keywords Temporal Copernicanism · Anthropic principle · Doomsday Argument

K. Nelson (B)
1203 Wilshire BLVD, Austin, TX 78722, USA
e-mail: [email protected]


1 Introduction: Gott’s simple rule

Sometimes our information enables us to predict the future quite well. There can be little doubt that the sun will rise tomorrow or that Mt. Everest will stand for a long time. But often our predictive powers fall short. We would still like to use what limited information we have to arrive at a probabilistic forecast. How can we do that?

There are, of course, many approaches to that question, some compatible with each other, others not. In 1993, the astrophysicist J. Richard Gott proposed an unusual approach, which he called the delta-t argument. According to him it can give clear-cut probabilistic forecasts even when our information is very limited indeed.

The method is best illustrated by example. In 1969, Gott himself saw the Berlin Wall for the first time. It was then 8 years old. By a temporal Copernican principle, he reasoned that he was highly unlikely to have arrived either in the first small fraction of the wall’s existence or in the last small fraction. More generally, he concluded that for any real number F from 0 to 1, there was a probability F of his having arrived in the middle fraction F of the wall’s then-unknown total duration. That is, he concluded there was a probability of F that

\[ \frac{1}{2} - \frac{F}{2} < \frac{t}{T} < \frac{1}{2} + \frac{F}{2} \tag{1} \]

where T stands for the wall’s total duration, t0 stands for the time when it was built, and t stands for its age when he arrived. The idea is that there was nothing “special” about the time of his arrival; it could have occurred equally well at any moment during the wall’s duration. So t is treated as a random variable with a uniform probability distribution between 0 and T.

From Eq. 1, it follows via straightforward algebra that

\[ \left(\frac{1+F}{1-F}\right) t > T - t > \left(\frac{1-F}{1+F}\right) t \tag{2} \]

also has a probability F of holding true. This is a probabilistic prediction for T − t, the wall’s duration subsequent to Gott’s arrival at it. Using F = 0.95 and the known value of t = 8 years, Gott concluded that the wall had a 95% chance of lasting less than $\left(\frac{1+0.95}{1-0.95}\right)t = 39 \cdot 8 = 312$ years into the future but more than $\left(\frac{1-0.95}{1+0.95}\right)t = (1/39) \cdot 8 = 0.21$ years.

Twenty years later, the wall came down and the prediction was confirmed. The success inspired Gott to propose Eq. 2 as a rule of very general application. Throughout several subsequent publications he has recognized only two conditions as necessary to apply the rule. The first condition is that the time when a process is observed be in no way “special.” The second condition, which in fact he does not seem to regard as strictly necessary, is that we lack “actuarial data” to guide our probability assignments (Gott 1993, 1996, 1997). If those conditions are met, then the equation can give us substantial information even when we start from near-total ignorance about the nature of the process. (For present purposes we may consider the persistence of an object, such as the Berlin Wall, to be a sort of process.) Among the many predictions Gott has made have been ones about the future duration of Stonehenge, of the journal Nature, and of the British government. He finds the underlying idea to be very intuitive: We are unlikely to find ourselves, merely by coincidence, observing a process either very near its beginning or very near its end. For a process that has already been underway a long time, an ending in the near future would imply that the unlikely coincidence had occurred; not so for a process that began relatively recently. Therefore, “Things that have been around a long time tend to stay around a long time. Things that haven’t been around long may be gone soon.” (Gott 2001, p. 219).
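
As a small illustration of how Eq. 2 is applied, the following Python sketch computes the two bounds on the future duration T − t from an observed age t and a chosen probability F. The function name `gott_interval` and its argument checks are assumptions made for this example only; they are not taken from Gott's papers.

```python
def gott_interval(t, F=0.95):
    """Bounds on the future duration T - t under the simple rule (Eq. 2).

    t : observed age of the process (any time unit)
    F : probability assigned to the interval, strictly between 0 and 1
    Returns (lower, upper) in the same unit as t.
    """
    if not 0.0 < F < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    if t <= 0:
        raise ValueError("t must be positive")
    lower = t * (1.0 - F) / (1.0 + F)
    upper = t * (1.0 + F) / (1.0 - F)
    return lower, upper


# Berlin Wall example from the text: t = 8 years, F = 0.95.
lower, upper = gott_interval(8.0, 0.95)
print(f"future duration between {lower:.2f} and {upper:.0f} years")
```

With t = 8 and F = 0.95 the bounds agree with the figures quoted above, roughly 0.21 and 312 years.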

Let us call Eq. 2 the simple rule. But does it really work? Several authors have challenged it (Caves 2000; Sober 2003; Bass 2006). And anyway, just what does it mean for an observation time to be “special”?

2 A Bayesian argument

2.1 Temporal Copernicanism

To proceed, I will first present a Bayesian argument based on Gott’s underlying idea. It was Buch (1994) who first pointed out the desirability of using a Bayesian approach. In response, Gott (1994, 1996) reformulated the argument accordingly. Caves (2000) and Ledford et al. (2001) have also presented Bayesian formulations of the argument. But my development will highlight some points that have so far been underappreciated.

Three moments in time are of concern: the starting time of the process, the ending time of the process, and the observation time. Let us adopt the viewpoint of the observer, so that the observation time counts as the present moment, the now. Temporal Copernicanism means that the observation time is treated as random; from the observer’s own perspective, that will then be transposed into randomness as to how long ago the process started or how far in the future it will start. With that in mind consider the following propositions, where “S” designates the process of interest.

• $c^S$: Process S is currently in progress.
• $a^S_{t,\Delta t}$: Process S began more than time t ago but less than time t + Δt ago.
• $d^S_{T,\Delta T}$: Process S will have lasted more than a total time T but less than a total time T + ΔT.

(For t and t + Δt negative, $a^S_{t,\Delta t}$ says that process S will begin in the future. But of course, a negative total duration T is impossible.)

As the words “currently” and “ago” indicate, $c^S$ and $a^S_{t,\Delta t}$ have different truth-values at different times. We thus might regard them as indexical propositions of a sort, since their truth-value will depend on context if the time when they are entertained is regarded as part of the context. But there is no need for me to become entangled in thorny issues about the semantics of indexicality; all that my approach requires is for such propositions to bear probabilities, and for ordinary theorems of the probability calculus to be applicable.¹

1 On the influential theory of indexicality proposed by Kaplan (1988), there is no such thing as an indexical proposition; rather, indexical sentences express different singular propositions in different contexts. The indexical propositions $c^S$ and $a^S_{t,\Delta t}$ are perhaps closer to what Kaplan would call “characters.” But nothing in my account hinges on these highly debatable issues. If the reader prefers, phrases such as “currently in progress” can be translated into ones such as “in progress when the observation is made.”

I exclude from consideration processes that might have infinite duration. (My approach could be extended to deal with them; the exclusion is merely for ease of presentation.) As a technical point, I regard a process as lasting over a closed finite interval in the time line, and I permit the degenerate case of a process that lasts merely for one instant. Note that $d^S_{T,dT}$ entails that process S will have occurred at some time; however, I leave open the possibility that the process will never occur at all. Consequently, the integral of $P(d^S_{T,dT})$ from T = 0 to T = ∞ may be less than one.²

Let our goal be to find a posterior probability distribution for the total duration of process S. (That is equivalent to working with the future duration but turns out to be simpler.) That is, we want to know the probability of $d^S_{T,dT}$ conditional on the available information. In order to avoid extraneous (and difficult) issues associated with the old evidence problem, assume a situation in which an observer starts out completely ignorant as to whether process S is currently in progress and also as to how long ago it started, if it has even started yet at all.³ The observer then learns for the first time that the process is in fact in progress and began a time t ago. Let B be the total background information of the observer immediately before learning that. This background information will include, for example, information about other processes similar to process S. As a notational point, I will regard B as a single proposition, a conjunction of many individual propositional items.

The desired posterior probability distribution is $P(d^S_{T,dT} \mid c^S \,\&\, a^S_{t,dt} \,\&\, B)$. This is an infinitesimal probability, infinitesimal in dT but not in dt. Dividing it by dT will yield a differential probability density.

We are supposing that the observation time is random. If it is completely random, then for any value T of the total duration, the observation could equally well occur at any t. (Since t is the observation time minus the starting time of the process, a positive value corresponds to an observation made after the process has started while a negative value corresponds to an observation made before.) We are thus led to the following.

The ideal temporal Copernican assumption:
For all T, the probability distribution $P(a^S_{t,dt} \mid d^S_{T,dT} \,\&\, B)$ is independent of t.

Normalization in t will then require that the probability also be independent of T.

The above is essentially the same as the temporal Copernican principle of Caves (2000). Caves correctly notes that it does not always hold true. However, he does not explicitly point out that the truth of the principle depends on what background information is available. In fact, it will unfortunately fail for the vast majority of S and B. If nothing else, we may be quite confident that process S began no more than 14 billion years ago, the time of the Big Bang. Usually, the background information will provide a far stronger constraint than that. Until yesterday, I had no idea as to when the Chateau de Chambord was built or whether it still existed; but I could still easily rule out the possibility that it was constructed a million years ago.

2 I do exclude the case of the process stopping and then restarting. Without loss of generality, that may be regarded as a case of two different processes.

3 Note that I am thus ruling out Gott’s Berlin Wall example from consideration. The old evidence problem, first posed by Glymour (1980), is based on a straightforward Bayesian calculation. Old evidence, with a probability of 1, appears unable to raise the probability of any hypothesis. Yet it appears that sometimes old evidence does support hypotheses; for example, old observations of Mercury’s orbit supported Einstein’s general theory of relativity. A substantial literature exists on this problem, with debate continuing.

2.2 Approximate temporal Copernicanism

Caves suggests that we can sometimes take some timespan Δ, large compared to all other timescales of interest, and that within that timespan temporal Copernicanism will hold. One problem with that suggestion is that our background information B will seldom require that the process must occur within some sharply defined timespan. And even if it does, then it will seldom be the case that the process is equally likely to occur at any point in that timespan. (Much more often, an occurrence towards the middle of the timespan will be favored.)

I thus go beyond Caves (and, to my knowledge, all other authors who have written on this topic) by considering what happens when temporal Copernicanism is merely approximate. To arrive at the approximation, first note that it is for large T values that temporal Copernicanism is most likely to break down. If the process is very long-lived, then sometimes the observation is especially likely to occur near the beginning of the process.

I follow Gott and also Olum (2002) in using Broadway plays as an illustration; they are actually a far better example of approximate than ideal temporal Copernicanism. Let the background information be that of an ordinary person in 2007 who is only ordinarily well-informed about theater. Given such background information, a tremendously durable play with a thousand-year run is far more likely to be observed towards the beginning of its run than towards its end. (It is far more likely that a play beginning in 2007 will run until 3007 than that one ending in 2007 began in 1007.) Still, precisely because a play with such a long run is so unlikely in the first place, this is little threat to temporal Copernicanism.

For the above reason I restrict consideration to what I call “plausible T values,” meaning specifically T values that are not implausibly large. This is somewhat similar to Caves’ suggestion of a large-timescale cutoff, but applied only to the duration T. Whatever the bound is for plausible T values, there will have to be a negligible probability of T being larger.⁴ A new assumption may now be stated.

The approximate temporal Copernican assumption:
For all plausible T values, the probability distribution $P(a^S_{t,dt} \mid d^S_{T,dT} \,\&\, B)$ has negligible variation in t over a timescale T.

The above assumption continues to be based on Gott’s intuition that the observation time is random. The idea is that any plausible duration T, together with the background information B, provides only a weak constraint on when the observation is made. It follows that t can be treated as a random variable with a very spread-out probability distribution, spread-out as compared to T.

4 “Negligible probability” here requires that both the prior probability and the posterior probability conditional on c be negligible.

The assumption is clearly far from a universal truth. When will it hold? In short, it will hold if whatever causes the observation to be made at a particular time (by an observer with background information B) has little connection with whatever causes or permits the process to start or stop.⁵ A slight connection is allowable. Otherwise we would have ideal temporal Copernicanism.

From the viewpoint of the observer, t gives how long ago the process started (with a negative t corresponding to a process that will start in the future). The above assumption can then be regarded as pertaining to how long ago the process started. Given a plausible total duration of a thousand years, the process is about equally likely to have begun a thousand years ago, or two thousand, or to be beginning right now, or to begin a thousand years in the future. But it may be much less likely to have begun 10 million years ago. Given a plausible total duration of 1 day, the process is about equally likely to have begun today, yesterday, or tomorrow; but it may be much less likely to have begun last year, or to begin next year.

Put otherwise, the approximate temporal Copernican assumption means that B, together with the total duration of the process being T, merely confines the starting time to some zone that is broad compared to T.

Ex hypothesi, the observer starts out not knowing when the process began or will begin. The assumption means that this ignorance is well and truly deep, so that the observer has no reason to favor one t value over another within a range large compared to T. Furthermore, the ignorance must be so deep that even if the total duration T becomes known, that information by itself will not help.⁶

Happily, it is precisely when we suffer from such deep ignorance that approaches of the Copernican sort are most appealing. If we know more, then there are ample alternatives. I thus feel comfortable in working from the approximate temporal Copernican assumption as a starting point. What this largely boils down to is that I am restricting myself to B that do not start out with too much information about how long ago the process started or how far in the future it will start.

It is of course always possible that we are not quite as ignorant as we think. We may possess information with implications that we have failed to grasp, perhaps through lack of logical omniscience. But that is the sort of risk we always run in any probabilistic calculation. Quantifying that risk may be difficult, but in practice it seems safe to say that it is small.

5 This statement is phrased to give a sufficient condition, but the condition is also close to being necessary. There is just a tiny bit of room for an observation that does have a substantial connection with whatever starts or stops the process to nonetheless occur at an effectively random time.

6 It may appear that I am relying on the Principle of Indifference in making a connection between ignorance and a probability distribution that is, if not absolutely uniform, at least smooth and broad. Fortunately I need not rely on that often-refuted Principle in any general form. A very narrow version of it, applicable only to time itself, is adequate and seems perfectly reasonable. I largely agree with Gott (1994) that such a narrow Principle is fit to serve as a foundation for temporal Copernicanism. It is, if you will, a consistent restriction of the Principle, such as has been discussed by several recent authors (Castell 1998; Bartha and Johns 2001; Mikkelson 2004). In any case, this is all rather tangential; any reader who objects to a move from ignorance to the approximate temporal Copernican assumption can regard the latter as fundamental, taking it as an assumption that should be independently reasonable for many S and B.

Broadway plays can continue to serve as an illustration. First of all, let us say that such plays have a plausible duration of no more than 50 years; that is a somewhat arbitrary bound, but surely a play that runs continuously for more than 50 years is very unlikely. Let me use a specific play as an example. I have just found that Deuce (a play I had never heard of before) has been showing at Broadway’s Music Box Theatre for the past 9 weeks. My decision to observe what was playing in the Music Box Theatre had essentially no connection with that particular play. So I could just as easily have made my observation at any time during the play’s run, or well before the run began, or well after the run ended. Supposing that the play will have had a total run of 1 year, my observation was about equally likely to be made a year or two before the beginning of the run, or a year or two after the end of the run. But an observation 40 years before or after the beginning of the run was rather less likely. For one thing, the longevity of the Music Box Theatre itself would then be an issue. But if the play will have had a total run of 20 years, that will make a difference. Then my observation will be about equally likely to have occurred 40 years before or after the beginning of the run. The longevity of the play will provide evidence for the longevity of the theater. Still, an observation 200 years before or after the beginning of the play’s run may be considerably less likely. Finally, if the play will have had an implausibly long run of more than 50 years, approximate temporal Copernicanism will begin to break down; we will be in the same sort of situation as discussed previously for the thousand-year play.

The approximate temporal Copernican assumption does not require that a difference in duration has the sort of effect described above; it merely allows it. That is, the duration of the process is allowed to influence the probability distribution of t, just as long as that distribution remains broad and spread-out on a timescale of T. Such an influence should make intuitive sense. We often expect that a very long-lived process will be more likely to have started in the remote past or to start in the remote future than a short-lived one. (Partly that is just because short-lived processes taking place entirely in the remote past or future are unlikely to be of interest to us at all. But there is also a less subjective reason related to the prerequisites of the process, as will be discussed in Sect. 7.1.)

Other examples are easy enough to come by. Among phenomena of scientific interest, astronomical processes are especially good examples; a previously unknown astronomical process will very rarely have much connection with the development of techniques that allow it to be observed. If the process is a flare on a distant star, the star could equally well have been observed either long before the flare began or long after it ended.

2.3 Updating with the Gott factor

Purely for ease of discussion, let me assume henceforth that the probability distribution $P(d^S_{T,dT} \mid B)$ is continuous for all T > 0 and has at most an integrable singularity at T = 0. That assumption, while not strictly necessary, is eminently plausible and will make some of the mathematical steps easier.


From the approximate temporal Copernican assumption, another important approximate result now follows. As with all other approximate results from now on, it is an element of the approximation that T is restricted to take on only plausible values. For brevity I will no longer write the superscript S or the additional background information B, since both are constant and always present.

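\[
\begin{aligned}
P(a_{t,dt} \mid c \,\&\, d_{T,dT})
&= \frac{P(c \mid a_{t,dt} \,\&\, d_{T,dT})\, P(a_{t,dt} \mid d_{T,dT})}{\int_{t'=-\infty}^{\infty} P(c \mid a_{t',dt'} \,\&\, d_{T,dT})\, P(a_{t',dt'} \mid d_{T,dT})} \\
&= \begin{cases} \dfrac{P(a_{t,dt} \mid d_{T,dT})}{\int_{t'=0}^{T} P(a_{t',dt'} \mid d_{T,dT})}, & 0 \le t \le T \\ 0, & \text{otherwise} \end{cases} \\
&= \begin{cases} dt/T, & 0 \le t \le T \\ 0, & \text{otherwise} \end{cases}
\end{aligned}
\tag{3}
\]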

The above is very similar to Gott’s original assumption of temporal Copernicanism. It should make intuitive sense because the conjunction $c \,\&\, d_{T,dT}$ constrains t (or, put more pedantically, constrains the value of t for which $a^S_{t,dt}$ is true) to be in the interval from 0 to T but is otherwise irrelevant to it.
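
The content of Eq. 3 can be checked numerically. The Python sketch below is only an illustration under stated assumptions: it fixes one total duration T, draws t (the time since the process began) from a uniform distribution over a range much wider than T as a stand-in for a "spread-out" distribution, keeps only the draws in which the process is in progress, and estimates how often t/T falls in the middle fraction F. The function name, the uniform choice, and the parameter values are all hypothetical.

```python
import random


def middle_fraction_hit_rate(T=1.0, spread=50.0, F=0.95, n=200_000, seed=0):
    """Estimate P((1-F)/2 < t/T < (1+F)/2 | process currently in progress)."""
    rng = random.Random(seed)
    in_progress = 0
    in_middle = 0
    for _ in range(n):
        # t is the time since the process began (negative: not yet begun).
        # Assumed here to be flat over a range much wider than T.
        t = rng.uniform(-spread * T, spread * T)
        if 0.0 <= t <= T:  # proposition c: observed while in progress
            in_progress += 1
            if (1.0 - F) / 2.0 < t / T < (1.0 + F) / 2.0:
                in_middle += 1
    return in_middle / in_progress


print(middle_fraction_hit_rate())
```

Under these assumptions the estimate should come out close to F, which is what a conditional distribution of dt/T on the interval from 0 to T implies.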

Bayes’ Theorem now yields a straightforward result.

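\[
\begin{aligned}
P(d_{T,dT} \mid c \,\&\, a_{t,dt})
&= \frac{P(a_{t,dt} \mid c \,\&\, d_{T,dT})}{\int_{T'=0}^{\infty} P(a_{t,dt} \mid c \,\&\, d_{T',dT'})\, P(d_{T',dT'} \mid c)}\, P(d_{T,dT} \mid c) \\
&= \begin{cases} \dfrac{(dt/T)\, P(d_{T,dT} \mid c)}{\int_{T'=t}^{\infty} (dt/T')\, P(d_{T',dT'} \mid c)}, & T \ge t \\ 0, & T < t \end{cases} \\
&= \begin{cases} \dfrac{(1/T)\, P(d_{T,dT} \mid c)}{\int_{T'=t}^{\infty} (1/T')\, P(d_{T',dT'} \mid c)}, & T \ge t \\ 0, & T < t \end{cases}
\end{aligned}
\tag{4}
\]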

Let θ(T ) be the step function

\[ \theta(T) = \begin{cases} 1, & T \ge 0 \\ 0, & T < 0 \end{cases} \tag{5} \]

As a restatement of Eq. 4, we can obtain $P(d_{T,dT} \mid c \,\&\, a_{t,dt})$ from $P(d_{T,dT} \mid c)$ by multiplying by the factor $\theta(T - t)\,T^{-1}$ and then renormalizing. I will call $\theta(T - t)\,T^{-1}$ the “Gott factor.” (It is sometimes called the “anthropic factor,” but the terminology is not consistent in the literature.)
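
The multiply-and-renormalize step can be sketched on a discrete grid of candidate durations. This is an illustrative discretization, not the paper's continuous formulation: the function name `gott_update`, the grid, and the example masses are assumptions introduced here.

```python
def gott_update(T_grid, masses, t):
    """Apply the Gott factor theta(T - t) / T to P(d_T | c) and renormalize.

    T_grid : positive candidate total durations (hypothetical grid)
    masses : probability mass on each grid value, standing in for P(d_T | c)
    t      : observed time elapsed since the process began
    """
    if any(T <= 0 for T in T_grid):
        raise ValueError("grid durations must be positive")
    # theta(T - t) removes durations shorter than the observed age;
    # the 1/T factor down-weights long durations.
    raw = [m / T if T >= t else 0.0 for T, m in zip(T_grid, masses)]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no probability mass at T >= t")
    return [r / total for r in raw]


# Hypothetical example: four candidate durations, equally weighted beforehand.
posterior = gott_update([1.0, 2.0, 4.0, 8.0], [0.25, 0.25, 0.25, 0.25], t=3.0)
print(posterior)
```

In this example the durations 1 and 2 are excluded because they are shorter than t = 3, and the remaining mass is split 2:1 in favor of the shorter surviving duration.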

3 The choice of prior

If the Jeffreys prior $P(d_{T,dT} \mid c) = dT/T$ is used, then we will obtain a posterior probability distribution

\[ P(d_{T,dT} \mid c \,\&\, a_{t,dt}) = \begin{cases} t\,dT/T^{2}, & T \ge t \\ 0, & T < t \end{cases} \tag{6} \]


If the above is integrated from $T = \frac{2}{1+F}\,t$ to $T = \frac{2}{1-F}\,t$ and a slight algebraic rearrangement is performed, then Gott’s simple rule is recovered.
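
For readers who want the intermediate steps, the integration and rearrangement can be written out as follows. This is a worked restatement of the step just described, using only the posterior density $t/T^{2}$ of Eq. 6 and the stated limits.

```latex
\begin{align*}
\int_{2t/(1+F)}^{2t/(1-F)} \frac{t}{T^{2}}\,dT
  &= t\left[-\frac{1}{T}\right]_{2t/(1+F)}^{2t/(1-F)}
   = \frac{1+F}{2} - \frac{1-F}{2} = F, \\
\frac{2t}{1+F} < T < \frac{2t}{1-F}
  &\iff \left(\frac{1-F}{1+F}\right) t < T - t < \left(\frac{1+F}{1-F}\right) t .
\end{align*}
```

The second line follows by subtracting t from each limit, which gives the two bounds of Eq. 2.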

Gott (1994, 1996) argues in favor of using the Jeffreys prior on the grounds that it reflects total ignorance. This distribution is non-normalizable, but he proposes to cure that apparent defect by imposing a high-T cutoff equal to the lifetime of the proton or some other physical timescale that limits just about any process we are interested in. (The limiting timescales in question are extremely large, such as $10^{5,000,000}$ years.)⁷

My main objection is that $P(d_{T,dT} \mid c)$ is far from reflecting complete ignorance. This probability distribution is conditional on the fact that the process is currently underway, which is a very significant piece of information that by assumption the background information does not contain. We start with $P(d_{T,dT})$ as a prior and only subsequently update to the new probability distribution $P(d_{T,dT} \mid c)$. In my framework, we may start out ignorant (that is, the background information B may be meager), but then it is $P(d_{T,dT})$ rather than $P(d_{T,dT} \mid c)$ that reflects ignorance. So even if we accept the move from ignorance to use of the Jeffreys prior, that move is inapplicable to $P(d_{T,dT} \mid c)$.

So then should we set $P(d_{T,dT})$ equal to the Jeffreys prior? In general, no. B will often provide constraints on what values T is likely to have. Most notably, B will sometimes include information about a large number of processes similar to the one at hand. When using such a sample we will naturally restrict ourselves to the subset of the sample consisting of processes that have ended already, since the processes that are still in progress have unknown total duration. We can then find the distribution of total duration within that subset, expressible in the form of a histogram. That distribution will be a good approximation to $P(d_{T,dT})$, and it may be very different from the Jeffreys prior.⁸

In fact, it is perfectly possible for $P(d_{T,dT})$ to be a strongly peaked distribution. To be sure, in order for the approximate temporal Copernican assumption to hold at all, B must reflect considerable ignorance about the process’s starting time. But that is perfectly compatible with having a great deal of information about its duration.

This is the sort of circumstance that Gott had in mind when he recognized, albeit somewhat hesitantly, that his approach is unsuitable when we have “actuarial data.”⁹ Then what if such data are lacking? In that case should we set $P(d_{T,dT})$ equal to the Jeffreys prior?

I think the answer is still no, because we hardly ever start out with total ignorance. Even if we know of no processes that are very similar to the one at hand, we can resort to analogies of one sort or another. And even the most far-fetched analogies will still provide some information. To return to the Berlin Wall example, we will know something about the longevity of other political barriers to emigration.

7 The Jeffreys prior was advocated forcefully by Jeffreys (1961). It has subsequently been controversial among Bayesians, as discussed by Press (1988). But the imposition of the cutoff averts most of the criticisms.

8 Some Bayesians might not consider $P(d_{T,dT})$ to deserve the name of “prior,” since sometimes it is based on observations, even if observations of processes other than the one of immediate interest. I plead brevity as an extenuating circumstance.

9 “My Copernican formula is most useful when examining the longevity of something. . . for which there is no actuarial data available.” (Gott 1997) Note that he implies his approach may still be of some use when we do have actuarial data. But he does not explain in any of his publications exactly what that limited usefulness is.

In practice, that sort of information will just about always go far beyond the limit imposed by Gott’s huge high-T cutoff. Under the Jeffreys prior with cutoff, the probability of total duration between $10^{4,999,999}$ and $10^{5,000,000}$ years is the same as the probability of total duration between 10 and 100 years. As Olum (2002) rightly notes, that is absurd. I think the reason is that only a tiny amount of initial information (gained from remote analogies or in any other way) is necessary to make the former probability vastly less than the latter.
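
The equal-probability-per-decade property is easy to verify. The sketch below works with base-10 logarithms of the durations, since $10^{5,000,000}$ cannot be represented as a floating-point number. The lower cutoff of 1 year is a hypothetical assumption added here so that the 1/T density can be normalized; the text specifies only the high-T cutoff.

```python
def log_uniform_prob(log10_a, log10_b, log10_min, log10_max):
    """P(a < T < b) for a density proportional to 1/T on [T_min, T_max].

    All arguments are base-10 logarithms of durations in years. Under a 1/T
    density, probability is proportional to the width of the interval in log T.
    """
    lo = max(log10_a, log10_min)
    hi = min(log10_b, log10_max)
    if hi <= lo:
        return 0.0
    return (hi - lo) / (log10_max - log10_min)


LOG10_MIN = 0.0          # hypothetical lower cutoff of 1 year (assumption)
LOG10_MAX = 5_000_000.0  # high-T cutoff of 10^5,000,000 years from the text

p_short = log_uniform_prob(1.0, 2.0, LOG10_MIN, LOG10_MAX)  # 10 to 100 years
p_long = log_uniform_prob(4_999_999.0, 5_000_000.0, LOG10_MIN, LOG10_MAX)
print(p_short, p_long)
```

Both probabilities come out equal, because each interval spans exactly one decade in T.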

Even if the Jeffreys prior goes too far, it is certainly true that when we start out with a small amount of information about the process’s total duration, $P(d_{T,dT})$ will be a very spread-out function of T. It may, for example, be roughly proportional to $T^{-(1+\alpha)}$ for large T and to $T^{-(1-\beta)}$ for small T. Here α and β are parameters that must be greater than zero in order for the probability distribution to remain normalizable. A small α means that the probability distribution approaches zero slowly for large T, while a small β means that it has a relatively strong integrable singularity at T = 0. The closer both parameters are to zero, the less overall initial information there will be; that assertion can be formalized using the Shannon approach (Shannon and Weaver 1949, pp. 54–58; Schuster and Just 2005, pp. 243–245), but it should make sense intuitively.

When we start out with really exceptionally little information, we may wish to use a distribution such as the log-Cauchy distribution, which while normalizable shows a large-T decay slower than $T^{-(1+\alpha)}$ for any α > 0. (It has analogous behavior for small T.) The important point is that we will always start out with at least some small information about the total duration of process S, and that we have ample resources to express that small information via a Bayesian prior other than the troublesome Jeffreys prior, which remains troublesome even with its cutoff.¹⁰
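
To make the comparison concrete, the sketch below evaluates a log-Cauchy density against a power law $T^{-(1+\alpha)}$. The location and scale parameters (`mu`, `sigma`), the value of α, and the sample points are illustrative assumptions, not values taken from the paper.

```python
import math


def log_cauchy_pdf(T, mu=0.0, sigma=1.0):
    """Density of T when ln T follows a Cauchy(mu, sigma) distribution."""
    z = math.log(T) - mu
    return sigma / (math.pi * T * (z * z + sigma * sigma))


def log_cauchy_cdf(T, mu=0.0, sigma=1.0):
    """Cumulative probability; it tends to 1 as T grows, so the density is normalizable."""
    return 0.5 + math.atan((math.log(T) - mu) / sigma) / math.pi


alpha = 0.5  # hypothetical power-law exponent parameter
ratios = []
for T in (1e2, 1e4, 1e6):
    # Ratio of the log-Cauchy density to the power law T^-(1+alpha).
    ratios.append(log_cauchy_pdf(T) / T ** -(1.0 + alpha))
print(ratios)
```

For these parameter choices the ratio grows with T over the sampled range, which is what a tail that decays more slowly than the power law looks like.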

So farewell to the Jeffreys prior. But if the Bayesian argument presented in the previous section is valid at all, it should continue to be valid even if we start with a quite informative prior $P(d_{T,dT})$. We can thereby retain part of the Gott approach while avoiding the absurdities that Caves (2000) and Sober (2003) quite rightly see flowing from indiscriminate use of the simple rule.

4 The probability of observing a process in progress

The procedure that is called for is to start with $P(d_{T,dT})$, to move from there to $P(d_{T,dT} \mid c)$, and then finally to arrive at $P(d_{T,dT} \mid c \,\&\, a_{t,dt})$.

10 Monton and Kierland (2006) argue that in addition to the Gott argument, one can make a different kind of argument that relies on inference to the best explanation. For example, one might reason that an object that has so far lasted a long time is likely to be sturdy, and that a sturdy object will last a long time into the future. I think that kind of reasoning can be entirely incorporated into the choice of prior for total duration, and hence it is not separate from the Gott argument at all. If the object might have a wide range of degrees of sturdiness, then the prior probability distribution will be well spread-out. This is a tangential issue, however, so I will not discuss it further.

So how exactly can we relate $P(d_{T,dT})$ and $P(d_{T,dT} \mid c)$? We can arrive at the answer through several steps. First note that if a process is just now beginning, then it must be counted as currently in progress. So $P(d_{T,dT} \mid a_{0,dt}) = P(d_{T,dT} \mid c \,\&\, a_{0,dt})$. Next, take the special case of Eq. 4 with t = 0:

$$P(d_{T,dT} \mid c \,\&\, a_{0,dt}) = \frac{(1/T)\,P(d_{T,dT} \mid c)}{\int_{T'=0}^{\infty} (1/T')\,P(d_{T',dT'} \mid c)} \tag{7}$$

The above equation can easily be inverted to obtain

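$$P(d_{T,dT} \mid c) = \frac{T\,P(d_{T,dT} \mid c \,\&\, a_{0,dt})}{\int_{T'=0}^{\infty} T'\,P(d_{T',dT'} \mid c \,\&\, a_{0,dt})} = \frac{T\,P(d_{T,dT} \mid a_{0,dt})}{\int_{T'=0}^{\infty} T'\,P(d_{T',dT'} \mid a_{0,dt})} \tag{8}$$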

Now we need to look at $P(d_{T,dT} \mid a_{0,dt})$. As an immediate application of Bayes’ Theorem, we have

$$P(d_{T,dT} \mid a_{0,dt}) = \frac{P(a_{0,dt} \mid d_{T,dT})\,P(d_{T,dT})}{\int_{T'=0}^{\infty} P(a_{0,dt} \mid d_{T',dT'})\,P(d_{T',dT'})} \tag{9}$$

A straightforward combination of Eqs. 8 and 9 finally yields

$$P(d_{T,dT} \mid c) = \frac{T\,P(a_{0,dt} \mid d_{T,dT})\,P(d_{T,dT})}{\int_{T'=0}^{\infty} T'\,P(a_{0,dt} \mid d_{T',dT'})\,P(d_{T',dT'})} \tag{10}$$

In other words, to obtain $P(d_{T,dT} \mid c)$ from $P(d_{T,dT})$, the rule is to multiply by $T\,P(a_{0,dt} \mid d_{T,dT})$ and then renormalize. The reason for the factor of T is easy enough to understand: longer-lived processes are more likely to be currently in progress. So given that a process is currently in progress, we should assign more probability to the $d_{T,dT}$ with larger T. This factor is essentially the same as the one first presented by Caves.
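As a rough numerical illustration of this rule (not taken from the paper), the sketch below applies a discrete analogue of Eq. 10 to a hypothetical three-point prior. The function name `in_progress_update`, the toy durations and prior masses, and the two choices of `start_likelihood` (standing in for $P(a_{0,dt} \mid d_{T,dT})$) are all assumptions made for the example.

```python
def in_progress_update(durations, prior, start_likelihood):
    """Discrete analogue of Eq. 10: weight each prior mass by
    T * P(a_0 | T), then renormalize so the masses sum to 1."""
    weights = [T * start_likelihood(T) * p for T, p in zip(durations, prior)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical prior over three candidate total durations (arbitrary units).
durations = [1.0, 10.0, 100.0]
prior = [0.5, 0.3, 0.2]

# Start probability independent of T: only the factor of T acts,
# so probability moves toward the long-lived hypotheses.
flat_start = in_progress_update(durations, prior, lambda T: 1.0)

# Start probability proportional to 1/T: the factor of T is cancelled
# and the prior comes back unchanged.
inverse_start = in_progress_update(durations, prior, lambda T: 1.0 / T)

print(flat_start)
print(inverse_start)
```

The second call anticipates a point made later in the paper: a start probability that falls off as 1/T exactly offsets the advantage that long-lived processes gain from being more likely to be in progress.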

5 The three factors put together

Section 2.3 showed how to move from $P(d_{T,dT} \mid c)$ to $P(d_{T,dT} \mid c \,\&\, a_{t,dt})$ and Sect. 4 showed how to move from $P(d_{T,dT})$ to $P(d_{T,dT} \mid c)$. Putting those two results together, this paper’s central result immediately follows:

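$$\begin{aligned} P(d_{T,dT} \mid c \,\&\, a_{t,dt}) &= \frac{\theta(T-t)\,T^{-1}\,T\,P(a_{0,dt} \mid d_{T,dT})\,P(d_{T,dT})}{\int_{T'=t}^{\infty} \theta(T'-t)\,T'^{-1}\,T'\,P(a_{0,dt} \mid d_{T',dT'})\,P(d_{T',dT'})} \\ &= \frac{\theta(T-t)\,P(a_{0,dt} \mid d_{T,dT})\,P(d_{T,dT})}{\int_{T'=t}^{\infty} \theta(T'-t)\,P(a_{0,dt} \mid d_{T',dT'})\,P(d_{T',dT'})} \end{aligned} \tag{11}$$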

The only difference between the right-hand side of the first and second lines is that the factors of $T^{-1}$ and T have cancelled each other out. At first sight, that cancellation appears to utterly invalidate the Gott approach, just as Caves and Olum would have it. And indeed, even at second sight it requires a major modification to that approach. The key factor is no longer the Gott factor $\theta(T-t)\,T^{-1}$, but rather $\theta(T-t)\,P(a_{0,dt} \mid d_{T,dT})$.

I will conceptualize the application of Eq. 11 as consisting of two steps. In the first step, we multiply by the step function $\theta(T-t)$ and renormalize. In the second step, we multiply the result of the first step by $P(a_{0,dt} \mid d_{T,dT})$ and renormalize again.

I will call the first step “simple clipping” and will call the resulting probability distribution a “simply-clipped posterior.” Note that if $P(a_{0,dt} \mid d_{T,dT})$ is constant, then the second step will have no effect and the effect of Eq. 11 will automatically reduce to simple clipping.

(Incidentally, if there is some chance of the process never occurring at all and hence $P(d_{T,dT})$ is not normalized, that will make no difference. So henceforth, without loss of generality, I take $P(d_{T,dT})$ to be normalized.)
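The two-step reading of Eq. 11 can be sketched on a discrete grid of candidate durations. This is only an illustration: `simple_clip`, `full_update`, the grid, and the prior masses are hypothetical, and the strict inequality `T > age` is just one possible convention for the step function at T = t.

```python
def renormalize(masses):
    total = sum(masses)
    return [m / total for m in masses]


def simple_clip(durations, prior, age):
    """Step 1: multiply by theta(T - t) and renormalize."""
    return renormalize([p if T > age else 0.0 for T, p in zip(durations, prior)])


def full_update(durations, prior, age, start_likelihood):
    """Step 2: multiply the simply-clipped posterior by P(a_0 | T)
    and renormalize again (discrete analogue of Eq. 11)."""
    clipped = simple_clip(durations, prior, age)
    return renormalize([start_likelihood(T) * c for T, c in zip(durations, clipped)])


durations = [1.0, 5.0, 20.0, 80.0]   # hypothetical grid of total durations
prior = [0.4, 0.3, 0.2, 0.1]         # hypothetical prior masses
age = 3.0                            # observed current age t

clipped = simple_clip(durations, prior, age)
constant = full_update(durations, prior, age, lambda T: 1.0)
inverse = full_update(durations, prior, age, lambda T: 1.0 / T)

print(clipped)
print(constant)
print(inverse)
```

With a constant start likelihood the second step leaves the simply-clipped posterior unchanged, as noted in the text; with a decreasing one, weight moves back toward the shorter surviving durations.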

6 Simple clipping

Simple clipping involves no appeal to Copernicanism; it merely updates the probability distribution for $d_{T,dT}$ by ruling out total durations less than the current age of an ongoing process. This is essentially the approach advocated by Caves and by Olum. Simple clipping may appear rather trivial. But unfortunately it has spawned some confusion in the literature, which is why I am devoting an entire section to it.

For comparison, it is also useful to consider the “slid prior” $P(d_{T-t,dT})$. This is exactly the same as the prior $P(d_{T,dT})$ except that it has been translated (slid) by an amount t. By construction, the mean T value for the slid prior will equal t plus the mean T value for the prior. The median T value for the slid prior will also equal t plus the median T value for the prior, and likewise for all percentiles such as the fifth-percentile T value.

As an example, consider taking as our prior probability $P(d_{T,dT}) = \theta(T)\,e^{-T}$, where the $\theta(T)$ reflects the fact that durations are necessarily non-negative. Applying simple clipping to the prior produces the posterior probability distribution $e^{t}\,\theta(T-t)\,e^{-T} = \theta(T-t)\,e^{-(T-t)}$. So in this example, the simply-clipped posterior and the slid prior are identical. Consequently, the mean T value for the posterior distribution equals t plus the mean T value for the prior distribution. The same relation holds between the prior and posterior median T values and between all prior and posterior percentile T values. I will call this an exactly additive shift.

However, that kind of shift does not always occur. Consider, by contrast, the prior probability

$$P(d_{T,dT}) = \frac{2\delta^2}{(T+\delta)^3} \tag{12}$$

where δ is an adjustable positive parameter that keeps the distribution normalized near T = 0. The simply-clipped posterior distribution is then

$$\frac{P(d_{T,dT})\,\theta(T-t)}{\int_{T'=0}^{\infty} P(d_{T',dT'})\,\theta(T'-t)} = \frac{2(t+\delta)^2\,\theta(T-t)}{(T+\delta)^3} \tag{13}$$


As Fig. 1 shows, that is quite different from the slid prior

$$P(d_{T-t,dT}) = \frac{2\delta^2\,\theta(T-t)}{(T-t+\delta)^3} \tag{14}$$

In particular, the simply-clipped posterior distribution is smaller than the “slid prior” for small T and larger for large T. It immediately follows that the mean T value for the simply-clipped posterior is larger than the mean T value for the slid prior, and hence larger than t plus the mean T value for the prior.

It also follows that the cumulative probability distribution for the simply-clipped posterior is smaller than the cumulative probability distribution for the slid prior for all T > t. Figure 2 illustrates the cumulative distributions, which can be obtained by integration. (These distributions give the probability that the total duration of the process will be less than T.) For each probability distribution, the median T value is the value of T for which the cumulative distribution equals 0.5; the fifth-percentile T value is the value of T for which the cumulative distribution equals 0.05; and so on. Therefore the median T value for the simply-clipped posterior is greater than t plus the median T value for the prior, and likewise for all percentile T values.
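For the prior of Eq. 12 the cumulative distributions can be written in closed form, which makes the ordering explicit. The following is a worked sketch; the labels $F_{\text{slid}}$ and $F_{\text{clip}}$ are introduced here for convenience and are obtained by integrating Eqs. 14 and 13 from t to T.

```latex
F_{\text{slid}}(T) = 1 - \frac{\delta^2}{(T - t + \delta)^2}, \qquad
F_{\text{clip}}(T) = 1 - \frac{(t + \delta)^2}{(T + \delta)^2}, \qquad T > t

F_{\text{clip}}(T) < F_{\text{slid}}(T)
\iff \frac{t + \delta}{T + \delta} > \frac{\delta}{T - t + \delta}
\iff (t + \delta)(T - t + \delta) > \delta\,(T + \delta)
\iff t\,(T - t) > 0
```

The last inequality holds whenever t > 0 and T > t, so for this prior the simply-clipped cumulative distribution lies below the slid one everywhere beyond the current age.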

Fig. 1 (probability density P plotted against T) The left solid curve is the prior probability distribution $2\delta^2/(T+\delta)^3$ with δ = 0.5. The right solid curve is the result of sliding by t = 1, and the dashed curve is the result of simple clipping with t = 1

Fig. 2 (cumulative probability P plotted against T) As above but with cumulative probability distributions

Fig. 3 (probability density P plotted against T) The left solid curve is the prior proportional to $e^{-(T+T^2)/2}$. The right solid curve is the result of sliding by t = 1 and the dashed curve is the result of simple clipping with t = 1

I will call this case a super-additive shift. In this example, provided t > δ, the prior has a mean T value of δ and the simply-clipped posterior has a mean T value of δ + 2t. The corresponding median T values are $(\sqrt{2}-1)\,\delta$ and $(\sqrt{2}-1)\,\delta + \sqrt{2}\,t$.

By contrast, consider the prior proportional to $e^{-(T+T^2)/2}$, which decreases very rapidly as a function of T. It shows precisely the opposite behavior: the simply-clipped posterior is larger than the slid prior for small T, and smaller for large T. Correspondingly, for all T > t the cumulative probability distribution for the simply-clipped posterior is now larger than the cumulative probability distribution for the slid prior. Figure 3 illustrates. Consequently, the simply-clipped posterior will have mean and median T values that are less than t plus the respective mean and median T values for the prior. I will call this a sub-additive shift.¹¹

Roughly speaking, when the prior shows a slower than exponential decrease, the shift from simple clipping will be super-additive. The slower the decrease of the prior, the more of an increase simple clipping will produce in the mean and median T values. (Sometimes the mean T value will be infinite; if infinite for the prior, then it will also be infinite for the simply-clipped posterior.) And, still speaking somewhat roughly, when the prior shows a faster than exponential decrease, the shift will be sub-additive. Precise necessary and sufficient conditions for each case can be found, but they are rather complicated; the important point for present purposes is that shifts of all kinds are possible, depending on the form of the prior. The effect of simple clipping is not as trivial as it might have seemed.
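These qualitative claims can be checked numerically for the three priors discussed in this section. The sketch below is an illustration rather than part of the paper's argument: it works with survival functions (the probability that the total duration exceeds T), uses bisection to find medians, and compares the simply-clipped median with t plus the prior median. The function names and the choices t = 1 and δ = 0.5 are assumptions for the example; the closed-form survival functions follow from integrating each prior.

```python
import math


def exponential_survival(T):
    """Survival function of the prior e^{-T}."""
    return math.exp(-T)


def power_law_survival(T, delta=0.5):
    """Survival function of the prior of Eq. 12."""
    return (delta / (T + delta)) ** 2


def fast_decay_survival(T):
    """Survival function of the prior proportional to exp(-(T + T^2)/2).
    Completing the square turns the tail integral into an erfc."""
    return math.erfc((T + 0.5) / math.sqrt(2)) / math.erfc(0.5 / math.sqrt(2))


def median(survival, lower=0.0, upper=1e6):
    """Bisection for the T at which a decreasing survival function is 0.5."""
    for _ in range(200):
        mid = 0.5 * (lower + upper)
        if survival(mid) > 0.5:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)


def median_excess(survival, age):
    """Simply-clipped median minus (age + prior median).
    Zero: exactly additive. Positive: super-additive. Negative: sub-additive."""
    s_age = survival(age)

    def clipped(T):
        return survival(T) / s_age

    return median(clipped, lower=age) - (age + median(survival))


age = 1.0
results = {
    "exponential": median_excess(exponential_survival, age),
    "power law (Eq. 12)": median_excess(power_law_survival, age),
    "exp(-(T+T^2)/2)": median_excess(fast_decay_survival, age),
}
for name, excess in results.items():
    print(name, excess)
```

For these parameter values the exponential prior gives an excess of essentially zero, the power-law prior a positive excess, and the rapidly decaying prior a negative one, in line with the three kinds of shift described above.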

As usual, T is the total duration of the process while t is its known current age. So T − t is the future duration of the process. When there is an exactly additive shift, learning the current age of the process yields an expected future duration equal to the original expected total duration. When there is a super-additive shift, learning the current age yields an expected future duration greater than the original expected total duration. When the shift is sub-additive, learning the current age yields an expected future duration less than the original expected total duration.

11 A super-additive shift is essentially the same as what Monton and Kierland (2006) call a Gott-like shift, and a sub-additive shift is essentially the same as what they call an anti-Gott-like shift. Since I am discussing such shifts as they occur without considering the Gott factor derived from temporal Copernicanism, I use different names from Monton and Kierland.

Recall Gott’s qualitative thesis: “Things that have been around a long time tend to stay around a long time. Things that haven’t been around long may be gone soon.” That thesis can now be given a foundation resting purely on simple clipping. No appeal to temporal Copernicanism need be made. We merely need take a slowly decaying prior probability distribution $P(d_{T,dT})$. And given the assumed limits on our background information, that will usually be an appropriate sort of prior to take, since it reflects considerable ignorance. A super-additive shift will result. For a sufficiently slowly decaying prior, the mean and median values of T may be shifted upward by a very large multiple of t.

In itself, the qualitative thesis is highly intuitive, as Monton and Kierland (2006) urge. They are incorrect, however, in their apparent assumption that the qualitative thesis can only be supported via an updating process that uses the Gott factor, and in their conclusion that such updating receives strong intuitive support thereby.

7 Full updating for T

7.1 The subfactor $P(a_{0,dt} \mid d_{T,dT})$

Now let us look at the subfactor $P(a_{0,dt} \mid d_{T,dT})$. This is the probability of the process starting in the infinitesimal interval dt prior to the observation time, as conditional on the process having a total duration T. (dt could also be visualized as a finite interval much smaller than all other timescales of interest. If we are concerned with a process that we expect to last for years, taking dt as 1 day would work.)

However, the subscript of 0 in $a_{0,dt}$ above is a bit arbitrary; thanks to approximate temporal Copernicanism, the 0 could be replaced by any small multiple of T. That is, the subfactor $P(a_{0,dt} \mid d_{T,dT})$ does not pertain purely to the probability of the process starting immediately before the observation; more broadly, it pertains to the probability of the process starting somewhere in the vicinity of the observation time. What it really measures is the probability that the process starts in the general era of the observation time, conditional on the total duration being T.

It is clear why this subfactor should be of concern. If longer-lived processes are less likely to start in the neighborhood of the observation time, then an observation that a process is currently underway gives less weight than it otherwise would to a hypothesis of long total duration.

This sort of connection, between duration and the probability of the process starting near the observation time, has already shown up in Sect. 2.2 as part of the Deuce example. As mentioned there, approximate temporal Copernicanism does not require that such a connection exist. It does allow it, however. And it is entirely intuitive that such a connection often will exist. Putting ourselves in the position of the observer again, it makes sense that very long-lived processes will sometimes be more likely to start in the remote past or the remote future; and they will then be less likely to start near the present moment. Longer-lived processes may have different prerequisites than shorter-lived ones, and those prerequisites may themselves last longer. The play Deuce can continue to illustrate. Its prerequisite is the existence of the Music Box Theatre. If the play itself is very long-running, that means the theater must also last a long time. Furthermore, it is probable that the total duration of the theater will not merely be slightly greater than the total duration of the play, but substantially greater. Hence a greater range of possibility opens up for the play to begin in the remote past or future. (We may suppose that the background information B included knowledge of the existence of the Music Box Theatre, but not knowledge of when it was built.) Nor is this example contrived; it is perfectly common for a process of interest to rely for its existence on some prerequisite condition that may itself be of uncertain duration.

Unfortunately, there is no general rule for how to obtain $P(a_{0,dt} \mid d_{T,dT})$. The best we can do is to put a bound on it and to describe some important special cases. Let us, as usual, restrict our attention to plausible T. First note that $P(a_{t,dt} \mid d_{T,dT})$ is normalized as a function of t, so it must approach 0 as $t \to \pm\infty$. But due to the approximate temporal Copernican assumption, it can only approach 0 slowly. Over any interval in t of size T, the function is approximately constant. Therefore its integral over any such interval must be much less than 1; otherwise, its integral from $t = -\infty$ to $+\infty$ would be more than 1. Concentrating specifically on the interval from t = −T/2 to t = T/2, we have $P(a_{t,dt} \mid d_{T,dT}) \approx P(a_{0,dt} \mid d_{T,dT})$ for |t| < T/2. So

$$\int_{t=-T/2}^{t=T/2} P(a_{t,dt} \mid d_{T,dT}) \approx T\,P(a_{0,dt} \mid d_{T,dT})/dt \ll 1 \tag{15}$$

and consequently

$$P(a_{0,dt} \mid d_{T,dT}) \ll dt/T \tag{16}$$

The above bound rules out the possibility that $P(a_{0,dt} \mid d_{T,dT})$ is constant for all T. So simple clipping, which is what results if it is constant, is never the whole story.

In the simplest and most important special case, there will be some $T_0$ such that $P(a_{0,dt} \mid d_{T,dT})$ is approximately constant for all $T < T_0$. That special case will often prevail in practice. It holds, intuitively, when the prerequisites for the process are fixed to be in place over a timespan of at least $T_0$. If $T_0$ is so large that there is a negligible probability of the process lasting longer at all, then the bound in Eq. 16 fails to have any bite and $P(a_{0,dt} \mid d_{T,dT})$ is effectively constant. Simple clipping will then at least be a good approximation.

If there is a range of plausible T values greater than $T_0$, then we may still expect that $P(a_{0,dt} \mid d_{T,dT})$ will be “as constant as possible” in that range consistent with the above bound, and hence proportional to 1/T. And what then determines $T_0$? One thing that can determine it, intuitively, is that a process lasting longer than $T_0$ will force us to revise our opinions about the nature of whatever prerequisites the process requires. The possibility will thereby open up that the process could have started far earlier or far later than we would originally have thought likely.

I emphasize again that the above is just a special case, albeit a common and important one. $P(a_{0,dt} \mid d_{T,dT})$ can have a wide variety of functional forms consistent with the constraint (16). That should not be surprising in the least; rather, we would rightly be suspicious of any claim that the details of particular cases made no difference.¹²

7.2 The effect of multiplying by $P(a_{0,dt} \mid d_{T,dT})$

Now let us consider the second step in the application of Eq. 11 again, in which we take the simply-clipped posterior, multiply it by $P(a_{0,dt} \mid d_{T,dT})$, and then renormalize. Let us concentrate on priors $P(d_{T,dT})$ for which simple clipping produces a super-additive shift, since that is the case of main interest.

As discussed in the previous section, it may happen that $P(a_{0,dt} \mid d_{T,dT})$ is approximately constant up to some $T_0$. If $T_0$ is large enough that the prior probability distribution $P(d_{T,dT})$ gives very little chance of T being greater than $T_0$, then the second step in the application of Eq. 11 will have little effect.

But it can also easily happen that $P(a_{0,dt} \mid d_{T,dT})$ is a decreasing function of T for all T. Or it may be constant up to some $T_0$ and then decrease, with a substantial prior probability of T being greater than $T_0$. In either of these two cases, the super-additive shift from simple clipping will be reduced.

To illustrate, let me adapt an example from Monton and Kierland. Suppose that some future space mission comes across a planet on which a large geyser is currently erupting. After a bit of investigation, it is determined that the geyser has been erupting continuously for t = 100 years.

The relevant background information B, remember, is that which was available immediately before the eruption was first noticed. For purposes of illustration, assume that $P(a_{0,dt} \mid d_{T,dT}) \propto 1/T$ given B. In other words, the probability that the eruption began in the last instant dt, conditional on the eruption’s having a total duration T, is proportional to 1/T. That sort of dependence on T will be plausible if there are many kinds of geological conditions that permit geyser eruptions, with some of those conditions being much longer-lasting than others, and if it is unknown which of them happens to prevail on this particular planet.

Naturally other geysers on other planets are known, and from them a prior probability $P(d_{T,dT})$ for this particular eruption’s duration can be found. But let us say that all the other known geysers are of very diverse subtypes, and it is not yet known what subtype this particular geyser belongs to; so the prior distribution will have to include all known subtypes and thus will be a very broad one. For illustration, let us take

$$P(d_{T,dT}) = \frac{1.1\,dT}{(T+1)^{2.1}} \tag{17}$$

where T is measured in years. The exponent of 2.1 is chosen to give a slowly decaying prior that still has a finite mean T value; the prefactor of 1.1 makes the probability distribution normalized.

12 If $P(a_{0,dt} \mid d_{T,dT})$ is taken as an objective probability, we may sometimes be unable to form a good estimate for it. We may still proceed with whatever rough estimate we can form, just as we would do in any other probabilistic calculation when the probabilities themselves become uncertain.


Table 1 Descriptive statistics (T values in years) for the prior $P(d_{T,dT}) = 1.1\,dT/(T+1)^{2.1}$, for the posterior obtained from it by simple clipping with t = 100, and for the posterior obtained from it by applying Eq. 11 in full with t = 100.

                   Prior    Simply-clipped posterior    Posterior from full updating
Mean               10       1110                        192
Percentile 2.5     0.023    102                         101
Median             0.878    188                         139
Percentile 97.5    27.6     2888                        583

The prior distribution has mean and median T values of 10 and $2^{1/1.1} - 1 = 0.878$ years, respectively. (It is characteristic of such slowly decaying probability distributions for the mean to be much larger than the median.) Table 1 shows the result of applying first simple clipping to the prior, and then the result of full updating using Eq. 11.

Simple clipping produces quite a dramatic effect, increasing the eruption’s expected duration by a factor of more than a hundred. That thoroughly accords with Gott’s qualitative thesis, but I emphasize again that absolutely no appeal is made to Copernicanism or to any other form of anthropic reasoning to get that far. When Copernicanism does enter the picture via multiplication by $P(a_{0,dt} \mid d_{T,dT})$ in the third column, the increase in the expected duration is moderated substantially.
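The entries of Table 1 can be recomputed with a short script. The sketch below is an assumption-laden illustration, not the author's own calculation: the prior and the simply-clipped posterior are handled in closed form, while the fully updated posterior, taken to be proportional to $\theta(T-t)/[T\,(T+1)^{2.1}]$ as in the geyser example, is integrated numerically with Simpson's rule after the substitution T = x/u. All function names are hypothetical.

```python
K = 1.1      # Eq. 17: prior density is K / (T + 1)^(K + 1), T in years
AGE = 100.0  # observed age t of the eruption


def power_mean(age=0.0):
    """Mean T of the prior (age = 0) or of the simply-clipped posterior."""
    return (age + 1.0) * K / (K - 1.0) - 1.0


def power_quantile(q, age=0.0):
    """Quantile of the prior (age = 0) or of the simply-clipped posterior."""
    return (age + 1.0) * (1.0 - q) ** (-1.0 / K) - 1.0


def tail(x, n=2000):
    """Integral of 1 / (T (T + 1)^(K + 1)) dT from x to infinity.

    Substituting T = x / u turns it into the integral of
    u^K / (x + u)^(K + 1) over 0 <= u <= 1, evaluated by Simpson's rule.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n + 1):
        u = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * u ** K / (x + u) ** (K + 1.0)
    return total * h / 3.0


def full_mean(age=AGE):
    """Mean T when the clipped prior is further multiplied by 1/T."""
    # Integral of T * [1 / (T (T + 1)^(K + 1))] from age to infinity.
    numerator = (age + 1.0) ** (-K) / K
    return numerator / tail(age)


def full_quantile(q, age=AGE):
    """Quantile of the fully updated posterior, found by bisection."""
    z_age = tail(age)
    lower, upper = age, 1e7
    for _ in range(60):
        mid = 0.5 * (lower + upper)
        if 1.0 - tail(mid) / z_age < q:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)


for label, q in [("2.5%", 0.025), ("median", 0.5), ("97.5%", 0.975)]:
    print(label, power_quantile(q), power_quantile(q, AGE), full_quantile(q))
print("mean", power_mean(), power_mean(AGE), full_mean())
```

Each printed row lists the prior, simply-clipped, and fully updated values in that order, so the output can be compared column by column with Table 1; small differences in the last digit are to be expected from rounding.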

For more slowly decaying priors, the effect of simple clipping will be still more dramatic, and the moderating effect of Copernicanism will also be greater. For a prior that decays sufficiently slowly, the mean T value will in fact be infinite, and it will remain infinite after simple clipping. But the factor of $P(a_{0,dt} \mid d_{T,dT})$ will then make it finite, even if that factor is constant up to an arbitrarily large $T_0$.
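To see why, suppose as an illustration that the prior falls off as $T^{-(1+\alpha)}$ with $0 < \alpha \le 1$ at large T, and that $P(a_{0,dt} \mid d_{T,dT})$ is constant up to $T_0$ and proportional to 1/T beyond it, the form suggested in Sect. 7.1. These specific forms are assumptions made for this sketch. The large-T contribution to the mean then behaves as follows:

```latex
% simple clipping only: the tail of the mean diverges for 0 < \alpha \le 1
\int_{T_0}^{\infty} T \cdot T^{-(1+\alpha)} \, dT = \infty

% with the extra factor 1/T beyond T_0, the tail converges for any \alpha > 0
\int_{T_0}^{\infty} T \cdot T^{-1} \cdot T^{-(1+\alpha)} \, dT
  = \int_{T_0}^{\infty} T^{-(1+\alpha)} \, dT
  = \frac{T_0^{-\alpha}}{\alpha} < \infty
```

However large $T_0$ is, the second integral is finite, which is the sense in which the extra factor tames an otherwise infinite mean.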

If $P(a_{0,dt} \mid d_{T,dT})$ decays even faster than 1/T, then the moderating effect will be more pronounced. The super-additive effect of simple clipping may sometimes be eliminated entirely.

Since so much depends on the functional form of the prior and of $P(a_{0,dt} \mid d_{T,dT})$, I conclude that neither Gott’s simple rule nor any other similar rule holds in general. The example presented above is typical, but apart from the above qualitative remarks there is no simple general pattern. That may be a bit disappointing, but it should hardly be surprising. There are so many different kinds of processes in the world, and the extent of our knowledge concerning them varies so much, that a single simple rule would be highly implausible. We may sometimes be further hobbled by an inability to estimate either the prior $P(d_{T,dT})$ or the factor $P(a_{0,dt} \mid d_{T,dT})$. But when we do know them, or at least can produce good estimates of them, I unreservedly recommend Eq. 11 as giving the information we seek.

As a final note, there is at least one class of situations in which my approach does replicate Gott’s simple rule. If $P(a_{0,dt} \mid d_{T,dT}) \propto 1/T$ and furthermore the prior $P(d_{T,dT})$ is proportional to the Jeffreys prior 1/T over a wide range in T, then within that range the simple rule will hold as a good approximation. The prior need not take the form of a Jeffreys prior with cutoff, as Gott proposed; the log-Cauchy distribution, for example, will also work for certain choices of parameters. These conditions will normally only be met in the greatest depths of our ignorance. In this one rather special case, Gott’s conclusion, if not the details of his reasoning, may be regarded as vindicated.
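The way the simple rule emerges in this special case can be written out in a few lines. As a sketch, idealize the "wide range" as extending from t to infinity (an assumption made here to keep the integrals simple), and insert $P(a_{0,dt} \mid d_{T,dT}) \propto 1/T$ and $P(d_{T,dT}) \propto dT/T$ into Eq. 11:

```latex
P(d_{T,dT} \mid c \,\&\, a_{t,dt})
  = \frac{\theta(T - t)\, T^{-2}\, dT}{\int_{T'=t}^{\infty} T'^{-2}\, dT'}
  = \theta(T - t)\, \frac{t\, dT}{T^{2}}

P(T > T_1 \mid c \,\&\, a_{t,dt}) = \int_{T_1}^{\infty} \frac{t\, dT}{T^{2}} = \frac{t}{T_1},
  \qquad T_1 \ge t

P\!\left(\tfrac{t}{0.975} < T < \tfrac{t}{0.025}\right) = 0.975 - 0.025 = 0.95
\quad\Longrightarrow\quad
\tfrac{1}{39}\, t < T - t < 39\, t
```

The last line is Gott's 95% interval for the future duration T − t, recovered here from the product of the two 1/T factors rather than from the Gott factor alone.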

8 Doomsday

Gott’s most eye-catching prediction is about the future of the human race itself. Using his simple rule, he finds a 95% chance of it lasting between five thousand and eight million years into the future.¹³ The simple rule as applied to our own species’ longevity is similar to the Doomsday Argument forcefully propounded by John Leslie; some differences do exist between the two arguments, but they are of debatable significance.¹⁴
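The arithmetic behind those figures is just the simple rule applied to an assumed current age. The sketch below only illustrates that rule; it does not implement this paper's framework, which the following paragraphs argue is not applicable here. The function name `gott_interval` is hypothetical, and the species age of 200,000 years is an assumption supplied for the example, since this article does not state the figure Gott used.

```python
def gott_interval(age, confidence=0.95):
    """Bounds on future duration from Gott's simple rule:
    age / ratio < future duration < age * ratio,
    where ratio = 39 at 95% confidence."""
    tail = (1.0 - confidence) / 2.0
    ratio = (1.0 - tail) / tail
    return age / ratio, age * ratio


# Assumed current age of the species, in years (not stated in this paper).
species_age = 200_000
low, high = gott_interval(species_age)
print(round(low), round(high))
```

With that assumed age the bounds come out at roughly five thousand and eight million years, in line with the rounded figures quoted above.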

The framework I present in this paper clearly does not apply here. Our background information B starts out with the fact that the human race has existed for a long time; that fact is old evidence. So to make progress, we must confront the old evidence problem head-on.

That problem is beyond the scope of this paper, but my preference is for something like the “counterfactual” approach of Howson (1991). If that approach is adopted, then this paper’s framework can be extended accordingly. That extension, I think, should include something like what Bostrom (pp. 65–66) has called the Self-Indication Assumption, namely the assumption that propositions should be regarded as more probable when they imply the existence of a larger number of agents. The upshot will be a fairly optimistic outlook for human longevity. I hope to pursue these highly debatable matters in a subsequent paper.

9 Conclusion

Taking a step back, let us ask: What is the foundation of fully general Copernicanism? It is certainly not the assumption that our spatiotemporal location is necessarily ordinary. Nor is it the assumption that our location’s being ordinary has a high probability conditional on our background information, regardless of what that background information may be. Rather, I take the assumption to be that our location is probably as ordinary as it can be consistent with our background information. If our background information is meager, then that claim veers close to supporting the absolute ordinariness of our location; otherwise it does not. Since we practically always have some background information, statements about the absolute ordinariness of our location (temporal or otherwise) will rarely be exactly true. We must resort to approximations instead, and simple general rules will not be easy to come by.

13 His actual numbers are 5,100 and $7.8 \times 10^{6}$ years. But in view of the uncertainty as to how long the human species has already existed, that level of precision is hardly warranted.

14 Gott was originally unaware of Leslie’s earlier work, which is presented in Leslie (1992, 1996, 1997). Brandon Carter also made a similar argument in unpublished lectures. A substantial literature about the Doomsday Argument now exists, both for and against (Bostrom 1999; Korb and Oliver 1999; Sowers 2002; Monton 2003).

I thus agree with Sober (2003) insofar as empirical information must guide us when it is available. We cannot rely blindly on an a priori principle. But when empirical information is slight, Sober appears to prefer abstaining from prediction while carrying out more observations: “In the absence of data, you should go out and get some [rather than following Gott].” And with that I disagree.¹⁵ It is when little empirical information is available that Copernicanism comes into its own, approximate though it may be, and has substantial predictions to offer.

The question remains open of how to make a truly rigorous statement of general Copernicanism. That question is of vital importance to anthropic reasoning and its associated disputes, such as the one over how to explain the apparent fine-tuning for life of our universe’s fundamental physical parameters.¹⁶ I think Bostrom (2002) strikes close to the mark with his Self-Sampling Assumption, “One should reason as if one were a random sample from the set of all observers in one’s reference class”; but I have several reservations about that assumption as phrased, including its apparently counterfactual nature (“as if”).

And is it frightening or comforting to think of ourselves as ordinary? I suspect that question has no absolute answer either.

15 Sober also makes a technical error in one of the examples he presents against the Gott approach. He considers the various organizations he has joined throughout his life. According to him, Gott would predict that the organizations should “go extinct sooner, the earlier in my life I joined them.” He rightly finds that prediction implausible. But the real prediction is that the organizations should go extinct sooner, the younger they were at the time he joined them.

16 The literature on that dispute is far too large for me to even begin to survey, but a few representative sources are Barrow and Tipler (1996), White (1991) and Juhl (2005).

References

Barrow, J. D., & Tipler, F. J. (1996). The anthropic cosmological principle. Oxford University Press.
Bartha, P., & Johns, R. (2001). Probability and symmetry. Philosophy of Science, 68S, S109–S122.
Bass, L. (2006). How to predict everything: Nostradamus in the role of Copernicus. Reports on Mathematical Physics, 57, 13–15.
Bostrom, N. (1999). The Doomsday Argument is alive and kicking. Mind, 108, 539–550.
Bostrom, N. (2002). Anthropic bias: Observation selection effects in science and philosophy. Routledge.
Buch, P. (1994). Future prospects discussed. Nature, 368, 107–108.
Castell, P. (1998). A consistent restriction of the Principle of Indifference. The British Journal for the Philosophy of Science, 49, 387–395.
Caves, C. M. (2000). Predicting future duration from present age: A critical assessment. Contemporary Physics, 41, 143–153.
Glymour, C. (1980). Theory and evidence. Princeton University Press.
Gott, J., III (1993). Implications of the Copernican Principle for our future prospects. Nature, 363, 315–319.
Gott, J., III (1994). Future prospects discussed. Nature, 368, 108.
Gott, J., III (1996). Our future in the Universe. In V. Trimble & A. Reisenegger (Eds.), Clusters, lensing, and the future of the universe. Astronomical Society of the Pacific.
Gott, J., III (1997). A grim reckoning. New Scientist, 15, 36–39.
Gott, J., III (2001). Time travel in Einstein’s universe. Houghton Mifflin.
Howson, C. (1991). The ‘Old Evidence’ problem. The British Journal for the Philosophy of Science, 42, 547–555.
Jeffreys, S. (1961). Theory of probability. Clarendon Press.
Juhl, C. (2005). Fine-tuning, many worlds, and the ‘Inverse Gambler’s Fallacy’. Nous, 39, 337–347.
Kaplan, D. (1988). Demonstratives. In J. Almog, J. Perry, & H. Wettstein (Eds.), Themes from Kaplan. Oxford University Press.
Korb, K., & Oliver, J. (1999). A refutation of the Doomsday Argument. Mind, 107, 403–410.
Ledford, A., Marriott, P., & Crowder, M. (2001). Lifetime prediction from only present age: Fact or fiction? Physics Letters A, 280, 309–311.
Leslie, J. (1992). Time and the anthropic principle. Mind, 101, 521–540.
Leslie, J. (1996). The end of the world: The science and ethics of human extinction. Routledge.
Leslie, J. (1997). Observer-relative chances. Inquiry, 40, 427–436.
Mikkelson, J. M. (2004). Dissolving the wine/water paradox. The British Journal for the Philosophy of Science, 55, 137–145.
Monton, B. (2003). The Doomsday Argument without knowledge of birth rank. The Philosophical Quarterly, 53, 79–82.
Monton, B., & Kierland, B. (2006). How to predict future duration from present age. The Philosophical Quarterly, 56, 16–38.
Olum, K. D. (2002). The Doomsday Argument and the number of possible observers. The Philosophical Quarterly, 52, 164–184.
Press, S. (1988). Bayesian statistics. Wiley.
Schuster, H., & Just, W. (2005). Deterministic chaos: An introduction. Wiley-VCH.
Shannon, C., & Weaver, W. (1949). The mathematical theory of communication. University of Illinois Press.
Sober, E. (2003). An empirical critique of two versions of the Doomsday Argument. Synthese, 135, 415–430.
Sowers, G. F. (2002). The demise of the Doomsday Argument. Mind, 111, 37–45.
White, R. (1991). Fine-tuning and multiple universes. Nous, 34, 260–276.
