York H. Dobyns- On the Bayesian Analysis of REG Data



Journal of Scientific Exploration, Vol. 6, No. 1, pp. 23-45, 1992        0892-3310/92        1992 Society for Scientific Exploration

On the Bayesian Analysis of REG Data

YORK H. DOBYNS

Princeton Engineering Anomalies Research
Princeton University, Princeton, NJ 08544

Abstract: Bayesian analysis may profitably be applied to anomalous data obtained in Random Event Generator and similar human/machine experiments, but only by proceeding from sensible prior probability estimates. Unreasonable estimates or strongly conflicting initial hypotheses can project the analysis into contradictory and misleading results. Depending upon the choice of prior and other factors, the results of Bayesian analysis range from confirmation of classical analysis to complete disagreement, and for this reason classical estimates seem more reliable for the interpretation of data of this class.

Introduction

The relative validity of Bayesian versus classical statistics is an ongoing argument in the statistical community. The Princeton Engineering Anomalies Research program (PEAR) has heretofore used classical statistics exclusively in its published results, as a matter of conscious policy, on the grounds that the explicit inclusion of prior probability estimates in the analysis might divert meaningful discussion of experimental biases and controls into debate over the suitability of various priors. Nonetheless, Bayesian analysis can offer some clarifications, particularly in discriminating evidence from prior belief, and is therefore worth examination.

In this article we apply the Bayesian statistical approach to a large body of random event generator (REG) data acquired over an eight-year period of experimentation in the human/machine interaction portion of the PEAR program. When assessed by classical statistical tests, these data display strong, reproducible, operator-specific anomalies, clearly correlated with pre-recorded intention and, to a lesser degree, with a variety of secondary experimental parameters (Nelson, Dunne, and Jahn, 1984). When assessed by a Bayesian formalism, the conclusions can range from a confirmation of the results of the classical analysis with essentially the same p-value against the null hypothesis, to a confirmation of the null hypothesis at 12 to 1 odds against a suitably chosen alternate, depending on how the analysis is done and what hypotheses are chosen for comparison.

The intent of this paper is to examine both the range of conclusions possible from Bayesian analysis as applied to a specific data set and the implications



of that range. The empirical meaning of families of prior probabilities that lead to related conclusions is of particular interest. We will also examine the relation between Bayesian odds adjustments and classical p-values in light of considerations of statistical power and the likelihoods of what a classical statistician would call "Type I" and "Type II" error. (At various times it will be necessary to contrast Bayesian with non-Bayesian approaches, which are variously called "frequentist," "Fisherian," or "sampling theory" statistics. While acknowledging that non-Bayesian statistics are a conglomerate category on the order of "nonelephant animals," for purposes of this discussion any analytical approach that does not include an explicit role for subjective prior probabilities will be called "classical.")

Elementary Bayesian Analysis

Bayes' theorem (p(θ|y) p(y) = p(y|θ) p(θ)) is a fundamental result of probability theory following from the properties of contingent probabilities. Its analytical application is that of revising prior probability estimates in light of evidence, that is, of determining the extent to which various possibilities are supported by a given empirical outcome. In the most basic application of Bayesian analysis one is assumed to have a model or hypothesis with one or more adjustable parameters θ. One's state of prior knowledge, or ignorance, may

be expressed as a prior probability distribution π0(θ) over the space of possible θ values. It is further assumed that some method exists by which a probability p(y|θ) can be computed for any possible experimental outcome y, given a definite parameter value θ. In standard REG experiments, for example, θ consists of a single parameter, the probability p with which the machine generates a hit in an elementary Bernoulli trial. The probability of exactly s successes in a set of n trials is then given by the binomial formula

p(s | n, p) = C(n, s) p^s (1 − p)^(n−s)    (1)

where C(n, s) is the combination of n elements taken s at a time, namely n!/[s!(n − s)!].

By Bayes' theorem, Equation (1) may alternately be regarded as the probability of a parameter value p, given actual data n and s. When cast in this form, p is more commonly called the likelihood, ℓ. The general recipe for updating one's knowledge of parameter(s) θ in light of evidence y is expressed by

π1(θ | y) ∝ ℓ(θ | y) π0(θ)    (2)

where π1(θ|y) is the posterior probability distribution among possible values for θ, given the prior distribution π0 and the likelihood ℓ. For the case of REG data,

π1(p | n, s) ∝ C(n, s) p^s (1 − p)^(n−s) π0(p)    (3)


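The updating recipe of Eqs. (1)-(3) can be sketched numerically on a grid of p values. This is a minimal sketch with illustrative counts (two batches totalling 100 trials), not PEAR data; the two-batch comparison anticipates the addition property discussed at the end of this section.

```python
import numpy as np

# Grid approximation to the Bayesian update pi_1(p) ∝ likelihood × pi_0(p)
# for a Bernoulli hit probability p. Counts are illustrative, not PEAR data.
p = np.linspace(0.001, 0.999, 9999)
dp = p[1] - p[0]
prior = np.ones_like(p)          # uniform prior pi_0(p) = 1 on [0, 1]

def update(pi0, n, s):
    """Posterior over the grid for s hits in n trials, given prior pi0."""
    loglik = s * np.log(p) + (n - s) * np.log1p(-p)   # log of p^s (1-p)^(n-s)
    post = pi0 * np.exp(loglik - loglik.max())        # rescale to avoid underflow
    return post / (post.sum() * dp)                   # renormalize as the last step

# Updating on two batches in sequence agrees with a single combined update,
# as guaranteed by the addition property of the Bernoulli likelihood.
seq = update(update(prior, 40, 25), 60, 35)
combined = update(prior, 100, 60)
mean = (p * combined).sum() * dp   # approaches (s + 1)/(n + 2) = 61/102
```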

Note that (2) and (3) are expressed as proportionalities rather than equalities. While there is always a normalization for ℓ such that ℓ(θ|y)π0(θ) has a total integral of 1, this normalization in general depends on π0(θ). Specifically, the use of L(θ|y) = ℓ(θ|y) / ∫Θ ℓ(θ|y)π0(θ) dθ, where ∫Θ denotes integration over all possible values of θ, produces the correct normalization. The relations (2) and (3), on the other hand, express ℓ purely as a function of y and θ without reference to prior probabilities. The conceptual clarity of stepping from one state of knowledge about θ, or probability distribution over θ, to another, with the aid of a quantity that depends only on objective evidence, thus entails the cost of renormalizing the posterior probabilities to a total strength of 1 as the last step in the calculation.

Since it is already a proportionality, expression (3) may be simplified to

π1(p | n, s) ∝ p^s (1 − p)^(n−s) π0(p)    (4)

where the combinatorial factor, lacking p dependence, has been subsumed into the normalization. This form illustrates an important feature of Bayesian analysis. After prior probabilities have been adjusted in the light of first evidence, the resulting posterior probabilities may then be used as priors for subsequent calculation based on new evidence. For example, after two stages of such iteration,

π2(θ | y1, y2) ∝ ℓ(θ | y2) π1(θ | y1) ∝ ℓ(θ | y2) ℓ(θ | y1) π0(θ)    (5)

Note, however, that one can argue with equal merit that the posterior probabilities after both data sets y1, y2 are available should just be π2(θ | y1 + y2) ∝ ℓ(θ | y1 + y2) π0(θ). For these formulas to produce different values of π2 would be contradictory, i.e., the evidence would support different conclusions depending on the sequence in which it was evaluated. This can be avoided only if ℓ has the property ℓ(θ | y1 + y2) ∝ ℓ(θ | y1) ℓ(θ | y2). The likelihood function ℓ = p^s (1 − p)^(n−s) indeed has this essential addition property:

ℓ(p | n1 + n2, s1 + s2) = p^(s1+s2) (1 − p)^((n1−s1)+(n2−s2)) = ℓ(p | n1, s1) ℓ(p | n2, s2)    (6)

Since the Bayesian formalism is internally consistent, a calculation such as (6) is essentially a check on the validity of the likelihood function; any legitimate ℓ must have the appropriate addition property.

Bayesian Analysis of REG Data

Let us now apply the Bayesian formalism to the body of REG data presented in Margins of Reality (Jahn and Dunne, 1987), also published in the Journal of Scientific Exploration (Jahn, Dunne, and Nelson, 1987). From the summary


table on pp. 352-53 of the book and pp. 30-32 of the Journal article, the "dPK" data consist of some 522,450 experimental trials totalling n = 104,490,000 binary Bernoulli events. The Z score of 3.614 is tantamount to an excess of 18,471 hits in the direction of intention. From relation (4), the likelihood function for Bernoulli trials is ℓ(p | n, s) = p^s (1 − p)^(n−s). The mean of this p distribution is (s + 1)/(n + 2); for large n its standard deviation is σ = 1/(2√n) + O(1/n); and it becomes essentially normal. With the values above,

ℓ(p) ∝ exp[−(p − p̂)²/2σ²],  p̂ = s/n ≈ 0.5 + 1.77 × 10⁻⁴,  σ ≈ 4.89 × 10⁻⁵    (7)

Figure 1 shows this likelihood function for the REG data against a range of p values from 0.4999 to 0.5004. Also shown for comparison is the interval of p values that are credible under the null hypothesis, calculated as outlined below.

The REG device itself is constructed to generate groups of Bernoulli trials with an underlying hit probability as close to exactly 0.5 as possible (Nelson, Bradish, and Dobyns, 1989). As part of its internal processing, it compares the random string of positive and negative logic pulses from the noise generator with a sequence of regularly alternating positive and negative pulses generated by a separate circuit. Matches with this alternating "template" are then reported as hits in the final sum. This technique effectively cancels any systematic bias in the raw noise process. While it is conceivable that some bias in the final output could still occur, if for example some part of the apparatus contributed an oscillatory component to the noise that happened to be exactly in phase with the template, or if the counting module should systematically malfunction, such remote possibilities are precluded by a number of internal failsafes and counterchecks incorporated in the circuitry. The device was extensively and regularly calibrated during the period that the Margins data were collected, and from these calibration data it was established that, if p is expressed as 0.5 + δ, then |δ| < 0.0002, with no lower bound established.

Yet further protection against bias is provided by the experimental protocol, wherein each operator generates approximately equal amounts of data in three experimental conditions. These are labeled "high," "low," and "baseline" in accordance with the operator's pre-recorded intentions.
The "dPK" data in Margins are differential combinations of "high" and "low" intention data; the combined result is equivalent to inverting the definition of success for the "low" data and computing the deviation of the resulting composite sequence of high and low efforts from chance expectation. Thus, to survive in the dPK results, any residual artifactual bias of the device or the data processing would itself have to correlate with the operator's intention. Specifically, if p0 = 0.5 + δ is the probability of an "on" bit, a data set containing a total of Nh bits from the high intention and Nl bits from the low intention (where the goal is to get "off" bits) will have a null-hypothesis p in the dPK of:



    Fig. 1. Likelihood function for PEAR REG data.
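The scales of Figure 1 can be reproduced from the summary statistics quoted above. This is a sketch assuming the large-n normal approximation; the bit counts and bias bound in the last line are those used below in deriving the null hypothesis interval.

```python
import math

# Summary statistics for the Margins dPK data.
n = 104_490_000                  # binary Bernoulli events
s = n // 2 + 18_471              # hits, including the excess toward intention

p_hat = s / n                    # likelihood peak, about 0.5 + 1.77e-4
sigma = 1 / (2 * math.sqrt(n))   # large-n standard deviation, about 4.89e-5
z = (2 * s - n) / math.sqrt(n)   # equivalent Z score, about 3.614

# Two-tailed classical p-value against the point null p = 0.5.
p_two = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# Width scale of the null hypothesis interval: |delta| < 2e-4 times the
# high/low imbalance (Nh - Nl)/(Nh + Nl), giving roughly 1.1e-6.
max_artifact = 2e-4 * (52_530_000 - 51_960_000) / n
```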

p_dPK = 0.5 + δ (Nh − Nl)/(Nh + Nl)    (8)

Thus, when Nh = Nl, p_dPK = 0.5, regardless of the value of δ. For the actual Margins data, with Nh = 52,530,000 bits and Nl = 51,960,000 bits, (Nh − Nl)/(Nh + Nl) = 0.0055. Given |δ| < 0.0002 as above, the maximum possible artifactual deviation from p_dPK = 0.5 is 1.1 × 10⁻⁶. This value is the source of the null hypothesis interval shown in Figure 1.

While the issue of possible sources of bias in the REG data could be treated at considerably greater length (see, for a fuller treatment, Nelson et al., 1989), such discussion is a separate issue from the statistical interpretation of the data. It has been mentioned here only to explain the derivation of the null hypothesis interval.

Having established the likelihood function (Eq. 7), let us now consider various sets of prior probabilities with which ℓ may be combined to arrive at a posterior probability distribution for the value of p in the actual experiment. First, consider the prior probability corresponding to extreme skepticism. A person who regards any influence of consciousness on the REG output to be impossible a priori should, by the tenets of Bayesian analysis, choose a prior π0(p) = δ(p − 0.5), where δ(x) is the standard Dirac delta function defined by the property ∫ f(x) δ(x − x0) dx = f(x0) for any f and any integration limits a, b such that a < x0 < b. It then follows that π1(p) will also be a delta function, and after normalizing must in fact be the same function. Since this


choice of prior probability is clearly impervious to any conceivable evidence, it is illegitimate in any effort to learn from new information, however strongly held on philosophical grounds.

As an extreme alternative, one might select a prior evincing complete ignorance as to the value of p, by regarding all the possible values of p as equally probable: π0(p) = 1 for p ∈ [0, 1], as illustrated in Figure 2a. With this prior the posterior probability π1(p) must, of course, have exactly the same shape as ℓ. This replicates classical analysis in the following sense: ℓ is normal with its center 3.614 standard deviations away from p = 0.5, so, if we define confidence intervals in p centered on the region of maximum posterior probability, we may include as much as 99.97% of π1(p) before the interval becomes compatible with a point null hypothesis, corresponding to the 3.0 × 10⁻⁴ p-value of a two-tailed classical test. Accounting for the actual spread of the null hypothesis slightly narrows this interval, raising the equivalent p-value to 3.3 × 10⁻⁴.

It is, however, unnecessary to assume this level of ignorance to arrive at a very similar result. For example, one might regard it as plausible, in light of the measures taken to force p = 0.5, that p ought to have some value in a narrow range centered about 0.5, but that within that range there is no strong reason to prefer one p over another. This defines a one-parameter family of "rectangular" priors characterized by their width w:

π0(w, p) = 1/w for |p − 0.5| ≤ w/2, and 0 otherwise    (9)

Figure 2b illustrates the member of this family with w = 10⁻³. Use of this prior essentially replicates the result from the uniform prior of Figure 2a, since it still includes all of the likelihood function except for tiny contributions in the extreme tails. In consequence, π1 has the same shape as in the previous case for the region 0.5 − w/2 < p < 0.5 + w/2, but is uniformly augmented by a multiplicative factor to compensate for the missing tails. Until w is made small enough that 0.5 + w/2 comes within a few standard deviations of the maximum of ℓ, the effects of this correction remain negligible.

Obviously, if the prior is made sufficiently narrow it will become indistinguishable from the null hypothesis interval, and the resulting posterior probability can no longer exclude the null hypothesis interval from the region of high likelihood. Figure 3 displays the equivalent p-value with which the null hypothesis is excluded for a range of widths of the prior; as above, this p is the conjugate probability to the widest confidence interval about the maximum of π1 that does not include any of the null hypothesis interval. The line labeled "Breakpoint" marks a value of special interest for the width of the prior. For wider priors, the upper limit of the confidence range for the Bernoulli p is established by the symmetry condition about the peak, and the condition that the interval not include the null hypothesis range. For narrower priors, this upper limit is dictated by the width of the prior itself. It is unsurprising



Fig. 2. Likelihood and different priors.

that this change of regimes is accompanied by an inflection in the p-value of the null hypothesis. Beyond the left edge of Figure 3, we should note that when the width of the prior drops to 2.2 × 10⁻⁶, the same as the null hypothesis, the p-value of the null rises to 1. This is essentially the same imperviousness previously seen in the delta-function prior. Indeed, the family of rectangular priors tends toward a delta function in the limit as the width goes to zero. However, values consistent with the null hypothesis are still excluded at p = 0.05 for w as small as 8.7 × 10⁻⁵. Note that this is of the same order as the width of the likelihood function itself.

Further perspective on the interplay of the evidence with a prior preference for the null hypothesis interval may be obtained by considering another family of priors that specifically favor the null hypothesis to varying degrees but do not have sharp cutoffs of probability. Let π0(k, p) = [(2k + 1)!/(k!)²] p^k (1 − p)^k for any k. All of these functions are properly normalized probability distributions, with mean 0.5 and standard deviation σ = √[(k + 1)/(4(2k² + 5k + 3))] = 1/(2√(2k + 3)), which tends to 1/√(8k) for large k. These functions also become increasingly normal for greater k. As in the previous case, they tend to a delta function in the limit k → ∞. When one of these functions is used as a prior with ℓ from



    Fig. 3. Conjugate confidence intervals from rectangular priors.

the Margins data, the resulting π1 has mean μ1 = (s + 1 + k)/(n + 2 + 2k) and standard deviation σ1 = 1/(2√(n + 2k)) (in the large-k approximation, which is clearly justified), as can be seen from the functional form of ℓ and the fact that multiplying ℓ by π0 is equivalent to the substitution s → s + k, n → n + 2k, up to normalization. The equivalent Z score, that is, the number of its own standard deviations that separate the peak of the posterior probability distribution from the null hypothesis, becomes Z = (2s − n)/√(n + 2k). While this clearly tends toward zero as k → ∞, it is also clear that large values of k, and hence extremely narrow priors, are needed to change the result appreciably. Figure 4 presents the equivalent p-value, as defined for Figure 3, for this family of priors as a function of k. Also shown is the width (standard deviation) of the prior, indicative of how strongly the null hypothesis is favored. Note that to drive the p-value above 0.05 (that is, to bring the null hypothesis interval within the 95% confidence interval of the posterior probability), k > 10⁸, or σ < 3 × 10⁻⁵, is required. Here the characteristic scale of the prior is actually narrower than that of the likelihood.

An alternative way of favoring a narrowly defined region of probability often employed in Bayesian analysis, as pointed out by the reviewer of an earlier version of this work, is to put some of the prior probability in a "lump" at the preferred value. In this case, for example, one might modify any of the priors above by multiplying it by 1 − a and then adding a δ(p − 0.5), for any 0 < a < 1; this inflates the degree of probability accorded the null hypothesis. Large values of a are not very interesting, since a = 1 replicates the completely impervious delta-function distribution. Consider a family of priors that might be regarded as plausible by an analyst who believes the null hypothesis has



Fig. 4. Conjugate probabilities from k-family of priors.

considerable support but who has no reason to prefer one value of p over another within some reasonable range for the alternate. This might be represented as π0(a, w, p) = a δ(p − 0.5) + (1 − a) π_w(p), where π_w(p) is the same "rectangular" prior defined as π0(w, p) in Eq. 9. Thus π0(a, w, p) is a two-parameter family of priors in a, the extra weight initially assigned to the null hypothesis, and w, the range of plausible alternatives. The confidence-interval formulation discussed above is somewhat awkward for the posterior probability resulting from these functions, since they are highly bimodal. However, this bimodality arises from the preservation of the delta-function component, and also suggests that the posterior probability of the null hypothesis, given this prior, may be computed from the strength of the delta-function null in the (normalized) posterior probability π1. The contribution from the part of the rectangular component compatible with the null hypothesis is negligible for most values of a and w.

The upper portion of Figure 5 presents a contour plot of the posterior probability of p = 0.5 for a range of a and w values. Both scales are logarithmic, with grid lines shown at 1, 3, 5, 7, and 9 times even powers of 10. For a values as large as 0.9, the posterior probability of the null is less than 0.05 for w ≈ 5 × 10⁻⁴. As a grows, the calculation becomes less sensitive to w and less responsive to the data, as expected.

The lower portion of Figure 5 shows a related quantity of interest, the relative strength of the null hypothesis in the prior and posterior distributions, as given by the coefficient of the delta-function component. This can be regarded as the degree to which the null hypothesis component is amplified by the evidence. Two noteworthy features are that for small a values this



Fig. 5.

amplification factor tends toward a constant depending only on w, and that even for a = 0.85, the null hypothesis emerges twenty times less likely after accounting for the evidence for w = 5 × 10⁻⁴.
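The behavior plotted in Figure 5 can be approximated with a short calculation. This is a sketch only: the normal form of the likelihood follows Eq. (7), while the particular a and w values are illustrative choices rather than the full grid of the figure.

```python
import math

N = 104_490_000
P_HAT = 0.5 + 18_471 / N          # likelihood peak from the Margins data
SIGMA = 1 / (2 * math.sqrt(N))    # likelihood width

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def posterior_null(a, w):
    """Posterior mass on p = 0.5 for the prior a*delta(p-0.5) + (1-a)*rect(w)."""
    m0 = math.exp(-((0.5 - P_HAT) / SIGMA) ** 2 / 2)   # likelihood at the null
    lo, hi = 0.5 - w / 2, 0.5 + w / 2                  # rectangular support
    m1 = (SIGMA * math.sqrt(2 * math.pi) / w) * (
        phi((hi - P_HAT) / SIGMA) - phi((lo - P_HAT) / SIGMA))
    return a * m0 / (a * m0 + (1 - a) * m1)

post = posterior_null(0.85, 5e-4)   # null mass after the evidence
reduction = 0.85 / post             # factor by which the null's probability falls
```

With these illustrative inputs the null's probability falls from 0.85 to a few percent, a reduction of the same order as the twenty-fold figure quoted above.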

In summary, an examination of various possible prior probability distributions leads to conclusions ranging from confirmation of the classical odds against the null hypothesis to confirmation of the null hypothesis, depending on one's choice of prior. Priors that lead to confirmation of, or low odds against, the null hypothesis are associated with large concentrations of probability on


the null hypothesis, or ranges around the null that are narrow compared to the likelihood function. In other words, they must be relatively impervious to evidence.

For all of these examples, the evidence (as manifested in the likelihood function) has remained constant. The variability of the conclusions has resulted entirely from the various choices of prior probability distribution. With the pure delta-function prior standing as a cautionary example of a prior belief that cannot be shaken by any evidence whatsoever, it seems suggestive that those priors which lead to conclusions most strongly in disagreement with the classical analysis are precisely those which most nearly approach the delta-function. A possibly oversimplified summation is that the likelihood function, taken alone, would lead to the same conclusion as a classical analysis, while the more an analyst wishes to favor the null hypothesis a priori, the more the posterior conclusions will likewise favor the null. This at least suggests that a prior hypothesis leading to strong disagreement with classical analysis may be inappropriate to a given problem.

Concerns of appropriate choices of prior hypotheses will be addressed further below, in light of another method of analysis.

Bayesian Hypothesis Testing and Lindley's Paradox

The last example in the previous section was chosen in part because it leads rather directly to the question of using Bayesian analysis to compare two distinct hypotheses, rather than evaluating a parameter range under a single hypothesis. Consider for example the hypotheses π0(θ) and π1(θ), where π1 now denotes an alternative prior. Let P0 and P1 denote prior probabilities on the hypotheses, with P0 + P1 = 1, so that the two hypotheses comprise exhaustive alternatives. The relative likelihood of the hypotheses can also be stated as the prior odds Ω = P0/P1. Given the two hypotheses and their respective prior probabilities, an overall

prior probability distribution for θ can be constructed: π(θ) = P0 π0(θ) + P1 π1(θ). This may then be used in a Bayesian calculation resulting in a posterior probability π′(θ) = L(θ|y) π(θ), where L(θ|y) = ℓ(θ|y) / ∫ ℓ(θ|y) π(θ) dθ is the normalized likelihood. This posterior probability can unambiguously be divided into components arising from the two hypotheses, π′ = P0 L π0 + P1 L π1, and the posterior odds on the hypotheses become

Ω′ = P0 ∫ L(θ|y) π0(θ) dθ / P1 ∫ L(θ|y) π1(θ) dθ
   = Ω · [∫ ℓ(θ|y) π0(θ) dθ / ∫ ℓ(θ|y) π1(θ) dθ]
   = Ω B(y)    (11)



where L(θ|y) ∝ ℓ(θ|y) has been used to eliminate the explicit normalizing constant. The last two lines of Eq. 11 define the Bayesian odds adjustment factor, or odds ratio, B(y). Note that, unlike π1(θ|y), B(y) is not completely objective, since prior probability distributions are required to calculate it. Applications of this formula are referred to as Bayesian hypothesis testing, as distinct from the Bayesian parameter evaluation described in previous sections.

In the general context of Bayesian hypothesis testing there can arise an oddity in the statistical inference between the two alternatives. When a point or very narrow null hypothesis π0 is being tested against a diffuse or vaguely characterized alternative hypothesis π1, Bayesian hypothesis testing may lead to an unreasonable result in which data internally quite distinct from the null hypothesis are nevertheless regarded as supporting the null in preference to the alternate. Mathematically, a likelihood whose maximum is several standard deviations away from the null can still yield B(y) > 1. This situation is referred to by various authors as Jeffreys' paradox or Lindley's paradox. It is well described by G. Shafer (1982):

"Lindley's paradox is evidently of great generality; the effect it exhibits can arise whenever the prior density under an alternative hypothesis is very diffuse relative to the power of discrimination of the observations. The effect can be thought of as an example of conflicting evidence: the statistical evidence points strongly to a certain relatively small set of parameter values, but the diffuse prior density proclaims great skepticism (presumably based on prior evidence) towards this set of parameter values. If the prior density is sufficiently diffuse, then this skepticism will overwhelm the contrary evidence of the observations.

"The paradoxical aspect of the matter is that the diffuse density π1(θ) seems to be skeptical about all small sets of parameter values. Because of this, we are somewhat uneasy when its skepticism about values near the 'observed interval' overwhelms the more straightforward statistical evidence in favor of those values. We are especially uneasy if the diffuseness of π1(θ) represents weak evidence, approximating total ignorance; the more ignorant we are the more diffuse π1(θ) is, yet this increasing diffuseness is being interpreted as increasingly strong evidence against the 'observed interval.'"

Shafer's article then proceeds to a cogent argument that cases where a Lindley paradox occurs are precisely those where ordinary Bayesian hypothesis testing is misleading and should not be used. (In fairness, one should note that the major development of Shafer's treatment is an extension of Bayesian formalism to deal with this awkward case, and that the published article includes an assortment of counter-arguments from various authors.) The problem, of course, is that a diffuse prior is being treated as evidence against the hypothesis in question. As noted above, B(y) is not an objective adjustment of subjective prior odds between hypotheses, but depends on a second subjective choice of prior distribution for an alternate. (If the null hypothesis is also not well defined, it presents yet a third opportunity for subjective contributions.) If not carefully noted, these further subjective elements can be quite as inexplicit and misleading as those that Bayesians object to in classical analysis.

A further practical difficulty with hypothesis testing, relative to classical treatments, is that the null hypothesis is always compared to a specific alternative. In many situations, including the PEAR experiments, investigators are interested in any possible deviation from a specified range of possibilities, without having enough information about the possible character of such a deviation to construct one specific alternate with any degree of conviction. A diffuse alternative that encompasses a wide range of probabilities is not a satisfactory option. This can be seen abstractly, from consideration of the Lindley paradox in cases where the statistical resolving power of a proposed experiment will be very high; it can also be argued on other grounds, as will be discussed below under the heading of statistical power.
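The paradox can be made concrete with the Margins numbers. Under the normal approximation to the likelihood, a point null at p = 0.5 tested against the maximally diffuse alternate π1(p) = 1 on [0, 1] favors the null despite the data lying 3.6 standard deviations away; this is a sketch, with the likelihood scaled so that ℓ(p̂) = 1.

```python
import math

n = 104_490_000
p_hat = 0.5 + 18_471 / n           # maximum-likelihood hit rate
sigma = 1 / (2 * math.sqrt(n))     # likelihood width

# Marginal likelihood of the point null p = 0.5.
m_null = math.exp(-((0.5 - p_hat) / sigma) ** 2 / 2)

# Marginal likelihood of the diffuse alternate pi_1(p) = 1 on [0, 1]:
# the integral of the normal bump, essentially sigma * sqrt(2 * pi).
m_alt = sigma * math.sqrt(2 * math.pi)

B = m_null / m_alt                 # odds adjustment factor B(y), Eq. (11)
```

B comes out near 12: the diffuse alternate spreads its prior mass so thin that the null is favored at about 12 to 1, the odds cited in the abstract.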

Hypothesis Testing on PEAR Data

The extreme sharpness of the likelihood function for the PEAR data base used earlier makes any hypothesis test on it susceptible to a Lindley paradox. Unless the prior for the alternate is also narrowly focused in the region of high likelihood, B(y) will claim unreasonable support for the null hypothesis.

One might then argue that the recipe for avoiding dubious results is to employ a narrow range of values for the alternate hypothesis. There are, after all, numerous arguments that anomalous effects such as PEAR examines should be small. Perhaps the simplest argument is that if such effects were large, they would not be a subject of dispute! Despite such reasoning, as recently as 1990 an article appeared using, at one point, π1(p) = 1, p ∈ [0, 1] for the alternate in a hypothesis test (Jeffreys, 1990). The use of highly diffuse priors can thus be seen to be a real and current practice meriting cautious examination, rather than a purely argumentative point.

The final section of the parameter-evaluation discussion above, with its two-component prior, is already very close to a hypothesis test. The only

major difference is that the weighting parameter a is absorbed into the odds Ω, leaving a one-parameter family of alternate priors for comparison with the null. Figure 6 shows the value of B for a range of w values (where w is the width of the rectangular alternate). The solid line, marked "Symmetric", can be



    Fig. 6.

seen to be the limit of the w-dependence shown in Figure 5 for a → 0. The dotted line, marked "One-Tailed", shows the odds ratio for the null against a one-sided version of the rectangular prior, which has support only for p > 0.5. Since the PEAR results are based on a directed hypothesis, one-tailed statistics are appropriate in a classical framework, and this would seem to be an appropriate Bayesian analog as well. Both functions attain a minimum at w = 4.8 × 10⁻⁴, with B = 0.00316 in the one-tailed case.

Inflation of p-values and Statistical Power

The smallest B factor in the hypothesis comparison above was a factor of 10 larger than the two-tailed p-value of 3 × 10⁻⁴ quoted in Margins. The smallest B to emerge from a direct hypothesis test for these data is 0.00146, for comparison of a point null against a point alternate located exactly at the maximum likelihood p = s/n. This is still a factor of 10 larger than the corresponding one-tailed p-value (the Bayesian test is also "one-tailed" in this case). The tendency of hypothesis comparison to emerge with a larger B value than the corresponding p-value of a classical test is often cited by Bayesian analysts as evidence that classical p-values are misleading for large databases, and should be adjusted by some correction factor, perhaps of order n^(1/2). (See, for example, the discussions by Good and Hill in the latter portions of the Shafer article; see also Jeffreys (1990).) Such proposals generally fail to take


into account considerations of statistical power, a somewhat neglected branch of analysis.

Conventional statistical reasoning recognizes two types of errors. The more commonly acknowledged Type I or α error is the false rejection of the null hypothesis, where α is the probability of making such an error. Type II or β error is the false acceptance of the null hypothesis, with β likewise being the probability of making the error. 1 − β is usually called the statistical power of a test. In any real situation, the null hypothesis is either true or false and therefore only one of the two types of error is actually possible. A less obvious point is made in the literature:

"The null hypothesis, of course, is usually adopted only for statistical purposes by researchers who believe it to be false and expect to reject it. We therefore often have the curious situation of researchers who assume that the probability of error that applies to their research is β (that is, they assume the null hypothesis is false), yet permit β to be so high that they have more chance of being wrong than right when they interpret the statistical significance of their results. While such behavior is not altogether rational, it is perhaps understandable given the minuscule emphasis placed on Type II error and statistical power in the teaching and practice of statistical analysis and design..." (Lipsey, 1990)

Consider an experiment involving N Bernoulli trials where one wishes to know whether they are evenly balanced (p = 0.5, the null hypothesis) or biased, even by extremely small deviations from the null hypothesis. (This is in fact the case in PEAR REG experiments.) Consider two cases: the null hypothesis is true (p = 0.5000); the null hypothesis is false with p = 0.5002. Assume that the experiment (in each case) is analyzed by two statisticians, neither of whom has any advance knowledge of p: a classical statistician who rejects the null hypothesis if a two-tailed p-value of 0.01 or less is attained, and a Bayesian statistician who regards the evidence as supporting the null if B > 1, and as supporting the alternate if B < 1. To give the probability estimates some concrete reality we may imagine the experiment being run many times with different pairs of analysts. The probability that the classical statistician makes a type I error is defined by the choice of α, and is independent of N. The table below gives the probability, for various N, of a type I error by the Bayesian analyst (regarding the evidence as favoring the alternate when the null is true) and the probability of type II error by either analyst. For either a true null or a true alternate, the final experimental scores follow a binomial distribution with N determined by the row of the table and p = 0.5000 or 0.5002 respectively. For both the classical analyst and the Bayesian analyst, one may calculate the number of successes needed for the analyst to reject the null hypothesis, where the Bayesian is regarded as rejecting the null if B < 1. The table then quotes the error frequencies that follow from the actual success distributions under each hypothesis and the analytical criterion used for rejection.


TABLE I
Error rates under different analyses

                  Null is true          Null is false         Null is false
    N             α error, Bayesian     β error, classical    β error, Bayesian
    100           0.028                 0.995                 0.982
    10,000        0.0031                0.994                 0.998
    10^6          2.6 x 10^-4           0.985                 0.999
    10^7          7.6 x 10^-5           0.905                 0.906
    10^8          2.2 x 10^-5           0.077                 0.595
    1.5 x 10^8    1.86 x 10^-5          0.010                 0.267
    10^9          6.8 x 10^-6           3.6 x 10^-24          1.8 x 10^-16

The probabilities of type II error combine the probability of erroneous acceptance of the null hypothesis with that of (correct) rejection of the null due to mistakenly inferring p < 1/2. For the considerations of columns 3 and 4, p > 1/2, and both conclusions are equally erroneous. The abrupt drop of β values in the last few lines of the table may seem jarring, but is a rather generic feature of power analysis. For any given constant effect size, there will be a fairly narrow range of N (as measured on a logarithmic scale) over which any specific test will quickly shift from being almost useless to being virtually certain to spot the effect.

A salient feature is that the Bayesian calculation, with this prior, starts out more vulnerable to type I error, and less vulnerable to type II error, for small N; however, both analyses are then so likely to suffer type II error that this is not very interesting. For large N, the Bayesian calculation is uniformly more conservative in that its probability of falsely rejecting the null hypothesis declines with N, while the classical analysis uses a constant p-value criterion for rejecting the null. Correspondingly, the Bayesian calculation has a far higher likelihood than the classical of falsely accepting the null hypothesis. The row for N = 1.5 x 10^8 is of special interest, because for this value the classical analysis attains equal likelihood of type I and type II errors. At this level the Bayesian analysis still has over 1 chance in 4 of incorrectly confirming p = 0.5.

Table I actually makes an extremely generous interpretation of the Bayesian output. The Bayesian analyst is assumed to regard the data as supporting the alternate hypothesis as soon as the Bayes factor B < 1. However, as mentioned above, various authors write as though the odds adjustment factor B ought to be regarded as an analogue to the p-value for a data set. Had this sort of reasoning been used in constructing Table I, the Bayesian analyst would still have p = 0.634 for committing a type II error on 150 million trials.
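The classical column of Table I can be reproduced in a few lines, using the normal approximation to the binomial and the fixed two-tailed criterion p of 0.01 or less (critical z about 2.576) assumed above. As in the table, a rejection in the wrong direction is counted as a type II error, so the type II rate is simply the normal probability of the statistic falling below the upper critical value.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def classical_type2(n, p1=0.5002, z_crit=2.5758):
    """Type II error rate of a fixed two-tailed test when the true p is p1.

    Under p1 the test statistic is approximately N(z_shift, 1).  The null is
    retained for |Z| < z_crit, and Z < -z_crit is a wrong-direction rejection;
    together these comprise the event Z < z_crit.
    """
    z_shift = (p1 - 0.5) / (0.5 / sqrt(n))   # standardized true effect
    return phi(z_crit - z_shift)

beta_1e8 = classical_type2(10**8)              # near 0.077, as in the table
beta_crossover = classical_type2(150_000_000)  # near 0.010: type I = type II
```

The values agree with the classical column of Table I, including the N = 1.5 x 10^8 crossover where the type I and type II rates are both roughly 1%.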

The Bayesian analysis used is not optimized for the problem of testing, say, a circuit that produces "on" signals with a probability that is definitely either p = 0.5 or p = 0.5002. Neither is the classical analysis. If the problem were to distinguish these two discrete alternatives, a Bayesian test would compare two point hypotheses; while a classical test might, with given reliability levels,


establish ranges of output for which a circuit would be classed as "definitely 0.5", "definitely 0.5002", or "inconclusive, further testing required." The actual problem may be envisioned as a sociological thought experiment in which large numbers of Bayesian and classical analysts are presented only with the output of the device, and the information that the underlying Bernoulli probability either was or was not 0.5. The uniform Bayesian alternate simply represents ignorance of possible alternative values of p, and is directly analogous to the situation, described earlier, for selection of priors in anomalies data. The second to last line of Table I says that, were such an experiment conducted and each analyst presented with 150 million trials with either p = 0.5 or p = 0.5002, the classical analysts would produce 1% false positives and 1% false negatives; while the Bayesian analysts would produce a vanishingly tiny fraction of false positive reports but over 26% false negatives: deviant data sets identified as unbiased.

In a more general vein, for large databases with small effects, it is apparent in light of the various discussions above that any Bayesian hypothesis comparison will yield an odds adjustment factor larger than the classical p-value for the same data. If the odds adjustment B is regarded as equivalent to a p-value, or a corrected version of it, the inevitable consequence will be a test less powerful than the classical version, and so more prone to missing actual effects that may be present for any given database size.

An important consideration in statistical power analysis is the effect size. One seldom has the advantage of knowing in advance the magnitude of potential effects.
In the anomalies research program at PEAR, for example, any unambiguous deviation from the demonstrable null hypothesis range has profound theoretical and philosophical import. While traditional power analysis would suggest scaling the sample sizes to the smallest effect clearly distinguishable from the null hypothesis range, this would be totally impractical in that it would require data sets several orders of magnitude larger than those published in Margins. This, too, is a standard situation frequently encountered in power analysis, in that effects of potential interest may nonetheless be too small to identify in studies of manageable size. While in fact the apparent effect size manifest in the PEAR data is much larger than this pessimistic case, there was no way of knowing in advance that this would be so. Confronted with the possibility of very small effects, the only viable alternative may be to conduct such measurements as are feasible, with the awareness that effects may be too small to measure in the current study; in which case the experiment will at least permit the establishment of an upper bound to the effect in question.

In such a situation, a Bayesian analysis using the uniform alternate prior is obviously too obtuse to be of value; it retains a high chance of a false negative report for data set sizes where the classical test has a high degree of reliability. At the same time, information about plausible alternates may well be so scant that the uniform prior, or an only slightly narrower one, is nonetheless a fair summary of one's prior state of knowledge. Under these circumstances, the reasonable course would seem to be adoption of classical statistical tests, with an experiment designed to exclude any procedures, such as optional stopping, which would invalidate the tests. The next section will discuss optional stopping and related issues in more detail.

Relative Merits: Bayesian vs. Classical Statistics

Bayesian analysis is occasionally claimed to remedy various shortfalls in the classical analysis of very large databases (Jeffreys, 1990; see also Utts, 1988). Beyond the question of replacing classical p-values with Bayesian odds adjustment factors discussed above, two other sources of inadequacy are usually cited: First, any repeated measurement eventually reaches a point of

diminishing returns where further samples only refine measurement of systematic biases rather than of the phenomenon under investigation. Second, indefinite continuation of data collection guarantees that arbitrarily large excursions will eventually arise from statistical fluctuations ("sampling to a foregone conclusion"). Both of these concerns, together with the notion that Bayesian analysis is specially qualified to deal with them in a way that classical analysis is not, are not substantiated by well-designed REG experiments in general, or by the Margins data in particular.

1. The inevitable dominance of bias. The maximum possible influence of biasing effects in this experiment has been discussed in the context of the "null hypothesis interval" above, and displayed graphically in Fig. 1. In an experiment that contrasts conditions where the only salient distinction is the operator's stated intention, any systematic technical error must itself correlate with intention to affect the final results. While unforeseen effects may never be completely ruled out, it would require considerable ingenuity to devise an error mechanism that achieved this correlation without itself being as anomalous as the putative effect. Over the eight years of experimentation that went into the Margins database

(twelve years as of this writing), both the PEAR staff and interested outsiders, including prominent members of the critical community, have been unable to find any such mundane source of systematic error. Beyond this, the bias question in REG data is an improper conflation of two unrelated issues. As pointed out by Hansel (1966), in the evaluation of any data a statistical figure-of-improbability measures only the likelihood that the data are the result of random fluctuation. It remains for each analyst to draw conclusions as to whether the deviation from expected behavior is more plausibly due to the effect under investigation or to an unaccounted-for systematic bias in the experiment. Thus, the question of bias is essentially external to the purely statistical issue of whether or not the data are consistent with a null hypothesis.

2. Arbitrarily large excursions. The conclusion of Feller's (1957) discussion of the law of the iterated logarithm may be summarized thus: Any


threshold condition for the terminal Z score of a binary random walk that grows more slowly than √(2 log(log n)) will be exceeded infinitely many times as the walk is indefinitely prolonged, and thus is guaranteed to be exceeded for arbitrarily large databases. Obviously, this is of concern only for experimental sequences of indeterminate length, where one could, in principle, wait for one of these large excursions to occur, and then declare an end to data collection. Any experiment of predefined length will always have a well-defined terminal probability distribution. Without exception, all PEAR laboratory data, including the Margins array, have conformed to the latter, specified series-length protocols. Nevertheless, if the Margins data are arbitrarily subjected to a worst-case, "optional-stopping-after-any-trial" analysis, the probability that a terminal Z score of 3.014 could be attained at any time in the program's history computes to


Bayesian analysis with a recalcitrant prior eventually agrees with classical analysis in rejecting the null hypothesis, when enough data are accumulated with a constant mean shift. They also show an example, appropriate to the class of REG-type experiments, where Bayesian analyses that choose priors to be very conservative are also necessarily very insensitive and must suffer a large probability of type II error. This is true whether the effect is real or a systematic error, and the mode of analysis grants no special ability to distinguish the two cases.
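The "sampling to a foregone conclusion" effect in point 2 can be demonstrated by simulation. The sketch below is a Monte Carlo illustration (not a PEAR procedure): it runs unbiased binary random walks and compares a fixed-length criterion |Z| > 2 against an optional stopper who checks the cumulative Z at every hundredth trial and quits at the first excursion.

```python
import random

def false_positive_rates(n_walks=200, n_trials=5_000, check_every=100,
                         z_crit=2.0, seed=1):
    """False-positive rates on unbiased (p = 0.5) walks under two protocols:
    testing only at the predefined end, versus optional stopping at the first
    intermediate checkpoint where |Z| exceeds z_crit."""
    rng = random.Random(seed)
    fixed = optional = 0
    for _ in range(n_walks):
        delta = 0                       # successes minus failures so far
        crossed = False
        for t in range(1, n_trials + 1):
            delta += 1 if rng.random() < 0.5 else -1
            # Z score of the walk after t trials is delta / sqrt(t)
            if not crossed and t % check_every == 0 and abs(delta) > z_crit * t ** 0.5:
                crossed = True          # the optional stopper would quit here
        if crossed:
            optional += 1
        if abs(delta) > z_crit * n_trials ** 0.5:
            fixed += 1                  # the fixed-length analyst rejects
    return fixed / n_walks, optional / n_walks

fixed_rate, optional_rate = false_positive_rates()
```

With these settings the fixed-length analyst rejects a fair generator at roughly the nominal 5% rate, while the optional stopper does so several times more often; letting the maximum length grow without bound drives the optional-stopping rate toward certainty, exactly as the iterated-logarithm argument predicts.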

Data Scaling

A final point of comparison concerns the interpretation of the Margins data on various scales. Classical analysis does not require that any special attention

be paid to the intermediate structure of the experimental data; if a Z score is computed for each series, and the assorted series Z scores are compounded appropriately, the composite result is exactly the Z that would result were the data treated in aggregate. This occurs because, no matter what scale is used to define elements of the data, the increased consistency of the results exactly compensates for the loss of statistical leverage from the decreased N of units. Processing REG data in large blocks is essentially a signal averaging procedure, unusual only in that it is performed algorithmically on stored data rather than in preprocessing instrumentation.

Directly checking for the same sensible scaling property in Bayesian analysis would entail developing an extension of the formalism for continuously distributed variables, beyond the scope of this discussion. However, a cursory look at the issue can be accomplished by examining the series data breakdown in Margins. The column listing p < 0.5 allows the 87 series to be regarded as 87 Bernoulli trials, each one returning a binary answer to the question, "Did the operator achieve in the direction of intent, or not?" Naturally a great deal of information is lost in this representation, since the differential degree of success or failure cannot be reckoned, but it remains instructive. Of the 87

series, 56 were "successes" as Bernoulli trials. The binomial distribution for 87 p = 0.5 trials has μ = 43.5, σ = 4.66. The actual success rate thus translates to z = 2.68, p = 0.004 one-tailed. The loss of information is seen in the reduction of significance, but the result is consistent in being a strong rejection of the null.

Not so for a Bayesian hypothesis test against the uninformed alternate π1(p) = 1. For the trial-level binary data, as we saw, B = 12; but B(87, 56) = 0.2. Where the reduced information decreased the significance of the classical result, as one might expect, it has inverted the Bayesian result from a modest confirmation of the null to a modest rejection of the null. The discrepancy, of course, lies in the Lindley paradox: the naive alternate prior is inappropriate for the trial-level test, but not unreasonable for the vastly larger effect that must be present, if the effect is real, on the series scale. The fact that the inversion occurs is itself confirmatory evidence for the reality of the mean shift and therefore evidence


against the utility of a test that regards the data as supporting the null hypothesis.
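The series-level figures quoted above can be checked directly. For a uniform prior on [0, 1] the marginal likelihood of s successes in n trials has the closed form 1/(n + 1), so B(87, 56) reduces to (n + 1) times the null binomial probability; a minimal Python check reproduces both the classical z and the Bayes factor:

```python
from math import erf, exp, lgamma, log, sqrt

n, s = 87, 56               # series examined / series in the direction of intent

# Classical test: binomial null with mean 43.5 and s.d. about 4.66
mu = n * 0.5
sigma = sqrt(n * 0.25)
z = (s - mu) / sigma                              # about 2.68
p_one_tailed = 0.5 * (1.0 - erf(z / sqrt(2.0)))   # about 0.004

# Bayesian test against the uninformed alternate pi_1(p) = 1 on [0, 1]:
# the marginal likelihood under the alternate is exactly 1/(n + 1)
log_pmf_null = (lgamma(n + 1) - lgamma(s + 1) - lgamma(n - s + 1)
                + n * log(0.5))
B = (n + 1) * exp(log_pmf_null)                   # about 0.2
```

Both numbers agree with the text: z = 2.68 with p = 0.004 one-tailed, and B near 0.2, a modest Bayesian rejection of the null at the series scale even though the same prior yields B = 12 at the trial scale.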

Final Comments and Summary

The main points to emerge from this study are:

1. For a Bayesian analysis of Bernoulli trials an objective likelihood function can be constructed which obeys the necessary addition rule for consistent handling of accumulating data (Eq. 6). The likelihood function

has the same distribution as a classical estimate of confidence intervals on the value of p, and differences of interpretation can therefore arise only from the choice of priors.

2. It therefore follows that a prior that is uniform in the region of high likelihood, thus producing a posterior probability of the same shape as the likelihood, replicates the classical analysis. For the PEAR data, this reproduces a two-tailed p of 3.0 x 10^-4 against a point null hypothesis.

3. Prior belief favoring the null hypothesis impacts the conclusions. In its ultimate expression, where only values consistent with the null hypothesis are allowed prior support, no evidence can sway the outcome. Less extreme forms continue to reject the null hypothesis (exclude it from reasonable parameter confidence intervals) unless the prior includes much of the probability within the null (thus approaching the impervious case) or is narrower than the likelihood (and therefore narrower than the statistical leverage of the known number of trials justifies). Some examples using the PEAR database include: exclusion of the null hypothesis from at least the 95% posterior confidence interval for a normal prior centered on the null hypothesis with σ as small as 3 x 10^-5, compared to σ = 4.9 x 10^-5 for the likelihood function; posterior odds against the null hypothesis of 20 to 1 for a prior that starts with 85% of the strength concentrated at p = 0.5 and the remainder uniformly distributed with width 5 x 10^-5.

4. Hypothesis comparison needs to be approached with caution because the odds adjustment factor B(y) contains a contribution from the choice of prior probability distributions, and so is at least as vulnerable to prior prejudices as are the prior odds W (Eq. 9).

5. Hypothesis tests return odds correction factors larger than classical p-values even when near-optimal cases are chosen. For the PEAR data, the optimal case is a comparison of a point null against a point alternate at p = 0.50018 (the maximum of the likelihood function), leading to B = 0.00146 (odds of 685 to 1 against the null). A consideration of statistical power, however, demonstrates that this does not establish a flaw in, or correction to, classical p-values but is a simple consequence of adopting a less sensitive test.

6. Examination of the response of Bayesian hypothesis testing to large


databases indicates that claims of special ability to deal with biases or optional stopping, or of qualitatively superior response to increasing amounts of data, compared to classical statistics, are unwarranted.

7. In a situation such as confronted by the PEAR program and related investigations, where any detectable effect is of fundamental interest and importance, the necessity of having a specific alternate hypothesis for a Bayesian hypothesis test is a limiting and potentially confounding factor. A prior that is diffuse enough to reflect ignorance of potential effects will have much less statistical power than an appropriate classical test.
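Point 5 above can be made concrete. For a point null against a point alternate at the maximum-likelihood value s/n, the binomial coefficients cancel and B is a bare likelihood ratio, which for small deviations from 0.5 reduces to exp(-z^2/2) in terms of the classical z score. The counts below are illustrative, chosen to give z near 3.61 (the value implied by the two-tailed p of 3.0 x 10^-4), not the actual PEAR totals.

```python
from math import exp, log, sqrt

def point_vs_point_B(n, s):
    """B = P(data | p = 0.5) / P(data | p = s/n); binomial coefficients cancel."""
    p_hat = s / n
    log_b = n * log(0.5) - s * log(p_hat) - (n - s) * log(1.0 - p_hat)
    return exp(log_b)

# Illustrative counts giving a classical z score near 3.61
n, s = 1_000_000, 501_807
z = (s - n / 2) / (0.5 * sqrt(n))
B_opt = point_vs_point_B(n, s)      # close to exp(-z**2 / 2), about 0.0015
```

This is the most favorable alternate any Bayesian test can offer the data, and B still exceeds the corresponding one-tailed p-value by roughly an order of magnitude, consistent with the 685-to-1 figure quoted in point 5.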

Thus, while Bayesian statistical approaches have the virtue of making their practitioner's prejudices explicit, they may in some applications allow those prejudices more free rein than is usually acknowledged or desirable. Whereas a classical analysis returns results that depend only on the experimental design, Bayesian results range from confirmation of the classical analysis to complete refutation, depending on the choice of prior. Those priors that disagree strongly with the classical analysis frequently show one or more suspect features, such as being either very diffuse or pathologically concentrated with respect to the likelihood. (While it violates the definition of a prior to adjust it with respect to an observed effect, the width of the likelihood is determined only by the experiment's size, not its outcome, and is therefore a legitimate guide to the characteristics of reasonable priors.) This would suggest that, the more strongly a Bayesian analysis disagrees with a classical result, the more likely the disagreement is due to a subjective contribution of the analyst.

Given the impact of prior probabilities, one might argue that the proper role of a Bayesian analysis should be strictly to quote likelihood functions and allow each reader to impose his own priors. However, the philosophical exercise of justifying (or refuting) various priors remains a valuable one, particularly for clarifying the meaning of a particular result.

Acknowledgements

The author wishes to thank Robert Jahn, Brenda Dunne, Roger Nelson, and Elissa Hoeger for reading earlier drafts of the manuscript and providing valuable advice. The Engineering Anomalies Research program is supported by major grants from the McDonnell Foundation, the Fetzer Institute, and Laurance S. Rockefeller.

Correspondence and requests for reprints should be addressed to York Dobyns, Princeton Engineering Anomalies Research, C-131 Engineering Quadrangle, Princeton University, Princeton, New Jersey 08544-5263.

References

Dunne, B. J., Jahn, R. G., & Nelson, R. D. (1981). An REG Experiment with Large Data-Base Capability. Princeton University: PEAR Technical Note.
Dunne, B. J., Jahn, R. G., & Nelson, R. D. (1982). An REG Experiment with Large Data Base Capability, II: Effects of Sample Size and Various Operators. Princeton University: PEAR Technical Note.
Feller, W. (1957). An Introduction to Probability Theory and Its Applications, Volume 2 (2nd ed.). New York, London: John Wiley & Sons.
Good, I. J. (1982). Comment [on Shafer (1982), see below]. Journal of the American Statistical Association, 77, 342.
Hansel, C. E. M. (1966). ESP: A Scientific Evaluation. New York: Charles Scribner's Sons.
Jahn, R. G. (1982). The persistent paradox of psychic phenomena: An engineering perspective. Proceedings of the IEEE, 70, 136-170.
Jahn, R. G. (1987). Psychic Phenomena. In G. Adelman, Ed., Encyclopedia of Neuroscience, Vol. II, pp. 993-996. Boston, Basel, Stuttgart: Birkhäuser.
Jahn, R. G., & Dunne, B. J. (1986). On the quantum mechanics of consciousness, with application to anomalous phenomena. Foundations of Physics, 16(8), 721-772.
Jahn, R. G., & Dunne, B. J. (1987). Margins of Reality. San Diego, New York, London: Harcourt Brace Jovanovich.
Jahn, R. G., Dunne, B. J., & Nelson, R. D. (1983). Princeton Engineering Anomalies Research. In C. B. Scott Jones, Ed., Proceedings of a Symposium on Applications of Anomalous Phenomena, Leesburg, VA, November 30-December 1, 1983. Alexandria, Santa Barbara: Kaman Tempo, A Division of Kaman Sciences Corporation.
Jahn, R. G., Dunne, B. J., & Nelson, R. D. (1987). Engineering Anomalies Research. Journal of Scientific Exploration, 1(1), 21-50.
Jeffreys, W. H. (1990). Bayesian Analysis of Random Event Generator Data. Journal of Scientific Exploration, 4(2), 153-169.
Lipsey, M. W. (1990). Design Sensitivity: Statistical Power for Experimental Research. Newbury Park, London, New Delhi: SAGE Publications.
Nelson, R. D., Bradish, G. J., & Dobyns, Y. H. (1989). Random Event Generator Qualification, Calibration, and Analysis. Princeton University: PEAR Technical Note.
Nelson, R. D., Dunne, B. J., & Jahn, R. G. (1984). An REG Experiment with Large Data Base Capability, III: Operator Related Anomalies. Princeton University: PEAR Technical Note.
Nelson, R. D., Jahn, R. G., & Dunne, B. J. (1986). Operator-related anomalies in physical systems and information processes. Journal of the Society for Psychical Research, 53(803), 261-286.
Shafer, G. (1982). Lindley's Paradox. Journal of the American Statistical Association, 77, 325-351.
Utts, J. (1988). Successful replication versus statistical significance. Journal of Parapsychology, 52, 305-320.