U.U.D.M. Project Report 2011:18
Degree project in mathematics, 15 credits
Supervisor and examiner: Sven Erick Alm
June 2011
Department of Mathematics, Uppsala University

Approximating the Binomial Distribution by the Normal Distribution – Error and Accuracy

Peder Hansen





Approximating the Binomial Distribution by the Normal Distribution – Error and Accuracy

Peder Hansen
Uppsala University

June 21, 2011

Abstract

Different rules of thumb are used when approximating the binomial distribution by the normal distribution. In this paper, an examination is made of the size of the approximation errors. The exact probabilities of the binomial distribution are derived and compared to the approximated values from the normal distribution. In addition, a regression model is fitted. The result is that the different rules indeed give rise to errors of different sizes. Furthermore, the regression model can be used as guidance on the maximum size of the error.



Acknowledgements

Thank you Professor Sven Erick Alm!



Contents

1 Introduction
2 Theory and methodology
  2.1 Characteristics of the distributions
  2.2 Approximation
  2.3 Continuity correction
  2.4 Error
  2.5 Method
    2.5.1 Algorithm
    2.5.2 Regression
3 Background
4 The approximation error of the distribution function
  4.1 Absolute error
  4.2 Relative Error
5 Summary and conclusions


1 Introduction

No extensive examination has been found of the rules of thumb used when approximating the binomial distribution by the normal distribution, nor of the accuracy and error they result in. The scope of this paper is the most common approximation of a binomially distributed random variable by the normal distribution. We let X ∼ Bin(n, p), with expectation E(X) = np and variance V(X) = np(1 − p), be approximated by Y, where Y ∼ N(np, np(1 − p)). We denote this X ≈ Y.

The rules of thumb are a set of guidelines: minimum values or limits for np(1 − p), here denoted L, meant to guarantee a good approximation, that is, np(1 − p) ≥ L. Various such rules are found in the literature, but no extensive examination of the resulting error and accuracy has been found. Reasonable measures for comparing the errors are the maximum absolute error and the relative error, both of which are investigated here.

The main focus lies on two related topics. First, there is a shorter section discussing the origin of the rules: where they come from and who the originator is. Next comes an empirical part, where the error arising under the different rules of thumb is studied. The results are both plotted and tabled. A regression analysis is also made, which may be useful as a guideline when estimating the error in situations not covered here. In addition to the main topics, a section deals with the preliminaries: notation and definitions from probability theory and mathematical statistics. Each section explains its own topic in more detail. I presume the reader to be familiar with some basic concepts of mathematical statistics and probability theory; otherwise the theoretical part would range too far. Therefore, proofs and theorems are only referred to. Finally, there is a summarizing section, where the results of the empirical part are discussed.

2 Theory and methodology

First of all, the reader is assumed to be familiar with basic concepts in mathematical statistics and probability theory. Furthermore, as stated above, some theory is only referred to rather than explicitly explained. Regarding the former, I suggest the reader consult, for instance, [1] or [4]; concerning the latter, the reader may want to read [7].

2.1 Characteristics of the distributions

As the approximation of a binomially distributed random variable by a normally distributed random variable is the main subject, a brief theoretical introduction to both is given. We start with a binomially distributed random variable X and write

X ∼ Bin(n, p), where n ∈ N and p ∈ [0, 1].

The parameters p and n are the probability of success in a single trial and the number of trials, respectively. The expected value and variance of X are

E(X) = np and V (X) = np(1− p),

respectively. In addition, X has the probability function

p_X(k) = P(X = k) = (n choose k) p^k (1 − p)^(n−k), where 0 ≤ k ≤ n,

and the cumulative probability function, or distribution function,

F_X(k) = P(X ≤ k) = Σ_{i=0}^{k} (n choose i) p^i (1 − p)^(n−i).  (1)

The variable X is approximated by a normally distributed random variable, call it Y; we write

Y ∼ N(µ, σ²), where µ ∈ R and σ² < ∞.

The parameters µ and σ² are the mean and variance, E(Y) and V(Y), respectively. The density function of Y is

f_Y(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²))

and the distribution function is defined by

F_Y(x) = P(Y ≤ x) = ∫_{−∞}^{x} (1/(σ√(2π))) e^(−(t−µ)²/(2σ²)) dt.  (2)

2.2 Approximation

Thanks to De Moivre, among others, we know by the central limit theorem that a suitably normalized sum of random variables converges to the normal distribution. A binomially distributed random variable X may be considered as a sum of Bernoulli distributed random variables. That is, let Z be a Bernoulli distributed random variable,

Z ∼ Be(p), where p ∈ [0, 1],


with probability distribution

p_Z(k) = P(Z = k) = p for k = 1, and 1 − p for k = 0.

Consider the sum of n independent, identically distributed Z_i's, i.e.

X = Σ_{i=1}^{n} Z_i

and note that X ∼ Bin(n, p). For instance, one can verify that the probability of the sum being equal to k is P(X = k) = (n choose k) p^k (1 − p)^(n−k). Hence, we know that as n → ∞ the distribution of X tends to the normal distribution, and for large n it is approximately normal. How large n should be in order to get a good approximation also depends, to some extent, on p. Because of this, it seems reasonable to define the following approximations. Again, let X ∼ Bin(n, p) and Y ∼ N(µ, σ²). The most common approximation, X ≈ Y, is the one where µ = np and σ² = np(1 − p); this is also the one used here. Regarding the distribution function we get

F_X(k) ≈ Φ((k − np) / √(np(1 − p))),  (3)

where F_X(k) is defined in (1) and Φ is the standard normal distribution function. We extend the expression above and get that

F_X(b) − F_X(a) = P(a < X ≤ b) ≈ Φ((b − np) / √(np(1 − p))) − Φ((a − np) / √(np(1 − p))).  (4)
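As a quick numerical check of (3), the following Python sketch (the paper's own computations are done in R; the values n = 40, p = 0.5, k = 18 are illustrative choices giving np(1 − p) = 10, not values taken from the paper) compares the exact distribution function with the normal approximation:

```python
import math

def phi(x):
    # Standard normal distribution function, via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_cdf(k, n, p):
    # Exact binomial distribution function F_X(k), as in (1).
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Illustrative values: n = 40, p = 0.5, so that np(1 - p) = 10.
n, p, k = 40, 0.5, 18
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

exact = binom_cdf(k, n, p)       # F_X(18)
approx = phi((k - mu) / sigma)   # right-hand side of (3)
print(exact, approx)
```

At this k the two values differ by a few hundredths, of the same order as the maximum errors reported in Section 4.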

2.3 Continuity correction

We proceed with the use of continuity correction, which is recommended by [1], suggested by [4] and advised by [9] in order to decrease the error. The approximation (3) is then replaced by

F_X(k) ≈ Φ((k + 0.5 − np) / √(np(1 − p)))  (5)

and hence (4) is written as

F_X(b) − F_X(a) = P(a < X ≤ b) ≈ Φ((b + 0.5 − np) / √(np(1 − p))) − Φ((a + 0.5 − np) / √(np(1 − p))).  (6)


This gives, for a single probability, with the use of continuity correction, the approximation

p_X(k) = F_X(k) − F_X(k−1) ≈ Φ((k + 0.5 − np) / √(np(1 − p))) − Φ((k − 0.5 − np) / √(np(1 − p)))  (7)

and further we note that it can be written

F_X(k) − F_X(k−1) ≈ ∫_{k−0.5}^{k+0.5} f_Y(t) dt.  (8)
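A Python sketch (same illustrative values as above, n = 40, p = 0.5, k = 18, not taken from the paper) showing the effect of the half-unit shift in (5):

```python
import math

def phi(x):
    # Standard normal distribution function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_cdf(k, n, p):
    # Exact binomial distribution function F_X(k).
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Illustrative values: n = 40, p = 0.5, so np(1 - p) = 10.
n, p, k = 40, 0.5, 18
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

exact = binom_cdf(k, n, p)
plain = phi((k - mu) / sigma)            # approximation (3)
corrected = phi((k + 0.5 - mu) / sigma)  # approximation (5)
print(abs(exact - plain), abs(exact - corrected))
```

The corrected value lands much closer to the exact one, which is the point of the correction.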

2.4 Error

There are two common ways of measuring an error: the absolute error and the relative error. In addition, another usual measure of how close two distributions are to each other is the supremum norm

sup_A |P(X ∈ A) − P(Y ∈ A)|.

However, from a practical point of view, we will study the absolute and relative errors of the distribution function. Let a denote the exact value and â the approximated value. The absolute error is the difference between them; the following notation is used:

ε_abs = |a − â|.

Therefore, the absolute error of the distribution function, denoted ε_F,abs(k), for any fixed p and n, where k ∈ N, 0 ≤ k ≤ n, without use of continuity correction, is

ε_F,abs(k) = |F_X(k) − Φ((k − np) / √(np(1 − p)))|.  (9)

Regarding the relative error, in the same way as before, let a be the exact value and â the approximated value. Then the relative error is defined as

ε_rel = |(a − â) / a|.

This gives the relative error of the distribution function, denoted ε_F,rel(k), for any fixed p and n, where k ∈ N, 0 ≤ k ≤ n, without use of continuity correction:

ε_F,rel(k) = ε_F,abs(k) / F_X(k),

or equivalently, inserting ε_F,abs(k) from (9),

ε_F,rel(k) = |F_X(k) − Φ((k − np) / √(np(1 − p)))| / F_X(k).
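Both error measures can be evaluated for every k in one sweep. A Python sketch, with the illustrative values n = 40, p = 0.5 (so npq = 10) and no continuity correction:

```python
import math

def phi(x):
    # Standard normal distribution function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p = 40, 0.5  # illustrative values with np(1 - p) = 10
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

eps_abs, eps_rel = [], []
F = 0.0  # running exact distribution function F_X(k)
for k in range(n + 1):
    F += math.comb(n, k) * p**k * (1 - p)**(n - k)
    err = abs(F - phi((k - mu) / sigma))  # epsilon_F,abs(k) as in (9)
    eps_abs.append(err)
    eps_rel.append(err / F)               # epsilon_F,rel(k)

print(max(eps_abs), max(eps_rel))
```

For these values the maximum absolute error is about 0.06, while the maximum relative error, attained at k = 0, is several orders of magnitude larger, anticipating the behaviour seen in Section 4.2.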

2.5 Method

The examination is done in the statistical software R. The software provides predefined functions for evaluating the distribution function and probability function of the normal and binomial distributions. The examination is split into two parts: the first deals with the absolute error of the approximation of the distribution function, and the second concerns the relative error. The calculations are made under the conditions found as guidelines in [4], with the help of a two-step algorithm. At the end of each section a linear model is fitted to the error. Finally, a table and a plot give an overview of how the value of npq, where q = 1 − p, affects the maximum approximation error for different probabilities.

2.5.1 Algorithm

The two-step algorithm below is used. The values of npq mentioned in the literature are in all cases said to be equal to or larger than some limit, here denoted L. The worst case, so to speak, is when they are equal, that is, npq = L. Therefore equalities are chosen as limits. We know that n ∈ N, which means that p must be semi-fixed if the equality is to hold; the values of p are adjusted, but remain close to the ones initially chosen. First a reasonable set of different initial probabilities p_i is chosen, whereafter the corresponding real-valued n_i are derived and rounded up to integers ñ_i. These are used to adjust each p_i to p̃_i so that the equality holds.

1. (a) Choose a set P of different initial probabilities, p_i ∈ [0, 0.5], where i ∈ N, 0 < i ≤ |P|.
   (b) Derive the corresponding n_i ∈ R+ so that n_i p_i(1 − p_i) = L,
   (c) and continue by deriving the integer

   ñ_i(p_i) := min{n ∈ N : n p_i(1 − p_i) ≥ L}.  (10)

   We now have a set of ñ_i ∈ N; denote it Ñ.

2. Choose a set P̃ so that for every p̃_i ∈ P̃,

   ñ_i p̃_i(1 − p̃_i) = L.

The result is that we always keep the limit L fixed. Let us look at an example. Let L = 10, use continuity correction and take the initial set P = 0.1(0.1)0.5:

Exemplifying table of algorithm values:

i     1       2      3      4      5
p_i   0.1     0.2    0.3    0.4    0.5
n_i   111.11  62.50  47.62  41.67  40.00
ñ_i   112     63     48     42     40
p̃_i   0.099   0.198  0.296  0.391  0.500
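The two steps can be sketched in Python (the helper name `adjust` is hypothetical; step 2 takes the root of p(1 − p) = L/ñ lying in [0, 0.5]):

```python
import math

def adjust(p_init, L):
    # Step 1: the smallest integer n with n * p * (1 - p) >= L, as in (10).
    n = math.ceil(L / (p_init * (1 - p_init)))
    # Step 2: adjust p so that n * p * (1 - p) = L holds exactly;
    # take the root of p(1 - p) = L/n that lies in [0, 0.5].
    p_adj = 0.5 * (1.0 - math.sqrt(1.0 - 4.0 * L / n))
    return n, p_adj

for p in [0.1, 0.2, 0.3, 0.4, 0.5]:
    n, p_adj = adjust(p, 10)
    print(p, n, round(p_adj, 3))
```

For L = 10 this reproduces the rows of the table above.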

Different rules of thumb are suggested by [4]. Using approximation (3), the authors say that np(1 − p) ≥ 10 gives reasonable approximations, and in addition, using (5), it may even be sufficient to use np(1 − p) ≥ 3. The investigation takes place under three different conditions:

• np(1 − p) = 10 without continuity correction, suggested in [4],

• np(1 − p) = 10 with continuity correction, suggested in [2],

• np(1 − p) = 3 with continuity correction, suggested in [4].

The investigation of the rules is made only for p_i ∈ [0, 0.5], due to symmetry: np(1 − p) takes the same values for p ∈ [0, 0.5] as for p ∈ [0.5, 1]. For every p̃_i, ñ_i(p̃_i) is derived, which in turn means that we get ñ_i(p̃_i) + 1 approximations. For every ñ_i(p̃_i), and of course p̃_i as well, we define the maximum absolute error of the approximation of the distribution function,

M_F,abs = max{ε_F,abs(k) : 0 ≤ k ≤ ñ_i(p̃_i)},  (11)

and in addition the maximum relative error

M_F,rel = max{ε_F,rel(k) : 0 ≤ k ≤ ñ_i(p̃_i)}.  (12)

The results are both tabled and plotted.
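A Python sketch (illustrative values and a hypothetical helper name, not the paper's R code) that computes (11) and (12) for one fixed p, with and without continuity correction; it illustrates two findings reported below: the correction shrinks the maximum absolute error but enlarges the maximum relative error:

```python
import math

def phi(x):
    # Standard normal distribution function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_errors(n, p, shift):
    # M_F,abs and M_F,rel as in (11) and (12); shift = 0.5 applies the
    # continuity correction (5), shift = 0.0 uses the plain form (3).
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    F, m_abs, m_rel = 0.0, 0.0, 0.0
    for k in range(n + 1):
        F += math.comb(n, k) * p**k * (1 - p)**(n - k)
        err = abs(F - phi((k + shift - mu) / sigma))
        m_abs = max(m_abs, err)
        m_rel = max(m_rel, err / F)
    return m_abs, m_rel

# Illustrative check at p = 0.5, n = 40 (npq = 10):
a_plain, r_plain = max_errors(40, 0.5, 0.0)
a_cc, r_cc = max_errors(40, 0.5, 0.5)
print(a_plain, a_cc, r_plain, r_cc)
```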

2.5.2 Regression

Beforehand, some plots were made which indicated that the maximum absolute error could be a linear function of p. Regarding the maximum relative error, a quadratic or cubic function of p seemed plausible. Because of this, a regression analysis is made. The model assumed to explain the absolute error is


M_ε = α + βp + ε_l,  (13)

where M_ε is the maximum error, α is the intercept, β the slope and ε_l the error of the linear model. For the relative error, the two additional regression models are

M_ε = α + βp + γp² + ε_l  (14)

and

M_ε = α + βp + γp² + δp³ + ε_l.  (15)
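The paper fits (13) with R; the following self-contained Python sketch shows the same ordinary least-squares fit on synthetic data (the data points are made up to lie exactly on a line, so the recovered coefficients are illustrative only):

```python
# Minimal least-squares fit of the linear model (13), M = alpha + beta * p.

def fit_line(ps, ms):
    # Closed-form simple linear regression (normal equations).
    n = len(ps)
    mean_p = sum(ps) / n
    mean_m = sum(ms) / n
    beta = sum((p - mean_p) * (m - mean_m) for p, m in zip(ps, ms)) / \
           sum((p - mean_p) ** 2 for p in ps)
    alpha = mean_m - beta * mean_p
    return alpha, beta

# Synthetic data lying exactly on M = 0.08 - 0.04 * p:
ps = [0.1, 0.2, 0.3, 0.4, 0.5]
ms = [0.08 - 0.04 * p for p in ps]
alpha, beta = fit_line(ps, ms)
print(round(alpha, 4), round(beta, 4))  # recovers 0.08 and -0.04
```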

3 Background

In the first basic courses in mathematical statistics, the approximations (3) and (5) are taught. Students, myself included, have learned some rules of thumb to use when applying the approximations, for example the rules suggested by Blom [4]:

np(1− p) ≥ 10,

np(1− p) ≥ 3 with continuity correction.

No motivation for why the limit L is set to L = 10 or L = 3, respectively, is found in the book. On the other hand, in 1989 Blom claims that the approximation "gives decent accuracy if npq is approximately larger than 10" with continuity correction [2]. Further, it is interesting that Blom changes the suggestion between the first edition of [3] from 1970, where it says, similarly as above, that it "gives decent accuracy if np(1 − p) is approximately larger than 10" with continuity correction, and the second edition from 1984, where the same limit is stated but now without continuity correction; the conclusion is that there has been some fuzziness regarding the rules. Neither I nor my advisor Sven Erick Alm has found any examination of the accuracy of these rules anywhere else. With Blom [4] as starting point, I began backtracking, hoping to find the source of the rules of thumb. It is worth mentioning that slightly different rules have been used by different authors. For instance, Alm himself and Britton present a schema with rules for approximating distributions, in which np(1 − p) > 5 with continuity correction is suggested [1]. Even between countries, differences are found. Schader and Schmid [10] say that "by far the most popular are"

np(1− p) > 9


and

np > 5 for 0 < p ≤ 0.5,

n(1− p) > 5 for 0.5 < p < 1,

which I am not familiar with and have not found in any Swedish literature. In the mid-twentieth century, more precisely in 1952, Hald [9] wrote:

An exhaustive examination on the accuracy of the approximation formulas has not yet been made, and we can therefore only give rough rules for the applicability of the formulas.

With these words in mind, the conclusion is that there probably does not exist any earlier work on the accuracy of the approximation. However, Hald himself made an examination in the same work for npq > 9. Further, he also points out that in cases where the binomial distribution is very skew, p < 1/(n+1) and p > n/(n+1), the approximation cannot be applied. Some articles have been found that briefly discuss the accuracy and error of the distributions; mainly, their focus lies on more advanced methods of approximation than (3) or (5). An update of [2] was made by Enger, Englund, Grandell and Holst in 2005 [4]. The writers have been contacted, and Enger was said to be the one who assigned the rules. Hearing this made me believe that the source could be found. However, Enger could not recall where he had got it from [6]. That is how far I could get. Nevertheless, the examination remains as interesting as before.

Discussing rules for approximation, one cannot avoid at least mentioning the Berry-Esseen theorem. The theorem gives a conservative estimate, in the sense that it gives the largest possible size of the error. It is based upon the rate of convergence of the approximation to the normal distribution. The Berry-Esseen theorem will not be further examined here, but there are several interesting articles, since the bound in the theorem is improved every now and then, most recently in May 2010 [11].

4 The approximation error of the distribution function

The errors of the approximations, M_F,abs and M_F,rel, defined in (11) and (12) respectively, are plotted and tabled. The cases examined are those mentioned earlier, suggested by [4].

4.1 Absolute error

In this first part, we examine the maximum absolute error of the approximation of the distribution function, M_F,abs, defined in (11). In addition, a regression is made, as defined in (13), to see if we might find a linear trend.

Case 1: npq = 10, without continuity correction

First, the case where L = 10 = npq, without continuity correction. P, the set of different initial probabilities, is chosen as p_i = 0.01(0.01)0.50; that is, we use 50 equidistant p_i. The smallest probability, p_1 = 0.0100, has the largest error, M_F,abs = 0.0831. M_F,abs decreases the closer to 0.5 we get, which is natural since the binomial distribution becomes less skew. The points form a slightly curved pattern, but still lie close to the straight line in Figure 1. Another remark is that the distance between the adjusted probabilities decreases the closer to 0.5 we get. The fact that several n_i are rounded to the same value of ñ_i, which in turn gives equal values of p̃_i, makes several M_F,abs identical and plotted in the same spot; they are all there, but not visible for that reason. Next we fit a linear model for M_F,abs. The result is

M_F,abs = 0.0836 − 0.0417p + ε_l.

The regression line is the straight line in Figure 1. The slope of the line shows that the size of M_F,abs changes moderately. Note that the sum of the absolute residuals of the regression line, Σ|ε_l|, is relatively small, so the model should give somewhat precise estimates of M_F,abs for probabilities not considered here.

Figure 1: Maximum absolute error for npq = 10 without continuity correction. The straight line is the regression line, M_F,abs = 0.0836 − 0.0417p.

Case 2: npq = 10, with continuity correction

Under these circumstances M_F,abs decreases and is about four times smaller than without continuity correction. The regression line,

M_F,abs = 0.0209 − 0.0416p + ε_l,  (16)

also has about a four times smaller intercept than in the first case. What is interesting is that the slope is approximately the same in both cases, which in turn means that for every p_i = 0.01(0.01)0.50, M_F,abs is also about four times smaller. This can be seen in Figure 2.

Figure 2: Maximum absolute error for npq = 10 with continuity correction. The straight line is the regression line, M_F,abs = 0.0209 − 0.0416p.

Case 3: npq = 3, with continuity correction

Finally, we look at the last case regarding the absolute error, where L = 3 = npq and continuity correction is used. The plot is seen in Figure 3; P is the same as above. In this case the regression line is

M_F,abs = 0.0373 − 0.0720p + ε_l.

The largest error, M_F,abs = 0.0355, appears at p_1 = 0.0100 and is about twice the size of the largest M_F,abs for L = 10 with continuity correction. The slope of the line is steeper here, which results in errors one order of magnitude smaller than in Case 1 for probabilities close to 0.5. Here, too, the sum of the absolute deviations from the regression line is relatively small, which should result in fairly good estimates of M_F,abs.

Figure 3: Maximum absolute error for npq = 3, with continuity correction. The straight line is the regression line, M_F,abs = 0.0373 − 0.0720p.

4.2 Relative Error

Here, the maximum relative error of the approximation of the distribution function, M_F,rel, defined in (12), is examined. The regression models (14) and (15) are both tested.

Case 1: npq = 10, without continuity correction

In the first case, we perform the calculations under L = 10 = npq without continuity correction. The result is shown in Figure 4. As we see, M_F,rel increases very rapidly. The smallest value of M_F,rel, 16.97, is attained at p_1, and the largest, 138.62, at p_50. As we see in Table 4, it is k = 0 that gives the largest error; for other values of k the error is much smaller. Furthermore, we note that M_F,rel is very large. Consider a specific example where p̃ = 0.2269, which means that ñ = 57, so X ∼ Bin(57, 0.2269). Let X be approximated, according to (3), by Y ∼ N(12.933, 3.1621²). We get P(X ≤ 1) = 7.55 · 10⁻⁶ and P(Y ≤ 1) = 8.04 · 10⁻⁵, and under these circumstances

ε_F,rel(1) = |P(X ≤ 1) − P(Y ≤ 1)| / P(X ≤ 1) = 9.64.
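The numbers in this example can be reproduced with a few lines of Python (a sketch; Φ is evaluated via the error function):

```python
import math

def phi(x):
    # Standard normal distribution function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# The example from the text: X ~ Bin(57, 0.2269), approximated via (3).
n, p, k = 57, 0.2269, 1
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

exact = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
approx = phi((k - mu) / sigma)
rel = abs(exact - approx) / exact  # the relative error at k = 1
print(exact, approx, rel)
```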


The result is shown in Table 4. So the relative error is, as we can also see, large for small k and small probabilities. The regression curves, defined in (14) and (15), are

M_F,rel = 14.66 + 69.86p + 416.14p² + ε_l

and

M_F,rel = 21.53 − 92.26p + 1246.60p² − 1136.07p³ + ε_l,

respectively. We note that there are no larger differences in accuracy depending on the choice of model; naturally, the discrepancy of the second model is lower.

Figure 4: Maximum relative error for npq = 10 without continuity correction. The solid line is the regression curve, M_F,rel = 14.66 + 69.86p + 416.14p², and the dashed line, M_F,rel = 21.53 − 92.26p + 1246.60p² − 1136.07p³.

Case 2: npq = 10, with continuity correction

We continue by looking at the same case as above, but now continuity correction is used. This gives somewhat remarkable results: M_F,rel is actually about two times larger than without continuity correction. Let us study the same numeric example as above, except that we use continuity correction. We have p̃ = 0.2269, which again means that ñ = 57, so X ∼ Bin(57, 0.2269). We let X be approximated, according to (5), by Y ∼ N(12.933, 3.1621²). This results in P(X ≤ 1) = 7.55 · 10⁻⁶ and P(Y ≤ 1 + 0.5) = 1.50 · 10⁻⁴, and under these circumstances

ε_F,rel(1) = |P(X ≤ 1) − P(Y ≤ 1 + 0.5)| / P(X ≤ 1) = 18.84,

which fits the values in Table 5. The absolute error at these small values of k gets dramatically worse when we use continuity correction; hence the relative error also becomes worse. In Figure 5 one can see that the results get worse as the probabilities approach 0.5. The regression curves, defined in (14) and (15), are

M_F,rel = 34.9 − 69.8p + 1597.1p² + ε_l

and

M_F,rel = 37.4 − 127.3p + 1891.8p² − 403.2p³ + ε_l,

respectively. Looking at Figure 5, we see that the difference between the two models is insignificant.

Figure 5: Maximum relative error for npq = 10 with continuity correction. The solid line is the regression curve, M_F,rel = 34.9 − 69.8p + 1597.1p², and the dashed line, M_F,rel = 37.4 − 127.3p + 1891.8p² − 403.2p³.

Case 3: npq = 3 with continuity correction

In the last case, npq = 3 and continuity correction is used; see Figure 6. This gives the regression curves, defined in (14) and (15),

M_F,rel = 0.473 + 2.204p + 2.123p² + ε_l

and

M_F,rel = 0.514 + 1.155p + 7.858p² − 7.885p³ + ε_l,

respectively. As we see, M_F,rel actually attains its smallest values here, where npq = 3 and continuity correction is used. As in the two other cases regarding the relative error, the difference between the quadratic and cubic regression models is minimal.

Figure 6: Maximum relative error for npq = 3 with continuity correction. The solid line is the regression curve, M_F,rel = 0.473 + 2.204p + 2.123p², and the dashed line, M_F,rel = 0.514 + 1.155p + 7.858p² − 7.885p³.

5 Summary and conclusions

The three rules of thumb focused on here turned out to give approximation errors of different sizes. Regarding the absolute errors, the largest difference is found between the case L = 10 without continuity correction and L = 10 with continuity correction: the largest error decreases from about 0.08 to about 0.02, approximately four times smaller, a relatively large difference. Letting L = 3 and using continuity correction, we end up with a largest error of about 0.035, closer to the latter case but still between them. When using this common and simple way of approximating, different levels of tolerance are usually accepted, depending on the problem; a common level in many cases may be 0.01. Looking deeper, we see that the probabilities needed to reach such a small M_F,abs differ between the rules of thumb. Using npq = 10 without continuity correction does not even reach the 0.01 level of accepted accuracy. In contrast, the other two cases reach the 0.01 level: for probabilities around 0.25 with npq = 10 and continuity correction, and for probabilities around 0.35 where npq = 3. Further, it would be interesting to investigate how the relationship between k and n affects the error. In addition, some tables indicating how large n should be in order to get sufficiently small errors, for different probabilities, would be of interest.

Concerning the relative errors, I would say that the applicability may be somewhat uncertain, due to the fact that M_F,rel is very large for small values of k but decreases rapidly. This makes the plots look a bit extreme; there are other values of k that give much better approximations, and judging by Tables 4, 5 and 6, this indeed seems to be the case. We know that the approximation is motivated by the central limit theorem; what we also know is that it does not hold the same accuracy for small probabilities, that is, in the tails of the distributions. This is also the direct reason why the accuracy gets worse when using continuity correction: it puts extra mass on the already too large approximated value. In a similar way, we get the explanation of why the relative error increases when the value of npq changes from 10 to 3 (one might expect the opposite): the mean value of the normal distribution, np, gets closer to 0, which in turn gives additional mass in the tail. The conclusion is that, due to the fluctuations of the relative error depending on k, which we can also see in Tables 4, 5 and 6, the regression model provides conservative estimates of the errors. As a natural, and most likely better, alternative, Poisson approximation is recommended for small probabilities. As in the case of the absolute errors, a more exhaustive examination of the relative error would be interesting: how large should n be to get acceptable levels of the error, for instance 10% or 5%, and so on.

References

[1] Alm S.E. and Britton T., Stokastik - Sannolikhetsteori och statistikteori med tillämpningar, Liber (2008).

[2] Blom, G., Sannolikhetsteori och statistikteori med tillämpningar (Bok C), fourth edition, Studentlitteratur (1989).

[3] Blom, G., Sannolikhetsteori med tillämpningar (Bok A), Studentlitteratur (1970, 1984).

[4] Blom G., Enger J., Englund G., Grandell J. and Holst L., Sannolikhetsteori och statistikteori med tillämpningar, fifth edition, Studentlitteratur (2008).


[5] Cramér H., Sannolikhetskalkylen, Almqvist & Wiksell/Geber Förlag AB(1949).

[6] Enger J., Private communication, (2011).

[7] Gut A., An Intermediate Course in Probability, Springer (2009).

[8] Hald A., A History of Mathematical Statistics from 1750 to 1930, Wiley, New York (1998).

[9] Hald A., Statistical Theory with Engineering Applications, John Wiley &Sons, Inc., New York and London (1952).

[10] Schader M. and Schmid F., Two Rules of Thumb for the Approximation of the Binomial Distribution by the Normal Distribution, The American Statistician, 43, 1989, 23-24.

[11] Shevtsova I. G., An Improvement of Convergence Rate Estimates in the Lyapunov Theorem, Doklady Mathematics, 82, 2010, 862-864.

Tables

Regarding the plotted probabilities, that is the set P̃, only the maximum error is plotted. One cannot tell which k the error comes from, nor whether the error is of similar size for other values of k. To give a more detailed picture, this section contains tables for both the absolute and the relative errors. It would have been possible to tabulate the errors for all values of k, but since the cardinality of Ñ is at times, that is for small probabilities, relatively large, it would have taken too much space. Therefore only the 10 values of k resulting in the largest errors are tabulated. The columns containing the values of k are in descending order of error, so the first value of k in each column corresponds to the maximum error that is plotted. Beside every column of k there is a column with the corresponding error; each pair of sub-columns has a common header giving the value of p̃ in the specific case.


Table 1: Table of the 10 largest errors, ε_F^abs, and the k each comes from, for every p_i (p = 0.01, ..., 0.5), under npq = 10, without continuity correction.



Table 2: Table of the 10 largest errors, ε_F^abs, and the k each comes from, for every p_i (p = 0.01, ..., 0.5), under npq = 10, with continuity correction.



Table 3: Table of the 10 largest errors, ε_F^abs, and the k each comes from, for every p_i (p = 0.01, ..., 0.5), under npq = 3, with continuity correction.



Table 4: Table of the 10 largest errors, ε_F^rel, and the k each comes from, for every p_i (p = 0.01, ..., 0.5), under npq = 10, without continuity correction.



Table 5: Table of the 10 largest errors, ε_F^rel, and the k each comes from, for every p_i (p = 0.01, ..., 0.5), under npq = 10, with continuity correction.



Table 6: Table of the 10 largest errors, ε_F^rel, and the k each comes from, for every p_i (p = 0.01, ..., 0.5), under npq = 3, with continuity correction.
