Report on Approximation of the Gamma Distribution
Submitted to Professor Knight
for English 316
Brigham Young University
Provo, Utah
9 April 2014
by
Brianne Cass
Trevor Johnson
Jane Ostergar
Ammon Washburn
Brianne Cass English 316 Student
9 April 2014 Professor Liz Knight
Dear Professor Knight,

We have submitted herewith our technical report that summarizes our findings on the topic of “Approximation of the Gamma Distribution”. This report is not being submitted to any other journal, publisher, or professor. This report is for both you, as our professor and grader of this report, and our classmates, to whom we will be presenting our research. The purpose of this report is to demonstrate our ability to go through the whole research process as well as our ability to complete a well-written technical report. For this report Ammon Washburn will be the corresponding author because he is the most knowledgeable about the subject matter of our report. He can be contacted by phone at
or by e-mail at .

Sincerely,
Brianne Cass
Trevor Johnson
Jane Ostergar
Ammon Washburn
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
I. INTRODUCTION
II. GAMMA DISTRIBUTION
III. SIMPLE SIMPSON’S METHOD
Strategy of Method
Derivation of Error for Gamma Distribution
Appraisal
IV. COMPOSITE SIMPSON’S METHOD
Strategy of Method
Derivation of Composite Error Bound
V. ADAPTIVE SIMPSON’S METHOD
Strategy of Method
Stability
Derivation of Error Bound
Appraisal
VI. TWO-THIRDS SIMPSON’S METHOD
Strategy of Method
Derivation of Method and Error
Appraisal
VII. COMPARISON
VIII. CONCLUSION
REFERENCES
APPENDIXES
LIST OF FIGURES
Figure 1: Gamma Distribution
Figure 2: Simple Simpson's
Figure 3: Fourth Derivative
Figure 4: Composite Simpson's
Figure 5: Adaptive Simpson's
Figure 6: Two-Thirds Simpson's
LIST OF TABLES
Table 1: Simple Simpson's
Table 2: Composite Simpson's
Table 3: Adaptive Simpson's
Table 4: Two-Thirds Simpson's
Table 5: Comparison Summary
ABSTRACT
In many scenarios, it is useful to integrate the gamma distribution over a given interval in
order to calculate the probability of a certain event occurring, yet in general there is no known
closed-form solution to this particular integral. As such, it is necessary to approximate the integral using some type of
numerical method. An effective approximation provides an accurate estimate, is easy to
implement, and stays theoretically consistent. The goal is to get the estimation close to the real
answer within some tolerance. It is also helpful if the approximation is easily and quickly
calculated on a computer. The various Simpson’s methods that have been developed since
Simpson proposed the original method have proven to be very useful in these regards; however,
there are numerous variations. The goal of our research is to find which variation is the most
efficient for the gamma distribution. We will explore four different Simpson’s methods—simple,
composite, adaptive, and two-thirds—with regard to their theoretical strengths and how well they
approximate sample intervals. From our analysis, we have decided that the composite Simpson’s
method is the most efficient in approximating the real answer. Although it is not the most
accurate approximation, it is relatively easy to implement and does not have a considerably long
computational time. We recommend the composite Simpson’s method as the best way to
evaluate any interval along the gamma distribution.
APPROXIMATION OF THE GAMMA DISTRIBUTION
I. INTRODUCTION
“The theory and application of integrals is one of the great and central themes of
mathematics,” said numerical analysts P. Davis and P. Rabinowitz [2]. This report addresses the
problem of how to approximate integrals that cannot be evaluated in closed form. The numerical
approximation of such integrals is called quadrature. Many commonly used integrals can
only be evaluated through approximation, and many approximation methods are known.
Because these methods are so diverse, some are better than others
when approximating certain functions. One such case is the integral of the gamma
distribution, a common probability density function. We want to approximate this
integral because the area under the function gives us the probability or the likelihood of an event
happening.
We will determine which approximation method is superior in estimating this
integral of the gamma distribution by writing computer programs that implement the
methods we have chosen to compare. The approximation methods we will compare are the
simple Simpson’s method, composite Simpson’s method, adaptive Simpson’s method, and two-
thirds Simpson’s method. These methods have been widely researched and much is known about
them. In fact, it has been said by Milton Abramowitz, that “95% (sic) of all practical work in
numerical analysis boil[s] down to applications of Simpson’s rule and linear interpolation” [3].
When comparing these different approximation methods we will use two standards of
judgment: the speed and the accuracy of the approximation. The speed will be measured by how
fast the approximation converges and by how fast the computer can run the method. The
accuracy will be measured by determining the maximum possible error, or error bound, as well
as running tests to determine the actual error for each of the methods. It should be noted though
that even if one method has a smaller error bound, it does not necessarily imply that the method
will be more accurate [6].
For most of these methods, the error bound involves two variables: the width of the
interval over which the integral is being approximated, and the fourth derivative of the function
being integrated. Each of these variables will affect the rate at which the approximation converges as
well as how close the approximation will be to the actual solution. For our purposes, all the
methods we examine will have similar convergence, while the difference will be the error
bounds.
There are coefficients to our error terms that change depending on the method. We want
the smallest coefficient so we can have a tighter error bound. However, as mentioned before, we
also want to consider the real error in our tests. Since we can’t test all possible intervals, error
bounds can help estimate actual error over these untested intervals.
From our previous knowledge about approximations and the gamma distribution, we
believe that the adaptive Simpson’s method will be the most accurate. We came to this
hypothesis because our chosen gamma distribution is highly skewed. The skewed nature of the
gamma distribution won’t affect the adaptive Simpson’s method because it “adapts” to the
skewed graph, as will be explained later. On the other hand, we believe the simple Simpson’s
rule will be the fastest to compute. As the name implies, it is the simplest approximation
method and hence will require the least amount of work on the part of the computer.
II. GAMMA DISTRIBUTION
As previously stated, we decided to study the gamma distribution. The reason for this
decision was that the gamma distribution can be very skewed, or asymmetric, which will really
test the different approximation methods. For this research we decided to use a specific gamma
distribution to simplify our calculations. The gamma distribution has two parameters, shape (α)
and scale (β). We chose the value 1.5 for both our shape and our scale, because using an integer
value for these parameters results in an integral that has an exact solution. These parameter
values change for each unique situation in which a gamma distribution is used. Figure 1 is a
graphical representation of our specified gamma distribution. Below are, respectively, the
general gamma distribution formula and our specific formulaic representation:

\[ f(x;\alpha,\beta) = \frac{x^{\alpha-1}\, e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \qquad x > 0 \tag{1} \]

\[ f(x) = \frac{x^{1/2}\, e^{-2x/3}}{1.5^{1.5}\,\Gamma(1.5)}, \qquad x > 0 \tag{2} \]

Figure 1: Gamma Distribution

The gamma distribution is used most often to predict waiting time, or time until something runs
out; e.g., the amount of time a person has to wait in line, or how long until a light bulb will burn
out. This distribution is applicable to everyday life situations; therefore, being able to
approximate the gamma distribution through a simple approximation computer program will be
very useful to the general public.
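The density chosen in this section can be evaluated directly on a computer. The following is a minimal Python sketch, assuming the usual shape and scale form of the gamma density with both parameters set to 1.5; the name `gamma_pdf` is hypothetical and is not taken from the programs written for this report.

```python
import math

ALPHA = 1.5  # shape parameter chosen in this report
BETA = 1.5   # scale parameter chosen in this report


def gamma_pdf(x, alpha=ALPHA, beta=BETA):
    """Gamma density with shape alpha and scale beta (assumes alpha >= 1)."""
    if x < 0:
        return 0.0  # the density is zero for negative x
    norm = beta ** alpha * math.gamma(alpha)  # normalizing constant
    return x ** (alpha - 1) * math.exp(-x / beta) / norm


# For alpha > 1 the density peaks at (alpha - 1) * beta, here 0.75.
mode = (ALPHA - 1) * BETA
peak = gamma_pdf(mode)
```

The later sketches in this report redefine a compact version of this function so that each one can be run on its own.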
III. SIMPLE SIMPSON’S METHOD
Strategy of Method
Simple Simpson’s method, as the name suggests, is the easiest way to approximate an
integral using the technique Thomas Simpson published in 1743 [7]. This technique involves
replacing a function whose integral is too difficult to calculate exactly with a polynomial
whose integral is easy to compute. The polynomial that Simpson uses for the approximation
is the parabola.
Formulaic Approach. A parabola is a function of the form $y = Ax^2 + Bx + C$. This function
is easily integrable, and is the polynomial Thomas Simpson chose to use to approximate intervals
of more difficult functions. When integrated from $-h$ to $h$, the equation of the parabola
becomes $\frac{h}{3}(2Ah^2 + 6C)$.
To determine the specific parabola that approximates the given function, Simpson uses three
points: the function values at the two endpoints and at the midpoint of the interval. We calculate
h by subtracting our initial point (a) from our terminal point (b) and dividing by two.
This gives us h, the size of the subinterval between a and our midpoint m, and between m and our
endpoint b, so that m = a + h. When we plug these three points into the general parabola
equation, with the midpoint placed at x = 0, we obtain:

\[ f(a) = Ah^2 - Bh + C, \qquad f(m) = C, \qquad f(b) = Ah^2 + Bh + C \tag{3} \]

\[ \int_a^b f(x)\,dx \approx \frac{h}{3}\left[ f(a) + 4f(m) + f(b) \right] \tag{4} \]
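The three-point formula described above translates almost line for line into code. Below is a small Python sketch of it; the function names and the sample interval [1, 3] are our own illustrative choices, not values taken from the report's tables.

```python
import math


def gamma_pdf(x, alpha=1.5, beta=1.5):
    """Gamma density (shape/scale form) with the report's parameters."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))


def simple_simpson(f, a, b):
    """Simple Simpson's rule: one parabola through a, the midpoint, and b."""
    h = (b - a) / 2.0  # width of each of the two subintervals
    m = a + h          # midpoint of [a, b]
    return h / 3.0 * (f(a) + 4.0 * f(m) + f(b))


# Illustrative interval only: approximate P(1 <= X <= 3).
estimate = simple_simpson(gamma_pdf, 1.0, 3.0)
```

Only three function evaluations are needed, which is why this method is the cheapest of the four to run.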
Geometric Approach. Integrating a function gives us the area under the curve.
Approximating the integral of a function involves adding known areas together. With simple
Simpson's method, we just have one area, the area under the parabola. In this case, we
have n=2 subintervals, where the initial point a has index 0, the midpoint has index 1, and the
endpoint has index 2. As is apparent in Figure 2, there is quite a discrepancy between the area
under the gamma distribution and the area under the parabola.
Figure 2: Simple Simpson's

General Derivation of Error. Following Burden’s proof in his Numerical Methods text
[1], we will derive the error term for simple Simpson’s method. To make complicated functions
more manageable, they are commonly expressed in their Taylor series expansion form, which is
an infinite sum of polynomial terms. This is especially useful when dealing with equations such as the
gamma distribution. If we have our function f(x), the Taylor series expansion is taken at the
value of the midpoint, m, and is expressed as:

\[ f(x) = f(m) + f'(m)(x-m) + \frac{f''(m)}{2}(x-m)^2 + \frac{f'''(m)}{6}(x-m)^3 + \frac{f^{(4)}(\xi(x))}{24}(x-m)^4 \tag{5} \]

Fortunately, we do not have to carry out this series to infinity. The fourth derivative is a
logical place to stop, because the Taylor series representation of the function is exactly equal to
Simpson’s approximation up to the fourth derivative [5]. We know by Taylor's theorem, a
generalization of the mean value theorem, that there exists a number ξ(x) in our interval from a
to b for which the fourth derivative term is equal to the rest of the series.
We do not know what that number is, but we know it exists.
Because we do not know the value of this number, this term in the fourth derivative is our error.
We can easily integrate the first four terms of the Taylor series, but our fourth derivative remains
an unknown. When we integrate the function, we obtain:

\[ \int_a^b f(x)\,dx = \left[ f(m)(x-m) + \frac{f'(m)}{2}(x-m)^2 + \frac{f''(m)}{6}(x-m)^3 + \frac{f'''(m)}{24}(x-m)^4 \right]_a^b + \frac{1}{24}\int_a^b f^{(4)}(\xi(x))\,(x-m)^4\,dx \tag{6} \]

Note that when the endpoints of the integral are plugged in, the first derivative and third
derivative terms cancel, and the function value and second derivative terms are doubled,
leaving us:

\[ \int_a^b f(x)\,dx = 2h\,f(m) + \frac{h^3}{3} f''(m) + \frac{1}{24}\int_a^b f^{(4)}(\xi(x))\,(x-m)^4\,dx \tag{7} \]

In order to calculate the integral of our unknown error term, we must employ the Weighted Mean
Value Theorem for integrals (see appendix A). In essence, this lets us replace our function with a
constant term which we can easily factor out of the equation and simplify, like so:

\[ \frac{1}{24}\int_a^b f^{(4)}(\xi(x))\,(x-m)^4\,dx = \frac{f^{(4)}(\xi_1)}{24}\int_a^b (x-m)^4\,dx = \frac{f^{(4)}(\xi_1)}{60}\,h^5 \tag{8} \]

To simplify the equation further, it is necessary to use an approximation of the second derivative
(see appendix A), $f''(m) = \frac{1}{h^2}\left[f(a) - 2f(m) + f(b)\right] - \frac{h^2}{12} f^{(4)}(\xi_2)$,
which involves another unknown number $\xi_2$ in our interval [a,b]. By the
Intermediate Value Theorem, we can represent these two unknown values with a single constant ξ,
leaving us the general equation for simple Simpson’s method:

\[ \int_a^b f(x)\,dx = \frac{h}{3}\left[ f(a) + 4f(m) + f(b) \right] - \frac{h^5}{90} f^{(4)}(\xi) \tag{9} \]

Note that the last term is our unknown, or our error. It is what makes our method an
approximation. Because it involves the fourth derivative, we know simple Simpson’s method is
exactly correct for polynomials of degree three or less.
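The claim that the error depends only on the fourth derivative can be checked numerically. The sketch below is an illustration of ours, assuming the standard error term -(h^5/90) f''''(ξ) for simple Simpson's rule: a cubic has a zero fourth derivative, so the rule should be exact, while x^4 has the constant fourth derivative 24, so the error should equal -(h^5/90)(24) exactly.

```python
def simple_simpson(f, a, b):
    """Simple Simpson's rule on [a, b]."""
    h = (b - a) / 2.0
    return h / 3.0 * (f(a) + 4.0 * f(a + h) + f(b))


def power_integral(p, a, b):
    """Exact integral of x**p over [a, b]."""
    return (b ** (p + 1) - a ** (p + 1)) / (p + 1)


a, b = 0.0, 2.0
h = (b - a) / 2.0

# Degree three: the fourth derivative is zero, so the error should vanish.
cubic_error = power_integral(3, a, b) - simple_simpson(lambda x: x ** 3, a, b)

# Degree four: the fourth derivative is the constant 24.
quartic_error = power_integral(4, a, b) - simple_simpson(lambda x: x ** 4, a, b)
predicted_quartic_error = -(h ** 5 / 90.0) * 24.0
```

Here `quartic_error` and `predicted_quartic_error` agree up to rounding, because ξ plays no role when the fourth derivative is constant.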
Derivation of Error for Gamma Distribution
Below is the formula for the error bound of simple Simpson’s method, where K
represents the maximum absolute value of the fourth derivative of the function within the given
interval [3]:

\[ |E| \le \frac{h^5}{90}\,K = \frac{(b-a)^5}{2880}\,K, \qquad K = \max_{a \le x \le b} \left| f^{(4)}(x) \right| \tag{10} \]

Using this value for K guarantees that the error bound given is the largest possible error margin.
Therefore, in order to find a maximum possible error when using the simple Simpson’s method,
we need to find the fourth derivative of our specific gamma distribution. Using differentiation
rules it can be shown that the fourth derivative is as follows:

\[ f^{(4)}(x) = \frac{e^{-2x/3}}{1.5^{1.5}\,\Gamma(1.5)} \left( \frac{16}{81}x^{1/2} - \frac{16}{27}x^{-1/2} - \frac{2}{3}x^{-3/2} - x^{-5/2} - \frac{15}{16}x^{-7/2} \right) \tag{11} \]

As we can see, this function is undefined when x is equal to zero. We can confirm this by looking
at the graph of the fourth derivative shown in Figure 3. As x approaches zero, the graph
approaches negative infinity. This results in an infinite error bound for any interval
containing zero. For any other interval, our error bounds will vary depending on where the
interval is located. As we can see by the graph, we will have our largest error bounds when the
interval is close to zero. For these values, we will have a large maximum absolute value for our
fourth derivative, thereby creating larger error bounds.

Figure 3: Fourth Derivative
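To make the error bound concrete, the sketch below evaluates the fourth derivative of the density with shape and scale 1.5, which is proportional to sqrt(x) * exp(-2x/3), using the product rule. It then estimates K by sampling. This is our own illustration: the function names are hypothetical, and sampling only estimates the maximum rather than proving it.

```python
import math

# Normalizing constant of the gamma density with shape = scale = 1.5.
C = 1.0 / (1.5 ** 1.5 * math.gamma(1.5))


def fourth_derivative(x):
    """Fourth derivative of C * sqrt(x) * exp(-2x/3), valid for x > 0."""
    bracket = (16.0 / 81.0 * x ** 0.5
               - 16.0 / 27.0 * x ** -0.5
               - 2.0 / 3.0 * x ** -1.5
               - x ** -2.5
               - 15.0 / 16.0 * x ** -3.5)
    return C * math.exp(-2.0 * x / 3.0) * bracket


def simple_error_bound(a, b, samples=2001):
    """Estimate (h^5 / 90) * K, with K sampled from |f''''| on [a, b]."""
    if a <= 0.0:
        return math.inf  # the fourth derivative is unbounded near zero
    h = (b - a) / 2.0
    step = (b - a) / (samples - 1)
    K = max(abs(fourth_derivative(a + i * step)) for i in range(samples))
    return h ** 5 / 90.0 * K


bound = simple_error_bound(1.0, 3.0)  # illustrative interval only
```

Moving the interval toward zero makes `bound` grow quickly, which matches the behavior of the graph described above.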
Appraisal
Table 1: Simple Simpson's
As is apparent from Table 1 above, using simple Simpson’s method does not produce
extremely accurate results on a given interval, unless the function being approximated is a
polynomial of degree three or less. In our case though, we are approximating the gamma
distribution equation, which is more complex than a third degree polynomial. As such, the
simple Simpson’s method will not be the most accurate method.
As is implied by the name though, it is the simplest method of approximating the gamma
distribution. Because of this simplicity, the computer coding is easier and takes less time to
create than the other approximation methods. These conclusions are based on our
experimentation. It will also take the computer less time to approximate the area since it involves
a simpler process. The error bounds are also much simpler to derive, as they follow a simple
equation. For these reasons, the simple Simpson’s method is still a good approximation.
IV. COMPOSITE SIMPSON’S METHOD
Strategy of Method
Composite Simpson’s method applies simple Simpson’s method multiple times along the
same interval. For simple Simpson’s method, we evaluated the two endpoints and the midpoint.
To increase our accuracy, we can further subdivide the two subintervals we initially have with
simple Simpson’s method into many more subintervals. This is especially effective because each
time the size of the subintervals is cut in half, our error bound decreases by a factor of
sixteen. Because the error term relies on a power of the subinterval width, h, the smaller the
subinterval, the smaller the error term.
Geometric Representation. As is seen in Figure 4, we create more intervals to use in the
composite method by evaluating the function at the midpoints of existing intervals. Here we
define “interval” to be the distance between endpoints and “subinterval” to be the distance
from an endpoint to the midpoint of an interval. Of course, this process
of taking the midpoints of intervals could go on forever. That infinite
calculation would give us an exact value for our integral but could never be completed, so we
will not address that in this paper. When the midpoints of the
intervals are evaluated, it ensures that the subintervals are all the same length. This means we
must have an even number n of subintervals, and therefore n+1 points that we evaluate. In
Figure 4 our n=8, with a as our initial point, $x_0$.
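Formulaic Representation. The formula for the composite Simpson’s method is a summation of
the simple approximations for the intervals. The general formula for composite Simpson’s
method, with $h = (b-a)/n$ and $x_i = a + ih$, is:

\[ \int_a^b f(x)\,dx = \frac{h}{3}\left[ f(x_0) + 4\sum_{i=1}^{n/2} f(x_{2i-1}) + 2\sum_{i=1}^{n/2-1} f(x_{2i}) + f(x_n) \right] - \frac{(b-a)}{180}\,h^4 f^{(4)}(\mu) \tag{12} \]

for some μ in [a,b]. Whereas simple Simpson’s method uses one parabola to approximate f(x),
composite Simpson’s method has a parabola for each interval. All of the interval endpoints except
our initial point $x_0$ and our terminal point $x_n$ are used in two different parabolas. As you
can see, the formula takes this repetition into account by giving those points a weight of 2.

Figure 4: Composite Simpson's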
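The weighting pattern described above (1 at the two ends, 4 at odd-numbered points, 2 at even-numbered interior points) can be sketched in Python as follows. The function names and the sample interval are our own illustrative choices.

```python
import math


def gamma_pdf(x, alpha=1.5, beta=1.5):
    """Gamma density (shape/scale form) with the report's parameters."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))


def composite_simpson(f, a, b, n):
    """Composite Simpson's rule with n equal subintervals (n must be even)."""
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be an even integer >= 2")
    h = (b - a) / n
    total = f(a) + f(b)  # the two outer endpoints get weight 1
    for i in range(1, n):
        # Odd-indexed points are midpoints of a parabola (weight 4);
        # even-indexed interior points are shared by two parabolas (weight 2).
        weight = 4.0 if i % 2 == 1 else 2.0
        total += weight * f(a + i * h)
    return h / 3.0 * total


# Illustrative interval only: approximate P(1 <= X <= 3) with 8 subintervals.
estimate = composite_simpson(gamma_pdf, 1.0, 3.0, 8)
```

With n = 2 this function reduces to the simple Simpson's rule, so the same code can serve both methods.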
Derivation of Composite Error Bound
The error bound for the composite Simpson’s method is found in a very similar way to
the error bound for the simple Simpson’s method. This is because the composite Simpson’s
approximation is closely related to that of the simple method as explained above, the one
difference being the use of multiple subintervals instead of just two. As such, the formula to
compute the maximum error bound for the composite Simpson’s rule is almost exactly the same
as the formula for the simple Simpson’s method. The one difference is that the composite error
formula takes into account the number of subintervals n used to approximate the gamma
distribution. The formula is as follows:

\[ |E| \le \frac{(b-a)^5}{180\,n^4}\,K \tag{13} \]

Because this equation also uses the maximum absolute value of the fourth derivative in the given
interval, or K, the error bound once again is infinite for any interval that contains zero. For any
other interval though, the magnitude of the error bound depends directly on the number of
subintervals used. Using more subintervals throughout our composite method results in a lower
margin for error and, in effect, a more accurate approximation. In fact, if we compare the
formulas from the two different methods we can see that the composite approximation is
guaranteed to have a smaller error bound since we have an $n^4$ in the denominator. If we choose
two for our value of n, which is how many subintervals are used in the simple Simpson’s rule,
our formula becomes the formula used in calculating the simple Simpson’s error bounds (see
equation (10)). Therefore, as long as we divide our interval into more than two subintervals, our
error bound will automatically be smaller for the composite Simpson’s method.
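The effect of the number of subintervals on the bound can be tabulated with a few lines of Python. This sketch assumes the standard textbook form of the composite bound, K(b-a)^5 / (180 n^4); the value K = 1.0 is a placeholder of ours, not a value measured for the gamma distribution.

```python
def composite_error_bound(K, a, b, n):
    """Bound K * (b - a)^5 / (180 * n^4) for n equal subintervals (n even)."""
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be an even integer >= 2")
    return K * (b - a) ** 5 / (180.0 * n ** 4)


K = 1.0          # placeholder for the maximum of |f''''| on [a, b]
a, b = 1.0, 3.0  # illustrative interval only

bounds = {n: composite_error_bound(K, a, b, n) for n in (2, 4, 8, 16)}

# Each doubling of n should shrink the bound by a factor of 2^4 = 16.
ratios = [bounds[n] / bounds[2 * n] for n in (2, 4, 8)]
```

With n = 2 the bound equals K(b-a)^5 / 2880, which is the simple Simpson's bound (h^5 / 90) K written in terms of b - a.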
Appraisal
Table 2: Composite Simpson's
The composite Simpson’s method overall is far more effective than the simple Simpson’s
method. “For functions that have four continuous derivatives,” as is the case with our gamma
distribution on any interval that excludes zero, “Simpson’s rule...converges to the true value
of the integral with rapidity $N^{-4}$ at worst” [2]. As we discovered, the error bound of the
composite method is guaranteed to be smaller than that of the simple, and the accuracy can be
improved arbitrarily by simply increasing the number of subintervals used in the approximation.
It requires more calculation and more code though, since more than two subintervals are
involved. In fact, any even number of subintervals can be used. This further complicates how the
code will be written and will most likely affect how much time it will take the computer to work
through the calculations. The basic formula though, stays fairly simple; therefore, overall this
method seems like a very effective way to approximate our gamma distribution function.
V. ADAPTIVE SIMPSON’S METHOD
Strategy of Method
While the composite method gives us an answer that is much closer to the truth, we would rather not
have to worry about how many intervals are “enough”. “Adaptive quadrature is an automatic
procedure for increasing the accuracy of a numerical approximation to an integral by increasing
the number of samples of the integrand. Additional samples only need to be taken where the
quadrature scheme is having numerical difficulties” [7]. These numerical difficulties are related
to the difference in the answers that simple and composite Simpson’s methods give. Through
some careful manipulation we can see that the error of composite Simpson’s method with four
subintervals is about one-fifteenth of the difference between the composite and simple answers
(to be derived in a later section). Taking advantage of this we can simply let a computer decide
when to stop making intervals. Also, if we are careful, we can put more intervals in the areas
where we need them.
Geometric Representation. Notice in Figure 5 that in some areas of the gamma distribution there
are bigger intervals. There the function is not changing so rapidly and so our approximation with
the bigger intervals is sufficient. However, on the left side there are
more intervals because the function is changing more rapidly and we need
more to get a better approximation. That is a huge benefit of the adaptive
Simpson’s method.

Figure 5: Adaptive Simpson's

Formulaic Representation. Adaptive Simpson’s method has no formula of its own. Instead it uses
the formulas for simple and composite Simpson’s method and compares the answers according to
the formula below, where S(a,b) denotes the simple Simpson’s approximation on [a,b] and m is
the midpoint of the interval:

\[ \left| S(a,b) - S(a,m) - S(m,b) \right| < 15\,\varepsilon \tag{14} \]
Epsilon is the error tolerance, and herein lies the greatest strength of adaptive Simpson’s method.
Instead of giving it the number of intervals we want, we give it the biggest error we want to allow.
This is much more meaningful and intuitive than the number of intervals.
Stability
For composite and simple Simpson’s method we did not need to talk about stability.
However, with adaptive Simpson’s we need to in order to understand some limitations. Stability
is the ability of an algorithm to keep the rounding errors made by a computer from growing.
Computers have a finite memory and thus can’t hold infinitely many digits. This means the
computer’s arithmetic will always be a little off. The size of this relative rounding error is
called machine epsilon. Just like our epsilon, machine epsilon refers to error, but the error
incurred by the machine or computer. For most computers this is about $10^{-16}$.
This means that for the adaptive Simpson’s method, we shouldn’t use an error tolerance smaller
than that. It will not provide any more accuracy and will take substantially longer. If we go too
small it might even grow the error as machine epsilon is added over and over again to our
answer. However, for most applications errors of the order of machine epsilon are completely
acceptable and this limitation is okay.
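Machine epsilon can be inspected directly in Python. The short sketch below assumes standard double-precision floating point, which is what Python's `float` uses on common hardware.

```python
import sys

# Gap between 1.0 and the next representable double-precision number.
machine_eps = sys.float_info.epsilon

# Adding machine epsilon to 1.0 produces a different number...
one_plus_eps = 1.0 + machine_eps

# ...but adding half of it is rounded away entirely.
one_plus_half_eps = 1.0 + machine_eps / 2.0

changed = one_plus_eps > 1.0
unchanged = one_plus_half_eps == 1.0
```

This is why an error tolerance far below machine epsilon cannot be met: differences that small are lost in the arithmetic itself.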
Derivation of Error Bound
Here we will describe how to find the error tolerance (epsilon) for adaptive Simpson’s
method, following Burden’s proof [1]. As described above, the error tolerance determines
whether or not to further subdivide an interval using composite Simpson’s method. Suppose we
take a general interval [a,b] of our function with the length of our subintervals being
$(b-a)/2 = h$, as we did in the simple Simpson’s method, and split it using the composite method
so that our subinterval length becomes $(b-a)/4 = h/2$. This changes our equation to

\[ \int_a^b f(x)\,dx = \frac{h}{6}\left[ f(a) + 4f\!\left(a + \tfrac{h}{2}\right) + 2f(a+h) + 4f\!\left(a + \tfrac{3h}{2}\right) + f(b) \right] - \left(\frac{h}{2}\right)^4 \frac{(b-a)}{180}\, f^{(4)}(\tilde{\xi}) \]

for some $\tilde{\xi}$ in [a,b]. We break this equation into two parts, the first half (15) and
second half (16), excluding the error:

\[ S(a,m) = \frac{h}{6}\left[ f(a) + 4f\!\left(a + \tfrac{h}{2}\right) + f(a+h) \right] \tag{15} \]

\[ S(m,b) = \frac{h}{6}\left[ f(a+h) + 4f\!\left(a + \tfrac{3h}{2}\right) + f(b) \right] \tag{16} \]

where m = a + h and S(a,b) denotes the simple Simpson’s approximation
$\frac{h}{3}\left[f(a) + 4f(m) + f(b)\right]$. If we assume that $\xi$ in our simple equation and
$\tilde{\xi}$ in our composite equation give the same fourth derivative value, we can set the two
equations equal to each other. Even though this is unlikely to hold exactly, it is a reasonable
assumption for small intervals and doesn’t affect the integrity of the
derivation. When we set our composite Simpson’s method (equation (12)) equal to our simple
Simpson’s method (equation (9)), we can move both error terms to the same side and simplify to
obtain:

\[ \left| \int_a^b f(x)\,dx - S(a,m) - S(m,b) \right| \approx \frac{1}{15} \left| S(a,b) - S(a,m) - S(m,b) \right| \tag{17} \]

By this we can see that our composite approximation agrees with the true value of the integral
about 15 times better than it agrees with the simple approximation on the same interval. Thus,
as long as the difference between our simple approximation and our composite approximation is
less than 15 epsilon, the error of the composite approximation will be less than epsilon, our
error tolerance. If that holds true, we know that the composite approximation is sufficient over
that interval. If the difference is not less than 15 epsilon, we will need to further subdivide
our intervals.
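The factor of 15 can be observed numerically. The sketch below is an illustration of ours using f(x) = x^4, whose fourth derivative is constant, so the assumption that both error terms share the same fourth derivative value holds exactly and the ratio should come out as 15 up to rounding.

```python
def simple_simpson(f, a, b):
    """Simple Simpson's rule on [a, b]."""
    h = (b - a) / 2.0
    return h / 3.0 * (f(a) + 4.0 * f(a + h) + f(b))


def quartic(x):
    return x ** 4


a, b = 0.0, 2.0
m = 0.5 * (a + b)
exact = (b ** 5 - a ** 5) / 5.0  # exact integral of x^4 over [a, b]

whole = simple_simpson(quartic, a, b)                                    # one parabola
halves = simple_simpson(quartic, a, m) + simple_simpson(quartic, m, b)  # two parabolas

true_error = abs(exact - halves)  # error of the four-subinterval answer
difference = abs(whole - halves)  # quantity the adaptive test looks at
ratio = difference / true_error
```

For a function such as the gamma density the ratio is only approximately 15, and the approximation improves as the interval shrinks.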
Appraisal
Table 3: Adaptive Simpson's
The adaptive Simpson’s method has various advantages and disadvantages. Its greatest
advantage is that the user can specify a certain error tolerance for the method to
consider, as explained in the previous section. It will then give an approximation that is
designed to be accurate within the given error tolerance, subject to the assumption made in the
derivation. As a result, this process can be as
accurate as the user desires. The main disadvantage is that the process is far more complicated
than the previous two methods. This method requires the computer to decide whether
certain intervals lie within the given error tolerance, and then to repeat the process for
the smaller intervals that did not lie within the tolerance. This makes the code far more difficult
and time-consuming to write. It also increases the time the computer will take to calculate the
approximation, especially for very small error tolerances. This method is only considered to be
effective if a specific amount of accuracy is desired. Otherwise, the composite Simpson’s rule is
an accurate and far less complex method.
VI. TWO-THIRDS SIMPSON’S METHOD
Strategy of Method
The Simpson’s methods so far have an undesirable trait associated with them. They give
points different weights. This gives an asymmetrical feel to the formula and makes us ask why they
have different weights. If we look at the derivation of the simple Simpson’s method, we can see
that we are more interested in what the function is doing in the middle of the interval than on the
edges, so that point gets more weight. However, the adaptation we use with the composite
Simpson’s method seems to give different middle points different weights. Thus the
adaptive Simpson’s method, which uses composite Simpson’s rule, is also unsymmetrical.
We would love to find a Simpson’s rule variant that not only gives equal weights to the middle
points but is also more accurate. According to Dr. Daniel Velleman in his paper “Simpson
Symmetrized and Surpassed” the two-thirds Simpson’s method does the job [6].
Geometric Representation. There have been many variants that have symmetrized
Simpson’s rule [6], but they have a worse error bound than composite Simpson’s
rule. We do not want to sacrifice numerical accuracy for aesthetic qualities of the formula.
So Dr. Velleman pinpoints the loss in accuracy in the endpoints. The larger error bound comes from
giving equal weight to end intervals. Dr. Velleman proposes to shrink the end intervals by two
thirds and grow the other intervals accordingly, as shown in Figure 6. The intuitive idea behind
it is that the end intervals are weighing the function with regard to intervals not inside
the interval we care about. By making these smaller we are decreasing this problem.

Figure 6: Two-Thirds Simpson's

Formulaic Representation.
Notice that the formula doesn’t require an even number of intervals. Because we have
symmetrized the formula, it no longer needs this constraint. However, we need at least 5
intervals; otherwise we get a different formula.
Derivation of Method and Error
The math required to derive the two-thirds Simpson’s rule requires much more time and
space than is suitable for a general audience. Interested readers are encouraged to read Dr. Velleman’s
explanation in [6] if they desire more information. However, we will give an overview here.
There are many different formulas for approximating integrals with polynomials, and the Peano
kernel gives a general expression for their error. After substituting the properties that we want
the two-thirds Simpson’s rule to have, we can minimize the Peano kernel to find how much we
should shrink the end intervals; the optimal factor is about 0.6660. To keep his formula simple
and intuitive, Velleman proposes using the value two-thirds instead. Thus the error comes out
to be:
[Equation 19: error bound for the two-thirds Simpson’s rule]
Notice this has the best error bound of all the methods (ignoring adaptive quadrature, which has
a different basis).
Appraisal
Table 4: Two-Thirds Simpson's
Sadly, for our problem the two-thirds Simpson’s method does not perform as well as its error
bound suggests, particularly for small numbers of intervals. The reason is that for the gamma
distribution the end intervals are very important, especially on the right side, while the
two-thirds Simpson’s method puts most of its weight in the middle of the interval. For most
functions, the middle is all you have to worry about, but for the gamma distribution the ends of
the interval matter a great deal. In terms of error, the two-thirds method fares worse than every
method except the simple Simpson’s method.
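To show how such a comparison can be made on a single interval, the following R sketch evaluates the composite and two-thirds rules against pgamma. The helper names composite_simpson and two_thirds_simpson and the interval [1, 2] are choices made for this illustration only; the end weights are the ones used in Appendix B.

```r
# Gamma density with shape 1.5 and rate 1.5, as in Appendix B.
f <- function(x) dgamma(x, shape = 1.5, rate = 1.5)

# Composite Simpson's rule on n subintervals (n even); hypothetical helper.
composite_simpson <- function(f, a, b, n) {
  stopifnot(n >= 2, n %% 2 == 0)
  x <- seq(a, b, length.out = n + 1)
  w <- rep(2, n + 1)
  w[seq(2, n, by = 2)] <- 4
  w[c(1, n + 1)] <- 1
  (b - a) / n / 3 * sum(w * f(x))
}

# Two-thirds Simpson's rule on n intervals (n >= 5); hypothetical helper
# using the same weights as gammatwo in Appendix B.
two_thirds_simpson <- function(f, a, b, n) {
  stopifnot(n >= 5)
  h <- (b - a) / (n - 2 / 3)
  # End intervals have length 2h/3; all other intervals have length h.
  x <- c(a, a + 2 * h / 3 + (0:(n - 2)) * h, b)
  w <- rep(1, n + 1)
  end_w <- c(77 / 360, 205 / 216, 271 / 270)
  w[1:3] <- end_w                 # left end
  w[(n + 1):(n - 1)] <- end_w     # right end, mirrored
  h * sum(w * f(x))
}

a <- 1
b <- 2
exact <- pgamma(b, shape = 1.5, rate = 1.5) - pgamma(a, shape = 1.5, rate = 1.5)

# Print the absolute error of each rule for a few small interval counts.
for (n in c(6, 8, 10)) {
  cat("n =", n,
      " composite:", abs(composite_simpson(f, a, b, n) - exact),
      " two-thirds:", abs(two_thirds_simpson(f, a, b, n) - exact), "\n")
}
```

Changing a and b, for example to an interval that starts at 0 or reaches far into the right tail, lets the reader check how each rule responds to the parts of the gamma density discussed above.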
VII. COMPARISON
Now that we have collected data for our four chosen methods, we can compare them
using our previously defined standards of judgment: speed and accuracy. Table 5 shown below
displays a summary of the data that we have collected from our computer program.
Table 5: Comparison Summary
As seen from the table, the adaptive Simpson’s method was the best in terms of accuracy but
the worst in terms of speed. The simple Simpson’s method, by contrast, was the worst in terms
of accuracy but the best in terms of speed. So the choice comes down to which is more
important, accuracy or speed. If a compromise between the two criteria is wanted, then the
composite Simpson’s method would be the best choice. The composite method is second best in
terms of error and still has a computation time under 10 seconds. The two-thirds method, even
though more accurate than the simple method, was the least effective method because it was not
all that accurate and the computer program for it was complicated.
VIII. CONCLUSION
Through our findings, we have discovered that the least effective approximation method
for the gamma distribution is the two-thirds Simpson’s method. Not only was it not nearly as
accurate as the composite or adaptive Simpson’s methods, it involved much more complicated
coding than the simple or composite methods and took the computer a significant amount of time
to compute. The other approximation methods have varying degrees of effectiveness. The
adaptive Simpson’s method is the most accurate approximation because it gives an estimate
within an error bound specified by the user. However, this method took the computer the
longest time to compute, by far. The simple Simpson’s rule is the least accurate method but
easily had the simplest coding involved, as is evident from the fact that it took the computer
only 0.03 seconds to compute the answer. Taking this all into account, we found that the
most effective approximation method was the composite method. It was the second most
accurate method and took the computer only slightly longer to compute than the two-thirds
method. Of course, if fewer subintervals are used, the calculation time becomes shorter while
the accuracy decreases. As such, this method provides a very accurate approximation and a
relatively fast computation speed, with the option of using fewer subintervals for those who
desire the calculation in even less time.
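The way the adaptive method meets a user-specified error bound can be sketched recursively in R. This is the common textbook formulation, not the stack-based loop of Appendix B: the names simpson and adaptive_simpson, the halving of the tolerance at each split, and the depth limit are assumptions of this sketch. The factor of 15 in the stopping test is the same one used in our gammaadpt function.

```r
# Gamma density with shape 1.5 and rate 1.5, as in Appendix B.
f <- function(x) dgamma(x, shape = 1.5, rate = 1.5)

# Simple Simpson's rule on [a, b]; hypothetical helper.
simpson <- function(f, a, b) {
  (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))
}

# Recursive adaptive Simpson's rule; hypothetical helper.
# An interval is accepted when the one-panel and two-panel estimates
# differ by less than 15 * tol; otherwise it is split in half and each
# half is given half of the tolerance (an assumption of this sketch).
adaptive_simpson <- function(f, a, b, tol, depth = 50) {
  m <- (a + b) / 2
  whole <- simpson(f, a, b)
  halves <- simpson(f, a, m) + simpson(f, m, b)
  if (depth <= 0 || abs(halves - whole) < 15 * tol) {
    return(halves)
  }
  adaptive_simpson(f, a, m, tol / 2, depth - 1) +
    adaptive_simpson(f, m, b, tol / 2, depth - 1)
}

estimate <- adaptive_simpson(f, 1, 2, tol = 1e-8)
exact <- pgamma(2, shape = 1.5, rate = 1.5) - pgamma(1, shape = 1.5, rate = 1.5)
print(abs(estimate - exact))
```

Because the method keeps splitting until the stopping test is met, a tighter tolerance means more function evaluations, which is consistent with the long computation times we observed.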
Since the scope of this paper did not allow for exploration of all the different methods of
numerical quadrature, the curious reader can research additional methods, such as
Simpson’s three-eighths rule, Simpson’s one-third rule, and the generalized Simpson’s rule [7].
Changing the shape and scale of our gamma distribution will also yield different and interesting
results. Another avenue to consider is applying and combining existing methods. Perhaps by
applying the adaptive method to our two-thirds method, we would be able to obtain even more
accuracy. These Simpson’s methods could also be compared to other quadrature methods like
Gaussian quadrature, which chooses its evaluation points to maximize accuracy rather than
spacing them evenly. These derivations are left for the reader to pursue.
REFERENCES
[1] Burden, R. L., & Faires, J. D. (2011). Numerical Analysis (9th ed.). Brooks/Cole, Cengage
Learning.
[2] Davis, P. J., & Rabinowitz, P. (1984). Methods of Numerical Integration (2nd ed., Vol. 1).
Orlando: Academic Press, Inc.
[3] Stewart, J. (2012). Single Variable Calculus: Early Transcendentals (7th ed., Vol. 2). Belmont,
CA: Brooks/Cole, Cengage Learning.
[4] Schroder, J. (2006). Lecture 24: Simpson’s Rule [PowerPoint slides]. Retrieved from
http://lukeo.cs.illinois.edu/static_courses/2006spring/CS257/lectures/lecture24.pdf
[5] Schervish, M. J., & DeGroot, M. H. (2010). Probability and Statistics (4th ed.). Boston:
Addison-Wesley.
[6] Velleman, D. J. (2004). “Simpson Symmetrized and Surpassed.” Mathematics Magazine,
77(1), 31-45. Mathematical Association of America.
[7] Velleman, D. J. (2005). “The Generalized Simpson’s Rule.” The American Mathematical
Monthly, 112(4), 342-350.
[8] Zwillinger, D. (1992). Handbook of Integration. United States of America: Jones and Bartlett
Publishers, Inc.
APPENDIXES
Appendix A: Related theorems
Weighted Mean Value Theorem for Integrals: Suppose f is a continuous function on the interval
[a,b], the Riemann integral of g exists on [a,b], and g(x) does not change sign on [a,b].
Then there exists a number c in (a,b) with $\int_a^b f(x)g(x)\,dx = f(c)\int_a^b g(x)\,dx$.
Intermediate Value Theorem: If f is a continuous function on the interval [a,b], and K is any
number between f(a) and f(b), then there exists a number c in (a,b) for which f(c)=K.
Second Derivative Midpoint Formula:
$f''(x_0) = \frac{1}{h^2}\left[f(x_0 - h) - 2f(x_0) + f(x_0 + h)\right] - \frac{h^2}{12} f^{(4)}(\xi)$
for some $\xi$, where $x_0 - h < \xi < x_0 + h$.
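The second derivative midpoint formula can be checked numerically with a few lines of R. The test function exp, the point x0 = 0, and the step h = 0.1 are arbitrary choices for this illustration; because every derivative of exp is exp, the error term is at most (h^2/12) * exp(x0 + h) in absolute value.

```r
# Numerical check of the second derivative midpoint formula.
# f, x0 and h are illustrative choices, not values used in the report.
f <- exp
x0 <- 0
h <- 0.1

# Central difference estimate of f''(x0).
estimate <- (f(x0 - h) - 2 * f(x0) + f(x0 + h)) / h^2
exact <- exp(x0)                     # f'' = exp for this test function
err_bound <- h^2 / 12 * exp(x0 + h)  # largest possible size of the error term

print(c(estimate = estimate, exact = exact, bound = err_bound))
```

Here the estimate is about 1.0008 against an exact value of 1, so the error of roughly 0.00083 sits inside the bound of roughly 0.00092.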
Appendix B: Original Code
#316 code
x<-seq(0,6,length=1000)
plot(x,dgamma(x,1.5, rate=1.5), type='l', ylim=c(0,1),
     main="Gamma Distribution", xlab="X", ylab="Density")
#dgamma is the function that we are plotting
abline(v=4.6)
abline(v=5.6)
#v is for vertical line
#h is for horizontal line

#Simple Simpson's function for the gamma distribution
gammasimple<-function(a, b){
  h<-(b-a)/2
  out<- (h/3)*(dgamma(a,1.5,rate=1.5)+4*dgamma(a+h,1.5,rate=1.5)+dgamma(b,1.5,rate=1.5))
  return(out)
}

#Composite Simpson's function for the gamma distribution
#n is the number of subintervals and must be even
gammacomp<-function(a,b,n){
  h<-(b-a)/n
  suma=0  #sum over the even-indexed interior points (weight 2)
  sumb=0  #sum over the odd-indexed points (weight 4)
  totala = n/2-1
  if(totala>0){
    for(i in 1:totala){
      suma= suma+dgamma(a+2*h*i,1.5,rate=1.5)
    }
  }
  for(i in 1:(n/2)){
    sumb= sumb+dgamma(a+h*(2*i-1),1.5,rate=1.5)
  }
  out<- (h/3)*(dgamma(a,1.5,rate=1.5)+dgamma(b,1.5,rate=1.5)+2*suma+4*sumb)
  return(out)
}

#2/3 Simpson's method for the gamma distribution
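gammatwo<-function(a,b,n){
  #n is the number of intervals and must be at least 5
  h=(b-a)/(n-(2/3))
  suma=0
  #interior points with weight 1; the test is n>=6 (not n>6) so that
  #the single weight-1 point is included when n=6
  if(n>=6){
    for(i in 3:(n-3)){
      suma=suma+dgamma(a+2/3*h+(i-1)*h,1.5,rate=1.5)
    }
  }
  #three specially weighted points at each end
  #(each + is kept at the end of its line so R continues the expression)
  sumb= (77/360)*dgamma(a,1.5,rate=1.5) +
    (205/216)*dgamma(a+(2*h)/3,1.5,rate=1.5) +
    (271/270)*dgamma(a+2/3*h+h,1.5,rate=1.5) +
    (271/270)*dgamma(a+(2*h)/3+h*(n-3),1.5,rate=1.5) +
    (205/216)*dgamma(a+(2*h)/3+h*(n-2),1.5,rate=1.5) +
    (77/360)*dgamma(b,1.5,rate=1.5)
  out=h*(sumb+suma)
  return(out)
}

#Adaptive Simpson's method for gamma distribution
gammaadpt<-function(a,b,e){
  suma=0
  c <- rep(NA, 10000)  #stack of interval endpoints
  d <- rep(NA, 10000)  #stack of halved tolerances (stored, but the test below uses e)
  d[1]=e
  c[1]=a
  c[2]=b
  j=1
  while(j>0){
    simple=gammasimple(c[j],c[j+1])
    comp=gammacomp(c[j],c[j+1],4)
    if (abs(simple-comp) < 15*e){
      #accept this interval and pop it off the stack
      suma=suma+comp
      d[(j+1)/2]=NA
      c[j]=NA
      c[j+1]=NA
      j=j-2
    } else {
      #split the interval in half and push both halves
      #(else, rather than a second if with >, so an exact tie cannot loop forever)
      end=c[j+1]
      c[j+1]=(end+c[j])/2
      c[j+2]=(end+c[j])/2
      c[j+3]=end
      d[(j+1)/2+1]=d[(j+1)/2]/2
      d[(j+1)/2]=d[(j+1)/2+1]
      j=j+2
    }
  }
  return(suma)
}

#Takes vectors and uses gammaadpt to find probabilities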
vadpt = function(x,y,e){
  out = rep(NA, length(x))
  if(length(x)==length(y)){
    for (i in 1:length(x)){
      out[i]=gammaadpt(x[i],y[i],e)
    }
  }
  return(out)
}

#Random tests
#Generate the random intervals to test on
x = runif(5000,min=0,max=3)
z = runif(5000,min=0,max=3)
y= x+z

#Find the approximations and compute the time taken to do so
intervals = c(6,8,10,100,1000)
error = c(1,10^-2,10^-6,10^-12)
simpletime = system.time({
  simpleans = t(gammasimple(x,y))
})
comptime = rep(0,length(intervals))
compans = matrix(NA,length(x),length(intervals))
for (i in 1:length(intervals)){
  comptime[i] = system.time({
    compans[,i] = t(gammacomp(x,y,intervals[i]))
  })[3]
}
twotime = rep(0,length(intervals))
twoans = matrix(NA,length(x),length(intervals))
for (i in 1:length(intervals)){
  twotime[i] = system.time({
    twoans[,i] = t(gammatwo(x,y,intervals[i]))
  })[3]
}
adpttime = rep(0,length(error))
adptans = matrix(NA,length(x),length(error))
for (i in 1:length(error)){
  print(i)
  adpttime[i] = system.time({
    adptans[,i] = t(vadpt(x,y,error[i]))
  })[3]
}
#Calculate the real answer
real = pgamma(y,1.5,rate=1.5)-pgamma(x,1.5,rate=1.5)

#Compute the errors in the approximations
simpleerror = real-simpleans
comperror = real-compans
twoerror = real-twoans
adpterror = real-adptans

#Statistics
#Comments:
# mean() calculates mean
# sd() calculates standard deviation
# quantile( ,c(.025,.975)) calculates the 2.5% and 97.5% quantiles,
#   i.e. the range containing the central 95% of the errors

#Simple Simpson's method statistics
mean(simpleerror)
sd(simpleerror)
quantile(simpleerror,c(.025,.975))

#Composite Simpson's method statistics
apply(comperror,2,mean)
apply(comperror,2,sd)
for (i in 1:length(intervals)){
  print(quantile(comperror[,i],c(.025,.975)))
}

#2/3 Simpson's method statistics
apply(twoerror,2,mean)
apply(twoerror,2,sd)
for (i in 1:length(intervals)){
  print(quantile(twoerror[,i],c(.025,.975)))
}

#Adaptive Simpson's method statistics
apply(adpterror,2,mean)
apply(adpterror,2,sd)
for (i in 1:length(error)){
  print(quantile(adpterror[,i],c(.025,.975)))
}

#Computation times
simpletime
comptime
twotime
adpttime
Appendix C: Assignment Foundation
Purpose: The purpose of this assignment is to conduct our own original experiment and to report
on our findings. This project is the main focus of this class; therefore, there are many sub-
purposes of this assignment. Through this project we will learn how to effectively communicate
research as well as how to conduct our own research. Our chosen research topic is integral
approximation. In this report we propose to explore different methods, all variations of
Simpson’s method, in approximating the gamma distribution. Our research and experimentation
must be presented clearly and accurately, not necessarily to persuade the reader to a specific
action, but to inform them, and perhaps to pique their interest in the field of mathematics
involving approximating integrals. We hope to clearly communicate our findings by means of
graphs, tables, and formulas, with explanations of each.
Audience: The audience for this assignment is our instructor and fellow classmates. Therefore,
when writing our report we should keep in mind that not everyone reading this report will be
an expert in our topic. We have assumed a basic understanding of calculus and mathematics in
general. Our audience is educated and intelligent, and should not be patronized. This report is
designed to make the reader think, and the reader should not feel spoon-fed. Most people do not
have much of an interest in calculus, so our report should be interesting and engaging. Our
instructor Liz is looking not so much at the correctness and rigor of our research, but rather the
correctness and rigor of our writing. She is looking for a paper that has cut down the lard, that
clearly illustrates a point, that is concise and consistent, and that is able to communicate the ideas
in the paper.
Scope: This project should be at least 15 pages in length without any added graphics but no more
than 20 pages at Liz’s request. This allows us to elaborate on our four chosen methods, but not to
explore many more than that. We researched and experimented with simple, composite, adaptive,
and two-thirds Simpson’s methods. We have included much information on these methods,
including derivations of the formulae for the approximation itself as well as the error. Also
included is our experimentation, as we took multiple approximations of the same integral with
the different methods to compare them. Since the majority of our paper is based on original
research, we have fewer cited sources than may be expected. We have included in our appendix
much of our original code. We drew on the research of others for derivations and assumptions
that could not be explained in the allotted page limit, but the majority of our equations were
worked out by ourselves.
Format: This project is to be submitted as a hard copy and is to be spiral bound. There is a
formatting guide that has been placed on Learning Suite and we are to follow that format exactly
when writing this report. We have prefatory elements, which include a title page, table of
contents, lists of figures and tables, and an abstract; as well as the bulk of our report, which is
organized with headings, subheadings, and sub-subheadings. Following the body of our report is
our appendix, which is where our assignment foundation will be included, as well as other
relevant information.
Schedule: This project has several subprojects whose due dates will help us finish this technical
report. These include the Journal Article Analysis White Paper, Proposal, Conflict Resolution
Memo, and Critique. The final draft of our technical report is due April 9, 2014. There is no
rough draft due for this project; therefore, our group needs to stay on top of our decided schedule
in our proposal. As a group we meet at least weekly on Saturday mornings to work together on
our project.