Introduction to Statistics: Worked examples


  • 8/10/2019 Introduction to Statistics: Worked examples


    MATH1725 Introduction to Statistics: Worked examples

    Worked Example: Lectures 1–2

    The lifetimes of 400 light-bulbs were found to the nearest hour. The results were recorded as follows.

    Lifetime (hours)   0–199  200–399  400–599  600–799  800–999  1000–1199  1200–1999
    Frequency            143       97       64       51       14         14         17

    Construct a histogram and cumulative frequency polygon for these data. Estimate the percentage of bulbs with lifetime less than 480 hours.

    Answer: Lifetimes cannot be negative, so the class intervals are [0, 199.5), [199.5, 399.5), [399.5, 599.5), and so on.

    [Figure: histogram of the light-bulb data. Horizontal axis: lifetime (hours), 0 to 2000. Vertical axis: frequency per 200-hour class.]

    Adjust the height of the rectangle for the 1200–2000 interval so that histogram area is proportional to frequency: this class has width 800, so on a scale of frequency per 200 hours its height is 17 × 200/800 = 4.25. Similarly, the height of the [0, 199.5) class is 143 × 200/199.5 = 143.4, to allow for the first class not being of width 200.

    Lifetime (hours)      0.0  199.5  399.5  599.5  799.5  999.5  1199.5  1999.5
    Cumulative frequency    0    143    240    304    355    369     383     400

    Make the cumulative frequency at time zero equal to 0.

    [Figure: cumulative frequency polygon, lifetime (hours) against cumulative frequency, with an enlarged view of the 400–600 hour range showing that a lifetime of 480 hours corresponds to a cumulative frequency of 265.8.]

    The estimated number of light-bulbs with lifetime less than 480 hours is

    $$240 + \frac{480 - 399.5}{200} \times (304 - 240) = 265.8.$$


    The required percentage is $\frac{265.8}{400} \times 100 = 66.4\%$.
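The interpolation on the cumulative frequency polygon can be checked numerically. The following is a minimal Python sketch, not part of the original handout; the function name `cumulative_at` is hypothetical, and the class boundaries assume the upper boundary of the 1000–1199 class is 1199.5.

```python
def cumulative_at(x, bounds, cum_freq):
    """Linearly interpolate the cumulative frequency polygon at x."""
    for lo, hi, c_lo, c_hi in zip(bounds, bounds[1:], cum_freq, cum_freq[1:]):
        if lo <= x <= hi:
            return c_lo + (x - lo) / (hi - lo) * (c_hi - c_lo)
    raise ValueError("x lies outside the tabulated range")


# Class boundaries and cumulative frequencies from the light-bulb example.
bounds = [0.0, 199.5, 399.5, 599.5, 799.5, 999.5, 1199.5, 1999.5]
cum_freq = [0, 143, 240, 304, 355, 369, 383, 400]

count_below_480 = cumulative_at(480, bounds, cum_freq)
percent_below_480 = 100 * count_below_480 / cum_freq[-1]
print(round(count_below_480, 1), round(percent_below_480, 1))
```

The same function can be evaluated at any other lifetime within the tabulated range.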

    Worked Example: Lectures 1–2

    The Christmas cactus Zygocactus truncatus has branches made up of separate segments. For one such cactus the number of segments in each branch was counted.

    Number x of segments                   1  2  3  4  5   6  7  8  9
    Number of branches with x segments     3  0  6  7  8  18  8  0  2

    Construct a cumulative frequency polygon to represent these data.

    Answer: The data are discrete, so the cumulative frequency plot is a step function.

    Number x of segments    1  2  3   4   5   6   7   8   9
    Cumulative frequency    3  3  9  16  24  42  50  50  52

    [Figure: cumulative frequency step function. Horizontal axis: number of segments, 0 to 10. Vertical axis: cumulative frequency, 0 to 60.]

    Worked Example: Lectures 1–2

    The following data give one hundred measurement errors made during the mapping of the American state of Massachusetts during the last century.

    Error X (in minutes of arc)  (−4, −2]  (−2, 0]  (0, +2]  (+2, +4]  (+4, +6]
    Frequency                          10       43       39         5         3

    Show that the sample mean and sample standard deviation for these data are $\bar{x} = -0.04$ and $s = 1.717$ respectively.

    Answer:

    Class          Class frequency f   Class mid-point x   f x   f x^2
    −4 < x ≤ −2                   10                  −3   −30      90
    −2 < x ≤ 0                    43                  −1   −43      43
     0 < x ≤ +2                   39                  +1    39      39
    +2 < x ≤ +4                    5                  +3    15      45
    +4 < x ≤ +6                    3                  +5    15      75
    Totals                   n = 100                        −4     292


    $$\bar{x} = \frac{-4}{100} = -0.04.$$

    $$s^2 = \frac{1}{99}\left(292 - 100 \times (-0.04)^2\right) = 2.9479, \quad \text{so } s = \sqrt{s^2} = \sqrt{2.9479} = 1.717.$$
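As a cross-check, the grouped-data mean and standard deviation can be reproduced from the class mid-points and frequencies. This is a short Python sketch added for illustration; the variable names are not from the handout.

```python
import math

# Class mid-points and frequencies from the Massachusetts mapping-error table.
midpoints = [-3, -1, 1, 3, 5]
freqs = [10, 43, 39, 5, 3]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

mean = sum_fx / n
variance = (sum_fx2 - n * mean ** 2) / (n - 1)  # divisor n - 1, as in the text
sd = math.sqrt(variance)
print(n, sum_fx, sum_fx2, round(mean, 2), round(sd, 3))
```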

    Worked Example: Lectures 1–2

    The times between arrivals of 60 patients at an intensive care unit were recorded to the nearest hour. The data are shown below.

    Time (hours)  0–19  20–39  40–59  60–79  80–99  100–119  120–139  140–159  160–179
    Frequency       16     13     17      4      4        3        1        1        1

    Determine the median and semi-interquartile range. Explain why this pair of statistics might be preferred to the mean and standard deviation for these data.

    Answer:

    Time (hours)          0.0  19.5  39.5  59.5  79.5  99.5  119.5  139.5  159.5  179.5
    Cumulative frequency    0    16    29    46    50    54     57     58     59     60

    The median lies in the 40–59 class, corresponding to cumulative frequency 30.
    The lower quartile lies in the 0–19 class, corresponding to cumulative frequency 15. Notice that this class has width 19.5 hours, not 20 hours.
    The upper quartile lies in the 40–59 class, corresponding to cumulative frequency 45.

    Median = $39.5 + \frac{30 - 29}{46 - 29} \times 20 = 40.7$ hours.

    Lower quartile = $0.0 + \frac{15 - 0}{16 - 0} \times 19.5 = 18.3$ hours.

    Upper quartile = $39.5 + \frac{45 - 29}{46 - 29} \times 20 = 58.3$ hours.

    Semi-interquartile range = $\frac{1}{2}(58.3 - 18.3) = 20.0$ hours.

    The histogram for these data is positively skewed, so the median and semi-interquartile range might be preferred to the mean and standard deviation as measures of location and dispersion respectively.
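The three interpolations above all invert the cumulative frequency polygon at a target cumulative frequency. A minimal Python sketch of that step follows; the helper name `grouped_quantile` is hypothetical and is not taken from the handout.

```python
def grouped_quantile(target, bounds, cum_freq):
    """Value at which the cumulative frequency polygon reaches `target`."""
    for lo, hi, c_lo, c_hi in zip(bounds, bounds[1:], cum_freq, cum_freq[1:]):
        if c_lo <= target <= c_hi and c_hi > c_lo:
            return lo + (target - c_lo) / (c_hi - c_lo) * (hi - lo)
    raise ValueError("target lies outside the cumulative frequency range")


# Class boundaries and cumulative frequencies for the inter-arrival times.
bounds = [0.0, 19.5, 39.5, 59.5, 79.5, 99.5, 119.5, 139.5, 159.5, 179.5]
cum_freq = [0, 16, 29, 46, 50, 54, 57, 58, 59, 60]
n = cum_freq[-1]

lower_q = grouped_quantile(n / 4, bounds, cum_freq)
median = grouped_quantile(n / 2, bounds, cum_freq)
upper_q = grouped_quantile(3 * n / 4, bounds, cum_freq)
siqr = (upper_q - lower_q) / 2
print(round(lower_q, 1), round(median, 1), round(upper_q, 1), round(siqr, 1))
```

Because the class widths are passed in through `bounds`, the narrower first class (width 19.5) is handled without special treatment.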

    [Figure: histogram of the inter-arrival times. Horizontal axis: interarrival time (hours), 0 to 200. Vertical axis: frequency per 20-hour class, 0 to 20.]


    Worked Example: Lectures 4–6

    A firm investigates the length of telephone conversations of its office staff. Ten consecutive conversations had lengths, in minutes:

    10.7, 9.5, 11.1, 7.8, 11.9, 4.1, 10.0, 9.2, 6.5, 9.2.

    Derive a 95% confidence interval for the mean conversation length. Test whether the mean length of a conversation is eight minutes.

    Answer:

    $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{90}{10} = 9 \text{ minutes}.$$

    $$s^2 = \frac{1}{n-1}\left\{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right\} = 5.42.$$

    Estimate the population variance $\sigma^2$ by $s^2$, with $s = \sqrt{5.42} = 2.33$. Then

    $$\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}.$$

    A 95% confidence interval for $\mu$ is $\bar{x} \pm t_9(2.5\%)\, s/\sqrt{10}$. Here $s/\sqrt{10} = 0.737$ and $t_9(2.5\%) = 2.262$, so

    $$\bar{x} \pm t_9(2.5\%)\frac{s}{\sqrt{10}} = 9 \pm (2.262 \times 0.737) = 9 \pm 1.667 = (7.3, 10.7).$$

    Since 8 minutes lies inside the 95% confidence interval, we would accept $H_0$ in testing $H_0: \mu = 8$ vs. $H_1: \mu \neq 8$ at the 5% significance level.
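The interval and the test can be reproduced with a few lines of Python. This is an illustrative sketch only: the Python standard library has no t quantile function, so the critical value 2.262 is copied from the text as a constant rather than computed.

```python
import math
import statistics

lengths = [10.7, 9.5, 11.1, 7.8, 11.9, 4.1, 10.0, 9.2, 6.5, 9.2]
T9_UPPER_2_5 = 2.262  # t9(2.5%) as quoted in the text (table value)

n = len(lengths)
mean = statistics.mean(lengths)
s = statistics.stdev(lengths)          # sample standard deviation, divisor n - 1
std_error = s / math.sqrt(n)

half_width = T9_UPPER_2_5 * std_error
ci = (mean - half_width, mean + half_width)

mu0 = 8.0                              # hypothesised mean under H0
t_stat = (mean - mu0) / std_error
reject_h0 = abs(t_stat) > T9_UPPER_2_5
print(round(ci[0], 1), round(ci[1], 1), round(t_stat, 2), reject_h0)
```

Checking whether 8 lies in the interval and comparing |t| with the critical value are equivalent ways of carrying out the same two-sided test.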

    Worked Example: Lectures 5–6

    A population has a Poisson distribution but it is not known whether the mean is 1 or 4. To test the hypothesis $H_0: \lambda = 1$ vs. $H_1: \lambda \neq 1$ on the basis of one observation $X$, the following test procedure is considered: reject $H_0$ if $X \geq i$.

    A Type I error is defined to be rejecting $H_0$ when $H_0$ is true. Find the probability of a Type I error for the three cases $i = 2, 3, 4$.

    Answer: If $H_0$ is true, $\lambda = 1$ and

    $$\mathrm{pr}\{X = x\} = \frac{e^{-1}}{x!}, \quad x = 0, 1, 2, \ldots,$$

    so that pr{Type I error} = pr{X ≥ i}.

    If i = 2,

    pr{Type I error} = pr{X ≥ 2} = 1 − pr{X
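The Type I error probabilities for the three cut-offs can be evaluated directly from the Poisson probabilities under the null hypothesis. The sketch below is an illustration, not part of the handout; it assumes the rejection rule is "reject if X ≥ i" with mean 1 under the null hypothesis, and the function names are hypothetical.

```python
import math


def poisson_pmf(x, mean):
    """pr{X = x} for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


def type_one_error(i, mean_h0=1.0):
    """pr{X >= i} under H0, computed as 1 - pr{X <= i - 1}."""
    return 1.0 - sum(poisson_pmf(x, mean_h0) for x in range(i))


errors = {i: type_one_error(i) for i in (2, 3, 4)}
for i, prob in errors.items():
    print(i, round(prob, 4))
```

Raising the cut-off i makes rejection rarer, so the Type I error probability falls as i increases.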


    Worked Example: Lectures 5–6

    A sample of size 64 is drawn by simple random sampling from a normal population which has variance 4. The sample mean is −0.45. Test the hypothesis $H_0: \mu = 0$ vs. $H_1: \mu \neq 0$ at the 5% level of significance. Repeat for testing $H_0: \mu = 0$ vs. $H_1: \mu < 0$.

    Answer: Here $\bar{X} \sim N(\mu, \sigma^2/n)$ with $\sigma^2 = 4$ and $n = 64$, so $\sigma^2/n = 0.0625$ and $\bar{X} \sim N(\mu, 0.0625)$. With $\mu = 0$ under $H_0$, the test statistic is

    $$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X}}{\sqrt{0.0625}} = \frac{\bar{X}}{0.25},$$

    where $Z \sim N(0, 1)$ if $H_0$ is true.

    For $\alpha = 0.05$ with a two-sided test, $z_{\alpha/2} = 1.96$. The critical region is $|Z| > 1.96$. The observed value is $z = -0.45/0.25 = -1.8$. This does not lie in the critical region, so accept $H_0$.

    For $\alpha = 0.05$ with a one-sided test, $z_{\alpha} = 1.645$. The critical region is $Z < -1.645$. The observed value is $z = -1.8$, which lies in the critical region, so reject $H_0$.
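The two decisions can be checked with `statistics.NormalDist` from the Python standard library. This sketch takes the sample mean to be −0.45 and the one-sided alternative to be in the lower tail, which is an assumption made here because minus signs appear to have been lost from the source text.

```python
import math
from statistics import NormalDist

n, sigma2, xbar, mu0 = 64, 4.0, -0.45, 0.0   # xbar sign is an assumption

z = (xbar - mu0) / math.sqrt(sigma2 / n)

std_normal = NormalDist()
z_two_sided = std_normal.inv_cdf(0.975)      # about 1.96
z_one_sided = std_normal.inv_cdf(0.95)       # about 1.645

reject_two_sided = abs(z) > z_two_sided
reject_lower_tail = z < -z_one_sided
print(round(z, 2), reject_two_sided, reject_lower_tail)
```

The same observed statistic leads to different decisions because the one-sided test places all 5% of the rejection probability in one tail.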

    Worked Example: Lecture 6

    The absenteeism rates (in days and parts of days) for nine employees of a large company were recorded in two consecutive years.

    Employee    1    2     3    4    5     6     7     8    9
    Year 1    3.0  6.7  11.3  5.0  9.4  15.7   8.0  10.0  9.7
    Year 2    2.8  5.1   8.4  5.0  6.2  12.2  10.0   6.8  6.0

    Is there any evidence that the average absenteeism rate is different for the two years?

    Answer: The data are paired, since the same employee is studied in each of the two years. Form the differences $d_i = (\text{year 1})_i - (\text{year 2})_i$. We need to estimate the variance $\sigma_d^2$ of the differences, then test $H_0: \mu_d = 0$ vs. $H_1: \mu_d \neq 0$. See lecture 6.

    Worked Example: Lecture 8

    Which of the phrases (i) to (iv) below apply to the sample correlation coefficient $r_{XY}$?
    (i) measures linear association between two variables,
    (ii) is never negative,
    (iii) has positive slope,
    (iv) depends on the units of measurement of X and Y.

    Answer: (i) only.

    Worked Example: Lecture 8

    The tensile strength of a glued joint is related to the glue thickness. A sample of six values gave the following results:

    Glue Thickness (inches)   0.12  0.12  0.13  0.13  0.14  0.14
    Tensile Strength (lbs.)   49.8  46.1  46.5  45.8  44.3  45.9

    Calculate the sample correlation coefficient r for these data.

    Use the fitted least squares regression line to predict the tensile strength of a joint for a glue thickness of 0.14 inches.

    Using scatter-diagrams, sketch the form of regression line expected in the three cases when r takes the values −1, 0, and +1.


    Answer: Let X denote the glue thickness and Y the joint strength.

              x      y     x^2        y^2      xy
           0.12   49.8  0.0144    2480.04   5.976
           0.12   46.1  0.0144    2125.21   5.532
           0.13   46.5  0.0169    2162.25   6.045
           0.13   45.8  0.0169    2097.64   5.954
           0.14   44.3  0.0196    1962.49   6.202
           0.14   45.9  0.0196    2106.81   6.426
    Totals 0.78  278.4  0.1018   12934.44  36.135

    $$\bar{x} = \frac{0.78}{6} = 0.13, \quad \bar{y} = \frac{278.4}{6} = 46.4, \quad s_X^2 = \frac{1}{5}\{0.1018 - 6(0.13)^2\} = 0.00008,$$

    $$s_Y^2 = \frac{1}{5}\{12934.44 - 6(46.4)^2\} = 3.336, \quad s_{XY} = \frac{1}{5}\{36.135 - 6(0.13)(46.4)\} = -0.0114.$$

    $$r_{XY} = \frac{s_{XY}}{s_X s_Y} = \frac{-0.0114}{\sqrt{0.00008 \times 3.336}} = -0.698.$$

    Regression line:

    $$y = \bar{y} + (x - \bar{x})\frac{s_{XY}}{s_X^2} = 46.4 + (x - 0.13)\frac{-0.0114}{0.00008} = 64.925 - 142.5x.$$

    At x = 0.14 this gives y = 44.975 lbs.

    Scatter-plots:
    r = −1: data lie on a straight line with negative slope.
    r = +1: data lie on a straight line with positive slope.
    r = 0: data randomly scattered (X and Y independent), or the plot could show X and Y having a non-linear dependence as in the lecture notes. You could even show both of these cases!
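The correlation, fitted line and prediction can be verified from the raw data. The following Python sketch is an illustration with hypothetical variable names; it works with sums of squares about the means, which is algebraically the same as the formulas in the answer.

```python
import math

thickness = [0.12, 0.12, 0.13, 0.13, 0.14, 0.14]   # x, inches
strength = [49.8, 46.1, 46.5, 45.8, 44.3, 45.9]    # y, lbs

n = len(thickness)
x_bar = sum(thickness) / n
y_bar = sum(strength) / n

# Corrected sums of squares and products.
s_xx = sum((x - x_bar) ** 2 for x in thickness)
s_yy = sum((y - y_bar) ** 2 for y in strength)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(thickness, strength))

r = s_xy / math.sqrt(s_xx * s_yy)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
predicted = intercept + slope * 0.14
print(round(r, 3), round(slope, 1), round(intercept, 3), round(predicted, 3))
```

The divisor n − 1 cancels in both r and the slope, so it is omitted here.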

    Worked Example: Lecture 11

    A coin is tossed three times. Let X denote the number of heads and Y the length of the longest run of heads or tails. Thus HTT gives X = 1 and Y = 2, THT gives X = 1 and Y = 1.
    (a) Obtain the joint probabilities of X and Y.
    (b) Obtain the marginal probability distributions of X and Y.
    (c) If X = 1, what is the distribution of Y?

    Answer: (a and b) All eight outcomes are equally likely, so each occurs with probability 1/8.

    Outcome      HHH  HHT  HTH  HTT  THH  THT  TTH  TTT
    X              3    2    2    1    2    1    1    0
    Y              3    2    1    2    2    1    2    3
    Probability  1/8  1/8  1/8  1/8  1/8  1/8  1/8  1/8

               Y = 1  Y = 2  Y = 3  p_X(x)
    X = 0          0      0    1/8     1/8
    X = 1        1/8    1/4      0     3/8
    X = 2        1/8    1/4      0     3/8
    X = 3          0      0    1/8     1/8
    p_Y(y)       1/4    1/2    1/4  Total = 1


    Joint probabilities p(x, y) are found by summing the probabilities for each outcome giving rise to (X = x, Y = y). Thus p(1, 2) = pr{HTT or TTH} = 1/4.

    Marginal probabilities are found by forming row or column sums. Thus, for example,

    $$\mathrm{pr}\{X = 2\} = p(2, 1) + p(2, 2) + p(2, 3) = \frac{3}{8}.$$

    (c) If X = 1, then

    $$\mathrm{pr}\{Y = y \mid X = 1\} = \frac{p(1, y)}{p_X(1)} = \frac{p(1, y)}{3/8}.$$

    Thus

    $$\mathrm{pr}\{Y = 1 \mid X = 1\} = \frac{1/8}{3/8} = \frac{1}{3}, \quad \mathrm{pr}\{Y = 2 \mid X = 1\} = \frac{2/8}{3/8} = \frac{2}{3}, \quad \mathrm{pr}\{Y = 3 \mid X = 1\} = 0.$$

    If X = 1, then the outcome is one of HTT, THT, TTH. In one out of these three cases we observe Y = 1 and in two out of three we observe Y = 2.
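The joint, marginal and conditional distributions can be obtained by enumerating the eight outcomes. The Python sketch below is an added illustration; it uses exact fractions so the results can be compared directly with the table.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import groupby, product

# Joint distribution of X (number of heads) and Y (longest run of H or T).
joint = defaultdict(Fraction)
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = max(len(list(run)) for _, run in groupby(outcome))
    joint[(x, y)] += Fraction(1, 8)

# Marginal distributions: sum the joint probabilities over the other variable.
p_x = defaultdict(Fraction)
p_y = defaultdict(Fraction)
for (x, y), p in list(joint.items()):
    p_x[x] += p
    p_y[y] += p

# Conditional distribution of Y given X = 1.
y_given_x1 = {y: joint[(1, y)] / p_x[1] for y in (1, 2, 3)}
print(dict(p_x), dict(p_y), y_given_x1)
```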

    Worked Example: Lecture 11

    X and Y are independent continuous random variables which are each uniformly distributed on the interval (0, 1).

    (a) Find the probability that $0 < X + Y < z$ for values $z \in (0, 2)$.
    (b) If $Z = X + Y$, deduce the form of the probability density function $f(z)$ of Z.

    Hints: In (a), think about the area on the x-y plane corresponding to $0 < x + y < z$. In (b), first find the cumulative distribution function $F(z) = \mathrm{pr}\{Z \leq z\}$.

    Answer: X and Y are uniformly distributed on the interval [0, 1), so X and Y have pdf

    fX(x) = 1 if 0 < x


    From the figure above, pr{0 < X + Y < z} = (1/2)z^2 if 0 < z


    Answer:

    $$E[T] = E[a_1 X_1 + a_2 X_2] = a_1 E[X_1] + a_2 E[X_2] = a_1\mu + a_2\mu = (a_1 + a_2)\mu.$$

    If we require $E[T] = \mu$, then $a_1 + a_2 = 1$, so that $a_2 = 1 - a_1$. Since $E[T] = \mu$, T is said to be an unbiased estimator of the mean.

    $$\mathrm{Var}[T] = \mathrm{Var}[a_1 X_1 + a_2 X_2] = a_1^2\,\mathrm{Var}[X_1] + a_2^2\,\mathrm{Var}[X_2] = a_1^2\sigma^2 + a_2^2\sigma^2 = (a_1^2 + a_2^2)\sigma^2.$$

    Since $a_2 = 1 - a_1$, $\mathrm{Var}[T] = \{a_1^2 + (1 - a_1)^2\}\sigma^2 = (2a_1^2 - 2a_1 + 1)\sigma^2$. Differentiate this with respect to $a_1$ to find the minimum:

    $$\frac{d}{da_1}\mathrm{Var}[T] = (4a_1 - 2)\sigma^2,$$

    which is zero when $a_1 = \frac{1}{2}$. Hence Var[T] is a minimum when $a_1 = a_2 = \frac{1}{2}$, so $T = \frac{1}{2}(X_1 + X_2)$.

    Alternative derivation: write $a_1 = \frac{1}{2} + \epsilon$, $a_2 = \frac{1}{2} - \epsilon$. Then

    $$\mathrm{Var}[T] = (a_1^2 + a_2^2)\sigma^2 = \left\{\left(\tfrac{1}{2} + \epsilon\right)^2 + \left(\tfrac{1}{2} - \epsilon\right)^2\right\}\sigma^2 = \left(\tfrac{1}{2} + 2\epsilon^2\right)\sigma^2,$$

    and is a minimum if $\epsilon = 0$.

    What does this question show? In part (a) you chose $a_2$ to restrict attention to linear combinations of the $X_i$ which were unbiased estimators of the mean, so $E[T] = \mu$. In part (b) you then showed that of all such unbiased estimators, the sample mean $\bar{X}$ is the one with smallest variance, so giving values closest to the true mean.
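The minimisation can also be checked numerically. The short Python sketch below is an illustration only: it evaluates Var[T]/σ² over a grid of weights with the second weight fixed by the unbiasedness constraint, and the function name `variance_factor` is hypothetical.

```python
def variance_factor(a1):
    """Var[T] / sigma^2 for T = a1*X1 + (1 - a1)*X2 with independent X1, X2."""
    a2 = 1.0 - a1          # unbiasedness constraint a1 + a2 = 1
    return a1 ** 2 + a2 ** 2


# Search a grid of weights between 0 and 1 in steps of 0.01.
grid = [k / 100 for k in range(101)]
best_a1 = min(grid, key=variance_factor)
print(best_a1, variance_factor(best_a1))
```

A grid search is used here only to mirror the calculus argument; it does not replace the derivation.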

    Worked Example: Lecture 15.

    The following data give the noise level (in decibels) generated by fourteen different chain saws powered in one of two different ways.

    Petrol-powered chain saws     103  103  105  106  108  105  106
    Electric-powered chain saws    97   95   94   93   91   95   94

    At the 5% level of significance, test whether the average noise level of petrol-powered chain saws is higher than for electric-powered chain saws.

    Answer: Testing $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 > \mu_2$, i.e. $H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 > 0$. We have two independent samples with unknown variances, and need to assume that the variances are equal.
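The handout stops at the set-up, so the following Python sketch shows how the pooled two-sample t statistic could be computed for these data. It is an illustration under the equal-variance assumption stated above; the one-sided critical value t12(5%) = 1.782 is a standard table value that is not quoted in the text.

```python
import math
import statistics

petrol = [103, 103, 105, 106, 108, 105, 106]
electric = [97, 95, 94, 93, 91, 95, 94]
T12_UPPER_5 = 1.782  # t12(5%) from standard t tables (assumption: not in the text)

n1, n2 = len(petrol), len(electric)

# Pooled estimate of the common variance.
pooled_var = (
    (n1 - 1) * statistics.variance(petrol)
    + (n2 - 1) * statistics.variance(electric)
) / (n1 + n2 - 2)

std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (statistics.mean(petrol) - statistics.mean(electric)) / std_error
reject_h0 = t_stat > T12_UPPER_5   # one-sided: reject only for large positive t
print(round(pooled_var, 4), round(t_stat, 2), reject_h0)
```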

    Worked Example: Lecture 15.

    The following data give the length (in mm) of cuckoo (Cuculus canorus) eggs found in nests belonging to wrens (A) and reed warblers (B).

    A:  19.8  22.1  21.5  20.9  22.0  21.0  22.3  21.0  20.3  20.9
    B:  23.2  22.0  22.2  21.2  21.6  21.6  21.9  22.0  22.9  22.8

    Assuming the variances for each group are the same, is there any evidence at the 5% level to suggest that the egg size differs between the two host species?


    Answer: We have two independent normal distributions with unknown variances.
    Wrens: $\bar{x}_1 = 21.18$ mm, $s_1^2 = 0.6418$, $n_1 = 10$.
    Reed warblers: $\bar{x}_2 = 22.14$ mm, $s_2^2 = 0.4116$, $n_2 = 10$.

    Assume $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (unknown). Estimate $\sigma^2$ using

    $$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{9s_1^2 + 9s_2^2}{18} = 0.5267.$$

    Also $\bar{x}_1 - \bar{x}_2 = 21.18 - 22.14 = -0.96$,

    $$s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right) = 0.1053, \quad t_{18}(2.5\%) = 2.101.$$

    If $\mu_1 = \mu_2$ then the two groups of eggs have the same mean length.

    To test $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$ at the 5% level, reject $H_0$ if

    $$\frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{s^2(1/n_1 + 1/n_2)}} > t_{18}(2.5\%).$$

    Here

    $$\frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2(1/n_1 + 1/n_2)}} = \frac{-0.96}{\sqrt{0.1053}} = -2.96,$$

    so reject the null hypothesis of equal means at the 5% level. The two groups of eggs are significantly different at the 5% level.

    This does not necessarily imply cuckoos can control their egg size. It has been proposed that a cuckoo lays its egg in the particular nest for which it is best adapted. For further information see:
    Wyllie, I. (1981) The Cuckoo. Batsford: London.
    Davies, N.B. and Brooke, M. Coevolution of the cuckoo and its host, Scientific American, January 1991, p.66-73.


    Question (lecture 1-2).
    For values 1, 3, 4, 5, 6 obtain the sample mean, sample median, sample variance and sample standard deviation.
    Answer: 1

    Question (lecture 1-2).
    The number of insurance policies sold by a small firm per week is 7, 8, 5, 6, 6, 7, 9, 5, 7, 8, 4, 7, 6, 7, 7, 5, 8, 6, 7, 6, 6. Obtain the sample mean, sample median, sample variance and sample standard deviation. Check your values using R.
    Answer: 2

    Question (lecture 3).
    For $Z \sim N(0, 1)$, calculate pr{Z ≤ 0.55}, pr{Z > 2.25}, pr{Z ≤ −0.15}, pr{−1.50 < Z ≤ 2.25}.
    Answer: 3

    Question (lecture 3).
    For $Z \sim N(0, 1)$, calculate pr{Z … z} = 0.9713, pr{−z < Z ≤ z} = 0.9108.
    Answer: 5

    Question (lecture 3).
    An advertising company requires all of its job applicants to take a psychometric test. Based on recent studies, it is believed that the test score follows a normal distribution with mean 100 and standard deviation 15. Determine the probability that a job applicant will receive a test score below 118, above 112, between 100 and 112.
    Answer: 6

    Question (lecture 4).
    If $X \sim t_5$, for what value of x is pr{X > x} = 0.05?
    Answer: 7

    Question (lecture 4).
    If $T \sim t_8$, for what value t is pr{T > t} = 0.025? For what value t is pr{T < t} = 0.05?
    Answer: 8

    Question (lecture 4).

    1. 3.8, 4, 3.7, 1.92.
    2. 6.524, 7.0 (middle ordered value), 1.462, 1.209.
    3. pr{Z ≤ 0.55} = Φ(0.55) = 0.7088, pr{Z > 2.25} = 1 − pr{Z ≤ 2.25} = 1 − Φ(2.25) = 0.0122, pr{Z ≤ −0.15} = 1 − pr{Z ≤ 0.15} = 1 − Φ(0.15) = 0.4404, pr{−1.50 < Z ≤ 2.25} = pr{Z ≤ 2.25} − pr{Z ≤ −1.50} = 0.9210. Recall that pr{Z > z} = 1 − pr{Z ≤ z}, pr{Z < −z} = pr{Z > z} by symmetry, and also pr{X < b} = pr{X < a} + pr{a < X < b}.
    4. Using interpolation in the tables, Φ(0.63) = 0.7356.
    5. pr{Z ≤ 1.25} = 0.8944, pr{Z > −1.90} = pr{Z ≤ 1.90} = 0.9713, pr{−z < Z ≤ z} = Φ(z) − Φ(−z) = 2Φ(z) − 1 = 0.9108 so Φ(z) = 0.9554 and z = 1.70.
    6. 0.8849, 0.2119, 0.2881. Hint: If $X \sim N(\mu, \sigma^2)$, then $\mathrm{pr}\{X \leq x\} = \Phi\left(\frac{x - \mu}{\sigma}\right)$.
    7. From tables, x = 2.015.
    8. $t_8(2.5\%) = 2.306$. pr{T > 1.860} = 0.05 so pr{T < −1.860} = 0.05 by symmetry. Thus t = −1.860.
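The normal-distribution answers in notes 3, 5 and 6 can be reproduced without tables using `statistics.NormalDist` from the Python standard library. This is an added cross-check with hypothetical variable names; small differences in the fourth decimal place relative to table look-ups are to be expected.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mean 0 and standard deviation 1

# Note 3: probabilities for Z ~ N(0, 1).
p_le_055 = Z.cdf(0.55)
p_gt_225 = 1 - Z.cdf(2.25)
p_le_minus_015 = Z.cdf(-0.15)
p_between = Z.cdf(2.25) - Z.cdf(-1.50)

# Note 5: solve 2*Phi(z) - 1 = 0.9108 for z.
z_sym = Z.inv_cdf((1 + 0.9108) / 2)

# Note 6: test score ~ N(100, 15^2).
score = NormalDist(mu=100, sigma=15)
below_118 = score.cdf(118)
above_112 = 1 - score.cdf(112)
between_100_112 = score.cdf(112) - score.cdf(100)

print(round(p_le_055, 4), round(p_gt_225, 4), round(p_le_minus_015, 4), round(p_between, 4))
print(round(z_sym, 2), round(below_118, 4), round(above_112, 4), round(between_100_112, 4))
```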


    If $T \sim t_{10}$, what is pr{T < 2.228}? What is pr{−2.228 < T … 1.96. Thus reject $H_0$ at 5% level.

    14. Let X be the number of sixes in 100 throws, so $X \sim \mathrm{Bin}(n = 100, \theta = 1/6)$ if $H_0$ is true. Then approximately $X \sim N(\mu = 16.667, \sigma^2 = 13.889)$ if $H_0$ is true. The test statistic is $z = \frac{x - 16.667}{\sqrt{13.889}} = 2.236$. The test rule is to reject $H_0$ if $|z| > 1.96$, so reject $H_0$ at the 5% level.


    Answer: 15

    Question (lecture 8).
    For values (x, y) as given below, obtain the sample correlation r.

    x_i   1.1  2.2  3.4   4.5   5.0
    y_i   3.3  6.1  7.0  10.4  11.5

    Answer: 16

    Question (lecture 10).
    For values (x, y) as given below, obtain the line of regression for y given x. What does the residual at the first data point $x_1 = 1.1$ equal? If x = 4, what is the predicted value of y?

    x_i   1.1  2.2  3.4   4.5   5.0
    y_i   3.3  6.1  7.0  10.4  11.5

    Answer: 17

    Question (lecture 10).
    For values (x, y) as given below, a line of regression for y given x is fitted.

    x_i   1.1  2.2  3.4   4.5   5.0
    y_i   3.3  6.1  7.0  10.4  11.5

    Test the hypothesis that the slope equals zero.
    Answer: 18

    Question (lecture 11).
    Suppose $\mathrm{pr}\{X = x\} = \frac{x}{10}$ for x = 1, 2, 3, 4. Check that the probability function is valid (is $0 \leq \mathrm{pr}\{X = x\} \leq 1$ for all x and does $\sum_x \mathrm{pr}\{X = x\} = 1$?). Calculate E[X] and Var[X].
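The validity check and the two moments in the last question can be computed exactly. The Python sketch below is an added illustration using exact fractions; it is not the handout's own answer note, which is not present in this extract.

```python
from fractions import Fraction

# Probability function pr{X = x} = x / 10 for x = 1, 2, 3, 4.
pmf = {x: Fraction(x, 10) for x in (1, 2, 3, 4)}

# Valid if every probability lies in [0, 1] and the probabilities sum to 1.
is_valid = all(0 <= p <= 1 for p in pmf.values()) and sum(pmf.values()) == 1

mean = sum(x * p for x, p in pmf.items())                      # E[X]
variance = sum(x * x * p for x, p in pmf.items()) - mean ** 2  # E[X^2] - E[X]^2
print(is_valid, mean, variance)
```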

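    15. $n = 4$, $\bar{x} = 4$, $s^2 = 3.333$, $\mu_0 = 1$, $s^2/n = 0.8333$. The test statistic is

    $$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{4 - 1}{\sqrt{0.8333}} = 3.286.$$

    The test rule is to reject $H_0$ if $|t| > t_3(2.5\%)$. As $t_3(2.5\%) = 3.182$, reject $H_0$ at the 5% level.

    16. $\bar{x} = 3.24$, $s_x^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2 = \frac{1}{n-1}\left\{\sum x_i^2 - n\bar{x}^2\right\} = 2.593$,

    $\bar{y} = 7.66$, $s_y^2 = \frac{1}{n-1}\sum (y_i - \bar{y})^2 = \frac{1}{n-1}\left\{\sum y_i^2 - n\bar{y}^2\right\} = 11.033$,

    $s_{xy} = \frac{1}{n-1}\sum (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n-1}\left\{\sum x_i y_i - n\bar{x}\bar{y}\right\} = 5.2645$, $r_{XY} = s_{xy}/\sqrt{s_x^2 s_y^2} = 0.984$.

    Check your answer using R!

        x = c(1.1, 2.2, 3.4, 4.5, 5.0)
        y = c(3.3, 6.1, 7.0, 10.4, 11.5)
        cor(x, y)

    17. $\bar{x} = 3.24$, $\bar{y} = 7.66$, $s_x^2 = 2.593$, $s_y^2 = 11.033$, $s_{xy} = 5.2645$. The regression line is $y = \alpha + \beta x$, where $\hat{\beta} = s_{xy}/s_x^2 = 2.030$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 1.082$, so the fitted line is $y = 1.082 + 2.030x$. If $x_1 = 1.1$, predict $\hat{y}_1 = 3.315$. At $x = 1.1$, the residual is $r_1 = y_1 - \hat{y}_1 = 3.3 - 3.315 = -0.015$. If $x = 4$, predict $\hat{y} = 9.202$. Check your answers using R!

        x = c(1.1, 2.2, 3.4, 4.5, 5.0)
        y = c(3.3, 6.1, 7.0, 10.4, 11.5)
        lm(y ~ x)             # Gives parameter estimates.
        model = lm(y ~ x)     # Stores regression model output as model.
        model$residuals[1]    # First residual value.

    18. If $H_0: \beta = 0$, then $\hat{\beta}\big/\sqrt{\hat{\sigma}^2/S_{xx}} \sim t_{n-2}$, where $S_{xx} = \sum (x_i - \bar{x})^2 = (n - 1)s_x^2$. Here $\sqrt{\hat{\sigma}^2/S_{xx}} = 0.2105$, where $S_{xx} = (n - 1)s_x^2 = 10.372$. Thus $t = 9.646$ and $t_3(2.5\%) = 3.182$. As $|t| > 3.182$, reject $H_0$ at the 5% level. Check your answers using R!

        x = c(1.1, 2.2, 3.4, 4.5, 5.0)
        y = c(3.3, 6.1, 7.0, 10.4, 11.5)
        model = lm(y ~ x)
        summary(model)        # Can you find your answers in the R output?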
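For readers without R to hand, the slope test in note 18 can be cross-checked in Python. This sketch is an added illustration: it estimates the residual variance from the fitted line with divisor n − 2, and the critical value 3.182 is copied from the text.

```python
import math

x = [1.1, 2.2, 3.4, 4.5, 5.0]
y = [3.3, 6.1, 7.0, 10.4, 11.5]
T3_UPPER_2_5 = 3.182  # t3(2.5%) as quoted in the text

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residuals from the fitted line and the residual variance estimate.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sigma2_hat = sum(e * e for e in residuals) / (n - 2)

se_slope = math.sqrt(sigma2_hat / s_xx)
t_stat = slope / se_slope
reject_h0 = abs(t_stat) > T3_UPPER_2_5
predicted_at_4 = intercept + slope * 4
print(round(slope, 3), round(intercept, 3), round(se_slope, 4), round(t_stat, 3))
```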


    Answer: 19

    Question (lecture 12).
    Suppose (X, Y) take values (0,0), (0,1), (1,0), (1,1) with probabilities 0.2, 0.5, 0.2, 0.1 respectively. Obtain the marginal probabilities for X, and the conditional probabilities for Y given X = 1. Obtain E[XY]. Are X and Y independent?
    Answer: 20

    Question (lecture 12).
    Suppose $f_{XY}(x, y) = 4xy$ for 0 < x


    Question (lecture 14).
    If Var[X] = 4 and Var[Y] = 9 and corr(X, Y) = 0.1, obtain cov(X + 2Y, X − Y).
    Answer: 26

    Question (lecture 14).
    If $X \sim N(1, 9)$ and $Y \sim N(1, 16)$ and X and Y are independent, what is pr{|X − Y| … 5} = 0.1587 and answer is 0.6826.

    28. Recall that $\mathrm{Var}[X_i] = E[(X_i - \mu)^2] = \sigma^2$ and $\mathrm{Var}[\bar{X}] = E[(\bar{X} - \mu)^2] = \sigma^2/n$. Also notice that $(\{X_i - \mu\} - \{\bar{X} - \mu\})^2 = (X_i - \mu)^2 + (\bar{X} - \mu)^2 - 2(X_i - \mu)(\bar{X} - \mu)$ and $\sum_i (X_i - \mu) = n(\bar{X} - \mu)$. Thus $\sum_i (X_i - \bar{X})^2 = \sum_i (X_i - \mu)^2 - n(\bar{X} - \mu)^2$. Now take expectations.

    29. A suitable model is to assume accidents occur randomly and independently in time. Assuming a constant level of car usage, we are using a Poisson process model. Thus the number $X_1$ of accidents in December 2010 satisfies $X_1 \sim \mathrm{Poisson}(\lambda_1)$. Similarly the number $X_2$ of accidents in December 2009 satisfies $X_2 \sim \mathrm{Poisson}(\lambda_2)$. We want to test whether $\lambda_1 = \lambda_2$. For $\lambda_i$ large, approximately $X_i \sim N(\lambda_i, \lambda_i)$ for i = 1, 2 independently, so $X_1 - X_2 \sim N(\lambda_1 - \lambda_2, \lambda_1 + \lambda_2)$. Thus if $H_0$ is true, and $\lambda_1 = \lambda_2 = \lambda$,

    $$U = \frac{X_1 - X_2}{\sqrt{2\lambda}} \sim N(0, 1).$$

    Assuming the null hypothesis is true, we would estimate $\lambda$ by $\hat{\lambda} = \frac{1}{2}(336 + 308) = 322$. Thus, replacing $\lambda$ by $\hat{\lambda} = 322$, we obtain U = 1.103. Since |U| < 1.96, we accept the null hypothesis at the 5% level. The observed increase in accidents was not significant!
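The arithmetic in note 29 can be sketched in a few lines of Python. This is an added illustration of the normal approximation described above, with hypothetical variable names; the counts 336 and 308 are those used in the note.

```python
import math

x1, x2 = 336, 308             # accident counts used in note 29

lam_hat = (x1 + x2) / 2       # pooled estimate of the common Poisson mean under H0
u = (x1 - x2) / math.sqrt(2 * lam_hat)
reject_h0 = abs(u) > 1.96     # two-sided test at the 5% level
print(lam_hat, round(u, 3), reject_h0)
```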

    30. $n_1 = 4$, $\bar{x}_1 = 4$, $\sigma_1^2 = 4$, $n_2 = 5$, $\bar{x}_2 = 3$, $\sigma_2^2 = 1$. Testing $H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \neq 0$. Test statistic is


    Question (lecture 15).
    Two independent samples gave values 3, 6, 5, 2 for sample 1 and 2, 2, 3, 3, 5 for sample 2. Assuming that the samples come from independent normal distributions with common unknown variance $\sigma^2$, test at the 5% level whether the difference in mean equals zero against the alternative that it does not equal zero.
    Answer: 31

    Question (lecture 15).
    Five randomly selected remuneration packages for US oil and gas CEOs in 2008 were (in thousands of US dollars) 21333, 7294, 6712, 5727, 7087. Five randomly selected remuneration packages for US health care CEOs in 2008 were (in thousands of dollars) 14262, 8381, 7245, 10211, 1817. Test at the 5% level whether the difference in mean remuneration equals zero against the alternative hypothesis that it does not equal zero. You can assume that the two populations have common (unknown) variance $\sigma^2$.
    Answer: 32

    Question (lecture 16).
    A quarter of insurance claims are incomplete in some way. If you have 250 forms to process, what is the approximate probability that you will find fewer than 50 of them incomplete?
    Answer: 33

    Question (lecture 16).
    In n = 100 tosses of a coin I obtain X = 72 heads. Obtain an approximate 95% confidence interval for the probability of a head.
    Answer: 34

    Question (lecture 17).
    In December 2010 two analysts suggested several shares as likely to rise in 2011. By the end of October 2011 one (Neil Woodford) had four out of $n_1 = 7$ share tips showing a rise while the other (Harry Nummo) had three out of $n_2 = 10$ share tips showing a rise. Test at the 5% level whether the two success proportions are significantly different.
    Answer: 35

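    $$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{4 - 3}{\sqrt{\dfrac{4}{4} + \dfrac{1}{5}}} = 0.913.$$

    The test rule is to reject $H_0$ if $|z| > 1.96$. Thus accept $H_0$ at the 5% level.

    31. $n_1 = 4$, $\bar{x}_1 = 4$, $s_1^2 = 3.333$, $n_2 = 5$, $\bar{x}_2 = 3$, $s_2^2 = 1.5$; the pooled estimate of $\sigma^2$ is $s^2 = \frac{3s_1^2 + 4s_2^2}{7} = 2.2857$. Testing $H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \neq 0$. The test statistic is

    $$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{4 - 3}{1.5119\sqrt{\dfrac{1}{4} + \dfrac{1}{5}}} = 0.986.$$

    The test rule is to reject $H_0$ if $|t| > t_7(2.5\%)$. As $t_7(2.5\%) = 2.365$, accept $H_0$ at the 5% level.

    32. Data source: http://graphicsweb.wsj.com/php/CEOPAY09.html.
    $n_1 = 5$, $\bar{x}_1 = 9630.6$, $s_1^2 = 43158021$, $n_2 = 5$, $\bar{x}_2 = 8383.2$, $s_2^2 = 20577907$, $n_1 + n_2 - 2 = 8$, $t_8(2.5\%) = 2.306$. If the variances are equal to $\sigma^2$, estimate $\sigma^2$ using

    $$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = 31867964.$$

    The test statistic is

    $$t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{s^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{1247.4}{3570.32} = 0.349.$$

    Since $t_8(2.5\%) = 2.306$, then $|t| < t_8(2.5\%)$, so accept $H_0$ that $\mu_1 = \mu_2$ against the alternative $\mu_1 \neq \mu_2$ at the 5% level.

    33. If X is the number of incomplete forms, $X \sim \mathrm{Bin}(n = 250, \theta = \frac{1}{4})$, which is approximately $N(\mu = 62.5, \sigma^2 = 46.875)$. You require

    pr{X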


    Question (lecture 17).
    In January 2011 Durham police were reported as disappointed by the increase in the number of people arrested for drinking and driving. Between December 1st 2010 and December 31st 2010 they had 52 positive breath tests out of 1799 breath tests administered, while for the same period in 2009 they had 41 positive tests out of 1433 administered. Construct a 95% confidence interval for the difference in proportion of drivers who tested positive. Source: http://www.bbc.co.uk/news/uk-england-12261462
    Answer: 36

    Question (lecture 17).
    I observe two dice. For one die I notice that it gives a six 20 times out of 100 and for the second die I notice that it gives a six 22 times out of 80. Test at the 5% level whether the two dice give the same probability of showing a six.
    Answer: 37

    Question (lecture 18).
    If $X \sim \chi^2_4$, for what value of x is pr{X > x} = 0.05?
    Answer: 38

    Question (lecture 19).
    I roll a die 100 times and observe the following results.

    Outcome i            1   2   3   4   5   6
    Observed frequency  16  15  16  15  15  23

    Test at the 5% level whether the die is fair.
    Answer: 39

    ions-tips-2011.html

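    Two binomial proportions here. $\hat{\theta}_1 = 4/7 = 0.571$, $\hat{\theta}_2 = 3/10 = 0.300$, $n_1 = 7$, $n_2 = 10$. The common estimated proportion is $\hat{\theta} = \frac{7\hat{\theta}_1 + 10\hat{\theta}_2}{17} = 0.412$. The approximate test statistic is

    $$z = \frac{|\hat{\theta}_1 - \hat{\theta}_2|}{\sqrt{\hat{\theta}(1 - \hat{\theta})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = 1.119.$$

    Reject $H_0$ at the 5% level if $|z| > 1.96$, so here accept the hypothesis that the two proportions are equal.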
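The pooled two-proportion statistic used in this note can be written as a small reusable function. The Python sketch below is an added illustration; the function name `two_proportion_z` is hypothetical, and it returns the signed statistic, so its absolute value is compared with 1.96.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Signed z statistic for H0: equal proportions, using the pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error


# Share tips: 4 rises out of 7 versus 3 rises out of 10.
z_tips = two_proportion_z(4, 7, 3, 10)
reject_h0 = abs(z_tips) > 1.96
print(round(z_tips, 3), reject_h0)
```

With samples as small as 7 and 10, the normal approximation behind this statistic is rough, so the result is best read as indicative.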

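    36. Two binomial proportions again. $\hat{\theta}_1 = 52/1799 = 0.028905$, $\hat{\theta}_2 = 41/1433 = 0.028611$, $n_1 = 1799$, $n_2 = 1433$. The common estimated proportion is $\hat{\theta} = \frac{1799\hat{\theta}_1 + 1433\hat{\theta}_2}{3232} = 0.0288$. (This is very small so the normal approximation is doubtful. In practice we would transform to give approximate normality.) The approximate test statistic is

    $$z = \frac{|\hat{\theta}_1 - \hat{\theta}_2|}{\sqrt{\hat{\theta}(1 - \hat{\theta})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = 0.0496.$$

    Reject $H_0$ at the 5% level if $|z| > 1.96$, so here accept the hypothesis that the two proportions are equal.

    37. $n_1 = 100$, $x_1 = 20$, $\hat{\theta}_1 = 20/100 = 0.200$, $n_2 = 80$, $x_2 = 22$, $\hat{\theta}_2 = 22/80 = 0.275$. We test $H_0: \theta_1 = \theta_2 (= \theta)$ vs. $H_1: \theta_1 \neq \theta_2$. This is equivalent to testing $H_0: \theta_1 - \theta_2 = 0$ vs. $H_1: \theta_1 - \theta_2 \neq 0$. Assuming $H_0$ is true, the common proportion is estimated by

    $$\hat{\theta} = \frac{n_1\hat{\theta}_1 + n_2\hat{\theta}_2}{n_1 + n_2} = \frac{20 + 22}{180} = 0.2333.$$

    The test statistic is

    $$z = \frac{\hat{\theta}_1 - \hat{\theta}_2}{\sqrt{\dfrac{\hat{\theta}(1 - \hat{\theta})}{n_1} + \dfrac{\hat{\theta}(1 - \hat{\theta})}{n_2}}} = \frac{0.200 - 0.275}{\sqrt{0.0017889 + 0.0022361}} = -1.18.$$

    The test rule is to reject $H_0$ if $|z| > 1.96$, so accept $H_0$ at the 5% level.

    38. From tables, x = 9.488.

    39. Let X denote the outcome of the die. We test whether pr{X = i} = 1/6 for all i. The expected frequency for any outcome would then be $100 \times \frac{1}{6} = 16.667$.

    Outcome i                      1       2       3       4       5      6
    Observed frequency O_i        16      15      16      15      15     23
    Expected frequency E_i     16.67   16.67   16.67   16.67   16.67  16.67
    (O_i − E_i)^2 / E_i       0.0267  0.1667  0.0267  0.1667  0.1667  2.407   sum = 2.960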
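The chi-squared statistic in the table can be recomputed as follows. This Python sketch is an added illustration; the extract ends before the handout states its conclusion, so the comparison with the 5% point of the chi-squared distribution on 5 degrees of freedom (11.07, a standard table value not quoted in the text) is an assumption about how the test would be completed.

```python
observed = [16, 15, 16, 15, 15, 23]
CHI2_5_UPPER_5 = 11.07   # 5% point of chi-squared on 5 d.f. (table value, assumption)

n = sum(observed)
expected = n / len(observed)   # fair die: equal expected frequency for each face

contributions = [(o - expected) ** 2 / expected for o in observed]
chi_sq = sum(contributions)
reject_h0 = chi_sq > CHI2_5_UPPER_5
print(round(chi_sq, 3), reject_h0)
```

The degrees of freedom are the number of faces minus one, because the expected frequencies are constrained to sum to 100.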
