Chapter 10 Exercises - mysmu.edu

Chapter 10 Exercises

1. A sample of 30 observations has $\bar{x} = 137$ and $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = 30$. Assuming the data follow a normal distribution, use these values to test whether or not the population mean is different from 150.

2. A variable X follows a normal distribution with unknown mean µ. A sample of 60 observations of X is obtained and the following sample statistics are found:

$$\bar{x} = 915, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = 80.$$

Use the data to test the following hypotheses:

H0 : µ ≤ 900 vs. H1 : µ > 900

3. A variable Y follows a normal distribution with unknown mean µ but a known standard deviation of 6. A sample of 5 observations of Y is obtained and the sample average is $\bar{y} = 25$.

(a) Use the data to test the following hypotheses:

H0 : µ ≥ 30 vs. H1 : µ < 30

(b) Find the p-value of the test and use it to comment on the error probability in relation to the conclusion of the test.

(c) Suppose the sample standard deviation $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2} = 5.5$ is also known; repeat the test in (a) using this piece of information. Which of the two tests, (a) or (c), is better?

4. To determine whether a coin is fair, an experiment is carried out in which the coin is tossed 15 times. Out of these 15 tosses, heads is observed 13 times and tails 2 times. Assume these tosses are independent of each other and that each toss has the same probability of observing a head. Formulate the hypothesis test, find its exact p-value and draw a conclusion.

5. In an experiment to test the effectiveness of bed-nets in a malaria-infested region, 500 households are provided with bed-nets, and of these, 100 households reported new infections in the month. The historic rate of new infections in the region is 1 in 4 households every month. Based on the data, is there sufficient evidence to suggest that bed-nets are effective in reducing new malaria infections?

6. A student just signed up for a new internet service. Using the new service, it took him 1.3 days to download a movie, which is shorter than the average download time of 1.55 days using the old service. Assuming download time follows an exponential distribution, if he uses the data to claim that the average download time is shorter under the new service, what is the probability that his claim is false?

7. Tofu, one of the great delicacies from Asia, was invented during the Han Dynasty (206 BC - 220 AD) in China and was introduced to Japan during the Nara period (710 - 794 AD). Tofu is solidified soy milk, made by pouring soy milk over a coagulant. Interestingly, some of the best tofu today is found in Nara, Japan, where tofu shops still make tofu the same way it was made centuries ago. A young man just inherited a tofu shop from his father and he wants to experiment with a new type of coagulant. Traditional Japanese tofu uses nigari, which is extracted from sea water. The young man made two lots of tofu, one using nigari as the coagulant and the other using the new coagulant. Samples from both lots of tofu were offered to customers and each customer was asked to state his/her preference.

Let $x_1, \ldots, x_{800}$ be IID Bernoulli(p) preferences (X) from his customers, where X = 1 if a customer prefers tofu using the new coagulant and X = 0 if the customer prefers tofu using nigari; p represents the proportion of customers who prefer tofu using the new coagulant. Let $Y = \sum_{i=1}^{800} x_i$. Suppose 376 customers prefer tofu using the new coagulant.

(a) State the MLE for p based on the given data.

(b) How many observations of X are there?

(c) What does Y represent and what distribution does Y follow? How many observations of Y are there and what are their values?

(d) The new coagulant is cheaper and easier to obtain, so he would be happy with the new coagulant if customers have no preference between the two. The relevant hypotheses are:

H0 : p = 0.5 vs. H1 : p ≠ 0.5.

Are the hypotheses 1-sided or 2-sided? State your reason.

(e) Find the p-value of the test. Write one sentence to explain what this p-value represents. Does the p-value you obtain suggest H0 may be untrue?

(f) What is a 5% significance test? In a 5% significance test, what is the chance of a type-one error?

(g) Based on the p-value, what is the conclusion of your test, if a 5% significance level is to be used?

(h) Another way to carry out a 5% significance test is to use a test statistic. Repeat the 5% significance test using a test statistic.

8. Visitors to Nara love to visit its onsens, or hot springs. However, the onsen business is very competitive, especially during economic recessions. The owner of an onsen suspects his business is not doing well but has no idea how to confirm his hypothesis, so he approaches his daughter, who has just finished a degree in Statistics. She advises him to collect some data and carry out a statistical test. In the past, the number of guests followed a Poisson distribution with a rate of 5 per day. Let λ be the current rate. Then the hypotheses of interest are:

H0 : λ = 5 vs. H1 : λ < 5.

(a) Are the hypotheses 1-sided or 2-sided? State your reason.

(b) On the day they carried out the test, only 1 guest arrived. How many observations are there?

(c) Find the p-value of the test. Write one sentence to explain what this p-value represents. Does the p-value suggest H0 may be untrue?

(d) Based on the p-value, what is the conclusion of the test, if a 5% significance test is to be used?

9. To improve business, they introduce a new package so guests can enjoy unlimited free transportation to the surrounding hillsides and a nine-course Kaiseki dinner. Let $x_1, \ldots, x_{50}$ be the daily revenue (X, in units of ¥1000) over the 50 days following the introduction of the new package. Suppose $X \sim N(\mu, \sigma^2)$, where µ is the average daily revenue and σ² is the variance of daily revenue. Let $\bar{x} = \frac{1}{50}\sum_{i=1}^{50} x_i = 1060$ and $s^2 = \frac{1}{50}\sum_{i=1}^{50}(x_i - \bar{x})^2 = 51529$.

(a) State the MLE for (µ, σ²) based on the given data.

(b) How many observations of X are there?

(c) Their interest is to determine whether the average revenue has improved compared to the past, so the appropriate hypotheses are:

H0 : µ ≤ 1000 vs. H1 : µ > 1000.

Are the hypotheses 1-sided or 2-sided? State your reason.

(d) Find the p-value of the test. Write one sentence to explain what this p-value represents. Does the p-value suggest H0 may be untrue?

(e) Based on the p-value, what is the conclusion of the test, if a 5% significance level is to be used?

(f) Another way to carry out a 5% significance test is to use a test statistic. Repeat the 5% significance test using a test statistic.

(g) Repeat (f) using a “t-test” by choosing the appropriate critical value from the table below:

df = n − 1        29      30      40      60      >120
critical value    1.699   1.697   1.684   1.671   1.64

10. Six thousand miles from Nara, in the county of Nottinghamshire, England, the blue cheese makers are facing a different problem from that of the tofu maker in Nara. Since the outbreak of Mad Cow Disease in the 1980s, the government has banned the use of unpasteurized milk in making cheeses. One maker wants to confirm the hypothesis that pasteurization degrades the flavour of her cheeses. So she made two batches of cheese, one using raw milk and the other using pasteurized milk. She then held a cheese tasting for her best customers in a secret location. Let $x_1, \ldots, x_{100}$ be the preferences (X) between the two types of cheese among 100 customers, where X = 1 if the customer preferred the cheese made with pasteurized milk and X = 0 if he/she preferred the cheese made with raw milk. Suppose $x_1, \ldots, x_{100}$ are IID Bernoulli(p), where p represents the proportion of all customers who prefer cheese made with pasteurized milk. Let $\bar{x} = \frac{1}{100}\sum_{i=1}^{100} x_i = 0.4$.

(a) State the MLE for p based on the given data.

(b) How many observations of X are there?

(c) The cheese maker’s interest is to test the hypotheses:

H0 : p ≥ 0.5 vs. H1 : p < 0.5.

Are the hypotheses 1-sided or 2-sided? State your reason.

(d) Find the p-value of the test. Write one sentence to explain what this p-value represents. Does the p-value suggest H0 may be untrue?

(e) Based on the p-value, what is the conclusion of your test, if a 5% significance level is to be used?

(f) Another way to carry out a 5% significance test is to use a test statistic. Repeat the 5% significance test using a test statistic.

11. Another cheese maker in the region decided to use a different method to determine whether customers prefer cheeses made with raw milk over those made with pasteurized milk. He recorded the time T (in weeks) between purchases of his raw milk cheeses from 40 of his customers: $t_1, \ldots, t_{40}$. Suppose $t_1, \ldots, t_{40}$ are IID Exp(λ), where 1/λ represents the mean time between purchases. Let $\bar{t} = \frac{1}{40}\sum_{i=1}^{40} t_i = 0.24$ (in weeks).

(a) State the MLE for 1/λ based on the given data.

(b) How many observations of T are there?

(c) The cheese maker noticed that the average time between purchases for his pasteurized milk cheeses is 0.3 week, so he is interested in testing the hypotheses:

H0 : 1/λ ≥ 0.3 vs. H1 : 1/λ < 0.3.

Are the hypotheses 1-sided or 2-sided? State your reason.

(d) Find the p-value of the test. Write one sentence to explain what this p-value represents. Does the p-value you obtain suggest H0 may be untrue? (You may use the fact that $\mathrm{var}(\widehat{1/\lambda}) = \frac{1}{n\lambda^2}$; however, you should ask yourself why this is so.)

(e) Based on the p-value, what is the conclusion of your test, if a 5% significance level is to be used?

(f) Another way to carry out a 5% significance test is to use a test statistic. Repeat the 5% significance test using a test statistic.

12. A type of cheese that has the texture of tofu is mozzarella cheese. The best mozzarella cheese comes from Italy, where, by law, it has to be made from buffalo milk. The cost of making mozzarella cheese is very high because an average buffalo produces about half the milk of a cow and harvesting buffalo milk cannot be easily automated. Buffaloes are not indigenous to Italy; in fact, they are found widely in many parts of Asia. An Indian entrepreneur is venturing into the business of making mozzarella cheese in India. There are two possible sources of buffalo milk: a co-operative or the farmers directly. He wants to determine whether there is a difference in the supply of milk from the two sources. Let X be the daily amount of milk from the co-operative and Y be that from the farmers. Assume $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$, where $(\mu_X, \sigma_X^2)$ and $(\mu_Y, \sigma_Y^2)$ are unknown. Over the course of n = 30 days, he obtained the following data on the amount (in 1000 liters) of milk from the two sources:

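$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 11.6, \qquad s_x^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = 27.6,$$

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = 12.7, \qquad s_y^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 = 32.4.$$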

Assume all data are independent of each other.

(a) He is interested in testing the hypotheses:

H0 : µX − µY = 0 vs. H1 : µX − µY ≠ 0.

Are the hypotheses 1-sided or 2-sided? State your reason.

(b) Using the results from Question 5 from Chapter 8 Exercises, write down the test statistic for testing the hypotheses and carry out a 5% significance test.

(c) Repeat (b) using a “t-test”¹ by choosing the appropriate critical value from the table below. In this case, use df = (n − 1) + (n − 1) = 2n − 2:

df = 2n − 2       29      30      40      60      >120
critical value    2.045   2.042   2.021   2.000   1.96

(d) He notices that the fluctuations of milk production from both sources are large, which is reflected in the large values of $s_x^2$ and $s_y^2$. These large fluctuations are due to the fact that on some days, such as a holiday or when there is a village wedding, milk production is very low, and following such days, production is unusually high. His interest is in the difference between the mean production from the two sources, not the daily fluctuations. In fact, the fluctuations are a distraction. To gain a better insight into the difference between the two sources, he takes the (daily) difference $d_i = x_i - y_i$, i = 1, ..., 30, and arrives at the following:

$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i) = 11.6 - 12.7 = -1.1, \qquad s_d^2 = \frac{\sum_{i=1}^{n}(d_i - \bar{d})^2}{n-1} = 7.$$

He also realises that µX − µY is really the same as µD, where µD represents the mean of the daily differences between the two sources in the long run. Therefore, the hypotheses in (a) can be rewritten as

H0 : µD = 0 vs. H1 : µD ≠ 0.

Treating $d_1, \ldots, d_{30}$ as a sample of observations of D, the daily difference, write down the test statistic for the rewritten hypotheses and carry out a 5% significance test.

(e) Repeat (d) using a “t-test”² by choosing the appropriate critical value from the table below. In this case, use df = n − 1 because the test is based on 30 observations of D:

df = n − 1        29      30      40      60      >120
critical value    2.045   2.042   2.021   2.000   1.96

¹ This test is often called a two-sample t-test.
² This test is often called a paired t-test.

ANSWERS

(1) We first set up the hypotheses. Since the question asks whether the mean (µ) is or is not equal to 150, without specifying the direction of any difference from 150, the hypotheses are two-sided:

H0 : µ = 150 vs. H1 : µ ≠ 150.

We use the following test statistic:

$$Z^* = \frac{\bar{x} - \mu_0}{\sqrt{\sigma^2/n}} = \frac{137 - 150}{\sqrt{30^2/30}} = -2.37.$$

Since σ is replaced by an estimate $\hat{\sigma} = s$, we compare Z∗ to a critical value from the t-table. For n = 30, df = n − 1 = 30 − 1 = 29, which gives a two-sided critical value of 2.045. Since |Z∗| = 2.37 > 2.045, H0 should be rejected.
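The arithmetic in (1) can be checked with a short Python sketch using only the standard library. The critical value 2.045 is copied from the df = 29 column of the two-sided table in Question 12; `one_sample_statistic` is a hypothetical helper name, not part of any library.

```python
import math


def one_sample_statistic(xbar: float, mu0: float, s: float, n: int) -> float:
    """Standardized distance of the sample mean from mu0, with s in place of sigma."""
    return (xbar - mu0) / math.sqrt(s**2 / n)


# Question 1: two-sided test of H0: mu = 150
z_star = one_sample_statistic(xbar=137, mu0=150, s=30, n=30)
critical = 2.045  # t-table, df = 29, two-sided 5% (copied from the exercise tables)
reject = abs(z_star) > critical
print(f"Z* = {z_star:.2f}, reject H0: {reject}")
```

Because the test is two-sided, the comparison uses the absolute value of the statistic.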

(2) We use the following test statistic:

$$Z^* = \frac{\bar{x} - \mu_0}{\sqrt{\sigma^2/n}} = \frac{915 - 900}{\sqrt{80^2/60}} = 1.45.$$

Since σ is replaced by an estimate $\hat{\sigma} = s$, we compare Z∗ to a critical value from the one-sided t-table. For n = 60, df = n − 1 = 60 − 1 = 59, but the table has no entry for df = 59, so we use the next lowest available df, 40, which gives a critical value of 1.684. Since Z∗ = 1.45 < 1.684, H0 cannot be rejected.
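The "next lowest available df" rule used here can be written as a small table lookup. This is a sketch under stated assumptions: `lookup_critical` and `ONE_SIDED_T_TABLE` are hypothetical names, and the one-sided 5% values are copied from the table in Question 9(g).

```python
import math

# One-sided 5% critical values copied from the table in Question 9(g).
ONE_SIDED_T_TABLE = {29: 1.699, 30: 1.697, 40: 1.684, 60: 1.671}
LARGE_DF_VALUE = 1.64  # the table's ">120" column


def lookup_critical(df: int, table: dict[int, float], large_df_value: float) -> float:
    """Critical value for df, falling back to the next lowest tabulated df."""
    if df > 120:
        return large_df_value
    available = [d for d in table if d <= df]
    if not available:
        raise ValueError(f"df = {df} is below the smallest tabulated df")
    return table[max(available)]


# Question 2: one-sided test of H0: mu <= 900 vs H1: mu > 900
n = 60
z_star = (915 - 900) / math.sqrt(80**2 / n)
critical = lookup_critical(n - 1, ONE_SIDED_T_TABLE, LARGE_DF_VALUE)
print(f"Z* = {z_star:.2f}, critical = {critical}, reject H0: {z_star > critical}")
```

Falling back to a lower df gives a slightly larger critical value, so the lookup errs on the conservative side.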

(3a) We use the following test statistic:

$$Z^* = \frac{\bar{y} - \mu_0}{\sqrt{\sigma^2/n}} = \frac{25 - 30}{\sqrt{6^2/5}} = -1.86.$$

Since Z∗ = −1.86 lies below −1.64, i.e., |Z∗| = 1.86 exceeds the one-sided critical value of 1.64 in the direction of H1, H0 should be rejected.

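(b) Since the test is one-sided (lower-tailed), the p-value of the test is P(Z ≤ −1.86) = P(Z ≥ 1.86) = 0.0314. Hence, if H0 were true (µ = 30), a sample average of 25 or lower would occur only about 3.14% of the time; rejecting H0 on this evidence therefore carries a type-one error probability of about 3.14%.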
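A minimal Python sketch of (a) and (b), assuming Python 3.8+ for `statistics.NormalDist`. Computed without rounding Z∗ first, the p-value comes out near 0.0312; the 0.0314 above comes from reading the normal table at z = 1.86.

```python
import math
from statistics import NormalDist

# Question 3(a)-(b): sigma = 6 is known, so Z* is compared with the standard normal.
z_star = (25 - 30) / math.sqrt(6**2 / 5)

# Lower-tailed test (H1: mu < 30): the p-value is the left-tail area below Z*.
p_value = NormalDist().cdf(z_star)
print(f"Z* = {z_star:.4f}, p-value = {p_value:.4f}")
```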

(c) We use the following test statistic:

$$Z^* = \frac{\bar{y} - \mu_0}{\sqrt{s^2/n}} = \frac{25 - 30}{\sqrt{5.5^2/5}} = -2.03.$$

Since σ is replaced by an estimate $\hat{\sigma} = s$, we use a critical value from the t-table. For n = 5, df = n − 1 = 5 − 1 = 4, which gives a one-sided critical value of 2.132. Since |Z∗| = 2.03 < 2.132, H0 is not rejected.

We notice that the conclusions of tests (a) and (c) are different. This is because, if we discard σ = 6 in favour of an estimate derived from a small sample, we must make allowance for the uncertainty in that estimate; this raises the critical value (2.132 instead of 1.64), giving a more conservative test and, here, a different conclusion.

In practice, if known population values are given (σ = 6 in this case), we should make use of them and use methods that exploit these known values. Hence (a) is the preferred test here.

(4) We first set up the hypotheses. Let p be the probability of heads in a toss of the coin. A fair coin has an equal probability of showing heads or tails in a toss, and hence p0 = 0.5 for a fair coin. Since the question asks whether the coin is fair (p0 = 0.5) or not, without specifying the direction of any departure from fairness, the hypotheses are two-sided:

H0 : p = 0.5 (= p0) vs. H1 : p ≠ 0.5.

Under the null hypothesis, if X represents the total number of heads in n = 15 tosses, then X is a random variable that follows a Binomial(n = 15, p = 0.5) distribution. The p-value for a two-sided test is

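$$\begin{aligned}
2 \times P(X \ge 13) &= 2 \times [P(X = 13) + P(X = 14) + P(X = 15)] \\
&= 2 \times \left[\binom{15}{13}(0.5)^{13}(1 - 0.5)^{2} + \binom{15}{14}(0.5)^{14}(1 - 0.5)^{1} + \binom{15}{15}(0.5)^{15}(1 - 0.5)^{0}\right] \\
&\approx 2 \times 0.00369 \\
&\approx 0.0074.
\end{aligned}$$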

Based on this p-value, if the coin is fair, there is a probability of 0.0074 (or about once in 135 times) of observing an outcome at least as extreme as 13 heads and 2 tails, in either direction. Since this probability is quite small, we are inclined to believe that the outcome is the result of an unfair coin, and we reject the hypothesis that the coin is fair.
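The exact binomial p-value can be reproduced with `math.comb` (Python 3.8+). This is a sketch; `binomial_upper_tail` is a hypothetical helper name. Doubling one tail is exact here only because Binomial(15, 0.5) is symmetric.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Question 4: 13 heads in 15 tosses, two-sided test of H0: p = 0.5.
one_tail = binomial_upper_tail(13, 15, 0.5)
p_value = 2 * one_tail  # symmetric null distribution, so both tails are equal
print(f"one tail = {one_tail:.5f}, two-sided p-value = {p_value:.4f}")
```

Here `one_tail` equals (105 + 15 + 1)/2¹⁵ = 121/32768.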

(5) We first set up the hypotheses. Let p be the probability of new infections in a month after using bed-nets. The historic rate is 1 in 4 households, which implies a probability of p0 = 0.25. Since the interest is in finding out whether bed-nets are effective, this implies a one-sided set of hypotheses:

H0 : p ≥ 0.25 (= p0) vs. H1 : p < 0.25.

Since the sample size n = 500 is rather large, we employ the one-sample test for proportions. The test statistic is:

$$Z^* = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{100/500 - 0.25}{\sqrt{0.25(1 - 0.25)/500}} \approx -2.58.$$

Since Z∗ = −2.58 lies in the direction of H1 and |Z∗| = 2.58 is much larger than the one-sided critical value of 1.64, there is sufficient evidence to say that bed-nets reduce new infection rates.
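A Python sketch of the one-sample proportion test, assuming `statistics.NormalDist` (Python 3.8+); `proportion_z` is a hypothetical helper. The lower-tail p-value, which the answer above does not compute, comes out below 0.01, consistent with rejecting H0.

```python
import math
from statistics import NormalDist


def proportion_z(successes: int, n: int, p0: float) -> float:
    """Z statistic for H0: p = p0, using the null standard error sqrt(p0(1-p0)/n)."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


# Question 5: 100 infected households out of 500; H1: p < 0.25 (lower-tailed).
z_star = proportion_z(100, 500, 0.25)
p_value = NormalDist().cdf(z_star)
print(f"Z* = {z_star:.2f}, p-value = {p_value:.4f}, reject at 5%: {z_star < -1.64}")
```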

(6) If T is the download time under the new service and it follows an Exp(λ) distribution, then the average download time is E(T) = 1/λ. The probability of error can be framed in a hypothesis-testing setting. First, the claim that the new service is faster can be tested by setting up the hypotheses as follows:

H0 : 1/λ ≥ 1.55 vs. H1 : 1/λ < 1.55,

where H0 states that the average download time is no better than under the old service, whereas H1 states that the average download time is now shorter.

The observed time under the new service is 1.3 days. To use this observation to test the hypotheses, we assume H0 is true and then determine whether the outcome of 1.3 days is unlikely to be observed, in which case we argue that H0 may not be true. We rewrite the hypotheses as

H∗0 : 1/λ = 1.55 vs. H1 : 1/λ < 1.55.

Under H∗0 : 1/λ = 1.55, the outcomes at least as unusual as the observed time are all times T shorter than 1.3 days. The probability associated with these unusual events is:

P(T < 1.3 days).

To evaluate this probability, we need to recognise that T ∼ Exp(λ = 1/1.55) under H0. Hence

$$P(T < 1.3) = F(1.3) = 1 - e^{-1.3/1.55} \approx 0.57.$$

Therefore, the p-value of the test is 0.57. The p-value can be interpreted as follows: even if the new service is not better than the old service, there is a 57% chance that a particular download would take less than 1.3 days! This means there is no reason to believe the new service is faster on average. If he insists on claiming that the new service is faster, he has a 57% chance of making a false claim.
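The exponential CDF calculation can be checked in a few lines of Python. This sketch assumes only the mean parametrization used above, F(t) = 1 − exp(−t/(1/λ)).

```python
import math

# Question 6: under H0*, T ~ Exp with mean 1/lambda = 1.55 days.
mean_time = 1.55
observed = 1.3

# Exponential CDF F(t) = 1 - exp(-t / mean); the p-value is the lower-tail area.
p_value = 1 - math.exp(-observed / mean_time)
print(f"p-value = {p_value:.2f}")
```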

(7a) The MLE of p is $\hat{p} = \bar{x} = Y/n = 376/800 = 0.47$.

(b) Each xi, i = 1, ..., 800 is an observation of X. Therefore, there are n = 800 observations.

(c) Y is the total number of customers who prefer the tofu using the new coagulant. Y ∼ Bin(n = 800, p). There is only one observation of Y and its value is 376.

(d) The hypotheses are 2-sided because H1 : p ≠ 0.5 allows departures from 0.5 in either direction; we are not specifically interested in whether p > 0.5 or p < 0.5.

(e) The MLE of p is $\hat{p} = 0.47$. We need to find out how unusual the observed value $\hat{p} = 0.47$ is, if H0 : p = 0.5 is true.

According to the CLT, in a random sample of size n, $\hat{p}$ is approximately distributed as $N\left(p, \mathrm{var}(\hat{p}) = \frac{p(1-p)}{n}\right) = N\left(0.5, \frac{0.5(1-0.5)}{800} = 0.0003125\right)$ if p = 0.5.

We can standardize the observed $\hat{p}$ as a Z-score:

$$z^* = \frac{376/800 - 0.5}{\sqrt{0.0003125}} = -1.697.$$

If H0 : p = 0.5 is true, then outcomes that are at least as unusual as the observed data are those with a $\hat{p}$ further away from 0.5, i.e., those with a Z-score more extreme than −1.697: Z ≤ −1.697 and Z ≥ 1.697. The associated probability is

$$P(Z \ge 1.697) + P(Z \le -1.697) = 2 \times P(Z > 1.697) = 2 \times \underbrace{0.0455}_{\text{from normal table}} = 0.091.$$

Therefore, if p = 0.5, the probability is 0.091 (the p-value) that we will observe a value of $\hat{p}$ at least as unusual as 376/800.

Since 0.091 is not a very small value, there is only moderate evidence against H0 : p = 0.5.
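A Python sketch of the two-sided proportion test, assuming `statistics.NormalDist` (Python 3.8+). Without rounding z∗, the two-sided p-value is about 0.090; the 0.091 above reflects reading the normal table at z = 1.69. Either way it exceeds 0.05.

```python
import math
from statistics import NormalDist

# Question 7(e): 376 of 800 customers prefer the new coagulant; H0: p = 0.5, two-sided.
n, y, p0 = 800, 376, 0.5
p_hat = y / n
z_star = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Two-sided p-value: total probability in both tails beyond |z*|.
p_value = 2 * (1 - NormalDist().cdf(abs(z_star)))
print(f"p_hat = {p_hat}, z* = {z_star:.3f}, p-value = {p_value:.3f}")
```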

(f) A 5% significance test rejects H0 if the p-value < 0.05 and does not reject H0 if the p-value ≥ 0.05.

In a 5% significance test, the probability of a type-one error is 0.05.

(g) Since the p-value = 0.091 > 0.05, H0 is not rejected.

(h) We have already calculated the test statistic in (e):

$$z^* = \frac{376/800 - 0.5}{\sqrt{0.0003125}} = -1.697.$$

In a 2-sided 5% significance test, the rule is to reject H0 if |z∗| > 1.96 and not to reject H0 if |z∗| ≤ 1.96.

Since |z∗| = 1.697 < 1.96, we do not reject H0.

(8a) The hypotheses are 1-sided because in H1, we are interested in whether λ < 5.

(b) There is one observation. The value of the observation is 1, which is the number of guests in one day.

(c) The hypotheses of interest are:

H0 : λ = 5 vs. H1 : λ < 5.

We observed a single observation of 1. We want to find out how unusual this observation is, if H0 : λ = 5 is true.

We can calculate the probabilities of various outcomes under a Poisson(λ = 5) distribution:

k           0        1        2        3        4        5        6        7        8        9        ≥ 10
P(X = k)    0.00674  0.03369  0.08422  0.14037  0.17547  0.17547  0.14622  0.10444  0.06528  0.03627  0.03183

Therefore, if λ = 5, the probability of outcomes at least as unusual as the observed value of 1 is:

P(X ≤ 1) = 0.00674 + 0.03369 = 0.04043 = p-value.

Since 0.04043 is a moderately small value, there is some evidence against H0 : λ = 5.

(d) A 5% significance test rejects H0 if the p-value < 0.05 and does not reject H0 if the p-value ≥ 0.05.

Since the p-value = 0.04043 < 0.05, H0 is rejected. They can conclude that there are fewer guests per day than before.
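The Poisson table and p-value can be reproduced with the standard library. This is a sketch; `poisson_pmf` is a hypothetical helper, not a library function.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Question 8: one day with 1 guest; H1: lambda < 5, so the p-value is P(X <= 1).
lam0 = 5
p_value = sum(poisson_pmf(k, lam0) for k in range(0, 2))
upper_tail = 1 - sum(poisson_pmf(k, lam0) for k in range(10))  # the "≥ 10" column
print(f"p-value = {p_value:.5f}, P(X >= 10) = {upper_tail:.5f}, reject at 5%: {p_value < 0.05}")
```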

(9a) The MLEs of µ and σ² are $\hat{\mu} = \bar{x} = \frac{1}{50}\sum_{i=1}^{50} x_i = 1060$ and $\hat{\sigma}^2 = s^2 = \frac{1}{50}\sum_{i=1}^{50}(x_i - \bar{x})^2 = 51529$.

(b) Each xi, i = 1, ..., 50 is an observation of X. Therefore, there are n = 50 observations.

(c) The hypotheses are 1-sided because in H1, we are interested in whether µ > 1000.

(d) As we discussed in class, we can test the following hypotheses:

H∗0 : µ = 1000 vs. H1 : µ > 1000.

We want to determine how unusual the observed $\hat{\mu} = 1060$ is, if H∗0 : µ = 1000 is true. Using the CLT, $\hat{\mu} \sim N\left(\mu, \mathrm{var}(\hat{\mu}) = \frac{\sigma^2}{n}\right) \approx N\left(1000, \frac{\hat{\sigma}^2}{50}\right) = N\left(1000, \frac{51529}{50}\right)$ if µ = 1000. We can express the observed $\hat{\mu} = 1060$ as a Z-score:

$$z^* = \frac{1060 - 1000}{\sqrt{51529/50}} = 1.869.$$

Outcomes at least as unusual as the observed data are those with a Z-score at least as large, which means Z ≥ 1.869 (since this is a 1-sided test). The probability of these outcomes is

P(Z ≥ 1.869) = 0.0308 = p-value.

Since 0.0308 is a small value, there is some evidence against H∗0 : µ = 1000 (and against H0).

(e) A 5% significance test rejects H0 if the p-value < 0.05 and does not reject H0 if the p-value ≥ 0.05.

Since the p-value = 0.0308 < 0.05, H∗0 (and H0) is rejected.

(f) From (d), the test statistic is:

$$z^* = \frac{1060 - 1000}{\sqrt{51529/50}} = 1.869.$$

In a 1-sided (upper-tailed) 5% significance test, the rule is to reject H0 if z∗ > 1.64 and not to reject H0 if z∗ ≤ 1.64.

Since z∗ = 1.869 > 1.64, we reject H∗0 (and H0).

(g) Using the t-test, we need to determine df = n − 1 = 49. However, the table does not give a critical value corresponding to df = 49. In that case, we can choose the critical value corresponding to df = 40, which is 1.684.³ Since the test statistic 1.869 > 1.684, the conclusion is the same as in (f). It is not surprising that we obtain the same conclusions in (f) and (g), since the critical values in (f) and (g) are very similar. This problem highlights the fact that, unless n is really small and the test statistic is borderline significant, using (f) is often sufficient.
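A Python sketch of Question 9(d)-(g), assuming `statistics.NormalDist` (Python 3.8+). It uses the MLE variance (divisor n) exactly as the answer does, and the 1.684 cutoff is the df = 40 entry copied from the table in 9(g).

```python
import math
from statistics import NormalDist

# Question 9: n = 50 days; the MLE variance (divisor n) is used as the estimate of sigma^2.
n, xbar, var_hat, mu0 = 50, 1060, 51529, 1000
z_star = (xbar - mu0) / math.sqrt(var_hat / n)

# Upper-tailed test (H1: mu > 1000): the p-value is the right-tail area.
p_value = 1 - NormalDist().cdf(z_star)
print(f"z* = {z_star:.3f}, p-value = {p_value:.4f}")
print("reject (z, 1.64):", z_star > 1.64, "| reject (t, df = 40 row, 1.684):", z_star > 1.684)
```

Note that √51529 = 227 exactly, so the denominator is 227/√50.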

(10a) The MLE of p is $\hat{p} = \bar{x} = 40/100 = 0.4$.

(b) Each xi, i = 1, ..., 100 is an observation of X. Therefore, there are n = 100 observations.

(c) The hypotheses are 1-sided because in H1, we are interested in whether p < 0.5.

(d) As we discussed in class, we can test the following hypotheses:

H∗0 : p = 0.5 vs. H1 : p < 0.5.

³ A general rule: when the table has no entry for the calculated df, choose the critical value corresponding to the next lowest df that is available.

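The MLE of p is $\hat{p} = 0.4$. We need to find out how unusual $\hat{p} = 0.4$ is, if H∗0 : p = 0.5 is true.

According to the CLT, in a random sample, $\hat{p}$ is approximately distributed as $N\left(p, \mathrm{var}(\hat{p}) = \frac{p(1-p)}{n}\right) = N\left(0.5, \frac{0.5(1-0.5)}{100} = 0.0025\right)$ if p = 0.5.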

We can standardize the observed $\hat{p}$ as a Z-score:

$$z^* = \frac{40/100 - 0.5}{\sqrt{0.0025}} = -2.$$

If H∗0 : p = 0.5 is true, then outcomes that are at least as unusual as the observed data are those with a $\hat{p}$ further below 0.5, i.e., those with a Z-score at least as extreme as −2: Z ≤ −2 (recall this is a 1-sided test). The associated probability is

$$P(Z \le -2) = P(Z > 2) = \underbrace{0.0228}_{\text{from normal table}}.$$

Therefore, if p = 0.5, the probability is 0.0228 (the p-value) that we will observe a value of $\hat{p}$ at least as unusual as 40/100 = 0.4.

Since 0.0228 is a small value, there is evidence against H∗0 : p = 0.5 (and against H0).

(e) A 5% significance test rejects H0 if the p-value < 0.05 and does not reject H0 if the p-value ≥ 0.05.

Since the p-value = 0.0228 < 0.05, H∗0 (and H0) is rejected.

(f) We have already calculated the test statistic in (d):

$$z^* = \frac{40/100 - 0.5}{\sqrt{0.0025}} = -2.$$

In a 1-sided (lower-tailed) 5% significance test, the rule is to reject H0 if z∗ < −1.64 and not to reject H0 if z∗ ≥ −1.64.

Since z∗ = −2 < −1.64, we reject H∗0 (and H0).
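A Python sketch of Question 10(d)-(f), assuming `statistics.NormalDist` (Python 3.8+). Because H1 : p < 0.5 is lower-tailed, the p-value is the left-tail area, and rejection requires z∗ to fall below −1.64.

```python
import math
from statistics import NormalDist

# Question 10: 40 of 100 customers prefer pasteurized; H1: p < 0.5 (lower-tailed).
n, p_hat, p0 = 100, 0.4, 0.5
z_star = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = NormalDist().cdf(z_star)

# Lower-tailed rejection region: z* < -1.64.
print(f"z* = {z_star:.2f}, p-value = {p_value:.4f}, reject: {z_star < -1.64}")
```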

(11a) The MLE of 1/λ is $\widehat{1/\lambda} = \bar{t} = 0.24$.

(b) Each ti, i = 1, ..., 40 is an observation of T . Therefore, there are n = 40 observations.

(c) The hypotheses are 1-sided because in H1, we are interested in whether 1/λ < 0.3.

(d) As we discussed in class, we can test the following hypotheses:

H∗0 : 1/λ = 0.3 vs. H1 : 1/λ < 0.3.

The MLE of 1/λ is $\widehat{1/\lambda} = 0.24$. We need to find out how unusual $\widehat{1/\lambda} = 0.24$ is, if H∗0 : 1/λ = 0.3 is true.

According to the CLT, in a random sample, $\widehat{1/\lambda}$ is approximately distributed as $N\left(\frac{1}{\lambda}, \mathrm{var}(\widehat{1/\lambda}) = \frac{1}{n\lambda^2}\right) = N\left(0.3, \frac{0.3^2}{40} = 0.00225\right)$ if 1/λ = 0.3.

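Note:

We determine $\mathrm{var}(\widehat{1/\lambda})$ as follows:

$$\begin{aligned}
\mathrm{var}(\widehat{1/\lambda}) &= \mathrm{var}(\bar{T}) = \mathrm{var}\left(\frac{T_1 + \cdots + T_n}{n}\right) = \frac{1}{n^2}\{\mathrm{var}(T_1) + \cdots + \mathrm{var}(T_n)\} \\
&= \underbrace{\frac{n\,\mathrm{var}(T)}{n^2}}_{T_1, \ldots, T_n \text{ iid}} = \underbrace{\frac{1}{n\lambda^2}}_{\mathrm{var}(T) = 1/\lambda^2 \text{ for } T \sim \mathrm{Exp}(\lambda)}
\end{aligned}$$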
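The identity var(1/λ̂) = 1/(nλ²) can also be sanity-checked by simulation. The sketch below draws many samples of n = 40 exponential times with mean 0.3 using Python's `random.expovariate` (which takes the rate λ, not the mean) and compares the empirical variance of $\bar{T}$ with 0.3²/40. The helper name, seed and replication count are arbitrary illustrative choices.

```python
import random
import statistics


def simulated_var_of_mean(n: int, mean: float, reps: int, seed: int = 1) -> float:
    """Monte Carlo estimate of var(T-bar) for n IID exponential draws with the given mean."""
    rng = random.Random(seed)
    rate = 1 / mean  # random.expovariate takes the rate lambda, not the mean
    means = [statistics.fmean(rng.expovariate(rate) for _ in range(n)) for _ in range(reps)]
    return statistics.variance(means)


n, mean = 40, 0.3
theory = mean**2 / n  # 1 / (n * lambda^2) with 1/lambda = 0.3
simulated = simulated_var_of_mean(n, mean, reps=10_000)
print(f"theory = {theory:.5f}, simulated = {simulated:.5f}")
```

With 10,000 replications the simulated value should typically land within a few percent of 0.00225.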

We can standardize the observed $\widehat{1/\lambda}$ as a Z-score:

$$z^* = \frac{0.24 - 0.3}{\sqrt{0.00225}} = -1.2649.$$

If H∗0 : 1/λ = 0.3 is true, then outcomes that are at least as unusual as the observed data are those with a $\widehat{1/\lambda}$ further below 0.3, i.e., those with a Z-score at least as extreme as −1.2649: Z ≤ −1.2649 (this is a 1-sided test). The associated probability is

$$P(Z \le -1.2649) = P(Z > 1.2649) = \underbrace{0.102}_{\text{from normal table}}.$$

Therefore, if 1/λ = 0.3, the probability is 0.102 (the p-value) that we will observe a value of $\widehat{1/\lambda}$ at least as unusual as 0.24.

Since 0.102 is a rather large value, there is little evidence against H∗0 : 1/λ = 0.3 (or against H0).

(e) A 5% significance test rejects H0 if the p-value < 0.05 and does not reject H0 if the p-value ≥ 0.05.

Since the p-value = 0.102 > 0.05, H∗0 (and H0) is not rejected.

(f) From (d):

$$z^* = \frac{0.24 - 0.3}{\sqrt{0.00225}} = -1.2649.$$

In a 1-sided (lower-tailed) 5% significance test, the rule is to reject H0 if z∗ < −1.64 and not to reject H0 if z∗ ≥ −1.64.

Since z∗ = −1.2649 > −1.64, we do not reject H∗0 (and H0).
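A Python sketch of Question 11(d)-(f), assuming `statistics.NormalDist` (Python 3.8+). Without table rounding, the normal-approximation p-value is about 0.103, close to the 0.102 above. As an optional cross-check that the answer does not use, the exact p-value follows from the fact that a sum of n IID Exp(λ) times is Gamma(n, λ), so P(sum ≤ s) = P(Poisson(λs) ≥ n); it is about 0.096, so the conclusion is unchanged.

```python
import math
from statistics import NormalDist

n, tbar, mean0 = 40, 0.24, 0.3

# Normal approximation used in the text: var(T-bar) = mean0**2 / n under H0*.
z_star = (tbar - mean0) / math.sqrt(mean0**2 / n)
p_normal = NormalDist().cdf(z_star)

# Exact alternative (not used in the text): the sum of n Exp times is Gamma(n, rate 1/mean0),
# and P(Gamma(n, rate) <= s) = P(Poisson(rate * s) >= n) for integer n.
mu = (n * tbar) / mean0  # Poisson mean: rate * s = 9.6 / 0.3 = 32
term, cdf_below_n = math.exp(-mu), 0.0
for k in range(n):  # accumulate P(Poisson(mu) <= n - 1), one pmf term at a time
    cdf_below_n += term
    term *= mu / (k + 1)
p_exact = 1 - cdf_below_n

print(f"z* = {z_star:.4f}, normal p = {p_normal:.3f}, exact p = {p_exact:.3f}")
```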

(12a) The hypotheses are 2-sided since he has no preference about the direction of the difference in H1.

(b) From Question 5 in Chapter 8 Exercises, we found that the MLE for µX − µY is $\bar{x} - \bar{y}$. Furthermore, if we let $\hat{\mu}_{X-Y}$ be the MLE of µX − µY, we showed that $\mathrm{var}(\hat{\mu}_{X-Y}) = \frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{n}$, since the two samples have the same size. Therefore, from the CLT for MLEs, we can deduce that a test statistic for the hypotheses is:

$$\frac{\bar{x} - \bar{y}}{\sqrt{\hat{\sigma}_X^2/n + \hat{\sigma}_Y^2/n}} = \frac{11.6 - 12.7}{\sqrt{27.6/30 + 32.4/30}} = \frac{-1.1}{\sqrt{2}} \approx -0.7778.$$

Since this is a 2-sided test, the critical value for a 5% significance test is 1.96. Since |−0.7778| < 1.96, there is no significant evidence of a difference between the two sources.

(c) Since n = 30, df = n + n − 2 = 58. However, there is no corresponding df in the table. Therefore, we use the critical value corresponding to df = 40, which is 2.021. Since |−0.7778| < 2.021, the conclusion is identical to that in (b).

(d) Based on the data, d1, ..., d30 can be seen as 30 observations of D, the daily difference. His interest is to determine whether the long-run average of D, which is µD, is zero or not. Thus, the hypotheses can be tested using the same type of test statistic we have been using, viz.:

$$\frac{\bar{d}}{s_d/\sqrt{n}} = \frac{-1.1}{\sqrt{7/30}} \approx -2.2772.$$

This statistic gives a p-value < 0.05 since |−2.2772| > 1.96; therefore, H0 is rejected.

The reason for the difference between the test statistics in (b) and (d) is as follows. Under (b), the hypotheses are tested by separately estimating µX and µY by $\bar{x}$ and $\bar{y}$, both of which are imprecise estimates because there is high variation in milk production. Therefore, using them to test µX − µY is a poor choice. On the other hand, after taking paired differences, the large variations largely disappear (note that $s_d^2 = 7$ is much smaller than $s_x^2 = 27.6$ and $s_y^2 = 32.4$). Therefore, using $\bar{d}$ gives a much more sensitive test of the hypotheses.
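The contrast between (b) and (d) can be reproduced from the summary statistics alone. This Python sketch copies the given values, keeping the document's divisors (n for $s_x^2$ and $s_y^2$, n − 1 for $s_d^2$), and compares both statistics with the 1.96 cutoff.

```python
import math

n = 30
xbar, ybar = 11.6, 12.7
var_x, var_y = 27.6, 32.4  # divisor n, as given in Question 12
dbar, var_d = -1.1, 7.0    # var_d uses divisor n - 1, as given in Question 12(d)

# (b) Two-sample statistic: treats the two sources as independent samples.
z_two_sample = (xbar - ybar) / math.sqrt(var_x / n + var_y / n)

# (d) Paired statistic: a one-sample test on the daily differences d_i = x_i - y_i.
z_paired = dbar / math.sqrt(var_d / n)

for name, z in [("two-sample", z_two_sample), ("paired", z_paired)]:
    print(f"{name}: z = {z:.4f}, reject at 5% (|z| > 1.96): {abs(z) > 1.96}")
```

The numerator is the same (−1.1) in both cases; only the standard error changes, which is why pairing turns a non-significant result into a significant one.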

There are three conditions for carrying out a test similar to the one in (d):

1. The sample sizes of the two samples must be equal, i.e., n for both samples.

2. Each observation in one sample is uniquely paired in some meaningful way with an observation in the second sample. In this case, production from the two sources on the same day is paired.

3. The samples should be positively correlated. This can be checked by visual inspection or by observing that $s_x^2$ and $s_y^2$ are large relative to $s_d^2$.

We notice that the test in (d) is essentially a one-sample test on the n observations d1, ..., dn. So this test is not different in concept from the ones we have been using, for example in Questions 1 and 3-5.

(e) There are 30 paired differences, so n = 30 and df = n − 1 = 29. Therefore, we use the critical value 2.045. Since |−2.2772| > 2.045, the conclusion is identical to that in (d).