MATH 325 HW6

Anh Nguyen

November 4, 2015

Problem 5.10 p.154

Description. Estimate the total number of acres of trees on farms in the state, and place a bound on the error of estimation. Graph the data on an appropriate plot and comment on the variation as we move from I to IV.

(a)   Solving without using macros from library(survey):

Denote:

$N_i$: size of the $i$th stratum

$n_i$: sample size from the $i$th stratum

$\bar{y}_i$: mean of the sample taken from the $i$th stratum

$s_i$: standard deviation of the sample taken from the $i$th stratum

Entering data into R:

> source("http://educ.jmu.edu/~garrenst/math325.dir/Rmacros")

> source("http://educ.jmu.edu/~garrenst/math100.dir/Rmacros")

> myframe=read.table2("EXER5_10.DAT")

> y1=myframe[1:14,2]

> y2=myframe[15:26,2]

> y3=myframe[27:35,2]

> y4=myframe[36:40,2]

> N=c(86,72,52,30)

> n=c(length(y1),length(y2),length(y3),length(y4))

Estimate $\tau$, the population total number of acres of trees on farms in the state, for stratified sampling:

\[ \hat{\tau}_{st} = N \bar{y}_{st} = \sum_{i=1}^{4} N_i \bar{y}_i \tag{5} \]

> sample.means=c(mean(y1),mean(y2),mean(y3),mean(y4))

> tau.hat.st=sum(N*sample.means)

> tau.hat.st

[1] 50505.6

The estimated total number of acres of trees on farms in the state is 50,505.6. Next, the estimated variance of $\hat{\tau}_{st}$ is:

\[ \hat{V}(\hat{\tau}_{st}) = \hat{V}(N \bar{y}_{st}) = \sum_{i=1}^{4} N_i^2 \left(1 - \frac{n_i}{N_i}\right) \frac{s_i^2}{n_i} \tag{6} \]

> sample.sd=c(sd(y1),sd(y2),sd(y3),sd(y4))

> var.tau.hat=sum(N^2*(1-(n/N))*(sample.sd^2/n))

> var.tau.hat

[1] 18762431

Place a bound on the error of the estimation:

\[ B = 2\sqrt{\hat{V}(\hat{\tau}_{st})} \tag{7} \]

> 2*sqrt(var.tau.hat)

[1] 8663.124

So the bound on error of estimation is 8663.124.
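
The hand calculation in part (a) can be collected into one reusable function. The following is a minimal sketch in base R; the function name `strat_total` and the two-stratum data are made up for illustration and are not part of the exercise.

```r
# Hypothetical helper: stratified estimate of a population total with a
# 2-standard-error bound, following equations (5)-(7).
strat_total <- function(N, samples) {
  n    <- sapply(samples, length)   # sample size per stratum
  ybar <- sapply(samples, mean)     # sample mean per stratum
  s2   <- sapply(samples, var)      # sample variance per stratum
  tau_hat <- sum(N * ybar)                      # equation (5)
  v_hat   <- sum(N^2 * (1 - n / N) * s2 / n)    # equation (6)
  list(total = tau_hat, variance = v_hat, bound = 2 * sqrt(v_hat))  # (7)
}

# Made-up data: two strata of sizes 10 and 20
res <- strat_total(N = c(10, 20), samples = list(c(1, 2, 3), c(4, 6)))
print(res$total)
print(res$bound)
```

With the exercise data, a call such as `strat_total(N, list(y1, y2, y3, y4))` should reproduce `tau.hat.st` and the bound computed above.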

(b)   Solving using macros from library(survey):

> library(survey)

> mydesign=svydesign(~1,strata=~V1,fpc=rep(N,n),data=myframe)

> tau=svytotal(~V2,mydesign)

> tau.hat.st=coef(tau)

> tau.hat.st

V2

50505.6

> bound=2*SE(tau)

> bound

V2

V2 8663.124

So the estimated total number of acres of trees on farms in the state is 50,505.6 and the bound on the error of estimation is 8663.124.

(c)  The graph of the sample data from all strata is shown below

> library(ggplot2)

> ggplot(data=myframe,aes(x=V1,y=V2,fill=V1))+geom_boxplot()+labs(title="Plot of the Stratums",x="Stratum",y="Farm Sizes")+scale_fill_brewer(palette="Blues")+theme_classic()

Figure 1: Boxplot of the sample data from all strata. [Plot titled "Plot of the Stratums": x-axis Stratum (I, II, III, IV), y-axis Farm Sizes (0 to 800), fill legend V1.]

From the above boxplot, the variation in the sample taken from the fourth stratum

is very high compared to all other samples. In addition, observe that the variation is

steadily increasing as we move from stratum I to stratum IV.
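
The visual impression from the boxplot can be checked numerically by computing the sample standard deviation within each stratum. A minimal sketch in base R, assuming a data frame with the stratum label in `V1` and the measurement in `V2` as above; the data frame `toy` is made up so that the snippet runs on its own.

```r
# Made-up data standing in for myframe (stratum label V1, measurement V2)
toy <- data.frame(
  V1 = rep(c("I", "II"), each = 3),
  V2 = c(1, 2, 3, 10, 20, 30)
)

# Sample standard deviation of V2 within each stratum
stratum_sd <- tapply(toy$V2, toy$V1, sd)
print(stratum_sd)
```

Applied to the exercise data, `tapply(myframe$V2, myframe$V1, sd)` should return the same values as `sample.sd` from part (a), which makes the increase in variation from stratum I to stratum IV explicit.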

Problem 5.8 p.154

Denote:

$N_i$: size of the $i$th stratum

$n_i$: sample size from the $i$th stratum

$a_i$: fraction of the total sample allocated to the $i$th stratum

$s_i$: standard deviation of the sample taken from the $i$th stratum

Entering data into  R:

> myframe=read.table2("EXER5_6.DAT")

> N=c(55,80,65)

> y1=myframe[1:14,2]

> y2=myframe[15:34,2]

> y3=myframe[35:50,2]

> n=c(length(y1),length(y2),length(y3))

> a=n/sum(n)

The stratum variances are approximated by the sample variances obtained from the given data, that is, $\sigma_i^2 \approx s_i^2$. The approximate sample size required to estimate the average score with a bound $B = 4$ on the error of estimation is:

\[ n = \frac{\sum_{i=1}^{3} N_i^2 \sigma_i^2 / a_i}{N^2 D + \sum_{i=1}^{3} N_i \sigma_i^2} \tag{8} \]

where $D = B^2/4$. In R:

> y.vars=c(var(y1),var(y2),var(y3))

> B=4

> D=B^2/4

> (sum(N^2*y.vars/a))/(sum(N)^2*D+sum(N*y.vars))

[1] 32.14369

So the sample size required to estimate the average score is approximately 32 (rounding 32.14 up to 33 would guarantee that the bound B = 4 is met).
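
The sample-size formula (8) can be wrapped in a small function so that different bounds or allocations are easy to compare. This is a sketch only: the name `strat_sample_size` is hypothetical, the stratum variances below are made up, and the allocation is assumed proportional to the stratum sizes rather than taken from the data.

```r
# Hypothetical helper: sample size needed to estimate a stratified mean
# with bound B, given allocation fractions a (equation (8)).
strat_sample_size <- function(N, sigma2, a, B) {
  D <- B^2 / 4
  sum(N^2 * sigma2 / a) / (sum(N)^2 * D + sum(N * sigma2))
}

N      <- c(55, 80, 65)      # stratum sizes from the exercise
sigma2 <- c(100, 100, 100)   # made-up stratum variances
a      <- N / sum(N)         # proportional allocation (an assumption here)

n_raw <- strat_sample_size(N, sigma2, a, B = 4)
n_req <- ceiling(n_raw)      # round up so the bound is not exceeded
print(c(n_raw, n_req))
```

Rounding up with `ceiling` is the conservative choice, since any fractional sample size rounded down leaves the bound slightly above the target.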

Problem 5.31 p.159

(a)   Estimate the population proportion of those who think they have worked beyond a

safe limit. Calculate a bound on the error of estimation.

The population of those practicing anesthesiology was stratified into three groups: anesthesiologists (50%), anesthesiology residents (10%), and nurse anesthetists (40%). Members of each group were asked whether or not they had worked without a break beyond a safe limit. We want to estimate $p$, the population proportion of those who would answer "yes".

Given the sample proportion of anesthesiologists, $\hat{p}_1 = 0.687$

Given the sample proportion of anesthesiology residents, $\hat{p}_2 = 0.824$

Given the sample proportion of nurse anesthetists, $\hat{p}_3 = 0.782$

The population proportion, $p$, is estimated by the stratified sample proportion $\hat{p}_{st}$, which is the weighted average of the sample proportions $\hat{p}_i$ of the strata:

\[ \hat{p}_{st} = \sum_{i=1}^{3} \frac{N_i}{N} \hat{p}_i = 0.5 \cdot 0.687 + 0.1 \cdot 0.824 + 0.4 \cdot 0.782 \]

> N.frac=c(0.5,0.10,0.40)

> p.hat=c(0.687,0.824,0.782)

> p.hat.st=sum(N.frac*p.hat)

> p.hat.st

[1] 0.7387

As a result, the estimated proportion of those practicing anesthesiology who would answer "yes" is 0.7387. Within each stratum, the estimated proportion of those who would answer "no" is $q_i = 1 - \hat{p}_i$. Therefore, the estimated variance of $\hat{p}_{st}$ is calculated by:

\[ \hat{V}(\hat{p}_{st}) = \sum_{i=1}^{3} \frac{N_i^2}{N^2} \left(1 - \frac{n_i}{N_i}\right) \frac{\hat{p}_i q_i}{n_i - 1} \tag{9} \]

However, since we assume the population size is large, $1 - n_i/N_i \approx 1$, and equation (9) becomes:

\[ \hat{V}(\hat{p}_{st}) = \sum_{i=1}^{3} \left(\frac{N_i}{N}\right)^2 \frac{\hat{p}_i q_i}{n_i - 1} \tag{10} \]

> n=c(417+913,29+136,240+860)

> v.hat.p.st=sum(N.frac^2*p.hat*(1-p.hat)/(n-1))

> B=2*sqrt(v.hat.p.st)

> B

[1] 0.01721764

So the bound on the error of estimation is 0.0172. Alternatively, this problem can be solved using macros from library(survey). First, we need to build a data frame with two columns, "job" and "safelimit". The "job" column takes the values 1, 2 and 3, which give the stratum number. The "safelimit" column takes the values 0 and 1, where 0 corresponds to "No" and 1 corresponds to "Yes".

> y1=c(rep(1,417+913),rep(2,29+136),rep(3,240+860))

> y2=c(rep(0,417),rep(1,913),rep(0,29),rep(1,136),rep(0,240),rep(1,860))

> y3=cbind(y1,y2)

> colnames(y3)=c("job","safelimit")

> myframe=data.frame(y3)

Then we can use macros from library(survey) to solve this problem:

> N.frac=c(0.5,0.10,0.40)

> N=1e10*N.frac

> n=c(417+913,29+136,240+860)

> mydesign=svydesign(~1,strata=~job,fpc=rep(N,n),data=myframe)

> z=svymean(~safelimit,mydesign)

> coef(z)

safelimit

0.7383846

> 2*SE(z)

safelimit

safelimit 0.01722261

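From the output, the estimated proportion of those practicing anesthesiology who would answer "yes" is 0.7384 and the bound on the error of estimation is 0.0172. The estimate differs slightly from the 0.7387 obtained by hand because the data frame is built from the raw counts (for example, 913/1330 ≈ 0.6865) rather than from the rounded sample proportions.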

(b)   Do anesthesiologists differ significantly from residents in this matter?

To answer this question, we compute a confidence interval on the difference between the two corresponding population proportions and check whether or not it contains zero.

\[ \hat{p}_1 - \hat{p}_2 \pm 2\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \]

> (p.hat[1]-p.hat[2])+c(-1,1)*2*sqrt(p.hat[1]*(1-p.hat[1])/n[1]+p.hat[2]*(1-p.hat[2])/n[2])

[1] -0.201517 -0.072483

The approximate 95% confidence interval on $p_1 - p_2$ is (−0.2015, −0.0725). Since zero is not included in this interval, there is enough evidence to suggest that anesthesiologists differ significantly from residents in this matter.

(c)   Do anesthesiologists differ significantly from nurse anesthetists in this matter?

Again, we compute a confidence interval on the difference between the two corresponding population proportions and check whether or not it contains zero.

\[ \hat{p}_1 - \hat{p}_3 \pm 2\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_3(1 - \hat{p}_3)}{n_3}} \]

> (p.hat[1]-p.hat[3])+c(-1,1)*2*sqrt(p.hat[1]*(1-p.hat[1])/n[1]+p.hat[3]*(1-p.hat[3])/n[3])

[1] -0.13058964 -0.05941036

The approximate 95% confidence interval on $p_1 - p_3$ is (−0.1306, −0.0594). Since zero is not included in this interval, there is enough evidence to suggest that anesthesiologists differ significantly from nurse anesthetists in this matter.
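
Parts (b) and (c) apply the same interval formula to different pairs of strata, so the computation can be factored into one function. A minimal sketch in base R; the function name `diff_prop_ci` is hypothetical, and the interval uses 2 standard errors as an approximation to the 95% level, as in the calculations above.

```r
# Hypothetical helper: approximate 95% interval (2 standard errors) for the
# difference of two independent sample proportions.
diff_prop_ci <- function(p1, n1, p2, n2) {
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  (p1 - p2) + c(-1, 1) * 2 * se
}

ci_b <- diff_prop_ci(0.687, 417 + 913, 0.824, 29 + 136)    # part (b)
ci_c <- diff_prop_ci(0.687, 417 + 913, 0.782, 240 + 860)   # part (c)
print(round(ci_b, 4))
print(round(ci_c, 4))
```

Both intervals lie entirely below zero, which is the basis for the conclusions in parts (b) and (c).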