
DEPARTMENT OF MATHEMATICS & STATISTICS GRADUATE PROGRAM …math.usf.edu/grad/quals/applied stats/Applied Statistics QEs.pdf · DEPARTMENT OF MATHEMATICS & STATISTICS . GRADUATE PROGRAM


DEPARTMENT OF MATHEMATICS & STATISTICS

GRADUATE PROGRAM

QUALIFYING EXAMINATION I: APPLIED STATISTICS

Saturday, January 24, 2015

Examiners: Drs. R. D. Wooten, L. Lu and C. P. Tsokos

Answer any seven questions completely; at least one must be from Part B.

PART A: Statistical Analysis

1. A calculator runs on four batteries, where the time to failure of any battery is exponentially distributed with a mean of 10 months:

$$f(t) = \frac{1}{10} e^{-t/10}$$

Let $T$ be the time to failure of the calculator. Determine the CDF of $T$ under each failure condition.

(a) The calculator fails to function properly if any battery fails.

(b) The calculator fails to function properly only if all batteries fail.
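A minimal numerical sketch of the two CDFs, assuming the four battery lifetimes are independent (the problem does not state this explicitly). Under that assumption, case (a) is the minimum of the four lifetimes and case (b) is the maximum. The function names are illustrative only.

```python
import math

MEAN_LIFE = 10.0   # mean battery life in months (from the problem)
N_BATTERIES = 4    # number of batteries (from the problem)


def battery_cdf(t: float) -> float:
    """CDF of a single battery lifetime: 1 - exp(-t/10) for t > 0."""
    return 1.0 - math.exp(-t / MEAN_LIFE) if t > 0 else 0.0


def cdf_any_fails(t: float) -> float:
    """Case (a): T is the minimum lifetime, so F_T(t) = 1 - (1 - F(t))^4.

    Assumes independent batteries.
    """
    return 1.0 - (1.0 - battery_cdf(t)) ** N_BATTERIES


def cdf_all_fail(t: float) -> float:
    """Case (b): T is the maximum lifetime, so F_T(t) = F(t)^4.

    Assumes independent batteries.
    """
    return battery_cdf(t) ** N_BATTERIES
```

In closed form these are $1 - e^{-4t/10}$ for case (a) and $(1 - e^{-t/10})^4$ for case (b), for $t > 0$.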


2. Doctors from a certain research institute think that they have found a drug that can relieve flu symptoms better than other drugs currently available on the market. Ten sets of identical twins (sharing genetically similar immune systems) are asked to participate in a study for the drug efficacy. Among each set, one twin is given the new drug and the other twin is given the old drug. After 24 hours, they are asked to rate how they feel on a scale of 1 to 10 (1=horrible, 10=great). The data are summarized in the table below. Consider the rating scores as continuous and assume they follow normal distributions. Note the normal plot for the difference in the ratings of the new and old drugs for the ten sets of identical twins is examined as shown on the right side.

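| Twins | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| New drug | 6 | 5 | 7 | 6 | 8 | 7 | 9 | 4 | 8 | 7 | 6.7 | 1.5 |
| Old drug | 5 | 4 | 7 | 4 | 6 | 8 | 8 | 4 | 5 | 4 | 5.5 | 1.6 |
| Difference in rating (New-Old) | 1 | 1 | 0 | 2 | 2 | -1 | 1 | 0 | 3 | 3 | 1.2 | 1.3 |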

(a) Do you think the data suggest the new drug is a significant improvement over the old drug at a significance level of 0.05?

(b) Find a 95% prediction interval for the difference in the rating of the new and old drugs for a future sample of identical twins.
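The paired analysis can be sketched as below. This is an illustration under stated assumptions, not an official solution: part (a) is treated as a one-sided paired t test, part (b) is read as a prediction interval for the difference in a single future pair of twins, and the critical values are taken from the t table supplied with the exam ($\nu = 9$).

```python
import math
import statistics

new = [6, 5, 7, 6, 8, 7, 9, 4, 8, 7]
old = [5, 4, 7, 4, 6, 8, 8, 4, 5, 4]
diff = [a - b for a, b in zip(new, old)]  # New - Old for each twin pair

n = len(diff)
d_bar = statistics.mean(diff)   # sample mean of the differences
s_d = statistics.stdev(diff)    # sample SD of the differences (n - 1 divisor)

# (a) Paired t statistic for H0: mu_d = 0 against Ha: mu_d > 0 (assumed one-sided).
t_stat = d_bar / (s_d / math.sqrt(n))
T_9_ONE_TAIL_05 = 1.83          # exam t table, nu = 9, one-tail 0.05
reject_h0 = t_stat > T_9_ONE_TAIL_05

# (b) 95% prediction interval for one future difference (assumed reading).
T_9_TWO_TAIL_05 = 2.26          # exam t table, nu = 9, two-tail 0.05
half_width = T_9_TWO_TAIL_05 * s_d * math.sqrt(1 + 1 / n)
pred_interval = (d_bar - half_width, d_bar + half_width)
```

The extra "1 +" under the square root is what separates a prediction interval for a new observation from a confidence interval for the mean difference.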


3. Two brands of water filters are to be compared in terms of the mean reduction in impurities, measured in parts per million (ppm). Sixteen water samples were tested with each filter and the reduction in the impurity level was measured, resulting in the following data:

Filter 1: $n_1 = 16$, $\bar{x} = 8.0$, $s_1 = 4$

Filter 2: $n_2 = 16$, $\bar{y} = 5.5$, $s_2 = 2.5$

(a) Do you think the data suggest that the two brands of filters are different in the mean reduction in impurities? State and test the relevant hypotheses at a significance level of 0.05 using the Welch-Satterthwaite separate variances approach.

(b) Do the two brands of filters have an equal amount of variability in the reduction in the impurity level at a significance level of 0.01?


(c) Now if the two brands of filters are assumed to have equal variances, would the conclusion change at the same significance level 0.05?

(d) If the two populations are assumed to have equal variances, compute the 95% confidence interval for the pooled variance for the combined population.
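The quantities needed for parts (a) to (d) can be computed from the summary statistics alone. The sketch below is illustrative: it reads part (d) as a chi-squared interval for the common variance $\sigma^2$ based on the pooled estimate, and it uses the chi-squared table supplied with the exam ($\nu = 30$). Critical values for the t and F comparisons still have to be looked up.

```python
import math

n1, mean1, s1 = 16, 8.0, 4.0   # Filter 1 summary statistics
n2, mean2, s2 = 16, 5.5, 2.5   # Filter 2 summary statistics
v1, v2 = s1**2 / n1, s2**2 / n2

# (a) Welch statistic and Satterthwaite degrees of freedom.
t_welch = (mean1 - mean2) / math.sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# (b) Variance ratio, referred to an F distribution with (15, 15) df.
f_ratio = s1**2 / s2**2

# (c) Pooled-variance t statistic with n1 + n2 - 2 degrees of freedom.
df_pooled = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df_pooled
t_pooled = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# (d) 95% interval for the common variance (assumed interpretation).
CHI2_30_RIGHT_025 = 46.979     # exam chi-squared table, nu = 30, area to the right 0.025
CHI2_30_RIGHT_975 = 16.791     # exam chi-squared table, nu = 30, area to the right 0.975
var_ci = (df_pooled * sp2 / CHI2_30_RIGHT_025, df_pooled * sp2 / CHI2_30_RIGHT_975)
```

Because the two sample sizes are equal, the Welch and pooled statistics coincide; only the degrees of freedom differ.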


4. Biologists studying sandflies at a site in eastern Panama looked at the relationship between gender and the height at which the flies were caught in a light trap, resulting in the following data:

Height above Ground

| Gender | 3 feet | 35 feet | Row Total |
|---|---|---|---|
| Males | 173 | 125 | 298 |
| Females | 150 | 73 | 223 |
| Column Total | 323 | 198 | 521 |

Set up the hypotheses to test whether the gender of the sandflies and trap height are associated. Conduct an appropriate test at $\alpha = 0.05$ and draw a conclusion.
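One way to carry out the test is Pearson's chi-squared test of independence, sketched below without a continuity correction (that choice is an assumption; the exam does not specify it). The critical value comes from the chi-squared table supplied with the exam ($\nu = 1$).

```python
observed = [
    [173, 125],  # males: 3 feet, 35 feet
    [150, 73],   # females: 3 feet, 35 feet
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

CHI2_1_RIGHT_05 = 3.841  # exam chi-squared table, nu = 1, area to the right 0.05
reject_h0 = chi2 > CHI2_1_RIGHT_05
```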


5. A student is interested in modeling the relationship between the weekly number of hours studied before an exam and score earned on that exam. The following data were supplied by the student.

| Test | Hours Studied | Score on Exam |
|---|---|---|
| 1 | 8 | 77 |
| 2 | 10 | 90 |
| 3 | 5 | 56 |
| 4 | 8 | 84 |
| 5 | 7 | 72 |

(a) Fit a linear regression model to the data and estimate the parameters using the least squares method. Also estimate the standard error (variance) $\sigma^2$.

(b) Determine the correlation coefficient and interpret in terms of strength and direction.


(c) Test the hypothesis that there is a linear correlation.

(d) State the marginal change and interpret it in context.
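The least squares quantities for this small data set can be worked out directly from the sums of squares. The sketch below is illustrative; the 0.05 level used for the correlation test is an assumption, since part (c) does not name one, and the critical value is from the exam t table ($\nu = 3$).

```python
import math

hours = [8, 10, 5, 8, 7]
score = [77, 90, 56, 84, 72]
n = len(hours)

x_bar = sum(hours) / n
y_bar = sum(score) / n
sxx = sum((x - x_bar) ** 2 for x in hours)
syy = sum((y - y_bar) ** 2 for y in score)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, score))

# (a) Least squares estimates and the residual variance estimate.
slope = sxy / sxx                  # marginal change in score per extra hour, part (d)
intercept = y_bar - slope * x_bar
sse = syy - slope * sxy
sigma2_hat = sse / (n - 2)

# (b) Sample correlation coefficient.
r = sxy / math.sqrt(sxx * syy)

# (c) t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
T_3_TWO_TAIL_05 = 3.18             # exam t table, nu = 3, two-tail 0.05 (assumed level)
reject_h0 = abs(t_stat) > T_3_TWO_TAIL_05
```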


6. Mercury is a naturally occurring contaminant in fish in freshwater lakes. In some lakes, the mercury concentration is sufficiently high that warnings are posted to limit the consumption of fish from that lake. The mercury concentration is influenced by many characteristics of lakes. The state of Maine collected information from a random sample of 110 lakes. The variables of interest included:

- mc: mercury concentration in fish
- elev: elevation of the lake
- area: area of the lake
- drain: size of the drainage basin for the lake
- runoff: annual runoff into the lake

Here are the parameter estimates and their standard errors when a multiple linear regression model with all the variables is fit to the data:

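| Parameter | Estimate | SE |
|---|---|---|
| $\beta_0$ | 0.758 | 0.144 |
| $\beta_{elev}$ | -0.211 | 0.061 |
| $\beta_{area}$ | -0.014 | 0.016 |
| $\beta_{drain}$ | 0.0137 | 0.029 |
| $\beta_{runoff}$ | -0.292 | 0.262 |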

(a) Complete the ANOVA table given below:

| Models | SSE | DF | MS |
|---|---|---|---|
| Model | 1.137 | | |
| Error | 7.515 | | |
| Total | 8.653 | | |

(b) Construct a test of $H_0: \beta_{elev} = \beta_{area} = \beta_{drain} = \beta_{runoff} = 0$ and draw a conclusion at $\alpha = 0.01$, if that is possible from the information provided. If not, indicate how you would construct the desired test.


(c) Construct a test of $H_0: \beta_{drain} = \beta_{runoff} = 0$ and draw a conclusion at $\alpha = 0.01$, if that is possible from the information provided. If not, indicate how you would construct the desired test.

(d) Construct a test of $H_0: \beta_{area} + \beta_{drain} = 0$ and draw a conclusion at $\alpha = 0.01$, if that is possible from the information provided. If not, indicate how you would construct the desired test.
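The ANOVA table in part (a) and the overall F statistic in part (b) follow from the sums of squares and the sample size. The sketch below is illustrative and assumes the model contains an intercept plus the four listed predictors. It does not attempt parts (c) and (d): a partial F test needs the error sum of squares of the reduced model, and a test of $\beta_{area} + \beta_{drain}$ needs the covariance of the two estimates, neither of which is given.

```python
n_lakes = 110        # sample size (from the problem)
n_predictors = 4     # elev, area, drain, runoff

ss_model, ss_error, ss_total = 1.137, 7.515, 8.653  # as printed (rounded)

# Degrees of freedom for a model with an intercept.
df_model = n_predictors
df_total = n_lakes - 1
df_error = df_total - df_model

# Mean squares and the overall F statistic on (df_model, df_error) df.
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_overall = ms_model / ms_error
```

The F tables supplied with the exam stop at 20 denominator degrees of freedom, so the critical value for (4, 105) degrees of freedom has to come from another table or software.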


7. In an experiment to investigate the effect of color of paper (blue, green, orange) on response rates for questionnaires distributed by the “windshield method” in supermarket parking lots, 15 representative supermarket parking lots were chosen in a metropolitan area and each color was assigned at random to five of the lots. The response rates (in percent) are summarized in the table below. Assume the ANOVA model $Y_{ij} = \mu_i + \varepsilon_{ij}$, $\varepsilon_{ij} \sim N(0, \sigma^2)$, is appropriate.

| $i$ \ $j$ | 1 | 2 | 3 | 4 | 5 | $\bar{Y}_{i\cdot}$ | $s_i^2$ |
|---|---|---|---|---|---|---|---|
| Blue | 28 | 26 | 31 | 27 | 35 | 29.4 | 13.3 |
| Green | 34 | 29 | 25 | 31 | 29 | 29.6 | 10.8 |
| Orange | 31 | 25 | 27 | 29 | 28 | 28.0 | 5 |

$\bar{Y}_{\cdot\cdot} = 29$, $s_p^2 = 9.7$

(a) Calculate a 90% confidence interval for the experimental error $\sigma$.

(b) Do the average response rates differ for the three colors of paper? State and test your hypotheses at the 0.1 significance level.


(c) Use the Tukey method to do all pairwise comparisons between the three colors of paper at significance level 0.05 and summarize your result.

(d) When informed of the findings, an executive said: “See? I was right all along. We might as well print the questionnaires on plain white paper, which is cheaper.” Does this conclusion follow from the findings of the study? Discuss.
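The one-way ANOVA quantities behind parts (a) and (b) can be reproduced from the raw data. The sketch below is illustrative; the chi-squared values come from the table supplied with the exam ($\nu = 12$). The Tukey comparison in part (c) needs a studentized range table that is not included with the exam, so it is left out.

```python
import math

data = {
    "Blue": [28, 26, 31, 27, 35],
    "Green": [34, 29, 25, 31, 29],
    "Orange": [31, 25, 27, 29, 28],
}

k = len(data)
n_total = sum(len(v) for v in data.values())
grand_mean = sum(sum(v) for v in data.values()) / n_total
means = {color: sum(v) / len(v) for color, v in data.items()}

# Between-treatment and within-treatment sums of squares.
ss_treat = sum(len(v) * (means[c] - grand_mean) ** 2 for c, v in data.items())
ss_error = sum((y - means[c]) ** 2 for c, v in data.items() for y in v)

df_treat, df_error = k - 1, n_total - k
ms_treat = ss_treat / df_treat
ms_error = ss_error / df_error     # equals the pooled variance s_p^2
f_stat = ms_treat / ms_error       # (b) refer to F with (2, 12) df

# (a) 90% interval for sigma from SSE / sigma^2 ~ chi-squared(12).
CHI2_12_RIGHT_05 = 21.026          # exam chi-squared table, nu = 12, area to the right 0.05
CHI2_12_RIGHT_95 = 5.226           # exam chi-squared table, nu = 12, area to the right 0.95
sigma_ci = (math.sqrt(ss_error / CHI2_12_RIGHT_05),
            math.sqrt(ss_error / CHI2_12_RIGHT_95))
```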


8. Consider $\{A, B, C\} \times \{U, V\}$ coded with three dummy variables: $d_1$, $d_2$ and $d_3$

$$d_1 = \begin{cases} 1 & B \\ 0 & B' \end{cases} \qquad d_2 = \begin{cases} 1 & C \\ 0 & C' \end{cases} \qquad d_3 = \begin{cases} 1 & V \\ 0 & V' \end{cases}$$

$$Z = \beta_0 + \beta_1 d_1 + \beta_2 d_2 + \beta_3 d_3 + \beta_4 d_1 d_3 + \beta_5 d_2 d_3$$

(a) Explain why $d_1 d_3$ and $d_2 d_3$ are included in the model, and explain why $d_1 d_2$ is not included.

(b) State the six means in terms of the beta coefficients; and give the relationships between difference of means and any contrast. Include each beta coefficient as either a mean, difference of means or contrast.
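To see how the six cell means map onto the coefficients, one can enumerate the dummy codes for every cell. The sketch below assumes the usual 0/1 coding with $A$ and $U$ as the reference levels (the indicator values in the printed exam are partly garbled, so this is a reconstruction), and the names `b0` to `b5` are placeholders for $\beta_0$ to $\beta_5$.

```python
from itertools import product


def design_row(first: str, second: str) -> list:
    """Return [1, d1, d2, d3, d1*d3, d2*d3] for a cell of {A,B,C} x {U,V}."""
    d1 = 1 if first == "B" else 0
    d2 = 1 if first == "C" else 0
    d3 = 1 if second == "V" else 0
    return [1, d1, d2, d3, d1 * d3, d2 * d3]


def cell_mean_expr(first: str, second: str) -> str:
    """Write the cell mean as a sum of the coefficients that are switched on."""
    row = design_row(first, second)
    return " + ".join(f"b{i}" for i, used in enumerate(row) if used)


cell_means = {
    (first, second): cell_mean_expr(first, second)
    for first, second in product("ABC", "UV")
}

# d1 * d2 is identically zero because a unit cannot be in both B and C.
d1_d2_products = {design_row(f, s)[1] * design_row(f, s)[2] for f, s in product("ABC", "UV")}
```

For example, `cell_means[("B", "V")]` evaluates to `"b0 + b1 + b3 + b4"`, which shows `b4` acting as an interaction contrast rather than a simple difference of two means.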


9. The table below shows the results of using all subset selection for choosing a best model from all possible regressions with four predictor variables $X_1, X_2, X_3$ and $X_4$. Five different criteria are considered. Please make a recommendation of which model to choose based on the results below and explain your reason.

| Variables in the model | $R^2$ | $R^2_{adj}$ | $C_p$ | AIC | BIC |
|---|---|---|---|---|---|
| 1 | 0.061 | 0.043 | 141.16 | -77.08 | -73.10 |
| 2 | 0.221 | 0.206 | 108.56 | -87.18 | -83.20 |
| 3 | 0.428 | 0.417 | 66.49 | -103.83 | -99.85 |
| 4 | 0.422 | 0.410 | 67.71 | -103.26 | -99.28 |
| 12 | 0.263 | 0.234 | 102.03 | -88.16 | -82.20 |
| 13 | 0.549 | 0.531 | 43.85 | -114.66 | -108.69 |
| 14 | 0.430 | 0.408 | 67.97 | -102.07 | -96.10 |
| 23 | 0.663 | 0.650 | 20.52 | -130.48 | -124.52 |
| 24 | 0.483 | 0.463 | 57.21 | -107.32 | -101.36 |
| 34 | 0.599 | 0.584 | 33.50 | -121.11 | -115.15 |
| 123 | 0.757 | 0.743 | 3.39 | -146.16 | -138.20 |
| 124 | 0.487 | 0.456 | 58.39 | -105.75 | -97.79 |
| 134 | 0.612 | 0.589 | 32.93 | -120.84 | -112.89 |
| 234 | 0.718 | 0.701 | 11.42 | -138.02 | -130.07 |
| 1234 | 0.759 | 0.740 | 5.00 | -144.59 | -134.64 |
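A quick way to compare the criteria is to find which model each one prefers. The sketch below simply takes the largest $R^2$ and $R^2_{adj}$ and the smallest $C_p$, AIC and BIC; using the smallest $C_p$ is a simplification, since the usual guideline also asks for $C_p$ close to the number of parameters.

```python
# (variables, R2, adjusted R2, Cp, AIC, BIC), copied from the table above
models = [
    ("1", 0.061, 0.043, 141.16, -77.08, -73.10),
    ("2", 0.221, 0.206, 108.56, -87.18, -83.20),
    ("3", 0.428, 0.417, 66.49, -103.83, -99.85),
    ("4", 0.422, 0.410, 67.71, -103.26, -99.28),
    ("12", 0.263, 0.234, 102.03, -88.16, -82.20),
    ("13", 0.549, 0.531, 43.85, -114.66, -108.69),
    ("14", 0.430, 0.408, 67.97, -102.07, -96.10),
    ("23", 0.663, 0.650, 20.52, -130.48, -124.52),
    ("24", 0.483, 0.463, 57.21, -107.32, -101.36),
    ("34", 0.599, 0.584, 33.50, -121.11, -115.15),
    ("123", 0.757, 0.743, 3.39, -146.16, -138.20),
    ("124", 0.487, 0.456, 58.39, -105.75, -97.79),
    ("134", 0.612, 0.589, 32.93, -120.84, -112.89),
    ("234", 0.718, 0.701, 11.42, -138.02, -130.07),
    ("1234", 0.759, 0.740, 5.00, -144.59, -134.64),
]

# Preferred model under each criterion (max for R2-type, min for the rest).
best = {
    "R2": max(models, key=lambda m: m[1])[0],
    "adjR2": max(models, key=lambda m: m[2])[0],
    "Cp": min(models, key=lambda m: m[3])[0],
    "AIC": min(models, key=lambda m: m[4])[0],
    "BIC": min(models, key=lambda m: m[5])[0],
}
```

Plain $R^2$ never decreases when a variable is added, so it always favors the full model and is not a useful tiebreaker on its own.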


Part B: Linear Statistical Model

10. Suppose we will conduct an experiment with a completely randomized design. We are interested in understanding the effects of two treatment factors A and B on the mean of the response variable. Each of these factors has three levels. We will assume that there is no interaction between the factors and will model the response variable as

$$y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}, \quad i = 1, 2, 3;\ j = 1, 2, 3;\ k = 1, \ldots, n_{ij}$$

where $y_{ijk}$ denotes the response for the $k$th experimental unit treated with level $i$ of factor A and level $j$ of factor B; $\mu, \alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2, \beta_3$ represent unknown parameters; the $\varepsilon_{ijk}$ are i.i.d. random variables with mean 0 and unknown positive variance $\sigma^2$; and $n_{ij}$ denotes the number of experimental units treated with level $i$ of factor A and level $j$ of factor B ($i = 1, 2, 3$; $j = 1, 2, 3$).

Based on available resources we are initially considering an experimental design where $n_{11} = n_{13} = n_{23} = n_{32} = 4$ and $n_{ij} = 0$ for all other combinations of $i$ and $j$.

(a) To understand the effects of factor A on the response, we are interested in estimating linear functions of the form $\alpha_i - \alpha_{i'}$ for $i < i'$. List all such functions that would be estimable under the initially proposed design.


(b) To understand the effects of factor B on the response, we are interested in estimating linear functions of the form $\beta_j - \beta_{j'}$ for $j < j'$. List all such functions that would be estimable under the initially proposed design.

(c) Assume that the initially planned observations ($n_{11} = n_{13} = n_{23} = n_{32} = 4$) are essentially free but that any additional observation from an experimental unit treated with level $i$ of factor A and level $j$ of factor B will cost $10 \times i + 100 \times j$ dollars. If we wish to be able to estimate all possible linear functions of the form $\alpha_i - \alpha_{i'}$ for $i < i'$ and $\beta_j - \beta_{j'}$ for $j < j'$ at minimal cost, how many additional observations are needed and at which levels of factor A and B should the observation(s) be taken?
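Estimability in the additive two-way model can be checked with the standard connectedness criterion: treat each A level and each B level as a node, join level $i$ of A to level $j$ of B whenever $n_{ij} > 0$, and a difference such as $\alpha_i - \alpha_{i'}$ is estimable exactly when the two levels fall in the same connected component. The sketch below applies that criterion with a small union-find; the helper names are illustrative, and the search in the last step only considers adding a single new cell.

```python
from itertools import combinations, product

LEVELS = (1, 2, 3)


def make_find(cells):
    """Build a union-find over ('A', i) and ('B', j) nodes linked by observed cells."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for i, j in cells:
        parent[find(("A", i))] = find(("B", j))
    return find


def estimable_pairs(cells, factor):
    """Pairs (p, q), p < q, whose difference is estimable for factor 'A' or 'B'."""
    find = make_find(cells)
    return [(p, q) for p, q in combinations(LEVELS, 2)
            if find((factor, p)) == find((factor, q))]


def cost(i, j):
    """Cost in dollars of one extra observation at (i, j), from part (c)."""
    return 10 * i + 100 * j


initial = [(1, 1), (1, 3), (2, 3), (3, 2)]   # cells with n_ij = 4

# Cheapest single extra cell that makes every pairwise difference estimable.
candidates = [c for c in product(LEVELS, repeat=2) if c not in initial]
connecting = [c for c in candidates
              if len(estimable_pairs(initial + [c], "A")) == 3
              and len(estimable_pairs(initial + [c], "B")) == 3]
cheapest = min(connecting, key=lambda c: cost(*c))
```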


11. In the calibration of a scientific instrument, “true” values x are known and produce experimental readings y on the instrument. Suppose that we are willing to assume that the mean value of y is proportional to x , and it is sensible to model experimental readings using the model given as

$$y_i = \beta x_i + \varepsilon_i$$

where $E(\varepsilon_i) = 0$ for $i = 1, \ldots, n$. A particular calibration experiment produces $n = 4$ data points as given below

| $x$ | 3 | 4 | 5 | 6 |
|---|---|---|---|---|
| $y$ | 3 | 6 | 11 | 14 |

Initially, we assume that $\mathrm{Var}(\varepsilon_i) = \sigma^2$ for $i = 1, \ldots, n$.

(a) Find $\hat{Y} = P_X Y$ and $\mathrm{Var}(\hat{Y})$. If the “most influential” observation in the data in fitting the linear model here is defined as the one with the largest variance for $\hat{y}_i$, $i = 1, \ldots, 4$, then which of the 4 observations is “most influential”?

(b) Give 90% two-sided confidence limits for σ in the normal version of this model.


(c) Give 90% two-sided prediction limits for a new observation $y^*$ at $x = 10$.

Now suppose that it is plausible that not only is the mean value of $y$ proportional to $x$, but that the standard deviation of $y$ is also proportional to $x$, i.e. $\mathrm{Var}(\varepsilon_i) = \sigma^2 x_i^2$.

(d) Find the generalized least squares estimate of $\beta$, denoted by $\hat{\beta}_{GLS}$, and the estimated standard deviation of $\hat{\beta}_{GLS}$ under the new model (variance) assumptions.
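The calculations for this no-intercept model are short enough to do by hand, and the sketch below mirrors them. It is illustrative only: the variable names are hypothetical, the chi-squared and t values are taken from the tables supplied with the exam ($\nu = 3$), and for part (d) it uses the fact that with $\mathrm{Var}(\varepsilon_i) = \sigma^2 x_i^2$ the weighted fit reduces to averaging the ratios $y_i / x_i$.

```python
import math

x = [3, 4, 5, 6]
y = [3, 6, 11, 14]
n = len(x)

# Ordinary least squares through the origin.
sxx = sum(v * v for v in x)
beta_ols = sum(a * b for a, b in zip(x, y)) / sxx
fitted = [beta_ols * v for v in x]

# (a) Var(yhat_i) = sigma^2 * x_i^2 / sum(x^2); the largest leverage is most influential.
leverage = [v * v / sxx for v in x]
most_influential = leverage.index(max(leverage)) + 1   # 1-based observation number

# (b) 90% limits for sigma from SSE / sigma^2 ~ chi-squared(n - 1).
sse = sum((b - f) ** 2 for b, f in zip(y, fitted))
s2 = sse / (n - 1)
CHI2_3_RIGHT_05, CHI2_3_RIGHT_95 = 7.815, 0.352        # exam chi-squared table, nu = 3
sigma_ci = (math.sqrt(sse / CHI2_3_RIGHT_05), math.sqrt(sse / CHI2_3_RIGHT_95))

# (c) 90% prediction limits at x = 10.
x_new = 10
T_3_TWO_TAIL_10 = 2.35                                 # exam t table, nu = 3, two-tail 0.10
half_width = T_3_TWO_TAIL_10 * math.sqrt(s2 * (1 + x_new**2 / sxx))
pred_limits = (beta_ols * x_new - half_width, beta_ols * x_new + half_width)

# (d) GLS with weights 1 / x_i^2: the estimate is the mean of the ratios y_i / x_i.
ratios = [b / a for a, b in zip(x, y)]
beta_gls = sum(ratios) / n
s2_gls = sum((r - beta_gls) ** 2 for r in ratios) / (n - 1)
se_beta_gls = math.sqrt(s2_gls / n)
```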


12. In the study of the precision of a measuring device, each of $a = 2$ widgets (call “widgets” levels of Factor A) was measured $m = 2$ times by each of $b = 5$ different technicians (call “technicians” levels of Factor B). The resulting data can be thought of as having $2 \times 5$ (complete, balanced, replicated) factorial structure. With

$y_{ijk}$ = measurement $k$ by technician $j$ on widget $i$,

model as

$$y_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{ijk}$$

where $\mu$ is the only fixed effect, all the random effects are independent, the $\alpha_i \sim N(0, \sigma_\alpha^2)$, the $\beta_j \sim N(0, \sigma_\beta^2)$, the $\alpha\beta_{ij} \sim N(0, \sigma_{\alpha\beta}^2)$, and the $\varepsilon_{ijk} \sim N(0, \sigma^2)$. Standard two-way factorial ANOVA calculations were done and produced the following ANOVA table.

| Source | DF | Mean Square | Expected MS |
|---|---|---|---|
| A | 1 | 20.0 | $\sigma^2 + 2\sigma_{\alpha\beta}^2 + 10\sigma_\alpha^2$ |
| B | 4 | 10.0 | $\sigma^2 + 2\sigma_{\alpha\beta}^2 + 4\sigma_\beta^2$ |
| AB | 4 | 6.0 | $\sigma^2 + 2\sigma_{\alpha\beta}^2$ |
| Error | 10 | 2.0 | $\sigma^2$ |

(a) A quantity of serious interest in this context is $\sigma_\beta^2 + \sigma_{\alpha\beta}^2$ (which is called a measure of measurement “reproducibility”). Find a sensible point estimate of this quantity.

(b) Calculate approximate 90% confidence limits for $\sigma_\beta^2 + \sigma_{\alpha\beta}^2$.
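One common route is to write the target as a linear combination of mean squares and then apply the Satterthwaite approximation. The sketch below follows that route as an assumption about the intended method; it rounds the approximate degrees of freedom to the nearest integer so that the chi-squared table supplied with the exam can be used.

```python
ms_b, df_b = 10.0, 4     # technicians (B)
ms_ab, df_ab = 6.0, 4    # interaction (AB)
ms_e, df_e = 2.0, 10     # error
a_levels, m_reps = 2, 2  # widgets and repeat measurements

# From the expected mean squares:
#   sigma_beta^2 = (MSB - MSAB) / (a*m),  sigma_alphabeta^2 = (MSAB - MSE) / m
# so the sum is c_b*MSB + c_ab*MSAB + c_e*MSE with the coefficients below.
c_b = 1 / (a_levels * m_reps)
c_ab = 1 / m_reps - 1 / (a_levels * m_reps)
c_e = -1 / m_reps

# (a) Point estimate of reproducibility.
estimate = c_b * ms_b + c_ab * ms_ab + c_e * ms_e

# (b) Satterthwaite approximate degrees of freedom.
df_satt = estimate**2 / (
    (c_b * ms_b) ** 2 / df_b + (c_ab * ms_ab) ** 2 / df_ab + (c_e * ms_e) ** 2 / df_e
)
nu = round(df_satt)      # rounded so the printed table can be used

CHI2_4_RIGHT_05, CHI2_4_RIGHT_95 = 9.488, 0.711   # exam chi-squared table, nu = 4
limits = (nu * estimate / CHI2_4_RIGHT_05, nu * estimate / CHI2_4_RIGHT_95)
```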


Standard Normal Probability Distribution P(Z<z)

| z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | 0.5199 | 0.5239 | 0.5279 | 0.5319 | 0.5359 |
| 0.1 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | 0.5596 | 0.5636 | 0.5675 | 0.5714 | 0.5753 |
| 0.2 | 0.5793 | 0.5832 | 0.5871 | 0.5910 | 0.5948 | 0.5987 | 0.6026 | 0.6064 | 0.6103 | 0.6141 |
| 0.3 | 0.6179 | 0.6217 | 0.6255 | 0.6293 | 0.6331 | 0.6368 | 0.6406 | 0.6443 | 0.6480 | 0.6517 |
| 0.4 | 0.6554 | 0.6591 | 0.6628 | 0.6664 | 0.6700 | 0.6736 | 0.6772 | 0.6808 | 0.6844 | 0.6879 |
| 0.5 | 0.6915 | 0.6950 | 0.6985 | 0.7019 | 0.7054 | 0.7088 | 0.7123 | 0.7157 | 0.7190 | 0.7224 |
| 0.6 | 0.7257 | 0.7291 | 0.7324 | 0.7357 | 0.7389 | 0.7422 | 0.7454 | 0.7486 | 0.7517 | 0.7549 |
| 0.7 | 0.7580 | 0.7611 | 0.7642 | 0.7673 | 0.7704 | 0.7734 | 0.7764 | 0.7794 | 0.7823 | 0.7852 |
| 0.8 | 0.7881 | 0.7910 | 0.7939 | 0.7967 | 0.7995 | 0.8023 | 0.8051 | 0.8078 | 0.8106 | 0.8133 |
| 0.9 | 0.8159 | 0.8186 | 0.8212 | 0.8238 | 0.8264 | 0.8289 | 0.8315 | 0.8340 | 0.8365 | 0.8389 |
| 1.0 | 0.8413 | 0.8438 | 0.8461 | 0.8485 | 0.8508 | 0.8531 | 0.8554 | 0.8577 | 0.8599 | 0.8621 |
| 1.1 | 0.8643 | 0.8665 | 0.8686 | 0.8708 | 0.8729 | 0.8749 | 0.8770 | 0.8790 | 0.8810 | 0.8830 |
| 1.2 | 0.8849 | 0.8869 | 0.8888 | 0.8907 | 0.8925 | 0.8944 | 0.8962 | 0.8980 | 0.8997 | 0.9015 |
| 1.3 | 0.9032 | 0.9049 | 0.9066 | 0.9082 | 0.9099 | 0.9115 | 0.9131 | 0.9147 | 0.9162 | 0.9177 |
| 1.4 | 0.9192 | 0.9207 | 0.9222 | 0.9236 | 0.9251 | 0.9265 | 0.9279 | 0.9292 | 0.9306 | 0.9319 |
| 1.5 | 0.9332 | 0.9345 | 0.9357 | 0.9370 | 0.9382 | 0.9394 | 0.9406 | 0.9418 | 0.9429 | 0.9441 |
| 1.6 | 0.9452 | 0.9463 | 0.9474 | 0.9484 | 0.9495 | 0.9505 | 0.9515 | 0.9525 | 0.9535 | 0.9545 |
| 1.7 | 0.9554 | 0.9564 | 0.9573 | 0.9582 | 0.9591 | 0.9599 | 0.9608 | 0.9616 | 0.9625 | 0.9633 |
| 1.8 | 0.9641 | 0.9649 | 0.9656 | 0.9664 | 0.9671 | 0.9678 | 0.9686 | 0.9693 | 0.9699 | 0.9706 |
| 1.9 | 0.9713 | 0.9719 | 0.9726 | 0.9732 | 0.9738 | 0.9744 | 0.9750 | 0.9756 | 0.9761 | 0.9767 |
| 2.0 | 0.9772 | 0.9778 | 0.9783 | 0.9788 | 0.9793 | 0.9798 | 0.9803 | 0.9808 | 0.9812 | 0.9817 |
| 2.1 | 0.9821 | 0.9826 | 0.9830 | 0.9834 | 0.9838 | 0.9842 | 0.9846 | 0.9850 | 0.9854 | 0.9857 |
| 2.2 | 0.9861 | 0.9864 | 0.9868 | 0.9871 | 0.9875 | 0.9878 | 0.9881 | 0.9884 | 0.9887 | 0.9890 |
| 2.3 | 0.9893 | 0.9896 | 0.9898 | 0.9901 | 0.9904 | 0.9906 | 0.9909 | 0.9911 | 0.9913 | 0.9916 |
| 2.4 | 0.9918 | 0.9920 | 0.9922 | 0.9925 | 0.9927 | 0.9929 | 0.9931 | 0.9932 | 0.9934 | 0.9936 |
| 2.5 | 0.9938 | 0.9940 | 0.9941 | 0.9943 | 0.9945 | 0.9946 | 0.9948 | 0.9949 | 0.9951 | 0.9952 |
| 2.6 | 0.9953 | 0.9955 | 0.9956 | 0.9957 | 0.9959 | 0.9960 | 0.9961 | 0.9962 | 0.9963 | 0.9964 |
| 2.7 | 0.9965 | 0.9966 | 0.9967 | 0.9968 | 0.9969 | 0.9970 | 0.9971 | 0.9972 | 0.9973 | 0.9974 |
| 2.8 | 0.9974 | 0.9975 | 0.9976 | 0.9977 | 0.9977 | 0.9978 | 0.9979 | 0.9979 | 0.9980 | 0.9981 |
| 2.9 | 0.9981 | 0.9982 | 0.9982 | 0.9983 | 0.9984 | 0.9984 | 0.9985 | 0.9985 | 0.9986 | 0.9986 |
| 3.0 | 0.9987 | 0.9987 | 0.9987 | 0.9988 | 0.9988 | 0.9989 | 0.9989 | 0.9989 | 0.9990 | 0.9990 |
| 3.1 | 0.9990 | 0.9991 | 0.9991 | 0.9991 | 0.9992 | 0.9992 | 0.9992 | 0.9992 | 0.9993 | 0.9993 |
| 3.2 | 0.9993 | 0.9993 | 0.9994 | 0.9994 | 0.9994 | 0.9994 | 0.9994 | 0.9995 | 0.9995 | 0.9995 |
| 3.3 | 0.9995 | 0.9995 | 0.9995 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9997 |
| 3.4 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9998 |


t-distribution

| one-tail | 0.005 | 0.010 | 0.025 | 0.050 | 0.100 | 0.150 | 0.200 | 0.250 | 0.300 | 0.350 |
|---|---|---|---|---|---|---|---|---|---|---|
| two-tail | 0.010 | 0.020 | 0.050 | 0.100 | 0.200 | 0.300 | 0.400 | 0.500 | 0.600 | 0.700 |
| $\nu$ \ c | 0.990 | 0.980 | 0.950 | 0.900 | 0.800 | 0.700 | 0.600 | 0.500 | 0.400 | 0.300 |
| 1 | 63.66 | 31.82 | 12.71 | 6.31 | 3.08 | 1.96 | 1.38 | 1.00 | 0.73 | 0.51 |
| 2 | 9.92 | 6.96 | 4.30 | 2.92 | 1.89 | 1.39 | 1.06 | 0.82 | 0.62 | 0.44 |
| 3 | 5.84 | 4.54 | 3.18 | 2.35 | 1.64 | 1.25 | 0.98 | 0.76 | 0.58 | 0.42 |
| 4 | 4.60 | 3.75 | 2.78 | 2.13 | 1.53 | 1.19 | 0.94 | 0.74 | 0.57 | 0.41 |
| 5 | 4.03 | 3.36 | 2.57 | 2.02 | 1.48 | 1.16 | 0.92 | 0.73 | 0.56 | 0.41 |
| 6 | 3.71 | 3.14 | 2.45 | 1.94 | 1.44 | 1.13 | 0.91 | 0.72 | 0.55 | 0.40 |
| 7 | 3.50 | 3.00 | 2.36 | 1.89 | 1.41 | 1.12 | 0.90 | 0.71 | 0.55 | 0.40 |
| 8 | 3.36 | 2.90 | 2.31 | 1.86 | 1.40 | 1.11 | 0.89 | 0.71 | 0.55 | 0.40 |
| 9 | 3.25 | 2.82 | 2.26 | 1.83 | 1.38 | 1.10 | 0.88 | 0.70 | 0.54 | 0.40 |
| 10 | 3.17 | 2.76 | 2.23 | 1.81 | 1.37 | 1.09 | 0.88 | 0.70 | 0.54 | 0.40 |
| 11 | 3.11 | 2.72 | 2.20 | 1.80 | 1.36 | 1.09 | 0.88 | 0.70 | 0.54 | 0.40 |
| 12 | 3.05 | 2.68 | 2.18 | 1.78 | 1.36 | 1.08 | 0.87 | 0.70 | 0.54 | 0.39 |
| 13 | 3.01 | 2.65 | 2.16 | 1.77 | 1.35 | 1.08 | 0.87 | 0.69 | 0.54 | 0.39 |
| 14 | 2.98 | 2.62 | 2.14 | 1.76 | 1.35 | 1.08 | 0.87 | 0.69 | 0.54 | 0.39 |
| 15 | 2.95 | 2.60 | 2.13 | 1.75 | 1.34 | 1.07 | 0.87 | 0.69 | 0.54 | 0.39 |
| 16 | 2.92 | 2.58 | 2.12 | 1.75 | 1.34 | 1.07 | 0.86 | 0.69 | 0.54 | 0.39 |
| 17 | 2.90 | 2.57 | 2.11 | 1.74 | 1.33 | 1.07 | 0.86 | 0.69 | 0.53 | 0.39 |
| 18 | 2.88 | 2.55 | 2.10 | 1.73 | 1.33 | 1.07 | 0.86 | 0.69 | 0.53 | 0.39 |
| 19 | 2.86 | 2.54 | 2.09 | 1.73 | 1.33 | 1.07 | 0.86 | 0.69 | 0.53 | 0.39 |
| 20 | 2.85 | 2.53 | 2.09 | 1.72 | 1.33 | 1.06 | 0.86 | 0.69 | 0.53 | 0.39 |
| 21 | 2.83 | 2.52 | 2.08 | 1.72 | 1.32 | 1.06 | 0.86 | 0.69 | 0.53 | 0.39 |
| 22 | 2.82 | 2.51 | 2.07 | 1.72 | 1.32 | 1.06 | 0.86 | 0.69 | 0.53 | 0.39 |
| 23 | 2.81 | 2.50 | 2.07 | 1.71 | 1.32 | 1.06 | 0.86 | 0.69 | 0.53 | 0.39 |
| 24 | 2.80 | 2.49 | 2.06 | 1.71 | 1.32 | 1.06 | 0.86 | 0.68 | 0.53 | 0.39 |
| 25 | 2.79 | 2.49 | 2.06 | 1.71 | 1.32 | 1.06 | 0.86 | 0.68 | 0.53 | 0.39 |
| 26 | 2.78 | 2.48 | 2.06 | 1.71 | 1.31 | 1.06 | 0.86 | 0.68 | 0.53 | 0.39 |
| 27 | 2.77 | 2.47 | 2.05 | 1.70 | 1.31 | 1.06 | 0.86 | 0.68 | 0.53 | 0.39 |
| 28 | 2.76 | 2.47 | 2.05 | 1.70 | 1.31 | 1.06 | 0.85 | 0.68 | 0.53 | 0.39 |
| 29 | 2.76 | 2.46 | 2.05 | 1.70 | 1.31 | 1.06 | 0.85 | 0.68 | 0.53 | 0.39 |
| 30 | 2.75 | 2.46 | 2.04 | 1.70 | 1.31 | 1.05 | 0.85 | 0.68 | 0.53 | 0.39 |


Chi-squared (given degree of freedom and level of significance) AREA TO THE RIGHT

| $\nu$ | 0.995 | 0.990 | 0.975 | 0.950 | 0.900 | 0.100 | 0.050 | 0.025 | 0.010 | 0.005 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.000 | 0.000 | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879 |
| 2 | 0.010 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597 |
| 3 | 0.072 | 0.115 | 0.216 | 0.352 | 0.584 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |
| 4 | 0.207 | 0.297 | 0.484 | 0.711 | 1.064 | 7.779 | 9.488 | 11.143 | 13.277 | 14.860 |
| 5 | 0.412 | 0.554 | 0.831 | 1.145 | 1.610 | 9.236 | 11.070 | 12.833 | 15.086 | 16.750 |
| 6 | 0.676 | 0.872 | 1.237 | 1.635 | 2.204 | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |
| 7 | 0.989 | 1.239 | 1.690 | 2.167 | 2.833 | 12.017 | 14.067 | 16.013 | 18.475 | 20.278 |
| 8 | 1.344 | 1.646 | 2.180 | 2.733 | 3.490 | 13.362 | 15.507 | 17.535 | 20.090 | 21.955 |
| 9 | 1.735 | 2.088 | 2.700 | 3.325 | 4.168 | 14.684 | 16.919 | 19.023 | 21.666 | 23.589 |
| 10 | 2.156 | 2.558 | 3.247 | 3.940 | 4.865 | 15.987 | 18.307 | 20.483 | 23.209 | 25.188 |
| 11 | 2.603 | 3.053 | 3.816 | 4.575 | 5.578 | 17.275 | 19.675 | 21.920 | 24.725 | 26.757 |
| 12 | 3.074 | 3.571 | 4.404 | 5.226 | 6.304 | 18.549 | 21.026 | 23.337 | 26.217 | 28.300 |
| 13 | 3.565 | 4.107 | 5.009 | 5.892 | 7.042 | 19.812 | 22.362 | 24.736 | 27.688 | 29.819 |
| 14 | 4.075 | 4.660 | 5.629 | 6.571 | 7.790 | 21.064 | 23.685 | 26.119 | 29.141 | 31.319 |
| 15 | 4.601 | 5.229 | 6.262 | 7.261 | 8.547 | 22.307 | 24.996 | 27.488 | 30.578 | 32.801 |
| 16 | 5.142 | 5.812 | 6.908 | 7.962 | 9.312 | 23.542 | 26.296 | 28.845 | 32.000 | 34.267 |
| 17 | 5.697 | 6.408 | 7.564 | 8.672 | 10.085 | 24.769 | 27.587 | 30.191 | 33.409 | 35.718 |
| 18 | 6.265 | 7.015 | 8.231 | 9.390 | 10.865 | 25.989 | 28.869 | 31.526 | 34.805 | 37.156 |
| 19 | 6.844 | 7.633 | 8.907 | 10.117 | 11.651 | 27.204 | 30.144 | 32.852 | 36.191 | 38.582 |
| 20 | 7.434 | 8.260 | 9.591 | 10.851 | 12.443 | 28.412 | 31.410 | 34.170 | 37.566 | 39.997 |
| 21 | 8.034 | 8.897 | 10.283 | 11.591 | 13.240 | 29.615 | 32.671 | 35.479 | 38.932 | 41.401 |
| 22 | 8.643 | 9.542 | 10.982 | 12.338 | 14.041 | 30.813 | 33.924 | 36.781 | 40.289 | 42.796 |
| 23 | 9.260 | 10.196 | 11.689 | 13.091 | 14.848 | 32.007 | 35.172 | 38.076 | 41.638 | 44.181 |
| 24 | 9.886 | 10.856 | 12.401 | 13.848 | 15.659 | 33.196 | 36.415 | 39.364 | 42.980 | 45.559 |
| 25 | 10.520 | 11.524 | 13.120 | 14.611 | 16.473 | 34.382 | 37.652 | 40.646 | 44.314 | 46.928 |
| 26 | 11.160 | 12.198 | 13.844 | 15.379 | 17.292 | 35.563 | 38.885 | 41.923 | 45.642 | 48.290 |
| 27 | 11.808 | 12.879 | 14.573 | 16.151 | 18.114 | 36.741 | 40.113 | 43.195 | 46.963 | 49.645 |
| 28 | 12.461 | 13.565 | 15.308 | 16.928 | 18.939 | 37.916 | 41.337 | 44.461 | 48.278 | 50.993 |
| 29 | 13.121 | 14.256 | 16.047 | 17.708 | 19.768 | 39.087 | 42.557 | 45.722 | 49.588 | 52.336 |
| 30 | 13.787 | 14.953 | 16.791 | 18.493 | 20.599 | 40.256 | 43.773 | 46.979 | 50.892 | 53.672 |
| 31 | 14.458 | 15.655 | 17.539 | 19.281 | 21.434 | 41.422 | 44.985 | 48.232 | 52.191 | 55.003 |
| 32 | 15.134 | 16.362 | 18.291 | 20.072 | 22.271 | 42.585 | 46.194 | 49.480 | 53.486 | 56.328 |
| 33 | 15.815 | 17.074 | 19.047 | 20.867 | 23.110 | 43.745 | 47.400 | 50.725 | 54.776 | 57.648 |
| 34 | 16.501 | 17.789 | 19.806 | 21.664 | 23.952 | 44.903 | 48.602 | 51.966 | 56.061 | 58.964 |
| 35 | 17.192 | 18.509 | 20.569 | 22.465 | 24.797 | 46.059 | 49.802 | 53.203 | 57.342 | 60.275 |


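F, alpha=0.01

| $\nu_1$ \ $\nu_2$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4052.2 | 98.5 | 34.1 | 21.2 | 16.3 | 13.7 | 12.2 | 11.3 | 10.6 | 10.0 |
| 2 | 4999.5 | 99.0 | 30.8 | 18.0 | 13.3 | 10.9 | 9.5 | 8.6 | 8.0 | 7.6 |
| 3 | 5403.4 | 99.2 | 29.5 | 16.7 | 12.1 | 9.8 | 8.5 | 7.6 | 7.0 | 6.6 |
| 4 | 5624.6 | 99.2 | 28.7 | 16.0 | 11.4 | 9.1 | 7.8 | 7.0 | 6.4 | 6.0 |
| 5 | 5763.6 | 99.3 | 28.2 | 15.5 | 11.0 | 8.7 | 7.5 | 6.6 | 6.1 | 5.6 |
| 6 | 5859.0 | 99.3 | 27.9 | 15.2 | 10.7 | 8.5 | 7.2 | 6.4 | 5.8 | 5.4 |
| 7 | 5928.4 | 99.4 | 27.7 | 15.0 | 10.5 | 8.3 | 7.0 | 6.2 | 5.6 | 5.2 |
| 8 | 5981.1 | 99.4 | 27.5 | 14.8 | 10.3 | 8.1 | 6.8 | 6.0 | 5.5 | 5.1 |
| 9 | 6022.5 | 99.4 | 27.3 | 14.7 | 10.2 | 8.0 | 6.7 | 5.9 | 5.4 | 4.9 |
| 10 | 6055.8 | 99.4 | 27.2 | 14.5 | 10.1 | 7.9 | 6.6 | 5.8 | 5.3 | 4.8 |

F, alpha=0.01

| $\nu_1$ \ $\nu_2$ | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 9.6 | 9.3 | 9.1 | 8.9 | 8.7 | 8.5 | 8.4 | 8.3 | 8.2 | 8.1 |
| 2 | 7.2 | 6.9 | 6.7 | 6.5 | 6.4 | 6.2 | 6.1 | 6.0 | 5.9 | 5.8 |
| 3 | 6.2 | 6.0 | 5.7 | 5.6 | 5.4 | 5.3 | 5.2 | 5.1 | 5.0 | 4.9 |
| 4 | 5.7 | 5.4 | 5.2 | 5.0 | 4.9 | 4.8 | 4.7 | 4.6 | 4.5 | 4.4 |
| 5 | 5.3 | 5.1 | 4.9 | 4.7 | 4.6 | 4.4 | 4.3 | 4.2 | 4.2 | 4.1 |
| 6 | 5.1 | 4.8 | 4.6 | 4.5 | 4.3 | 4.2 | 4.1 | 4.0 | 3.9 | 3.9 |
| 7 | 4.9 | 4.6 | 4.4 | 4.3 | 4.1 | 4.0 | 3.9 | 3.8 | 3.8 | 3.7 |
| 8 | 4.7 | 4.5 | 4.3 | 4.1 | 4.0 | 3.9 | 3.8 | 3.7 | 3.6 | 3.6 |
| 9 | 4.6 | 4.4 | 4.2 | 4.0 | 3.9 | 3.8 | 3.7 | 3.6 | 3.5 | 3.5 |
| 10 | 4.5 | 4.3 | 4.1 | 3.9 | 3.8 | 3.7 | 3.6 | 3.5 | 3.4 | 3.4 |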


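F, alpha=0.05

| $\nu_1$ \ $\nu_2$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 161.4 | 18.5 | 10.1 | 7.7 | 6.6 | 6.0 | 5.6 | 5.3 | 5.1 | 5.0 |
| 2 | 199.5 | 19.0 | 9.6 | 6.9 | 5.8 | 5.1 | 4.7 | 4.5 | 4.3 | 4.1 |
| 3 | 215.7 | 19.2 | 9.3 | 6.6 | 5.4 | 4.8 | 4.3 | 4.1 | 3.9 | 3.7 |
| 4 | 224.6 | 19.2 | 9.1 | 6.4 | 5.2 | 4.5 | 4.1 | 3.8 | 3.6 | 3.5 |
| 5 | 230.2 | 19.3 | 9.0 | 6.3 | 5.1 | 4.4 | 4.0 | 3.7 | 3.5 | 3.3 |
| 6 | 234.0 | 19.3 | 8.9 | 6.2 | 5.0 | 4.3 | 3.9 | 3.6 | 3.4 | 3.2 |
| 7 | 236.8 | 19.4 | 8.9 | 6.1 | 4.9 | 4.2 | 3.8 | 3.5 | 3.3 | 3.1 |
| 8 | 238.9 | 19.4 | 8.8 | 6.0 | 4.8 | 4.1 | 3.7 | 3.4 | 3.2 | 3.1 |
| 9 | 240.5 | 19.4 | 8.8 | 6.0 | 4.8 | 4.1 | 3.7 | 3.4 | 3.2 | 3.0 |
| 10 | 241.9 | 19.4 | 8.8 | 6.0 | 4.7 | 4.1 | 3.6 | 3.3 | 3.1 | 3.0 |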

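F, alpha=0.05

| $\nu_1$ \ $\nu_2$ | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 161.4 | 18.5 | 10.1 | 7.7 | 6.6 | 6.0 | 5.6 | 5.3 | 5.1 | 5.0 |
| 2 | 199.5 | 19.0 | 9.6 | 6.9 | 5.8 | 5.1 | 4.7 | 4.5 | 4.3 | 4.1 |
| 3 | 215.7 | 19.2 | 9.3 | 6.6 | 5.4 | 4.8 | 4.3 | 4.1 | 3.9 | 3.7 |
| 4 | 224.6 | 19.2 | 9.1 | 6.4 | 5.2 | 4.5 | 4.1 | 3.8 | 3.6 | 3.5 |
| 5 | 230.2 | 19.3 | 9.0 | 6.3 | 5.1 | 4.4 | 4.0 | 3.7 | 3.5 | 3.3 |
| 6 | 234.0 | 19.3 | 8.9 | 6.2 | 5.0 | 4.3 | 3.9 | 3.6 | 3.4 | 3.2 |
| 7 | 236.8 | 19.4 | 8.9 | 6.1 | 4.9 | 4.2 | 3.8 | 3.5 | 3.3 | 3.1 |
| 8 | 238.9 | 19.4 | 8.8 | 6.0 | 4.8 | 4.1 | 3.7 | 3.4 | 3.2 | 3.1 |
| 9 | 240.5 | 19.4 | 8.8 | 6.0 | 4.8 | 4.1 | 3.7 | 3.4 | 3.2 | 3.0 |
| 10 | 241.9 | 19.4 | 8.8 | 6.0 | 4.7 | 4.1 | 3.6 | 3.3 | 3.1 | 3.0 |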


DEPARTMENT OF MATHEMATICS & STATISTICS

GRADUATE PROGRAM

QUALIFYING EXAMINATION I: APPLIED STATISTICS

Saturday, May 9th, 2015

Examiners: Drs. L. Lu and C. P. Tsokos

Answer any seven questions completely; at least one must be from Part B.

PART A: Statistical Analysis

1. The table below contains results of a study comparing radiation therapy with surgery in treating cancer of the larynx.

| | Cancer Controlled | Cancer Not Controlled |
|---|---|---|
| Surgery | 11 | 1 |
| Radiation therapy | 5 | 2 |

(a) Set up the hypotheses to determine whether there is a difference in the response to cancer treatment between the radiation therapy and surgery. Which statistical test is appropriate to test the hypotheses?

(b) Calculate the P-value of the test. What is your conclusion using $\alpha = 0.05$?
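With counts this small, one natural choice is Fisher's exact test, which conditions on the table margins and uses the hypergeometric distribution. The sketch below assumes that choice (the exam leaves the test to the candidate) and reports both a one-sided P-value and a two-sided P-value, the latter computed by summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb

a, b = 11, 1   # surgery: controlled, not controlled
c, d = 5, 2    # radiation therapy: controlled, not controlled

row1, row2 = a + b, c + d
col1 = a + c
n = row1 + row2


def prob(x: int) -> float:
    """Hypergeometric probability that x surgery patients are controlled, margins fixed."""
    return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)


# Values of the (surgery, controlled) cell that are compatible with the margins.
support = range(max(0, row1 - (n - col1)), min(row1, col1) + 1)

p_obs = prob(a)
p_one_sided = sum(prob(x) for x in support if x >= a)
p_two_sided = sum(prob(x) for x in support if prob(x) <= p_obs + 1e-12)
```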


2. Doctors from a certain research institute think that they have found a drug that can relieve flu symptoms better than other drugs currently available on the market. Ten sets of identical twins (sharing genetically similar immune systems) are asked to participate in a study for the drug efficacy. Among each set, one twin is given the new drug and the other twin is given the old drug. After 24 hours, they are asked to rate how they feel on a scale of 1 to 10 (1=horrible, 10=great). The data are summarized in the table below. Consider the rating scores as continuous and assume they follow normal distributions. Note the normal plot for the difference in the ratings of the new and old drugs for the ten sets of identical twins is examined as shown on the right side.

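| Twins | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| New drug | 6 | 5 | 7 | 6 | 8 | 7 | 9 | 4 | 8 | 7 | 6.7 | 1.5 |
| Old drug | 5 | 4 | 7 | 4 | 6 | 8 | 8 | 4 | 5 | 4 | 5.5 | 1.6 |
| Difference in rating (New-Old) | 1 | 1 | 0 | 2 | 2 | -1 | 1 | 0 | 3 | 3 | 1.2 | 1.3 |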

(a) Do you think the data suggest the new drug is a significant improvement over the old drug at a significance level of 0.05?

(b) Find a 95% prediction interval for the difference in the rating of the new and old drugs for a future sample of identical twins.

[Figure: Normal plot of difference in rating. Horizontal axis: Normal Score; vertical axis: Ordered Data.]


3. A plan was developed by an airline on the premise that 10% of its current customers would qualify for membership. A random sample of customers is to be surveyed to determine whether they qualify for membership under the plan.

(a) How many customers need to be sampled in order to obtain an estimate of the proportion within 3% at a 95% confidence level?

(b) Suppose a random sample of 500 customers was actually taken and 36 turned out to be qualified. Is the airline’s premise correct at a significance level of 0.05?

(c) What is the probability that the company’s premise will be judged correct at $\alpha = 0.05$ when in fact 7% of all current customers qualify?
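All three parts can be sketched with the normal approximation to the binomial. The assumptions here are not from the exam: the planning value for part (a) is the premise $p = 0.10$, the test in part (b) is two-sided, and part (c) is read as the Type II error probability of that same test with $n = 500$. The helper `phi` is an illustrative name for the standard normal CDF.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


Z_975 = 1.96   # standard normal quantile for a two-sided 0.05 level
p0 = 0.10      # the airline's premise

# (a) Sample size for a margin of error of 0.03, using p0 as the planning value.
margin = 0.03
n_required = math.ceil(Z_975**2 * p0 * (1 - p0) / margin**2)

# (b) z test of H0: p = 0.10 with 36 qualifying customers out of 500.
n, x = 500, 36
p_hat = x / n
se0 = math.sqrt(p0 * (1 - p0) / n)
z_stat = (p_hat - p0) / se0
reject_h0 = abs(z_stat) > Z_975

# (c) Probability of not rejecting H0 when the true proportion is 0.07.
p_true = 0.07
se1 = math.sqrt(p_true * (1 - p_true) / n)
lower, upper = p0 - Z_975 * se0, p0 + Z_975 * se0   # acceptance region for p_hat
beta = phi((upper - p_true) / se1) - phi((lower - p_true) / se1)
```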


4. Mercury is a naturally occurring contaminant in fish in freshwater lakes. In some lakes, the mercury concentration is sufficiently high that warnings are posted to limit the consumption of fish from that lake. The mercury concentration is influenced by many characteristics of lakes. The state of Maine collected information from a random sample of 110 lakes. The variables of interest included:

- mc: mercury concentration in fish
- elev: elevation of the lake
- area: area of the lake
- drain: size of the drainage basin for the lake
- runoff: annual runoff into the lake

Here are the parameter estimates and their standard errors when a multiple linear regression model with all the variables is fit to the data:

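| Parameter | Estimate | SE |
|---|---|---|
| $\beta_0$ | 0.758 | 0.144 |
| $\beta_{elev}$ | -0.211 | 0.061 |
| $\beta_{area}$ | -0.014 | 0.016 |
| $\beta_{drain}$ | 0.0137 | 0.029 |
| $\beta_{runoff}$ | -0.292 | 0.262 |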

(a) Complete the ANOVA table given below:

| Models | SSE | DF | MS |
|---|---|---|---|
| Model | 1.137 | | |
| Error | 7.515 | | |
| Total | 8.653 | | |

(b) Construct a test of $H_0: \beta_{elev} = \beta_{area} = \beta_{drain} = \beta_{runoff} = 0$ and draw a conclusion at $\alpha = 0.01$, if that is possible from the information provided. If not, indicate how you would construct the desired test.

(c) Construct a test of $H_0: \beta_{drain} = \beta_{runoff} = 0$ and draw a conclusion at $\alpha = 0.01$, if that is possible from the information provided. If not, indicate how you would construct the desired test.


(d) Construct a test of $H_0: \beta_{area} + \beta_{drain} = 0$ and draw a conclusion at $\alpha = 0.01$, if that is possible from the information provided. If not, indicate how you would construct the desired test.
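The arithmetic behind parts (a) and (b) can be sketched as below. This is an illustrative sketch rather than an official solution: it assumes $n = 110$ lakes and $p = 4$ predictors as stated in the problem, and all variable names are hypothetical. The F statistic would still have to be compared with a tabulated critical value.

```python
# Sketch: degrees of freedom, mean squares and overall F statistic.
# Assumes n = 110 lakes and p = 4 predictors (from the problem statement).
n, p = 110, 4
ss = {"Model": 1.137, "Error": 7.515}
df = {"Model": p, "Error": n - p - 1}
ms = {k: ss[k] / df[k] for k in ss}
f_stat = ms["Model"] / ms["Error"]

# t ratios (estimate / SE) for the individual slopes, from the estimates table
coef = {"elev": (-0.211, 0.061), "area": (-0.014, 0.016),
        "drain": (0.0137, 0.029), "runoff": (-0.292, 0.262)}
t_ratio = {name: est / se for name, (est, se) in coef.items()}

print(df, ms, round(f_stat, 3))
print({name: round(t, 2) for name, t in t_ratio.items()})
```

Parts (c) and (d) need sums of squares for reduced models or the covariance between estimates, neither of which appears in the tables above, so the sketch stops at the overall test.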

5. A commercial real estate company evaluates vacancy rates, square footage, rental rates, and operating expenses for commercial properties in a large metropolitan area in order to provide clients with quantitative information upon which to make rental decisions. The data were collected from 81 suburban commercial properties from five geographic areas. The variables in the data include the age (X1), operating expenses and taxes (X2), vacancy rates (X3), total square footage (X4), and rental rates (Y). The following tables (from R output) include the estimates of model parameters as well as the ANOVA table from fitting a multiple regression model with all four predictor variables. Answer the following questions at the $\alpha = 0.01$ significance level.

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.220e+01  5.780e-01  21.110  < 2e-16 ***
X4           7.924e-06  1.385e-06   5.722 1.98e-07 ***
X1          -1.420e-01  2.134e-02  -6.655 3.89e-09 ***
X2           2.820e-01  6.317e-02   4.464 2.75e-05 ***
X3           6.193e-01  1.087e+00   0.570     0.57

Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value    Pr(>F)
X4         1 67.775  67.775 52.4369 3.073e-10 ***
X1         1 42.275  42.275 32.7074 2.004e-07 ***
X2         1 27.857  27.857 21.5531 1.412e-05 ***
X3         1  0.420   0.420  0.3248    0.5704
Residuals 76 98.231   1.293

(a) Construct a test on whether $X_3$ can be dropped from the regression model given that $X_1$, $X_2$ and $X_4$ are retained, if it is possible from the information provided. If not, indicate how you would construct the desired test.

(b) Construct a test on whether both $X_2$ and $X_3$ can be dropped from the regression model given that $X_1$ and $X_4$ are retained, if it is possible from the information provided. If not, indicate how you would construct the desired test.

(c) Construct a test on whether $X_4$ can be dropped from the regression model given that $X_1$, $X_2$ and $X_3$ are retained, if possible. If not, indicate how you would construct the desired test.

(d) Construct a test on whether both $X_1$ and $X_2$ can be dropped from the regression model given that $X_3$ and $X_4$ are retained, if possible. If not, indicate how you would construct the desired test.
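A rough sketch of how the sequential sums of squares can be combined into partial F statistics follows. It relies on the fit order X4, X1, X2, X3 shown in the R output, so only terms entered last can be tested this way; the helper name `partial_f` is hypothetical, and critical values must still come from an F table.

```python
# Sketch: partial F statistics from the sequential (Type I) sums of squares.
# Valid only for terms entered last in the fit order X4, X1, X2, X3.
seq_ss = {"X4": 67.775, "X1": 42.275, "X2": 27.857, "X3": 0.420}
sse, df_error = 98.231, 76
mse = sse / df_error

def partial_f(last_terms):
    """F statistic for dropping the given last-entered terms together."""
    extra_ss = sum(seq_ss[t] for t in last_terms)
    return (extra_ss / len(last_terms)) / mse

f_x3 = partial_f(["X3"])            # part (a): numerator df = 1
f_x2_x3 = partial_f(["X2", "X3"])   # part (b): numerator df = 2

# For a single coefficient the partial F equals the squared t value,
# so part (c) can use the t value reported for X4.
f_x4 = 5.722 ** 2

print(round(f_x3, 4), round(f_x2_x3, 3), round(f_x4, 2))
```

Part (d) asks about X1 and X2 given X3 and X4, which does not match the fit order, so under this approach a refit with X3 and X4 entered first would be needed.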

6. A tax consultant wanted to study whether the selling price ($Y$, measured in thousands of dollars) of one-family residential dwellings varies with lot location ($X_1$). Two lot locations are considered: corner lots and non-corner lots. The selling price is also considered to be related to the assessed valuation of the dwellings ($X_2$). Data were collected for a random sample of 16 recent “arm’s-length” sales transactions of one-family dwellings located on corner lots and for a random sample of 48 sales of one-family dwellings not located on corner lots.

(a) Suppose the tax consultant was told that the relationship between the assessed valuation and the selling price is linear and that the effect of lot location is the same for all assessed valuations. Write down the appropriate regression model for the relationship between the selling price ($Y$) and both lot location ($X_1$) and the assessed valuation ($X_2$). Describe the variables in your model.

[Figure: scatter plot with axis label "Assessed valuation"; legend: Non-corner, Corner.]

(b) The tax consultant suspected that the effect of the lot location increases with the assessed valuation of the dwellings. Write down an appropriate model that allows this relationship. Describe any new variables used in this model.

(c) Now imagine you are consulted as a statistician for this problem. Based on the sequential SS table given below, what will be your conclusion regarding whether the effect of lot location is constant over assessed valuation or increases with valuation? Specify the appropriate hypotheses and the test statistic and explain how your conclusion is drawn at $\alpha = 0.1$.

Source                 Sum of Squares
Location               491.2
Valuation              3667.9
Location * Valuation   32.5
Residuals              818.4

(d) What is your conclusion on whether there is a significant lot location effect on the selling price conditioned on fixed assessed valuation? Conduct an appropriate test at $\alpha = 0.01$.
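For part (c), the interaction test statistic can be sketched as follows. The sketch assumes the interaction model has four parameters (intercept, location, valuation, and their product), so the error degrees of freedom are $64 - 4 = 60$; the variable names are hypothetical and the result still has to be compared with an F table.

```python
# Sketch: F statistic for the Location * Valuation term, entered last.
# Assumes n = 16 + 48 = 64 sales and 4 model parameters.
n, n_params = 16 + 48, 4
ss_interaction, sse = 32.5, 818.4

df_error = n - n_params
mse = sse / df_error
f_interaction = (ss_interaction / 1) / mse

print(df_error, round(mse, 2), round(f_interaction, 3))
```

Because the interaction enters last in the sequential table, its sequential sum of squares is also its partial sum of squares, which is what makes this ratio usable.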

7. Residential sales that occurred during the year 2002 were available from a city in the Midwest. Data on 522 arm's-length transactions include $Y$ = sales price, $X_1$ = finished square feet, $X_2$ = number of bedrooms, $X_3$ = number of bathrooms, $X_4$ = air conditioning (presence or absence), $X_5$ = garage size (number of cars that can be held), $X_6$ = pool (presence or absence), $X_7$ = year built, $X_8$ = quality (1=high, 2=medium, 3=low), $X_9$ = lot size (square feet), $X_{10}$ = adjacent to highway (presence or absence). The city tax assessor was interested in predicting sales price based on the demographic variable information.

(a) A random sample of 300 observations was selected to build a model for predicting sales price. The table below shows the criteria values for the best model for each number of parameters. The models were built based on the log sales price. The variable $X_8$ = quality was treated as numerical. Which model would you choose for predicting the sales price? Explain your reasons.

# Parameters   Best subset model                                                $R^2$   $R^2_{adj}$   $C_p$   AIC      BIC      PRESS
2              $X_1$                                                            0.657   0.656         230.6   -833.4   -825.9   18.7
3              $X_1 + X_8$                                                      0.745   0.744         97.1    -920.6   -909.5   14.0
4              $X_1 + X_8 + X_9$                                                0.778   0.776         49.0    -959.7   -944.9   12.3
5              $X_1 + X_7 + X_8 + X_9$                                          0.799   0.797         17.8    -988.5   -969.9   11.3
6              $X_1 + X_5 + X_7 + X_8 + X_9$                                    0.805   0.801         11.6    -994.5   -972.3   11.1
7              $X_1 + X_5 + X_6 + X_7 + X_8 + X_9$                              0.808   0.804         9.2     -996.9   -971.0   11.0
8              $X_1 + X_3 + X_5 + X_6 + X_7 + X_8 + X_9$                        0.810   0.805         7.7     -998.6   -968.9   10.9
9              $X_1 + X_3 + X_4 + X_5 + X_6 + X_7 + X_8 + X_9$                  0.811   0.806         7.8     -998.5   -965.1   11.0
10             $X_1 + X_3 + X_4 + X_5 + X_6 + X_7 + X_8 + X_9 + X_{10}$         0.812   0.806         9.0     -997.3   -960.3   11.0
11             $X_1 + X_2 + X_3 + X_4 + X_5 + X_6 + X_7 + X_8 + X_9 + X_{10}$   0.812   0.805         11.0    -995.3   -954.6   11.2

(b) Assuming we have obtained the fitted model that was selected in (a), write down the formula for estimating the expected sales price for a high-quality house built in 1992 with 3000 finished square feet, 5 bedrooms, 3 bathrooms, a 3-car garage, a lot size of 7000 square feet, and air conditioning, that has no swimming pool and is not adjacent to a highway. (Please plug in numbers for the predictor variables.)

8. In an experiment designed for studying children’s memory, a random sample of 36 fourth-graders

from a school were included in the experiment. Two design factors are considered: the level of reinforcement (none or verbal) and time of isolation (20, 40, or 60 minutes). Students participating in the study were told to memorize a paragraph and given positive verbal reinforcement or no reinforcement while learning it according to their treatment assignment. Then students were isolated for the specified amount of time. There were 6 students randomly assigned to each of the six treatment groups. The response is a score measuring the student’s memory for the learned paragraph.

(a) Complete the ANOVA table below. Which effects test significant at the $\alpha = 0.05$ level?

Source of Variation   Degrees of freedom   Sum of Squares   Mean Squares
Reinforcement                              117.36
Isolation time                             141.56
Interaction                                916.22
Error                                      401.17

(b) Consider the following estimates of differences in the average memory scores between
A. verbal and no reinforcement, averaged over the three isolation times
B. verbal and no reinforcement, only for 20 minute isolation time
C. 20 and 40 minutes isolation times, averaged over the two reinforcement levels
D. Interaction between 20 and 40 minutes and reinforcement levels

Which of the four contrasts has the largest standard error? Which has the smallest standard error? Each of the four estimates can lead to a t-test for an “interesting difference”. Assume the “interesting difference” is the same value for all four tests, which test is most powerful? Which is least powerful?
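The following sketch fills in the two-way ANOVA table and compares the standard errors of the four contrasts. It assumes a balanced $2 \times 3$ layout with 6 students per cell, as described in the problem, and writes each contrast as hypothetical coefficients on the cell means it involves; it is an illustration, not an official solution.

```python
import math

# Sketch: complete the two-way ANOVA table (2 x 3 design, 6 per cell).
a, b, r = 2, 3, 6
ss = {"Reinforcement": 117.36, "Isolation time": 141.56,
      "Interaction": 916.22, "Error": 401.17}
df = {"Reinforcement": a - 1, "Isolation time": b - 1,
      "Interaction": (a - 1) * (b - 1), "Error": a * b * (r - 1)}
ms = {k: ss[k] / df[k] for k in ss}
f_stat = {k: ms[k] / ms["Error"] for k in ss if k != "Error"}

# Each contrast as coefficients on the cell means it uses;
# Var(contrast) = sigma^2 * sum(c^2) / r for independent cell means.
contrasts = {
    "A": [1 / 3] * 3 + [-1 / 3] * 3,
    "B": [1, -1],
    "C": [1 / 2, 1 / 2, -1 / 2, -1 / 2],
    "D": [1, -1, -1, 1],
}
se = {k: math.sqrt(ms["Error"] * sum(c * c for c in v) / r)
      for k, v in contrasts.items()}

print({k: round(v, 3) for k, v in f_stat.items()})
print({k: round(v, 3) for k, v in se.items()})
```

For a fixed "interesting difference", the contrast with the smallest standard error gives the most powerful t-test and the one with the largest gives the least powerful.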

9. In a pig breeding study, two offspring from each of ten litters were measured for average daily weight gain. The individual pig measurements can be modeled in the form

$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$

where $\mu$ is the overall mean, and $\alpha_i \sim N(0, \sigma_\alpha^2)$ is the litter random effect, which is independent of the individual pig effect $\varepsilon_{ij} \sim N(0, \sigma^2)$.

(a) Complete the ANOVA table below, and obtain point estimates of the litter and individual pig variance components.

Source                d.f.   SS       MS   Expected MS
Litters                      0.6705
Pigs within litters          0.3834
Total                        1.0539

(b) Suppose the investigator intends to select a pig at random from the population and measure its average daily weight gain. Estimate the variance associated with this quantity, as well as the fraction of this variance that is due to genetic factors (i.e. litter).

(c) Calculate the 95% CI for the overall average daily weight gain for all offspring from the population. The overall sample mean of the weight gain is 2.572 kg.

(d) Suppose the investigator now intends to design an experiment where the treatments are assigned to mothers (i.e. litters), but the response is observed on pigs to study the effects of some drugs. The quantity of interest is the treatment means, averaged over $b$ litters and $n$ pigs at each treatment level. The investigator can choose between the following three designs:

A. $b = 2$ litters, $n = 4$ pigs per litter at each treatment level;
B. $b = 4$ litters, $n = 2$ pigs per litter at each treatment level;
C. $b = 8$ litters, $n = 1$ pig per litter at each treatment level.

Assume the variance components estimated in (a) are appropriate for this new study. Calculate the standard error of the treatment means for each of the three designs. Which design gives the most precise estimate of the treatment mean?
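A minimal sketch of the variance-component arithmetic for parts (a), (b) and (d) is given below. It assumes the usual method-of-moments (ANOVA) estimators for a balanced one-way random-effects model with 10 litters and 2 pigs per litter, and it assumes the variance of a treatment mean is $\sigma_\alpha^2 / b + \sigma^2 / (bn)$; the variable names are hypothetical.

```python
import math

# Sketch: ANOVA estimators for the one-way random-effects model.
litters, pigs_per_litter = 10, 2
ss_litter, ss_within = 0.6705, 0.3834

df_litter = litters - 1
df_within = litters * (pigs_per_litter - 1)
ms_litter = ss_litter / df_litter
ms_within = ss_within / df_within

# E(MS within) = sigma^2 ; E(MS litters) = sigma^2 + 2 * sigma_alpha^2
sigma2 = ms_within
sigma2_litter = (ms_litter - ms_within) / pigs_per_litter

total_var = sigma2 + sigma2_litter           # variance of one random pig
litter_fraction = sigma2_litter / total_var  # share attributed to litter

# Standard error of a treatment mean under each candidate design (b, n)
designs = {"A": (2, 4), "B": (4, 2), "C": (8, 1)}
se = {k: math.sqrt(sigma2_litter / b + sigma2 / (b * n))
      for k, (b, n) in designs.items()}

print(round(sigma2, 5), round(sigma2_litter, 5), round(litter_fraction, 3))
print({k: round(v, 4) for k, v in se.items()})
```

All three designs use eight pigs per treatment, so under these assumptions the comparison isolates the effect of spreading pigs over more litters.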

Part B: Linear Statistical Model

10. In the study of the precision of a measuring device, each of $a = 2$ widgets (call “widgets” levels of Factor A) was measured $m = 2$ times by each of $b = 5$ different technicians (call “technicians” levels of Factor B). The resulting data can be thought of as having $2 \times 5$ (complete, balanced, replicated) factorial structure. With

$y_{ijk}$ = measurement $k$ by technician $j$ on widget $i$,

model as

$$y_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{ijk}$$

where $\mu$ is the only fixed effect, all the random effects are independent, the $\alpha_i \sim N(0, \sigma_\alpha^2)$, the $\beta_j \sim N(0, \sigma_\beta^2)$, the $\alpha\beta_{ij} \sim N(0, \sigma_{\alpha\beta}^2)$, and the $\varepsilon_{ijk} \sim N(0, \sigma^2)$. Standard two-way factorial ANOVA calculations were done and produced the following ANOVA table.

Source   DF   Mean Square   Expected MS
A        1    20.0          $\sigma^2 + 2\sigma_{\alpha\beta}^2 + 10\sigma_\alpha^2$
B        4    10.0          $\sigma^2 + 2\sigma_{\alpha\beta}^2 + 4\sigma_\beta^2$
AB       4    6.0           $\sigma^2 + 2\sigma_{\alpha\beta}^2$
Error    10   2.0           $\sigma^2$

(a) A quantity of serious interest in this context is $\sigma_\beta^2 + \sigma_{\alpha\beta}^2$ (which is called a measure of measurement “reproducibility”). Find a sensible point estimate of this quantity.

(b) Calculate approximate 90% confidence limits for $\sigma_\beta^2 + \sigma_{\alpha\beta}^2$.
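One common way to approach (a) and (b) is to write the target as a linear combination of mean squares and use a Satterthwaite approximation for its degrees of freedom. The sketch below assumes the expected mean squares read E(MS B) = $\sigma^2 + 2\sigma_{\alpha\beta}^2 + 4\sigma_\beta^2$, E(MS AB) = $\sigma^2 + 2\sigma_{\alpha\beta}^2$ and E(MS Error) = $\sigma^2$, which gives the coefficients 1/4, 1/4 and -1/2; names are hypothetical and the chi-square quantiles are left to the tables.

```python
# Sketch: point estimate of sigma_beta^2 + sigma_alphabeta^2 as a linear
# combination of mean squares, with Satterthwaite approximate df.
ms = {"B": 10.0, "AB": 6.0, "Error": 2.0}
df = {"B": 4, "AB": 4, "Error": 10}

# (MSB - MSAB)/4 + (MSAB - MSE)/2 = MSB/4 + MSAB/4 - MSE/2
coef = {"B": 0.25, "AB": 0.25, "Error": -0.5}

estimate = sum(coef[k] * ms[k] for k in ms)
df_satt = estimate ** 2 / sum((coef[k] * ms[k]) ** 2 / df[k] for k in ms)

print(estimate, round(df_satt, 3))
# Approximate limits would then be df_satt * estimate / chi2_upper and
# df_satt * estimate / chi2_lower, with chi-square quantiles from a table.
```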

11. In the calibration of a scientific instrument, “true” values x are known and produce experimental readings y on the instrument. Suppose that we are willing to assume that the mean value of y is proportional to x , and it is sensible to model experimental readings using the model given as

$$y_i = \beta x_i + \varepsilon_i$$

where $E(\varepsilon_i) = 0$ for $i = 1, \ldots, n$. A particular calibration experiment produces $n = 4$ data points as given below.

x   3   4   5    6
y   3   6   11   14

Initially, we assume that $\mathrm{Var}(\varepsilon_i) = \sigma^2$ for $i = 1, \ldots, n$.

(a) Find $\hat{Y} = P_X Y$ and $\mathrm{Var}(\hat{Y})$. If the “most influential” observation in the data in fitting the linear model here is defined as the one with the largest variance for $\hat{y}_i$, $i = 1, \ldots, 4$, then which of the 4 observations is “most influential”?

(b) Give 90% two-sided confidence limits for $\sigma$ in the normal version of this model.

(c) Give 90% two-sided prediction limits for a new observation $y^*$ at $x = 10$.

Now suppose that it is plausible that not only is the mean value of $y$ proportional to $x$, but that the standard deviation of $y$ is also proportional to $x$, i.e. $\mathrm{Var}(\varepsilon_i) = \sigma^2 x_i^2$.

(d) Find the generalized least squares estimate of $\beta$, denoted by $\hat{\beta}_{GLS}$, and the estimated standard deviation of $\hat{\beta}_{GLS}$ under the new model (variance) assumptions.
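The core computations for this problem can be sketched numerically as below. The sketch assumes the no-intercept model $y_i = \beta x_i + \varepsilon_i$, uses leverages $x_i^2 / \sum x_j^2$ for the variances of the fitted values, and for part (d) uses the fact that weights $1/x_i^2$ reduce GLS to averaging the ratios $y_i / x_i$. Variable names are hypothetical and the t and chi-square quantiles for the interval parts are left to the tables.

```python
import math

x = [3, 4, 5, 6]
y = [3, 6, 11, 14]
n = len(x)

# Ordinary least squares through the origin
sxx = sum(xi * xi for xi in x)
beta_ols = sum(xi * yi for xi, yi in zip(x, y)) / sxx
fitted = [beta_ols * xi for xi in x]
leverage = [xi * xi / sxx for xi in x]   # Var(y_hat_i) = sigma^2 * leverage_i
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
s2 = sse / (n - 1)                       # one parameter estimated

# GLS with Var(eps_i) = sigma^2 * x_i^2: regress y_i / x_i on a constant
ratios = [yi / xi for xi, yi in zip(x, y)]
beta_gls = sum(ratios) / n
s2_gls = sum((r - beta_gls) ** 2 for r in ratios) / (n - 1)
se_gls = math.sqrt(s2_gls / n)

print(beta_ols, fitted, sse, s2)
print([round(h, 3) for h in leverage])
print(round(beta_gls, 4), round(se_gls, 4))
```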

12. Suppose we will conduct an experiment with a completely randomized design. We are interested in understanding the effects of two treatment factors A and B on the mean of the response variable. Each of these factors has three levels. We will assume that there is no interaction between the factors and will model the response variable as

$$y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}, \quad i = 1, 2, 3; \; j = 1, 2, 3; \; k = 1, \ldots, n_{ij}$$

where $y_{ijk}$ denotes the response for the $k$th experimental unit treated with level $i$ of factor A and level $j$ of factor B; $\mu, \alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2, \beta_3$ represent unknown parameters; the $\varepsilon_{ijk}$ are i.i.d. random variables with mean 0 and unknown positive variance $\sigma^2$; and $n_{ij}$ denotes the number of experimental units treated with level $i$ of factor A and level $j$ of factor B ($i = 1, 2, 3$; $j = 1, 2, 3$).

Based on available resources we are initially considering an experimental design where $n_{11} = n_{13} = n_{23} = n_{32} = 4$ and $n_{ij} = 0$ for all other combinations of $i$ and $j$.

(a) To understand the effects of factor A on the response, we are interested in estimating linear functions of the form $\alpha_i - \alpha_{i'}$ for $i < i'$. List all such functions that would be estimable under the initially proposed design.

(b) To understand the effects of factor B on the response, we are interested in estimating linear functions of the form $\beta_j - \beta_{j'}$ for $j < j'$. List all such functions that would be estimable under the initially proposed design.

(c) Assume that the initially planned observations ($n_{11} = n_{13} = n_{23} = n_{32} = 4$) are essentially free but that any additional observation from an experimental unit treated with level $i$ of factor A and level $j$ of factor B will cost $10 \times i + 100 \times j$ dollars. If we wish to be able to estimate all possible linear functions of the form $\alpha_i - \alpha_{i'}$ for $i < i'$ and $\beta_j - \beta_{j'}$ for $j < j'$ at minimal cost, how many additional observations are needed and at which levels of factor A and B should the observation(s) be taken?
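One way to reason about estimability here is the standard connectedness criterion for additive two-way models: a difference between two levels of a factor is estimable exactly when the two levels are linked through a chain of occupied cells. The sketch below is an illustration under that criterion, with hypothetical helper names; it groups levels with a small union-find and then searches for the cheapest single cell that connects the design.

```python
from itertools import combinations

LEVELS = (1, 2, 3)

def make_find(cells):
    """Union factor levels linked by occupied cells; return a root finder."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            u = parent[u]
        return u

    for i, j in cells:               # cell (i, j) links A_i with B_j
        parent[find(("A", i))] = find(("B", j))
    return find

def estimable_pairs(cells, factor):
    """Pairs (u, v), u < v, whose level difference is estimable."""
    find = make_find(cells)
    return [(u, v) for u, v in combinations(LEVELS, 2)
            if find((factor, u)) == find((factor, v))]

design = [(1, 1), (1, 3), (2, 3), (3, 2)]
a_pairs = estimable_pairs(design, "A")
b_pairs = estimable_pairs(design, "B")

# Cheapest single extra cell (cost 10*i + 100*j) that connects everything
empty = [(i, j) for i in LEVELS for j in LEVELS if (i, j) not in design]
fixes = [(10 * i + 100 * j, (i, j)) for i, j in empty
         if len(estimable_pairs(design + [(i, j)], "A")) == 3
         and len(estimable_pairs(design + [(i, j)], "B")) == 3]
best = min(fixes)

print(a_pairs, b_pairs, best)
```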

Standard Normal Probability Distribution P(Z<z)

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1   0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2   0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3   0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4   0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998

Critical values for t-distribution

Critical values for chi-square distribution

Critical values for F-distribution

DEPARTMENT OF MATHEMATICS & STATISTICS

GRADUATE PROGRAM

QUALIFYING EXAMINATION I: APPLIED STATISTICS

Saturday May 14, 2016

Examiners: Dr. L. Lu, and Dr. C. P. Tsokos

Answer any seven questions completely; at least one must be from Part B.

PART A: Statistical Analysis

1. The data shown below were obtained in a small-scale experiment to study the relation between storage temperature in °F ($X$) and number of weeks before flavor deterioration of a food product begins to occur ($Y$). A simple linear regression model was fitted.

$i$     1     2     3     4     5
$X_i$   10    5     0     -5    -10
$Y_i$   7.4   9.4   9.6   11    12.6

(a) Find a 95% CI for $\beta_1$. Will you conclude that there is a significant linear relationship at $\alpha = 0.05$?

(b) Find the sample correlation coefficient between the time before flavor deterioration and the storage temperature, and interpret it in terms of the strength and direction of the linear relationship.

(c) Suppose five such food products are stored in the refrigerator at $X = 5^\circ$F. Find a 95% prediction interval for the average number of weeks before flavor deterioration of these five food products begins to occur.
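The least-squares quantities needed for parts (a) and (b) can be sketched as follows. The critical value 3.182 is the usual table value for a t distribution with 3 degrees of freedom at the upper 2.5% point; the variable names are hypothetical and this is an illustration rather than an official solution.

```python
import math

x = [10, 5, 0, -5, -10]
y = [7.4, 9.4, 9.6, 11, 12.6]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                 # slope
b0 = ybar - b1 * xbar          # intercept
sse = syy - b1 * sxy           # residual sum of squares
mse = sse / (n - 2)
se_b1 = math.sqrt(mse / sxx)
r = sxy / math.sqrt(sxx * syy) # sample correlation coefficient

t_crit = 3.182                 # t table: 3 df, two-sided 95%
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

print(round(b0, 3), round(b1, 3), round(sse, 3), round(r, 4))
print(tuple(round(v, 3) for v in ci_b1))
```

A slope interval that excludes zero corresponds to rejecting the hypothesis of no linear relationship at the matching level.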

2. In a study about strategies that wait staff at restaurants employ to increase tips, data on tip

amount as a percentage of the bill are summarized in the table below. Assume the tip amounts follow normal distributions using both strategies.

Self-introduction      $n_1 = 10$   $\bar{x} = 22.63$   $s_1 = 7.82$
No self-introduction   $n_2 = 10$   $\bar{y} = 14.15$   $s_2 = 6.10$

(a) Do these data suggest that a self-introduction increases tips on average? State and test the relevant hypotheses at the 0.01 significance level using the Welch-Satterthwaite separate variances approach.

(b) Do these data suggest that the tip amounts using the two strategies have equal variances at a significance level of 0.1?

(c) Now if the tip amounts using the two strategies are assumed to have equal variances, would the conclusion change at the 0.01 significance level?

(d) If the two populations are assumed to have equal variances, compute the 95% confidence interval for the pooled variance for the combined population.
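The test statistics for parts (a) through (c) can be sketched from the summary statistics alone. The sketch below uses the standard Welch statistic with Satterthwaite degrees of freedom, the variance ratio for the equal-variance check, and the pooled statistic; names are hypothetical and critical values are left to the t and F tables.

```python
import math

n1, xbar, s1 = 10, 22.63, 7.82   # self-introduction
n2, ybar, s2 = 10, 14.15, 6.10   # no self-introduction

# (a) Welch statistic and Satterthwaite degrees of freedom
v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
t_welch = (xbar - ybar) / math.sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# (b) Variance ratio, referred to an F distribution with (9, 9) df
f_ratio = s1 ** 2 / s2 ** 2

# (c) Pooled-variance statistic with n1 + n2 - 2 df
sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
t_pooled = (xbar - ybar) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

print(round(t_welch, 3), round(df_welch, 2))
print(round(f_ratio, 3), round(sp2, 3), round(t_pooled, 3))
```

With equal sample sizes the Welch and pooled statistics coincide, so only the reference degrees of freedom differ between (a) and (c).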

3. The following data give the starting position of winners in 144 horse races, where position 1 is closest to the inside rail of the race track.

Starting Position 1 2 3 4 5 6 7 8

Number of wins 29 19 18 25 17 10 15 11

(a) Determine the two-sided 95% confidence interval for the chance of winning at starting position 1.

(b) The data were collected for testing the hypothesis that a horse’s chance of winning is unaffected by its position in the starting lineup. State the null and alternative hypotheses and perform an appropriate test at the 5% level of significance.
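A short sketch of both parts follows: a large-sample (Wald) interval for the position-1 winning probability and a chi-square goodness-of-fit statistic against equal chances for all eight positions. The values 1.96 and 14.067 are the usual table values (standard normal 97.5% point, and chi-square with 7 degrees of freedom at the 5% level); choosing the Wald interval is an assumption, since the question does not name a method.

```python
import math

wins = [29, 19, 18, 25, 17, 10, 15, 11]
n = sum(wins)

# (a) Wald interval for the chance of winning from position 1
p1 = wins[0] / n
half_width = 1.96 * math.sqrt(p1 * (1 - p1) / n)
ci_p1 = (p1 - half_width, p1 + half_width)

# (b) Goodness of fit against H0: every position has probability 1/8
expected = n / len(wins)
chi2 = sum((w - expected) ** 2 / expected for w in wins)
df = len(wins) - 1
reject = chi2 > 14.067   # chi-square table: 7 df, alpha = 0.05

print(n, round(p1, 4), tuple(round(v, 4) for v in ci_p1))
print(round(chi2, 3), df, reject)
```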

4. Members of the New York Choral Society are organized according to vocal range. Among male singers, the parts from lowest to highest pitch are Bass 1, Bass 2, Tenor 1, and Tenor 2. The heights of male singers are summarized in the table below.

Heights in Inches

Part      Sample size   Sample Mean ($\bar{Y}_i$)   Sample Variance ($s_i^2$)
Bass 1    31            70.84                       5.2065
Bass 2    31            71.38                       7.3828
Tenor 1   31            69.00                       5.8667
Tenor 2   31            69.70                       3.8258
Total     124           $\bar{Y} = 70.16$           $s_p^2 = 5.5704$

(a) Complete the ANOVA table below and test whether there are significant differences in height among singers in different parts at $\alpha = 0.05$.

Source of Variance   Degree of Freedom   Sum of Squares   Mean Squares   F-Ratio
Part                                     92.32
Error                                    668.45

(b) Test if there is any significant difference between the average height of male singers in low pitch parts (Bass 1 & 2) and those in high pitch parts (Tenor 1 & 2) at $\alpha = 0.05$.

5. In a study for developing a model to predict the college GPA of matriculating freshmen based on their

college entrance verbal and mathematics test scores, data for a random sample of 40 graduating seniors on their college GPA ( ) along with their college entrance verbal test score ( ) and mathematics test score ( ) expressed as percentiles are collected.

(a) A linear regression model is fitted to these data. The parameter estimates and their standard errors are summarized below. The SSE for the linear fit is 5.9876. Calculate 95% CI’s for the coefficients of the verbal and mathematics test scores. Based on the CI’s, should any of the variables be excluded from the model?

Parameter   Estimate   SE
$\beta_0$   -1.571     0.494
$\beta_1$   0.026      0.004
$\beta_2$   0.034      0.005

(b) A quadratic model of the form is then fitted to these data. Complete the sequential ANOVA table below:

Models SSE DF MS

5.2549

7.5311

3.6434

1.0552

0.0982

Residuals 1.1908

Based on the sequential ANOVA table in (b), construct a test of $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$ and draw a conclusion at $\alpha = 0.01$, if that is possible from the information provided. If not, indicate how you would construct the desired test.

(c) Construct a test of $H_0: \beta_3 = \beta_4 = \beta_5 = 0$ and draw a conclusion at $\alpha = 0.01$, if that is possible from the information provided. If not, indicate how you would construct the desired test.

(d) Construct a test of $H_0: \beta_5 = 0$ and draw a conclusion at $\alpha = 0.01$, if that is possible from the information provided. If not, indicate how you would construct the desired test.

6. Some data were collected from a study of a natural compound that may help regulate blood cholesterol levels in rats. A total of 128 rats were randomly assigned to one of 7 diets differing only in the dose of the natural compound. A second order model using the log-transformed response (blood cholesterol) in the form of , ∼ 0, was fitted and the parameter estimates and their variance-covariance matrix are given below:

5.217
0.162
0.226

and

0.00206   0.00357   0.00144
0.00357   0.00988   0.00518
0.00144   0.00518   0.00306

(a) Test whether the dose-response model (describing the expected log-cholesterol level as a function of the dose of a drug) is better fit by a straight line or a quadratic curve at $\alpha = 0.05$.

(b) Use the following results from fitting a variety of regression models to test whether there is any evidence of lack of fit of the quadratic model at 0.05.

Model Error Sum-of-squares

5.876

4.527

3.996

3.870

3.819

(c) Assuming that a quadratic dose-response model is chosen, estimate the dose value that produces the largest expected response, together with its approximate standard error.

7. A randomized clinical trial was conducted to study a proposed new drug for reducing blood pressure. Volunteers were randomly assigned to one of the two treatments: take the new drug for 3 months or take a placebo for 3 months. Blood pressure for each subject was recorded at the start of the experiment and again after 3 months. The relevant variables are defined as:

: blood pressure for subject at the end of the study (post-treatment)
: blood pressure for subject at the beginning of the study (pre-treatment)
1 if subject received the new drug, 0 if subject received the placebo

A total of 181 subjects participated in the study: 98 received the new drug and 83 received the placebo. The average pre-treatment and post-treatment blood pressures for all 181 subjects are 85.7 and 59.1, respectively. First, a regression model in the form of

, ∼ 0, , was fitted using ordinary least squares. The results are given below:

Coefficient estimates:
26.17
0.391
3.78

Var-Cov matrix of coefficient estimates:
52.47    -0.531      -5.008
-0.531   0.00594     -0.000847
-5.008   -0.000847   9.263

ANOVA table:
Source of Variation   Degree of Freedom   Sum of Squares
Pre-BP                                    10731.0
Treatment                                 643.1
Error                                     74099.99

(a) Find a 95% confidence interval for the effect of the drug (difference in post-treatment blood pressure between subjects receiving the drug and those receiving the placebo) for subjects with a pre-treatment blood pressure of 100. (Use the $t$ critical value 1.973.)

(b) Find a 95% prediction interval for the post-treatment blood pressure of a subject who has a pre-treatment blood pressure of 100 and received the drug, given that the standard error of the mean prediction for all subjects who have a pre-treatment blood pressure of 100 and received the drug is 2.217.

(c) The investigators are also concerned whether the effect of the drug is the same at all blood pressures, i.e. whether the slope of the relationship between post-treatment and pre-treatment blood pressure is the same in the “drug” and “placebo” groups. Some additional models are fitted to the data. The error sums of squares for the original model and the additional models are given in the table below:

Models Error Sum of Squares

74,099.99

74,080.15

74,080.50

73,272.78

Conduct a test for equal slopes in the “drug” and “placebo” groups at 0.05.

8. In an experiment designed for studying children’s memory, a random sample of 36 fourth-graders from a school were included in the experiment. Two design factors are considered: the level of reinforcement (none or verbal) and time of isolation (20, 40, or 60 minutes). Students participating in the study were told to memorize a paragraph and given positive verbal reinforcement or no reinforcement while learning it according to their treatment assignment. Then students were isolated for the specified amount of time. There were 6 students randomly assigned to each of the six treatment groups. The response is a score measuring the student’s memory for the learned paragraph.

(a) Complete the ANOVA table below. Which effects test significant at the 0.05 level?

Source of Variation   Degrees of freedom   Sum of Squares   Mean Squares
Reinforcement                              117.36
Isolation time                             141.56
Interaction                                916.22
Error                                      401.17

(b) Consider the following estimates of differences in the average memory scores between
A. verbal and no reinforcement, averaged over the three isolation times
B. verbal and no reinforcement, only for 20 minute isolation time
C. 20 and 40 minutes isolation times, averaged over the two reinforcement levels
D. Interaction between 20 and 40 minutes and reinforcement levels

Which of the four contrasts has the largest standard error? Which has the smallest standard error? Each of the four estimates can lead to a t-test for an “interesting difference”. Assume the “interesting difference” is the same value for all four tests, which test is most powerful? Which is least powerful?


9. The table below shows the results of all-subsets selection for choosing a best model from all possible regressions with four predictor variables $X_1$, $X_2$, $X_3$, and $X_4$. Five different criteria are considered. Recommend a model based on the results below and explain your reasoning.

| Variables in the model | $R^2$ | $R^2_{adj}$ | $C_p$ | AIC | BIC |
|---|---|---|---|---|---|
| 1 | 0.061 | 0.043 | 141.16 | -77.08 | -73.10 |
| 2 | 0.221 | 0.206 | 108.56 | -87.18 | -83.20 |
| 3 | 0.428 | 0.417 | 66.49 | -103.83 | -99.85 |
| 4 | 0.422 | 0.410 | 67.71 | -103.26 | -99.28 |
| 12 | 0.263 | 0.234 | 102.03 | -88.16 | -82.20 |
| 13 | 0.549 | 0.531 | 43.85 | -114.66 | -108.69 |
| 14 | 0.430 | 0.408 | 67.97 | -102.07 | -96.10 |
| 23 | 0.663 | 0.650 | 20.52 | -130.48 | -124.52 |
| 24 | 0.483 | 0.463 | 57.21 | -107.32 | -101.36 |
| 34 | 0.599 | 0.584 | 33.50 | -121.11 | -115.15 |
| 123 | 0.757 | 0.743 | 3.39 | -146.16 | -138.20 |
| 124 | 0.487 | 0.456 | 58.39 | -105.75 | -97.79 |
| 134 | 0.612 | 0.589 | 32.93 | -120.84 | -112.89 |
| 234 | 0.718 | 0.701 | 11.42 | -138.02 | -130.07 |
| 1234 | 0.759 | 0.740 | 5.00 | -144.59 | -134.64 |
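One way to organize the comparison is to pick the best row under each criterion and see whether the criteria agree. The sketch below assumes the usual conventions (larger adjusted $R^2$ is better; smaller $C_p$, AIC, and BIC are better); the variable names are illustrative and not part of the exam.

```python
# Rows copied from the all-subsets table: (variables, R2, adjR2, Cp, AIC, BIC).
rows = [
    ("1", 0.061, 0.043, 141.16, -77.08, -73.10),
    ("2", 0.221, 0.206, 108.56, -87.18, -83.20),
    ("3", 0.428, 0.417, 66.49, -103.83, -99.85),
    ("4", 0.422, 0.410, 67.71, -103.26, -99.28),
    ("12", 0.263, 0.234, 102.03, -88.16, -82.20),
    ("13", 0.549, 0.531, 43.85, -114.66, -108.69),
    ("14", 0.430, 0.408, 67.97, -102.07, -96.10),
    ("23", 0.663, 0.650, 20.52, -130.48, -124.52),
    ("24", 0.483, 0.463, 57.21, -107.32, -101.36),
    ("34", 0.599, 0.584, 33.50, -121.11, -115.15),
    ("123", 0.757, 0.743, 3.39, -146.16, -138.20),
    ("124", 0.487, 0.456, 58.39, -105.75, -97.79),
    ("134", 0.612, 0.589, 32.93, -120.84, -112.89),
    ("234", 0.718, 0.701, 11.42, -138.02, -130.07),
    ("1234", 0.759, 0.740, 5.00, -144.59, -134.64),
]

# R2 never decreases when predictors are added, so it is reported but not used to choose.
best_r2 = max(rows, key=lambda r: r[1])[0]
best_adj_r2 = max(rows, key=lambda r: r[2])[0]
best_cp = min(rows, key=lambda r: r[3])[0]
best_aic = min(rows, key=lambda r: r[4])[0]
best_bic = min(rows, key=lambda r: r[5])[0]

print(best_r2, best_adj_r2, best_cp, best_aic, best_bic)
```

A fuller answer would also check that the chosen model's $C_p$ is close to its number of parameters, which the smallest-value rule above does not verify.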


Part B: Linear Statistical Model

10. This question concerns the analysis of a small set of data on the operation of a Butane Hydrogenolysis Reactor. The response variable is the percent conversion of butane, which is to be estimated as a function of the chemical reactor process variables: total feed flow (cc/sec at STP), feed ratio (Hydrogen/Butane), and the reactor wall temperature.

The data are shown in the following table.

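| Run | Conversion (%) | Flow | Ratio | Temp | Setup |
|---|---|---|---|---|---|
| 1 | 82 | 115 | 6 | 495 | 1 |
| 2 | 91 | 50 | 4 | 470 | 2 |
| 3 | 75 | 180 | 8 | 520 | 3 |
| 4 | 98 | 50 | 4 | 520 | 4 |
| 5 | 39 | 180 | 8 | 470 | 5 |
| 6 | 77 | 115 | 6 | 495 | 1 |
| 7 | 95 | 50 | 8 | 520 | 6 |
| 8 | 61 | 180 | 4 | 470 | 7 |
| 9 | 81 | 115 | 6 | 495 | 1 |
| 10 | 76 | 50 | 8 | 470 | 8 |
| 11 | 92 | 180 | 4 | 520 | 9 |
| 12 | 82 | 115 | 6 | 495 | 1 |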

Twelve runs were made on 9 process setups (corresponding to combinations of levels of the flow, ratio, and temperature factors). We consider the analysis based on a cell means model

$y_i = \mu_{j(i)} + \varepsilon_i$, for $i = 1, 2, \ldots, 12$,

where $\mu_1, \mu_2, \ldots, \mu_9$ are unknown parameters (the 9 mean responses for the different setups of the process), $\varepsilon_i \sim N(0, \sigma^2)$, and we use the notation $j(i)$ for the setup number employed in the $i$th run of the process. (For example, when $i = 7$ for the 7th run, $j(7) = 6$ indicates that setup 6 was used.) Note that setup #1 is a “center point” for the set of (flow, ratio, temperature) combinations in the data set. The other 8 setups form a $2 \times 2 \times 2$ factorial structure.

(a) Find a 90% confidence interval for in the above model.


(b) Find a 95% confidence interval for the flow main effect (i.e., the difference in mean response between the high and low flow levels, averaged over all treatment combinations of the other two factors).

(c) Find a 95% prediction interval for an additional observation under process setup #2 under this model.


11. In a study meant to determine the variability of diameters of widgets produced on a manufacturing line, an engineer measures $m = 10$ widgets produced on the line once each. Then the engineer measures the diameter of an 11th widget $n = 8$ times. Suppose one models a measured widget diameter, $y$, as

$y = x + \varepsilon$

where $x$ is the true diameter of the particular widget and $\varepsilon$ is measurement error, for $x \sim N(\mu_x, \sigma_x^2)$ independent of $\varepsilon \sim N(0, \sigma^2)$. With $s_y^2$ the sample variance of the measurements on the 10 widgets, $E(s_y^2) = \sigma_x^2 + \sigma^2$, and with $s^2$ the sample variance of the repeated measurements on the 11th widget, $E(s^2) = \sigma^2$. If the engineer observes $s_y = 0.05$ mm and $s = 0.01$ mm, find approximate 90% confidence limits for $\sigma_x$. (Hint: Use the Cochran-Satterthwaite approximation. Plug in the values for calculating the confidence limits. No need to calculate actual critical values if the calculated d.f. is not an integer.)
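The Cochran-Satterthwaite step in the hint can be sketched numerically. The code below assumes the estimator $\hat{\sigma}_x^2 = s_y^2 - s^2$ with approximate degrees of freedom $\hat{\nu} = (s_y^2 - s^2)^2 / \left[ (s_y^2)^2/(m-1) + (s^2)^2/(n-1) \right]$. The helper `approx_limits` is hypothetical, and the chi-square quantiles passed to it are table values for 8 d.f. used only as a rough stand-in for the non-integer d.f.

```python
from math import sqrt

s_y, s = 0.05, 0.01   # observed standard deviations (mm)
m, n = 10, 8          # widgets measured once; repeat measurements on the 11th widget

# Point estimate of the widget-to-widget variance.
var_x_hat = s_y**2 - s**2

# Cochran-Satterthwaite approximate degrees of freedom for var_x_hat.
df_hat = var_x_hat**2 / ((s_y**2) ** 2 / (m - 1) + (s**2) ** 2 / (n - 1))

def approx_limits(chi2_lower, chi2_upper):
    """Approximate limits for sigma_x from lower/upper chi-square quantiles at df_hat."""
    low = sqrt(df_hat * var_x_hat / chi2_upper)
    high = sqrt(df_hat * var_x_hat / chi2_lower)
    return low, high

# Rough illustration only: 5% and 95% chi-square points for 8 d.f. from a standard table.
low, high = approx_limits(2.733, 15.507)
print(f"var_x_hat = {var_x_hat:.4f}, df = {df_hat:.2f}, limits about ({low:.3f}, {high:.3f}) mm")
```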


12. Suppose the relationship between the mean of a normally distributed response variable $y$ and a continuous explanatory variable $x$ is known to be linear with intercept 0. The slope of the linear relationship is unknown and may depend on conditions that can be controlled by a researcher. The variability of $y$ for a given value of $x$ is unknown but is assumed to be the same for all values of $x$ and for all conditions. Suppose the researcher conducted an experiment involving 6 independent trials (carried out in random order) and obtained the following “data”:

| Trial | Condition | $y$ | $x$ |
|---|---|---|---|
| 1 | 1 | 0 | 1 |
| 2 | 1 | 5 | 2 |
| 3 | 1 | 6 | 3 |
| 4 | 2 | 3 | 1 |
| 5 | 2 | 1 | 2 |
| 6 | 2 | 3 | 3 |

(a) The researcher would like to know if the slope for the relationship between the mean of $y$ and $x$ is the same under Condition 1 as it is under Condition 2. Test the researcher’s question at α = 0.05.
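A minimal sketch of the equal-slopes test for part (a), assuming the third column of the data table is the response and the fourth is the explanatory variable (which takes the values 1, 2, 3). Each condition gets its own through-the-origin slope, the error variance is pooled, and the critical value is a standard table value rather than something stated in the exam.

```python
from math import sqrt

# (condition, response, explanatory) for the six trials; the column roles are an assumption.
trials = [(1, 0, 1), (1, 5, 2), (1, 6, 3), (2, 3, 1), (2, 1, 2), (2, 3, 3)]

def fit_through_origin(condition):
    """Least-squares slope with no intercept, plus sum of x^2 and residual sum of squares."""
    pts = [(y, x) for c, y, x in trials if c == condition]
    sxx = sum(x * x for _, x in pts)
    slope = sum(x * y for y, x in pts) / sxx
    sse = sum((y - slope * x) ** 2 for y, x in pts)
    return slope, sxx, sse

b1, sxx1, sse1 = fit_through_origin(1)
b2, sxx2, sse2 = fit_through_origin(2)

df_error = len(trials) - 2            # two slopes estimated, no intercepts
mse = (sse1 + sse2) / df_error
t_stat = (b1 - b2) / sqrt(mse * (1 / sxx1 + 1 / sxx2))

T_CRIT = 2.776                        # t_{0.025, 4} from a standard t table
reject_equal_slopes = abs(t_stat) > T_CRIT
print(f"b1 = {b1}, b2 = {b2}, t = {t_stat:.3f}, reject: {reject_equal_slopes}")
```

The squared statistic is the F statistic on (1, 4) degrees of freedom referred to in the hint for part (b); its noncentrality grows with the denominator terms $\sum x^2$ in each condition.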


(b) Suppose the researcher would like to repeat the experiment described here. Once again 6 independent trials will be used. Any trial can be conducted under Condition 1 or Condition 2. The value of $x$ may be set at any value in {1, 2, 3} for each trial. Recommend a design to the researcher that will maximize the power for detecting a difference between the slopes under Condition 1 and Condition 2. For each trial, state the condition and the value of the variable $x$ that you recommend. (Hint: find the non-centrality parameter for the F-test statistic.)

| Trial | Condition | $x$ |
|---|---|---|
| 1 | ? | ? |
| 2 | ? | ? |
| 3 | ? | ? |
| 4 | ? | ? |
| 5 | ? | ? |
| 6 | ? | ? |


Standard Normal Probability Distribution, P(Z < z)

| z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | 0.5199 | 0.5239 | 0.5279 | 0.5319 | 0.5359 |
| 0.1 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | 0.5596 | 0.5636 | 0.5675 | 0.5714 | 0.5753 |
| 0.2 | 0.5793 | 0.5832 | 0.5871 | 0.5910 | 0.5948 | 0.5987 | 0.6026 | 0.6064 | 0.6103 | 0.6141 |
| 0.3 | 0.6179 | 0.6217 | 0.6255 | 0.6293 | 0.6331 | 0.6368 | 0.6406 | 0.6443 | 0.6480 | 0.6517 |
| 0.4 | 0.6554 | 0.6591 | 0.6628 | 0.6664 | 0.6700 | 0.6736 | 0.6772 | 0.6808 | 0.6844 | 0.6879 |
| 0.5 | 0.6915 | 0.6950 | 0.6985 | 0.7019 | 0.7054 | 0.7088 | 0.7123 | 0.7157 | 0.7190 | 0.7224 |
| 0.6 | 0.7257 | 0.7291 | 0.7324 | 0.7357 | 0.7389 | 0.7422 | 0.7454 | 0.7486 | 0.7517 | 0.7549 |
| 0.7 | 0.7580 | 0.7611 | 0.7642 | 0.7673 | 0.7704 | 0.7734 | 0.7764 | 0.7794 | 0.7823 | 0.7852 |
| 0.8 | 0.7881 | 0.7910 | 0.7939 | 0.7967 | 0.7995 | 0.8023 | 0.8051 | 0.8078 | 0.8106 | 0.8133 |
| 0.9 | 0.8159 | 0.8186 | 0.8212 | 0.8238 | 0.8264 | 0.8289 | 0.8315 | 0.8340 | 0.8365 | 0.8389 |
| 1.0 | 0.8413 | 0.8438 | 0.8461 | 0.8485 | 0.8508 | 0.8531 | 0.8554 | 0.8577 | 0.8599 | 0.8621 |
| 1.1 | 0.8643 | 0.8665 | 0.8686 | 0.8708 | 0.8729 | 0.8749 | 0.8770 | 0.8790 | 0.8810 | 0.8830 |
| 1.2 | 0.8849 | 0.8869 | 0.8888 | 0.8907 | 0.8925 | 0.8944 | 0.8962 | 0.8980 | 0.8997 | 0.9015 |
| 1.3 | 0.9032 | 0.9049 | 0.9066 | 0.9082 | 0.9099 | 0.9115 | 0.9131 | 0.9147 | 0.9162 | 0.9177 |
| 1.4 | 0.9192 | 0.9207 | 0.9222 | 0.9236 | 0.9251 | 0.9265 | 0.9279 | 0.9292 | 0.9306 | 0.9319 |
| 1.5 | 0.9332 | 0.9345 | 0.9357 | 0.9370 | 0.9382 | 0.9394 | 0.9406 | 0.9418 | 0.9429 | 0.9441 |
| 1.6 | 0.9452 | 0.9463 | 0.9474 | 0.9484 | 0.9495 | 0.9505 | 0.9515 | 0.9525 | 0.9535 | 0.9545 |
| 1.7 | 0.9554 | 0.9564 | 0.9573 | 0.9582 | 0.9591 | 0.9599 | 0.9608 | 0.9616 | 0.9625 | 0.9633 |
| 1.8 | 0.9641 | 0.9649 | 0.9656 | 0.9664 | 0.9671 | 0.9678 | 0.9686 | 0.9693 | 0.9699 | 0.9706 |
| 1.9 | 0.9713 | 0.9719 | 0.9726 | 0.9732 | 0.9738 | 0.9744 | 0.9750 | 0.9756 | 0.9761 | 0.9767 |
| 2.0 | 0.9772 | 0.9778 | 0.9783 | 0.9788 | 0.9793 | 0.9798 | 0.9803 | 0.9808 | 0.9812 | 0.9817 |
| 2.1 | 0.9821 | 0.9826 | 0.9830 | 0.9834 | 0.9838 | 0.9842 | 0.9846 | 0.9850 | 0.9854 | 0.9857 |
| 2.2 | 0.9861 | 0.9864 | 0.9868 | 0.9871 | 0.9875 | 0.9878 | 0.9881 | 0.9884 | 0.9887 | 0.9890 |
| 2.3 | 0.9893 | 0.9896 | 0.9898 | 0.9901 | 0.9904 | 0.9906 | 0.9909 | 0.9911 | 0.9913 | 0.9916 |
| 2.4 | 0.9918 | 0.9920 | 0.9922 | 0.9925 | 0.9927 | 0.9929 | 0.9931 | 0.9932 | 0.9934 | 0.9936 |
| 2.5 | 0.9938 | 0.9940 | 0.9941 | 0.9943 | 0.9945 | 0.9946 | 0.9948 | 0.9949 | 0.9951 | 0.9952 |
| 2.6 | 0.9953 | 0.9955 | 0.9956 | 0.9957 | 0.9959 | 0.9960 | 0.9961 | 0.9962 | 0.9963 | 0.9964 |
| 2.7 | 0.9965 | 0.9966 | 0.9967 | 0.9968 | 0.9969 | 0.9970 | 0.9971 | 0.9972 | 0.9973 | 0.9974 |
| 2.8 | 0.9974 | 0.9975 | 0.9976 | 0.9977 | 0.9977 | 0.9978 | 0.9979 | 0.9979 | 0.9980 | 0.9981 |
| 2.9 | 0.9981 | 0.9982 | 0.9982 | 0.9983 | 0.9984 | 0.9984 | 0.9985 | 0.9985 | 0.9986 | 0.9986 |
| 3.0 | 0.9987 | 0.9987 | 0.9987 | 0.9988 | 0.9988 | 0.9989 | 0.9989 | 0.9989 | 0.9990 | 0.9990 |
| 3.1 | 0.9990 | 0.9991 | 0.9991 | 0.9991 | 0.9992 | 0.9992 | 0.9992 | 0.9992 | 0.9993 | 0.9993 |
| 3.2 | 0.9993 | 0.9993 | 0.9994 | 0.9994 | 0.9994 | 0.9994 | 0.9994 | 0.9995 | 0.9995 | 0.9995 |
| 3.3 | 0.9995 | 0.9995 | 0.9995 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9997 |
| 3.4 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9998 |


Critical values for t-distribution


Critical values for chi-square distribution


Critical values for F-distribution


DEPARTMENT OF MATHEMATICS & STATISTICS

GRADUATE PROGRAM

QUALIFYING EXAMINATION I: APPLIED STATISTICS

Saturday January 23, 2016

Examiners: Dr. L. Lu, and Dr. C. P. Tsokos

Answer any seven questions completely; at least one must be from Part B.

PART A: Statistical Analysis

1. Biologists studying sandflies at a site in eastern Panama looked at the relationship between gender and the height at which the flies were caught in a light trap, resulting in the following data:

| Gender | 3 feet above ground | 35 feet above ground | Row Total |
|---|---|---|---|
| Males | 173 | 125 | 298 |
| Females | 150 | 73 | 223 |
| Column Total | 323 | 198 | 521 |

Set up the hypotheses to test whether the gender of the sandflies and trap height are associated. Conduct an appropriate test at α = 0.05 and draw a conclusion.
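A Pearson chi-square test of independence is one appropriate choice here. The sketch below computes the statistic directly from the counts (males 173 and 125, females 150 and 73 at the two trap heights); the critical value is taken from a standard chi-square table and the variable names are illustrative.

```python
# Observed counts: rows = gender (males, females), columns = trap height (3 ft, 35 ft).
observed = [[173, 125], [150, 73]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / n under independence.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_sq += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_VALUE = 3.841  # upper 5% point of chi-square with 1 d.f., from a standard table
reject_independence = chi_sq > CRITICAL_VALUE
print(f"chi-square = {chi_sq:.3f} on {df} d.f.; reject H0: {reject_independence}")
```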


2. To determine whether glaucoma affects the corneal thickness, measurements were made in 8 people affected by glaucoma in one eye but not in the other. The measured corneal thicknesses (in microns) were as follows:

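| Person | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Mean | SD |
|---|---|---|---|---|---|---|---|---|---|---|
| Eye affected by glaucoma | 488 | 478 | 480 | 426 | 440 | 410 | 458 | 460 | 455 | 27.7 |
| Eye not affected by glaucoma | 484 | 478 | 492 | 444 | 436 | 398 | 464 | 476 | 459 | 31.3 |
| Difference | 4 | 0 | -12 | -18 | 4 | 12 | -6 | -16 | -4 | 10.7 |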

(a) Assume the corneal thickness measurements are normally distributed for all eyes affected or not affected by glaucoma. Do you think the data suggest the average corneal thickness is affected by glaucoma at a significance level of 0.1?

(b) Find a 90% prediction interval for the difference in the corneal thickness between two eyes for a future patient with only one eye affected by glaucoma.
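Because both eyes belong to the same person, a paired analysis of the differences is the natural approach. The sketch below runs a paired t-test and forms the prediction interval $\bar{d} \pm t \, s_d \sqrt{1 + 1/n}$; the critical value 1.895 is the table value of $t_{0.05, 7}$ and is not stated in the exam.

```python
from math import sqrt
from statistics import mean, stdev

affected = [488, 478, 480, 426, 440, 410, 458, 460]
unaffected = [484, 478, 492, 444, 436, 398, 464, 476]
diffs = [a - u for a, u in zip(affected, unaffected)]

n = len(diffs)
d_bar = mean(diffs)   # mean difference (affected minus unaffected)
s_d = stdev(diffs)    # sample standard deviation of the differences

T_CRIT = 1.895        # t_{0.05, 7}: two-sided test at 0.10 and 90% intervals

# (a) Paired t-test of H0: mean difference = 0.
t_stat = d_bar / (s_d / sqrt(n))
reject = abs(t_stat) > T_CRIT

# (b) 90% prediction interval for one future patient's difference.
half_width = T_CRIT * s_d * sqrt(1 + 1 / n)
pi_low, pi_high = d_bar - half_width, d_bar + half_width
print(f"t = {t_stat:.3f}, reject: {reject}, PI = ({pi_low:.1f}, {pi_high:.1f})")
```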


3. Calls to the technical support service of a software company are monitored on a sampling basis for quality assurance. Each monitored call is classified as satisfactory or unsatisfactory by the supervisor in terms of the quality of help offered. For a new trainee, a random sample of calls will be monitored for one month to assess the quality of help he or she offers. Additional training is required if the proportion of unsatisfactory calls during a month is more than 10%.

(a) How many calls need to be sampled in order to obtain an estimate of the proportion of unsatisfactory calls during the month within 3% at a 95% confidence level?

(b) Suppose a random sample of 200 calls was monitored over the month for a new trainee and 30 calls were classified as unsatisfactory. Would you suggest additional training for the new trainee at the significance level α = 0.05?

(c) What is the probability that additional training is judged to be unnecessary at α = 0.05 when in fact the proportion of unsatisfactory calls is 15% for the new trainee?
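The three parts can be sketched with the large-sample normal approximation for a proportion. The code assumes a one-sided test of $H_0: p = 0.10$ against $H_a: p > 0.10$ with $n = 200$ in part (c), and shows two possible planning values of $p$ for part (a), since the exam does not say which one to use.

```python
from math import ceil, sqrt
from statistics import NormalDist

std_normal = NormalDist()

# (a) Sample size for a margin of error of 0.03 at 95% confidence.
margin = 0.03
z_975 = std_normal.inv_cdf(0.975)
n_conservative = ceil((z_975 / margin) ** 2 * 0.5 * 0.5)    # worst case p = 0.5
n_planning_10 = ceil((z_975 / margin) ** 2 * 0.10 * 0.90)   # if p is believed to be near 0.10

# (b) One-sided z-test with 30 unsatisfactory calls out of 200.
p0, n, unsatisfactory = 0.10, 200, 30
p_hat = unsatisfactory / n
se_null = sqrt(p0 * (1 - p0) / n)
z_stat = (p_hat - p0) / se_null
z_crit = std_normal.inv_cdf(0.95)
needs_training = z_stat > z_crit

# (c) Type II error: probability of not rejecting H0 when the true proportion is 0.15.
p_true = 0.15
cutoff = p0 + z_crit * se_null                  # reject H0 when p_hat exceeds this
se_true = sqrt(p_true * (1 - p_true) / n)
beta = std_normal.cdf((cutoff - p_true) / se_true)
print(n_conservative, n_planning_10, round(z_stat, 3), needs_training, round(beta, 4))
```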


4. Members of the New York Choral Society are organized according to vocal range. Among male singers, the parts from lowest to highest pitch are Bass 1, Bass 2, Tenor 1, and Tenor 2. The heights of male singers are summarized in the table below.

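Heights in Inches

| Part | Sample size ($n_i$) | Sample mean ($\bar{Y}_i$) | Sample variance ($s_i^2$) |
|---|---|---|---|
| Bass 1 | 31 | 70.84 | 5.2065 |
| Bass 2 | 31 | 71.38 | 7.3828 |
| Tenor 1 | 31 | 69.00 | 5.8667 |
| Tenor 2 | 31 | 69.70 | 3.8258 |
| Total | 124 | $\bar{Y} = 70.16$ | $s_p^2 = 5.5704$ |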

(a) Complete the ANOVA table below and test whether there are significant differences in height among singers in different parts at α = 0.05.

| Source of Variance | Degrees of Freedom | Sum of Squares | Mean Squares | F-Ratio |
|---|---|---|---|---|
| Part | | 92.32 | | |
| Error | | 668.45 | | |

(b) Use Tukey’s method to do all pairwise comparisons between the average height of singers in different parts at α = 0.05 and summarize your result. (Use $q_{0.05, 4, 120} = 3.685$.)


(c) Test whether there is any significant difference between the average height of male singers in low-pitch parts (Bass 1 & 2) and those in high-pitch parts (Tenor 1 & 2) at α = 0.05.
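The three parts of this question share the same error mean square, so they can be sketched together. The code uses the sums of squares 92.32 (Part) and 668.45 (Error), the four sample means, 31 singers per part, and the studentized range value 3.685 from the question; the contrast in part (c) is taken to be the average of the two Bass means minus the average of the two Tenor means.

```python
from itertools import combinations
from math import sqrt

means = {"Bass 1": 70.84, "Bass 2": 71.38, "Tenor 1": 69.00, "Tenor 2": 69.70}
n_per_group = 31
ss_part, ss_error = 92.32, 668.45

# (a) One-way ANOVA table entries.
df_part = len(means) - 1
df_error = len(means) * (n_per_group - 1)
ms_part = ss_part / df_part
ms_error = ss_error / df_error
f_ratio = ms_part / ms_error

# (b) Tukey's honestly significant difference for equal group sizes.
Q_CRIT = 3.685
hsd = Q_CRIT * sqrt(ms_error / n_per_group)
significant_pairs = [
    (a, b) for a, b in combinations(means, 2) if abs(means[a] - means[b]) > hsd
]

# (c) Contrast with coefficients (1/2, 1/2, -1/2, -1/2).
contrast = (means["Bass 1"] + means["Bass 2"]) / 2 - (means["Tenor 1"] + means["Tenor 2"]) / 2
se_contrast = sqrt(ms_error * 4 * 0.5**2 / n_per_group)
t_contrast = contrast / se_contrast

print(f"F = {f_ratio:.2f} on ({df_part}, {df_error}) d.f.; HSD = {hsd:.3f}")
print(significant_pairs, f"contrast t = {t_contrast:.2f}")
```

For reference, standard tables give roughly 2.68 for the upper 5% point of F on (3, 120) d.f. and 1.98 for $t_{0.025, 120}$; both are outside values, not part of the exam sheet.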

5. In a study for developing a model to predict the college GPA of matriculating freshmen based on their college entrance verbal and mathematics test scores, data were collected for a random sample of 40 graduating seniors: their college GPA ($y$) along with their college entrance verbal test score ($x_1$) and mathematics test score ($x_2$), expressed as percentiles.

(a) A linear regression model is fitted to these data. The parameter estimates and their standard errors are summarized below. The SSE for the linear fit is 5.9876. Calculate 95% CI’s for the coefficients of the verbal and mathematics test scores. Based on the CI’s, should any of the variables be excluded from the model?

| Parameter | Estimate | SE |
|---|---|---|
| $\beta_0$ | -1.571 | 0.494 |
| $\beta_1$ | 0.026 | 0.004 |
| $\beta_2$ | 0.034 | 0.005 |
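The intervals in part (a) follow the usual estimate ± t × SE form with $40 - 3 = 37$ error degrees of freedom. The critical value 2.026 below is an approximate table value for $t_{0.025, 37}$, supplied as an assumption since the exam's t table is not reproduced here.

```python
n_obs, n_coefficients = 40, 3
df_error = n_obs - n_coefficients   # 37 error degrees of freedom
T_CRIT = 2.026                      # approximate t_{0.025, 37} (assumed table value)

# (estimate, standard error) for the verbal and mathematics score coefficients.
estimates = {"verbal": (0.026, 0.004), "mathematics": (0.034, 0.005)}

intervals = {
    name: (est - T_CRIT * se, est + T_CRIT * se) for name, (est, se) in estimates.items()
}
# A coefficient is a candidate for exclusion only if its interval contains zero.
contains_zero = {name: low <= 0.0 <= high for name, (low, high) in intervals.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: ({low:.4f}, {high:.4f}), contains zero: {contains_zero[name]}")
```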


(b) A quadratic model of the form is then fitted to these data. Complete the sequential ANOVA table below:

Models SSE DF MS

5.2549

7.5311

3.6434

1.0552

0.0982

Residuals 1.1908

Based on the sequential ANOVA table in (b), construct a test of $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$ and draw a conclusion at α = 0.01, if that is possible from the information provided. If not, indicate how you would construct the desired test.

(c) Construct a test of $H_0: \beta_3 = \beta_4 = \beta_5 = 0$ and draw a conclusion at α = 0.01, if that is possible from the information provided. If not, indicate how you would construct the desired test.


(d) Construct a test of $H_0: \beta_5 = 0$ and draw a conclusion at α = 0.01, if that is possible from the information provided. If not, indicate how you would construct the desired test.

6. Some data were collected from a study of a natural compound that may help regulate blood cholesterol levels in rats. A total of 128 rats were randomly assigned to one of 7 diets differing only in the dose of the natural compound. A second-order (quadratic) model in dose was fitted to the log-transformed response (blood cholesterol), with normally distributed errors of mean 0. The parameter estimates and their variance-covariance matrix are given below:

$\hat{\beta} = \begin{pmatrix} 5.217 \\ 0.162 \\ 0.226 \end{pmatrix}$ and $\widehat{\mathrm{Var}}(\hat{\beta}) = \begin{pmatrix} 0.00206 & 0.00357 & 0.00144 \\ 0.00357 & 0.00988 & 0.00518 \\ 0.00144 & 0.00518 & 0.00306 \end{pmatrix}$

(a) Test whether the dose-response model (describing the expected log-cholesterol level as a function of the dose of a drug) is better fit by a straight line or a quadratic curve at α = 0.05.

(b) Use the following results from fitting a variety of regression models to test whether there is any evidence of lack of fit of the quadratic model at α = 0.05.

Model Error Sum-of-squares

5.876

4.527

3.996

3.870

3.819


(c) Assuming that a quadratic dose-response model is chosen, estimate the dose value that produces the largest expected response and its approximate standard error.

7. A randomized clinical trial was conducted to study a proposed new drug for reducing blood pressure. Volunteers were randomly assigned to one of two treatments: take the new drug for 3 months or take a placebo for 3 months. Blood pressure for each subject was recorded at the start of the experiment and again after 3 months. The relevant variables are defined as:

Post-BP: blood pressure for the subject at the end of the study (post-treatment)
Pre-BP: blood pressure for the subject at the beginning of the study (pre-treatment)
Treatment: 1 if the subject received the new drug, 0 if the subject received the placebo


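A total of 181 subjects participated in the study: 98 subjects received the new drug and 83 subjects received the placebo. The average pre-treatment and post-treatment blood pressures for all 181 subjects are 85.7 and 59.1, respectively. First, a regression model for post-treatment blood pressure with an intercept, pre-treatment blood pressure, and the treatment indicator as predictors, and with $N(0, \sigma^2)$ errors, was fitted using ordinary least squares. The results are given below:

Coefficient estimates and their variance-covariance matrix:

| Coefficient | Estimate | Var-Cov: Intercept | Var-Cov: Pre-BP | Var-Cov: Treatment |
|---|---|---|---|---|
| Intercept | 26.17 | 52.47 | -0.531 | -5.008 |
| Pre-BP | 0.391 | -0.531 | 0.00594 | -0.000847 |
| Treatment | 3.78 | -5.008 | -0.000847 | 9.263 |

ANOVA table:

| Source of Variation | Degrees of Freedom | Sum of Squares |
|---|---|---|
| Pre-BP | | 10731.0 |
| Treatment | | 643.1 |
| Error | | 74099.99 |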

(a) Find a 95% confidence interval for the effect of the drug (difference in post-treatment blood pressure between subjects receiving the drug and those receiving the placebo) for subjects with a pre-treatment blood pressure of 100. (Use $t_{178, 0.025} = 1.973$.)

(b) Find a 95% prediction interval for the post-treatment blood pressure of a subject who has a pre-treatment blood pressure of 100 and received the drug, given that the standard error of the mean prediction for all subjects who have a pre-treatment blood pressure of 100 and received the drug is 2.217.


(c) The investigators are also concerned whether the effect of the drug is the same at all blood pressures, i.e., whether the slope of the relationship between post-treatment and pre-treatment blood pressure is the same in the “drug” and “placebo” groups. Some additional models are fitted to the data. The error sums of squares for the original model and the additional models are given in the table below:

Models Error Sum of Squares

74,099.99

74,080.15

74,080.50

73,272.78

Conduct a test for equal slopes in the “drug” and “placebo” groups at α = 0.05.

8. In an experiment designed for studying children’s memory, a random sample of 36 fourth-graders from a school were included in the experiment. Two design factors are considered: the level of reinforcement (none or verbal) and time of isolation (20, 40, or 60 minutes). Students participating in the study were told to memorize a paragraph and given positive verbal reinforcement or no reinforcement while learning it according to their treatment assignment. Then students were isolated for the specified amount of time. There were 6 students randomly assigned to each of the six treatment groups. The response is a score measuring the student’s memory for the learned paragraph.

(a) Complete the ANOVA table below. Which effects are significant at the 0.05 level?

| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Squares |
|---|---|---|---|
| Reinforcement | | 117.36 | |
| Isolation time | | 141.56 | |
| Interaction | | 916.22 | |
| Error | | 401.17 | |
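Filling in the table is mostly bookkeeping for a balanced 2 × 3 factorial with 6 students per cell. The sketch below derives the degrees of freedom from the design, then the mean squares and F ratios from the sums of squares 117.36, 141.56, 916.22, and 401.17; the F critical values are standard table values for 30 denominator d.f. and are assumptions, not part of the exam text.

```python
levels_reinforcement, levels_isolation, per_cell = 2, 3, 6

ss = {"Reinforcement": 117.36, "Isolation time": 141.56, "Interaction": 916.22, "Error": 401.17}
df = {
    "Reinforcement": levels_reinforcement - 1,
    "Isolation time": levels_isolation - 1,
    "Interaction": (levels_reinforcement - 1) * (levels_isolation - 1),
    "Error": levels_reinforcement * levels_isolation * (per_cell - 1),
}

ms = {source: ss[source] / df[source] for source in ss}
f_ratio = {source: ms[source] / ms["Error"] for source in ss if source != "Error"}

# Upper 5% points of F with (1, 30) and (2, 30) d.f., keyed by numerator d.f.
F_CRIT = {1: 4.17, 2: 3.32}
significant = [source for source in f_ratio if f_ratio[source] > F_CRIT[df[source]]]

for source in f_ratio:
    print(f"{source}: df = {df[source]}, MS = {ms[source]:.2f}, F = {f_ratio[source]:.2f}")
print("Significant at 0.05:", significant)
```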


(b) Consider the following estimates of differences in the average memory scores:

A. verbal and no reinforcement, averaged over the three isolation times
B. verbal and no reinforcement, only for the 20-minute isolation time
C. 20- and 40-minute isolation times, averaged over the two reinforcement levels
D. interaction between the 20- and 40-minute isolation times and the reinforcement levels

Which of the four contrasts has the largest standard error? Which has the smallest standard error? Each of the four estimates can lead to a t-test for an “interesting difference”. Assuming the “interesting difference” is the same value for all four tests, which test is most powerful? Which is least powerful?


9. In a pig breeding study, two offspring from each of ten litters were measured for average daily weight gain. The individual pig measurements can be modeled in the form of

$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$

where $\mu$ is the overall mean, and $\alpha_i \sim N(0, \sigma_\alpha^2)$ is the litter random effect, which is independent of the individual pig effect $\varepsilon_{ij} \sim N(0, \sigma^2)$.

(a) Complete the ANOVA table below and obtain point estimates of the litter and individual pig variance components.

| Source | d.f. | SS | MS | Expected MS |
|---|---|---|---|---|
| Litters | | 0.6705 | | |
| Pigs within litters | | 0.3834 | | |
| Total | | 1.0539 | | |

(b) Suppose the investigator intends to select a pig at random from the population and measure its average daily weight gain. Estimate the variance associated with this quantity as well as the fraction of this variance that is due to genetic factors (i.e., litter).


(c) Calculate the 95% CI for the overall average daily weight gain for all offspring from the population. The overall sample mean of the weight gain is 2.572 kg.

(d) Suppose the investigator now intends to design an experiment where the treatments are assigned to mothers (i.e. litters), but the response is observed on pigs to study the effects of some drugs. The quantity of interest is the treatment means, averaged over litters and pigs at each treatment level. The investigator can choose between the following three designs:

A. 2 litters, 4 pigs per litter at each treatment level;
B. 4 litters, 2 pigs per litter at each treatment level;
C. 8 litters, 1 pig per litter at each treatment level.

Assume the variance components estimated in (a) are appropriate for this new study. Calculate the standard error of the treatment means for each of the three designs. Which design gives the most precise estimate of the treatment mean?
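A sketch of the variance-component arithmetic behind parts (a), (b), and (d), using the sums of squares 0.6705 (litters) and 0.3834 (pigs within litters) from ten litters of two pigs. It assumes the standard method-of-moments estimators for a balanced one-way random effects model, and that a treatment mean over $l$ litters with $p$ pigs each has variance (litter variance)/$l$ + (pig variance)/($lp$); the function name is illustrative.

```python
from math import sqrt

litters, pigs_per_litter = 10, 2
ss_litters, ss_within = 0.6705, 0.3834

df_litters = litters - 1
df_within = litters * (pigs_per_litter - 1)
ms_litters = ss_litters / df_litters
ms_within = ss_within / df_within

# Method-of-moments estimates: E(MS within) = pig variance,
# E(MS litters) = pig variance + pigs_per_litter * litter variance.
var_pig = ms_within
var_litter = (ms_litters - ms_within) / pigs_per_litter

# (b) Variance of one randomly chosen pig and the share attributable to litter.
total_var = var_litter + var_pig
litter_fraction = var_litter / total_var

# (d) Standard error of a treatment mean for each candidate design.
def se_treatment_mean(n_litters, n_pigs):
    return sqrt(var_litter / n_litters + var_pig / (n_litters * n_pigs))

designs = {"A": (2, 4), "B": (4, 2), "C": (8, 1)}
standard_errors = {name: se_treatment_mean(*shape) for name, shape in designs.items()}
most_precise = min(standard_errors, key=standard_errors.get)
print(round(var_litter, 5), round(var_pig, 5), round(litter_fraction, 3), standard_errors, most_precise)
```

All three designs use 8 pigs per treatment, so the pig-level term is identical and the comparison is driven entirely by how many litters share the litter-level variance.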


Part B: Linear Statistical Model

10. This question concerns the analysis of a small set of data on the operation of a Butane Hydrogenolysis Reactor. The response variable is the percent conversion of butane, which is to be estimated as a function of the chemical reactor process variables: total feed flow (cc/sec at STP), feed ratio (Hydrogen/Butane), and the reactor wall temperature.

The data are shown in the following table.

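| Run | Conversion (%) | Flow | Ratio | Temp | Setup |
|---|---|---|---|---|---|
| 1 | 82 | 115 | 6 | 495 | 1 |
| 2 | 91 | 50 | 4 | 470 | 2 |
| 3 | 75 | 180 | 8 | 520 | 3 |
| 4 | 98 | 50 | 4 | 520 | 4 |
| 5 | 39 | 180 | 8 | 470 | 5 |
| 6 | 77 | 115 | 6 | 495 | 1 |
| 7 | 95 | 50 | 8 | 520 | 6 |
| 8 | 61 | 180 | 4 | 470 | 7 |
| 9 | 81 | 115 | 6 | 495 | 1 |
| 10 | 76 | 50 | 8 | 470 | 8 |
| 11 | 92 | 180 | 4 | 520 | 9 |
| 12 | 82 | 115 | 6 | 495 | 1 |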

Twelve runs were made on 9 process setups (corresponding to combinations of levels of the flow, ratio, and temperature factors). We consider the analysis based on a cell means model

$y_i = \mu_{j(i)} + \varepsilon_i$, for $i = 1, 2, \ldots, 12$,

where $\mu_1, \mu_2, \ldots, \mu_9$ are unknown parameters (the 9 mean responses for the different setups of the process), $\varepsilon_i \sim N(0, \sigma^2)$, and we use the notation $j(i)$ for the setup number employed in the $i$th run of the process. (For example, when $i = 7$ for the 7th run, $j(7) = 6$ indicates that setup 6 was used.) Note that setup #1 is a “center point” for the set of (flow, ratio, temperature) combinations in the data set. The other 8 setups form a $2 \times 2 \times 2$ factorial structure.

(a) Find a 90% confidence interval for in the above model.


(b) Find a 95% confidence interval for the flow main effect (i.e., the difference in mean response between the high and low flow levels, averaged over all treatment combinations of the other two factors).

(c) Find a 95% prediction interval for an additional observation under process setup #2 under this model.
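Under a cell means model with 9 setups and 12 runs, the only information about the error variance comes from the four replicated center-point runs, leaving 3 error degrees of freedom. The sketch below works through parts (b) and (c) on that basis, reading the data columns as (conversion, flow, ratio, temperature, setup); the critical value 3.182 is the table value of $t_{0.025, 3}$ and is an outside assumption.

```python
from math import sqrt
from statistics import mean

# (conversion, flow, ratio, temperature, setup) for the 12 runs.
runs = [
    (82, 115, 6, 495, 1), (91, 50, 4, 470, 2), (75, 180, 8, 520, 3),
    (98, 50, 4, 520, 4), (39, 180, 8, 470, 5), (77, 115, 6, 495, 1),
    (95, 50, 8, 520, 6), (61, 180, 4, 470, 7), (81, 115, 6, 495, 1),
    (76, 50, 8, 470, 8), (92, 180, 4, 520, 9), (82, 115, 6, 495, 1),
]

# Group responses by setup and fit the cell means.
by_setup = {}
for y, _flow, _ratio, _temp, setup in runs:
    by_setup.setdefault(setup, []).append(y)
cell_means = {setup: mean(ys) for setup, ys in by_setup.items()}

# Pure-error sum of squares: only replicated setups contribute.
sse = sum((y - cell_means[setup]) ** 2 for setup, ys in by_setup.items() for y in ys)
df_error = len(runs) - len(by_setup)
mse = sse / df_error

T_CRIT = 3.182  # t_{0.025, 3} from a standard table

# (b) Flow main effect: average of the four high-flow cells minus the four low-flow cells.
high = mean(y for y, flow, *_ in runs if flow == 180)
low = mean(y for y, flow, *_ in runs if flow == 50)
flow_effect = high - low
half_ci = T_CRIT * sqrt(mse * (1 / 4 + 1 / 4))

# (c) Prediction interval for a new run at setup 2 (one existing observation there).
half_pi = T_CRIT * sqrt(mse * (1 + 1 / len(by_setup[2])))
pi_setup2 = (cell_means[2] - half_pi, cell_means[2] + half_pi)
print(f"MSE = {mse:.3f} on {df_error} d.f.; flow effect = {flow_effect} +/- {half_ci:.2f}; PI = {pi_setup2}")
```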


11. In a study meant to determine the variability of diameters of widgets produced on a manufacturing line, an engineer measures $m = 10$ widgets produced on the line once each. Then the engineer measures the diameter of an 11th widget $n = 8$ times. Suppose one models a measured widget diameter, $y$, as

$y = x + \varepsilon$

where $x$ is the true diameter of the particular widget and $\varepsilon$ is measurement error, for $x \sim N(\mu_x, \sigma_x^2)$ independent of $\varepsilon \sim N(0, \sigma^2)$. With $s_y^2$ the sample variance of the measurements on the 10 widgets, $E(s_y^2) = \sigma_x^2 + \sigma^2$, and with $s^2$ the sample variance of the repeated measurements on the 11th widget, $E(s^2) = \sigma^2$. If the engineer observes $s_y = 0.05$ mm and $s = 0.01$ mm, find approximate 90% confidence limits for $\sigma_x$. (Hint: Use the Cochran-Satterthwaite approximation. Plug in the values for calculating the confidence limits. No need to calculate actual critical values if the calculated d.f. is not an integer.)


12. Suppose the relationship between the mean of a normally distributed response variable $y$ and a continuous explanatory variable $x$ is known to be linear with intercept 0. The slope of the linear relationship is unknown and may depend on conditions that can be controlled by a researcher. The variability of $y$ for a given value of $x$ is unknown but is assumed to be the same for all values of $x$ and for all conditions. Suppose the researcher conducted an experiment involving 6 independent trials (carried out in random order) and obtained the following “data”:

| Trial | Condition | $y$ | $x$ |
|---|---|---|---|
| 1 | 1 | 0 | 1 |
| 2 | 1 | 5 | 2 |
| 3 | 1 | 6 | 3 |
| 4 | 2 | 3 | 1 |
| 5 | 2 | 1 | 2 |
| 6 | 2 | 3 | 3 |

(a) The researcher would like to know if the slope for the relationship between the mean of $y$ and $x$ is the same under Condition 1 as it is under Condition 2. Test the researcher’s question at α = 0.05.


(b) Suppose the researcher would like to repeat the experiment described here. Once again 6 independent trials will be used. Any trial can be conducted under Condition 1 or Condition 2. The value of $x$ may be set at any value in {1, 2, 3} for each trial. Recommend a design to the researcher that will maximize the power for detecting a difference between the slopes under Condition 1 and Condition 2. For each trial, state the condition and the value of the variable $x$ that you recommend. (Hint: find the non-centrality parameter for the F-test statistic.)

| Trial | Condition | $x$ |
|---|---|---|
| 1 | ? | ? |
| 2 | ? | ? |
| 3 | ? | ? |
| 4 | ? | ? |
| 5 | ? | ? |
| 6 | ? | ? |


Standard Normal Probability Distribution, P(Z < z)

| z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | 0.5199 | 0.5239 | 0.5279 | 0.5319 | 0.5359 |
| 0.1 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | 0.5596 | 0.5636 | 0.5675 | 0.5714 | 0.5753 |
| 0.2 | 0.5793 | 0.5832 | 0.5871 | 0.5910 | 0.5948 | 0.5987 | 0.6026 | 0.6064 | 0.6103 | 0.6141 |
| 0.3 | 0.6179 | 0.6217 | 0.6255 | 0.6293 | 0.6331 | 0.6368 | 0.6406 | 0.6443 | 0.6480 | 0.6517 |
| 0.4 | 0.6554 | 0.6591 | 0.6628 | 0.6664 | 0.6700 | 0.6736 | 0.6772 | 0.6808 | 0.6844 | 0.6879 |
| 0.5 | 0.6915 | 0.6950 | 0.6985 | 0.7019 | 0.7054 | 0.7088 | 0.7123 | 0.7157 | 0.7190 | 0.7224 |
| 0.6 | 0.7257 | 0.7291 | 0.7324 | 0.7357 | 0.7389 | 0.7422 | 0.7454 | 0.7486 | 0.7517 | 0.7549 |
| 0.7 | 0.7580 | 0.7611 | 0.7642 | 0.7673 | 0.7704 | 0.7734 | 0.7764 | 0.7794 | 0.7823 | 0.7852 |
| 0.8 | 0.7881 | 0.7910 | 0.7939 | 0.7967 | 0.7995 | 0.8023 | 0.8051 | 0.8078 | 0.8106 | 0.8133 |
| 0.9 | 0.8159 | 0.8186 | 0.8212 | 0.8238 | 0.8264 | 0.8289 | 0.8315 | 0.8340 | 0.8365 | 0.8389 |
| 1.0 | 0.8413 | 0.8438 | 0.8461 | 0.8485 | 0.8508 | 0.8531 | 0.8554 | 0.8577 | 0.8599 | 0.8621 |
| 1.1 | 0.8643 | 0.8665 | 0.8686 | 0.8708 | 0.8729 | 0.8749 | 0.8770 | 0.8790 | 0.8810 | 0.8830 |
| 1.2 | 0.8849 | 0.8869 | 0.8888 | 0.8907 | 0.8925 | 0.8944 | 0.8962 | 0.8980 | 0.8997 | 0.9015 |
| 1.3 | 0.9032 | 0.9049 | 0.9066 | 0.9082 | 0.9099 | 0.9115 | 0.9131 | 0.9147 | 0.9162 | 0.9177 |
| 1.4 | 0.9192 | 0.9207 | 0.9222 | 0.9236 | 0.9251 | 0.9265 | 0.9279 | 0.9292 | 0.9306 | 0.9319 |
| 1.5 | 0.9332 | 0.9345 | 0.9357 | 0.9370 | 0.9382 | 0.9394 | 0.9406 | 0.9418 | 0.9429 | 0.9441 |
| 1.6 | 0.9452 | 0.9463 | 0.9474 | 0.9484 | 0.9495 | 0.9505 | 0.9515 | 0.9525 | 0.9535 | 0.9545 |
| 1.7 | 0.9554 | 0.9564 | 0.9573 | 0.9582 | 0.9591 | 0.9599 | 0.9608 | 0.9616 | 0.9625 | 0.9633 |
| 1.8 | 0.9641 | 0.9649 | 0.9656 | 0.9664 | 0.9671 | 0.9678 | 0.9686 | 0.9693 | 0.9699 | 0.9706 |
| 1.9 | 0.9713 | 0.9719 | 0.9726 | 0.9732 | 0.9738 | 0.9744 | 0.9750 | 0.9756 | 0.9761 | 0.9767 |
| 2.0 | 0.9772 | 0.9778 | 0.9783 | 0.9788 | 0.9793 | 0.9798 | 0.9803 | 0.9808 | 0.9812 | 0.9817 |
| 2.1 | 0.9821 | 0.9826 | 0.9830 | 0.9834 | 0.9838 | 0.9842 | 0.9846 | 0.9850 | 0.9854 | 0.9857 |
| 2.2 | 0.9861 | 0.9864 | 0.9868 | 0.9871 | 0.9875 | 0.9878 | 0.9881 | 0.9884 | 0.9887 | 0.9890 |
| 2.3 | 0.9893 | 0.9896 | 0.9898 | 0.9901 | 0.9904 | 0.9906 | 0.9909 | 0.9911 | 0.9913 | 0.9916 |
| 2.4 | 0.9918 | 0.9920 | 0.9922 | 0.9925 | 0.9927 | 0.9929 | 0.9931 | 0.9932 | 0.9934 | 0.9936 |
| 2.5 | 0.9938 | 0.9940 | 0.9941 | 0.9943 | 0.9945 | 0.9946 | 0.9948 | 0.9949 | 0.9951 | 0.9952 |
| 2.6 | 0.9953 | 0.9955 | 0.9956 | 0.9957 | 0.9959 | 0.9960 | 0.9961 | 0.9962 | 0.9963 | 0.9964 |
| 2.7 | 0.9965 | 0.9966 | 0.9967 | 0.9968 | 0.9969 | 0.9970 | 0.9971 | 0.9972 | 0.9973 | 0.9974 |
| 2.8 | 0.9974 | 0.9975 | 0.9976 | 0.9977 | 0.9977 | 0.9978 | 0.9979 | 0.9979 | 0.9980 | 0.9981 |
| 2.9 | 0.9981 | 0.9982 | 0.9982 | 0.9983 | 0.9984 | 0.9984 | 0.9985 | 0.9985 | 0.9986 | 0.9986 |
| 3.0 | 0.9987 | 0.9987 | 0.9987 | 0.9988 | 0.9988 | 0.9989 | 0.9989 | 0.9989 | 0.9990 | 0.9990 |
| 3.1 | 0.9990 | 0.9991 | 0.9991 | 0.9991 | 0.9992 | 0.9992 | 0.9992 | 0.9992 | 0.9993 | 0.9993 |
| 3.2 | 0.9993 | 0.9993 | 0.9994 | 0.9994 | 0.9994 | 0.9994 | 0.9994 | 0.9995 | 0.9995 | 0.9995 |
| 3.3 | 0.9995 | 0.9995 | 0.9995 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9997 |
| 3.4 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9998 |


Critical values for t-distribution


Critical values for chi-square distribution


Critical values for F-distribution


DEPARTMENT OF MATHEMATICS & STATISTICS

GRADUATE PROGRAM

QUALIFYING EXAMINATION I: APPLIED STATISTICS

Saturday, September 24, 2016

Examiners: Dr. L. Lu and Dr. C. P. Tsokos

Answer any seven questions completely; at least one must be from Part B.

PART A: Statistical Analysis

1. An 8-inch bar that is clamped at both ends is to be subjected to an increasing amount of stress until it snaps. Let $X$ = the distance from the left end to where the break occurs. Suppose the distribution of $X$ has probability density function given by

$$f(x) = \begin{cases} \dfrac{3}{32}\, x \left(1 - \dfrac{x}{8}\right), & 0 \le x \le 8 \\ 0, & \text{otherwise} \end{cases}$$

(a) Find the median distance from the left end to where the break occurs.

(b) Find the expected length of the shorter segment when the bar is broken.
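Both parts of Question 1 can be checked numerically. The sketch below is not an official solution; it assumes the density is f(x) = (3/32) x (1 - x/8) on [0, 8] (a reading of the formula that integrates to 1), and the helper names `integrate` and `cdf` are illustrative only.

```python
# Sketch: numerical check for the bar-breaking problem.
# Assumption: f(x) = (3/32) * x * (1 - x/8) on [0, 8], zero elsewhere.

def f(x):
    return (3.0 / 32.0) * x * (1.0 - x / 8.0) if 0.0 <= x <= 8.0 else 0.0

def integrate(g, a, b, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3.0

def cdf(x):
    return integrate(f, 0.0, x)

# (a) Median: solve cdf(m) = 0.5 by bisection on [0, 8].
lo, hi = 0.0, 8.0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if cdf(mid) < 0.5:
        lo = mid
    else:
        hi = mid
median = (lo + hi) / 2.0

# (b) Expected length of the shorter piece: E[min(X, 8 - X)].
expected_shorter = integrate(lambda x: min(x, 8.0 - x) * f(x), 0.0, 8.0)

print(round(median, 4), round(expected_shorter, 4))
```

Under the stated density, the symmetry of f about x = 4 gives the median directly, and the expectation in (b) equals 2 times the integral of x f(x) over [0, 4]; the numerical values can be compared with those hand calculations.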


2. Two brands of water filters are to be compared in terms of the mean reduction in impurities measured in parts per million (ppm). Sixteen water samples were tested with each filter and the reduction in the impurity level was measured, resulting in the following data:

Filter 1: $n_1 = 16$, $\bar{x} = 8.0$, $s_1 = 4$
Filter 2: $n_2 = 16$, $\bar{y} = 5.5$, $s_2 = 2.5$

(a) Do you think the data suggest that the two brands of filters differ in the mean reduction in impurities? State and test the relevant hypotheses at a significance level of 0.05 using the Welch-Satterthwaite separate-variances approach.

(b) Do the two brands of filters have equal variability in the reduction in the impurity level at a significance level of 0.01?

(c) Now, if the two brands of filters are assumed to have equal variances, would the conclusion change at the same significance level of 0.05?
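The three test statistics in Question 2 follow directly from the summary data. The sketch below is a hedged illustration, not the examiners' solution; it reads the summary line as n = 16 for both filters, means 8.0 and 5.5, and standard deviations 4 and 2.5. Critical values are left to the appended t and F tables.

```python
import math

# Summary statistics as read from the problem statement.
n1, xbar, s1 = 16, 8.0, 4.0
n2, ybar, s2 = 16, 5.5, 2.5

v1, v2 = s1**2 / n1, s2**2 / n2      # estimated variance of each sample mean

# (a) Welch statistic and Satterthwaite approximate degrees of freedom.
t_welch = (xbar - ybar) / math.sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# (b) Variance-ratio statistic with the larger sample variance on top.
f_ratio = max(s1, s2) ** 2 / min(s1, s2) ** 2

# (c) Pooled-variance statistic under the equal-variance assumption.
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
t_pooled = (xbar - ybar) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

print(round(t_welch, 3), round(df_welch, 2), round(f_ratio, 2),
      round(t_pooled, 3), df_pooled)
```

Because the two samples have equal size, the Welch and pooled statistics coincide; only the reference degrees of freedom differ, so parts (a) and (c) can disagree only through the critical value.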


3. A common practice in horticulture is to graft stems of one type of tree onto root systems of another. If the graft is healthy, the combination tree is much more vigorous than either part alone. Unfortunately, grafts often get diseased. The age of the root systems is suspected to influence the probability of success. The table below shows data from an experiment to evaluate graft success in root stocks of two different ages. There are 60 trees classified by graft status (healthy or diseased) and age of the rootstock.

                 Age of Tree
Graft Status     3-5     6-8
Healthy          12      16
Diseased         22      10

(a) Test the hypothesis that the probability of graft success (healthy) in 3-5 year old trees is equal to the probability of graft success in 6-8 year old trees at significance level 0.05.

The data above were part of a larger experiment with 4 age classes. The original experiment started with 40 trees per age group. Some trees died before the success was measured. The two-way table for the complete experiment is given below.

                 Age of Tree
Graft Status     1-2     3-5     6-8     9-10
Healthy          1       12      16      3
Diseased         14      22      10      7
Dead             25      6       14      30

(b) Set up the hypotheses for testing whether there are significant differences in the percentages of the different graft statuses among the age groups of the trees. Write out the test statistic (plug in the numbers; there is no need to calculate the final value) and the critical value with which it is to be compared.
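One common way to approach both parts of Question 3 is the Pearson chi-square test of homogeneity (for the 2 x 2 table it is equivalent to the two-sided two-proportion z-test). The sketch below is an illustration under that assumption; the function name `chi_square` is hypothetical and the counts are copied from the two tables.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n   # expected count under H0
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: healthy, diseased; columns: ages 3-5 and 6-8.
two_by_two = [[12, 16], [22, 10]]
# Rows: healthy, diseased, dead; columns: ages 1-2, 3-5, 6-8, 9-10.
full = [[1, 12, 16, 3], [14, 22, 10, 7], [25, 6, 14, 30]]

stat_a, df_a = chi_square(two_by_two)
stat_b, df_b = chi_square(full)
print(round(stat_a, 3), df_a, round(stat_b, 2), df_b)
```

Each statistic is compared with the upper 5% point of the chi-square distribution with the reported degrees of freedom, taken from the chi-square table.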


4. For investigating the economical difference in four types of nonrechargeable batteries, a completely randomized design was conducted on 16 batteries of four types to compare the lifetime per unit cost. The data are summarized in the table below:

Battery Type                   Life per unit cost (minutes per dollar)   Group mean ($\bar{y}_i$)   Group variance ($s_i^2$)
1 (alkaline, name brand)       611  537  542  593                        570.75                     1360.25
2 (alkaline, store brand)      923  794  827  898                        860.50                     3619.00
3 (heavy duty, name brand)     445  490  384  413                        433.00                     2064.67
4 (heavy duty, store brand)    476  569  480  460                        496.25                     2426.92
                                                                         $\bar{y}$ = 590.125        $s_p^2$ = 2367.71

(a) Calculate a 90% confidence interval for the experimental error .

(b) Do the four types of batteries have the same average life per unit cost? State and test hypotheses at significance level 0.01 based on the ANOVA table given in the R output below.

                Df Sum Sq Mean Sq F value   Pr(>F)
    type         3 427915  142638   60.24 1.66e-07 ***
    Residuals   12  28412    2368

(c) Is there any significant difference in running time per unit cost between the alkaline and heavy duty batteries at significance level 0.05?
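Part (c) can be framed as a contrast between the average of the two alkaline means and the average of the two heavy duty means. The sketch below is one possible setup, not the official solution; the contrast coefficients (+1/2, +1/2, -1/2, -1/2) and the dictionary names are assumptions made for illustration, and the data are copied from the table.

```python
import math

# Life per unit cost (minutes per dollar) for the four battery types.
data = {
    "alkaline, name brand":    [611, 537, 542, 593],
    "alkaline, store brand":   [923, 794, 827, 898],
    "heavy duty, name brand":  [445, 490, 384, 413],
    "heavy duty, store brand": [476, 569, 480, 460],
}
# Assumed contrast: mean of alkaline types minus mean of heavy duty types.
coef = {
    "alkaline, name brand": 0.5,
    "alkaline, store brand": 0.5,
    "heavy duty, name brand": -0.5,
    "heavy duty, store brand": -0.5,
}

means = {k: sum(v) / len(v) for k, v in data.items()}
sse = sum((y - means[k]) ** 2 for k, v in data.items() for y in v)
df_error = sum(len(v) - 1 for v in data.values())
mse = sse / df_error                     # pooled variance estimate

estimate = sum(coef[k] * means[k] for k in data)
se = math.sqrt(mse * sum(coef[k] ** 2 / len(data[k]) for k in data))
t_stat = estimate / se

print(round(mse, 2), estimate, round(se, 3), round(t_stat, 3), df_error)
```

The resulting statistic is referred to a t distribution with the error degrees of freedom (12 here), using the t table for the two-sided 0.05 critical value.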


5. In a study for developing a model to predict the college GPA of matriculating freshmen based on their college entrance verbal and mathematics test scores, data for a random sample of 40 graduating seniors on their college GPA ( ) along with their college entrance verbal test score ( ) and mathematics test score ( ) expressed as percentiles are collected.

(a) A linear regression model is fitted to these data. The parameter estimates and their standard errors are summarized below. The SSE for the linear fit is 5.9876. Calculate 95% CI’s for the coefficients of the verbal and mathematics test scores. Based on the CI’s, should any of the variables be excluded from the model?

Parameter     Estimate    SE
$\beta_0$     -1.571      0.494
$\beta_1$     0.026       0.004
$\beta_2$     0.034       0.005

(b) A quadratic model of the form is then fitted to these data. Complete the sequential ANOVA table below:

Models        SSE       DF    MS
              5.2549
              7.5311
              3.6434
              1.0552
              0.0982
Residuals     1.1908

Based on the sequential ANOVA table in (b), construct a test of $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$ and draw a conclusion at $\alpha = 0.01$, if that is possible from the information provided. If not, indicate how you would construct the desired test.


(c) Construct a test of $H_0: \beta_3 = \beta_4 = \beta_5 = 0$ and draw a conclusion at $\alpha = 0.01$, if that is possible from the information provided. If not, indicate how you would construct the desired test.

(d) Construct a test of $H_0: \beta_5 = 0$ and draw a conclusion at $\alpha = 0.01$, if that is possible from the information provided. If not, indicate how you would construct the desired test.
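The F statistics in Question 5 can be assembled from the sequential sums of squares. The sketch below rests on assumptions that the extracted table does not state explicitly: the five rows are taken to be entered in the order of the two linear terms followed by three second-order terms, with the term for the fifth coefficient last, and the quadratic model is taken to have six coefficients (so 40 - 6 = 34 error degrees of freedom). The consistency check in the code supports the ordering of the last three rows.

```python
# Sequential sums of squares in the order listed in the table (assumed order:
# two linear terms first, then three second-order terms).
seq_ss = [5.2549, 7.5311, 3.6434, 1.0552, 0.0982]
sse_full = 1.1908      # residual SS of the quadratic model
sse_linear = 5.9876    # SSE of the two-predictor linear fit, from part (a)
n, p_full = 40, 6      # assumed: 40 observations, 6 coefficients incl. intercept

df_error = n - p_full
mse = sse_full / df_error

# Consistency check: the last three rows should account for the drop in SSE
# from the linear model to the quadratic model.
second_order_ss = sum(seq_ss[2:])
drop_in_sse = sse_linear - sse_full

def f_stat(ss, df_num):
    """F statistic for a block of terms with sum of squares ss."""
    return (ss / df_num) / mse

f_overall = f_stat(sum(seq_ss), 5)           # all five slopes equal to zero
f_second_order = f_stat(second_order_ss, 3)  # the three second-order terms
f_last_term = f_stat(seq_ss[-1], 1)          # valid only for the last-entered term

print(round(second_order_ss, 4), round(drop_in_sse, 4))
print(round(f_overall, 2), round(f_second_order, 2), round(f_last_term, 3))
```

A sequential table supports a test of a single coefficient only when its term is entered last; for any other single term, the model would need to be refitted with that term last (or the partial sum of squares obtained another way).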

6. Some data were collected from a study of a natural compound that may help regulate blood cholesterol levels in rats. A total of 128 rats were randomly assigned to one of 7 diets differing only in the dose of the natural compound. A second order model using the log-transformed response (blood cholesterol) in the form of , ∼ 0, was fitted and the parameter estimates and their variance-covariance matrix are given below:

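$$\hat{\beta} = \begin{pmatrix} 5.217 \\ 0.162 \\ 0.226 \end{pmatrix} \quad \text{and} \quad \widehat{\mathrm{Var}}(\hat{\beta}) = \begin{pmatrix} 0.00206 & 0.00357 & 0.00144 \\ 0.00357 & 0.00988 & 0.00518 \\ 0.00144 & 0.00518 & 0.00306 \end{pmatrix}$$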

(a) Test whether the dose-response model (describing the expected log-cholesterol level as a function of the dose of a drug) is better fit by a straight line or a quadratic curve at $\alpha = 0.05$.


(b) Use the following results from fitting a variety of regression models to test whether there is any evidence of lack of fit of the quadratic model at $\alpha = 0.05$.

Model         Error Sum-of-squares
              5.876
              4.527
              3.996
              3.870
              3.819

(c) Assume that a quadratic dose-response model is chosen. Estimate the dose value that produces the largest expected response and its approximate standard error.


7. A randomized clinical trial was conducted to study a proposed new drug for reducing blood pressure. Volunteers were randomly assigned to one of the two treatments: take the new drug for 3 months or take a placebo for 3 months. Blood pressure for each subject was recorded at the start of the experiment and again after 3 months. The relevant variables are defined as:

: blood pressure for subject at the end of the study (post-treatment)
: blood pressure for subject at the beginning of the study (pre-treatment)
: 1 if subject received the new drug, 0 if subject received the placebo

A total of 181 subjects participated in the study; 98 subjects received the new drug and 83 subjects received the placebo. The average pre-treatment and post-treatment blood pressures for all 181 subjects are 85.7 and 59.1, respectively. First, a regression model in the form of

, ∼ 0, , was fitted using ordinary least squares. The results are given below:

Coefficient Estimates and Var-Cov matrix of coefficient estimates:

Estimate    Var-Cov matrix
26.17       52.47       -0.531       -5.008
0.391       -0.531      0.00594      -0.000847
3.78        -5.008      -0.000847    9.263

ANOVA table:

Source of Variation    Degree of Freedom    Sum of Squares
Pre-BP                                      10731.0
Treatment                                   643.1
Error                                       74099.99

(a) Find a 95% confidence interval for the effect of the drug (difference in post-treatment blood pressure between subjects receiving the drug and those receiving the placebo) for subjects with a pre-treatment blood pressure of 100. (Use the $t$ critical value 1.973.)

(b) Find a 95% prediction interval for the post-treatment blood pressure for a subject who had a pre-treatment blood pressure of 100 and received the drug, given that the standard error of the mean prediction for all subjects who had a pre-treatment blood pressure of 100 and received the drug is 2.217.
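Parts (a) and (b) of Question 7 reduce to two interval formulas. The sketch below is an illustration under stated assumptions: the three coefficient estimates are taken to be, in order, the intercept, the pre-treatment slope, and the treatment indicator; the model is assumed to have three coefficients, so the error degrees of freedom are 181 - 3 = 178; and the signs are used exactly as printed.

```python
import math

# Values read from the output above. Assumption: the third coefficient
# (3.78, variance 9.263) belongs to the treatment indicator.
b0, b_pre, b_trt = 26.17, 0.391, 3.78
var_trt = 9.263
sse, n, p = 74099.99, 181, 3
t_crit = 1.973                       # critical value supplied in part (a)

# (a) With no interaction term, the drug effect is the same at every
# pre-treatment level, so the interval is b_trt +/- t * se(b_trt).
half_a = t_crit * math.sqrt(var_trt)
ci_effect = (b_trt - half_a, b_trt + half_a)

# (b) Prediction interval for one new subject: pre-BP 100, on the drug.
mse = sse / (n - p)
se_mean = 2.217                      # given standard error of the mean prediction
fit = b0 + b_pre * 100 + b_trt
half_b = t_crit * math.sqrt(mse + se_mean ** 2)
pi_new = (fit - half_b, fit + half_b)

print([round(v, 3) for v in ci_effect], round(fit, 2),
      [round(v, 2) for v in pi_new])
```

The prediction interval is much wider than the interval for the mean because it adds the error variance (estimated by the mean squared error) to the squared standard error of the fitted mean.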


(c) The investigators are also concerned whether the effect of the drug is the same at all blood pressures, i.e. whether the slope of the relationship between post-treatment and pre-treatment blood pressure is the same in the “drug” and “placebo” groups. Some additional models are fitted to the data. The error sums-of-squares for the original model and the additional models are given in the table below:

Models        Error Sum of Squares
              74,099.99
              74,080.15
              74,080.50
              73,272.78

Conduct a test for equal slopes in the “drug” and “placebo” groups at $\alpha = 0.05$.


8. In a snow gauge calibration study, data were collected to describe how the response variable, gain (Y), is related to the predictor variable, density (X). Polyethylene blocks were used as a substitute for snow since they can be manufactured in different densities. The density was set by the process used to manufacture the blocks. The data set includes 9 densities. Ten blocks of each density were measured.

(a) A simple linear regression model was fitted to the data at the beginning. The plot of the residuals vs. the predicted values is shown below. Do you have any concerns?

[Figure: residuals vs. predicted values]

(b) The Box-Cox method was used to explore possible transformations for the response variable. Below is the plot of SSE vs. different values of the power parameter, $\lambda$. What transformation (if any) does it suggest?

[Figure: SSE vs. the power parameter, for values from -0.4 to 0.4]

(c) Rightly or wrongly, the investigator decided to use a log transformation. A simple linear regression of $\log(Y)$ on $X$ = density was fitted to the transformed data. Find the 95% confidence interval for the gain when density $X = 0.5$ based on the given R output.

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  6.08452    0.04681  129.99   <2e-16 ***
    density     -4.68498    0.11692  -40.07   <2e-16 ***

    Residual standard error: 0.2496 on 88 degrees of freedom
    Multiple R-squared: 0.948, Adjusted R-squared: 0.9474
    F-statistic: 1606 on 1 and 88 DF, p-value: < 2.2e-16

    > lm2=lm(log(gain)~density,data=snow)
    > predict(lm2,data.frame(density=0.5),se=T,interval="confidence")
    $fit
           fit      lwr    upr
    1 3.742028 3.676656 3.8074

    $se.fit
    [1] 0.03289493

    $df
    [1] 88

    $residual.scale
    [1] 0.2495851

    > qt(0.975,88)
    [1] 1.98729
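The interval reported by `predict` is on the log scale, so one hedged way to answer part (c) is to rebuild it from the fit, its standard error, and the t quantile, and then exponentiate the endpoints. The sketch below copies those three numbers from the R output; the variable names are illustrative.

```python
import math

# Numbers copied from the R output for density = 0.5 (log scale).
fit, se_fit, t_crit = 3.742028, 0.03289493, 1.98729

# Confidence limits for the mean of log(gain); these should reproduce
# the lwr and upr columns printed by predict().
lwr_log = fit - t_crit * se_fit
upr_log = fit + t_crit * se_fit

# Back-transform the endpoints to the original gain scale.
lwr_gain, upr_gain = math.exp(lwr_log), math.exp(upr_log)

print(round(lwr_log, 6), round(upr_log, 6))
print(round(lwr_gain, 2), round(upr_gain, 2))
```

Under the log-linear model with normal errors, the back-transformed limits describe the median gain at that density rather than the mean gain, which is worth stating when reporting the interval.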

(d) Lack of fit is an important concern for these investigators. Since there are repeated measurements for each density, it is possible to use the ANOVA lack of fit test. Construct the ANOVA table for the lack of fit test from the R output below. Is a straight line sufficient to describe the relationship between log gain and density at $\alpha = 0.05$? Use the F critical value $F_{?,?,0.05} = 2.125$ in your analysis.

    Analysis of Variance Table

    Response: log(gain)
              Df  Sum Sq Mean Sq F value    Pr(>F)
    density    1 100.015 100.015  1605.6 < 2.2e-16 ***
    Residuals 88   5.482   0.062

    Analysis of Variance Table

    Response: log(gain)
                    Df  Sum Sq Mean Sq F value    Pr(>F)
    factor(density)  9 1950.19  216.69  3616.2 < 2.2e-16 ***
    Residuals       81    4.85    0.06
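The lack-of-fit decomposition in part (d) uses only the two residual lines of the R output: the straight-line residual splits into pure error (the residual of the one-mean-per-density fit) and lack of fit. The sketch below illustrates that arithmetic; the variable names are not from the source.

```python
# Residual lines of the two ANOVA tables in the R output.
sse_line, df_line = 5.482, 88      # straight-line model
sse_pure, df_pure = 4.85, 81       # one mean per density (pure error)

# Lack of fit = what the straight line leaves unexplained beyond pure error.
ss_lof = sse_line - sse_pure
df_lof = df_line - df_pure
ms_lof = ss_lof / df_lof
ms_pure = sse_pure / df_pure

f_lof = ms_lof / ms_pure
f_crit = 2.125                     # critical value quoted in the question
reject = f_lof > f_crit

print(df_lof, round(ss_lof, 3), round(f_lof, 3), reject)
```

The numerator and denominator degrees of freedom computed here are the two entries left as question marks in the quoted critical value.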


9. The table below shows the results of using all subset selection for choosing a best model from all possible regressions with four predictor variables $X_1, X_2, X_3$ and $X_4$. Five different criteria are considered. Please make a recommendation of which model to choose based on the results below and explain your reason.

Variables in model    $R^2$    $R^2_{adj}$    $C_p$     AIC        BIC
1                     0.061    0.043          141.16    -77.08     -73.10
2                     0.221    0.206          108.56    -87.18     -83.20
3                     0.428    0.417          66.49     -103.83    -99.85
4                     0.422    0.410          67.71     -103.26    -99.28
12                    0.263    0.234          102.03    -88.16     -82.20
13                    0.549    0.531          43.85     -114.66    -108.69
14                    0.430    0.408          67.97     -102.07    -96.10
23                    0.663    0.650          20.52     -130.48    -124.52
24                    0.483    0.463          57.21     -107.32    -101.36
34                    0.599    0.584          33.50     -121.11    -115.15
123                   0.757    0.743          3.39      -146.16    -138.20
124                   0.487    0.456          58.39     -105.75    -97.79
134                   0.612    0.589          32.93     -120.84    -112.89
234                   0.718    0.701          11.42     -138.02    -130.07
1234                  0.759    0.740          5.00      -144.59    -134.64
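A quick way to organize the comparison in Question 9 is to find the model that each criterion favors. The sketch below copies the table into a list and applies the usual conventions (larger adjusted R-squared is better; smaller Cp, AIC and BIC are better; Cp is also compared with p = number of predictors + 1). It is an aid to the written justification, not a replacement for it.

```python
# (variables, R2, adjusted R2, Cp, AIC, BIC) copied from the table.
rows = [
    ("1", 0.061, 0.043, 141.16, -77.08, -73.10),
    ("2", 0.221, 0.206, 108.56, -87.18, -83.20),
    ("3", 0.428, 0.417, 66.49, -103.83, -99.85),
    ("4", 0.422, 0.410, 67.71, -103.26, -99.28),
    ("12", 0.263, 0.234, 102.03, -88.16, -82.20),
    ("13", 0.549, 0.531, 43.85, -114.66, -108.69),
    ("14", 0.430, 0.408, 67.97, -102.07, -96.10),
    ("23", 0.663, 0.650, 20.52, -130.48, -124.52),
    ("24", 0.483, 0.463, 57.21, -107.32, -101.36),
    ("34", 0.599, 0.584, 33.50, -121.11, -115.15),
    ("123", 0.757, 0.743, 3.39, -146.16, -138.20),
    ("124", 0.487, 0.456, 58.39, -105.75, -97.79),
    ("134", 0.612, 0.589, 32.93, -120.84, -112.89),
    ("234", 0.718, 0.701, 11.42, -138.02, -130.07),
    ("1234", 0.759, 0.740, 5.00, -144.59, -134.64),
]

best_adj_r2 = max(rows, key=lambda r: r[2])[0]
best_cp = min(rows, key=lambda r: r[3])[0]
best_aic = min(rows, key=lambda r: r[4])[0]
best_bic = min(rows, key=lambda r: r[5])[0]

# Cp minus p, where p = number of predictors + 1 (intercept included).
cp_gap = {r[0]: r[3] - (len(r[0]) + 1) for r in rows}

print(best_adj_r2, best_cp, best_aic, best_bic, round(cp_gap[best_cp], 2))
```

Plain R-squared is left out of the ranking because it can never decrease when a predictor is added, so it always favors the full model.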


Part B: Linear Statistical Model

10. This question concerns the analysis of a small set of data on the operation of a Butane Hydrogenolysis Reactor. The response variable is

percent conversion of butane,

which is to be estimated as a function of the chemical reactor process variables

total feed flow (cc/sec at STP)
feed ratio (Hydrogen/Butane)
the reactor wall temperature ( )

The data are shown in the following table.

Run    Conversion    Flow    Ratio    Temp    Setup
1      82            115     6        495     1
2      91            50      4        470     2
3      75            180     8        520     3
4      98            50      4        520     4
5      39            180     8        470     5
6      77            115     6        495     1
7      95            50      8        520     6
8      61            180     4        470     7
9      81            115     6        495     1
10     76            50      8        470     8
11     92            180     4        520     9
12     82            115     6        495     1

Twelve runs were made on 9 process setups (corresponding to combinations of levels of the flow, ratio, and temp factors). We consider the analysis based on a cell means model

$$y_i = \mu_{j(i)} + \varepsilon_i, \quad \text{for } i = 1, 2, \ldots, 12$$

where $\mu_1, \mu_2, \ldots, \mu_9$ are unknown parameters (the 9 mean responses for the different setups of the process), $\varepsilon_i \sim N(0, \sigma^2)$, and we use the notation $j(i)$ = the setup number employed in the $i$th run of the process. (For example, when $i = 7$ for the 7th run, $j(7) = 6$ to indicate that setup 6 was used.) Note that setup #1 is a “center point” for the set of (flow, ratio, temp) combinations in the data set. The other 8 setups form a $2 \times 2 \times 2$ factorial structure.

(a) Find a 90% confidence interval for in the above model.


(b) Find a 95% confidence interval for the flow main effect (i.e. the difference in the high and low flow levels averaged over all treatment combinations of the other two factors).

(c) Find a 95% prediction interval for an additional observation under process setup #2 under this model.
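Under a cell means model, only the replicated center-point setup carries information about the error variance, which gives 12 - 9 = 3 error degrees of freedom. The sketch below illustrates parts (b) and (c) on that basis. The column order (response, flow, ratio, temp, setup) is an assumption about the flattened table, and the t quantile 3.182 (3 degrees of freedom, two-sided 95%) is a standard table value rather than something given in the question.

```python
import math

# (response, flow, ratio, temp, setup) for the 12 runs in the table.
runs = [
    (82, 115, 6, 495, 1), (91, 50, 4, 470, 2), (75, 180, 8, 520, 3),
    (98, 50, 4, 520, 4), (39, 180, 8, 470, 5), (77, 115, 6, 495, 1),
    (95, 50, 8, 520, 6), (61, 180, 4, 470, 7), (81, 115, 6, 495, 1),
    (76, 50, 8, 470, 8), (92, 180, 4, 520, 9), (82, 115, 6, 495, 1),
]

# Group responses by setup; cell means are the least squares estimates.
cells = {}
for y, flow, ratio, temp, setup in runs:
    cells.setdefault(setup, {"flow": flow, "y": []})["y"].append(y)
means = {s: sum(c["y"]) / len(c["y"]) for s, c in cells.items()}

# Only the replicated setup contributes to the error sum of squares.
sse = sum((y - means[s]) ** 2 for s, c in cells.items() for y in c["y"])
df_error = len(runs) - len(cells)
mse = sse / df_error
T_975_3DF = 3.182                  # table value, t with 3 df

# (b) Flow main effect: mean of high-flow cells minus mean of low-flow cells.
# Each factorial cell holds a single run, so each cell mean has variance sigma^2.
high = [means[s] for s, c in cells.items() if c["flow"] == 180]
low = [means[s] for s, c in cells.items() if c["flow"] == 50]
effect = sum(high) / len(high) - sum(low) / len(low)
se_effect = math.sqrt(mse * (1 / len(high) + 1 / len(low)))
ci_effect = (effect - T_975_3DF * se_effect, effect + T_975_3DF * se_effect)

# (c) Prediction interval for one more run at setup 2 (one run observed there).
n2 = len(cells[2]["y"])
half = T_975_3DF * math.sqrt(mse * (1 + 1 / n2))
pi_setup2 = (means[2] - half, means[2] + half)

print(df_error, round(mse, 4), effect)
print([round(v, 2) for v in ci_effect], [round(v, 2) for v in pi_setup2])
```

The same mean squared error and degrees of freedom would feed any interval for the error variance, using chi-square quantiles with 3 degrees of freedom.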


11. Consider a Gauss-Markov linear model $Y = X\beta + \varepsilon$ with $\mathrm{Var}(\varepsilon) = \sigma^2 I$ for $n = 5$ observations:

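$$
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \end{pmatrix}
$$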

(a) Which of the parameters $\beta_1$, $\beta_2$, or $\beta_3$ can be estimated with the greatest precision?

(b) Which row of the $X$ matrix represents a set of conditions under which $E(Y)$ can be estimated with the best precision? Explain carefully.

For the next two questions, suppose that $Y$ is such that $SSE = 3$ and $\hat{\beta}_{OLS} = (5, 6, 2)$. Consider an analysis under the normal Gauss-Markov model.

(c) In the future, two new observations, $y_{new,1}$ and $y_{new,2}$, are going to be observed under the conditions

described respectively by the 1st and 2nd rows of the X matrix. Give 95% two-sided prediction limits for ,1 ,2new newy y .


(d) Write the hypothesis $H_0: E(y_1) = E(y_2)$ and $E(y_1) = E(y_3)$ in testable form $H_0: C\beta = 0$ for an appropriate matrix $C$. Find such a matrix $C$ and compute an F statistic for testing this hypothesis.

12. Suppose the relationship between the mean of a normally distributed response variable $y$ and a continuous explanatory variable $x$ is known to be linear with intercept 0. The slope of the linear relationship is unknown and may depend on conditions that can be controlled by a researcher. The variability of $y$ for a given value of $x$ is unknown but is assumed to be the same for all values of $x$ and for all conditions. Suppose the researcher conducted an experiment involving 6 independent trials (carried out in random order) and obtained the following “data”:

Trial    Condition    $y$    $x$
1        1            0      1
2        1            5      2
3        1            6      3
4        2            3      1
5        2            1      2
6        2            3      3

(a) The researcher would like to know if the slope for the relationship between the mean of $y$ and $x$ is the same under Condition 1 as it is under Condition 2. Test the researcher’s question at $\alpha = 0.05$.
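One way to carry out the test in part (a) is to fit a separate through-the-origin slope in each condition, pool the residual sums of squares, and form a t statistic for the slope difference. The sketch below assumes the data columns are (trial, condition, response, explanatory value), which matches the statement in part (b) that the explanatory variable takes values in {1, 2, 3}; the t quantile 2.776 (4 degrees of freedom, two-sided 95%) is a standard table value.

```python
import math

# (condition, y, x) for the six trials; column order as assumed in the lead-in.
trials = [(1, 0, 1), (1, 5, 2), (1, 6, 3), (2, 3, 1), (2, 1, 2), (2, 3, 3)]

def through_origin(cond):
    """Least squares slope, sum of x^2 and residual SS for the model y = b*x."""
    pts = [(y, x) for c, y, x in trials if c == cond]
    sxx = sum(x * x for _, x in pts)
    sxy = sum(x * y for y, x in pts)
    b = sxy / sxx
    rss = sum((y - b * x) ** 2 for y, x in pts)
    return b, sxx, rss

b1, sxx1, rss1 = through_origin(1)
b2, sxx2, rss2 = through_origin(2)

df_error = len(trials) - 2            # two slopes estimated, no intercepts
mse = (rss1 + rss2) / df_error

# Var(b1 - b2) = sigma^2 * (1/sxx1 + 1/sxx2) because the two fits are independent.
t_stat = (b1 - b2) / math.sqrt(mse * (1 / sxx1 + 1 / sxx2))
f_stat = t_stat ** 2                  # equivalent F statistic on (1, df_error) df
T_975_4DF = 2.776                     # table value, t with 4 df
reject = abs(t_stat) > T_975_4DF

print(b1, b2, mse, round(t_stat, 3), round(f_stat, 2), reject)
```

The squared t statistic is the F statistic for comparing the common-slope model with the separate-slopes model, so either form can be reported.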


(b) Suppose the researcher would like to repeat the experiment described here. Once again 6 independent trials will be used. Any trial can be conducted under Condition 1 or Condition 2. The value of $x$ may be set at any value in {1, 2, 3} for each trial. Recommend a design to the researcher that will maximize the power for detecting a difference between the slopes under Condition 1 and Condition 2. For each trial, state the condition and the value of the $x$ variable that you recommend. (Hint: find the non-centrality parameter for the F-test statistic.)

Trial    Condition    $x$
1        ?            ?
2        ?            ?
3        ?            ?
4        ?            ?
5        ?            ?
6        ?            ?
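Following the hint, the non-centrality parameter of the F test for equal slopes in the through-the-origin model is proportional to 1 / (1/S1 + 1/S2), where S1 and S2 are the sums of squared explanatory values in the two conditions (the exact constant depends on the convention used and on the unknown slope difference and error variance). The sketch below simply enumerates every possible six-trial design and keeps the one that maximizes that factor; the function name `precision` is illustrative.

```python
from itertools import combinations_with_replacement

# Every trial picks a (condition, x) pair; trial order does not matter.
options = [(c, x) for c in (1, 2) for x in (1, 2, 3)]

def precision(design):
    """1 / (1/S1 + 1/S2): proportional to the non-centrality parameter."""
    s1 = sum(x * x for c, x in design if c == 1)
    s2 = sum(x * x for c, x in design if c == 2)
    if s1 == 0 or s2 == 0:
        return 0.0            # one slope would not be estimable
    return 1.0 / (1.0 / s1 + 1.0 / s2)

designs = list(combinations_with_replacement(options, 6))
best = max(designs, key=precision)

# The design actually used in part (a), for comparison.
original = ((1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3))

print(len(designs), best, precision(best), precision(original))
```

The enumeration is small (462 distinct designs), so a brute-force search is enough here; the same conclusion can be reached analytically by noting that the factor increases with each sum of squares and is largest when the two sums are balanced.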


Standard Normal Probability Distribution P(Z<z)

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1   0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2   0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3   0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4   0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998


Critical values for t-distribution


Critical values for chi-square distribution


Critical values for F-distribution
