
PS-work book_solution


Random Sample and Central Limit Theorem; X-Bar and R control charts.

Exercise 1: (Example 1) Suppose X1, X2, …, X20 is a sample from a normal distribution N(μ, σ²) with μ = 5, σ² = 4. Find (a) the expectation and variance of X̄; (b) the distribution of X̄.

Exercise 2: (Example 2) Given that X is normally distributed with mean 50 and standard deviation 4, compute the following for n = 25.

(a) Mean and variance of X̄

(b)

(c)

(d)


Probability and Statistics Work Book

Exercise 3: (Tutorial 5, No.1) Given that X is normally distributed with mean 20 and standard deviation 2, compute the following for n = 40.

(a) Mean and variance of X̄

(b)

(c)

(d)

Solution: (a) Mean of X̄ = 20 and variance of X̄ = 4/40 = 0.1

(b)

(c)

(d)

Exercise 4: (Tutorial 5, No.2) Let X denote the number of flaws in a 1-inch length of copper wire. The pmf of X is given in the following table.

x          0     1     2     3
P(X = x)   0.48  0.39  0.12  0.01

100 wires are sampled from this population. What is the probability that the average number of flaws per wire in this sample is less than 0.5?

Solution: Given that
Mean of X = 0(0.48) + 1(0.39) + 2(0.12) + 3(0.01) = 0.66
Variance of X = [0²(0.48) + 1²(0.39) + 2²(0.12) + 3²(0.01)] - (0.66)² = 0.5244
If n = 100, the mean of X̄ is 0.66 and the variance of X̄ is 0.5244/100 = 0.005244.

So, by the central limit theorem, P(X̄ < 0.5) ≈ P(Z < (0.5 - 0.66)/√0.005244) = P(Z < -2.21) = 0.0136.
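The calculation in this solution can be checked with a short sketch. It assumes the normal approximation from the central limit theorem is adequate for n = 100; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# pmf of X (flaws per wire), copied from the exercise table
pmf = {0: 0.48, 1: 0.39, 2: 0.12, 3: 0.01}
n = 100  # number of wires sampled

mu = sum(x * p for x, p in pmf.items())               # E[X]
var = sum(x**2 * p for x, p in pmf.items()) - mu**2   # Var[X]

# CLT: the sample mean is approximately N(mu, var / n)
xbar_dist = NormalDist(mu, sqrt(var / n))
prob = xbar_dist.cdf(0.5)                             # P(sample mean < 0.5)
print(round(mu, 2), round(var, 4), round(prob, 4))
```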

Exercise 5: (Tutorial 5, No.3)


At a large university, the mean age of the students is 22.3 years, and the standard deviation is 4 years. A random sample of 64 students is drawn. What is the probability that the average age of these students is greater than 23 years?

Solution: Given that the mean of X is 22.3 and the variance of X is 16.

If n = 64, the mean of X̄ is 22.3 and the variance of X̄ is 16/64 = 0.25.

So, P(X̄ > 23) ≈ P(Z > (23 - 22.3)/√0.25) = P(Z > 1.4) = 1 - 0.9192 = 0.0808.

Exercise 6: The flexural strength (in MPa) of certain concrete beams is X ~ N (8, 2.25). Find the probability that the sample mean of strength of 16 concrete beams will belong to (7.55, 8.75)
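One possible way to set up Exercise 6 numerically is sketched below. It assumes that 2.25 in N(8, 2.25) is the variance (so σ = 1.5), which makes the sample mean of 16 beams normal with standard deviation 1.5/4 = 0.375.

```python
from math import sqrt
from statistics import NormalDist

mu, var, n = 8.0, 2.25, 16          # assumption: N(8, 2.25) gives the variance
xbar_dist = NormalDist(mu, sqrt(var / n))

# P(7.55 < sample mean < 8.75)
prob = xbar_dist.cdf(8.75) - xbar_dist.cdf(7.55)
print(round(prob, 4))
```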

Exercise 7: (Example 3)

A component part for a jet aircraft engine is manufactured by an investment casting process. The vane opening on this casting is an important functional parameter of the part. We will illustrate the use of X̄ and R control charts to assess the statistical stability of this process. The table presents 20 samples of five parts each. The values given in the table have been coded by using the last three digits of the dimension; that is, 31.6 should be 0.50316 inch.

Sample Number   x1  x2  x3  x4  x5    x̄      r
 1              33  29  31  32  33   31.6     4
 2              33  31  35  37  31   33.4     6
 3              35  37  33  34  36   35.0     4
 4              30  31  33  34  33   32.2     4
 5              33  34  35  33  34   33.8     2
 6              38  37  39  40  38   38.4     3
 7              30  31  32  34  31   31.6     4
 8              29  39  38  39  39   36.8    10
 9              28  33  35  36  43   35.0    15
10              38  33  32  35  32   34.0     6
11              28  30  28  32  31   29.8     4
12              31  35  35  35  34   34.0     4
13              27  32  34  35  37   33.0    10
14              33  33  35  37  36   34.8     4
15              35  37  32  35  39   35.6     7
16              33  33  27  31  30   30.8     6
17              35  34  34  30  32   33.0     5
18              32  33  30  30  33   31.6     3
19              25  27  34  27  28   28.2     9
20              35  35  36  33  30   33.8     6

(a) Construct X̄ and R control charts. (b) After the process is in control, estimate the process mean and standard deviation.
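A minimal sketch of the trial limits for part (a) follows. It uses the sample means and ranges from the table and the usual tabulated chart constants for subgroups of size 5 (A2 = 0.577, D3 = 0, D4 = 2.114; some tables print D4 = 2.115). The variable names are illustrative, not part of the workbook.

```python
# Sample means and ranges copied from the table (coded units)
xbars = [31.6, 33.4, 35.0, 32.2, 33.8, 38.4, 31.6, 36.8, 35.0, 34.0,
         29.8, 34.0, 33.0, 34.8, 35.6, 30.8, 33.0, 31.6, 28.2, 33.8]
ranges = [4, 6, 4, 4, 2, 3, 4, 10, 15, 6, 4, 4, 10, 4, 7, 6, 5, 3, 9, 6]

# Tabulated control chart constants for subgroup size n = 5
A2, D3, D4 = 0.577, 0.0, 2.114

xbarbar = sum(xbars) / len(xbars)    # grand mean (center line of X-bar chart)
rbar = sum(ranges) / len(ranges)     # mean range (center line of R chart)

xbar_lcl, xbar_ucl = xbarbar - A2 * rbar, xbarbar + A2 * rbar
r_lcl, r_ucl = D3 * rbar, D4 * rbar

# Sample numbers (1-based) that plot outside the trial limits
out_xbar = [i + 1 for i, x in enumerate(xbars) if not xbar_lcl <= x <= xbar_ucl]
out_r = [i + 1 for i, r in enumerate(ranges) if not r_lcl <= r <= r_ucl]
print(round(xbarbar, 2), round(rbar, 2), out_xbar, out_r)
```

Samples flagged here would normally be investigated for assignable causes before the limits are revised for part (b).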

Exercise 8: (Tutorial 5, No.4)

The overall length of a skew used in a knee replacement device is monitored using X̄ and R charts. The following table gives the length for 20 samples of size 4. (Measurements are coded from 2.00 mm; that is, 15 is 2.15 mm.)

  Observation   Observation

Sample 1 2 3 4 Sample 1 2 3 4

1 16 18 15 13 11 14 14 15 13

2 16 15 17 16 12 15 13 15 16

3 15 16 20 16 13 13 17 16 15

4 14 16 14 12 14 11 14 14 21

5 14 15 13 16 15 14 15 14 13

6 16 14 16 15 16 18 15 16 14

7 16 16 14 15 17 14 16 19 16

 8 17 13 17 16 18 16 14 13 19

 9 15 11 13 16 19 17 19 17 13

10 15 18 14 13 20 12 15 12 17

(i) Using all the data, find trial control limits for X̄ and R charts, construct the chart, and plot the data.

(ii) Use the trial control limits from part (i) to identify out-of-control points. If necessary, revise your control limits, assuming that any samples that plot outside the control limits can be eliminated.

(iii) Assuming that the process is in control, estimate the process mean and process standard deviation.

Solution:


(i) The trial control limits are as follows.

(ii) Based on the control charts, there is a single point beyond the control limits: sample 14 is above the upper control limit on the R chart.

With sample 14 removed, the control limits and charts are as follows.


All points are within the control limits. The process is said to be in statistical control.

(iii) The estimated process mean is 15.14. The estimated process standard deviation is r̄/d2 = 3.895/2.059 = 1.892.
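The steps of this solution can be reproduced with the sketch below. It assumes the tabulated constants for subgroups of size 4 (A2 = 0.729, D3 = 0, D4 = 2.282, d2 = 2.059); the helper `chart_summary` is a hypothetical name introduced only for illustration.

```python
# Raw data for the 20 samples of size 4 (coded units), from the table
samples = [
    [16, 18, 15, 13], [16, 15, 17, 16], [15, 16, 20, 16], [14, 16, 14, 12],
    [14, 15, 13, 16], [16, 14, 16, 15], [16, 16, 14, 15], [17, 13, 17, 16],
    [15, 11, 13, 16], [15, 18, 14, 13], [14, 14, 15, 13], [15, 13, 15, 16],
    [13, 17, 16, 15], [11, 14, 14, 21], [14, 15, 14, 13], [18, 15, 16, 14],
    [14, 16, 19, 16], [16, 14, 13, 19], [17, 19, 17, 13], [12, 15, 12, 17],
]
A2, D3, D4, d2 = 0.729, 0.0, 2.282, 2.059   # tabulated constants for n = 4

def chart_summary(groups):
    """Return grand mean, mean range, X-bar limits, R limits, flagged samples."""
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    rngs = {k: max(v) - min(v) for k, v in groups.items()}
    xbarbar = sum(means.values()) / len(means)
    rbar = sum(rngs.values()) / len(rngs)
    x_lim = (xbarbar - A2 * rbar, xbarbar + A2 * rbar)
    r_lim = (D3 * rbar, D4 * rbar)
    flagged = sorted(k for k in groups
                     if not x_lim[0] <= means[k] <= x_lim[1]
                     or not r_lim[0] <= rngs[k] <= r_lim[1])
    return xbarbar, rbar, x_lim, r_lim, flagged

groups = {i + 1: s for i, s in enumerate(samples)}
_, _, _, _, flagged = chart_summary(groups)                  # trial limits
revised = {k: v for k, v in groups.items() if k not in flagged}
xbarbar, rbar, x_lim, r_lim, still_flagged = chart_summary(revised)
sigma_hat = rbar / d2                                        # process sigma estimate
print(flagged, round(xbarbar, 2), round(rbar, 3), round(sigma_hat, 3))
```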

Exercise 9:

The thickness of a printed circuit board (PCB) is an important quality parameter. Data on board thickness (in cm) are given below for 25 samples of three boards each.

Sample 1 2 3 Sample 1 2 3

1 0.0629 0.0636 0.0640 14 0.0645 0.0640 0.0631

2 0.0630 0.0631 0.0622 15 0.0619 0.0644 0.0632

3 0.0628 0.0631 0.0633 16 0.0631 0.0627 0.0630

4 0.0634 0.0630 0.0631 17 0.0616 0.0623 0.0631

5 0.0619 0.0628 0.0630 18 0.0630 0.0630 0.0626

6 0.0613 0.0629 0.0634 19 0.0636 0.0631 0.0629

7 0.0630 0.0639 0.0625 20 0.0640 0.0635 0.0629

 8 0.0628 0.0627 0.0622 21 0.0628 0.0625 0.0616

 9 0.0623 0.0626 0.0633 22 0.0615 0.0625 0.0619

10 0.0631 0.0631 0.0633 23 0.0630 0.0632 0.0630

11 0.0635 0.0630 0.0638 24 0.0635 0.0629 0.0635

12 0.0623 0.0630 0.0630 25 0.0623 0.0629 0.0630

13 0.0635 0.0631 0.0630

(i) Using all the data, find trial control limits for X̄ and R charts, construct the chart, and plot the data.

(ii) Use the trial control limits from part (i) to identify out-of-control points. If necessary, revise your control limits, assuming that any samples that plot outside the control limits can be eliminated.

(iii) Assuming that the process is in control, estimate the process mean and process standard deviation.

Hypothesis Testing - One Population

Exercise 1: (Example 1) A manufacturer of sprinkler systems used for fire protection in office buildings claims that the true average system-activation temperature is 130°F. A sample of 9 systems, when tested, yields an average activation temperature of 131.08°F. If the distribution of activation times is normal with standard deviation 1.5°F, does the data contradict the firm's claim at level of significance α = 0.01? What is the P-value for this test?

Exercise 2: (Example 2) A random sample of 50 battery packs is selected and subjected to a life test. The average life of these batteries is 4.05 hours. Assume that the battery life is normally distributed with standard deviation equal to 0.2 hour. Is there evidence to support the claim that mean battery life exceeds 4 hours? Use α = 0.05. What is the P-value for this test?

Exercise 3: A new cure has been developed for a certain type of cement that results in a compressive strength of 5000 kilograms per square centimeter with a standard deviation of 120 kilograms; the strengths follow a normal distribution. To test the null hypothesis that μ = 5000 against the alternative that μ < 5000, a random sample of 50 pieces of cement is observed. The critical region is defined to be x̄ < 4970.

(a) Find the probability of committing a type I error when H0 is true. (b) Evaluate β (the probability of a type II error) if μ = 4960.
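One way to evaluate both error probabilities in Exercise 3 is sketched here, assuming the sample mean of 50 pieces is normal with standard error 120/√50. The names `alpha` and `beta` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, cutoff = 120.0, 50, 4970.0
se = sigma / sqrt(n)                      # standard error of the sample mean

# Type I error: reject H0 (sample mean < 4970) although mu = 5000
alpha = NormalDist(5000.0, se).cdf(cutoff)

# Type II error: fail to reject H0 (sample mean >= 4970) although mu = 4960
beta = 1.0 - NormalDist(4960.0, se).cdf(cutoff)
print(round(alpha, 4), round(beta, 4))
```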

Exercise 4: (Tutorial 6, No.1)

A civil engineer is analyzing the compressive strength of concrete. Compressive strength is approximately normally distributed with variance σ² = 1000 psi². A random sample of 12 specimens has a mean compressive strength of x̄ = 3255.42 psi.

(a) Test the hypothesis that mean compressive strength is 3500 psi. Use a fixed-level test with α = 0.01;

(b) What is the smallest level of significance at which you would be willing to reject the null hypothesis?;

(c) Construct a 95% two-sided CI on mean compressive strength; and

(d) Construct a 99% two-sided CI on mean compressive strength. Compare the width of this confidence interval with the width of the one in part (c). What is your comment?

Solution:

(a) (i) The parameter of interest is the true mean compressive strength, μ.

(ii) The hypotheses: H0: μ = 3500 vs H1: μ ≠ 3500

(iii) The significance level α = 0.01

(iv) The test statistic is: z0 = (x̄ - μ0)/(σ/√n)

Computation: z0 = (3255.42 - 3500)/(√1000/√12) = -26.79

(v) Decision:

Reject H0 if z0 < -zα/2 or z0 > zα/2, where zα/2 = z0.005 = 2.58.

(vi) Result and conclusion:

Since -26.79 < -2.58, we reject the null hypothesis and conclude that the true mean compressive strength is significantly different from 3500 at α = 0.01.

(b) The smallest level of significance at which we are willing to reject the null hypothesis is the P-value = 2[1 - Φ(26.79)] = 2[1 - 1] = 0.

(c) A 95% two-sided CI on mean compressive strength is


With 95% confidence, we believe the true mean compressive strength is between 3237.53psi and 3273.31psi.

(d) A 99% two-sided CI on mean compressive strength is

With 99% confidence, we believe that the true mean compressive strength is between 3231.96 psi and 3278.88 psi.

The 99% confidence interval is wider than the 95% confidence interval. We can conclude that the confidence interval with the larger level of confidence will always result in a wider confidence interval when x̄, σ², and n are held constant.
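The test statistic and both intervals in this solution can be checked with the following sketch. It uses exact normal quantiles from `statistics.NormalDist` instead of rounded table values, so the 99% limits may differ from the quoted ones in the second decimal place. `z_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, var, n = 3255.42, 3500.0, 1000.0, 12
se = sqrt(var / n)                        # sigma / sqrt(n)

z0 = (xbar - mu0) / se                    # two-sided z test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))

def z_interval(conf):
    """Two-sided CI on the mean with known variance."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return xbar - z * se, xbar + z * se

ci95 = z_interval(0.95)
ci99 = z_interval(0.99)
print(round(z0, 2), p_value, ci95, ci99)
```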

Exercise 5: (Example 3) A new process for producing synthetic diamonds can be operated at a profitable level only if the average weight of the diamonds is greater than 0.5 karat. To evaluate the profitability of the process, six diamonds are generated with recorded weights 0.46, 0.61, 0.52, 0.48, 0.57 and 0.54 karat.

(a) At the 5% significance level, do the six measurements present sufficient evidence that the average weight of the diamonds produced by the process is in excess of 0.5 karat?

(b) Use the P-value approach to test the null hypothesis.

(c) Construct a 95% CI on the average weight of diamonds.

Exercise 6: (Tutorial 6, No.2) A cigarette company claims that its cigarettes contain an average of only 10 mg of tar. A random sample of 25 cigarettes shows the average tar content to be 12.5 mg with a standard deviation of 4.5 mg.

(a) Construct a hypothesis test to determine whether the average tar content of cigarettes exceeds 10 mg, using the P-value approach;

(b) Construct a 95% two-sided CI on the average tar content of cigarettes.

Solution:

(a) (i) The parameter of interest is the true mean tar content, μ.

(ii) The hypotheses: H0: μ = 10 vs H1: μ > 10

(iii) The test statistic is: t0 = (x̄ - μ0)/(s/√n) = (12.5 - 10)/(4.5/√25) = 2.778

(v) Decision: Reject H0 if the P-value is smaller than 0.05.

(vi) Conclusion: From a t-distribution table, for a t-distribution with 24 degrees of freedom, t0 = 2.778 falls between two values: 2.492, for which α = 0.01, and 2.797, for which α = 0.005. So the P-value satisfies 0.005 < P < 0.01. Since P < 0.05, we reject H0 and conclude that the mean tar content of the cigarettes exceeds 10 mg.


(b) A 95% two-sided CI on mean tar content is x̄ ± t0.025,24 s/√n = 12.5 ± 2.064(4.5/√25), that is, 10.64 ≤ μ ≤ 14.36.
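A small sketch of the numbers behind parts (a) and (b) is given below. The Python standard library has no t quantile function, so the critical value t0.025,24 = 2.064 is hard-coded from a t table; that value is an assumption of the sketch.

```python
from math import sqrt

xbar, mu0, s, n = 12.5, 10.0, 4.5, 25
se = s / sqrt(n)                  # estimated standard error of the mean

t0 = (xbar - mu0) / se            # one-sample t statistic, 24 degrees of freedom

t_crit = 2.064                    # tabulated t_{0.025, 24} (hard-coded assumption)
ci = (xbar - t_crit * se, xbar + t_crit * se)
print(round(t0, 3), ci)
```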

Exercise 7: (Example 4) Regardless of age, about 20% of Malaysian adults participate in fitness activities at least twice a week. In a local survey of 100 adults over 40 years old, a total of 15 people indicated that they participated in a fitness activity at least twice a week.

(a) Do these data indicate that the participation rate for adults over 40 years of age is significantly less than 20%? Carry out a test at the 10% significance level and draw an appropriate conclusion.

(b) Construct a 95% two-sided CI on the participation rate.

Exercise 8: (Tutorial 6, No.3) A survey done one year ago showed that 45% of the population participated in recycling programs. In a recent poll, a random sample of 1250 people showed that 588 participate in recycling programs.

(a) Test the hypothesis that the proportion of the population who participate in recycling programs is greater than it was one year ago. Use a 5% significance level.

(b) Construct a 95% two-sided CI on the proportion.

Solution:

(a) (i) The parameter of interest is the proportion of the population who participate in recycling programs, p.

(ii) The hypotheses: H0: p = 0.45 vs H1: p > 0.45

(iii) The significance level α = 0.05

(iv) The test statistic is: z0 = (p̂ - p0)/√(p0(1 - p0)/n) = (0.4704 - 0.45)/√(0.45(0.55)/1250) = 1.449

(v) Decision:

Reject H0 if z0 > zα, where zα = z0.05 = 1.645.

(vi) Conclusion:

Since 1.449 < 1.645, we do not reject the null hypothesis. At the 0.05 level of significance there is insufficient evidence that the proportion of the population participating in recycling programs is greater than 45%.
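The test in part (a), together with the interval asked for in part (b), can be sketched as follows. The code assumes the large-sample normal approximation for a proportion; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0 = 588, 1250, 0.45
p_hat = x / n                                   # sample proportion

# Upper-tailed z test using the null standard error
z0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z0)

# 95% two-sided CI using the estimated standard error
z = NormalDist().inv_cdf(0.975)
half_width = z * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)
print(round(z0, 3), round(p_value, 4), ci)
```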

(b) 95% two-sided CI is

Since p = 0.45 is inside the interval, we cannot reject the null hypothesis.

Exercise 9: An Ipoh city council member gave a speech in which she said that 18% of all private homes in the city had been undervalued by the county tax assessor's office. In a follow-up story, the local newspaper reported that it had taken a random sample of 91 private homes. Using a professional evaluator to evaluate the properties and checking against county tax records, it found that 14 of the homes had been undervalued.

(i) Does this data indicate that the proportion of private homes that are undervalued by the county tax assessor is different from 18%? Use a 5% significance level.

(ii) Construct a 95% two-sided CI on the proportion.

Exercise 10: (Example 5) Engineers designing the front-wheel-drive half shaft of a new model automobile claim that the variance in the displacement of the constant velocity joints of the shaft is less than 1.5 mm. 20 simulations were conducted and the following results were obtained, and s = 1.41.

(i) At α = 0.05, do these data support the claim of the engineers? (ii) What is the P-value for this test? (iii) Construct a two-sided CI for σ².

Exercise 11: (Tutorial 6, No.4) Aerospace engineers claim that the standard deviation of the percentage in an alloy used in aerospace casting is greater than 0.3. 51 parts were randomly selected, and the sample standard deviation of the percentage in an alloy used in aerospace casting is s = 0.37.

(i) At α = 0.05, do these data support the claim of the engineers? (ii) What is the P-value for this test? (iii) Construct a 95% two-sided CI for σ². What is your conclusion?


Solution:

(i) (a) The parameter of interest is the population variance, σ².

(b) The hypotheses: H0: σ² = 0.09 vs H1: σ² > 0.09

(c) The significance level α = 0.05

(d) The test statistic is: χ0² = (n - 1)s²/σ0² = 50(0.37)²/(0.3)² = 76.056

(e) Decision:

Reject H0 if χ0² > χ²0.05,50 = 67.50.

(f) Conclusion: Since 76.056 > 67.50, we reject the null hypothesis and conclude that the engineers' claim is supported at the 0.05 level of significance.

(ii) From the table, χ²0.025,50 = 71.42 and χ²0.01,50 = 76.15. Since 71.42 < 76.056 < 76.15, the P-value satisfies 0.01 < P < 0.025. Because the P-value is smaller than α = 0.05, we reject the null hypothesis, in agreement with part (i).

(iii) 95% two-sided CI is
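The chi-square computations for this exercise can be sketched as below. The standard library has no chi-square quantiles, so the critical values for 50 degrees of freedom (67.50, 71.42 and 32.36) are hard-coded from a chi-square table; treating the interval as one on σ² is an assumption of the sketch.

```python
n, s, sigma0 = 51, 0.37, 0.3
df = n - 1

# Test statistic for H0: sigma^2 = sigma0^2 against H1: sigma^2 > sigma0^2
chi2_0 = df * s**2 / sigma0**2

chi2_05 = 67.50                      # tabulated chi-square_{0.05, 50}
reject = chi2_0 > chi2_05

# 95% two-sided CI on sigma^2 from tabulated upper 2.5% and 97.5% points
chi2_025, chi2_975 = 71.42, 32.36
ci_var = (df * s**2 / chi2_025, df * s**2 / chi2_975)
print(round(chi2_0, 3), reject, ci_var)
```

Under these table values the interval lies entirely above 0.09, which is consistent with rejecting H0 in part (i).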


Exercise 12: Scientists claim that the variance of the sugar content of the syrup in canned peaches is 18 mg². A random sample of 10 cans yields a sample standard deviation of 4.8 mg.

(i) At α = 0.05, do these data support the claim of the scientists? (ii) What is the P-value for this test? (iii) Construct a 95% two-sided CI for σ². What is your conclusion?

Hypothesis Testing - Two Populations

Exercise 1: (Example 1) A random sample of size n1 = 25 taken from a normal population with σ1 = 5.2 has a mean x̄1 = 81. A second random sample of size n2 = 36, taken from a different normal population with σ2 = 3.4, has a mean x̄2 = 76.

(a) Do the data indicate that the true mean values μ1 and μ2 are different? Carry out a test at α = 0.01.

(b) Find a 90% CI on the difference in mean strength.
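A possible numerical setup for this exercise is sketched here, assuming the two standard deviations 5.2 and 3.4 are the known population values, so a two-sample z procedure applies.

```python
from math import sqrt
from statistics import NormalDist

n1, sigma1, xbar1 = 25, 5.2, 81.0
n2, sigma2, xbar2 = 36, 3.4, 76.0

se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # standard error of the difference
z0 = (xbar1 - xbar2) / se                    # test statistic for H0: mu1 = mu2
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))

z = NormalDist().inv_cdf(0.95)               # two-sided 90% interval
diff = xbar1 - xbar2
ci90 = (diff - z * se, diff + z * se)
print(round(z0, 2), p_value, ci90)
```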


Exercise 2: (Example 2) Two machines are used for filling plastic bottles with a net volume of 16.0 oz. The fill volume can be assumed normal with σ1 = 0.02 and σ2 = 0.025. A member of the quality engineering staff suspects that both machines fill to the same mean net volume, whether or not this volume is 16.0 oz. A random sample of 10 bottles is taken from the output of each machine with the following results:

(a) Do you think the engineer is correct? Use the P-value approach. (b) Find a 95% CI on the difference in means.

Exercise 3: (Tutorial 7, No.1)Two machine are used to fill plastic bottles with dishwashing detergent. The standard deviations of fill volume are known to be 10.01 and = 0.15 fluid ounce for two machines, respectively. Two random samples of n1 = 12 bottles from machine 1 and n2=10

bottles from machine 2 are selected, and the sample mean fill volumes are x̄1 = 30.61 and x̄2 = 30.24 fluid ounces. Assume normality.

(i) Test the hypothesis that both machines fill to the same mean volume. Use the P-value approach;

(ii) Construct a 90% two-sided CI on the mean difference in fill volume; and

(iii) Construct a 95% two-sided CI on the mean difference in fill volume. Compare and comment on the width of this interval to the width of the interval in part (ii).


Exercise 4: (Example 3) To find out whether a new serum will arrest leukemia, 9 mice, all with an advanced stage of the disease, are selected. 5 mice receive the treatment and 4 do not. Survival times, in years, from the time the experiment commenced are as follows:

Treatment 2.1 5.3 1.4 4.6 0.9

No treatment 1.9 0.5 2.8 3.1

At the 0.05 level of significance can the serum be said to be effective? Assume the two distributions to be of equal variances.
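The pooled two-sample t statistic for this exercise could be computed as in the sketch below. It assumes a one-sided alternative (treatment mean greater than control mean) and hard-codes the tabulated critical value t0.05,7 = 1.895, because the standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, variance

treatment = [2.1, 5.3, 1.4, 4.6, 0.9]
control = [1.9, 0.5, 2.8, 3.1]
n1, n2 = len(treatment), len(control)

# Pooled variance under the equal-variance assumption
sp2 = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)

t0 = (mean(treatment) - mean(control)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

t_crit = 1.895                      # tabulated t_{0.05, 7} (hard-coded assumption)
reject = t0 > t_crit
print(round(t0, 3), df, reject)
```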

Exercise 5: (Tutorial 7, No.2)A new policy regarding overtime pay was implemented. This policy decreased the pay factor for overtime work. Neither the staffing pattern nor the work loads changed. To determine if overtime loads changed under the policy, a random sample of employees was selected. Their overtime hours for a randomly selected week before and for another randomly selected week after the policy change were recorded as follows:

Employee:   1   2   3   4   5   6   7   8   9  10  11  12
Before:     5   4   2   8  10   4   9   3   6   0   1   5
After:      3   7   5   3   7   4   4   1   2   3   2   2

Assume that the two population variances are equal and the underlying population is normally distributed.

(i) Is there any evidence to support the claim that the average number of hours worked as overtime per week changed after the policy went into effect? Use a P-value approach in arriving at this conclusion.

(ii) Construct a 95% CI for the difference in mean before and after the policy change. Interpret this interval.


Exercise 6: The diameter of steel rods manufactured on two different extrusion machines is being investigated. Two random samples of sizes n1 = 15 and n2 = 17 are selected, and respectively. Assume that the data are drawn from normal distributions with equal variances.

(a) Is there evidence to support the claim that the two machines produce rods with different mean diameters ? Use the p – value approach.

(b) Construct a 95% CI on the difference in mean rod diameter.

Exercise 7: (Example 4) The following data represent the running times of films produced by 2 motion-picture companies. Test the hypothesis that the average running time of films produced by company 2 exceeds the average running time of films produced by company 1 by 10 minutes against the one-sided alternative that the difference is less than 10 minutes. Use α = 0.01 and assume the distributions of times to be approximately normal with unequal variances.

Company   Time
X1        102   86   98  109   92
X2         81  165   97  134   92   87  114


Exercise 8:Two companies manufacture a rubber material intended for use in an automotive application. 25 samples of material from each company are tested, and the amount of wear after 1000 cycles are observed. For company 1, the sample mean and standard deviation of wear are

and for company 2, we obtain

(a) Do the sample data support the claim that the two companies produce material with different mean wear? Assume each population is normally distributed but with unequal variances.

(b) Construct a 95% CI for the difference in mean wear of these two companies. Interpret this interval.

Exercise 9: (Tutorial 7, No.3) Professor A claims that a probability and statistics student can increase his or her score on tests if the person is provided with a pre-test the week before the exam. To test her theory, she selected 16 probability and statistics students at random and gave these students a pre-test the week before an exam. She also selected an independent random sample of 12 students who were given the same exam but did not have access to the pre-test. The first group had a mean score of 79.4 with standard deviation 8.8. The second group had a sample mean score of 71.2 with standard deviation 7.9.

(i) Do the data support Professor A's claim that the mean score of students who get a pre-test is different from the mean score of those who do not get a pre-test before an exam? Use the P-value approach and assume that their variances are not equal.

(ii) Construct a 95% CI for the difference in mean score of students who get a pre-test and those who do not get a pre-test before an exam. Interpret this interval.
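For the unequal-variance case in this exercise, the test statistic and the approximate degrees of freedom can be sketched as follows. The degrees of freedom use the Welch-Satterthwaite formula, which is an assumption about the method intended by the workbook; the P-value itself would then be read from a t table.

```python
from math import sqrt

n1, xbar1, s1 = 16, 79.4, 8.8     # students given the pre-test
n2, xbar2, s2 = 12, 71.2, 7.9     # students without the pre-test

v1, v2 = s1**2 / n1, s2**2 / n2   # per-group variance of the sample mean
se = sqrt(v1 + v2)

t0 = (xbar1 - xbar2) / se         # unequal-variance t statistic

# Welch-Satterthwaite approximation (usually rounded down before table lookup)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
print(round(t0, 3), round(df, 2))
```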


Exercise 10: (Example 5) A vote is to be taken among residents of a town and the surrounding county to determine whether a proposed chemical plant should be constructed. If 120 of 200 town voters favour the proposal and 240 of 500 county residents favour it, would you agree that the proportion of town voters favouring the proposal is higher than the proportion of county voters? Use α = 0.05.
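A sketch of the usual large-sample test for two proportions applied to these counts is shown below. It assumes a pooled estimate of the common proportion under H0 and an upper-tailed alternative.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 120, 200      # town voters in favour
x2, n2 = 240, 500      # county residents in favour

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)               # pooled proportion under H0: p1 = p2

se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z0 = (p1 - p2) / se
p_value = 1 - NormalDist().cdf(z0)           # upper-tailed P-value
print(round(z0, 3), round(p_value, 4))
```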

Exercise 11: (Tutorial 7, No.4) The rollover rate of sport utility vehicles is a transportation safety issue. Safety advocates claim that manufacturer A's vehicle has a higher rollover rate than that of manufacturer B. One hundred crashes for each of these vehicles were examined. The rollover rates were pA = 0.35 and pB = 0.25.

(i) By using the P-value approach, does manufacturer A's vehicle have a higher rollover rate than manufacturer B's?

(ii) Construct a 95% CI on the difference in the two rollover rates of the vehicles. Interpret this interval.


Exercise 12: Professor Rady gave 58 A's and B's to a class of 125 students in his section of English 101. The next term Professor Hady gave 45 A's and B's to a class of 115 students in his section of English 101.

(i) By using a 5% significance level, test the claim that Professor Rady gives a higher percentage of A's and B's in English 101 than Professor Hady does. What is your comment?

(ii) Construct a 95% CI on the difference in the percentage of A's and B's in English 101 given by these two professors.

Simple Linear Regression

Exercise 1: (Example 1) The manager of a car plant wishes to investigate how the plant's electricity usage depends upon the plant production. The data are given below.

Production (RM million) (x):   4.51  3.58  4.31  5.06  5.64  4.99  5.29  5.83  4.70  5.61  4.90  4.20
Electricity Usage (y):         2.48  2.26  2.47  2.77  2.99  3.05  3.18  3.46  3.03  3.26  2.67  2.53


(a) Estimate the linear regression equation.

(b) Find an estimate for the electricity usage when x = 5.

(c) Find a 90% confidence interval for the electricity usage.
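Parts (a) and (b) can be sketched with the ordinary least squares formulas b1 = Sxy/Sxx and b0 = ȳ - b1 x̄. The data are the twelve (x, y) pairs from the exercise; the variable names are illustrative.

```python
# Production (x) and electricity usage (y) from the exercise table
x = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.70, 5.61, 4.90, 4.20]
y = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx               # estimated slope
b0 = ybar - b1 * xbar        # estimated intercept
y_at_5 = b0 + b1 * 5.0       # point estimate of usage when x = 5
print(round(b0, 3), round(b1, 4), round(y_at_5, 3))
```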

Exercise 2: An experiment was set up to investigate the variation of the specific heat of a certain chemical with temperature. The data is given below

Temperature °F (x):   50    60    70    80    90    100
Heat (y):             1.60  1.63  1.67  1.70  1.71  1.71
                      1.64  1.65  1.67  1.72  1.72  1.74

(a) Estimate the linear regression equation.

(b) Plot the results on a scatter diagram.

(c) Find an estimate for the specific heat when the temperature is 75°F.

(d) Find a 95% confidence interval for the specific heat.

Exercise 3: (Example 2) An engineer at a semiconductor company wants to model the relationship between the device HFE (y) and the parameter Emitter-RS (x1). Data for Emitter-RS were first collected, a statistical analysis was carried out, and the output is displayed in the table given.

Regression Analysis: y = 1075.2 - 63.87 x1

Predictor    Coef      SE Coef    T        P-value
Constant     1075.2    121.1      8.88     0.000
x1           -63.87    8.002      -7.98    0.000

S = 19.4    R-Sq = 0.78

Analysis of variance
Source        DF    SS       MS       F
Regression     1    23965    23965    63.70
Residual      18     6772      376
Total         19    30737

(a) Estimate HFE when the Emitter-RS is 14.5.

(b) Obtain a 95% confidence interval for the true slope β.

(c) Test for significance of regression for α = 0.05.
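The three parts can be worked directly from the printed output, as in the sketch below. The critical values t0.025,18 = 2.101 and F0.05,1,18 = 4.41 are hard-coded from standard tables and are assumptions of the sketch.

```python
# Values read from the regression output
b0, b1, se_b1 = 1075.2, -63.87, 8.002
f0, df_resid = 63.70, 18

# (a) point estimate of HFE at Emitter-RS = 14.5
hfe_hat = b0 + b1 * 14.5

# (b) 95% CI on the slope
t_crit = 2.101                       # tabulated t_{0.025, 18}
ci_slope = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

# (c) significance of regression via the ANOVA F statistic
f_crit = 4.41                        # tabulated F_{0.05, 1, 18}
significant = f0 > f_crit
print(round(hfe_hat, 3), ci_slope, significant)
```

The slope interval excluding zero and the F test both point to the same conclusion, since for simple linear regression F0 equals the square of the slope's t statistic up to rounding.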


Exercise 4: A chemical engineer wants to model the relationship between the purity of oxygen (y) produced in a chemical distillation process and the percentage of hydrocarbons (x) that are present in the main condenser of the distillation unit. A statistical analysis is carried out and the output is displayed in the table given.

Regression Analysis: y = 74.3 + 14.9x

Predictor    Coef      SE Coef    T        P-value
Constant     74.283    1.593      46.62    0.000
x1           14.947    1.317      11.35    0.000

S = 1.087    R-Sq = 87.7%

Analysis of variance
Source        DF    SS        MS        F
Regression     1    152.13    152.13    128.86
Residual      18     21.25      1.18
Total         19    173.38

(a) Estimate the purity of oxygen when the percentage of hydrocarbons is 1%.

(b) Obtain a 95% confidence interval for the true slope β.

(c) Test for significance of regression for α = 0.05.

Exercise 5: (Tutorial 8, No.1) Regression methods were used to analyze the data from a study investigating the relationship between roadway surface temperature (x) and pavement deflection (y). The data follow.

Temperature x   Deflection y   Temperature x   Deflection y
70.0            0.621          72.7            0.637
77.0            0.657          67.8            0.627
72.1            0.640          76.6            0.652
72.8            0.623          73.4            0.630
78.3            0.661          70.5            0.627
74.5            0.641          72.1            0.631
74.0            0.637          71.2            0.641
72.4            0.630          73.0            0.631
75.2            0.644          72.7            0.634
76.0            0.639          71.4            0.638

(a) Estimate the intercept and slope regression coefficients. Write the estimated regression line.

(b) Compute SSE and estimate the variance.

(c) Find the standard error of the slope and intercept coefficients.

(d) Show that

(e) Compute the coefficient of determination, R². Comment on the value.

(f) Use a t-test to test for significance of the intercept and slope coefficients at . Give the P-values of each and comment on your results.

(g) Construct the ANOVA table and test for significance of regression using the P-value. Comment on your results and their relationship to your results in part (f).

(h) Construct 95% CIs on the intercept and slope. Comment on the relationship of these CIs and your findings in parts (f) and (g).

Exercise 6: (Tutorial 8, No.2) The designers of a database information system that allows its users to search backwards for several days wanted to develop a formula to predict the time it would take to search. Actual elapsed time was measured for several different values of days. The measured data are shown in the following table:

Number of Days   1     2     4     8     16    25
Elapsed Time     0.65  0.79  1.36  2.26  3.59  5.39

(i) Estimate the intercept and slope regression coefficients. Write the estimated regression line.

(ii) Compute SSE and estimate the variance.

(iii) Find the standard error of the slope and intercept coefficients.

(iv) Show that

(v) Compute the coefficient of determination, R². Comment on the value.

(vi) Use a t-test to test for significance of the intercept and slope coefficients at . Give the P-values of each and comment on your results.

(vii) Construct the ANOVA table and test for significance of regression using the P-value. Comment on your results and their relationship to your results in part (vi).

(viii) Construct 95% CIs on the intercept and slope. Comment on the relationship of these CIs and your findings in parts (vi) and (vii).
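Parts (i), (ii), (iii) and (v) of this exercise can be sketched with the standard simple linear regression formulas. The code uses only the six data pairs from the table; the standard errors follow the usual textbook expressions, which is an assumption about the intended method.

```python
from math import sqrt

days = [1, 2, 4, 8, 16, 25]
time = [0.65, 0.79, 1.36, 2.26, 3.59, 5.39]

n = len(days)
xbar, ybar = sum(days) / n, sum(time) / n
sxx = sum((x - xbar) ** 2 for x in days)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(days, time))

b1 = sxy / sxx                              # slope
b0 = ybar - b1 * xbar                       # intercept

# Residual sum of squares and variance estimate with n - 2 degrees of freedom
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(days, time))
sigma2_hat = sse / (n - 2)

se_b1 = sqrt(sigma2_hat / sxx)
se_b0 = sqrt(sigma2_hat * (1 / n + xbar**2 / sxx))

sst = sum((y - ybar) ** 2 for y in time)
r2 = 1 - sse / sst                          # coefficient of determination
print(round(b0, 4), round(b1, 4), round(sse, 4), round(r2, 4))
```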

Multiple Linear Regressions

Exercise 1: (Example 1) Given the data:

Test Number   y     x1   x2
 1            1.6   1    1
 2            2.1   1    2
 3            2.4   2    1
 4            2.8   2    2
 5            3.6   2    3
 6            3.8   3    2
 7            4.3   2    4
 8            4.9   4    2
 9            5.7   4    3
10            5.0   3    4

(a) Fit a multiple linear regression model to these data.
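A minimal sketch of the fit follows. For two predictors the normal equations reduce, after centering, to a 2 x 2 system that can be solved with Cramer's rule; this is one of several equivalent ways to obtain the least squares estimates, and the names are illustrative.

```python
# (y, x1, x2) for the ten test runs in the table
rows = [(1.6, 1, 1), (2.1, 1, 2), (2.4, 2, 1), (2.8, 2, 2), (3.6, 2, 3),
        (3.8, 3, 2), (4.3, 2, 4), (4.9, 4, 2), (5.7, 4, 3), (5.0, 3, 4)]

n = len(rows)
ybar = sum(r[0] for r in rows) / n
x1bar = sum(r[1] for r in rows) / n
x2bar = sum(r[2] for r in rows) / n

# Centered sums of squares and cross products
s11 = sum((r[1] - x1bar) ** 2 for r in rows)
s22 = sum((r[2] - x2bar) ** 2 for r in rows)
s12 = sum((r[1] - x1bar) * (r[2] - x2bar) for r in rows)
s1y = sum((r[1] - x1bar) * (r[0] - ybar) for r in rows)
s2y = sum((r[2] - x2bar) * (r[0] - ybar) for r in rows)

# Solve the centered normal equations by Cramer's rule
det = s11 * s22 - s12**2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = ybar - b1 * x1bar - b2 * x2bar
print(round(b0, 4), round(b1, 4), round(b2, 4))
```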

Exercise 2: Given the data:

Observation Number   Pull Strength y   Wire Length x1   Die Height x2
 1                    9.95              2                50
 2                   24.45              8               110
 3                   31.75             11               120
 4                   35.00             10               550
 5                   25.02              8               295
 6                   16.86              4               200
 7                   14.38              2               375
 8                    9.60              2                52
 9                   24.35              9               100
10                   27.50              8               300
11                   17.08              4               412
12                   37.00             11               400
13                   41.95             12               500
14                   11.66              2               360
15                   21.65              4               205
16                   17.89              4               400
17                   69.00             20               600
18                   10.30              1               585
19                   34.93             10               540
20                   46.59             15               250
21                   44.88             15               290
22                   54.12             16               510
23                   56.63             17               590
24                   22.13              6               100
25                   21.15              5               400

(b) Fit a multiple linear regression model to these data.

Exercise 3: A study was performed to investigate the shear strength of soil (y) as it related to depth in meters (x1) and percentage moisture content (x2). Ten observations were collected and the following summary quantities obtained:

(a) Estimate the parameters to fit the multiple regression model for these data.

(b) What is the predicted strength when x1 = 18 meters and x2 = 43%?


Exercise 4: (Example 2)A set of experimental runs were made to determine a way of predicting cooking time y at various levels of oven width x1, and temperature x2. The data were recorded as follows:


(a) Fit a multiple linear regression model to these data.

(b) Estimate σ² and the standard errors of the regression coefficients.

(c) Test for significance of β1 and β2.

(d) Predict the useful range when brightness = 80 and contrast = 75. Construct a 95% PI.

(e) Compute the mean response of the useful range when brightness = 80 and contrast = 75. Compute a 95% CI.

(f) Interpret parts (d) and (e) and comment on the comparison between the 95% PI and 95% CI.

Exercise 5: (Tutorial 9, No.1) An article in Optical Engineering (“Operating Curve Extraction of a Correlator's Filter,” Vol. 43, 2004, pp. 2775–2779) reported the use of an optical correlator to perform an experiment by varying brightness and contrast. The resulting modulation is characterized by the useful range of gray levels. The data are shown below.

Brightness (%): 54 61 65 100 100 100 50 57 54

Contrast (%): 56 80 70 50 65 80 25 35 26

Useful range (ng): 96 50 50 112 96 80 155 144 255

(a) Fit a multiple linear regression model to these data.

(b) Estimate and the standard errors of the regression coefficients.(c) Test for significance of and .


(d) Predict the useful range when brightness = 80 and contrast = 75. Construct a 95% PI.
(e) Compute the mean response of the useful range when brightness = 80 and contrast = 75. Compute a 95% CI.
(f) Interpret parts (d) and (e) and comment on the comparison between the 95% PI and 95% CI.
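One way to check hand calculations for parts (a) to (e) is the short sketch below. It is not the workbook's solution: it simply solves the normal equations for the brightness/contrast data in plain Python. The helper `invert`, the variable names, and the tabulated critical value t(0.025, 6) = 2.447 (n - p = 9 - 3 = 6 degrees of freedom) are choices made for this illustration.

```python
import math

brightness = [54, 61, 65, 100, 100, 100, 50, 57, 54]
contrast = [56, 80, 70, 50, 65, 80, 25, 35, 26]
useful_range = [96, 50, 50, 112, 96, 80, 155, 144, 255]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination (illustrative helper)."""
    size = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


# Design matrix with an intercept column: y = b0 + b1*brightness + b2*contrast.
X = [[1.0, b, c] for b, c in zip(brightness, contrast)]
n, p = len(X), len(X[0])

# Normal equations: (X'X) beta = X'y.
XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
Xty = [sum(row[i] * y for row, y in zip(X, useful_range)) for i in range(p)]
XtX_inv = invert(XtX)
beta = [sum(XtX_inv[i][j] * Xty[j] for j in range(p)) for i in range(p)]

# Residual variance, standard errors, and t ratios of the coefficients.
fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
residuals = [y - f for y, f in zip(useful_range, fitted)]
sigma2 = sum(r * r for r in residuals) / (n - p)
se_beta = [math.sqrt(sigma2 * XtX_inv[j][j]) for j in range(p)]
t_ratios = [b / s for b, s in zip(beta, se_beta)]

# Prediction at brightness = 80, contrast = 75.
x0 = [1.0, 80.0, 75.0]
y0 = sum(b * x for b, x in zip(beta, x0))
leverage = sum(x0[i] * XtX_inv[i][j] * x0[j] for i in range(p) for j in range(p))
T_CRIT = 2.447  # tabulated t(0.025, 6); assumed here rather than computed
ci_half = T_CRIT * math.sqrt(sigma2 * leverage)        # mean response (CI)
pi_half = T_CRIT * math.sqrt(sigma2 * (1 + leverage))  # new observation (PI)

print("coefficients:", [round(b, 4) for b in beta])
print("sigma^2:", round(sigma2, 2), "se:", [round(s, 4) for s in se_beta])
print("t ratios:", [round(t, 3) for t in t_ratios])
print("95% CI:", (round(y0 - ci_half, 2), round(y0 + ci_half, 2)))
print("95% PI:", (round(y0 - pi_half, 2), round(y0 + pi_half, 2)))
```

The fitted model comes out at roughly useful range = 238.56 + 0.334 (brightness) - 2.717 (contrast). The PI is always wider than the CI because it adds the variance of a single new observation to the variance of the estimated mean, which is the comparison part (f) asks about.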

Exercise 6: (Tutorial 9, No.2) A study was performed on wear of a bearing y and its relationship to x1 = oil viscosity and x2 = load. The following data were obtained:

x1 1.6 15.5 22.0 43.0 33.0 40.0

x2 851 816 1058 1201 1357 1115

y 293 230 172 91 113 125

(a) Fit a multiple regression model to these data.
(b) Estimate σ² and the standard errors of the regression coefficients.
(c) Use the model to predict wear when x1 = 25 and x2 = 1000.
(d) Fit a multiple regression model with an interaction term to these data.
(e) Estimate σ² and se(bj) for this new model. How did these quantities change? Does this tell you anything about the value of adding the interaction term to the model?


(f) Use the model in (d) to predict wear when x1 = 25 and x2 = 1000. Compare this prediction with the predicted value from part (c) above.

Factorial Experiments – 2² Factorial Design

Exercise 1: (Example 1)


An engineer is investigating the thickness of an epitaxial layer, which is subject to two levels of A, deposition time (+ for short time and – for long time), and two levels of B, arsenic flow rate (– for 55% and + for 59%). The engineer conducts a 2² factorial design with n = 4 replicates. The data are as follows:

                      Arsenic Level
Deposition Time       B – (Low, 55%)                    B + (High, 59%)
A – (Long)            14.037  14.165  13.972  13.907    13.880  13.860  14.032  13.914
A + (Short)           14.821  14.757  14.843  14.878    14.888  14.921  14.415  14.932

(a) Construct the 2 × 2 factorial design table.
(b) Find the estimates of all effects and the interaction.
(c) Construct the ANOVA table and, for each effect, test the null hypothesis that the effect is equal to 0.

Exercise 2: (Tutorial No1)

A two-factor experimental design was conducted to investigate the lifetime of a component being manufactured. The two factors are A (design) and B (cost of material). Two levels ((+) and (-)) of each factor are considered. Three components are manufactured with each combination of design and material, and the total lifetime measured (in hours) is as shown in the table below.

Treatment Combination   Design A   Material B   AB   Total lifetime of 3 components (in hours)
(1)                     -          -            +    122
a                       +          -            -    60
b                       -          +            -    120
ab                      +          +            +    118

(a) Perform a two-way analysis of variance to estimate the effects of design and material expense on the component lifetime.
(b) Based on your results in part (a), what conclusions can you draw from the factorial experiment?
(c) Indicate which effects are significant to the lifetime of a component.
(d) Write the least squares fitted model using only the significant sources.
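As a hedged illustration of how the effect estimates follow from the treatment totals, the sketch below applies the usual 2² contrast formulas (effect = contrast / (2n), sum of squares = contrast² / (4n)) to the totals for (1), a, b and ab with n = 3. The variable names are chosen for this example only. The error sum of squares, and therefore the F tests, cannot be obtained from the totals alone, so the sketch stops at the effects and their sums of squares.

```python
# Treatment totals from the table; each is the sum over n = 3 components.
n = 3
totals = {"(1)": 122, "a": 60, "b": 120, "ab": 118}

# Sign columns of the 2x2 design, exactly as listed in the table.
signs = {
    "A":  {"(1)": -1, "a": +1, "b": -1, "ab": +1},
    "B":  {"(1)": -1, "a": -1, "b": +1, "ab": +1},
    "AB": {"(1)": +1, "a": -1, "b": -1, "ab": +1},
}


def contrast(effect_name):
    """Signed sum of the treatment totals for one effect."""
    return sum(signs[effect_name][t] * totals[t] for t in totals)


effects = {name: contrast(name) / (2 * n) for name in signs}
sums_of_squares = {name: contrast(name) ** 2 / (4 * n) for name in signs}

# In coded (-1/+1) units, each regression coefficient is half the effect.
grand_mean = sum(totals.values()) / (4 * n)
coefficients = {name: effects[name] / 2 for name in signs}

for name in signs:
    print(name, "effect:", round(effects[name], 3),
          "SS:", round(sums_of_squares[name], 3))
print("grand mean:", grand_mean)
```

With these totals the contrasts are -64, 56 and 60, giving effects of about -10.67 (A), 9.33 (B) and 10.00 (AB) around a grand mean of 35 hours per component.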

Exercise 3: An engineer suspects that the surface finish of metal parts is influenced by the type of paint used and the drying time. He selected three drying times (20, 25, and 30 minutes) and used two types of paint. Three parts are tested with each combination of paint type and drying time. The data are as follows:

             Drying Time (min)
Paint        20            25            30
ICI          74  64  50    73  61  44    78  85  92
NIPPON       92  86  68    98  73  88    66  45  85


(a) Compute the estimates of the effects and their standard errors for this design.
(b) Construct two-factor interaction plots and comment on the interaction of the factors.
(c) Use the t ratio to determine the significance of each effect with α = 0.05. Comment on your findings.
(d) Compute an approximate 95% CI for each effect. Compare your results with those in part (c) and comment.
(e) Perform an analysis of variance of the appropriate regression model for this design. Include in your analysis hypothesis tests for each coefficient, as well as residual analysis. State your final conclusions about the adequacy of the model. Compare your results to part (c) and comment.

Exercise 4: (Tutorial 10, No.2) An experiment involves a storage battery used in the launching mechanism of a shoulder-fired ground-to-air missile. Two material types can be used to make the battery plates. The objective is to design a battery that is relatively unaffected by the ambient temperature. The output response from the battery is effective life in hours. Two temperature levels are selected, and a factorial experiment with four replicates is run. The data are as follows:

             Temperature (°F)
Material     Low                   High
1            130  155  74  180     20  70  82  58
2            138  110  168  160    96  104  82  60


(a) Compute the estimates of the effects and their standard errors for this design.
(b) Construct two-factor interaction plots and comment on the interaction of the factors.
(c) Use the t ratio to determine the significance of each effect with α = 0.05. Comment on your findings.
(d) Compute an approximate 95% CI for each effect. Compare your results with those in part (c) and comment.
(e) Perform an analysis of variance of the appropriate regression model for this design. Include in your analysis hypothesis tests for each coefficient, as well as residual analysis. State your final conclusions about the adequacy of the model. Compare your results to part (c) and comment.
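Parts (a), (c) and (d) can be sketched for the battery data as follows. This is an illustration under stated assumptions, not the workbook's solution: the -1/+1 coding of material and temperature, the variable names, and the tabulated critical value t(0.025, 12) = 2.179 are choices made here. The standard error uses se(effect) = sqrt(σ² / (n·2^(k-2))) with σ² estimated by pooling the four within-cell variances.

```python
import math
from statistics import mean, variance

# Keys are (material, temperature) in coded units (assumed coding):
# material 1 = -1, material 2 = +1; low temperature = -1, high = +1.
cells = {
    (-1, -1): [130, 155, 74, 180],
    (+1, -1): [138, 110, 168, 160],
    (-1, +1): [20, 70, 82, 58],
    (+1, +1): [96, 104, 82, 60],
}
n = 4  # replicates per cell
k = 2  # number of factors


def effect(sign):
    """Effect estimate: signed sum of the cell totals divided by n * 2^(k-1)."""
    contrast = sum(sign(a, b) * sum(obs) for (a, b), obs in cells.items())
    return contrast / (n * 2 ** (k - 1))


effects = {
    "material": effect(lambda a, b: a),
    "temperature": effect(lambda a, b: b),
    "interaction": effect(lambda a, b: a * b),
}

# Pooled error variance: with equal n, the average of the cell variances,
# carrying 2^k * (n - 1) = 12 degrees of freedom.
sigma2 = mean(variance(obs) for obs in cells.values())
se_effect = math.sqrt(sigma2 / (n * 2 ** (k - 2)))

T_CRIT = 2.179  # tabulated t(0.025, 12); assumed here rather than computed
for name, estimate in effects.items():
    t_ratio = estimate / se_effect
    low, high = estimate - 2 * se_effect, estimate + 2 * se_effect
    print(f"{name}: effect={estimate:.3f}, t={t_ratio:.2f}, "
          f"significant={abs(t_ratio) > T_CRIT}, approx 95% CI=({low:.1f}, {high:.1f})")
```

For these data the effects are 18.625 (material), -67.875 (temperature) and 9.375 (interaction), with a standard error of about 15.46, so only the temperature t ratio exceeds the critical value in absolute size. The approximate interval in part (d) is taken here as effect ± 2 se.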

Exercise 5: An article in the IEEE Transactions on Semiconductor Manufacturing (Vol. 5, 1992, pp. 214-222) describes an experiment to investigate the surface charge on a silicon wafer. The factors thought to influence induced surface charge are cleaning method (spin rinse dry, or SRD, and spin dry, or SD) and the position on the wafer where the charge was measured. The surface charge (×10¹¹ q/cm³) response data are shown.

                    Test Position
Cleaning Method     L         R
SD                  1.66      1.84
                    1.90      1.84
                    1.92      1.62
SRD                -4.21     -7.58
                   -1.35     -2.20
                   -2.08     -5.36

(a) Compute the estimates of the effects and their standard errors for this design.
(b) Construct two-factor interaction plots and comment on the interaction of the factors.
(c) Use the t ratio to determine the significance of each effect with α = 0.05. Comment on your findings.
(d) Compute an approximate 95% CI for each effect. Compare your results with those in part (c) and comment.
(e) Perform an analysis of variance of the appropriate regression model for this design. Include in your analysis hypothesis tests for each coefficient, as well as residual analysis. State your final conclusions about the adequacy of the model. Compare your results to part (c) and comment.
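For part (e), the regression view of a 2² design can be sketched as below for the surface charge data. The coding (x1 = -1 for SD and +1 for SRD, x2 = -1 for position L and +1 for position R) and all variable names are assumptions made for this illustration. With coded factors each regression coefficient is contrast / N, each single-degree-of-freedom sum of squares is contrast² / N, and the F ratio compares it with the mean square error on N - 4 = 8 degrees of freedom.

```python
from statistics import mean

# Keys are (x1, x2) in coded units (assumed coding):
# x1: SD = -1, SRD = +1; x2: position L = -1, position R = +1.
cells = {
    (-1, -1): [1.66, 1.90, 1.92],
    (-1, +1): [1.84, 1.84, 1.62],
    (+1, -1): [-4.21, -1.35, -2.08],
    (+1, +1): [-7.58, -2.20, -5.36],
}
n = 3      # replicates per cell
N = 4 * n  # total number of observations

all_obs = [y for obs in cells.values() for y in obs]
grand_mean = mean(all_obs)  # intercept of the coded regression model

terms = {
    "x1": lambda x1, x2: x1,
    "x2": lambda x1, x2: x2,
    "x1*x2": lambda x1, x2: x1 * x2,
}


def contrast(sign):
    """Signed sum of the cell totals for one model term."""
    return sum(sign(x1, x2) * sum(obs) for (x1, x2), obs in cells.items())


# Coefficient = effect / 2 = contrast / N; sum of squares = contrast^2 / N.
coefficients = {name: contrast(sign) / N for name, sign in terms.items()}
ss_terms = {name: contrast(sign) ** 2 / N for name, sign in terms.items()}

# Error and total sums of squares, then F ratios on (1, N - 4) degrees of freedom.
ss_error = sum((y - mean(obs)) ** 2 for obs in cells.values() for y in obs)
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
ms_error = ss_error / (N - 4)
f_ratios = {name: ss / ms_error for name, ss in ss_terms.items()}

# Cell means are the points plotted in a two-factor interaction plot.
cell_means = {key: mean(obs) for key, obs in cells.items()}

print("intercept:", round(grand_mean, 4))
for name in terms:
    print(name, "coef:", round(coefficients[name], 4),
          "SS:", round(ss_terms[name], 4), "F:", round(f_ratios[name], 2))
print("SSE:", round(ss_error, 4), "SST:", round(ss_total, 4))
print("cell means:", {k: round(v, 4) for k, v in cell_means.items()})
```

The three term sums of squares plus the error sum of squares add up to the total, which is a quick check on the arithmetic. Each F ratio equals the square of the corresponding t ratio from part (c), so the two sets of tests should agree.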
