
Cheat Sheet for Test 4 Updated


VCU MGMT 302 Statistics Notes for Exam 4 with professor burch


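$S_e = \sqrt{\dfrac{SSE}{n-2}} = \sqrt{MSE}$

$R^2 = \dfrac{SSR}{SS_{Total}} = 1 - \dfrac{SSE}{SS_{Total}}$

$R^2_{adjusted} = 1 - \dfrac{SSE/(n-k-1)}{SS_{Total}/(n-1)} = 1 - \dfrac{n-1}{n-k-1}\,(1 - R^2)$

|R| value & type of association:
0: No linear ass'n
0 to 0.5: Weak
0.5 to 0.8: Moderate
0.8 to 1: Strong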

Quiz & Test Averages:
R2 = 0.2565 Interpret R2: 25.65% of the variation in test average is due to the quiz average & model.
Se = 13.1 Interpret: On average, the predicted test average is 13.1 points from the actual test average.
R = 0.5065 Interpret: There is a moderate positive linear ass'n between quiz average & test average.

Conditions for Inferences for Linear Model—QOLINE—Acronym:

Quantitative: Check to ensure the 2 variables really are quantitative. A variable is quantitative if it makes sense to average it.

Outliers: To determine if a given value is an outlier that needs to be removed, take it out & examine the resulting change in the slope of the model.

Linear: Is a line or a curve the best model for the data? Create a residual plot (residuals vs. x). On the x-axis, you can plot x or ŷ. On the y-axis, plot the residuals. If the best way to model how x & y relate is a:
(1) line, the residual plot will show points randomly scattered above & below 0 with no U-shape
(2) curve, the residual plot has a U shape.
StatCrunch: Stat > Regression > Simple Linear, then select fitted values vs. residuals OR residuals vs. x values.

Independent: Are the residuals independent? Look at the residual index plot; if there is no pattern, this condition is satisfied. If there is a pattern, it is violated.

Normal: Are the residuals normally distributed? It is satisfied as long as the residuals are not very skewed. You can look at the histogram of the residuals. If n is small, look at a QQ-plot. If the residuals are normally distributed, the dots will fall along the line.

Equal Spread: Are the points equally spread about the regression line? Satisfied if there is no pattern in the residual plot (the points are randomly scattered above & below 0). It is violated if the points on the residual plot show a fan pattern (<, >).

3 ways to tell if 2 quantitative vars have a linear ass’n: (1) T-Test (2) ANOVA (3) CI for the slope B1

Method One—A T-Test—Example—% Enrolled & Tuition:

Simple linear regression results:
Dep. Variable: Tuition
Independent Variable: %Enroll
Tuition = $16,999.074 + 52.716232 %Enroll
Sample size: 59
R (correlation coef.) = 0.1048
R-sq = 0.0109
Estimate of error SD: 5926.138

Parameter estimates:

Parameter   Estimate    Std. Err.   Alternative  DF  T-Stat      P-Value
Intercept   16999.074   2585.9006   ≠ 0          57  6.5737543   <0.0001
Slope       52.716232   66.28215    ≠ 0          57  0.79533076  0.4297

ANOVA table for regression model:

Source  DF  SS            MS           F-stat     P-value
Model   1   2.2214634E7   2.2214634E7  0.6325511  0.4297
Error   57  2.00178957E9  3.5119116E7
Total   58  2.02400422E9
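The summary numbers in the output above can be rebuilt from the ANOVA table's sums of squares. The following is a minimal sketch, not StatCrunch's own code; the function name `fit_stats` is made up for illustration. It also checks the simple-regression fact that the slope's T-stat squared equals the F-stat.

```python
import math

def fit_stats(ss_model, ss_error, n, k=1):
    """Recompute R^2, Se and F from ANOVA sums of squares (hypothetical helper)."""
    ss_total = ss_model + ss_error
    df_error = n - k - 1               # n - 2 for simple linear regression
    mse = ss_error / df_error          # mean square error
    return {
        "r_squared": ss_model / ss_total,
        "s_e": math.sqrt(mse),         # Se = sqrt(MSE), the "estimate of error SD"
        "f_stat": (ss_model / k) / mse,
    }

# Values copied from the Tuition vs. %Enroll output
stats = fit_stats(ss_model=2.2214634e7, ss_error=2.00178957e9, n=59)

# T-stat for the slope = estimate / standard error (testing B1 = 0)
t_slope = 52.716232 / 66.28215

print(stats)
print("t^2 =", t_slope ** 2)  # matches the F-stat when there is only 1 predictor
```

Because T squared equals F here, the T-test and the ANOVA give the same P-value (0.4297) for a model with one predictor.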

(1) H0: B1 = 0 HA: B1 ≠ 0
(2) α = 0.05
(3) Conditions:
  (1) Quantitative: Yes, % enrolled & tuition are both quantitative variables.
  (2) Outliers: Yes, we removed 2 outliers.
  (3) Linear: Yes, there is no U-shape in the residual plot.
  (4) Independent: The residuals are independent because there is no pattern in the residual index plot.
  (5) Normal: Yes, the residuals are normally distributed. The histogram is a little bit skewed to the right, but the QQ plot doesn't look bad.
  (6) Equal Spread: Yes, the points are equally spread about the regression line. There is no fan pattern on the residual plot.
(4) T = (b1 - B1) / (SE of b1) = (-0.00048 - 0) / 0.0022 = -0.218 Interpret T: b1 (= -0.00048) is 0.218 SEs to the left of B1 = 0.
(5) P = 0.8283 Interpret P-Value: Assuming B1 = 0, there is an 82.83% chance of getting a b1 at least 0.218 SEs from 0 in either direction.
(6) P-Value 0.8283 > α 0.05: Fail to Reject H0
(7) There is not strong enough evidence to show that there is a linear association between % enrolled & tuition.

Method 2—ANOVA—Diamonds Example:

(1) H0: B1 = 0 HA: B1 ≠ 0
(2) α = 0.05
(3) Conditions:
  (1) Quantitative: Yes, diamond size & price really are both quantitative variables.
  (2) Outliers: Yes, we removed 2 outliers.
  (3) Linear: Yes, there is no U-shape in the residual plot.
  (4) Independent: The residuals are indep. because there is no pattern in the residual index plot.
  (5) Normal: Yes, the residuals are normally distributed. The histogram is a little bit skewed to the right, but the QQ plot doesn't look bad.
  (6) Equal Spread: Yes, the points are equally spread about the regression line. There is no fan pattern on the residual plot.
(4) F = 1,773.72 Interpret F: There is 1,773.72 times as much variation in diamond price (y) due to the size & model as there is due to random error.
(5) Find P-Value: P < 0.0001 Interpret P-Value: Assuming B1 = 0, there is less than a 0.01% chance of getting an F of 1,773.72 or even more.
(6) P-Value < 0.0001 < α 0.05: Reject H0
(7) There is strong evidence to show that there is a linear ass'n between diamond size & price.

Method 3—A CI for the Slope—B1—Diamonds Example:

CI = Estimate +/- ME = b1 +/- T* (SE of b1) = 3,671.4 +/- (2.014)(87.17) = (3,495.82, 3,846.98)
T* comes from the table, using df = n – 2 & a confidence level of 95%.
Interpret CI: I'm 95% confident the diamond price will ↑ between $3,495.82 & $3,846.98 when the size ↑'s by 1 carat.
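The interval arithmetic above is easy to script. This is a small sketch; `slope_ci` and `contains_zero` are hypothetical helper names, and T* still has to be looked up from a table or calculator with df = n - 2.

```python
def slope_ci(b1, se_b1, t_star):
    """Confidence interval for the slope: b1 +/- T* x (SE of b1)."""
    margin = t_star * se_b1
    return (b1 - margin, b1 + margin)

def contains_zero(ci):
    """If the CI for the slope contains 0, there is no evidence of a linear ass'n."""
    low, high = ci
    return low <= 0 <= high

# Diamonds example: b1 = 3,671.4, SE of b1 = 87.17, T* = 2.014
diamond_ci = slope_ci(3671.4, 87.17, 2.014)
print(diamond_ci, contains_zero(diamond_ci))

# Tuition example: interval copied from the notes
tuition_ci = (-0.00049, 0.000397)
print(tuition_ci, contains_zero(tuition_ci))
```

The diamonds interval should agree with the one in the notes up to rounding of T*.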

Tuition Example: CI = (-0.00049, 0.000397) Interpret CI: There is not enough evidence of a linear ass'n between tuition & % enrolled b/c the interval contains 0.

Quiz/Test Average Example: CI = (3.13, 8.12) Interpret CI: I’m 95% confident that the y-value (test average) will ↑ between 3.13 & 8.12 points when the quiz average is ↑’d by 1 point.

Confidence & Prediction Intervals:

95% CI for mean: (474.48, 492.94): I’m 95% confident that the mean price of all diamonds that are 0.2 carats is between $474.48 & $492.94.

Prediction interval for 1 diamond: 95% PI = (419.77, 547.65): I’m 95% confident that the price of 1 diamond which is 0.2 carats is between $419.77 & $547.65.

b1 = 5.63 CI for b1 = (3.13, 8.12) I’m 95% confident the test average will ↑ between 3.13 & 8.12 points when the quiz average increases by 1 point.

Predict Test Avg if Quiz Avg = 8: y = 75.84 95% CI for mean = (72.26, 79.42) I’m 95% confident that the mean test avg for all students with a quiz avg of 8 is between 72.26 & 79.42 points.

95% PI = (49.38, 102.30): I’m 95% confident that the test average for 1 student with a quiz average of 8 is between 49.38 & 102.3 points.

CI's are for the mean: the goal is to capture the mean. PI's are for 1 value: the goal is to capture 1 value.

(1) Why are the 95% CI's narrower/tighter than the 95% PI's? Individual y values vary much more than the mean y value, which should be close to the middle of the y values. So, to capture 1 individual y value, the PI must be much wider than the CI, which only needs to capture a mean.

(2) Why do the CIs & PIs bend?
CI: $\hat{y} \pm T^* \cdot SE(\hat{y})$, where $SE(\hat{y}) = S_e \sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{SS_x}}$

We want the SE of ŷ to be small. There are 4 components to this:
(1) Se: We want this to be small, meaning the points are tightly clustered around the SLR line. This has nothing to do with the bend. Se is the same no matter which x we choose for our prediction.
(2) n: We want n to be large; the more points the better. This has nothing to do with the bend. n is the same no matter which x we choose for our prediction.
(3) (x − x̄)²: We want this to be small so that the SE of ŷ is small. This is what's causing the bend to occur. Intervals for x values close to the middle (close to x̄) will be narrower than intervals for x-values on the edges of the sample.
(4) SSx: We want this to be large so that the SE of ŷ is small. This measures the variety in the x-values.
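To see the bend numerically, here is a rough sketch that evaluates the standard error at several x values. The x data are made up for illustration, Se = 13.1 is borrowed from the quiz/test example, and the function names are hypothetical. The prediction-interval version uses the standard formula with an extra 1 under the square root, which is why PIs are always wider.

```python
import math

def se_mean_response(x0, xs, s_e):
    """SE of y-hat for the MEAN response at x0 (used in the CI)."""
    n = len(xs)
    x_bar = sum(xs) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    return s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_x)

def se_single_prediction(x0, xs, s_e):
    """SE for ONE new y value at x0 (used in the PI): extra '1 +' under the root."""
    n = len(xs)
    x_bar = sum(xs) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    return s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)

# Hypothetical x values (NOT from the notes); x-bar = 5
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
s_e = 13.1

for x0 in (1, 3, 5, 7, 9):
    print(x0,
          round(se_mean_response(x0, xs, s_e), 3),
          round(se_single_prediction(x0, xs, s_e), 3))
```

Both columns are smallest at x0 = 5 (the mean of the x's) and grow toward the edges, which is the bend in the interval bands.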

Multiple Linear Regression: More than 1 predictor variable (x)
If the # of x's = 1, then model with a line.
If the # of x's = 2, then model with a plane.
If the # of x's = 3, then model with a hyperplane (the higher-dimensional version of a plane).

The "linear" in multiple linear regression refers to how each x relates to its coefficient: each x is multiplied by its own coefficient & the terms are added together.

K = # of predictor variables (x’s)

Demand = Demand for a laundry detergent for a 4 wk. sales period (in 100s of 1,000s of bottles)
Price = Price charged for detergent during the 4-week period.
AIP = Average Industry Price: how much their competition is charging, on average.
Advertising = How much you're spending on advertising (measured in 100s of 1,000s of $$s)

Multiple linear regression results:
Dep. Var: Demand (y)
Indep. Var(s): Price, Advertising, AIP

Parameter estimates:

Variable     Estimate   SE        T stat    P-value
Intercept    7.5891025  2.44502   3.10390   0.0046
Price        -2.357723  0.63795   -3.69579  0.001
Advertising  0.501152   0.12587   3.98138   0.0005
AIP          1.612214   0.295353  5.45861   <0.0001

ANOVA table for multiple regress. model:

Source  DF  SS      MS       F-stat  P-value
Model   3   12.027  4.00892  72.8    <0.0001
Error   26  1.4318  0.05507
Total   29  13.459

Summary of fit: Root MSE: 0.235 R2: 0.8936
Demand = 7.59 – 2.36 Price + 1.61 AIP + 0.5 Advertising
Predict demand if price = $3.70, AIP = $3.80, & Advertising = $550,000 (entered as 5.5): Demand = 7.748 (774,800 bottles)
To find on StatCrunch: Add the values for the 3 vars > multiple regression > save prediction values

b0 = 7.59: Interpret: I predict the demand will be 759,000 bottles if price, AIP, & advertising = $0.
bPrice = -2.36: Interpret: I predict demand will ↓ 236,000 bottles when price ↑s by $1, if AIP & advertising are held constant.
bAIP = 1.61: Interpret: I predict demand will ↑ 161,000 bottles when AIP ↑s by $1, if price & advertising are held constant.
bAdvertising = 0.5: Interpret: I predict demand will ↑ 50,000 bottles when advertising ↑'s $100,000, if price & AIP are held constant.

Predict Demand if price = $4.70, AIP = $3.80 & Advertising = 5.5: Demand = 539,000 bottles
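The two demand predictions can be checked by plugging into the fitted equation. This is a minimal sketch using the unrounded StatCrunch estimates; the name `predict_demand` is hypothetical, and advertising is entered in 100s of 1,000s of $ (so $550,000 becomes 5.5).

```python
# Estimates copied from the detergent "Parameter estimates" output
COEFFS = {
    "intercept": 7.5891025,
    "price": -2.357723,
    "advertising": 0.501152,
    "aip": 1.612214,
}

def predict_demand(price, aip, advertising):
    """Predicted demand in 100s of 1,000s of bottles (hypothetical helper)."""
    return (COEFFS["intercept"]
            + COEFFS["price"] * price
            + COEFFS["aip"] * aip
            + COEFFS["advertising"] * advertising)

demand_370 = predict_demand(price=3.70, aip=3.80, advertising=5.5)
demand_470 = predict_demand(price=4.70, aip=3.80, advertising=5.5)

print(round(demand_370, 3))               # demand at price = $3.70
print(round(demand_470, 3))               # demand at price = $4.70
print(round(demand_470 - demand_370, 3))  # difference = the price coefficient
```

Raising price by $1 with everything else fixed changes the prediction by exactly the price coefficient, which is what "held constant" means in the interpretations.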

Root MSE = Se.
R measures the strength of a linear association (a line). Multiple regression does not use lines, so we do not have R in multiple regression.

Multiple linear regression results:
Dep. Variable: Salary
Indep. Vars: YrsEm, PriorYr, Educ

Parameter estimates:

Variable   Estimate    Std. Err.  Tstat        P-value
Intercept  23480.461   2027.6963  11.579871    <0.0001
YrsEm      671.32545   143.21248  4.6876185    <0.0001
PriorYr    -73.827338  232.78401  -0.31714953  0.7527
Educ       1925.8824   384.43946  5.0095856    <0.0001

ANOVA table for multiple regression model:

Source  DF  SS           MS           F-stat     P-value
Model   3   4.0317309e9  1.3439103e9  39.959939  <0.0001
Error   42  1.4125205e9  33631440
Total   45  5.4442514e9

Summary of fit: Root MSE: 5799.26 R2: 0.7405 R2 (adjusted): 0.722

Salary = 23,480.46 + 671.33 Years Employed – 73.83 Prior Years + 1,925.88 Education

bEducation = 1,925.88: We predict salary will ↑ by $1,925.88 when education ↑s by 1 year, if years of employment & prior years are held constant.
b0 = 23,480.46: We predict salary will be $23,480.46 when years employed = 0 years, prior years = 0, & education = 0 years.
Predict salary if yrs employed = 5, prior yrs of work = 5, & education = 4: Salary = $34,171.48

Assessing the Multiple Regression Model:

Detergent: RootMSE = Se = 0.23466: On avg, each predicted demand is 23,466 bottles from the actual demand.
Temco file: RootMSE = Se = 5799.26: On avg, each predicted salary is $5,799.26 from actual salary.

$SSE = \sum e^2$

$R^2 = \dfrac{SSR}{SS_{Total}} = \dfrac{\text{Explained Variation}}{\text{Total Variation}}$ = % of the variation in Y that's explained by the model.

Source  DF         SS          MS
Model   k          Σ(ŷ − ȳ)²   SSR / k
Error   n − k − 1  Σ(y − ŷ)²   SSE / (n − k − 1)
Total   n − 1      Σ(y − ȳ)²   -

Temco: R2 = 0.7405: 74.05% of the variation in salary is due to years of employment, prior year experience, education, & the model. Detergent: R2 = 0.8936: 89.36% of the variation in demand is explained by price, AIP, the model, & advertising.

If you want to compare models that have different n’s or different variables, you can’t use r2.

R2 Adjusted—This is used to compare models which have different #s of predictor variables (x’s) or different sample sizes. So, if adding a new predictor variable improves the model, R2 adjusted will go up. If adding a new predictor variable makes the model worse, R2 adjusted goes down. We don’t use R2 for comparison b/c it’ll ↑ when new predictors are added whether they’re good or bad.

SALARY: R2 adjusted = 1 – ((n − 1) / (n − k − 1)) (SSE / SST) = 1 – ((46 − 1) / (46 − 3 − 1)) (1.4125e9 / 5.4443e9) = 0.7220

R2 adjusted = 0.7220: 72.20% of the variation in salary is due to years of experience, prior years of experience, education, & the model after adjusting for the sample size & the # of predictors.

K = 2 n = 15 SSE = 61.43 SStotal = 325.14 Se = √5.12 = 2.26

Source  DF  SS      MS
Model   2   263.71  131.86
Error   12  61.43   5.12
Total   14  325.14  -

R2 = SSR / SSTotal = 263.71 / 325.14 = 0.8111
R2 adjusted = 1 – (14/12) (61.43 / 325.14) = 0.7796

R2 adjusted is always smaller than r2 for the same model.

K = 3 n = 15 SSE = 51.192 R2 = 0.843 Se = 2.157 R2 adjusted = 0.8002

Source  DF  SS      MS
Model   3   274.87  91.62
Error   11  51.192  4.654
Total   14  326.06  -
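A quick way to compare the K = 2 and K = 3 models is to recompute R2 and R2 adjusted straight from SSE and SStotal. This is only a sketch with hypothetical function names; the numbers are the ones in the two ANOVA tables.

```python
def r_squared(sse, sst):
    """R^2 = 1 - SSE / SStotal."""
    return 1 - sse / sst

def adjusted_r_squared(sse, sst, n, k):
    """R^2 adjusted = 1 - ((n - 1) / (n - k - 1)) * (SSE / SStotal)."""
    return 1 - ((n - 1) / (n - k - 1)) * (sse / sst)

# K = 2 model and K = 3 model, both with n = 15
r2_k2 = r_squared(61.43, 325.14)
adj_k2 = adjusted_r_squared(61.43, 325.14, n=15, k=2)
r2_k3 = r_squared(51.192, 326.06)
adj_k3 = adjusted_r_squared(51.192, 326.06, n=15, k=3)

print("K=2:", round(r2_k2, 4), round(adj_k2, 4))
print("K=3:", round(r2_k3, 4), round(adj_k3, 4))
print("3rd predictor helps:", adj_k3 > adj_k2)
```

R2 adjusted goes up when the 3rd predictor is added, so by the rule in these notes the extra predictor improves the model. In both cases R2 adjusted is smaller than R2.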

ANOVA tests the overall model to see if any predictor variables are good.

1. H0: B1 = B2 = B3 = 0 (No predictors are good)
2. HA: At least 1 Bi is ≠ 0 (At least 1 predictor is good)
3. α = 0.05
4. F = 72.797
5. P-value < 0.0001
6. Reject H0
7. There is strong evidence to show at least 1 predictor is good at predicting demand.

Once you know at least 1 predictor is good, use t-tests or ci for slopes to tell which 1(s) are good

CI Example: Is price a good predictor?

CI for BPRICE = bPRICE +/- T* (SE of bPRICE) = -2.36 +/- (2.056)(0.638) = -2.36 +/- 1.31
Df = n – k – 1 = 26 df @ 95% confidence

CI for BPRICE = (-3.67, -1.05): Price is a good predictor because the CI does not contain 0.

T-Test Example: Is AIP a good predictor?

H0: BAIP = 0 HA: BAIP ≠ 0
P-value < 0.0001
AIP is a good predictor of demand.
If 2 predictors have the same P-value, the one with the bigger T-Stat is better.

Predict Demand for the detergents if: Price = $3.70 AIP = $3.80 Advertising = $550,000

CI = (7.509, 7.986): I'm 95% confident the mean demand for all 4-week sale periods where price = $3.70, AIP = $3.80, & advertising = $550,000 is between 750,900 & 798,600 bottles.
PI = (7.21, 8.286): I'm 95% confident the demand for 1 4-week sale period where price = $3.70, AIP = $3.80, & advertising = $550,000 is between 721,000 & 828,600 bottles.

How to Check the Conditions:

Quantitative: Check that the variables are quantitative.

Outliers: Use Cook's D (distance). A row is an outlier if its Cook's D value is > a certain amount. To tell where that amount is: Go to the F calculator. In the numerator df, put k + 1 (4 for detergents). In the denominator df, put n – k – 1 (26 for detergents). It doesn't matter if you use < or >, so use P(x ≤ blank) = 0.5. The certain amount for our problem = 0.86. Stat > Graphs > Histogram

Linear: Residual plot. Stat > Residuals > Graph > Scatterplot. No U-shape on any x residual plot; check the x's independently.

Independent: Are the residuals independent? Look at the residual index plot; if there is no pattern, this condition is satisfied. If there is a pattern, it is violated.

Normal: Are the residuals normally distributed? It is satisfied as long as the residuals are not very skewed. You can look at the histogram of the residuals. If n is small, look at a QQ-plot. If the residuals are normally distributed, the dots will fall along the line.

Equal Spread: Are the points equally spread about the regression line? Satisfied if there is no pattern in the residual plot (the points are randomly scattered above & below 0). It is violated if the points on the residual plot show a fan pattern (<, >). X axis = predicted values (ŷ).

Incorporating categorical data into a multiple regression model:

0 = “Off”: Absence of a Quality

1 = “On”: Presence of a Quality

Gender  Dummy Female
Male    0
Female  1

You need 1 fewer dummy variable than the number of categories you're trying to code.

Example—Coding 4 different departments—Sales, Purchasing, Advertising, & Engineering:
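As a sketch of how the coding works for the 4 departments: 3 dummy variables are enough, and the department with no dummy of its own is the all "off" case. Engineering is used as that baseline here because it is the all "off" case in the salary model later in these notes; the function name `dummy_code` is hypothetical.

```python
DEPARTMENTS = ["Sales", "Purchasing", "Advertising", "Engineering"]
BASELINE = "Engineering"  # all-"off" case: gets no dummy column of its own

def dummy_code(department):
    """Return the 0/1 dummy values for one employee's department."""
    if department not in DEPARTMENTS:
        raise ValueError(f"unknown department: {department}")
    return {
        f"Dummy {d}": int(d == department)  # 1 = "On", 0 = "Off"
        for d in DEPARTMENTS
        if d != BASELINE
    }

for dept in DEPARTMENTS:
    print(dept, dummy_code(dept))
```

4 categories need only 3 dummies: an Engineering employee is identified by all three being 0.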

Multiple linear regression results:
Dependent Variable: Salary
Independent Variable(s): YrsEm, Educ, Dummy: Female, Sales, Purchasing, AND Advertising

Parameter estimates:

Parameter          Estimate    Std. Err.  Alternative  DF  T-Stat       P-Value
Intercept          26782.217   1956.6695  ≠ 0          39  13.687655    <0.0001
YrsEm              722.42648   121.41234  ≠ 0          39  5.9501899    <0.0001
Educ               1750.2416   314.1732   ≠ 0          39  5.5709449    <0.0001
Dummy Female       -1935.1944  1427.31    ≠ 0          39  -1.3558332   0.183
Dummy Sales        -8561.053   1806.2896  ≠ 0          39  -4.7395795   <0.0001
Dummy Purchasing   74.592125   2010.181   ≠ 0          39  0.037107168  0.9706
Dummy Advertising  -3327.5391  2068.7432  ≠ 0          39  -1.6084834   0.1158

Analysis of variance table for multiple regression model:

Source  DF  SS           MS           F-stat     P-value
Model   6   4.5893206e9  7.6488676e8  34.892395  <0.0001
Error   39  8.5493083e8  21921303
Total   45  5.4442514e9

Predicted Salary = 26,782 + 722 Years of Employment + 1,750 Education – 1,935 Dummy Female – 8,561 Dummy Sales + 74 Dummy Purchasing – 3,327 Dummy Advertising

Dummy Purchasing = least helpful because it has the largest P-value.

Years of Employment = most helpful because it has the largest T-Statistic of the predictors.
b0 = 26,782 Interpret: I predict the salary will be $26,782 when yrs of employment & education = 0, the employee is male, & in the engineering dept.
bYearsEmployed = 722 Interpret: I predict the salary will ↑ by $722 when years of employment ↑'s by 1 year, if all else is held constant.

The coefficient in front of a dummy variable compares the “on” case to the all “off” case.

B dummy female = (1,935) Interpretation: We predict the salary for a female employee will be $1,935 less than that of a male employee if all else is held constant.

Predict the salary if years of employment = 5, education = 8, they're female, & they're in sales.
Predicted Salary = 26,782 + 722 (5) + 1,750 (8) – 1,935 (1) – 8,561 (1) + 74 (0) – 3,327 (0) = $33,896

Predict the salary if years of employment = 5, education = 8, they're male, & they're in sales.
Predicted Salary = 26,782 + 722 (5) + 1,750 (8) – 1,935 (0) – 8,561 (1) + 74 (0) – 3,327 (0) = $35,831

Difference = $1,935, as it should be.
b dummy female = -1,935 Interpret: I predict the salary for a female employee will be $1,935 < that of a male employee if all else is held constant.
b dummy sales = -8,561 Interpret: I predict the salary for an employee in sales will be $8,561 < an employee in the engineering department (the all "off" case) if all else is held constant.
b dummy purchasing = 74 Interpret: We predict the salary for an employee in purchasing will be $74 more than an employee in the engineering dept. if all else is held constant.
b dummy advertising = -3,327 Interpret: We predict the salary for an employee in advertising will be $3,327 < an employee in the engineering dept. if all else is held constant.
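The two worked salary predictions can be reproduced in a few lines. This is a sketch using the rounded coefficients from the fitted equation; `predict_salary` is a hypothetical helper, and engineering is the department with all dummies "off".

```python
# Rounded coefficients from the fitted salary equation in the notes
COEFFS = {
    "Intercept": 26782,
    "YrsEm": 722,
    "Educ": 1750,
    "Dummy Female": -1935,
    "Dummy Sales": -8561,
    "Dummy Purchasing": 74,
    "Dummy Advertising": -3327,
}

def predict_salary(yrs_em, educ, female, department):
    """Predicted salary; department 'Engineering' leaves every dept dummy at 0."""
    salary = COEFFS["Intercept"] + COEFFS["YrsEm"] * yrs_em + COEFFS["Educ"] * educ
    if female:
        salary += COEFFS["Dummy Female"]
    # .get() returns 0 for Engineering, the all-"off" case
    salary += COEFFS.get(f"Dummy {department}", 0)
    return salary

female_sales = predict_salary(5, 8, female=True, department="Sales")
male_sales = predict_salary(5, 8, female=False, department="Sales")

print(female_sales, male_sales, male_sales - female_sales)
```

The gap between the two predictions is exactly the Dummy Female coefficient, because everything else in the two rows is identical.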

Detergent Data Set—Coding 3 different Campaigns—A, B, and C:

Multiple linear regression results:
Dependent Variable: Demand (y)
Independent Variable(s): Price, Avg Industry Price, Advertising, Dummy A, Dummy B

Parameter estimates:

Parameter           Estimate     Std. Err.    Alternative  DF  T-Stat      P-Value
Intercept           9.1549761    1.5937037    ≠ 0          24  5.7444656   <0.0001
Price               -2.7680243   0.41443668   ≠ 0          24  -6.679004   <0.0001
Avg Industry Price  1.6666921    0.19133194   ≠ 0          24  8.7109976   <0.0001
Advertising         0.49274246   0.080645681  ≠ 0          24  6.1099672   <0.0001
Dummy A             -0.43956212  0.07033498   ≠ 0          24  -6.2495521  <0.0001
Dummy B             -0.17006607  0.066877659  ≠ 0          24  -2.542943   0.0179

Analysis of variance table for multiple regression model:

Source  DF  SS          MS           F-stat     P-value
Model   5   12.916568   2.5833135    114.38622  <0.0001
Error   24  0.54201916  0.022584132
Total   29  13.458587

Predicted Demand = 9.15 – 2.77 Price + 1.67 AIP + 0.49 Advertising – 0.44 Dummy A – 0.17 Dummy B

b Price = -2.77 Interpretation: We predict the demand will ↓ by 277,000 bottles when price ↑'s by $1 if all else is held constant.
b Dummy A = -0.44 Interpretation: We predict the demand for Campaign A will be 44,000 bottles less than Campaign C if all else is held constant.
b Dummy B = -0.17 Interpretation: We predict the demand for Campaign B will be 17,000 bottles less than Campaign C if all else is held constant.

Predict Demand if Price = $3.70, AIP = $3.80, Advertising = $550,000 & it's:
I. Campaign A
II. Campaign B
III. Campaign C

For Campaign A, on StatCrunch, you would enter the following into the columns of a blank row:
3.7 3.8 5.5 A 1 0

StatCrunch Results:
Campaign A Predicted Demand = 7.5172
Campaign B Predicted Demand = 7.7867
Campaign C Predicted Demand = 7.95679
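The three StatCrunch predictions can be checked by hand with the unrounded estimates from the output. This is a sketch with a hypothetical function name; Campaign C is the all "off" case, so both of its dummies are 0.

```python
# Estimates copied from the campaign "Parameter estimates" output
B0, B_PRICE, B_AIP, B_ADV = 9.1549761, -2.7680243, 1.6666921, 0.49274246
B_DUMMY = {"A": -0.43956212, "B": -0.17006607, "C": 0.0}  # C = baseline

def predict_campaign_demand(price, aip, advertising, campaign):
    """Predicted demand (100s of 1,000s of bottles) for campaign 'A', 'B' or 'C'."""
    return (B0 + B_PRICE * price + B_AIP * aip
            + B_ADV * advertising + B_DUMMY[campaign])

predictions = {
    c: predict_campaign_demand(3.7, 3.8, 5.5, c) for c in ("A", "B", "C")
}
for campaign, demand in predictions.items():
    print(campaign, round(demand, 4))
```

The A and B predictions sit below C by exactly their dummy coefficients, matching the interpretations of b Dummy A and b Dummy B.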

2 years ago, a random sample of 1000 homeowners showed that 435 of them had flower gardens. This year a random sample of a different 1000 homeowners showed 380 of them had flower gardens. Test to see if the proportion of homeowners who have flower gardens this year is different from the proportion last year.
Having a flower garden or not is categorical so I will work with proportions. We have 2 samples: 1 from this year & 1 from last year. So, the appropriate test would be for 2 proportions.
Ho: p(this year) = p(last year) HA: p(this year) ≠ p(last year)

A random sample of 18 homes south of Center Street in Corning has a mean selling price of $125,000 & a standard deviation of $2,400. A random sample of 20 homes north of Center Street has a mean selling price of $127,000 with a standard deviation of $4,800. Is the mean home price north of Center Street significantly diff from the mean home price south of Center Street?
Home prices are quantitative so work w/ means. We have 2 samples. The sample sizes are diff so the samples must be independent. So use independent means.
Ho: µ(south) = µ(north) HA: µ(south) ≠ µ(north)

A fast-food chain wants to evaluate the service at four restaurants. The customer service director for the chain hires six investigators with varied experiences in food-service evaluation to act as raters. The six raters evaluate the service at each of the four restaurants in a random order, rating the restaurant on a scale from 0 (low) to 100 (high). Is there a significant difference in the mean ratings among the four restaurants?
The ratings will be quantitative since we are averaging them to find the mean. There are four restaurants. So I have more than 2 means, which means I must use ANOVA.
Ho: µ1 = µ2 = µ3 = µ4 HA: At least 1 µi is different

A grower claims a certain type of flower seed will produce 60% magenta, 30% chartreuse & 10% ochre flowers. A total of 100 seeds were planted & all germinated yielding 52 magenta, 36 chartreuse & 12 ochre flowers. Do the proportions of flower types produced in this sample match the proportions claimed by the grower?
Types of flowers are categorical so we need to work with proportions. There are 3 types & we are trying to see if the proportions match ("fit") what was claimed so we need to use a chi-square goodness of fit test.
Ho: p(magenta) = 0.6, p(chartreuse) = 0.3, p(ochre) = 0.1 HA: At least 1 pi is diff from Ho

Gasoline pumped from a supplier's pipeline is supposed to have an octane rating of 87.5. On 13 consecutive days a sample was taken & analyzed. Is there sufficient evidence to show the mean octane is significantly < 87.5?
Octane rating is quantitative so we will work with means. We have 1 sample. So, the test we will do is for 1 mean.
Ho: µ = 87.5 HA: µ < 87.5

A # of sports enthusiasts have argued that major league baseball players from teams in the Central Division have an unfair advantage over coastal players in the Western & Eastern Divisions. This is because the impact of the difference in time is likely to be greater when playing on the road. Players from teams on the coasts could gain or lose up to 3 hours, whereas Central Division players would seldom gain or lose more than 1 hour. A random sample of win/loss percentages of road games was taken from each of the 3 divisions. Is there a significant difference in the mean win/loss percentages for the 3 divisions?
They are asking us to test if there is a difference in the MEAN win/loss percentage for 3 divisions. So, we are working with quantitative data & we have 3 groups. Therefore, we need to use ANOVA.
Ho: µ(central) = µ(western) = µ(eastern) HA: At least 1 µi is diff

Many people sleep in on the weekends to make up for late nights during the work week. The Better Sleep Council reports that 61% of us get at least 7 hours of sleep/night on the weekend. A random sample of 350 adults found that 235 had at least 7 hours of sleep each night last weekend. Does this evidence show that the proportion of people who get at least 7 hours of sleep each night during the weekend is 0.61 (that the Council's claim is correct)?
Whether or not someone gets at least seven hours of sleep per night is categorical. When you ask the question "do you get at least 7 hours of sleep per night?" the answers will be "yes" or "no". Another indication is that the problem says "proportion". We're working with 1 sample. So the test is 1 proportion.
Ho: p = 0.61 HA: p ≠ 0.61

The Dean of Engineering wants to know if the distribution of engineering majors has changed from what it was when the school 1st opened. Records from the first year show 17% aerospace, 25% civil, 20% electrical, 10% industrial & 28% mechanical. This year there are 70 aerospace, 470 civil, 150 electrical, 350 industrial & 335 mechanical majors. Test the claim that the distribution of engineering majors has changed from the original %ages.
The question's dealing w/ engineering majors. This data is categorical. There are 5 different majors listed. We're testing a claim about a distribution to see if it matches (FITS) the records from the 1st year. So, the test is a chi-square goodness of fit test.
Ho: p(aerospace) = 0.17, p(civil) = 0.25, p(electrical) = 0.20, p(industrial) = 0.10, p(mechanical) = 0.28 HA: At least 1 pi is different from Ho

There are 2 techniques for determining diastolic blood pressure: the standard method used by medical personnel & a digital method which uses an electronic device with a digital readout. The diastolic blood pressure is measured using both methods for 45 patients. Is there a significant diff in the mean diastolic blood pressure using the 2 methods?
A blood pressure measurement is quantitative. Another clue our data's quantitative is that they ask us to see if there is a diff in the MEAN. There are 2 groups, so I need to decide if the data is paired or indep. It says that BOTH methods were tested on 45 patients. So, the data for the standard & digital method can be matched for the SAME patient. Thus, use paired means.
Ho: µ(standard) = µ(digital) HA: µ(standard) ≠ µ(digital)
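The reasoning used in the practice problems above follows a small decision tree: data type first, then number of samples, then paired vs. independent. The sketch below encodes only the cases that appear in these problems; the function name, argument names, and return strings are made up for illustration, and anything outside these cases raises an error rather than guessing.

```python
def choose_test(data_type, groups, paired=False, goodness_of_fit=False):
    """Pick a test the way the practice problems do (covers only those cases)."""
    if data_type == "categorical":
        if goodness_of_fit:
            return "chi-square goodness of fit"   # matching a claimed distribution
        if groups == 1:
            return "1 proportion"
        if groups == 2:
            return "2 proportions"
        raise ValueError("case not covered in these notes")
    if data_type == "quantitative":
        if groups == 1:
            return "1 mean"
        if groups == 2:
            return "paired means" if paired else "independent means"
        if groups > 2:
            return "ANOVA"                        # more than 2 means
    raise ValueError("case not covered in these notes")

print(choose_test("categorical", 2))                          # flower gardens
print(choose_test("quantitative", 2))                         # home prices
print(choose_test("quantitative", 4))                         # restaurants
print(choose_test("categorical", 3, goodness_of_fit=True))    # flower seeds
print(choose_test("quantitative", 1))                         # octane
print(choose_test("quantitative", 2, paired=True))            # blood pressure
```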