BCOR 1020 Business Statistics
Lecture 24 – April 17, 2008
Overview
Chapter 12 – Linear Regression
– Visual Displays and Correlation Analysis
– Bivariate Regression
– Regression Terminology
Chapter 12 –Visual Displays
• Begin the analysis of bivariate data (i.e., two variables) with a scatter plot.
• A scatter plot
- displays each observed data pair (xi, yi) as a dot on an X/Y grid.
- indicates visually the strength of the relationship between the two variables.
Visual Displays:
Chapter 12 – Visual Displays
Visual Displays:
The price of Regular Unleaded appears to have a positively sloped linear relationship with the price of Diesel.
These variables appear to be correlated.
Chapter 12 –Correlation Analysis
• The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y.
-1 ≤ r ≤ +1
• r = 0 indicates no linear relationship.
• Correlation functions are available in Excel, MegaStat and on your calculators.
Correlation Analysis:
Strong negative relationship Strong positive relationship
Chapter 12 –Correlation Analysis
Correlation Analysis (Computing r):
This value can be calculated on your calculator or using a software package like Excel or MegaStat.
Chapter 12 –Correlation Analysis
Example:
• Data Set for problem 12.3 (“CallWait”)…
Y = Hold time (minutes) for concert tickets
X = number of operators
Operators (X) Wait Time (Y)
4 385
5 335
6 383
7 344
8 288
[Figure: scatter plot “Wait Time vs. Operators”; x-axis Operators (X), 0 to 10; y-axis Wait Time (Y), 0 to 500]
There appears to be “some” negative correlation between the variables.
Does this make sense?
We can calculate the sample correlation coefficient…
r = -0.733 (overhead)
Excel/MegaStat Demo…
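For readers without Excel or MegaStat at hand, the value above can be reproduced with a short Python sketch. It assumes the standard sums-of-squares form of the sample correlation, r = SSxy / sqrt(SSxx · SSyy); the function and variable names are illustrative and not part of the lecture.

```python
from math import sqrt

# CallWait data from the example above.
operators = [4, 5, 6, 7, 8]            # X
wait_time = [385, 335, 383, 344, 288]  # Y

def sample_correlation(x, y):
    """Pearson sample correlation r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / sqrt(ss_xx * ss_yy)

r = sample_correlation(operators, wait_time)
print(f"r = {r:.3f}")  # r = -0.733
```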
Chapter 12 –Correlation Analysis
[Figure: four scatter plots illustrating Strong Positive Correlation, Weak Positive Correlation, Weak Negative Correlation, and Strong Negative Correlation]
Chapter 12 –Correlation Analysis
[Figure: two scatter plots illustrating No Correlation and a Nonlinear Relation]
Chapter 12 –Correlation Analysis
• r is an estimate of the population correlation coefficient ρ (rho).
• To test the hypothesis H0: ρ = 0, the test statistic is:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
• The critical value tα is obtained from Appendix D using ν = n – 2 degrees of freedom for any α.
• We can bound the p-value for this test using the t table or we can find it exactly using Excel or MegaStat.
Tests for Significance:
Chapter 12 –Correlation Analysis
• Equivalently, you can calculate the critical value for the correlation coefficient using
$$r_{\alpha} = \frac{t_{\alpha}}{\sqrt{t_{\alpha}^2 + n - 2}}$$
• This method gives a benchmark for the correlation coefficient.
• However, it yields no p-value and is inflexible if you change your mind about α.
Tests for Significance:
Chapter 12 –Correlation Analysis
• Step 1: State the Hypotheses
Determine whether you are using a one or two-tailed test and the level of significance (α).
H0: ρ = 0 H1: ρ ≠ 0
• Step 2: Calculate the Critical Value
For degrees of freedom ν = n – 2, look up the critical value tα in Appendix D, then calculate
$$r_{\alpha} = \frac{t_{\alpha}}{\sqrt{t_{\alpha}^2 + n - 2}}$$
• Step 3: Make the Decision
For the two-tailed test, if | r | exceeds the critical value rα, then reject H0.
• If using the t statistic method, reject H0 if | t | > tα or if the p-value < α.
Steps in Testing if ρ = 0:
Chapter 12 – Correlation Analysis
Example:
• In our earlier example on the data set “CallWait”, we calculated the sample correlation, r = -0.733, based on n = 5 data points.
• Calculate the Critical Value, rα, to test the hypothesis H0: ρ = 0 vs. H1: ρ ≠ 0 at the 10% level of significance.
$$r_{\alpha} = \frac{t_{\alpha/2,\,n-2}}{\sqrt{t_{\alpha/2,\,n-2}^2 + n - 2}} = \frac{2.353}{\sqrt{2.353^2 + 5 - 2}} = 0.805$$
• Since | r | = 0.733 is not greater than rα = 0.805, we cannot reject H0. There is not a significant correlation between these variables at the 10% level of significance.
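The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch: the table value 2.353 is entered by hand (as it would be read from Appendix D) rather than computed, and the function name `critical_r` is illustrative.

```python
from math import sqrt

def critical_r(t_crit, n):
    """Critical value for r, given the t critical value with n - 2 d.f."""
    return t_crit / sqrt(t_crit ** 2 + n - 2)

n = 5
t_crit = 2.353   # t table value for a two-tailed test at the 10% level, 3 d.f.
r = -0.733       # sample correlation for the CallWait data

r_crit = critical_r(t_crit, n)
reject_h0 = abs(r) > r_crit

print(f"critical r = {r_crit:.3f}")  # critical r = 0.805
print(f"reject H0: {reject_h0}")     # reject H0: False
```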
Clickers
For our example on the data set “CallWait”, we calculated the sample correlation, r = -0.733, based on n = 5 data points. Instead of calculating the Critical Value, rα, to test the hypothesis H0: ρ = 0 vs. H1: ρ ≠ 0, we could have calculated the test statistic
$$T^* = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{(-0.733)\sqrt{5-2}}{\sqrt{1-(-0.733)^2}} = -1.866,$$
which follows a t distribution with ν = n – 2 d.f. under H0.
What are the bounds for the p-value on this test statistic?
(A) 0.10 < p-value < 0.20
(B) 0.025 < p-value < 0.05
(C) 0.05 < p-value < 0.10
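A short Python sketch of the test statistic calculation, for checking the arithmetic. The function name is illustrative, and the comparison value 2.353 is the same hand-entered table value used in the critical-value method at the 10% level.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """Test statistic r * sqrt(n - 2) / sqrt(1 - r^2) for H0: rho = 0."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

t_stat = correlation_t_statistic(-0.733, 5)
print(f"T* = {t_stat:.3f}")  # T* = -1.866

# Same decision as the critical-value method at the 10% level (two-tailed).
t_crit = 2.353
print(abs(t_stat) > t_crit)  # False
```

The two methods always agree, because | T* | > tα exactly when | r | > rα; the t statistic method has the advantage that it also supports a p-value.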
Chapter 12 –Correlation Analysis
• As sample size increases, the critical value of r becomes smaller.
• This makes it easier for smaller values of the sample correlation coefficient to be considered significant.
• A larger sample does not mean that the correlation is stronger nor does its significance imply importance.
Role of Sample Size:
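The effect of sample size can be made concrete with a small sketch. It assumes a two-tailed test at α = 0.05 (a choice made for this illustration only) and uses t critical values copied by hand from a standard t table; the sample sizes are arbitrary.

```python
from math import sqrt

# Two-tailed critical values t_{0.025, df} from a standard t table,
# keyed by degrees of freedom (df = n - 2).
T_TABLE_05 = {3: 3.182, 8: 2.306, 18: 2.101, 28: 2.048, 100: 1.984}

def critical_r(t_crit, df):
    """Critical value for r, given the t critical value and df = n - 2."""
    return t_crit / sqrt(t_crit ** 2 + df)

# Map sample size n to the critical value of r.
critical_values = {df + 2: critical_r(t, df) for df, t in T_TABLE_05.items()}

for n, r_crit in critical_values.items():
    print(f"n = {n:3d}  critical r = {r_crit:.3f}")
```

With n = 5 a correlation must exceed roughly 0.88 in absolute value to be significant, while with n = 102 a correlation of about 0.2 is enough, even though such a relationship is weak.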
Chapter 12 –Bivariate Regression
• Bivariate Regression analyzes the relationship between two variables.
• It specifies one dependent (response) variable and one independent (predictor) variable.
• This hypothesized relationship may be linear, quadratic, or some other form.
What is Bivariate Regression?
Chapter 12 –Bivariate Regression
Some Model Forms:
Chapter 12 –Bivariate Regression
• The intercept and slope of a fitted regression can provide useful information. For example, consider the fitted regression model…
Sales(Y) = 268 + 7.37Ads(X)
– Each extra $1 million of advertising will generate $7.37 million of sales on average.
– The firm would average $268 million of sales with zero advertising.
– However, the intercept may not be meaningful because Ads = 0 may be outside the range of the observed data.
Prediction Using Regression:
Chapter 12 –Bivariate Regression
• One of the main uses of regression is to make predictions. Once you have a fitted regression equation that shows the estimated relationship between X and Y, you can plug in a value of X to make a prediction for Y. Consider our example…
Sales(Y) = 268 + 7.37Ads(X)
• If the firm spends $10 million on advertising, its expected sales would be…
– Sales(Y) = 268 + 7.37(10) = $341.7 million.
Prediction Using Regression:
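The prediction step is a single line of arithmetic, sketched here in Python. The function name and default arguments are illustrative; the coefficients are the ones from the fitted model above, with both variables in $ millions.

```python
def predict_sales(ads, intercept=268.0, slope=7.37):
    """Expected sales ($ millions) for a given advertising spend ($ millions)."""
    return intercept + slope * ads

print(round(predict_sales(10), 1))  # 341.7

# The slope is the expected change in sales per extra $1 million of ads.
print(round(predict_sales(11) - predict_sales(10), 2))  # 7.37
```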
Chapter 12 –Regression Terminology
• Unknown parameters that we will estimate are
β0 = Intercept
β1 = Slope
• The assumed model for a linear relationship is
yi = β0 + β1xi + εi
for all observations (i = 1, 2, …, n)
• The error term εi is not observable, but is assumed normally distributed with mean of 0 and standard deviation σ.
Models and Parameters:
Chapter 12 –Regression Terminology
• The fitted model used to predict the expected value of Y for a given value of X is
ŷi = b0 + b1xi
• The fitted coefficients are
b0 the estimated intercept
b1 the estimated slope
• Residual is ei = yi - ŷi.
• Residuals may be used to estimate σ, the standard deviation of the errors.
• We will discuss how b0 and b1 are found next lecture.
Models and Parameters:
Chapter 12 –Regression Terminology
• Step 1:
- Highlight the data columns.
- Click on the Chart Wizard and choose Scatter Plot
- In the completed graph, click once on the points in the scatter plot to select the data
- Right-click and choose Add Trendline
- Choose Options and check Display Equation
Fitting a Regression on a Scatter Plot in Excel:
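Excel's linear trendline is a least-squares fit, so the same line can be obtained outside Excel. The sketch below is not the lecture's method (the derivation is deferred to the next lecture); it assumes the standard closed-form least-squares formulas b1 = SSxy / SSxx and b0 = ȳ − b1·x̄, applied to the CallWait data from the earlier example. The function name is illustrative.

```python
# CallWait data from the earlier example.
operators = [4, 5, 6, 7, 8]            # X
wait_time = [385, 335, 383, 344, 288]  # Y

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = fit_line(operators, wait_time)
print(f"y = {b0:.0f} + ({b1:.1f})x")  # y = 458 + (-18.5)x
```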
Chapter 12 –Regression Terminology
Example:
• Data Set for problem 12.3 (“CallWait”)…
• Y = Hold time (minutes) for concert tickets
• X = number of operators
Operators (X) Wait Time (Y)
4 385
5 335
6 383
7 344
8 288
[Figure: scatter plot “Wait Time vs. Operators” with fitted trendline y = -18.5x + 458; x-axis Operators (X), 0 to 10; y-axis Wait Time (Y), 0 to 500]
From this output, we have the linear model:
y = 458 – 18.5x
b0 = 458
b1 = -18.5
Discussion…
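As a possible starting point for the discussion, the residuals ei = yi − ŷi of this fitted model can be computed and used to estimate the standard deviation of the errors. The sketch assumes the usual estimator s = sqrt(SSE / (n − 2)), which the slides have not stated explicitly; variable names are illustrative.

```python
from math import sqrt

operators = [4, 5, 6, 7, 8]            # X
wait_time = [385, 335, 383, 344, 288]  # Y
b0, b1 = 458.0, -18.5                  # fitted coefficients from the trendline

# Residual = observed value minus fitted value.
residuals = [y - (b0 + b1 * x) for x, y in zip(operators, wait_time)]

sse = sum(e ** 2 for e in residuals)   # sum of squared residuals
n = len(residuals)
s = sqrt(sse / (n - 2))                # assumed estimator of sigma

print(f"sum of residuals = {sum(residuals):.1f}")  # sum of residuals = 0.0
print(f"SSE = {sse:.1f}, s = {s:.2f}")             # SSE = 2951.5, s = 31.37
```

The residuals of a least-squares line with an intercept sum to zero, which is a quick check that the coefficients were read off the chart correctly.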
ClickersFor our example on the data set “CallWait”, we have now calculated the regression model:
Wait time (Y) = 458 – 18.5 Operators (X)
If there are 7 operators, what is the expected wait time?
(A) 458
(B) 129.5
(C) 328.5
(D) 587.5
Chapter 12 –Regression Terminology
Regression Caveats:
• The “fit” of the regression does not depend on the sign of its slope. The sign of the fitted slope merely tells whether X has a positive or negative association with Y.
• View the intercept with skepticism unless X = 0 is logically possible and was actually observed in the data set.
• Be wary of extrapolating the model beyond the observed range in the data.
• Regression does not demonstrate cause-and-effect between X and Y. A good fit shows that X and Y vary together. Both could be affected by another variable or by the way the data are defined.
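The extrapolation caveat can be illustrated with the CallWait model. This is a hedged sketch, not part of the lecture: the helper name `predict_with_range_check` and the choice of 30 operators are hypothetical, picked only to show what happens far outside the observed range of 4 to 8 operators.

```python
operators = [4, 5, 6, 7, 8]   # observed X values in the CallWait data
b0, b1 = 458.0, -18.5         # fitted coefficients from the example

def predict_with_range_check(x, x_observed, intercept, slope):
    """Return (prediction, in_range); in_range is False when x is extrapolated."""
    in_range = min(x_observed) <= x <= max(x_observed)
    return intercept + slope * x, in_range

print(predict_with_range_check(6, operators, b0, b1))   # (347.0, True)
print(predict_with_range_check(30, operators, b0, b1))  # (-97.0, False)
```

A predicted wait time of −97 is impossible, which shows why the fitted line should not be trusted beyond the data used to fit it.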