BCOR 1020 Business Statistics Lecture 24 – April 17, 2008


Overview

Chapter 12 – Linear Regression
– Visual Displays and Correlation Analysis
– Bivariate Regression
– Regression Terminology

Chapter 12 – Visual Displays

• Begin the analysis of bivariate data (i.e., two variables) with a scatter plot.

• A scatter plot
- displays each observed data pair (xi, yi) as a dot on an X/Y grid.
- indicates visually the strength of the relationship between the two variables.

Chapter 12 – Visual Displays

[Figure: scatter plot of the price of Regular Unleaded vs. the price of Diesel]

The price of Regular Unleaded appears to have a positively sloped linear relationship with the price of Diesel.

These variables appear to be correlated.

Chapter 12 – Correlation Analysis

• The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y.

−1 ≤ r ≤ +1

• r = 0 indicates no linear relationship.

• Correlation functions are available in Excel, MegaStat and on your calculators.

Correlation Analysis:

[Figure: range of r, from a strong negative relationship (r near −1) to a strong positive relationship (r near +1)]


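Correlation Analysis (Computing r):

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}$$

This value can be calculated on your calculator or using a software package like Excel or MegaStat.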


Example:
• Data Set for problem 12.3 (“CallWait”)
• Y = Hold time (minutes) for concert tickets
• X = number of operators

Operators (X)   Wait Time (Y)
4               385
5               335
6               383
7               344
8               288

[Figure: scatter plot “Wait Time vs. Operators”, with Operators (X) on the horizontal axis and Wait Time (Y) on the vertical axis]

There appears to be “some” negative correlation between the variables.

Does this make sense?

We can calculate the sample correlation coefficient…

r = -0.733 (overhead)

Excel/MegaStat Demo…
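As a cross-check on r = -0.733, here is a minimal Python sketch that computes Pearson's r for the CallWait data from the usual sums of deviations about the means. The lecture itself uses Excel, MegaStat, or a calculator. The function name sample_correlation is hypothetical.

```python
import math

# CallWait data (problem 12.3) as listed above
operators = [4, 5, 6, 7, 8]            # X
wait_time = [385, 335, 383, 344, 288]  # Y


def sample_correlation(x, y):
    """Pearson sample correlation coefficient r for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sums of squared deviations about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = sample_correlation(operators, wait_time)
print(round(r, 3))  # -0.733
```

On Python 3.10 or later, statistics.correlation(operators, wait_time) should return the same value.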


[Figure: example scatter plots labeled Strong Positive Correlation, Weak Positive Correlation, Weak Negative Correlation, Strong Negative Correlation, No Correlation, and Nonlinear Relation]


Tests for Significance:

• r is an estimate of the population correlation coefficient ρ (rho).

• To test the hypothesis H0: ρ = 0, the test statistic is:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$$

• The critical value tα/2 is obtained from Appendix D using ν = n − 2 degrees of freedom for any α.

• We can bound the p-value for this test using the t table, or we can find it exactly using Excel or MegaStat.


• Equivalently, you can calculate the critical value for the correlation coefficient using

$$r_{\alpha} = \frac{t_{\alpha/2}}{\sqrt{t_{\alpha/2}^{2} + n - 2}}$$

• This method gives a benchmark for the correlation coefficient.

• However, it gives no p-value and is inflexible if you change your mind about α.


Steps in Testing if ρ = 0:

• Step 1: State the Hypotheses
Determine whether you are using a one- or two-tailed test and the level of significance (α).

H0: ρ = 0   H1: ρ ≠ 0

• Step 2: Calculate the Critical Value
For degrees of freedom = n − 2, look up the critical value tα/2 in Appendix D, then calculate

$$r_{\alpha} = \frac{t_{\alpha/2}}{\sqrt{t_{\alpha/2}^{2} + n - 2}}$$

• Step 3: Make the Decision
If the absolute value of the sample correlation coefficient, |r|, exceeds the critical value rα, then reject H0.

• If using the t statistic method, reject H0 if |t| > tα/2 or if the p-value < α.

Chapter 12 – Correlation Analysis

Example:

• In our earlier example on the data set “CallWait”, we calculated the sample correlation, r = -0.733, based on n = 5 data points.

• Calculate the critical value, rα, to test the hypothesis H0: ρ = 0 vs. H1: ρ ≠ 0 at the 10% level of significance.

$$r_{\alpha} = \frac{t_{\alpha/2,\,n-2}}{\sqrt{t_{\alpha/2,\,n-2}^{2} + n - 2}} = \frac{2.353}{\sqrt{2.353^{2} + 5 - 2}} = 0.805$$

• Since |r| = 0.733 is not greater than rα = 0.805, we cannot reject H0. There is not a significant correlation between these variables at the 10% level of significance.
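The critical-value method for this example can be sketched in Python as follows. The t value 2.353 is the Appendix D entry used above (3 d.f., two-tailed test at the 10% level). The helper name critical_r is an illustrative assumption, not a library function.

```python
import math


def critical_r(t_crit, n):
    """Critical value r_alpha = t / sqrt(t^2 + n - 2), with t taken from a t table (d.f. = n - 2)."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)


r = -0.733        # sample correlation for CallWait
n = 5             # number of data pairs
t_table = 2.353   # t_{alpha/2} for alpha = 0.10 with 3 d.f. (Appendix D)

r_alpha = critical_r(t_table, n)
reject = abs(r) > r_alpha   # Step 3: compare |r| with the critical value
print(round(r_alpha, 3), reject)  # 0.805 False
```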

Clickers

For our example on the data set “CallWait”, we calculated the sample correlation, r = -0.733, based on n = 5 data points. Instead of calculating the critical value, rα, to test the hypothesis H0: ρ = 0 vs. H1: ρ ≠ 0, we could have calculated the test statistic

$$T^{*} = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} = \frac{(-0.733)\sqrt{5-2}}{\sqrt{1-(-0.733)^{2}}} = -1.866$$

which has a t distribution with ν = n − 2 d.f. under H0.

What are the bounds for the p-value on this test statistic?

(A) 0.10 < p-value < 0.20

(B) 0.025 < p-value < 0.05

(C) 0.05 < p-value < 0.10
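The test statistic can be computed with a short Python sketch. It assumes the same two-tailed 10% test as in the critical-value example, so |T*| is compared with the same table value, 2.353. The function name correlation_t_stat is hypothetical.

```python
import math


def correlation_t_stat(r, n):
    """Test statistic for H0: rho = 0; t distribution with n - 2 d.f. under H0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t_star = correlation_t_stat(-0.733, 5)
t_table = 2.353                   # t_{0.05} with 3 d.f. (two-tailed, alpha = 0.10)
reject = abs(t_star) > t_table    # same decision rule as the critical-r method
print(round(t_star, 3), reject)   # -1.866 False
```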


Role of Sample Size:

• As sample size increases, the critical value of r becomes smaller.

• This makes it easier for smaller values of the sample correlation coefficient to be considered significant.

• A larger sample does not mean that the correlation is stronger, nor does its significance imply importance.
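To illustrate the role of sample size, this sketch computes the critical value of r for a two-tailed test at α = 0.10 at several sample sizes. The sample sizes are chosen only for illustration. The t values are standard upper-tail 0.05 entries from a t table (as in Appendix D) for n − 2 degrees of freedom.

```python
import math

# Upper-tail t values t_{0.05} (two-tailed test at alpha = 0.10),
# keyed by degrees of freedom; standard t-table entries.
T_05 = {3: 2.353, 8: 1.860, 18: 1.734, 28: 1.701}


def critical_r(t_crit, n):
    """r_alpha = t / sqrt(t^2 + n - 2)."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)


# Illustrative sample sizes, chosen only to show the trend
results = {n: critical_r(T_05[n - 2], n) for n in (5, 10, 20, 30)}
for n, r_alpha in results.items():
    print(f"n = {n:2d}: critical r = {r_alpha:.3f}")
# n =  5: critical r = 0.805
# n = 10: critical r = 0.549
# n = 20: critical r = 0.378
# n = 30: critical r = 0.306
```

With n = 30, a sample correlation just above 0.306 would already be significant at the 10% level. With n = 5, it would have to exceed 0.805.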

Chapter 12 – Bivariate Regression

What is Bivariate Regression?

• Bivariate regression analyzes the relationship between two variables.

• It specifies one dependent (response) variable and one independent (predictor) variable.

• This hypothesized relationship may be linear, quadratic, or of some other form.


Some Model Forms:


Prediction Using Regression:

• The intercept and slope of a fitted regression can provide useful information. For example, consider the fitted regression model

Sales(Y) = 268 + 7.37 Ads(X)

– Each extra $1 million of advertising will generate $7.37 million of sales on average.
– The firm would average $268 million of sales with zero advertising.
– However, the intercept may not be meaningful because Ads = 0 may be outside the range of the observed data.


• One of the main uses of regression is to make predictions. Once you have a fitted regression equation that shows the estimated relationship between X and Y, you can plug in any value of X to make a prediction for Y. Consider our example:

Sales(Y) = 268 + 7.37 Ads(X)

• If the firm spends $10 million on advertising, its expected sales would be

– Sales(Y) = 268 + 7.37(10) = $341.7 million.
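The prediction above is a single plug-in calculation. This small Python sketch also shows how the slope is interpreted. The function name predict_sales is hypothetical, and all amounts are in $ millions, as on the slide.

```python
def predict_sales(ads_millions, b0=268.0, b1=7.37):
    """Fitted model: Sales = 268 + 7.37 * Ads (both in $ millions)."""
    return b0 + b1 * ads_millions


print(round(predict_sales(10), 1))  # 341.7
# Slope interpretation: one extra $1 million of ads adds b1 to predicted sales
print(round(predict_sales(11) - predict_sales(10), 2))  # 7.37
```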

Chapter 12 – Regression Terminology

Models and Parameters:

• Unknown parameters that we will estimate are
β0 = Intercept
β1 = Slope

• The assumed model for a linear relationship is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

for all observations (i = 1, 2, …, n).

• The error term εi is not observable, but is assumed normally distributed with mean of 0 and standard deviation σ.
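To make the assumed model concrete, this sketch simulates data from yi = β0 + β1xi + εi with normally distributed errors. The parameter values (β0 = 10, β1 = 2, σ = 1.5) and the random seed are hypothetical and were chosen only for illustration. In real data the errors cannot be observed. Here they can be recovered only because the code generated them.

```python
import random

# Hypothetical parameter values, for illustration only
BETA0 = 10.0   # intercept
BETA1 = 2.0    # slope
SIGMA = 1.5    # standard deviation of the errors


def simulate(xs, beta0=BETA0, beta1=BETA1, sigma=SIGMA, seed=1):
    """Generate y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ Normal(0, sigma)."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


xs = list(range(1, 11))
ys = simulate(xs)
for x, y in zip(xs, ys):
    # Each y scatters around the true line beta0 + beta1 * x
    print(x, round(y, 2), "true mean:", BETA0 + BETA1 * x)
```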


• The fitted model used to predict the expected value of Y for a given value of X is

$$\hat{y}_i = b_0 + b_1 x_i$$

• The fitted coefficients are
b0 = the estimated intercept
b1 = the estimated slope

• The residual is ei = yi − ŷi.

• Residuals may be used to estimate σ, the standard deviation of the errors.

• We will discuss how b0 and b1 are found next lecture.


Fitting a Regression on a Scatter Plot in Excel:

• Step 1:
- Highlight the data columns.
- Click on the Chart Wizard and choose Scatter Plot.
- In the completed graph, click once on the points in the scatter plot to select the data.
- Right-click and choose Add Trendline.
- Choose Options and check Display Equation.


Example:
• Data Set for problem 12.3 (“CallWait”)
• Y = Hold time (minutes) for concert tickets
• X = number of operators

Operators (X)   Wait Time (Y)
4               385
5               335
6               383
7               344
8               288

[Figure: scatter plot “Wait Time vs. Operators” with Excel trendline y = -18.5x + 458; Operators (X) on the horizontal axis, Wait Time (Y) on the vertical axis]

From this output, we have the linear model:

ŷ = 458 – 18.5x

b0 = 458

b1 = -18.5

Discussion…
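Using b0 = 458 and b1 = -18.5 from the Excel output, the residuals ei = yi − ŷi for the CallWait data can be computed directly. The sketch also estimates σ as se = √(SSE/(n − 2)). That is a standard estimator, but this lecture does not state it, so treat it as an assumption here.

```python
import math

# CallWait data (problem 12.3) and the Excel trendline coefficients
operators = [4, 5, 6, 7, 8]            # X
wait_time = [385, 335, 383, 344, 288]  # Y
b0, b1 = 458.0, -18.5

fitted = [b0 + b1 * x for x in operators]                        # y-hat for each x
residuals = [y - y_hat for y, y_hat in zip(wait_time, fitted)]   # e_i = y_i - y-hat_i
sse = sum(e ** 2 for e in residuals)                              # sum of squared residuals
s_e = math.sqrt(sse / (len(operators) - 2))                       # assumed estimate of sigma

print(residuals)                     # [1.0, -30.5, 36.0, 15.5, -22.0]
print(round(sse, 1), round(s_e, 2))  # 2951.5 31.37
```

The residuals sum to zero here, as they do for any least-squares line that includes an intercept.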

Clickers

For our example on the data set “CallWait”, we have now calculated the regression model:

Wait time (Y) = 458 – 18.5 Operators (X)

If there are 7 operators, what is the expected wait time?

(A) 458

(B) 129.5

(C) 328.5

(D) 587.5


Regression Caveats:

• The “fit” of the regression does not depend on the sign of its slope. The sign of the fitted slope merely tells whether X has a positive or negative association with Y.

• View the intercept with skepticism unless X = 0 is logically possible and was actually observed in the data set.

• Be wary of extrapolating the model beyond the observed range in the data.

• Regression does not demonstrate cause-and-effect between X and Y. A good fit shows that X and Y vary together. Both could be affected by another variable or by the way the data are defined.
