Linear Regression Modeling with Data

Linear Regression Modeling with Data. The BIG Question Did you prepare for today? If you did, mark yes and estimate the amount of time you spent preparing

Linear Regression

Modeling with Data

The BIG Question

Did you prepare for today?

If you did, mark yes and estimate the amount of time you spent preparing on your frequency log.

Problem

Suppose we are given the following data about father and son heights to analyze. What can we conclude about it?

Connect

How about if we formulate a hypothesis to investigate, such as: Is there a correlation between a father’s height and his son’s height?

Ha: There is a correlation between a father’s height and his son’s height.

H0: There is no correlation between a father’s height and his son’s height.

Is there anything we have studied that can help you think where to start?

Definitions

For a problem such as this one, we are trying to determine if there is a relationship between two variables. This is called a correlation.

The data can be represented as ordered pairs (x, y). Does anyone recall what the x and y are called?

The x-variable is the independent (or explanatory) variable and the y-variable is the dependent (or response) variable. This is similar to the concepts you have seen in algebra.

In our example, the father’s height is the independent variable and the son’s height is the dependent variable.

Scatter plot

A scatter plot is a plot of the ordered pairs (x, y), which is used to see what kind of correlation two variables might have.

Example 1: What kind of correlation would you guess these data sets to have?

[Figure: four scatter plots labeled Negative Linear Correlation, Positive Linear Correlation, Nonlinear Correlation, and No Correlation]

Father and Son Data Scatter plot

Using SPSS, I loaded the father and son height data into the software. I then generated a scatter plot for the data, which looks like:

What kind of correlation does it look like it might have?

Looks like a positive linear relationship.

Question

Is there a way we can calculate whether there is a correlation and how strong it might be?

The correlation coefficient, denoted as r, gives us a measure of the strength and direction of a linear relationship between two variables. The population correlation coefficient is denoted as ρ.

How do we calculate the correlation coefficient?

The formula is:

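$$
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
$$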

Where n is the number of data pairs.
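The sums in the formula for r translate directly into a few lines of code. The sketch below is one possible way to compute r by hand instead of in SPSS; the function name `pearson_r` and the height values are hypothetical, made up for illustration, and are not the father and son data set used in these slides.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r computed from the sums n, Σx, Σy, Σxy, Σx², Σy²."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical heights in inches, invented for this example only.
fathers = [65, 67, 68, 70, 72]
sons = [66, 68, 67, 71, 73]
r = pearson_r(fathers, sons)
print(round(r, 3))
```

If every x value (or every y value) is identical, the denominator is zero and r is undefined, so the function will raise a `ZeroDivisionError` in that case.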

What is the correlation coefficient for the father and son data?

Using SPSS we have the following output:

This is the correlation coefficient.

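What is the range for the correlation coefficient?

The correlation coefficient ranges from -1 to 1.

If r = -1, there is a perfect negative correlation.

If r is close to 0, there is no linear correlation.

If r = 1, there is a perfect positive correlation.

The value for the father and son data, r = .668, lies between 0 and 1 on this scale.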

Analysis

Since the correlation coefficient is .668, there seems to be a positive linear relationship between a father’s height and his son’s height.

However, is this sample correlation strong enough to conclude that the population correlation coefficient ρ is also different from zero?

We would use r as the test statistic and could use the standardized test statistic t with degrees of freedom n - 2.

$$
t = \frac{r}{\sigma_r} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}
$$

How do we calculate the t statistic here?
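As a rough check on the arithmetic, the standardized test statistic can be computed from r and n alone. In the sketch below the function name `t_statistic` is hypothetical; r = .668 is the value reported for the father and son data, and n = 11 is an assumption taken from the degrees-of-freedom calculation 11 – 2 = 9 in this section.

```python
import math

def t_statistic(r, n):
    """Standardized test statistic t = r / sqrt((1 - r^2) / (n - 2))."""
    if n <= 2:
        raise ValueError("need more than two data pairs")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    standard_error = math.sqrt((1 - r ** 2) / (n - 2))
    return r / standard_error

# r is the rounded value from the text; n = 11 is assumed (see lead-in).
t = t_statistic(0.668, 11)
print(round(t, 2))  # 2.69
```

Because r is rounded to three decimals here, the result can differ in the last digit from a value that software computes from the unrounded correlation.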

Hypothesis testing for significance

Testing the null hypothesis that there is no linear relationship between the independent and dependent variables, we would use the model:

H0: ρ = 0, Ha: ρ ≠ 0, α = .05

Degrees of freedom would be 11 – 2 = 9. Thus, at the .05 significance level, the rejection regions begin at -t0 = -2.262 and t0 = 2.262.

Example

Calculate and Summarize

By running a model analysis in SPSS we have:

At the .05 level of significance, the t-value is 2.690.

The test statistic lies inside of the rejection region which starts at 2.262. Thus there is enough evidence to reject the null hypothesis and conclude there is a significant linear correlation between a father’s height and his son’s height.
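The decision rule for this two-tailed test can be written as a single comparison. This is only a sketch: the function name `reject_null` is hypothetical, and the numbers are the t-value (2.690) and critical value (2.262) quoted in the text.

```python
def reject_null(t, critical_value):
    """Two-tailed test: reject H0 when t falls in either rejection region."""
    return t < -critical_value or t > critical_value

# Values quoted in the text: t = 2.690, critical value 2.262 (9 degrees of freedom, alpha = .05).
decision = reject_null(2.690, 2.262)
print(decision)  # True
```

A t-value between -2.262 and 2.262 would return `False`, meaning there is not enough evidence to reject the null hypothesis.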

Finding the Regression Line

Now that we know that there is a significant linear correlation between a father’s height and his son’s height, we can find the regression line.

The regression line is the line that best models the data. It can be used to predict the value of y given a value of x. In SPSS we find the regression line to the right:

Question

Can we find the exact equation of the regression line?

Yes, the equation is similar to the equation of a line from algebra.

Who recalls the equation of a line?
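From algebra, a line is commonly written y = mx + b, with slope m and y-intercept b. The sketch below assumes the regression line is the least-squares line, which is the usual meaning of the line that best models the data; this section does not state which method SPSS applies. The function names `fit_line` and `predict` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    spread_x = n * sum_x2 - sum_x ** 2
    if spread_x == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    slope = (n * sum_xy - sum_x * sum_y) / spread_x
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

def predict(slope, intercept, x):
    """Predicted y for a given x on the fitted line."""
    return slope * x + intercept

# Hypothetical points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)              # 2.0 1.0
print(predict(slope, intercept, 5))  # 11.0
```

With real data the points will not fall exactly on the line, and `predict` gives the model's estimate of y rather than an exact value.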