derek-gardner
Linear Regression
Modeling with Data
The BIG Question
Did you prepare for today?
If you did, mark yes and estimate the amount of time you spent preparing on your frequency log.
Problem
Suppose we are given the following data about father and son heights to analyze. What can we conclude about it?
Connect
Suppose we formulate a hypothesis to investigate, such as: Is there a correlation between a father’s height and his son’s height?
$H_a$: There is a correlation between a father’s height and his son’s height.
$H_0$: There is no correlation between a father’s height and his son’s height.
Is there anything we have studied that can help you decide where to start?
Definitions
For a problem such as this one, we are trying to determine whether there is a relationship between two variables. Such a relationship is called a correlation.
The data can be represented as ordered pairs (x, y). Does anyone recall what the x and y are called?
The x-variable is the independent (or explanatory) variable and the y-variable is the dependent (or response) variable. This is similar to the concepts you have seen in algebra.
In our example, the father’s height is the independent variable and the son’s height is the dependent variable.
Scatter plot
A scatter plot is a plot of the ordered pairs (x, y), used to see what kind of correlation two variables might have.
Example 1: What kind of correlation would you guess these data sets to have?
[Figure: four example scatter plots, labeled Negative Linear Correlation, Positive Linear Correlation, Nonlinear Correlation, and No Correlation]
Father and Son Data Scatter Plot
Using SPSS, I loaded the father and son height data into the software and then generated a scatter plot of the data:
What kind of correlation does the scatter plot suggest?
It looks like a positive linear relationship.
Question
Is there a calculation we can use to find out whether there is a correlation and how strong it might be?
The correlation coefficient, denoted as r, gives us a measure of the strength and direction of a linear relationship between two variables. The population correlation coefficient is denoted as ρ.
How do we calculate the correlation coefficient?
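The formula is:
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$
where n is the number of data pairs.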
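The formula above can be checked with a short computation. The sketch below is a minimal Python version, assuming the data arrive as two equal-length lists; the function name `pearson_r` and the sample values are illustrative only, not the father and son data or the SPSS procedure.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Correlation coefficient r computed from the raw sums in the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Numerator: n*sum(xy) - sum(x)*sum(y)
    numerator = n * sum_xy - sum_x * sum_y
    # Denominator: product of the two square roots
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Illustrative data (hypothetical, not from the slides)
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

This prints a value of about 0.8. Accumulating raw sums mirrors the formula on the slide; for very large measurements, subtracting the means first is numerically safer.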
What is the correlation coefficient for the father and son data?
Using SPSS, we have the following output, in which the correlation coefficient is r = .668.
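What is the range for the correlation coefficient? The value of r always lies between −1 and 1:
If r = −1, there is a perfect negative linear correlation.
If r is close to 0, there is no linear correlation.
If r = 1, there is a perfect positive linear correlation.
[Figure: number line from −1 to 1, with a dot marking about where r = .668 falls]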
Analysis
Since the correlation coefficient is .668, there appears to be a positive linear relationship between a father’s height and his son’s height.
However, is this relationship significant enough for us to conclude that it also holds for the population, that is, that the population correlation coefficient ρ is not zero?
We use r as the test statistic, or equivalently the standardized test statistic t, which has n − 2 degrees of freedom:
$$t = \frac{r}{\sigma_r} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
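Plugging values into the formula for t is a one-line calculation. A minimal Python sketch, assuming r = .668 and n = 11 (the values used in this example); the function name `t_statistic` is hypothetical:

```python
import math


def t_statistic(r: float, n: int) -> float:
    """Standardized test statistic t = r / sqrt((1 - r^2) / (n - 2))."""
    if n < 3:
        raise ValueError("need at least three data pairs")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r / math.sqrt((1 - r * r) / (n - 2))


# Father and son example: r = .668 with n = 11 data pairs
print(round(t_statistic(0.668, 11), 2))  # prints 2.69
```

The result agrees with the SPSS value of 2.690 up to the rounding of r to three decimals.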
How do we calculate the t statistic here?
Hypothesis testing for significance
To test the null hypothesis that there is no linear relationship between the independent and dependent variables, we use the hypotheses:
$H_0: \rho = 0$
$H_a: \rho \neq 0$
$\alpha = .05$
With n = 11 data pairs, the degrees of freedom are 11 − 2 = 9. Thus, at the .05 significance level, the critical values are $-t_0 = -2.262$ and $t_0 = 2.262$, so the rejection region is t < −2.262 or t > 2.262.
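The critical value 2.262 is normally read from a t-table. As a self-contained sketch using only the standard library, it can also be recovered from the Student's t density by integrating with Simpson's rule and solving for the 0.975 quantile by bisection; the function names below are hypothetical, and this numerical approach is an illustration, not how SPSS or the table was produced.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow in the gamma-function ratio for large df
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        # Simpson weights alternate 4, 2, 4, ... for interior points
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def critical_t(alpha: float, df: int) -> float:
    """Two-tailed critical value t0 with P(|T| > t0) = alpha, found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"{critical_t(0.05, 9):.3f}")  # prints 2.262
```

Bisection works here because the CDF increases with t. In practice, a library routine such as SciPy's `scipy.stats.t.ppf(0.975, 9)` returns the same quantile directly.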
Example
Calculate and Summarize
By running a model analysis in SPSS, we obtain:
The t-value in the output is 2.690, which we compare with the critical values for the .05 level of significance.
The test statistic lies inside the rejection region, since 2.690 > 2.262. Thus, there is enough evidence at the .05 level to reject the null hypothesis and conclude that there is a significant linear correlation between a father’s height and his son’s height.
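The decision in this step is a simple comparison against both tails. A minimal sketch using the two numbers from this example (t = 2.690 from SPSS and the table value 2.262); the function name `reject_null` is hypothetical:

```python
def reject_null(t_value: float, t_crit: float) -> bool:
    """Two-tailed test: reject H0 (rho = 0) when t falls in either rejection region."""
    return t_value < -t_crit or t_value > t_crit


# Values from the father and son example
if reject_null(2.690, 2.262):
    print("Reject H0: significant linear correlation")
else:
    print("Fail to reject H0: not enough evidence of a linear correlation")
```

With these inputs it prints the "Reject H0" line. A strongly negative t (for example −3.0) would also lead to rejection, because the alternative hypothesis is ρ ≠ 0.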
Finding the Regression Line
Now that we know there is a significant linear correlation between a father’s height and his son’s height, we can find the regression line.
The regression line is the line that best models the data. It can be used to predict the value of y for a given value of x. In SPSS, we find the regression line shown to the right:
Question
Can we find the exact equation of the regression line?
Yes, the equation is similar to the equation of a line from algebra.
Who recalls the equation of a line?
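Recall the slope-intercept form y = mx + b. As a sketch (standard least-squares formulas, not the SPSS procedure shown above), the slope and intercept of the regression line can be computed from the same sums used for r: m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and b = ȳ − m·x̄. The function names and sample data below are hypothetical.

```python
def regression_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares slope m and intercept b for y = m*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope from the raw sums
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # The line passes through the point of means (x-bar, y-bar)
    b = sum_y / n - m * sum_x / n
    return m, b


def predict(m: float, b: float, x: float) -> float:
    """Predicted y for a given x on the regression line."""
    return m * x + b


# Illustrative data lying exactly on y = 2x + 1 (hypothetical)
m, b = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b, predict(m, b, 5))  # prints 2.0 1.0 11.0
```

Because the line passes through the point of means, the intercept comes directly from the slope and the two averages.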