rachel-chung
Linear Regression
Purpose – Determine if one or more IVs can predict a DV
Examples:
• Does your height (IV) predict how much money you will spend (DV)?
• Does the number of store managers (IV) predict how often the machine will break down (DV)?
• Do the number of clicks (IV1) and the number of comments (IV2) on the blog predict the size of revenue (DV)?
Choosing the right test for your research
Research Question | Inferential Statistics
Compare means of 2 numeric variables | T test
Relate 2 categorical variables | Pearson Chi Square
Relate 2 numeric variables | Pearson Correlation r
Use 1+ IVs to explain 1 numeric DV | Regression
Where’s the crystal ball? I want to see the future!
Correlation tells us how X relates to Y (in the past)
Simple Regression tells us how X predicts Y (in the future)
• E.g., Does AvgDailyClicks predict DirectSalesRevenue?
Multiple Regression tells us how X1, X2, X3, … predict Y
• E.g., Do NumberBlogAuthors & AvgDailyClicks predict SponsorRevenue?
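To make the contrast concrete, here is a minimal sketch that computes both a Pearson correlation and a simple regression line from the same data. The five data points are hypothetical (they are not from the slides), and the variable names are only illustrative. The correlation gives one number describing how strongly X and Y move together; the regression gives an equation that can be applied to an X value that has not been observed yet.

```python
from math import sqrt

# Hypothetical data: average daily clicks (X) and revenue (Y) for five blogs.
x = [100, 200, 300, 400, 500]
y = [12, 15, 21, 24, 28]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Pearson correlation: strength and direction of the linear relationship.
r = sxy / sqrt(sxx * syy)

# Simple regression: the equation used to predict Y from a new X.
slope = sxy / sxx
constant = mean_y - slope * mean_x


def predict(new_x):
    """Predicted Y for a given X, using the fitted line."""
    return constant + slope * new_x


print(f"r = {r:.3f}")
print(f"Y = {constant:.2f} + {slope:.3f} * X")
print(f"Predicted Y at X = 600: {predict(600):.2f}")
```

Both quantities come from the same sums, which is why a strong correlation and a useful simple regression tend to go together.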
Linear Regression Assumptions
The relationship between the Xs and Y is linear
If you have 2 or more Xs, they are not perfectly correlated with each other
Xs are not correlated with external variables
Independence – Any two observations should be independent from each other
Errors are normally distributed
And a few others
Simple Regression
Example: Does Number of Stupid Customers predict Self Checkout Error Rate?
When we use X to predict Y:
• X = the predictor = the independent variable (IV)
• Y = the predicted value = the dependent variable (DV) (the value of Y depends on the predictor X)
• You’re basically building a linear model between X and Y:
Y = Constant + B*X + error
Basic Geometry: Linear Function
Y = Constant + B*X + error
Y = 1 + 2*X
Source: Wikipedia
Constant = 1
Slope B = 2
What do Armani and regression have in common?
Model Audition: Fitting the best straight line between X & Y
Who is the best fitting model? (Hint: Not Kate Moss)
Line that’s closest to all dots
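The "audition" can be sketched in a few lines of code. This example assumes that "closest to all dots" means the smallest sum of squared vertical distances, which is the criterion ordinary least squares uses. The dots are hypothetical values scattered around the line Y = 1 + 2*X; each candidate line is scored by its sum of squared errors (SSE), and the least-squares line should score lowest.

```python
# Hypothetical dots scattered around the line Y = 1 + 2*X.
x = [0, 1, 2, 3, 4]
y = [1.2, 2.7, 5.1, 7.2, 8.8]


def sum_squared_errors(constant, slope):
    """Total squared vertical distance from the dots to a candidate line."""
    return sum((yi - (constant + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Least-squares estimates of the slope and the constant.
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
constant = mean_y - slope * mean_x

# "Audition" a few candidate lines against the least-squares line.
candidates = {
    "flat line at the mean of Y": (mean_y, 0.0),
    "Y = 1 + 2*X": (1.0, 2.0),
    "least-squares line": (constant, slope),
}
for name, (c, b) in candidates.items():
    print(f"{name}: SSE = {sum_squared_errors(c, b):.3f}")
```

No other straight line can beat the least-squares line on this score, which is the sense in which it is the "best fitting model".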
Kate Moss expressed mathematically:
DirectSalesRevenue = (constant) + B*AvgDailyClicks + error
Goodness of Fit (R²): How well does the line fit the data? (How well does Kate fit the average woman?)
(constant)
Slope B
Distances to regression line = error
Good fit = small errors
Kate Moss as a lousy regression model:
Large errors, poor goodness of fit, small R²
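The link between error size and goodness of fit can be shown with a short sketch. It uses the standard definition R² = 1 - SSE/SST, where SSE is the sum of squared distances to the regression line and SST is the sum of squared distances to the mean of Y. Both data sets below are hypothetical: one hugs its line closely, the other is widely scattered.

```python
def r_squared(x, y):
    """Fit a least-squares line to (x, y) and return its R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    constant = mean_y - slope * mean_x
    # SSE: squared distances from the dots to the regression line (the errors).
    sse = sum((yi - (constant + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # SST: squared distances from the dots to the mean of Y.
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


x = [0, 1, 2, 3, 4]
tight_y = [1.2, 2.7, 5.1, 7.2, 8.8]    # hypothetical: small errors
loose_y = [4.0, 1.0, 7.0, 3.0, 10.0]   # hypothetical: large errors

print(f"Small errors: R^2 = {r_squared(x, tight_y):.3f}")
print(f"Large errors: R^2 = {r_squared(x, loose_y):.3f}")
```

The tightly clustered data give an R² close to 1, while the scattered data give a much smaller value, matching the "large errors, small R²" description.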
Reading the SPSS Regression Output
Y = Constant + B*X + error
DirectSalesRevenue = 19.466 - .003*AvgDailyClicks + error
Constant is significantly greater than zero
Slope (-.003) is significantly less than zero
Goodness of Fit (R²): Model explains 59% of the variation in DirectSalesRevenue
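Once the constant and slope are read off the output, the fitted equation can be used directly for prediction. The sketch below plugs in the two coefficients reported above (19.466 and -.003). The click values are hypothetical, and the slides do not state the units of either variable, so the results should be read as model units only. Each prediction is the expected value on the line and leaves out the error term.

```python
# Coefficients taken from the SPSS output above.
CONSTANT = 19.466
SLOPE = -0.003


def predicted_revenue(avg_daily_clicks):
    """Predicted DirectSalesRevenue on the fitted line (error term omitted)."""
    return CONSTANT + SLOPE * avg_daily_clicks


# Hypothetical click counts, chosen only to illustrate the negative slope.
for clicks in (0, 1000, 2000):
    print(f"AvgDailyClicks = {clicks}: "
          f"predicted DirectSalesRevenue = {predicted_revenue(clicks):.3f}")
```

At zero clicks the prediction equals the constant, and each additional click lowers the prediction by .003.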
Reporting Regression in plain English
The number of average daily clicks significantly predicted direct sales revenue, b = -.003, t(39) = 14.72, p < .001. The number of average daily clicks also explained a significant proportion of variance in direct sales revenue, R² = .59, F(1, 38) = 42.64, p < .001. These findings suggest that websites with more average daily clicks tend to have lower direct sales revenue.
Why is regression useful for predicting the future?
Y = 200X (R² = 45%)
Given any X, we can predict the value of Y; the model explains 45% of the variation in Y
Additional Notes
Assumptions: Xs are somewhat independent; Y values are independent; Y values are normally distributed; errors are normally distributed; X-Y relations are linear; no outliers
• Example: Time series data are NOT independent – stock price today depends on stock price yesterday, which depends on stock price the day before, etc.
Multiple regression is just an extension of simple regression
• Use multiple Xs (e.g., both AvgDailyClicks and NumberAuthors) to predict Y
• When you have a condition (e.g., customer choice depends on gender; brand awareness depends on comm. channel; number of applications depends on program of study), you need to create an interaction term (covered next class)
When an X is categorical (e.g., whether the blog host is Google or WordPress): Code X in numbers – e.g., 0 is Google, 1 is WordPress
When Y is categorical (e.g., whether the blog won the Outstanding Blog Award): Code Y in numbers – e.g., 0 is No, 1 is Yes, and use Logistic Regression
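The 0/1 coding for a categorical X can be sketched as follows. The host labels follow the example above, but the revenue figures are hypothetical. With a single 0/1 predictor, the fitted constant equals the mean of Y for the group coded 0, and the slope equals the difference between the two group means, which makes the coefficients easy to interpret.

```python
# Hypothetical data: blog host (categorical X) and revenue (numeric Y).
hosts = ["Google", "WordPress", "Google", "WordPress", "Google", "WordPress"]
revenue = [10.0, 14.0, 12.0, 15.0, 11.0, 16.0]

# Coding scheme from the notes above: 0 is Google, 1 is WordPress.
codes = {"Google": 0, "WordPress": 1}
x = [codes[h] for h in hosts]

# Simple least-squares regression of revenue on the 0/1 code.
n = len(x)
mean_x = sum(x) / n
mean_y = sum(revenue) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, revenue))
         / sum((xi - mean_x) ** 2 for xi in x))
constant = mean_y - slope * mean_x

# Group means, for comparison with the regression coefficients.
google_mean = sum(y for h, y in zip(hosts, revenue) if h == "Google") / hosts.count("Google")
wordpress_mean = sum(y for h, y in zip(hosts, revenue) if h == "WordPress") / hosts.count("WordPress")

print(f"Constant = {constant:.2f} (Google mean = {google_mean:.2f})")
print(f"Slope = {slope:.2f} (WordPress mean - Google mean = {wordpress_mean - google_mean:.2f})")
```

Swapping which host is coded 0 would flip the sign of the slope and move the constant to the other group's mean, so the coding scheme should be recorded along with the results.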
Y = Constant + B1*X1 + B2*X2 + error for Your Project
What is your Y (the value you want to predict)?
Is your Y categorical? Do you need Logistic Regression? See the instructor for help
What is your X (your predictor variable)? How many Xs do you have?
Are any of your Xs categorical? Do you have a coding scheme?
Do you have a condition? (e.g., customer choice depends on gender; brand awareness depends on comm. channel; number of applications depends on program of study) See the instructor for help
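For a project with two numeric predictors, the model Y = Constant + B1*X1 + B2*X2 + error can be fitted with the closed-form least-squares solution sketched below. This is an illustration under stated assumptions, not what SPSS does internally: the function name `fit_two_predictors` is made up for this example, and the data are hypothetical, constructed with no error so that the recovered coefficients (2, 0.5 and 3) can be checked by eye. The function refuses to fit when the two Xs are perfectly correlated, because the coefficients cannot be separated in that case.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of Y = constant + b1*X1 + b2*X2.

    Returns (constant, b1, b2). Raises ValueError when X1 and X2 are
    perfectly correlated (or either one does not vary).
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products around the means.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if abs(det) <= 1e-12 * s11 * s22:
        raise ValueError("X1 and X2 are perfectly correlated; cannot fit")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    constant = my - b1 * m1 - b2 * m2
    return constant, b1, b2


# Hypothetical data built from Y = 2 + 0.5*X1 + 3*X2 with no error.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 + 0.5 * a + 3 * b for a, b in zip(x1, x2)]

constant, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"Y = {constant:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")
```

With real data the errors are not zero, so the fitted coefficients are estimates, and more than two predictors call for a statistics package rather than hand-written formulas.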
Choosing the right test for your research
Research Question | Inferential Statistics
Compare means of 2 numeric variables | T test
Relate 2 numeric variables | Pearson Correlation r
Relate 2 categorical variables | Pearson Chi Square
Use 1+ IVs to explain 1 numeric DV | Regression