
(Correlation and) (Multiple) Regression

Friday 5th March

(and Logistic Regression too!)

The Shape of Things to Come… Rest of Module

Week 8
  Morning: Regression
  Afternoon: Regression & Logistic Regression (Computing Session)

Week 9
  Morning: Published Multivariate Analyses
  Afternoon: Logistic Regression

Week 10
  Morning: Log-linear Models
  Afternoon: Log-linear Models (Computing Session)

ASSESSMENT D   ASSESSMENT E

The Correlation Coefficient (r)

The correlation coefficient shows the strength/closeness of a relationship.

[Scatterplot: Age at First Childbirth (y) against Age at First Cohabitation (x); r = 0.5 (or perhaps less…). Reference panels illustrate r = +1, r = -1 and r = 0.]

Correlation… and Regression

• r measures correlation in a linear way

• … and is connected to linear regression

• More precisely, it is r² (r-squared) that is of relevance

• It is the ‘variation explained’ by the regression line

• … and is sometimes referred to as the ‘coefficient of determination’
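A minimal Python sketch (using made-up ages, not the lecture's data) of how r and r² relate: r² equals the proportion of the overall variation in y that the regression line explains.

import numpy as np

x = np.array([19.0, 21.0, 23.0, 24.0, 27.0, 30.0])  # hypothetical ages at first cohabitation
y = np.array([22.0, 24.0, 23.0, 28.0, 29.0, 33.0])  # hypothetical ages at first childbirth

r = np.corrcoef(x, y)[0, 1]     # Pearson correlation coefficient
B, C = np.polyfit(x, y, 1)      # slope and constant of the OLS line
y_hat = B * x + C

ss_total = np.sum((y - y.mean()) ** 2)   # overall variation (from the mean of y)
ss_resid = np.sum((y - y_hat) ** 2)      # unexplained variation (residuals)
r_squared = 1 - ss_resid / ss_total      # coefficient of determination

print(f"r = {r:.3f}, r^2 = {r_squared:.3f} (equals r*r = {r*r:.3f})")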

[Scatterplot: y against x with the mean of y marked; the arrows show the overall variation (variation from the mean of y)]

[Scatterplot: y against x with the regression line added; some of the overall variation is explained by the regression line, i.e. the arrows tend to be shorter than the dashed lines, because the regression line is closer to the points than the mean line is]

Choosing the line that best explains the data

[Scatterplot: Length of Residence (y) against Age (x), showing the regression line with slope B and constant C, the residual ε for one point, and an outlier]

y = Bx + C + ε   (B: slope; C: constant; ε: error term, i.e. the residual)

• Some variation is explained by the regression line
• The residuals constitute the unexplained variation
• The regression line is chosen so as to minimise the sum of the squared residuals
• i.e. to minimise Σε² (Σ means ‘sum of’)
• The full/specific name for this technique is Ordinary Least Squares (OLS) linear regression
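A minimal Python sketch of OLS by hand, using made-up age and length-of-residence values: the closed-form slope and constant are the choices that minimise the sum of squared residuals.

import numpy as np

age = np.array([25.0, 32.0, 41.0, 48.0, 55.0, 63.0])      # x (hypothetical)
residence = np.array([2.0, 5.0, 9.0, 12.0, 18.0, 21.0])   # y (hypothetical)

# OLS closed form: B = cov(x, y) / var(x), C = mean(y) - B * mean(x)
B = np.sum((age - age.mean()) * (residence - residence.mean())) / np.sum((age - age.mean()) ** 2)
C = residence.mean() - B * age.mean()

residuals = residence - (B * age + C)   # epsilon for each observation
print(f"B = {B:.3f}, C = {C:.3f}, sum of squared residuals = {np.sum(residuals ** 2):.3f}")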

Regression assumptions #1 and #2

[Histogram: frequency distribution of the residuals (ε), centred on 0]

#1: Residuals have the usual symmetric, ‘bell-shaped’ normal distribution
#2: Residuals are independent of each other

Regression assumption #3

[Two scatterplots of y against x:]

Homoscedasticity: the spread of the residuals (ε) stays consistent in size (range) as x increases.
Heteroscedasticity: the spread of the residuals (ε) increases as x increases (or varies in some other way). In this case, use Weighted Least Squares.
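A minimal Python sketch of Weighted Least Squares on simulated data whose residual spread grows with x. The 1/x² weights assume the error variance grows with x²; that is an illustrative choice for these simulated data, not a general rule.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5 * x)   # heteroscedastic errors: spread grows with x

w = 1.0 / x ** 2                      # assumed weights: inverse of the error variance
X = np.column_stack([x, np.ones_like(x)])
W = np.diag(w)

# Weighted normal equations: (X'WX) b = X'Wy
b = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(f"WLS slope = {b[0]:.3f}, constant = {b[1]:.3f}")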

Regression assumption #4

• Linearity! (We’ve already assumed this…)

• In the case of a non-linear relationship, one may be able to use a non-linear regression equation, such as:

y = B₁x + B₂x² + c
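A minimal Python sketch (hypothetical curved data) of this idea: the squared term makes the fitted curve non-linear in x, but the equation is still linear in the coefficients, so it can be fitted by OLS.

import numpy as np

x = np.linspace(0.0, 5.0, 30)
y = 1.5 * x - 0.4 * x ** 2 + 2.0 + np.random.default_rng(1).normal(scale=0.2, size=x.size)

B2, B1, c = np.polyfit(x, y, 2)   # quadratic fit: highest power first
print(f"y = {B1:.2f}x + {B2:.2f}x^2 + {c:.2f}")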

Another problem: Multicollinearity

• If two ‘independent variables’, x and z, are perfectly correlated (i.e. identical), it is impossible to tell what the B values corresponding to each should be

• e.g. if y = 2x + c, and we add z, should we get:
  • y = 1.0x + 1.0z + c, or
  • y = 0.5x + 1.5z + c, or
  • y = -5001.0x + 5003.0z + c ?
• The problem applies if two variables are highly (but not perfectly) correlated too…
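A minimal Python sketch of exactly this problem: when z is a perfect copy of x, the design matrix loses a rank, and infinitely many coefficient pairs fit the data equally well.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z = x.copy()            # perfectly correlated with x
y = 2.0 * x + 1.0

X = np.column_stack([x, z, np.ones_like(x)])
print(f"rank of X = {np.linalg.matrix_rank(X)} (columns = {X.shape[1]})")

# lstsq still returns *a* solution (the minimum-norm one), but e.g.
# B_x = 1.0, B_z = 1.0 or B_x = -5001.0, B_z = 5003.0 fit exactly as well:
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("one of many equally good solutions:", np.round(coef, 3))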

Example of Regression (from Pole and Lampard, 2002, Ch. 9)

• GHQ = (-0.69 x INCOME) + 4.94

• Is -0.69 significantly different from 0 (zero)?

• A test statistic that takes account of the ‘accuracy’ of the B of -0.69 (by dividing it by its standard error) is t = -2.142

• For this value of t in this example, the significance value is p = 0.038 < 0.05

• r-squared here is (-0.321)² = 0.103 = 10.3%
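A minimal Python sketch reproducing the arithmetic of this example: the t statistic is B divided by its standard error, and r² is simply the square of r.

B = -0.69
t = -2.142
se = B / t              # implied standard error, since t = B / SE
r = -0.321

print(f"SE = {se:.3f}, t = {B / se:.3f}")
print(f"r^2 = {r ** 2:.3f} = {r ** 2:.1%} of variation explained")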

… and of Multiple Regression

• GHQ = (-0.47 x INCOME) + (-1.95 x HOUSING) + 5.74

• For B = -0.47, t = -1.51 (& p = 0.139 > 0.05)

• For B = -1.95, t = -2.60 (& p = 0.013 < 0.05)

• The r-squared value for this regression is 0.236 (23.6%)

Interaction effects…

[Line graph: square root of length of residence against age, with separate lines for women, all, and men; the women’s line is steeper]

In this situation there is an interaction between the effects of age and of gender, so B (the slope) varies according to gender and is greater for women
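A minimal Python sketch of an interaction term, with entirely made-up coefficients: the product term AGE × FEMALE adds to the base slope when FEMALE = 1, so the slope on age is greater for women.

def sqrt_length(age, female):
    """Fitted value with an age-by-gender interaction (all coefficients hypothetical)."""
    c, b_age, b_female, b_inter = 1.0, 0.05, 0.3, 0.02
    return c + b_age * age + b_female * female + b_inter * age * female

for age in (30, 50):
    print(f"age {age}: men {sqrt_length(age, 0):.2f}, women {sqrt_length(age, 1):.2f}")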

Logistic regression and odds ratios

• Men: 1967/294 = 6.69 (to 1)

• Women: 1980/511 = 3.87 (to 1)

• Odds ratio 6.69/3.87 = 1.73

• Men: p/(1-p) = 3.87 x 1.73 = 6.69

• Women: p/(1-p) = 3.87 x 1 = 3.87
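A minimal Python sketch reproducing the odds arithmetic above, using the counts from the slide (e.g. 1967 men with the outcome in question against 294 without; the slide does not name the outcome).

men_yes, men_no = 1967, 294
women_yes, women_no = 1980, 511

odds_men = men_yes / men_no          # 6.69 (to 1)
odds_women = women_yes / women_no    # 3.87 (to 1)
odds_ratio = odds_men / odds_women   # 1.73

print(f"men {odds_men:.2f}, women {odds_women:.2f}, odds ratio {odds_ratio:.2f}")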

Odds and log odds

• Odds = Constant x Odds ratio

• Log odds = log(constant) + log(odds ratio)

• Men

log (p/(1-p)) = log(3.87) + log(1.73)

• Women

log (p/(1-p)) = log(3.87) + log(1) = log(3.87)

• log (p/(1-p)) = constant + log(odds ratio)

• Note that:

log(3.87) = 1.354

log(6.69) = 1.900

log(1.73) = 0.546

log(1) = 0

• And that the ‘reverse’ of the logarithmic transformation is exponentiation

• log (p/(1-p)) = constant + (B x SEX)

where B = log(1.73), SEX = 1 for men, SEX = 0 for women

• Log odds for men = 1.354 + 0.546 = 1.900
• Log odds for women = 1.354 + 0 = 1.354

• Exp(1.900) = 6.69 & Exp(1.354) = 3.87
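A minimal Python sketch of this logistic-regression form: the log odds are linear in SEX, and exponentiating recovers the odds themselves.

import math

constant = math.log(3.87)   # approx. 1.354
B = math.log(1.73)          # approx. 0.546

for sex, label in ((1, "men"), (0, "women")):
    log_odds = constant + B * sex
    print(f"{label}: log odds = {log_odds:.3f}, odds = {math.exp(log_odds):.2f}")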

Interpreting effects in Logistic Regression

• In the above example: Exp(B) = Exp(log(1.73)) = 1.73 (the odds ratio!)

• In general, effects in logistic regression analysis take the form of exponentiated B’s (Exp(B)), which are odds ratios. Odds ratios have a multiplicative effect on the (odds of) the outcome

• Is a B of 0.546 (= log(1.73)) significant?
• In this case p = 0.000 < 0.05 for this B.

Back from odds to probabilities

• Probability = Odds / (1 + Odds)

• Men: 6.69 / (1 + 6.69) = 0.870

• Women: 3.87 / (1 + 3.87) = 0.795
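A minimal Python sketch of the conversion back from odds to probabilities, probability = odds / (1 + odds), using the odds from the example.

for label, odds in (("men", 6.69), ("women", 3.87)):
    print(f"{label}: p = {odds / (1 + odds):.3f}")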

‘Multiple’ Logistic regression

• log odds = c + (B1 x SEX) + (B2 x AGE)

= c + (0.461 x SEX) + (-0.099 x AGE)

• For B1 = 0.461, p = 0.000 < 0.05

• For B2 = -0.099, p = 0.000 < 0.05

• Exp(B1) = Exp(0.461) = 1.59

• Exp(B2) = Exp(-0.099) = 0.905
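A minimal Python sketch using the fitted coefficients above. The constant c is not given on the slide, so an illustrative value is assumed here; note that the odds ratio between men and women at the same age does not depend on c.

import math

def predicted_odds(sex, age, c=1.0):   # c is a placeholder value, not from the slide
    log_odds = c + 0.461 * sex + (-0.099) * age
    return math.exp(log_odds)

# Exp(B) values: being male (SEX = 1) multiplies the odds by 1.59,
# and each extra year of age multiplies them by 0.905.
print(f"Exp(B1) = {math.exp(0.461):.2f}, Exp(B2) = {math.exp(-0.099):.3f}")
print(f"odds(sex=1, age=30) / odds(sex=0, age=30) = "
      f"{predicted_odds(1, 30) / predicted_odds(0, 30):.2f}")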
