4 basic analytical tasks in statistics:

1) Comparing scores across groups → look for differences in means

2) Cross-tabulating categoric variables → look for contingencies

3) Computing correlations among variables → look for covariances

4) Predicting scores on an outcome variable from numerical predictor variables → look for causal effects (or predicted outcomes)

-- Focus this week on the 4th task

“Correlation” (revisited)

Correlation = strength of the linear association between 2 numeric variables

• It reflects the degree to which the association is described by a “straight-line” relationship
  – The degree to which two variables covary or share common variance [“covariance” = a key term]

• It reflects the “commonality” (“predictability”) between the two variables

• Note: r² (r-squared) = the proportion of variance that is “shared” or common to both variables
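As a rough illustration of r and r², the sketch below computes both for the prior-charges and sentence-length data used in this deck's worked example. The helper names (`mean`, `sum_of_products`, `pearson_r`) are made up for this sketch and are not from the course materials.

```python
import math

# Prior charges (X) and sentence in months (Y) from this deck's worked example.
priors = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
sentence = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]


def mean(values):
    return sum(values) / len(values)


def sum_of_products(xs, ys):
    """Sum of cross-products of deviations from the two means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def pearson_r(xs, ys):
    """Covariation of X and Y, scaled by the variation in each variable."""
    return sum_of_products(xs, ys) / math.sqrt(
        sum_of_products(xs, xs) * sum_of_products(ys, ys)
    )


r = pearson_r(priors, sentence)
print(f"r = {r:.3f}, r^2 = {r * r:.2f}")  # r = 0.849, r^2 = 0.72
```

Read this way, about 72% of the variance in sentence length is shared with the number of prior charges in this small sample.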

“Regression” = closely related topic

• The relationship/difference between correlation and regression?
  – Correlation = compute the degree to which values of variables cluster around a straight line
      → It is a symmetric description ($r_{xy} = r_{yx}$)
      → It is a standardized measure
  – Regression = compute the equation for the “best fitting” straight line ($Y = a + bX$)
      → It is an asymmetric description ($b_{xy} \neq b_{yx}$)
      → It is an unstandardized measure (usually)
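The symmetry of r and the asymmetry of b can be checked numerically. The sketch below uses the prior-charges data from this deck's worked example; the function name `slope` is an assumption made for illustration. It also shows a standard identity: the product of the two slopes equals r².

```python
# Prior charges (X) and sentence in months (Y) from this deck's worked example.
priors = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
sentence = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]


def slope(predictor, outcome):
    """Least-squares slope for regressing `outcome` on `predictor`."""
    mp = sum(predictor) / len(predictor)
    mo = sum(outcome) / len(outcome)
    cross = sum((p - mp) * (o - mo) for p, o in zip(predictor, outcome))
    spread = sum((p - mp) ** 2 for p in predictor)
    return cross / spread


b_yx = slope(priors, sentence)  # months of sentence per additional prior charge
b_xy = slope(sentence, priors)  # prior charges per additional month of sentence

print(b_yx, b_xy)   # the two slopes differ (3.0 versus 0.24)
print(b_yx * b_xy)  # their product is r squared (about 0.72)
```

Swapping the roles of X and Y changes the slope because each slope is expressed in the units of its own outcome variable, whereas r is unit-free.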

Linear Regression

So, what’s the deal with “Regression”?

• Why is “regression” called that?
  a) Term introduced by Francis Galton in the late 19th century to describe prediction of genetic traits across generations
      → reflecting imperfect correlations between parents and children
  b) It referred to the tendency of extreme values of traits to “regress toward the mean” across successive generations
      → reflecting Galton’s interest in the inheritability of genius & other unusual traits
  c) Correct word use: we “regress the dependent variable on the independent variable”
      → $Y = a + b_{yx}X$

What’s the deal with “Regression”? (cont.)

• Why is regression used in data analysis?
    → To describe the functional pattern that links 2 variables together in a correlation
  – i.e., what are the optimal values of a and b for X & Y?

• Two basic uses of regression:
  a) Prediction:
     -- Predict values of one variable (Y) from values of another variable (X) (using the linear equation)
  b) Explanation:
     -- Estimate the causal influence of one variable (X) on another (Y) (based on measurable correlation)
     -- Test a causal hypothesis about how Y and X are related

How is regression analysis done?

• By fitting a straight line to a set of bivariate points (values on 2 variables for the same data units)
  – $y = a + b_{yx}x$ (basic formula for linear relation)
  – y = the dependent variable
  – x = the independent variable
  – a = the “intercept”
  – $b_{yx}$ = the “slope” of the line

• Concern is with fitting a straight line that minimizes the errors of prediction (of y from x)
  – $y_i = \hat{y}_i + e_i$ (observed = predicted + error)
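The "observed = predicted + error" decomposition can be made concrete with the deck's prior-charges example. The sketch below plugs in the intercept (14.0) and slope (3.0) that the worked example arrives at; the function name `decompose` is hypothetical.

```python
# Prior charges (X) and sentence in months (Y) from this deck's worked example.
priors = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
sentence = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]

A, B = 14.0, 3.0  # intercept and slope from the deck's worked example


def decompose(xs, ys, a, b):
    """Return (observed, predicted, error) triples for the line y = a + b*x."""
    rows = []
    for x, y in zip(xs, ys):
        predicted = a + b * x
        rows.append((y, predicted, y - predicted))
    return rows


rows = decompose(priors, sentence, A, B)
for observed, predicted, error in rows:
    print(f"{observed:5.1f} = {predicted:5.1f} + {error:6.1f}")

# For a least-squares line the errors cancel out exactly.
total_error = sum(error for _, _, error in rows)
print(total_error)
```

Each row reproduces the observed sentence as the predicted value plus that case's error of prediction.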

2 ways of expressing the prediction equation:

$$y_i = a + b_{yx} x_i + e_i$$

or

$$\hat{y}_i = a + b_{yx} x_i$$

Regression example (continued)

“Regression”

• How to obtain the straight line that “best fits” the data?
  – Rely on a method called “least squares”, which minimizes the sum of the squared errors (deviations between the line and the data points)
  – Yields best-fitting line to the points
  – Yields formulas for a and b provided in the book

• How to compute regression coefficients?
  • By hand calculations:
    – Definitional formula (the familiar one)
    – Computational formula (no deviation scores)
  • By SPSS: Analyze → Regression → Linear
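To see what "minimizes the sum of the squared errors" means in practice, the sketch below compares the sum of squared errors (SSE) for the line from the deck's worked example (a = 14.0, b = 3.0) against a few nearby lines. The alternative lines are arbitrary choices made for illustration, and `sse` is a hypothetical helper name.

```python
# Prior charges (X) and sentence in months (Y) from this deck's worked example.
priors = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
sentence = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]


def sse(a, b, xs, ys):
    """Sum of squared deviations between the data and the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


best = sse(14.0, 3.0, priors, sentence)

# Arbitrary nearby lines (assumed for illustration, not from the source).
alternatives = [(14.0, 3.1), (14.0, 2.9), (15.0, 3.0), (13.0, 3.0)]
other_sses = [sse(a, b, priors, sentence) for a, b in alternatives]

print(f"least-squares line: SSE = {best:.1f}")
for (a, b), value in zip(alternatives, other_sses):
    print(f"a = {a}, b = {b}: SSE = {value:.1f}")
```

Every alternative line gives a larger SSE than the least-squares line, which is the sense in which that line "best fits" the points.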

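Regression Coefficient: Definitional Formula

$$b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$

Regression Coefficient: Computational Formula

$$b_{yx} = \frac{\sum XY - N\bar{X}\bar{Y}}{\sum X^2 - N\bar{X}^2}$$

Intercept (Constant): Computational Formula

$$a = \bar{Y} - b_{yx}\bar{X}$$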
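The definitional and computational formulas are algebraically equivalent, which the sketch below checks on the deck's prior-charges data. The function names are assumptions made for this illustration.

```python
# Prior charges (X) and sentence in months (Y) from this deck's worked example.
priors = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
sentence = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]


def slope_definitional(xs, ys):
    """b = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2), using deviation scores."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denominator = sum((x - mx) ** 2 for x in xs)
    return numerator / denominator


def slope_computational(xs, ys):
    """b = (sum(XY) - N*Xbar*Ybar) / (sum(X^2) - N*Xbar^2), using raw scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * mx * my
    denominator = sum(x * x for x in xs) - n * mx * mx
    return numerator / denominator


def intercept(xs, ys, b):
    """a = Ybar - b * Xbar."""
    return sum(ys) / len(ys) - b * sum(xs) / len(xs)


b_def = slope_definitional(priors, sentence)
b_comp = slope_computational(priors, sentence)
a = intercept(priors, sentence, b_comp)
print(b_def, b_comp, a)  # 3.0 3.0 14.0
```

The computational version avoids deviation scores, which is why it is the easier one for hand calculation from a table of sums.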

“Regression”

• Use Example from Fox/Levin/Forde text (p. 277) (handout)

Prior Charges    Sentence (mos)
      0               12
      3               13
      1               15
      0               19
      6               26
      5               27
      3               29
      4               31
     10               40
      8               48

# Priors (X)   Sentence (Y)     X²        Y²        XY
      0            12            0       144         0
      3            13            9       169        39
      1            15            1       225        15
      0            19            0       361         0
      6            26           36       676       156
      5            27           25       729       135
      3            29            9       841        87
      4            31           16       961       124
     10            40          100      1600       400
      8            48           64      2304       384

  Σ = 40        Σ = 260      Σ = 260  Σ = 8010  Σ = 1340
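The column totals in this table can be reproduced in a few lines. This is only a checking aid; the variable names are illustrative.

```python
# Prior charges (X) and sentence in months (Y) from this deck's worked example.
priors = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
sentence = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]

# One row per case: X, Y, X^2, Y^2, XY (the same columns as the table).
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(priors, sentence)]

# Sum each column to get the five totals.
totals = [sum(column) for column in zip(*rows)]
print(totals)  # [40, 260, 260, 8010, 1340]
```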

Regression Example (cont.)

$$b = \frac{\sum XY - N\bar{X}\bar{Y}}{\sum X^2 - N\bar{X}^2} = \frac{1340 - (10)(4.0)(26.0)}{260 - (10)(4.0)(4.0)} = \frac{1340 - 1040}{260 - 160} = \frac{300}{100} = 3.0$$

$$a = \bar{Y} - b\bar{X} = 26 - (3.0)(4.0) = 26 - 12 = 14.0$$
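With a = 14.0 and b = 3.0, the fitted equation can be used for prediction. The sketch below is a minimal illustration; `predict_sentence` is a hypothetical name, and the three input values are chosen arbitrarily from within the observed range of 0 to 10 prior charges.

```python
A = 14.0  # intercept from the worked example (predicted months at 0 priors)
B = 3.0   # slope from the worked example (additional months per prior charge)


def predict_sentence(prior_charges):
    """Predicted sentence length in months from the fitted line Y = A + B*X."""
    return A + B * prior_charges


for n_priors in (0, 5, 10):
    print(n_priors, predict_sentence(n_priors))
# 0 14.0
# 5 29.0
# 10 44.0
```

Predictions for values well outside the observed range would be extrapolations, which the data cannot support on their own.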

Regression example (continued)

Regression (continued)

- How to interpret the results?

• Slope (b) = predicted change in Y for a 1-unit change in X
    → Unstandardized b (b) = in original units/metric
    → Standardized b (β) [beta] = in standard (Z) units

• Intercept (a) = predicted value of Y when X = 0
    → Interpretable only when zero is a meaningful value of X
    → Also called the “constant” term since it is the same for all values of X

• R (multiple r) = correlation between Y and the predictor(s) (predictability of Y from Xs)
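The link between the unstandardized and standardized slope can be sketched with the deck's prior-charges data. The conversion used here, β = b · (s_x / s_y), is the standard one; with a single predictor, β works out to the same value as r. The helper name `sample_sd` is an assumption made for this sketch.

```python
import math

# Prior charges (X) and sentence in months (Y) from this deck's worked example.
priors = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
sentence = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]

B = 3.0  # unstandardized slope from the worked example (months per prior charge)


def sample_sd(values):
    """Sample standard deviation (N - 1 in the denominator)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


# Standardized slope: change in Y (in SD units) per 1-SD change in X.
beta = B * sample_sd(priors) / sample_sd(sentence)
print(f"beta = {beta:.3f}")  # beta = 0.849
```

So a one-standard-deviation increase in prior charges goes with a predicted increase of about 0.85 standard deviations in sentence length.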

Regression (continued)

• What are assumptions/requirements of regression?
  1. Numeric variables (interval or ratio level)
  2. Linear relationship between variables
  3. Random sampling
  4. Normal distribution of data
  5. Homoscedasticity (equal conditional variances)

• What if the assumptions do not hold?
  1. Don’t worry about small deviations
  2. May be able to transform variables
  3. May use alternative procedures
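One informal way to eyeball homoscedasticity is to compare the spread of the residuals at low and high values of X. The sketch below does this for the deck's prior-charges data using the fitted line Y = 14 + 3X. The split point (3 prior charges) and the helper name `mean_square` are arbitrary choices for illustration, and with only 10 cases this is a rough look rather than a formal test.

```python
# Prior charges (X) and sentence in months (Y) from this deck's worked example.
priors = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
sentence = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]

A, B = 14.0, 3.0  # intercept and slope from the worked example
CUTOFF = 3        # arbitrary split: "few" priors (<= 3) vs "many" priors (> 3)

# Residual = observed Y minus the value predicted by the fitted line.
residuals = [y - (A + B * x) for x, y in zip(priors, sentence)]

low = [e for x, e in zip(priors, residuals) if x <= CUTOFF]
high = [e for x, e in zip(priors, residuals) if x > CUTOFF]


def mean_square(errors):
    """Average squared residual: a rough measure of spread around the line."""
    return sum(e * e for e in errors) / len(errors)


low_spread = mean_square(low)
high_spread = mean_square(high)
print(low_spread, high_spread)
```

Here the two spreads come out similar, which is at least consistent with equal conditional variances; a large difference would be a reason to consider a transformation or an alternative procedure.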

Regression (continued)

• How to test for significance of results?
  – F-test for overall regression
  – t-test for individual b coefficients

• What is R? (or R²?)

• Can we use more than one independent variable?
  – Yes – it’s called “multiple regression”
  – Regress a single dependent variable (Y) on multiple independent variables (a linear combination that best predicts Y)
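The F and t statistics for the one-predictor case can be computed from sums of squares. The sketch below uses the standard textbook formulas on the deck's prior-charges data; the variable names are illustrative, and it stops at the test statistics, which would still need to be compared against critical values (here with 1 and 8 degrees of freedom for F, and 8 for t).

```python
import math

# Prior charges (X) and sentence in months (Y) from this deck's worked example.
priors = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
sentence = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]

n = len(priors)
mean_x = sum(priors) / n
mean_y = sum(sentence) / n

sxx = sum((x - mean_x) ** 2 for x in priors)
syy = sum((y - mean_y) ** 2 for y in sentence)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(priors, sentence))

b = sxy / sxx                        # slope
ss_total = syy                       # total variation in Y
ss_regression = b * sxy              # variation explained by the line
ss_error = ss_total - ss_regression  # unexplained (residual) variation

df_regression = 1                    # one predictor
df_error = n - 2                     # N minus the two estimated coefficients

mean_square_error = ss_error / df_error
f_stat = (ss_regression / df_regression) / mean_square_error
se_b = math.sqrt(mean_square_error / sxx)  # standard error of the slope
t_stat = b / se_b
r_squared = ss_regression / ss_total

print(f"R^2 = {r_squared:.2f}, F = {f_stat:.2f}, t = {t_stat:.2f}")
```

With a single predictor, F is simply t squared, so the overall test and the test of the slope give the same answer.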

Multiple Regression - addenda

• Simultaneous analysis of the regression of a dependent variable on 2 or more independent variables
    $Y_i = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + e_i$

• All coefficients are computed at once
  – In this case, the b coefficients are partial regression coefficients
  – They reflect the unique predictive ability of each variable (with the covariance of other independent variables “partialled out”)
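To illustrate what "partialled out" means, the sketch below fits a two-predictor model on a small made-up data set. Everything in it is an assumption for illustration: the data are synthetic (Y is constructed exactly as 2 + 1.5·X1 − 0.5·X2, with no error term), and `solve` and `fit` are hypothetical helpers that solve the least-squares normal equations directly rather than using a statistics package.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Clear this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit(columns, ys):
    """Least-squares coefficients [a, b1, b2, ...] via the normal equations."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(ys))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data (made up for illustration): two correlated predictors.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 + 1.5 * u - 0.5 * v for u, v in zip(x1, x2)]

a, b1, b2 = fit([x1, x2], y)  # all coefficients computed at once
_, b1_alone = fit([x1], y)    # X1 on its own, ignoring X2

print(f"multiple regression: a = {a:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
print(f"simple regression on X1 only: b1 = {b1_alone:.2f}")
```

The slope for X1 alone differs from its partial coefficient because X1 and X2 are correlated here, so the simple slope also absorbs part of X2's contribution.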

Multiple Regression

• What is Multiple Regression good for?
    → It allows us to estimate:
      – The combined effects of multiple variables
      – The unique effects of individual variables
    → It allows us to test causal theories about:
      – The combined effects of multiple variables
      – The unique effects of individual variables

• In this case, R² measures how well the entire model does in predicting Y.
    → The overall F-test refers to the whole set of variables
    → The t-tests apply to the coefficients of each variable
