Sociology 601 Class 21: November 10, 2009
• Review
– formulas for b and se(b)
– Stata regression commands & output
• Violations of Model Assumptions, and their effects (9.6)
• Causality (10)
Formulas for b, a, r, and se(b)
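\[ b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}; \quad a = \bar{Y} - b\bar{X}; \quad r = b\,\frac{s_x}{s_y} \]
\[ \hat{Y} = a + bX; \quad SSE = \sum (Y - \hat{Y})^2 \]
\[ se(b) = \frac{\sqrt{SSE/(n-2)}}{s_x \sqrt{n-1}} \]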
Stata Example of Inference about a Slope
. summarize murder poverty

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      murder |        51    8.727451    10.71758        1.6       78.5
     poverty |        51    14.25882    4.584242          8       26.4

. regress murder poverty

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(  1,    49) =   23.08
       Model |  1839.06931     1  1839.06931           Prob > F      =  0.0000
    Residual |  3904.25223    49  79.6786169           R-squared     =  0.3202
-------------+------------------------------           Adj R-squared =  0.3063
       Total |  5743.32154    50  114.866431           Root MSE      =  8.9263

------------------------------------------------------------------------------
      murder |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |    1.32296   .2753711     4.80   0.000     .7695805    1.876339
       _cons |   -10.1364   4.120616    -2.46   0.017    -18.41708   -1.855707
------------------------------------------------------------------------------
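As a rough check, the formulas for se(b) and r can be applied by hand to the numbers printed in this output. The sketch below is in Python rather than Stata and uses only values copied from the summarize and regress results; the variable names are illustrative. It should reproduce the reported Root MSE, standard error, t statistic, and R-squared up to rounding.

```python
import math

# Values copied from the summarize and regress output above.
n = 51
sse = 3904.25223   # Residual SS
b = 1.32296        # slope on poverty
s_x = 4.584242     # Std. Dev. of poverty (x)
s_y = 10.71758     # Std. Dev. of murder (y)

# Root MSE = sqrt(SSE / (n - 2))
root_mse = math.sqrt(sse / (n - 2))

# se(b) = sqrt(SSE / (n - 2)) / (s_x * sqrt(n - 1))
se_b = root_mse / (s_x * math.sqrt(n - 1))

# t statistic for H0: slope = 0
t_stat = b / se_b

# r = b * s_x / s_y, and R-squared = r^2 in a bivariate regression
r = b * s_x / s_y

print(f"Root MSE = {root_mse:.4f}")
print(f"se(b)    = {se_b:.4f}")
print(f"t        = {t_stat:.2f}")
print(f"r        = {r:.4f}, r^2 = {r ** 2:.4f}")
```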
Stata Example of Inference about a Slope
. correlate murder poverty
(obs=51)

             |   murder  poverty
-------------+------------------
      murder |   1.0000
     poverty |   0.5659   1.0000

. correlate murder poverty, covariance
(obs=51)

             |   murder  poverty
-------------+------------------
      murder |  114.866
     poverty |  27.8024  21.0153
sqrt(114.866) = 10.72 = sd(y); sqrt(21.0153) = 4.58 = sd(x)
Alternative Formula for b
\[ b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum (X - \bar{X})(Y - \bar{Y})/(N-1)}{\sum (X - \bar{X})^2/(N-1)} = \frac{\mathrm{covariance}(x, y)}{\mathrm{variance}(x)} \]
b = 27.8024 / 21.0153 = 1.323
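The same arithmetic can be carried one step further to recover the intercept. This is a small Python sketch (not Stata) that takes the covariance and variances from the correlate output and the means from the summarize output; the variable names are illustrative.

```python
import math

# From ". correlate murder poverty, covariance"
cov_xy = 27.8024   # covariance(poverty, murder)
var_x = 21.0153    # variance of poverty (x)
var_y = 114.866    # variance of murder (y)

# From ". summarize murder poverty"
mean_x = 14.25882  # mean of poverty
mean_y = 8.727451  # mean of murder

# Slope as covariance over variance, then a = Ybar - b * Xbar
b = cov_xy / var_x
a = mean_y - b * mean_x

# Standard deviations are the square roots of the variances
sd_x = math.sqrt(var_x)
sd_y = math.sqrt(var_y)

print(f"b = {b:.5f}, a = {a:.4f}")
print(f"sd(x) = {sd_x:.4f}, sd(y) = {sd_y:.4f}")
```

Both values should agree with the poverty and _cons coefficients in the regress output, up to rounding.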
Stata Example of Inference about a Slope
scatter murder poverty || lfit murder poverty
Stata Example of Inference about a Slope
. regress murder poverty if state!="DC"

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =   31.36
       Model |  307.342297     1  307.342297           Prob > F      =  0.0000
    Residual |  470.406476    48  9.80013492           R-squared     =  0.3952
-------------+------------------------------           Adj R-squared =  0.3826
       Total |  777.748773    49  15.8724239           Root MSE      =  3.1305

------------------------------------------------------------------------------
      murder |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |   .5842405    .104327     5.60   0.000     .3744771    .7940039
       _cons |  -.8567153   1.527798    -0.56   0.578     -3.92856    2.215129
------------------------------------------------------------------------------
Assumptions Needed to make Population Inferences for slopes.
• The sample is selected randomly.
• X and Y are interval scale variables.
• The mean of Y is related to X by the linear equation E{Y} = α + βX.
• The conditional standard deviation of Y is identical at each X value. (no heteroscedasticity)
• The conditional distribution of Y at each value of X is normal.
• There is no error in the measurement of X.
Common Ways to Violate These Assumptions
• The sample is selected randomly.
o Cluster sampling (e.g., census tracts / neighborhoods) causes observations within a cluster to be more similar to one another than to observations outside the cluster.
o Autocorrelation (spatial and temporal)
o Two or more siblings in the same family.
o Sample = populations (e.g., states in the U.S.)
• X and Y are interval scale variables.
o Ordinal scale attitude measures
o Nominal scale categories (e.g., race/ethnicity, religion)
Common Ways to Violate These Assumptions (2)
• The mean of Y is related to X by the linear equation E{Y} = α + βX.
o U-shape: e.g., Kuznets inverted-U curve (inequality <- GDP/capita)
o Thresholds:
o Logarithmic (e.g., earnings <- education)
• The conditional standard deviation of Y is identical at each X value. (no heteroscedasticity)
o earnings <- education
o hours worked <- years
o adult child occupational status <- parental occupational status
Common Ways to Violate These Assumptions (3)
• The conditional distribution of Y at each value of X is normal.
o earnings (skewed) <- education
o Y is binary
o Y is a %
• There is no error in the measurement of X.
o almost everything
o what is the effect of measurement error in x on b?
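One way to explore the last question is a small simulation. The Python sketch below assumes classical measurement error, meaning random noise added to X that is unrelated to the true X and to the error in Y. That assumption, the simulated data, and all names are illustrative and not from the lecture. Under it, the estimated slope is expected to shrink toward zero by roughly var(x) / (var(x) + var(error)), which is one half in this setup.

```python
import random

def ols_slope(x, y):
    """Least squares slope: sum of cross-products over sum of squares of x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 20000
true_slope = 2.0

# True X and Y (hypothetical data), each with unit-variance noise.
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [true_slope * xi + rng.gauss(0, 1) for xi in x_true]

# Observed X = true X plus measurement error with the same variance as X.
x_obs = [xi + rng.gauss(0, 1) for xi in x_true]

b_clean = ols_slope(x_true, y)   # close to the true slope
b_noisy = ols_slope(x_obs, y)    # attenuated toward zero

print(f"slope using true X:     {b_clean:.3f}")
print(f"slope using observed X: {b_noisy:.3f}")
```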
Things to watch out for: extrapolation.
Extrapolation beyond observed values of X is dangerous.
• The pattern may be nonlinear.
• Even if the pattern is linear, the standard errors become increasingly wide.
• Be especially careful interpreting the Y-intercept: it may lie outside the observed data.
o e.g., year zero
o e.g., zero education in the U.S.
o e.g., zero parity
Things to watch out for: outliers
• Influential observations and outliers may unduly influence the fit of the model.
• The slope and standard error of the slope may be affected by influential observations.
• This is an inherent weakness of least squares regression.
• You may wish to evaluate two models: one with and one without the influential observations.
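The "two models" comparison can be sketched on a toy data set. The Python example below uses made-up numbers (eight ordinary points plus one extreme point), not the state data, and the function and variable names are illustrative. It fits the line with and without the extreme point and reports the slope and its standard error each time.

```python
import math

def fit(x, y):
    """Return (slope, se of slope) for a bivariate least squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b, se_b

# Hypothetical data: a roughly linear pattern ...
x = [8, 10, 12, 14, 16, 18, 20, 22]
y = [4, 5, 7, 7, 9, 10, 10, 12]

# ... plus one extreme, high-leverage observation.
x_all = x + [24]
y_all = y + [60]

b_trimmed, se_trimmed = fit(x, y)
b_all, se_all = fit(x_all, y_all)

print(f"without the extreme point: b = {b_trimmed:.3f}, se(b) = {se_trimmed:.3f}")
print(f"with the extreme point:    b = {b_all:.3f}, se(b) = {se_all:.3f}")
```

In this toy case a single observation changes both the slope and its standard error substantially, which is the same pattern the two regressions of murder on poverty show with and without DC.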
Things to watch out for: truncated samples
Truncated samples cause problems opposite to those of influential observations and outliers.
• Truncation on the X axis reduces the correlation coefficient for the remaining data.
• Truncation on the Y axis is a worse problem, because it violates the assumption of normally distributed errors.
• Examples: top-coded income data; health as measured by the number of days spent in a hospital in a year.
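The effect of truncation on the X axis can be illustrated with simulated data. In this Python sketch the data, the cutoff of 0.5, and all names are assumptions made for illustration. It keeps only observations with X in a narrow band and compares the slope and the correlation before and after.

```python
import math
import random

def slope_and_r(x, y):
    """Return (least squares slope, Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

rng = random.Random(1)   # fixed seed so the run is repeatable

# Hypothetical population: Y = X + noise, true slope 1.
x = [rng.gauss(0, 1) for _ in range(20000)]
y = [xi + rng.gauss(0, 1) for xi in x]
b_full, r_full = slope_and_r(x, y)

# Truncate on X: keep only cases with |X| < 0.5.
kept = [(xi, yi) for xi, yi in zip(x, y) if abs(xi) < 0.5]
x_trunc = [xi for xi, _ in kept]
y_trunc = [yi for _, yi in kept]
b_trunc, r_trunc = slope_and_r(x_trunc, y_trunc)

print(f"full sample:      b = {b_full:.3f}, r = {r_full:.3f}")
print(f"truncated sample: b = {b_trunc:.3f}, r = {r_trunc:.3f}")
```

With X restricted to a narrow range, the slope estimate stays near its original value but becomes less precise, while the correlation drops because there is less variation in X left to explain Y.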
Causality
• We never prove that x causes y
o Research and theory make it increasingly likely
• Criteria:
o association
o time order
o no alternative explanations
• is the relationship spurious?
Alternative Explanations
Example: Neighborhood poverty -> Low Test Scores
Possible solutions:
• multivariate models
o e.g., control for parents’ education, income
o controls for other measurable differences
• fixed effects models
o e.g., changes in poverty -> changes in test scores
o controls for constant, unmeasured differences
• instrumental variables
o find an instrument that affects x1 but not y (except through x1)
• experiments
o e.g., Moving to Opportunity
o randomize increases in $
Alternative Explanations
Example: Fertility -> Lower Mothers’ LFP
Possible solutions:
• multivariate models
o e.g., control for gender attitudes
o controls for other measurable differences
• fixed effects models
o e.g., changes in # children -> dropping out
o controls for constant, unmeasured differences
• instrumental variables
o find an instrument that affects x1 but not y (except through x1)
o e.g., mothers of two same-sex children
• experiments
o not feasible (or ethical)
Types of 3-variable Causal Models
• Spurious
o x2 causes both x1 and y
o e.g., religion causes fertility and women’s LFP
• Intervening
o x1 causes x2 which causes y
o e.g., fertility raises time spent on children, which lowers time in the labor force
• What is the statistical difference between these?
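A simulation suggests one answer. The Python sketch below generates hypothetical data under each causal structure, with all effects set to 1; the data, the names, and the residual-on-residual method for controlling x2 are illustrative choices, not the lecture's. In both cases x1 and y are associated in a bivariate regression, and in both cases the slope on x1 falls to about zero once x2 is controlled. This is why the two models cannot be told apart from the associations alone; telling them apart relies on time order and theory.

```python
import random

def slope(x, y):
    """Bivariate least squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals from the bivariate regression of y on x."""
    b = slope(x, y)
    a = sum(y) / len(y) - b * sum(x) / len(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def partial_slope(x1, x2, y):
    """Slope of y on x1 controlling for x2 (regress residual on residual)."""
    return slope(residuals(x2, x1), residuals(x2, y))

rng = random.Random(2)   # fixed seed so the run is repeatable
n = 20000

def noise():
    return rng.gauss(0, 1)

# Spurious: x2 causes both x1 and y.
s_x2 = [noise() for _ in range(n)]
s_x1 = [v + noise() for v in s_x2]
s_y = [v + noise() for v in s_x2]

# Intervening: x1 causes x2, which causes y.
i_x1 = [noise() for _ in range(n)]
i_x2 = [v + noise() for v in i_x1]
i_y = [v + noise() for v in i_x2]

spurious_bivariate = slope(s_x1, s_y)
spurious_partial = partial_slope(s_x1, s_x2, s_y)
intervening_bivariate = slope(i_x1, i_y)
intervening_partial = partial_slope(i_x1, i_x2, i_y)

print(f"spurious:    bivariate b = {spurious_bivariate:.3f}, controlling x2 = {spurious_partial:.3f}")
print(f"intervening: bivariate b = {intervening_bivariate:.3f}, controlling x2 = {intervening_partial:.3f}")
```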
Another type of 3-variable relationship: Statistical Interaction Effects
Example: Fertility -> Lower Mothers’ LFP
The relationship between x1 and y depends on the value of another variable, x2
• e.g., marital status -> earnings depends on gender
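One common way to write such a relationship is to add a product term to the regression equation. The notation below is a sketch; the symbols α and β1 through β3 are assumptions, not taken from the slides.

```latex
\[ E(y) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \]
\[ \text{slope of } y \text{ on } x_1 \text{ at a given } x_2 : \quad \beta_1 + \beta_3 x_2 \]
```

If x2 is coded 0 or 1 (for example, gender), the slope for x1 (for example, marital status) is β1 in the group coded 0 and β1 + β3 in the group coded 1. With β3 = 0 there is no interaction and the slope is the same in both groups.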