AMMBR II
Gerrit Rooks
Today
• Introduction to Stata
  – Files / directories
  – Stata syntax
  – Useful commands / functions
• Logistic regression analysis with Stata
  – Estimation
  – GOF
  – Coefficients
  – Checking assumptions
Stata file types
• .ado – programs that add commands to Stata
• .do – batch files that execute a set of Stata commands
• .dta – data file in Stata's format
• .log – output saved as plain text by the log using command
The working directory
• The working directory is the default directory for any file operations such as using & saving data, or logging output
• cd "d:\my work\"
Saving output to log files
• Syntax for the log command– log using filename [, append replace [smcl|text]]
• To close a log file– log close
Using and saving datasets
• Load a Stata dataset – use d:\myproject\data.dta, clear
• Save – save d:\myproject\data, replace
• Or change the working directory first
  – cd d:\myproject
  – use data, clear
  – save data, replace
Entering data
• Data in other formats
  – You can use SPSS to convert data
  – You can use the infile and insheet commands to import data in ASCII format
• Entering data by hand
  – Type edit or just click on the data-editor button
Do-files
• You can create a text file that contains a series of commands
• Use the do-editor to work with do-files
• Example I
Adding comments
• // or * denote comments Stata should ignore
• Stata ignores whatever follows after /// and treats the next line as a continuation
• Example II
A recommended structure

// if a log file is open, close it
capture log close
// don't pause when output scrolls off the page
set more off
// change directory to your working directory
cd d:\myproject
// log results to file myfile.log
log using myfile, replace text
// * myfile.do - written 7 feb 2010 to illustrate do-files

your commands here

// close the log file
log close
Serious data analysis
• Ensure replicability: use do + log files
• Document your do-files
  – What is obvious today is baffling in six months
• Keep a research log
  – A diary that includes a description of every program you run
• Develop a system for naming files
Serious data analysis
• New variables should be given new names
• Use labels and notes
• Double-check every new variable
• ARCHIVE
The Stata syntax
• regress y x1 x2 if x3 < 20, cluster(x4)
1. regress = command
   – What action do you want performed?
2. y x1 x2 = names of variables, files or other objects
   – On what things is the command performed?
3. if x3 < 20 = qualifier on observations
   – On which observations should the command be performed?
4. , cluster(x4) = options
   – What special things should be done in executing the command?
Examples
• tabulate smoking race if agemother > 30, row
• Example of the if qualifier– sum agemother if smoking == 1 & weightmother < 100
Elements used for logical statements
Operator Definition Example
== Equal to If male == 1
!= Not equal to If male !=1
> Greater than If age > 20
>= Greater than or equal to If age >=21
< Less than If age<66
<= Less than or equal to If age<=65
& And If age==21&male ==1
| or If age<=21|age>=65
Missing values
• Missing values are automatically excluded when Stata fits models; they are stored as the largest positive values
• Beware
  – The expression 'age > 65' can thus also include missing values
  – To be sure, type: 'age > 65 & age != .'
Selecting observations
• drop varlist
• keep varlist
• drop if age < 65
Creating new variables
• generate command
  – generate age2 = age * age
  – see help function for the available functions
  – !! Sometimes the egen command is a useful alternative, for instance:
  – egen meanage = mean(age)
Useful functions

Function  Definition        Example
+         Addition          gen y = a + b
-         Subtraction       gen y = a - b
/         Division          gen density = population/area
*         Multiplication    gen y = a*b
^         Raise to a power  gen y = a^3
ln        Natural log       gen lnwage = ln(wage)
exp       Exponential       gen y = exp(b)
sqrt      Square root       gen agesqrt = sqrt(age)
Replace command
• replace has the same syntax as generate but is used to change values of a variable that already exists
• gen age_dum = .• replace age = 0 if age < 5• replace age = 1 if age >=5
Recode
• Change values of existing variables
  – Change 1 to 2 and 3 to 4:
    recode origvar (1=2)(3=4), gen(myvar1)
  – Change missings to 1:
    recode origvar (.=1), gen(myvar2)
Logistic regression
• Let's use a set of data collected by the state of California from 1200 high schools measuring academic achievement.
• Our dependent variable is called hiqual.
• Our predictor will be avg_ed, a continuous measure (ranging from 1 to 5) of the average education of the parents of the students in the participating high schools.
OLS in Stata
. use "D:\Onderwijs\AMMBR\apilog.dta", clear

. regress hiqual avg_ed

    Source |        SS      df        MS        Number of obs =    1158
     Model |  126.002822     1  126.002822     F(1, 1156)    = 1135.65
  Residual |  128.260563  1156  .110952044     Prob > F      =  0.0000
     Total |  254.263385  1157  .219760921     R-squared     =  0.4956
                                               Adj R-squared =  0.4951
                                               Root MSE      =  .33309

    hiqual |     Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
    avg_ed |  .4287064   .0127215   33.70   0.000    .4037467   .4536662
     _cons |  -.855187   .0363792  -23.51   0.000   -.9265637  -.7838102
. predict yhat
(option xb assumed; fitted values)
(42 missing values generated)

. twoway scatter yhat hiqual avg_ed, connect(l) ylabel(0 1)

[Scatter plot: fitted values and hiqual (0/1, "Hi Quality School, Hi vs Not") against avg parent ed (1 to 5)]
Logistic regression in Stata
. logit hiqual avg_ed

Iteration 0: log likelihood = -730.68708
Iteration 1: log likelihood = -386.86717
Iteration 2: log likelihood = -355.09635
Iteration 3: log likelihood = -353.94368
Iteration 4: log likelihood = -353.94352
Iteration 5: log likelihood = -353.94352

Logistic regression                     Number of obs = 1158
                                        LR chi2(1)    = 753.49
                                        Prob > chi2   = 0.0000
Log likelihood = -353.94352             Pseudo R2     = 0.5156

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    avg_ed |  3.910475   .2383352   16.41   0.000    3.443347   4.377603
     _cons | -12.30333    .731532  -16.82   0.000   -13.73711  -10.86956
. predict yhat1
(option pr assumed; Pr(hiqual))
(42 missing values generated)

. twoway scatter yhat1 hiqual avg_ed, connect(l i) msymbol(i O) sort ylabel(0 1)

[Scatter plot: Pr(hiqual) and hiqual (0/1, "Hi Quality School, Hi vs Not") against avg parent ed (1 to 5)]
E(Y|X) = 1 / (1 + e^-(-12.3 + 3.9·X))
Multiple predictors
. logit hiqual yr_rnd avg_ed

Iteration 0: log likelihood = -730.68708
Iteration 1: log likelihood = -384.29232
Iteration 2: log likelihood = -349.81276
Iteration 3: log likelihood = -348.24638
Iteration 4: log likelihood = -348.2462
Iteration 5: log likelihood = -348.2462

Logistic regression                     Number of obs = 1158
                                        LR chi2(2)    = 764.88
                                        Prob > chi2   = 0.0000
Log likelihood = -348.2462              Pseudo R2     = 0.5234

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    yr_rnd | -1.091038   .3425665   -3.18   0.001   -1.762456  -.4196197
    avg_ed |   3.86531   .2411152   16.03   0.000    3.392733   4.337887
     _cons | -12.05417    .739755  -16.29   0.000   -13.50407  -10.60428
Model fit: the likelihood ratio test
LR chi2 = 2[LL(New) - LL(baseline)]
Model fit: LR test
. logit hiqual yr_rnd avg_ed

Iteration 0: log likelihood = -730.68708
Iteration 1: log likelihood = -384.29232
Iteration 2: log likelihood = -349.81276
Iteration 3: log likelihood = -348.24638
Iteration 4: log likelihood = -348.2462
Iteration 5: log likelihood = -348.2462

Logistic regression                     Number of obs = 1158
                                        LR chi2(2)    = 764.88
                                        Prob > chi2   = 0.0000
Log likelihood = -348.2462              Pseudo R2     = 0.5234

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    yr_rnd | -1.091038   .3425665   -3.18   0.001   -1.762456  -.4196197
    avg_ed |   3.86531   .2411152   16.03   0.000    3.392733   4.337887
     _cons | -12.05417    .739755  -16.29   0.000   -13.50407  -10.60428

. di 2*(-348.2462+730.68708)
764.88176
Pseudo R2: proportional change in LL
. logit hiqual yr_rnd avg_ed

Iteration 0: log likelihood = -730.68708
Iteration 1: log likelihood = -384.29232
Iteration 2: log likelihood = -349.81276
Iteration 3: log likelihood = -348.24638
Iteration 4: log likelihood = -348.2462
Iteration 5: log likelihood = -348.2462

Logistic regression                     Number of obs = 1158
                                        LR chi2(2)    = 764.88
                                        Prob > chi2   = 0.0000
Log likelihood = -348.2462              Pseudo R2     = 0.5234

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    yr_rnd | -1.091038   .3425665   -3.18   0.001   -1.762456  -.4196197
    avg_ed |   3.86531   .2411152   16.03   0.000    3.392733   4.337887
     _cons | -12.05417    .739755  -16.29   0.000   -13.50407  -10.60428

. di (730.68708-348.2462)/730.68708
.52339899
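Both fit statistics follow directly from the two log likelihoods in the iteration log. A quick check in Python (values copied from the output; an illustration, not part of the original slides):

```python
# Log likelihoods from the iteration log
ll_null = -730.68708   # iteration 0: intercept-only model
ll_full = -348.2462    # final log likelihood

lr_chi2 = 2 * (ll_full - ll_null)   # matches "LR chi2(2) = 764.88"
pseudo_r2 = 1 - ll_full / ll_null   # matches "Pseudo R2 = 0.5234"

print(round(lr_chi2, 5))    # 764.88176
print(round(pseudo_r2, 4))  # 0.5234
```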
Classification Table
. estat class

Logistic model for hiqual

              -------- True --------
Classified |      D       ~D   |   Total
    +      |      0        0   |       0
    -      |    391      809   |    1200
  Total    |    391      809   |    1200

Classified + if predicted Pr(D) >= .5
True D defined as hiqual != 0

Sensitivity                     Pr( +| D)     0.00%
Specificity                     Pr( -|~D)   100.00%
Positive predictive value       Pr( D| +)        .%
Negative predictive value       Pr(~D| -)    67.42%
False + rate for true ~D        Pr( +|~D)     0.00%
False - rate for true D         Pr( -| D)   100.00%
False + rate for classified +   Pr(~D| +)        .%
False - rate for classified -   Pr( D| -)    32.58%
Correctly classified                         67.42%
Classification Table
. estat class

Logistic model for hiqual

              -------- True --------
Classified |      D       ~D   |   Total
    +      |    288       58   |     346
    -      |     89      723   |     812
  Total    |    377      781   |    1158

Classified + if predicted Pr(D) >= .5
True D defined as hiqual != 0

Sensitivity                     Pr( +| D)    76.39%
Specificity                     Pr( -|~D)    92.57%
Positive predictive value       Pr( D| +)    83.24%
Negative predictive value       Pr(~D| -)    89.04%
False + rate for true ~D        Pr( +|~D)     7.43%
False - rate for true D         Pr( -| D)    23.61%
False + rate for classified +   Pr(~D| +)    16.76%
False - rate for classified -   Pr( D| -)    10.96%
Correctly classified                         87.31%
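The summary percentages are simple ratios of the cells in the classification table. A sketch in Python using the counts above (an illustration, not part of the original slides):

```python
# Cells of the classification table (cutoff Pr(D) >= .5)
tp, fp = 288, 58   # classified +: true D, true ~D
fn, tn = 89, 723   # classified -: true D, true ~D

sensitivity = tp / (tp + fn)               # Pr(+|D)
specificity = tn / (tn + fp)               # Pr(-|~D)
accuracy = (tp + tn) / (tp + fp + fn + tn) # correctly classified

print(f"{sensitivity:.2%}")  # 76.39%
print(f"{specificity:.2%}")  # 92.57%
print(f"{accuracy:.2%}")     # 87.31%
```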
Interpreting coefficients: significance
. logit hiqual yr_rnd avg_ed, nolog

Logistic regression                     Number of obs = 1158
                                        LR chi2(2)    = 764.88
                                        Prob > chi2   = 0.0000
Log likelihood = -348.2462              Pseudo R2     = 0.5234

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    yr_rnd | -1.091038   .3425665   -3.18   0.001   -1.762456  -.4196197
    avg_ed |   3.86531   .2411152   16.03   0.000    3.392733   4.337887
     _cons | -12.05417    .739755  -16.29   0.000   -13.50407  -10.60428

Wald z = b / SE(b)
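The z statistics in the table are just each coefficient over its standard error. Checking in Python with the values above (an illustration, not part of the original slides):

```python
# Coefficient and standard error for avg_ed from the logit output
b, se = 3.86531, 0.2411152
z = b / se
print(round(z, 2))  # 16.03, as in the output

# And for yr_rnd
print(round(-1.091038 / 0.3425665, 2))  # -3.18
```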
Comparing models
. logit hiqual yr_rnd avg_ed

Iteration 0: log likelihood = -730.68708
Iteration 1: log likelihood = -384.29232
Iteration 2: log likelihood = -349.81276
Iteration 3: log likelihood = -348.24638
Iteration 4: log likelihood = -348.2462
Iteration 5: log likelihood = -348.2462

Logistic regression                     Number of obs = 1158
                                        LR chi2(2)    = 764.88
                                        Prob > chi2   = 0.0000
Log likelihood = -348.2462              Pseudo R2     = 0.5234

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    yr_rnd | -1.091038   .3425665   -3.18   0.001   -1.762456  -.4196197
    avg_ed |   3.86531   .2411152   16.03   0.000    3.392733   4.337887
     _cons | -12.05417    .739755  -16.29   0.000   -13.50407  -10.60428
. est store full_model

• After estimating and storing the full model, estimate the nested model on the same sample

. logit hiqual avg_ed if e(sample)

Iteration 0: log likelihood = -730.68708
Iteration 1: log likelihood = -386.86717
Iteration 2: log likelihood = -355.09635
Iteration 3: log likelihood = -353.94368
Iteration 4: log likelihood = -353.94352
Iteration 5: log likelihood = -353.94352

Logistic regression                     Number of obs = 1158
                                        LR chi2(1)    = 753.49
                                        Prob > chi2   = 0.0000
Log likelihood = -353.94352             Pseudo R2     = 0.5156

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    avg_ed |  3.910475   .2383352   16.41   0.000    3.443347   4.377603
     _cons | -12.30333    .731532  -16.82   0.000   -13.73711  -10.86956
Likelihood ratio test
. lrtest full_model

Likelihood-ratio test                   LR chi2(1)  =  11.39
(Assumption: . nested in full_model)    Prob > chi2 = 0.0007
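The lrtest statistic is twice the gap between the two log likelihoods. Verified in Python (values copied from the outputs; an illustration, not part of the original slides):

```python
# Final log likelihoods of the two nested models
ll_full = -348.2462     # model with yr_rnd and avg_ed
ll_nested = -353.94352  # model with avg_ed only

lr_chi2 = 2 * (ll_full - ll_nested)
print(round(lr_chi2, 2))  # 11.39, matching lrtest
```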
Interpretation of coefficients: direction
. listcoef

logit (N=1158): Factor Change in Odds

Odds of: high vs not_high

---------------------------------------------------------------
 hiqual |        b        z    P>|z|      e^b   e^bStdX   SDofX
--------+------------------------------------------------------
 yr_rnd | -1.09104   -3.185   0.001   0.3359    0.6593  0.3819
 avg_ed |  3.86531   16.031   0.000  47.7180   19.5978  0.7698
---------------------------------------------------------------
logit = ln( p(y) / (1 - p(y)) ) = b0 + b1x1 + b2x2 + ... + bnxn
Interpretation of coefficients: direction
. listcoef

logit (N=1158): Factor Change in Odds

Odds of: high vs not_high

---------------------------------------------------------------
 hiqual |        b        z    P>|z|      e^b   e^bStdX   SDofX
--------+------------------------------------------------------
 yr_rnd | -1.09104   -3.185   0.001   0.3359    0.6593  0.3819
 avg_ed |  3.86531   16.031   0.000  47.7180   19.5978  0.7698
---------------------------------------------------------------
Odds = p(y) / (1 - p(y)) = e^(b0) · e^(b1x1) · e^(b2x2) · ... · e^(bnxn)
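The factor changes reported by listcoef are just exponentiated coefficients. A Python check of the e^b and e^(b·SDofX) columns (values copied from the output; an illustration, not part of the original slides):

```python
import math

# Coefficients and SD of avg_ed from the listcoef output
b_avg_ed, sd_avg_ed = 3.86531, 0.7698
b_yr_rnd = -1.09104

print(round(math.exp(b_avg_ed), 3))              # ~47.718 (e^b column)
print(round(math.exp(b_avg_ed * sd_avg_ed), 2))  # ~19.60  (e^bStdX column)
print(round(math.exp(b_yr_rnd), 4))              # ~0.3359 (e^b column)
```

So one extra point of average parental education multiplies the odds of being a high-quality school by about 48, while being a year-round school multiplies them by about a third.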
Interpretation of coefficients: Magnitude
. logit hiqual yr_rnd avg_ed, nolog

Logistic regression                     Number of obs = 1158
                                        LR chi2(2)    = 764.88
                                        Prob > chi2   = 0.0000
Log likelihood = -348.2462              Pseudo R2     = 0.5234

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    yr_rnd | -1.091038   .3425665   -3.18   0.001   -1.762456  -.4196197
    avg_ed |   3.86531   .2411152   16.03   0.000    3.392733   4.337887
     _cons | -12.05417    .739755  -16.29   0.000   -13.50407  -10.60428

E(Y|X) = 1 / (1 + e^-(-12 + 3.9·avg_ed - 1.1·yr_rnd))
Interpretation of coefficients: Magnitude
E(Y|X) = 1 / (1 + e^-(-12 + 3.9·avg_ed - 1.1·yr_rnd))
. summ avg_ed yr_rnd

  Variable |   Obs       Mean   Std. Dev.   Min   Max
    avg_ed |  1158   2.754212   .7697744      1     5
    yr_rnd |  1200        .18   .3843476      0     1

. di 1/(1+exp(12-3.9*2.75))
.21840254

. di 1/(1+exp(12-3.9*2.75+1.1))
.08509905
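The two di calculations can be reproduced outside Stata. A Python sketch using the rounded coefficients and the mean of avg_ed (an illustration, not part of the original slides):

```python
import math

def prob(avg_ed, yr_rnd):
    """Pr(hiqual=1) with rounded coefficients b0=-12, b_avg_ed=3.9, b_yr_rnd=-1.1."""
    return 1 / (1 + math.exp(-(-12 + 3.9 * avg_ed - 1.1 * yr_rnd)))

# At the mean of avg_ed (2.75): year-round vs not year-round school
print(round(prob(2.75, 0), 4))  # ~0.2184
print(round(prob(2.75, 1), 4))  # ~0.0851
```

Holding avg_ed at its mean, switching to a year-round school cuts the predicted probability of being high quality from about 0.22 to about 0.09.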
The assumptions of logistic regression
• The true conditional probabilities are a logistic function of the independent variables.
• No important variables are omitted.
• No extraneous variables are included.
• The independent variables are measured without error.
• The observations are independent.
• The independent variables are not linear combinations of each other.
Hosmer & Lemeshow
The test divides the sample into subgroups and checks whether the difference between observed and predicted counts is about equal across these groups.
The test should not be significant (indicating no difference).
Hosmer & Lemeshow
HL = Σj (Oj - nj·p̄j)² / (nj·p̄j·(1 - p̄j)), where p̄j is the average probability in the jth group
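A minimal sketch of the statistic in Python (a hypothetical helper, not Stata's estat gof; groups are assumed to be given as observed events, expected events, and group sizes):

```python
def hosmer_lemeshow(obs1, exp1, totals):
    """Sum over groups of (O_j - E_j)^2 / (n_j * pbar_j * (1 - pbar_j)),
    where E_j = n_j * pbar_j and pbar_j is the average predicted
    probability in the jth group."""
    hl = 0.0
    for o, e, n in zip(obs1, exp1, totals):
        pbar = e / n  # average probability in the jth group
        hl += (o - e) ** 2 / (n * pbar * (1 - pbar))
    return hl

# Toy example: perfect agreement between observed and expected gives 0
print(hosmer_lemeshow([5], [5.0], [10]))  # 0.0
```

The resulting statistic is compared to a chi-squared distribution with (number of groups - 2) degrees of freedom.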
First logistic regression
. logit hiqual yr_rnd meals cred_ml

Iteration 0: log likelihood = -349.01971
Iteration 1: log likelihood = -199.10312
Iteration 2: log likelihood = -160.11854
Iteration 3: log likelihood = -156.27132
Iteration 4: log likelihood = -156.25612
Iteration 5: log likelihood = -156.25611

Logistic regression                     Number of obs =  707
                                        LR chi2(3)    = 385.53
                                        Prob > chi2   = 0.0000
Log likelihood = -156.25611             Pseudo R2     = 0.5523

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    yr_rnd | -1.189537   .5022235   -2.37   0.018   -2.173877  -.2051967
     meals |    -.0936   .0084587  -11.07   0.000   -.1101786  -.0770213
   cred_ml |  .7406536   .3152647    2.35   0.019    .1227463   1.358561
     _cons |  2.425635   .3995025    6.07   0.000    1.642624   3.208645
Then postestimation command
. estat gof, table group(10)

Logistic model for hiqual, goodness-of-fit test

(Table collapsed on quantiles of estimated probabilities)

Group    Prob  Obs_1  Exp_1  Obs_0  Exp_0  Total
    1  0.0008      1    0.0     71   72.0     72
    2  0.0019      1    0.1     71   71.9     72
    3  0.0037      0    0.2     71   70.8     71
    4  0.0078      0    0.4     68   67.6     68
    5  0.0208      1    0.9     71   71.1     72
    6  0.0560      2    2.4     68   67.6     70
    7  0.1554      4    7.4     68   64.6     72
    8  0.4960     23   22.0     47   48.0     70
    9  0.7531     44   43.5     26   26.5     70
   10  0.9595     62   61.1      8    8.9     70

number of observations =   707
number of groups       =    10
Hosmer-Lemeshow chi2(8) = 40.45
Prob > chi2            = 0.0000
Specification error
. logit hiqual yr_rnd meals cred_ml, nolog

Logistic regression                     Number of obs =  707
                                        LR chi2(3)    = 385.53
                                        Prob > chi2   = 0.0000
Log likelihood = -156.25611             Pseudo R2     = 0.5523

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    yr_rnd | -1.189537   .5022235   -2.37   0.018   -2.173877  -.2051967
     meals |    -.0936   .0084587  -11.07   0.000   -.1101786  -.0770213
   cred_ml |  .7406536   .3152647    2.35   0.019    .1227463   1.358561
     _cons |  2.425635   .3995025    6.07   0.000    1.642624   3.208645

. linktest, nolog

Logistic regression                     Number of obs =  707
                                        LR chi2(2)    = 392.32
                                        Prob > chi2   = 0.0000
Log likelihood = -152.86003             Pseudo R2     = 0.5620

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
      _hat |  1.215465   .1283978    9.47   0.000    .9638102    1.46712
    _hatsq |  .0748928   .0263911    2.84   0.005    .0231673   .1266184
     _cons | -.1408008   .1637332   -0.86   0.390   -.4617121   .1801105
Including an interaction term helps

. gen ym = yr_rnd*meals

. logit hiqual yr_rnd meals cred_ml ym, nolog

Logistic regression                     Number of obs =  707
                                        LR chi2(4)    = 390.46
                                        Prob > chi2   = 0.0000
Log likelihood = -153.78831             Pseudo R2     = 0.5594

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    yr_rnd | -2.834458   .8630901   -3.28   0.001   -4.526083  -1.142832
     meals | -.1019211   .0098691  -10.33   0.000   -.1212641  -.0825781
   cred_ml |  .7789823   .3206881    2.43   0.015    .1504452   1.407519
        ym |  .0463257   .0188326    2.46   0.014    .0094145   .0832368
     _cons |  2.686005   .4307661    6.24   0.000    1.841719   3.530291
. estat gof, table group(10)

Logistic model for hiqual, goodness-of-fit test

(Table collapsed on quantiles of estimated probabilities)

Group    Prob  Obs_1  Exp_1  Obs_0  Exp_0  Total
    1  0.0015      0    0.1     71   70.9     71
    2  0.0033      1    0.2     73   73.8     74
    3  0.0054      0    0.3     74   73.7     74
    4  0.0095      1    0.5     63   63.5     64
    5  0.0204      1    1.0     70   70.0     71
    6  0.0620      4    2.5     69   70.5     73
    7  0.1420      2    6.5     66   61.5     68
    8  0.4745     24   22.0     50   52.0     74
    9  0.7725     44   43.4     25   25.6     69
   10  0.9697     61   61.5      8    7.5     69

number of observations =   707
number of groups       =    10
Hosmer-Lemeshow chi2(8) =  9.25
Prob > chi2            = 0.3215
Ok now
. linktest

Iteration 0: log likelihood = -349.01971
Iteration 1: log likelihood = -174.14403
Iteration 2: log likelihood = -156.07793
Iteration 3: log likelihood = -153.49407
Iteration 4: log likelihood = -153.36857
Iteration 5: log likelihood = -153.36794
Iteration 6: log likelihood = -153.36794

Logistic regression                     Number of obs =  707
                                        LR chi2(2)    = 391.30
                                        Prob > chi2   = 0.0000
Log likelihood = -153.36794             Pseudo R2     = 0.5606

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
      _hat |  1.067861   .1160715    9.20   0.000    .8403653   1.295357
    _hatsq |  .0297354   .0317399    0.94   0.349   -.0324737   .0919445
     _cons | -.0644637   .1684527   -0.38   0.702   -.3946249   .2656976
Multicollinearity
. reg hiqual avg_ed yr_rnd meals

    Source |        SS      df        MS        Number of obs =   1158
     Model |  145.983509     3  48.6611696     F(3, 1154)    = 518.61
  Residual |  108.279876  1154  .093830049     Prob > F      = 0.0000
     Total |  254.263385  1157  .219760921     R-squared     = 0.5741
                                               Adj R-squared = 0.5730
                                               Root MSE      = .30632

    hiqual |     Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
    avg_ed |  .1729601    .021089    8.20   0.000    .1315831   .2143371
    yr_rnd | -.0008586   .0248112   -0.03   0.972   -.0495386   .0478215
     meals | -.0076084    .000527  -14.44   0.000   -.0086423  -.0065744
     _cons |  .2445202   .0824989    2.96   0.003    .0826554   .4063849

. vif

  Variable |   VIF    1/VIF
     meals |  3.31   0.301982
    avg_ed |  3.25   0.307731
    yr_rnd |  1.11   0.903460
  Mean VIF |  2.56
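Each VIF is the reciprocal of the tolerance (1 - R² from regressing that predictor on the others). A Python check against the 1/VIF column above (an illustration, not part of the original slides):

```python
# Tolerances (1/VIF) copied from the vif output
tolerances = {"meals": 0.301982, "avg_ed": 0.307731, "yr_rnd": 0.903460}

vifs = {v: 1 / t for v, t in tolerances.items()}
for v, vif in vifs.items():
    print(v, round(vif, 2))   # 3.31, 3.25, 1.11

mean_vif = sum(vifs.values()) / len(vifs)
print(round(mean_vif, 2))     # 2.56
```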
Influential observations
. predict p
(option pr assumed; Pr(hiqual))
(42 missing values generated)

. predict stdres, rstand
(42 missing values generated)
. scatter stdres p, mlabel(snum)

[Scatter plot: standardized Pearson residuals against Pr(hiqual) (0 to 1), points labeled by school number (snum); one school, snum 1403, stands out with a large residual]
. list if snum==1403

  snum  dnum  schqual  hiqual  yr_rnd  meals  enroll  cred  cred_ml
  1403   315     high    high    yrnd    100     497   low      low

  cred_hl   pared  pared_ml  pared_hl  api00  api99  full  some_col
      low  medium    medium         .    808    824    59        28

  awards  ell  avg_ed  hicred   ym
      No   27    2.19       0  100
. logit hiqual yr_rnd meals avg_ed, nolog

Logistic regression                     Number of obs = 1158
                                        LR chi2(3)    = 914.05
                                        Prob > chi2   = 0.0000
Log likelihood = -273.66402             Pseudo R2     = 0.6255

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    yr_rnd | -.9913148   .3743452   -2.65   0.008   -1.725018  -.2576117
     meals | -.0758864   .0074453  -10.19   0.000    -.090479  -.0612938
    avg_ed |   1.98805   .2884154    6.89   0.000    1.422766   2.553334
     _cons | -3.566451    1.01715   -3.51   0.000   -5.560028  -1.572874

. logit hiqual yr_rnd meals avg_ed if snum != 1403

Iteration 0: log likelihood = -729.56398
Iteration 1: log likelihood = -332.43297
Iteration 2: log likelihood = -270.06297
Iteration 3: log likelihood = -265.70542
Iteration 4: log likelihood = -265.68934
Iteration 5: log likelihood = -265.68934

Logistic regression                     Number of obs = 1157
                                        LR chi2(3)    = 927.75
                                        Prob > chi2   = 0.0000
Log likelihood = -265.68934             Pseudo R2     = 0.6358

    hiqual |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    yr_rnd |   -1.1328   .3842377   -2.95   0.003   -1.885892  -.3797077
     meals | -.0790397   .0076984  -10.27   0.000   -.0941283  -.0639511
    avg_ed |  2.010791   .2947269    6.82   0.000    1.433137   2.588445
     _cons | -3.528875   1.037345   -3.40   0.001   -5.562035  -1.495716
If we have enough time left
• Perform a logistic regression analysis
• Use apilog.dta
• awards = dependent variable