jackd/Stat342/Lect_Wk13.pdf · 2016. 12. 15.

Stat 342 - Wk 13: Plotting and final exam prep.

Unfinished From last week:

Binary responses (proc logistic)

Ordinal and multinomial responses (proc logistic)

New this week:

ODS Graphics vs SAS/Graph

Graphing Examples (KDE, LOESS, Interaction, ROC)

Exam Topics

TA Evaluation

Stat 342 Notes. Week 12 Page 1 / 58

The basic syntax of PROC LOGISTIC follows the same pattern as the GLM and GLMSELECT procedures.

However, PROC LOGISTIC does not support random effects.

proc logistic data=copenhagen;

class <categorical predictors>;

model <response> = <explanatory> / <options>;

freq <varname>;

output out =<dataset> <var=newname>;

run;

For example, to predict the level of neighbour contact:

proc logistic data=copenhagen;

class housing influence satisfaction;

freq n;

model contact = housing influence satisfaction;

run;
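
As a sketch of how the OUTPUT statement from the syntax template might be filled in for this model: the data set name preds and the variable name phat are made up for illustration, and p= is the keyword that requests the predicted probability.

```sas
proc logistic data=copenhagen;
  class housing influence satisfaction;
  freq n;
  model contact = housing influence satisfaction;
  /* preds and phat are hypothetical names; p= requests the predicted probability */
  output out=preds p=phat;
run;

/* look at the first few rows of the new data set */
proc print data=preds(obs=5);
run;
```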

Here, everything is done predicting the chance of 'high'. That choice was made by SAS (by default it models the response level that sorts first, and 'high' comes before 'low' alphabetically), and it may not be the category that we wanted as our 'yes' category.

To change this, define the 'yes' category that you want with the EVENT= option in the MODEL statement.

model contact(event = 'low') = housing influence satisfaction;

Significance levels are the same, but the signs of the estimates are reversed.
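
A quick way to see why only the signs change: switching the event swaps the probability p for 1 - p, and on the log-odds scale that is just a sign flip.

```latex
\[
\operatorname{logit}(1-p) = \log\frac{1-p}{p} = -\log\frac{p}{1-p} = -\operatorname{logit}(p)
\]
```

So every coefficient on the log-odds scale is multiplied by -1, while the standard errors and p-values stay as they were.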

Don't confuse logistic models with logical models.

Model selection is done with the SELECTION option in the MODEL statement after the slash.

The DETAILS option tells SAS to report on the entire model selection process, not just the end result.

proc logistic data=copenhagen;

class housing influence satisfaction;

freq n;

model contact = housing influence satisfaction / selection = stepwise details;

run;

The ODDSRATIO statement shows the odds ratio (and its confidence interval) of each category compared to the baseline for your selected variable.

proc logistic data=copenhagen;

class housing influence satisfaction;

freq n;

model contact = housing influence satisfaction;

oddsratio housing;

run;

There is more than one link function, which is the function used to convert a probability, which is bounded by [0,1], into something that is unbounded.

This is important because logistic regression is doing something very similar to linear regression in its basic mechanics, and linear regression depends on the response being a continuous variable which can, theoretically, take any value.

The default link function is the 'logit' link, which is the one we use to put log-odds in the place of probability.

One common alternative is the 'probit' link, which uses the inverse CDF of the normal distribution instead. The theory behind why one link is selected over any other link is graduate-level theory (in Generalized Linear Models), so for now I recommend using the default 'logit' most of the time.
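
For reference, the two links can be written side by side. The symbols are notation added here: p is the probability of the event, g is the link, Phi is the standard normal CDF, and x_1, ..., x_m are the explanatory terms.

```latex
\[
\text{logit: } g(p) = \log\frac{p}{1-p},
\qquad
\text{probit: } g(p) = \Phi^{-1}(p),
\qquad
g(p) = \beta_0 + \beta_1 x_1 + \dots + \beta_m x_m
\]
```

Both links stretch the interval (0,1) out to the whole real line, which is what lets the right-hand side look like an ordinary regression.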

If there are numerical issues (e.g. failure to converge, nonsense summary data) with the logit link, you can treat probit as an 'alternate mode' of logistic regression, which may have better luck.

proc logistic data=copenhagen;

class housing influence satisfaction;

freq n;

model contact = housing influence satisfaction / link=probit;

run;

Why deal with link functions at all? Because there's another procedure called proc probit, which is like proc logistic, but is older with fewer features.

To avoid having to learn an outdated proc, if you ever have to use a probit link instead of a logit link, then just use the LINK option in the MODEL statement of PROC LOGISTIC.

We can also get additional summary data, such as Nagelkerke's R-squared (a logistic version of the regular R-squared) and the confidence limits of the odds ratios, with the rsquare and clodds options, respectively.

proc logistic data=copenhagen;

class housing influence satisfaction;

freq n;

model contact = housing influence satisfaction / rsquare clodds = wald;

run;

But what if there are more than 2 levels?

If you are trying to make predictions about a categorical response with more than two levels, there's one thing you have to ask before going any further.

Do the categories I wish to predict form a natural ordering (e.g. None, Low, Medium, High, Extreme), or are they just nominal, unordered categories (e.g. Cat, Dog, Dragon, Capybara)?

If the categories are ordered, you can use proc logistic to conduct an ORDINAL LOGISTIC REGRESSION.

Just code your categories into integers {1,2,...,k} and use those coded categories as your response.

Data copenhagen;

set copenhagen;

sat_lvl = 1;

if satisfaction = 'medium' then sat_lvl = 2;

if satisfaction = 'high' then sat_lvl = 3;

run;

The logistic procedure will understand that each integer value is an ordered category.

proc logistic data=copenhagen;

class housing influence contact;

freq n;

model sat_lvl = housing influence contact / rsquare;

oddsratio housing;

run;
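
If you would rather have the effects describe the odds of higher satisfaction, one option is to reverse the response ordering with DESCENDING. This is a sketch added for illustration, not something from the original slides; either way, check the 'Response Profile' table in the output to see which ordering SAS actually used.

```sas
proc logistic data=copenhagen;
  class housing influence contact;
  freq n;
  /* descending reverses the order of the response levels (3, 2, 1) */
  model sat_lvl(descending) = housing influence contact / rsquare;
run;
```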

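With ordinal responses, each effect size is a single change in log-odds that applies at every cut point between adjacent categories (being in the 'next category up'). Check the 'Probabilities modeled' note in the output for the direction, because by default SAS cumulates over the lower ordered values.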

Each response category after the first has its own intercept.
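
In symbols, the standard cumulative-logit (proportional odds) form with k ordered categories and m explanatory terms looks like this. The symbols are notation added here, and the direction of the inequality depends on how SAS orders the response levels.

```latex
\[
\log\frac{P(Y \le j)}{P(Y > j)} = \alpha_j + \beta_1 x_1 + \dots + \beta_m x_m,
\qquad j = 1, \dots, k-1
\]
```

There is one shared set of slopes and k - 1 intercepts, one for each cut point between categories.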

If there is no natural ordering to the categories, you can use the generalized logit to do a logistic regression on a response with several unordered categories.

To do this, use the LINK option and set it to 'glogit'.

proc logistic data=copenhagen;

class contact influence satisfaction;

freq n;

model housing = contact influence satisfaction / link=glogit;

run;

These results show the effect of each variable on the log-odds of any observation having the listed response (compared to the 'baseline' response).
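
Written out, the generalized logit fits one equation for each non-baseline category j. The symbols are notation added here for illustration.

```latex
\[
\log\frac{P(Y = j)}{P(Y = \text{baseline})} = \alpha_j + \beta_{j1} x_1 + \dots + \beta_{jm} x_m
\]
```

Unlike the ordinal model, every non-baseline category gets its own intercept and its own set of slopes, which is why the output table is so much longer.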

ODS Graphics: More entertaining than Nickelback

ODS vs SAS/Graph

Like other statistical software (R, SPSS, JMP), SAS has a base and packages that are installed on top of it.

SAS University Edition only comes with some of these packages, like SAS/IML, which makes it great for programming and research. However, it's missing SAS/Graph, which means that many of the plotting functions in our textbook aren't available.

For our purposes, there are two graphical systems: ODS (Output Delivery System) Graphics and SAS/Graph.

In ODS:

proc sgplot, proc sgscatter, proc sgpanel, proc kde

In SAS/Graph:

proc gplot, proc gcontour, proc gchart, proc g3d, proc gmap

KDE stands for Kernel Density Estimation. It's used to make a smooth estimation of the probability density of a distribution from the points in a data set.

The most commonly used method of KDE is to make a normal curve centred at each data point and add up the densities.

The densities are divided by the sample size N, so that the probability density still integrates to 1. By default, the standard deviations are the same around each point.
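
As a formula, the normal-kernel estimate described above is the following. The symbols are notation added here: x_1, ..., x_N are the data points, h is the common standard deviation (the bandwidth), and phi is the standard normal density.

```latex
\[
\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h}\,\phi\!\left(\frac{x - x_i}{h}\right)
\]
```

Each term in the sum is a normal curve centred at one data point, and each integrates to 1, so the average of N of them integrates to 1 as well.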

KDE is a quick way to compare the distribution of different continuous variables.

proc kde data=ds;

univar x y / plots = densityoverlay;

run;

KDE is like the continuous version of a histogram.

proc kde data=ds;

univar x y / plots = histdensity;

run;

There are a couple of viable options for the plots you can produce with the Kernel Density Estimation procedure, but it's simplest to print them all if you don't mind the output.

This plots=all option works for MANY procedures that use ODS graphics.

proc kde data=ds;

univar x y / plots = all;

run;

In a histogram, we would select (manually or automatically) 'bin widths' for each bar. In KDE, we select the standard deviation around each point (or more generally, the 'bandwidth' in case we're using a kernel other than Normal).

The 'bwm' option multiplies the default, automatically generated bandwidth by a 'BandWidth Multiplier'. A lower bandwidth creates a curve that fits the data more exactly, and a higher bandwidth creates more smoothing.

proc kde data=ds;

univar x (bwm=0.5) x (bwm=1) x (bwm=2) / plots = densityoverlay;

run;

For bivariate plots (contours, 3D surface), use a BIVAR statement:

proc kde data=ds;

bivar x y / plots = all;

run;

Similar to the contour plot is the heatmap, which is a discrete version of the contour plot. This is especially appropriate when one of the variables is categorical/ordinal, or when both variables are whole numbers only.

proc sgplot data=ds;

heatmap y=z x=x;

run;

The colour scheme can be changed, but the default is blue to red.
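
One way to change it, sketched here on the assumption that your SAS release supports the COLORMODEL= option on the HEATMAP statement (the colour names are only an example):

```sas
proc sgplot data=ds;
  /* colormodel= lists the colours from the lowest counts to the highest */
  heatmap y=z x=x / colormodel=(white orange red);
run;
```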

Heatmaps are meant for hundreds or thousands of data points, not n=30.

SAS doesn't have the best reputation for graphics.

LOESS Curves

With scatterplots and regression, we have learned to make lines and curves of best fit by specifying a model in advance and assessing it.

Sometimes it would be easier to start with something more exploratory by letting SAS draw the pattern between two variables before we specify what kind of model we want.

LOESS, or LOcal regrESSion (also called LOWESS: LOcally WEighted Scatterplot Smoothing) is a system that allows you to do that.

Like KDE, it smooths out a function at each observation point, and takes the average. Also like KDE, it relies on a bandwidth setting (called the smoothing parameter in loess), which is usually determined automatically.

Unlike KDE, the variable being averaged is the value of Y at each level of X, rather than simply the number of points at X.
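
One common textbook way to write the local fit at a point x_0 is a weighted least-squares line, where the weights w_i(x_0) shrink as x_i gets further from x_0. This is the generic local-linear form in notation added here, and not necessarily the exact defaults SAS uses.

```latex
\[
(\hat{a}, \hat{b}) = \arg\min_{a,\,b} \sum_{i=1}^{n} w_i(x_0)\,\bigl(y_i - a - b\,(x_i - x_0)\bigr)^2,
\qquad
\hat{y}(x_0) = \hat{a}
\]
```

Repeating this at many values of x_0 and joining up the fitted values gives the smooth curve. A larger smoothing parameter gives weight to more of the points, so the curve bends less.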

Try these.

proc sgplot data=ds;

loess y=z x=x;

run;

proc sgplot data=ds;

loess y=y x=x;

run;

Now try with the smoothing parameter.

proc sgplot data=ds;

loess y=y x=x / smooth=0.5;

run;

proc sgplot data=ds;

loess y=y x=x / smooth=0.2;

run;

Interaction Plot

One important graphical option that was overlooked when we were doing regression was the interaction plot.

An interaction plot gives the mean response (y-axis) for different combinations of two explanatory variables (x-axis and colour).

If the lines are close to parallel, the effects of the two explanatory variables are additive, and there is no evidence of an interaction.

If the lines have different slopes, especially if they cross, then an interaction term may be warranted.

When there are many categories in one or both explanatory variables, some crossing over will happen by random chance. This doesn't necessarily mean that an interaction term will improve the model.
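
In model terms, for two categorical explanatory variables the interaction plot is a picture of the cell means in the usual two-way model. The symbols are notation added here: i and j index the levels of the two variables and k indexes observations within a cell.

```latex
\[
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
\]
```

Exactly parallel lines correspond to every interaction term being zero, so the plot is an informal look at the terms that the block | block2 syntax adds to the model.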

proc glm data=ds;

class block block2;

model z = block | block2;

run;

At least one of the explanatory variables (the colour one) should be categorical, or else you will end up with a different colour for each observed value of the continuous variable.

proc glm data=ds plots=all;

model z = x | y;

run;

proc glm data=ds plots=all;

class block;

model z = x | block;

run;

Receiver Operating Characteristic Curves

Another option for logistic regression is the receiver operating characteristic curve (ROC curve).

These are popular graphs in medical statistics because they can be used to determine the best cutoff to detect some binary response (such as disease status).

proc logistic data=ds plots(only)=roc;

model block2 = x y;

run;

For reference, the best cutoffs are far above the diagonal line. Any cutoff point touching the diagonal line is literally as good as guessing.
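
To make the axes concrete: for a cutoff c on the predicted probability, the ROC curve plots sensitivity (y-axis) against 1 - specificity (x-axis). The notation is added here for illustration, with p-hat as the predicted probability and Y = 1 as the event.

```latex
\[
\text{sensitivity}(c) = P(\hat{p} \ge c \mid Y = 1),
\qquad
1 - \text{specificity}(c) = P(\hat{p} \ge c \mid Y = 0)
\]
```

On the diagonal the two quantities are equal, meaning the cutoff flags events and non-events at the same rate, which is what guessing would do.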

ROC curves also come with a measurement of how well the explanatory variables in the logistic regression predict the response: this is called the Area Under the Curve (AUC).

Higher AUC is better.

An AUC of 0.5 is as good as the null model (no predictors).

An AUC of 1 is perfect – the explanatory variables can perfectly predict the response.
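
A handy way to read the number, ignoring ties: the AUC is the chance that a randomly chosen observation with the event gets a higher predicted probability than a randomly chosen observation without it. The notation is added here for illustration.

```latex
\[
\text{AUC} = P\bigl(\hat{p}_{\text{event}} > \hat{p}_{\text{non-event}}\bigr)
\]
```

That matches the two reference points above: 0.5 is a coin flip and 1 means the events always rank above the non-events.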

For future reference:

Basic ODS Graphics Examples (214 page PDF textbook) by Warren F. Kuhfeld

http://support.sas.com/documentation/prod-p/grstat/9.4/en/PDF/odsbasicg.pdf

Advanced ODS Graphics Examples (250 page PDF textbook) also by Warren F. Kuhfeld

http://support.sas.com/documentation/prod-p/grstat/9.4/en/PDF/odsadvg.pdf

Some parting comments: Certification Exams

The stats department will pay for any of its own students to get their first PASS in a SAS certification exam. As in, the chair will reimburse the $90 exam fee to anyone in the department that presents proof of a passing mark.

If you're planning on taking this, I highly recommend starting with the base programmer certification to get more of the fundamentals that were overlooked in this class.

The textbooks for Base Programmer and Advanced Programmer are available through the SFU library by searching or following this link:

http://search.lib.sfu.ca/?q=sas%20certification%20prep%20guide

After base programmer, 'Statistical Business Analyst' is a good next step, because it matches up nicely with the skills you've learned in this course and other applied statistics / regression / linear modelling courses in the department.

Passing any one of these exams also puts your name in a database that SAS uses for hiring directly (e.g. in their Canada HQ in Toronto, or their world HQ campus at Cary, North Carolina) or for their many clients.

It also gives you an Acclaim badge that connects directly to your LinkedIn profile (I think).

Some parting comments: Stat 342 Exam and Practice Exam

The final exam is Dec 11, 3:30pm to 6:30pm.

Yes, that means it's a Sunday.

The exam is at WMC (West Mall Centre), 3260.

The room is in the general area of Tim Horton's (I think), but it's a bit hard to find because the door number isn't highly visible. Either come early or make sure you know where it is in advance.

The final exam will be structured a lot like the midterm exam, and will be slightly harder. This just means I won't be giving as many free marks for things like including 'run' at the end of every proc.

You will be allowed to bring a cheat sheet like last time, and this time you can use both sides of the paper.

About 80% of the exam will be on the material AFTER midterm 1.

Topics include:

- Loading and saving data with proc import, export.

- Loading data with the datalines command in a data step.

- Making new variables with the data step.

- Operations in PROC IML

- Continuous data with PROC UNIVARIATE, PROC MEANS

- Anova and Regression models with PROC GLM

- Model selection with PROC GLMSELECT

- Logistic regression models with PROC LOGISTIC

- Basics of plotting