Graphics in EG and R
HRP223 2009
November 16th, 2009

Copyright Leland Stanford Junior University. All rights reserved.
Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.

Robbins
Creating More Effective Graphs by Naomi Robbins is a wonderful book showing the right and wrong ways to visualize scientific data. Read it when you have an afternoon off. It is an ideal read on a transcontinental flight.

Why Do Data Visualization?
Well-designed pictures will show you the details and the whole pattern in your data. Numeric descriptions can easily hide important patterns. Some patterns are hard to detect in tables. Whenever data are reported over time or locations, you need art.

"You can learn a lot by just looking." - Yogi Berra

Fisher's Plot Data
Reported in Cleveland
[Figure panels: Year 1, Year 2]
Based on code written by Robert Allison at SAS Institute

Scatter Plot for Correlations
Anscombe 1973, Graphs in Statistical Analysis
All have r² = .67

Bad Things
First, I want to talk about bad graphics that I frequently see.
  3D pie
  Donuts
  Stacked graphics
  General 3D graphics

Don't, Don't, Don't
While the SAS implementation of 3D graphics is relatively good, don't use 3D effects unless you are measuring something in 3D. Even then, don't.

Tufte is a god to many. The empiricist in me is very nervous about the amount of pontificating in his books; I want to have evidence-based advice. His best advice is to put no extra ink on the page. Think about the ink-to-information ratio. Remove all chart junk. Note the irony of the chart junk on this slide.

You can remove ink rather than adding it.
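To make the ink-to-information idea concrete, here is a minimal R sketch that draws the same three counts twice: once as a bar chart and once as a Cleveland-style dot chart. The data (cylinder counts from R's built-in mtcars data set) are only a stand-in chosen for illustration; they are not the data shown on these slides.

```r
# Stand-in data (an assumption for illustration): number of cars by
# cylinder count in R's built-in mtcars data set.
counts <- table(mtcars$cyl)

op <- par(mfrow = c(1, 2))  # two panels side by side

# Bar chart: each count is drawn as a filled rectangle.
barplot(counts, main = "Bar chart", xlab = "Cylinders", ylab = "Cars")

# Dot chart: the same counts, one point each on a common scale.
dotchart(as.vector(counts), labels = names(counts),
         xlim = c(0, max(counts)), pch = 19,
         main = "Dot chart", xlab = "Cars")

par(op)  # restore the previous plot layout
```

Both panels carry the same three numbers; the dot chart simply spends far less ink on them.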
Example Bar Chart
Serum Samples in Each Trimester
You can remove ink rather than adding it.

Ink-to-Information Ratio
How much ink for seven numbers?
Based on Soukup & Davidson, 2002, Visual Data Mining

Cleveland
If you want to know how to do scientific visualization, you must read William Cleveland's work. He attempted to quantify what makes a good graphic good. His early work on graphics is one of the reasons why R/S-plus is taking over the statistical world.

Pie is bad.
Work by Cleveland (and experimental psychologists) suggests that:
  people are bad at judging the relative magnitude of angles
  if you twist the rotation of the pie you can cause people to systematically misjudge the size of the angles
  a 3rd dimension makes judgment worse
If you get a glossy handout with a 3D pie, assume someone is lying to you. Don't use them.

Don't Explode!
This exploded 3D pie (brought to you by Excel) is nearly useless for judging amounts.

Forbidden Donut.
Donut plots have the same problems as pies (if not worse).
Stacking is Bad
Cleveland also quantified the fact that people are bad at judging the relative height of stacked data.

Wow, a cinnamon roll plot!
Good luck making rapid judgments using this stacked 3D pie.

What is a good graphic?
  Don't make your audience think unnecessarily!
  Minimize the amount of ink on the page. This needs to be studied.
  Show the central tendency and the variability.
  Plot the quantity (inference) that you want people to notice.
  Be sure colorblind people can understand it. Use a black and white photocopier and make sure you can distinguish all groups.

Avoid Thinking
Put labels on the graphic directly instead of using a key. If you want people to compare the difference between two lines, plot the difference, not the two lines.

Bivariate Comparisons with Lines
People are extremely bad at judging the distance between two curves. Never ask people to judge up and down (vertical) distances between curves.

The distance between the two curves is the same at all points.
Based on: Robbins, Creating More Effective Graphs, 2005
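A small R sketch of this effect, using made-up curves: an exponential and the same curve shifted up by exactly 2 units. Both curves are illustrative assumptions, not the ones in the Robbins figure. The left panel shows the two curves; the right panel plots their difference directly.

```r
# Hypothetical curves chosen only to illustrate the point.
x     <- seq(0, 3, length.out = 200)
lower <- exp(x)        # a steeply rising curve
upper <- lower + 2     # the same curve shifted up by exactly 2 units
gap   <- upper - lower # vertical distance between the curves at each x

op <- par(mfrow = c(1, 2))  # two panels side by side

# Left: both curves. The gap tends to look smaller where the curves are steep.
plot(x, upper, type = "l", ylim = range(lower, upper),
     main = "Two curves", xlab = "x", ylab = "y")
lines(x, lower, lty = 2)

# Right: the difference itself, which is flat.
plot(x, gap, type = "l", ylim = c(0, 4),
     main = "Difference (upper - lower)", xlab = "x", ylab = "difference")

par(op)  # restore the previous plot layout
```

Plotting the difference removes the need for the reader to judge vertical distances by eye.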
Plot Types
Univariate (one variable)
  Categorical variables
    Bar charts
    Dot plots
    Waffle plots
  Continuous variables
    Histogram
    Box plot
    Violin plots

Bar Charts
The ink-to-information ratio is lousy. A one-dimensional quantity is being expanded into two dimensions. Doubling of the amount corresponds to how much of an increase in area?

SAS Bar Charts
SAS makes the reader do extra work by rotating the axis labels in ActiveX images. They pointlessly include variable labels by default.
How to do it?
Notice you can Edit the data and apply filters. You can right-click on variables and apply user-defined formats from the Properties dialog. First create the format. Then, in the Data pane of the Bar Chart GUI, right-click on the variable and change the format to the user-defined format you created.

The GUI is Solid
My only complaints are that the "rotate grouping values text" option does not work (position in this example) and the summary statistics do not show up when you request ActiveX images.

Saving the Graphic for Publication
The easiest way to get publication-quality graphics is to set the output type to be RTF.
[Figure panels: .PNG format, ActiveX image format]

Default Output and Graphics
The default graphic format in EG is ActiveX. These images can be edited (even on the web) but they only display with Internet Explorer. I have set my graphics to display as ActiveX images. Tweak this with Tools > Options > Graph.

Types of Images
The default formats of the images are determined by the ODS destinations you are using:
  LISTING: png, visible in the Windows Picture and Fax Viewer
  HTML: png, gif, jpg, contained in web pages and visible in Internet Explorer, Firefox or Opera
  LATEX: PostScript, epsi, gif, jpeg, png, visible in GhostView
  PCL or PS: contained in a PostScript file, visible in GhostView
  PDF: contained in pdf, which is visible with Adobe Reader
  RTF: visible in MS Word

I Typically Use HTML
This is the appearance template. For optimal results use:
  Analysis: color
  Default: over-distinguishes symbols for color or B&W
  Journal or journal2, etc.: black and white
  Statistical or statistical2, etc.: color
Include image_dpi = 200 to set the resolution to be higher than the default 100 dots per inch. Try 200 for final images you will paste into MS Office.
This says the images should show tooltips with extra statistical details when you hover the mouse over parts of the graphic. (I can't image these.)

Useful ods graphics Options
After the ods graphics on statement, type a / then:
  imagename = fileName
  reset (resets the counter of images back to 0)
  imagefmt = jpg
  width = 4.5 in
  height = 4.5 in
If you set only width or height, it will use a 4:3 aspect ratio.

What is ODS?
The Output Delivery System (ODS) controls the type and appearance of SAS output.
  Different appearance templates
  Different output destinations/types
You can browse the ODS appearance templates from the Style Manager on the Tools menu.

ODS Graphics
Compared to the competition, for the last 10 years SAS graphics have been between poor and pathetic.
  Graphics procedures rendered okay quality, at best.
  No what-you-see-is-what-you-get editing.
  Many plots were nearly impossible to render.
  Custom graphics required extensive programming.
SAS 9.x has attempted to solve this problem.

Old vs. New Procedures
The old (commonly used) graphics procedures were gchart and gplot. Now most analysis procedures have built-in high-quality graphics that can be invoked with an ODS graphics on statement. Early on in the class I told you to tweak the EG options to include ODS graphics on with every run. There are also new easy-to-use statistical graphics (sg) procedures.

New Graphics
Statistical Graphics Procs
  proc sgPlot: general plotting procedure that replaces gplot
  proc sgScatter: lots of tools for scatterplots and scatter matrices
  proc sgPanel: quick and easy trellis/lattice/matrix/panel of plots
  proc sgRender: used with proc template to make totally custom plots. It replaces proc greplay.

Plot Types
Univariate (one variable)
  Categorical variables
    Bar charts
    Dot plots
    Waffle plots
  Continuous variables
    Histogram
    Box plot
    Violin plots

Quantile and QQ plots
You can get an okay-looking graphic using sgpanel. I was able to get exactly the graphic I wanted using R.

If you want to use R
Download R for Mac or PC:
  cran.cnr.berkeley.edu/bin/macosx/
  cran.cnr.berkeley.edu/bin/windows/base

If you use a PC, also get PERL and Tinn-R
PERL is a text manipulation language that is used by a couple of key R packages. It ships with Mac OS X. PC users can get ActivePerl (what I use) or Strawberry Perl for Windows.
Tinn-R is a text editor that knows the R language: sourceforge.net/projects/tinn-r/
R Help
R help files are user hostile. To learn about the options for dotchart, type:
    ?dotchart
Use: rseek.org

Browse
To see why people use R for graphics, look here: addictedtor.free.fr/graphiques/thumbs.php

Additional Libraries
If you see sample code that includes require() or library(), you will need to do a one-time download of the additional package. If you are using Vista, run R as the administrator (by right-clicking on the R icon instead of just double-clicking) to install and update packages.

Waffle Plots
I have not found software to do them. I need to find their real name.
Image from: Visual Language for Designers by Connie Malamed, 2009.

Continuous Outcomes
The Distribution Analysis menu option can do basic plots. The resolution of the histogram is okay but the others are unacceptable. Use sgplot for high-resolution plots.

Violin
A violin plot mirrors the shape of the histogram (density). They can be done in R.

Grouped Categorical Data
To graph categorical data in SAS you need to get Michael Friendly's Visualizing Categorical Data. Unfortunately, his macros are copyrighted with the book, so I will show you the R versions.
  Fourfold plots
  Mosaic plots
  Association plots

Fourfold Plots
They draw 4 slices of pie with the area corresponding to the number of people in each cell of a 2x2 table. They also have confidence bands: if the confidence bounds overlap on adjacent pie pieces, the pieces are not statistically significantly different.

More males were admitted than females.
There is clear evidence of sexist policies in admissions!

Department A admitted more females than males and every other department had no bias!
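A minimal R sketch of fourfold displays like these, assuming the slides are based on R's built-in UCBAdmissions table (the source does not name the data set, so treat that as a guess). It draws one plot for the pooled data and one per department, then computes the admission rates behind the two statements above.

```r
# Assumption: the built-in UCBAdmissions table (Admit x Gender x Dept)
# stands in for the admissions data shown on the slides.

# Pool over departments to get a single 2x2 table of Admit by Gender.
overall <- margin.table(UCBAdmissions, c(1, 2))
fourfoldplot(overall)

# One fourfold display per department (the third dimension is the stratum).
fourfoldplot(UCBAdmissions)

# Proportion admitted within each gender, pooled and for department A only.
rate_overall <- prop.table(overall, margin = 2)["Admitted", ]
rate_dept_a  <- prop.table(UCBAdmissions[, , "A"], margin = 2)["Admitted", ]
print(round(rate_overall, 2))
print(round(rate_dept_a, 2))
```

In the pooled table the male admission rate is higher, while within department A the female rate is higher.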
The joy of Simpson's paradox.

Mosaic Plots
So you have a contingency table and you want to know if there is an association. You do a chi-square test and it says there are associations between the rows and columns. What next? Some basic voodoo in R shows which combinations are over (in blue) or under represented (in red).
    # Counts of eye color (rows) by hair color (columns)
    values <- c(5, 29, 14, 16, 15, 54, 14, 10, 20, 84, 17, 94, 68, 119, 26, 7)
    values <- matrix(values, nrow = 4, byrow = TRUE)
    rownames(values) <- c("Green", "Hazel", "Blue", "Brown")
    colnames(values) <- c("Black", "Brown", "Red", "Blond")
    # Shade tiles by residual: blue = over-represented, red = under-represented
    mosaicplot(values, shade = TRUE)

I prefer the simpler association plots.
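    # Same eye color (rows) by hair color (columns) counts as above
    values <- c(5, 29, 14, 16, 15, 54, 14, 10, 20, 84, 17, 94, 68, 119, 26, 7)
    values <- matrix(values, nrow = 4, byrow = TRUE)
    rownames(values) <- c("Green", "Hazel", "Blue", "Brown")
    colnames(values) <- c("Black", "Brown", "Red", "Blond")
    # marg ... (the rest of this line is missing in the source)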
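The remainder of the association-plot code is not present in the source. As a hedged sketch of one way to draw such a plot, the block below rebuilds the same table and passes it to base R's assocplot(). The blue/red coloring and the chisq.test() call are my assumptions, not necessarily what the original code did.

```r
# Same eye color (rows) by hair color (columns) counts used for the mosaic plot.
values <- matrix(c(5, 29, 14, 16,
                   15, 54, 14, 10,
                   20, 84, 17, 94,
                   68, 119, 26, 7),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Eye  = c("Green", "Hazel", "Blue", "Brown"),
                                 Hair = c("Black", "Brown", "Red", "Blond")))

# Association plot: bars above the baseline mark cells with more counts than
# expected under independence, bars below mark cells with fewer.
# Assumed colors: blue for positive residuals, red for negative.
assocplot(values, col = c("blue", "red"),
          main = "Eye color by hair color")

# Pearson residuals, for reading off the same information as numbers.
pearson <- chisq.test(values)$residuals
print(round(pearson, 1))
```

Blue eyes with blond hair shows up as a large positive residual, and brown eyes with blond hair as a large negative one.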
Options > Enhanced Editor, then click User Defined Keywords to add the coloring.

I want to add in a reference line showing what is normal and put the categories in order.

Grids
You can produce lattices full of graphics with proc sgpanel.

Spaghetti Plots
Data from Singer and Willett:
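The Singer and Willett data are not reproduced in this text. As a stand-in, the sketch below draws a spaghetti plot (one line per subject over time) from R's built-in Orange data set, which records the growth of five orange trees. The choice of data set is purely an assumption for illustration.

```r
# Stand-in longitudinal data (not the Singer and Willett data):
# circumference of five orange trees measured at seven ages.
trees <- split(Orange, Orange$Tree)  # one data frame per tree

# Set up empty axes that span all of the data.
plot(range(Orange$age), range(Orange$circumference), type = "n",
     xlab = "Age (days)", ylab = "Circumference (mm)",
     main = "Spaghetti plot: one line per tree")

# Draw one trajectory per tree.
for (tree in trees) {
  lines(tree$age, tree$circumference, col = "gray40")
}
```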