1 How to Navigate the Guide To navigate this SAS Guide, use the
PageDown and PageUp buttons on the keyboard. A copy of this
PowerPoint document can be downloaded from
http://www.biostat.ku.dk/~lts/varians_regression/sasguide.ppt
Slide 2
2 Preface This is The Beginners Guide To SAS. The document was
originally written by Anna Johansson, MEP, Stockholm. It has been
lightly edited by Peter Dalgaard and Lene Theil Skovgaard for the
Ph.D. course on SAS at the Faculty of Health Sciences, University
of Copenhagen, May 2002, and later by LTS for the Ph.D. Course in
Analysis of Variance and Regression.
Slide 3
3 Introduction What is SAS? SAS is a software package for
managing large amounts of data and performing statistical analyses.
It was created in the late 1960s by the Department of Statistics at
North Carolina State University. Today SAS is developed and
marketed by SAS Institute Inc. with head office in Cary, North
Carolina, U.S.A.
Slide 4
4 Introduction (cont.) SAS in Denmark The Danish subdivision of
SAS Institute provides consulting and a wide range of courses. It
is located in Copenhagen. SAS Institute A/S, Købmagergade 7-9, 1150
Kbh. K. Tel: 70 28 28 70 Fax: 70 28 29 91 Email:
[email protected]
Slide 5
5 Introduction (cont.) The SAS System The SAS System is mainly
used for -Data Management (about 80% of all users) -Statistical
Analysis (about 20% of all users) The power of SAS lies in its
ability to manage large data sets. It is fast and has many
statistical and non-statistical features. The disadvantage of SAS
is its steep learning curve. It takes quite a bit of effort to
get started. User-friendly interfaces do exist, though.
Slide 6
6 Introduction (cont.) Starting SAS in the course room:
-Move the mouse (or switch on the machine)
-The login is kursusxx
-The password changes
-Choose START, followed by STATISTIK and SAS 8.2
Slide 7
7 Introduction (cont.) Getting Started A very good start is to
enter the SAS Online Training. Choose in the menu Help + Getting
Started with the SAS Software, then click on the book.
Slide 8
8 Introduction (cont.) SAS Files If your data is not yet in a
SAS data set, you access the raw data by creating a SAS data set
from it. Once you have made the SAS data set, you use SAS programs
to analyse, manage and/or present the data. SAS data sets can be
permanent or temporary. A special library called WORK is created on
start-up and deleted on exit.
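The difference between a temporary and a permanent data set can be sketched as follows. The folder c:\sasdata is a hypothetical path and work.scratch is a made-up data set name; the COURSE library and course.original are the names used in the later examples of this guide.

```sas
/* Hypothetical folder: replace it with a directory that exists on your PC */
libname course 'c:\sasdata';

/* Temporary: stored in the WORK library and deleted when SAS is closed */
data work.scratch;
    set course.original;
run;

/* Permanent: stored as a file in the folder behind the COURSE library */
data course.main;
    set work.scratch;
run;
```

The first part of the two-level name (work or course) is the library, and it alone decides whether the data set survives the session.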
Slide 9
9 Introduction (cont.) SAS Programming SAS programming works in
two steps: Data Step 1. reads data from file 2. makes
transformations and adds new variables 3. creates SAS Data Set Proc
Step 4. uses the SAS Data Set 5. produces the information we want,
such as tables, statistics, graphs, web pages
Slide 10
10 Introduction (cont.) Data and Proc Steps Example of a SAS
program:
Data Step:
data work.main;
    set work.original;
    age=1997-birthyr;
    bmi=weight/(height*height);
run;
Proc Steps:
proc print data=work.main;
    var id age bmi;
run;
proc means data=work.main;
    var age bmi;
run;
Slide 11
11 Introduction (cont.) SAS Modules The SAS system is made up
of several modules, each used for different purposes. This Guide
deals only with the SAS BASE and the GRAPH modules, giving
knowledge on basic data management and simple statistical analyses.
Other modules are SAS/Stat (statistical analyses), SAS/Access (data
base applications), SAS/Graph, SAS/Assist (menu-driven info
system), SAS/FSP (data entry and retrieval), SAS/Connect (remote
submit), etc.
Slide 12
12 Introduction (cont.) SAS at Biostat Dept. We primarily use
SAS on a Unix server whereas these notes assume that the programs
are run locally on a PC. The basic programming is the same
regardless of what platform you use. This is one of the big
advantages of SAS. We do tend to prefer running SAS
non-interactively though.
Slide 13
13 The SAS Environment Windows The main feature of SAS is its
division of the main window into two halves. The left part is a
navigator of SAS libraries and Results (from the Output window).
The right part is divided into three separate windows: -Program
window or Enhanced Editor -Log window -Output window
Slide 14
14 The SAS Environment (cont.) Windows The log and output
windows are always opened by default when you start SAS (although
they may be hidden behind each other). The program window and the
Enhanced Editor are two different windows but they are used for the
same purpose, i.e. writing code and executing it. One of them will
open by default. Other windows are also available and are opened on
request (use View), for instance the Graphics window.
Slide 15
15 The SAS Environment (cont.) Windows (The program window is a
remnant of the older SAS version 6. The Enhanced Editor is a
new feature of version 8, and is more user-friendly, since it
colours the code and works more like an ordinary text editor.)
Slide 16
16 The SAS Environment (cont.) Windows To check which windows
are opened, choose Window in the menu. At the bottom there is a
list of opened windows. The active window is marked in the list. A
star * after the window name indicates that the file has not been
saved since its latest alteration. If you are missing any of the
windows (Enhanced Editor, Log, Output), you can open it by choosing
in the menu View + window-name
Slide 17
17 The SAS Environment (cont.) Windows You switch between the
windows by choosing one of the following in the menu:
-Window + ENHANCED EDITOR
-Window + OUTPUT
-Window + LOG
Slide 18
18 The SAS Environment (cont.) Windows The window location on
the screen can be changed by choosing Window + Tile or Window +
Cascade, or by dragging the lower right corner of the window with the
mouse. When you exit SAS, the window setting will be kept for the
next session (unless someone else...).
Slide 19
19 The SAS Environment (cont.) Enhanced Editor / Program Window
In the Enhanced Editor you write the SAS programs. The programs
tell SAS to produce the data sets, tables, statistics, etc. A
program consists of data steps and proc steps. A SAS program is
executed (submitted) by choosing Run + Submit in the menu (or by
clicking on the Running Man icon, fourth from the right in the
menu).
Slide 20
20 The SAS Environment (cont.) Output and Log Windows The
result of a program execution is printed to the Output window.
There you will find the prints, tables and reports, etc. A log file
is printed to the Log window. The log file contains information
about the execution, whether it was successful or not. It usually
points out your mistakes with warning and error messages so that
you can correct them.
Slide 21
21 The SAS Environment (cont.) Example: SAS Log
65 proc gplot data=work.influnce;
66 plot di*pred / vaxis=axis1 haxis=axis1;
ERROR: Variable DI not found.
NOTE: The previous statement has been deleted.
67 run;
Make a habit of checking the Log window after every execution.
Even if SAS has accepted and executed the program, you may have
made a methodological error. Check the note on how many
observations were read, and if there were any missing values.
Slide 22
22 The SAS Environment (cont.) Example: SAS Output
patientens alder
                       Cumulative
ALDER      Frequency    Frequency
__________________________________
0 - 24            41           41
25 - 44          176          217
45 - 64           77          294
65-               25          319
Slide 23
23 The SAS Environment (cont.) File Types These files are
created by SAS: -.sas file (SAS program) -.log file (Log) -.lst
file (Output) The SAS data sets are saved as .sd7 or .sas7bdat files.
(Other file types, e.g. catalogs, are also used and created by SAS,
but we will not pursue this any further.)
Slide 24
24 The SAS Environment (cont.) Using the SAS System You work
with SAS using -Menus and Toolbar -Command Line -Key Functions
F1-F12
Slide 25
25 The SAS Environment (cont.) Example Three different ways to
Open a File in the Enhanced Editor: 1. Menus: choose File + Open 2.
Toolbar: press the icon for Open 3. Command line: write include
N:\temp\bp.sas and press Enter.
Slide 26
26 The SAS Environment (cont.) Commands and Keys
Slide 27
27 The SAS Environment (cont.) Write and Read In the Enhanced
Editor you can -create new, or edit existing, programs -submit
programs -save programs (an unsaved file is marked with * after the
file name) You can NOT edit the log file or the output file in
their windows. They are read-only. If you wish to edit these
files, save them and use the Enhanced Editor or Word.
Slide 28
28 SAS syntax Statements The SAS code (syntax) consists of
statements (Danish: sætninger). Statements mostly begin with a
keyword (Danish: nøgleord), and they ALWAYS end with a SEMICOLON.
data work.cohort;
    set course.males98;
run;
proc print data=work.cohort;
run;
Examples of keywords: data, set, run, proc.
Slide 29
29 SAS syntax (cont.) Statements SAS statements can begin and
end anywhere on a line.
          data work.cohort;
One or several blanks can be used between words.
data      work.cohort;
One or several semicolons can be used between statements.
data work.cohort;;; ;
Slide 30
30 SAS syntax (cont.) Statements The statement can begin and
end on different lines.
data
    work.cohort;
SAS will not object to several statements on the same line.
However, it is not considered good programming to have more than
one statement per line. It makes the code difficult to read.
Avoid this!
data work.cohort; set course.males98; run;
Slide 31
31 SAS syntax (cont.) Indenting to improve readability Improve
the readability of your program by adding more space to the code (=
indenting). Begin data steps and proc steps in the first position,
as far left as possible. The ending run statement should also be in
the first position. All statements in between should start a few
blanks in from the left margin. This creates blocks of data steps
and proc steps, and you can easily see where one ends and another
begins.
Slide 32
32 SAS syntax (cont.) Example of Indenting
data work.height;
    infile 'h:\mep\rawdata_height.txt';
    input name $ 1-20 kon 21 alder 22-23 height 24-30;
    if kon=0 and (height ne .) then do;
        if 0
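Slide 104
104 The Online HELP (cont.) Example PROC MEANS: Syntax
PROC MEANS <option(s)> <statistic-keyword(s)>;
    BY <DESCENDING> variable-1 <... <DESCENDING> variable-n> <NOTSORTED>;
    CLASS variable(s) </ option(s)>;
    FREQ variable;
    ID variable(s);
    OUTPUT <OUT=SAS-data-set> <output-statistic-specification(s)> </ option(s)>;
    TYPES request(s);
    VAR variable(s) </ WEIGHT=weight-variable>;
    WAYS list;
    WEIGHT variable;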
Slide 105
105 The Online HELP (cont.) Explanation of the Online Help Text
-underlined word = keyword referring to a statement (statements
within a procedure are optional; the PROC and the RUN statements
are required)
-black word = required if the corresponding keyword is used
-words within < > = optional, not required
-words separated by | = possible choices of values for a specific
option
Slide 106
106 The Online HELP (cont.) Example If you click on PROC
MEANS, a list of possible options will be displayed. Among them is
the MAXDEC= option which we have already used. The equal sign is
required. Next to MAXDEC= is the black word number. If you use the
MAXDEC option you are required to fill in a number corresponding to
the maximum number of decimals to be displayed. (The exact
conventions depend on which version of the help you use. pd)
Slide 107
107 Labels What are Labels? Each variable has a variable name
(e.g. birthyr) and a LABEL (e.g. Year of Birth). The label is how
the variable is written on the output. If you do not specify a
label, the variable name is used. To define and assign a label,
use the LABEL statement.
label variable1 = 'label-name1'
      variable2 = 'label-name2' ... ;
Slide 108
108 Labels (cont.) What are Labels? Labels can be 256
characters long at most. The output from proc CONTENTS includes a
column with labels for all the variables in the data set. To delete
a label simply define the label equal to space:
label variable1 = ' ';
Slide 109
109 Labels (cont.) Permanent or Temporary Labels Labels can be
assigned inside a data step or a proc step. Labels assigned in a
data step are permanent. They are also transferred to new data
sets. Labels assigned in a proc step are temporary. A temporary
label replaces a permanent label throughout the execution of the
procedure step. Most common are permanent labels defined in the
data step.
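A temporary label can be sketched as below. This is only an illustration: the label text 'Age in 1997' is made up, and the LABEL option on the PROC PRINT statement is added because PROC PRINT shows variable names rather than labels unless that option is given.

```sas
proc print data=course.main label;
    var birthyr age;
    /* Temporary: this label is used in this proc step only */
    label age = 'Age in 1997';
run;
```

Any permanent label stored in course.main is left unchanged and is used again in the next step.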
Slide 110
110 Labels (cont.) Example Assigning permanent labels in a data
step:
data course.main;
    set course.original;
    age=1997-birthyr;
    height=height/100;
    bmi=weight/(height*height);
    label birthyr='Year of Birth'
          age='Alder'
          height='Højde'
          bmi='BMI';
run;
Slide 111
111 Labels (cont.) Example
With label:
       Year of
OBS    Birth
  1    1954
  2    1956
  3    1956
  4    1962
  5    1954
  6    1953
  7    1955
...
 18    1957
Without label:
OBS    BIRTHYR
  1    1954
  2    1956
  3    1956
  4    1962
  5    1954
  6    1953
  7    1955
...
 18    1957
Slide 112
112 Formats What are Formats? Formats are used on variable
values to -display the values differently from the raw values (e.g.
with fewer decimals, or as dates) -group the values (values
0-25=low, values 26-100=high) There are predefined formats in SAS
which you may use, but you can also create your own formats. The
procedures are designed to handle formats and use them
accordingly.
Slide 113
113 Formats (cont.) Assign Formats To assign formats you use
the FORMAT statement inside a data step (permanently) or a proc
step (temporarily). The general form of the FORMAT statement is
format variable1 format1.;
The following yields a value with two digits, a decimal point and
two decimals (5 positions, of which two are decimals):
format bmi 5.2;
Slide 114
114 Formats (cont.) Example Permanent Assignment
data course.main;
    set course.original;
    age=1997-birthyr;
    height=height/100;
    bmi=weight/(height*height);
    format age 4.0 bmi 4.2 birthyr best4.;
run;
Slide 115
115 Formats (cont.) Example Temporary Assignment
proc print data=course.main;
    var birthyr age bmi;
    format age 4.0 bmi 4.2 birthyr best5.;
run;
Usually, the format statement is at the end of the data or proc
step together with the label statement.
Slide 116
116 Formats (cont.) Predefined SAS Formats Formats are all of
the form (where < > indicates optional and is not to be typed in):
format-name<w>.<d>
w indicates maximum number of positions used to display the value
d indicates optional number of decimals in a numeric format
Slide 117
117 Formats (cont.) Predefined SAS Formats Formats for
character variables need a $ sign in the first position:
$format-name<w>.
All formats, numeric or character, MUST contain a period
(Danish: punktum), either at the end or before the d value. See
examples.
Slide 118
118 Formats (cont.) Predefined SAS Formats
w.d = numeric values at most w positions long, and d of these
positions are decimals
$w. = character values w positions long
COMMAw.d = numeric values with commas and decimal points: 12,345.67
BESTw. = chooses the best notation with w positions for numeric
values
The decimal point (.) in a displayed number occupies one of the w
positions.
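The predefined formats can be tried out without any data set. The following is a small sketch: the values are made up, and the PUT statement writes each formatted value to the Log window.

```sas
data _null_;                 /* _null_ : run the step without creating a data set */
    x = 12345.678;
    put x 8.2;               /* 8 positions, 2 decimals: 12345.68 */
    put x comma10.2;         /* with thousands separator: 12,345.68 */
    put x best6.;            /* SAS picks a notation that fits in 6 positions */
    name = 'Copenhagen';
    put name $4.;            /* first 4 characters: Cope */
run;
```

Changing w and d and rerunning the step is a quick way to see how much room a value needs before the format is used in a report.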
Slide 119
119 Formats (cont.) Example
Slide 120
120 Formats (cont.) User-defined Formats There are situations
when the predefined formats do not suffice. For example, you may wish to
group the BMI values into three categories; underweight, normal
weight, overweight. There is no predefined format to meet your
demands in this situation. The solution is to create your own
format.
Slide 121
121 Formats (cont.) User-defined Formats To use your own
formats you must
-define the format
-assign the format
Several variables may be assigned the same format, and a variable
may be assigned different formats in different procedures.
Slide 122
122 Formats (cont.) Proc FORMAT defines formats Formats are
defined through the FORMAT procedure.
proc format;
    value format-name range1 = 'label'
                      range2 = 'label' ... ;
run;
The labels must be inside quotes ('').
Slide 123
123 Formats (cont.) User-defined Formats Format names are like
any other SAS names, however they must not end in a number. A
format for a character variable must have a dollar sign $ as its
first character. Format names do NOT end with a period (.) in proc
FORMAT. The period is only used when assigning the format in a data
or proc step.
Slide 124
124 Formats (cont.) Example: User-defined Formats A
case/control format (case_1f) and a BMI format (bmif).
proc format;
    value case_1f 0='Case'
                  1='Control'
                  other='Other';
    value bmif low-20.0='Underweight'
               20.0-25.0='Normal weight'
               25.0-high='Overweight'
               other='Other';
run;
Slide 125
125 Formats (cont.) Example: User-defined Formats Above, a
value of 20.0000 would fall into Underweight, but 20.0001 would
fall into Normal weight. The first true range alternative is used
for a value of a variable assigned by the format.
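If you would rather state explicitly which range a boundary value belongs to, PROC FORMAT lets you exclude an endpoint with the < sign. The sketch below is an alternative to the format above, not part of the original example; the name bmixf is made up.

```sas
proc format;
    value bmixf
        low  -< 20.0 = 'Underweight'     /* up to, but not including, 20.0 */
        20.0 -< 25.0 = 'Normal weight'   /* 20.0 included, 25.0 excluded */
        25.0 - high  = 'Overweight'      /* 25.0 and above */
        other        = 'Other';
run;
```

With this definition a value of exactly 20.0 falls into Normal weight, and no value can match two ranges.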
Slide 126
126 Formats (cont.) Special Format Values
other = all other values, including missing values
low = the lowest value (minimum) of the variable assigned to the
format. For numeric formats low does not include missing values;
for character formats it does.
high = the highest value (maximum) of the variable assigned to the
format
Slide 127
127 Formats (cont.) Assigning User-defined Formats User-defined
formats are assigned by a FORMAT statement, exactly as with the
predefined formats.
proc freq data=course.main;
    tables bmi;
    format bmi bmif.;
run;
                                        Cumulative   Cumulative
BMI              Frequency   Percent     Frequency      Percent
Underweight              2      11.1             2         11.1
Normal weight           14      77.8            16         88.9
Overweight               2      11.1            18        100.0
Slide 128
128 Formats (cont.) Assigning User-defined Formats
proc means data=course.main maxdec=1;
    class bmi;
    var age;
    format bmi bmif.;
run;
The MEANS Procedure
Analysis Variable : age
                   N
bmi              Obs     N    Mean   Std Dev   Minimum   Maximum
Underweight        7     7    37.9       3.3      35.0      44.0
Normal weight     51    49    37.6       3.4      30.0      44.0
Overweight         5     5    38.0       3.3      35.0      42.0
Slide 129
129 Formats (cont.) Assigning User-defined Formats As shown
above, user-defined formats are assigned in the exact same way as
the SAS formats: format variable1 format1.;
Slide 130
130 Titles and Footnotes Titles You can add titles to the
output with a TITLE statement. A TITLE statement is one of the
global statements which do not have to be included in a data step
or a proc step (other global statements are the LIBNAME and OPTIONS
statements). The form of the TITLE statement is
title 'here-you-write-the-title';
The title must be surrounded by quotes ('').
Slide 131
131 Titles and Footnotes (cont.) Example
title 'BMI Body Mass Index';
proc freq data=course.main;
    tables bmi;
    format bmi bmif.;
run;
BMI Body Mass Index
                                        Cumulative   Cumulative
BMI              Frequency   Percent     Frequency      Percent
Underweight              2      11.1             2         11.1
Normal weight           14      77.8            16         88.9
Overweight               2      11.1            18        100.0
Slide 132
132 Titles and Footnotes (cont.) Delete Titles A title will
stay defined, and be printed to all output, until it is changed, or
deleted. To delete a title simply write title;
Slide 133
133 Titles and Footnotes (cont.) Several Titles It is also
possible to have second titles below the main title. A maximum of
10 titles can be used simultaneously.
title1 'here-you-write-the-first-title';
title2 'here-you-write-the-second-title';
...
title10 'here-you-write-the-tenth-title';
Slide 134
134 Titles and Footnotes (cont.) Several Titles The unnumbered
title statement is equal to the title1 statement. It is possible
to have, for example, title2 undefined or deleted while title3 is
defined. It will result in a gap between title1 and title3 on the
printout representing title2. However, when you delete say title3,
all titles beneath it (title4-title10) will also be deleted.
title3;
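The deletion rule can be sketched with three titles. The text of the third title is made up for the illustration.

```sas
title1 'BMI Body Mass Index';
title2 'Women 35-45 yrs';
title3 'Preliminary data';    /* hypothetical third title */

/* Deleting title2 also deletes title3; title1 stays defined */
title2;

proc print data=course.main;
    var birthyr age bmi;
run;
```

Only the first title should appear above this printout, since deleting a title removes all titles with a higher number as well.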
Slide 135
135 Titles and Footnotes (cont.) Example
title 'BMI Body Mass Index';
title2 'Women 35-45 yrs';
proc freq data=course.main;
    tables bmi;
    format bmi bmif.;
run;
Slide 136
136 Titles and Footnotes (cont.) Example
BMI Body Mass Index
Women 35-45 yrs
                                        Cumulative   Cumulative
BMI              Frequency   Percent     Frequency      Percent
Underweight              2      11.1             2         11.1
Normal weight           14      77.8            16         88.9
Overweight               2      11.1            18        100.0
Slide 137
137 Titles and Footnotes (cont.) Titles Window A shortcut to
defining titles is the Titles window. Issue the title command in
the Command line. The Titles window will open, with all your
current title definitions. From here the titles can be changed
directly by editing. The disadvantage of this shortcut is that you
can NOT save the title definitions, as you could have if you had
written them in code. When a program is rerun later after many
title changes, the titles will no longer be as originally defined.
Slide 138
138 Titles and Footnotes (cont.) Footnotes Footnotes work in
the exact same way as titles. The only difference is that footnotes
are written at the bottom of the printout.
footnote 'here-you-write-the-footnote';
To delete a footnote write
footnote;
Slide 139
139 Titles and Footnotes (cont.) Example
footnote 'BMI Body Mass Index';
footnote2 'Women 35-45 yrs';
proc freq data=course.main;
    tables bmi;
    format bmi bmif.;
run;
Slide 140
140 Titles and Footnotes (cont.) Example
                                        Cumulative   Cumulative
BMI              Frequency   Percent     Frequency      Percent
Underweight              2      11.1             2         11.1
Normal weight           14      77.8            16         88.9
Overweight               2      11.1            18        100.0
BMI Body Mass Index
Women 35-45 yrs
Slide 141
141 Titles and Footnotes (cont.) Footnotes Window To open the
Footnotes window and edit footnotes directly, issue the command
footnote in the Command line.
________________________________________________ There are lots of
additional features to titles and footnotes available, such as
fonts, sizes and orientation, etc. See the manual SAS/GRAPH
Software Vol. I.
Slide 142
142 Subsetting a Data Set Subsets of Data Often one wants to
use only a subset of a data set, e.g. persons older than 60 years,
women, cases etc. This is particularly useful when performing data
cleaning, and you only want to print the observations with extreme
values of a variable, say blood pressure > 200.
Slide 143
143 Subsetting a Data Set (cont.) WHERE option In procedures
you use the WHERE data set option to subset the data set.
proc print data=SAS-data-set(where=(expression));
run;
The WHERE data set option may be used in any procedure. It can
also be used in data steps, although it is less usual.
data course.cases;
    set course.main(where=(case_1=1));
run;
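SAS also has a WHERE statement, which is not covered on this slide but selects the same observations as the data set option. A minimal sketch, assuming course.main contains the variables case_1, birthyr, age and bmi as in the earlier examples:

```sas
/* WHERE as a data set option, as described above */
proc print data=course.main(where=(case_1=1));
    var birthyr age bmi;
run;

/* WHERE as a statement inside the proc step: same selection */
proc print data=course.main;
    where case_1=1;
    var birthyr age bmi;
run;
```

Both steps should print the same observations. The statement form is often easier to read when the condition is long.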
Slide 144
144 Subsetting a Data Set (cont.) WHERE option The expression
must be a logical one, resulting in true or false. Only
observations for which the expression is true will be used in the
proc step. Examples of expressions are:
where=(birthyr gt 1950)
where=(1947
Slide 145
145 Subsetting a Data Set (cont.) Conditional Operators
Possible conditional operators (use sign or abbreviation):
=     eq    equal to
^=    ne    not equal to
>     gt    greater than
<     lt    less than
>=    ge    greater than or equal to
<=    le    less than or equal to
Slide 147
147 Subsetting a Data Set (cont.) Examples
proc freq data=course.main(where=(birthyr