N U I T, NEWCASTLE UNIVERSITY
Getting Started With SAS Base and SAS Enterprise Guide
A Training Manual for Beginners
Dr. S. T. Kometa
Table of Contents
Introducing SAS Enterprise Guide (SAS-EG): A Typical Session
SAS Enterprise Guide and the SAS System
Introduction to SAS Enterprise Guide
Analysis Capabilities
Data Exploration, Analysis, and Reporting by Example
Example
Exercise One: Importing Data and Creating Notes
Exercise Two: Data Exploration using Charts, Summary Statistics and Correlation
Exercise Three: Combining Reports into a Single Document
Getting Started with the SAS System Point and Click Approach
Getting Started with the SAS System Programming Approach
Basic Statistical Procedures through Programming
Learning outcomes (Aims and objectives of this workshop)
This course gives a quick overview of the essentials of SAS. After completing this course you
should:
be able to create an SAS data file from scratch
be able to carry out some simple analyses on the data file
be able to present some of the data graphically
be able to open an Excel file in SAS
be able to use SAS Enterprise Guide
be able to interpret the output from the analyses
be able to use SAS with a degree of confidence
Important Notice: Please follow this instruction and change one default setting before you
continue. From the menu bar select Tools-> Options -> Tasks. Under “Default footnote text for
task output:” change the last argument to TIMEAMPM20.
Introducing SAS Enterprise Guide (SAS-EG): A Typical Session
A typical SAS-EG session involves starting SAS-EG, opening an SAS data file, requesting an
analysis or building a chart, and then leaving SAS-EG.
Starting SAS-EG
On Remote Application Service (RAS)
Type the following url in the address field of your internet browser: https://ras.ncl.ac.uk/. Log in
normally using your logon name and password. Single click on the Statistical Software folder.
Single click on SAS Enterprise Guide. Close the Welcome dialogue box.
On Common Desktop
Choose Start -> Programs -> Statistical Software. The available statistical software will be
displayed, choose SAS, and select SAS Enterprise Guide. Close the Welcome dialogue box.
Opening An SAS Data File (Set)
Click File -> Open -> Data…. Under the text area for File name: type
\\campus\software\dept\sas and click on Open. Select fall94 and click Open.
Running An Analysis
1. Select Describe -> Summary Statistics….
2. With Data selected in the Selection pane, control-click to select the variables sat, hsgpa,
and ptest in the Variables pane.
3. Drag these variables to the Task pane and drop them in as Analysis variables.
4. Select Status in the Variables pane and drag it to the Task pane to assign it as a
Classification variable.
5. Select Statistics in the Selection pane.
6. In the Basic statistics pane change the Maximum decimal places from Best fit to 2.
7. Select Percentiles in the Selection pane and select Median in the Percentile statistics pane.
8. Select Run to generate the descriptive statistics and plots.
Building A Chart
1. Select Graph -> Bar Chart….
2. With Bar Chart selected in the Selection pane, select Stacked Vertical Bar.
3. Select Data in the Selection pane.
4. In the Columns to assign pane, select College and drag it to the Column to Chart role in
the Task roles pane.
5. In the Columns to assign pane, select Status and drag it to the Stack role in the Task roles
pane.
6. Select Layout in the Selection pane. Deselect 2D by clicking on it.
7. In the Shape panel, select Cylinder from the drop-down menu.
8. Select Run.
Leaving SAS-EG
Click File -> Exit. Save Changes to Project? Select No.
SAS Enterprise Guide and the SAS System
When you use SAS Enterprise Guide, you are also using SAS software behind the scenes. As you
access data and build tasks, SAS Enterprise Guide generates SAS code. You can use all the features
of SAS Enterprise Guide without knowing how to program in SAS. When you run a task, the
generated code is sent to SAS for processing and the results are returned to SAS Enterprise Guide.
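To give a feel for what happens behind the scenes, the Summary Statistics task from the typical session corresponds roughly to a PROC MEANS step like the sketch below. It is hand-written for illustration and is not the exact code SAS Enterprise Guide generates. The libref campus is an assumed name, and the list of statistics is an assumption about which boxes are ticked in the task.

```sas
/* Assumed libref pointing at the folder that holds fall94 */
libname campus '\\campus\software\dept\sas' access=readonly;

/* Summary statistics for three measures, split by student status */
proc means data=campus.fall94 n mean std min max median maxdec=2;
  class status;           /* classification variable */
  var sat hsgpa ptest;    /* analysis variables      */
run;
```

MAXDEC=2 plays the same role as changing Maximum decimal places to 2 in the task, and MEDIAN matches the Percentiles selection.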
Introduction to SAS Enterprise Guide
Analysis Capabilities
An Overview of Statistical Tasks
To better understand the full capabilities of SAS Enterprise Guide, go through the menu bar and
examine all the available menus. Your attention is drawn to the following menus:
Describe
Analyze
ANOVA
Regression
Multivariate
Survival Analysis
Capability Analysis
Control Charts
Time Series
Graph
Exit Enterprise Guide.
Data Exploration, Analysis, and Reporting by Example
Objectives
Explore the Enterprise Guide environment.
Conduct data exploration and analysis.
Generate a simple report.
Example
Information was collected on students attending your university in academic year 1994. You want
to create some reports and graphs to summarize student characteristics.
The data set contains 18 pieces of information on 200 students. The data is already in SAS format.
Study the data file carefully to better understand it. The variables in the data set are shown below:
Exercise One: Importing Excel Data and Creating Notes
Follow these instructions to import the data set:
1. Open Enterprise Guide by single-clicking its icon.
2. When you open Enterprise Guide, you will see a window asking whether you want to open
an existing project or create a new project, SAS program, or data. Select New Project.
3. Next, import the data. To open the Microsoft Excel file, select File -> Import Data. In File
name: type \\campus\software\dept\spss and click Open.
4. Select Employee data and then select Open.
5. Follow the online instructions by selecting Next, Next, Next and Finish.
A new SAS data table is created from the Excel worksheet.
Exercise Two: Data Exploration using Charts, Summary Statistics and Correlation
1. Select File ->Open ->Data…. In File name: type \\campus\software\dept\sas and click
Open.
2. Select the file fall94, and then select Open. Note that this begins a new flow in the Process
Flow diagram, which we will work with from this point forward.
3. To add some text in the project describing the data, select File ->New ->Note. You can then
type information into the note. For example “Data file containing 18 pieces of information
on 200 students”.
4. Save the work done so far by selecting File -> Save Project As….
5. Save the project within a folder (directory) on your H: Drive (My Documents). If you like,
you can create a new folder for this tutorial within the folder My SAS Files.
6. Name the project Students, and then select Save.
Data Exploration
You have been asked to generate a report on the data from academic year 1994. In particular, the
administration is interested in comparing students by their status. Begin by creating some graphs
and descriptive statistics that look at the college the students are enrolled in and some of their
current and previous academic measures. Begin by creating some bar charts that show the status of
the students within each of the colleges. Follow these steps:
1. Select Tasks -> Graph -> Bar Chart….
2. With Bar Chart selected in the Selection pane, select Stacked Vertical Bar.
3. Select Data in the Selection pane.
4. In the Columns to assign pane, select College and drag it to the Column to Chart role in
the Task roles pane.
5. In the Columns to assign pane, select Status and drag it to the Stack role in the Task roles
pane.
6. Select Layout in the Selection pane. Deselect 2D by clicking on it.
7. In the Shape panel, select Cylinder from the drop-down menu.
8. Select Run.
Examine the bar chart. What information can you get from it?
…………………………………………………………………………………………………………
Recall that the definitions of the values of these variables were listed with the description of the
data set in the Example section.
If you point your cursor at the different parts of the bar chart, the information on that section is
displayed.
If you examine the Process Flow and Project Tree, you will notice the additions of the dialog that
created the bar chart and the output to each of these. In the Project window, you also can see and
have access to the Log window for this task.
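For readers curious about the code side, a stacked cylinder bar chart like the one built above can be sketched by hand with PROC GCHART. This is an illustrative approximation that assumes SAS/GRAPH is available, not the code the task itself writes; the libref campus is an assumed name.

```sas
/* Assumed libref pointing at the folder that holds fall94 */
libname campus '\\campus\software\dept\sas' access=readonly;

/* One 3D bar per college, stacked by student status */
proc gchart data=campus.fall94;
  vbar3d college / subgroup=status   /* stack the bars by status       */
                   shape=cylinder;   /* cylinder instead of block bars */
run;
quit;
```

If college is stored as a numeric code, adding the DISCRETE option after the slash would give one bar per code instead of bars for ranges of values.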
The office requesting the information would like some descriptive statistics about the students' high
school grade point averages, pretest scores, and sat scores. They would like this information
separately for each type of student (freshman, returning, continuing, and transfer students). To
generate the requested descriptive statistics, follow these steps:
1. Select Tasks -> Describe -> Summary Statistics….
2. With Data selected in the Selection pane, control-click to select the variables sat, hsgpa,
and ptest in the Variables to assign pane.
3. Drag these variables to the Task roles pane and drop them in as Analysis variables.
4. Select Status in the Variables pane and drag it to the Task roles pane to assign it as a
Classification variable.
5. Select Statistics in the Selection pane.
6. In the Basic statistics pane change the Maximum decimal places from Best fit to 2.
7. Select Percentiles in the Selection pane and select Median in the Percentile statistics pane.
8. Select Run to generate the descriptive statistics and plots.
Now examine the output. What information can you draw from the output?
…………………………………………………………………………………………………………
You are interested in examining the relationship between these three measures of student
performance prior to attending the university and the student performance at the university as
measured by their cumulative grade point average at the end of the Fall term (cumgpae). One
statistical measure that can be used is the correlation statistic. Follow these steps:
1. Select Tasks or Analyze -> Multivariate -> Correlations….
2. With Data selected in the Selection pane, control-click to select the variables sat, cumgpae,
hsgpa, and ptest in the Variables to assign pane.
3. Drag those selections to the Task Roles pane and drop them in as Analysis variables.
4. Select Options in the Selection pane and note that the Pearson correlation will be computed
by default.
5. Select Results in the Selection pane and click to select Create a scatter plot for each
correlation pair.
6. Select Run.
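The correlation steps above have a close programming counterpart in PROC CORR. The sketch below is hand-written under the assumption that the libref campus points at the course folder; it requests only the Pearson correlations and leaves out the scatter plots that the task can also draw.

```sas
/* Assumed libref pointing at the folder that holds fall94 */
libname campus '\\campus\software\dept\sas' access=readonly;

/* Pearson correlations among the four performance measures */
proc corr data=campus.fall94 pearson;
  var sat cumgpae hsgpa ptest;
run;
```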
Examine the output. What conclusion(s) can you draw from it?
…………………………………………………………………………………………………………
Now attempt to produce a two-dimensional scatter plot using any two of the variables you used to
produce the correlation.
In addition to two-dimensional scatter plots, you can also produce three-dimensional scatter plots.
To illustrate this, generate a 3D scatter plot with the variables cumgpae, hsgpa, and ptest. This will
allow you to explore the relationship between all three of the variables at the same time. Follow
these steps:
1. Select Tasks -> Graph -> Scatter Plot….
2. With Scatter Plot selected in the Selection pane, select 3D Scatter Plot in the Scatter Plot
pane.
3. Select Data in the Selection pane.
4. Assign hsgpa as the Horizontal variable, ptest as the Vertical variable, and cumgpae as
the Depth variable.
5. Select 3D Scatter in the Selection pane.
6. Change the Symbol type to Sphere using the drop down arrow. Note that you do have the
capability to control most aspects of the graph including such things as the axes and titles.
7. Select Run and examine the chart.
You can export a graph like this to a Word or PowerPoint presentation.
Exercise Three: Combine Reports into a Single Document
You might want to include some of these results in a report. Your results can be sent to several
different types of files, several of which would be useful for creating a report. You can also pick
and choose output to include in a document using the Document Builder in Enterprise Guide.
Follow these steps:
1. Select File -> New -> Report.
2. Select any part of the output (results) that you want to include in your report from the left
pane. Drop it in any of the squares in the right pane. Click OK.
The document has been added to the project and the process flow as Report. Save the document
outside of Enterprise Guide. Follow these steps:
1. Right-click on Report in the project tree or the process flow and select Export -> Export
Report….
2. Select Local Computer to export the document to.
3. Choose a location on your computer to save the document, and rename it.
4. Choose the report format that you want, e.g. HTML. Click Save.
At this point, you can open the html file that has been created in an application such as Microsoft
Word and add comments and conclusions. This file can then be saved and submitted electronically.
Getting Started with the SAS System
Point and Click Approach
You are advised to use this document alongside the program.
Starting SAS
To start SAS click on Start -> Programs -> Statistical Software -> The SAS System -> The SAS
System for Windows.
The SAS software has five main windows: Explorer, Results, Editor (Program), Log and Output
Windows. Windows may overlay one another. See Fig.1 below.
Fig. 1 The SAS Windows
Explorer window
Use this window for file management tasks such as moving, copying and deleting files. In this
window, you can also create new libraries and SAS files. You can display the Explorer window
with or without a tree view of its contents: select View -> Show Tree, or use the Toggle Tree
tool on the toolbar. See Fig. 2.
Fig. 1 callouts: pull-down menu bar; toolbar; command line area (click on the tick to submit a command); Explorer window; Results window (under the Explorer window); Editor window (the active window); Log window; Output window (under the Log and Editor windows); window bar; status bar with cursor position indicator and message line.
Fig. 2 The Explorer Window
Program Editor Window
In this window, you can enter, edit, and submit SAS programs. You can use more than one Editor
window. The main features of this window include:
colour coding and syntax checking of SAS language
expandable and collapsible sections, click on the (+) to expand and (-) to collapse
multi-level undo and redo. See Fig. 3 for details.
Fig. 3 Program Editor Window
Fig. 2 callouts: normal view and tree view of the Explorer window; toolbar for the Explorer window (unavailable tools are grayed) with New, New Library, Toggle Tree, Toggle Details, Programming Window, Explorer and Help.
Fig. 3 callouts: toolbar for the Editor window (unavailable tools are grayed) with New, Open, Save, Print Preview, Paste, Undo, New Library, SAS Explorer, Submit, Clear All, Break and Help.
Log window
This window displays important information about your SAS session and the programs you submit.
Always examine the log window before you look at your output.
Fig. 4 Log Window
Output Window
You can browse output from SAS programs that you submit in this window. By default, the
window is positioned behind the Editor and Log windows. The window automatically moves to the
front for your first output in your SAS session. For subsequent output in the same session, you will
need to select the output button to see the output.
Results Window
This window helps you to navigate and manage output from SAS programs that you submit. You
can view, save and print individual items of output. By default, the Results window is positioned
behind the Explorer window and it is empty until you submit a program that creates output. Then it
moves to the front of your display.
The Output and Results windows are shown together in Fig. 5 below.
Fig. 4 callout: toolbar for the Log window (unavailable tools are grayed); it is very similar to the Editor toolbar.
Fig. 5 Output and Results Windows
Using SAS window features
The active window determines what items are available on the main menu bar. In other words, each
window has its own menu selections that reflect the actions you can perform in the window. For
example, click the Explorer window and then select View. Examine the options available. Click
the Editor window and select View again. Examine the options available and notice that it offers
different selections.
Note:
you can minimize the Log, Editor and Output windows
right-click in any window to see the pop-up menus
the toolbar displays icons for many of the actions you perform most often in a particular
window. Select the various windows and examine how the toolbar changes.
Move your mouse to a tool and hold it there for a moment to see the name of the tool.
Getting Help in the SAS System
Help is available for all products in the SAS System. To appreciate the extensive help provided by
the SAS System, try the following:
Select Help. Take your time and examine this, particularly the following:
Using This Window
SAS Help and Documentation
Getting Started with the SAS Software
Learning SAS Programming
SAS on the Web
Make the Explorer window active, then select Help -> Using This Window. Take a few
moments and examine the task-oriented help available.
SAS Products
Base SAS, Enterprise Miner, SAS Enterprise Guide, SAS/ACCESS, SAS/AF, SAS/ASSIST,
SAS/CALC, SAS/CONNECT, SAS/EIS, SAS/ETS, SAS/FSP, SAS/GIS, SAS/GRAPH, SAS/IML,
SAS/INSIGHT, SAS/IntrNet, SAS/LAB, SAS/MDDB, SAS/OR, SAS/QC, SAS/SHARE,
SAS/SPECTRAVIEW, SAS/STAT, SAS/TOOLKIT, SAS/TUTOR, SAS/Warehouse
Administrator.
Programming Language
The SAS Software is also a very powerful programming language.
Exploring Files in SAS
In the SAS System, you can explore and manage both SAS files and other files. In the Explorer
window, you can view and manage SAS files, which are stored in libraries. A library is simply a
link to the physical location of a file such as a directory or folder. If you delete the library, the files
still exist in the directory or folder.
Make the Explorer window active and double-click on Libraries. All the active libraries
(directories) are listed. Double-click the Sashelp library. All members (files and directories) are
listed. Move back up to the top level, i.e. View -> Up One Level, or just click on its icon.
SAS allows you to explore files in your operating environment via View -> My Favorite Folders.
Working with SAS files: Details and Sort
To see detailed information about any file, double-click its library and then click the Toggle Details tool or select
View -> Details. In the Explorer window, you can sort files by any column. Just click the heading
of the column to sort the files. For example, click on the Type column to sort the files by file type.
Click it again to reverse the order. Go back to the original order by View -> Refresh.
File properties
The SAS System allows you to view the detailed properties of a file. In the Explorer window,
right-click the file you are interested in and select Properties from the pop-up menu. The General
Properties window will be displayed. To see other properties, click the tab at the top of this
window. Try this using any file in the sashelp library.
To open a file just double-click its icon in the Explorer window. The file will be opened using the
Viewtable window. Try this again using any file in the sashelp library.
Working with SAS Libraries
Creating a new library
There is already a folder in your H Drive (Documents) called My SAS Files. Check to confirm that
this folder is there. We need to create a new library in SAS and link it to this folder (My SAS Files).
All the SAS files that we create in this workshop will be stored in the folder (My SAS Files).
To create a new library follow these instructions:
Make the Explorer window active and double-click Libraries.
Select File -> New.
In the New Library window, type a name for the new library, e.g. MyNewLib. Leave the
default engine selected.
Select Enabled at Startup.
Click Browse, go to the H Drive (Documents) and select My SAS Files.
Click OK to create the library. MyNewLib appears in the active Libraries list.
Copying a table (file or data set)
Make sure that the Explorer window is active, then select View -> Show Tree. The window
will now have two panes, left and right.
Double-click the Sashelp library located in the left pane. Its content will be displayed in the
right pane. Scroll down and look for Prdsale table.
Click on Prdsale, drag and drop it on MyNewLib on the left pane.
Open MyNewLib and confirm that the table was copied successfully.
Note: You can change the name of a table quite easily. Right-click the name of the table and select
Rename from the pop-up menu. Re-name Prdsale as Myproductsales. Click OK.
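The same copy and rename can be done in code. The sketch below assumes the MyNewLib library has already been created as described above and that it does not yet contain a table called Myproductsales.

```sas
/* Copy the Prdsale table from Sashelp into MyNewLib */
proc copy in=sashelp out=mynewlib memtype=data;
  select prdsale;
run;

/* Rename the copied table inside MyNewLib */
proc datasets library=mynewlib nolist;
  change prdsale=myproductsales;
quit;
```

Check the Log window afterwards; it reports how many observations were copied and confirms the new name.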
Getting Started with the SAS System
Programming Approach
The SAS Language Introduction
The SAS software has a very powerful programming language. There are many things that you can
do through programming that you can't do through the point and click approach. In fact, the power
of SAS still lies in the programming language. This document introduces some of the basic ideas in
SAS programming.
Some basic ideas about SAS and its programming language
SAS programs: A SAS program is simply a sequence of statements executed in order. This can be
simple or complex in nature.
SAS statements: Every SAS statement ends with a semicolon (;). Some SAS statements have
optional parts or sections.
Layout of SAS programs:
SAS statements can be in upper or lowercase
A statement can continue on the next line as long as you don’t split words in two
More than one statement can be on the same line
Statements can start in any column
Comments: Including comments in your program makes it easy to read and understand. There are
two ways to include comments in your SAS programs:
One starts with an asterisk (*) and ends with a semicolon (;).
The other starts with slash asterisk (/*) and ends with asterisk slash (*/).
That is, your comment goes between these characters. SAS ignores comments during the execution
of your program.
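A short illustrative program that uses both comment styles and the layout rules listed above (the data set name layoutdemo and its variables are made up for this example):

```sas
* This is a statement-style comment;
/* This is a block-style comment */

DATA layoutdemo; x = 1; y = 2;   /* several statements on one line       */
     total = x
           + y;                  /* one statement continued on two lines */
run;                             /* lowercase works as well as uppercase */

proc print data=layoutdemo;
run;
```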
SAS Data Sets
You write SAS programs to create, manipulate or analyse data sets. It is therefore important to
understand what a SAS data set is. The following applies to SAS data sets:
Variables and observations: In SAS terminology, a data set consists of variables and observations.
In terms of relational databases, SAS data sets are also called tables, observations are called rows,
and variables are called columns.
Data types: There are just two data types in SAS: Numeric and Character. The dollar sign ($) is
used to denote character variables.
Missing data: Missing numeric data values are represented by periods (.), while missing character
values are represented by blanks.
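A minimal sketch of both data types and of missing values (the data set typesdemo and its contents are invented for illustration). In the data lines a single period stands for a missing value in either kind of column; SAS stores it as a period for the numeric variable and as a blank for the character variable.

```sas
data typesdemo;
  input name $ age;   /* $ marks name as character; age is numeric */
  datalines;
Anna 34
Ben .
. 27
;
run;

proc print data=typesdemo;
run;
```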
Size of SAS data sets: Older releases of SAS limited a single data set to 32,767 variables. In SAS 9.1
and later, the number of variables, like the number of observations, is limited by your computer’s capacity.
Rules for SAS names: When naming your variables and data sets bear the following rules in mind:
names must be 32 characters or fewer in length
names must start with a letter or underscore ( _ )
names can contain only letters, numerals, or underscores ( _ ). No % $ ! * & # @, please.
names can contain upper- and lowercase letters.
Information about a SAS data set, such as its name, the date it was created and the version of SAS used,
is stored as part of the data set. Information about the variables, such as type, name, length, format and
informat, is also stored.
The Two Parts of a SAS Program
A SAS program is made up of two parts: a data step and a procedure (proc) step. This can be simple
or complex. Examine the simple program below:
DATA firstdata;
quantity=10;
price=50;
Totalcost=quantity*price;
PROC PRINT DATA=firstdata;
RUN;
Data steps:
begin with DATA statements
read and modify data
create a SAS data set
The data step has built-in loops that execute line by line and observation by observation.
Proc steps:
begin with PROC statements
perform specific analysis or function
produce results or report
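To see the built-in loop of the data step at work, the sketch below writes one line to the Log window for every observation it processes. The automatic variable _N_ counts the passes through the data step; the data set loopdemo and its variables are made up for this example.

```sas
data loopdemo;
  input x;
  twice = 2 * x;
  put _n_= x= twice=;   /* one log line per pass through the data step */
  datalines;
1
2
3
;
run;

proc print data=loopdemo;
run;
```

The statements between DATA and DATALINES are written once but run three times, once for each data line.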
Choosing a Mode for Submitting SAS Programs
A SAS program does nothing until you submit or execute it. Once you have completed your
program, there are several ways to submit or execute the program:
SAS Windowing environment: Also known as interactive SAS. Run -> Submit. This is the
recommended method.
Non-interactive mode: The command is: system prompt> SAS filename.sas.
Batch or background mode: Common with mainframe computers.
Interactive line mode: Not recommended.
Windows and Commands in the SAS Windowing Environment
There are five basic SAS windows:
Results
Explorer
Program Editor
Log
Output
You may have up to three ways to issue commands:
Menus
The tool bar
SAS command bar (command line)
Submitting a Program in the SAS Windowing Environment
First, make sure that your program is in the Program Editor window. Use any of the following to
submit your program:
Use the Submit icon on the toolbar
Make the Program Editor window active and type SUBMIT in the command line and then
press the tick with your mouse pointer
Make the Program Editor window active and select SUBMIT from the RUN pull-down
menu
Use the method you feel most comfortable with. Once you submit a SAS program, messages are
written to the Log window. If your program has no errors, any output it produces is then
generated.
Viewing the SAS Log and Output: You can now examine the SAS Log and the output from your
program if any.
Reading the SAS Log
SAS writes messages to the Log window each time you submit a program. Always examine the
log before you examine your output, if any. There is plenty of important information in the log.
You may also find warnings and other types of notes, which sometimes indicate errors and other
times just provide useful information.
Viewing and Printing the SAS Output
How you view or print your output depends on how you submit your programs. If you submit your
program in the SAS windowing environment, then your output will go to the Output window by
default. A listing of the different parts of your output will also be displayed in the Results window.
Printing or saving the contents of the Output window should be easy. Note that you can print or
save all or just parts of your output.
Using SAS System Options
System options are parameters you can change that affect the SAS System. The SAS System makes
many assumptions about how you want it to work. System options give you a way to change some
of these assumptions.
You can see a list of system options and their current values by opening the SAS System Options
window (i.e. Tools -> Options -> System) or by using the OPTIONS procedure in the Editor
Window (i.e. PROC OPTIONS; RUN;). In the SAS windowing environment, you can change
system options using either of the following:
SAS System Options window
OPTIONS statement
The OPTIONS statement is one of the special SAS statements which does not belong to either a
PROC or a DATA step.
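A brief sketch of both approaches in code. The particular options chosen here (NODATE, NONUMBER, LINESIZE= and PAGESIZE=) are only examples, picked because they affect the look of the Output window.

```sas
/* Show the current value of a single option in the Log window */
proc options option=linesize;
run;

/* Change some options; they stay in effect until you change them
   again or end the SAS session */
options nodate nonumber linesize=80 pagesize=60;
```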
Standard and Non-standard Numeric data
Standard numeric data contain only numbers, decimal points, minus signs, and E for scientific
notation. Numbers with commas and pound signs are examples of non-standard data. To read
non-standard data you use SAS informats. To print out non-standard data you use SAS formats.
Examples will be highlighted as we proceed through this document.
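As a first illustration, the sketch below reads amounts that contain commas with the COMMA informat and prints them with the COMMA format. The data set salesdemo and its values are invented for this example; currency symbols are left out to keep it simple.

```sas
data salesdemo;
  input item $ amount :comma10.;   /* informat: strips the commas on input  */
  format amount comma10.2;         /* format: writes commas back on output  */
  datalines;
Laptop 1,250
Desk 300
Server 12,499.5
;
run;

proc print data=salesdemo;
run;
```

The colon before the informat tells SAS to keep reading the value up to the next blank, which is what allows an informat to be combined with list input.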
Practice 1
Type the simple program shown under The Two Parts of a SAS Program into the Editor window and
submit it (via Run -> Submit). The program starts with DATA and ends with a semicolon after RUN.
Examine the Log and Output Window.
GETTING YOUR DATA INTO THE SAS SYSTEM
SAS Data Libraries
This is used to store and manage data in SAS files. A SAS data library is a collection of one or
more SAS files that is recognized by the SAS System. Each file belongs to a particular library. A
library reference (libref) is the name you associate with the SAS data library. Each SAS data library
is assigned a libref. You reference files in the library by using the corresponding libref followed by
the name of the file.
Temporary and Permanent Data Sets
The SAS System creates two types of data sets:
1. Temporary data set - A temporary SAS data set exists only for the duration of the current
SAS session and it is lost forever when you exit the SAS System. The default libref for
temporary data set is work. In other words, the SAS System automatically assigns the libref
work to the data set name you specify. When working with files in this library, you can
specify the libref or omit it; SAS accepts either.
2. Permanent data set - A permanent SAS data set exists after the end of the current session and
until you delete it. When creating or working with a permanent SAS data set, you must specify
the libref and the data set name.
Both types of SAS data sets have two-level names in the form of libref.data-set-name. The sample
data sets found in SAS are permanent SAS data sets. For example, some data sets are kept in the
sashelp library. You can create your own library to store your SAS data sets permanently or use any
of the libraries available in SAS.
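The difference between the two types can be sketched in a few lines. The libref mylib and the folder path are assumptions made for this example; point the LIBNAME statement at a folder that exists on your own machine.

```sas
/* Assumed folder; replace with a directory that exists on your machine */
libname mylib 'H:\My SAS Files';

/* Temporary: a one-level name goes to the WORK library */
data tempdata;            /* the same as work.tempdata */
  x = 1;
run;

/* Permanent: a two-level name stores the file in the MYLIB folder */
data mylib.permdata;
  set work.tempdata;
run;
```

After you close and restart SAS, work.tempdata is gone, while permdata is still in the folder and becomes available again as soon as the libref is reassigned.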
Methods of Getting Your Data into the SAS System
Entering data directly into SAS data sets
Creating SAS data sets from raw data files
Converting other software’s data files into SAS data sets
Reading other software’s data files directly
Telling SAS Where to Find Your Raw Data
Internal Raw Data: Use the DATALINES or CARDS statement to tell SAS that raw data follows.
The following example illustrates this. Type and submit this little program. Examine the log
window and then the output window.
Practice 2
DATA seconddata;
INPUT name $ weight height;
CARDS;
James 65 1.65
John 70 1.50
Lisa 60 1.70
;
PROC PRINT DATA=seconddata;
RUN;
The DATA statement tells SAS to create a temporary data set called seconddata. The INPUT
statement tells SAS to create three variables: the first (name) is character and the other two
(weight and height) are numeric. A variable name followed by $ is a character variable;
otherwise it is numeric. The semicolon on the line after the data marks the end of the data lines.
The PROC PRINT statement displays the data set on the screen, not on paper.
External Raw Data Files: You can read raw data to create a SAS data set using the INFILE
statement, e.g. the following data in text format could be read into the SAS system using the
program below. As an exercise, use NotePad to type in the raw data. Save it as firsttrial in the
Temp directory in drive C. The data values are separated by space. Do not type in the first line.
Practice 3
Raw data firsttrial.txt
Lucky 2.3 1.9 . 3.0
Spot 4.6 2.5 3.1 .5
Tubs 7.1 . . 3.8
Hop 4.5 3.2 1.9 2.6
Noisy 3.8 1.3 1.8
1.5
Winner 5.7 . . .
Now type the following program in SAS Program Editor window to read the raw data file and
create a SAS data set.
Program to read raw data
* Create a temporary SAS data set named data1;
* Read the data file firsttrial.txt using list input;
DATA data1;
INFILE 'c:\Temp\firsttrial.txt';
INPUT ToadName $ Weight Jump1 Jump2 Jump3;
* Print the data to make sure that the file was read correctly;
PROC PRINT DATA = data1;
TITLE 'SAS Data Set Toads';
RUN;
The INPUT statement tells SAS which variables to read, in the order given.
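If you cannot save a file to the Temp directory, the variant below reads the same values as in-stream data instead. It is a sketch for experimenting with list input, and the data set name toads is an assumption. Note how the values for Noisy run over two lines: list input carries on reading from the next line until every variable in the INPUT statement has a value, and the log contains a note saying that SAS went to a new line.

```sas
data toads;
  input ToadName $ Weight Jump1 Jump2 Jump3;
  datalines;
Lucky 2.3 1.9 . 3.0
Spot 4.6 2.5 3.1 .5
Tubs 7.1 . . 3.8
Hop 4.5 3.2 1.9 2.6
Noisy 3.8 1.3 1.8
1.5
Winner 5.7 . . .
;
run;

proc print data=toads;
  title 'Toads read from in-stream data';
run;
```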
When dealing with long records, include the LRECL=option in the INFILE statement, e.g. Infile
'c:\stk\Training\sas\parks.txt' LRECL=2000; This ensure that data are not truncated.
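To show where the LRECL= option sits in a complete DATA step, here is a small sketch that reuses the Practice 3 file. The data set name data1_long is hypothetical, and the value 2000 is simply the one quoted above. The toad file does not actually need a long record length, so the option is included only to show the syntax.

```sas
* Sketch only: Practice 3 again, with LRECL= added to the INFILE statement;
* data1_long is a hypothetical data set name;
DATA data1_long;
   INFILE 'c:\Temp\firsttrial.txt' LRECL=2000;
   INPUT ToadName $ Weight Jump1 Jump2 Jump3;
RUN;

PROC PRINT DATA = data1_long;
   TITLE 'Toads read with LRECL=2000';
RUN;
```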
Importing Data
You can also import data directly into SAS. Follow these instructions.
Make sure that the Explorer window is your active window, then select File -> Import
Data… We want to import an Excel file.
On the displayed dialogue box, click on the dropdown arrow and select Microsoft Excel
Workbook on PC Files Server.
Click Next and select Browse…
Under File name: type \\campus\software\dept\spss. Then click Open and select Employee
data.
Click Open again. Notice that the full path of the file appears next to Workbook:. So if you
knew the full path of your file you could have typed it in.
Click OK. Then click Next and select Mynewlib from the library dropdown menu.
Under Member type the name of your file e.g. myfirstimport.
Click Next and click Finish. Check the Log window to make sure that the file was imported
successfully. Open and examine the file.
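An import like this can also be written as a program with PROC IMPORT. The sketch below rests on several assumptions. The file path and sheet name are hypothetical. DBMS=XLSX assumes an .xlsx workbook and a SAS installation licensed for SAS/ACCESS Interface to PC Files. The library Mynewlib must already be assigned.

```sas
* Sketch only: programmatic import of an Excel workbook;
proc import datafile='c:\Temp\employee_data.xlsx'  /* hypothetical path */
            out=mynewlib.myfirstimport             /* library.member as in the steps above */
            dbms=xlsx                              /* assumes an .xlsx workbook */
            replace;                               /* overwrite the member if it exists */
   sheet='Sheet1';                                 /* hypothetical sheet name */
   getnames=yes;                                   /* first row holds the variable names */
run;
```

As with the point-and-click route, check the Log window afterwards to confirm how many observations and variables were created.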
WORKING WITH YOUR DATA
Once you have successfully included your data in the SAS software, there is a lot that you can do
with it. This section of the document gives you examples of some of the things that you can do. You
can create new variables from existing ones, use SAS functions on variables, request various types
of statistics, build a model, etc.
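As a small illustration of creating new variables and using SAS functions, the sketch below builds on the seconddata data set from Practice 2, so it assumes that program has already been run in the current session. It also assumes that weight is in kilograms and height in metres, which the original data does not state. The names seconddata2, bmi, bmi_rounded and name_upper are made up for this example.

```sas
* Sketch only: derive new variables from existing ones;
DATA seconddata2;                      * hypothetical data set name;
   SET seconddata;                     * start from the Practice 2 data;
   bmi = weight / (height**2);         * new variable, assuming kg and metres;
   bmi_rounded = ROUND(bmi, 0.1);      * ROUND function: nearest 0.1;
   name_upper = UPCASE(name);          * UPCASE function on a character variable;
RUN;

PROC PRINT DATA=seconddata2;
RUN;
```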
Basic Statistical Procedures
Examining the Distribution of Data with PROC UNIVARIATE
This is a good procedure to help you explore your data before you do any formal statistical testing.
Type this simple programme into the data editor and submit it:
proc univariate data=stk.employeedata plot normal;
var salary salbegin;
run;
The normal option produces tests of normality while the plot option produces three plots of your
data (stem-and-leaf, box plot and normal probability plot).
Examine the output. What conclusion can you draw?
Hint: Skewness indicates how symmetric the distribution is, while kurtosis indicates how flat or
peaked it is. For a normal distribution, SAS reports values of zero for both skewness and
kurtosis. If the mean (average) is very different from the median, this suggests that the data is
skewed and therefore not normally distributed.
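To judge normality visually as well as from the statistics, you could add a histogram and a Q-Q plot. This is only a sketch using the same data set and the salary variable. The HISTOGRAM and QQPLOT statements are standard PROC UNIVARIATE statements, but whether the graphs appear depends on how graphics output is set up in your session.

```sas
* Sketch only: graphical checks of normality for salary;
proc univariate data=stk.employeedata normal;
   var salary;
   histogram salary / normal;                      /* histogram with fitted normal curve */
   qqplot salary / normal(mu=est sigma=est);       /* Q-Q plot against a fitted normal */
run;
```

Points that fall close to the reference line in the Q-Q plot are consistent with a normal distribution; a curved pattern suggests skewness.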
Producing Statistics with PROC MEANS
PROC MEANS produces many of the descriptive statistics that PROC UNIVARIATE produces. PROC UNIVARIATE
prints out all of these statistics by default, so if you know you want only a few of them,
PROC MEANS is the better choice. Type this simple programme into the data editor and submit it:
proc means data=stk.employeedata n mean median std mode;
var salary salbegin;
run;
Testing Categorical Data with PROC FREQ
PROC FREQ produces many statistics for categorical data, the best known of which is the chi-square
test. Type this simple programme into the data editor and submit it:
proc format;
value $gender 'm'='Male'
'f'='Female';
value group 1='Clerical'
2='Custodial'
3='Manager';
proc freq data=stk.employeedata;
format gender $gender. jobcat group.;
tables gender jobcat gender*jobcat / chisq measures;
run;
Examine the output. What conclusion can you draw?
Examining Correlations with PROC CORR
A correlation coefficient measures the strength of the linear relationship between two variables. Before
producing the correlation coefficient between two interval variables, it is advisable to produce a scatter plot first.
Type this simple programme into the data editor and submit it:
proc gplot data=stk.employeedata;
plot salary*salbegin = gender;
run;
proc corr data=stk.employeedata;
var salary salbegin;
run;
Examine the output. What conclusion can you draw?
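By default PROC CORR reports Pearson correlations. If the scatter plot suggests that the relationship is not linear, or the variables are clearly skewed, a rank-based coefficient may be more appropriate. As a sketch, the PEARSON and SPEARMAN options request both coefficients in one run so that you can compare them.

```sas
* Sketch only: Pearson and Spearman correlations side by side;
proc corr data=stk.employeedata pearson spearman;
   var salary salbegin;
run;
```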
Using PROC ANOVA for One-Way Analysis of Variance
The ANOVA procedure is one of several in the SAS System that perform analysis of variance.
PROC ANOVA is specifically designed for balanced data, that is, data where there are equal numbers of
observations (cases or subjects) in each classification. If your data is not balanced, use the GLM
procedure (see next example). To do this example you first need to import the data file from
\\campus\software\dept\spss. The file is an SPSS file and is called ‘number of words recalled’, so
you must tell SAS that you want to import an SPSS file in the Import window (File -> Import
Data…). Import the file into your library Mynewlib and give it the member name words. Type this
simple programme into the data editor and submit it:
proc anova data=stk.words;
class group;
model score=group;
means group / scheffe;
run;
quit; *the quit statement is necessary because proc anova will still be running;
Examine the output. What conclusion can you draw?
Using PROC GLM
Use this when your data is unbalanced. Type this simple programme into the data editor and submit
it:
proc glm data=stk.employeedata;
class jobcat;
model salary=jobcat;
means jobcat / scheffe;
run;
quit;
Examine the output. What conclusion can you draw?
Using PROC TTEST
You can use this procedure for group means comparison, one-sample comparison or paired sample
comparison. It is assumed that the data is normally distributed.
Group Means Comparison (Independent samples)
Type this simple programme into the data editor and submit it:
ods graphics on;
proc ttest data=stk.employeedata;
class gender;
var salary;
run;
ods graphics off;
Examine the output. What conclusion can you draw?
One-Sample Comparison
Is it true that the average salary of people working in this company is $30,000? Type this simple
programme into the data editor and submit it:
ods graphics on;
proc ttest data=stk.employeedata h0=30000;
var salary;
run;
ods graphics off;
Examine the output. What conclusion can you draw?
Paired Sample Comparison
Some common examples of paired samples are:
pre- and post-test scores for a student receiving tutoring
fuel efficiency readings of two fuel types observed on the same automobile
sunburn scores for two sunblock lotions, one applied to the individual’s right arm, one to the
left arm
political attitude scores of husbands and wives
Type this simple programme into the data editor and submit it:
ods graphics on;
proc ttest data=stk.employeedata;
paired Salary*Salbegin;
run;
ods graphics off;
Examine the output. What conclusion can you draw?
Nonparametric Analysis
Most parametric statistical tests assume that your data is normally distributed. If your data fails this
assumption, you should use nonparametric analysis instead. We have seen that salary is not normally
distributed.
Independent samples
Type this simple programme into the data editor and submit it:
ods graphics on;
proc npar1way data=stk.employeedata wilcoxon median;
class gender;
var salary;
run;
ods graphics off;
Examine the output. What conclusion can you draw?
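For paired data, one common nonparametric approach is to compute the difference between the two measurements and test whether it is centred on zero. The sketch below is one way of doing this and is not part of the original exercise. The data set name salarydiff and the variable diff are hypothetical. PROC UNIVARIATE reports a sign test and a Wilcoxon signed rank test for the difference in its "Tests for Location" table.

```sas
* Sketch only: nonparametric test for paired samples;
data salarydiff;                 * hypothetical data set name;
   set stk.employeedata;
   diff = salary - salbegin;     * paired difference for each employee;
run;

* The Tests for Location table includes the sign and signed rank tests;
proc univariate data=salarydiff;
   var diff;
run;
```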
Using PROC REG
You use PROC REG to build a linear regression model. Type this simple programme into the data editor
and submit it:
ods graphics on;
proc reg data=stk.employeedata;
model salary=salbegin prevexp;
run;
ods graphics off;
Adding to programme.
proc format;
value $gender 'm'='Male'
'f'='Female';
value group 1='Clerical'
2='Custodial'
3='Manager';
proc means data=mynewlib.employeedata n mean median std mode;
format gender $gender. jobcat group.;
class gender jobcat;
var salary salbegin;
run;