81

Advanced SAS Programming Techniques ()

Embed Size (px)

DESCRIPTION

The DATA Step, Input/Output,SAS Functions,Looping and Arrays,The NULL Data Set, Data Step Examples,

Citation preview

Page 1: Advanced SAS Programming Techniques ()

Advanced SAS ProgrammingTechniques

A Workshop Presented to the

Alaska Chapter

The American Fisheries Society

E. Barry Moser

Department of Experimental Statistics

Louisiana State University

and

Louisiana State University Agricultural Center

Baton Rouge, LA 70803

Phone: 504-388-8376

FAX: 504-388-8344

E-mail: [email protected]

September 29-October 3, 1997

Page 2: Advanced SAS Programming Techniques ()

Contents

1 Introduction 3

2 The DATA Step 4

2.1 The DATA STEP process : : : : : : : : : : : : : : : : : : : : : : : : : : : : 4

2.1.1 An implicit loop : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 4

2.1.2 RETURN, DELETE, and OUTPUT : : : : : : : : : : : : : : : : : : 5

2.1.3 Compound Statements : : : : : : : : : : : : : : : : : : : : : : : : : : 7

2.1.4 Data Set Options : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 8

2.1.5 DROP, KEEP, and RETAIN : : : : : : : : : : : : : : : : : : : : : : 10

2.2 Input/Output : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 10

2.2.1 List Input : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 10

2.2.2 Column Input : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 13

2.2.3 Pointer Control and Formatted Input : : : : : : : : : : : : : : : : : 14

2.2.4 The PUT Statement : : : : : : : : : : : : : : : : : : : : : : : : : : : 18

2.2.5 SAS Formats and Informats : : : : : : : : : : : : : : : : : : : : : : : 19

2.3 SAS Functions : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 21

2.3.1 Mathematical Functions : : : : : : : : : : : : : : : : : : : : : : : : : 21

2.3.2 Random Number Generators : : : : : : : : : : : : : : : : : : : : : : 22

2.3.3 String Functions : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 23

2.3.4 Date and Time Functions : : : : : : : : : : : : : : : : : : : : : : : : 24

2.3.5 PUT and INPUT Functions : : : : : : : : : : : : : : : : : : : : : : : 25

2.4 Looping and Arrays : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 26

2.4.1 Univariate and Multivariate Data Views : : : : : : : : : : : : : : : : 27

1

Page 3: Advanced SAS Programming Techniques ()

CONTENTS 2

2.4.2 Indeterminant DO Loops : : : : : : : : : : : : : : : : : : : : : : : : 32

2.5 The NULL Data Set : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 33

2.6 Data Step Examples : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 35

2.6.1 Simple Random Sampling Without Replacement : : : : : : : : : : : 35

2.6.2 Data Recoding : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 36

3 Working With Files 38

3.1 External Files : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 38

3.1.1 FTP Access : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 44

3.1.2 WWW Access : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 45

3.2 Including External SAS Code : : : : : : : : : : : : : : : : : : : : : : : : : : 45

3.3 The SAS Data Library : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 45

3.3.1 The LIBNAME Statement : : : : : : : : : : : : : : : : : : : : : : : : 46

3.3.2 Library Procedures : : : : : : : : : : : : : : : : : : : : : : : : : : : : 47

3.4 File Import/Export/Transport : : : : : : : : : : : : : : : : : : : : : : : : : 51

3.4.1 Import/Export : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 51

3.4.2 Transport : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 53

3.5 The X Files : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 55

4 The Macro Language 57

4.1 Macro Variables : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 57

4.2 Macro Procedures : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 59

4.3 Bootstrap Example : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 62

4.4 Cluster Dendrogram : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 66

5 SAS Special Files 70

5.1 Autoexec.sas : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 70

5.2 Con�g.sas : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 72

5.3 Pro�le.sct : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 75

6 SAS Internet Tools 76

6.1 Capturing OUTPUT for the Web : : : : : : : : : : : : : : : : : : : : : : : : 76

Page 4: Advanced SAS Programming Techniques ()

Chapter 1

Introduction

The SAS1 system, composed of many diverse components, is a very powerful programming

environment, data management and data analysis environment, and report generation and

graphics presentation environment. This manuscript was developed for a short-course in

\advanced SAS." Obviously the coverage will have to be quite limited. The coverage is

designed around material that I have encountered through my teaching, research, and sta-

tistical consulting work that I believe will be relevant and useful for others dealing with

basic data management and statistical analysis needs. This manuscript is not intended as

a SAS language or SAS system reference manual, it hardly scratches the surface. Nor is

it designed to show how to do statistical analysis with the SAS system. The manuscript

will �rst focus on the data step, as a lot of the power of the SAS environment can be

demonstrated through the data step. Next, the input/output and library system will be

discussed. Later the macro language will be introduced. And �nally several chapters dealing

with various parts of the system, graphics, data analysis procedures, and the internet will

be introduced. As this is an \advanced" course, some items will be introduced before they

are actually covered in some detail. This was purposefully done so as to avoid completely

arti�cial-looking or contrived examples (although a few do exist, sorry). Further, since not

all of the \basics" are covered, keep copies of the SAS manuals available. The best way to

learn the SAS system and to bene�t from this course is to experiment with the examples

and to create your own. Again, DO NOT hessitate to modify the examples and to create

new ones.

1SAS, SAS/BASE, SAS/GRAPH, SAS/ACCESS, SAS/ASSIST, SAS/FSP, SAS/INSIGHT, SAS/OR,

SAS/ETS, SAS/IntrNet, SAS/IML, and SAS/STAT are registered trademarks or copyrights of SAS Institute,

Inc., Cary, NC.

3

Page 5: Advanced SAS Programming Techniques ()

Chapter 2

The DATA Step

2.1 The DATA STEP process

2.1.1 An implicit loop

To understand much of what happens in the data step, one �rst needs to understand its

overall design. When originally conceived, the SAS data step was designed to get data

stored in some \raw" format into the SAS data format and to perform any transformations

and computations on that data prior to data analysis with procedures that would follow.

Thus, the data step was designed with an implicit loop around the data input. That is,

rather than the programmer having to explicitly write a loop around the input code, as

would need to be done with FORTRAN (and most other programming languages), the loop

was already assumed to be needed, and was, therefore, automatically supplied. At the

end of the implied loop, the resulting data, in the form of variables, is output to the new

SAS data set. The programmer writes the code needed to process a single observation of

data, and the data step will then automatically repeat this same code for each observation,

outputing each new observation in turn into the new SAS data set. This same basic process

is also followed when a SAS data set is created from an existing SAS data set, such as when

several SAS data sets are concatenated or merged together. The example below illustrates

the basic looping process using a portion of the ier data set.

Title2 "Simple Data Step";Data One;Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;

Datalines;7 21 74 2 7 3 0 5 3.5 1.4 57.001 9 76 2 3 2 0 1 5.3 3.0 0.0012 18 74 2 4 1 0 5 5.4 3.4 83.202 15 76 2 1 1 0 5 6.0 5.7 111.009 13 75 2 2 2 1 5 10.1 23.4 203.00

;Proc Print Data=One;Run;

4

Page 6: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 5

Data Step ExamplesSimple Data Step

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL

1 7 21 74 2 7 3 0 5 3.5 1.4 57.002 1 9 76 2 3 2 0 1 5.3 3.0 0.003 12 18 74 2 4 1 0 5 5.4 3.4 83.204 2 15 76 2 1 1 0 5 6.0 5.7 111.005 9 13 75 2 2 2 1 5 10.1 23.4 203.00

2.1.2 RETURN, DELETE, and OUTPUT

The behavior of the data step loop can be modi�ed by several statements. The RETURN

statement causes execution of a loop to \return" to a speci�c point in a loop. When the

loop is the implicit data step loop, execution returns immediately to the beginning of the

data step loop. If no OUTPUT statements are contained in the data step, then the RETURN

statement also outputs the current observation in whatever stat it is in into the SAS data

set.

Title2 "RETURN Statement";Data One;Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;If TSL=0 Then RETURN;ConvFact=Lt/TSL;

Datalines;7 21 74 2 7 3 0 5 3.5 1.4 57.001 9 76 2 3 2 0 1 5.3 3.0 0.0012 18 74 2 4 1 0 5 5.4 3.4 83.202 15 76 2 1 1 0 5 6.0 5.7 111.009 13 75 2 2 2 1 5 10.1 23.4 203.00

;Proc Print Data=One;Run;

Data Step ExamplesRETURN Statement

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL CONVFACT

1 7 21 74 2 7 3 0 5 3.5 1.4 57.00 0.0614042 1 9 76 2 3 2 0 1 5.3 3.0 0.00 .3 12 18 74 2 4 1 0 5 5.4 3.4 83.20 0.0649044 2 15 76 2 1 1 0 5 6.0 5.7 111.00 0.0540545 9 13 75 2 2 2 1 5 10.1 23.4 203.00 0.049754

Notice that the conversion factor, CONVFACT, for the second observation is missing (rep-

resented by a period). This observation could be dropped from the data set by several

methods. We'll consider a couple to further illustrate the behavior of the data step. The

OUTPUT statement can be used to output an observation to a data set or data sets. When it

is present in a data step, observations will ONLY BE OUTPUT when the OUTPUT statement

is executed. Consider the example below using both the RETURN and OUTPUT statements.

Page 7: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 6

Title2 "RETURN and OUTPUT Statements";Data One;Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;If TSL=0 Then RETURN;ConvFact=Lt/TSL;OUTPUT;

Datalines;7 21 74 2 7 3 0 5 3.5 1.4 57.001 9 76 2 3 2 0 1 5.3 3.0 0.0012 18 74 2 4 1 0 5 5.4 3.4 83.202 15 76 2 1 1 0 5 6.0 5.7 111.009 13 75 2 2 2 1 5 10.1 23.4 203.00

;Proc Print Data=One;Run;

Data Step ExamplesRETURN and OUTPUT Statements

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL CONVFACT

1 7 21 74 2 7 3 0 5 3.5 1.4 57.00 0.0614042 12 18 74 2 4 1 0 5 5.4 3.4 83.20 0.0649043 2 15 76 2 1 1 0 5 6.0 5.7 111.00 0.0540544 9 13 75 2 2 2 1 5 10.1 23.4 203.00 0.049754

Note that the second observation was never output to the new SAS data set. Now, what

if we would like the observations with zero (or no) total scale length (TSL) to be placed

into one data set and the others to be placed into another data set. Let's use the OUTPUT

statement to do this for us.

Title2 "OUTPUT to Different Data Sets";Data WithTSL NoTSL;Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;ConvFact=Lt/TSL;If TSL=0 Then OUTPUT NoTSL;Else OUTPUT WithTSL;

Datalines;7 21 74 2 7 3 0 5 3.5 1.4 57.001 9 76 2 3 2 0 1 5.3 3.0 0.0012 18 74 2 4 1 0 5 5.4 3.4 83.202 15 76 2 1 1 0 5 6.0 5.7 111.009 13 75 2 2 2 1 5 10.1 23.4 203.00

;Title3 "With TSL > 0";Proc Print Data=WithTSL;Run;

Title3 "TSL Not Measured";Proc Print Data=NoTSL;Run;

Page 8: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 7

Data Step ExamplesOUTPUT to Different Data Sets

With TSL > 0

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL CONVFACT

1 7 21 74 2 7 3 0 5 3.5 1.4 57.00 0.0614042 12 18 74 2 4 1 0 5 5.4 3.4 83.20 0.0649043 2 15 76 2 1 1 0 5 6.0 5.7 111.00 0.0540544 9 13 75 2 2 2 1 5 10.1 23.4 203.00 0.049754

Data Step ExamplesOUTPUT to Different Data Sets

TSL Not Measured

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL CONVFACT

1 1 9 76 2 3 2 0 1 5.3 3 0 .

One problem that we will have here is that the conversion factor will be computed on

each and every observation and so we get a divide by zero error on the second observation

(TSL=0). Further, since no conversion factor is possible when the scale length is not

measured, we should like to drop the CONVFACT variable from the NoTSL data set.

2.1.3 Compound Statements

A compound statement is a programming statement that consists of several simple state-

ments. In the SAS Language, compound statements are constructed using the DO and END

statements. Often a compound statement is the object of a conditional statement such as

the IF statement. Using the Flier data set from the previous example, we can prohibit the

divide by zero error and output the observations to the proper data sets using a compound

statement.

Title2 "Compound Statements";Data WithTSL NoTSL;Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;If TSL=0 Then OUTPUT NoTSL;ElseDo;ConvFact=Lt/TSL;OUTPUT WithTSL;End;

Datalines;7 21 74 2 7 3 0 5 3.5 1.4 57.001 9 76 2 3 2 0 1 5.3 3.0 0.0012 18 74 2 4 1 0 5 5.4 3.4 83.202 15 76 2 1 1 0 5 6.0 5.7 111.009 13 75 2 2 2 1 5 10.1 23.4 203.00

;Title3 "With TSL > 0";Proc Print Data=WithTSL;Run;

Title3 "TSL Not Measured";Proc Print Data=NoTSL;Run;

Page 9: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 8

Data Step ExamplesCompound Statements

With TSL > 0

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL CONVFACT

1 7 21 74 2 7 3 0 5 3.5 1.4 57.00 0.0614042 12 18 74 2 4 1 0 5 5.4 3.4 83.20 0.0649043 2 15 76 2 1 1 0 5 6.0 5.7 111.00 0.0540544 9 13 75 2 2 2 1 5 10.1 23.4 203.00 0.049754

Data Step ExamplesCompound StatementsTSL Not Measured

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL CONVFACT

1 1 9 76 2 3 2 0 1 5.3 3 0 .

Unfortunately, the CONVFACT variable is still present in the NoTSL data set, although it

was not computed for any observations in this data set. A DROP statement could be used

to remove the CONVFACT variable from ALL output data sets, but this is not what we

would like either.

2.1.4 Data Set Options

There are a number of options that can be speci�ed for accessing SAS data sets, including

passwords to protect a data set from being read, edited, or written to. These options are

speci�ed within parenthesis following the data set name and can be used within a data step

or a procedure step. Some options that are commonly used are

KEEP= Speci�es a list of variables to be retained in a new data set.

DROP= Speci�es a list of variables to be excluded from a new data set.

LABEL= Speci�es a label that is to be stored in the SAS data set and will be shown

using various data set utilities.

RENAME= Permits variable names to be changed. The old names are available in the

current data step, while the new names will actually be stored in the new data set.

Options that are used for the processing of an existing SAS data set, such as when using

SET, MERGE, and UPDATE statements, include

KEEP= Speci�es a list of variables to be accessible from an existing data set.

DROP= Speci�es a list of variables to be inaccessible from an existing data set.

FIRSTOBS= Speci�es the observation number with which processing should begin.

OBS= Speci�es the last observation number to be processed, after which pro-

cessing will stop.

Page 10: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 9

IN= Names a new variable that will have the value 1 when the current ob-

servation is read from the data set and 0 when the current observation is read from

another data set.

Now we can use this information to update our program so that we can have a di�erent set

of variables for the two new data sets.

Title2 "DROP Data Set Option";Data WithTSL NoTSL(DROP=ConvFact);Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;If TSL=0 Then OUTPUT NoTSL;ElseDo;ConvFact=Lt/TSL;OUTPUT WithTSL;End;

Datalines;7 21 74 2 7 3 0 5 3.5 1.4 57.001 9 76 2 3 2 0 1 5.3 3.0 0.0012 18 74 2 4 1 0 5 5.4 3.4 83.202 15 76 2 1 1 0 5 6.0 5.7 111.009 13 75 2 2 2 1 5 10.1 23.4 203.00

;Title3 "With TSL > 0";Proc Print Data=WithTSL;Run;

Title3 "TSL Not Measured";Proc Print Data=NoTSL;Run;

Data Step ExamplesDROP Data Set Option

With TSL > 0

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL CONVFACT

1 7 21 74 2 7 3 0 5 3.5 1.4 57.00 0.0614042 12 18 74 2 4 1 0 5 5.4 3.4 83.20 0.0649043 2 15 76 2 1 1 0 5 6.0 5.7 111.00 0.0540544 9 13 75 2 2 2 1 5 10.1 23.4 203.00 0.049754

Data Step ExamplesDROP Data Set OptionTSL Not Measured

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL

1 1 9 76 2 3 2 0 1 5.3 3 0

Finally, this is what we had wanted. Note that the KEEP= option could also have been used,

but would have required that we list each of the variables that we wanted to keep in the

new data set. The choice as to KEEP or DROP is usally based upon which list is shorter or

easiest to write.

Page 11: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 10

2.1.5 DROP, KEEP, and RETAIN

There are a few statements in the SAS data step that are not executable statements and can

be placed anywhere within the SAS data step. As with the DROP= and KEEP= data set options,

the DROP and KEEP statements are used to specify which variables are to be excluded or

included in the new data set or data sets being created. Others include the FORMAT, LENGTH,

and ARRAY statements. The RETAIN statement can also be placed anywhere but can serve

several roles. Normally, after the variables have been output for an observation at the end

of the data step loop, the variables' values are reset to missing before the next observation

is processed. The RETAIN statement alters this behavior by not resetting the data values for

any variables given in its list of variables. Another use of the RETAIN statement is to give

initial values to speci�c variables. Examples of the RETAIN statement will be encountered

later.

2.2 Input/Output

One of the many powerful features of the SAS language is the diversity of methods, and

modi�cations to them, for inputing data into a SAS data set. We will review several of the

more important methods and then consider some options and modi�cations that become

more and more useful as the data become more complicated to read. For reading data from

a spreadsheet, for example, you may want to skip to the section on �le import and export

on page 51. We will next look at output methods that can be very useful for generating

reports that have to be in very speci�c formats.

The basic statement used for reading raw data from a �le is the INPUT statement. It takes as

its arguments a list of variables, pointer placement instructions, and informat information.

For options that are used to modify the standard behavior of the input process, the INFILE

statement is used.

2.2.1 List Input

The list input method is the simplest of methods and works in very many situations. Essen-

tially, one speci�es the variable names in the order in which they occur in the data set with

no pointer placement instructions. The values on a line will be read in order and placed into

the variables according to their order in the INPUT statement. This is seen in the \Simple

Data Step" example on page 4.

The behavior of the input process, however, depends upon having at least as many data

values on each line as there are variables in the INPUT statement. If the number of data

values is less than the number of variables speci�ed in the INPUT statement, then, by default,

the input pointer will be moved to the beginning of the next line of input and the remainder

of the variables will be �lled using data on this new line. The SAS data step parser will

report the following information in the SAS LOG:

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

Page 12: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 11

This may not be what you have intended. Consider the example below where there are

several missing data values scattered throughout the input data set. Note that the data

values become mismatched with which variables they should correspond with.

Title2 "List Input / Missing Data";Data One;Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;

Datalines;7 21 74 2 7 3 0 5 3.51 9 76 2 3 2 0 1 5.3 3.0 0.0012 18 74 1 0 5 5.4 3.4 83.202 15 76 2 1 1 0 5 6.0 5.7 111.009 13 75 2 2 2 1 5 23.4 203.001 28 75 2 3 1 1 5 13.5 33.2 242.00

;

Proc Print Data=One;Run;

Data Step ExamplesList Input / Missing Data

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL

1 7 21 74 2 7 3 0.0 5.0 3.5 1 92 12 18 74 1 0 5 5.4 3.4 83.2 2 153 9 13 75 2 2 2 1.0 5.0 23.4 203 1

First of all, the month and day information for observation 2 was used as the weight and

total scale length data for observation 1. Second, observe that the remainder of the data for

observation 2 was discarded. Third, observation 3 has missing data for the area and station

variables, but the sex and age data were used for these variables. Finally, we note that

instead of 6 observations, the �nal data set has only 3. What has happened is the following.

When list input is used, the input processor simply scans a line of data until it comes to

non-blank information. It will then place this data into the next variable, in order, from

the INPUT statement. It does not matter in which columns the data are placed on the line,

except that their left-to-right order corresponds with the order of the variables in the INPUT

statement. If insu�cient values are found to �ll the variables, then the processor moves on

to the next line of data. When all variables have been �lled, the processor stops reading

in data and the current line of data being read is, by default, discarded. This is why the

remainder of line 2 was not used. You should now be able to duplicate the assignment of

the data to the variables by following the above rules.

How to modify the list input to handle the missing data? If the missing data all occur at

the end of a data line, as might happen with repeated measurements data where dropouts

would have no data after dropping out, then the MISSOVER option of the INFILE statement

can be used. This option speci�es that the pointer is NOT to be moved to the next line if

no more values are found on a data line, rather, the remaining variables are to be assigned

the value for missing data (by default, a period for numeric data and a blank for character

data).

Page 13: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 12

Title2 "Missing Data / Missover Option";Data Trout;/* Flack Lake trout Catch Data*/Infile Datalines MISSOVER;Input Year Age3-Age9;

Datalines;1975 0 105 674 446 16 2 21976 46 422 838 726 70 4 41977 3 310 1224 1068 651978 14 354 1264 1172 69 0 61979 6 429 1222 1067 192;Proc Print Data=Trout;Run;

Data Step ExamplesMissing Data / Missover Option

OBS YEAR AGE3 AGE4 AGE5 AGE6 AGE7 AGE8 AGE9

1 1975 0 105 674 446 16 2 22 1976 46 422 838 726 70 4 43 1977 3 310 1224 1068 65 . .4 1978 14 354 1264 1172 69 0 65 1979 6 429 1222 1067 192 . .

Notice that we did get 5 observations with the correct data values. In this case the missing

values are not truly missing, but rather are zero values. The data step below would change

the missing values to zeros. See the section on looping and arrays on page 26 to see how to

solve this problem more generally and easily.

Title2 "Missing Data / Missover Option";Data Trout;/* Flack Lake trout Catch Data*/Infile Datalines MISSOVER;Input Year Age3-Age9;If Age8=. Then Age8=0;If Age9=. Then Age9=0;

Datalines;1975 0 105 674 446 16 2 21976 46 422 838 726 70 4 41977 3 310 1224 1068 651978 14 354 1264 1172 69 0 61979 6 429 1222 1067 192;

Now what if the missing data are interior to other data line values? In order to continue

to use list input, one needs to enter the missing value symbol in the data lines for these

values. This is illustrated below.

Title2 "List Input / Missing Data";Data One;Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;

Datalines;7 21 74 2 7 3 0 5 3.5 . .1 9 76 2 3 2 0 1 5.3 3.0 0.0012 18 74 . . 1 0 5 5.4 3.4 83.202 15 76 2 1 1 0 5 6.0 5.7 111.009 13 75 2 2 2 1 5 . 23.4 203.001 28 75 2 3 1 1 5 13.5 33.2 242.00

;

Page 14: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 13

Proc Print Data=One;Run;

Data Step ExamplesList Input / Missing Data

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL

1 7 21 74 2 7 3 0 5 3.5 . .2 1 9 76 2 3 2 0 1 5.3 3.0 0.03 12 18 74 . . 1 0 5 5.4 3.4 83.24 2 15 76 2 1 1 0 5 6.0 5.7 111.05 9 13 75 2 2 2 1 5 . 23.4 203.06 1 28 75 2 3 1 1 5 13.5 33.2 242.0

2.2.2 Column Input

When the values for variables are placed in speci�c columns, as can be produced from

a printed spreadsheet or data base program, then column input can be very valuable. In

addition, since it only looks for data in the speci�ed columns, it is easy to skip over unwanted

information and to easily handle missing data. Consider the data set above with the internal

missing values, then read the data using column input. If you are using the SAS Program

Editor window to edit the data and you have \line numbers" turned on, then you can enter

the \cols" line command over any of the line numbers and you will get a reference line to

determine the columns from.

Title2 "Column Input / Missing Data";Data One;Input Mo 2-3 Day 6-7 Yr 10-11 Ar 14 St 18

Sex 22 Age 27 Sn 31 Lt 32-36 Wt 37-42 TSL 43-50;/* Some comment lines to help us find the columns *//* 1 2 3 4 5*//*345678901234567890123456789012345678901234567890*/Datalines;7 21 74 2 7 3 0 5 3.51 9 76 2 3 2 0 1 5.3 3.0 0.0012 18 74 1 0 5 5.4 3.4 83.202 15 76 2 1 1 0 5 6.0 5.7 111.009 13 75 2 2 2 1 5 23.4 203.001 28 75 2 3 1 1 5 13.5 33.2 242.00

;

Proc Print Data=One;Run;

Data Step ExamplesColumn Input / Missing Data

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL

1 7 21 74 2 7 3 0 5 3.5 . .2 1 9 76 2 3 2 0 1 5.3 3.0 0.03 12 18 74 . . 1 0 5 5.4 3.4 83.24 2 15 76 2 1 1 0 5 6.0 5.7 111.05 9 13 75 2 2 2 1 5 . 23.4 203.06 1 28 75 2 3 1 1 5 13.5 33.2 242.0

Page 15: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 14

One can also mix the various input methods. Thus, the input line could have been written

as

Input Mo Day Yr Ar 14 St 18 Sex Age Sn Lt 32-36 Wt 37-42 TSL 43-50;

since the date, sex, age, and scale number information were never missing. It is more

common to use either all list or all column input.

2.2.3 Pointer Control and Formatted Input

There are also a number of special pointer controls that can move the pointer around in the

input line or to change its behavior. Similar to column input, the pointer can be positioned

at a particular column using @n, then list (or formatted) input can be used to read the data

in, where the value n is the column number. One can also use the symbols +n and -n to

move the pointer forward or backward along the line. When the data for an observation falls

on more than a single physical input line, the pointer can be moved on to additional lines

using the symbol #n, where n is the line number relative to the current line. For character

data, the symbol & can be used to tell the pointer that a single space does not separate

variable values for a speci�c character variable. The ier data set is modi�ed so that we

can illustrate these techniques. The data for a single �sh will now be placed on 2 physical

lines for input and the �rst variable will be the scale reader's name.

Title2 "Pointer Control / Multiple Input Lines";Data One;Length Name $13;Input @1 Name & Mo Day Yr @26 Ar 1. @30 St 1.

Sex Age Sn #2 @1 Lt 5.1 @6 Wt 6.1 @12 TSL 8.2;/* Some comment lines to help us find the columns *//* 1 2 3 4 5*//*345678901234567890123456789012345678901234567890*/Datalines;Bill Smith 7 21 74 2 7 3 0 53.5

Bill Smith 1 9 76 2 3 2 0 15.3 3.0 0.00

John P. Doe 12 18 74 1 0 55.4 3.4 83.20

John P. Doe 2 15 76 2 1 1 0 56.0 5.7 111.00

John P. Doe 9 13 75 2 2 2 1 523.4 203.00

Bill Smith 1 28 75 2 3 1 1 513.5 33.2 242.00;

Proc Print Data=One;Run;

Data Step ExamplesPointer Control / Multiple Input Lines

OBS NAME MO DAY YR AR ST SEX AGE SN LT WT TSL

1 Bill Smith 7 21 74 2 7 3 0 5 3.5 . .2 Bill Smith 1 9 76 2 3 2 0 1 5.3 3.0 0.03 John P. Doe 12 18 74 . . 1 0 5 5.4 3.4 83.24 John P. Doe 2 15 76 2 1 1 0 5 6.0 5.7 111.0

Page 16: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 15

5 John P. Doe 9 13 75 2 2 2 1 5 . 23.4 203.06 Bill Smith 1 28 75 2 3 1 1 5 13.5 33.2 242.0

Note that we had to leave at least 2 spaces separating the reader's name from the month

value since we used the & symbol, otherwise the month value would have been read as part

of the name. Secondly, since we did not specify a character format, we needed to de�ne the

length of the character variable using the LENGTH statement. Otherwise the length would

have been determined from the �rst value read in. Using formatted character input we

could have used the INPUT statement

Input @1 Name $Char13. Mo Day Yr @26 Ar 1. @30 St 1.Sex Age Sn #2 @1 Lt 5.1 @6 Wt 6.1 @12 TSL 8.2;

and have dropped the LENGTH statement from the program. SAS also has input formats,

called \informats" for reading in other types of data such as social security numbers, phone

numbers, and dates and times. We will give some examples of working with SAS dates and

times later. The input formats can be explicitly given on the INPUT statement, or can be

assigned to the variables using the INFORMAT statement.

When reading complex data �les controlling the pointer can be very important. Normally

after the input statement has been executed for an observation, the physical input line

is discarded and the next physical input line is moved into the input bu�er. Sometimes,

however, you would like to read what is on the physical line at speci�c points, say looking

for special key words, then you use an input statement that depends upon the key word.

The \trailing" @ and @@ signs can be used to hold the pointer on the current input line.

The single @ sign holds the line until another input statement releases it or until the data

step loop restarts. The double @@ sign holds the current line even after the data step loop

restarts. They are called \trailing" because the symbol is placed at the very end of the

input statement (just before the semicolon).

Example: Multiple observations per input line

In this example more than one observation is placed on a single physical line. To keep the

example simple, we will assume that there is no missing data, or if missing data occurr

they are indicated in the data set by a \." surrounded by a space. The data set consists of

lengths and weights of ier sun�sh.

Title2 "Multiple Observations Per Input Line";Data One;Input Lt Wt @@;Datalines;3.5 1.4 5.3 3.0 5.4 3.4 6.0 5.7 10.1 23.413.5 33.2 8.7 16.8 11.9 44.4 12.0 40.9 16.0 103.49.1 17.9 10.5 29.1 17.2 127.1 17.9 132.712.1 47.9 17.2 136.4 17.3 138.3 17.5 134.413.2 42.2 16.4 110.0 16.7 101.6 15.3 92.915.4 76.3 18.0 125.8 18.5 131.6;Proc Print Data=One;Run;

Page 17: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 16

Data Step ExamplesMultiple Observations Per Input Line

OBS LT WT

1 3.5 1.42 5.3 3.03 5.4 3.44 6.0 5.75 10.1 23.46 13.5 33.27 8.7 16.88 11.9 44.49 12.0 40.910 16.0 103.411 9.1 17.912 10.5 29.113 17.2 127.114 17.9 132.715 12.1 47.916 17.2 136.417 17.3 138.318 17.5 134.419 13.2 42.220 16.4 110.021 16.7 101.622 15.3 92.923 15.4 76.324 18.0 125.825 18.5 131.6

Example: Conditional input

In this example, we have data on lengths and weights of �sh collected on di�erent sampling

dates. To keep the amount of typing to a minimun, the data were coded such that the date

of collection falls on one line by itself and then on separate lines come the length-weight

data pairs measured on that date. Then the next date appears followed by its length-weight

data pairs. Since there need not be the same number of �sh measured on each date, we must

test to see whether the input line contains a date or contains length-weight data. Since all

of the date lines contain a \/" while none of the length-weight data lines do, we can test

for the presence of the \/" on the input line using the INDEX() function.

Page 18: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 17

Title2 "Conditional Input";Data One;Drop Test;Retain Date;Length Test $8;Input Test @;If Index(Test,'/') Then /* is a date value */Input @1 Date MMDDYY8.;Else /* is a fish LT WT observation */Do;Input @1 Lt Wt;Output;End;

Datalines;4/6/973.5 1.45.3 3.05.4 3.46.0 5.710.1 23.413.5 33.28.7 16.811.9 44.412.0 40.916.0 103.49.1 17.95/12/9710.5 29.117.2 127.117.9 132.712.1 47.917.2 136.417.3 138.317.5 134.46/4/9713.2 42.216.4 110.016.7 101.615.3 92.915.4 76.318.0 125.818.5 131.6;Proc Print Data=One;Format Date MMDDYY8.;Run;

Page 19: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 18

Data Step ExamplesConditional Input

OBS DATE LT WT

1 04/06/97 3.5 1.42 04/06/97 5.3 3.03 04/06/97 5.4 3.44 04/06/97 6.0 5.75 04/06/97 10.1 23.46 04/06/97 13.5 33.27 04/06/97 8.7 16.88 04/06/97 11.9 44.49 04/06/97 12.0 40.910 04/06/97 16.0 103.411 04/06/97 9.1 17.912 05/12/97 10.5 29.113 05/12/97 17.2 127.114 05/12/97 17.9 132.715 05/12/97 12.1 47.916 05/12/97 17.2 136.417 05/12/97 17.3 138.318 05/12/97 17.5 134.419 06/04/97 13.2 42.220 06/04/97 16.4 110.021 06/04/97 16.7 101.622 06/04/97 15.3 92.923 06/04/97 15.4 76.324 06/04/97 18.0 125.825 06/04/97 18.5 131.6

Note that we needed to retain the date variable so that its value would be maintained

through each loop of the data step. The date is only changed when a date value is found on

the input line. Also note that in this example the date format for the DATE variable was

speci�ed in the PROC PRINT section rather than in the data step. If the FORMAT statement

is placed in the data step, then the format will be used by future procedures. If it is placed

in a procedure step, then the format is local to that procedure and will not carry on to

future procedures.

2.2.4 The PUT Statement

The PUT statement provides the output interface for the SAS data step. It works very

similar to the INPUT statement in that a list of variables and constants can be given, or

pointer control can be used to position the pointer to get data placed into speci�c positions,

or a combination of them can be used. Additionally, SAS formats (to be discussed later

on page 19) can also be used. This all provides for a very powerful report writing tool. A

simple example will be used to illustrate some of this methodology.

Page 20: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 19

Title2 "Report Writing";Data One;Input @1 Name $Char13. Mo Day Yr @26 Ar 1. @30 St 1.

Sex Age Sn #2 @1 Lt 5.1 @6 Wt 6.1 @12 TSL 8.2;File PRINT; /* Send to the OUTPUT window */If _N_ = 1 Then Put // @10 "Flier Sunfish Scale Report";PUT @5 Name $13. @20 Mo 2. '/' Day 2. '/' Yr 2.

@30 "Area=" Ar @41 St= / @5 Sex Age SnLt 10.3 Wt 10.0 TSL 10.2;

Datalines;Bill Smith 7 21 74 2 7 3 0 53.5

Bill Smith 1 9 76 2 3 2 0 15.3 3.0 0.00

John P. Doe 12 18 74 1 0 55.4 3.4 83.20

John P. Doe 2 15 76 2 1 1 0 56.0 5.7 111.00

John P. Doe 9 13 75 2 2 2 1 523.4 203.00

Bill Smith 1 28 75 2 3 1 1 513.5 33.2 242.00;

Data Step ExamplesReport Writing

Flier Sunfish Scale ReportBill Smith 7/21/74 Area=2 ST=73 0 5 3.500 . .Bill Smith 1/ 9/76 Area=2 ST=32 0 1 5.300 3 0.00John P. Doe 12/18/74 Area=. ST=.1 0 5 5.400 3 83.20John P. Doe 2/15/76 Area=2 ST=11 0 5 6.000 6 111.00John P. Doe 9/13/75 Area=2 ST=22 1 5 . 23 203.00Bill Smith 1/28/75 Area=2 ST=31 1 5 13.500 33 242.00

There are several features to notice in this particular example, besides its very unattractive

appearance. Note that string constants can be printed in the report simply by enclosing

the data within quotes. Note also that the output format does not have to be the same

as the input format. Further, the name of a variable along with its value can be obtained

simply by listing the variable's name on the PUT statement followed immediately by the

equals sign. The pointer control symbol / is used to move the pointer to the next line. The

FILE statement can also be used to redirect the output report to a �le or device such as a

printer.

2.2.5 SAS Formats and Informats

The SAS System provides a large number of built-in formats, as well as informats. The date

and time formats can be especially useful. When date data are input using a date informat,

the date is actually stored internally as a numerical value representing the number of days

Page 21: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 20

since a speci�c date. We'll see what that value is in a moment. What this means for us

is several things. First, it becomes easy to sort the data by date without having to worry

about a character values such as \01/19/97" being sorted before \05/03/92". Secondly, it is

easy to compute the number of days between any two SAS dates, simply take the di�erence

between them. Thirdly, the output format can be changed from that used upon input. Let's

look at an example.

Title2 "SAS Dates";Data One;Retain Now;If _N_=1 Then Now=Today(); /* Get the current day */Input Name $Char10. Birthday mmddyy8.;If Birthday=. Then Birthday=0;Bday=Birthday; Sday=Birthday; MDYday=Birthday;DaysOld=Now-Birthday;Format Bday WeekDatX29. Sday Date7. Now MDYday mmddyy8.;

Datalines;Bob 12/08/87Erica 3/15/92Sammy 06/7/57Keith .;Proc Sort Data=One;By Birthday;Run;Proc Print Data=One NoObs;Var Name MDYday Sday Bday Birthday Now DaysOld;Run;

Data Step Examples 1SAS Dates

NAME MDYDAY SDAY BDAY BIRTHDAY NOW DAYSOLD

Sammy 06/07/57 07JUN57 Friday, 7 June 1957 -938 09/16/97 14711Keith 01/01/60 01JAN60 Friday, 1 January 1960 0 09/16/97 13773Bob 12/08/87 08DEC87 Tuesday, 8 December 1987 10203 09/16/97 3570Erica 03/15/92 15MAR92 Sunday, 15 March 1992 11762 09/16/97 2011

Now we discover that SAS dates are relative to January 1, 1960 as this is the date corre-

sponding to the SAS date of zero. Note that because we did not assign a SAS date format

to the BIRTHDAY variable, that the actual SAS date value was printed. The example also

demonstrated the SAS formats WEEKDATXn. and DATEn., where n is the format width. If the

width is not su�cient to write out the entire names, as requested, then abbreviations will

be used where possible. Decimal values of SAS dates can also be used to contain internal

values for time. See the BASE SAS documentation for these formats. SAS date-time values

are entered in a date set as MM/DD/YY:hh:mm:ss where MM/DD/YY is the date, while hh is

the hour, mm is the minutes, and ss is the seconds in the time. When typed within the data

step, such as in an IF statement, a date is enclosed in quotes and followed by the letter d,

such as "09/29/97"d while times are followed by the letter t, such as "8:30"t. Note that

hour takes on the values 0 through 23, where 0 is midnight.

Formats can also be constructed using PROC FORMAT. Once constructed, these formats can

be used as with any other format. These formats provide a very nice way for coding data

very simply for input, but then producing reports with very nice labels for values. We'll

create a format for the SEX variable used in the ier data and use it to write out the labels

Page 22: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 21

on the printout.

Title2 "Proc FORMAT";Proc Format;Value Sex 1="Male" 2="Female" 3="Unknown";Run;Data One;Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;Format Sex Sex.;

Datalines;7 21 74 2 7 3 0 5 3.5 1.4 57.001 9 76 2 3 2 0 1 5.3 3.0 0.0012 18 74 2 4 1 0 5 5.4 3.4 83.202 15 76 2 1 1 0 5 6.0 5.7 111.009 13 75 2 2 2 1 5 10.1 23.4 203.00

;

Proc Print Data=One;Run;

Data Step ExamplesProc FORMAT

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL

1 7 21 74 2 7 Unknown 0 5 3.5 1.4 57.002 1 9 76 2 3 Female 0 1 5.3 3.0 0.003 12 18 74 2 4 Male 0 5 5.4 3.4 83.204 2 15 76 2 1 Male 0 5 6.0 5.7 111.005 9 13 75 2 2 Female 1 5 10.1 23.4 203.00

2.3 SAS Functions

The SAS system provides a very large number of functions for computing a wide diversity

of things. Some of the more common functions that are encountered in basic data analysis

and management are described below. Keep in mind that there are many more functions

and function families than are described herein.

2.3.1 Mathematical Functions

� ABS(value) returns the absolute value of the numeric argument.

� COS(value) returns the cosine in radians of value.

� EXP(value) returns the constant e raised to the power given by value.

� INT(value) returns the integer part of a real number.

� LOG(value) returns the natural logarithm of value.

� LOG10(value) returns the base 10 logarithm of value.

� MOD(value,divisor) returns the integer remainder when value is divided by divisor.

Page 23: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 22

� ROUND(value,decimals) returns value rounded o� to the nearest value based upon

the decimals value. For example, ROUND(123.456,0.1) returns 123:5 and ROUND(123.456,10.0)

returns 120. If the decimals value is omitted, a value of 1 is assumed.

� SIN(value) returns the sine of value.

� SQRT(value) returns the square root of value.

� SUM(v1,v2,...,vn) returns the sum of the non-missing values contained in the ar-

gument list.

2.3.2 Random Number Generators

All of the random number generators require a seed to start them. A seed of zero can be

used to seed the generator with a value derived from the system clock. For a given positive

integer seed, a generator will return exactly the same random number series. There are

other generators available, and through programming and the use of existing generators,

variates from other distributions can be generated.

� NORMAL(seed) returns a standard normal random variate. seed is the value which

speci�es the position within the psuedorandom number stream the variates are se-

lected from.

� RANBIN(seed,n,p) returns a binomial random variate from the binomial distribution

with parameters n and p.

� RANNOR(seed) is the same as the NORMAL(seed) function.

� RANPOI(seed,lambda) returns a Poisson random variate from the Poisson distribution

with parameter lambda.

� RANUNI(seed) returns a continuous uniform random variate from the interval (0; 1).

� UNIFORM(seed) is the same as RANUNI(seed).

A simple example to generate a standard normal variate Z, and from it a normal variate Y

with mean u and standard deviation s is given below. Assume u=5 and s=3.

Title2 "Normal Random Variates";Data One;Drop I u s;Retain u 5 s 3;Do I=1 To 25;Z=Normal(0); /* Use system clock as seed */Y=u + s*Z;Output;

End;Run;

Proc Print Data=One;Run;

Proc Means Data=One Mean Std;

Page 24: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 23

Var Z Y;Run;

Data Step ExamplesNormal Random Variates

OBS Z Y

1 2.01531 11.04592 0.58587 6.75763 0.18383 5.55154 -1.08207 1.75385 -1.87971 -0.63916 -0.87702 2.36897 0.63108 6.89328 1.53379 9.60149 -0.34128 3.976110 0.47535 6.426111 1.26282 8.788512 0.64412 6.932313 2.11873 11.356214 0.25801 5.774015 0.08347 5.250416 -1.37542 0.873717 -1.00606 1.981818 -0.68600 2.942019 -0.91837 2.244920 -1.28510 1.144721 -1.69460 -0.083822 -1.37999 0.860023 0.35292 6.058824 0.76663 7.299925 0.00566 5.0170

Data Step ExamplesNormal Random Variates

Variable Mean Std Dev------------------------------------Z -0.0643208 1.1334234Y 4.8070375 3.4002701------------------------------------

2.3.3 String Functions

String functions can be very useful for the processing of complex data sets and for subsetting

data sets according to values contained within strings. Some commonly used string functions

include:

� COMPRESS(string) returns a new string that has blanks removed and then padded at

the end of the string.

� INDEX(string,value) returns the position in the string where the �rst character of

value begins within the string. If value is not contained within the string, then the

function returns zero. Thus, it is very useful for checking for the presence or absence

Page 25: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 24

of a certain value in a string variable, such as testing that a string variable contains

the last name of a particular person.

� LEFT(string) will left align a character string by removing any leading blanks from

the string. Note that the string's length is not changed by this function.

� LENGTH(string) returns the \length" of a string. Here, the \length" is de�ned to

be the position of the right-most non-blank character in the string, rather than the

number of characters reserved for storage of the string.

� LOWCASE(string) converts all uppercase characters to lowercase characters in string.

� RIGHT(string) will right align a character string by removing trailing blanks from

the end of the string and inserting them at the beginning of the string. Thus, it does

not change the length of the string.

� SCAN(string,n,delimiters) returns the nth word from the character string string,

where words are delimited by the characters in delimiters. If delimiters is omitted

from the function, then blanks and most punctuation and special characters are used

as the delimiters. Consult the SAS help or SAS/BASE documentation. If there are

fewer words in the string than given by n, then a blank character string is returned.

� SUBSTR(string,start,n) returns a substring or part of string beginning with the

character at the position start in the string and continuing for n characters. If n is

omitted, then the remainder of the string is extracted.

� SUBSTR(string1,start)=string2 replaces the characters in string1 beginning at

position start in string1 with string2.

� TRIM(string) returns a new string whose trailing blanks have been removed and

whose length corresponds with the position of the last non-blank character in string.

A blank string, however, is returned as a string with one blank character.

� TRIMN(string) is like TRIM(string), but a blank string is returned as a null string

(length of zero).

� UPCASE(string) returns a new string will any lowercase characters replaced with their

uppercase counterparts.

2.3.4 Date and Time Functions

� DATE() with no argument returns the current system date.

� DATETIME() with no argument returns the current system date and time as a SAS

date-time value.

� DHMS(date,hour,minute,second) returns a SAS date-time value by combining a SAS

date value with the hour, minute, and second values.

� MDY(month,day,year) returns a SAS date from month, day, and year values.

Page 26: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 25

� TIME() with no argument returns the current system time.

� TODAY() is the same as DATE().

� WEEKDAY(date) returns an integer from 1 to 7, 1=Sunday, 7=Saturday, corresponding

to the day of the week for the SAS date date.

2.3.5 PUT and INPUT Functions

The PUT(value,format.) and INPUT(value,informat.) functions permit input/output

(I/O) to variables rather than to some I/O device. The value can be a variable or a

constant value, and the format or informat should conform to the value's type.

The example below uses the date function MDY(), the string functions COMPRESS(), SCAN(),

and UPCASE(), the I/O function PUT(), and the numeric function LOG(). It also uses the

string operator || that concatenates or joins two strings together.

Title2 "SAS Functions";Data One;Length Name $13 Chardate $8;Input @1 Name & Mo Day Yr @26 Ar 1. @30 St 1.

Sex Age Sn #2 @1 Lt 5.1 @6 Wt 6.1 @12 TSL 8.2;SASdate=MDY(Mo,Day,Yr);Chardate=Compress(Put(Mo,Z2.)||"/"||Put(Day,Z2.)||"/"||Put(Yr,Z2.));SASdate2=Input(Chardate,mmddyy8.);If Wt NE . Then LnWt=Log(Wt);Lastname=Upcase(Scan(Name,3," "));If Lastname=" " Then Lastname=Upcase(Scan(Name,2," "));Format SASdate SASdate2 mmddyy8.;

Datalines;Bill Smith 7 21 74 2 7 3 0 53.5

Bill Smith 1 9 76 2 3 2 0 15.3 3.0 0.00

John P. Doe 12 18 74 1 0 55.4 3.4 83.20

John P. Doe 2 15 76 2 1 1 0 56.0 5.7 111.00

John P. Doe 9 13 75 2 2 2 1 523.4 203.00

Bill Smith 1 28 75 2 3 1 1 513.5 33.2 242.00;Proc Print Data=One;Var Name Lastname Mo Day Yr SASdate Chardate SASdate2 Wt;Run;

Data Step ExamplesSAS Functions

OBS NAME LASTNAME MO DAY YR SASDATE CHARDATE SASDATE2 WT

1 Bill Smith SMITH 7 21 74 07/21/74 07/21/74 07/21/74 .2 Bill Smith SMITH 1 9 76 01/09/76 01/09/76 01/09/76 3.03 John P. Doe DOE 12 18 74 12/18/74 12/18/74 12/18/74 3.44 John P. Doe DOE 2 15 76 02/15/76 02/15/76 02/15/76 5.75 John P. Doe DOE 9 13 75 09/13/75 09/13/75 09/13/75 23.46 Bill Smith SMITH 1 28 75 01/28/75 01/28/75 01/28/75 33.2

Page 27: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 26

Note the use of the Zn. format to supply the leading zeros needed to mimic the look of

the MMDDYYn. format. Recognize that time calculations and ordering (sorting) can be made

with the SASDATE and SASDATE2 variables, while the CHARDATE varible is simply a character

representation of the date. If we had wanted to \ ag" all observations with a date prior to

"07/04/75"d, then it is easy using the SAS date variables,

If SASDATE < "07/04/75"d Then Flag="PRIOR";Else Flag="AFTER";

while much more programming is needed if we use the character variable. Basically, we

would have to create our own \SAS date" representation of the character variable for such

comparisons.

2.4 Looping and Arrays

In a number of circumstances the same task needs to be performed multiple times over a

set of observations, variables, or times. The implicit loop of the data step is already seen

to provide a loop around the observations in an input data set. SAS arrays can be used

to facilitate looping over a set variables. The ARRAY statement lists the names of variables

to be treated as a set such that they can be referenced through an indexing variable to an

array. The syntax of the ARRAY statement is

ARRAY arrayname{number|* {,number,number,...}} list_of_variables

where arrayname is the name for the array and {number|*} is either the number of variables

in the that dimension of the array or is \*" indicating that all variables listed are to be used

in the one dimensional array. Using the Flier sun�sh data, let's assume that we wished to

create a set of new variables that were the logarithmic transformation of a set of the original

variables. Rather than write a number of assignment statements, we can use arrays, a single

assignment statement, and a loop to complete the task. The following SAS code illustrates

the brute force method that we wish to avoid.

Title2 "Brute Force Approach to Repetitive Task";Data One;Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;If TSL=0 Then DELETE;LnLt=Log(Lt); LnWt=Log(Wt); LnTSL=Log(TSL);

Datalines;7 21 74 2 7 3 0 5 3.5 1.4 57.001 9 76 2 3 2 0 1 5.3 3.0 0.0012 18 74 2 4 1 0 5 5.4 3.4 83.202 15 76 2 1 1 0 5 6.0 5.7 111.009 13 75 2 2 2 1 5 10.1 23.4 203.00

;

Page 28: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 27

Now let's re-write this program using arrays and the basic \do" loop.

Title2 "Arrays and Do Loop";Data One(DROP=I);Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;If TSL=0 Then DELETE;Array RawVars{3} Lt Wt TSL;Array NewVars{3} LnLt LnWt LnTSL;Do I=1 To 3;NewVars(I)=Log(RawVars(I));

End;Datalines;7 21 74 2 7 3 0 5 3.5 1.4 57.001 9 76 2 3 2 0 1 5.3 3.0 0.0012 18 74 2 4 1 0 5 5.4 3.4 83.202 15 76 2 1 1 0 5 6.0 5.7 111.009 13 75 2 2 2 1 5 10.1 23.4 203.00

;

The data sets produced by each of these programs are the same. However, imagine that

instead of 3 variables that needed to be transformed, there were many more. Multidimen-

sional arrays are also allowed simply by specifying multiple subscript sizes.

2.4.1 Univariate and Multivariate Data Views

In many instances measurements are made at the same location or on the same individuals

through time. These repeated measures data can be viewed and analyzed using both uni-

variate and multivariate approaches. In the multivariate approach, each measurement made

on the same individual is treated as a di�erent variable, while in the univariate approach,

each measurement is treated as a separate observation made on the same individual. In the

latter case, one variable is used to identify the individual while another is used to hold the

value of the measurement. Below we will treat the Flack lake trout data in both multivari-

ate and univariate formats. This data set has catch data reported on trout aged 3 through

9 years old for the years 1968 through 1979. First consider the multivariate view of the

data where each catch number for each age is kept in separate variables.

Title2 "Multivariate Data View";Data TroutM;/* Flack Lake Trout Catch Data*/Input Year Age3-Age9;

Datalines;1968 13 129 646 954 99 19 41969 19 169 416 1031 243 47 181970 40 354 606 479 152 18 71971 32 606 1424 644 157 23 171972 0 226 1178 1156 116 16 51973 2 165 593 982 428 22 111974 53 209 560 410 30 0 41975 0 105 674 446 16 2 21976 46 422 838 726 70 4 41977 3 310 1224 1068 65 0 01978 14 354 1264 1172 69 0 61979 6 429 1222 1067 192 0 0;Proc Print Data=TroutM;Run;

Page 29: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 28

Data Step ExamplesMultivariate Data View

OBS YEAR AGE3 AGE4 AGE5 AGE6 AGE7 AGE8 AGE9

1 1968 13 129 646 954 99 19 42 1969 19 169 416 1031 243 47 183 1970 40 354 606 479 152 18 74 1971 32 606 1424 644 157 23 175 1972 0 226 1178 1156 116 16 56 1973 2 165 593 982 428 22 117 1974 53 209 560 410 30 0 48 1975 0 105 674 446 16 2 29 1976 46 422 838 726 70 4 410 1977 3 310 1224 1068 65 0 011 1978 14 354 1264 1172 69 0 612 1979 6 429 1222 1067 192 0 0

Now, let's rearrange the data into the univariate view, where one variable will contain the

age of the catch and another will contain the number caught of that age. We will start with

the raw data and use a loop to read in the data.

Title2 "Multivariate To Univariate Data View I";Data TroutU;/* Flack Lake trout Catch Data*/Input Year @;Do Age=3 To 9;Input Number @;Output;

End;Datalines;1968 13 129 646 954 99 19 41969 19 169 416 1031 243 47 181970 40 354 606 479 152 18 71971 32 606 1424 644 157 23 171972 0 226 1178 1156 116 16 51973 2 165 593 982 428 22 111974 53 209 560 410 30 0 41975 0 105 674 446 16 2 21976 46 422 838 726 70 4 41977 3 310 1224 1068 65 0 01978 14 354 1264 1172 69 0 61979 6 429 1222 1067 192 0 0;Proc Print Data=TroutU(Obs=22);Run;

Page 30: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 29

Data Step ExamplesMultivariate To Univariate Data View I

OBS YEAR AGE NUMBER

1 1968 3 132 1968 4 1293 1968 5 6464 1968 6 9545 1968 7 996 1968 8 197 1968 9 48 1969 3 199 1969 4 16910 1969 5 41611 1969 6 103112 1969 7 24313 1969 8 4714 1969 9 1815 1970 3 4016 1970 4 35417 1970 5 60618 1970 6 47919 1970 7 15220 1970 8 1821 1970 9 722 1971 3 32

Here we only listed the �rst 22 observations of the data set to illustrate the format of the

univariate view.

Title2 "Multivariate To Univariate Data View II";Data TroutU;Drop Age3-Age9;Array Ages{3:9} Age3-Age9;Set TroutM;Do Age=3 To 9;

Number=Ages(Age);Output;

End;Run;

Proc Print Data=TroutU;Run;

This data and print step will produce the same output as the previous one. In this instance

we are accessing a SAS data set that already has the data in the multivariate view. Notice

that the ARRAY statement speci�es the beginning and ending value of the array index (3

and 9). Notice also that the DROP statement is needed to remove the \old" variables AGE3

through AGE9 from the new data set.

PROC TRANSPOSE

Before leaving this section it is worth looking at a procedure developed to convert between

the univariate and multivariate data views. Although not appropriate for all problems, it

can be very useful for many. PROC TRANSPOSE takes an input data set, and based upon

Page 31: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 30

some structure commands, creates a new data set with a di�erent con�guration. For this

example we will again use the lake trout data and we will input it into the multivariate

view. Then PROC TRANSPOSE will be called to convert it to the univariate view. The DATA=

and OUT= options on the PROC TRANSPOSE statement specify the input and output SAS data

sets, respectively. The VAR statement lists the variables that will be transposed. The BY

statement instructs PROC TRANSPOSE to treat each year separately, i.e., we want to transpose

all of the values of the speci�ed variables before moving on to the next year.

Title2 "PROC TRANSPOSE";Data TroutM;/* Flack Lake Trout Catch Data*/Input Year Age3-Age9;

Datalines;1968 13 129 646 954 99 19 41969 19 169 416 1031 243 47 181970 40 354 606 479 152 18 71971 32 606 1424 644 157 23 171972 0 226 1178 1156 116 16 51973 2 165 593 982 428 22 111974 53 209 560 410 30 0 41975 0 105 674 446 16 2 21976 46 422 838 726 70 4 41977 3 310 1224 1068 65 0 01978 14 354 1264 1172 69 0 61979 6 429 1222 1067 192 0 0;Proc Transpose Data=TroutM Out=TroutU;By Year Notsorted;Var Age3-Age9;Run;Proc Print Data=TroutU;Run;

Data Step ExamplesPROC TRANSPOSE

OBS YEAR _NAME_ COL1

1 1968 AGE3 132 1968 AGE4 1293 1968 AGE5 6464 1968 AGE6 9545 1968 AGE7 996 1968 AGE8 197 1968 AGE9 48 1969 AGE3 199 1969 AGE4 16910 1969 AGE5 41611 1969 AGE6 103112 1969 AGE7 24313 1969 AGE8 4714 1969 AGE9 18

Only the �rst 14 observations are listed. The NAME variable contains the names of the

original variables, while the COL1 variable contains the values that those variables held.

There are some options on the PROC TRANSPOSE procedure line that can be used to make

the variable names more attractive. However, we will use the data step below to make the

data set look like one that we might have created from scratch if our original intent was

Page 32: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 31

to have a univariate view. This will make the reverse transpose to follow more realistic

looking. Note the use of the INPUT() and SUBSTR() functions to convert values such as

AGE4 into a numeric value, here, 4. Then we dropped the NAME variable as it is no longer

needed. We also renamed the COL1 variable to NUMBER.

/* Make the data set look more like one that wouldhave come from reading the data directly intothe univariate view. This makes the example alittle more realistic. */

Data TroutU;Set TroutU;Age=Input(Substr(_NAME_,4),2.);Rename Col1=Number;Drop _NAME_;Run;

Proc Print Data=TroutU;Run;

Data Step ExamplesPROC TRANSPOSE

OBS YEAR NUMBER AGE

1 1968 13 32 1968 129 43 1968 646 54 1968 954 65 1968 99 76 1968 19 87 1968 4 98 1969 19 39 1969 169 410 1969 416 511 1969 1031 612 1969 243 713 1969 47 814 1969 18 9

Again, only the �rst 14 observations are listed here. In transposing the data set back, we

would like to have the same variable names as before. This would have been easy had we

not adjusted the data set as above. However, that is not particularly realistic, since you

would not likely transpose a data set and then reverse the transpose exactly as given later.

Since variable names cannot be numbers, we need to have a way to handle the numeric

values for AGE. PROC TRANSPOSE will use formats when available and this will be our way

out. For a very large number of levels of AGE, it might be more productive to construct a

character variable (like NAME ) containing the names of the variables that we wish to create.

The ID statement gives the variable containing the names of the new variables. Since we

used a FORMAT statement for AGE, the formatted values will be used.

Page 33: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 32

Proc Format;Value Ages 0="AGE0" 1="AGE1" 2="AGE2" 3="AGE3" 4="AGE4"

5="AGE5" 6="AGE6" 7="AGE7" 8="AGE8" 9="AGE9";Proc Transpose Data=TroutU Out=TroutM;By Year Notsorted;Var Number;Id Age;Format Age Ages.;Run;

Proc Print Data=TroutM;Run;

Data Step ExamplesPROC TRANSPOSE

OBS YEAR _NAME_ AGE3 AGE4 AGE5 AGE6 AGE7 AGE8 AGE9

1 1968 NUMBER 13 129 646 954 99 19 42 1969 NUMBER 19 169 416 1031 243 47 183 1970 NUMBER 40 354 606 479 152 18 74 1971 NUMBER 32 606 1424 644 157 23 175 1972 NUMBER 0 226 1178 1156 116 16 56 1973 NUMBER 2 165 593 982 428 22 117 1974 NUMBER 53 209 560 410 30 0 48 1975 NUMBER 0 105 674 446 16 2 29 1976 NUMBER 46 422 838 726 70 4 410 1977 NUMBER 3 310 1224 1068 65 0 011 1978 NUMBER 14 354 1264 1172 69 0 612 1979 NUMBER 6 429 1222 1067 192 0 0

With the exception of the NAME variable, the data set looks like the data set we started

with. Thus, it is relatively easy to go either direction with PROC TRANSPOSE when dealing

with univariate and multivariate views.

2.4.2 Indeterminant DO Loops

There are occasions when a loop is needed, but it is not known in advance of the loop,

how many iterations will be needed. This is usually determined from the data themselves,

or by some other mechanism that triggers the end of the looping process. The SAS data

step has DO WHILE and DO UNTIL statements to handle indeterminant loops. The DO UNTIL

loop is always executed at least once while the DO WHILE loop is executed only if the WHILE

condition is met. The end of the scope of the loop is given by an END statement.

As an example, consider a data set that has as physical data lines, a part-time employee's

name and a list of the hours worked by the employee. We would like to create a data set

that gives the employee name and hours such that each hour value is treated as a separate

observation. Since the number of values on each physical line is unknown, we we read until

there is nothing left to read on the line. The MISSOVER option is used to keep the pointer

from moving to the next line to read new data.

Page 34: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 33

Title2 "DO UNTIL Loop";Data Pay;Length Employee $15;Infile Datalines MISSOVER;Input Employee & Hours @;Do Until (Hours=.);Output;Input Hours @;

End;Datalines;Bob Jones 3.5 8 8 2 3Erin Walsh 6 7.5 1.5Tom N. Smith 2 3 3 1 6 7 3;Proc Print;Run;

Data Step ExamplesDO UNTIL Loop

OBS EMPLOYEE HOURS

1 Bob Jones 3.52 Bob Jones 8.03 Bob Jones 8.04 Bob Jones 2.05 Bob Jones 3.06 Erin Walsh 6.07 Erin Walsh 7.58 Erin Walsh 1.59 Tom N. Smith 2.010 Tom N. Smith 3.011 Tom N. Smith 3.012 Tom N. Smith 1.013 Tom N. Smith 6.014 Tom N. Smith 7.015 Tom N. Smith 3.0

The use of the single @ sign keeps the pointer on the same line until all data for an employee

have been read. Then the loop exits and the data step begins again, but because the single

@ sign was used, the previous line gets discarded and a new one is worked on.

2.5 The NULL Data Set

In some problems the power of the SAS data step is needed, but no new or changed SAS

data set will be produced. This happens, for example, when the data step is used to produce

a report, for system management tasks such as part of an interactive program, or to obtain

information to be used by the macro language. The reserved data set name _NULL_ is used

as the name for the \new" data set when one is not wanted. As an example with the ier

data set, let's say that we wanted to list all observations in a data set for which the sex of

the �sh is female and that we wanted the average weight of these �sh printed at the end of

the report.

Page 35: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 34

Title2 "Null Data Set";Data _NULL_;File Print;Infile Datalines EOF=EOF;If _N_=1 Then Put " DATE WEIGHT";Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;If Sex=2;Date=MDY(Mo,Day,Yr);Put Date MMDDYY8. +2 Wt 8.1;N+1;TotWt+Wt;

Return;Eof:AveWt=TotWt/N;Put // "Average Weight =" AveWt;

Return;Datalines;7 21 74 2 7 3 0 5 3.5 1.4 57.001 9 76 2 3 2 0 1 5.3 3.0 0.0012 18 74 2 4 1 0 5 5.4 3.4 83.202 15 76 2 1 1 0 5 6.0 5.7 111.009 13 75 2 2 2 1 5 10.1 23.4 203.00

... lines omitted ...;

Data Step ExamplesNull Data Set

DATE WEIGHT01/09/76 3.009/13/75 23.404/18/76 16.801/28/75 40.904/18/76 17.904/18/75 47.903/22/75 101.604/19/75 76.306/24/75 125.8

Average Weight =50.4

This example also demonstrates several other features of the SAS language that have not

yet been discussed. The 2 lines N+1; and TOTWT+WT; are called sum statements. They take

the value or value of the variable to the right of the plus sign and add it to the value of the

variable to the left of the plus sign. They are not exactly like, for example, N=N+1; because

when the data step loop begins again, unless N is retained it will lose its value. This does

not happen with variables in sum statements. They are automatically retained.

Note also the strange IF SEX=2; statement that appears to be missing the THEN clause. This

statement is equivalent to the one IF SEX NE 2 THEN DELETE;. It is called a subsetting IF

statement.

Lastly is the data step \subroutine." The data step does not have subroutines that are

separated from the data step, rather they are contained within it and all variables are global

to the main and subroutine sections. These \subroutines" behave like goto statements,

but control can return to the code immediately following the call. These subroutines are

called from input/output coditions, such as above where the condition is an \End of File"

Page 36: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 35

condition and the INFILE option EOF is used to point to a subroutine to be executed when the

condition is met, or from LINK and GOTO statements. If the LINK statement is used, control

returns to the spot immediately following the LINK statement, while the GOTO statement

simply redirects program execution through the subroutine and control usually returns

to the top of the data step loop. Note the use of the RETURN statement. The �rst use

prevents execution from continuing into the subroutine, while the second marks the end of

the subroutine. Had the �rst RETURN been left out, a \running" average of the �sh weights

would have been generated for the females.

2.6 Data Step Examples

2.6.1 Simple Random Sampling Without Replacement

This example will demonstrate how to take a simple random sample (SRS) without replace-

ment from a frame or list of the population under study. Here, we will treat our ier data

set as a population and take a simple random sample from it. For real problems, the frame

might be a list of names and addresses of persons on a license sales list. Those selected

from the list would then be sent a questionnaire. The example demonstrates several features

covered in this chapter, as well as showing some very simple macro language features.

Title2 "Simple Random Sample";Data FLIER;Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;

Datalines;... data go here ...;

%Let Size=12; /* Define The Sample Size */Data SRS;Drop Seed K NN Prob;Retain Seed 123459; /* Random Number Seed */Retain K &Size; /* Sample Size */NN=N; /* Population Size */Call Symput("PopSize",Trim(Left(Put(NN,20.))));I=0;Do Until ((I>=N) or (K=0));I+1;Set FLIER Point=I NObs=N;Prob=K/NN;If Ranuni(Seed) <= Prob Then /* Select This One */Do;Output;K=K-1; /* We Need One Less In The Sample */End;NN=NN-1; /* We Have One Less To Choose From */

End;Stop; /* We Are Done */Run;

Title3 "Sample of Size &Size From a Population of Size &Popsize";Proc Print Data=SRS;Run;

Page 37: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 36

Data Step ExamplesSimple Random Sample

Sample of Size 12 From a Population of Size 25

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL

1 7 21 74 2 7 3 0 5 3.5 1.4 57.002 1 9 76 2 3 2 0 1 5.3 3.0 0.003 2 15 76 2 1 1 0 5 6.0 5.7 111.004 9 13 75 2 2 2 1 5 10.1 23.4 203.005 1 28 75 2 3 1 1 5 13.5 33.2 242.006 4 18 76 3 3 1 2 5 11.9 44.4 244.007 1 28 75 2 3 1 2 5 16.0 103.4 307.808 4 18 76 3 2 2 3 5 9.1 17.9 171.409 2 23 75 2 3 1 3 5 17.2 127.1 346.0010 4 18 75 2 1 2 4 5 12.1 47.9 266.6011 2 8 75 2 3 1 4 4 17.5 134.4 357.7512 1 28 75 2 3 1 6 1 18.5 131.6 409.00

This particular method uses unequal probability methods in the selection of each element

into the sample, but because each sample of size 12 has the same probability of selection as

every other sample of size 12, the sample is a simple random sample. Since we do not know

when the 12 elements will be selected, it could be the �rst 12 or the last 12 observations,

we use an indeterminant loop, the DO UNTIL loop. Note also that the data step loop is

executed only once here. Otherwise, multiple samples of size 12 would have been taken.

The STOP statement is used to keep the data step from looping again by simply stopping

execution of the data step.

2.6.2 Data Recoding

Reversing a Likert scale

A Likert scale is often used in questionnaires to measure a degree of belief or agreement

with an idea or statement. If one wanted to measure the degree of satisfaction with a �shing

trip, a questionnaire might ask

Would you consider this to be the best �shing trip that you have had in the last

5 years?

1=Strongly Disagree

2=Disagree Somewhat

3=Neutral

4=Agree Somewhat

5=Strongly Agree

A typical way of analyzing a collection of such types of questions asked of the same indi-

viduals is to create a \scale" which is often the simple sum of the scores (1-5) over each

question. However, in many questionnaires, some questions may be worded in such a way

that a 5 means \positive" or \agree", while for others, a 5 means \negative" or \disagree".

Page 38: Advanced SAS Programming Techniques ()

CHAPTER 2. THE DATA STEP 37

Thus, some questions may need the Likert scale reversed. This is very easily accomplished

using a mathematical transformation. Let's make the example more interesting by stating

that variables Q1, Q20, Q32, and Q184 need reverse coding.

Data One;Input Subject Q1-Q200;Array Reverse {*} Q1 Q20 Q32 Q184;Do I=1 To 4;Reverse(I)=6-Reverse(I);

End;Datalines;data follow here

Using formats

For some problems the data recoding can be particularly complicated. Either a simple

mathematical transformation is not possible (as could be done above), or one may need

to convert between numeric and character data. The IF-THEN-ELSE statements or CASE

expression can be used for these purposes, but often require a lot of programming. Often

times the PUT() and INPUT() functions can be used to simplify this process. Assume that

we have a data set on vegetation collected from transects run across a marsh in which the

species of plant and coverage along a 5m stick are recorded. To facilitate data entry, only

a 2-character abbreviation for the species is input. However, the scientist would like the

complete name in the data set. The example below illustrates one approach to accomplishing

this.

Title2 "Data Recoding";Proc Format;Value $Veg "wg"="Wire Grass" "sp"="Spartina patens"

"br"="Bull Rush" "wi"="Widgeon Grass";Run;

Data One;Input Sp $ Coverage @@;Length Species $20.;Species=Put(Sp,$Veg.);Drop Sp;

Datalines;wi 0.4 wi 0.6 wg 1.2 br 0.2 sp 4.8;Proc Print Data=One;Run;

Data Step ExamplesData Recoding

OBS COVERAGE SPECIES

1 0.4 Widgeon Grass2 0.6 Widgeon Grass3 1.2 Wire Grass4 0.2 Bull Rush5 4.8 Spartina patens

Page 39: Advanced SAS Programming Techniques ()

Chapter 3

Working With Files

The SAS System works with data stored in a special format called a SAS data set. Although

it is possible for SAS to work with data in other program formats using SAS/VIEWS and

\data engines", it is much more common to work with data in the SAS data set format.

The SAS data set is specially formatted to permit the SAS System to quickly and easily

work with the data. For example, SAS procedures can determine whether the data have

been sorted without �rst having to read the entire data set.

3.1 External Files

Often the �rst step in a SAS program is the input of \raw" data into a SAS data set for

analysis. Later in the program, we may wish to create an external �le or report for use

elsewhere. For these actions we must learn about some of the interfaces that the SAS system

has with external (non-SAS) �les.

For input into a data step, external �les can typically be referenced using several di�erent

methods, some of which depend upon the platform. On the WIN95 and Unix platforms,

the �lename for the \raw" data can simply be placed in the INFILE statement. Assume

that the �le of interest is named raw.dat and is stored in the subdirectory fish. On the

WIN95 platform we could use the statement

Infile "c:\fish\raw.dat";

while on the Unix platform it might be referenced as

Infile "fish/raw.dat";

where the fish subdirectory is relative to our current working directory on Unix. Alterna-

tively, the FILENAME statement can be used to link a �le reference to the actual �le. This

can make programs much easier to use on multiple platforms and to update and modify

later. The basic statement looks like

FILENAME �leref "path-and-�le-name";

Using the WIN95 �le given earlier a skeleton program would look like

38

Page 40: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 39

FILENAME fish "c:\fish\raw.dat";Data one;Infile fish;Input ....;Run;

Notice that the INFILE statement is still needed, but it speci�es the �le reference rather

than the actual �le name. The FILENAME statement can also be used to reference more than

one �le in a subdirectory. Study the following example.

FILENAME fish "c:\fish"; /* specify the subdirectory only */Data Halibut;Infile Fish(Halibut.dat);Input ...;Run;Data Coho;Infile Fish(Coho.dat);Input ...;Run;

The SAS system assumes that data �les will have the extension .DAT, and so the extension

can be dropped from the INFILE statement. Thus the code could have been written,

FILENAME fish "c:\fish"; /* specify the subdirectory only */Data Halibut;Infile Fish(Halibut);Input ...;Run;Data Coho;Infile Fish(Coho);Input ...;Run;

An example of these methods is shown for the ier data set. The orignal data set was also

broken into 2 pieces called FLIERS1.DAT and FLIERS2.DAT, and the variable headings were

removed from each of these two data sets. To better illustrate the results, the SAS log for

the code is shown. The �rst method uses only the INFILE statement.

238 Title2 "External File Reference With Infile";239 Data Flier;240 Infile "c:\projects\alaska97\flier.dat" Firstobs=2;241 Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;242 Run;

NOTE: The infile "c:\projects\alaska97\flier.dat" is:FILENAME=c:\projects\alaska97\flier.dat,RECFM=V,LRECL=256

NOTE: 664 records were read from the infile "c:\projects\alaska97\flier.dat".The minimum record length was 102.The maximum record length was 102.

NOTE: The data set WORK.FLIER has 664 observations and 11 variables.NOTE: The DATA statement used 0.77 seconds.

243244 Proc Print Data=Flier(Obs=10);245 Run;

NOTE: The PROCEDURE PRINT used 0.11 seconds.

Page 41: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 40

Next, the �le reference is moved to the FILENAME statement.

247 Title2 "FILENAME File Reference";248 Filename Fish "c:\projects\alaska97\flier.dat";249 Data Flier;250 Infile Fish Firstobs=2;251 Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;252 Run;

NOTE: The infile FISH is:FILENAME=c:\projects\alaska97\flier.dat,RECFM=V,LRECL=256

NOTE: 664 records were read from the infile FISH.The minimum record length was 102.The maximum record length was 102.

NOTE: The data set WORK.FLIER has 664 observations and 11 variables.NOTE: The DATA statement used 0.48 seconds.

The FILENAME statement can also be used to refer to directories. When directories are

used, the INFILE statement is then used to select the member to process. This method also

permits several directories to be concatenated together to search for �les, and the same �le

reference can be used for multiple �les, and for reading and writing.

254 Title2 "FILENAME Directory Reference";255 Filename Fish "c:\projects\alaska97";256 Data Flier;257 Infile Fish(Flier.dat) Firstobs=2;258 Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;259 Run;

NOTE: The infile library FISH is:DIRECTORY=c:\projects\alaska97

NOTE: The infile FISH(Flier.dat) is:DIRECTORY=c:\projects\alaska97,MEMBERNAME=c:\projects\alaska97\Flier.dat,RECFM=V,LRECL=256

NOTE: A total of 664 records were read from the infile library FISH.The minimum record length was 102.The maximum record length was 102.

NOTE: 664 records were read from the infile FISH(Flier.dat).The minimum record length was 102.The maximum record length was 102.

NOTE: The data set WORK.FLIER has 664 observations and 11 variables.NOTE: The DATA statement used 0.59 seconds.

Since the SAS system assumes that data �les end with .DAT, the su�x can be dropped

from the member name. However, for names that do not have a su�x, the name should be

enclosed within quotes.

261 Data Flier;262 Infile Fish(Flier) Firstobs=2;263 Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;264 Run;

NOTE: The infile library FISH is:DIRECTORY=c:\projects\alaska97

NOTE: The infile FISH(Flier) is:DIRECTORY=c:\projects\alaska97,

Page 42: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 41

MEMBERNAME=c:\projects\alaska97\Flier.DAT,RECFM=V,LRECL=256

NOTE: A total of 664 records were read from the infile library FISH.The minimum record length was 102.The maximum record length was 102.

NOTE: 664 records were read from the infile FISH(Flier).The minimum record length was 102.The maximum record length was 102.

NOTE: The data set WORK.FLIER has 664 observations and 11 variables.NOTE: The DATA statement used 0.66 seconds.

Sometimes the data that we would like to work with exists in more than one physical raw

�le. For example, the catch data might be kept in separate spreadsheets for each harvest

year. To analyze the data together, the data sets must be concatenated. Below, the two

ier data sets (the original data set split into 2 pieces) will be input and concatenated using

3 di�erent data steps. Note that the �le reference from above is being reused.

266 Title2 "Data Set Concatenation of Files";267 Data Flier1;268 Infile Fish(Fliers1);269 Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;270 Run;

NOTE: The infile library FISH is:DIRECTORY=c:\projects\alaska97

NOTE: The infile FISH(Fliers1) is:DIRECTORY=c:\projects\alaska97,MEMBERNAME=c:\projects\alaska97\Fliers1.DAT,RECFM=V,LRECL=256

NOTE: A total of 298 records were read from the infile library FISH.The minimum record length was 102.The maximum record length was 102.

NOTE: 298 records were read from the infile FISH(Fliers1).The minimum record length was 102.The maximum record length was 102.

NOTE: The data set WORK.FLIER1 has 298 observations and 11 variables.NOTE: The DATA statement used 0.44 seconds.

271 Data Flier2;272 Infile Fish(Fliers2);273 Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;274 Run;

NOTE: The infile library FISH is:DIRECTORY=c:\projects\alaska97

NOTE: The infile FISH(Flier2) is:DIRECTORY=c:\projects\alaska97,MEMBERNAME=c:\projects\alaska97\Fliers2.DAT,RECFM=V,LRECL=256

NOTE: A total of 366 records were read from the infile library FISH.The minimum record length was 102.The maximum record length was 102.

NOTE: 366 records were read from the infile FISH(Fliers2).The minimum record length was 102.The maximum record length was 102.

Page 43: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 42

NOTE: The data set WORK.FLIER2 has 366 observations and 11 variables.NOTE: The DATA statement used 0.44 seconds.

275 Data Flier;276 Set Flier1 Flier2;277 Run;

NOTE: The data set WORK.FLIER has 664 observations and 11 variables.NOTE: The DATA statement used 0.5 seconds.

As the FILENAME statement can concatenate directories together, it can also concatenate

physical �les together. The method is to put the list of �lenames within parenthesis, each

separated by a comma.

279 Title2 "FILENAME Concatenation of Files";280 Filename Fish("c:\projects\alaska97\fliers1.dat","c:\projects\alaska97\fliers2.dat");281 Data Flier;282 Infile Fish;283 Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;284 Run;

NOTE: The infile FISH is:FILENAME=c:\projects\alaska97\fliers1.dat,RECFM=V,LRECL=256

NOTE: The infile FISH is:FILENAME=c:\projects\alaska97\fliers2.dat,RECFM=V,LRECL=256

NOTE: 298 records were read from the infile FISH.The minimum record length was 102.The maximum record length was 102.

NOTE: 366 records were read from the infile FISH.The minimum record length was 102.The maximum record length was 102.

NOTE: The data set WORK.FLIER has 664 observations and 11 variables.NOTE: The DATA statement used 0.66 seconds.

File references can also be listed as

66 Title2 "List All Defined File References";67 Filename _ALL_ List;NOTE: Fileref= FISH

Physical Name= c:\projects\alaska97\Fliers2.datc:\projects\alaska97\Fliers1.dat

NOTE: Fileref= TMP1Physical Name= C:\PROJECTS\Alaska97\Files.sas

and can be cleared using

69 Title2 "Clear The FISH File Reference";70 Filename Fish Clear;NOTE: Fileref FISH has been deassigned.

Clearing a �le reference can free up some memory, but usually is important with complicated

programs to insure that an important �le is not written over, or the wrong �le read as input,

due to programming errors.

Files can also be created and appended to using techniques similar to the above for input.

Basically, de�ne a �le reference to receive the data and use the FILE statement to direct

Page 44: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 43

output to the reference. The example below �rst writes the FLIER1 data to an external ascii

�le, then next, this could be at some latter time for example, appends the FLIER2 data to

the same external �le.

72 Title2 "Creating An External File";73 Filename Fish "c:\projects\alaska97";74 Data _NULL_;75 File Fish(Flier1N2);76 Set Flier1;77 Put (Mo Day Yr) (3.) (Ar St Sex Age Sn) (2.) (Lt Wt TSL) (7.3);78 Run;

NOTE: The file library FISH is:DIRECTORY=c:\projects\alaska97

NOTE: The file FISH(Flier1N2) is:DIRECTORY=c:\projects\alaska97,MEMBERNAME=c:\projects\alaska97\Flier1N2.DAT,RECFM=V,LRECL=256

NOTE: A total of 298 records were written to the file library FISH.The minimum record length was 40.The maximum record length was 40.

NOTE: 298 records were written to the file FISH(Flier1N2).The minimum record length was 40.The maximum record length was 40.

NOTE: The DATA statement used 0.6 seconds.

80 Title2 "Appending To An Existing External File";81 Data _NULL_;82 File Fish(Flier1N2) Mod;83 Set Flier2;84 Put (Mo Day Yr) (3.) (Ar St Sex Age Sn) (2.) (Lt Wt TSL) (7.3);85 Run;

NOTE: The file library FISH is:DIRECTORY=c:\projects\alaska97

NOTE: The file FISH(Flier1N2) is:DIRECTORY=c:\projects\alaska97,MEMBERNAME=c:\projects\alaska97\Flier1N2.DAT,RECFM=V,LRECL=256

NOTE: A total of 366 records were written to the file library FISH.The minimum record length was 40.The maximum record length was 40.

NOTE: 366 records were written to the file FISH(Flier1N2).The minimum record length was 40.The maximum record length was 40.

NOTE: The DATA statement used 0.44 seconds.

A sample of what the output data set looks like is given below.

7 21 74 2 7 3 0 5 3.500 1.400 57.0001 9 76 2 3 2 0 1 5.300 3.000 0.00012 18 74 2 4 1 0 5 5.400 3.400 83.2002 15 76 2 1 1 0 5 6.000 5.700111.00012 14 75 2 1 1 0 5 6.100 6.600109.000

Notice the use of the grouped formats to associate a single format with several di�erent

variables that appear together. A slightly larger �eld width for TSL appears needed. To

Page 45: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 44

append the second data set to the �rst, it was necessary to specify the MOD option on the

FILE statement.

3.1.1 FTP Access

The FILENAME statement can also be used to access, modify, and create information stored

or to be stored on other computers using TCP/IP network methods. The FTP option can

be used to open an FTP (�le transfer protocol) connection with another computer. Data

can then be read from the FTP server or can be written to the FTP server, just as if the �le

were stored on the local machine. The syntax for writing the �le flier.dat to the Unix1

computer at LSU for user bill and to prompt for the password would be

Title2 "FTP Access To Write A File";Filename Out FTP "flier.dat" /* Name of data set */

host="unix1.sncc.lsu.edu" /* Host name */user="bill" /* User login name */prompt /* Prompt for password */cd="/u/bill/alaska/stuff" /* Change directory first */rcmd="ascii" /* Remote command to exec */recfm=v; /* User variable record length */

Data _NULL_;File Out;Set Flier1;Put (Mo Day Yr) (3.) (Ar St Sex Age Sn) (2.) (Lt Wt TSL) (7.3);Run;

This technique can also be used to input not only �les via FTP, but to retrieve any infor-

mation that can be obtained via FTP. To get a directory listing from the remote machine

above, we might write

Title2 "FTP Access To List A Directory";Filename Dir FTP " " /* Null data set (required) */

ls /* FTP command */host="unix1.sncc.lsu.edu" /* Host name */user="bill" /* User login name */prompt /* Prompt for password */cd="/u/bill/alaska/stuff"; /* Change directory first */

Data _NULL_;File Log;Infile Dir;Input;Put _INFILE_;

Run;

The data step simply reads then writes the directory listing to the SAS Log. Note that the

data could have been placed into a variable or variables, and then processed using the data

step.

Page 46: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 45

3.1.2 WWW Access

Not to be outdone by the FTP access method, the FILENAME statement can also reference

a URL (uniform resource locator), otherwise known as a web page. The page might be a

hypertext document or might be a �le being distributed via a WWW server. In the example

below, we will assume that the ier data set is stored at the URL

http://www.stat.lsu.edu/faculty/moser/flier.dat

and we wish to read these data into our program.

Title2 "HTTP Access To Read A File";Filename Fish URL "http://www.stat.lsu.edu/faculty/moser/flier.dat";Data Flier;Infile Fish Firstobs=2;Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;Run;

Some important aspects of the FTP and URL access, is that data may be stored in some

\distant" location, yet reachable by network, and be accessed at runtime. This could be

very important for real-time data, such as from data loggers, or can be important when

many people are accessing the data and and single, master copy is needed. From a systems

programmer perspective, it means that the SAS system can be used to develop an interface

to the internet.

3.2 Including External SAS Code

As you will begin to see later, we may wish to reuse the same SAS code in many di�erent

programs. This may be especially true of SAS macros that we may have written to solve

various problems. In other cases, the program that we are working on is very large and

we would like to maintain the program in separate pieces or �les. SAS code external to

the currently executing SAS session can be included into the session using the %INCLUDE

statement. The argument to the statement can either be a �le name or a �le reference.

Notice that if a �le reference is used, then all of the powerful methods described above for

reading in data are available. It is assumed that SAS program �le names end with the .sas

extension. A couple of examples are given below.

%Include "c:\projects\alaska97\fishdemo.sas" / Nosource2;

Filename Fishdemo URL "http://www.stat.lsu.edu/faculty/moser/fishdemo.sas";%Include Fishdemo;

The NOSOURCE2 option to the %INCLUDE statement is used to suppress printing the included

code to the SAS log.

3.3 The SAS Data Library

SAS data sets are stored in SAS libraries. When the SAS System was initially being

developed on the IBM mainframes, a SAS library was a specially formatted MVS data set

Page 47: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 46

whose members were SAS data sets. Now, a SAS library may be nothing more than a

reference to a subdirectory in a �lesystem on a PC or a Unix workstation, and may contain

much more than SAS data sets. The idea behind the library is to organize and reference a

collection of similar objects, such as data sets, in a common way. If I have several di�erent

data sets on �shing e�ort, say one data set for each year since 1975, I may want to keep

the data in each year separated, but be able to �nd all of the data very easily and be able

to combine any or all of it very easily as well. On a Windows PC you can identify the SAS

data sets by the su�x \ssd", while on an AIX Unix workstation the �les have the su�x

\ssd001". Because they are in a special format, they are not to be read or edited using

non-SAS tools.

3.3.1 The LIBNAME Statement

The LIBNAME statement is used to specify the name and location of a SAS library. It has

the basic format

LIBNAME libname "path";

where \libname" is some 1-8 character name and \path" is the path in the �lesystem

where the library is to be kept. Upon startup, the SAS System internally issues a LIBNAME

statement de�ning a library named WORK which it uses as the default library. By default, at

the end of the SAS session, the WORK library is cleaned of all SAS �les. Thus, to keep any

SAS data sets permanent, you should store them in a library that you specify. To do this,

a two-level data set name is used in the SAS program. A two-level name has two names

which are separated by a period. The �rst name is the library name de�ned in the LIBNAME

statement, and the second name is the member or data set name.

/* Point to the subdirectory containing the raw effort data */FILENAME raw "c:\temp\effort97.csv";/* Point to the SAS Library to store the effort data */LIBNAME effort "c:\fishing\effort";

DATA effort.year1997;LENGTH location $30.;INFILE raw delimiter=",";INPUT date mmddyy8. location & effort;RUN;

Here, the raw data are input into the SAS data set effort.year1997, which can be used

later in the same program or in some future program for analysis, without the need for the

above data step. For example, the SAS code below could be used to print the data set at

some future time.

LIBNAME effort "c:\fishing\effort";PROC PRINT DATA=effort.year1997;RUN;

Page 48: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 47

3.3.2 Library Procedures

There are several procedures for working with SAS libraries. The procedures permit update

of the information about a data set and utility operations such as copying, deletion or

renaming a member. You can also append one member to another.

PROC DATASETS

PROC DATASETS does many things related to the management of data sets. The usual way

to operate with this procedure is in full-screen mode (the default). In full-screen mode you

can easily modify parameters of the data set, copy, rename, and delete data sets, and list

various parameters associated with them. A couple of these tasks are shown below in a

non-full-screen session.

1777 Title2 "Proc Datasets";1778 Proc Datasets Library=Work Nofs;

-----Directory-----

Libref: WORKEngine: V612Physical Name: C:\SAS\SASWORK\#TD16077

# Name Memtype Indexes

1 CURSTAT CATALOG2 FLIER DATA3 FLIER1 DATA4 FLIER2 DATA5 SASMACR CATALOG

1779 Modify Flier(Label="James Geaghan Flier Data Set");1780 Run;

1781 Delete Flier1;1782 Run;

NOTE: Deleting WORK.FLIER1 (memtype=DATA).1783 Quit;

Page 49: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 48

PROC CONTENTS

The PROC CONTENTS procedure produces a printout of information about the structure of

a data set, such as the variable names, types, lengths, labels, and formats, and the num-

ber of observations in the data set, etc. This listing can also be produced from within

PROC DATASETS.

1785 Title2 "Proc Contents";1786 Proc Contents Data=Work.Flier;1787 Run;

Proc Contents

CONTENTS PROCEDURE

Data Set Name: WORK.FLIER Observations: 664Member Type: DATA Variables: 11Engine: V612 Indexes: 0Created: 18:55 Thursday, September 25, 1997 Observation Length: 88Last Modified: 18:55 Thursday, September 25, 1997 Deleted Observations: 0Protection: Compressed: NOData Set Type: Sorted: NOLabel: James Geaghan Flier Data Set

-----Engine/Host Dependent Information-----

Data Set Page Size: 8192Number of Data Set Pages: 8File Format: 607First Data Page: 1Max Obs per Page: 92Obs in First Data Page: 73

-----Alphabetic List of Variables and Attributes-----

# Variable Type Len Pos------------------------------------7 AGE Num 8 484 AR Num 8 242 DAY Num 8 89 LT Num 8 641 MO Num 8 06 SEX Num 8 408 SN Num 8 565 ST Num 8 3211 TSL Num 8 8010 WT Num 8 723 YR Num 8 16

This type of listing can be very important for documentation purposes and for �nding out

what is in a permanent SAS data set.

Page 50: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 49

PROC COPY

PROC COPY is used to copy SAS data set members (and catalogs) from one data library to

another. This is useful for making backups. When combined with import/export engines,

it can also be used to convert data from one form into another. The example below simply

copies a data set from one library to another.

1789 Libname Perm "c:\temp";NOTE: Libref PERM was successfully assigned as follows:

Engine: V612Physical Name: c:\temp

1790 Proc Copy In=Work Out=Perm;1791 Select Flier;1792 Run;

NOTE: Copying WORK.FLIER to PERM.FLIER (MEMTYPE=DATA).NOTE: The data set PERM.FLIER has 664 observations and 11 variables.

PROC APPEND

PROC APPEND is normally used to append one SAS data set to another SAS data set called

the base. If the base is a new data set, then a simple copy is used. In the example below,

the FLIER2 data set is appended to the original FLIER data set.

1795 Title2 "Proc Append";1796 Proc Append Base=Perm.Flier Data=Flier2;1797 Run;

NOTE: Appending WORK.FLIER2 to PERM.FLIER.NOTE: 366 observations added.NOTE: The data set PERM.FLIER has 1030 observations and 11 variables.

PROC DELETE

The DELETE procedure simply deletes a SAS data set.

1799 Title2 "Proc Delete";1800 Proc Delete Data=Perm.Flier;1801 Run;

NOTE: Deleting PERM.FLIER (memtype=DATA).

Page 51: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 50

PROC CATALOG

The CATALOG procedure is necessary for working with SAS catalogs. Catalogs contain a

wide variety of information. Rather than data, per se, they contain settings for various

aspects of the SAS system. They may contain the help �les used by SAS. They can contain

information that you have created for a SAS/AF program. We will look at what might

show up in as user's SAS Pro�le.

2361 Title2 "Proc Catalog";2362 Proc Catalog Cat=sasuser.profile;2363 Contents;2364 Run;2365 Quit;

Files and LibrariesProc Catalog

Contents of Catalog SASUSER.PROFILE

# Name Type Date Description

1 AF AFGO 09/08/972 DMKEYS KEYS 04/06/97 Function Key Definitions3 PASSIST SLIST 09/25/97 User profile4 _VPLAY_ SLIST 06/25/97 VIDEO: player preferences5 MRUWSAVE WSAVE 09/25/97

Some of these catalogs might be accessible using functions found in PROC CATALOG, but typ-

ically the parameters that they contain are to be set using the programs that these catalogs

are associated with (such as the Display Manager). This procedure is more commonly used

in its interactive mode.

Page 52: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 51

3.4 File Import/Export/Transport

3.4.1 Import/Export

Many users enter their data using spreadsheets or data base software such as Microsoft

Excel, Lotus 1-2-3, or Borland Paradox. If you have SAS/ACCESS installed, special

import/export �lters are available to directly access and use the data stored in several

other software formats. A SAS/AF application is provided on the File->Import and

File->Export pull-down menu that automates the programming of this task.

The task of importing an Excel spreadsheet containing age and growth information of Flier

sun�sh, courtesy of Dr. James Geaghan, LSU, where the �rst non-blank row of the spread-

sheet contains the variable names (SAS compatible) and the data follow in the remaining

rows, is given below using PROC ACCESS.

PROC ACCESS DBMS=EXCEL;CREATE work.xcell.ACCESS;PATH='C:\fishing\Flierdat.xls';WORKSHEET='flier';GETNAMES YES;SCANTYPE=YES;CREATE work.xcell.VIEW;SELECT ALL;

RUN;

DATA work.flier;SET work.xcell;

RUN;

The PROC ACCESS step creates a view into the data set and speci�es the particular worksheet

to import, whether variable names are to be gotten from the worksheet, etc. The library

name WORK is included to show that the data view is actually two catalog members within

the libarary member XCELL. Note that the data step is the process that actually converts

Page 53: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 52

the data to a SAS data set. It is possible to use the data directly from the spreadsheet by

referencing the data view as in the SET statement.

PROC PRINT DATA=work.xcell;RUN;

In general, this would not be the most e�cient way to access the data as the \conversion"

would need to be performed on each data access.

It is also possible to export the data to other program formats. PROC DBLOAD is used below

to convert the ier data set back into an Excel spreadsheet.

PROC DBLOAD DBMS=EXCEL DATA=work.flier;PATH='C:\fishing\newflier.xls';PUTNAMES YES;LIMIT=0;LOAD;

RUN;

Also there are a number of other options for both PROC ACCESS and PROC DBLOAD that

can be used to control the variable names, types, data ranges, and other input/output

information.

What if you do not have SAS/ACCESS? It is not too di�cult to input a spreadsheet using

the SAS data step. If the data contain no commas, then the \comma separated values"

(CSV) format produced by most spreadsheets and many data bases is often convenient.

The �rst step is to save the spreadsheet into the CSV format (don't remove the original

spreadsheet �le, the CSV �le is just an intermediate step). Next, write a SAS data step

to input the data using an INFILE statement containing the DELIMITER="," option. As an

example, let's assume that the ier data set has been saved as a CSV �le called \ ierdat.csv".

A portion of the data set is shown below.

Mo,Day,Yr,Ar,St,Sex,Age,Sn,Lt,Wt,TSL,Size1,Size2,Size3,Size4,Size5,Size6,Edge,No7,21,74,2,7,3,0,5,3.5,1.4,57.00,0.00,0.00,0.00,0.00,0.00,0.00,34.72,11,9,76,2,3,2,0,1,5.3,3.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,212,18,74,2,4,1,0,5,5.4,3.4,83.20,0.00,0.00,0.00,0.00,0.00,0.00,47.79,32,15,76,2,1,1,0,5,6.0,5.7,111.00,0.00,0.00,0.00,0.00,0.00,0.00,60.97,412,14,75,2,1,1,0,5,6.1,6.6,109.00,0.00,0.00,0.00,0.00,0.00,0.00,60.04,5

Now we can write a SAS data step to read in these data. Note that we need to skip over

the �rst line of the data set as it contains the spreadsheet column headings.

FILENAME in "flierdat.csv";DATA flier;INFILE in FIRSTOBS=2 DELIMITER=",";INPUT Mo Day Yr Ar St Sex Age Sn Lt Wt TSL

Size1 Size2 Size3 Size4 Size5 Size6 Edge No;RUN;

If character data were included in the data set, then appropriate \informats" would need

to be used for reading those variables.

The third-party software produce DBMS/COPY is designed to move data between a number

of di�erent data formats including SAS data sets. This introduces the need for a \third"

program, but can greatly ease the data export/import process, especially when the same

data are needed in several di�erent formats.

Page 54: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 53

The SAS/CONNECT product that interconnects a network of PC's, Unix workstations,

and/or mainframes, can also be used to move SAS data sets from one machine type to

another. This product contains PROC UPLOAD and PROC DOWNLOAD for easily moving SAS

data sets, but they can also be used to handle ascii �les as well. This product also permits

you to execute code and work with data on several di�erent platforms at the same time.

3.4.2 Transport

To copy a SAS data set from one system to another may lead to trouble if the systems and

SAS versions are not exactly compatible with one another. For example, the structure of a

SAS data set on an IBM MVS mainframe is quite di�erent from that on a WIN95 PC. To

move or copy a SAS data set, a transport data library can be created. Since it is a library,

it may contain more than one SAS data set. The CPORT and CIMPORT procedures are used,

respectively, to create and import a transport data library. The process is to �rst create

the transport library, transport it to the other host system, then import the library. A

real advantage to this methodology is that you can send someone a SAS data set and know

exactly what is in the data set, while sending them a \raw" data set requires that they

know how to read in the data correctly. In other instances, the data may not exist in its

\raw" form, but may have been entered directly into a SAS data set. Below is an example

that creates a transport library of the ier data set, then imports it as if we had moved

the transport library to another system. First, let's create the transport data library. The

LIBRARY argument speci�es which SAS data library will be used, the FILE argument species

the destination for the transport image, and the MT argument speci�es what types of things

from the SAS data library are to be transported. Here we speci�ed that only SAS data sets

are to be considered.

668 Title2 "Create A Transport Data Library";669 Filename XPT "c:\projects\alaska97\flier.xpt";670 Proc CPort Library=Work File=XPT MT=Data;671 Run;

NOTE: Proc CPORT begins to transport data set WORK.FLIERNOTE: The data set contains 11 variables and 664 observations.

Logical record length is 88.

NOTE: Proc CPORT begins to transport data set WORK.FLIER1NOTE: The data set contains 11 variables and 298 observations.

Logical record length is 88.

NOTE: Proc CPORT begins to transport data set WORK.FLIER2NOTE: The data set contains 11 variables and 366 observations.

Logical record length is 88.

672 Filename XPT Clear;NOTE: Fileref XPT has been deassigned.

Page 55: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 54

Now we can reverse the process using the CIMPORT procedure. This would be the same type

of code that would be used on the \other" platform.

674 Title2 "Import From A Transport Data Library";675 Filename IMPT "c:\projects\alaska97\flier.xpt";676 Proc CImport Library=Work File=IMPT;677 Run;

NOTE: Proc CIMPORT begins to create/update data set WORK.FLIERNOTE: Data set contains 11 variables and 664 observations.

Logical record length is 88.

NOTE: Proc CIMPORT begins to create/update data set WORK.FLIER1NOTE: Data set contains 11 variables and 298 observations.

Logical record length is 88.

NOTE: Proc CIMPORT begins to create/update data set WORK.FLIER2NOTE: Data set contains 11 variables and 366 observations.

Logical record length is 88.

Note that we could couple some of this code with the FTP �le access method described

earlier to let the SAS system move the �le for us. When moving the transport library,

use a binary transfer method so that the internal structure of the transport library is not

changed.

Page 56: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 55

3.5 The X Files

The SAS system provides through various methods, access to the operating system and its

commands. The X statement can be used to pass a command along to the operating system

and have the command executed as if it were given at a system command prompt (such

as at an MS-DOS, Unix, or terminal window). The SAS program will wait, by default,

until the command has completed. Also by default, the user will have to \manually" close

the external shell or program started by the X statement. The program below will use the

WIN95 operating system to generate a directory listing of the ier data sets and will redirect

the output to a �le named flier.dir. The NOXWAIT system option is used to tell the SAS

system to continue processing SAS statements once the X statement has completed, while

the XSYNC option is used to prevent the SAS system from processing more statements before

the command has completed. We then open the flier.dir data set and read its contents

and write them to the SAS log.

652 Options XSync NoXWait;653 X "dir c:\projects\alaska97\flier*.dat > c:\projects\alaska97\flier.dir"653 ;654 Filename Fish "c:\projects\alaska97";655 Data _NULL_;656 Infile Fish("flier.dir") EOF=LF;657 File Log;658 Input;659 If _N_=1 Then Link LF;660 Put _INFILE_;661 Return;662 LF:663 Put //;664 Return;665 Run;

NOTE: The infile library FISH is:DIRECTORY=c:\projects\alaska97

NOTE: The infile FISH("flier.dir") is:DIRECTORY=c:\projects\alaska97,MEMBERNAME=c:\projects\alaska97\flier.dir,RECFM=V,LRECL=256

Volume in drive C is RON RICOVolume Serial Number is 1BF6-1260Directory of C:\PROJECTS\Alaska97

FLIER DAT 69,160 09-22-97 5:57p FLIER.datFLIERS2 DAT 38,064 09-22-97 6:20p Fliers2.datFLIER1N2 DAT 27,888 09-22-97 9:32p Flier1N2.DATFLIERS1 DAT 30,992 09-22-97 6:19p Fliers1.dat

4 file(s) 166,104 bytes0 dir(s) 345,735,168 bytes free

NOTE: A total of 11 records were read from the infile library FISH.The minimum record length was 0.The maximum record length was 56.

NOTE: 11 records were read from the infile FISH("flier.dir").The minimum record length was 0.The maximum record length was 56.

NOTE: The DATA statement used 0.66 seconds.

Page 57: Advanced SAS Programming Techniques ()

CHAPTER 3. WORKING WITH FILES 56

Note that with very little programming we could have read the directory listing into variables

in a SAS data set and have processed them in some fashion, say to compute the disk space

occupied by the �les.

Page 58: Advanced SAS Programming Techniques ()

Chapter 4

The Macro Language

Macro languages are programming languages that lie on top of another programming lan-

guage. Their purpose, typically, is to control which code may be compiled at compile time

(say, certain aspects of the programming code might be machine dependent) and to do

variable substitution. That is, the macro language looks for special symbols that indicate

that it is simply to replace this set of special symbols with some other code. The SAS

macro language can accomplish these tasks, but generally is used to control the generation

of the SAS language itself. In fact, the macro language often writes SAS language code.

Thus, entire applications can be developed with the macro language, which upon execu-

tion, generates SAS language that might be a mixture of macro commands, data steps, and

procedure steps.

4.1 Macro Variables

Probably, the easiest way to begin with the SAS macro language is to start with macro

variables. A macro variable holds an assigned value and when its reference is encountered,

the assigned value is substituted for the reference. In the macro language a macro variable

reference looks like &NAME where NAME is some variable name that you have assigned. The

assignment of a macro variable is often made through the %LET macro statement. Notice

that macro statement elements begin with a percent sign to di�erentiate them from SAS

language statements. Let's look at a simple example. We will assume that the ier data

set has been read in and available.

15 %Let DSName=Flier;16 Title2 "Listing of Data Set &DSNAME";17 Title3 'Single Quotes for &DSNAME';18 Proc Print Data=&DSName(Obs=10);19 Run;

NOTE: The PROCEDURE PRINT printed page 1.

57

Page 59: Advanced SAS Programming Techniques ()

CHAPTER 4. THE MACRO LANGUAGE 58

The SAS Macro Language 1Listing of Data Set FlierSingle Quotes for &DSNAME

OBS MO DAY YR AR ST SEX AGE SN LT WT TSL

1 7 21 74 2 7 3 0 5 3.5 1.4 57.02 1 9 76 2 3 2 0 1 5.3 3.0 0.03 12 18 74 2 4 1 0 5 5.4 3.4 83.24 2 15 76 2 1 1 0 5 6.0 5.7 111.05 12 14 75 2 1 1 0 5 6.1 6.6 109.06 12 18 74 2 4 1 0 4 6.2 5.8 104.57 9 13 75 2 2 1 0 5 6.4 6.7 122.88 12 14 75 2 1 1 0 5 6.4 7.6 118.49 12 14 75 2 1 1 0 5 6.5 6.8 130.610 2 15 76 2 1 1 0 5 6.8 8.1 118.0

The %LET statement has the name of the variable to receive the macro assignment, the equals

sign, then the value to be assigned to the macro variable. Note that the variable is written

without any special symbols (at least for now). Also note that the value is given without

any quotes. We'll address more complicated issues as we move along. In this particular

instance if we had several di�erent data sets that we might print, it becomes easy to specify

a di�erent data set and, at the same time, have its name included in the title. The type

of quotes used, however, are very important. You de�nitely must use double quotes if you

want the macro variable to resolve to its value. Single quotes will print the ampersand and

name as written.

Normally, there is nothing written on the SAS log to indicate what happened with respect

to the macro substitution. To see what actually was substituted we can change one of the

system options, SYMBOLGEN, on. Then upon re-running the code we see the following:

21 Options Symbolgen;22 %Let DSName=Flier;SYMBOLGEN: Macro variable DSNAME resolves to Flier23 Title2 "Listing of Data Set &DSNAME";24 Proc Print Data=&DSName(Obs=10);SYMBOLGEN: Macro variable DSNAME resolves to Flier25 Run;

NOTE: The PROCEDURE PRINT printed page 2.

Now we can see what value was substituted for &DSNAME. Let's look at the following code.

27 %Let Somecode=%Str(Proc Print Data=Flier(Obs=10); Run;);28 %Unquote(&Somecode);SYMBOLGEN: Macro variable SOMECODE resolves to Proc Print

Data=Flier(Obs=10); Run;SYMBOLGEN: Some characters in the above value which were subject to macro

quoting have been unquoted for printing.

NOTE: The PROCEDURE PRINT printed page 3.

Thus, macro variables can actually hold very complicated expressions. The macro quoting

function %STR() was used to quote the entire expression so that the internal parentheses

and semicolons would not cause any problems. However, when we reference the variable,

the value remains quoted, which means that its contents want be executed as normal SAS

language statements. To get rid of the special quoting, the macro function %UNQUOTE was

Page 60: Advanced SAS Programming Techniques ()

CHAPTER 4. THE MACRO LANGUAGE 59

used.

Although these variables are quite powerful with respect to substitution, they have their

limits for writing reusable software. Macro procedures give us even more control of code

generation and variable substitution.

4.2 Macro Procedures

Macro procedures consist of a procedure de�nition and a body. The body can be \plain"

SAS language code, or can make use of the macro language. In fact, a number of the macro

statements are only permitted within the body of a macro. A macro procedure begins with

the %MACRO statement and ends with the %MEND statement.

Let's rewrite our earlier program as a sas macro named PRT.

30 %Macro PRT(DSName);31 Title2 "Listing of Data Set &DSNAME";32 Proc Print Data=&DSName(Obs=10);33 Run;34 %Mend PRT;35 %PRT(Flier)SYMBOLGEN: Macro variable DSNAME resolves to FlierSYMBOLGEN: Macro variable DSNAME resolves to Flier

NOTE: The PROCEDURE PRINT printed page 4.

Again, it is di�cult to determine what is actually happening in the macro. The SYMBOLGEN

results do help but they appear out of place. This is because macros are actually compiled

before they are executed. This is also why the complexity of macro programming can be

greater within a macro procedure. You should notice that to perform the same task on a

di�erent data set would only require another call to the procedure. For example

%PRT(Bluegill);

The MPRINT system option can be used to print out the SAS language statements generated

by the macro language. The code below turns on this option and turns o� the SYMBOLGEN

option.

37 Options NoSymbolGen MPrint;38 %PRT(Flier)MPRINT(PRT): TITLE2 "Listing of Data Set Flier";MPRINT(PRT): PROC PRINT DATA=FLIER(OBS=10);MPRINT(PRT): RUN;

NOTE: The PROCEDURE PRINT printed page 5.

Not only did we get the macro variables resolved, but we also have them listed in the context

of the SAS language statements.

Page 61: Advanced SAS Programming Techniques ()

CHAPTER 4. THE MACRO LANGUAGE 60

It is also possible to put optional arguments to a macro. This permits you to assign initial

values to these macro variables, which can be overwritten when the macro is called. The

example below makes the data set name optional.

40 %Macro PRT(DSName=Flier);41 Title2 "Listing of Data Set &DSNAME";42 Proc Print Data=&DSName(Obs=10);43 Run;44 %Mend PRT;45 %PRT;MPRINT(PRT): TITLE2 "Listing of Data Set Flier";MPRINT(PRT): PROC PRINT DATA=FLIER(OBS=10);MPRINT(PRT): RUN;

NOTE: The PROCEDURE PRINT printed page 6.

Note that the macro was called without replacing the data set name. Had we wanted to

use the BLUEGILL data set, for example, we would have made the call

%PRT(DSNAME=BLUEGILL);

We can even include required and optional parameters in the same macro. The required or

positional parameters must be called in the same order as given in the %MACRO statement,

while the optional parameters can be given in any order following after the positional

parameters.

47 %Macro Sex(Sex,DSName=Flier);48 Data &DSName&Sex;49 Set &DSName;50 Where (Sex=&Sex);51 Run;52 %Mend Sex;

Now we can call the new macro and specify which sex we would like to create the data set

for. Notice here the complicated name &DSName&Sex. This is actually two macro names

joined together. When executed, both macro names will be resolved then joined together

to form a single name. Let's call the macro and look at the results produced.

53 %Sex(1,DSNAME=Flier);MPRINT(SEX): DATA FLIER1;MPRINT(SEX): SET FLIER;MPRINT(SEX): WHERE (SEX=1);MPRINT(SEX): RUN;

NOTE: The data set WORK.FLIER1 has 306 observations and 11 variables.NOTE: The DATA statement used 0.44 seconds.

54 %Sex(2);MPRINT(SEX): DATA FLIER2;MPRINT(SEX): SET FLIER;MPRINT(SEX): WHERE (SEX=2);MPRINT(SEX): RUN;

NOTE: The data set WORK.FLIER2 has 340 observations and 11 variables.NOTE: The DATA statement used 0.38 seconds.

55 %Sex(3);MPRINT(SEX): DATA FLIER3;MPRINT(SEX): SET FLIER;MPRINT(SEX): WHERE (SEX=3);MPRINT(SEX): RUN;

Page 62: Advanced SAS Programming Techniques ()

CHAPTER 4. THE MACRO LANGUAGE 61

NOTE: The data set WORK.FLIER3 has 18 observations and 11 variables.NOTE: The DATA statement used 0.22 seconds.

5657 Proc Datasets Library=Work;

-----Directory-----

Libref: WORKEngine: V612Physical Name: C:\SAS\SASWORK\#TD41921

# Name Memtype Indexes----------------------------1 FLIER DATA2 FLIER1 DATA3 FLIER2 DATA4 FLIER3 DATA5 SASMACR CATALOG

58 Run;59 Quit;

The macro created each of the 3 data sets, one for each sex category. Now if we knew

that we would be doing this, there might be other ways to accomplish this with the macro

language. The macro language also has looping statements like the SAS language does.

61 %Macro Sex(DSName=Flier);62 %Do Sex=1 %To 3;63 Data &DSName&Sex;64 Set &DSName;65 Where (Sex=&Sex);66 Run;67 %End;68 %Mend Sex;

which when called generates the code below.

69 %Sex;

MPRINT(SEX): DATA FLIER1;MPRINT(SEX): SET FLIER;MPRINT(SEX): WHERE (SEX=1);MPRINT(SEX): RUN;

NOTE: The data set WORK.FLIER1 has 306 observations and 11 variables.

MPRINT(SEX): DATA FLIER2;MPRINT(SEX): SET FLIER;MPRINT(SEX): WHERE (SEX=2);MPRINT(SEX): RUN;

NOTE: The data set WORK.FLIER2 has 340 observations and 11 variables.

MPRINT(SEX): DATA FLIER3;MPRINT(SEX): SET FLIER;MPRINT(SEX): WHERE (SEX=3);MPRINT(SEX): RUN;

NOTE: The data set WORK.FLIER3 has 18 observations and 11 variables.

Page 63: Advanced SAS Programming Techniques ()

CHAPTER 4. THE MACRO LANGUAGE 62

4.3 Bootstrap Example

In this section we will write a more complicated macro. This macro will compute a Monte

Carlo or bootstrap test for the equality of two means (mean weight of the ier �sh of males

and females) using the ANOVA F statistic as the criterion. Under the null hypothesis, there

is only one population and so it doesn't matter which observations you classify as males

and which you classify as females. The algorithm will be the following.

1. Compute the test statistic for the original sample.

2. Select a simple random sample with replacement equal to the number of males in the

sample and label these observations as males.

3. Select a simple random sample with replacement equal to the number of females in

the sample and label these observations as females.

4. Concatenate the two samples together and compute the ANOVA test statistic testing

for di�erences between the sexes.

5. Save the test statistic then start again at (2) until a large number of iterations have

been performed.

6. Rank order the test statistics from smallest to largest and record the rank of the

original test statistic.

7. If the original test statistic is within the largest 5% of these test statistics, then reject

the null hypothesis and claim that the weight means are di�erent between the sexes.

The following macro was written to implement this basic scheme. You should be aware that

much more e�cient schemes exist to complete this task. For example, one could generate

all of the Monte Carlo samples (iterations) in a single data step and use BY processing to

compute the ANOVAs BY SEX.

%Macro MonteT(MaxIter=99);%* Get Rid of Unknowns First;Data _NoUnk_;Set Flier;If Sex NE 3;Run;

%* Compute Test Statistic for Observed Data;Proc ANOVA Data=_NOUNK_ NoPrint OutStat=AllStats;Class Sex;Model Wt=Sex;Run;Quit;

%* Determine the Number of Males and Females;%* We should preserve the sample sizes;Proc Means Data=_NoUnk_ Noprint;Class Sex;Var Wt;Output Out=Temp N=N;Run;

Page 64: Advanced SAS Programming Techniques ()

CHAPTER 4. THE MACRO LANGUAGE 63

%* Get The Results From Above Into Macro Variables;Data _NULL_;Set Temp;Select (Sex);When (1) Call Symput("Males",Put(N,5.));When (2) Call Symput("Females",Put(N,5.));Otherwise;End;Run;%Put The Data Set Contains &Males Males.;%Put The Data Set Contains &Females Females.;

%* This macro actually selects the sample;%Macro TakeSamp(Sex,Number);Sex=&Sex;Do I=1 To &Number;Obs=Floor(Ranuni(0)*N+1);Set _NoUnk_(Drop=Sex) Point=Obs Nobs=N;Output;

End;%Mend TakeSamp;

%* This loop controls the sampling;%Do Iter=1 %To &MaxIter;Data BootSamp;%TakeSamp(1,&Males);%TakeSamp(2,&Females);Stop;Run;

%* Compute the test statistic based upon the pseudo-sample;Proc ANOVA Data=BootSamp NoPrint OutStat=One;Class Sex;Model Wt=Sex;Run;Quit;

%* Append the results to the stats data set;Proc Append Base=AllStats Data=One;Run;%End;

%* Print Out the Original Statistics;Title2 "Listing of All ANOVA Statistics";Proc Print Data=AllStats(Obs=10);Run;

%* Keep Only The F Statistics For Sex;Data AllStats;Retain Iter -1;Set AllStats;If _Source_="SEX";Iter+1;Run;

%* Reorder The Runs By F Statistic Value;Proc Sort Data=AllStats;By F;Run;

Page 65: Advanced SAS Programming Techniques ()

CHAPTER 4. THE MACRO LANGUAGE 64

%* List In Increasing Order of F;%* Find Iter 0 And See Where It Falls In Relation;%* To The Other F Values.;Title2 "Listing of Ordered ANOVA Results";Title3 "Iter Number 0 Is From Original Analysis";Proc Print Data=AllStats;Run;

Data _NULL_;Set AllStats;If Iter=0 ThenDo;

MonteP=(_N_-1)/(&MaxIter+1);Put "Monte Carlo P-value is " MonteP 6.4;

End;Run;

%Mend MonteT;

Some of the �rst code generated by the macro is the following.

171 %MonteT;

MPRINT(MONTET): DATA _NOUNK_;MPRINT(MONTET): SET FLIER;MPRINT(MONTET): IF SEX NE 3;MPRINT(MONTET): RUN;

NOTE: The data set WORK._NOUNK_ has 646 observations and 11 variables.

MPRINT(MONTET): PROC ANOVA DATA=_NOUNK_ NOPRINT OUTSTAT=ALLSTATS;MPRINT(MONTET): CLASS SEX;MPRINT(MONTET): MODEL WT=SEX;MPRINT(MONTET): RUN;

MPRINT(MONTET): QUIT;

NOTE: The data set WORK.ALLSTATS has 2 observations and 7 variables.

MPRINT(MONTET): PROC MEANS DATA=_NOUNK_ NOPRINT;MPRINT(MONTET): CLASS SEX;MPRINT(MONTET): VAR WT;MPRINT(MONTET): OUTPUT OUT=TEMP N=N;MPRINT(MONTET): RUN;

NOTE: The data set WORK.TEMP has 3 observations and 4 variables.

MPRINT(MONTET): DATA _NULL_;MPRINT(MONTET): SET TEMP;MPRINT(MONTET): SELECT (SEX);MPRINT(MONTET): WHEN (1) CALL SYMPUT("Males",PUT(N,5.));MPRINT(MONTET): WHEN (2) CALL SYMPUT("Females",PUT(N,5.));MPRINT(MONTET): OTHERWISE;MPRINT(MONTET): END;MPRINT(MONTET): RUN;

The Data Set Contains 306 Males.The Data Set Contains 340 Females.

Page 66: Advanced SAS Programming Techniques ()

CHAPTER 4. THE MACRO LANGUAGE 65

MPRINT(MONTET): DATA BOOTSAMP;MPRINT(TAKESAMP): SEX=1;MPRINT(TAKESAMP): DO I=1 TO 306;MPRINT(TAKESAMP): OBS=FLOOR(RANUNI(0)*N+1);MPRINT(TAKESAMP): SET _NOUNK_(DROP=SEX) POINT=OBS NOBS=N;MPRINT(TAKESAMP): OUTPUT;MPRINT(TAKESAMP): END;MPRINT(MONTET): ;MPRINT(TAKESAMP): SEX=2;MPRINT(TAKESAMP): DO I=1 TO 340;MPRINT(TAKESAMP): OBS=FLOOR(RANUNI(0)*N+1);MPRINT(TAKESAMP): SET _NOUNK_(DROP=SEX) POINT=OBS NOBS=N;MPRINT(TAKESAMP): OUTPUT;MPRINT(TAKESAMP): END;MPRINT(MONTET): ;MPRINT(MONTET): STOP;MPRINT(MONTET): RUN;

NOTE: The data set WORK.BOOTSAMP has 646 observations and 12 variables.

MPRINT(MONTET): PROC ANOVA DATA=BOOTSAMP NOPRINT OUTSTAT=ONE;MPRINT(MONTET): CLASS SEX;MPRINT(MONTET): MODEL WT=SEX;MPRINT(MONTET): RUN;MPRINT(MONTET): QUIT;NOTE: The data set WORK.ONE has 2 observations and 7 variables.

MPRINT(MONTET): PROC APPEND BASE=ALLSTATS DATA=ONE;MPRINT(MONTET): RUN;

NOTE: Appending WORK.ONE to WORK.ALLSTATS.NOTE: 2 observations added.NOTE: The data set WORK.ALLSTATS has 4 observations and 7 variables.

The last 3 steps will continue until all iterations have been completed. Once completed the

code generated will be.

MPRINT(MONTET): TITLE2 "Listing of All ANOVA Statistics";MPRINT(MONTET): PROC PRINT DATA=ALLSTATS(OBS=10);MPRINT(MONTET): RUN;

NOTE: The PROCEDURE PRINT printed page 7.

MPRINT(MONTET): DATA ALLSTATS;MPRINT(MONTET): RETAIN ITER -1;MPRINT(MONTET): SET ALLSTATS;MPRINT(MONTET): IF _SOURCE_="SEX";MPRINT(MONTET): ITER+1;MPRINT(MONTET): RUN;

NOTE: The data set WORK.ALLSTATS has 100 observations and 8 variables.

MPRINT(MONTET): PROC SORT DATA=ALLSTATS;MPRINT(MONTET): BY F;MPRINT(MONTET): RUN;

NOTE: The data set WORK.ALLSTATS has 100 observations and 8 variables.

MPRINT(MONTET): TITLE2 "Listing of Ordered ANOVA Results";MPRINT(MONTET): TITLE3 "Iter Number 0 Is From Original Analysis";MPRINT(MONTET): PROC PRINT DATA=ALLSTATS;MPRINT(MONTET): RUN;

Page 67: Advanced SAS Programming Techniques ()

CHAPTER 4. THE MACRO LANGUAGE 66

NOTE: The PROCEDURE PRINT printed pages 8-10.

MPRINT(MONTET): DATA _NULL_;MPRINT(MONTET): SET ALLSTATS;MPRINT(MONTET): IF ITER=0 THEN DO;MPRINT(MONTET): 1-MONTEP=(_N_-1)/(99+1);MPRINT(MONTET): PUT "Monte Carlo P-value is " MONTEP 6.4;MPRINT(MONTET): END;MPRINT(MONTET): RUN;

Monte Carlo P-value is 0.0800

The �nal page of the output produced had the following observations, including the original

analysis results (ITER=0),

OBS ITER _NAME_ _SOURCE_ _TYPE_ DF SS F PROB

80 56 WT SEX ANOVA 1 933.89 1.56952 0.2107381 76 WT SEX ANOVA 1 1063.30 1.63018 0.2021482 79 WT SEX ANOVA 1 999.67 1.82098 0.1776783 27 WT SEX ANOVA 1 1281.41 2.03846 0.1538584 15 WT SEX ANOVA 1 1452.89 2.28557 0.1310785 8 WT SEX ANOVA 1 1488.45 2.55722 0.1102886 94 WT SEX ANOVA 1 1658.02 2.69699 0.1010387 13 WT SEX ANOVA 1 1437.03 2.73300 0.0987888 4 WT SEX ANOVA 1 1690.12 2.80394 0.0945289 18 WT SEX ANOVA 1 1827.85 3.08179 0.0796590 51 WT SEX ANOVA 1 1857.69 3.10496 0.0785391 5 WT SEX ANOVA 1 1817.26 3.33083 0.0684692 63 WT SEX ANOVA 1 1926.93 3.53462 0.0605593 0 WT SEX ANOVA 1 2318.17 3.83827 0.0505394 26 WT SEX ANOVA 1 2148.51 3.85926 0.0499095 87 WT SEX ANOVA 1 2675.15 4.32986 0.0378496 43 WT SEX ANOVA 1 2799.96 4.38115 0.0367397 6 WT SEX ANOVA 1 3672.40 6.04351 0.0142298 24 WT SEX ANOVA 1 3720.97 6.30796 0.0122699 83 WT SEX ANOVA 1 4204.62 8.06348 0.00466100 14 WT SEX ANOVA 1 5745.28 8.91242 0.00294

We get a very similar answer as to what was obtained by the original ANOVA analysis.

Many more simulations and we would have obtained a more precise estimate of the sampling

distribution of the test statistic. It does take a while for the macro to execute. Its speed can

be increased by suppressing some of the printed output. For example, suppress the macro

printed output along with the notes.

Options NoNotes NoMPrint NoSymbolgen;

4.4 Cluster Dendrogram

There are many SAS macros that have been written by users, as well as by SAS Institute

sta�, to �ll \holes" in the collection of SAS procedures. For example, the dendrogram

generated by PROC TREE is not particularly readable and is very frustrating to users that

are already familiar with a dendrogram. The dendro macro, available at the SAS WWW

site, uses SAS/GRAPH to produce a high resolution dendrogram. The following program

calls this macro and produces the attached graph.

Page 68: Advanced SAS Programming Techniques ()

CHAPTER 4. THE MACRO LANGUAGE 67

*---------------------------------------------------------*| Average Linkage Cluster Analysis. || Analysis of a subset of the 1980 Uniform Crime Report || data base. Only cities with populations between 100,000|| and 249,999 are used. The largest of these per state is|| selected for clustering. Reported crimes are || standardized to total offenses for each city. |*---------------------------------------------------------*;Options PS=55 LS=78 NoDate NoNumber;

Proc Format;Value State1="ALABAMA" 2="ARIZONA" 3="ARKANSAS"4="CALIFORNIA" 5="COLORADO" 6="CONNECTICUT"7="DELAWARE" 8="WASHINGTON, D.C." 9="FLORIDA"10="GEORGIA" 11="IDAHO" 12="ILLINOIS"13="INDIANA" 14="IOWA" 15="KANSAS"16="KENTUCKY" 17="LOUISIANA" 18="MAINE"19="MARYLAND" 20="MASSACHUSETTS" 21="MICHIGAN"22="MINNESOTA" 23="MISSISSIPPI" 24="MISSOURI"25="MONTANA" 26="NEBRASKA" 27="NEVADA"28="NEW HAMPSHIRE" 29="NEW JERSEY" 30="NEW MEXICO"31="NEW YORK" 32="NORTH CAROLINA" 33="NORTH DAKOTA"34="OHIO" 35="OKLAHOMA" 36="OREGON"37="PENNSYLVANIA" 38="RHODE ISLAND" 39="SOUTH CAROLINA"40="SOUTH DAKOTA" 41="TENNESSEE" 42="TEXAS"43="UTAH" 44="VERMONT" 45="VIRGINIA"46="WASHINGTON" 47="WEST VIRGINIA" 48="WISCONSIN"49="WYOMING" 50="ALASKA" 51="HAWAII";Run;

Title1 'Reported Offenses from the 1980 Uniform Crime Report';Title2 'for Moderate Sized Cities';Data Crime;Input state division citysize

murder manslght rape robbery assault burglary larceny motor total;Array crimes{8} murder manslght rape robbery assault burglary larceny motor;Do i=1 to 8; /* standardize the reported crimes */Crimes{i}=crimes{i}/total*100;End;Drop i;Label manslght='manslaughter';Format state state.;Datalines;1 6 58040 54 0 144 956 2270 7130 10189 1094 218371 6 58730 37 1 58 306 1015 3671 7597 663 133481 6 41840 28 0 69 276 1781 4161 7491 827 146332 8 56210 10 0 62 193 1273 2773 7947 567 12825

--- Data Deleted Here ---46 9 85450 11 0 127 409 1521 4110 10278 869 1732548 3 52470 4 0 76 244 1136 3646 10125 590 1582150 9 1900 15 0 117 296 1581 2611 7322 1055 12997

;

Proc Sort;By state citysize;Run;

/* Now keep the largest city from each state */Data crime;Set crime;

Page 69: Advanced SAS Programming Techniques ()

CHAPTER 4. THE MACRO LANGUAGE 68

By state;If first.state;Drop citysize;Run;

Proc print;Format murder manslght rape robbery assault burglary larceny motor 6.2;Run;

Proc Cluster Data=crime Method=Average Outtree=Tree;Var murder manslght rape robbery assault burglary larceny motor;Id state;Run;

%EPS; /* my own macro to turn on encapsulated postscript */

%Include "dendro.sas";%dendro;Quit;

Page 70: Advanced SAS Programming Techniques ()

CHAPTER 4. THE MACRO LANGUAGE 69

f o r M o d e r a t e S i z e d C i t i e s

Page 71: Advanced SAS Programming Techniques ()

Chapter 5

SAS Special Files

There are several �les that are useful to be aware of and to make use of. The �rst is the

AUTOEXEC.SAS �le.

5.1 Autoexec.sas

The AUTOEXEC.SAS �le is typically stored in a user's home directory or in the main SAS

subdirectory. Once the SAS system initializes, it will, by default, read and execute the

statements contained in the AUTOEXEC.SAS �le. This makes it a very convenient way to

set up your environment with the basic settings that you might like. You can de�ne your

graphics drivers here as well as printout size, and other options. In developing applications,

it can be a way to autostart a program for persons that are non-programmers, but need to

use a pre-built SAS application.

Below is an example AUTOEXEC.SAS �le that I use on my Unix system. Some parts is use

frequently and others much less frequently.

*************************************************;* AUTOEXEC.SAS *;*************************************************;

%Macro CGM;%************************************************;%* CGM options are CGMFRMA - Monochrome *;%* CGMFRGA - Gray Scale *;%* CGMFRCA - Color *;%************************************************;Filename GSASFile '/tmp/sas.cgm';GOptions Device=CGMFRMA GAccess=GSASFile GSFMode=Replace

FText=HWCGM010 HText=1 CText=BlackFTitle=HWCGM010 HTitle=1 CTitle=Black;

%* FText = HWCGM010 is for Helvetica;%* Set FText to HWCGM001 For SansSerif Font;%Put %Str( );%Put %Str(NOTE: Graphics Device is: CGMFRMA);%Put %Str(NOTE: Writing Graphics Output To: /tmp/sas.cgm);%Put %Str( );%Mend CGM;

70

Page 72: Advanced SAS Programming Techniques ()

CHAPTER 5. SAS SPECIAL FILES 71

%Macro CGMX;%************************************************;%* Display device is XCOLOR - Then print to CGM *;%* CGM options are CGMFRMA - Monochrome *;%* CGMFRGA - Gray Scale *;%* CGMFRCA - Color *;%************************************************;Filename GSASFile '/tmp/sas.cgm';GOptions Device=XColor TargetDevice=CGMFRMA GAccess=GSASFile

GSFMode=Replace FText=HWCGM010 HText=1 CText=Black;%Put %Str( );%Put %Str(NOTE: Graphics Device is: XCOLOR);%Put %Str(NOTE: Target Device is: CGMFRMA);%Put %Str(NOTE: Writing Target Output To: /tmp/sas.cgm);%Put %Str( );%Mend CGMX;

%Macro EPS;%************************************************;%* EPS -Generate Encapsulated Postscript Output.*;%************************************************;FILENAME GSASFile "graph.eps";GOPTIONS Device=PSEPSF TargetDevice=PSEPSF

CBack=White Colors=(Black)GAccess=GSASFile NoPrompt GSFMode=Replace;

%Put %Str( );%Put %Str(NOTE: Graphics Device is: PSEPSF);%Put %Str(Note: Graphics Output to: graph.eps);%Put %Str( );%Mend EPS;

%Macro EPSX;%************************************************;%* EPSX-Generate Encapsulated Postscript Output.*;%************************************************;FILENAME GSASFile "graph.eps";GOPTIONS Device=XCOLOR TargetDevice=PSEPSF

CBack=White Colors=(Black)GAccess=GSASFile NoPrompt GSFMode=Replace;

%Put %Str( );%Put %Str(NOTE: Graphics Device is: XCOLOR);%Put %Str(Note: Graphics Output to: graph.eps);%Put %Str( );%Mend EPSX;

%Macro PS;%************************************************;%* PS - Generate Postscript Output For Printing.*;%************************************************;FILENAME GSASFile pipe 'lpr -Pps -h';GOPTIONS Device=PS1200 TargetDevice=PS1200

CBack=White Colors=(Black)GProlog='252150532D41646F62652D0D0A'xGAccess=GSASFile NoPrompt GSFMode=Replace;

%Put %Str( );%Put %Str(NOTE: Graphics Device is: PS1200);%Put %Str( );%Mend PS;

%Macro PSX;

Page 73: Advanced SAS Programming Techniques ()

CHAPTER 5. SAS SPECIAL FILES 72

%************************************************;%* PS - Generate Postscript Output For Printing.*;%************************************************;FILENAME GSASFile pipe 'lpr -Pps -h';GOPTIONS Device=XColor

TargetDevice=PS1200 CBack=White Colors=(Black)GProlog='252150532D41646F62652D0D0A'xGAccess=GSASFile NoPrompt GSFMode=Replace;

%Put %Str( );%Put %Str(Note: Graphics Device is: XCOLOR);%Put %Str(NOTE: Hardcopy Graphics Device is: PS1200);%Put %Str( );%Mend PSX;

%Macro PSCX;%************************************************;%* PS - Generate Postscript Output For Printing.*;%************************************************;FILENAME GSASFile pipe 'lpr -Ppsc0 -J/nff/nb';GOPTIONS Device=XColor

TargetDevice=PSCOLOR CBack=WhiteGProlog='252150532D41646F62652D0D0A'xGAccess=GSASFile NoPrompt GSFMode=Replace;

%Put %Str( );%Put %Str(Note: Graphics Device is: XCOLOR);%Put %Str(NOTE: Hardcopy Graphics Device is: PSCOLOR);%Put %Str( );%Mend PSCX;

%Macro HP;%************************************************;%* HP - Generate HPLJS3 Output For Printing. *;%************************************************;FILENAME GSASFile pipe 'lpr -Php1 -J/nff/nb';GOPTIONS Device=HPLJS3 GAccess=GSASFile NoPrompt GSFMode=Append;%Put %Str( );%Put %Str(NOTE: Graphics Device is: HPLJS3);%Put %Str( );%Mend HP;

Options /* Forms=SASUSER.PROFILE.DEFAULT.FORM */ LPTYPE=BSD;Libname Insight "/usr/lpp/sas612/samples/insight";

The macros make it easy for me to assign the graphics devices that I want to use. For in-

corporating graphics into FrameMaker I use the %CGM and %CGMX macros. For incorporating

graphics into LATEXI use %EPS and %EPSX. While for direct printing I might use %PS.

5.2 Con�g.sas

The CONFIG.SAS �le contains the con�guration data that a user might customize, such as

the locations of the SAS system �les and memory allocations. It also speci�es where the

SAS WORK and SASUSER libraries are to be located. On networked systems, this can be very

useful for de�ning independent con�gurations for each user.

Page 74: Advanced SAS Programming Techniques ()

CHAPTER 5. SAS SPECIAL FILES 73

An example CONFIG.SAS �le is shown below.

/** This file, config.sas, holds default configuration options* for the SAS System. These options are overridden by options on the* command line, or options specified through the SAS612_OPTIONS* environment variable.**/

/** -maps specifies the pathname of the map datasets used by PROC GMAP.*/-maps !SASROOT/maps

/** -msg specifies the directory where the SAS System will search* for the files containing the text for all error messages* and notes. These messages are stored in an internal format.*/-msg !SASROOT/sasmsg

/** -news specifies the name of a text file that will automatically* be displayed in the log when SAS is invoked.*/-news !SASROOT/misc/base/news

/** -sasautos establishes the path(s) to director(ies) for automatic* macro definitions to be searched by the macro facility when* an unknown macro is referenced.*/-sasautos !SASROOT/sasautos

/** -sashelp specifies the pathname for the directory containing on-line* help and menu screens for the SAS System.*/-sashelp !SASROOT/sashelp

/** -sasuser specifies the pathname for the directory used by the SAS* System as a default place to store files, such as the SAS* user profile catalog. See your SAS System documentation* for more information.*/-sasuser ~/sasuser

/** -work specifies where to create the SAS work library. This* library is temporary and any SAS data sets created there* will be deleted when the system terminates. The unique name* 'SAS_workANNNN' is assigned to each SAS work library. 'A' is* some letter and 'NNNN' is the hexadecimal representation of the* process ID of the SAS process.*/-work /tmp

/** -sasscript specifies the location to search for SAS/CONNECT scripts.*/

Page 75: Advanced SAS Programming Techniques ()

CHAPTER 5. SAS SPECIAL FILES 74

-sasscript !SASROOT/misc/connect

/** -dms specifies that you are running SAS in Display Manager mode.*/-dms

/*** -memsize limits the amount of memory that will be allocated by the* SAS System.*/-memsize 32m

/*** -sortsize limits the amount of memory that will be allocated during* sorting operations.*/-sortsize 16m

/** Default windowing system to use.*/-fsdevice x11

/** -helpenv specifies that native help should be used when help is* invoked.*/-helpenv helplus

/** -helploc specifies the location of the native help files for helplus.* You would have to specify '-helpenv helplus' to use those files.*/-helploc !SASROOT/X11/native_help

/** -samploc specifies the location of the sample files for helplus.*/-samploc !SASROOT/X11/native_help

/** -path specifies the search path that the SAS System will use* to find the dynamically loaded modules. Each -path* specification indicates one directory. They will be* searched in the order in which they are given.*/-path !SASROOT/sasexe/base-path !SASROOT/sasexe/graph-path !SASROOT/sasexe/stat-path !SASROOT/sasexe/fsp-path !SASROOT/sasexe/af-path !SASROOT/sasexe/insight-path !SASROOT/sasexe/ets-path !SASROOT/sasexe/eis-path !SASROOT/sasexe/iml-path !SASROOT/sasexe/connect-path !SASROOT/sasexe/or

Page 76: Advanced SAS Programming Techniques ()

CHAPTER 5. SAS SPECIAL FILES 75

-path !SASROOT/sasexe/qc-path !SASROOT/sasexe/dbi-path !SASROOT/sasexe/english-path !SASROOT/sasexe/fsc-path !SASROOT/sasexe/gis-path !SASROOT/sasexe/image-path !SASROOT/sasexe/lab-path !SASROOT/sasexe/nvi-path !SASROOT/sasexe/pub-path !SASROOT/sasexe/share-path !SASROOT/sasexe/trader-path !SASROOT/sasexe/toolkit-path !SASROOT/sasexe/spectraview-path !SASROOT/sasexe/unixdb-path !SASROOT/sasexe/mddbserver

5.3 Pro�le.sct

The PROFILE.SCT is a SAS catalog that contains lots of information about your SAS ses-

sion. It will contain your keyboard mapping settings, your window color settings, printer

de�nition information, and lots of other information of this sort. Typically, its values are

modi�ed through the various utilities and pull-down menus available in the SAS display

manager.

Page 77: Advanced SAS Programming Techniques ()

Chapter 6

SAS Internet Tools

SAS Institute is developing many products to WEB-enable the SAS system. The products

enable the system to, for example, produce HTML output. That is, you can read the SAS

output with a web browser. Other products permit the use of HTML forms to interact with

SAS data bases. Thus you can issue data base queries against a SAS data set where the

commands are obtained from an HTML form. Other products allow an HTML document

to invoke and run commands on the local SAS system. This would permit development of

web applications that would actually be executed from within SAS.

6.1 Capturing OUTPUT for the Web

The easiest way to get started with a web-enabled SAS system is to use the macros for

generating HTML output. In the current versions (6.11 and 6.12) the macros must �rst

be installed. The necessary �les can be obtained from the SAS Institute WWW server

at the URL http://www.sas.com/rnd/web/intro.html. Once installed it is a relatively

straightforward process to calling the OUT2HTM() macro to turn on the capturing of the

output (or log) and calling it again to turn o� the capturing and to generate the HTML

code.

An example of capturing a printout of the ier data set is given below. The FORMCHAR option

is reset to the values below so that the line plots and tables look properly constructed.

76

Page 78: Advanced SAS Programming Techniques ()

CHAPTER 6. SAS INTERNET TOOLS 77

Options PS=55 LS=78 PageNo=1 NoDate;Options Formchar="|----|+|---+=|-/\<>*";Title1;

Data Flier;Infile "c:\projects\alaska97\flierdat.csv" Delimiter="," Firstobs=2;Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL

Size1 Size2 Size3 Size4 Size5 Size6 Edge No;Run;

%out2htm(capture=on);

%*Title1 "<center><font face=arial size=2 color=red>Flier Data</center>";Title1 "Flier Data";

Proc Print Data=Flier Uniform;Run;

Proc Plot Data=Flier;Plot Wt*Lt;Run; Quit;

%out2htm(capture=off,proploc=sasuser.htmlgen.outprop.slist,encode=n,htmlfile=flier.htm);

Note the use of HTML codes embedded within the TITLE statement so as to modify the de-

fault settings. The �rst call to OUT2HTM turns capturing on. The second call ends capturing,

speci�es the properties catalog that will control the appearance of the HTML code, and also

speci�es the output �le to contain the results. It is also possible to specify modi�cations to

the properties directly in the OUT2HTM macro as below

%out2htm(capture=off,encode=n,dface=COURIER,hface=COURIER,htag=PREFORMATTED TEXT,htmlfile=flier.htm,ttag=NO FORMATTING);

The DFACE variable controls the typeface for the data, the HFACE variable controls the type-

face for the headings, HTAG controls the tag type(s) that will be used around the headings,

and the TTAG controls the tag type(s) around the titles. The ENCODE variable determines

whether or not the angle brackets (greater than and less than signs) are encoded so as to

print in HTML as angle brackets. To capture the SAS log, use the option WINDOW=LOG. To

append HTML output to an existing �le, specify OPENMODE=APPEND. There are many other

options that are possible.

For the settings that I used (�xed-width fonts and black color), the following page was

observed in my browser.

Page 79: Advanced SAS Programming Techniques ()

CHAPTER 6. SAS INTERNET TOOLS 78

Page 80: Advanced SAS Programming Techniques ()

CHAPTER 6. SAS INTERNET TOOLS 79

Since a properties catalog can be constructed with the properties information, once you

have found the properties you like for a particular type of report, you may want to enter

them into a properties catalog. An easy way to do this is interactively. From within the

display manager, issue the following macro call

%out2htm(runmode=I);

This will bring up a dialog box within which you can specify the properties catalog to use.

Then select the properties button on the dialog box and make whatever modi�cations that

you have decided upon. The �rst dialog box that you encounter looks like the following.

Click on the properties button to move to the next window.

Page 81: Advanced SAS Programming Techniques ()

CHAPTER 6. SAS INTERNET TOOLS 80

Then select the TEXT tab to get to the html de�nitions for the text.

From this dialog box you can update the properties very easily then save them away. Later

when you wish to assign those properties to the SAS output or log, simply refer to this

properties list and you'll not have to make a long list of properties in the macro call.