35
SAS PROGRAMMING ESSENTIALS FOR STATISTICAL COMPUTING IN DAIRY RESEARCH A.K. Sharma NDRI, Karnal-132 001 [email protected] 1. Introduction Statistical Computing is the interface between Statistics and Computer Science. It is the area of computational science (or scientific computing) specific to the mathematical science of Statistics. The rapid and sustained increase in computing power starting from the second half of the 20 th century has made a substantial impact on the practice of Statistical Science. Early statistical models were almost always from the class of linear models, but powerful computers, coupled with suitable numerical algorithms, caused an increased interest in nonlinear models (such as connectionist models) as well as the creation of new types, such as generalised linear models and multivariate and multilevel models. Increased computing power has also led to the growing popularity of computationally intensive methods. The computer revolution has implications for the future of Statistics with new emphasis on Experimentaland EmpiricalStatistics. A large number of both general and special purpose statistical software are now available. SAS (pronounced sass, originally Statistical Analysis System) is such an integrated system of software products provided by SAS Institute Inc., Cary, North Carolina, USA that enables Scientists to perform various tasks, viz., Data Entry, Retrieval, Management and Mining Report Writing and Graphics Statistical Analysis Forecasting and Decision Support Operations Research and Project Management Quality Improvement Data Warehousing Platform Independent and Remote Computing Grid and Cloud Computing, etc. SAS was conceived by Anthony J. Barr in 1966. As a North Carolina State University graduate student during 1962-64, Barr had created an Analysis of Variance Modelling Language inspired by the notation of statistician Sir Maurice G. Kendall, followed by a Multiple Regression Programme that generated machine code for performing algebraic transformations of the raw data. Drawing on those programmes and his experience with structured data files, he created SAS, placing statistical procedures into a formatted file framework. From 1966 to 1968, Barr developed the fundamental structure and language of SAS. By 1971, SAS was gaining popularity within the academic community. Strength of the system was analysing experiments with missing data, which was useful especially to the agricultural and pharmaceutical industries among others. In 1973, extensive programming in econometrics, time series, and matrix algebra were made. In 1976, SAS Institute, Inc. was incorporated by Barr, et al.

SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS PROGRAMMING ESSENTIALS FOR STATISTICAL

COMPUTING IN DAIRY RESEARCH

A.K. Sharma

NDRI, Karnal-132 001

[email protected]

1. Introduction

Statistical Computing is the interface between Statistics and Computer Science. It is the area of

computational science (or scientific computing) specific to the mathematical science of Statistics.

The rapid and sustained increase in computing power starting from the second half of the 20th

century has made a substantial impact on the practice of Statistical Science. Early statistical

models were almost always from the class of linear models, but powerful computers, coupled

with suitable numerical algorithms, caused an increased interest in nonlinear models (such as

connectionist models) as well as the creation of new types, such as generalised linear models and

multivariate and multilevel models. Increased computing power has also led to the growing

popularity of computationally intensive methods. The computer revolution has implications for

the future of Statistics with new emphasis on “Experimental” and “Empirical” Statistics. A large

number of both general and special purpose statistical software are now available.

SAS (pronounced “sass”, originally Statistical Analysis System) is such an integrated system of

software products provided by SAS Institute Inc., Cary, North Carolina, USA that enables

Scientists to perform various tasks, viz.,

Data Entry, Retrieval, Management and Mining

Report Writing and Graphics

Statistical Analysis

Forecasting and Decision Support

Operations Research and Project Management

Quality Improvement

Data Warehousing

Platform Independent and Remote Computing

Grid and Cloud Computing, etc.

SAS was conceived by Anthony J. Barr in 1966. As a North Carolina State University graduate

student during 1962-64, Barr had created an Analysis of Variance Modelling Language inspired

by the notation of statistician Sir Maurice G. Kendall, followed by a Multiple Regression

Programme that generated machine code for performing algebraic transformations of the raw

data. Drawing on those programmes and his experience with structured data files, he created

SAS, placing statistical procedures into a formatted file framework. From 1966 to 1968, Barr

developed the fundamental structure and language of SAS. By 1971, SAS was gaining popularity

within the academic community. Strength of the system was analysing experiments with missing

data, which was useful especially to the agricultural and pharmaceutical industries among others.

In 1973, extensive programming in econometrics, time series, and matrix algebra were made. In

1976, SAS Institute, Inc. was incorporated by Barr, et al.

Page 2: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Base SAS – An Overview

Base SAS is the core foundation for variety of data management and analytic software

components under SAS. It provides essential tools for the basic data-driven tasks that are

performed by the scientists, i.e., data access, management, analysis and presentation (Fig.1).

Fig.1: Base SAS.

The SAS programming language, ready-to-use procedures and the windowing interface can be

used to access research data from many sources, manage, analyse and present the research data

as meaningful information in the form of a report that can be delivered to any platform, in any

format as per need. The reports thus generated can be published on the Web for its wider

dissemination among various stakeholders.

Accessing data

As stated earlier, you can access data using SAS regardless of the data sources or platforms on

which it resides, i.e., you can access data that are stored (almost) anywhere, whether these are in

a file on your system, or data that is stored on a remote server or in another database system in

almost any format, including raw data, SAS data sets, and files created by other vendors’

software. You can access various types of file with SAS that include, reading raw data in any

format from any kind of file including variable-length records, binary files, free-formatted data,

even files with missing data; accessing some other vendors’ files directly, including BMDP,

SPSS, and OSIRIS files. For others, you can use SAS/ACCESS tool to access external data as if

it were native to SAS, e.g., reading data stored in a Microsoft Excel spreadsheet, in a Microsoft

Access table, in a dBase file, or in ORACLE or another DBMS.

Managing data

After having accessed the data, the same can be manipulated using the SAS programming

language, e.g., format the data, create variables (columns), use operators to evaluate data values,

Data Access and Management

Application Development

Reporting and Graphics

Analytics

base SAS

Page 3: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

use functions to create and recode data values, subset data, perform conditional processing,

merge a wide range of data sources, create, retrieve, and update database information.

Analysing data and presenting information

Once the data is in shape, you can use SAS to analyse the data and produce reports. The resultant

SAS output can range from a simple listing of a data set to customised reports of complex

relationships. Base SAS provides powerful data analysis tools to produce tables, frequency

counts, and cross-tabulation tables; create a variety of charts and plots; compute a variety of

descriptive statistics, including the mean, sum, variance, standard deviation and more; compute

correlations and other measures of association, as well as multi-way cross-tabulations and

inferential statistics, etc.

For reporting and displaying analytical results, SAS provides several visually appealing output

formats, such as an array of mark-up languages including HTML4 and XML; output that is

formatted for a high-resolution printer, PostScript, PDF, and PCL files; RTF; interactive

coloured graphs using ActiveX controls or Java applets. Finally, these reports can be delivered to

a wide variety of locations and platforms in order to suit your needs.

SAS Workspace

SAS is designed to be easy to use. It provides windows for accomplishing all the basic SAS tasks

you need to do. Once you get familiar with the starting points for your SAS tasks, you are ready

to accomplish any task that SAS can do. SAS workspace is organised in the following five main

windows:

The Explorer Window

In the Explorer Window, you can view and manage your SAS files and create shortcuts to files

that are not formatted by SAS. You can use this window to:

View and manage SAS files and create shortcuts to files that are not formatted by SAS

Create new SAS libraries and SAS files

Open any SAS file

Perform most file management tasks such as moving, copying, and deleting files.

Program Editor or Editor Window

You can use either of these windows to enter, edit, and submit SAS programs. The Editor

Window provides several useful editing features such as Colour Coding and Syntax Checking of

SAS language; Expandable and Collapsible sections; Multilevel Undo and Redo, etc. The initial

Editor Window title is Editor - Untitled. When you open a file or save the contents of the Editor

Window to a file, the window title changes to reflect that file name. When the contents of the

Editor Window are modified, an asterisk is added to the title. You can have multiple Editor

Windows open at the same time.

Log Window

The Log Window displays messages about your SAS session and any SAS programs that you

submit.

Page 4: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Output Window

The Output Window displays the output from SAS programs that you submit. It automatically

opens as soon as an output is created. In the MS-Windows operating environment, the Output

Window is positioned behind the Log and Editor Windows until there is output to display. You

can navigate between windows using the taskbar. Several SAS programs may not create any

output in the Output Window. Some programs open interactive windows while other programs

only produce messages in the Log Window. If you create HTML output, you can view it in the

Results Viewer Window, which is the internal browser for SAS.

Results Window

The Results Window helps us to navigate and manage output from SAS programs that you

submit. You can view, save and print individual items of the output. Recall that the Results

Viewer Window is the place where you actually view HTML output. The Results Window is

empty until you submit a SAS program that creates output. Then it opens automatically. The

Results Window is positioned in front of the Explorer Window when SAS creates output. You

can move between the two windows by using the tabs at the bottom of the windows.

SAS data sets

Before you start working with the data in SAS, these data must be in a special form called a SAS

data set. Understanding SAS data sets is the first and foremost step towards learning about SAS

programming. Conceptually, a SAS data set (also called a table) is a file containing descriptor

information and related data values. The file is organised as a table of observations (rows), also

called cases and variables (columns) that SAS can process. Some SAS data sets also contain an

index, which enables SAS to locate records in the data set. In some special cases, such as using

SAS/ACCESS to read database management system files directly, the SAS data set may contain

only the logic for accessing the data, not the data itself! However, for this presentation, you

assume that SAS data sets contain data. To work with SAS data sets, it is also important to

understand as to how are they stored? Note that all SAS files are stored in a SAS library, which

is a collection of files such as SAS data sets and catalogues. In the MS-Windows operating

environment, a SAS library is typically a group of SAS files in the same folder. To access a

library, assign it a name (also known as a libref, or library reference). The library names are

nicknames or shortcuts that you can use to identify libraries during a SAS session. In this

presentation, you will learn about assigning SAS libraries, and then to work with SAS data sets

in a library. SAS assigns three libraries, viz., Sashelp, Sasuser and Work, automatically each time

you start SAS. In this presentation, you learn about these libraries and view file types in the

Sashelp library. In the Explorer Window, double-click Libraries. Notice that there are three

libraries. Sashelp is a permanent library that contains sample data and other files, which control

how SAS works at your site; this is a read-only library. Sasuser is also a permanent library that

contains SAS files in the Profile Catalogue that store your personal settings; this is also a

convenient place to store your own files. Work is a temporary library for files that do not need to

be saved from session to session. Note that Sashelp, Sasuser and the user-assigned libraries are

permanent libraries. Files in permanent libraries are saved in the operating environment. Work is

a temporary library; i.e., files in the Work library are not saved once you end the SAS session.

Double-click the Sashelp library. Scroll the Explorer Window and notice that there are several

types of files, or members, in the library.

Page 5: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Assign a library

When you define a library, you indicate the location of your SAS files to SAS. Once you define a

library, you can manage SAS files within it. In this presentation, you use the New Library

Window to assign a library. On the toolbar, click the New Library tool; the New Library

Window opens. In the Name box, type Aksas (or any name you like). Library names are limited

to 8 characters, which must start with a letter or underscore. The library names can contain only

letters, numerals, or underscores. Select the Enable at Startup check box; this library will be

automatically assigned each time you start a SAS session. Click Browse; select the default

location or select another location in your operating environment. Any files that you save to the

Aksas library will be saved in the folder that you designate in the Path box; click OK. You can

delete SAS libraries. Note that when you delete a SAS library, SAS no longer has access to the

directory. However, the contents of the library still exist in your operating environment.

Import different format data files as SAS data sets and rename the data set files

Example 1

Consider the following MS-Excel worksheet (lact.xlsx) containing data pertaining to milk

production (state wise, up to first four lactations) for different breeds of dairy cattle. The same

worksheet file can be saved as ASCII text file (say, lact.txt) by selecting appropriate file type at

the time of saving the worksheet.

Let us see how to add and rename a SAS data set. Now that you have a library, you can add a

SAS data set to the library. With the Explorer Window active, select View ► Show Tree; the

libraries are displayed in the left pane of the window. Click the Aksas library. Import lact.xlsx

(or lact.txt) file to SAS using File ► Import Data... option. Click Aksas; notice that lact file

has been imported there. Right-click lact and select Rename. Type lactation for the new name

and click OK. SAS data set names must be 1 to 32 characters in length; begin with a letter (A-Z,

STATE LACT BREED MY

Haryana 1 Hariana 18

Haryana 2 Hariana 19

Haryana 3 Hariana 21

Haryana 4 Hariana 20

Haryana 1 Tharparkar 23

Haryana 2 Tharparkar 25

Haryana 3 Tharparkar 22

Haryana 4 Tharparkar 21

Haryana 1 Sahiwal 22

Haryana 2 Sahiwal 24

Haryana 3 Sahiwal 23

Haryana 4 Sahiwal 23

Haryana 1 Red Sindhi 27

Haryana 2 Red Sindhi 26

Haryana 3 Red Sindhi 25

Haryana 4 Red Sindhi 25

STATE LACT BREED MY

Panjab 1 Hariana 25

Panjab 2 Hariana 36

Panjab 3 Hariana 35

Panjab 4 Hariana 37

Panjab 1 Tharparkar 30

Panjab 2 Tharparkar 32

Panjab 3 Tharparkar 29

Panjab 4 Tharparkar 30

Panjab 1 Sahiwal 26

Panjab 2 Sahiwal 29

Panjab 3 Sahiwal 30

Panjab 4 Sahiwal 38

Panjab 1 Red Sindhi 29

Panjab 2 Red Sindhi 25

Panjab 3 Red Sindhi 27

Panjab 4 Red Sindhi 28

Page 6: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

including mixed case characters) or an underscore ( _ ); and continue with any combination of

numbers, letters, or underscores.

Open a SAS data set

Now that you’ve created a data set, view the data that it contains. There are many other ways to

get your data into a SAS data set. In general, you can enter data directly into a SAS data set using

the VIEWTABLE Window; read raw data into a SAS data set using the Import wizard or SAS

programming statements; read and modify existing data; convert other vendors’ data files into

SAS data sets using SAS/ACCESS; read other vendor’s data directly using SAS/ACCESS. In the

Explorer Window, double-click the lactation table in the Aksas library; the table opens in the

VIEWTABLE Window; scroll the VIEWTABLE Window and notice that there are 32 rows

(also called observations or cases, both terms are used here synonymously) and 4 columns (also

called variables) sets using SAS programming statements.

View general data properties

The descriptor portion of a SAS data set contains information about the data set, including the

name of the data set; the date and time the data set was created; the number of observations; the

number of variables; you can see this information by viewing the general properties of a data set.

In the Explorer Window, right-click the Lactation table and select Properties. In the General

tab, view the data set’s properties. (Don’t close this window yet as you’ll need it for the next step

too).

View column attributes

Besides general information about the data set, the descriptor portion also contains attribute

information for each variable in the data set including the variable’s name, type, length, format,

informat (input format), and label. In the Aksas.lactation Properties Window, click the

Columns tab. In the Column Name column, notice that all the variables for the data set are

listed along with a symbol that indicates the variable’s type.

Variables and Labels

Variable (column) names must be 1 to 32 characters in length; begin with a letter (A-Z, including

mixed case characters) or an underscore ( _ ); continue with any combination of numbers, letters,

or underscores. SAS is case insensitive for variable names; however, SAS remembers the first

occurrence of the variable and writes it that way in output. Now, look at the Label column, a

label is descriptive text up to 256 characters; labels are used instead of variable names in some

reports and for the column headings in the VIEWTABLE Window.

Types of Variables

Further, look at the Type column. There are only two types of variables in SAS, viz., character

and numeric. Character variables are listed as Text in the type column, and numeric variables are

listed as Number in the Type column. The Length attribute is related to the variable’s type.

Character variables can contain any values; use a blank to represent missing values; can be up to

32k long. Numeric variables can contain only numeric values (the digits 0 through 9, +, -, ., and

E for scientific notation); use a single period (.) to represent missing values; have a default length

of 8; numeric values (no matter how many digits they contain) are stored as floating point

numbers in 8 bytes of storage, unless you specify another length.

Page 7: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Change a variable’s format

Formats are variable attributes that affect the way data values are written. SAS offers a variety of

formats for numeric and character data. In this session you change the format of a variable. In the

Aksas.lactation Properties Window, look at the Format column; the MY variable uses the SAS

default format, which displays the value 18.00 as 18 in a report. In general, SAS formats have a

name; a w value, which specifies the width that is used for displaying the value, a period

following the w value, numeric SAS formats, such as the Fw.d format, can also specify a d value,

which is the number of decimal places to be displayed. Open the lactation table if it is not

already open; right-click the MY column heading and select Column Attributes from the

shortcut menu (Fig.2); in the Column Attributes Window, click the selector next to the Format

box.

Fig.2: Viewing Column Attributes of Aksas.lactation data set.

In the Format Window, view the available formats and their descriptions. When you select a

format in this list, an example is displayed in the Format Details area. Let’s say that you want to

incorporate the decimal places for the values of the MY variable. Change the value in the

Decimal box to 12.2; click OK. In the Column Attributes Window, click Apply; then click

Close. View the lactation table and notice that the values in the MY column now contain

decimal places up to two decimal points.

Page 8: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Fig.3: Change a variable’s format.

1. SAS programming essentials

The SAS programming language is both powerful and flexible. You can program any number of

analyses and reports with it. The SAS language contains statements, expressions, functions and

CALL routines, options, formats, and informats, i.e., elements that many programming

languages share. SAS can also simplify programming for you with its library of built-in

programs known as SAS procedures.

Create and run SAS programs

SAS procedures use data values from SAS data sets to produce pre-programmed reports,

requiring minimal effort from users. Base SAS procedures enable you to manipulate data; store

and retrieve information; perform statistical analysis; create reports. This session shows you the

main characteristics of SAS programs, i.e., components of SAS programs. Let’s begin by looking

at a simple SAS program. The sample SAS program below (Fig.4) contains two steps, a DATA

step and a PROC step. These two types of steps, alone or combined, form all SAS programs:

Fig.4: A sample SAS program showing the DATA and PROC steps.

Page 9: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Components of a SAS program

DATA step typically creates or modifies SAS data set. Also, it is used to produce customised

reports. For example, you can use DATA step to:

put your data into a SAS data set

compute the values for new variables

check for and correct errors in your data

produce new SAS data sets by subsetting, merging, and updating existing data sets, etc.

Note that, in the sample program (Fig.4), the DATA step creates a new SAS data set,

lact_pp.sas7bdat in the Aksas library using the existing SAS data set lactation.sas7bdat in the

same library. Note that we do not directly use the original data set (lactation, in this case) rather

a new data file (lact_pp, in this case) is created from the original data set, which is actually

utilised by SAS for analysis of the underlying data.

PROC (procedure) steps typically analyse and process data presented in the form of a SAS data

set; and they sometimes create SAS data sets that contain the results of the procedure. PROC

steps control a library of pre-written routines (procedures) that perform tasks on SAS data sets,

such as listing, sorting, and summarising data, e.g., you can use PROC steps to print a report;

produce descriptive statistics; create a tabular report; produce plots and charts, etc. The sample

program (Fig.4) uses the PROC PRINT procedure, which prints the data in a data set. The

DATA= option tells SAS what data to use for the procedure.

Characteristics of SAS Programs

Let’s have a glance at the individual statements in the sample program (Fig.4). SAS programs

comprise of several SAS statements. A SAS statement has two important characteristics, i.e., it

usually begins with a SAS keyword; and it always ends with a semicolon. A DATA step begins

with the keyword DATA, and a PROC step begins with the keyword PROC. Generally, a step

ends with a RUN statement or when a new DATA or PROC step begins. A RUN statement tells

SAS to process all the preceding lines of the step. SAS statements are free-format, i.e., they can

begin and end anywhere on a line; one statement can continue over several lines; and several

statements can be on a single line. SAS statements are not case sensitive. Blanks or special

characters separate the ‘words’ in a SAS statement.

Composing a SAS program into the Editor Window Let’s compose the sample SAS programs [i.e., Code 1(a) and Code 1(b)] given below. These

programs create a new SAS data set from an existing SAS data set and then print a listing of the

new data set. The Aksas.lactation data set was created in Aksas with SAS data sets. If you

didn’t create it, return to the task to define the library and create the data set before proceeding

here. Type the following codes (one-by-one) separately, into the Editor Window and save:

Page 10: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Code 1(a): One-level name. Code 1(b): Two-level name.

LIBNAME aksas ‘d:\aksas’;

DATA lact_1a;

SET aksas.lactation;

RUN;

PROC PRINT DATA=lact_1a;

RUN;

LIBNAME aksas ‘d:\aksas’;

DATA aksas.lact_1b;

SET aksas.lactation;

RUN;

PROC PRINT DATA=aksas.lact_1b;

RUN;

Notice that when you write this text into the Editor Window, SAS keywords are colour-coded

and the program steps are separated into expandable blocks. The DATA step in Code 1(a)

creates a new SAS data set named lact_1a. Note that lact_1a is a one-level name. Since no

library name is specified for the lact_1a data set, SAS uses by default the Work library to

temporarily store this file. The lact_1a data set is created from the lactation data set that you

stored in your Aksas library. Notice that Aksas.lactation and Aksas.lact_1b are two-level

names [as shown in the Code 1(a) and Code 1(b)].The PROC PRINT step lists all the data in the

Work.lact_1a data set. If the user-defined library is assigned as in case of the Code 1(b), SAS

uses the user-defined library (Aksas in present case) rather than the Work library for one-level

names.

Submit the program and view output When you submit a SAS program, SAS compiles (checks the code for its grammatical

correctness in SAS perspective) and executes (if compilation process is successful) the code and

returns expected results (provided the code is logically correct) to the Output Window.

1. With the Editor Window active, select Run ►Submit. To submit only a portion of a

program in the Editor Window, highlight the portion you want to submit, right-click the

highlighted area, and select Submit ► Selection.

2. The Output Window comes to the front (Fig.5). Scroll the window and notice that the

PRINT procedure has printed all the data from the lactation data set. In addition, the

procedure has automatically formatted the data into rows and columns.

Fig.5: Results of the Code 1(a) SAS program being displayed in the OUTPUT Window.

Page 11: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

View the log Each time a step is executed, SAS generates a log of the processing activities and the results of

the processing. The SAS log collects messages about the processing of SAS programs and any

errors that may occur. Click the Log Window to activate it. The contents of the Log Window

produced by SAS for the Code 1(a) program are shown in Fig.6. Notice that there are separate

messages for the DATA and PROC steps. The messages from the DATA step tell you that SAS

read 32 observations from the Aksas.lactation data set and that there are 32 observations and 4

variables in the new data set Work.lact_1a. The messages from the PROC step tell you that 32

observations were read during the PRINT procedure. There are no error messages.

Fig.6: The contents of the Log Window produced by SAS while executing the Code 1(a)

program.

It’s important that you understand how SAS processes DATA steps (one observation at a time)

and PROC steps. You should also learn methods and tips for debugging SAS programs. For

details on these topics, refer to the SAS documentation.

Use the Results Window Once you have created output for Code 1(a), the Results Window opens. On windows, the

Results Window covers the Explorer Window. You can switch between the two windows using

the tabs at the bottom of the window.

1. Look at the Results Window. The Results node at the top of the window contains a folder

for the output of the PROC PRINT procedure (Fig.7). The folder is labeled with the

procedure name and the title (default in present example) of the report.

2. Click the expansion icon next to the folder.

3. Double-click Data Set Work.lact_1a. The Output Window (Fig.5) scrolls to the top of

the output.

Page 12: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Fig.7: The contents of the Results Window produced by SAS while executing the Code 1(a)

program.

Note that each time you run a program that creates output; the new output is appended to the last

output in the Output Window. You can double-click the output item in the Results Window to

scroll the Output Window to the top of that particular output. If you clear the Output Window,

the links in the Results Window are deleted.

Create HTML output You’ve seen that you can create SAS listing output in the Output Window, but you can also set

preferences to create HTML output (Fig.8). Your preferences are stored from session to session.

1. Select Tools ► Options ► Preferences.

2. Click the Results tab.

3. Select the Create HTML check box and click OK.

Page 13: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Fig.8: Setting preferences to create HTML output.

Modify the program Now that you have seen the output and formatting for the PRINT procedure, let’s try adding

options to the program to incorporate the following changes:

Remove the date from the top of each page, which SAS prints by default

Replace the default “The SAS System” title with

Use the column label rather than the column name in the output

Remove the Obs column.

We can do this by using a SAS system option, a TITLE statement, and options on the PROC

PRINT statement. To remove the date from the top of each page, use the NODATE system

option. Insert the following OPTIONS statement at the top of the program: options nodate; to

replace the default title, use the TITLE statement. Insert the line, title “State, Lactation and

Breed wise Milk Yield Report”; above the RUN statement for the PROC PRINT procedure.

By including the TITLE statement in the PROC step, the title is applied to the output of the

procedure. To remove the Obs column and use column labels, use options on the PROC PRINT

statement. Change the PROC PRINT statement to add the LABEL and NOOBS options. Your

PROC PRINT statement should look like this:

proc print data=lact_1b label noobs. The revised program should look like as shown below.

OPTIONS NODATE;

LIBNAME aksas ‘d:\aksas’;

DATA aksas.lact_1b;

SET aksas.lactation;

RUN;

PROC PRINT DATA=aksas.lact_1b label noobs;

TITLE "State, Lactation and Breed wise Milk Yield Report";

Page 14: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

RUN;

OPTIONS;

TITLE;

Select Run ► Submit the above program; it produces the following revised output (compare it

with that given in Fig.5):

Fig.9: Revised output without system date as well as having user-defined title replacing the

default SAS title on top of the report.

With the Results Viewer Window active, select File ► Save As. In the File name box, type

e:\aksas\Lactation.htm. Click Save. Look at the Results Window. Notice that there is a new

folder for the results of the program you just ran. The folder name uses the State, Lactation and

Breed wise Milk Yield Report title that you specified in your program. Double-click the Print:

State, Lactation and Breed wise Milk Yield Report folder, then double-click Data Set

Aksas.lact_1b. There are links to both the listing output and the HTML output (Fig.10).

Page 15: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Fig.10: Revised Results with links to simple output and HTML formats.

Now you’ve learned the basic steps for creating, submitting (running), and saving SAS

programs.

Creating simple data sets and manipulating the data with SAS program

Earlier, we imported data from an already existing data file in different formats such as MS-

Excel worksheet or ASCII formats to build up SAS data set. The following example program

illustrates as to how to create a SAS data set directly through programming.

Example 2: Let us create directly a SAS data set for the data given in Example-1 above.

We use the following program (a revised version of code 1(b) above). The data set is saved as

aksas.lactation_.sas7bdat. Explanation of the code is given at its end.

OPTIONS NODATE;

LIBNAME aksas 'd:\aksas';

DATA aksas.lactation_;

INPUT state $ lact $ breed $ MY;

Page 16: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

CARDS; Haryana 1 Hariana 18

Haryana 2 Hariana 19

Haryana 3 Hariana 21

Haryana 4 Hariana 20

Haryana 1 Tharparkar 23

Haryana 2 Tharparkar 25

Haryana 3 Tharparkar 22

Haryana 4 Tharparkar 21

Haryana 1 Sahiwal 22

Haryana 2 Sahiwal 24

Haryana 3 Sahiwal 23

Haryana 4 Sahiwal 23

Haryana 1 Red Sindhi 27

Haryana 2 Red Sindhi 26

Haryana 3 Red Sindhi 25

Haryana 4 Red Sindhi 25

Panjab 1 Hariana 25

Panjab 2 Hariana 36

Panjab 3 Hariana 35

Panjab 4 Hariana 37

Panjab 1 Tharparkar 30

Panjab 2 Tharparkar 32

Panjab 3 Tharparkar 29

Panjab 4 Tharparkar 30

Panjab 1 Sahiwal 26

Panjab 2 Sahiwal 29

Panjab 3 Sahiwal 30

Panjab 4 Sahiwal 38

Panjab 1 Red Sindhi 29

Panjab 2 Red Sindhi 25

Panjab 3 Red Sindhi 27

Panjab 4 Red Sindhi 28

;

PROC PRINT DATA = aksas.lactation_ label noobs;

TITLE "State, Lactation and Breed wise Milk Yield Report";

RUN;

OPTIONS;

TITLE;

Program Explanation (Example 2):

As stated earlier, this program creates a data set, lactation_.sas7bdat under Aksas library. The

above program needs some explanation for readers’ comprehension. The ‘input’ statement has

been incorporated under the DATA step. This statement creates the variables, state, lact, breed

and my denoting State Name, Lactation Number, Breed Name and Milk Yield, respectively.

Note the ‘$’ symbol following the variable names, state, lact and breed each under input

statement indicates that these variables are all of type ‘alpha-numeric’ or ‘non-numeric’, or

‘character’, i.e., no statistical/mathematical analysis is employed on these variables, though these

variables may be used as grouping variables for classification of the underlying data. The readers

Page 17: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

should note it carefully that the data values assumed by the lact variable seem to be numeric (1-

4) but these values are like only serial numbers, which do not directly participate in the data

analysis, e.g., you will never compute average lactation number as it makes no sense. Hence,

such variables are better defined as character type variables (to be used as grouping variables)

rather than numeric variables, for you will not appreciate if the lactation number is represented as

1.00, 2.00…4.00. The data type of the variable, my (denoting milk yield) is numeric. Further, the

statement ‘cards’ used in conjunction with the input statement indicates to the SAS system that

each of the rows following cards statement represent one case or record pertaining to the milk

production of a particular breed of dairy cow in a specific state, pertaining to a specific lactation

number. Several cases can be included here and the end of the data is marked with a semi-colon

(;). Note it carefully that the DATA step does not terminate with ‘run’ statement in such a case;

rather it ends with only a semi-colon. Also, note that each variable in input statement is separated

with a blank space. Each case is written on a separate line and the data value for each variable on

this line is written in the same order as defined in the input statement. This program produces

same output as shown in Fig.9.

Example 3: The following table gives the gain in body weight (kg) per heifer during four

grazing treatments:

Heifer number Gain in body weight (kg)

Treatments

T1 T2 T3 T4

1 67.3 74.2 63.1 48.7

2 36.9 42.2 32.9 49.0

3 63.2 58.6 59.2 62.0

4 26.8 36.6 42.4 38.8

5 54.8 54.6 34.0 48.2

6 64.2 81.8 65.6

7 81.4

Test whether the mean gain in weight of heifers under four treatments is equal or not?

You know that this is a typical example of one-way Analysis of Variance (ANOVA). Therefore,

the SAS program to first create the data set and then perform the one-way ANOVA on this data

set is as follows:

LIBNAME aksas 'd:\aksas';

DATA aksas.bla256;

INPUT TREAT $ GAIN;

CARDS;

T1 67.3

T1 36.9

T1 63.2

T1 26.8

T1 54.8

T1 64.2

Page 18: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

T1 81.4

T2 74.2

T2 42.2

T2 58.6

T2 36.6

T2 54.6

T2 81.8

T3 63.1

T3 32.9

T3 59.2

T3 42.4

T3 34.0

T3 65.6

T4 48.7

T4 49.0

T4 62.0

T4 38.8

T4 48.2

;

PROC ANOVA DATA=aksas.bla256;

CLASS TREAT;

MODEL GAIN=TREAT;

TITLE 'COMPARE MEAN GAIN IN BODY WEIGHT OF HEIFERS UNDER FOUR

TREATMENTS - ONE-WAY ANOVA';

RUN;

QUIT;

Program explanation (Example 3):

Similar to previous Example-2, the DATA step here creates data set containing data on two

variables, TREAT and GAIN denoting grazing treatments (grouping variable) and gain in body

weight, respectively. Note that the cases are presented sequentially treatment wise, i.e., following

the order: T1, T2, T3 and T4. The SAS system expects you to present the one-way data for

ANOVA in this fashion only. Now, further look at the PROC ANOVA step. (Note: the PROC

ANOVA is presented here so as to highlight the layout of the data within the data set under

DATA step in perspective with ANOVA as the arrangement of data in data set depends upon the

intended analysis tool to be employed on the data set). The PROC ANOVA step performs one-

way analysis of variance procedure on the underlying data set. PROC ANOVA is relatively easy

to use. Let’s look at the code statement-by-statement.

PROC ANOVA DATA = aksas.bla256;

This statement defines that we want to use the ANOVA procedure on the data set called

aksas.bla256.sas7bdat, which was created under the DATA step above. Now, consider the next

statement in PROC ANOVA step:

CLASS TREAT;

Page 19: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

This defines TREAT (independent variable) as class variable. This is how we specify the

associated values to be grouped. Further, consider the next statement:

MODEL GAIN=TREAT;

It tells SAS what model we want to test. This statement signifies that GAIN (on LHS to ‘=’ sign)

is the response variable or dependent variable and TREAT (on RHS to ‘=’ sign) is a factor or

independent variable. Complete description of any SAS step and statements contained therein

can be obtained through inbuilt HELP documentation. This program produces the ANOVA table

along with several other results as follows:

COMPARE MEAN GAIN IN BODY WEIGHT OF HEIFERS UNDER FOUR

TREATMENTS - ONE-WAY ANOVA

The ANOVA Procedure Class Level Information

Class Levels Values

TREAT 4 T1 T2 T3 T4

Number of Observations Read 24

Number of Observations Used 24

The ANOVA Procedure

Dependent Variable: GAIN

Source DF

Sum of

Squares Mean Square F Value Pr > F

Model 3 359.889964 119.963321 0.48 0.7022

Error 20 5035.919619 251.795981

Corrected Total 23 5395.809583

R-Square Coeff Var Root MSE GAIN Mean

0.066698 29.60233 15.86808 53.60417

Source DF Anova SS Mean Square F Value Pr > F

TREAT 3 359.8899643 119.9633214 0.48 0.7022

Note: Complete details of the analysis and interpretation of the results are discussed elsewhere

(by different authors) in this e-course.

Page 20: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Example 4: The following table gives the fat content for four batches of paneer :

Batch

1 2 3 4

Fat

Conte

nt

50.3 54.1 57.5 55.3

52.2 53.7 56.3 52.8

52.5 52.5 55.8 53.5

51.7 55.2 56.9 51.7

52.6 55.8

53.2

Write a SAS program to create data set to analyse the data for variation among batches with

regard to fat content in paneer.

This is the example of one-way ANOVA, which is similar to the previous Example-3. Therefore,

the SAS program to create the requisite data set file for the data given in the above table and to

print the observations contained in the newly created data file is a follows:

OPTIONS NODATE;

LIBNAME aksas 'd:\aksas';

DATA aksas.ex4;

INPUT batch $ fat_cont;

cards;

B1 50.3

B1 52.2

B1 52.5

B1 51.7

B1 52.6

B1 53.2

B2 54.1

B2 53.7

B2 52.5

B2 55.2

B3 57.5

B3 56.3

B3 55.8

B3 56.9

B3 55.8

B4 55.3

B4 52.8

B4 53.5

B4 51.7

;

PROC PRINT DATA = aksas.ex4;

TITLE "Analysis of variance between batches w.r.t. fat content in Paneer";

RUN;

TITLE;

OPTIONS;

QUIT;

Page 21: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Program output (Example 4)

Analysis of variance between batches w.r.t. fat content in Paneer

Obs batch fat_cont

1 B1 50.3

2 B1 52.2

3 B1 52.5

4 B1 51.7

5 B1 52.6

6 B1 53.2

7 B2 54.1

8 B2 53.7

9 B2 52.5

10 B2 55.2

11 B3 57.5

12 B3 56.3

13 B3 55.8

14 B3 56.9

15 B3 55.8

16 B4 55.3

17 B4 52.8

18 B4 53.5

19 B4 51.7

Example 5: Four batches of butter were prepared by churning cream. Three fat levels in cream

and three churning speeds were taken for preparation of butter. The moisture content (mc) in

butter for the four batches is given below:

Bat

ch

Fat Level

30% 40% 50%

Churning Speed

(RPM)

Churning Speed

(RPM)

Churning Speed

(RPM)

1600 1700 1800 1600 1700 1800 1600 1700 1800

1 66.8 69.1 69.3 77.9 77.8 76.1 71.7 72.2 71.9

2 68.1 68.9 70.1 77.6 78.8 76.2 72.3 71.9 71.5

3 67.0 69.2 70.0 77.7 78.7 76.0 72.3 72.2 71.8

4 67.5 68.1 68.9 77.1 77.6 75.7 71.5 71.4 71.3

Write a SAS program to prepare a data file to perform two-way ANOVA with interaction.

This is an example of two-way classification of data with observations per cell ANOVA. The

program to create SAS data set file for this problem is given below (Note: Codes used for the

variables, fat level (f_level), speed (speed) and batch (batch) are 1-3, 1-3 and 1-4, respectively) :

Page 22: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

LIBNAME aksas 'd:\aksas';

DATA aksas.twoway_m;

INPUT f_level speed batch mc;

CARDS;

1 1 1 66.8

1 2 1 69.1

1 3 1 69.3

2 1 1 77.9

2 2 1 77.8

2 3 1 76.1

3 1 1 71.7

3 2 1 72.2

3 3 1 71.9

1 1 2 68.1

1 2 2 68.9

1 3 2 70.1

2 1 2 77.6

2 2 2 78.7

2 3 2 76.4

3 1 2 72.3

3 2 2 71.9

3 3 2 71.8

1 1 3 67.0

1 2 3 69.2

1 3 3 70.0

2 1 3 77.0

2 2 3 78.2

2 3 3 76.0

3 1 3 72.2

3 2 3 72.3

3 3 3 71.8

1 1 4 67.5

1 2 4 68.1

1 3 4 68.9

2 1 4 77.1

2 2 4 77.6

2 3 4 75.7

3 1 4 71.5

3 2 4 71.4

3 3 4 71.3

;

PROC GLM DATA = aksas.twoway_m;

CLASS f_level speed batch;

MODEL mc = f_level speed f_level*speed;

TITLE 'Analyse the effects of fat level and speed';

TITLE2 'Including Interaction';

RUN;

QUIT;

Page 23: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Program output (Example 5):

Analyse the effects of fat level and speed Including Interaction

Obs f_level speed batch mc

1 1 1 1 66.8

2 1 2 1 69.1

3 1 3 1 69.3

4 2 1 1 77.9

5 2 2 1 77.8

6 2 3 1 76.1

7 3 1 1 71.7

8 3 2 1 72.2

9 3 3 1 71.9

10 1 1 2 68.1

11 1 2 2 68.9

12 1 3 2 70.1

13 2 1 2 77.6

14 2 2 2 78.7

15 2 3 2 76.4

16 3 1 2 72.3

17 3 2 2 71.9

18 3 3 2 71.8

19 1 1 3 67.0

20 1 2 3 69.2

21 1 3 3 70.0

22 2 1 3 77.0

23 2 2 3 78.2

24 2 3 3 76.0

25 3 1 3 72.2

26 3 2 3 72.3

27 3 3 3 71.8

28 1 1 4 67.5

29 1 2 4 68.1

30 1 3 4 68.9

31 2 1 4 77.1

32 2 2 4 77.6

33 2 3 4 75.7

34 3 1 4 71.5

35 3 2 4 71.4

36 3 3 4 71.3

Page 24: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Example programs (using data set, aksas.lactation.sas7bdat given in Example-1 above)

Ex.1. Let us learn how to work with SAS data set and to access and browse SAS data libraries.

The following program demonstrates how to generate a list of all the SAS files in the data library

(aksas in this case) using _ALL_ keyword. Note that NODS option suppresses the descriptor

portions of the data sets. NODS is used in conjunction with the keyword _ALL_.

LIBNAME aksas “d:\aksas”;

PROC CONTENTS DATA = aksas._ALL_ nods;

RUN;

Ex.2. The following program displays contents of the aksas.test data set file as well as prints

listing of the data set according to specified condition and order.

OPTIONS NODATE;

LIBNAME aksas “d:\aksas”;

DATA aksas.test;

SET aksas.lactation;

RUN;

PROC CONTENTS DATA = aksas.test;

RUN;

PROC PRINT DATA=aksas.test;

TITLE “State, Lactation and Breed wise Milk Yield Report”;

WHERE state=‘Haryana’;

VAR my lact;

RUN;

Output produced by SAS Program (Ex.2):

State, Lactation and Breed wise Milk Yield Report

Obs MY LACT

1 18 1

2 19 2

3 21 3

4 20 4

5 23 1

6 25 2

7 22 3

8 21 4

9 22 1

10 24 2

11 23 3

12 23 4

13 27 1

14 26 2

Page 25: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

15 25 3

16 25 4

Program explanation (Ex.2):

Two new statements, where and var have been introduced under PROC PRINT procedure, which

need explanation. The other statements have been defined already. A where statement is used to

select cases from a SAS data set that meet a particular condition. Thus, where statement subsets

the input data by specifying certain condition(s) that each observation must meet before it is

available for processing. The condition(s) that you define in a where statement is an arithmetic or

logical expression that generally consists of a sequence of operands and operators. To compare

character values, you must enclose them in single or double quotation marks and the values must

match exactly, including capitalisation. Using the where statement might improve the efficiency

of your SAS programs because SAS is not required to read all the observations in the input data

set. Note: You can use only one where statement in a DATA step or a PROC step.

Now, let us further discuss the program described in this Example-2. The where statement is

used here to define certain condition or compound condition (connected through AND, OR or

NOT logical operators, etc. For more details on SAS operators visit:

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/

viewer.htm#a000780367.htm) to be tested by SAS for selective data extraction from source data

set (aksas.lactation.sas7bdat in this case) into the destination data set (aksas.test.sas7bdat in this

exercise program). Consider the following statement:

WHERE state=‘Haryana’;

This instructs to SAS that extract all those cases from the original data set for which the state

variable contains data value as Haryana and stores them into destination data set file aksas.test.

The ‘=’ sign used in such conditions as above is for comparison and not for equality. Note that

state variable is of character data type; therefore, while matching its data value, Haryana is

enclosed in single quote marks. If the variable is numeric type, we equate the data value to the

variable as such, e.g., my > 18 (i.e., extract all the cases for which milk yield is greater than 18).

We can replace earlier condition with the following:

WHERE state=‘Haryana’ and my between 20 and 30;

This statement would extract all the cases having state as Haryana as well as milk yield lying in

the interval [20, 30] (see output with this compound condition instead of earlier simple condition,

at the end of this paragraph).

Now, consider the statement:

VAR my lact;

This indicates to SAS that while printing the information from the destination data set file,

consider only variables listed under var statement, i.e., my and lact in this case. The information

will be printed according to the order in which the variable names are mentioned under var

statement irrespective of the original order in which these variables have been actually stored in

Page 26: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

computer memory. Hence, we can alter the order of variables while printing the report without

disturbing their original settings.

Output of SAS Program (Ex.2) with compound condition in where statement:

State, Lactation and Breed wise Milk Yield Report

Obs MY LACT

1 21 3

2 20 4

3 23 1

4 25 2

5 22 3

6 21 4

7 22 1

8 24 2

9 23 3

10 23 4

11 27 1

12 26 2

13 25 3

14 25 4

Programs for practice (All the following programs use data set, aksas.lactation.sas7bdat given

in Example-1 above)

Now the readers can try the following programs similar to the program discussed in Example-2

above. You will come across some new statements and options that are explained; however, you

can comprehend the new statements through inbuilt SAS help.

Ex.1. Program to subset cases from the master or base data set file, lactation.sas7bdat into sub

data set or target file, red_sin.sas7bdat pertaining to ‘Red Sindhi’ breed of dairy cattle for both

the states. Logical operator OR is used to form the compound condition in PROC PRINT step;

this means that it will extract the case if either of the two conditions is found TRUE. (Note: This

is unlike AND operator used in Example-2; which was used to test both the conditions to be

TRUE simultaneously). As stated earlier, spellings and case for the pattern matching string say,

‘Panjab’ should be unique throughout. If you store the state as Panjab and search for Punjab or

panjab, etc., SAS will not be able to find it even though the one exists in data set file!

SAS Program (Ex.1):

OPTIONS NODATE;

LIBNAME aksas ‘d:\aksas’;

Data aksas.red_sin;

Set aksas.lactation;

RUN;

Page 27: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

PROC CONTENTS DATA = aksas.red_sin;

RUN;

Proc Print Data=aksas.red_sin;

Title "State, Lactation and Breed wise Milk Yield Report";

Where (state=‘Haryana’ and breed=‘Red Sindhi’) or (state=‘Panjab’ and breed=‘Red Sindhi’);

Var lact my;

Run;

Ex.2. This program subsets and prints the Haryana state’s cases on variables breed, lact and

my.

OPTIONS NODATE;

LIBNAME aksas ‘d:\aksas’;

DATA aksas.lact_hry;

SET aksas.lactation;

WHERE state=‘Haryana’;

RUN;

PROC PRINT DATA=aksas.lact_hry;

TITLE "State, Lactation and Breed wise Milk Yield Report";

VAR breed lact my;

RUN;

Ex.3. This program subsets and prints the Panjab state’s cases on variables state, breed, lact and

my.

OPTIONS NODATE;

LIBNAME aksas ‘d:\aksas’;

DATA aksas.lact_pb;

SET aksas.lactation;

WHERE state = ‘Panjab’;

RUN;

PROC PRINT DATA=aksas.lact_pb;

TITLE "State, Lactation and Breed wise Milk Yield Report";

VAR state breed lact my;

RUN;

Ex.4. This subsets and prints the Panjab state’s cases on variables state, breed, lact and my

according to the criterion ‘my between 30 and 35’. This example demonstrates a different way to

perform the operation ‘my between 30 and 35’. SAS provides different tools to perform such

operations. Referring to the inbuilt Help is a good idea to learn new tools in SAS.

OPTIONS NODATE;

LIBNAME aksas ‘d:\aksas’;

DATA aksas.lact_pb;

SET aksas.lactation;

WHERE state = ‘Panjab’ and (my>=30 and my<=35);

RUN;

PROC PRINT DATA=aksas.lact_pb;

Page 28: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

TITLE "State, Lactation and Breed wise Milk Yield Report";

VAR state breed lact my;

RUN;

Ex.5. This is similar to the Ex.4; and subsets and prints the Panjab state’s cases on variables

state, breed, lact and my according to the criterion ‘my between 30 and 35’ using still another

option to perform the same operation ‘my between 30 and 35’.

OPTIONS NODATE;

LIBNAME aksas ‘d:\aksas’;

DATA aksas.lact_pb;

SET aksas.lactation;

WHERE state = ‘Panjab’ and (30<=my<=35);

RUN;

PROC PRINT DATA=aksas.lact_pb;

TITLE "State, Lactation and Breed wise Milk Yield Report";

VAR state breed lact my;

RUN;

Ex.6. This program is also similar to the Ex.4; and i) subsets and prints the Panjab state’s cases

on variables state, breed, lact and my according to the criterion ‘my between 30 and 35’; and ii)

subsets and prints the Haryana state’s cases on variables state, breed, lact and my according to

the criterion ‘my > 22’. This program demonstrates the use of ‘like’ operator with where

statement. The like operator selects cases by comparing the values of a character variable to a

specified pattern (called pattern matching). The like operator is case sensitive. There are two

special characters available for specifying a pattern:

a) Percent sign (%) – specifies that any number of characters can occupy that position. The

following where expressions select all cases having state with a data value that starts with

the letters ‘Pan’ and ‘H’, respectively. The data value contained in variable, state can be

of any length.

WHERE state like ‘Pan%’;

WHERE state like ‘H%’;

b) Underscore (_) – matches just one character in the value for each underscore character.

You can specify more than one consecutive underscore character in a pattern, and you

can specify a percent sign and an underscore in the same pattern.

WHERE state LIKE 'P_n%'; (Note: data may now contain mixed names like Panjab or

Punjab)

WHERE state LIKE ‘Har_a%’; (Note: data may now contain mixed names like Haryana

or Hariana)

Page 29: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

SAS Programs (Ex.6):

i)

OPTIONS NODATE;

LIBNAME aksas "d:\aksas";

DATA aksas.lact_pb;

SET aksas.lactation;

WHERE state LIKE "Pan%" and (my between 30 and 35);

RUN;

PROC PRINT DATA=aksas.lact_pb;

TITLE "State, Lactation and Breed wise Milk Yield Report";

VAR state breed lact my;

RUN;

Program Output (Ex.6-i):

State, Lactation and Breed wise Milk Yield Report

Obs STATE BREED LACT MY

1 Panjab Hariana 3 35

2 Panjab Tharparkar 1 30

3 Panjab Tharparkar 2 32

4 Panjab Tharparkar 4 30

5 Panjab Sahiwal 3 30

ii) OPTIONS NODATE;

LIBNAME aksas "d:\aksas";

DATA aksas.lact_hry;

SET aksas.lactation;

WHERE state LIKE "H%" and (my>22);

RUN;

PROC PRINT DATA=aksas.lact_hry;

TITLE "State, Lactation and Breed wise Milk Yield Report";

VAR state breed lact my;

RUN;

Program output (Ex.6-ii):

State, Lactation and Breed wise Milk Yield Report

Obs STATE BREED LACT MY

1 Haryana Tharparkar 1 23

2 Haryana Tharparkar 2 25

3 Haryana Sahiwal 2 24

4 Haryana Sahiwal 3 23

Page 30: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

5 Haryana Sahiwal 4 23

6 Haryana Red Sindhi 1 27

7 Haryana Red Sindhi 2 26

8 Haryana Red Sindhi 3 25

9 Haryana Red Sindhi 4 25

Ex.7. This program subsets and prints the cases for the state of Haryana. This example

demonstrates a different way to perform the operation through subsetting IF statement rather

than where statement. The IF statement causes the DATA step to continue processing only those

raw data records or those observations from a SAS data set that meet the condition of the

expression that is specified in the IF statement. That is, if the expression is TRUE for the case

(its value is neither 0 nor missing), SAS continues to execute statements in the DATA step and

includes the current case in the data set. The resulting SAS data set(s) contains a subset of the

original external file or SAS data set. If the expression is FALSE (its value is 0 or missing), no

further statements are processed for that case, the current case is not written to the data set, and

the remaining program statements in the DATA step are not executed. SAS immediately returns

to the beginning of the DATA step because the subsetting IF statement does not require

additional statements to stop processing cases. The subsetting IF statement is equivalent to this

IF-THEN statement, e.g.

IF (condition)

THEN SAS statement;

While creating SAS data sets, you can use the subsetting IF statement when it is easier to specify

a condition for including cases. When it is easier to specify a condition for excluding cases, use

the DELETE statement. See examples below for understanding the logic.

The subsetting IF and the where statements are not equivalent! The two statements work

differently and produce different output data sets in some cases. The most important differences

are summarised below:

The subsetting IF statement selects observations that have been read into the program

data vector. The where statement selects observations before they are brought into the

program data vector. The subsetting IF might be less efficient than the where statement

because it must read each observation from the input data set into the program data

vector.

The subsetting IF statement and where statement can produce different results in DATA

steps that interleave, merge, or update SAS data sets.

The subsetting IF statement can select cases from an existing SAS data set or from raw

data that are read with the INPUT statement. The where statement can select cases only

from existing SAS data sets.

The subsetting IF statement is executable; the where statement is not.

Page 31: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

SAS Program (Ex.7):

OPTIONS NODATE;

LIBNAME aksas ‘d:\aksas’;

DATA aksas.lact_h;

SET aksas.lactation;

IF (state = ‘Haryana’)

THEN DELETE;

PROC PRINT DATA=aksas.lact_h;

RUN;

Program Output (Ex.7):

State, Lactation and Breed wise Milk Yield Report

Obs STATE BREED LACT MY

1 Panjab Hariana 1 25

2 Panjab Hariana 2 36

3 Panjab Hariana 3 35

4 Panjab Hariana 4 37

5 Panjab Tharparkar 1 30

6 Panjab Tharparkar 2 32

7 Panjab Tharparkar 3 29

8 Panjab Tharparkar 4 30

9 Panjab Sahiwal 1 26

10 Panjab Sahiwal 2 29

11 Panjab Sahiwal 3 30

12 Panjab Sahiwal 4 38

13 Panjab Red Sindhi 1 29

14 Panjab Red Sindhi 2 25

15 Panjab Red Sindhi 3 27

16 Panjab Red Sindhi 4 28

Ex.8. This program is similar to the previous example.

OPTIONS NODATE;

LIBNAME aksas ‘d:\aksas’;

DATA aksas.lact_p;

SET aksas.lactation;

IF NOT (state LIKE ‘H%’)

THEN DELETE;

PROC PRINT DATA=aksas.lact_p;

RUN;

Page 32: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Program Output (Ex.8):

State, Lactation and Breed wise Milk Yield Report

Obs STATE BREED LACT MY

1 Panjab Hariana 1 25

2 Panjab Hariana 2 36

3 Panjab Hariana 3 35

4 Panjab Hariana 4 37

5 Panjab Tharparkar 1 30

6 Panjab Tharparkar 2 32

7 Panjab Tharparkar 3 29

8 Panjab Tharparkar 4 30

9 Panjab Sahiwal 1 26

10 Panjab Sahiwal 2 29

11 Panjab Sahiwal 3 30

12 Panjab Sahiwal 4 38

13 Panjab Red Sindhi 1 29

14 Panjab Red Sindhi 2 25

15 Panjab Red Sindhi 3 27

16 Panjab Red Sindhi 4 28

Ex.9. This program is also similar to previous examples.

LIBNAME aksas ‘d:\aksas’;

data aksas.lact_ph;

set aksas.lactation;

if not (my>=25)then

delete;

proc print data=aksas.lact_ph;

run;

Program Output (Ex.9):

State, Lactation and Breed wise Milk Yield Report

Obs STATE BREED LACT MY

1 Haryana Tharparkar 2 25

2 Haryana Red Sindhi 1 27

3 Haryana Red Sindhi 2 26

4 Haryana Red Sindhi 3 25

5 Haryana Red Sindhi 4 25

6 Panjab Hariana 1 25

7 Panjab Hariana 2 36

8 Panjab Hariana 3 35

9 Panjab Hariana 4 37

Page 33: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

10 Panjab Tharparkar 1 30

11 Panjab Tharparkar 2 32

12 Panjab Tharparkar 3 29

13 Panjab Tharparkar 4 30

14 Panjab Sahiwal 1 26

15 Panjab Sahiwal 2 29

16 Panjab Sahiwal 3 30

17 Panjab Sahiwal 4 38

18 Panjab Red Sindhi 1 29

19 Panjab Red Sindhi 2 25

20 Panjab Red Sindhi 3 27

21 Panjab Red Sindhi 4 28

KEEP and DROP statements – dropping and keeping variables

SAS will automatically write to any data set created by a DATA step all variables in the input

data set(s) along with all new variables created in the DATA step. If you don’t want all these

variables, you can indicate this to SAS in one of several ways. KEEP and DROP statements are

used often to control the number of variables read into and output into the data sets. You can

create several variables during the data processing, but need to save only selected variables in the

final data set. If you want to restrict the number of columns in output (or target) data set, use the

following method. This will ensure that output data set is created with required variables only.

DATA aksas.lacpb (keep = state my);

SET aksas.lactation;

RUN;

Alternately, you can specify the first statement as follows:

DATA aksas.lacpb;

SET aksas.lactation;

KEEP state my;

RUN;

If you are reading a big data set into SAS and wish to extract only a few variables from it, use the

following statements in the program.

DATA aksas.lacpb;

SET base_data (keep = state breed my);

WHERE state = ‘Panjab’;

RUN;

In the first case, SAS reads the entire data set lactation, even though you only intend to use a few

variables. In the second case, SAS reads from storage only the three variables you intend to keep.

Page 34: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Note that we have to use such efficient methods to restrict the data read into the system so as to

optimise the system resources such as SAS Libraries and shared drives.

The same way DROP statement can also be specified based on the data requirement of the user.

Now, try following simple programs.

Ex.10: The following programs demonstrate KEEP/DROP statements to subset data from master

(base) data set into target data set.

i)

LIBNAME aksas ‘d:\aksas’;

DATA aksas.lactpb;

SET aksas.lactation;

KEEP breed lact my;

WHERE state = ‘Panjab’;

RUN;

PROC PRINT DATA=aksas.lactpb;

Run;

ii)

LIBNAME aksas ‘d:\aksas’;

DATA aksas.lactpb;

SET aksas.lactation;

DROP state;

WHERE state = ‘Panjab’;

RUN;

PROC PRINT DATA=aksas.lactpb;

VAR breed lact my;

RUN;

Page 35: SAS PROGRAMMING ESSENTIALS FOR STATISTICAL …sscnars.icar.gov.in/dairysci/8.pdf · Base SAS is the core foundation for variety of data management and analytic software components

SAS Programming Essentials for Statistical Computing in Dairy Research

Suggested Readings & WWW Resources

[1] An Introduction to SAS Data Steps. Social Science Computing Cooperative, University

of Wisconsin-Madison. Accessed May 23, 2013. Available online at:

http://www.ssc.wisc.edu/sscc/pubs/4-18.htm.

[2] SAS Institute Inc., SAS/IML User’s Guide, Version 8, Cary, NC: SAS Institute Inc.,

1999. pp. 846. Accessed May 23, 2013. Available online at:

http://ciser.cornell.edu/sasdoc/saspdf/iml/chap6.pdf.

[3] Gupta S, 2005. WHERE vs. IF Statements: Knowing the Difference in How and When to

Apply. Paper: 238-31. http://www2.sas.com/proceedings/sugi31/238-31.pdf.

[4] SAS Operators.

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a00

0 780367.htm.

[5] Statistical Computing with SAS. The Institute for Digital Research and Education (IDRE)

Research Technology Group. The University of California, Los Angeles (UCLA).

Accessed: May 24, 2013. http://www.ats.ucla.edu/stats and https://idre.ucla.edu/stats/sas.

[6] Kurian, J. S for SAS: A Beginners’ Guide. Accessed May 24, 2013. Available online at:

http://www.puzha.com/sasbook/s4sas.pdf.

[7] Reference Manual - Data Analysis using SAS. Accessed June 18, 2013. Available online

at: http://www.iasri.res.in/sscnars/contentmain.htm.