Chapter 2: Referencing Files and Setting Options


SAS Libraries

Every SAS file is stored in a SAS library. A SAS data set is one type of SAS file.

In some operating environments, a library is a physical collection of files. In others, such as Windows and UNIX, a library is a logical name for a group of files stored in the same physical location (a folder or directory).

A library can be temporary or permanent.

A SAS library must be assigned before a SAS program can reach the directory to read or write a SAS data set. The SAS program only needs to know the library reference name (libref).

[Diagram: a library name (libref) points, through a path, to the physical location on the hard drive.]


Reference a SAS file in a SAS Library

A SAS file in a library is referenced by a two-level name:

libref.filename

libref is the SAS library name that is connected to a physical directory in a storage location on your computer.

filename is a file stored in the directory that the libref refers to.


Two types of SAS Library

(A) Temporary SAS library, for hosting temporary SAS data sets:

The libref is always WORK, which is already available in the Libraries folder in the Explorer panel of the SAS working environment.

Example: WORK.admit is a temporary SAS data set. WORK is the libref and the data set name is admit.

NOTE: All SAS data sets stored in the WORK library are deleted when the SAS session ends.

NOTE: You can omit WORK and refer to the data set simply as admit if it is stored in the temporary WORK library.

For example, in the DATA step:

DATA admit2; is the same as DATA work.admit2;
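
The one-level and two-level names described above can be tried with a short sketch like the following. The data set name admit2 comes from the slide; the variable x and its value are made up for illustration, and the sketch assumes no USER library has been assigned in the session.

```sas
/* One-level name: SAS stores the data set in the WORK library */
data admit2;
   x = 1;              /* hypothetical variable, for illustration only */
run;

/* Two-level name: refers to the same temporary data set */
proc print data=work.admit2;
run;
```

Both steps refer to the same file, which is deleted when the SAS session ends.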


(B) Permanent SAS library:

Data sets in a permanent SAS library remain available after the SAS session ends, because the files are physically stored on the hard drive in the location you define. The libref is defined by the user.

For example, Mylib.admit refers to the SAS data set admit, which is stored in the library named Mylib.

Mylib is a user-defined SAS library. admit is a file stored in the corresponding physical location on the hard drive.


How to assign a SAS Library

If you want to use the WORK library to store your file, there is no need to define it. SAS creates the WORK library automatically when you start a session.

If you want to create your own library, there are two ways:

(1) Use the pull-down menu, as described in the SAS Window Environment document.


(2) Use a LIBNAME statement to define a SAS library:

LIBNAME libref 'path to the physical folder on the hard drive';

NOTE: libref is a logical name for the entire folder on the hard drive. The folder can hold many data sets. Each data set in the folder is referred to as:

libref.datasetname

Example: You store the data sets admit, budget, and tuition in the folder 'university' on the C drive. You define a SAS library COST linked to these files with:

LIBNAME cost 'C:\university';

The data sets are then named in your SAS program as: cost.admit, cost.budget, cost.tuition

NOTE: The names can be in upper or lower case.


Rules required for a Valid SAS Library name

A valid libref:

• is limited to 8 characters
• must start with a letter or underscore
• can contain only letters, numbers, or underscores

Examples: s575, _s575, and s575_ are valid librefs. S-575 and sta575_online are not valid.
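
The three rules above can be approximated in a DATA step, as in the rough sketch below. It is not an official libref validator: it assumes that the NVALID function with the 'V7' rule (letters, digits, and underscores, starting with a letter or underscore) combined with a length check of 8 matches the rules on this slide. The data set name libref_check, the variable names, and the candidate my-lib are hypothetical.

```sas
/* Flag candidate names that satisfy the three libref rules */
data libref_check;
   length name $ 32;
   input name $;
   /* NVALID checks the character rules; LENGTH checks the 8-character limit */
   valid = (nvalid(strip(name), 'v7') = 1) and (length(name) <= 8);
   datalines;
s575
_s575
s575_
my-lib
sta575_online
;
run;

proc print data=libref_check;
run;
```

The variable valid is 1 for names that pass both checks and 0 otherwise.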


How Long Libref remains in effect

The LIBNAME statement is a global statement. A global statement remains in effect until you modify it, cancel it, or end your SAS session.

Although we call the library permanent, it is the data sets in the library (in physical storage) that are permanent, not the libref. You still need to assign a libref to each permanent library in every SAS session before you can access its data sets.

NOTE: If you use the pull-down menu to create your permanent library and check 'Enable at startup', the libref will be available whenever you start SAS, without a LIBNAME statement.
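
As a small sketch of assigning, inspecting, and cancelling a libref within one session: the libref demo is hypothetical, and to keep the example self-contained it points at the folder behind the WORK library rather than at a real course folder.

```sas
/* Assumption: reuse the WORK directory as a stand-in for a real folder */
libname demo "%sysfunc(pathname(work))";

/* Write the attributes of the libref (engine, physical path) to the log */
libname demo list;

/* Cancel the libref; the files in the folder are not deleted */
libname demo clear;
```

After the CLEAR statement, demo is no longer available until another LIBNAME statement assigns it again.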


Referencing files in other formats

You can use the LIBNAME statement to reference not only SAS files, but also files created by other software products, such as database management systems.

SAS uses an engine designed to connect to each specific software product.

Files from non-SAS software -> Engine -> SAS data library

LIBNAME libref engine 'path to the physical location';

Some available engines are BMDP, SPSS, and OSIRIS. They allow read-only access to BMDP, SPSS, and OSIRIS files. See the Help documentation for more details, if needed.


Where to find the library, its contents, and the contents of each data set

Once the library is created, it appears in the folder called 'Libraries' in the left panel (Explorer panel) of the SAS working interface.

To see the contents of a SAS data set, click on the data set to open it in the VIEWTABLE window. Close the VIEWTABLE window afterwards.

You can also use SAS statements to view the contents of a SAS library and the detailed descriptor information of any SAS data set.


Exercise

Write a SAS program to read the following SAS data set located on the class website: Pilots.sas7bdat

This data set consists of pilots employed at an airline. The variables are:

Variable    Type   Length   Description
ID          char   4        ID number
LastName    char   10       last name
FirstName   char   9        first name
City        char   12       city
State       char   2        state
Gender      char   1        gender
JobCode     char   3        job code
Salary      num    8        current salary
Birth       num    8        birth date
Hired       num    8        date hired
HomePhone   char   12       home phone number

In this program, you will do the following tasks:
(1) Create a SAS library, mylib, that connects to the folder in which the Pilots data set is stored.
(2) Read the SAS data set Pilots.
(3) Create a new SAS data set called Pilotsnew, and store it in another SAS library called mylib1 that connects to the folder DataEx inside the Math707 folder.
(4) Print the data.

Save the SAS program as C2_readSASData on your C drive in a new folder, SASEx, inside Math707.


Answer to Exercise

libname mylib 'c:\math707\sasdata';
libname mylib1 'c:\math707\dataex';

data mylib1.pilotsnew;
   set mylib.pilots;
run;

proc print data=mylib1.pilotsnew;
run;


View contents of entire Library and/or Data descriptor of a data set

In practice, a SAS library often consists of many data sets shared by different users. Therefore, it is good practice to find out what the library contains.

SAS has two procedures that display the contents of a library as well as of each SAS data set:

PROC CONTENTS <options>;
RUN;

PROC DATASETS <options>;
   CONTENTS <options>;
QUIT;


View the contents of the entire library without data descriptors

/* To display all SAS data sets in the mylib library */
proc contents data=mylib._all_ nods;
run;

/* Or use the following procedure */
proc datasets;
   contents data=mylib._all_ nods;
quit;

NOTE: _ALL_ is a SAS keyword used in place of a file name; it refers to all files in the mylib library.

NOTE: NODS is a keyword meaning no data descriptor details: the descriptor of each individual file is not printed.

NOTE: Text inside /* */ is a comment.
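
To try _ALL_ and NODS without access to the course folder, a sketch like the following can be run against the WORK library. The data set names a and b and their variables are hypothetical.

```sas
/* Create two tiny data sets so the library has something to list */
data work.a;
   x = 1;
run;

data work.b;
   y = 2;
run;

/* List the members of WORK only, without each file's descriptor */
proc contents data=work._all_ nods;
run;
```

Removing NODS from the last step would print the full descriptor of every data set in the library in addition to the directory listing.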


View detailed data descriptor information of a data set

/* View the data descriptor information for the SAS data set admit */
PROC CONTENTS data=mylib.admit;
run;

/* One can also use the following procedure */
PROC DATASETS;
   CONTENTS data=mylib.admit;
QUIT;

NOTE: The variables are listed in alphabetical order by default.


View detailed data descriptor information of a data set, with the variables in table column order

You can list the variables in the order in which they were created in the SAS data set by using the VARNUM option:

PROC CONTENTS data=mylib.admit varnum;
run;

Or:

PROC DATASETS;
   CONTENTS DATA=mylib.admit VARNUM;
QUIT;
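
The difference between the default listing and VARNUM can be seen on a small stand-in data set, sketched below. The data set work.demo and its variables are hypothetical; they are created in the order zip, name, age so that creation order differs from alphabetical order.

```sas
/* Variables are created in the order: zip, name, age */
data work.demo;
   length zip $ 5 name $ 10;
   zip  = '48859';
   name = 'Lee';
   age  = 30;
run;

/* Default: variables listed alphabetically (age, name, zip) */
proc contents data=work.demo;
run;

/* VARNUM: variables listed in creation order (zip, name, age) */
proc contents data=work.demo varnum;
run;
```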


Exercise

Open the SAS program C2_readSASData, and use PROC CONTENTS as well as PROC DATASETS to:
(1) View only the list of SAS data sets in the mylib library.
(2) View the detailed data descriptor for the SAS data set pilots in mylib.
(3) View the detailed data descriptor for the SAS data set pilots with the variables in table column order.
(4) Save the SAS program as C2_Contents in your SASEx folder.


Answer to Exercise

libname mylib 'c:\math707\sasdata';

/* Use PROC CONTENTS to display all SAS data sets in mylib */
proc contents data=mylib._all_ nods;
run;

/* Use PROC DATASETS to display all SAS data sets in mylib */
proc datasets;
   contents data=mylib._all_ nods;
quit;

/* Display details of the SAS data set pilots, with variables in alphabetical order */
proc contents data=mylib.pilots;
run;

proc datasets;
   contents data=mylib.pilots;
quit;

/* Display details of the SAS data set pilots, with variables in table column order */
proc contents data=mylib.pilots varnum;
run;

proc datasets;
   contents data=mylib.pilots varnum;
quit;


Setting SAS System Options

SAS system options can be set in two ways: from the pull-down menu (Tools, Options, System), or with an OPTIONS statement in your program.

NOTE: You can set system options for the SAS Listing output that control:

• Line size, page size, the page number, whether the date and time are displayed, and many others. These options do not affect the HTML output format.


Setting System Options

The general syntax: OPTIONS options;

Some useful options are:

DATE | NODATE: print the date and time or not. The default is DATE.
NUMBER | NONUMBER: print page numbers or not. The default is NUMBER, and page numbers accumulate until they are reset.
PAGENO=n: by default, page numbers accumulate. Use PAGENO=n to reset the starting page number. For example, PAGENO=3 restarts the numbering at page 3 and continues counting from that point on.
PAGESIZE=n | MAX: the number of lines per page.
LINESIZE=n | MAX: the width of an output line. If an observation needs more than one line, it continues on the next line.

NOTE: The OPTIONS statement is a global statement. It can appear anywhere in your program and changes the settings from that point on.

NOTE: It is good practice to place OPTIONS statements outside the DATA and PROC steps.
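
A short sketch of changing options and then checking what is in effect. The values 80 and 50 are only examples, and the macro variables old_ls and old_ps are hypothetical names used to restore the previous settings; GETOPTION returns the current value of a system option.

```sas
/* Remember the current settings so they can be restored later */
%let old_ls = %sysfunc(getoption(linesize));
%let old_ps = %sysfunc(getoption(pagesize));

/* Change several options at once; they apply from this point on */
options nodate pageno=1 linesize=80 pagesize=50;

/* Write the values now in effect to the log */
%put LINESIZE is %sysfunc(getoption(linesize));
%put PAGESIZE is %sysfunc(getoption(pagesize));

/* Restore the earlier settings */
options date linesize=&old_ls pagesize=&old_ps;
```

Because OPTIONS is global, any Listing output produced between the two OPTIONS statements uses the temporary settings.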


Exercise

Open the C2_Contents program, and practice the following SAS system options using the OPTIONS statement.

Delete all PROC DATASETS procedures.

Add an OPTIONS statement at the end of this program with the following options:

• Change the option to NODATE
• Set PAGENO to start at 1 for the output
• Set PAGESIZE to 50
• Set LINESIZE to 80

Use PROC CONTENTS to see the descriptor of the admit data set in mylib.
Use PROC PRINT to print the admit data set.

Check the results to see the effects of these options.

Add another OPTIONS statement that changes the options back to DATE, PAGESIZE=MAX, and LINESIZE=MAX; then use PROC PRINT to print the PILOTS data set in mylib. Check the results to see the effect of the options.

Save the program as C2_SYSOptions in your SASEx folder.


Answer to Exercise

Libname mylib 'c:\math707\sasdata';

Proc contents data = mylib._all_ nods; Run;

Proc contents data = mylib.admit; run;

Options nodate pageno=1 pagesize=50 linesize=80;

Proc print data = mylib.admit; run;

Options date pagesize=max linesize=max;

Proc print data = mylib.pilots; run;


Handling two-digit years using System OPTIONS statement

Many data sets store years with two digits, such as 94 for 1994 or 10 for 1910. Today there is little doubt that 94 means 1994, but 10 could mean 1910 or 2010. This is the Year 2000 compliance problem.

SAS uses OPTIONS YEARCUTOFF=year; to handle this issue. It specifies the first year of the 100-year span used to interpret two-digit years.

The default assumed here is YEARCUTOFF=1920, which interprets two-digit years within the 100-year span from 1920 to 2019. (The default value differs between SAS releases.)

OPTIONS YEARCUTOFF=1940; interprets two-digit years within the 100-year span from 1940 to 2039.


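How does YEARCUTOFF work?

OPTIONS YEARCUTOFF=1940;
Interprets two-digit years within the 100-year span from 1940 to 2039.

Date in the data set    Interpreted as
8/26/15                 8/26/2015
12/25/65                12/25/1965
5/7/90                  5/7/1990
8/30/48                 8/30/1948

OPTIONS YEARCUTOFF=1960;
Interprets two-digit years within the 100-year span from 1960 to 2059.

Date in the data set    Interpreted as
8/26/15                 8/26/2015
12/25/65                12/25/1965
5/7/90                  5/7/1990
8/30/48                 8/30/2048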
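
The effect of YEARCUTOFF can be checked by reading the four dates from this slide under each setting, as in the sketch below. The data set names cut1940 and cut1960 and the variable name d are hypothetical. The option is applied when the two-digit year is read, so the DATA step has to be rerun after the option changes.

```sas
/* Read the dates with the 100-year span 1940 to 2039 */
options yearcutoff=1940;

data cut1940;
   input d : mmddyy8.;
   format d mmddyy10.;   /* display the year with four digits */
   datalines;
8/26/15
12/25/65
5/7/90
8/30/48
;
run;

/* Read the same dates with the 100-year span 1960 to 2059 */
options yearcutoff=1960;

data cut1960;
   input d : mmddyy8.;
   format d mmddyy10.;
   datalines;
8/26/15
12/25/65
5/7/90
8/30/48
;
run;

proc print data=cut1940;
run;

proc print data=cut1960;
run;
```

Only the last date differs between the two listings: the year 48 falls inside 1940 to 2039 as 1948, but inside 1960 to 2059 as 2048.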


Specifying observations of SAS data set to be processed using OPTIONS statement

In many applications, the number of observations (cases) is very large. It is important to confirm that a SAS program is correct before processing the entire data set. To test whether the program processes the data correctly, you can specify that only a small part of the data be processed.

This can be done with the OPTIONS statement:

OPTIONS FIRSTOBS=n1 OBS=n2;

FIRSTOBS=n1 starts reading the data at the n1-th observation.
OBS=n2 stops reading the data at the n2-th observation.

Example: OPTIONS FIRSTOBS=5 OBS=15; reads from the 5th observation through the 15th observation.

The defaults are FIRSTOBS=1 and OBS=MAX.

To go back to reading the entire data set, use OPTIONS FIRSTOBS=1 OBS=MAX;
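
A self-contained sketch of the global FIRSTOBS= and OBS= options, using a generated data set instead of the course data. The data set work.numbers and its variable id are hypothetical; id simply numbers the 20 observations so the printed range is easy to check.

```sas
/* Make sure the full range is in effect while building the test data */
options firstobs=1 obs=max;

data work.numbers;
   do id = 1 to 20;
      output;
   end;
run;

/* From here on, only observations 5 through 15 are read */
options firstobs=5 obs=15;

proc print data=work.numbers;   /* prints the rows with id 5 to 15 */
run;

/* Reset so later steps read entire data sets again */
options firstobs=1 obs=max;
```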


Global Statement vs. Local Statement

SAS defines some statements as global statements, such as the LIBNAME statement and the OPTIONS statement. They take effect as soon as they are submitted and remain in effect until a later statement of the same kind overrides them or the SAS session ends.

Most SAS statements are local, meaning they take effect only in the step where they appear. If the same setting is defined both globally and locally, the local setting overrides the global one for that step only; the global setting applies again afterwards.
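
The override-and-return behavior described above can be observed with a sketch like this one. It uses FIRSTOBS= and OBS= both as global system options and as local data set options; the data set work.numbers and its variable id are hypothetical.

```sas
/* Build 25 numbered observations with the full range in effect */
options firstobs=1 obs=max;

data work.numbers;
   do id = 1 to 25;
      output;
   end;
run;

/* Global setting: observations 10 through 18 */
options firstobs=10 obs=18;

/* Local options override the global ones for this step only: id 12 to 16 */
proc print data=work.numbers (firstobs=12 obs=16);
run;

/* No local options, so the global setting applies again: id 10 to 18 */
proc print data=work.numbers;
run;

/* Reset the global options */
options firstobs=1 obs=max;
```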


Exercise

Write a program to:
(1) Read and print the SAS data set Admit using the following options: PAGENO=1, FIRSTOBS=5, and OBS=15.
(2) Add another OPTIONS statement to the program with the options FIRSTOBS=3 and OBS=8, and print the data set Admit again. Observe the output and make sure you understand why you get it.
(3) Reset the options with PAGENO=1, FIRSTOBS=1, and OBS=MAX, then print the Admit data.
(4) Save the program as C2_sysoptions2 in the SASEx folder.


Answer

Libname mylib 'c:\math707\sasdata';

Options pageno=1 firstobs=5 obs=15;

Data admitn; set mylib.admit;

Proc print data=admitn; run;

Proc print data = mylib.admit; run;

Options firstobs=3 obs=8;

Proc print data = admitn; run;

Proc print data = mylib.admit; run;

Options firstobs=1 obs=max pageno=1;

Proc print data=admitn; run;

Proc print data = mylib.admit; run;


FIRSTOBS= and OBS= as local options in the PROC PRINT procedure

PROC PRINT is the most common procedure for printing data.

The general syntax is:

PROC PRINT <options>;
RUN;

The following example uses local options in PROC PRINT to specify observations:

PROC PRINT data=mylib.admit (FIRSTOBS=5 OBS=15);
RUN;

This prints the 5th through the 15th observations.


More on Local Options Vs. Global Options in PROC PRINT

OPTIONS FIRSTOBS=10 OBS=18;

/* Uses the global options, since there is no local option */
proc print data=mylib.admit;
   title 'prints cases 10 to 18';
run;

/* Uses the local option FIRSTOBS=15 and the global option OBS=18 */
PROC PRINT data=mylib.admit (firstobs=15);
   title 'prints cases 15 to 18';
run;

/* Uses the local options FIRSTOBS=12 and OBS=16, since local options override global options for that procedure */
PROC PRINT data=mylib.admit (firstobs=12 OBS=16);
   title 'prints cases 12 to 16';
run;

/* Uses the local options FIRSTOBS=5 and OBS=20, since local options override global options for that procedure */
PROC PRINT data=mylib.admit (firstobs=5 obs=20);
   title 'prints cases 5 to 20';
run;


More System Options

See the SAS Help documentation, and the textbook for a few additional options.


Exercise

Write a SAS program to do the following:
(1) Create the library mylib to connect to the SASData folder as usual.
(2) Use the options PAGENO=1, FIRSTOBS=5, and OBS=15.
(3) Print the data set admit in mylib.
(4) Print the data set admit using the local options (firstobs=3 obs=12) in the PROC PRINT statement.
(5) Add a system OPTIONS statement with FIRSTOBS=1 and OBS=15.
(6) Print the data set admit using the local options (firstobs=10 obs=20) in the PROC PRINT statement.
(7) Add a system OPTIONS statement with FIRSTOBS=1 and OBS=MAX.
(8) Print the data set admit using the local options (firstobs=3 obs=12) in the PROC PRINT statement.

Save the program as c2_glob_loc_options in the SASEx folder.


Answer

Libname mylib 'c:\math707\sasdata';

Options pageno=1 firstobs=5 obs=15;

Proc print data = mylib.admit; run;

Proc print data = mylib.admit (firstobs=3 obs=12); run;

Options firstobs=1 obs=15;

Proc print data = mylib.admit (firstobs=10 obs=20); run;

Options firstobs=1 obs=max;

Proc print data = mylib.admit (firstobs=3 obs =12); run;