
YET ANOTHER TIPS, TRICKS, TRAPS, TECHNIQUES PRESENTATION:

A Random Selection of What I Learned From 15+ Years of SAS Programming

John Pirnat, Kaiser Permanente

Oct 15, 2009


ARE THESE EQUIVALENT?

#1

data allschools1;
  set school1 school2;
  keep id lname fname;
run;

#2

data allschools2 (keep = id lname fname);
  set school1 school2;
run;

#3

data allschools3;
  set school1 (keep = id lname fname)
      school2 (keep = id lname fname);
run;


EXAMPLE OF WHY HOW YOU WRITE THE CODE MIGHT MATTER:

/* Fixed columns: id 1-8, region 9, lname 10-34, fname 35-44, pass flags 45-47 */
data school1;
  input id 8. region $1. lname $25. fname $10. pass_math $1. pass_eng $1. pass_span $1.;
  datalines;
72186   4Mouse                    Minnie    YYN
51682   8Duck                     Donald    NNN
368882  2Hall                     Annie     YYY
45111   2Sawyer                   Tom       NYN
;
run;

/* Fixed columns: id 1-8, region 9-16, lname 17-41, fname 42-51, pass flags 52-54 */
data school2;
  input id 8. region 8. lname $25. fname $10. pass_math $1. pass_eng $1. pass_span $1.;
  datalines;
46631   3       Lane                     Lois      NYN
866322  7       Mouse                    Mickey    YYN
63358   6       Kent                     Clark     NYY
42643   1       Bunker                   Edith     YYY
;
run;


CODE #1

99 data allschools1;

100 set school1 school2;

ERROR: Variable region has been defined as both character and numeric.

101 keep id lname fname;

102 run;

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.ALLSCHOOLS1 may be incomplete. When this step was stopped there were 0 observations and 3 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.00 seconds


CODE #2

105 data allschools2

106 (keep = id lname fname);

107 set school1 school2;

ERROR: Variable region has been defined as both character and numeric.

108 run;

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.ALLSCHOOLS2 may be incomplete. When this step was stopped there were 0 observations and 3 variables.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds


CODE #3

110 data allschools3;

111 set school1 (keep = id lname fname)

112 school2 (keep = id lname fname);

113 run;

NOTE: There were 4 observations read from the data set WORK.SCHOOL1.

NOTE: There were 4 observations read from the data set WORK.SCHOOL2.

NOTE: The data set WORK.ALLSCHOOLS3 has 8 observations and 3 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.00 seconds


IN THE CODE #3 EXAMPLE, ONLY THE COLUMNS NAMED IN THE KEEP= DATA SET OPTION ARE LOADED INTO THE PROGRAM DATA VECTOR (PDV), SO REGION IS NEVER READ AND ITS TYPE CONFLICT NEVER ARISES

IN THE CODE #1 AND #2 EXAMPLES, ALL COLUMNS OF THE TWO SCHOOL DATASETS ARE READ INTO THE PDV, BUT ONLY THE COLUMNS IN THE KEEP STATEMENT (#1) OR KEEP= OPTION (#2) ARE WRITTEN TO THE OUTPUT DATASET
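One way to spot this kind of type conflict before concatenating is to compare the column types in DICTIONARY.COLUMNS. The following is only a sketch: the datasets demo1 and demo2 are hypothetical stand-ins for school1 and school2, built here so the example runs on its own.

```sas
/* Hypothetical stand-ins: region is character in one, numeric in the other */
data demo1;
  id = 1;
  region = '4';
run;

data demo2;
  id = 2;
  region = 3;
run;

/* List variables that have more than one type across the two datasets */
proc sql;
  select upcase(name) as variable,
         count(distinct type) as n_types
    from dictionary.columns
   where libname = 'WORK'
     and memname in ('DEMO1', 'DEMO2')
   group by upcase(name)
  having count(distinct type) > 1;
quit;
```

Any variable listed would trigger the "defined as both character and numeric" error if it reached the PDV, so it is a candidate for exclusion with KEEP= or DROP= on the SET statement.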


DON’T “FREQ” OUT

proc freq data = school1;
  tables pass_math pass_eng pass_span / out = grade_results;
run;

proc print data = grade_results;
run;

The SAS System

Obs  pass_span  COUNT  PERCENT
1    N          3      75
2    Y          1      25

OUT= SAVES ONLY THE LAST TABLE REQUESTED (PASS_SPAN)


USE ODS INSTEAD

proc freq data = school1;
  tables pass_math pass_eng pass_span;
  ods output OneWayFreqs = grade_results;
run;

ODS TABLE NAME: OneWayFreqs

SAS DATASET NAME: grade_results

proc print data = grade_results;
  var pass_eng pass_math pass_span frequency percent;
run;

The SAS System

Obs  pass_eng  pass_math  pass_span  Frequency  Percent
1              N                     2          50.00
2              Y                     2          50.00
3    N                               1          25.00
4    Y                               3          75.00
5                         N          3          75.00
6                         Y          1          25.00
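ODS OUTPUT needs the ODS table name, which is not always obvious. A minimal sketch of one way to find it is ODS TRACE, which writes the name of each output object to the log. The dataset sashelp.class is an assumption here (it ships with most SAS installations) and simply stands in for school1.

```sas
/* Turn tracing on, run the procedure, then turn tracing off */
ods trace on;

proc freq data = sashelp.class;
  tables sex;
run;

ods trace off;
```

The log entry for this step should include a line reading "Name: OneWayFreqs", which is the name to use on the ODS OUTPUT statement.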


YOU CAN BE TOO CLEVER IN YOUR CODE

/* Fixed columns: schoolid 1-8, lname 9-33, fname 34-43, pass flags 44-46 */
data school1;
  input schoolid 8. lname $25. fname $10. pass_math $1. pass_eng $1. pass_span $1.;
  datalines;
72186   Mouse                    Minnie    YYN
51682   Duck                     Donald    NNN
368882  Hall                     Annie     YYY
45111   Sawyer                   Tom       NYN
;
run;

/* Same layout, but the first column is named id instead of schoolid */
data school2;
  input id 8. lname $25. fname $10. pass_math $1. pass_eng $1. pass_span $1.;
  datalines;
46631   Lane                     Lois      NYN
866322  Mouse                    Mickey    YYN
63358   Kent                     Clark     NYY
42643   Bunker                   Edith     YYY
;
run;


data allschools;
  set school1 school2;
  if id = . then id = schoolid;
run;

proc print data = allschools noobs;
  var lname fname id schoolid;
run;

lname   fname   id      schoolid
Mouse   Minnie  72186   72186
Duck    Donald  72186   51682
Hall    Annie   72186   368882
Sawyer  Tom     72186   45111
Lane    Lois    46631   .
Mouse   Mickey  866322  .
Kent    Clark   63358   .
Bunker  Edith   42643   .

HUH?


“When variables are read with a SET, MERGE, or UPDATE statement, the SAS System sets the value to missing only before the first iteration of the DATA step…Thereafter, the variables retain their values until new values become available …”

- SAS® Language Reference, Version 6, First Edition


ONE WORKAROUND:

data allschools;
  set school1 (rename = (schoolid = id))
      school2;
run;

proc print data = allschools;
  var lname fname id;
run;

Obs lname fname id

1 Mouse Minnie 72186

2 Duck Donald 51682

3 Hall Annie 368882

4 Sawyer Tom 45111

5 Lane Lois 46631

6 Mouse Mickey 866322

7 Kent Clark 63358

8 Bunker Edith 42643
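Another possible workaround, sketched here under assumptions, is the IN= data set option: it flags which input dataset the current observation came from, so id can be assigned on every school1 row instead of only when it is missing. The datasets s1 and s2 and the flag from_s1 are hypothetical, cut-down versions of the school data.

```sas
/* Hypothetical cut-down versions of school1 and school2 */
data s1;
  input schoolid lname $;
  datalines;
72186 Mouse
51682 Duck
;
run;

data s2;
  input id lname $;
  datalines;
46631 Lane
866322 Mouse
;
run;

data both;
  set s1 (in = from_s1) s2;
  /* Assign on every s1 row, so the retained id value is always overwritten */
  if from_s1 then id = schoolid;
run;

proc print data = both noobs;
  var lname id;
run;
```

Because the assignment no longer depends on id being missing, the retained value from the previous iteration cannot leak into the next row.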


QUICK HITTERS:

1. TS486 – Quick Reference Guide to SAS Functions Informats Formats

Updated at:

http://www.sascommunity.org/wiki/TS_486_Functions,_Informats,_and_Formats

2. =: (THE COLON MODIFIER COMPARES ONLY AS MANY LEADING CHARACTERS AS THE SHORTER VALUE CONTAINS)

/* Fixed columns: id 1-8, region 9, lname 10-34, fname 35-44, pass flags 45-47 */
data school1;
  input id 8. region $1. lname $25. fname $10. pass_math $1. pass_eng $1. pass_span $1.;
  datalines;
72186   4Mouse                    Minnie    YYN
51682   8Duck                     Donald    NNN
368882  2Hall                     Annie     YYY
45111   2Sawyer                   Tom       NYN
;
run;


data test1;
  set school1;
  if lname =: 'M';
run;

proc print data = test1 noobs;
  var lname;
run;

lname
Mouse

data test2;
  set school1;
  if lname >=: 'M';
run;

proc print data = test2 noobs;
  var lname;
run;

lname
Mouse
Sawyer


data test3;
  set school1;
  if lname <=: 'M';
run;

proc print data = test3 noobs;
  var lname;
run;

lname
Mouse
Duck
Hall
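The colon modifier can also be combined with the IN operator to test several prefixes at once. This is a small sketch; the dataset names and the prefix list are made up for illustration.

```sas
/* Hypothetical list of last names */
data names;
  input lname :$25.;
  datalines;
Mouse
Duck
Hall
Sawyer
;
run;

data m_or_s;
  set names;
  /* Keep names whose first letter is M or S */
  if lname in: ('M', 'S');
run;

proc print data = m_or_s noobs;
  var lname;
run;
```

With these four names, the subset should contain Mouse and Sawyer.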


OFFBEAT MACRO USES

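%macro skip;

Lotsa comments /* */ in code

%mend;

Use when debugging a program and you don’t want to run a stretch of heavily commented code. Wrapping it in one big /* */ won’t work, because comments don’t nest; a macro that is never called does.

%let abc = Big chunk of code that you will repeat often;

(Wrap the value in %str( ) if the code contains semicolons.)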
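Both tricks can be sketched as follows. The macro variable names keep_vars and sort_step are hypothetical, and sashelp.class is assumed to be available as sample data.

```sas
/* Trick 1: code inside a macro that is never invoked is compiled but never run */
%macro skip;
  proc print data = sashelp.class;   /* heavily commented code */
  run;                               /* more comments */
%mend skip;

/* Trick 2: store a repeated fragment in a macro variable */
%let keep_vars = name sex age;

data subset1;
  set sashelp.class (keep = &keep_vars);
run;

/* A fragment containing semicolons needs %str() so %let does not end early */
%let sort_step = %str(proc sort data = subset1; by name; run;);

/* %unquote removes the macro quoting before the step is executed */
%unquote(&sort_step)
```

For anything longer than a line or two, a macro definition with parameters is usually easier to maintain than a %let value, but the %let form is handy for short lists such as variable names.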


AND THEN THERE IS PROC IMPORT

How many of you receive spreadsheets like this?

SOUTHLAND SCHOOL

ID      LNAME      FNAME     PASS_MATH  PASS_ENG  PASS_SPAN
38793   NEWMAN               N          N         N
5763    GEKKO      GORDON    Y          Y         Y
414452  SPAULDING  GEOFFREY  Y          Y         N
916547  O'HARA     SCARLETT  N          N         Y
43256   MARPLE     JANE      Y          N         Y
602345  CHARLES    NORA      N          Y         Y
402395  MASON      PERRY     Y          Y         Y


proc import
  datafile = 'P:\My Documents\CO DAY 2009\SOUTHLAND SCHOOL.xls'
  out = school3
  dbms = excel
  replace;
run;

proc print data = school3;
run;

Obs  SOUTHLAND_SCHOOL  F2         F3        F4         F5        F6
1    .
2    .                 LNAME      FNAME     PASS_MATH  PASS_ENG  PASS_SPAN
3    38793             NEWMAN               N          N         N
4    5763              GEKKO      GORDON    Y          Y         Y
5    414452            SPAULDING  GEOFFREY  Y          Y         N
6    916547            O'HARA     SCARLETT  N          N         Y
7    43256             MARPLE     JANE      Y          N         Y
8    602345            CHARLES    NORA      N          Y         Y
9    .                 MASON      PERRY     Y          Y         Y


CORRECTION ATTEMPT 1

INSERT MIXED COMMAND

proc import
  datafile = 'P:\My Documents\CO DAY 2009\SOUTHLAND SCHOOL.xls'
  out = school3
  dbms = excel
  replace;
  mixed = yes;
run;

proc print data = school3;
run;

Obs  SOUTHLAND_SCHOOL  F2         F3        F4         F5        F6
1
2    ID                LNAME      FNAME     PASS_MATH  PASS_ENG  PASS_SPAN
3    38793             NEWMAN               N          N         N
4    5763              GEKKO      GORDON    Y          Y         Y
5    414452            SPAULDING  GEOFFREY  Y          Y         N
6    916547            O'HARA     SCARLETT  N          N         Y
7    43256             MARPLE     JANE      Y          N         Y
8    602345            CHARLES    NORA      N          Y         Y
9    402395            MASON      PERRY     Y          Y         Y


CORRECTION ATTEMPT 2

proc import
  datafile = 'P:\My Documents\CO DAY 2009\SOUTHLAND SCHOOL NEW.xls'
  out = school3
  dbms = excel
  replace;
  mixed = yes;
run;

proc print data = school3;
run;

Obs  ID       LNAME      FNAME     PASS_MATH  PASS_ENG  PASS_SPAN
1    38793    NEWMAN               N          N         N
2    5763     GEKKO      GORDON    Y          Y         Y
3    414452   SPAULDING  GEOFFREY  Y          Y         N
4    916547   O'HARA     SCARLETT  N          N         Y
5    43256    MARPLE     JANE      Y          N         Y
6    602345   CHARLES    NORA      N          Y         Y
7    3958475  DOODY      HOWDY     Y          Y         N
8    5457346  BOND       JAMES     N          Y         Y
9    .        MASON      PERRY     Y          Y         Y

TO GET MIXED RESULTS BEYOND THE FIRST 8 OBSERVATIONS, YOU NEED TO ADJUST YOUR WINDOWS REGISTRY. I WOULDN’T TRY IT EVEN IF I COULD


THIS COULD HAPPEN TO YOU IF YOU ARE NOT CAREFUL – ANOTHER PROC IMPORT EXAMPLE

proc import
  datafile = 'P:\My Documents\CO DAY 2009\ACCEPTED_PROCEDURES.xls'
  out = procedures
  dbms = excel
  replace;
run;

/* Fixed columns: id 1-8, lname 9-33, fname 34-43, procedure 44-51 */
data patients;
  input id 8. lname $25. fname $10. procedure 8.;
  datalines;
72186   Mouse                    Minnie    99201
51682   Duck                     Donald    99204
368882  Hall                     Annie
45111   Sawyer                   Tom       99402
;
run;

proc sql;
  select id, lname, procedure
    from patients
   where procedure in (select procedure from procedures);
quit;


id lname procedure

72186 Mouse 99201

51682 Duck 99204

368882 Hall .

WHY DID THIS HAPPEN?

SPREADSHEET “ACCEPTED PROCEDURES” CONTAINED BLANK CELLS IN THE “PROCEDURE” COLUMN. PROC IMPORT READ THEM AS MISSING VALUES, AND HALL’S MISSING PROCEDURE MATCHED THEM


WORKAROUNDS:

1. Use DDE

cf. www.lexjansen.com and search on “DDE”

2. Save the EXCEL spreadsheet in CSV or tab-delimited format and read it into SAS with your own code

3. Put criteria in code to ensure you input what you really desire
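Workaround 3 can be sketched as a filter on the subquery so that missing values never reach the IN list. The datasets procedures_demo and patients_demo are hypothetical stand-ins for the imported spreadsheet and the patient data.

```sas
/* Hypothetical stand-in for the imported list, blank cell included */
data procedures_demo;
  input procedure;
  datalines;
99201
99204
.
;
run;

data patients_demo;
  input id procedure;
  datalines;
72186 99201
368882 .
45111 99402
;
run;

proc sql;
  select id, procedure
    from patients_demo
   where procedure in (select procedure
                         from procedures_demo
                        where procedure is not missing);
quit;
```

With the missing value excluded from the subquery, only patient 72186 should be selected; the patient with a missing procedure no longer matches.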


QUICK REPORT TIPS FOR THE HARD-TO-SATISFY CLIENT

FOR ASSEMBLING SUMMARY DATA FROM VARIOUS DATASETS INTO A SPECIFIED LAYOUT

SCHOOL EXAMPLE:

DISTRICT SCHOOL SUPERINTENDENT WANTS COUNTS AND PERCENTAGES FOR PASSING MATH IN A SPREADSHEET IN A LAYOUT ONLY HE/SHE WOULD COME UP WITH

%macro pass(num);
  %global tot&num pass&num;

  proc sql;
    select count(*) into :tot&num
      from school&num;
  quit;

  proc sql;
    select count(*) into :pass&num
      from school&num
     where pass_math = 'Y';
  quit;
%mend;

CREATING MACRO VARIABLE IN PROC SQL (INTO :)

SAVE MACRO VARIABLES OUTSIDE MACRO (%GLOBAL)
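As a variation, both counts can come from a single query, since a comparison such as pass_math = 'Y' evaluates to 1 or 0 in PROC SQL and can be summed. This is a sketch only; school_demo, tot_demo, and pass_demo are hypothetical names, and the code runs in open code, so no %GLOBAL statement is needed.

```sas
/* Hypothetical four-student dataset */
data school_demo;
  input pass_math $1.;
  datalines;
Y
N
Y
N
;
run;

proc sql noprint;
  select count(*), sum(pass_math = 'Y')
    into :tot_demo, :pass_demo
    from school_demo;
quit;

/* Reassigning a macro variable to itself strips leading and trailing blanks */
%let tot_demo = &tot_demo;
%let pass_demo = &pass_demo;

%put tot_demo=&tot_demo pass_demo=&pass_demo;
```

For this dataset the log line should read tot_demo=4 pass_demo=2.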


%pass(1);
%pass(2);
%pass(3);
run;

data _NULL_;
  file 'P:\My Documents\CO DAY 2009\MATH COUNTS.txt';

  pct1 = &pass1 / &tot1;
  pct2 = &pass2 / &tot2;
  pct3 = &pass3 / &tot3;

  retain t '09'x;

  if _N_ = 1 then
    put "MATH PASS RESULTS" / /
        "SCHOOL" t "STUDENTS" t "NUMBER PASSED" t "PERCENT PASSED" /
        "SCHOOL1" t "&tot1" t "&pass1" /
        t t t pct1 /
        "SCHOOL2" t "&tot2" t "&pass2" /
        t t t pct2 /
        "SOUTHLAND SCHOOL" t "&tot3" t "&pass3" /
        t t t pct3;

  format pct1 pct2 pct3 percent10.2;
  options missing = 0;
run;

TAB-DELIMITED (t = '09'x)


AFTER BRINGING INTO EXCEL AS TAB-DELIMITED AND SOME MANUAL ADJUSTMENTS TO COLUMN WIDTH:

MATH PASS RESULTS

SCHOOL            STUDENTS  NUMBER PASSED  PERCENT PASSED
SCHOOL1           4         2
                                           50.00%
SCHOOL2           4         2
                                           50.00%
SOUTHLAND SCHOOL  8         5
                                           62.50%

CLIENT GETS DATA IN LAYOUT HE/SHE DESIRES