SAS Programming Techniques for Decoding Variables on the Database Level
By Chris Speck, PAREXEL International
RTSUG – Wednesday, March 23, 2011
Libraries can be viewed as discrete units.
• This may require a change in perspective for the beginning and intermediate programmer.
• It is crucial now that programmers are under greater pressure to apply regulatory standards to clinical research data.
Programmers must now deal with metadata on the library level, which can be especially difficult with legacy data.
One tool for this is the Meta-Engine.
You’re given a library of 55 datasets of varying quality from 2001, which you must convert to submission-ready data.
Many datasets have variables with different kinds of non-native SAS formats. New variables equivalent to the decode of these formatted variables must be made. All non-native formats need to be stripped.
What are you going to do?
One solution is to make one program per dataset. Another is to create one massive program that updates datasets one at a time. Some of the obvious flaws of these approaches include:
• Specific only to one project
• Involves much unnecessary rework
• Disorganized
• Difficult to debug
The Meta-Engine is a SAS program that manipulates the metadata of entire libraries.
• Portable
• Streamlined
• Easy to understand and debug
It loops through a library one dataset at a time to make quick and uniform changes.
It relies heavily on:
• The SAS macro facility
• Dictionary tables
• PROC FORMAT
Meta-Engine Overview
Meta-Engine Structure:
• Decode() Macro
  Input: A WORK dataset with a formatted variable
  Output: A WORK dataset with the decode of this variable
• %DO Looping Mechanism (uses PROC SQL and Dictionary Tables), which encloses:
  • Code to adjust dataset labels
  • Code calling the Decode() macro. Does this once for every variable needing decoding.
  • Code to make further changes
The Meta-Engine macro asks for two library names: the one that contains the existing database, and the one that will contain the corrected, submission-ready database.
For example:
%macro MetaEngine(lib=, outlib=);
   << All of programming performed >>
%mend MetaEngine;
%MetaEngine(lib=MYLIB, outlib=MYNEWLIB);
%DO Looping Mechanism
proc sql noprint;
   select memname into :dsnames separated by '~'
   from dictionary.columns
   where libname="&LIB" and varnum=1;
quit;
%let i = %eval(1);
%do %while (%scan(&dsnames,&i,~) ne );
   %let thisds = %scan(&dsnames,%eval(&i),~);
   << Bulk of programming >>
   %let i = %eval(&i+1);
%end;
• PROC SQL uses Dictionary Tables to produce a list of the datasets in the library, separated by tildes (~).
• Macro variable THISDS represents one dataset name for every loop iteration.
• The loop runs until you run out of tildes.
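As a minimal, self-contained sketch of the same looping idea, the macro below only writes each dataset name to the log. The macro name ListMembers is hypothetical, and the sketch reads DICTIONARY.TABLES rather than DICTIONARY.COLUMNS, which is a substitution and not the approach shown on the slide.

```sas
/* Hypothetical helper, for illustration only: loops over every     */
/* dataset in a library and writes its name to the log.             */
%macro ListMembers(lib=);
   %local dsnames i thisds;

   /* Build a tilde-delimited list of dataset names.                */
   /* Assumption: DICTIONARY.TABLES is used here, one row per       */
   /* member, instead of DICTIONARY.COLUMNS with VARNUM=1.          */
   proc sql noprint;
      select memname into :dsnames separated by '~'
      from dictionary.tables
      where libname="%upcase(&lib)" and memtype='DATA';
   quit;

   /* Walk the list until %SCAN returns an empty string.            */
   %let i = 1;
   %do %while (%length(%scan(&dsnames,&i,~)) > 0);
      %let thisds = %scan(&dsnames,&i,~);
      %put NOTE: Processing &lib..&thisds;
      %let i = %eval(&i+1);
   %end;
%mend ListMembers;

%ListMembers(lib=SASHELP);
```

Upcasing the library name guards against a lowercase LIB= value, because library names are stored in uppercase in the dictionary tables.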
Adjusting Dataset Labels
Data Steps won’t save them.
%let dslabel=;
proc sql noprint;
   select memlabel into :dslabel
   from dictionary.tables
   where libname="&lib" and memname="&thisds";
quit;
%let dslabel=%trim(&dslabel);
%if &thisds=DATA1 %then %let dslabel=Label for DATA1;
%else %if &thisds=DATA2 %then %let dslabel=Label for DATA2;
• %let dslabel=; resets the label macro variable before each loop iteration.
• The Dictionary Table assigns the label of THISDS to macro variable DSLABEL, which is used to assign the dataset label to the final dataset.
• The %IF statements are there in case you want to manually adjust dataset labels (not in paper).
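The label hand-off can be tried on a single dataset with a sketch like the one below. The macro name CopyWithLabel and the demo dataset are hypothetical, the TRIMMED keyword assumes SAS 9.3 or later, and the sketch assumes the label contains no double quotes.

```sas
/* Hypothetical helper, for illustration only: copies a WORK        */
/* dataset and carries its dataset label over to the copy.          */
%macro CopyWithLabel(in=, out=);
   %local dslabel;

   /* Read the dataset label from DICTIONARY.TABLES.                */
   proc sql noprint;
      select memlabel into :dslabel trimmed
      from dictionary.tables
      where libname="WORK" and memname="%upcase(&in)";
   quit;

   /* Add the LABEL= option only when a label was found.            */
   /* %SUPERQ keeps commas in the label from confusing %LENGTH.     */
   data work.&out %if %length(%superq(dslabel)) > 0 %then (label="&dslabel");;
      set work.&in;
   run;
%mend CopyWithLabel;

/* Demo dataset with a label */
data work.demo (label="Demographics");
   subjid = 1;
run;

%CopyWithLabel(in=demo, out=demo2);
```

Without the LABEL= option on the output dataset, a plain DATA step with SET would leave WORK.DEMO2 unlabeled.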
Calling the Decode Macro
data ds0;
   set &lib..&thisds;
run;
%let fv=;
%if &thisds=DATA1 %then %do;
   %decode(fmtlib=&lib, ds=ds0, newds=ds0_00, var=D1VAR1, newvar=D1VAR1C);
   %let fv=&fv D1VAR1;
%end;
• DS0 is the base dataset, equal to THISDS.
• FV will equal the list of all decoded variables, for later processing. The statement %let fv=&fv D1VAR1; builds this decode list.
• DECODE macro parameters:
  • Format library (FMTLIB)
  • Input dataset (DS)
  • Output dataset, with 1 new decode variable (NEWDS)
  • Variable to be decoded (VAR)
  • Decode variable name (NEWVAR)
• The DS0_00 numbering scheme is used with the NEWDS parameter because it will be the parameter DS in the next macro call, producing DS0_001. The final product should be DS1.
Calling the Decode Macro: How it would appear in real code
Decode Macro
proc sql noprint;
   select format into :fmt
   from dictionary.columns
   where libname="WORK" and memname=%upcase("&ds") and name="&var";
quit;
proc format noprint cntlout=fmt (keep=length) library=%upcase(&fmtlib) fmtlib;
   select %substr(&fmt,1,%length(&fmt)-1);
run;
1. Finds the variable’s format using the SAS Dictionary table COLUMNS and assigns it to the FMT macro variable.
2. Gets the full range of the &FMT format using the FMTLIB option. CNTLOUT= saves it to dataset FMT. LENGTH is the max length of the format value.
Why do we use a substring function here? To remove the trailing period.
data _null_;
   set fmt;
   if _n_=1 then call symput('len',cats(put(length,best.)));
run;
2.5. Assigns the max length of the format to &LEN to prevent truncation.
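To see what the control dataset holds, a small sketch like the following can be run on its own. The format SEXF is hypothetical and is created in the WORK library purely for illustration.

```sas
/* Hypothetical format, for illustration only */
proc format library=work;
   value sexf 1 = 'Male'
              2 = 'Female';
run;

/* CNTLOUT= writes the format definition to a dataset, one row per  */
/* range. LENGTH is one of the variables in that dataset.           */
proc format library=work
            cntlout=work.fmt (keep=fmtname start end label length);
   select sexf;
run;

proc print data=work.fmt noobs;
run;
```

The printed LENGTH column is the value that the step above copies into &LEN.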
data _null_;
   set &ds;
   if _n_=1 then do;
      if length(vlabel(&var))<=38 then call symput('newlabel',cats(vlabel(&var))||"-C");
      else if length(vlabel(&var))=39 then call symput('newlabel',cats(vlabel(&var))||"C");
      else if length(vlabel(&var))>=40 then call symput('newlabel',substr(cats(vlabel(&var)),1,length(vlabel(&var))-1)||"C");
   end;
run;
3. Retrieves the variable label with VLABEL. Adjusts the label so decode variables will have unique labels: appends "-C" to labels of up to 38 characters, appends "C" to 39-character labels, and replaces the last character with "C" for labels of 40 characters or more. Assigns the result to macro variable &NEWLABEL.
data &newds;
   length &newvar $&len;
   set &ds;
   &newvar=put(&var,&fmt);
   label &newvar="&newlabel";
run;
proc datasets nolist lib=work memtype=data;
   delete fmt &ds;
run;
quit;
4. Creates the output dataset (&NEWDS). Derives the decode variable (&NEWVAR) with the variable format (&FMT). Assigns it a length (&LEN) and a label (&NEWLABEL).
5. Garbage collection: deletes the FMT dataset and the input dataset (&DS) from WORK.
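Stripped of the macro variables, the decode step comes down to a PUT call with the variable's own format. The sketch below is a standalone illustration; the format YN, the datasets, and the variable names are hypothetical.

```sas
/* Hypothetical format and data, for illustration only */
proc format;
   value yn 0 = 'No'
            1 = 'Yes';
run;

data ds0;
   input flag;
   format flag yn.;
   datalines;
0
1
;
run;

data ds1;
   length flagc $3;            /* long enough for the longest decode */
   set ds0;
   flagc = put(flag, yn.);     /* decode: the formatted value as text */
   label flagc = "Flag-C";
   format flag;                /* a FORMAT statement with no format   */
                               /* strips the format from FLAG         */
run;
```

Declaring the length before the SET statement is what keeps the decode from being truncated, which is the role &LEN plays in the macro.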
Further Adjustments
data ds2;
   set ds1;
   %if &thisds=DATA3 %then %do;
      label D3VAR1C="Trunc. decode label which was too long-C";
   %end;
run;
For further information, see my previous paper, SAS Programming Techniques for Adjusting Metadata on the Database Level.
Completing Database Loop
data &outlib..&thisds %if %length(&dslabel)>0 %then (label="&dslabel");;
   set ds2;
   format &fv;
run;
• Creates the final dataset in the output library.
• Assigns the dataset label.
• DS2 could be DS3 or any number, depending on the adjustments performed.
• Strips formats off of decoded variables.
• The process repeats for every dataset in the library.
Other Possibilities
It is possible to automate variable decoding.
• PROC SQL produces a list of variables with non-native formats.
• The list informs an inner loop calling %DECODE() once for each variable.
A gain in automation is a loss in adaptability.
• Not all formats exist in the same catalog.
• Not all variable names may be 8 characters long.
• Not all formatted variables may require decodes.
The Meta-Engine can be tweaked depending on the task. Some ideas include:
• Testing libraries for SAS 5 compliance
• Excluding certain datasets
• Renaming datasets
• Splitting a dataset into two if it takes up too much memory
• Adjusting dataset and variable metadata. See my previous paper, SAS Programming Techniques for Adjusting Metadata on the Database Level.
Conclusion
The Meta-Engine offers a quick and streamlined approach for a programmer to begin thinking about metadata on the library or database level.
Programmers can begin to manipulate whole libraries intuitively, as if they were datasets.
The Meta-Engine in its entirety, plus further information, can be found in my paper SAS Programming Techniques for Decoding Variables on the Database Level.
Contact Information
Chris Speck, Senior Programmer
PAREXEL International
2520 Meridian Parkway, Suite 200
Durham, NC 27713
Work Phone: 919 294 5018
Fax: 919 544 3698