Copyright © 2010, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Eliminating Redundant Custom Formats
How to Really Take Advantage of PROC SQL, PROC CATALOG, and the DATA Step
Phase I: Generate Catalog of Unique Formats
Phase II: Update Dataset Descriptor
Phase I: Generate Catalog of Unique Formats
1. Generate Unique Format Keys for Each Custom Format
2. Optimize Length of Key Variable
3. Sort Keys Dataset and Generate Index for FMTNAME
4. Prepend VARNUM of Variables with Custom Formats
5. Generate View of Unique Format Keys
6. Generate Dataset of Unique Custom Format Metadata Records
7. Generate Persistent Unique Formats Catalog
1. Generate Unique Format Keys for Each Custom Format
a. Generate Formats Metadata Dataset from Existing Catalog
proc format library = user cntlout = work._format_metadata ;
run ;
_FORMAT_METADATA Contents
Variables in Creation Order
 #  Variable  Type  Len  Label
 1  FMTNAME   Char   32  Format name
 2  START     Char   16  Starting value for format
 3  END       Char   16  Ending value for format
 4  LABEL     Char   31  Format value label
 5  MIN       Num     3  Minimum length
 6  MAX       Num     3  Maximum length
 7  DEFAULT   Num     3  Default length
 8  LENGTH    Num     3  Format length
 9  FUZZ      Num     8  Fuzz value
10  PREFIX    Char    2  Prefix characters
11  MULT      Num     8  Multiplier
12  FILL      Char    1  Fill character
13  NOEDIT    Num     3  Is picture string noedit?
14  TYPE      Char    1  Type of format
15  SEXCL     Char    1  Start exclusion
16  EEXCL     Char    1  End exclusion
17  HLO       Char   11  Additional information
18  DECSEP    Char    1  Decimal separator
19  DIG3SEP   Char    1  Three-digit separator
20  DATATYPE  Char    8  Date/time/datetime?
21  LANGUAGE  Char    8  Language for date strings
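To see what these metadata records look like for redundant formats, a minimal sketch follows. The format names (yesno_a, yesno_b) and the dataset name (_demo_format_metadata) are hypothetical and are not part of the original program.

```sas
* Hypothetical demo formats with two names and identical definitions ;
proc format library = work ;
  value yesno_a 0 = 'No' 1 = 'Yes' ;
  value yesno_b 0 = 'No' 1 = 'Yes' ;
run ;

* Export the catalog metadata with one record per value range ;
proc format library = work cntlout = work._demo_format_metadata ;
run ;

* Show the columns that later feed the custom format key ;
proc print data = work._demo_format_metadata noobs ;
  var fmtname start end label type ;
run ;
```

The two formats should produce records that differ only in FMTNAME, which is the kind of redundancy the keys in the next step are meant to expose.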
b. Generate Custom Format Keys Dataset
data work._custom_format_keys ( keep  = fmtname type custom_format_key key_length
                                where = (not missing(custom_format_key)) ) ;
  set USER._FORMAT_METADATA ( ) ;
  by fmtname ;

  attrib CUSTOM_FORMAT_KEY length = $ &_MAX_KEY_LEN
                           format = $CHAR&_MAX_KEY_LEN..
                           label  = 'Custom Format Key'
         KEY_LENGTH        length = 8
                           format = comma12.0
                           label  = 'Key Length' ;
  retain custom_format_key (' ') format_count (0) ;

  * CHECK FOR CARET CHARACTER IN VALUES AND LABELS ;
  if ( (indexc(start,'^') NE 0) or
       (indexc(end,'^')   NE 0) or
       (indexc(label,'^') NE 0) or
       (indexc(sexcl,'^') NE 0) or
       (indexc(eexcl,'^') NE 0) )
    then put 'WARNING: CARET CHARACTER USED: '
             fmtname= start= end= label= sexcl= eexcl= ;

  * GENERATE THE CUSTOM FORMAT KEY ;
  custom_format_key = catx( '^', custom_format_key,
                            start, end, label, type, sexcl, eexcl ) ;

  if (last.fmtname) then do ;
    format_count + 1 ;
    key_length = length(trim(left(custom_format_key))) ;
    if (key_length GE %eval(&_MAX_KEY_LEN - 1))
      then put 'WARNING: POTENTIAL KEY LENGTH OVERRUN: ' FORMAT_COUNT= ;
    output ;
    custom_format_key = '' ;
  end ;
run ;
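The caret check matters because the caret is the key delimiter: if a value or label already contains one, two different formats can collapse to the same key. A minimal sketch of such a collision, using made-up values (the variable names key_1 and key_2 are illustrative only):

```sas
data _null_ ;
  * Two different argument lists ... ;
  key_1 = catx('^', 'A^B', 'C') ;
  key_2 = catx('^', 'A', 'B^C') ;
  * ... that CATX joins into the same delimited string ;
  if key_1 EQ key_2
    then put 'NOTE: Distinct inputs produced the same key: ' key_1= key_2= ;
run ;
```

CATX also omits blank arguments without writing a delimiter for them, so blank START, END, or LABEL values are another possible source of ambiguous keys worth keeping in mind.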
_CUSTOM_FORMAT_KEYS Contents
Variables in Creation Order
 #  Variable           Type    Len  Label
 1  CUSTOM_FORMAT_KEY  Char  32767  Custom Format Key
 2  KEY_LENGTH         Num       8  Key Length
 3  FMTNAME            Char     32  Format name
 4  TYPE               Char      1  Type of format
Generate Dataset of Custom Format Keys
FMTNAME  CUSTOM_FORMAT_KEY  STRING_LENGTH
V20A    000.NONE130NNN111.ONEDAY130NNN222.TWODAYS130NNN333.THREEDAYS130NNN444.FOURDAYS130NNN555.FIVEDAYS130NNN666.SIXDAYS130NNN777.EVERYDAY130NNN888.DON'TKNOW130NNN999.RF.NA130NNN^9^9^9.RF.NA^13^0^N^N^N  194
V20B    000.NONE130NNN111.ONEDAY130NNN222.TWODAYS130NNN333.THREEDAYS130NNN444.FOURDAYS130NNN555.FIVEDAYS130NNN666.SIXDAYS130NNN777.EVERYDAY130NNN888.DON'TKNOW130NNN999.RF.NA130NNN^9^9^9.RF.NA^13^0^N^N^N  194
V20C    000.NONE130NNN111.ONEDAY130NNN222.TWODAYS130NNN333.THREEDAYS130NNN444.FOURDAYS130NNN555.FIVEDAYS130NNN666.SIXDAYS130NNN777.EVERYDAY130NNN888.DON'TKNOW130NNN999.RF.NA130NNN^9^9^9.RF.NA^13^0^N^N^N  194
V20D    000.NONE130NNN111.ONEDAY130NNN222.TWODAYS130NNN333.THREEDAYS130NNN444.FOURDAYS130NNN555.FIVEDAYS130NNN666.SIXDAYS130NNN777.EVERYDAY130NNN888.DON'TKNOW130NNN999.RF.NA130NNN^9^9^9.RF.NA^13^0^N^N^N  194
WEIGHT  000.NoPostIW130NNN^0^0^0.NoPostIW^13^0^N^N^N  44
2. Optimize Length of Key Variable
* GENERATE AND EXPORT MAXIMUM KEY LENGTH TO MACRO VARIABLE ;
proc sql noprint ;
  select strip(put(max(key_length),5.0))
    into :_maximum_key_length
    from work._custom_format_keys ;
quit ;
data WORK._CUSTOM_FORMAT_KEYS ( label = '_CUSTOM_FORMAT_KEYS' ) ;
  attrib fmtname           label  = 'Format Name'
                           length = $ 32
         type              label  = 'Variable Type'
                           length = $ 1
         custom_format_key label  = 'Custom Format Key'
                           length = $ &_MAXIMUM_KEY_LENGTH
                           format = %nrbquote($CHAR%trim(&_MAXIMUM_KEY_LENGTH).) ;
  set WORK._CUSTOM_FORMAT_KEYS ( drop = key_length ) ;
run ;
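As a possible simplification of the PROC SQL query in this step, and assuming SAS 9.3 or later, the TRIMMED keyword of the INTO clause strips the blanks that would otherwise pad the macro variable, so the STRIP(PUT(...)) wrapper may be unnecessary. The sketch below uses a hypothetical dataset (_demo_keys) and macro variable (_demo_max_key_length) rather than the names from the original program.

```sas
* Hypothetical key lengths standing in for WORK._CUSTOM_FORMAT_KEYS ;
data work._demo_keys ;
  input key_length ;
  datalines ;
194
44
83
;
run ;

proc sql noprint ;
  * TRIMMED removes leading and trailing blanks from the stored value ;
  select max(key_length)
    into :_demo_max_key_length trimmed
    from work._demo_keys ;
quit ;

%put NOTE: Maximum key length is &_demo_max_key_length ;
```

On releases that predate the TRIMMED keyword, the STRIP(PUT(...)) form shown in this step remains the safer choice.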
3. Sort Keys Dataset and Generate Index for FMTNAME
proc sort data = work._custom_format_keys
           out = work._custom_format_keys ( index = (fmtname) ) ;
  by custom_format_key fmtname type ;
run ;
4. Prepend VARNUM of Variables with Custom Formats
proc sql ;
  create view work._ordered_custom_format_keys as
    select variables.varnum,
           keys.fmtname           as fmtname,
           variables.type         as type,
           keys.custom_format_key as custom_format_key
      from WORK._CUSTOM_FORMAT_KEYS keys
      left join USER._VARIABLE_METADATA variables
        on (cats(keys.fmtname,'.') EQ variables.format) and
           (keys.type EQ upcase(substr(variables.type,1,1)))
     order by custom_format_key, varnum ;
quit ;
5. Generate View of Unique Format Keys
data work._unique_format_keys / view = work._unique_format_keys ;
  set work._ordered_custom_format_keys ;
  by custom_format_key ;
  if (first.custom_format_key) then output ;
run ;
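The FIRST. logic above keeps one record per key. A different technique that gives a comparable result on a physical dataset is PROC SORT with NODUPKEY; it is sketched here only for comparison and is not the approach used in the original program. The dataset names and the shortened keys are hypothetical.

```sas
* Hypothetical ordered keys, where V20A and V20B share one definition ;
data work._demo_ordered_keys ;
  length fmtname $ 32 custom_format_key $ 40 ;
  input varnum fmtname $ custom_format_key $ ;
  datalines ;
1 V20A 0^0^NONE^N^N^N
2 V20B 0^0^NONE^N^N^N
3 WEIGHT 0^0^NoPostIW^N^N^N
;
run ;

* NODUPKEY keeps the first record encountered for each BY value ;
proc sort data = work._demo_ordered_keys
           out = work._demo_unique_keys nodupkey ;
  by custom_format_key ;
run ;

proc print data = work._demo_unique_keys noobs ;
run ;
```

With this input the output should retain V20A and WEIGHT. The view-based approach in this step avoids writing an intermediate dataset, which may matter when the keys are long.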
Generate Dataset of Unique Format Records
FMTNAME  CUSTOM_FORMAT_STRING  STRING_LENGTH
V20A    000.NONE130NNN111.ONEDAY130NNN222.TWODAYS130NNN333.THREEDAYS130NNN444.FOURDAYS130NNN555.FIVEDAYS130NNN666.SIXDAYS130NNN777.EVERYDAY130NNN888.DON'TKNOW130NNN999.RF.NA130NNN^9^9^9.RF.NA^13^0^N^N^N  194
WEIGHT  000.NoPostIW130NNN^0^0^0.NoPostIW^13^0^N^N^N  44
V22C    0000.NA160NNN979797.97andolder160NNN^97^97^97.97andolder^16^0^N^N^N  67
V05C    110-2460NNN2225-4960NNN335060NNN4451-7560NNN5576-10060NNN99NA60NNN^9^9^NA^6^0^N^N^N  83
V00I    110-2470NNN2225-4970NNN335070NNN4451-7570NNN5576-10070NNN99NA70NNN^9^9^NA^7^0^N^N^N  83
V14G    1118-24100NNN2225-34100NNN3335-44100NNN4445-54100NNN5555-64100NNN6665&older100NNN99NA100NNN^9^9^NA^10^0^N^N^N  109
V01D    116-7days100NNN223-5days100NNN331-2days100NNN440days100NNN99NA100NNN^9^9^NA^10^0^N^N^N  86
V02F    117days80NNN223-6days80NNN331-2days80NNN440days80NNN99NA80NNN^9^9^NA^8^0^N^N^N  78
V09E    11Addresssocialproblems230NNN77Vigorouslyenforcelaws230NNN99NA230NNN^9^9^NA^23^0^N^N^N  86
6. Generate View of Unique Custom Format Metadata Records
proc sql ;
  create view work._unique_format_view as
    select *
      from USER._FORMAT_METADATA
     where fmtname in ( select fmtname from WORK._UNIQUE_FORMAT_KEYS )
     order by fmtname, type, start, end ;
quit ;
7. Generate Persistent Unique Formats Catalog
proc format library = USER cntlin = WORK._UNIQUE_FORMAT_VIEW ;
run ;
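After a catalog has been rebuilt, it can be worth confirming what it actually contains. The sketch below is one way to do that, assuming a throwaway format (demo_yn) in the WORK library rather than the USER catalog from the original program.

```sas
* Hypothetical format so the example runs on its own ;
proc format library = work ;
  value demo_yn 0 = 'No' 1 = 'Yes' ;
run ;

* List the entries stored in the formats catalog ;
proc catalog catalog = work.formats ;
  contents ;
run ;
quit ;

* Print the value-to-label mappings for the selected format ;
proc format library = work fmtlib ;
  select demo_yn ;
run ;
```

Pointing the same two steps at the real library should show whether only the unique formats remain.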
Phase II: Update Dataset Descriptor
1. Generate Formats Crosswalk Dataset
2. Generate Crosswalk Pairs Dataset
3. Update Dataset Descriptor
1. Generate Formats Crosswalk Dataset
proc sql ;
  create table WORK._FORMAT_CROSSWALK as
    select distinct
           custom.fmtname as fmtname,
           custom.type    as type,
           uniq.fmtname   as unique_fmtname
      from work._custom_format_keys custom
      left join work._unique_format_keys uniq
        on (custom.custom_format_key EQ uniq.custom_format_key) ;
quit ;
2. Generate Crosswalk Pairs Dataset
proc sql ;
  create table _NUMBERED_CROSSWALK_PAIRS as
    select variables.varnum,
           variables.name,
           variables.type,
           crosswalk.unique_fmtname
      from _FORMATTED_VARIABLE_METADATA variables
      left join _FORMAT_CROSSWALK crosswalk
        on (compress(variables.format,'$.') EQ crosswalk.fmtname) and
           (upcase(substr(variables.type,1,1)) EQ crosswalk.type)
     order by varnum ;
quit ;
3. Update Dataset Descriptor
data _NULL_ ;
  set _NUMBERED_CROSSWALK_PAIRS end = end_of_dataset ;
  call symput( '_varname_' || strip(put(_N_,6.0)), strip(name) ) ;
  if type EQ 'char'
    then call symput( '_fmtname_' || strip(put(_N_,6.0)),
                      strip(cats('$',unique_fmtname,'.')) ) ;
    else call symput( '_fmtname_' || strip(put(_N_,6.0)),
                      strip(cats(unique_fmtname,'.')) ) ;
  if end_of_dataset
    then call symput( '_pairs_n', strip(put(_N_,6.0)) ) ;
run ;
* UPDATE DATASET VARIABLE--FORMAT SPECIFICATIONS. ;
proc datasets library = &_LIBNAME nolist ;
  %put NOTE: CLEARING OLD VARIABLE--FORMAT PAIRINGS. ;
  modify &_DS_NAME ;
    format _ALL_ ;
  run ;
  modify &_DS_NAME ;
    format
      %do _i = 1 %to &_PAIRS_N ;
        &&_VARNAME_&_I &&_FMTNAME_&_I
      %end ;
    ;
  run ;
quit ;
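Because the FORMAT statement above is assembled by a %DO loop, the code has to run inside a macro definition. The following self-contained sketch shows how the numbered macro variables resolve; the macro name (demo_apply_formats), the dataset (work._demo), and the variable and format pairs are all hypothetical stand-ins for the values that CALL SYMPUT would create.

```sas
* Hypothetical demo dataset with two numeric variables ;
data work._demo ;
  x = 1 ;
  y = 2 ;
run ;

%macro demo_apply_formats ;
  %local _i _pairs_n _varname_1 _fmtname_1 _varname_2 _fmtname_2 ;
  %let _pairs_n   = 2 ;
  %let _varname_1 = x ;
  %let _fmtname_1 = z3. ;
  %let _varname_2 = y ;
  %let _fmtname_2 = comma8. ;

  proc datasets library = work nolist ;
    modify _demo ;
      /* &&_varname_&_i resolves first to &_varname_1 and then to x */
      format
        %do _i = 1 %to &_pairs_n ;
          &&_varname_&_i &&_fmtname_&_i
        %end ;
      ;
    run ;
  quit ;
%mend demo_apply_formats ;

%demo_apply_formats

* Confirm the new variable-format pairings in the descriptor ;
proc contents data = work._demo ;
run ;
```

Only the dataset descriptor is rewritten, so the data values themselves are not read or copied, which is what makes this approach attractive for large datasets.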
Eliminating Redundant Custom Formats (or How to Really Take Advantage of PROC SQL, PROC CATALOG, and the DATA Step)
Philip A. Wright
Inter-university Consortium for Political and Social Research (ICPSR),
The Institute for Social Research (ISR),
University of Michigan
P.O. Box 1248
Ann Arbor, Michigan 48106-1248
E-mail: [email protected]