Creating Summary Data Sets
Ron Cody, Ed.D.
Robert Wood Johnson Medical School
Test data set (CLINIC)

SUBJECT  GENDER  AGE_GROUP  BLOOD_TYPE   HR   SBP   DBP
    1      M         1          A        80   130    80
    2      M         1          B        68   128    70
    3      M         2          O         .   120    72
    4      M         1          A        48   140    86
    5      F         2          A        56   160    94
    6      F         1          B        60   109    64
    7      F         2          O        82   118    70
    8      F         2          O        64     .    76
    9      F         1          A        56     .    88
   10      F         1          B        88   188   110
   11      M         1          B        64   120    80
   12      M         2          B        62   120    76
PROC MEANS vs. PROC SUMMARY

   PROC MEANS DATA=data_set_name NOPRINT;

is equivalent to

   PROC SUMMARY DATA=data_set_name;
Creating a SUMMARY Data Set Containing MEANS

PROC MEANS DATA=CLINIC NOPRINT;
   /****************************************
   Equivalent to PROC SUMMARY DATA=CLINIC;
   *****************************************/
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

Listing of data set OUT1

Obs  GENDER  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
 1              0      12     66.1818   133.300   80.5000
 2     F        1       6     67.6667   143.750   83.6667
 3     M        1       6     64.4000   126.333   77.3333
Using a BY Statement Instead of a CLASS Statement

PROC SORT DATA=CLINIC;
   BY GENDER;
RUN;

PROC MEANS DATA=CLINIC NOPRINT;
   BY GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

Listing of data set OUT1

Obs  GENDER  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
 1     F        0       6     67.6667   143.750   83.6667
 2     M        0       6     64.4000   126.333   77.3333
Creating a SUMMARY Data Set Containing MEANS Broken Down by GENDER and AGE_GROUP

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

GENDER  AGE_GROUP  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
            .         0      12     66.1818   133.300   80.5000
            1         1       7     66.2857   135.833   82.5714
            2         1       5     66.0000   129.500   77.6000
  F         .         2       6     67.6667   143.750   83.6667
  M         .         2       6     64.4000   126.333   77.3333
  F         1         3       3     68.0000   148.500   87.3333
  F         2         3       3     67.3333   139.000   80.0000
  M         1         3       4     65.0000   129.500   79.0000
  M         2         3       2     62.0000   120.000   74.0000
Explaining the _TYPE_ Variable

CLASS GENDER AGE_GROUP;

   Class Variables         Representation
GENDER    AGE_GROUP       Binary    Decimal
   0          0             00         0
   0          1             01         1
   1          0             10         2
   1          1             11         3
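Each class variable contributes one binary digit to _TYPE_: 1 if the variable defines the breakdown in that row, 0 if it is collapsed. With CLASS GENDER AGE_GROUP, the gender-only rows therefore have _TYPE_ = 2. As a minimal sketch, assuming OUT1 was created by the two-variable CLASS example without the CHARTYPE option (so _TYPE_ is numeric), those rows can be selected like this:

```sas
/* Sketch: list only the rows broken down by GENDER alone.          */
/* Assumes OUT1 comes from CLASS GENDER AGE_GROUP without CHARTYPE, */
/* so _TYPE_ is a numeric variable and binary 10 is stored as 2.    */
PROC PRINT DATA=OUT1 NOOBS;
   WHERE _TYPE_ = 2;
RUN;
```

Changing the comparison value to 1 would select the AGE_GROUP-only rows, and 3 the GENDER by AGE_GROUP rows.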
Demonstrating the NWAY Option

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

GENDER  AGE_GROUP  _TYPE_  _FREQ_     M_HR     M_SBP    M_DBP
  F         1         3       3     68.0000    148.5   87.3333
  F         2         3       3     67.3333    139.0   80.0000
  M         1         3       4     65.0000    129.5   79.0000
  M         2         3       2     62.0000    120.0   74.0000
Outputting More than One Statistic

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN   = M_HR M_SBP M_DBP
                   N      = N_HR N_SBP N_DBP
                   MAX    = MAX_HR MAX_SBP MAX_DBP
                   MEDIAN = MED_HR MED_SBP MED_DBP;
RUN;

GENDER  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP    N_HR  N_SBP
           0      12     66.1818   133.300   80.5000    11     10
  F        1       6     67.6667   143.750   83.6667     6      4
  M        1       6     64.4000   126.333   77.3333     5      6

N_DBP  MAX_HR  MAX_SBP  MAX_DBP  MED_HR  MED_SBP  MED_DBP
  12     88      188      110      64      124       78
   6     88      188      110      62      139       82
   6     80      140       86      64      124       78
Partial List of Some Available Statistics

Keyword   Description
________________________________________________________________
MEAN      Mean
N         Number of non-missing values
NMISS     Number of missing values
MIN       Smallest non-missing value
MAX       Largest value
MEDIAN    Median
RANGE     Range (difference between the minimum and maximum values)
Q1        25th percentile
Q3        75th percentile
QRANGE    Interquartile range (difference between 25th and 75th percentile)
STD       Standard deviation
STDERR    Standard error
UCLM      Upper bound of the 95% confidence interval
LCLM      Lower bound of the 95% confidence interval
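Any of these keywords can be used on the OUTPUT statement in the same way as MEAN. The following is a minimal sketch using the CLINIC data set; the output data set name SBP_STATS and the output variable names are hypothetical choices made for illustration only.

```sas
/* Sketch: request several of the listed statistics for SBP by GENDER. */
/* SBP_STATS and the variable names on the right of each equals sign   */
/* are illustrative names, not taken from the original slides.         */
PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER;
   VAR SBP;
   OUTPUT OUT=SBP_STATS
          NMISS = NMISS_SBP   /* number of missing SBP values        */
          MIN   = MIN_SBP     /* smallest non-missing value          */
          Q1    = Q1_SBP      /* 25th percentile                     */
          Q3    = Q3_SBP      /* 75th percentile                     */
          STD   = SD_SBP      /* standard deviation                  */
          LCLM  = LOW_SBP     /* lower bound of the 95% CI           */
          UCLM  = HIGH_SBP;   /* upper bound of the 95% CI           */
RUN;
```

NWAY is included here so that only the rows for each GENDER are written, without the overall summary row.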
Demonstrating the AUTONAME OUTPUT Option

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN   =
                   N      =
                   MAX    =
                   MEDIAN = / AUTONAME;
RUN;

GENDER  _TYPE_  _FREQ_   HR_Mean   SBP_Mean  DBP_Mean  HR_N  SBP_N
           0      12     66.1818   133.300   80.5000    11     10
  F        1       6     67.6667   143.750   83.6667     6      4
  M        1       6     64.4000   126.333   77.3333     5      6

DBP_N  HR_Max  SBP_Max  DBP_Max  HR_Median  SBP_Median  DBP_Median
  12     88      188      110       64         124          78
   6     88      188      110       62         139          82
   6     80      140       86       64         124          78
Another Way of Naming Output Variables

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=;
RUN;

Listing of Data Set OUT1

GENDER  AGE_GROUP  _TYPE_  _FREQ_      HR       SBP      DBP
  F         1         3       3     68.0000    148.5   87.3333
  F         2         3       3     67.3333    139.0   80.0000
  M         1         3       4     65.0000    129.5   79.0000
  M         2         3       2     62.0000    120.0   74.0000
Dropping Unneeded Variables in the Output Data Set

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1(DROP= _:) MEAN=M_HR M_SBP M_DBP;
RUN;

Listing of Data Set OUT1

GENDER  AGE_GROUP     M_HR     M_SBP    M_DBP
  F         1       68.0000    148.5   87.3333
  F         2       67.3333    139.0   80.0000
  M         1       65.0000    129.5   79.0000
  M         2       62.0000    120.0   74.0000
Demonstrating the CHARTYPE Procedure Option

PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

Demonstrating CHARTYPE Option

GENDER  AGE_GROUP  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
            .        00      12     66.1818   133.300   80.5000
            1        01       7     66.2857   135.833   82.5714
            2        01       5     66.0000   129.500   77.6000
  F         .        10       6     67.6667   143.750   83.6667
  M         .        10       6     64.4000   126.333   77.3333
  F         1        11       3     68.0000   148.500   87.3333
  F         2        11       3     67.3333   139.000   80.0000
  M         1        11       4     65.0000   129.500   79.0000
  M         2        11       2     62.0000   120.000   74.0000
Demonstrating the CHARTYPE Procedure Option

PROC PRINT DATA=OUT1 NOOBS;
   TITLE "Demonstrating CHARTYPE Option";
   WHERE _TYPE_ EQ "10";
RUN;

Demonstrating CHARTYPE Option

GENDER  AGE_GROUP  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
  F         .        10       6     67.6667   143.750   83.6667
  M         .        10       6     64.4000   126.333   77.3333
Another Way to Name Variables (instead of using a VAR statement)

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   ***VAR STATEMENT OPTIONAL;
   OUTPUT OUT=OUT1 MEAN(HR)         = M_HR
                   N(HR SBP DBP)    = N_HR N_SBP N_DBP
                   MAX(SBP)         = MAX_SBP
                   MEDIAN(SBP DBP)  = MED_SBP MED_DBP;
RUN;

GENDER  _TYPE_  _FREQ_     M_HR    N_HR  N_SBP  N_DBP  MAX_SBP  MED_SBP  MED_DBP
           0      12     66.1818    11     10     12     188      124       78
  F        1       6     67.6667     6      4      6     188      139       82
  M        1       6     64.4000     5      6      6     140      124       78
Multi-way Breakdowns Using a TYPES Statement

PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP BLOOD_TYPE;
   VAR HR SBP DBP;
   TYPES GENDER AGE_GROUP*GENDER BLOOD_TYPE*GENDER;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
  F         .                   100       6     67.6667   143.750   83.6667
  M         .                   100       6     64.4000   126.333   77.3333
  F         .          A        101       2     56.0000   160.000   91.0000
  F         .          B        101       2     74.0000   148.500   87.0000
  F         .          O        101       2     73.0000   118.000   73.0000
  M         .          A        101       2     64.0000   135.000   83.0000
  M         .          B        101       3     64.6667   122.667   75.3333
  M         .          O        101       1       .       120.000   72.0000
  F         1                   110       3     68.0000   148.500   87.3333
  F         2                   110       3     67.3333   139.000   80.0000
  M         1                   110       4     65.0000   129.500   79.0000
  M         2                   110       2     62.0000   120.000   74.0000
Using the _TYPE_ Values to Create Multiple Data Sets

DATA GENDER AGE_BY_GENDER BLOOD_BY_GENDER;
   SET OUT1;
   IF _TYPE_ = "100" THEN OUTPUT GENDER;
   ELSE IF _TYPE_ = "110" THEN OUTPUT AGE_BY_GENDER;
   ELSE IF _TYPE_ = "101" THEN OUTPUT BLOOD_BY_GENDER;
RUN;

Listing of Data Set GENDER

GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
  F         .                   100       6     67.6667   143.750   83.6667
  M         .                   100       6     64.4000   126.333   77.3333

Listing of Data Set AGE_BY_GENDER

GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_     M_HR     M_SBP    M_DBP
  F         1                   110       3     68.0000    148.5   87.3333
  F         2                   110       3     67.3333    139.0   80.0000
  M         1                   110       4     65.0000    129.5   79.0000
  M         2                   110       2     62.0000    120.0   74.0000
Examples of TYPES Statements

   TYPES A A*C D*C;
   TYPES A*(B C D);
   TYPES () A A*C*D;
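In these statements, a parenthesized list is shorthand for several crossings (A*(B C D) requests A*B, A*C, and A*D), and an empty pair of parentheses requests the overall summary with no breakdown. As a minimal sketch applied to the CLINIC data set (the output data set name TYPES_DEMO is a hypothetical name used only for illustration):

```sas
/* Sketch: overall summary plus GENDER crossed with each of two       */
/* other class variables. TYPES_DEMO is an illustrative name.         */
/* GENDER*(AGE_GROUP BLOOD_TYPE) is shorthand for                     */
/* GENDER*AGE_GROUP GENDER*BLOOD_TYPE.                                */
PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP BLOOD_TYPE;
   VAR HR SBP DBP;
   TYPES () GENDER*(AGE_GROUP BLOOD_TYPE);
   OUTPUT OUT=TYPES_DEMO MEAN=M_HR M_SBP M_DBP;
RUN;
```

With CHARTYPE in effect and this CLASS order, the resulting rows should carry _TYPE_ values "000" (overall), "101" (GENDER by BLOOD_TYPE), and "110" (GENDER by AGE_GROUP).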
Using PROC FREQ to Count Frequencies

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=NUMBER;
RUN;

Listing of Data Set NUMBER

AGE_GROUP  COUNT  PERCENT
    1        7    58.3333
    2        5    41.6667
Renaming the COUNT Variable

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=NUMBER(RENAME=(COUNT=N_AGE)
                                 DROP=PERCENT);
RUN;

Listing of Data Set NUMBER

AGE_GROUP  N_AGE
    1        7
    2        5
Using PROC MEANS to Count Frequencies

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS AGE_GROUP;
   VAR HR; /* ANY NUMERIC VARIABLE */
   OUTPUT OUT=COUNTS(RENAME=(_FREQ_ = N_AGE)
                     DROP=_TYPE_ DUMMY)
          N=DUMMY;
RUN;

Listing of Data Set COUNTS

AGE_GROUP  N_AGE
    1        7
    2        5
Using PROC FREQ to Count Frequencies in a Two-way Table

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES GENDER*BLOOD_TYPE / OUT=FREQOUT(DROP=PERCENT
                                  RENAME=(COUNT=NUMBER));
RUN;

Listing of Data Set FREQOUT

GENDER  BLOOD_TYPE  NUMBER
  F         A          2
  F         B          2
  F         O          2
  M         A          2
  M         B          3
  M         O          1
Using PROC FREQ to Output More than One Data Set

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=OUT1;
   TABLES GENDER / OUT=OUT2;
   TABLES GENDER*AGE_GROUP / OUT=OUT3;
RUN;

Listing of Data Set OUT1

AGE_GROUP  COUNT  PERCENT
    1        7    58.3333
    2        5    41.6667
----------------------------------------------------------------
Listing of Data Set OUT2

GENDER  COUNT  PERCENT
  F       6      50
  M       6      50
----------------------------------------------------------------
Listing of Data Set OUT3

GENDER  AGE_GROUP  COUNT  PERCENT
  F         1        3    25.0000
  F         2        3    25.0000
  M         1        4    33.3333
  M         2        2    16.6667