Creating Summary Data Sets
Ron Cody, Ed.D.
Robert Wood Johnson Medical School

Test data set (CLINIC)

SUBJECT  GENDER  AGE_GROUP  BLOOD_TYPE   HR   SBP   DBP
   1       M         1          A        80   130    80
   2       M         1          B        68   128    70
   3       M         2          O         .   120    72
   4       M         1          A        48   140    86
   5       F         2          A        56   160    94
   6       F         1          B        60   109    64
   7       F         2          O        82   118    70
   8       F         2          O        64     .    76
   9       F         1          A        56     .    88
  10       F         1          B        88   188   110
  11       M         1          B        64   120    80
  12       M         2          B        62   120    76
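The slides do not show how CLINIC was created. A minimal sketch of a DATA step that would reproduce the listing above, assuming GENDER and BLOOD_TYPE are character variables and the remaining variables are numeric (the later listings are consistent with this, but the original program may have differed):

```sas
* Sketch only: variable types are an assumption, not taken from the slides;
DATA CLINIC;
   * List input: a $ marks a character variable, a period is a missing number;
   INPUT SUBJECT GENDER $ AGE_GROUP BLOOD_TYPE $ HR SBP DBP;
DATALINES;
1 M 1 A 80 130 80
2 M 1 B 68 128 70
3 M 2 O . 120 72
4 M 1 A 48 140 86
5 F 2 A 56 160 94
6 F 1 B 60 109 64
7 F 2 O 82 118 70
8 F 2 O 64 . 76
9 F 1 A 56 . 88
10 F 1 B 88 188 110
11 M 1 B 64 120 80
12 M 2 B 62 120 76
;
RUN;
```

With this data set in the WORK library, the PROC MEANS and PROC FREQ steps that follow can be run as written.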

PROC MEANS vs. PROC SUMMARY

PROC MEANS DATA=data_set_name NOPRINT;

is equivalent to

PROC SUMMARY DATA=data_set_name;

Creating a SUMMARY Data Set Containing MEANS

PROC MEANS DATA=CLINIC NOPRINT;
   /****************************************
   Equivalent to PROC SUMMARY DATA=CLINIC;
   *****************************************/
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

Listing of data set OUT1

Obs  GENDER  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
 1              0      12     66.1818   133.300   80.5000
 2     F        1       6     67.6667   143.750   83.6667
 3     M        1       6     64.4000   126.333   77.3333

Using a BY statement Instead of a CLASS Statement

PROC SORT DATA=CLINIC;
   BY GENDER;
RUN;

PROC MEANS DATA=CLINIC NOPRINT;
   BY GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

Listing of data set OUT1

Obs  GENDER  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
 1     F        0       6     67.6667   143.750   83.6667
 2     M        0       6     64.4000   126.333   77.3333

Creating a SUMMARY Data Set Containing MEANS Broken Down by GENDER and AGE_GROUP

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

GENDER  AGE_GROUP  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
            .         0      12     66.1818   133.300   80.5000
            1         1       7     66.2857   135.833   82.5714
            2         1       5     66.0000   129.500   77.6000
  F         .         2       6     67.6667   143.750   83.6667
  M         .         2       6     64.4000   126.333   77.3333
  F         1         3       3     68.0000   148.500   87.3333
  F         2         3       3     67.3333   139.000   80.0000
  M         1         3       4     65.0000   129.500   79.0000
  M         2         3       2     62.0000   120.000   74.0000

Explaining the _TYPE_ Variable

CLASS GENDER AGE_GROUP;

   Class Variables        Representation
GENDER   AGE_GROUP       Binary   Decimal
   0         0             00        0
   0         1             01        1
   1         0             10        2
   1         1             11        3
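One way to use these values is to pick a single level of summarization out of the output data set. The sketch below assumes CLINIC exists as listed earlier; the data set name BY_GENDER is made up for illustration. With CLASS GENDER AGE_GROUP, _TYPE_ = 2 (binary 10) marks the rows broken down by GENDER alone.

```sas
* Summarize by every combination of the two class variables;
PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

* Keep only the GENDER-level means (numeric _TYPE_ = 2, binary 10);
* BY_GENDER is a hypothetical data set name;
DATA BY_GENDER;
   SET OUT1;
   WHERE _TYPE_ = 2;
RUN;
```

Here _TYPE_ is numeric because the CHARTYPE option is not in effect, so it is compared with the number 2 rather than a character string.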

Demonstrating the NWAY Option

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

GENDER  AGE_GROUP  _TYPE_  _FREQ_     M_HR     M_SBP    M_DBP
  F         1         3       3     68.0000    148.5   87.3333
  F         2         3       3     67.3333    139.0   80.0000
  M         1         3       4     65.0000    129.5   79.0000
  M         2         3       2     62.0000    120.0   74.0000

Outputting More than One Statistic

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN   = M_HR M_SBP M_DBP
                   N      = N_HR N_SBP N_DBP
                   MAX    = MAX_HR MAX_SBP MAX_DBP
                   MEDIAN = MED_HR MED_SBP MED_DBP;
RUN;

GENDER  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP    N_HR  N_SBP
           0      12     66.1818   133.300   80.5000    11     10
  F        1       6     67.6667   143.750   83.6667     6      4
  M        1       6     64.4000   126.333   77.3333     5      6

N_DBP  MAX_HR  MAX_SBP  MAX_DBP  MED_HR  MED_SBP  MED_DBP
  12     88      188      110      64      124       78
   6     88      188      110      62      139       82
   6     80      140       86      64      124       78

Partial List of Some Available Statistics

Keyword   Description
________________________________________________________________
MEAN      Mean
N         Number of non-missing values
NMISS     Number of missing values
MIN       Smallest non-missing value
MAX       Largest value
MEDIAN    Median
RANGE     Range (difference between the minimum and maximum values)
Q1        25th percentile
Q3        75th percentile
QRANGE    Interquartile range (difference between the 25th and 75th percentiles)
STD       Standard deviation
STDERR    Standard error
UCLM      Upper bound of the 95% confidence interval
LCLM      Lower bound of the 95% confidence interval
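Any of these keywords can be used in the OUTPUT statement in the same way as MEAN. As a rough sketch, assuming the CLINIC data set listed earlier (the output data set SBP_STATS and its variable names are invented for illustration):

```sas
* Sketch: several statistics for SBP, one row per GENDER;
* SBP_STATS and the output variable names are hypothetical;
PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER;
   VAR SBP;
   OUTPUT OUT=SBP_STATS NMISS = NMISS_SBP
                        STD   = SD_SBP
                        LCLM  = LOW_SBP
                        UCLM  = HIGH_SBP;
RUN;
```

Each keyword is followed by one output variable name per analysis variable, in the order given on the VAR statement.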

Demonstrating the AUTONAME OUTPUT Option

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN   =
                   N      =
                   MAX    =
                   MEDIAN = / AUTONAME;
RUN;

GENDER  _TYPE_  _FREQ_   HR_Mean   SBP_Mean  DBP_Mean  HR_N  SBP_N
           0      12     66.1818   133.300   80.5000    11     10
  F        1       6     67.6667   143.750   83.6667     6      4
  M        1       6     64.4000   126.333   77.3333     5      6

DBP_N  HR_Max  SBP_Max  DBP_Max  HR_Median  SBP_Median  DBP_Median
  12     88      188      110       64         124          78
   6     88      188      110       62         139          82
   6     80      140       86       64         124          78

Another Way of Naming Output Variables

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=;
RUN;

Listing of Data Set OUT1

GENDER  AGE_GROUP  _TYPE_  _FREQ_      HR       SBP      DBP
  F         1         3       3     68.0000    148.5   87.3333
  F         2         3       3     67.3333    139.0   80.0000
  M         1         3       4     65.0000    129.5   79.0000
  M         2         3       2     62.0000    120.0   74.0000

Dropping Unneeded Variables in the Output Dataset

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1(DROP= _:) MEAN=M_HR M_SBP M_DBP;
RUN;

Listing of Data Set OUT1

GENDER  AGE_GROUP     M_HR     M_SBP    M_DBP
  F         1       68.0000    148.5   87.3333
  F         2       67.3333    139.0   80.0000
  M         1       65.0000    129.5   79.0000
  M         2       62.0000    120.0   74.0000

Demonstrating the CHARTYPE Procedure Option

PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

Demonstrating CHARTYPE Option

GENDER  AGE_GROUP  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
            .        00      12     66.1818   133.300   80.5000
            1        01       7     66.2857   135.833   82.5714
            2        01       5     66.0000   129.500   77.6000
  F         .        10       6     67.6667   143.750   83.6667
  M         .        10       6     64.4000   126.333   77.3333
  F         1        11       3     68.0000   148.500   87.3333
  F         2        11       3     67.3333   139.000   80.0000
  M         1        11       4     65.0000   129.500   79.0000
  M         2        11       2     62.0000   120.000   74.0000

Demonstrating the CHARTYPE Procedure Option

PROC PRINT DATA=OUT1 NOOBS;
   TITLE "Demonstrating CHARTYPE Option";
   WHERE _TYPE_ EQ "10";
RUN;

Demonstrating CHARTYPE Option

GENDER  AGE_GROUP  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
  F         .        10       6     67.6667   143.750   83.6667
  M         .        10       6     64.4000   126.333   77.3333

Another Way to Name Variables (instead of using a VAR statement)

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   ***VAR STATEMENT OPTIONAL;
   OUTPUT OUT=OUT1 MEAN(HR)        = M_HR
                   N(HR SBP DBP)   = N_HR N_SBP N_DBP
                   MAX(SBP)        = MAX_SBP
                   MEDIAN(SBP DBP) = MED_SBP MED_DBP;
RUN;

GENDER  _TYPE_  _FREQ_     M_HR    N_HR  N_SBP  N_DBP  MAX_SBP  MED_SBP  MED_DBP
           0      12     66.1818    11     10     12     188      124       78
  F        1       6     67.6667     6      4      6     188      139       82
  M        1       6     64.4000     5      6      6     140      124       78

Multi-way Breakdowns Using a TYPES Statement

PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP BLOOD_TYPE;
   VAR HR SBP DBP;
   TYPES GENDER AGE_GROUP*GENDER BLOOD_TYPE*GENDER;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
  F         .                   100       6     67.6667   143.750   83.6667
  M         .                   100       6     64.4000   126.333   77.3333
  F         .          A        101       2     56.0000   160.000   91.0000
  F         .          B        101       2     74.0000   148.500   87.0000
  F         .          O        101       2     73.0000   118.000   73.0000
  M         .          A        101       2     64.0000   135.000   83.0000
  M         .          B        101       3     64.6667   122.667   75.3333
  M         .          O        101       1       .       120.000   72.0000
  F         1                   110       3     68.0000   148.500   87.3333
  F         2                   110       3     67.3333   139.000   80.0000
  M         1                   110       4     65.0000   129.500   79.0000
  M         2                   110       2     62.0000   120.000   74.0000

Using the _TYPE_ Values to Create Multiple Data Sets

DATA GENDER AGE_BY_GENDER BLOOD_BY_GENDER;
   SET OUT1;
   IF _TYPE_ = "100" THEN OUTPUT GENDER;
   ELSE IF _TYPE_ = "110" THEN OUTPUT AGE_BY_GENDER;
   ELSE IF _TYPE_ = "101" THEN OUTPUT BLOOD_BY_GENDER;
RUN;

Listing of Data Set GENDER

GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
  F         .                   100       6     67.6667   143.750   83.6667
  M         .                   100       6     64.4000   126.333   77.3333

Listing of Data Set AGE_BY_GENDER

GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_     M_HR     M_SBP    M_DBP
  F         1                   110       3     68.0000    148.5   87.3333
  F         2                   110       3     67.3333    139.0   80.0000
  M         1                   110       4     65.0000    129.5   79.0000
  M         2                   110       2     62.0000    120.0   74.0000

Examples of TYPES Statements

TYPES A A*C D*C;

TYPES A*(B C D);

TYPES () A A*C*D;
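In these forms, A*(B C D) is shorthand for A*B A*C A*D, and the empty parentheses () request the overall (grand total) summary. A short sketch of both forms applied to the CLINIC data, assuming the data set listed earlier; the output data set name OUT2 is made up for illustration:

```sas
* Sketch: overall means plus GENDER crossed with each of two other class variables;
* OUT2 is a hypothetical data set name;
PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP BLOOD_TYPE;
   VAR HR SBP DBP;
   * Same request as TYPES () GENDER*AGE_GROUP GENDER*BLOOD_TYPE;
   TYPES () GENDER*(AGE_GROUP BLOOD_TYPE);
   OUTPUT OUT=OUT2 MEAN=M_HR M_SBP M_DBP;
RUN;
```

With CHARTYPE in effect and this CLASS order, the rows in OUT2 should carry _TYPE_ values of "000" (overall), "110" (GENDER by AGE_GROUP), and "101" (GENDER by BLOOD_TYPE).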

Using PROC FREQ to Count Frequencies

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=NUMBER;
RUN;

Listing of Data Set NUMBER

AGE_GROUP  COUNT  PERCENT
    1        7    58.3333
    2        5    41.6667

Renaming the COUNT Variable

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=NUMBER(RENAME=(COUNT=N_AGE) DROP=PERCENT);
RUN;

Listing of Data Set NUMBER

AGE_GROUP  N_AGE
    1        7
    2        5

Using PROC MEANS to Count Frequencies

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS AGE_GROUP;
   VAR HR; /* ANY NUMERIC VARIABLE */
   OUTPUT OUT=COUNTS(RENAME=(_FREQ_ = N_AGE) DROP=_TYPE_ DUMMY) N=DUMMY;
RUN;

Listing of Data Set COUNTS

AGE_GROUP  N_AGE
    1        7
    2        5

Using PROC FREQ to Count Frequencies in a Two-way Table

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES GENDER*BLOOD_TYPE / OUT=FREQOUT(DROP=PERCENT
                                          RENAME=(COUNT=NUMBER));
RUN;

Listing of Data Set FREQOUT

GENDER  BLOOD_TYPE  NUMBER
  F         A          2
  F         B          2
  F         O          2
  M         A          2
  M         B          3
  M         O          1

Using PROC FREQ to Output More than One Data Set

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=OUT1;
   TABLES GENDER / OUT=OUT2;
   TABLES GENDER*AGE_GROUP / OUT=OUT3;
RUN;

Listing of Data Set OUT1

AGE_GROUP  COUNT  PERCENT
    1        7    58.3333
    2        5    41.6667
----------------------------------------------------------------
Listing of Data Set OUT2

GENDER  COUNT  PERCENT
  F       6       50
  M       6       50
----------------------------------------------------------------
Listing of Data Set OUT3

GENDER  AGE_GROUP  COUNT  PERCENT
  F         1        3    25.0000
  F         2        3    25.0000
  M         1        4    33.3333
  M         2        2    16.6667