2013 A Closer Look at Proc Compare - Sas Institute Group... · 2016-03-11 · Humaira Khair A Closer Look at Proc Compare 1. How PROC COMPARE makes my work easy? Proc Compare is a

Humaira Khair

A Closer Look at Proc Compare

How does PROC COMPARE make my work easier?

PROC COMPARE is a SAS procedure that compares two data sets: their attributes, their variables, and their observations.

For a dataset, we can find differences in:

date of creation, last modification of the datasets,

number of variables and observations of the datasets.

For matching variables, we can get output about differences in:

Values, type, length, formats, informats and labels.

For observations, we can compare the values of matching observations. We can also decide how different two values may be before they are reported as unequal.
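As a minimal sketch of controlling how different values may be, the METHOD= and CRITERION= options set a tolerance for numeric comparisons. The data set names th_old and th_new are the ones used later in this deck; the tolerance of 0.01 is only an assumed value for illustration.

```sas
/* Treat numeric values as equal when they differ by no more than 0.01. */
/* METHOD=ABSOLUTE compares the absolute difference with CRITERION=.    */
/* The value 0.01 is an assumption chosen for illustration.             */
proc compare base=th_old compare=th_new
             method=absolute criterion=0.01 ;
run ;
```

Without these options the comparison uses Method=EXACT, which is what the default report in this deck shows.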

PROC COMPARE produces lengthy output. With well-chosen options and statements, we can compare pairs of SAS data sets at multiple levels without needing MERGEs in a DATA step or SQL JOINs.

proc compare BASE=th_old COMPARE=th_new ;
title 'Proc Compare with no options' ;
run ;

BASE=     Specifies the base data set
COMPARE=  Specifies the comparison data set

Proc Compare with no options

The COMPARE Procedure

Comparison of WORK.TH_OLD with WORK.TH_NEW

(Method=EXACT)

Data Set Summary

Dataset       Created           Modified          NVar  NObs
WORK.TH_OLD   23OCT13:11:56:06  23OCT13:11:56:06    10    13
WORK.TH_NEW   23OCT13:11:56:06  23OCT13:11:56:06    10    14

Variables Summary

Number of Variables in Common: 10.

Number of Variables with Differing Attributes: 10.

Number of BY Variables: 2.

Listing of Common Variables with Differing Attributes

Variable Dataset Type Length Format Informat

crnum WORK.TH_OLD Char 7

WORK.TH_NEW Char 7 $7.

dxdt WORK.TH_OLD Num 8 YYMMDD10.

WORK.TH_NEW Num 8 YYMMDD10.

rectype WORK.TH_OLD Char 11 $11. $11.

WORK.TH_NEW Char 8 $8.

recloc WORK.TH_OLD Char 10 $10. $10.

WORK.TH_NEW Char 7 $7.

vitalst WORK.TH_OLD Char 8 $8. $8.

WORK.TH_NEW Char 1 $1.

t7 WORK.TH_OLD Char 15 $15. $15.

WORK.TH_NEW Char 12 $12.

cci3 WORK.TH_OLD Char 11 $11. $11.

WORK.TH_NEW Char 8

cci4 WORK.TH_OLD Char 11 $11. $11.

WORK.TH_NEW Char 8

txdt3 WORK.TH_OLD Num 8 DATE9.

WORK.TH_NEW Num 8 DDMMYY10.

txdt4 WORK.TH_OLD Num 8 DATE9.

WORK.TH_NEW Num 8 DDMMYY10.

Observation Summary

Observation     Base  Compare
First Obs          1        1
First Unequal      1        1
Last Unequal      13       13
Last Match        13       13
Last Obs           .       14

Number of Observations in Common: 13.

Number of Observations in WORK.TH_NEW but not in WORK.TH_OLD: 1.

Total Number of Observations Read from WORK.TH_OLD: 13.

Total Number of Observations Read from WORK.TH_NEW: 14.

Number of Observations with Some Compared Variables Unequal: 13.

Number of Observations with All Compared Variables Equal: 0.

Values Comparison Summary

Number of Variables Compared with All Observations Equal: 1.

Number of Variables Compared with Some Observations Unequal: 9.

Number of Variables with Missing Value Differences: 4.

Total Number of Values which Compare Unequal: 65.

Maximum Difference: 138.

Variables with Unequal Values

Variable  Type  Len1  Len2  Label    Ndif  MaxDif  MissDif
crnum     CHAR     7     7             10                0
dxdt      NUM      8     8  dxdt        9     138        0
recloc    CHAR    10     7  recloc      4                0
vitalst   CHAR     8     1  vitalst     1                0
t7        CHAR    15    12  t7         13                0
cci3      CHAR    11     8  cci3       11               11
cci4      CHAR    11     8  cci4        3                3
txdt3     NUM      8     8  txdt3      11       0       11
txdt4     NUM      8     8  txdt4       3       0        3

Value Comparison Results for Variables

__________________________________________________________

|| ($) CancerCare registry number

|| Base Value Compare Value

Obs || crnum crnum

________ || _______ _______

||

4 || 866 723

5 || 488 866

6 || 834 488

<skip>

__________________________________________________________

__________________________________________________________

|| dxdt

|| (N) diagnosis date (SAS)

|| Base Compare

Obs || dxdt dxdt Diff. % Diff

________ || _________ _________ _________ _________

||

4 || 2010-12-14 2010-07-29 -138.0000 -0.7415

6 || 2010-11-26 2010-12-14 18.0000 0.0968

7 || 2010-12-13 2010-11-26 -17.0000 -0.0914

<skip>

________________________________________________________

__________________________________________________________

|| recloc

|| ($) record location

|| Base Value Compare Value

Obs || recloc recloc

________ || __________ _______

||

2 || mcc sbu

3 || sbu mcc

4 || mcc sbu

8 || sbu mcc

__________________________________________________________

_________________________________________________________

|| vitalst

|| ($) vital status

|| Base Value Compare Value

Obs || vitalst vitalst

________ || ________ _

||

13 || a d

__________________________________________________________

__________________________________________________________

|| t7

|| ($) Tumour stage according to AJCC 7th edition

|| Base Value Compare Value

Obs || t7 t7

________ || _______________ ____________

||

1 || T1b T1b(m)

2 || T1b T1b(s)

3 || T1b T1b(s)

4 || T1b T2(m)

5 || T2 T1b(s)

6 || T1a T2(m)

7 || T3 T1a(s)

8 || T1b T3(s)

9 || T2 T1b(m)

10 || T3 T2(s)

11 || T3 T3(s)

12 || T3 T3(m)

13 || T1b T3(s)

__________________________________________________________

__________________________________________________________

|| cci3 ($) Treatment code3

|| Base Value Compare Value

Obs || cci3 cci3

________ || ___________ ________

2 || 1FU59HAV

4 || 1FU59CAV

5 || 1FU59HAV

6 || 1FU59CAV

7 || 1FU59CAV

8 || 1FU59CAV

9 || 1FU59CAV

10 || 1FU59CAV

11 || 1MC87LA

12 || 1MC87LA

13 || 1FU59CAV

__________________________________________________________

__________________________________________________________

|| cci4 ($) Treatment code4

|| Base Value Compare Value

Obs || cci4 cci4

________ || ___________ ________

8 || 1MC87LA

11 || 1FU59HAV

12 || 1FU59CAV

__________________________________________________________

__________________________________________________________

|| txdt3

|| Base Compare

Obs || txdt3 txdt3 Diff. % Diff

________ || _________ _________ _________ _________

2 || . 02/12/11 . .

4 || . 18/11/11 . .

5 || 22JUN2011 . . .

6 || . 28/06/11 . .

7 || 24MAY2011 . . .

8 || . 24/05/11 . .

9 || 25FEB2011 . . .

10 || . 25/02/11 . .

11 || 10DEC2010 . . .

12 || . 10/12/10 . .

13 || 01APR2011 . . .

__________________________________________________________

__________________________________________________________

|| txdt4

|| Base Compare

Obs || txdt4 txdt4 Diff. % Diff

________ || _________ _________ _________ _________

||

8 || . 15/06/12 . .

11 || 09MAY2011 . . .

12 || . 09/05/11 . .

__________________________________________________________

Now let’s start adding some options:

proc compare BASE=th_old COMPARE=th_new NOVALUES LISTVAR ;

title 'Proc Compare: If we want to compare the contents of the data sets' ;

run ;

Adding these two options changes the report: NOVALUES suppresses the “Value Comparison Results for Variables” section, and LISTVAR lists any variables that are found in only one of the data sets.

proc compare BASE=th_old COMPARE=th_new
NOVALUES WARNING NOPRINT ;
title 'PROC COMPARE with NOVALUES, WARNING & NOPRINT options' ;
run ;

Here the NOPRINT option suppresses all printed output, and the WARNING option writes a warning to the SAS log when differences are found.

WARNING: 10 variables have conflicting attributes in the two data sets.
WARNING: Data set WORK.TH_NEW contains 1 observations not in WORK.TH_OLD.
WARNING: Values of the following 9 variables compare unequal: crnum dxdt recloc vitalst t7 cci3 cci4 txdt3 txdt4
WARNING: The data sets WORK.TH_OLD and WORK.TH_NEW contain unequal values.

In most cases, the main goal is to compare the values of variables for matching observations in two data sets, matched on the ID variable(s). Before using the ID statement, we need to sort both data sets by the ID variable(s). The output lists the differences for each ID value and can be very lengthy, so we add some other options.

proc compare BASE=th_old COMPARE=th_new OUT=th_change
NOSUMMARY OUTBASE OUTCOMPARE OUTDIF OUTNOEQUAL
NOPRINT ;
id crnum ;
title 'PROC COMPARE using ID statement' ;
run ;

I have found this particular way of using PROC COMPARE to be very helpful for checking that expected updates to a data set have been made.
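The sort step mentioned above is not shown on the slide. A minimal sketch of the full sequence, assuming crnum is the only ID variable and that both data sets live in WORK, might look like this:

```sas
/* Sort both data sets by the ID variable before comparing. */
proc sort data=th_old ;
  by crnum ;
run ;

proc sort data=th_new ;
  by crnum ;
run ;

/* Match observations on crnum and write only the unequal pairs to th_change. */
proc compare base=th_old compare=th_new out=th_change
             outbase outcompare outdif outnoequal noprint ;
  id crnum ;
run ;
```

Sorting in place overwrites the original order of th_old and th_new; use OUT= on PROC SORT if that order needs to be kept.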

Control the output data set

OUT=          Create an output data set
OUTDIF        Write an observation that contains the differences for each pair of matching observations
OUTNOEQUAL    Suppress the writing of observations when all values are equal

Control the details in the default report

BRIEFSUMMARY  Print only a short comparison summary
NOPRINT       Suppress all printed output

_TYPE_ _OBS_ crnum dxdt recloc vitalst t7 cci3 cci4 txdt3 txdt4

BASE 2 138 2010-12-14 mcc a T1b . .

COMPARE 2 138 2010-12-14 sbu a T1b(s) 1FU59HAV 2011-12-02 .

DIF 2 138 E XXX....... ........ ...XXX......... XXXXXXXX... ........... . E

BASE 4 866 2010-12-14 mcc a T1b . .

COMPARE 5 866 2010-12-14 mcc a T1b(s) . .

DIF 5 866 E .......... ........ ...XXX......... ........... ........... E E

BASE 5 488 2010-12-14 mcc a T2 1FU59HAV 2011-06-22 .

COMPARE 6 488 2010-12-14 mcc a T2(m) 1FU59CAV 2011-06-28 .

DIF 6 488 E .......... ........ ..XXX.......... .....X..... ........... 1960-01-07 E

BASE 6 834 2010-11-26 mcc a T1a . .

COMPARE 7 834 2010-11-26 mcc a T1a(s) . .

DIF 7 834 E .......... ........ ...XXX......... ........... ........... E E

BASE 7 201 2010-12-13 mcc a T3 1FU59CAV 2011-05-24 .

COMPARE 8 201 2010-12-13 mcc a T3(s) 1FU59CAV 1MC87LA 2011-05-24 2012-06-15

DIF 8 201 E .......... ........ ..XXX.......... ........... XXXXXXX.... E .

BASE 8 389 2010-11-29 sbu a T1b . .

COMPARE 9 389 2010-11-29 sbu a T1b(m) . .

DIF 9 389 E .......... ........ ...XXX......... ........... ........... E E

BASE 9 818 2010-09-29 sbu a T2 1FU59CAV 2011-02-25 .

COMPARE 10 818 2010-09-29 sbu a T2(s) 1FU59CAV 2011-02-25 .

DIF 10 818 E .......... ........ ..XXX.......... ........... ........... E E

BASE 10 676 2010-12-13 sbu a T3 . .

COMPARE 11 676 2010-12-13 sbu a T3(s) . .

DIF 11 676 E .......... ........ ..XXX.......... ........... ........... E E

BASE 11 693 2010-09-30 sbu a T3 1MC87LA 1FU59HAV 2010-12-10 2011-05-09

COMPARE 12 693 2010-09-30 sbu a T3(m) 1MC87LA 1FU59CAV 2010-12-10 2011-05-09

DIF 12 693 E .......... ........ ..XXX.......... ........... .....X..... E E

BASE 12 358 2010-11-25 sbu a T3 . .

COMPARE 13 358 2010-11-25 sbu d T3(s) . .

DIF 13 358 E .......... X....... ..XXX.......... ........... ........... E E

BASE 13 161 2010-07-29 sbu a T1b 1FU59CAV 2011-04-01 .

COMPARE 14 161 2010-07-29 sbu a T1b(m) 1FU59CAV 2011-04-01 .

DIF 14 161 E .......... ........ ...XXX......... ........... ........... E E

• _TYPE_ (Type of Observation) is a character variable. Its value indicates the source of the values for the matching variables in that observation. For this example it has the values BASE, COMPARE and DIF since OUTBASE OUTCOMPARE OUTDIF options were specified.

• _OBS_ (Observation Number) is a numeric variable containing a number further identifying the source of the OUT= observations. For observations with _TYPE_ equal to DIF, _OBS_ is a sequence number that counts the matching observations in the BY group.

• For numeric variables, E indicates the values are equal on that variable and that observation.

• For character variables, a period (.) is included for each position that is the same between the two data sets and an X is used to designate unequal characters.

• The OUTBASE and OUTCOMPARE options also ensure that non-matching observations (i.e. the ID value is in one data set and not the other) will be included in the output data set.
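Because _TYPE_ is an ordinary character variable, the OUT= data set can be filtered like any other data set. As a small sketch, assuming the th_change data set created earlier, the difference rows alone could be printed like this:

```sas
/* Print only the DIF rows from the OUT= data set. */
proc print data=th_change noobs ;
  where _type_ = 'DIF' ;
  title 'Difference rows from the OUT= data set' ;
run ;
```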

No need to Rename

The WITH statement lets us compare variables that have different names in the two data sets. Variables in the VAR and WITH statements are matched one-to-one, in the order listed.

proc compare BASE=th_old COMPARE=th_new ;

id crnum ;

var sex age height weight ;

with gender age_yrs ht wt ;

run ;

Check for Formatted Values

PROC COMPARE compares unformatted values. If two matching variables are formatted differently, PROC COMPARE lists the formats of the variables.

PROC COMPARE is a validation tool that is worth getting to know.

There are several options available to customize the PROC COMPARE output. Which ones to explore depends on the purpose of your work.

References:

http://ciser.cornell.edu/sasdoc/saspdf/proc/c09.pdf

http://analytics.ncsu.edu/sesug/2011/BB12.Williams.pdf

http://www.hasug.org/newsletters/hasug200711/Proc_Compare.pdf

http://www.lexjansen.com/pharmasug/2003/tutorials/tu056.pdf

Questions?