2b Or Nt 2b
Jonathan Mac Kenzie

SAS Users Group Dec 2005

To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
The use of the if statement,
Or to take the Case statement in Proc sql,
Or to use a format to recode data?



Introduction

Who am I?

Jonathan McKenzie, HR Analyst at Statistics New Zealand

Have been using SAS for over 5 years and assisting in further developing others' use of SAS at Statistics NZ

Father of 3 girls


What is this session about?

SAS is a tool that provides many different approaches
This paper focuses on recoding data, including:
  Reviewing the If statement
  Introducing the Case statement in Proc SQL
  Introducing using formats in the data step,
  Plus some other useful tips and tricks
This paper is about introducing options for doing similar tasks, not about looking in depth at how to write and use each of the options

Questions, feel free to interrupt as we go


The data

Statistics NZ’s Human Resources Information Management System (HRMIS) is stored in SQL Server
Using data stored in SQL Server database
  Extract using mainly SAS
  One of the reasons for the use of SAS is the ease of extraction, manipulation and reformatting of the data

Most of the data in the HRMIS is stored as codes
  Great for minimising storage space,
  But of limited use for presenting results.

One of the key pieces of information I report on is the Resource centre,
  Used for allocating expenditures,
  Stored as a 2 digit character with values of ‘01’, ‘03’, ‘07’, ‘10’.


Resource Centres

Code  2002 description      2004 description                      2005 description                Summary
01    Executive Management  Executive Management                  Executive Management            Corporate Services
03    Finance               Corp Money                            Finance                         Corporate Services
07    Human Resources       Organisational Performance Solutions  Human Resources                 Corporate Services
13    Planning & Talking    Planning & Talking                    Strategy and Communication      Strategy
14    Planning & Figures    Planning & Important Figures          Strategy & Official Statistics  Strategy


All data in this presentation is a figment of the author's imagination, and any similarity to existing data is pure coincidence.

No animals were harmed in the preparation of this presentation.


If Statement

IF expression THEN statement;
<ELSE statement;>

if Resource_centre = '01' then Resource_centre_desc = 'Executive Management';
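
For more than one code, the single statement above grows into an IF / ELSE IF chain. The following is a minimal sketch only: the data set names `work.staff` and `work.staff_desc`, the `Employee_id` column, the sample rows and the 'Unknown' fallback are assumptions made for illustration. The descriptions are the 2005 values from the Resource Centres table.

```sas
/* Hypothetical sample data: only Resource_centre comes from the slides */
data work.staff;
    input Employee_id Resource_centre $;
    datalines;
1 01
2 03
3 07
4 99
;
run;

data work.staff_desc;
    set work.staff;
    /* Declare the length first; otherwise the new column takes the
       length of the first value assigned and longer text is truncated */
    length Resource_centre_desc $ 40;
    if Resource_centre = '01' then Resource_centre_desc = 'Executive Management';
    else if Resource_centre = '03' then Resource_centre_desc = 'Finance';
    else if Resource_centre = '07' then Resource_centre_desc = 'Human Resources';
    else Resource_centre_desc = 'Unknown'; /* assumed fallback for unmapped codes */
run;
```

Using ELSE IF rather than a series of separate IF statements means SAS stops testing a row as soon as one condition matches.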


If Statement (cont)

Effective option
Lengthy data step if large number of codes
Not very reusable
Codes or descriptions changing means updating all occurrences of code
Can create in Excel


Using the Case Expression in Proc SQL

Proc SQL is great for extracting data from databases
The Case expression provides similar functionality to the IF statement

CASE
    WHEN when-condition THEN result-expression
    <WHEN when-condition THEN result-expression>...
    <ELSE result-expression>
END
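
Applied to the Resource centre codes, the syntax above might look like the sketch below. The table names `work.staff` and `work.staff_desc`, the `Employee_id` column, the sample rows and the 'Unknown' fallback are illustrative assumptions; the descriptions are the 2005 values from the Resource Centres table.

```sas
/* Hypothetical sample data: only Resource_centre comes from the slides */
data work.staff;
    input Employee_id Resource_centre $;
    datalines;
1 01
2 03
3 07
4 99
;
run;

proc sql;
    create table work.staff_desc as
    select Employee_id,
           Resource_centre,
           /* Conditions are tested in order; the first match wins */
           case
               when Resource_centre = '01' then 'Executive Management'
               when Resource_centre = '03' then 'Finance'
               when Resource_centre = '07' then 'Human Resources'
               else 'Unknown'   /* assumed fallback for unmapped codes */
           end as Resource_centre_desc length=40
    from work.staff;
quit;
```

Without the ELSE branch, rows that match no WHEN condition would get a missing value in the new column.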


Using the Case Expression in Proc SQL (cont)

Effective option like IF statement
Lengthy query if large number of codes
Not very reusable
Codes or descriptions changing means updating all occurrences of code


Using User Defined Formats

Formats can be used to similar effect as the IF statement or Case expression
Created using Proc Format, either from code or an existing data set
Can be used in either data or proc steps
If codes or descriptions change, only the format needs to be redefined and recreated to take effect wherever the format is used
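
The points above can be sketched as follows, first defining a format in code, then building one from an existing data set with the CNTLIN= option, and finally using it in a data step and a proc step. The format names `$rc_desc` and `$rc_look`, the data sets `work.staff` and `work.rc_lookup`, the sample rows and the 'Unknown' fallback are all assumptions for illustration.

```sas
/* Option 1: define the format in code (2005 descriptions) */
proc format;
    value $rc_desc
        '01' = 'Executive Management'
        '03' = 'Finance'
        '07' = 'Human Resources'
        other = 'Unknown';   /* assumed fallback for unmapped codes */
run;

/* Option 2: build a format from an existing data set.
   CNTLIN= expects the columns FMTNAME, START and LABEL;
   TYPE='C' marks it as a character format. */
data work.rc_lookup;
    length fmtname $ 8 type $ 1 start $ 2 label $ 40;
    retain fmtname 'rc_look' type 'C';
    /* & lets the label contain single embedded blanks */
    input start $ label & $40.;
    datalines;
01  Executive Management
03  Finance
07  Human Resources
;
run;

proc format cntlin=work.rc_lookup;
run;

/* Hypothetical sample data: only Resource_centre comes from the slides */
data work.staff;
    input Employee_id Resource_centre $;
    datalines;
1 01
2 03
3 07
;
run;

/* Data step: create a description column with PUT */
data work.staff_desc;
    set work.staff;
    length Resource_centre_desc $ 40;
    Resource_centre_desc = put(Resource_centre, $rc_desc.);
run;

/* Proc step: attach the format; the stored codes are unchanged */
proc freq data=work.staff;
    tables Resource_centre;
    format Resource_centre $rc_look.;
run;
```

If a description changes, only the PROC FORMAT step (or the lookup data set) needs editing; the data and proc steps that use the format stay as they are.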


Dealing with Historic data

Codes and descriptions can change between years
Using formats can simplify working with data across several years when used with the putn (for numeric columns) or putc (for character columns) functions
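
One way to read this is to keep one format per year and let putc pick the format at run time from a year column. The sketch below assumes that approach; the format names `$rc2002_`, `$rc2004_` and `$rc2005_`, the data set `work.staff_history`, its `Year` column and the sample rows are hypothetical. The descriptions for codes 03 and 07 come from the Resource Centres table.

```sas
/* One character format per year; a format name cannot end
   in a digit, hence the trailing underscore */
proc format;
    value $rc2002_
        '03' = 'Finance'
        '07' = 'Human Resources';
    value $rc2004_
        '03' = 'Corp Money'
        '07' = 'Organisational Performance Solutions';
    value $rc2005_
        '03' = 'Finance'
        '07' = 'Human Resources';
run;

/* Hypothetical multi-year extract */
data work.staff_history;
    input Year Resource_centre $;
    datalines;
2002 03
2004 03
2004 07
2005 07
;
run;

data work.staff_history_desc;
    set work.staff_history;
    length Resource_centre_desc $ 40;
    /* Build the format name from the year, e.g. '$rc2004_.',
       and apply it to the character code with PUTC */
    Resource_centre_desc = putc(Resource_centre, cats('$rc', Year, '_.'));
run;
```

Unlike put, which needs the format name fixed when the step is compiled, putc accepts the format name as a character expression, so each row can use the format for its own year. A year with no matching format would need separate handling.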


Comparison of options

Option                       Advantages                                Disadvantages
If Statement                 Easy to write                             Can get long for a large number of codes
                             Commonly used and understood              Only valid in a data step
                             Great when using a data step
Case Expression in Proc SQL  Great when using proc sql                 Can get long for a large number of codes
                                                                       Only valid when using proc sql
User defined Formats         Can be used in either data or proc steps  Not as clear as other options
                             Means more concise data steps             More complex to understand and modify


Conclusion

No one right answer
Depends on:
  An individual’s knowledge,
  The number of codes,
  Frequency of use,
  The number of rows in the tables,
  Where and how the data is stored

Best approach is to know the options, their advantages and disadvantages and choose the most appropriate solution for the situation.


Questions?