26
IOWA STATE UNIVERSITY Department of Animal Science Modifying and Combing SAS Data Sets (Chapter in the 6 Little SAS Book) Animal Science 500 Lecture No. 6 September 16, 2010

I OWA S TATE U NIVERSITY Department of Animal Science Modifying and Combing SAS Data Sets (Chapter in the 6 Little SAS Book) Animal Science 500 Lecture

Embed Size (px)

Citation preview

IOWA STATE UNIVERSITYDepartment of Animal Science

Modifying and Combing SAS Data Sets

(Chapter in the 6 Little SAS Book)

Animal Science 500

Lecture No. 6

September 16, 2010

IOWA STATE UNIVERSITYDepartment of Animal Science

Modifying Variable Lengthsu The code below uses only 3 character spaces for the

county by specifying the length of the location variable

u You can also use this with numeric variables

data new;length location $3;input location $ 1-15 date $ rainfall;Cards; /*remember you could use the datalines statement to do the same thing);*/Story county 6/10/10 4.5Polk county 6/10/10 6.5Story county 7/10/10 4.9Polk county 7/10/10 2.4;Run;Quit;

IOWA STATE UNIVERSITYDepartment of Animal Science

Creating new variables

u You can create variables in the initial data step where you are inputting the data or you can use a new data step to create the variables. n As we did when we created

1. Adjusted backfat variables for barrows and gilts

2. Total weight gain during the test period

3. Average daily gain during the test period

u You do not have to make calculations when making new variables.n You could divide backfat, loin muscle are,

gain, etc into categories to evaluate if model effects differ by the classes you developed

IOWA STATE UNIVERSITYDepartment of Animal Science

Creating new variables

u If you choose to create a new data step – you will need to either create a new file in SAS or modify the existing file. n Example what we have done in Lab by the

following statements

Data Pig13 Set Pig12;

n Alternatively you could create a new file in SASl If you create a new file you may need to merge it with

the original data file;l This is a place where students often have difficulty

IOWA STATE UNIVERSITYDepartment of Animal Science

Creating New Variables in SAS

u Both of these options will result in a data set named “New” with all the variables that have been defined

u This option creates the variables lograinfall and sqrtrainfall in the initial data step

u In the second set of code you are creating a new file in SAS and naming it “New” the set statement tells SAS to Assign the data from the first “New” to this file “New”

data new;input @1 location $ 1-15 date mmddyy8. rainfall;log_rain = log(rainfall);sqrt_rain= sqrt(rain);datalines;Story county 6/10/10 4.5Polk county 6/10/10 6.5Story county 7/10/10 4.9Polk county 7/10/10 2.4;run;

data new; set new;log_rain = log(rainfall);sqrt_rain = sqrt(rain);run;

IOWA STATE UNIVERSITYDepartment of Animal Science

Creating New Variables in SASu SAS has many other functions to perform various calculations for

trigonometry, finance, and other applications. n Some examples assuming x is the variable you want to modify:

l Log = log(x)l Sin = sin(x)l Cos = cos(x)

n As we will see in the next lab, you may find the distribution does not meet the assumptions of the analysis of variance

n Will need to transform data often using some of the u Use SAS help to find the correct notationu Some helpful search hints are:

n Search under SAS Functions, Arithmetic Functions, Numeric Variables, Logical Operators

IOWA STATE UNIVERSITYDepartment of Animal Science

Operatorsu Recall from previous discussions

u Addition, subtraction, multiplication, and division are specified by +, -, *, and /, respectively.

u For exponentiation, a double asterisk ** is used. n exprainfall = exprainfall**2

u Parentheses can be used to group expressions, and these expressions can be nested within several levels. SAS follows the standard PEMDAS order, () ** * / + - ,

for evaluating functions.

IOWA STATE UNIVERSITYDepartment of Animal Science

Logical Operatorsu SAS can also evaluate

logical expressionsn < ,>, =, if, then, else,

else if, and (&), or(|), not (^)…

n Search Logical Operators in SAS Help

data new;input @1 location $ 1-15 date mmddyy8. rainfall;*creating a new variable based on location;if location = “Story county" then x = rainfall +5;else x = 5;*creating a new variable based on level of rainfall;if rainfall < 3 then y=1;else if rainfall < 4.9 then y=2;else y = 3;datalines;Story county 6/10/10 4.5Polk county 6/10/10 6.5Story county 7/10/10 4.9Polk county 7/10/10 2.4;Run;Quit;data new; set new;log_rain = log(rainfall);sqrt_rain = sqrt(rain);run;

IOWA STATE UNIVERSITYDepartment of Animal Science

Other ways to Modify Variables - Do Loopsu DO loops can be used to create an ordered sequence of numbers.

u Below is an example of a do loop in SAS

uThe program "loops" through the values of Q from 1 to 5 and performs the calculations requested for the current value of Q. The OUTPUT statement tells SAS to export Q and the new variables to the dataset EXAMPLE. The END statement signifies the end of the loop. An END statement is necessary for each DO statement! Notice that neither INPUT nor DATALINES statements are used.

data do; set new; do q=1 to 5; q_rain=q*rainfall; q_rainsquared=q_rain**2; output; end; proc print data = do; run;Quit;

IOWA STATE UNIVERSITYDepartment of Animal Science

Modifying your data using PROC Transpose

u Sometimes you need to reshape your data which is in a long format (shown below)

FamID Year FamInc

1 2007 40000

1 2008 40500

1 2009 41000

2 2007 45000

2 2008 45400

2 2009 45800

3 2007 75000

3 2008 76000

3 2009 77000

IOWA STATE UNIVERSITYDepartment of Animal Science

u into a wide format (shown below).

Modifying your data using PROC Transpose

FamID FamInc07 FamInc08 FamInc0

1 40000 40500 41000

2 45000 45400 45800

3 75000 76000 77000

IOWA STATE UNIVERSITYDepartment of Animal Science

Modifying your data using PROC Transpose

u How do we accomplish this?

u SAS proc transpose to reshape the data from a long to a wide format.

IOWA STATE UNIVERSITYDepartment of Animal Science

Modifying your data using PROC Transpose data long1 ; input famid year faminc ; cards ; 1 96 40000 1 97 40500 1 98 41000 2 96 45000 2 97 45400 2 98 45800 3 96 75000 3 97 76000 3 98 77000 ; run; quit;proc transpose data=long1 out=wide1

prefix=faminc; by famid ; id year; var famin;run;

quit; proc print data = wide1;

run;quit;

Notice that the option prefix= faminc specifies a prefix to use in constructing names for transposed variables in the output data set. SAS automatic variable _NAME_ contains the name of the variable being transposed

IOWA STATE UNIVERSITYDepartment of Animal Science

Modifying your data using PROC Transpose

u What does this get you?

u SAS output that looks like the following

Obs famid _Name_ faminc96 faminc97 faminc98

1 1 faminc 40000 40500 41000

2 2 faminc 45000 45400 45800

3 3 faminc 75000 76000 77000

IOWA STATE UNIVERSITYDepartment of Animal Science

Removing Observations

u When performing statistical calculations, SAS, by default, uses all of the observations that are in the dataset.

u You can selectively delete observations that you do not wish to use with:n IF statements - specify which observations to

keep, n IF and THEN DELETE will delete observations.

IOWA STATE UNIVERSITYDepartment of Animal Science

Removing Observations

u Both statements result in a file with only data from Story County

data new; set new;if location = “Story county";Run;Quit;

data new; set new; if location = “Polk county" then delete;run;

IOWA STATE UNIVERSITYDepartment of Animal Science

Removing Variables

u You may only need to use a few variables from a larger dataset. This can be done with KEEP or DROP statements.

u For many datasets, you can keep unneeded variables in the dataset, and SAS can handle them with ease. n This will be the case for most of you dealing

with several hundred or even several thousand observations

n Those dealing with very large data sets might benefit from using the Keep or Drop statementsl Caution be sure you are thinking ahead so that you

do not drop a variable needed later in some calculation or function

IOWA STATE UNIVERSITYDepartment of Animal Science

Removing Variables

u Let’s say we wanted to keep only the location, x and y variables from the “new” file.

u Both data steps below will accomplish the task.

data subset; set new;drop date rainfall;run;

data subset; set new;keep location x y;run;

IOWA STATE UNIVERSITYDepartment of Animal Science

Combining Datasetsu There are two different ways to combine datasets

using the SET and MERGE statements in a Data Step.

u The SET statement is used to add observations to an existing dataset.n This is what we have done in labn We have developed new variables using calculations with existing

data and have adjusted current data in the data set.n Data Pig12 Set Pig12;

l ADG = (OFFWT – ONWT) / DOT;

u Consider the following example, using monthly rainfall totals in Gainesville in 1995 and 1996. When used to combine observations into one dataset, the SET command works as follows:

IOWA STATE UNIVERSITYDepartment of Animal Science

Combining Datasets – Set data rain95;

input month rainfall @@; year=1995;

datalines;

1 3.08 2 1.07 3 6.14 4 5.18 5 2.47 6 7.55 7 7.66 8 7.20 9 2.10 10 4.33 11 3.15 12 1.29

;

run;

quit;

data rain96;

input month rainfall @@; year=1996;

datalines; 1 0.97 2 0.66 3 10.52 4 1.72 5 2.01 6 6.05 7 11.00 8 4.90 9 2.23 10 6.18 11 1.73 12 6.63

;

run;

quit;

data rain9596; set rain95 rain96;

run;

quit;

IOWA STATE UNIVERSITYDepartment of Animal Science

Combining Datasets

u The MERGE statement adds the variables in one dataset to another dataset.

u Consider the following example using Southern teams in the National Basketball Association:

IOWA STATE UNIVERSITYDepartment of Animal Science

Combining Datasets - Mergedata nba1; input @1 city $11. @11 division $;

Cards; Orlando Atlantic Miami Atlantic Atlanta Central Charlotte Central ;run; quit;

data nba2; input mascot $ @@;

Cards;Magic Heat Hawks Hornets

; run; quit;

data nba3; merge nba1 nba2; run; quit;

Proc Print;run; quit;

IOWA STATE UNIVERSITYDepartment of Animal Science

Combining Datasets – Merge Output Results

Obs city division mascot

1 Orlando Atlantic Magic

2 Miami Atlantic Heat

3 Atlanta Central Hawks

4 Charlotte Central Hornets

IOWA STATE UNIVERSITYDepartment of Animal Science

Combining Datasetsu In the last example, the observations

were ordered so that each dataset corresponds to the other. You will often need to put datasets together based on values of variables which are included in both datasets. To do this, both datasets must first be sorted in order by the common variable or variables.

IOWA STATE UNIVERSITYDepartment of Animal Science

Combining Datasets

Proc sort sorts the datafile using the variablechosen in the by statement – we will discuss Proc Sort in more detail later on

IOWA STATE UNIVERSITYDepartment of Animal Science

Combining Datasets with Merge -OutputThe SAS System 10:16 Saturday, September 11, 2010 14

Obs city division mascot

1 Atlanta Central Hawks

2 Charlotte Central Hornets

3 Miami Atlantic Heat

4 Orlando Atlantic Magic