34
Chapter 3 Combining Data Vertically 3.1 Appending Raw Data Files.......................................3-3 3.2 Appending SAS Data Sets.......................................3-24 3.3 Solutions to Exercises........................................3-33

Combining Data Vertically

Embed Size (px)

DESCRIPTION

Combining Data Vertically

Citation preview

3-32Chapter 3 Combining Data Vertically

3.3 Solutions to Exercises3-33

Chapter 3 Combining Data Vertically

3-33.1Appending Raw Data Files

3.2Appending SAS Data Sets3-243.3Solutions to Exercises3-33

3.1 Appending Raw Data Files

(Only the FILENAME statement and the FILEVAR= option are discussed in this section.

The FILEREF= option is an alias to the FILE= option.

In addition to viewing the values of the raw data file, you need to obtain the file layout for the fields in the raw data file.

PROC FSLIST (Full Screen List) is part of Base SAS beginning in Version 7. XE "FSLIST procedure" Selected commands and menu options:

HEXdisplays data in hexadecimal.

COLSdisplays column numbers.

NUMSdisplays row numbers.

PROC FSLIST displays a raw data file in browse mode. To display a raw data file in edit mode within the SAS interactive windowing mode, select File ( Open from the Program Editor or Enhanced Editor window. To remove the contents of the raw data file from the Program Editor or Enhanced Editor window, select Edit ( Clear All.

For z/OS (OS/390):

proc fslist file = '.prog3.rawdata(month1)';

run;

Use multiple INFILE statements to read a record from one raw data file, a record from the second raw data file, a record from the third raw data file, and so on (similar to an interleave).

Use the FILENAME statement to concatenate multiple raw data files whose names can be hard-coded.

Use the FILEVAR= option to conditionally concatenate multiple raw data files.

A FILENAME statement can associate a fileref with multiple physical external files. XE "DESCRIBE statement:DATA step view"

XE "DATA step views:DESCRIBE statement"

XE "FILENAME statement:syntax"

Under z/OS (OS/390):

filename Q1 ('.prog3.rawdata(month1)' '.prog3.rawdata(month2)' '.prog3.rawdata(month3)');

Log

filename Q1 ('month1.dat' 'month2.dat' 'month3.dat');

data firstq;

infile Q1;

input Flight $ Origin $ Dest $ Date : date9. RevCargo : comma15.2;

run;

NOTE: The infile Q1 is:

File Name=c:\workshop\winsas\prog3\month1.dat,

File List=('c:\workshop\winsas\prog3\month1.dat'

'c:\workshop\winsas\prog3\month2.dat'

'c:\workshop\winsas\prog3\month3.dat'),

RECFM=V,LRECL=256

NOTE: The infile Q1 is:

File Name=c:\workshop\winsas\prog3\month2.dat,

File List=('c:\workshop\winsas\prog3\month1.dat'

'c:\workshop\winsas\prog3\month2.dat'

'c:\workshop\winsas\prog3\month3.dat'),

RECFM=V,LRECL=256

NOTE: The infile Q1 is:

File Name=c:\workshop\winsas\prog3\month3.dat,

File List=('c:\workshop\winsas\prog3\month1.dat'

'c:\workshop\winsas\prog3\month2.dat'

'c:\workshop\winsas\prog3\month3.dat'),

RECFM=V,LRECL=256

NOTE: 2299 records were read from the infile Q1.

The minimum record length was 33.

The maximum record length was 37.

NOTE: 2090 records were read from the infile Q1.

The minimum record length was 33.

The maximum record length was 37.

NOTE: 2297 records were read from the infile Q1.

The minimum record length was 33.

The maximum record length was 37.

NOTE: The data set WORK.FIRSTQ has 6686 observations and 5 variables.

NOTE: DATA statement used:

real time 0.31 seconds

cpu time 0.12 seconds

The value of a FILEVAR= variable option is a character string that contains the physical filename of the raw data file to be read.

When the next INPUT statement executes, it reads from the new file that the FILEVAR= variable option specifies.

Like automatic variables, the FILEVAR= variable option is not written to the data set.

The FILEVAR= variable option can read raw data files conditionally. You can construct the names of the raw data files programatically. XE "INFILE statement:FILEVAR= option"

XE "FILEVAR= option:INFILE statement"

The placeholder must be eight characters or fewer, and must begin with an alpha character or underscore, followed by alphanumeric characters or underscores.

If the characters-to-remove option is omitted, the COMPRESS function removes blanks from the source.

The STOP statement is needed to prevent an infinite loop of the DATA step.

Log

data movingq;

do i = 11,10,9;

NextFile = "month"!!put(I,2.)!!".dat";

NextFile = compress(NextFile,' ');

infile xxx filevar=NextFile;

input Flight $ Origin $ Dest $ Date : date9. RevCargo : comma15.2;

output;

end;

stop;

run;

NOTE: The infile XXX is:

File Name=c:\workshop\winsas\prog3\month11.dat,

RECFM=V,LRECL=256

NOTE: The infile XXX is:

File Name=c:\workshop\winsas\prog3\month10.dat,

RECFM=V,LRECL=256

NOTE: The infile XXX is:

File Name=c:\workshop\winsas\prog3\month9.dat,

RECFM=V,LRECL=256

NOTE: 1 record was read from the infile XXX.

The minimum record length was 37.

The maximum record length was 37.

NOTE: 1 record was read from the infile XXX.

The minimum record length was 37.

The maximum record length was 37.

NOTE: 1 record was read from the infile XXX.

The minimum record length was 37.

The maximum record length was 37.

NOTE: The data set WORK.MOVINGQ has 3 observations and 6 variables.

NOTE: DATA statement used:

real time 0.15 seconds

cpu time 0.01 seconds

The END= variable option is not written to the SAS data set.

(Obtain the month number of todays date to begin the rolling quarter range.

(Calculate the month numbers of the two months prior to todays month number.

See the SAS documentation for more information about the INTNX function.

Reading Multiple Raw Data Filesc03s1d4data movingq;

drop MonNum MidMon LastMon I;

MonNum=month(today());

MidMon=month(intnx('month',today(),-1));

LastMon=month(intnx('month',today(),-2));

do i=MonNum, MidMon, LastMon;

NextFile="month"!!put(i,2.)!!".dat";

NextFile=compress(NextFile,' ');

do until (LastObs);

infile xxx filevar=NextFile end=LastObs;

input Flight $ Origin $ Dest $ Date : date9.

RevCargo : comma15.2;

output;

end;

end;

stop;

run;

For z/OS (OS/390):

NextFile = '.prog3.rawdata(month'!!put(i,2.)!!')';

Log

data movingq;

drop MonNum MidMon LastMon I;

MonNum=month(today());

MidMon=month(intnx('month',today(),-1));

LastMon=month(intnx('month',today(),-2));

do i=MonNum, MidMon, LastMon;

NextFile="month"!!put(i,2.)!!".dat";

NextFile=compress(NextFile,' ');

do until (LastObs);

infile xxx filevar=NextFile end=LastObs;

input Flight $ Origin $ Dest $ Date : date9.

RevCargo : comma15.2;

output;

end;

end;

stop;

run;

NOTE: The infile XXX is:

File Name=c:\workshop\winsas\prog3\month2.dat,

RECFM=V,LRECL=256

NOTE: The infile XXX is:

File Name=c:\workshop\winsas\prog3\month1.dat,

RECFM=V,LRECL=256

NOTE: The infile XXX is:

File Name=c:\workshop\winsas\prog3\month12.dat,

RECFM=V,LRECL=256

NOTE: 2090 records were read from the infile XXX.

The minimum record length was 33.

The maximum record length was 37.

NOTE: 2299 records were read from the infile XXX.

The minimum record length was 33.

The maximum record length was 37.

NOTE: 2190 records were read from the infile XXX.

The minimum record length was 33.

The maximum record length was 37.

NOTE: The data set WORK.MOVINGQ has 6579 observations and 5 variables.

NOTE: DATA statement used:

real time 0.48 seconds

cpu time 0.14 seconds

This program was run in February.

Reference Information

Storing the Raw Data Filenames in a SAS Data Set

If raw data files that are to be read are in the SAS data set ia.rawdata, you can use the following code:

ia.rawdata

Obs ReadIt

1 route1.dat

2 route2.dat

3 route3.dat

4 route4.dat

5 route5.dat

c03ref1.sasdata route1_5;

set ia.rawdata;

infile in filevar = ReadIt end = LastFile;

do while(LastFile = 0);

input @1 RouteID $7.

@8 Origin $3.

@11 Dest $3.

@14 Distance 5.

@19 Fare1st 4.

@23 FareBusiness 4.

@27 FareEcon 4.

@31 FareCargo 5.;

output;

end;

run; XE "raw data file names:storing in a SAS data set"

XE "DATALINES statement" Storing the Raw Data Filenames in an External File

If the raw data files to be read are in the external file rawfiles.dat, you can use the following code:

rawfiles.dat

route1.dat

route2.dat

route3.dat

route4.dat

route5.datc03ref2.sasdata route1_5;

infile 'rawfiles.dat';

input ReadIt $ 10.;

infile in filevar = ReadIt end = LastFile;

do while(LastFile = 0);

input @1 RouteID $7.

@8 Origin $3.

@11 Dest $3.

@14 Distance 5.

@19 Fare1stclass 4.

@23 FareBusiness 4.

@27 FareEcon 4.

@31 FareCargo 5.;

output;

end;

run;

XE "SAS data sets:creating from multiple raw data files" \r "rdfrm"

XE "raw data files:reading multiple" \r "rdfrm"

XE "multiple raw data files:reading" \r "rdfrm"

XE "reading multiple raw data files" \r "rdfrm"

XE "raw data file names:storing in an external file" Exercises1. Concatenate the companys annual raw data files for the current year and previous two years using the FILEVAR= option. Create a SAS data set named last3. The raw data files use the following naming convention:

For directory based:y2002.dat

For z/OS (OS/390):'.prog3.rawdata(y2002)'

Use the following INPUT statement:

input Flight $ Date : date9. Depart : time5.;

Partial Output

The SAS System

Obs Flight Date Depart

1 IA10800 01JAN2004 6:35

2 IA10801 01JAN2004 9:35

3 IA10802 01JAN2004 12:35

4 IA10803 01JAN2004 15:35

5 IA10804 01JAN2004 18:35

6 IA10805 01JAN2004 21:35

7 IA10800 01JAN2004 6:35

8 IA10801 01JAN2004 9:35

9 IA10802 01JAN2004 12:35

10 IA10803 01JAN2004 15:35

11 IA10804 01JAN2004 18:35

12 IA10805 01JAN2004 21:35

13 IA10800 01JAN2004 6:35

14 IA10801 01JAN2004 9:35

15 IA10802 01JAN2004 12:35

3.2 Appending SAS Data Sets

(Only PROC APPEND is discussed in this section.

PROC APPEND only reads the data in the DATA= SAS data set, not the BASE= SAS data set.

The FORCE option forces PROC APPEND to concatenate data sets when the DATA= data set contains variables that either

are not in the BASE= data set.

do not have the same type as the variables in the BASE= data set. For variables with a type mismatch, missing values are assigned in the appended observations when the FORCE option is used.

are longer than the variables in the BASE= data set.

The work.moredata data set has 19 variables. The work.partdata data set has 15 variables.

The FORCE option was required because of the variable type mismatch for RouteID. The new observations receive missing values for RouteID.

The FORCE option can cause loss of data due to truncation or dropping variables.

The work.moredata data set has 19 variables. The work.partdata data set has 15 variables.

In addition to the type mismatch for RouteID, the additional variables present in work.moredata require the use of the FORCE option.

Appending Variables with Different Attributes

c03s2d1proc contents data = airports;

run;Partial Output

The CONTENTS Procedure

Data Set Name WORK.AIRPORTS Observations 9397

Member Type DATA Variables 4

Alphabetic List of Variables and Attributes

# Variable Type Len Label

2 City Char 50 City Where Airport is Located

1 Code Char 3 Airport Code 3 Country Char 40 Country Where Airport is Located

4 Name Char 50 Airport Name

proc contents data = acities;

run;

Partial Output

The CONTENTS Procedure

Data Set Name WORK.ACITIES Observations 52

Member Type DATA Variables 4

Alphabetic List of Variables and Attributes

# Variable Type Len Label

1 City Char 30 City Where Airport is Located

2 Code Char 3 Start Point 4 Country Char 40 Country Where Airport is Located

3 Name Char 50 Airport Name

proc append data = airports base = acities force;

run;

proc contents data = acities;

run;

Log

proc append data=airports base=acities force;

run;

NOTE: Appending WORK.AIRPORTS to WORK.ACITIES.

WARNING: Variable City has different lengths on BASE and DATA files

(BASE 30 DATA 50).

NOTE: FORCE is specified, so dropping/truncating will occur.

NOTE: There were 9397 observations read from the data set WORK.AIRPORTS.

NOTE: 9397 observations added.

NOTE: The data set WORK.ACITIES has 9449 observations and 4 variables.

NOTE: PROCEDURE APPEND used:

real time 0.04 seconds

cpu time 0.04 secondsPartial Output

The CONTENTS Procedure

Data Set Name WORK.ACITIES Observations 9449

Member Type DATA Variables 4

Alphabetic List of Variables and Attributes

# Variable Type Len Label

1 City Char 30 City Where Airport is Located

2 Code Char 3 Start Point 4 Country Char 40 Country Where Airport is Located

3 Name Char 50 Airport Name

proc contents data = allemps;

run;

Output

The CONTENTS Procedure

Data Set Name WORK.ALLEMPS Observations 550

Member Type DATA Variables 5

Alphabetic List of Variables and Attributes

# Variable Type Len

5 Division Char 30

1 EmpID Char 6

2 LastName Char 15

4 Location Char 13

3 Phone Char 4

proc contents data = pilots;

run;

Output

The CONTENTS Procedure

Data Set Name WORK.PILOTS Observations 31

Member Type DATA Variables 5

Alphabetic List of Variables and Attributes

# Variable Type Len Format Informat Label

2 Division Char 30 $30. $30. Division

5 EmpID Char 6 $6. $6. Employee Identification Number

3 LastName Char 32 $32. $32. Employee Last Name

4 Location Char 16 $16. $16. Employee Office Location

1 Phone Num 8proc append base = allemps data = pilots force;

run;

Log

proc append base=allemps data=pilots force;

run;

NOTE: Appending WORK.PILOTS to WORK.ALLEMPS.

WARNING: Variable LastName has different lengths on BASE and DATA files

(BASE 15 DATA 32).

WARNING: Variable Phone not appended because of type mismatch.

WARNING: Variable Location has different lengths on BASE and DATA files

(BASE 13 DATA 16).

NOTE: FORCE is specified, so dropping/truncating will occur.

NOTE: There were 31 observations read from the data set WORK.PILOTS.

NOTE: 31 observations added.

NOTE: The data set WORK.ALLEMPS has 581 observations and 5 variables.

NOTE: PROCEDURE APPEND used:

real time 0.01 seconds

cpu time 0.01 seconds

proc print data = allemps;

var phone;

run;

Output

Obs Phone

547 1003

548 1028

549 1070

550 1016

551

552

553

554proc contents data = allemps;

run;

Output

The CONTENTS Procedure

Data Set Name WORK.ALLEMPS Observations 581

Member Type DATA Variables 5

Alphabetic List of Variables and Attributes

# Variable Type Len

5 Division Char 30

1 EmpID Char 6

2 LastName Char 15

4 Location Char 13

3 Phone Char 4

Summary

DATA= data set contains variables thatForce Required?Consequences

Are not in the BASE= data setyesExtra DATA= data set variables are dropped.

Do not have the same type as the variables in the BASE= data setyesData not appended. Missing values are assigned to mismatched data.

Are longer than the variables in the BASE= data setyesDATA = data set variable values truncated.

Are in the BASE= data set, but BASE= data set has more variablesnoMissing values are assigned to extra BASE= data set variables.

Exercises2. Append ia.quarter4 to ia.y2000. First, determine if the data sets have the same variables. The resulting data set should be ia.y2000 data with the additional observations from ia.quarter4.

Partial Output: Added Observations

ia.y2000 with Quarter4 Data

Obs CrgoRev1 CrgoRev2 CrgoRev3 CrgoRev4

265 $3,281,364 $558,698 $2,094,261 $1,814,348

266 $3,296,780 $534,094 $2,403,148 $1,803,004

267 $3,317,456 $567,020 $2,155,557 $1,822,840

268 $3,279,250 $526,076 $1,893,366 $1,801,768

269 $3,260,316 $552,722 $2,133,225 $1,834,500

270 $3,243,090 $559,722 $2,337,188 $1,849,388

271 $3,293,606 $531,262 $2,132,043 $1,824,242

272 $3,268,782 $553,850 $2,114,361 $1,828,158

273 $3,227,646 $545,726 $2,369,204 $1,825,288

274 $3,287,060 $549,280 $2,132,679 $1,817,324

275 $3,281,134 $555,670 $1,917,524 $1,769,740

276 $3,270,620 $572,136 $2,102,609 $1,775,210

277 $3,296,466 $592,800 $2,352,088 $1,797,826

278 $3,299,664 $542,860 $2,102,151 $1,846,074

279 $3,283,118 $538,246 $2,135,697 $1,795,390

280 $3,212,646 $528,154 $2,403,092 $1,800,462

Obs CrgoRev5 CrgoRev6 Date

265 $216,498 $1,229,390 22SEP2000

266 $233,466 $975,811 23SEP2000

267 $217,542 $943,923 24SEP2000

268 $219,428 $967,185 25SEP2000

269 $214,046 $985,297 26SEP2000

270 $212,828 $949,119 27SEP2000

271 $223,846 $943,461 28SEP2000

272 $219,926 $1,194,524 29SEP2000

273 $219,114 $974,305 30SEP2000

274 $219,792 $972,585 01OCT2000

275 $225,944 $984,625 02OCT2000

276 $215,386 $981,231 03OCT2000

277 $216,650 $941,179 04OCT2000

278 $217,364 $980,101 05OCT2000

279 $225,754 $1,211,148 06OCT2000

280 $213,560 $969,143 07OCT20003.3 Solutions to Exercises1. Concatenate the companys annual raw data files for the current year and previous two years using the FILEVAR= option. Create a SAS data set named last3. The raw data files use the following naming convention:

For directory based:y2002.dat

For z/OS (OS/390):'.prog3.rawdata(y2002)'

Use the following INPUT statement:

input Flight $ Date : date9. Depart : time5.;

Save your SAS program.

For directory based:

ch3ex1.sas

For z/OS (OS/390):

'.prog3.sascode(ch3ex1)'

data last3(drop=year thisyear);

thisyear=year(today());

do year=thisyear to thisyear-2 by -1;

NextFile="y"!!put(year,4.)!!".dat";

do until(Last);

infile xxx filevar=NextFile end=Last;

input Flight $ Date : date9. Depart : time5.;

output;

end;

end;

stop;

run;

proc print data=last3;

format Date date9. Depart time5.;

run;

2. Append ia.quarter4 to ia.y2000. First, determine if the data sets have the same variables. The resulting data set should be ia.y2000 data with the additional observations from ia.quarter4.

proc append data = ia.quarter4 base = ia.y2000 force;

run;