Chapter 18 Reading Free-Format Data


Objectives

• Read free-format data that is not arranged in fixed fields.
• Read free-format data separated by non-blank delimiters, such as commas.
• Read a raw data file with missing data (at the end, middle, or beginning of a record).
• Read character values exceeding 8 characters.
• Read nonstandard free-format data.
• Read character values containing embedded blanks.

What is FREE-FORMAT data?

• The data values are not arranged in fixed fields.
• Data values are separated by blanks or by some other specific delimiter.
• Numeric data values may not be in standard format.

Issues that need special attention when reading free-format data:

• How to handle missing data in a free-format data set?
• The danger of incorrect variable length.
• How to handle data values with quotation marks?
• Informats used in formatted input do not behave the same way when reading free-format data values.

List Input with the Default Delimiter (Blank is the Default Delimiter)

The data is not in fixed columns. The fields are separated by spaces. There is one nonstandard field.

50001 4feb1989 132 530
50002 11nov1989 152 540
50003 22oct1991 90 530
50004 4feb1993 172 550
50005 24jun1993 170 510
50006 20dec1994 180 520

LIST INPUT and its variations

The simplest way to read free-format data is LIST INPUT.

The general syntax:

INPUT variable <$> ;

• variable is the name of the variable to be read.
• $ specifies a character variable.

NOTE:
• The list input style signals to the SAS System that fields are separated by delimiters.
• SAS then reads from non-delimiter to delimiter instead of from a specific location on the raw data record.

IMPORTANT CONDITIONS for LIST Input:

• All fields must be separated by at least one blank.
• Fields must be read sequentially from left to right.
• Fields cannot be skipped or re-read.
• Missing data for a character variable must be specified with a user-defined missing value (a blank cannot be used as missing, since blank is the delimiter).
• Missing data for a numeric variable must be specified with a period (.) or another user-defined missing value (a blank cannot be used for numeric missing).
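The conditions above can be illustrated with a minimal sketch. The data set name capacity and the in-stream records are made up for illustration; a period marks each missing numeric value so that every record still has three fields.

```sas
/* Hypothetical example: list input with the default (blank) delimiter. */
data capacity;
   input ID $ PassCap CargoCap;   /* $ marks ID as a character variable */
   datalines;
50001 132 530
50002 . 540
50003 90 .
;
run;

proc print data=capacity noobs;
run;
```

Because fields are read strictly left to right, the period is what keeps 540 from being read into PassCap on the second record.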

Delimiters

Common delimiters are
• blanks
• commas
• tab characters

A space (blank) is the default delimiter.

Input Data involving Date, Time

• The second field is a date. How does SAS store dates?

50001 4feb1989 132 530
50002 11nov1989 152 540
50003 22oct1991 90 530
50004 4feb1993 172 550
50005 24jun1993 170 510
50006 20dec1994 180 520
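SAS stores a date as a number: the count of days since 1 January 1960. A small sketch (the variable name d is arbitrary) writes the stored value for the first record's date to the log, then writes the same value with the DATE9. format.

```sas
/* Show how a date is stored internally. */
data _null_;
   d = '04feb1989'd;   /* SAS date literal */
   put d=;             /* raw stored value: days since 01JAN1960 */
   put d= date9.;      /* same value displayed with the DATE9. format */
run;
```

The raw value is 10627, which is the number that appears for InService in the PDV and in unformatted PROC PRINT output later in this chapter.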

Standard Data

• The term standard data refers to character and numeric data that SAS recognizes automatically.

• Some examples of standard numeric data include
  – 35469.93
  – 3E5 (exponential notation)
  – -46859

• Standard character data is any character you can type on your keyboard. Standard character values are always left-justified by SAS.

Nonstandard Data

• The term nonstandard data refers to character and numeric data that SAS does not recognize automatically.

• Examples of nonstandard numeric data include
  – 12/12/2012
  – 29FEB2000
  – 4,242
  – $89,000

Informats

• To read in nonstandard data, you must apply an informat.

• Informats are instructions that specify how SAS reads raw data.

• General form of an informat:

<$>INFORMAT-NAME<w>.<d>

Informats

Examples of informats are
• COMMAw. reads numeric data ($4,242) and strips out selected nonnumeric characters, such as dollar signs, commas, dashes, and blanks.
• MMDDYYw. reads dates in the form 12/31/2012.
• DATEw. reads dates in the form 29Feb2000.
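A minimal sketch of these three informats with list input follows. The data set name, variable names, and records are hypothetical. The colon (:) modifier is used so that each informat reads up to the next delimiter instead of a fixed number of columns, which is what free-format data generally needs.

```sas
/* Hypothetical example: nonstandard values read with informats. */
data sales;
   input SaleDate : mmddyy10. Shipped : date9. Amount : comma10.;
   format SaleDate Shipped date9. Amount dollar10.;
   datalines;
12/31/2012 29Feb2000 $89,000
01/15/2013 4feb1989 4,242
;
run;

proc print data=sales noobs;
run;
```

The FORMAT statement only affects how the stored numbers are displayed; the informats control how the raw text is converted on input.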

Reading Free-Format data with Delimiters

• By default, free-format data values are separated by BLANKS. SAS reads a data value until it reaches the next blank.

• Blank is not the only delimiter that can separate data values. SAS allows user-specified delimiters, as long as the delimiter is not part of the data values. For example, one can use / , % ; and so on as the delimiter when creating the external free-format data set.

• The DLM= option in the INFILE statement tells SAS which delimiter(s) the INPUT statement should expect.

Ex: INFILE 'path-to-the-file' DLM = ',' ; tells the INPUT statement to read each data value until a comma ( , ) is reached.

Example

The following is an airplane data set consisting of ID, date in service, passenger capacity, and cargo capacity.

LA50001,4feb1989,132, 530
PHIL50002, 11nov1989, 152 ,540
NEWYORK50003 ,22oct1991, 90, 530
CHICAGO50004, 4feb1993 ,172 ,550
DETROIT50005 ,24jun1993, 170 ,510
DALLAS50006, 20dec1994, 180, 520

The data values are separated by commas and spaces. How does SAS read this data set?

Reading a Delimited Raw Data File

data airplanes;
   infile 'raw-data-file' dlm=', ';
   input ID $ InService : date9. PassCap CargoCap;
run;

The colon (:) modifier makes SAS read InService with the DATE9. informat up to the next delimiter, rather than for a fixed 9 columns.

Exercise

• Write a SAS program to read the following data. The variables are: location, date, number of passengers, and amount of cargo for the flight.

LA50001,4feb1989,132, 530
PHIL50002,11nov1989, 152 ,540
NEWYORK50003,22oct1991 , 90, 530
CHICAGO50004,4feb1993 , 172 ,550
DETROIT50005,24jun1993, 170 ,510
DALLAS50006,20dec1994 , 180, 520

• Print the data. Save the program as c18_freeform1 to the SASEx folder on your C drive.

• Observe the results. You should notice that some data values for Location are not complete.

• What is the cause of the incomplete data values?
• How can this problem be solved?

Answer

data airplane;
   infile datalines dlm=', ';
   input Loc $ date : date9. npas ncargo;
   datalines;
LA50001,4feb1989,132, 530
PHIL50002,11nov1989, 152 ,540
NEWYORK50003,22oct1991 , 90, 530
CHICAGO50004,4feb1993 , 172 ,550
DETROIT50005,24jun1993, 170 ,510
DALLAS50006,20dec1994 , 180, 520
;
run;

proc print;
   format date date9.;
run;

Results

Obs   Loc        date        npas   ncargo

 1    LA50001    04FEB1989    132     530
 2    PHIL5000   11NOV1989    152     540
 3    NEWYORK5   22OCT1991     90     530
 4    CHICAGO5   04FEB1993    172     550
 5    DETROIT5   24JUN1993    170     510
 6    DALLAS50   20DEC1994    180     520

What is wrong with this result?

NOTE: Some of the Loc values are not complete.

NOTE: The default length of Loc is 8 characters, but some of the Loc values are longer than 8 characters, so they are truncated.

Lengths of Variables read using free-format

• When you use list input, the default length for character and numeric variables is 8 bytes.

• You can set the length of character variables with a LENGTH statement or with an informat.

• General form of a LENGTH statement:

LENGTH variable-name <$> length-specification ...;

Setting the Length of a Variable

data airplanes;
   length LOC $ 15;
   infile 'raw-data-file' dlm=', ';
   input LOC $ InService : date9. PassCap CargoCap;
run;
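The informat route mentioned earlier can be sketched as follows. This is an illustrative variant, not the slide's own program: the $15. informat with the colon modifier both reads the value up to the next delimiter and gives Loc a length of 15, so no LENGTH statement is needed. Only two of the records are repeated here.

```sas
/* Sketch: setting the length of Loc with an informat instead of LENGTH. */
data airplane;
   infile datalines dlm=', ';
   input Loc : $15. InService : date9. PassCap CargoCap;
   format InService date9.;
   datalines;
LA50001,4feb1989,132, 530
NEWYORK50003,22oct1991 , 90, 530
;
run;

proc print data=airplane;
run;
```

A LENGTH statement is often preferred when several variables need explicit lengths, since it keeps the INPUT statement short; the informat form is convenient for a single variable.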

Exercise

Open the program c18_freeform1 and revise it so that the data values for Location are complete.

Answer

data airplane;
   length Loc $ 15;
   infile datalines dlm=', ';
   input Loc $ date : date9. npas ncargo;
   datalines;
LA50001,4feb1989,132, 530
PHIL50002,11nov1989, 152 ,540
NEWYORK50003,22oct1991 , 90, 530
CHICAGO50004,4feb1993 , 172 ,550
DETROIT50005,24jun1993, 170 ,510
DALLAS50006,20dec1994 , 180, 520
;
run;

proc print;
   format date date9.;
run;

Correct Results

Obs   LOC            date        npas   ncargo

 1    LA50001        04FEB1989    132     530
 2    PHIL50002      11NOV1989    152     540
 3    NEWYORK50003   22OCT1991     90     530
 4    CHICAGO50004   04FEB1993    172     550
 5    DETROIT50005   24JUN1993    170     510
 6    DALLAS50006    20DEC1994    180     520

How SAS compiles and executes the DATA step (animation frames summarized)

data airplanes;
   length ID $ 5;
   infile 'raw-data-file';
   input ID $ InService date9. PassCap CargoCap;
run;

Raw Data File

50001 4feb1989 132 530
50002 11nov1989 152 540
50003 22oct1991 90 530
50004 4feb1993 172 550
50005 24jun1993 170 510
50006 20dec1994 180 520

Compile

• SAS creates the input buffer and the program data vector (PDV).
• The LENGTH statement adds ID to the PDV as a character variable of length 5 (ID $ 5).
• The INPUT statement adds InService, PassCap, and CargoCap as numeric variables of length 8 (N 8).

Execute

1. The variables in the PDV are initialized to missing (blank for ID, a period for the numeric variables).
2. The first record is loaded into the input buffer: 50001 4feb1989 132 530
3. The INPUT statement reads the values into the PDV: ID = 50001, InService = 10627, PassCap = 132, CargoCap = 530.
4. Implicit output: the observation is written out to airplanes.
5. Implicit return: control returns to the top of the DATA step and the next record is processed.
Using the DLM= Option in the INFILE statement

• The DLM= option sets a character or characters that SAS recognizes as a delimiter in the raw data file.

• General form of the INFILE statement with the DLM= option:

INFILE 'raw-data-file' DLM='delimiter(s)';

• Any character you can type on your keyboard can be a delimiter. You can also use hexadecimal characters.
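A hexadecimal delimiter is typically needed for the tab character. The sketch below assumes an ASCII system, where tab is '09'x. The fileref tabdata and the two records are hypothetical: the first step writes a small tab-delimited temporary file so that the second step has something to read.

```sas
/* Step 1 (setup, hypothetical data): write a tab-delimited temporary file. */
filename tabdata temp;

data _null_;
   file tabdata;
   put '50001' '09'x '4feb1989' '09'x '132' '09'x '530';
   put '50002' '09'x '11nov1989' '09'x '152' '09'x '540';
run;

/* Step 2: read it back, naming the tab character as the delimiter. */
data airplanes;
   infile tabdata dlm='09'x;   /* '09'x is tab on ASCII systems (assumption) */
   input ID $ InService : date9. PassCap CargoCap;
run;

proc print data=airplanes noobs;
   format InService date9.;
run;
```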

Reading Missing Values

Two situations may occur when reading free-format data that contains missing values:

• Missing values at the END of a record
• Missing values at the BEGINNING or MIDDLE of a record

Missing Data at the End of a Record

50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520

Missing Data at the End of a Row

• By default, when data is missing at the end of a row, SAS continues reading and takes the missing value from the next record:

1. SAS loads the next record to finish the observation.
2. A note is written to the log.
3. SAS loads a new record at the top of the DATA step and continues processing.

How SAS executes the DATA step when data is missing at the end of a record (animation frames summarized)

data airplanes3;
   length ID $ 5;
   infile 'raw-data-file' dlm=',';
   input ID $ InService : date9. PassCap CargoCap;
run;

Raw Data File

50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520

Execute

1. The PDV (ID $ 5, InService N 8, PassCap N 8, CargoCap N 8) is initialized to missing.
2. The first record is loaded into the input buffer: 50001 , 4feb1989,132
3. The INPUT statement reads ID = 50001, InService = 10627, and PassCap = 132. No data is left in the record for CargoCap.
4. SAS loads the next record (50002, 11nov1989,152, 540) into the input buffer and reads its first field into CargoCap, so CargoCap = 50002.
5. Implicit output: the observation 50001 10627 132 50002 is written out to airplanes3.
6. Implicit return: at the top of the DATA step, SAS loads the following record (50003, 22oct1991,90, 530). The second record has already been used up.
7. SAS continues processing until the end of the raw data file.

Partial Log

NOTE: 6 records were read from the infile 'aircraft3.dat'.
      The minimum record length was 19.
      The maximum record length was 26.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.AIRPLANES3 has 4 observations and 4 variables.

Missing Data at the End of the Row

proc print data=airplanes3 noobs;
run;

PROC PRINT Output

              In      Pass    Cargo
 ID        Service     Cap      Cap

50001       10627      132    50002
50003       11617       90      530
50004       12088      172    50005
50006       12772      180      520

Use the MISSOVER Option in the INFILE statement to handle missing values at the end of a record

• The MISSOVER option prevents SAS from loading a new record when the end of the current record is reached.

• General form of the INFILE statement with the MISSOVER option:

INFILE 'raw-data-file' MISSOVER;

• If SAS reaches the end of the row without finding values for all fields, variables without values are set to missing.

Using the MISSOVER Option

data airplanes;
   length ID $ 5;
   infile 'raw-data-file' dlm=',' missover;
   input ID $ InService : date9. PassCap CargoCap;
run;
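For experimenting without an external file, the same idea can be sketched with in-stream data. This variant is an illustration only: the data set name air_miss is made up, and DLM=', ' (comma and blank) is used here instead of the slide's DLM=',' so that the blanks SAS uses to pad in-stream records are treated as delimiters.

```sas
/* Sketch: MISSOVER with in-stream records that are short at the end. */
data air_miss;
   length ID $ 5;
   infile datalines dlm=', ' missover;
   input ID $ InService : date9. PassCap CargoCap;
   datalines;
50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
;
run;

proc print data=air_miss noobs;
   format InService date9.;
run;
```

With MISSOVER, each of the four records should produce its own observation, and CargoCap is set to missing for the two short records.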

Using the MISSOVER Option

Partial SAS Log

NOTE: 6 records were read from the infile 'aircraft3.dat'.
      The minimum record length was 19.
      The maximum record length was 26.
NOTE: The data set WORK.AIRPLANES3 has 6 observations and 4 variables.

Using the MISSOVER Option

proc print data=airplanes noobs;
run;

PROC PRINT Output

              In      Pass    Cargo
 ID        Service     Cap      Cap

50001       10627      132        .
50002       10907      152      540
50003       11617       90      530
50004       12088      172        .
50005       12228      170      510
50006       12772      180      520

Missing Values at the Beginning or Middle of a Record

There are situations where missing values occur at the beginning or in the middle of a record.

Because SAS treats consecutive delimiters, such as ,, as a single delimiter, using DLM=',' alone cannot handle these situations.

Missing Values without Placeholders

• There is missing data represented by two consecutive delimiters.

50001 , 4feb1989,, 530
50002, 11nov1989,132, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180, 520

Missing Values without Placeholders

50001 , 4feb1989 , . , 530

• By default, SAS treats two consecutive delimiters as one. Missing data should therefore be represented by a placeholder, such as a period (.) for a missing numeric value.

• A blank cannot be used as the placeholder for a missing character value. Using a placeholder for a character variable means choosing a string to stand for missing and then writing SAS code to convert that string into a missing value.

• Alternatively, the DSD option in the INFILE statement can handle these missing cases.

Missing Values without Placeholders

How SAS executes the DATA step when consecutive delimiters mark a missing value (animation frames summarized)

data airplanes4;
   length ID $ 5;
   infile 'raw-data-file' dlm=',';
   input ID $ InService : date9. PassCap CargoCap;
run;

Raw Data File

50001 , 4feb1989,, 530
50002, 11nov1989,132, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180, 520

Execute

1. The PDV (ID $ 5, InService N 8, PassCap N 8, CargoCap N 8) is initialized to missing.
2. The first record is loaded into the input buffer: 50001 , 4feb1989,, 530
3. The INPUT statement reads ID = 50001 and InService = 10627. The two consecutive commas are treated as one delimiter, so 530 is read into PassCap. No data is left in the record for CargoCap.
4. SAS loads the next record (50002, 11nov1989,132, 540) into the input buffer and reads its first field into CargoCap, so CargoCap = 50002.
5. Implicit output: the observation 50001 10627 530 50002 is written out to airplanes4.
6. Implicit return: at the top of the DATA step, SAS loads the following record (50003, 22oct1991,90, 530) and processing continues.

Missing Values without Placeholders

Partial Log

NOTE: 6 records were read from the infile 'aircraft4.dat'.
      The minimum record length was 21.
      The maximum record length was 26.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.AIRPLANES4 has 4 observations and 4 variables.

The missing values are not read correctly.

Missing Values without Placeholders

proc print data=airplanes4 noobs;
run;

PROC PRINT Output

           In      Pass    Cargo
ID      Service     Cap      Cap

50001    10627      530    50002
50003    11617       90      530
50004    12088      172      550
50005    12228      510    50006

This output is not correct. The missing values were not read as missing, and the records for 50002 and 50006 were consumed as CargoCap values of the preceding observations, so only four observations were created.

Missing Values without Placeholders

50001,4feb1989,,530

• If your data does not have placeholders for missing values, use the DSD option.

The DSD Option

• General form of the DSD option in the INFILE statement:

INFILE 'file-name' DSD;

The DSD Option

• The DSD option
   – sets the default delimiter to a comma
   – treats consecutive delimiters as missing values
   – enables SAS to read values with embedded delimiters if the value is surrounded by double quotes.
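The third point is easiest to see with a quoted field. The step below is a minimal sketch, not taken from the course programs: the data set name dsd_demo, the variables Model and Route, and the inline records are all hypothetical and chosen only to show a quoted value that contains a comma and a pair of consecutive commas.

```sas
/* Hypothetical data: Route is quoted because it contains a comma;  */
/* the second record has consecutive commas where Route is missing. */
data dsd_demo;
   length Model $ 12 Route $ 20;
   infile datalines dsd;
   input ID $ Model $ Route $ PassCap;
   datalines;
50001,JetCruise,"Boston, Raleigh",132
50002,SkyHopper,,90
;
run;

proc print data=dsd_demo noobs;
run;
```

With DSD in effect, the quoted text should be read as one value for Route (the quotation marks themselves are not stored), and the empty field in the second record should be read as a missing Route rather than being skipped.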

Using the DSD Option

data airplanes4;
   length ID $ 5;
   infile 'raw-data-file' dsd;
   input ID $ InService date9. PassCap CargoCap;
run;

Missing Values Without Placeholders

Partial Log

NOTE: 6 records were read from the infile 'aircraft4.dat'.
      The minimum record length was 22.
      The maximum record length was 25.
NOTE: The data set WORK.AIRPLANES4 has 6 observations and 4 variables.

Using the DSD Option

proc print data=airplanes4 noobs;
run;

PROC PRINT Output

           In      Pass    Cargo
ID      Service     Cap      Cap

50001    10627        .      530
50002    10907      132      540
50003    11617       90      530
50004    12088      172      550
50005    12228        .      510
50006    12772      180      520

Exercise

Open the program c18_freeformat_missing.
Run the program, and observe the problem.
Revise the program so that the missing data are properly handled.

Answer

data carsales;
   infile datalines dlm = ',' missover DSD;
   input year country $ type $ sales;
   datalines;
1998,US,CARS, 194324.12
1998,US,TRUCKS,142290.30
1998, CANADA,CARS,10483.44
1998, CANADA,TRUCKS,
1998,JAPAN,CARS,15066.43
1998,JAPAN, TRUCKS ,40700.34
1997 ,,CARS , 213504.05
1997,US,TRUCKS,116735.65
1997,CANADA,CARS,904.89
1997,CANADA,TRUCKS,76576.12
1997,JAPAN,CARS,10000.18
1997,JAPAN,TRUCKS,50458.22
;
proc print data = carsales; run;

Exercise

Open c18_freeformat2.
Run the program, observe the results, and revise the program to read the data correctly.

Answer

data carsales2;
   length type $ 14;
   infile datalines dlm = '/' missover dsd;
   input year (country type) ($) sales comma10.;
   datalines;
1998/US/CARS/$194324.12
1998/US/TRUCKS_GM/ $142290.30
1998/CANADA/CARS/$10483.44
1998/CANADA/TRUCKS_FORD/
1998/JAPAN/CARS/$15066.43
1998/JAPAN/'TRUCKS_HUNDA'/$40700.34
1997/US/CARS/$213504.05
1997//TRUCKS_FORD/ $116735.65
1997/CANADA/CARS/$904.89
1997/CANADA/TRUCKS_GM/$76576.12
/JAPAN/CARS/$10000.18
1997/JAPAN/TRUCKS_TOYOTA/$50458.22
;
proc print data = carsales2;
   title ' / as delimiter ';
run;
proc contents; run;

Specifying an Informat

To specify an informat in list input, use the colon (:) format modifier in the INPUT statement between the variable name and the informat.

General form of a format modifier in an INPUT statement:

INPUT variable : informat;

NOTE: An informat does not play the same role in free-format input as it does in formatted input:

• In formatted input, the informat specifies how many columns to read from the raw data and how the value in those columns is to be interpreted.

• In free-format input, the informat specifies how the value is interpreted and stored in the new data set. SAS reads from delimiter to delimiter instead of reading a fixed number of columns.

Modifying List Input

When reading free-format data, it is difficult to specify an informat that defines the number of columns to read, because the values do not line up in fixed columns. Nonstandard data values also cannot be read properly by simple list input.

SAS provides two modifiers to help define the informat.


Modifiers used in LIST INPUT

• The ampersand (&) modifier is used to read character values that contain embedded blanks.

• The colon ( : ) modifier is used to read nonstandard data values and character values that are longer than 8 characters, but which contain no embedded blanks.

Use the Modifier (&) in LIST INPUT

• The & modifier enables list input to read character values that contain single embedded blanks, such as NEW YORK. With a blank as the delimiter (DLM = ' '), NEW YORK is read as two character values, NEW and YORK, but we want to read it as one data value.

• Using & allows SAS to read NEW YORK as one data value. To keep SAS from reading the next data value as part of NEW YORK, TWO or MORE blanks must follow NEW YORK.

• In short, & reads a data value with single embedded blanks until it reaches TWO or more consecutive blanks.

Example of applying Modifier &

Data set (City, Population)

NEW YORK  7,262,700
LOS ANGLES  3,259,340
CHICAGO  3,009,530
HOUSTON  1,728,910

To read this data set:

Data city_pop;
   Input city $ & population comma10.;
   Datalines;
NEW YORK  7,262,700
LOS ANGLES  3,259,340
CHICAGO  3,009,530
HOUSTON  1,728,910
;
run;
proc print; run;

The results from previous program using & modifier

The SAS System        13:23 Monday, November 15, 2010  28

Obs    city        population
 1     NEW YORK       7262700
 2     LOS ANGL       3259340
 3     CHICAGO        3009530
 4     HOUSTON        1728910

NOTE: The data value LOS ANGLES is not read correctly. The variable city has the default length of 8, not the length of 10 needed in this case.

To handle this problem, we previously introduced the LENGTH statement:

LENGTH city $ 10;

SAS has another way to do this: use the & modifier together with an informat.

Using the & Modifier with an Informat

Data city_pop;
   Input city & $10. population comma10.;
   Datalines;
NEW YORK  7,262,700
LOS ANGLES  3,259,340
CHICAGO  3,009,530
HOUSTON  1,728,910
;
run;
proc print; run;

NOTE: Once $10. is used in the list input, a LENGTH statement is not needed, because the informat defines the length for storing CITY.

Some cautions of using &

• NOTE: When $10. is used with &, it does not specify the number of columns to read for the city variable. It specifies the length used to store the value of city.

• You MUST use two consecutive blanks as delimiters when you use the & modifier.

• You cannot use any other delimiter to indicate the end of each value.

Exercise

Open the program c18_freeformat_modifier.
Run each program to learn how the modifiers work, and review the MISSOVER and DSD options and the LENGTH statement.

Reading Nonstandard Values in LIST INPUT

• Informats for nonstandard values, such as DATEw., TIMEw., DATETIMEw., COMMAw.d, and so on, require the user to specify the width w. In formatted input, w defines the number of columns to be read from the data.

• However, in LIST INPUT, which is free-format, nonstandard values usually do not occupy a fixed number of columns, so it is difficult to choose a w that fits every record.

• SAS provides a LIST INPUT modifier, the colon (:), which reads a nonstandard value from one delimiter to the next delimiter.

LIST INPUT Without the Colon

• The colon signals that SAS should read from delimiter to delimiter.

• If the colon is omitted, SAS reads the length of the informat, which may cause it to read past the end of the field.
   – No error message is printed.
   – You might see invalid data messages or unexpected data values.
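The difference can be sketched with a small comma-delimited example. The data set name colon_demo and the two inline records are illustrative assumptions, not part of the course files; the variable names are borrowed from the airplanes example.

```sas
/* With the colon, DATE9. is applied to the text between two commas. */
data colon_demo;
   infile datalines dlm=',';
   input ID $ InService : date9. PassCap;
   format InService date9.;
   datalines;
50001,4feb1989,132
50002,11nov1989,90
;
run;

/* Without the colon, the same INPUT statement would be              */
/*    input ID $ InService date9. PassCap;                           */
/* and SAS would read a fixed nine columns for InService. In the     */
/* first record those nine columns run past the eight-character date */
/* into the following comma, which may produce invalid data messages */
/* or wrong values.                                                  */

proc print data=colon_demo noobs;
run;
```

The FORMAT statement is included only so that the printed dates are readable; it has no effect on how the values are read.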

Use COLON (:) as Modifier in LIST INPUT

• The colon (:) modifier enables the user to read nonstandard data values and character values that are longer than 8 characters with no embedded blanks.

• It reads a value until a blank (or other delimiter) is reached.

• If the informat $w. is specified, its length overrides the default length.

Example of using Colon (:) modifier

Data city_pop;
   Input city & $10. population : comma.;
   Datalines;
NEW YORK  7,262,700
LOS ANGLES  3,259,340
CHICAGO  3,009,530
HOUSTON  1,728,910
;
run;
proc print; run;

NOTE: The informat COMMA. does not specify the w value. List input reads the data value until the next delimiter is reached. A numeric variable is stored with the default length of 8, so there is no need to specify its length.

• NOTE: If we DO NOT use the colon (:), we must specify COMMAw.d so that the correct number of columns is read from the data. In that situation, w is the number of columns read from the raw data.

INFILE Statement Options

Problem                                          Option

Non-blank delimiters                             DLM='delimiter(s)'

Missing data at end of row                       MISSOVER

Missing data represented by consecutive          DSD
delimiters and/or embedded delimiters where
values are surrounded by double quotes

These options can be used separately or together in the INFILE statement.
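As a sketch of the options used together, the step below reads slash-delimited records with a missing value in the middle of one record and a missing value at the end of another. The data set name options_demo and the three inline records are made up for illustration; they only resemble the car sales exercise data.

```sas
/* DLM= sets the delimiter, DSD turns // into a missing value,   */
/* and MISSOVER sets Sales to missing when a record ends early.  */
data options_demo;
   length Country $ 10 Type $ 12;
   infile datalines dlm='/' dsd missover;
   input Year Country $ Type $ Sales : comma12.;
   datalines;
1998/US/CARS/$194,324.12
1998//TRUCKS/$142,290.30
1997/JAPAN/CARS
;
run;

proc print data=options_demo noobs;
run;
```

Under these assumptions the second observation should have a blank Country and the third a missing Sales, with all three records kept as separate observations.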

Creating Free-Format External Data

Similar to reading free-format external data, we can also create free-format external data by using:

FILE 'path-to-external-data-set' <DLM = 'delimiters'> <DSD>;

PUT variable <format>;

The format specifies how the data values are written. This is particularly useful when writing data values in a nonstandard format such as COMMAw.d, DATE9., MMDDYY10., and so on.

An example to create city_pop.dat data

Data city;
   Input city & $10. population : comma.;
   Datalines;
NEW YORK  7,262,700
LOS ANGLES  3,259,340
CHICAGO  3,009,530
HOUSTON  1,728,910
;
run;
proc print; run;

Data citypop;
   set city;
   File 'c:\math707\rawdata\city_pop.dat' dlm = '/';
   Put city population : comma10.;
Run;

An example of creating external data using free format when the delimiter is a comma and some numeric variables are also written using the COMMAw.d format

Data citypop;
   set city;
   File 'c:\math707\rawdata\city_pop.dat' dsd;
   Put city population : comma10.;
Run;

NOTE: Because the delimiter is a comma and population is written with the COMMA format, each population value must be marked so that it is recognizable as a single data value. Using the DSD option in the FILE statement puts quotation marks around population.

When reading this type of data, one must also use the DSD option in the INFILE statement, and one should also be careful about the LENGTH of character variables.

The resulting data set

NEW YORK," 7,262,700"
LOS ANGLES," 3,259,340"
CHICAGO," 3,009,530"
HOUSTON," 1,728,910"

To read this data set, one needs to use the DSD option in the INFILE statement.

Data citypop2;
   length city $ 10;
   infile 'c:\math707\rawdata\city_pop3.dat' DSD;
   input city $ population : comma10.;
Run;
proc print; run;

The resulting data

Obs    city          population

 1     NEW YORK         7262700
 2     LOS ANGLES       3259340
 3     CHICAGO          3009530
 4     HOUSTON          1728910

Writing Character Strings and variable values in the external data set

Data citypop;
   set city;
   File 'c:\math707\rawdata\city_pop.dat' dsd;
   Put '2000 City Census ' city 'Total Population ' population : comma10.;
Run;

This program writes extra character strings that describe City and Population in each record of the created file.

Use the PROC EXPORT procedure to create an external data set

General Syntax:

PROC EXPORT DATA = sas-data-set
            OUTFILE = 'filename'
            DBMS = DLM REPLACE;
   DELIMITER = 'delimiter';
   PUTNAMES = <YES|NO>;
RUN;

Alternatively, use the SAS pull-down menus to export a data set: select File, then Export Data, and follow the step-by-step menu to create the external file.
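A concrete call might look like the sketch below. The data set work.city, its two records, and the output path 'path-to-output-file' are placeholders for illustration and are not taken from the exercise program.

```sas
/* Small illustrative data set to export */
data work.city;
   length city $ 10;
   infile datalines dlm='/';
   input city $ population;
   datalines;
NEW YORK/7262700
CHICAGO/3009530
;
run;

/* 'path-to-output-file' is a placeholder; substitute a real path. */
proc export data=work.city
            outfile='path-to-output-file'
            dbms=dlm
            replace;
   delimiter='/';
   putnames=no;   /* do not write variable names as the first record */
run;
```

REPLACE overwrites an existing output file. Setting PUTNAMES=YES instead would write the variable names as a header record.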

Exercise

Open the program c18_put_freeformat_Export.
Run the programs, and observe the results to make sure you learn how to write the PUT statement and PROC EXPORT.

Exercise

Open the program c18_Import to learn how to write a PROC IMPORT step that reads free-format external data.

Mixing Input Styles

We have introduced
• Column Input,
• Formatted Input,
• List Input.

All of these input styles can be mixed in one INPUT statement, depending on the situation.
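A minimal sketch of one INPUT statement that uses all three styles follows. The data set name mixed and the column positions are assumptions chosen to fit the two inline records, which are laid out so that the ID and the date sit in fixed columns.

```sas
data mixed;
   input ID $ 1-5              /* column input: columns 1-5            */
         @7 InService date9.   /* formatted input: 9 columns from 7    */
         PassCap CargoCap;     /* list input: blank-delimited values   */
   format InService date9.;
   datalines;
50002 11nov1989 132 540
50003 22oct1991  90 530
;
run;

proc print data=mixed noobs;
run;
```

Column and formatted input suit the fields that line up in fixed positions, while list input handles the trailing values whose widths vary.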

Additional materials useful for reading delimited data

The textbook introduces the following options, which can be used in the INFILE statement to handle different situations when reading delimited external data:

MISSOVER, DSD, DLM = 'delimiter'

The following are three additional useful options for handling the end of a record:

STOPOVER, TRUNCOVER, FLOWOVER

Example

Consider the following data set, TESTNUM.

----+----1----+-
1
22
333
4444
55555

We will show the effect of using the FLOWOVER, MISSOVER, and TRUNCOVER options in the INFILE statement.

The Value of TESTNUM Using Different INFILE Statement Options

OBS    FLOWOVER    MISSOVER    TRUNCOVER

 1        22           .            1
 2      4444           .           22
 3     55555           .          333
 4                     .         4444
 5                 55555        55555

data numbers;
   infile 'external-file';
   input testnum 5.;
run;

Explanation of these options

• FLOWOVER is the default behavior. It causes the DATA step to look in the next record if the end of the current record is encountered before all of the variables are assigned values.

• MISSOVER causes the DATA step to assign missing values to any variables that do not have values when the end of a data record is encountered. The DATA step continues processing.

• STOPOVER causes the DATA step to stop execution immediately and write a note to the SAS log.

• TRUNCOVER causes the DATA step to assign values to variables, even if the values are shorter than expected by the INPUT statement, and to assign missing values to any variables that do not have values when the end of a record is encountered.
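To try these options without a separate raw data file, the sketch below first writes the five TESTNUM records to a temporary file and then reads them back. The fileref numfile is a hypothetical name, and the sketch assumes a host where the temporary file keeps variable-length records, so the short records are not padded with blanks.

```sas
/* Write the five variable-length records to a temporary file */
filename numfile temp;

data _null_;
   file numfile;
   put '1' / '22' / '333' / '4444' / '55555';
run;

/* Replace TRUNCOVER with MISSOVER or FLOWOVER to compare results */
data numbers;
   infile numfile truncover;
   input testnum 5.;
run;

proc print data=numbers;
run;
```

Inline DATALINES records are not used here because they are typically padded to a fixed length, which would hide the short-record behavior these options control.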