Lesson 2 • Topic - Reading in data – Chapter 2 (Little SAS Book)



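Raw Data -> Read in Data -> Process Data (create new variables) -> Output Data (create SAS data set) -> Analyze Data Using Statistical Procedures

Reading, processing, and outputting the data happen in the DATA step; the analysis is done with PROCs.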
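A minimal sketch of the whole workflow in one program, assuming the blood-pressure values used later in this lesson. The variable dbp_change is a hypothetical example of "create new variables", and PROC MEANS simply stands in for any statistical procedure.

```sas
* Read in, process, and output the data in one DATA step;
DATA bp;
    INFILE DATALINES;
    INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
    * Process: create a new variable (hypothetical name);
    dbp_change = dbp6 - dbpbl;
    DATALINES;
C 84 138 93 143
D 89 150 91 140
A 78 116 100 162
A . . 86 155
C 81 145 86 140
;
RUN;

* Analyze: a PROC reads the SAS data set WORK.BP;
PROC MEANS DATA=bp N MEAN;
    VAR dbp_change;
RUN;
```

The fourth record has a missing dbp6, so its dbp_change is also missing and PROC MEANS leaves it out of N.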


Raw Data Sources

• You type it in the SAS program

• Text file

• Spreadsheet (Excel)

• Database (Access, Oracle)

• SAS dataset


Data in Text Files

• Delimited data – variables are separated by a special character (e.g. a comma)

• Fixed position – data is organized into columns

Text files are simple character files that you can create or view in a text editor such as Notepad. They can also be exported ("dumped") from spreadsheet programs such as Excel.


Data delimited by commas (.csv file)

C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,,,86,155
C,81,145,86,140

• Note: Missing data is identified by consecutive commas.


Column Data

C084138093143
D089150091140
A078116100162
A      086155
C081145086140

• Note: Missing data values are blank.


INFILE and INPUT Statements

When you write a SAS program to read in raw data, you’ll use two key statements:

• The INFILE statement tells SAS where to find the data and how it is organized.

• The INPUT statement tells SAS which variables to read in and how to read them.


Program 1

* List Directed Input: Reading data values separated by spaces;
DATA bp;
    INFILE DATALINES;
    INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
    DATALINES;
C 84 138 93 143
D 89 150 91 140
A 78 116 100 162
A . . 86 155
C 81 145 86 140
;
RUN;

TITLE 'Data Separated by Spaces';
PROC PRINT DATA=bp;
RUN;

Obs    clinic    dbp6    sbp6    dbpbl    sbpbl
 1       C        84     138      93      143
 2       D        89     150      91      140
 3       A        78     116     100      162
 4       A         .       .      86      155
 5       C        81     145      86      140


PARTIAL SAS LOG

1    DATA bp;
2       INFILE DATALINES;
3       INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
4       DATALINES;

NOTE: The data set WORK.BP has 5 observations and 5 variables.
NOTE: DATA statement used:
      real time           0.39 seconds
      cpu time            0.03 seconds


* List Directed Input: Reading .csv files;
DATA bp;
    INFILE DATALINES DLM=',' DSD;
    INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
    DATALINES;
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,,,86,155
C,81,145,86,140
;
RUN;

TITLE 'Reading in Data using the DSD Option';
PROC PRINT DATA=bp;
RUN;

With the DSD option, consecutive commas indicate a missing value.
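DSD also changes how quoted values are read: the quotation marks are removed, and a comma inside the quotes is kept as part of the value instead of being treated as a delimiter. A small sketch, assuming hypothetical names that contain commas; the data set NAMES and the variable name are illustrative only.

```sas
* DSD: quotation marks are removed and a comma inside quotes is data;
DATA names;
    INFILE DATALINES DSD;
    * Set the length first so the full name fits;
    LENGTH name $ 20;
    INPUT name $ clinic $ dbp6;
    DATALINES;
"Rodriguez, M",C,84
"Lee, K",,80
;
RUN;

PROC PRINT DATA=names;
RUN;
```

In the second record the two consecutive commas leave clinic missing, exactly as in the blood-pressure data above.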


* List Directed Input: Reading data values separated by tabs (.txt files);
* The separators below are tab characters. In the fourth record the two missing values appear as consecutive tabs;
DATA bp;
    INFILE DATALINES DLM='09'x DSD;
    INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
    DATALINES;
C	84	138	93	143
D	89	150	91	140
A	78	116	100	162
A			86	155
C	81	145	86	140
;
RUN;

TITLE 'Reading in Data separated by a tab';
PROC PRINT DATA=bp;
RUN;


* Column Input: Data in fixed columns;
DATA bp;
    INFILE DATALINES;
    INPUT clinic $ 1-1 dbp6 2-4 sbp6 5-7 dbpbl 8-10 sbpbl 11-13;
    DATALINES;
C084138093143
D089150091140
A078116100162
A      086155
C081145086140
;
RUN;

TITLE 'Reading in Data using Column Input';
PROC PRINT DATA=bp;
RUN;

Note: missing values are left blank.
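Because column input locates each value by its column range, you can read only the variables you need, in any order. A sketch assuming the same fixed-column layout; the data set name SBP_ONLY is hypothetical.

```sas
* Column input: read selected variables, in any order;
DATA sbp_only;
    INFILE DATALINES;
    INPUT sbpbl 11-13 sbp6 5-7 clinic $ 1;
    DATALINES;
C084138093143
D089150091140
A078116100162
A      086155
C081145086140
;
RUN;
```

The diastolic columns (2-4 and 8-10) are simply never read.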


* Reading data using Pointers and Informats;
DATA bp;
    INFILE DATALINES;
    INPUT @1 clinic $1. @2 dbp6 3. @5 sbp6 3. @8 dbpbl 3. @11 sbpbl 3.;
    DATALINES;
C084138093143
D089150091140
A078116100162
A      086155
C081145086140
;
RUN;

TITLE 'Reading in Data using Pointers/Informats';
PROC PRINT DATA=bp;
RUN;

Informats always contain a period (for example $1., 3., or 5.2); the period is how SAS tells an informat from a variable name.
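After a value is read with an informat, the input pointer moves to the column just past that field, so when fields sit side by side the @n pointers are optional. A sketch with the same layout, using the +n relative pointer (move n columns forward) to skip dbp6; the data set name BP2 is hypothetical.

```sas
* Formatted input: after each informat the pointer sits on the next column;
DATA bp2;
    INFILE DATALINES;
    * +3 skips columns 2-4 (dbp6), so it is not read;
    INPUT clinic $1. +3 sbp6 3. dbpbl 3. sbpbl 3.;
    DATALINES;
C084138093143
D089150091140
A078116100162
A      086155
C081145086140
;
RUN;
```

Absolute pointers (@n) are still the safer choice when fields are not contiguous or the layout might change.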


Program 2

* Reading data from an external file;
DATA bp;
    INFILE 'C:\SAS_Files\bp.csv' DSD FIRSTOBS=2;
    INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
RUN;

TITLE 'Reading in Data from an External File';
PROC PRINT DATA=bp;
RUN;

Content of bp.csv:

clinic,dbp6,sbp6,dbpbl,sbpbl
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,,,86,155
C,81,145,86,140
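INFILE can also refer to the file through a fileref, a short nickname assigned with a FILENAME statement. The sketch below is self-contained: it uses FILENAME ... TEMP to create a scratch file and writes the same lines as bp.csv into it. With the real file you would write FILENAME bpfile 'C:\SAS_Files\bp.csv'; instead. The fileref name bpfile is hypothetical.

```sas
* A fileref is a nickname for an external file. TEMP makes a scratch file;
FILENAME bpfile TEMP;

* Write the same lines that bp.csv contains (demo only);
DATA _NULL_;
    FILE bpfile;
    PUT 'clinic,dbp6,sbp6,dbpbl,sbpbl';
    PUT 'C,84,138,93,143';
    PUT 'D,89,150,91,140';
    PUT 'A,78,116,100,162';
    PUT 'A,,,86,155';
    PUT 'C,81,145,86,140';
RUN;

* Read the file through the fileref instead of a quoted path;
DATA bp;
    INFILE bpfile DSD FIRSTOBS=2;
    INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
RUN;
```

A fileref keeps the path in one place, which helps when several DATA steps read the same file.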


PARTIAL SAS LOG

7    DATA bp;
8       INFILE 'C:\SAS_Files\bp.csv' DSD FIRSTOBS=2 ;
9       INPUT clinic $ dbp6 sbp6 dbpbl sbpbl ;

NOTE: The infile 'C:\SAS_Files\bp.csv' is:
      File Name=C:\SAS_Files\bp.csv,
      RECFM=V,LRECL=256

NOTE: 5 records were read from the infile 'C:\SAS_Files\bp.csv'.
      The minimum record length was 10.
      The maximum record length was 16.
NOTE: The data set WORK.BP has 5 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.10 seconds
      cpu time            0.01 seconds


* Using PROC IMPORT to read in data;
* Can skip the DATA step;
PROC IMPORT DATAFILE='C:\SAS_Files\bp.csv' OUT=bp DBMS=csv REPLACE;
    GETNAMES=yes;
    GUESSINGROWS=9999;
RUN;

TITLE 'Reading in Data Using PROC IMPORT';
PROC PRINT DATA=bp;
RUN;
PROC CONTENTS DATA=bp;
RUN;

GETNAMES=yes uses the first row of the file for variable names.


The CONTENTS Procedure

Data Set Name    WORK.BP    Observations    5
Member Type      DATA       Variables       5

Alphabetic List of Variables and Attributes

#    Variable    Type    Len    Format     Informat
1    Clinic      Char    1      $1.        $1.
2    DBP6        Num     8      BEST12.    BEST32.
4    DBPBL       Num     8      BEST12.    BEST32.
3    SBP6        Num     8      BEST12.    BEST32.
5    SBPBL       Num     8      BEST12.    BEST32.


SOME INFILE OPTIONS

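• OBS= - limits the number of records read (OBS=n stops after record n)

• FIRSTOBS= - starts reading at this record (e.g. FIRSTOBS=2 skips a header row)

• MISSOVER and PAD - used to read in data with short records

• TERMSTR= - used to read files created on a different operating system

• LRECL= - needed when records are longer than the default logical record length (256 characters in older SAS releases; SAS 9.4 raised the default to 32767)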
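A sketch of MISSOVER with a short record, assuming two hypothetical records where the first one stops after sbp6. By default (FLOWOVER), SAS would move to the next line to look for dbpbl and sbpbl; with MISSOVER the remaining variables are set to missing and the next record starts a new observation.

```sas
* MISSOVER: if a record runs out, the remaining variables are set to missing;
DATA short;
    INFILE DATALINES MISSOVER;
    INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
    DATALINES;
C 84 138
D 89 150 91 140
;
RUN;

PROC PRINT DATA=short;
RUN;
```

Without MISSOVER, the first observation would try to read the letter D as dbpbl, producing an invalid-data note and fewer observations.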


* Problem when reading past the default logical record length;
DATA temp;
    INFILE '\...\tomhs.data' OBS=6;
    INPUT @260 jntpain 2.;
RUN;

TITLE 'Data not read in correctly because variable is past LRECL';
PROC PRINT;
RUN;

Obs    jntpain
 1        .
 2        .
 3        .

NOTE: Invalid data for jntpain in line 2
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.


* Add LRECL option to fix problem;
DATA temp;
    INFILE '\…\tomhs.data' OBS=6 LRECL=500;
    INPUT @260 jntpain 2.;
RUN;

TITLE 'Data read in correctly using LRECL option';
PROC PRINT;
RUN;

Obs    jntpain
 1        1
 2        1
 3        1
 4        1
 5        1
 6        2


Reading Special Data

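Value          Kind of data                                  Informat
04/11/1982     Date                                          mmddyy10.
59,365         Number containing a comma                     comma6.
086-59-9054    Character value longer than 8 characters      $11.

(List input gives character variables a default length of 8, so longer values need an informat or a LENGTH statement.)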
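The INPUT function applies an informat to a text value outside an INPUT statement, which is a quick way to check what each informat in the table produces. A sketch; the data set name SPECIAL is hypothetical. Note that the date becomes a SAS date value (the number of days since January 1, 1960) and needs a format such as MMDDYY10. to print as a date.

```sas
* The INPUT function applies an informat to a text value;
DATA special;
    LENGTH ssn $ 11;
    taxdate = INPUT('04/11/1982', mmddyy10.);
    income  = INPUT('59,365', comma6.);
    ssn     = INPUT('086-59-9054', $11.);
    * taxdate holds a day count. The format only changes how it prints;
    PUT taxdate= taxdate= mmddyy10. income= ssn=;
RUN;
```

The log line shows taxdate twice: first as the raw day count, then with the MMDDYY10. format applied.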


* Reading special data with fixed position data;
DATA info;
    INFILE DATALINES;
    INPUT @1 ssn $11. @13 taxdate mmddyy10. @25 income comma6.;
    DATALINES;
086-59-9054 04/12/2001  59,365
405-65-0987 03/15/2002  26,925
212-44-9054 04/15/2003  44,999
;
RUN;

TITLE 'Variables with Special Formats';
PROC PRINT DATA=info;
    FORMAT taxdate mmddyy10.;
RUN;

Obs    ssn            taxdate       income
 1     086-59-9054    04/12/2001    59365
 2     405-65-0987    03/15/2002    26925
 3     212-44-9054    04/15/2003    44999


* Reading special data with list input using colon modifier;
* The data contain semicolons, so DATALINES4 (ended by four semicolons) is used;
DATA info;
    INFILE DATALINES4 DLM=';';
    INPUT ssn : $11. taxdate : mmddyy10. income : comma6.;
    DATALINES4;
086-59-9054;04/12/2001;59,365
405-65-0987;03/15/2002;26,925
212-44-9054;04/15/2003;44,999
;;;;
RUN;

TITLE 'Variables with Special Formats';
PROC PRINT DATA=info;
    FORMAT taxdate mmddyy10.;
RUN;

Obs    ssn            taxdate       income
 1     086-59-9054    04/12/2001    59365
 2     405-65-0987    03/15/2002    26925
 3     212-44-9054    04/15/2003    44999
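The colon modifier is one way to read a character value longer than 8 characters with list input. Another is to set the variable's length with a LENGTH statement before the INPUT statement. A sketch with space-separated data; the data set name INFO2 is hypothetical.

```sas
* Alternative to the colon modifier for long character values;
DATA info2;
    INFILE DATALINES;
    * Declare the length before INPUT so all 11 characters are kept;
    LENGTH ssn $ 11;
    INPUT ssn $ taxdate : mmddyy10. income : comma6.;
    FORMAT taxdate mmddyy10.;
    DATALINES;
086-59-9054 04/12/2001 59,365
405-65-0987 03/15/2002 26,925
212-44-9054 04/15/2003 44,999
;
RUN;
```

The date and comma values still need informats, so the colon modifier stays on those two variables.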


Summary of Ways of Reading in Data

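You may not have a choice: the data may come to you in a particular format.

• List input - data is separated by a delimiter; you must read in all variables, in order.

• Column input - data is in fixed columns; you must know where each variable starts and ends; you can read in selected variables.

• Pointers and informats - an alternative to column input and the most flexible method; informats are needed for special data such as dates and numbers with commas (with list input, use the colon modifier).

• PROC IMPORT - reads the file without a DATA step and can take variable names from the first row.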


Exercise 2

• See exercise 2 in course notes