View
226
Download
0
Category
Preview:
Citation preview
Importing Excel into SAS: A Robust
Approach for Difficult-To-Read
Worksheets
Name of event: TASS
Location of event: Toronto
Presenter’s name: Bill Sukloff
Branch name: Science &Technology
Date of event: September 25, 2015
Page 2 – November-11-15
Overview of this Presentation
• Background
• Issues encountered when importing Excel Workbooks
• Requirements
• Approach and implementation
• Conclusions
Page 3 – November-11-15
Background
• Air Quality Measurements and Analysis Section,
Environment Canada
• Mandate:
– Monitor atmospheric pollutants at multiple sites across Canada
– Data analysis and reporting
• Obtain data from Canada and the U.S. to analyze and
report on:
– the chemical composition of the atmosphere,
– atmospheric processes,
– spatial and temporal patterns,
– source-receptor relationships,
– and long range and transboundary transport of air pollutants.
Page 4 – November-11-15
Canadian Air and Precipitation
Monitoring Network
Page 5 – November-11-15
Background
• Data sources
– Laboratories
– Scientists
– External organizations
• Data are often provided in Excel workbooks
Page 6 – November-11-15
Example of a Difficult-to-Read Worksheet
Page 7 – November-11-15
Example of a Difficult-to-Read Worksheet
Page 8 – November-11-15
Example of a Difficult-to-Read Worksheet
Page 9 – November-11-15
Example of a Difficult-to-Read Worksheet
Page 10 – November-11-15
Example of a Difficult-to-Read Worksheet
Page 11 – November-11-15
Issues
• Column header names are not always on the first row
• Column header names sometimes span two rows
• Worksheet descriptions often appear in the first few rows
• Characters in numeric fields
– Special meaning, e.g., “<“ indicates a value that is below the
detection limit
– Data entry errors such as 0..01
– Inadvertently repeating column header names
• PROC IMPORT often produces inconsistent variable
attributes across files
– i.e., variable types (numeric, character, date, time) and formats
Page 12 – November-11-15
Requirements
• Requirements
– No manipulation of original worksheets
– Column headers may appear on any row and may span over two
rows
– Variable attributes are consistent over multiple worksheets
– A report that shows values which are inconsistent with the
specified attributes
▪ Characters in a numeric variable
▪ Extra decimal places
▪ Invalid dates and times
▪ The same column header name appearing in more than one column
– SAS/ACCESS Interface to PC Files
Page 13 – November-11-15
Approach
• A variable definition worksheet that specifies:
1. Column name
2. Variable type
▪ Character
▪ Numeric
▪ Date
▪ Time
3. Format
▪ Char: # of characters
▪ Numeric: number of digits before and after decimal place
▪ Date: format, e.g., yyyy-mm-dd, mm-dd-yyyy
▪ Time: format, e.g., hh:mm, hh:mm:ss
Page 14 – November-11-15
Approach (continued)
• Import worksheets with all columns in character format
– PROC IMPORT option GETNAMES=NO;
• The DATA step is used to convert results from PROC
IMPORT to the attributes specified for each variable
• Creation of a report showing conversion errors
Page 15 – November-11-15
Variable Definition Worksheet
Page 16 – November-11-15
Code
Page 17 – November-11-15
Implementation
• Open the input worksheet in Excel
• Record the following information:
– Workbook name
– Worksheet name
– Column headers row number
– 2nd row number for column headers, if any
– Start of data row number
– Output SAS dataset name
▪ the work directory is emptied with each run of the program so use
two-level names if importing multiple worksheets, e.g.,
archive.airdata
– Directory name for the data conversion report (PDF’s)
Page 18 – November-11-15
Implementation (continued)
• Enter the required information into the macro
parameters appearing in the header section of the
SAS program
• Alternatively, when you have many worksheets, move
the macro parameters to a new SAS program and
“%include” the SAS conversion program
– repeat the macro parameters and %include statement for
each worksheet
• Copy the header row(s) into a new worksheet
– Select “transpose” when pasting
– Insert the three variable definition headings on the top row
– Enter the variable attributes for each column heading
Page 19 – November-11-15
Output SAS Dataset
Page 20 – November-11-15
Conversion Report
Page 21 – November-11-15
Conclusions
• Keys to a robust method for difficult-to-read Excel
worksheets:
– Variable attributes are specified in a variable definition
worksheet
– Import worksheets as character and programmatically
convert to numeric, date, and time variables
• Combining data from multiple worksheets is easy
because variable attributes are forced to be consistent
• Invalid numeric entries are reported to data originators
• For a copy of the SAS program please contact
bill.sukloff@ec.gc.ca
Recommended