Upload
pspedding
View
3.175
Download
1
Embed Size (px)
DESCRIPTION
The integration of SAS and Tableau can have significant business benefits. SAS and Tableau are ‘best of breed’ in their own areas: SAS in the area of Analytics and ‘Analytical Data Preparation’; Tableau in the area of data discovery, visualization and intuitive, interactive dashboarding. Consequently, it makes sense to find ways to combine these technologies to deliver an Integrated Information Framework which leverages the strengths of both solutions.
Citation preview
SAS/Tableau Integration
10 Steps for a Seamless SAS/Tableau
Experience
Patrick SpeddingStrategic Advisor, Business Intelligence & Analytics
See-Change Solutions Ltd
au.linkedin.com/in/spedding @spedding
Copyright @ 2012 See-Change Solutions http://www. see-change.com.au
US: (949) 528-6665
Australia: (02) 8005-6148
10 Steps for a Seamless SAS/Tableau Experience
Copyright @ 2012 See-Change Solutions
For the many organizations who have both SAS and Tableau, it makes sense to find ways to integrate these technologies to provide an Integrated Information Framework which leverages the strengths of both solutions.
This presentation covers 10 techniques for integrating SAS and Tableau, using SAS as a data source and data preparation environment. SAS routines developed to feed the Tableau solution will also be demonstrated.
Taking advantage of SAS and Tableau capabilities in this manner can provide a way to ‘rapid prototype’ business reporting requirements, without the costs and delays typically seen when attempting to model emerging business requirements in the Data Warehouse via traditional ETL methods. In addition, this approach suggests a way to protect existing investments in analytical reporting as developed by your SAS team, by providing a platform to publish those reports for easy consumption, plus easy re-formatting and ‘slice & dice’ of these reports in the Tableau environment.
Techniques covered will include commonly requested topics such as data and currency formatting, relative date calculations, longitudinal data analysis, integrating SAS Web Stored Processes and considerations for the use of SAS ACCESS and SAS ODBC/ OLE-DB.
Agenda
Copyright @ 2012 See-Change Solutions
• SAS/Tableau Integration:1. Extract formats2. Integrating SAS Web Stored Processes3. Use of SAS ODBC/OLE-DB4. Date Formats5. Extract useful elements of date fields/relative date
calculations6. Currency Formats7. Rename Column Names/columns used for levels in
hierarchies8. Add descriptors for coded fields9. Use SAS formats for more complex formats10.Merge Disparate Data Sources
• Q & A
SAS Data as a Source for Tableau: Approaches
Copyright @ 2012 See-Change Solutions
• SAS Dataset -> CSV -> Tableau
• SAS Report -> CSV -> Tableau
• SAS Stored Process -> Tableau
• SAS Data -> Datasource (RDBMS Connection) -> Tableau
• SAS Data -> Datasource (via ODBC) -> Tableau
• SAS Dataset -> OLE-DB -> Excel -> Tableau
Note: CSV is typically around 10X smaller than SAS7BDAT format
SAS Dataset -> CSV -> Tableau
Copyright @ 2012 See-Change Solutions
This method uses a SAS dataset to feed the Tableau environment. For example, complex business logic can be built into a SAS (E. Guide) process, then value can be added in Tableau Desktop (e.g. drill-downs, relative time calculations, ratios), before displaying via Tableau Server). This is a good approach for both prototyping BI requirements as well as ‘Analytical Data Preparation’.
proc export
data=WORK.COMPARATIVE_PERFORMANCE
outfile=
"\\corpdfs\SAS\Output\Comparative
_Performance.txt"
dbms=dlm replace ;
delimiter = '|' ;
run ;
SAS Report -> CSV -> Tableau
Copyright @ 2012 See-Change Solutions
This method takes the output of a SAS report (eg Enterprise Guide report) and ‘pivots’ the data in such a way as to provide a data input into Tableau.
SAS Stored Process -> Tableau
Copyright @ 2012 See-Change Solutions
This method takes a SAS report and enables it as a SAS Web Stored Process, which can then be linked and run within Tableau. Security can be integrated via ‘Single Signon’ if required. (Note: SAS Integration Technologies required)
Note: For SAS Web Stored Processes with Prompts, need to add
&_action=properties to the URL
http://<SAS Server>:8080/SASStoredProcess/do?_program=<Report
Name>&_action=properties
SAS Data -> RDBMS-> Tableau
Copyright @ 2012 See-Change Solutions
This method uses the SAS ‘PROC SQL’ method to output SAS results directly to a relational table, for example a table within the Data Warehouse. With the SAS/ACCESS interface, you reference database objects directly in a DATA step or SAS procedure using the SAS LIBNAME statement. PROC SQL can be used to update, delete or insert data into a relational table, for example via Bulk Load.
SAS Data -> ODBC -> Tableau
Copyright @ 2012 See-Change Solutions
This method can use an ODBC Connection to allow any SAS dataset to be a source for Tableau. The SAS ODBC driver can be used to create an ODBC connection, from which a data source connection can be defined within Tableau to point to the SAS data set.
http://support.sas.com/demosdownloads/setupcat.jsp?cat=ODBC%20Drivers
Note: SAS profile required to access product downloads (400Mb download)
Note: Date fields are not properly interpreted unless it is a Tableau extract
SAS Data -> OLE-DB -> Excel -> Tableau
Copyright @ 2012 See-Change Solutions
This method can use an OLE-DB Connection to allow any SAS dataset to be a source for Tableau. The SAS OLE-DB provider can be used to create an OLE-DB, from which a data source connection can be defined within Excel. Tableau can then point to the Excel file to retrieve the SAS data.
Dealing with SAS Dates
Copyright @ 2012 See-Change Solutions
In this example, we have a number of dates in our SAS dataset:
t1.rsmwrkdt FORMAT=DDMMYYS8. LABEL="Resumed Work Date" AS 'Resumed Work Date'n
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_POLICY1 AS
SELECT t1.trandate FORMAT=DDMMYYS8.,
t1.polexpdt FORMAT=DDMMYYS8.,
t1.commdate FORMAT=DDMMYYS8.
FROM WORK.QUERY_FOR_POLICY t1
QUIT;
Dealing with SAS Dates - Notes
Copyright @ 2012 See-Change Solutions
If you're going to work with a date as a string type it's better to use ISO-8601 format of YYYY-MM-DD. This is locale insensitive so you don't need to worry about DD/MM/YYYY vs. MM/DD/YYYY. Your formula would then read:
DATE(LEFT([Period],4)+ “-“ + MID([Period],5,2)+ “-“ + RIGHT([Period],2))
This is an improvement, but string logic is much slower than numeric logic, so it would be even better to work with this as numbers. Convert the [Period] field to be a number instead of a string, then use the following:
DATEADD(‘DAY’, [YYYYMMDD]%100-1,DATEADD(‘MONTH’, INT(([YYYYMMDD]%10000)/100)-1,DATEADD(‘YEAR’, INT([YYYYMMDD]/10000)-1900, #1900-01-01#)))
Note that the performance gains can be remarkable with large data sets. In a test conducted over a 1 billion record sample, the first calculation took over 4 hours to complete, while the second took about a minute.
Dealing with SAS Dates - Examples
Copyright @ 2012 See-Change Solutions
/* Day of Injury */
(CASE
WHEN(WEEKDAY(t1.injdate)) = 1 THEN
'Sun'
WHEN(WEEKDAY(t1.injdate)) = 2 THEN
'Mon'
WHEN(WEEKDAY(t1.injdate)) = 3 THEN
'Tue'
WHEN(WEEKDAY(t1.injdate)) = 4 THEN
'Wed'
WHEN(WEEKDAY(t1.injdate)) = 5 THEN
'Thu'
WHEN(WEEKDAY(t1.injdate)) = 6 THEN
'Fri'
WHEN(WEEKDAY(t1.injdate)) = 7 THEN
'Sat'
END) AS 'Day of Injury'n,
Dealing with SAS Dates - Examples
Copyright @ 2012 See-Change Solutions
/* Elapsed Weeks */
intnx('week',t1.csdwrkdt,t1.rsmwrkdt,'e')
LABEL="Elapsed Weeks" AS 'Elapsed Weeks'n,
/* Claim Age (Months) */
intck('month',t1.clamdate,t1.sysdate) LABEL="Claim
Age (Months)" AS 'Claim Age (Months)'n,
/* Reporting Delay (days) */
intck('day',t1.injdate,t1.clamdate) ) AS 'Reporting
Delay (days)'n,
SAS Lag functions can also be very useful
(See “Longitudinal Data Techniques” http://www.ats.ucla.edu/stat/sas/library/nesug00/ad1002.pdf)
Currency Formats
Copyright @ 2012 See-Change Solutions
t1.totpayc FORMAT=BEST12. LABEL="Claim Payments to
Date" AS 'Claim Payments to Date'n
Use BEST12. to avoid issues when importing/displaying SAS currency data in Tableau.
Rename Column Names
Copyright @ 2012 See-Change Solutions
t1.emplnam1 LABEL="Employer Name" AS
'Employer Name'n,
/* Industry - Level 1*/
t1.indgroup LABEL="Industry - Level 1"
/* Industry - Level 2*/
t1.indsubgrp LABEL="Industry - Level 2"
data claims ;
set mth.claims ;
keep insurer claim teed deis injdate injdsnat
injnatc injresc injdislc clmclosf clmclodt
workpc csdwrkdt rsmwrkdt hrswrkwk hrstincc;
run ;
Rename cryptic SAS field names:
Label hierarchy levels appropriately:
Add descriptors for coded SAS fields
Copyright @ 2012 See-Change Solutions
/* Insurer Name */
(CASE
WHEN t1.insurer = 1 THEN 'Insurer 1'
WHEN t1.insurer = 2 THEN 'Insurer 2'
WHEN t1.insurer = 3 THEN 'Insurer 3'
WHEN t1.insurer = 4 THEN 'Insurer 4'
ELSE 'Not Known'
END) AS 'Insurer Name'n
/* Liability Status */
(CASE
WHEN t1.clmliab = 1 THEN 'Notification of work related
injury'
WHEN t1.clmliab = 2 THEN 'Liability accepted'
WHEN t1.clmliab = 5 THEN 'Liability not yet determined'
WHEN t1.clmliab = 6 THEN 'Administration error'
WHEN t1.clmliab = 7 THEN 'Liability denied'
WHEN t1.clmliab = 8 THEN 'Provisional liability accepted -
weekly and medical payments'
WHEN t1.clmliab = 9 THEN 'Reasonable excuse'
WHEN t1.clmliab = 10 THEN 'Provisional liability
discontinued'
WHEN t1.clmliab = 11 THEN 'Provisional liability accepted -
medical only, weekly payments not applicable'
WHEN t1.clmliab = 12 THEN 'No action after notification'
ELSE 'Not Known'
END) AS 'Liability Status'n,
Use SAS formats for more complex formats
Copyright @ 2012 See-Change Solutions
CASE WHEN t1.deis le '30jun2011'd
THEN put(t1.occncode,asc21dgn.)
ELSE put(t1.occncode,anzsco1n.)
END AS 'Occupation - Level 1'n,
/* Occupation - Level 2*/
CASE WHEN t1.deis le '30jun2011'd
THEN put(t1.occncode,asc22dgn.)
ELSE put(t1.occncode,anzsco2n.)
END AS 'Occupation - Level 2'n,
Often in SAS, a single field will be set up with several informats, relating to different levels of a hierarchy.
Connecting to the SAS dataset via SAS ODBC would lose this information, therefore it is advisable to apply each SAS informat to create multiple fields in the SAS extract, prior to importing into Tableau.
Merge Disparate Data Sources
Copyright @ 2012 See-Change Solutions
This is particularly useful when rows may not all match across sources
Also, this approach avoids having to try to join all sources in real time in one or several outer join SQL statements (as would be the approach in traditional BI tools such as Cognos)
THANK YOU.
See-Change [email protected]
www.see-change.com.au
Copyright @ 2012 See-Change Solutions
au.linkedin.com/in/spedding
@spedding