SAS Connect vs SAS Access



  • 8/13/2019 SAS Connect vs SAS Access


    Getting connected with your DATA: Using SAS/CONNECT and SAS/ACCESS to work with data housed in a remote environment

    Kevin Delaney, New York State Office of Mental Health, Albany, New York

    Abstract

    This paper will provide an overview of SAS/CONNECT and SAS/ACCESS software for accessing and manipulating databases located in a remote environment. Using the example of an Oracle database on a remotely located Unix server, the author will demonstrate many of the main features of SAS/CONNECT and SAS/ACCESS. SAS/CONNECT topics to be covered include connecting to the remote server, submitting SAS code remotely, and moving data back and forth between the server and client. SAS/ACCESS topics to be covered include interfacing with the data using the LIBNAME statement with SAS/ACCESS-specific options, and running a query against the data using the SQL Pass-Through Facility. Issues of efficiency and practicality will also be discussed.

    Introduction

    At this company, many of our large data sets are housed in an ORACLE relational database on a Unix server. To access these data from our local area network and work with them in SAS, we had to become familiar with the SAS/CONNECT and SAS/ACCESS software packages. Like most SAS software, these packages offer many different ways to reach the same goal. Once you become familiar with several of these methods, the only challenge is to figure out which method is most appropriate for a given circumstance. This paper will walk through several of the more common utilities of SAS/CONNECT and SAS/ACCESS software, and will hopefully clarify which methods are most efficient and most practical.

    Throughout this paper I will stick with what I know: my examples involve a SAS/CONNECT session with a Unix server and the SAS/ACCESS interface to ORACLE relational databases. For those of you who know more about other operating systems or other database management systems, I hope you will find that my examples are adaptable to your host or DBMS of choice.

    SAS/CONNECT

    The introduction to the SAS/CONNECT User's Guide tells us that SAS/CONNECT is a "SAS-to-SAS client/server toolkit." What exactly does this mean? SAS/CONNECT software can be used to connect to a SAS session running on a remote server, to transfer data between environments, and to process data on the remote server. In this paper I will try to cover the many methods SAS/CONNECT provides for accomplishing these tasks. SAS/CONNECT also supports SCL commands and SAS/AF applications that allow for remote messaging, linking of objects on different platforms, and running scheduled applications for routine updates from one server to another, but I will not cover these topics here.

    Getting Connected

    There are several methods within SAS/CONNECT that can be used to connect to a remote environment. In the windowing environment, you can use the SIGNON window to connect to the remote host.

    Select RUN from the menu bar, then SIGNON.

    Figure 1: Select Run and SIGNON from the Display Manager pull-down menus.

    This opens the following SIGNON window:


    Figure 2: SAS/CONNECT SIGNON window

    SAS/CONNECT ships with a number of script files that establish the connection between SAS on the local host and SAS on the remote host. These scripts are specific to the remote host, but they can be modified from their standard form. You can also write your own script file; instructions for doing so are included in the SAS/CONNECT User's Guide, Version 8. By default, these script files are stored in the

    !SASROOT\CONNECT\SASLINK

    folder in Windows. SAS will also look for script files in your SASUSER folder. An example of the default TCPUNIX.scr that ships with SAS is attached to this paper, along with my modified TCPUNIX2.scr; see if you can spot the changes. As you might guess, you can have a lot of fun with the script files if you are so inclined.

    SAS/CONNECT supports several different access methods, and which ones are available depends on the operating system. All of my examples use the TCP/IP access method for communication between Unix and Windows. I will not say much about access methods except that you need to use one that is supported by both the local and remote hosts. For a more in-depth discussion, see Communications Access Methods for SAS/CONNECT and SAS/SHARE Software, Version 8.

    With this information and the name of the remote host you want to connect to, you can sign on to your remote SAS session using the SAS/CONNECT SIGNON window pictured in Figure 2. Place the name of the script file you want to use in the first field, the remote session's name in the second, and your communications access method in the third. For my example, I am using a script called tcpunix2.scr to connect to the remote host unixdata using the TCP/IP access method. These are the only three fields that need to be filled in. As the NOTE at the bottom of the window states, leaving a field blank will default to the current setting. The only other item you might want to change is whether or not remotely submitted commands execute synchronously, which we will discuss more fully later.

    If you prefer a more programmatic approach to signing on to the remote host, the syntax is equally easy to grasp. In SAS Version 8, you need only associate the fileref RLINK with your script file and then issue the SIGNON statement. For my example:

    filename rlink 'tcpunix2.scr';
    signon unixdata;
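Each of the three SIGNON window fields also maps onto a system option, so a program can fill them in itself. The sketch below is an illustration rather than the author's code. It sets the access method with the COMAMID= system option and the remote session name with the REMOTE= system option, then signs on and off. The script path is a hypothetical placeholder.

```sas
/* Sketch: programmatic counterparts of the three SIGNON window fields. */
/* ASSUMPTION: the path below is a hypothetical placeholder for your    */
/* own copy of tcpunix2.scr.                                             */
options comamid=tcp        /* communications access method (third field) */
        remote=unixdata;   /* remote session name (second field)         */
filename rlink 'C:\mysas\tcpunix2.scr';   /* script file (first field)   */

signon;                    /* no name given, so the REMOTE= value is used */

/* ... RSUBMIT blocks go here ... */

signoff;                   /* ends the REMOTE= session when finished      */
```

Because SIGNON and SIGNOFF fall back on the REMOTE= value when no session name is given, the rest of the program does not need to repeat the host name.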

    Passing SAS statements to the remote host

    Now that we are signed on to SAS "up" on Unix, let's send some SAS statements through and see how it works. To send SAS statements to a remote host, you need only bracket your normal SAS code with an RSUBMIT; - ENDRSUBMIT; block. For instance:

    rsubmit;
       libname myunix '/home/myunixdir';
    endrsubmit;

    This code creates the library MYUNIX within the SAS session executing on Unix. It then returns the log from this remote session to your local SAS log. (If we had done something that produced output, the output would also be returned to the local OUTPUT window.)

    Remote Library Services

    SAS/CONNECT also lets you use the REMOTE engine to create a local library that refers to files in the remote session. This is useful if you want to use the Explorer window to look at the SAS data sets housed in your remote directories. The syntax to create a LOCAL libref to the same directory as our MYUNIX library "up there" would be:

    libname mylocux '/home/myunixdir'
            server=unixdata;


    Once you have set up this remote libref, you can manipulate data on the remote host without wrapping the code in an RSUBMIT; - ENDRSUBMIT; block. For example:

    proc contents data=mylocux.set1
                  out=mylocux.set1contents;
    run;

    This works well if you happen to know the directory you have been assigned on the remote host, but what about viewing the WORK directory? You can use the SASHELP.VMEMBER data set view on your remote host to set up a local libref to your remote WORK library:

    rsubmit;
       data findwork;
          x=1;
       run;

       data find2(keep=path);
          set sashelp.vmember;
          if upcase(memname)='FINDWORK';
       run;

       proc download data=find2 out=finduxwork;
       run;
    endrsubmit;

    data _null_;
       set finduxwork;
       call symput("workdir",trim(path));
    run;

    %put &workdir; *to make sure it worked;

    libname unixwork "&workdir" server=unixdata;

    Notice that because we are looking for the Unix WORK library, we need to read SASHELP.VMEMBER on Unix. That is why the DATA steps are wrapped in an RSUBMIT block. For those of you who have not used the VMEMBER view before, it lists the members of every library currently assigned in your SAS session, along with attributes such as the physical path of each member's library. By creating a data set in the WORK library and then selecting the variable PATH for that data set, we obtain the full path of our current WORK library.
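A shorter route to the same libref may be possible. This alternative sketch is not the method used above. It asks the remote session for the path of its WORK library with the PATHNAME function, then uses the %SYSRPUT macro statement, which copies a value from the remote session into a local macro variable. It assumes the unixdata session from earlier and a release that supports %SYSRPUT.

```sas
/* Alternative sketch (not the paper's method): the remote session     */
/* looks up its own WORK path and hands it back to the local session.  */
rsubmit;
   /* Runs on Unix: PATHNAME(WORK) is the remote WORK directory */
   %sysrput workdir=%sysfunc(pathname(work));
endrsubmit;

%put NOTE: remote WORK is &workdir;   /* local check that the value arrived */

/* REMOTE-engine libref pointing at the remote WORK directory */
libname unixwork "&workdir" server=unixdata;
```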

    This example also introduces a new SAS/CONNECT procedure. PROC DOWNLOAD and its partner in crime, PROC UPLOAD, are SAS/CONNECT procedures that transfer data. The syntax for these procedures really is as easy as it looks:

    - For PROC DOWNLOAD, DATA= names the data set on the remote host that you want to copy down to your local SAS session, and OUT= names the data set that will reside in the local session.
    - For PROC UPLOAD, the roles are reversed: DATA= names the local data set, and OUT= names the data set to be created on the remote host.

    In this case, the procedure copies the data set FIND2 from Unix down into the data set FINDUXWORK in our local SAS session. This data set is then used to create the macro variable WORKDIR, and a remote libref to WORKDIR is established. This seems like a lot of work, but it executes in tenths or hundredths of a second. It then lets you use the local Explorer window to look at data sets on the remote server, rename them interactively, and even move them to other referenced libraries on either host.
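PROC DOWNLOAD reads its DATA= data set on the remote host. As a result, data set options such as WHERE= and KEEP= can shrink the data before anything crosses the network. The INLIB= and OUTLIB= options move a whole library at once. The data set BIGSET, its variables YEAR and COUNTY, and the local libref LOCALREF in this minimal sketch are hypothetical names used only for illustration.

```sas
rsubmit;
   /* Subset on the remote host so only matching rows and two columns  */
   /* travel to the local session. BIGSET, YEAR, COUNTY are hypothetical. */
   proc download data=work.bigset(where=(year=2000) keep=year county)
                 out=work.bigsub;
   run;

   /* Copy every member of the remote WORK library into a local library. */
   /* LOCALREF is a hypothetical libref already assigned locally.        */
   proc download inlib=work outlib=localref;
   run;
endrsubmit;
```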

    You can use the remote library reference as you would any other library reference. You can SET data on the remote host and use it to create a local data set, you can use PROC PRINT to print data from the remote host, and, well, you get the point. However, this is often not the most efficient way to use the SAS/CONNECT product. For example, let's look at the following code:

    Heat #1

    data work.test;
       set unixwork.smallset;
    run;

    VS.

    rsubmit;
       proc download data=work.smallset out=work.test2;
       run;
    endrsubmit;

    Heat #2

    data unixwork.test;
       set localref.smallset;
    run;

    VS.

    rsubmit;
       proc upload data=localref.smallset out=work.test2;
       run;
    endrsubmit;


    Heat #3

    proc format library=work cntlout=unixwork.fmts;
    run;
    rsubmit;
       proc format library=work cntlin=work.fmts;
       run;
    endrsubmit;

    VS.

    proc format library=work cntlout=work.fmts;
    run;
    rsubmit;
       proc upload data=work.fmts out=work.fmts2;
       run;
       proc format cntlin=fmts2;
       run;
    endrsubmit;

    I am not sure where the word HEAT comes from, but definition 10a in my dictionary does state: "One round of many in a sporting competition, such as a race."

    This example pits Remote Library Services against PROC DOWNLOAD/UPLOAD in a little contest to see which is faster. With relatively small numbers of observations, and particularly with small numbers of variables, the two methods come pretty close. However, PROC DOWNLOAD/UPLOAD definitely wins both Heat #1 and Heat #2. The advantage of these procedures over the remote library option grows wider as you add more variables and observations to the data sets you are moving between hosts. Of course, if you are cleaning up for the night and interactively moving data from your Unix WORK directory to a permanent library, it might be easier to click and drag in the Explorer window. For long programs that need to be reproducible or completely automated, however, PROC DOWNLOAD/UPLOAD seems to make more sense.

    Heat #3 is much closer, because PROC UPLOAD needs an extra step to move the data. Also, unless you have a HUGE format catalog, I doubt that the FMTS data set will ever be big enough to show a real difference in efficiency.
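The paper does not report timings. If you want to run the heats yourself, one simple way is to bracket each contender with DATETIME() calls and write the elapsed seconds to the log. This is a minimal sketch. The macro variable names T0 and T1 are arbitrary, and the FULLSTIMER system option, which adds resource statistics to each step's log notes, is optional.

```sas
options fullstimer;   /* optional: detailed resource usage in the log */

/* Contender A: Remote Library Services (local DATA step, remote libref) */
%let t0=%sysfunc(datetime());
data work.test;
   set unixwork.smallset;
run;
%let t1=%sysfunc(datetime());
%put NOTE: RLS elapsed seconds = %sysevalf(&t1-&t0);

/* Contender B: PROC DOWNLOAD inside a synchronous RSUBMIT block */
%let t0=%sysfunc(datetime());
rsubmit;
   proc download data=work.smallset out=work.test2;
   run;
endrsubmit;
%let t1=%sysfunc(datetime());
%put NOTE: DOWNLOAD elapsed seconds = %sysevalf(&t1-&t0);
```

Because RSUBMIT waits for the remote step by default, the second measurement covers the remote work as well as the transfer.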

    What would be neat (this is directed to those SAS people who make this stuff happen) is if

    options fmtsearch=(work.formats unixwork.formats library.formats);

    actually worked. Unfortunately, as things stand now, you can point the FMTSEARCH= system option at formats in the UNIXWORK library, or any other remote library, and you won't get an error. However, when you try to apply a format from a remote FORMATS catalog to a local session variable, it won't work. This is because "You cannot open a catalog through a server because access to catalogs is not supported when the user machine and server machine have different data representations." (To see this NOTE yourself, double-click the FORMATS catalog as it appears in the UNIXWORK folder of the Explorer window.)
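One workaround, the mirror image of Heat #3, is to rebuild the remote formats locally before FMTSEARCH= needs them. This sketch has not been tested in this paper. It assumes the remote session has a WORK.FORMATS catalog. The names WORK.FMTCTL (data set) and WORK.UXFMTS (catalog) are hypothetical.

```sas
rsubmit;
   /* On Unix: unload the remote WORK.FORMATS catalog to a data set */
   proc format library=work cntlout=work.fmtctl;
   run;
   /* Bring the control data set down to the local session */
   proc download data=work.fmtctl out=work.fmtctl;
   run;
endrsubmit;

/* Locally: rebuild the formats in a catalog with a hypothetical name */
proc format library=work.uxfmts cntlin=work.fmtctl;
run;

/* Search local WORK formats first, then the rebuilt remote ones */
options fmtsearch=(work.formats work.uxfmts library.formats);
```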

    Are we having fun yet? The best attributes of SAS/CONNECT software are still ahead of us. Not only can SAS/CONNECT talk back and forth with a remote host, it can also do so asynchronously. So far, we have not used the WAIT=NO option in any of our RSUBMIT statements. This option tells SAS to send the statements in the RSUBMIT; - ENDRSUBMIT; block through to the remote host, but to return control to the local SAS session immediately. We haven't needed this option yet, because all of the code we have submitted executed and returned results faster than we could blink. That would not be true if we were pulling records out of a database with a couple million records, or running an SQL query that combines ten tables from a relational database. In my mind, the best reason for using SAS/CONNECT is to send large, memory-intensive tasks such as these to another server. The processing then takes place on the remote host, leaving you free to do other things locally. This is especially true if you store your data remotely so as not to bog down your local server. We will look more closely at the WAIT=NO option, and at other statements that work with it, as we turn our attention to another important piece of SAS software.
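Here is a minimal sketch of an asynchronous round trip. It assumes the unixdata session from earlier and a release that supports the WAITFOR statement, which suspends the local session until the named asynchronous task finishes. BIGSET is a hypothetical remote data set.

```sas
/* Send a long-running job to Unix and get control back immediately */
rsubmit unixdata wait=no;
   proc means data=work.bigset n mean;   /* BIGSET is hypothetical */
   run;
endrsubmit;

/* Local work can run here while Unix is busy */
proc print data=sashelp.class(obs=5);
run;

/* Pause until the remote task is done, then fetch its log and output */
waitfor unixdata;
rget unixdata;
```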


    SAS/ACCESS software

    If your data is stored on the remote server you are CONNECTed to in a format other than a SAS data set, how do you ACCESS it?

    In effect, SAS/ACCESS software provides a SAS-to-non-SAS database management software connection, in the same way that SAS/CONNECT provides a SAS-to-SAS connection. SAS/ACCESS lets you read and modify data housed in a non-SAS data storage package, and then write the modified data back out to the database. From a data analyst's perspective, I have no need to write data back out to the database; in fact, my job does not give me the privilege of doing so. My focus will therefore be on the various ways to 'access' data stored in a relational database using SAS/ACCESS, rather than on writing these data back out (PROC DBLOAD). Again, the examples in this paper discuss accessing ORACLE tables on a Unix server. If you are using a different DBMS, see the SAS/ACCESS User's Guide specific to your product for any modifications you might need to run these examples on your system.

    SAS/ACCESS software provides three main methods for accessing a relational database: the ACCESS procedure, a DBMS-specific LIBNAME statement, and the SQL Pass-Through Facility. I will compare and contrast the three.

    The ACCESS Procedure

    This procedure is the most code-intensive method of accessing a DBMS (those of you deathly afraid of SQL will note that I didn't say 'of using DBMS data'), although none of the code is particularly difficult to grasp. The ACCESS procedure for relational databases consists of two distinct components: the ACCESS descriptor and the VIEW descriptor.

    The ACCESS descriptor is a set of statements that tells SAS how to access a DBMS table. For example:

    proc access dbms=oracle;
       create work.mytest.access;
       user=kpd;
       orapw=mypassword;
       table=category_service;
       path='prda';
       assign=yes;
       rename catsrv_code=CATCODE
              catsrv_label=Service;
       list all;
    run;

    This is the access descriptor for an ORACLE table called CATEGORY_SERVICE within the ORACLE instance 'prda'. The access descriptor contains the information SAS/ACCESS will need to read this table when it is called upon to do so, including my user ID (USER=) and password (ORAPW=). ASSIGN=YES tells SAS that all attributes of data sets created from this ACCESS descriptor must conform to what is described here. For example, I have renamed the ORACLE column catsrv_code to CATCODE. Any SAS data sets created using this descriptor will contain the variable CATCODE, and I will not be able to rename it in the VIEW descriptor. In addition to RENAME, you can also use familiar SAS options such as FORMAT and DROP within the ACCESS descriptor.

    A VIEW descriptor uses the information in its referenced ACCESS descriptor to access the database, and then uses a CREATE statement ending in .VIEW to "take a picture" of the data. When you create a view, you are really setting up a query of the data, which can later be called by any SAS procedure or DATA step. You can also create a SAS data set with the ACCESS procedure, but only after a VIEW descriptor has been created. In other words, while we would like to write:

    rsubmit;
    proc access dbms=oracle;
       create work.mytest.access;
       user=kpd;
       orapw=noturpassword;
       table=category_service;
       path='prda';
       assign=yes;
       rename catsrv_code=CATCODE catsrv_label=Service;
       list all;
       create work.myview.view out=outputdataset;
       select catsrv_code catsrv_label;
       subset where catsrv_code ='96';
    run;
    endrsubmit;

    Instead, we need a second PROC ACCESS step to create the data set:

    rsubmit;
    proc access dbms=oracle;
       create work.mytest.access;
       user=coevkpd;
       orapw=urnosey;
       table=category_service;
       path='prda';
       assign=yes;
       rename catsrv_code=CATCODE catsrv_label=Service;
       list all;
       create work.myview.view;
       select catsrv_code catsrv_label;
       subset where catsrv_code ='96';
    run;
    proc access viewdesc=work.myview out=oratable1;
    run;
    endrsubmit;

    Notice that I submitted this code to my SAS session running remotely. Even for a data set with only two variables and one observation, this is the most efficient way to use PROC ACCESS, for two reasons. First, the Unix server is far less bogged down with everyday traffic than my Windows server. Even if I had a copy of this ORACLE database available locally, SAS could probably read it faster "up there." Second, since I don't actually have a local copy of the data, it is much faster to access and manipulate the data up where it lives than to pull it through my network connection to Unix. Pulling it through the network is exactly what would happen if I submitted the code without the RSUBMIT.
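Once the view descriptor WORK.MYVIEW exists in the remote session, it can be read like any SAS data set. This is what "callable from any SAS procedure or DATA step" means in practice. The minimal sketch below assumes the descriptors above were created in the unixdata session. SVC96 is a hypothetical output name.

```sas
rsubmit;
   /* Reading the view descriptor sends its query to ORACLE */
   proc print data=work.myview;
   run;

   /* A DATA step can read it too, producing an ordinary SAS data set */
   data work.svc96;               /* SVC96 is a hypothetical name */
      set work.myview;
   run;
endrsubmit;
```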

    The LIBNAME statement

    The next option available to me is to reference the ORACLE instance where my data is stored with a LIBNAME statement:

    libname dwh1 oracle user=kpd password=stopasking
            path='dwh1' schema=cpeom;

    libname dwh1 oracle dbprompt=yes schema=cpeom;

    rsubmit;
       libname dwh2 oracle user=kpd password=iwonttell
               path='dwh1' schema=cpeom;
    endrsubmit;

    The first piece of code creates a local library reference to the remotely stored ORACLE data. The second demonstrates the DBPROMPT= option, discussed below. The main reason I can think of to set up the local libref is the same reason we used the SERVER= option earlier: it provides a way to look at and move the smaller data tables interactively.

    The third example shows the preferred method: remotely submitting the library reference, so that the ORACLE library is created as close to the data as possible. As with remotely submitting PROC ACCESS in the previous section, we are trying to avoid pulling data through the network until it is absolutely necessary, that is, until we have a subset of our data small enough for PROC DOWNLOAD or Remote Library Services. In case you are wondering, the SERVER= option presented in the SAS/CONNECT portion of this paper applies to the REMOTE library engine. The ORACLE keyword in this LIBNAME statement calls the ORACLE library engine instead, so we can't combine the two to get a local copy of a remote ORACLE library. Nice thought, though.

    The LIBNAME statement with options for the SAS/ACCESS to ORACLE engine has two distinct advantages over PROC ACCESS. First, by referencing the ORACLE instance (an instance is ORACLE's way of saying LIBRARY), you set up a reference to an entire group of tables at once, rather than creating a descriptor for each table. Second, the DBPROMPT= option tells SAS to prompt you for your user name, password, and path, rather than leaving them lying around in open code. (Note: this obviously will not work in batch SAS code. Nor will it work for a remote library reference, since you won't have access to the resulting prompt locally.)
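To put the preferred method into practice, do the subsetting on Unix through the remotely assigned DWH2 libref and download only the result. This is a minimal sketch. The table name SOMETABLE and the variable CTYCODE are hypothetical placeholders for a real table and column in the CPEOM schema.

```sas
rsubmit;
   /* Read through the ORACLE engine on Unix. The ORACLE engine can   */
   /* pass a simple WHERE= clause to ORACLE, so only matching rows     */
   /* need to reach SAS. SOMETABLE and CTYCODE are hypothetical names. */
   data work.smallpiece;
      set dwh2.sometable(where=(ctycode='01'));
   run;

   /* Only the (hopefully small) subset crosses the network */
   proc download data=work.smallpiece out=work.smallpiece;
   run;
endrsubmit;
```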

    SQL Pass-Through Facility

    For those of you familiar with SQL, the code for PROC ACCESS probably looked familiar. That is because SQL queries underlie most of what SAS does with SAS/ACCESS for relational databases. (SAS/ACCESS software works differently for database management systems that do not use SQL to operate on their data.) If you do not use SQL, don't know how to use SQL, and have no interest in learning SQL, then the SQL Pass-Through Facility is not for you. You can do pretty much everything you want to do with your DBMS data using PROC ACCESS or the LIBNAME statement, and never have to write any "real SQL code." But if you are going to work with large numbers of observations, or with many (50, 100, 250, etc.) related tables, you might want to start playing with SQL. Here is an example of what looks like a complicated SQL query. (It's really not that bad, but I am not teaching SQL today, so you will have to take my word for it.) It combines information from three different tables in a relational database with over 1 million total records, and it counts the total number of clients served per year by county:

    rsubmit wait=no;
    proc sql;
       connect to oracle (user=kpd orapw=mylipsrsealed
                          path='pwh1' schema=snp);
       create table querytable as
       select *
       from connection to oracle
          (select dates.year,
                  counties.cntyofres,
                  count(distinct services.recipient) as tot_served
           from snp.dates dates, snp.services services, snp.counties counties
           where dates.datekey = services.datekey
             and counties.countykey = services.countykey
           group by dates.year, counties.cntyofres);
       disconnect from oracle;
    quit;
    endrsubmit;

    *rdisplay unixdata;
    /* Pick one of us, not both */
    *rget unixdata;

    There are several key points. First, to toot SQL's horn a little, notice three things. The query did not require sorting the database to perform BY-group processing, it produced a frequency count for me, and it essentially produced a report data set of total clients served by county and year.

    Second, what you may not have noticed is probably the most exciting part of this SQL code: the CONNECT TO ORACLE statement and the SELECT * (SQL for "all columns") FROM CONNECTION TO ORACLE clause. These pieces leave SAS entirely and run the query from within the ORACLE database itself. SAS is then passed the results of this ORACLE SQL query, which it uses to make the data set QUERYTABLE. This is by far the most efficient way to run a query against a database this large and complicated. It lets ORACLE do the work it was designed to do, and then lets SAS do the rest. The query could have been submitted on Unix using an ORACLE library reference such as DWH2 from my LIBNAME example, but that would have been slower than the SQL Pass-Through version. It could also have been run using the local libref DWH1, but that would have been by far the slowest option (for queries against HUGE data sets, slower by HOURS).
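For comparison, here is roughly what the slower alternative would look like: the same query run through an ORACLE libref on Unix instead of pass-through. This is a sketch with several assumptions. The libref SNPLIB is hypothetical, because DWH2 above points at a different schema. The password is a placeholder, and the table and column names are copied from the pass-through query. In this form, PROC SQL parses the query itself, so depending on your release, SAS rather than ORACLE may end up doing much of the join. That is presumably why the text expects this version to be slower.

```sas
rsubmit;
   /* Hypothetical libref to the SNP schema used by the pass-through query */
   libname snplib oracle user=kpd password=XXXXXXXX path='pwh1' schema=snp;

   /* PROC SQL's own SQL: SAS decides what, if anything, goes to ORACLE */
   proc sql;
      create table querytable2 as
      select d.year,
             c.cntyofres,
             count(distinct s.recipient) as tot_served
      from snplib.dates d, snplib.services s, snplib.counties c
      where d.datekey = s.datekey
        and c.countykey = s.countykey
      group by d.year, c.cntyofres;
   quit;
endrsubmit;
```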

    Also, because this job was submitted remotely with WAIT=NO, you can run other SAS procedures locally while it runs in your remote SAS session. The last two lines of code bring us back to SAS/CONNECT. The RDISPLAY and RGET commands are used with the WAIT=NO option to go up to the remote server at a later time and pull down the SAS log and any output printed to the LISTING destination. RGET puts these results into your local LOG and OUTPUT windows, respectively, while RDISPLAY opens two additional windows to display this output separately. Of the two, I prefer RGET, because you can combine it with PROC PRINTTO to save a local copy of the remote SAS session's log and output, separate from your local SAS session log:

    proc printto log='remote.log' print='remote.lst' new;
    run;
    rget unixdata;
    proc printto;
    run;

    I haven't figured out a good way to do this with RDISPLAY output, other than interactively copying the LOG or OUTPUT and pasting it into another text file for later.

    Conclusion

    This paper was intended to present just some of the many ways to use SAS/CONNECT and SAS/ACCESS software, and to describe the pros and cons of the ways presented.


    Hopefully these suggestions CONNECTed with you and will serve to make these two valuable packages more ACCESSible to you.

    References

    SAS Institute Inc. (1999), Communications Access Methods for SAS/CONNECT and SAS/SHARE Software, Version 8, Cary, NC: SAS Institute Inc.

    SAS Institute Inc. (1999), SAS/CONNECT User's Guide, Version 8, Cary, NC: SAS Institute Inc.

    SAS Institute Inc. (1999), SAS OnlineDoc, Version 8, Cary, NC: SAS Institute Inc.

    SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

    Other brand and product names are registered trademarks or trademarks of their respective companies.

    Contact Information

    Please send questions, comments and suggestions to:

    Kevin Delaney
    NYS Office of Mental Health
    44 Holland Ave
    Albany, NY 12229
    (518) [email protected]