Introduction to R in IBM SPSS Modeler
A guide for SPSS Users
Wannes Rosius, Belgium, IBM
Goal of this guide
Although there are several very good articles and blogs on IBM SPSS Modeler, in my role as a technical professional for IBM Analytical solutions we still see many people struggling with both R and the integration between IBM SPSS Modeler and R.
The idea of this document is certainly not to replace the very useful links listed below, but to build on them so that people who know IBM SPSS Modeler and have only a very limited knowledge of R can use this integration.
After going through sections 2, 3 and 4, the reader should understand the R integration within SPSS at a high level and be able to (re)create some very basic R models within SPSS, even with only a basic knowledge of R.
Section 5 contains more detailed tips, tricks and other things. This part is for the experienced user and can be read as a loose list of items that might help you get up to speed with some more detailed functionality of the integration and understand some pitfalls.
Throughout the document we include R examples that can easily be copied into the appropriate R node in IBM SPSS Modeler. Unless specified otherwise, these code snippets are based on the telco.sav dataset, which can be found in the demo folder of your SPSS Modeler installation. After the source node, attach a type node and then the appropriate R node. Sometimes, however, only an extract of code is shown to convey the idea; it will be clearly mentioned when the code is incomplete. You will find this code in several code frames throughout this document. Furthermore, all the SPSS streams and assets are embedded in the PDF, symbolized by an icon. You can access them by right-clicking within this PDF document.
Some useful links
• Essentials for R - Installation Instructions
• User Guide IBM SPSS Modeler 18 R Nodes
• Modeler essentials for R: Downloads
• SPSS Modeler and R integration - Getting started
IBM SPSS Modeler and R
Contents
1 System Setup
  1.1 Installing R
  1.2 Enabling the R nodes
2 R basics
3 The basics of R nodes in IBM SPSS Modeler
  3.1 The R nodes
  3.2 Simple R code example
    3.2.1 modelerData
    3.2.2 modelerDataModel
    3.2.3 modelerModel
  3.3 Some general remarks
  3.4 Read data options
4 Custom Dialog builder
  4.1 Tools
  4.2 Custom dialog
  4.3 Simple example
5 Tips & tricks: Some more detailed
  5.1 R code
    5.1.1 ibmspsscf70 library
    5.1.2 Some useful parts of R code
  5.2 Custom Dialog builder
    5.2.1 How to save and share a custom dialog
    5.2.2 Link to dialog and script
  5.3 What about SQL Pushback, Hadoop pushback
  5.4 What about real-time scoring and Solution Publisher
  5.5 Something more about the metadata in modeler and the consequences on R integration
1 System Setup
Let us start with the setup of your system. For now, we assume that you have a valid installation of IBM SPSS Modeler on your machine. For more installation topics, we refer to the Installation Instructions.
1.1 Installing R
Depending on the version of your IBM SPSS Modeler, you will have to install a different version of R:
SPSS version   R version   R download link
16.0           2.15.2      https://cran.r-project.org/bin/windows/base/old/2.15.2
17.0           3.1         https://cran.r-project.org/bin/windows/base/old/3.1.0
17.1           3.1         https://cran.r-project.org/bin/windows/base/old/3.1.0
18.0           3.2         https://cran.r-project.org/bin/windows/base/old/3.2.0
Once you have downloaded and installed it, you will have a working R instance on your machine/server. As with SPSS Modeler, you can have several versions of R installed on your machine without any problem.
1.2 Enabling the R nodes
You will need to install the IBM SPSS Modeler Essentials for R. You can find these on the SPSS Community Downloads page. Click "2 Get Essentials for SPSS" and then click the button "Get R Essentials for SPSS Modeler". This will take you to GitHub, where you can select and download the Modeler 18 Essentials for R for a variety of platforms. If you require Essentials for R for earlier Modeler versions, there is also a link to legacy versions.
Run this executable file. The installation will ask you for the path of your R installation and the path to the bin files of your SPSS Modeler installation. (Note that the prefilled path is the default path to a Modeler Server; you will need to change this if you want to configure your client.) This installation places the R nodes in your SPSS Modeler node palette and also adds the necessary R libraries to your R installation folder.
2 R basics
There is already an overflow of R courses (publicly) available through several channels, so we would certainly not want to replace these. It is also not very important that you are an R expert to follow this document. However, there are still some basics of R code and R terminology that users need to understand in order to exploit the integration of R and IBM SPSS Modeler. For this section, let us open R in its original GUI: go to the R installation folder and open bin\x64\Rgui.exe. A window will open.
This is the R console, ready for commands to run. You might often hear the term RStudio, which is nothing more than a development environment on top of this R GUI. Installation of RStudio is not required for this introduction but might be handy for further use.
We will start the R introduction by stating that R is a powerful programming language and environment for statistical computing and graphics. An important part of that last phrase is that R is a programming language, unlike IBM SPSS Modeler. That means it is built on objects that are defined by the user. As an example, take the following R code (feel free to type it in the R console to see the R outputs):
    x <- 1 + 1
    y <- 2 * x
    xyVector <- c(x, y)
    z <- mean(xyVector)
    print(z)
Here x is an object. The first statement fills the object x with the value of the evaluated formula 1 + 1, being 2. So whenever the program refers to x, it will be interpreted as 2. In the second line we define y as twice the value of x. In the third line we create a vector containing the content of x and y; the fourth line calculates the mean of these 2 objects and places it in an object z.
The operator <- could also be replaced by =, but for various reasons many R users prefer this way of writing (the two are actually not exactly the same, but that can be ignored for the purpose of this document). If you feel more comfortable using =, please do so.
Just as we filled x, y and z with numbers, any R object can be filled with a variety of types. Here is a list of the most important ones for our purposes:
Vector: a sequence of data elements of the same type (e.g. numeric or character). This includes vectors of length 1, which can be interpreted as just being numbers. You can create a vector with the R function c(). So in the example code above, the values of x, y and z are all vectors of length one; xyVector is a vector of length 2 containing the value of (the vector) x followed by (the vector) y. Linking it back to SPSS, you can interpret a vector as the values of a single data column.
Data frame: a list of vectors of equal length. If you look at a vector as the values of a variable, a data frame can be interpreted as a 2-dimensional dataset with columns (the number of vectors) and lines (the size of each vector).
    n  <- c(2, 3, 5, 3, 9)                     # A first vector of 5 numeric values
    n2 <- c(1, 3, 2, 5, 4)                     # A second vector of 5 numeric values
    s  <- c("aa", "bb", "cc", "aa", "zz")      # A third vector of 5 string values
    b  <- c(TRUE, FALSE, TRUE, TRUE, TRUE)     # A fourth vector of 5 flag values
    Data <- data.frame(n, s, b, New = n + n2)  # A data frame containing 4 vectors
    # Note: n + n2 will be a new vector called New with the sums: c(3, 6, 7, 8, 13)

    dim(Data)       # Will show you it is a 5x4 dataset
    Data[2, 4]      # Will give back the value on the 2nd line, the 4th column
    colnames(Data)  # Will give the column names as a vector ("n", "s", "b", "New")
    Data$n[1]       # Will give back the first value of the vector n within the data frame

    iris            # Predefined data frame
There are also several predefined data frames installed with R. One of them is called iris. Sometimes this document will refer back to iris.
Model class: actually a specific list containing predefined objects that define a statistical model. For example, a linear model class is a list containing, among others, the coefficients of the regression model.
List: an ordered collection of objects. As an example, you can have a list where the first element is a vector, the second is a data frame and the third is a model. Note that a data frame is a special type of list where all the elements are vectors of equal size.
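The relation between lists, data frames and model objects can be checked directly in the R console. The sketch below uses only base R and the built-in iris data; the object names (myVector, myFrame, myList, fit) are made up for illustration.

```r
# A list may hold elements of different types
myVector <- c(1, 2, 3)
myFrame  <- data.frame(id = 1:3, label = c("a", "b", "c"))
fit      <- lm(Sepal.Length ~ Petal.Length, data = iris)  # a linear model
myList   <- list(vec = myVector, frame = myFrame, model = fit)

# Elements are retrieved by name with $ or by position with [[ ]]
myList$vec
myList[[2]]

# A data frame and a fitted model are both lists underneath
is.list(iris)     # TRUE
is.list(fit)      # TRUE
class(fit)        # "lm"
fit$coefficients  # the regression coefficients stored in the model
```

Typing names(fit) in the console lists the other predefined elements of the model object.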
3 The basics of R nodes in IBM SPSS Modeler
3.1 The R nodes
Once the installation of the R essentials is done, you will see 3 new nodes in your node palettes. There is also a 4th R node, which is the R nugget. Understanding these 4 objects and the differences between them is essential.
Output: with this node, data is sent to R but never goes back to SPSS (as it is a terminal node). The only things that can go back to SPSS are the outputs generated by R, which are presented within an SPSS output window.
Transform: data goes from SPSS to R and also goes back to SPSS, after which the SPSS process can continue.
Model: like the output node, this is a terminal node, so data will not go back to SPSS. However, a reusable R object is created within a nugget.
Nugget: similar to the transform node, with the difference that there is a reusable R object that can be used in the R code.
Name                R output node   R transform node   R model node   R syntax node
Palette tab         Output          Record Ops         Model          NA
Data back to SPSS   No              Yes                No             Yes
Reusable R object   No              No                 Create         Use
3.2 Simple R code example
Let us start by saying that all the examples in this section are intentionally kept very simple, so as to explain the interaction in a functional and structured way and to be simple enough for non-R programmers. We are certainly aware that most of the R code snippets we show in this chapter could also easily be implemented using standard SPSS Modeler functionality.
There are 3 very important, reserved R objects that you should keep in mind when you use the SPSS Modeler R integration. Here is a brief description of these 3, after which we will go into more detail for each of them.
modelerData: an R data frame that is filled with the data entering this R node. This data frame can be used and changed within your R code. Eventually it is also the data frame that is sent back to SPSS Modeler as a dataset. Note that it only carries the content of the data, not (necessarily) the data column names and other metadata items.
modelerDataModel: also an R data frame, containing the metadata of the data that is sent to R and back to SPSS Modeler. It contains most of the information that you would expect within an SPSS Modeler type node. This is the object that will seem strangest to experienced R users.
modelerModel: an R object that the user can fill with any type of object. It does not need to have a certain structure. It is calculated in the R model node, after which it is saved within the R nugget, where it can be used in the R syntax.
Note that R code is case-sensitive, and therefore so are these object names. In the following sections we will explain the usage of these objects.
3.2.1 modelerData
modelerData is the R data frame that is filled with the dataset coming from the upstream SPSS Modeler node. So you can use this data frame to perform the desired calculations, transformations and outputs in R.
Place the following code in an R output node
    # Print the first 6 lines of the data
    head(modelerData)

    # Give a summary of the data
    summary(modelerData)

    # Create a histogram of the variable tenure (in months)
    hist(modelerData$tenure, xlab = "months", main = "Tenure histogram")

    # Change the tenure unit from months to years
    modelerData$tenure <- modelerData$tenure / 12

    # Recreate the histogram, now in years
    hist(modelerData$tenure, xlab = "years", main = "Tenure histogram")
Execution of this node results in an SPSS Modeler output window in which all the R outputs are assembled. These are always divided into 2 tabs: Text output and Graph output.
In this case the text output is linked to the code on lines 2 and 5: first head() prints the first 6 lines of the data, next summary() gives summary statistics for each column.
The graph output consists of two histograms: one for the tenure in months, the other for the same column after redefining it by dividing the original value by twelve to give the tenure in years (note the X-axis scale).
As shown in the example stream Explain modelerData.str, you can also copy exactly the same code into a transform node and attach a table node to it. After running this table node you will not see any R output (as none is expected): even though the output code has run, no outputs are shown. However, the modelerData data frame is sent back to SPSS Modeler, so you will see the value of tenure divided by 12.
3.2.2 modelerDataModel
Metadata is very important in SPSS Modeler. For simplicity, let us say that within Modeler metadata is represented by the type node. By metadata we mean the type of each of the variables in the dataset (numeric, flag, string, storage, ...). At all times Modeler knows exactly all the metadata at every step in the stream.
R does not handle metadata in the same way as SPSS Modeler. As mentioned, modelerDataModel takes over the role of the type node. This is done by a data frame (= dataset) of the following structure:
               X1                     X2                    ...   Xn
fieldName      region                 tenure                ...   age
fieldLabel     Geographic indicator   Months with service   ...   Age in years
fieldStorage   real                   real                  ...   real
fieldMeasure   nominal                continuous            ...   continuous
fieldFormat    standard               standard              ...   standard
fieldRole      input                  input                 ...   input
This means that this dataset always has 6 lines with fixed names (yes, in R the lines have names as well). The catch is that it is entirely the responsibility of the user to align this metadata with the appropriate data. So if we would like to add a variable with R, the user must also manually add a column to modelerDataModel to make sure modelerData correctly goes back to SPSS Modeler. In the earlier example we did not make any changes to modelerDataModel, and none were needed as the metadata did not change (dividing a number by 12 does not change the metadata). Now let us continue with the previous example, but rather than changing the value of tenure in the same variable, we create another one. As a result, we have to update the metadata.
    # Create the vector of tenure in years
    Rcolumn <- modelerData$tenure / 12

    # Paste this vector to the right of the dataset
    modelerData <- cbind(modelerData, Rcolumn)

    # Create the metadata for the column to add
    newVar <- c(fieldName = "tenureYears", fieldLabel = "", fieldStorage = "real",
                fieldMeasure = "", fieldFormat = "", fieldRole = "")

    # Paste the new column metadata to the existing metadata
    modelerDataModel <- cbind(modelerDataModel, newVar)
Running a table node downstream of this transform node will show you the new variable with the name tenureYears. There are some important things to realize here:
• fieldName and fieldStorage are the only 2 rows that must be filled in for any new column. In the code we left all the other lines empty, meaning they will be filled in by the stream default. For a list of available values we refer to the user guide.
• As modelerDataModel is only useful when you go back to SPSS Modeler, you will generally only use or change this object in non-terminal R nodes. It might still be handy to use it in terminal nodes if the value of modelerDataModel is important for your output (e.g. to run a histogram of all continuous variables).
• When data goes back to SPSS Modeler, only the content of the data frame is passed, excluding the column and row names. That means that even though the column in modelerData is called Rcolumn, the name of the column in SPSS is defined only by the metadata in the row fieldName. In this case it is called tenureYears.
• The only link between modelerData and modelerDataModel is the order of the columns; nothing is matched by name. The first column in the data is given the metadata of the first column of modelerDataModel. If the metadata (modelerDataModel) does not match modelerData, an error is thrown. The table below shows schematically how this works.
In R, modelerData:
RName1   ...   RNamen
x11      ...   x1n
x21      ...   x2n
...
xm1      ...   xmn

In R, modelerDataModel:
               X1      ...   Xn
fieldName      Name1   ...   Namen
fieldLabel
fieldStorage   xxx     ...   xxx
fieldMeasure
fieldFormat
fieldRole

In SPSS, the resulting dataset:
Name1   ...   Namen
x11     ...   x1n
x21     ...   x2n
...
xm1     ...   xmn

Note that only the names from modelerDataModel are used.
Since this concept is very strange to standard R users, we found this part the most difficult to explain. To people who know SPSS, you can summarize it as modelerDataModel taking over the role of the type node.
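Because the two objects are matched purely by column position, a quick sanity check before handing data back can save some debugging. Below is a minimal sketch of such a check. checkAlignment is a hypothetical helper, not part of the IBM integration, and the two small objects at the top merely stand in for what Modeler would provide.

```r
# Stand-ins for the objects Modeler normally provides (made-up values)
modelerData <- data.frame(region = c(1, 2, 3), tenure = c(12, 24, 60))
modelerDataModel <- data.frame(
  X1 = c("region", "", "real", "", "", ""),
  X2 = c("tenure", "", "real", "", "", ""),
  row.names = c("fieldName", "fieldLabel", "fieldStorage",
                "fieldMeasure", "fieldFormat", "fieldRole"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: compare the two objects column by column
checkAlignment <- function(data, dataModel) {
  if (ncol(data) != ncol(dataModel)) {
    stop("modelerData has ", ncol(data),
         " columns but modelerDataModel describes ", ncol(dataModel))
  }
  spssNames <- vapply(dataModel["fieldName", ], as.character, character(1))
  data.frame(position   = seq_len(ncol(data)),
             nameInR    = colnames(data),
             nameInSPSS = unname(spssNames),
             stringsAsFactors = FALSE)
}

# Add a column and its metadata, then check the pairing
modelerData <- cbind(modelerData, Rcolumn = modelerData$tenure / 12)
newVar <- c(fieldName = "tenureYears", fieldLabel = "", fieldStorage = "real",
            fieldMeasure = "", fieldFormat = "", fieldRole = "")
modelerDataModel <- cbind(modelerDataModel, newVar)
alignment <- checkAlignment(modelerData, modelerDataModel)
print(alignment)
```

The printed table pairs each R column name with the name SPSS will show, so a forgotten or misplaced metadata column is visible at a glance.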
You can find all streams and R scripts explaining modelerDataModel here.
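As an illustration of using modelerDataModel in a terminal node, the sketch below draws a histogram for every field whose measurement level is continuous. The small stand-in objects are made up so that the snippet also runs in a plain R session; inside an R output node only the second half would be needed. The value "continuous" in the fieldMeasure row follows the structure table shown earlier in this section.

```r
# Stand-ins for the objects Modeler normally provides (made-up values)
modelerData <- data.frame(region = c(1, 2, 3, 1),
                          tenure = c(12, 24, 60, 36),
                          age    = c(30, 45, 52, 28))
modelerDataModel <- data.frame(
  X1 = c("region", "", "real", "nominal", "", "input"),
  X2 = c("tenure", "", "real", "continuous", "", "input"),
  X3 = c("age", "", "real", "continuous", "", "input"),
  row.names = c("fieldName", "fieldLabel", "fieldStorage",
                "fieldMeasure", "fieldFormat", "fieldRole"),
  stringsAsFactors = FALSE
)

# Read the measurement level of every column from the metadata
measures <- vapply(modelerDataModel["fieldMeasure", ], as.character, character(1))
continuousCols <- which(measures == "continuous")

# One histogram per continuous field, titled with the name known to SPSS
for (i in continuousCols) {
  fieldName <- as.character(modelerDataModel["fieldName", i])
  hist(modelerData[[i]], main = fieldName, xlab = fieldName)
}
```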
3.2.3 modelerModel
modelerModel is the R object that is stored within the R nugget. This object is populated within the R model node, after which you can use modelerModel within the R nugget for scoring. This works very much the same way as IBM SPSS Modeler itself works: you ask a model node to calculate a formula, after which that formula is stored within the nugget together with the way it should be used to calculate a score.
You will only use this object within the R model node and nugget. Note that within the R model node there are 2 syntax windows:
R model building: calculates whatever you want to store within modelerModel for use in your nugget calculations to score your data. This can be any object within R. Like any SPSS model builder node, this is a terminal node, meaning that no data goes back to SPSS Modeler from this code, apart from outputs and whatever is stored within the modelerModel object.
R model scoring: the syntax that defines how you use the object modelerModel, containing the content you stored in it in the R model building syntax, to derive the new data. Apart from the use of modelerModel, this is very similar to the R transform node.
Let us start with a simple example where we would like to create a basic linear model for the variable tenure. The formula of this model should be saved in modelerModel, after which it can be used in the scoring.
R model building syntax:
    # Create the model and save it in modelerModel
    modelerModel <- lm(tenure ~ age + region + ed + income, data = modelerData)

    # Add some summary of the model in the nugget
    summary(modelerModel)

    # Together with a histogram of the residuals
    hist(modelerModel$residuals, main = "residual histogram")

    # And the predicted vs. actuals scatterplot
    plot(modelerData$tenure, modelerModel$fitted.values, xlab = "actual", ylab = "predicted")

    # All of these outputs will be stored in the modeler nugget tabs
R model scoring syntax:
    # Use the model to make a prediction and add it to the existing data
    pred <- predict(modelerModel, modelerData)
    modelerData <- cbind(modelerData, pred)

    # Take care of the metadata
    newVar <- c(fieldName = "$L-tenure", fieldLabel = "", fieldStorage = "real",
                fieldMeasure = "", fieldFormat = "", fieldRole = "")
    modelerDataModel <- cbind(modelerDataModel, newVar)
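The split between building and scoring can be rehearsed in a plain R session before pasting anything into Modeler. The sketch below mimics both steps with the built-in iris data standing in for modelerData. The function names buildStep and scoreStep are hypothetical and only mark which part would go in which syntax window; the metadata update is left out for brevity.

```r
# Build step: fit a model and return it (what would be stored in modelerModel)
buildStep <- function(modelerData) {
  lm(Sepal.Length ~ Petal.Length + Petal.Width, data = modelerData)
}

# Score step: uses only the stored model and the incoming data
scoreStep <- function(modelerModel, modelerData) {
  pred <- predict(modelerModel, modelerData)
  cbind(modelerData, pred)
}

# Build on the odd rows, score the even rows
trainRows    <- seq(1, nrow(iris), by = 2)
modelerModel <- buildStep(iris[trainRows, ])
scored       <- scoreStep(modelerModel, iris[-trainRows, ])
head(scored)
```

Keeping the score step free of anything but the stored object and the incoming data mirrors what the nugget has available at scoring time.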
It is important to note that modelerModel can be filled with any type of object; it will very often be of a model class, but it does not have to be. In the previous example the object stored in it was clearly a (statistical) model. In the next example we will just save 2 numbers within the modelerModel object. Imagine we want to calculate the z-values of a certain variable. In order to create the z-values we need the mean and the standard deviation of the column. We will store both of these within modelerModel, after which we will use them in the scoring syntax (see footnote 1).
R model building syntax:
    # Calculate mean and standard deviation
    M <- mean(modelerData$tenure)
    SD <- sd(modelerData$tenure)

    # And save them in a list called modelerModel
    modelerModel <- list(avg = M, sDev = SD)
R model scoring syntax:
    # Calculate z scores using the elements in modelerModel
    zTenure <- (modelerData$tenure - modelerModel$avg) / modelerModel$sDev
    modelerData <- cbind(modelerData, zTenure)

    # Define new metadata column and add it
    newVar <- c(fieldName = "zTenure", fieldLabel = "", fieldStorage = "real",
                fieldMeasure = "", fieldFormat = "", fieldRole = "")
    modelerDataModel <- cbind(modelerDataModel, newVar)
Footnote 1: Note that there is a very good reason why this is not combined into a single R transform node; it is explained in section 3.4.
You can find all streams and R scripts explaining modelerModel here.
3.3 Some general remarks
• Although it might seem this way, you are not required to build modelerData from the existing data within that frame. modelerData is filled with the dataset you have in Modeler; however, nothing stops you from throwing that data away in R and defining new data coming from another data source in R. As an example, take this link from the Weather Company website, which gives the weather history in Brussels, Belgium for November 2015. We can use R code as a source node by just overwriting modelerData and redefining modelerDataModel.
    # Define the link
    linkPath <- "http://www.wunderground.com/history/airport/EBBR/2015/11/01/MonthlyHistory.html?req_city=Brussels&format=1"
    # Read the data as csv
    modelerData <- read.csv(linkPath)
    modelerData[, 1] <- as.Date(modelerData[, 1])

    # Redefine modelerDataModel: all columns are real numbers, except the first, which is the date
    modelerDataModel <- as.data.frame(t(data.frame(fieldName = colnames(modelerData),
        fieldLabel = "", fieldStorage = c("date", rep("real", ncol(modelerData) - 1)),
        fieldMeasure = "", fieldFormat = "", fieldRole = "")))
  As you can see, this code does not use the old definition of the predefined R objects but completely redefines them. Placing this in an R transform node will give this new dataset back to Modeler. In this way you can create an R input node.
  You can find an example here.
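When R acts as a source, the metadata has to be written by hand, which gets tedious for wide datasets. A possible shortcut is to derive the fieldStorage row from the R column classes. The sketch below does this with a hypothetical helper, makeDataModel. The mapping from R classes to the storage names (integer, real, string, date) is an assumption to be checked against the user guide, and the sample data is made up.

```r
# Hypothetical helper: derive the 6-row metadata frame from a data frame
makeDataModel <- function(data) {
  # Assumed mapping from R column class to Modeler storage name
  storageOf <- function(column) {
    if (inherits(column, "Date")) "date"
    else if (is.integer(column)) "integer"
    else if (is.numeric(column)) "real"
    else "string"
  }
  storage <- vapply(data, storageOf, character(1))
  as.data.frame(t(data.frame(
    fieldName = colnames(data), fieldLabel = "",
    fieldStorage = unname(storage),
    fieldMeasure = "", fieldFormat = "", fieldRole = "",
    stringsAsFactors = FALSE)), stringsAsFactors = FALSE)
}

# Made-up data standing in for whatever R reads from an external source
modelerData <- data.frame(day     = as.Date(c("2015-11-01", "2015-11-02")),
                          maxTemp = c(14.5, 16.0),
                          events  = c("Rain", ""),
                          stringsAsFactors = FALSE)
modelerDataModel <- makeDataModel(modelerData)
print(modelerDataModel)
```

The remaining rows are left empty on purpose, so that Modeler fills them with the stream defaults as described in section 3.2.2.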
• Within an R model node there is room for 2 scripts. The scoring script is the script that will be populated within the R nugget; it is not run when you run the model node. As a result, these 2 scripts are independent. The only thing they share is the value of the object modelerModel, which is saved within the nugget when running the building syntax and picked up within the R scoring syntax.
  This also means that any R libraries that are required should be loaded in both scripts. Take for example a random forest model:
R model building syntax:
    # Load the library
    library(randomForest)
    # Create the model and save it in modelerModel
    modelerModel <- randomForest(tenure ~ age + region + ed + income, data = modelerData,
                                 ntree = 50)
R model scoring syntax:
    # Load the library
    library(randomForest)

    # Use the model to make a prediction and add it to the existing data
    pred <- predict(modelerModel, modelerData)
    modelerData <- cbind(modelerData, pred)

    # Take care of the metadata
    newVar <- c(fieldName = "$RF-tenure", fieldLabel = "", fieldStorage = "real",
                fieldMeasure = "", fieldFormat = "", fieldRole = "")
    modelerDataModel <- cbind(modelerDataModel, newVar)
• Talking about libraries and packages: a package is a collection of R objects defined for a certain purpose. These are often specific statistical functionalities, like randomForest in the example above. A basic R installation comes with the standard packages; however, many more packages are made available by the R community on CRAN.
  Packages need to be installed and made locally available in libraries. Once the package is installed on the system as a library, you can load it in any R session with the code library(<name>).
  To install a package you have several options. The easiest is probably to run code like install.packages("randomForest") within R. You will have to select a CRAN mirror from which the package will be downloaded, and the download will go automatically. Normally you only have to do this once.
  Although possible, it is not recommended to run this package installation command from within SPSS. The reason is that the libraries will then be saved in a temporary folder and deleted afterwards. If you still want to do this through SPSS, you will have to hard-code the installation path.
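Since installation is best done once in a regular R session, a script running inside Modeler can simply verify that a package is present and fail with a readable message if it is not. The following is a minimal sketch of that idea. loadRequired is a hypothetical helper name, and the base package stats is used in the example call only because it is always available.

```r
# Hypothetical helper: load a package, or stop with a clear message
loadRequired <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is not installed. Run install.packages(\"",
         pkg, "\") once in a regular R session first.")
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Example call with a package that ships with every R installation
loadRequired("stats")
```

In the random forest scripts above, loadRequired("randomForest") could replace the plain library() call in both the building and the scoring syntax.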
3.4 Read data options
Something we have ignored until now are the settings within the node under Read data options. The basics of the R integration with Modeler can be done without knowing these, as they require some more advanced R knowledge. The user guide has a good explanation of these items.
However, there is one more thing that might be important. For Modeler version 17 and lower, the R integration of non-terminal nodes (i.e. transform nodes and nuggets) is by default done in batches of 1000 rows. The reason for this was to allow these R nodes to work on Hadoop and other clustered environments. As a result, it is very important to realize that any R calculation that spans multiple rows of data would lead to false results. For a workaround we refer to section 5.1.1.
Take as an example the z scores above. If we calculated the mean and the standard deviation of the variable in a non-terminal node, the code would first run on the first 1000 lines of data. That leads to a specific mean, standard deviation and corresponding z-scores. For the next 1000 lines, however, a new mean and standard deviation would be calculated, and the z scores would be based on these.
As a solution, the mean and standard deviation are calculated in the R model node (i.e. a terminal node) over all the data and used in the R nugget to calculate the z scores.
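The effect of batching can be reproduced in plain R. The sketch below uses simulated data (two blocks of 1000 rows with different means, chosen purely for illustration) and compares z scores computed per batch with z scores based on a mean and standard deviation computed once over all rows, as the model node and nugget combination does.

```r
set.seed(42)
# Simulated tenure: the first 1000 rows are centred on 20, the next 1000 on 50
tenure  <- c(rnorm(1000, mean = 20, sd = 5), rnorm(1000, mean = 50, sd = 5))
batchId <- rep(1:2, each = 1000)

# z scores computed separately within each batch of 1000 rows
zBatch <- ave(tenure, batchId, FUN = function(v) (v - mean(v)) / sd(v))

# z scores based on statistics computed once over all rows
stored  <- list(avg = mean(tenure), sDev = sd(tenure))
zGlobal <- (tenure - stored$avg) / stored$sDev

# Average z score per batch under both approaches
tapply(zBatch, batchId, mean)   # both batches average (almost exactly) zero
tapply(zGlobal, batchId, mean)  # first batch clearly below zero, second above
```

With batch-wise statistics the difference between the two blocks disappears from the z scores, which is exactly the false result described above.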
Note that batch processing may lead to a very slow integration between SPSS and R in the case of streaming R nodes in a local, non-clustered environment. However, as of IBM SPSS Modeler version 17.1 there is a default option not to use batch processing, or to increase the batch size. For the lower versions a workaround is possible if you still want to increase this batch size or turn it off (see later).
4 Custom Dialog builder
The Custom Dialog Builder allows you to create and manage R nodes with prefilled R code to use inside IBM SPSS Modeler streams. In this way users can create their own nodes. You can start the Custom Dialog Builder in the Tools menu under "Custom Dialog Builder".
When opening the Custom Dialog Builder you will see 2 windows. One of them is the custom dialog itself, the other is the toolset to populate the dialog.
4.1 Tools
The tools window is a list of items you can place within your dialog. This includes, among others, the field chooser, check and combo boxes, text and number controls, and tabs. You can select any of them and drag them onto the dialog itself.
Once you have an item in the dialog, you can select it and you will see the item properties. These are the properties of this specific item and depend on the type of item. The most important are the identifier (the way it will be referenced within the script) and the title (the one that will be visible in the dialog).
4.2 Custom dialog
The big gray window is the dialog itself. For the moment it is empty, as it still has to be populated with items from the Tools list.
Clicking on this gray dialog will show you the dialog properties below. The main items include the name and title of the dialog, the script itself, and the type and position of the created node.
With regard to the script to be written, the global rule is that you refer to the items within the dialog using their identifier between double percentage signs ("%%identifier%%").
Once you have finished creating the custom node, you can install it by pressing the green arrow in the toolbar. You can also save intermediate versions to disk.
4.3 Simple example
Let us create a custom dialog for the randomForest model created earlier in section 3.3. Below you will find a step-by-step approach.
The most important question is which parts of this code we want to make flexible for the user. In the case of this model there are 3 things we might want to make flexible: the input variables, the target and the number of trees in our forest.
First fill in the Custom dialog properties as indicated
In our fixed example the target is tenure, but a user might choose any other field. As a result, we will place a field chooser on the dialog. Change the properties as shown. The variable filter property allows you to select only categorical variables.
In our fixed example the inputs are age, ..., income, but a user might choose any other fields. As a result, we will place a second field chooser on the dialog. The biggest difference is that now we can select several variables, as there might be several inputs. To make it easier, we will separate these values by a +. Therefore change the properties as shown.
As a third custom choice we would like to add the number of trees in our forest. In the original script it was 50, so we will choose this as the default. However, users may choose any integer value between 1 and 1000. Add a number control to the dialog and change the properties.
Now the dialog is ready and we need to add the script to it. Go to the Edit options and choose "Script Template". This will bring you to an empty window for the script. In this case (as we selected that we wanted 2 scripts) there is a tab for the building code and one for the scoring script. If the scoring script is greyed out, you did not set the dialog property "Score from the Model" to True.
Let us start with the scoring script, as this is easier. The only thing that needs to be adapted for custom input is the variable name that will be sent back to SPSS. So copy the scoring code and change the name tenure to %%TARGET%% (TARGET being the identifier of the target).
Fill in the code for the building script and change, in a similar way as above, the values for the target and input variables together with the number of trees. Afterwards press OK to close the script window.
Back in the Custom Dialog Builder, save the dialog in any appropriate location. Also deploy the dialog by clicking on the green deploy arrow in the toolbar. Close the Dialog Builder.
Back in the stream, you will now see the new node in the model palette. You can use this node within your stream.
You can find the resulting cfe file here (place this in the correct location, see 5.2.1) and a stream where it is deployed.
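To get a feel for what the script template does, the substitution can be mimicked in plain R. The sketch below is only an imitation for illustration, since Modeler performs the replacement itself when the node runs. fillTemplate is a hypothetical helper, and the identifiers INPUTS and NTREE are assumed names for the second field chooser and the number control (only TARGET is named in the text).

```r
# Building script as it could look in the script template
template <- paste0("modelerModel <- randomForest(%%TARGET%% ~ %%INPUTS%%, ",
                   "data = modelerData, ntree = %%NTREE%%)")

# Hypothetical helper: replace every %%identifier%% by the chosen dialog value
fillTemplate <- function(template, values) {
  for (id in names(values)) {
    template <- gsub(paste0("%%", id, "%%"), values[[id]], template, fixed = TRUE)
  }
  template
}

# Values a user might pick in the dialog (inputs joined by " + ")
filled <- fillTemplate(template,
                       list(TARGET = "tenure",
                            INPUTS = "age + region + ed + income",
                            NTREE  = "50"))
cat(filled, "\n")
```

With these choices the filled-in line is identical to the fixed building script from section 3.3.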
5 Tips & tricks: Some more detailed
5.1 R code
5.1.1 ibmspsscf70 library
Let us now take a more detailed look at what actually happens with the code. First of all, it is worth checking what happens when you do the R installation correctly: it installs the R package ibmspsscf70, delivered by IBM, in the library folder of your R installation. This library contains several functions to handle the data traffic between SPSS and R.
Running any R node in SPSS will not only run the code you write, but also some extra code behind the scenes. You can see this code in the "Console output" window of the R node. Looking for example at this tab for an R nugget, you will see that your code is wrapped in something like:
    modelerModel <- ibmspsscfoutput.GetModel()
    while (ibmspsscfdata.HasMoreData()) {
      modelerDataModel <- ibmspsscfdatamodel.GetDataModel()
      modelerData <- ibmspsscfdata.GetData(rowCount = 1000, missing = NA, rDate = "None",
                                           logicalFields = FALSE)
      ...
    }