Paper SP12-2008

Bayesian Data Analysis Using %WinBUGS

Lei Zhang, Celgene Corporation

ABSTRACT

WinBUGS is a powerful statistical tool for Bayesian analysis using Markov chain Monte Carlo (McMC) methods. It has been used to construct and analyze a wide variety of Bayesian models in many application areas; however, it has very limited capabilities for data manipulation, graph customization, and comparison with other statistical methods. The SAS System is a dominant statistical platform with rich functions for data management, graph generation, and analytical statistical solutions. Making WinBUGS work with SAS therefore creates a much-wanted synergy. This paper introduces a macro called %WinBUGS that lets you perform a Bayesian analysis using WinBUGS from within the SAS System. With %WinBUGS, you can convert SAS datasets into WinBUGS data files, invoke WinBUGS to perform the intended Bayesian analysis, and then bring the results back into SAS for further analysis and reporting. All you need to do is create a WinBUGS analysis file and submit it to %WinBUGS for execution. In this paper, I first describe the mechanism used in %WinBUGS and explain the syntax of %WinBUGS directives, and then give examples that demonstrate how to write WinBUGS analysis files to automate Bayesian analyses using features from both SAS and WinBUGS.

INTRODUCTION

Bayesian inference has become increasingly popular in recent years, largely because of advances in Markov chain Monte Carlo (McMC) methods, especially Gibbs sampling techniques. WinBUGS (the Windows version of Bayesian inference Using Gibbs Sampling) is the most developed software package for carrying out McMC methods to analyze a broad range of Bayesian models. The package can be downloaded at no cost from the website www.mrc-bsu.cam.ac.uk/bugs. Its functions and usage are fully described in the accompanying online manual and on many other Bayesian websites.

As a stand-alone application, WinBUGS is usually run interactively with menus and toolbars. Since version 1.4, WinBUGS has provided a simple scripting language, so that a Bayesian analysis can be driven by a script from an external application. SAS can invoke external applications with its built-in facilities, such as I/O pipes or X statements; it is thus possible to run WinBUGS from within SAS. However, the major obstacles to making the two systems work together are their very different languages and data formats, and the way a WinBUGS script runs in batch mode. To overcome these problems and others, I developed the %WinBUGS macro, which simplifies all the routine tasks of collaboration between SAS and WinBUGS so that you can focus on Bayesian model construction and analysis using the features of both systems.

WHAT DOES WINBUGS DO?

WinBUGS was developed to carry out McMC computations on a broad range of statistical models within the Bayesian framework, which treats all quantities as random variables. The model it assumes consists of a joint distribution over all unobserved quantities, such as parameters (or nodes, in WinBUGS terminology), and observed quantities, such as collected data. It then conditions on the data to obtain a posterior distribution over the parameters through Bayes' theorem. To obtain inferences on the unknown quantities of interest, it marginalizes the posterior distribution by using McMC simulation techniques [1][2]. Instead of using exact formulas or numerical approximations for the estimates, WinBUGS generates a stream of simulated values from the posterior distribution of each unknown quantity, and then makes inferences from them. The exact sampling method used by WinBUGS varies with the type of model. In the simplest case, a tractable conjugate prior distribution is used and values can be sampled directly; in more complicated cases, where there is no simple way to sample directly, the posterior distribution is sampled with Gibbs or Metropolis-Hastings methods. WinBUGS is thus especially valuable for complex statistical models that lack any feasible analytic solution. WinBUGS offers an interactive GUI environment [3], through which you can

1. Create a model specification that contains definitions of the likelihood and prior distributions for data and parameters, using a syntax similar to that of the S-Plus or R language,

2. Load observed data and provide sets of initial parameter values for a specified number of Markov chains,

3. Sample parameters from the posterior distribution with a large number of iterations using McMC, and

4. Finally, save the simulated values of the parameters of interest and use them for summary statistics, chain convergence diagnostics, and statistical inference.
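The sample-then-summarize idea behind steps 3 and 4 can be illustrated with a minimal random-walk Metropolis sketch in Python. It does not reproduce how WinBUGS selects or implements its samplers. The function names, the simplification that sigma is known (fixed at 20), the proposal step size, the burn-in length, and the seed are all assumptions made for illustration; the data are the nine Weight values of the regression example in this paper.

```python
import math
import random
import statistics

# Nine Weight values from the regression example in this paper.
WEIGHTS = [84, 98, 102.5, 84.5, 112.5, 50.5, 90, 77, 112]


def log_posterior(mu, data, sigma, prior_precision=1.0e-6):
    """Unnormalised log posterior of a normal mean (sigma assumed known)."""
    log_prior = -0.5 * prior_precision * mu * mu   # normal prior centred at 0
    log_lik = -0.5 * sum((y - mu) ** 2 for y in data) / (sigma * sigma)
    return log_prior + log_lik


def metropolis(data, sigma=20.0, n_burn=2000, n_keep=20000, step=5.0, seed=1):
    """Random-walk Metropolis; returns the draws kept after burn-in."""
    rng = random.Random(seed)
    mu = 0.0                                       # arbitrary initial value
    current = log_posterior(mu, data, sigma)
    kept = []
    for iteration in range(n_burn + n_keep):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        if iteration >= n_burn:                    # burn-in draws are discarded
            kept.append(mu)
    return kept


draws = metropolis(WEIGHTS)
print(round(statistics.fmean(draws), 2), round(statistics.stdev(draws), 2))
```

Under these assumptions the kept draws should scatter around the sample mean of the weights; inference is then made from the draws themselves rather than from a closed-form posterior.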

As a SAS user, you might nevertheless want to run WinBUGS from SAS, mimicking the above process with little pointing and clicking. To do so, you usually have to create the following four types of files [4][5]:

1. A file containing the model specification in WinBUGS language, in which the distributions, likelihood and calculations of data and parameters for the Bayesian analysis are specified.

2. One or more text files containing the observed data in either list or rectangular format.

3. A set of files containing initial parameter values for a specified number of McMC chains. These files are sometimes optional, because you have the option of letting WinBUGS generate the initial values for the model.

4. A script file containing instructions on how to load and compile the above files, how to carry out the McMC computations, and how to save the analysis results.

With these files, you can then issue the following Windows command from SAS to have WinBUGS run the script for the intended Bayesian analysis. (This assumes that the full directory path of the WinBUGS executable WinBUGS14.exe has been added to the Windows PATH variable.)

C > WinBUGS14.exe /PAR Script.txt

In addition, if you want to transfer data between SAS and WinBUGS, you have to write extra code to convert SAS datasets into WinBUGS data files or vice versa, which is often a non-trivial process.

%WINBUGS = INTERPRETER + BRIDGE

%WinBUGS was developed to simplify the communication and cooperation between SAS and WinBUGS. It is called with two parameters, as follows:

%WinBUGS(WBGFile=, RootDir=)

where WBGFile= is the full path of a WinBUGS analysis file, and RootDir= is a working directory.

When invoked, %WinBUGS carries out the following steps in order:

1. Parse and interpret the submitted WinBUGS analysis file to generate individual files for the model specification, observed data, initial parameter values, and script, and store them in the working directory;

2. If %WinBUGS export directives are included in the analysis file, follow the directives to convert the SAS datasets into data or initial value files in either list or rectangular format, and store them in the working directory;

3. Launch WinBUGS in batch mode using the script to carry out the McMC computations with the model and data files created in steps 1 and 2, and save the results in the working directory;

4. If %WinBUGS import directives are included in the analysis file, follow the directives, once WinBUGS has completed the execution, to bring the WinBUGS results back into SAS for further analysis and reporting.

In short, %WinBUGS takes care of all the mundane tasks of running WinBUGS from within SAS. Below I give a small example to show what a WinBUGS analysis file looks like and how it can be used.

A WINBUGS ANALYSIS FILE

A WinBUGS analysis file puts together all the elements of a WinBUGS analysis (that is, a Bayesian analysis using WinBUGS) in one place. It uses a set of single-line %WinBUGS directives to govern the creation of the text files for the model specification, data, initial parameter values, and script used in a WinBUGS analysis, and at the same time to manage the data exchange between SAS and WinBUGS if requested. Let's consider a simple example that performs a multiple regression of weight on height and age, using the data of the nine girls in the SAS built-in dataset SASHELP.CLASS. In this example, Weight is the dependent variable; Height and Age are explanatory variables. Assuming the error distribution is normal, the regression model can be expressed as

$$\text{Weight}_j \sim \text{Normal}(\mu_j, \tau), \qquad \mu_j = \alpha + \beta_1 \cdot \text{Age}_j + \beta_2 \cdot \text{Height}_j,$$

where $j = 1, \ldots, 9$ and $\tau = 1/\sigma^2$ is the precision. All parameters in the model are assumed to have independent, relatively flat prior distributions. α, β1, and β2 have normal prior distributions with mean 0 and precision 1.0E-6 (a variance of 1.0E+6), so that all values around zero have more or less equal prior probability, and σ has a uniform prior distribution, being equally likely to take any value between 0 and 100, as coded in the dunif(0.0, 100) statement of the analysis file. The analysis file uses "-WBG" as a suffix of the file name and ".SAS" as the file extension for easy editing in a SAS GUI environment. It is coded as follows.

    Line  REG-WBG.SAS
       1  #Multiple regression on weight over height and age of 9 girls
       2  #DEF MODEL
       3  model
       4  {
       5     # Priors
       6     alpha ~ dnorm(0, 1.0E-6)
       7
       8     beta1 ~ dnorm(0, 1.0E-6)
       9     beta2 ~ dnorm(0, 1.0E-6)
      10
      11     sigma ~ dunif(0.0, 100)
      12
      13     tau <- 1/(sigma*sigma)
      14     # Likelihood
      15     for (j in 1:N )
      16     {
      17        mu[j] <- alpha + beta1*Age[j] + beta2*Height[j]
      18
      19        Weight[j] ~ dnorm(mu[j], tau)
      20     }
      21  }
      22
      23  #DEF DATA
      24  list(N=9,
      25  Weight=c(84, 98, 102.5, 84.5, 112.5,
      26           50.5, 90, 77, 112),
      27  Height=c(56.5, 65.3, 62.8, 59.8, 62.5, 51.3, 64.3, 56.3, 66.5),
      28  Age=c(13, 13, 14, 12, 15, 11, 14, 12, 15))
      29  #DEF INIT
      30  list(alpha=0, beta1=0, beta2=0, sigma=1)
      31
      32  #DEF SCRIPT
      33  display ('log')
      34  check ('$ROOT/model.txt')        # Load and check model
      35  data ('$ROOT/data.txt')          # Load data
      36  compile (1)                      # Compile 1 chain
      37  inits (1, '$ROOT/init.txt')      # Load init data for chain 1
      38
      39  update (10000)                   # Set up burn-in times
      40  # Monitor parameters
      41  set(deviance) set(alpha) set (beta1) set (beta2) set (sigma)
      42
      43  update (20000)                   # Set up simulation times
      44  coda (*, '$ROOT/coda')
      45  stats (*)
      46  save('$root/output.txt')
      47
      48  #IMPORT CODA coda<$ROOT/coda
      49  #IMPORT SUMMARY result<$ROOT/output.txt
      50  #IMPORT LOG <$ROOT/output.txt

Here are line-by-line explanations of the WinBUGS analysis file.

Line 1 is a WinBUGS comment. Comments start with # and can be inserted in the WinBUGS language and script for ease of reading. However, in a WinBUGS analysis file, a comment that starts with #DEF, #EXPORT, or #IMPORT on a new line is treated as a %WinBUGS directive.

Line 2 is a #DEF MODEL directive, which should be provided in a WinBUGS analysis file. It instructs %WinBUGS to save all the text from the next line up to the line before the next #DEF directive as a model specification file. The file is by default named MODEL.txt and is stored in the $ROOT directory. (The word $ROOT is an anchor for the working directory that the user provides when %WinBUGS is called. It can appear in %WinBUGS directives or script commands.) If you want to save the model specification to another file, such as RegModel.txt, you can use the output pipe mechanism (symbolized by ">") to redirect the output to that file, for example

#DEF MODEL > $ROOT/RegMODEL.txt

Lines 3-22 contain the model specification for the regression. The language for the WinBUGS model specification is similar to S-Plus or R, and provides a concise syntactical expression of a Bayesian model. In the model specification, the combined symbol <- denotes assignment, and the symbol ~ denotes a distribution. Variable names in WinBUGS code are case sensitive, which means that WinBUGS interprets a and A as different symbols. The model specification is declarative: a WinBUGS model variable or node usually appears only once on the left-hand side of a statement, so the order of the statements doesn't matter. For example, the priors can be specified on code lines before or after the lines in which they are used. This feature may seem a little odd to a SAS user, because the order of SAS statements really matters. In this example, lines 5-11 state the prior belief in each of the possible values of the four parameters. Line 13 creates a variable for the precision, which is $1/\sigma^2$. Lines 15-20 are a for loop that expresses concisely that the weight of each of the nine girls in the sample data is drawn from a normal distribution with common precision $\tau$ and mean $\alpha + \beta_1 \cdot \text{Age}_j + \beta_2 \cdot \text{Height}_j$. For more information about the WinBUGS model language, syntax, and functions, please see [3].

Line 23 is a #DEF DATA directive. It instructs %WinBUGS to save all the text from the next line up to the line before the next #DEF directive as a WinBUGS data file. The file is by default named DATA.txt and is stored in the $ROOT directory. If you want to save the data to another file, such as RegData.txt, use the output pipe, for example,

#DEF DATA > $ROOT/RegDATA.txt

Note that you can create more than one data file with output pipes like the one above.

Lines 24-28 contain the observations to be written to the data file. WinBUGS accepts data in two formats. One is the list format, which uses a syntax similar to that of S-Plus or R for storing data as lists. In this example, the list contains four components: three vectors, Weight, Height, and Age, and a constant N, which represents the number of observations used in the model. The other is the rectangular format, which is similar to SAS data lines, except that the data lines must end with an END line followed by a carriage return. For example,

    Weight[]  Height[]  Age[]
    84.0      56.5      13
    98.0      65.3      13
    102.5     62.8      14
    84.5      59.8      12
    112.5     62.5      15
    50.5      51.3      11
    90.0      64.3      14
    77.0      56.3      12
    112.0     66.5      15
    END
    <carriage return>

Note that WinBUGS accepts only numeric values, with missing values represented by the word NA. It also assumes that data values are separated by spaces or tabs; it does not accept any other delimiters. Multiple data files can be created with %WinBUGS #DEF DATA directives using output pipes. The other way to create data files is to use %WinBUGS #EXPORT directives, which are described later.

Line 29 is the #DEF INIT directive, which instructs %WinBUGS to save all the text from the next line up to the line before the next #DEF directive as a data file of initial parameter values. The file is by default named INIT.txt and is stored in the $ROOT directory. If you want to save the values to another file, such as RegInit.txt, use the output pipe, for example

#DEF INIT > $ROOT/RegInit.txt

Note that you can create more than one file of initial parameter values with output pipes like the one above.

Line 30 contains initial values for the model parameters. Initial values are required for each unknown parameter that has a distribution attached to it. They can be provided in either list or rectangular format. Sometimes you can instead have WinBUGS generate the initial parameter values for the model.

Line 32 is a #DEF SCRIPT directive, which must be provided in a WinBUGS analysis file. It instructs %WinBUGS to extract all the text from the next line up to the line before the next #DEF directive, or the end of the file, and save it as a WinBUGS script template. The script commands in the template can contain the anchor word $ROOT and SAS global macro variables where appropriate. %WinBUGS resolves the script statements with the actual value of the macro parameter RootDir= and the values of the SAS global macro variables. The resolved statements are saved as an executable script file in the $ROOT directory. The file is by default named SCRIPT.txt. If you want to save the script to another file, such as RegScript.txt, use the output pipe, for example

#DEF SCRIPT > $ROOT/RegScript.txt

Lines 33-46 contain the minimal script template for this WinBUGS analysis. Line 33 opens a WinBUGS log window, which traces the progress of the script execution and records any outputs and error messages. Line 34 asks WinBUGS to load the model specification file MODEL.txt from the $ROOT directory. If the model is syntactically correct, the message "model is syntactically correct" appears in the log; otherwise error messages, for example about typos or unbalanced parentheses, appear. Line 35 reads the observed data from the file DATA.txt in the $ROOT directory. Line 36 compiles the model and data into an internal executable program; the compile(1) command tells WinBUGS to perform a one-chain McMC computation. Line 37 reads the initial values for this single chain from the file INIT.txt in the $ROOT directory. Line 39 runs the chain for the first 10,000 burn-in iterations. The burn-in is intended to allow the chain to stabilize and to eliminate the effects of the initial values; the values simulated in that period are discarded. Line 41 contains five set commands that tell WinBUGS to monitor the subsequent simulated values of the parameters α, β1, β2, and σ and of the Bayesian deviance. Line 43 tells WinBUGS to run the chain for an additional 20,000 iterations. Line 44 tells WinBUGS to write the 20,000 simulated values of the monitored parameters to a set of coda files, with 'coda' as the prefix of the file names. (CODA stands for Convergence Diagnostic and Output Analysis.) Each run of WinBUGS produces CODA output files containing the McMC output in CODA format. The CODA files consist of an output file for each chain, showing the iteration number and value, and an index file describing which lines of the output files correspond to which node. Line 45 prints the overall statistics of the estimates in the log. Line 46 tells WinBUGS to save all the text output in the log to the file output.txt in the $ROOT directory. For other WinBUGS script statements, please see [3].
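The relationship between the CODA index file and a chain's output file can be sketched in Python as follows. This is only an illustration of the layout described above, not the import code used by %WinBUGS. It assumes that fields are separated by white space and that the index file gives 1-based, inclusive line numbers; the function name read_coda and the tiny sample values are invented for the example.

```python
def read_coda(index_text, chain_text):
    """Return {node: [values]} from a CODA index and one chain's output."""
    # Each chain line holds an iteration number and a simulated value.
    rows = [line.split() for line in chain_text.splitlines() if line.strip()]
    samples = {}
    for line in index_text.splitlines():
        if not line.strip():
            continue
        node, first, last = line.split()
        # Assumed: first/last are 1-based, inclusive line numbers in the chain file.
        block = rows[int(first) - 1:int(last)]
        samples[node] = [float(value) for _iteration, value in block]
    return samples


# Invented toy content: two nodes with three monitored iterations each.
index_text = "alpha 1 3\nbeta1 4 6\n"
chain_text = ("10001 90.1\n10002 89.7\n10003 91.2\n"
              "10001 8.0\n10002 8.3\n10003 7.9\n")
samples = read_coda(index_text, chain_text)
print(samples["beta1"])
```

Keeping the values per node in this way makes it straightforward to compute summaries or convergence diagnostics for each monitored parameter separately.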

Lines 48-50 contain three #IMPORT directives, which read selected WinBUGS outputs into SAS. Line 48 tells %WinBUGS to load the coda data into the SAS dataset CODA after running the WinBUGS script. Line 49 tells %WinBUGS to extract the overall statistics of the estimates from the log file output.txt and store them in the SAS dataset RESULT. Line 50 tells %WinBUGS to read the whole log file and display it in the SAS log window, which is very helpful in debugging a WinBUGS analysis file.

USING THE WINBUGS ANALYSIS FILE

Once a WinBUGS analysis file is ready, you can pass it as a parameter to %WinBUGS to perform the desired WinBUGS analysis. Below is sample SAS code that uses the analysis file REG-WBG.SAS.

    %WinBUGS(
        WBGFile=C:\MySAS\Bayes\Reg\Reg-wbg.sas
       ,RootDir=C:\MySAS\Bayes\Reg\results\Reg
    )
    proc print data=result width=minimum label noobs;
       var node mean sd mcerror lowpct median highpct;
    run;

When the code is run, a summary of the Bayesian estimates of the parameters of interest appears in the SAS output window:

    20000 Simulations after 10000 burn-ins

    Node        Mean     SD  MC Error     2.5%  Median   97.5%
    ALPHA     90.100  3.248  0.023440  83.6300  90.100  96.640
    BETA1      8.068  4.204  0.030660  -0.3844   8.078  16.380
    BETA2      1.613  1.171  0.008918  -0.7029   1.607   3.967
    DEVIANCE  62.890  4.432  0.060690  57.4300  61.870  74.210
    SIGMA      8.993  3.761  0.054030   4.7150   8.126  18.560
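To make the columns of this table concrete, here is a small Python sketch that computes the same kinds of statistics from the simulated values of one node. It is not the method WinBUGS uses: the nearest-rank percentile rule is an assumption, the MC error column is omitted, and the function name summarize is hypothetical. The dictionary keys follow the variable names in the PROC PRINT step above.

```python
import math
import statistics


def summarize(values):
    """Mean, SD and nearest-rank 2.5%, 50% and 97.5% points of one node."""
    ordered = sorted(values)

    def percentile(p):
        # Nearest-rank rule: smallest draw with at least p% of draws at or below it.
        rank = max(1, math.ceil(p * len(ordered) / 100.0))
        return ordered[rank - 1]

    return {
        "mean": statistics.fmean(ordered),
        "sd": statistics.stdev(ordered),
        "lowpct": percentile(2.5),
        "median": percentile(50),
        "highpct": percentile(97.5),
    }


summary = summarize(range(1, 101))   # toy "chain": the integers 1..100
print(summary)
```

The interval from lowpct to highpct is the 95% posterior interval reported for each node; other percentile conventions would give slightly different end points for short chains.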

%WINBUGS DIRECTIVES

This section summarizes the %WinBUGS directives for a WinBUGS analysis file. Please note that a %WinBUGS directive must always start on a new line and must not be written over more than one line. There are three types of single-line %WinBUGS directives. Definition directives, which start with the tag #DEF, divide a WinBUGS analysis file into four sections that correspond to the model specification, observed data, initial parameter values, and script template, respectively. Export directives, which start with the tag #EXPORT, convert a set of variables in a SAS dataset into a WinBUGS data file in list or rectangular format. Import directives, which start with the tag #IMPORT, read selected WinBUGS outputs into SAS datasets or display them in the SAS log window. %WinBUGS parses and interprets the directives it finds in a WinBUGS analysis file in the following order.

1. If it finds a definition directive, it immediately saves the text lines right after the directive, up to the next definition directive, into the specified .txt file. Directives of the other two types found during this process are kept in two working datasets until all the definition directives have been parsed.

2. Before WinBUGS is invoked in batch mode, all the export directives are executed, in the order in which they were collected in step 1, to create additional data files.

3. Once WinBUGS completes the execution of the script, the import directives collected in step 1 are executed immediately, to read WinBUGS outputs into SAS datasets or display them in the SAS log window.

In addition, any SAS global macro variable or working directory anchor $ROOT found in the directives or in the script template during this process is immediately resolved with the values provided.
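The resolution step can be pictured with the following Python sketch. It is only an approximation of what the macro does inside SAS: treating $ROOT as case-insensitive is an assumption suggested by the '$root' spelling in the example script, the function name resolve_template is hypothetical, and NBURN is an invented macro variable.

```python
import re


def resolve_template(text, root_dir, macro_vars):
    """Substitute $ROOT and &name / &name. references in one line of text."""
    # A function is used as the replacement so that backslashes in Windows
    # paths are inserted literally instead of being read as regex escapes.
    text = re.sub(r"\$ROOT", lambda match: root_dir, text, flags=re.IGNORECASE)

    def lookup(match):
        name = match.group(1).upper()          # SAS macro names ignore case
        # Unknown references are left untouched.
        return macro_vars.get(name, match.group(0))

    return re.sub(r"&([A-Za-z_][A-Za-z0-9_]*)\.?", lookup, text)


root = r"C:\MySAS\Bayes\Reg\results\Reg"
print(resolve_template("save('$root/output.txt')", root, {}))
print(resolve_template("update (&nburn)", root, {"NBURN": "10000"}))
```

A script template written this way can be reused across studies by changing only the RootDir= value and the macro variable values.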

DEFINITION DIRECTIVES

There are four definition directives, which start with the tag #DEF followed by one of the key words MODEL, DATA, INIT, or SCRIPT. Each indicates the beginning of one of the four sections of a WinBUGS analysis file, which usually has the following layout:

#DEF MODEL [> $ROOT/AnotherModelFile.txt]

Model specification in WinBUGS language

#DEF DATA [> $ROOT/AnotherDataFile.txt]

Observed data in list or rectangular format

#DEF INIT [> $ROOT/AnotherInitFile.txt]

Initial parameter values in list or rectangular format

#DEF SCRIPT [> $ROOT/AnotherScriptFile.txt]

WinBUGS script statements

A definition directive tells %WinBUGS to extract a particular section, from the text line right after the directive up to the next definition directive or the end of the file, and save it as a .txt file in the $ROOT directory. By default, it creates a .txt file with the key word as the file name, such as MODEL.txt or DATA.txt. If you want a different name for the file, use the output pipe symbol (>) followed by the new file name, for example,

#DEF DATA > $ROOT/AnotherDataFileName.txt
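The sectioning behaviour can be sketched in Python as follows. This is an illustration rather than the macro's actual SAS implementation: matching the tags case-insensitively, dropping any text that precedes the first #DEF directive, and setting #EXPORT and #IMPORT lines aside rather than copying them into a section are assumptions, and the names split_sections, DEF_LINE, and DEFERRED_LINE are hypothetical.

```python
import re

DEF_LINE = re.compile(r"#DEF\s+(MODEL|DATA|INIT|SCRIPT)\s*(?:>\s*(\S+))?",
                      re.IGNORECASE)
DEFERRED_LINE = re.compile(r"#(EXPORT|IMPORT)\b", re.IGNORECASE)


def split_sections(analysis_text):
    """Return ({target file: [lines]}, [deferred #EXPORT/#IMPORT directives])."""
    sections = {}
    deferred = []
    target = None
    for line in analysis_text.splitlines():
        stripped = line.strip()
        definition = DEF_LINE.fullmatch(stripped)
        if definition:
            keyword, piped_name = definition.groups()
            # Default file name is the key word, e.g. $ROOT/MODEL.txt.
            target = piped_name or "$ROOT/" + keyword.upper() + ".txt"
            sections.setdefault(target, [])
        elif DEFERRED_LINE.match(stripped):
            deferred.append(stripped)          # executed later, in order
        elif target is not None:
            sections[target].append(line)
    return sections, deferred


example = """#Multiple regression
#DEF MODEL
model { alpha ~ dnorm(0, 1.0E-6) }
#DEF DATA > $ROOT/RegDATA.txt
list(N=9)
#DEF SCRIPT
check ('$ROOT/model.txt')
#IMPORT LOG <$ROOT/output.txt
"""
sections, deferred = split_sections(example)
print(sorted(sections))
```

Because each output pipe names its own target file, repeating #DEF DATA with different file names naturally yields several data files.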

EXPORT DIRECTIVES

There are two export directives, which convert SAS datasets and save them as WinBUGS data files in either list or rectangular format. Export directives start with the tag #EXPORT and can be inserted between lines in a WinBUGS analysis file. Please note that if an output SAS variable has an associated non-blank label, the variable label is used as the variable name in the WinBUGS data file; similarly, if an output SAS variable has an associated format, data values are written to the WinBUGS data file using that format.

#EXPORT LIST DIRECTIVE

The #EXPORT LIST directive converts a set of numeric variables in a SAS dataset into a list file and stores it in the $ROOT directory, with missing values converted into the word NA. The directive can convert a single value, a range of values, or all the values of each SAS variable into a component of a WinBUGS list. The directive can be used multiple times to create multiple list files for observed data or initial parameter values, which are loaded into the WinBUGS engine with the script commands DATA() and INITS(). The directive has the following syntax:

#EXPORT LIST DSN[Variable-expression(s)/Options] > $ROOT/Filename.txt

Arguments

DSN SAS dataset name for output.

Filename Name of the list output file with extension .txt. The file is usually stored in the $ROOT directory.

Variable-expression(s) A list of space-separated variable expressions; each can have one of the following forms:

    Expression   Meaning
    VAR          All values of the SAS variable are converted into a vector in the WinBUGS list.
    VAR@         The first value of the SAS variable is converted into a constant in the WinBUGS list.
    VAR@n        The nth value of the SAS variable is converted into a constant in the WinBUGS list.
    VAR@n-m      The nth to mth values (inclusive) of the SAS variable are converted into a vector in the WinBUGS list.

Note that missing values in the variables are converted into the word NA.

Options

N= The name of a special constant in the list that holds the total number of observations in the SAS dataset. The default name is N. The constant is created automatically; however, if N=NULL, no such constant is created in the list.

LINESIZE= Line size of the list file. The default size is 132.
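The variable-expression forms listed above can be interpreted with a short Python sketch such as the one below. It follows the forms in the table, but it is not the parser used by %WinBUGS; the function name parse_expression, the returned tuple layout, and the error handling are assumptions.

```python
import re

EXPRESSION = re.compile(r"([A-Za-z_]\w*)(@(\d+)?(?:-(\d+))?)?")


def parse_expression(expression, n_obs):
    """Return (variable, first, last, is_constant) with 1-based inclusive bounds."""
    match = EXPRESSION.fullmatch(expression)
    if match is None:
        raise ValueError("unrecognised variable expression: " + expression)
    name, at_part, first, last = match.groups()
    if at_part is None:                  # VAR: every observation, as a vector
        return name, 1, n_obs, False
    if first is None:
        if last is not None:             # e.g. "VAR@-3" has no start position
            raise ValueError("missing start position: " + expression)
        return name, 1, 1, True          # VAR@: first value, as a constant
    if last is None:                     # VAR@n: nth value, as a constant
        return name, int(first), int(first), True
    return name, int(first), int(last), False    # VAR@n-m: a vector


parsed = [parse_expression(e, 9) for e in "age height@1-3 sexn@2 age@".split()]
print(parsed)
```

Returning explicit bounds together with a constant/vector flag keeps the later formatting step independent of the expression syntax.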

EXAMPLES

Suppose you have a SAS dataset GIRLS created using the following data step.

    Data GIRLS;
       Set SASHELP.CLASS (where=(SEX="F"));
       If sex="M" then sexn=1; else sexn=2;
       Label AGE="G.Age" Height="G.Height" Weight="G.Weight" Sexn="G.Sex";
       Format AGE 4.1 Height Weight 5.2;
    Run;

To create the list file girls1.txt that contains all the values of the variables AGE and HEIGHT, with the automatic constant named NOBS, use

#EXPORT LIST GIRLS[age height/N=NOBS] > $ROOT/girls1.txt

The contents of the created file will be

list(NOBS=9, G.Age=c( 13,13,14,12,15,11,14,12,15), G.Height=c( 56.5,65.3,62.8,59.8,62.5,51.3,64.3,56.3,66.5))

To create a list file girls2.txt that contains the first 3 values of the variables AGE and HEIGHT and the second value of the variable SEXN in GIRLS, with a line size of 80 and no automatic constant in the output file, use

#EXPORT LIST GIRLS[age@1-3 height@1-3 sexn@2/LineSize=80 N=NULL] > $ROOT/girls2.txt

The contents of the created file will be

list(G.Age=c( 13,13,14),G.Height=c(56.5,65.3,62.8),G.Sex=2)

Please note that the #EXPORT LIST directive can only create constants and vectors for a list from a set of SAS variables.
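As a rough illustration of the list format produced by these examples, the following Python sketch builds a WinBUGS list from named constants and vectors. It is not the macro's SAS code: the function names are hypothetical, None stands in for a SAS missing value, SAS formats and the LINESIZE= wrapping are not reproduced, and the spacing of the output differs slightly from the files shown above.

```python
def format_value(value):
    """Render one number; missing values (None here) become NA."""
    if value is None:
        return "NA"
    number = float(value)
    return str(int(number)) if number.is_integer() else repr(number)


def to_list_format(components, n_name=None, n_obs=None):
    """Build a WinBUGS list from {name: constant or sequence of values}."""
    parts = []
    if n_name is not None:               # optional automatic constant, e.g. N
        parts.append(n_name + "=" + str(n_obs))
    for name, value in components.items():
        if isinstance(value, (list, tuple)):
            body = ",".join(format_value(v) for v in value)
            parts.append(name + "=c(" + body + ")")
        else:
            parts.append(name + "=" + format_value(value))
    return "list(" + ", ".join(parts) + ")"


first = to_list_format({"G.Age": [13, 13, 14],
                        "G.Height": [56.5, 65.3, 62.8],
                        "G.Sex": 2})
second = to_list_format({"Weight": [84, None, 102.5]}, n_name="NOBS", n_obs=3)
print(first)
print(second)
```

The second call shows how a missing value would appear as NA inside a vector.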

#EXPORT RECT DIRECTIVE

The #EXPORT RECT directive converts a set of numeric variables in a SAS dataset into a .txt file in rectangular format, with missing values converted into the word NA, and saves it in the $ROOT working directory. You can use the directive to output all the values in either vector or matrix form. The directive can also be used multiple times to create multiple rectangular data files for observed data or initial parameter values, which are loaded into the WinBUGS engine with the script commands DATA() and INITS(). The directive has the following syntax:

#EXPORT RECT DSN[Variable-list(s)/Options] > $ROOT/Filename.txt

Arguments

DSN SAS dataset name for output

Variable-list(s) is a list of space-separated SAS variables for output.

Filename Name for the rectangular data file with extension .txt. The file is usually stored in the study $ROOT folder.

Options

MatName= If MatName= is blank, vector variables are created in the rectangular data file; otherwise a matrix is created in the rectangular data file with &MatName as the matrix name. The order of the matrix columns is the same as the order of the variables in the variable list. The default is vector style.

LINESIZE= Line size of the output text file. The default value is 132.

Examples

To create a rectangular data file girls3.txt in vector style from the variables AGE and HEIGHT, use

#EXPORT RECT GIRLS[age height] > $ROOT/girls3.txt

The contents of the created file will be

G.Age[] G.Height[]
13.0 56.50
13.0 65.30
14.0 62.80
12.0 59.80
15.0 62.50
11.0 51.30
14.0 64.30
12.0 56.30
15.0 66.50
END

To create a rectangular data file girls4.txt in matrix style from the variables AGE and HEIGHT, with girls as the matrix name, use

#EXPORT RECT GIRLS[age height/MatName=girls] >$ROOT/girls4.txt

The contents of the created file will be

girls[,1] girls[,2]
13.0 56.50
13.0 65.30
14.0 62.80
12.0 59.80
15.0 62.50
11.0 51.30
14.0 64.30
12.0 56.30
15.0 66.50
END

Please note that WinBUGS accepts only numeric values in list or rectangular format. You should also not provide more variables than the model needs; otherwise you will receive error messages.
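The rectangular layout can be sketched in a few lines of Python. This is a hedged illustration of the file format described above, not the code behind #EXPORT RECT: the function name `to_bugs_rect` and its `mat_name` parameter are hypothetical stand-ins for the directive and its MatName= option, and numbers are written without the SAS formats used in the examples.

```python
import math

def to_bugs_rect(columns, mat_name=None):
    """Return rectangular-format text for a {name: values} mapping.

    With mat_name unset the headers are name[] (vector style); otherwise
    they are mat_name[,1], mat_name[,2], ... in column order. Missing
    values are written as NA and the data end with the keyword END.
    """
    names = list(columns)
    n_rows = len(columns[names[0]])
    if any(len(columns[n]) != n_rows for n in names):
        raise ValueError("all columns must have the same length")

    if mat_name is None:
        header = [f"{n}[]" for n in names]
    else:
        header = [f"{mat_name}[,{k}]" for k in range(1, len(names) + 1)]

    def cell(x):
        missing = x is None or (isinstance(x, float) and math.isnan(x))
        return "NA" if missing else f"{x:g}"

    lines = [" ".join(header)]
    for i in range(n_rows):
        lines.append(" ".join(cell(columns[n][i]) for n in names))
    lines.append("END")
    # A trailing newline after END is kept deliberately.
    return "\n".join(lines) + "\n"

print(to_bugs_rect({"G.Age": [13, 13], "G.Height": [56.5, 65.3]}, mat_name="girls"))
```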

IMPORT DIRECTIVES There are three import directives that can be used to read selected WinBUGS outputs into SAS for further processing. The import directives, which start with the tag #IMPORT, are executed right after WinBUGS completes the execution of the submitted script and returns control to %WinBUGS.

#IMPORT CODA DIRECTIVE The #IMPORT CODA directive creates a CODA dataset from a set of CODA files in the $ROOT directory. It has the following syntax:

#IMPORT CODA DSN < $ROOT/PrefixForCodaFiles

Arguments

DSN Name of the SAS dataset that holds the CODA data

PrefixForCodaFiles is the prefix for the set of CODA files created by the WinBUGS script command coda(*, '$ROOT/CodaFilePrefix'), which contain the simulated values of the model parameters selected by set() commands. To use this directive to get a CODA dataset from the WinBUGS outputs, the coda() command must be executed in the script. The CODA dataset created with this directive has the following structure:

Variable     Type   Label
_CHAIN_      Num    MC chain ID, starting from 1
_NSIMU_      Num    Number of simulations after burn-in
Parameter1   Num    Simulated values of parameter 1 monitored by a set() command
Parameter2   Num    Simulated values of parameter 2 monitored by a set() command
. . .        . . .  . . .
ParameterN   Num    Simulated values of parameter N monitored by a set() command
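The reshaping that such an import has to do can be sketched in Python. The sketch assumes the usual CODA text layout, in which an index file lists `node first last` line ranges and each chain file holds `iteration value` pairs; the function name `read_coda` is hypothetical, and storing the recorded iteration number in _NSIMU_ is an assumption about that column rather than a documented behaviour of the directive.

```python
def read_coda(index_text, chain_texts):
    """Combine a CODA index and per-chain outputs into one row per draw.

    index_text  : lines of "node first last" (1-based line ranges)
    chain_texts : one string per chain, lines of "iteration value"
    Returns a list of dicts with _CHAIN_, _NSIMU_ and one key per node.
    """
    index = []
    for line in index_text.splitlines():
        if line.strip():
            node, first, last = line.split()
            index.append((node, int(first), int(last)))

    rows = []
    for chain_id, text in enumerate(chain_texts, start=1):
        records = [ln.split() for ln in text.splitlines() if ln.strip()]
        _, first0, last0 = index[0]
        for k in range(last0 - first0 + 1):
            row = {"_CHAIN_": chain_id,
                   "_NSIMU_": int(records[first0 - 1 + k][0])}
            # Every node occupies its own block of lines in the chain file.
            for node, first, _ in index:
                row[node] = float(records[first - 1 + k][1])
            rows.append(row)
    return rows

# Hypothetical miniature input: two nodes, three draws, one chain.
index = "a 1 3\nb 4 6\n"
chain1 = "5001 0.1\n5002 0.2\n5003 0.3\n5001 1.1\n5002 1.2\n5003 1.3\n"
for row in read_coda(index, [chain1]):
    print(row)
```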

#IMPORT SUMMARY DIRECTIVE The #IMPORT SUMMARY directive creates a dataset by extracting the summary statistics that the stats(*) command writes to the WinBUGS log file, which is then saved by the save() command. It has the following syntax:

#IMPORT SUMMARY DSN < $ROOT/Logfile.txt

Arguments

Logfile is the name of the WinBUGS text log file created by the save() command. For the directive to create a valid summary dataset from the text log file, a set of stats(variable) commands or a stats(*) command must be issued in the script, followed by a save() command that saves the WinBUGS text log into a .txt file, for example, save('$ROOT/Logfile.txt').

DSN SAS dataset name for the summary statistics. The summary dataset contains the overall estimates for the parameters that are monitored by set() commands and output by stats() commands. It has the following structure:

Variable   Type       Label
Node       Char(32)   Variable or node name of the unknown quantity
Mean       Num        Average of the simulations, or the estimated μ of the posterior distribution of the unknown quantity
SD         Num        Standard deviation of the simulations, or the estimated σ of the posterior distribution of the unknown quantity
MCERROR    Num        Computational accuracy of the estimated mean, or μ
LowPct     Num        2.5th percentile of simulations, an approximation of the lower endpoint of the 95% Bayesian credible interval
Median     Num        Median, or 50th percentile of simulations
HighPct    Num        97.5th percentile of simulations, an approximation of the upper endpoint of the 95% Bayesian credible interval
Start      Num        First simulation number for the estimation
Sample     Num        Total number of simulations for the estimation
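To show what the main columns of the summary dataset represent, the following Python sketch computes them directly from a vector of simulated values. It is an illustration under stated assumptions: the helper names `percentile` and `summarize` are hypothetical, the percentile rule (linear interpolation) and the n-1 divisor in the standard deviation may not match what WinBUGS uses, and MCERROR is left out because its computation is not described here.

```python
import math
import statistics

def percentile(sorted_values, p):
    """Percentile (p between 0 and 100) by linear interpolation."""
    pos = (len(sorted_values) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def summarize(node, draws):
    """Summary statistics for one monitored node."""
    s = sorted(draws)
    return {
        "Node": node,
        "Mean": statistics.fmean(s),
        "SD": statistics.stdev(s),
        "LowPct": percentile(s, 2.5),    # lower end of the 95% interval
        "Median": percentile(s, 50),
        "HighPct": percentile(s, 97.5),  # upper end of the 95% interval
        "Sample": len(s),
    }

# Hypothetical draws 1, 2, ..., 101 standing in for simulated values.
print(summarize("a", range(1, 102)))
```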

#IMPORT LOG DIRECTIVE The #IMPORT LOG directive reads the whole WinBUGS text log file in the $ROOT directory and displays it in the SAS LOG window, so that you can review the WinBUGS compiling and running process and examine and debug any warning or error messages issued by WinBUGS. It has the following simple syntax:

#IMPORT LOG < $ROOT/Logfile.txt

Argument

Logfile is the name of the WinBUGS text log file created by a save() script command, such as save('$ROOT/Logfile.txt').

COMMON STEPS FOR BAYESIAN ANALYSIS USING %WINBUGS You can use the following steps to perform a WinBUGS analysis from within SAS once you have a mathematical Bayesian model for a real-life statistical problem.

1. Create a WinBUGS analysis file for the Bayesian model, using #DEF directives to put the model specification, observed data (optional), initial values (optional), and script together in a single place. A new WinBUGS analysis file can often be developed from an existing one, especially when the two have similar models.

2. Write a corresponding SAS program that creates datasets for the observed data and initial parameter values, and add #EXPORT directives to the analysis file to convert them into WinBUGS data files. If you want to bring WinBUGS results into SAS, add #IMPORT directives to the analysis file as well. Then write the %WinBUGS call in the SAS program, with the analysis file and a working directory as parameters.

3. Run the SAS program to carry out the WinBUGS analysis on the Bayesian model.

4. Check the convergence of the MCMC computation by looking over the diagnostic plots that WinBUGS generated. If the results fail the convergence checks, go back to steps 1 and 2, modify the model specification, script, and initial parameter values in the analysis file and the SAS program, and run the SAS program again.

5. Compare the results with those from analytic methods using appropriate SAS procedures to guard against erroneous inference, and then evaluate and report the results.

Now that you know the general way to perform a WinBUGS analysis, let’s examine more examples. The two examples presented in the next sections will use the data from SAS online help documents to show how to perform a Bayesian analysis via %WinBUGS.

CLASS DATA: BAYESIAN ANCOVA WITH INTERACTION TERM In this example, all the data of 9 girls and 10 boys from SASHELP.CLASS are used. It is expected that gender may have some influence on the effect of height over weight; thus variable Height in the previous model is replaced with an interaction term G*Height, where G is a numerical categorical variable derived from the variable Sex, with 1=Male and 2=Female. The Bayesian ANCOVA model can be defined as follows:

$$\text{Weight}_j \sim \text{Normal}(\mu_j, \tau_1)$$

$$\mu_j = a + b \cdot \text{Age}_j + \lambda_{G_j} \cdot \text{Height}_j$$

$$\lambda_{G_j} \sim \text{Normal}(0, \tau_2)$$

where $j = 1, \ldots, 19$ and $G_j \in \{1, 2\}$ is the gender of the $j$th observation. $\lambda_{G_j}$ is the unknown effect of height within gender group $G_j$, which is assumed to have a normal distribution with mean 0 and precision $\tau_2$.

By extending the previous WinBUGS analysis file, the WinBUGS analysis file can be coded as follows.

ANCOVA-WBG.SAS

#ANCOVA on weight over age and sex*height
#DEF MODEL
{
   # Priors
   a ~ dnorm(0, 1.0E-6)
   b ~ dnorm(0, 1.0E-6)
   for (i in 1:2) {
      lamda[i] ~ dnorm(0, 1.0E-6)
   }
   prec ~ dgamma(0.001, 0.001)
   # Likelihood
   for (j in 1:N) {
      mu[j] <- a + b*Age[j] + lamda[G[j]]*Height[j]
      Weight[j] ~ dnorm(mu[j], prec)
   }
}
#EXPORT LIST class[weight height age gender] > $ROOT/class.txt
#EXPORT LIST inits[a@1 b@1 lamda@1-2 prec@1/N=NULL] > $ROOT/init1.txt

#DEF SCRIPT
display ('log')
check ('$ROOT/model.txt')        # Load and check model
data ('$ROOT/class.txt')         # Load data
compile (1)                      # Compile 1 chain
inits (1, '$ROOT/init1.txt')     # Load init data for chain 1
thin.updater(&nThin)
update (&Burnin)                 # The number of burn-in times
set(a)
set(b)
set(lamda[1])
set(lamda[2])
set(deviance)
history(*)
density(*)
autoC(*)
update(&nSimu)                   # The number of Simulation after burn-in
stats(*)
coda (*, '$ROOT/coda')
save('$root/output.txt')
save('$root/output.odc')

# WinBUGS Directives
#IMPORT CODA coda<$ROOT/coda
#IMPORT SUMMARY result<$ROOT/output.txt
#IMPORT LOG <$ROOT/output.txt

Note:

The initial parameter values in a SAS dataset are converted into a data file using the #EXPORT LIST directive.

The SAS global macro variables nThin, Burnin, and nSimu are embedded in the script commands so that a user can easily control the burn-in length, the number of simulations, and the thinning rate from the SAS program. With a thinning rate of k, WinBUGS stores the samples from every kth iteration, which reduces the autocorrelation within the collected samples.

Monitor the convergence of the parameter chains with the commands history(), density(), and autoC(), which generate trace plots, kernel density plots, and autocorrelation plots respectively. If an MC chain has converged, the trace plot will look like a horizontal band with no upward or downward trend, and the kernel density plot will look roughly bell-shaped, although it does not have to be symmetric. If an MC chain is slow to converge, the autocorrelation plot will show high autocorrelation among the simulated values.

Save WinBUGS analysis results in an .ODC file that can hold various types of information in multimedia format in one place, such as formatted text, tables, formulae, plots, graphs displayed in the WinBUGS log window.
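The effect of thinning on autocorrelation, mentioned in the notes above, can be checked numerically with a short Python sketch. It is an illustration only: the functions `autocorrelation`, `thin`, and `simulate_ar1` are hypothetical helpers, and the first-order autoregressive series merely stands in for a slowly mixing MC chain; it is not WinBUGS output.

```python
import random

def autocorrelation(draws, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(draws)
    mean = sum(draws) / n
    denom = sum((x - mean) ** 2 for x in draws)
    numer = sum((draws[t] - mean) * (draws[t + lag] - mean)
                for t in range(n - lag))
    return numer / denom

def thin(draws, k):
    """Keep every k-th draw, starting with the k-th one."""
    return draws[k - 1::k]

def simulate_ar1(n, rho, seed=1):
    """Stand-in for a correlated chain: x[t] = rho*x[t-1] + noise."""
    rng = random.Random(seed)
    chain = [0.0]
    for _ in range(n - 1):
        chain.append(rho * chain[-1] + rng.gauss(0.0, 1.0))
    return chain

chain = simulate_ar1(5000, rho=0.9)
print("lag-1 autocorrelation, all draws:    ", autocorrelation(chain, 1))
print("lag-1 autocorrelation, every 10th draw:", autocorrelation(thin(chain, 10), 1))
```

The thinned series keeps far fewer draws but shows a visibly smaller lag-1 autocorrelation, which is the trade-off the thinning rate controls.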

The corresponding SAS program is

ANCOVA.sas

Data class;
   set sashelp.class;
   gender=1; if sex="F" then gender=2;
   label age="Age" weight="Weight" height="Height" gender="G";
   format gender 8.;
run;

/* Create initial parameter values */
Data inits;
   input a b prec;
   do i = 0, 1;
      lamda=i; output;
   end;
   datalines;
0 0 100
;
run;

%global burnin nSimu nThin rootdir;
%let burnin=5000;
%let nSimu=10000;
%let nThin=10;
%let rootdir=C:\MySAS\Bayes\Ancova\results\Ancova;

%WinBUGS( WBGFile=C:\MySAS\Bayes\Ancova\Ancova-wbg.sas
         ,RootDir=&rootdir
        )

/* Print the statistical summary */
proc print data=result width=minimum label noobs;
   var node mean sd mcerror lowpct median highpct;
run;

/* SAS analysis for the ANCOVA model */
proc mixed data=class;
   class sex;
   model weight=age sex*height/solution cl;
run;

Note:

Categorical variable Gender (1 = Male, and 2= Female) derived from SEX will be used in the nested indexing expression of the model specification.

The dataset inits provides the initial values of the model parameters for the chain.

The global macro variables Burnin, nSimu, and nThin are created to control the generation of the WinBUGS script.

The corresponding analytic results are provided by Proc MIXED as a comparison.
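The way macro variables and the working directory end up in the script can be pictured as a simple text substitution. The Python sketch below is a guess at the idea, not the macro's actual mechanism: the function name `resolve_script` is hypothetical, and treating `$ROOT` and `$root` as the same tag is an assumption based on both spellings appearing in the analysis file.

```python
import re

def resolve_script(script, root_dir, macro_vars):
    """Replace &name references and the $ROOT tag in a script template."""
    def macro(match):
        name = match.group(1)
        # SAS macro variable names are not case sensitive.
        for key, value in macro_vars.items():
            if key.lower() == name.lower():
                return str(value)
        return match.group(0)  # leave unknown references untouched

    text = re.sub(r"&(\w+)", macro, script)
    # A function replacement avoids escape handling of Windows backslashes.
    return re.sub(r"\$ROOT", lambda m: root_dir, text, flags=re.IGNORECASE)

template = "thin.updater(&nThin)\nupdate(&Burnin)\nsave('$root/output.txt')"
settings = {"burnin": 5000, "nSimu": 10000, "nThin": 10}
print(resolve_script(template, r"C:\MySAS\Bayes\Ancova\results\Ancova", settings))
```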

ANCOVA.sas invokes %WinBUGS with the analysis file ANCOVA-WBG.SAS as a parameter. The script in the analysis file sets up a single MC chain and samples it for 15,000 iterations with thinning rate 10. A total sample of 10,000 is used for summarization and convergence checks after discarding the first 5,000 burn-in iterations. The time-series plots, kernel density plots, and autocorrelation plots for each simulated parameter, used for convergence diagnosis, are shown below.

[Figure: Time-series or trend plots for simulated parameters a, b, lamda[1], and lamda[2]; horizontal axis: iteration, 5001 to 15000.]

[Figure: Kernel density plots for simulated parameters a, b, lamda[1], and lamda[2]; sample: 10000 for each parameter.]

[Figure: Autocorrelation plots for simulated parameters a, b, lamda[1], and lamda[2]; horizontal axis: lag, 0 to 40.]

Checking these plots indicates that the MC chain is well mixed and shows no evidence of drift. The overall parameter estimates from the WinBUGS analysis and those from Proc MIXED are very similar and are reported below.

            WinBUGS                               Proc MIXED
Parameter   Estimate (SE)     Bayesian 95% CI     Estimate (SE)     95% CI
a           -121.20 (37.10)   (-196.40, -47.82)   -120.75 (34.82)   (-194.96, -46.53)
b           3.09 (3.46)       (-3.80, 9.95)       3.04 (3.21)       (-3.80, 9.88)
Lamda[1]    2.95 (1.05)       (0.90, 5.01)        2.96 (0.97)       (0.89, 5.02)
Lamda[2]    2.81 (1.09)       (0.68, 4.96)        2.82 (1.01)       (0.66, 4.97)
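As a small worked example of the nested index lamda[G[j]], the Python sketch below plugs the WinBUGS posterior means from the table above into the mean equation of the model. The function name `expected_weight` and the chosen age and height are illustrative; because the mean is linear in the parameters, plugging in posterior means gives the posterior mean of mu for that observation.

```python
# Posterior means reported in the WinBUGS column of the table above.
a, b = -121.20, 3.09
lamda = {1: 2.95, 2: 2.81}  # height slope by gender code G (1=Male, 2=Female)

def expected_weight(age, height, g):
    """mu = a + b*Age + lamda[G]*Height, mirroring lamda[G[j]] in the model."""
    return a + b * age + lamda[g] * height

# Illustrative case: age 13 and height 56.5 for each gender code.
print("female:", expected_weight(13, 56.5, 2))
print("male:  ", expected_weight(13, 56.5, 1))
```

For the same age and height the two genders differ only through the height slope, so the gap between the two fitted means is (2.95 - 2.81) times the height.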

MULTIPLE MYELOMA DATA: WEIBULL PROPORTIONAL HAZARDS REGRESSION This example shows how to perform Weibull proportional hazards regression using %WinBUGS. The multiple myeloma (MM) study data come from Example 49.1 of the SAS online Help and Documentation. In this study, 65 MM patients were treated with alkylating agents. Of those patients, 48 died during the study and 17 survived.

In the data set Myeloma, the variable TIME represents the survival time in months from diagnosis. The variable VSTATUS takes two values, 0 and 1, indicating whether the patient was alive or dead at the end of the study. If the value of VSTATUS is 0, the corresponding value of TIME is censored. There are nine explanatory variables that are thought to be related to survival in the study. The simple Weibull regression model uses TIME as the response variable, VSTATUS as the censoring indicator, LOGBUN (log(BUN) at diagnosis) as a continuous explanatory variable, and PLATELET (platelets at diagnosis: 0=abnormal, 1=normal) as a binary variable. The model can be presented mathematically as

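$$\text{TIME}_j \sim \text{Weibull}(\text{shape}, \lambda_j), \quad j = 1, \ldots, 65$$

$$\lambda_j = \exp\left(-\left(\beta_0 + \beta_1 \cdot \log(\text{BUN}_j) + \beta_2 \cdot \text{PLATELET}_j\right)\right)$$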

For censored observations, the survival distribution is a truncated Weibull, with the lower bound corresponding to the censoring time. The regression coefficients β0, β1, and β2 were given independent Normal priors with zero mean and a very small precision of 1.0E-9. The shape parameter of the Weibull distribution was given a Gamma(1, 1.0E-6) prior. The complete WinBUGS analysis file is given as follows.

Myeloma-WBG.SAS

#DEF Model
model
{
   # Priors
   beta0 ~ dnorm(0.0, 1.0E-9)
   beta1 ~ dnorm(0.0, 1.0E-9)
   beta2 ~ dnorm(0.0, 1.0E-9)
   shape ~ dgamma(1, 1.0E-3)
   # Likelihood
   for(j in 1 : N) {
      TIME[j] ~ dweib(shape, lamda[j])I(TIME_CEN[j],)
      lamda[j] <- exp(-(beta0 + beta1*LOGBUN[j] + beta2*PLATELET[j]))
   }
}
#EXPORT list MYELOMA1[TIME TIME_CEN LOGBUN PLATELET/linesize=80] > $ROOT/data1.txt
#EXPORT list INITS[beta0@1 beta1@1 beta2@1 shape@1/N=NULL] > $ROOT/init1.txt
#EXPORT list INITS[beta0@2 beta1@2 beta2@2 shape@2/N=NULL] > $ROOT/init2.txt
#EXPORT list INITS[beta0@3 beta1@3 beta2@3 shape@3/N=NULL] > $ROOT/init3.txt

#DEF SCRIPT
display ('log')
check ('$ROOT/model.txt')        # Load and check model
data ('$ROOT/data1.txt')         # Load data
compile(3)                       # Compile 3 chains
inits(1, '$ROOT/init1.txt')      # Load init data for chain 1
inits(2, '$ROOT/init2.txt')      # Load init data for chain 2
inits(3, '$ROOT/init3.txt')      # Load init data for chain 3
gen.inits()
over.relax('yes')
thin.updater(&nThin)
update (&burnin)                 # The number of burn-in times
set(beta0)
set(beta1)
set(beta2)
set(shape)
set(deviance)
update (&nSimu)                  # The number of Simulation after burn-in
gr(*)
history(*)
density(*)
autoC(*)
stats(*)                         # Print out the stat summary
coda (*, '$ROOT/coda')
save('$root/output-myeloma.txt')
save('$root/output-myeloma.odc')
#IMPORT CODA coda<$ROOT/coda
#IMPORT SUMMARY result<$ROOT/output-myeloma.txt
#IMPORT LOG <$ROOT/output-myeloma.txt

Note:

The truncated Weibull distribution is constructed with the I() function.

Since three MC chains are set up for the myeloma model, three sets of initial values are created in a SAS dataset; each set is stored in its own list data file and later loaded by the script command inits().

Since the observed data contain missing values, which are treated as unknown quantities in the WinBUGS model, the command gen.inits() should be used to ask WinBUGS to produce the initial values for the missing values automatically.

The command over.relax('yes') sets up the over-relaxed form of the McMC chains, in which WinBUGS generates multiple samples at each iteration and then selects one that is negatively correlated with the current value [3]. The over-relaxed form reduces the within-chain correlations and the number of iterations needed for convergence, but increases the time per iteration. The autocorrelation plots can be used to check whether the over-relaxed form improves the mixing of the chain.

The command gr() provides the Gelman-Rubin statistic for assessing convergence when multiple MC chains are used. For a given parameter, this statistic compares the variability within the parallel chains with the variability between them. A model is judged to have converged if the ratio of between-chain to within-chain variability is close to 1. See [3] for more information.
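The idea behind the Gelman-Rubin diagnostic can be sketched in Python using the classic potential scale reduction factor. This is a textbook form chosen for illustration; the statistic that the WinBUGS gr() command plots may be computed differently, and the function name `gelman_rubin` is hypothetical.

```python
import math

def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains.

    Compares a pooled variance estimate (mixing within-chain and
    between-chain variability) with the mean within-chain variance.
    Values near 1 suggest the chains agree; values well above 1 do not.
    """
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    b = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)

# Hypothetical toy chains: the first pair overlaps, the second pair does not.
print(gelman_rubin([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]))
print(gelman_rubin([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]]))
```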

The corresponding SAS program is

Myeloma.sas

data Myeloma;
   input Time VStatus LogBUN HGB Platelet Age LogWBC Frac LogPBM Protein SCalc;
   label Time='Survival Time' VStatus='0=Alive 1=Dead';
   datalines;
1.25 1 2.2175 9.4 1 67 3.6628 1 1.9542 12 10
1.25 1 1.9395 12.0 1 38 3.9868 1 1.9542 20 18
... /* See Example 49.1 in SAS online Help and Documentation */
77.00 0 1.0792 14.0 1 60 3.6812 0 0.9542 0 12
;
run;

data myeloma1;
   set myeloma(keep=time logbun platelet vstatus);
   TIME_CEN=0;
   censored=1-vstatus;
   if censored then do; TIME_CEN=TIME; TIME=.; end;
   label TIME="TIME" TIME_CEN="TIME_CEN" LOGBUN="LOGBUN" PLATELET="PLATELET";
Run;

data inits;
   input beta0 beta1 beta2 shape;
   datalines;
5 -1.5 0.5 1
5.5 -1.3 0.4 1.2
4.5 -1.8 0.6 0.9
;
run;

%global burnin nSimu nThin;
%let burnin=10000;
%let nSimu=5000;
%let nThin=20;

%WinBUGS( RootDir=C:\MySAS\Bayes\survival\results\myeloma
         ,WBGFile=C:\MySAS\Bayes\survival\myeloma-wbg.sas
        )

proc print data=result width=minimum label noobs;
   var node mean sd mcerror lowpct median highpct;
run;

/* SAS code for the Weibull survival analysis */
proc lifereg data=myeloma;
   model TIME*VStatus(0) = logbun platelet/dist=weibull;
run;

Note:

The complete dataset can be obtained from the SAS online Help and Documentation.

Since the truncated Weibull distribution is used, an extra SAS data step is needed to derive a censored survival time variable TIME_CEN for the model specification with the following rule:

TIME_CEN = 0 if TIME is uncensored, otherwise TIME_CEN=TIME, and set TIME to missing.
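The rule above is easy to state as a small function. The Python sketch below mirrors the logic of the myeloma1 data step for a single observation; the function name `to_truncated` is hypothetical, and None stands in for a SAS missing value, which the export step would write as NA.

```python
def to_truncated(time, vstatus):
    """Return (TIME, TIME_CEN) in the form the truncated Weibull model expects.

    vstatus 1 = dead (survival time observed), 0 = alive (time censored).
    For a censored observation, TIME becomes missing and TIME_CEN carries
    the censoring time used as the lower bound in I(TIME_CEN[j],).
    """
    if vstatus == 1:
        return time, 0.0
    return None, time

# Two rows taken from the data listing: one death and one censored patient.
print(to_truncated(1.25, 1))
print(to_truncated(77.00, 0))
```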

Myeloma.sas submits the analysis file Myeloma-WBG.SAS to %WinBUGS to perform the WinBUGS analysis, which sets up three MC chains and draws samples over 15,000 iterations with thinning rate 20. A total sample of 5,000 is saved for summarization after discarding the first 10,000 iterations as burn-in. The time-series plots, kernel density plots, autocorrelation plots, and Gelman-Rubin statistics for each simulated parameter, used for the convergence diagnostics, are shown below.

[Figure: Time-series or trend plots for simulated parameters beta0, beta1, beta2, and shape; chains 1:3; horizontal axis: iteration, starting at 10001.]

[Figure: Kernel density plots for simulated parameters beta0, beta1, beta2, and shape; chains 1:3; sample: 15000 for each parameter.]

[Figure: Autocorrelation plots for simulated parameters beta0, beta1, beta2, and shape; chains 1:3; horizontal axis: lag, 0 to 40.]

[Figure: Gelman-Rubin statistic for simulated parameters beta0, beta1, beta2, and shape; chains 1:3; horizontal axis: start-iteration, starting at 10051.]

Assessing these plots indicates that the parameter traces look like straight, hairy, colorful caterpillars, with the three chains fluctuating rapidly around their equilibrium and no obvious upward or downward trends. In addition, the autocorrelation plots show little correlation, the kernel density plots show bell-like posterior distributions, and the Gelman-Rubin statistics show that the ratio of between to within variability is close to 1. All the plots assure us that the model has converged. The overall parameter estimates from the analysis and those from Proc LIFEREG are similar and are provided below as a comparison.

            WinBUGS                             Proc LIFEREG
Parameter   Estimate (SE)   Bayesian 95% CI     Estimate (SE)   95% CI
Beta0       5.84 (1.13)     (3.68, 8.09)        5.13 (0.84)     (3.49, 6.76)
Beta1       -1.67 (0.59)    (-2.83, -0.50)      -1.51 (0.51)    (-2.51, -0.51)
Beta2       0.46 (0.40)     (-0.36, 1.21)       0.46 (0.36)     (-0.24, 1.15)
Shape       1.13 (0.12)     (0.91, 1.38)        1.12 (0.12)     (0.91, 1.39)
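To relate the fitted parameters to survival times, the Python sketch below evaluates the Weibull survival function in the parameterization used by dweib(shape, lamda), where the density is shape * lamda * t^(shape-1) * exp(-lamda * t^shape). The function names are hypothetical, and plugging posterior means into a nonlinear function only approximates the posterior mean of the result, so the output should be read as a rough illustration.

```python
import math

def weibull_rate(beta0, beta1, beta2, logbun, platelet):
    """lamda = exp(-(beta0 + beta1*LOGBUN + beta2*PLATELET)), as in the model."""
    return math.exp(-(beta0 + beta1 * logbun + beta2 * platelet))

def survival(t, shape, lam):
    """S(t) = exp(-lam * t**shape) under the dweib(shape, lam) form."""
    return math.exp(-lam * t ** shape)

def median_survival(shape, lam):
    """Time at which S(t) = 0.5, obtained by solving exp(-lam*t**shape) = 0.5."""
    return (math.log(2) / lam) ** (1 / shape)

# WinBUGS posterior means from the table above; the covariate values
# LOGBUN = 1.0792 and PLATELET = 1 are taken from the last data row shown.
lam = weibull_rate(5.84, -1.67, 0.46, logbun=1.0792, platelet=1)
t_med = median_survival(1.13, lam)
print("approximate median survival (months):", t_med)
print("check S(t_med):", survival(t_med, 1.13, lam))
```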

CONCLUSION %WinBUGS gives you a convenient tool to explore the full range of possibilities offered by WinBUGS from within SAS. It has the following advantages:


• You can prepare data and initial parameter values with SAS data steps and convert them into WinBUGS data files easily, or use SAS/STAT for initial model exploration before using WinBUGS. It is also very convenient to check the outputs from WinBUGS against those from SAS analytic methods to guard against erroneous inference.

• A new WinBUGS analysis file can be easily created from an existing one. You can modify and re-run a WinBUGS analysis from within SAS without lots of pointing and clicking. You can even use SAS macro variables to control the WinBUGS execution and the data exchange between the two systems.

• More importantly, %WinBUGS extends the SAS functionality. It can be used to fit a great variety of linear and nonlinear models, including GLM, categorical, and survival models, with or without random effects. Some of them cannot even be fitted by any current SAS procedure.

In addition, all of the programs and macros referred to in this paper are available upon request.

DISCLAIMER: The contents of this paper are the work of the author and do not necessarily represent the opinions, recommendations, or practices of Celgene Corporation.

ACKNOWLEDGMENTS The author wishes to thank Dr. Jichao Sun for his kind comments and constructive suggestions during the preparation of this paper.

REFERENCES 1. Gilks, W. R., Richardson, S., and Spiegelhalter, D. J., eds., Markov Chain Monte Carlo in Practice, London: Chapman & Hall, 1996.

2. Gamerman, D. and Lopes, H. F., Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference (2nd edn), Chapman & Hall, 2006.

3. Spiegelhalter, D., Thomas, A., Best, N., and Lunn, D., WinBUGS 1.4 Manual, 2003.

4. Thompson, J. T., Palmer, T., and Moreno, S., Performing Bayesian analysis in Stata using WinBUGS, The Stata Journal, 6(4): 530-549, 2006.

5. Calling WinBUGS 1.4 from other programs, http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/remote14.shtml.

CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact author at:

Lei Zhang

Celgene Corporation.

86 Morris Avenue

Summit, NJ 07901

Phone: (908) 673-9000

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® Indicates USA registration.

Other brand and product names are trademarks of their respective companies.