
Ronald Christensen
Department of Mathematics and Statistics
University of New Mexico
© 2019

Preliminary Version of
R Commands for
Log-Linear Models and Logistic Regression
Revised Second Edition

Springer

Preface

This online book is an R companion to Log-linear Models and Logistic Regression, Revised Second Edition (LOGLIN2R). This book presupposes that the reader is already familiar with downloading R, plotting data, reading data files, transforming data, basic housekeeping, loading R packages, and specifying basic linear models. That is the material in Chapters 1 and 3 of my Preliminary Version of R Commands for Analysis of Variance, Design, and Regression: Linear Modeling for Unbalanced Data, which is available at http://www.stat.unm.edu/~fletcher/Rcode.pdf. Much of the material here has just been modified/copied from the other volume (but placed appropriately for LOGLIN2R). Data files for this book are available at http://stat.unm.edu/~fletcher/llm_data.zip. At the moment I am also using data files from ANREG-II, available at http://stat.unm.edu/~fletcher/newavdr_data.zip. A tool that I have found very useful for writing R code is Tom Short's R Reference Card, http://cran.r-project.org/doc/contrib/Short-refcard.pdf.

Like all of the other R code documents for my books, this is arranged to correspond to the actual book. Thus the R code for performing the things in Chapter 1 of LOGLIN2R is contained in Chapter 1 of this book, etc. When using this book, if you are copying R code from a pdf file into R, "tilde", i.e.,

˜

will often copy incorrectly so that you may need to delete the copied version of tilde and retype it.

At least on my computer it has become more difficult to install packages of late, so it is easier to install them all at once. The packages (currently) discussed in this document are:

bestglm
exactLoglinTest
gnm
leaps
logmult
MASS
nnet
psych

Additional packages will be needed for doing Chapter 13. This issue is discussed more in the R code for ANREG-II.
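For example, one call along these lines installs the packages listed above all at once (a sketch, assuming each package is currently available on CRAN):

install.packages(c("bestglm","exactLoglinTest","gnm","leaps",
"logmult","MASS","nnet","psych"))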


Contents

Preface
Table of Contents

1 Introduction
  1.1 Conditional Probability and Independence
  1.2 Random Variables and Expectations
  1.3 The Binomial Distribution
  1.4 The Multinomial Distribution
    1.4.1 Product-Multinomial Distributions
  1.5 The Poisson Distribution

2 Two-Dimensional Tables and Simple Logistic Regression
  2.1 Two Independent Binomials
  2.2 Testing Independence in a 2×2 Table
  2.3 I × J Tables
  2.4 Maximum Likelihood Theory for Two-Dimensional Tables
  2.5 Log-Linear Models for Two-Dimensional Tables
  2.6 Simple Logistic Regression
  2.7 Exercises

3 Three-Dimensional Tables
  3.1 Simpson's Paradox and the Need for Higher-Dimensional Tables
  3.2 Independence and Odds Ratio Models
    3.2.1 The Model of Complete Independence
    3.2.2 Models with One Factor Independent of the Other Two
    3.2.3 Models of Conditional Independence
    3.2.4 A Final Model for Three-Way Tables
  3.3 Iterative Computation of Estimates
  3.4 Log-Linear Models for Three-Dimensional Tables
    3.4.1 Estimation
    3.4.2 Testing Models
  3.5 Product-Multinomial and Other Sampling Plans
  3.6 Model Selection Criteria
  3.7 Higher-Dimensional Tables
  3.8 Exercises

4 Logistic Regression
  4.1 Multiple Logistic Regression
  4.2 Measuring Model Fit
  4.3 Logistic Regression Diagnostics
  4.4 Model Selection Methods
    4.4.1 Stepwise logistic regression
    4.4.2 Best subset logistic regression
  4.5 ANOVA Type Logit Models
  4.6 Logit Models For a Multinomial Response
  4.7 Logistic Discrimination and Allocation
  4.8 Exercises

5 Independence Relationships and Graphical Models
  5.1 Model Interpretations
  5.2 Graphical and Decomposable Models
  5.3 Collapsing Tables
  5.4 Recursive Causal Models
  5.5 Exercises

6 Model Selection Methods and Model Evaluation
  6.1 Stepwise Procedures for Model Selection
  6.2 Initial Models for Selection Methods
    6.2.1 All s-Factor Effects
    6.2.2 Examining Each Term Individually
    6.2.3 Tests of Marginal and Partial Association
    6.2.4 Testing Each Term Last
  6.3 Example of Stepwise Methods
    6.3.1 Forward Selection
    6.3.2 Backward Elimination
  6.4 Aitkin's Method of Backward Selection
  6.5 Model Selection Among Decomposable and Graphical Models
  6.6 Use of Model Selection Criteria
  6.7 Residuals and Influential Observations
  6.8 Drawing Conclusions
  6.9 Exercises

7 Models for Factors with Quantitative Levels
  7.1 Models for Two-Factor Tables
  7.2 Higher-Dimensional Tables
  7.3 Unknown Factor Scores
  7.4 Logit Models with Unknown Scores
  7.5 Exercises

8 Fixed and Random Zeros
  8.1 Fixed Zeros
  8.2 Partitioning Polytomous Variables
  8.3 Random Zeros
  8.4 Exercises

9 Generalized Linear Models
  9.1 Distributions for Generalized Linear Models
  9.2 Estimation of Linear Parameters
  9.3 Estimation of Dispersion and Model Fitting
  9.4 Summary and Discussion
  9.5 Exercises

10 The Matrix Approach to Log-Linear Models
  10.1 Maximum Likelihood Theory for Multinomial Sampling
  10.2 Asymptotic Results
  10.3 Product-Multinomial Sampling
  10.4 Inference for Model Parameters
  10.5 Methods for Finding Maximum Likelihood Estimates
  10.6 Regression Analysis of Categorical Data
  10.7 Residual Analysis and Outliers
  10.8 Exercises

11 The Matrix Approach to Logit Models
  11.1 Estimation and Testing for Logistic Models
  11.2 Model Selection Criteria for Logistic Regression
  11.3 Likelihood Equations and Newton-Raphson
  11.4 Weighted Least Squares for Logit Models
  11.5 Multinomial Response Models
  11.6 Asymptotic Results
  11.7 Discrimination, Allocations, and Retrospective Data
  11.8 Exercises

12 Maximum Likelihood Theory for Log-Linear Models
  12.1 Notation
  12.2 Fixed Sample Size Properties
  12.3 Asymptotic Properties
  12.4 Applications
  12.5 Proofs of Lemma 12.3.2 and Theorem 12.3.8

13 Bayesian Binomial Regression: OpenBUGS Run Through R
  13.1 Introduction
    13.1.1 Alternative Specifications
  13.2 Bayesian Inference
    13.2.1 Specifying the Prior and Approximating the Posterior
    13.2.2 Predictive Probabilities
    13.2.3 Inference for Regression Coefficients
    13.2.4 Inference for LDα
  13.3 Diagnostics
    13.3.1 Case Deletion Influence Measures
    13.3.2 Estimative Influence
    13.3.3 Predictive Influence
    13.3.4 Model Checking
    13.3.5 Link Selection
    13.3.6 Sensitivity Analysis
  13.4 Posterior Computations and Sample Size Calculation

14 Bayesian Binomial Regression: OpenBUGS GUI
  14.1 Introduction
    14.1.1 Running the OpenBUGS GUI
    14.1.2 Alternative Specifications
  14.2 Bayesian Inference
    14.2.1 Specifying the Prior and Approximating the Posterior
    14.2.2 Predictive Probabilities
    14.2.3 Inference for Regression Coefficients
    14.2.4 Inference for LDα
  14.3 Diagnostics
    14.3.1 Case Deletion Influence Measures
    14.3.2 Estimative Influence
    14.3.3 Predictive Influence
    14.3.4 Model Checking
    14.3.5 Link Selection
    14.3.6 Sensitivity Analysis
  14.4 Posterior Computations and Sample Size Calculation

15 Correspondence Analysis
  15.1 Introduction
  15.2 Singular Value Decomposition Plot
  15.3 Correspondence Analysis Plot
  15.4 R code for SVD and CA
    15.4.1 Nobel Prize Winners
  15.5 Multiple correspondence analysis

16 Exact Conditional Tests
  16.1 Two-Factor Tables
    16.1.1 R code
  16.2 Three-Factor Tables
    16.2.1 Testing [AC][BC]
    16.2.2 Testing [B][AC]
  16.3 General Theory
  16.4 Model Testing
    16.4.1 General Theory
    16.4.2 Computing
  16.5 Notes and References

17 Polya Trees
    17.0.1 Alas

Index


Chapter 1
Introduction

1.1 Conditional Probability and Independence

1.2 Random Variables and Expectations

1.3 The Binomial Distribution

To evaluate Bin(N, p) densities, use dbinom(x,N,p). The cdf F(u) can be evaluated as pbinom(u,N,p), where the p in pbinom stands for probability.
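For example, with the hypothetical values N = 10 and p = 0.3:

dbinom(4,10,.3) # Pr[y = 4]
pbinom(4,10,.3) # F(4) = Pr[y <= 4]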

1.4 The Multinomial Distribution

To evaluate Mult(N, p) densities for a vector p at some vector of allowable counts x, use dmultinom(x,N,p). The cdf F(u) can be evaluated as pmultinom(u,N,p), where the p in pmultinom stands for probability.

The probability for the table given in this section is computed by

p=c(.12,.12,.04,.12,.18,.18,.06,.18)
x=c(5,7,4,6,8,7,3,10)
N=50
dmultinom(x,N,p)

This does not agree with the number in the book: it gives 0.000002 rather than the book's value 0.000007. I suspect I computed the book value on a hand calculator, canceling many of the terms in the factorials. The following code, which I wrote with numerical stability in mind, does something similar and agrees with the book.

a=c(50,47,46,11,43,42,41,39,38,37,34,33,31,29,
9,24,28,26,25,23,22,21,19,17,15,14,13,11)
b=c(.12^2,.12^2,.12^2,.12^2,.12^2,.12^2,.12^2,
.12,.12,.12,.12,.04,.04,.04,.04,
.06,.06,.06,.18^3,.18^3,.18^3,.18^3,.18^3,.18^3,
.18^3,.18^2,.18,.18)
c=a*b
prod(c)

Just goes to show that you should never believe extremely small probabilities.

1.4.1 Product-Multinomial Distributions

Given the caveats above, the probability for the table in this section would be computed as

p1=c(.3,.3,.1,.3)
p2=c(.3,.3,.1,.3)
x1=c(10,10,2,8)
x2=c(5,8,1,6)
N1=30
N2=20
dmultinom(x1,N1,p1)*dmultinom(x2,N2,p2)

This time dmultinom(x1,N1,p1) agreed with my numerically stable computation but disagreed with what was in the unrevised second edition, so I revised the probability in the book.

1.5 The Poisson Distribution

To evaluate Pois(λ) densities, use dpois(x,lambda). The cdf F(u) can be evaluated as ppois(u,lambda), where the p in ppois stands for probability.
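For example, with the hypothetical value λ = 5:

dpois(3,5) # Pr[y = 3]
ppois(3,5) # F(3) = Pr[y <= 3]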


Chapter 2
Two-Dimensional Tables and Simple Logistic Regression

2.1 Two Independent Binomials

A data file might contain three columns: supports, opposes, and the total number surveyed. With this information, the simplest way to proceed is to just type in the data.

Support=c(309,319)
Oppose=c(191,281)
Total=Support+Oppose
prop.test(Support,Total,correct=FALSE)

The test statistic produced is the square of the test statistic in the book.

An alternative way to enter the data is to create a matrix of the support and opposition counts.

OP <- matrix(c(Support,Oppose),ncol=2)
OP
prop.test(OP,correct=FALSE)

We could replace prop.test with chisq.test (using the same arguments) and get the same test but slightly different output and options. The procedure provides access to Pearson residuals and estimated expected values, things that prop.test does not give.

fit <- chisq.test(OP,correct=FALSE)
fit
fit$expected
fit$residual


2.2 Testing Independence in a 2×2 Table

Although the sampling scheme differs from the previous section, so the theory is different, the computations are exactly the same.

A=c(483,1101)
B=c(477,1121)
EX <- matrix(c(A,B),ncol=2)
EX
fit <- chisq.test(EX,correct=FALSE)
fit
fit$expected
fit$residual

2.3 I × J Tables

With a table this small it is easy to type in the data values.

E=c(21,3,7)
G=c(11,2,1)
F=c(4,2,1)
IJ <- matrix(c(E,G,F),ncol=3)
IJ
fit <- chisq.test(IJ,correct=FALSE)
fit
fit$expected
fit$residual

2.4 Maximum Likelihood Theory for Two-Dimensional Tables

2.5 Log-Linear Models for Two-Dimensional Tables

The only computing really done in this section is finding Figure 2.1. We begin with the figure, but we then fit the data as we previously have in this chapter and finally fit the data using a log-linear model. You should examine the output from the two programs for fitting the data to verify that the fitted values and Pearson residuals are identical.

LGCLG=log(CLG)
test=c(1,2,3)
par(mfrow=c(1,1))
plot(test,LGCLG[1,],type="n",ylab="log(n)",ylim=c(0,5),
#xaxt = "n",
#frame = TRUE,
xlab="Political Affiliation",lty=1,lwd=2)#,lab=c(4,5,7))
axis(1,at=c(1,2,3),labels=c("Rep.","Dem.","Ind."))
#axis(1, 1:4, LETTERS[1:4])
lines(test,LGCLG[1,],type="o",lty=1,lwd=2)
lines(test,LGCLG[2,],type="o",lty=2,lwd=2)
lines(test,LGCLG[3,],type="o",lty=3,lwd=2)
lines(test,LGCLG[4,],type="o",lty=4,lwd=2)
legend("topleft",c("College","Letters","Engin.","Agri.","Educ."),
lty=c(NA,1,2,3,4))

This is how we have been fitting two-way tables in this chapter.

Rep=c(34,31,19,23)
Dem=c(61,19,23,39)
Ind=c(16,17,16,12)
CLG <- matrix(c(Rep,Dem,Ind),ncol=3)
CLG
fit <- chisq.test(CLG,correct=FALSE)
fit
fit$expected
fit$residual

This fits an equivalent log-linear model. The count data are in one string cnt with two other strings to identify the count's political affiliation pa and college clg. The likelihood ratio test statistic G2 (deviance) is listed as the "residual deviance." The data n, fitted values m, and Pearson residuals are listed in a table at the end, along with two things that the book does not introduce for some time, standardized residuals and Cook's distances. (There should exist a call to get the Pearson test statistic.)

rm(list = ls())
cnt=c(34,31,19,23,61,19,23,39,16,17,16,12)
pa=c(1,1,1,1,2,2,2,2,3,3,3,3)
clg=c(1,2,3,4,1,2,3,4,1,2,3,4)

#Summary tables
PA=factor(pa)
CLG=factor(clg)
ts <- glm(cnt ~ PA + CLG,family = poisson)
tsp=summary(ts)
tsp
anova(ts)

rpearson=(cnt-ts$fit)/(ts$fit)^(.5)
rstand=rpearson/(1-hatvalues(ts))^(.5)
infv = c(cnt,ts$fit,hatvalues(ts),rpearson,
rstand,cooks.distance(ts))
inf=matrix(infv,I(tsp$df[1]+tsp$df[2]),6,dimnames =
list(NULL,c("n","mhat","lev","Pearson","Stand.","C")))
inf

2.6 Simple Logistic Regression

This code also includes the computation of diagnostic quantities that are not discussed in Chapter 2, and it computes an R2 value that is not discussed in LOGLIN2R.

rm(list = ls())
oring.sllr <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\tab20-3.dat",
sep="",col.names=c("Case","Flt","y","s","x","no"))

attach(oring.sllr)
oring.sllr
#summary(oring.sllr)

#Summary tables
or <- glm(y ~ x,family = binomial)
orp=summary(or)
orp
anova(or)

#prediction
new = data.frame(x=c(31,53))
predict(or,new,type="response")
rpearson=(y-or$fit)/(or$fit*(1-or$fit))^(.5)
rstand=rpearson/(1-hatvalues(or))^(.5)
infv = c(y,or$fit,hatvalues(or),rpearson,
rstand,cooks.distance(or))
inf=matrix(infv,I(orp$df[1]+orp$df[2]),6,dimnames =
list(NULL,c("y","yhat","lev","Pearson","Stand.","C")))
inf
R2 = (cor(y,or$fit))^2
R2

We now repeat the computations using a log-linear model.

rm(list = ls())
oring.sllr <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\tab20-3.dat",
sep="",col.names=c("Case","Flt","f","s","x","no"))

attach(oring.sllr)
oring.sllr
#summary(oring.sllr)

# Construct data for log-linear model
# String out the failures followed by successes
cnt=c(f,s)
# The temp for each element of cnt
xx=c(x,x)
# The row of the table for each element of cnt
row=c(Case,Case)
# The col. of the table for each element of cnt
# For binary data, f+s=1
# first 23 obs. are first col, 2nd 23 are 2nd col.
col=c(f+s,2*(f+s))
# check that the table is correct
matrix(c(cnt,row,col),ncol=3)

# Fit log-linear model
R=factor(row)
C=factor(col)
fit=glm(cnt ~ R + C + C:xx, family=poisson)
summary(fit)
anova(fit)

Compare the parameter estimates associated with C and C:xx to the logistic regression output. Also compare the G2s.

2.7 Exercises

EXERCISE 2.7.4. Partitioning Tables. To perform Lancaster-Irwin partitioning, you "need" to manipulate the data to create appropriate subtables. You can do that in your favorite editor. I might mention that in Exercise 8.4.3 and in ANREG-II, Chapter 21, I discuss performing Lancaster-Irwin partitioning by manipulating the subscripts used to define log-linear models.

EXERCISE 2.7.5. Fisher’s Exact Test.

Use the command fisher.test.


Also see the package exactLoglinTest.
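For example, a sketch on a hypothetical 2×2 matrix of counts:

FT=matrix(c(3,1,1,3),2,2)
fisher.test(FT)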

EXERCISE 2.7.6. Yule's Q. Here data should be a 2×2 matrix of counts.

library(psych)
Yule(data,Y=FALSE)

EXERCISE 2.7.7. Freeman-Tukey Residuals.

EXERCISE 2.7.8. Power Divergence Statistics.

EXERCISE 2.7.10. Testing for Symmetry.
nominalSymmetryTest
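A sketch, assuming (my assumption, not the book's) that nominalSymmetryTest comes from the rcompanion package and that tab is a hypothetical square matrix of counts:

library(rcompanion)
tab=matrix(c(20,10,5,15),2,2)
nominalSymmetryTest(tab)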

EXERCISE 2.7.12. McNemar's Test. Input a matrix of count values.
mcnemar.test
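For example, on a hypothetical 2×2 matrix of paired counts:

MC=matrix(c(20,5,10,15),2,2)
mcnemar.test(MC,correct=FALSE)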


Chapter 3
Three-Dimensional Tables

For an analysis of Example 3.0.1, see Example 10.2.6.

3.1 Simpson's Paradox and the Need for Higher-Dimensional Tables

3.2 Independence and Odds Ratio Models

Although log-linear models are not introduced until the next section, we use software for fitting them now.

3.2.1 The Model of Complete Independence

EXAMPLE 3.2.1.

cnt=c(716,79,207,25,819,67,186,22)
ii=c(1,1,1,1,2,2,2,2)
kk=c(1,2,1,2,1,2,1,2)
jj=c(1,1,2,2,1,1,2,2)
II=factor(ii)
JJ=factor(jj)
KK=factor(kk)
sv <- glm(cnt ~ II+JJ+KK,family = poisson)

fitted(sv)
sum(residuals(sv,type="pearson")^2)
deviance(sv)
df.residual(sv)
residuals(sv,type="pearson")

3.2.2 Models with One Factor Independent of the Other Two

EXAMPLE 3.2.2.

cnt=c(16,7,15,34,5,3,1,1,3,8,1,3)
ii=c(1,1,1,1,1,1,2,2,2,2,2,2)
jj=c(1,2,1,2,1,2,1,2,1,2,1,2)
kk=c(1,1,2,2,3,3,1,1,2,2,3,3)
II=factor(ii)
JJ=factor(jj)
KK=factor(kk)
sv <- glm(cnt ~ II+JJ:KK,family = poisson)

fitted(sv)
sum(residuals(sv,type="pearson")^2)
deviance(sv)
df.residual(sv)
qchisq(.95,5)

For more of an analysis of Example 3.2.2, also see Example 10.2.6.

3.2.3 Models of Conditional Independence

EXAMPLE 3.2.3.

cnt=c(716,79,207,25,819,67,186,22)
ii=c(1,1,1,1,2,2,2,2)
kk=c(1,2,1,2,1,2,1,2)
jj=c(1,1,2,2,1,1,2,2)
II=factor(ii)
JJ=factor(jj)
KK=factor(kk)
sv <- glm(cnt ~ II:JJ+II:KK,family = poisson)

fitted(sv)
sum(residuals(sv,type="pearson")^2)
deviance(sv)
df.residual(sv)


3.2.4 A Final Model for Three-Way Tables

EXAMPLE 3.2.4.

cnt=c(350,150,60,112,26,23,19,80)
ii=c(1,1,1,1,2,2,2,2)
jj=c(1,2,1,2,1,2,1,2)
kk=c(1,1,2,2,1,1,2,2)
II=factor(ii)
JJ=factor(jj)
KK=factor(kk)
sv <- glm(cnt ~ II:JJ+II:KK+JJ:KK,family = poisson)

fitted(sv)
df.residual(sv)
sum(residuals(sv,type="pearson")^2)
deviance(sv)

For further analysis of Example 3.2.4, see Example 10.2.4.

3.3 Iterative Computation of Estimates

The generalized linear model procedure glm uses Newton-Raphson (iteratively reweighted least squares); the output refers to it as Fisher scoring. To use iterative proportional fitting use

library(MASS)
loglm(y ~ model)

I need to check whether this does more than ANOVA type models. Documentation seems general.
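As a sketch, an ANOVA type model such as complete independence can be fit by iterative proportional fitting along these lines, using the cnt, II, JJ, and KK vectors from the examples above (the use of xtabs to build the table first is my own choice, not the book's):

library(MASS)
tab=xtabs(cnt ~ II+JJ+KK)
loglm(~ II+JJ+KK, data=tab)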

3.4 Log-Linear Models for Three-Dimensional Tables

3.4.1 Estimation

3.4.2 Testing Models

We now add the anova command to our fitting.

EXAMPLE 3.4.1. The last residual deviances are what we want.


cnt=c(16,7,15,34,5,3,1,1,3,8,1,3)
ii=c(1,1,1,1,1,1,2,2,2,2,2,2)
jj=c(1,2,1,2,1,2,1,2,1,2,1,2)
kk=c(1,1,2,2,3,3,1,1,2,2,3,3)
II=factor(ii)
JJ=factor(jj)
KK=factor(kk)
sv <- glm(cnt ~ II+JJ+KK+JJ:KK,family = poisson)
anova(sv)


EXAMPLE 3.4.2.

cnt=c(716,79,207,25,819,67,186,22)
ii=c(1,1,1,1,2,2,2,2)
kk=c(1,2,1,2,1,2,1,2)
jj=c(1,1,2,2,1,1,2,2)
II=factor(ii)
JJ=factor(jj)
KK=factor(kk)
sv7 <- glm(cnt ~ II:JJ+II:KK+JJ:KK,family = poisson)
sv6 <- glm(cnt ~ II:JJ+II:KK,family = poisson)
sv5 <- glm(cnt ~ II:JJ+JJ:KK,family = poisson)
sv4 <- glm(cnt ~ II:KK+JJ:KK,family = poisson)
sv1 <- glm(cnt ~ II+JJ:KK,family = poisson)
sv2 <- glm(cnt ~ JJ+II:KK,family = poisson)
sv3 <- glm(cnt ~ II:JJ+KK,family = poisson)
sv0 <- glm(cnt ~ II+JJ+KK,family = poisson)

tab7=c(7,df.residual(sv7),
sum(residuals(sv7,type="pearson")^2),deviance(sv7),
1-pchisq(deviance(sv7),df.residual(sv7)))
tab6=c(6,df.residual(sv6),
sum(residuals(sv6,type="pearson")^2),deviance(sv6),
1-pchisq(deviance(sv6),df.residual(sv6)))
tab5=c(5,df.residual(sv5),
sum(residuals(sv5,type="pearson")^2),deviance(sv5),
1-pchisq(deviance(sv5),df.residual(sv5)))
tab4=c(4,df.residual(sv4),
sum(residuals(sv4,type="pearson")^2),deviance(sv4),
1-pchisq(deviance(sv4),df.residual(sv4)))
tab1=c(1,df.residual(sv1),
sum(residuals(sv1,type="pearson")^2),deviance(sv1),
1-pchisq(deviance(sv1),df.residual(sv1)))
tab2=c(2,df.residual(sv2),
sum(residuals(sv2,type="pearson")^2),deviance(sv2),
1-pchisq(deviance(sv2),df.residual(sv2)))
tab3=c(3,df.residual(sv3),
sum(residuals(sv3,type="pearson")^2),deviance(sv3),
1-pchisq(deviance(sv3),df.residual(sv3)))
tab0=c(0,df.residual(sv0),
sum(residuals(sv0,type="pearson")^2),deviance(sv0),
1-pchisq(deviance(sv0),df.residual(sv0)))

t(matrix(c(tab7,tab6,tab5,tab4,tab1,tab2,tab3,tab0),5,8))
anova(sv0,sv6)
qchisq(0.95,2)
anova(sv3,sv6)
qchisq(0.95,1)
anova(sv3,sv7)

3.5 Product-Multinomial and Other Sampling Plans

https://cran.r-project.org/web/packages/exactLoglinTest/index.html

3.6 Model Selection Criteria

When the book was written, software did not readily compute AIC, so A − q was used because it was easy to compute by hand from the output df and G2. In R, for a fitted model svm, the computation is AICq=deviance(svm)-2*df.residual(svm). Now AIC is part of R's standard output and can be manipulated as AIC(svm).
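For example, with the fit sv0 defined in the code below, the two quantities can be computed directly; they differ by a constant that is the same for every model (essentially the AIC of the saturated model), so both criteria order models identically.

deviance(sv0)-2*df.residual(sv0) # A - q as used in the book
AIC(sv0) # R's AIC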


cnt=c(716,79,207,25,819,67,186,22)
ii=c(1,1,1,1,2,2,2,2)
kk=c(1,2,1,2,1,2,1,2)
jj=c(1,1,2,2,1,1,2,2)
II=factor(ii)
JJ=factor(jj)
KK=factor(kk)
sv7 <- glm(cnt ~ II:JJ+II:KK+JJ:KK,family = poisson)
sv6 <- glm(cnt ~ II:JJ+II:KK,family = poisson)
sv5 <- glm(cnt ~ II:JJ+JJ:KK,family = poisson)
sv4 <- glm(cnt ~ II:KK+JJ:KK,family = poisson)
sv1 <- glm(cnt ~ II+JJ:KK,family = poisson)
sv2 <- glm(cnt ~ JJ+II:KK,family = poisson)
sv3 <- glm(cnt ~ II:JJ+KK,family = poisson)
sv0 <- glm(cnt ~ II+JJ+KK,family = poisson)

tab7=c(7,df.residual(sv7), deviance(sv7),
deviance(sv7)-2*df.residual(sv7),
((deviance(sv0)-deviance(sv7))/(deviance(sv0))),
(1-(deviance(sv7)*df.residual(sv0)/
(deviance(sv0)*df.residual(sv7)))) )
tab6=c(6,df.residual(sv6), deviance(sv6),
deviance(sv6)-2*df.residual(sv6),
((deviance(sv0)-deviance(sv6))/(deviance(sv0))),
(1-(deviance(sv6)*df.residual(sv0)/
(deviance(sv0)*df.residual(sv6)))) )
tab5=c(5,df.residual(sv5), deviance(sv5),
deviance(sv5)-2*df.residual(sv5),
((deviance(sv0)-deviance(sv5))/(deviance(sv0))),
(1-(deviance(sv5)*df.residual(sv0)/
(deviance(sv0)*df.residual(sv5)))) )
tab4=c(4,df.residual(sv4), deviance(sv4),
deviance(sv4)-2*df.residual(sv4),
((deviance(sv0)-deviance(sv4))/(deviance(sv0))),
(1-(deviance(sv4)*df.residual(sv0)/
(deviance(sv0)*df.residual(sv4)))) )
tab1=c(1,df.residual(sv1), deviance(sv1),
deviance(sv1)-2*df.residual(sv1),
((deviance(sv0)-deviance(sv1))/(deviance(sv0))),
(1-(deviance(sv1)*df.residual(sv0)/
(deviance(sv0)*df.residual(sv1)))) )
tab2=c(2,df.residual(sv2), deviance(sv2),
deviance(sv2)-2*df.residual(sv2),
((deviance(sv0)-deviance(sv2))/(deviance(sv0))),
(1-(deviance(sv2)*df.residual(sv0)/
(deviance(sv0)*df.residual(sv2)))) )
tab3=c(3,df.residual(sv3), deviance(sv3),
deviance(sv3)-2*df.residual(sv3),
((deviance(sv0)-deviance(sv3))/(deviance(sv0))),
(1-(deviance(sv3)*df.residual(sv0)/
(deviance(sv0)*df.residual(sv3)))) )
tab0=c(0,df.residual(sv0), deviance(sv0),
deviance(sv0)-2*df.residual(sv0),
((deviance(sv0)-deviance(sv0))/(deviance(sv0))),
(1-(deviance(sv0)*df.residual(sv0)/
(deviance(sv0)*df.residual(sv0)))) )

t(matrix(c(tab7,tab6,tab5,tab4,tab1,tab2,tab3,tab0),6,8))

3.7 Higher-Dimensional Tables

Muscle tension changes.

rm(list = ls())
tense <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\tab20-10a.dat",
sep="",col.names=c("y","Tn","Wt","Ms","Dr"))
attach(tense)
tense
#summary(tense)

W=factor(Wt)
M=factor(Ms)
D=factor(Dr)
T=factor(Tn)
m7 <- glm(y ~ T:W:M+T:W:D+T:M:D+W:M:D,family = poisson)
m4 <- glm(y ~ T:W+T:M+T:D+W:M+W:D+M:D,family = poisson)
m0 <- glm(y ~ T + W + M + D,family = poisson)

df=c(m7$df.residual,m4$df.residual,m0$df.residual)
G2=c(m7$deviance,m4$deviance,m0$deviance)
A2q=G2-(2*df)
modelm=c(df,G2,A2q)
model=matrix(modelm,3,3,dimnames=list(NULL,c("df","G2","A-q")))
model


You can also get the key statistics from the following commands.

m7 <- glm(y ~ T*W*M+T*W*D+T*M*D+W*M*D,family=poisson)
m7p=summary(m7)
m7p
anova(m7)

What you want is in the last 2 columns. R is fitting the models sequentially, adding in each term on the left.

See Section 4.6 for the Abortion Opinion data.

3.8 Exercises

EXERCISE 3.8.9. The Mantel-Haenszel Statistic.
mantelhaen.test
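For example, a sketch on a hypothetical 2×2×2 array of counts, one 2×2 layer per stratum:

MH=array(c(10,4,6,12,8,5,7,11),dim=c(2,2,2))
mantelhaen.test(MH,correct=FALSE)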


Chapter 4
Logistic Regression

4.1 Multiple Logistic Regression

This code includes diagnostic quantities that are not discussed until a few sections later. This fits the full model; the other fitted models are easy.

rm(list = ls())
chap.mlr <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\chapman.dat",
sep="",col.names=c("Case","Ag","S","D","Ch","H","W","y"))

attach(chap.mlr)
chap.mlr
#summary(chap.mlr)

#Summary tables
cm <- glm(y ~ Ag+S+D+Ch+H+W,family = binomial)
cmp=summary(cm)
cmp
#anova(cm)

# Diagnostics
rpearson=(y-cm$fit)/(cm$fit*(1-cm$fit))^(.5)
rstand=rpearson/(1-hatvalues(cm))^(.5)
infv = c(y,cm$fit,hatvalues(cm),rpearson,
rstand,cooks.distance(cm))
inf=matrix(infv,I(cmp$df[1]+cmp$df[2]),6,dimnames =
list(NULL,c("y","yhat","lev","Pearson","Stand.","C")))
inf

# Tests against Model (1)
cmAg <- glm(y ~ Ag,family = binomial)
anova(cmAg,cm)

#Variations on AIC
# q=400=200*2
Aq=AIC(cmAg)-400
Aq1=deviance(cmAg)-2*df.residual(cmAg)
c(Aq,Aq1)
Astar=258.1+Aq
out=c(df.residual(cmAg),deviance(cmAg),Aq,Astar,AIC(cmAg))
matrix(out,1,5,dimnames =
list(NULL,c("df","G2","A-2q","A*","AIC")))

The rest of the output is just reapplying modifications of the fitting code and applying the formulas.

Figure 4.1

x=seq(20,70,.5)
w =-4.5173+(0.04590*x)+(0.00686*140)+(-0.00694*90)+
(0.00631*200)+(-0.07400*69)+(0.02014*200)
w1=-4.5173+(0.04590*x)+(0.00686*140)+(-0.00694*90)+
(0.00631*300)+(-0.07400*69)+(0.02014*200)
y=exp(w)/(1+exp(w))
y1=exp(w1)/(1+exp(w1))
plot(x,y1,type="l",xlim=c(20,70),ylim=c(0,.5),
ylab="Fitted",xlab="Age",lty=2)
lines(x,y,type="l",lty=1)
legend("topleft",c("Chol","300","200"),lty=c(NA,2,1))

4.2 Measuring Model Fit

This section of the book does not propose a formal test, but it is quite similar to the widely programmed Hosmer and Lemeshow lack-of-fit test, which doesn't work (at least not when compared to a χ2 as it is usually programmed).

4.3 Logistic Regression Diagnostics

A table of diagnostics was given in Section 4.1. We can demonstrate the one-step algorithms.

First we give the standard fitting algorithm. Note that this gives slightly different standard errors than the software I used for the book.


rm(list = ls())
chap.mlr <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\chapman.dat",
sep="",col.names=c("Case","Ag","S","D","Ch","H","W","y"))
attach(chap.mlr)
chap.mlr
#summary(chap.mlr)

#Summary tables
cm <- glm(y ~ Ag+Ch+W,family = binomial)
cmp=summary(cm)
cmp
#anova(cm)

# Diagnostics
rpearson=(y-cm$fit)/(cm$fit*(1-cm$fit))^(.5)
rstand=rpearson/(1-hatvalues(cm))^(.5)
infv = c(y,cm$fit,hatvalues(cm),rpearson,
rstand,cooks.distance(cm))
inf=matrix(infv,I(cmp$df[1]+cmp$df[2]),6,dimnames =
list(NULL,c("y","phat","lev","Pearson","Stand.","C")))
inf

Now we construct the one-step model. Remember that this program is only for binary counts. Although there were some slight differences in the glm fit, these agree with the table in the book.

RWT=cm$fit*(1-cm$fit)
Y0=log(cm$fit/(1-cm$fit))
Y=Y0+(y-cm$fit)/RWT
# The following command should be and is a
# nearly perfect fit.
#summary(lm(Y0 ~ Ag+Ch+W,weight=RWT))
one=lm(Y ~ Ag+Ch+W,weight=RWT)

The following gives the leverages and Cook's distances from the one-step procedure and compares them to the values from the glm procedure, for the 4 cases discussed in the book.

rtMSE=summary(one)$sigma
levone=hatvalues(one)
cookone=(cooks.distance(one)*rtMSE^2)
c(levone[41],levone[86],levone[126],levone[192])
c(hatvalues(cm)[41],hatvalues(cm)[86],
hatvalues(cm)[126],hatvalues(cm)[192])
c(cookone[41],cookone[86],cookone[126],cookone[192])
c(cooks.distance(cm)[41],cooks.distance(cm)[86],
cooks.distance(cm)[126],cooks.distance(cm)[192])

To delete case 41 and refit, use y[41]=NA, although you might want to do this on a copy of y rather than on y itself.
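For example, a sketch using a copy y41 (a hypothetical name) so that y itself is left alone:

y41=y
y41[41]=NA
# glm drops the NA case by default
cm41 <- glm(y41 ~ Ag+Ch+W,family = binomial)
summary(cm41)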

4.4 Model Selection Methods

4.4.1 Stepwise logistic regression

This chooses models based on the AIC criterion, so they may be a bit different from the book. As illustrated earlier, read in the data and obtain the glm output.

ch = glm(y ~ Ag+S+D+Ch+H+W,family=binomial)
chstep <- step(ch, direction="backward")
chstep

Other "directions" include both and forward, but forward requires additional commands; see Section 10.3 and the sketch at the end of this subsection. You get similar results by replacing the glm output in ch with the lm output from

ch1 = lm(yy ~ Ag+S+D+Ch+H+W,weights=rwt)
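For forward selection, a minimal sketch (the intercept-only starting model ch0 and the scope formula are my own names, not the book's; see Section 10.3 for the book's approach):

ch0 = glm(y ~ 1,family=binomial)
chfwd <- step(ch0,scope=~Ag+S+D+Ch+H+W,direction="forward")
chfwd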

4.4.2 Best subset logistic regression

The method starts with the full model and performs only one step of the Newton-Raphson/iteratively reweighted least squares algorithm to determine the best models. This is a far better procedure than the score test method used by SAS Proc Logistic because it starts from the full model, which should be a good model, rather than the intercept-only model used by the score test. Also see the notes at the end.

rm(list = ls())
chap <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\chapman.dat",
sep="",col.names=c("Case","Ag","S","D","Ch","H","W","y"))
attach(chap)
chap
summary(chap)

#Summary tables


ch = glm(y ~ Ag+S+D+Ch+H+W,family=binomial)
chp=summary(ch)
chp
#anova(ch)

rwt=ch$fit*(1-ch$fit)
yy=log(ch$fit/(1-ch$fit))+(y-ch$fit)/rwt
# If Bin(n_i,p_i)s have n_i different from 1,
# multiply rwt and the second term in yy by n_i

ch1 <- lm(yy ~ Ag+S+D+Ch+H+W,weights=rwt)
ch1p=summary(ch1)
ch1p
anova(ch1)
# Note the agreement between the glm and lm fits!!!

# assign number of best models and number of
# predictor variables.

#install.packages("leaps")
library(leaps)
x <- model.matrix(ch1)[,-1]
nb=3
xp=ch1p$df[1]-1
dfe=length(y)- 1- c(rep(1:(xp-1),each=nb),xp)
g <- regsubsets(x,yy,nbest=nb,weights=rwt)
gg = summary(g)
tt=c(gg$rsq,gg$adjr2,gg$cp,sqrt(gg$rss/dfe))
tt1=matrix(tt,nb*(xp-1)+1,4,dimnames =
list(NULL,c("R2","AdjR2","Cp","RootMSE")))
tab1=data.frame(tt1,gg$outmat)
tab1

Another possible source for best subset logistic regression is the package bestglm, which seems to do full, rather than one-step, fits of the models, cf. Calcagno and de Mazancourt (2010).

Calcagno, Vincent and de Mazancourt, Claire (2010). glmulti: An R Package for Easy Automated Model Selection with (Generalized) Linear Models. Journal of Statistical Software, Volume 34, Issue 12.
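A sketch of how bestglm might be called for these data (an assumption on my part: bestglm expects a data frame Xy with the response in the last column, and the chapman variables are attached as above):

library(bestglm)
Xy=data.frame(Ag,S,D,Ch,H,W,y)
bestglm(Xy,family=binomial,IC="AIC")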

4.5 ANOVA Type Logit Models

Table 4.2


tense <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\tab20-10.dat",
sep="",col.names=c("High","Low","Wt","Ms","Dr"))
attach(tense)
tense
#summary(tense)

W=factor(Wt)
M=factor(Ms)
D=factor(Dr)
T=cbind(High,Low)
sv7 <- glm(T ~ W:M+W:D+M:D,family = binomial)
sv6 <- glm(T ~ W:M+W:D,family = binomial)
sv5 <- glm(T ~ W:M+M:D,family = binomial)
sv4 <- glm(T ~ W:D+M:D,family = binomial)
sv1 <- glm(T ~ W+M:D,family = binomial)
sv2 <- glm(T ~ M+W:D,family = binomial)
sv3 <- glm(T ~ W:M+D,family = binomial)
sv0 <- glm(T ~ W+M+D,family = binomial)
svd <- glm(T ~ W+M,family = binomial)
svm <- glm(T ~ W+D,family = binomial)
svw <- glm(T ~ M+D,family = binomial)

tab7=c(7,df.residual(sv7),deviance(sv7),
1-pchisq(deviance(sv7),df.residual(sv7)),
-2*df.residual(sv7)+deviance(sv7))
tab6=c(6,df.residual(sv6),deviance(sv6),
1-pchisq(deviance(sv6),df.residual(sv6)),
-2*df.residual(sv6)+deviance(sv6))
tab5=c(5,df.residual(sv5),deviance(sv5),
1-pchisq(deviance(sv5),df.residual(sv5)),
-2*df.residual(sv5)+deviance(sv5))
tab4=c(4,df.residual(sv4),deviance(sv4),
1-pchisq(deviance(sv4),df.residual(sv4)),
-2*df.residual(sv4)+deviance(sv4))
tab1=c(1,df.residual(sv1),deviance(sv1),
1-pchisq(deviance(sv1),df.residual(sv1)),
-2*df.residual(sv1)+deviance(sv1))
tab2=c(2,df.residual(sv2),deviance(sv2),
1-pchisq(deviance(sv2),df.residual(sv2)),
-2*df.residual(sv2)+deviance(sv2))
tab3=c(3,df.residual(sv3),deviance(sv3),
1-pchisq(deviance(sv3),df.residual(sv3)),
-2*df.residual(sv3)+deviance(sv3))
tab0=c(0,df.residual(sv0),deviance(sv0),
1-pchisq(deviance(sv0),df.residual(sv0)),
-2*df.residual(sv0)+deviance(sv0))
tabd=c(2,df.residual(svd),deviance(svd),
1-pchisq(deviance(svd),df.residual(svd)),
-2*df.residual(svd)+deviance(svd))
tabm=c(3,df.residual(svm),deviance(svm),
1-pchisq(deviance(svm),df.residual(svm)),
-2*df.residual(svm)+deviance(svm))
tabw=c(0,df.residual(svw),deviance(svw),
1-pchisq(deviance(svw),df.residual(svw)),
-2*df.residual(svw)+deviance(svw))

t(matrix(c(tab7,tab6,tab5,tab4,tab3,tab2,tab1,tab0,
tabd,tabm,tabw),5,11))

anova(sv0,sv6)
qchisq(0.95,2)
anova(sv3,sv6)
qchisq(0.95,1)
anova(sv3,sv7)

Tables 4.3 and 4.4

rm(list = ls())
tense <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\tab20-10a.dat",
sep="",col.names=c("y","Tn","Wt","Ms","Dr"))
attach(tense)
tense
#summary(tense)

W=factor(Wt)
M=factor(Ms)
D=factor(Dr)
T=factor(Tn)
m6 <- glm(y ~ T:W + T:M:D + W:M:D,family = poisson)
fitted(m6)
c(fitted(m6)[1]/fitted(m6)[9],fitted(m6)[2]/fitted(m6)[10],
fitted(m6)[3]/fitted(m6)[11],fitted(m6)[4]/fitted(m6)[12],
fitted(m6)[5]/fitted(m6)[13],fitted(m6)[6]/fitted(m6)[14],
fitted(m6)[7]/fitted(m6)[15],fitted(m6)[8]/fitted(m6)[16],
fitted(m6)[1]/fitted(m6)[9])


Table 4.4 directly from the logit model.

rm(list = ls())
tense <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\tab20-10.dat",
sep="",col.names=c("High","Low","Wt","Ms","Dr"))
attach(tense)
tense
#summary(tense)

W=factor(Wt)
M=factor(Ms)
D=factor(Dr)
T=cbind(High,Low)
ts <- glm(T ~ W + M*D,family = binomial)
tsp=summary(ts)
tsp
anova(ts)

fitted(ts)/(1-fitted(ts))

Tables 4.5 and 4.6 directly from the logit model.

rm(list = ls())
tense <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\tab20-10.dat",
sep="",col.names=c("High","Low","Wt","Ms","Dr"))
attach(tense)
tense
#summary(tense)

W=factor(Wt)
M=factor(Ms)
D=factor(Dr)
T=cbind(High,Low)
ts <- glm(T ~ W:M + M*D,family = binomial)
tsp=summary(ts)
tsp
anova(ts)

fitted(ts)/(1-fitted(ts))


4.6 Logit Models For a Multinomial Response

This code is actually for Table 4.7.

rm(list = ls())
abt <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\TAB21-4.DAT",
sep="",col.names=c("R","S","A","O","y"))
attach(abt)
abt
#summary(abt)

r=factor(R)
o=factor(O)
s=factor(S)
a=factor(A)

m15 <- glm(y ~ r:s:a+r:s:o+r:o:a+s:o:a ,family = poisson)
m14 <- glm(y ~ r:s:a + r:s:o + r:o:a ,family = poisson)
m13 <- glm(y ~ r:s:a + r:s:o + s:o:a ,family = poisson)
m12 <- glm(y ~ r:s:a + r:o:a + s:o:a ,family = poisson)
m11 <- glm(y ~ r:s:a + r:s:o + o:a ,family = poisson)
m10 <- glm(y ~ r:s:a + r:o:a + s:o ,family = poisson)
m9 <- glm(y ~ r:s:a + s:o:a + r:o ,family = poisson)
m8 <- glm(y ~ r:s:a + r:o + s:o + o:a,family = poisson)
m7 <- glm(y ~ r:s:a + r:o + s:o ,family = poisson)
m6 <- glm(y ~ r:s:a + r:o + o:a ,family = poisson)
m5 <- glm(y ~ r:s:a + s:o + o:a ,family = poisson)
m4 <- glm(y ~ r:s:a + r:o ,family = poisson)
m3 <- glm(y ~ r:s:a + s:o ,family = poisson)
m2 <- glm(y ~ r:s:a + o:a ,family = poisson)
m1 <- glm(y ~ r:s:a + o ,family = poisson)

df=c(m15$df.residual,m14$df.residual,m13$df.residual,
m12$df.residual,m11$df.residual,m10$df.residual,
m9$df.residual,m8$df.residual,m7$df.residual,
m6$df.residual,m5$df.residual,m4$df.residual,
m3$df.residual,m2$df.residual,m1$df.residual)
G2=c(m15$deviance,m14$deviance,m13$deviance,
m12$deviance,m11$deviance,m10$deviance,
m9$deviance,m8$deviance,m7$deviance,m6$deviance,
m5$deviance,m4$deviance,m3$deviance,m2$deviance,
m1$deviance)

A2q=G2-(2*df)


modelm=c(df,G2,A2q)
model=matrix(modelm,15,3,dimnames =
list(NULL,c("df","G2","A-q")))
model

We now get the output for Table 4.8 of the book. When looking at the estimated expected cell counts and Pearson residuals associated with the next group of commands, it is important to notice that in the data file the White Males are listed in a different order than the other race-sex groups.

rm(list = ls())
abt <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\TAB21-4.DAT",
sep="",col.names=c("R","S","A","O","y"))
attach(abt)
abt
#summary(abt)

r=factor(R)
o=factor(O)
s=factor(S)
a=factor(A)
m11 <- glm(y ~ r:s:a + r:s:o + o:a ,family = poisson)

m11s=summary(m11)
m11s
anova(m11)

rpearson=(y-m11$fit)/(m11$fit)^(.5)
rstand=rpearson/(1-hatvalues(m11))^(.5)
infv = c(y,m11$fit,hatvalues(m11),rpearson,
rstand,cooks.distance(m11))
inf=matrix(infv,I(m11s$df[1]+m11s$df[2]),6,dimnames =
list(NULL,c("y","yhat","lev","Pearson","Stand.","C")))
inf

We now examine fitting models (4.6.5) through (4.6.8) from the book.

rm(list = ls())
abop <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\tab20-15.dat",
sep="",col.names=c("Case","Race","Sex","Age","Yes","No","Total"))
attach(abop)
abop


#summary(abop)

#Summary tables
R=factor(Race)
S=factor(Sex)
A=factor(Age)
y=Yes/Total
# Model (4.6.5)
ab5 <- glm(y~R:S+A,family=binomial,weights=Total)
abp=summary(ab5)
abp

odds=ab5$fit/(1-ab5$fit)
odds

# Model (4.6.6)
ab6 <- glm(y~R:S+Age,family=binomial,weights=Total)
abp=summary(ab6)
abp
anova(ab6,ab5)

# Model (4.6.7)
Men=Race*(Sex-1)
m=factor(Men)
ab7 <- glm(y~m+A,family=binomial,weights=Total)
abp=summary(ab7)
abp
anova(ab7,ab5)

# Model (4.6.8)
ab8 <- glm(y~m+Age,family=binomial,weights=Total)
abp=summary(ab8)
abp
anova(ab8,ab5)

odds=ab8$fit/(1-ab8$fit)
oddstable=matrix(odds,6,4,dimnames =
list(NULL,c("Male", " White Female"," Male"," Nonwhite Female")))
oddstable

Also see multinom in the library nnet and polr in the library MASS.


4.7 Logistic Discrimination and Allocation

The first thing we have to do is create the 3 × 21 table illustrated in the book. We then fit the model and finally we get the entries for the book's Tables 4.11 and 4.12.

rm(list = ls())
cush <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\TAB21-11.DAT",
sep="",col.names=c("Syn","Tetra","Preg"))
attach(cush)
cush

#Create a 3 x 21 table of 0-1 entries,
#each row has 1's for a different type of syndrome
j=rep(seq(1,21),3)
i=c(rep(1,21),rep(2,21),rep(3,21))
Tet=c(Tetra,Tetra,Tetra)
Pre=c(Preg,Preg,Preg)
y=c(Syn,Syn,Syn)
y[1:6]=1
y[7:21]=0
y[22:27]=0
y[28:37]=1
y[38:58]=0
y[59:63]=1
datal=c(y,i,j,Tet,Pre)
datl=matrix(datal,63,5,dimnames =
list(NULL,c("y", "i", "j","Tet","Pre")))
datl

#Fit the log-linear model for logistic discrimination.
i=factor(i)
j=factor(j)
lp=log(Pre)
lt=log(Tet)
ld <- glm(y ~ i + j + i:lt +i:lp ,family = poisson)
ldp=summary(ld)
ldp
anova(ld)

# Table 4.12
q=ld$fit
# Divide by sample sizes
p1=ld$fit[1:21]/6
p2=ld$fit[22:42]/10
p3=ld$fit[43:63]/5
# Produce table
estprob = c(Syn,p1,p2,p3)
EstProb=matrix(estprob,21,4,dimnames =
list(NULL,c("Group", "A", "B","C")))
EstProb

# Table 4.13 Proportional prior probabilities.
post = c(Syn,ld$fit)
PropProb=matrix(post,21,4,dimnames =
list(NULL,c("Group", "A", "B","C")))
PropProb

# Table 4.13 Equal prior probabilities.
p=p1+p2+p3
pp1=p1/p
pp2=p2/p
pp3=p3/p
post = c(Syn,pp1,pp2,pp3)
EqProb=matrix(post,21,4,dimnames=
list(NULL,c("Group","A","B","C")))
EqProb

4.8 Exercises


Chapter 5Independence Relationships and GraphicalModels

There is no computing in this entire chapter.

5.1 Model Interpretations

5.2 Graphical and Decomposable Models

5.3 Collapsing Tables

5.4 Recursive Causal Models

5.5 Exercises


Chapter 6Model Selection Methods and Model Evaluation

6.1 Stepwise Procedures for Model Selection

No computing.

6.2 Initial Models for Selection Methods

6.2.1 All s-Factor Effects

rm(list = ls())
tense <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\tab20-10a.dat",
   sep="",col.names=c("y","Tn","Wt","Ms","Dr"))
attach(tense)
tense
#summary(tense)

W=factor(Wt)
M=factor(Ms)
D=factor(Dr)
T=factor(Tn)
m7 <- glm(y ~ T:W:M+T:W:D+T:M:D+W:M:D,family=poisson)
m4 <- glm(y ~ T:W+T:M+T:D+W:M+W:D+M:D,family=poisson)
m0 <- glm(y ~ T + W + M + D,family = poisson)

df=c(m7$df.residual,m4$df.residual,m0$df.residual)
G2=c(m7$deviance,m4$deviance,m0$deviance)
A2q=G2-(2*df)


modelm=c(df,G2,A2q)
model=matrix(modelm,3,3,dimnames=list(NULL,c("df","G2","A-q")))
model

6.2.2 Examining Each Term Individually

No computing.

6.2.3 Tests of Marginal and Partial Association

The computations are simple but repetitive. The problem is identifying the models you need to fit. The beauty of BMDP 4F is that it did these for you automatically. I am not about to write a front end that determines all of these.
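To show the shape of the computations, here is a hedged sketch (my code, not anything from BMDP) of one marginal and one partial association test for the T:W term, assuming y, T, W, M, D from the tension data of Section 6.2.1.

# Marginal association: collapse over M and D, then test [T][W] vs [TW].
tw <- xtabs(y ~ T + W)
marg <- loglin(tw, margin=list(1,2), fit=FALSE, print=FALSE)
c(G2=marg$lrt, df=marg$df)
# Partial association: test T:W last in the all-two-factor model.
full <- glm(y ~ (T + W + M + D)^2, family=poisson)
anova(update(full, . ~ . - T:W), full, test="Chisq")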

6.2.4 Testing Each Term Last

EXAMPLE 6.2.5.

rm(list = ls())
tense <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\tab20-10a.dat",
   sep="",col.names=c("y","Tn","Wt","Ms","Dr"))
attach(tense)
tense
#summary(tense)

W=factor(Wt)
M=factor(Ms)
D=factor(Dr)
T=factor(Tn)
yy=log(y)

m00 = lm(yy ~ T*W*M*D)
m0a=anova(m00)
SS=m0a[,2]

tab=c(4*sqrt(SS[-16]),4*sqrt(SS[-16])/sqrt(sum(1/y)))


matrix(tab,15,2,dimnames = list(NULL,c("Est", "z")))

For all the weaknesses of using what is essentially a normal approximation for count data, because the linear model is balanced, the results in the above table do not depend on the order in which effects are fitted. R will very conveniently print out a similar ANOVA table of G2 values (deviance reductions) and for this example the results are very similar. The only problem with the output for the code below is that the model (T + W + M + D)^4, which is coded below, gives different results than the model (D + M + W + T)^4 or any other permutation of the factors. (Check the higher-order interactions.) Still, for these data the basic story remains pretty much the same regardless of the order. No guarantee that that will always happen.

rm(list = ls())
tense <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\tab20-10a.dat",
   sep="",col.names=c("y","Tn","Wt","Ms","Dr"))
attach(tense)
tense
#summary(tense)

W=factor(Wt)
M=factor(Ms)
D=factor(Dr)
T=factor(Tn)

m8 <- glm(y ~ (T + W + M + D)^4,family = poisson)
anova(m8)
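To see the order dependence for yourself (my addition, same data), fit a permuted version and compare the higher-order interaction rows.

m8b <- glm(y ~ (D + M + W + T)^4,family = poisson)
anova(m8b)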

6.3 Example of Stepwise Methods

The examples in the book involve just fitting all of the models and accumulating the results. The following subsections discuss the use of R's step command. You will notice that step applied to ANOVA type models does not eliminate redundant terms.

6.3.1 Forward Selection

The following code runs and gives results not too dissimilar from the book. It is hard for me to care enough about forward selection to worry about the differences. The


main differences are due to R basing its decisions on AIC values rather than other criteria. You can probably vary the results by changing the k parameter discussed in the next subsection.

rm(list = ls())
abt <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\TAB21-4.DAT",
   sep="",col.names=c("R","S","A","O","y"))
attach(abt)
abt
#summary(abt)

r=factor(R)
o=factor(O)
s=factor(S)
a=factor(A)

svr <- glm(y ~ r + s + o + a, family = poisson)
step(svr, y ~ r*s*o*a, direction="forward")

6.3.2 Backward Elimination

The book considers applying backward elimination to the initial model containing all two-factor terms. R's step command decides what to delete based on the AIC criterion rather than the P values used in the book. The default step procedure drops one less two-factor term than the procedure in the book. You can arrive at the same model that the book gets by redefining AIC. R includes a k parameter for AIC where the default value (and the true definition of AIC) is k=2. If you reset k=2.5, you arrive at the same final model as the book.

rm(list = ls())
abt <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\TAB21-4.DAT",
   sep="",col.names=c("R","S","A","O","y"))
attach(abt)
abt
#summary(abt)

r=factor(R)
o=factor(O)
s=factor(S)
a=factor(A)

svf <- glm(y ~ (r + s + o + a)^2, family = poisson)


step(svf, direction="backward")
step(svf, direction="backward", k=2.5)

# you might find it interesting to see what
# the following code produces
svff <- glm(y ~ r*s*o*a, family = poisson)
step(svff, direction="backward")

6.4 Aitkin’s Method of Backward Selection

Computationally this is just fitting a lot of models and using pchisq to obtain the gammas.
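For instance, a sketch of the pchisq computation for a single drop from a larger model big to a reduced model small (names hypothetical, both fit with glm):

gam <- function(small, big) {
   dev <- deviance(small) - deviance(big)      # G2 for the drop
   df <- df.residual(small) - df.residual(big)
   1 - pchisq(dev, df)                         # upper tail probability
}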

6.5 Model Selection Among Decomposable and GraphicalModels

Computationally this is just fitting a lot of models and perhaps using anova to obtain differences. The trick is in selecting the models and I am not about to program that for you.
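For any nested pair of models already fit with glm the difference test is one line (model names hypothetical):

anova(m.small, m.big, test="Chisq")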

6.6 Use of Model Selection Criteria

rm(list = ls())
abt <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\TAB21-4.DAT",
   sep="",col.names=c("R","S","A","O","y"))
attach(abt)
abt
#summary(abt)

r=factor(R)
o=factor(O)
s=factor(S)
a=factor(A)

sv6 <- glm(y ~ r:s:o + o:a, family = poisson)
sv5 <- glm(y ~ r:o + s:o + o:a, family = poisson)
sv4 <- glm(y ~ r + s + o:a, family = poisson)
sv3 <- glm(y ~ r:a + s + o:a, family = poisson)


sv2 <- glm(y ~ r + s:o + o:a, family = poisson)
sv1 <- glm(y ~ r:o + s + o:a, family = poisson)
sv0 <- glm(y ~ r + s + o + a, family = poisson)

tab6=c(6,df.residual(sv6),deviance(sv6),deviance(sv6)-2*df.residual(sv6),
   ((deviance(sv0)-deviance(sv6))/(deviance(sv0))),
   (1-(deviance(sv6)*df.residual(sv0)/(deviance(sv0)*df.residual(sv6)))))
tab5=c(5,df.residual(sv5),deviance(sv5),deviance(sv5)-2*df.residual(sv5),
   ((deviance(sv0)-deviance(sv5))/(deviance(sv0))),
   (1-(deviance(sv5)*df.residual(sv0)/(deviance(sv0)*df.residual(sv5)))))
tab4=c(4,df.residual(sv4),deviance(sv4),deviance(sv4)-2*df.residual(sv4),
   ((deviance(sv0)-deviance(sv4))/(deviance(sv0))),
   (1-(deviance(sv4)*df.residual(sv0)/(deviance(sv0)*df.residual(sv4)))))
tab1=c(3,df.residual(sv1),deviance(sv1),deviance(sv1)-2*df.residual(sv1),
   ((deviance(sv0)-deviance(sv1))/(deviance(sv0))),
   (1-(deviance(sv1)*df.residual(sv0)/(deviance(sv0)*df.residual(sv1)))))
tab2=c(2,df.residual(sv2),deviance(sv2),deviance(sv2)-2*df.residual(sv2),
   ((deviance(sv0)-deviance(sv2))/(deviance(sv0))),
   (1-(deviance(sv2)*df.residual(sv0)/(deviance(sv0)*df.residual(sv2)))))
tab3=c(1,df.residual(sv3),deviance(sv3),deviance(sv3)-2*df.residual(sv3),
   ((deviance(sv0)-deviance(sv3))/(deviance(sv0))),
   (1-(deviance(sv3)*df.residual(sv0)/(deviance(sv0)*df.residual(sv3)))))

t(matrix(c(tab6,tab5,tab4,tab3,tab2,tab1),6,6))

6.7 Residuals and Influential Observations

rm(list = ls())
abt <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\TAB21-4.DAT",
   sep="",col.names=c("R","S","A","O","y"))
attach(abt)
abt
#summary(abt)

r=factor(R)
o=factor(O)
s=factor(S)
a=factor(A)

mm <- glm(y ~ r:s:o + o:a, family = poisson)
mms = summary(mm)

rpearson=(y-mm$fit)/(mm$fit)^(.5)
rstand=rpearson/(1-hatvalues(mm))^(.5)
infv = c(y,mm$fit,hatvalues(mm),rpearson,rstand,cooks.distance(mm))
inf=matrix(infv,I(mms$df[1]+mms$df[2]),6,dimnames=
   list(NULL,c("n","mhat","lev","Pearson","Stand.","C")))
inf

index=c(1:72)
plot(index,hatvalues(mm),ylab="Leverages",xlab="Index")
boxplot(rstand,horizontal=TRUE,xlab="Standardized residuals")
plot(index,rstand,ylab="Standardized residuals",xlab="Index")
qqnorm(rstand,ylab="Standardized residuals")
boxplot(cooks.distance(mm),horizontal=TRUE,xlab="Cook's distances")
plot(index,cooks.distance(mm),ylab="Cook's distances",xlab="Index")

6.8 Drawing Conclusions

6.9 Exercises


Chapter 7Models for Factors with Quantitative Levels

7.1 Models for Two-Factor Tables

rm(list = ls())
abt <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\EX21-5-1.DAT",
   sep="",col.names=c("c","p","y"))
attach(abt)
abt

C=factor(c)
P=factor(p)
m3 <- glm(y ~ C+P+C:p,family=poisson) #[C][P][C_1]
m2 <- glm(y ~ C+P+c:P,family=poisson) #[C][P][P_1]
m1 <- glm(y ~ C+P+c:p,family=poisson) #[C][P][gamma]
m0 <- glm(y ~ C+P,family=poisson)     #[C][P]
df=c(m3$df.residual,m2$df.residual,m1$df.residual,m0$df.residual)
G2=c(m3$deviance,m2$deviance,m1$deviance,m0$deviance)
A2q=G2-(2*df)
modelm=c(df,G2,A2q)
model=matrix(modelm,4,3,dimnames=list(NULL,c("df","G2","A-q")))
model

m1s=summary(m1)
m1s
anova(m1)


rpearson=(y-m1$fit)/(m1$fit)^(.5)
rstand=rpearson/(1-hatvalues(m1))^(.5)
infv = c(y,m1$fit,hatvalues(m1),rpearson,rstand,cooks.distance(m1))
inf=matrix(infv,I(m1s$df[1]+m1s$df[2]),6,dimnames=
   list(NULL,c("y","yhat","lev","Pearson","Stand.","C")))
inf

m0$fit

7.2 Higher-Dimensional Tables

rm(list = ls())
abt <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\TAB21-4.DAT",
   sep="",col.names=c("R","S","A","O","y"))
attach(abt)
abt
#summary(abt)

r=factor(R)
o=factor(O)
s=factor(S)
a=factor(A)
A2=A*A
#[RSO][OA]
ab <- glm(y ~ r:s:o + o:a, family = poisson)
abp=summary(ab)
abp
anova(ab)

#[RSO][A][O_1][O_2]
ab2 <- glm(y ~ r:s:o + a + o:A + o:A2, family = poisson)
abp2=summary(ab2)
abp2
anova(ab2)

#[RSO][A][O_1]
ab3 <- glm(y ~ r:s:o + a + o:A, family = poisson)
abp3=summary(ab3)
abp3
anova(ab3)


rpearson=(y-ab3$fit)/(ab3$fit)^(.5)
rstand=rpearson/(1-hatvalues(ab3))^(.5)
infv = c(y,ab3$fit,hatvalues(ab3),rpearson,rstand,cooks.distance(ab3))
inf=matrix(infv,I(abp3$df[1]+abp3$df[2]),6,dimnames=
   list(NULL,c("y","yhat","lev","Pearson","Stand.","C")))
inf

7.3 Unknown Factor Scores

rm(list = ls())
ct=c(43,16,3,6,11,10,9,18,16)
L=c(1,2,3,1,2,3,1,2,3)
V=c(1,1,1,2,2,2,3,3,3)
l=factor(L)
v=factor(V)
ind=glm(ct ~ v + l,family=poisson)
summary(ind)
t=log(ind$fit)
t2=t*t
m11=glm(ct ~ v + l + v:t,family=poisson)
summary(m11)
m12=glm(ct ~ v + l + l:t,family=poisson)
summary(m12)
m13=glm(ct ~ v + l + t2,family=poisson)
summary(m13)

summary(m11)
t

See the package logmult, which runs things from the package gnm.
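A hedged sketch (assuming logmult is installed): its rc function fits Goodman's RC(1) row-column association model to the 3×3 table above.

library(logmult)
tab <- matrix(ct, 3, 3)   # rows index L, columns index V
rc1 <- rc(tab, nd = 1)    # RC(1) association model
summary(rc1)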

7.4 Logit Models with Unknown Scores

I must have gotten the maximum likelihood fits in Table 7.3 from Chuang (1983) because I have no idea how I would have computed them.

rm(list = ls())
High=c(245,330,388,100,77,51,28,89,102,67,87,62,125,234,233,109,197,90)
Low=c(115,152,153,40,37,19,11,37,35,18,12,13,68,91,173,47,82,32)


R=c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3)
E=c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)
r=factor(R)
e=factor(E)
T=cbind(High,Low)
m4=glm(T ~ r + e,family=binomial)
summary(m4)
t=log(m4$fit/(1-m4$fit))
t2=t*t
m5=glm(T ~ r + e + r:t,family=binomial)
summary(m5)
m6=glm(T ~ r + e + e:t,family=binomial)
summary(m6)
m7=glm(T ~ r + e + t2,family=binomial)
summary(m7)

7.5 Exercises


Chapter 8Fixed and Random Zeros

8.1 Fixed Zeros

This just involves leaving some cells out of the table.

EXAMPLE 8.1.1. Brunswick (1971) reports data on the health concerns of teenagers.

rm(list = ls())
ct=c(4,42,57,2,7,20,9,4,19,71,7,8,10,31)
s=c(1,1,1,1,1,1,2,2,2,2,2,2,2,2)
a=c(1,1,1,2,2,2,1,1,1,1,2,2,2,2)
h=c(1,3,4,1,3,4,1,2,3,4,1,2,3,4)
S=factor(s)
A=factor(a)
H=factor(h)
m7=glm(ct ~ S:A + S:H + A:H,family=poisson)
summary(m7)
m6=glm(ct ~ S:H + A:H,family=poisson)
summary(m6)
m5=glm(ct ~ S:A + A:H,family=poisson)
summary(m5)
m4=glm(ct ~ S:A + S:H,family=poisson)
summary(m4)
m3=glm(ct ~ S:A + H,family=poisson)
summary(m3)
m2=glm(ct ~ S:H + A,family=poisson)
summary(m2)
m1=glm(ct ~ S + A:H,family=poisson)
summary(m1)
m0=glm(ct ~ S + A + H,family=poisson)
summary(m0)


tab7=c(7,df.residual(m7),deviance(m7),1-pchisq(deviance(m7),df.residual(m7)))
tab6=c(6,df.residual(m6),deviance(m6),1-pchisq(deviance(m6),df.residual(m6)))
tab5=c(5,df.residual(m5),deviance(m5),1-pchisq(deviance(m5),df.residual(m5)))
tab4=c(4,df.residual(m4),deviance(m4),1-pchisq(deviance(m4),df.residual(m4)))
tab1=c(1,df.residual(m1),deviance(m1),1-pchisq(deviance(m1),df.residual(m1)))
tab2=c(2,df.residual(m2),deviance(m2),1-pchisq(deviance(m2),df.residual(m2)))
tab3=c(3,df.residual(m3),deviance(m3),1-pchisq(deviance(m3),df.residual(m3)))
tab0=c(0,df.residual(m0),deviance(m0),1-pchisq(deviance(m0),df.residual(m0)))

t(matrix(c(tab7,tab6,tab5,tab4,tab3,tab2,tab1,tab0),4,8))

8.2 Partitioning Polytomous Variables

rm(list = ls())
ct=c(104,165,65,100,4,5,13,32,42,142,44,130,3,6,6,23)
s=c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
y=c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2)
r=c(1,1,2,2,3,3,4,4,1,1,2,2,3,3,4,4)

S=factor(s)
Y=factor(y)
R=factor(r)

m7=glm(ct ~ R:Y + R:S + Y:S,family=poisson)
summary(m7)
m5=glm(ct ~ R:Y + Y:S,family=poisson)
summary(m5)
m4=glm(ct ~ R:Y + R:S,family=poisson)
summary(m4)
m3=glm(ct ~ R:Y + S,family=poisson)
summary(m3)


tab7=c(7,df.residual(m7),deviance(m7),1-pchisq(deviance(m7),df.residual(m7)))
tab5=c(5,df.residual(m5),deviance(m5),1-pchisq(deviance(m5),df.residual(m5)))
tab4=c(4,df.residual(m4),deviance(m4),1-pchisq(deviance(m4),df.residual(m4)))
tab3=c(3,df.residual(m3),deviance(m3),1-pchisq(deviance(m3),df.residual(m3)))

t(matrix(c(tab7,tab4,tab5,tab3),4,4))

g=c(1,1,2,2,1,1,4,4,1,1,2,2,1,1,4,4)
h=c(1,1,1,1,3,3,1,1,1,1,1,1,3,3,1,1)
G=factor(g)
H=factor(h)
eq2=glm(ct ~ G:H:Y + G:H:S + Y:S,family=poisson)
summary(eq2)
eq3=glm(ct ~ G:H:Y + G:S + Y:S,family=poisson)
summary(eq3)

r=c(1,1,2,2,3,3,4,4,1,1,2,2,3,3,4,4)
p=c(1,1,2,2,2,2,2,2,1,1,2,2,2,2,2,2)
c=c(2,2,1,1,2,2,2,2,2,2,1,1,2,2,2,2)
j=c(2,2,2,2,1,1,2,2,2,2,2,2,1,1,2,2)
o=c(2,2,2,2,2,2,1,1,2,2,2,2,2,2,1,1)
P=factor(p)
C=factor(c)
J=factor(j)
O=factor(o)

eq4=glm(ct ~ P:C:J:O:Y + P:C:J:O:S + Y:S,family=poisson)
summary(eq4)
eq5=glm(ct ~ P:C:J:O:Y + P:S + Y:S,family=poisson)
summary(eq5)
eq6=glm(ct ~ P:C:J:O:Y + C:S + Y:S,family=poisson)
summary(eq6)
eq7=glm(ct ~ P:C:J:O:Y + J:S + Y:S,family=poisson)
summary(eq7)
eq8=glm(ct ~ P:C:J:O:Y + O:S + Y:S,family=poisson)
summary(eq8)
eq9=glm(ct ~ P:C:J:O:Y + Y:S,family=poisson)
summary(eq9)


tab4=c(4,df.residual(eq4),deviance(eq4),1-pchisq(deviance(eq4),df.residual(eq4)))

tab5=c(5,df.residual(eq5),deviance(eq5),1-pchisq(deviance(eq5),df.residual(eq5)))

tab6=c(6,df.residual(eq6),deviance(eq6),1-pchisq(deviance(eq6),df.residual(eq6)))

tab7=c(7,df.residual(eq7),deviance(eq7),1-pchisq(deviance(eq7),df.residual(eq7)))

tab8=c(8,df.residual(eq8),deviance(eq8),1-pchisq(deviance(eq8),df.residual(eq8)))

tab9=c(9,df.residual(eq9),deviance(eq9),1-pchisq(deviance(eq9),df.residual(eq9)))

t(matrix(c(tab4,tab5,tab6,tab7,tab8,tab9),4,6))

8.3 Random Zeros

EXAMPLE 8.3.1. The book indicates that 3 rows of the 24×3 table are zeros, as can be seen in Table 8.3 of the book. If you just fit the entire data, the corresponding 9 fitted values m_hijk converge to 0. The way the data are read below, those cases turn out to be 9, 11, 19, 33, 35, 43, 57, 59, 67. R reports the same G2 as in the book, but does not give the book's degrees of freedom.

rm(list = ls())
knee <- read.table("C:\\E-drive\\Books\\LOGLIN2\\DATA\\TAB8-3.DAT",
   sep="",col.names=c("hh","ii","jj","kk","ct"))
attach(knee)
knee
#summary(knee)
T=factor(hh)
S=factor(ii)
A=factor(jj)
R=factor(kk)

art=glm(ct ~ T:A:S + T:R + A:R,family=poisson)
summary(art)
art$fit
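One way (my addition) to locate the cells whose fitted values are collapsing toward zero:

which(art$fit < 1e-5)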


art=glm(ct ~ T:A:S + T:A:R,family=poisson)
summary(art)

Note also the much larger than normal number of iterations the computations take. Technically, there are no maximum likelihood estimates because some estimates are converging to 0 and 0 is not an allowable MLE.

Now we drop the offending cells, refit the models, and get the same G2 values but the degrees of freedom from the book.

ctt=ct
ctt[9]=NA
ctt[11]=NA
ctt[19]=NA
ctt[33]=NA
ctt[35]=NA
ctt[43]=NA
ctt[57]=NA
ctt[59]=NA
ctt[67]=NA

artt=glm(ctt ~ T:A:S + T:R + A:R,family=poisson)
summary(artt)

artt=glm(ctt ~ T:A:S + T:A:R,family=poisson)
summary(artt)

EXAMPLE 8.3.2. The book indicates that 4 rows of the 24×3 table are zeros, as can be seen in Table 8.4 of the book. If you just fit the entire data, the 12 cases identified in the book have fitted values m_hijk that converge to 0. The way the data are read below, those cases turn out to be 3, 4, 12, 13, 14, 15, 19, 21, 28, 30, 31, 34. Also as indicated in the book, this is a saturated model, so the other cases have m_hijk = n_hijk. The program reports G2 ≈ 0 on 4 degrees of freedom which, again, is too many degrees of freedom.

rm(list = ls())
mel <- read.table("C:\\E-drive\\Books\\LOGLIN2\\DATA\\TAB8-4.DAT",
   sep="",col.names=c("hh","ii","jj","kk","ct"))
attach(mel)
mel
#summary(mel)
G=factor(hh)
R=factor(ii)
Im=factor(jj)


S=factor(kk)

out=glm(ct ~ G:R:Im+G:R:S+G:Im:S+R:Im:S,family=poisson)
summary(out)
out$fit
ct

Again note the much larger than normal number of iterations the computations take. Technically, there are no maximum likelihood estimates because some estimates are converging to 0 and 0 is not an allowable MLE.

Now we drop the offending cells and refit the model; we get the same G2 ≈ 0 value but 0 degrees of freedom, as in the book.

ctt=ct
ctt[3]=NA
ctt[4]=NA
ctt[12]=NA
ctt[13]=NA
ctt[14]=NA
ctt[15]=NA
ctt[19]=NA
ctt[21]=NA
ctt[28]=NA
ctt[30]=NA
ctt[31]=NA
ctt[34]=NA

out=glm(ctt ~ G:R:Im+G:R:S+G:Im:S+R:Im:S,family=poisson)
summary(out)
out$fit
ct

8.4 Exercises

EXAMPLE 8.4.3. Partitioning Two-Way Tables. Lancaster (1949) and Irwin (1949).

EXAMPLE 8.4.4. The Bradley-Terry Model.


Chapter 9Generalized Linear Models

No computing in this chapter.

9.1 Distributions for Generalized Linear Models

9.2 Estimation of Linear Parameters

9.3 Estimation of Dispersion and Model Fitting

9.4 Summary and Discussion

9.5 Exercises


Chapter 10The Matrix Approach to Log-Linear Models

10.1 Maximum Likelihood Theory for Multinomial Sampling

10.2 Asymptotic Results

EXAMPLE 10.2.3. In the abortion opinion data of Chapter 3 with the model [RSO][OA] (cf. Table 6.7), the cell for nonwhite males between 18 and 25 years of age who support abortion.

EXAMPLE 10.2.4. Automobile Injuries

rm(list = ls())
sb <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\EX21-3-1.dat",
   sep="",col.names=c("nuLL","i","j","k","y"))
attach(sb)
sb
#summary(sb)
I=factor(i)
J=factor(j)
K=factor(k)
m7 <- glm(y ~ I:J + I:K + J:K,family = poisson)
m7s=summary(m7)

rpearson=(y-m7$fit)/(m7$fit)^(.5)
rstand=rpearson/(1-hatvalues(m7))^(.5)
infv = c(y,m7$fit,hatvalues(m7),rpearson,rstand,cooks.distance(m7))
inf=matrix(infv,I(m7s$df[1]+m7s$df[2]),6,dimnames=
   list(NULL,c("y","yhat","lev","Pearson","Stand.","C")))


inf

EXAMPLE 10.2.6. Classroom behavior.

rm(list = ls())
sb <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\EX21-3-2.dat",
   sep="",col.names=c("y","i","j","k"))
attach(sb)
sb
#summary(sb)
I=factor(i)
J=factor(j)
K=factor(k)
m7 <- glm(y ~ I + J + K + J:K,family = poisson)
m7s=summary(m7)
vcov(m7)

rpearson=(y-m7$fit)/(m7$fit)^(.5)
rstand=rpearson/(1-hatvalues(m7))^(.5)
infv = c(y,m7$fit,hatvalues(m7),rpearson,rstand,cooks.distance(m7))
inf=matrix(infv,I(m7s$df[1]+m7s$df[2]),6,dimnames=
   list(NULL,c("y","yhat","lev","Pearson","Stand.","C")))
inf

10.3 Product-Multinomial Sampling

10.4 Inference for Model Parameters

10.5 Methods for Finding Maximum Likelihood Estimates

10.6 Regression Analysis of Categorical Data

EXAMPLE 10.6.1. Drug Comparisons.

rm(list = ls())
cnt=c(6,16,2,4,2,4,6,6)
a=c(1,1,1,1,2,2,2,2)
A=3-2*a

Page 65: Preliminary Version of R Commands for – Log-Linear Models ...fletcher/R-LOGLIN2R.pdf · Preface This online book is an R companion to Log-linear Models and Logistic Regres-sion,

10.8 Exercises 55

b=c(1,1,2,2,1,1,2,2)
B=3-2*b
c=c(1,2,1,2,1,2,1,2)
AB=A*B
C=3-2*c
y=log(cnt)

ts <- lm(y ~ A+B+AB+C,weights = cnt)
tsp=summary(ts)
tsp
anova(ts)

# new standard errors
coef(tsp)[,2]/tsp$sigma
# new z scores
coef(tsp)[,3]*tsp$sigma

10.7 Residual Analysis and Outliers

10.8 Exercises


Chapter 11The Matrix Approach to Logit Models

11.1 Estimation and Testing for Logistic Models

11.2 Model Selection Criteria for Logistic Regression

11.3 Likelihood Equations and Newton-Raphson

11.4 Weighted Least Squares for Logit Models

11.5 Multinomial Response Models

11.6 Asymptotic Results

11.7 Discrimination, Allocations, and Retrospective Data

11.8 Exercises


Chapter 12Maximum Likelihood Theory for Log-LinearModels

There is no computing in this chapter.

12.1 Notation

12.2 Fixed Sample Size Properties

12.3 Asymptotic Properties

12.4 Applications

12.5 Proofs of Lemma 12.3.2 and Theorem 12.3.8

59

Page 70: Preliminary Version of R Commands for – Log-Linear Models ...fletcher/R-LOGLIN2R.pdf · Preface This online book is an R companion to Log-linear Models and Logistic Regres-sion,
Page 71: Preliminary Version of R Commands for – Log-Linear Models ...fletcher/R-LOGLIN2R.pdf · Preface This online book is an R companion to Log-linear Models and Logistic Regres-sion,

Chapter 13Bayesian Binomial Regression: OpenBUGS RunThrough R

What exists here seems OK. Many parts do not exist YET.

The book does not get into computational issues until Section 4. Here we use the entire chapter to gradually introduce the computations needed.

Bayesian computation has made huge strides since the second edition of this book in 1997. In the book I/we used importance sampling. (Ed Bedrick did almost all the computing for joint papers we wrote with Wes Johnson.) Current practice involves using Markov chain Monte Carlo (McMC) methods that provide a sequence of simulated observations on the parameters. Two particular tools in this approach are Gibbs sampling (named by Geman and Geman, 1984, after distributions that were introduced by and named after the greatest American physicist of the 19th century) and the Metropolis-Hastings algorithm (named after the alphabetically listed authors of Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller, 1953, and the person who introduced the technique to the statistics community [and made a simple but very useful improvement], Hastings, 1970). For more about McMC see Christensen et al. (2010, Chapter 6) and the references given therein. Henceforth Christensen et al. is referred to as BIDA.

Unlike the importance samples used in the second edition, McMC samples are not naturally independent and they only become (approximate) samples from the posterior distribution after the Markov chain has been running quite a while, i.e., after one is deep into the sequence of observations. Computationally, we need to specify a burn-in period for the samples to get close to the posterior and then we throw away all of the observations from the burn-in period. Among the samples we use, we tend to take larger sample sizes to adjust for the lack of independence. When averaging sample observations to estimate some quantity (including probabilities of events) the lack of independence is rarely a problem. When applying more sophisticated techniques than averaging to the samples, if independence is important to the technique, we can often approximate independence by thinning the sample, i.e., using, say, only every 10th or 20th observation from the Markov chain.
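Purely for illustration, with a hypothetical vector draws of McMC output, burn-in removal and thinning look like:

post <- draws[-(1:1000)]                        # discard a 1000-iteration burn-in
thinned <- post[seq(1, length(post), by = 20)]  # keep every 20th observation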


Of course in any program we need to specify the sample size and the rate at which any thinning occurs. The default is typically no thinning.

BUGS (Bayesian inference Using Gibbs Sampling) provides a language for specifying Bayesian models computationally, cf. http://www.mrc-bsu.cam.ac.uk/software/bugs/. OpenBUGS (https://openbugs.net) and JAGS (http://mcmc-jags.sourceforge.net/) implement that language to actually analyze data. I will present results using OpenBUGS. I will also illustrate running the OpenBUGS code in R using R2OpenBUGS, cf. https://cran.r-project.org/web/packages/R2OpenBUGS/index.html.

An outdated version of OpenBUGS is WinBUGS, cf. https://cran.r-project.org/web/packages/R2WinBUGS/index.html. WinBUGS was used in BIDA but the commands given in BIDA should almost all work with OpenBUGS. (Some data entry is different.) JAGS can be used in R with R2jags, cf. https://cran.r-project.org/web/packages/R2jags/index.html. R also has tools for doing these things directly, cf. https://cran.r-project.org/web/packages/MCMCpack/index.html.

My goal here is to get you through an McMC version of the computations in the book. For a more general tutorial on OpenBUGS see http://www.openbugs.net/Manuals/Tutorial.html.

13.1 Introduction

There are three things you need to do in using OpenBUGS or JAGS:

• Specify the Bayesian model. This involves specifying both the sampling distribution and the prior distribution.

• Enter the data. Depending on how you specified the model, this includes specifying any “parameters” in the model that are known.

• Identify and give starting values for the unknown parameters.

There are a lot of similarities between the R and BUGS languages but before proceeding we mention a couple of oddities of the BUGS language. First,

y ~ Bin(N, p)

is written as

y ~ dbin(p,N)

with the order of N and p reversed. Also,

y ~ N(m,v)

is written as

y ~ dnorm(m,1/v)


where the variance v is replaced in BUGS by the precision, 1/v. Replacing normal variances with precisions is a very common thing to do in Bayesian analysis because it simplifies many computations.

The main thing is to specify the model inside a statement: model{}. For simple problems this is done by specifying a sampling distribution and a prior distribution.

The sampling model for the O-ring data with failures y_i and temperatures τ_i is

y_i ~ Bin(1, p_i),   logit(p_i) ≡ log[p_i/(1 − p_i)] = β1 + β2 τ_i,   i = 1, ..., 23.

For programming convenience we have relabeled the intercept as β1 and the slope as β2. In the BUGS language this can be specified as

for(i in 1:23){
   y[i] ~ dbin(p[i],1)
   logit(p[i]) <- beta[1] + beta[2]*tau[i]
}

The parameters of primary interest are beta[1] and beta[2]. We also need to specify initial values for the unknown parameters but that is not part of specifying the model. Remember, if you copy the tilde symbol

˜

from a .pdf file into a program like R or OpenBUGS, you may need to delete and replace the symbol.

In specifying the prior model, for simplicity we begin by specifying independent normal priors,

β_j ~ N(a_j, 1/b_j),   j = 1, 2,

for some specified values, say, a1 = 10, b1 = 0.001, a2 = 0, b2 = 0.004. The b_js are precisions (inverse variances). I have heretically put almost no thought into this prior other than making the precisions small. In BUGS the prior model is most directly specified as

beta[1] ~ dnorm(10,.001)
beta[2] ~ dnorm(0,.004)

In my opinion one should always explore different priors to examine how sensitive the end results are to the choice of prior. It is also my opinion that one of those priors should be your best attempt to quantify your prior knowledge about the unknown parameters.

All together the model is

model{
for(i in 1:23){
   y[i] ~ dbin(p[i],1)
   logit(p[i]) <- beta[1] + beta[2]*tau[i]
}
beta[1] ~ dnorm(10,.001)
beta[2] ~ dnorm(0,.004)
}

The model is only a part of an overall program for OpenBUGS. To run OpenBUGS within R, we need to place this model into a .txt file, say, Oring.txt.

Producing a functional program involves identifying the model, specifying the data, and specifying initial values for all of the unknown parameters. We have two choices on how to run this. We can run it through R or we can run it through the OpenBUGS GUI. We begin by running it through R because R2OpenBUGS requires a more transparent specification of the various steps involved. The GUI is more flexible and is discussed in the next chapter. I recommend you learn to use it before doing serious data analysis.

What follows is an R program for an analysis of the O-ring data. The various parts are explained using comments within the program.

rm(list = ls())
# Enter O-ring Data

y=c(1,1,1,1,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0)
tau=c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)

# Call the R2OpenBUGS library.
# Set paths so that R2OpenBUGS can find and store
# the files it needs, including the model file Oring.txt
# and where you have stored OpenBUGS on your computer.
library(R2OpenBUGS)
BUGS_path <-
   "c:\\Program Files (x86)\\OpenBUGS\\OpenBUGS323\\OpenBUGS.exe"
setwd("c:\\E-drive\\Books\\LOGLIN2R\\BAYES\\")
myworking_dir <- "c:\\E-drive\\Books\\LOGLIN2R\\BAYES\\"

# Define the McMC sample characteristics.
# This code gives (we hope) 10,000 nonindependent
# observations from the posterior distribution.
iterates <- 10000
burn_in <- 1000

# Identify the data
data <- list( "y", "tau")

# Identify the parameters
parameters <- list("beta")

# Identify parameter initial values in a list of lists.


inits <- list(list(beta=c(0,0)))
# The program is designed to run more than one McMC chain.
# This is a list of lists to allow different initial
# values for different chains.

# Easier than the list of lists is to generate
# initial values via a function.
inits <- function () { list(beta = c(0,0)) }
# It would be redundant to run both inits <- commands

# Putting all the pieces together:
Oring <- bugs( data, inits, parameters,
   model.file="c:\\E-drive\\Books\\LOGLIN2R\\BAYES\\Oring.txt",
   n.chains=1, n.iter=iterates+burn_in,
   n.thin=1, n.burnin=burn_in,
   OpenBUGS.pgm=BUGS_path,
   working.directory=myworking_dir,
   debug=F )

Oring$summary
# The following command tells you what info
# is contained in Oring
summary(Oring)
# For example
Oring$mean
Oring$sd
Oring$median

Rather than defining y and tau separately you could also write the data statement as

data <- list(y=c(1,1,1,1,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0),
   tau=c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81))

If you change the last command in bugs to debug=T, R will actually open an OpenBUGS GUI and you can manipulate the process manually.

13.1.1 Alternative Specifications

Sometimes it is more convenient to define the prior more obliquely. Rather than the direct specification we used, we can specify

for(j in 1:2){ beta[j] ~ dnorm(a[j],b[j]) }


where at this point in the code it is not clear whether the a[j]s and b[j]s are unknown parameters or not. In either case we have to specify values for them in a list statement, say,

list(a=c(10,0), b=c(.001,.004))

Identifying this list statement as data, rather than initial values, completes our prior specification (and implicitly defines them as not being parameters).

13.2 Bayesian Inference

To this point we have introduced Bayesian computation for logistic regression but we have not yet reproduced any results from the book. We now do that and also produce the new graphics used in the revised edition.

13.2.1 Specifying the Prior and Approximating the Posterior

Section 1 illustrates Bayesian computations using a prior of convenience. The main addition in this section is that we demonstrate how to program the induced prior on the regression coefficients that was discussed in the book. This involves modifications to the BUGS model.

13.2.1.1 O-ring Data with Induced Prior

The book elicits independent priors on the probabilities p1 and p2 of O-ring failure associated with the temperatures x1 = 55 and x2 = 75. It then induces a prior on the regression coefficients from the elicited prior. The elicited prior consists of independent Beta distributions,

p_j ~ Beta(a_j, b_j),   j = 1, 2,   (a1, a2) = (1, 0.577),   (b1, b2) = (0.577, 1).

Sampling from the induced prior on β is actually easier than determining its prior analytically. For the logistic model,

   [ logit(p1) ]   [ 1  55 ] [ β1 ]
   [ logit(p2) ] = [ 1  75 ] [ β2 ],

so

   [ β1 ]   [ 1  55 ]^(-1) [ logit(p1) ]   [ 75/20  -55/20 ] [ logit(p1) ]
   [ β2 ] = [ 1  75 ]      [ logit(p2) ] = [ -1/20    1/20 ] [ logit(p2) ].


The equality allows us to transform samples from the p_js directly into samples of the β_js. In BUGS this is easily programmed as

for(j in 1:2){ ptilde[j] ~ dbeta(a[j],b[j]) } # Prior
# Induced prior on the regression coefficients
beta[1] <- (75/20)*logit(ptilde[1]) - (55/20)*logit(ptilde[2])
beta[2] <- (-1/20)*logit(ptilde[1]) + (1/20)*logit(ptilde[2])

The values a[1], a[2], b[1], b[2] all need to be specified as part of a data list statement. (Note that the prior used in this book is different than the similar prior used in BIDA.)

In general we define an invertible square matrix of predictor variables X and a vector of probabilities p associated with those predictor variables. The probabilities are related to the β coefficients via

F(p) = Xβ   or   β = X^(-1)F(p).

In logistic regression, F is just the function that maps each probability to its logit. For models with an intercept and more than one predictor variable, we will want to invert X in R, but in this code we have inverted the matrix X analytically, since it is only 2×2. Samples from the distribution of p are easily transformed into samples of the regression coefficients.

13.2.1.2 Constructing Figures 13.1 through 13.4

The figures do not involve any use of BUGS. The next subsection provides estimated versions of these plots so we show all of them for your comparison. The estimated versions are much less smooth.

Figure 13.1 (prior)

rm(list = ls())
# plot density pi
B=1000 #size of square grid
#pi a B by B matrix of 1s
pi=rep(1,B*B)
pi=matrix(pi,B,B)

# Grid domain
beta1 = seq(-15,25,length=B) #contours at levels 0.08, 0.22, 0.80
beta2 = seq(-.4,.2,length=B)

#prior density
rk=2 #rank of X
eta=rep(1,B*rk)


eta=matrix(eta,B,rk)
tautilde=c(55,75)
ytilde=c(1,.577)
Ntilde=c(1.577,1.577)

for(j in 1:B){
for(k in 1:rk){
eta[,k] = beta1 + beta2[j]*(tautilde[k])
pi[,j] = pi[,j] * ((exp(eta[,k])/(1+exp(eta[,k])))**ytilde[k])*
   ((1-(exp(eta[,k])/(1+exp(eta[,k]))))**(Ntilde[k]-ytilde[k]))
}}

pi=pi*1000#/(sum(pi)/(60*.8))

contour(beta1,beta2,pi,nlevels=8)

Figure 13.2 (posterior)

rm(list = ls())
# plot density pi
B=1000 #size of square grid
#pi a B by B matrix of 1s
pi=rep(1,B*B)
pi=matrix(pi,B,B)

# Grid domain
beta1 = seq(0,25,length=B) #contours at levels 0.08, 0.22, 0.80
beta2 = seq(-.4,.0,length=B)

#prior density
rk=2 #rank of X
eta=rep(1,B*rk)
eta=matrix(eta,B,rk)
tautilde=c(55,75)
ytilde=c(1,.577)
Ntilde=c(1.577,1.577)

for(j in 1:B){
for(k in 1:rk){
eta[,k] = beta1 + beta2[j]*(tautilde[k])
pi[,j] = pi[,j] * ((exp(eta[,k])/(1+exp(eta[,k])))**ytilde[k])*
   ((1-(exp(eta[,k])/(1+exp(eta[,k]))))**(Ntilde[k]-ytilde[k]))
}}

y=c(1,1,1,1,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0)
N=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
tau=c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
mt=mean(tau)
n=length(y)

# mt=69.565
#sdtemp=7.057
#tau=(tau-mt)/sdtemp
etaD=rep(1,B*n)
etaD=matrix(etaD,B,n)
for(j in 1:B){
for(k in 1:n){
etaD[,k] = beta1 + beta2[j]*(tau[k])
pi[,j] = pi[,j] * ((exp(etaD[,k])/(1+exp(etaD[,k])))**y[k])*
   ((1-(exp(etaD[,k])/(1+exp(etaD[,k]))))**(N[k]-y[k]))
}}

pi=pi*10000000#/(sum(pi)/(60*.8))

contour(beta1,beta2,pi,nlevels=8)

Figure 13.3 (mean adjusted prior).

rm(list = ls())
# plot density pi
B=1000 #size of square grid
#pi a B by B matrix of 1s
pi=rep(1,B*B)
pi=matrix(pi,B,B)

# Grid domain
beta1 = seq(-5,5,length=B) #contours at levels 0.08, 0.22, 0.80
beta2 = seq(-.5,.3,length=B)


#prior density
rk=2 #rank of X
eta=rep(1,B*rk)
eta=matrix(eta,B,rk)
tautilde=c(55,75)
ytilde=c(1,.577)
Ntilde=c(1.577,1.577)

for(j in 1:B){
for(k in 1:rk){
eta[,k] = beta1 + beta2[j]*(tautilde[k]-69.565)
pi[,j] = pi[,j] * ((exp(eta[,k])/(1+exp(eta[,k])))**ytilde[k])*
   ((1-(exp(eta[,k])/(1+exp(eta[,k]))))**(Ntilde[k]-ytilde[k]))
}}

pi=pi*1000#/(sum(pi)/(60*.8))

contour(beta1,beta2,pi,nlevels=8)

Figure 13.4 (mean corrected posterior).

rm(list = ls())
# plot density pi
B=1000 #size of square grid
#pi a B by B matrix of 1s
pi=rep(1,B*B)
pi=matrix(pi,B,B)

# Grid domain
beta1 = seq(-3,1,length=B) #contours at levels 0.08, 0.22, 0.80
beta2 = seq(-.4,.0,length=B)

#prior density
rk=2 #rank of X
eta=rep(1,B*rk)
eta=matrix(eta,B,rk)
tautilde=c(55,75)
ytilde=c(1,.577)
Ntilde=c(1.577,1.577)


for(j in 1:B){
for(k in 1:rk){
eta[,k] = beta1 + beta2[j]*(tautilde[k]-69.565)
pi[,j] = pi[,j] * ((exp(eta[,k])/(1+exp(eta[,k])))**ytilde[k])*
   ((1-(exp(eta[,k])/(1+exp(eta[,k]))))**(Ntilde[k]-ytilde[k]))
}}

y=c(1,1,1,1,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0)
N=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
tau=c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
mt=mean(tau)
n=length(y)

# mt=69.565
#sdtemp=7.057
#tau=(tau-mt)/sdtemp
etaD=rep(1,B*n)
etaD=matrix(etaD,B,n)
for(j in 1:B){
for(k in 1:n){
etaD[,k] = beta1 + beta2[j]*(tau[k]-69.565)
pi[,j] = pi[,j] * ((exp(etaD[,k])/(1+exp(etaD[,k])))**y[k])*
   ((1-(exp(etaD[,k])/(1+exp(etaD[,k]))))**(N[k]-y[k]))
}}

pi=pi*10000000#/(sum(pi)/(60*.8))

contour(beta1,beta2,pi,nlevels=8)

13.2.1.3 O-ring Program written by Fletcher Christensen

Before running the overall program Fletch produced a .csv data file and two models for the uncentered and centered data, both incorporating the induced prior information but with the actual parameters of the beta distributions yet to be specified. (These would work for completely different priors as long as the priors were specified at 55 and 75 degrees.)


[Figures omitted; their captions follow.]
Fig. 13.1 Book Figure 13.1; O-Ring Data: Contour shapes of prior on β.
Fig. 13.2 Book Figure 13.2; O-Ring Data: Contour shapes of posterior on β.
Fig. 13.3 Book Figure 13.3; O-Ring Data: Contour shapes of centered prior on β.
Fig. 13.4 Book Figure 13.4; O-Ring Data: Contour shapes of centered posterior on β.
Fig. 13.5 O-Ring Data: Estimated contours of prior and posterior on β.
Fig. 13.6 O-Ring Data: Estimated contours of centered prior and posterior on β.

The model for the uncentered data, saved as O-ring_model_a.txt, is

model{
for( i in 1:n ){
   y[i] ~ dbin( theta[i], 1)
   logit( theta[i] ) <- beta[1] + beta[2] * temp[i]
}
for( j in 1:2 ){
   tildetheta[j] ~ dbeta( a[j], b[j] )
}
beta[1] <- (75/20) * logit( tildetheta[1] ) - (55/20) * logit( tildetheta[2] )
beta[2] <- (-1/20) * logit( tildetheta[1] ) + (1/20) * logit( tildetheta[2] )
}

The model for the centered data was saved as O-ring_model_b.txt.

With the centered data, the invertible square matrix X is more complicated.

model{
for( i in 1:n ){
   y[i] ~ dbin( theta[i], 1)
   logit( theta[i] ) <- beta[1] + beta[2] * c_temp[i]
}
for( j in 1:2 ){
   tildetheta[j] ~ dbeta( a[j], b[j] )
}
beta[1] <- 0.2717391 * logit( tildetheta[1] ) + 0.7282609 * logit( tildetheta[2] )
beta[2] <- 0.352854 * logit( tildetheta[2] ) - 0.352854 * logit( tildetheta[1] )
}

The data file

O-ring_data.csv

looks like

"Temperature","Failure"66,070,169,068,067,0

Page 86: Preliminary Version of R Commands for – Log-Linear Models ...fletcher/R-LOGLIN2R.pdf · Preface This online book is an R companion to Log-linear Models and Logistic Regres-sion,

76 13 Bayesian Binomial Regression: OpenBUGS Run Through R

72,0
73,0
70,0
57,1
63,1
70,1
78,0
67,0
53,1
67,0
75,0
70,0
81,0
76,0
79,0
75,1
76,0
58,1

Fletch's program produces estimated versions of Figures 13.1 through 13.4 (by using the two different model files), and provides Figure 13.6. Fletch used the plotting package ggplot which is very popular but unfamiliar to me. I have made some very minor modifications to his program.

##########################
# O-Ring Analysis
##
rm(list = ls())
library(R2OpenBUGS)

BUGS_path <-
   "c:\\Program Files (x86)\\OpenBUGS\\OpenBUGS323\\OpenBUGS.exe"
setwd("c:\\E-drive\\Books\\LOGLIN2R\\BAYES\\")
working_dir <- "c:\\E-drive\\Books\\LOGLIN2R\\BAYES\\"

#setwd("d:\\[Data]\\School\\UNM\\Teaching\\577\\")model_a_filename <- "O-ring_model_a.txt"model_b_filename <- "O-ring_model_b.txt"# Fletch used the following#BUGS_path <-# "c:\\Program Files (x86)\\OpenBUGS\\OpenBUGS323\\OpenBUGS.exe"#working_dir <- "d:\\[Data]\\School\\UNM\\Teaching\\577\\"

iterates <- 10000


burn_in <- 1000

#########################
# O-ring data

ORing_data <- read.csv("O-ring_data.csv", header=T)

n <- dim(ORing_data)[1]
y <- ORing_data$Failure
# To rerun the analysis without obs. 18
# uncomment the following line
# y[18]=NA
temp <- ORing_data$Temperature

### Fletch used BIDA not LOGLIN2 Priors
#a <- c( 1.6, 1 )
#b <- c( 1, 1.6 )
# LOGLIN2 priors
a <- c( 1, .577 )
b <- c( .577, 1 )

data <- list( "n", "y", "temp", "a", "b" )

inits <- function() {
   list( tildetheta = c( 0.5, 0.5 ) )
}

parameters <- list( "beta" )

ORing.c.sim <- bugs( data, inits, parameters,
   model.file=model_a_filename,
   n.chains=1, n.iter=iterates+burn_in,
   n.thin=1, n.burnin=burn_in,
   OpenBUGS.pgm=BUGS_path,
   working.directory=working_dir,
   debug=F )

ORing.c.sim$summary

beta <- ORing.c.sim$sims.list$beta

# Sampling from the prior
prior_theta1 <- rbeta(iterates, a[1], b[1])
prior_theta2 <- rbeta(iterates, a[2], b[2])


# Note: logit() is not in base R; it comes from the boot package
# (or use qlogis()).
prior_beta1 <- (75/20) * logit( prior_theta1 ) -
   (55/20) * logit( prior_theta2 )
prior_beta2 <- (1/20) * logit( prior_theta2 ) -
   (1/20) * logit( prior_theta1 )

# housekeeping
beta_frame <- data.frame( cbind( prior_beta1, prior_beta2,
   beta[,1], beta[,2] ) )
names( beta_frame ) <- c( "prior_beta1", "prior_beta2",
   "posterior_beta1", "posterior_beta2" )

#The next two additions are mine.

# This provides Figure 13.10
# To get the plot as similar as possible to the second edition
# I reduced the iterates to 5000
# but imposed severe thinning with n.thin=100
post_slope=density(beta[,2])
plot(post_slope,main="")

#LDalpha run for each of the values of af
# run again with case 18 deleted.
af=.9 #,.75,.5,.25,.1)
lda=(log(af/(1-af))-beta[,1])/beta[,2]
quantile(lda,c(.05,.5,.95))

# This provides Fletch's estimated version of Figures 13.1 and 13.2
library(ggplot2)
# library(boot) #this was part of Fletch's program
# but I don't see it being used anywhere
# "boot" is a bootstrapping library
ggplot( beta_frame ) +
   geom_density2d( aes(x=prior_beta1, y=prior_beta2),
      color="#000066", linetype=2 ) +
   geom_density2d( aes(x=posterior_beta1, y=posterior_beta2),
      color="#0000cc" ) +
   theme_bw() +
   xlab( expression( beta[0] ) ) +
   ylab( expression( beta[1] ) ) +
   ggtitle( expression(
      paste( "Approximate prior and posterior for ",
         beta[0], ", ", beta[1] ) ) )

To analyze the centered model add this to the previous program. (This actually looks like it is both centered and scaled.) This also provides Figure 13.8.

centered_temp <- scale( ORing_data$Temperature )
c_temp <- centered_temp[,1]
temp_center <- attr( centered_temp, "scaled:center" )
temp_scale <- attr( centered_temp, "scaled:scale" )
data <- list( "n", "y", "c_temp", "a", "b" )
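An aside (my note): scale() both centers and rescales by default, which is why the result looks centered and scaled; centering without rescaling would be

c_only <- scale( ORing_data$Temperature, center = TRUE, scale = FALSE )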

## For prior generation purposes,
## 75 becomes 0.7701178 and 55 becomes -2.063916.
## Induced priors for betas in Model B
## are calculated using these values.

ORing.c.sim <- bugs( data, inits, parameters,
   model.file=model_b_filename,
   n.chains=1, n.iter=iterates+burn_in,
   n.thin=1, n.burnin=burn_in,
   OpenBUGS.pgm=BUGS_path,
   working.directory=working_dir,
   debug=F)

beta.c <- ORing.c.sim$sims.list$beta

prior_theta1 <- rbeta(iterates, a[1], b[1])
prior_theta2 <- rbeta(iterates, a[2], b[2])
prior_beta_c_1 <- 0.2717391 * logit( prior_theta1 ) +
   0.7282609 * logit( prior_theta2 )
prior_beta_c_2 <- 0.352854 * logit( prior_theta2 ) -
   0.352854 * logit( prior_theta1 )

beta_c_frame <- data.frame( cbind( prior_beta_c_1, prior_beta_c_2,
   beta.c[,1], beta.c[,2] ) )
names( beta_c_frame ) <- c( "prior_beta_c_1", "prior_beta_c_2",
   "posterior_beta_c_1", "posterior_beta_c_2" )

#This is an estimated version of Figures 13.3 and 13.4
ggplot( beta_c_frame ) +
   geom_density2d( aes(x=prior_beta_c_1, y=prior_beta_c_2),
      color="#000066", linetype=2 ) +
   geom_density2d( aes(x=posterior_beta_c_1, y=posterior_beta_c_2),
      color="#0000cc" ) +
   theme_bw() +
   xlab( expression( beta[0] ) ) +
   ylab( expression( beta[1] ) ) +
   ggtitle( expression( paste(
      "Approximate prior and posterior for centered ",
      beta[0], ", ", beta[1] ) ) )

# This constructs Figure 13.8
prediction_temps <- (320:920)/10
adj_temps <- (prediction_temps - temp_center) / temp_scale
prediction_probs <- NULL
for( i in 1:(round( iterates/10 ) ) ){
   prediction_probs <- rbind( prediction_probs,
      exp( beta.c[i,1] + adj_temps*beta.c[i,2] ) /
      ( 1 + exp( beta.c[i,1] + adj_temps*beta.c[i,2] ) ) )
}
prediction_quantiles <- apply(prediction_probs, 2,
   quantile, probs=c(0.05,0.5,0.95))
predictions <- data.frame( prediction_temps, t(prediction_quantiles) )
names(predictions) <- c("Temp", "Lower", "Median", "Upper")

ggplot( predictions ) +
   geom_line( aes(x=Temp, y=Lower), color="#aa0000", linetype=2 ) +
   geom_line( aes(x=Temp, y=Median), color="#aa0000", linetype=1 ) +
   geom_line( aes(x=Temp, y=Upper), color="#aa0000", linetype=2 ) +
   coord_cartesian( ylim=c(0,1) ) +
   theme_bw() +
   xlab("Temperature") +
   ylab("Failure Probability") +
   ggtitle( expression( paste(
      "O-Ring failure probability by temperature" ) ) )

13.2.1.4 Trauma Data

The trauma data file is coded with 1 as survived. The book treats 1 as death. TRAUMAa.DAT is a version of TRAUMA.DAT without the column names.

rm(list = ls())
Trauma <- read.table("C:\\E-drive\\Books\\LOGLIN2\\DATA\\TRAUMAa.DAT",
   sep="",col.names=c("ID","Death","ISS","TI","RTS","AGE"))


attach(Trauma)
summary(Trauma)

par(mfrow=c(2,2))
Death=1-Death
boxplot(ISS~Death,ylab="ISS")
boxplot(RTS~Death)
boxplot(AGE~Death)
par(mfrow=c(1,1))

Fig. 13.7 Trauma Data: Box Plots, 1 indicates death. [figure omitted]

We are running OpenBUGS but through R so that we do not have to list all 300 data points in our OpenBUGS program.

model{
for(i in 1:n){
   death[i] ~ dbern(theta[i])
   logit(theta[i]) <- beta[1] + beta[2]*ISS[i] + beta[3]*RTS[i]
      + beta[4]*AGE[i] + beta[5]*TI[i] + beta[6]*AGE[i]*TI[i]
}
for(i in 1:6){ beta[i] ~ dflat() }
junk <- ID[1]
}


list(beta= c(0,0,0,0,0,0))
list(n=300)
ID[ ] death[ ] ISS[ ] TI[ ] RTS[ ] AGE[ ]
2979 0 1 1 7.8408 25
1167 0 9 0 7.8408 63
116 0 29 0 2.9304 32
remaining 297 data lines go here
END

Fletch's web code:

library(R2OpenBUGS)

setwd("c:\\[Data]\\School\\UNM\\Teaching\\577\\")model_a_filename <- "https://www.stat.unm.edu/˜ronald/courses/577/files/R_Lesson_1a.txt"model_b_filename <- "https://www.stat.unm.edu/˜ronald/courses/577/files/R_Lesson_1b.txt"BUGS_path <- "c:\\OpenBUGS323\\OpenBUGS.exe"myworking_dir <- "c:\\[Data]\\School\\UNM\\Teaching\\577\\"

iterates <- 10000
burn_in <- 1000

#########################
# One-Sample Problem

X <- 15
n <- 100
a <- 3.4
b <- 23

data <- list( "X", "n", "a", "b" )

inits <- function() {
   list( p = runif(1,0,1) )
}

parameters <- list( "p" )

Prob1.sim <- bugs( data, inits, parameters,
   model.file=model_a_filename,
   n.chains=1, n.iter=iterates+burn_in,
   n.thin=1, n.burnin=burn_in,
   OpenBUGS.pgm=BUGS_path,
   working.directory=myworking_dir,
   debug=T )


Fig. 13.8 Trauma Data: Priors and Posteriors on p's. [figure omitted]


Table 13.1 Trauma Data: Prior Specification

     Design for Prior               Beta(yi, Ni − yi)
 i   x′i                            yi     Ni − yi
 1   1  25  7.84  60  0   0        1.1     8.5
 2   1  25  3.34  10  0   0        3.0    11.0
 3   1  41  3.34  60  1  60        5.9     1.7
 4   1  41  7.84  10  1  10        1.3    12.0
 5   1  33  5.74  35  0   0        1.1     4.9
 6   1  33  5.74  35  1  35        1.5     5.5

EXERCISE 13.11. Below we give code for handling the trauma data based on the fully informative prior specified in Table 8.5 with standardized continuous covariates. Below that, we give R code for obtaining X^{−1} and Z^{−1}. (a) Modify the code as needed to give a full analysis of the trauma data including assessment of posterior probabilities that regression coefficients are positive (or negative), estimates of probabilities of “death on the table” for the 16 possible combinations of (ISS, RTS, AGE, TI) corresponding to ISS = 20, 40, RTS = 3.34, 5.74, AGE = 10, 60, and TI = 0, 1. As part of your analysis, create a table of entries that includes the median and a 95% probability interval for each combination. Inferences are for the proportions of deaths in the populations of trauma patients that fall into these 16 categories. Compare with results obtained in Figure 8.5.

model{
for(i in 1:n){
   death[i] ~ dbern(theta[i])
   logit(theta[i]) <- gamma[1] + gamma[2]*(ISS[i]-14.3)/11
      + gamma[3]*(RTS[i]-7.29)/1.25
      + gamma[4]*(AGE[i]-31.4)/17
      + gamma[5]*TI[i]
      + gamma[6]*(AGE[i]-31.4)*TI[i]/17
}
for(i in 1:6){
   tildetheta[i] ~ dbeta(a[i],b[i])
   v[i] <- log(tildetheta[i]/(1-tildetheta[i]))
   gamma[i] <- inprod(Ztinv[i,1:6], v[1:6])
}
junk <- ID[1]
}
list(tildetheta=c(0.5,0.5,0.5,0.5,0.5,0.5))
list(n=300, a=c(1.1,3,5.9,1.3,1.1,1.5), b=c(8.5,11,1.7,12,4.9,5.5))
Ztinv[,1] Ztinv[,2] Ztinv[,3] Ztinv[,4] Ztinv[,5] Ztinv[,6]
-2.603 -2.460 -3.702 -3.702  6.063  7.403
-0.343 -0.343  0.343  0.343  0.686 -0.686
-2.091 -2.091 -2.091 -2.091  4.182  4.182
 2.901  2.218  2.559  2.559 -5.119 -5.119
 1.149  1.005  1.005  1.149 -3.154 -1.154
-5.460 -4.778 -4.778 -5.460 10.238 10.238
END
ID[ ] death[ ] ISS[ ] TI[ ] RTS[ ] AGE[ ]
2979 0  1 1 7.8408 25
1167 0  9 0 7.8408 63
 116 0 29 0 2.9304 32
remaining 297 data lines go here
END

par(mfrow=c(2,3))
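A minimal sketch of how a matrix like Ztinv above can be computed in R, assuming the six design points of Table 13.1 and the standardization used in the model:

# Sketch: build the 6 x 6 design matrix of the prior design points on
# the standardized scale used in the model, then invert it with solve().
# The design points are those of Table 13.1.
ISS <- c(25, 25, 41, 41, 33, 33)
RTS <- c(7.84, 3.34, 3.34, 7.84, 5.74, 5.74)
AGE <- c(60, 10, 60, 10, 35, 35)
TI  <- c(0, 0, 1, 1, 0, 1)
Zt <- cbind(1, (ISS-14.3)/11, (RTS-7.29)/1.25, (AGE-31.4)/17,
            TI, (AGE-31.4)*TI/17)
Ztinv <- solve(Zt)
round(Ztinv, 3)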

13.2.2 Predictive Probabilities

Fletch’s code previously produced Figure 13.6, which is more difficult than Figures 13.5 and 13.7.

Figure 13.5 simply plots values computed using the MLEs from Section 2.6 andthe mean values reported in Table 13.2.

rm(list = ls())
x=seq(30,80,.1)
w= 15.0429 + (-.2322*x)
y=exp(w)/(1+exp(w))
plot(x,y,type="l",xlim=c(30,85),ylab="Probability",xlab="Temperature",lty=1)
ww= 12.97 + (-.2018*x)
yy=exp(ww)/(1+exp(ww))
lines(x,yy,type="l",lty=3)
www= 16.92 + (-.2648*x)
yyy=exp(www)/(1+exp(www))
lines(x,yyy,type="l",lty=2)
legend("bottomleft",c("MLE","Bayes: full data","Bayes: no case 18"),
   lty=c(1,3,2))

Figure 13.7 uses the informative prior estimated coefficients reported in Table 13.3. Much of the code for Figure 13.7 is redundant because the four parts of the figure involve only minor changes in code. (Minor but annoyingly picky.)

rm(list = ls())
par(mfrow=c(2,2))
gamma=c(-1.79,0.07,-0.60,0.05,1.10,-.02)
ISS=seq(0,75,1)
AGE= 60
#AGE=10
RTS=3.34
#RTS=5.47

Page 96: Preliminary Version of R Commands for – Log-Linear Models ...fletcher/R-LOGLIN2R.pdf · Preface This online book is an R companion to Log-linear Models and Logistic Regres-sion,

86 13 Bayesian Binomial Regression: OpenBUGS Run Through R

# TI=0, blunt
w <- gamma[1] + gamma[2]*(ISS) + gamma[3]*(RTS) + gamma[4]*(AGE)
y=exp(w)/(1+exp(w))
plot(ISS,y,type="l",xlim=c(0,80),ylab="Pr(death)",
   xlab="ISS",main="a. AGE=60 and RTS=3.34",lty=1)
#TI=1
ww= w+ gamma[5] + gamma[6]*(AGE)
yy=exp(ww)/(1+exp(ww))
lines(ISS,yy,type="l",lty=2)
legend("bottomright",c("Blunt","Penetrating"),lty=c(1,2))

AGE=10
RTS=3.34
# TI=0, blunt
w <- gamma[1] + gamma[2]*(ISS) + gamma[3]*(RTS) + gamma[4]*(AGE)
y=exp(w)/(1+exp(w))
plot(ISS,y,type="l",xlim=c(0,80),ylab="Pr(death)",
   xlab="ISS",main="b. AGE=10 and RTS=3.34",lty=1)
#TI=1
ww= w+ gamma[5] + gamma[6]*(AGE)
yy=exp(ww)/(1+exp(ww))
lines(ISS,yy,type="l",lty=2)
legend("bottomright",c("Blunt","Penetrating"),lty=c(1,2))

AGE= 60
RTS=5.47
# TI=0, blunt
w <- gamma[1] + gamma[2]*(ISS) + gamma[3]*(RTS) + gamma[4]*(AGE)
y=exp(w)/(1+exp(w))
plot(ISS,y,type="l",xlim=c(0,80),ylab="Pr(death)",
   xlab="ISS",main="c. AGE=60 and RTS=5.47",lty=1)
#TI=1
ww= w+ gamma[5] + gamma[6]*(AGE)
yy=exp(ww)/(1+exp(ww))
lines(ISS,yy,type="l",lty=2)
legend("bottomright",c("Blunt","Penetrating"),lty=c(1,2))

AGE= 10
RTS=5.47
# TI=0, blunt
w <- gamma[1] + gamma[2]*(ISS) + gamma[3]*(RTS) + gamma[4]*(AGE)
y=exp(w)/(1+exp(w))
plot(ISS,y,type="l",xlim=c(0,80),ylab="Pr(death)",
   xlab="ISS",main="d. AGE=10 and RTS=5.47",lty=1)
#TI=1
ww= w+ gamma[5] + gamma[6]*(AGE)
yy=exp(ww)/(1+exp(ww))
lines(ISS,yy,type="l",lty=2)
legend("topleft",c("Blunt","Penetrating"),lty=c(1,2))

13.2.3 Inference for Regression Coefficients

Table 13.2 Posterior Marginal Distribution: O-Rings

                    Full Data             Case 18 Deleted
                    β0        β1          β0        β1
E(βi|Y)            12.97   −0.2018       16.92   −0.2648
Std. Dev.(βi|Y)     5.75    0.0847        7.09    0.1056
 5%                 4.56   −0.355         6.85   −0.459
25%                 9.04   −0.251        11.98   −0.324
50%                12.44   −0.194        16.13   −0.252
75%                16.20   −0.144        20.86   −0.191
95%                23.38   −0.077        29.96   −0.114

13.2.4 Inference for LDα

Add the data to the WinBUGS code below.

model{
for(i in 1:2){ y[i] ~ dbin(theta[i],N[i]) }   # Likelihood
theta[1] ~ dbeta(11.26,11.26)                 # The prior
theta[2] ~ dbeta(13.32,6.28)
beta[1] <- logit(theta[1])                    # Induced posterior
beta[2] <- logit(theta[2]) - logit(theta[1])
lambda <- theta[1]/theta[2]          # Induced posterior for ratio
prob <- step(lambda -1)              # Posterior prob. ratio > 1.
}
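The code above does not compute LDα itself. A minimal R sketch of the transformation, solving logit(α) = β1 + β2 x for x at each posterior draw, is below. The matrix beta of draws is only a hypothetical stand-in (independent normals built from the marginal summaries in Table 13.2, which ignores the posterior correlation between the coefficients); real sampler output should be substituted to reproduce Table 13.4.

# LD_alpha is the temperature x at which the failure probability is
# alpha: logit(alpha) = beta1 + beta2*x, so x = (logit(alpha)-beta1)/beta2.
# "beta" is a stand-in for real posterior draws; independent normals
# ignore the correlation between beta1 and beta2.
beta <- cbind(rnorm(10000, 12.97, 5.75), rnorm(10000, -0.2018, 0.0847))
alpha <- 0.5
LD <- (log(alpha/(1-alpha)) - beta[,1]) / beta[,2]
quantile(LD, c(0.05, 0.5, 0.95))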


Fig. 13.9 O-Ring Data: Marginal Density for β1


Table 13.3 Fitted Trauma Model

             Posterior Summaries                    Maximum Likelihood
             Based on informative prior
Variable     Estimate  Std. Error   0.05    0.95    Estimate  Std. Error
Intercept     −1.79     1.10       −3.54    0.02     −2.73     1.62
ISS            0.07     0.02        0.03    0.10      0.08     0.03
RTS           −0.60     0.14       −0.82   −0.37     −0.55     0.17
AGE            0.05     0.01        0.03    0.07      0.05     0.01
TI             1.10     1.06       −0.66    2.87      1.34     1.33
AGE × TI      −0.02     0.03       −0.06    0.03     −0.01     0.03

             Posterior Summaries
             Based on diffuse prior
Variable     Estimate  Std. Error   0.05    0.95
Intercept     −2.81     1.60       −5.34   −0.18
ISS            0.09     0.03        0.05    0.13
RTS           −0.59     0.17       −0.86   −0.32
AGE            0.06     0.02        0.03    0.09
TI             1.46     1.36       −0.79    3.69
AGE × TI      −0.01     0.03       −0.07    0.05

Table 13.4 Posterior Summaries for LDα’s

        Full Data                     Case 18 Deleted
        Percentiles                   Percentiles
  α     5%     50%    95%       α     5%     50%    95%
0.90   30.2   52.9   60.4     0.90   39.8   55.1   61.2
0.75   43.4   58.5   64.0     0.75   48.9   59.4   64.0
0.50   55.9   64.2   68.5     0.50   57.5   63.8   67.5
0.25   65.1   69.8   76.4     0.25   64.1   68.1   73.0
0.10   70.3   75.4   88.3     0.10   68.3   72.4   80.9

13.3 Diagnostics

13.3.1 Case Deletion Influence Measures

13.3.2 Estimative Influence

13.3.3 Predictive Influence

EXAMPLE 13.3.1. O-Ring Data. Figure 13.10 gives index plots of D^p_i and D^f_i for the O-ring data.


Fig. 13.10 O-Ring Data: Index Plots of D^p_i and D^f_i

EXAMPLE 13.3.2. Trauma Data. Figure 13.11 contains an index plot of the difference in the predictive probabilities of death, p(y = 1|Y, x_j) − p(y = 1|Y_(52), x_j).

Fig. 13.11 Trauma Data: Index Plot of p(y = 1|Y, x_j) − p(y = 1|Y_(52), x_j)

13.3.4 Model Checking

13.3.5 Link Selection

Bayes Factors for the O-Ring Data. Below, we give WinBUGS code for calculating the Bayes factor comparing M1 (logistic link) to M3 (complementary log-log link) using the O-ring data. You may want to review the method for simulating Bayes factors with parameters of equal dimensions given in Subsection 4.8.3. (a) Before running the code, use (3) to show that the induced prior for β under the complementary log-log transformation is

p(β) ∝ ∏_{i=1}^{2} {1 − exp(−e^{x_i′β})}^{a_i−1} {exp(−e^{x_i′β})}^{b_i} e^{x_i′β},

and thus derive the expression for u[i] in the code below.

model{
for (i in 1:n){
   y[i] ~ dbin(theta[i],1)
   logit(theta[i]) <- beta[1] + beta[2]*(temp[i]-mt)/sdtemp
}
for(i in 1:2){ tildetheta[i] ~ dbeta(a[i],b[i]) }
beta[1] <- G[1,1]*logit(tildetheta[1]) + G[1,2]*logit(tildetheta[2])
beta[2] <- G[2,1]*logit(tildetheta[1]) + G[2,2]*logit(tildetheta[2])
# Have specified logistic model and induced prior on beta
# Now specify probs under cloglog model
for(i in 1:n){
   cloglog(thetastar[i]) <- beta[1] + beta[2]*(temp[i]-mt)/sdtemp
   # Now give terms that go into log lik ratio
   # comparing cloglog in num to logistic in den
   v[i] <- y[i]*(log(thetastar[i])-log(theta[i]))
      + (1-y[i])*(log(1-thetastar[i])-log(1-theta[i]))
}
# Finally, give corresponding log of prior ratio
for(i in 1:2){
   cloglog(tildethetastar[i]) <- beta[1] + beta[2]*(ttemp[i]-mt)/sdtemp
   logit(tttildetheta[i]) <- beta[1] + beta[2]*(ttemp[i]-mt)/sdtemp
   u[i] <- (a[i]-1)*log(tildethetastar[i]) - a[i]*log(tttildetheta[i])
      + b[i]*(log(1-tildethetastar[i]) - log(1-tttildetheta[i]))
      + beta[1] + beta[2]*(ttemp[i]-mt)/sdtemp
}
wstar1 <- sum(v[ ])
wstar2 <- sum(u[ ])
w <- exp(wstar1+wstar2)
}
list(tildetheta=c(0.5,0.5))
list(ttemp=c(55,75), mt=69.56522, sdtemp=7.05708,
   n=23, a=c(1.6,1.0), b=c(1.0,1.6))
G[,1]      G[,2]
 0.2717391  0.7282609
-0.3528540  0.3528540
END

EXAMPLE 13.3.2 CONTINUED. Trauma Data.


Fig. 13.12 O-Ring Data: Bayes Factors with Case Deletion

13.3.6 Sensitivity Analysis

13.4 Posterior Computations and Sample Size Calculation


Fig. 13.13 Trauma Data: p(y = 1|x_j, Y) − p(y = 1|x_j, Y_(i))


Fig. 13.14 O-Ring Data: Importance Function Diagnostic Plots


Chapter 14
Bayesian Binomial Regression: OpenBUGS GUI

This is less complete than the previous chapter. I am tentatively OK with what I have done but many parts are undone. I don’t plan to work on this any more UNTIL I get the previous chapter finished. (I started trying to combine the GUI and R approaches in one chapter.)

The book does not get into computational issues until Section 4. Here we use theentire chapter to gradually introduce the computations needed.

Bayesian computation has made huge strides since the second edition of this book in 1997. In the book I/we used importance sampling. (Ed Bedrick did almost all the computing for joint papers we wrote with Wes Johnson.) Current practice involves using Markov chain Monte Carlo (McMC) methods that provide a sequence of observations. Two particular tools in this approach are Gibbs sampling (named by Geman and Geman, 1984, after distributions that were introduced by and named after the greatest American physicist of the 19th century) and the Metropolis-Hastings algorithm (named after the alphabetically listed authors of Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller, 1953, and the person who introduced the technique to the statistics community [and made a simple but very useful improvement], Hastings, 1970). For more about McMC see Christensen et al. (2010, Chapter 6) and the references given therein. Henceforth Christensen et al. is referred to as BIDA.

Unlike the importance samples used in the second edition, McMC samples are not naturally independent and they only become (approximate) samples from the posterior distribution after the Markov chain has been running quite a while, i.e., after one is deep into the sequence of observations. Computationally, we need to specify a burn-in period for the samples to get close to the posterior and then we throw away all of the observations from the burn-in period. Among the samples we use, we tend to take larger sample sizes to adjust for the lack of independence. When averaging sample observations to estimate some quantity (including probabilities of events) the lack of independence is rarely a problem. When applying more sophisticated techniques than averaging to the samples, if independence is important to


the technique, we can often approximate independence by thinning the sample, i.e., using, say, only every 10th or 20th observation from the Markov chain. Of course in any program we need to specify the sample size and the rate at which any thinning occurs. The default is typically no thinning.
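A minimal R sketch of burn-in and thinning; the matrix chain is simulated here purely as a stand-in for real McMC output so that the indexing runs as written:

# Discard a burn-in, then keep every 10th draw from an McMC sample.
# "chain" is a fake stand-in for sampler output.
chain <- matrix(rnorm(22000), ncol = 2)   # 11000 iterations, 2 parameters
burn_in <- 1000
thin <- 10
keep <- seq(burn_in + 1, nrow(chain), by = thin)
chain_used <- chain[keep, , drop = FALSE]
colMeans(chain_used)   # posterior means from the retained draws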

BUGS (Bayesian inference Using Gibbs Sampling) provides a language for specifying Bayesian models computationally, cf. http://www.mrc-bsu.cam.ac.uk/software/bugs/. OpenBUGS (https://openbugs.net) and JAGS (http://mcmc-jags.sourceforge.net/) implement that language to actually analyze data. I will present results using OpenBUGS. I will also illustrate running the OpenBUGS code in R using R2OpenBUGS, cf. https://cran.r-project.org/web/packages/R2OpenBUGS/index.html.

An outdated version of OpenBUGS is WinBUGS, cf. https://cran.r-project.org/web/packages/R2WinBUGS/index.html. WinBUGS was used in BIDA but the commands given in BIDA should almost all work with OpenBUGS. (Some data entry is different.) JAGS can be used in R with R2jags, cf. https://cran.r-project.org/web/packages/R2jags/index.html. R also has tools for doing these things directly, cf. https://cran.r-project.org/web/packages/MCMCpack/index.html.

My goal here is to get you through an McMC version of the computations in thebook. For a more general tutorial on OpenBUGS see http://www.openbugs.net/Manuals/Tutorial.html.

14.1 Introduction

There are three things you need to do in using OpenBUGS or JAGS:

• Specify the Bayesian model. This involves specifying both the sampling distribution and the prior distribution.

• Enter the data. Depending on how you specified the model, this includes specifying any “parameters” in the model that are known.

• Identify and give starting values for the unknown parameters.

There are a lot of similarities between the R and BUGS languages but before proceeding we mention a couple of oddities of the BUGS language. First,

y ∼ Bin(N, p)

is written as

y ~ dbin(p,N)

with the order of N and p reversed. Also,

y ∼ N(m, v)

is written as


y ~ dnorm(m,1/v)

where the variance v is replaced in BUGS by the precision, 1/v. Replacing normal variances with precisions is a very common thing to do in Bayesian analysis because it simplifies many computations.
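A one-line reminder of the two parameterizations (a sketch; nothing here is specific to any one model):

# R's dnorm()/rnorm() take the standard deviation, while BUGS's dnorm()
# takes the precision 1/v; e.g., for a N(0, v = 25) distribution:
v <- 25
sqrt(v)   # sd = 5, used by R
1/v       # precision = 0.04, used by BUGS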

The main thing is to specify the model inside a statement: model{}. For simple problems this is done by specifying a sampling distribution and a prior distribution.

The sampling model for the O-ring data with failures y_i and temperatures τ_i is

y_i ∼ Bin(1, p_i),

logit(p_i) ≡ log( p_i / (1 − p_i) ) = β1 + β2 τ_i,   i = 1, …, 23.

For programming convenience we have relabeled the intercept as β1 and the slope as β2. In the BUGS language this can be specified as

for(i in 1:23){
y[i] ~ dbin(p[i],1)
logit(p[i]) <- beta[1] + beta[2]*tau[i]
}

The parameters of primary interest are beta[1] and beta[2]. We also need to specify initial values for the unknown parameters but that is not part of specifying the model. Remember if you copy the tilde symbol

˜

from a .pdf file into a program like R or OpenBUGS, you may need to delete and replace the symbol.

In specifying the prior model, for simplicity we begin by specifying independent normal priors,

β_j ∼ N(a_j, 1/b_j),   j = 1, 2,

for some specified values, say, a1 = 10, b1 = 0.001, a2 = 0, b2 = 0.004. The b_j’s are precisions (inverse variances). I have heretically put almost no thought into this prior other than making the precisions small. In BUGS the prior model is most directly specified as

beta[1] ~ dnorm(10,.001)
beta[2] ~ dnorm(0,.004)

In my opinion one should always explore different priors to examine how sensitive the end results are to the choice of prior. It is also my opinion that one of those priors should be your best attempt to quantify your prior knowledge about the unknown parameters.

All together the model is

model{
for(i in 1:23){
y[i] ~ dbin(p[i],1)
logit(p[i]) <- beta[1] + beta[2]*tau[i]
}
beta[1] ~ dnorm(10,.001)
beta[2] ~ dnorm(0,.004)
}

The model is only a part of an overall program for OpenBUGS. To run OpenBUGS within R, we need to place this model into a .txt file, say, Oring.txt.

Producing a functional program involves identifying the model, specifying the data, and specifying initial values for all of the unknown parameters. We have two choices on how to run this. We can run it through R or we can run it through the OpenBUGS GUI. The GUI is more flexible and I recommend you learn to use it before doing serious data analysis, but we started by running it through R because R2OpenBUGS requires a more transparent specification of the various steps involved.

14.1.1 Running the OpenBUGS GUI

The price you pay for the added flexibility of using the OpenBUGS GUI is that you have to know how to identify the model, the data, the parameters, and the parameters’ initial values within the GUI (by clicking appropriate buttons) whereas all these things are explicit in R2OpenBUGS. You still have to write a program to run OpenBUGS through the GUI but you only specify the pieces in the program and you identify what the pieces are/mean in the GUI rather than the program. In a program written for the OpenBUGS GUI the statement of the model is exactly as before and we need to add statements to identify the data and specify initial values.

One major advantage of running OpenBUGS through R is that you can use variables defined in R and manipulate them to your heart’s content. Reading data directly into the OpenBUGS GUI can be more awkward. For the O-rings a simple specification of the observed data is

list(y=c(1,1,1,1,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0),
tau=c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,
75,76,76,78,79,81))

We will come back to the issue of reading data into the GUI later.

We specify initial values for the unknown parameters β1 and β2 in a (single) list statement that implicitly defines the parameters. Here we take the initial values as β1 = 0, β2 = 0, so

list(beta = c(0,0))

We still need to specify to the GUI that these are initial values, as opposed to specifying that we know β1 = 0 = β2, in which case fitting data to estimate β1 and β2 would be superfluous.


Fig. 14.1 OpenBUGS screenshot of GUI.

When you open the program OpenBUGS go to the file menu and hit new. Copy the following program into the new window of the GUI.

model{
for(i in 1:23){
y[i] ~ dbin(p[i],1)
logit(p[i]) <- beta[1] + beta[2]*tau[i]
}
beta[1] ~ dnorm(10,.001)
beta[2] ~ dnorm(0,.004)
}
# data lists including known prior parameters
list(y=c(1,1,1,1,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0),
tau=c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,
75,76,76,78,79,81))
# initial values list for parameters
list(beta = c(0,0))
END

Remember to replace the 3 tildes. Note that the code by itself does not distinguish between data and initial values.

The GUI process will require trips to the menu options Model, then Inference, back to Model, then back to Inference. To specify all the parts of


the model, you will need to highlight parts of the program before hitting appropriate buttons.

Having copied the code into OpenBUGS, the next step is to open the Modelmenu and select the Specification tool. You will need to hit the buttons

1. check model,
2. load data,
3. compile,
4. load inits (for initial values)

but before hitting the check model button or a load button you need to highlight the appropriate model or list statement in your GUI program!

Quitting the Specification tool, go to the Inference menu and select the Samples tool. In the node dialog box, type beta and hit the set button. (Alternatively, you could type beta[1], hit set, then type beta[2] and hit set again.)

Go back to the Model menu and this time choose the Update tool. Click the Update button 11 times to mimic the R2OpenBUGS program. By default, every time you click the button you get 1000 more samples from the Markov chain (a Markov chain that for large samples converges to the posterior distribution). Or you could change the updates dialog box to 11000 and click the Update button once. You will want to throw away many observations at the beginning of the chain because they will be sampled from distributions that are not yet sufficiently close to the posterior distribution, i.e., you want a burn-in.

Go back to the Inference menu and again select the Samples tool. In the beg dialog box, type 1001. This drops the first 1000 observations from the computations. (Our burn-in.) Again type beta into the nodes dialog box but now click the stats button. A Table of Coefficients should appear. Hitting the density button gives estimated density plots for the parameters. Before leaving the Samples tool, hit the autocor button. If the two autocorrelation figures tail off exponentially, you are in good shape. If they tail off linearly, there is a problem with the procedure. Similarly, the history button should provide two figures that, ideally, look pretty stable.

Unfortunately, this process is not as idiot-proof as the frequentist analysis in Section 2.6.

14.1.2 Alternative Specifications

Sometimes it is more convenient to define the prior more obliquely. Rather than the direct specification we used, we can specify

for(j in 1:2){ beta[j] ~ dnorm(a[j],b[j]) }

where at this point in the code it is not clear whether the a[j]s and b[j]s are unknown parameters or not. In either case we have to specify values for them in a list statement, say,


list(a=c(10,0), b=c(.001,.004))

Identifying this list statement as data, rather than initial values, completes our prior specification (and implicitly defines them as not being parameters).

If you have the data in a file, it may be easier to copy it into the OpenBUGS GUI from a text editor. Below I read in the O-ring data as a 23×2 matrix z and then I will need to separate the columns before using them.

list(z=structure(.Data = c(
1, 53,
1, 57,
1, 58,
1, 63,
0, 66,
0, 67,
0, 67,
0, 67,
0, 68,
0, 69,
0, 70,
0, 70,
1, 70,
1, 70,
0, 72,
0, 73,
0, 75,
1, 75,
0, 76,
0, 76,
0, 78,
0, 79,
0, 81), .Dim = c(23, 2)))
# To the model statement add
y=z[,1]
tau=z[,2]

In WinBUGS, entering data like this was easier to do.

y[ ] tau[ ]
1 53
1 57
1 58
similar for 18 more cases
0 79
0 81


14.2 Bayesian Inference

To this point we have introduced Bayesian computation for logistic regression but we have not yet reproduced any results from the book. We also produce the new graphics used in the revised edition.

14.2.1 Specifying the Prior and Approximating the Posterior

Section 1 illustrates Bayesian computations using a prior of convenience. The main addition in this section is that we demonstrate how to program the induced prior on the regression coefficients that was discussed in the book. This involves modifications to the BUGS model.

14.2.1.1 O-ring Data with Induced Prior

The book elicits independent priors on the probabilities p1 and p2 of O-ring failure associated with the temperatures x1 = 55 and x2 = 75. It induces a prior on the regression coefficients. The elicited prior is

p_j ∼ Beta(a_j, b_j),   j = 1, 2,

(a1, a2) = (1, 0.577),   (b1, b2) = (0.577, 1).

Sampling from the induced prior on β is actually easier than determining its prior analytically. For the logistic model,

( logit(p1) )   ( 1  55 ) ( β1 )
( logit(p2) ) = ( 1  75 ) ( β2 ),

so

( β1 )   ( 1  55 )^{−1} ( logit(p1) )   (  75/20  −55/20 ) ( logit(p1) )
( β2 ) = ( 1  75 )      ( logit(p2) ) = ( −1/20    1/20  ) ( logit(p2) ).

The equality allows us to transform samples from the p_j’s directly into samples of the β_j’s. In BUGS this is easily programmed as

for(j in 1:2){ ptilde[j] ~ dbeta(a[j],b[j]) }   # Prior
# Induced prior on the regression coefficients
beta[1] <- (75/20)*logit(ptilde[1])
   - (55/20)*logit(ptilde[2])
beta[2] <- (-1/20)*logit(ptilde[1])
   + (1/20)*logit(ptilde[2])


The values a[1], a[2], b[1], b[2] all need to be specified as part of a data list statement. (Note that the prior used in this book is different from the similar prior used in BIDA.)

In general we define an invertible square matrix of predictor variables X and a vector of probabilities p associated with those predictor variables. The probabilities are related to the β coefficients via

F(p) = Xβ   or   β = X^{−1}F(p).

In logistic regression, F is just the function that maps each probability to its logit. For models with an intercept and more than one predictor variable, we will want to invert X in R, but in this code we have inverted the matrix X analytically, since it is only 2×2. Samples from the distribution of p are easily transformed into samples of the regression coefficients.
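For the O-ring elicitation the inversion is easy to check in R; this is only a sketch of the general recipe, with the 2×2 case standing in for a larger X:

# Invert the design matrix of the two elicitation temperatures.
X <- matrix(c(1, 55,
              1, 75), nrow = 2, byrow = TRUE)
solve(X)   # rows (75/20, -55/20) and (-1/20, 1/20), as used above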

14.2.1.2 OpenBUGS GUI

The following code constitutes a program for fitting the O-ring data in the OpenBUGS GUI. In addition to inducing the prior, it specifies the sample size as n and then specifies n in a list statement along with the known values for the Beta distributions. This code differs from the previous section in that it uses some different notation: tau → temp and p[i] → theta[i].

model{
for(i in 1:n){
y[i] <- z[i,1]
temp[i] <- z[i,2]
y[i] ~ dbin(theta[i],1)   # Likelihood
logit(theta[i]) <- beta[1] + beta[2]*temp[i]
}
for(i in 1:2){ tildetheta[i] ~ dbeta(a[i],b[i]) }   # Prior
# Induced prior on the regression coefficients
beta[1] <- (75/20)*logit(tildetheta[1])
   - (55/20)*logit(tildetheta[2])
beta[2] <- (-1/20)*logit(tildetheta[1])
   + (1/20)*logit(tildetheta[2])
}

# Sample size, hyperparameters of prior, and the
# actual data need to be specified in list statements
# all of which need to be specified as data in the GUI
# by highlighting them before pressing the data button
list(n=23, a=c(1,.577), b=c(.577,1))
list(z=structure(.Data = c(
1, 53,
1, 57,
1, 58,
0, 63,
1, 66,
1, 67,
1, 67,
1, 67,
1, 68,
1, 69,
1, 70,
1, 70,
0, 70,
0, 70,
1, 72,
1, 73,
1, 75,
0, 75,
1, 76,
1, 76,
1, 78,
1, 79,
1, 81), .Dim = c(23, 2)))
# Set initial values
list(tildetheta=c(0.5,0.5))
END

14.2.1.3 MY ATTEMPTS TO DO FIGURES WITH PRIORS

https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/contour

x=seq()

x <- -6:16
op <- par(mfrow = c(2, 2))
contour(outer(x, x), method = "edge", vfont = c("sans serif", "plain"))
z <- outer(x, sqrt(abs(x)), FUN = "/")
image(x, x, z)

contour(x = seq(0, 1, length.out = nrow(z)),
        y = seq(0, 1, length.out = ncol(z)),
        z,
        nlevels = 10, levels = pretty(zlim, nlevels),
        labels = NULL,
        xlim = range(x, finite = TRUE),
        ylim = range(y, finite = TRUE),
        zlim = range(z, finite = TRUE),
        labcex = 0.6, drawlabels = TRUE, method = "flattest",
        vfont, axes = TRUE, frame.plot = axes,
        col = par("fg"), lty = par("lty"), lwd = par("lwd"),
        add = FALSE, ...)

Fig. 14.2 O-Ring Data: Prior on β

x=seq(78,102,.125)
y=z=par(mfrow=c(3,3))

14.2.1.4 Trauma Data Run Through R

The trauma data file is coded with 1 as survived. The book treats 1 as death. TRAUMAa.DAT is a version of TRAUMA.DAT without the column names.

rm(list = ls())
Trauma <- read.table("C:\\E-drive\\Books\\LOGLIN2\\DATA\\TRAUMAa.DAT",
   sep="",col.names=c("ID","Death","ISS","TI","RTS","AGE"))
attach(Trauma)
summary(Trauma)

par(mfrow=c(2,2))
Death
boxplot(ISS~Death,ylab="ISS")
boxplot(RTS~Death)
boxplot(AGE~Death)
par(mfrow=c(1,1))

We are still running OpenBUGS but now through R so that we do not have to list all 300 data points in our OpenBUGS program.

model{
for(i in 1:n){
   death[i] ~ dbern(theta[i])
   logit(theta[i]) <- beta[1] + beta[2]*ISS[i] + beta[3]*RTS[i]
      + beta[4]*AGE[i] + beta[5]*TI[i]
      + beta[6]*AGE[i]*TI[i]
}
for(i in 1:6){ beta[i] ~ dflat() }
junk <- ID[1]
}


Fig. 14.3 Trauma Data: Box Plots, 0 indicates death.

list(beta= c(0,0,0,0,0,0))
list(n=300)
ID[ ] death[ ] ISS[ ] TI[ ] RTS[ ] AGE[ ]
2979 0  1 1 7.8408 25
1167 0  9 0 7.8408 63
 116 0 29 0 2.9304 32
remaining 297 data lines go here
END

Fletch’s web code

library(R2OpenBUGS)

setwd("c:\\[Data]\\School\\UNM\\Teaching\\577\\")
model_a_filename <- "https://www.stat.unm.edu/~ronald/courses/577/files/R_Lesson_1a.txt"
model_b_filename <- "https://www.stat.unm.edu/~ronald/courses/577/files/R_Lesson_1b.txt"
BUGS_path <- "c:\\OpenBUGS323\\OpenBUGS.exe"
myworking_dir <- "c:\\[Data]\\School\\UNM\\Teaching\\577\\"

iterates <- 10000
burn_in <- 1000

########################## One-Sample Problem


X <- 15
n <- 100
a <- 3.4
b <- 23

data <- list( "X", "n", "a", "b" )

inits <- function() {
   list( p = runif(1,0,1) )
}

parameters <- list( "p" )

Prob1.sim <- bugs( data, inits, parameters,
   model.file=model_a_filename,
   n.chains=1, n.iter=iterates+burn_in, n.thin=1, n.burnin=burn_in,
   OpenBUGS.pgm=BUGS_path,
   working.directory=myworking_dir,
   debug=T )

Table 14.1 Trauma Data: Prior Specification

     Design for Prior               Beta(yi, Ni − yi)
 i   x′i                            yi     Ni − yi
 1   1  25  7.84  60  0   0        1.1     8.5
 2   1  25  3.34  10  0   0        3.0    11.0
 3   1  41  3.34  60  1  60        5.9     1.7
 4   1  41  7.84  10  1  10        1.3    12.0
 5   1  33  5.74  35  0   0        1.1     4.9
 6   1  33  5.74  35  1  35        1.5     5.5

EXERCISE 14.11. Below we give code for handling the trauma data based on the fully informative prior specified in Table 8.5 with standardized continuous covariates. Below that, we give R code for obtaining X^{−1} and Z^{−1}. (a) Modify the code as needed to give a full analysis of the trauma data including assessment of posterior probabilities that regression coefficients are positive (or negative), estimates of probabilities of “death on the table” for the 16 possible combinations of (ISS, RTS, AGE, TI) corresponding to ISS = 20, 40, RTS = 3.34, 5.74, AGE = 10, 60, and TI = 0, 1. As part of your analysis, create a table of entries that includes the median and a 95% probability interval for each combination. Inferences are for the proportions of deaths in the populations of trauma patients that fall into these 16 categories. Compare with results obtained in Figure 8.5.

Fig. 14.4 Trauma Data: Priors and Posteriors on p’s

model{
for(i in 1:n){
   death[i] ~ dbern(theta[i])
   logit(theta[i]) <- gamma[1] + gamma[2]*(ISS[i]-14.3)/11
      + gamma[3]*(RTS[i]-7.29)/1.25
      + gamma[4]*(AGE[i]-31.4)/17
      + gamma[5]*TI[i]
      + gamma[6]*(AGE[i]-31.4)*TI[i]/17
}
for(i in 1:6){
   tildetheta[i] ~ dbeta(a[i],b[i])
   v[i] <- log(tildetheta[i]/(1-tildetheta[i]))
   gamma[i] <- inprod(Ztinv[i,1:6], v[1:6])
}
junk <- ID[1]
}
list(tildetheta=c(0.5,0.5,0.5,0.5,0.5,0.5))
list(n=300, a=c(1.1,3,5.9,1.3,1.1,1.5), b=c(8.5,11,1.7,12,4.9,5.5))
Ztinv[,1] Ztinv[,2] Ztinv[,3] Ztinv[,4] Ztinv[,5] Ztinv[,6]
-2.603 -2.460 -3.702 -3.702  6.063  7.403
-0.343 -0.343  0.343  0.343  0.686 -0.686
-2.091 -2.091 -2.091 -2.091  4.182  4.182
 2.901  2.218  2.559  2.559 -5.119 -5.119
 1.149  1.005  1.005  1.149 -3.154 -1.154
-5.460 -4.778 -4.778 -5.460 10.238 10.238
END
ID[ ] death[ ] ISS[ ] TI[ ] RTS[ ] AGE[ ]
2979 0  1 1 7.8408 25
1167 0  9 0 7.8408 63
 116 0 29 0 2.9304 32
remaining 297 data lines go here
END

14.2.2 Predictive Probabilities

EXAMPLE 14.2.2 CONTINUED. Repeat with case 18 deleted and MLE.

# MLE
x=seq(30,80,.1)
w= 15.0429 + (-.2322*x)
y=exp(w)/(1+exp(w))
plot(x,y,type="l",xlim=c(30,85),ylab="Fitted",xlab="Temperature",lty=3)

# Bayes


temppred=seq(30,80,.1)
L=length(temppred)
XXpred=c(rep(1,L),temppred)
Xpred=matrix(XXpred,L,2)
Beta = t(Oring$sims.matrix)
linpred=rowMeans(Xpred %*% Beta)
pred=exp(linpred)/(1 + exp(linpred))
lines(temppred,pred,type="l",lty=1)

# Bayes case 18 deleted
temppred=seq(30,80,.1)
L=length(temppred)
XXpred=c(rep(1,L),temppred)
Xpred=matrix(XXpred,L,2)
Beta = t(sims.matrix)
linpred=rowMeans(Xpred %*% Beta)
pred=exp(linpred)/(1 + exp(linpred))
lines(temppred,pred,type="l",lty=2)

legend("bottomleft",c("MLE","Bayes: full data","Bayes: no case 18"),lty=c(3,1,2))

Fig. 14.5 O-Ring Data: Predictive Probabilities and MLEs


Fig. 14.6 O-Ring Data: Predictive Probabilities and 90% Intervals

14.2.3 Inference for Regression Coefficients

Table 14.2 Posterior Marginal Distribution: O-Rings

                    Full Data             Case 18 Deleted
                    β0        β1          β0        β1
E(βi|Y)            12.97   −0.2018       16.92   −0.2648
Std. Dev.(βi|Y)     5.75    0.0847        7.09    0.1056
 5%                 4.56   −0.355         6.85   −0.459
25%                 9.04   −0.251        11.98   −0.324
50%                12.44   −0.194        16.13   −0.252
75%                16.20   −0.144        20.86   −0.191
95%                23.38   −0.077        29.96   −0.114

14.2.4 Inference for LDα

Add the data to the WinBUGS code below.


Fig. 14.7 Trauma Data: Predictive Probabilities


Fig. 14.8 O-Ring Data: Marginal Density for β1


Table 14.3 Fitted Trauma Model

             Posterior Summaries                    Maximum Likelihood
             Based on informative prior
Variable     Estimate  Std. Error   0.05    0.95    Estimate  Std. Error
Intercept     −1.79     1.10       −3.54    0.02     −2.73     1.62
ISS            0.07     0.02        0.03    0.10      0.08     0.03
RTS           −0.60     0.14       −0.82   −0.37     −0.55     0.17
AGE            0.05     0.01        0.03    0.07      0.05     0.01
TI             1.10     1.06       −0.66    2.87      1.34     1.33
AGE × TI      −0.02     0.03       −0.06    0.03     −0.01     0.03

             Posterior Summaries
             Based on diffuse prior
Variable     Estimate  Std. Error   0.05    0.95
Intercept     −2.81     1.60       −5.34   −0.18
ISS            0.09     0.03        0.05    0.13
RTS           −0.59     0.17       −0.86   −0.32
AGE            0.06     0.02        0.03    0.09
TI             1.46     1.36       −0.79    3.69
AGE × TI      −0.01     0.03       −0.07    0.05

Table 14.4 Posterior Summaries for LDα’s

        Full Data                     Case 18 Deleted
        Percentiles                   Percentiles
  α     5%     50%    95%       α     5%     50%    95%
0.90   30.2   52.9   60.4     0.90   39.8   55.1   61.2
0.75   43.4   58.5   64.0     0.75   48.9   59.4   64.0
0.50   55.9   64.2   68.5     0.50   57.5   63.8   67.5
0.25   65.1   69.8   76.4     0.25   64.1   68.1   73.0
0.10   70.3   75.4   88.3     0.10   68.3   72.4   80.9

model{
for(i in 1:2){ y[i] ~ dbin(theta[i],N[i]) }   # Likelihood
theta[1] ~ dbeta(11.26,11.26)                 # The prior
theta[2] ~ dbeta(13.32,6.28)
beta[1] <- logit(theta[1])                    # Induced posterior
beta[2] <- logit(theta[2]) - logit(theta[1])
lambda <- theta[1]/theta[2]          # Induced posterior for ratio
prob <- step(lambda -1)              # Posterior prob. ratio > 1.
}


14.3 Diagnostics

14.3.1 Case Deletion Influence Measures

14.3.2 Estimative Influence

14.3.3 Predictive Influence

EXAMPLE 14.3.1. O-Ring Data. Figure 14.9 gives index plots of D^p_i and D^f_i for the O-ring data.

Fig. 14.9 O-Ring Data: Index Plots of D^p_i and D^f_i

EXAMPLE 14.3.2. Trauma Data. Figure 14.10 contains an index plot of the difference in the predictive probabilities of death, p(y = 1|Y, x_j) − p(y = 1|Y_(52), x_j).


Fig. 14.10 Trauma Data: Index Plot of p(y = 1|Y, x_j) − p(y = 1|Y_(52), x_j)


14.3.4 Model Checking

14.3.5 Link Selection

Bayes Factors for the O-Ring Data. Below, we give WinBUGS code for calculating the Bayes factor comparing M1 (logistic link) to M3 (complementary log-log link) using the O-ring data. You may want to review the method for simulating Bayes factors with parameters of equal dimensions given in Subsection 4.8.3. (a) Before running the code, use (3) to show that the induced prior for β under the complementary log-log transformation is

p(β) ∝ ∏_{i=1}^{2} {1 − exp(−e^{x_i′β})}^{a_i−1} {exp(−e^{x_i′β})}^{b_i} e^{x_i′β},

and thus derive the expression for u[i] in the code below.

model{
for (i in 1:n){
   y[i] ~ dbin(theta[i],1)
   logit(theta[i]) <- beta[1] + beta[2]*(temp[i]-mt)/sdtemp
}
for(i in 1:2){ tildetheta[i] ~ dbeta(a[i],b[i]) }
beta[1] <- G[1,1]*logit(tildetheta[1]) + G[1,2]*logit(tildetheta[2])
beta[2] <- G[2,1]*logit(tildetheta[1]) + G[2,2]*logit(tildetheta[2])
# Have specified logistic model and induced prior on beta
# Now specify probs under cloglog model
for(i in 1:n){
   cloglog(thetastar[i]) <- beta[1] + beta[2]*(temp[i]-mt)/sdtemp
   # Now give terms that go into log lik ratio
   # comparing cloglog in num to logistic in den
   v[i] <- y[i]*(log(thetastar[i])-log(theta[i]))
      + (1-y[i])*(log(1-thetastar[i])-log(1-theta[i]))
}
# Finally, give corresponding log of prior ratio
for(i in 1:2){
   cloglog(tildethetastar[i]) <- beta[1] + beta[2]*(ttemp[i]-mt)/sdtemp
   logit(tttildetheta[i]) <- beta[1] + beta[2]*(ttemp[i]-mt)/sdtemp
   u[i] <- (a[i]-1)*log(tildethetastar[i]) - a[i]*log(tttildetheta[i])
      + b[i]*(log(1-tildethetastar[i]) - log(1-tttildetheta[i]))
      + beta[1] + beta[2]*(ttemp[i]-mt)/sdtemp
}
wstar1 <- sum(v[ ])
wstar2 <- sum(u[ ])
w <- exp(wstar1+wstar2)
}
list(tildetheta=c(0.5,0.5))
list(ttemp=c(55,75), mt=69.56522, sdtemp=7.05708,
   n=23, a=c(1.6,1.0), b=c(1.0,1.6))
G[,1]      G[,2]
 0.2717391  0.7282609
-0.3528540  0.3528540
END

Fig. 14.11 O-Ring Data: Bayes Factors with Case Deletion

EXAMPLE 14.3.2 CONTINUED. Trauma Data.

14.3.6 Sensitivity Analysis

14.4 Posterior Computations and Sample Size Calculation


Fig. 14.12 Trauma Data: p(y = 1|x_j, Y) − p(y = 1|x_j, Y_(i))


Fig. 14.13 O-Ring Data: Importance Function Diagnostic Plots


Chapter 15
Correspondence Analysis

Correspondence Analysis (CA) is a method of data reduction that is applied to two-way tables of counts, cf. Chapter 2. It typically results in a two or three dimensional graph of row scores and column scores. Proper interpretation of such graphs is complex. Multiple Correspondence Analysis (MCA) applies similar ideas to data with more than two classification factors by simultaneously looking at all of the two-factor marginal tables.

15.1 Introduction

Correspondence Analysis, like Principal Component Analysis, is an application of the Singular Value Decomposition (SVD) for a nonsquare matrix.

Theorem 15.1.1. The Singular Value Decomposition.
Let X be an n×p matrix with rank s. Then X can be written as

X = ULV′,

where U is n×s, L is s×s, V′ is s×p, and

L ≡ Diag(λ_j).

The λ_j’s are the positive square roots of the positive eigenvalues (singular values) of X′X and XX′. The columns of V are s orthonormal eigenvectors of X′X corresponding to the positive eigenvalues with

X′XV = VL²,

and the columns of U are s orthonormal eigenvectors of XX′ with

XX′U = UL².


This version of the theorem is proven in Christensen (2020, Section 13.3). Computationally, because eigenvectors are not uniquely defined, one computes either V or U and then transforms one into the other, e.g., U = XVL^{−1} or V = X′UL^{−1}. In practice the eigenvalues are almost always listed from largest to smallest. (For the discussion of Multiple Correspondence Analysis you need to remember that SVD also denotes the results X′X = VL²V′ and XX′ = UL²U′.)
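A quick numerical sketch of these relations using R’s svd() on an arbitrary small matrix:

# svd() returns u, d, v with X = U L V'; U can be recovered from V.
X <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)
s <- svd(X)
s$u %*% diag(s$d) %*% t(s$v)   # reproduces X
X %*% s$v %*% diag(1/s$d)      # reproduces s$u, i.e., U = X V L^{-1}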

The singular value decomposition can be used to reduce dimensionality in any data matrix by zeroing out eigenvalues near zero. In statistics we usually preprocess the data matrix before applying the singular value decomposition. In principal component analysis we always center the data so that each column has mean zero and usually rescale each column to a common length before applying the singular value decomposition. In correspondence analysis, we are concerned with two-way tables of counts, i.e., contingency tables, and apply the singular value decomposition to the Pearson residuals. However, standard correspondence analysis transforms the eigenvectors before using them.

The subsequent discussion presumes familiarity with Chapter 2 or alternatively Christensen (1996, Section 8.5 or 2015, Section 5.5). This section provides introductory material. The next applies SVD to the Pearson residual matrix and the third discusses CA. The final section discusses MCA.

EXAMPLE 15.1.1. Christensen (1996, Section 8.5; 2015, Section 5.5) considered the data from Lazerwitz (1961). We consider the data as multinomial samples from three religious groups or, alternatively, as one large multinomial sample. The counts consist of the numbers of people in different occupational groups: A, professions; B, owners, managers, and officials; C, clerical and sales; and D, skilled workers. Table 15.1 contains the observations n_ij, the estimated expected values under the null model of independence (or homogeneity),

m̂(0)_ij ≡ n_i· n_·j / n_·· ,

and the Pearson residuals,

r_ij ≡ (n_ij − m̂(0)_ij) / √m̂(0)_ij .

The sum of the squared residuals gives the Pearson chi-squared statistic,

X² = 60.02,

which is on (3 − 1)(4 − 1) = 6 degrees of freedom.

Table 15.1 Lazerwitz tables.

Observations (n_ij’s).
Religion          A    B    C    D   Total
Protestant       210  277  254  394  1135
Roman Catholic   102  140  127  279   648
Jewish            36   60   30   17   143
Total            348  477  411  690  1926

Estimated expected counts (m̂(0)_ij’s).
Religion           A       B       C       D     Total
Protestant       205.08  281.10  242.20  406.62  1135
Roman Catholic   117.08  160.49  138.28  232.15   648
Jewish            25.84   35.42   30.52   51.23   143
Total            348.00  477.00  411.00  690.00  1926

Pearson residuals (r_ij’s).
Religion           A      B      C      D
Protestant        0.34  −0.24   0.76  −0.63
Roman Catholic   −1.39  −1.62  −0.96   3.07
Jewish            2.00   4.13  −0.09  −4.78
X² = 60.0, df = 6

The Pearson residuals are used to interpret how the data deviate from independence/homogeneity by comparing their size to a N(0,1) distribution. Jewish people in occupation D have the largest negative residual, −4.78, so Jewish people were under-represented among skilled workers relative to the other religious groups. Roman Catholics, however, were over-represented among skilled workers with a positive residual of 3.07. The remaining large residual in the table is 4.13 for Jewish people in group B, which means that Jewish people were more highly represented among owners, managers, and officials than the other religious groups. The only other residual of even moderately large size is the 2.00 for Jewish people in the professions. These data seem to indicate that the Jewish group was different from the other two. A substantial difference appears in every occupational group except clerical and sales.

Figure 15.1 gives both an SVD plot and a CA plot of the Pearson residuals. To be interpreted properly these plots need to be scaled properly. Figure 15.1 is not properly scaled but gives an idea of what we are working towards. Both plots identify Jewish and D (skilled workers) as unusual categories. Both plots, but more clearly in the SVD plot, associate Jewish with occupations B and A and associate Catholic with occupation D. These combinations have the three large positive residuals discussed earlier. Both plots have the points Jewish and D far from one another and the angle between the points Jewish, (0,0), and D close to 180 degrees. This is consistent with having a large negative residual.

Fig. 15.1 SVD and CA Plots of Lazerwitz’s data.

15.2 Singular Value Decomposition Plot

The singular value decomposition (SVD) plot is simply a two-dimensional visual representation of the Pearson residual matrix. For a small table like the 3×4 Religion–Occupation table, there is little need for a visual display of the Pearson residuals. Looking at the table gives the information needed. But in a 10×20 contingency table, one might appreciate some visual help in interpreting the residual table. It should be emphasized that it is not hard to get the visual interpretations wrong and that interesting information may get missed in the visual display. Whatever interpretations one arrives at by looking at the SVD (or CA) plot should be reconfirmed by looking at the residual table.

Consider a two-way table of counts as an I×J matrix N ≡ [n_ij]. The total count is n_·· ≡ Σ_i Σ_j n_ij. Treating the sample as one multinomial, define the matrix of estimated cell probabilities P ≡ (1/n_··)N ≡ [p_ij]. With 1_s defining an s dimensional column of 1s, define vectors of row and column marginal probabilities

P_r ≡ P1_J ≡ (p_r1, …, p_rI)′

and

P′_c ≡ 1′_I P ≡ (p_c1, …, p_cJ).

The raw residual matrix is n_··[P − P_r P′_c]. Define diagonal matrices D_r ≡ D(P_r) and D_c ≡ D(P_c). The Pearson residuals table is the matrix

R = √n_·· D_r^{−1/2} [P − P_r P′_c] D_c^{−1/2}.

Note that Pearson’s chi-squared statistic X² is the sum of squares of the elements of R, which reduces to X² = tr(RR′).
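A sketch of these computations in R for the Lazerwitz counts (base R only; the object names are mine):

# Pearson residual matrix R for the Lazerwitz table, per the formulas above.
N <- matrix(c(210, 277, 254, 394,
              102, 140, 127, 279,
               36,  60,  30,  17), nrow = 3, byrow = TRUE)
n <- sum(N)
P <- N/n
Pr <- rowSums(P)
Pc <- colSums(P)
R <- sqrt(n) * diag(1/sqrt(Pr)) %*% (P - Pr %o% Pc) %*% diag(1/sqrt(Pc))
round(R, 2)   # matches the residuals in Table 15.1
sum(R^2)      # Pearson's X^2, about 60.0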

Apply the singular value decomposition to R to obtain


R = ULV′,

where U′U = I = V′V and L contains the square roots of the nonzero eigenvalues of RR′ and R′R. Further observe that now

X² = tr(RR′) = tr(ULV′VLU′) = tr(UL²U′) = tr(L²U′U) = tr(L²).

We base a plot on Ũ ≡ UL^{1/2} and Ṽ ≡ VL^{1/2}. Clearly,

R = ŨṼ′.

The rows of Ũ are labeled as the rows of R and the columns of Ṽ′ (rows of Ṽ) are labeled as the columns of R. Assuming that the eigenvalues are listed from largest to smallest, a d dimensional plot uses the first d columns of Ũ and Ṽ. Typically, d = 2 but sometimes people plot in 1 or 3 dimensions. Define the submatrices with d columns to be Ũ_d and Ṽ_d and define L_d as the d×d diagonal submatrix with the largest entries. The d dimensional approximation to the Pearson residual matrix R is

R_d ≡ Ũ_d Ṽ′_d ≡ [r_dij].
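Continuing the earlier sketch (it assumes the matrix R computed from the Lazerwitz counts above):

# Plotting coordinates: scale the singular vectors by square roots of
# the singular values, then truncate to the first d columns.
s <- svd(R)
Utilde <- s$u %*% diag(sqrt(s$d))
Vtilde <- s$v %*% diag(sqrt(s$d))
d <- 2
Rd <- Utilde[, 1:d] %*% t(Vtilde[, 1:d])   # d-dim approximation to R
sum(s$d[1:d]^2) / sum(s$d^2)   # eigenvalue percentage, discussed below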

How good is the approximation? Well, it gives

X²_d ≡ tr(R_d R′_d) = tr(L²_d) ≤ tr(L²) = X².

Often,

tr(L²_d) / tr(L²) = X²_d / X²

is reported as the eigenvalue percentage associated with the approximation. Unless this percentage is large, the dimension reduction is losing a lot of information.

Our SVD plot is a d dimensional plot of the I + J points consisting of every row of

      ( ũ′_d1 )             ( ṽ′_d1 )
Ũ_d = (   ⋮   )   and  Ṽ_d = (   ⋮   ).
      ( ũ′_dI )             ( ṽ′_dJ )

The Euclidean inner product

ũ′_di ṽ_dj = r_dij

is the d dimensional approximation to the residual r_ij. The inner product satisfies

ũ′_di ṽ_dj = ‖ũ_di‖ ‖ṽ_dj‖ cos(θ_ij).

On a properly scaled plot, one with identically scaled vertical and horizontal axes, one can get appropriate visual impressions of the vector lengths ‖ũ_di‖ and ‖ṽ_dj‖, which are the distances from the plotted points to the origin, i.e., to the d vector of 0s. Similarly, θ_ij, the angle between the two vectors, i.e., the angle from ũ_di to the


origin to ṽ_dj, is properly displayed. In particular, |r_dij| will be big if both ũ_di and ṽ_dj are far from 0 and on a straight line through 0. r_dij will be negative when the points are on opposite sides of the origin. r_dij will be positive when the points are on the same side of the origin. Moreover, r_dij will be 0 if the angle from ũ_di to 0 to ṽ_dj is 90 degrees. Intermediate angles give intermediate results.

If ũ_di and ũ_dh are close to one another, for every j we will have r_dij ≈ r_dhj. In other words, rows i and h behave very similarly (if a d dimensional approximation is a good one). If ũ_di and ũ_dh are on a line through 0 but on opposite sides and about equidistant from 0, all of their approximate residuals will be of similar size but opposite signs. In other words, rows i and h behave in nearly opposite ways (if a d dimensional approximation is a good one). Similar results hold for evaluating ṽ_dj and ṽ_dk.

EXAMPLE 15.2.1. Applying the SVD to the Lazerwitz data gives

     ( −0.06361666    0.6376901    0.7676616 )
U =  (  0.49205979   −0.6491753    0.5800419 ),
     ( −0.86823389   −0.4146357    0.2724833 )

     ( 7.601416   0.000000   0.0000e+00 )
L =  ( 0.000000   1.496097   0.0000e+00 ),
     ( 0.000000   0.000000   6.1118e−16 )

     ( −0.32146059    0.1973274   −0.6690531 )
V =  ( −0.57447704   −0.5473895    0.4766768 ).
     ( −0.05777935    0.7651747    0.5571887 )
     (  0.75053365   −0.2755618    0.1211934 )

Note that, up to roundoff error,

X² = 60.02 = 7.601416² + 1.496097² + 0

and you can check to see that R = ULV′. In this example we lose no information by looking at the d = 2 reduced dimension Pearson residual matrix because, up to roundoff error,

          ( −0.06362    0.6377 )
R = R2 =  (  0.49206   −0.6492 ) ( 7.601  0.000 ) ( −0.32146    0.1973 )′
          ( −0.86823   −0.4146 ) ( 0.000  1.496 ) ( −0.57448   −0.5474 )
                                                  ( −0.05778    0.7652 )
                                                  (  0.75053   −0.2756 ).

This is not an accident. The rank of R is at most min(I, J) − 1, which in this example is 2. The rank of an I×J matrix is always no greater than min(I, J) but the rows and columns of R are both subject to a linear constraint,

1′_I D_r^{1/2} R = √n_·· 1′_I D_r^{1/2} D_r^{−1/2} [P − P_r P′_c] D_c^{−1/2}
               = √n_·· 1′_I [P − P_r P′_c] D_c^{−1/2}
               = √n_·· [P′_c − (1)P′_c] D_c^{−1/2} = 0

and similarly

R D_c^{1/2} 1_J = 0.

If the rank of R is two, the two dimensional approximation R2 will be perfect.

Figure 15.2 gives the properly scaled SVD plot so that the visual impressions of lengths and angles are actual Euclidean lengths and angles. As discussed earlier, Jewish and D (skilled workers) are unusual categories because Jewish is far from the other row categories and D is far from the other column categories. Jewish is associated with (located near to) occupations B and A. Catholic is near occupation D. These are the combinations with large positive residuals. Jewish and D are both far from the origin (0,0) and the angle between the points Jewish, (0,0), and D is close to 180 degrees, so they have a large negative residual.

[Figure: properly scaled SVD plot; axes Factor 1 (horizontal) and Factor 2 (vertical), both running from -2 to 2; row points Protest, RomCath, Jewish and column points A, B, C, D, with + marking the origin.]

Fig. 15.2 SVD Plot of Lazerwitz's data.

15.3 Correspondence Analysis Plot

An alternative to looking at the Pearson residuals to evaluate deviations from independence/homogeneity is to look at something like an interaction plot. Table 15.2 gives Lazerwitz's occupational proportions for each religious group and Figure 15.3 plots them. Under the null model of (row) homogeneity, the observed proportions in each occupation category should be the same for all the religions (up to sampling variability). Alternatively, we could look at the religious group proportions for each occupation. Correspondence Analysis tries to get at both of these.

Table 15.2 Observed proportions by religion.

                        Occupation
Religion            A      B      C      D    Total
Protestant        0.185  0.244  0.224  0.347   1.00
Roman Catholic    0.157  0.216  0.196  0.431   1.00
Jewish            0.252  0.420  0.210  0.119   1.00

[Figure: interaction-style plot of the proportions in Table 15.2; horizontal axis Occupation (A, B, C, D), vertical axis Proportion (0.1 to 0.4), with one line each for Protest, RomCath, and Jewish.]

Fig. 15.3 Occupational proportions by religion.

From Figure 15.3 the Jewish group is obviously very different from the other two groups in occupations B and D and is very similar in occupation C. The Jewish proportion seems somewhat different for occupation A. The Protestant and Roman Catholic groups seem similar except that the Protestants are a bit underrepresented in occupation D and therefore are overrepresented in the other three categories. (Remember that the four proportions for each religion must add up to one, so being underrepresented in one category forces an overrepresentation in one or more other categories.)

In the previous section $P$ was the matrix of estimated cell probabilities for multinomial sampling. Correspondence Analysis treats the data as product multinomial rather than multinomial, so it looks at the rows of $D_r^{-1}P$, say $p_i'$, $i = 1,\dots,I$. Relative to the multinomial probabilities $P$ of being in the $ij$ cell, the entries in $D_r^{-1}P$ are estimated conditional probabilities of being in column $j$ given that you are in row $i$. In $P$ the sum of all the entries is 1, i.e., $1 = 1_I' P 1_J$, but in $D_r^{-1}P$ each row sums to 1, i.e., $1_I = D_r^{-1}P 1_J = D_r^{-1}P_r$.
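Incidentally, the rows of $D_r^{-1}P$ are just the row profiles in Table 15.2, so they are one line of R away. A small sketch, assuming laz is the xtabs table from Section 15.4:

# Row profiles: the rows of D_r^{-1} P, each summing to 1 (Table 15.2)
prop.table(laz, margin = 1)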

The $p_i'$s are precisely what was plotted in the $I = 3$ curves of Figure 15.3. CA then looks at each row's deviations from the pooled column proportions, so looks at $p_i - P_c$. Finally, CA defines the squared distance between these two vectors as $(p_i - P_c)' D_c^{-1}(p_i - P_c)$. What we want is a $d$ dimensional vector, say $f_{ri}$ (subscript $r$ for "rows"), so that when we plot the point its squared Euclidean distance to the origin is close to $(p_i - P_c)' D_c^{-1}(p_i - P_c)$.

CA begins, essentially, with the Pearson residual matrix R. It actually works with

$$\frac{1}{\sqrt{n_{\cdot\cdot}}} R = D_r^{-1/2}\bigl[P - P_r P_c'\bigr] D_c^{-1/2} = D_r^{1/2} D_r^{-1}\bigl[P - P_r P_c'\bigr] D_c^{-1/2} = D_r^{1/2}\bigl[D_r^{-1}P - 1_I P_c'\bigr] D_c^{-1/2}.$$

The $i$th row of this matrix is $\sqrt{p_{ri}}\,(p_i - P_c)' D_c^{-1/2}$.

The Euclidean squared length of the $i$th row vector is

$$\bigl\|\sqrt{p_{ri}}\,(p_i - P_c)' D_c^{-1/2}\bigr\|^2 = p_{ri}\,(p_i - P_c)' D_c^{-1}(p_i - P_c),$$

so to get the length we want from the row vectors, we will need to multiply the $i$th row by $1/\sqrt{p_{ri}}$. In other words, we need to examine the rows of

$$D_r^{-1/2}\frac{1}{\sqrt{n_{\cdot\cdot}}} R = \bigl[D_r^{-1}P - 1_I P_c'\bigr] D_c^{-1/2}.$$

The squared lengths of the rows are the diagonal elements of

$$\left[D_r^{-1/2}\frac{1}{\sqrt{n_{\cdot\cdot}}} R\right]\left[D_r^{-1/2}\frac{1}{\sqrt{n_{\cdot\cdot}}} R\right]' = \frac{1}{n_{\cdot\cdot}}\, D_r^{-1/2} R R' D_r^{-1/2}.$$

From the SVD

$$R = ULV',$$

so we get

$$\frac{1}{\sqrt{n_{\cdot\cdot}}} R = U\left[\frac{1}{\sqrt{n_{\cdot\cdot}}} L\right]V'.$$

Only the eigenvalues have been changed by a multiple. Moreover, the squared lengths that we want are the diagonal elements of


$$\frac{1}{n_{\cdot\cdot}} D_r^{-1/2} R R' D_r^{-1/2} = \frac{1}{n_{\cdot\cdot}} D_r^{-1/2} U L V' V L U' D_r^{-1/2} = \frac{1}{n_{\cdot\cdot}} D_r^{-1/2} U L L U' D_r^{-1/2} = \left[D_r^{-1/2} U \frac{1}{\sqrt{n_{\cdot\cdot}}} L\right]\left[D_r^{-1/2} U \frac{1}{\sqrt{n_{\cdot\cdot}}} L\right]'.$$

The scores that traditionally define correspondence analysis for rows are

$$F_r \equiv D_r^{-1/2} U L (1/\sqrt{n_{\cdot\cdot}}),$$

and we have just shown that the rows of $F_r$ have the appropriate lengths. If we plotted the $I$ rows of $F_r$ in $r(R)$ dimensions, their Euclidean squared distances from the origin would be $(p_i - P_c)' D_c^{-1}(p_i - P_c)$. But we cannot do $r(R) > 3$ dimensional plots. We can do $d$ dimensional plots for $d = 1, 2, 3$. So when plotting we take the first $d$ columns of $F_r$, the ones that correspond to the largest diagonal values of $L$. As with the SVD plot, $\mathrm{tr}(L_d^2)/\mathrm{tr}(L^2)$ is often used to measure how much of the original information is retained in the plot.
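The retained-information measure is immediate from the singular values. A one-line sketch, assuming caa = svd(fit$residual) as in the Section 15.4 code:

# tr(L_d^2)/tr(L^2) for a d-dimensional plot
d <- 2
sum(caa$d[1:d]^2) / sum(caa$d^2)   # equals 1 here: the third singular value is 0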

EXERCISE 15.1. With $f_{ri}'$ the $i$th row of $F_r$, show that

$$\|f_{ri} - f_{rh}\|^2 = (p_i - p_h)' D_c^{-1}(p_i - p_h)$$

so that the "visual" distance between the (complete) scores for rows $i$ and $h$ is precisely the distance between their vectors of estimated probabilities. Hint: Write $(p_i - P_c)' D_c^{-1/2} = f_{ri}' V'$.

Figure 15.4 presents the CA plot for the Lazerwitz religious groups. Not surprisingly, the Jewish group is far from the other two.

A similar argument holds for the column categories. In that case we think about having $J$ independent multinomial samples, one for each column, and constructing scores

$$F_c = D_c^{-1/2} V L (1/\sqrt{n_{\cdot\cdot}}),$$

with the $1/\sqrt{n_{\cdot\cdot}}$ scaling matching the definition of $F_r$ and the code in Section 15.4.

Figure 15.5 presents the CA plot for the Lazerwitz occupational categories. Not surprisingly, the skilled workers (D) group is far from the other three.

Visual differences in distances between row categories make sense in the CA plot. Visual differences between column categories can also be interpreted as displayed. Somewhat surprisingly, a typical CA plot contains both sets of points (cf. Figure 15.6) despite the fact that, as Greenacre and Hastie (1987) point out, "row-to-column distances are meaningless." Nonetheless, it seems to be common practice to interpret relations between column points and row points in a fashion similar to that which was justified for the SVD plot.

An alternative to programming these computations is to let an R package do the work for you. Greenacre and Nenadic's R package ca gave Figure 15.7. If you name


[Figure: CA plot of the row scores; axes Factor 1 and Factor 2, both from -0.6 to 0.2; points Protest, RomCath, Jewish, with + marking the origin.]

Fig. 15.4 CA Rows Plot of Lazerwitz's data.

[Figure: CA plot of the column scores; axes Factor 1 and Factor 2, both from -0.2 to 0.2; points A, B, C, D, with + marking the origin.]

Fig. 15.5 CA Columns Plot of Lazerwitz's data.


[Figure: combined CA plot; axes Factor 1 and Factor 2, both from -0.6 to 0.2; row points Protest, RomCath, Jewish together with column points A, B, C, D.]

Fig. 15.6 CA Plot of Lazerwitz's data.

the columns and rows of the input data matrix, it will provide specific labels, rather than numerical labels, for the plotted points.

[Figure: the plot produced by ca; axes Dimension 1 (96.3%) and Dimension 2 (3.7%); the rows are plotted as the numerical labels 1-3 and the columns as 1-4.]

Fig. 15.7 R's ca plot for Lazerwitz's data.


A variety of R packages (including MASS) contain correspondence analysis capabilities. ca provides a lot of output with unique names. For example, what it calls "Principal inertias (eigenvalues)" are the diagonal values of $L^2/n_{\cdot\cdot}$. ca's "Dimensions" are multiples of $F_r'$ or $F_c'$.
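A sketch of how to line the ca output up with the hand computations; the component name sv is what I believe the package uses for the singular values (check str(fitt) if not):

# Compare ca's "principal inertias" with the hand computation
library(ca)
fitt <- ca(laz)
fitt$sv^2                  # principal inertias reported by ca
caa$d^2 / sum(laz)         # diagonal of L^2/n.. from the svd above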

EXERCISE 15.2. Construct and interpret SVD and CA plots for the 9 × 8 table of Religions and Occupations given in Exercise 2.7.4 and Table 2.7.

Greenacre, Michael and Hastie, Trevor (1987). The Geometric Interpretation of Correspondence Analysis. Journal of the American Statistical Association, 82, 437-447.

15.4 R code for SVD and CA

lazer <- read.table("C:\\E-drive\\Books\\ANREG2\\newdata\\tab5-6.dat",
   sep="",col.names=c("O","Rel","Occ"))
attach(lazer)
lazer
laz <- xtabs(O~Rel+Occ)
laz

# fit indep/homogen model to table
fit <- chisq.test(laz,correct=FALSE)
fit
fit$expected
fit$residual

# Singular Value Decomp of sqrt(n) x residual matrix
caa=svd(fit$residual)
L=diag(caa$d)
L
U=caa$u
U
V=caa$v
V
#This should reproduce fit$residual
PR=U%*%L%*%t(V)
PR

# Pearson Chi-square
X2=PR%*%t(PR)
sum(diag(X2))


#Look at how closely 2-dim SVD reproduces residual matrix
PRca=U[,1:2]%*%L[1:2,1:2]%*%t(V[,1:2])
PRca

#Figure 1
par(mfrow=c(2,1))
#SVD Plot
Ut=U%*%sqrt(L)
Vt=V%*%sqrt(L)
# If done correctly the next line should be
# the residual matrix.
Ut%*%t(Vt)
plot(Ut[,1],Ut[,2],
   xlim=c(-2.5,2.5),ylim=c(-2.5,2.5),
   xlab="Factor 1",ylab="Factor 2",main="SVD Plot")
text(Ut[,1],Ut[,2]-.4,labels=c("Protest","RomCath","Jewish"))
lines(Vt[,1],Vt[,2],pch=15,type="p")
text(Vt[,1]-.125,Vt[,2],labels=c("A","B","C","D"))
text(0,0,labels=c("+"))

#Correspondence Analysis Plot
Pr=rowSums(laz)/sum(laz)
Pc=colSums(laz)/sum(laz)
Dr=diag(1/sqrt(Pr))
Dc=diag(1/sqrt(Pc))
Fr=Dr%*%U%*%L/sqrt(sum(laz))
Fc=Dc%*%V%*%L/sqrt(sum(laz))
plot(Fr[,1],Fr[,2],
   xlim=c(-.7,.3),ylim=c(-.3,.3),
   xlab="Factor 1",ylab="Factor 2",main="CA Plot")
text(Fr[,1],Fr[,2]-.06,labels=c("Protest","RomCath","Jewish"))
lines(Fc[,1],Fc[,2],pch=15,type="p")
text(Fc[,1]-.03,Fc[,2],labels=c("A","B","C","D"))
text(0,0,labels=c("+"))
par(mfrow=c(1,1))

# Figure 2
#SVD Plot
Ut=U%*%sqrt(L)
Vt=V%*%sqrt(L)
# If done correctly the next line should be
# the residual matrix.
Ut%*%t(Vt)


plot(Ut[,1],Ut[,2],
   xlim=c(-2.5,2.5),ylim=c(-2.5,2.5),
   xlab="Factor 1",ylab="Factor 2",main="SVD Plot")
text(Ut[,1],Ut[,2]-.15,labels=c("Protest","RomCath","Jewish"))
lines(Vt[,1],Vt[,2],pch=15,type="p")
text(Vt[,1]-.15,Vt[,2],labels=c("A","B","C","D"))
text(0,0,labels=c("+"))

# Figure 3 is from ANREG2.

# Figure 4
plot(Fr[,1],Fr[,2],
   xlim=c(-.575,.23),ylim=c(-.575,.23),
   xlab="Factor 1",ylab="Factor 2",main="CA Rows Plot")
text(Fr[,1],Fr[,2]-.03,labels=c("Protest","RomCath","Jewish"))
text(0,0,labels=c("+"))

#Figure 5
plot(Fc[,1],Fc[,2],
   xlim=c(-.25,.25),ylim=c(-.25,.25),
   xlab="Factor 1",ylab="Factor 2",main="CA Columns Plot")
text(Fc[,1]-.015,Fc[,2],labels=c("A","B","C","D"))
text(0,0,labels=c("+"))

#Figure 6
plot(Fr[,1],Fr[,2],
   xlim=c(-.575,.23),ylim=c(-.575,.23),
   xlab="Factor 1",ylab="Factor 2",main="CA Plot")
text(Fr[,1],Fr[,2]-.03,labels=c("Protest","RomCath","Jewish"))
lines(Fc[,1],Fc[,2],pch=15,type="p")
text(Fc[,1]-.03,Fc[,2],labels=c("A","B","C","D"))

# Or you could just use the ca package
#install.packages("ca")
library(ca)
fitt=ca(laz)
plot(fitt)
par(mfrow=c(1,1))


15.4.1 Nobel Prize Winners

An example on Nobel prize winners by country from a YouTube video by Francois Husson, www.youtube.com/watch?v=Z5Lo1hvZ9fA. The code is incomplete or wrong.

rm(list = ls())
nobel = matrix(c(4, 3, 2, 4, 1, 4,
8, 3, 11, 12, 10, 9,
24, 1, 8, 18, 5, 24,
1, 1, 6, 5, 1, 5,
6, 0, 2, 3, 1, 11,
4, 3, 5, 2, 3, 10,
23, 6, 7, 26, 11, 20,
51, 43, 8, 70, 19, 66),6,8)
nobel=t(nobel)
nobel
rowSums(nobel)
colSums(nobel)
# rownames Canada France Germany Italy Japan Russia UK USA
# colnames Chem Econ Lit Med Peace Physics

# fit indep/homogen model to table
fit <- chisq.test(nobel,correct=FALSE)
fit
fit$expected
fit$residual

# Singular Value Decomp of residual matrix
ca=svd(fit$residual)
L=diag(ca$d)
L
# U and V are prescaled by sqrt(L) for the plot below
U=ca$u %*% sqrt(L)
U
V=ca$v %*% sqrt(L)
V
#This should reproduce fit$residual
#(U and V already absorb sqrt(L), so no middle factor of L)
PR=U%*%t(V)
PR

# Pearson Chi-square
X2=PR%*%t(PR)
sum(diag(X2))


#SVD Plot
plot(U[,1],U[,2],
   xlim=c(-1,1),ylim=c(-1,1),
   xlab="Factor 1",ylab="Factor 2")
text(U[,1],U[,2]-.075,labels=c("Canada","France","Germany",
   "Italy","Japan","USSR","UK","USA"))
lines(V[,1],V[,2],pch=15,type="p")
text(V[,1]+.075,V[,2],labels=
   c("Chem","Econ","Lit","Med","Peace","Phys"))

15.5 Multiple correspondence analysis

Multiple Correspondence Analysis (MCA) was developed to handle contingency tables with more than two factors. In general, we consider $s$ factors. MCA approaches problems quite differently than CA but for the case of two factors gives similar results.

To make things specific, consider a very small matrix of counts, say,

$$N = \begin{bmatrix} 1 & 2 & 1 & 2 \\ 1 & 2 & 2 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}.$$

This has a total of $n \equiv n_{\cdot\cdot} = 16$ observations on $s = 2$ factors having $I = 3$ row levels and $J = 4$ column levels.

To perform MCA we need to turn $N$ into an $n \times (I+J) = 16 \times 7$ incidence matrix $X$ where each row indicates the category for one of the $n$ individuals, the first $I = 3$ columns of $X$ are 0-1 to indicate which row of $N$ contains the observation, and the last $J = 4$ columns of $X$ are 0-1 to indicate which column of $N$ contains the observation. Let $X_r$ denote the columns of $X$ corresponding to the rows of $N$, so the $i$th column of $X_r$ is 1 only if the individual appears in the $i$th row category of table $N$. Similarly $X_c$ denotes the columns of $X$ associated with the columns of $N$, so the $j$th column of $X_c$ is 1 only if the individual appears in the $j$th column of $N$. For our example we can write


$$X = [X_r, X_c] = \begin{bmatrix}
1&0&0&1&0&0&0\\
1&0&0&0&1&0&0\\
1&0&0&0&1&0&0\\
1&0&0&0&0&1&0\\
1&0&0&0&0&0&1\\
1&0&0&0&0&0&1\\
0&1&0&1&0&0&0\\
0&1&0&0&1&0&0\\
0&1&0&0&1&0&0\\
0&1&0&0&0&1&0\\
0&1&0&0&0&1&0\\
0&1&0&0&0&0&1\\
0&0&1&1&0&0&0\\
0&0&1&0&1&0&0\\
0&0&1&0&0&1&0\\
0&0&1&0&0&0&1
\end{bmatrix}$$

but any permutation of the rows of $X$ works just as well. Each row of $X$ uniquely identifies some cell of $N$ and the row of $X$ that identifies the $ij$ cell gets repeated exactly $n_{ij}$ times within $X$.

The (quite remarkable) idea is to treat this incidence matrix like a table of counts. As such, the total number of observations is $ns$, 32 in our example. The row totals of $X$ all equal $s$, 2 in our example. The column totals of $X_r$ are the row totals of $N$ and the column totals of $X_c$ are the column totals of $N$. In our example, the column totals of $X_r$ are $(6,6,4)$ and the column totals of $X_c$ are $(3,5,4,4)$.
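Typing an incidence matrix by hand does not scale, but building it from $N$ is mechanical. A sketch (the variable names are mine, not from the book):

# Build X = [Xr, Xc] from a two-way table of counts N
N <- matrix(c(1,2,1,2, 1,2,2,1, 1,1,1,1), 3, 4, byrow=TRUE)
I <- nrow(N); J <- ncol(N)
cells <- which(N > 0, arr.ind=TRUE)   # occupied cells of N
reps <- N[N > 0]                      # how many individuals in each cell
ri <- rep(cells[, "row"], reps)       # row category of each individual
cj <- rep(cells[, "col"], reps)       # column category of each individual
X <- cbind(diag(I)[ri, ], diag(J)[cj, ])
rowSums(X)                            # all equal s = 2
colSums(X)                            # (6,6,4) then (3,5,4,4)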

Consider an $s$ factor contingency table with the $q$th factor having $s_q$ levels. Let $X = [X_1, \dots, X_s]$ be the $n \times \sum_{q=1}^{s} s_q$ incidence matrix. Then $ns$ is the total number of counts in the table, the row totals always equal $s$, and the column totals for $X_q$ are the marginal totals for factor $q$.

We begin by relating MCA for $s = 2$ to standard CA. Take $(s_1, s_2) = (I, J)$. For this case we use identical notation as in earlier sections for $N$ but use a subscript $X$ when using similar concepts for performing an analysis treating $X$ as a table of counts. Also, relative to $N$, define

$$P_r^{1/2} \equiv D_r^{1/2} 1_I; \qquad P_c^{1/2} \equiv D_c^{1/2} 1_J.$$

It is not hard to see that

$$X'X = \begin{bmatrix} D(n_{i\cdot}) & N \\ N' & D(n_{\cdot j}) \end{bmatrix}$$

where $N$ is our $I \times J$ contingency table. For our little example


$$X'X = \begin{bmatrix}
6&0&0&1&2&1&2\\
0&6&0&1&2&2&1\\
0&0&4&1&1&1&1\\
1&1&1&3&0&0&0\\
2&2&1&0&5&0&0\\
1&2&1&0&0&4&0\\
2&1&1&0&0&0&4
\end{bmatrix}.$$

Recall that for $N$,

$$R = \sqrt{n}\, D_r^{-1/2}\bigl[P - P_r P_c'\bigr] D_c^{-1/2} = ULV'.$$

Also note that the vector $D_c^{1/2} 1_J \equiv P_c^{1/2}$ has the property

$$R'R\, P_c^{1/2} = R'\sqrt{n}\, D_r^{-1/2}\bigl[P - P_r P_c'\bigr] D_c^{-1/2} P_c^{1/2} = R'\sqrt{n}\, D_r^{-1/2}\bigl[P - P_r P_c'\bigr] 1_J = R'\sqrt{n}\, D_r^{-1/2}\bigl[P_r - P_r(1)\bigr] = 0,$$

so $P_c^{1/2}$ is an eigenvector of $R'R$ with respect to the eigenvalue 0. Recall that eigenvectors from distinct eigenvalues are orthogonal and that the columns of $V$ are eigenvectors for $R'R$ that correspond to positive eigenvalues, so $P_c^{1/2\prime} V = 0$. Similarly, $P_r^{1/2\prime} U = 0$.

If we treat $X$ like a two-way table of counts and find the Pearson residual matrix $R_X$, remarkably after some nasty algebra we can compute

$$R_X' R_X = n\begin{bmatrix} I_I - P_r^{1/2} P_r^{1/2\prime} & \frac{1}{\sqrt{n}} R \\ \frac{1}{\sqrt{n}} R' & I_J - P_c^{1/2} P_c^{1/2\prime} \end{bmatrix}.$$

The algebra involves recognizing that the estimated row probabilities from $X$ are $P_{Xr} = (1/n)1_n$ and its column probabilities are $P_{Xc}' = (1/s)(P_r', P_c')$. Then

$$R_X \equiv \sqrt{ns}\, D^{-1/2}(P_{Xr})\left[\frac{1}{ns} X - P_{Xr} P_{Xc}'\right] D^{-1/2}(P_{Xc}) = \frac{1}{\sqrt{s}}\bigl[X - s 1_n P_{Xc}'\bigr] D^{-1/2}(P_{Xc}).$$

For $R_X' R_X$, the columns of $\begin{bmatrix} U \\ V \end{bmatrix}$ are eigenvectors. Check it out.

$$R_X' R_X \begin{bmatrix} U \\ V \end{bmatrix} = n\begin{bmatrix} I_I - P_r^{1/2} P_r^{1/2\prime} & \frac{1}{\sqrt{n}} R \\ \frac{1}{\sqrt{n}} R' & I_J - P_c^{1/2} P_c^{1/2\prime} \end{bmatrix}\begin{bmatrix} U \\ V \end{bmatrix}$$


$$= n\begin{bmatrix} (I_I - P_r^{1/2} P_r^{1/2\prime})U + \frac{1}{\sqrt{n}} R V \\ \frac{1}{\sqrt{n}} R' U + (I_J - P_c^{1/2} P_c^{1/2\prime})V \end{bmatrix}
= n\begin{bmatrix} U + \frac{1}{\sqrt{n}} R V \\ \frac{1}{\sqrt{n}} R' U + V \end{bmatrix}
= n\begin{bmatrix} U + \frac{1}{\sqrt{n}} U L V' V \\ \frac{1}{\sqrt{n}} V L U' U + V \end{bmatrix}
= n\begin{bmatrix} U + \frac{1}{\sqrt{n}} U L \\ \frac{1}{\sqrt{n}} V L + V \end{bmatrix}
= n\begin{bmatrix} U(I + \frac{1}{\sqrt{n}} L) \\ V(\frac{1}{\sqrt{n}} L + I) \end{bmatrix}
= \begin{bmatrix} U \\ V \end{bmatrix} n\left(I + \frac{1}{\sqrt{n}} L\right).$$

The eigenvalues for the columns of $\begin{bmatrix} U \\ V \end{bmatrix}$ are the diagonal elements of $n\left(I + \frac{1}{\sqrt{n}} L\right)$.

The SVD of $R_X$ is

$$R_X = \left\{R_X \begin{bmatrix} U \\ V \end{bmatrix}\bigl(nI + \sqrt{n}L\bigr)^{-1/2}\right\}\bigl(nI + \sqrt{n}L\bigr)^{1/2}\begin{bmatrix} U \\ V \end{bmatrix}'.$$

In fact, even the CA column factors of $R_X$ essentially agree with the original CA analysis on $N$ for both rows and columns because

$$F_{Xc} \equiv D^{-1/2}(P_{Xc})\begin{bmatrix} U \\ V \end{bmatrix}\bigl(nI + \sqrt{n}L\bigr)^{1/2}
= \begin{bmatrix} D_r^{-1/2} U \\ D_c^{-1/2} V \end{bmatrix} L\, L^{-1}\left(\frac{n}{s^2} I + \frac{\sqrt{n}}{s^2} L\right)^{1/2}
= \begin{bmatrix} F_r \\ F_c \end{bmatrix} L^{-1}\left(\frac{n}{s^2} I + \frac{\sqrt{n}}{s^2} L\right)^{1/2},$$

but the diagonal matrix on the right is just a scale factor that has little effect on gross general impressions.

For more than two factors, Multiple Correspondence Analysis focuses on the two-way marginal tables rather than on the full factorial table. Not that it is an analogy that is easily exploited, but it seems like MCA is focusing on how the complete independence model may differ from the all two-factor interaction log-linear model. The process is looking at deviations from pairwise independence and only looks at the tables that are sufficient statistics for the all two-factor interaction model.


Let $s = 3$ and $(s_1, s_2, s_3) = (I, J, K)$ so we have a standard three dimensional table of values $n_{ijk}$. Let $N_{ab}$ denote a two-dimensional marginal table, e.g., $N_{13} = [n_{i \cdot k}]$. It is easy to see that

$$X'X = \begin{bmatrix} D(n_{i\cdot\cdot}) & N_{12} & N_{13} \\ N_{12}' & D(n_{\cdot j \cdot}) & N_{23} \\ N_{13}' & N_{23}' & D(n_{\cdot\cdot k}) \end{bmatrix}.$$

Write $P_r$, $P_c$, $P_\ell$ for the marginal probabilities for rows, columns, and layers, e.g., $P_\ell = (n_{\cdot\cdot 1}/n_{\cdot\cdot\cdot}, \dots, n_{\cdot\cdot K}/n_{\cdot\cdot\cdot})'$.

If we treat $X$ like a two-way table of counts and find the Pearson residual matrix $R_X$, again after some nasty algebra we can compute

$$R_X' R_X = n\begin{bmatrix}
I_I - P_r^{1/2} P_r^{1/2\prime} & \frac{1}{\sqrt{n}} R_{12} & \frac{1}{\sqrt{n}} R_{13} \\
\frac{1}{\sqrt{n}} R_{12}' & I_J - P_c^{1/2} P_c^{1/2\prime} & \frac{1}{\sqrt{n}} R_{23} \\
\frac{1}{\sqrt{n}} R_{13}' & \frac{1}{\sqrt{n}} R_{23}' & I_K - P_\ell^{1/2} P_\ell^{1/2\prime}
\end{bmatrix},$$

wherein $R_{ab}$ is the table of Pearson residuals from fitting an independence model to $N_{ab}$. The model of complete independence for the $s$ way table ensures that we can treat each two-way marginal table as independent. Focusing on the residuals of these tables is to focus on how complete independence fails, but only considering two-way marginal tables does not allow deeper relationships to arise than could be found in the all two-factor interaction model.

If you do a SVD with the appropriate diagonal matrix of eigenvalues $\mathcal{L}^2$,

$$R_X' R_X = n\begin{bmatrix}
I_I - P_r^{1/2} P_r^{1/2\prime} & \frac{1}{\sqrt{n}} R_{12} & \frac{1}{\sqrt{n}} R_{13} \\
\frac{1}{\sqrt{n}} R_{12}' & I_J - P_c^{1/2} P_c^{1/2\prime} & \frac{1}{\sqrt{n}} R_{23} \\
\frac{1}{\sqrt{n}} R_{13}' & \frac{1}{\sqrt{n}} R_{23}' & I_K - P_\ell^{1/2} P_\ell^{1/2\prime}
\end{bmatrix}
= \begin{bmatrix} U \\ V \\ W \end{bmatrix}\mathcal{L}^2\begin{bmatrix} U' & V' & W' \end{bmatrix}
= \begin{bmatrix}
U\mathcal{L}^2 U' & U\mathcal{L}^2 V' & U\mathcal{L}^2 W' \\
V\mathcal{L}^2 U' & V\mathcal{L}^2 V' & V\mathcal{L}^2 W' \\
W\mathcal{L}^2 U' & W\mathcal{L}^2 V' & W\mathcal{L}^2 W'
\end{bmatrix}.$$

In particular you get

$$\sqrt{n}\, R_{12} = U\mathcal{L}^2 V', \qquad \sqrt{n}\, R_{13} = U\mathcal{L}^2 W', \qquad \sqrt{n}\, R_{23} = V\mathcal{L}^2 W'.$$

If you do a SVD plot, you can interpret points just as in an individual SVD plot, except that a $d = 2$ dimensional approximation will probably be a worse approximation than when just dealing with an individual table because I would expect $\mathcal{L}^2$ to contain more positive eigenvalues than the $L^2$ defined for any one of the two-dimensional tables.

You can also look at the eigenvectors of $R_X R_X'$ which will give you scores that you can plot for each individual. The meaning of what is getting plotted is not particularly clear but it would be well to look at unusual individuals. Every individual


in the $ijk$ cell corresponds to an identical row in $X$, so their scores will all be identical. The scores for the $ijk$ cell will be some combination of the residuals from the $ij$, $ik$, and $jk$ marginal tables being fitted for independence/homogeneity.

Finally, if we label the 3 factors A, B, C and we wanted to entertain the model [A][BC], it seems clear that the appropriate approach to MCA would be to define the incidence matrix in terms of two factors rather than three, with the second factor being the combined levels of B and C. It would also be of interest to see what relationship holds between fitting a model [AC][BC] and the MCA of an incidence matrix defined by a first group of columns identifying combined levels of A and C and then a second group of columns defining combined levels of B and C.

You can check my algebra by running the following code for the example. Heaven knows I spent enough time checking my algebra.

rm(list = ls())
laz=matrix(c(1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,
0,1,1,0,0,0,0,1,1,0,0,0,0,1,0,0,
0,0,0,1,0,0,0,0,0,1,1,0,0,0,1,0,
0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,1),16,7)
laz

# fit indep/homogen model to table
fit <- chisq.test(laz,correct=FALSE)
fit
fit$expected
fit$residual
t(fit$residual)%*%fit$residual
t(laz)%*%laz
XtX=t(laz)%*%laz
fitt = chisq.test(XtX[1:3,4:7],correct=F)
fitt$residual


Chapter 16
Exact Conditional Tests

Non-Bayesian approaches to log-linear models depend heavily on asymptotic theory. What is to be done when the sample sizes are not large? Exact conditional tests provide one approach. This methodology seems to have been first introduced by ... who else? R. A. Fisher. We introduced Fisher's exact test for a 2×2 table in Exercise 2.7.5. Agresti (1992) reviews not only exact conditional tests but other related forms of inference including confidence intervals. As examined in Agresti's paper and its discussion, there is substantial debate about the efficacy of these methods. What is not up for debate is the fact that these methods, when they can be computed, provide valid significance tests of the log-linear model being tested. (Significance tests ask only whether the data tend to contradict the null model and involve no explicit alternative model/hypothesis.)

The problem with exact conditional tests is that the larger the data, the more difficult the tests are to compute. Given an observed contingency table and a log-linear model for the data, we have observed sufficient statistics for the model. To perform an exact conditional test, we need to list all of the contingency tables that share the same set of sufficient statistics (in order to see how unusual our table is relative to this reference set of tables). The difficulty is in listing all of the tables with the same sufficient statistics.

As far as I can tell, the best software for doing this is StatXact for log-linear models and LogXact for logistic models. Both were developed by Cytel, a business created for the purpose by Cyrus Mehta and Nitin Patel, who did seminal work on the necessary computing. Aspects of these two packages have been incorporated into many of the large statistics packages. However, even state of the art computing is not able to list all of the necessary tables for large data. In such cases simulations are often used to find approximate exact P values (as opposed to the exact approximate P values provided by large sample theory).

Exact conditional tests are based on discrete distributions. As such, you should restrict yourself to looking at P values. Specifying an α level test may result in the true size of the test being much smaller than the nominal α level (unless you are performing a randomized test, which most people – including me – seem to think is crazy).


16.1 Two-Factor Tables

In fitting models to count data, we typically assume multinomial or product multinomial sampling. Sampling independent Poisson counts also works similarly. In multinomial sampling the total sample size is fixed. In product multinomial sampling, the size of each independent multinomial is fixed. The key idea in exact conditional tests for independence/homogeneity in two-way tables is to look at the distribution of the data conditional on both the row totals and the column totals.

Independence and homogeneity have essentially the same log-linear model with the same sufficient statistics, which are the row and column totals. To test independence/homogeneity we condition on the sufficient statistics for that model. (In testing homogeneity with product multinomial sampling of, say, rows, technically, only the column totals are sufficient statistics with the row totals known from the sampling scheme, so we actually condition on both the sufficient statistics and any totals fixed by sampling.)

A 2×2 table is particularly nice in this regard because if you fix both the row and column margins, knowing any one entry in the table determines the entire table. Knowing the row and column totals, any table entry lets you find the other number in the same row and the other number in the same column. Knowing three of the four numbers in the table, the last is easy to find. Things get more complicated in bigger tables.

To illustrate the general approach for two-factor tables we consider a 3×3 table with the fixed marginal totals $(n_{1\cdot}, n_{2\cdot}, n_{3\cdot}) = (1,2,3)$ and $(n_{\cdot 1}, n_{\cdot 2}, n_{\cdot 3}) = (1,2,3)$, i.e.,

Table with fixed margins
n_ij     A1    A2    A3    Total
B1      n11   n12   n13     1
B2      n21   n22   n23     2
B3      n31   n32   n33     3
Total    1     2     3      6

For general multinomial sampling, consider the conditional probability under independence of an $I \times J$ table of $n_{ij}$s, given its marginal totals the $n_{i\cdot}$s and $n_{\cdot j}$s. This is the multinomial probability of the table (assuming independence) divided by the multinomial probability of seeing the row totals times the multinomial probability of seeing the column totals (under independence, the row totals and column totals are independent), hence the probability is [watch the probabilities in the numerator]

$$\frac{n_{\cdot\cdot}!}{\prod_{i=1}^{I}\prod_{j=1}^{J} n_{ij}!}\,\prod_{i=1}^{I}\prod_{j=1}^{J}(p_{ij})^{n_{ij}}
\Bigg/\left[\frac{n_{\cdot\cdot}!}{\prod_{i=1}^{I} n_{i\cdot}!}\prod_{i=1}^{I} p_{i\cdot}^{n_{i\cdot}}\right]\left[\frac{n_{\cdot\cdot}!}{\prod_{j=1}^{J} n_{\cdot j}!}\prod_{j=1}^{J} p_{\cdot j}^{n_{\cdot j}}\right].$$

Substituting $p_{ij} = p_{i\cdot} p_{\cdot j}$ under independence, the probability product in the numerator becomes

$$\prod_{i=1}^{I}\prod_{j=1}^{J}(p_{i\cdot} p_{\cdot j})^{n_{ij}}
= \prod_{i=1}^{I}\prod_{j=1}^{J} p_{i\cdot}^{n_{ij}}\, p_{\cdot j}^{n_{ij}}
= \prod_{i=1}^{I} p_{i\cdot}^{\sum_j n_{ij}}\,\prod_{j=1}^{J} p_{\cdot j}^{\sum_i n_{ij}}
= \prod_{i=1}^{I} p_{i\cdot}^{n_{i\cdot}}\,\prod_{j=1}^{J} p_{\cdot j}^{n_{\cdot j}},$$

which is exactly the probability product in the denominator, so all of the unknown probabilities cancel and the conditional probability reduces to

$$\frac{n_{\cdot\cdot}!}{\prod_{i=1}^{I}\prod_{j=1}^{J} n_{ij}!}
\Bigg/\frac{n_{\cdot\cdot}!}{\prod_{i=1}^{I} n_{i\cdot}!}\,\frac{n_{\cdot\cdot}!}{\prod_{j=1}^{J} n_{\cdot j}!}
= \frac{\prod_{i=1}^{I} n_{i\cdot}!\,\prod_{j=1}^{J} n_{\cdot j}!}{n_{\cdot\cdot}!\,\prod_{i=1}^{I}\prod_{j=1}^{J} n_{ij}!}.$$

In our 3×3 example, the probability of a table with the designated margins is

$$\frac{1!2!3!\;1!2!3!}{6!}\,\frac{1}{\prod_{i=1}^{I}\prod_{j=1}^{J} n_{ij}!}
= \frac{144}{720}\,\frac{1}{\prod_{i=1}^{I}\prod_{j=1}^{J} n_{ij}!}
= \frac{1}{5}\,\frac{1}{\prod_{i=1}^{I}\prod_{j=1}^{J} n_{ij}!}.$$

Next we present all of the 12 tables that exist having these margins together with their value of $1/\prod_{ij} n_{ij}!$. Again, the problem with this approach is that of making a complete list of all the tables that satisfy the marginal constraints.

Possible Tables, with $1\big/\prod_{i=1}^{I}\prod_{j=1}^{J} n_{ij}!$

1 0 0          1 0 0          1 0 0
0 0 2   1/4    0 1 1   1/2    0 2 0   1/12
0 2 1          0 1 2          0 0 3

0 0 1          0 0 1          0 0 1
0 0 2   1/4    0 1 1   1/1    0 2 0   1/4
1 2 0          1 1 1          1 0 2

0 1 0          0 1 0          0 0 1
0 0 2   1/2    0 1 1   1/2    1 1 0   1/2
1 1 1          1 0 2          0 1 2

0 1 0          0 0 1          0 1 0
1 1 0   1/6    1 0 1   1/2    1 0 1   1/2
0 0 3          0 2 1          0 1 2

Given the values $1/\prod_{ij} n_{ij}!$, as shown earlier the table probabilities are just $\bigl[1/\prod_{ij} n_{ij}!\bigr]/5$. Note that $\sum_{\text{tables}} \frac{1}{5}\frac{1}{\prod_{i=1}^{I}\prod_{j=1}^{J} n_{ij}!} = 1$, as it should for probabilities.


In fact, the probabilities can also be computed as $\bigl[1/\prod_{ij} n_{ij}!\bigr]\big/\sum_{\text{tables}} 1/\prod_{ij} n_{ij}!$. In this example, the individual table probabilities are

Table Probabilities, $1\big/\bigl(5\prod_{i=1}^{I}\prod_{j=1}^{J} n_{ij}!\bigr)$, in the same order as the tables above

0.05          0.10   0.01666666
0.05          0.20   0.05
0.10          0.10   0.10
0.03333333    0.10   0.10

Weird tables (tables that are inconsistent with independence) are the ones with small probabilities. Unless you see a 3 in the third row and third column, you have not seen anything at all weird. The probability of seeing something as weird or weirder than a 3 in the third row and third column is $P = 0.03333333 + 0.01666666 = 0.05$. No table is significantly weird at the 0.01 level but the diagonal table with 3 in the third row and column is close with $P = 0.01666666$.

Rather than basing a test on which tables are weird, we could base a test on which values of $X^2$ or $G^2$ are weird. This time, along with the table probabilities, we report the values of $X^2$ and $G^2$. Note that the largest test statistic is that for the least probable table. The second largest statistic goes with the second least probable table.

Statistics and Table Probabilities, in the same order as the tables above
  X2      G2      Prob       X2      G2      Prob       X2      G2      Prob
 8.667   8.3178   0.05      6.167   5.5452   0.10     12.000  12.137   0.01666
 6.000   8.3178   0.05      2.166   2.7726   0.20      6.667   8.3178   0.05
 4.667   5.5452   0.10      4.167   5.5452   0.10      4.167   5.5452   0.10
 7.500   9.3643   0.03333   4.667   5.5452   0.10      4.667   5.5452   0.10

That the largest test statistics correspond to the weirdest tables is merely anecdotal evidence that the test statistics identify weird tables, i.e., that the test statistics are good at picking up deviations from independence/homogeneity. Again, the tables in question are only weird because we are assuming that the rows and columns are independent.

I computed the probabilities for this example by hand. That is not something I would do with a bigger table, in either the sense of more rows and columns or larger counts in the table, because both typically mean a longer list of possible tables.
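For a table this small, the listing can also be brute forced. A minimal sketch (my own, not from the book) that loops over the free cells, keeps the tables with nonnegative entries, and reproduces the 12 tables and their probabilities; exactly this kind of enumeration is what becomes impractical as tables grow.

# Enumerate all 3x3 tables with row margins (1,2,3) and column margins (1,2,3)
rt <- c(1,2,3); ct <- c(1,2,3)
tabs <- list()
for (n11 in 0:1) for (n12 in 0:1) for (n21 in 0:2) for (n22 in 0:2) {
   n13 <- rt[1]-n11-n12; n23 <- rt[2]-n21-n22
   n31 <- ct[1]-n11-n21; n32 <- ct[2]-n12-n22; n33 <- ct[3]-n13-n23
   m <- matrix(c(n11,n12,n13,n21,n22,n23,n31,n32,n33),3,3,byrow=TRUE)
   if (all(m >= 0)) tabs[[length(tabs)+1]] <- m
}
length(tabs)                                      # 12 tables
w <- sapply(tabs, function(m) 1/prod(factorial(m)))
sum(w)                                            # 5
round(w/sum(w), 8)                                # the table probabilities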

The next section illustrates the exact conditional test for the table

Observed Table
n_ij     D1   D2   D3   D4   Total
B1        2    1    1    1     5
B2        1    1    2    1     5
Total     3    2    3    2    10


However, it does so under the guise of testing the adequacy of whether one factor is independent of the other two in a three-way table. Before doing that, we examine the simpler three-way table test for the adequacy of whether two factors are independent given the third in a three-way table.

16.1.1 R code

Exact conditional tests for two-way tables can be performed in R using fisher.test. As shown in the next section, I used fisher.test to check my algebra on the 2×4 example table (and found some mistakes to correct). Some inconsistencies of method for 2×2 tables that exist in fisher.test are addressed by the new package exact2x2.
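As a quick illustration (my own, not from the book), fisher.test can be handed one of the 12 tables from the 3×3 example directly:

# Exact conditional test of independence for a small two-way table
tab <- matrix(c(1,0,0, 0,2,0, 0,0,3), 3, 3, byrow=TRUE)   # the diagonal table
fisher.test(tab)   # P = 0.0167: the probability of tables as or less probable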

I computed the probabilities for the 3×3 example by hand but code for computing the test statistics follows.

rm(list = ls())
row=c(1,1,1,2,2,2,3,3,3)
col=c(1,2,3,1,2,3,1,2,3)
row=factor(row)
col=factor(col)
t1=c(1,0,0,0,0,2,0,2,1)
t2=c(1,0,0,0,1,1,0,1,2)
t3=c(1,0,0,0,2,0,0,0,3)
t4=c(0,0,1,0,0,2,1,2,0)
t5=c(0,0,1,0,1,1,1,1,1)
t6=c(0,0,1,0,2,0,1,0,2)
t7=c(0,1,0,0,0,2,1,1,1)
t8=c(0,1,0,0,1,1,1,0,2)
t9=c(0,0,1,1,1,0,0,1,2)
t10=c(0,1,0,1,1,0,0,0,3)
t11=c(0,0,1,1,0,1,0,2,1)
t12=c(0,1,0,1,0,1,0,1,2)

fit1 = glm(t1 ~ row + col, poisson)
f1=chisq.test(t(matrix(t1,3,3)),correct=F)
c(f1$stat,fit1$dev)
fit1 = glm(t2 ~ row + col, poisson)
f1=chisq.test(t(matrix(t2,3,3)),correct=F)
c(f1$stat,fit1$dev)
fit1 = glm(t3 ~ row + col, poisson)
f1=chisq.test(t(matrix(t3,3,3)),correct=F)
c(f1$stat,fit1$dev)
fit1 = glm(t4 ~ row + col, poisson)


f1=chisq.test(t(matrix(t4,3,3)),correct=F)
c(f1$stat,fit1$dev)
fit1 = glm(t5 ~ row + col, poisson)
f1=chisq.test(t(matrix(t5,3,3)),correct=F)
c(f1$stat,fit1$dev)
fit1 = glm(t6 ~ row + col, poisson)
f1=chisq.test(t(matrix(t6,3,3)),correct=F)
c(f1$stat,fit1$dev)
fit1 = glm(t7 ~ row + col, poisson)
f1=chisq.test(t(matrix(t7,3,3)),correct=F)
c(f1$stat,fit1$dev)
fit1 = glm(t8 ~ row + col, poisson)
f1=chisq.test(t(matrix(t8,3,3)),correct=F)
c(f1$stat,fit1$dev)
fit1 = glm(t9 ~ row + col, poisson)
f1=chisq.test(t(matrix(t9,3,3)),correct=F)
c(f1$stat,fit1$dev)
fit1 = glm(t10 ~ row + col, poisson)
f1=chisq.test(t(matrix(t10,3,3)),correct=F)
c(f1$stat,fit1$dev)
fit1 = glm(t11 ~ row + col, poisson)
f1=chisq.test(t(matrix(t11,3,3)),correct=F)
c(f1$stat,fit1$dev)
fit1 = glm(t12 ~ row + col, poisson)
f1=chisq.test(t(matrix(t12,3,3)),correct=F)
c(f1$stat,fit1$dev)

16.2 Three-Factor Tables

We begin by developing an exact conditional test for the adequacy of the model $A \perp B \mid C$. Written in log-linear model shorthand the model is [AC][BC]. Testing this model amounts to testing whether the A and B factors are independent at every level of C and then combining the information across the various levels of C. We then go on to the more difficult problem of testing [B][AC]. This is more difficult because it places fewer constraints on the table, so there are more possible tables that we must consider.

In this section we discuss only the procedures. Their justification is a special case of the general methods discussed in Section 3.


16.2.1 Testing [AC][BC]

Consider the observed table

Observed Table
        C1                              C2
n_ij1   A1   A2   Total      n_ij2   A1   A2   Total
B1       2    1     3        B1       1    1     2
B2       1    1     2        B2       2    1     3
Total    3    2     5        Total    3    2     5

We want to condition on the sufficient statistics, which for this model are the $n_{i \cdot k}$ and $n_{\cdot jk}$ marginal tables. For the example these are

Sufficient statistics
n_i·k   C1   C2         n_·jk   C1   C2
A1       3    3         B1       3    2
A2       2    2         B2       2    3

Note that the sufficient statistics provide the row and column totals for both the $n_{ij1}$ two-way table and the $n_{ij2}$ table. We need to find all of the possible $n_{ij1}$ tables that satisfy their marginal constraints as well as all the $n_{ij2}$ tables that satisfy theirs. Since these are 2×2 tables with small counts it is easy to do. There are only two other possible tables in each case, so there are 3 possible $n_{ij1}$ tables (given on the left of Table 16.1) and three possible $n_{ij2}$ tables (on the right of Table 16.1) for a total of $3 \times 3 = 9$ possible tables. It is convenient to index them by their values of $(n_{221}, n_{222})$. Note that our original observed table has $(n_{221}, n_{222}) = (1,1)$.

Table 16.1 Possible [AC][BC] tables.

Possible C1 Tables              Possible C2 Tables
        C1                              C2
n_ij1   A1   A2   Total      n_ij2   A1   A2   Total
B1       1    2     3        B1       0    2     2
B2       2    0     2        B2       3    0     3
Total    3    2     5        Total    3    2     5

        C1                              C2
n_ij1   A1   A2   Total      n_ij2   A1   A2   Total
B1       2    1     3        B1       1    1     2
B2       1    1     2        B2       2    1     3
Total    3    2     5        Total    3    2     5

        C1                              C2
n_ij1   A1   A2   Total      n_ij2   A1   A2   Total
B1       3    0     3        B1       2    0     2
B2       0    2     2        B2       1    2     3
Total    3    2     5        Total    3    2     5


Similar to the previous section, the key computation is $1/\prod_{i=1}^{I}\prod_{j=1}^{J}\prod_{k=1}^{K} n_{ijk}!$. Regardless of the number of factors defining the table, this is the inverse of the product of the factorials of all the cell counts in the table. This number, divided by the sum of these numbers over the allowable tables, gives the conditional probability. A proof of these claims is in the next section. For the example the table probabilities appear in Table 16.2 along with $G^2$ for testing the conditional independence model versus the saturated model. Our observed data is table (1,1), which has quite a large probability, so displays no inconsistency with independence. As in the previous section, the tables with the smallest probabilities have the largest $G^2$.

Table 16.2 Conditional independence table information

                          Table index $(n_{221}, n_{222})$
                 (0,0)   (0,1)   (0,2)   (1,0)   (1,1)   (1,2)   (2,0)   (2,1)   (2,2)
1/∏ n_ijk!        1/48    1/8    1/16    1/24    1/4     1/8     1/144   1/24    1/48
Probability      3/100  18/100   9/100   6/100  36/100  18/100   1/100   6/100   3/100
G2               9.641   3.049   5.822   6.869  0.2769   3.049  13.460   6.869   9.641

Not surprisingly, the least probable table under conditional independence is

Table least consistent with $A \perp B \mid C$
        C1                              C2
n_ij1   A1   A2   Total      n_ij2   A1   A2   Total
B1       3    0     3        B1       0    2     2
B2       0    2     2        B2       3    0     3
Total    3    2     5        Total    3    2     5

This also has the largest $G^2$. Two other tables share the second smallest probability, 3/100. Individually, they are unusual, but collectively they share a P value of $2(0.03) + 0.01 = 0.07$.

Table 16.3 gives the exact conditional distribution of $G^2$. The P value associated with $G^2 = 13.46$ (the weirdest table) is $P = 0.01$. The probability of seeing $G^2 = 9.64$ is 0.06 with a P value of $0.06 + 0.01 = 0.07$. One unusual point is that $G^2 = 5.822$ is weirder than $G^2 = 6.869$. $G^2 = 5.822$ corresponds to one table with probability 0.09 whereas $G^2 = 6.869$ corresponds to tables with a smaller probability, 0.06, but there are two such tables. This suggests that we should focus on the weirdest tables rather than the weirdest $G^2$ values.

Table 16.3 Distribution of G2

G2            0.2769   3.049   5.822   6.869   9.641   13.460
Probability    0.36     0.36    0.09    0.12    0.06     0.01


16.2.1.1 Code for G2s in [AC][BC]

This code does not address the difficult problem of identifying the tables. It merely computes $G^2$ for the 9 listed tables.

t00=c(1,2,2,0,0,2,3,0)
t01=c(1,2,2,0,1,1,2,1)
t02=c(1,2,2,0,2,0,1,2)
t10=c(2,1,1,1,0,2,3,0)
t11=c(2,1,1,1,1,1,2,1)
t12=c(2,1,1,1,2,0,1,2)
t20=c(3,0,0,2,0,2,3,0)
t21=c(3,0,0,2,1,1,2,1)
t22=c(3,0,0,2,2,0,1,2)
i=c(1,2,1,2,1,2,1,2)
j=c(1,1,2,2,1,1,2,2)
k=c(1,1,1,1,2,2,2,2)
i=factor(i)
j=factor(j)
k=factor(k)
fit=glm(t00 ~ i:k + j:k, poisson)
fit$dev
fit=glm(t01 ~ i:k + j:k, poisson)
fit$dev
fit=glm(t02 ~ i:k + j:k, poisson)
fit$dev
fit=glm(t10 ~ i:k + j:k, poisson)
fit$dev
fit=glm(t11 ~ i:k + j:k, poisson)
fit$dev
fit=glm(t12 ~ i:k + j:k, poisson)
fit$dev
fit=glm(t20 ~ i:k + j:k, poisson)
fit$dev
fit=glm(t21 ~ i:k + j:k, poisson)
fit$dev
fit=glm(t22 ~ i:k + j:k, poisson)
fit$dev

16.2.2 Testing [B][AC]

Consider again the observed table


Observed Table
        C1                              C2
n_ij1   A1   A2   Total      n_ij2   A1   A2   Total
B1       2    1     3        B1       1    1     2
B2       1    1     2        B2       2    1     3
Total    3    2     5        Total    3    2     5

This time we want to test the model [B][AC], i.e., $B \perp AC$. For this model the sufficient statistics are the table $n_{i \cdot k}$ and the vector $n_{\cdot j \cdot}$.

Sufficient statistics
n_i·k   C1   C2         n_·j·
A1       3    3         B1   5
A2       2    2         B2   5

There are 28 tables that satisfy these marginal constraints. Nine of them consist of one table from the left of Table 16.1 and one table from the right of Table 16.1. Nine more involve reversing the roles of C1 and C2 in Table 16.1, so these nine consist of one table from the right of Table 16.1 and one table from the left of Table 16.1 but with the labels C1 and C2 reversed. Three more are given in Table 16.4 but another two tables can be obtained by switching the first two tables on the right hand side. The last five tables reverse the roles of C1 and C2 in the previous five tables. (All 28 tables are listed in the computing document.)

Table 16.4 Possible [B][AC] tables.

Possible C1 Tables              Possible C2 Tables
        C1                              C2
n_ij1   A1   A2   Total      n_ij2   A1   A2   Total
B1       2    2     4        B1       1    0     1
B2       1    0     1        B2       2    2     4
Total    3    2     5        Total    3    2     5

        C1                              C2
n_ij1   A1   A2   Total      n_ij2   A1   A2   Total
B1       3    1     4        B1       0    1     1
B2       0    1     1        B2       3    1     4
Total    3    2     5        Total    3    2     5

        C1                              C2
n_ij1   A1   A2   Total      n_ij2   A1   A2   Total
B1       3    2     5        B1       0    0     0
B2       0    0     0        B2       3    2     5
Total    3    2     5        Total    3    2     5

The exact conditional distribution of $G^2$ is given in Table 16.5. The first parenthetical entry in the second column tells how many of the 28 tables gave that particular $G^2$ value. For example, the penultimate entry in the second column indicates that there were 4 tables each having probability 3/252 that all gave $G^2 = 10.04$.


Similarly there were two tables having probability 36/252 that gave $G^2 = 0.68$. Our observed table gives the smallest possible value of $G^2$, so has a P value of 1. Again the tables with the smallest probabilities gave the highest $G^2$ values. Programs like R's fisher.test actually give as the P value the probability of seeing a table with probability as small or smaller than the observed table.

Table 16.5 Exact conditional distribution of G2 for [B][AC]

  G2     Probability        Probability
 0.68    [(2)(36)]/252        0.2857
 3.45    [(4)(18)]/252        0.2857
 6.22    [(4)(9)]/252         0.1429
 7.27    [(8)(6)]/252         0.1905
 8.32    [(2)(4)]/252         0.0317
10.04    [(4)(3)]/252         0.0476
13.86    [(4)(1)]/252         0.0159

16.2.2.1 Code for G2s in [AC][B]

I am not aware of any R packages or programs that enumerate all of the appropriate tables for general log-linear models. StatXact and LogXact are not R packages but they enumerate all the appropriate tables for many interesting special cases until the tables get too large for this to be practical.

The R program exactLoglinTest gives approximate exact P values for quite general log-linear models by sampling from the allowable tables. The program clogit from the survival package does exact logistic regression.
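A hedged sketch of how exactLoglinTest might be called on the [B][AC] test of this section; mcexact is the function I believe the package provides and nosim its Monte Carlo sample size, but treat the interface as an assumption and check the package documentation:

# Approximate exact P value for [B][AC] via Monte Carlo (assumed interface)
library(exactLoglinTest)
dat <- data.frame(y = c(2,1,1,1,1,1,2,1),
                  i = factor(c(1,2,1,2,1,2,1,2)),
                  j = factor(c(1,1,2,2,1,1,2,2)),
                  k = factor(c(1,1,1,1,2,2,2,2)))
mce <- mcexact(y ~ i:k + j, data = dat, nosim = 10^4)
summary(mce)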

I found the following tabulation listing half the tables to be very useful.


Table
(n·11, n221, n222)   ∏ n_ijk!   144/∏ n_ijk!    G2
 300                    48           3         10.04
 301                     8          18          3.45
 302                    16           9          6.22
 310                    24           6          7.27
 311                     4          36          0.68
 312                     8          18          3.45
 320                   144           1         13.86
 321                    24           6          7.27
 322                    48           3         10.04
 402                    16           9          6.22
 411                    36           4          8.32
 401                    24           6          7.27
 412                    24           6          7.27
 502                   144           1         13.86

The other 14 tables reverse C1 and C2, have the same probabilities (which are the entries in the third column divided by twice the third column total, i.e., 252) and the same $G^2$s. The tables are identified by $(n_{\cdot 11}, n_{221}, n_{222})$. Notice that all nine tables obtained from Table 16.1 by choosing one from the left and one from the right have $n_{\cdot 11} = 3$. All nine tables obtained from Table 16.1 by choosing one from the left and one from the right but then reversing the right and left have $n_{\cdot 11} = 2$. The five tables from Table 16.4 have $n_{\cdot 11} = 4, 5$ and reversing right and left gives $n_{\cdot 11} = 1, 0$.

There are redundancies in the following computations of $G^2$ because I wanted to list all 28 of the tables, which is not necessary. First are $G^2$ for tables defined by Table 16.1.

t300=c(1,2,2,0,0,2,3,0)
t301=c(1,2,2,0,1,1,2,1)
t302=c(1,2,2,0,2,0,1,2)
t310=c(2,1,1,1,0,2,3,0)
t311=c(2,1,1,1,1,1,2,1)
t312=c(2,1,1,1,2,0,1,2)
t320=c(3,0,0,2,0,2,3,0)
t321=c(3,0,0,2,1,1,2,1)
t322=c(3,0,0,2,2,0,1,2)
i=c(1,2,1,2,1,2,1,2)
j=c(1,1,2,2,1,1,2,2)
k=c(1,1,1,1,2,2,2,2)
i=factor(i)
j=factor(j)
k=factor(k)
fit=glm(t300 ~ i:k + j, poisson)
fit$dev
fit=glm(t301 ~ i:k + j, poisson)


fit$dev
fit=glm(t302 ~ i:k + j, poisson)
fit$dev
fit=glm(t310 ~ i:k + j, poisson)
fit$dev
fit=glm(t311 ~ i:k + j, poisson)
fit$dev
fit=glm(t312 ~ i:k + j, poisson)
fit$dev
fit=glm(t320 ~ i:k + j, poisson)
fit$dev
fit=glm(t321 ~ i:k + j, poisson)
fit$dev
fit=glm(t322 ~ i:k + j, poisson)
fit$dev

That gave us 9 of the 28 $G^2$ values.

To compute the next 9 $G^2$s for tables defined by reversing right and left sides of Table 16.1, essentially the same code is used but the new tables need to be listed. However, you do not need to run this code because the reversed tables give the same $G^2$s.

t200=c(0,2,3,0,1,2,2,0)
t210=c(1,1,2,1,1,2,2,0)
t220=c(2,0,1,2,1,2,2,0)
t201=c(0,2,3,0,2,1,1,1)
t211=c(1,1,2,1,2,1,1,1)
t221=c(2,0,1,2,2,1,1,1)
t202=c(0,2,3,0,3,0,0,2)
t212=c(1,1,2,1,3,0,0,2)
t222=c(2,0,1,2,3,0,0,2)

i=c(1,2,1,2,1,2,1,2)
j=c(1,1,2,2,1,1,2,2)
k=c(1,1,1,1,2,2,2,2)
i=factor(i)
j=factor(j)
k=factor(k)
fit=glm(t200 ~ i:k + j, poisson)
fit$dev
fit=glm(t210 ~ i:k + j, poisson)
fit$dev
fit=glm(t220 ~ i:k + j, poisson)
fit$dev
fit=glm(t201 ~ i:k + j, poisson)
fit$dev


fit=glm(t211 ~ i:k + j, poisson)
fit$dev
fit=glm(t221 ~ i:k + j, poisson)
fit$dev
fit=glm(t202 ~ i:k + j, poisson)
fit$dev
fit=glm(t212 ~ i:k + j, poisson)
fit$dev
fit=glm(t222 ~ i:k + j, poisson)
fit$dev

The final 10 $G^2$s require entering the five tables defined by Table 16.4 and the 5 tables that reverse the right and left sides of those.

# tables from Table 16.4
t402=c(2,2,1,0,1,0,2,2)
t401=c(2,2,1,0,0,1,3,1)
t411=c(3,1,0,1,0,1,3,1)
t412=c(3,1,0,1,1,0,2,2)
t502=c(3,2,0,0,0,0,3,2)
# reverse sides of Table 16.4
# G^2 same as first 5
t120=c(1,0,2,2,2,2,1,0)
t110=c(0,1,3,1,2,2,1,0)
t111=c(0,1,3,1,3,1,0,1)
t121=c(1,0,2,2,3,1,0,1)
t020=c(0,0,3,2,3,2,0,0)
i=c(1,2,1,2,1,2,1,2)
j=c(1,1,2,2,1,1,2,2)
k=c(1,1,1,1,2,2,2,2)
i=factor(i)
j=factor(j)
k=factor(k)
fit=glm(t402 ~ i:k + j, poisson)
fit$dev
fit=glm(t401 ~ i:k + j, poisson)
fit$dev
fit=glm(t411 ~ i:k + j, poisson)
fit$dev
fit=glm(t412 ~ i:k + j, poisson)
fit$dev
fit=glm(t502 ~ i:k + j, poisson)
fit$dev
fit=glm(t120 ~ i:k + j, poisson)
fit$dev
fit=glm(t110 ~ i:k + j, poisson)


fit$dev
fit=glm(t111 ~ i:k + j, poisson)
fit$dev
fit=glm(t121 ~ i:k + j, poisson)
fit$dev
fit=glm(t020 ~ i:k + j, poisson)
fit$dev

To run fisher.test on these, I need to rearrange how the table is entered.

T320=matrix(c(3,0,0,2,0,3,2,0),2)
T320
fisher.test(T320)

I chose to run table 320 because it is one of 4 tables that gives the highest $G^2$ value. Thus I could compare my probability of seeing this $G^2$ to the program's P value to see if they agree. (The first time I did it they did not, which led me to find some mistakes in algebra. I had actually been more concerned that I might have missed some of the possible tables, but my corrected algebra corresponded to the program, so I am confident I found all the tables.)

16.3 General Theory

Using notation from Chapters 10 and 12, consider a table with $q$ cells and multinomial sampling. The log-linear model is

$$\log(m) = X\beta,$$

so

$$p = \frac{1}{n_\cdot} e^{X\beta}.$$

The likelihood equations are

$$X'm = X'n$$

and sufficient statistics are $X'n$. Writing

$$X = \begin{bmatrix} x_1' \\ \vdots \\ x_q' \end{bmatrix},$$

gives

$$p_h = \frac{1}{n_\cdot} e^{x_h'\beta}$$

and


$$n'X = \sum_{h=1}^{q} n_h x_h'. \qquad (16.3.1)$$

Under our log-linear model, the probability of seeing the table n is,

$$\Pr(n|\beta) = \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\prod_{h=1}^{q} p_h^{n_h} = \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\prod_{h=1}^{q}\left[\frac{\exp(x_h'\beta)}{n_\cdot}\right]^{n_h}. \qquad (16.3.2)$$

We want to look at the conditional distribution of the tables $n$ given that the sufficient statistic is, say, $X'n = t$. The probability of getting the sufficient statistic equal to $t$ is the sum of the probabilities of the distinct tables that give $t$ as the sufficient statistic

$$\Pr(X'n = t|\beta) = \sum_{n \in \{n|X'n = t\}}\left[\frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\prod_{h=1}^{q} p_h^{n_h}\right] = \sum_{n \in \{n|X'n = t\}}\frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\prod_{h=1}^{q}\left[\frac{\exp(x_h'\beta)}{n_\cdot}\right]^{n_h}. \qquad (16.3.3)$$

The key is that the conditional distribution cannot depend on the $\beta$ parameters; to see that, we need to examine

$$\prod_{h=1}^{q}\left[\frac{\exp(x_h'\beta)}{n_\cdot}\right]^{n_h} = \prod_{h=1}^{q}\left[\frac{\exp(n_h x_h'\beta)}{(n_\cdot)^{n_h}}\right] = \frac{\exp(\sum_{h=1}^{q} n_h x_h'\beta)}{(n_\cdot)^{\sum_{h=1}^{q} n_h}} = \frac{\exp(n'X\beta)}{(n_\cdot)^{n_\cdot}} = \frac{\exp(t'\beta)}{(n_\cdot)^{n_\cdot}} \qquad (16.3.4)$$

where the penultimate equality holds from (1) and the last equality is because we are only considering tables that satisfy the sufficient statistic $X'n = t$.

From (2) and (3) the conditional probability of seeing a table $n$ with the sufficient statistics $X'n = t$ is

$$\Pr(n|X'n = t, \beta) = \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\prod_{h=1}^{q}\left[\frac{\exp(x_h'\beta)}{n_\cdot}\right]^{n_h}\Bigg/\sum_{n \in \{n|X'n = t\}}\frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\prod_{h=1}^{q}\left[\frac{\exp(x_h'\beta)}{n_\cdot}\right]^{n_h}.$$

Substituting (4) gives

$$\Pr(n|X'n = t, \beta) = \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\,\frac{\exp(t'\beta)}{(n_\cdot)^{n_\cdot}}\Bigg/\sum_{n \in \{n|X'n = t\}}\frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\,\frac{\exp(t'\beta)}{(n_\cdot)^{n_\cdot}}$$


and canceling the last terms (something that requires $J \in C(X)$ so that $n_\cdot = J'n$ is the same in all the tables) gives

$$\Pr(n|X'n = t) = \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\Bigg/\sum_{n \in \{n|X'n = t\}}\frac{n_\cdot!}{\prod_{h=1}^{q} n_h!} = \frac{1}{\prod_{h=1}^{q} n_h!}\Bigg/\sum_{n \in \{n|X'n = t\}}\frac{1}{\prod_{h=1}^{q} n_h!},$$

which no longer depends on the unknown parameter vector $\beta$. The conditional probabilities are clearly numbers between 0 and 1 that sum up to 1 over the allowable tables.

Again, the real problem here is in finding the set $\{n|X'n = t\}$. For a fixed $q$, augmenting $X$ to increase the rank of $X$ should result in fewer tables $n$ satisfying a condition $X'n = t$. (In other words, the bigger the model, the fewer the number of tables to consider.)

In Section 1, instead of finding $\sum_{n \in \{n|X'n = t\}} 1/\prod_{h=1}^{q} n_h!$, we used an alternative argument based on the row and column marginal tables being independent. We noted that the claimed probabilities summed to 1. If we have listed all of the tables satisfying $\{n|X'n = t\}$, we do not need a direct method of computing the probability of $X'n = t$ (which is what we were doing when employing independence of the row and column marginal tables), we need only compute the sum as indicated.

The discussion of product multinomial and Poisson sampling in Chapters 10 and 12 makes it clear that similar arguments hold for them, provided the log-linear model contains terms corresponding to the product-multinomial sampling, so that the likelihood equations include the product-multinomial sampling constraints, or contains an intercept for Poisson sampling. Moreover, since logistic regression models are a subset of log-linear models, the argument holds for them also. In fact, Bedrick and Hill (1992) make essentially the same argument as employed here for logistic regression models.

The first two paragraphs of Subsection 16.2.2.1 address general computing issues.

16.4 Model Testing

We begin by illustrating the procedure for testing a full versus a reduced model on our example table. We then give the general justification.

To test [AC][BC] versus [B][AC] we have to find all of the tables with the observed values of the sufficient statistics for [B][AC], those being the $n_{\cdot j \cdot}$s and the $n_{i \cdot k}$s. In our example, there are 28 such tables. Among those 28 we have to find all the tables for which the [AC][BC] sufficient statistics are identical. The [AC][BC] sufficient statistics are the $n_{\cdot jk}$s and the $n_{i \cdot k}$s. Because the $n_{i \cdot k}$s are involved in both models, all 28 tables have identical values. What we need to do is associate each of the 28


tables with their values for the $n_{\cdot jk}$s. Every one of the 28 tables with the same $n_{\cdot jk}$ table will have the same $G^2$ for testing [AC][BC] versus [B][AC] because the model testing $G^2$ only depends on the sufficient statistics for the two models.

Table 16.6 summarizes the computation. For every allowable marginal table $n_{\cdot jk}$ it indicates how many of the 28 tables have that marginal table, it gives $G^2$ for testing the models, and it gives the probability of seeing that marginal table. The probability for each collection of tables with a fixed $n_{\cdot jk}$ table is the sum of the probabilities of those distinct tables. In Subsection 16.2.2 we had to find the probabilities for all 28 tables, so this is just a matter of determining which probabilities to add. Computationally, I chose a method for identifying the 28 tables that made that easy.

Table 16.6 Possible statistics n· jk consistent with observed values of ni·k and n· j·.

n_·jk Tables (# of tables)           n_·jk Tables (# of tables)

n_·jk   C1   C2   (9)                n_·jk   C1   C2   (9)
B1       3    2   G2 = 0.40          B1       2    3   G2 = 0.40
B2       2    3   Pr = 100/252       B2       3    2   Pr = 100/252

n_·jk   C1   C2   (4)                n_·jk   C1   C2   (4)
B1       4    1   G2 = 3.85          B1       1    4   G2 = 3.85
B2       1    4   Pr = 25/252        B2       4    1   Pr = 25/252

n_·jk   C1   C2   (1)                n_·jk   C1   C2   (1)
B1       5    0   G2 = 13.86         B1       0    5   G2 = 13.86
B2       0    5   Pr = 1/252         B2       5    0   Pr = 1/252

The 28 tables are required to have $n_{\cdot j \cdot}$ agree with the observed table values of 5 and 5. Also, the 28 tables are required to have the $n_{i \cdot k}$ table agree with the observed table. This implies that the $n_{\cdot\cdot k}$ table has to agree with the observed table, and these values are also both 5. So all of the tables in Table 16.6 have the same marginal totals.

From the probabilities and $G^2$s in Table 16.6 we can determine the exact conditional distribution of $G^2$ for testing the models. The exact conditional distribution of $G^2$ only takes on three values: $\Pr(G^2 = 13.86) = 2/252 = 0.008$, $\Pr(G^2 = 3.85) = 50/252 = 0.198$, and $\Pr(G^2 = 0.40) = 200/252 = 0.794$. It is not a very interesting distribution but it does establish that seeing $G^2 = 13.86$ is odd enough to reject the model [B][AC] in favor of the full model [AC][BC]. The actual observed table again has a P value of 1.
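Each model-testing $G^2$ in Table 16.6 is just the difference between the goodness-of-fit $G^2$s already computed for the two models. A small sketch for the observed table, reusing the vectors from the earlier code:

# G^2 for testing [B][AC] (reduced) against [AC][BC] (full)
t311 <- c(2,1,1,1,1,1,2,1)
i <- factor(c(1,2,1,2,1,2,1,2))
j <- factor(c(1,1,2,2,1,1,2,2))
k <- factor(c(1,1,1,1,2,2,2,2))
full <- glm(t311 ~ i:k + j:k, poisson)   # [AC][BC]
red  <- glm(t311 ~ i:k + j,   poisson)   # [B][AC]
red$dev - full$dev                       # 0.68 - 0.28 = 0.40, as in Table 16.6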

16.4.1 General Theory

Consider testing a full model

$$\log(m) = X\beta,$$


against a reduced model

$$\log(m) = X_0\gamma, \qquad C(X_0) \subset C(X).$$

Assuming the reduced (null) model is true, we are going to use the distribution of the sufficient statistic for the full model, $X'n$, given the value of the sufficient statistic for the reduced model, $X_0'n$. We use this because we can show that under the null model, the distribution does not depend on the $\gamma$ parameters.

The distribution of the sufficient statistic for the full model, $\Pr(X'n = t)$, as presented in (16.3.2) and simplified by (16.3.4), is

$$\Pr(X'n = t \mid \beta) = \sum_{n\in\{n \mid X'n=t\}} \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\left[\frac{\exp(n'X\beta)}{(n_\cdot)^{n_\cdot}}\right] = \sum_{n\in\{n \mid X'n=t\}} \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\left[\frac{\exp(t'\beta)}{(n_\cdot)^{n_\cdot}}\right].$$

Because $C(X_0) \subset C(X)$, there exists $B$ such that $X_0 = XB$. In particular, if $t' = n'X$ then $t'B = n'XB = n'X_0 \equiv t_0'$, so any table $n$ with $X'n = t$ must automatically satisfy $X_0'n = t_0$. If the reduced model happens to be true, $X\beta = X_0\gamma$ and the probability becomes

$$\Pr(X'n = t \mid \gamma) = \sum_{n\in\{n \mid X'n=t\}} \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\left[\frac{\exp(n'X_0\gamma)}{(n_\cdot)^{n_\cdot}}\right] = \sum_{n\in\{n \mid X'n=t\}} \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\left[\frac{\exp(t_0'\gamma)}{(n_\cdot)^{n_\cdot}}\right]. \qquad (16.4.1)$$
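As a numerical aside, a matrix $B$ with $X_0 = XB$ can be computed explicitly for the example's models. Here is a sketch using ginv from the MASS package (already on this book's package list); the design matrices produced by model.matrix are just one possible coding of [AC][BC] and [B][AC].

library(MASS)
i=factor(c(1,2,1,2,1,2,1,2))
j=factor(c(1,1,2,2,1,1,2,2))
k=factor(c(1,1,1,1,2,2,2,2))
X=model.matrix(~ i:k + j:k)   # a design matrix for [AC][BC]
X0=model.matrix(~ i:k + j)    # a design matrix for [B][AC]
B=ginv(X) %*% X0              # one solution of X B = X0
range(X %*% B - X0)           # essentially zero because C(X0) is in C(X)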

Let $n_{obs}$ denote the observed table and redefine $t_0 \equiv X_0'n_{obs}$. We need to find the possible values $t = X'n$ that could arise from different tables $n$ that satisfy $X_0'n = t_0$, and the probability of seeing a table with $t = X'n$ given that the table has $X_0'n = t_0$. Our particular interest is in how weird it is to see the value $t_{obs} \equiv X'n_{obs}$ that we actually observed. To that end we examine $\Pr(X'n = t \mid X_0'n = t_0, \gamma)$ and focus on $\Pr(X'n = t_{obs} \mid X_0'n = t_0, \gamma)$.

We need

$$\Pr(X'n = t \mid X_0'n = t_0, \gamma) = \Pr(X'n = t;\, X_0'n = t_0 \mid \gamma)\big/\Pr(X_0'n = t_0 \mid \gamma).$$

For the reduced model

$$\Pr(X_0'n = t_0 \mid \gamma) = \sum_{n\in\{n \mid X_0'n=t_0\}} \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\left[\frac{\exp(t_0'\gamma)}{(n_\cdot)^{n_\cdot}}\right]. \qquad (16.4.2)$$

Similar to (16.4.1),


$$\Pr(X'n = t;\, X_0'n = t_0 \mid \gamma) = \sum_{n\in\{n \mid X'n=t;\,X_0'n=t_0\}} \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\left[\frac{\exp(t_0'\gamma)}{(n_\cdot)^{n_\cdot}}\right].$$

Therefore,

$$
\begin{aligned}
\Pr(X'n = t \mid X_0'n = t_0, \gamma)
&= \Pr(X'n = t;\, X_0'n = t_0 \mid \gamma)\big/\Pr(X_0'n = t_0 \mid \gamma)\\
&= \sum_{n\in\{n \mid X'n=t;\,X_0'n=t_0\}} \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\left[\frac{\exp(t_0'\gamma)}{(n_\cdot)^{n_\cdot}}\right] \bigg/ \sum_{n\in\{n \mid X_0'n=t_0\}} \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\left[\frac{\exp(t_0'\gamma)}{(n_\cdot)^{n_\cdot}}\right]\\
&= \sum_{n\in\{n \mid X'n=t;\,X_0'n=t_0\}} \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!} \bigg/ \sum_{n\in\{n \mid X_0'n=t_0\}} \frac{n_\cdot!}{\prod_{h=1}^{q} n_h!}\\
&= \sum_{n\in\{n \mid X'n=t;\,X_0'n=t_0\}} \frac{1}{\prod_{h=1}^{q} n_h!} \bigg/ \sum_{n\in\{n \mid X_0'n=t_0\}} \frac{1}{\prod_{h=1}^{q} n_h!},
\end{aligned}
$$

which no longer depends on $\gamma$. What we need to do is find all the tables with $X_0'n = t_0$ and sort them into equivalence classes in which all the tables in a class have $X'n = t_\xi$ for some vector $t_\xi$. Special interest lies in the table with $t_\xi = t_{obs}$. Once we have the equivalence classes, we add up the table probabilities for each member of the class.
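The last line of the display is all that is needed for computing. Here is a minimal sketch, assuming the hard enumeration step has already been done, so that the rows of a (hypothetical) matrix tables are exactly the tables $n$ with $X_0'n = t_0$ and X is the full-model design matrix; neither input is computed here.

# Exact conditional probabilities from the final formula above.
# Each row of 'tables' is a table n with X0'n = t0; each row of
# tables %*% X is the corresponding t' = (X'n)'.
exact.cond.probs=function(tables, X) {
  w=apply(tables, 1, function(n) 1/prod(factorial(n)))  # 1/prod(nh!)
  class.id=apply(tables %*% X, 1, paste, collapse=",")  # equivalence classes
  tapply(w, class.id, sum)/sum(w)                       # class probabilities
}

Applied to the 28 tables of the example with a design matrix for [AC][BC], this should reproduce the probabilities 200/252, 50/252, and 2/252.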

16.4.2 Computing

The exact inference aspects of the program LogXact seem to focus on model comparisons that involve dropping one parameter out of a model. Although the program allows fitting factor terms, it seems to require the specific parameterization in which the group with the largest index has its parameter set to 0.

The work of identifying tables for the example was done without a computer. This discussion is only about finding the $G^2$s.

Earlier I indexed the tables by $(n_{\cdot 11}, n_{221}, n_{222})$. Every table with the same value of $n_{\cdot 11}$ has the same marginal table $n_{\cdot jk}$ because the $2\times 2$ marginal table has fixed margins $n_{\cdot j\cdot}$ and $n_{\cdot\cdot k}$, so the entry $n_{\cdot 11}$ determines the entire $n_{\cdot jk}$ table. (This would get much harder if $n_{\cdot jk}$ were not $2\times 2$.) When there are more than two tables $n$ for a value of $n_{\cdot 11}$ I have done the computation for two of the tables just to illustrate that all tables with a fixed $n_{\cdot 11}$ have the same $G^2$.

rm(list = ls())
i=c(1,2,1,2,1,2,1,2)
j=c(1,1,2,2,1,1,2,2)
k=c(1,1,1,1,2,2,2,2)
i=factor(i)
j=factor(j)
k=factor(k)


t300=c(1,2,2,0,0,2,3,0)
t322=c(3,0,0,2,2,0,1,2)

fit=glm(t300 ~ i:k + j, poisson)
fit1=glm(t300 ~ i:k + j:k, poisson)
fit$dev-fit1$dev

fit=glm(t322 ~ i:k + j, poisson)
fit1=glm(t322 ~ i:k + j:k, poisson)
fit$dev-fit1$dev

t201=c(0,2,3,0,2,1,1,1)
t211=c(1,1,2,1,2,1,1,1)

fit=glm(t201 ~ i:k + j, poisson)
fit1=glm(t201 ~ i:k + j:k, poisson)
fit$dev-fit1$dev

fit=glm(t211 ~ i:k + j, poisson)
fit1=glm(t211 ~ i:k + j:k, poisson)
fit$dev-fit1$dev

t402=c(2,2,1,0,1,0,2,2)
t412=c(3,1,0,1,1,0,2,2)
t502=c(3,2,0,0,0,0,3,2)

fit=glm(t402 ~ i:k + j, poisson)
fit1=glm(t402 ~ i:k + j:k, poisson)
fit$dev-fit1$dev

fit=glm(t412 ~ i:k + j, poisson)
fit1=glm(t412 ~ i:k + j:k, poisson)
fit$dev-fit1$dev

fit=glm(t502 ~ i:k + j, poisson)
fit1=glm(t502 ~ i:k + j:k, poisson)
fit$dev-fit1$dev

t110=c(0,1,3,1,2,2,1,0)
t111=c(0,1,3,1,3,1,0,1)
t020=c(0,0,3,2,3,2,0,0)

fit=glm(t110 ~ i:k + j, poisson)
fit1=glm(t110 ~ i:k + j:k, poisson)
fit$dev-fit1$dev


fit=glm(t111 ~ i:k + j, poisson)
fit1=glm(t111 ~ i:k + j:k, poisson)
fit$dev-fit1$dev

fit=glm(t020 ~ i:k + j, poisson)
fit1=glm(t020 ~ i:k + j:k, poisson)
fit$dev-fit1$dev
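To confirm that a pair like t300 and t322 really does collapse to the same marginal table $n_{\cdot jk}$ (and so must give the same $G^2$), the margins can be checked directly with xtabs and the factors defined above:

# Both vectors have the same n.jk marginal table, so they lie in the
# same equivalence class and give the same G^2.
xtabs(t300 ~ j + k)
xtabs(t322 ~ j + k)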

The huge issue is finding all of the tables that give a fixed set of sufficient statistics.
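For this example the enumeration can actually be done by brute force. The sketch below fixes the four $n_{i\cdot k}$ values implied by the t-vectors above ($n_{1\cdot 1}=3$, $n_{2\cdot 1}=2$, $n_{1\cdot 2}=3$, $n_{2\cdot 2}=2$), lets the four $j=1$ cells range over their possible values subject to $n_{\cdot 1\cdot}=5$, and confirms that exactly 28 tables result, with just the three $G^2$ values of Table 16.6.

# Enumerate all tables with the [B][AC] sufficient statistics of the example.
i=factor(c(1,2,1,2,1,2,1,2))
j=factor(c(1,1,2,2,1,1,2,2))
k=factor(c(1,1,1,1,2,2,2,2))
nik=c(3,2,3,2)                  # n1.1, n2.1, n1.2, n2.2
grid=expand.grid(a=0:3,b=0:2,c=0:3,d=0:2)   # j=1 cell for each (i,k)
grid=grid[rowSums(grid)==5,]    # n.1. = 5, which forces n.2. = 5
nrow(grid)                      # 28
G2=apply(grid,1,function(x){
  # rebuild the table in (i,j,k) order; j=2 cells are the ni.k remainders
  n=c(x[1],x[2],nik[1]-x[1],nik[2]-x[2],x[3],x[4],nik[3]-x[3],nik[4]-x[4])
  fit=glm(n ~ i:k + j, poisson)     # boundary tables may generate warnings
  fit1=glm(n ~ i:k + j:k, poisson)
  fit$dev-fit1$dev
})
table(round(G2,2))              # 0.4: 18 tables, 3.85: 8, 13.86: 2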

16.5 Notes and References

fisher.test implements the network algorithm developed by Mehta and Patel (1983, 1986) and improved by Clarkson, Fan and Joe (1993). Two-sided tests are based on the probabilities of the tables, and take as 'more extreme' all tables with probabilities less than or equal to that of the observed table, the $P$ value being the sum of such probabilities.
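As a reminder of the basic call (the counts in the matrix below are purely illustrative):

# Fisher's exact test on a 2 x 2 table; the two-sided P value is the sum
# of the probabilities of all tables no more probable than the observed one.
x=matrix(c(3,2,2,3),nrow=2)
fisher.test(x)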

Agresti, A. (1992). A survey of exact inference for contingency tables. Statistical Science, 7, 131-177.

Bedrick, E. J. and Hill, J. R. (1992). [A survey of exact inference for contingency tables]: Comment. Statistical Science, 7, 153-157.

Clarkson, D. B., Fan, Y. and Joe, H. (1993). A remark on Algorithm 643: FEXACT: An algorithm for performing Fisher's exact test in r×c contingency tables. ACM Transactions on Mathematical Software, 19, 484-488.

Davis, L. J. (1986). Exact tests for 2×2 contingency tables. The American Statistician, 40, 139-141.

Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society, Series A, 98, 39-54.

Hirji, K. F., Mehta, C. R., and Patel, N. R. (1987). Computing distributions for exact logistic regression. Journal of the American Statistical Association, 82, 1110-1117.

Mehta, C. R. and Patel, N. R. (1983). A network algorithm for performing Fisher's exact test in r×c contingency tables. Journal of the American Statistical Association, 78, 427-434.

Mehta, C. R. and Patel, N. R. (1986). Algorithm 643: FEXACT, a FORTRAN subroutine for Fisher's exact test on unordered r×c contingency tables. ACM Transactions on Mathematical Software, 12, 154-161.

Patefield, W. M. (1981). Algorithm AS 159: An efficient method of generating r×c tables with given row and column totals. Applied Statistics, 30, 91-97.


Chapter 17
Polya Trees

No, there will be no chapter on Polya trees in this book. Someone just wanted these plots.

17.0.1 Alas

[Figure: plot with axis marks at μ−2σ, μ, μ+2σ.]
Fig. 17.1 OpenBUGS screenshot.


[Figure: plot with axis marks at μ−2σ, μ, μ+2σ and labels θ11, θ12.]
Fig. 17.2 OpenBUGS screenshot.

[Figure: plot with axis marks at μ−2σ, μ, μ+2σ and labels θ11θ21, θ11θ22, θ12θ23, θ12θ24.]
Fig. 17.3 OpenBUGS screenshot.


[Figure: plot with axis marks at μ−2σ, μ, μ+2σ.]
Fig. 17.4 OpenBUGS screenshot.


Index

1s, 128
∼, 63, 99
BIDA, 61, 97
burn-in, 61, 98
CA, 125
copying commands
    problems, vii
Correspondence Analysis, 125
GUI
    graphical user interface, 100
Markov chain Monte Carlo, 61, 97
MCA, 125
McMC, 61, 97
precision, 63, 99
Singular Value Decomposition, 125
singular value decomposition, 125
SVD, 125
thinning, 61, 98
tilde, 63, 99
    copying problem, vii
