Linear statistical models 2009 Count data Contingency tables and log-linear models Poisson regression


Linear statistical models 2009

Count data

Contingency tables and log-linear models

Poisson regression


Contingency tables and log-linear models

Observed frequencies:

Heart problems    Snores seldom    Snores often    Total
Yes                          59              51      110
No                         1958             416     2374
Total                      2017             467     2484

Cell probabilities:

Heart problems    Snores seldom    Snores often    Total
Yes               p11              p12             p1.
No                p21              p22             p2.
Total             p.1              p.2             1

Expected frequency:

$$\mu_{ij} = n p_{ij}$$

Log-linear models are linear models of the log expected frequency (log is used as the link function).


A log-linear model for independence

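Under independence, the expected frequencies are

$$\mu_{ij} = n p_{ij} = n p_{i\cdot} p_{\cdot j}$$

so that

$$\log(\mu_{ij}) = \log(n) + \log(p_{i\cdot}) + \log(p_{\cdot j}) = \mu + \alpha_i + \beta_j$$

The last parameter of each kind can be set to zero ($\alpha_2 = \beta_2 = 0$), which gives

$$\begin{pmatrix} \log(\mu_{11}) \\ \log(\mu_{12}) \\ \log(\mu_{21}) \\ \log(\mu_{22}) \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} \mu \\ \alpha_1 \\ \beta_1 \end{pmatrix}$$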


The saturated log-linear model

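$$\mu_{ij} = n p_{ij}$$

$$\log(\mu_{ij}) = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$$

Independence can be tested by relating the difference in deviance D2 - D1 to a χ² distribution with df2 - df1 degrees of freedom, where D2 and df2 belong to the independence model and D1 and df1 to the saturated model.

What are D1 and df1 for the saturated model?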
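The deviance comparison can be worked through numerically for the snoring table. The following is a minimal sketch in Python (standard library only, used here purely as a calculator alongside the SAS analysis); the function and variable names are illustrative assumptions. It relies on two facts: the independence model has the closed-form fit (row total × column total) / n, and the saturated model reproduces the observed counts exactly, so its deviance and degrees of freedom are both zero.

```python
import math

# Observed counts: rows = heart problems (Yes, No), columns = snores (Seldom, Often).
observed = [[59, 51], [1958, 416]]
n = sum(map(sum, observed))
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]

def deviance(obs, fitted):
    """Poisson deviance 2 * sum(obs * log(obs / fitted)).

    The additional term sum(obs - fitted) is omitted because both models
    considered here reproduce the grand total, so it is zero.
    """
    return 2.0 * sum(o * math.log(o / f)
                     for orow, frow in zip(obs, fitted)
                     for o, f in zip(orow, frow) if o > 0)

# Independence model: fitted value = row total * column total / n.
fit_indep = [[r * c / n for c in col_tot] for r in row_tot]
d2, df2 = deviance(observed, fit_indep), 1   # 4 cells - 3 free parameters
d1, df1 = deviance(observed, observed), 0    # saturated: fitted = observed

stat, df = d2 - d1, df2 - df1
# Upper tail of a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2.0))
print(round(stat, 2), df, p_value)
```

The `erfc` shortcut for the p-value is only valid for one degree of freedom; larger tables would need a general chi-square tail function.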


Analysis of example data (1)

/* Log-linear independence model: main effects only, Poisson counts, log link */
proc genmod data=linear.snoring;
  class snore heart;
  model count = snore heart / link=log dist=Poisson;
run;

Can a Poisson distribution be justified?

Snore     Heart    Count
Often     Yes         51
Often     No         416
Seldom    Yes         59
Seldom    No        1958


Analysis of example data (2)

Analysis Of Parameter Estimates

Parameter       DF    Estimate    Standard Error    Wald 95% Confidence Limits    Chi-Square    Pr > ChiSq
Intercept        1      4.4922            0.0958       4.3044      4.6801            2197.28        <.0001
Snore Often      1     -1.4630            0.0514      -1.5637     -1.3624             811.67        <.0001
Snore Seldom     0      0.0000            0.0000       0.0000      0.0000                  .             .
Heart No         1      3.0719            0.0975       2.8807      3.2630             992.02        <.0001
Heart Yes        0      0.0000            0.0000       0.0000      0.0000                  .             .
Scale            0      1.0000            0.0000       1.0000      1.0000

Estimates of log(μ). The reference cell is Snore = Seldom, Heart = Yes, so the intercept 4.4922 is the estimate for that cell:

Heart problems    Seldom    Often
Yes               4.4922    3.0292
No                7.5641    6.1009
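As a cross-check on the parameter estimates, the fitted log expected frequencies can be reproduced by hand, because the independence model has the closed-form fit n·p_i.·p_.j. The sketch below uses Python rather than SAS purely as a calculator; the variable names are illustrative, and the choice of reference levels (Heart = Yes, Snore = Seldom) follows the rows with zero estimates in the output.

```python
import math

# Observed counts from the snoring example, keyed by (heart problems, snores).
counts = {
    ("Yes", "Seldom"): 59, ("Yes", "Often"): 51,
    ("No", "Seldom"): 1958, ("No", "Often"): 416,
}
n = sum(counts.values())

# Marginal totals for each level of the two factors.
heart_tot = {}
snore_tot = {}
for (h, s), c in counts.items():
    heart_tot[h] = heart_tot.get(h, 0) + c
    snore_tot[s] = snore_tot.get(s, 0) + c

# Under independence the fitted expected frequency is n * p_i. * p_.j,
# i.e. (row total * column total) / n.
log_mu = {
    (h, s): math.log(heart_tot[h] * snore_tot[s] / n)
    for h in heart_tot for s in snore_tot
}

# Parameters with Heart = Yes and Snore = Seldom as reference levels.
intercept = log_mu[("Yes", "Seldom")]
snore_often = log_mu[("Yes", "Often")] - intercept
heart_no = log_mu[("No", "Seldom")] - intercept

for cell, value in sorted(log_mu.items()):
    print(cell, round(value, 4))
print(round(intercept, 4), round(snore_often, 4), round(heart_no, 4))
```

The three derived parameters should agree with the Intercept, Snore Often and Heart No estimates to about four decimals, which also shows that the intercept belongs to the cell with seldom snoring and heart problems.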


Contingency table with one response variable

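Consider the example data written in the following form, where Heart is the number of persons with heart problems and Total is the group size (Snore = Yes corresponds to snoring often):

Snore    Heart    Total
Yes         51      467
No          59     2017

/* Logistic regression on grouped data: events/trials syntax, logit link */
proc genmod data=linear.snoring2;
  class snore;
  model heart/total = snore / link=logit dist=binomial;
run;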


Analysis of example data (2)

Analysis Of Parameter Estimates

Parameter    DF    Estimate    Standard Error    Wald 95% Confidence Limits    Chi-Square    Pr > ChiSq
Intercept     1     -2.0989            0.1484      -2.3896     -1.8081             200.13        <.0001
Snore No      1     -1.4033            0.1987      -1.7927     -1.0139              49.89        <.0001
Snore Yes     0      0.0000            0.0000       0.0000      0.0000                  .             .
Scale         0      1.0000            0.0000       1.0000      1.0000

Fitted values:

Snore    log(p/(1-p))    p
Yes      -2.0989         0.109204
No       -3.5022         0.02925

Observed relative frequencies:

Snore    Heart    Total    Rel. frequency
Yes         51      467    0.109208
No          59     2017    0.029251
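The step from the logit estimates to the fitted probabilities is the inverse logit, p = 1 / (1 + exp(-η)). A minimal Python sketch of that arithmetic follows (the helper name `inv_logit` is an illustrative assumption); it also derives the odds ratio implied by the Snore coefficient, which the slide does not show.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

intercept = -2.0989   # estimate for the reference level Snore = Yes
snore_no = -1.4033    # additional effect for Snore = No

p_yes = inv_logit(intercept)
p_no = inv_logit(intercept + snore_no)

# Observed relative frequencies from the grouped data.
freq_yes = 51 / 467
freq_no = 59 / 2017

# Odds of heart problems for snorers relative to non-snorers.
odds_ratio = math.exp(-snore_no)

print(round(p_yes, 6), round(freq_yes, 6))
print(round(p_no, 6), round(freq_no, 6))
print(round(odds_ratio, 2))
```

Because the model has one parameter per group, it is saturated for these data, which is why the fitted probabilities match the relative frequencies up to rounding of the printed estimates.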


The multinomial distribution

Consider a nominal random variable that takes k distinct values with probabilities p1, p2, …, pk.

Assume that we have made n independent observations of that variable.

Then

$$P(n_1, n_2, \ldots, n_k) = \frac{n!}{n_1!\, n_2! \cdots n_k!}\; p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$$

where nj is the number of times the jth value is observed.

Note that n is fixed in a multinomial distribution.

If the observations arrive randomly, so that the total n is itself random, a Poisson distribution is usually preferable.
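To make the multinomial probability concrete, here is a small Python sketch that evaluates it directly from the definition. The function name and the example probabilities (0.5, 0.3, 0.2) are hypothetical and chosen only for illustration.

```python
import math
from itertools import product

def multinomial_pmf(counts, probs):
    """P(n1, ..., nk) = n! / (n1! ... nk!) * p1^n1 * ... * pk^nk."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)   # each partial quotient is an integer
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

# Hypothetical example: k = 3 categories, n = 4 observations.
probs = [0.5, 0.3, 0.2]
example = multinomial_pmf([2, 1, 1], probs)   # 12 * 0.25 * 0.3 * 0.2

# The probabilities of all outcomes with n = 4 should add up to one.
total = sum(multinomial_pmf(c, probs)
            for c in product(range(5), repeat=3) if sum(c) == 4)
print(example, total)
```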


Higher order tables

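Consider the following data on drug use

Alcohol    Cigarette    Marijuana    Count
yes        yes          yes            911
yes        yes          no             538
yes        no           yes             44
yes        no           no             456
no         yes          yes              3
no         yes          no              43
no         no           yes              2
no         no           no             279

Model:

$$\log(\mu_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk}$$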


Terminology

A = alcohol, C = cigarette, M = marijuana

Model A C M: mutual independence model

Model A C M A*C A*M C*M: homogeneous association model

Model A C M A*C A*M: model in which C and M are conditionally independent, given A
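The three models can be compared on the drug-use counts by their deviances. The sketch below is written in Python and uses iterative proportional fitting (IPF), which is a technique swapped in here for illustration and not the method used by `proc genmod`; for hierarchical log-linear models it converges to the same maximum-likelihood fitted values. All function names are illustrative assumptions.

```python
import math
from itertools import product

# Counts indexed by (alcohol, cigarette, marijuana); 0 = yes, 1 = no.
counts = {
    (0, 0, 0): 911, (0, 0, 1): 538, (0, 1, 0): 44, (0, 1, 1): 456,
    (1, 0, 0): 3,   (1, 0, 1): 43,  (1, 1, 0): 2,  (1, 1, 1): 279,
}
cells = list(product(range(2), repeat=3))

def margin(table, axes):
    """Sum the table over all variables not listed in axes."""
    out = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + value
    return out

def fit_ipf(terms, tol=1e-10, max_iter=1000):
    """Rescale the fitted table until it reproduces the observed margin
    of every highest-order term in the model."""
    fitted = {cell: 1.0 for cell in cells}
    targets = [margin(counts, axes) for axes in terms]
    for _ in range(max_iter):
        change = 0.0
        for axes, target in zip(terms, targets):
            current = margin(fitted, axes)
            for cell in cells:
                key = tuple(cell[a] for a in axes)
                new = fitted[cell] * target[key] / current[key]
                change = max(change, abs(new - fitted[cell]))
                fitted[cell] = new
        if change < tol:
            break
    return fitted

def deviance(fitted):
    """Poisson deviance; valid in this form because the totals agree."""
    return 2.0 * sum(c * math.log(c / fitted[cell])
                     for cell, c in counts.items() if c > 0)

A, C, M = 0, 1, 2
models = {
    "A C M": [(A,), (C,), (M,)],                      # mutual independence
    "A C M A*C A*M": [(A, C), (A, M)],                # C, M independent given A
    "A C M A*C A*M C*M": [(A, C), (A, M), (C, M)],    # homogeneous association
}
results = {name: deviance(fit_ipf(terms)) for name, terms in models.items()}
for name, d in results.items():
    print(name, round(d, 2))
```

The residual degrees of freedom are 4, 2 and 1 respectively (8 cells minus the number of free parameters), so each deviance can be referred to the corresponding χ² distribution.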


Poisson regression I

Poisson distribution

Log link:

$$\log(\mu) = \beta_0 + \beta_1 x$$

where x is a covariate
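To show how the two coefficients of a Poisson regression with log link, log(μ) = β0 + β1·x, are estimated, the following Python sketch applies Newton-Raphson (equivalent to iteratively reweighted least squares for this model) to the likelihood equations. The data are hypothetical toy values that do not come from the lecture, and the function name is an illustrative assumption.

```python
import math

# Hypothetical toy data (not from the lecture): covariate x and counts y.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 2, 2, 5, 8, 13]

def fit_poisson(x, y, n_iter=25):
    """Newton-Raphson for the Poisson model log(mu) = b0 + b1 * x."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start at the intercept-only fit
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: first derivatives of the log-likelihood.
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix [[i00, i01], [i01, i11]].
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system (information) * (step) = (score).
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    return b0, b1

b0, b1 = fit_poisson(x, y)
mu_hat = [math.exp(b0 + b1 * xi) for xi in x]
print(round(b0, 4), round(b1, 4))
```

At the solution the fitted means satisfy Σ(y − μ) = 0 and Σx(y − μ) = 0, which is a convenient way to check convergence. A fixed number of iterations is used only to keep the sketch short.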


Poisson regression II

Poisson distribution

Log link:

$$\log(\mu) = \beta_0 + \alpha_i + \beta_j + \tau_k$$

where the parameters are row, column and treatment effects

Row    Column    Treatment    Count
1      1         P                3
2      1         M                6
3      1         O                4
4      1         N               17
5      1         K                4
1      2         O                2
2      2         K                0
3      2         M                9
4      2         P                8
5      2         N                4
1      3         N                5
2      3         O                6
3      3         K                1
4      3         M                8
5      3         P                2
1      4         K                1
2      4         N                4
3      4         P                6
4      4         O                9
5      4         M                4
1      5         M                4
2      5         P                4
3      5         N                5
4      5         K                0
5      5         O                8
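For the Latin-square counts, the main-effects Poisson model with row, column and treatment effects can be fitted without matrix algebra, because its likelihood equations state that the fitted row, column and treatment totals equal the observed ones. The Python sketch below exploits this with cyclic proportional scaling; this is an illustrative alternative to the Newton-type fitting that a procedure such as `proc genmod` would perform, and the function name is an assumption.

```python
import math

# (row, column, treatment, count) for the 5 x 5 Latin square.
data = [
    (1, 1, "P", 3), (2, 1, "M", 6), (3, 1, "O", 4), (4, 1, "N", 17), (5, 1, "K", 4),
    (1, 2, "O", 2), (2, 2, "K", 0), (3, 2, "M", 9), (4, 2, "P", 8), (5, 2, "N", 4),
    (1, 3, "N", 5), (2, 3, "O", 6), (3, 3, "K", 1), (4, 3, "M", 8), (5, 3, "P", 2),
    (1, 4, "K", 1), (2, 4, "N", 4), (3, 4, "P", 6), (4, 4, "O", 9), (5, 4, "M", 4),
    (1, 5, "M", 4), (2, 5, "P", 4), (3, 5, "N", 5), (4, 5, "K", 0), (5, 5, "O", 8),
]

def fit_main_effects(data, tol=1e-10, max_iter=1000):
    """Scale the fitted counts factor by factor until the fitted totals per
    row, column and treatment match the observed totals."""
    fitted = [1.0] * len(data)
    for _ in range(max_iter):
        change = 0.0
        for factor in range(3):            # 0 = row, 1 = column, 2 = treatment
            obs_tot, fit_tot = {}, {}
            for rec, f in zip(data, fitted):
                level = rec[factor]
                obs_tot[level] = obs_tot.get(level, 0) + rec[3]
                fit_tot[level] = fit_tot.get(level, 0.0) + f
            for idx, rec in enumerate(data):
                level = rec[factor]
                new = fitted[idx] * obs_tot[level] / fit_tot[level]
                change = max(change, abs(new - fitted[idx]))
                fitted[idx] = new
        if change < tol:
            break
    return fitted

fitted = fit_main_effects(data)
# Poisson deviance; cells with a zero count contribute nothing to the sum.
dev = 2.0 * sum(rec[3] * math.log(rec[3] / f)
                for rec, f in zip(data, fitted) if rec[3] > 0)
df_resid = len(data) - (1 + 4 + 4 + 4)     # 25 cells - 13 free parameters
print(round(dev, 2), df_resid)
```

Every scaling step multiplies the fitted values by a factor that depends on a single row, column or treatment level, so the fitted table always has the additive form on the log scale required by the model.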