Log-Linear Models (Part 1)
Dr. Kusman Sadik, M.Si
Graduate Program
Department of Statistics, IPB, 2018/2019
A log-linear modeling approach has advantages over conducting
separate inferential tests of the associations in contingency
tables because the models can handle more complicated
situations.
For example, the Breslow–Day statistic is limited to 2x2xK
tables and estimates of common odds ratios cannot be
obtained for tables larger than 2x2.
Conversely, a log-linear modeling approach is not restricted
to two- or three-way tables so it can be used for testing
homogeneous association and estimating common odds
ratios in tables of any size.
Log-linear models are used to model the cell counts in
contingency tables.
The ultimate goal of fitting a log-linear model is to estimate
parameters that describe the relationships between
categorical variables.
Specifically, for a set of categorical variables, log-linear
models do not really distinguish between explanatory and
response variables but rather treat all variables as response
variables by modeling the cell counts for all combinations of
the levels of the categorical variables included in the model.
In general, the number of parameters in a log-linear
model depends on the number of categories of the
variables of interest.
More specifically, in any log-linear model the effect of a
categorical variable with a total of C categories requires
(C – 1) unique parameters.
For example, if variable X is gender (with two
categories), then C = 2 and only one predictor, thus one
parameter, is needed to model the effect of X.
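The (C – 1) rule can be checked directly in R. The sketch below uses a made-up three-category factor (the labels "a", "b", "c" are illustrative assumptions, not from the course data) and counts the indicator columns that dummy coding creates.

```r
# Hypothetical variable with C = 3 categories (labels are illustrative only)
x <- factor(c("a", "b", "c", "a", "b", "c"))
C <- nlevels(x)

# Dummy (treatment) coding: one intercept column plus C - 1 indicator columns
X <- model.matrix(~ x)
n_effect_params <- ncol(X) - 1   # exclude the intercept column

print(c(categories = C, effect_parameters = n_effect_params))
```

With C = 3 the design matrix has two indicator columns, so two parameters describe the effect of the variable.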
When dummy coding is used, the last category of the
variable is used as a reference category.
Therefore, the parameter associated with the last category
is set to zero, and each of the remaining parameters of the
model is interpreted relative to the last category.
For example, if male is the last category of the gender
variable, then the one gender parameter in the log-linear
model will be interpreted as the difference between females
and males because the parameter reflects the odds for
females relative to the reference category, males.
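A minimal sketch of how the reference category is chosen in R, assuming a two-level gender factor as in the example above. R treats the first level as the reference by default, so `relevel()` is used to make "male" the reference; the level labels are assumptions for illustration.

```r
# Gender with two categories; labels are illustrative
gender <- factor(c("female", "male"))

# R uses the first level ("female") as reference by default;
# relevel() moves the chosen level to the reference position
gender <- relevel(gender, ref = "male")

# Contrast matrix: the reference category is coded 0
cm <- contrasts(gender)
print(cm)
```

The single column of the contrast matrix is the indicator for females, so the one gender parameter compares females with males.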
Instead of representing the parameter associated with the ith
variable ($X_i$) as $\beta_i$, in log-linear models this parameter is
represented by the Greek letter lambda, $\lambda$, with the variable
indicated in the superscript and the (dummy-coded) indicator
of the variable in the subscript.
For example, if the variable X has a total of I categories (i =
1, 2, …, I), $\lambda_i^X$ is the parameter associated with the ith
indicator (dummy variable) for X.
Similarly, if the variable Y has a total of J categories (j = 1, 2,
…, J), then $\lambda_j^Y$ is the parameter associated with the jth
indicator for Y.
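For two categorical variables, the expected cell counts,
denoted by $\mu_{ij}$ for the cell in the ith row and jth column, are the
outcome values from a log-linear model. Under independence of the
row and column variables, they are estimated from the marginal totals:
\[ \mu_{ij} = E_{ij} = \frac{n_{i+}\, n_{+j}}{n_{++}} \]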
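As a sketch, the expected counts can be computed from the marginal totals in R. The counts are those of Table 7.2 (Azen, p. 140) as entered in the R program in these slides; the object names `tab` and `mu` and the shortened labels are assumptions for illustration.

```r
# Table 7.2 counts: rows = political affiliation, columns = presidential choice
# (matrix() fills column by column, matching the order of the count vector)
tab <- matrix(c(70, 195, 382, 324, 332, 199, 56, 101, 117), nrow = 3,
              dimnames = list(pol = c("Lib", "Mod", "Con"),
                              pre = c("Bus", "Cli", "Per")))

# Expected counts: (row total x column total) / grand total
mu <- outer(rowSums(tab), colSums(tab)) / sum(tab)
print(round(mu, 2))
```

The expected counts preserve the observed row and column totals, so they also sum to the grand total of 1776.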
In general, main effects in log-linear models are interpreted
as odds.
The (exponentiated) parameter values associated with X, $\lambda_i^X$,
can be interpreted as the odds of being in the ith row versus
being in the last row of the table regardless of the value of
the other variable, Y.
Likewise, the (exponentiated) parameter values associated
with Y, $\lambda_j^Y$, can be interpreted as the odds of being in the jth
column versus being in the last column of the table
regardless of the value of the other variable, X.
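The odds interpretation follows in one step, assuming the standard independence model for a two-way table with the last categories (I and J) as reference. The short derivation below is a sketch under that assumption.

```latex
\[
\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y,
\qquad \lambda_I^X = \lambda_J^Y = 0
\]
\[
\log \frac{\mu_{ij}}{\mu_{Ij}}
  = (\lambda + \lambda_i^X + \lambda_j^Y) - (\lambda + \lambda_I^X + \lambda_j^Y)
  = \lambda_i^X
\quad\Longrightarrow\quad
\exp(\lambda_i^X) = \frac{\mu_{ij}}{\mu_{Ij}}
\]
```

The column index j cancels, which is why the odds of row i versus the last row do not depend on the value of Y. The same argument with rows and columns exchanged gives the interpretation of $\exp(\lambda_j^Y)$.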
See: Azen, p. 140
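i = 1, 2, ..., k: the last category (i = k) is the reference
j = 1, 2, ..., m: the last category (j = m) is the reference
\[ \lambda_i^X = \log(\mathrm{odds}_i) = \log\frac{P(X=i)}{P(X=k)} = \log\frac{n_{i\cdot}/n_{\cdot\cdot}}{n_{k\cdot}/n_{\cdot\cdot}} \]
\[ \lambda_j^Y = \log(\mathrm{odds}_j) = \log\frac{P(Y=j)}{P(Y=m)} = \log\frac{n_{\cdot j}/n_{\cdot\cdot}}{n_{\cdot m}/n_{\cdot\cdot}} \]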
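Data: Azen, Table 7.2
\[ \lambda_i^X = \log(\mathrm{odds}_i) = \log\frac{P(X=i)}{P(X=k)} = \log\frac{n_{i\cdot}/n_{\cdot\cdot}}{n_{k\cdot}/n_{\cdot\cdot}} \]
\[ \lambda_1^X = \log\frac{P(X=1)}{P(X=3)} = \log\frac{n_{1\cdot}/n_{\cdot\cdot}}{n_{3\cdot}/n_{\cdot\cdot}} = \log\frac{450/1776}{698/1776} = -0.43897 \]
\[ \lambda_2^X = \log\frac{P(X=2)}{P(X=3)} = \log\frac{n_{2\cdot}/n_{\cdot\cdot}}{n_{3\cdot}/n_{\cdot\cdot}} = \log\frac{628/1776}{698/1776} = -0.10568 \]
\[ \lambda_3^X = \log\frac{P(X=3)}{P(X=3)} = \log\frac{n_{3\cdot}/n_{\cdot\cdot}}{n_{3\cdot}/n_{\cdot\cdot}} = \log 1 = 0 \]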
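\[ \lambda_j^Y = \log(\mathrm{odds}_j) = \log\frac{P(Y=j)}{P(Y=m)} = \log\frac{n_{\cdot j}/n_{\cdot\cdot}}{n_{\cdot m}/n_{\cdot\cdot}} \]
\[ \lambda_1^Y = \log\frac{P(Y=1)}{P(Y=3)} = \log\frac{n_{\cdot 1}/n_{\cdot\cdot}}{n_{\cdot 3}/n_{\cdot\cdot}} = \log\frac{647/1776}{274/1776} = 0.859218 \]
\[ \lambda_2^Y = \log\frac{P(Y=2)}{P(Y=3)} = \log\frac{n_{\cdot 2}/n_{\cdot\cdot}}{n_{\cdot 3}/n_{\cdot\cdot}} = \log\frac{855/1776}{274/1776} = 1.137973 \]
\[ \lambda_3^Y = \log\frac{P(Y=3)}{P(Y=3)} = \log\frac{n_{\cdot 3}/n_{\cdot\cdot}}{n_{\cdot 3}/n_{\cdot\cdot}} = \log 1 = 0 \]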
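The hand calculations for the main-effect parameters can be reproduced in R as a sketch. The marginal totals are the ones used above; the names `row_tot`, `col_tot`, `lambda_X` and `lambda_Y` are assumptions for illustration.

```r
# Marginal totals of Table 7.2 as used in the calculations above
row_tot <- c(Lib = 450, Mod = 628, Con = 698)   # political affiliation (X)
col_tot <- c(Bus = 647, Cli = 855, Per = 274)   # presidential choice (Y)

# Log odds of each category relative to the last (reference) category;
# the grand total cancels, so it does not appear
lambda_X <- log(row_tot / row_tot[["Con"]])
lambda_Y <- log(col_tot / col_tot[["Per"]])

print(round(lambda_X, 5))
print(round(lambda_Y, 5))
```

The reference categories (Con and Per) get a value of exactly zero because log(1) = 0.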
How is $\lambda$ estimated?
Use the last categories of X and Y, which for the data above
are i = 3 and j = 3, so that:
\[ \log(\mu_{33}) = \lambda, \quad \text{because } \lambda_3^X = \lambda_3^Y = 0 \]
\[ \mu_{33} = \frac{n_{3\cdot}\, n_{\cdot 3}}{n_{\cdot\cdot}} = \frac{(698)(274)}{1776} = 107.6869 \]
\[ \lambda = \log(\mu_{33}) = \log(107.6869) = 4.67923 \]
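Once the intercept is known, every expected count can be rebuilt from the three sets of parameters. The sketch below assumes the independence model $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$ with the last categories as reference; variable names are illustrative.

```r
row_tot <- c(Lib = 450, Mod = 628, Con = 698)
col_tot <- c(Bus = 647, Cli = 855, Per = 274)
n <- sum(row_tot)                                # grand total, 1776

# Intercept: log of the expected count in the reference cell (3, 3)
lambda   <- log(row_tot[["Con"]] * col_tot[["Per"]] / n)
lambda_X <- log(row_tot / row_tot[["Con"]])
lambda_Y <- log(col_tot / col_tot[["Per"]])

# mu_ij = exp(lambda + lambda_i^X + lambda_j^Y)
mu <- exp(lambda + outer(lambda_X, lambda_Y, "+"))
print(round(mu, 2))
```

The reconstructed table agrees with the direct formula (row total × column total) / grand total, for example 107.69 in the reference cell.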
R program: required
SAS program: supplementary
# Log-linear model for the data in Table 7.2 (Azen, p. 140)
# relevel() selects the reference category
# Model 1: no interaction
pol <- factor(rep(c("1Lib","2Mod","3Con"), 3))
pre <- factor(rep(c("1Bus","2Cli","3Per"), rep(3,3)))
count <- c(70, 195, 382, 324, 332, 199, 56, 101, 117)
pol <- relevel(pol, ref="3Con")
pre <- relevel(pre, ref="3Per")
data.frame(pol, pre, count)
model1 <- glm(count ~ pol + pre, family=poisson(link="log"))
summary(model1)
# dugaan = fitted (expected) counts under Model 1
dugaan <- round(fitted(model1), 2)
data.frame(pol, pre, count, dugaan)
   pol  pre count
1 1Lib 1Bus    70
2 2Mod 1Bus   195
3 3Con 1Bus   382
4 1Lib 2Cli   324
5 2Mod 2Cli   332
6 3Con 2Cli   199
7 1Lib 3Per    56
8 2Mod 3Per   101
9 3Con 3Per   117
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.67923    0.06723  69.605  < 2e-16 ***
pol1Lib     -0.43897    0.06045  -7.261 3.84e-13 ***
pol2Mod     -0.10568    0.05500  -1.921   0.0547 .
pre1Bus      0.85922    0.07208  11.921  < 2e-16 ***
pre2Cli      1.13797    0.06942  16.392  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
    Null deviance: 626.32  on 8  degrees of freedom
Residual deviance: 247.70  on 4  degrees of freedom
AIC: 320
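Because main effects are interpreted as odds, it can help to exponentiate the coefficients of Model 1. The sketch below refits the same model (using `each = 3`, which is equivalent to `rep(3,3)`) and is only one possible way to display the odds; the name `odds` is an assumption.

```r
pol <- relevel(factor(rep(c("1Lib", "2Mod", "3Con"), 3)), ref = "3Con")
pre <- relevel(factor(rep(c("1Bus", "2Cli", "3Per"), each = 3)), ref = "3Per")
count <- c(70, 195, 382, 324, 332, 199, 56, 101, 117)
model1 <- glm(count ~ pol + pre, family = poisson(link = "log"))

# Exponentiated main effects: marginal odds relative to the reference category
odds <- exp(coef(model1))[-1]   # drop the intercept
print(round(odds, 3))
```

For example, exp(-0.43897) is the ratio 450/698, the odds of being Liberal rather than Conservative, and exp(1.13797) is 855/274, the odds of choosing Clinton rather than Perot.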
   pol  pre count dugaan
1 1Lib 1Bus    70 163.94
2 2Mod 1Bus   195 228.78
3 3Con 1Bus   382 254.28
4 1Lib 2Cli   324 216.64
5 2Mod 2Cli   332 302.33
6 3Con 2Cli   199 336.03
7 1Lib 3Per    56  69.43
8 2Mod 3Per   101  96.89
9 3Con 3Per   117 107.69
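The gap between observed and fitted counts can be summarized by the likelihood-ratio statistic $G^2 = 2\sum O \log(O/E)$, which for a Poisson log-linear model is the residual deviance. The sketch below computes it by hand from the Table 7.2 counts; the names `obs`, `expd` and `G2` are assumptions for illustration.

```r
# Observed counts (columns: Bus, Cli, Per; rows: Lib, Mod, Con)
obs <- matrix(c(70, 195, 382, 324, 332, 199, 56, 101, 117), nrow = 3)

# Fitted counts of the no-interaction model
expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Likelihood-ratio statistic and its chi-square p-value
G2 <- 2 * sum(obs * log(obs / expd))
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_value <- pchisq(G2, df = df, lower.tail = FALSE)

print(c(G2 = G2, df = df, p_value = p_value))
```

The statistic reproduces the residual deviance of 247.70 on 4 degrees of freedom reported for Model 1, which points to a poor fit of the no-interaction model.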
When there is evidence for dependency between the row
and column variables of a two-way table, the dependency is
modeled using two-way interaction terms in the log-linear
modeling framework.
However, in a two-way contingency table, a log-linear model that
includes the two-way interaction is the saturated model: it has as
many parameters as cells and reproduces the observed counts exactly.
To illustrate the saturated model using the previous example:
\[ \frac{\mathrm{Odds}(X=1 \mid Y=1)}{\mathrm{Odds}(X=1 \mid Y=3)} = \frac{P(X=1 \mid Y=1)/P(X=3 \mid Y=1)}{P(X=1 \mid Y=3)/P(X=3 \mid Y=3)} \]
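This odds ratio can be computed directly from the observed counts of Table 7.2, as sketched below with X = 1 as Liberal, X = 3 as Conservative, Y = 1 as Bush and Y = 3 as Perot. Object names are illustrative assumptions.

```r
tab <- matrix(c(70, 195, 382, 324, 332, 199, 56, 101, 117), nrow = 3,
              dimnames = list(pol = c("Lib", "Mod", "Con"),
                              pre = c("Bus", "Cli", "Per")))

# Odds of Liberal versus Conservative among Bush and among Perot voters
odds_bus <- tab["Lib", "Bus"] / tab["Con", "Bus"]
odds_per <- tab["Lib", "Per"] / tab["Con", "Per"]

odds_ratio <- odds_bus / odds_per
print(c(odds_ratio = odds_ratio, log_odds_ratio = log(odds_ratio)))
```

The log odds ratio is about -0.9601, the same value as the `pol1Lib:pre1Bus` coefficient in the saturated model output reported in these slides.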
# Log-linear model for the data in Table 7.2 (Azen, p. 140)
# relevel() selects the reference category
# Model 2: with interaction (saturated model)
pol <- factor(rep(c("1Lib","2Mod","3Con"), 3))
pre <- factor(rep(c("1Bus","2Cli","3Per"), rep(3,3)))
count <- c(70, 195, 382, 324, 332, 199, 56, 101, 117)
pol <- relevel(pol, ref="3Con")
pre <- relevel(pre, ref="3Per")
data.frame(pol, pre, count)
# pol*pre expands to pol + pre + pol:pre
model2 <- glm(count ~ pol*pre,
              family=poisson(link="log"))
summary(model2)
# dugaan = fitted counts; for the saturated model they equal the observed counts
dugaan <- round(fitted(model2), 2)
data.frame(pol, pre, count, dugaan)
Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)      4.76217    0.09245  51.511  < 2e-16 ***
pol1Lib         -0.73682    0.16249  -4.534 5.77e-06 ***
pol2Mod         -0.14705    0.13582  -1.083  0.27895
pre1Bus          1.18325    0.10566  11.198  < 2e-16 ***
pre2Cli          0.53113    0.11650   4.559 5.14e-06 ***
pol1Lib:pre1Bus -0.96010    0.20810  -4.614 3.96e-06 ***
pol2Mod:pre1Bus -0.52537    0.16185  -3.246  0.00117 **
pol1Lib:pre2Cli  1.22426    0.18578   6.590 4.41e-11 ***
pol2Mod:pre2Cli  0.65888    0.16274   4.049 5.15e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
    Null deviance: 6.2632e+02  on 8  degrees of freedom
Residual deviance: 9.5701e-14  on 0  degrees of freedom
AIC: 80.301
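One way to test whether the interaction terms are needed is a likelihood-ratio comparison of the no-interaction model with the saturated model. The sketch below uses `anova()` with `test = "Chisq"`; the object name `lrt` is an assumption, and this is only one of several equivalent ways to run the test.

```r
pol <- relevel(factor(rep(c("1Lib", "2Mod", "3Con"), 3)), ref = "3Con")
pre <- relevel(factor(rep(c("1Bus", "2Cli", "3Per"), each = 3)), ref = "3Per")
count <- c(70, 195, 382, 324, 332, 199, 56, 101, 117)

model1 <- glm(count ~ pol + pre, family = poisson(link = "log"))  # no interaction
model2 <- glm(count ~ pol * pre, family = poisson(link = "log"))  # saturated

# Likelihood-ratio test: drop in deviance when the interaction is added
lrt <- anova(model1, model2, test = "Chisq")
print(lrt)
```

The drop in deviance equals the residual deviance of the no-interaction model (247.70 on 4 degrees of freedom), since the saturated model has a deviance of essentially zero.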
1. Use R to analyze the data in Table 7.2 (Azen, p. 140):
a. Fit a log-linear model with Conservative and Perot as the
reference categories. What is the interpretation?
b. Fit a log-linear model with Liberal and Bush as the
reference categories. What is the interpretation?
c. Based on these two approaches (a and b), determine the
estimates of $\mu_{ij}$ for i = 1, 2, 3 and j = 1, 2, 3. Do the
results differ between (a) and (b)?
d. Carry out a hypothesis test of whether political affiliation
and vote choice are associated, using the full model
(saturated model). What is your conclusion?
2. Use R to analyze the data in Table 2 below:
a. Determine the log-linear model and its parameter estimates.
What is the interpretation?
b. Based on that model, determine the estimates of $\mu_{ij}$
for i = 1, 2, 3 and j = 1, 2, 3, 4.
c. Carry out a hypothesis test of whether political affiliation
and age are associated, using the full model
(saturated model). What is your conclusion?
References
1. Azen, R. and Walker, C.M. (2011). Categorical Data
Analysis for the Behavioral and Social Sciences.
Routledge, Taylor and Francis Group, New York.
2. Agresti, A. (2002). Categorical Data Analysis, 2nd ed.
Wiley, New York.
3. Other relevant references.
Available for download at
kusmansadik.wordpress.com
Thank you