Log-Linear Models (Part 1). Dr. Kusman Sadik, M.Si. Graduate Program, Department of Statistics, IPB, 2018/2019

Analisis Data Kategorik (Categorical Data Analysis) - kusmansadik.files.wordpress.com · 09/11/2018



A log-linear modeling approach has an advantage over conducting separate inferential tests of association in contingency tables: the models can handle more complicated situations.

For example, the Breslow–Day statistic is limited to 2 × 2 × K tables, and estimates of common odds ratios cannot be obtained for tables larger than 2 × 2.

In contrast, a log-linear modeling approach is not restricted to two- or three-way tables, so it can be used for testing homogeneous association and estimating common odds ratios in tables of any size.


Log-linear models are used to model the cell counts in contingency tables.

The ultimate goal of fitting a log-linear model is to estimate parameters that describe the relationships between categorical variables.

Specifically, log-linear models do not distinguish between explanatory and response variables. They treat all variables as response variables by modeling the cell counts for every combination of the levels of the categorical variables included in the model.


In general, the number of parameters in a log-linear model depends on the number of categories of the variables of interest.

More specifically, in any log-linear model the effect of a categorical variable with a total of C categories requires (C – 1) unique parameters.

For example, if variable X is gender (with two categories), then C = 2 and only one predictor, and thus one parameter, is needed to model the effect of X.
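The (C – 1) rule can be checked on a design matrix. The sketch below uses a made-up three-category factor (the names `x`, `a`, `b`, `c` are illustrative assumptions, not part of the slides) and counts the dummy columns R creates besides the intercept.

```r
# Hypothetical three-category factor; level names are illustrative only
x <- factor(c("a", "b", "c", "a", "b", "c"))

# Design matrix with an intercept: one dummy column per non-reference category
X <- model.matrix(~ x)

n_categories  <- nlevels(x)     # C = 3
n_effect_cols <- ncol(X) - 1    # columns other than the intercept: C - 1 = 2

print(colnames(X))
print(c(C = n_categories, parameters = n_effect_cols))
```

Note that R's default treatment coding uses the first level as the reference, whereas the slides use the last; the parameter count is the same either way.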


When dummy coding is used, the last category of the variable serves as the reference category.

The parameter associated with the last category is therefore set to zero, and each remaining parameter of the model is interpreted relative to the last category.

For example, if male is the last category of the gender variable, then the single gender parameter in the log-linear model is interpreted as the difference between females and males, because the (exponentiated) parameter reflects the odds for females relative to the reference category, males.
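A minimal sketch of this coding in R, assuming a two-level `gender` factor invented for illustration. `relevel()` makes male the reference category, so the only remaining dummy column is the one for females.

```r
# Hypothetical gender factor (illustrative data, not from the slides)
gender <- factor(c("female", "male"))

# Make "male" the reference category; its parameter is then fixed at zero
gender <- relevel(gender, ref = "male")

# Treatment (dummy) coding: a single column, 1 for female and 0 for male
print(contrasts(gender))
```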


Instead of representing the parameter associated with the ith variable ($X_i$) as $\beta_i$, log-linear models represent this parameter by the Greek letter lambda, $\lambda$, with the variable indicated in the superscript and the (dummy-coded) indicator of the variable in the subscript.

For example, if the variable X has a total of I categories (i = 1, 2, …, I), $\lambda_i^X$ is the parameter associated with the ith indicator (dummy variable) for X.

Similarly, if the variable Y has a total of J categories (j = 1, 2, …, J), then $\lambda_j^Y$ is the parameter associated with the jth indicator for Y.


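For two categorical variables, the expected cell counts, denoted by $\mu_{ij}$ for the cell in the ith row and jth column, are the outcome values of a log-linear model. Under independence of the row and column variables they are

$$\mu_{ij} = E_{ij} = \frac{n_{i+}\, n_{+j}}{n_{++}}$$

where $n_{i+}$, $n_{+j}$ and $n_{++}$ are the row, column and grand totals.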
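As a rough illustration, the expected counts can be computed directly from the margins. The counts below are the ones used in the R code later in these slides (Azen, Table 7.2); the matrix layout and the short labels `Lib`, `Mod`, `Con`, `Bus`, `Cli`, `Per` are choices made for this sketch.

```r
# Observed counts: rows = political affiliation, columns = candidate
obs <- matrix(c( 70, 324,  56,
                195, 332, 101,
                382, 199, 117),
              nrow = 3, byrow = TRUE,
              dimnames = list(pol = c("Lib", "Mod", "Con"),
                              pre = c("Bus", "Cli", "Per")))

# mu_ij = (row total i) * (column total j) / (grand total)
mu <- outer(rowSums(obs), colSums(obs)) / sum(obs)

print(round(mu, 2))
```

The expected counts keep the observed margins, so they sum to the same grand total as the data.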


In general, main effects in log-linear models are interpreted as odds.

The exponentiated parameter values associated with X, $\exp(\lambda_i^X)$, can be interpreted as the odds of being in the ith row versus being in the last row of the table, regardless of the value of the other variable, Y.

Likewise, the exponentiated parameter values associated with Y, $\exp(\lambda_j^Y)$, can be interpreted as the odds of being in the jth column versus being in the last column of the table, regardless of the value of the other variable, X.
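One way to see why the exponentiated main effects are odds is to take the usual independence model, $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$, with the last-row parameter $\lambda_I^X$ fixed at zero. This model form is the standard one and is assumed here rather than quoted from the slides. Comparing row i with the last row I in any column j gives:

```latex
\begin{aligned}
\log \mu_{ij} - \log \mu_{Ij}
  &= (\lambda + \lambda_i^X + \lambda_j^Y) - (\lambda + \lambda_I^X + \lambda_j^Y)
   = \lambda_i^X - \lambda_I^X = \lambda_i^X, \\
\frac{\mu_{ij}}{\mu_{Ij}} &= \exp(\lambda_i^X) \quad \text{for every column } j .
\end{aligned}
```

The column term $\lambda_j^Y$ cancels, which is why the odds do not depend on the value of Y. The same argument with rows and columns exchanged applies to $\exp(\lambda_j^Y)$.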


See: Azen, p. 140


i = 1, 2, ..., k: the last category (i = k) is the reference

j = 1, 2, ..., m: the last category (j = m) is the reference

$$\lambda_i^X = \log(\text{odds}_i) = \log\frac{P(X=i)}{P(X=k)} = \log\frac{n_{i.}/n_{..}}{n_{k.}/n_{..}}$$

$$\lambda_j^Y = \log(\text{odds}_j) = \log\frac{P(Y=j)}{P(Y=m)} = \log\frac{n_{.j}/n_{..}}{n_{.m}/n_{..}}$$


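Data: Azen, Table 7.2

$$\lambda_i^X = \log(\text{odds}_i) = \log\frac{P(X=i)}{P(X=k)} = \log\frac{n_{i.}/n_{..}}{n_{k.}/n_{..}}$$

$$\lambda_1^X = \log\frac{P(X=1)}{P(X=3)} = \log\frac{n_{1.}/n_{..}}{n_{3.}/n_{..}} = \log\frac{450/1776}{698/1776} = -0.43897$$

$$\lambda_2^X = \log\frac{P(X=2)}{P(X=3)} = \log\frac{n_{2.}/n_{..}}{n_{3.}/n_{..}} = \log\frac{628/1776}{698/1776} = -0.10568$$

$$\lambda_3^X = \log\frac{P(X=3)}{P(X=3)} = \log\frac{n_{3.}/n_{..}}{n_{3.}/n_{..}} = \log 1 = 0$$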


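$$\lambda_j^Y = \log(\text{odds}_j) = \log\frac{P(Y=j)}{P(Y=m)} = \log\frac{n_{.j}/n_{..}}{n_{.m}/n_{..}}$$

$$\lambda_1^Y = \log\frac{P(Y=1)}{P(Y=3)} = \log\frac{n_{.1}/n_{..}}{n_{.3}/n_{..}} = \log\frac{647/1776}{274/1776} = 0.859218$$

$$\lambda_2^Y = \log\frac{P(Y=2)}{P(Y=3)} = \log\frac{n_{.2}/n_{..}}{n_{.3}/n_{..}} = \log\frac{855/1776}{274/1776} = 1.137973$$

$$\lambda_3^Y = \log\frac{P(Y=3)}{P(Y=3)} = \log\frac{n_{.3}/n_{..}}{n_{.3}/n_{..}} = \log 1 = 0$$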
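The hand calculations for both sets of main effects can be reproduced from the marginal totals alone. This is a small sketch; the vector names `row_tot`, `col_tot`, `lambda_X` and `lambda_Y` are introduced here for illustration, and the last category of each variable (Con, Per) is the reference.

```r
# Marginal totals from Table 7.2 (row totals: X, column totals: Y)
row_tot <- c(Lib = 450, Mod = 628, Con = 698)
col_tot <- c(Bus = 647, Cli = 855, Per = 274)

# Log odds of each category relative to the last (reference) category
lambda_X <- log(row_tot / row_tot["Con"])
lambda_Y <- log(col_tot / col_tot["Per"])

print(round(lambda_X, 5))
print(round(lambda_Y, 5))
```

The grand total 1776 cancels in each ratio, so it does not need to appear in the computation.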


How is λ estimated?

Use the last categories of X and Y, which for the data above are i = 3 and j = 3, so that:

$$\log(\mu_{33}) = \lambda, \quad \text{because } \lambda_3^X = \lambda_3^Y = 0$$

$$\mu_{33} = \frac{n_{3.}\, n_{.3}}{n_{..}} = \frac{(698)(274)}{1776} = 107.6869$$

$$\lambda = \log(\mu_{33}) = \log(107.6869) = 4.67923$$
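The intercept can be checked numerically, and combined with the main effects to rebuild any expected count. The sketch below assumes the independence model $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$; the variable names are illustrative.

```r
# Marginal totals for the reference categories and the grand total
n_con   <- 698    # row total, X = 3 (Conservative)
n_per   <- 274    # column total, Y = 3 (Perot)
n_total <- 1776

# Intercept: log of the expected count in the reference cell (3, 3)
mu_33  <- n_con * n_per / n_total
lambda <- log(mu_33)

# Expected count for cell (1, 1) rebuilt from the parameters
lambda_1X <- log(450 / 698)
lambda_1Y <- log(647 / 274)
mu_11 <- exp(lambda + lambda_1X + lambda_1Y)

print(round(c(mu_33 = mu_33, lambda = lambda, mu_11 = mu_11), 5))
```

The rebuilt value for cell (1, 1) equals 450 × 647 / 1776, the usual expected count under independence.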


R: required

SAS: supplementary


# Log-linear model for the data in Table 7.2 (Azen, p. 140)
# relevel() selects the reference category
# Model 1: no interaction (independence)
pol <- factor(rep(c("1Lib", "2Mod", "3Con"), 3))
pre <- factor(rep(c("1Bus", "2Cli", "3Per"), rep(3, 3)))
count <- c(70, 195, 382, 324, 332, 199, 56, 101, 117)
pol <- relevel(pol, ref = "3Con")
pre <- relevel(pre, ref = "3Per")
data.frame(pol, pre, count)
model1 <- glm(count ~ pol + pre, family = poisson(link = "log"))
summary(model1)
# dugaan: fitted (expected) counts under Model 1
dugaan <- round(fitted(model1), 2)
data.frame(pol, pre, count, dugaan)


   pol  pre count
1 1Lib 1Bus    70
2 2Mod 1Bus   195
3 3Con 1Bus   382
4 1Lib 2Cli   324
5 2Mod 2Cli   332
6 3Con 2Cli   199
7 1Lib 3Per    56
8 2Mod 3Per   101
9 3Con 3Per   117


Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.67923    0.06723  69.605  < 2e-16 ***
pol1Lib     -0.43897    0.06045  -7.261 3.84e-13 ***
pol2Mod     -0.10568    0.05500  -1.921   0.0547 .
pre1Bus      0.85922    0.07208  11.921  < 2e-16 ***
pre2Cli      1.13797    0.06942  16.392  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 626.32 on 8 degrees of freedom
Residual deviance: 247.70 on 4 degrees of freedom
AIC: 320


   pol  pre count dugaan
1 1Lib 1Bus    70 163.94
2 2Mod 1Bus   195 228.78
3 3Con 1Bus   382 254.28
4 1Lib 2Cli   324 216.64
5 2Mod 2Cli   332 302.33
6 3Con 2Cli   199 336.03
7 1Lib 3Per    56  69.43
8 2Mod 3Per   101  96.89
9 3Con 3Per   117 107.69
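The large gaps between `count` and `dugaan` can be judged formally with the residual deviance, which serves as a likelihood-ratio test of independence. The following is a sketch of one way to do this, not a step shown in the slides; the names `fit`, `G2`, `df_resid`, `p_value` and `X2` are introduced here.

```r
# Same data as in the slides (Azen, Table 7.2)
pol   <- factor(rep(c("1Lib", "2Mod", "3Con"), 3))
pre   <- factor(rep(c("1Bus", "2Cli", "3Per"), each = 3))
count <- c(70, 195, 382, 324, 332, 199, 56, 101, 117)

# Independence model (the choice of reference level does not change the fit)
fit <- glm(count ~ pol + pre, family = poisson(link = "log"))

# Likelihood-ratio statistic G^2 and its chi-square p-value
G2       <- deviance(fit)
df_resid <- df.residual(fit)
p_value  <- pchisq(G2, df = df_resid, lower.tail = FALSE)

# Pearson chi-square statistic from the same fit, for comparison
X2 <- sum(residuals(fit, type = "pearson")^2)

print(c(G2 = G2, df = df_resid, p_value = p_value, X2 = X2))
```

A very small p-value indicates that the independence model fits poorly, which is the motivation for adding an interaction term.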


When there is evidence of dependency between the row and column variables of a two-way table, the dependency is modeled using two-way interaction terms in the log-linear modeling framework.

However, fitting a log-linear model with a two-way interaction to a two-way contingency table amounts to fitting the saturated model, which has as many parameters as cells and reproduces the observed counts exactly.

To illustrate the saturated model using the previous example:


$$\frac{\text{Odds}(X=1 \mid Y=1)}{\text{Odds}(X=1 \mid Y=3)} = \frac{P(X=1 \mid Y=1)\,/\,P(X=3 \mid Y=1)}{P(X=1 \mid Y=3)\,/\,P(X=3 \mid Y=3)}$$
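This odds ratio can be computed from the observed counts in Table 7.2, taking X as political affiliation (1 = Liberal, 3 = Conservative) and Y as candidate (1 = Bush, 3 = Perot). In this sketch the conditional probabilities are replaced by cell counts, since the column totals cancel; the variable names are illustrative.

```r
# Observed counts n_ij for the four cells involved
n11 <- 70    # Liberal,      Bush
n31 <- 382   # Conservative, Bush
n13 <- 56    # Liberal,      Perot
n33 <- 117   # Conservative, Perot

# Odds of Liberal versus Conservative within each candidate column
odds_given_bush  <- n11 / n31
odds_given_perot <- n13 / n33

# Odds ratio and its logarithm
odds_ratio     <- odds_given_bush / odds_given_perot
log_odds_ratio <- log(odds_ratio)

print(round(c(odds_ratio = odds_ratio, log_odds_ratio = log_odds_ratio), 5))
```

The log odds ratio agrees with the `pol1Lib:pre1Bus` estimate in the Model 2 output, which shows how the interaction parameters of the saturated model are interpreted.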


# Log-linear model for the data in Table 7.2 (Azen, p. 140)
# relevel() selects the reference category
# Model 2: with interaction (saturated)
pol <- factor(rep(c("1Lib", "2Mod", "3Con"), 3))
pre <- factor(rep(c("1Bus", "2Cli", "3Per"), rep(3, 3)))
count <- c(70, 195, 382, 324, 332, 199, 56, 101, 117)
pol <- relevel(pol, ref = "3Con")
pre <- relevel(pre, ref = "3Per")
data.frame(pol, pre, count)
# pol * pre expands to pol + pre + pol:pre
model2 <- glm(count ~ pol * pre, family = poisson(link = "log"))
summary(model2)
# dugaan: fitted counts under Model 2 (equal to the observed counts)
dugaan <- round(fitted(model2), 2)
data.frame(pol, pre, count, dugaan)


Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)      4.76217    0.09245  51.511  < 2e-16 ***
pol1Lib         -0.73682    0.16249  -4.534 5.77e-06 ***
pol2Mod         -0.14705    0.13582  -1.083  0.27895
pre1Bus          1.18325    0.10566  11.198  < 2e-16 ***
pre2Cli          0.53113    0.11650   4.559 5.14e-06 ***
pol1Lib:pre1Bus -0.96010    0.20810  -4.614 3.96e-06 ***
pol2Mod:pre1Bus -0.52537    0.16185  -3.246  0.00117 **
pol1Lib:pre2Cli  1.22426    0.18578   6.590 4.41e-11 ***
pol2Mod:pre2Cli  0.65888    0.16274   4.049 5.15e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 6.2632e+02 on 8 degrees of freedom
Residual deviance: 9.5701e-14 on 0 degrees of freedom
AIC: 80.301


1. Use R to analyze the data in Table 7.2 (Azen, p. 140):

a. Fit a log-linear model with Conservative and Perot as the reference categories. How do you interpret the results?

b. Fit a log-linear model with Liberal and Bush as the reference categories. How do you interpret the results?

c. Based on the two approaches (a and b), determine the estimates of $\mu_{ij}$ for i = 1, 2, 3 and j = 1, 2, 3. Do the results differ between (a) and (b)?

d. Test the hypothesis of whether political affiliation and candidate choice are associated, using the full (saturated) model. What do you conclude?


2. Use R to analyze the data in Table 2 below:

a. Determine the log-linear model and its parameter estimates. How do you interpret them?

b. Based on that model, determine the estimates of $\mu_{ij}$ for i = 1, 2, 3 and j = 1, 2, 3, 4.

c. Test the hypothesis of whether political affiliation and age are associated, using the full (saturated) model. What do you conclude?

[Table 2: political affiliation (3 categories) by age (4 categories)]


References

1. Azen, R. and Walker, C. M. (2011). Categorical Data Analysis for the Behavioral and Social Sciences. New York: Routledge, Taylor and Francis Group.

2. Agresti, A. (2002). Categorical Data Analysis, 2nd ed. New York: Wiley.

3. Other relevant references.


Available for download at kusmansadik.wordpress.com


Thank you