Models for Ordinal Outcomes

Categorical Data Analysis

T.Y. Wang (王德育)

Illinois State University


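• A variable with more than two categories is an ordinal variable when its categories can be ranked from low to high, even though the distances between categories are unknown.

• For example, income may be coded as "high", "middle", or "low", and survey items often offer the responses "strongly agree", "somewhat agree", "somewhat disagree", and "strongly disagree".
• Such variables should be analyzed with models for ordinal outcomes.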

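[Figure: the latent variable y* on a line running from -∞ to ∞, divided by the cutpoints τ1, τ2, τ3 into the observed categories y = 1, 2, 3, 4]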

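• Explained as a latent variable model:

– Structural model:

$$y_i^* = \mathbf{x}_i\boldsymbol{\beta} + \varepsilon_i$$

– The observed $y_i$ is linked to the latent $y_i^*$ through the cutpoints $\tau_m$:

$$y_i = m \quad \text{if } \tau_{m-1} \le y_i^* < \tau_m \quad \text{for } m = 1 \text{ to } J$$

– With four response categories (SD, D, A, SA):

$$y_i = \begin{cases}
1\ (\text{SD}) & \text{if } \tau_0 = -\infty \le y_i^* < \tau_1 \\
2\ (\text{D}) & \text{if } \tau_1 \le y_i^* < \tau_2 \\
3\ (\text{A}) & \text{if } \tau_2 \le y_i^* < \tau_3 \\
4\ (\text{SA}) & \text{if } \tau_3 \le y_i^* < \tau_4 = \infty
\end{cases}$$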

• Two models for ordinal outcomes
– ordered probit model: $\varepsilon \sim N(0, 1)$
– ordered logit model: $\varepsilon$ follows a logistic distribution with mean 0 and variance $\pi^2/3$

$$\Pr(y = m \mid \mathbf{x}) = \Pr(\tau_{m-1} \le y^* < \tau_m \mid \mathbf{x})$$

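• Explained as a generalized linear model (GLM), which has three components:
– random component
– systematic component
– link function

– Random component: the dependent variable y and its associated probability distribution. For an ordinal dependent variable, the distribution is multinomial.
– Systematic component: the linear combination of all independent variables.
– Link function: the log odds (logit).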

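• Ordered logit model

$$\ln \Omega_{\le m \mid > m}(\mathbf{x}) = \tau_m - \mathbf{x}\boldsymbol{\beta}$$

where

$$\Omega_{\le m \mid > m}(\mathbf{x}) = \frac{\Pr(y \le m \mid \mathbf{x})}{\Pr(y > m \mid \mathbf{x})} \quad \text{for } m = 1, \dots, J-1$$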

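• The formula above is equivalent to a set of models for binary outcomes:

$$\ln \frac{\Pr(y \le 1 \mid \mathbf{x})}{\Pr(y > 1 \mid \mathbf{x})} = \tau_1 - \mathbf{x}\boldsymbol{\beta}$$

$$\ln \frac{\Pr(y \le 2 \mid \mathbf{x})}{\Pr(y > 2 \mid \mathbf{x})} = \tau_2 - \mathbf{x}\boldsymbol{\beta}$$

$$\ln \frac{\Pr(y \le 3 \mid \mathbf{x})}{\Pr(y > 3 \mid \mathbf{x})} = \tau_3 - \mathbf{x}\boldsymbol{\beta}$$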

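• Example: Mother's relationship with her child (working mothers and the parent-child relationship) (file name: ordwarm2)

$$\Pr(\text{warm} = m \mid \mathbf{x}) = F(\tau_m - \mathbf{x}\boldsymbol{\beta}) - F(\tau_{m-1} - \mathbf{x}\boldsymbol{\beta})$$

where

$$\mathbf{x}\boldsymbol{\beta} = \beta_{yr89}\,\text{yr89} + \beta_{male}\,\text{male} + \beta_{white}\,\text{white} + \beta_{age}\,\text{age} + \beta_{ed}\,\text{ed} + \beta_{prst}\,\text{prst}$$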
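The probability formula for this example can be sketched in a few lines of Python. This is only an illustration, assuming a logistic F (the ordered logit case). The cutpoints and the value of xβ are made-up numbers, not estimates from ordwarm2, and the function names are hypothetical.

```python
import math


def logistic_cdf(z):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(xb, cutpoints):
    """Return [Pr(y=1|x), ..., Pr(y=J|x)] for a linear predictor xb.

    cutpoints holds tau_1 < ... < tau_{J-1}; tau_0 = -inf and tau_J = +inf
    are implied, so F(tau_0 - xb) = 0 and F(tau_J - xb) = 1.
    """
    cum = [0.0] + [logistic_cdf(tau - xb) for tau in cutpoints] + [1.0]
    return [cum[m] - cum[m - 1] for m in range(1, len(cum))]


# Hypothetical values, chosen only for illustration.
taus = [-1.0, 0.5, 2.0]
probs = ordered_logit_probs(xb=0.3, cutpoints=taus)
for m, p in enumerate(probs, start=1):
    print(f"Pr(y={m}|x) = {p:.3f}")
```

Because each probability is a difference of adjacent cumulative probabilities, the four values sum to one by construction.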

• Hypothesis testing
– confidence intervals, critical values, and p-values
– test: tests the effect of a single coefficient, e.g., test k5, or the equality of multiple coefficients, e.g., test hc=wc
– lrtest: compares competing (nested) models using a likelihood-ratio (LR) test

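• Four methods of interpretation:
– predict computes predicted probabilities
– prvalue or prtab computes predictions for "typical" cases (profiles)
– prchange computes the marginal change or discrete change in the outcome
– listcoef computes odds ratios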

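• Interpretation with listcoef (odds ratio – factor change coefficient):

• Holding all other variables constant, for an increase of δ units in $x_k$, the odds of a lower category versus a higher category change by a factor of $e^{-\delta\beta_k}$.

$$\ln \Omega_{\le m \mid > m}(\mathbf{x}) = \tau_m - \mathbf{x}\boldsymbol{\beta}$$

$$\frac{\Omega_{\le m \mid > m}(\mathbf{x}, x_k + \delta)}{\Omega_{\le m \mid > m}(\mathbf{x}, x_k)} = \exp(-\delta\beta_k) = \frac{1}{\exp(\delta\beta_k)}$$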
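A short numerical check of this factor change, written as a Python sketch. The coefficient, cutpoints, and starting linear predictor are hypothetical values chosen for illustration, and the helper names are not from any package.

```python
import math


def cumulative_odds(xb, tau):
    """Odds of y <= m versus y > m in the ordered logit model."""
    p_le = 1.0 / (1.0 + math.exp(-(tau - xb)))
    return p_le / (1.0 - p_le)


def factor_change(beta_k, delta=1.0):
    """Factor by which the odds of lower vs. higher categories change."""
    return math.exp(-delta * beta_k)


# Hypothetical coefficient, change in x_k, linear predictor, and cutpoints.
beta_k, delta, xb = 0.7, 2.0, 0.4
taus = [-1.0, 0.5, 2.0]

# Raising x_k by delta raises the linear predictor by beta_k * delta.
ratios = [
    cumulative_odds(xb + beta_k * delta, tau) / cumulative_odds(xb, tau)
    for tau in taus
]
print(ratios)                        # the same ratio at every cutpoint
print(factor_change(beta_k, delta))
```

The ratio does not depend on m or on the starting value of x, which is why a single odds ratio per variable summarizes the effect in this model.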

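• (listcoef continued) Note: the odds ratio in Stata is based on the high categories (toward SA = 4) versus the low categories (toward SD = 1).

The odds of men holding a more positive attitude toward the relationship between working mothers and their children are 0.48 times the odds for women (that is, about 52% lower).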

• The parallel regression assumption:
– the slope coefficients are identical across each regression
– if the assumption holds, the coefficients estimated from the separate binary regressions should be "close"

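• The three cumulative probability curves have the same slope at the point where each reaches 0.5:

$$\frac{\partial \Pr(y \le 1 \mid x)}{\partial x} = \frac{\partial \Pr(y \le 2 \mid x)}{\partial x} = \frac{\partial \Pr(y \le 3 \mid x)}{\partial x}$$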
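The equal-slope property can be checked numerically. The sketch below assumes a logistic F and a single independent variable. The slope coefficient and cutpoints are hypothetical, and the derivative is approximated by a central difference.

```python
import math


def cum_prob(x, tau, beta):
    """Pr(y <= m | x) = F(tau_m - beta * x) with a logistic F."""
    return 1.0 / (1.0 + math.exp(-(tau - beta * x)))


def slope(x, tau, beta, h=1e-6):
    """Central-difference approximation to d Pr(y <= m | x) / dx."""
    return (cum_prob(x + h, tau, beta) - cum_prob(x - h, tau, beta)) / (2 * h)


beta = 0.8                 # hypothetical common slope coefficient
taus = [-1.0, 0.5, 2.0]    # hypothetical cutpoints

# Each curve equals 0.5 where tau_m - beta * x = 0, i.e. at x = tau_m / beta.
slopes = [slope(tau / beta, tau, beta) for tau in taus]
print(slopes)              # each is approximately -beta / 4
```

The curves differ only by a horizontal shift of τ_m / β, so they are parallel. A separate β for each cumulative logit would break this pattern.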

• Testing the parallel regression assumption:
– brant, detail

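• A variable that appears to be ordinal is not necessarily suited to the ordered logit model.
• If the test indicates that the parallel regression assumption is violated, one alternative to consider is the multinomial logit model.

• If the proper ordering is ambiguous, models for nominal variables should be considered.

• Exercise on models for ordinal outcomes:
– a survey study of smoking habits and health status
– Stata dataset: smoking.dta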

Models for Nominal Outcomes

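• Nominal variable: a variable with more than two categories that have no inherent ranking, so they cannot be ordered from low to high.

• For example, a respondent's political affiliation may be coded as "Communist Party member", "member of a democratic party", or "no party affiliation". The type of work unit may be coded as "party or government agency", "state-owned enterprise", "collective enterprise or institution", "self-employed", "foreign-invested enterprise", or "other enterprise".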

Assumed level    True level of measurement
                 Nominal      Ordinal      Interval     Ratio
Nominal          OK           Inefficient  Inefficient  Inefficient
Ordinal          Biased       OK           Inefficient  Inefficient
Interval         Biased       Biased       OK           Inefficient
Ratio            Biased       Biased       Biased       OK

– Multinomial logit model (MNLM): the most frequently used nominal regression model
– Explained as a generalized linear model, the link function of the MNLM is

$$\ln \frac{\Pr(y = m)}{\Pr(y = n)}, \quad m = 1, 2, \dots, J-1$$

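• Formally, the MNLM can be written as:

$$\ln \Omega_{m \mid n}(\mathbf{x}_i) = \ln \frac{\Pr(y = m \mid \mathbf{x}_i)}{\Pr(y = n \mid \mathbf{x}_i)} = \mathbf{x}_i \boldsymbol{\beta}_{m \mid n}$$

for m = 1 to J, where n is the base category (it is important to know which category is the base).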
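The MNLM equation implies predicted probabilities that can be sketched in Python as follows. The three categories, the coefficients, and the function name are hypothetical and serve only to illustrate the role of the base category.

```python
import math


def mnlm_probs(x, betas, base):
    """Predicted probabilities for a multinomial logit model.

    betas maps each non-base category m to its coefficient list beta_{m|base};
    the base category's linear predictor is fixed at zero for identification.
    """
    xb = {m: sum(b * v for b, v in zip(coefs, x)) for m, coefs in betas.items()}
    xb[base] = 0.0
    denom = sum(math.exp(v) for v in xb.values())
    return {m: math.exp(v) / denom for m, v in xb.items()}


# Hypothetical coefficients (intercept, one regressor); category 1 is the base.
betas = {2: [-2.0, 0.15], 3: [-5.0, 0.40]}
x = [1.0, 14.0]            # constant term and a regressor value of 14

probs = mnlm_probs(x, betas, base=1)
print(probs)
print(math.log(probs[3] / probs[1]))   # equals x * beta_{3|1}
```

Taking the log of the ratio of any category's probability to the base category's probability recovers the linear predictor for that comparison.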

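• Holding all other variables constant, for each one-unit increase in $x_k$, the log odds of category m versus category n change by $\beta_{k,m \mid n}$ units.

– The MNLM can be thought of as simultaneously estimating binary logits for all comparisons among the dependent categories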

• Let occ3 be a nominal outcome with the categories M for manual jobs, W for white collar jobs, and P for professional jobs. Assuming there is a single independent variable ed measuring years of education, we can estimate three binary logits

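$$\ln \frac{\Pr(P \mid \mathbf{x})}{\Pr(M \mid \mathbf{x})} = \beta_{0,P \mid M} + \beta_{1,P \mid M}\,\text{ed}$$

$$\ln \frac{\Pr(W \mid \mathbf{x})}{\Pr(M \mid \mathbf{x})} = \beta_{0,W \mid M} + \beta_{1,W \mid M}\,\text{ed}$$

$$\ln \frac{\Pr(P \mid \mathbf{x})}{\Pr(W \mid \mathbf{x})} = \beta_{0,P \mid W} + \beta_{1,P \mid W}\,\text{ed}$$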

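• Although the dependent variable has three categories, in practice only two of these models are needed, because $\ln \frac{a}{b} = \ln a - \ln b$, that is,

$$\ln \frac{\Pr(P \mid \mathbf{x})}{\Pr(M \mid \mathbf{x})} - \ln \frac{\Pr(W \mid \mathbf{x})}{\Pr(M \mid \mathbf{x})} = \ln \frac{\Pr(P \mid \mathbf{x})}{\Pr(W \mid \mathbf{x})}$$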

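• Each binary logit model uses only the cases in the two categories being compared and excludes cases in the other categories, so each binary logit is estimated on a different sample. The resulting coefficients will therefore differ from those estimated by the multinomial logit model.

• In the multinomial logit model, all coefficients are estimated simultaneously.
• When the base category changes, the estimated coefficients change as well.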
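The effect of changing the base category can be illustrated with a small Python sketch. It assumes the occ3 categories M, W, and P with one regressor (ed). All coefficient values are hypothetical, and `rebase` is an illustrative helper rather than a Stata or SPost command.

```python
import math


def mnlm_probs(x, betas):
    """Probabilities from coefficient lists; the base category has all zeros."""
    scores = {m: math.exp(sum(b * v for b, v in zip(coefs, x)))
              for m, coefs in betas.items()}
    total = sum(scores.values())
    return {m: s / total for m, s in scores.items()}


def rebase(betas, new_base):
    """Re-express coefficients as beta_{m|n} = beta_{m|b} - beta_{n|b}."""
    ref = betas[new_base]
    return {m: [b - r for b, r in zip(coefs, ref)] for m, coefs in betas.items()}


# Hypothetical (intercept, ed) coefficients with M as the base category.
betas_vs_M = {"M": [0.0, 0.0], "W": [-2.0, 0.15], "P": [-5.0, 0.40]}
betas_vs_W = rebase(betas_vs_M, "W")

x = [1.0, 14.0]
print(betas_vs_W)                       # different coefficients ...
print(mnlm_probs(x, betas_vs_M))
print(mnlm_probs(x, betas_vs_W))        # ... same predicted probabilities
```

The coefficients depend on the base category, but the predicted probabilities do not, so the choice of base is a matter of presentation rather than of model fit.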

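• Example: Occupational attainment (file name: nomocc2)

$$\ln \Omega_{M \mid P}(\mathbf{x}_i) = \beta_{0,M \mid P} + \beta_{1,M \mid P}\,\text{white} + \beta_{2,M \mid P}\,\text{ed} + \beta_{3,M \mid P}\,\text{exper}$$

$$\ln \Omega_{B \mid P}(\mathbf{x}_i) = \beta_{0,B \mid P} + \beta_{1,B \mid P}\,\text{white} + \beta_{2,B \mid P}\,\text{ed} + \beta_{3,B \mid P}\,\text{exper}$$

$$\ln \Omega_{C \mid P}(\mathbf{x}_i) = \beta_{0,C \mid P} + \beta_{1,C \mid P}\,\text{white} + \beta_{2,C \mid P}\,\text{ed} + \beta_{3,C \mid P}\,\text{exper}$$

$$\ln \Omega_{W \mid P}(\mathbf{x}_i) = \beta_{0,W \mid P} + \beta_{1,W \mid P}\,\text{white} + \beta_{2,W \mid P}\,\text{ed} + \beta_{3,W \mid P}\,\text{exper}$$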

• Hypothesis testing
– confidence intervals, critical values, and p-values
– Because the dependent variable in the MNLM has more than two categories, each independent variable has several coefficients, so testing groups of coefficients is required
– Using test and lrtest to test groups of coefficients can be tedious; the mlogtest command makes the task simple
– Tests for combining dependent categories: mlogtest

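• Four methods of interpretation:
– predict computes predicted probabilities
– prvalue or prtab computes predictions for "typical" cases (profiles)
– prchange computes the marginal change or discrete change in the outcome
– listcoef computes odds ratios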

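• Interpretation with listcoef (odds ratio – factor change coefficient):
– When the independent variable $x_k$ increases by δ units, the odds of category m versus category n change according to the following formula:

$$\frac{\Omega_{m \mid n}(\mathbf{x}, x_k + \delta)}{\Omega_{m \mid n}(\mathbf{x}, x_k)} = e^{\beta_{k,m \mid n}\,\delta}$$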

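• If δ is 1: holding all other variables constant, for each one-unit increase in $x_k$, the odds of category m versus category n change by a factor of $e^{\beta_{k,m \mid n}}$:

$$\frac{\Omega_{m \mid n}(\mathbf{x}, x_k + 1)}{\Omega_{m \mid n}(\mathbf{x}, x_k)} = \exp(\beta_{k,m \mid n})$$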

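• The independence of irrelevant alternatives (IIA) assumption

$$\frac{\Pr(y = m \mid \mathbf{x})}{\Pr(y = n \mid \mathbf{x})} = \exp\{\mathbf{x}(\boldsymbol{\beta}_{m \mid b} - \boldsymbol{\beta}_{n \mid b})\}$$

This means that the odds of category m versus category n are not affected by the other categories.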
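What IIA implies can be seen in a small Python sketch: dropping a third category leaves the odds between the remaining two unchanged. The linear predictors are hypothetical values for a single individual, and the category labels M, W, and P follow the occ3 example.

```python
import math


def choice_probs(linear_predictors):
    """Multinomial logit probabilities from each category's linear predictor."""
    total = sum(math.exp(u) for u in linear_predictors.values())
    return {m: math.exp(u) / total for m, u in linear_predictors.items()}


# Hypothetical values of x * beta_{m|b} for one individual (base b = M).
full = {"M": 0.0, "W": 0.1, "P": 0.6}
reduced = {m: u for m, u in full.items() if m != "W"}   # drop category W

p_full = choice_probs(full)
p_reduced = choice_probs(reduced)

odds_full = p_full["P"] / p_full["M"]
odds_reduced = p_reduced["P"] / p_reduced["M"]
print(odds_full, odds_reduced, math.exp(full["P"] - full["M"]))
```

The individual probabilities change when W is removed, but the odds of P versus M stay the same. This is the property that the tests of IIA examine.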

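• Testing IIA
– mlogtest, hausman base
– mlogtest, smhsiao

• These two tests are not reliable, and their results often contradict each other.
• In the end, assessing the IIA assumption rests on the analyst's own judgment.

• Exercise on models for nominal outcomes:
– a survey study of smoking habits and health status
– Stata dataset: smoking.dta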
