

EVENT COUNT ANALYSIS VS. ITEM RESPONSE THEORY: A COMPARATIVE INVESTIGATION

Tse-min Lin
The University of Texas at Austin


Etsuhiro Nakamura
Ehime University


with

Dorothy Morgan, Ariel Helfer

The University of Texas at Austin

Prepared for presentation at the 2014 Asian Political Methodology Meeting, Tokyo, Japan, January 6-7. A part of this paper is adapted from Tse-min Lin, Ariel Helfer, and Dorothy Morgan, “Event Count Models in Survey Research,” paper presented at the 2011 Annual Meeting of the American Political Science Association, Seattle, Washington, September 1-4.


Abstract

Traditionally, survey researchers have used binomial or extended beta-binomial regressions to analyze event count data composed of binary item responses. The adequacy of these models, however, depends on the assumption that those responses are generated from identically and independently distributed Bernoulli trials. Recently, researchers have turned to item response theory (IRT) to estimate the latent trait underneath the binary responses and use it to scale the attitude or behavior in question. Since the event count model and the IRT model involve different assumptions, they are different theoretical models with different empirical implications. This paper examines their assumptions and conducts Monte-Carlo simulations to compare the two models. Our results show that the IRT model is not as good as the event count model when each is applied to data generated from the other. We conclude that the IRT model should be used with care when the true data generating process is unknown.


Introduction

In survey research, it is not uncommon that an index is constructed as a sum of binary survey items that are coded {0, 1}. The index is then a count of the “events” indicated by positive responses to the binary items. For example, an index of political participation may be constructed out of items that ask whether a respondent engaged in certain political activities, such as “try to persuade,” “display preferences,” “attend meetings,” “do political work,” and “give money.” When the researcher has a theory relating the index to a set of explanatory variables, a model is then specified and estimated to test the theory.

It is well known that when the dependent variable of a linear regression model is an event count, which is discrete and nonnegative, the normality and homoscedasticity assumptions of the classical linear regression model no longer hold, and, as a consequence, OLS estimators are no longer the best linear unbiased estimators (BLUE). Models specifically designed to take account of the data generating process of event counts have been developed. When the count is theoretically unbounded, the Poisson and negative binomial regressions are appropriate. When the count is necessarily finite, as in survey research, the binomial and (extended) beta-binomial regressions are more appropriate. These models can be estimated by maximum likelihood (MLE).

In this paper, we focus on finite event count models as applied to survey data. Both the binomial and beta-binomial models have been widely used (King 1989). The adequacy of these models, however, depends on the assumption that the binary responses constituting the event count are generated from identically and independently distributed Bernoulli trials. If this assumption is violated, these models may be rendered questionable. Recently, researchers have turned to item response theory (IRT) to estimate the latent trait underneath the binary responses (Schrodt 2007; Gillion 2009). The estimated trait is then used in lieu of the event count as a scale of the attitude or behavior in question.

Since the event count model and the IRT model involve different assumptions, they are different theoretical models with different empirical implications. This paper compares the two modeling approaches in the context of survey research. We examine their theoretical assumptions and conduct Monte-Carlo simulations to evaluate their statistical performance. Our findings, surprisingly, show that the IRT model is not as good as the event count model when each is applied to data generated from the other. They suggest that the IRT model should be used with care when the true data generating process is unknown.

This paper proceeds as follows. We first introduce the probability distributions used in (finite) event count analysis. We then discuss the issue of heterogeneity and the adequacy of traditional event count models. Next, we compare the theoretical assumptions of event count models and IRT models. Lastly, we conduct Monte-Carlo simulations and discuss our results before making some concluding remarks.

Event Counts and Their Probability Distributions

The simplest type of event count is whether an event has occurred at all. Called a Bernoulli trial, this situation has only two possible outcomes: success (which we call x = 1) and failure (which we call x = 0). The probability of success is π, which is a constant. Formally, a random variable X follows a Bernoulli distribution with parameter π when

$$X = \begin{cases} 1 & \text{with probability } \pi \\ 0 & \text{with probability } 1-\pi \end{cases} \qquad \text{where } 0 \le \pi \le 1.$$

More complicated event counts can all be understood in terms of Bernoulli trials. If we consider n independent Bernoulli trials $X_j$ (j = 1, 2, …, n) and each trial has the same probability of success π, then the total number of successes out of n trials, $Y = \sum_{j=1}^{n} X_j$, is a random variable that follows a binomial distribution, which is given by the probability mass function (pmf)

$$\Pr(Y = y) = \binom{n}{y} \pi^{y} (1-\pi)^{n-y}$$

where y = 0, …, n and 0 ≤ π ≤ 1. For example, Y might be the number of correct answers in a battery of political knowledge questions or the number of “yes” responses to a battery of political participation questions. The assumptions underlying the binomial distribution are that the number of trials is finite, the trials are independent, and the probability π of success is identical for the trials. Changing these basic and well-known assumptions yields other event counts. For example, if the probability of success is kept constant across items but allowed to vary across individuals according to the beta distribution, we have a beta-binomial distribution. When modeling event counts with the binomial distribution, researchers often find more variance than the binomial model implies. The beta-binomial distribution is a popular generalization of the binomial distribution that allows for this over-dispersion.
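As a quick numerical check of the binomial pmf, the following Python sketch (standard library only; the function names and the values n = 6, π = 0.3 are our own illustrative choices) enumerates every outcome of n independent Bernoulli(π) trials, accumulates the probability of each total, and compares the result with the closed-form pmf and with the moments nπ = 1.8 and nπ(1 − π) = 1.26.

```python
import math
from itertools import product


def binomial_pmf(y: int, n: int, pi: float) -> float:
    """Closed-form binomial pmf: C(n, y) * pi^y * (1 - pi)^(n - y)."""
    return math.comb(n, y) * pi**y * (1 - pi) ** (n - y)


def sum_of_bernoullis_pmf(n: int, pi: float) -> list:
    """Distribution of Y = X_1 + ... + X_n by brute-force enumeration of all 2^n outcomes."""
    pmf = [0.0] * (n + 1)
    for outcome in product((0, 1), repeat=n):
        y = sum(outcome)
        # independence: the joint probability is the product of the Bernoulli probabilities
        pmf[y] += pi**y * (1 - pi) ** (n - y)
    return pmf


n, pi = 6, 0.3
enumerated = sum_of_bernoullis_pmf(n, pi)
closed_form = [binomial_pmf(y, n, pi) for y in range(n + 1)]
mean = sum(y * p for y, p in enumerate(closed_form))
var = sum((y - mean) ** 2 * p for y, p in enumerate(closed_form))
print("mean:", mean, "vs n*pi =", n * pi)
print("variance:", var, "vs n*pi*(1-pi) =", n * pi * (1 - pi))
```

The enumeration grows as 2^n, so it is only a check for small n; the closed form is what one would use in practice.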

Prentice’s Extended Beta-Binomial Distribution

The beta-binomial distribution is a mixture of a beta distribution and a binomial distribution. To derive the pmf of the beta-binomial random variable $Y_{bb}$, we first multiply the conditional pmf of a binomial variable $Y \mid \pi$ by the probability density function (pdf) of π, assumed to be a continuous random variable that follows a beta distribution, and then we integrate the resulting joint probability function of Y and π over the domain of π (i.e., [0,1]) to get the unconditional, marginal distribution of $Y_{bb}$:

$$\begin{aligned}
\Pr(Y_{bb} = y) &= \int_0^1 \binom{n}{y} \pi^{y} (1-\pi)^{n-y} \cdot \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \pi^{\alpha-1} (1-\pi)^{\beta-1} \, d\pi \\
&= \binom{n}{y} \frac{\Gamma(\alpha+\beta)\,\Gamma(y+\alpha)\,\Gamma(n-y+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)\,\Gamma(n+\alpha+\beta)} \\
&= \binom{n}{y} \frac{\mathrm{B}(y+\alpha,\; n-y+\beta)}{\mathrm{B}(\alpha,\beta)}
\end{aligned}$$

Note that Γ(·) and B(·,·) refer to the gamma function and beta function, respectively. The expected value and variance of the beta-binomial distribution are given by:

$$E(Y_{bb}) = \frac{n\alpha}{\alpha+\beta}, \qquad \mathrm{Var}(Y_{bb}) = \frac{n\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^2(\alpha+\beta+1)}$$
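The pmf and moments above can be checked numerically. The sketch below is an illustration under our own choices (α = 2, β = 3, n = 6, and the helper names are hypothetical); it evaluates the Beta-function form of the pmf through log-gamma values for numerical stability and compares the resulting mean and variance with the closed-form expressions. For these values the formulas give a mean of 2.4 and a variance of 2.64, against a binomial variance of 1.44 at the same mean, which is the over-dispersion discussed in the text.

```python
import math


def log_beta(p: float, q: float) -> float:
    """log B(p, q) computed from log-gamma values for numerical stability."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def beta_binomial_pmf(y: int, n: int, alpha: float, beta: float) -> float:
    """Pr(Y_bb = y) = C(n, y) * B(y + alpha, n - y + beta) / B(alpha, beta)."""
    return math.comb(n, y) * math.exp(log_beta(y + alpha, n - y + beta) - log_beta(alpha, beta))


n, alpha, beta = 6, 2.0, 3.0
pmf = [beta_binomial_pmf(y, n, alpha, beta) for y in range(n + 1)]
mean = sum(y * p for y, p in enumerate(pmf))
var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))

mean_formula = n * alpha / (alpha + beta)
var_formula = n * alpha * beta * (alpha + beta + n) / ((alpha + beta) ** 2 * (alpha + beta + 1))
pi_bar = alpha / (alpha + beta)
binomial_var = n * pi_bar * (1 - pi_bar)  # variance if pi were fixed at its mean

print("mean:", mean, "formula:", mean_formula)
print("variance:", var, "formula:", var_formula, "binomial:", binomial_var)
```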

In his influential 1986 paper, Prentice extended the beta-binomial distribution to allow for negative correlations among identically distributed Bernoulli trials (e.g., survey items) within an experimental unit (e.g., an individual). To do this, he used the following parameterization of the beta-binomial distribution:

$$\rho = \frac{\alpha}{\alpha+\beta}, \qquad \gamma = \frac{1}{\alpha+\beta}$$

$$\Pr(Y_{bb} = y;\, n) = \binom{n}{y} \frac{\prod_{j=0}^{y-1} (\rho + \gamma j) \prod_{j=0}^{n-y-1} (1 - \rho + \gamma j)}{\prod_{j=0}^{n-1} (1 + \gamma j)}$$

$$E(Y_{bb}) = n\rho, \qquad \mathrm{Var}(Y_{bb}) = n\rho(1-\rho)\left[1 + (n-1)\gamma(1+\gamma)^{-1}\right]$$

The re-parameterization yields the more intuitive interpretation of ρ as the expected value of the binomial parameter π. The parameter γ has a less intuitive interpretation, but it is commonly viewed as a measure of dispersion that is related to the variance of Y. Since γ is based on the transformation $\gamma = (\alpha+\beta)^{-1}$ and since α > 0 and β > 0, γ must also be greater than 0 by inheritance. Prentice's insight was that γ need not be restricted to positive values; the re-parameterization made this inheritance unnecessary so long as the following condition held:

$$\gamma \ge \max\left\{ -\rho\,(n-1)^{-1},\; -(1-\rho)(n-1)^{-1} \right\}$$

The dispersion parameter γ is important because it gives rise to the correlation coefficient:

$$\delta = \frac{\gamma}{1+\gamma}$$

According to Prentice, δ is the pairwise correlation between the binary trials that comprise an event count. By allowing the dispersion parameter γ, and by extension δ, to take on negative values, Prentice extended the beta-binomial distribution to allow for under-dispersion.
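To see the extension at work, the sketch below evaluates Prentice's product-form pmf directly, so that γ is free to be negative. The values n = 6, ρ = 0.4, γ = −0.05 and the helper names are illustrative assumptions of ours; with n = 6 and ρ = 0.4 the admissible lower bound is max{−0.08, −0.12} = −0.08, so γ = −0.05 is allowed. The sketch checks that the probabilities remain valid, that the mean is nρ, and that the variance matches the formula above and falls below the binomial value nρ(1 − ρ) = 1.44.

```python
import math


def ebb_pmf(y: int, n: int, rho: float, gamma: float) -> float:
    """Extended beta-binomial pmf in Prentice's (rho, gamma) parameterization."""
    num = 1.0
    for j in range(y):
        num *= rho + gamma * j
    for j in range(n - y):
        num *= 1 - rho + gamma * j
    den = 1.0
    for j in range(n):
        den *= 1 + gamma * j
    return math.comb(n, y) * num / den


def gamma_lower_bound(n: int, rho: float) -> float:
    """Smallest admissible gamma: max{-rho/(n-1), -(1-rho)/(n-1)}."""
    return max(-rho / (n - 1), -(1 - rho) / (n - 1))


def ebb_variance(n: int, rho: float, gamma: float) -> float:
    """n*rho*(1-rho)*[1 + (n-1)*gamma/(1+gamma)]."""
    return n * rho * (1 - rho) * (1 + (n - 1) * gamma / (1 + gamma))


n, rho, gamma = 6, 0.4, -0.05  # gamma < 0: negative within-unit correlation
assert gamma >= gamma_lower_bound(n, rho)  # the bound is -0.08 here

pmf = [ebb_pmf(y, n, rho, gamma) for y in range(n + 1)]
mean = sum(y * p for y, p in enumerate(pmf))
var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
delta = gamma / (1 + gamma)  # Prentice's pairwise correlation

print("all probabilities >= 0:", min(pmf) >= 0, " total:", sum(pmf))
print("mean:", mean, "vs n*rho =", n * rho)
print("variance:", var, "formula:", ebb_variance(n, rho, gamma), "binomial:", n * rho * (1 - rho))
print("delta:", delta)
```

With a positive γ = (α + β)⁻¹ and ρ = α/(α + β), the same function reproduces the classical beta-binomial pmf, which is what makes this parameterization an extension rather than a different model.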

King’s Exposition of the Extended Beta-Binomial

As developed by Prentice, the extended beta-binomial distribution is a mixture distribution formed by treating the π of the binomial distribution as a beta-distributed random variable. The extended beta-binomial distribution presumes that the Bernoulli trials that comprise the event count are identically and independently distributed. It is only under this iid assumption that the sum of the Bernoullis can be a binomial whose parameter π is then allowed to vary. In the context of survey research, this means an individual respondent is assumed to have the same probability of responding positively to all the binary items comprising a composite index, but that probability is allowed to vary randomly across individuals according to the beta distribution. For each individual, the index (an event count) is a binomial random variable, but when all individuals are considered, the index becomes a random variable following the extended beta-binomial distribution.

King (1989), however, interpreted the extended beta-binomial distribution as based on “weakening the binomial assumption that the unobserved binary random variables making up Yi have constant π” (p. 45). This interpretation is inconsistent with Prentice's original model, and the inconsistency has been pointed out by Palmquist (1997). Nevertheless, the idea that the iid assumption about the Bernoulli trials is too restrictive is important. In the study of political participation, for example, it is clearly unreasonable to assume that an individual respondent has the same probability of engaging in “persuading others,” “displaying preferences,” “attending meetings,” “giving money,” etc. A more realistic model for event count analysis in survey research must allow for the weakening of the binomial assumption, as King suggested. In a later section, we introduce the Poisson-binomial distribution, which does just that. First, however, we review Palmquist's unpublished work, which has contributed significantly to the subject.

Palmquist’s Correction of King

Palmquist's work centered largely on the fact that the variance of the beta-binomial distribution is consistently larger than the variance of the simple binomial distribution (though under some special circumstances, covered by the extended beta-binomial distribution, it may be smaller). The binomial distribution has the familiar properties

$$E(Y_{binomial}) = n\pi, \qquad \mathrm{Var}(Y_{binomial}) = n\pi(1-\pi)$$

This simple distribution can be represented graphically as in Figure 1a, where the squares represent Bernoulli random variables, each with parameter π. These Bernoullis are arranged in units of n items (here, n = 6), each indexed by i to denote an individual. The variable of interest, $Y_i$, is the sum of all the Bernoulli outcomes in a unit.

(Figure 1 about here)

The greater variance of the beta-binomial distribution can be understood as a result of letting π become a random variable. We diagram that scenario in Figure 1b. In that figure, the circle represents a beta distribution with parameters α and β, and the arrows from it indicate that $\pi_i$ is a beta-distributed random variable. Because this change from the binomial distribution creates a higher $\mathrm{Var}(Y_i)$, the additional variance is often referred to as “extra-binomial” variance.

It is important to note that, in describing these models, we are not allowing for the re-parameterization of our parameters as functions of some set of covariates. In a real-world application of a distribution like the one diagrammed above (imagine again that each i indexes a survey respondent and each Bernoulli variable is a survey question), we might well want to let the beta distributions' parameters vary as a function of the respondents' characteristics. Indeed, it would be hard to imagine an application in which we would be justified in asserting that the parameters of the Bernoulli variables or of the beta distributions would be the same for all individuals. However, even allowing for such individual-specific, covariate-explained variation, all of our theoretical results (i.e., those in the ensuing sections) would continue to hold conditional on the covariates. Our findings would then refer, for example, to $\mathrm{Var}(Y_i \mid x_i'\beta)$, but, of course, the unconditional variance $\mathrm{Var}(Y_i)$ would now include the additional effects of the variation in the x's.

Indeed, the variance of $Y_i$ will continue to be a primary focus as we proceed, as it was for Palmquist. His work was thorough in clarifying how much extra-binomial variance we should expect in data that follow a beta-binomial distribution as opposed to a simple binomial distribution. In light of those considerations, Palmquist introduced some interesting complications into the beta-binomial model. He found that under certain circumstances, the extra-binomial variance would be reduced; in the limiting case, it could even disappear completely. Figure 1c illustrates the modified beta-binomial model that Palmquist explored.

In this picture, each n = 6 unit has been divided into two “clusters” indexed by j, each with $n_j$ = 3 Bernoullis. Within each cluster, $\pi_{ij}$ is the same across Bernoullis, but each cluster's $\pi_{ij}$ is determined independently by one of two independent beta distributions. Palmquist conclusively showed that in such a scenario, the variance of Y is lower than in the standard beta-binomial case. Some, but not all, of the extra-binomial variance that we usually see in the beta-binomial distribution is offset by the division of the Bernoullis into clusters. (This is analogous to a well-known result: the sum of independent binomial variables with different success probabilities has a variance equal to or less than that of a single binomial variable with the same total number of trials and the average success probability.)

Palmquist took this one step further. He showed that if the number of clusters is increased to the limit, that is, if we let there be as many clusters as Bernoullis so that each cluster contains only one Bernoulli variable, then all of the extra-binomial variance disappears. Such a case is no different from a simple binomial distribution with π equal to the average of the expectations of all n beta distributions.
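The effect of clustering can be computed directly. The sketch below assumes, for illustration only, that every cluster draws its own π independently from the same beta distribution with ρ = γ = 0.5 (Palmquist's setup allows the cluster betas to differ). Because the cluster counts are then independent beta-binomial variables, their variances simply add. Under these assumptions the variance of a six-item unit falls from 4.0 with one cluster (Figure 1b) to 2.5 with two clusters (Figure 1c) and reaches the binomial value 1.5 when every item forms its own cluster.

```python
def ebb_variance(n: int, rho: float, gamma: float) -> float:
    """Var of an (extended) beta-binomial count over n items that share one pi."""
    return n * rho * (1 - rho) * (1 + (n - 1) * gamma / (1 + gamma))


def clustered_variance(cluster_sizes: list, rho: float, gamma: float) -> float:
    """Var(Y) when each cluster draws its own pi independently from beta(rho, gamma).

    The cluster counts are independent, so their variances add.
    """
    return sum(ebb_variance(m, rho, gamma) for m in cluster_sizes)


rho, gamma = 0.5, 0.5  # hypothetical values shared by every cluster's beta distribution
designs = {
    "one cluster of 6 (Figure 1b)": [6],
    "two clusters of 3 (Figure 1c)": [3, 3],
    "three clusters of 2": [2, 2, 2],
    "six clusters of 1": [1] * 6,
}
for label, sizes in designs.items():
    print(f"{label:32s} Var(Y) = {clustered_variance(sizes, rho, gamma):.4f}")
print("binomial variance:", 6 * rho * (1 - rho))
```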

While Palmquist's attempt to introduce within-unit heterogeneity used multiple beta distributions, we can easily imagine a scenario in which that is not the case. Take the data generating process illustrated in Figure 1d. In this distribution, a single beta distribution determines a full set of π-values, one for each of the n items in a unit, and these remain unchanged across units. Here we have within-unit heterogeneity but no between-unit heterogeneity. This distribution is known as the “Poisson-binomial” distribution, and it is discussed in depth in the next section.

Finally, we note that between-unit heterogeneity is easily introduced into the Poisson-binomial. In fact, Palmquist also explored the somewhat trivial special case represented in Figure 1e. In this special case, as Palmquist noted, the under-dispersion resulting from the within-unit heterogeneity exactly cancels out the over-dispersion caused by the across-unit heterogeneity. The result is a distribution with binomial variance.

In a typical political science application (say, for example, that the unit of observation is a survey respondent and the Bernoulli variables are survey items), we must be careful to identify the data generating process most likely to resemble the nature of the experimental design. Do the survey items all have the same probability of being answered “1” as opposed to “0”? Are there “clusters” of questions with similar probabilities? Is there likely to be random variation from one respondent to the next? As we have seen, the variety of ways to model these seemingly simple data raises a number of questions. What may at first appear to be a simple binomial distribution may in fact have subtle complications that seriously affect the distribution's variance. Of course, in order to use these variations on the binomial distribution, we must explore them more rigorously than has been done in this section. The following section explores the mathematics underlying the various ways of introducing heterogeneity into the binomial model.

The Issue of Heterogeneity

Deciding whether to use one variant of the event count model over another depends on the nature of the heterogeneity at hand. If parameters are allowed to vary, they are heterogeneous; if they are assumed to be constants, they are not. Using a binomial distribution to model an event count in survey research assumes homogeneity in two senses: first, the probabilities of success in the n survey items are constant, and second, the probabilities of success across the N individuals are constant. These assumptions can be expressed formally as follows:

$$Y_i \sim \text{binomial}(n, \pi)$$

where $Y_i$ is a binomial random variable, n and π are constant parameters, and i = 1, 2, …, N refers to individuals. Note that the parameter π has no subscripts, which means π is the same across survey items and across individuals. But assuming homogeneity across individuals and across items is often unrealistic. For example, it may be unrealistic to assume that each individual has the same probability of correctly identifying political figures in a battery of questions designed to measure political knowledge; instead, the researcher may want to allow for heterogeneity across individuals. Alternatively, it may be unrealistic to assume the same probability of a correct answer across survey items; instead, the researcher may want to allow for heterogeneity across items. Allowing for heterogeneity across individuals is quite common and leads to the use of the beta-binomial or extended beta-binomial distribution. But allowing for heterogeneity across items is rarely done or recognized. All too often, researchers intend to allow for heterogeneity across items but actually allow for heterogeneity across individuals. This is an understandable error; after all, the empirical footprints of the two types of heterogeneity are virtually indistinguishable. Accordingly, it is useful and necessary to clarify further the distinction between the two types of heterogeneity in survey research.

Heterogeneity across Sets of Trials

To allow for heterogeneity across individuals, we allow the probability of success π for a binomial variable $Y_i$ to vary across individuals according to a beta distribution with parameters ρ and γ. This yields a random variable that follows a beta-binomial or extended beta-binomial distribution:

$$Y_i \mid \pi_i \sim \text{binomial}(n, \pi_i), \qquad \pi_i \sim \text{beta}(\rho, \gamma)$$

Each individual i is associated with n items, which are independent Bernoulli trials with parameter $\pi_i$, and each $\pi_i$ is drawn from a beta distribution with parameters ρ and γ. The parameter $\pi_i$ varies across individuals, but not across items. This is because a binomial random variable is, by definition, a sum of independent, identical Bernoulli variables. If $\pi_i$ varied across items, we would have independent but non-identical Bernoulli variables, which means we would no longer have binomial variables.

By using the extended beta-binomial distribution for event counts, we assume that we observe $Y_i$ successes for individual i in n independent, identical Bernoulli trials with probability of success $\pi_i$, where $\pi_i$ is a random variable that follows a beta distribution. By allowing $\pi_i$ to follow the beta distribution, we generalize the binomial distribution; when all the respondents' probabilities of success are the same (i.e., when π is a constant), the extended beta-binomial distribution reduces to the binomial distribution with parameters n and π.

The purpose of generalizing the binomial distribution via the extended beta-binomial is to account for heterogeneity across individuals. But the extended beta-binomial cannot handle heterogeneity within individuals, or heterogeneity across items, because by starting with a binomial distribution and allowing the parameter π to vary, we already assume that there is no heterogeneity across the independent Bernoulli trials. That is why the extended beta-binomial distribution allows the probability of success to differ across individuals but requires it to be the same across each individual's survey items.

Heterogeneity within Sets of Trials

To account for heterogeneity across independent Bernoulli trials, we must therefore start with a Bernoulli distribution, not with a binomial distribution. To derive this kind of distribution for event counts, we need the distribution of a sum of independent but non-identical Bernoulli trials, in other words, of the so-called Poisson trials (Pepper 1929; Feller 1968). The distribution of the sum of Poisson trials is called the Poisson-binomial distribution.1

The Poisson-Binomial

Let $Y_{pb}$ denote the number of successes in n independent, non-identical Bernoulli trials. When the trials are identical, $Y_{pb}$ collapses into the familiar binomial distribution with parameters n and π, where π denotes the probability of success at each jth trial (j = 1, …, n). When the trials are not identical and the jth trial has its own distinct probability of success $\pi_j$, the distribution of $Y_{pb}$ is known as the Poisson-binomial.

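Assuming that $0 \le \pi_j \le 1$ and letting $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_n)$, the Poisson-binomial distribution (Wang 1993) is given by

$$\Pr(Y_{pb} = y \mid \boldsymbol{\pi}) = \sum_{A \in F_y} \prod_{j \in A} \pi_j \prod_{j \in A^c} (1 - \pi_j)$$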

1 Poisson was the first to consider extending the binomial distribution to non-identical Bernoulli trials; the names “Poisson-binomial distribution” and “Poisson’s binomial distribution” reflect this piece of statistical history. The name “Poisson-binomial distribution” is more popular (see Le Cam 1960; Hodges and Le Cam 1960; Edwards 1960; and Chen 1974).

where n is a positive integer denoting the number of Bernoulli trials and y = 0, 1, …, n.2 Appendix 1 provides an intuitive exposition of the Poisson-binomial distribution. The expectation and variance of $Y_{pb}$ are given by:

$$E(Y_{pb}) = \sum_{j=1}^{n} \pi_j, \qquad \mathrm{Var}(Y_{pb}) = \sum_{j=1}^{n} \pi_j (1 - \pi_j)$$
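The subset-sum formula runs over all subsets of the n items, which grows as 2^n. A common alternative, used in the sketch below as a swapped-in technique rather than as Wang's own method, adds one independent trial at a time: after each trial, the probability of k successes is the previous probability of k times (1 − π_j) plus the previous probability of k − 1 times π_j. The sketch also evaluates the subset formula directly for comparison and checks the moments. The item probabilities are hypothetical; for them the mean is 2.1 and the variance is 0.63.

```python
import math
from itertools import combinations


def poisson_binomial_pmf(pis):
    """Pr(Y_pb = y), y = 0..n, built up by adding one independent Bernoulli(pi_j) trial at a time."""
    pmf = [1.0]  # distribution of the sum of zero trials
    for p in pis:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1 - p)  # trial j fails
            new[k + 1] += mass * p    # trial j succeeds
        pmf = new
    return pmf


def poisson_binomial_pmf_subsets(pis, y):
    """Direct evaluation of the subset-sum formula: sum over all subsets A of size y."""
    n = len(pis)
    total = 0.0
    for A in combinations(range(n), y):
        chosen = set(A)
        total += math.prod(pis[j] if j in chosen else 1 - pis[j] for j in range(n))
    return total


pis = [0.1, 0.4, 0.7, 0.9]  # hypothetical item-specific success probabilities
pmf = poisson_binomial_pmf(pis)
mean = sum(y * p for y, p in enumerate(pmf))
var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
print("pmf:", [round(p, 4) for p in pmf])
print("mean:", mean, "vs sum of pi_j =", sum(pis))
print("variance:", var, "vs sum of pi_j(1 - pi_j) =", sum(p * (1 - p) for p in pis))
```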

Recall that the expected value and variance of an extended beta-binomial distribution (denoted by Y) are

$$E(Y) = n\rho, \qquad \mathrm{Var}(Y) = n\rho(1-\rho) + n(n-1)\rho(1-\rho)\gamma(1+\gamma)^{-1}$$

Note that the first term of Var(Y) is the variance for a binomial distribution and the second term is the so-called “extra-binomial” variance that allows for over-dispersion in the beta-binomial. With a positive γ, this extra-binomial term is the reason that heterogeneity across individuals leads to a variance that is greater than the binomial variance (Palmquist 1997).

2 The sets are defined as follows:

$$F_x = \{ A : A \subseteq \{1, \ldots, n\},\ |A| = x \}, \qquad P_x = \{ \pi(A) \in \Re^x : A \in F_x,\ \pi(A) \text{ is a permutation} \}$$

where |A| denotes the number of elements of A, and $A^c$ denotes the complement of A. If $A \in F_x$, then $A = (i_1, \ldots, i_x)$ is an ordered set such that $i_j < i_k$ if j < k, and $\pi(A) = (\pi(i_1), \ldots, \pi(i_x))$ is a permutation of the elements of A. $F_n$ contains the n-tuple (1, …, n), $|F_x| = \binom{n}{x}$ is the number of subsets of size x of {1, …, n}, and $|P_n| = n!$ is the number of permutations of {1, …, n}. For each fixed $A \in F_x$, there are x! corresponding elements in $P_x$, which is the number of permutations of the elements of A; hence

$$|P_x| = \binom{n}{x}\, x! = \frac{n!}{(n-x)!} = \frac{|P_n|}{(n-x)!}.$$

In contrast, heterogeneity across items leads to a variance that is smaller than the binomial variance. Let $\bar{\pi} = n^{-1} \sum_{j=1}^{n} \pi_j$. We can see that the variance of the Poisson-binomial distribution is necessarily smaller than or equal to the variance of a binomial distribution with the same mean, $n\bar{\pi}(1-\bar{\pi})$, by using a special case of the Cauchy-Schwarz inequality:

$$\begin{aligned}
\left( \sum_{j=1}^{n} \pi_j \right)^2 &\le n \sum_{j=1}^{n} \pi_j^2 \\
\sum_{j=1}^{n} \pi_j - \frac{1}{n} \left( \sum_{j=1}^{n} \pi_j \right)^2 &\ge \sum_{j=1}^{n} \pi_j - \sum_{j=1}^{n} \pi_j^2 \\
n\bar{\pi}(1 - \bar{\pi}) &\ge \sum_{j=1}^{n} \pi_j (1 - \pi_j)
\end{aligned}$$
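The inequality can also be read as an identity: expanding both sides shows that the binomial variance exceeds the Poisson-binomial variance by exactly the spread of the item probabilities, $n\bar{\pi}(1-\bar{\pi}) - \sum_j \pi_j(1-\pi_j) = \sum_j (\pi_j - \bar{\pi})^2 \ge 0$, with equality only when all $\pi_j$ are equal. The Python sketch below (the random probability vectors and the seed are arbitrary choices of ours) checks both the inequality and this identity numerically.

```python
import random


def poisson_binomial_variance(pis):
    """Sum of pi_j * (1 - pi_j)."""
    return sum(p * (1 - p) for p in pis)


def matched_binomial_variance(pis):
    """Binomial variance n * pbar * (1 - pbar), i.e., with the same mean as the Poisson-binomial."""
    n = len(pis)
    pbar = sum(pis) / n
    return n * pbar * (1 - pbar)


rng = random.Random(2014)
max_gap_error = 0.0
for _ in range(10_000):
    pis = [rng.random() for _ in range(rng.randint(1, 20))]
    pbar = sum(pis) / len(pis)
    gap = matched_binomial_variance(pis) - poisson_binomial_variance(pis)
    spread = sum((p - pbar) ** 2 for p in pis)  # sum_j (pi_j - pbar)^2
    assert gap >= -1e-12  # under-dispersion relative to the matched binomial
    max_gap_error = max(max_gap_error, abs(gap - spread))
print("largest |gap - spread| over 10,000 draws:", max_gap_error)
```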

This is the result that Palmquist (1997) reported in his discussion of parameter heterogeneity (although he did not identify the Poisson-binomial distribution as the data generating process characterized by heterogeneity across items). From this result, we can conclude, as Palmquist did, that across-item heterogeneity will lead to under-dispersion of the dependent variable relative to across-item homogeneity. In other words, heterogeneity across items will produce a smaller variance than the binomial variance, but heterogeneity across individuals will produce a larger variance. If both types of heterogeneity are present in survey data, they can counteract each other's effects. The big question for event count analysis, therefore, is whether traditional event count models remain adequate under this circumstance.

Following King's (1989) insight, we speculate that the extended beta-binomial model will remain adequate. This is because, with Prentice's extension of γ to the negative domain, the “extra-binomial” variance can turn negative and reduce the variance below the binomial variance. Because of this flexibility, the extended beta-binomial model can accommodate both over- and under-dispersion and, hence, both across-individual and across-item heterogeneity. We demonstrate this with examples below.

Examples of the Poisson-Binomial and the Adequacy of the Extended Beta-Binomial Model

As an example of the Poisson-binomial, assume that each probability of success $\pi_j$ associated with the Bernoulli random variable $X_j$ (j = 1, …, n) is drawn independently from a beta distribution with parameters ρ and γ, i.e.,

$$X_j \mid \pi_j \sim \text{Bernoulli}(\pi_j), \qquad \pi_j \sim \text{beta}(\rho, \gamma)$$

with

$$E(X_j \mid \pi_j) = \pi_j, \qquad E(\pi_j) = \rho$$

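We derive the unconditional distribution of $X_j$ by first multiplying the conditional distribution of $X_j \mid \pi_j$ by the distribution of $\pi_j$ to get the joint distribution of $X_j$ and $\pi_j$, and then integrating this joint distribution over the domain of $\pi_j$, which is [0,1]:

$$\begin{aligned}
\Pr(X_j = x_j) &= \int_0^1 \pi_j^{x_j} (1-\pi_j)^{1-x_j} \cdot \frac{\Gamma(\gamma^{-1})}{\Gamma(\rho\gamma^{-1})\,\Gamma((1-\rho)\gamma^{-1})} \pi_j^{\rho\gamma^{-1}-1} (1-\pi_j)^{(1-\rho)\gamma^{-1}-1} \, d\pi_j \\
&= \frac{\Gamma(\gamma^{-1})\,\Gamma(x_j + \rho\gamma^{-1})\,\Gamma(1 - x_j + (1-\rho)\gamma^{-1})}{\Gamma(\rho\gamma^{-1})\,\Gamma((1-\rho)\gamma^{-1})\,\Gamma(1 + \gamma^{-1})}
\end{aligned}$$

Therefore, using Γ(z + 1) = zΓ(z),

$$\Pr(X_j = 1) = \rho, \qquad \Pr(X_j = 0) = 1 - \rho$$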
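The cancellation behind this result can be checked numerically. The sketch below (standard library only; the function name and the grid of ρ and γ values are our own choices, and it covers only γ > 0, where the beta density exists) evaluates the gamma-function expression above through log-gamma values and compares Pr(X_j = 1) with ρ.

```python
import math


def beta_bernoulli_pmf(x: int, rho: float, gamma: float) -> float:
    """Pr(X_j = x) from the gamma-function expression, with alpha = rho/gamma, beta = (1 - rho)/gamma."""
    a, b = rho / gamma, (1 - rho) / gamma  # requires gamma > 0
    log_p = (
        math.lgamma(a + b) + math.lgamma(x + a) + math.lgamma(1 - x + b)
        - math.lgamma(a) - math.lgamma(b) - math.lgamma(1 + a + b)
    )
    return math.exp(log_p)


for rho in (0.1, 0.5, 0.83):
    for gamma in (0.05, 0.5, 3.0):
        p1 = beta_bernoulli_pmf(1, rho, gamma)
        print(f"rho={rho:.2f} gamma={gamma:.2f}  Pr(X=1)={p1:.6f}")
```

Whatever the dispersion γ, the printed probability matches ρ, which is the sense in which a beta-mixed Bernoulli is again a Bernoulli.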

This means that the unconditional distribution of $X_j$ is simply a Bernoulli distribution with parameter $\rho = E(\pi_j)$. In other words, by allowing the parameter of a Bernoulli distribution to vary according to a beta distribution, we end up with yet another Bernoulli distribution but with a different parameter, which happens to be the expected value of the original parameter as a beta-distributed random variable. We call the unconditional distribution of $X_j$ a “beta-Bernoulli” distribution. The moment generating function of this distribution is

$$M_{X_j}(t) = E(e^{tX_j}) = (1-\rho)e^{0} + \rho e^{t} = 1 - \rho + \rho e^{t}$$

Because the $X_j$ are independent, the moment generating function of the sum of beta-Bernoullis $Y = \sum_{j=1}^{n} X_j$ is

$$M_Y(t) = \prod_{j=1}^{n} M_{X_j}(t) = \prod_{j=1}^{n} (1 - \rho + \rho e^{t}) = (1 - \rho + \rho e^{t})^{n}$$

which corresponds to the moment generating function of a binomial distribution with parameters n and $\pi = \rho = E(\pi_j)$. The variance of Y is

$$\mathrm{Var}(Y) = \sum_{j=1}^{n} \rho(1-\rho) = n\rho(1-\rho)$$
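A small Monte Carlo sketch makes the contrast with the beta-binomial concrete. It uses NumPy, and the values n = 10, ρ = 0.3, γ = 0.5, the sample size, and the seed are illustrative assumptions. When every item of every individual gets a fresh π drawn from beta(ρ, γ) (Figure 1e), the count should behave like a binomial with mean nρ = 3 and variance nρ(1 − ρ) = 2.1; when one π is shared by all items of an individual (Figure 1b), the variance should instead be close to nρ(1 − ρ)[1 + (n − 1)γ(1 + γ)⁻¹] = 8.4.

```python
import numpy as np

rng = np.random.default_rng(12345)
n, N = 10, 100_000
rho, gamma = 0.3, 0.5
a, b = rho / gamma, (1 - rho) / gamma  # beta(rho, gamma) in the usual (alpha, beta) form

# Figure 1e: a fresh pi_ij for every item of every individual
pi_item = rng.beta(a, b, size=(N, n))
y_item = (rng.random((N, n)) < pi_item).sum(axis=1)

# Figure 1b, for contrast: one pi_i shared by all items of individual i
pi_person = rng.beta(a, b, size=(N, 1))
y_person = (rng.random((N, n)) < pi_person).sum(axis=1)

binomial_var = n * rho * (1 - rho)
ebb_var = binomial_var * (1 + (n - 1) * gamma / (1 + gamma))
print("item-level pi:   mean", y_item.mean(), "var", y_item.var(), "| theory", n * rho, binomial_var)
print("person-level pi: mean", y_person.mean(), "var", y_person.var(), "| theory", n * rho, ebb_var)
```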

When this Poisson-binomial process is assumed to hold true for all individuals i = 1, 2, …, N, we have the model shown in Figure 1e, which is the case when the under-dispersion resulting from the within-unit heterogeneity exactly cancels out the over-dispersion caused by the across-unit heterogeneity. The result is a distribution with binomial variance, given by Var(Y) above.

This model can be generalized to the one shown in Figure 2. Using the frame of survey research, each individual respondent (indexed i, as always) has his or her own beta distribution from which the $\pi_{ij}$'s are generated. Palmquist never explored this possibility. As the arrows indicate, the expected value of each individual-specific beta distribution, that is, the $\rho_i$ of each $B_i$, is in turn generated by the same universal beta distribution B(ρ, γ). (It is worth noting that the distributions presented in Figures 1a, 1b, 1d, and 1e above can all be construed as special cases of this one.)


(Figure 2 about here)

Through repeated simulations, we found that this Poisson-binomial process can be estimated by a beta-binomial model. For example, we set n = 10, ρ = 0.5, γ = 0.5, and let all of the $\gamma_i$ = 0.5. When we simulated N observations generated by this process, the variance of the data was $\mathrm{Var}(Y) \approx 10$. Now, had we assumed that this was a simple beta-binomial distribution with ρ = 0.5 and γ = 0.5, we could have calculated the variance:

$$\mathrm{Var}(Y) = n\rho(1-\rho)\left[1 + (n-1)\gamma(1+\gamma)^{-1}\right] = 10 \times 0.25 \times \left[1 + 9 \times \tfrac{0.5}{1.5}\right] = 10$$

Thus, our simulation result corroborates our speculation. The extended beta-binomial distribution, with the two parameters ρ and γ, appears to have the flexibility to correctly reflect data with both within-unit and between-unit heterogeneity or, more specifically, data with heterogeneity across both items and individuals.
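A minimal NumPy sketch of the Figure 2 process with the parameter values above is given below; the sample size, seed, and helper name are our own assumptions, not the settings of the original simulations. It also suggests why the match is so close. By the beta-Bernoulli result, each response given $\rho_i$ is Bernoulli($\rho_i$), so $Y_i \mid \rho_i$ is binomial(n, $\rho_i$) and $Y_i$ is exactly beta-binomial(ρ, γ) whatever the value of $\gamma_i$. With ρ = γ = 0.5 the universal beta is uniform (α = β = 1), so Y is discrete uniform on {0, …, 10}, whose variance is (11² − 1)/12 = 10.

```python
import numpy as np


def to_alpha_beta(rho, gamma):
    """Convert the (rho, gamma) parameterization to beta(alpha, beta) shape parameters."""
    return rho / gamma, (1 - rho) / gamma


rng = np.random.default_rng(2014)
n, N = 10, 100_000
rho, gamma = 0.5, 0.5  # universal beta B(rho, gamma)
gamma_i = 0.5          # common dispersion of every individual beta B_i

# Step 1: each individual's rho_i is drawn from the universal beta distribution
a, b = to_alpha_beta(rho, gamma)
rho_i = np.clip(rng.beta(a, b, size=N), 1e-9, 1 - 1e-9)  # keep shape parameters strictly positive

# Step 2: each item's pi_ij is drawn from the individual's own beta B_i(rho_i, gamma_i)
a_i, b_i = to_alpha_beta(rho_i, gamma_i)
pi_ij = rng.beta(a_i[:, None], b_i[:, None], size=(N, n))

# Step 3: Bernoulli responses and the event count Y_i
y = (rng.random((N, n)) < pi_ij).sum(axis=1)

ebb_var = n * rho * (1 - rho) * (1 + (n - 1) * gamma / (1 + gamma))
print("simulated Var(Y):", y.var(), " extended beta-binomial Var(Y):", ebb_var)
```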

Event Count Regression Models and Item Response Theory

Generalized linear regression models based on the beta-binomial family have been used to study event counts in survey research. In recent years, however, some political scientists have turned to item response theory (IRT) to conduct such analyses (Schrodt 2007; Gillion 2009). IRT models are not really designed for analyzing event counts. They are relevant, however, because, like factor analysis, they can estimate the latent trait underneath the binary responses comprising an event count and use it to scale the attitude or behavior in question. In other words, estimated latent traits substitute for event counts as measures of attitudes and behaviors. Moreover, IRT models have the advantage that they can explicitly model within-unit heterogeneity by including parameters representing item-specific properties, i.e., difficulty, discrimination, and guessing parameters. In this section, we compare the two approaches. Because covariates are necessary in any realistic analysis, we first provide a formal definition of the extended beta-binomial regression.

The Extended Beta-Binomial Regression Model

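Using the survey research framework, suppose there are N individuals, each of whom responded to n binary items. Let $X_{ij}$ represent whether individual i answers item j positively, and let $X_{ij} \sim$ iid Bernoulli($\pi_i$) across items, with $\pi_i = \Pr(X_{ij} = 1) \sim \mathrm{Beta}(\rho, \gamma)$, where Beta(ρ, γ) denotes the beta distribution in mean/dispersion form (for γ > 0, the beta distribution with mean ρ and shape parameters ρ/γ and (1 − ρ)/γ; see Appendix 2). Then

$$Y_i = \sum_{j=1}^{n} X_{ij},$$

the number of items that individual i answers positively, follows the beta-binomial distribution: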

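$$\Pr(Y_i = y_i; n) = \binom{n}{y_i} \frac{\prod_{j=0}^{y_i-1}(\rho + \gamma j)\,\prod_{j=0}^{n-y_i-1}(1 - \rho + \gamma j)}{\prod_{j=0}^{n-1}(1 + \gamma j)}$$

with

$$E(Y_i) = n\rho, \qquad \mathrm{Var}(Y_i) = n\rho(1-\rho)\left[1 + (n-1)\gamma(1+\gamma)^{-1}\right],$$

$$0 \le \rho \le 1, \qquad \gamma \ge \max\{-\rho(n-1)^{-1},\ -(1-\rho)(n-1)^{-1}\}.$$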
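To make the formulas above concrete, the following Python sketch evaluates the extended beta-binomial probabilities directly from the product form and checks numerically that they sum to one and reproduce E(Y_i) = nρ and the stated variance, including at the lower bound on γ, where the distribution is underdispersed. The function names and the values n = 5, ρ = 0.3 are illustrative assumptions.

```python
import math

def ebb_pmf(y: int, n: int, rho: float, gamma: float) -> float:
    """Pr(Y = y; n) for the extended beta-binomial in mean/dispersion form."""
    num = 1.0
    for j in range(y):              # prod_{j=0}^{y-1} (rho + gamma*j)
        num *= rho + gamma * j
    for j in range(n - y):          # prod_{j=0}^{n-y-1} (1 - rho + gamma*j)
        num *= 1.0 - rho + gamma * j
    den = 1.0
    for j in range(n):              # prod_{j=0}^{n-1} (1 + gamma*j)
        den *= 1.0 + gamma * j
    return math.comb(n, y) * num / den

def gamma_lower_bound(n: int, rho: float) -> float:
    """Smallest admissible gamma: max{-rho/(n-1), -(1-rho)/(n-1)}."""
    return max(-rho / (n - 1), -(1.0 - rho) / (n - 1))

def moments(n: int, rho: float, gamma: float):
    """Total probability, mean and variance computed from the pmf."""
    probs = [ebb_pmf(y, n, rho, gamma) for y in range(n + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return sum(probs), mean, var

n, rho = 5, 0.3
for gamma in (gamma_lower_bound(n, rho), 0.0, 0.5):
    total, mean, var = moments(n, rho, gamma)
    formula = n * rho * (1 - rho) * (1 + (n - 1) * gamma / (1 + gamma))
    print(f"gamma={gamma:+.3f}  total={total:.6f}  mean={mean:.4f}  "
          f"var={var:.4f}  formula={formula:.4f}")
```

At γ = 0 the same function returns binomial probabilities, which is the reduction noted below for the regression model.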

When the parameter ρ is related to covariates $x_i$ using the logit link

$$\rho_i = E(\pi_i) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)},$$

then we have the extended beta-binomial regression model.

Note that if γ = 0,

$$\Pr(Y_i = y_i; n) = \binom{n}{y_i}\rho_i^{y_i}(1-\rho_i)^{n-y_i},$$

and extended beta-binomial regression reduces to binomial regression.
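The regression version can be sketched as a log-likelihood function in Python with NumPy/SciPy. This is our own illustration, not the authors' MATLAB code: `ebb_loglik` and the simulated data are hypothetical, and the covariate values only mimic the design of Appendix 2. Setting γ = 0 reproduces the binomial regression log-likelihood, as stated above.

```python
import numpy as np
from scipy.special import expit, gammaln
from scipy.stats import binom

def ebb_loglik(beta, gamma, X, y, n):
    """Log-likelihood of the extended beta-binomial regression with a logit link.

    X: (N, p) covariate matrix, including a column of ones for the intercept.
    y: (N,) integer counts in {0, ..., n}.
    gamma: dispersion parameter; it must respect the lower bound given above.
    """
    rho = expit(X @ beta)                        # rho_i = exp(x_i'b) / (1 + exp(x_i'b))
    j = np.arange(n)[None, :]                    # j = 0, ..., n - 1
    a = rho[:, None] + gamma * j                 # factors rho_i + gamma*j
    b = 1.0 - rho[:, None] + gamma * j           # factors 1 - rho_i + gamma*j
    keep_a = j < y[:, None]                      # j = 0, ..., y_i - 1
    keep_b = j < (n - y)[:, None]                # j = 0, ..., n - y_i - 1
    # Unused factors are replaced by 1 so that they contribute log(1) = 0.
    log_num = (np.log(np.where(keep_a, a, 1.0)).sum(axis=1)
               + np.log(np.where(keep_b, b, 1.0)).sum(axis=1))
    log_den = np.log(1.0 + gamma * np.arange(n)).sum()
    log_comb = gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1)
    return float(np.sum(log_comb + log_num - log_den))

# Hypothetical binomial data with covariates drawn as in Appendix 2.
rng = np.random.default_rng(1)
N, n = 500, 10
X = np.column_stack([np.ones(N), rng.standard_normal((N, 2))])
beta = np.array([0.0, -0.2, 0.3])
y = rng.binomial(n, expit(X @ beta))

ll_gamma0 = ebb_loglik(beta, 0.0, X, y, n)
ll_binom = float(binom.logpmf(y, n, expit(X @ beta)).sum())
ll_gamma05 = ebb_loglik(beta, 0.5, X, y, n)
print(ll_gamma0, ll_binom, ll_gamma05)  # the first two agree up to rounding
```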


As discussed above, the extended beta-binomial regression model can accommodate data

with heterogeneity across both items and individuals. It is adequate for event count analysis in

survey research.

IRT Models as Mixed Effect Logit Models

According to Rijmen, Tuerlinckx, De Boeck and Kuppens (2003) and De Boeck and Wilson (2004), IRT models can be represented as mixed logit models. Specifically, the probability that the i-th (i = 1, 2, ..., N) survey respondent correctly answers the j-th (j = 1, 2, ..., n) item can be represented as

$$\pi_{ij} = \frac{\exp(x_{ij}'\beta_j + z_{ij}'\theta_i)}{1 + \exp(x_{ij}'\beta_j + z_{ij}'\theta_i)}$$

where $x_{ij}$ is a p-dimensional vector of person-by-item covariates; $z_{ij}$ is a q-dimensional vector of person-by-item covariates; $\beta_j$ is a p-dimensional vector of item-specific fixed effects; and $\theta_i$ is a q-dimensional vector of individual-specific random effects. Several special cases can be derived from this specification.

The Basic Rasch Model.

If we assume p = q = 1 and $x_{ij} = z_{ij} = 1$, then

$$\pi_{ij} = \frac{\exp(\beta_j + \theta_i)}{1 + \exp(\beta_j + \theta_i)}$$

This is the basic Rasch model, where $\beta_j$ is the difficulty parameter of item j and $\theta_i$ is a random variable. A simple one-parameter IRT model, the basic Rasch model is a mixed-effect logistic regression model containing no covariates, only constant terms in $x_{ij}$ and $z_{ij}$.
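Because $\theta_i$ is a random effect, the probability of a respondent's whole response pattern is obtained by integrating over $\theta_i$ (the marginal likelihood that IRT estimation works with). The Python sketch below assumes $\theta_i \sim N(0, \sigma^2)$ and uses Gauss-Hermite quadrature; the item values, σ = 1 and the function name are hypothetical.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import expit

def rasch_pattern_prob(x, beta, sigma, n_nodes=40):
    """Marginal probability of response pattern x under the basic Rasch model,
    pi_ij = exp(beta_j + theta_i) / (1 + exp(beta_j + theta_i)), theta_i ~ N(0, sigma^2)."""
    t, w = hermgauss(n_nodes)                    # nodes/weights for int exp(-t^2) f(t) dt
    theta = np.sqrt(2.0) * sigma * t             # change of variables to N(0, sigma^2)
    pi = expit(beta[None, :] + theta[:, None])   # (nodes, items)
    # Items are independent given theta_i, so the conditional pattern
    # probability is a product over items.
    cond = np.prod(np.where(x[None, :] == 1, pi, 1.0 - pi), axis=1)
    return float(np.sum(w * cond) / np.sqrt(np.pi))

beta = np.array([-1.0, 0.0, 1.0])               # hypothetical item parameters
patterns = [np.array(p) for p in itertools.product([0, 1], repeat=len(beta))]
probs = {tuple(p): rasch_pattern_prob(p, beta, sigma=1.0) for p in patterns}
print(sum(probs.values()))                      # 1 up to rounding error
```

With σ = 0 the integral collapses to an ordinary product of logistic probabilities, i.e., a fixed-effects logit model.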

The Latent Rasch Regression Model with Individual-Specific Covariates for Fixed Effects.

If we allow individual-specific covariates $x_i$ for fixed effects in the basic Rasch model above, we derive a more general latent Rasch regression model:

$$\pi_{ij} = \frac{\exp(x_i'\beta_j + \theta_i)}{1 + \exp(x_i'\beta_j + \theta_i)}$$

A special case of this model is when all items share the same vector $\beta_j$ (i.e., $\beta_j = \beta$ for j = 1, 2, 3, ..., n). In this case, we can drop the j subscript from $\pi_{ij}$ to get

$$\pi_i = \frac{\exp(x_i'\beta + \theta_i)}{1 + \exp(x_i'\beta + \theta_i)}$$

This is a Rasch regression model for homogeneous items with both fixed effects and random

effects.

The Binomial Regression Model.

Based on the Rasch model for homogeneous items above, if we further remove the random effect, we get

$$\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)},$$

which happens to be the binomial regression model with

$$\Pr(Y_i = y_i) = \binom{n}{y_i}\pi_i^{y_i}(1-\pi_i)^{n-y_i}$$

and a logit link for $\pi_i$. Note that the constant term of the linear form $x_i'\beta$ is the common difficulty parameter shared by all items. This shows that the binomial regression model is a special case of the latent Rasch regression model.
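The nesting argument can be checked numerically. Under the latent Rasch regression model the count $Y_i$ is, given $\theta_i$, a Poisson-binomial variable; integrating $\theta_i$ out gives its marginal distribution. The Python sketch below (our own illustration; names and the respondent's covariate values are hypothetical, the coefficients are those of Appendix 2) shows that equal item parameters with σ = 0 give exactly the binomial distribution, while σ > 0 produces extra-binomial variance.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import expit
from scipy.stats import binom

def poisson_binomial_pmf(p):
    """pmf of a sum of independent Bernoulli(p_j) variables, adding one item at a time."""
    f = np.array([1.0])
    for pj in p:
        f = np.append(f * (1.0 - pj), 0.0) + np.insert(f * pj, 0, 0.0)
    return f

def rasch_count_pmf(eta, sigma, n_nodes=40):
    """Pr(Y_i = y), y = 0..n, when logit(pi_ij) = eta_j + theta_i, theta_i ~ N(0, sigma^2).

    eta: length-n array of fixed parts x_i'beta_j for one respondent."""
    t, w = hermgauss(n_nodes)
    theta = np.sqrt(2.0) * sigma * t
    # Given theta_i the count is Poisson-binomial; integrate theta_i out.
    pmfs = np.array([poisson_binomial_pmf(expit(eta + th)) for th in theta])
    return (w[:, None] * pmfs).sum(axis=0) / np.sqrt(np.pi)

# Hypothetical respondent with x_1i = 0.5 and x_2i = -1, so that
# x_i'beta = -0.2*0.5 + 0.3*(-1) = -0.4 for all n = 5 (homogeneous) items.
n = 5
eta = np.full(n, -0.2 * 0.5 + 0.3 * -1.0)
y = np.arange(n + 1)
pmf_no_re = rasch_count_pmf(eta, sigma=0.0)          # no random effect
pmf_binom = binom.pmf(y, n, expit(eta[0]))
pmf_re = rasch_count_pmf(eta, sigma=1.0)             # theta_i ~ N(0, 1)
mean_re = (y * pmf_re).sum()
var_re = ((y - mean_re) ** 2 * pmf_re).sum()
print(np.abs(pmf_no_re - pmf_binom).max())           # ~0: the binomial special case
print(var_re, n * (mean_re / n) * (1 - mean_re / n))  # vs. binomial with the same mean
```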

The Latent Rasch Regression Model vs. the Extended Beta-Binomial Model


If the binomial model is a special case of the Rasch latent regression model, what about

the extended beta-binomial regression model? Could it also be a special case of IRT? The answer

to this question is no. There are, however, similarities between the extended beta-binomial model

and the Rasch model.

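Recall that the Rasch latent regression model is

$$\pi_{ij} = \frac{\exp(x_i'\beta_j + \theta_i)}{1 + \exp(x_i'\beta_j + \theta_i)}$$

Taking the logit of both sides gives

$$\mathrm{logit}(\pi_{ij}) = \ln\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = x_i'\beta_j + \theta_i$$

where $\theta_i$ is the (random) ability parameter, generally assumed to follow a normal distribution with zero mean and constant variance: $\theta_i \sim N(0, \sigma^2)$. Thus,

$$\mathrm{logit}(\pi_{ij}) \sim N(x_i'\beta_j, \sigma^2)$$

In contrast, the extended beta-binomial regression model assumes

$$\pi_i \sim B(\rho_i, \gamma) \quad \text{with} \quad \rho_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)} = \mathrm{logit}^{-1}(x_i'\beta),$$

or

$$\pi_i \sim B(\mathrm{logit}^{-1}(x_i'\beta), \gamma).$$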

Clearly, the two models are similar in that both make distributional assumptions about the Bernoulli parameter $\pi_{ij}$. They differ in how those assumptions are specified. Because $\pi_{ij}$ is bounded (i.e., $0 \le \pi_{ij} \le 1$), Rasch regression transforms $\pi_{ij}$ into $\mathrm{logit}(\pi_{ij})$ in order to use the unbounded normal distribution, whereas extended beta-binomial regression lets $\pi_i$ follow the bounded beta distribution directly. By making such assumptions, both models allow conditional heterogeneity across units. They differ, however, in terms of heterogeneity across items. The Rasch model, by allowing $\beta_j$ to vary across items, directly introduces within-unit heterogeneity. The extended beta-binomial model has no such mechanism, although, as we have shown in the previous section, it does have the flexibility to accommodate within-unit heterogeneity. It is noteworthy that while both models assume independence among items (conditional on the Bernoulli parameters), they can, by virtue of between-unit heterogeneity, produce positive inter-item correlations in the data. In this regard, the extended beta-binomial model is more flexible, in that a negative value of the parameter γ can produce negative inter-item correlations as well. Finally, although both models subsume the binomial model, neither is a special case of the other.

Table 1 summarizes the similarities and differences between the two models:

(Table 1 about here)
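The inter-item correlation row of Table 1 can be illustrated numerically. The Python sketch below (function names and parameter values are illustrative assumptions) computes the correlation between two items implied by the extended beta-binomial pmf, using the identity E[Y(Y − 1)] = n(n − 1)E[X_ij X_ik] for exchangeable items, and the correlation implied by a Rasch model with θ_i ~ N(0, σ²), integrated by Gauss-Hermite quadrature. For the beta-binomial the result works out to γ/(1 + γ), which is negative whenever γ < 0; for the Rasch model it is positive.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import expit

def ebb_pmf(y, n, rho, gamma):
    """Extended beta-binomial Pr(Y = y; n) in mean/dispersion form."""
    num = math.prod(rho + gamma * j for j in range(y))
    num *= math.prod(1.0 - rho + gamma * j for j in range(n - y))
    den = math.prod(1.0 + gamma * j for j in range(n))
    return math.comb(n, y) * num / den

def ebb_item_corr(n, rho, gamma):
    """Inter-item correlation implied by the pmf (items exchangeable)."""
    e_yy1 = sum(y * (y - 1) * ebb_pmf(y, n, rho, gamma) for y in range(n + 1))
    e_xjxk = e_yy1 / (n * (n - 1))               # E[X_ij X_ik] for j != k
    return (e_xjxk - rho ** 2) / (rho * (1.0 - rho))

def rasch_item_corr(beta_j, beta_k, sigma, n_nodes=40):
    """Correlation of items j and k when logit(pi_ij) = beta_j + theta_i."""
    t, w = hermgauss(n_nodes)
    w = w / np.sqrt(np.pi)                       # weights for a N(0, sigma^2) average
    theta = np.sqrt(2.0) * sigma * t
    pj, pk = expit(beta_j + theta), expit(beta_k + theta)
    mj, mk = np.sum(w * pj), np.sum(w * pk)
    cov = np.sum(w * pj * pk) - mj * mk          # items independent given theta_i
    return float(cov / np.sqrt(mj * (1 - mj) * mk * (1 - mk)))

n, rho = 5, 0.5                                  # lower bound on gamma here is -0.125
for gamma in (-0.1, 0.0, 0.5):
    print(f"beta-binomial gamma={gamma:+.1f}: corr={ebb_item_corr(n, rho, gamma):+.4f}")
print(f"Rasch sigma=1: corr={rasch_item_corr(-0.5, 0.5, sigma=1.0):+.4f}")
```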

In the section that follows, we conduct Monte Carlo simulations to compare the Rasch model and the extended beta-binomial regression model.³

Monte-Carlo Simulation Results

We conduct Monte Carlo simulations to compare the performance of the extended beta-binomial regression model and the latent Rasch regression model. Because event counts can be generated by either process, we simulate data from both and fit each data set to both models. In practice, the true data generating process is usually unknown. We expected each model to perform better when applied to data generated by its own process, but we hoped our results would shed light on which model performs better when it is wrongly applied to data generated by the other process. All the simulations reported here involve two individual-specific covariates. We vary the sample size (N = 200/600/1000) and the number of items (n = 5/10) to check the robustness of our results. Details of the simulations are described in Appendix 2.⁴

³ We do not discuss two-parameter IRT models in this draft. Monte Carlo simulations involving two-parameter IRT models will be conducted and reported in a future version of this paper.

Extended Beta-Binomial Regression DGP

In the first simulation, we generate data from the extended beta-binomial regression DGP. The estimation results are shown in Table 2. As expected, the beta-binomial regression estimates show no detectable bias, and their RMSEs shrink as the sample size and the number of items increase. In contrast, fitting the wrong (Rasch) model produces estimates that are biased away from zero (too large in absolute value) and larger RMSEs. It seems that the Rasch model, with its specific form of between-unit heterogeneity, cannot adequately capture heterogeneity of the beta-binomial type.


Latent Rasch Regression DGP

Table 3 shows the estimation results of our second simulation, based on the Rasch DGP. Not surprisingly, the Rasch model performs better than the extended beta-binomial model, whose estimates are biased toward zero (too small in absolute value) and relatively inefficient. The RMSEs associated with the wrong model, however, are not as large as in the first simulation. The reason, according to our detailed analysis, is that the beta-binomial regression estimates, although generally too small in absolute value, are more narrowly distributed, which keeps their RMSEs small.

⁴ We also generated data from the binomial process and fit them to both Rasch regression and extended beta-binomial regression. Not surprisingly, both models perform well. We do not provide the simulation/estimation details here in order to save space.
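The explanation given above for the small RMSEs rests on the decomposition RMSE² = bias² + (spread of the estimates across replications)²: an estimator that is biased but tightly clustered can have a smaller RMSE than one that is unbiased but dispersed. The Python sketch below illustrates this with hypothetical replicate estimates; they are not the draws behind Tables 2 and 3, and `mc_summary` is our own helper.

```python
import numpy as np

def mc_summary(estimates, true_value):
    """Mean, bias, spread and RMSE of Monte Carlo replicate estimates."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    bias = mean - true_value
    sd = est.std()                                   # ddof=0, so rmse**2 == bias**2 + sd**2
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    return {"mean": float(mean), "bias": float(bias), "sd": float(sd), "rmse": float(rmse)}

rng = np.random.default_rng(0)
true_beta1 = -0.2
# Hypothetical replicates: biased toward zero but tightly clustered,
# versus unbiased but widely spread.
narrow_biased = rng.normal(-0.15, 0.01, size=1000)
wide_unbiased = rng.normal(-0.20, 0.07, size=1000)
s_nb = mc_summary(narrow_biased, true_beta1)
s_wu = mc_summary(wide_unbiased, true_beta1)
for name, s in [("narrow, biased", s_nb), ("wide, unbiased", s_wu)]:
    print(name, {k: round(v, 3) for k, v in s.items()})
```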


From our Monte Carlo simulations, we reach the following conclusions. First, both beta-binomial regression and Rasch regression estimate the parameters fairly accurately when applied to data generated from the binomial DGP. This is not surprising, because the binomial is subsumed under both models. Second, both beta-binomial regression and Rasch regression estimate the parameters fairly accurately when applied to data generated from their own DGPs. Third, both models can produce biased and inefficient estimates when applied to data generated from the other's DGP, unless the other process happens to collapse to the binomial model. Fourth, the problems caused by fitting data to the wrong model are more severe for Rasch regression than for extended beta-binomial regression.

Therefore, using IRT models for event count data is not always a good strategy. The binomial model is a special case of IRT, but the beta-binomial is not. If we do not know what the true DGP is, as is usually the case, applying event count models is a safer choice than applying IRT models.

Concluding Remarks

Event counts are widely used in survey research. Despite the efforts of King (1998) and Palmquist (1997, 1998), the nature of the processes that generate event counts and their implications for statistical inference are still not well understood. In this paper, we use survey research as a context in which to examine the data generating processes of event counts. By introducing the beta-Bernoulli mixture distribution, we relaxed a restrictive assumption required for the binomial distribution to be a valid model for event counts, namely, that the binary trials comprising an event count must have identical Bernoulli probabilities. By relaxing this critical assumption, we generalized the binomial distribution to the Poisson-binomial distribution and allowed for more flexibility in modeling survey responses. On the basis of the less restrictive assumption, we examined the adequacy of the binomial regression model and the extended beta-binomial regression model. Our findings show that the extended beta-binomial regression model is adequate for data generated with both within-unit and between-unit heterogeneity. This makes the model more realistic than previously understood.

We also investigated IRT models as an alternative to event count models. IRT models seem to have an advantage over event count models because they explicitly model within-unit heterogeneity by including parameters representing item-specific properties. Our findings, surprisingly, show that event count models depict data generated from IRT models better than IRT models depict data generated from event count models. Therefore, if we do not know what the true data generating process is, applying event count models is a more advisable choice than applying IRT models.


Appendix 1: An Exposition of the Poisson-Binomial Distribution

The Poisson-binomial distribution can be written

$$f(x; \boldsymbol{\pi}) = \sum_{A \in F_x} \prod_{j \in A} \pi_j \prod_{j \in A^c} (1 - \pi_j)$$

where A is an ordered x-tuple of the elements in the set {1,…,n} arranged in ascending order, $F_x$ is the set of all such x-tuples, and $A^c$ is the (n−x)-tuple consisting of whatever elements of {1,…,n} are not in A (also arranged in ascending order). The vector π, of course, contains the probabilities of each of the n Bernoulli variables obtaining a value of 1.
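The definition can be transcribed almost literally into Python: enumerate every ascending x-tuple A, multiply the success probabilities in A by the failure probabilities in its complement, and sum. This is an illustrative sketch (indices are 0-based, and the probabilities are hypothetical); it is only practical for small n because $F_x$ has C(n, x) members.

```python
import itertools
import math

def poisson_binomial_enum(x, pi):
    """f(x; pi) by direct enumeration of the formula above (feasible only for small n)."""
    n = len(pi)
    total = 0.0
    for A in itertools.combinations(range(n), x):   # F_x: ascending x-tuples of indices
        A_c = [j for j in range(n) if j not in A]    # complement A^c, also ascending
        total += math.prod(pi[j] for j in A) * math.prod(1.0 - pi[j] for j in A_c)
    return total

pi = [0.2, 0.5, 0.9]    # hypothetical probabilities for n = 3
print([round(poisson_binomial_enum(x, pi), 4) for x in range(len(pi) + 1)])
# prints [0.04, 0.41, 0.46, 0.09]
```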

While this seems to be the most concise way of representing the probability mass function in question, it is cumbersome to have to consider the various vectors and sets involved in what is otherwise an extremely intuitive distribution. For that reason, we clarify, with a few simple examples, how the Poisson-binomial distribution works.

Consider the case n = 2, that is, two Bernoulli variables, each with its own $\pi_i$. The values of the Poisson-binomial distribution are

$$\begin{aligned}
f(0; 2) &= (1-\pi_1)(1-\pi_2) \\
f(1; 2) &= \pi_1(1-\pi_2) + \pi_2(1-\pi_1) \\
f(2; 2) &= \pi_1\pi_2
\end{aligned}$$

Clearly, the distribution itself is more intuitive than the complicated mathematical

notation used to describe it suggests at first glance. A step to n = 3 is an easy and helpful

illustration.

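$$\begin{aligned}
f(0; 3) &= (1-\pi_1)(1-\pi_2)(1-\pi_3) \\
f(1; 3) &= \pi_1(1-\pi_2)(1-\pi_3) + \pi_2(1-\pi_1)(1-\pi_3) + \pi_3(1-\pi_1)(1-\pi_2) \\
f(2; 3) &= \pi_1\pi_2(1-\pi_3) + \pi_1\pi_3(1-\pi_2) + \pi_2\pi_3(1-\pi_1) \\
f(3; 3) &= \pi_1\pi_2\pi_3
\end{aligned}$$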


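Even a generalization to n Bernoulli variables is relatively easy to understand when laid out this way:

$$\begin{aligned}
f(0; n) &= (1-\pi_1)(1-\pi_2)\cdots(1-\pi_n) \\
f(1; n) &= \pi_1(1-\pi_2)\cdots(1-\pi_n) + \pi_2(1-\pi_1)(1-\pi_3)\cdots(1-\pi_n) + \cdots + \pi_n(1-\pi_1)\cdots(1-\pi_{n-1}) \\
f(2; n) &= \{\pi_1\pi_2(1-\pi_3)\cdots(1-\pi_n) + \pi_1\pi_3(1-\pi_2)(1-\pi_4)\cdots(1-\pi_n) + \cdots + \pi_1\pi_n(1-\pi_2)\cdots(1-\pi_{n-1})\} \\
&\quad + \{\pi_2\pi_3(1-\pi_1)(1-\pi_4)\cdots(1-\pi_n) + \cdots + \pi_2\pi_n(1-\pi_1)(1-\pi_3)\cdots(1-\pi_{n-1})\} \\
&\quad + \cdots + \{\pi_{n-1}\pi_n(1-\pi_1)\cdots(1-\pi_{n-2})\} \\
&\ \ \vdots \\
f(n; n) &= \pi_1\pi_2\pi_3\cdots\pi_n
\end{aligned}$$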

The generalized case quickly becomes unwieldy. Certainly, the concise expression using

summations and products is neater and, once clearly understood, preferable. However,

something like the above examples might be useful in clarifying the relatively simple concept behind the Poisson-binomial distribution.
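For computation, a standard alternative to expanding all subsets (not a method used in this paper) is to add one Bernoulli variable at a time: $f_k(x) = f_{k-1}(x)(1-\pi_k) + f_{k-1}(x-1)\pi_k$, which needs on the order of n² operations instead of 2ⁿ terms. The Python sketch below checks this recursion against direct enumeration for hypothetical probabilities and confirms that equal probabilities give the binomial distribution.

```python
import itertools
import math

def poisson_binomial_pmf(pi):
    """[f(0; n), ..., f(n; n)] via f_k(x) = f_{k-1}(x)(1 - pi_k) + f_{k-1}(x - 1) pi_k,
    adding one Bernoulli variable at a time."""
    f = [1.0]                                         # distribution of an empty sum
    for p in pi:
        f = [(f[x] * (1.0 - p) if x < len(f) else 0.0)
             + (f[x - 1] * p if x >= 1 else 0.0)
             for x in range(len(f) + 1)]
    return f

def poisson_binomial_enum(x, pi):
    """Direct enumeration over all x-subsets, as in the formula at the start of Appendix 1."""
    n = len(pi)
    return sum(math.prod(pi[j] if j in A else 1.0 - pi[j] for j in range(n))
               for A in map(set, itertools.combinations(range(n), x)))

pi = [0.1, 0.35, 0.5, 0.6, 0.8, 0.95]                # hypothetical probabilities, n = 6
rec = poisson_binomial_pmf(pi)
enum = [poisson_binomial_enum(x, pi) for x in range(len(pi) + 1)]
print(max(abs(a - b) for a, b in zip(rec, enum)))    # agreement up to rounding error
```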


Appendix 2. Simulation and Estimation Procedures

All the data generating processes in the simulations reported here involve two individual-specific covariates, $x_{1i}$ and $x_{2i}$, which were generated from N(0, 1). The coefficient parameters are $\beta_0 = 0$, $\beta_1 = -0.2$ and $\beta_2 = 0.3$, giving the linear form $x_i'\beta = -0.2x_{1i} + 0.3x_{2i}$.

For the extended beta-binomial DGP, γ = 2. For the IRT (Rasch) DGP, the difficulty parameters are α = [−1.6449, −0.5978, 0, 0.5978, 1.6449] for the 5-item model, and α = [−1.6449, −1.0364, −0.6745, −0.3853, −0.1257, 0.1257, 0.3853, 0.6745, 1.0364, 1.6449] for the 10-item model.

Data were generated for three different numbers of individuals (i.e., sample size),

N=200/600/1000, and two different numbers of items, n=5/10.

Simulation/Estimation Based on the Extended Beta-Binomial DGP.

1. Compute $\rho_i = \exp(x_i'\beta)/[1 + \exp(x_i'\beta)]$.

2. Compute $a_i = \rho_i/\gamma$ and $b_i = (1 - \rho_i)/\gamma$.

3. Generate $\pi_i \sim B(a_i, b_i)$. This is equivalent to $\pi_i \sim B(\mathrm{logit}^{-1}(x_i'\beta), \gamma)$.

4. Generate dichotomous values of $y_{ij}$ with probability $\Pr(y_{ij} = 1) = \pi_i$ for all i, j.

5. Compute $y_i = \sum_{j=1}^{n} y_{ij}$.

6. Estimate the β's and γ of the beta-binomial regression model with MLE using MATLAB's optimization toolbox. For the maximization, the modified BFGS method was used.

7. Estimate the α's and β's of the latent regression Rasch model using the IRTm toolbox developed by Braeken and Tuerlinckx. The toolbox follows the marginal maximum likelihood approach. A detailed explanation is found in Braeken and Tuerlinckx (2009).
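Steps 1 to 6 can be sketched in Python. This is an illustration under stated assumptions, not the authors' MATLAB code: NumPy/SciPy stand in for MATLAB, SciPy's standard BFGS replaces MATLAB's modified BFGS, and γ is estimated on the log scale, which restricts it to γ > 0 (the negative-γ range of the extended model is excluded, and for γ > 0 the Prentice form equals the usual beta-binomial with shapes $a_i$, $b_i$). Step 7 (the Rasch fit with IRTm) is not reproduced; the seed and function names are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def simulate_ebb(N, n, beta, gamma, rng):
    """Steps 1-5 for the extended beta-binomial DGP (requires gamma > 0)."""
    X = np.column_stack([np.ones(N), rng.standard_normal((N, 2))])  # 1, x_1i, x_2i
    rho = expit(X @ beta)                              # step 1
    a, b = rho / gamma, (1.0 - rho) / gamma            # step 2
    pi = rng.beta(a, b)                                # step 3
    y_ij = rng.random((N, n)) < pi[:, None]            # step 4: Bernoulli(pi_i) items
    return X, y_ij.sum(axis=1)                         # step 5: y_i

def neg_loglik(params, X, y, n):
    """Negative beta-binomial log-likelihood; gamma = exp(params[-1]) (assumes gamma > 0)."""
    beta, gamma = params[:-1], np.exp(params[-1])
    rho = expit(X @ beta)
    a, b = rho / gamma, (1.0 - rho) / gamma
    # log C(n, y) + log B(y + a, n - y + b) - log B(a, b), written with gammaln
    ll = (gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1)
          + gammaln(y + a) + gammaln(n - y + b) - gammaln(n + a + b)
          - gammaln(a) - gammaln(b) + gammaln(a + b))
    return -np.sum(ll)

rng = np.random.default_rng(2014)
true_beta, true_gamma, n = np.array([0.0, -0.2, 0.3]), 2.0, 10
X, y = simulate_ebb(N=1000, n=n, beta=true_beta, gamma=true_gamma, rng=rng)
fit = minimize(neg_loglik, x0=np.zeros(4), args=(X, y, n), method="BFGS")  # step 6
beta_hat, gamma_hat = fit.x[:-1], float(np.exp(fit.x[-1]))
print("beta:", beta_hat.round(3), "gamma:", round(gamma_hat, 3))
```

Repeating the simulation many times and summarizing the estimates gives the kind of mean and RMSE entries reported in Table 2.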

Simulation/Estimation Based on the Latent Regression (Rasch) DGP.

1. Compute $\pi_{ij} = \exp(\alpha_j + x_i'\beta + \theta_i)/[1 + \exp(\alpha_j + x_i'\beta + \theta_i)]$, where $\theta_i \sim N(0, 1)$. This is equivalent to $\mathrm{logit}(\pi_{ij}) \sim N(\alpha_j + x_i'\beta, 1)$.

2. Generate dichotomous values of $y_{ij}$ with probability $\Pr(y_{ij} = 1) = \pi_{ij}$ for all i, j.

3. Compute $y_i = \sum_{j=1}^{n} y_{ij}$.

4. Estimate the β's and γ of the beta-binomial regression model with MLE using MATLAB's optimization toolbox. For the maximization, the modified BFGS method was used.

5. Estimate the α's and β's of the latent regression Rasch model using the IRTm toolbox developed by Braeken and Tuerlinckx. The toolbox follows the marginal maximum likelihood approach. A detailed explanation is found in Braeken and Tuerlinckx (2009).
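A corresponding Python sketch of steps 1 to 3 and 5 follows; again it is an illustration under assumptions rather than the authors' code. The IRTm toolbox is replaced by a hand-written marginal maximum likelihood fit in which $\theta_i \sim N(0, \sigma^2)$ is integrated out with 21-point Gauss-Hermite quadrature and σ is estimated on the log scale. Because the item parameters $\alpha_j$ absorb the intercept, $\beta_0$ is not estimated. The Rasch fit uses the item-level responses $y_{ij}$, not only the counts $y_i$. Step 4 is not reproduced.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.optimize import minimize
from scipy.special import expit, logsumexp

def simulate_rasch(N, alpha, beta, rng):
    """Steps 1-3 for the latent regression Rasch DGP."""
    X = rng.standard_normal((N, 2))                           # x_1i, x_2i ~ N(0, 1)
    theta = rng.standard_normal(N)                            # theta_i ~ N(0, 1)
    eta = alpha[None, :] + (X @ beta)[:, None] + theta[:, None]
    Y = (rng.random(eta.shape) < expit(eta)).astype(int)      # y_ij ~ Bernoulli(pi_ij)
    return X, Y                                               # y_i = Y.sum(axis=1)

def neg_marginal_loglik(params, X, Y, t, w):
    """Negative marginal log-likelihood; theta_i ~ N(0, sigma^2) is integrated out
    by Gauss-Hermite quadrature with nodes t and weights w."""
    n = Y.shape[1]
    alpha, beta, sigma = params[:n], params[n:n + 2], np.exp(params[-1])
    theta = np.sqrt(2.0) * sigma * t                          # (K,)
    eta = (alpha[None, None, :] + (X @ beta)[:, None, None]
           + theta[None, :, None])                            # (N, K, n)
    # log(pi) = -log(1 + e^-eta) and log(1 - pi) = -log(1 + e^eta), computed stably
    ll_items = np.where(Y[:, None, :] == 1, -np.logaddexp(0.0, -eta),
                        -np.logaddexp(0.0, eta)).sum(axis=2)  # (N, K)
    ll_i = logsumexp(ll_items + np.log(w / np.sqrt(np.pi))[None, :], axis=1)
    return -np.sum(ll_i)

rng = np.random.default_rng(2014)
alpha_true = np.array([-1.6449, -0.5978, 0.0, 0.5978, 1.6449])  # 5-item values above
beta_true = np.array([-0.2, 0.3])
X, Y = simulate_rasch(N=1000, alpha=alpha_true, beta=beta_true, rng=rng)
t, w = hermgauss(21)
n_par = len(alpha_true) + len(beta_true) + 1                   # alphas, betas, log sigma
fit = minimize(neg_marginal_loglik, x0=np.zeros(n_par), args=(X, Y, t, w), method="BFGS")
alpha_hat, beta_hat = fit.x[:5], fit.x[5:7]
sigma_hat = float(np.exp(fit.x[-1]))
print("alpha:", alpha_hat.round(3), "beta:", beta_hat.round(3), "sigma:", round(sigma_hat, 3))
```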


References

Brady, Henry E. 1999. “Political Participation.” In Measures of Political Attitudes, eds. John Robinson et al. San Diego, CA: Academic Press.

Braeken, J., and F. Tuerlinckx. 2009. “Investigating Latent Constructs with Item Response Models: A MATLAB IRTm Toolbox.” Behavior Research Methods 41(4): 1127-1137.

Butler, Ken, and Michael Stephens. 1993. “The Distribution of a Sum of Binomial Random Variables.” Technical Report No. 467. Prepared under Contract N00014-92-J-1264 (NR-042-267) for the Office of Naval Research.

Chen, Louis H. Y. 1974. “On the Convergence of Poisson Binomial to Poisson Distributions.” The Annals of Probability 2(1): 178-180.

Chen, Louis H. Y. 1975. “Poisson Approximation for Dependent Trials.” The Annals of Probability 3(3): 534-545.

De Boeck, Paul, and Mark Wilson. 2004. Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach. New York: Springer-Verlag.

Edwards, A. W. F. 1960. “The Meaning of Binomial Distribution.” Nature 186(25 June 1960): 1074.

Feller, William. 1968. An Introduction to Probability Theory and Its Applications. New York: John Wiley & Sons.

Gillion, Daniel Q. 2009. “Redefining Political Participation through Item Response Theory.” Paper presented at the 2009 Annual Meeting of the American Political Science Association, Toronto, ON, Canada, September 3-6.

Hodges, J. L., Jr. and Lucien Le Cam. 1960. “The Poisson Approximation to the Poisson Binomial Distribution.” The Annals of Mathematical Statistics 31(3):737-740.

King, Gary. 1998. Unifying Political Methodology: The Likelihood Theory of Statistical Inference. Ann Arbor: The University of Michigan Press.

Le Cam, Lucien. 1960. “An Approximation Theorem for the Poisson Binomial Distribution.” Pacific Journal of Mathematics 10(4): 1181-1197.

Palmquist, Bradley. 1997. “Heterogeneity and Dispersion in the Beta-Binomial Model.” Paper presented at the 1997 Annual Meeting of the American Political Science Association, Washington, D.C., August 27-31.

Palmquist, Bradley. 1998. “The Extended Beta-Binomial Model in Political Analysis.” Paper presented at the 1998 Annual Meeting of the Southern Political Science Association.


Pepper, Echo D. 1929. “On Poisson’s Series of Trials.” Mathematische Annalen 101(1): 375-380.

Prentice, R. L. 1986. “Binary Regression Using an Extended Beta-Binomial Distribution, With Discussion of Correlation Induced by Covariate Measurement Errors.” Journal of the American Statistical Association 81(394): 321-327.

Rijmen, Frank, Francis Tuerlinckx, Paul De Boeck, and Peter Kuppens. 2003. “A Nonlinear Mixed Model Framework for Item Response Theory.” Psychological Methods 8(2): 185-205.

Schrodt, Philip A. 2007. “Inductive Event Data Scaling Using Item Response Theory.” Paper presented at the 2007 Summer Meeting of the Society for Political Methodology, Pennsylvania State University, July 18-20.

Wang, Y. H. 1993. “On the Number of Successes in Independent Trials.” Statistica Sinica 3: 295-312.


Table 1. Similarities and Differences between Extended Beta-Binomial and Rasch Regressions

                                  Extended Beta-Binomial          Rasch Latent Regression
Specification of π_ij             π_i ~ B(logit⁻¹(x_i'β), γ)      logit(π_ij) ~ N(x_i'β_j, σ²)
Heterogeneity across Units        Yes                             Yes
Heterogeneity across Items        Flexible                        Yes
Inter-Item Correlation in Data    Yes (+/−)                       Yes (+)
Subsuming Binomial Model?         Yes                             Yes
Special Case of the Other?        No                              No


Table 2. Monte Carlo Simulation: Beta-Binomial DGP

                     10 Items            5 Items
                       β1        β2        β1        β2
Beta-Binomial Regression Estimates
  Mean   N=200     -0.204     0.304    -0.200     0.301
         N=600     -0.202     0.297    -0.204     0.300
         N=1000    -0.199     0.302    -0.201     0.298
  RMSE   N=200      0.074     0.071     0.082     0.080
         N=600      0.040     0.042     0.046     0.047
         N=1000     0.032     0.032     0.034     0.035
Latent Rasch Regression Estimates
  Mean   N=200     -0.313     0.463    -0.303     0.456
         N=600     -0.308     0.453    -0.308     0.454
         N=1000    -0.304     0.460    -0.303     0.451
  RMSE   N=200      0.147     0.177     0.149     0.179
         N=600      0.114     0.154     0.116     0.156
         N=1000     0.105     0.160     0.106     0.151

Note: True parameter values: β1 = -0.2, β2 = 0.3. N denotes the sample size.


Table 3. Monte Carlo Simulation: Item Response Theory DGP

                     10 Items            5 Items
                       β1        β2        β1        β2
Beta-Binomial Regression Estimates
  Mean   N=200     -0.147     0.221    -0.140     0.210
         N=600     -0.147     0.221    -0.142     0.210
         N=1000    -0.146     0.222    -0.140     0.209
  RMSE   N=200      0.069     0.088     0.078     0.099
         N=600      0.056     0.080     0.062     0.091
         N=1000     0.054     0.078     0.061     0.091
Latent Rasch Regression Estimates
  Mean   N=200     -0.202     0.302    -0.200     0.299
         N=600     -0.201     0.302    -0.203     0.298
         N=1000    -0.200     0.302    -0.200     0.298
  RMSE   N=200      0.072     0.074     0.088     0.085
         N=600      0.043     0.041     0.050     0.048
         N=1000     0.031     0.032     0.039     0.037

Note: True parameter values: β1 = -0.2, β2 = 0.3. N denotes the sample size.


FIGURE 1: DATA-GENERATING PROCESSES

Figure 1a: Binomial Distribution

Figure 1b: Beta Binomial Distribution

Figure 1c: Palmquist’s Alternative Data Generation Process

Figure 1d: Poisson-Binomial Distribution

Figure 1e: Beta Generated Within- and Between-Unit Heterogeneity


FIGURE 2: ALTERNATIVE DATA-GENERATING PROCESSES
