EVENT COUNT ANALYSIS VS. ITEM RESPONSE THEORY: A COMPARATIVE INVESTIGATION
Tse-min Lin, The University of Texas at Austin
Etsuhiro Nakamura, Ehime University
with
Dorothy Morgan and Ariel Helfer, The University of Texas at Austin
Prepared for presentation at the 2014 Asian Political Methodology Meeting, Tokyo, Japan, January 6-7. A part of this paper is adapted from Tse-min Lin, Ariel Helfer, and Dorothy Morgan, “Event Count Models in Survey Research,” paper presented at the 2011 Annual Meeting of the American Political Science Association, Seattle, Washington, September 1-4.
Abstract
Traditionally, survey researchers have used binomial or extended beta-binomial
regressions to analyze event count data composed of binary item responses. The adequacy of
these models, however, depends on the assumption that those responses are generated from
identically and independently distributed Bernoulli trials. Recently, researchers have turned to
item response theory (IRT) to estimate the latent trait underneath the binary responses and use it
to scale the attitude or behavior in question. Since the event count model and the IRT model
involve different assumptions, they are different theoretical models with different empirical
implications. This paper examines their assumptions and conducts Monte-Carlo simulations to
compare the two models. Our results show that the IRT model is not as good as the event count
model when each is applied to data generated from the other. We conclude that the IRT model
should be used with care when the true data generating process is unknown.
Introduction
In survey research, it is not uncommon that an index is constructed as a sum of binary
survey items that are coded {0, 1}. The index is then a count of the “events” indicated by
positive responses to the binary items. For example, an index of political participation may be
constructed out of items that ask whether a respondent engaged in certain political activities,
such as “try to persuade,” “display preferences,” “attend meetings,” “do political work,” and
“give money.” When the researcher has a theory relating the index to a set of explanatory
variables, a model is then specified and estimated to test the theory.
It is well known that when the dependent variable of a linear regression model is an event
count, which is discrete and nonnegative, the normality and homoscedasticity assumptions
required of the classical linear regression model no longer hold, and, as a consequence, OLS
estimators are no longer the best linear unbiased estimators (BLUE). Models specifically
designed to take account of the data generating process of event counts have been developed.
When the count is theoretically infinite, the Poisson and negative binomial regressions are
appropriate. When the count is necessarily finite, such as in the case of survey research, the
binomial and (extended) beta-binomial regressions are more appropriate. These models can be
estimated by maximum likelihood (ML).
In this paper, we focus on finite event count models as applied to survey data. Both the
binomial and beta-binomial models have been widely used (King 1989). The adequacy of these
models, however, depends on the assumption that the binary responses constituting the event
count are generated from identically and independently distributed Bernoulli trials. If the
assumption is violated, these models may be rendered questionable. Recently, researchers have
turned to item response theory (IRT) to estimate the latent trait underneath the binary responses
(Schrodt 2007; Gillion 2009). The estimated trait is then used in lieu of event count as a scale of
the attitude or behavior in question.
Since the event count model and the IRT model involve different assumptions, they are
different theoretical models with different empirical implications. This paper compares the two
modeling approaches in the context of survey research. We examine their theoretical
assumptions and conduct Monte-Carlo simulations to evaluate their statistical performance. Our
findings, surprisingly, show that the IRT model is not as good as the event count model when
each is applied to data generated from the other. They suggest that the IRT model should be used
with care when the true data generating process is unknown.
This paper proceeds as follows. We first introduce the probability distributions used in
(finite) event count analysis. We then discuss the issue of heterogeneity and the adequacy of
traditional event count models. Next, we compare the theoretical assumptions of event count
models and IRT models. Lastly, we conduct Monte-Carlo simulations and discuss our results
before making some concluding remarks.
Event Counts and Their Probability Distributions
The simplest type of event count is whether an event has occurred at all. Called a
Bernoulli trial, this situation has only two possible outcomes: success (which we call x = 1) and
failure (which we call x = 0). The probability of success is π, which is a constant. Formally, a
random variable X follows a Bernoulli distribution with parameter π when
$$X = \begin{cases} 1 & \text{with probability } \pi \\ 0 & \text{with probability } 1-\pi \end{cases} \qquad \text{where } 0 \le \pi \le 1$$
More complicated event counts can all be understood in terms of Bernoulli trials. If we
consider n independent Bernoulli trials Xj (j=1, 2, …, n) and each trial has the same probability
of success π, then the total number of successes $Y = \sum_{j=1}^{n} X_j$ out of n trials is a random variable
that follows a binomial distribution, which is given by the probability mass function (pmf)
$$\Pr(Y = y) = \binom{n}{y} \pi^{y} (1-\pi)^{n-y}$$
where y = 0, . . . , n and $0 \le \pi \le 1$. For example, Y might be the number of correct answers in a
battery of political knowledge questions or the number of “yes” responses to a battery of political
participation questions. The assumptions underlying the binomial distribution are that the
number of trials is finite, the trials are independent, and the probability π of success is identical
for the trials. Changing these basic and well-known assumptions yields other event counts. For
example, if the probability of success is kept constant across items but allowed to vary across
individuals according to the beta distribution, we have a beta-binomial distribution. When
modeling event counts with the binomial distribution, researchers often find that there is more
variance than what might be expected. The beta-binomial distribution is a popular generalization
of the binomial distribution that allows for this over-dispersion.
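The over-dispersion just described can be seen in a quick simulation. The sketch below is ours, not the authors': it uses only Python's standard `random` and `statistics` modules, and it assumes n = 10 items, a common π = 0.5 for the binomial case, and person-specific $\pi_i \sim$ Beta(2, 2) for the beta-binomial case (both choices are arbitrary illustrations). The helper name `simulate_counts` is hypothetical.

```python
import random
import statistics

def simulate_counts(n_items, n_people, alpha=None, beta=None, pi=None, seed=0):
    """Simulate event counts Y_i = sum of n_items Bernoulli draws per person.

    If alpha and beta are given, each person's success probability pi_i is
    drawn from Beta(alpha, beta) (beta-binomial); otherwise the common
    probability pi is used for everyone (binomial).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_people):
        p = rng.betavariate(alpha, beta) if alpha is not None else pi
        # Each comparison is one Bernoulli trial; summing the booleans counts successes.
        counts.append(sum(rng.random() < p for _ in range(n_items)))
    return counts

n = 10
# Beta(2, 2) has mean 0.5, so both processes have E(Y) = 5.
binom = simulate_counts(n, 20000, pi=0.5, seed=1)
betabin = simulate_counts(n, 20000, alpha=2.0, beta=2.0, seed=2)
print(statistics.pvariance(binom))    # should be near n * pi * (1 - pi) = 2.5
print(statistics.pvariance(betabin))  # should be near the beta-binomial value 7.0
```

Both samples have the same mean, but the second has roughly the beta-binomial variance $n\alpha\beta(\alpha+\beta+n)/[(\alpha+\beta)^2(\alpha+\beta+1)] = 7.0$ instead of the binomial 2.5.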
Prentice’s Extended Beta-Binomial Distribution
The beta-binomial distribution is a mixture of a beta distribution and a binomial
distribution. To derive the pmf of the beta-binomial random variable $Y_{bb}$, we first multiply the
conditional pmf of a binomial variable $Y \mid \pi$ by the probability density function (pdf) of π,
assumed to be a continuous random variable that follows a beta distribution, and then we
integrate the resulting joint probability function of Y and π over the domain of π (i.e., [0,1]) to
get the unconditional, marginal distribution of $Y_{bb}$:
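$$\begin{aligned}
\Pr(Y_{bb} = y) &= \int_0^1 \binom{n}{y} \pi^{y}(1-\pi)^{n-y} \cdot \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \pi^{\alpha-1}(1-\pi)^{\beta-1}\, d\pi \\
&= \binom{n}{y} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha+y)\,\Gamma(\beta+n-y)}{\Gamma(\alpha+\beta+n)} \\
&= \binom{n}{y} \frac{\mathrm{B}(\alpha+y,\ \beta+n-y)}{\mathrm{B}(\alpha,\beta)}
\end{aligned}$$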
Note that Γ(x) and B(x, y) refer to the gamma function and beta function, respectively. The
expected value and variance of the beta-binomial distribution are given by:
$$E(Y_{bb}) = \frac{n\alpha}{\alpha+\beta}, \qquad \operatorname{Var}(Y_{bb}) = \frac{n\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^2(\alpha+\beta+1)}$$
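As a numerical check on these closed forms, the sketch below evaluates the beta-binomial pmf through log-gamma values and compares the resulting moments with the mean and variance formulas. It is a minimal illustration using Python's `math` module (Python 3.8+ for `math.comb`); the function names and the values n = 6, α = 2, β = 3 are arbitrary choices, not taken from the paper.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """log B(a, b) via log-gamma, which avoids overflow in the gamma function."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(y, n, alpha, beta):
    """Pr(Y_bb = y) = C(n, y) * B(alpha + y, beta + n - y) / B(alpha, beta)."""
    return comb(n, y) * exp(log_beta(alpha + y, beta + n - y) - log_beta(alpha, beta))

n, alpha, beta = 6, 2.0, 3.0
pmf = [beta_binomial_pmf(y, n, alpha, beta) for y in range(n + 1)]
mean = sum(y * p for y, p in enumerate(pmf))
var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))

# Closed-form moments for comparison.
closed_mean = n * alpha / (alpha + beta)
closed_var = n * alpha * beta * (alpha + beta + n) / ((alpha + beta) ** 2 * (alpha + beta + 1))
print(sum(pmf), mean, closed_mean, var, closed_var)
```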
In his influential 1986 paper, Prentice extended the beta-binomial distribution to allow
for negative correlations among independent and identically-distributed Bernoulli trials (e.g.,
survey items) within an experimental unit (e.g., individuals). To do this, he uses the following
parameterization of the beta-binomial distribution:
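$$\rho = \frac{\alpha}{\alpha+\beta}, \qquad \gamma = \frac{1}{\alpha+\beta}$$
$$\Pr(Y_{bb} = y \mid \rho, \gamma) = \binom{n}{y} \frac{\prod_{j=0}^{y-1}(\rho+\gamma j)\,\prod_{j=0}^{n-y-1}(1-\rho+\gamma j)}{\prod_{j=0}^{n-1}(1+\gamma j)}$$
$$E(Y_{bb}) = n\rho, \qquad \operatorname{Var}(Y_{bb}) = n\rho(1-\rho)\left[1+(n-1)\gamma(1+\gamma)^{-1}\right]$$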
The re-parameterization yields the more intuitive interpretation of ρ as the expected
value of the binomial parameter π. The parameter γ has a less intuitive interpretation, but it is commonly
viewed as a measure of dispersion that is related to the variance of Y. Since γ is defined by the
transformation $\gamma = (\alpha+\beta)^{-1}$ and since α > 0 and β > 0, γ must also be greater than 0 by
inheritance. Prentice’s insight was that γ need not be restricted to positive values; the
re-parameterization made this restriction unnecessary so long as the following condition held:
$$\gamma \ge \max\left\{-\rho(n-1)^{-1},\ -(1-\rho)(n-1)^{-1}\right\}$$
The dispersion parameter γ is important because it gives rise to the correlation
coefficient:
$$\delta = \frac{\gamma}{1+\gamma}$$
According to Prentice, δ is the pairwise correlation between the binary trials that
comprise an event count. By allowing the dispersion parameter γ and by extension δ to take on
negative values, Prentice extended the beta-binomial distribution to allow for under-dispersion.
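Prentice’s product form is convenient precisely because it remains well defined for negative γ. The sketch below (our illustration, assuming Python 3.8+ for `math.prod` and `math.comb`; function names and the values n = 5, ρ = 0.4 are hypothetical) evaluates the pmf at the smallest admissible γ, at γ = 0, and at a positive γ, and checks that it sums to one and reproduces the variance formula. At negative γ the variance falls below the binomial value nρ(1 − ρ), and δ becomes negative.

```python
from math import comb, prod

def ebb_pmf(y, n, rho, gamma):
    """Extended beta-binomial pmf in Prentice's (rho, gamma) parameterization."""
    num = prod(rho + gamma * j for j in range(y)) * prod(1 - rho + gamma * j for j in range(n - y))
    den = prod(1 + gamma * j for j in range(n))
    return comb(n, y) * num / den

def ebb_variance(n, rho, gamma):
    """n * rho * (1 - rho) * [1 + (n - 1) * gamma / (1 + gamma)]."""
    return n * rho * (1 - rho) * (1 + (n - 1) * gamma / (1 + gamma))

n, rho = 5, 0.4
gamma_min = max(-rho / (n - 1), -(1 - rho) / (n - 1))  # Prentice's lower bound: -0.1 here

results = {}
for gamma in (gamma_min, 0.0, 0.5):
    pmf = [ebb_pmf(y, n, rho, gamma) for y in range(n + 1)]
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    delta = gamma / (1 + gamma)  # pairwise correlation between trials
    results[gamma] = (pmf, mean, var, delta)
    print(f"gamma={gamma:+.2f} total={sum(pmf):.6f} mean={mean:.4f} "
          f"var={var:.4f} formula={ebb_variance(n, rho, gamma):.4f} delta={delta:+.3f}")
```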
King’s Exposition of the Extended Beta-Binomial
As developed by Prentice, the extended beta-binomial distribution is a mixture
distribution formed by reparameterizing the π of the binomial distribution as a beta-distributed
random variable. The extended beta-binomial distribution presumes that the Bernoulli trials that
comprise the event count are identically and independently distributed. It is only under this iid
assumption that the sum of the Bernoullis can be a binomial that is to be further parameterized.
In the context of survey research, this means an individual respondent is assumed to have the
same probability of responding positively to all the binary items comprising a composite index,
but that probability is allowed to vary randomly across individuals according to the beta
distribution. For each individual, the index (an event count) is a binomial random variable, but
when all individuals are considered, the index becomes a random variable following the
extended beta-binomial distribution.
King (1989), however, interpreted the extended beta-binomial distribution as based on
“weakening the binomial assumption that the unobserved binary random variables making up Yi
have constant π ” (p. 45). This interpretation is inconsistent with Prentice’s original model, and
the inconsistency has been pointed out by Palmquist (1997). Nevertheless, the idea that the iid
assumption about the Bernoulli trials is too restrictive is important. In the study of political
participation, for example, it is clearly unreasonable to assume that an individual respondent
has the same probability of engaging in “persuading others,” “displaying preferences,” “attending
meetings,” “giving money,” etc. A more realistic model for event count analysis in survey
research must allow for the weakening of the binomial assumption as King suggested. In a later
section, we introduce the Poisson-binomial distribution that does just that. In what follows, we
first review Palmquist’s unpublished work that has contributed significantly to the subject.
Palmquist’s Correction of King
Palmquist’s work centered largely on the fact that the variance of the beta-binomial
distribution is consistently larger than the variance of the simple binomial distribution (though under
some special circumstances, under the heading of the extended beta-binomial distribution, it may be
smaller). The simple binomial distribution has the familiar properties,
$$E(Y_{binomial}) = n\pi, \qquad \operatorname{Var}(Y_{binomial}) = n\pi(1-\pi)$$
This simple distribution can be represented graphically as in Figure 1a, where the squares
represent Bernoulli random variables, each with parameter π. These Bernoullis are
arranged in units of n items (here, n = 6), each unit indexed by i to denote an individual. The variable of
interest, $Y_i$, is of course the sum of all the Bernoulli outcomes in a unit.
(Figure 1 about here)
The greater variance of the beta-binomial distribution can be understood to be a result of
letting π become a random variable. We diagram that scenario in Figure 1b. In that figure, the
circle represents a beta distribution with parameters α and β, and the arrows from it indicate
that $\pi_i$ is a beta-distributed random variable. Because this change from the binomial
distribution creates a higher Var(Yi), the additional variance is often referred to as “extra-
binomial” variance.
It is important to note that, in describing these models, we are not allowing for the
re-parameterization of our parameters as functions of some set of covariates. In a real-world
application of a distribution like the one diagrammed above (imagine again that each i indexes a
survey respondent, and each Bernoulli variable is a survey question), we might well want to let
the beta distributions’ parameters vary as a function of the respondents’ characteristics. Indeed,
it would be hard to imagine an application in which we would be justified in asserting that the
parameters of the Bernoulli variables or of the beta distributions would be the same for all
individuals. However, even allowing for such individual-specific covariate-explained variation,
all of our theoretical results (i.e., those in the ensuing sections) would continue to hold
conditional on the covariates. Our findings would then refer, for example, to $\operatorname{Var}(Y_i \mid x_i'\beta)$, but,
of course, the unconditional variance $\operatorname{Var}(Y_i)$ would now include the additional effects of the
variation in the x’s.
Indeed, the variance of Yi will continue to be a primary focus as we proceed, as it was for
Palmquist. His work was thorough in clarifying how much extra-binomial variance we should
expect in data that followed a beta-binomial distribution as opposed to a simple binomial
distribution. It was in the light of those considerations that Palmquist introduced some
interesting complications into the beta-binomial model. He found that under certain
circumstances, there would be a reduction in the extra-binomial variances – in the limit case, the
extra-binomial variance could even completely disappear. Figure 1c is an illustration of the
modified beta-binomial model that Palmquist explored.
In this picture, each n = 6 unit has been divided into two “clusters” indexed by j, each
with $n_j = 3$ Bernoullis. Within each cluster, $\pi_{ij}$ is the same across Bernoullis, but each
cluster’s $\pi_{ij}$ is determined independently by one of two independent beta distributions. Palmquist
conclusively showed that in such a scenario, the variance of Y is lower than in the standard
beta-binomial case. Some, but not all, of the extra-binomial variance that we usually see in the
beta-binomial distribution is offset by the division of the Bernoullis into clusters. (This is
analogous to a well-known result: the sum of two independent binomial variables with different success
probabilities has a variance no greater than that of a single binomial variable with the same total
number of trials and the trial-weighted average success probability.)
Palmquist took this one step further. He showed that if the number of clusters is
increased to the limit, that is, if we let there be as many clusters as Bernoullis, so that each
cluster contains only one Bernoulli variable, then all of the extra-binomial variance disappears.
Such a case is no different from a simple binomial distribution with π equal to the average of the
expectations of all the n beta distributions.
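Palmquist’s clustering argument can be made concrete with the beta-binomial moment formulas. The sketch below assumes equal-sized clusters and that every cluster draws its probability from the same Beta(α, β) (Palmquist’s clusters could in principle use different beta distributions); the function name `clustered_variance` is hypothetical. Because the clusters are independent, the variance of Y is the sum of the cluster-level beta-binomial variances.

```python
def clustered_variance(n, k, alpha, beta):
    """Var(Y) when the n items of a unit are split into k equal clusters.

    Each cluster shares one success probability drawn independently from
    Beta(alpha, beta), so each cluster total is beta-binomial with m = n // k
    trials; independent clusters have additive variances.
    """
    if n % k:
        raise ValueError("k must divide n evenly")
    m = n // k
    rho = alpha / (alpha + beta)
    cluster_var = m * rho * (1 - rho) * (alpha + beta + m) / (alpha + beta + 1)
    return k * cluster_var

n, alpha, beta = 6, 2.0, 2.0
for k in (1, 2, 3, 6):  # k = 1 is the beta-binomial; k = n is the limiting case
    print(k, clustered_variance(n, k, alpha, beta))
```

With n = 6 and α = β = 2, the variance falls from 3.0 with a single cluster to 1.5 with six one-item clusters, which equals the binomial variance nπ(1 − π) for π = 0.5.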
While Palmquist’s attempt to introduce within-unit heterogeneity used multiple beta
distributions, we can easily imagine a scenario in which that is not the case. Take the data
generating process illustrated in Figure 1d. In this distribution, a single beta distribution determines a full set
of π-values, one for each of the n items in a unit, and these remain unchanged across units.
Here we have within-unit heterogeneity but no between-unit heterogeneity. This distribution is
known as the “Poisson-binomial” distribution, and it is discussed in depth in the next section.
Finally, we note that between-unit heterogeneity is easily introduced into the Poisson-
binomial. In fact, Palmquist also explored the somewhat trivial special case represented in
Figure 1e. In this special case, as Palmquist noted, the under-dispersion resulting from the
within-unit heterogeneity exactly cancels out the over-dispersion caused by the across-unit
heterogeneity. The result is a distribution with binomial variance.
In a typical political science application – say, for example, that the unit of observation is
a survey respondent, and the Bernoulli variables are survey items – we must be careful to
identify the data generating process most likely to resemble the nature of the experimental
design. Do the survey items all have the same probability of being answered “1” as opposed to
“0”? Are there “clusters” of questions with similar probabilities? Is there likely to be random
variation from one respondent to the next? As we have clearly seen, the variety of ways to
model this seemingly simple data raises a number of questions. What may at first appear to be a
simple binomial distribution may in fact have subtle complications that seriously affect the
distribution’s variance. Of course, in order to be able to use these variations on the binomial
distribution, we must explore them more rigorously than has been done in this section. The
following section explores the mathematics underlying the various introductions of heterogeneity
to the binomial model.
The Issue of Heterogeneity
Deciding whether to use one variant of event count model over another depends on the
nature of the heterogeneity at hand. If parameters are allowed to vary, they are heterogeneous; if
they are assumed to be constants, they are not. Using a binomial distribution to model an event
count in survey research assumes homogeneity in two senses: First, the probabilities of success
in the n survey items are constant, and second, the probabilities of success across the N
individuals are constant. These assumptions can be expressed formally as follows:
$$Y_i \sim \text{binomial}(n, \pi)$$
where Yi is a binomial random variable, n and π are constant parameters, and i = 1, 2, . . . , N
refers to individuals. Note that the parameter π has no subscripts, which means π is the same
across survey items and across individuals. But assuming that there is homogeneity across
individuals and across items is often unrealistic. For example, it may be unrealistic to assume
that each individual has the same probability of correctly identifying political figures in a battery
of questions designed to measure political knowledge; instead, the researcher may want to allow
for heterogeneity across individuals. Alternatively, it may be unrealistic to assume that there is
the same probability of a correct answer across survey items; instead, the researcher may want to
allow for heterogeneity across items. Allowing for heterogeneity across individuals is quite
common and leads to the use of the beta-binomial or extended beta-binomial distribution. But
allowing for heterogeneity across items is rarely done or recognized. All too often, researchers
intend to allow for heterogeneity across items but actually allow for heterogeneity across
individuals. This is an understandable error – after all, the empirical footprints of both types of
heterogeneity are virtually indistinguishable. Accordingly, it is useful and necessary to clarify
further the distinction between the two types of heterogeneity in survey research.
Heterogeneity across Sets of Trials
To allow for heterogeneity across individuals, we allow the probability of success π for a
binomial variable $Y_i$ to vary across individuals according to a beta distribution with parameters ρ and γ. This yields a
random variable that follows a beta-binomial or extended beta-binomial distribution:
$$Y_i \mid \pi_i \sim \text{binomial}(n, \pi_i), \qquad \pi_i \sim \text{beta}(\rho, \gamma)$$
Each individual i is associated with n items, which are independent Bernoulli trials with
parameter πi, and each πi is drawn from a beta distribution with parameters ρ and γ. The
parameter πi varies across individuals, but not across items. This is because a binomial random
variable is, by definition, a sum of independent, identical Bernoulli variables. If πi varies across
items, we would have independent but non-identical Bernoulli variables, which means we would
no longer have binomial variables.
By using the extended beta-binomial distribution for event counts, we assume that we
observe $Y_i$ successes for individual i in n independent, identical Bernoulli trials which have the
probability of success $\pi_i$, where $\pi_i$ is a random variable that follows a beta distribution. By
allowing $\pi_i$ to follow the beta distribution, we generalize the binomial distribution; when all the
respondents’ probabilities of success are the same (i.e., when π is a constant), the extended
beta-binomial distribution reduces to the binomial distribution with parameters n and π.
The purpose of generalizing the binomial distribution via the extended beta-binomial is
to account for heterogeneity across individuals. But the extended beta-binomial cannot handle
heterogeneity within individuals, or heterogeneity across items, because by
starting with a binomial distribution and allowing the parameter π to vary, we already assume
that there is no heterogeneity across independent Bernoulli trials. That is why using the extended
beta-binomial distribution necessarily assumes that the probability of success may differ across
individuals but is the same across each individual’s survey items.
Heterogeneity within Sets of Trials
To account for heterogeneity across independent Bernoulli trials, we must therefore start
with a Bernoulli distribution, not with a binomial distribution. To derive this kind of distribution
for event counts, we need to identify the distribution that is a sum of independent but non-
identical Bernoulli trials – in other words, the so-called Poisson trials (Pepper 1929; Feller 1968).
It turns out that the distribution of the sum of Poisson trials is called the Poisson-binomial
distribution.1
The Poisson-Binomial
Let $Y_{pb}$ denote the number of successes in n independent, non-identical Bernoulli trials.
When the trials are identical, $Y_{pb}$ collapses into the familiar binomial distribution with parameters
n and π, where π denotes the probability of success at each jth trial (j = 1, . . . , n). When the
trials are not identical and the jth trial has its own distinct probability of success $\pi_j$, the
distribution of $Y_{pb}$ is known as the Poisson-binomial.
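Assuming that $0 \le \pi_j \le 1$ and letting $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_n)$, the Poisson-binomial distribution
(Wang 1993) is given by
$$\Pr(Y_{pb} = y \mid \boldsymbol{\pi}) = \sum_{A \in F_y} \prod_{j \in A} \pi_j \prod_{j \in A^c} (1-\pi_j)$$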
1 Poisson was the first to consider extending the binomial distribution to non-identical Bernoulli trials; the names “Poisson-binomial distribution” and “Poisson’s binomial distribution” reflect this piece of statistical history. The name “Poisson-binomial distribution” is more popular (see Le Cam 1960; Hodges and Le Cam 1960; Edwards 1960; and Chen 1974).
where n is a positive integer denoting the number of Bernoulli trials and y = 0, 1, . . . , n.2
Appendix 1 provides an intuitive exposition of the Poisson-binomial distribution. The
expectation and variance of Ypb are given by:
$$E(Y_{pb}) = \sum_{j=1}^{n} \pi_j, \qquad \operatorname{Var}(Y_{pb}) = \sum_{j=1}^{n} \pi_j(1-\pi_j)$$
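The subset sum in the pmf can be evaluated directly for small n, but it has $2^n$ terms. A common alternative (our choice here, not Wang’s formula) is to add one trial at a time with a simple recursion. The sketch below implements both in plain Python, with hypothetical function names and arbitrary $\pi_j$ values, and checks the moments against the expressions above.

```python
from itertools import combinations
from math import prod

def poisson_binomial_pmf_enum(pis):
    """Direct use of the subset formula: sum over all subsets A of size y."""
    n = len(pis)
    idx = range(n)
    pmf = []
    for y in range(n + 1):
        total = 0.0
        for A in combinations(idx, y):
            in_A = set(A)
            total += prod(pis[j] if j in in_A else 1 - pis[j] for j in idx)
        pmf.append(total)
    return pmf

def poisson_binomial_pmf_dp(pis):
    """O(n^2) recursion: fold in one non-identical Bernoulli trial at a time."""
    pmf = [1.0]
    for p in pis:
        new = [0.0] * (len(pmf) + 1)
        for y, q in enumerate(pmf):
            new[y] += q * (1 - p)   # this trial fails
            new[y + 1] += q * p     # this trial succeeds
        pmf = new
    return pmf

pis = [0.1, 0.3, 0.5, 0.7, 0.9, 0.4]
pmf = poisson_binomial_pmf_dp(pis)
mean = sum(y * p for y, p in enumerate(pmf))
var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
print(mean, sum(pis))
print(var, sum(p * (1 - p) for p in pis))
```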
Recall that the expected value and variance of an extended beta-binomial distribution
(denoted by Y) are
$$E(Y) = n\rho, \qquad \operatorname{Var}(Y) = n\rho(1-\rho) + n(n-1)\rho(1-\rho)\gamma(1+\gamma)^{-1}$$
Note that the first term of Var(Y) is the variance for a binomial distribution and the second term
is the so-called “extra-binomial” variance that allows for over-dispersion in the beta-binomial.
With a positive γ, this extra-binomial term is the reason that heterogeneity across individuals
leads to a variance that is greater than the binomial variance (Palmquist 1997).
2 The sets are defined as follows:
$$F_x = \{A : A \subseteq \{1, \ldots, n\},\ |A| = x\}, \qquad P_x = \{\pi(A) : A \in F_x,\ \pi(A) \text{ is a permutation of } A\}$$
where |A| denotes the number of elements of A, and $A^c$ denotes the complement of A. If $A \in F_x$, then
$A = (i_1, \ldots, i_x)$ is an ordered set such that $i_j < i_k$ if j < k, and $\pi(A) = (\pi(i_1), \ldots, \pi(i_x))$ is a permutation of the
elements of A. $F_n$ contains the n-tuple (1, . . . , n), $|F_x| = \binom{n}{x}$ is the number of subsets of size x of {1, . . . , n}, and $|P_n| = n!$ is the number of permutations of {1, . . . , n}. For
each fixed $A \in F_x$, there are x! corresponding elements in $P_x$ (the number of permutations of the elements of A); hence
$$|P_x| = \binom{n}{x}\, x! = \frac{n!}{(n-x)!}$$
In contrast, heterogeneity across items leads to a variance that is smaller than the
binomial variance. We can see that the variance for the Poisson-binomial distribution is
necessarily smaller than or equal to the variance for the binomial distribution by using a special
case of the Cauchy-Schwarz theorem:
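$$\begin{aligned}
\left(\sum_{j=1}^{n}\pi_j\right)^2 &\le n\sum_{j=1}^{n}\pi_j^2 \\
\frac{1}{n}\left(\sum_{j=1}^{n}\pi_j\right)^2 &\le \sum_{j=1}^{n}\pi_j^2 \\
\sum_{j=1}^{n}\pi_j - \frac{1}{n}\left(\sum_{j=1}^{n}\pi_j\right)^2 &\ge \sum_{j=1}^{n}\pi_j - \sum_{j=1}^{n}\pi_j^2 \\
n\left(\frac{1}{n}\sum_{j=1}^{n}\pi_j\right)\left(1-\frac{1}{n}\sum_{j=1}^{n}\pi_j\right) &\ge \sum_{j=1}^{n}\pi_j(1-\pi_j)
\end{aligned}$$
The left-hand side of the last line is the variance of a binomial distribution whose π equals the average of the $\pi_j$, and the right-hand side is $\operatorname{Var}(Y_{pb})$.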
This is the result that Palmquist (1997) reported in his discussion of parameter heterogeneity
(although he did not identify the Poisson-binomial distribution as the data-generating process
characterized by heterogeneity across items). From this result, we can conclude, as Palmquist
did, that across-item heterogeneity will lead to under-dispersion of the dependent variable,
relative to across-item homogeneity. In other words, heterogeneity across items will produce a
smaller variance than the binomial variance, but heterogeneity across individuals will produce a
larger variance. If both types of heterogeneity are present in survey data, they can counteract
each other’s effects. The big question for event count analysis, therefore, is whether traditional
event count models remain adequate under this circumstance.
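The Cauchy-Schwarz inequality above is easy to check numerically. Subtracting the two sides gives $n\bar\pi(1-\bar\pi) - \sum_j \pi_j(1-\pi_j) = \sum_j (\pi_j - \bar\pi)^2$, where $\bar\pi$ is the average of the $\pi_j$, so the gap is the sum of squared deviations of the item probabilities and vanishes only when all $\pi_j$ are equal. The sketch below draws random probability vectors (an arbitrary test design, with hypothetical function names) and confirms both facts.

```python
import random

def poisson_binomial_var(pis):
    """Var(Y_pb) = sum of pi_j * (1 - pi_j)."""
    return sum(p * (1 - p) for p in pis)

def binomial_var_at_mean(pis):
    """n * pbar * (1 - pbar): the binomial variance with pi set to the mean of the pi_j."""
    n = len(pis)
    pbar = sum(pis) / n
    return n * pbar * (1 - pbar)

rng = random.Random(42)
min_gap = float("inf")
max_identity_error = 0.0
for _ in range(1000):
    pis = [rng.random() for _ in range(rng.randint(1, 12))]
    gap = binomial_var_at_mean(pis) - poisson_binomial_var(pis)
    pbar = sum(pis) / len(pis)
    squared_dev = sum((p - pbar) ** 2 for p in pis)
    min_gap = min(min_gap, gap)
    max_identity_error = max(max_identity_error, abs(gap - squared_dev))
print(min_gap, max_identity_error)
```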
Following King’s (1989) insight, we speculate that the extended beta-binomial model
will remain adequate. This is because with Prentice’s extension of γ to the negative domain,
“extra-binomial” variance can turn negative and lead to the reduction of binomial variance.
Because of the flexibility, the extended beta-binomial model can accommodate both over- and
under-dispersion, and, hence, both across-individual and across-item heterogeneity. We
demonstrate this with examples below.
Examples of the Poisson-Binomial and the Adequacy of the Extended Beta-Binomial Model
As an example of the Poisson-binomial, assume that each probability of success $\pi_j$
associated with the Bernoulli random variable $X_j$ (j = 1, . . . , n) is drawn from a beta
distribution with parameters ρ and γ, i.e.,
$$X_j \mid \pi_j \sim \text{Bernoulli}(\pi_j), \qquad \pi_j \sim \text{beta}(\rho, \gamma)$$
with
$$E(X_j \mid \pi_j) = \pi_j, \qquad E(\pi_j) = \rho$$
We derive the unconditional distribution of $X_j$ by first multiplying the conditional distribution
of $X_j \mid \pi_j$ by the distribution of $\pi_j$ to get the joint distribution of $X_j$ and $\pi_j$, and then
integrating this joint distribution over the domain of $\pi_j$, which is [0,1].
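$$\begin{aligned}
\Pr(X_j = x_j) &= \int_0^1 \pi_j^{x_j}(1-\pi_j)^{1-x_j} \cdot \frac{\Gamma(\gamma^{-1})}{\Gamma(\rho\gamma^{-1})\,\Gamma((1-\rho)\gamma^{-1})} \pi_j^{\rho\gamma^{-1}-1}(1-\pi_j)^{(1-\rho)\gamma^{-1}-1}\, d\pi_j \\
&= \frac{\Gamma(\gamma^{-1})\,\Gamma(\rho\gamma^{-1}+x_j)\,\Gamma((1-\rho)\gamma^{-1}+1-x_j)}{\Gamma(\rho\gamma^{-1})\,\Gamma((1-\rho)\gamma^{-1})\,\Gamma(\gamma^{-1}+1)}
\end{aligned}$$
Therefore,
$$\Pr(X_j = 1) = \rho, \qquad \Pr(X_j = 0) = 1-\rho$$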
This means that the unconditional distribution of $X_j$ is simply a Bernoulli distribution
with parameter $\rho = E(\pi_j)$. In other words, by allowing the parameter of a Bernoulli distribution
to vary according to a beta distribution, we end up with yet another Bernoulli distribution but
with a different parameter, which happens to be the expected value of the original parameter as a
beta-distributed random variable. We call the unconditional distribution of $X_j$ a “beta-Bernoulli”
distribution. The moment generating function of this distribution is
$$M_{X_j}(t) = E(e^{tX_j}) = (1-\rho)e^{0} + \rho e^{t} = 1-\rho+\rho e^{t}$$
Because the $\pi_j$, and hence the $X_j$, are drawn independently, the moment generating function of the sum of beta-Bernoullis $Y = \sum_{j=1}^{n} X_j$ is the product:
$$M_Y(t) = \prod_{j=1}^{n} M_{X_j}(t) = \prod_{j=1}^{n}(1-\rho+\rho e^{t}) = (1-\rho+\rho e^{t})^{n}$$
which corresponds to the moment generating function of a binomial distribution with parameters
n and $\pi = \rho = E(\pi_j)$. The variance of Y is
$$\operatorname{Var}(Y) = \sum_{j=1}^{n} \rho(1-\rho) = n\rho(1-\rho)$$
When this Poisson-binomial process is assumed to hold true for all individuals i = 1, 2, . . . , N, we
have the model shown in Figure 1e, which is the case when the under-dispersion resulting from
the within-unit heterogeneity exactly cancels out the over-dispersion caused by the across-unit
heterogeneity. The result is a distribution with binomial variance, given by Var(Y) above.
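The binomial result can also be checked by simulation under the setup of Figure 1e, in which every item of every unit draws a fresh $\pi_j$. The sketch below is a minimal illustration, not the authors’ code: it assumes Prentice’s mapping α = ρ/γ, β = (1 − ρ)/γ to the usual beta shape parameters, uses only Python’s standard library, and the function name and the values n = 8, ρ = 0.3, γ = 0.5 are arbitrary choices. The empirical frequencies of Y should track the binomial(n, ρ) pmf, and the sample variance should be close to nρ(1 − ρ).

```python
import random
import statistics
from collections import Counter
from math import comb

def simulate_beta_bernoulli_sums(n, rho, gamma, n_units, seed=0):
    """Return Y = X_1 + ... + X_n for each unit, where every item draws its own
    pi_j ~ beta(rho, gamma) and then X_j ~ Bernoulli(pi_j).

    Prentice's parameterization is mapped to shape parameters as
    alpha = rho / gamma and beta = (1 - rho) / gamma.
    """
    a, b = rho / gamma, (1 - rho) / gamma
    rng = random.Random(seed)
    sums = []
    for _ in range(n_units):
        y = 0
        for _ in range(n):
            pi_j = rng.betavariate(a, b)
            y += rng.random() < pi_j
        sums.append(y)
    return sums

n, rho, gamma = 8, 0.3, 0.5
sums = simulate_beta_bernoulli_sums(n, rho, gamma, n_units=40000, seed=3)
freq = Counter(sums)
for y in range(n + 1):
    binomial = comb(n, y) * rho**y * (1 - rho) ** (n - y)
    print(y, round(freq[y] / len(sums), 4), round(binomial, 4))
print(statistics.pvariance(sums), n * rho * (1 - rho))
```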
This model can be generalized to the one shown in Figure 2. Using the frame of survey
research, each individual respondent (indexed i, as always) has his/her own beta distribution
from which the $\pi_{ij}$’s are generated. Palmquist never explored this possibility. As the arrows here
indicate, the expected value of each individual-specific beta distribution, that is, the $\rho_i$ of each
$B_i$, is in turn generated by the same universal beta distribution B(ρ, γ). (It is worth noting that the
distributions presented in Figures 1a, 1b, 1d, and 1e above can all be construed as special cases
of this one.)
(Figure 2 about here)
Through repeated simulations, we found that this Poisson-binomial process can be estimated
by a beta-binomial model. For example, we set n = 10, ρ = 0.5, γ = 0.5, and let all of the $\gamma_i = 0.5$.
When we simulated the N observations generated by this process, the variance of the data
was $\operatorname{Var}(Y) \approx 10$. Now, if we had assumed that this was a simple beta-binomial distribution,
with ρ = 0.5 and γ = 0.5, we could have calculated the variance:
$$\operatorname{Var}(Y) = n\rho(1-\rho)\left[1+(n-1)\gamma(1+\gamma)^{-1}\right] = 10(0.5)(0.5)\left[1+9(0.5)(1.5)^{-1}\right] = 10$$
Thus, our simulation result corroborates our speculation. The extended beta-binomial
distribution, with the two parameters ρ and γ , appears to have the flexibility to correctly reflect
data with both within-unit heterogeneity and between-unit heterogeneity or, more specifically,
data with heterogeneity across both items and individuals.
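The simulation described in this subsection can be sketched as follows. This is our reconstruction, not the authors’ code: it assumes Prentice’s parameterization (α = ρ/γ, β = (1 − ρ)/γ) for both the universal distribution B(ρ, γ) and each individual distribution $B_i(\rho_i, \gamma_i)$, uses Python’s standard `random` module, and the function names are hypothetical. Given $\rho_i$, each item response is a beta-Bernoulli with parameter $\rho_i$, so in this setup the value of $\gamma_i$ does not affect Var(Y).

```python
import random
import statistics

def prentice_to_shapes(rho, gamma):
    """Map Prentice's (rho, gamma) to beta shape parameters (alpha, beta)."""
    return rho / gamma, (1 - rho) / gamma

def simulate_figure2(n, rho, gamma, gamma_i, n_units, seed=0):
    """Hierarchical process: rho_i ~ B(rho, gamma), pi_ij ~ B_i(rho_i, gamma_i),
    X_ij ~ Bernoulli(pi_ij); returns Y_i = sum_j X_ij for each unit."""
    rng = random.Random(seed)
    a, b = prentice_to_shapes(rho, gamma)
    counts = []
    for _ in range(n_units):
        rho_i = rng.betavariate(a, b)
        rho_i = min(max(rho_i, 1e-12), 1 - 1e-12)  # keep both shape parameters positive
        a_i, b_i = prentice_to_shapes(rho_i, gamma_i)
        y = 0
        for _ in range(n):
            y += rng.random() < rng.betavariate(a_i, b_i)  # one item response
        counts.append(y)
    return counts

n, rho, gamma = 10, 0.5, 0.5
counts = simulate_figure2(n, rho, gamma, gamma_i=0.5, n_units=30000, seed=7)
ebb_var = n * rho * (1 - rho) * (1 + (n - 1) * gamma / (1 + gamma))
print(statistics.pvariance(counts), ebb_var)
```

With ρ = 0.5 and γ = 0.5, B(ρ, γ) is the uniform distribution on [0, 1], and the sample variance should come out close to the value of 10 computed above.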
Event Count Regression Models and Item Response Theory
Generalized linear regression models based on the beta-binomial family have been used to
study event counts in survey research. In recent years, however, some political scientists have
turned to item response theory (IRT) to conduct such analyses (Schrodt 2007; Gillion 2009). IRT
models are not really designed for analyzing event counts. They are relevant, however, because
they, like factor analysis, can estimate the latent trait underneath the binary responses comprising
an event count and use it to scale the attitude or behavior in question. In other words, estimated
latent traits substitute for event counts as measures of attitudes and behaviors. Moreover, IRT
models have the advantage that they can explicitly model within-unit heterogeneity by
including parameters representing item-specific properties, i.e., difficulty, discrimination, and
guessing parameters. In this section, we compare the two approaches. Because covariates are
necessary in any realistic analysis, we first provide a formal definition of the beta-binomial
regression.
The Extended Beta-Binomial Regression Model
Using the survey research framework, suppose there are N individuals each of whom
responded to n binary items. Let $X_{ij}$ represent whether individual i answers item j positively,
and let $X_{ij} \sim$ iid Bernoulli($\pi_i$) across items, with $\pi_i = \Pr(X_{ij} = 1) \sim \text{Beta}(\rho, \gamma)$. Then $Y_i = \sum_{j=1}^{n} X_{ij}$,
the number of items that individual i answers positively, follows the beta-binomial distribution:
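$$\Pr(Y_i = y_i \mid \rho, \gamma) = \binom{n}{y_i} \frac{\prod_{j=0}^{y_i-1}(\rho+\gamma j)\,\prod_{j=0}^{n-y_i-1}(1-\rho+\gamma j)}{\prod_{j=0}^{n-1}(1+\gamma j)}$$
with
$$E(Y_i) = n\rho, \qquad \operatorname{Var}(Y_i) = n\rho(1-\rho)\left[1+(n-1)\gamma(1+\gamma)^{-1}\right]$$
$$0 \le \rho \le 1, \qquad \gamma \ge \max\left\{-\rho(n-1)^{-1},\ -(1-\rho)(n-1)^{-1}\right\}$$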
When the parameter ρ is related to covariates $x_i$ using the logit link
$$\rho_i = E(\pi_i) = \frac{\exp(x_i'\beta)}{1+\exp(x_i'\beta)}$$
then we have the extended beta-binomial regression model.
Note that if γ = 0,
$$\Pr(Y_i = y_i; \rho_i) = \binom{n}{y_i} \rho_i^{y_i}(1-\rho_i)^{n-y_i}$$
and extended beta-binomial regression reduces to binomial regression.
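A minimal sketch of the corresponding log-likelihood follows, assuming the logit link above, covariate rows whose first entry is a constant, and n ≥ 2 items (so that Prentice’s lower bound on γ is defined). The function names are hypothetical, and in practice the function would be handed to a numerical optimizer; here we only check that at γ = 0 it matches the binomial log-likelihood.

```python
from math import comb, exp, log

def logistic(eta):
    """exp(eta) / (1 + exp(eta)), written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + exp(-eta))
    z = exp(eta)
    return z / (1.0 + z)

def ebb_loglik(coef, gamma, X, y, n):
    """Extended beta-binomial regression log-likelihood with a logit link for rho_i."""
    ll = 0.0
    for x_i, y_i in zip(X, y):
        rho = logistic(sum(b * x for b, x in zip(coef, x_i)))
        lower = max(-rho / (n - 1), -(1 - rho) / (n - 1))
        if gamma < lower:
            return float("-inf")  # gamma outside the admissible range for this rho_i
        terms = [rho + gamma * j for j in range(y_i)]
        terms += [1 - rho + gamma * j for j in range(n - y_i)]
        if any(t <= 0 for t in terms):
            return float("-inf")  # this observation has probability zero
        ll += log(comb(n, y_i)) + sum(log(t) for t in terms)
        ll -= sum(log(1 + gamma * j) for j in range(n))
    return ll

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1, 2, 4, 5]
n = 5
coef = [-1.0, 0.8]
print(ebb_loglik(coef, 0.0, X, y, n))  # equals the binomial log-likelihood
print(ebb_loglik(coef, 0.2, X, y, n))  # positive gamma: over-dispersion
```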
As discussed above, the extended beta-binomial regression model can accommodate data
with heterogeneity across both items and individuals. It is adequate for event count analysis in
survey research.
IRT Models as Mixed Effect Logit Models
According to Rijmen, Tuerlinckx, De Boeck and Kuppens (2003) and De Boeck and Wilson (2004), IRT can be represented as a mixed logit model. Specifically, the probability that the $i$-th ($i = 1, 2, \ldots, N$) survey respondent correctly answers the $j$-th ($j = 1, 2, \ldots, n$) item can be represented as

$$\pi_{ij} = \frac{\exp(x_{ij}'\beta_j + z_{ij}'\theta_i)}{1 + \exp(x_{ij}'\beta_j + z_{ij}'\theta_i)}$$

where $x_{ij}$ is a $p$-dimensional vector of person-by-item covariates; $z_{ij}$ is a $q$-dimensional vector of person-by-item covariates; $\beta_j$ is a $p$-dimensional vector of item-specific fixed effects; and $\theta_i$ is a $q$-dimensional vector of individual-specific random effects. Several special cases can be derived from this specification.
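The general mixed-logit form can be written as a one-line function. The sketch below is illustrative only: `irt_prob` is a hypothetical helper, and it uses the algebraically equivalent form $1/(1 + e^{-\eta})$ for the inverse logit. Passing constant one-element vectors reproduces the basic Rasch case, and passing individual covariates in `x_ij` (with an item-specific coefficient vector) together with `z_ij = [1]` gives a latent Rasch regression.

```python
import numpy as np

def irt_prob(x_ij, z_ij, beta_j, theta_i):
    """pi_ij = exp(x_ij'beta_j + z_ij'theta_i) / (1 + exp(x_ij'beta_j + z_ij'theta_i))."""
    eta = np.dot(x_ij, beta_j) + np.dot(z_ij, theta_i)   # linear predictor
    return 1.0 / (1.0 + np.exp(-eta))                     # inverse logit

# Basic Rasch case: p = q = 1 and x_ij = z_ij = 1, so pi_ij = logistic(beta_j + theta_i).
print(irt_prob([1.0], [1.0], [-0.5], [0.2]))

# Latent Rasch regression: x_ij = (1, x_1i, x_2i) with item-specific beta_j, z_ij = 1.
print(irt_prob([1.0, 0.4, -1.2], [1.0], [-0.5, -0.2, 0.3], [0.2]))
```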
The Basic Rasch Model.
If we assume $p = q = 1$ and $x_{ij} = z_{ij} = 1$, then

$$\pi_{ij} = \frac{\exp(\beta_j + \theta_i)}{1 + \exp(\beta_j + \theta_i)}$$

This is the basic Rasch model, where $\beta_j$ is the difficulty parameter of item $j$ and $\theta_i$ is a random variable. A simple one-parameter IRT model, the basic Rasch model is a mixed-effect logistic regression model containing no covariates but only constant terms in $x_{ij}$ and $z_{ij}$.
The Latent Rasch Regression Model with Individual-Specific Covariates for Fixed Effects.
If we allow individual-specific covariates $x_i$ for fixed effects in the basic Rasch model above, we derive a more general latent Rasch regression model:

$$\pi_{ij} = \frac{\exp(x_i'\beta_j + \theta_i)}{1 + \exp(x_i'\beta_j + \theta_i)}$$

A special case of this model is when all items share the same vector $\beta_j$ (i.e., $\beta_j = \beta$ for $j = 1, 2, 3, \ldots, n$). In this case, we can drop the $j$ subscript from $\pi_{ij}$ to get

$$\pi_i = \frac{\exp(x_i'\beta + \theta_i)}{1 + \exp(x_i'\beta + \theta_i)}$$

This is a Rasch regression model for homogeneous items with both fixed effects and random effects.
The Binomial Regression Model.
Based on the Rasch model for homogeneous items above, if we further remove the random effect, we get

$$\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$$

which happens to be the binomial regression model with

$$\Pr(Y_i = y_i) = \binom{n}{y_i} \pi_i^{y_i} (1 - \pi_i)^{n - y_i}$$

with a logit link for $\pi_i$. Note that the constant term of the linear form $x_i'\beta$ is the common difficulty parameter shared by all items. This shows that the binomial regression model is a special case of the latent Rasch regression model.
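The reduction can be checked numerically. Under the homogeneous-item Rasch regression, $Y_i$ given $\theta_i$ is binomial, so its marginal distribution is a normal mixture of binomials. The Python sketch below integrates over $\theta_i \sim N(0, \sigma^2)$ with Gauss-Hermite quadrature, a numerical choice we made for this illustration; `rasch_homog_count_pmf` is a hypothetical name. With `sigma = 0` the random effect disappears and the result coincides with the binomial pmf; with `sigma > 0` the count is overdispersed relative to a binomial with the same mean.

```python
import numpy as np
from scipy.stats import binom

def rasch_homog_count_pmf(n, xb, sigma, n_nodes=40):
    """Marginal Pr(Y_i = y), y = 0..n, when pi_i = logistic(xb + theta_i)
    and theta_i ~ N(0, sigma^2); xb stands for x_i'beta."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    weights = weights / weights.sum()              # quadrature weights for a N(0, 1) average
    pis = 1.0 / (1.0 + np.exp(-(xb + sigma * nodes)))
    ys = np.arange(n + 1)
    # Given theta_i, Y_i ~ Binomial(n, pi_i); average over the quadrature nodes.
    return binom.pmf(ys[:, None], n, pis[None, :]) @ weights

n, xb = 10, 0.1
ys = np.arange(n + 1)
p_binom = binom.pmf(ys, n, 1.0 / (1.0 + np.exp(-xb)))
print(np.allclose(rasch_homog_count_pmf(n, xb, 0.0), p_binom))   # → True (no random effect)

p_mix = rasch_homog_count_pmf(n, xb, 1.0)
mean = p_mix @ ys
print(p_mix @ (ys - mean) ** 2, mean * (1 - mean / n))           # mixture vs. binomial variance
```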
The Latent Rasch Regression Model vs. the Extended Beta-Binomial Model
If the binomial model is a special case of the Rasch latent regression model, what about
the extended beta-binomial regression model? Could it also be a special case of IRT? The answer
to this question is no. There are, however, similarities between the extended beta-binomial model
and the Rasch model.
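Recall that the Rasch latent regression model is

$$\pi_{ij} = \frac{\exp(x_i'\beta_j + \theta_i)}{1 + \exp(x_i'\beta_j + \theta_i)}$$

Taking the logit function of both sides gives

$$\mathrm{logit}(\pi_{ij}) = \ln\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = x_i'\beta_j + \theta_i$$

where $\theta_i$ is the (random) ability parameter, generally assumed to follow a normal distribution with zero mean and constant variance: $\theta_i \sim N(0, \sigma^2)$. Thus,

$$\mathrm{logit}(\pi_{ij}) \sim N(x_i'\beta_j, \sigma^2)$$

In contrast, the extended beta-binomial regression model assumes

$$\pi_i \sim B(\rho_i, \gamma) \quad \text{with} \quad \rho_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)} = \mathrm{logit}^{-1}(x_i'\beta)$$

or

$$\pi_i \sim B(\mathrm{logit}^{-1}(x_i'\beta), \gamma).$$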
Clearly, the two models are similar in that both make distributional assumptions about the Bernoulli parameter $\pi_{ij}$. They differ in how those assumptions are specified. Because $\pi_{ij}$ is bounded (i.e., $0 \le \pi_{ij} \le 1$), Rasch regression transforms $\pi_{ij}$ into $\mathrm{logit}(\pi_{ij})$ in order to use the unbounded normal distribution, whereas extended beta-binomial regression lets $\pi_i$ follow the bounded beta distribution directly. By making such assumptions, both models allow conditional heterogeneity across units. They differ, however, in terms of heterogeneity across items. The Rasch model, by allowing $\beta_j$ to vary across items, directly introduces within-unit heterogeneity. The extended beta-binomial model has no such mechanism, although, as we have shown in the previous section, it is flexible enough to accommodate within-unit heterogeneity. It is noteworthy that while both models assume that items are independent given each individual's Bernoulli probabilities, by virtue of between-unit heterogeneity both can produce positive inter-item correlations in the data. In this regard, the extended beta-binomial model is more flexible, in that a negative value of the parameter $\gamma$ can produce negative inter-item correlations as well. Finally, although both models subsume the binomial model, neither is a special case of the other.
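The claim about inter-item correlations can be made concrete. For the extended beta-binomial, matching the variance formula to the variance of a sum of $n$ exchangeable binary items, $n\rho(1-\rho)[1 + (n-1)\phi]$, gives a pairwise correlation $\phi = \gamma/(1+\gamma)$, which is negative whenever $\gamma < 0$. For the Rasch model, two items are independent given $\theta_i$, so for fixed $x_i$ their covariance equals $\mathrm{Cov}(\pi_{ij}, \pi_{ik})$ taken over $\theta_i$; since both probabilities increase with $\theta_i$, it cannot be negative. The Python sketch below computes both quantities; the function names, the use of the Appendix 2 sign convention $\alpha_j + x_i'\beta + \theta_i$, and the Gauss-Hermite integration are our own illustrative choices.

```python
import numpy as np

def ebb_item_corr(gamma):
    """Pairwise inter-item correlation implied by the extended beta-binomial."""
    return gamma / (1.0 + gamma)

def rasch_item_corr(alpha_j, alpha_k, xb, sigma, n_nodes=40):
    """Correlation between items j and k under logit(pi_ij) = alpha_j + xb + theta_i,
    theta_i ~ N(0, sigma^2), for a fixed value xb of x_i'beta."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = weights / weights.sum()                    # N(0, 1) quadrature weights
    theta = sigma * nodes
    pj = 1.0 / (1.0 + np.exp(-(alpha_j + xb + theta)))
    pk = 1.0 / (1.0 + np.exp(-(alpha_k + xb + theta)))
    mj, mk = w @ pj, w @ pk                        # marginal Pr(X_ij = 1), Pr(X_ik = 1)
    cov = w @ (pj * pk) - mj * mk                  # items are independent given theta_i
    return cov / np.sqrt(mj * (1 - mj) * mk * (1 - mk))

print(ebb_item_corr(0.5), ebb_item_corr(-0.1))    # positive, then negative
print(rasch_item_corr(-0.6, 0.6, 0.0, 1.0))       # positive whenever sigma > 0
```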
Table 1 summarizes the similarities and differences between the two models:
(Table 1 about here)
In the section that follows, we conduct Monte Carlo simulations to compare the Rasch model and the extended beta-binomial regression model.³
Monte-Carlo Simulation Results
We conduct Monte-Carlo simulations to compare the performance of the extended beta-binomial regression model and the latent Rasch regression model. Because event counts can be generated from either process, we simulate data from both of them and fit each model to the data. In practice, the true data generating process is usually unknown. We expected that each model would perform better when applied to data generated from its own process, but we hoped our results could shed light on which model performs better when it is wrongly applied to data generated from a different process. All the simulations reported here involve two individual-specific covariates. We vary the sample size (N = 200/600/1000) and the number of items (n = 5/10) to ensure the robustness of our results. Details of the simulations are described in Appendix 2.⁴

³ We do not discuss two-parameter IRT models in this draft. Monte-Carlo simulations involving two-parameter IRT models will be conducted and reported in a future version of this paper.
Extended Beta-Binomial Regression DGP
In the first simulation, we generate data from the extended beta-binomial regression DGP. The estimation results are shown in Table 2. As expected, the estimates of the beta-binomial regression model show no detectable bias, and their RMSEs shrink as the sample size and the number of items increase. In contrast, fitting the wrong (Rasch) model produces substantial biases away from zero (estimates larger in absolute value than the true parameters) and larger RMSEs. It seems that the Rasch model, with its specific form of between-unit heterogeneity, cannot adequately capture heterogeneity of the beta-binomial type.
(Table 2 about here)
Latent Rasch Regression DGP
Table 3 shows the estimation results of our second simulation, based on the Rasch DGP. Not surprisingly, the Rasch model performs better than the extended beta-binomial model, which produces estimates biased toward zero and is relatively inefficient. The RMSEs associated with the wrong model, however, are not as large as in the first simulation. The reason, according to our detailed analysis, is that the beta-binomial regression estimates, although generally smaller in absolute value than the true parameters, are more narrowly distributed, which keeps the RMSE small.

⁴ We also generated data from the binomial process and fit them to both Rasch regression and extended beta-binomial regression. Not surprisingly, both models perform well. We do not provide the simulation/estimation details here in order to save space.
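The argument rests on the decomposition $\mathrm{RMSE}^2 = \mathrm{bias}^2 + \mathrm{variance}$ of the Monte-Carlo estimates. As a rough illustration, the Python lines below back out the spread implied by the rounded entries of Table 3 (10 items, N = 200, $\beta_1$). Because the table values are rounded to three decimals, the implied standard deviations are only approximate, and the identity holds exactly only when the RMSE and the variance use the same denominator; `implied_sd` is a hypothetical helper.

```python
import math

def implied_sd(mean_est, rmse, true_value):
    """Spread of the Monte-Carlo estimates implied by RMSE^2 = bias^2 + variance."""
    bias = mean_est - true_value
    return math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))

# Table 3, 10 items, N = 200, coefficient beta_1 (true value -0.2)
sd_bb = implied_sd(-0.147, 0.069, -0.2)      # extended beta-binomial (wrong model)
sd_rasch = implied_sd(-0.202, 0.072, -0.2)   # latent Rasch regression (true model)
print(round(sd_bb, 3), round(sd_rasch, 3))   # → 0.044 0.072
```

Most of the beta-binomial RMSE is bias, while nearly all of the Rasch RMSE is sampling spread, which is the pattern described in the paragraph above.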
(Table 3 about here)
From our Monte-Carlo simulations, we reach the following conclusions. First, both beta-binomial regression and Rasch regression deal rather accurately with data generated from the binomial DGP. This is not surprising because the binomial is subsumed under both models. Second, both beta-binomial regression and Rasch regression estimate parameters rather accurately when applied to data generated from their own DGPs. Third, both models can produce biased and inefficient estimates when applied to data generated from the other model's process, unless that process happens to collapse to the binomial model. Fourth, the problems caused by fitting the wrong model to the data are more severe for Rasch regression than for extended beta-binomial regression.
Therefore, using IRT models for event count data is not always a good strategy. The
binomial model is a special case of IRT but the beta-binomial is not. If we don’t know what the
true DGP is, as is usually the case, applying event count models is a safer choice than applying
IRT models.
Concluding Remarks
Event counts are widely used in survey research. Despite the efforts by King and Palmquist, the nature of the processes that generate event counts and their implications for statistical inference are still not well understood. In this paper, we use survey research as a context in which to examine the data generating processes of event counts. By introducing the beta-Bernoulli mixture distribution, we relaxed a restrictive assumption required for the binomial distribution to be a valid model for event counts, namely, that the binary trials that comprise an event count must have identical Bernoulli probabilities. By relaxing this critical assumption, we have generalized the binomial distribution to the Poisson-binomial distribution and allowed for more flexibility in modeling survey responses. On the basis of the less restrictive assumption, we examined the adequacy of the binomial regression model and the extended beta-binomial regression model. Our findings show that the extended beta-binomial regression model is adequate for data generated with both within-unit and between-unit heterogeneity. This makes the model more realistic than previously understood.
We also investigated IRT models as an alternative to event count models. IRT models
seem to have an advantage over event count models because they explicitly model within-unit
heterogeneity by including parameters representing item-specific properties. Our findings,
surprisingly, show that event count models depict data generated from IRT models better than
IRT models depict data generated from event count models. Therefore, if we don’t know what
the true data generating process is, applying event count models is a more advisable choice than
applying IRT models.
Appendix 1: An Exposition of the Poisson-Binomial Distribution
The Poisson-binomial distribution can be written

$$f_n(x; \pi) = \sum_{A \in F_x} \prod_{j \in A} \pi_j \prod_{j \in A^c} (1 - \pi_j)$$

where $A$ is an ordered $x$-tuple of the elements in the set $\{1, \ldots, n\}$ arranged in ascending order, $F_x$ is the set of all such $x$-tuples, and $A^c$ is the $(n - x)$-tuple consisting of whatever elements of $\{1, \ldots, n\}$ are not in $A$ (also arranged in ascending order). The vector $\pi$, of course, contains the probabilities of each of the $n$ Bernoulli variables obtaining a value of 1.
While this seems to be the most concise way of representing the probability mass function in question, it is cumbersome to have to consider the various vectors and sets involved in what is otherwise an extremely intuitive distribution. For that reason, we clarify, with a few simple examples, how the Poisson-binomial distribution works.
Consider the case when n = 2, that is, when we have two Bernoulli variables, each with its own $\pi_j$. The values of the Poisson-binomial distribution are

$$\begin{aligned}
f_2(0; \pi) &= (1 - \pi_1)(1 - \pi_2) \\
f_2(1; \pi) &= \pi_1(1 - \pi_2) + (1 - \pi_1)\pi_2 \\
f_2(2; \pi) &= \pi_1\pi_2
\end{aligned}$$
Clearly, the distribution itself is more intuitive than the complicated mathematical
notation used to describe it suggests at first glance. A step to n = 3 is an easy and helpful
illustration.
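$$\begin{aligned}
f_3(0; \pi) &= (1 - \pi_1)(1 - \pi_2)(1 - \pi_3) \\
f_3(1; \pi) &= \pi_1(1 - \pi_2)(1 - \pi_3) + (1 - \pi_1)\pi_2(1 - \pi_3) + (1 - \pi_1)(1 - \pi_2)\pi_3 \\
f_3(2; \pi) &= \pi_1\pi_2(1 - \pi_3) + \pi_1(1 - \pi_2)\pi_3 + (1 - \pi_1)\pi_2\pi_3 \\
f_3(3; \pi) &= \pi_1\pi_2\pi_3
\end{aligned}$$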
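Even a generalization for n Bernoulli variables is relatively easy to understand when thus laid out:

$$\begin{aligned}
f_n(0; \pi) &= (1 - \pi_1)(1 - \pi_2)\cdots(1 - \pi_n) \\
f_n(1; \pi) &= \pi_1(1 - \pi_2)\cdots(1 - \pi_n) + (1 - \pi_1)\pi_2(1 - \pi_3)\cdots(1 - \pi_n) + \cdots + (1 - \pi_1)\cdots(1 - \pi_{n-1})\pi_n \\
f_n(2; \pi) &= \{\pi_1\pi_2(1 - \pi_3)\cdots(1 - \pi_n) + \pi_1(1 - \pi_2)\pi_3(1 - \pi_4)\cdots(1 - \pi_n) + \cdots + \pi_1(1 - \pi_2)\cdots(1 - \pi_{n-1})\pi_n\} \\
&\quad + \{(1 - \pi_1)\pi_2\pi_3(1 - \pi_4)\cdots(1 - \pi_n) + \cdots + (1 - \pi_1)\pi_2(1 - \pi_3)\cdots(1 - \pi_{n-1})\pi_n\} \\
&\quad + \cdots + \{(1 - \pi_1)\cdots(1 - \pi_{n-2})\pi_{n-1}\pi_n\} \\
&\;\;\vdots \\
f_n(n; \pi) &= \pi_1\pi_2\pi_3\cdots\pi_n
\end{aligned}$$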
The generalized case quickly becomes unwieldy. Certainly, the concise expression using
summations and products is neater and, once clearly understood, preferable. However,
something like the above examples might be useful in clarifying the relatively simple concept
behind the Poisson-binomial distribution.
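The subset-sum definition can be coded directly, and a short recursion avoids enumerating all $\binom{n}{x}$ subsets when $n$ is large. In the Python sketch below (our own illustration; the function names are hypothetical), the first function follows the definition term by term, and the second adds one Bernoulli trial at a time, using $f_n(x) = f_{n-1}(x)(1 - \pi_n) + f_{n-1}(x - 1)\pi_n$. The usage lines check the two against each other for an $n = 3$ case and against the binomial when all $\pi_j$ are equal.

```python
import math
from itertools import combinations
import numpy as np

def poisson_binomial_enum(x, pis):
    """f_n(x; pi): sum over ascending x-tuples A of prod_{j in A} pi_j * prod_{j not in A} (1 - pi_j)."""
    n = len(pis)
    total = 0.0
    for A in combinations(range(n), x):   # combinations() yields exactly the set F_x
        in_A = set(A)
        total += math.prod(pis[j] if j in in_A else 1 - pis[j] for j in range(n))
    return total

def poisson_binomial_pmf(pis):
    """[f_n(0), ..., f_n(n)], built by adding one Bernoulli trial at a time."""
    f = np.array([1.0])                   # a sum of zero trials equals 0 with probability 1
    for p in pis:
        f_new = np.zeros(len(f) + 1)
        f_new[:-1] += f * (1 - p)         # new trial = 0: count unchanged
        f_new[1:] += f * p                # new trial = 1: count rises by one
        f = f_new
    return f

pis = [0.2, 0.5, 0.9]
print(poisson_binomial_pmf(pis))
print([poisson_binomial_enum(x, pis) for x in range(4)])
print(np.allclose(poisson_binomial_pmf([0.5] * 4), np.array([1, 4, 6, 4, 1]) / 16))   # → True
```

The enumeration costs $2^n$ products in total, whereas the recursion needs only on the order of $n^2$ operations.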
Appendix 2. Simulation and Estimation Procedures
All the data generating processes in the simulations reported here involve two individual-specific covariates, $x_{1i}$ and $x_{2i}$, which were generated from $N(0, 1)$. The coefficient parameters are $\beta_0 = 0$, $\beta_1 = -0.2$, and $\beta_2 = 0.3$, giving the linear form $x_i'\beta = -0.2x_{1i} + 0.3x_{2i}$.
For the extended beta-binomial DGP, $\gamma = 2$. For the IRT (Rasch) DGP, the difficulty parameters are $\alpha = [-1.6449, -0.5978, 0, 0.5978, 1.6449]$ for the 5-item model, and $\alpha = [-1.6449, -1.0364, -0.6745, -0.3853, -0.1257, 0.1257, 0.3853, 0.6745, 1.0364, 1.6449]$ for the 10-item model.
Data were generated for three different numbers of individuals (i.e., sample size),
N=200/600/1000, and two different numbers of items, n=5/10.
Simulation/Estimation Based on the Extended Beta-Binomial DGP.
1. Compute $\rho_i = \exp(x_i'\beta) / [1 + \exp(x_i'\beta)]$.
2. Compute $a_i = \rho_i / \gamma$ and $b_i = (1 - \rho_i) / \gamma$.
3. Generate $\pi_i \sim B(a_i, b_i)$. This is equivalent to $\pi_i \sim B(\mathrm{logit}^{-1}(x_i'\beta), \gamma)$.
4. Generate dichotomous values of $y_{ij}$ with probability $\Pr(y_{ij} = 1) = \pi_i$ for all $i, j$.
5. Compute $y_i = \sum_{j=1}^{n} y_{ij}$.
6. Estimate the $\beta$'s and $\gamma$ of the beta-binomial regression model by MLE using MATLAB's Optimization Toolbox. For the maximization, the modified BFGS method was used.
7. Estimate the $\alpha$'s and $\beta$'s of the latent regression Rasch model using the IRTm toolbox developed by Braeken and Tuerlinckx. The toolbox follows the marginal maximum likelihood approach. A detailed explanation is found in Braeken and Tuerlinckx (2009).
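For readers without MATLAB, steps 1 to 5 and an MLE in the spirit of step 6 can be sketched in Python with NumPy and SciPy. This is our illustration, not the code used for the paper: it substitutes SciPy's standard BFGS (`scipy.optimize.minimize`) for MATLAB's modified BFGS, it estimates $\gamma$ on the log scale and therefore assumes $\gamma > 0$ (as in the simulation, where $\gamma = 2$), it clips $\rho_i$ away from 0 and 1 for numerical safety, and the function names are hypothetical. Step 7 is not covered here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def simulate_ebb(N, n, beta, gamma, rng):
    """Steps 1-5 of the extended beta-binomial DGP (requires gamma > 0)."""
    X = np.column_stack([np.ones(N), rng.standard_normal((N, 2))])  # intercept, x_1i, x_2i
    rho = expit(X @ beta)                                # step 1
    a, b = rho / gamma, (1 - rho) / gamma                # step 2
    pi = rng.beta(a, b)                                  # step 3
    y_items = rng.random((N, n)) < pi[:, None]           # step 4: y_ij ~ Bernoulli(pi_i)
    return X, y_items.sum(axis=1)                        # step 5: y_i

def ebb_negloglik(params, X, y, n):
    """Negative extended beta-binomial log-likelihood; params = (beta, log gamma)."""
    beta, gamma = params[:-1], np.exp(params[-1])        # log scale: assumes gamma > 0
    rho = np.clip(expit(X @ beta), 1e-12, 1 - 1e-12)[:, None]
    j = np.arange(n)[None, :]
    y_col = y[:, None]
    ll = (gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1)        # log binomial coefficient
          + np.where(j < y_col, np.log(rho + gamma * j), 0.0).sum(axis=1)
          + np.where(j < n - y_col, np.log(1 - rho + gamma * j), 0.0).sum(axis=1)
          - np.log(1 + gamma * np.arange(n)).sum())
    return -ll.sum()

def fit_ebb(X, y, n):
    start = np.zeros(X.shape[1] + 1)                     # beta = 0, gamma = 1
    res = minimize(ebb_negloglik, start, args=(X, y, n), method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])

rng = np.random.default_rng(2014)
beta_true = np.array([0.0, -0.2, 0.3])                   # beta_0, beta_1, beta_2 (Appendix 2)
X, y = simulate_ebb(N=1000, n=10, beta=beta_true, gamma=2.0, rng=rng)
beta_hat, gamma_hat = fit_ebb(X, y, n=10)
print(beta_hat, gamma_hat)
```

Repeating the last four lines over many seeds and averaging `beta_hat` would give Monte-Carlo means and RMSEs of the kind reported in Table 2, though the numbers from this sketch need not match the paper's exactly.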
Simulation/Estimation Based on the Latent Regression (Rasch) DGP.
1. Compute $\pi_{ij} = \exp(\alpha_j + x_i'\beta + \theta_i) / [1 + \exp(\alpha_j + x_i'\beta + \theta_i)]$, where $\theta_i \sim N(0, 1)$. This is equivalent to $\mathrm{logit}(\pi_{ij}) \sim N(\alpha_j + x_i'\beta, 1)$.
2. Generate dichotomous values of $y_{ij}$ with probability $\Pr(y_{ij} = 1) = \pi_{ij}$ for all $i, j$.
3. Compute $y_i = \sum_{j=1}^{n} y_{ij}$.
4. Estimate the $\beta$'s and $\gamma$ of the beta-binomial regression model by MLE using MATLAB's Optimization Toolbox. For the maximization, the modified BFGS method was used.
5. Estimate the $\alpha$'s and $\beta$'s of the latent regression Rasch model using the IRTm toolbox developed by Braeken and Tuerlinckx. The toolbox follows the marginal maximum likelihood approach. A detailed explanation is found in Braeken and Tuerlinckx (2009).
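The Rasch data generating process and a marginal maximum likelihood fit can be sketched the same way. Again this is our illustration rather than the IRTm toolbox: it integrates over $\theta_i$ with Gauss-Hermite quadrature (21 nodes, an arbitrary choice), also estimates the random-effect standard deviation on the log scale, and omits the intercept from $x_i$ because the item parameters $\alpha_j$ already absorb it ($\beta_0 = 0$ in the design above). The function names are hypothetical, and SciPy's BFGS again stands in for the optimizer actually used.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, logsumexp

def simulate_rasch(N, alpha, beta, rng, sigma=1.0):
    """Steps 1-3 of the latent Rasch regression DGP."""
    X = rng.standard_normal((N, len(beta)))              # x_1i, x_2i ~ N(0, 1); no intercept
    theta = sigma * rng.standard_normal(N)               # theta_i ~ N(0, sigma^2)
    pi = expit(alpha[None, :] + (X @ beta + theta)[:, None])   # step 1: pi_ij
    Y = (rng.random(pi.shape) < pi).astype(float)        # step 2: y_ij ~ Bernoulli(pi_ij)
    return X, Y, Y.sum(axis=1)                           # step 3: y_i

def rasch_negloglik(params, X, Y, nodes, log_w):
    """Negative marginal log-likelihood; params = (alpha, beta, log sigma)."""
    n, p = Y.shape[1], X.shape[1]
    alpha, beta, sigma = params[:n], params[n:n + p], np.exp(params[-1])
    # eta[i, j, k] = alpha_j + x_i'beta + sigma * node_k
    eta = alpha[None, :, None] + (X @ beta)[:, None, None] + sigma * nodes[None, None, :]
    # Log-likelihood of person i's responses at node k: sum_j [y_ij * eta - log(1 + e^eta)]
    ll_nodes = (Y[:, :, None] * eta - np.logaddexp(0.0, eta)).sum(axis=1)
    return -logsumexp(ll_nodes + log_w[None, :], axis=1).sum()

def fit_rasch(X, Y, n_nodes=21):
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    log_w = np.log(weights / weights.sum())              # weights for an N(0, 1) average
    n, p = Y.shape[1], X.shape[1]
    res = minimize(rasch_negloglik, np.zeros(n + p + 1), args=(X, Y, nodes, log_w), method="BFGS")
    return res.x[:n], res.x[n:n + p], np.exp(res.x[-1])

rng = np.random.default_rng(2014)
alpha = np.array([-1.6449, -0.5978, 0.0, 0.5978, 1.6449])   # 5-item difficulties (Appendix 2)
X, Y, y = simulate_rasch(1000, alpha, np.array([-0.2, 0.3]), rng)
alpha_hat, beta_hat, sigma_hat = fit_rasch(X, Y)
print(beta_hat, sigma_hat)
```

The counts `y` returned in step 3 are what an event count model would use, so the extended beta-binomial sketch's `fit_ebb` can be applied to them (with an intercept column added to `X`) to mimic step 4.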
References
Brady, Henry E. 1999. “Political Participation.” In Measures of Political Attitudes, eds. John Robinson et al. San Diego, CA: Academic Press.
Braeken, J., and F. Tuerlinckx. 2009. “Investigating Latent Constructs with Item Response Models: A MATLAB IRTm Toolbox.” Behavior Research Methods 41(4): 1127-1137.
Butler, Ken, and Michael Stephens. 1993. “The Distribution of a Sum of Binomial Random Variables.” Technical Report No. 467. Prepared under Contract N00014-92-J-1264 (NR-042-267) for the Office of Naval Research.
Chen, Louis H. Y. 1974. “On the Convergence of Poisson Binomial to Poisson Distributions.” The Annals of Probability 2(1): 178-180.
Chen, Louis H. Y. 1975. “Poisson Approximation for Dependent Trials.” The Annals of Probability 3(3): 534-545.
De Boeck, Paul, and Mark Wilson. 2004. Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach. New York: Springer-Verlag.
Edwards, A. W. F. 1960. “The Meaning of Binomial Distribution.” Nature 186(25 June 1960): 1074.
Feller, William. 1968. An Introduction to Probability Theory and Its Applications. New York: John Wiley & Sons.
Gillion, Daniel Q. 2009. “Redefining Political Participation through Item Response Theory.” Paper presented at the 2009 Annual Meeting of the American Political Science Association, Toronto, ON, Canada, September 3-6.
Hodges, J. L., Jr. and Lucien Le Cam. 1960. “The Poisson Approximation to the Poisson Binomial Distribution.” The Annals of Mathematical Statistics 31(3):737-740.
King, Gary. 1998. Unifying Political Methodology: The Likelihood Theory of Statistical Inference. Ann Arbor: The University of Michigan Press.
Le Cam, Lucien. 1960. “An Approximation Theorem for the Poisson Binomial Distribution.” Pacific Journal of Mathematics 10(4): 1181-1197.
Palmquist, Bradley. 1997. “Heterogeneity and Dispersion in the Beta-Binomial Model.” Paper presented at the 1997 Annual Meeting of the American Political Science Association, Washington, D.C., August 27-31.
Palmquist, Bradley. 1998. “The Extended Beta-Binomial Model in Political Analysis.” Paper presented at the 1998 Annual Meeting of the Southern Political Science Association.
Pepper, Echo D. 1929. “On Poisson’s Series of Trials.” Mathematische Annalen 101(1): 375-380.
Prentice, R. L. 1986. “Binary Regression Using an Extended Beta-Binomial Distribution, With Discussion of Correlation Induced by Covariate Measurement Errors.” Journal of the American Statistical Association 81(394): 321-327.
Rijmen, Frank, Francis Tuerlinckx, Paul De Boeck, and Peter Kuppens. 2003. “A Nonlinear Mixed Model Framework for Item Response Theory.” Psychological Methods 8(2): 185-205.
Schrodt, Philip A. 2007. “Inductive Event Data Scaling Using Item Response Theory.” Paper presented at the 2007 Summer Meeting of the Society for Political Methodology, Pennsylvania State University, July 18-20.
Wang, Y. H. 1993. “On the Number of Successes in Independent Trials.” Statistica Sinica 3: 295-312.
Table 1. Similarities and Differences between Extended Beta-Binomial and Rasch Regressions

| | Extended Beta-Binomial | Rasch Latent Regression |
|---|---|---|
| Specification of $\pi_{ij}$ | $\pi_i \sim B(\mathrm{logit}^{-1}(x_i'\beta), \gamma)$ | $\mathrm{logit}(\pi_{ij}) \sim N(x_i'\beta_j, \sigma^2)$ |
| Heterogeneity across Units | Yes | Yes |
| Heterogeneity across Items | Flexible | Yes |
| Inter-Item Correlation in Data | Yes (+/−) | Yes (+) |
| Subsuming Binomial Model? | Yes | Yes |
| Special Case of the Other? | No | No |
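Table 2. Monte Carlo Simulation: Beta-Binomial DGP

| Estimator | Statistic | N | 10 Items: β1 | 10 Items: β2 | 5 Items: β1 | 5 Items: β2 |
|---|---|---|---|---|---|---|
| Beta-Binomial Regression | Mean | 200 | -0.204 | 0.304 | -0.200 | 0.301 |
| Beta-Binomial Regression | Mean | 600 | -0.202 | 0.297 | -0.204 | 0.300 |
| Beta-Binomial Regression | Mean | 1000 | -0.199 | 0.302 | -0.201 | 0.298 |
| Beta-Binomial Regression | RMSE | 200 | 0.074 | 0.071 | 0.082 | 0.080 |
| Beta-Binomial Regression | RMSE | 600 | 0.040 | 0.042 | 0.046 | 0.047 |
| Beta-Binomial Regression | RMSE | 1000 | 0.032 | 0.032 | 0.034 | 0.035 |
| Latent Rasch Regression | Mean | 200 | -0.313 | 0.463 | -0.303 | 0.456 |
| Latent Rasch Regression | Mean | 600 | -0.308 | 0.453 | -0.308 | 0.454 |
| Latent Rasch Regression | Mean | 1000 | -0.304 | 0.460 | -0.303 | 0.451 |
| Latent Rasch Regression | RMSE | 200 | 0.147 | 0.177 | 0.149 | 0.179 |
| Latent Rasch Regression | RMSE | 600 | 0.114 | 0.154 | 0.116 | 0.156 |
| Latent Rasch Regression | RMSE | 1000 | 0.105 | 0.160 | 0.106 | 0.151 |

Note: True parameter values: β1 = -0.2, β2 = 0.3. N is the sample size (number of individuals).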
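Table 3. Monte Carlo Simulation: Item Response Theory DGP

| Estimator | Statistic | N | 10 Items: β1 | 10 Items: β2 | 5 Items: β1 | 5 Items: β2 |
|---|---|---|---|---|---|---|
| Beta-Binomial Regression | Mean | 200 | -0.147 | 0.221 | -0.140 | 0.210 |
| Beta-Binomial Regression | Mean | 600 | -0.147 | 0.221 | -0.142 | 0.210 |
| Beta-Binomial Regression | Mean | 1000 | -0.146 | 0.222 | -0.140 | 0.209 |
| Beta-Binomial Regression | RMSE | 200 | 0.069 | 0.088 | 0.078 | 0.099 |
| Beta-Binomial Regression | RMSE | 600 | 0.056 | 0.080 | 0.062 | 0.091 |
| Beta-Binomial Regression | RMSE | 1000 | 0.054 | 0.078 | 0.061 | 0.091 |
| Latent Rasch Regression | Mean | 200 | -0.202 | 0.302 | -0.200 | 0.299 |
| Latent Rasch Regression | Mean | 600 | -0.201 | 0.302 | -0.203 | 0.298 |
| Latent Rasch Regression | Mean | 1000 | -0.200 | 0.302 | -0.200 | 0.298 |
| Latent Rasch Regression | RMSE | 200 | 0.072 | 0.074 | 0.088 | 0.085 |
| Latent Rasch Regression | RMSE | 600 | 0.043 | 0.041 | 0.050 | 0.048 |
| Latent Rasch Regression | RMSE | 1000 | 0.031 | 0.032 | 0.039 | 0.037 |

Note: True parameter values: β1 = -0.2, β2 = 0.3. N is the sample size (number of individuals).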
FIGURE 1: DATA-GENERATING PROCESSES
Figure 1a: Binomial Distribution
Figure 1b: Beta-Binomial Distribution
Figure 1c: Palmquist’s Alternative Data Generation Process
Figure 1d: Poisson-Binomial Distribution
Figure 1e: Beta Generated Within- and Between-Unit Heterogeneity

FIGURE 2: ALTERNATIVE DATA-GENERATING PROCESSES