Community Detection and Political Polarization (Daniel Argyle)

Networks, Community Detection, and Political Polarization∗

Daniel Argyle

University of California, Santa Barbara

[email protected]

February 14, 2013

Abstract

I propose a Bayesian algorithm that identifies latent groups, known as communities, in networks. This algorithm is more general than previous community detection techniques introduced in economics and incorporates insights from Bayesian statistics that allow it to converge more quickly and be implemented on relatively large networks. I demonstrate these attributes on randomly generated networks and provide an application to the United States Congress, where the proposed model and parameters have an intuitive interpretation.

JEL Classification Codes: C12, C49, C52, D72

Keywords: networks, community structures, political networks, Bayesian methods

∗This is a working draft. Please do not cite or distribute without permission.


1 Introduction

Many areas of economics—including basic questions of public good provision, externalities, and

trade—are built on theories of repeated interactions of individuals in groups. Unfortunately, in

many settings an agent’s actions can be observed but group membership is unobserved. It may

also be the case that agents associate themselves with a given label, but their chosen designation

does not accurately reflect their revealed choices. Identifying actual group membership is a key

empirical concern when addressing questions that involve interaction within groups (e.g. peer

effects) and interactions across groups (e.g. externalities). In such scenarios it may be feasible to

estimate group membership based on other observable data. One method, known as community

detection, uses network links to find sets of agents that interact more frequently and is rapidly

growing in prominence. This paper contributes a Bayesian community detection technique that is

based on a more general framework than existing methods in economics and utilizes insights from

Bayesian statistics that ensure that the algorithm is feasible to estimate.

Copic, Jackson and Kirman (2009) introduce community detection in the economics litera-

ture.1 Like other similar work (see Newman (2004)), Copic et al. (2009) define communities as

groups of nodes in a network where the probability of a link forming between two nodes in the

same community exceeds the probability that a link forms between two nodes in different com-

munities. Copic et al. (2009) then establish a function that yields the likelihood of observing a

network of interactions given the probabilities of link formation and a community structure (a par-

tition of the network nodes into individual communities) and optimize it via an iterative process

to find the maximum likelihood estimates of the probabilities and the optimal community struc-

ture. Subsequent work by Chapman and Zhang (2010) offers a basic Bayesian implementation of

this framework suggested by Copic et al. (2009), which builds on the same likelihood function

but estimates the parameters via a Markov-Chain Monte Carlo Algorithm. I make two principal

1 There is a robust literature about community detection driven by research in physics, computer science, statistics and other fields. Leskovec, Lang and Mahoney (2010) and Santo and Fortunato (2010) provide brief introductions to several of the most common community detection methods along with performance benchmarks; however, it must be emphasized that this is a widely varied and rapidly growing literature. Mørup and Schmidt (2012) is a good example of a state-of-the-art Bayesian algorithm.


contributions to this previous work. First, I generalize the likelihood function found in Copic et al.

(2009) by loosening the restrictive assumption that the probability of link formation be identical

across all groups. Not only is the generalized likelihood function more plausible, because it allows

groups of individuals to behave differently, I demonstrate that it results in more accurate estimation

through Bayesian Model Comparison. Second, I propose a new algorithm to estimate the more

general model. Chapman and Zhang (2010) propose an algorithm that relies on moving one node

to a new community and evaluating which partition is more likely given the previous estimates

of the probabilities. This process is very slow for large networks and when applied to the gen-

eralized model often will not converge. I propose transitioning by randomly selecting two nodes

and then merging their communities (if they lie in different groups) or splitting the community

through random assignment (if the nodes are in the same community); this is an insight based on

Jain and Neal (2004). Additionally, unlike similar Bayesian algorithms from other fields, I assume

an uninformative prior over the set of possible community groups.

Despite its potential usefulness, applications of community detection in economics remain

scarce. Copic et al. (2009) include an application to citation networks, where the community

detection algorithm finds communities of journals that align with subfields in economics. Sev-

eral recent papers use community detection algorithms to identify groups of closely related finan-

cial institutions (Wetherilt, Zimmerman and Soramaki, 2010; Chapman and Zhang, 2010; Bech,

Bergstrom, Garratt and Rosvall, 2011). To demonstrate my algorithm, I contribute an application

of community detection in congressional voting that provides insights into changes in strategic

voting patterns in the United States Congress.

I introduce notation in Section 2. The specific Bayesian algorithm I develop is described in

Section 3, and Section 4 discusses simulation results demonstrating convergence of this technique

under a variety of parametric specifications. Section 5 contains the results of the algorithm when

applied to data from the United States Senate and suggests that Congressional voting has become

more partisan in recent years. Section 6 concludes.


2 Networks and Community Detection

A network consists of a set of nodes $N = \{1, 2, \ldots, n\}$ and an $n \times n$ adjacency matrix, $A$, of non-negative integers representing links between the nodes. An individual element of the adjacency matrix, $A_{ij}$, indicates the number of times agent $i$ and agent $j$ interact.2 A community structure $z$ is a vector of length $n$ that contains a community assignment for each node, i.e. $z_i$ is the community assignment of node $i$. Note that this implies that a given node can only belong to one community. The total number of communities is denoted by $L$ and will be indexed by $\ell$ so that $z(\ell)$ is the set of nodes assigned to a given community $\ell$. For a given community $\ell \in \{1, 2, \ldots, L\}$, the probability of forming a link to a node in the same community is denoted $p^{in}_\ell$ and the probability of forming a link to a node in any other community is denoted $p^{out}_\ell$.3 This structure is more general than previous work by Copic et al. (2009) and Chapman and Zhang (2010) because both assume identical probabilities for link formation across all communities; that model is nested as a special case.

A community structure requires the assumption that the probability of forming links within a community exceeds the probability of forming links across communities. This implies the following condition must hold

$$0 \leq p^{out}_\ell \leq p^{in}_\ell \leq 1 \tag{1}$$

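for all $\ell \in \{1, 2, \ldots, L\}$. The number of interactions between a given node $i$ in community $\ell$ and another node $j$ is assumed to follow a binomial distribution, so that the link between the nodes is distributed

$$p(A_{ij} \mid p^{in}_\ell, p^{out}_\ell, z, C_{ij}) = \begin{cases} \binom{C_{ij}}{A_{ij}} (p^{in}_\ell)^{A_{ij}} (1 - p^{in}_\ell)^{C_{ij} - A_{ij}}, & \text{if } z_i = z_j \\ \binom{C_{ij}}{A_{ij}} (p^{out}_\ell)^{A_{ij}} (1 - p^{out}_\ell)^{C_{ij} - A_{ij}}, & \text{if } z_i \neq z_j \end{cases} \tag{2}$$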

where the correct case is determined by node j belonging to the same community as node

2 While it is often the case that the adjacency matrix is symmetric (i.e. that $A_{ij} = A_{ji}$), this does not have to be the case. For example, in a social network defined by email communication, an email from person $i$ to person $j$ does not ensure that person $j$ responds, a situation with which most people are familiar.

3 I assume that the probability of forming a link outside of a community is the same regardless of the community to which the link is formed. While it is possible to posit a fully general model that allows $p^{out}_\ell$ to vary across groups (see Mørup and Schmidt (2012)), this can result in estimating more parameters than there are data points when there are many communities.


$i$ ($z_i = z_j$) or if the two nodes lie in different groups ($z_i \neq z_j$). The assumption of a binomial distribution requires additional information, namely the number of trials for which one can observe a successful outcome. This information is contained in an $n \times n$ capacity matrix $C$, where $C_{ij}$ contains the number of potential interactions between nodes $i$ and $j$ (note that $C_{ij} \geq A_{ij}$ for all $i \in N$ and $j \in N$).4 This assumption is shared with the original model in Copic et al. (2009) and is most suited to cases where the number of potential interactions is known; however, the binomial assumption is robust up to a reasonable estimate of $C_{ij}$, as is discussed in the citation network example in their work. For convenience, I will proceed assuming that $C$ is known. Combining and aggregating the binomial probabilities (and omitting the binomial coefficients, which are functions of only the data) I obtain the likelihood of observing a community structure given a network and probabilities:5

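$$L(z \mid \mathbf{p}^{in}_\ell, \mathbf{p}^{out}_\ell, A, C) = \prod_{i=1}^{n} \prod_{j \in z(\ell)} (p^{in}_\ell)^{A_{ij}} (1 - p^{in}_\ell)^{C_{ij} - A_{ij}} \prod_{j \notin z(\ell)} (p^{out}_\ell)^{A_{ij}} (1 - p^{out}_\ell)^{C_{ij} - A_{ij}} \tag{3}$$

where node $i$ lies in community $\ell$, $\mathbf{p}^{in}_\ell$ represents the set of all within community probabilities and $\mathbf{p}^{out}_\ell$ is the set of all across community probabilities.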
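As a minimal sketch of how the likelihood in (3) can be evaluated in practice, the following Python function computes its logarithm. The function name, the use of nested lists for $A$ and $C$, the dictionaries keyed by community label, and the decision to skip self-links ($i = j$) are illustrative assumptions rather than part of the model as stated; the probabilities are assumed to lie strictly between 0 and 1.

```python
import math

def log_likelihood(z, p_in, p_out, A, C):
    """Log of Equation (3), with binomial coefficients omitted.

    z     : list of community labels, one per node
    p_in  : dict mapping community label -> within-community probability
    p_out : dict mapping community label -> across-community probability
    A, C  : n x n nested lists (interaction counts and capacities)
    """
    n = len(z)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: self-links carry no information
            # Node i's community determines which probability applies
            p = p_in[z[i]] if z[i] == z[j] else p_out[z[i]]
            a, c = A[i][j], C[i][j]
            total += a * math.log(p) + (c - a) * math.log(1.0 - p)
    return total
```

Working in logs avoids numerical underflow when the product runs over many node pairs, and two candidate partitions can be compared by differencing their log-likelihoods.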

The Beta distribution is a continuous distribution on the interval $[0, 1]$ that can take a variety of shapes depending on the parameterization. In Bayesian analysis it is the natural conjugate prior of the Bernoulli, binomial, and geometric distributions and has the advantage of nesting the uniform distribution as a special case when both parameters are set to one. Consequently, I assume a Beta prior for the probabilities for all $\ell \in \{1, 2, \ldots, L\}$:

$$p^{in}_\ell \sim \mathrm{Beta}(\alpha^{in}, \beta^{in})$$

$$p^{out}_\ell \sim \mathrm{BetaInc}(\alpha^{out}, \beta^{out}, p^{in}_\ell)$$

4 Similarly to the adjacency matrix, the capacity matrix is frequently symmetric but is not constrained to be so.

5 Note that aggregating in this fashion requires assuming that interactions between individuals are independent conditional on community assignment. This is less restrictive than assuming that the probabilities are independent across all individuals (as is required in Copic et al. (2009) and Chapman and Zhang (2010)) and is based on the idea that, conditional on group assignment, link formation appears to happen at random; however, there are scenarios where this assumption may not be innocuous. This will be discussed further in Section 5.


where BetaInc represents a Beta distribution truncated to the interval $[0, p^{in}_\ell]$. The density of this distribution is

$$F(\theta) = \frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B_{p^{in}_\ell}(\alpha, \beta)} \tag{4}$$

where $B_{p^{in}_\ell}(\cdot)$ is the incomplete beta function. This prior assumes that the community definition assumption in Equation 1 is met. For convenience in notation, I assume that the prior parameters are the same for both distributions so that $\alpha^{in} = \alpha^{out} = \alpha$ and $\beta^{in} = \beta^{out} = \beta$. In addition to assuming prior distributions for the probabilities, I assume a discrete uniform prior for the community structures. The number of potential partitions of a set of size $n$ is a constant known as the Bell number, denoted $\omega_n$, which implies that the prior probability mass function is given by:6

$$f(z) = \frac{1}{\omega_n}$$

This represents an uninformative prior because it implies that all partitions of the nodes are equally

likely.7 Because this value is constant given the number of nodes in the network I omit it throughout

for clarity.

While it is feasible to estimate the parameters in (3) via direct maximum likelihood estimation

as in Copic et al. (2009), there are several reasons this may be undesirable. It is important to note

that it is very difficult to optimize over the space of potential community structures. As networks

become large the number of potential partitions grows faster than exponentially. For example, a

set of size 10 has 115,975 possible partitions while a set of size 25 has 4,638,590,332,229,999,353.

This makes it difficult, if not impossible, to calculate the likelihood for all possible partitions

for a network of any reasonable size. Further, the set of all possible partitions is discrete and

unordered, meaning that there is no information about direction of increase that is typically used

in optimization algorithms. Copic et al. (2009) propose using “pseudo-community structures”

which build an artificial community around a randomly selected node to limit the set of optimal

partitions. They then use the established structure as a basis for maximum likelihood estimation

6 The number of potential partitions of a set of size $n$ (the Bell number) is defined by the recursion formula $\omega_{j+1} = \sum_{k=0}^{j} \binom{j}{k} \omega_k$, with $\omega_0 = 1$.

7 Much of the work in machine learning and other fields assumes a Dirichlet Process (also known as a Chinese Restaurant Process) as the prior distribution for the partitions. This assumes that an individual is more likely to join a larger community than a smaller one and is dependent on an initial assumption for how many communities there are. I wish to avoid these assumptions because they do not necessarily fit all network structures.


of z, while optimization for pin and pout is done by grid search. Unfortunately, approximating

the network structure potentially excludes important information and relying on grid search for

estimation of the probabilities is inefficient. Because of this, this technique is only pursued for

networks of relatively small size (n < 50), limiting its usefulness for further experimentation.
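The growth in the number of partitions that motivates these restrictions can be checked directly from the Bell number recursion in footnote 6. The short Python sketch below is illustrative only; the function name `bell_numbers` is my own and exact integer arithmetic is used to avoid rounding in the larger values.

```python
from math import comb

def bell_numbers(n_max):
    """Return [w_0, ..., w_{n_max}] using w_{j+1} = sum_k C(j, k) * w_k."""
    w = [1]  # w_0 = 1: the empty set has exactly one partition
    for j in range(n_max):
        w.append(sum(comb(j, k) * w[k] for k in range(j + 1)))
    return w

bells = bell_numbers(25)
print(bells[10])  # 115975
print(bells[25])  # number of partitions of a 25-node network
```

Even at $n = 25$ the count has nineteen digits, so enumerating every partition is already out of reach for networks far smaller than those considered later in the paper.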

A Bayesian implementation offers several advantages.8 While the maximum likelihood pro-

cedure is difficult to implement in practice because of the huge number of potential community

structures, a Metropolis-Hastings algorithm is capable of reaching all possible community struc-

tures with non-zero probability without requiring listing all possible partitions or making simpli-

fying assumptions. Additionally, rather than yielding point estimates, a Bayesian implementation

results in posterior distributions for the probabilities (which allows for easy hypothesis testing) and

a posterior distribution for community structure. A posterior distribution over community struc-

tures is simply a list of community structures along with their frequency over a given number of

iterations of the algorithm. This can be a “degenerate” distribution of only one community struc-

ture that fits the network better than any other, but this is not necessarily the case. Indeed it may

be very useful to know which partitions tightly “fit” the data and which are less certain.

3 Bayesian Implementation

Given the likelihood function defined in (3), the posterior distribution for the set of within community probabilities $\mathbf{p}^{in}$ and across community probabilities $\mathbf{p}^{out}$ along with the optimal community structure $z$ given the observed network $A$ is:

$$f(z^*, \mathbf{p}^{in*}, \mathbf{p}^{out*} \mid A, C) \propto L(z, \mathbf{p}^{in}, \mathbf{p}^{out} \mid A, C) \, f(z, \mathbf{p}^{in}, \mathbf{p}^{out}) \tag{5}$$

where f (z,pin,pout) represents the prior distribution of the probabilities of link formation within

a community and outside of a community, along with the prior beliefs about the probability of a

given partition of the network. Because the Beta distribution is the natural conjugate prior of the

8 Copic et al. (2009) acknowledge the possibility of using Bayesian techniques in their work but choose not to pursue it, leaving this initial step to Chapman and Zhang (2010). To my knowledge, this is the first work to present a generalized community detection model combined with a sophisticated Bayesian convergence algorithm.


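binomial distribution, combining the likelihood, conditional on $p^{out}_\ell$, with the prior distribution for $p^{in}_\ell$ yields the kernel of a Beta distribution

$$\mathrm{Beta}\left(\alpha + \sum_{i \in z(\ell)} \sum_{j \in z(\ell)} A_{ij}, \; \beta + \sum_{i \in z(\ell)} \sum_{j \in z(\ell)} (C_{ij} - A_{ij})\right) \tag{6}$$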

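and similarly the distribution for $p^{out}_\ell$ has the following kernel for a truncated Beta distribution

$$\mathrm{BetaInc}\left(\alpha + \sum_{i \in z(\ell)} \sum_{j \notin z(\ell)} A_{ij}, \; \beta + \sum_{i \in z(\ell)} \sum_{j \notin z(\ell)} (C_{ij} - A_{ij})\right). \tag{7}$$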

These kernels allow for estimation of a posterior distribution for the probabilities in pin and pout

via Gibbs sampling.
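A minimal sketch of the two Gibbs draws implied by (6) and (7) is given below, using only the Python standard library. The function names, the default uniform prior ($\alpha = \beta = 1$), the exclusion of self-links, and the use of simple rejection sampling to impose the truncation at $p^{in}_\ell$ are all illustrative assumptions; an inverse-CDF sampler would be preferable when the truncation region has little posterior mass.

```python
import random

def sample_p_in(z, label, A, C, alpha=1.0, beta=1.0, rng=random):
    """Draw p_in for one community from the Beta kernel in Equation (6)."""
    members = [i for i, zi in enumerate(z) if zi == label]
    succ = sum(A[i][j] for i in members for j in members if i != j)
    fail = sum(C[i][j] - A[i][j] for i in members for j in members if i != j)
    return rng.betavariate(alpha + succ, beta + fail)

def sample_p_out(z, label, A, C, p_in, alpha=1.0, beta=1.0,
                 rng=random, max_tries=10000):
    """Draw p_out from the kernel in Equation (7), truncated to [0, p_in].

    The truncation is imposed by rejection sampling (an assumption made
    here for simplicity, not a method stated in the text).
    """
    members = [i for i, zi in enumerate(z) if zi == label]
    others = [j for j, zj in enumerate(z) if zj != label]
    succ = sum(A[i][j] for i in members for j in others)
    fail = sum(C[i][j] - A[i][j] for i in members for j in others)
    for _ in range(max_tries):
        draw = rng.betavariate(alpha + succ, beta + fail)
        if draw <= p_in:
            return draw
    raise RuntimeError("truncation region has very low posterior mass")
```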

The space of all possible partitions is an extremely large discrete space with no convenient

functional form which precludes Gibbs-sampling to find a distribution over possible partitions.9

Consequently, estimation of an unknown posterior distribution proceeds via a Metropolis-Hastings

Markov-Chain Monte Carlo algorithm implemented within the Gibbs sampler. The intuition be-

hind how this works is straightforward. Candidate partitions (potential new community structures)

are generated by a stochastic process. Given two possible community structures, the original and

the new partition, the algorithm selects the one that is most likely given the previously sampled

values for pin and pout and discards the other. New values for pin and pout are sampled from beta

distributions conditional on the selected partition. This process continues for a given number of

repetitions until the process converges to posterior distributions for the probabilities and a posterior

distribution over community structures.

The Metropolis-Hastings algorithm requires positing a candidate-generating function, q(·, ·),

which determines the probability of observing a new partition z′ given the current partition z. These

functions can be quite flexible, but must allow for any move between partitions to be reversed with

non-zero probability. I utilize two such functions in my algorithm. The first function relies on

comparing two partitions, which are identical except for one node that is randomly assigned to

another community, which I will refer to as individual-walk. Chapman and Zhang (2010) posit the

9 A variety of techniques in machine learning use a Chinese Restaurant Process prior, which can yield results via Gibbs sampling given certain assumptions about the community detection process; see Mørup and Schmidt (2012) for a recent example in a long line of literature. For reasons previously discussed in Section 2, I am reluctant to assume this prior.


following function for this purpose, $q(z' \mid z)$, where $q$ represents the probability of transitioning to a partition $z'$ from the current partition $z$. Recall that $L$ is the number of communities in a given partition and let $L_\ell$ be the number of nodes in community $\ell$ for that partition. The transition function then has the specific form

$$q(z' \mid z) = \begin{cases} \dfrac{1}{L^2 \times L_\ell} & \text{if } L_\ell > 1 \\ \dfrac{1}{L^2 - L} & \text{if } L_\ell = 1 \end{cases} \tag{8}$$

This function represents the probability that a given node will change from one community to another and assumes that prior beliefs indicate that all partitions are equally likely. Intuitively, the probability that a given node will change from one community to another is determined by the probability that a given node is selected within a randomly selected community, $\frac{1}{L} \times \frac{1}{L_\ell}$, multiplied by the inverse of the number of communities that it could potentially move to, $\frac{1}{L}$. The cases account for the difference in calculations when a node belongs to a singleton community.
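For concreteness, the individual-walk transition probability in (8) can be written as a small Python function. The function name and argument names are hypothetical; the formula is taken directly from (8), and the singleton case is undefined when there is only one community.

```python
def individual_walk_prob(num_communities, community_size):
    """Equation (8): probability of one specific single-node move.

    num_communities : L, the number of communities in the current partition
    community_size  : L_ell, the size of the moving node's community
    """
    L = num_communities
    if community_size > 1:
        return 1.0 / (L ** 2 * community_size)
    # Singleton community, as given in Equation (8); requires L > 1
    return 1.0 / (L ** 2 - L)
```

With two communities of 50 nodes each, for example, any one specific move has probability $1 / (2^2 \times 50) = 0.005$, which illustrates why single-node proposals explore the partition space slowly.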

The second candidate-generating function is a variation of the split-merge algorithm proposed

by Jain and Neal (2004).10 As might be expected, moving only one node at a time to generate

candidate partitions results in very slow convergence. Since Chapman and Zhang (2010) only

implement their algorithm for a network of size 14, this is not very problematic; however, imple-

menting the procedure with networks that are only marginally larger results in prohibitively lengthy

computation time. The split-merge algorithm allows for larger changes in community partitions so

that convergence is achieved more quickly.

The split-merge algorithm begins by randomly selecting two nodes. If those nodes are in the

same community, the community is “split” into two new communities by assigning the second

of the two nodes to a new community, leaving the first node in the original community, and then

randomly allocating the remaining nodes in the original community between the original and new

communities via a series of Bernoulli trials. If the nodes are in different communities, all the

nodes in both communities are “merged” so that all the nodes in the two communities are combined

10 Jain and Neal (2004) propose the split-merge algorithm as a technique for Dirichlet processes where an informative prior for community structures is chosen. Since I have chosen an uninformative prior for the partitions, the implementation of the algorithm is different, but the idea follows directly from the original work.


into the community of the first randomly selected node. The associated probability of transitioning from one partition to the next via split-merge for two randomly selected nodes $i$ and $j$ is given by:

$$q(z' \mid z) = \begin{cases} \left(\dfrac{1}{n(n-1)}\right) \left(\dfrac{1}{2}\right)^{L_\ell} & \text{if } z_i = z_j \\ \dfrac{1}{n(n-1)} & \text{if } z_i \neq z_j. \end{cases} \tag{9}$$

This is simply the probability of randomly selecting two nodes from a set of size $n$ multiplied by the probability of assigning nodes between the new and original community. Note that in the case of merging the communities the probability of assigning nodes is 1 because the communities are merged with certainty.11 Although the split-merge proposals result in much larger changes in the community structures, which aids rapid convergence, this same attribute makes it difficult to transition between two partitions that are very similar. Consequently, I utilize both an individual-walk and a split-merge transition in the Bayesian algorithm.
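The split-merge proposal described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: community labels are assumed to be integers, the Bernoulli trials in a split are assumed to be fair coin flips (consistent with the factor $(1/2)^{L_\ell}$ in (9)), and the function name is hypothetical. The function returns the candidate partition together with the proposal probability from (9).

```python
import random

def split_merge_proposal(z, rng=random):
    """Return (z_new, q) for one split-merge move following Equation (9)."""
    n = len(z)
    i, j = rng.sample(range(n), 2)        # two distinct nodes, in order
    pick_prob = 1.0 / (n * (n - 1))
    z_new = list(z)
    if z[i] == z[j]:
        # Split: j opens a new community, i stays put, and the remaining
        # members are allocated by fair Bernoulli trials
        old, new = z[i], max(z) + 1
        size = sum(1 for label in z if label == old)   # L_ell in (9)
        z_new[j] = new
        for k in range(n):
            if z[k] == old and k not in (i, j) and rng.random() < 0.5:
                z_new[k] = new
        return z_new, pick_prob * 0.5 ** size
    # Merge: every node in j's community joins i's community
    z_new = [z[i] if label == z[j] else label for label in z]
    return z_new, pick_prob
```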

Candidate partitions are evaluated using the Metropolis-Hastings ratio. This represents the probability of transitioning from the current partition $z^{(m)}$ to a candidate partition $z'$ and is given by

$$\min\left[\frac{f(z' \mid p^{in}, p^{out}, A, C) \, q(z', z^{(m)})}{f(z^{(m)} \mid p^{in}, p^{out}, A, C) \, q(z^{(m)}, z')}, \; 1\right].$$

Note that the assumption of an uninformative prior over the partitions implies that the effect of the

prior in the ratio cancels out. This setup can be summarized in the following algorithm:

Algorithm 1

1. Assume initial values for $p^{in}_0$, $p^{out}_0$, and an initial community partition $z_0$

2. Sample new values $p^{in}_{new}$ and $p^{out}_{new}$ from their associated Beta distributions via Gibbs sampling

3. Sample a new partition via a Metropolis-Hastings step:

(a) Randomly choose a new partition via either an individual-walk or a split-merge process12

11 Note that this proposal satisfies the reversibility condition because a community that has been merged returns to the original community if the same two nodes are selected and split with an identical Bernoulli process (which occurs with non-zero probability). Additionally, a community that has been split returns to its original community if the same two nodes are selected and merged.

12 While there are several ways to implement this, the simplest is to alternate between the two methods. This causes no problems for the convergence of the Metropolis-Hastings algorithm.


(b) Calculate the acceptance probability given the current values of $p^{in}_{new}$ and $p^{out}_{new}$ and the current and previous partitions

(c) Accept the new community structure if the acceptance probability exceeds a uniformly distributed random value; reject and retain the previous community structure otherwise

4. Repeat steps 2-3 for a given number of burn-in repetitions and a chosen number of trials
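The control flow of Algorithm 1 can be sketched as the following Python skeleton. It is a sketch under stated assumptions: `sample_probs`, `propose` and `log_f` are hypothetical caller-supplied functions (for the Gibbs draws, the candidate-generating step, and the log posterior kernel respectively), and the acceptance ratio is computed in logs for numerical stability, which is an implementation choice of mine rather than something specified in the text.

```python
import math
import random

def mh_accept(log_f_new, log_f_old, q_reverse, q_forward, rng=random):
    """Steps 3(b)-(c): accept with probability min(ratio, 1)."""
    log_ratio = (log_f_new - log_f_old
                 + math.log(q_reverse) - math.log(q_forward))
    accept_prob = 1.0 if log_ratio >= 0 else math.exp(log_ratio)
    return rng.random() < accept_prob

def run_sampler(z0, sample_probs, propose, log_f, n_burn, n_keep, rng=random):
    """Skeleton of Algorithm 1 with hypothetical caller-supplied pieces.

    sample_probs(z)        -> (p_in, p_out)           Gibbs draw, step 2
    propose(z)             -> (z_new, q_fwd, q_rev)   candidate, step 3(a)
    log_f(z, p_in, p_out)  -> log posterior kernel of partition z
    """
    z, draws = list(z0), []
    for it in range(n_burn + n_keep):
        p_in, p_out = sample_probs(z)
        z_new, q_fwd, q_rev = propose(z)
        if mh_accept(log_f(z_new, p_in, p_out), log_f(z, p_in, p_out),
                     q_rev, q_fwd, rng):
            z = list(z_new)
        if it >= n_burn:                 # discard burn-in repetitions
            draws.append((tuple(z), p_in, p_out))
    return draws
```

The retained draws give the posterior sample: tabulating the first element of each tuple yields the frequency of each community structure, while the remaining elements trace out the posterior distributions of the probabilities.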

While initial values for $p^{in}_0$, $p^{out}_0$ are relatively innocuous because the Gibbs sampling quickly

converges to values within the desired range, the initial community structure z0 is very important

to the number of iterations required to achieve convergence. One choice that seems to work well in

initial testing is the “uninformative” partition of every node being in its own community; however,

the best choice may vary depending on the networks under consideration. It should be stressed

that the initial partition choice can be arbitrary and that the Metropolis-Hastings algorithm will

theoretically converge given any initial value; however, in practice it may be useful to choose an

informed prior if the researcher is comfortable doing so to limit the number of repetitions needed

to converge to the correct community structure.

4 Convergence Experiments

It is useful to examine the above algorithm on simulated networks where parameters are known.

Various statistical tests and graphical checks exist to test the convergence in distribution of the

probability parameters; however, the convergence of the community structure is much more dif-

ficult to test. Simulation allows verification that the algorithm not only converges properly to the

specified known probabilities, but also the correct known community structure.

Random network generation proceeds as follows. Set arbitrary parameter values for the proba-

bility of linking within a community and across communities p̃in and p̃out . Then generate a random

community structure zsim by specifying the number of nodes in a network n and assigning com-

munity membership. In this case, community membership is done by random assignment from a


chosen number of communities $L$.13 Using the resultant community structure, I populate an adjacency matrix $A$ with ties between individuals based on an iterative process. Take an intermediate adjacency matrix $A^k$ and populate it with ones and zeros based on $\tilde{p}^{in}$ and $\tilde{p}^{out}$ as follows. If nodes $i$ and $j$ are in the same community and if a draw from a uniform distribution is less than $\tilde{p}^{in}$, let $A^k_{ij} = A^k_{ji} = 1$; otherwise fill in 0. Similarly, if nodes $i$ and $j$ are not in the same community let $A^k_{ij} = A^k_{ji} = 1$ if a draw from a uniform distribution is less than $\tilde{p}^{out}$; otherwise fill in 0. Repeat this process for $k = 1, \ldots, K$. The resulting network adjacency matrix is $A = \sum_{k=1}^{K} A^k$ and the capacity matrix is a matrix with every element set to $K$, the number of potential interactions.
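The generation procedure just described can be sketched in a few lines of Python. The function name and the nested-list representation are illustrative assumptions; the logic follows the text, drawing one uniform value per unordered pair in each of the $K$ rounds so that the resulting matrix is symmetric.

```python
import random

def simulate_network(n, L, p_in, p_out, K, rng=random):
    """Return (z, A, C) for a simulated symmetric network."""
    z = [rng.randrange(L) for _ in range(n)]     # random community assignment
    A = [[0] * n for _ in range(n)]
    for _ in range(K):                           # K rounds of potential links
        for i in range(n):
            for j in range(i + 1, n):
                p = p_in if z[i] == z[j] else p_out
                if rng.random() < p:             # uniform draw below p: link
                    A[i][j] += 1
                    A[j][i] += 1
    C = [[K] * n for _ in range(n)]              # every element set to K
    return z, A, C
```

A call such as `simulate_network(100, 2, 0.7, 0.3, 500)` would correspond to the parameter values listed in the assumptions that follow.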

I generate 100 of these matrices and apply the algorithm to them and record the resulting

estimated probabilities and community structure. Additionally, I make the following assumptions

which are chosen for their similarity to the application in Section 5:

1. There are only two communities

2. n = 100, i.e. there are 100 nodes in the network

3. p̃in = 0.7 and p̃out = 0.3

4. There are K = 500 potential interactions in the network

The number of iterations in the Bayesian algorithm is varied, taking values of 1,000,

5,000, 10,000 and 25,000 (with an equal number of burn-in repetitions that are discarded prior to

analysis) to get intuition about how quickly the process converges. I use the following benchmarks

to evaluate convergence:

• Correct community: The algorithm identifies the correct community

• p̃in in HPD: p̃in is in the 95% highest probability density interval of its estimated posterior

distribution

• p̃out in HPD: p̃out is in the 95% highest probability density interval of its estimated posterior

distribution

• pin Geweke test: Test the null hypothesis that the distribution for pin converged at a 95%

13 This implies that each community will be roughly the same size, $n \times \frac{1}{L}$. This assumption is not necessary, but fits the empirical application very well because Congress is roughly evenly split between community membership. It also represents a scenario where algorithms that have a Chinese Restaurant Process prior have difficulty converging.


confidence level14

• pout Geweke test: Test the null hypothesis that the distribution for pout converged at a 95%

confidence level

• Optimal community frequency: Proportion of the draws that return the resultant optimal

community structure

• Unanimous community frequency: The resultant community is the only one sampled during

the iterative process

These numbers are averaged over all 100 random networks and reported in Table 1 for a variety

of repetition values for a network of size 100. The results indicate that the algorithm converges

quite quickly given the assumptions above. It identifies the correct community structure in all

of the sampled networks with as few as 500 iterations of the algorithm with the Geweke test

rejecting the null that the chain has converged approximately 5% of the time for both pin and pout

(as would be expected for a test at the 95% confidence level repeated 100 times). Lastly, even

though the algorithm converges well at 500 iterations, it is important to remember that the random

network was generated in such a way that all the assumptions hold perfectly.15 This indicates that in

applied settings 500 is a minimum iteration requirement to get some intuition about the estimated

parameters; however, rigorous analysis should probably rely on many more. I have conducted

similar tests with a variety of assumptions about the network itself and this general pattern holds

regarding number of iterations. Not reported are simulations that indicate that holding the number

of repetitions fixed at 25,000 (with an equal number of burn-in repetitions) seems to be sufficient to ensure

convergence in networks up to size 500. Naturally, these results likely hold for larger networks and

additional repetitions. In these cases, computing power becomes the binding constraint.

14 The Geweke test is a simple convergence test for MCMC simulations. It tests the null hypothesis that there is no difference in the mean between a subset of the repetitions collected at the beginning of the process and a subset of the repetitions collected at the end of the process. If you can reject this null, it is safe to conclude that the process has not converged. Note that this is not a true test of convergence in distribution as it only tests differences in means. For details see Geweke et al. (1991).

15 Of particular interest is the assumption that there are only two groups in the data. Preliminary results indicate that the process converges more slowly for networks with more than two groups.
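To illustrate the idea behind the Geweke test described in footnote 14, the Python sketch below compares the mean of an early segment of a chain with the mean of a late segment. It is a simplified version: the segment fractions (first 10% and last 50%) are conventional defaults that I assume here, and the variances are naive sample variances, whereas the full diagnostic uses spectral density estimates to account for autocorrelation in the chain.

```python
import math

def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score for a list of MCMC draws."""
    n = len(chain)
    a = chain[: int(first * n)]          # early segment
    b = chain[n - int(last * n):]        # late segment

    def mean(x):
        return sum(x) / len(x)

    def var(x):
        m = mean(x)
        return sum((v - m) ** 2 for v in x) / (len(x) - 1)

    # Naive standard error; ignores autocorrelation within the chain
    se = math.sqrt(var(a) / len(a) + var(b) / len(b))
    return (mean(a) - mean(b)) / se
```

An absolute value above roughly 1.96 would lead to rejecting the null of equal means at the 95% level, which is evidence that the chain has not converged.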


5 Application to U.S. Senate

The United States Congress is easily interpreted as a network, a set of legislators connected in

ways that are easily observable such as voting patterns or co-sponsorship of bills. This simplicity

is not lost in the networks literature. From a technical standpoint, the Senate is often used as a test

for new computational techniques (Banerjee, El Ghaoui and D’Aspremont, 2008; Kolar, Le Song

and Xing, 2010). In an applied setting, researchers have used co-sponsorship on bills (Tam Cho

and Fowler, 2010; Harward, February 2010) and committee assignments (Porter, Mucha, Newman

and Warmbrand, 2005) to show that Congress exhibits traits commonly seen in social networks.

Specifically, a voting network consists of Senators or Representatives with links between them

defined as the number of times they voted the same way on a bill. The element Ai j of the network

adjacency matrix A counts the number of times senator i voted yea when senator j voted yea and

the number of times they both voted nay. This structure yields a network that is weighted by the

number of common votes and undirected, because by construction Ai j = A ji. For this section I

focus on the Senate, although the results for the House are broadly similar.
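A minimal sketch of how such a voting network could be assembled from roll-call data is given below. The input format (one list of votes per senator, with `None` marking an absence) and the choice to count in the capacity matrix only those bills on which both senators voted are assumptions made for illustration; the text above defines only the adjacency matrix.

```python
def agreement_network(votes):
    """Build (A, C) from votes[i][b] in {'yea', 'nay', None}.

    A[i][j] counts bills on which senators i and j voted the same way.
    C[i][j] counts bills on which both cast a vote (an assumed definition).
    """
    n = len(votes)
    A = [[0] * n for _ in range(n)]
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for vi, vj in zip(votes[i], votes[j]):
                if vi is None or vj is None:
                    continue             # skip bills with a missing vote
                C[i][j] += 1
                if vi == vj:             # both yea or both nay
                    A[i][j] += 1
    return A, C
```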

While it seems clear, both anecdotally and from recent literature in political science (see, for example, Poole and Rosenthal (2007); Theriault (2008)), that voting along partisan lines has increased in the recent past, it is unclear whether the increasing number of partisan votes actually means Congress has become more polarized. Rather, it may be that consensus and conflict in Congress have stayed constant over time due to the rules and structure of Congress, and only recently has party voting clearly aligned with these preexisting voting patterns. The technique proposed in this paper offers a unique setting in which to test this hypothesis. First, finding an optimal community structure allows us to observe voting blocs in Congress, rather than relying on party identification (an idea introduced by Waugh, Pei, Fowler, Mucha and Porter (2009)) as a relevant group for decision making. I can then use Bayesian model comparison to test whether or not the observed voting bloc is distinct from party identification as a community partition. Second, the estimated probabilities found via the Bayesian community detection suggested above have very intuitive interpretations in this setting: $p_c^{in}$ is the probability of voting within a voting bloc and $p_c^{out}$ is the probability of voting across voting blocs. The expectation is that these probabilities should remain fairly constant over time because they are determined by the structure, rather than the composition, of Congress. However, the probabilities of voting in accordance with party identification, $p_p^{in}$ and $p_p^{out}$, should vary as party identification comes closer to corresponding to observed voting blocs.

I use the Metropolis-Hastings within Gibbs algorithm proposed above to find the most likely community structure and its associated probabilities $p_c^{in}$ and $p_c^{out}$. I use 10,000 burn-in repetitions and 10,000 iterations, which should be sufficient to ensure convergence given the simulations in Section 4. Additionally, I run Geweke convergence tests for both probabilities and check the frequency of appearance of the final community structure. These are reported in Table 2. The results suggest that the probability estimates for the Congresses converged, as the Geweke test rejects the null in approximately 5% of cases (as would be expected for a 95% test). However, it appears that there is more uncertainty regarding the optimal partition than was evident in the convergence tests above, as only 60% of Congresses settled on a single partition.16
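The comparison of early and late draws behind the Geweke test can be sketched as follows. This is a simplified illustration under stated assumptions: it compares the first 10% and last 50% of a chain (conventional defaults, not values reported in this paper) and uses naive sample variances, whereas the full test in Geweke et al. (1991) uses spectral density estimates that account for autocorrelation. The function name `geweke_z` is hypothetical.

```python
import math


def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score for one scalar MCMC chain.

    Compares the mean of the first `first` fraction of draws with the
    mean of the last `last` fraction. Naive variances are used here,
    which is a simplification of the original test.
    """
    n = len(chain)
    early = chain[: int(first * n)]
    late = chain[n - int(last * n):]

    def mean_var(x):
        m = sum(x) / len(x)
        v = sum((xi - m) ** 2 for xi in x) / (len(x) - 1)
        return m, v

    m_e, v_e = mean_var(early)
    m_l, v_l = mean_var(late)
    return (m_e - m_l) / math.sqrt(v_e / len(early) + v_l / len(late))


# Deterministic toy chains (hypothetical): one stable, one drifting.
stable = [0.5 + 0.1 * (-1) ** i for i in range(1000)]
drifting = [0.5 + 0.1 * (-1) ** i + i / 1000 for i in range(1000)]

z_stable = geweke_z(stable)
z_drifting = geweke_z(drifting)
```

At the 95% level the null of equal means is rejected when the absolute z-score exceeds roughly 1.96, which happens for the drifting chain but not for the stable one.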

In addition to estimating partitions via the algorithm, I calculate the probabilities while holding party identification fixed as the community structure to obtain estimates of $p_p^{in}$ and $p_p^{out}$. The estimated probabilities for the 45th to 111th Senates, both for observed communities and party identification, are shown in Figure 1.17 The graphs exhibit the trends that are expected if Congress has become more polarized: $p_p^{in}$ and $p_p^{out}$ were less extreme than their voting bloc counterparts $p_c^{in}$ and $p_c^{out}$ for most of the 20th century but become indistinguishable in more recent Congresses.18 However, it is important to note that such convergence is not unprecedented, as Congresses in the late 1800s exhibit similar characteristics, although the measurements are more volatile from Congress to Congress. This may be partially explained by smaller numbers of legislators taking smaller numbers of votes, resulting in additional sampling error. Figure 1 also reinforces the idea that $p_c^{in}$ and $p_c^{out}$ remain relatively stable, with the exception of a seeming increase in $p_c^{out}$ from the 70th to the 100th Congresses.

16 This is not evidence that the method is not working. The algorithm is supposed to converge to a distribution of likely partitions; if there are several partitions that are nearly equally likely, the algorithm will suggest these partitions more frequently. The remaining uncertainty is most likely due to the application not perfectly fitting the assumptions that were satisfied by construction in Section 4; for example, the potential number of communities is not set to be two.

17 The 45th Congress is considered by scholars as the beginning of the modern period of Congress, coinciding with the end of Reconstruction. The 111th Congress is the most recent for which data are fully available.

18 Formal statistical tests are possible using the highest probability density intervals for the estimated probabilities. However, the intervals are so narrow that statistically significant differences persist across almost all Congresses. These estimates are available upon request.
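To make the fixed-partition calculation concrete, the following sketch estimates within-group and across-group agreement probabilities when group labels (for example, party identification) are held fixed. It assumes a binomial likelihood for each pair's agreement count with a Beta(1, 1) prior and the same number of roll calls for every pair; these modelling choices and the names `pooled_counts` and `posterior_means` are illustrative assumptions and may differ from the likelihood used in this paper.

```python
def pooled_counts(A, n_votes, labels):
    """Return [agreements, opportunities] within and across groups."""
    within = [0, 0]
    across = [0, 0]
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            bucket = within if labels[i] == labels[j] else across
            bucket[0] += A[i][j]   # times i and j voted the same way
            bucket[1] += n_votes   # roll calls on which they could agree
    return within, across


def posterior_means(A, n_votes, labels, a=1.0, b=1.0):
    """Posterior means of p_in and p_out under a Beta(a, b) prior."""
    (s_in, t_in), (s_out, t_out) = pooled_counts(A, n_votes, labels)
    p_in = (a + s_in) / (a + b + t_in)
    p_out = (a + s_out) / (a + b + t_out)
    return p_in, p_out


# Toy agreement counts for four senators over 10 roll calls (hypothetical).
A = [
    [0, 9, 2, 3],
    [9, 0, 1, 2],
    [2, 1, 0, 8],
    [3, 2, 8, 0],
]
party = [0, 0, 1, 1]  # labels held fixed, as with party identification

p_in, p_out = posterior_means(A, n_votes=10, labels=party)
polarization = p_in - p_out  # the simple measure p_in - p_out
```

Running the same function with the estimated voting bloc labels in place of `party` would give the community counterparts, so the two sets of probabilities can be compared Congress by Congress.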

Figure 2 shows a simple measure of polarization, $p^{in} - p^{out}$, based on the estimated probabilities for both party and voting bloc. This reinforces that the period from the 70th to the 100th Congresses was somewhat unusual, as party-based polarization is clearly lower than structural (voting bloc) polarization. This graph also indicates that, in addition to party polarization more closely aligning with the observed community polarization, structural polarization is reaching unprecedented levels. Beyond the probability estimation, I am also able to use Bayesian model comparison to test the null hypothesis that the party partition is the true community structure against the alternative that the estimated community structure is correct. I am able to strongly reject the null in all but two Senates (the 59th and the 104th).
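One way such a comparison between two candidate partitions could be carried out is with a log Bayes factor, sketched below. The sketch assumes a binomial likelihood for pairwise agreement counts with independent Beta(1, 1) priors on the within-group and across-group probabilities, so that the marginal likelihood has a closed form; the binomial coefficients are omitted because they are identical for both partitions and cancel in the ratio. The exact comparison used in this paper may differ, and the names `log_beta` and `log_marginal` are hypothetical.

```python
import math


def log_beta(x, y):
    """Log of the Beta function, via log-gamma."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_marginal(A, n_votes, labels, a=1.0, b=1.0):
    """Log marginal likelihood of a partition, up to a constant.

    Within-group and across-group pairs are pooled separately and the
    agreement probabilities are integrated out under Beta(a, b) priors.
    """
    counts = {True: [0, 0], False: [0, 0]}  # same group? -> [agree, disagree]
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            same = labels[i] == labels[j]
            counts[same][0] += A[i][j]
            counts[same][1] += n_votes - A[i][j]
    return sum(
        log_beta(a + s, b + f) - log_beta(a, b) for s, f in counts.values()
    )


# Toy agreement counts for four senators over 10 roll calls (hypothetical).
A = [
    [0, 9, 2, 3],
    [9, 0, 1, 2],
    [2, 1, 0, 8],
    [3, 2, 8, 0],
]
blocs = [0, 0, 1, 1]  # stand-in for an estimated community structure
party = [0, 1, 0, 1]  # stand-in for party labels (the null partition)

log_bf = log_marginal(A, 10, blocs) - log_marginal(A, 10, party)
```

A positive `log_bf` favors the estimated blocs over the party partition. In this toy example the blocs separate high-agreement pairs from low-agreement pairs far better than the alternative labels, so the value is clearly positive.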

6 Extensions and Conclusion

This work contributes a Bayesian community detection algorithm and provides an application to Congressional voting patterns. While this specific method is limited to applications where community structures and probabilities of link formation have an intuitive interpretation, the general framework could be extended in several useful ways. Relaxing the assumption that probabilities of link formation are identical across communities can provide additional insight into many network questions. In the context of this work, a common explanation for increased polarization in Congress is the Republican party moving further to the right. If this is the case, we would observe $p^{in}$ growing and $p^{out}$ decreasing only for the voting bloc that most closely aligns with the Republican party. Methodologically, several extensions seem promising. First, the Bayesian framework established in this paper is easily generalized to a hierarchical framework that estimates network-level parameters and individual parameters simultaneously. If it is possible to establish identification, this process allows estimation of individual-level models of network formation while formally accounting for the endogenous nature of individual decisions on network structure. A second avenue of future study would be to examine the properties of communities themselves. Communities are likely to be determined both by observable attributes of the nodes and by unobservable attributes of the nodes; therefore, the community to which a node belongs provides information about nodes that is unobservable in other contexts.


References

Banerjee, Onureena, Laurent El Ghaoui, and Alexandre D’Aspremont, “Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data,” Journal of Machine Learning Research, 2008, 9 (3), 485–516.

Bech, Morten L., Carl T. Bergstrom, Rodney J. Garratt, and Martin Rosvall, “Mapping change in the federal funds market,” Technical Report 2011.

Chapman, James and Yinan Zhang, “Estimating the Structure of the Payment Network in the LVTS: An Application of Estimating Communities in Network Data,” 2010.

Cho, Wendy K. Tam and James H. Fowler, “Legislative Success in a Small World: Social Network Analysis and the Dynamics of Congressional Legislation,” The Journal of Politics, 2010, 72 (01), 124–135.

Copic, Jernej, Matthew O. Jackson, and Alan Kirman, “Identifying Community Structures from Network Data via Maximum Likelihood Methods,” The B.E. Journal of Theoretical Economics, 2009, 9.

Fortunato, Santo, “Community detection in graphs,” Physics Reports, 2010, 486 (3-5), 75–174.

Geweke, J. et al., Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, Federal Reserve Bank of Minneapolis, Research Department, 1991.

Harward, Brian M., “The Calculus of Cosponsorship in the U.S. Senate,” Legislative Studies Quarterly, February 2010, 35.

Jain, S. and R.M. Neal, “A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model,” Journal of Computational and Graphical Statistics, 2004, 13 (1), 158–182.

Kolar, M., A. Le Song, and E. Xing, “Estimating time-varying networks,” Ann. Appl. Stat., 2010, 4 (1), 94–123.

Leskovec, Jure, Kevin J. Lang, and Michael Mahoney, “Empirical comparison of algorithms for network community detection,” in “Proceedings of the 19th international conference on World wide web” WWW ’10 ACM New York, NY, USA 2010, pp. 631–640.

Mørup, Morten and Mikkel N Schmidt, “Bayesian community detection,” Neural computation, September 2012, 24 (9), 2434–56.

Newman, M.E.J., “Detecting community structure in networks,” The European Physical Journal B - Condensed Matter and Complex Systems, 2004, 38, 321–330. 10.1140/epjb/e2004-00124-y.

Poole, K.T. and H. Rosenthal, Ideology and Congress, Transaction Publishers, 2007.

Porter, Mason A., Peter J. Mucha, M. E. J. Newman, and Casey M. Warmbrand, “A network analysis of committees in the U.S. House of Representatives,” Proceedings of the National Academy of Sciences of the United States of America, 2005, 102 (20), 7057–7062.

Theriault, S.M., Party polarization in Congress, Cambridge University Press, 2008.

Waugh, Andrew S., Liuyi Pei, James H. Fowler, Peter J. Mucha, and Mason A. Porter, “Party Polarization in Congress: A Network Science Approach,” Forthcoming, 2009.

Wetherilt, Anne, Peter Zimmerman, and Kimmo Soramaki, “The sterling unsecured loan market during 2006-08: insights from network theory,” Bank of England working papers 398, Bank of England July 2010.


A Tables and Figures

Table 1: Benchmark Convergence Tests: Varying Iterations

| Number of Iterations | 100 | 500 | 1000 | 5000 | 10000 | 25000 |
|---|---|---|---|---|---|---|
| Correct Community | 0.46 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| $p^{in}$ in HPD | 0.98 | 0.96 | 0.92 | 0.91 | 0.95 | 0.97 |
| $p^{out}$ in HPD | 0.98 | 0.96 | 0.92 | 0.91 | 0.95 | 0.97 |
| $p^{in}$ Geweke Test | 0.76 | 0.04 | 0.02 | 0.05 | 0.04 | 0.09 |
| $p^{out}$ Geweke Test | 1.00 | 0.05 | 0.02 | 0.11 | 0.02 | 0.04 |
| Optimal Community Frequency | 0.08 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Unanimous Community Frequency | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |

Table 2: Senate Convergence Tests

| Test | Proportion of 68 Senates |
|---|---|
| $p^{in}$ Geweke Test | 0.05 |
| $p^{out}$ Geweke Test | 0.07 |
| Optimal Community Frequency | 0.66 |
| Unanimous Community Frequency | 0.60 |


Figure 1: Probabilities of Link Formation. Panel title: Probability of Voting Within Community and Outside Community. Horizontal axis: Congress (50 to 110); vertical axis: Probability (0.0 to 1.0). Series: $p_c^{in}$, $p_c^{out}$, $p_p^{in}$, $p_p^{out}$. [Plot not reproduced.]

Figure 2: Senate Polarization. Panel title: Polarization in Congress over Time. Horizontal axis: Congress (50 to 110); vertical axis: $p^{in} - p^{out}$ (0.0 to 0.6). Series: $p_c^{in} - p_c^{out}$, $p_p^{in} - p_p^{out}$. [Plot not reproduced.]
