Modelling Cabinet Networks in Parliamentary Democraciesmason/research/gryc_dissertation_final.pdfModelling Cabinet Networks in Parliamentary Democracies Wojciech Gryc Trinity College

Modelling Cabinet Networks inParliamentary Democracies

Wojciech Gryc

Trinity College

University of Oxford

A thesis submitted for the degree of

M.Sc. in Mathematical Modelling and Scientific Computing

2

This dissertation is dedicated to my parents, Jolanta and Zbigniew Gryc.

Acknowledgements

I would like to thank my supervisor, Mason Porter, for his advice and

guidance. I would also like to thank Brian Uzzi and Daniel Diermeier for

their advice and for the data sets being analyzed in this dissertation. Fur-

thermore, I’d like to thank Peter Mucha for his advice, as well as Kathryn

Gillow and all those involved in running the M.Sc. in Mathematical Mod-

elling and Scientific Computing. Finally, I would like to thank the Rhodes

Trust for their support.

Abstract

Parliamentary democracies represent a common type of governance struc-

ture in numerous countries. While details vary from one country to an-

other, the structure of a parliamentary democracy entails having cabinet

ministers who each have a specific portfolio of policy interests, such as

healthcare, industry, or education. A set of ministers forms a govern-

ment, and such governments can change due to elections, political scan-

dals, or health problems. A great deal of work has already explored specific

types of ministers, such as prime ministers and foreign ministers, includ-

ing their survival rates and levels of experience. What is absent from

this research, however, is analysis of the network structures surrounding

cabinets, and how these structures vary over time. This research project

focuses on building mathematical models that incorporate the network

structure of ministerial cabinets in twelve parliamentary democracies, an-

alyzing whether such network information can be used to build a compar-

ative analysis of parliamentary democracies or help predict survival rates

of cabinet ministers. Governmental and ministerial models are built to

analyze cabinet evolution at the country and individual levels. The con-

clusions reached by both types of models show that network information is

extremely useful in predicting both the country- and individual-level qual-

ities of such democracies, including rates of incumbents in governments,

as well as political survival rates of individual ministers.

Contents

1 Introduction 1

1.1 The Mathematics of Networks . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Network- and Node-Level Paradigms . . . . . . . . . . . . . . . . . . 4

2 Literature Review 6

2.1 Power and Social Networks . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Modelling Networks of Teams and Collaborators . . . . . . . . . . . . 7

2.3 Political Theory and Political Networks . . . . . . . . . . . . . . . . . 7

3 Data 10

4 Network Measures 12

4.1 Centrality and Centralization . . . . . . . . . . . . . . . . . . . . . . 12

4.1.1 Degree Centrality . . . . . . . . . . . . . . . . . . . . . . . . . 13

4.1.2 Closeness Centrality . . . . . . . . . . . . . . . . . . . . . . . 14

4.1.3 Betweenness Centrality . . . . . . . . . . . . . . . . . . . . . . 14

4.1.4 Eigenvector Centrality . . . . . . . . . . . . . . . . . . . . . . 16

4.2 Clustering Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.2.1 Subgraph Intensities . . . . . . . . . . . . . . . . . . . . . . . 17

i

4.2.2 Upper Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4.3 Homophily . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4.3.1 Heterogeneity . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4.3.2 Modularity and Assortativity . . . . . . . . . . . . . . . . . . 19

5 Descriptive Overview of the Cabinet Data 20

5.1 Individual Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

5.2 Comparative Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 22

6 Governmental Models 24

6.1 Basic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

6.1.1 Membership Predictions . . . . . . . . . . . . . . . . . . . . . 25

6.2 Single-Measure Models . . . . . . . . . . . . . . . . . . . . . . . . . . 28

6.2.1 Strength Model . . . . . . . . . . . . . . . . . . . . . . . . . . 29

6.2.2 The Strength Model as a Bipartite Network Model . . . . . . 29

6.2.3 Other Properties . . . . . . . . . . . . . . . . . . . . . . . . . 30

6.2.4 Perturbing the Optimal Probability . . . . . . . . . . . . . . . 31

6.3 Preferential Attachment . . . . . . . . . . . . . . . . . . . . . . . . . 32

7 Model Optimization 34

7.1 Golden Section Search . . . . . . . . . . . . . . . . . . . . . . . . . . 34

7.2 Nelder-Mead Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 35

7.3 Simulated Annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

8 Optimized Models 39

8.1 Error Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

ii

8.2 Model Fits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

9 Ministerial Models 42

9.1 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

9.2 Preparing the Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

9.3 Voting by Majority Class Membership . . . . . . . . . . . . . . . . . 44

9.4 Binary Decision Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

9.5 Model Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

10 Discussion and Conclusion 48

10.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

10.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

A Symbols 51

B Code Overview 53

C Graph Visualization 55

C.1 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

C.2 Visualizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

D Clustering Coefficients 60

D.1 Upper Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

D.2 Matrix-based Definition . . . . . . . . . . . . . . . . . . . . . . . . . 61

D.3 Discussion and Critique . . . . . . . . . . . . . . . . . . . . . . . . . 61

E Perron-Frobenius Theorems 63

iii

F Data Tables 67

G ψ-Value Proofs 69

H Model Optimization Results 72

References 76

iv

List of Figures

1.1 A bipartite network with two governments, g1 and g2, and three cabinet

members, c1, c2, and c3. Observe that c2 is an incumbent in g2. . . . . 3

1.2 A network of cabinet members obtained from the bipartite network

shown in Figure 1.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

5.1 Clustering coefficients over time. . . . . . . . . . . . . . . . . . . . . . 22

5.2 Blau’s Heterogeneity measure over time in Japan. . . . . . . . . . . . 23

7.1 Illustration of the Golden Section Search algorithm, where w = (b −a)/(c− a) and z = (x− b)/(c− a). . . . . . . . . . . . . . . . . . . . 35

7.2 Number of iterations necessary to either quit the simulated anneal-

ing algorithm (which we have chosen to be 2500 iterations) or reach

convergence with error tolerance 0.1. The dashed lines represent one

standard deviation away from the mean number of iterations (or 0 or

2500 where relevant) when running the algorithm 50 times for T = 1,

and an initial perturbation of 2. The function being minimized is

f(x, y, z) = x2 + y2 + z2. . . . . . . . . . . . . . . . . . . . . . . . . . 38

9.1 An example of a binary tree decision system. One variable (cloudi-

ness) is evaluated first, either leading to a decision or another variable

evaluation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

v

9.2 Accuracies for the random tree algorithm (dark line) and majority vote

(red line). The dashed lines represent one standard deviation away

from the mean of the random tree algorithm. The split represents at

which government we begin to built the test (i.e. validation) data set,

also known as the “holdout” set. . . . . . . . . . . . . . . . . . . . . . 47

B.1 The flow of information within the code developed to analyze the cab-

inet network data. Text surrounded by squares represents “entities”

such as classes, and arrows represent the flow of information and data. 54

C.1 Canada’s first and second cabinets. Note that the edges in these visu-

alizations are unweighted. . . . . . . . . . . . . . . . . . . . . . . . . 58

C.2 Sweden’s first 21 cabinets. Notice that nodes organize into clusters,

with a few nodes connected to many tightly-knit groups. Edges in this

visualization are unweighted. . . . . . . . . . . . . . . . . . . . . . . . 58

C.3 Israel’s 16th cabinet. Political parties are represented by colour, and

edges are thicker if their weights are higher. Notice the experienced

group composed of blue nodes. . . . . . . . . . . . . . . . . . . . . . . 59

C.4 Denmark’s 27th cabinet. Notice the presence of two parties (node

colours), and the newcomers (squares). Also, notice that different

nodes have only previously worked with a subset of incumbents. . . . 59

D.1 Various results for simulations for TO in random graphs whose edge

weights are values from a random variable following an exponential

distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

vi

Chapter 1

Introduction

Social network analysis is a burgeoning academic field helping researchers analyze how

different types of actors relate to each other through information on their relation-

ships, transactions, attendance of events, and so on. While social network research

has roots that extend back to the 1930s [44], and almost three hundred years further

if one considers the origins of graph theory, the last two decades have seen a great deal

of growth in the literature on social networks. The premise of such research is that

actors or other entities can be represented as nodes in a network, with edges (the

connections between the nodes) representing different types of relationships. Such

an approach to analyzing and understanding data is extremely useful, and has been

explored in numerous different applications [13, 71], from biological food webs, to

computer systems, and even actors in the global financial system. An analytical ap-

proach that uses networks begins by defining a set of relationships between different

actors, and using graph theory, then attempts to reach conclusions about some of

the properties of the system itself. This is relevant to political science, where rela-

tionships between politicians, departments, and other actors and institutions are key

to understanding decision making processes, public policy, and other governmental

issues.

With a basic understanding of networks, explored further in Chapter 2, we approach

parliamentary democracies and the people who help administrate and run them.

Specifically, we are interested in cabinets and their ministers. While the way in which

parliamentary democracies operate can vary significantly between different countries,

ministers are always responsible for a specific portfolio, such as healthcare, industry,

or education, and make important policy decisions about how the government and

1

the political party in power decides policies surrounding those portfolios. Within a

parliamentary democracy, a “government” represents a specific set of ministers in

power. Ministers serve this government, and can also be called “cabinet members”.

A formal overview of existing political network research is given in Section 2.3. A great

deal of research exists on the qualitative characteristics of leaders in ministries [20],

as well as the life spans of individual governments [34]. To our knowledge, however,

there is a lack of a formal exploration of how the actors within individual cabinets

affect each other over time, and whether the connections between them serve any

significant roles in ensuring long-term political survival.

With this in mind, our goal is to determine whether network-specific variables provide

information on how parliamentary democracies function over time, and whether such

variables play a role in cabinet members’ survival in office. In the former case, we

explore the role of incumbents in governments over time, while in the latter case, we

examine whether specific cabinet members return as members in future governments.

1.1 The Mathematics of Networks

Social network data comes in numerous forms, with the two relevant ones to this dis-

sertation being bipartite and weighted, undirected networks. A “graph” or “network”

is an object, G, composed of two sets. The first of these are vertices, V, or nodes of

the network. The second set, E, is the edge list, which contains objects that signify

relationships between two nodes. Edges can be directed or undirected, as well as have

weights associated with them. A graph’s size n is defined as the size of V .

For example, if we have an undirected graph of size 3 where V = A,B,C and

E = (A,B), (B,C), then this represents a graph with three nodes, A, B, and

C, with a “path” going from A to B to C. Note that in an undirected graph,

(B,C) = (C,B). This would not be true in a directed case.

For the purpose of this dissertation, “self-edges”, where edges start and end at the

same node, are presumed not to exist and are ignored. Specifically, self-edges would

represent ministers who work with themselves in a cabinet.

The data being analyzed in this dissertation focuses on cabinet members and the

2

Figure 1.1: A bipartite network with two governments, g1 and g2, and three cabinetmembers, c1, c2, and c3. Observe that c2 is an incumbent in g2.

governments they served1. This can be represented as a network with two types

of nodes, gi (governments) and cj (cabinet members). Note that a government, gi,

represents the ith administration, rather than the institutions and laws of the country.

If a cabinet member cj served in government gi, then an undirected edge (cj, gi) is

present in the network’s set of edges. Note that edges cannot connect between two

governments or two cabinet members. This type of network is called an “affiliation

network”, and is a “bipartite graph” [70], where the set of vertices are composed of

two subsets, with no edges between members of the same subset.

As indicated, the cabinet data that we analyze in this dissertation is a collection of

bipartite graphs. For an individual country, our data consists of a set of cabinet

members and the cabinets on which they served. For example, consider a country

with two governments, g1, and g2, and a total of three cabinet members, c1, c2, and c3.

If c1 and c3 only served in g1 and g2, respectively, and c2 served in both governments,

we would have the bipartite graph shown in Figure 1.1. This yields a graph G with

vertex set V = c1, c2, c3, g1, g2 and edge set E = (c1, g1), (c2, g1), (c2, g2), (c3, g2).

Bipartite graphs, however, can be projected onto unipartite graphs. If we define the

relationship or edge between two cabinet members as existing if and only if they both

served in the same government, then we can collapse the above edge set E into a new

set E such that

(c1, g1), (c2, g1) ∈ E ⇒ (c1, c2) ∈ E,

(c2, g2), (c3, g2) ∈ E ⇒ (c2, c3) ∈ E.

Thus, in the collapsed network, we would have V = c1, c2, c3 and E = (c1, c2), (c2, c3).This network is shown in Figure 1.2.

1In terms of notation in this report, we use vi to denote vertices when discussing graphs ab-stractly, and ci to denote cabinet members when discussing nodes in the context of a cabinet network.

3

Figure 1.2: A network of cabinet members obtained from the bipartite network shownin Figure 1.1.

In either case, networks with an edge set and node set can also be represented in the

form of a matrix, called an “adjacency matrix”. In such a matrix, A = aij, one can

arbitrarily order the nodes in V and set the ith row and column to represent the ith

node in V . One then sets

aij =

1 if (ci, cj) ∈ E0 if (ci, cj) /∈ E

.

A final important property of the networks studied in this dissertation is that when

bipartite cabinet networks are collapsed, one can obtain “weighted” networks. This

occurs because if we have (c1, g1), (c2, g1) ∈ E and (c1, g2), (c2, g2) ∈ E — when two

politicians serve together in two separate governments — we lose a great deal of

information by setting a1,2 = 1. Instead, we project the network into a weighted one,

where the actual value of aij represents the number of cabinets both nodes served

together. Thus, in the aforementioned example, the edge (c1, c2) has a value of 2, and

all other edges have non-negative integer values, rather than just 0 or 1.

It is useful to note that the above projection of a bipartite to unipartite network is

just one approach to this problem. One can set edge weights in other ways, allowing

for non-integer values. For example, by normalizing edge weights or taking logarithms

of the above values.

1.2 Network- and Node-Level Paradigms

We use two approaches to model cabinet networks:

• Governmental Level: In this dissertation, networks represent team collab-

orations, where each team member is a cabinet minister. Analyzing networks

and modelling network-level statistics and properties helps us understand how

governments in different countries work and how they differ. By exploring node-

level properties and seeing how they aggregate at the “global” (i.e. network)

level, one can better understand some of the mechanics of the nodes’ decision

4

making processes, and various limits on how nodes survive, thrive, and make

decisions. A key motivation for such work is based on collaboration networks

of actors in Broadway musicals [27].

• Ministerial Level: Individual nodes in the network represent cabinet members

and over time show the evolution of the nodes’ political careers. This allows us

to make predictions on how long nodes will survive, if they will be promoted,

or which will return as cabinet members.

The main goal of this dissertation is to investigate whether structural (i.e. network)

properties of governments and individual cabinet members help explain why politi-

cians survive, and how governments work. While other variables can be included in

such an analysis, our goal is to explore the effects of networks. Politics is compet-

itive, and being able to navigate various groups (even within cabinets themselves,

especially in coalition governments) can make or break a political career. It is in-

tuitive to think that incorporating network measures, and exploring how individual

politicians relate to larger groups, can help improve pre-existing models of cabinet

formation and political longevity.

5

Chapter 2

Literature Review

To the best of our knowledge, there has been little work on modelling the social

and affiliation networks that pervade cabinets in parliamentary democracies, or gov-

ernments in general. A key reason for the lack of work in this area is that data

surrounding governments and the actors therein is difficult to obtain in a usable for-

mat, especially if one tries to do historical research. While parliamentary democracies

aim to be transparent and provide information on politicians, historical data is often

not digitized, or is stored in formats that are difficult to combine with other data

sets.

2.1 Power and Social Networks

Politics and the political survival of cabinet members requires that those members

have an acute understanding of the social dynamics of their system. Such dynamics

can include how information flow, access to various resources, and understanding the

costs of cooperating or competing with other members. A key reason for employing

network analysis in the study of social systems is the connection between power, the

ability to mobilize resources, and a node’s location in a network.

While such network-focused research is relatively new in political science, network

analysis has been used in other social systems as early as the 1930s [44, 71]. For

example, individual studies have shown that people weakly familiar with or connected

to large sets of disparate groups are better at finding jobs [25]; network position helps

non-profit organizations succeed [26]; and well-connected managers earn more money

6

and make decisions that are better implemented in their corporations [12]. Research

has delved into the details of the networks pervading these various systems, analyzing

characteristics of individual nodes, properties of nodes in relation to other nodes,

and “global” characteristics and measures of the entire networks themselves (see, for

example, work on the strategic networks of firms [28]).

Some sociologists have gone as far as to promote social network analysis as a way

to quantify people’s social capital and ability to gain power and status over others

[11, 38, 39]. Some common measures related to status in a network are described in

Chapter 4.

2.2 Modelling Networks of Teams and Collabora-

tors

To understand a social system, one should go beyond applying descriptive measures

on nodes within a network and attempt to model the dynamics of the network it-

self. We explore models on two levels. First we look at the “governmental” level,

where properties of the entire network are modelled. One can analyze the degree (or

strength, in the case of weighted networks) distribution of a network, rank nodes, and

analyze the properties of such a distribution. For example, Guimera et al. [27] built

models to analyze how Broadway actors work together based on new actors working

with established ones, and analyzed the success of the models using degree distribu-

tions of actors. Similar strategies have been used to analyze cooperative networks in

nature [61].

A second approach to modelling networks is through the analysis of individual nodes.

If dealing with temporal data, one can build “longitudinal networks”, where edges

and nodes vary over time. With such data sets, one can attempt to predict survival

rates of specific types of nodes.

2.3 Political Theory and Political Networks

Asking whether the network-based context of a cabinet minister – in relation to other

nodes of a specific government, the specific government, and historical properties of

7

the government – has any effect on the ability of the cabinet minister to succeed (i.e.

serve in future governments) is an important question.

Mathematical modelling of governments has received some academic attention, both

at the macro level by exploring grassroots support for different types of governments,

such as dictatorships and democracies [1], as well as the micro level, by exploring

characteristics of individual politicians. One example of the latter approach is the

creation of survival models for individual cabinet ministers in the United Kingdom [7],

looking at properties of governments, ministerial experience and personal histories,

as well as external events to predict how long ministers will serve. Furthermore,

analysis of entire governments, such as Congresses in the United States [79], political

parties [74], legislative committees and subcommittees [14,54,55], as well as supreme

court precedents [21], has also been performed. Some of these studies employ network

measures like modularity (discussed in Section 4.3).

In the context of cabinet memberships and survival rates of individual ministers, a

relatively small body of research exists for modelling coalition governments and the

longevity thereof [18]. Indeed, numerous factors and variables affect how govern-

ments form following an election, and how parties select their ministers. Institutional

constraints, such as internal party politics, constitutional guidelines, and electoral

outcomes all play a significant role in whether a politician will become, or continue

to be, a minister in a future government [67]. In general, it appears such research is

split into two fronts [34]:

1. Cabinets terminate or change due to various types of events, all of which can

be treated as random events. As such, the models themselves are stochastic.

2. Cabinet duration is based on the individual properties of the cabinet and its

members. Properties could include election results and the institutional con-

straints mentioned earlier.

Mathematically modelling governments, cabinets, and political systems is a nascent

and increasingly active research effort. At this stage, it appears that cabinet mem-

bers’ affiliation networks and prior cabinet experience has not been studied. This

is rather surprising because evidence shows that the relationships between ministers

has an impact on their survival rates [20], as does their membership in coalition

governments [68]. While studies of individual types of ministers, such as foreign

8

ministers [20], have been performed, studying relationships been all the ministers of

specific cabinets seems to be absent [59, 69].

With such a stark absence of relational, network-focused analysis of cabinet members

and their long term involvement in cabinets, a pertinent research question is whether

such information is useful and helps build fruitful models. This is a key question

within this research project, and we try to explore network effects at both the indi-

vidual node level, as well as the global level mentioned in Section 2.2. With this in

mind, we aim to isolate and analyze network variables individually, and investigate if

they allow us to build models that outperform baseline results.

9

Chapter 3

Data

The data being analyzed and modelled in this dissertation is based on cabinet data

of twelve parliamentary democracies from 1945 to 1990 [77]. This data was provided

to us in a digital format by Dr. Brian Uzzi and Dr. Daniel Diermeier. Parliamen-

tary democracies differ greatly, and may change “governments” – the portfolios of

cabinet members – for various reasons, including elections, corruption scandals, or

deaths. Hence, the number of governments that were formed during the 45 year

period being analyzed differs greatly between countries. Table 3.1 shows the coun-

tries being analyzed, the number of governments the period encompasses, and other

general information.

One can create graphs by aggregating information over multiple years. Specifically,

one can see which ministers worked together in the past and weight edges accord-

ingly. In the case of this dissertation, edge weights represent the number of times

the two cabinet members in the edge have worked together up to and including the

current government (as discussed in Section 1.1). In previous studies on different

networks, researchers [27] defined edges as ones between different types of groups,

specifically: newcomer-newcomer, newcomer-incumbent, incumbent-incumbent. An-

other possibility is to explore inter- versus intra-party edges, as countries that permit

coalition governments or have proportional representation voting systems often have

governments composed of members from multiple parties.

The biggest concern about the data is its completeness and applicability to relevant

modelling questions. The data is composed of membership lists of governmental cabi-

nets in various countries, along with party information and, sometimes, the ministers’

portfolios. However, a minister’s ability to stay in power, a coalition government’s

10

Country Govts Positions CMsCanada 18 406 197Denmark 27 366 156France 28 532 146Germany 23 510 224Israel 35 317 65Japan 35 508 318Luxembourg 15 112 55Netherlands 21 296 159New Zealand 19 310 139Norway 23 360 186Sweden 21 374 127UK 18 367 206

Table 3.1: Overview of cabinet data. “Govts” represents the total number of govern-ments between 1945 and 1990, “Positions” is the number of cabinet positions availableduring the period, and “CMs” is the number of unique cabinet members (nodes) thatfilled those positions.

ministerial composition, and other related variables are often dependent on external

factors, such as election results and other aspects of the historical context of the

government. Because this data was not available to us, our models are inherently

incomplete. If our aim were to build a complete model of governments in parliamen-

tary democracies, we would be required to combine the current data set with other

information.

Nevertheless, the data that we have allows us to attempt to answer important ques-

tions. Due to the dearth of academic research on network-based or structural variables

and their correlation with cabinet member survival rates, our project aims to see if

such variables provide any useful information when modelling and exploring cabi-

net memberships. With positive results, it has the potential to improve pre-existing

models of politician longevity and cabinet formation theories discussed in Chapter 2.

A discussion on the code written to analyze the data is provided in Appendix B.

11

Chapter 4

Network Measures

Summary statistics provide a general overview of important network measures. De-

pending on the mathematical underpinnings of the measures, one can compare dif-

ferent networks to each other, or simply use the information to see how a network

evolves over time. A further discussion on some of the measures below is included in

Appendix D.

4.1 Centrality and Centralization

The idea behind a “centrality” measure is describing the role that a node plays within

a network [71]. Specifically a more “central” node is considered more important,

potentially because it is connected to more nodes, controls relationships between

disparate groups, or for other reasons. The measures below strive to define a node’s

centrality in a network in different ways. While this is not a complete list of centrality

measures, the choices below all focus on different aspects of a network and show how

certain nodal properties can be used to judge the importance of a node to a network.

Once node centralities are calculated for all the individual nodes in a network, one

can calculate a graph “centralization”, which examines how centralities are dispersed

in a network. The goal of a graph centralization is to be able to compare diverse

networks to each other.

In general, any measure of centrality can be generalized into a graph centralization

12

measure using Freeman’s theoretical maximum [23].

CX =

∑ni=1 [CX(v∗) − CX(vi)]

max∑n

i=1 [CX(v∗) − CX(vi)](4.1)

v∗ = maxiCX(vi)

This normalization guarantees that 0 ≤ CX ≤ 1, because the denominator is the

theoretical maximum for a network with n nodes.

4.1.1 Degree Centrality

The simplest centrality measure is degree centrality [71], which represents the number

of edges associated with a specific node, also called the node’s “degree”. That is,

CD(vi) =n∑

j=1

aij . (4.2)

Note that one can set C ′D(vi) = CD(vi)/(n− 1) to normalize the value for a network

with n nodes.

In a weighted network one has “strength” rather than degree, where

ki =n∑

j=1

aij . (4.3)

Note, however, that since aij ≥ 0 rather than aij ∈ 0, 1 in the unweighted case, one

would have to normalize by the maximum value of kj for j = 1, . . . , n.

The graph that attains the maximum theoretical value for unweighted degree cen-

trality is a star graph. In an n-star graph, one has n nodes, with one node in the

center of the graph and the n−1 other nodes only connected to this initial node. The

degree of the central node is n− 1 and all other nodes have degree 1. This yields

max

n∑

i=1

(CD(v∗) − CD(vi)) =

n−1∑

i=1

n− 2 = (n− 1)(n− 2).

With this, we can define degree centralization as

CD =

∑ni=1 [CD(v∗) − CD(vi)]

(n− 1)(n− 2). (4.4)

13

4.1.2 Closeness Centrality

A second centrality measure is based on distances, with the idea that a vertex with

the smallest average distance to all other vertices is the most central and thus the

most powerful (e.g., can spread information most quickly). If we define the distance

between two vertices as d(vi, vj), then the closeness centrality [23] of a vertex is the

inverse of the averages of the distances:

CC(vi) =n− 1

∑nj=1 d(vi, vj)

. (4.5)

The reason we take the inverse is that in an unconnected graph, the distance between

nodes without a path between them is infinite. By taking the inverse, one simply

defines the inverse of the infinite distance as zero.

For the centralization measure, a star graph provides the maximum theoretical value

of closeness centrality. This is because the value of CC(v∗), where v∗ is the central

node in the graph, is equal to 1, as the distance from the central vertex to all others

is 1. In the case of other vertices, each vertex has a distance of 2 to n − 2 vertices,

and a distance of 1 to the central vertex. Thus,

maxn∑

i=1

(CC(v∗) − CC(vi)) =n−1∑

i=1

(

1 − n− 1

2(n− 2) + 1

)

=(n− 2)(n− 1)

2n− 3.

With this, we then define closeness centralization as

CC =(2n− 3)

∑ni=1 [CC(v∗) − CC(vi)]

(n− 1)(n− 2). (4.6)

4.1.3 Betweenness Centrality

Betweenness centrality is defined as the proportion of shortest paths (called “geodesics”)

that a specific node is in, within the entire network. Define qij(vk) as the number of

geodesics that the node vk is on for the nodes vi and vj , with qij being the total num-

ber of such paths. Then set bij(vk) = qij(vk)/qij . One can then define betweenness

centrality [22] as

CB(vk) =n∑

j=1

j−1∑

i=1

bij(vk), (4.7)

14

where k 6= i, j.

The maximal value for vk in such a network occurs with a star graph [22]. This is

shown by starting with a completely disconnected graph G with n nodes and m = 0

edges. Adding one edge (i.e., letting m = 1) does not affect betweenness. If in the

previous example one adds the edge (vk, vi) for some node vi, one can now add another

edge (vk, vj) for some other node vj. This now maximizes the betweenness centrality

of vk for the graph G at m = 2. One can do this n−1 times to maximize the value of

the betweenness centrality for vk, at which point adding any new edges will decrease

this value.

In a star graph, there are (n − 1)(n − 2)/2 possible paths between the n − 1 nodes

that are not the central node. Because each vi, vj combination has only one geodesic

with the only intermediary being the central node, the central node has a betweenness

centrality of (n− 1)(n− 2)/2 and all others have 0.

Freeman [23] uses this information to normalize the measure above to obtain

C ′B(vk) =

2CB(vk)

(n− 1)(n− 2). (4.8)

A star graph still has the maximal value for the normalized betweenness centrality in

(4.8), as one is simply dividing by a constant. This yields the theoretical maximum,

max

(n∑

i=1

C ′B(v∗) − C ′

B(vi)

)

=

n∑

i=1

C ′B(v∗) −

n∑

i=1

C ′B(vi)

=2

(n− 1)(n− 2)

[n∑

i=1

CB(v∗) −n∑

i=1

CB(vi)

]

=2

(n− 1)(n− 2)[(n− 1) − 1]

=2

n− 1.

Thus, one can define betweenness centralization as

CB =2 (∑n

i=1 (C ′B(v∗) − C ′

B(vi)))

2(n− 1)=

∑ni=1 (C ′

B(v∗) − C ′B(vi))

(n− 1). (4.9)

15

4.1.4 Eigenvector Centrality

The idea behind eigenvector centrality [10, 33] is that a node is important based on

how important its neighbours are. For example, a node with a high degree need

not be the most important one, if its neighbours have very low degrees. Indeed,

its underlying theory is similar to that of Google’s PageRank [52]. As such, one

can define the centrality measure as being proportional to the average value of one’s

neighbours. For a node vi, its centrality, xi can be defined as

xi =1

λ

n∑

j=1

aijxj (4.10)

where A = aij is the adjacency matrix of the network. Multiplying by λ, one can

easily see that this is simply the calculation of the eigenvector. In other words, we

have

λx = Ax. (4.11)

The eigenvector centrality of a node vi is the ith component of the eigenvector as-

sociated with the highest eigenvalue of A, and all entries in this eigenvector are

non-negative [48]. This non-negativity is discussed and proven in Appendix E.

4.2 Clustering Coefficients

The clustering coefficient is a measure of transitivity of relationships; it seeks to

measure how often two neighbours of a node are likely to be connected with each

other. This can be defined as the proportion of complete triangles to paths of length

two in the network [47]:

T =3 × number of triangles in the network

number of connected triples of vertices. (4.12)

A second definition exists as well, and provides for a localized interpretation of the

clustering coefficient. In this case, the clustering coefficient [73] for a node vi is

Ti =2∑n

j=1

∑nk=1 aijaikajk

ki(ki − 1). (4.13)

The advantage of (4.13) is that it provides information about individual nodes in

the network. This is useful if we want to judge whether certain nodes tend to have

16

more transitive social ties than others. The network-level clustering coefficient is then

taken as the average of the values Ti for i = 1, . . . , n.

There is no commonly accepted weighted definition of clustering coefficient, and nu-

merous versions exist. This provides a useful opportunity for using this data set to

help understand the differences between coefficient definitions while also using the

coefficients to model cabinet networks. Papers are still being written to compare def-

initions and individual properties [62]. Some examples of clustering coefficients are

given below.

4.2.1 Subgraph Intensities

Onnela et al. [51] describe a definition for the weighted clustering coefficient using a

concept of “subgraph intensities”. Given a subgraph G of G, where G is of size n,

one can define the subgraph intensity I(G) as

I(G) =

n∏

i=1

n∏

(j=1, j 6=i)

aij

1/[n(n−1)]

. (4.14)

Onnela et al. justify such a definition for a subgraph intensity because low values

of I(G) imply that the subgraph is of lower importance that one with a high value.

One can then define a local clustering coefficient by considering all of the subgraphs

containing a specific node vi and every permutation of two neighbours. Normalizing

the weights of the edges by the maximal edge in the graph will always give a value

between 0 and 1. Thus, the weighted clustering coefficient is

TO,i =2

κi(κi − 1)

N∑

j=1

N∑

k=1

(aij ajkaki)1/3 , (4.15)

aij =aij

arg maxi,j aij, (4.16)

where κi is the number of edges connected to vi.

17

4.2.2 Upper Bounds

Zhang and Horvath [78] define an unweighted clustering coefficient as Ci = wi/πi,

where

πi =κi(κi − 1)

2, wi =

1

2

∑

u 6=i

∑

v | v 6=i, v 6=u

aiuauvavi. (4.17)

For a weighted version where 0 ≤ aij ≤ 1 for all i, j, Zhang and Horvath argue that

πi can be preserved, and one must then find an upper bound for wi. Note that if edge

weights are larger than 1, then they are divided by the largest weight in the network.

An upper bound for wi can be calculated using the inequality aij ≤ 1 − δij where δij

is the Kronecker delta. We then have,

wi =1

2

∑

u 6=i

∑

v | v 6=i, v 6=u

aiuauvavi ≤1

2

[∑

u 6=i

aiu

]2

−∑

u 6=i

a2iu

. (4.18)

Steps showing how to obtain the above are discussed in Appendix D.1.

Thus, one can set πi as in Equation (4.17) and define the weighted version of the

clustering coefficient as Ci = wi/πi. Note that this is equivalent to setting [32],

TZ,i =

∑Nj=1

∑Nk=1aij ajkaik

∑

j 6=k aij aik. (4.19)

4.3 Homophily

A common question in social network analysis is whether nodes tend to connect

to similar or dissimilar nodes, and this feature is called “homophily”. Homophily

comes in numerous forms [41], including nodes connecting with other nodes of similar

strength or degree, similar social status, gender, age, or numerous other character-

istics. While such factors play a crucial role in sociological research, a key issue is

measuring homophily within a network in a quantitative fashion.

4.3.1 Heterogeneity

A simple approach to defining homophily is by counting categorical node groups

within a network. Rather than exploring the actual edges in a network, one simply

18

delimits the different categorical groups that nodes belong to. One then calculates

what the probability is that, if one picks two nodes uniformly at random in a network,

the two nodes will come from different groups. This measure, called “heterogeneity”

[8] is sometimes defined by

QH = 1 −g∑

i=1

p2i (4.20)

where pi is the proportion of the nodes in the network that belong in the ith group.

A problem with the above is that if the size of the network is small, one must sample

without replacement. Thus, an improved version of Equation (4.20) is [64]

QS = 1 −g∑

i=1

si

n

si − 1

n− 1, (4.21)

where n is the number of nodes in the network, and si is the number of nodes in the

ith category.

4.3.2 Modularity and Assortativity

If nodes in a network can be categorized into discrete categories, one can build a

matrix representing the number or strength of connections between and within cate-

gories. Define a matrix M = mij where i represents the ith category, and j the jth

category. Then define mij as the sum of strengths of edges from nodes in category i

to category j, divided by the sum of all strengths in the network. The matrix M can

be called a “mixing matrix”. We can then define the network’s “modularity” [49],

QM =∑

i

mii −∑

ijk

mijmki. (4.22)

Modularity is zero when there is no homophily or heterophily (nodes from different

categories being attracted to each other), negative when heterophily is prevalent, and

positive when the network is homophilous.

If comparing multiple networks, one needs to normalize the modularity. For example,

one can define the assortativity coefficient [46]

QA =

∑ni=1mii −

∑

ijkmijmki

1 −∑ijk mijmki=

QM

1 −∑ijk mijmki. (4.23)

19

Chapter 5

Descriptive Overview of theCabinet Data

With the overview of network measures provided in Chapter 4, it is possible to explore

the data and try to understand the various countries and political systems in the data

set.

5.1 Individual Overview

At the individual level, we observed few universal patterns in the parliamentary

democracies in this study. The correlations between clustering coefficients and max-

imum edges in governments appear significant and negative for all countries, ranging

from −0.9203 for Canada to −0.6568 for Israel. A relationship between clustering

coefficient and maximal edge could be a byproduct of mathematical definitions, as

discussed in Section D.3.

Another interesting observation is recorded in Table 5.1, showing the correlation

between the number of parties involved in the current government and the clustering

coefficient of the government itself. While we do not have an intuitive explanation

for this, it might be an artifact of the data gathering and interpretation process.

Specifically, the correlation between clustering coefficient and number of parties for

Luxembourg is −0.06492 rather than 0.5386 if the first government (where clustering

coefficient is 1 for all nodes) is ignored.

By observing clustering coefficients over time, as shown in Figure 5.1 for Canada and

20

Country Cor(# parties, TO)Canada .

Denmark 0.1366France −0.1386

Germany −0.2304Israel −0.1369Japan 0.3982

Luxembourg 0.5386Netherlands −0.1715

New Zealand .Norway 0.2656Sweden 0.5978

UK .

Table 5.1: Correlations between the number of parties and TO. Dots represent anundefined correlation because the number of parties stayed constant at 1.

Luxembourg, ones sees that the initial measurements for clustering coefficient could be

misleading. In the case of Canada, in Figure 5.1(a), the clustering coefficient shoots to

1 every few years. This occurs whenever an entirely new government (one without any

previous connections between members) is instituted. This is less prominent in the

case of Luxembourg, in Figure 5.1(b), but one can still see that clustering coefficient

increases noticeably in a similar fashion. Many network measures, including most

centralizations, appear to be volatile like this — whenever a country experiences a

large change in government, the network measures change drastically as well.

At the individual level, what is perhaps more telling about this research is how coun-

tries differ due to their political histories. Figure 5.2 shows heterogeneity (QH) for

Japan. Notice how the measure changes for every government, and eventually drops

off to zero. This signals the transition of power to the Liberal Democratic Party,

which was formed in 1955 and maintained a continuous majority in government past

1990 [37].

As such, while individual network measures provide interesting insights on a case-

by-case basis, it appears that little can be gleaned from a descriptive point of view,

especially without incorporating political contexts and histories.

When patterns do appear to emerge, they are often remnants of the mathematical re-

lationships between various network measures and simple properties of the countries

themselves. For example, there is a strong correlation between degree centralization

and closeness centralization in the context of the entire historical cabinet network

21

5 10 15

0.2

0.4

0.6

0.8

1.0

Government

Clu

ster

ing

Coe

ffici

ent (

Onn

ela)

(a) Canada.

2 4 6 8 10 12 14

0.3

0.5

0.7

0.9

Government

Clu

ster

ing

Coe

ffici

ent (

Onn

ela)

(b) Luxembourg.

Figure 5.1: Clustering coefficients over time.

(and evolution thereof) for each country, except if the countries have networks with

more than one connected subgraph (i.e. “component”) [9]. If a country has a fully

connected ministerial network (i.e. in the case of countries with proportional rep-

resentation and coalition governments), as degree centrality decreases, there is less

inequality between nodes, in terms of degree. Thus, nodes likely have smaller dis-

tances between each other, which brings down closeness centrality measures as well.

Such a phenomenon is not observed when multiple components are present because

closeness centrality is always extremely low in this case. Low closeness centralities oc-

cur in graphs with multiple components because there are infinite distances between

nodes in distinct components.

Nevertheless, this provides a useful caveat for any models we develop. We must ensure

that our network measures are not simply proxies for simpler or qualitative properties

of governments or nodes but actually depend on the network structure of individual

cabinet members.

5.2 Comparative Overview

Table F.1 (in Appendix F) lists the countries in this study and some of the mean

measures calculated for every government in the data set. A number of very in-

teresting properties seem to emerge. While not listed in the table, the number of

components in cabinet networks aggregated over the entire historical period tends

22

0 5 10 15 20 25 30 35

0.0

0.2

0.4

0.6

Government

Het

erog

enei

ty

Figure 5.2: Blau’s Heterogeneity measure over time in Japan.

to be 1. Indeed, even in countries that never see coalition governments, the entire

historical cabinet network tends to be one component due to political mechanisms

like ministers switching parties during their careers. One interesting case is that of

New Zealand, which has three components (i.e. three sets of ministers have never

worked together). Furthermore, a majority (seven of twelve) of countries have mean

cabinet sizes between 15 and 25 people, while the rest tend to be smaller.

In some cases, it is the lack of a relationship that proves interesting. A useful property

is the maximal edge weight of a cabinet network, which represents the largest number

of cabinets two ministers have in common with each other in a given government.

The number of governments and mean maximal edge weight of a country do not

seem to be highly related. The correlation between the two has a value of 0.4136,

with a p value of 0.1814, meaning this relationship is not statistically significant.

Specifically, this implies that having a large number of governments within the 45

year period we analyze, such as the case with Israel or Japan, does not necessarily

translate to cabinet comemberships lasting in more governments. Furthermore, there

is a significant relationship (p value = 0.002427, correlation 0.7862) between the

mean cabinet size and the mean number of newcomers to a government, but this

relationship disappears when one examines the ratio of newcomers to cabinet size

rather than absolute numbers.

Another way to analyze the cabinet networks is by visualizing them, and this is

discussed in Appendix C.

23

Chapter 6

Governmental Models

We use “governmental” models to try to explain and predict network-wide patterns

within countries and their governments. While nodes still represent cabinet members,

one does not try to follow the progression of a specific node but rather tries to deter-

mine whether specific types of nodes tend to survive and progress through multiple

cabinets.

How such global parameters cause networks from different countries to vary is useful

from a comparative political perspective, as they allow us to see how countries differ.

For example, consider a model that accepts a parameter that uses the number of

years a node has been in power to calculate the probability the node will survive into

a future cabinet position. Upon fitting the model to the various countries in the data

set, we might be able to see that some governments seem to treat experienced nodes

favourably, whereas other governments do not. Such a conclusion, especially if com-

bined with other data about the individual countries, could be politically significant.

For instance, comparing parameter values between countries might allow us to see

how different constitutional or institutional properties affect which types of ministers

actually gain and keep power.

6.1 Basic Model

A simple approach for modelling cabinets within parliamentary democracies assumes

that nodes and their structural properties have no effect on whether a cabinet member

will be included in a future cabinet. This model thus assumes that every member of

24

a cabinet at time t has an equal probability of being in a cabinet at t+ 1.

This model, termed the “basic model”, assumes that countries have a fixed cabinet

portfolio of n members1, and each member has a probability p of being re-elected

and included in the next cabinet. If a member drops out of the cabinet, he or she

will never return. Because the model is based on probabilities, the survival rates

of cabinet members and percentages of incumbents in specific governments become

random variables as well.

We assume that the size of government and probability of being selected into the

following cabinet are both fixed. Because the probability is the same for every member

of cabinet, the number of ministers who move on to the next cabinet follows a binomial

distribution, bin(n, p), where n is the size of government, and p is the probability of

moving on. Specifically, the probability that k cabinet members will be present in

governments at times t and t+ 1 is [60]

Pr(K = k) =

(nk

)

pk(1 − p)n−k. (6.1)

Thus, using the mean and variance of a binomial distribution, we know that the

number of incumbents in a cabinet is np with a variance of np(1 − p).

Because incumbent cabinet members are selected from the prior cabinet only, a cab-

inet member will stay in office following a geometric distribution, geo(p) [60]:

Pr(K = k) = pk−1(1 − p) (6.2)

This implies that the expected number of governments that a cabinet member will

serve in is 1/(1 − p), with a variance of p(1 − p)−2.

6.1.1 Membership Predictions

Due to the simplicity of the basic model, it is possible to analyze the expected rates of

incumbents and survival rates of politicians in detail. Specifically, we are interested

not only in the number of incumbents in a cabinet, but also the number of cabinets

each member has already served. Thus, given a government gt, let I1 represent

1We use n to represent the size of the government because the government exists in a networkcontext.

25

the number of incumbents from the previous government (i.e. from gt−1). Thus,

I1 ∼ bin(n, p), so that I1 is a random variable taken from a binomial distribution.

To see how many incumbents have been in office since gt, where t < t, we use an

inductive approach. For every government between t and t, the incumbent had to be

reselected in our model. Because each selection takes place with a probability of p

and we make t − t selections, there is a probability pt−t that the specific node in gt

will still remain in gt. As such, It ∼ bin(n, pt−t), yields an expectation of npt−t.

Another way of looking at the above is that at gt−1 the expected number of nodes

from gt is equal to npt−1−t. Because each of these nodes continues to serve in gt with

a probability of p, it follows that It ∼ bin(npt−1−t, p).

Let Jt,u be the set of ministers who entered the model at time u and are still present

at time t. Also, let Gt represent the set of ministers that served in government gt.

Thus, we have a government gt composed of the set

Gt = ∪ti=1Jt,i

= Jt,t + ∪t−1i=1Jt,i

= newcomers ∪(∪t−1

i=1Jt,i

).

It is possible to get the expected numbers of cabinet ministers in gt that began

serving as ministers from times prior to t. Specifically, we have G2 = J2 ∪ J1 =

newcomers ∪ incumbents from g1. Because the distribution for each step is bi-

nomial, and newcomers and incumbents are disjoint sets, we obtain: n = E(|G2|) =

E(|J2|) + E(|J1|) = (n− np) + np.

Similarly, we have that n = E(|G3|) =∑3

i=1 E(|Ji|). Because every binomial step (i.e.

government change) is considered to be independent of the last one, we thus have

n = E(|J3|) + E(|J2 ∪ J1|) = (n−np) +np = (n−np) + (n−np+np)p. This leads to

n = (n− np)︸︷︷︸

newcomers

+ (np− np2)︸︷︷︸

incumbents from G2

+ np2

︸︷︷︸

incumbents from G1

. (6.3)

Using induction, we then get that for Gt, the expected number of incumbents from

26

different governments is:

E(|Jt,t|) = np0 − np1 = n− np (newcomers),

E(|Jt,t−1|) = np1 − np2,...

E(|Jt,k|) = npt−k − npt−k+1, (6.4)...

E(|Jt,2|) = npt−2 − npt−1,

E(|Jt,1|) = npt−1.

In other words, for gt, one has an expected number incumbents that entered the

cabinet at time k equal to npt−k(1 − p).

Using this information, one can create an “experience matrix” that shows the number

of newcomers and incumbents from specific governments:

Ω = ωij =

n 0 0n− np np 0n− np np− np2 np2

(6.5)

Equation (6.5) shows the expected number of cabinet members for a country with

three successive governments of size n with probability p of a cabinet member being

reselected for a post. Matrix rows represent successive governments, and columns

represent the government in which the cabinet member started his or her ministerial

career. Thus, ωij represents the nodes in government i that joined j− 1 governments

ago.

One can calculate the variance of the values in the experience matrix similarly. Specif-

ically, observe the third row of the experience matrix. In this case, one has an ex-

pectation of n− np for the number of newcomers, np− np2 for incumbents from the

second government, and np2 for incumbents from the first government. Observe, how-

ever, that in each case we look at expected numbers of incumbents, we are actually

observing various binomial distributions:

(np− np2) = E[(bin(n, p− p2)],

(np2) = E[bin(n, p2)].

27

Thus, using the expectation in Equation (6.4), one can see that for the kth row and

mth column of the experience matrix, where 1 < m < k, each entry is represented

by the random distribution bin(n, pm−1 − pm). This implies the variance is n(pm−1 −pm)(1 − pm−1 + pm).

6.2 Single-Measure Models

As discussed in Chapter 2, a key question within cabinet and coalition theory is

understanding whether the network structure of the cabinet plays any significant role

in ensuring that cabinet members remain in power. Furthermore, the position a node

holds within a social network can significantly impact its ability to gain access to

resources, information, and other forms of social capital. A politician’s position in a

cabinet network could similarly impact his or her access to forms of capital, and it is

worth testing whether the structural properties of a network do indeed impact nodes

in such a way.

With this in mind, we build a probabilistic model that works in a similar fashion as

the basic model, but with probabilities that depend on politicians’ specific structural

properties in the cabinet networks. As in the case of the basic model, we assume that

the cabinet members are only selected from the preceding government, and that their

probability of doing so is proportional to some node-based measure, f(ci), where f is

a function that outputs the relevant measure for the node in question. Thus,

P (ci) ∝ f(ci)

P (ci) = αf(ci)

where P (ci) is the probability that node ci will have a portfolio in the next cabinet

and α is a constant.

We can thereby fit the model to any node-level network metric. Strength, centralities,

clustering coefficients, assortativity, and other measures are all relevant and useful

candidates for such a model. Furthermore, if such a model does better in reducing

error (discussed in Chapter 8) compared to the basic model in Section 6.1, we can

conclude that network measures play a role in ensuring the survival (or death) of

politicians’ careers as cabinet members.

28

A further advantage of using single-parameter models is that one can use simple and

fast optimization techniques to analyze them. One such technique is the Golden

Section Search described in Section 7.1.

6.2.1 Strength Model

As an example, consider a model based on the strength of individual nodes. In this

case, the model assumes that the probability a cabinet member will be a member

again is proportional to his or her strength. From a political science perspective,

strength represents how long a certain party or group of politicians have been in

office, and whether or not politicians are more likely to survive in cabinets if they

work with other survivors. This yields

P (ci) ∝n∑

j=1, i6=j

aij =

n∑

j=1

aij (6.6)

where we recall that aii = 0 for all i.

Suppose we denote the expected number of incumbents from the previous year by β.

Then for some constant b we have,

β ≡ E[|incumbents|] =

n∑

i=1

P (ci)

=

n∑

i=1

(

b

n∑

j=1

aij

)

= bn∑

i=1

n∑

j=1

aij . (6.7)

At the start of a simulation, we have a matrix A = aij where aij = 1 for i 6= j and

aii = 0. Thus, the sum in Equation (6.7) is n(n − 1). This means that for the first

iteration of the model, if we want to expect β incumbents, we must have

b =β

n(n− 1). (6.8)

6.2.2 The Strength Model as a Bipartite Network Model

One reason that we consider the example of the strength model is that it is equiv-

alent to a bipartite network model. Based on our current approach to modelling

29

governments, our networks have very specific properties:

1. Every cabinet is represented by a clique of a fixed size n.

2. Edge weights represent the number of times two politicians have served in the

same cabinet.

Using these points, there is a direct correspondence between a minister’s strength and

the number of times he or she served in a cabinet position. Specifically, the strength

ki of node ci, is a function of the number of governments φi in which a node has

served:

ki = φi(n− 1). (6.9)

The equivalent bipartite model has two types of nodes: governments, gt for some

discrete time value t, and the cabinet members themselves, ci. We can set up a

system where members of gt have a probability of remaining in gt+1 proportional to

their experience:

P (ci in gt and in gt+1) ∝ deg(ci), (6.10)

where deg(ci) represents the degree of ci in the bipartite network. Note that in this

case, degree is equal to the number of governments served by ci, because nodes have

edges connecting them to the governments they served. Thus, we have

ki = (n− 1)φi = (n− 1)deg(ci). (6.11)

Inserting this into the strength model discussed in Section 6.2.1 gives the following

probability:

P (ci) = bki = b(n− 1) deg(ci) = b deg(ci),

where b is a new constant. Thus, the strength model with parameter b is equivalent

to a bipartite network model with a parameter b = b(n− 1).

6.2.3 Other Properties

The start of every model that we consider involves a network of n nodes connected

to all other nodes in the network with edge weights of 1, so one can make basic

30

Measure Type ψ valueStrength n− 1

Clustering Coefficient (any) 1Eigenvector Centrality any

Table 6.1: Various network measures and their ψ values. Note that the eigenvectorcentrality can have any ψ value, as the eigenvector for a clique is any vector withequal entries. This is discussed in Appendix G.

predictions on the expected number of incumbents in the second government. Because

every node in the network is structurally equivalent – that is, it has an equivalent

network position when one considers neighbours and descriptive network measures –

we have

E[|incumbents|] =n∑

i=1

P (ci)

=

n∑

i=1

bf(ci)

= nbψ,

where ψ is the measure’s value for a clique with edge weights of 1. Table 6.1 shows

ψ values for typical network measures.

6.2.4 Perturbing the Optimal Probability

Another approach we use to build models based solely on one network measure is

having the probability that nodes remain in a cabinet position be based on a prob-

ability that is then slightly modified based on a network measure. In this case, we

have

P (ci) = P0 + αf(ci) (6.12)

for some constant α and an initial probability of reselection P0. The advantage of

this model is that one can obtain the optimal probability for the basic model and

use this as the value for P0, with α = 0. One can then use simulated annealing or

another optimization algorithm to see if slightly changing P0 and α will yield lower

error rates.

31

6.3 Preferential Attachment

The preferential attachment model is based on the idea that those who already have a

large degree or strength are more likely to receive connections from new nodes entering

a network. In social science, this is sometimes called the “Matthew Effect” [42], based

on the Bible verse that discusses how rich people tend to get richer. The idea of the

Matthew Effect applies to network modelling, as it has been observed that different

types of social networks exhibit properties where nodes with large strengths or degrees

tend to gain more edges disproportionately faster than other nodes [57, 63].

The “preferential attachment” mechanism [5] is an application of the Matthew Effect

idea to network analysis. It strives to explain scale-free distributions of degrees within

networks. Specifically, a network is considered “scale-free” when, given a constant γ,

the distribution of nodes with degree k is the power law,

P (k) ∼ k−γ . (6.13)

The reason the above type of network is called “scale-free” deals solely with power

law form of the degree distribution [47]. Specifically, a scale free function satisfies

P (ak) = bP (k) for some a and b, and all k.

The preferential attachment model assumes that a network increases in size by the

addition of new nodes. The degree of new nodes is bounded by a predetermined value,

and the probability that the newly-added edge connects to a specific pre-existing node

depends on the latter node’s degree. The probability that a node vi will be connected

to the new node is

P (ki) =ki

∑Nj=1 kj

, (6.14)

where ki is the degree of node vi and P (ki) is the probability that the new node will

have an edge connected to vi.

The models described earlier all focus on selecting cabinet members that have served

only the previous government, while the present preferential attachment model selects

from a pool of all ministers in the entire history of the country. A second variable,

representing the aging bias is included as well. Ministers cannot serve forever, and

depending on how old a minister is, he or she may be forced into retirement. The

32

preferential attachment model is thus

Pc(ki, zi) =α1ki + α2zi

∑

j(α1kj + α2zj), (6.15)

where zi represents the number of governments (i.e. “experience”) that the cabinet

minister has served in the past. Weights associated with the strength and experience

variables are represented by α1 and α2, respectively.

We built another model where the probability is based on α1ki + α2zi + α3z2i . The

reason for this is that experience in multiple governments may be beneficial in the

short-term, but due to the limited biological capacity of politicians, they eventually

have to remove themselves form political careers. Thus, one would expect to have

α2 > 0 and α3 < 0.

A second difference between Barabasi and Albert’s [5] preferential attachment mech-

anism and ours is that in our case, a clique of new cabinet ministers is added at every

time step, rather than just an individual node. This clique is of a predetermined size

n, and n− n incumbents are then selected to join the newcomers. Each cabinet mem-

ber is chosen with a probability equal to Pc, and incumbents are sampled without

replacement.

While not implemented in this dissertation, a second approach to dealing with age

and other factors that might affect the attractiveness of a node to an incoming node

is to multiply node vi’s strength by a weight [2],

P (ki) =ηiki

∑

j ηjkj

, (6.16)

where ηi is some fitness parameter. In the case of the cabinet networks and our

function Pc, this function could be dependent on the age of the politician.

33

Chapter 7

Model Optimization

We implemented a number of optimization algorithms for the modelling task outlined

in this report.

7.1 Golden Section Search

The Golden Section Search algorithm [56] is based on the idea that if we have three

points, x1 < x2 < x3 on an interval with f(x2) < f(x1) and f(x2) < f(x3) then, if

the function is unimodal and continuous, there is a x∗ ∈ (x1, x3) such that f(x∗) is a

local minimum.

As a consequence of the previous text, the algorithm begins with points a and c on

an interval1 [a, c], and looks for b and x as illustrated in Figure 7.1. We then define

w to satisfyb− a

c− a= w,

c− b

c− a= 1 − w.

Thus, w is the ratio of the distance between b and a versus the length of the entire

interval. If we assume that the next point in the interval will be x, we can then define

the distance z as the distance between x and b over the entire interval length:

x− b

c− a= z

1While in the initial example the interval for the minimum point was open, we treat it as closedin the actual algorithm because aside from the first iteration, we do not know if our minimum pointwill appear on a boundary.

34

Figure 7.1: Illustration of the Golden Section Search algorithm, where w = (b −a)/(c− a) and z = (x− b)/(c− a).

We then choose either x to form a new interval [a, x] or b to form the interval [b, c].

To minimize the worst-case outcome, we require that 1−w = w+ z, which allows us

to obtain z = 1 − 2w.

Furthermore, by this approach, we thus require |b− a| = |x− c|.

We now have an optimal way of choosing z based on w. Because this is an iterative

approach, we can assume that w was picked in a similar way. Thus, we require that

the ratio between z and 1 − w be equal to w. This yields

z

1 − w= w

⇒ w2 − 3w + 1 = 0

⇒ w =3 ±

√5

2.

Because we require 0 < w < 1, we take the negative root above, so w ≈ 0.38197.

With this in mind, we can then choose points x and b based on the value of w, above.

If we see that f(x) < f(b), then our new interval in the next iteration will be [b, c].

If f(x) > f(b), the interval will be [a, x].

7.2 Nelder-Mead Algorithm

The Nelder-Mead algorithm is a nonlinear optimization technique that uses an (n+1)-

dimensional simplex to find maxima or minima of a given n-dimensional function [45].

This is a popular algorithm for non-linear optimization and is implemented in MAT-

LAB in the fminsearch function. Because it is difficult to conclude whether the error

functions associated with the parameter space of the models described in Chapter 6

are linear or follow any strict patterns, implementing a nonlinear search technique

35

can help determine optimal parameter values. Note, however, that we implement our

own version of the algorithm.

We begin an implementation of the Nelder-Mead algorithm by being given n + 1

points x1, . . . ,xn+1 in a parameter space Rn, ordered in such a way that f(x1) ≤

. . . .. ≤ f(xn+1). In our case, the goal of every iteration of the algorithm is to replace

xn+1 with a vector that decreases the error function (i.e. our objective function). The

actual implementation of the algorithm follows MATLAB’s own [35], and the steps

of an iteration are:

1. Order the points as described above.

2. Define x =∑n

i=1 xi/n. The first candidate is xr = (1 + ρ)x − ρxn+1. If

f(x1) ≤ f(xr) < f(xn), then replace xn+1 and quit the iteration.

3. If f(xr) < f(x1), set xe = (1 + ρχ)x − ρχxn+1 and replace xn+1 with xe if

f(xe) < f(xr). Otherwise, replace it with xr.

4. If f(xr) ≥ f(xn), then perform one of two steps:

(a) If f(xn) < f(xr) < f(xn+1), then set xc = (1 + ργ)x − ργxn+1, and if

f(xc) < f(xr) then replace. Otherwise, exit this conditional step.

(b) If f(xr) > f(xn+1), set xcc = (1 − γ)x + γxn+1. If f(xcc) < f(xn+1) then

replace with xcc.

5. If all of the above fails, reset the entire set of points by setting yi = x1+σ(xi−x1)

for i = 2, . . . , n+ 1.

Following the MATLAB implementation [35], we set ρ = 1, χ = 2, γ = 1/2, and

σ = 1/2.

In our implementation of the algorithm, the optimizer accepts one initial guess for

the optimal points, x∗ = [a1, . . . , an] and sets it as xn+1. Then it sets xi as all zeroes

except for the ith position, where we have ai.

7.3 Simulated Annealing

Simulated annealing is an optimization method based on energy states in statistical

mechanics [17]. Specifically, one is looking for low-energy states for a set of particles.

36

Given a starting point of variables, one then attempts to minimize the value of an

error function in the system.

If one simply performs a localized search for a minimum, it is likely that with com-

plex objective functions, the searcher will get stuck in a local minimum. It is thus

important that we consider both nearby and distant parameter values, and that some-

times values that worsen (i.e. increase) the objective function are used as steps in

the algorithm. Of course, this still does not guarantee that a global optimum will be

found.

Below is a basic overview of how simulated annealing works [4]. First, we choose an

initial starting point or guess, x0, which is evaluated using an objective function, f .

Given a new set of parameters, x′, one can move from x to x′ on two occasions:

1. When f(x′) < f(x).

2. If f(x′) ≥ f(x), then one moves to x′ uniformly at random, using a probability

p (see below). Alternatively, we pick a new point x′′ based on the current x.

Candidates for the next iteration of the algorithm can be selected in various ways. In

this dissertation, one has a set X of candidates such that, given the current state of

x ∈ Rn, we have

X = y ∈ Rn | yj = xj ± ǫ for j in 1, 2, . . . , n and yi = xi elsewhere.. (7.1)

Given the set of candidates X , a member x′ ∈ X is chosen uniformly at random and

the function f is evaluated. If f(x′) < f(x), then x′ is set as the new value, and a

new set X is generated at the next iteration. Otherwise, x′ still replaces x with a

probability p, defined as:

p = ef(x′)−f(x)

τ , (7.2)

where τ = ri−1T , and where 0 < r < 1 is a cooling rate, T is the initial temperature,

and we are on the ith iteration. We then define the perturbation at every step as

ǫ = τǫ0, where ǫ0 is the initial value for the perturbation and is provided by the user.

The algorithm runs a predefined set of iterations, or exits once it reaches a certain

tolerance for the error (i.e. objective function).

37

0.75 0.80 0.85 0.90 0.95

050

010

0015

0020

0025

00

Cooling Rate

Num

ber

of It

erat

ions

Figure 7.2: Number of iterations necessary to either quit the simulated annealingalgorithm (which we have chosen to be 2500 iterations) or reach convergence witherror tolerance 0.1. The dashed lines represent one standard deviation away from themean number of iterations (or 0 or 2500 where relevant) when running the algorithm50 times for T = 1, and an initial perturbation of 2. The function being minimizedis f(x, y, z) = x2 + y2 + z2.

The biggest problem with regard to using simulated annealing is the fact that, depend-

ing on how one sets the cooling rate, temperature, and perturbations, the algorithm

may converge to a local (or, ideally, a global) optimum extremely slowly. Figure 7.2

shows how for a function f(x, y, z) = x2 + y2 + z2, the number of iterations necessary

to find the minimal point (x = y = z = 0) can vary significantly. Based on the figure,

it is only within a relatively small subset of values in (0, 1), namely between about 0.8

and 1.0 that the algorithm actually converges without reaching the maximum number

of iterations allowed.

38

Chapter 8

Optimized Models

The models described in Chapter 6 incorporate important aspects of dynamics within

cabinet networks. However, to actually judge whether or not such dynamics are

relevant in any of the parliamentary democracies being explored, it is important to

optimize the parameters of models and see how well they do compared to the baseline

model.

8.1 Error Definition

The errors being analyzed are based on the observed experience matrices of the par-

liamentary democracies. Specifically, if O = oij represents the simulation means

and O = oij is the observed set of values, then one way to define error is using the

2-norm of the difference. That is

e =

(n∑

i=1

n∑

j=1

(oij − oij)2

)1/2

=

(n∑

i=1

i∑

j=1

(oij − oij)2

)1/2

. (8.1)

Note that the second equality exists because the experience matrix is lower-triangular.

8.2 Model Fits

We calculated results for the basic model, single-measure models, and preferential

attachment model, and a complete set of results is included in Appendix H. The

39

basic model results are reported in Table H.1, with most of the results showing the

lowest error rates at around P = 0.5. Because calculating the variance of a matrix

with random entries is non-trivial, we also present the results of the basic model’s

simulations, rather than analytical values.

Results for single measure models are reported in Table H.2, with models for per-

turbed probabilities in Table H.3. Finally, results for the preferential attachment

model are presented in Table H.4.

Most of the models do not have lower error rates than the basic model (i.e. the base-

line). What is promising, however, is that five countries outperform the basic model

in one network-centric model each. New Zealand and Luxembourg both outperform

the baseline results through the perturbed probability model, where the initial prob-

ability is shifted using nodes’ strengths. Japan outperforms the baseline using the

strength model, and Canada does so using the single-measure model based on TO

(discussed in Section 4.2.1). Finally, France improves upon the basic model through

the preferential attachment model. While the differences are not extremely large,

they are statistically significant. That is, all of these examples have lower mean error

values at a statistically significant difference compared to the simulations of the basic

model.

These results imply a number of things. In the Japanese case, if one treats the

strength model as a bipartite one (as discussed in Section 6.2.2), this implies that the

more governments a cabinet member serves, the more likely that he or she will be

reselected for a future portfolio. One can argue a similar result for Luxembourg and

New Zealand, though since here we are perturbing initial probabilities, it implies the

effect of serving prior governments is smaller.

France also has a similar interpretation. With the preferential attachment model, it

appears that ministers who serve many governments and work with others who do

the same are more likely to be selected for future posts.

The outlier in this case is Canada, which has clustering coefficients play an important

role. This is particularly interesting, because it implies that cabinet members succeed

based not only on who they work with, but if those ministers also work with each

other.

While models with error rates below the baseline were developed for only five of the

twelve democracies, this shows that models based solely on our data set may yield

40

useful and promising results, and that significant network effects are present in at least

some of the parliamentary democracies analyzed. Furthermore, because four differ-

ent model types had significant results, this implies that the underlying mechanics

of ministerial survival may be different depending on the country. It is possible that

other types of measures may be significant for the seven other parliamentary democ-

racies being analyzed, or that more complex network mechanics would be necessary

to improve beyond the baseline models.

41

Chapter 9

Ministerial Models

Thus far, we have explored global models of networks. That is, the earlier models were

optimized for global variables and properties of the network, rather than seeing how

well we could do in predicting the evolution and development of individual nodes. In

this chapter, we set out to explore whether we can make predictions about individual

nodes. Specifically, we hope to predict whether or not members of a specific cabinet

will be selected for another cabinet position in the future.

The problem definition in this case is as follows: given as much information as possible

about a specific node, such as political party, assortativity coefficient of the govern-

ment, number of governments the node has been involved in, eigenvector centrality

of the node, and so on, can we predict if the node will be in power in the future?

As discussed in Section 2.3, an important question is how cabinet ministers evolve

over time and whether they stay in power. A key question to ask, then, is given

network-focused information, can we predict with any sort of accuracy whether or

not a cabinet member will serve in a future government?

9.1 Supervised Learning

The approach we use for this problem can be termed as a “supervised” predictive

modelling approach. The term “supervised” is based on the fact that we train the

model on results that we have already observed — in this case, it is knowing whether

or not a cabinet member returns to power in a later cabinet. Every algorithm we use

seeks to map a data vector with n variables, x ∈ Rn, to a binary result, 0, 1. Thus,

42

we seek to fit a function ℓ : xn → 0, 1 to minimize another objective function.

The objective function, as before, is based on a measure of error. Because our function

returns one of two values, 0 for ministers that never return to cabinet, and 1 for

ministers that do, we can define the error function as the accuracy rate of our model.

Specifically,

accuracy =number of correct predictions

total number of predictions(9.1)

Thus, accuracy takes a value in the interval [0, 1], and we seek to maximize the

accuracy.

9.2 Preparing the Data

Because we aim to develop a model that predicts whether or not specific cabinet

members return to a cabinet position in the future, we can input various variables

into the model. Since our parameter for ℓ is a vector x, we can input any information

into x that may help our function discriminate between ministers that serve in an-

other cabinet and those who do not. For example, x1 may contain the TO clustering

coefficient, while x4 may contain the node’s strength. We hope that our algorithm

will be able to take these element values and learn to discriminate between the two

classes, 0, 1, based on the entries in x. In our case, x is composed of the following

entires:

• Global Network Properties: average clustering coefficient (TO, TH , TZ)

• Node-Level Network Properties: strength, clustering coefficient (TO,i,

TH,i, TZ,i), eigenvector centrality, betweenness centrality

• Party-Based Properties: assortativity coefficient, heterogeneity

• Historical Properties: whether the node is a newcomer or incumbent, num-

ber of governments the node has served in

Within the realm of machine learning, one must decide what data one should train a

model on, and what data one should test on. A key problem is ensuring that we do not

overtrain a model [40]. This occurs when one inputs biases into the training process

by selecting biased training sets. For example, if one were to simply randomize all

43

the data vectors (variables listed above) from all governments and pick a random set

of training instances and random set of test instances, we would likely overtrain the

models. This would occur because the global network-level properties stay the same

for all nodes in a specific government, and some of these nodes would be randomly

placed in the training set, while others in the test set. Thus, we would build and

evaluate our model based on data that we would never be able to access in practice.

Implicitly, we would be building models with information about future governments,

and so biasing our results.

A second issue is understanding how much historical information is necessary for

accurate predictions. For example, Canada had 18 governments between 1945 and

1990. If we limit ourselves to forward-looking models, we can build models for 16

governments. The first one would be dropped due to the absence of training data,

and the last one cannot be validated. However, having a training set with only one

government would likely yield extremely poor results. Similarly, having a test set

that only incorporates the 17th government could yield poor results if the transition

to the 18th government was a particularly tumultuous one.

With this in mind, we built N − 1 models for each country, where N is the number

of governments in the data set for the country in question. Thus, N − 1 splits were

used to create training and test data sets. If we take the Canada example, with 18

governments, suppose we have a split at government seven. This means we take all

the nodes in the first seven governments and their associated network properties, and

see if they return to power at any point in the future. We use this to train the model,

and then validate it on the 11 governments we did not include in the training set.

Note that we dropped Israel and Germany from this analysis because we lacked com-

plete data on political party membership.

9.3 Voting by Majority Class Membership

The simplest type of model, and the one used as a baseline for this research project,

deals with simply applying the class with the highest membership rate to all future

instances. For example, given 100 training instances, if 60% are in class A and 40%

in class B, then the model will categorize all future instances as members of class A.

44

Figure 9.1: An example of a binary tree decision system. One variable (cloudiness)is evaluated first, either leading to a decision or another variable evaluation.

This is a crude approach, as it assumes that class membership rates do not vary, and

tries to maximize accuracy rates without any learning.

9.4 Binary Decision Trees

This is a very basic model, but one that worked better than any of the other model

types initially attempted, including logistic regressions [36], support vector machines [29],

or nearest neighbour comparisons [15]. All of these algorithms were explored in the

WEKA software package [76]. Due to the success of binary trees, only they were

explored in detail in this project.

A binary tree is a classifier that tries to build decision-making rules for predicting the

specific class of an input instance (i.e. vector) [76]. For example, if we were to build

a classifier that answers the question, “Will it be raining in an hour?”, the tree in

Figure 9.1 seems to be a reasonable approach. The first question is “Is it cloudy?”,

and this splits into two options. If “yes” or “no”, then the second question could be

“Is it windy?” One hopes that by breaking down the tree into various categories,

one can place all the possible events in such a tree and be able to classify any given

condition.

A random tree is built by selecting x number of attributes uniformly at random and

choosing the attribute that best discriminates between classes. In our case, 4 random

attributes were chosen and the best one was used to split the tree. At the next level, 4

45

new attributes were randomly selected again. This process continues until all training

instances are categorized in the tree.

9.5 Model Evaluation

Like the governmental models discussed in Chapter 6, the models in this section

vary substantially between countries. Even though the underlying mathematical and

computational logic may be the same, the accuracies between countries vary a great

deal, implying that the underlying political dynamics may be different. For example,

Figures 9.2(a) and 9.2(b) show a typical situation for the data sets, where as we have

more training instances, the random tree algorithm tends to do better than the one

based on majority classes. However, Figure 9.2(c) shows that Japan provides a more

challenging data set for the algorithm, and it rarely has a better accuracy than the

baseline.

Table 9.1 provides the results of building random trees (20 different times) for each

country, compared to the baseline model. Observe that aside from Japan, New

Zealand, and Sweden, the models outperform the baseline case.

Country Baseline Acc. Mean Acc. Standard Dev., Acc.Canada 0.2500 0.3858 0.1128Denmark 0.4103 0.4891 0.0865France 0.3750 0.4688 0.0869Japan 0.8816 0.7270 0.0294Luxembourg 0.2500 0.4962 0.1365Netherlands 0.5000 0.5153 0.0619New Zealand 0.4333 0.4467 0.1347Norway 0.3444 0.6139 0.1455Sweden 0.5000 0.4321 0.1060UK 0.2574 0.5812 0.0613

Table 9.1: Results of fitting the binary tree classifier twenty different times. “Acc.”represents accuracy. Note that the baseline model does not have a standard deviationas it always builds the same model.

46

5 10 15

0.0

0.2

0.4

0.6

0.8

1.0

Government Split

Acc

urac

y

(a) Canada.

5 10 15 200.

00.

20.

40.

60.

81.

0

Government Split

Acc

urac

y

(b) Norway.

5 10 15 20 25 30 35

0.0

0.2

0.4

0.6

0.8

1.0

Government Split

Acc

urac

y

(c) Japan.

Figure 9.2: Accuracies for the random tree algorithm (dark line) and majority vote(red line). The dashed lines represent one standard deviation away from the mean ofthe random tree algorithm. The split represents at which government we begin to builtthe test (i.e. validation) data set, also known as the “holdout” set.

47

Chapter 10

Discussion and Conclusion

The goal of this research was to explore whether network-centric variables can be

used to help explain some of the dynamics of cabinet members between 1945 to 1990.

We applied this to twelve parliamentary democracies, with models focusing both on

global properties of the networks themselves, as well as predicting specific results for

individual nodes. Our considerations did not produce a model that adequately helps

explain cabinet member longevity or survival in a majority of the twelve countries.

This should not come as a surprise: “parliamentary democracy” is a broad politi-

cal term, and the twelve countries in this study differ in electoral systems, political

views of constituents, constitutions, and other political variables. As such, building

a mathematical model that explains all twelve countries might be impossible – at the

very least, it would require more than just network information.

Importantly, some of the network-centric governmental models did outperform the

baseline “basic model”. This was the case with France, Luxembourg, New Zealand,

Japan, and Canada. While these models fare better than the baseline model in a

statistically significant way, the actual improvement in error rates is quite low. Ex-

ploring whether more complex models improve results further would be very useful.

Furthermore, there might be interaction effects between network variables and stan-

dard political science variables, such as experience of cabinet members or occupations

prior to becoming politicians [6].

We were more successful in predicting individual nodes’ performance over time, and

whether they will return as cabinet members in the future. Again, more algorithms

should be implemented in this case, but it is encouraging to see that some of the

variables seem to help improve predictions as to whether cabinet members will return

48

to office in the future. Indeed, only three of the ten countries analyzed in this research

fare worse than their baseline models when the holdout set for model validation

includes five governments. Of course, success rates vary depend on how large the

holdout set is, and some countries (e.g. Japan) present a formidable challenge to the

algorithms used. Exploring more algorithms and optimizing them for this specific

problem is a useful extension to this research.

Japan is a very unique example in this analysis. Whereas most ministerial models

outperform the baseline results and most governmental models do worse than the

baseline, the Japanese strength model (at the governmental level) is better than the

basic model, and the binary tree (at the ministerial level), is worse. As such, Japan

presents a stark reminder that having an understanding, however slight, of cabinet

networks at the nodal level does not mean we can extrapolate our knowledge to the

whole network.

10.1 Future Work

A number of interesting extensions are possible. First, the models and algorithms

implemented and developed above are by no means complete. Other models could be

built that incorporate more information about the network structure of the cabinets,

or look at the dynamics of cabinets in a different way. Similarly, other algorithms can

be tested on the data sets in question to predict whether nodes will return as cabinet

members in the future. Indeed, numerous network-based measures exist, and it is

possible that some of the ones that were not explored in this work may be fruitful.

A very interesting opportunity for further work lies in implementing exponential ran-

dom graph models (ERGMs) [31,72] to help understand and evaluate the underlying

structure of the cabinet data in this report. Specifically, ERGMs are a type of logit

model [3] that estimates the probability of a given network in a specific space of net-

works. Thus, given a fixed parameter space, one can model the cabinet networks and

whether they come from different types of random distributions of networks. ERGMs

have been applied to bipartite graphs [65], as well as “neighbourhoods” (e.g. groups

of nodes) in networks [53].

Second, it would be beneficial to combine our data with other data. Corruption

indices, voting records, election results, legislative debates, committee memberships,

49

and numerous other forms of data are available about politicians in many of the

countries discussed in this report, given the requisite time to find and extract them.

Combining our data with other pieces of information can help build better models –

ones that explore the qualitative, quantitative, and structural aspects of government

and politicians’ careers, and how they affect each other in the long run.

The question underlying much of this work deals with seeing whether it is possible to

build models of different types of governance systems. Many aspects of government

can be modelled and explored. Electoral systems, legislatures, and even grassroots

revolutions are all within reach [1]. It might ultimately be possible to build more

detailed models that take election results, cabinet membership decisions, and even

policy outcomes into account.

Returning to a more “micro” level, every country is unique in terms of history, gov-

erning institutions, and laws. Rather than building a comparative model that tries to

look at a group of parliamentary democracies, it may be more useful to build models

for individual countries.

10.2 Conclusion

Based on the discussion above, network variables seem to play a role in the careers

of cabinet ministers. While it is difficult to say whether their role is causal or simply

correlated with political success, such variables nonetheless allow one to predict fu-

ture success of individual cabinet members. Indeed, combining them with variables

already explored by political scientists may yield even better results and, ideally, help

drive development of political theory. Similarly, while the results for governmental

models are less successful compared to the ministerial ones, they also provide some

promising results, especially in the cases of France, Canada, New Zealand, Japan,

and Luxembourg. Because the models focusing on these five countries did better

than the baseline ones, there is support for the idea that network-centric variables

play a role in understanding governments and cabinet-level politics in parliamentary

democracies. With this in mind, a great deal of work lies ahead for those interested in

investigating cabinet memberships using mathematical and statistical frameworks.

50

Appendix A

Symbols

Table A.1 lists some of the symbols used in this paper, and what they mean. Note

that due to the size of this dissertation and the number of different symbols used,

some of the conventional symbols (e.g. for centralities and clustering coefficients,

which both use C) were substituted with more convenient ones.

51

Symbol DescriptionG A graph.V The vertex set of a graph.v A member of V .vi The ith member of V .E The edge set of a graph.t A unit of discrete time.gt The tth government.ci The ith cabinet member.Gt The set of ministers present in government gt.It The set of incumbents from government gt.Jt,t The set of incumbents in Gt that entered as newcomers

in Gt.Ω = ωij An experience matrix.CD, CB, CC Graph centralizations — degree, betweenness, and close-

ness, respectively.CD,i The graph centrality score (in this case, degree) of the

ith node.d(ci, cj) The distance between ci and cj .ki The degree or strength of the node vi, in the case of

unweighted and weighted networks, respectively.T The clustering coefficient of an unweighted network.Ti The clustering coefficient of the ith node in an un-

weighted network.TO, TZ , TH The clustering coefficient for a weighted network, under

the definitions of Onnela et al., Zhang et al., and Holmeet al., respectively.

TO,i The clustering coefficient of the ith node in a network.QH The heterogeneity of a network.QM The modularity of a network.QA The assortativity of a network.α A coefficient in a model.αi The ith coefficient in a model.zi The number of governments that ci has served.

Table A.1: Symbols used in the report.

52

Appendix B

Code Overview

Due to the large amount of information stored in our data sets and the relative novelty

of the work being undertaken, much of the software for the analysis and modelling

of the data had to be built specifically for this project. The programming language

used in the software development effort associated with this dissertation was Python.

The software developed for this dissertation has a number of important parts, which

we outline in Figure B.1.

The first part of the software is the “network data”, which is the way that network

information is stored at run time. There are several strategies for storing such infor-

mation. Because a social network can be represented as an adjacency matrix, one can

simply store a two-dimensional array. Another strategy is to store lists of neighbours

for each node. In the case of this software, network data is stored in the form of edge

lists: this is a list of values that create a function where, given a source and target

node, an edge weight is returned, or 0 is returned if such a connection does not exist.

Network data is most often used by “network functions”, which take the aforemen-

tioned edge list and calculate properties like clustering coefficients or strengths. Node-

level functions return an array of values, usually one for every node in the network,

while global functions return one value for the entire network. One can use network

functions on their own, or in actual “models”, which use the functions to make de-

cisions on how the network evolves within multiple time steps. Because most of the

models in this dissertation are probabilistic, they depend on running a large number

of simulations to understand the range of possible values attained by the networks.

With this in mind, “simulators” control how many times a model is run and how data

on the multiple simulations is stored.

53

Figure B.1: The flow of information within the code developed to analyze the cabinetnetwork data. Text surrounded by squares represents “entities” such as classes, andarrows represent the flow of information and data.

Simulators, network data, and network functions are used by “validators” to check

how a specific model with specific parameter values compares to real-world data.

At their simplest, validators calculate errors between the simulated networks and

real-world, country-specific data. This is then used by “optimizers” to find optimal

parameter values for the models being analyzed.

Unless otherwise stated, the measures, models, and optimization techniques discussed

in this dissertation have all been coded in Python by the author.

54

Appendix C

Graph Visualization

C.1 Algorithms

Graph visualization is a complex and rich field of study, and while it is beyond the

scope of this dissertation, it is useful to briefly review the methods we use to visualize

the graphs in the cabinet network data set. The algorithm used to display graphs is

the Fruchterman-Reingold [24] algorithm, implemented in the “igraph” [16] library.

This algorithm is part of a family of models that treat a network as a set of masses (i.e.

nodes) connected to each other by springs (i.e. edges) [19]. While these algorithms

eschew real-world physics, their approach to laying out graphs has two important

properties:

1. All nodes within a certain radius repel each other.

2. Neighbours attract each other and, at a certain radius, balance out the repulsive

forces between each other.

Thus, laying out a graph becomes a computational exercise in finding an equilibrium

between repulsive and attractive forces. The Fruchterman-Reingold algorithm cal-

culates repulsive and attractive forces for a specific node v ∈ V with the following

equations:fattractive(d) = d2/kfrepulsive(d) = −k2/d

(C.1)

where d is the distance from the node in question to another node, and k is a constant.

The borders of the layout also repel a node, ensuring that all nodes remain in the

55

boundaries of the image. Finally, to ensure that disconnected graphs do not repel

each other indefinitely, two other points are worth mentioning:

1. The repulsive force only acts up to a certain radius.

2. If a graph contains N components, the layout itself is first split into N parti-

tions, each designated for a specific component. This prevents any individual

components from centering themselves in the layout and repelling all others to

the boundaries.

A graph is thereby laid out by calculating forces on individual nodes, moving them

accordingly, and iterating as long as needed, or until an equilibrium is reached. The

resulting forces are one-dimensional – rather than moving nodes according to the

sum of forces in a two-dimensional plane, simulated annealing or other optimization

techniques are used to find an optimal layout.

Because visualization algorithms were beyond the broader scope of this dissertation,

the “igraph” [16] library was used to visualize the networks in the cabinet data set.

Within this library, 500 annealing iterations are used to lay out the graph, with

k = |V |.

In the case of weighted graphs, the edge weight acts as a multiplier for the attractive

force outlined in Equation (C.1). For example, an edge weight of 2 causes twice as

much attractive force between neighbours as an edge weight of 1.

An important recent result [50] shows that using a force-directed layout like the

Fruchterman-Reingold algorithm is related to grouping nodes in a way that max-

imizes modularity (discussed in Section 4.3.2) between densely connected groups.

This implies that optimal layouts – that is, ones with low energy states – group nodes

in densely connected clusters near each other.

C.2 Visualizations

The visualizations in this section show the diversity of the data being analyzed. Fig-

ure C.1 shows how the algorithm lays out cliques — where all nodes are connected to

all other nodes – and how newcomers affect this layout in C.1(b). Observe that the op-

timal layout for cliques is by setting nodes in an equidistant fashion. In Figure C.1(b),

56

newcomers are shown in green, and the most connected nodes – the incumbents – are

placed at the centre of the visualization.

Similarly, Figure C.2 shows the first 21 cabinets in Sweden. Tightly knit clusters are

present here as well. Green nodes represent newcomers joining a group in the middle

right of the graph. The central node in the graph is also one connected to many

different politicians, indicating that the node likely participated in a relatively large

number of cabinets.

If one chooses to analyze an individual cabinet, one can also see a wealth of diverse

information. Figure C.3 shows Israel’s 16th cabinet. Edge weights are illustrated by

the thickness of the line between nodes, and square nodes are newcomers. Edges

representing the current government are ignored, and node colour represents political

party. Observe how there are four parties participating in the cabinet itself, with

some of the nodes connected to thicker edges than others. This suggests that some

underlying dynamics – whether it be experience of politicians, party affiliations, or

other nodal or network-level characteristics – help decide who will remain in power

over an extended period of time.

We show a final example, Denmark’s 27th government, in Figure C.4. The largest

component in the graph is composed of two separate parties, and it is interesting to

see that not all incumbents have worked with each other in the past. This implies

that being in the same political party does not serve as a predictor for prior cabinet

co-membership.

Although visualizations help illustrate the wealth of data and information available

in cabinet networks, they also underscore the need for quantitative analysis and mod-

elling. There are numerous interesting features that one can try to better understand

through modelling, and some of these are explored in sections of this dissertation.

Importantly, the visualizations themselves are not meant to provide quantifiable in-

formation about the cabinets and their networks. Rather they are meant to aid in

data exploration, and help us create hypotheses to subsequently investigate through

quantitative analysis.

57

(a) Canada’s first government. Noticethat it is a simple clique with all nodesconnected to all other nodes.

(b) Canada’s second and first govern-ments. Blue nodes represent cabinetmembers who were present in the firstgovernment, while green nodes are new-comers.

Figure C.1: Canada’s first and second cabinets. Note that the edges in these visual-izations are unweighted.

Figure C.2: Sweden’s first 21 cabinets. Notice that nodes organize into clusters, witha few nodes connected to many tightly-knit groups. Edges in this visualization areunweighted.

58

Figure C.3: Israel’s 16th cabinet. Political parties are represented by colour, and edgesare thicker if their weights are higher. Notice the experienced group composed of bluenodes.

Figure C.4: Denmark’s 27th cabinet. Notice the presence of two parties (node colours),and the newcomers (squares). Also, notice that different nodes have only previouslyworked with a subset of incumbents.

59

Appendix D

Clustering Coefficients

The appendix contains three important notes on clustering coefficients which were

not included in the main text of the dissertation due to space limitations.

D.1 Upper Bounds

An upper bound for wi can be calculated using the inequality aij ≤ 1 − δij where δij

is the Kronecker delta. The specific steps for this are,

wi =1

2

∑

u 6=i

∑

v | v 6=i, v 6=u

aiuauvavi

≤ 1

2

∑

u 6=i

[

aiu

(∑

v 6=i

(1 − δuv)avi

)]

=1

2

∑

u 6=i

aiu

[∑

v 6=i

(avi − δuvavi)

]

=1

2

∑

u 6=i

(

aiu

[∑

v 6=i

avi −∑

v 6=i

δuvavi

])

=1

2

∑

u 6=i

(

aiu

[(∑

v 6=i

avi

)

− aui

])

=1

2

[∑

u 6=i

aiu

]2

−∑

u 6=i

a2iu

. (D.1)

60

D.2 Matrix-based Definition

Another way to view the Watts and Strogatz definition is through matrix multiplica-

tion [30]. Specifically,

Ti =

∑nj=1

∑nk=1 aijajkaki

∑nj=1

∑nk=1 aijaki

. (D.2)

One can expand on this idea by defining a weighted clustering coefficient as

TH,i =

∑Nj=1

∑Nk=1aijajkaki

max(a)∑N

j=1

∑Nk=1 aijaki

. (D.3)

D.3 Discussion and Critique

Dealing with weighted clustering coefficients is challenging when one has different

distributions of edge weights. In Equations (4.15) and (4.19), edges are resized to

fit within 0 ≤ aij ≤ 1 by dividing all edge values by the maximal edge value in the

entire network. Hence, depending on the distribution of edge weights among edges

(specifically, the value of the maximal edge compared to the mean values of edges),

the clustering coefficient values vary. To show this, we ran simulations on networks of

size 3 to 20 with edge weights being taken from an exponential distribution, exp(λ),

with λ varying between 0.05 and 2.0. The results are shown in Figure D.1(a) and

Figure D.1(b), and show that as N , the number of nodes in a network, grows and

thus increases the value of maxij aij in the network, the mean clustering coefficient

generally attains a lower value.

This is important to note, as it implies that one cannot necessarily take values of

clustering coefficients from different networks and compare them. One first needs to

ensure, for example, that edge weights are distributed in a similar manner.

61

6 8 10 12 14 16 18 20

0.5

1.0

1.5

2.0

N

Lam

bda

0.15

0.20

0.25

(a) Mean TO values for 100 simulations of graphs of various sizes(N) and edge distribution (exp(λ)).

6 8 10 12 14 16 18 20

0.5

1.0

1.5

2.0

N

Lam

bda

0.04

0.06

0.08

0.10

(b) Standard deviations for Figure D.1(a).

Figure D.1: Various results for simulations for TO in random graphs whose edgeweights are values from a random variable following an exponential distribution.

62

Appendix E

Perron-Frobenius Theorems

The Perron-Frobenius theorems guarantee that given an adjacency matrix, the largest

eigenvalue and eigenvector pair will be non-negative. This is crucial in the definition

of eigenvector centrality (which is described in Section 4.1.4). Please note that the

proofs below are taken from Meyer [43].

In the context of this dissertation, adjacency matrices are non-negative matrices, as

we have aii = 0 for all i, and aij ≥ 0 when i 6= j. Before seeing that such conditions

allow for eigenvector centrality to exist, we first prove it for a simpler case, where all

values aij > 0 for all i and j.

Following the notation of Meyer [43], ρ(A) is the spectral radius of A, and given a

vector x, we define |x| as a vector with all absolute values of elements in x. Given

two vectors, a and b, a > b means that for all entries in a and b, we have ai > bi.

Also, note that ||a||1 =∑

i |ai|.

Similarly, given two matrices A and B, |A| implies we take A with all absolute values

for entries, and A > B means the inequality holds true for all corresponding elements.

Finally, ||A||∞ = maxij |aij |.

With Theorem 1, we prove that a matrix with positive entries has a positive eigen-

vector and eigenvalue pair (called an “eigenpair”).

Theorem 1. Given a positive matrix A, there exists an eigenvector with all non-

negative entries.

Proof:

63

Without loss of generality, assume ρ(A) = 1. Suppose (λ,x) is an eigenpair for A

such that |λ| = 1. We then have:

|x| = |λ| |x| = |λx| = |Ax| ≤ |A| |x|

This implies |x| ≤ A |x|.

Now, let z = A |x| and let y = z − |x|. Thus, y ≥ 0.

Suppose that y 6= 0, so that we have A y > 0. Furthermore, since y = z − |x| > 0,

we also have z > |x|, and thus and z > 0.

Thus, we know there is a number ǫ such that Ay > ǫz. Thus

Ay > ǫz,

A(z − |x|) > ǫz,

Az − z > ǫz,A

1 + ǫz > z.

We then set B = A/(1 + ǫ), and we have that B2z > Bz > z, B3z > B2z > z, and

so on. Thus, we have that Bkz > z for k = 1, 2, . . . . However, because we have that

ρ(A) = 1 and ρ(B) = ρ(A/(1 + ǫ)) = 1/(1 + ǫ) < 1, we have that Bk → 0 as k → ∞.

This leads to the conclusion that 0 > z, which is a contradiction.

Because our initial assumption was that y > 0 and this led to a contradiction, we

have that y = 0. By definition, this results in 0 = z − |x| = A|x| − |x|.

Thus, A|x| = |x|, which means that |x| is an eigenvector for 1.

Before we move on to non-negative matrices, we need to prove a lemma that allows

us to transfer a form of the triangle inequality to matrices. Specifically, if we have

|A| ≤ B, then ρ(A) ≤ ρ(|A|) ≤ ρ(B).

Lemma 1. |A| ≤ B, then ρ(A) ≤ ρ(|A|) ≤ ρ(B).

Proof:

64

With the triangle inequality, we have |Ak| ≤ |A|k for every positive integer k. Also,

|A| < B implies |A|k < Bk. Thus:

||Ak||∞ = || |Ak| ||∞ ≤ || |A|k ||∞ ≤ ||Bk||∞⇒ lim

k→∞||Ak||1/k

∞ ≤ limk→∞

|| |A|k ||1/k∞ ≤ lim

k→∞||Bk||1/k

∞

⇒ ρ(A) ≤ ρ(|A|) ≤ ρ(B)

Before moving on to proving Theorem 1 for non-negative matrices, it is useful to

note the Bolzano-Weierstrass Theorem [66]. The theorem states, “Every bounded

sequence has a convergent subsequence.” This is used in the proof below.

Theorem 2. If A is a non-negative matrix, then there exists an eigenvalue r = ρ(A)

such that Ax = rx with x ≥ 0.

Proof:

For k = 1, 2, . . . , define Ak = A + (1/k)E > 0 where E is a matrix with 1 in all

entries. Thus, Theorem 1 applies to all Ak.

Now, define the set P = pk∞k=1 where pk is the vector associated with rk = ρ(Ak).

From Theorem 1, rk is an eigenvalue.

Furthermore, P is a bounded set because it is contained in the unit-1 sphere. This is

a result of Perron-Frobenius theory, but for the sake of brevity, is not included here.

Furthermore, the theory states that ||pk||1 = 1.

Using the Bolzano-Weierstrauss Theorem, there is a convergent subsequence pki →

p⋆ where p⋆ ≥ 0.

p⋆ 6= 0 because pki> 0 and ||pki

||1 = 1.

Now, because A1 > A2 > · · · > A, Lemma 1 above guarantees that r1 ≥ r2 ≥ · · · ≥ r,

so rk∞k=1 is a monotonic sequence of positive numbers that is bounded below by r.

Thus,

limk→∞

rk = r⋆ r⋆ ≥ r limi→∞

rki= r⋆ ≥ r (E.1)

65

Because Ak → A as k → ∞, we also have Aki→ A as i → ∞. Thus, we can also

establish:

Ap⋆ = limi→∞

Akipki

= limi→∞

rkipki

= r⋆p⋆ (E.2)

This shows that r⋆ is an eigenvalue for A, and since r is the maximal eigenvalue, we

then have r⋆ ≤ r.

Therefore, r⋆ = r and Ap⋆ = rp⋆ with p⋆ ≥ 0 and p⋆ 6= 0.

With Theorem 2, we know that given a matrix with non-negative entries, the eigen-

vector corresponding to the largest eigenvalue will be have non-negative entries as

well. In a connected graph, this allows us to define and use eigenvector centrality.

66

Appendix F

Data Tables

Table F.1 shows a number of different measures and summary statistics for the twelve

parliamentary democracies in this study.

67

Country Govts Mean(# parties) Mean (Cab.Size) Mean (Newcomers) Mean (Max Edge) Min(QA) Max(QA)Canada 18 1 22.5556 10.9444 3.2778 1 1Denmark 27 2.1481 13.5556 5.7778 3.8889 −0.15294118 1France 28 5.0714 19 5.2143 7.3929 −0.11807732 1Germany 23 . 22.1739 9.7391 5.6957 . .Israel 35 . 9.0571 1.8571 7.7714 . .Japan 35 1.5429 14.5143 9.0857 2.2 −0.10786845 1Luxembourg 15 2.4667 7.4667 3.6667 2.8 −0.21518987 0Netherlands 21 3.5714 14.0952 7.5714 3.1905 −0.1343361 0.01320423New Zealand 19 1 16.3158 7.3158 4 1 1Norway 23 1.7826 15.6522 8.087 2.8696 −0.08454065 1Sweden 21 2.0952 17.8095 6.0476 5.5714 −0.09939585 1UK 18 1 20.3889 11.4444 2.9444 1 1

Table F.1: Some descriptive statistics by country. “Govts” represents the number of governments, which is followed by the meannumber of parties in each government, the mean cabinet size, mean number of newcomers, and the mean maximal edge of eachgovernment. “Min(QA)” and “Max(QA)” give, respectively. the maximum and minimum of the assortativity coefficient.

68

Appendix G

ψ-Value Proofs

The proofs below show that the ψ-value for eigenvector centrality is any real number,

and that all values for all nodes are equal. We do this using LU decomposition [58].

The problem statement itself is proving that the largest eigenvalue of an adjacency

matrix, A, representing a clique of size n is n−1, and the eigenvector is any real-valued

vector with equal entries. By definition, we have

Ax = λx,

(A− λI)x = 0,

Ax = 0,

A = aij,

where

aij =

−λ i = j1 i 6= j

.

We can decompose A into upper and lower triangular matrices, U and L respectively,

to solve for the general n× n case. The theorem below helps us do this.

Theorem 3. For the LU decomposition of A, we have L = lij be such that lii = 1,

and entries below the diagonal are lij = (λ− j + 1)−1. Also, U = uij has elements

uij =

(−1)(λ−j+1)(λ+1)λ−j+2

for i = jλ+1

λ−j+2for i 6= j

. (G.1)

Proof:

69

This can be proven using the LU decomposition algorithm. The proof is inductive,

with the initial result, for the first iteration (at j = 1) being proven by simply running

through the algorithm.

At the kth step, we have:

lik =− λ+1

λ−k+2

− (λ−k+1)(λ+1)(λ−k+2)

=(−1)(λ+ 1)

(−1)(λ− k + 1)

=1

λ− k + 1

At this step, we also have

uk+1,k+1 =

((−1)(λ− k + 1)(λ+ 1)

λ− k + 2

)

+

(1

λ− k + 1

)(λ+ 1

λ− k + 2

)

=(−1)(λ+ 1)(λ+ k)

λ− k + 1

⇒ ukk =(−1)(λ+ 1)(λ+ k − 1)

λ− k + 2where k = k + 1

Finally,

uk+1,j =

(λ+ 1

λ− k + 2

)

+

(1

λ− k + 2

)(λ+ 1

λ− k + 2

)

=λ+ 1

λ− k + 1

This implies,

uk,j =λ+ 1

λ− k + 2where k = k + 1

To obtain the eigenvector centrality, we begin with the characteristic polynomial.

Since det(A) = det(L)det(U) = det(U), it follows that:

det(U) = Tr(U) = (λ+ 1)n−1(λ− n+ 1)

Thus, the eigenvalues for A are −1 and n− 1, with the latter being positive and the

largest value. Thus, this is the eigenvalue we use to find eigenvector centrality.

70

To find the associated eigenvector, we have that Ax = λx, which is equivalent to

Ax = 0 = LUx. Because the diagonal entries of L are all 1s and x 6= 0, we can

conclude that Ly = 0. This implies y = 0, so our eigenvector search leads to

Ux = 0.

Here, we can use back-substitution to solve for x. By Theorem 3, we have unn =(−1)(n−n)(n)

n−1= 0, so xn can be any real number. For xn−1,n we have,

un−1,n−1 =(−1)(n− n+ 1)(n)

n− 1 − (n− 1) + 2

=−n2

un−1,n =n

2⇒ xn+1 = xn

Using induction, we then get the (n− k)th row:

un−k,n−k =(−1)(n− 1 − n+ k + 1)(n)

n− 1 − n+ k + 2

=−knk + 1

un−k,n−k+i =n− 1 + 1

n− 1 − n+ k + 2=

n

k + 1for i = 1, . . . , k

This implies,

un−k,n−kxn−k +

k∑

i=1

un−k,n−k+ixn−k+i = 0

un−k,n−kxn−k =−knk + 1

xn

Thus,

xn−k = xn.

Thus, the eigenvector for λ = n − 1 is any vector where all entries are equal. This

implies the ψ value for eigenvector centrality is any real number as well.

71

Appendix H

Model Optimization Results

The tables below list results for the model parameters that return, on average, the

lowest error values for specific models. Table H.1 shows results for the basic model.

Tables H.2 and H.3 report on the single-measure model and the variant which per-

turbed an initial probability. Finally, Table H.4 shows the results of simulations on

the preferential attachment model.

Country P e Psim esim SDe

Canada 0.5600 1.6331 0.5500 1.6466 0.0409Denmark 0.5676 2.3052 0.5500 2.3132 0.0410France 0.7400 1.4686 0.7500 1.4978 0.0428Japan 0.3800 1.6470 0.3691 1.6717 0.0382Luxembourg 0.5676 1.0204 0.5691 1.0390 0.0335Netherlands 0.5000 1.7000 0.5000 1.7155 0.0429New Zealand 0.5276 1.9871 0.5500 1.9997 0.0431Norway 0.5276 1.7781 0.5000 1.7974 0.0396Sweden 0.7124 1.6668 0.7000 1.6836 0.0411UK 0.4800 1.4861 0.5000 1.4993 0.0404Israel 0.8200 2.1396 0.8191 2.1704 0.0408Germany 0.5876 1.4148 0.6000 1.4400 0.0461

Table H.1: Probability and error score for the basic model. Models were fitted usingthe Golden Section Search. The symbols P and e represent results from the analyticalvalues, while Psim, esim, and SDe represent the probability, mean error value, andstandard deviation of the error value, respectively, based on simulations of the model.

72

Country αstr estr σstr αonn eonn σonn αev eev σev

Canada 0.0180 1.8202 0.0294 0.8000 1.5836 0.0902 1.7000 1.6752 0.0419Denmark 0.0180 2.4513 0.0331 0.7000 2.7047 0.0612 1.5691 2.4546 0.0423France 0.0190 1.8906 0.0380 0.9382 2.3256 0.0859 1.7191 2.0582 0.0430Japan 0.0160 1.6362 0.0269 0.5382 1.8272 0.0559 1.2500 1.6697 0.0408Luxembourg 0.0184 1.2267 0.0272 0.7382 1.3630 0.0445 1.7500 1.0882 0.0285Netherlands 0.0180 1.7706 0.0354 0.7000 1.8861 0.0777 1.5000 1.7402 0.0380New Zealand 0.0180 2.0514 0.0260 0.7000 2.3666 0.0552 1.5691 2.0839 0.0481Norway 0.0174 1.9642 0.0283 0.8000 1.8212 0.0727 1.6500 1.8113 0.0366Sweden 0.0190 2.1300 0.0420 0.9382 1.9900 0.0877 1.8000 1.9539 0.0361UK 0.0174 1.5871 0.0277 0.6382 1.6078 0.0609 1.4500 1.5462 0.0427Israel 0.0190 2.8385 0.0528 0.9382 2.9417 0.0776 1.6500 3.0385 0.0353Germany 0.0180 1.6012 0.0258 0.8000 1.7534 0.0656 1.7000 1.4977 0.0383

Table H.2: Single measure model fits and errors. 3600 simulations per parameter set were run for the strength, TO models(denoted by “onn”), and eigenvector centrality models. Models were fitted using the Golden Section Search algorithm. αX, eX,and σX represent the coefficient, mean error, and standard deviation of the error for each model, respectively, with “str”, “onn”,and “ev” referring to strength, TO, and eigenvector centrality models. Red text means that the model’s error rate is lower thanthe baseline model at a statistically significant level, based on t-tests [75].

73

Country P0,str αstr estr σstr P0,onn αonn eonn σonn P0,ev αev eev σev

Canada 0.5367 0.0000 1.6364 0.0319 0.5498 0.0613 1.6494 0.0699 0.5606 0.0500 1.6526 0.0528Denmark 0.5488 0.0013 2.3080 0.0308 0.5676 0.0000 2.3440 0.0674 0.5676 -0.0619 2.3257 0.0489France 0.7355 0.0000 1.4798 0.0281 0.7697 0.0000 1.5459 0.0728 0.7400 0.0461 1.5101 0.0490Japan 0.3741 0.0001 1.6545 0.0261 0.4145 -0.0248 1.6959 0.0600 0.3757 -0.0464 1.6707 0.0408Luxembourg 0.5461 0.0006 1.0263 0.0241 0.5676 0.0000 1.0650 0.0513 0.5676 0.0500 1.0464 0.0386Netherlands 0.4774 0.0000 1.7043 0.0314 0.5287 -0.0489 1.7263 0.0640 0.5000 0.0000 1.7143 0.0488New Zealand 0.5119 0.0004 1.9868 0.0308 0.5276 0.0000 2.0137 0.0729 0.5482 -0.0506 1.9915 0.0537Norway 0.5498 -0.0006 1.7836 0.0279 0.5223 -0.0155 1.8136 0.0620 0.5135 0.0757 1.7909 0.0455Sweden 0.7197 0.0000 1.6735 0.0313 0.7238 0.0170 1.7005 0.0573 0.7403 -0.0757 1.6863 0.0508UK 0.4861 0.0000 1.4882 0.0290 0.4800 0.0000 1.5227 0.0711 0.4818 -0.0409 1.5004 0.0525Israel 0.8087 0.0000 2.1497 0.0269 0.8506 -0.0511 2.1846 0.0685 0.7968 0.0607 2.1679 0.0507Germany 0.5876 0.0000 1.4238 0.0312 0.5876 0.0000 1.4628 0.0654 0.5678 -0.0505 1.4460 0.0485

Table H.3: Single measure models with perturbed probabilities were fitted and errors were calculated. This represents 3600simulations per perturbation, with 50 iterations of simulated annealing used. The symbols αX , eX , and ˆsigmaX represent thecoefficient, mean error, and standard deviation of the error for each model, respectively. “str”, “onn”, and “ev” refer to thestrength, clustering coefficient (TO), and eigenvector centrality models. Red text means that the model’s error rate is lower thanthe baseline model at a statistically significant level, based on t-tests [75].

74

Country α1 α2 Pnew e σCanada 0.5000 0.5000 0.5000 1.6975 0.0179Denmark 0.4993 0.4993 0.4997 2.4021 0.0170France 0.1667 0.1667 0.3333 1.4554 0.0200Japan 0.5000 0.4688 0.4688 1.8660 0.0169Luxembourg 0.4995 0.4995 0.5000 1.0843 0.0166Netherlands 0.5000 0.5000 0.5000 1.7497 0.0160New Zealand 0.5000 0.5000 0.5000 2.0284 0.0212Norway 0.5000 0.5000 0.5000 1.8352 0.0151Sweden 0.5417 0.5417 0.3333 1.7860 0.0211UK 0.5000 0.5000 0.5000 1.5549 0.0181Israel 0.5833 0.5833 0.1667 2.2964 0.0233Germany 0.5208 0.5208 0.4167 1.4605 0.0199

Table H.4: Preferential attachment model fits, fitted using the Nelder-Mead algorithm,with 50 iterations, and 3600 simulations per parameter set. Red text means that themodel’s error rate is lower than the baseline model at a statistically significant level,based on t-tests [75].

75

References

[1] D. Acemoglu and J.A. Robinson. Economic Origins of Dictatorship and Democ-

racy. Cambridge University Press, 2006.

[2] L.A. Adamic, B.A. Huberman, A.L. Barabasi, R. Albert, H. Jeong, and G. Bian-

coni. Power-Law Distribution of the World Wide Web. Science, 287(5461):2115,

2000.

[3] A. Agresti. Categorical Data Analysis. Wiley-Interscience, 2002.

[4] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and

M. Protasi. Complexity and Approximation: Combinatorial Optimization Prob-

lems and Their Approximability Properties. Springer Verlag, 1999.

[5] A.L. Barabasi and R. Albert. Emergence of Scaling in Random Networks. Sci-

ence, 286(5439):509, 1999.

[6] L. Beckman. The Competent Cabinet? Ministers in Sweden and the Problem

of Competence and Democracy. Scandinavian Political Studies, 29(2):111–129,

2006.

[7] S. Berlinski, T. Dewan, and K. Dowding. The Length of Ministerial Tenure in the

United Kingdom, 1945–97. British Journal of Political Science, 37(02):245–262,

2007.

[8] P.M. Blau. Inequality and Heterogeneity: A Primitive Theory of Social Structure.

Free Press New York, 1977.

[9] B. Bollobas. Modern Graph Theory. Springer, 1998.

[10] P. Bonacich. Factoring and Weighting Approaches to Status Scores and Clique

Identification. Journal of Mathematical Sociology, 2(1):113–120, 1972.

76

[11] R.S. Burt. Structural Holes: The Social Structure of Competition. Belknap Press,

1995.

[12] R.S. Burt. Structural Holes and Good Ideas. American Journal of Sociology,

110(2):349–399, 2004.

[13] C.T. Butts. Revisiting the Foundations of Network Analysis. Science,

325(5939):414, 2009.

[14] W.K.T. Cho and J.H. Fowler. Legislative Success in a Small World: Social

Network Analysis and the Dynamics of Congressional Legislation. Journal of

Politics, forthcoming.

[15] J.G. Cleary and L.E. Trigg. K*: An instance-based learner using an entropic

distance measure. In 12th International Conference on Machine Learning, pages

108–114, 1995.

[16] G. Csardi and T. Nepusz. The igraph Software Package for Complex Network

Research. InterJournal Complex Systems, 1695, 2006.

[17] L. Davis. Genetic Algorithms and Simulated Annealing. Morgan Kaufmann

Publishers Inc. San Francisco, CA, USA, 1987.

[18] D. Diermeier and A. Merlo. Government Turnover in Parliamentary Democra-

cies. Journal of Economic Theory, 94(1):46–79, 2000.

[19] P. Eades. A Heuristic for Graph Drawing. Congressus Numerantium, 42:194–202,

1984.

[20] A.Q. Flores. The Political Survival of Foreign Ministers. Foreign Policy Analysis,

5:117–133, 2009.

[21] J.H. Fowler, T.R. Johnson, J.F. Spriggs II, S. Jeon, and P.J. Wahlbeck. Net-

work Analysis and the Law: Measuring the Legal Importance of Supreme Court

Precedents. Political Analysis, 15(3):324–346, 2007.

[22] L.C. Freeman. A Set of Measures of Centrality Based on Betweenness. Sociom-

etry, 40(1):35–41, 1977.

[23] L.C. Freeman. Centrality in Social Networks: Conceptual Clarification. Social

networks, 1(3):215–239, 1979.

77

[24] T.M.J. Fruchterman and E.M. Reingold. Graph Drawing by Force-Directed

Placement. Software: Practice and Experience, 21(11):1129–1164, 1991.

[25] M.S. Granovetter. The Strength of Weak Ties. American Journal of Sociology,

78(6):1360, 1973.

[26] W. Gryc. The Social and Communication Networks of a Grassroots Organi-

zation in Kibera, Kenya. Proceedings of the 2009 International Workshop on

Intercultural Collaboration, 2009.

[27] R. Guimera, B. Uzzi, J. Spiro, and L.A.N. Amaral. Team Assembly Mechanisms

Determine Collaboration Network Structure and Team Performance. Science,

308(5722):697–702, 2005.

[28] R. Gulati, N. Nohria, and A. Zaheer. Strategic Networks. Strategic Management

Journal, 21(3):203–215, 2000.

[29] S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall, 1999.

[30] P. Holme, S. Min Park, B.J. Kim, and C.R. Edling. Korean University Life in

a Network Perspective: Dynamics of a Large Affiliation Network. Physica A,

373:821–830, 2007.

[31] D.R. Hunter, M.S. Handcock, C.T. Butts, S.M. Goodreau, and M. Morris.

ERGM: A Package to Fit, Simulate and Diagnose Exponential-Family Models

for Networks. Journal of Statistical Software, 24(3):1–29, 2008.

[32] G. Kalna and D.J. Higham. A Clustering Coefficient for Weighted Networks,

With Application to Gene Expression Data. AI Communications, 20(4):263–

271, 2007.

[33] L. Katz. A New Status Index Derived from Sociometric Analysis. Psychometrika,

18(1):39–43, 1953.

[34] G. King, J.E. Alt, N.E. Burns, and M. Laver. A Unified Model of Cabinet Dis-

solution in Parliamentary Democracies. American Journal of Political Science,

pages 846–871, 1990.

[35] J.C. Lagarias, J.A. Reeds, M.H. Wright, and P.E. Wright. Convergence Proper-

ties of the Nelder-Mead Simplex Method in Low Dimensions. SIAM Journal on

Optimization, 9(1):112–147, 1999.

78

[36] S. Le Cessie and J.C. Van Houwelingen. Ridge Estimators in Logistic Regression.

Applied Statistics, pages 191–201, 1992.

[37] M. Leiserson. Factions and Coalitions in One-Party Japan: An Interpretation

Based on the Theory of Games. The American Political Science Review, pages

770–787, 1968.

[38] N. Lin. Social Networks and Status Attainment. Annual Review of Sociology,

25(1):467–487, 1999.

[39] N. Lin. Building a Network Theory of Social Capital. Social Capital: Theory

and Research, pages 3–29, 2001.

[40] C.D. Manning, P. Raghavan, and H. Schtze. Introduction to Information Re-

trieval. Cambridge University Press New York, NY, USA, 2008.

[41] M. McPherson, L. Smith-Lovin, and J.M. Cook. Birds of a Feather: Homophily

in Social Networks. Annual Review of Sociology, 27(1):415–444, 2001.

[42] R.K. Merton. The Matthew Effect in Science: The Reward and Communication

Systems of Science are Considered. Science, 159(3810):56–63, 1968.

[43] C.D. Meyer. Matrix Analysis and Applied Linear Algebra. SIAM, 2000.

[44] J.L. Moreno and H.H. Jennings. Statistics of Social Configurations. Sociometry,

1(3/4):342–374, 1938.

[45] J.A. Nelder and R. Mead. A Simplex Method for Function Minimization. The

Computer Journal, 7(4):308, 1965.

[46] M.E.J. Newman. Mixing Patterns in Networks. Physical Review E, 67(2):026126,

2003.

[47] M.E.J. Newman. The Structure and Function of Complex Networks. SIAM

Review, 45(2):167–256, 2003.

[48] M.E.J. Newman. The Mathematics of Networks. The New Palgrave Encyclopedia

of Economics, 2007.

[49] M.E.J. Newman and M. Girvan. Finding and Evaluating Community Structure

in Networks. Physical Review E, 69(2):26113, 2004.

79

[50] A. Noack. Modularity Clustering is Force-Directed Layout. Physical Review E,

79(2):26102, 2009.

[51] J.P. Onnela, J. Saramaki, J. Kertesz, and K. Kaski. Intensity and Coherence of

Motifs in Weighted Complex Networks. Physical Review E, 71(6):65103, 2005.

[52] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Rank-

ing: Bringing Order to the Web. 1998.

[53] P. Pattison and G. Robins. Neighborhood-Based Models for Social Networks.

Sociological Methodology, pages 301–337, 2002.

[54] M.A. Porter, P.J. Mucha, MEJ Newman, and AJ Friend. Community Structure

in the United States House of Representatives. Physica A: Statistical Mechanics

and its Applications, 386(1):414–438, 2007.

[55] M.A. Porter, P.J. Mucha, MEJ Newman, and C.M. Warmbrand. A Network

Analysis of Committees in the US House of Representatives. Proceedings of the

National Academy of Sciences, 102(20):7057–7062, 2005.

[56] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical

Recipes in C: The Art of Scientific Computing. Cambridge University Press,

1992.

[57] D.S. Price. A General Theory of Bibliometric and Other Cumulative Advantage

Processes. Journal of the American Society for Information Science, 27(5), 1976.

[58] A. Quarteroni, R. Sacco, and F. Saleri. Numerical Mathematics. Springer, 2006.

[59] R. Rose. The Making of Cabinet Ministers. British Journal of Political Science,

pages 393–414, 1971.

[60] S.M. Ross. Introduction to Probability Models. Elsevier, 2007.

[61] S. Saavedra, F. Reed-Tsochas, and B. Uzzi. A Simple Model of Bipartite Coop-

eration for Ecological and Organizational Networks. Nature, 457(7228):463–466,

2008.

[62] J. Saramaki, M. Kivela, J.P. Onnela, K. Kaski, and J. Kertesz. Generalizations

of the Clustering Coefficient to Weighted Complex Networks. Physical Review

E, 75(2):27105, 2007.

80

[63] H.A. Simon. On a Class of Skew Distribution Functions. Biometrika, 42(3-

4):425–440, 1955.

[64] J. Skvoretz. Salience, Heterogeneity and Consolidation of Parameters: Civilizing

Blau’s Primitive Theory. American Sociological Review, pages 360–375, 1983.

[65] J. Skvoretz and K. Faust. Logit Models for Affiliation Networks. Sociological

Methodology, pages 253–280, 1999.

[66] M. Spivak. Calculus. Addison Wesley, 1967.

[67] K. Strom, I. Budge, and M.J. Laver. Constraints on Cabinet Formation in

Parliamentary Democracies. American Journal of Political Science, pages 303–

335, 1994.

[68] M. Tavits. The Role of Parties’ Past Behavior in Coalition Formation. American

Political Science Review, 102(04):495–507, 2008.

[69] B. Uzzi, D. Diermeier, P.J. Mucha, and M.A. Porter. Personal communication,

July 2009.

[70] W.D. Wallis. A Beginner’s Guide to Graph Theory. Birkauser, 2000.

[71] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications.

Cambridge University Press, 2005.

[72] S. Wasserman and P. Pattison. Logit Models and Logistic Regressions for So-

cial Networks: I. An Introduction to Markov Graphs and p∗. Psychometrika,

61(3):401–425, 1996.

[73] D.J. Watts and S.H. Strogatz. Collective Dynamics of ‘Small-World’ Networks.

Nature, 393(6684):440, 1998.

[74] A.S. Waugh, L. Pei, J.H. Fowler, P.J. Mucha, and M.A. Porter. Party Polariza-

tion in Congress: A Social Networks Approach. arXiv:0907.3509v1, 2009.

[75] B.L. Welch. The Generalization of “Student’s” Problem When Several Different

Population Variances are Involved. Biometrika, 34(1-2):28–35, 1947.

[76] I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and

Techniques. Elsevier, second edition, 2005.

81

[77] J. Woldendorp, H. Keman, and I. Budge. Handbook of Democratic Government:

Party Government in 20 Democracies (1945-1990). Kluwer Academic Publishers,

Dordrecht; Boston, 1993.

[78] B. Zhang and S. Horvath. A General Framework for Weighted Gene Co-

Expression Network Analysis. Statistical Applications in Genetics and Molecular

Biology, 4(1):1128, 2005.

[79] Y. Zhang, A.J. Friend, A.L. Traud, M.A. Porter, J.H. Fowler, and P.J. Mucha.

Community Structure in Congressional Cosponsorship Networks. Physica A,

387:1705–1712, 2008.

82

Documents

Modelling Cabinet Networks in Parliamentary Democraciesmason/research/gryc_dissertation_final.pdfModelling Cabinet Networks in Parliamentary Democracies Wojciech Gryc Trinity College