Eidgenössische Technische Hochschule Zürich
Swiss Federal Institute of Technology Zurich
International Relations
Overview of the Possibilities of
Quantitative Methods in Political Science
Tobias Böhmelt
ETH Zurich
Overview
• Introduction
• EITM - The Importance of Methods
• Choice of Methods
• What is Quantitative Methodology?
• The Approach of Quantitative Methods in Political Science
• Short Overview of Possibilities
• Some Problems and Caveats
• Conclusion
Introduction
• What do I hope to accomplish?
– Teaching you in-depth knowledge of some quantitative approaches?
– Teaching you how to employ quantitative methods?
– Making you familiar with statistical software packages?
• The answer is simple – no.
• Instead:
– Clarify the value and challenges of quantitative research.
– Help you to get interested in these methods for conducting more effective research.
EITM – The Importance of Methods: Why Do We Need Methods to
Answer Questions in Political Science?
EITM – Empirical Implications of Theoretical Models
• Prerogative of theory.
• Characteristics of theory determine the testing method: scope and generality, parsimony and complexity, prediction and explanation.
• Estimating average causal effects or explaining the complexity of a single event?
• The “degrees of freedom problem”: most theories argue ceteris paribus, so other effects have to be controlled for. This is often not possible with one or two cases.
• Is it important how much a variable matters or just that it matters?
• Case selection: selection bias, self-selection, selection on the dependent
variable, and lack of independence of cases lead to false conclusions.
EITM – The Importance of Methods
The Basic Research Design Problem
• N problems = ∞.
• For any problem, N theories = ∞.
• For any theory, N models = ∞.
• For any problem, the number of empirical specifications = ∞.
This has implications for the use of methods!
EITM – The Importance of Methods
• Science contributes to society by simplifying complex phenomena.
– Its value increases with the value of the simplification.
• Interesting topics per se are insufficient.
– You must be able to lead people from where they are to a better conclusion.
1. The goal is inference.
2. The procedures are public.
3. The conclusions are uncertain.
4. The content is the method.
Choice of Methods
Factors Influencing the Research Outcome – A Methods Perspective
• The chosen theoretical approach (paradigm) affects the results – approaches often
predefine the method to be applied for testing hypotheses.
• The method you choose to test propositions impacts the results you get: quantitative
vs. qualitative approaches. Scope and generalizability are crucial!
• Case selection: the selection of cases on the basis of the dependent variable
impedes the accumulation of knowledge: this leads to selection bias.
• Careful case selection on explanatory variables is crucial in order to obtain reliable
and valid results.
• Selection criteria should be explicitly stated to ensure replicability and show how
selection possibly drives the results.
Choice of Methods
Different Methods Have Different Comparative Advantages
• Deduction: method follows theory:
– Test implications of theories against empirical observations.
– Hypothesis testing: logic of confirmation.
• Induction: method used to create or amend theories:
– Develop theories: induction, hypothesis formation by studying deviant
and outlier cases, historical explanation of individual cases.
– Modify theories: adapt theories to outliers.
Choice of Methods
• Trade-off between explanation and prediction.
• In general: quantitative methods have a high predictive power and
qualitative a high explanatory power.
• Theory testing often requires the combination of qualitative and
quantitative methods:
– qualitative research looks at outliers of a quantitative analysis.
– case studies identify important variables and conceptualize variables.
– study the crucial case to test the underlying causal mechanism.
– study deviant or outlier cases to analyze why these cases do not fit the theory.
– study important historical cases.
What is Quantitative Methodology?
Simple Example Demonstrating the
“Usefulness” of Statistics:
Homer is questioned about his newly
formed vigilante group.
Newscaster: “Since your group started up,
petty crime is down 20%, but other crimes
are up. Such as heavy sack beating, which
is up 800%. So you're actually increasing
crime.”
Homer: “You can make up statistics to
prove anything.”
Has to do with “numbers”…
What is Quantitative Methodology?
Curtis Signorino (1999), “How to Translate a Theory into a Statistical
Model”:
1. Specify the theoretical choice model.
2. Add a random component (the source of uncertainty).
3. Derive the probability model associated with one's dependent
variable.
4. Construct a likelihood equation based on the probability model.
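The four steps can be illustrated with a minimal sketch for a binary dependent variable. The logistic form of the random component, the function names, and the toy data are assumptions made for illustration; they are not taken from Signorino (1999).

```python
import math

def choice_probability(x, a, b):
    """Steps 1-3: a latent utility a + b*x plus a logistic random
    component implies P(y = 1 | x) = 1 / (1 + exp(-(a + b*x)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def log_likelihood(a, b, xs, ys):
    """Step 4: log-likelihood of the observed binary outcomes."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = choice_probability(x, a, b)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Toy data (hypothetical): the outcome tends to be 1 for larger x.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

ll_flat = log_likelihood(0.0, 0.0, xs, ys)   # x plays no role
ll_slope = log_likelihood(0.0, 1.0, xs, ys)  # positive effect of x
print(ll_flat, ll_slope)
```

Estimation then means searching for the values of a and b that make the log-likelihood as large as possible; here the model with a positive slope fits the toy data better than the flat one.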
What is Quantitative Methodology?
• Research techniques that are used to gather and analyze quantitative
data, i.e., information dealing with anything that is measurable.
• Descriptive statistics: description of central variables by statistical
measures such as median, mean, standard deviation and variance.
• Inferential statistics: test for a relationship between variables – at least
one explanatory factor and one dependent variable.
• Inference is the goal:
– is it possible to generalize the regression results for the sample under
observation to the universe of cases (the population)?
– can you draw conclusions for individuals, countries, and time-points beyond
those observations in your data-set?
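The difference between describing a sample and inferring beyond it can be sketched in a few lines. The sample values are hypothetical, and the interval uses the normal approximation (1.96), which is only a rough guide for a sample this small.

```python
import math
import statistics

# Hypothetical sample of a measured variable (e.g. a 0-10 score).
sample = [2, 4, 4, 5, 7, 8, 9, 9]

# Descriptive statistics: they summarize the sample itself.
mean = statistics.mean(sample)
median = statistics.median(sample)
variance = statistics.variance(sample)  # sample variance, n - 1 in the denominator
std_dev = statistics.stdev(sample)

# Inferential step: how precisely does the sample mean estimate the
# population mean? Standard error and an approximate 95% interval.
std_error = std_dev / math.sqrt(len(sample))
ci_low = mean - 1.96 * std_error
ci_high = mean + 1.96 * std_error

print(mean, median, variance, std_dev)
print(ci_low, ci_high)
```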
What is Quantitative Methodology?
• For the application of quantitative data analysis it is crucial that the
selected method is appropriate for the data structure:
• Dependent Variable:
– Dimensionality: spatial and dynamic.
– Continuous or discrete.
– Binary, ordinal categories, count.
– Distribution: normal, logistic, Poisson, negative binomial.
• Critical points:
– Measurement level of the DV and IV.
– Expected and actual distribution of the variables.
– Number of observations and variance.
What is Quantitative Methodology?
Definition of Key Concepts:
• Variable: a variable is any measured characteristic or attribute that has the
potential to differ for different subjects.
• Independent variables – explanatory variables – exogenous variables –
explanans: variables that are causal for a specific outcome (necessary
conditions).
• Intervening variables: factors that impact the influence of independent
variables, variables that interact with explanatory variables and alter the
outcome (sufficient conditions).
• Dependent variables – endogenous variables – explanandum: outcome
variables, that we want to explain.
What is Quantitative Methodology?
Definition of Key Concepts:
• Sample: a specific subset of a population (the universe of cases)
– Samples can be random or non-random (selected)
– For most simple statistical models random samples are a crucial prerequisite
• Random sample: drawn from the population in a way that every item in the
population has the same opportunity of being drawn – the observations of the
random sample are thus independent of each other.
• Sampling error: one sample will usually not be completely representative of the
population from which it was drawn – this random variation in the results is known as
sampling error.
• For random samples, mathematical theory is available to assess the sampling error:
estimates obtained from random samples can be combined with measures of the
uncertainty associated with the estimate, e.g. standard errors and confidence intervals.
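Sampling error can be made visible with a small simulation. This is a sketch under stated assumptions: the population is simulated (normal, mean 50, standard deviation 10), and the seed, sample size, and number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 values.
population = [rng.gauss(50, 10) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 100
# Draw many simple random samples and record the mean of each one.
sample_means = [
    statistics.fmean(rng.sample(population, n)) for _ in range(2_000)
]

# The sample means scatter around the population mean (sampling error).
# Their spread is close to what theory predicts: sd / sqrt(n).
empirical_se = statistics.stdev(sample_means)
theoretical_se = pop_sd / math.sqrt(n)
print(pop_mean, statistics.fmean(sample_means))
print(empirical_se, theoretical_se)
```

No single sample reproduces the population mean exactly, but because the samples are random, the size of the typical deviation is known in advance.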
What is Quantitative Methodology?
Random Samples
• Observations are independent of each other.
• The random sample mimics the distribution and all characteristics of the underlying
population.
• Sampling error is white noise, a random component with no structure, and can
therefore be assessed by mathematical and statistical tools.
• Often: not observing a random sample renders statistical results biased and
unreliable.
Selected Samples
• Sample selected on the basis of a specific criterion connected with the dependent
variable.
• Sample selection often precludes inference beyond the sample and renders
estimation results biased.
• One has to be aware of possible sample selection and account for the possible bias
especially of test statistics.
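The bias from selecting on the dependent variable can be illustrated with simulated data. In this sketch the true slope of y on x is set to 1.0 by construction; the cutoff y > 1.0, the seed, and the helper `ols_slope` are illustrative assumptions.

```python
import random

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(1)

# Hypothetical population: y = 1.0 * x + noise.
xs = [rng.gauss(0, 1) for _ in range(20_000)]
ys = [x + rng.gauss(0, 1) for x in xs]

slope_full = ols_slope(xs, ys)

# Selected sample: keep only cases with a high value of the
# dependent variable.
selected = [(x, y) for x, y in zip(xs, ys) if y > 1.0]
slope_selected = ols_slope(
    [x for x, _ in selected], [y for _, y in selected]
)
print(slope_full, slope_selected)
```

In the selected sample the estimated slope is clearly smaller than the true value, although nothing about the underlying relationship has changed.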
The Approach of Quantitative Political Science
Datasets
• Datasets contain dependent, independent, and intervening variables
for a specific sample in order to answer a research question or test
specific theoretical propositions.
• All variables in the data have the same dimensionality (observations
for the same cases, units, and time points).
• Variables in a dataset can have different measurement levels, types, and
distributions.
The Approach of Quantitative Political Science – Types of Data
Micro Data: Individual Data
• Survey data: Eurobarometer, National Election Study (US), British Election Study,
socio-economic panel (Germany and other countries).
Macro Data: Aggregated Data at Different Levels
• Economic indicators: Inflation, Unemployment, GDP, growth, population (density)
and demographic data, government spending, public debt, tax rates, government
revenue, interest rates, exchange rates, income distribution, FDI, foreign aid, trade
(exports/imports), number of employees in different sectors etc.
• Political indicators: electoral system (majority, proportional), political system
(parliamentary, presidential, federal), political institutions, number of veto players,
regime type (democracy, autocracy), union density, labor market regulations, wage
negotiation system (corporatism), human and civil rights, economic and financial
openness, political particularism etc.
The Approach of Quantitative Political Science – Types of Data
Dimensionality of the Data
• Cross-sectional data: observations for N units at one point in time.
• Time series data: observations for one unit at different points in time.
• Panel data: observations for N units at T points in time: N is
significantly larger than T – mostly used for micro data – units are
individuals.
• Time series cross section (TSCS) data: panel data, but mostly used for
macro data – aggregated (country) data.
• Cross section time series (CSTS) data: observations for N units at T
points in time: T > N.
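The data layouts differ only in how many units (N) and time points (T) they cover. A minimal sketch, with hypothetical growth rates keyed by (unit, year) and an illustrative helper `dimensions`:

```python
# All values are hypothetical growth rates in percent.
cross_section = {("A", 2000): 1.2, ("B", 2000): 0.4, ("C", 2000): 2.1}
time_series = {("A", 2000): 1.2, ("A", 2001): 1.5, ("A", 2002): 0.9}
panel = {
    (unit, year): value
    for unit, values in {
        "A": [1.2, 1.5, 0.9],
        "B": [0.4, 0.7, 1.1],
        "C": [2.1, 1.8, 2.4],
    }.items()
    for year, value in zip((2000, 2001, 2002), values)
}

def dimensions(records):
    """Return (N units, T time points) for records keyed by (unit, year)."""
    units = {unit for unit, _ in records}
    years = {year for _, year in records}
    return len(units), len(years)

print(dimensions(cross_section))  # (3, 1)
print(dimensions(time_series))    # (1, 3)
print(dimensions(panel))          # (3, 3)
```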
The Approach of Quantitative Political Science – Data Sources
Economic Data
• OECD: national accounts, government revenue, taxation, main economic indicators
(unemployment, inflation, GDP), earnings, labour market, FDI, social expenditure, debt,
employment etc.
• IMF: economic indicators, direction of trade statistics, international financial statistics (interest
rates, exchange rates, capital flows)
• World Bank: economic indicators
• Penn World Table: macro-economic data
• ILO: labour market statistics
• WTO: data on preferential trade agreements etc.
Political Data
• Eurobarometer: regular surveys, microdata for European countries
• Polity: degree of democracy
• Freedom House: human and civil rights
• Correlates of War: MID, alliance, membership in IGOs
• Event data bases: WEIS (World Event Interaction Survey), IDEA
• Cingranelli-Richards (CIRI) Human Rights Database: Political freedom, political rights, civil- and
human rights.
Short Overview of Possibilities
Short Overview of Possibilities: OLS Regression
• A metric variable Y can be determined by a function of X
• The specific values of Y therefore depend on the specific values of X
Y = f(X)
• The most straightforward association of such a relationship is linear
Y = f(X) = a + bX
• The “line” is hence uniquely determined by two factors:
• the constant (a), i.e. the point where the “line” crosses the y-axis
• and the slope (b), i.e. how much Y changes if X is increased by one unit
Short Overview of Possibilities: OLS Regression
We do not have “deterministic” relationships, however! They are hard, if not
impossible, to find in Political Science!
Short Overview of Possibilities: OLS Regression
• It is impossible to find a straight line on which all points lie.
• Nonetheless, you can try to draw a straight line through these points
that describes the underlying relationship in the best way.
• And THIS is exactly what regression analysis tries to do.
• Which straight line is the best, though?
Short Overview of Possibilities: OLS Regression
• The method for doing this is called OLS – ordinary least squares.
• The function shall plot a straight line through the points so that the
squared distances between the actually observed values (yi) and the
values as predicted by the function (ŷi) are minimized when summed up.
• The straight line – or rather the parameters a and b – is chosen so that it
minimizes the sum of the squared residuals ei:
Σ ei² = Σ (yi − ŷi)² = Σ (yi − a − bxi)² → min
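For one explanatory variable the least-squares problem has a well-known closed-form solution: b is the sum of cross-products of deviations divided by the sum of squared deviations of x, and a = ȳ − b·x̄. A minimal sketch with hypothetical data; the function names are illustrative.

```python
def ols_fit(xs, ys):
    """Return the intercept a and slope b of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared distances between observed and predicted values."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = ols_fit(xs, ys)
ssr_best = sum_squared_residuals(a, b, xs, ys)
# Any other line, e.g. one with a shifted intercept or a different
# slope, produces a larger sum of squared residuals.
ssr_shifted = sum_squared_residuals(a + 0.5, b, xs, ys)
ssr_tilted = sum_squared_residuals(a, b + 0.1, xs, ys)
print(a, b, ssr_best, ssr_shifted, ssr_tilted)
```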
Short Overview of Possibilities: OLS Regression
• The equation for the OLS function is written like this:
ŷi = a + bxi
yi = a + bxi + ei
• The “hat” in the first equation demonstrates that we are just dealing
with estimates ŷi that may differ from the actual values of Y.
• Regarding the second equation, the error term ei indicates that not all
values of our observations may be found on the straight line
automatically.
• It is an approach to capture the underlying relationship as closely as
possible!
• It is an estimation!
Short Overview of Possibilities: OLS Regression
• How to determine the “quality” of a regression line?
• Follow the principle of ANOVA: Analysis of Variance.
Short Overview of Possibilities: OLS Regression
regression conflict water

      Source |       SS       df       MS              Number of obs =     557
-------------+------------------------------           F(  1,   555) =  195.62
       Model |   16311.805     1   16311.805           Prob > F      =  0.0000
    Residual |  46278.3932   555  83.3844922           R-squared     =  0.2606
-------------+------------------------------           Adj R-squared =  0.2593
       Total |  62590.1981   556  112.572299           Root MSE      =  9.1315

------------------------------------------------------------------------------
    conflict |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       water |   1.462844   .1045899    13.99   0.000     1.257404    1.668285
       _cons |   34.93685   .6476726    53.94   0.000     33.66466    36.20904
------------------------------------------------------------------------------

yi = a + bxi + ei
conflict = 34.94 + 1.46*water + ei
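The summary statistics in the output follow directly from the sums of squares, which is the ANOVA logic in practice. The sketch below recomputes them from the reported numbers; the variable names are illustrative, and the prediction for water = 10 is a hypothetical example.

```python
import math

# Sums of squares and degrees of freedom as reported in the output above.
ss_model, df_model = 16311.805, 1
ss_residual, df_residual = 46278.3932, 555
ss_total = ss_model + ss_residual
n_obs = df_model + df_residual + 1

# Share of the total variation that the model accounts for.
r_squared = ss_model / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

# F statistic: explained variance per degree of freedom relative to
# the unexplained variance per degree of freedom.
ms_residual = ss_residual / df_residual
f_stat = (ss_model / df_model) / ms_residual
root_mse = math.sqrt(ms_residual)

# t statistic of the slope: coefficient divided by its standard error.
t_water = 1.462844 / 0.1045899

# Predicted conflict level for a hypothetical case with water = 10.
predicted = 34.93685 + 1.462844 * 10

print(round(r_squared, 4), round(adj_r_squared, 4))
print(round(f_stat, 2), round(root_mse, 4), round(t_water, 2))
print(round(predicted, 2))
```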
Problems with Quantitative Research – Stargazing
• Begin with a hunch that a particular variable has an unappreciated association with [environmental conflict].
• A standard regression is run. The analyst looks for “stars.”
• If the stars support the hunch, then the examination stops.
• Otherwise, additional regressions are run. No easily stated theory guides such decisions.
• The process stops when the stars align.
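Why stargazing is dangerous can be shown with pure noise. In this sketch the outcome is unrelated to every candidate variable by construction, yet some regressions still produce “stars”. The seed, the sample sizes, and the use of |t| > 1.96 as an approximate 5% threshold are illustrative assumptions.

```python
import math
import random

def slope_t_statistic(xs, ys):
    """t statistic of the slope in a bivariate least-squares regression."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ssr = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(ssr / (n - 2) / sxx)
    return b / se_b

rng = random.Random(7)
n_obs, n_candidates = 200, 200

# The outcome is random noise: no candidate variable has any effect.
y = [rng.gauss(0, 1) for _ in range(n_obs)]

stars = 0
for _ in range(n_candidates):
    x = [rng.gauss(0, 1) for _ in range(n_obs)]
    if abs(slope_t_statistic(x, y)) > 1.96:
        stars += 1

print(stars, "of", n_candidates, "unrelated variables look significant")
```

With a 5% threshold, roughly one in twenty unrelated variables is expected to look significant by chance, so running regressions until the stars align will eventually succeed even when there is nothing to find.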
Problems with Quantitative Research – Misspecification
• Claim: “X1 has no effect on Y.”
• Evidence: the coefficient of X1 does not achieve a particular level of statistical significance.
– So, X1 does not have a statistically significant effect within the stated model.
• What if the true underlying data generating mechanism is not identical to the structure of the stated model?
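A simulated example shows how a misspecified model can hide a strong effect. Here the true data generating mechanism is quadratic by construction (y = x² + noise), while the stated model is linear in x. The grid of x values, the noise level, and the seed are illustrative assumptions.

```python
import math
import random

def slope_t_statistic(xs, ys):
    """t statistic of the slope in a bivariate least-squares regression."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ssr = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(ssr / (n - 2) / sxx)
    return b / se_b

rng = random.Random(3)

# x on an evenly spaced grid that is symmetric around zero.
xs = [-2 + 4 * i / 500 for i in range(501)]
# True mechanism: y depends strongly on x, but through x squared.
ys = [x ** 2 + rng.gauss(0, 0.5) for x in xs]

# Stated (misspecified) model: y regressed on x.
t_linear = slope_t_statistic(xs, ys)
# Model that matches the true mechanism: y regressed on x squared.
t_squared = slope_t_statistic([x ** 2 for x in xs], ys)

print(round(t_linear, 2), round(t_squared, 2))
```

The linear coefficient is not statistically significant even though x determines most of the variation in y. The correct reading is that x has no significant effect within the stated model, not that x has no effect.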
Problems with Quantitative Research – Remedies
• New estimators.
• Replication data.
• Greater rigor in relations between theoretical models and the empirical models used to evaluate them.
• Increase transparency and build credibility through theoretical development and evaluation.
• The importance of transparency and rigor does not stop
when you have developed an empirical model.
Santiago Ramon y Cajal (1916)
“What a wonderful stimulant it would be for the beginner if his instructor, instead of amazing and dismaying him with the sublimity of great past achievements, would reveal instead the origin of each scientific discovery … – information that, from a human perspective, is essential to an accurate explanation of the discovery.”
Conclusion
• EITM - The Importance of Methods
• Choice of Methods
• What is Quantitative Methodology?
• The Approach of Quantitative Political Science
• Short Overview of Possibilities
• Some Problems and Caveats
• Any questions?