Validating Wordscores

Bastiaan Bruinsma, Kostas Gemenis

Universiteit Twente

5th EPSA General Conference, Vienna, 25-27 June 2015

Page 2: Validating Wordscores

Computer assisted methods for text analysis

analyzing massive collections of text has been essentially impossible for all but the most well-funded projects.

We show how automated content methods can make possible the previously impossible in political science: the systematic analysis of large-scale text collections without massive funding support. Across all subfields of political science, scholars have developed or imported methods that facilitate substantively important inferences about politics from large text collections. We provide a guide to this exciting area of research, identify common misconceptions and errors, and offer guidelines on how to use text methods for social scientific research.

We emphasize that the complexity of language implies that automated content analysis methods will never replace careful and close reading of texts. Rather, the methods that we profile here are best thought of as amplifying and augmenting careful reading and thoughtful analysis. Further, automated content methods are incorrect models of language. This means that the performance of any one method on a new data set cannot be guaranteed, and therefore validation is essential when applying automated content methods. We describe best practice validations across diverse research objectives and models.

Before proceeding we provide a road map for our tour. Figure 1 provides a visual overview of automated content analysis methods and outlines the process of moving from collecting texts to applying statistical methods. This process begins at the top left of Fig. 1, where the texts are initially collected. The burst of interest in automated content methods is partly due to the proliferation of easy-to-obtain electronic texts. In Section 3, we describe document collections which political scientists have successfully used for automated content analysis and identify methods for efficiently collecting new texts.

With these texts, we overview methods that accomplish two broad tasks: classification and scaling. Classification organizes texts into a set of categories. Sometimes researchers know the categories beforehand. In this case, automated methods can minimize the amount of labor needed to classify documents. Dictionary methods, for example, use the frequency of key words to determine a document’s class (Section 5.1). But applying dictionaries outside the domain for which they were developed can lead to serious errors. One way to improve upon dictionaries are

Fig. 1 An overview of text as data methods.

Justin Grimmer and Brandon M. Stewart


Page 3: Validating Wordscores

Wordscores

- Originally proposed by Laver, Benoit & Garry (2003)
- Popular tool (869 citations on Google Scholar)
- Developed for political manifestos, but also used to study:
  - Party mergers, electoral coalitions, policy preferences, speeches, reports from US state lotteries, Chinese newspaper articles, public statements by US Senators, open-ended questions ...
- Attempts at validation are rather limited

Page 4: Validating Wordscores

How Wordscores Works
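
The slide's diagram did not survive extraction. As a rough stand-in, here is a minimal sketch of the raw scoring step as described by Laver, Benoit & Garry (2003): each word gets a score equal to the reference positions weighted by how strongly the word points to each reference text, and a virgin text gets the frequency-weighted mean of its scored words. The function names, the toy tokens and the positions 0 and 10 are illustrative assumptions, and the sketch leaves out the rescaling transformations (LBG, MV) and uncertainty estimates.

```python
from collections import Counter


def word_scores(ref_texts, ref_positions):
    """Score every word that occurs in the reference texts.

    ref_texts: dict mapping text name -> list of tokens (hypothetical input format)
    ref_positions: dict mapping text name -> a priori position of that text
    """
    # Relative frequency of each word within each reference text.
    rel_freq = {}
    for name, tokens in ref_texts.items():
        counts = Counter(tokens)
        total = sum(counts.values())
        rel_freq[name] = {w: c / total for w, c in counts.items()}

    vocab = set().union(*(f.keys() for f in rel_freq.values()))
    scores = {}
    for w in vocab:
        denom = sum(f.get(w, 0.0) for f in rel_freq.values())
        # P(r | w): probability of reading text r given that we see word w.
        scores[w] = sum(
            (f.get(w, 0.0) / denom) * ref_positions[name]
            for name, f in rel_freq.items()
        )
    return scores


def score_virgin_text(tokens, scores):
    """Raw text score: mean word score, weighted by frequency among scored words."""
    counts = Counter(t for t in tokens if t in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("virgin text shares no words with the reference texts")
    return sum(c / total * scores[w] for w, c in counts.items())


# Toy example with made-up tokens and positions.
refs = {
    "left": ["welfare", "welfare", "tax", "state"],
    "right": ["market", "market", "tax", "cut"],
}
positions = {"left": 0.0, "right": 10.0}
scores = word_scores(refs, positions)
raw = score_virgin_text(["tax", "market", "welfare", "unknown"], scores)
print(scores["tax"], raw)
```

Words shared equally by both reference texts (here "tax") land midway between the two positions, and words absent from the reference texts are ignored. This is one reason raw virgin scores tend to cluster near the middle of the scale and are usually rescaled before comparison.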


Page 5: Validating Wordscores

Previous attempts at validation

- Mostly against CMP data, though Benoit & Laver (2007) advise against this
- Only assess criterion validity
- Only assess ordinal placement (Hjorth et al. 2015)
- Only use Spearman’s ρ or Pearson’s r (and thus no assessment of systematic measurement error)

Page 6: Validating Wordscores

Replication of the original Laver et al. article

Table 1: Replication of the original scores

[Table and figure not recoverable from extraction. Recovered labels: Number of Parties (5 parties, 7 parties); Stata Version; 0.36; 23-Jun-2009; Laver et al. (2003); Laver et al. (2003) Replication Material. The panels place the parties DL, Labour, FF, FG and PD (plus SF and Greens in the 7-party case) on the EC and SO dimensions, each on a 0-20 scale.]

Page 7: Validating Wordscores

Hjorth et al. validation

[Figure not recoverable from extraction. One panel per election year (1945, 1950, 1953, 1957, 1960, 1964, 1966, 1968, 1971, 1973, 1977, 1979, 1981, 1984, 1987, 1988, 1990, 1994, 1998, 2001, 2005, 2007); axis labels recovered as "ws_", "rank" and "exp", each running from low to high.]

Page 8: Validating Wordscores

Study Design

- Documents
  - Using 2004 Euromanifestos to score 2009 Euromanifestos
  - Euromanifestos obtained from the Manifesto Project Database
- Reference scores
  - Chapel Hill Expert Study (2002), Benoit & Laver Expert Survey (2003-2004), Euromanifestos Project (2004)
- Comparison
  - Chapel Hill Expert Study (2010), EU Profiler (2009), Euromanifestos Project (2009)
- Analysis
  - Use Lin’s Concordance Correlation Coefficient instead of Spearman’s ρ or Pearson’s r
  - 25 countries/territories × 4 dimensions × 3 reference scores × 2 transformations = 600 analyses
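
To illustrate why the analysis prefers Lin's Concordance Correlation Coefficient, the sketch below computes it next to Pearson's r on made-up numbers. It uses Lin's standard definition with population (1/n) moments, ρc = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²). The data and function names are illustrative assumptions, not values from the study.

```python
from statistics import fmean


def pearson_r(x, y):
    """Pearson's product-moment correlation."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient, using 1/n moments."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    # The (mx - my)^2 term penalises a systematic shift between the two measures.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical positions: the estimates track the expert scores perfectly
# but sit three points too high on the same scale.
expert = [2.0, 4.0, 6.0, 8.0]
estimate = [5.0, 7.0, 9.0, 11.0]
print(pearson_r(expert, estimate))  # perfect linear association
print(lin_ccc(expert, estimate))    # lower, because of the constant offset
```

Pearson's r and Spearman's ρ are unaffected by the offset, whereas the concordance coefficient drops. That offset is the kind of systematic measurement error a correlation-only validation cannot detect.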


Page 13: Validating Wordscores

Types of validity

Following Carmines & Zeller (1979):

- Content Validity
  - Does the method represent all facets of a construct?
- Construct Validity
  - Does the method behave as expected within a given theoretical context?
- Criterion Validity
  - Does the method correlate with other measures reflecting the same concept?


Page 17: Validating Wordscores

Content validity for EU Integration

[Figure not recoverable from extraction. Density plots of mean word relevance (x-axis: word relevance (mean), 0 to 1; y-axis: Density), one panel each for BNP, CONSERVATIVES, GREENS, LABOUR, LIBDEM, PC, SNP, UKIP and Total.]

Page 18: Validating Wordscores

Construct validity

[Figure not recoverable from extraction. Two panels, McFadden's R Squared and Count R Squared (x-axes, 0 to 1), by transformation (LBG, MV), with reference scores from BL, CHES and EMP.]

Page 19: Validating Wordscores

Criterion validity

[Figure not recoverable from extraction. EU Integration Dimension: four panels showing the Concordance Correlation Coefficient (x-axis, 0 to 1) by comparison source (CHES, EUP, EMP). Panels: LBG Transformation, Per Country Rescaling; LBG Transformation, Whole Dimension Rescaling; MV Transformation, Per Country Rescaling; MV Transformation, Whole Dimension Rescaling. Reference scores from BL, CHES and EMP.]

Page 20: Validating Wordscores

Conclusion

- No serious validation of Wordscores until now
- This validation found it lacking on content, construct and criterion validity
- Wordscores should not be used to estimate parties’ policy positions using electoral manifestos as reference and virgin texts

Page 21: Validating Wordscores

Outlook

- Wordscores might still be useful in other applications where the assumptions of ideal point estimation for words might be approximated
- However, a case-by-case validation should be applied
