

arXiv:1205.3645v1 [stat.ME] 16 May 2012

Statistical Science

2011, Vol. 26, No. 4, 564–583
DOI: 10.1214/11-STS375
© Institute of Mathematical Statistics, 2011

Forensic Analysis of the Venezuelan Recall Referendum

Raúl Jiménez

Abstract. The best way to reconcile political actors in a controversial electoral process is a full audit. When this is not possible, statistical tools may be useful for measuring the likelihood of the results. The Venezuelan recall referendum (2004) provides a suitable dataset for thinking about this important problem. The cost of errors in examining an allegation of electoral fraud can be enormous. They can range from legitimizing an unfair election to supporting an unfounded accusation, with serious political implications. For this reason, we must be very selective about data, hypotheses and test statistics that will be used. This article offers a critical review of recent statistical literature on the Venezuelan referendum. In addition, we propose a testing methodology, based exclusively on vote counting, that is potentially useful in election forensics. The referendum is reexamined, offering new and intriguing aspects to previous analyses. The main conclusion is that there were a significant number of irregularities in the vote counting that introduced a bias in favor of the winning option. A plausible scenario in which the irregularities could overturn the results is also discussed.

Key words and phrases: Election forensics, Venezuelan presidential elections, Benford’s Law, multivariate hypergeometric distribution.

1. INTRODUCTION

The statistical controversies surrounding the outcomes of the Venezuelan referendum, convened to revoke the mandate of President Chávez on August 15th of 2004, generated a long spate of articles in newspapers and occupied significant television time. A Google search with the exact phrase “Venezuelan recall referendum” shows more than 100,000 hits in English. Several reports, commissioned by different organizations, reached opposite conclusions. Roughly speaking, a fraud may have occurred during the referendum or, on the contrary, may have been statistically undetectable. A good example of this is the work of Hausmann and Rigobon (2011), where the authors claimed to have found statistical evidence of fraud. According to experts consulted by The Wall Street Journal, “the Hausmann/Rigobon study is more credible than many of the other allegations being thrown around” (Luhnow and De Cordoba, 2004). However, their early claim (Hausmann and Rigobon, 2004) was later rejected by The Carter Center [(2005), Appendix 4] and by Weisbrot et al. (2004).

Raúl Jiménez is Associate Professor, Department of Statistics, Universidad Carlos III de Madrid, C./ Madrid, 126 – 28903 Getafe (Madrid), Spain. E-mail: [email protected].

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2011, Vol. 26, No. 4, 564–583. This reprint differs from the original in pagination and typographic detail.

The first peer-reviewed article devoted to the statistical analysis of the referendum data (Febres and Marquez, 2006) concluded that there is statistical evidence for rejecting the official results. This article, in International Statistical Review, made no mention of the paper by Taylor (2005), which concluded explicitly that there is no evidence of fraud. Taylor’s paper is the best-known reference on the subject, widely covered by the media, in part because he was asked to investigate the allegations of fraud


on behalf of The Carter Center. Another well-known reference is a paper by Felten et al. (2004), which did not detect any statistical inconsistency that would indicate obvious fraud in the election. However, three papers in this issue of Statistical Science (Delfino and Salas, 2011; Prado and Sanso, 2011; Pericchi and Torres, 2011) support the claim of fraud. Who is right?

The statistical papers on the referendum can be grouped into two classes: those that only use vote counting and those that use related additional data. The five papers mentioned above cover the different claims of fraud investigated by the panel of experts convened by The Carter Center [(2005), Appendix 13]. These are:

(1) Discrepancy between official results and exit polls (Prado and Sanso, 2011) and unexpected correlations between computerized vote counting, the number of signatures for the recall petition and audit results (Delfino and Salas, 2011).

(2) Anomalous distributions of votes among voting notebooks (Febres and Marquez, 2006; Taylor, 2005), including high rates of ties (Taylor, 2005) and failure of fit to Benford’s Law for significant digits (Pericchi and Torres, 2011; Taylor, 2005).

I am very skeptical about the use of data from other sources. To make a long story short, below I mention only key facts that can be extracted from the Comprehensive Report of The Carter Center:

The months preceding the referendum were highly polarized, with mass rallies for and against the government, and with aggressive campaigns to attract new voters and to intimidate and persecute both signers (people who signed the recall petition) and supporters of President Chávez. Even the referendum day was hot. The electoral actors took ad hoc decisions that generated suspicion and a lack of confidence in the whole process.

In this political atmosphere, we must assume that any unofficial information will be controversial. If there are many doubts about the official results, one cannot expect consensus with other data. Furthermore, one must be very careful with the statistical assumptions that one will use.

This article has two purposes: (1) to bring order to the ruckus caused by different statistical analyses, some of them carried out by non-experts, and (2) to examine, by a proper forensic analysis, the allegations of fraud. Section 2 reviews the referendum framework, introduces the main notation used throughout this paper and presents a critical revision of the five papers cited above. In Section 3 we propose a methodology, based exclusively on vote counting, to test the recall referendum of 2004. The presidential elections of 1998 and 2000 are also reviewed. Far from being a statistical headache, the referendum is an excellent dataset for exercising a wide variety of elementary but powerful statistical tools. Additionally, the case study is also useful for illustrating some common mistakes in stochastic modeling. Section 4 summarizes the main findings and conclusions.

2. REFERENDUM FRAMEWORK AND CRITICAL REVIEW

The electoral process is fully described in the report of The Carter Center (2005). The crucial features for the present analysis are:

(i) A voting center consists of one or more electoral tables and each table consists of one, two or three voting notebooks, which are the official data units with the lowest number of votes.

(ii) Within the time allowed, voters were registered to a center. Voters usually chose a center close to their residence or workplace, many of them long before the referendum. When the time was over, the referee decided the number of voting notebooks in a center according to the number of voters. In addition, notebooks are grouped in tables (no more than three per table), mainly for logistical reasons related to the voting process.

(iii) In each center, voters are randomly assigned to the notebooks.¹

(iv) There were only two options to vote: YES or NO. Although there was a very small percentage of invalid votes (0.3%), there was a significant percentage of abstentions (30%).

¹Every Venezuelan citizen is assigned an ID number. These numbers are assigned in sequential order by date of request. Usually, this is done when a Venezuelan girl or boy is ten years old. By this I mean that the number is independent of the entire electoral process. The ID number of the voters (older than 18 years old) has up to nine digits and, except for a case of extreme longevity, at least six digits. The mechanism to assign voters to notebooks can be described as follows: according to the last two digits, the voters were uniformly distributed to the notebooks. For example, in a center with four notebooks, if the last two digits were between 00 and 24, the voter was assigned to notebook 1. If the last two digits were between 25 and 49, the voter was assigned to notebook 2, and so on.


(v) The voting notebooks were computerized (touch-screen voting machines, which collected 86% of the valid votes) and manual (ballot boxes, which represented 14% of the votes).
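The assignment mechanism described in footnote 1 is easy to state in code. A minimal sketch (the function name is mine; how ties are rounded when 100 is not divisible by the number of notebooks is an assumption of this sketch, not documented in the source):

```python
def assign_notebook(voter_id, n_notebooks):
    """Return the notebook (1-based) for a voter, splitting the last two
    digits of the national ID into consecutive bands, as in footnote 1.
    When 100 is not divisible by n_notebooks, band sizes differ by one ID;
    that rounding choice is an assumption of this sketch."""
    last_two = voter_id % 100
    # e.g., 4 notebooks: 00-24 -> 1, 25-49 -> 2, 50-74 -> 3, 75-99 -> 4
    return (last_two * n_notebooks) // 100 + 1
```

With four notebooks this reproduces the bands given in the footnote exactly, and each notebook receives the same number of the 100 possible endings.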

Both (i)–(ii) and (iv)–(v) are simple true facts, but (iii) is a statistical hypothesis. The secrecy of the ballot lies in the random assignment of voters to notebooks. For this reason, (iii) is essential for a fair election. Thus, we assume it is true throughout our analysis, with the exception of Sections 3.5–3.7, where we suppose there were irregularities in the allocation of voters to notebooks.

Next, let us introduce the basic notation used throughout this paper. To do so, I will use the term polling unit generically in the next three paragraphs to refer to a center, a table or a notebook.

– Let Yi be the number of YES votes (those favoring recalling President Chávez) and Ni the number of NO votes in polling unit i.

– Let Ti = Yi + Ni be the total number of valid votes in polling unit i and τi the number of voters assigned to that polling unit (the size of the polling unit). Note the difference between voters and valid votes.

– Let Oi = τi − Ti be the number of invalid votes and abstentions in polling unit i. For brevity, we refer to them as the OUT votes (out of the electoral consultation).
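In code, this notation amounts to a small record per polling unit; a minimal sketch (names and figures are mine, for illustration only):

```python
from dataclasses import dataclass

@dataclass
class PollingUnit:
    yes: int     # Yi, YES votes
    no: int      # Ni, NO votes
    voters: int  # tau_i, voters assigned to the unit

    @property
    def valid(self):   # Ti = Yi + Ni
        return self.yes + self.no

    @property
    def out(self):     # Oi = tau_i - Ti, the OUT votes
        return self.voters - self.valid

unit = PollingUnit(yes=120, no=180, voters=430)  # made-up figures
```

For this made-up unit, `unit.valid` is 300 and `unit.out` is 130, matching the definitions above.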

In the rest of this section, where we review different papers, the subscript can refer to centers, tables or notebooks. However, in Section 3 the subscripts are used only to identify voting notebooks.

2.1 Discrepancies Between Two Exit Polls and Official Results

Prado and Sanso (2011) addressed the controversial discrepancy between two independent exit polls and the official results. Roughly, the official result was 41% YES votes and 59% NO votes, while the exit poll results were 61% YES votes. The polls were collected by a political party (Primero Justicia) and a non-governmental organization (Sumate), both in opposition to President Chávez. The authors’ main claims are:

C1: There was no selection bias in choosing the centers to be polled.

C2: The discrepancies per center cannot be explained by sampling errors.

C1 is settled by noting that the proportion of YES votes for the overall population matches the proportion of YES votes for the polled centers.

Claim C2 is addressed by assuming that the sampling distribution of the number of YES answers for a given polled center i, say yi, is a Binomial(ti, pi) random variable. The parameters of this Binomial are: ti, the size of the sample collected at the center, and pi, the proportion of YES votes, namely pi = Yi/Ti. Under this assumption, Prado and Sanso (2011) showed that there are significant differences between the official results and the exit polls in about 60% of the 497 polled centers. The authors also considered the pairwise comparison between the two exit polls among the common polled centers (27 in total). We remark that eight of them (30%) differ significantly.

It appears that Prado and Sanso had the following

assumptions in mind to determine that yi is Binomial with the parameters above:

A1: Given a polling center, the persons to interview were selected by simple random sampling.

A2: Each interviewed person responded to the question with the truth.
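Under A1–A2, the per-center check reduces to a binomial comparison. A sketch using the usual normal approximation (the function name and the numbers are illustrative, not the referendum data):

```python
from math import sqrt

def discrepancy_z(yes_in_poll, sample_size, official_p):
    """z-statistic comparing an exit poll's YES share at one center with
    the official proportion p_i, under the Binomial(t_i, p_i) model (A1-A2)."""
    observed = yes_in_poll / sample_size
    se = sqrt(official_p * (1 - official_p) / sample_size)
    return (observed - official_p) / se

# Illustrative numbers only: official share 41% YES, poll finds 61 YES out of 100.
z = discrepancy_z(61, 100, 0.41)
significant = abs(z) > 1.96  # flagged at the 5% two-sided level
```

With these toy numbers the gap of 20 percentage points is several standard errors wide, so the center would be flagged; a poll finding 43 YES out of 100 would not be.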

A careful reading of Section 2 of Prado and Sanso (2011) suggests that the sample at each center may correspond to a more complex model than simple random sampling. How could the model used affect the estimates and conclusions of their analysis? If, for example, and as seems to be the case, stratified sampling was used, the answer will depend on the stratification schema and the allocation criteria used by the pollsters (Lohr, 2004). In the absence of concrete information, the assumption of the binomial distribution is the most reasonable one. However, we cannot ignore the uncertainty about the model and, consequently, about the sampling errors computed under A1.

The authors discussed briefly the consequences of

the non-veracity of A2. “It has been demonstrated repeatedly that non-response can have large effects on the results of a survey” (Lohr, 2004). It is quite possible that, in a highly polarized political climate, voters who supported Chávez were associated with non-response, since they could identify the pollsters as members of the opposition to Chávez. Unfortunately Prado and Sanso had no estimates of non-responses and so had to ignore their effects.

Other sources of voter selection bias and measurement error are discussed in this paper. Some of them


could imply a systematic bias across the pollsters. Such is the case of the late closing of the voting centers:

The voting centers had to be open until 4:00 p.m., but the electoral umpire extended the closing time twice, first until 9:00 p.m. and finally until midnight. This was not foreseen by the pollsters, and during the afternoon and evening there was a fierce campaign to promote the attendance of the supporters of President Chávez at the voting centers (The Carter Center, 2005).

Prado and Sanso (2011) also studied this possibility, but the available data are very limited. Although the statistical procedure and motive are correct, missing data can produce results that have no validity at all (De Veaux and Hand, 2005).

It is hard to believe that the discrepancies between

exit polls and official results are due to sampling and random non-sampling errors. Unfortunately the information about exit polls is limited and does not allow a more rigorous analysis.

2.2 YES Votes Versus Number of Signers in the Recall Petition

Delfino and Salas (2011) focused on the association between the YES votes and the number of signers in the recall petition.² In the first four sections of their paper the authors described the electoral process well. However, from the fifth section onward, I have major concerns.

Let Si be the number of signers in voting center i.

The authors considered the following two relative numbers of YES votes and signers:

ki = Yi/Si  and  si = Si/Ti.   (1)

They conducted a bivariate data analysis with k as a response variable and s as an input variable. Since k ≤ 1/s, they said: “In voting centers with a large value of s, we expect a value of k around 1. . . The situation is completely different in voting centers with a small value of s. The singularity can produce very high values of k in the neighborhood of s = 0. Hence the level of uncertainty in k becomes very large.” Later on, they added: “The computerized centers are very far away from 1/s, clearly contradicting the expected non-linear behavior with respect to s.” Finally, they claimed fraud because the data contradict this behavior and even ventured to establish a hypothesis: “In computerized centers, official results were forced to follow a linear relationship with respect to the number of signatures.”

²For readers who do not know the intricacies of the referendum, the signatures were collected eight months before the referendum. Many signers were invalidated and some had to sign again in a second runoff (The Carter Center, 2005).

What can justify the previous conjecture? All that we really know is that the range of k is larger when s decreases. How can we infer the expected nonlinear behavior of k with respect to s from this fact? As is shown in equation (4) of their paper,

ki = pi/si,

pi = Yi/Ti being the proportion of YES votes in center i. Then, k decreases as 1/s, of course, but increases as p does, and there is a strong relation between these two variables. In fact, as we will explain next, one expects the value of k to be constant with respect to s, not only showing that the conjecture of Delfino and Salas (2011) is false, but showing that the observed results are as expected.

Following their schema, we analyze the (full) computerized centers and (full) manual centers separately. Manual centers are peculiar. They usually correspond to remote locations and they have a much smaller number of votes than the computerized ones (Prado and Sanso, 2011). For this reason many authors perform a separate analysis of these data. There was also a small number of mixed centers where there were both manual and computerized notebooks. These centers represent only 1.26% of the total YES votes and 1.3% of the valid votes, and they are excluded in what follows.

Let γ1 = 389,862 and γ2 = 3,548,811 be the total YES votes in manual and computerized centers, respectively. Consider also the total number of signers in manual and computerized centers, which we shall denote by θ1γ1 and θ2γ2, so that θ1 and θ2 are ratios between total signers and total YES votes. As mentioned before, I am skeptical about the use of data that are not official results of the referendum. So, we will assume θ1 and θ2 are unknown parameters and will only assign values to them for simulation purposes.

As The Carter Center (2005) remarked, the signers were the hard core of the YES votes. In fact, Delfino and Salas (2011) claimed that “each signature has a high probability of resulting a YES vote.” Let us simplify the scenario and assume that each signature in a center was a YES vote in that


Fig. 1. YES votes versus simulated signatures according to the heteroscedastic linear model (4). The left panel corresponds to manual centers with θ1 = 1/1.81. The right panel corresponds to computerized centers with θ2 = 1/1.15.

center. Thus, the ratios θ1 and θ2 are less than 1. Under this assumption, the conditional distribution of Si given Yi can be fitted by a hypergeometric distribution with parameters γc (the number of marbles in the hypergeometric jargon), θcγc (the number of white marbles) and Yi (the number of draws), c being equal to 1 or 2 according to whether i represents a manual or a computerized center. The expected value and variance of the hypergeometric variable are

E[Si|Yi] = θcYi  and  Var[Si|Yi] = Yiθc(1 − θc)(γc − Yi)/(γc − 1).   (2)
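A quick numerical check of (2) by simulation; the totals below are toy values of my choosing, not the referendum figures:

```python
import random

def draw_signers(gamma, n_white, yes_votes, rng):
    """One draw of S_i given Y_i: sample yes_votes marbles without
    replacement from gamma marbles, of which n_white are signers."""
    return sum(1 for m in rng.sample(range(gamma), yes_votes) if m < n_white)

rng = random.Random(0)
gamma, theta, y = 10_000, 0.6, 500  # toy values, not the referendum totals
draws = [draw_signers(gamma, int(theta * gamma), y, rng) for _ in range(2000)]
emp_mean = sum(draws) / len(draws)
emp_var = sum((d - emp_mean) ** 2 for d in draws) / len(draws)
# (2) predicts E = theta*y = 300 and
# Var = y*theta*(1-theta)*(gamma - y)/(gamma - 1), about 114 here
```

The empirical mean and variance of the 2000 draws land close to the values predicted by (2), including the finite-population correction factor (γc − Yi)/(γc − 1).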

Using the standard normal approximation one obtains

(Si − θcYi)/√(Yiθc(1 − θc)) ≈ N(0, 1),   (3)

N(µ, σ²) being a Normal random variable with mean µ and variance σ². Relation (3) leads us to consider the two heteroscedastic linear models

S = θcY + N(0, θc(1 − θc)Y),   (4)

for manual centers (c = 1) and computerized centers (c = 2).

For each center, we simulated the number of signatures given the number of YES votes at the center using (4). Typical outcomes of these simulations with θ1 = 1/1.81 and θ2 = 1/1.15 are shown in Figure 1. The values of θc were chosen with the intention of comparing our simulated clouds of points with those shown in Figure 6 of Delfino and Salas (2011). Note that the least squares regression lines of the latter have slopes 1.81 and 1.15 using a reverse relation between the variables, namely Y = acS + bc + error. Thus, we take θc = 1/ac. It is difficult to see how one could reject the regression model (4) using statistical testing, even under the classical homoscedastic linear model. The differences between the clouds associated with manual and computerized centers are due to differences in scale and variances, included in the heteroscedastic linear model (4). There is nothing mysterious about this difference, contrary to what Delfino and Salas (2011) suggested. Reversing the relationship between Y and S in regression model (4) yields a heteroscedastic linear model

Y = βcS + N(0, σc²S).   (5)

Dividing by S, the above equation becomes

k = βc + (1/√S)N(0, σc²),   (6)

which precisely describes the clouds of points shown in Figures 3 and 5 of Delfino and Salas (2011), with observations around a constant for any value of s, although the range of k is larger when s is smaller. In summary, it is expected that {ki} will be constant with a dispersion which decreases as 1/√S (note the difference between S = sT and s). Note that, although s may be small, if T is large (as in almost every computerized center), the variance can be small, explaining why computerized centers are more concentrated around the expected value of k.

There is an additional comment related to Figures 3, 4 and 5 of Delfino and Salas (2011) worth making. Note that all right panels have a gap for small values of the input variables (almost without observations). Compare the figures after removing these gaps in both panels. For example, remove the windows with s < 0.1 in Figures 3 and 5 and the windows with less than 200 total votes in Figure 4. The


Fig. 2. Left panel: Histogram of the YES votes (top), NO votes (middle) and OUT votes (bottom) per notebook. Right panel: Benford’s Law for the second digit (solid line) versus relative frequencies of the second digit for YES votes = +, NO votes = × and OUT votes = ◦.

behavior is very similar for manual and computerized voting centers. Their conclusion about the different behavior between manual and computerized centers seems inaccurate.

There are more intriguing statistical arguments in the paper of Delfino and Salas (2011). Although we have focused only on their main claim, I should add a comment related to the data. From the least squares regression lines shown in Figure 6 of Delfino and Salas (2011) one can estimate the total signatures in fully manual or computerized centers (excluding the mixed ones) on which the authors base their study. This total is 3,310,200, close to the 3,467,051 signatures submitted to the electoral umpire (Delfino and Salas, 2011). However, the total number of valid signers was 2,553,051 (The Carter Center, 2005). I leave the conclusion to the reader.
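The behavior described by models (5)–(6) is easy to reproduce by simulation; β and σ below are illustrative values of my choosing, not estimates from the referendum data:

```python
import random
from statistics import mean, pstdev

random.seed(1)
beta, sigma = 1.15, 2.0  # illustrative values, not fitted to the data

def simulate_k(signers, n):
    """Draw n values of k = Y/S under model (5): Y = beta*S + N(0, sigma^2 * S)."""
    return [(beta * signers + random.gauss(0, sigma * signers ** 0.5)) / signers
            for _ in range(n)]

k_small = simulate_k(50, 500)    # small centers: k spreads widely around beta
k_large = simulate_k(5000, 500)  # large centers: k hugs beta
# both samples center near beta; the spread of k shrinks like 1/sqrt(S)
```

Both clouds of k values center on β, while the dispersion for the small centers is roughly ten times that of the large ones (√(5000/50) = 10), exactly the pattern visible in Figures 3 and 5 of Delfino and Salas (2011).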

2.3 Anomaly Detection by Benford’s Law

Pericchi and Torres (2011) compared empirical distributions with Benford’s Law governing the frequency of the significant digits (Hill, 1995). Considering several electoral processes in three countries, the only case compellingly rejected by their test is the NO votes at computerized notebooks in the Venezuelan recall referendum. In addition, they made reference to recent contributions in which compliance with or violation of the law in electoral processes has been studied. Some criticisms related to the use of the law on electoral data (The Carter Center, 2005; Taylor, 2005) were also discussed. As theoretical contributions, the authors obtained a generalization of the law under restrictions on the maximum number of votes per polling station and discussed technical issues related to measuring the fit of the law.

It is important to note that Pericchi and Torres

(2011) did not analyze the OUT votes or abstentions. Figure 2 shows the marginal distributions of each vote option per notebook (left panel) and compares the empirical distributions of the second digit with Benford’s Law (right panel). Regarding Figure 2:

• As Pericchi and Torres showed, the YES votes conform to the law, while the NO votes do not. However, the strongest widespread departure from the law is related to the OUT votes. The χ² test statistic for this option is the highest of the three.

• It is known that compliance with the law is more likely when the skewness is positive (Wallace, 2002), and the only distribution with positive skewness is related to the YES votes.
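The second-digit law used in these comparisons is straightforward to compute. A sketch (function names are mine; the counts fed to `chi2_stat` would be the observed second-digit frequencies of a vote option):

```python
from math import log10

def benford_second_digit():
    """P(second significant digit = d) for d = 0..9 under Benford's Law:
    sum over first digits k = 1..9 of log10(1 + 1/(10k + d))."""
    return [sum(log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
            for d in range(10)]

def chi2_stat(counts):
    """Pearson chi-square of observed second-digit counts against the law."""
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p)
               for obs, p in zip(counts, benford_second_digit()))

probs = benford_second_digit()
# mildly decreasing, from about 0.120 at d = 0 to about 0.085 at d = 9
```

Unlike the first-digit law, the second-digit probabilities are nearly flat, which is why second-digit tests are the usual choice for electoral counts.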

We should remark that violations of Benford’s Law may be due to unbiased errors (Etteridge and Srivastava, 1999). Thus, deviations from the law can arise regardless of whether an election is fair or not (Deckert et al., 2010). On the other hand, there are many types of fraud that cannot be detected by Benford’s analysis (Durtschi et al., 2004). So, electoral results that conform to the law are not necessarily free of suspicion.

To illustrate the comments above, let us consider results by centers rather than by notebooks. In Figure 3 (left panel) we show the distributions of the number of votes at this aggregation level. Note that


Fig. 3. Left panel: Histogram of the YES votes (top), NO votes (middle) and OUT votes (bottom) aggregated by center. Right panel: Benford’s Law for the second digit (solid line) versus relative frequencies of the second digit for YES votes = +, NO votes = × and OUT votes = ◦.

now all distributions have positive skewness. In the same figure (right panel) we also show Benford’s Law for the second digit and the related empirical distributions of votes per center. All voting options conform to the law. According to this analysis, there is no reason to doubt the official results by center, despite the fact that the test suggests the contrary when we use the results by notebook. Is the former a false negative or the latter a false positive? Could unbiased errors in the vote counting by notebooks reproduce such a scenario? Or, conversely, could the results by centers be masking a fraud in notebooks? Benford’s test does not address this controversy.

2.4 Irregularity in the YES Votes Distribution

Febres and Marquez (2006) tested the distribution of YES votes in the voting notebooks. In a first round, they applied a Z test to compare the proportion of YES votes in each notebook with the proportion from the center to which the notebook belongs. The number of irregular notebooks (notebooks with a proportion significantly different from that of the center) resulting from this round is as expected. Therefore, this analysis suggests no inconsistency. According to the territorial organization of Venezuela, the voting centers are grouped into parishes. The authors subdivided the parishes into clusters of centers, using a criterion that we discuss later. They then applied Pearson’s χ² test to compare the distribution of YES votes among the notebooks at each cluster with the conditional expected distribution given the overall results by

cluster and valid votes by notebook. In this second round, they reported a high percentage of irregular clusters (clusters with an outlier χ² statistic). Their main finding was that the irregular clusters favor the NO option. Moreover, they showed a monotone relationship between the proportion of YES votes by cluster and the p-value of the Pearson χ² test. Tuning the confidence level to block irregular clusters, they estimated the overall result, and the winning option is YES.

As mentioned earlier, voters within the same center were randomly assigned to the notebooks. Thus, each notebook is a random sample without replacement from the voting center population. The framework can be completely different when notebooks are grouped by clusters of centers. If the proportions of YES votes of two centers in the same cluster are not equal, no matter how similar they are, and if the total number of votes by notebook is large enough, any consistent test will detect a significant discrepancy between the proportions in the notebooks and the proportion in the cluster. The authors took care of this fact. They made a trade-off between the homogeneity of the cluster (how similar the proportions of the centers within the cluster are) and the number of votes per notebook. Basically, the clusters were chosen such that the Z test does not detect a significant difference between the proportion of YES votes at the notebook with the greatest number of votes and the cluster proportion. In this way they ensured that each notebook is a representative sample of the cluster. The authors referred to this as the minimum heterogeneity distance for clustering analysis and made reference to the books of Sokal and Sneath (1973) and Press (1982).

Fig. 4. Left panel: Exact probability density function (dashed line) of the χ² test statistic related to the cluster with five notebooks described in Table 9 of Febres and Marquez (2006). Probability density function of the reference distribution used by the authors (solid line). The cross marks the observed value for the test statistic. Right panel: Exact cumulative distribution function (dashed line) and the usual asymptotic approximation for the χ² test statistic (solid line).

I have two concerns about these results. The first

deals with a general concern about the validity of ad hoc mechanisms to identify false positives (detecting fraud when none is present), which might be the case here. The second is a technical issue that must be resolved before subscribing to the authors' conclusions.

In the referendum context, the standard cluster units of notebooks are the voting centers. I guess that the authors did not report results at this level of aggregation because they did not observe inconsistencies there. In fact, if we apply Pearson's χ2 test to detect irregular centers, in the same way that the authors applied this test to detect irregular clusters, we do not observe major inconsistencies. Therefore, their results depend on a particular way of clustering the notebooks. Why these clusters instead of ones more or less homogeneous? Why keep the hierarchical ordering by parishes instead of another more related to political preferences? With these questions I am only trying to illustrate natural doubts that can arise when we introduce ad hoc criteria for grouping notebooks. If the results were independent of the grouping level, then this would not matter, but this is not the case.

My second concern is the use of the usual asymptotic distribution of Pearson's χ2 statistic to determine when an observed value of the test statistic is an outlier. This asymptotic result does not hold in the framework that we are considering. In general, it is doubtful that it holds when the multinomial distribution, which is the standard underlying assumption when this test is performed, is replaced by a multivariate hypergeometric distribution (Zelterman, 2006), which is the reference model for the distribution of votes among notebooks. In particular, because all the votes of each cluster are distributed among the notebooks, the correlations are not negligible. Despite this, I do not deny that there is a high percentage of irregular clusters. To illustrate the previous comment, we consider the cluster with five notebooks described in Table 9 of Febres and Marquez (2006). Following the standard asymptotics, the authors used the χ2 distribution with four degrees of freedom to compute the p-value of the test statistic related to this cluster. We compute the exact distribution of this statistic to compare with the χ2(4) distribution. How to compute the exact distribution is not relevant for now (it is a simple exercise following the discussion in Section 3). The important thing here is that an outlier for the χ2(4) distribution is also an outlier for the exact distribution (see the left panel of Figure 4). In fact, as the right panel of Figure 4 shows, the test statistic for this cluster is less than χ2(4) in the usual stochastic order. If we had a similar result for all clusters, then we could ensure that the percentage of irregular clusters is equal to or greater than the percentage reported in the paper. I believe that such a result could be obtained. An alternative would be to compute the exact distribution for each cluster to recompute the p-values and the percentage of outliers. This involves high computational costs, but it would also allow us to test the authors' main claim about a monotone causal relationship between the proportion of YES votes and the p-value.
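The kind of exact null distribution discussed here is easy to approximate by Monte Carlo. The sketch below (Python; the five notebook sizes and YES counts are invented for illustration, not the cluster of Table 9) shuffles a cluster's YES/NO cards among its notebooks, which is exactly the multivariate hypergeometric model, and tabulates Pearson's χ2 statistic under that null:

```python
import numpy as np

rng = np.random.default_rng(0)

def pearson_chi2(yes, totals):
    """Pearson chi-square comparing YES/NO counts across notebooks
    with the expected counts under the pooled proportion."""
    yes, totals = np.asarray(yes, float), np.asarray(totals, float)
    p = yes.sum() / totals.sum()
    exp_yes, exp_no = p * totals, (1 - p) * totals
    no = totals - yes
    return ((yes - exp_yes) ** 2 / exp_yes + (no - exp_no) ** 2 / exp_no).sum()

def null_sample(total_yes, totals, n_sims=5_000):
    """Empirical null: shuffle the cluster's cards (multivariate
    hypergeometric model) and recompute the statistic each time."""
    cards = np.zeros(int(sum(totals)), dtype=int)
    cards[:total_yes] = 1                      # 1 = YES card, 0 = NO card
    cuts = np.cumsum(totals)[:-1]
    out = np.empty(n_sims)
    for s in range(n_sims):
        rng.shuffle(cards)
        out[s] = pearson_chi2([g.sum() for g in np.split(cards, cuts)], totals)
    return out

totals = [410, 395, 402, 388, 405]             # hypothetical notebook sizes
yes = [150, 160, 148, 139, 155]                # hypothetical YES counts
null = null_sample(sum(yes), totals)
p_exact = (null >= pearson_chi2(yes, totals)).mean()   # exact-model p-value
```

Comparing the quantiles of `null` with those of a χ2(4) distribution reproduces, for any given cluster, the kind of comparison shown in Figure 4.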


The conjectures of Febres and Marquez are interesting and point in a concrete direction, but they require further analysis before being raised to conclusions of fraud.

2.5 Too Many Ties?

Taylor (2005) considered the following six models of “fair elections”:

T1. A model in which the YES/NO votes in computerized notebooks are independent and identically distributed Poisson random variables, with common expectation according to the results in the country.

T2. The same model as above, but with a common distribution which is not necessarily Poisson.

T3. A model in which the YES/NO votes in the notebooks of each electoral table are independent and identically distributed Poisson random variables, with common expectation according to the results in the table.

T3.1. A model in which the distribution of YES/NO is multinomial, splitting up the YES/NO votes of each electoral table equally among the notebooks.

T4. A multivariate hypergeometric model, conditioned on the results per electoral table and valid votes per notebook.

T5. A parametric bootstrap where the total votes of notebooks {Ti} are generated according to the integer part of a multivariate Normal distribution. Then the YES votes in notebook i are sampled according to a Binomial(Ti, p), p being the proportion of YES votes in the electoral table.
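Model T4 can be sampled directly: a table's YES votes are dealt to its notebooks by sequential hypergeometric draws, conditioning on the table total and the valid votes of each notebook. A minimal sketch (the table totals and notebook sizes below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def deal_yes_votes(yes_total, valid_per_notebook):
    """One draw from the multivariate hypergeometric model (T4):
    partition a table's YES votes among its notebooks, conditioning
    on the valid votes of each notebook."""
    remaining_yes = yes_total
    remaining_valid = sum(valid_per_notebook)
    counts = []
    for t in valid_per_notebook[:-1]:
        y = rng.hypergeometric(remaining_yes, remaining_valid - remaining_yes, t)
        counts.append(int(y))
        remaining_yes -= y
        remaining_valid -= t
    counts.append(int(remaining_yes))          # last notebook gets the rest
    return counts

# hypothetical table: 3 notebooks, 520 YES votes among 1,795 valid votes
draw = deal_yes_votes(520, [600, 590, 605])
```

Each call returns one vote configuration consistent with the conditioning, so repeated calls give the sampling distribution of any statistic under T4.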

Although it is not always explicitly said in Taylor's paper, T3–T5 are conditioned on the official results by electoral table, and T4 is additionally conditioned on the official number of valid votes by notebook.

From these models, the author analyzed different

statistical anomalies related to claims of fraud. Two of them have been previously discussed in this section (Febres and Marquez, 2006; Pericchi and Torres, 2011). The third is related to high rates of YES ties: a YES tie is a perfect match of YES votes between two notebooks. Accordingly, his analysis can be divided into three parts:

• Global test of goodness of fit for models T3 and T3.1.

• Comparative study between the distribution of the significant digits according to T3.1 (also to a slight improvement of T1), the observed distribution, and Benford's Law.

• Computation of the expected number of electoral tables with one or more YES ties, for each model, and comparison with the observed number of ties.

His main results and conclusions can be summarized as follows:

R1. “The more powerful χ2 test” strongly rejects the Poisson model T3. However, a False Discovery Rates analysis (Benjamini and Hochberg, 1995) shows “there is not evidence of widespread departures for the Poisson model.” This result “shows no systematic fraud in the form of vote-capping.”

R2. The distribution of the significant digits of the multinomial model T3.1 does not conform to Benford's Law and is virtually identical to the observed distribution. Thus, Benford's Law is of “little use in fraud detection in this instance.”

R3. The Z scores used to compare the observed number of electoral tables with one or more YES ties with the expected number according to his models “are fairly high” (I will make an exception with the Z test related to T4, which is equal to 2.37). “This only means that we can reject the global null hypothesis” (i.e., the global models) “and not that there indeed was fraud.”

First of all, the validity of a statistical model is not entirely justified by the fact that it fits the data, especially if one wants to test the quality of those data. The costs, here associated with a false negative (failing to identify a fraud condition when one exists), are too high. The model should at least not be at odds with our knowledge about the system that is being modeled.

According to Taylor's web page,3 “the first two

models” (T1 and T2) “are clearly unrealistic.” The next two (T3 and T3.1) also are:

(a) As discussed in the previous subsection, the assumption of independence among the notebooks is meaningless. There are constraints on the sums of votes across an electoral table: all the votes of each table are distributed among its notebooks, so the correlations are not negligible.

(b) The number of voters (note again the difference between voters and votes) by notebook varies among the notebooks of the same electoral table, so it makes no sense to split the votes of a table equally among the notebooks.

3 http://www-stat.stanford.edu/~jtaylo/venezuela.


The last two models (T4 and T5) take into account (a) and (b). In particular, I agree that the multivariate hypergeometric approach used in T4 is the right way to generate vote configurations. However, T5 resorts to assumptions that can be questionable, such as the use of the integer parts of multivariate Normal random variables to generate valid votes by notebook. Given that the two models provide similar results according to his own analysis, we will apply the principle of Occam's razor4 to reduce his list to just one realistic model.

What can we conclude when a questionable dataset

does not show evidence of widespread departures for an unrealistic model? What if the distributions of the significant digits are similar to each other but differ from Benford's Law? The conclusions in R1 and R2 are baseless. We cannot conclude anything useful from these analyses.

Let us move to R3, where he considers the multivariate hypergeometric model and simulations carried out by Felten et al. (2004) to analyze the YES ties phenomenon. These simulations show that the number of electoral tables with one or more ties is high, but not high enough to be considered a sign of fraud (around 1% of cases can have an equal or greater number of tables with YES ties, according to this model). This part of his analysis did not detect extreme statistical anomalies that would indicate obvious fraud in the referendum. Of course, as Felten et al. emphasized, this does not imply the absence of fraud, either.
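Under the same multivariate hypergeometric model, the chance that a table shows a YES tie can be estimated by simulation along these lines. A sketch, with a hypothetical three-notebook table (not referendum data):

```python
import numpy as np

rng = np.random.default_rng(1)

def has_yes_tie(yes_total, sizes):
    """One multivariate hypergeometric draw of YES votes across the
    notebooks of a table; True if two notebooks get the same YES count."""
    remaining_yes, remaining = yes_total, sum(sizes)
    counts = []
    for tau in sizes[:-1]:
        y = int(rng.hypergeometric(remaining_yes, remaining - remaining_yes, tau))
        counts.append(y)
        remaining_yes -= y
        remaining -= tau
    counts.append(int(remaining_yes))
    return len(set(counts)) < len(counts)

sizes, yes_total = [600, 590, 605], 520        # hypothetical 3-notebook table
p_tie = np.mean([has_yes_tie(yes_total, sizes) for _ in range(20_000)])
```

Summing such per-table probabilities gives the expected number of tables with one or more YES ties, the quantity compared with the observed count in R3.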

3. REEXAMINING THE REFERENDUM

The purpose of this section is to reevaluate the claim of fraud. An electoral fraud occurs if the results are altered to favor one of the options. Given evidence that the changes are enough to overturn the winner, the outcomes of the referendum should not be recognized. Moreover, if the manipulation does not change the winner but changes the proportions significantly, it must still be considered a fraud. Electoral results can drastically affect future electoral processes. In particular, this could have happened during the Venezuelan parliamentary elections, one year after the referendum, in which the political parties that supported the YES option withdrew, claiming the possibility of new fraud. Also, a tight result

4 Entia non sunt multiplicanda praeter necessitatem (entities must not be multiplied beyond necessity).

can have a different political meaning than an outcome with a winner by a wide margin, especially in a recall referendum. At the end of this section we evaluate the hypothesis of irregularities in the vote counting that significantly favor the NO option. We begin by describing the joint probability distribution of results per notebook, conditioned on the complete set of information of each center. This corresponds to a multivariate hypergeometric model, similar to that used in Felten et al. (2004) and Taylor's T5 model (the differences are explained below). It is a key tool in the hypothesis testing methodology that we develop through this section.

3.1 Shuffling Voting Cards

Consider a center with m notebooks, labeled by 1, 2, . . . , m. Let ν = ∑_{i=1}^{m} τ_i be the total number of voters in the center. Identify each voter by a number in {1, 2, . . . , ν} such that the first τ_1 voters are in notebook 1, the following τ_2 voters are in notebook 2, and so on. In the vote counting, each voter is represented by a voting card according to her/his electoral option. It can be YES, NO or OUT. Let X_i be the voting card of voter i. Then, the vote configuration at the center can be represented by

    X = (X_1, . . . , X_{τ_1}, . . . , X_{ν−τ_m+1}, . . . , X_ν),    (7)

where the first τ_1 coordinates correspond to notebook 1 and the last τ_m coordinates to notebook m.

Let y = ∑_{i=1}^{m} Y_i be the total of YES votes in the center. Similarly, let n = ∑_{i=1}^{m} N_i be the total of NO votes. Then, X is an outcome of shuffling the voting cards of the center:

    C = (YES, . . . , YES, NO, . . . , NO, OUT, . . . , OUT),    (8)

with y YES cards, n NO cards and ν − y − n OUT cards.

That is to say, X is a permutation of C.

According to the random mechanism used by the electoral umpire to assign voters to notebooks, given (y, n, ν), any permutation of the voting cards has the same probability of occurring. This is the underlying statistical principle shared by Febres and Marquez (2006), Felten et al. (2004) and Taylor (2005) for testing the referendum data. However, these authors do not consider all possible permutations:

• The sampling distribution of the test used by Febres and Marquez (2006) in their first round, where they conditioned on results by centers and valid votes by notebooks, corresponds to sampling on the set of outcomes of shuffling YES cards and NO cards within centers, leaving the OUT cards fixed in notebooks.

• The samples from the multivariate hypergeometric model considered by Felten et al. (2004) and Taylor (2005) belong to a set of permutations even smaller than the previous one. They additionally conditioned on the totals of YES votes and NO votes by electoral table. That is, they just considered shuffling YES cards and NO cards within tables, also leaving the OUT cards fixed in notebooks.

Both approaches fail to consider a large number of equiprobable results that match the referendum results at the centers. In this paper, we compute sampling distributions of test statistics considering all possible permutations of the voting cards at each center. To simplify the writing, in what follows we will refer to the result obtained by randomly shuffling the cards across all centers as a random sample of the electoral process.
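Generating a random sample of the electoral process, in the sense just defined, amounts to shuffling all the cards of each center, OUT cards included, and dealing them back into its notebooks. A sketch for a single hypothetical center:

```python
import numpy as np

rng = np.random.default_rng(2)

def shuffle_center(y, n, nu, sizes):
    """Shuffle the nu voting cards of a center (y YES, n NO, the rest
    OUT) and deal them into notebooks of the given sizes; returns the
    (YES, NO, OUT) counts per notebook."""
    cards = np.concatenate([np.zeros(y, int),           # 0 = YES
                            np.ones(n, int),            # 1 = NO
                            np.full(nu - y - n, 2)])    # 2 = OUT
    rng.shuffle(cards)
    notebooks = np.split(cards, np.cumsum(sizes)[:-1])
    return [tuple(np.bincount(nb, minlength=3)) for nb in notebooks]

# hypothetical center: two notebooks, 300 YES, 500 NO and 100 OUT cards
sample = shuffle_center(300, 500, 900, [450, 450])
```

Repeating this over every center yields one random sample of the whole electoral process, against which any test statistic can be calibrated.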

3.2 Statistical Hypothesis of Fair Referenda

If we assume that the referendum was properly conducted, the results by notebook correspond to a random sample of the electoral process. Therefore, the hypothesis of a properly conducted referendum is

H0: The votes per notebook correspond to a random sample of the electoral process.

But the rejection of H0 does not imply that the results per notebook were altered to favor one of the options. It only implies that there is a significant presence of outliers in the distribution of votes per notebook. Innocent irregularities, such as the incorrect allocation of voters to notebooks, can generate such outliers. We consider the most innocent alternative to H0, assuming that: (1) there is a significant presence of outliers in the votes per notebook, (2) the outliers are the result of neutral irregularities, and (3) the irregularities affect a random set of notebooks, regardless of whether they belong to strongholds of the winning option or not. Therefore, we consider the hypothesis of an atypical fair referendum, namely,

H1: There is a significant presence of outliers in the votes per notebook that is a consequence of innocent irregularities affecting a random set of notebooks.

If there is in fact a significant presence of outliers, we can reject H1 because: (1) the irregularities are not innocent, introducing a significant bias into the vote counting, or (2) they affect mostly notebooks in bastions of one of the options. Therefore, we have to consider the bizarre, but fair, scenario in which the irregularities that generate the outliers are neutral, no matter what, and, for some reason, they affect mostly a set of notebooks that are in strongholds of one of the options. Thus, we consider the hypothesis of a bizarre but fair referendum:

H2: The significant presence of outliers in the votes per notebook is the result of innocent irregularities that affect mostly a set of notebooks from strongholds of one electoral option.

The remaining alternative is a clear signal that the irregularities are not innocent.

Before testing the hypotheses, we describe the dataset.

3.3 Description of the Dataset

At least two notebooks per center are required for shuffling voting cards, so we restrict our analysis to these centers. In addition, since all allegations of fraud are related to computerized notebooks, we only consider fully computerized centers (centers where there are no manual votes). We also exclude a very small number of centers with empty notebooks (notebooks without valid votes). Empty notebooks could arise from technical problems, affecting the distribution of voters to notebooks in such centers. After this simple cleaning on fully computerized centers with two or more notebooks, a consistent dataset is obtained with 4,162 centers, all of them with comparable notebooks. This means that the votes among the notebooks of a center are of the same order. These 4,162 centers comprise 18,297 notebooks and more than 83% of the total voters, and will be the base of the study. The mean and the standard deviation of the number of voters per unit poll are 634 and 73, respectively.

For the last two subsections of this section, we

will also use the results of the presidential elections of 1998, in which Chavez was elected to his first term as President of Venezuela with 56% of valid votes against a coalition of roughly the same political parties that supported the recall. That election was carried out with an automated voting system, which featured a single integrated electronic network to transmit the results from the polling stations to central headquarters (McCoy, 1999). The legitimacy of


the electoral process and the acceptance of the results by political parties and international observers is a guarantee of the reliability of the results. At that time, in each center, voters were also randomly assigned to polling units, according to the last digit of their ID, similarly to the process described in Section 2. As we did with the referendum's dataset, we exclude centers with only one unit poll and those with empty unit polls. After the cleaning, a dataset is obtained with 3,952 centers and 15,667 unit polls, representing 85% of the total voters. The mean and the standard deviation of the number of voters per unit poll are 594 and 112. For all the above, both datasets are comparable for the statistical purposes of Section 3.6.

A different scenario overshadowed the presidential

elections of 2000, which we consider in Section 3.7, in which Chavez was elected to his second term with 59% of valid votes. After two years of important political changes, including the enactment of a new constitution, criticism of Chavez's government increased, polarizing the political climate. Many claims of fraud, including machines not functioning properly, people whose names did not appear on the electoral registry, and pre-marked ballots, were made at that time. While The Carter Center does not believe that the election irregularities would have changed the presidential results, they consider those elections flawed and not fully successful (Neuman and McCoy, 2001). The election was carried out, roughly, with the same voting system used in 1998. However, there was an important difference in the number of voters per unit poll, which significantly increased the number of centers with only one unit. As with the referendum and the presidential elections of 1998, we exclude centers with only one unit poll and those with empty unit polls. In addition, we exclude unit polls with more votes than voters. Thus, we obtain a dataset with 1,600 centers and only 3,730 unit polls, representing 53% of the total voters.

The three datasets provide high-precision estimates of the resulting percentage of votes per electoral option. Table 1 summarizes their main statistics.

3.4 Testing the Hypothesis of a Properly Conducted Referendum

The way to test for irregularity is to determine whether an observed value is an outlier or not. Let i be a focal notebook and c the center to which it belongs:

Table 1
Statistical summary of the datasets

Year   Unit polls   % of total voters   Mean of voters per unit poll   Standard deviation
1998     15,667            85%                    594.70                     111.97
2000      3,730            53%                  1,662.50                     361.74
2004     18,297            83%                    634.60                      73.86

– Since voters of a center are randomly assigned to notebooks, Y_i is the total of YES cards in a simple random sample (without replacement) of size τ_i from the voting cards of the center. In particular, E[Y_i | H0] = p_c τ_i, with p_c = y_c/ν_c, and

    Var[Y_i | H0] = τ_i p_c (1 − p_c) (ν_c − τ_i)/(ν_c − 1).

– The minimum τ_i in the 18,297 notebooks involved is 347 (the mean value τ̄ = ∑ τ_i/18,297 is equal to 634.60, and the maximum is 975).

Coupling these facts, a straightforward application of the Central Limit Theorem implies that, under H0, the score

    Z_i = (Y_i − p_c τ_i) / √(p_c (1 − p_c) τ_i (ν_c − τ_i)/(ν_c − 1))    (9)

is approximately N(0,1), for any i. Therefore, a test of regularity for a single notebook reduces to determining the significance of Z_i.

To get an overall qualitative idea of the joint behavior of the Z scores under H0, the normal probability plot of these statistics from a random sample of the electoral process is shown in Figure 5 (left panel). In the same figure (right panel) the normal probability plot of the scores based on the observed values is also shown. This plot highlights many official results far from what is expected. Let us peer into the most atypical cases. Table 2 shows the official results of centers 7990 and 1123,5 which contain the notebooks associated with the minimum (−9.08) and maximum (10.54) Z values. Let us call these notebooks m and M, respectively. Under H0, the expected value of Y_m is 161.81, almost twice the observed value, which is 81, while the expected value of Y_M is 139.08, just over half of what is observed, which is 233.

5 We use the center encoding used for the referendum. Codes, as well as the list of centers, vary from election to election.


Fig. 5. Normal probability plot of Z scores based on a random sample (left) and observed values (right).

Table 2
Results in centers with notebooks associated with minimum (*) and maximum (+) Z value

               Center 7990             Center 1123
Notebook            m                       M
Y        174   81*   235       191    60   233+    62
N        272   70    375       396   137   359    143
τ        607   600   610       588   583   594    567
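The Z score of equation (9) is straightforward to compute. As a check, this sketch reproduces the two extreme scores discussed above, with the center totals derived by summing the columns of Table 2:

```python
import math

def z_score(y_i, tau_i, y_center, nu_center):
    """Eq. (9): standardized YES count of a notebook with tau_i voters
    in a center with nu_center voters, y_center of them YES."""
    p = y_center / nu_center
    var = p * (1 - p) * tau_i * (nu_center - tau_i) / (nu_center - 1)
    return (y_i - p * tau_i) / math.sqrt(var)

# notebook m of center 7990: 81 YES votes among 600 voters; the center
# has 490 YES votes and 1,817 voters (column sums of Table 2)
z_m = z_score(81, 600, 490, 1817)      # about -9.08

# notebook M of center 1123: 233 YES votes among 594 voters; the center
# has 546 YES votes and 2,332 voters (column sums of Table 2)
z_M = z_score(233, 594, 546, 2332)     # about 10.5
```

The expected values p_c τ_i implied by these inputs are 161.81 and 139.08, matching the figures quoted in the text.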

An overall comparison is handled by summing the squares of the Z scores. Let

    S² = ∑_{i=1}^{18,297} Z_i².    (10)

A straightforward computation gives E[S² | H0] = 18,297. The variance can be estimated by Monte Carlo, shuffling the voting cards. We performed 1,000 random samples of the electoral process and obtained a standard deviation of 216. Next we show that the sampling distribution of the test statistic

    T_YES = (S² − 18,297)/216    (11)

can be approximated by a standard Normal distribution.

The centers have between 2 and 18 notebooks. The distribution of the centers according to the number of notebooks is shown in Table 3. The sum of squares can be decomposed as follows:

    S² = ∑_{i=1}^{18,297} Z_i² = χ²_nb(1) + χ²_nb(2) + · · · + χ²_nb(18),

χ²_nb(i) being the sum, along all the centers, of the squares of the Z scores related to the ith notebook of a center. Although the results of notebooks belonging to the same center are correlated, given H0 they are independent of the results in other centers. In turn, each χ²_nb(i) is a sum of independent random variables, each approximately the square of a standard normal random variable. Then, χ²_nb(i) is approximately χ² with ∑_{m≥i} C_m degrees of freedom, C_m being the number of centers with m notebooks. Table 4 lists the degrees of freedom related to {χ²_nb(i), 1 ≤ i ≤ 18}.

In general, approximating the distribution of sums of correlated χ² variables can be difficult. Fortunately, this is not the case here. Two remarks:

– For i ≤ 10, the degrees of freedom are large enough to fit the distribution of χ²_nb(i) by a Normal.

– χ²_nb(1) + · · · + χ²_nb(10) represents 99% of the Z² statistics in S².

Therefore, S² is approximately a sum of Normal random variables. Letting ς² be the sample variance obtained from k independent samples of S², under H0, the test statistic

    T_YES = (S² − E[S² | H0])/ς    (12)

is approximately N(0,1), for any large k. As we said above, we simulated 1,000 random samples of the electoral process, obtaining ς = 216. We also used the samples to confirm that, under H0, the distribution of T_YES is approximately N(0,1). For that, we tested normality with different methods, all of them with the same conclusive results. To illustrate, Figure 6 compares the kernel density estimator of the probability density function of T_YES with the probability density of a standard Normal. The T_YES observed value, according to the official

results, is T_YES^obs ≈ 13.12, which establishes that the results of YES votes per notebook are not credible, given the results by centers. The p-value, less than the MatLab numerical precision, is strong evidence against H0.

Table 3
Number of clean and fully computerized centers with m notebooks

m      2     3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
Cm  1,044   820  665  496  380  300  208  110   54   41   19   12    4    4    2    2    1

Table 4
Degrees of freedom (df) related with χ²_nb(i)

i      1      2      3      4      5      6     7    8    9   10   11   12   13   14   15   16   17   18
df  4,162  4,162  3,118  2,298  1,633  1,137   757  457  249  139   85   44   25   13    9    5    3    1

Fig. 6. Kernel estimator of the probability density function of T_YES versus a standard Normal probability density.

Following the same approach, we can test regularity on the distribution of NO votes and abstentions. For that, we define the Z statistics

    Z_i^NO = (N_i − q_c τ_i) / √(q_c (1 − q_c) τ_i (ν_c − τ_i)/(ν_c − 1))   and
    Z_i^OUT = (O_i − r_c τ_i) / √(r_c (1 − r_c) τ_i (ν_c − τ_i)/(ν_c − 1)),    (13)

q_c = n_c/ν_c and r_c = (ν_c − y_c − n_c)/ν_c being the proportions of NO votes and OUT votes in the center c to which notebook i belongs. As an illustration, Figure 7 shows the normal probability plots of these Z statistics based on a random sample of a properly conducted referendum. The figure also shows the normal probability plots of the scores based on the official results. These plots show a widespread departure from the expected values, even stronger than for the YES case (note the scales of these figures). In fact, if we define test statistics for the distribution of NO votes and OUT votes, T_NO and

T_OUT respectively, similar to what we did for the YES votes, then we have

    T_OUT^obs > T_NO^obs > T_YES^obs.

Clearly, H0 can be completely rejected.
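Putting the pieces together, the Monte Carlo calibration behind these T statistics can be sketched on toy data. The two centers below are invented for illustration only; the real computation runs over all 4,162 centers and, as in equation (11), uses the simulated standard deviation to standardize the observed S²:

```python
import numpy as np

rng = np.random.default_rng(3)

def z_scores(yes, sizes):
    """Eq. (9) Z scores for all notebooks of one center."""
    yes, sizes = np.asarray(yes, float), np.asarray(sizes, float)
    nu = sizes.sum()
    p = yes.sum() / nu
    var = p * (1 - p) * sizes * (nu - sizes) / (nu - 1)
    return (yes - p * sizes) / np.sqrt(var)

def shuffled_yes(y, sizes):
    """Deal a center's y YES cards into its notebooks at random."""
    cards = np.zeros(int(sum(sizes)), dtype=int)
    cards[:y] = 1
    rng.shuffle(cards)
    return [g.sum() for g in np.split(cards, np.cumsum(sizes)[:-1])]

# toy electoral process: (notebook sizes, "official" YES counts) per center
centers = [([450, 460], [150, 250]), ([600, 590, 605], [200, 210, 300])]

def S2(yes_per_center):
    """Sum of squared Z scores over all notebooks, as in eq. (10)."""
    return sum((z_scores(y, s) ** 2).sum()
               for (s, _), y in zip(centers, yes_per_center))

observed = S2([y for _, y in centers])
null = np.array([S2([shuffled_yes(sum(y), s) for s, y in centers])
                 for _ in range(2_000)])
T = (observed - null.mean()) / null.std()      # analogue of eq. (11)
```

Since the Z scores are standardized with the exact hypergeometric variance, the null mean of S² equals the number of notebooks, which is the toy analogue of E[S² | H0] = 18,297.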

3.5 Testing the Hypothesis of an Atypical Fair Referendum

As mentioned previously, the widespread departure of YESs, NOs and OUTs per notebook from their expected values could be the outcome of innocent irregularities in the conduct of the referendum. Incorrect allocation of voters to notebooks and the passing of votes from one notebook to another during the vote counting, through bugs in the programming, are examples of such irregularities. These irregularities may generate, in particular, Z_OUT outliers but, by the secrecy of the ballot, they should not be associated with a trend in the vote counting. Next, we propose a testing methodology, based on a simple statistical control chart, for testing trend in the vote counting on potentially irregular notebooks. The methodology can be easily extended to other electoral audit frameworks. It relies on the assumption that unexpected irregularities can occur in any unit poll with the same probability.

Denote by R the ratio between NO votes and total valid votes in the target population, consisting of K = 18,297 notebooks, namely,

    R = ∑_{i=1}^{K} N_i / ∑_{i=1}^{K} T_i.    (14)

In sampling jargon, R is the population ratio and K is the size of the population. Let S_k be the sample consisting of the k notebooks with the most extreme Z_OUT values, that is, the k notebooks with Z_OUT values furthest away from zero. Given a confidence level 1 − α, there is a k := k(α) such that S_k matches the set of notebooks with Z_OUT values that we consider to be outliers, that is, the set of notebooks with Z_OUT values outside the (1 − α) × 100% normal confidence interval. In our case study, if the confidence level is 99%, then k = 706. Roughly, 4% of the Z_OUT values are outside the 99% confidence interval. In what follows, k varies in a range such that S_k corresponds to the set of outliers, according to some reasonable confidence level.

Fig. 7. Normal probability plot of Z_NO (left) and Z_OUT (right) scores based on a random sample (top) and observed values (bottom).

Denote by r_k the sample ratio based on S_k. That is,

    r_k = ∑_{i∈S_k} N_i / ∑_{i∈S_k} T_i.    (15)

Note that r_k is not the usual ratio estimator, since we are sampling notebooks with atypical Z_OUT values. Thus, we might expect the observations (N_i, T_i) in S_k to be larger or smaller than those from a simple random sample (SRS). However, if the irregularities are innocent, if they do not introduce bias in the vote counting, r_k should be similar to the sample ratio based on a SRS. In particular, if k is large, the bias of the estimator will be small and the variance can be approximated by

    Var(r_k) ≈ S_k² := (1 − k/K) (1/μ_T²) (s_r²/k)

(Lohr, 2004), with

    μ_T = (1/K) ∑_{i=1}^{K} T_i   and   s_r² = (1/(k − 1)) ∑_{i∈S_k} (N_i − r_k T_i)².

Thus, under the hypothesis H1 defined in Section 3.2, if k and K − k are large enough,

    ζ_k = (r_k − R)/S_k    (16)

is distributed approximately as a standard normal variable. In what follows, we will only consider 100 ≤ k ≪ K.

To illustrate the above, consider 1,000 independent copies of ζ_500 from random samples of atypical fair referenda. We simulate an atypical fair referendum by introducing 700 innocent irregularities on a random sample of the electoral process. Each irregularity consists in passing a random proportion of


Fig. 8. Left panel: Normal probability plot of Z_OUT scores from an atypical fair referendum. Right panel: Standard normal probability density versus kernel estimator of the probability density function of ζ_500, based on 1,000 independent copies of an atypical fair referendum.

votes (10% on average) from a notebook to another located in the same center. This manipulation adds a significant number of Z_OUT outliers (outside the 99% normal confidence interval) to those already present before the manipulation. The normal probability plot of the Z_OUT scores of one of these atypical fair referenda is shown in Figure 8 (left panel) as an example. As we can see, the shape of the plot is similar to that observed for the referendum (Figure 7, bottom right panel). We tested normality of ζ_500 with different methods, all of them with the same conclusive positive results. To illustrate, the right panel of Figure 8 compares the kernel density estimator of the probability density function of ζ_500 with the probability density of a standard Normal.

We can test the hypothesis of an atypical fair

referendum using the ζ_k scores. High values of ζ_k imply that the irregularities introduce a bias in favor of the NO option in the vote counting; small values imply a bias in favor of the YES option. Under H1, we expect ζ_k to lie within a confidence interval.

The ζ_k scores corresponding to the official results

are plotted in Figure 9 (top line), for k between 100 and 706. To illustrate the behavior that we expect in an atypical fair referendum, we plot, in the same figure, 100 simulated score series of the atypical fair referendum discussed above. The ζ_k scores corresponding to the official results are well above the 99.99% confidence interval (−3.9, 3.9) for 100 ≤ k ≤ 706. Although a small number of simulated trajectories also reach high values, all of them are embedded in (−3.9, 3.9) and most of them are in the 99% confidence interval (−2.58, 2.58), as one expects. We

Fig. 9. ζ_k versus k for official results (top line) and simulated atypical fair referenda.

observed similar behavior in 1,000 additional simulated trajectories (not plotted). The score series of the referendum reaches values higher than any that we observed in simulations, being the only one always well above 3.9 for 100 ≤ k ≤ 706. This provides stronger evidence against H1 than a fairly small p-value of a single ζ_k score, for some k, would. We are seeing a significant bias in the vote counting on notebooks associated with irregularities, which is almost impossible to observe under H1. All the above is strong evidence for rejecting it.
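The ratio estimator, its approximate variance, and the resulting ζ_k score can be sketched as follows. This is a minimal illustration, not the author's code: the notebook counts and the outlyingness scores below are simulated stand-ins for the official data, with the "ZOUT" ranking drawn independently of the counts so that the irregularities are innocent by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical notebook-level data (NOT the official counts): N_i = NO
# votes and T_i = valid votes in notebook i, for K notebooks.
K = 4000
T = rng.integers(300, 700, size=K)      # valid votes per notebook
p = 0.59                                # illustrative overall NO share
N = rng.binomial(T, p)                  # NO votes per notebook

R = N.sum() / T.sum()                   # population ratio R, as in (14)
mu_T = T.mean()

# Rank notebooks by an outlyingness score; a placeholder for ZOUT here,
# drawn independently of the counts (i.e., "innocent" irregularities).
z = rng.standard_normal(K)
order = np.argsort(-np.abs(z))          # most atypical notebooks first

def zeta(k):
    """zeta_k score of (16) for S_k, the k most atypical notebooks."""
    Sk = order[:k]
    r_k = N[Sk].sum() / T[Sk].sum()     # sample ratio r_k of (15)
    s2_r = ((N[Sk] - r_k * T[Sk]) ** 2).sum() / (k - 1)
    var_rk = (1 - k / K) * s2_r / (k * mu_T ** 2)
    return (r_k - R) / np.sqrt(var_rk)

# Under H1 (irregularities carry no counting bias) the scores should
# look like standard normal draws, well inside (-3.9, 3.9).
scores = [zeta(k) for k in range(100, 1000, 100)]
print(np.round(scores, 2))
```

Because the ranking is independent of the counts, the ζ_k series stays within the normal confidence bands; a series persistently above 3.9, as observed for the referendum, is what the test flags.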

3.6 Testing the Hypothesis of Bizarre but Fair Referendum

Most political scientists expect more innocent administrative errors in areas with more poor voters


Table 5
Results in Center 1123 (C. M. A. Dr. Angel Vicente Ochoa, in Santa Rosalía, Caracas)

Notebook    1     2     3     4
Y         191    60   233    62
N         396   137   359   143
τ         588   583   594   567

(M. Lindeman, personal communication, July 2010). In addition, “the conventional wisdom about contemporary Venezuelan politics is that class voting has become commonplace, with the poor doggedly supporting Hugo Chavez while the rich oppose him” (Lupu, 2010). If both beliefs are true, we expect more innocent irregularities in strongholds of the NO option, which would explain the atypical result observed in the above section. That is what H2 describes: a general scenario in which there are more innocent irregularities in centers that support the winning option. To illustrate this possibility, we show in Table 5 the results in Center 1123 (C. M. A. Dr. Angel Vicente Ochoa, in Santa Rosalía, Caracas), one of the most extreme results. All its notebooks are associated with very extreme ZOUT values (greater than 18.53!). But the overall NO proportion (65%) is even less than that observed in the presidential elections of 1998 (67%). This center appears to be a bona fide Chavez stronghold. However, we have to remark that, in that election, the ZOUT values of this center are quite normal, all of them between −0.40 and 0.40. The results, from the only three polling units in that election, are shown in Table 6.

A naive procedure to see whether the irregularities affected mostly notebooks in Chavez's strongholds would be to repeat the previous analysis on all centers with outliers. We consider an alternative analysis for an important reason: if tampering indeed occurred, it is possible that, in order to mask it, the irregularities were committed precisely in Chavez's bastions. In addition, we have to admit that we do not have access to the re-coding of centers needed to automate the procedure.

Lupu (2010) provided evidence that the presidential election of 1998 was more monotonic in class voting than the referendum. That is, the poor were more likely to vote for Chavez in 1998 than in 2004. Thus, we expect more innocent irregularities in Chavez's strongholds in 1998 than in 2004. In addition, there is no doubt about the legitimacy of this election (Neuman and McCoy, 2001). For these

Table 6
Presidential elections of 1998, results in C. M. A. Dr. Angel Vicente Ochoa Center

Unit poll    1     2     3
Y          182   115   123
N          357   265   247
τ          899   645   624

reasons, the election of 1998 is very appropriate for testing whether irregularities affect mostly notebooks in centers that support Chavez, that is, for testing H2, defined in Section 3.2. The testing schema we use is to reject H2 if we fail to reject H1 for the elections of 1998. We begin by verifying that there is a significant presence of ZOUT outliers in 1998: 5% of ZOUT values (797 of 15,667) are out of the 99% normal confidence interval. The evidence against H0 is of the same order as in 2004. Furthermore, the most extreme ZOUT values of 1998 are higher than those observed in 2004. We omit the details and summarize the results by showing the normal probability plot of the ZOUT scores in Figure 10 (left panel). It seems possible that, in complex elections, ad hoc decisions are made to resolve problems that arise on the fly. As we have discussed previously, this can produce large outliers in the vote distribution. However, the test discussed in the previous section strongly supports H1 for the presidential elections of 1998. The corresponding score series {ζ_k, 100 ≤ k ≤ 797} is almost embedded in the 99% confidence interval; see the right panel of Figure 10. Therefore, there is little reason to think that the significant presence of ZOUT outliers, which are the result of innocent irregularities, affects mostly a set of notebooks from Chavez's strongholds. Irregularities seem to occur randomly, regardless of whether the notebook belongs to a Chavez bastion or not, and thus we reject H2.
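The first screening step above, counting the share of ZOUT scores outside the 99% normal band, can be sketched as follows. The scores here are simulated, with 797 extreme values planted among 15,667 to mimic the 1998 figures; under H0 alone the share would be close to 1%.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated ZOUT scores: approximately N(0,1) under H0, with n_out
# planted extreme values standing in for the observed outliers.
n, n_out = 15_667, 797
z = rng.standard_normal(n)
z[:n_out] += rng.choice([-1.0, 1.0], n_out) * rng.uniform(3.0, 20.0, n_out)

# Share outside the 99% normal confidence interval (+-2.576);
# a share far above 1% is evidence against H0.
share = np.mean(np.abs(z) > 2.576)
print(f"{share:.1%} of ZOUT values outside the 99% interval")
```

The schema then rejects H2 only if, despite such an excess of outliers, the ζ_k series for 1998 stays inside the 99% band (−2.58, 2.58).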

3.7 Estimating the Effect of the Irregularities

We have provided statistical evidence that there was a significant presence of irregularities that favored the winning option in the vote counting of 2004. But how much could the irregularities affect the overall results? Suppose that the ZOUT outliers are the tip of the iceberg and there is bias in the vote counting of a high proportion of notebooks, not just in the notebooks with extreme ZOUT values. To evaluate this assumption, we analyze the behavior of the sample ratio r_k in (15) for higher values of k


Fig. 10. Left panel: Normal probability plot of ZOUT scores of the presidential elections of 1998. Right panel: ζ_k versus k for the presidential elections of 1998.

Fig. 11. Proportion of Chavez's votes in S_k versus k for the referendum (top line) and the presidential elections of 1998 (line from the middle to the bottom) and 2000 (line from the bottom to the middle).

than those that we have already considered. Figure 11 shows that proportion for k varying from 100 to 3,000 (top line). The shape shows a strong correlation between the trend in the vote counting and the discrepancy between valid votes and their expectation, not only in notebooks with ZOUT outliers. We remark that for k = 100 we are already considering 41,533 valid votes, a very large sample size for estimating proportions (an accepted standard for pollsters is above 1,000). What we expect when we increase the sample size is exactly what we observe for the presidential elections of 1998 (line from the middle to the bottom in Figure 11): the proportion is a function of the sample size that varies slightly around the population proportion and quickly stabilizes around this value. So, our assumption of irregularities that affect the vote counting across all the notebooks is quite possibly true.6 Let us measure how much it could affect the totals.

Let R be the population ratio defined in (14).

Bounds for the relative error introduced in the vote counting by the irregularities can be obtained by maximizing and minimizing the relative error (r_k − R)/R. Thus, we can provide a prediction interval for the corrected proportion of votes in favor of Chavez, namely,

    [ (1 − max_{100≤k≤K/2} (r_k − R)/R) R,  (1 − min_{100≤k≤K/2} (r_k − R)/R) R ],    (17)

K being the total number of notebooks. Note that we are considering up to 50% of the notebooks (k ≪ K), those with the highest |ZOUT| values.

For example, the prediction interval (17) for the

presidential election of 1998, in which Chavez won with 56% of valid votes, is [55%, 57%]. We remark that this is an example of an atypical but fair election, whose results were well accepted by political parties and international observers.

Let us consider next the presidential elections of 2000. As mentioned above, The Carter Center considers this election flawed and not fully successful.

6Martín (2011), which unfortunately was not available for my review, studies the volume of traffic in incoming and outgoing data between notebooks and totalizing servers. It provides evidence that the vote counting of a high percentage of notebooks could have been affected from the totalizing servers.


Fig. 12. Left panel: Normal probability plot of ZOUT scores of the presidential elections of 2000. Right panel: ζ_k versus k for the presidential elections of 2000.

However, they also emphasize that the irregularities did not change the presidential results. Our methodology confirms their conclusion. Figure 12 summarizes our testing analysis. We observe the highest presence of ZOUT outliers in 2000: 9% of ZOUT values (327 of 3,730) are outside the 99% normal confidence interval. Also, the most extreme ZOUT scores of 2000 are higher than those observed in 1998 and 2004. But there is no evidence to reject H1 for this election. The ζ score series is always in the 99% normal confidence interval, except for a short excursion. Moreover, the prediction interval (17) for this election is [59%, 62%], and Chavez was elected with 59% of the valid votes.

We do observe a controversial result in the referendum, managed by a different electoral umpire from those that managed the elections of 1998 and 2000: the prediction interval is [47%, 57%]. The official result (59%) is out of range, while results that overturn the winner are within it. We remark that while this is not proof that irregularities changed the overall results, it does illustrate that such a scenario is plausible. Certainly, the result should be, at least, more in line with the prediction interval.
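The prediction interval (17) can be sketched as follows. The r_k series below is simulated and unbiased, standing in for the sample ratios of (15); none of these numbers are the official figures.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative stand-ins: R is the population ratio of (14) and r is a
# simulated sample-ratio series r_k for k = 100, ..., K/2 (most atypical
# notebooks first), drawn here with no counting bias.
K = 4000
R = 0.59
ks = np.arange(100, K // 2 + 1)
r = R + rng.normal(0.0, 0.005, size=ks.size)

# Prediction interval (17): correct R by the extreme relative errors
# observed over the first half of the ranked notebooks.
rel_err = (r - R) / R
lower = (1 - rel_err.max()) * R
upper = (1 - rel_err.min()) * R
print(f"interval [{lower:.3f}, {upper:.3f}] around R = {R:.2f}")
```

For an unbiased series the interval hugs R, as in the 1998 example; a persistent drift in r_k widens the interval on the opposite side, which is the pattern reported for the referendum.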

4. CONCLUSIONS

The main tool for reconciling political actors in an election under suspicion of fraud is a full audit. When this is not possible, statistical methods for detecting numerical anomalies and diagnosing irregularities can be useful for evaluating the likelihood of the allegations of fraud. This is the aim of election forensics (Mebane, 2008), an exciting area of applied statistics. Election forensics has been applied to several recent controversial elections, including 2004 USA, 2006 Mexico, 2008 Russia and 2009 Iran (Mebane, 2011); see the personal web page of Walter Mebane.7 The Venezuelan recall referendum is a case study that shows a wide palette of the commonly used statistical tools and of the problems that can arise in this type of analysis, as shown by our review in Section 2. In particular, we have highlighted problems related to exit polls, causal relationships between the number of votes and dependent variables, Benford's Law, different levels of data aggregation, goodness of fit, and election modeling. Beyond the statistical lessons, the hard criticism of some of the papers reviewed reflects a deep concern about the future of this emerging area. I am convinced that the diffusion of inaccurate analyses only causes well-founded allegations of fraud to be undervalued. At least, this was the case of the Venezuelan referendum.

We propose a forensic election methodology, based only on vote counting, to analyze the referendum. The Venezuelan presidential elections of 1998 and 2000 are also reviewed. Unlike previous work, we used the full information of the official dataset. This consists not only of the number of votes for and against revoking the mandate of President Chavez, but also the number of abstentions and invalid votes at the official data unit with the lowest number of votes. The main conclusion of the present paper is that there were a significant number of irregularities in the vote counting that introduced a bias in favor of the winning option. We provide prediction intervals for the bias, showing that the scenario in which

7www-personal.umich.edu/˜wmebane.


the bias could overturn the results is plausible. This places solid evidence in the arena, substantiating the allegations of fraud made at the time.

ACKNOWLEDGMENTS

The author acknowledges the inspiring discussions on the subject with Haydee Lugo, of Universidad Complutense de Madrid. Mark Lindeman suggested the hypothesis discussed in Section 3.6. The author thanks the associate editor and the anonymous reviewers for a very careful reading of the manuscript and thoughtful comments. Supported in part by Spanish MEC Grant ECO2011-25706 and CAM Grant S2007/HUM-0413.

REFERENCES

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300. MR1325392

The Carter Center (2005). Observing the Venezuela presidential recall referendum. Comprehensive report, The Carter Center. Available at http://www.cartercenter.org/documents/2020.pdf.

De Veaux, R. D. and Hand, D. J. (2005). How to lie with bad data. Statist. Sci. 20 231–238. MR2189000

Deckert, J., Myagkov, M. and Ordeshook, P. C. (2010). The irrelevance of Benford's Law for detecting fraud in elections. Working Paper No. 9, Caltech/MIT Voting Technology Project. Available at http://www.vote.caltech.edu/drupal/files/rpeavt_paper/benford_pdf_4b97cc5b5b.pdf.

Delfino, G. and Salas, G. (2011). Analysis of the 2004 Venezuela referendum: The official results versus the petition signatures. Statist. Sci. 26 479–501.

Durtschi, C., Hillison, W. and Pacini, C. (2004). The effective use of Benford's Law to assist in detecting fraud in accounting data. Journal of Forensic Accounting 5 17–34.

Etteridge, M. L., Hillison, W. and Srivastava, R. P. (1999). Using digital analysis to enhance data integrity. Issues in Accounting Education 4 675–690.

Febres, M. M. and Marquez, B. (2006). A statistical approach to assess referendum results: The Venezuelan recall referendum 2004. International Statistical Review 74 379–389.

Felten, E. W., Rubin, A. D. and Stubblefield, A. (2004). Analysis of the voting data from the recent Venezuela referendum. Available at http://venezuela-referendum.com.

Hausmann, R. and Rigobon, R. (2004). In search of the black swan: Analysis of the statistical evidence of fraud in Venezuela. Working paper, J. F. Kennedy School of Government, Harvard Univ. Available at http://www.hks.harvard.edu/fs/rhausma/new/blackswan03.pdf.

Hausmann, R. and Rigobon, R. (2011). In search of the black swan: Analysis of the statistical evidence of fraud in Venezuela. Statist. Sci. 26 543–563.

Hill, T. P. (1995). The significant-digit phenomenon. Amer. Math. Monthly 102 322–327. MR1328015

Lohr, S. (2004). Sampling: Design and Analysis, 2nd ed. Brooks/Cole, Boston, MA.

Luhnow, D. and De Cordoba, J. (2004). Academics' study backs fraud claim in Chavez election. The Wall Street Journal, September 7, 2004.

Lupu, N. (2010). Who votes for chavismo? Class voting in Hugo Chavez's Venezuela. Latin American Research Review 45 7–32.

Martín, I. (2011). 2004 Venezuelan presidential recall referendum (2004 PRR): A statistical analysis from the point of view of data transmission by electronic voting machines. Statist. Sci. 26 528–542.

McCoy, J. (1999). Chavez and the end of "Partyarchy" in Venezuela. Journal of Democracy 10 64–77.

Mebane, W. (2008). Election forensics: The second-digit Benford's Law test and recent American presidential elections. In Election Fraud: Detecting and Deterring Electoral Manipulation (R. M. Alvarez, T. E. Hall and S. D. Hyde, eds.) 162–181. Brookings Press, Washington, DC.

Mebane, W. (2011). Fraud in the 2009 presidential election in Iran? Chance 23 6–15.

Neuman, L. and McCoy, J. (2001). Observing political change in Venezuela: The Bolivarian constitution and 2000 elections. Final report, The Carter Center. Available at http://www.cartercenter.org/documents/297.pdf.

Pericchi, L. and Torres, D. (2011). Quick anomaly detection by the Newcomb–Benford Law, with applications to electoral processes data from the USA, Puerto Rico and Venezuela. Statist. Sci. 26 513–527.

Prado, R. and Sanso, B. (2011). The 2004 Venezuelan presidential recall referendum: Discrepancies between two exit polls and official results. Statist. Sci. 26 502–512.

Press, S. J. (1982). Applied Multivariate Analysis: Using Bayesian and Frequentist Methods of Inference, 2nd ed. Dover, Mineola, NY.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical Taxonomy: The Principles and Practice of Numerical Classification. Freeman, San Francisco, CA. MR0456594

Taylor, J. (2005). Too many ties? An empirical analysis of the Venezuelan recall referendum. Available at http://esdata.info/pdf/Taylor-Ties.pdf.

Wallace, W. A. (2002). Assessing the quality of data used for benchmarking and decision-making. The Journal of Government Financial Management 51 21–23.

Weisbrot, M., Rosnick, D. and Tucker, T. (2004). Black swans, conspiracy theories, and the Quixotic search for fraud: A look at Hausmann and Rigobon's analysis of Venezuela's referendum vote. Briefing paper, Center for Economic and Policy Research. Available at http://www.cepr.net/documents/publications/venezuela_2004_09.pdf.

Zelterman, D. (2006). Models for Discrete Data, Revised ed. Oxford Univ. Press, Oxford. MR2218182