Simulating the Colombian Peace Vote: Did the “No” Really Win?

On October 2nd, 2016, I watched in awe as Colombia’s national plebiscite for its just-signed peace accord narrowly failed. For the following week, I brooded over the result: the disinformation campaign, Uribe’s antics, and just how good the deal really seemed to be. Two days ago, I chanced upon this post, which reminds us that the razor-thin margin – 6,431,376 “No” vs. 6,377,482 “Yes” – is not particularly convincing, nor, as it happens, immune to human error.

And as with all manual voting systems, one cannot rule out at least some degree of misclassification of papers on some scale, no matter how small. We know of no evidence of cheating, and Colombia is to be lauded for the seriousness of its referendum process, but the distinction between intentional and unintentional misclassification by individual counters can occasionally become blurred in practice.

In other words, it was humans – tired humans – counting ballots by hand.

The technology of tired humans sorting pieces of paper into four stacks is, at best, crude. As a large research literature has made clear, we can reasonably assume that even well-rested people would have made mistakes with between 0.5% and 1% of the ballots. On this estimate, about 65,000-130,000 votes would have been unintentionally misclassified. It means the number of innocent counting errors could easily be substantially larger than the 53,894 yes-no difference.
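The arithmetic behind that estimate is easy to check. The sketch below uses the official tallies quoted above plus the total turnout of 13,066,047 used later in this post; the helper name `expected_misclassified` is purely illustrative.

```python
# Official figures quoted in this post.
TOTAL_BALLOTS = 13_066_047
NO_VOTES = 6_431_376
YES_VOTES = 6_377_482

# The reported yes-no difference.
margin = NO_VOTES - YES_VOTES


def expected_misclassified(total_ballots, error_rate):
    """Expected number of ballots put on the wrong stack (hypothetical helper)."""
    return total_ballots * error_rate


# Error rates of 0.5% and 1%, the range cited in the passage above.
low = expected_misclassified(TOTAL_BALLOTS, 0.005)
high = expected_misclassified(TOTAL_BALLOTS, 0.01)

print(f"margin: {margin:,}")
print(f"expected misclassified ballots: {low:,.0f} to {high:,.0f}")
```

Even the low end of that range exceeds the reported margin, which is the point the passage makes.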

Is it possible that the majority wanted “Yes” and still happened to lose?

To answer this question, we can frame the vote as a simple statistical process and ask: “if we were to re-hold the vote many more times, how often would the ‘Yes’ vote actually win?”

Should we choose, we could pursue this result analytically, i.e. solve the problem with pencil and paper. This gets messy quickly. Instead, we’ll disregard closed-form theory and run a basic simulation; “if you can write a for-loop, you can do statistics.”

We’ll frame our problem as follows:

1. V_t = 13,066,047 voters arrive at the polls.
2. A fraction p_{yes} of them intend to vote “Yes”; the remaining 1 - p_{yes} intend to vote “No.”
3. Each voter casts an invalid (unmarked or void) ballot with probability p_{invalid}.
4. Each valid ballot is misclassified by the poll workers with probability p_{misclassification}.
5. Majority vote wins.

In each trial, we assume a true, underlying p_{yes} for the voting populace. For example, if p_{yes} is .48, we will have V_t * p_{yes} voters intending to vote “Yes,” and V_t * (1 - p_{yes}) voters intending to vote “No.” We assume these values to be static: they are not generated by a random process.

Next, each voter casts an invalid ballot with probability p_{invalid}, which we model as a Binomial random variable. Each remaining, valid ballot is then misclassified with probability p_{misclassification}. Finally, the tallies of “Yes” and “No” votes are counted, and the percentage of “Yes” votes is returned.

Let’s try this out for varying values of p_{yes}. To start, if the true, underlying percentage of “Yes” voters were 51%, how often would the “No” vote still win?
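A minimal sketch of such a simulation follows; it is not necessarily the code in the accompanying notebook. The function name `simulate_yes_share` is hypothetical, and the two error rates are assumptions: p_{invalid} is set to the share of ballots that were neither “Yes” nor “No” in the official totals, and p_{misclassification} to 1%, the upper end of the range quoted earlier.

```python
import numpy as np

TOTAL_VOTERS = 13_066_047
# Assumption: invalid share implied by the official totals (about 2%).
P_INVALID = (TOTAL_VOTERS - 6_431_376 - 6_377_482) / TOTAL_VOTERS
# Assumption: upper end of the 0.5%-1% hand-counting error range.
P_MISCLASSIFICATION = 0.01


def simulate_yes_share(p_yes, n_trials, rng, total_voters=TOTAL_VOTERS,
                       p_invalid=P_INVALID, p_mis=P_MISCLASSIFICATION):
    """Return the tallied "Yes" share of valid ballots for each trial."""
    # Intentions are fixed, not random.
    intended_yes = int(round(total_voters * p_yes))
    intended_no = total_voters - intended_yes

    # Each voter casts a valid ballot with probability 1 - p_invalid.
    valid_yes = rng.binomial(intended_yes, 1 - p_invalid, size=n_trials)
    valid_no = rng.binomial(intended_no, 1 - p_invalid, size=n_trials)

    # Each valid ballot lands on the wrong stack with probability p_mis.
    yes_counted_as_no = rng.binomial(valid_yes, p_mis)
    no_counted_as_yes = rng.binomial(valid_no, p_mis)

    tallied_yes = valid_yes - yes_counted_as_no + no_counted_as_yes
    tallied_no = valid_no - no_counted_as_yes + yes_counted_as_no
    return tallied_yes / (tallied_yes + tallied_no)


rng = np.random.default_rng(0)
yes_share = simulate_yes_share(p_yes=0.51, n_trials=100_000, rng=rng)
print("share of trials won by 'No':", np.mean(yes_share < 0.5))
```

The draws are vectorized across trials rather than written as an explicit for-loop, which keeps 100,000 trials cheap; the logic per trial is the same.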

That’s comforting. Given our assumptions, if 51% of the Colombian people arrived at the polls intending to vote “Yes,” the “No” vote would have nonetheless won in 0 of 100,000 trials. So, how close can we get before we start seeing backwards results?

Our first frustration comes at p_{yes} = .5001: if V_t * p_{yes} = 13,066,047 * .5001 ≈ 6,534,330 voters wanted “Yes” vs. ≈ 6,531,717 who wanted “No,” the “No” vote would have still won 0.191% of the time. Again, this reversal derives from human error: both on the part of the voter in casting an invalid ballot, and on the part of the poll worker in misclassifying a ballot by hand.

As we move further down, the results get tighter. At p_{yes} = .50001, the “No” vote wins in a fraction .38688 of trials, so the “Yes” vote can only be expected to have won 1 - .38688 = .61312, or 61.312%, of the time. Finally, at p_{yes} = .5000001 (which, keep in mind, implies an “I intend to vote ‘Yes’” vs. “I intend to vote ‘No’” differential of just 13,066,047 * (p_{yes} - (1 - p_{yes})) ≈ 3 voters), the “No” vote actually wins the majority of the 100,000 hypothetical trials. At that point, we’re really just flipping coins.
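As a rough cross-check on simulated figures like these, the tallied margin can be approximated by a normal distribution. The sketch below is an aside rather than part of the original analysis: `prob_no_wins` is a hypothetical helper, and the values of p_{invalid} (the invalid share implied by the official totals) and p_{misclassification} (1%) are assumptions, so its output will only roughly track any particular simulation run.

```python
import math

TOTAL_VOTERS = 13_066_047
# Assumption: invalid share implied by the official totals (about 2%).
P_INVALID = (TOTAL_VOTERS - 6_431_376 - 6_377_482) / TOTAL_VOTERS
# Assumption: upper end of the 0.5%-1% hand-counting error range.
P_MISCLASSIFICATION = 0.01


def prob_no_wins(p_yes, total_voters=TOTAL_VOTERS,
                 p_invalid=P_INVALID, p_mis=P_MISCLASSIFICATION):
    """Normal approximation to P(tallied 'No' > tallied 'Yes')."""
    # One intended-"Yes" voter moves the tallied margin by +1 (valid and
    # counted correctly), -1 (valid but misclassified) or 0 (invalid).
    # An intended-"No" voter contributes the mirror image.
    mean_step = (1 - p_invalid) * (1 - 2 * p_mis)
    var_step = (1 - p_invalid) - mean_step ** 2

    intended_margin = total_voters * (2 * p_yes - 1)
    mean_margin = intended_margin * mean_step
    sd_margin = math.sqrt(total_voters * var_step)

    # P(margin < 0) for a normal variable with this mean and deviation.
    z = mean_margin / sd_margin
    return 0.5 * math.erfc(z / math.sqrt(2))


for p in (0.51, 0.5001, 0.50001, 0.5000001):
    print(f"p_yes = {p}: P('No' wins) is roughly {prob_no_wins(p):.5f}")
```

The approximation treats every voter as independent, which matches the simulation's assumptions but not necessarily reality: a single tired counter could misfile a whole bundle of ballots.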

In summary, as the authors of the above post suggest, it would be statistically irresponsible to claim a definitive win for the “No.” Conversely, the true, underlying margin does prove to be extremely tight: maybe a majority vote just isn’t the best way to handle these issues after all.

The notebook and repo for the analysis can be found here. Key references include:

 

Will Wolf

I'm a Data Scientist passionate about machine learning, Bayesian statistics, Python and Scala. Ultimately, I want to use mathematics to solve human-scale problems that positively impact the lives of others.

 
