With general elections near again, this is an opportunity to revisit some topics we have talked about over the years.
I am quite fond of (and a bit addicted to) Nate Silver’s site FiveThirtyEight. Silver’s models tell us the probability that Hillary Clinton will win the election. It is very difficult to understand how the model relates to reality. What does it even mean that Clinton’s chances of winning are 81.5%? One thing we can do with more certainty is compare the predictions of one model to those of another.
Some data from Nate Silver. Comparing the chance of winning with the chance of winning the popular vote accounts for “aggregation of information”; the probability of a recount accounts for noise sensitivity. The computation of the winning probabilities themselves is also similar in spirit to the study of noise sensitivity/stability.
This data is a month old. Today, Silver gives a probability above 80% for Clinton’s victory.
Nate Silver and information aggregation
Given two candidates “zero” and “one” and a fixed p > 1/2, suppose that every voter votes for “one” with probability p and for “zero” with probability 1-p, and that these events are statistically independent. Asymptotically complete aggregation of information means that with high probability (for large populations) “one” will win.
Aggregation of information for the majority rule was studied by Condorcet in what is known as “Condorcet’s Jury Theorem.” The US electoral rule, which is a two-tier majority with some weights, also aggregates information, but in a somewhat weaker way.
The data in Silver’s forecast allows us to estimate aggregation of information based on actual polls, which give different probabilities for voters in different states. This is reflected by the relation between the probability of winning and the probability of winning the popular vote. For this comparison, Silver’s data lets us see whether the simplistic models behave similarly to the models based on actual polls.
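To make the aggregation statement concrete, here is a minimal sketch that computes the exact probability that “one” wins a simple majority vote under the independent model above. It covers only the plain majority rule, not the two-tier US rule, and the function name and the value p = 0.55 are illustrative choices, not anything taken from FiveThirtyEight.

```python
from math import comb

def majority_win_probability(n: int, p: float) -> float:
    """Exact probability that "one" gets a strict majority among n
    independent voters, each voting "one" with probability p (n odd)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    # Sum the binomial probabilities of k votes for "one", for k > n/2.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

# Illustrative run: a small individual edge is amplified as n grows.
for n in (1, 11, 101, 501):
    print(n, majority_win_probability(n, 0.55))
```

The printed probabilities increase with n, which is the content of Condorcet’s Jury Theorem for this simple model.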
We talked about Condorcet’s Jury theorem in this 2009 post on social choice.
Marie Jean Nicolas Caritat, marquis de Condorcet (1743-1794)
Nate Silver and noise stability
Suppose that the voters vote at random and each voter votes for each candidate with probability 1/2 (again independently). One property that we ask of a voting method is that the outcome of the election be robust to noise of the following kind: flip each ballot independently with probability t. “Noise stability” means that if t is small, then the probability that such random errors in counting the votes change the identity of the winner is small as well. The majority rule is noise stable and so is the US election rule (but not as much).
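Here is a small Monte Carlo sketch of this noise model for the simple majority rule (again, not the US electoral rule). It estimates how often flipping each ballot with probability t changes the winner, and compares the estimate with the classical large-population limit arccos(1-2t)/π (Sheppard’s formula). The function names, the seed, and the sample sizes are illustrative assumptions.

```python
import math
import random

def majority(votes):
    """Winner (0 or 1) of a simple majority vote; assumes an odd number of votes."""
    return 1 if 2 * sum(votes) > len(votes) else 0

def winner_flip_rate(n, t, trials, seed=0):
    """Estimate the probability that flipping each ballot with probability t
    changes the majority winner, when votes are uniform and independent."""
    rng = random.Random(seed)
    changed = 0
    for _ in range(trials):
        votes = [rng.getrandbits(1) for _ in range(n)]
        noisy = [v ^ (rng.random() < t) for v in votes]  # flip with probability t
        changed += majority(votes) != majority(noisy)
    return changed / trials

def sheppard_limit(t):
    """Limiting probability (as n grows) that the majority winner changes."""
    return math.acos(1 - 2 * t) / math.pi

for t in (0.001, 0.01, 0.1):
    print(t, winner_flip_rate(101, t, 2000), sheppard_limit(t))
```

For small t the limit behaves like (2/π)·√t, so the chance of changing the winner goes to zero with t, but only at a square-root rate.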
How relevant is noise sensitivity/stability for actual elections? One way to deal with this question is to compare noise sensitivity based on the simple model for voting and for errors to noise sensitivity for the model based on actual polls. Most relevant is Silver’s probability for “recount.”
Nate Silver computes the probability of victory for every candidate based on running many “noisy” simulations based on the outcomes of the polls. (The way different polls are analyzed and how all the polls are aggregated together to give a model for voters’ behavior is a separate interesting story.)
We talked about noise stability and elections in this 2009 post (and other places as well).
Nate Silver and measures of power
The Banzhaf power index is the probability that a voter is pivotal (namely, that her vote can determine the outcome of the election) when each voter votes with equal probability for each candidate, independently. The Shapley-Shubik power index is the probability that a voter is pivotal under a different a priori distribution for the individual voters (under which the votes are positively correlated). Nate Silver computes certain power indices based on the distribution of votes in each state as described by his model. Of course, voters in swing states have more power. It could be interesting to compare the properties of the abstract power indices and the more realistic ones from FiveThirtyEight. For example, for the majority rule the Banzhaf power indices sum up to roughly the square root of the size of the population (up to a constant factor), while the Shapley-Shubik power indices sum up to one. It will be interesting to check the sum of pivotality probabilities under Silver’s model. (I’d guess that Silver’s model is closer to the Shapley-Shubik behavior.)
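The two sums mentioned above can be checked directly for the simple majority rule with an odd number n of voters. The sketch below is only for that symmetric rule, not for the US electoral rule or Silver’s model. A voter is pivotal exactly when the other n-1 voters split evenly. For Banzhaf the others vote uniformly at random; for Shapley-Shubik one common description is that a bias p is first drawn uniformly from [0,1] and then all voters vote “one” independently with probability p. The function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial, pi, sqrt

def banzhaf_pivot_probability(n):
    """Probability that the other n-1 voters split evenly under uniform votes."""
    k = (n - 1) // 2
    return Fraction(comb(2 * k, k), 2 ** (2 * k))

def shapley_shubik_pivot_probability(n):
    """Same event when a common bias p is uniform on [0, 1]:
    C(2k, k) times the integral of p^k (1-p)^k dp, which is k! k! / (2k+1)!."""
    k = (n - 1) // 2
    beta_integral = Fraction(factorial(k) ** 2, factorial(2 * k + 1))
    return comb(2 * k, k) * beta_integral

for n in (11, 101, 1001):
    banzhaf_sum = float(n * banzhaf_pivot_probability(n))
    shapley_sum = n * shapley_shubik_pivot_probability(n)
    print(n, banzhaf_sum, sqrt(2 * n / pi), shapley_sum)
```

For the majority rule the Banzhaf sum tracks √(2n/π), while the Shapley-Shubik sum is exactly one.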
Nate Silver and the Hex election rule
In an earlier post we considered (but did not recommend) the HEX election rule. FiveThirtyEight provides a tool to represent the states of the US on a HEX board where the sizes of states are proportional to their numbers of electoral votes.
According to the HEX rule, one candidate wins by forming a continuous right-left path of winning states, and the other wins by blocking every such path or, equivalently, by forming a north-south path of winning states. The Hex rule is not “neutral” (symmetric under permuting the candidates).
If we ask for winning a north-south path for red and an east-west path for blue then red wins. For a right-left blue path much attention should be given to Arizona and Kansas.
If we ask for winning a north-south path for blue and an east-west path for red then blue wins and the Reds’ best shot would be to try to gain Oregon.
Now, with the recent rise of the Democratic Party in the polls, it seems possible that we will witness two disjoint blue north-south paths (with Georgia) as well as a blue east-west path. For a percolation-interested, democratically-inclined observer (like me), this would be beautiful.
The mathematics of information aggregation and noise stability, and the anomaly of majority
One way to view these two basic properties of the majority rule as forms of stability to errors is as follows:
a) (Information aggregation reformulated) If all voters vote for the better candidate and each ballot is flipped independently with some probability less than 1/2, then with high probability as the number of voters grows, the better candidate still wins.
We can also consider a weak form of information aggregation where the flip probability is a fixed small real number. One way to think about this property is to consider an encoding of a bit by a string of n identical copies. Decoding using the majority rule has good error-correction capabilities.
b) (Noise stability) If all voters vote at random (independently with probability 1/2 for each candidate) and each ballot is flipped independently with some small probability t, then with high probability (tending to one as t gets smaller) this will not change the winner.
The “anomaly of majority” refers to these two properties of the majority rule which in terms of the Fourier expansion of Boolean functions are in tension with each other.
It turns out that for a sequence of voting rules, information aggregation is equivalent to the property that the maximum Shapley-Shubik power of the players tends to zero. (This is a theorem I proved in 2002. The quantitative relations are weak and not optimal.) Noise stability implies a positive correlation with some weighted majority rule, and it is equivalent to an approximate low-degree Fourier representation. (These are results from 1999 by Benjamini, Schramm, and me.) Aggregation of information when there are two candidates implies a phenomenon called indeterminacy when there are many candidates.
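To illustrate the Fourier side of noise stability, here is a brute-force sketch for small n. It writes a voting rule as a function f on {-1, 1}^n and computes its Fourier weight W_k at each level k (the sum of squared coefficients over sets of size k). It then uses the standard identity that, when each ballot is flipped with probability t, the probability that the outcome is unchanged equals (1 + Σ_k (1-2t)^k W_k)/2. The function names are illustrative, and the enumeration is exponential in n, so it is only meant for tiny examples.

```python
from itertools import product

def fourier_weight_by_level(f, n):
    """Return [W_0, ..., W_n] for f: {-1, 1}^n -> {-1, 1}, by brute force."""
    points = list(product((-1, 1), repeat=n))
    weights = [0.0] * (n + 1)
    for mask in range(2 ** n):
        subset = [i for i in range(n) if mask >> i & 1]
        total = 0
        for x in points:
            chi = 1  # character chi_S(x) = product of x_i over i in S
            for i in subset:
                chi *= x[i]
            total += f(x) * chi
        coeff = total / len(points)
        weights[len(subset)] += coeff ** 2
    return weights

def majority(x):
    """Simple majority for an odd number of +-1 votes."""
    return 1 if sum(x) > 0 else -1

def agreement_probability(weights, t):
    """Probability that f is unchanged when each input is flipped with probability t."""
    stability = sum((1 - 2 * t) ** k * w for k, w in enumerate(weights))
    return (1 + stability) / 2

w = fourier_weight_by_level(majority, 9)
print(w)
print(agreement_probability(w, 0.01))
```

For majority, most of the weight sits on level 1 (the level-1 weight tends to 2/π as n grows), which is why small noise rarely changes the winner.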
The anomaly of majority is important for the understanding of why classical information and computation is possible in our noisy world.
Frank Wilczek, one of the greatest physicists of our time, wrote in 2015 a paper about future physics where he (among many other interesting things) predicts that quantum computers will be built! While somewhat unimpressed by factoring large integers, Wilczek is fascinated by the possibility that
A quantum mind could experience a superposition of “mutually contradictory” states
Now, imagine quantum elections where the desired outcome of the election is going to be a superposition of Hillary and Donald (or of Hillary’s and Donald’s states of mind, if you wish). For example, |Hillary> PLUS |Donald>.
Can we have a quantum voting procedure which has both a weak form of information aggregation and noise stability? A weak form of information aggregation amounts to the ability to correct a small fraction of random errors. Noise stability amounts to a decoding procedure which is based on low-degree polynomials. Such procedures are unavailable, and proving that they do not exist (or disproving it) is on my “to do” list.
The fact that no such quantum mechanisms are available appears to be important for the understanding of why robust quantum information and quantum computation is not possible in our noisy world!
Quantum election and a quantum Arrow’s theorem were considered in the post “Democrat plus Republican over the square-root of two” by
Nate Silver’s 2008 fifty home runs
One last point. I learned about Nate Silver from my friend Greg Kuperberg, and probably from his mathoverflow answer to a question about mathematics and social science. There, Greg wrote, referring to the 2008 elections: “The main person at this site, Nate Silver, has hit 50 home runs in the subject of American political polling.” Indeed, in the 2008 elections Silver correctly predicted who would win in each of the 50 states of the US. This is clearly impressive, but does it reflect Silver’s superior methodology? Or his being lucky? Or does it perhaps suggest some problems with the methodology? (Or some combination of all answers?)
One piece of information that I don’t have is the probabilities Silver assigned in each state in 2008. Of course, these probabilities are not independent, but based on them we can estimate the expected number of mistakes. (In the 2016 election the expected number of mistakes in state outcomes is today above five.) Here too, because of dependencies, the expected value also accounts for a small but substantial probability of many errors occurring simultaneously. Silver’s methodology allows us to estimate the actual distribution of “for how many states will the predicted winner lose?” (This estimate is not given on the site.)
Now, suppose that the number of such errors is systematically lower than the predicted number of errors. If this is not due to luck, it may suggest that the probabilities for individual states are tilted to the middle. (It need not necessarily have bearing on the presidential probabilities.)
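The expected number of mistakes can be computed from the state-by-state probabilities alone: by linearity of expectation it is the sum, over states, of the probability assigned to the predicted loser, whether or not the states are independent. The sketch below shows this, and also the distribution of the number of mistakes under the unrealistic assumption of independence, for comparison. The function names and the probabilities in the example are made up for illustration and are not Silver’s numbers.

```python
def expected_misses(win_probs):
    """Expected number of states where the predicted winner loses.
    win_probs[i] is the forecast probability that one fixed candidate wins
    state i; the predicted winner is whoever has probability above 1/2.
    Valid with or without dependence between states (linearity of expectation)."""
    return sum(min(p, 1 - p) for p in win_probs)

def miss_distribution_if_independent(win_probs):
    """Distribution of the number of misses, ASSUMING independent states
    (which real forecasts do not satisfy). dist[k] = P(exactly k misses)."""
    dist = [1.0]
    for p in win_probs:
        q = min(p, 1 - p)  # probability of a miss in this state
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - q)
            nxt[k + 1] += mass * q
        dist = nxt
    return dist

# Hypothetical forecast for five states (illustrative numbers only).
example = [0.97, 0.85, 0.60, 0.35, 0.08]
print(expected_misses(example))
print(miss_distribution_if_independent(example))
```

With positively correlated states the mean stays the same, but more of the probability moves to “no mistakes” and to “many mistakes,” which is the dependence effect described above.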
A major addiction problem…
Would you decide by yourself the elections if you could?
One thought experiment I am fond of posing to people (usually before elections) is this: Suppose that just a minute before the votes are counted you can change the outcome of the election (say, the identity of the winner, or even the entire distribution of ballots) according to your own preferences. Let’s assume that this act will be completely secret. Nobody else will ever know about it. Would you do it?
In 2008 we ran a post with a poll about it.
We can run a new poll specific to the US 2016 election.
Reflections about elections
I really like election days and their special atmosphere, in Israel, where I try never to miss them, and also in the US (I often visit the US in November). I also believe in democracy as a value and as a tool. Often, I don’t like the results, but usually I can feel happy for those who do like the results. (And by definition, in some sense, most people do like the outcomes.)
And here is a post about democracy in talmudic teachings.
Below the fold, my own opinion on the coming US election.
The choice as I see it