With general elections near once again, it is an opportunity to look at some topics we have talked about over the years.

I am quite fond of (and a bit addicted to) Nate Silver’s site FiveThirtyEight. Silver’s models tell us the probability that Hillary Clinton will win the election. It is very difficult to understand how the model relates to reality. What does it even mean that Clinton’s chances to win are 81.5%? One thing we can do with more certainty is to compare the predictions of one model to those of another model.

Some data from Nate Silver. Comparing the chance of winning with the chance of winning the popular vote accounts for “aggregation of information,” and the probability of a **recount** accounts for noise sensitivity. The computation of the winning probabilities themselves is also similar in spirit to the study of noise sensitivity/stability.

This data is a month old. Today, Silver gives a probability above 80% for Clinton’s victory.

## Nate Silver and information aggregation

Given two candidates “zero” and “one” and a fixed *p* > 1/2, suppose that every voter votes for “one” with probability *p* and for “zero” with probability 1 − *p*, and that these events are statistically independent. Asymptotically complete aggregation of information means that with high probability (for large populations) “one” will win.
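
This aggregation effect is easy to see in a small Monte Carlo sketch (a toy model with a hypothetical function name, not Silver’s methodology): with a fixed *p* > 1/2, the probability that “one” wins a simple majority vote tends to 1 as the number of voters grows.

```python
import random

def majority_wins(n, p, trials=2000, seed=0):
    """Estimate the probability that candidate "one" wins a simple
    majority vote when each of n (odd) voters independently votes
    "one" with probability p."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        ones = sum(rng.random() < p for _ in range(n))
        if 2 * ones > n:  # strict majority for "one"
            wins += 1
    return wins / trials
```

For example, with *p* = 0.55 the estimate is close to 0.55 for a single voter but essentially 1 for ten thousand voters.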

Aggregation of information for the majority rule was studied by Condorcet in what is known as “Condorcet’s Jury Theorem.” The US electoral rule, which is a two-tier majority with some weights, also aggregates information, but in a somewhat weaker way.

The data in Silver’s forecast makes it possible to estimate aggregation of information based on actual polls, which give different probabilities for voters in different states. This is reflected in the relation between the probability of winning and the probability of winning the popular vote. Silver’s data lets us see, for this comparison, whether the simplistic models behave in a similar way to the models based on actual polls.

We talked about Condorcet’s Jury theorem in this 2009 post on social choice.

**Marie Jean Nicolas Caritat, marquis de Condorcet (1743-1794)**

## Nate Silver and noise stability

Suppose that the voters vote at random and each voter votes for each candidate with probability 1/2 (again independently). One property we ask of a voting method is that the outcome of the election be robust to noise of the following kind: flip each ballot with probability *t*, for some small *t* > 0. “Noise stability” means that if *t* is small, then the probability that such random errors in counting the votes change the identity of the winner is small as well. The majority rule is noise stable, and so is the US election rule (but not as much).
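
The effect can be simulated directly (a toy sketch with a hypothetical function name; voters vote uniformly at random, then each ballot is flipped with probability *t*):

```python
import random

def winner_flip_probability(n, t, trials=2000, seed=1):
    """Estimate the probability that flipping each ballot independently
    with probability t changes the majority winner, when each of n (odd)
    voters votes uniformly at random for one of two candidates."""
    rng = random.Random(seed)
    changed = 0
    for _ in range(trials):
        votes = [rng.random() < 0.5 for _ in range(n)]
        noisy = [v != (rng.random() < t) for v in votes]  # flip each with prob. t
        if (2 * sum(votes) > n) != (2 * sum(noisy) > n):
            changed += 1
    return changed / trials
```

For the majority rule on large populations this probability approaches arccos(1 − 2*t*)/π, which goes to zero with *t*; the simulation shows, e.g., a much smaller flip probability at *t* = 0.01 than at *t* = 0.1.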

How relevant is noise sensitivity/stability for actual elections? One way to deal with this question is to compare noise sensitivity based on the simple model for voting and for errors to noise sensitivity for the model based on actual polls. Most relevant is Silver’s probability for “recount.”

Nate Silver computes the probability of victory for every candidate based on running many “noisy” simulations based on the outcomes of the polls. (The way different polls are analyzed and how all the polls are aggregated together to give a model for voters’ behavior is a separate interesting story.)

We talked about noise stability and elections in this 2009 post (and other places as well).

## Nate Silver and measures of power

The Banzhaf power index is the probability that a voter is pivotal (namely, her vote can determine the outcome of the election) when each voter votes for each candidate with equal probability. The Shapley-Shubik power index is the probability that a voter is pivotal under a different a priori distribution for the individual voters (under which the votes are positively correlated). Nate Silver computes certain power indices based on the distribution of votes in each state as described by his model. Of course, voters in swing states have more power. It could be interesting to compare the properties of the abstract power indices and the more realistic ones from FiveThirtyEight. For example, the Banzhaf power indices sum up to roughly the square root of the size of the population, while the Shapley-Shubik power indices sum up to one. It would be interesting to check the sum of pivotality probabilities under Silver’s model. (I’d guess that Silver’s model is closer to the Shapley-Shubik behavior.)
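
For small weighted-majority games, the raw Banzhaf index can be computed by brute force over all coalition configurations. A minimal sketch (hypothetical function name; this is the abstract index, not FiveThirtyEight’s computation):

```python
from itertools import product

def banzhaf(weights, quota):
    """Raw Banzhaf power of each voter: the probability that her "yes"
    vote is pivotal when the other voters vote yes/no uniformly at
    random.  Exhaustive enumeration, so only feasible for small games."""
    n = len(weights)
    pivot_counts = [0] * n
    for profile in product([0, 1], repeat=n):
        total = sum(w for w, v in zip(weights, profile) if v)
        for i in range(n):
            # i is pivotal if she votes yes, the coalition meets the
            # quota, and removing her vote drops it below the quota
            if profile[i] and total >= quota and total - weights[i] < quota:
                pivot_counts[i] += 1
    return [c / 2 ** (n - 1) for c in pivot_counts]
```

For three equal voters with quota 2, each voter’s raw Banzhaf power is 1/2; a voter whose weight alone meets the quota is a dictator with power 1 while the others have power 0.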

We talked about elections, coalition forming and power measures here, here and here.

## Nate Silver and the Hex election rule

In an earlier post we considered (but *did not* recommend) the HEX election rule. FiveThirtyEight provides a tool to represent the states of the US on a HEX board where the sizes of states are proportional to their numbers of electoral votes.

According to the HEX rule, one candidate wins by forming a continuous east-west path of winning states, and the other wins by blocking every such path or, equivalently, by forming a north-south path of winning states. The Hex rule is not “neutral” (symmetric under permuting the candidates).
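
As a toy illustration (on a square grid with 4-neighbour adjacency rather than the actual hexagonal map, and with a hypothetical function name), a breadth-first search can test whether one side’s states contain an east-west crossing:

```python
from collections import deque

def has_east_west_path(grid):
    """Check whether the cells marked True form a connected path from
    the left edge to the right edge of a rectangular grid.  Uses
    4-neighbour adjacency; a hex board would use 6 neighbours."""
    rows, cols = len(grid), len(grid[0])
    queue = deque((r, 0) for r in range(rows) if grid[r][0])
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        if c == cols - 1:       # reached the right edge
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False
```

The Hex theorem guarantees that exactly one of the two events happens: either one colour crosses east-west, or the other crosses north-south.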

If winning requires a north-south path for red and an east-west path for blue, then red wins. For an east-west blue path, much attention should be given to Arizona and Kansas.

If winning requires a north-south path for blue and an east-west path for red, then blue wins, and the Reds’ best shot would be to try to gain Oregon.

Now, with the recent rise of the Democratic Party in the polls, it seems possible that we will witness two disjoint blue north-south paths (with Georgia) as well as a blue east-west path. For a percolation-interested, democratically inclined observer (like me), this would be beautiful.

## The mathematics of information aggregation and noise stability, and the anomaly of majority

One way to consider both of these basic properties of the majority rule as a sort of stability to errors is as follows:

a) (Information aggregation, reformulated) If all voters vote for the better candidate and each ballot is flipped with some probability *t*, then with high probability, as the number of voters grows, the better candidate still wins.

We can also consider a weak form of information aggregation where *t* is a fixed small real number. One way to think about this property is to consider an encoding of a bit by a string of n identical copies. Decoding using the majority rule has good error-correction capabilities.
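
The repetition-code picture can be sketched as follows (a toy model with hypothetical names): encode a bit as n identical copies, flip each copy independently with probability *t*, and decode by majority.

```python
import random

def decode_repetition(bit, n, t, rng):
    """Encode `bit` (0 or 1) as n identical copies, flip each copy
    independently with probability t, and decode by majority vote
    (n odd).  Returns the decoded bit."""
    received = [bit ^ (rng.random() < t) for _ in range(n)]
    return 1 if 2 * sum(received) > n else 0

def error_rate(n, t, trials=2000, seed=2):
    """Fraction of transmissions where majority decoding fails."""
    rng = random.Random(seed)
    return sum(decode_repetition(1, n, t, rng) != 1 for _ in range(trials)) / trials
```

For a fixed *t* < 1/2 the decoding error rate drops rapidly with n: a single copy fails with probability *t*, while a hundred copies essentially never fail.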

b) (Noise stability) If all voters vote at random (independently, with probability 1/2 for each candidate) and each ballot is flipped with some small probability *t*, then with high probability (as *t* gets smaller) this will not change the winner.

The “anomaly of majority” refers to these two properties of the majority rule which in terms of the Fourier expansion of Boolean functions are in tension with each other.

It turns out that for a sequence of voting rules, information aggregation is equivalent to the property that the maximum Shapley-Shubik power of the players tends to zero. (This is a theorem I proved in 2002; the quantitative relations are weak and not optimal.) Noise stability implies a positive correlation with some weighted majority rule, and it is equivalent to an approximate low-degree Fourier representation. (These are results of Benjamini, Schramm, and me from 1999.) Aggregation of information when there are two candidates implies a phenomenon called indeterminacy when there are many candidates.

The anomaly of majority is important for understanding why classical information and computation are possible in our noisy world.

## Quantum Elections!

Frank Wilczek, one of the greatest physicists of our time, wrote in 2015 a paper about future physics where (among many other interesting things) he predicts that quantum computers will be built! While somewhat unimpressed by factoring large integers, Wilczek is fascinated by the possibility that

A quantum mind could experience a superposition of “mutually contradictory” states

Now, imagine **quantum elections**, where the desired outcome of the election is a superposition of Hillary and Donald (or Hillary’s and Donald’s states of mind, if you wish). For example, **|Hillary>** PLUS **|Donald>**.

Can we have a quantum voting procedure which has both a weak form of information aggregation and noise stability? A weak form of information aggregation amounts to the ability to correct a small fraction of random errors. Noise stability amounts to a decoding procedure based on low-degree polynomials. Such procedures are unavailable, and proving that they do not exist (or disproving it) is on my “to do” list.

The fact that no such quantum mechanisms are available appears to be important for understanding why robust quantum information and quantum computation are not possible in our noisy world!

Quantum elections and a quantum Arrow’s theorem were considered in the post “Democrat plus Republican over the square-root of two.”

## Nate Silver’s 2008 fifty home runs

One last point. I learned about Nate Silver from my friend Greg Kuperberg, probably from his MathOverflow answer to a question about mathematics and social science. There, referring to the 2008 election, Greg wrote: “The main person at this site, Nate Silver, has hit 50 home runs in the subject of American political polling.” Indeed, in the 2008 election Silver correctly predicted who would win in each of the 50 states of the US. This is clearly impressive, but does it reflect Silver’s superior methodology? Or was he lucky? Or does it perhaps suggest some problem with the methodology? (Or some combination of all of these?)

One piece of information that I don’t have is the probabilities Silver assigned to each state in 2008. Of course, these probabilities are not independent, but based on them we can estimate the expected number of mistakes. (In the 2016 election, the expected number of mistakes in state outcomes is today above five.) Here too, because of dependencies, the expected value accounts for some substantial small probability of many errors occurring simultaneously. Silver’s methodology makes it possible to estimate the actual distribution of “for how many states will the predicted winner lose?” (This estimate is not given on the site.)
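
The expected number of mis-called states is easy to compute from per-state probabilities by linearity of expectation (a sketch with a hypothetical function name; dependencies between states change the distribution of misses but not this expected value):

```python
def expected_state_misses(probs):
    """Given each state's model probability that one fixed candidate
    wins it, the predicted winner of a state is the more likely
    candidate, so the expected number of wrongly called states is the
    sum over states of min(p, 1 - p)."""
    return sum(min(p, 1 - p) for p in probs)
```

For example, states with probabilities 0.9, 0.8, and 0.5 contribute 0.1 + 0.2 + 0.5 = 0.8 expected misses; only states near 50-50 contribute substantially.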

Now, suppose that the number of such errors is systematically lower than the predicted number of errors. If this is not due to luck, it may suggest that the probabilities for individual states are tilted toward the middle. (It need not have any bearing on the presidential probabilities.)

## A major addiction problem…

## Would you decide by yourself the elections if you could?

One mental experiment I am fond of asking people (usually before elections) is this: Suppose that just a minute before the votes are counted you can change the outcome of the election (say, the identity of the winner, or even the entire distribution of ballots) according to your own preferences. Let’s assume that this act will be completely secret. Nobody else will ever know it. Will you do it?

In 2008 we ran a post with a poll about it.

We can run a new poll specific to the US 2016 election.

## Reflections about elections

I really like election days and their special atmosphere, in Israel, where I try never to miss them, and also in the US (I often visit the US in November). I also believe in democracy, as a value and as a tool. Often I don’t like the results, but usually I can feel happy for those who do like the results. (And by definition, in some sense, most people do like the outcomes.)

And here is a post about democracy in talmudic teachings.

Below the fold, my own opinion on the coming US election.

**The choice as I see it**

Reblogged this on C r i s t i a n a *.

Thank you, Cristiana!

You’re welcome

Just by looking at other people’s laptop screens during talks, this seems to be a common addiction. The HEX rule amused me. In the last FiveThirtyEight chat, which I read, there was this line:

“natesilver: Why not aim for other arbitrary goals like building a path of states that go from the Pacific to the Atlantic?

micah: That would be cool.

natesilver: You can do it with only 52 electoral votes!”

And he posted this map:

http://www.270towin.com/maps/vP0Xb

Source: http://fivethirtyeight.com/features/should-clinton-play-for-an-electoral-college-landslide/

Many thanks, Ferdinand! Very interesting links. There is also a very interesting (and rather convincing) explanation by Silver for why his model gives higher chances for Trump’s victory compared to other models. http://fivethirtyeight.com/features/election-update-why-our-model-is-more-bullish-than-others-on-trump/

For some further interesting discussion over facebook: https://www.facebook.com/gil.kalai.1/posts/10154644458603792

Regarding Silver’s 2008-2012 home runs.

Those elections were much more stable in multiple ways, as Nate himself explained at length.

E.g., the number of undecided voters, the strength of third-party candidates, the fact that one candidate is very unconventional and the other is hated by many in her own camp for ditching Sanders.

More thoughts to follow….

Noise is really beside the point or downstream from it. The quantum world is made of “quanta” not classical particles and superposition is not an observed simultaneity like many worlds. There are no electrons 1/4 in a cavity and 3/4 out. There is no empirical proof of any such thing happening! We don’t strictly need probability in QM because it just predicts the expected values of observables. That’s it. The Schrodinger cat fallacy shoots down the entire thinking.

“The best argument for quantum mechanics being probabilistic occurs when an observable A has discrete spectrum. A yes-no observable has a spectrum of just 0 and 1, and many other observables also have discrete spectrum. Then maybe a system can be put into an eigenstate so that an exactly precise prediction can be made, and the system seems deterministic. But if a state is a superposition of two eigenstates, then the measurement seems probabilistic because a measurement of A gives one of two discrete possibilities.

The situation is analogous to a coin toss that must yield either heads or tails. But the coin itself is not restricted to two discrete possibilities as it is tossed; the discreteness comes about because of the way it is measured. The coin itself may be deterministic. Likewise a quantum mixture of two eigenstates could be a deterministic object that only seems like a coin toss because of the way that it is measured. Whatever uncertainty there is may be entirely due to our lack of knowledge about the state, and the discreteness imposed by the measuring process.

Thus I do not believe that it is either necessary or very useful to talk about probabilities in quantum mechanics. You could say that the probability gives a way of understanding that the same experiment does not give the same outcome every time, but it does not give any more quantitatively useful information. This understanding is nothing special because every other branch of science also has variation in experimental outcomes.”

http://fqxi.org/community/forum/topic/1828

Also, you can’t run Turing complexity arguments backwards.

As far as elections, turnout matters and the polls constantly shift. If you can’t explain the drift, then you are just engaging in mental masturbation and curve-fitting the past. Rolling windows made a disaster out of Black-Scholes. Never mind MRIs…

No time to comment in detail but a quick note on the “Hex” rule: Nebraska splits its votes by Congressional district, as does Maine. Now the district that the Dems have a chance in is concentrated around Omaha, but several maps show it as a left-to-right stripe. Thus it is possible for Blue to have a Pacific-to-Atlantic path without winning Nebraska—and it could be a bridge allowing a vertical red path thru the heartland too.

One related model I forgot to mention is dynamic critical percolation. This is sort of similar to a movie where you show Silver’s HEX (or the US colored map) dynamically as a function of time. Thanks for the comment, Ken.

Tell these university liberals to stop wasting our time with models that Nassim Taleb would laugh at and quantum computers that aren’t based on accepted quantum mechanics! DSGE models can’t predict a quarter ahead in macroeconomics and S&P SPIVA scorecards show active managers can’t consistently beat market indexes. It was easy to see who was going to win and where the stock market was headed (see the long-standing 96% correlation between the Fed & world balance sheets vs the S&P). I was the first to find the Virtuosity (1995) scene with Trump debating “immigration & closing the American borders” and the Japanese Back to the Future scene. Back to the Future even “predicted” the Cubs winning the world series! Newton was a mystic for a good reason. Don’t believe me, take a look at Back to the Future (I & II), Virtuosity, Casino, Simpsons, etc…

Donald Trump in Back to the Future, Casino, Virtuosity & Simpsons

In case of copyright, see here: https://www.dropbox.com/s/3mn8yhupz692pz9/Final%20Trump%202.mp4?dl=0

Just a few points to temper my attitude toward impossibility:

The major questions in geometry were how to square a circle, trisect an angle, inscribe regular polygons and inverse trig. They can ALL be done with Archimedes’ Spiral, constructed only with a piece of string unwrapped from a circle. Gauss and others somehow thought a piece of string was not a basic instrument of construction, although people used them to make an ellipse since antiquity.

Galois had us believe we could never find the roots to quintic polynomials and Poincare said there was no analytical solution to the three-body problem. All kinds of things are possible with EXISTING computers when previous times thought they weren’t. Numerical methods, optimization algos, non-parametric statistics, perturbation theory, etc… solve just about everything to arbitrary accuracy. Even chiral fermions are being simulated with lattice gauge theory (see Wen). Fairly simple 4-D gauge theory can basically unify physics. We don’t need supercomputers.

We have the moronic theorems of Cantor, Godel & Turing about impossibility but they are nothing of the sort. Asking a system you don’t know is consistent, if it’s consistent, is moronically stupid. It’s obvious without a proof that logical explosion could have it tell you it is. It in no way says that a formal system could not produce all arithmetical truths. Did you stop beating your wife? Transfinite numbers, “uncomputable” numbers and completed infinities are pure non-sense involving impredicativity mentioned by Poincare and more recently Solomon Feferman. Not a single real number has been truly demonstrated and ultrafinite math rids ALL, and I mean ALL, the paradoxes of analysis and set theory.

P vs. NP is also hardly obvious when you see that the question is poorly posed. “Finite” length programs with pre-computation (a finite set of hard cases) and constant-time jumps (my additional observation about the poor modeling by clumsy Turing machines) even make Don Knuth wonder.

People like Seth Lloyd cannot even understand the meaning of Maxwell’s Demon and thermodynamics. No absolute proofs have been given to refute reversibility. Read Maxwell’s letters and you will understand that he was saying that such a demon could do no work and was not a physical thing but was only entertained to explain a circular reasoning of statistics.

Some things do seem impossible in physics but some of the most prominent examples are not demonstrated by the “skeptic” community. Anyone with a fifth-grade education can understand that energy requirements are too great for deep-space travel, even before considering the relativistic rocket equation with optimistic exhaust velocity. The non-local nature of faster-than-light propagation also seems unlikely when considering how people can’t even understand Gell-Mann’s invocation of Bertleman’s Socks. The world of spookiness is macro and not micro.

Just thought I would also mention one clarifying reason why mathematicians can’t beat a coin flip at prediction: https://www.youtube.com/watch?v=dFs9WO2B8uI

A nice commentary by Nate Silver also related to noise sensitivity is here http://fivethirtyeight.com/features/what-a-difference-2-percentage-points-makes/

While this should be checked more carefully, overall I think that Silver’s forecast was reasonable.

He gave close to 30% probability for a Trump victory and emphasized the substantial chance of a split between the popular vote and the electoral vote. Also, the number of mis-called states is (this time) consistent with the expected number of errors based on Silver’s probabilities for individual states.

We witnessed some systematic error in polls vs. reality and a very strong positive dependency of errors between states. This is consistent with Silver’s modeling. Probably people gave too strong an interpretation to Silver’s 2008/2012 successes (especially the 50/50 success). Of course, Silver aggregates information from polls (in a way which evaluates the quality of individual polls and estimates each poll’s bias), and whether improving the quality of individual polls is at all possible, and how, is a separate interesting issue.

Taleb said he should leave the business on Twitter. I said the following on Nov. 6th: “Nassim Nicholas Taleb after the election: ‘fooled by randomness, again…'”

I said in a previous post here that “turnout matters.” It’s a relevant variable and somewhat obvious. You can’t ignore assumptions. Furthermore, the polls show first and second derivative momentum and the latest ones were still stale. It’s not factored in! The point about correlation, I agree with.

“From a probabilistic standpoint, neither a p-value of .05 nor a ‘power’ at .9 appear to make the slightest sense.”

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2834266