Final Review: A summary of the final results can be found here.
Original article: 7/22/2016: We are at the height of the presidential election season, and there is intense interest in who is likely to win the November election. The standard approach in the press has been to commission a poll, report the results and compare the spread between the candidates to the “margin of error”. If the spread is within the margin of error, the results are almost always described as a “statistical tie”.
Many are rightly suspicious of polling results and of the media's analysis of them, for several good reasons. First, a “statistical tie” is a nonsense concept that you will not find in a standard statistics textbook. Second, results seem to swing from one poll to the next by more than the disclosed margins of error would suggest. Third, an obvious improvement in analyzing the state of the race would be to average over recent poll results, although deciding how long a period to average over is a non-trivial challenge.
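To make the “statistical tie” point concrete, here is a minimal Python sketch for a single hypothetical poll (46% vs 43% among 1,000 respondents; these numbers are illustrative, not from any real poll). It uses the usual normal approximation and, as a simplification, treats the sampling distribution of the lead as if it were a posterior under a flat prior. The 3-point lead sits inside the ±3.1% margin of error, yet the implied chance that the first candidate is ahead is about 84%, far from a coin toss.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def margin_of_error(share: float, n: int, z: float = 1.96) -> float:
    """Conventional 95% margin of error for one candidate's share."""
    return z * math.sqrt(share * (1.0 - share) / n)


def prob_leading(share_a: float, share_b: float, n: int) -> float:
    """Approximate probability that candidate A is truly ahead of B.

    Both shares come from the same sample, so the variance of the
    estimated lead (share_a - share_b) is
    (share_a + share_b - (share_a - share_b) ** 2) / n.
    """
    lead = share_a - share_b
    sd = math.sqrt((share_a + share_b - lead ** 2) / n)
    return normal_cdf(lead / sd)


# Hypothetical poll: 46% vs 43%, 1,000 respondents.
moe = margin_of_error(0.46, 1000)
p_lead = prob_leading(0.46, 0.43, 1000)
print(f"margin of error: +/-{moe:.1%}")  # margin of error: +/-3.1%
print(f"P(A ahead): {p_lead:.0%}")       # P(A ahead): 84%
```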
A more interesting question than how the spread compares with the margin of error is the direct one: “What is the probability that one candidate is leading the other as of today?” A number of fairly recent efforts address this question. The best known is Nate Silver’s FiveThirtyEight blog, which had good results in predicting the 2012 elections, although it was less successful at predicting Trump’s Republican nomination. The New York Times has developed a competing approach to FiveThirtyEight, and the Huffington Post also does a good job of recording and summarizing the polls. Another alternative is to derive the implied probability of a candidate winning from the betting markets.
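Deriving an implied probability from a betting market can be sketched as follows, assuming decimal odds (the total payout per unit staked). The inverse of each price is a raw probability; because these usually sum to slightly more than 100% (the market's margin, or overround), a simple proportional normalization is applied here. The odds below are hypothetical, and real markets with bid/ask spreads, commissions or more than two outcomes need more care.

```python
def implied_probabilities(decimal_odds: dict[str, float]) -> dict[str, float]:
    """Turn decimal odds into probabilities that sum to 1.

    1 / odds gives each outcome's raw implied probability; dividing by the
    total strips out the overround proportionally.
    """
    raw = {name: 1.0 / odds for name, odds in decimal_odds.items()}
    total = sum(raw.values())  # slightly above 1 when the market has a margin
    return {name: r / total for name, r in raw.items()}


# Hypothetical decimal odds, not actual market quotes.
probs = implied_probabilities({"Clinton": 1.44, "Trump": 3.2})
print({name: round(p, 3) for name, p in probs.items()})  # roughly 69% / 31%
```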
As of July 22, the day after the Republican convention, the probability that Clinton is leading Trump is summarized as follows:
| Source | Probability of Clinton Win |
|---|---|
| FiveThirtyEight (“Nowcast” version) | 64% |
| New York Times (The Upshot) | 74% |
| Betfair (UK betting market) | 69% |
FiveThirtyEight, the New York Times and the Huffington Post give some detail about their approaches but do not disclose their technical models, although FiveThirtyEight is well known for using a Bayesian approach. The betting-market percentage is ultimately driven by market forces, as bettors set the prices at which they are willing to enter into a bet.
As an alternative, we use a Gaussian Process (GP) framework to estimate a “Clinton Winning Now” probability. GPs are an exceptionally powerful framework for analyzing complex data, but they have not typically been applied to polling analysis. Their key advantages are:
- The method smooths out noisy data across multiple input dimensions
- GPs automatically generate credible intervals around the best estimates
- The weightings used in the smoothing approach are automatically determined
- Polling data are essentially one-dimensional (poll results over time), but a GP can easily be extended to handle multiple dimensions.
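A minimal one-dimensional GP smoother can be sketched in Python with NumPy. This is not the model behind the results in this post: it assumes a squared-exponential kernel, Gaussian noise with a known per-poll standard deviation, and fixed, hypothetical hyperparameters (`variance`, `length_scale`) that in practice would be estimated from the data, and the poll data are simulated. It shows the advantages listed above in miniature: the posterior mean smooths the noisy points, the posterior standard deviation yields a 90% credible band, and the implicit weight on each poll comes from the kernel rather than a hand-picked averaging window.

```python
import numpy as np


def sq_exp_kernel(t1, t2, variance, length_scale):
    """Squared-exponential covariance between two arrays of dates (in days)."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(t_obs, y_obs, noise_sd, t_new, variance=4.0, length_scale=20.0):
    """Posterior mean and s.d. of a zero-mean GP at dates t_new.

    y_obs should be centred (e.g. vote share minus its average), since the
    prior mean is zero. noise_sd holds one noise s.d. per poll.
    """
    K = sq_exp_kernel(t_obs, t_obs, variance, length_scale) + np.diag(noise_sd ** 2)
    K_s = sq_exp_kernel(t_new, t_obs, variance, length_scale)
    L = np.linalg.cholesky(K)                         # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s @ alpha                                # smoothed estimate
    v = np.linalg.solve(L, K_s.T)
    var = variance - np.sum(v ** 2, axis=0)           # prior variance minus what the polls explain
    return mean, np.sqrt(np.maximum(var, 0.0))


# Simulated polls around a hypothetical trend (percentage points vs. average).
rng = np.random.default_rng(0)
t_obs = np.sort(rng.uniform(0, 120, size=40))
noise_sd = np.full(t_obs.size, 1.5)
y_obs = 2.0 * np.sin(t_obs / 30.0) + rng.normal(0.0, noise_sd)

t_new = np.linspace(0, 120, 121)
mean, sd = gp_posterior(t_obs, y_obs, noise_sd, t_new)
lower, upper = mean - 1.645 * sd, mean + 1.645 * sd  # 90% credible band
```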
Jumping straight to the results, we show the estimated percentage of the vote for Clinton and Trump, together with 90% credible intervals:
The next chart shows the probability of a Clinton win over time, with the July 18 probability being 68%.
The GP Method Explained
The core concept of a GP model is that every output in the training dataset (in our case, the percentage of votes going to Clinton) is assumed to be correlated with every other output. The trick underlying GPs is to make the correlation higher for inputs that are close to one another and lower for inputs that are further apart. In our case, the input is the date of the poll, so polls taken close together in time are highly correlated, while polls taken far apart have low correlation. From the fitted GP model, we find that the correlation decays with the time between polls as follows:
This suggests that after about two months, the state of the race has almost no correlation with today’s position. The key takeaway from this graph is to expect the polling results to change over time: the race has not settled yet.
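The post does not say which kernel was fitted, so as an illustration only, here is how a squared-exponential kernel with a hypothetical length-scale of 25 days turns the gap between two poll dates into a correlation. With that assumed value the correlation is still about 0.49 at one month and falls to about 0.06 at two months, consistent in spirit with the two-month horizon above.

```python
import math


def se_correlation(days_apart: float, length_scale: float) -> float:
    """Correlation between two poll dates under a squared-exponential kernel."""
    return math.exp(-0.5 * (days_apart / length_scale) ** 2)


LENGTH_SCALE = 25.0  # hypothetical length-scale in days, not the fitted value

for gap in (0, 7, 14, 30, 60, 90):
    print(f"{gap:>2} days apart: correlation {se_correlation(gap, LENGTH_SCALE):.2f}")
```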
Unlike a traditional linear or generalized linear model, a GP has no explicit formula linking the output to the input. By contrast, a random walk model might specify:
Vote%(time = t) = Vote%(time = t-1) + error
A GP model instead relies on the correlation structure to enforce a relationship. This enables GPs to model very complex relationships across multiple dimensions. GPs have proven to be among the best-in-class prediction engines across many areas.
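The two views are closer than they may seem: the random walk above is itself a Gaussian process. Starting from 0 with independent normal errors of standard deviation σ, the covariance between Vote% at steps s and t is σ² · min(s, t), so the random walk is just one particular (non-stationary) choice of correlation structure. The Python sketch below, using NumPy and arbitrary illustrative settings, checks this by simulation.

```python
import numpy as np


def random_walk_paths(n_paths: int, n_steps: int, step_sd: float, seed: int = 0) -> np.ndarray:
    """Simulate Vote%(t) = Vote%(t-1) + error, with Vote%(0) = 0, for many paths."""
    rng = np.random.default_rng(seed)
    errors = rng.normal(0.0, step_sd, size=(n_paths, n_steps))
    return np.cumsum(errors, axis=1)  # column j holds the value at step j + 1


def random_walk_cov(s: int, t: int, step_sd: float) -> float:
    """Covariance implied by the random walk: step_sd**2 * min(s, t)."""
    return step_sd ** 2 * min(s, t)


paths = random_walk_paths(n_paths=200_000, n_steps=10, step_sd=0.5)
s, t = 3, 7
empirical = np.cov(paths[:, s - 1], paths[:, t - 1])[0, 1]
theory = random_walk_cov(s, t, 0.5)
print(f"simulated {empirical:.3f} vs. theory {theory:.3f}")  # both near 0.75
```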
Our GP model analyzes national polls of registered and likely voters. State polls were not analyzed; including them could clearly improve the results.
The GP model we developed, nevertheless, does have a few bells and whistles:
- Changing noise term: A GP model could be fit directly to the % Clinton votes. However, this would ignore the fact that some polls are based on significantly larger samples than others; e.g. the NBC/SurveyMonkey poll had 9,436 respondents, while the Monmouth poll had only 805. We want to place more weight on the former. This is done by using a binomial likelihood function, which takes the size of each poll into account, so the model recognizes that polls with larger samples carry less noise.
- Pollster noise: Each pollster conducts its surveys slightly differently and weights the results according to its own methods. Further, the general election will have its own idiosyncrasies relative to the pollsters’ results. To cope with this, we use a hierarchical model that captures the bias and noise of each pollster relative to the average, and also estimates an additional noise element for the general election relative to all polling results. The adjustment effect for each pollster, with 80% credible intervals, is shown in the following chart. The “general election” is treated as a new, unknown “pollster” with additional noise. The implication is that there will always be some noise in the eventual general election result that is hard, if not impossible, to remove simply by adding more or larger polls. Most pollsters are clustered reasonably close to the average, with just a few showing potential upward or downward bias.
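The changing noise term can be illustrated with a simplified sketch. Instead of the binomial likelihood used in the post, it swaps in the common normal approximation, in which a poll of n respondents reporting share p has sampling variance p(1 - p)/n; that variance is what would sit on the diagonal of a GP's noise term for each poll. The sample sizes are the two quoted above, while the 45% share is hypothetical. Under this approximation the NBC/SurveyMonkey poll receives roughly 9,436 / 805 ≈ 11.7 times the inverse-variance weight of the Monmouth poll (inside a GP, the effective weight also depends on when each poll was taken).

```python
import math


def poll_noise_sd(share: float, n: int) -> float:
    """Sampling s.d. of a poll's vote share (normal approximation to the binomial)."""
    return math.sqrt(share * (1.0 - share) / n)


sample_sizes = {"NBC/SurveyMonkey": 9436, "Monmouth": 805}
share = 0.45  # hypothetical Clinton share, assumed equal in both polls

noise_sd = {name: poll_noise_sd(share, n) for name, n in sample_sizes.items()}
weight = {name: 1.0 / sd ** 2 for name, sd in noise_sd.items()}  # inverse-variance weights

for name in sample_sizes:
    print(f"{name}: noise s.d. {100 * noise_sd[name]:.2f} points")
print(f"weight ratio: {weight['NBC/SurveyMonkey'] / weight['Monmouth']:.1f}")
```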
Our analysis suggests that the general election result will likely carry about ±1.7% of noise relative to the various pollsters’ results.
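To see why this extra noise matters, assume (our reading, not stated in the post) that the ±1.7% is a standard deviation on Clinton's share, independent of the uncertainty in the polling average. The two variances then add, which pulls the win probability back towards 50%. The posterior figures below (a 51% two-party share with a 1.5-point standard deviation) are hypothetical; with them, the probability drops from about 75% to about 67% once the election noise is included.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability(mean_share: float, polling_sd: float, election_sd: float = 0.0) -> float:
    """P(share > 50) when share ~ Normal(mean_share, polling_sd**2 + election_sd**2).

    Shares and standard deviations are in percentage points.
    """
    total_sd = math.sqrt(polling_sd ** 2 + election_sd ** 2)
    return normal_cdf((mean_share - 50.0) / total_sd)


mean_share, polling_sd = 51.0, 1.5  # hypothetical posterior for Clinton's two-party share
p_polls = win_probability(mean_share, polling_sd)
p_election = win_probability(mean_share, polling_sd, election_sd=1.7)
print(f"polls only: {p_polls:.0%}, with election noise: {p_election:.0%}")
```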
In summary, our results are very close to the betting markets and roughly at the midpoint of the FiveThirtyEight and NY Times estimates. Note that the analysis relates to the state of the race as it evolves over time. What is really of interest is what will happen in November, not the sentiment as of today. We anticipate that the probability of a Clinton win will trend towards 50% the further into the future we project. This is not to suggest that Trump will intrinsically improve his standing; rather, the fog of time makes our estimates less certain the further into the future we predict. A 50% chance of winning is roughly equivalent to saying that we have no special information about the state of the race. GPs have something to say about this too, but that will be left to a different post.
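As a rough illustration of the “fog of time” effect (not the fuller treatment left for a later post), here is a sketch under stated simplifications: a zero-mean GP prior on Clinton's margin (a 50/50 race in the absence of information), a squared-exponential kernel, and a single noisy reading of today's margin instead of the full poll history. All parameter values are hypothetical. As the horizon grows, the correlation with today fades, so the predictive mean shrinks towards zero while the predictive variance grows back to the prior variance; both push the probability towards 50%. A horizon of 109 days is the gap from July 22 to the November 8 election.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def forecast_win_prob(days_ahead: float, today_margin: float, noise_sd: float,
                      prior_sd: float, length_scale: float) -> float:
    """P(margin > 0) days_ahead after one noisy observation of today's margin.

    Standard GP conditioning on a single point with a zero prior mean:
      mean = k(h) / (k(0) + noise_sd**2) * today_margin
      var  = k(0) - k(h)**2 / (k(0) + noise_sd**2)
    """
    k0 = prior_sd ** 2
    kh = k0 * math.exp(-0.5 * (days_ahead / length_scale) ** 2)
    denom = k0 + noise_sd ** 2
    mean = kh / denom * today_margin
    var = k0 - kh ** 2 / denom
    return normal_cdf(mean / math.sqrt(var))


horizons = (0, 30, 60, 109)  # days after July 22; 109 reaches November 8
probs = [forecast_win_prob(h, today_margin=1.0, noise_sd=1.0, prior_sd=2.0, length_scale=25.0)
         for h in horizons]
for h, p in zip(horizons, probs):
    print(f"{h:>3} days ahead: P(Clinton ahead) = {p:.2f}")
```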