Something interesting happened in the lead-up to the presidential election. Most of the mainstream media outlets were predicting a statistical dead heat between Obama and Romney, and only towards the end did some in the media start reporting that Obama had a razor-thin margin.
However, this so-called statistical tie was at odds with a new breed of analytics that is the basis of Nate Silver’s FiveThirtyEight blog, hosted on the New York Times website. Silver became a lightning rod for criticism by the punditry class because he consistently showed that the race was Obama’s to lose. In addition to publishing the national vote split, he also published electoral splits and, most importantly, he showed the probability of an Obama win. His predictions of an Obama win are shown in the chart below in blue:
As can be seen, on the day of the election he predicted that Obama had a 90.9% chance of winning, and even at the height of the Romney run following the Denver debate, Obama still had about a 60% chance of beating Romney. This continuing narrative of a stronger Obama position than that put forward by other pollsters made Silver’s work controversial. Silver, however, was very vocal about his views, remarking on the Colbert Show that if he had a choice between voting for a political pundit or the Ebola virus, Ebola would win out.
How does Silver calculate the chance of winning as 90.9%? He uses a Bayesian approach. The core element of Bayesian analytics is that it views the parameters underlying a model (e.g. the probability of a person voting for Obama) as random variables and the data (in this case, the poll results) as being fixed. This turns the traditional or “Frequentist” approach on its head, which typically specifies a model as having fixed parameters and the data as being random.
If we ignore the electoral college and assume that the Presidential contest is based on the popular vote, then in the Bayesian framework, we would set p as the probability of an individual voting for Obama and 1-p as the probability of an individual voting for Romney. The quantity p is assumed to be a random variable with an associated probability distribution. The Bayesian methodology allows for updating the estimate of the distribution of p continuously. As more data is gathered, the standard deviation of the distribution of p will decline. Thus Silver, by pooling data from many national and state polls, was able to reduce the standard deviation of the distribution of p.
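To make the updating step concrete, here is a minimal sketch of one common way to do it. It assumes a uniform Beta(1, 1) prior for p and treats each poll as a simple random sample, so each update just adds the new counts to the Beta parameters. The function names and the poll counts are illustrative assumptions; this is not a description of Silver's actual model.

```python
import math

def update(a, b, votes_for, votes_against):
    """Conjugate update: Beta(a, b) plus new poll counts gives Beta(a + for, b + against)."""
    return a + votes_for, b + votes_against

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    total = a + b
    mean = a / total
    sd = math.sqrt(a * b / (total ** 2 * (total + 1)))
    return mean, sd

# Assumption: a uniform Beta(1, 1) prior, i.e. no view on p before any poll arrives.
a, b = 1, 1

# Illustrative polls as (for Obama, for Romney) counts.
for votes_for, votes_against in [(502, 498), (491, 509), (523, 477)]:
    a, b = update(a, b, votes_for, votes_against)
    mean, sd = beta_mean_sd(a, b)
    # a + b - 2 is the number of respondents pooled so far.
    print(f"n={a + b - 2:>5}  mean={mean:.4f}  sd={sd:.4f}")
```

Each pass through the loop uses the previous posterior as the new prior, and the printed standard deviation falls as respondents accumulate.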
Although simply averaging over polls will give you a better estimate of results than analyzing any one poll in isolation, the Bayesian approach provides a rigorous framework for incorporating data from multiple sources and over different time periods. You can also answer questions such as “What is the probability of Obama or Romney winning?” This probability would not be immediately apparent just by inspecting the average of available polls: we seek a more sophisticated approach. Such an approach would be rooted in Bayesian statistical theory.
This Bayesian approach is best illustrated by a thought experiment. Suppose 12 polling organizations have conducted polls during the two days leading up to the election, each with a sample size of 1,000 people. We simulate hypothetical polling results in the table below.
Individual Poll Analysis

| Poll | Size | For Obama | For Romney | Percentage for Obama (p) | Probability Obama Wins | 95% Confidence Interval for p: Lower Bound | 95% Confidence Interval for p: Upper Bound |
|---|---|---|---|---|---|---|---|
| 1 | 1,000 | 502 | 498 | 50.2% | 55.0% | 47.1% | 53.3% |
| 2 | 1,000 | 491 | 509 | 49.1% | 28.5% | 46.0% | 52.2% |
| 3 | 1,000 | 523 | 477 | 52.3% | 92.7% | 49.2% | 55.4% |
| 4 | 1,000 | 489 | 511 | 48.9% | 24.3% | 45.8% | 52.0% |
| 5 | 1,000 | 518 | 482 | 51.8% | 87.2% | 48.7% | 54.9% |
| 6 | 1,000 | 516 | 484 | 51.6% | 84.4% | 48.5% | 54.7% |
| 7 | 1,000 | 519 | 481 | 51.9% | 88.5% | 48.8% | 55.0% |
| 8 | 1,000 | 505 | 495 | 50.5% | 62.4% | 47.4% | 53.6% |
| 9 | 1,000 | 511 | 489 | 51.1% | 75.7% | 48.0% | 54.2% |
| 10 | 1,000 | 499 | 501 | 49.9% | 47.5% | 46.8% | 53.0% |
| 11 | 1,000 | 500 | 500 | 50.0% | 50.0% | 46.9% | 53.1% |
| 12 | 1,000 | 517 | 483 | 51.7% | 85.9% | 48.6% | 54.8% |
Based on the data from Poll 2 alone, the Bayesian analysis would estimate the distribution of “p” as the following:
We can estimate the area under the curve to the right of p=0.5. This is 28.5% and it represents the probability of Obama winning. Likewise, the area under the curve to the left of p=0.5 is 71.5% and this is the probability of Romney winning. The 95% confidence interval for p of 46.0% to 52.2% can also be measured from the derived distribution. This analysis is based on only the data from Poll 2.
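The area calculation can be sketched in a few lines. This is a hedged illustration rather than the exact method behind the figures quoted here: it assumes a uniform Beta(1, 1) prior and approximates the resulting Beta posterior with a normal distribution of the same mean and standard deviation, which is reasonable for a sample of 1,000. The function name `analyse_poll` is hypothetical.

```python
import math
from statistics import NormalDist

def analyse_poll(votes_for, votes_against, level=0.95):
    """Return (P(p > 0.5), lower bound, upper bound) for one poll."""
    # Assumption: uniform Beta(1, 1) prior, so the posterior is Beta(1 + for, 1 + against).
    a, b = 1 + votes_for, 1 + votes_against
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    # Assumption: normal approximation to the Beta posterior.
    posterior = NormalDist(mean, sd)
    prob_win = 1 - posterior.cdf(0.5)      # area to the right of p = 0.5
    tail = (1 - level) / 2
    lower = posterior.inv_cdf(tail)        # lower end of the central interval
    upper = posterior.inv_cdf(1 - tail)    # upper end of the central interval
    return prob_win, lower, upper

# Poll 2: 491 for Obama, 509 for Romney.
prob_win, lower, upper = analyse_poll(491, 509)
print(f"P(Obama wins) = {prob_win:.1%}, 95% interval = {lower:.1%} to {upper:.1%}")
```

Under these assumptions the output should land close to the Poll 2 figures of 28.5% and 46.0% to 52.2%.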
But there is a way of combining the polls to reach more accurate conclusions: sequentially adding the poll results to produce a cumulative poll analysis is a powerful technique for obtaining more reliable estimates. If we analyze the polls on a sequential cumulative basis, we obtain the following table:
Cumulative Poll Analysis

| Cumulative Polls | Size | For Obama | For Romney | Percentage for Obama (p) | Probability Obama Wins | 95% Confidence Interval for p: Lower Bound | 95% Confidence Interval for p: Upper Bound |
|---|---|---|---|---|---|---|---|
| 1 | 1,000 | 502 | 498 | 50.2% | 55.0% | 47.1% | 53.3% |
| 2 | 2,000 | 993 | 1,007 | 49.7% | 37.7% | 47.5% | 51.8% |
| 3 | 3,000 | 1,516 | 1,484 | 50.5% | 72.0% | 48.7% | 52.3% |
| 4 | 4,000 | 2,005 | 1,995 | 50.1% | 56.3% | 48.6% | 51.7% |
| 5 | 5,000 | 2,523 | 2,477 | 50.5% | 74.2% | 49.1% | 51.8% |
| 6 | 6,000 | 3,039 | 2,961 | 50.7% | 84.3% | 49.4% | 51.9% |
| 7 | 7,000 | 3,558 | 3,442 | 50.8% | 91.7% | 49.7% | 52.0% |
| 8 | 8,000 | 4,063 | 3,937 | 50.8% | 92.1% | 49.7% | 51.9% |
| 9 | 9,000 | 4,574 | 4,426 | 50.8% | 94.1% | 49.8% | 51.9% |
| 10 | 10,000 | 5,073 | 4,927 | 50.7% | 92.8% | 49.8% | 51.7% |
| 11 | 11,000 | 5,573 | 5,427 | 50.7% | 91.8% | 49.7% | 51.6% |
| 12 | 12,000 | 6,090 | 5,910 | 50.8% | 95.0% | 49.9% | 51.6% |
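The cumulative rows can be approximated with a short loop that feeds each poll into the running posterior. As a sketch, it assumes a uniform Beta(1, 1) prior and a normal approximation to the Beta posterior; the names `OBAMA_VOTES`, `POLL_SIZE` and `cumulative_analysis` are hypothetical, and the counts are the "For Obama" figures from the individual poll table.

```python
import math
from statistics import NormalDist

# Votes for Obama in each of the 12 hypothetical polls of 1,000 people.
OBAMA_VOTES = [502, 491, 523, 489, 518, 516, 519, 505, 511, 499, 500, 517]
POLL_SIZE = 1000

def cumulative_analysis(obama_votes, poll_size):
    """Return one (total size, total for Obama, posterior mean, P(p > 0.5)) row per poll added."""
    a, b = 1, 1  # assumption: uniform Beta(1, 1) prior
    rows = []
    for votes in obama_votes:
        # The posterior after the previous polls acts as the prior for this one.
        a += votes
        b += poll_size - votes
        mean = a / (a + b)
        sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
        # Assumption: normal approximation to the Beta posterior.
        prob_win = 1 - NormalDist(mean, sd).cdf(0.5)
        rows.append((a + b - 2, a - 1, mean, prob_win))
    return rows

for size, for_obama, mean, prob_win in cumulative_analysis(OBAMA_VOTES, POLL_SIZE):
    print(f"{size:>6}  {for_obama:>5}  {mean:.1%}  {prob_win:.1%}")
```

Because the update only adds counts, processing the polls one at a time gives the same final distribution as pooling all 12,000 responses at once.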
Combining 12 polls (see last line in the table above), the Bayesian analysis estimates that the mean value for p is 50.8% and that there is a 95% chance of Obama winning. This can be seen from the new distribution of p based on the combined polls:
Now 95% of the area under the curve lies to the right of 0.5 and thus the probability of Obama winning is 95%. As can be expected, the distribution of p is much narrower after taking into account all 12 polls compared to the distribution derived from just a single poll. It so happens that the data in this example was generated from a hypothetical population where p=0.51. The distribution above shows how closely the Bayesian analysis was able to ferret out this 0.51 amount. Thus despite the margin for Obama being only 51% to 49% in the hypothetical example, the Bayesian analysis based on 12 polls can estimate the probability of Obama winning at 95%.
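For readers who want to repeat the experiment, the sketch below shows one way to generate poll results from a population where p=0.51. The generator, the seed and the function name `simulate_polls` are assumptions for illustration, so the counts it produces will differ from those in the tables.

```python
import random

def simulate_polls(true_p, n_polls, poll_size, seed=0):
    """Simulate n_polls independent polls; return the count voting for Obama in each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < true_p for _ in range(poll_size))
        for _ in range(n_polls)
    ]

polls = simulate_polls(true_p=0.51, n_polls=12, poll_size=1000)
pooled_share = sum(polls) / (12 * 1000)
print(polls)
print(f"pooled share for Obama: {pooled_share:.3f}")
```

Individual polls scatter on both sides of 50%, while the pooled share should sit much closer to the true value of 0.51.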
The above charts also highlight that the Bayesian analysis does not produce a point estimate for p, as is done in traditional statistical analysis, but a full probability distribution of p, allowing for a richer analysis.
At Elucidor, we use Bayesian analysis as a core part of our analytics of insurance related risks. As with the above analysis, it produces a number of key advantages for risk takers such as insurance companies and investors, including:
- You can combine data from multiple sources in a cohesive fashion to vastly improve the accuracy of your results. This can be done as data is received rather than waiting until the end of the analysis period.
- You can calculate risk measures (e.g. the probability of default, the probability of a claim, the probability of exceeding an attachment point) within a structured format.
- Bayesian models can more easily incorporate the risk that your models are not representative of the real world. This is often called parameter risk.
- It is also more feasible to incorporate complexity in models in a practical way (e.g. analyzing winning on an electoral rather than popular vote basis) than would typically be the case in a traditional statistical model.