Do Complex Election Forecasting Models Actually Generate Better Forecasts?

Accepted submission by day of the dalek at 2024-09-22 07:23:39 from the lies-damn-lies-statistics-and-pundits dept.
Science

We are just a few weeks away from the general election in the United States and many publications provide daily updates to election forecasts. One of the most well-known forecasting systems [natesilver.net] was developed by Nate Silver, originally for the website FiveThirtyEight. Although Silver's model is quite sophisticated and incorporates a considerable amount of data beyond polls, other sites like RealClearPolitics [realclearpolling.com] just use a simple average of recent polls. Does all of the complexity of models like Silver's actually improve forecasts, and can we demonstrate that they're superior to a simple average of polls?
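To make the baseline concrete, here is a minimal sketch in Python of a simple average of recent polls. The poll numbers, the 14-day window, and the unweighted mean are illustrative assumptions rather than RealClearPolitics' actual rules:

    from datetime import date, timedelta

    # Hypothetical polls: (poll end date, candidate's share in percent).
    polls = [
        (date(2024, 9, 10), 48.0),
        (date(2024, 9, 14), 46.5),
        (date(2024, 9, 18), 47.2),
        (date(2024, 9, 20), 49.1),
    ]

    def simple_poll_average(polls, today, window_days=14):
        """Unweighted mean of all polls ending within the last window_days."""
        cutoff = today - timedelta(days=window_days)
        recent = [share for end, share in polls if end >= cutoff]
        return sum(recent) / len(recent) if recent else None

    print(round(simple_poll_average(polls, date(2024, 9, 22)), 2))  # -> 47.7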

Pre-election polls are a bit like a science project that uses many sensors to measure the state of a single system. There's a delay between the time a sensor is polled for data and the time it returns a result, so the project uses many sensors to get more frequent updates. However, the electronics shop had only a limited stock of the highest-quality sensor, so the project also uses sensors with a larger bias, lower accuracy, or different methods of measuring the same quantity. The project then combines the noisy data from these heterogeneous sensors to produce the most accurate estimate it can of the state of the system.
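A standard way to combine readings like these is inverse-variance weighting, which gives more precise sensors more influence. This Python sketch assumes each sensor's known bias has already been subtracted and its noise variance is known; both are assumptions made for illustration:

    def fuse_sensors(readings):
        """Inverse-variance weighted estimate from heterogeneous sensors.

        readings: list of (measurement, variance) pairs, with each
        measurement already corrected for any known bias. Lower-variance
        (higher-quality) sensors receive more weight.
        """
        weights = [1.0 / var for _, var in readings]
        estimate = sum(w * m for (m, _), w in zip(readings, weights)) / sum(weights)
        fused_variance = 1.0 / sum(weights)
        return estimate, fused_variance

    # Two precise sensors and one noisy one, all measuring the same quantity.
    est, var = fuse_sensors([(47.0, 1.0), (48.0, 1.0), (52.0, 9.0)])
    print(round(est, 2), round(var, 2))  # -> 47.74 0.47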

Polls are similar to my noisy sensor analogy in that each poll has its own methodology, has a margin of error that depends on its sample size, and may exhibit what Silver calls "house effects" [fivethirtyeight.com]: a tendency for a polling firm's results to consistently favor certain candidates or political parties. Some of the more complex election forecasting systems, like Silver's model [fivethirtyeight.com], attempt to correct for these biases and give more weight to polls with larger sample sizes and methodologies that follow better polling practices [fivethirtyeight.com].
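As a rough illustration of why sample size matters, the margin of error of a simple random sample shrinks with the square root of the sample size, and a house effect can be modeled as a constant offset to subtract from a firm's results. The numbers below are hypothetical, not Silver's actual house-effect estimates:

    import math

    def margin_of_error(n, p=0.5, z=1.96):
        """95% margin of error, in percentage points, for a simple random
        sample of size n, assuming a proportion p."""
        return 100 * z * math.sqrt(p * (1 - p) / n)

    def adjust_for_house_effect(raw_share, house_effect):
        """Subtract a firm's estimated house effect (in points) from its result."""
        return raw_share - house_effect

    print(round(margin_of_error(600), 1))   # -> 4.0 points
    print(round(margin_of_error(2400), 1))  # -> 2.0 points: 4x the sample halves the error
    print(adjust_for_house_effect(49.0, house_effect=1.5))  # firm leans +1.5 -> 47.5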

The purpose of election forecasts is not to take a snapshot of the race at a particular point in time, but to predict the results on election day. For example, after a political party officially selects its presidential candidate at its convention, the candidate tends to receive a temporary boost in the polls, known as a "post-convention bounce" [natesilver.net]. Although this effect is well documented across many election cycles, it is temporary, and polls taken during this period tend to overestimate the support the candidate will actually receive on election day. Many forecast models adjust for this bias when incorporating polls taken shortly after a convention.
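As a sketch of how such an adjustment might look, suppose (purely for illustration) that the bounce starts at about 3 points and decays exponentially with a 10-day half-life; neither number comes from any published model:

    def convention_bounce_adjustment(days_since_convention,
                                     initial_bounce=3.0, half_life_days=10.0):
        """Estimated residual post-convention bounce, in points, to subtract
        from a poll taken days_since_convention days after the convention.
        The exponential form and both parameters are illustrative assumptions."""
        return initial_bounce * 0.5 ** (days_since_convention / half_life_days)

    for d in (0, 10, 30):
        print(d, round(convention_bounce_adjustment(d), 2))  # -> 3.0, 1.5, 0.38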

Election models also often incorporate "fundamentals" [gwu.edu] such as approval ratings and the tendency of a strong economy to favor incumbents. This information can be used on its own to predict election outcomes or combined with polling data in a single model. Some forecast models, like Silver's, also incorporate polls from similar states [fivethirtyeight.com] and national polls to produce a more accurate forecast for a given state and smooth out the noise in individual polls. These models may also incorporate past voting trends, expert ratings of races, and data from prediction markets. The end result is a very complex model that ingests a large amount of data. But does it actually produce more accurate forecasts?
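A toy sketch of one way polls and fundamentals could be blended: weight the poll average more heavily as election day approaches. The linear weighting schedule and the 150-day horizon are assumptions made for illustration, not any model's published method:

    def blended_forecast(poll_average, fundamentals_prediction,
                         days_to_election, full_weight_days=150):
        """Weighted blend of a poll average and a fundamentals-based prediction.
        Far out, fundamentals dominate; near election day, polls dominate."""
        w_polls = max(0.0, min(1.0, 1 - days_to_election / full_weight_days))
        return w_polls * poll_average + (1 - w_polls) * fundamentals_prediction

    print(round(blended_forecast(47.7, 50.0, days_to_election=120), 2))  # -> 49.54
    print(round(blended_forecast(47.7, 50.0, days_to_election=15), 2))   # -> 47.93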

Unusual behaviors have been noted in some models, such as the tails of Silver's model [columbia.edu], which tended to include some very unusual outcomes. On the other hand, many models treated Hillary Clinton's victory over Donald Trump as a near certainty in 2016 [columbia.edu], perhaps underestimating the potential magnitude of polling errors and producing tails that weren't heavy [wikipedia.org] enough. Election forecasters have to decide which factors to include in their models and how heavily to weight them, sometimes drawing criticism when their models appear to be outliers [salon.com]. Presidential elections occur only once every four years in the United States, so there are more fundamental questions about whether enough data even exists to verify the accuracy of forecast models [politico.com]. There may even be evidence of a feedback loop, in which election forecasts themselves could influence election results [nytimes.com].
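The practical effect of tail weight can be shown by comparing a normal error model with a heavier-tailed Student's t. In this sketch the 4 degrees of freedom and the 3-point error scale are arbitrary choices; the point is only that the heavy-tailed model is less certain about an apparent blowout:

    from scipy import stats

    def win_probability(margin, sigma, dist="normal", df=4):
        """Probability that the true margin is positive, given a polling
        margin (points) and an error scale sigma. A Student's t with few
        degrees of freedom has heavier tails than the normal."""
        if dist == "normal":
            return stats.norm.cdf(margin / sigma)
        return stats.t.cdf(margin / sigma, df)

    # A 6-point lead with a 3-point error scale:
    print(round(win_probability(6, 3, "normal"), 3))  # ~0.977
    print(round(win_probability(6, 3, "t"), 3))       # ~0.942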

Whether the goal is to forecast a presidential election or to project a player's statistics for an upcoming baseball season, even the most complex forecasting systems sometimes struggle to outperform simple prediction models [mlb.com]. I'm not interested in a discussion of politics; instead, I pose a fundamental data science question: does all of the complexity of election models like Nate Silver's make a meaningful difference, or is a simple average of recent polls just as good a forecast?
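One empirical way to attack that question is to backtest competing forecasts against actual outcomes using a proper scoring rule such as the Brier score. The probabilities and outcomes below are made up purely to show the mechanics of the comparison:

    def brier_score(forecasts, outcomes):
        """Mean squared error between predicted win probabilities and actual
        outcomes (1 = predicted candidate won, 0 = lost). Lower is better."""
        return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

    # Hypothetical win probabilities from a complex model and a simple poll average.
    complex_model  = [0.85, 0.60, 0.95, 0.30]
    simple_average = [0.80, 0.55, 0.90, 0.40]
    outcomes       = [1, 1, 1, 0]

    print(round(brier_score(complex_model, outcomes), 3))   # -> 0.069
    print(round(brier_score(simple_average, outcomes), 3))  # -> 0.103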

