
Post-Election Interview With Andrew Gelman and G. Elliott Morris

An interview with Andrew Gelman and G. Elliott Morris by Liberty Vittert and Xiao-Li Meng
Published on Dec 14, 2020

Listen to the interview or read the transcript below


Liberty Vittert (LV): Hello, and welcome to the Harvard Data Science Review special theme on the 2020 U.S. election. I'm Liberty Vittert, Media Feature Editor for the Harvard Data Science Review, and I'm joined by my co-host Xiao-Li Meng, our Editor-in-Chief. Today we are speaking with Professor Andrew Gelman of Columbia University and G. Elliott Morris, a data journalist at The Economist. They predicted a Biden win in the Harvard Data Science Review in October 2020, and they are here today to discuss that correct prediction in the aftermath of the election.

Well, thank you both so much for joining us. We really appreciate it, and we also loved your paper. Would one of you be able to give the listeners a brief recap of your model, and also a general assessment of what went wrong and what went right, a sort of report card for the prediction or forecast?

Andrew Gelman (AG): Well, I'll start by describing the model, and then Elliott can say what I've left out. In general, what's important about a statistical model is not what it does with the data, but what data it uses. And our model uses three sources of data. We have a fundamentals-based prediction, predicting the election outcome based on previous election outcomes, the economy, and public opinion during the campaign period. Then we have state and national opinion polls, and then we have models connecting the state and national polls to the outcome of interest, which is how people are actually going to vote. There are two parts to this model. The first part is a model for sampling and non-sampling error in the polls, and that's, right now, kind of the famous part of the model, because it allowed the polls to be off by more than their stated margin of error, based on how far off polls have been historically. Second, we have a time series model, which allows public opinion to change over the months of the campaign, including during the very last week. Both of those models have correlations at the state level, so we assume that polling errors can be positively correlated: if there are errors, you'd expect them to happen at the national level as well as at the state level. Similarly for changes in public opinion: you will have some idiosyncratic changes, but you'll also have changes at the national level. We did not have much of a turnout model to predict whether voters of one party or another would be more likely to turn out compared to previous elections. And our model did not include factors for votes being invalidated; the evidence is that votes by mail are invalidated at a higher rate than in-person votes.
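
To make that structure concrete, here is a minimal simulation sketch in Python. It is not the authors' published model code (that is on GitHub), and every number in it is invented for illustration; it only shows how a fundamentals prior, a shared national error, state-level errors, and a random-walk drift in opinion combine into win probabilities.

```python
# Illustrative sketch only -- not the Economist model. All numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_days, n_sims = 3, 30, 10_000

fundamentals_prior = np.array([0.53, 0.50, 0.48])  # prior mean two-party share per state
national_error_sd = 0.02    # non-sampling error shared by all states (correlated part)
state_error_sd = 0.02       # idiosyncratic state-level non-sampling error
drift_sd = 0.002            # daily random-walk step in underlying opinion

final_share = np.empty((n_sims, n_states))
for s in range(n_sims):
    national_shock = rng.normal(0.0, national_error_sd)                 # hits every state
    state_shocks = rng.normal(0.0, state_error_sd, n_states)            # state-specific
    drift = rng.normal(0.0, drift_sd, (n_days, n_states)).sum(axis=0)   # opinion change
    final_share[s] = fundamentals_prior + national_shock + state_shocks + drift

print("P(win) by state:", (final_share > 0.5).mean(axis=0).round(2))
print("P(win all three):", round((final_share > 0.5).all(axis=1).mean(), 2))
```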

LV: That was the best recap I've heard in a long time of such a big paper. So, for the report card aspect, Elliott, are you going to give yourself a report card?

G. Elliott Morris (GEM): The model worked about as well as you'd expect under conditions of higher-than-average polling error. If you just look at the raw average of polls, not our model output, the polls look to have overestimated support for Joe Biden on average by five or six percentage points across states.

AG: Wait, wait, wait. The polls overestimated support for Joe Biden by about two and a half percentage points.

GEM: Yeah, I was quoting the error on the margin. Sorry. So, on vote share, two and a half percentage points of two-party vote share, which is higher than the two-percentage-point average error in two-party vote share from 2000 to 2016. So we're looking at a higher-than-average polling-error election. Now, all things considered, our model simulated enough error to have, I think, 48 or 49 state outcomes fall within the confidence intervals, which I guess is pretty good, but it's certainly disheartening performance for public pollsters. We're going to have to think a lot about how to leverage other sources of information if we do this again, or about the tail width of our model, which probably should have been a bit wider.

Xiao-Li Meng (XLM): Let me follow up on that for both of you. First, thank you for publishing your paper in HDSR. It's great. You did do one thing which I really appreciate, and I wrote about it in my editorial: the backtesting. You actually applied your model back to 2016 to see how it would have performed. And if I remember correctly, you did see there was already an over-estimation. So the model you were backtesting was warning you that there could still be over-prediction. And so my question for you is: once you saw those results, how did that help you re-tune the model, or did you decide not to do anything but simply report it?

AG: Let me take a shot at this one. There are two ways of responding to the issue that when you fit the model to 2016, you found that it slightly overestimated Hillary Clinton's vote share. The first response is to say the polls overall overestimated the Democrats' vote share in 2016, so we would expect correlated error. We should assume some regression to the mean (which in this case didn't happen), but we should assume that in 2020, on balance, the polls would overestimate the Democrats. And so we should put a shift factor into the model corresponding to a correlation in polling error from one year to the next. That's one response, which we did not take. The other response is an economically motivated response: to say, yes, the polls were off by that amount in 2016; the pollsters have every economic motive to correct for that, and so they're going to try their best. Given that, we have no expectation that the polls would be biased toward one party or the other, so let's put in a zero-mean, high-variance error term, which is what we put in. Now, one reason why we did this is that in 2012, the polls were not biased in favor of the Democrats. Thinking about it, I feel like there was information not in our model that perhaps could have indicated that the Republican vote share would be better than predicted. There was some voter registration information. It's not so easy, right? To throw information into a model requires additional steps. It would have required us to say, yeah, thinking about this, given all the information here, there is evidence outside the polls suggesting that the Republicans might do better; let's put that in. It's easy to say that in retrospect; during the campaign, you could also have said that there was evidence outside the polls favoring the Democrats, like various Republican leaders not supporting Trump. It's hard to say, but certainly that could be done.

My quick answer on the backtesting is that the backtesting alone wouldn't have done it, because beyond allowing us to have a healthier error term and saving us from a major embarrassment, what we're doing is saying that, in the past, there have been occasional large errors, so let's allow for that. Given the historical data on the polls, it would have been a tough call for us to assume that just because there was a polling error in the Democratic direction in 2016, we would expect that in 2020. You'd need extra information in the model somewhere that wasn't already accounted for in the polls.
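
As a toy contrast between the two responses described above, the sketch below (my own illustration, not the authors' code) compares a nonzero-mean 'shift factor' that carries part of an assumed 2016 bias forward with the zero-mean, high-variance error term they actually chose; the 2.5-point bias and the 0.5 regression-to-the-mean factor are assumptions for illustration.

```python
# Toy comparison of the two choices for the national polling-bias term.
import numpy as np

rng = np.random.default_rng(1)
bias_2016 = 0.025  # assumed ~2.5-point pro-Democratic bias in 2016 (two-party share)

# Response 1 (not taken): carry part of the 2016 bias forward as a shift factor.
shift_model = rng.normal(0.5 * bias_2016, 0.015, 100_000)   # 0.5 = assumed mean reversion

# Response 2 (taken): zero-mean but high-variance error term.
zero_mean_model = rng.normal(0.0, 0.025, 100_000)

for name, draws in [("shift factor", shift_model), ("zero mean", zero_mean_model)]:
    print(f"{name:>12}: mean bias {draws.mean():+.4f}, "
          f"P(bias favors Democrats) {np.mean(draws > 0):.2f}")
```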

XLM: I think I understand your perspective. And in one way, I think that's the right response, because you say, historically, these polls always have errors here and there, and you don't want to over-correct just because one election was kind of off. I understand that. But my counterargument would be that for 2016, we have pretty clear evidence (I did some work on that as well, as you know) to show that there was this, whether you call it nonresponse bias or the shy Trump phenomenon, clear indication that people were just reluctant to tell you that they were voting for Trump. Now, given this is the same candidate and given the situation is even more so, I thought your model would have taken that factor into account as well.

AG: OK, there are two things. One, I don't see evidence that people were reluctant to say they would vote for Trump. I think that Republican voters were less likely to respond to the polls, and it was a nonresponse problem, which, of course, you've studied, Xiao-Li. I also think this nonresponse persisted even after adjusting for various factors. So, there were polls that adjusted for how people stated they voted in the previous election. We thought that would adjust for it, and I think that did work in 2016, but not this time. That was part of it. I think we were trusting the pollsters. I wouldn't say 'we were trusting the pollsters and they let us down'; that's kind of wrong. It's our job, if we're post-processing their data, to model whatever biases and errors they might have. They did their best and their best wasn't good enough. But our job is to recognize that. I think it's easy to say now that we should have, and I do think now that we should have, and I don't mean only in retrospect. One thing we did not do during the campaign is ask, 'What other sources of data do we have?' We talked about potential biases, and we talked about being concerned that our uncertainty intervals were too narrow, and we were worried about potential tail events. But the one thing I don't recall us discussing was: is there other information, not in the polls and not in the fundamentals, that we could put into the model? To the extent that we made a mistake, that was the mistake. I don't think we could have done it without additional information.

XLM: Well, I certainly agree with that. And I take your point that the nonresponse biases are clearly there. The question, then, is that when you say you rely on the pollsters to make corrections (I do know most polls make corrections through weighting, and I understand they now weight more on education level), these weightings don't really take care of the kind of selective nonresponse, whether you call it a shy Trump phenomenon or a shy Republican phenomenon, where people just selectively don't respond. You have to address that directly. Unless these weights can address that, I don't see how you can solve the problem.

AG: Well, you can adjust for it. I don't like to refer to weighting, because that's only one particular technique, but you can adjust for differences in Republican and Democratic response rates by adjusting for the partisanship of the respondents. That's not perfect either, though, and it also doesn't address turnout issues.
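
As a rough numerical illustration of that kind of adjustment (only one of several possible techniques, as Gelman notes), here is a small example with made-up population shares, response rates, and support levels. It also shows the caveat he raises: the correction works here only because, in this toy setup, the nonresponse bias runs across partisan groups rather than within them.

```python
# Made-up illustration of adjusting for differential partisan response rates.
groups = {"Dem": 0.33, "Rep": 0.33, "Ind": 0.34}            # assumed population shares
response_rate = {"Dem": 0.010, "Rep": 0.006, "Ind": 0.008}  # Republicans respond less
support = {"Dem": 0.92, "Rep": 0.07, "Ind": 0.50}           # share backing the Democrat

# Unadjusted estimate: the sample over-represents high-response groups.
sample_share = {g: groups[g] * response_rate[g] for g in groups}
total = sum(sample_share.values())
unadjusted = sum(sample_share[g] / total * support[g] for g in groups)

# Adjusted estimate: weight each partisan group back to its population share.
adjusted = sum(groups[g] * support[g] for g in groups)

print(f"unadjusted: {unadjusted:.3f}   adjusted for partisanship: {adjusted:.3f}")
# The adjustment cannot fix bias among nonrespondents *within* a party,
# and it says nothing about turnout -- the limitations raised in the interview.
```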

LV: I wanted to ask really quickly about the lack of evidence for the shy voter. There was a study (I think it was out of USC, maybe) where they asked people who they were voting for, and when they did that, Biden was up by 10 points. But when they changed the question to who their neighbors and friends were voting for, people were much more willing to say Trump, and Biden's lead went down by, I think, almost five points. So, are you familiar with that approach of changing the question to get at whether someone's a shy voter or not?

AG: I really would step away from the whole 'shy voter' thing. I don't see evidence for that. But we've done some survey questions like that, asking people how their friends and families and neighbors plan to vote. You can learn a lot from that. That's the kind of information that would be great to incorporate into a model; it's not completely clear how. I think when you talk about how your friends vote, part of that is perception, which could be wrong. Part of it is that people who tell you how they're going to vote are presumably more enthusiastic, and enthusiasm matters also. If people are more likely to say that their friends are going to vote for Trump, that actually suggests that Trump voters are the opposite of shy. I think that's good data. I think it's a cognitive thing: when people think about surveys, it's somehow very easy for them to imagine survey respondents lying, but it's harder to visualize people just not responding to the survey in the first place. I think most of the problem is nonresponse. Some of the problem is not being able to model turnout. Insincere survey responses doubtless exist; I just think they're too much in the forefront of people's minds. As I said, it's a kind of cognitive illusion: when you think of a survey, you're picturing somebody responding, and so you think about insincere survey respondents. You don't really think about the people who don't respond at all. So, it's not that it can't exist; I just think the attention paid to it is a mistake.

XLM: Well, I certainly agree with that. I think that, collectively, we still have too much confidence in the polls themselves. Right? Because, you know, we understand the theory of why they work, but we underestimate how extremely difficult it is to implement polls well. People have the choice of not responding for whatever reason, whether it's being a shy voter or just not wanting to respond to you, and that's the mechanism. Until we seriously address that mechanism, I don't think we can get to the point where we will feel very confident about these polls.

AG: You can get at some of it. You can do polls where you actually sample people from lists of registered voters, for example, and it is public record whether people have voted before. But it's difficult. That's what Steve Ansolabehere said: given that the nonresponse rate is over ninety-nine percent, it's pretty impressive that a poll can be within two and a half percentage points of anything.

XLM: Right. I actually looked into using Steve's data on people who actually voted, because it's public record, and you still clearly see a kind of nonresponse bias; retrospectively, you can obviously calculate that.

We want to talk to you about the larger picture. No matter how we spin it, given the pollsters' performance in 2016 and 2020, I don't think the 2020 election has raised the public's confidence in polling. The question, then, is: moving forward, what do we do, you and me and all kinds of professionals, to communicate to the general public, to ensure they participate in the ways we expect and don't selectively not respond? People say, 'Well, why should I participate? Those polls are really not that useful.' People may even think them deceptive. So that's an issue, right?

AG: I mean, setting aside our professional interests as statisticians, so what if people don't participate in polls? They don't have a responsibility to. In the 1950s, it was very rational to respond to a poll, because if you responded, you were one of 1,500 people responding to the Gallup poll. You might be the person who flipped it from 50% to under 50% on some important issue, and it would be in the news. That would be a very effective way of affecting policy. Now that there are millions of polls, that's not going to happen. Polls aren't going away, because businesses still want to know whether you would pay $20 more a month for cable TV if it included ESPN, and so forth. They're always going to be doing polls, and as long as the results of political polls aren't too embarrassing for the pollsters, they're always going to include political questions on their polls, not because it makes money, but because it gets them in the news. Then there are ways of trying to do a better job of correcting for these things. But, yeah, public opinion: George Gallup argued that polling was very good for democracy because it made politicians more aware of public opinion. So, it's true, if we don't have any confidence in polls, that removes one of the connections between the voters and the politicians. It kind of gives politicians an excuse to do what they want and not listen to the voters. So, that is a concern.

Maybe if polls are only off by two and a half percentage points, it's not such a big issue, because most issues aren't so close to the knife edge. Something like 75% of people in America supported health care reform until it became politicized, and then it became 50%. I think that's about what happened; whether it was really from 72% to 54% or from 75% to 48% doesn't matter that much. The fact that it was 50 and not 75 made a big difference. It really emboldened the Republicans to oppose it, which is completely legitimate; the polls gave them information that supported their position. It's not that politicians can't go against the polls, which is fine, but they should be aware of what public opinion is.

LV: Would you say that there is any secret to forecasting? What would be your best advice for forecasting?

AG: Well, I wouldn't tell you my secret, right? We put all our code on GitHub. We obviously don't have a secret. It's like we're giving you all the ingredients and the recipe for a sauce. I got nothing. And I think my best hope for improving things is to write some textbooks so that future researchers will figure out their own ideas and go beyond that.

LV: That's the best advice. I love it. I have one last question before you go: what do you think is going to happen in 2024?

AG: I'll let Elliott take that one. It's too bad we weren't able to overlap more and argue with each other a little bit, as we do sometimes when we're having our meetings. I'll let him take the 2024 question.

XLM: Thank you Andrew. We will find a chance to argue more. We used to argue with each other a lot.

GEM: I have no idea about 2024, I wouldn't be so bold as to make a prediction that far in advance.

XLM: That's a safe answer.

GEM: Maybe we'll have a Tom Cotton/Bernie Sanders ticket.

LV: I mean, I guess anything could happen at this point!

XLM: That looks like how life has been evolving in 2020, right? Lots of surprises.

LV: Elliott, I wanted to dig into a tweet that you wrote earlier. You said, "For the record, it is possible to believe both that (a) The patterns of non-response [sic] and pro-Dem bias in the polls in 2016, 18, and 2020 is very concerning and (b) most reporters covering polls/forecasts/etc [sic] don't understand the true underlying uncertainty in pre-election surveys." So, could you dig into that a little bit more and explain, when you say that Biden has a 99% chance of winning, how should reporters covering polls and forecasts report or explain that?

GEM: Well, I'll first explain the motivation. Right after we got our first red-mirage-ish election results Tuesday night, there was an immediate influx of articles, across news outlets and journalists, about how the polls were broken and forecasts were useless, using those exact words. I don't think that's true. At the very least, forecasts contextualize the polls in ways that we would not have been able to do without forecasts, or without more sophisticated statements of the margin of error. That brings me to the point, which is that, judging by that outrage, it seems like lots of journalists don't understand the margin of error in a survey. We've written about how the margin of error is probably at least two times as large as the one pollsters will tell you. Yet journalists expect the electoral predictions, or snapshots in time, to be within one or two percentage points of the results every time, and if they're not, it's a failure. I think the forecasts provide a public service in contextualizing that error and the range of outcomes that are possible under certain conditions. As for our prediction, I don't think we're communicating it the right way. Telling the public, or even journalists (who might be the main consumers of forecasts anyway), that a candidate has a 97% chance does communicate the range of outcomes, but not in a way that makes them think critically about the uncertainty in the polling data.

Next time around, we'll probably think a little bit harder about how we're displaying our forecast and how we're trying to communicate uncertainty, because if we are saying, basically, that Joe Biden has so large a lead that even a historically unprecedented polling error would probably not be enough to elect Donald Trump to a second term, and that's exactly what happens, and people still write 'the polls are useless, forecasts are bogus' articles, there's clearly room for improvement in how we communicate forecasts.

XLM: I think one of the things we are all struggling with is this concept of margin of error. You know, we statisticians have done quite a successful job of conveying that idea to the general public, and people just take the margin of error as the measure: no matter how you spin it, when you say plus or minus three percent, people think the truth must be within that interval, no matter how you explain it. Now, my fundamental question is that, as professional statisticians, we knew the problem: the original concept of margin of error only captures the sampling error. We know well that that kind of sampling error is really just one part, and sometimes even a minor part. There are all these other errors, which we statisticians don't even call variation; we call them bias: nonresponse bias, all kinds of noncoverage bias. The question then is, until we can factor those things into the margin of error calculation, we're going to miss them. And the problem, as you know well, is that for the public, if the interval doesn't cover the truth, even if you just missed by a little bit, no matter how you explain it or spin it, people think it's just wrong.

What can we do here? How do we think about re-educating the general public about this concept of margin of error, and, more importantly, re-educating the pollsters about it? What should they include, how should they calculate it, and how should it take into account historical errors of all kinds? It does seem to me that this is a moment when we need to think very critically both about how polls are conducted and about the public's perception of how to deal with this. Otherwise, we're fundamentally going to have a problem.

GEM: I have two thoughts. The first is, well, I should preface this by saying I don't want to talk down to pollsters, many of whom are smarter than me and do a lot of important work. But I think the American Association for Public Opinion Research (AAPOR) could do a lot to lean on pollsters and give them best practices for reporting margins of error, especially in pre-election surveys. Some pollsters, on Twitter or in their press releases, will release multiple turnout scenarios, and that implicitly makes the case that the polling numbers can change based on how you weight the data. But they very rarely tell you the full range of outcomes across the possible turnout scenarios. If AAPOR tells pollsters, 'You need to include your non-sampling error as well as your sampling error in the margin of error,' that might do a lot.

Then, of course, there's also pressure on people like me, who not only make forecasts but also cover the polls, to try to educate the public about the true range of uncertainty in those data. At some point, devoting so much time to forecasting brings a lot of psychological pressure. It makes you want to dig in on your predictions, especially when we're at 97% for Biden. Sometimes in communicating the forecast, we lose some of the context that people really, really need to hear: 'The polls have double the margin of error that you think,' over and over again. We really need to hammer that home, rather than just the range of outcomes for the election, because if we do that, it could improve the public's understanding of survey uncertainty a lot. Of course, this is not a researched opinion, but that's probably one step that I can take, and a step that AAPOR could take pretty easily as well.
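
As a back-of-the-envelope illustration of the 'double the margin of error' point, the sketch below computes the usual sampling-only margin for a hypothetical 1,000-person poll and then folds in a non-sampling component. The 2.7-point non-sampling standard deviation is an assumption chosen to reproduce the roughly two-fold rule of thumb, not a figure from the authors.

```python
# Back-of-the-envelope total-error calculation for a hypothetical poll.
import math

n, p, z = 1000, 0.5, 1.96
sampling_sd = math.sqrt(p * (1 - p) / n)      # sampling error only
reported_moe = z * sampling_sd                # the +/- figure pollsters report (~3.1 pts)

nonsampling_sd = 0.027                        # assumed: frame, nonresponse, weighting error
total_moe = z * math.sqrt(sampling_sd**2 + nonsampling_sd**2)

print(f"reported (sampling-only) margin of error: +/-{100 * reported_moe:.1f} pts")
print(f"margin including non-sampling error:      +/-{100 * total_moe:.1f} pts")
```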

LV: To go a bit further into what actually happened and where the error came from that you weren't expecting, there was an article, I think it was in The New Yorker with Nate Cohn. And he spoke about two big issues: one was the nonwhite, mostly Hispanic vote that went for Trump, more so in 2020 than in 2016. And the other was the white rural Midwest, middle-of-the-country, American vote. Could you talk about that? Do you agree with that? Do you think that's where the real errors are or do you think there's something else going on?

GEM: Yeah, I agree with that. Obviously, we're not going to be able to cover all the potential facets of the error. But, yeah, I think it's useful to stratify the errors, the first being in Florida and, to some extent, Arizona and Texas, where polls overestimated Joe Biden because polls have a hard time reaching Hispanics. Especially in Florida, polls tend to weight all Hispanics the same and not give extra weight to Cubans, who make up 4-5% of the population. I guess, perhaps, if pollsters had designed their Hispanic weighting by subgroup within the Latino community, they would have picked up more support for Trump. But it could also just be that there was a late shift in the campaign that they missed, and that matters a lot there. Surveying Hispanics is always hard, though.

If we're talking about residual errors versus 2016, the ones that really stand out to me are the enormous errors in Wisconsin, Pennsylvania, and Michigan, approaching eight percentage points in Wisconsin. In 2016, the error in Wisconsin was closer to four or five points, depending on how you sliced it. So, the doubling of the error there tells us that there's something wrong with the way that pollsters are reaching non-college-educated white Trump supporters. We were talking about nonresponse earlier; that's probably the best place to direct our research. Clearly, the typical adjustments for differential partisan nonresponse aren't going far enough. Online pollsters like YouGov and Civiqs, and pollsters like Monmouth University and The New York Times, all adjust for partisan nonresponse in one way or another, either in pre- or post-processing, and they all overestimated support for Joe Biden, statistically by about the same margin, or an indistinguishable margin, as those who weren't adjusting for nonresponse. Our model had an adjustment for this and didn't pick up any residual differences; it didn't improve our predictions. So, there's got to be something going on within those partisan groups that's causing nonresponse. I don't conduct my own polling, so we can't really look at that directly. We can lean on YouGov, who provide The Economist with data, for some findings. To the extent that I can prescribe suggestions, that's where I would think they should spend most of their effort.

XLM: I would assume, and I hope, that after this election further efforts will be made to really address nonresponse, noncoverage, and all these things that are much more fatal than the kind of sampling errors we're talking about. I want to raise a broader question: in making a lot of these corrections, there are lots of judgments that need to be made. Now, in the data science community, there is this general perception that when we build these statistical models or economic models, we do things that are much more evidence-based. There's lots of data, lots of modeling. There is less emphasis on judgment, on qualitative assessment of all these kinds of issues.

We just had a conversation with Allan Lichtman. As you know, his approach is completely qualitative, using his "Keys." You can debate whether these methods are better or worse. But there is one big point here, which is that he uses a lot of judgment, and that's also a vulnerability. I want to ask you: in your modeling paper, I assume you made lots of judgments as well, right? Setting up the prior, deciding where to make adjustments, and so on. Can you talk a little bit about how you make those judgments in the process of modeling? Where do you draw the line? How do you see the interplay between qualitative thinking and quantitative thinking? That kind of general picture.

GEM: I would focus on two major judgments (I wouldn't necessarily call them subjective, but certainly judgments) that we made in model building and in data inclusion. In model building, we decided, in our fundamentals prediction for the election, which is the prior of our Bayesian poll aggregation, to adjust the role that the economy plays in changing voter behavior for the level of political polarization in the electorate, specifically for the amount of swing voting over time. So, our regression predicting the incumbent party's aggregate national vote share from 1948 to now had an interaction term with the degree of polarization of the electorate. We took some flak for this because it's kind of new, and it's another variable, and a pretty underpowered one in the regression. But here are the judgments that made us think it was worthwhile. First, political science research finds that partisanship has come to moderate economic perceptions. If that's true, then it's going to moderate how people respond to the economy when deciding who to vote for. That made us pretty confident that this is something we should be trying to control for, even if we're not doing it in the perfect way. And second, if you just look at a time series of how people feel about the direction the economy is headed, there's a complete partisan reversal when Donald Trump is elected. We thought this was an adjustment we needed to make in model building. The way we reasoned about it was to ask whether we had enough evidence to support our prior that this is something we should be doing, and then to trust that we're capturing the amount of uncertainty that the added predictor brings. Of course, there's always room for disagreement there, and there was quite a lot of disagreement about it.
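
A schematic version of that fundamentals regression, with entirely fabricated data rather than the authors' dataset: incumbent-party vote share regressed on economic growth, a polarization index, and their interaction. The pattern of interest is a negative interaction coefficient, meaning the economy matters less as polarization rises.

```python
# Schematic fundamentals regression with an economy x polarization interaction.
# Data below are fabricated purely to show the shape of the model.
import numpy as np

rng = np.random.default_rng(2)
n = 18                                      # roughly the number of elections since 1948
growth = rng.normal(0.02, 0.02, n)          # pre-election economic growth (fabricated)
polarization = np.linspace(0.2, 0.9, n)     # rising polarization index (fabricated)
vote = 0.50 + 1.5 * growth * (1 - polarization) + rng.normal(0, 0.02, n)

# OLS design matrix: intercept, growth, polarization, growth x polarization.
X = np.column_stack([np.ones(n), growth, polarization, growth * polarization])
beta, *_ = np.linalg.lstsq(X, vote, rcond=None)
print("coefficients [intercept, growth, polarization, interaction]:", beta.round(2))
# With so few observations the estimates are noisy -- the "underpowered"
# concern mentioned above -- but the negative interaction term is the idea.
```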

The second judgment we made, which might be even a bit more controversial, was to not include polls in our forecast that were published by Donald Trump's super PAC, or published by outlets that seemed to be ideologically motivated and let those ideological motivations seep into their questionnaire design, or that looked like they were weighting their data to partisan benchmarks that were too high or too low, fitting their ideological predisposition. We didn't include polls from two pollsters because they conducted surveys for the Trump super PAC, and then two other pollsters for the other reasons I've listed. It turns out that including those pollsters would have improved the estimates of the model, because they were more biased toward Donald Trump, and we got an error that favored Donald Trump; that would have pushed our average error down toward zero. So, if all we're trying to do is optimize our predictions, that's something we should have done. I made the judgment that it was better not to let those data pollute the data stream, the Bayesian process of the model, because they didn't fit the usual motivations for public polling, which are typically just to survey the population and release the results, not, in the super PAC context, to spur donations for Donald Trump by showing a closer race, or perhaps to satisfy the client by showing him doing better than the other polling did. So, now we have to ask this question: did they all know some secret that the rest of the polling industry didn't, or did they get lucky? We're going to be reasoning about that and making judgments about that for the next four years to try to figure it out.

XLM: But I guess there is also another judgment there, right? Because I can completely understand why you don't want polls that are ideologically driven. But by making that statement, you're also assuming that the ones you included have no ideologically driven component. How sure are you of that?

GEM: Well, empirically, just looking at the house effects for the pollsters that we didn't include, they were all much larger than the spread of house effects for the other nonpartisan pollsters. So I'm relatively confident that they were at least ideologically driven in the data they were releasing, or that, even if their weighting decisions weren't ideological, there was something contaminating the stream of pre- or post-processing, or even data collection. But maybe that was the wrong call to make. If it would have improved our predictions, and that's all we care about, then it was the wrong call; that's what some of the competing forecasters think. I think it's a worthwhile conversation to have: whether we're trying to aggregate public opinion in its truest form, in a form where we can be really confident about the data we're including, or whether we want the model to do as much post-processing as possible with contaminated data, to make predictions that actually would have been better in both 2016 and 2020 if we had just included data that pushed the averages back toward 50/50.
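
For readers unfamiliar with the term, a 'house effect' is a pollster's systematic lean relative to other polls taken at about the same time. The sketch below computes a crude version of that diagnostic from invented poll results; it is not the model's actual house-effect adjustment.

```python
# Crude house-effect diagnostic on invented poll results (firm names are fictional).
import numpy as np
from collections import defaultdict

# (pollster, field day, Biden two-party share) -- fabricated
polls = [
    ("Firm A", 1, 0.545), ("Firm A", 5, 0.550), ("Firm A", 9, 0.548),
    ("Firm B", 1, 0.540), ("Firm B", 6, 0.545), ("Firm B", 9, 0.543),
    ("Firm C", 2, 0.505), ("Firm C", 7, 0.500), ("Firm C", 9, 0.508),
]

# Daily cross-pollster average, a crude stand-in for a model-based trend line.
by_day = defaultdict(list)
for firm, day, share in polls:
    by_day[day].append(share)
day_avg = {d: float(np.mean(v)) for d, v in by_day.items()}

# House effect = a firm's mean deviation from the daily average.
deviations = defaultdict(list)
for firm, day, share in polls:
    deviations[firm].append(share - day_avg[day])
for firm, devs in sorted(deviations.items()):
    print(f"{firm}: house effect {100 * np.mean(devs):+.1f} pts")
```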

XLM: Right. I guess it depends on how you think about it. One thing I've learned, both in my profession and in life, is that what actually happens is usually somewhere in between the extremes you would predict. You can either think of those polls as a contaminated version, or you can think of them as a reflection of a certain segment of the population that is more extreme one way or the other. If you believe the wisdom-of-crowds argument, you have people guessing in very extreme ways. I've done that many times over dinner, having people guess the bill, and you always see people making extreme guesses just for fun, to break it. But in fact, because of these different extremes, when you look at the whole, the wisdom of the crowd actually looks pretty good. It's not a statistical statement; it's an anecdotal one. But I do think that maybe there is something there: understanding that those extremes are part of the reflection of the uncertainty, so that somehow integrating them might help.

GEM: Yeah. If our model is trying to address not only the uncertainty in weighting decisions, but also selection bias, I guess, for who becomes a pollster or who releases public polls, then there might be added information there that we had missed. I should also clarify, when we're talking about improvements, we're talking about really, really marginal improvements, probabilistically, from, like, 97 for Joe Biden to, like, 95 and root mean squared error drops by like a percent. The model is clearly doing a good job controlling for those biases. It's just a question of whether or not we want to be as confident as possible in the data we're using or whether or not we just want the model to do the work and have slightly better predictions. That's something you have to think about. And your thoughts are also really clarifying there, too. So hopefully we can answer the question before next time.

LV: Elliott, you're talking about how you're going to change things going forward. We spoke with Allan Lichtman, who has a very qualitative [method] and makes judgments in order to determine who he thinks is going to win. It seems like your model obviously involves judgments, but it's very data-driven. Is there any future you see for marrying these two approaches together, or do you think there is no way to do that?

GEM: Well, Allan clearly has a good prior, a good starting position for the election. I guess I don't know how to incorporate that into our model probabilistically, or maybe we could take his past predictions and regress them on the outcome. But if we're trying to think of incorporating his beliefs as the prior for the model, there's probably something there. You know, I've been pretty transparent about the sort of subjective thinking that goes into the model so far. I'm certainly not wedded to the idea that the only way we learn about politics is through a strict quantitative study of data that we can incorporate into a model with a very slim training set. There's got to be a halfway point where we can incorporate other judgments and other information into the prior. After all, I guess that's why we did a Bayesian version of it.

XLM: I do have a question, because all these predictions were posted on your site, The Economist site, and the HDSR article really explained the statistics behind them; again, I really appreciate that we had the opportunity to publish that article. I want to ask what reactions you got on the site you're running. Did people hate it? Love it? Was it all over the place, very divided? Give us a sense of how the public has reacted to these predictions.

GEM: Well, they're divided, but I think on average they're more positive than negative. Like I said, right after the election results, the very quickly forming conventional wisdom was that pollsters had gotten it wrong and forecasts had missed their mark in a sort of massive way. Again, for the reasons we already discussed, I don't think that's true. But we got a lot of emails in that first reactionary phase after the election, when people were expecting a clear Biden victory early in the night, didn't get it, and I think needed to lash out at someone. We got some angry emails. We got some emails like, 'How could you be so wrong?' That sort of stuff. But then we've done a bit of reckoning. We've posted two articles on The Economist site about what pollsters may have gotten wrong and about how our model could have better reflected uncertainty. Our [distributional] tails should have been slightly fatter, I think. And, importantly, we could have done a much better job of emphasizing the range of outcomes, which just giving people 97% odds often obscures. And people have been empathetic, or really welcoming, of that sort of iteration on how we were thinking about things earlier on. We did get one email where someone said, 'I was looking at the betting markets two weeks ago and they said 55 for Trump and 60 for Biden. I didn't think that was right, so I used your model to bet on the election. I bet a whole month's salary, and I made a whole month's salary back.' So, it's worked out for some people. But academically, there are things to consider when we get a week's worth of email basically saying that we were wrong and didn't help people contextualize the election, even though that's what we were trying to do.

XLM: Would you consider some kind of adjustment in your communication?

GEM: Yes, an adjustment in our communication. But not because the model might say 95; rather, because it's clear that publishing a single number obscures what we're really trying to convey. The 97% doesn't say Joe Biden is going to win by a landslide. It means that there is a whole host of outcomes, and most of them land above 270 electoral votes, but a lot of them could be 306, right where we're ending up now. I think the 306 outcome was the sixth-likeliest outcome for our model. But that information gets lost. I think the solution is not to abandon probabilistic thinking altogether, but to really highlight the range of outcomes. One practical issue is that our web page showed people a probability and a frequency statement (1 in 20) first, and then showed them a histogram and a risk display highlighting certain squares in a grid to reflect probability. Next time around, we want to do that in the reverse order: show them the distribution first, walk them through exactly what we're showing, and then, if we're going to give them a probability at all, give them the probability after that. It might be best to avoid the probabilistic statement altogether and just give them the frequency statement, which might better highlight the potential for low-probability events.

LV: Just for the listeners, could you explain what you mean by the difference between probability and frequency outcome?

GEM: Sure. Our model gave Joe Biden a 97% chance of winning, which people probably just round up to 100% in their heads, losing the context of the distribution. If we give them a 19 in 20 chance, or a 29 in 30 chance, which is closer to 97%, that might emphasize that the one rare outcome is still possible in the distribution. Or, at the very least, it prevents the 97% from just being shared widely and being the only thing the media reports on. I think hammering home the range of expectations for the election is the right way to go. That's what we care about anyway, right?
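
As a small illustration (not tied to the authors' site), here is one way to turn a win probability into the kind of 'k in N' frequency statement discussed here.

```python
# Convert a probability into a simple "k in N" frequency statement.
from fractions import Fraction

def frequency_statement(p: float, max_denominator: int = 30) -> str:
    frac = Fraction(p).limit_denominator(max_denominator)
    return f"{frac.numerator} in {frac.denominator}"

for p in (0.97, 0.95):
    print(f"{p:.0%} is roughly a {frequency_statement(p)} chance "
          f"(a {frequency_statement(1 - p)} chance of the upset)")
```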

XLM: Right. I think the other point you allude to is that when people see 97, they translate that into some kind of landslide win. They don't necessarily understand that 97 just means winning, and winning could be by a tiny margin; that's still winning.

GEM: Yeah, I definitely agree with that. Like I said, the reaction we've been getting to the model is, 'Hey, you predicted the single event of a Biden landslide and that's not what happened, so therefore your model is wrong.' And that's disheartening for us. Next time around, we're just going to hammer home the different sources of uncertainty and try not to obscure the picture from the model with an enticing probabilistic statement.

XLM: Well, speaking of next time, next time actually comes pretty soon, right? Because, you know, 2016 feels like it was just yesterday, and it's already 2020. So I'm sure you will start working on this again very soon. Thank you very much; thank you, first, for contributing the article to HDSR, and second, for having this conversation and sharing so many thoughts with us. I hope our readers will get something useful out of this. At a minimum, I think we should all do a better job, both those of us communicating and those of us on the receiving end, of really understanding the errors and uncertainties and how those things play out in real life. Liberty, do you have any closing thoughts?

LV: No, just thank you. And if it makes you feel any better, I'm one of those people that wrote that the pollsters got it so wrong and are terrible, but I get as much hate mail on the other side telling me I'm an idiot. It goes both ways, so don't worry. You have people defending you. Yeah, it's rough out there.

GEM: Well, good. Thank you. To the extent we can provide an initial postmortem of the model, it's always good to do it in an academic setting.

XLM: Oh, thank you very much. Thank you again.


Disclosure Statement

Andrew Gelman, G. Elliott Morris, Liberty Vittert, and Xiao-Li Meng have no financial or non-financial disclosures to share for this interview.


©2020 Andrew Gelman, G. Elliott Morris, Liberty Vittert, and Xiao-Li Meng. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
