PRRI

Most survey consumers know that carefully crafted and executed surveys can provide reasonable estimates of a population, but that these estimates are just that — estimates — which could be off for a variety of reasons. Survey researchers quantify the amount the survey could be off due to sampling in the “margin of error,” but many other potential sources of “error” exist that require expert attention and planning in order to get the best estimates possible.

At PRRI, we obsess about data quality and obtaining the best, most representative samples we can. We therefore carefully reviewed a new Pew Research Center report that investigates one particularly insidious source of error for online surveys: respondents who do not take the survey seriously or answer the questions carefully, whom Pew calls “bogus” respondents.

Respondent quality is a particular concern for online surveys since there is no interviewer (as there is on many phone surveys) controlling the process of asking questions and recording answers. Additionally, online surveys typically provide some form of compensation (points to go toward rewards, or small amounts of cash), which can motivate respondents to take as many surveys as possible, with varying degrees of concern for accuracy or attention to what the survey is asking, to reap the rewards. “Professional respondents” who take a lot of surveys can differ substantially from the general population a survey wants to query, in both their attitudes and their survey-taking behavior. ^[1] Understanding how this transactional relationship between surveyors and respondents, as well as the lack of an interviewer, affects data quality has become crucial as the survey field relies more heavily on online surveys.

The Pew report looks at three different implementations of online surveys: crowd-sourced platforms, such as Amazon’s Mechanical Turk, where respondents complete “tasks” for money (Pew does not state which crowd-sourced platform it used, and all survey vendors’ identities are masked); opt-in panels, where respondents have signed up on a website or app to receive surveys to complete for some form of reward; and address-based sampling (ABS) online panels, which recruit randomly selected households offline (using addresses) and ask people living at those addresses to sign up for a panel to complete surveys for some form of reward.

The basic findings are both hopeful and frightening. There are relatively few “bogus” respondents mucking up the data in any of these sources. However, there are substantially more in the crowd-sourced sample (7%) and the opt-in panels (5% on average) than in the ABS panels (1% on average). At the same time, bogus respondents can change some of a survey’s results: When they were removed from questions about favorability, positive responses overall decreased by around 2-4% in the crowd-sourced and opt-in panels. Results from the ABS panels remained unchanged because those panels contained so few bogus respondents.
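To make the arithmetic behind that shift concrete, here is a minimal Python sketch. It is not Pew's method or data: the function name `favorable_share`, the `flagged` field, and every count are hypothetical, and the sketch assumes flagged respondents all give favorable answers. The 7% flagged share simply echoes the crowd-sourced rate quoted above.

```python
# Hypothetical illustration only: the counts below are invented, not Pew's data.

def favorable_share(responses, exclude_flagged=False):
    """Return the share of favorable answers, optionally dropping flagged respondents."""
    kept = [r for r in responses if not (exclude_flagged and r["flagged"])]
    if not kept:
        raise ValueError("no respondents left after filtering")
    return sum(r["favorable"] for r in kept) / len(kept)

# 93 respondents who pass quality checks: 45 favorable, 48 unfavorable (assumed).
genuine = [{"favorable": i < 45, "flagged": False} for i in range(93)]
# 7 flagged respondents who all answer favorably (assumed).
bogus = [{"favorable": True, "flagged": True} for _ in range(7)]
sample = genuine + bogus

with_bogus = favorable_share(sample)
without_bogus = favorable_share(sample, exclude_flagged=True)
print(f"All respondents:  {with_bogus:.1%} favorable")
print(f"Flagged removed:  {without_bogus:.1%} favorable")
print(f"Difference:       {(with_bogus - without_bogus) * 100:.1f} points")
```

With these made-up counts, the favorable share falls from 52.0% to about 48.4% once the flagged cases are dropped. A small group of inattentive respondents who lean toward positive answers can move a topline estimate by several points.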

The full report has many, many more details and tests, but the gist remains the same: The ABS panels are least susceptible to bogus respondents, and the crowd-sourced platform is most impacted by these inauthentic responses. As Pew is careful to note, however, this does not mean that opt-in polls are wrong. It means that those using opt-in survey data need to make efforts to identify and eliminate biasing responses. Pew notes that many vendors do take these steps, but there is not much transparency around what is done and how.

Pew’s findings support the general concept that when it comes to surveys, you get the quality you pay for. But it’s important to remember that quality is a continuum, not a good/bad dichotomy. ABS online polls are at the upper end of the continuum and are relatively expensive due to the cost of the offline sampling efforts to ensure representativeness and the costs of maintaining high engagement and response rates among the panelists once they are recruited.

Opt-in surveys are considerably cheaper, but not all opt-in surveys are of the same quality. Some opt-in surveys are done quite well, using only samples from a panel the vendor itself manages, and are carefully controlled to minimize respondent and data quality issues while maximizing representativeness. These vendors and their data are closer to the middle of the survey quality continuum and cost more than less well-regulated opt-in surveys.

Many opt-in surveys simply run a race to get the most respondents possible, with varying levels of control on who enters their samples, what samples are used to complete survey projects, and how to achieve something resembling population representation. These vendors and their data are at the lower end of the continuum. Yet their services are compelling because the price point is relatively low, and many people who want or need public opinion data lack the resources to pay for higher-quality data. There are also cases where there is no need to go higher on the continuum, such as experimental work or when representation of a population is not a concern.

PRRI strives to accurately represent the populations we survey, and thus places a premium on representation and quality. We already use ABS panels whenever we conduct surveys online, barring extenuating circumstances, and will continue to do so. Address-based sampling techniques for recruiting respondents are widely regarded as one of the best ways to get representative data, and as Pew’s report illustrates, these panels generally offer a quality upgrade over opt-in options. When data higher up the quality continuum are available, there is no question that cautious consumers would want to rely on the higher-quality results.

^[1] See, for example, Hillygus, D. Sunshine, Natalie Jackson, and McKenzie Young. 2014. “Professional Respondents in Online Nonprobability Panels,” in Online Panel Research: A Data Quality Perspective, edited by Callegaro, Baker, Bethlehem, Göritz, Krosnick and Lavrakas. https://onlinelibrary.wiley.com/doi/10.1002/9781118763520.ch10