Data Quality: Measuring the Quality of Online Data Sources

A SurveyMonkey Audience White Paper
October 2012





Overview

In today's technology-driven society and marketplace, research has become accessible to a wide variety of participants, from research experts and specialists to marketing, product, and business teams, all of whom need to gather feedback and expect quality results. Online surveys are a popular method of collecting data for research experts and research novices alike because of their speed, efficiency, and reach. Compared with other data collection methods (e.g., phone, in-person interviews, focus groups), online research is highly advantageous in terms of ease, speed, efficiency, and cost. But to ensure that quality is preserved in the online medium, researchers using online data sources must understand the key topics that impact quality. Among the many ways to evaluate the quality of a data source, the following factors have been shown to have a major impact on the quality and reliability of data produced from online samples:

• Recruitment Channels
• Attribute Accuracy
• Scale & Diversity
• Incentives

In this paper, we will analyze how these factors can impact data quality and how SurveyMonkey Audience and its respondent groups, SurveyMonkey Contribute and SurveyMonkey ZoomPanel, address them. We will also answer two key questions about the SurveyMonkey Audience respondent groups:

• Are SurveyMonkey Contribute and SurveyMonkey ZoomPanel members representative of the United States population?

• How closely do the responses from the Contribute and ZoomPanel members match reliable benchmarks?


Recruitment Channels

Key Quality Questions
• How are online respondents recruited?
• Which sources were used to find the people taking your survey?

Overview & Alternatives

Data providers use a wide range of methods to recruit respondents. The Internet can reach a vast portion of the population, making online sources an easily accessible channel for recruitment. Focusing on a single recruitment channel (e.g., people recruited from a niche website) may yield a large number of respondents, but it risks acquiring individuals with an inherent bias and therefore failing to represent the general (or target) population. To prevent bias, most data sources recruit new survey takers through multiple channels and a variety of techniques, or use large sources that reach a more diverse cross-section of the overall population. By incorporating multiple channels to reach a broader array of respondents, data sources can more effectively reduce the biases that may arise from relying on a single channel.

Some of the many channels for recruitment include:
• Online advertisements (banner or search ads)

• Co-registration (offers to sign up for additional services or offers, presented when registering for another service)

• In-game advertising (recruitment ads displayed during a game playing experience)

• Website recruitment (offers to sign up to take surveys displayed on a website)

• TV advertisements

• Billboards and offline advertising to join an online service

SurveyMonkey Audience Approach

SurveyMonkey Audience, one of the world's largest providers of access to online survey respondents, uses a variety of methods to recruit new survey takers and maintain a high-quality flow of new respondents to its member registration sites. The key methods used by SurveyMonkey Audience to recruit survey respondents include:

Website recruitment

• SurveyMonkey is the world’s leading provider of survey solutions. A byproduct of this scale is the significant volume of traffic to its web properties and customer surveys. Over 30 million unique people visit SurveyMonkey websites, or take SurveyMonkey customer surveys each month. Not all of these visitors are survey creators or SurveyMonkey subscribers, and many are completing surveys sent to them by colleagues, friends, or service providers. SurveyMonkey Audience recruits from this large group of people, providing them with the opportunity to regularly complete surveys.


• Of the millions of people who take surveys on the SurveyMonkey platform every day, thousands register to become survey respondents.

• Given the scale and reach of SurveyMonkey and its audience (over 30 million unique people per month), website recruitment is the primary method of acquiring new respondents for the SurveyMonkey Contribute business.

Online advertisements & co-registration partners

• SurveyMonkey Audience uses online advertising and co-registration partnerships to recruit members in the US, Canada, UK, France, and Australia for its ZoomPanel member recruitment sites. Rather than relying on a single channel or partner, SurveyMonkey Audience works with more than a dozen partners, attracting a diverse set of new ZoomPanel members from sites that appeal to different demographics.

Attribute Accuracy

Key Quality Questions
• How was the information about each respondent collected?
• How is attribute information validated?

Overview & Alternatives

To navigate today's data-rich Internet environment effectively, data sources must use a variety of methods to learn about potential survey respondents. Since many research projects have specific targeting criteria (seeking respondents with particular demographic, behavioral, or attitudinal characteristics), the abundance of available data must be filtered efficiently, both to determine the feasibility of projects and to confirm how insightful the resulting data will be. Data providers use a wide assortment of techniques to target respondents and to let survey creators cross-analyze results across different demographic, behavioral, and attitudinal traits. Some of the methods used to understand these attributes include:

Profiling: Asking respondents questions about themselves that can be used for targeting or in survey data analysis

Screening: Using screener questions at various points in a survey to filter people in or out based on certain criteria

Data appending: Leveraging an outside data validation source or social networking data, based on certain respondent identifiers (e.g., email address, physical address, and/or name)


Inference: Inferring information about a respondent based on cookie information from websites, or making assumptions about an attribute based on other attributes or responses to questions

Attribute information on respondents is only useful if it is accurate, so accuracy is both a key challenge and an opportunity for data providers to distinguish themselves. Since services often provide incentives and rewards (discussed later in this paper) for participation in surveys, several pitfalls in data accuracy can arise if attribute information is not validated. Key pitfalls include:

Inaccurate attribute information: Respondents providing inaccurate attribute or response information in order to qualify for more surveys and earn more rewards

Duplicate respondents: Respondents registering for multiple accounts to receive more survey opportunities and earn more rewards (often using different, invalid attributes in each profile)

Invalid survey response data: Respondents speeding through longer surveys, without considering each response option, so they can earn rewards by simply finishing a survey

So how do data providers protect against this? Different providers use a wide variety of mechanisms, some manual and some technological, to detect and remove outliers and to validate individual members and their attributes.
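To make this concrete, below is a minimal, generic sketch of one such mechanism: flagging likely duplicate registrations by normalizing email and postal addresses before comparing them. This is an illustration only, not SurveyMonkey's actual validation pipeline; the record structure and normalization rules are assumptions.

```python
# Generic sketch of duplicate-registration detection; not any provider's
# actual system. Normalization rules here are illustrative assumptions.
import re
from collections import defaultdict

def normalize_email(email: str) -> str:
    """Lowercase, strip '+tags', and drop dots in the local part (alias-style duplicates)."""
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"

def normalize_address(address: str) -> str:
    """Collapse whitespace and punctuation so trivially re-typed addresses match."""
    return re.sub(r"[^a-z0-9]", "", address.lower())

def find_duplicates(members):
    """Group member ids that share a normalized email or postal address."""
    by_key = defaultdict(list)
    for m in members:
        by_key[("email", normalize_email(m["email"]))].append(m["id"])
        by_key[("address", normalize_address(m["address"]))].append(m["id"])
    return [ids for ids in by_key.values() if len(ids) > 1]

# Example usage with made-up records
members = [
    {"id": 1, "email": "jane.doe@example.com", "address": "12 Oak St, Springfield"},
    {"id": 2, "email": "janedoe+surveys@example.com", "address": "12 Oak Street Springfield"},
    {"id": 3, "email": "sam@example.com", "address": "9 Elm Ave, Portland"},
]
print(find_duplicates(members))  # -> [[1, 2]] (members 1 and 2 share a normalized email)
```

A real system would typically combine several such signals with third-party validation, along the lines described in the next section.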

SurveyMonkey Audience Approach

SurveyMonkey Audience uses both profiling and screening as its key methods of learning as much as possible about respondents. When new respondents sign up to join member sites, they are asked to provide information about themselves across a wide variety of attribute categories. This allows us to target members with surveys that will be relevant to them, and it allows customers to narrow their target audience down to only those members who are relevant for their studies. Since SurveyMonkey Audience maintains groups of respondents who have registered to take surveys, additional profile information can be collected and updated frequently. For attributes that have not been previously profiled, customers can also use SurveyMonkey's survey tools to add questions with skip logic to screen respondents in or out of surveys.
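As a purely hypothetical illustration of targeting on stored profile attributes plus a screener rule (this does not use SurveyMonkey's tools or APIs), a qualification decision might be expressed along these lines:

```python
# Hypothetical illustration of targeting on profiled attributes; the attribute
# names and the study criteria are assumptions made up for this example.
from dataclasses import dataclass

@dataclass
class Profile:
    member_id: int
    age: int
    country: str
    owns_smartphone: bool  # example of a previously profiled attribute

def qualifies(profile: Profile) -> bool:
    """Example study target: US adults aged 25-54 who own a smartphone."""
    return (
        profile.country == "US"
        and 25 <= profile.age <= 54
        and profile.owns_smartphone
    )

panel = [
    Profile(1, 30, "US", True),
    Profile(2, 61, "US", True),
    Profile(3, 40, "CA", False),
]
eligible = [p.member_id for p in panel if qualifies(p)]
print(eligible)  # -> [1]; members 2 and 3 would be screened out (or skipped via logic)
```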


SurveyMonkey Audience has taken a very proactive stance on validation and quality to ensure its customers get highly reliable data. To combat inaccurate profile information and to ensure that respondents are unique, real people, we use TrueSample validation for every new respondent who registers for our ZoomPanel member site. New respondents are required to provide a physical address and an email address, which are validated using TrueSample. Only respondents with valid addresses are permitted to register, and any duplicates are removed from our member group.

For our SurveyMonkey Contribute respondents, we have attempted to eliminate the incentive to provide false information by removing the direct reward at the completion of a survey project. When SurveyMonkey Contribute members complete a survey, they are given the opportunity to have a small donation made by SurveyMonkey on their behalf to a charity of their choice and to play an instant-win sweepstakes game.

Scale & Diversity

Key Questions
• How big is the group you are sampling from?
• How diverse and representative are your survey respondents?

Overview & Alternatives

The size of a potential data or sample source is an important factor in its ability to address customers' needs and to ensure that a diverse group of people can be reached. With niche data sources, a smaller group can satisfy specific customer demands. However, when a source targets a larger consumer group, a significant subset of the overall population is required to provide enough reach for effective sampling. Recruiting from a large traffic source, or leveraging recruitment sources that reach massive audiences (such as social networks or highly trafficked websites), is very helpful in creating a large, diverse group of potential respondents.

SurveyMonkey Audience Approach

Through the recruitment sources described above, SurveyMonkey Audience has access to a very large portion of overall Internet traffic via its website recruitment and partner channels. By leveraging these two sources, a website recruitment channel that reaches more than 30 million unique people each month and multiple partner co-registration channels that span diverse interest groups and offer types, our member sites, SurveyMonkey Contribute and SurveyMonkey ZoomPanel, have very broad reach and allow us to recruit a large (more than 3 million potential respondents) and diverse group of people.

Incentives

Key Questions
• How are survey respondents rewarded for responding to your surveys?


Overview & Alternatives

Most people enjoy providing their opinions and being heard, particularly when they can help shape the future development of products or services that are relevant or interesting to them. Surveys have become a very popular instrument for collecting this type of feedback and organizing it in a way that allows for quantitative analysis. However, time is valuable, especially in a world where people are constantly on the go, so survey respondents often expect some sort of reward or benefit in exchange for their time and opinions. Rewarding people for taking the time to provide honest feedback has become a useful tool for increasing the likelihood that they will respond to and finish surveys. Rewards are also a way for data providers to form a relationship with their respondents, encourage them to continue taking surveys, and build scalable solutions to address customer needs.

While rewards and incentives are a helpful mechanism for encouraging survey participation, they also have the potential to introduce quality concerns when not monitored and administered carefully. Indeed, the psychology literature suggests that motivation to complete tasks carefully diminishes when people are paid. Data providers have come up with a wide variety of reward types and programs to help increase survey participation and maintain active member groups. Some of these include:

Cash rewards: Direct cash payments in exchange for survey participation

Point or credit programs: Programs that let members accrue points or credits over time through participation and redeem them later for rewards that "cost" a certain number of points

Currency programs: Similar to point or credit programs, but using a currency from a partner ecosystem (for example, awarding Facebook credits for participating in in-game surveys, which can then be spent on gaming or other items on the partner platform)

Charitable donations: Donating to charity on behalf of participants in exchange for survey participation

Information or content access: Allowing participants to view premium content in exchange for survey participation

Sweepstakes entries: Entering participants into a sweepstakes to win a larger prize in exchange for survey participation

Gift cards: Often part of point or credit programs, but using a non-cash gift card as the reward for survey participation


Frequent flyer miles or rewards program points: Similar to currency programs, awarding miles or points in other popular rewards programs in exchange for survey participation

Rewards programs are usually crafted to appeal to the specific type of respondents taking surveys. These programs strive to increase the participation rate of people invited or exposed to a survey by providing an appropriate reward. Different types of people respond differently to different reward types and sizes, which can influence the accuracy and reliability of the data respondents provide. One drawback of an incentive program is that not all rewards will be equally appealing to the many types of people who may respond to a survey. Rewards programs therefore aspire to provide rewards that encourage participation while avoiding data quality or reliability issues, and data providers must monitor the impact of their rewards programs to ensure that quality pitfalls are avoided. Common quality issues that may arise from reward and incentive programs include:

Speeding: Rushing through a survey just to finish and qualify for the reward

Satisficing: Related to speeding; respondents do not thoroughly consider all answer options and pick whichever option lets them move quickly through the survey

Straight-lining: Selecting the same answer position (for example, the first option) for every question as a way to move quickly through a survey without providing honest opinions

Respondent bias: When reward options appeal only to a certain demographic or type of person, the resulting data set may be biased toward people interested in the incentive provided

Response manipulation: When variable reward amounts are used (typically when screener questions pay a smaller reward to respondents who are screened out and a larger one to those who are screened in), respondents may intentionally provide invalid answers in order to land in the response category they believe will earn them the larger reward

While many of the pitfalls listed above are difficult to avoid entirely, data providers have created tools and systems that remove the incentives for behavior that produces invalid response data. Data providers have also increasingly begun to use tools to remove outliers from data sets and to remove respondents who provide inaccurate data from their member groups.
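As an illustration of what such tools can look like, the sketch below flags speeders and straight-liners in a table of survey responses. The column names and thresholds are assumptions chosen for the example, not any provider's actual rules.

```python
# Minimal sketch of flagging speeders and straight-liners in survey responses.
# Column names and thresholds are assumptions for illustration only.
import pandas as pd

def flag_low_quality(responses: pd.DataFrame,
                     question_cols: list,
                     min_seconds: float = 60.0) -> pd.DataFrame:
    """Add boolean 'speeder' and 'straight_liner' columns to a response table.

    responses: one row per respondent, with 'duration_seconds' plus one column
    per rating question (e.g., answers on a 1-5 scale).
    """
    out = responses.copy()
    # Speeding: completed far faster than a plausible minimum, or much faster
    # than the median completion time for this survey.
    median_time = out["duration_seconds"].median()
    out["speeder"] = out["duration_seconds"] < max(min_seconds, 0.3 * median_time)
    # Straight-lining: an identical answer chosen for every rating question.
    out["straight_liner"] = out[question_cols].nunique(axis=1) == 1
    return out

# Example usage with made-up data
df = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "duration_seconds": [340, 45, 410],
    "q1": [4, 3, 3], "q2": [2, 3, 4], "q3": [5, 3, 2],
})
flagged = flag_low_quality(df, ["q1", "q2", "q3"])
print(flagged[["respondent_id", "speeder", "straight_liner"]])  # respondent 2 is flagged on both
```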


SurveyMonkey Audience Approach

SurveyMonkey Audience created its reward programs to encourage honest participation and to avoid the common pitfalls associated with many reward types. SurveyMonkey Contribute respondents are rewarded with charitable donations made on their behalf by SurveyMonkey; upon completing a survey, respondents can also play a Flash game that enters them into an instant-win sweepstakes. The combination of charitable donations and sweepstakes entries means there is no direct material incentive waiting for respondents at the end of any survey opportunity, eliminating the instant gratification respondents might otherwise expect upon finishing surveys. SurveyMonkey ZoomPanel respondents are rewarded with points redeemable for merchandise, gift cards, and sweepstakes entries. While respondents receive points upon completing any survey, the amount is nominal.

Measuring Representativeness of Data Sources

Key Quality Questions
• How closely do SurveyMonkey Audience respondents match the US population?
• How closely do SurveyMonkey Audience responses match reliable benchmarks?

Overview

Data sources each use their own methods of demonstrating quality and representativeness. Since not all data sources are the same, and they are often used for different purposes, this varied approach to proving and measuring quality is appropriate. In any discussion of data quality, the topics of consistency and accuracy will often surface: how consistent is the data produced by a given source, and how accurate is it? While consistency is important, accuracy is the foundation of data quality. Being consistently wrong is typically not a desirable attribute of any data source.

SurveyMonkey Audience Approach

The goal of our recruitment methods, policies, processes, and incentive structures is to create a group of respondents who can produce accurate data. SurveyMonkey Audience was designed as a tool that helps people find the right answer, not just any answer. So, while we constantly measure our respondent groups to see how closely they resemble the US population and the populations our customers are seeking, we also test the end results: we want to see the data those respondents produce and how closely it benchmarks against known, reliable data that may cost significantly more to purchase or field. In the following sections, we detail how SurveyMonkey Audience measures the representativeness of its member groups and of the data they produce.


How closely do the Contribute and ZoomPanel member groups match the US population?

If we expect our respondents to provide reliable data that helps customers make better decisions, we want to start with a group of people that is representative of the United States population. This does not guarantee that the responses from these members will mirror what the United States population thinks and does, but it does help ensure that our respondents are diverse across various dimensions.

While we benchmark various demographic attributes against the United States population, one of the most important factors, and one that can be represented visually using census data, is the location of our members. When members sign up to take surveys on SurveyMonkey Contribute or SurveyMonkey ZoomPanel, we capture each member's zip code during registration. The six charts below show how the SurveyMonkey Contribute and SurveyMonkey ZoomPanel member groups compare to the US population. The differences between the maps of the US Census data, SurveyMonkey Contribute members, and SurveyMonkey ZoomPanel members, at both the county level and the metro-area level, are difficult to see. That is a good thing: it indicates a representative geographic makeup for both member groups relative to the US Census.

[Figure: Population Density Maps]
Population Density by County – 2010 United States Census
Population Density by County – SurveyMonkey Contribute and ZoomPanel Members (panels: SurveyMonkey Contribute, ZoomPanel)
Population Density by Metro Area – 2010 United States Census
Population Density by Major Metro Area – SurveyMonkey Contribute and ZoomPanel Members (panels: SurveyMonkey Contribute, ZoomPanel)
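For readers who want to run a similar geographic comparison on their own respondent data, here is a minimal sketch of comparing a member group's county-level distribution against census population shares. It is not the analysis used to produce the maps above; the file names, column names, and ZIP-to-county mapping are hypothetical.

```python
# Minimal sketch: compare a panel's county-level share of members against
# census population shares. File names and columns are hypothetical.
import pandas as pd

# Hypothetical inputs:
#   members.csv        : one row per member, with a 'zip_code' column
#   zip_to_county.csv  : mapping of 'zip_code' -> 'county_fips'
#   census_counties.csv: 'county_fips', 'population' (e.g., 2010 Census)
members = pd.read_csv("members.csv", dtype={"zip_code": str})
zip_map = pd.read_csv("zip_to_county.csv", dtype={"zip_code": str, "county_fips": str})
census = pd.read_csv("census_counties.csv", dtype={"county_fips": str})

# Share of members per county
member_counties = members.merge(zip_map, on="zip_code", how="inner")
member_share = member_counties["county_fips"].value_counts(normalize=True)

# Share of US population per county
census_share = census.set_index("county_fips")["population"]
census_share = census_share / census_share.sum()

# Align the two distributions and measure how far apart they are.
# Total variation distance: 0 = identical distributions, 1 = disjoint.
aligned = pd.concat([member_share, census_share], axis=1, keys=["panel", "census"]).fillna(0)
tv_distance = 0.5 * (aligned["panel"] - aligned["census"]).abs().sum()
print(f"Total variation distance from census: {tv_distance:.3f}")

# Counties where the panel is most over- or under-represented
aligned["diff"] = aligned["panel"] - aligned["census"]
print(aligned["diff"].sort_values().head(5))   # most under-represented
print(aligned["diff"].sort_values().tail(5))   # most over-represented
```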


How closely do the responses from the Contribute and ZoomPanel members match reliable benchmarks?

In addition to making sure that our respondents are representative of the United States population, we want to make sure that their responses are indicative of the general population. This helps ensure that decisions are not being based on survey data from a group that merely looks like the United States population but does not answer questions the way the United States population would.

SurveyMonkey Audience Quality Benchmarking

SurveyMonkey Audience has systemic controls in place to mitigate many of the quality issues that may arise, but we also examine the data provided by our respondents to ensure that it is accurate and comparable to other reputable sources. A key benchmarking exercise we perform periodically compares the responses from our member groups to those of data sources that have established themselves as proven, reliable sources with solid methodological underpinnings. One such source is Gallup, a leading full-service research business. The testing concept is simple: Gallup conducts phone interviews with more than 1,500 people every day on a variety of topics and publishes the results daily for certain measures and weekly for others. We take one topic Gallup asks thousands of Americans about and ask SurveyMonkey Audience respondents the exact same question, using the exact same language and answer options.


Compared alongside Gallup's metrics, we found that the data and responses from our member groups are closely comparable to the data from Gallup's phone survey. The charts below show how SurveyMonkey Audience respondents (from the SurveyMonkey Contribute member group) compare to Gallup's phone survey respondents. In a study conducted over a seven-day period, our results were consistently within a 5% margin of error of Gallup's.

Methodology & Results Explanation

Methodology

Over a seven-day period from 7/19/12 to 7/25/12, we surveyed SurveyMonkey Audience respondents and asked them one question: "We'd like you to think about your spending yesterday, not counting the purchase of a home, motor vehicle, or your normal household bills. How much money did you spend or charge yesterday on all other types of purchases you may have made, such as at a store, restaurant, gas station, online, or elsewhere?" Respondents were asked to enter the dollar amount (a whole number) that they spent the prior day. The wording was identical to the question Gallup asks in its daily questionnaire of over 1,500 phone-based respondents. We used our SurveyMonkey Contribute member group for this analysis. Surveys were launched every morning at 9am PT and were left open for a period of three days.

We analyzed our data to see how it stood up to Gallup, which publishes two data points every day: a 14-day trailing average and a 3-day trailing average of its daily results. These trailing averages smooth the spending trends, since daily data can be more volatile due to the day of the week, macro events, or other factors. We included two data points for SurveyMonkey Audience responses to compare against Gallup's 14-day and 3-day averages. First, we included our own raw trailing 3-day average of responses. Second, we applied a common manipulation, an exponential transformation, to account for the fact that the average household income of SurveyMonkey Audience respondents is higher, on average, than that of the US population per the US Census. The data points that use this manipulation are labeled "3-day Adj. Avg."
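To illustrate the comparison described above, the sketch below computes 3-day and 14-day trailing averages from daily mean spending values and checks whether one falls within a 5% relative margin of the other. The example numbers are made up, and the income adjustment ("exponential transformation") used in the actual analysis is not reproduced here.

```python
# Illustrative sketch of the trailing-average comparison; the example data are
# invented and the income adjustment used in the paper is not reproduced here.
import pandas as pd

def trailing_average(daily_means: pd.Series, window: int) -> pd.Series:
    """Trailing (rolling) average of daily mean reported spending."""
    return daily_means.rolling(window=window, min_periods=window).mean()

def within_margin(value: float, benchmark: float, margin: float = 0.05) -> bool:
    """True if |value - benchmark| / benchmark <= margin."""
    return abs(value - benchmark) / benchmark <= margin

# Hypothetical daily mean spending (USD) for each source, indexed by date.
dates = pd.date_range("2012-07-12", periods=14, freq="D")
audience_daily = pd.Series([72, 69, 75, 71, 68, 74, 70, 73, 69, 76, 71, 70, 72, 74], index=dates)
gallup_daily = pd.Series([70, 68, 73, 69, 67, 72, 71, 70, 68, 74, 69, 71, 70, 72], index=dates)

audience_3day = trailing_average(audience_daily, 3)
gallup_14day = trailing_average(gallup_daily, 14)

# Compare the latest Audience 3-day average against Gallup's 14-day average.
latest_aud = audience_3day.iloc[-1]
latest_gal = gallup_14day.iloc[-1]
print(f"Audience 3-day avg: {latest_aud:.2f}, Gallup 14-day avg: {latest_gal:.2f}")
print("Within 5% margin:", within_margin(latest_aud, latest_gal))
```

The 5% check mirrors the relative margin referenced in the results below: a value passes if its absolute difference from the benchmark is no more than 5% of the benchmark.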

Results Explanation

The Audience 3-day Adj. Avg. was within a 5% error margin of Gallup's 14-day trailing average for five consecutive days. What does this mean? When a manipulation is applied to correct for the higher average income of Audience respondents relative to US Census data, Audience produces effectively the same results as Gallup, using only 3 days of data instead of Gallup's 14.


The 3-day trends for the Audience data also follow the same trend as the Gallup data, using both the Audience 3-day Raw Avg. and the Audience 3-day Adj. Avg. Comparing Gallup's own 14-day Avg. and 3-day Avg. shows more volatility in the 3-day Avg., which leads us to believe that the Audience sample can be a healthy predictor of trends even with the smaller number of days needed to produce a stable benchmark. Since the Audience 3-day Adj. Avg. is within a 5% error margin of the Gallup 14-day Avg. for the five consecutive days the test was run (and for which 3-day Avg. metrics could be gathered), we believe this indicates that SurveyMonkey Audience is able to produce results comparable to those of Gallup. The Audience 3-day Raw Avg. is consistently higher than the Gallup 14-day Avg., which our manipulation suggests is highly correlated with the higher average income of members of the Audience respondent group.

Conclusion

While data providers and methods of gathering data have grown rapidly over the past decade, it is critical for both experienced researchers and those new to research to understand and evaluate the quality of the data produced by the providers they work with and purchase data from. While a variety of key factors can impact the quality of any given data source, we believe that most of these factors can be understood and evaluated by experts and novice researchers alike. SurveyMonkey Audience uses various methods to recruit, maintain, and incentivize its member groups of respondents, and we believe that our practices encourage responsible data collection that benefits our customers. We encourage any customer who purchases data in the form of survey respondents to hold their suppliers to a high standard and to make sure they are aware of the questions and topics addressed in this paper.