Everyone Cares About Sample Quality, But Not Everyone Values It!
A review of responsibilities and techniques you can implement to protect your online research and beyond
REAL PEOPLE, QUALITY DATA™
DATA QUALITY SOFTWARE
Lisa Wilding-Brown, Chief Research Officer
Mark Menig, Chief Executive Officer
Agenda
Quality through the years (brief overview of where we've been and where we are going)
Current landscape, e.g., bots, hijackers, foreign click shops in China, etc.
Challenges & costs associated with today's online fraud and how it impacts data quality
Implementing an effective solution (multi-layered approach)
Technical approaches: digital fingerprinting (when and where); respondent validation; algorithmic solutions over a member's lifetime; other 3rd-party techniques, etc.
Behavioral approaches: knowledge question design (red herrings); pre-survey screening; smart survey design (dos and don'ts)
The Path Forward: Responsibility, Accountability, & Collaboration
Care About vs. Value
When you care about something, you have at least minimal regard for it.
When you VALUE something, you consider it important and worthwhile. ... As a verb, it means "holding something in high regard" (like "I value our friendship"), but it can also mean "determine how much something is WORTH," like a prize valued at $200.
QUALITY
means doing it right when no one is looking
Timeline: 2000, 2006, 2008, 2012, 2016, 2020
The industry rapidly becomes enamored with the speed and cost savings of moving to online
Industry associations launch major initiatives to investigate and restore online research quality
Fraud continues to morph and evolve with the emergence of new threats
P&G speaks out about online data quality issues at the Client Summit, sparking industry-wide discourse
Rapid evolution and diversification of devices and engaging respondents migrates from a proximity-fixed experience to a portable experience
The only constant is change! Continual innovation is required in order to stay ahead, recognizing the battle is never over
Current Landscape
Dr. Liz Nelson, co-founder of TNS, advisor to the board of Fly Research and a fellow of the Market Research Society, talks about how the need for speed is affecting the quality of research.
Research Live, November 24, 2016
"I would say immediately that the emphasis on speed is what's happening now. Clients demand immediate results with the survey in field on Friday, and 2000 results the next day. I think the sad bit is that quality suffers."
Current Landscape
Recent advances in big data and artificial intelligence are now making it possible to teach a machine to understand and speak to humans.
It's very difficult to simply look at the data provided by some of the more sophisticated bots and identify what to remove, because it's all gray goo inside, just like a real brain, and may be indistinguishable from real data.
Need a real world example? Take out your iPhone and ask Siri a question.
Forums like the one to the left abound online with users looking for and sharing information about how to utilize tools to create/mimic bots and automate the process of filling in surveys.
Current Landscape
Here is a survey bot attempting to complete a survey with no given information. The creator ran this on 6 surveys a day for two weeks (fully automated, of course) and got the total sum of 14.95p, with no user interaction whatsoever!
That was 10 questions completed in under 17 seconds, in case you lost count!
Current Landscape
Create a fake whatever you need
Current Landscape
The Tor software protects users by bouncing their communications around a distributed network of relays run by volunteers all around the world.
The Tor Browser gives access to Tor on Windows, Mac OS X, or Linux without needing to install any software.
Survey Click Shops are popping up around the globe
They consist of many unique devices in a single location, used by a group of fraudsters to game surveys and generate incentives.
Current Landscape
Device Emulators. In computing, an emulator is hardware or software that enables one computer system (called the host) to behave like another computer system (called the guest).
This threat will only get worse as computers and global computer networks continue to advance and emulator developers grow more skilled in their work.
Datacenters, VPNs, anonymous proxies, etc. are favorite tools for fraudsters because they let a device appear to be coming from a different country, case by case, to match the requirements of a given survey.
Challenges & Costs
Rating | Timeliness of fielding | Purchase process | Ease of accessing panel | Customer service | Quantity of respondents | Cost of panel | Quality of respondents
Not at all satisfied | 2% | 2% | 2% | 3% | 5% | 5% | 7%
Slightly satisfied | 11% | 8% | 12% | 10% | 17% | 15% | 26%
Moderately satisfied | 33% | 37% | 36% | 39% | 41% | 46% | 42%
Very satisfied | 44% | 44% | 42% | 40% | 31% | 30% | 23%
Completely satisfied | 9% | 9% | 8% | 8% | 5% | 5% | 3%
Top 2 box | 54% | 53% | 50% | 49% | 36% | 34% | 26%
2016 GRIT Report
Challenges & Costs
"Technology, or lack thereof, is the prime culprit for sample getting worse: from bots, to survey design, to mobile enabled surveys, all these are driving sample quality down."
"Many folks have a strong sense that there are only professional survey takers and fraudulent bots that are taking all the surveys because there is a race to the bottom in terms of cost."
"Sample providers should only actively communicate on issue of representativeness, not quality or design."
2016 GRIT Report
Implementing an Effective Solution
Technical Approaches
Most Adopted Fraud Detection Tools
2016 Fraud Report
Implementing an Effective Solution
Technical Approaches
DEVICE FINGERPRINT
A device fingerprint, machine fingerprint, or browser fingerprint is information collected about a remote computing device for the purpose of identification. Fingerprints can be used to fully or partially identify individual users or devices even when cookies are turned off.
Motivation for the device fingerprint concept stems from the forensic value of human fingerprints. In the "ideal" case, all web client machines would have a different fingerprint value (diversity), and that value would never change (stability). Under those assumptions, it would be possible to uniquely distinguish between all machines on a network, without the explicit consent of the users themselves.
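As a rough illustration of the diversity and stability idea, a fingerprint can be sketched as a hash over whatever device attributes are collected. This is a minimal sketch, not any vendor's method: the attribute names and values below are hypothetical, and production fingerprinting uses many more signals.

```python
import hashlib


def device_fingerprint(attributes: dict[str, str]) -> str:
    """Hash a set of device/browser attributes into one identifier."""
    # Sort the keys so the same attributes always give the same hash,
    # no matter what order they were collected in (stability).
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, for illustration only.
device_a = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0)",
    "screen": "1920x1080",
    "timezone": "America/Chicago",
    "language": "en-US",
}
device_b = dict(device_a, screen="1366x768")  # differs in one attribute

fp_a = device_fingerprint(device_a)
fp_a_reordered = device_fingerprint(dict(reversed(list(device_a.items()))))
fp_b = device_fingerprint(device_b)
```

Here `fp_a` and `fp_a_reordered` match, while `fp_b` differs, which is the behavior a duplicate-device check relies on.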
Implementing an Effective Solution
Technical Approaches
IDENTITY VALIDATION
Identity validation solutions evaluate names, postal addresses, and/or email addresses against third-party consumer databases to determine whether they're legitimate and correspond with one another.
They provide confidence that a participant is who they say they are and lives where they say they live. They also allow for the removal of duplicates within and across sources.
Layering in a Geo-Location Distance Check adds further fraud detection by calculating the distance (in miles across the surface of a sphere) between the latitude/longitude coordinates of the postal address and the latitude/longitude coordinates that the user's IP address resolves to.
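The distance across the surface of a sphere can be computed with the haversine formula. The sketch below assumes both coordinate pairs have already been looked up; the 100-mile threshold and the function names are illustrative assumptions, not values from the presentation.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def passes_distance_check(postal_coords, ip_coords, max_miles: float = 100.0) -> bool:
    """True when the postal address and IP location are plausibly close."""
    return haversine_miles(*postal_coords, *ip_coords) <= max_miles


# Illustrative coordinates: a Dallas postal address vs. an IP resolving to New York.
dallas = (32.7767, -96.7970)
new_york = (40.7128, -74.0060)
flag_for_review = not passes_distance_check(dallas, new_york)
```

IP geolocation is approximate, so a generous threshold (or treating a failure as one signal among several) is usually safer than a hard reject.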
Implementing an Effective Solution
Technical Approaches
FRAUD DETECTION
At the device level, there are key markers that can be identified to indicate the risk of first-time user fraud:
Language Check
Geo-Browser Language Check
Geo-OS Language Check
Geo-Time Zone Check
Geo-Off Hours Check
Geo-Country Check
Multi-Device Check
Bot Check
Anonymous Check
Blacklist Check
Browser Status Check
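A few of the markers listed above could be sketched as simple comparisons between where the IP address says the device is and what the device itself reports. The slide does not define the checks, so everything here is an assumption: the lookup tables, the field names, and the 1 a.m. to 5 a.m. "off hours" window are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical lookup tables; a real system would use a maintained geo-IP source.
EXPECTED_LANGUAGES = {"US": {"en", "es"}, "DE": {"de"}}
EXPECTED_TIMEZONES = {
    "US": {"America/New_York", "America/Chicago",
           "America/Denver", "America/Los_Angeles"},
    "DE": {"Europe/Berlin"},
}


@dataclass
class DeviceSession:
    ip_country: str        # country the IP address resolves to
    browser_language: str  # e.g. "en-US"
    timezone: str          # time zone reported by the device
    local_hour: int        # hour of day (0-23) at the IP's location


def failed_markers(session: DeviceSession) -> list[str]:
    """Return the names of the illustrative checks this session fails."""
    failed = []
    primary_language = session.browser_language.split("-")[0].lower()
    # Unknown countries have no expected values, so they fail both geo checks.
    if primary_language not in EXPECTED_LANGUAGES.get(session.ip_country, set()):
        failed.append("geo_browser_language")
    if session.timezone not in EXPECTED_TIMEZONES.get(session.ip_country, set()):
        failed.append("geo_time_zone")
    if 1 <= session.local_hour < 5:  # assumed "off hours" window
        failed.append("geo_off_hours")
    return failed


typical = DeviceSession("US", "en-US", "America/Chicago", 14)
suspicious = DeviceSession("US", "zh-CN", "Asia/Shanghai", 3)
```

No single marker proves fraud; the number of failed markers is better treated as a risk score than as a verdict.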
Implementing an Effective Solution
Technical Approaches
SURVEY VALIDATION
A respondent can be flagged as unengaged in the survey if he or she speeds on at least X% of the pages they saw in the survey. The norms and standard deviations of the times for each page should be calculated in real-time as the page submissions from the respondents are received by the survey platform.
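One way to keep per-page norms and standard deviations current as submissions arrive is Welford's online algorithm, which updates the mean and variance one observation at a time. The sketch below is an assumption about how such a check might look: the 1.5 standard deviation cutoff and the 30% page share stand in for the unspecified "X%", and raw (untransformed) times are used for simplicity.

```python
import math


class PageTimer:
    """Running mean and standard deviation of completion times for one page."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, seconds: float) -> None:
        # Welford's update: no need to store earlier observations.
        self.n += 1
        delta = seconds - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (seconds - self.mean)

    def std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_speeding(self, seconds: float, z_cutoff: float = 1.5) -> bool:
        if self.n < 2:
            return False  # not enough data to define a norm yet
        return seconds < self.mean - z_cutoff * self.std()


def is_unengaged(page_times: dict, timers: dict, min_share: float = 0.3) -> bool:
    """Flag a respondent who speeds on at least `min_share` of pages seen."""
    if not page_times:
        return False
    speeding = sum(1 for page, seconds in page_times.items()
                   if page in timers and timers[page].is_speeding(seconds))
    return speeding / len(page_times) >= min_share


# Illustrative data: five earlier respondents set the norm for three pages.
timers = {page: PageTimer() for page in ("p1", "p2", "p3")}
for seconds in (10, 12, 11, 9, 13):
    for timer in timers.values():
        timer.add(seconds)

speeder = {"p1": 2.0, "p2": 2.0, "p3": 10.0}
steady = {"p1": 10.0, "p2": 11.0, "p3": 12.0}
```

Page times are typically right-skewed, so a real implementation might work on log-times or use medians instead.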
It can also be useful to consider the response patterns that are being submitted as another key indicator. Respondents who provide undesirable response patterns on more than X% of pages can also be classified as unengaged for the survey. Good Response Validation tools leverage real-time Bayesian statistical models/analysis to determine engagement.
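The slide does not say which response patterns count as undesirable. As one assumed example, straight-lining (giving the same answer to every item on a grid page) can be detected with a fixed rule. This sketch is deliberately simpler than the Bayesian models mentioned above, and the 50% threshold is a placeholder for "X%".

```python
def is_straight_lined(answers: list, min_items: int = 4) -> bool:
    """True when a page with enough items has the same answer throughout."""
    return len(answers) >= min_items and len(set(answers)) == 1


def pattern_share(pages: list) -> float:
    """Share of pages showing the straight-lining pattern."""
    if not pages:
        return 0.0
    return sum(is_straight_lined(page) for page in pages) / len(pages)


def is_unengaged_by_pattern(pages: list, max_share: float = 0.5) -> bool:
    """Flag a respondent whose pattern share exceeds the threshold."""
    return pattern_share(pages) > max_share


# Each inner list holds one page's grid answers (illustrative data).
flat_respondent = [[3, 3, 3, 3, 3], [1, 1, 1, 1], [2, 4, 1, 5]]
varied_respondent = [[1, 2, 3, 4], [5, 5, 5, 5]]
```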
Implementing an Effective Solution
Behavioral Approaches
There are three channels to address in order to ensure superior data quality in your study:
Sample Design & Management
Survey Design
Member Management
Implementing an Effective Solution
Behavioral Approaches: Sample Design & Management
Vendor selection is key. Understand how your vendor's sample is sourced, managed and incentivized.
Ask the tough questions! How is sample outgo balanced? What measures are implemented to ensure the highest quality sample is provided?
Demographic balance
Activity & tenure balance
Survey field time
Invitation/introductory language
Competing survey inventory
Survey frequency & variation
Routing/project prioritization
Implementing an Effective Solution
Behavioral Approaches: Survey Design
Question design is key!
Use non-leading wording
Provide an out for all respondents
Use open-ends sparingly
Avoid yes/no format
Implementing an Effective Solution
Behavioral Approaches: Survey Design
Avoid burdensome question formats (e.g., extensive grids and lists longer than 10-15 attributes).
Strive to keep your survey short and simple.
Clear, concise wording: write for a 5th grader!
Avoid multiple questions on one screen; visual clutter will result in respondent fatigue.
Mobile-compatible and mobile-friendly are two different things!
Implementing an Effective Solution
Behavioral Approaches: Member Management
Trap Questions
Honey Pots
Algorithmic solutions: tracking activity over time (LOI completions & invalids)
Profiling & third-party data validation sources
Demo consistency checks
Quality exists across a wide spectrum; lifetime management is critical
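Tracking activity over a member's lifetime could be sketched as a running tally of valid completes and invalids, with tiers rather than a single pass/fail line. The tier names and thresholds below are hypothetical and would need tuning against real panel data.

```python
from dataclasses import dataclass


@dataclass
class MemberHistory:
    """Lifetime counts of a panel member's survey outcomes."""
    completes: int = 0
    invalids: int = 0

    def record(self, valid: bool) -> None:
        if valid:
            self.completes += 1
        else:
            self.invalids += 1

    @property
    def total(self) -> int:
        return self.completes + self.invalids

    @property
    def invalid_rate(self) -> float:
        return self.invalids / self.total if self.total else 0.0


def quality_tier(history: MemberHistory, min_surveys: int = 5,
                 watch_rate: float = 0.1, remove_rate: float = 0.3) -> str:
    """Place a member on a quality spectrum (illustrative thresholds)."""
    if history.total < min_surveys:
        return "new"  # too little history to judge
    if history.invalid_rate >= remove_rate:
        return "remove"
    if history.invalid_rate >= watch_rate:
        return "watch"
    return "good"


# Illustrative member: nine valid completes and one invalid.
member = MemberHistory()
for outcome in [True] * 9 + [False]:
    member.record(outcome)
member_tier = quality_tier(member)
```

Using several tiers reflects the point that quality sits on a spectrum: a member with one bad survey is watched, not removed.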
Implementing an Effective Solution
Behavioral Approaches: Trap Questions Dos & Don'ts
Not all trap questions are effective! Trap questions shouldn't be too simple or too complex.
Types:
Instructional (e.g., "Select the image which shows a book.")
Skill-based (e.g., "2+2 = ?")
Honesty-based (e.g., "What brand(s) are you aware of?" "What activities have you done in the last 12 months?")
Implement multiple measures to assess quality; never rely on a single question within the survey to dictate quality.
Be mindful of question position within the survey. For example, adding your trap question at minute 45 will yield false positives that arguably result from a lengthy survey, NOT a poorly behaving respondent.
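The two rules above (use multiple measures, and discount traps placed late in a long survey) can be combined in a small decision function. This is only a sketch: the two-failure minimum and the 20-minute cutoff are assumed values, and the check names are made up for the example.

```python
from typing import NamedTuple


class QualityCheck(NamedTuple):
    name: str
    passed: bool
    minute_shown: float  # how far into the survey the check appeared


def should_remove(checks: list, min_failures: int = 2,
                  max_minute: float = 20.0) -> bool:
    """Remove only when several early-enough checks fail."""
    # Ignore checks shown late in the survey, where fatigue causes false positives.
    counted = [check for check in checks if check.minute_shown <= max_minute]
    failures = sum(1 for check in counted if not check.passed)
    return failures >= min_failures


# One early failure plus a failure at minute 45: not enough to remove.
tired_respondent = [
    QualityCheck("instructional_trap", False, 3),
    QualityCheck("skill_trap", True, 8),
    QualityCheck("late_trap", False, 45),
]
# Two early failures: flagged for removal.
careless_respondent = [
    QualityCheck("instructional_trap", False, 3),
    QualityCheck("skill_trap", False, 8),
]
```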
Implementing an Effective Solution
Applying Our Learnings to B2B Research
Know thy sample source!
Always use multiple knowledge-based trap questions (e.g., looking for experts in cloud computing? Test their knowledge of various storage products rather than the color of the sky).
Implement multiple measures to assess quality (inclusive of technical and behavioral approaches).
When possible, leverage 3rd party data sources to validate member data.
Never become complacent; your research will always be a hot target for fraud. Stay protected!
The Path Forward: Responsibility, Accountability, & Collaboration
Every company up and down the supply chain involved in the execution of online research has a role and responsibility in data quality and fraud detection. What you are responsible for depends on which part of the research process you have operational control over (i.e., you can't just push responsibility down to the operational layer below you; everyone has to do their part, or the whole system suffers).
There is no silver bullet solution. Effective solutions require a layered approach that incorporates redundancies and failsafe mechanisms.
It's not enough to simply care about data quality and fraud detection; you must VALUE it!