EVERYONE CARES ABOUT SAMPLE QUALITY, BUT NOT EVERYONE VALUES IT! A review of responsibilities and techniques you can implement to protect your online research and beyond

Webinar: Everyone cares about sample quality but not everyone values it!



REAL PEOPLE, QUALITY DATA™

DATA QUALITY SOFTWARE

Lisa Wilding-Brown, Chief Research Officer

Mark Menig, Chief Executive Officer

Agenda

Quality through the years (brief overview of where we've been and where we are going)
Current landscape (e.g., bots, hijackers, foreign click shops in China, etc.)
Challenges & costs associated with today's online fraud and how it impacts data quality
Implementing an effective solution (multi-layered approach)
Technical approaches: digital fingerprinting (when and where); respondent validation; algorithmic solutions over a member's lifetime; other 3rd-party techniques, etc.
Behavioral approaches: knowledge question design (red herrings); pre-survey screening; smart survey design (dos and don'ts)
The Path Forward: Responsibility, Accountability, & Collaboration


Care About vs. Value

When you care about something, you simply have some regard, even if minimal, for someone or something.

When you VALUE something, you consider it important and worthwhile. ... As a verb, it means "holding something in high regard" (like "I value our friendship"), but it can also mean "determine how much something is WORTH," like a prize valued at $200.


QUALITY

means doing it right when no one is looking


2000, 2006, 2008, 2012, 2016, 2020

The industry rapidly becomes enamored with the speed and cost savings of moving to online
Industry associations launch major initiatives to investigate and restore online research quality
Fraud continues to morph and evolve with the emergence of new threats
P&G speaks out about online data quality issues at the Client Summit, sparking industry-wide discourse
Rapid evolution and diversification of devices; engaging respondents migrates from a proximity-fixed experience to a portable experience
The only constant is change! Continual innovation is required in order to stay ahead, recognizing the battle is never over


Current Landscape

Dr. Liz Nelson, co-founder of TNS, advisor to the board of Fly Research and a fellow of the Market Research Society, talks about how the need for speed is affecting the quality of research.

"I would say immediately that the emphasis on speed is what's happening now. Clients demand immediate results with the survey in field on Friday, and 2000 results the next day. I think the sad bit is that quality suffers."

Research Live, November 24, 2016

Current Landscape

Recent advances in big data and artificial intelligence are now making it possible to teach a machine to understand and speak to humans.

It's very difficult to simply look at the data provided by some of the more sophisticated bots and identify what to remove, because it's all gray goo inside, just like a real brain, and may be indistinguishable from real data.

Need a real world example? Take out your iPhone and ask Siri a question.

Forums like the one to the left abound online with users looking for and sharing information about how to utilize tools to create/mimic bots and automate the process of filling in surveys.


Current Landscape

Here is a survey bot attempting to complete a survey with no given information. The creator ran this on 6 surveys a day for two weeks (fully automated, of course) and got the total sum of 14.95p, with no user interaction whatsoever!

That was 10 questions completed in under 17 seconds, in case you lost count!


Current Landscape

Create a fake whatever you need


Current Landscape

The Tor software protects users by bouncing their communications around a distributed network of relays run by volunteers all around the world.

The Tor Browser gives access to Tor on Windows, Mac OS X, or Linux without needing to install any software.

Survey Click Shops are popping up around the globe

Comprised of many unique devices in a single location being utilized by a group of fraudsters to game surveys and generate incentives


Current Landscape

Device Emulators. In computing, an emulator is hardware or software that enables one computer system (called the host) to behave like another computer system (called the guest).

This threat will only get worse as computers and global computer networks continue to advance and emulator developers grow more skilled in their work.

Datacenters, VPNs, anonymous proxies, etc. are favorite tools for fraudsters because they let a device appear to be coming from a different country, on a case-by-case basis, as the requirements of a given survey demand.


Challenges & Costs

Satisfaction levels run from "Not at all satisfied" to "Completely satisfied".

Attribute                Not at all  Slightly  Moderately  Very  Completely  Top 2 box
Timeliness of fielding   2%          11%       33%         44%   9%          54%
Purchase process         2%          8%        37%         44%   9%          53%
Ease of accessing panel  2%          12%       36%         42%   8%          50%
Customer service         3%          10%       39%         40%   8%          49%
Quantity of respondents  5%          17%       41%         31%   5%          36%
Cost of panel            5%          15%       46%         30%   5%          34%
Quality of Respondents   7%          26%       42%         23%   3%          26%

2016 GRIT Report

Challenges & Costs

Technology, or lack thereof, is the prime culprit for sample getting worse: from bots, to survey design, to mobile enabled surveys, all these are driving sample quality down.

Many folks have a strong sense that there are only professional survey takers and fraudulent bots that are taking all the surveys because there is a race to the bottom in terms of cost.

Sample providers should only actively communicate on issue of representativeness, not quality or design.

2016 GRIT Report

Implementing an Effective Solution: Technical Approaches

Most Adopted Fraud Detection Tools

2016 Fraud Report

Implementing an Effective Solution: Technical Approaches

DEVICE FINGERPRINT

A device fingerprint, machine fingerprint, or browser fingerprint is information collected about a remote computing device for the purpose of identification. Fingerprints can be used to fully or partially identify individual users or devices even when cookies are turned off.

Motivation for the device fingerprint concept stems from the forensic value of human fingerprints. In the "ideal" case, all web client machines would have a different fingerprint value (diversity), and that value would never change (stability). Under those assumptions, it would be possible to uniquely distinguish between all machines on a network, without the explicit consent of the users themselves.
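As a rough illustration of the diversity and stability ideas, a fingerprint can be sketched as a hash over attributes a device exposes. The attribute names used here (user_agent, screen, timezone, language, fonts) are assumptions chosen for the example, not a list taken from this presentation, and commercial fingerprinting tools rely on many more signals.

```python
import hashlib
import json


def device_fingerprint(attributes: dict) -> str:
    """Return a SHA-256 hex digest for a dict of device attributes."""
    # Sorting keys makes the serialization independent of insertion order,
    # so the same attributes always yield the same value (stability).
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute sets for three visits (illustrative values only).
visit_a = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "screen": "1920x1080",
    "timezone": "America/Chicago",
    "language": "en-US",
    "fonts": ["Arial", "Calibri", "Verdana"],
}
# Same device, attributes collected in a different order.
visit_b = dict(reversed(list(visit_a.items())))
# Same attributes except for the time zone.
visit_c = dict(visit_a, timezone="Asia/Shanghai")

fp_a = device_fingerprint(visit_a)
fp_b = device_fingerprint(visit_b)
fp_c = device_fingerprint(visit_c)

print("repeat visit matches:", fp_a == fp_b)
print("changed device differs:", fp_a != fp_c)
```

An exact hash like this breaks as soon as any single attribute changes, which is one reason real devices fall short of the "ideal" case described above.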

Implementing an Effective Solution: Technical Approaches

IDENTITY VALIDATION

Identity validation solutions allow for the evaluation of names, postal addresses, and/or email addresses against third-party consumer databases to determine if they're legitimate and correspond with one another.

They provide confidence in knowing that a participant is who they say they are and lives where they say they live. They also allow for the removal of duplicates within and across sources.

Layering in a Geo-Location Distance Check adds additional fraud detection by calculating the distance (in miles across the surface of a sphere) between the latitude/longitude coordinates of the postal address and the latitude/longitude coordinates that the user's IP address resolves to.
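The distance calculation can be sketched with the haversine formula, which gives the great-circle distance between two points on a sphere. The 100-mile threshold, the function names, and the sample coordinates are assumptions for illustration; the presentation does not state which cutoff or formula a given vendor uses.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, treating the Earth as a sphere


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_check(postal_latlon, ip_latlon, max_miles=100.0):
    """Return (miles, flagged); flagged is True when the gap exceeds max_miles.

    max_miles is a hypothetical threshold, not a value from the presentation.
    """
    miles = haversine_miles(*postal_latlon, *ip_latlon)
    return miles, miles > max_miles


# Illustrative coordinates: postal address in Dallas, IP resolving to Los Angeles.
postal = (32.7767, -96.7970)
ip_location = (34.0522, -118.2437)

miles, flagged = distance_check(postal, ip_location)
print(f"{miles:.0f} miles apart, flagged={flagged}")
```

IP geolocation is often only accurate to the city or region, so a generous threshold helps avoid flagging legitimate respondents.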


Implementing an Effective Solution: Technical Approaches

FRAUD DETECTION

At the device level, there are key markers that can be identified to indicate the risk of first time user fraud:
Language Check
Geo-Browser Language Check
Geo-OS Language Check
Geo-Time Zone Check
Geo-Off Hours Check
Geo-Country Check
Multi-Device Check
Bot Check
Anonymous Check
Blacklist Check
Browser Status Check
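The presentation names these checks without defining how each is computed, so the sketch below is an assumed interpretation of three of them (Geo-Country, Geo-Time Zone, Geo-Browser Language) as simple consistency rules. The field names, the language lookup table, and the 60-minute tolerance are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeviceSignals:
    survey_country: str         # country the survey is targeting, e.g. "US"
    ip_country: str             # country the IP address geolocates to
    ip_utc_offset_min: int      # UTC offset (minutes) expected at the IP's location
    device_utc_offset_min: int  # UTC offset (minutes) reported by the device
    browser_language: str       # browser language tag, e.g. "en-US"


# Hypothetical lookup of languages considered normal for a target country.
EXPECTED_LANGUAGES = {"US": {"en", "es"}, "DE": {"de"}, "FR": {"fr"}}


def risk_markers(s: DeviceSignals, max_offset_gap_min: int = 60) -> List[str]:
    """Return the names of the consistency checks this device fails."""
    markers = []
    # Assumed Geo-Country rule: IP location should match the target country.
    if s.ip_country != s.survey_country:
        markers.append("geo_country")
    # Assumed Geo-Time Zone rule: device clock should agree with the IP location.
    if abs(s.ip_utc_offset_min - s.device_utc_offset_min) > max_offset_gap_min:
        markers.append("geo_time_zone")
    # Assumed Geo-Browser Language rule: language should be plausible for the country.
    primary_language = s.browser_language.split("-")[0].lower()
    expected = EXPECTED_LANGUAGES.get(s.survey_country)
    if expected is not None and primary_language not in expected:
        markers.append("geo_browser_language")
    return markers


clean = DeviceSignals("US", "US", -300, -300, "en-US")
suspect = DeviceSignals("US", "DE", 60, 480, "zh-CN")

print(risk_markers(clean))
print(risk_markers(suspect))
```

Returning the list of failed markers, rather than a single pass/fail value, leaves room to weigh them alongside other signals.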

Implementing an Effective Solution: Technical Approaches

SURVEY VALIDATION

A respondent can be flagged as unengaged in the survey if he or she speeds through at least X% of the pages they saw in the survey. The norms and standard deviations of the completion times for each page should be calculated in real time as the page submissions from the respondents are received by the survey platform.

It can also be useful to consider the response patterns that are being submitted as another key indicator. Respondents who provide undesirable response patterns on more than X% of pages can also be classified as unengaged for the survey. Good Response Validation tools leverage real-time Bayesian statistical models/analysis to determine engagement.
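The speeding rule can be sketched with running per-page statistics that update as each submission arrives. This example assumes "speeding" means a page time more than z standard deviations below that page's mean, and it uses Welford's online algorithm for the running mean and standard deviation; the z value, the minimum sample size, and the 50% value for X are illustrative assumptions. It does not attempt the Bayesian modeling mentioned above.

```python
import math
from collections import defaultdict


class PageTimer:
    """Running mean and standard deviation of page times (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, seconds):
        self.n += 1
        delta = seconds - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (seconds - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def is_speeding(timer, seconds, z=1.0, min_n=5):
    """True when the page time is more than z std devs below the page mean."""
    if timer.n < min_n:  # too few submissions to judge this page yet
        return False
    return seconds < timer.mean - z * timer.std


def is_unengaged(page_flags, x_pct=50.0):
    """True when the respondent sped on at least x_pct percent of pages seen."""
    if not page_flags:
        return False
    return 100.0 * sum(page_flags) / len(page_flags) >= x_pct


# Illustrative page times (seconds) from earlier respondents.
history = {
    "p1": [20, 22, 25, 19, 24, 21],
    "p2": [40, 45, 38, 42, 44, 41],
    "p3": [10, 12, 11, 9, 13, 10],
}
timers = defaultdict(PageTimer)
for page, times in history.items():
    for t in times:
        timers[page].add(t)

new_respondent = {"p1": 3, "p2": 41, "p3": 2}
flags = [is_speeding(timers[page], secs) for page, secs in new_respondent.items()]
unengaged = is_unengaged(flags, x_pct=50.0)
print(flags, unengaged)
```

Page times tend to be right-skewed, so a production rule might work on log times or percentiles instead of a plain mean and standard deviation.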


Implementing an Effective Solution: Behavioral Approaches

There are three channels to address in order to ensure superior data quality in your study:

Sample Design & Management

Survey Design

Member Management


Implementing an Effective Solution: Behavioral Approaches (Sample Design & Management)

Vendor selection is key. Understand how your vendor's sample is sourced, managed and incentivized.

Ask the tough questions! How is sample outgo balanced? What measures are implemented to ensure the highest quality sample is provided?

Demographic balance
Activity & tenure balance
Survey field time
Invitation/introductory language
Competing survey inventory
Survey frequency & variation
Routing/project prioritization


Implementing an Effective Solution: Behavioral Approaches (Survey Design)

Question design is key!

Use non-leading wording

Provide an out for all respondents

Use open-ends sparingly

Avoid yes/no format


Implementing an Effective Solution: Behavioral Approaches (Survey Design)

Avoid burdensome question formats (e.g., extensive grids and lists longer than 10-15 attributes).

Strive to keep your survey short and simple.

Use clear, concise wording: write for a 5th grader!

Avoid multiple questions on one screen; visual clutter will result in respondent fatigue.

Mobile-compatible and mobile-friendly are two different things!


Implementing an Effective Solution: Behavioral Approaches (Member Management)

Trap Questions

Honey Pots

Algorithmic solutions

Tracking activity over time (LOI, completions & invalids)

Profiling & third-party data validation sources

Demo consistency checks

Quality exists across a wide spectrum; lifetime management is critical


Implementing an Effective Solution: Behavioral Approaches (Trap Questions Dos & Don'ts)

Not all trap questions are effective! Trap questions shouldn't be too simple or too complex.

Types:
Instructional (e.g., "Select the image which shows a book.")
Skill-based (e.g., "2+2 = ?")
Honesty-based (e.g., "What brand(s) are you aware of?" "What activities have you done in the last 12 months?")

Implement multiple measures to assess quality; never rely on a single question within the survey to dictate quality.

Be mindful of question position within the survey: for example, adding your trap question at minute 45 will yield false positives that arguably are a result of a lengthy survey, NOT a poorly behaving respondent.
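One way to act on the "multiple measures" advice is to flag a respondent only when several independent checks fail. The sketch below is an illustration under that assumption; the check names and the two-failure threshold are hypothetical and would need tuning for a real study.

```python
from typing import Dict, List, Tuple


def assess_quality(checks: Dict[str, bool], min_failures: int = 2) -> Tuple[bool, List[str]]:
    """Return (flagged, failed_checks).

    checks maps a check name to True when the respondent passed it.
    A respondent is flagged only when at least min_failures checks fail,
    so a single missed trap question never decides the outcome alone.
    """
    failed = [name for name, passed in checks.items() if not passed]
    return len(failed) >= min_failures, failed


# Hypothetical results for two respondents.
one_miss = {
    "instructional_trap": False,
    "skill_trap": True,
    "honesty_trap": True,
    "speeding_check": True,
}
several_misses = {
    "instructional_trap": False,
    "skill_trap": True,
    "honesty_trap": False,
    "speeding_check": False,
}

print(assess_quality(one_miss))
print(assess_quality(several_misses))
```

Keeping the list of failed checks makes it possible to review why a respondent was flagged, for instance to discount a trap that sat late in a long survey.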


Implementing an Effective Solution: Applying Our Learnings to B2B Research

Know thy sample source!

Always use multiple knowledge-based trap questions (e.g., looking for experts in cloud computing? Test their knowledge on various storage products vs. the color of the sky).

Implement multiple measures to assess quality (inclusive of technical and behavioral approaches).

When possible, leverage 3rd party data sources to validate member data.

Never become complacent: your research will always be a hot target for fraud. Stay protected!


The Path Forward: Responsibility, Accountability, & Collaboration

Every company up and down the supply chain involved in the execution of online research has a role/responsibility as it relates to data quality/fraud detection. What you are responsible for depends on which part of the research process you have operational control over (i.e., you can't just push responsibility down to the operational layer below you; everyone has to do their part, or the whole system suffers).

There is no silver bullet solution. Effective solutions require a layered technique/approach that incorporates redundancies and failsafe mechanisms.

It's not enough to simply care about data quality and fraud detection; you must VALUE it!
