Quality of data is definitely better in the case of online surveys

Quality of data


A case study that explains how the quality of data is much better in online surveys, with guidelines on how sampling and non-sampling errors are reduced.


Page 1: Quality of data


Page 2: Quality of data

Types of errors

Two kinds of errors can creep into a survey: sampling errors and non-sampling (human) errors.

Page 3: Quality of data

Sampling errors

Sampling errors are those that occur when the statistical characteristics of a population are estimated from a sample of that population.

One way to lower this error is randomized sampling. In online surveys, the number of contacts is very high, so even with low incidence rates and low completion rates, the sample achieves a level of randomness that is not possible in an offline study.
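To illustrate the link between random sampling and sampling error, here is a minimal Python sketch. It draws a simple random sample from a hypothetical contact list and reports the standard error of a sample proportion, sqrt(p(1 - p) / n), which shrinks as the sample grows. The contact list, the 30% "yes" rate, and the helper names are assumptions for illustration, not the panel's actual sampling procedure.

```python
import math
import random


def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion p estimated from n respondents."""
    return math.sqrt(p * (1 - p) / n)


def simple_random_sample(contacts, n, seed=None):
    """Draw n contacts without replacement, each contact equally likely to be picked."""
    rng = random.Random(seed)
    return rng.sample(contacts, n)


if __name__ == "__main__":
    # Hypothetical contact list: True means the contact would answer "yes".
    rng = random.Random(42)
    contacts = [rng.random() < 0.3 for _ in range(100_000)]

    for n in (100, 1_000, 10_000):
        sample = simple_random_sample(contacts, n, seed=n)
        p_hat = sum(sample) / n
        print(f"n={n:>6}  p_hat={p_hat:.3f}  SE={standard_error(p_hat, n):.4f}")
```

Each tenfold increase in sample size cuts the standard error by a factor of about sqrt(10), which is why a large contact base helps; randomness of selection is what makes this formula applicable in the first place.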

Page 4: Quality of data

Sampling errors

Also, if required, we apply a process known as “weighting”.

Every year, we conduct a baseline study, the “Juxt India Consumer Landscape”, covering 109 urban centres and 196 villages in 80 of the 88 NSSO regions across 28 states and 4 UTs, with 30,066 households and 1,21,311 individuals. From this baseline study, we create a matrix of unique weights for each age-gender-location combination.

Using this matrix, we can project the data from any survey to the nationwide population. The same weighting process also reduces the error that comes from an unrepresentative sample and from self-selection bias.
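The weighting step can be sketched as cell (post-stratification) weighting: each age-gender-location cell gets a weight equal to its population share divided by its share of the sample, so that weighted sample shares match the population. This is a minimal Python sketch under that assumption; the cell labels, shares, and function names are hypothetical, and the actual weights derived from the Juxt India Consumer Landscape are not shown in the deck.

```python
from collections import Counter


def cell_weights(sample_cells, population_shares):
    """Weight for each cell = population share / sample share.

    sample_cells: one cell key per respondent, e.g. ("18-24", "M", "urban").
    population_shares: cell key -> share of the population (from a baseline study).
    Cells with no respondents get no weight, so they cannot be projected this way.
    """
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return {cell: population_shares[cell] / (count / n) for cell, count in counts.items()}


def weighted_share(answers, sample_cells, weights):
    """Share of respondents answering True after applying the cell weights."""
    total = sum(weights[cell] for cell in sample_cells)
    yes = sum(weights[cell] for answer, cell in zip(answers, sample_cells) if answer)
    return yes / total


if __name__ == "__main__":
    # Hypothetical sample skewed towards young urban men (6 of 8 respondents).
    young_urban = ("18-24", "M", "urban")
    older_rural = ("35+", "F", "rural")
    cells = [young_urban] * 6 + [older_rural] * 2
    answers = [True] * 6 + [False] * 2
    # Hypothetical population in which the two cells are equally large.
    shares = {young_urban: 0.5, older_rural: 0.5}

    weights = cell_weights(cells, shares)
    print({cell: round(w, 3) for cell, w in weights.items()})
    print("unweighted share:", sum(answers) / len(answers))
    print("weighted share:", round(weighted_share(answers, cells, weights), 3))
```

Here the unweighted share is 0.75, while the weighted share is 0.5: the over-represented young urban cell is weighted down (about 0.667) and the under-represented rural cell is weighted up (2.0).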

Page 5: Quality of data

Non-sampling (human/system) errors

In an offline study, the questionnaire is administered by an interviewer who reads it out according to their own interpretation, which may introduce bias and errors.

In an online study, however, only the respondent’s own interpretation is involved. That is why we use extremely simple English, and the survey can even be done in local languages, removing this source of non-sampling error.

Page 6: Quality of data

Non-sampling (human/system) errors

There can also be “bad respondents”. To “clean” this data:

We clear out junk respondents. We don’t believe in merely ‘response cleaning’; we delete the case (the respondent) itself.

We remove all “straight-liners”: respondents who fill in surveys in a pattern, such as giving the same answer to every item in a grid.

We also do “mode time cleaning”. The completion times of most responses fall between 2/3 and 4/3 of the mode time (the most common completion time); this band can be adjusted depending on the type of questionnaire. Outliers outside this band are discarded. An example of mode time cleaning is shown on the next slide.
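The straight-liner rule can be sketched in Python under a simple assumption: a respondent is a straight-liner if they give the identical answer to every item of a rating grid with at least `min_items` items. The deck does not spell out its exact rule, and other patterns (zig-zags, for example) would need extra checks; the function names, threshold, and data layout are hypothetical.

```python
def is_straight_liner(grid_answers, min_items=5):
    """True if a respondent gave the same answer to every item of a rating grid.

    Grids shorter than min_items are not judged, since identical answers to a
    few items are not unusual.
    """
    return len(grid_answers) >= min_items and len(set(grid_answers)) == 1


def drop_straight_liners(responses, min_items=5):
    """Drop the whole case (respondent), not just the suspect answers."""
    return {rid: grid for rid, grid in responses.items()
            if not is_straight_liner(grid, min_items)}


if __name__ == "__main__":
    responses = {
        "r1": [3, 3, 3, 3, 3, 3],  # same answer throughout: flagged and dropped
        "r2": [4, 2, 5, 3, 4, 1],  # varied answers: kept
        "r3": [5, 5, 5],           # too few items to judge: kept
    }
    print(sorted(drop_straight_liners(responses)))  # ['r2', 'r3']
```

Returning the filtered set of respondents, rather than blanking individual answers, mirrors the stated policy of deleting the case itself.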

Page 7: Quality of data

[Figure: Typical scatter plot of survey response times; axis label: Time]

Mode time (most commonly occurring completion time): 13 minutes

Outliers lying above 4/3 of the mode time are cleaned.

Outliers lying below 2/3 of the mode time are cleaned.

Most responses occur between 2/3 and 4/3 of the mode time.
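Mode time cleaning can be sketched as follows, assuming completion times in minutes that are rounded to whole minutes before taking the mode (raw times are rarely exactly equal, so the mode of unrounded values is not meaningful). The 2/3 and 4/3 factors come from the deck; the rounding rule and the function names are assumptions. With the 13-minute mode time from the plot, the band runs from about 8.7 to about 17.3 minutes.

```python
import statistics


def mode_time_band(times_min, low=2/3, high=4/3):
    """Return (mode, lower, upper): the most common whole-minute time and the band around it."""
    rounded = [round(t) for t in times_min]
    # On Python 3.8+, statistics.mode returns the first mode if several tie.
    mode = statistics.mode(rounded)
    return mode, low * mode, high * mode


def mode_time_clean(times_min, low=2/3, high=4/3):
    """Keep responses whose completion time lies within [low * mode, high * mode]."""
    _, lower, upper = mode_time_band(times_min, low, high)
    return [t for t in times_min if lower <= t <= upper]


if __name__ == "__main__":
    # Hypothetical completion times in minutes.
    times = [13, 13, 13, 12, 14, 9, 17, 5, 30, 13.4]
    mode, lower, upper = mode_time_band(times)
    print(f"mode={mode} min, band={lower:.1f} to {upper:.1f} min")  # mode=13 min, band=8.7 to 17.3 min
    print(mode_time_clean(times))  # [13, 13, 13, 12, 14, 9, 17, 13.4]
```

The `low` and `high` parameters make the band adjustable, matching the note that it can be flexible depending on the type of questionnaire.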

Page 8: Quality of data

Normality, reliability and validity tests

Some tests can also be run at the client’s request to ensure the statistical validity of the data. Let us look at them one by one.

Page 9: Quality of data

Normality Test

The objective of sample normality tests is to check whether the sample data are normally distributed.

The normality of the sample should be confirmed before the data are subjected to inferential and differential analyses, since many of these methods assume normally distributed data.

As an example, consider a normality test on the age of respondents.
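The deck does not say which formal normality test is used. As one possibility, the Jarque-Bera test (swapped in here because it needs only the sample skewness S and kurtosis K) can be written with Python's standard library: JB = n/6 × (S² + (K - 3)²/4), compared against a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(-JB/2). The ages are hypothetical, and the chi-square approximation is only reliable for fairly large samples; the Shapiro-Wilk test (for example, `scipy.stats.shapiro`) is a common alternative for small ones.

```python
import math


def jarque_bera(data):
    """Jarque-Bera statistic and its asymptotic p-value (chi-square with 2 df)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("data have zero variance")
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    jb = n / 6 * (skew ** 2 + excess_kurtosis ** 2 / 4)
    p_value = math.exp(-jb / 2)  # survival function of chi-square with 2 df
    return jb, p_value


if __name__ == "__main__":
    # Hypothetical respondent ages, not data from the deck.
    ages = [22, 25, 27, 28, 30, 31, 31, 33, 34, 35, 36, 38, 40, 42, 45]
    jb, p = jarque_bera(ages)
    print(f"JB = {jb:.3f}, p = {p:.3f}")
```

A large p-value means the test finds no evidence against normality; a small one (say, below 0.05) suggests the skew or tails depart from a normal curve.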

Page 10: Quality of data

Histogram: graphical method

An initial impression of the normality of the distribution can be gained by examining the histogram. From the figure above, the collected age data appear very close to a normal distribution curve.
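A histogram check can also be done numerically by comparing the observed count in each bin with the count expected under a normal curve fitted to the sample mean and standard deviation. This is a minimal sketch using `statistics.NormalDist`; the ages and bin edges are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def histogram_vs_normal(data, bin_edges):
    """Compare observed bin counts with counts expected under a fitted normal curve.

    Returns (lo, hi, observed, expected) for each bin [lo, hi).
    """
    fitted = NormalDist(mean(data), stdev(data))
    n = len(data)
    rows = []
    for lo, hi in zip(bin_edges, bin_edges[1:]):
        observed = sum(1 for x in data if lo <= x < hi)
        expected = n * (fitted.cdf(hi) - fitted.cdf(lo))
        rows.append((lo, hi, observed, expected))
    return rows


if __name__ == "__main__":
    # Hypothetical respondent ages, not data from the deck.
    ages = [19, 22, 24, 25, 27, 28, 29, 30, 30, 31, 32, 33, 35, 36, 38, 41, 44, 50]
    for lo, hi, observed, expected in histogram_vs_normal(ages, [15, 25, 35, 45, 55]):
        print(f"{lo:>2}-{hi:<2} {'#' * observed:<10} observed={observed}, expected={expected:.1f}")
```

If the bars roughly track the expected counts, the histogram is "near" a normal curve in the sense used on this slide.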

Page 11: Quality of data

Normal Q-Q Plot of Age

In a normal Q-Q plot, if the variable is normally distributed, the dots fit the line very closely. Here, the points in the upper right of the chart indicate some skew caused by a few extremely large values; otherwise, the data appear to be normally distributed.
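The points of a normal Q-Q plot can be computed without a plotting library: sort the data and pair each value with the normal quantile at its plotting position. This sketch assumes the (i - 0.5)/n plotting positions (other conventions exist) and adds the correlation of the pairs as a single-number summary of how closely the dots hug the line; the function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(data):
    """Pairs of (theoretical normal quantile, observed value), in sorted order."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n keep every probability strictly inside (0, 1).
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]


def qq_correlation(points):
    """Pearson correlation of the Q-Q points; values near 1 mean the dots lie on a line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    # Hypothetical ages with a few extremely large values in the upper tail.
    ages = [21, 23, 25, 26, 28, 29, 30, 31, 32, 34, 35, 37, 39, 58, 64]
    points = normal_qq_points(ages)
    for theoretical, observed in points[-3:]:
        print(f"expected {theoretical:5.1f}  observed {observed}")
    print(f"Q-Q correlation: {qq_correlation(points):.3f}")
```

The last few pairs are where large values show up: observed ages above their theoretical quantiles are the points that drift off the line in the upper right of the chart.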

Page 12: Quality of data

Reliability test

Reliability is the extent to which a measuring procedure yields consistent results on repeated administrations of the scale.

The objective of the reliability test is to check that the items of each variable measure the same underlying construct.

The reliability of this instrument is examined using Cronbach’s alpha coefficient.

Page 13: Quality of data

Cronbach alpha (α)

It is the average of all possible split-half correlation coefficients resulting from the different ways of splitting the scale items.

Its value normally ranges from 0 to 1.

A value of α below 0.6 indicates unsatisfactory internal consistency reliability (see Malhotra & Birks, 2007, p. 358).

Note: alpha tends to increase as the number of items in the scale increases.

The Cronbach’s alpha coefficient for the choice factors scale in our sample questionnaire, taken as a whole, was 0.78071. This indicates that the scale as a whole has acceptable internal consistency and reliability, so no items were deleted.
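Cronbach's alpha is usually computed from item and total-score variances: α = k/(k - 1) × (1 - Σ var(item i) / var(total)), where k is the number of items and the total is each respondent's summed score. A minimal Python sketch follows; the item scores are hypothetical, and `standardized_alpha`, based on the mean inter-item correlation r̄ as k·r̄ / (1 + (k - 1)·r̄), is included only to show why alpha tends to rise with the number of items.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of items, each a list of scores in respondent order."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    totals = [sum(person) for person in zip(*item_scores)]  # each respondent's total score
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_var_sum / total_var)


def standardized_alpha(k, mean_inter_item_r):
    """Standardized alpha for k items with a given mean inter-item correlation."""
    return k * mean_inter_item_r / (1 + (k - 1) * mean_inter_item_r)


if __name__ == "__main__":
    # Hypothetical ratings: 3 items, 6 respondents, 1-5 scale.
    items = [
        [4, 5, 3, 4, 2, 5],
        [4, 4, 3, 5, 2, 4],
        [3, 5, 2, 4, 3, 5],
    ]
    print(f"alpha = {cronbach_alpha(items):.3f}")
    # Same mean inter-item correlation, more items: alpha goes up.
    print(round(standardized_alpha(5, 0.3), 3), round(standardized_alpha(10, 0.3), 3))  # 0.682 0.811
```

Compared with the 0.6 cut-off above, a computed alpha like the deck's 0.78071 would be read as acceptable.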

Page 14: Quality of data

Validity test

A reliability test is necessary, but it is not sufficient.

The objective of the validity test is to identify whether the proposed items in a study are valid for measuring the underlying concept, that is, how accurately the measurement corresponds to the concept in the real world.

In a test case, the concept was the respondents’ perceived importance of the factors influencing their intention to study at X.

Page 15: Quality of data

Sample validity test

Importance of the aspects related to content & structure of course offered: correlations

a12_7  Adaptability to professional environment
a12_1  Reasonableness of the minimum qualification requirement
a12_4  Specialized programs in the offing
a12_2  Range of courses offered
a12_5  Reasonableness of the course duration
a12_6  Topicality of course content
a12_3  Flexibility in selection of course

         a12_7  a12_1  a12_4  a12_2  a12_5  a12_6  a12_3
a12_7     1.00  -0.07  -0.06   0.00  -0.09  -0.17  -0.12
a12_1    -0.07   1.00  -0.05  -0.18  -0.13   0.04  -0.21
a12_4    -0.06  -0.05   1.00  -0.17  -0.12  -0.33  -0.16
a12_2     0.00  -0.18  -0.17   1.00   0.01  -0.11  -0.28
a12_5    -0.09  -0.13  -0.12   0.01   1.00  -0.25  -0.26
a12_6    -0.17   0.04  -0.33  -0.11  -0.25   1.00  -0.06
a12_3    -0.12  -0.21  -0.16  -0.28  -0.26  -0.06   1.00
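The deck does not say which tool produced the correlation matrix for items a12_1 to a12_7. As a sketch, a Pearson correlation matrix, plus a helper that lists item pairs whose absolute correlation exceeds a threshold, can be written with the standard library. The scores, the 0.5 threshold, and the function names are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length score lists (both must vary)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(items):
    """items: item code -> list of scores. Returns nested dicts of r rounded to 2 places."""
    codes = list(items)
    return {a: {b: round(pearson(items[a], items[b]), 2) for b in codes} for a in codes}


def high_pairs(matrix, threshold=0.5):
    """Item pairs (each listed once) whose absolute correlation exceeds the threshold."""
    codes = list(matrix)
    return [(a, b, matrix[a][b])
            for i, a in enumerate(codes) for b in codes[i + 1:]
            if abs(matrix[a][b]) > threshold]


if __name__ == "__main__":
    # Hypothetical importance ratings from five respondents.
    items = {
        "a12_1": [1, 2, 3, 4, 5],
        "a12_2": [2, 4, 6, 8, 10],
        "a12_3": [5, 3, 4, 1, 2],
    }
    m = correlation_matrix(items)
    for code, row in m.items():
        print(code, row)
    print(high_pairs(m))
```

In a matrix like the one above, where every off-diagonal value is small in absolute terms, `high_pairs` would return an empty list.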

Page 16: Quality of data

Validity test

The questionnaire for the test study was developed using choice factors from similar studies as a point of reference and then adapted to the Indian context; in addition, the correlations between the factors were minimal.

Thus, the content validity of the questionnaire was addressed.

Page 17: Quality of data

Thank you

www.juxtconsult.com
www.getcounted.net