Data Quality Jpal 260708

Ensuring data quality: fundamental steps
J-PAL RA Meeting, July 26th 2008

The whole process
  • Questionnaire design
  • Data collection
  • Software design
  • Data entry

The whole process
  • Questionnaire design
    - Asking the right questions, in the right way
    - Structure the questionnaire effectively
  • Data collection
    - Veracity
    - Quality of survey
    - Quality of filling questionnaires
  • Software design
    - Minimize data entry errors
  • Data entry and management
    - Check that data is error-free
    - Organize data in an effective way
    - Clean data

Questionnaire design
  • Clear skip patterns whenever needed. The software designer will then need to include those in the data entry software.
  • Grids
  • Single/multiple options
  • Interviewer checkpoints
  • When coding your questions, make sure that all options are included. For example, if there is a chance, even a small one, that people will say "I don't know", do include the code -999 in the question.
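As an illustration of how a skip pattern and a complete code list can be turned into checks, here is a minimal Python sketch. The two questions (`q1`, "Do you own land?", and `q2`, "How many acres?") are hypothetical and not taken from any actual questionnaire; only the -999 code comes from the slide.

```python
# Hypothetical questionnaire fragment (an assumption for illustration):
# q1 "Do you own land?" (1 = yes, 0 = no); q2 "How many acres?" is asked only if q1 == 1.
DONT_KNOW = -999  # code suggested in the slides for "I don't know"

ALLOWED_Q1 = {0, 1, DONT_KNOW}


def check_record(record):
    """Return a list of problems found in one questionnaire (a dict)."""
    problems = []
    q1 = record.get("q1")
    q2 = record.get("q2")  # None means the question was left blank
    if q1 not in ALLOWED_Q1:
        problems.append("q1: code %r is not an allowed option" % (q1,))
    if q1 == 1 and q2 is None:
        problems.append("q2: should be answered because q1 == 1")
    if q1 != 1 and q2 is not None:
        problems.append("q2: should be skipped because q1 != 1")
    return problems
```

The same rules can be handed to the software designer so that the data entry software enforces them at entry time.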

Pilot and translate survey
  • Pilot: in non-research areas, but in a similar setting
    - Depending on how ready the questionnaire is, 30 to 40 pilots
    - Can also pilot some sections more intensively
  • Translation: back translation is MANDATORY

Data collection: surveyors
  • Selection
  • Training: before the survey, and ongoing
    - Before the survey: classroom and field
    - Questionnaire + field instructions + behavior in the field
    - Training on the issue of interest
    - Also, if you have time to write an instruction manual, it is useful
    - Keep going to the field with them and do reminder trainings (e.g. when you notice they prompt too much)
  • Maintain motivation: go out with them, bonuses, etc.
  • STAY IN THE FIELD WITH THEM

Data collection: quality checks
  • Team structure
    - One supervisor for five surveyors
    - A field monitor, if your team is big, to help you manage the team
  • Monitoring in the field
    - Accompaniments by supervisor: all the time
    - Accompaniments by monitor: 75% of the time
    - Accompaniments by yourself: maybe 15% of the time
    - Back-checks by field monitor: 15% of questionnaires, some sections (mandatory!)
    - Do some back-checks yourself
    - Analyse the data from back-checks right away!
  • If you use a survey company, you still need to do your own back-checks and some accompaniments
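The back-check step can be sketched in Python as follows. This is only an illustration under assumptions: the function names, the data layout (dicts keyed by questionnaire id) and the field names are hypothetical; the 15% share is the one given in the slide.

```python
import random


def draw_backcheck_sample(questionnaire_ids, share=0.15, seed=None):
    """Randomly pick a share of questionnaires for back-checking."""
    ids = list(questionnaire_ids)
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    n = min(len(ids), max(1, round(len(ids) * share)))
    return sorted(rng.sample(ids, n))


def backcheck_mismatches(original, backcheck, questions):
    """Count, per surveyor, the answers that differ between the original
    interview and the back-check on the re-asked questions."""
    mismatches = {}
    for qid, revisit in backcheck.items():
        first = original[qid]
        diff = sum(1 for q in questions if first[q] != revisit[q])
        surveyor = first["surveyor"]
        mismatches[surveyor] = mismatches.get(surveyor, 0) + diff
    return mismatches
```

Tabulating mismatches by surveyor as soon as back-checks come in makes it easier to see who may need a reminder training.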

Questionnaire quality: scrutiny
  • Scrutinize questionnaires
    - Have surveyors and supervisors do it
    - But also do it yourself! If you have a project assistant, ask them to scrutinize 100%, but still scrutinize 50% or so yourself (at least the trickiest sections)
    - Examples of instances where only you can catch mistakes: codes for activity, logical consistency
  • When scrutinizing, write all codes, even if not precoded: -777 for missing, or -999 for "I don't know"
  • If you find too much missing data, or inconsistent data, send surveyors back to the field

Data management: goals
  • Quality
  • Timing
  • Timing is important, and you need to monitor the data entry operators (DEOs) or the data entry (DE) company carefully to make sure they stick to timelines, but by no means should you sacrifice any steps related to quality checks (if you save time on those steps, you'll lose time later).

Data entry software
  • Software
    - Need to think about it as soon as the questionnaire is close to final
    - Could be done by the survey company or outsourced to someone else (less expensive, or someone you trust more)
    - Goal is that the DEO should make as few mistakes as possible

Data entry software
  • Software development: send the developer a detailed spreadsheet with instructions for each question (the range of acceptable values, logical checks, etc.). The more detailed this is, the more time you'll save later.
  • Software testing: when Sachin or a software designer builds the software, you need to test it yourself by entering a bunch of questionnaires (e.g. pilot questionnaires, or invented responses; just make sure you test all the parts of the software).
  • Check output: then look at the output carefully and make sure it looks fine, and also send it to the professors you work with to make sure they are satisfied with the output.
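To make the idea of a per-question specification concrete, here is a minimal Python sketch of range checks driven by such a specification. The field names and ranges are hypothetical examples, not values from any actual questionnaire; the special codes -777 and -999 are the ones used elsewhere in these slides.

```python
# Hypothetical per-question specification, as it might be exported from the
# spreadsheet sent to the developer: (minimum, maximum) for each field.
SPEC = {
    "age": (0, 110),
    "hh_size": (1, 30),
}
SPECIAL_CODES = {-777, -999}  # missing / "I don't know"


def out_of_range(record, spec=SPEC):
    """Return the names of the fields whose value breaks the range rule."""
    bad = []
    for field, (low, high) in spec.items():
        value = record[field]
        if value in SPECIAL_CODES:
            continue  # special codes are always acceptable
        if not low <= value <= high:
            bad.append(field)
    return bad
```

Running the same specification over the test entries is one way to confirm that the software rejects what it is supposed to reject.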

Checking output
  • When checking output, try to imagine yourself analyzing the data!
  • All fields need to be numerical (except text fields, like comments or "other, specify"). Again, there is not much you can do with text fields when you analyse.
  • One example: when a question has multiple-choice responses (let's say the question is "Where do you take your water from?" and there are 5 options: well, tap, etc.), it should be treated as 5 questions (1. Do you take your water from the well? Yes or no. 2. Do you take your water from the tap? Yes or no. Etc.). The response to each of these questions will be a binary variable, i.e. either 1 (yes) or 0 (no). This becomes obvious if you put yourself in the shoes of the person who will analyse the data (among others, you!). If this is treated as only one question, and the DEO enters "1, 2, 5" in the single response field, you cannot do anything with that data!
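The water-source example can be sketched in Python as follows. Only "well" and "tap" come from the slide; the other three option names, the code numbers and the `water_` prefix are assumptions made for illustration.

```python
# Option codes for "Where do you take your water from?".
# "well" and "tap" are from the slide; the remaining names are hypothetical.
OPTIONS = {1: "well", 2: "tap", 3: "river", 4: "tanker", 5: "other"}


def to_indicators(selected_codes, options=OPTIONS, prefix="water_"):
    """Turn the list of ticked option codes into one 0/1 variable per option."""
    selected = set(selected_codes)
    return {prefix + name: int(code in selected) for code, name in options.items()}
```

A respondent who ticked options 1, 2 and 5 then yields five separate binary fields (`water_well`, `water_tap`, and so on) instead of a single unusable "1, 2, 5" entry.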

Data entry
  • Timing: data entry should start as soon as possible after data collection starts, and before collection is over!
  • Double entry: mandatory. Must be written in the contract.
    - One output
    - Two outputs, reconciled
  • Error checking: check the error rate on a regular basis (batches of 200 or 300 questionnaires), and before you do any cleaning.
  • Payment to DE company: include a clause in the contract that the first payment will be made only after 200 or so questionnaires have been given to you, and you have checked the error rate and found it to be less than 0.5%. Pay only after that.
  • Get bad data re-entered entirely, whatever the nature of the errors.
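When the company delivers two outputs from double entry, reconciling them starts with listing the fields where the two entries disagree. A minimal Python sketch, assuming each entry is a dict keyed by questionnaire id (the data layout and the function name are hypothetical):

```python
def entry_discrepancies(first_entry, second_entry):
    """Compare two independent entries keyed by questionnaire id.

    Returns a list of (questionnaire_id, field, first_value, second_value)
    for every field where the two entries disagree, including fields or
    questionnaires present in only one of them.
    """
    discrepancies = []
    for qid in sorted(set(first_entry) | set(second_entry)):
        a = first_entry.get(qid, {})
        b = second_entry.get(qid, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((qid, field, a.get(field), b.get(field)))
    return discrepancies
```

Each listed discrepancy is then settled by going back to the physical questionnaire.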

Error rate checking
  • What is it?
    - For each batch, re-enter a sample of data fields and compare this data with the data given by the company (for those fields)
    - Need approximately 3000 per batch
  • How to do it?
    - Divide your data into sub-sections (of about 25 questions). In some cases you will receive your data split into tabs; you can use those tabs as sub-sections if they are small enough
    - For each sub-section, randomly select 5% of the questionnaires in your batch
    - Enter data from that section of the selected questionnaires (using an Excel spreadsheet, or the data entry software)
    - Compare your dataset with the original data (use Stata, Excel, or comparison software), and check on the physical questionnaire who made the mistake
    - Error rate: number of errors made by the company / number of fields (one error is one field with a mistake, not one question!)
    - Calculate the error rate for each section, and overall
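The error rate calculation can be sketched in Python as follows. This is a sketch under assumptions: the nested-dict layout and the function name are hypothetical, and the re-entered sample is assumed to have already been verified against the physical questionnaires, so that every remaining difference is an error made by the company.

```python
def error_rates(company, reentered):
    """Compute the error rate per section and overall.

    Both arguments have the form {section: {questionnaire_id: {field: value}}}.
    `reentered` holds only the sampled questionnaires and is assumed to be
    verified against paper, so any difference counts as a company error.
    """
    rates = {}
    total_errors = total_fields = 0
    for section, sample in reentered.items():
        errors = fields = 0
        for qid, answers in sample.items():
            for field, true_value in answers.items():
                fields += 1  # one field, not one question
                if company[section][qid][field] != true_value:
                    errors += 1
        rates[section] = errors / fields if fields else 0.0
        total_errors += errors
        total_fields += fields
    rates["overall"] = total_errors / total_fields if total_fields else 0.0
    return rates
```

The overall rate can then be compared with the 0.5% threshold written into the contract.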

Data cleaning and organizing
  • Clean your data in a different file
  • Rename and label variables
  • Check for logical errors
  • Look at ranges and outliers
  • Do basic data summaries
  • Check for duplicate data
  • Check for missing data
  • Look at the distribution of data by surveyors/teams
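Two of the cleaning checks above (duplicates, and missing data broken down by surveyor) can be sketched in Python as follows. The record layout and field names are hypothetical; -777 is the missing-data code used earlier in these slides.

```python
from collections import Counter, defaultdict

MISSING = -777  # code for missing data


def duplicate_ids(records):
    """Return the questionnaire ids that appear more than once."""
    counts = Counter(r["id"] for r in records)
    return sorted(i for i, n in counts.items() if n > 1)


def missing_share_by_surveyor(records, fields):
    """Share of the given fields coded as missing, for each surveyor."""
    missing = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        for f in fields:
            total[r["surveyor"]] += 1
            if r[f] == MISSING:
                missing[r["surveyor"]] += 1
    return {s: missing[s] / total[s] for s in total}
```

A surveyor whose share of missing answers stands out from the rest of the team is a natural candidate for closer monitoring.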

Thank you!
