Data cleaning workshop Berlin, 8-10 June 2009 The Analysis of Interviewers‘ remarks Laura Crespo Spanish team CEMFI

Data cleaning workshop
Berlin, 8-10 June 2009

The Analysis of Interviewers’ remarks

Laura Crespo

Spanish team

CEMFI

This is based on my presentation for Wave 2 in Frankfurt, December 6 2007:

Based on the remarks and feedback from PL, NL, BE-fr, DK, GR and ES from Wave 2!

Comments and suggestions from other countries’ experiences are very welcome!

SHARE Data Cleaning Workshop

Reminder: When should a “remark” be recorded?

When a response (or non-response) needs to be commented.

When Blaise does not accept the answer provided by the respondent.

When a response is difficult to code.

When a response needs to be clarified.

Therefore,

Good news: They may contain very useful info for data cleaning (and also for SHARE users, working groups, country teams and even the survey agency).

They are an important source of info to detect errors, missing info, clarifications and problems with questions. They are one of the first things to look into.

Bad news: They are very iwer-specific (large heterogeneity across iwers, questions and even countries). They need a case-by-case analysis, which is very time consuming.

At some point, they will need translation to English.

Dealing with iwer remarks: Steps

Step 1: They will be provided by MEA in an Excel file with a particular format for categorization.

Step 2: Have a look at them and try to define specific categories based on their content and potential use. Very often we will need to check the corresponding question to fully understand the remark. Categories:

1. Remarks that should be investigated for data cleaning.

2. Useful remarks for researchers, working groups or country teams.

3. Both (useful for data cleaning and SHARE users).

4. Other remarks that should be investigated.
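As a rough illustration of Step 2, the four categories could be encoded and a first-pass suggestion attached to each remark before the manual review. This is only a sketch in Python: the keyword lists and the `suggest_category` helper are hypothetical, and the remarks still need a case-by-case check against the corresponding question.

```python
from enum import Enum


class Category(Enum):
    DATA_CLEANING = 1     # 1. should be investigated for data cleaning
    USEFUL_FOR_USERS = 2  # 2. useful for researchers, working groups, country teams
    BOTH = 3              # 3. useful for data cleaning and SHARE users
    OTHER = 4             # 4. other remarks that should be investigated


# Hypothetical keyword hints (assumption, not part of the SHARE procedure).
# Checked in this order; the first category with a matching keyword wins.
HINTS = {
    Category.BOTH: ("proxy",),
    Category.DATA_CLEANING: ("euro", "gross", "amount", "year"),
    Category.USEFUL_FOR_USERS: ("does not remember", "does not know", "test not performed"),
}


def suggest_category(remark_text):
    """Return a first-guess category for a remark; OTHER if no hint matches."""
    lowered = remark_text.lower()
    for category, keywords in HINTS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return Category.OTHER


if __name__ == "__main__":
    for text in ("Amount was given in gross terms", "Proxy answered", "???"):
        print(text, "->", suggest_category(text).name)
```

A suggestion like this only pre-sorts the Excel rows; remarks in a language other than English would need their own keyword lists.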

Step 3: Focus on those that may be useful for data cleaning and identify which correction should be made.

Step 4: Write programs to correct data or flag cases following instructions (example do-files from Wave 1 and Wave 2 provided by MEA?)
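The actual corrections are written as Stata do-files. Purely to illustrate the logic of Step 4, here is a minimal Python sketch that applies a list of "correct" or "flag" instructions to the data. The `Instruction` fields, the record layout and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Instruction:
    respondent_id: str     # hypothetical respondent identifier
    variable: str          # hypothetical variable name
    action: str            # "correct" or "flag"
    new_value: Any = None  # only used when action == "correct"


def apply_instructions(records, instructions):
    """Return (cleaned copy of records, list of flagged (id, variable) pairs).

    records maps respondent_id -> {variable: value}; the input is not modified.
    """
    cleaned = {rid: dict(values) for rid, values in records.items()}
    flags = []
    for ins in instructions:
        if ins.respondent_id not in cleaned:
            raise KeyError(f"unknown respondent: {ins.respondent_id}")
        if ins.action == "correct":
            cleaned[ins.respondent_id][ins.variable] = ins.new_value
        elif ins.action == "flag":
            flags.append((ins.respondent_id, ins.variable))
        else:
            raise ValueError(f"unknown action: {ins.action}")
    return cleaned, flags


if __name__ == "__main__":
    records = {"ES-001": {"income": 1200, "year_retired": 1895}}
    instructions = [
        Instruction("ES-001", "year_retired", "correct", 1995),
        Instruction("ES-001", "income", "flag"),
    ]
    cleaned, flags = apply_instructions(records, instructions)
    print(cleaned, flags)
```

Keeping the raw data untouched and writing corrections to a copy makes it easy to rerun the program when the instructions change.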

Step 2) Categories with different colours/columns

1. Remarks that should be investigated for data cleaning:

Specific amounts, frequencies, years, time periods (time consistency along the calendar or life cycle).

Currencies (maybe less problematic than in Wave 2).

Gross terms instead of net terms or vice versa.
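To help spot remarks of this kind, one could scan the remark text for years, other numbers and gross/net wording. The Python sketch below is an assumption about how such a scan might look, not an established SHARE tool: the patterns are hypothetical, English-only, and any four-digit number from 1900 to 2099 is treated as a year even if it is really an amount.

```python
import re

# Hypothetical patterns (assumptions): four-digit years and plain numbers
# with an optional decimal part written with "." or ",".
YEAR = re.compile(r"\b(?:19\d{2}|20\d{2})\b")
NUMBER = re.compile(r"\b\d+(?:[.,]\d+)?\b")
TERMS = ("gross", "net")


def scan_remark(text):
    """Collect candidate years, other numbers and gross/net mentions in a remark."""
    years = [int(match) for match in YEAR.findall(text)]
    numbers = [n for n in NUMBER.findall(text) if not YEAR.fullmatch(n)]
    lowered = text.lower()
    terms = [t for t in TERMS if re.search(rf"\b{t}\b", lowered)]
    return {"years": years, "numbers": numbers, "terms": terms}


if __name__ == "__main__":
    print(scan_remark("Pension of 850,50 per month since 1998, gross not net"))
```

The output only points to remarks worth opening; whether the reported answer needs correcting is still decided case by case.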

Remarks for data cleaning

Answer category: Information that may be recorded or imputed to one of the categories already defined (instead of the “Other” option) or should be back-checked with the reported answer:

– (RC) Sources of income maternity leave.
– (RP) Reasons for not living with a partner.
– (AC) Type of residence.
– (RE) Situation at 15 if no education, occupation (ISCO), economic activity (NACE), why worked part-time, reasons left job, title of the job.
– (GS) Reasons for non-completion, positions during the tests.
– (HS) Type of illness, reasons for no checks.
– (IV) Location and type of house.

Answer category (continued):

– (EP) Employment status, pensions, eligibility for pensions, occupation (ISCO), economic activity (NACE).
– (HO) Housing status.
– (HC) Health care payments.
– (GS, WS, PT) Positions during the tests.
– (CH) Age of children, education.
– (DN) Marital status.
– (PH) Illness and disorders, medication, surgery…
– (IV) Location and type of house.

Mistakes by iwers when coding the answers or the proxy status.

The system does not accept a particular answer (e.g., years, dates, amounts).

Corrected information that is included by the iwer when the respondent realizes that he/she made a mistake or previously reported wrong info (especially when inconsistencies are detected along the calendar).

2. Remarks useful for researchers, working groups or country teams:

Grip strength test not performed or interrupted due to illness, disabilities, fears, concerns, or because it was not safe.

Problems encountered during the physical test (due to distraction, lack of concentration or interest, nerves, specific physical impairments or conditions, ...). Presence of another person during the test.

Does not remember / does not know.

Does not know how to read or write.

Difficulties with Spanish (language problems).

Iwers' opinions about the reliability of the answers: contradictions, attitudes, random answers, reluctance…

Further clarifications or explanations of reported answers, e.g., help (or influence) provided by another person (spouse, children, others, …).

Problems or circumstances with the drop-offs (help provided by the iwer, by a relative, …).

More specific motives for non-response (private and sensitive information, does not understand the question), e.g., stillborn children, no available equipment to perform tests.

Complaints relating to the length of the questionnaire.

3. Both (useful for data cleaning and SHARE users):

Use of proxies (needs to be back-checked with SMS data and is also useful for researchers).

4. Other remarks that should be investigated:

Unclear meaning.

Phone numbers and addresses (may be important for contacts in next waves).

Some examples.

Step 3: Focus on remarks for data cleaning and identify the correction needed.

Step 4: Corrections (do-files):

Instructions on this?

Even if a correction or imputation cannot be made, the remark could still be useful for SHARE users, working groups/country teams and CentERdata (revision of the questionnaire for Wave 4). Production of a specific file with translation for this purpose? Translation of all remarks: Probably not worth it!

Thanks for your attention!

Julie’s instructions

Open discussion…
