IAB homepage: www.iab.de
Institut für Arbeitsmarkt- und Berufsforschung/Institute for Employment Research
A New Approach for Disclosure Control in the IAB Establishment Panel –
Multiple Imputation for a Better Data Access
Jörg Drechsler
Competence Center for Empirical Methods, Institute for Employment Research of the Federal Employment Agency, Germany
UNECE Work Session on Statistical Data Editing Bonn 25.09.2006-27.09.2006
Jörg Drechsler 26. September 2006
Overview
The IAB Establishment Panel
Three approaches for disclosure control via multiple imputation
Application of the full MI approach to the IAB Establishment Panel
First results
Proceedings/open questions
The IAB Establishment Panel
Annually conducted Establishment Survey (generally face-to-face interviews)
Since 1993 in Western Germany, since 1996 in Eastern Germany
Population: All establishments with at least one employee covered by social security
Source: Official Employment Statistics
Response rate of repeatedly interviewed establishments more than 80%
The IAB Establishment Panel: Sample/Weighting
Sample of more than 16,000 establishments in the last wave
Stratified sample: 20 economic branches x 10 size classes
Oversampling of large establishments
Yearly additional samples: newly founded firms and replacements for panel attrition
Weighting:
- inverse sampling probabilities
- adjustment to exogenous values
- probabilities to stay in the sample
The IAB Establishment Panel: Contents
Annual: employment structure, changes in employment, business policies, investment, training, remuneration, working hours, collective wage agreements, works councils
Bi- or triennial: innovations, government aid, further training, flexibility of working hours, business activities, contact with employment offices
Focus:
- 2001: innovation and modern technologies
- 2002: elderly employees and contact with the labour offices
Kölling, A. (2000): The IAB-Establishment Panel, Journal of Appl. Social Science Studies, 120: 2, 291-300.
(1) Fully Synthetic Data
Proposed by Rubin (1993)
Idea:
- Treat all the units from the population not included in the sample as missing data and impute them multiply
- Take random samples from the imputed population and release these samples to the public.
[Figure: schematic of the data with blocks X, Yinc and Yexc]
X: variables available for all units in the population
Y: variables available only for units in the survey
Yinc: units included in the survey
Yexc: units not included in the survey
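The two steps of the fully synthetic approach can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy population, the single X and single Y variable, the linear imputation model, and the helper names `fit_line` and `synthetic_samples`. For brevity the model parameters are fixed at their least-squares estimates, whereas a proper multiple imputation would also draw the parameters anew for each round.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept, slope and residual standard deviation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / (len(xs) - 2)) ** 0.5


def synthetic_samples(pop_x, observed_y, m, n_release, rng):
    """observed_y maps the index of a surveyed unit to its observed Y (Yinc)."""
    idx = sorted(observed_y)
    a, b, s = fit_line([pop_x[i] for i in idx], [observed_y[i] for i in idx])
    released = []
    for _ in range(m):
        # Step 1: impute Yexc, i.e. Y for every unit outside the survey.
        y_full = [
            observed_y[i] if i in observed_y else a + b * x + rng.gauss(0.0, s)
            for i, x in enumerate(pop_x)
        ]
        # Step 2: draw a simple random sample from the completed population.
        chosen = rng.sample(range(len(pop_x)), n_release)
        released.append([(pop_x[i], y_full[i]) for i in chosen])
    return released


rng = random.Random(2006)
pop_x = [rng.uniform(0.0, 10.0) for _ in range(1000)]  # toy X, known for all units
surveyed = rng.sample(range(1000), 100)                # toy survey of 100 units
observed_y = {i: 2.0 + 3.0 * pop_x[i] + rng.gauss(0.0, 1.0) for i in surveyed}

released = synthetic_samples(pop_x, observed_y, m=5, n_release=50, rng=rng)
print(len(released), len(released[0]))
```

Each of the m released samples would then be analysed separately and the results combined.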
(2) Imputation of Selected Variables
Observed values are replaced by imputed values only for variables that bear a high risk of disclosure (key variables)
Proposal: Replace only parts of each key variable in every imputation round and combine the imputed parts to achieve fully imputed variables.
Example: 3 variables and 3 imputation rounds
(3) Selective Multiple Imputation of Key Variables (SMIKe)
Suggested by Liu and Little (2002)
Only selected units of key variables are multiply imputed
Assume the dataset can be divided into a set of categorical key variables X and a set of continuous variables Y
Cross tabulation of X yields the vector x containing cell counts for all combinations of X
Cell counts lower than a previously defined sensitivity threshold possibly allow re-identification
These cells, combined with some non-sensitive cells closely related to the sensitive cells with regard to Y, are replaced by imputed values
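The first SMIKe step, finding the sensitive cells of the cross tabulation, can be sketched in Python as follows. The records, the key variables `branch` and `size`, the threshold of 3 and the function name `sensitive_cells` are hypothetical. The sketch covers only the identification of sensitive cells, not the choice of related non-sensitive cells or the imputation itself.

```python
from collections import Counter


def sensitive_cells(records, key_vars, threshold):
    """Return the key-variable combinations whose cell count is below threshold."""
    counts = Counter(tuple(r[k] for k in key_vars) for r in records)
    return {cell for cell, n in counts.items() if n < threshold}


# Toy data: two categorical key variables (hypothetical values).
records = (
    [{"branch": "A", "size": "small"}] * 3
    + [{"branch": "A", "size": "large"}] * 1
    + [{"branch": "B", "size": "small"}] * 2
    + [{"branch": "B", "size": "large"}] * 4
)

cells = sensitive_cells(records, key_vars=("branch", "size"), threshold=3)

# Units falling into a sensitive cell are candidates for imputation.
flagged = [i for i, r in enumerate(records)
           if (r["branch"], r["size"]) in cells]
print(sorted(cells), flagged)
```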
Generating a synthetic data set
Create a synthetic data set for selected variables from the 1997 wave of the Establishment Panel
Imputation for the whole population is not feasible
Draw a new sample from the Official Employment Statistics using the same sampling design as for the Establishment Panel (stratification by economic branch, size, and region)
Each stratum cell contains the same number of observations as the 1997 wave of the Establishment Panel
Additional information from the German Social Security Data (GSSD) for the imputation
[Figure: data schematic with blocks X, Yinc and Yexc, labelled "data from the IAB Establishment Panel", "data from the new sample" and "missing data"]
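Drawing a new sample whose stratum cells match the panel cell by cell might look like the following Python sketch. The toy sampling frame, the field names and the function `draw_matching_sample` are assumptions for illustration. Within each stratum the sketch uses simple random sampling, which the slides do not specify.

```python
import random
from collections import Counter, defaultdict


def stratum(unit):
    """Stratum cell of a unit: economic branch x size class x region."""
    return (unit["branch"], unit["size"], unit["region"])


def draw_matching_sample(frame, panel, rng):
    """Draw from the frame as many units per stratum cell as the panel contains."""
    target = Counter(stratum(u) for u in panel)
    by_stratum = defaultdict(list)
    for u in frame:
        by_stratum[stratum(u)].append(u)
    sample = []
    for cell, n in sorted(target.items()):
        # Simple random sampling without replacement inside the cell.
        sample.extend(rng.sample(by_stratum[cell], n))
    return sample


rng = random.Random(1997)
# Toy frame of 400 establishments with hypothetical stratum labels.
frame = [
    {"id": i, "branch": "AB"[i % 2], "size": "SL"[(i // 2) % 2],
     "region": "WE"[(i // 4) % 2]}
    for i in range(400)
]
panel = rng.sample(frame, 40)  # stands in for the surveyed establishments

new_sample = draw_matching_sample(frame, panel, rng)
print(len(new_sample))
```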
The German Social Security Data (GSSD)
Contains information on all employees covered by social security
Since 1973 all employers have been required to notify the social security agencies about all employees covered by social security.
The GSSD represents about 80% of the German workforce
Information from the GSSD is aggregated on the establishment level and is matched to the IAB Establishment Panel via the establishment identification number
Information on: number of employees by gender, schooling, mean age of the employees, mean wage of the employees…
Imputation procedure
For simplicity, newly founded establishments are excluded from the sampling frame and from the panel
8 new samples are drawn
The number of observations in each sample equals the number of observations in the panel: ns = np = 7,332
Every sample is imputed five times using chained equations
Number of variables in X = 24
Number of variables in Y = 48
Imputations are generated using IVEware by Raghunathan, Solenberger and Van Hoewyk (2001)
A regression by T. Zwick (2005) as a means of evaluation
Zwick analyses the productivity effects of different continuing vocational training forms in Germany
Results: vocational training is one of the most important measures to gain and keep productivity
Probit regression to explain why firms offer vocational training
13 Explanatory variables including: Share of qualified employees, establishment size, region, collective wage agreement, high qualification needs expected…
2 variables, based on the 1998 wave of the panel, are dropped for the evaluation
Binary variables in the original and in the synthetic data set
Variable | Survey mean | Synthetic data mean | Deviation
Training Yes/No | 0.7069 | 0.7229 | 2.25%
Redundancies expected | 0.2239 | 0.1880 | -16.01%
Many employees are expected to be on maternity leave | 0.0644 | 0.0811 | 25.84%
High qualification needs expected | 0.1551 | 0.1752 | 12.95%
Establishment size 20-199 | 0.3973 | 0.4092 | 3.00%
Establishment size 200-499 | 0.1348 | 0.1450 | 7.57%
Establishment size 500-999 | 0.0745 | 0.0777 | 4.29%
Establishment size 1000+ | 0.0942 | 0.0991 | 5.17%
Collective wage agreement | 0.7643 | 0.7562 | -1.06%
Apprenticeship training reaction on skill shortages | 0.3632 | 0.3725 | 2.58%
Training reaction on skill shortages | 0.4490 | 0.4693 | 4.52%
State-of-the-art technical equipment | 0.6513 | 0.7095 | 8.94%
Apprenticeship training | 0.6141 | 0.6398 | 4.17%
Continuous variables in the original and in the synthetic dataset
Variable | Survey mean | Synthetic data mean | Deviation
Share of qualified employees | 0.6741 | 0.6236 | -7.49%
Number of employees | 365.6238 | 356.1432 | -2.59%
Number of employees that participated in training measures | 110.2944 | 88.2385 | -20.00%
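The slides do not define the deviation column. It appears to be the difference between the synthetic and the survey mean relative to the survey mean, and the short Python sketch below rests on that assumption. Applied to the three continuous variables it reproduces the reported figures. For some binary variables the rounded means give slightly different values, presumably because the slide figures were computed from unrounded means.

```python
def relative_deviation(survey_mean, synthetic_mean):
    """Deviation of the synthetic mean from the survey mean, in percent."""
    return 100.0 * (synthetic_mean - survey_mean) / survey_mean


# (survey mean, synthetic data mean) as reported for the continuous variables
rows = {
    "Share of qualified employees": (0.6741, 0.6236),
    "Number of employees": (365.6238, 356.1432),
    "Number of employees in training measures": (110.2944, 88.2385),
}

deviations = {name: round(relative_deviation(s, y), 2)
              for name, (s, y) in rows.items()}
for name, dev in deviations.items():
    print(f"{name}: {dev:.2f}%")
```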
Results from the regression
Zwick: regression as performed by T. Zwick (n=6,258)
Imputed: regression with all missing data imputed (n=7,332)
Exogenous variable | Coefficient (Zwick) | z-value (Zwick) | Coefficient (imputed) | z-value (imputed)
Redundancies expected | 0.2610 | 4.58 | 0.2491 | 4.62
Many employees expected on maternity leave | 0.2516 | 2.49 | 0.2657 | 2.82
High qualification needs expected | 0.6407 | 8.1 | 0.6483 | 8.76
Apprenticeship training reaction on skill shortages | 0.1763 | 3.4 | 0.1142 | 2.05
Training reaction on skill shortages | 0.5974 | 11.91 | 0.5270 | 9.92
Establishment size 20-199 | 0.6827 | 15.19 | 0.6866 | 16.01
Establishment size 200-499 | 1.3514 | 15.71 | 1.3555 | 17.22
Establishment size 500-999 | 1.3984 | 11.75 | 1.3475 | 12.78
Establishment size 1000+ | 1.9725 | 9.15 | 1.9622 | 10.13
Share of qualified employees | 0.7663 | 10.28 | 0.7793 | 11.21
State-of-the-art technical equipment | 0.1755 | 4.16 | 0.1694 | 4.3
Collective wage agreement | 0.2450 | 5.46 | 0.2535 | 5.82
Apprenticeship training | 0.4199 | 9.31 | 0.4841 | 11.24
Complete data set and synthetic data set
Imputed: regression with all missing data imputed (n=7,332)
Synthetic: regression on the synthetic data (n=7,332)
Exogenous variable | Coefficient (imputed) | z-value (imputed) | Coefficient (synthetic) | z-value (synthetic)
Redundancies expected | 0.2491 | 4.62 | 0.2764 | 4.71
Many employees expected on maternity leave | 0.2657 | 2.82 | 0.2373 | 2.78
High qualification needs expected | 0.6483 | 8.76 | 0.6308 | 9.15
Apprenticeship training reaction on skill shortages | 0.1142 | 2.05 | 0.1442 | 2.66
Training reaction on skill shortages | 0.5270 | 9.92 | 0.5566 | 10.69
Establishment size 20-199 | 0.6866 | 16.01 | 0.5466 | 12.65
Establishment size 200-499 | 1.3555 | 17.22 | 1.0313 | 14.37
Establishment size 500-999 | 1.3475 | 12.78 | 1.1425 | 10.40
Establishment size 1000+ | 1.9622 | 10.13 | 1.2331 | 9.89
Share of qualified employees | 0.7793 | 11.21 | 0.8692 | 9.98
State-of-the-art technical equipment | 0.1694 | 4.3 | 0.2041 | 5.00
Collective wage agreement | 0.2535 | 5.82 | 0.3117 | 7.10
Apprenticeship training | 0.4841 | 11.24 | 0.4655 | 10.81
Proceedings/Open Questions
Use nonparametric approaches
Replace only selected variables
Measure the disclosure risk after imputation
Generate weights for the synthetic sample?
Thank you for your attention!
Rubin’s adjusted combining rules
• Imputation yields m different data sets
• Information from the data sets has to be combined to get valid estimates
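Point estimate: average of the point estimates from the different data sets

$$\hat{\theta}_{MI} = \frac{1}{m} \sum_{i=1}^{m} \hat{\theta}^{(i)}$$

Variance estimate as a combination of the variance within the data sets (W) and the variance between the data sets (B)

$$\widehat{\mathrm{var}}(\hat{\theta}_{MI}) = \frac{m+1}{m} B - W \qquad \left(\text{not } W + \frac{m+1}{m} B\right)$$

with

$$W = \frac{1}{m} \sum_{i=1}^{m} \widehat{\mathrm{var}}\left(\hat{\theta}^{(i)}\right), \qquad B = \frac{1}{m-1} \sum_{i=1}^{m} \left(\hat{\theta}^{(i)} - \hat{\theta}_{MI}\right)^{2}$$

An additional sampling step is necessary when creating synthetic data sets, so the variance B already reflects the variance within each population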
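A small Python sketch of these combining rules follows. It reads the adjusted rule for synthetic data as (1 + 1/m)·B − W and the standard multiple-imputation rule as W + (1 + 1/m)·B. The input numbers and the function name `combine` are hypothetical and serve only to make the arithmetic concrete.

```python
import statistics


def combine(estimates, variances):
    """Combine m point estimates and their estimated variances."""
    m = len(estimates)
    theta_mi = statistics.fmean(estimates)      # point estimate
    w = statistics.fmean(variances)             # within-data-set variance W
    b = statistics.variance(estimates)          # between variance B, divisor m - 1
    var_standard = w + (1 + 1 / m) * b          # standard MI rule
    var_synthetic = (1 + 1 / m) * b - w         # adjusted rule for synthetic data
    return theta_mi, w, b, var_standard, var_synthetic


# Hypothetical results from m = 5 synthetic data sets.
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
variances = [0.0004] * 5

theta_mi, w, b, var_standard, var_synthetic = combine(estimates, variances)
print(theta_mi, w, b, var_standard, var_synthetic)
```

Because W is subtracted, the adjusted estimate can turn negative when m is small, which is worth checking before taking a square root.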
Information from the two data sets
Information contained in the IAB Establishment Panel (wave 1997)
Available for establishments in the survey
- number of employees in June 1996
- qualification of the employees
- number of temporary employees
- number of agency workers
- working week (full-time and overtime)
- the firm's commitment to collective agreements
- existence of a works council
- turnover, advance performance and export share
- investment total
- overall wage bill in June 1997
- technological status
- age of the establishment
- legal form and corporate position
- overall company-economic situation
- reorganisation measures
- company further training activities
- additional information on new foundations
Covered in both datasets
- establishment number, branch and size
- location of the establishment
- number of employees in June 1997
Information contained in the German Social Security Data (from 1997)
Available for all German establishments with at least one employee covered by social security
- number of full-time and part-time employees
- short-time employment
- mean and standard deviation of the employees' age
- mean and standard deviation of wages of full-time employees
- mean and standard deviation of wages of all employees
- occupation
- schooling and training
- number of women and men
- number of German employees