1
Data Management (1)
“Application of Information and Communication Technology to Production and Dissemination of Official Statistics”
10 May – 11 July 2006
M Q Hasan
Lecturer / Statistician
UN Statistical Institute for Asia and the Pacific
Chiba, Japan
Email: [email protected]
2
Overview
Data management
Data management planning
Data management procedures
Data management software
Hands-on experience
References
3
Data management and the NSO
Data management during production
– Individual case
Data management after production
– Individual case
Data management
– All cases, long term
4
Data management
Management of data files
Management of files during analysis
Management of files afterwards
5
Data management
Management of data files
– Labeling data files
– Documentation
6
Data management
Management of files during analysis
– Version management
– Subset data
– Arrange files in different folders
– Index files
7
Data management
Management of files afterwards
– Pass them to the system administrator for future reference
8
DATA MANAGEMENT
Possible directory structure
[Diagram: folders labelled Data, TABLES, HEALTH, EDUCATION, MY_FILES]
9
These will lead to …
Production of credible data
Design of a robust, efficient, flexible and accessible storage system
Efficient procedures for sharing data with others
10
Data management before and during data processing
11
During DP planning:
Define the relevant aspects of a dataset.
Formulate a data preservation strategy.
Design an access procedure.
12
Defining the relevant aspects of a dataset
File format and file structure
Naming files
Creation and naming of variables
Variable labels
13
Defining the relevant aspects of a dataset
Choose the file structure according to the available computing resources and the experience of the data processors.
14
Defining the relevant aspects of a dataset
Documentation
– Assign responsibility for logging all processing activities
– Problems encountered
– How problems are to be solved
– Major decisions taken
15
DP: Documentation
Can be time consuming.
Should contain all information about the data, such as the survey method, sample information, time of collection, information about variables, missing values, etc.
Should start well before actual data processing.
Follow standards.
Preferably one file with references to other files.
16
DP: Documentation
Title: Child labour in Portugal: Social characterization of school-age children and their families, 1998.
Subtitle: Child labour in Portugal, 1998.
Alternative title: SIMPOC Portugal survey, 1998.
Parallel title: Trabalho Infantil em Portugal: Caracterização social dos menores em idade escolar e suas famílias, 1998 files.
17
DP: Documentation
Keywords. National survey, child, economic activity, child labour, household, household chores, etc.
Abstract. Purpose, nature, and scope of the child labour data collection; special characteristics of the contents, etc.
Time period covered. If the data were collected in 1999 and one question was “Did you work last year?”, the time period should be 1998-99.
18
DP: Documentation
Date of collection. Date(s) when the data were collected.
Country. Name of the country where the survey was conducted.
Geographic coverage. Total geographic scope of the data.
Geographic unit. Lowest level of geographic aggregation covered by the data, for example province, state, or district.
Unit of analysis. For most child labour surveys, the basic unit of analysis or observation is the individual person.
19
DP: Documentation
Time method. Panel, cross-sectional, trend, time-series, etc.
Data collector. The body responsible for administering the questionnaire or interview, or for compiling the data, e.g. the NSO.
Frequency of data collection. For example, first time.
Sampling procedure. Reference to sampling documents.
20
DP: Documentation
Mode of data collection. CAPI, CATI, etc.
Type of research instrument. Structured, semi-structured, open-ended questions, etc.
Actions to minimize losses. E.g. follow-up visits, supervisory checks, historical matching, etc.
Control operations. Methods used to facilitate data control.
21
DP: Documentation
Weighting. Reference to the appropriate document.
Cleaning operation. E.g. consistency checking, wild code checking, etc.
Response rate. Percentage of sample members who provided information.
Estimates of sampling error. Indication of how precisely one can estimate a population value from a given sample.
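As a small illustration of the response rate and sampling error items, the sketch below computes both for made-up counts. The function names, the figures, and the simple-random-sampling formula for the standard error of a proportion are assumptions for illustration only; a survey with a complex design would need design-based variance estimates instead.

```python
import math


def response_rate(responding: int, sampled: int) -> float:
    """Percentage of sample members who provided information."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return 100.0 * responding / sampled


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of an estimated proportion.

    Assumes simple random sampling (an assumption of this sketch).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical figures: 2000 sampled persons, 1800 respondents,
# and an estimated proportion of 0.2 among respondents.
rate = response_rate(responding=1800, sampled=2000)
se = proportion_standard_error(p=0.2, n=1800)
print(f"Response rate: {rate:.1f}%")
print(f"Standard error of the proportion: {se:.4f}")
```

Both figures would normally be recorded in the documentation file alongside a reference to the sampling document.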
22
DP: Documentation
Location. State where the data are currently stored (e.g. a national statistics office).
Availability status. Provide a statement of data availability.
Extent of data. Number of physical files that exist in a dataset.
Completeness of dataset. Describe whether any items of collected information were not included in the data file.
23
DP: Documentation
Access authority. Contact person or organization that controls access to the data collection.
Data use statement. Reference to the terms of use for the data collection, if any.
Citation requirement. Specify any text that should be cited in publications based on analysis of the data.
24
DP: Documentation
File contents. Short description of the file(s).
File structure. E.g. hierarchical, rectangular, relational, etc.
Record or record group. Describe the record groupings for hierarchical or relational files.
Label (of record). Detailed information for each record group.
Dimensions (of record). Physical characteristics of the record, such as the number of variables per record, number of cases, etc.
25
DP: Documentation
Overall case count. Number of cases or observations.
Overall variable count. Number of variables.
Data format. Delimited format, free format, software dependent, etc.
Missing data. Provide information such as whether missing-data codes are standardized across the collection, whether missing data are the result of merging, etc.
Software. Identify the software used to create the file, including the software version number.
Version statement. Version statement for the data file.
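Several of these file-level items can be derived from the data file itself rather than typed by hand. The following is a minimal sketch, assuming a comma-delimited file with a header row; the sample data, the variable names and the set of missing-value codes are hypothetical.

```python
import csv
import io

# Assumed missing-value codes; a real collection should document its own.
MISSING_CODES = {"", "NA", "-9"}


def summarise(text: str, delimiter: str = ",") -> dict:
    """Return case count, variable count and per-variable missing counts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    variables = list(reader.fieldnames or [])
    missing = {name: 0 for name in variables}
    cases = 0
    for row in reader:
        cases += 1
        for name in variables:
            value = row[name]
            # DictReader yields None for fields absent from a short row.
            if value is None or value.strip() in MISSING_CODES:
                missing[name] += 1
    return {
        "case_count": cases,
        "variable_count": len(variables),
        "data_format": "delimited",
        "missing": missing,
    }


# Hypothetical three-case extract.
SAMPLE = "hh_id,age,worked\n1,12,1\n2,,0\n3,9,-9\n"
summary = summarise(SAMPLE)
print(summary)
```

The resulting dictionary could be pasted into, or merged with, the documentation file for the dataset.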
26
DP: Documentation
List of variables with the following:
– whether the variable is a weight and, if not, the reference weight variable for this variable;
– question ID for the variable;
– which format has been used (e.g. SAS, SPSS);
– the number of decimal places in the variable;
– whether the options are discrete or continuous;
– which record type this variable belongs to.
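One way to keep these variable-level items consistent is to hold them in a small structured record. The sketch below is only an illustration: the class name, field names and example variables are hypothetical, and the rule that every non-weight variable must name its weight variable is one reading of the first item in the list.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VariableEntry:
    """Variable-level documentation for one variable (hypothetical layout)."""

    name: str
    question_id: str
    storage_format: str          # e.g. "SAS" or "SPSS"
    decimals: int                # number of decimal places
    is_continuous: bool          # False means discrete options
    record_type: str             # record type the variable belongs to
    is_weight: bool = False
    weight_variable: Optional[str] = None

    def __post_init__(self) -> None:
        # A non-weight variable should point at its reference weight.
        if not self.is_weight and self.weight_variable is None:
            raise ValueError(f"{self.name}: weight_variable is required")


entries = [
    VariableEntry("hh_weight", "-", "SPSS", 4, True, "household",
                  is_weight=True),
    VariableEntry("hours_worked", "Q12", "SPSS", 1, True, "person",
                  weight_variable="hh_weight"),
]

for entry in entries:
    print(asdict(entry))
```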
27
DP
Conversion of data files to other formats as required
Usually generated in a package-specific format
Convert data into other formats, if possible
Convert data into ASCII and generate a codebook
Reload the ASCII data using the same codebook
Recheck the data
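The convert, reload and recheck cycle can be sketched as a round trip through a fixed-width ASCII layout. This is a minimal illustration, not a description of any particular package: the codebook here is just a list of (variable, width) pairs, all values are integers, and the variable names and records are hypothetical.

```python
# Hypothetical codebook: variable name and field width, in file order.
CODEBOOK = [("hh_id", 4), ("age", 2), ("worked", 1)]


def to_ascii(records, codebook):
    """Write records as fixed-width lines laid out by the codebook."""
    lines = []
    for record in records:
        fields = []
        for name, width in codebook:
            text = str(record[name])
            if len(text) > width:
                raise ValueError(f"{name}={text!r} exceeds width {width}")
            fields.append(text.rjust(width))
        lines.append("".join(fields))
    return "\n".join(lines)


def from_ascii(text, codebook):
    """Reload fixed-width lines using the same codebook."""
    records = []
    for line in text.splitlines():
        record, position = {}, 0
        for name, width in codebook:
            record[name] = int(line[position:position + width])
            position += width
        records.append(record)
    return records


records = [
    {"hh_id": 1, "age": 12, "worked": 1},
    {"hh_id": 2, "age": 9, "worked": 0},
]
ascii_text = to_ascii(records, CODEBOOK)
reloaded = from_ascii(ascii_text, CODEBOOK)

# Recheck: the reloaded data must match what was exported.
round_trip_ok = reloaded == records
print(ascii_text)
print("Round trip OK:", round_trip_ok)
```

If the recheck fails, the codebook and the exported file disagree, and neither should be archived until the difference is explained.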
28
DATA MANAGEMENT
Storage of all files
Possible list/type of files
– Data in a package-specific format
– Data in ASCII with the necessary data dictionary
– Public use data
– Public use data in ASCII with the necessary data dictionary
– Final documentation
– Questionnaire
29
DATA MANAGEMENT
Storage of all files
Possible list/type of files (contd.)
– Logical rules for consistency checks
– Computer program files
– Interviewer's and/or supervisor's instruction manual
– Coding file(s)
– Sampling and weight files
30
DATA MANAGEMENT
Storage of all files
Group them by version, type, etc.
Create an index file for each sub-directory.
In the index file, add a short description of each file according to its contents.
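An index file of this kind can be generated rather than maintained by hand. The sketch below lists the files in one sub-directory and writes a tab-separated index with a short description for each; the index file name `INDEX.txt`, the example file names and the descriptions are all hypothetical.

```python
import tempfile
from pathlib import Path

INDEX_NAME = "INDEX.txt"  # assumed name for the index file


def write_index(directory: Path, descriptions: dict) -> Path:
    """Write an index listing every file in the directory with a description."""
    index_path = directory / INDEX_NAME
    lines = []
    for item in sorted(directory.iterdir()):
        if item.is_file() and item.name != INDEX_NAME:
            note = descriptions.get(item.name, "NO DESCRIPTION")
            lines.append(f"{item.name}\t{note}")
    index_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return index_path


# Demonstration in a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "Ver_1" / "Data"
    data_dir.mkdir(parents=True)
    (data_dir / "person.dat").write_text("   1121\n", encoding="utf-8")
    (data_dir / "codebook.txt").write_text("hh_id 4\n", encoding="utf-8")
    index = write_index(
        data_dir, {"person.dat": "Person-level records in ASCII"}
    )
    index_text = index.read_text(encoding="utf-8")

print(index_text)
```

Files without a description are flagged rather than silently skipped, which makes gaps in the index easy to spot.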
31
DATA MANAGEMENT
Formulating a data preservation strategy
Hardware
Automation software
Directory structure
32
DATA MANAGEMENT
Possible directory structure
[Diagram: folders labelled CLS, INTERNAL, EXTERNAL, Ver_1, Ver_2]
33
DATA MANAGEMENT
Possible directory structure (contd.)
Ver_1
– Index file
– Data
  – Index file
  – Data file 1
  – Data file 2
  – Data file 3
  – Codebook
– Document
  – Index file
  – Metadata file
  – Program file
  – Questionnaire
  – Etc.
– Report
  – Index file
  – Country profile
  – Country report
  – Other reports
34
DATA MANAGEMENT
Possible directory structure (contd.)
EXTERNAL
– Index file
– Data
  – Index file
  – Data file 1
  – Data file 2
  – Codebook
– Document
  – Index file
  – Metadata file
  – Manual file
  – Questionnaire
– Report
  – Index file
  – Country profile
  – Country report
  – Other reports
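A directory layout like the one on these slides can be created by a short script so that every survey starts from the same skeleton. The sketch below is one possible reading of the diagrams: placing Ver_1 and Ver_2 under INTERNAL, using CLS as the root, and naming each index file `INDEX.txt` are assumptions, not something the slides state.

```python
import tempfile
from pathlib import Path

# Assumed layout: versioned folders under INTERNAL, one EXTERNAL folder.
BRANCHES = ("INTERNAL/Ver_1", "INTERNAL/Ver_2", "EXTERNAL")
SUBFOLDERS = ("Data", "Document", "Report")
INDEX_NAME = "INDEX.txt"  # assumed index file name


def create_tree(root: Path) -> list:
    """Create the folder skeleton with an empty index file at each level."""
    created = []
    for branch in BRANCHES:
        for sub in SUBFOLDERS:
            folder = root / branch / sub
            folder.mkdir(parents=True, exist_ok=True)
            (folder / INDEX_NAME).touch()
            created.append(folder.relative_to(root).as_posix())
        # Index file for the branch itself.
        (root / branch / INDEX_NAME).touch()
    return created


# Demonstration in a temporary directory; "CLS" is the assumed root name.
with tempfile.TemporaryDirectory() as tmp:
    folders = create_tree(Path(tmp) / "CLS")

for name in folders:
    print(name)
```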
35
DATA MANAGEMENT
Designing an access procedure
Access policy
– Safekeeping person: system administrator
– Contact person: supervisor
– Content-modifying authority: supervisor
Finalize the access conditions for each file
36
DATA DISSEMINATION
Data type
Micro data
Aggregate tables
Executive summary
Reports
37
DATA DISSEMINATION
Methods
Online: direct access through the internet in real time
Offline: available on request
38
DATA MANAGEMENT
Designing an access procedure
Backup policy
During data processing
– Data processors' responsibility
After finalization of data and documentation
– System administrator's responsibility
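Whoever holds the backup responsibility, a backup is only useful if the copy is known to match the original. The following is a minimal sketch of a copy-and-verify step using a SHA-256 checksum; the function names, folder names and file contents are hypothetical, and a real backup policy would also cover scheduling, off-site copies and retention.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def backup_file(source: Path, backup_dir: Path) -> Path:
    """Copy a file into the backup folder and verify the copy."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"Backup of {source.name} does not match the original")
    return target


# Demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    work_dir = Path(tmp) / "work"
    work_dir.mkdir()
    data_file = work_dir / "person.dat"
    data_file.write_text("   1121\n   2 90\n", encoding="utf-8")

    copy = backup_file(data_file, Path(tmp) / "backup")
    backup_matches = sha256_of(copy) == sha256_of(data_file)
    backup_name = copy.name

print("Backed up:", backup_name, "verified:", backup_matches)
```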
39
END