Data Management (1)
“Application of Information and Communication Technology to Production and Dissemination of Official statistics”
10 May – 11 July 2006

M Q Hasan
Lecturer/Statistician
UN Statistical Institute for Asia and the Pacific
Chiba, Japan
Email: [email protected]


Overview

Data management
Data management planning
Data management procedures
Data management software
Hands-on experience
References

Data management and the NSO

Data management during production
– Individual case
Data management after production
– Individual case
Data management
– All cases, long term

Data management

Management of data files
Management of files during analysis
Management of files afterwards

Data management

Management of data files
– Labeling data files
– Documentation

Data management

Management of files during analysis
– Version management
– Subset data
– Arrange files in different folders
– Index files

Data management

Management of files afterwards
– Pass them to the system administrator for future reference

DATA MANAGEMENT

[Figure: possible directory structure, with folders labeled MY_FILES, HEALTH, EDUCATION, Data and TABLES]

These will lead to …

Production of credible data
Design of a robust, efficient and flexible storage and access system
Efficient procedure for sharing data with others

Data management before and during data processing

During data processing (DP) planning:

Define the relevant aspects of a dataset.
Formulate a data preservation strategy.
Design an access procedure.

Defining the relevant aspects of a dataset

File format and file structure
Naming files
Creation and naming of variables
Variable labels

Defining the relevant aspects of a dataset

Choose the file structure according to the available computing resources and the experience of the data processors.

Defining the relevant aspects of a dataset

Documentation
– Assign responsibility for logging all processing activities
– Problems encountered
– How problems are to be solved
– Major decisions taken

DP: Documentation

Can be time consuming.
Should contain all information about the data, such as survey method, sample information, time of collection, information about variables, missing values, etc.
Should start well before actual data processing.
Follow standards.
Preferably one file with references to other files.

DP: Documentation

Title: Child labour in Portugal: Social characterization of school-age children and their families, 1998.
Subtitle: Child labour in Portugal, 1998.
Alternative title: SIMPOC Portugal survey, 1998.
Parallel title: Trabalho Infantil em Portugal: Caracterização social dos menores em idade escolar e suas famílias, 1998 files.

DP: Documentation

Keywords. National survey, child, economic activity, child labour, household, household chores, etc.
Abstract. Purpose, nature, and scope of the child labour data collection. Special characteristics of the contents, etc.
Time period covered. If the data were collected in 1999 and one question was “Did you work last year?”, the time period should be 1998-99.

DP: Documentation

Date of collection. Date(s) when the data were collected.
Country. Name of the country where the survey was conducted.
Geographic coverage. Total geographic scope of the data.
Geographic unit. Lowest level of geographic aggregation covered by the data, for example province, state, or district.
Unit of analysis. For most child labour surveys, the basic unit of analysis or observation is the individual person.

DP: Documentation

Time method. Panel, cross-sectional, trend, time-series, etc.
Data collector. The body responsible for administering the questionnaire or interview, or for compiling the data, e.g. the NSO.
Frequency of data collection. For example, first-time.
Sampling procedure. Reference to sampling documents.

DP: Documentation

Mode of data collection. CAPI, CATI, etc.
Type of research instrument. Structured, semi-structured, open-ended questions, etc.
Actions to minimize losses. E.g. follow-up visits, supervisory checks, historical matching, etc.
Control operations. Methods used to facilitate data control.

DP: Documentation

Weighting. Reference to appropriate document.
Cleaning operation. E.g. consistency checking, wild code checking, etc.
Response rate. Percentage of sample members who provided information.
Estimates of sampling error. Indication of how precisely one can estimate a population value from a given sample.
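Two of the items above, wild code checking and the response rate, can be illustrated with a minimal Python sketch. The function names, the sample records and the set of valid codes are hypothetical and are not taken from any survey described in these slides.

```python
def find_wild_codes(records, variable, valid_codes):
    """Return (row index, value) pairs whose value lies outside the valid code set."""
    return [
        (i, record.get(variable))
        for i, record in enumerate(records)
        if record.get(variable) not in valid_codes
    ]


def response_rate(responded, sample_size):
    """Percentage of sample members who provided information."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return 100.0 * responded / sample_size


# Hypothetical data: 'sex' is assumed to be coded 1 or 2, so 7 is a wild code.
records = [{"sex": 1}, {"sex": 2}, {"sex": 7}, {"sex": 1}]
wild = find_wild_codes(records, "sex", valid_codes={1, 2})
rate = response_rate(responded=850, sample_size=1000)
```

Here `wild` lists the rows to send back for verification, and `rate` is the figure that would be recorded under "Response rate" in the documentation.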

DP: Documentation

Location. Say where the data are currently stored (e.g. a national statistics office).
Availability status. Provide a statement of data availability.
Extent of data. Number of physical files that exist in a dataset.
Completeness of dataset. Describe whether any items of collected information were not included in the data file.

DP: Documentation

Access authority. Contact person or organization that controls access to the data collection.
Data use statement. Reference to the terms of use for the data collection, if any.
Citation requirement. Specify any text that should be cited in publications based on analysis of the data.

DP: Documentation

File contents. Short description of the file(s).
File structure. E.g. hierarchical, rectangular, relational, etc.
Record or record group. Describe the record groupings for hierarchical or relational files.
Label (of record). Detailed information for each record group.
Dimensions (of record). Physical characteristics of the record, such as number of variables per record, number of cases, etc.

DP: Documentation

Overall case count. Number of cases or observations.
Overall variable count. Number of variables.
Data format. Delimited format, free format, software dependent, etc.
Missing data. Provide information such as that missing data have been standardized across the collection, that missing data are the result of merging, etc.
Software. Identify the software used to create the file, including the software version number.
Version statement. Version statement for the data file.
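The documentation items listed on these slides can be treated as a checklist. The sketch below is one possible way to do that in Python: it holds a study-level record as a dictionary and reports which items are still empty. The field names are hypothetical spellings of a subset of the slide headings, and the draft record is only partly filled in for illustration.

```python
# Hypothetical field names derived from the documentation headings on the slides.
REQUIRED_FIELDS = [
    "title", "keywords", "abstract", "time_period_covered",
    "date_of_collection", "country", "geographic_coverage",
    "unit_of_analysis", "mode_of_data_collection", "response_rate",
    "location", "access_authority", "file_structure",
    "overall_case_count", "overall_variable_count",
    "software", "version_statement",
]


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required documentation items that are absent or empty."""
    return [name for name in required if record.get(name) in (None, "", [])]


# A partly completed draft, using the example title shown on the slides.
draft = {
    "title": "Child labour in Portugal: Social characterization of "
             "school-age children and their families, 1998.",
    "country": "Portugal",
    "keywords": ["national survey", "child", "economic activity", "child labour"],
}
todo = missing_fields(draft)
```

Running such a check before the files are handed over makes gaps in the documentation visible while the people who can fill them are still available.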

DP: Documentation

List of variables with the following:
– whether the variable is a weight and, if not, the reference weight variable for this variable;
– question ID for the variable;
– which format has been used (e.g. SAS, SPSS);
– the number of decimal points in the variable;
– whether the options are discrete or continuous;
– which record type this variable belongs to.
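The per-variable items above map naturally onto a small record type. The following Python sketch is an assumption about how such an entry could be stored and checked; the class name `VariableDoc`, its attribute names and the example variables are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VariableDoc:
    """One entry in the list of variables (attribute names are illustrative)."""
    name: str
    question_id: str
    storage_format: str        # e.g. "SAS" or "SPSS"
    decimals: int              # number of decimal points
    is_discrete: bool          # False means continuous
    record_type: str           # record type the variable belongs to
    is_weight: bool = False
    weight_variable: Optional[str] = None  # reference weight if not itself a weight

    def problems(self) -> List[str]:
        """Return documentation gaps for this variable."""
        issues = []
        if not self.is_weight and self.weight_variable is None:
            issues.append(f"{self.name}: no reference weight variable given")
        if self.decimals < 0:
            issues.append(f"{self.name}: decimals cannot be negative")
        return issues


# Hypothetical entries: a weight variable and a survey variable that refers to it.
wgt = VariableDoc("wgt", "-", "SPSS", 4, False, "person", is_weight=True)
age = VariableDoc("age", "Q3", "SPSS", 0, True, "person", weight_variable="wgt")
undocumented = VariableDoc("hours", "Q12", "SPSS", 1, False, "person")
```

Calling `problems()` on every entry gives a quick list of variables whose documentation is incomplete.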

DP: Conversion of data files to other formats as required

Usually generated in a package-specific format.
Convert data into other formats, if possible.
Convert data into ASCII and generate codebook.
Reload ASCII data using the same codebook.
Recheck data.
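The convert, reload and recheck steps above can be sketched as a round trip in Python. This is only an illustration under stated assumptions: the codebook is a hypothetical list of variable names and types, the ASCII file is written as comma-delimited text with the standard `csv` module, and the "recheck" is a plain equality comparison with the original records.

```python
import csv
import os
import tempfile

# Hypothetical codebook: variable name, storage type and label.
codebook = [
    {"name": "hh_id", "type": "int", "label": "Household ID"},
    {"name": "age", "type": "int", "label": "Age in completed years"},
    {"name": "hours", "type": "float", "label": "Hours worked last week"},
]
CASTS = {"int": int, "float": float, "str": str}


def export_ascii(rows, codebook, path):
    """Write the records as delimited ASCII text, columns ordered by the codebook."""
    names = [var["name"] for var in codebook]
    with open(path, "w", newline="", encoding="ascii") as f:
        writer = csv.DictWriter(f, fieldnames=names)
        writer.writeheader()
        writer.writerows(rows)


def reload_ascii(path, codebook):
    """Read the ASCII file back, restoring each variable's type from the codebook."""
    with open(path, newline="", encoding="ascii") as f:
        return [
            {var["name"]: CASTS[var["type"]](raw[var["name"]]) for var in codebook}
            for raw in csv.DictReader(f)
        ]


rows = [
    {"hh_id": 1, "age": 12, "hours": 10.5},
    {"hh_id": 2, "age": 9, "hours": 0.0},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "survey.csv")
    export_ascii(rows, codebook, path)
    reloaded = reload_ascii(path, codebook)

# Recheck: the reloaded data should match the original records exactly.
round_trip_ok = reloaded == rows
```

If `round_trip_ok` is false, either the codebook or the export has lost information, which is exactly what the recheck step is meant to catch.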

DATA MANAGEMENT: Storage of all files

Possible list/type of files
– Data in a package-specific format
– Data in ASCII with necessary data dictionary
– Public use data
– Public use data in ASCII with necessary data dictionary
– Final documentation
– Questionnaire

DATA MANAGEMENT: Storage of all files

Possible list/type of files (contd.)
– Logical rules for consistency check
– Computer program files
– Interviewer and/or supervisor’s instruction manual
– Coding file(s)
– Sampling and weight files

DATA MANAGEMENT: Storage of all files

Group them considering version, type, etc.
Create an index file associated with each sub-directory.
Add a short description of each file, according to the file contents, in the index file.
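As an illustration of the index-file idea, the Python sketch below writes one index file for a directory, with a short description per file. The file name `INDEX.txt`, the tab-separated layout and the example file names are assumptions made for this sketch, not a prescribed format.

```python
import os
import tempfile


def write_index(directory, descriptions, index_name="INDEX.txt"):
    """Write an index file listing every file in the directory with a description."""
    lines = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if name == index_name or not os.path.isfile(full_path):
            continue  # skip the index itself and any sub-directories
        description = descriptions.get(name, "NO DESCRIPTION YET")
        lines.append(f"{name}\t{description}")
    with open(os.path.join(directory, index_name), "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return lines


# Demonstration in a temporary directory with two placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("data_v1.csv", "codebook.txt"):
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write("placeholder\n")
    index_lines = write_index(tmp, {"data_v1.csv": "Cleaned data, version 1"})
    index_exists = os.path.isfile(os.path.join(tmp, "INDEX.txt"))
```

Files without a description are flagged rather than skipped, so the index also shows what still needs documenting.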

DATA MANAGEMENT: Formulating a data preservation strategy

Hardware
Automation software
Directory structure

DATA MANAGEMENT

[Figure: possible directory structure, with folders labeled CLS, INTERNAL, EXTERNAL, Ver_1 and Ver_2]

DATA MANAGEMENT

Possible directory structure (contd.): contents of Ver_1

Ver_1
– Index file
– Data: Index file, Data file 1, Data file 2, Data file 3, Codebook
– Document: Index file, Metadata file, Program file, Questionnaire, etc.
– Report: Index file, Country profile, Country report, Other reports

DATA MANAGEMENT

Possible directory structure (contd.): contents of EXTERNAL

EXTERNAL
– Index file
– Data: Index file, Data file 1, Data file 2, Codebook
– Document: Index file, Metadata file, Manual file, Questionnaire
– Report: Index file, Country profile, Country report, Other reports
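A directory skeleton of this kind can be created by a short script. The Python sketch below assumes one reading of the diagrams: a root folder CLS with INTERNAL and EXTERNAL branches, version folders Ver_1 and Ver_2 under INTERNAL, and Data, Document and Report folders at the bottom, each with its own index file. The exact nesting and the file name `INDEX.txt` are assumptions, since the diagrams show only the folder labels.

```python
import tempfile
from pathlib import Path

# Assumed nesting; the slides show only the labels, not the full hierarchy.
BRANCHES = ["INTERNAL/Ver_1", "INTERNAL/Ver_2", "EXTERNAL"]
SUBDIRS = ["Data", "Document", "Report"]


def create_skeleton(root):
    """Create the folder tree under root with an empty index file in each leaf."""
    root = Path(root)
    created = []
    for branch in BRANCHES:
        for sub in SUBDIRS:
            folder = root / branch / sub
            folder.mkdir(parents=True, exist_ok=True)
            (folder / "INDEX.txt").touch()  # placeholder index file
            created.append(folder.relative_to(root).as_posix())
    return created


# Demonstration in a temporary location so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    created = create_skeleton(Path(tmp) / "CLS")
```

Creating the tree from a script keeps the layout identical across surveys, which makes files easier to find later.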

DATA MANAGEMENT: Designing an access procedure

Access policy
– Safe keeping person: system administrator
– Contact person: supervisor
– Content modifying authority: supervisor
– Finalize access conditions for each file

DATA DISSEMINATION: Data type

Micro data
Aggregate tables
Executive summary
Reports

DATA DISSEMINATION: Methods

Online: direct access through the internet in real time
Offline: available on request

DATA MANAGEMENT: Designing an access procedure

Backup policy
– During data processing: data processors’ responsibility
– After finalization of data and documentation: system administrator’s responsibility


END