EVALUATION OF DATA EDITING PROCESS
by Leopold Granquist, Statistics Sweden
This Chapter aims to serve as a basis to elaborate and finally establish standards for useful indicators and/or statistical measures of the rationality and efficiency of the data editing process, and for indicating the problem areas of the data collection. What should be measured, and how it should be done, are the underlying issues throughout the Chapter.

An important step towards the mentioned goal is taken in the first paper. It outlines the general requirements for a computer system to measure and monitor the impact of data editing and imputation for the 2001 UK Census of Population. The UK Office for National Statistics identified the need when evaluating the 1991 Census operation. The paper documents the requirements for the "Data Quality Monitoring System" (DQMS), which have been gathered so far. DQMS will be developed iteratively on a prototype basis and the requirements will be enhanced or re-prioritized as work proceeds. One key requirement will be to allow data experts to check the assumptions built into the capture, coding, editing, and imputation of census data, and the impact of these assumptions on the data. This requirement is broken down into two parts: standard reports and ad-hoc enquiry facilities. The latter will allow intuitive and complementary analysis of the data. A number of standard reports and requirements for the ad-hoc reports are proposed. All of them will tell, implicitly or explicitly, what is recognized as important to measure in processing census data.

The second paper, written by Bogdan Stefanowicz, proposes indicators of the rationality and the effectiveness of the set of edits in detecting all errors. The author suggests improvements to take into account that different types of errors may have a different impact on quality. It does not deal with errors introduced by editors. The efficiency indicator involves the number of undetected errors, which cannot be found from studies of error lists. The author suggests that it might be estimated by simulations, but does not discuss this issue further. The second part of the paper discusses the role of error lists in evaluating editing processes and as a basis for improvements of the data collection.

The third paper is an overview of selected evaluation studies. Most studies use the error list method in different ways, and perform analysis with the aid of computers. The rationality indicator suggested by Stefanowicz is used by some authors and called the hit-rate. A number of evaluations are focused on the efficiency of editing processes and raise the question whether resources spent on editing are justified in terms of quality improvements. In some cases the question can be answered by studying the impact of editing changes on the estimates. How to carry out such studies is also described in Chapter 1. However, those methods cannot be used for measuring the data quality. To obtain measures of how editing affects quality, it is necessary to conduct reinterview studies, record check studies or simulation studies. Examples of such studies are presented in the paper, which also provides hints as to how the various methods can be evaluated. Some results from almost all of the studies are given.

The fourth paper is a description of two evaluation studies, each one consisting of a comparison of micro-data collected from two different data sources: survey data and administrative data. This is a unique situation. Firstly, among the evaluation studies discussed in the second paper by Granquist, there is no evaluation of statistics collected from administrative data files. Secondly, in general, data from administrative sources are not available for comparing survey micro-data with administrative data, except for a few items from the register used as the sampling frame. Although the paper does not provide details concerning the editing methods, it covers many aspects of editing and evaluation. The need for resources to measure or assess the impact of editing is stressed.
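The hit-rate mentioned above lends itself to a simple computation: of the records flagged by the edits, what share turned out on follow-up to contain a genuine error. The following minimal sketch illustrates the idea; the record identifiers and the follow-up result are invented, not taken from any of the studies:

```python
# Sketch of the "hit-rate" (rationality) indicator: the share of
# edit-flagged records that actually contain an error.
# Record identifiers below are hypothetical illustrations.

def hit_rate(flagged, truly_in_error):
    """Fraction of flagged records confirmed as true errors."""
    flagged = set(flagged)
    if not flagged:
        return 0.0
    return len(flagged & set(truly_in_error)) / len(flagged)

# Example: the edits flagged records 1, 2, 5 and 9; follow-up showed
# only 1 and 5 were genuine errors (and record 7 went undetected,
# which is what the efficiency indicator concerns).
rate = hit_rate([1, 2, 5, 9], [1, 5, 7])
print(rate)  # 0.5
```

Note that the undetected error (record 7 here) cannot be seen in any error list, which is why the efficiency indicator must be estimated by other means, such as simulation.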
STATISTICAL MEASUREMENT AND MONITORING OF DATA EDITING AND
IMPUTATION IN THE 2001 UNITED KINGDOM CENSUS OF POPULATION
By Jan Thomas, Census Division, Office for National Statistics, United Kingdom
The Census Offices of the United Kingdom have
identified the need to measure and monitor the impacts
of the processing systems on census data.
Requirements have been identified for a computer
system to be known as the “Data Quality Monitoring
System” to give this information.
The system will produce a series of standard and
ad hoc reports, and will provide comparisons of
distributions before and after editing and imputation
and simple cross tabulations at various area levels.
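A standard report of the kind just described, comparing an item's frequency distribution before and after editing and imputation, can be sketched in outline as follows; the item, its codes and the data are invented purely for illustration:

```python
from collections import Counter

def distribution_shift(before, after):
    """Tabulate an item's frequency distribution before and after
    editing/imputation, with the net change per category, in the
    style of a DQMS standard report (illustrative sketch only)."""
    b, a = Counter(before), Counter(after)
    return {c: (b.get(c, 0), a.get(c, 0), a.get(c, 0) - b.get(c, 0))
            for c in sorted(set(b) | set(a))}

# Illustrative data: a categorical item with blanks before imputation.
before = ["M", "S", "", "M", "", "W"]
after  = ["M", "S", "M", "M", "S", "W"]
for cat, (nb, na, diff) in distribution_shift(before, after).items():
    print(f"{cat or '<blank>'}: before={nb} after={na} change={diff:+d}")
```

Run over every item, and broken down by geographical area, tables of this kind show at a glance where imputation is moving the distributions.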
The data can be visualised, possibly by geographical
area, to see where the error is occurring. Work has
started on a prototype system and it is hoped that the
prototype will be developed for use in the 1997 Census.
In addition, it is planned to appoint a team of “data
experts” to analyse and interpret the results that will be
reported from the system.
This paper outlines the general requirements of the
system and those specifically relating to editing and
imputation.
Keywords: editing; imputation; measuring and
monitoring the data quality.
1. INTRODUCTION

The UK Census Offices have identified the need
to measure and monitor the impacts of processing
systems on census data. The evaluation of the 1991
Census operation highlighted the fact that the facilities
which were in place to monitor the quality of the data
were inadequate, and were employed too late to have
any impact on problem resolution. Research is
currently underway to produce a computer system
which will measure and monitor the data as it is being
processed in the 2001 Census.
It is planned that the use of this system, to be
known as the "Data Quality Monitoring System"
(DQMS), will be extended to cover data capture,
coding, derivation of variables and sampling. This
paper considers the editing and imputation
requirements only as they are relevant to the data quality.
It is recognised that to operate this system a team
of people who are experts in data analysis will be
needed. It is planned to appoint a team of six "Data
Experts", with the responsibility for monitoring data as
it is processed.
2. BACKGROUND TO EDITING AND
IMPUTATION IN THE BRITISH CENSUS
In the 1991 Census, the edit system checked the
validity of data and performed sequence and structure
checks. Invalid, missing and inconsistent items were
identified for the imputation process. The editing
process filled in a few missing items. The edit matrices
were constructed so as to consider every possible
combination of values for relevant items and to give
the action (if any) required should that combination
arise, by making the least number of changes.
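An edit step of the kind described can be illustrated in outline: validity checks against each item's domain of valid codes, plus consistency checks across items, with failing items flagged for the imputation process. The fields, domains and the single consistency rule below are hypothetical simplifications; the 1991 edit matrices covered every possible combination of values:

```python
# Illustrative sketch of a census-style edit step: validity and
# consistency checks that flag items for later imputation.
# The items, valid-code domains and the one consistency rule are
# hypothetical, not the 1991 specification.

VALID = {
    "marital_status": {"S", "M", "W", "D"},
    "age": set(range(0, 121)),
}

def edit_record(rec):
    """Return the set of items flagged as invalid, missing,
    or inconsistent (to be resolved by imputation)."""
    flagged = set()
    for item, domain in VALID.items():
        if rec.get(item) not in domain:
            flagged.add(item)           # invalid or missing value
    # Consistency check: "married" under age 16 is implausible.
    # Flag only one item, echoing the least-number-of-changes idea.
    if not flagged and rec["marital_status"] == "M" and rec["age"] < 16:
        flagged.add("marital_status")
    return flagged

print(edit_record({"age": 12, "marital_status": "M"}))   # {'marital_status'}
print(edit_record({"age": 200, "marital_status": "X"}))  # flags both items
```

In the real system the resolving action for each failing combination was looked up in pre-built edit matrices rather than computed rule by rule.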
Imputation was only carried out on the 100%
questions and not on sample (10%) questions; this was
because most of the 10% questions have lengthy
classifications, such as occupation, and hence are
difficult to impute with any accuracy. Automatic
imputation on a record by record basis was first
introduced in the 1981 Census, and was based on the
work by Fellegi and Holt in the 1970s, the so-called
hot-deck methodology. This worked well in 1981, and
so was carried through to 1991 with only minor
changes for new questions.
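Sequential hot-deck imputation of the kind referred to can be sketched as follows: a missing item is filled with the most recently seen valid value from a "similar" record, the deck being updated as records stream past. The donor key used here (a coarse age group) and the records themselves are illustrative assumptions, not the census specification:

```python
# Minimal sketch of sequential hot-deck imputation: fill a missing
# item from the last valid donor with the same key. The key function
# and data below are hypothetical illustrations.

def hotdeck_impute(records, item, key):
    """Impute missing `item` values in processing order,
    using the latest donor value seen for the same key."""
    donors = {}  # key value -> most recent valid value (the "deck")
    for rec in records:
        k = key(rec)
        if rec.get(item) is not None:
            donors[k] = rec[item]        # update the deck
        elif k in donors:
            rec[item] = donors[k]        # impute from the donor
    return records

people = [
    {"age": 34, "tenure": "owner"},
    {"age": 36, "tenure": None},    # same age group -> gets "owner"
    {"age": 71, "tenure": "renter"},
    {"age": 68, "tenure": None},    # same age group -> gets "renter"
]
hotdeck_impute(people, "tenure", key=lambda r: r["age"] // 30)
```

Because donors are drawn from nearby, similar records, imputed values follow the local distribution of the data, which is what makes hot-deck attractive for categorical census items.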
This paper documents the requirements for the
DQMS which have been gathered so far. It is a
working paper, as the DQMS will be developed
iteratively on a prototype basis and the requirements
will be enhanced or re-prioritised as work proceeds.
The general system requirements are shown in italics
for ease of reference and are classified as either
"standard" or "ad-hoc". The specific requirements for
editing and imputation are then listed.
Although the role of the data expert is not yet fully
defined, it is anticipated that they will be in place
sometime during 1998-99, and that they will become
familiar with the data during the Dress Rehearsal for