Statistics Canada’s National Household Survey: State of knowledge for Quebec users Information note December 2, 2013

INSTITUT DE LA STATISTIQUE DU QUÉBEC

Direction de la méthodologie et de la qualité Direction des statistiques sociodémographiques

Background

In 2013, Statistics Canada released the data from the new National Household Survey (NHS). This voluntary survey has replaced the mandatory 2B long form of previous censuses. Many departments, agencies and local planners who have used data from the long census questionnaire to establish a socioeconomic profile of their target population are now turning to the NHS. The major methodological changes made by Statistics Canada involve additional work on the part of these users who must now familiarise themselves with changes related to the implementation of the NHS in order to determine the impact it will have on their work.

Using NHS data does require increased vigilance, whether it be results posted on Statistics Canada’s website or obtained by special requests. We must not assume that data taken from the 2006 Census long questionnaire can be updated directly by a similar measure taken from the 2011 NHS. An analysis of the extent of changes in each user context must first be made.

This document, issued by the Institut de la statistique du Québec, aims to provide insights on the information made available to date by Statistics Canada on the NHS data. A reminder of facts associated with this important change is followed by sections on data reliability, comparability and consistency. The final section is summarized in list form featuring key elements that every user should consider before using NHS data to determine the impact of methodological changes in the context of use and ensure that these data can be used to meet their needs.

Reminder of facts

Because of a federal decision announced in June 2010, the mandatory long questionnaire was replaced by a voluntary survey at the time of data collection for the 2011 Census. With very similar content to the long questionnaire of the 2006 Census, the 2011 NHS was distributed to about one in three Canadian households rather than one in five as was the long questionnaire of censuses since 1981.
While NHS questions have been tested during the 2011 Census consultation and testing processes, the data collection method and the voluntary nature of the survey have not (NHS User Guide, page 4). Data collection took place from May to August 2011. Statistics Canada then processed the data (capture, coding, imputation, weighting, etc.).

NHS results were disseminated in three stages during the spring and summer of 2013. The results released on May 8 covered Aboriginal peoples, immigration and ethnocultural diversity. Results released on June 26 covered employment, education, workplace, commuting to work, mobility and migration and language of work. Finally, the results released on September 11, rather than August 14 as originally planned1, focused on income, earnings, housing and shelter costs.

Many stakeholders have expressed concerns about the quality and comparability of NHS data, particularly for specific subpopulations or small geographical levels. In a press release issued on July 15, 2010, the Institut de la statistique du Québec raised concerns related to the three dimensions of quality of statistical information: reliability, comparability and coherence. Issues concerning each of these elements are discussed in the following sections. Most of the elements presented in these sections are taken from reference material made available by Statistics Canada, including the NHS Dictionary, the NHS User Guide and reference guides on each theme. A list of these documents is available at the end of this note.

Reliability of NHS data

Reliability generally refers to statistical accuracy and the absence of bias. These two elements are described below.

Risk of bias in NHS data

Since nonrespondents tend to have different characteristics than respondents, the risk of survey data being biased (i.e. not being representative of the actual population) increases as the response rate falls. As expected, given the voluntary nature of the NHS, the response rate decreased compared with the long questionnaire of previous censuses. In Quebec, the household response rate for the NHS was 71.9%, higher than the 50% goal Statistics Canada had aimed for. It is, however, well below the rate of about 94% obtained with the long questionnaire of the 2006 Census. The risk of bias due to nonresponse is therefore greater in the NHS than in previous censuses.

In assessing this response rate, we must keep in mind that "The objective of the NHS is to provide data for small geographic areas and small population groups" (NHS User Guide, page 4).
By examining response rates according to census subdivision (CSD)2 (available on Statistics Canada’s website), we can see that the overall rate of over 70% conceals significant disparities, which constitutes a major drawback given the objective of this survey.

1 Dissemination had been confirmed by Statistics Canada in The Daily on August 9. The postponement was announced in The Daily on August 14 with the following statement: "While processing data, Statistics Canada detected issues that must be resolved before release. All data from the National Household Survey released so far are not affected."

2 CSDs are census geography areas which correspond to municipalities. On January 1, 2011, Quebec had 1,285 CSDs.

Moreover, even among returned questionnaires, some questions may have been left unanswered. This is called item non-response. While it is low (less than 5%) for the demographic, sociocultural, linguistic and mobility characteristics measured by the NHS, Statistics Canada indicates that item non-response is higher for education characteristics (5 to 10%) and for the sections on work, income and dwelling characteristics (5 to 20%). The thematic reference guides provide additional information.

A global non-response rate (GNR), combining household non-response and item non-response, is the main indicator of the quality of NHS estimates used by Statistics Canada. It is released with the data for each geographical area. Statistics Canada established that NHS estimates for any geographical area with a GNR of 50% or higher would not be published in standard products. These estimates “have such a high level of error that they should not be released in most circumstances.” Analyses showed that beyond this number, “the bias was so large that the estimates were not of sufficiently high quality” (User Guide, page 16). This quality criterion is much less restrictive than the one used for the 2011 Census (short questionnaire) and previous censuses. For these, a geographical unit with a GNR higher than 5% was flagged and any unit with a GNR higher than 25% was suppressed.

Across Quebec, 183 CSDs were removed from standard products because of a GNR equal to or greater than 50%. Another 123 CSDs were removed under other criteria also used in 2006 (estimated population of fewer than 40 people: 28 CSDs; population count of zero: 88 CSDs; incomplete enumeration of Indian reserves and Indian settlements: 7 CSDs).
A census agglomeration (CA), Hawkesbury (Quebec part), has also been suppressed because of a GNR greater than or equal to 50%, but no census division (CD) has been deleted. Applying the suppression criterion used in the 2006 Census (greater than or equal to 25%) would have resulted in the removal of 929 CSDs (rather than 183), 9 CAs (rather than 1) and 51 CDs (rather than 0).
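
As an illustration of how the two suppression criteria differ, the following minimal Python sketch counts the areas that would be suppressed under each threshold. The area names, the GNR values and the function names are hypothetical and were introduced for this example; only the 50% and 25% thresholds come from the text above.

```python
from dataclasses import dataclass

# Thresholds taken from the note: NHS standard products suppress areas with a
# GNR of 50% or higher; the 2006 Census criterion is applied here at 25%.
NHS_SUPPRESSION_GNR = 50.0
CENSUS_2006_SUPPRESSION_GNR = 25.0


@dataclass
class Area:
    name: str
    gnr: float  # global non-response rate, in percent


def is_suppressed(area: Area, threshold: float) -> bool:
    """Return True when the area's GNR is at or above the threshold."""
    return area.gnr >= threshold


def count_suppressed(areas, threshold: float) -> int:
    """Count the areas that would be removed from standard products."""
    return sum(1 for area in areas if is_suppressed(area, threshold))


# Hypothetical areas and GNR values, for illustration only.
areas = [
    Area("A", 12.4),
    Area("B", 27.0),
    Area("C", 49.9),
    Area("D", 50.0),
    Area("E", 63.5),
]

print(count_suppressed(areas, NHS_SUPPRESSION_GNR))
print(count_suppressed(areas, CENSUS_2006_SUPPRESSION_GNR))
```

With these made-up values, the stricter 25% criterion removes more areas than the 50% criterion, which mirrors the pattern reported above for Quebec CSDs, CAs and CDs.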

Although geographical units with a GNR greater than or equal to 50% are deleted from standard data products released by Statistics Canada, they are nevertheless available by custom requests. In light of all the warnings made by Statistics Canada, use of these data with a greater bias risk must be done with caution and only after rigorous examination of the context of use.

In addition to reviewing the GNR, an analysis of the NHS data quality may be carried out by a comparison with alternative data sources. This is Statistics Canada’s preferred approach in its thematic reference guides to document potential bias. It has however limited comparisons to Canada and its provinces and territories. Such comparisons should be conducted whenever possible by users with alternative sources, including at a smaller geographical scale or for specific topics which have not been processed by Statistics Canada. These research findings would help provide a clearer picture of the analytical potential of NHS data.

Reliability of income data

Tests conducted by Statistics Canada on income data are a good example of the information that such comparisons can provide on the data’s scope and limitations. In contrast to other topics for which alternative sources are not available, Statistics Canada had three sources with which the NHS data could be compared: the Survey of Labour and Income Dynamics (SLID), the Annual Estimates for Census Families and Individuals (T1 Family File - T1FF, produced primarily from tax returns) and the 2006 Census. It is especially interesting that such comparisons were possible for the topic of income, taking into account higher rates of non-response to these questions and special measures put in place by Statistics Canada to mitigate its effects.

In the Income Reference Guide, Statistics Canada provides several findings showing differences between the NHS results and other data sources. These findings can help analysts dealing with the topic of income.
They will be able to judge the adequacy of warnings given on documents they are considering. Observations about income can also provide information to analysts whose work focuses on issues correlated with income (e.g. education, employment, etc.). Other aspects related to income are covered below in the section on comparability.

Accuracy of NHS estimates

The coefficient of variation (CV) is a measure used to describe the precision of an estimate. It depends on the number of respondents and response rate variability. By increasing the sampling fraction of households from 20% to 30%, Statistics Canada has secured a number of NHS respondents roughly equivalent to that of the 2006 Census. However, the voluntary nature of the survey resulted in a much greater variability in response rates across geographical regions and between subpopulations, which has a direct impact on accuracy.

Coefficients of variation, or similar measures, are generally provided with estimates derived from survey data to allow the calculation of confidence intervals used for data analysis. For the NHS, Statistics Canada has made available a few CVs for selected variables across Canada (provinces and territories only) on its Web site. At these levels, CVs are very small, indicating that the estimate’s precision is good and appears to be equivalent to that of 2006.

Statistics Canada sent to statistical focal points in each province a second set of CVs for census subdivisions of at least 30,000 people living in private households (34 out of the 1,285 Quebec municipalities). They are available upon request to the Institut de la statistique du Québec. A quick review of these CVs showed that, as expected, they were higher than Quebec CVs, but remain below 15% for most variables. Nevertheless, Statistics Canada mentioned in the note accompanying the release of CVs that "larger CVs can be observed for smaller geographical areas like census subdivisions and territories or rarer caracteristics like visible minority, immigrants and unemployed." Such CVs are however not made available by Statistics Canada to this day, although requests to that effect have been made by provincial and territorial representatives.
The aim of these requests was to obtain additional information on the quality of NHS data, compared with that of previous censuses, but also to ensure that users have statistical analysis tools. For now, analysts will have to base their conclusions on a direct comparison of point estimates, without being able to account for their accuracy. Given this, the risk of wrongly concluding that groups differ is greater when variability is higher, that is, for smaller geographic areas or subpopulations. Statistics Canada announced, as in past censuses, the upcoming release of a technical report on sampling and weighting for the NHS. The issue of sampling variance, which is directly related to CVs, is usually covered in this document, for which the release date is not yet known.
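
To illustrate why CVs matter for analysis, the sketch below derives an approximate 95% confidence interval from an estimate and its CV, and checks whether two estimates differ by more than their combined sampling error. It assumes approximately normal and independent estimates, and all figures and function names are hypothetical. It is a generic textbook approach, not a procedure documented by Statistics Canada for the NHS.

```python
import math


def standard_error(estimate: float, cv_percent: float) -> float:
    """Standard error implied by a CV, where CV = standard error / estimate."""
    return abs(estimate) * cv_percent / 100.0


def confidence_interval(estimate: float, cv_percent: float, z: float = 1.96):
    """Approximate confidence interval (95% by default, normal approximation)."""
    se = standard_error(estimate, cv_percent)
    return estimate - z * se, estimate + z * se


def difference_is_significant(est_a: float, cv_a: float,
                              est_b: float, cv_b: float,
                              z: float = 1.96) -> bool:
    """Compare two estimates, assuming they are independent (an assumption)."""
    se_diff = math.hypot(standard_error(est_a, cv_a),
                         standard_error(est_b, cv_b))
    return abs(est_a - est_b) > z * se_diff


# Hypothetical estimates: the same gap of 1,000 is judged differently
# depending on the CVs attached to the two estimates.
low, high = confidence_interval(10_000, 5.0)
print(round(low), round(high))
print(difference_is_significant(10_000, 5.0, 11_000, 5.0))
print(difference_is_significant(10_000, 2.0, 11_000, 2.0))
```

Without CVs, an analyst sees only the gap between the two point estimates and cannot tell which of the two situations applies.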

Comparability of NHS data

For our purpose, we consider that comparability refers to the ability to compare data from the NHS with data of previous censuses in order to establish trends. Because of methodological changes made by Statistics Canada, one cannot assume that the NHS data are directly comparable with similar results from the long questionnaire of the 2006 Census. In this regard, the NHS User Guide recommends "caution (…) especially when the analysis involves small geographies" (page 14). Three main aspects can limit comparability: increased risk of bias due to non-response, differences between target populations, and changes in the wording and positioning of questions.

In a note to readers placed at the end of the analytical documents, Statistics Canada states that “the NHS estimates are derived from a voluntary survey and are therefore subject to potentially higher non-response error than those derived from the 2006 Census long form.” Statistics Canada encourages users to use the GNR “in assessing the quality of the NHS estimates and determining the extent to which the estimates can be compared with the estimates from the 2006 Census long form” (User Guide, page 14).

The target population for the 2011 NHS is the population in private households. It is different from the target population of the 2006 Census long questionnaire which also included usual residents in non-institutional collective dwellings (e.g., hotels, motels, rooming houses, retirement homes, student residences, religious establishments, military bases, etc.).

Although the NHS questionnaire content is quite similar to the 2006 Census long questionnaire, the wording and positioning of some questions have changed. The NHS Dictionary provides information on these topics. However, unlike previous census dictionaries which contained a lot of information on comparability over time, the NHS Dictionary almost never addresses this issue. It reads like the dictionary of a brand new survey, almost without ties to the past. Some reference guides contain a section on this aspect, but most of them do not. In this context, comparisons between NHS results and results of previous censuses should be made with caution, after making a risk analysis and, if possible, after a comparison with an alternative data source has confirmed their validity. It may also be appropriate to consult data products and analytical documents released by Statistics Canada to assess the relevance of comparisons over time.

Comparability of income data

For data quality evaluation purposes, Statistics Canada compared NHS estimates on income with those of the 2006 Census. Results of this comparison can be found in the Income Reference Guide.

It states that:

- Comparing low-income estimates from the 2011 NHS to estimates from previous censuses shows trends that differ significantly from those observed from other surveys and administrative data such as the Survey of Labour and Income Dynamics (SLID) or the T1 Family File (T1FF).

- With the new NHS methodology, the low-income estimates are not comparable to those previously based on the census.

- Due to lack of comparability of trends and to prevent drawing misleading conclusions from the comparison of low-income cut-off (LICO) estimates between the NHS and previous censuses, low-income estimates based on the LICO are not available as a standard product of the NHS but can be obtained upon request.

In an open letter to the Globe and Mail (October 4, 2013), researchers with the University of Toronto highlighted other limitations of NHS data comparability. The researchers calculated the Gini coefficients of income inequality (median income) between CMA census tracts of Canada from three sources, namely the 2006 Census, the Canada Revenue Agency (CRA) and the NHS. They stated that: “The Gini coefficient of neighbourhood income inequality for Canada’s major metropolitan areas in the 2011 CRA and the 2006 Census are about the same, but not in the 2011 NHS. Inequality can increase or decrease in measurable ways over five years, but rarely by the amount claimed in the NHS.” In other words, the decrease in inequality observed in the most populous metropolitan areas does not seem plausible to them. They believe that the problem lies in the measurement of NHS median income at the census tract level.

Consistency between NHS and Census

Weighting of the NHS was carried out by Statistics Canada to reduce or eliminate differences with the 2011 Census figures. However, some discrepancies may remain, especially when it comes to small municipalities, because calibrating the weights against census totals is done for geographical areas with an average of 2300 dwellings (approximately 5600 people). So, by design, statistical processing does not guarantee coherence below this level. This means that for geographical levels with fewer than 5000 people or so, increased vigilance is required.

In the User Guide (p. 14), Statistics Canada encourages users to compare the NHS estimates of the target population count against the corresponding figures in the 2011 Census. The greater the difference is, “the greater (is) the risk of having poor quality NHS estimates.” Three figures featured in Appendix 3 of the Guide show that for small CSDs the difference can be substantial.

In Quebec, 45 municipalities show differences of more than 10% between the 2011 Census figures and those of the NHS. Most of these municipalities have fewer than 500 inhabitants, according to the universe covered by the NHS.
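
The comparison described above can be sketched in a few lines of Python. The municipality names and counts below are hypothetical, and the function names are assumptions introduced for illustration. The 10% figure is the one used descriptively in this note, not an official Statistics Canada cut-off.

```python
def relative_difference(nhs_count: float, census_count: float) -> float:
    """Percent difference of the NHS estimate relative to the Census count."""
    if census_count <= 0:
        raise ValueError("census_count must be positive")
    return 100.0 * (nhs_count - census_count) / census_count


def flag_inconsistent(areas, threshold_percent: float = 10.0):
    """Return the names of areas whose absolute difference exceeds the threshold.

    Each item in `areas` is a (name, nhs_count, census_count) tuple, where both
    counts refer to the population in private households.
    """
    return [
        name
        for name, nhs_count, census_count in areas
        if abs(relative_difference(nhs_count, census_count)) > threshold_percent
    ]


# Hypothetical counts, for illustration only.
areas = [
    ("Municipality A", 450, 400),    # NHS estimate 12.5% above the Census
    ("Municipality B", 5200, 5150),  # difference below 1%
    ("Municipality C", 300, 350),    # NHS estimate about 14% below the Census
]

print(flag_inconsistent(areas))
```

Areas flagged this way are candidates for closer examination before their NHS estimates are used.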

In order to make this comparison for a given geographical area, users can refer to the NHS profile which features, in a second tab, the Census data. Given the differences in universes, the comparison must be made between the Total number of persons in private households (Census) and the Total population in private households (NHS).

Other important elements to consider

The specific issue of subpopulations

Unlike geographical areas for which the GNR provides an indication of potential bias, there is no such indicator to assess data quality for subpopulations. This is true for immigrants, low-income individuals, people working in a particular employment sector, etc. Again, it is by comparing NHS estimates with alternative data sources that data quality can be assessed. When such data were available, Statistics Canada drew comparisons during the certification of final NHS estimates. Reference documents give some information on certain subpopulations. For example, the Reference Guide covering place of birth states that “the NHS estimate for the population born in the Philippines is overestimated on the national level.” Since Statistics Canada does not have alternative sources of data for every topic covered by the survey, it was not able to draw comparisons for every subject. Users who have alternative sources should carry out such comparisons for subpopulations that have not been examined by Statistics Canada. These findings would help provide a clearer picture of the analytical potential of NHS data.

Impact of applying confidentiality rules

It is important to consider the issue of rules of confidentiality applied by Statistics Canada, especially in the case of data tables with multiple cells of very small size. These rules include suppression thresholds for the value of certain cells.
It is essential to read the document published by Statistics Canada dealing with this aspect and to ask the agency additional questions to understand the implications of the application of the confidentiality suppression thresholds when purchasing data.

Conclusion

In light of the evidence presented, two facts stand out. The first concerns comparability. Due to methodological changes made by Statistics Canada, one cannot assume that the NHS data, coming from a new voluntary survey, can be compared directly with corresponding data from the 2006 Census long form, which was mandatory. In this context, Statistics Canada recommended exercising caution by first carrying out a preliminary data analysis, including a comparison with an alternative source if possible.

The second fact concerns work on small populations. Given the generally lower quality occurring with small geographical levels, NHS results for these levels should be used with vigilance. In addition, since statistical processing of data ensures consistency with Census data at a level of about 2300 dwellings, NHS results for geographic levels with fewer than 5000 people should be used with greater vigilance. Caution is also required when analyzing subpopulations, especially since no all-embracing quality indicator is available to guide users in these cases.

The Institut de la statistique du Québec has a long record of expertise in conducting population surveys. In addition, as part of its analysis and dissemination work, it has frequently used data from previous censuses to document several themes and intends to use those of the NHS, when possible. This expertise, combined with close monitoring of Statistics Canada’s NHS-related work since 2010, puts the Institut in a position to provide general advice on the use of the results of this new survey. However, Statistics Canada is the NHS’s producer and, therefore, is the only one who possesses all the information on the statistical processing of data. As suggested by the federal agency, specific questions on NHS quality and comparability should be addressed directly to Statistics Canada.
However, since the Institut represents Quebec at the Federal - Provincial - Territorial Committee on the Census, we would like to be kept informed of the outcome of the steps taken with Statistics Canada in order to carry out our role as statistical coordinator as best we can.

Summary of facts to be taken into consideration when using NHS data

1. Be aware of the many rules applied by Statistics Canada to ensure confidentiality, particularly for special data orders. These rules may have a significant impact on the availability of data (suppression), particularly in the case of cells containing small estimated values.

2. Consider global non-response rates (GNR). These should be provided systematically by Statistics Canada for all geographical levels, whether it be standard census geographic levels or a particular one by special request. Keep in mind that Statistics Canada sets its release cut-offs at 50% rather than 25% as in previous censuses. Geographic levels that do not meet this criterion are still available for special table purchases. Using these lower quality data requires the utmost caution.

3. Consider the risk that the data are biased. The higher the GNR, the greater the risk. Reference guides for each topic covered by the NHS feature results of analyses made on this subject by Statistics Canada. NHS data should, where possible, be compared with those obtained from alternative sources to ensure their plausibility.

4. Consider the fact that data accuracy decreases with smaller areas and subpopulations. Take this into account when comparing two parameters and finding a difference. If CVs are not available, ask Statistics Canada about the accuracy of the estimates analyzed.

5. Before comparing NHS results with those of previous censuses, consider differences in target population, wording and positioning of questions and risk of bias due to non-response. Keep in mind that Statistics Canada recommends caution, especially for small geographical levels.

6. For analyses on small geographic areas, check consistency between the NHS estimate of the target population count and the corresponding figure in the 2011 Census. A large difference could mean lower quality data for the NHS.

7. For subpopulations, consider the higher risk of data being biased. Reference guides for each of the topics covered by the NHS sometimes provide information about this. The NHS data should, where possible, be compared with those obtained from alternative sources, to ensure their plausibility.

8. Accompany any analysis and dissemination of data with a note specifying the interpretation limits. If needed, contact Statistics Canada for details about this.

The list below features a number of reference documents on the NHS made available by Statistics Canada. Clickable links provide direct access on the federal agency’s website.

- NHS Dictionary
- NHS User Guide
- Data Quality and Confidentiality Standards and Guidelines (Public)
- Reference Guides:
    o Aboriginal Peoples
    o Ethnic Origin
    o Languages (including Language of work)
    o Place of Birth, Generation Status, Citizenship and Immigration
    o Religion
    o Visible Minority and Population Group
    o Education
    o Labour
    o Journey to Work
    o Mobility and Migration
    o Income
    o Housing
- Final response rates
- Geographic areas not released
- NHS: Coefficients of variation (CVs)
- Home page of analytical documents for different topics
