SAMPLING FISCAL DATA FOR THE AGRICULTURE SECTOR

Suzelle Giroux, Statistics Canada
R.H. Coats Building, 3rd floor, Ottawa, Ontario, K1A 0T6, Canada

[email protected]

ABSTRACT

Since 1987, tax records have been sampled to obtain detailed financial information and to produce annual estimates on the Canadian agricultural sector. Several sources of fiscal data are used to supply the sample. Both the use of these sources and the sources themselves have evolved over the years, and the requirements for data analysis have grown and changed. This evolution brings new challenges that are currently being addressed, namely the data quality resulting from the transition of corporations’ tax returns to a new format called the General Index of Financial Information (GIFI), and a sample design that will allow retrospective and longitudinal analysis.

Key Words: multiple frames, retrospective analysis, longitudinal analysis

1. INTRODUCTION

The agriculture tax data program has produced financial agricultural statistics based on tax records since the 1987 fiscal year. Amongst the statistics produced are estimates of detailed income and expenses as well as acquisitions and disposals of assets.

The requirements for the sample are determined and produced at Statistics Canada and sent to the Canada Customs and Revenue Agency (CCRA, formerly known as Revenue Canada) to be implemented in the sampling system. As the tax returns come in, basic information captured by CCRA for assessment purposes determines whether each return is to be sampled or not.

Why sample fiscal data when the whole farm population is reporting? CCRA receives the completed taxation forms and captures the information required for the agency’s own purposes. However, more details are needed to fulfill the requirements of the agriculture tax data program. Since getting these details for all farms would be very costly, a sample design was created.

2. SOURCES OF DATA

Different sources are currently used to supply the agriculture tax sample. Farms can be unincorporated businesses or corporations, but not both. Operators of an unincorporated business must complete a T1 tax form, whereas a T2 form must be completed for a corporation.

The T1 forms completed by the operators of unincorporated businesses have typically been obtained on paper. However, as computer technology progresses and becomes more popular, more and more T1s are filed electronically. These records are called E-filers. Since the 1995 fiscal year, E-filers have been used as a source to supply the agriculture tax sample.

In 1997, operators of unincorporated businesses who participated in the Net Income Stabilization Account (NISA) program had a new T1 tax form made available to them. NISA is a program available to agricultural operations to help them in case of disaster. It allows farm operators to set aside money, matched by the government, that can be used when disasters strike (flood, drought, hail, commodity price decreases, etc.). Farm operations must meet certain requirements to be eligible for this program and there are limits to the money that can be set aside. To apply for this program, a NISA form requiring many details on the farm operation, including financial information such as detailed income and expenses, must be completed. Agriculture and Agri-Food Canada (AAFC), which manages this program, has had these forms captured in the past few years, hence these forms are available electronically.


Before 1997, operators who wanted to participate in the NISA program had to complete a NISA form. They also had to complete the T1 tax form for fiscal purposes. Since these two forms required similar information, a new form combining both was created for fiscal year 1997. Now, operators of unincorporated businesses who want to participate in the NISA program have only one form to complete. These NISA-T1 forms are captured by CCRA for AAFC’s NISA program and hence are available electronically. Since fiscal year 1997, these NISA-T1 records have been used to supply the agriculture tax sample.

The T2 forms completed for corporations are typically obtained on paper. The operators of corporations who want to participate in NISA must complete a separate form. The NISA forms completed for corporations are not used to supply the sample.

In summary, there are three sources to sample from for the unincorporated businesses: paper forms, E-filers and NISA-T1 forms; and one source for the corporations: paper forms.

3. SAMPLING METHODOLOGY AND DATA COLLECTION

3.1 Unincorporated Businesses’ sample design

Every year, Statistics Canada receives and uses the most recent self-employment file to create the parameters required for the new sample. This file contains very limited information on all unincorporated businesses. For the agriculture sector, records with a net farm income not equal to zero or a gross farm income greater than zero are extracted from this file; they represent the population of unincorporated agricultural business operators and amount to approximately 450,000 records. Note that the 450,000 records do not represent individual farms, since many farms consist of partnerships and each partner should report a farm income.

This population is stratified by province and by sales class. The sample size is usually driven by cost. Most recently, the required sample size has been around 28,000. This sample size is then allocated to the strata, with farm operations in strata with larger sales having a larger probability of being selected. The resulting sampling fractions by stratum are then calculated and translated to a selection interval on the 0 to 99 range. For instance, a stratum with an interval of 10 to 19 or 55 to 64 represents a 10% sampling fraction. The list of strata with the selection intervals is called the parameter file; it is required by the sampling system at CCRA, which is based on a Bernoulli design.
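
To make the interval construction concrete, the short sketch below (hypothetical Python, not the actual Statistics Canada or CCRA system; the stratum labels, fractions and interval start point are invented) shows how a target sampling fraction for a stratum could be turned into a selection interval on the 0 to 99 range of the kind described above. The allocation of the 28,000 units to strata, which determines the fractions themselves, is not shown.

    # Hypothetical sketch: translate per-stratum sampling fractions into
    # 0-99 selection intervals for the parameter file (illustration only).
    target_fractions = {
        ("Ontario", "sales class 1"): 0.10,  # 10% -> interval width of 10
        ("Ontario", "sales class 2"): 0.25,  # 25% -> interval width of 25
    }

    def build_parameter_file(fractions, start=10):
        """Assign each stratum a selection interval on the 0-99 range.

        The interval width equals the sampling fraction times 100; the start
        point is arbitrary, so 10-19 and 55-64 both encode a 10% fraction.
        """
        parameters = {}
        for stratum, fraction in fractions.items():
            width = round(fraction * 100)
            parameters[stratum] = (start, start + width - 1)
        return parameters

    print(build_parameter_file(target_fractions))
    # {('Ontario', 'sales class 1'): (10, 19), ('Ontario', 'sales class 2'): (10, 34)}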

Since each unincorporated business has a unique personal identifier (the Social Insurance Number (SIN)), a function was created to “transform” this number onto the 0 to 99 interval. This function is called the HASH number. For a given stratum, the HASH number of a record is compared to the selection interval, and this determines whether the record is to be selected or not. Since 1987, the smallest sampling fraction one could obtain for a stratum was 1% (interval 0 to 99). However, this changed for the 1999 fiscal data, where a sampling fraction of 0.1% can now be obtained (interval 0 to 999).

Another feature of the sampling system at CCRA is called the historical file. The records contained on this file are also called pre-specified by the agriculture tax data program. The pre-specified file contains a list of agriculture operations that have to be sampled, even if they do not fall in the sampling interval. These records are very important contributors to the estimates of the agriculture tax data program.

A final file, called the exclusion file, is also provided to the sampling system. As opposed to the pre-specified file, it represents a list of records that are not to be sampled, even if they fall in the sampling interval. These records are mainly Hutterite colonies that do not produce commodities for trade.
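
Putting these pieces together, the selection decision for one record can be sketched as follows. This is an illustrative Python fragment only: the actual HASH function used by CCRA is not described in the paper, so a generic hash (SHA-256 reduced modulo 100) stands in for it, and the file layouts are invented.

    import hashlib

    def hash_number(sin, modulus=100):
        # Stand-in for CCRA's HASH function: map an identifier onto 0..modulus-1.
        digest = hashlib.sha256(sin.encode("utf-8")).hexdigest()
        return int(digest, 16) % modulus

    def select_record(sin, stratum, parameter_file, pre_specified, exclusions):
        """Bernoulli-style selection with pre-specified and exclusion overrides."""
        if sin in exclusions:       # e.g., Hutterite colonies: never sampled
            return False
        if sin in pre_specified:    # historical/pre-specified units: always sampled
            return True
        low, high = parameter_file[stratum]
        return low <= hash_number(sin) <= high

    parameter_file = {("Ontario", "sales class 1"): (10, 19)}  # a 10% stratum
    print(select_record("123 456 789", ("Ontario", "sales class 1"),
                        parameter_file, pre_specified=set(), exclusions=set()))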

From 1987 to 1994, most of the sampled T1 tax returns were obtained on paper. The number of unincorporated businesses selected increased from around 40,000 in 1987 to around 60,000 in 1994, with a smaller sample in 1988 and 1989 (~28,000) since the Prairie provinces (Alberta, Saskatchewan and Manitoba) were not sampled.

The sampled paper forms are captured using a system called COMSCREEN. This system is used by CCRA to capture the agriculture financial fields required by the agriculture tax data program. The capture of forms was historically performed by Statistics Canada until 1995 and was transferred to CCRA in 1996 for cost-effectiveness reasons.

Since more and more E-filers were obtained, it was decided to introduce this new source in the sample starting in 1995. In 1997, with the NISA-T1 combined form, another source of data was made available to us.

Records falling into the sample that are reported electronically (E-filers) are automatically added to the sample. The same is done for NISA-T1 records. Since all E-filer and NISA-T1 records are available electronically, Agriculture and Agri-Food Canada required that more of these records be added to the sample. However, these records do not necessarily pass the edits and some clean-up is necessary. Because of the large number of records that failed the edits, it was not affordable to clean them all. The solution was to add only the “clean” records to the sample. However, because of possible bias, these records represent only themselves and hence have a sampling weight of one. Figure 1 illustrates the sample design with paper, E-filers and NISA-T1 used as sources to supply the sample.

FIGURE 1: Sample design for the unincorporated businesses (T1). Components shown: paper filers captured through COMSCREEN; clean and non-clean E-filers; clean and non-clean NISA records; and the algorithm sample.

For 1995 and 1996, the number of unincorporated businesses selected for the sample (with E-filers introduced) increased to 113,000 and 137,000 respectively. In 1997 and 1998, when NISA-T1 forms were added, the number increased to 139,000 and 155,000 respectively. The small increase observed between 1996 and 1997 (only 2,000) is explained by a reduction of the algorithm sample from 36,000 to 22,000 when the NISA source was introduced. The algorithm sample was increased to 28,000 in 1998 because some estimates had poor coefficients of variation.

These different data sources do not have the same data quality and do not report the same amount of detail. COMSCREEN is considered the source with the highest quality data (fewest failed edits). In terms of detail, E-filers report less than the other sources while NISA-T1 forms report the most.
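
Because the supplementary “clean” E-filer and NISA-T1 records represent only themselves, estimation weights differ by source. The sketch below is an assumption-laden illustration: the paper states only that supplementary records receive a weight of one; the usual inverse-fraction weight 1/f for algorithm-sample records under a Bernoulli design is assumed here rather than quoted from the source.

    def estimation_weight(record, sampling_fraction):
        """Illustrative weight assignment for the design in Figure 1.

        Assumption: records selected by the sampling algorithm carry the
        standard Bernoulli weight 1/f; supplementary clean E-filer and
        NISA-T1 records represent only themselves (weight 1).
        """
        if record["source"] == "algorithm":
            return 1.0 / sampling_fraction
        if record["source"] in ("supplementary_efiler", "supplementary_nisa"):
            return 1.0
        raise ValueError("unknown source: " + record["source"])

    print(estimation_weight({"source": "algorithm"}, 0.10))           # 10.0
    print(estimation_weight({"source": "supplementary_nisa"}, 0.10))  # 1.0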



3.2 Corporations sample design

As for the unincorporated businesses sector, Statistics Canada receives and uses each year the most recent corporations’ file to create the parameters required for the new sample. Only corporations with a Standard Industrial Classification (SIC) code falling in the agriculture sector are kept. The number of corporations in the Canadian agricultural sector is currently around 30,000.

This population is stratified by province, by standard industrial classification group and by sales class. As for the unincorporated sector, the sample size is usually driven by cost. Most recently, the required sample size has been around 12,000. This sample size is then allocated to the strata, with farm operations in strata with larger sales having a larger probability of being selected. The sampling fractions by stratum are then calculated. The list of strata with the sampling fractions is called the parameter file; it is required by the sampling system at CCRA, which is based on a Bernoulli design.

As for the unincorporated sector, each corporation has a unique identifier, called the corporation or T2 number. A HASH number is created using a function similar to, but distinct from, the one used for the SIN. For a given stratum, the HASH number of a record is compared to the interval based on the sampling fractions, and this determines whether the record is to be selected or not.

For the corporations, a list of pre-specified units is also provided. However, no exclusion file is provided for corporations.

All returns are sampled at CCRA and photocopies of sampled records are sent to Statistics Canada, where they are captured and edited.

For the years 1987 to 1989, the sample was around 4,500, since the Prairie provinces were not sampled. They were included in 1990, at which time the sample increased to 9,200. It then increased to 11,800 in 1997 and went down to 8,900 in 1998. The reason for the latter decrease is the arrival of a new source of data: GIFI. Statistics Canada was to receive GIFI data for all agriculture corporations, hence the planned sample was cancelled. However, it became clear in the process that the release deadline would not be met due to delays in obtaining the data. There were also concerns about data quality. Hence, a contingency plan was designed and parameters were created to obtain a “reduced” sample. More on this issue is found in the following section.

4. FUTURE CHALLENGES

4.1 General Index of Financial Information (GIFI)

For the 1998 fiscal year, a new source of data has been made available. GIFI is a format that will replace the T2 tax return used by corporations. For fiscal year 1998, the T2 forms have been transcribed to the GIFI format by CCRA personnel, as they will be for 1999. For fiscal year 2000, however, it is planned that all corporations will report under this new format. Moreover, it is expected that within a few months all corporations will be required to file electronically using the GIFI format, so one big advantage is that data will be available electronically for all corporations, which may result in lower acquisition costs.

However, there are many concerns to be addressed before these data can be used. The first concern is timeliness. The agriculture tax data program has a preliminary release in November for the previous fiscal year. The final release is due in March of the following year. Usually, for the preliminary release, data sampled and obtained up to the end of August or beginning of September are used and represent around 70% of the expected sample. For the final release, data sampled and obtained up to the end of November are used and usually represent 80% to 90% of the sample.

For fiscal year 1998, the data have been transcribed from the T2 form to the GIFI format and captured by CCRA. This process started late in December 1998. Although it was expected that the agriculture tax data program would use GIFI for the corporations, it became clear early in the process that a contingency plan was required (and it was used) should the data not be available for the release. The contingency plan was to select a reduced sample and obtain photocopies of returns, as was done in the past. For 1999, the same process of transcription and capture is expected, hence concerns about deadlines called for another contingency plan similar to that of 1998, except with a full sample rather than a reduced one. In 2000, however, deadlines may not be a problem, since no transcription is expected, only capture for those corporations not yet reporting electronically.

Another concern related to GIFI is data quality. The transcription process, although temporary, resulted in poor data quality for some fields. The transcription was done for all corporations, not only agricultural ones, hence CCRA personnel’s training was not specialized towards agriculture and the personnel did not have access to other sources of data to help them properly capture the returns. This was observed during a small-scale test in which transcribed GIFI forms were compared with T2 forms captured by Agriculture Division personnel. Some recommendations for improvement were provided. A larger-scale test is planned later this year.

A final concern is the amount of detail that will be obtained once corporations report on the GIFI form. Very few variables are mandatory. For the income statement, only total income and total expenses are required, which is not at all acceptable for the agriculture tax data program, where details are required by commodity (livestock details, crop details, etc.). Until corporations report under the new GIFI format, it is unknown how many of them will voluntarily report the details that are so important for the agriculture tax data program. If many of them do provide the details, imputation methods can be used to impute for those not providing them. However, if most of them do not report the details, sources of data other than GIFI will have to be sought to fulfill the objectives of the agriculture tax data program (possibly a new survey).

4.2 New sample design for retrospective and longitudinal analysis

Recent crises in the agriculture sector have resulted in new requirements for data analysis. The current design’s goal is to produce cross-sectional estimates every year. However, new requests coming from economists and analysts of the agriculture sector require retrospective and longitudinal data. Although the current design keeps a certain number of units in the sample for several years, recent studies have shown that these units are not representative of the population for a given year. Because retrospective and longitudinal analyses are more and more popular, actions have to be taken to modify or upgrade the sample to be able to answer these demands.

Since we are dealing with fiscal records, respondent burden does not exist. So, if we want to obtain past data to perform a retrospective analysis, it is just a matter of getting the form captured. Of course, there are costs associated with this. Although the price to obtain a photocopy of a form in the regular process is around $3, it can vary from $15 to $30 for past records, mainly because these records have to be pulled from archives.

A study was undertaken in July 1999 to determine the cost of obtaining records so that each year a sample can be used to produce cross-sectional (annual) estimates, perform retrospective analyses and perform longitudinal analyses (see Figure 2). The unincorporated and incorporated sectors were both included in this analysis. The 1993 sample was designed and drawn (white part) and was also pre-specified for 1994. The 1994 sample was designed and drawn (white part) and was also pre-specified for 1995, and so on. For 1993, the gray part represents units that are needed should a retrospective analysis be required based on the 1997 fiscal year. For 1994, the gray part at the top represents units required if a longitudinal analysis is required based on 1993.

Over a period of 5 years, the cost would be close to $300,000 to obtain data for retrospective analysis (assuming $15 per photocopy) and around $150,000 for longitudinal analysis (assuming $3 per photocopy).
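
The totals above follow directly from the unit counts reported in Figure 2; a one-line check (Python, with the counts taken from the figure):

    retrospective_units, longitudinal_units = 19_000, 51_000
    print(retrospective_units * 15, longitudinal_units * 3)  # 285000 153000 (dollars)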

This cost takes into account records that could be found in electronic form among either the E-filers or NISA (there is no cost to obtain them, hence they are not included in the cost). Also, births (new agriculture operations) and deaths (agriculture operations no longer in the population) have been considered, so no retrospective units for births and no longitudinal units for deaths were included in the cost. However, all units obtained need to be edited (cleaned) and there is also a cost associated with this. As the agriculture tax data program cannot afford such an amount, other alternatives need to be looked at. Among those currently being considered are creating a smaller representative sample for which retrospective and longitudinal data would be obtained, or imputing retrospective data using nearest neighbor, hot deck or modeling techniques.


FIGURE 2: Cost to obtain data for the 5-year period. The figure covers the 1993 to 1997 samples and shows 19,000 units at $15 = $285,000 to obtain for retrospective analysis and 51,000 units at $3 = $153,000 for longitudinal analysis.

5. CONCLUSION

The next two years will be critical for the agriculture tax data program. Concerns about deadlines, data quality and level of detail will have to be addressed with the arrival of GIFI. The current contingency plan (photocopies of T2 returns) is a good alternative; however, it will no longer be available once corporations start reporting under the GIFI format. It will be imperative to find solutions and act quickly on them if the data quality and the level of detail are not to be put in jeopardy. As well, as more and more retrospective analyses are requested, it will be necessary to obtain the data to be able to produce reliable results representative of the population via an appropriate sample design. All this will have to be addressed in a very short period of time.




Health Care Establishment Surveys of the National Center for Health Statistics

Thomas McLemore, National Center for Health Statistics
6525 Belcrest Road, Room 952, Hyattsville, Maryland 20782

[email protected]

Abstract

This paper provides an overview of the health care establishment and provider surveys, collectively called the National Health Care Survey (NHCS), which are conducted by the National Center for Health Statistics. This family of surveys provides nationally representative utilization data for the major sectors of the U.S. health care delivery system. The paper reviews the design, content and status of the NHCS, which includes separate surveys of the following types of health care: discharges from non-Federal, short-stay hospitals; visits to hospital-based and freestanding ambulatory surgery centers; visits to non-Federal, office-based physicians; visits to non-Federal, short-stay hospital emergency and outpatient departments; residents of nursing and related care homes; and patients of home health agencies and hospices. The NHCS evaluation, research and development activities, and the data dissemination program are also discussed.

Key words: health care utilization, data collection

Introduction

Over the past two decades, significant changes have occurred in the organization, financing and delivery of health care in the U.S. These changes resulted from a variety of factors such as efforts to contain cost, the need to improve the efficiency and effectiveness of medical care, an aging population, and improvements in medical technology. One of the consequences of these changes has been a proliferation and diversification of health care settings and services. In the late 1980s, the National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC), initiated an effort to restructure its health care establishment and provider-based surveys to better address the data needs of this changing health care environment. The resulting program, called the National Health Care Survey (NHCS), is based on an integrated family of surveys of health care establishments and providers. This paper provides an overview of the NHCS including: highlights of survey findings; descriptions of the survey components, sample designs and enhancements; research and development activities; and the data dissemination program.

The National Health Care Survey

One of the original objectives of the National Health Care Survey (NHCS), as previously presented at the 1993 International Conference on Establishment Surveys, was to provide nationally representative data on the use of health care resources for the major sectors of the U.S. health care delivery system. The NHCS seeks to accomplish this objective by collecting data on the following health care establishments and providers: hospitals, freestanding ambulatory surgery centers, physicians, hospital emergency and outpatient departments, nursing homes, home health agencies, and hospices. The data are collected through six national probability sample surveys: the National Hospital Discharge Survey (NHDS); the National Survey of Ambulatory Surgery (NSAS); the National Ambulatory Medical Care Survey (NAMCS); the National Hospital Ambulatory Medical Care Survey (NHAMCS); the National Nursing Home Survey (NNHS); and the National Home and Hospice Care Survey (NHHCS). Examples of annual health care utilization data produced by these surveys include (from the 1996 or 1997 surveys):

• 31 million inpatient stays, with an average length of about 5 days.
• 21 million ambulatory surgery visits.
• 787 million ambulatory care visits to physician offices, a rate of 3 visits per person.
• 77 million visits to emergency departments.
• 95 million visits to hospital outpatient departments.
• 88 percent occupancy rate for nursing home beds.
• 73 percent of patients served by home health agencies were 65 years of age and over.

Health researchers, policy makers, and planners use these data to monitor changes in the use of health care resources, to monitor the treatment of diseases, and to examine the impact of new medical technologies.


The surveys share a number of common methodologies. Each is a national probability sample survey based on a multistage design of health care establishments and providers. The statistics produced from the NHCS are primarily event-based (e.g., a visit or a discharge), although each survey collects data both at the level of the establishment or provider and at the level of the patient or event. To the extent possible, data elements, survey definitions and sampling frames are uniform across surveys. Medical diagnoses are collected in each survey. The Bureau of the Census conducts all field data collection. The surveys enjoy high response rates relative to other establishment and provider surveys; these rates vary across surveys but range from 70 percent to 96 percent. A single contractor processes all NHCS data, allowing for effective oversight, uniform coding, and standard quality control procedures.

The NHCS component surveys may be grouped into three topical areas: hospital and surgical care, ambulatory care, and long-term care. A brief description of the component surveys in each of these areas follows, along with examples of efforts to broaden the coverage or improve the scope of the surveys and of survey enhancements indicating the capability of the NHCS to collect supplemental data for policy-relevant issues and programmatic needs.

Hospital and Surgical Care

In the area of hospital and surgical care, the NHCS includes the National Hospital Discharge Survey (NHDS) and the National Survey of Ambulatory Surgery (NSAS).

The NHDS is the principal source of nationally representative data on the inpatient utilization of non-Federal, short-stay and general hospitals in the U.S. The target population for the NHDS includes discharges from non-institutional hospitals, exclusive of Federal, military and Veterans Administration hospitals. Hospitals whose specialty is general (medical or surgical), maternity or children's general, or short-stay hospitals (i.e., hospitals with an average length of stay for all patients of less than 30 days) are included in the survey. In addition, hospitals must have six or more beds staffed for inpatient use. The NHDS has been conducted continuously since 1965, a feature that has permitted year-to-year tracking of the dramatic decline in the length of inpatient stays.

The NHDS is based on a three-stage sample design. The first-stage sample was a subsample of the Primary Sampling Units (PSUs) used for the 1985 National Health Interview Survey (NHIS). PSUs are geographic areas usually defined by a county or group of counties. Within this first-stage sample of PSUs, a sample of approximately 540 hospitals was selected with probability proportional to size (as measured by the annual number of discharges) after stratifying by hospital specialty/bedsize class and abstract service status. The SMG Hospital Data File was used as the sampling frame. Every three years, a “birth” sample is added to the hospital sample to account for new hospital construction since the previous update. The hospital response rate is approximately 96 percent.
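
The exact hospital-selection algorithm is not spelled out here; the sketch below uses a generic textbook method, systematic PPS sampling on the cumulated size measure (annual discharges), with an invented frame, purely to illustrate what selection with probability proportional to size means at this stage.

    import random

    def pps_systematic(frame, size_key, n):
        """Systematic PPS: select n units with probability proportional to size.

        A generic illustration, not NCHS's documented procedure. Units larger
        than the sampling interval can be hit more than once (certainty units).
        """
        total = sum(unit[size_key] for unit in frame)
        step = total / n
        start = random.uniform(0, step)
        points = [start + i * step for i in range(n)]
        sample, cumulative, idx = [], 0.0, 0
        for unit in frame:
            cumulative += unit[size_key]
            while idx < n and points[idx] <= cumulative:
                sample.append(unit)
                idx += 1
        return sample

    # Invented frame of hospitals, with annual discharges as the size measure.
    hospitals = [{"id": i, "discharges": random.randint(500, 20000)} for i in range(200)]
    print(len(pps_systematic(hospitals, "discharges", 20)))  # 20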

The third stage of selection is a systematic sample of approximately 300,000 discharges. Discharge data are collected using one of two methods: manual abstraction of data from medical records by hospital staff or by Census field staff, or the purchase of electronic data from state data systems, commercial abstracting services, or directly from hospitals. The emphasis on electronic data collection is evidenced by the fact that about one-third of the hospitals participate via an automated mode, yet they provide about two-thirds of the sample records due to differential sampling in these facilities. While collecting data in automated form has the advantage of making the discharge sampling easier, that advantage is mitigated by the increased complexity due to the diversity of formats and time frames for obtaining these data.

The NHDS discharge data set conforms to the Uniform Hospital Discharge Data Set (UHDDS) and includes admission and discharge dates; date of birth; sex; race; ethnicity; marital status; expected sources of payment; discharge status; disposition; diagnoses; surgical and diagnostic procedures; and dates of procedures. Hospital characteristics, such as current ownership, bedsize, and geographic location, are also collected. We are currently investigating the feasibility of collecting admission type and source, and of collecting more clinically oriented data in future survey cycles.

As currently constituted, the NHDS does not include Federal hospitals. However, we are exploring the possibility of reporting data from the Veterans Administration, the Department of Defense, and the Indian Health Service along with data from the NHDS.

Improvements in medical technology, advances in anesthesia, and efforts to control and reduce costs have had the effect of increasing ambulatory surgery and moving many surgical procedures from the inpatient to the ambulatory setting. This movement has been dramatic and continues. At this point, the majority of all surgeries in the U.S. are performed on an outpatient basis. The National Survey of Ambulatory Surgery (NSAS) was developed to measure the quantity and type of surgery done on an outpatient basis and, in conjunction with the NHDS, to measure the movement of surgery from one setting to another. In the NSAS, ambulatory surgery referred to surgical and non-surgical procedures performed on an ambulatory (outpatient) basis in hospitals or freestanding centers. The NSAS was implemented in 1994 and conducted through 1996. The NSAS has not been conducted since 1997 due to budgetary constraints.

The NSAS may be viewed as two separate surveys, since it included independent samples of freestanding ambulatory surgery centers and of hospitals that perform ambulatory surgery. The NSAS used a multistage design similar to that used for the NHDS: a first stage of PSUs, a second-stage sample of facilities drawn from within the sampled PSUs, and a third-stage sample of discharges selected from within the facilities using either operating room logs or computer-generated lists of ambulatory surgery cases. The freestanding ambulatory surgery centers were sampled from the SMG Freestanding Outpatient Surgery Center Database and the Health Care Financing Administration’s Provider of Services file. Hospitals were sampled from the SMG Hospital Market Data Base using the same definition of “hospital” that is used for the NHDS. The NSAS included a sample of 751 facilities (333 freestanding centers and 418 hospitals). The facility response rates were approximately 70 percent and 90 percent for the freestanding facilities and hospitals, respectively.

Locations within hospitals that were included in the NSAS were main operating rooms, dedicated ambulatory surgery locations, laser procedure rooms, and other specialty rooms, such as endoscopy units and cardiac catheter laboratories. Freestanding ambulatory surgery centers and locations within hospitals that specialized in or were dedicated to dentistry, podiatry, pain block, abortion, family planning, or small procedures were not included. Each year, responding facilities completed approximately 120,000 abstract forms. The NSAS abstract form was similar to the NHDS form, with modification of some items, such as discharge status, and inclusion of items on the use of anesthesia.

Ambulatory Care

The NHCS includes two surveys conducted to provide data on the utilization of ambulatory care services in physicians’ offices and hospital emergency and outpatient departments.

The National Ambulatory Medical Care Survey (NAMCS) is a national probability sample survey of non-Federal, office-based physicians who are principally engaged in office-based patient care practice, but not in the specialties of anesthesiology, pathology or radiology. Telephone contacts and non-office visits are excluded. The NAMCS was redesigned as part of the NHCS in 1989 and has been conducted annually since then. The NAMCS is based on a three-stage sample design with samples of PSUs, physicians and office visits. The physician sample is selected from the master files maintained by the American Medical Association and the American Osteopathic Association. Each year, a sample of approximately 2,500-3,000 physicians is drawn from the universe of about 600,000 physicians. About 70-75 percent of the sample physicians agree to participate in the survey.

At the third stage, office visits are selected from within the annual practices of the sample physicians. Physicians are randomly assigned to a one-week reporting period, and a systematic random sample of approximately 30 visits is selected from the physician's practice using a patient log or register. Annually, physicians or their staff complete patient encounter forms for approximately 25,000-30,000 sample visits.
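
A hedged sketch of that third-stage step follows: a systematic random sample of roughly 30 visits drawn from a week's patient log. The log contents and the take-all rule for very small practices are illustrative assumptions, not the published NAMCS field procedure.

    import math
    import random

    def systematic_visit_sample(patient_log, target=30):
        """Select about `target` visits from a patient log at a fixed interval."""
        if len(patient_log) <= target:
            return list(patient_log)        # assumed: small practices keep all visits
        interval = math.ceil(len(patient_log) / target)
        start = random.randrange(interval)  # random start within the first interval
        return patient_log[start::interval]

    week_of_visits = ["visit_%03d" % i for i in range(240)]
    print(len(systematic_visit_sample(week_of_visits)))  # about 30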

Current data items include the date of visit; date of birth; sex; race; ethnicity; expected source(s) of payment; principal complaint(s), symptom(s), or other reason(s) for visit; cause of injury; diagnoses; diagnostic and screening services; therapeutic and preventive services, including medication therapy; disposition; providers seen; duration of visit; and selected physician characteristics.

The NAMCS survey methods are conducive to collecting special-purpose programmatic data. For example, in collaboration with the National Heart, Lung, and Blood Institute, the 1993 NAMCS oversampled allergists and pulmonary disease specialists to provide information on the treatment of asthma in physicians’ offices. Additionally, data on physicians’ knowledge and treatment practices for patients with asthma were collected via a self-administered questionnaire.


A NAMCS Complement Survey was conducted in 1997-99 to estimate the number of office visits made to physicians who are not classified as non-Federal and office-based, i.e., physicians not included in the NAMCS. These data will update data collected in 1980, which indicated that 17 percent of physicians not routinely included in the NAMCS provided some office-based care and accounted for about 10 percent of all office visits, and will be combined with the NAMCS data to provide a more complete picture of the medical care provided in physicians’ offices.

While the majority of ambulatory care is performed in physician offices, a significant proportion is provided in hospital emergency departments (ED) and outpatient department clinics (OPD). The National Hospital Ambulatory Medical Care Survey (NHAMCS) was initiated in 1992, and has been conducted continuously since that time, to measure utilization in these hospital settings. The NHAMCS is based on a four-stage sample design. Within a first-stage sample of PSUs, a sample of 600 non-Federal, short-stay and general hospitals (using the same hospital definition as the NHDS) is selected with probability proportional to the size of the emergency and outpatient departments (as measured by the annual volume of visits) after stratifying by type of service. Sample hospitals are randomly assigned to one of 16 rotating panels and a four-week reporting period; since 16 panels of four weeks cover about 64 weeks, each sample hospital rotates into the NHAMCS approximately once every 15 months.

Within each sample hospital, the ED and up to five OPD clinics are selected. The OPD clinics are selected with probability proportional to size based on the expected number of visits. A systematic sample of visits is selected from patient logs or registers. Physicians, hospital staff, or Census Bureau field staff complete an encounter form for each sampled visit. The data may be provided prospectively, i.e., at or near the time of the visit, or retrospectively from the medical record. Annually, about 440 hospitals respond to the survey, providing data on about 70,000 sample visits.

The data items on the OPD patient record form parallel those collected in the NAMCS. The ED form is similar to the NAMCS and OPD forms, but contains data items specific to the care provided in EDs, e.g., the immediacy with which the patient should be seen, time spent waiting to see the physician, and disposition of the visit. Cause of injury has been collected on the ED form since 1992. Starting in 1995, this item and other injury-related items, such as place of occurrence and whether the injury was work related, were added to the ED, OPD, and NAMCS forms. The similarity of the NAMCS and NHAMCS patient record forms and the independent designs of the ambulatory care surveys allow the results to be readily combined, permitting analysis of the variation in utilization by health care setting, condition, and patient race and sex.

Long Term Care

The NHCS includes two surveys that document the utilization of long-term care services in nursing homes, home health agencies and hospices. Data are collected about the facilities, their services and staff, as well as on the personal and health characteristics of their residents or clients.

After a hiatus in data collection since 1985, the National Nursing Home Survey (NNHS) was conducted in 1995 and in the odd-numbered years since. The NNHS includes homes with three or more beds that were staffed for use by residents and routinely provided nursing and personal care services. The 1995 NNHS utilized a two-stage sample design. The first-stage probability sample of 1,500 nursing homes was selected from the universe of 17,500 facilities, which had been stratified by bedsize and certification status (Medicare or Medicaid certified as a skilled nursing or intermediate care facility). The sampling frame was the 1991 National Health Provider Inventory (NHPI), supplemented with facility listings obtained from the States. At the second stage, a random sample of up to six current residents was selected from the list of residents currently receiving care from the facility.

Three questionnaires were used to collect the data. A Facility Questionnaire was completed with the facility administrator or designee. The Current Resident Questionnaire was completed via personal interview with the staff member familiar with the care received by the resident and with the medical record of the resident. Data items include patient demographics, expected sources of payment, charges, diagnoses, services provided, health status, referral and length of service or stay. For the 1995 survey, a self-administered expense questionnaire was used to collect expense and revenue information from the facility accountant or designee.

The NNHS has served as a vehicle for collecting data needed by other Federal agencies. For example, data on immunization practices and services in nursing homes were collected for the National Immunization Program (CDC) in 1995, 1997, and 1999, and data on the provision of dental services to nursing home residents were collected for the National Institute of Dental Research in 1995 and 1997.

The National Home and Hospice Care Survey (NHHCS) has proved valuable in measuring the dramatic changes that have occurred in the home health industry during the 1990s. The survey was conducted in 1992-94 and in the even-numbered years since 1994. From 1992 to 1994, the NHHCS utilized a three-stage stratified sample design based on a first-stage sample of PSUs and a second-stage sample of approximately 1,500 hospices and home health agencies selected from the 1991 NHPI; a sample of six current patients and six discharged patients was randomly selected from each sample agency. Beginning in 1996, the NHHCS moved to a two-stage sample design with a first-stage sample of facilities and a second-stage sample of residents.

Data collection procedures for the NHHCS parallel those used in the NNHS, with facility data collected from personal interviews with administrators and resident data collected from staff. Facility data include ownership; Medicare/Medicaid certification; number and type of patients; services provided; and number and type of employees. Patient data are obtained from personal interviews with caregivers who consult the patient's medical record, and include date of birth; sex; race; ethnicity; Social Security Number; marital status; date of enrollment; referral status; expected source(s) of payment; charges; sources of informal care; living arrangement; physical limitations; functional impairment; diagnoses; number of visits; services received; and, for discharges, the outcomes of care. Facility response rates have increased since the survey’s inception; current facility response rates are about 95 percent. Like the NNHS, the NHHCS has also served as a vehicle for collecting data for other agencies. Data on the use of home medical devices were collected for the Food and Drug Administration in 1996 and data on “homebound” care were collected for the Office of the Assistant Secretary for Planning and Evaluation in 1998.

We recently completed a methodological study to investigate the use of computer-assisted interviewing techniques and WEB-based technology in collecting NHCS data. This research suggested that several of the component surveys, in particular the two long-term care surveys, were excellent candidates for these forms of data collection.

Data Dissemination

We conduct an active program to disseminate data from the NHCS. Data are released in published form through NCHS publications such as the Vital and Health Statistics series reports, Advance Data, and Health US, and through journals and other printed media. The NCHS WEB site (www.cdc.gov/nchs) is a valuable source of information on the NHCS. This WEB site includes the NCHS Internet publication series, called Health E-Stats, which is designed to provide for quick release of information on topics of significant importance, such as the 1997 NHDS data that highlighted the decline in hospitalizations for AIDS and the increase in length of stay for childbirth.

Public-use micro-data files are created and disseminated in a variety of formats for each year that the surveys are conducted. Many of these public-use datasets can be downloaded from the NCHS WEB site. In addition, micro-data trend files that readily permit trend analysis have been produced for several of the components. For the NHDS, a micro-data file containing 18 years of data has been released and, for the NAMCS, a micro-data file containing 20 years of data in 5-year intervals has been released. NHCS data are also available for special-purpose analyses through the NCHS Research Data Center. In the near term, we are preparing to conduct a customer satisfaction survey to solicit information on the usefulness of our current products and ideas for future dissemination products.

Conclusion

The NHCS is a dynamic data collection effort designed to provide needed information on the utilization of health care resources in the U.S. With its continued development and implementation, this data collection system will serve as a valuable resource for monitoring changes in the delivery of health care, and the effectiveness and quality of care provided to a changing U.S. population.

REFERENCES:

Burt CW, Knapp DE. Ambulatory care visits for asthma: United States, 1993-94. Advance data from vital and health statistics; no. 277. National Center for Health Statistics. 1996.


Data processing review and quality improvement issues for the National Health Interview Survey and National Health Care Survey. Contract No. GS-23F-8152H. Washington, D.C.

Gabrel CS. An overview of nursing home facilities: Data from the 1997 National Nursing Home Survey. Advance data from vital and health statistics; no. 311. National Center for Health Statistics. 2000.

Gardocki GJ, McLemore T, DeLozier JE. The National Ambulatory Medical Care Survey Complement Survey: United States, 1980. National Center for Health Statistics. Vital Health Stat 13(77). 1984.

Graves EJ. National Hospital Discharge Survey: Annual summary, 1988. National Center for Health Statistics. Vital Health Stat 13(106). 1991.

Haupt B. Development of the National Home and Hospice Care Survey. National Center for Health Statistics. Vital Health Stat 1(33). 1994.

Haupt BJ. An overview of home health and hospice care patients: 1996 National Home and Hospice Care Survey. Advance data from vital and health statistics; no. 297. National Center for Health Statistics. 1998.

Hall MJ, Lawrence L. Ambulatory surgery in the United States, 1996. Advance data from vital and health statistics; no. 300. National Center for Health Statistics. 1998.

McCaig LF, McLemore T. Plan and operation of the National Hospital Ambulatory Medical Care Survey. National Center for Health Statistics. Vital Health Stat 1(34). 1994.

McLemore T, Bacon WE. Establishment Surveys of the National Center for Health Statistics. Proceedings of the International Conference on Establishment Surveys. Buffalo, New York. American Statistical Association. 93-98. 1993.

McLemore T, Lawrence L. Plan and operation of the National Survey of Ambulatory Surgery. National Center for Health Statistics. Vital Health Stat 1(37). 1997.

National Center for Health Statistics. Health, United States, 1999 with Health and Aging Chartbook. Hyattsville, Maryland: 1999.

Schappert SM. Ambulatory care visits to physician’s offices, hospital outpatient departments, and emergency departments: United States, 1997. National Center for Health Statistics. Vital Health Stat 13(143). 1999.

Schappert SM. National Ambulatory Medical Care Survey: 1989 summary. National Center for Health Statistics. Vital Health Stat 13(110). 1992.

SMG Marketing Group, Inc. Hospital Market Database. Chicago, Illinois. Healthcare Information Specialists.

Strahan GW. An overview of nursing homes and their current residents: data from the 1995 National Nursing Home Survey. Advance data from vital and health statistics; no. 280. National Center for Health Statistics. 1997.

Wunderlich GS, ed. Toward a National Health Care Survey. Washington, D.C. National Academy Press. 1992.


Measuring Energy Use in Commercial Buildings in Canada
By André Bourbeau, Policy Research Directorate, Environment Canada¹
10 Wellington, 24th floor, Hull, Quebec K1A 0E3, Canada
[email protected]

¹ This paper is based on recent findings from a study coordinated by the author when he was Head, Data Development, at the Office of Energy Efficiency, Natural Resources Canada. The author would like to thank Frank Trimnel for his major contribution to the development and implementation of the pilot study.

ABSTRACT

Energy efficiency is an essential element of Canada’s strategy for addressing climate change. As part of a major initiative, Natural Resources Canada’s Office of Energy Efficiency (OEE) is undertaking the first national Commercial Building Energy Use Survey (CBEUS) in Canada. This survey will make it possible to establish an information baseline for determining the impacts of the OEE’s Commercial Building Incentive Program (CBIP), such as its indirect effects on the design of new buildings and on commercial building sector energy use. The survey involves collecting energy intensity data for various types of commercial buildings in Canada with a view to obtaining national estimates by vintage, building type and size. The paper presents the innovative aspects of the proposed survey, by which the OEE intends to establish a survey comparable to the EIA’s Commercial Building Energy Consumption Survey (CBECS) while working with budget constraints that are more severe than those applied to the U.S. study. Preliminary research, including a major pilot study examining various sampling strategies and survey methods, was carried out in enumeration areas distributed in five regions of Canada. Both a core survey, focusing on commercial building energy intensity, and a supplement survey, focusing on energy end-use information, were tested to assess response rates and the accuracy of the data collected; in addition, secondary data sources that could possibly be used by the survey were assessed, modeling was reviewed, and results and recommendations were provided before moving forward on full-scale data collection. The paper presents the main findings of the pilot study and makes recommendations for the survey implementation.

1. INTRODUCTION

Collecting accurate energy use data is necessary to help Canada measure its progress in meeting emission targets set under the Kyoto Protocol and to make effective policy decisions. But gathering reliable data on the commercial building sector has always been difficult because of its large size, its wide range of buildings and activities, and the diverse range of energy-using equipment and decision-makers involved. In fact, no national commercial building energy use survey has ever been undertaken in Canada.

To remedy this situation and, eventually, to facilitate data collection, the Office of Energy Efficiency (OEE) commissioned a series of studies to develop a methodology for collecting segment energy intensity data and segment energy end-use intensity data.

The first of these studies was carried out in 1995 at the Canadian Commercial Energy End-use Data and Analysis Centre (CCEEDAC). Based on its results, the OEE commissioned a feasibility study in 1997 to assess alternatives for collecting commercial sector energy-use data in Canada. The study identified two distinct objectives with respect to the collection of energy end-use data: 1) the collection of segment energy intensity data; and 2) the collection of segment energy end-use intensity data. A strategy for data collection was also proposed. Key elements of this strategy are as follows:

• It is important not to compromise the integrity of segment energy intensity data by trying to collect more data than is necessary to meet the objective, or more than can be collected with any reasonable degree of accuracy.

• To ensure data accuracy, technical energy end-use data must be collected solely from energy specialists in the field. In identifying specialists in different segments, the heterogeneity of the commercial sector must be taken into consideration.

• Existing non-survey data can supplement collected survey data.

The first major recommendation of the feasibility study was to conduct a core survey that would collect data so that accurate segment energy intensity data could be measured. The sample used in such a survey must therefore be representative of all segments. The second recommendation was to conduct a second survey to collect in-depth and accurate information on energy end-use and equipment from specialists in the field. In the report, the segment used as an example was “large” office buildings, in which energy-using equipment is maintained by specialists and energy-savings retrofits are carried out by Energy Service Companies.

The feasibility study also recommended pilot testing the elements of the strategy. The pilot test would comprise the following four components:

• Pilot testing of the commercial building energy intensity survey -- Is it possible to collect basic segment energy intensity data in a full-scale survey?

• Pilot testing of the energy end-use survey -- Is it possible to identify contacts in the field and collect in-depth and accurate data from them?

• Assessment of other sources of information -- Can non-survey data be used?

• Issues related to data sorting and modeling -- What data and methodology are required to estimate energy intensities?

In 1998, the OEE carried out a pilot study of the proposed surveys. The results were summarized in a document entitled Commercial Building Energy Use Survey Phase II—Pilot Testing Coordination Report. This paper summarizes the main findings from the pilot study.

2. ENERGY INTENSITY PILOT SURVEY

The first survey is intended to collect data on energy use intensity and a limited range of building characteristics from a representative set of buildings. The general conclusion that could be drawn from the pilot study is that basic data for segment intensity can be collected in a full-scale survey. The low response rate obtained in the pilot survey, however, confirmed that the survey should be designed to reduce respondent burden to a strict minimum. Only a minimum amount of data should be collected in this survey (building use, square footage and energy bills). The low response rate also warned that, in order to improve participation, the methodology of the energy intensity survey should take account of the diversity of decision making in the commercial sector, which affects which, and how many, individuals have access to the required information. Similarly, the survey, including enumeration and surveying, must be tailored to the unique characteristics of each segment. For example, where a well-documented, accurate listing of the population of a segment or sub-segment exists, this information should be used rather than proceeding with an area enumeration approach.
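To make the intensity calculation concrete, the sketch below (Python is used purely for illustration; the paper contains no code, and the field names, unit conversions and example values are assumptions) shows how segment energy use intensity could be derived from the three core data items named above: energy bills converted to a common unit and divided by floor area, then averaged within each building-use segment.

```python
# Minimal sketch (hypothetical field names): segment energy use intensity (EUI)
# from the three core survey items: building use, floor area, and energy bills.
from collections import defaultdict

# Assumed unit conversions to equivalent kWh (illustrative values only).
TO_EKWH = {"kWh": 1.0, "GJ": 277.78, "m3_gas": 10.35}

def building_eui(floor_area_m2, bills):
    """Annual energy use intensity in ekWh per square metre for one building.

    bills: list of (quantity, unit) tuples covering one year of energy purchases.
    """
    total_ekwh = sum(qty * TO_EKWH[unit] for qty, unit in bills)
    return total_ekwh / floor_area_m2

def segment_intensities(records):
    """Average EUI by building-use segment (unweighted; a real survey would
    apply design weights from the sample design)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        eui = building_eui(rec["floor_area_m2"], rec["bills"])
        sums[rec["building_use"]] += eui
        counts[rec["building_use"]] += 1
    return {seg: sums[seg] / counts[seg] for seg in sums}

# Example: one office building, 2,000 m2, electricity plus natural gas.
records = [{"building_use": "office", "floor_area_m2": 2000.0,
            "bills": [(350000, "kWh"), (400, "GJ")]}]
print(segment_intensities(records))  # {'office': ~230.6 ekWh/m2}
```

In a production estimate, the within-segment average would of course use the survey design weights rather than a simple mean.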

The energy intensity pilot survey, examining various sampling strategies and survey methods, was carried out in enumeration areas distributed among five regions of Canada. Some of the practical sample design issues that emerged are:

• There is no basis, as yet, in Canada for a universal commercial building list;

• Special lists of buildings can be developed on a comprehensive basis for only four segments (hospitals/in-patient care, primary and secondary schools, colleges and universities, and enclosed shopping malls) and six sub-segments (large office buildings, large hotels, food sales, other retail, food services, services excluding food, and government-owned buildings), out of the sixteen segments covered by the survey;

• Area sampling is required to establish the overall universe of commercial buildings; and

• Establishment data rolled up to "unique municipal address" overrepresents commercial buildings by a ratio of 1.7 to 1.

The need for a segmented approach to the heterogeneous commercial building sector emerged as a conclusion after the completion of the pilot test. To be successful, the survey methodology will need to be adapted to each segment. For this segmented approach to work, it is necessary to contact associations and major organizations to prepare the way for the survey by creating building lists and contact-name lists and by collecting existing energy intensity data, such as floor space data, where available. Such an approach would require the establishment of a "clearinghouse" whose mandate would be to oversee and coordinate an efficient linkage between databases and to protect the confidentiality of the information provided. However, this approach, as efficient as it may be in improving the quality and richness of the survey, could also create major delays in the delivery of survey results.

The recommendations for data collection include surveying all segments (including small mixed commercial/residential and multi-residential buildings), at least at the enumeration stage, and surveying buildings down to a minimum of 1,000 square feet. The personal interview is the most acceptable form of survey methodology. Billing data from either establishments or utilities is an acceptable alternative source of energy consumption data. Finally, it was also recommended to complement data collection activities in segments where effective potential leverage exists through established communication channels (for enlisting support, encouraging participation, helping to identify contact persons, and improving survey efficiency).

3. ENERGY END USE PILOT SURVEY

The energy end-use survey is intended to collect detailed data on building characteristics, including occupancy duration, envelope, mechanical, electrical and lighting systems and, for some sectors, refrigeration and cooking. From these data and those collected in the energy intensity survey, the energy performance of the building can be modeled to determine energy end-use intensities. Based on the assumption that this information could be more easily acquired for larger buildings, these were targeted in the pilot phase.

In the feasibility study, the best source of this information was determined to be the organizations providing mechanical and electrical services. In the pilot project, meetings were held with the 11 largest national building service companies (BSCOs). All of them agreed to cooperate.

The limitation on the data from BSCOs has proven to be the growth of property management in large buildings in this decade. These companies tend to break up maintenance and service contracts into their component parts and ask several BSCOs to bid on the provision of their services, usually on an annual basis. Most BSCOs, therefore, only provide service on major mechanical or electrical components, not on all systems comprehensively. In addition, confidentiality agreements between contractors and building owners limit direct access to the data. Permission would need to be obtained from building owners, adding one additional barrier/step to the collection of the needed information.

To address this limitation, a second approach was investigated, with large property management firms as the source of data. They were asked to provide data on buildings in three cities (Montreal, Vancouver and Ottawa) and in two building segments (Office and Enclosed Shopping Malls), using the two surveys being pilot tested: the energy intensity survey and the energy end-use survey.

Results were somewhat encouraging. With the onset of electricity deregulation in Ontario and elsewhere in Canada, these organizations are moving to collect energy intensity data for their entire portfolios into a central database. This might ensure simple and accurate acquisition of these data from these organizations in the future. The energy end-use data, however, will continue to reside with the on-site property manager. The pilot showed that, when directed by their head office to respond to the survey, on-site staff were capable of completing the energy end-use survey questionnaire in most cases. However, confidentiality agreements could also limit data access.

After the pilot test, it was concluded that the energy end-use survey could be applied only where data and sufficient leverage currently exist to ensure good response, for example, by leveraging the contacts of the OEE's Energy Innovators program in the Elementary and Secondary Schools segment. Data could be organized and archived at the Canadian Commercial Energy End-use Data and Analysis Centre. These data would prove valuable in benchmarking and would improve the precision of the modeling used to determine energy end-use intensity values.

4. REVIEW AND ASSESSMENT OF NON-SURVEY DATA SOURCES

The overall goal of this initiative was to assess existing non-survey sources of data (other than large/national BSCOs and Energy Service Companies, or ESCOs) in order to improve the efficiency of the survey sampling strategy and methodology and, it was hoped, the accuracy of the energy end-use and intensity estimates. This project was divided into two stages. The first stage focussed on identifying and assessing sources of data from a general perspective. Sources fell into one or more of the following classes:

• Those that could provide lists of buildings, general information on buildings, aggregate information, and/or building contacts;

• Those that could provide general data on specific buildings; and

• Those that could provide detailed energy data on specific buildings.

Such sources include federal, provincial and municipal government buildings; government program information; assessment authorities; BSCOs; utilities; commercial property managers; and facilities managers of post-secondary schools, elementary and secondary schools, hospitals, restaurants, financial services and other associations.

The second stage operationalized the initial findings by implementing a pilot survey process in the Vancouver area, using a methodology adjusted to take account of these potentially usable sources. The conclusion is that there exists a set of segments within the commercial sector for which one typical approach is appropriate, and another set of segments for which different approaches are required. Take, for example, hospitals. The Vancouver test provided information on the best source of detailed information about each hospital: in this case, it is known that each hospital has a facilities engineer or manager who will be the most knowledgeable person for a given hospital building. As well, these buildings are so large and complex that an in-person audit must be conducted to ensure completion of the survey forms.

5. ANALYSIS OF ENERGY USE SURVEY RESULTS

We undertook a review of various methodologies for estimating energy use intensities (EUIs, in kWh/m2) and end-use intensities (EEUIs). With respect to EEUIs, a combination of elements of Conditional Demand Analysis (CDA) and engineering estimates was recommended. In order for some of the analyses to be feasible, this initiative provided direction on the variables to be collected. For EUIs, the energy intensity survey must identify the proportion of a building devoted to specific activities, per segment and source of energy. Among the variables that could be significant in explaining energy consumption by end-use and by building type are weather variables; size of building; vintage; occupancy; and number of workers or any other variable that could be used as a proxy for the size of the building.
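As an illustration of the kind of modeling referred to above (the pilot's actual model specification is not reproduced here; the variable names, coefficients and synthetic data below are assumptions), a conditional-demand-style regression relates whole-building intensity to end-use indicators and building characteristics, so that the fitted coefficients can be read as rough end-use intensities:

```python
# Hedged sketch of a conditional-demand-style regression: whole-building EUI is
# regressed on end-use presence indicators and building characteristics, and the
# fitted coefficients are interpreted as approximate end-use intensities (EEUIs).
# Variable names and model form are illustrative assumptions, not the pilot's model.
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Survey-style predictors: end-use indicators and building characteristics.
has_cooling = rng.integers(0, 2, n)          # 1 if the building has space cooling
has_cooking = rng.integers(0, 2, n)          # 1 if commercial cooking is present
heating_dd = rng.normal(4500, 600, n)        # heating degree-days (weather proxy)
workers_per_m2 = rng.uniform(0.01, 0.05, n)  # occupancy proxy

# Synthetic "observed" EUI (ekWh/m2), for the sake of a runnable example.
eui = (120 + 60 * has_cooling + 80 * has_cooking
       + 0.02 * heating_dd + 900 * workers_per_m2
       + rng.normal(0, 10, n))

X = np.column_stack([np.ones(n), has_cooling, has_cooking, heating_dd, workers_per_m2])
coef, *_ = np.linalg.lstsq(X, eui, rcond=None)

names = ["base load", "cooling", "cooking", "per heating degree-day", "per worker/m2"]
for name, c in zip(names, coef):
    print(f"{name:>24s}: {c:8.2f}")
```

In practice the engineering estimates mentioned above would be used to constrain or validate coefficients that a pure regression cannot identify reliably.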

6. RECOMMENDATIONS FOR THE IMPLEMENTATION OF THE FIRST COMMERCIAL BUILDING ENERGY USE SURVEY IN CANADA

The main conclusion from the pilot study is that a large-scale energy intensity survey could be effectively implemented if the survey methodology is adapted, making full use of alternative sources of data and of information on different decision-making processes. The commercial building sector is not homogeneous, and much would be gained by adapting the survey methodology accordingly.

With respect to the energy end-use survey, it was found that a full-scale survey would not be cost effective. It is difficult to collect accurate information in a survey as large and technical as an energy end-use survey. However, it was determined that contacts for narrowly defined segments of buildings could be identified and that these specialists could possibly provide accurate end-use data. Consequently, segments have to be narrowly defined before results can be generalized.

In addition, since modeling must be combined with survey data to develop end-use estimates, the generalized results of the estimates will not be statistically representative. The use of survey data, non-survey data and engineering models for estimating energy end-use is, however, common practice.

The OEE is currently undertaking a building energy intensity survey in cooperation with Statistics Canada. Plans are to undertake a "basic" building energy intensity survey followed by an energy suppliers survey in 2000/2001. More details on survey implementation, as well as copies of the survey development reports, are available on the OEE web site at www.oee.nrcan.gc.ca.

References

Anderson, W. P. (1985), Commercial Sector Energy End-Use Data in Canada: Recommendations for a National Data Collection Strategy, Canadian Commercial Energy End-use Data and Analysis Centre (CCEEDAC), McMaster Institute for Energy Studies, McMaster University.

Trimnell, F., B. Bach and R. Robinson, ARC Applied Research Consultants and Engineering Interface Limited for the Office of Energy Efficiency, Natural Resources Canada (1997), A Detailed Strategy for Commercial Sector Data Collection in Canada.

Cockburn, J., A. Bourbeau and B. Gobeil (1998), Natural Resources Canada's Commercial Building Incentive Program, 1998 ACEEE Summer Study on Energy Efficiency in Buildings.

Nixey, D. and P. Fuller, The Corporate Research Group for the Office of Energy Efficiency, Natural Resources Canada (1998), CBEUS Phase II Pilot Study General Commercial Building Survey: Component 1 Final Report.

Moffat, S. et al., Sheltair Scientific Limited Group for the Office of Energy Efficiency, Natural Resources Canada (1998), Review and Assessment of Data Sources for Commercial Sector Energy Consumption: Component Three Final Report.

Robinson, R., B. Bach, G. Lafrance, M. Chiarelli and M. Singleton, ARC Applied Research Consultants Group for the Office of Energy Efficiency, Natural Resources Canada (1998), Commercial Sector Energy Efficiency Data Collection—Phase II: Analysis of CBEUS Survey Results.

Bach, B., Engineering Interface Limited for the Office of Energy Efficiency, Natural Resources Canada (1999), Commercial Building Energy Use Survey Component 2b: Pilot Survey of Property Management Firms in the Office and Enclosed Shopping Mall Segments.

Trimnel, F., ARC Applied Research Consultants for the Office of Energy Efficiency, Natural Resources Canada (1999), Commercial Building Energy Use Survey Phase II—Pilot Testing Coordination Report.


SURVEYING RESEARCH AND DEVELOPMENT FUNDING AND PERFORMANCE BY NONPROFIT ORGANIZATIONS

Ronald S. Fecso, John E. Jankowski and Mary V. Burke, National Science Foundation, and Roger Tourangeau and Margrethe Montgomery, The Gallup Organization

Ronald S. Fecso, Division of Science Resources Studies, National Science Foundation, 4201 Wilson Blvd., Suite 965, Arlington, Va. 22230

[email protected]

ABSTRACT

The Survey of 1996 and 1997 Research and Development Funding and Performance by Nonprofit Institutions collected information on the science and engineering (S&E) research and development (R&D) activities of nonprofit organizations (NPOs). It collected data both from NPOs that fund S&E R&D and from those that perform R&D themselves. This paper discusses the methodology used for the survey, last fielded in 1973, and problems encountered in the effort.

1. INTRODUCTION

The Survey of R&D Funding and Performance by Nonprofit Institutions was done to meet needs for information about the role of the nonprofit sector in funding and conducting S&E R&D in the United States. The contractor to NSF for this survey was the Gallup Organization. The overall target population for the survey was nonprofit organizations that fund or perform S&E R&D of at least $250,000 per year (compared to the 1973 study, where the cut-off was $100,000). This time, the study encompassed two waves of data collection because the first wave yielded a lower than expected response rate. This paper presents some of the major difficulties encountered during the survey, as well as attempted solutions and outcomes.

2. SAMPLING FRAME

Two frames were needed: possible research funders (independent nonprofit institutions funding S&E R&D) and possible research performers (independent nonprofit institutions conducting science and engineering research and/or development). The final frame for the performers sample was assembled from several lists. The most important of these was the list of organizations filing Form 990 tax returns as nonprofit organizations for the year 1996. The IRS database had many important advantages for generating the list of possible performers of S&E R&D. First, it was thought to be the most complete list available. Second, all of the entries on the IRS database were, by definition, independent entities keeping their own sets of books. Thus, NSF and Gallup incorrectly assumed that the IRS list contained no duplicate entries. Finally, the 990 database contained good measures of size and location, and it included unique Employer Identification Numbers (EINs). The other candidate databases all had serious gaps.

An alternative to the IRS list was the Research Centers Directory (RCD). Unfortunately, this source had serious limitations making it unsuitable. Specifically, it did not include good measures of size for the organizations it listed, it excluded associations and hospitals, and it omitted most of the nonprofit R&D performers that received Federal R&D funds, according to Federal agencies surveyed in NSF's Federal Funds Survey (FSS). The RCD was used as a source of locating information about the sampled NPOs but was not directly used in the sample selection process.

The IRS database also had some weaknesses, and, as a result, several supplementary lists were used in constructing the final sampling frame for the performers study. The most serious drawback to the IRS database was that it contained a relatively low proportion of NPOs that were likely to be eligible for the study. The full IRS file for 1996 included approximately 600,000 NPOs. Gallup purchased a modified list of NPOs from the National Center for Charitable Statistics (NCCS, a subsidiary of the Urban Institute). NCCS dropped approximately 50,000 private foundations that were unlikely to carry out R&D activities themselves. In addition, it dropped another 280,000 NPOs that had gross receipts of less than $25,000 or were religious organizations; these NPOs were not required to file IRS 990 information returns. The final data set included 184,876 NPOs that had filed returns for 1996. Available estimates indicated that the number of NPOs performing R&D was far smaller, between 2,000 and 4,000.1 Thus, only about 1 to 2 percent of the NPOs on the IRS file were likely to be eligible for the performers sample.

1 Facing Toward Governments: Non-Governmental Organizations and Scientific and Technical Advice (1993), Carnegie Commission Report.


Four additional lists were merged with the IRS 990 file to allow us to sample eligible S&E R&D performers more efficiently. These were: a list of 443 organizations that took part in a 1973 NSF study of S&E R&D activities; the 2,899 organizations compiled in the FSS; a list of 15 nonprofit Federally Funded R&D Centers (FFRDCs) and 10 additional nonprofit organizations that administer FFRDCs; and a list of 1,138 teaching hospitals.
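A minimal sketch of this kind of frame assembly, assuming records can be keyed on EIN and that each supplementary list simply flags units already on the frame or adds new ones (the field names and toy data are hypothetical):

```python
# Hedged sketch of frame construction: merge supplementary lists into the IRS 990
# frame, keyed on EIN, recording each record's sources so that high-likelihood
# performers (e.g., FSS entries, FFRDCs, teaching hospitals) can be stratified
# separately later. List contents and field names are illustrative assumptions.

def build_performer_frame(irs_990, supplements):
    """irs_990: dict mapping EIN -> record dict for the base frame.
    supplements: dict mapping source name -> iterable of records with an 'ein' key.
    Returns a single frame keyed on EIN, with a 'sources' set per record."""
    frame = {ein: {**rec, "sources": {"irs_990"}} for ein, rec in irs_990.items()}
    for source, records in supplements.items():
        for rec in records:
            ein = rec["ein"]
            if ein in frame:
                frame[ein]["sources"].add(source)          # already on the frame: flag it
            else:
                frame[ein] = {**rec, "sources": {source}}  # add a new unit to the frame
    return frame

# Toy example with two base records and two supplementary lists.
irs_990 = {"11-0000001": {"ein": "11-0000001", "name": "Alpha Institute"},
           "11-0000002": {"ein": "11-0000002", "name": "Beta Hospital"}}
supplements = {"fss": [{"ein": "11-0000001", "name": "Alpha Institute"}],
               "teaching_hospitals": [{"ein": "11-0000002", "name": "Beta Hospital"},
                                      {"ein": "11-0000003", "name": "Gamma Clinic"}]}
frame = build_performer_frame(irs_990, supplements)
print(len(frame), sorted(frame["11-0000001"]["sources"]))  # 3 ['fss', 'irs_990']
```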

2.1 Sample Stratification and Selection

Before Gallup classified the NPOs on the final sampling frame into strata, it analyzed NTEE codes to determine those likely to contain a high percentage of organizations eligible for the performers study. The NTEE is a system for classifying nonprofit organizations, originally developed to accompany the United Way taxonomy of goals for charitable organizations. In 1993, the IRS incorporated the NTEE coding system into its tax-exempt classification system to standardize coding between the IRS and the nonprofit community. In total, five sampling strata were created (Table 1). Gallup selected samples of potential performers and funders of S&E R&D and attempted to complete a short screening questionnaire with the sample NPOs to determine their eligibility for the main study. It then tried to collect more detailed information from those NPOs that were, according to their screener responses, eligible for the main study.

Expected differences in stratum eligibility rates implied that the cost of identifying eligible NPOs and completing data collection would vary markedly by stratum. In addition, the expected differences in mean S&E expenditures by stratum suggested that the within-stratum variances in S&E R&D expenditures (and other key NPO characteristics) would also differ. Gallup used an optimal allocation of the screening sample across the final three strata that took into account best guesses regarding eligibility rates, relative costs and within-stratum variation. The allocation formula was a variant of one discussed by Kish (Survey Sampling, 1965, New York: John Wiley, p. 406).
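For illustration only (the exact allocation formula and the input guesses Gallup used are not reproduced here), a Kish-style allocation spreads the screening sample in proportion to N_h * S_h divided by the square root of an effective unit cost, where that cost is inflated by the inverse of the guessed eligibility rate:

```python
# Hedged sketch of an optimal (Neyman/Kish-style) allocation of a screening sample
# across strata, using guessed eligibility rates, relative unit costs and
# within-stratum standard deviations. All input numbers below are illustrative only.
from math import sqrt

def allocate(total_n, strata):
    """strata: dict name -> (N_h, S_h, cost_h, elig_h).
    Allocation proportional to N_h * S_h / sqrt(effective cost per eligible unit)."""
    weights = {}
    for name, (N_h, S_h, cost_h, elig_h) in strata.items():
        effective_cost = cost_h / max(elig_h, 1e-9)  # screening cost per eligible case
        weights[name] = N_h * S_h / sqrt(effective_cost)
    total_w = sum(weights.values())
    # Cap each stratum's allocation at its frame size N_h.
    return {name: min(round(total_n * w / total_w), strata[name][0])
            for name, w in weights.items()}

# Illustrative strata: (frame size, guessed SD of R&D spending, unit cost, eligibility).
strata = {"teaching_hospitals": (972, 5.0, 1.0, 0.60),
          "high_likelihood":    (22107, 2.0, 1.0, 0.20),
          "low_likelihood":     (158661, 0.5, 1.0, 0.02)}
print(allocate(4450, strata))
```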

Table 1. Allocation of the Performers Sample by Stratum

Stratum   Selected for Screening   N in Frame   Sampling Fraction   Description
1                  435                  435           100%          Certainty selections
2                1,936                1,936           100%          FSS entries
3                  208                  972            21%          Teaching hospitals
4                3,953               22,107            18%          High likelihood filers
5                  289              158,661           < 1%          Low likelihood filers

Total            6,821              184,111

3. WAVE 1 DATA COLLECTION

The initial wave, described in this section of the report, encompassed a pretest, an initial attempt at screening the sample NPOs, and the first attempt to collect more detailed information from NPOs that proved to be eligible for the main performer or funder questionnaire. This initial wave of data collection encountered serious problems, and, as a result, a second wave of data collection was conducted (discussed in Section 4).

Between March and June of 1997, Gallup conducted a pretest that examined drafts of the key survey documents, including the screener questionnaire, the cover letter, and the funder and performer questionnaires. Nine organizations took part in the pretest of the performer materials and six organizations participated in the pretest of the funder materials. The designated contacts at each of these organizations were not asked to complete the questionnaires, but merely to offer suggestions on how to refine them.

The pretest participants expressed some concern about confidentiality, but most said that many of the items were already available to the public in either their 990 form or their annual reports. NSF decided not to make any items confidential, because NSF and researchers wanted to have as much detail as possible.

For the performer questionnaire, the pretest participants were concerned with the terms "research," "development," "basic research," "applied research," and the specific scientific fields listed in the questionnaire. Although they indicated their familiarity with these terms, participants noted that many NPOs did not keep their records in that format. The respondents would have to contact the researchers to find out which term applied to a project. Despite these misgivings, the questions based on these terms were kept in the questionnaire, since these data were considered vital to NSF/SRS publications. The pretest participants also expressed concern about the definitions of intramural and extramural. Definitions for these terms were added to the questionnaire.


3.1 Screener Survey

On April 1 and 2, 1998, Gallup mailed out packages to all of the institutions in the screening sample. This initial mailout (which had a Lincoln, Nebraska, return address) contained the following documents: a cover letter signed by the Director of NSF; a screener questionnaire; a "Q&A" document; and a postage-paid return envelope with a Lincoln, Nebraska, return address. The cover letter was addressed to the contact, where the contact name was known, or to "Dear Colleague," where it was unknown. Completed questionnaires were to be returned to the Gallup Lincoln office for processing and scanning. The initial due date for the screener questionnaire was May 1, 1998. Telephone follow-up with NPOs that had not mailed back a screener began on April 27, 1998. The screening survey was conducted with the contact person identified on the frame.

By June 1, 1998, only 2,274 screeners (out of 6,806 mailed out) had been completed either by mail or telephone, and numerous organizations had contacted Gallup and/or NSF requesting extensions. Ultimately, the field period for Wave 1 screening was extended to the middle of June 1998. Of the 6,806 sample NPOs thought to be potential S&E R&D performers (15 were deemed ineligible and had been dropped from the sample prior to fielding), 4,698 completed a screening questionnaire. An additional 576 NPOs were deemed out of scope for the survey for a variety of reasons (e.g., they turned out to be out of business or duplicates). The estimated response rate to the screener for the performers was 75.4 percent. Of the 1,963 NPOs thought to be potential funders, 851 provided screening data and another 337 were classified as out of scope, for an overall response rate of 52.3 percent. Table 2 gives detailed information on the response rates for the Wave 1 screening effort. A total of 823 NPOs—700 performers and 123 funders—were deemed eligible for the main questionnaire.

One barrier to a higher response rate was the poor locating information for the sample NPOs. Some 428 of the packages initially mailed out were returned as undeliverable; Gallup was also unable to obtain telephone numbers for 292 of the sample NPOs.
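As a small arithmetic check (a sketch only; the formula below is the simple completes-over-fielded-less-out-of-scope rate implied by the figures quoted above, not necessarily Gallup's exact definition):

```python
# Hedged sketch: screener response rate computed as completes divided by the
# fielded sample less out-of-scope cases, using the counts quoted in the text.
def screener_response_rate(completes, fielded, out_of_scope):
    return completes / (fielded - out_of_scope)

performers = screener_response_rate(completes=4698, fielded=6806, out_of_scope=576)
funders = screener_response_rate(completes=851, fielded=1963, out_of_scope=337)
print(f"performers: {performers:.1%}")  # ~75.4%
print(f"funders:    {funders:.1%}")     # ~52.3%
```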

Table 2. Wave 1 Screener Response Rates, by Stratum

Stratum                    Certainty (1973)    FSS    Hospitals    R&D    Other   Funders   Total
Screener Response Rate          81.6%         84.2%     80.0%     70.9%   71.8%    52.3%    70.6%

3.2 Main Survey

The main survey packets were mailed to eligible organizations beginning on June 29, 1998. The packet (which bore a Lincoln, Nebraska, return address) included: a cover letter; a copy of the letter signed by NSF's Director that had accompanied the screener survey; the relevant main survey questionnaire (either performer or funder); instructions for completing the survey on the Web; a detailed list of fields to help in classifying the organization's activities; a glossary of terms; a "Q&A" document; and a postage-paid return envelope with a Lincoln, Nebraska, address. The cover letter, addressed to the Presidents/CEOs of the sample organizations, included a statement noting the endorsement of the survey by other relevant organizations.

Respondents could choose to complete the questionnaire via the Web or on paper. Ultimately, only 42 of 118 respondents completed the survey via the Web. An additional 11 NPOs began the Web survey but did not complete it. NPOs eligible for both questionnaires—that is, organizations that both funded and performed S&E R&D—were administered the performer questionnaire, since the performer questionnaire included questions that elicited information on funding activities.

4. WAVE 2 DATA COLLECTION

By the end of Wave 1, only 95 performers (of the 700 screened in) and only 30 funders (of the 123 screened in) had completed the main questionnaire. In an effort to increase the response rates (which were particularly low for the main questionnaire), Gallup carried out a second wave of data collection, beginning in February of 1999. Nonrespondents to both the screener and the main questionnaire (excluding hard refusals) were refielded in Wave 2. In addition, a small number of cases originally classified as ineligible during Wave 1 were refielded in Wave 2; these were cases for which NSF had reason to believe that the Wave 1 screening data may have been incorrect.

Some of the survey packages may never have reached the intended respondents; in addition, the envelope (which had a Lincoln return address) did not indicate that it was an official U.S. Government mailing. An informal test carried out by the Project Officer with nine nonresponding NPOs indicated that three of the nine addresses were out of date. This test used an official envelope, a hand-written address and a "real" stamp.


The test results prompted Gallup to carry out additional locating efforts prior to the Wave 2 mailouts. In addition, the Wave 2 mailings were sent in business-size, official NSF envelopes with an Arlington, Virginia, return address.

A prenotification letter was mailed to 2,475 Wave 2 NPOs in February 1999. All organizations that had not responded to the first screener mailing in the spring of 1998 and were believed to be active organizations were mailed the prenotification letter. Versions of the prenotification letter were tailored to the situations in which a specific contact name was or was not known. The letter was sent in a business-size official NSF envelope. Another major modification from Wave 1 was the inclusion of a Correction/Change of Address Form with the cover letter. Contact information, including the organization name, EIN, contact name, address, e-mail address and telephone number, was preprinted on this form. Organizations were requested to review the form and, if necessary, make corrections or additions and fax it back to Gallup.

4.1 Screener Survey

The screener survey packet was mailed to all NPOs in the Wave 2 sample in a white 9x12" envelope with an Arlington, Virginia, return address. The packet included: a cover letter; the screener questionnaire; and a postage-paid return envelope addressed to NSF in Lincoln, Nebraska. The first wave of the screener questionnaire was sent to eligible NPOs on March 10, 1999, with an end-of-month due date. The screener questionnaire for Wave 2 was very similar to the questionnaire for Wave 1. The only change was to ask for information about the person completing the questionnaire at the beginning of the questionnaire instead of at the end.

Attempts were made to conduct the screener survey via telephone with any sampled organization that did not return the hardcopy screener questionnaire by the due date. Gallup's telephone interviewers conducted these interviews. The follow-up effort began on April 12, 1999, with the approximately 2,000 organizations that had not yet responded to the screener by mail, and continued through June 25, 1999. The cumulative screener response rate across both waves was 84.4% (Table 3). A major obstacle in completing screening interviews via telephone was the lack of useful contact information for many of the organizations.

Table 3. Final Screener Response and Rates, by Stratum

                                               Certainty    FSS   Hospital    R&D   Other   Funders   Total
B  NO RESPONSE                                        8      24         3     208      14       338     595
F  INCOMPLETE RESPONSE                                2      12         0      95       9       110     228
N  OUT OF SCOPE                                      56     263        33     237      19       344     952
Q  FINAL REFUSAL                                     12      43        13     177      17       186     448
   COMPLETE RESPONSE:
R  Low Performer/Funder - CATI*                      30     121        17     154      10        93     425
S  Low Performer/Funder - Mail*                      22     123         5     218      12       193     573
T  Non Performer/Funder - CATI                       50     356        43    1071      82       458    2060
U  Completed Mail Survey, Ineligible                 55     593        58    1572     110       365    2753
V  Survey Eligibles                                 197     393        36     219      17       216    1078
W  Subtotal (complete responses)                    354    1586       159    3234     231      1325    6889
X  Total 1 (sum of B, F, N, Q, W)                   432    1928       208    3951     290      2303    9112
Y  Total # of Cases (excludes out of scope)         376    1665       175    3714     271      1959    8160

   Screener Completion Rate (percent)              94.2    95.3      90.9    87.1    85.2      67.6    84.4

4.2 Wave 2 Main Survey and Data Collection Summary

Survey packets were mailed to eligible organizations on a rolling basis as the screening data were received. The packets included: a cover letter; a hardcopy version of the appropriate questionnaire; the Internet address for accessing the Web version of the questionnaire and a personal identification number; and a postage-paid return envelope. The first round of mailing included organizations that had not responded or had returned incomplete questionnaires in Wave 1; organizations that had screened in during Wave 2; and 316 funders that had been inadvertently left out of the mailout in 1998. This mailing was sent by regular mail to a total of 609 organizations during the week of May 17, 1999. The second mailing went to 163 organizations on June 11, 1999. Two weeks after each of these mailings, a reminder postcard was mailed to nonrespondents. The third round of mailings went to organizations that had completed the screener, organizations for which Gallup had new address information, and organizations that needed individualized letters. This mailing, to 93 NPOs, was also sent out in June 1999. A fourth mailing went to eligible organizations that had been included in the previous three mailings but had not yet responded (sent to 691 NPOs by priority mail on July 23, 1999). At the end of August, a final mailing went to 120 NPOs that were eligible but had not been included in any of the previous mailings, organizations that had been sent the wrong version of the questionnaire, and organizations for which Gallup had new contact information.

Both the performer and funder instruments were changed substantially from Wave 1 to simplify them and reduce the time needed to complete them. This trimming was done to improve the main survey response rate. As in Wave 1, respondents had the option of completing the questionnaire via the Web. Overall, 58 of the 227 Wave 2 respondents (25.6%) completed the questionnaire via the Web. A similar proportion of the Wave 1 respondents—41/125, or 32.8%—completed the survey on the Web.

The main data collection effort produced a much lower response rate than the screener produced. Table 4 provides response rate estimates. After both waves of data collection, the estimated response rate to the main survey (taking ineligibility into account) was 41 percent: only 352 NPOs completed the main questionnaire out of more than 1,100 who had screened in. Nearly 65 percent of the cases that completed the main questionnaire completed it during the second wave of data collection.

Table 4. Main Data Collection Response Rates

Status                        Total    Funder   Performer    Both
Total Completes                 352       107         243       2
Total Ineligibles               126        50          76       0
Total Nonrespondents            653       176         476       1
Total                          1131       333         795       3

Ineligibility Rate            24.9%     29.8%       22.6%    0.0%
Eligibility Rate              75.1%     70.2%       77.4%  100.0%
Adjusted Total                  850       234         616       3

Unadjusted Response Rate      35.0%     37.8%       33.8%   66.7%
Adjusted Response Rate        41.4%     45.7%       39.5%   66.7%

Note: All figures are unweighted.
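A minimal sketch of how such eligibility-adjusted rates are typically computed (the table's exact definition of the eligibility rate is not spelled out in the text, so the adjusted rate below simply reuses the 75.1 percent figure from the table):

```python
# Hedged sketch of an eligibility-adjusted response rate: the unadjusted rate
# removes known ineligibles from the denominator, while the adjusted rate scales
# the full base by an estimated eligibility rate (here taken from Table 4).
def response_rates(completes, total, ineligibles, eligibility_rate):
    unadjusted = completes / (total - ineligibles)
    adjusted = completes / (total * eligibility_rate)
    return unadjusted, adjusted

unadj, adj = response_rates(completes=352, total=1131, ineligibles=126,
                            eligibility_rate=0.751)
print(f"unadjusted: {unadj:.1%}, adjusted: {adj:.1%}")  # ~35.0%, ~41.4%
```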

4.3 Issues with Data Collection

Problems with the Screening Process -- Despite the suggestions from the Advisory Panel and the pretest of the forms, it is apparent from the high rate of ineligibility for the main survey (24 percent) that the screening process did not always yield accurate information about sample NPOs. There are at least two possible reasons for the inaccuracy of the screening data. First, many of the screener questionnaires may have been completed by inappropriate respondents. Another possibility is that the screening materials did not provide adequate definitions of research and development and of science and engineering.

Problems with the Questionnaire Design -- Both the funder and performer questionnaires were lengthy and complicated, which contributed to the low overall response rate. The problems with the questionnaires are also apparent in the extensive data retrieval and cleaning effort that was necessary to resolve inconsistent and incomplete responses. Even with the simplified version of the performer questionnaire fielded in Wave 2, there were problems that may have adversely affected data quality. The most serious problem discovered in the retrieval process was that many organizations did not understand the distinction between intramural and extramural research. Another problem was respondents' inability to force annual income to match annual spending for intramural R&D. Also, the order of the questions may have created some difficulty for respondents.


The number of organizations with these problems was discovered only through data retrieval, which was performed only for questionnaires with obvious inconsistencies. Overall, 45 percent of the performer questionnaires and 25 percent of the funder questionnaires needed some type of data cleaning or imputation.

Impact of Nonresponse -- Aside from the bias it may have introduced, the unexpectedly low response rate to the main survey sharply reduced the analytical possibilities for the data set. The original sample design had called for more than 2,500 completed cases; the final data set had 352. State estimates, as well as estimates for other small subdomains, were no longer possible because the final sample sizes were too small to permit reliable estimates. One of the reasons that the FSS cases were included in the performer sample with certainty was to ensure a good distribution of cases across states. The low response rate does not permit NSF to publish state-level estimates.

5. CONCLUSIONS AND RECOMMENDATIONS

This section of the report presents recommendations for carrying out a similar survey in the future. Although NSF and Gallup gained useful experience on many specific points, this section focuses on broader issues that are likely to have a major impact on the outcome of any new attempt to measure S&E R&D spending by NPOs. The recommendations concern the design phase for a new study, questionnaire design, the sampling frame, and the data collection protocol.

In retrospect, it seems clear that the survey encountered many unexpected problems in virtually every phase of the study, ranging from problems with the sampling frame to difficulties in cleaning the final data file. The effort profited from a small pretest conducted before the first wave of data collection and an informal pretest carried out prior to Wave 2. This suggests that a longer design phase, with more extensive testing of the questionnaire and field procedures, may be useful in carrying out similar studies in the future. As noted below, separate pretests may be needed to develop and test the questionnaire and to examine the proposed data collection procedures. Both sets of pretests may need to be iterative, examining what was proposed initially and then the proposed revisions.

The results indicate that respondents had a good deal of trouble completing the questionnaires. Many of the NPOs that screened into the survey—approximately one in four—turned out to be ineligible for the main study. Many of the respondents to the main survey had to be recontacted to resolve inconsistent answers or to retrieve missing ones, and, even after this intensive retrieval operation, many of the questionnaires still had missing or inconsistent responses. It is possible that difficulties in completing the main questionnaires contributed to the low overall response rate. Thus, one goal for an enhanced design effort in any future round of the survey would be a more user-friendly questionnaire.

The design phase of a new study might incorporate additional testing to evaluate the shortcomings of candidate sampling frames for the study and to assess possible strategies for addressing these shortcomings. A substantial field test prior to the main study (e.g., one involving 50 to 100 NPOs) might help identify problems early. In addition, prior to such a field test, it may be useful to revisit questions about the scope of the survey and the definition of the units to be surveyed. For the current survey, NSF and Gallup opted to cast a broad net, and the sampling frame Gallup developed included more than 180,000 organizations. In retrospect, it may have been useful to concentrate more of the data collection effort on recipients of Federal research support and other known performers and funders of S&E R&D, even if that meant some reduction in the coverage of the entire target population.

Although the final response rate for the screener was reasonably high—84.4 percent—achieving this rate required two waves of data collection. Further, even after the second wave, the response rate for the main questionnaire was unacceptably low at 41.4 percent. Many factors doubtless contributed to the low response rate: the use of Gallup rather than NSF letterhead and the use of a Gallup return address on correspondence with sample NPOs during Wave 1, the inclusion of NPOs (such as private foundations) with minimal interest in S&E R&D, and various operational problems.

The recommendations presented here focus on three features of the design of the survey that stand out in retrospect as probable contributors to the low response rate: the questionnaire, the schedule of contacts with the sample NPOs, and the staffing of the survey.

The data collection protocol had Gallup making as many as 15 attempts to contact the sample NPOs. In addition, NSF staff also made attempts to contact and convert NPOs that were reluctant to participate in the study. The bulk of these contacts came during the second wave of data collection. In retrospect, the plans for Wave 1 did not allow enough time or sufficient contacts to achieve a high response rate. The data collection plan for any future survey should allow for multiple contacts with members of the sample from the outset, including prenotification letters (which were omitted in Wave 1 of the current study), multiple follow-up contacts by mail and, if possible, e-mail, and a specified number of telephone prompts. The data collection contractor should be encouraged to explore any other steps (such as incentives) that might produce higher response rates.