This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright

Who needs RDD? Combining directory listings with cell phone exchanges for an alternative telephone sampling frame


Author's personal copy


Thomas M. Guterbock a,⇑, Abdoulaye Diop b, James M. Ellis a, John Lee Holmes a, Kien Trung Le b

a Center for Survey Research, University of Virginia, United States
b Social and Economic Survey Research Institute, Qatar University, Doha, Qatar

Article info

Article history: Received 13 March 2009; available online 11 January 2011

Keywords: Telephone sampling; Dual-frame samples; Electronic white pages; Listed telephone samples; Random Digit Dialing; Coverage bias; Cell-phone samples

Abstract

The traditional Random Digit Dialing method (list-assisted RDD using a frame of landline phone numbers) is clearly under threat. The difficulty and costs of completing telephone surveys have increased due to rising rates of refusal and non-contact. The completeness of coverage of list-assisted RDD samples has diminished due to the proliferation of cell-phone-only households. The ability of list-assisted RDD to capture young, mobile, unmarried, and minority households is thus diminishing as well. Increasingly, survey researchers have been adding a cell phone component to their sampling frames for telephone surveys, despite the increased costs and other issues associated with RDD calling of cell phones.

Recent research by Guterbock, Oldendick, and others has explored the extent to which “electronic white pages” (EWP) samples really differ from RDD samples. Oldendick et al. and Guterbock, Diop and Holian have emphasized that minority households are seriously underrepresented in EWP samples. Nevertheless, EWP samples have distinct advantages whenever a survey is aimed at a restricted geographic area.

This paper considers the feasibility of combining EWP samples with cell-phone RDD, eliminating the ordinary RDD component from the sampling frame. We analyze the components of the telephone population, showing that the proposed method would fail to cover only one segment of the telephone population: unlisted landline households that have no cell phone. We analyze data from the 2006 National Health Interview Survey to estimate the size of this segment, its demographic profile, the degree to which selected demographic and health behavior characteristics differ from those in the segments that this sampling strategy would capture, and what biases are present in the various sampling frames of interest. Trend data from the NHIS are used to assess how these biases are changing. A simple cost comparison is made among the RDD, RDD + cell and EWP + cell frames. The proposed alternative “EWP + cell” sampling frame provides relatively small bias compared to RDD + cell at costs comparable to RDD-only designs. The portion of the telephone universe that is excluded in the EWP + cell design is steadily shrinking; therefore, its bias relative to the RDD + cell design is decreasing over time. Overall, the EWP + cell design seems to be a useful alternative.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

The Random Digit Dialing (RDD) method for sampling US households in telephone surveys is clearly under threat. Even with today’s industry-standard “list-assisted” approach, which identifies blocks of telephone numbers that are more likely to contain working residential telephone numbers, RDD has always involved inefficiency because a fairly sizable portion of the telephone numbers that are attempted will be non-working or non-residential numbers. In the past, the costs of this extra effort were easily justifiable, because no other method of sampling telephone numbers could offer the full coverage of the landline telephone universe that RDD offered. At the time of its adoption by the survey industry in the 1970s, of course, all US telephones were landline phones.

⇑ Corresponding author: T.M. Guterbock. Fax: +1 434 982 5524.

Social Science Research 40 (2011) 860–872. Journal homepage: www.elsevier.com/locate/ssresearch
0049-089X/$ - see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.ssresearch.2011.01.001

In recent years, however, the difficulties and costs of completing RDD telephone surveys have increased due to the decreasing density of working residential numbers within blocks of telephone numbers (Tucker et al., 2005), increasing rates of non-contact, and rising rates of refusal (Battaglia et al., 2007). It can be argued that of these factors, non-response presents the greatest challenge to the accuracy of telephone surveys, but the present study is focused on the growing problem of coverage bias. The proliferation of households that can be reached only via cellular phone has begun to reduce the completeness of coverage of the list-assisted RDD sampling frame, which is based solely on blocks of telephone numbers used for landline telephones (Blumberg and Luke, 2010; Blumberg et al., 2008). Since the cell-only lifestyle has been adopted most readily by the young, the mobile, the unmarried, and members of ethnic or racial minorities, the coverage problem with list-assisted RDD frames is particularly acute for these groups (Ehlen and Ehlen, 2007; Blumberg et al., 2008).¹

In response to these challenges, leading survey research organizations have begun to experiment with adding a cell phone component to their sampling frames for telephone surveys (Brick et al., 2007; Link et al., 2007; Keeter et al., 2008). This results in a dual-frame design in which some households are reached through a traditional list-assisted RDD sample of working telephone exchanges assigned to landline telephones while others are reached through an RDD sample drawn from the frame of working exchanges that are devoted to cellular phones. Households that could be reached in both frames have a greater chance of selection than do those that appear in only one of the frames. While some researchers have advocated that units sampled from the cell-phone frame be screened so that those who also have landlines in their households are excluded (Kennedy, 2007), the emerging consensus appears to be that more complete coverage is obtained if cell-phone frame units are included whether or not they also have a landline phone (Lavrakas et al., 2007: p. 846).
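The overlap point can be made concrete. The sketch below is a minimal illustration, assuming independent draws from the two frames and purely hypothetical per-frame selection probabilities (the function name `selection_probability` is ours, not the authors'). It shows why a household reachable through both frames has roughly twice the chance of selection, which is why dual-frame designs typically adjust weights for such households.

```python
def selection_probability(p_landline: float, p_cell: float) -> float:
    """Chance a household is drawn at least once from two independent frames.

    p_landline / p_cell are hypothetical per-frame selection probabilities
    (use 0.0 when the household is not in that frame).
    """
    # Complement of being missed by both frames.
    return 1.0 - (1.0 - p_landline) * (1.0 - p_cell)


# Hypothetical rates: 1 in 1,000 from each frame.
landline_only = selection_probability(0.001, 0.0)
dual_service = selection_probability(0.001, 0.001)

print(f"landline-only household: {landline_only:.6f}")
print(f"dual-service household:  {dual_service:.6f}")
# A simple base weight is the inverse of the selection probability.
print(f"base weights: {1 / landline_only:.1f} vs {1 / dual_service:.1f}")
```

The dual-service household ends up with about half the base weight of the landline-only household, offsetting its doubled chance of selection.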

Either approach to handling the respondents contacted by cell phone substantially increases costs over the traditional landline-only RDD design, because considerably greater calling effort is needed to achieve a given number of completions from the cell-phone sample, and even greater effort if those also having landlines are screened out (Guterbock et al., 2010). Calling cell phones poses other challenges as well, which are discussed in the comprehensive overview by Lavrakas et al. (2007); research to address these challenges is ongoing (AAPOR Cell Phone Task Force, 2010). It appears, however, that a “new norm” is emerging for telephone surveys of the general US population: the use of dual-frame RDD designs combining traditional list-assisted landline-based RDD with RDD of cell-phone samples, in which respondents reached by cell phone are not screened out if they also have a landline. In this paper, we will refer to this overlapping dual-frame design as “RDD + cell.”

This paper considers the potential utility of a different sampling approach: substituting a listed-only sample for the traditional RDD sample in the dual-frame design. That is, we explore the possible advantages and the possible effects on coverage if a cell-phone RDD sample is combined not with traditional RDD but with an “electronic white pages” (EWP) sample: a sample drawn from the computerized frame of directory-listed landline telephone households.² We explore this approach, which we call “EWP + cell,” because it promises considerably greater efficiency and cost savings compared to the RDD + cell dual-frame design. This is especially true for any survey in which the population is located in a specific geographic region smaller than a whole state or not co-extensive with any set of telephone area codes. An EWP sampling frame can be closely targeted to such geographical specifications, while an RDD frame will contain telephone numbers that are located outside the area of interest.

We will introduce a partition of the telephone universe into five segments and show that the proposed EWP + cell method would fail to cover only one segment of the population of telephone households: unlisted landline households that have no cell phone. We analyze data from the 2006 National Health Interview Survey to estimate the size of this non-covered segment, determine its demographic characteristics, and assess the degree of coverage bias that results from its exclusion from the EWP + cell dual-frame design. To place this comparison in a proper context, we also assess coverage bias in landline RDD-only designs compared to RDD + cell dual-frame designs and coverage bias in EWP-only designs compared to RDD + cell.

We will show that there is surprisingly little coverage bias to be expected if EWP + cell is undertaken as an alternative to RDD + cell. We also consider the potential cost savings from EWP + cell compared to RDD + cell. Under certain assumptions, we believe that a survey based on the EWP + cell dual-frame sample design may cost no more than one based on a traditional (list-assisted landline) RDD-only sample. Thus, the proposed design holds promise for delivering results of substantially higher data quality (better coverage) than RDD-only, with considerably less increase in survey cost than that associated with RDD + cell. The cost advantage will be greatest when the population of interest is located in a specific geographic area that cannot easily be matched to a co-extensive landline RDD frame.

1 These groups have tended to be under-represented in most telephone surveys in any case, because they are at home less of the time and/or have higher rates of refusal. The prevalence of cell-phone-only households among these groups worsens their representation in realized samples obtained from RDD frames.

2 Here we follow the terminology of Wilson et al., 1999, and we use the term “EWP sample” to avoid using the term “listed sample.” The latter term can introduce unwanted confusion, because we will later be contrasting listed and unlisted households. Both EWP and RDD samples include some unlisted households along with listed households, with listed households usually constituting a majority of working telephone households in both sample types.


2. Prior research on EWP samples compared to RDD

A small but useful literature that compares directory-listed samples with list-assisted landline RDD samples has been reviewed elsewhere in some detail (Guterbock et al., 2003). Somewhat surprisingly, there was little scientific attention to such comparisons in the 1970s, when RDD samples replaced phone-listed samples as the dominant mode of telephone sampling. As a result, the degree and nature of the differences between listed and unlisted households have remained a matter of some speculation, not always well grounded in evidence.³ The few studies published before 2002 (Traugott et al., 1987; Psaty et al., 1991; Schejbal and Lavrakas, 1994; Wilson et al., 1999) tend to agree on several key points. In general, researchers conducting dual-frame studies reported slight differences in substantive results between list and RDD-frame samples. Conclusions in most of the studies indicated that the differences probably would not alter the study’s recommendations. In addition, there is general agreement among researchers using dual-frame studies that list-frame samples provide a significantly more efficient method for contacting target households in small areas than do RDD-frame samples.

Building on this work, Guterbock et al. (2003) conducted an experimental comparison of RDD and EWP samples in a Virginia statewide sample. They found a 15% saving in production costs for the EWP sample, but the results differed significantly between the samples both in demographics and in measures of satisfaction with services of the state’s Department of Motor Vehicles (DMV), the main focus of that survey. In particular, their study showed that African-Americans were underrepresented in the EWP sample. Similarly, Oldendick et al. (2004) found substantial underrepresentation of African-Americans in a statewide survey in South Carolina that compared the EWP sampling frame to RDD, while finding few important differences in the substantive outcomes of the survey. Their estimate of cost savings from EWP sampling was more modest than that found in Virginia.⁴

In contrast, Guterbock et al. (2007) highlight the fact that several UVa CSR studies that have asked respondents whether their telephones are listed or unlisted have found very substantial differences in rates of listedness between whites and African-Americans. While they are not able to account fully for this persistent difference, they show, using data from the 2005 National Health Interview Survey (NHIS), that unlisted households are significantly more likely to be African-American, lower income, renters, singles, and young people. They note that rates of unlistedness are markedly higher in Western states, and that unlistedness is slightly more common for women than men, and for those with children in the home.

In this paper, using several years of NHIS data, we will look more closely at the coverage bias that might result if an EWP sampling frame (covering listed landline households only) were used in place of a landline-based RDD frame in the dual-frame design. This paper will advocate for the use of EWP + cell as a dual-frame sample design, but we do not claim that landline RDD and EWP frames are fully equivalent. Rather, we will endeavor to demonstrate that the cell phone component of the EWP + cell dual-frame design largely makes up for the known deficiencies in coverage of the EWP frame.

3. Data sources

The primary source of data used in this paper is the 2006 National Health Interview Survey (NHIS). The NHIS is performed using computer-assisted personal interviewing (CAPI) with a stratified multistage area probability sample of households in the United States drawn from US Census and local building permit data (US Department of Health and Human Services, 2000). The NHIS is large: 29,204 households were interviewed in 2006, contributing data for 75,716 individuals in 29,868 families (US Department of Health and Human Services, 2007). The construction of the NHIS sampling frame does not rely on the telephone status of the household, and the CAPI mode includes households without regard to their telephone status. In addition, the NHIS includes questions about the telephone status of the household. These features make the NHIS very useful for analyses of telephone and cell phone penetration in the US.⁵

Unfortunately, the NHIS questionnaire has not in the last few years included questions about whether the respondent’s landline phone is directory-listed. Therefore, the most recent available NHIS dataset that includes the needed information is the 2006 release, which included survey questions differentiating listed and unlisted landline households. This distinction is important for the analyses in this paper. It should be noted that the most recent trends in cell phone penetration, particularly the growth of cell-only households, will not be fully captured in the 2006 NHIS. The 2006 data set also lags behind any changes in the distribution of listed and unlisted landline households within the universe of landline telephone households.

3 According to the website of Survey Sampling, Inc.: “No one currently collects data on the properties of unlisted households, but it is generally assumed that they are households with lower income. The highest demographic correlation with unlistedness is age. Younger people, due to frequent address changes, are more likely to be unlisted (www.surveysampling.com/faq).” Note that this summary makes no mention of the racial difference in listedness. The most recent edition of Babbie’s textbook in research methods, the most widely used text in its field, introduces RDD sampling with the following misstatement: “A survey sample selected from the pages of a local telephone directory would totally omit all those people—typically richer—who requested that their numbers not be published” (2007: p. 269).

4 More recently, Zogby et al. (2007) publicized results of several telephone surveys that compared RDD samples with EWP samples. Zogby finds some demographic and substantive differences, but judges these to be small enough, in general, to be of little consequence for survey accuracy.

5 The analyses, interpretations, and conclusions described in this paper are the responsibility of the authors, not NCHS, which is responsible only for the NHIS data.


4. Analysis preview

The analyses in this paper cover several areas. First, we describe the telephone universe in terms of five telephone segments, using the 2006 NHIS as the best available source for estimates of the proportion of the US population in each segment at that time (and accepting that 4.7% of the weighted cases in the 2006 NHIS could not be assigned to any telephone segment⁶). Because the NHIS is conducted using face-to-face interviewing, all five telephone segments of interest are represented, as are non-telephone households.

Our analysis is built around three telephone sampling frames: the EWP (directory-listed) frame, the traditional RDD frame based on landline phones, and the frame of cell-phone numbers (also sampled through an RDD process). Building on these frames, we consider four possible sample designs for a telephone survey: EWP-only, RDD-only, RDD + cell, and EWP + cell.

We compare the five segments of the telephone universe on several demographic, health status and health behavior variables that are usually of interest to social science researchers. Using the empirical data across the five telephone segments, we calculate the coverage errors that would be observed for various sample designs using fewer than all five telephone segments. These comparisons assume that other sources of survey error (e.g., non-response, interviewer error) will be ignorable or at least roughly equal across telephone segments. Three contrasts are considered in our analysis: (1) EWP vs. RDD + cell, (2) RDD vs. RDD + cell, and (3) EWP + cell vs. RDD + cell.

Finally, we relate these observations to changes in the telephone universe documented by the NHIS from 2003 to 2006, and then we offer some tentative cost comparisons.

5. The telephone universe: from three segments to five

For 30 years or more, an RDD telephone sample has been considered the most rigorous type of telephone sample because it includes landline households that are unlisted or unpublished in their local telephone system’s database. This allows the RDD sample to cover people who are more likely to be transient or willing to pay for limiting access to their telephone numbers.

But the growth of cell phone use and cell-only households presents new problems, which have been the focus of attention for some time. In the 2006 NHIS, 16.6% of all households with some sort of telephone service were cell-only households. In the NHIS early estimates for July–December 2009, this percentage had risen to 25.1% (Blumberg and Luke, May 2010, Table 1, recalculated from 24.5% cell-only out of 97.6% with known telephone status).
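The recalculation cited above is a simple renormalization: the cell-only share is divided by the share of households whose telephone status is known. A minimal sketch (the helper name `share_of_base` is ours) that reproduces both the 2009 figure and the 2006 figure from the Table 1 counts:

```python
def share_of_base(part: float, base: float) -> float:
    """Express `part` as a percentage of `base` (both in the same units)."""
    if base <= 0:
        raise ValueError("base must be positive")
    return 100.0 * part / base


# July-December 2009: 24.5% cell-only out of 97.6% with known phone status.
print(round(share_of_base(24.5, 97.6), 1))  # -> 25.1

# 2006 NHIS (Table 1): 3,758 cell-only out of 22,646 with known phone service.
print(round(share_of_base(3758, 22646), 1))  # -> 16.6
```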

It seems clear from the practitioner’s view that research clients today require survey designs that take into account the cell phone phenomenon. These designs must take into account a “telephone universe” that is often conceptualized in three segments, as seen in Fig. 1: households with only landlines, households with only cell phones, and households with both. (Note that non-telephone households are excluded from these analyses.) The traditional RDD sample reaches households with only landlines as well as households with both landlines and cell phones. Therefore, the “cell only” households are the only ones not covered by the classic RDD design. A cell-phone frame would cover the cell-only households as well as those with both cell phones and landlines, leaving uncovered the households with only landlines.
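The coverage logic of the three-segment view reduces to set membership. In the sketch below, the segment and frame labels are hypothetical names chosen for illustration; a frame reaches a segment if households in that segment have the service type the frame samples.

```python
# Telephone services held by each segment of the three-segment universe.
SEGMENTS = {
    "landline_only": {"landline"},
    "cell_only": {"cell"},
    "both": {"landline", "cell"},
}

# The service type each RDD frame samples from.
FRAMES = {"landline_rdd": "landline", "cell_rdd": "cell"}


def covered_segments(frames: list[str]) -> set[str]:
    """Segments reachable through at least one of the given frames."""
    return {
        segment
        for segment, services in SEGMENTS.items()
        if any(FRAMES[frame] in services for frame in frames)
    }


for design in (["landline_rdd"], ["cell_rdd"], ["landline_rdd", "cell_rdd"]):
    missed = set(SEGMENTS) - covered_segments(design)
    print(" + ".join(design), "misses:", sorted(missed) or "nothing")
```

Only the combined design leaves no segment uncovered, matching the argument in the text.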

But we also have to consider listedness in our description of telephone segments. As noted earlier, listedness has almost always been a concern when designing telephone survey samples, and it has also been used to make RDD samples more efficient by identifying blocks of telephone numbers with listed numbers among them. Some landline telephones are listed in the residential directory or electronic white pages (EWP). Some are unlisted. Households with listed landlines (LLL) or unlisted landlines (ULL) may have cell phones that reach them as well. Cell phones, by current industry practice, are unlisted by definition. Introducing listedness brings us from three segments to five, as illustrated in Fig. 2.⁷

Table 1
2006 NHIS sample characteristics – telephone segments (data weighted to adult population).

Telephone segment (2006) | N | Percent
Group 1: cell-phone only | 3,758 | 16.6
Group 2: cell phone with unlisted landline | 4,009 | 17.7
Group 3: unlisted landline only | 3,225 | 14.2
Group 4: cell phone with listed landline | 7,766 | 34.3
Group 5: listed landline only | 3,888 | 17.2
Total: known phone service | 22,646 | 100.0
Have phone service, missing information | 1,148 | (4.7%)
No phone service | 481 | (2.0%)
Total: all households | 24,275 |

6 The demographics of the unclassified telephone cases are not markedly different from those of the classified cases, suggesting that the occurrence of missing data for telephone service details was distributed fairly randomly across the phone service types. African-Americans comprised 11.7% of classified cases and 12.7% of those with missing telephone data. Among the classified cases, 15.0% were aged 18–25, compared to only 10% among the missing cases. Those with high school education or less were 32.9% of classified cases and 39.5% of the missing. Renters were 27.4% of classified cases and 31.6% of the missing. Smoking, diabetes and hypertension percentages differed by only two to three percentage points. While these are significant differences, they are not large enough to suggest that the omission of less than five percent of the cases from our analysis will distort our findings.

Some basic characteristics of these five segments can be described with the 2006 NHIS data. First, a caveat: 4.7% of adult-weighted cases were unclassifiable for 2006. A second caveat, mentioned earlier, is that the proportion of cell-phone-only households has grown markedly since the 2006 NHIS was conducted. That said, when looking specifically at the differences between the segments in terms of outcomes, we see that they generally fall into the patterns one would expect, and, if anything, the differences we identify can be expected to grow larger in the future.

Table 2 below shows demographic characteristics in the 2006 NHIS for people who are traditionally harder to reach in telephone surveys, together with the telephone status of their households. The proportion of African-Americans is highest in the group that has only an unlisted landline at home and lowest in the group that has a cell phone with a listed landline. Overall, there is not much difference in rates of cell phone adoption across ethnic groups. In addition, people aged 18–25 and renters make up a far larger share of the cell-only group than of any other telephone segment, and those with annual household incomes below $20,000 make up a smaller share of the segments with overlapping telephone services (groups 2 and 4).

Fig. 1. The telephone universe in three segments.

Fig. 2. The telephone universe – from three segments to five segments.

7 A growing number of households are adopting an alternative telephone technology: Voice over Internet Protocol (VOIP) service. However, VOIP need not be treated as a separate ‘segment’ for sampling purposes. The great majority of VOIP households remain in the landline RDD segment, because they use telephone numbers that are drawn from the POTS exchanges (telephone exchanges dedicated to “Plain Old Telephone Service”). VOIP households using these telephone numbers will be represented in the list-assisted RDD sampling frame. If the VOIP number happens to fall within a numbers block that does not include any directory-listed numbers, then it will be excluded from the frame along with other ‘zero-block’ numbers. While number portability may affect this pattern in the future, for the present VOIP users are adequately captured by existing telephone sampling methods. (Our thanks to Linda Piekarski of SSI for personal communication to clarify this issue.)

Recent dual-frame surveys completed by the authors have in fact reached some VOIP cases, most of which turn up in the landline RDD frame, with very few coming from the cell-phone RDD frame. In any case, there is no information on VOIP telephone service in the NHIS. The question wording used in the NHIS interview to identify household phones is general enough that a VOIP line in the household would be counted with landline cases. In short, VOIP households are not separately considered here, but are not excluded.

On the other hand, we have seen that age can be quite important in affecting availability for contact, because of its link to cell phone adoption. Young people, specifically those in the 18–25 year old age group, were already most concentrated in the cell-phone-only group in 2006 (31.1%) and least concentrated in the listed-landline-only group (8.0%). By contrast, adults with a high school education or less were most concentrated in the unlisted-landline-only group (48.0%) and least concentrated in the group with a cell phone and a listed landline (23.1%).
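The highest/lowest comparisons in this passage can be checked mechanically against Table 2. A small sketch, with a few Table 2 rows typed in by hand and group labels abbreviated as in the table headers (the names `TABLE2` and `extremes` are ours):

```python
GROUPS = ["cell-only", "cell + ULL", "ULL only", "cell + LLL", "LLL only"]

# Selected rows of Table 2 (percent of each group), groups 1-5 in order.
TABLE2 = {
    "African-American": [13.4, 14.2, 15.7, 8.3, 10.6],
    "Aged 18-25": [31.1, 15.7, 13.6, 11.0, 8.0],
    "HS or less": [32.1, 26.6, 48.0, 23.1, 47.5],
    "Renter": [54.5, 26.6, 37.2, 13.1, 22.8],
}


def extremes(row: list[float]) -> tuple[str, str]:
    """Return (highest group, lowest group) for one table row."""
    hi = max(range(len(row)), key=row.__getitem__)
    lo = min(range(len(row)), key=row.__getitem__)
    return GROUPS[hi], GROUPS[lo]


for variable, row in TABLE2.items():
    highest, lowest = extremes(row)
    print(f"{variable:17s} highest: {highest:11s} lowest: {lowest}")
```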

Because cell phone adoption is high among young people, the health-related measures that vary most across telephone groups are those that likewise vary by age. Smoking tends to be higher among young people and hypertension higher among older people, and this is reflected in the variation across contact groups. Diabetes and asthma are less age-specific, and so we see less variation among the groups there.

6. Coverage bias: concept and calculation

The question at hand is: How biased are various sample designs when they include only portions of the full telephone universe? Providing a literal answer to this question requires understanding biases in survey data due to coverage error, and calculating those biases. Providing a practical answer involves weighing the tradeoffs between better coverage of a study’s target population and the higher survey costs incurred as coverage is improved.

Groves (1989) presented a typology of survey errors and described coverage error as follows: “Coverage error exists because some persons are not part of the list or frame (or equivalent materials) used to identify members of the population. Because of this they can never be measured, whether a complete census of the frame is attempted or a sample is studied” (p. 11).

When certain members of the study population have no chance to be interviewed because they are not in the sampling frame, there is the concern that they may be systematically different from those who are included in the sampling frame, but there is no way to know how meaningful the difference might be because there are no survey data for those excluded population members.

This paper focuses on bias due to coverage error and ignores biases due to other errors of non-measurement such as differential non-contact or refusal rates across segments of the telephone universe. Conceptually, the impact of bias due to coverage error is a function of two factors that may or may not be interrelated: the proportion of the population that is not covered, and the difference between the covered and uncovered populations for the survey statistic that is of interest. Mathematically, in the case where the survey statistic of interest is a mean value, bias due to coverage error is expressed as follows:

$$Y_C - Y = \frac{U}{N}\left(Y_C - Y_U\right)$$

where $Y$ is the true population mean (which is usually unknown), $Y_C$ is the observed survey mean for the covered population, the difference $Y_C - Y$ is the bias in the observed survey statistic, $U/N$ is the proportion of the total population ($N$) that is uncovered ($U$), and $(Y_C - Y_U)$ is the difference between the mean for covered (observed) population members, $Y_C$, and the mean for uncovered (unobserved) population members, $Y_U$. That is, the bias due to coverage error equals the product of the proportion of the total population that is uncovered and the difference between the observed survey statistic and the survey statistic for unobserved members of the population.
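The identity can be checked numerically. Writing the population mean as the coverage-weighted average of the covered and uncovered means, Y = (1 - U/N)·Y_C + (U/N)·Y_U, the directly computed bias Y_C - Y matches the formula. The inputs below are hypothetical, chosen only to illustrate the algebra, and the function names are ours.

```python
def population_mean(prop_uncovered: float, mean_covered: float, mean_uncovered: float) -> float:
    """True mean Y as a mix of covered and uncovered population means."""
    return (1.0 - prop_uncovered) * mean_covered + prop_uncovered * mean_uncovered


def coverage_bias(prop_uncovered: float, mean_covered: float, mean_uncovered: float) -> float:
    """Bias of the covered-population mean: (U/N) * (Y_C - Y_U)."""
    return prop_uncovered * (mean_covered - mean_uncovered)


# Hypothetical inputs: 20% uncovered, covered mean 30, uncovered mean 50.
u_over_n, y_c, y_u = 0.20, 30.0, 50.0
direct = y_c - population_mean(u_over_n, y_c, y_u)
formula = coverage_bias(u_over_n, y_c, y_u)
print(f"direct: {direct:.2f}, formula: {formula:.2f}")
```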

Note that a survey with a very high rate of noncoverage may have a small noncoverage bias if the uncovered population does not differ much on the statistic of interest from the covered population. Conversely, a survey with a very low rate of noncoverage may have a large noncoverage bias if the uncovered population differs greatly on the statistic of interest from the covered population.
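The two cases in this paragraph can be contrasted with the same formula. The inputs below are hypothetical: a design that misses half the population on a statistic where the covered and uncovered groups differ by only 1 point, and a design that misses 5% of the population on a statistic where the gap is 30 points.

```python
def coverage_bias(prop_uncovered: float, mean_covered: float, mean_uncovered: float) -> float:
    """Noncoverage bias of a mean: (U/N) * (Y_C - Y_U)."""
    return prop_uncovered * (mean_covered - mean_uncovered)


# High noncoverage, small covered/uncovered difference.
high_rate = coverage_bias(0.50, 21.0, 20.0)
# Low noncoverage, large covered/uncovered difference.
low_rate = coverage_bias(0.05, 40.0, 10.0)

print(f"50% uncovered, 1-point gap:  bias = {high_rate:.2f}")
print(f"5% uncovered, 30-point gap: bias = {low_rate:.2f}")
```

Despite missing ten times as much of the population, the first design ends up with the smaller bias.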

Table 2
2006 NHIS demographics and health status of respondents by group (%).

Variable | Group 1 cell-phone only | Group 2 cell with ULL | Group 3 ULL only | Group 4 cell with LLL | Group 5 LLL only | Total
Demographics
African-American | 13.4 | 14.2 | 15.7 | 8.3 | 10.6 | 11.6
18–25 | 31.1 | 15.7 | 13.6 | 11.0 | 8.0 | 15.0
HS or less | 32.1 | 26.6 | 48.0 | 23.1 | 47.5 | 32.9
Renter | 54.5 | 26.6 | 37.2 | 13.1 | 22.8 | 27.4
Less than $20,000 | 25.6 | 10.7 | 28.8 | 7.3 | 28.4 | 17.5
Health related questions
Smokers (yes) | 27.1 | 18.9 | 22.1 | 17.6 | 20.6 | 20.5
Diabetes (yes) | 4.5 | 6.7 | 9.4 | 7.8 | 11.5 | 7.9
Hypertension (yes) | 16.8 | 22.9 | 28.6 | 28.5 | 37.7 | 27.2
Asthma (yes) | 11.9 | 10.9 | 9.5 | 11.3 | 11.2 | 11.1


7. Contrast I. EWP vs. RDD + cell

In the analyses that follow, the NHIS data for households whose telephone status is known are taken as the true “gold standard” population data. The five segments of household telephone status⁸ – listed landline only, unlisted landline only, listed landline plus cell phone access, unlisted landline plus cell phone access, and cell-phone only – are grouped to create three different sample designs, which are contrasted empirically by calculating and examining the coverage bias errors associated with each design.
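Which segments each design leaves uncovered can be made explicit with simple set arithmetic. The Python sketch below uses our own shorthand labels for the five segments and encodes the frame definitions given in the three contrasts that follow. One assumption is worth stating: unlisted-landline households that also have cell access are treated as reachable through the cell frame.

```python
# Shorthand labels for the five telephone-status segments (our own naming).
SEGMENTS = frozenset({
    "listed landline only",
    "unlisted landline only",
    "listed landline + cell",
    "unlisted landline + cell",
    "cell-phone only",
})

# Segments reachable by each design, following the definitions in the text.
DESIGNS: dict[str, frozenset[str]] = {
    "RDD + cell": SEGMENTS,                                # gold standard
    "EWP": frozenset({"listed landline only", "listed landline + cell"}),
    "RDD": SEGMENTS - {"cell-phone only"},                 # landline frame only
    "EWP + cell": SEGMENTS - {"unlisted landline only"},   # cell frame reaches ULL + cell
}


def uncovered(design: str) -> frozenset[str]:
    """Segments that fall outside a design's frame(s)."""
    return SEGMENTS - DESIGNS[design]


for name in DESIGNS:
    print(f"{name}: misses {sorted(uncovered(name)) or 'nothing'}")
```

The uncovered sets correspond to the notes under Tables 3, 4 and 5. Under EWP, all unlisted and cell-only households are missed. Under RDD, only cell-only households are missed. Under EWP + cell, only unlisted-landline-only households are missed.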

The first contrast compares the electronic white pages (EWP) sample frame to the gold standard RDD + cell dual-frame design, which covers all telephone households.⁹ The EWP households in the 2006 NHIS are those whose telephone status can be determined and that have a listed landline telephone. Some of these listed landline households also have cell phones and some do not. Households without listed landlines are excluded from the EWP frame: that is, households with cell-only status and those with unlisted landlines (again, households without telephone service of any kind are excluded from this paper). In the 2006 NHIS, among households whose telephone status could be determined, the EWP frame covered 51.5% of all telephone households, shown shaded in Fig. 3.

According to the 2006 NHIS data, the EWP frame is most biased in its underrepresentation of renters, by more than 11 percentage points (Table 3, below). Young adults aged 18–25 are underrepresented by 5.0 points and African-Americans by 2.6 points, while the underrepresentation of those with a high school education or less is under 2 points. In addition, three of the four substantive health status variables have biases of 1.9 points or less, as calculated by the formula presented above. Thus, even though the EWP greatly underrepresents renters, the overall noncoverage biases for many other demographic and substantive variables do not seem to be as large as one might expect.

Fig. 3. Electronic white pages sample design.

Table 3. EWP vs. RDD + cell.

| Variables | (a) U/N (%) | (b) Yc (%) | (c) Yu (%) | (d) Yc − Yu (%) | (e) (U/N)(Yc − Yu) (%) | (f) Y (%) | (g) Yc − Y (b − f) (%) | (h) Percent bias (g/f) (%) |
|---|---|---|---|---|---|---|---|---|
| Demographics | | | | | | | | |
| African-American | 48.5 | 9.1 | 14.4 | −5.3 | −2.6 | 11.7 | −2.6 | −22.2 |
| 18–25 | 48.5 | 10.0 | 20.4 | −10.4 | −5.0 | 15.0 | −5.0 | −33.3 |
| HS or less | 48.5 | 31.2 | 34.8 | −3.6 | −1.7 | 32.9 | −1.7 | −5.2 |
| Renter | 48.5 | 16.3 | 39.2 | −22.9 | −11.1 | 27.4 | −11.1 | −40.5 |
| Less than $20,000 | 47.9 | 14.2 | 21.1 | −6.9 | −3.3 | 17.5 | −3.3 | −18.9 |
| Health related questions | | | | | | | | |
| Smokers (yes) | 48.4 | 18.6 | 22.6 | −4.0 | −1.9 | 20.5 | −1.9 | −9.3 |
| Diabetes (yes) | 48.7 | 9.0 | 6.7 | 2.3 | 1.1 | 7.9 | 1.1 | 13.9 |
| Hypertension (yes) | 48.5 | 31.6 | 22.5 | 9.1 | 4.4 | 27.2 | 4.4 | 16.2 |
| Asthma (yes) | 48.5 | 11.3 | 10.8 | 0.5 | 0.2 | 11.1 | 0.2 | 1.8 |

Unlisted is the uncovered group – cell phones are unlisted by definition.

8 The sixth segment – no landline telephone and no cell phone access – is excluded from the analysis because it is not relevant to a comparison of various telephone sampling frames.

9 Since our gold standard is households with known telephone status in the NHIS, and a properly weighted RDD + cell dual-frame design would include all five segments and is thus exhaustive of all households with known telephone status, we refer to the RDD + cell design as our ‘gold-standard’ design.

But the population proportions for most of these groups are also fairly small. Another way to express the impact of bias relative to the population proportions is the percent bias statistic described by Lepkowski and Groves (1986, p. 396) and Moore and Marquis (1988), which is the ratio of the absolute value of the bias to the true population value, times 100. (In a departure from Lepkowski and Groves (1986), we have elected to retain the sign of the bias rather than use absolute values.) In this paper, the NHIS values derived from the full sample are taken to be the true population values. The percent biases in this contrast are shown in column (h) of Table 3. These are quite large in many cases and tell a different story: four of the five demographic variables and three of the four substantive variables show percent biases ranging in magnitude from about nine percent to more than 40%. This is why EWP sampling frames are usually avoided in social science research.
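As a quick arithmetic check, column (h) of Table 3 can be reproduced from columns (g) and (f). In the minimal Python sketch below, `percent_bias` is a hypothetical helper. It keeps the sign by default, as this paper does, and offers the absolute-value form described by Lepkowski and Groves as an option.

```python
def percent_bias(bias: float, true_value: float, signed: bool = True) -> float:
    """Percent bias = 100 * bias / true value.

    signed=True keeps the direction of the bias (the convention used here);
    signed=False returns the absolute-value version.
    """
    if true_value == 0:
        raise ValueError("percent bias is undefined when the true value is 0")
    value = 100.0 * bias / true_value
    return value if signed else abs(value)


# Column (g) bias and column (f) true value for the demographic rows of Table 3.
TABLE3 = {
    "African-American": (-2.6, 11.7),
    "18-25": (-5.0, 15.0),
    "HS or less": (-1.7, 32.9),
    "Renter": (-11.1, 27.4),
    "Less than $20,000": (-3.3, 17.5),
}

for label, (g, f) in TABLE3.items():
    print(f"{label:<18} {percent_bias(g, f):6.1f}")
# Prints -22.2, -33.3, -5.2, -40.5 and -18.9, matching column (h).
```

The example shows why percent bias adds information. A 2.6-point bias on a group that makes up about 12% of the population is proportionally much larger than a 1.7-point bias on a group that makes up a third of it.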

8. Contrast II. RDD vs. RDD + cell

The second contrast uses the data from the 2006 NHIS to determine what reduction in noncoverage bias is achieved by adding the cell-phone sample, as compared to relying only on the standard RDD sample. (Again, this analysis is limited to households in the 2006 NHIS for which telephone status could be determined.) As Fig. 4 shows, four of the five telephone segments are included in the standard RDD frame. This leaves uncovered the 16.6% of all telephone households that were cell-only.

Table 4 compares the standard RDD frame to the RDD + cell design that represents the gold standard in this analysis. The RDD frame is most biased in its underrepresentation of renters (by more than 5 percentage points), those aged 18–25 (by 3 points) and smokers (by more than 1 point). The standard RDD frame also overrepresents hypertensive respondents by 2 points. Because hypertension is more common among older adults, this overrepresentation is plausibly linked to the underrepresentation of young adults. The exclusion of cell phones in the standard RDD frame does not seem to bias the sample as strongly for African-Americans, those with a high school education or less, or those with household incomes below $20,000. Nor does it strongly bias the estimates for diabetes or asthma.

Fig. 4. RDD sample design.

Table 4. RDD vs. RDD + cell.

| Variables | (a) U/N (%) | (b) Yc (%) | (c) Yu (%) | (d) Yc − Yu (%) | (e) (U/N)(Yc − Yu) (%) | (f) Y (%) | (g) Yc − Y (b − f) (%) | (h) Percent bias (g/f) (%) |
|---|---|---|---|---|---|---|---|---|
| Demographics | | | | | | | | |
| African-American | 16.6 | 11.3 | 13.4 | −2.1 | −0.3 | 11.7 | −0.4 | −3.4 |
| 18–25 | 16.6 | 11.8 | 31.1 | −19.3 | −3.2 | 15.0 | −3.2 | −21.3 |
| HS or less | 16.6 | 33.1 | 32.1 | 1.0 | 0.2 | 32.9 | 0.2 | 0.6 |
| Renter | 16.6 | 22.0 | 54.5 | −32.5 | −5.4 | 27.4 | −5.4 | −19.7 |
| Less than $20,000 | 16.7 | 15.8 | 25.6 | −9.8 | −1.6 | 17.5 | −1.7 | −9.7 |
| Health related questions | | | | | | | | |
| Smokers (yes) | 16.5 | 19.3 | 27.1 | −7.8 | −1.3 | 20.6 | −1.3 | −6.3 |
| Diabetes (yes) | 16.7 | 8.6 | 4.5 | 4.1 | 0.7 | 7.9 | 0.7 | 8.9 |
| Hypertension (yes) | 16.6 | 29.2 | 16.8 | 12.4 | 2.1 | 27.2 | 2.0 | 7.4 |
| Asthma (yes) | 16.6 | 10.9 | 11.9 | −1.0 | −0.2 | 11.1 | −0.2 | −1.8 |

Cell-only is the uncovered group.

9. Contrast III. EWP + cell vs. RDD + cell

The third contrast examined here explores the acceptability of a compromise. Eliminating the unlisted RDD numbers from the full RDD + cell design produces our proposed EWP + cell design, and we anticipate that it will incur lower costs. But what is the effect on noncoverage bias? The EWP + cell design is represented by the shaded segments of the telephone universe in Fig. 5.

The results from the 2006 NHIS are – perhaps – surprising. Compared to the gold standard RDD + cell design, the EWP + cell design underrepresents African-Americans and low-education and low-income households. However, it has low percent biases on the other demographic variables. More importantly, the percent biases are low on the substantive health variables in this analysis. The reader can compare the entries in the Yc column with those in the Y column to see how the results for the covered sample compare to those for the full telephone population. Depending on the cost savings in the EWP + cell design, the overall picture might be very tolerable to survey researchers, particularly if the variables of interest in the survey are thought to be correlated with age (as many health status variables are). Table 5 shows these results.
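One way to compare the three contrasts side by side is to average the magnitudes of the percent biases in column (h) of Tables 3, 4 and 5. This mean absolute percent bias is our own summary, not a statistic the paper reports. It weights every variable equally and hides the fact that EWP + cell has larger biases than RDD for African-Americans, education and income. The Python sketch below is offered only as a compact view of the tables.

```python
from statistics import mean

# Column (h), percent bias, transcribed from Tables 3 (EWP), 4 (RDD) and
# 5 (EWP + cell). Demographic order: African-American, 18-25, HS or less,
# renter, income < $20,000. Health order: smokers, diabetes, hypertension, asthma.
PERCENT_BIAS = {
    "EWP": {"demographic": [-22.2, -33.3, -5.2, -40.5, -18.9],
            "health": [-9.3, 13.9, 16.2, 1.8]},
    "RDD": {"demographic": [-3.4, -21.3, 0.6, -19.7, -9.7],
            "health": [-6.3, 8.9, 7.4, -1.8]},
    "EWP + cell": {"demographic": [-5.2, 2.0, -7.3, -5.8, -10.3],
                   "health": [-1.0, -2.5, -1.1, 1.8]},
}


def mean_abs(values: list[float]) -> float:
    """Average magnitude of the percent biases, ignoring their signs."""
    return mean(abs(v) for v in values)


for design, groups in PERCENT_BIAS.items():
    summary = ", ".join(f"{k} {mean_abs(v):.1f}%" for k, v in groups.items())
    print(f"{design}: {summary}")
```

With these inputs, EWP + cell averages about 6% on the demographics and under 2% on the health items. RDD averages about 11% and 6%, and EWP about 24% and 10%.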

10. Coverage trends in the NHIS

The EWP + cell design will certainly have a lower cost than the full RDD + cell design, and it appears to have relatively small noncoverage biases. In addition, the utility of the EWP + cell design may increase over time because of two trends in telephone usage. First, as noted earlier, the inefficiency of the RDD frame has been increasing (Tucker et al., 2005; Battaglia et al., 2007). The greater the costs of dialing unlisted RDD numbers, the greater the cost savings of the EWP + cell design.

Second, the proportion of the telephone universe that is unlisted landline only – the only segment left out of the EWP + cell design – is shrinking. The noncoverage bias due to excluding unlisted landline only households may therefore shrink over time. Alternatively, these households may come to represent such a small portion of the telephone universe that they could safely be ignored. Fig. 6 shows the trends toward cell-only households and away from unlisted landline only households in the 2003–2006 NHIS.

Fig. 5. Electronic white pages + cell sample design.

Table 5. EWP + cell vs. RDD + cell.

| Variables | (a) U/N (%) | (b) Yc (%) | (c) Yu (%) | (d) Yc − Yu (%) | (e) (U/N)(Yc − Yu) (%) | (f) Y (%) | (g) Yc − Y (b − f) (%) | (h) Percent bias (g/f) (%) |
|---|---|---|---|---|---|---|---|---|
| Demographics | | | | | | | | |
| African-American | 14.2 | 11.0 | 15.7 | −4.7 | −0.7 | 11.6 | −0.6 | −5.2 |
| 18–25 | 14.2 | 15.3 | 13.6 | 1.7 | 0.2 | 15.0 | 0.3 | 2.0 |
| HS or less | 14.2 | 30.5 | 48.0 | −17.5 | −2.5 | 32.9 | −2.4 | −7.3 |
| Renter | 14.2 | 25.8 | 37.2 | −11.4 | −1.6 | 27.4 | −1.6 | −5.8 |
| Less than $20,000 | 13.7 | 15.7 | 28.8 | −13.1 | −1.8 | 17.5 | −1.8 | −10.3 |
| Health related questions | | | | | | | | |
| Smokers (yes) | 14.2 | 20.3 | 22.1 | −1.8 | −0.3 | 20.5 | −0.2 | −1.0 |
| Diabetes (yes) | 14.2 | 7.7 | 9.4 | −1.7 | −0.2 | 7.9 | −0.2 | −2.5 |
| Hypertension (yes) | 14.2 | 26.9 | 28.6 | −1.7 | −0.2 | 27.2 | −0.3 | −1.1 |
| Asthma (yes) | 14.2 | 11.3 | 9.5 | 1.8 | 0.3 | 11.1 | 0.2 | 1.8 |

Unlisted-landline only is the uncovered group.

CSR recently conducted three citizen satisfaction surveys in Prince William County, Albemarle County and Chesterfield County in Virginia. In the traditional RDD samples of these surveys, 4.4%, 2.3% and 2.0% of responding households, respectively, were unlisted landline only households.¹⁰ Putting aside for the moment any concern that citizens in households with only unlisted landline telephone service may be less likely to respond to surveys, these data indicate that in these three counties, noncoverage bias from the EWP + cell design may be trivial, practically speaking.
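The coverage bias formula gives a rough sense of scale for these county figures. The Python sketch below makes two assumptions of our own: it treats each share of responding RDD households as the local $U/N$, and it borrows the largest covered-minus-uncovered gap in Table 5 (high school or less, −17.5 points) as a hypothetical local value. It also reports a hard worst case. For a percentage estimate, $|Y_C - Y_U|$ cannot exceed 100 points, so $|Y_C - Y|$ cannot exceed $100 \cdot U/N$ points.

```python
# Share of responding RDD households that were unlisted-landline-only in the
# three county surveys cited above (assumption: treated here as U/N).
ULL_SHARE = {"Prince William": 0.044, "Albemarle": 0.023, "Chesterfield": 0.020}

# Hypothetical assumption: the largest covered-minus-uncovered gap in
# Table 5 (HS or less: 30.5% vs. 48.0%, i.e. -17.5 points) holds locally.
ASSUMED_GAP = -17.5


def bias_points(u_over_n: float, gap_points: float) -> float:
    """Coverage bias (U/N)(Y_C - Y_U), in percentage points."""
    return u_over_n * gap_points


def worst_case_bias_points(u_over_n: float) -> float:
    """Upper bound on |bias|: |Y_C - Y_U| cannot exceed 100 points."""
    return 100.0 * u_over_n


for county, share in ULL_SHARE.items():
    print(f"{county}: worst case {worst_case_bias_points(share):.1f} points, "
          f"with the Table 5 gap {bias_points(share, ASSUMED_GAP):+.2f} points")
```

Under the borrowed gap, the implied biases are under one percentage point in all three counties (about −0.77, −0.40 and −0.35). As the text notes, this ignores possible differences in response propensity.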

In terms of demographic and substantive biases, the NHIS data also show that the RDD frame is trending away from the gold standard, while the EWP + cell design is trending toward it. Fig. 7 shows the percent biases across 4 years of NHIS data for the five demographic characteristics covered in this paper: African-American, aged 18–25, educated at high school or less, renting, and household income below $20,000 annually. The horizontal line at “0.0%” represents no percent bias. Except for those educated at high school or less (who show a slight trend toward overrepresentation), these key demographic groups are increasingly underrepresented in the RDD frame.

The percent biases for the demographics of the EWP + cell design (Fig. 8), however, moved closer to the gold standard from 2003 to 2006 in the NHIS data. The exception is those aged 18–25, who are starting to be overrepresented in the EWP + cell design.

Fig. 6. Trends in cell-only and ULL households.

Fig. 7. Trends in percent bias in the RDD sample design for selected demographics (NHIS data).

10 The surveys used samples from three frames: listed landlines, traditional RDD and cell phones. The percentages of ULL households cited here are for the RDD frames. See Guterbock et al. (2009).

The results were similar for the substantive health variables. The percent biases in the RDD frame widened from 2003 to 2006 in the NHIS (Fig. 9) and ranged from −6.3% to 8.9%. But in the EWP + cell design, they generally fluctuated in a much narrower range (Fig. 10). These figures show growing biases in the RDD frame and smaller biases in the EWP + cell design. The growing RDD biases are why cell phones are being added to the design more frequently. In short, the EWP + cell design ignores a shrinking telephone segment that is costly to reach and is not very different statistically from the others. In its place, it includes a growing telephone segment that is also costly to reach but adds more important information to the data, statistically speaking.

11. Data collection costs

Survey research is a constant balancing of resources, usually measured by time and money, against data quality. As noted earlier, the EWP + cell design has higher noncoverage biases than the RDD + cell design. These will be offset to some degree by lower data collection costs, because households with only unlisted landline telephone numbers are excluded. The relevant question is whether the reduced costs would be attractive enough for the researcher to accept the increased noncoverage bias.

Fig. 8. Trends in percent bias in the EWP + cell sample design for selected demographics (NHIS data).

Fig. 9. Trends in percent bias in the RDD sample design for selected substantive health variables (NHIS data).

Table 6 compares three hypothetical telephone survey sampling schemes. It uses an estimated cost of $32 per interviewing hour to represent the personnel and non-personnel costs of telephone data collection. It also uses estimated completions per interviewing hour (CPH) that might be observed in a “typical” opinion survey of closed-ended questions averaging 12–18 min per completion. The CPH incorporates the effects of efficient or inefficient frames or segments. The data collection costs of the proposed EWP + cell design compare very favorably with those of the traditional RDD design. As we have demonstrated, the EWP + cell design is also expected to yield smaller noncoverage biases than the RDD design. In this hypothetical comparison, the EWP + cell design saves about 14% of the data collection costs of the gold standard RDD + cell design.
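The cost figures in Table 6 follow from one relationship: interviewing hours equal target completions divided by CPH, and cost equals hours times the hourly rate. The Python sketch below reproduces the table's totals. The $32 rate, the target Ns and the CPH values are the paper's hypothetical inputs, and `segment_cost` is an illustrative helper of our own.

```python
COST_PER_HOUR = 32.0  # dollars per interviewing hour, as assumed in Table 6


def segment_cost(target_n: int, cph: float, cost_per_hour: float = COST_PER_HOUR) -> float:
    """Interviewing hours needed (completes / CPH) times the hourly cost."""
    return target_n / cph * cost_per_hour


# (target completes, CPH) for each frame in each design, from Table 6.
DESIGN_SEGMENTS = {
    "RDD-only": [(1000, 1.1)],
    "RDD + cell": [(800, 1.1), (200, 0.55)],
    "EWP + cell": [(800, 1.4), (200, 0.55)],
}

totals = {
    name: sum(segment_cost(n, cph) for n, cph in segments)
    for name, segments in DESIGN_SEGMENTS.items()
}
for name, total in totals.items():
    print(f"{name}: ${total:,.0f}")  # $29,091, $34,909 and $29,922

saving = 1 - totals["EWP + cell"] / totals["RDD + cell"]
print(f"EWP + cell saves {saving:.1%} of the RDD + cell cost")  # 14.3%
```

The cell segment is identical in both dual-frame designs, so the entire saving comes from the EWP frame's higher CPH (1.4 vs. 1.1). With these inputs the cost ratio works out to exactly 6/7, a saving of one-seventh. The result is therefore only as good as the assumed CPH values.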

12. Conclusion

There are many factors to consider when deciding what methods to use for a particular project. The final decisions depend on how the survey methodologist balances scientific knowledge with the unique circumstances of the project at hand. We are currently engaged in empirical tests of alternative dual-frame designs (EWP + cell vs. RDD + cell) in telephone surveys of several Virginia localities (with some initial results from these triple-frame studies reported in Guterbock et al., 2009); further tests are planned on metropolitan or nationwide samples. Our work to date suggests that the EWP + cell sampling design is a viable option that fits current trends in telephone usage and offers a favorable balance of survey errors and survey costs.

References

AAPOR Cell Phone Task Force, 2010. New Considerations for Survey Researchers When Planning and Conducting RDD Telephone Surveys in the US With Respondents Reached via Cell Phone Numbers. American Association for Public Opinion Research. <www.aapor.org>.

Babbie, Earl, 2007. The Practice of Social Research, 11th ed. Thomson Wadsworth, Belmont, CA.

Battaglia, Michael P., Khare, Meena, Frankel, Martin R., Murray, Mary Cay, Buckley, Paul, Peritz, Saralyn, 2007. In: Lepkowski, James M., Tucker, Clyde, Brick, J. Michael, de Leeuw, Edith, Japec, Lilli, Lavrakas, Paul J., Link, Michael W., Sangster, Roberta L. (Eds.), Advances in Telephone Survey Methodology. John Wiley & Sons, Hoboken, NJ, pp. 529–560.

Blumberg, Stephen J., Luke, Julian V., 2010. Wireless Substitution: Early Release of Estimates Based on Data from the National Health Interview Survey, July–December 2009. National Center for Health Statistics. <http://www.cdc.gov/nchs/data/nhis/earlyrelease/wireless201005.pdf>.

Fig. 10. Trends in percent bias in the EWP + cell sample design for selected substantive health variables (NHIS data).

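Table 6. Cost comparison among three sample designs.

| Parameters | Traditional design (RDD-only): RDD | “Gold standard” (RDD + cell): RDD | “Gold standard” (RDD + cell): Cell | Proposed (EWP + cell): EWP | Proposed (EWP + cell): Cell |
|---|---|---|---|---|---|
| Target N | 1000 | 800 | 200 | 800 | 200 |
| Cost per hour | $32 | $32 | $32 | $32 | $32 |
| CPHᵃ | 1.1 | 1.1 | 0.55 | 1.4 | 0.55 |
| Cost by segment | $29,091 | $23,273 | $11,636 | $18,286 | $11,636 |
| Total cost | $29,091 | $34,909 (both segments) | | $29,922 (both segments) | |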

a CPH = completions per interviewing hour.

Blumberg, Stephen J., Luke, Julian V., Cynamon, Marcie L., Frankel, Martin R., 2008. Recent trends in household telephone coverage in the United States. In: Lepkowski, James M., Tucker, Clyde, Brick, J. Michael, de Leeuw, Edith, Japec, Lilli, Lavrakas, Paul J., Link, Michael W., Sangster, Roberta L. (Eds.), Advances in Telephone Survey Methodology. John Wiley & Sons, Hoboken, NJ, pp. 56–86.

Brick, J. Michael, Edwards, W. Sherman, Lee, Sunghee, 2007. Sampling telephone numbers and adults, interview length, and weighting in the California Health Interview Survey cell phone pilot study. Public Opinion Quarterly 71 (5), 793–813.

Ehlen, John, Ehlen, Patrick, 2007. Cellular-only substitution in the United States as lifestyle adoption: implications for telephone survey coverage. Public Opinion Quarterly 71 (5), 717–733.

Groves, Robert M., 1989. Survey Errors and Survey Costs. John Wiley & Sons, New York.

Guterbock, Thomas M., Hartman, David E., Hubbard, Ryan A., 2003. RDD vs listed: an experimental study of coverage error, costs and non-response in a statewide telephone survey. In: Presented at the Annual Meetings of the American Association for Public Opinion Research, Nashville, TN (May).

Guterbock, Thomas M., Diop, Abdoulaye, Holian, Laura, 2007. White pages, white people: reasons for the low listed-phone rates of African-Americans. In: Presented at the Annual Meetings of the American Association for Public Opinion Research, Anaheim, CA (May).

Guterbock, Thomas M., Ellis, James, Diop, Abdoulaye, Le, Kien, Holmes, John Lee, 2009. Who needs RDD–part II: an assessment of coverage bias in dual-frame designs that combine directory-listed and cellphone samples. In: Poster Presented at the Annual Meetings of the American Association for Public Opinion Research, Hollywood, FL (May).

Guterbock, Thomas M., Lavrakas, Paul J., Tompson, Trevor N., ZuWallack, Randal, 2010. The variable costs of cell phone interviewing: understanding cost and productivity ratios in dual-frame telephone surveys. In: Presented at the Annual Meetings of the American Association for Public Opinion Research, Chicago (May).

Keeter, Scott, Dimock, Michael, Christian, Leah, 2008. Calling Cell Phones in ’08 Pre-election Polls. Pew Research Center for the People & the Press, Washington, DC. <http://www.people-press.org/reports/pdf/cell-phone-commentary.pdf> (accessed 12.18.09).

Kennedy, Courtney, 2007. Evaluating the effects of screening for telephone service in dual frame RDD surveys. Public Opinion Quarterly 71 (5), 750–771.

Lavrakas, Paul J., Shuttles, Charles D., Steeh, Charlotte, Feinberg, Howard, 2007. The state of surveying cell phone numbers in the United States: 2007 and beyond. Public Opinion Quarterly 71 (5), 840–854.

Lepkowski, James M., Groves, Robert M., 1986. A mean squared error model for dual frame, mixed mode survey design. Journal of the American Statistical Association 81 (396), 930–937.

Link, Michael W., Battaglia, Michael P., Frankel, Martin R., Osborn, Larry, Mokdad, Ali H., 2007. Reaching the US cell phone generation: comparison of cell phone survey results with an ongoing landline telephone survey. Public Opinion Quarterly 71 (5), 814–839.

Moore, Jeffrey C., Marquis, Kent H., 1988. Using administrative record data to describe SIPP response errors. In: Presented at the Annual Meetings of the American Statistical Association, New Orleans, LA (August).

Oldendick, Robert W. et al., 2004. Differences in an RDD and list sample: an experimental comparison. In: Presented at the Annual Meetings of the American Association for Public Opinion Research, Phoenix, AZ (May).

Psaty, Bruce M., Cheadle, Allen, Curry, Susan, McKenna, Thomas, Koepsell, Thomas D., Wickizer, Thomas, VonKorff, Michael, Diehr, Paula, Perrin, Edward B., Wagner, Edward H., 1991. Sampling elderly in the community: a comparison of commercial telemarketing lists and random digit dialing techniques for assessing health behaviors and health status. American Journal of Epidemiology 134, 96–106.

Schejbal, Judith A., Lavrakas, Paul J., 1994. Coverage error and cost issues in small area telephone surveys. American Statistical Association: 1994 Proceedings of the Section on Survey Methods, pp. 1287–1292.

Traugott, Michael W., Groves, Robert M., Lepkowski, James M., 1987. Using dual frame designs to reduce nonresponse in telephone surveys. Public Opinion Quarterly 51, 522–539.

Tucker, Clyde, Lepkowski, James M., Piekarski, Linda, 2005. The current efficiency of list-assisted telephone sampling designs. Public Opinion Quarterly 66, 321–338.

US Department of Health and Human Services, 2000. Design and Estimation for the National Health Interview Survey, 1995–2004. Series 2: Data Evaluation and Methods Research. DHHS Publication No. (PHS) 2000-1330, Vital and Health Statistics. Hyattsville, Maryland, June 2000.

US Department of Health and Human Services, 2007. Data File Documentation, National Health Interview Survey, 2006 (machine readable data file and documentation). National Center for Health Statistics. Centers for Disease Control and Prevention, Hyattsville, Maryland.

Wilson, David H., Starr, Gary J., Taylor, Anne W., Dal Grande, Eleonara, 1999. Random digit dialing and electronic white pages samples compared: demographic profiles and health estimates. Australian and New Zealand Journal of Public Health 23, 627–633.

Zogby, John, Bruce, John, Wittman, Rebecca, Ren, Grace Q., Vanno, Philip A., 2007. Random Digit Dialing vs. Listed Sample: A Zogby International Research Study into Telephone Survey Methodologies. Zogby International (May). <http://www.zogby.com/Listed_vs_RDD_05_15l.pdf>.
