Validity of Responses to Survey Questions BY HUGH J. PARRY AND HELEN M. CROSSLEY

This article is designed as one of a series which will discuss certain aspects of validity in surveys. The first article, which appears below, examines two current concepts of validity (as predictive accuracy, and as a matter of interpretation), reviews the literature on the subject, and presents some of the results of a specially designed survey in Denver which showed that the validity of even simple "factual" responses may often be open to question. Subsequent articles will discuss the effect of the interviewer on the validity of survey results and the variations in validity according to respondent characteristics and other variables.

Hugh J. Parry was formerly Acting Director of the Opinion Research Center of the University of Denver, and is at present Director of Publications for the Anti-Defamation League. Helen M. Crossley, formerly Senior Analyst at the Opinion Research Center, is now with the Attitude Research Branch of the Armed Forces Information and Education Division.

Perhaps no word has been more vaguely or loosely used in all the social sciences than "validity." To some it is a matter of gradation (a continuum, so to speak) ranging from an imaginary absolute of perfection down to an equally imaginary absolute of non-validity. To others, more naive, it is an either-or dichotomy, chiefly useful as a weapon to hurl against personal or ideological opponents. Yet validity is basic to all research, and the concept clearly must be made more specific.

A. CONCEPTS OF VALIDITY

Validity as Predictive Accuracy

In defining the essential meaning of "validity," two main schools of thought can be distinguished. The more common definition is given in terms of predictive accuracy. Social psychologists, educators, and others concerned with psychological testing are familiar with the concept of validity as the ability of a test to predict performance; the criterion in the case of an entire test being some outside measure such as school success, while item validity measures the predictive accuracy of individual test items against the criterion of the full test score. Hundreds of new tests have been devised and validated, based sometimes on dangerously small groups of college students, sometimes on larger heterogeneous populations. Many discussions of test validation, both definitional and methodological, can be found in the literature and will not be reviewed here. Nearly all of them, to a greater or lesser extent, are based on the concept of validity as the ability to predict performance, although some writers have begun to point out that the performance criteria themselves may be subject to various types of invalidity.1

PUBLIC OPINION QUARTERLY, SPRING 1950

Research workers in the broader field of public opinion and market research and other social scientists interested in the definition of attitudes and opinions and their manifestations have applied the concept of validity as predictive accuracy more broadly to mean prediction of behavior. In this sense, attitude surveys are considered valid if they can predict with reasonable certainty how various groups or individuals will behave at the polls, in a grocery store, or in some other future behavioral situation. However, since Link and Freiberg2 made their historic, categorical statement linking validity to behavior, their concept has been questioned more and more frequently and severely. Dollard3 pointed out in a recent article that the conditions under which opinions can be expected to predict behavior may vary greatly according to such factors as the state of mind and verbal ability of respondents, the conditions of the test situation, and the intrusion of outside factors between the time of the test and the actual behavioral situation. A study by Pace at Syracuse compared the answers to nine "opinion scales" with results from seven "activity scales" and led to the conclusion:4

"Many attitude tests are descriptive but not predictive, and their meaning and interpretation is limited by this fact. Definitions of attitude as a tendency to act may need to be reconsidered; acceptance of the definition implies that behavior is the criterion of validity."

Validity as a Matter of Interpretation

In recent years many analysts and users of social research have come to realize that the use of "validity" to mean predictive accuracy is not a fair or complete test of the accuracy or usefulness of survey results. Opinion may be closely related to behavior, but it is not the same thing and it may therefore have separate validity of its own. As Connelly has so aptly said: "Answers to every question asked uniformly of an adequate sample by capable interviewers have validity;"5 the trouble starts with interpretation of responses apart from their own stimuli. Moreover, validity as predictive accuracy applies only to attitude and opinion studies, and calls for another definition for the validity of so-called "factual" questions. Definition and measurement of validity, obviously, also involve problems of semantics, since both the researcher and the user of survey results must mean the same thing when they talk about opinion, factual information, and validity. Such necessary inclusiveness of the validity concept was pointed out in 1946 by McNemar.6

1 Cf. Jenkins, J. G., "Validity for What?", Journal of Consulting Psychology, Vol. 10 (1946).

2 Link, Henry C., and A. D. Freiberg, "The Problem of Validity vs. Reliability in Public Opinion Polls," Public Opinion Quarterly, Vol. 6, No. 1 (1942), p. 98.

3 Dollard, John, "Under What Conditions Do Opinions Predict Behavior?", Public Opinion Quarterly, Vol. 12, No. 4 (1948), p. 623.

4 Pace, C. Robert, "Opinion and Action: A Study in Validity of Attitude Measurement," American Psychologist, Vol. 4 (1949), p. 242.

5 Connelly, Gordon M., "Now Let's Look at the Real Problem: Validity," Public Opinion Quarterly, Vol. 9, No. 1 (1945), p. 53.

6 McNemar, Quinn, "Opinion-Attitude Methodology," Psychological Bulletin, Vol. 43 (1946), p. 315.


The conflicting definitions of validity were clearly brought out at the Central City Conference on Public Opinion Research in 1946, at which some participants held out for validity in terms of prediction, others for consistency criteria, and still others in terms of the interpretations made of survey data.7 Since that time growing numbers of social scientists have begun to define validity in terms not of prediction but of interpretation. This meaning of validity is an extension of the classical use of the term to describe something that measures "what it is supposed to measure," limited by a careful consideration of what the instrument can logically be expected to measure.

The significance of the differing concepts of validity can be clearly illustrated by the case of the 1948 polls. In terms of forecasting the behavior of the electorate at the voting booths, the 1948 polls were far from valid; yet in terms of measuring relative sentiment towards Dewey and Truman at the time they were taken, their validity may have been high. The report of the special committee of the Social Science Research Council even went so far as to report that: "There is a possibility that the shift [last-minute swing to Truman] could have been large enough to make Gallup's and Crossley's last pre-election surveys not too far off the mark, as of two weeks before the election."8 They may have measured opinion of respondents, two or three weeks beforehand, as to what they thought they would do at election time; but because of the last-minute shift, turnout complications, and many other less spectacular factors, they could not validly forecast who would actually vote or how. No more convincing argument for using "validity" in terms of interpretation of data rather than prediction of behavior can be given.

Validity is the most important single concept facing either the casual or the specialized user of survey results. In view of this fact, it is discouraging that so little attention has been paid to it to date. Even Cantril's admirable text, Gauging Public Opinion, does not tackle the problem of validity, except in a brief technical study of interviewer ratings. Had social scientists paid more attention to this crucial matter, the election predictions of 1948 might have been less widely accepted as Gospel by laymen, and the causes of their failure might have been more widely understood. At any rate, there would now be less evidence of the misconception, still prevalent, that the polls' failure in 1948 was due to some particular error in sampling, interviewing technique, or statistical allocation of certain groups, rather than to the far more basic error of trying to predict behavior where they could not validly be expected to do more than measure pre-election opinion and intention.

7 Proceedings of the Central City Conference on Public Opinion Research, Panel 5: "Validity in Public Opinion Surveys." Panel Members: H. H. Remmers, E. Palmer Hoyt, Wilfrid Sanders, Herbert Hyman. National Opinion Research Center, Denver, Colorado, 1946.

8 The Pre-Election Polls of 1948 - Report to the Committee on Analysis of Pre-Election Polls and Forecasts. Social Science Research Council, Bulletin 60, 1949, p. 313.

B. MEASURES OF VALIDITY

Aggregate and Individual Validity

Whether validity is considered as predictive accuracy or as interpretation, however, some way must still be found to measure it. Often, of course, no check is possible for the major findings which the survey was designed to uncover; validity must be established for related questions and independent characteristics. Except in test validation, the usual method has been by means of comparisons of aggregate results from the survey in question against actual or percentage figures from an outside source, such as election results or census figures. The concept of aggregate validation, both of sample designs and of survey results, is a familiar one in market research,9 as well as in the field of election forecasting and social research in general. On many types of surveys, given sufficient aggregate checks, results can often be assumed to have over-all validity. Yet there is always a danger that satisfactory aggregate comparisons may conceal dangerous compensating errors. Thus, the most reliable means of establishing the validity of survey results is the comparison of aggregate results with outside data accompanied by an independent check on the worth of the individual responses.
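The danger of compensating errors can be made concrete with a small sketch. The records below are invented purely for illustration and are not data from any study cited in this article; each pairs a respondent's report with the corresponding outside record. The function names are likewise hypothetical. In this constructed case the aggregate percentages agree exactly, yet one report in five is wrong.

```python
# Hypothetical (reported, actual) pairs; True means "voted".
# These counts are assumptions chosen for illustration, not survey data.
records = (
    [(True, True)] * 50      # said they voted, record shows a vote
    + [(True, False)] * 10   # said they voted, record shows none
    + [(False, True)] * 10   # said they did not vote, record shows a vote
    + [(False, False)] * 30  # said they did not vote, record shows none
)


def aggregate_rates(pairs):
    """Return (share reporting a vote, share with a recorded vote)."""
    n = len(pairs)
    reported = sum(1 for said, _ in pairs if said) / n
    actual = sum(1 for _, rec in pairs if rec) / n
    return reported, actual


def individual_agreement(pairs):
    """Return the share of respondents whose report matches the record."""
    return sum(1 for said, rec in pairs if said == rec) / len(pairs)


reported_rate, actual_rate = aggregate_rates(records)
agreement = individual_agreement(records)

print(f"aggregate: reported {reported_rate:.0%}, actual {actual_rate:.0%}")
print(f"individual agreement: {agreement:.0%}")
```

The aggregate check alone would pass this survey, since the ten false claims of voting are offset by ten false denials; only the respondent-by-respondent comparison exposes them.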

Validation of individual reports is extremely difficult to carry out, because of the anonymity of most respondents and the difficulty of verifying answers even when respondents are identified. Nevertheless, there have been several attempts to make such checks on a small or large scale. Some have been based on the predictive concept of validity, others on the more limited one of truthfulness. Some of the more significant of these studies are outlined briefly below as illustrations of the difficulties involved and the results that can be achieved.

One of the earliest studies was the well-known experiment of LaPiere,10 who between 1930 and 1932 traveled extensively with a young Chinese couple, and then obtained questionnaires from many of the hotels, auto camps, tourist homes, and eating establishments they had visited; over 90 per cent of the proprietors in each group said they would not accept Chinese as guests, although the party had in fact been served at nearly all of these places. His early findings did much to show that the best test of validity of measured attitudes may be something other than behavior in a hypothetical or real situation.

Commercial Research

In the commercial field, studies of individual validity, as opposed to aggregate validations, have sometimes taken the form of "pantry inventories" to see whether what is actually on the shelves agrees with housewives' reports. A similar type of study was reported in 1938 by Jenkins and Corbin,11 who checked daily sales slips for 70 regular customers of a local grocery store in Ithaca, New York. The check covered 13 frequently purchased articles, and resulted in a range of 62 to 100 per cent of respondents naming as most recent purchase the brand actually shown on the store's sales slip. The authors found that indices of validity did not exhibit uniformity from product to product, and concluded that while reliability of last-purchase questions (as measured through re-interviews) could safely be assumed, the validity of such questions should be determined individually for each product to be studied.

9 Cf. Committee on Marketing Research Techniques, "Design, Size, and Validation of Sample for Market Research," Journal of Marketing, Vol. 10 (1946).

10 LaPiere, Richard T., "Attitudes vs. Actions," Social Forces, Vol. 13 (1934).

11 Jenkins, John G., and Horace H. Corbin, Jr., "Dependability of Psychological Brand Barometers-II: The Problem of Validity," Journal of Applied Psychology, Vol. 22 (1938).

The Magazine Audience Group, which was sponsored originally by Life through the Continuing Study of Magazine Audiences and later expanded into a general advisory body on magazine research for many publishers, was from its beginnings in mid-1938 especially concerned with the problem of validity. In order to eliminate invalid answers from reports of magazine readership, the committee developed a system called "Confusion Control" based on the technique used by Professor Darrell B. Lucas12 in measuring the impact of advertisements. The basic technique involved the use of advance magazines not yet published that respondents could not possibly have seen, in order to find out the amount of false identification. At first the correction applied to readership figures on an aggregate basis only. But beginning with Report No. 4 in 1941, a method was devised to evaluate individual replies according to the number of pages identified. The amount of confusion (false identification, either deliberate or mistaken) found was generally low, well below 10 per cent.

A small study done for the Magazine Audience Group by Crossley Incorporated13 in early 1941 was set up to check on the accuracy of education reports received from respondents on regular surveys. While done on a limited scale in a few small cities only, this experiment is particularly significant in the study of validity in view of the apparently common upward educational bias of even the most carefully designed quota or area samples. Crossley's study checked each respondent's answers on the amount of education received against three different sources: later reports from other members of the family, interviews with neighbors, and actual school records where available. As expected, results showed exaggeration of reports on the part of respondents, although the exaggeration was more evident in reports of graduation from grade, high school, or college, than it was in actual attendance at the different types of schools. On the basis of this study, Crossley concluded that simple questions regarding the number of years the respondent attended school were likely to have low validity and should not be relied upon.

Government Research

The Federal Government has occasionally made various studies which bear on validity. In a brief but revealing article in 1944 Hyman14 cited three surveys done for the Office of War Information which showed distortion of the truth by from 4 to 42 per cent of respondents. From these results Hyman concluded that, at least on questions concerning behavior having a prestige character, poll results should be used with the greatest caution. One of the most significant of his findings was the fact that invalidity may exist in varying amounts in different population groups.

12 Lucas, Darrell Blaine, "Rigid Techniques for Measuring the Impression Values of Specific Magazine Advertisements," Journal of Applied Psychology, Vol. 24 (1940).

13 Results of this study were never published, and the authors are indebted to Archibald M. Crossley for permission to cite them here.

14 Hyman, Herbert, "Do They Tell the Truth?", Public Opinion Quarterly, Vol. 8, No. 4 (1944), p. 557.

Some work related to validity was done by the armed forces during World War II, notably the methodological studies by the Bureau of Naval Personnel and the experiments in prediction made by the Research Branch of the War Department's Information and Education Division. The American Soldier,15 the impressive, recently published report of War Department research, contains a few references to the validity of individual attitudes as established by future behavior. These studies, however, are all concerned with the predictive concept of validity; there seems to have been little concern with the more vital matter of validity as representation of truth.

The most comprehensive government work on validity is that now being set up by the Bureau of the Census to be applied on the 1950 Census of Population. The Bureau has a Response Research Unit whose task it is to find out the kind and amount of error involved in reports obtained by enumerators. Various techniques are being used, including re-interviews and special statistical analyses. When the reports from this source are available, they should provide a wealth of hitherto unknown facts about the nature of the validity of census-type information.

In both governmental and non-governmental research validity problems are almost unbounded. Tests for validity are still limited by the accessibility of check data, but they range widely. In the past year the writers have had occasion to devise measures of validity of surveys on subjects ranging from anti-Semitism and election behavior to reports from hunters and fishermen in California regarding the amount of game they bagged or fish they caught. In the latter case, as in many other types of surveys, the issues involved were both memory and honesty; that is, could anglers and hunters give us reasonably correct answers on the matter, and would they if they could? Researchers can and must use sufficient ingenuity to apply a great variety of validity or quasi-validity checks to every study design of the future.

Medical and Related Research

Medicine is generally considered as belonging to the field of the relatively exact or physical sciences, one with which social scientists have usually had little contact. But a study done in Michigan indicates that the methods of social research may soon be applied more widely in the medical field.16 The objective of the study was the validation of a new method to determine the need for medical attention among farm families. The basic technique used was a list of symptoms which should receive medical attention, information on which was obtained by regular interviewing methods from an informant (usually the housewife) for each member of her family. The information was then validated by means of actual physical examinations of the members of about one-sixth of the families. Complete agreement between the questionnaire reports and the physician's examinations was found in 8 out of 10 cases, and indicated that the determination of the medical needs of a population by asking individuals to list their symptoms was quite feasible.

15 The American Soldier, Princeton University Press, 1949. Vol. 1: Adjustment During Army Life, by Samuel A. Stouffer, Edward A. Suchman, Leland C. DeVinney, Shirley A. Star, Robin M. Williams, Jr.; Vol. 2: Combat and Its Aftermath, by Samuel A. Stouffer, Arthur A. Lumsdaine, Marion Harper Lumsdaine, Robin M. Williams, Jr., M. Brewster Smith, Irving L. Janis, Shirley A. Star, Leonard S. Cottrell, Jr.

16 Hoffer, Charles R., "Medical Needs of the Rural Population in Michigan," Rural Sociology, Vol. 12 (1947).

Kinsey17 has given a great deal of attention to the problem of validity. Perhaps the most comprehensive of his techniques to establish validity is the comparison of reports from 231 pairs of spouses. For most of his items Kinsey found that between 80 and 99 per cent of this group of subjects gave replies that were later verified independently by their marriage partners. In addition to this type of check, the Kinsey investigators obtained a small number of re-takes to test the constancy of memory. They also noted such things as internal consistency of the case histories, reports from the skilled interviewers on falsification and cover-up, constancy of patterns in members of different segments of the population, checks by sexual partners other than spouses, comparisons between interviewers of results for similar groups, hundred per cent samples, and comparisons of reports from older and younger generations. Kinsey found that accuracy varies considerably with different individuals. The validity of individual histories also varies with particular items and for different segments of the population. Incidence data were found to be more accurate than frequency data, and averages of social statistics such as age, education, events concerned with marriage, etc., check closely with averages obtained by direct observations.

In spite of the author's warning that the results presented in the remainder of the book are only fair approximations of fact, the careful reader will be inclined to accept the findings as having been obtained in a most scientific manner and as having a more than satisfactory degree of validity, so far as individual reports are concerned. The one point at which the Kinsey Report is vulnerable to criticism is the one at which many other studies stop: aggregate validation. In the absence of a scientifically selected sample (a requirement which might be quite impossible for such a survey to meet on a full-scale basis), the Kinsey results and background data should be validated against all possible criteria in a regular probability sample of perhaps one or two selected areas. In this way, results which are now reasonably valid for individuals and special groups could be applied to larger, more general segments of the population.18

Political Research

It is only recently that election pollers have begun to recognize the need for validating individual answers. Election results were considered the acid test, and a poll which came close to the aggregate official results of an election had indeed performed a difficult task. Since elections are secret, the problem of how to validate respondent reports is almost insurmountable, the limit usually being a check against precinct records after the election to see whether each respondent voted or not, with no way of telling for whom he voted. Re-interviews with respondents after election, as were made in the 1940 Erie County survey,19 serve somewhat the same purpose, with the added advantage of including the report of the candidate voted for; but since they are still verbal reports from the same subjects and not checks against outside data, they are as much a reliability measure as validity, and may be subject to the same kinds of inaccuracy on voting reports as the original pre-election questions.

17 Kinsey, Alfred C., Wardell B. Pomeroy, and Clyde E. Martin, Sexual Behavior in the Human Male. Philadelphia: W. B. Saunders Company, 1948.

18 Parry, Hugh J., "Some Contributions of the Kinsey Report to Opinion and Attitude Research," unpublished paper presented before the American Association for the Advancement of Science, New York City, December 30, 1949.

In December 1942 the American Institute of Public Opinion20 made a small but significant study in Ewing Township, near Trenton, New Jersey, in which 271 out of the 739 registered voters in the Seventh Precinct were interviewed and asked whether they had voted in the election a month before; their answers were then checked against precinct records. Correct answers were given by 93 per cent of the respondents. Incorrect replies included 5 per cent who said they had voted but actually had not, and 2 per cent who said they had not, but actually did. Similar results indicating high validity in some post-election studies in 1948 are not confirmed by the extensive check made in Denver six months after the election, as will be demonstrated in the following section.
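A check of this kind amounts to classifying each respondent by comparing his answer with the precinct record. The sketch below shows one way such a tabulation might be set up. Only the three percentages (93, 5, and 2) come from the study; the data are scaled to 100 hypothetical respondents, the 60/33 split of correct answers between voters and non-voters is an assumption made for illustration, and the function names are invented.

```python
from collections import Counter


def classify(said_voted, record_shows_vote):
    """Label one respondent's answer against the precinct record."""
    if said_voted == record_shows_vote:
        return "correct"
    return "overreport" if said_voted else "underreport"


def summarize(pairs):
    """Return the percentage of respondents falling in each category."""
    counts = Counter(classify(said, rec) for said, rec in pairs)
    n = len(pairs)
    return {
        label: 100 * counts[label] / n
        for label in ("correct", "overreport", "underreport")
    }


# Assumed illustrative sample of 100 (answer, record) pairs.
sample = (
    [(True, True)] * 60      # correct: voted and said so (assumed split)
    + [(False, False)] * 33  # correct: did not vote and said so (assumed split)
    + [(True, False)] * 5    # said they voted but had not
    + [(False, True)] * 2    # said they had not voted but had
)

summary = summarize(sample)
# Net effect on the aggregate turnout figure, in percentage points.
net_bias = summary["overreport"] - summary["underreport"]

print(summary)
print(f"net overstatement of turnout: {net_bias:.0f} points")
```

Under these assumptions seven respondents in a hundred answered incorrectly, but the two kinds of error partly cancel, so the aggregate turnout figure would be overstated by only three points.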

The 1948 election gave rise to several post-election checks by polling agencies. Among them was the intensive panel study carried out during the campaign period in Elmira, New York, which will yield much information when it is fully analyzed. A preliminary report21 states that the respondents' post-election reports of voting corresponded with official records in 98 per cent of the cases. These respondents, however, as members of a panel, were interviewed several times in the course of the campaign, and, because of their generally cooperative attitude, could be expected to give more truthful answers than respondents on other types of surveys.

A resurvey of 317 respondents was made by the Washington Public Opinion Laboratory in the State of Washington during the first week of December 1948. High agreement with official records was reported in this study: of the 299 respondents who reported having voted on November 2, 287 were actually found to have done so.22 This situation may be rather unusual, however, in that the respondents had been interviewed before and may have been more inclined for this reason to give correct replies.
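The Washington figures can be restated as rates with a few lines of arithmetic. This is only a worked restatement of the three counts given above; the variable names are illustrative, and nothing is assumed about the respondents who did not claim to have voted, since the text reports no check on them.

```python
# Counts as reported for the Washington resurvey.
resurveyed = 317        # respondents re-interviewed
claimed_voting = 299    # respondents who said they voted on November 2
confirmed_voting = 287  # claimed voters found in the official records

unconfirmed = claimed_voting - confirmed_voting
confirmed_pct = 100 * confirmed_voting / claimed_voting
claimed_pct = 100 * claimed_voting / resurveyed

print(f"claimed voters confirmed by records: {confirmed_pct:.1f}%")
print(f"claimed voters not confirmed: {unconfirmed}")
print(f"share of resurveyed respondents claiming a vote: {claimed_pct:.1f}%")
```

Roughly 96 per cent of the claims were confirmed, leaving 12 unconfirmed; about 94 per cent of those resurveyed claimed to have voted.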

Re-interviews were also made in 1948 by the Survey Research Center of the University of Michigan on a national sample.22a No check was made against official records for these respondents, however, since this study, like most others, was intended not as a validity check but to throw light on the problems of voting intention and turnout. In both surveys it is interesting to note that the percentage who reported having voted is higher than in the population at large.

19 Lazarsfeld, Paul F., Bernard Berelson, and Hazel Gaudet, The People's Choice. Second Edition, New York: Columbia University Press, 1948.

20 The authors are indebted to William S. Gillam and the AIPO for permission to present here the results of this hitherto unpublished study.

21 Dinerman, Helen, "1948 Votes in the Making - A Preview," Public Opinion Quarterly, Vol. 12, No. 4 (1948), p. 585.

22 SSRC, op. cit., pp. 368-369.

22a Ibid., pp. 373-379.

Another check was carried out in New Jersey following the 1948 election by Carroll S. Moore, Jr. of the Trenton Times Poll.22b He did not re-interview his respondents, but checked their voting through precinct records. He found that 95 per cent of those who had intended to vote actually did so; but that 12 per cent of respondents who said they were registered and eligible to vote were in fact not registered at all.

From the findings of the various studies reported above, and from others not included because of space considerations, it can be seen that the validity of individual replies can never be taken for granted, even when aggregate validity is very high. On the other hand, as is shown in studies such as the Kinsey Report and the post-election checks of voting behavior, the fact that individual replies have a great deal of validity does not automatically insure that the over-all results will therefore be valid. Before survey results can be relied on, they must be subjected to both kinds of tests: do the aggregate results check against important known data? and if so, are the individual reports sufficiently truthful? Either kind of test without the other may be misleading.

C. DENVER VALIDITY STUDY

Plan of Study

In order to make a systematic attack

On of the previously problems of validity7 a detailed study

was planned and carried out in 1949 at the University of Denver's Opinion Research Center, of which Don Cahalan was Director. It was made possible through generous grants-in-aid from the Rockefeller Foundation, the National Opinion Research Center (through funds allocated from the In- terviewer-~ffectproject sponsored by the Social Science Research Council), and the University of Denver, and was also assisted by a contribution from Elmo Roper. In its inception and planning the study benefited immeasurably from the advice and assistance of a formidable number of social scientists, whose aid is gratefully ackn~wledged .~~

22b The authors are indebted to Mr.Moore for making available the results of this unpublished study.

,,In addition to ORC staff members, erous assistance was given by Herbert Hyman of NORC, whose earlier research inspired much of this study; by Clyde Hart and Paul Sheats- ley of NORC; and by Frederick Stephan of the Commieee on Measurement of Opinion, At-titudes, and Consumer Wants. Others who have made contributions include: Fitzhugh L. Carmichael, Bureau of Business and Social Research, University of Denver; Archibald M. Crossley, Crossley Inc.; Lawrence E. Dameron, Department of ~sychology, University of Den- ver; W. Edwards Deming, Bureau of the Budget; Leland DeVinney, Rockefeller Foun- dation; George Gallup, American Institute of Public Opinion; Donald Glad, Department of psychology, University of Denver; Charles Y. Glock, Bureau of Applied Social Research, Co- lumbia University; Morris Hansen, Bureau of h ~ ~ the Census; Paul Lazarsfeld, Bureau of Ap-plied Social Research, Columbia University; Dean Manheimer, Bureau of Applied Social Research, Columbia University; William Mc-Phee, Research Services, Inc., Denver; Law-rence W. Miller, Department of Psychology, University of Denver: Robert and Ann Neel, Department of Ps~cholog~, University of Den- ver; Elmo Roper; Samuel A. Stouffer, Depart- ment of Social Relations, Harvard University; Coleman Woodbury, Urban Redevelopment


The study was designed to explore three areas: a substantive area of the determinants and concomitants of community satisfaction, and the methodological implications of interviewer effect and of validity. The substantive area will not be covered here, except as it overlaps the methodological areas. This article will limit itself to a report of the design and over-all findings of the validity portion of the survey.

Items of Investigation

The subjects chosen for the check on validity of response were generally of a sort common to survey questionnaires. Wording of the questions was based on forms commonly used by other opinion research organizations. T o some degree, the subjects used for investigation were supplied by the logic of necessity; that is, we had to limit our choices to items which were sig-nificant and which also could be checked against official records. The subjects finally selected for checking were:

( I ) Respondent's registration and voting in the six city-wide Den- ver elections held between 1944 and 1948. Official precinct lists of voters are in the public do-main, so each respondent's re-ported voting history could be checked against them. In the case of the primary election in 1948, we could also check on party affiliation.

(2) Personal contribution during the fall 1948 Community Chest drive.

(3) Possession of a valid Denver Public Library Card in respondent's name.

(4) Possession of a valid Colorado driver's license.

(5) Ownership of an automobile by respondent or spouse, and make and year of car.

(6) Respondent's age. This was checked t h r e e way s-against v o t i n g r eg i s t r a t ion records, against driver's license reports, and finally, for internal consistency, against another question on the ballot.

(7) Ownership or rental of respondent's place of residence.

(8) Telephone in respondent's home.

It can be seen that the items chosen evoke varying amounts of prestige, and varying degrees of potential distortion as caused by social pressure, ease of verification, memory factors, and the like.

Perhaps the items of greatest practical interest and importance are those dealing with elections, since past performance (or the respondent's version of past performance) has often been used, deliberately or unconsciously, in an attempt to predict behavior in the future. Other items used are common ones, either in opinion and attitude research or in the more specialized field of market research. Cross-analyses are frequently made on the basis of responses to these items, and conclusions are drawn from the attitudes or past behavior of these groups; it is therefore important to know to what extent such breakdowns are based on valid information.

Study, Chicago. The authors are also indebted to Hadley Cantril and Elizabeth Deyo of the Office of Public Opinion Research, Princeton University, for making available machine equipment for the final analyses.


The results presented here, of course, will not apply automatically to any survey done on any population, although many of the findings are of importance to research in general. Their significance and application must be studied in the light of the conditions under which they were obtained. For this reason it is necessary to present here a brief but basic outline of the sample used in the Denver survey and the techniques employed to obtain the information.

The Sample

Most fortunately, while the study was still in the planning stage, a new edition of the City Directory of Denver residents was issued. A series of informal checks on the Directory information indicated that it was sufficiently accurate and up-to-date to take the place of a costly enumeration on our part and that it could be used as the universe for this study. While there may have been some small distortions in representativeness in the Directory, they would not materially affect our results, since our purpose was to obtain a random list of individuals for the validity and interviewer effect tests rather than to make any numerical estimates.

Using a probability method of systematic selection, 1,349 names were taken from the Directory (discarding, of course, such unusable listings as business places, out-of-town addresses, and duplications of names). These 1,349 names were distributed to the 45 interviewers in assignments of 30 (one assignment was 29 names). Interviewers were allowed to make no substitutions, and were required to make at least four calls to reach their respondents.

A total of 920 usable interviews was finally obtained. By using the Directory as a sampling universe, we have available, as a by-product, considerable data on the characteristics of respondents who were not reached; analysis of these data will be made available later.
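The selection and assignment arithmetic described above can be illustrated with a minimal sketch. The article does not report the size of the Directory, the sampling interval, or the exact selection rule, so the stand-in listing, the fractional-interval method, and the function names below are assumptions made only for illustration.

```python
import random

def systematic_sample(listing, n, rng):
    """Take n entries at a fixed interval after a random start.

    Assumed procedure: the article says only that a 'probability
    method of systematic selection' was used.
    """
    k = len(listing) / n              # sampling interval (may be fractional)
    start = rng.random() * k          # random start inside the first interval
    last = len(listing) - 1
    # min() guards against floating-point rounding at the end of the list
    return [listing[min(int(start + i * k), last)] for i in range(n)]

def split_assignments(names, n_interviewers):
    """Split names into consecutive assignments of nearly equal size."""
    base, extra = divmod(len(names), n_interviewers)
    assignments, pos = [], 0
    for i in range(n_interviewers):
        size = base + (1 if i < extra else 0)
        assignments.append(names[pos:pos + size])
        pos += size
    return assignments

# Hypothetical directory of 100,000 usable listings.
directory = list(range(100_000))
names = systematic_sample(directory, 1349, random.Random(0))
assignments = split_assignments(names, 45)
sizes = [len(a) for a in assignments]   # 44 assignments of 30, one of 29
completion_rate = 920 / 1349            # usable interviews / names drawn
```

With 1,349 names and 45 interviewers the split reproduces the article's figures: 44 assignments of 30 names and one of 29. The 920 usable interviews correspond to a completion rate of about 68 per cent of the names drawn.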

Interviewer Selection and Training

The field work was begun on April 19, 1949, and continued through May. The 45 interviewers used came from two groups: experienced professional interviewers on the staffs of national and local research organizations, and graduate and undergraduate students in opinion research and social science at the University of Denver. Each interviewer was given intensive personal training in two or more special sessions, and was assigned to a special supervisor for the duration of the field work. The result was that the interviewing staff, when it went into the field, was presumably somewhat above average in its ability and training. This point is stressed only to show that little of the invalidity of response found could have been caused by an unduly amateur or inefficient staff of interviewers. Further evidence of the quality of the field work was given by a post-interviewing check by the office staff, both for respondents interviewed (certain items were checked against Directory information and by telephone) and for those not reached (using Post Office records and other methods). Eight ballots were discarded as invalid, chiefly because of mistaken identity, a constant problem with name-and-address samples. Thus, it can be assumed on this survey, in contrast to others where such rigid control and checking of field work are not feasible, that a minimum of the invalidity uncovered is due to dishonesty or incompetence on the part of interviewers.

Another aspect of the sample design assured that differential validity among various groups could not be due to certain interviewers' interviewing more of certain types of people. The city was divided into five sectors, as equivalent as possible with respect to several factors, and within each sector respondents were stratified by sex and geographical location and assigned at random to the nine interviewers. Thus, careful control was exercised to see that each interviewer's assignment was as nearly like every other assignment as possible. Furthermore, interviewers were allocated to the various sectors of the city so as to equalize as far as possible the effects of such factors as interviewer's sex, experience, education, age, and social introversion-extraversion. The importance of this technique will be brought out in an article in preparation dealing with the relationship of the interviewer to the validity of survey results.
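One way to picture the within-sector assignment is the sketch below. The article does not describe the mechanics, so the shuffle-then-deal procedure, the tuple layout, and the names used here are hypothetical; the sketch only shows how stratifying before random assignment keeps the nine assignments in a sector alike in composition.

```python
import random
from collections import defaultdict

def assign_within_sector(respondents, interviewers, rng):
    """Randomly assign one sector's respondents to its interviewers.

    respondents: list of (name, sex, area) tuples (hypothetical layout).
    Each (sex, area) stratum is shuffled and then dealt out in turn,
    so every interviewer receives a similar mix of strata.
    """
    strata = defaultdict(list)
    for person in respondents:
        strata[(person[1], person[2])].append(person)

    assignment = {name: [] for name in interviewers}
    turn = 0  # carried across strata so assignment sizes stay balanced
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        for person in members:
            target = interviewers[turn % len(interviewers)]
            assignment[target].append(person)
            turn += 1
    return assignment

# Hypothetical sector: 270 names, two sexes, three areas, nine interviewers.
sector = [(f"R{i}", "M" if i % 2 == 0 else "F", i % 3) for i in range(270)]
staff = [f"I{j}" for j in range(9)]
result = assign_within_sector(sector, staff, random.Random(1))
```

In this toy sector every interviewer ends up with 30 names, half of them men, which is the kind of equivalence between assignments that the design aimed for.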

The interviewers, it should be added, were given no indication of the real purpose of the study nor were they told that there would be a check on the respondents (although, to improve efficiency, they were, as usual, told that there might be checks on their own work). As far as they were concerned, it was a normal survey covering community satisfaction. Later checks indicated that none of the interviewers became aware of the justifiable trick being played on them.

Checking Validity Information

To ascertain the validity of information obtained by the interviewers, a long and tedious, name-by-name, response-by-response check was carried out. Each respondent's answers to the questions cited earlier were compared with records of the City and County of Denver, the Denver Community Chest, the Denver Public Library, and the Mountain States Telephone Company. All checking, except on Community Chest records, was done by Center personnel. In this investigation the Center received the whole-hearted and efficient cooperation of all agencies concerned.

It is not necessary to go into the mechanical details of the validity check. Such factors as marriage and consequent change of name by female respondents between 1943 and the present, changes of address during the period, and the like all contributed to our difficulties. The official records of the City and County of Denver appear to have been in a state of much higher order and accuracy than many researchers have found in other areas, but even so occasional difficulties crept in. While it was possible to solve the great majority of problems by rechecking, digging, leg-work, and phone-work, it must be realized that some error in the base criteria was inevitable and is reflected in the measures of validity obtained.

The Results

The level of invalidity on the various items or combinations of items checked ran from nearly zero up to almost half of the responses received. As the following tables show, invalidity often follows social pressures. More respondents exaggerated their participation in elections than under-reported it. The same tendency is evident in the reports of possession of library cards and driver's licenses. Only the over-all totals are presented here; later articles will explore the variations in validity by respondent characteristics, conditions of the interview, and other factors.

Elections. Since the largest amount of data concerns reports of voting in the various Denver elections, we shall discuss them first. Results are shown in Table 1.

24 Question 14: "Here are some questions about registration and voting in Denver. Have you been registered to vote in Denver at any time since 1943?"

Question 14A (IF "NO" OR "DON'T KNOW"): "Have you voted in any election in Denver since 1943, either in person or by mailing an absentee ballot back to Denver?"

Question 15 (UNLESS "NO" TO 14 OR 14A): "We know a lot of people aren't able to vote in every election. Do you remember for certain whether or not you voted in any of these elections: First . . ." (ELECTIONS READ OFF, ONE AT A TIME).

TABLE 1

VALIDITY OF REGISTRATION AND VOTING REPORTS
100% = 920 Cases

A. Whether registered or voted in Denver since 1943:
    Correct reports:
        Not registered since 1943
        Voted or registered since 1943
    Exaggerated registration or voting
    Under-reported registration or voting
    Confused (Don't remember, No answer)

B. Voting reports on combination of six elections:
    Correct in all statements
    Exaggerated (voted in fewer than reported)
    Under-reported (voted in more than reported)
    Confused (voted in same number but different elections, or Don't remember or No answer to one or more elections)

C. Voting reports on six specific elections:
    (1) November 1948 Presidential election:
        Correct reports:
            Did not vote
            Voted
        Exaggerated (said voted, but did not)
        Under-reported (said did not vote, but did)
        Confused (Don't remember, No answer)
    (2) September 1948 primary election:
        Correct reports:
            Did not vote
            Voted in Republican Primary
            Voted in Democratic Primary
        Exaggerated (said voted, but did not)
        Under-reported (said did not vote, but did)
        Confused (Don't remember, No answer, wrong answer on party)
    (3) November 1947 city charter election:
        Correct reports:
            Did not vote
            Voted
    (4) May 1947 Mayoralty election:
        Correct reports:
            Did not vote
            Voted
    (5) November 1946 Congressional election:
        Correct reports:
            Did not vote
            Voted
    (6) November 1944 Presidential election:
        Correct reports:
            Did not vote
            Voted

*Less than 0.5 per cent.


The cumulative amount of invalidity for the six elections is somewhat startling. While four-fifths of the respondents gave valid answers as to their registration during the period, only a third gave entirely correct answers to questions regarding all six elections. And "correct" in this check is only in terms of whether or not the respondent actually voted; if a check could be made on the truthfulness of the reports given on candidates voted for, even a larger number of errors might be uncovered.
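The categories used in Table 1 can be expressed as a small classification routine, which also makes clear why the combined figure is so much lower than any single-election figure: a respondent must be right on every one of the six statements. The sketch below is an illustration only; the encoding of answers (True, False, or None for "Don't remember" or no answer) and the function names are assumptions, not the Center's actual coding scheme.

```python
def classify_one(reported, actual):
    """Classify a single election report against the official record.

    reported: True (said voted), False (said did not vote),
              or None (Don't remember / No answer).
    actual:   True if the precinct list shows a vote, else False.
    """
    if reported is None:
        return "confused"
    if reported == actual:
        return "correct"
    return "exaggerated" if reported else "under-reported"

def classify_combined(reported, actual):
    """Classify the reports on all six elections taken together."""
    if any(r is None for r in reported):
        return "confused"
    if list(reported) == list(actual):
        return "correct"
    n_reported, n_actual = sum(reported), sum(actual)
    if n_reported > n_actual:
        return "exaggerated"       # voted in fewer elections than reported
    if n_reported < n_actual:
        return "under-reported"    # voted in more elections than reported
    return "confused"              # same number, but different elections

# Hypothetical respondent: claims all six, records show five.
claims = [True, True, True, True, True, True]
record = [True, True, False, True, True, True]
label = classify_combined(claims, record)
```

As a purely hypothetical yardstick, if each of six answers were independently correct with probability 0.8, all six would be correct only about 26 per cent of the time (0.8 to the sixth power). Independence is an assumption introduced here, not a finding of the study.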

On the questions regarding specific elections the amount of invalidity varied from a seventh to a fourth of all responses. Clearly, on the basis of these results, any of these questions would have little value as a means for checking the representativeness of a sample, for drawing assumptions on the basis of voting groups, and particularly for using this reported past voting behavior as a means of indicating future voting behavior.25

The 1948 Presidential election was both nearest in time and highest in importance to respondents in general. Thus the level of invalidity here was somewhat lower than on the other elections. However, to some extent, the lower level of invalidity is artifactual: where invalidity is basically in the direction of exaggeration and where a higher proportion vote than in most elections, there is simply a smaller group of persons who are likely to give incorrect responses.
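The arithmetic behind this point can be made explicit with a short sketch. The turnout figures and the exaggeration rate below are hypothetical and are not taken from the Denver tables; the simplifying assumption is that only actual non-voters can exaggerate.

```python
def exaggeration_share(turnout, exaggeration_rate):
    """Share of ALL respondents who falsely claim to have voted.

    turnout:           proportion of respondents who actually voted.
    exaggeration_rate: proportion of non-voters who claim a vote
                       (hypothetical; held constant across elections).
    """
    return (1.0 - turnout) * exaggeration_rate

# Same tendency to exaggerate, different turnout (illustrative values).
high_turnout = exaggeration_share(0.60, 0.40)   # 40% non-voters
low_turnout = exaggeration_share(0.30, 0.40)    # 70% non-voters
```

With an identical tendency to exaggerate, the high-turnout election shows 16 per cent invalid reports and the low-turnout election 28 per cent, simply because fewer respondents are in a position to overstate.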

Community Chest Contribution. Table 2 shows that the query concerning personal contributions to the 1948 Community Chest drive provided a large relative amount of invalidity.26

It can be seen that about a third of the respondents said that they did not contribute to the Chest; in these cases no further check was made, on the pragmatic but probably reliable assumption that few if any respondents would deny contributions they had made.

25 "In five of the six elections, using the respondent's unverified statement to classify him as a 'voter' or 'non-voter' would result in misclassifying from 22 to 30 per cent of the respondents." Don Cahalan, "Validity of Behavior Reports in Opinion Surveys," unpublished paper read before the American Statistical Association, New York City, December 30, 1949. Nevertheless, in terms of aggregate validation, the uncorrected sample results showed that Truman had received 53 per cent of the major party vote in Denver in 1948; he actually received 54 per cent! Evidently, even on the most recent election, a set of canceling biases was in operation.

26 Question 25: "Did you yourself happen to contribute or pledge any money to the Community Chest during its campaign last fall?"

PUBLIC OPINION QUARTERLY, SPRING 1950

TABLE 2

VALIDITY OF REPORTS ON COMMUNITY CHEST CONTRIBUTIONS
100% = 920 Cases

Reported not giving (statements assumed to be correct, but not checked against records)
Reported giving, and did give
Reported giving, and might have given through uncheckable source
Reported giving, but did not give
Don't remember, No answer

About a fourth correctly said they had given, either at work or at home. Slightly over a third said they had given but were not listed as donors in the Community Chest files. About a tenth of the responses could not be classified as valid or invalid, though the presumption is toward invalidity, since the Chest records were in very good shape and, except for certain collective donations, included a very complete list of donors.

Thus it is evident that about four out of every ten responses here were invalid. Undoubtedly social pressures and a belief that the responses would not be checked were the major factors behind the high level of invalidity. It should be noted, however, that the question created considerable ambiguity. Despite the stress on "you yourself," some respondents tended to answer in terms of pledges by other members of the family. Whatever the reason for invalidity, it can safely be said that this sort of question, whether it concerns the Community Chest or some other charitable organization, is not very helpful for survey use. Moreover, the issue tested here was only the fact of giving; if it had been necessary to find out the amounts of contributions made, even more invalidity could have been expected.

Library Card. As Table 3 shows, there was a slight tendency for respondents to claim possession of a currently valid library card, when no card was actually on file.27 About a tenth of the responses were invalid in this respect, and a negligible proportion were invalid in the direction of under-statement, probably infrequent users unaware that their cards remain in force for three years.

TABLE 3

100% = 920 Cases

Correct reports:
    Do not have card
    Have card
Exaggerated (reporting having card, none on file)
Under-reported (reported no card, one on file)
Don't remember, No answer

TABLE 4

VALIDITY OF REPORTS ON DRIVER'S LICENSE AND AUTOMOBILE OWNERSHIP
100% = 920 Cases

A. Possession of Driver's License:
    Correct reports:
        Do not have license                                   44%
        Have license                                          44
    Exaggerated (reported license, but none on file)          10
    Under-reported (reported no license, one on file)          2
    Don't know, No answer (most had licenses on file)          *

B. Possession of Automobile, Year and Make:
    Reported no car owned (statements assumed to be correct, but not checked against records)
    Correct on ownership, make, and year
    Correct on ownership and make, incorrect on year
    Correct on ownership and year, incorrect on make
    Correct on ownership, incorrect on make and year
    Incorrect on ownership
    No answer (more than half of these actually had cars registered)

*Less than 0.5 per cent.

Driver's License and Car Ownership. Again, as can be seen in Table 4, about a tenth of the respondents claimed possession of a driver's license when actually they did not have one. Less invalidity was found in items concerning ownership, by respondent or spouse, of an automobile, and the make and year of such car.28

While the number of correct answers on such questions is gratifying in comparison to the answers on other types of questions, an error of as little as 3 per cent in the proportion of families owning cars might be quite serious for some purposes, such as estimating the tire needs of the country, since it is over and above any error that might be expected from sampling. If it were proved that the entire 3 per cent were actually incorrect respondent reports, and not omissions in the official files or various other types of error, a survey with direct need for valid figures on car ownership would have to examine this problem carefully. In the Denver study, a tendency was also noticed to report the car owned as newer than it actually was, a fact which might also require attention in a specialized survey.

27 Question 21: "Do you have a library card for the Denver public library in your own name?"

28 Question 22: "Do you have a Colorado driver's license that is still good?"

Question 23: "Do you happen to own an automobile at the present time? (IF "YES") Is it registered in your name alone, or in your (wife's) (husband's) name?"

Question 23A (IF "YES" TO 23): "Does the car have Colorado plates or plates from some other state?"

Question 23B (IF "YES" TO 23): "What year and make of car is it?"

Respondent's Age. Results on the various age checks showed a generally satisfactory level of validity, as indicated in Table 5.30

The correlation with age as reported on the traditional age question was highest for the information on year of birth obtained from the other end of the ballot. This result was to be expected, since the check was essentially more one of reliability than validity. The device was suggested by the Bureau of Applied Social Research of Columbia University, and would be more useful on successive panel studies than on a single questionnaire as here.

TABLE 5

A. Consistency check by year of birth: 100% = 886 Cases
    Age and year of birth consistent within a year
    Reported age more than year younger than age by year of birth
    Reported age more than year older than age by year of birth

B. Check by driver's license records: 100% = 411 Cases
    Reported age within one year of age on license record
    Reported age more than year younger than age on record
    Reported age more than year older than age on record

C. Check by election registration records (Men only):29 100% = 297 Cases
    Reported age within one year of age on registration record
    Reported age more than year younger than age on record
    Reported age more than year older than age on record

29 In former years women in Colorado were not required to give their exact ages when registering to vote, only to swear that they were over 21. Consequently the registration check on women's ages is omitted here because the information is not sufficiently precise.

30 Question 10: "May I ask your age?"

Question 35: "In what year were you born?"

TABLE 6

VALIDITY OF REPORTS ON HOME OWNERSHIP AND TELEPHONE

A. Home Ownership: 100% = 919 Cases
    Correct reports:                                                                     96%
        Home owned                                                                       53%
        Home rented                                                                      43
    Probably exaggerated ownership (place owned by someone of a different name)           3
    Probably under-reported ownership (place owned by someone of same family name)        1

B. Telephone: 100% = 918 Cases
    Correct reports:
        Telephone
        No telephone
    Exaggerated (reported telephone, but none in family name at that address)
    Under-reported (reported no telephone, but one in family name at that address)

For those respondents who did not have drivers' licenses or were not registered to vote in Denver, it was, of course, not possible to check the age information against such records. The fact that those who were so checked, however, appeared more accurate when compared with drivers' license records than with registration records may mean one of several things: that the registration records are less accurate than the license records, that some respondents are motivated to give less valid reports to registration officers, or that the people for whom the various checks were possible differ in their tendencies to give invalid answers to the official reporters and to interviewers. These differences emphasize the point that it is not enough merely to know that invalidity exists and the extent of it; information is also needed on its sources and on means to distinguish which of several answers is the valid one and which the invalid.
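The age comparison against each record source can be sketched as follows, using the three categories that appear in Table 5. The sample pairs are invented for illustration and the function names are assumptions; nothing here reproduces the Denver percentages.

```python
from collections import Counter

def classify_age(reported_age, record_age):
    """Compare the age given to the interviewer with the age on record."""
    diff = reported_age - record_age
    if abs(diff) <= 1:
        return "within one year"
    if diff < 0:
        return "more than a year younger"
    return "more than a year older"

def tabulate(pairs):
    """Percentage distribution of categories for (reported, record) pairs."""
    counts = Counter(classify_age(rep, rec) for rep, rec in pairs)
    total = len(pairs)
    return {label: round(100 * n / total) for label, n in counts.items()}

# Invented (reported age, age on record) pairs for one record source.
license_pairs = [(30, 30), (41, 40), (35, 38), (50, 47)]
license_table = tabulate(license_pairs)
```

Running the same tabulation separately for license records and for registration records gives the two distributions being compared in the paragraph above. A difference between them does not by itself show which source is the less accurate one.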

Home Ownership and Telephone.31 These factors, which are commonly used for breakdown and checking purposes in many types of surveys, were found to have a high degree of validity when checked against city property records and telephone company listings. Results of this check are given in Table 6.

31 Question 30: "Do you or your family rent, or own, the place where you live?"

Question 33: "Is there a telephone in your home in your family's name?"

CONCLUSIONS

The Denver study disclosed amounts of invalidity ranging from a twentieth to nearly a half of the responses received on various types of factual questions. While other situations or areas may show more or less validity depending on circumstances, the survey results demonstrate clearly the wide range of invalidity to be found in the answers to a number of factual items of types often used in survey research. They further underline the need for caution in accepting so-called "factual information" at face value; even census-type data must be considered suspect. Because of the special controls exercised in the design of the survey and the careful training and supervision of interviewers, it is believed that the invalidity found here represents close to a minimum, and that national surveys which cannot be so rigidly controlled should expect to encounter even more on many types of items. Except on certain more or less innocuous items, the range of invalidity is sufficient to cause worry, and indicates a great need for further research on the truthfulness of respondents' statements of fact.

Nevertheless, the reader should not infer from these findings that research in the social sciences is relatively hopeless. He need not feel that truth is unascertainable by pragmatic methods of experimental science, and that he had better turn to Yoga or Neo-Thomism. For invalidity, in the final analysis, is not inevitable. It has causes which can be found in the questionnaire, in the respondent, in the interviewer, and above all in the interpretation of data. It varies by subject and among sub-groups. Yet it can be measured and analyzed. Once this is done, it is subject to certain pragmatic checks and controls. Succeeding articles in this series will attempt to examine these phases of the problem, and will suggest certain practical remedies.
