
Social Experimentation and Social Experimentation


Social Experimentation, edited by Jerry A. Hausman and David A. Wise. Reviewed by Robert H. Haveman. The Journal of Human Resources, Vol. 21, No. 4 (Autumn 1986), pp. 586-605. Published by the University of Wisconsin Press for the Board of Regents of the University of Wisconsin System. Stable URL: http://www.jstor.org/stable/145768


Social experimentation was a major social research innovation, introduced with the War on Poverty and pursued vigorously from 1965 through 1980; Social Experimentation is a review and evaluation of that research development. In this essay, I will discuss each of these enterprises, both of which have serious shortcomings.

I. Social Experimentation

The War on Poverty and Great Society initiative of the mid-1960s was a major break with previous policymaking. A wide variety of policy interventions into social affairs were undertaken to change social positions, the nature of social interactions, and the behavior and performance of low-income people. Because U.S. policymakers had little experience in such matters and substantial uncertainty about how to proceed, many of the interventions were called "demonstration projects." Unlike earlier social policy efforts, they represented a major, nonmarginal, activist attempt to alter individual behavior and economic status.

Into this environment social experimentation was born. While the U.S. Office of Economic Opportunity (OEO) was initiating numerous demonstration projects through its Community Action Program, it was also considering a major income support proposal: a negative income tax (NIT) to replace much of the nation's welfare system with a uniform, family-size-conditioned income guarantee that would gradually diminish as earnings rose above zero. The NIT would provide income support to families headed by men as well as women, reduce large differences among the states in benefit levels, and minimize work disincentives, because dollars earned would not result in equivalent benefits lost. Relative to the existing income support system, the NIT was also a major break with past policy; its adoption would have been better characterized as an "overhaul" than a reform.
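The arithmetic of such a schedule is worth a few lines. Below is a minimal sketch; the $4,000 guarantee and 50 percent benefit-reduction rate are purely illustrative values, not the parameters of any plan actually tested:

```python
def nit_payment(earnings, guarantee, tax_rate):
    """NIT benefit: the income guarantee, reduced by tax_rate cents
    for every dollar earned, and floored at zero."""
    return max(0.0, guarantee - tax_rate * earnings)

# Illustrative plan: $4,000 guarantee, 50 percent benefit-reduction rate.
for earnings in (0, 2_000, 4_000, 8_000):
    print(earnings, nit_payment(earnings, guarantee=4_000, tax_rate=0.50))
# The payment falls from $4,000 at zero earnings to $0 at $8,000 of
# earnings (the break-even point); each dollar earned costs the family
# only 50 cents in lost benefits, which is the claimed work incentive.
```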

In discussions of the NIT within OEO, the work-incentive issue was considered crucial in determining its political feasibility (Lampman 1976). Many saw the notion of a guaranteed income as inviting large reductions in work effort, primarily by male adults who were the heads of families. To those involved in the debates, it was clear that the labor-supply effects of an NIT could not be adequately evaluated from existing studies based on available cross-sectional data, or from a "field test." It required an evaluation of the work-effort behavior of those covered by an NIT relative to those who were not covered; in short, a controlled experiment.

While such experiments had long been practiced in the biological sciences and psychology, this research technique had been little used by economists and other social scientists to evaluate policy interventions, for a variety of reasons: ethics, cost, and complexity. Yet it had been discussed at length by some economists (Orcutt and Orcutt 1968, Rivlin 1971), and it appeared ideally suited for reliably estimating the labor-supply effects of an NIT. Relative to past social science research techniques, a social experiment was thus also a major break from past approaches to gaining policy-relevant knowledge.

The idea behind a social experiment as applied to an NIT was simple: design a basic NIT and establish a set of benefit-reduction rates and income guarantees that are judged to be in the relevant range; choose a sample of households for whom a negative income tax is a viable policy option; randomly assign these households to an NIT group (the experimental group), the remainder being a control group; administer the NIT plan to households in the experimental group for some period of time; measure the work-effort patterns of those in the experimental group relative to the patterns of those in the control group; adjust for any other factors not taken into account in the experimental design; and attribute the remaining difference in labor supply to the NIT.
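In estimation terms, the recipe amounts to a treatment-control contrast in mean labor supply, possibly adjusted by regression for pre-experimental covariates. A minimal sketch of that contrast on simulated data (all numbers and variable names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
treated = rng.integers(0, 2, n)               # random assignment to the NIT plan
pre_hours = rng.normal(2_000, 300, n)         # pre-experimental annual hours
# Hypothetical response: the plan reduces annual hours by 100 on average.
hours = pre_hours - 100 * treated + rng.normal(0, 200, n)

# The simple experimental contrast: difference in mean hours.
raw_effect = hours[treated == 1].mean() - hours[treated == 0].mean()

# Regression adjustment for pre-experimental hours tightens the estimate
# without changing what is being estimated, since assignment is random.
X = np.column_stack([np.ones(n), treated, pre_hours])
beta, *_ = np.linalg.lstsq(X, hours, rcond=None)
print(raw_effect, beta[1])                    # both estimates are near -100
```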

Lurking behind this simple idea are a wide variety of technical, empirical, and implementation issues, of which the following are the most important:

* What variables should be emphasized in the experimental design? How many combinations of income guarantees and tax rates should be tested?

* What experimental units should be included? Individuals? Couples? Intact families? Single-parent families? In what environment should the experiment be placed in order to secure the desired level of relevance and generality of results? Urban or rural areas? Large towns or small? Suburbs or inner cities?

* How large a sample is required in order to achieve acceptable levels of statistical reliability for each plan tested? How should those in the experimental group be allocated across the plans tested so as to maximize the statistically reliable information obtained, given the cost? (A back-of-the-envelope version of this calculation is sketched after this list.)

* What behavioral variables should be measured for units in both the experimental and control groups? Should only labor-supply responses (hours worked, weeks worked, earnings) be measured, or should a variety of other variables of interest (e.g., consumption patterns, family structure) also be measured?

* For how long should the experimental treatment be administered in order to secure a reliable measure of long-run response? One year? Three? Five?

* How is the experiment to be "fielded"? How are observation units to be chosen and enrolled, and how is the treatment to be administered? How can its administration be structured so as to minimize Hawthorne effects (positive effects resulting not from the treatment itself but from the stimulation of participating in the experiment)?

* What statistical techniques are to be employed in analyzing the results of the experiment? Simple control-experimental comparisons? Complex models to reflect design characteristics of the experiment, or to adjust for the self-selection of units of observation into the experiment or their attrition over the course of the experiment?

* To what extent can the observed responses of a sample of randomly selected and isolated individuals be taken as evidence of the impact of a national program?
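On the sample-size question raised above, the standard two-sample power calculation conveys the orders of magnitude involved. A minimal sketch with purely illustrative numbers (nothing here reflects the actual design parameters of any of the experiments):

```python
from math import ceil

def n_per_group(effect, sd, z_alpha=1.96, z_power=0.84):
    """Observations per group needed for a two-sample comparison of means
    to detect `effect` against noise `sd` at roughly 5% size, 80% power."""
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)

# Detecting a 100-hour (roughly 5 percent) cut in annual hours worked,
# assuming a cross-family standard deviation of 500 hours:
print(n_per_group(effect=100, sd=500))   # 392 families per group
```

Each guarantee/tax-rate combination tested requires its own experimental cell of roughly this size, which is why the number of plans tested drives sample size and cost so directly.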

Although the difficulties of these issues were, at some level, recognized by the economists and other social scientists who became involved in the OEO-sponsored NIT experiments of the late 1960s, the pressures of the moment propelled an experiment into the field before many of them could be adequately thought through and resolved. In 1968, the three-year New Jersey Income Maintenance Experiment enrolled 1,375 intact poor and near-poor families, each headed by a man of working age, 725 of which were randomly assigned to one of eight negative income tax plans.

The results of the experiment indicated that the response of the husbands of the families was small and not statistically significant. Wives, however, showed statistically significant and relatively large reductions in labor supply due to the experiment. Family labor supply reductions fell into the 5 to 10 percent range. In contrast with the other groups, blacks showed very little response to the NIT incentives.1

As the first controlled social experiment, the New Jersey project was subjected to detailed scrutiny from the research community. Within six months of the presentation of the final report on the experiment, the Brookings Institution sponsored a conference of nearly 50 social scientists and policy analysts to review the results of the experiment (Pechman and Timpane 1975). Five papers were commissioned from researchers not associated with the experiment to review and evaluate the design, the analyses of effects, and the policy implications of the experiment. In addition, an in-depth review of the history and conduct of the experiment and an evaluation of its results was sponsored by the Russell Sage Foundation (Rossi and Lyall 1976).

1. The most comprehensive description of the experiment and evaluation of its labor supply results is Watts and Rees (1977).

These reviews produced a wide-ranging critique, and documented concerns with a number of aspects of the experiment and its design which had been voiced since the inception of the experiment by a variety of observers. To some, the focus of the experiment on the labor-supply issue was too restrictive, making the project an empirical test of microeconomic theory rather than an effort designed to yield policy-relevant information. The choice of the experimental units was criticized as being too restrictive; e.g., little could be said about the work response of female family heads to an NIT or about the national costs of such a program. Further, the treatments chosen were sufficiently similar as to preclude evaluation of the relative effects of tax rates and guarantees, and the choice of sites was so restrictive as to eliminate the possibility of generalizing the results to larger population groups. The basic analytical criticisms were the following:

1. Limiting the eligible population in the experiment to families with incomes under 150 percent of the poverty line resulted in a truncated sample, with important implications for achieving unbiased estimates of response (the sketch following this list illustrates the mechanism).

2. The experiment was plagued with problems of attrition, in part caused by a decision by the state of New Jersey to institute a welfare plan for male-headed families after the start of the experiment. This also created substantial difficulty in achieving unbiased estimates of response.

3. The pre-experimental values of important variables related to labor supply (e.g., wage rates, hours worked, and earnings of all household members) were measured in less than perfect fashion in the survey instruments.

4. The duration of the experiment, three years, may be too short to enable reliable estimates of the effects of a permanent program.
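The truncation problem in criticism 1 is easily demonstrated by simulation: when the sample is selected on low values of a variable that embeds the response itself, the estimated relationship is distorted. A minimal sketch, with a stylized linear model and invented numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
wage = rng.normal(5.0, 1.0, n)                    # hypothetical hourly wage
hours = 1_000 + 150 * wage + rng.normal(0, 300, n)

def slope(x, y):
    """OLS slope of y on x."""
    return np.polyfit(x, y, 1)[0]

kept = hours < 1_900     # eligibility limited to low outcomes, standing in
                         # for the 150-percent-of-poverty income cutoff
print(round(slope(wage, hours)))              # about 150 in the full sample
print(round(slope(wage[kept], hours[kept])))  # attenuated: at high wages the
                                              # rule keeps only low error draws
```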

The New Jersey project represented the general acceptance of large-scale social experimentation as a valid research tool for estimating important behavioral responses to proposed policy interventions. The fact that it was successfully designed and fielded, and that it yielded estimated responses that passed the scrutiny of the research community, led to a wide variety of additional publicly funded social experiments.


Table 1 provides a listing of the most significant of these experiments, and some basic facts regarding each. The table proceeds chronologically through each category, from the income maintenance experiments (1-4) to the labor market (8-9) and electricity pricing (10) experiments.2

Two developments in the evolution of the experiments described in the table are noteworthy. First, the early experiments were primarily in the income maintenance or welfare area. They were followed by experiments designed to test major alterations in the provision of education, health, and housing subsidies to low-income families. The latest experiments, beginning in the mid-1970s, emphasized the work effort and productivity effects of public employment and training interventions or the extent of electricity usage response under various utility pricing arrangements. The second development concerns the complexity of the experiments through time. While the early ones involved relatively simple treatments with relatively straightforward hypotheses to be tested, the later labor market experiments involved more complex treatments, often with several interventions designed to be mutually supporting (e.g., income support plus counseling plus training). Because these treatments were not independently assigned, the findings of these more complex experiments are difficult to interpret.

Stated in constant (1983) dollars, the ten social experiments listed in the table involved a total cost of about $1.1 billion, of which about $450 million was allocable to research and administrative costs. Relative to the total volume of annual poverty research and poverty-related research expenditures, which were in 1980 (current dollars) about $75 million and $300 million, respectively (Haveman 1986),3 this support for social experimentation research is large indeed. A brief examination of the most significant of these projects is in order.

The Seattle-Denver income maintenance experiment was the largest and most comprehensive of the NIT experiments. Approximately 4,800 families were enrolled, and the families assigned to experimental NIT plans were potentially eligible for payments for a period of three, five, or, for a few families, 20 years. The experiment had two main goals, reflected in its rather elaborate design. The first was to determine the effect of alternative NIT plans on work effort, the same objective as that of the New Jersey experiment. The work-effort findings from the experiment (OISP 1983, Robins and West 1983) showed that the tested NIT plans caused substantially larger reductions in labor-market activity than those estimated in the New Jersey experiment, particularly for persons enrolled in the longer duration (five-year) plans. Prime-aged men reduced their annual hours of work by 9 or 10 percent in response to the tested plans; their spouses reduced annual hours by 17 to 20 percent; and women heading single-parent families reduced annual hours by more than 20 percent, and perhaps by as much as 30 percent. According to simulations based on these results, replacing current cash welfare and food stamp programs with an NIT with a guarantee of three-fourths of the poverty line and a 50 percent tax rate would cost the government $1.79 in transfer outlays to raise the net income of poor two-parent families by $1.00. In other words, 44 percent of the net program costs of the NIT would be "consumed" by breadwinners in the form of leisure (Aaron and Todd 1979).4

2. Greenberg and Robins (1985), the main source for Table 1, identify, in addition to those indicated in the table, 29 social experiments in the post-1965 period with total costs of about $150 million.
3. The $75 million and $300 million estimates exclude expenditures on social experimentation. These are classified as demonstrations.
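The 44 percent figure is just the arithmetic of the simulated cost ratio; as a check,

$$\frac{\$1.79 - \$1.00}{\$1.79} \approx 0.44,$$

that is, of each dollar of net program cost, about 44 cents produces no gain in the net income of recipient families and corresponds instead to reduced work effort.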

The second objective of the experiment was to test the effectiveness of issuing education and training vouchers to low-income breadwinners. It found that they used much of the subsidy to pay for schooling they would have obtained in the absence of the program; moreover, there was little payoff to the incremental investment.

The National Supported Work Demonstration and the Employment Opportunity Pilot Project (EOPP) were both designed to test alternative interventions assisting hard-to-employ workers in finding jobs. The Supported Work experiment (MDRC 1980; Hollister, Kemper, and Maynard 1984) provided jobs for individuals with severe employment problems (four groups: long-term AFDC recipients, recently released convicts, former addicts, and school dropouts with delinquency records). It gave them one year of work experience under conditions of gradually increasing demands, close supervision, and in association with a crew of peers. EOPP's purpose was to estimate the employment effects of a guaranteed jobs program similar to that proposed by President Carter, as well as of new approaches to job finding among the hard-core unemployed (Mathematica Policy Research 1984). It was terminated prior to the completion of its original design, and no benefit-cost appraisal was made. Neither experiment was able to claim substantial success in generating long-duration increases in employment or earnings; both were most successful with a group with high welfare incidence: single women with no recent work experience.

In terms of size and cost, the Housing Allowance Demand (Friedman and Weinberg 1982, Bradbury and Downs 1981) and Health Insurance (Newhouse et al. 1982) experiments were among the largest. Both examined families' responses to differentially subsidized prices for rental housing and health care services. They tested the efficiency of improving the economic status of low-income families by permitting them to choose between services provided at subsidized prices and outright public provision of the same services. The instrument was rent subsidization in the housing case, and coinsurance in the health insurance experiment. Both experiments were complex in their design, and attended mainly to the nature of demand-side responses to alternative subsidization arrangements. The housing allowance experiment had difficulties in obtaining reliable responses because only a small number of eligible households accepted the subsidies. While the health experiment also experienced design problems, it did reveal that families economize on their demand for health services when confronted with the need to pay for some fraction of the cost of the services provided.

4. The net program cost of the NIT is the amount by which NIT transfers exceed those now paid under the cash welfare and food stamp programs.

Table 1
The Major Social Experiments
(all costs in 1983 dollars)

1. New Jersey Income Maintenance Experiment. Field work: 1968-72; final report: 1974. Treatment: NIT. Response measured: head-spouse-family labor supply. Participants: 1,216 families. Research and administrative costs: $15.4 million; total costs: $22.2 million.

2. Rural Negative Income Tax Experiment. Field work: 1970-72; final report: 1976. Treatment: NIT. Response measured: head-spouse-family labor supply. Participants: 809 families. Research and administrative costs: $9.5 million; total costs: $15.6 million.

3. Gary Income Maintenance Experiment. Field work: 1971-74; final report: not available. Treatment: NIT plus day care. Response measured: head-spouse-family labor supply, use of day care. Participants: 1,780 families. Research and administrative costs: $33.0 million; total costs: $45.2 million.

4. Seattle-Denver Income Maintenance Experiment. Field work: 1971-78; final report: 1983. Treatment: NIT plus employment counseling and educational vouchers. Response measured: head-spouse-family labor supply, marital stability, earnings capacity. Participants: 4,784 families. Research and administrative costs: $97.8 million; total costs: $132.7 million.

5. Education Performance Contracting. Field work: 1970-71; final report: 1972. Treatment: cash payment as incentive for academic performance. Response measured: educational improvement of junior high students. Participants: 19,399 students [note 1]. Research and administrative costs: n.a.; total costs: $14.3 million.

6. National Health Insurance. Field work: 1974-81; final report: n.a. Treatment: different fee-for-service health insurance plans with alternative coinsurance rates and deductibles, plus a prepaid group health insurance plan. Response measured: demand for health care and change in health status. Participants: 2,823 families. Research and administrative costs: $94.4 million; total costs: $115.0 million.

7. Housing Allowances [note 2]. Field work: 1973-77; final report: 1980. Treatment: cash housing allowance. Response measured: use of allowance; effect on quality, supply, and costs of housing; administrative feasibility. Participants: see note 3. Research and administrative costs: $160.4 million; total costs: $352.2 million.

8. National Supported Work Project. Field work: 1975-79; final report: 1980. Treatment: temporary Public Service Employment (PSE), work discipline, and group support. Response measured: administrative costs, effects on antisocial behavior and earnings. Participants: 6,606 individuals. Research costs: $16.9 million [note 4]; total costs: $126.8 million.

9. Employment Opportunity Pilot Project (EOPP) [note 5]. Field work: 1979-81; final report: 1982. Treatment: PSE preceded by required job search and training (if necessary). Response measured: effect on welfare caseload, private sector labor market, use of program, administrative costs, and feasibility. Participants: open to all heads of households on public assistance. Research and administrative costs: n.a.; total costs: $246.0 million.

10. Electricity Time-of-Use Pricing (15 experiments). Field work: 1975 onward; final reports: 1979 onward. Treatment: time-of-use electricity price schedules. Response measured: alterations in electricity use levels and patterns of residential consumers. Participants: n.a. Research and administrative costs: n.a.; total costs: approximately $50.0 million.

Source: Greenberg and Robins (1985); author's data.
1. The number of students remaining in the program the entire school year. The initial sample comprised 24,000 students split evenly between control and experimental groups.
2. There were three different housing allowance experiments during the 1970s. They have been grouped together here for brevity.
3. One of the housing allowance experiments was open enrollment while the other two involved random selection.
4. Research only.
5. The EOPP was ended long before completion.

Finally, there are the 15 time-of-use residential electricity pricing experiments which have been completed or are ongoing (Aigner 1985). The purpose of these experiments is to test the elasticity of demand of residential electricity users with respect to price variation by time of day or season. These experiments established that usage at peak periods is responsive to prices, and that usage reductions at the peak are not fully offset by increases in off-peak usage. They support the conclusion that total electricity usage and capacity requirements can be reduced by time-specific pricing arrangements.
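The estimand here is a price elasticity by rating period. A toy version of the calculation, with invented numbers (the actual studies estimate full demand systems rather than a two-point arc elasticity):

```python
import numpy as np

# Hypothetical mean peak-period usage (kWh) under two experimental peak prices.
price = np.array([0.05, 0.10])     # $/kWh, control vs. time-of-use tariff
usage = np.array([400.0, 340.0])   # mean peak-period kWh per household

# Arc (log-difference) elasticity of peak usage with respect to peak price.
elasticity = np.diff(np.log(usage))[0] / np.diff(np.log(price))[0]
print(round(elasticity, 2))        # -0.23: peak usage falls as peak price rises
```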

II. Social Experimentation

It was to assess this body of experimental research and evaluation that the National Bureau of Economic Research convened a conference, "Social Experimentation," in 1981. Numerous questions were addressed by the conference, and by the 1985 conference volume, edited by Jerry Hausman and David Wise. The primary ones are:

1. For the income maintenance, housing, health, and electricity pricing experiments, what are the primary objectives, design characteristics, estimation methods, and findings? What are the central weaknesses of the designs, and the potential biases affecting the estimates?

2. What is the value of the findings of the experiments in comparison to what could have been learned from existing data, and relative to the cost? What is the social willingness-to-pay for improved elasticity estimates of particular economic relationships?

3. Is there agreement among analysts regarding the optimal design of experiments, the circumstances under which they are the desired evaluation strategy, and the statistical procedures appropriate for accounting for the special issues to which social experiments give rise (sample selection, treatment assignment, and attrition)?


4. Are there net benefits attributable to the experiments, over and above their contributions to knowledge regarding behavioral responses to policy interventions?

5. Have the findings from the experiments had an influence on public policy decisions? Should policymakers be influenced by their findings?

Question 1 is the concern of the first four chapters of the volume: electricity pricing, Dennis Aigner; housing, Harvey Rosen; income maintenance, Frank Stafford; health, Jeffrey Harris. Each chapter has two discussants. In these three presentations on each experiment, questions 2 and 3 are also addressed, although often not explicitly.

Question 3 is the subject of the chapter by Hausman and Wise, the most analytical chapter in the volume. Their position is that experimental designs should be as simple as possible, testing a single behavioral response, with a large random sample of the population randomly assigned to treatment and control groups. Discussants John Conlisk and Daniel McFadden agreed with the objective of this strategy (avoidance of the difficult estimation problems posed by the endogenous stratification present in most of the past experiments), but they questioned its desirability and feasibility in real-world, budget-constrained circumstances.

Question 5 is the subject of papers by Ernst Stromsdorfer and David Mundel, and each of these papers also has two discussants. An additional paper, by Frederick Mosteller and Milton Weinstein, draws from experience with medical experimentation to flesh out a framework for determining when experimentation is the optimal research strategy, and raises the important question of when and under what circumstances experimental findings are likely to influence policy choices.

Overall, the conference participants' answers to these questions make for discouraging reading. At the risk of oversimplifying, I will characterize the primary answers to the five questions in the following paragraphs.

Answering the first question requires a wealth of factual information and interpretive comment. The four experiment-specific papers provide a solid source of such material for each of the experiments discussed. Aigner's paper on the electricity pricing experiments emphasizes the differential welfare effect of time-of-use pricing on various population subgroups, and the implications of these differences for participation in a time-of-use pricing program. It is the only paper in the volume that discusses this important welfare economics and program implementation point. The Rosen paper on housing experimentation and the paper's discussants emphasize the critical problem of a short-duration experiment for analyzing the consumption of the services of a durable commodity such as housing, noting the important implications of the assumption that consumers are observed to be in equilibrium positions both prior to and after the experimental treatment is administered. The duration problem may well explain why the estimated income elasticity from the experiment is at the very low end of the range of elasticities estimated from prior studies. Stafford, in dealing with the income-maintenance experiments, makes a special contribution in emphasizing that standard theory based on continuous supplies and linear budget constraints may misguide the designers of experimental studies when actual work behavior is intermittent and responds to nonconvex budgets. The Harris paper on the health experiment seems more intent on suggesting an alternative research strategy, macro-experimentation (more on this later), than on describing and appraising what was actually done. While the other experiment-specific studies compared the experimental findings to existing estimates, the counterfactual for Harris' appraisal is some ideal, but not-yet-observed, experiment.

It is with respect to the evaluation of the design and findings of these experiments that one struggles against a sense of déjà vu. The book's litany of problems and their impacts on the reliability of the empirical findings reads like a recital of the analytical criticisms of the New Jersey experiment that have existed informally since the inception of that experiment in 1968, and that were documented in the formal critiques of its design and findings. This litany includes:

1. Insufficient sample sizes, insufficient variation in the treatment variables, and inadequate definition of the treatment variables (Aigner, p. 11; Rosen, pp. 60 ff., p. 69; Stafford, pp. 102 ff.; Harris, p. 150).

2. Selectivity problems involving restrictions imposed on the participating population, or self-selection possibilities for participation (Aigner, p. 19; Rosen, p. 65; Stafford, pp. 97 ff.; Harris, pp. 148 ff.).

3. Sensitivity of results to the specification of the estimating model, with no basis for testing the accuracy of the assumed statistical properties (Aigner, p. 33; Rosen, pp. 59 ff., p. 63, p. 68; Stafford, pp. 98 ff.; Harris, p. 152).

4. Short duration of the experiments, precluding the estimation of long-run impacts (Aigner, pp. 18 ff.; Rosen, pp. 64 ff.; Stafford, pp. 101 ff.).

5. Self-selected attrition of participants from the experiment (Aigner, pp. 12 ff.; Rosen, p. 65; Stafford, pp. 97 ff.; Harris, pp. 150 ff.).

6. Estimation of behavior which responds to nonconvex budget constraints and which is intermittent, with simple linear constraint models assuming continuous demands or supplies (Aigner, p. 19; Rosen, p. 63; Stafford, pp. 98 ff.; Harris, pp. 153 ff.).

7. The difficulty of estimating market effects from the individual response estimates available from micro-experiments (Aigner, p. 31; Rosen, pp. 71 ff.; Stafford, p. 103; Harris, pp. 153 ff.).

8. The presence of Hawthorne-type effects (Rosen, p. 66; Harris, pp. 151 ff.).

9. Inadequate data on important "shift" variables and inadequate pre-experimental data on the primary response variables (Rosen, p. 66; Harris, pp. 151 ff.).

There are several explanations for this persistent catalog of problems with social experimentation, none of which are particularly encouraging. For some of these problems, the theoretical and empirical requirements for avoiding them lay beyond the financial and intellectual resources available to those who designed and fielded the experiments. These problems include nonrandom attrition from or selection into the sample, estimation of behavior given complex and ultimately unknown budget constraints, and inadequate data on the backgrounds of respondents or their previous behavior. Most of these problems, it should be noted, affect both social experimental research and standard micro-data analysis, and are typically more serious in the latter than in the former. While modeling and data collection advances have been made with respect to all of these, financial resources are scarce and the pool of technical expertise on these issues is spread thin. Progress has been painfully slow.

A second explanation is that avoiding the problems, or reducing them to more tolerable levels, is very costly; budget constraints imposed on each experiment resulted in the same set of problems recurring, at varying intensities. The problems in this category concern deviations from the pure random-assignment experimental model, and include the restrictions imposed on participating populations, inadequate sample sizes, insufficient variation in treatment variables, ad hoc specification of response models, and the short-run nature of the experiments. Presumably, these problems will exist at some level so long as constraints on social experimental research budgets exist.5

5. This is not to suggest that all of these problems have plagued each of the experiments with equal intensity, or that experiments with less constrained budgets were not able to reduce the problems confronted by earlier projects, or by those with more binding budget constraints. Robert Moffitt has emphasized, for example, that the SIME-DIME experiment had a much larger sample size than the New Jersey experiment, as well as cells with multiple durations, higher income truncation points, a more diverse population, and a wider range of program parameters.

Finally, the problems may have persisted because the later designers of experiments learned little from the experience of the earlier experimenters.6 This explanation appears to have some basis, especially across the substantive areas over which experimentation was done. For example, in his discussion of the reasons for the poor designs of the electricity pricing experiments, the first of which started in 1975, Paul Joskow states:

When the earliest experiments were structured, those involved had simply not thought very deeply about what the data generated might be used for. [They] were motivated more by narrow adversarial and litigation concerns than by an interest in sound economic analysis.... There was no inherent reason for these early experiments to have been so poorly designed (pp. 43-44).

While all of these problems cited by the authors have, to some extent, detracted from the reliability of the experimental findings, they vary widely both in their seriousness and in the extent to which they can be avoided. While some of them can be minimized by larger experimental budgets or more technical expertise (e.g., sample sizes, estimates of responses to complex budget constraints), others are not easily correctable (e.g., problems of attrition from or self-selection into the experiment).

The second question has to do with the value of the findings of the experiments relative to existing estimates, and relative to the costs of obtaining them. By and large, the authors of the four experiment-specific papers and their discussants did not find the experimental results to be notably more reliable than existing estimates based on cross-sectional data. For example, in assessing the findings of the housing allowance experiments, Rosen concludes that "if the goal was to obtain new and improved estimates of the behavioral responses to housing allowances, ... the money would have been better spent on augmenting conventional data sources" (p. 72). In discussing the findings of the health insurance experiment, Harris concludes that "economists and other social scientists have spent disproportionately too much effort on the design and interpretation of microexperiments" (p. 145). And, in the case of the electricity pricing experiments, Aigner concludes that "It is difficult to summarize the empirical results ... since the elasticity estimates frequently conflict with each other ... no consistent overall pattern emerges" (p. 20). His discussants state this conclusion even more strongly. Finally, for the income maintenance experiments, Stafford is the most optimistic, concluding that "at a minimum, the experiments have reduced the variance of labor supply parameters, even if they have not shifted the means very much" (p. 121). Such a claim, it should be noted, is not great praise when the ten conventional male labor-supply analyses reviewed by Stafford have a range of uncompensated wage elasticities of +.11 to -.55, a range of compensated wage elasticities of +.86 to -.04, and a range of income elasticities of -.06 to -.51. It would be surprising if estimates from any new study added to this set failed to reduce the variance.7

6. To some extent, of course, this was unavoidable. Some of the earlier experiments (for example, SIME-DIME, housing allowances, and health insurance) were being designed prior to the formal evaluations of the design and findings of the New Jersey project. However, the basic concerns with the design of the New Jersey project were clearly debated and discussed well before the final report was published. While this timing point is surely relevant, I cannot agree with those who argue that, in fact, there has been only one phase of experimentation and that the proper question is: Given what we have learned from this first phase, would the benefits outweigh the costs of a new experiment that we would design today? I am indebted to Robert Moffitt for raising this point.
7. Indeed, Stafford's conclusion regarding the "rather clear consensus" on labor supply parameters seems overdrawn, as has been discussed in detail by Killingsworth (1983), and noted by Sherwin Rosen in his discussion comments.
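The formal basis for that remark is worth one line: if $k$ independent estimates $\hat\beta_1,\dots,\hat\beta_k$ with variances $V_1,\dots,V_k$ are pooled by inverse-variance weighting, the pooled variance satisfies

$$\operatorname{Var}\!\left(\hat\beta_{\text{pooled}}\right) = \left(\sum_{i=1}^{k} V_i^{-1}\right)^{-1},$$

which falls whenever any estimate with finite variance is added, regardless of where its point value lies.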

These generally pessimistic conclusions, it should be noted, rest largely upon the numerous design and analysis weaknesses noted above, rather than on any comprehensive comparison of the magnitudes, the strengths, and the weaknesses of both experimental and nonexperimental estimates. When the experimental findings are set against potential estimates from some ideally designed and unflawed experiment, such an assessment is probably appropriate. However, when set against the reliability of existing estimates from nonexperimental research, this conclusion seems exaggerated. Most of the same problems affecting the experiments also plague nonexperimental empirical studies of behavioral responses, and appear in even more virulent form in such research. Moreover, nonexperimental analyses are burdened by additional problems that are at least partially avoided by the experiments. The absence of pre-observation measures of the variables of interest and the weakness of the variables available for control come immediately to mind. My own judgment is that the parameter estimates from the social experiments are, in general, more reliable than those available from nonexperimental studies. They set the standard in the labor supply, housing demand, medical demand, and electricity demand areas; they are the best game in town. Any new estimates are, and should be, judged by comparison to them. This assessment, of course, does not say that the contribution to knowledge provided by the experiments is worth the cost.

The third question concerns the lessons from the experiments for the design of future research on behavioral responses to policy incentives. In the face of the serious reservations of the conference participants regarding the findings of the experiments, it would be reassuring if there were agreement on the characteristics of the optimal research design for response estimation. With such agreement, one could anticipate that a second generation of experiments would yield more reliable results.

Jerry Hausman and David Wise present the most concrete case for an experimental research design that would minimize many of the problems that have plagued past experiments. They advocate a simple micro-experiment with few treatments, large sample sizes, and full randomization, so as to avoid the need for complicated structural models based on strong and nontestable specification assumptions. While agreeing with the objective of avoiding endogenous stratification, John Conlisk finds their advice "incomplete," and "sees no reason to suppose that a good design will be the sort of simple design Hausman and Wise have in mind. Nor [does he] see a useful way to substitute simple rules of thumb ... for a full-blown, optimal design analysis specific to the context at hand" (p. 212). While Conlisk's criticism is telling, neither he nor the authors discuss the substantial additional research costs that the proposal implies, and the trade-off between the benefits attributable to these costs and the value of other expenditures on knowledge acquisition. Given the enormous costs of the experiments, together with the design compromises which budget constraints have imposed on them, the failure to address this issue explicitly is unfortunate. Moreover, even in the pure experimental model proposed by Hausman and Wise, unavoidable problems of selection into and attrition from the experiment will exist. And while they advocate a variety of sophisticated statistical techniques to correct for these problems, there is little assurance that estimates derived from different techniques will be similar, and no guidance as to which model is to be preferred if they are not.
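Endogenous stratification of the kind Hausman and Wise want to avoid is easy to exhibit in a few lines: when assignment probabilities depend on a variable (such as income) that also drives the response, the raw treatment-control contrast no longer recovers the effect without a model of the assignment rule. A minimal simulated sketch (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
income = rng.normal(10_000, 3_000, n)

def outcome(treat):
    """Annual hours: rise with income, plus a true -100 hour treatment effect."""
    return 1_800 + 0.02 * income - 100 * treat + rng.normal(0, 150, n)

# (a) Full randomization: the simple contrast recovers the -100 effect.
t = rng.integers(0, 2, n)
y = outcome(t)
print(y[t == 1].mean() - y[t == 0].mean())

# (b) Endogenous stratification: lower-income families assigned more often.
p = np.clip(1 - income / 20_000, 0.05, 0.95)
s = rng.random(n) < p
y2 = outcome(s)
print(y2[s].mean() - y2[~s].mean())   # biased: treated units are poorer, so
                                      # part of the income effect is wrongly
                                      # attributed to the treatment
```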

Harris finds the shortfalls in the health insurance experiment to be overwhelming and offers a second design proposal: the substitution of macro-experimentation (the assignment of treatments to randomly selected groups, communities, or markets, with others, also randomly selected, serving as controls) for micro-experimentation. Larry Orr, his discussant, finds this enthusiasm for macro-experiments to be "seriously overdrawn," and argues that a variety of difficulties with such an approach (cost, political feasibility and hence self-selection, control and administration, and the lack of a true control group) are sorely and unrealistically understated. One senses from reading the volume that most conference participants agreed with this critique.

The fourth question concerns the possibility of other benefits from social experimentation, given the disappointing conclusion that the response estimates (their primary purpose, output, and benefit) are not markedly more reliable than those from other studies. Indeed, conference participants attribute a variety of other "benefits" to social experimentation. Virtually all of these are side (or secondary) effects of experimentation, often unanticipated, and include:


1. The initiation and fostering of more thoughtful and comprehensive discussions of the policy measures under consideration (p. 48);

2. the development of economic concepts and statistical techniques which will be useful in subsequent scientific work (pp. 98 ff., 121 ff., 136);

3. increased knowledge regarding the efficient administration and monitoring of programs (pp. 56, 78);8

4. estimates of (or at least insights into) take-up rates for new or altered policies and, hence, improved cost estimates (pp. 87, 92);

5. generation of valuable longitudinal micro-data sets enabling subsequent analyses of economic behavior (pp. 72, 78, 90, 121);

6. use of the experimentation mantle to present results regarding economic behavior that are also available from other studies in a more persuasive, convincing way (pp. 93, 251 ff.); and

7. contributions to scientific progress by pointing up behaviors inconsistent with standard theory, hence stimulating the development of new or extended theories (pp. 121 ff., 136, 138).

While many of these benefits may have accrued from the experiments, in this volume anecdotal evidence serves as the primary basis for assessing their value. Discussion of them consists of little more than a set of catalog entries. The analogy to the claims of advocates of increased space and military research is obvious, and the volume does nothing to document the gains claimed.

Moreover, while such side benefits of social experimentation can be noted, the associated costs of experimentation must also be weighed. While one is left with the impression that these secondary effects net out to be positive, that conclusion is not obvious. The substantial research resources and brainpower devoted to experimentation would have been employed in some other enterprise in the absence of this heady new social research endeavor. The secondary or side-effect benefits that would have accrued from these activities would likely have been as significant as those attributed to the experiments. These forgone side benefits are secondary costs allocable to experimentation; they fit into many of the same categories as the side benefits credited to experimentation and are equally unmeasured and unseen. The lesson from the early benefit-cost literature could be profitably applied in this case as well: unless there are overriding considerations of asymmetric information, deviations from full employment, or other identifiable market failures, assume that the secondary benefits and costs of a public investment, anticipated or imagined, are a wash!

8. The monthly reporting procedure now incorporated into welfare administration legislation came directly from experience in the Rural Negative Income Tax Experiment, and has demonstrated efficiency gains in program administration and financial control.

The final question concerns the impact of the experiments on public policy decisions. In the volume, this impact is regarded as a mixed bag. And, as with the discussion of the secondary benefits of social experimentation, the evaluation provided is anecdotal, impressionistic, and contradictory. Ernst Stromsdorfer points both to the positive effects of the income maintenance experiments on welfare policy and to noneffects elsewhere; Henry Aaron emphasizes (twice) that the "serendipitous findings of the income-maintenance and housing-allowance experiments will more than repay the U.S. Treasury the cost of the experiments in short order" (p. 276), and notes that "social experiments ... have been a force for slowing the adoption of new policies"9 (p. 276); Joskow points out that policymakers "do not understand what deadweight losses are, would not care much about them if they did, and, as a result, more rarified calculations are unlikely to have any policy impacts" (p. 45).

Laurence Lynn reacted strongly to this discussion of the policy impacts of experimentation: "I find this subject ... boring. ... [T]he number of interesting things one can say is limited. ... [T]he interesting things have already been said quite well by others. ... The authors reach opposite conclusions ..." (pp. 277-78). As with the previous point, the discussion in this area seems to ignore the need for a comprehensive framework in which all effects are considered. No meaningful conclusion is possible by tracing the policy impacts of a single activity (e.g., social experimentation) in the absence of a similar tracing of the policy impacts of the forgone activities, both research and nonresearch, which would have occurred but never did. In the absence of any evidence about these offsetting effects, I find no reason to consider them anything but equal to and opposite in sign from the anecdotal impacts mentioned: in sum, again a wash. One could have expected more from this National Bureau volume.

In summary, the volume's primary value lies in the unified collection of a set of assessments of a wide range of social experiments by a well-known and highly competent group of economists and analysts. While these assessments raise few issues that have not already been exposed in more in-depth reviews (see, for example, the Brookings Institution conference volumes on social experimentation), the perspectives brought by new critics are both stimulating and provocative. The overall impression, that the social payoff would likely have been greater had the resources used for the experiments been reallocated among alternative social research and data collection efforts, is not a particularly satisfying one, however. A second contribution of the volume lies in its prescriptions for experimental designs which can avoid, or at least mitigate, some of the weaknesses of past experiments and other research on behavioral responses. While provocative, these prescriptions are also contentious.

9. This same point has also been emphasized in Burtless and Haveman (1985) and Greenberg and Robins (1985).

A message which the volume conveys with persistence, if not with clarity, is that if we were seriously contemplating a new round of social experimental research today, we would do things differently and probably better than we did them before. We would be more discerning regarding the particular issues on which social experimentation would be justified, more sensitive to the interaction between the nature and design of the experiment (e.g., micro- versus macro-experimentation) and the policy question of interest, more careful in designing the experiment to avoid problems of selectivity and attrition bias and inadequate specification of treatments and response, and more thorough in administering and monitoring the implementation of the experiment. In terms of the design of micro-experiments, we would opt for simpler, less complicated treatments, and invest more in collecting pre-experimental data on the relevant aspects of behavior. While being aware of the consequences of truncating the population eligible for participation, we would not necessarily proceed to full randomization in sample selection and treatment assignment as proposed by Hausman and Wise.

The book is seriously uneven, however. For my tastes, the unsubstantiated "schmoozing" about the unintended benefits of experimentation and their impacts on policymakers which is scattered throughout the volume detracts from the substantive analyses in the book. The same holds for the 20-page discussion of essentially extraneous issues (viz., statistical decision theory, nonstandard time diary studies, and his own agenda for future labor economics research) tagged on to the end of the Stafford paper, the reprinting, in essence, of the Stromsdorfer paper on the impact of social research on policy from a 1979 volume, and the inclusion of the Mosteller-Weinstein paper on evaluating how, when, where, and why to perform health care (as opposed to social) experiments. Interestingly, several of the more interesting and insightful passages in the volume are found in the discussants' comments, especially those of Joskow, John Quigley, Gregory Ingram, Orr, Aaron, and Conlisk. Finally, one might wonder why the proceedings of a 1981 conference on this important issue were not available to a general audience until 1985.

Robert H. Haveman
University of Wisconsin-Madison


References

Aaron, H., and J. Todd. 1979. "The Use of Income Maintenance Experiment Findings in Public Policy, 1977-78." In Industrial Relations Research Association Proceedings, 46-56.

Aigner, D. 1985. "The Residential Electricity Time-of-Use Pricing Experiments: What Have We Learned?" In Social Experimentation, ed. J. Hausman and D. Wise. Chicago: University of Chicago Press, for the National Bureau of Economic Research.

Bradbury, K. A., and A. Downs, eds. 1981. Do Housing Allowances Work? Washington, D.C.: The Brookings Institution.

Burtless, G., and R. Haveman. 1985. "Policy Lessons from Three Labor Market Experiments." In Employment and Training R&D: Lessons Learned and Future Directions, ed. R. T. Robins. Kalamazoo: The Upjohn Institute.

Friedman, J., and D. Weinberg. 1982. The Economics of Housing Vouchers. New York: Academic Press.

Greenberg, D., and P. Robins. 1985. "The Changing Role of Social Experiments in Policy Analysis." In Evaluation Studies Review Annual, vol. 10, ed. L. Aiken and B. Kehrer. Beverly Hills: Sage.

Hausman, J., and D. Wise, eds. 1985. Social Experimentation. Chicago: University of Chicago Press, for the National Bureau of Economic Research.

Haveman, R. 1986. "The War on Poverty and Social Science Research, 1965-1980." Research Policy (forthcoming).

Hollister, R. G., Jr., P. Kemper, and R. A. Maynard. 1984. The National Supported Work Demonstration. Madison, Wis.: University of Wisconsin Press.

Killingsworth, M. 1983. Labor Supply. Cambridge: Cambridge University Press.

Lampman, R. 1976. "The Decision to Undertake the New Jersey Experiment." In The New Jersey Income Maintenance Experiment, by D. Kershaw and J. Fair, vol. 1, Operations, Surveys, and Administration. New York: Academic Press.

[MDRC] Manpower Demonstration Research Corporation. 1980. Summary and Findings of the National Supported Work Demonstration. Cambridge, Mass.: Ballinger.

Mathematica Policy Research. 1984. Final Report: Employment Opportunity Pilot Project: Analysis of Program Impacts. Princeton, N.J.: MPR.

Newhouse, J., W. Manning, C. Morris, et al. 1982. Some Interim Results from a Controlled Trial of Cost Sharing in Health Insurance. Rand Report R-2847-HHS. Santa Monica, Calif.: Rand Corporation.

[OISP] Office of Income Security Policy, U.S. Department of Health and Human Services. 1983. Overview of the Seattle-Denver Income Maintenance Experiment Final Report. Washington, D.C.: GPO.

Orcutt, G., and A. Orcutt. 1968. "Incentive and Disincentive Experimentation for Income Maintenance Policy Purposes." American Economic Review 58:754-72.

Pechman, J., and M. Timpane. 1975. Work Incentives and Income Guarantees: The New Jersey Negative Income Tax Experiment. Washington, D.C.: The Brookings Institution.

Rivlin, A. 1971. Systematic Thinking for Social Action. Washington, D.C.: The Brookings Institution.

Robins, P. K., and R. W. West. 1983. "Labor Supply Response." In Final Report of the Seattle-Denver Income Maintenance Experiment, vol. 1, Design and Results. Stanford, Calif.: SRI International.

Rossi, P. H., and K. C. Lyall. 1976. Reforming Public Welfare: A Critique of the Negative Income Tax Experiment. New York: Russell Sage Foundation.

Watts, H., and A. Rees, eds. 1977. The New Jersey Income-Maintenance Experiment, vol. 2, Labor Supply Responses. New York: Academic Press.