
Testing Data for Consistency with Revealed Preference

The Review of Economics and Statistics, Vol. 77, No. 4 (Nov. 1995), pp. 701-710. Published by The MIT Press.

TESTING DATA FOR CONSISTENCY WITH REVEALED PREFERENCE

John Gross*

Abstract—We develop a method of hypothesis testing for revealed preference tests. This method is particularly well suited to tests of commonality of tastes or tests of preference stability. We devise a metric based on an estimate of the fraction of expenditure wasted in maximizing utility if all consumers in the sample share the same utility function. This metric increases in size with increases in the proportion of observations having different tastes and with increasing differences in tastes. We propose bootstrap methods for estimating the distribution of the test statistic.

I. Introduction

REVEALED preference techniques are uniquely appropriate for testing basic hypotheses about consumption behavior when procedures free from functional form restrictions are required. Such methods are particularly useful in testing hypotheses of taste stability over time or commonality of preferences for a single-period cross section. They are also well suited to tests of consistency with utility maximization.

Several empirical studies applying revealed preference methods have appeared in the recent literature. Landsburg (1981) examines changes in taste over time. Chalfant and Alston (1988) demonstrate that what appear to be changes in tastes for red meat are explainable as changes in relative prices. Gross (1995) shows that demographic traits do not typically perform well in selecting consumers with similar tastes for public educational expenditure. Dowrick and Quiggin (1994) show that welfare rankings for countries based on average international prices may be somewhat flawed. Famulari (1995) finds that households with similar demographic traits make commodity choices that are essentially consistent with revealed preference. Famulari (1994) finds the same households are not consistent when labor supply decisions are explicitly investigated. Banker and Maindiratta (1988) and Chavas and Cox (1990) apply similar methods to investigations of firm behavior.

While successful, revealed preference methods to date have suffered from inadequate statistical procedures and goodness-of-fit metrics. In this paper we develop a goodness-of-fit measure based on the fraction of wasted expenditure associated with each observation if all consumers are maximizing the same utility function. This metric may be seen as a nonparametric extension of a parametric goodness-of-fit measure proposed by Varian (1990) and is particularly useful in illuminating differences in tastes.

Bootstrap procedures for empirically deriving the distribution of the test statistic are proposed. This, in turn, enables one to carry out tests of hypotheses in what is essentially a classical manner, and illuminates those instances in which the data are consistent with revealed preference, or nearly so, but when the power of the test is low.1

In what follows, section II provides a very brief review of revealed preference theory and a discussion of problems that arise in empirical applications. Section III reviews prior goodness-of-fit metrics and power tests. Section IV details the construction of the test statistic. Section V is a discussion of bootstrap methods for deriving the distribution of the test statistic. Section VI provides a brief empirical example and section VII is the conclusion.

II. Theory and Practical Considerations

A. Theory

The fundamental theorem of revealed preference is Afriat's theorem. Of particular interest is the equivalence of the following two conditions:

(1) The data satisfy the axioms of revealed preference.

Received for publication March 9, 1993. Revision accepted for publication July 28, 1994.

* Marquette University. I benefited greatly in the early part of this work from discussions with Ted Bergstrom and Hal Varian, to whom I offer thanks and apologies for any attributional oversights. Two anonymous referees pointed out plentiful shortcomings in an earlier version, for which I'm very grateful. Thanks also to Deborah Antkoviak, Bill Holahan, and Vicki Knoblauch for

1 Tests for consistency with homotheticity and separability are not addressed here, nor are nonparametric tests of production behavior, though similar methods can be applied to these problems.


(2) There exists a nonsatiated, continuous, concave, monotonic utility function that rationalizes the data.

Proof of this theorem can be found in Afriat (1967, 1973), Diewert (1973), Diewert and Parkan (1985), and Varian (1982). Condition (1) states that the data must satisfy the axioms of revealed preference. Rose (1958) shows that in the two good case we need only test data for consistency with WARP. Where there are three or more consumption goods the data must be tested for consistency with SARP if demand functions are restricted to single values, or GARP if we wish to allow for multivalued demands.

The equivalence of (1) and (2) above suggests that, as a simple test, we take the existence of a common utility function as the maintained hypothesis and reject if any violations of revealed preference are observed. While this is appealing in both its simplicity and theoretical elegance, it is of little practical value, as a test of exact consistency is far too stringent to be useful.

B. Practical Considerations

Generally speaking, inconsistent data can arise for three reasons.2 First, even when the observations are generated by consumers with similar tastes, measurement errors or small maximization errors on the part of consumers may lead to some violations. Second, even a few observations from consumers with very different tastes can generate a large number of observed violations. Third, the data may have arisen due to genuine inconsistency as a result of serious underlying differences in tastes. In the first two cases, an ideal test statistic whose value increases with increasing inconsistency will yield small values, while in the third it will yield significantly larger values.

Consistent, or almost consistent, data can arise for two reasons. One, the data are truly consistent or nearly so. Two, when income variation is large or relative price variation is small, budgets may intersect infrequently. If so, violations of revealed preference can occur only rarely regardless of the underlying behavior, thus the data will always appear to be consistent or nearly so. In this latter case, revealed preference methods have little power and our hypothesis testing procedure should alert us to this problem.

III. Prior Measures

A. Goodness-of-Fit Measures

Several authors have proposed other methods for assessing the seriousness of violations of revealed preference. In what follows we briefly review these prior approaches.

1. Afriat's Index: Afriat (1973) proposed a simple index of the seriousness of a single violation of revealed preference. Suppose two observations i and j are involved in a violation. Then $p^i x^i \geq p^i x^j$ and $p^j x^j \geq p^j x^i$, where $p^i$ and $p^j$ are the vectors of prices faced by i and j respectively and $x^i$ and $x^j$ are the associated quantity vectors. Thus $p^i x^i$ is i's total expenditure at her own prices and $p^i x^j$ is the cost of j's bundle at i's prices.

We rewrite the above inequalities, multiplying the left hand side of each by some fraction e, where $0 < e < 1$. Thus a violation occurs when $e \cdot p^i x^i \geq p^i x^j$ and $e \cdot p^j x^j \geq p^j x^i$. Afriat's index may be thought of as the largest value of e for which at least one of the inequalities reverses, removing the violation. In essence, the budgets on the left hand side of each inequality are shifted inward, toward the origin, until the data are consistent.3 If a single index is computed for data involving several violations of revealed preference, Afriat's index reflects only the "worst" violation, the one for which the smallest value of e was required to reverse one of the inequalities.

Chalfant and Alston (1988) use an almost identical index of the difference in expenditure levels for two observations that are found to form a violation of WARP. They consider the ratio $p^i x^j / p^i x^i$ where observations i and j form a violation. Since we may think of e as $\max\{p^i x^j / p^i x^i,\; p^j x^i / p^j x^j\}$, Chalfant and Alston's index is essentially Afriat's index.
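To make the pairwise computation concrete, the sketch below (Python; the price and quantity vectors are assumed to be NumPy arrays, and the example data are purely illustrative, not from the paper) checks whether two observations form a WARP violation and, when they do, computes the index e described above.

```python
import numpy as np

def warp_violation(p_i, x_i, p_j, x_j):
    """True if observations i and j form a WARP violation:
    each bundle is affordable at the other's observed choice,
    i.e. p_i.x_i >= p_i.x_j and p_j.x_j >= p_j.x_i, bundles distinct."""
    if np.allclose(x_i, x_j):
        return False
    return p_i @ x_i >= p_i @ x_j and p_j @ x_j >= p_j @ x_i

def afriat_index(p_i, x_i, p_j, x_j):
    """Largest e for which one of the shifted inequalities
    e*p_i.x_i >= p_i.x_j, e*p_j.x_j >= p_j.x_i reverses: the larger
    of the two cross-expenditure ratios (Chalfant-Alston form)."""
    return max((p_i @ x_j) / (p_i @ x_i), (p_j @ x_i) / (p_j @ x_j))

# Illustrative two-good data forming a violation
p1, x1 = np.array([1.0, 2.0]), np.array([2.0, 4.0])
p2, x2 = np.array([2.0, 1.0]), np.array([4.0, 2.0])
if warp_violation(p1, x1, p2, x2):
    print("violation, e =", afriat_index(p1, x1, p2, x2))   # e = 0.8
```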

The expenditure index used by Chalfant and Alston, and thus Afriat's index, are not very useful as measures of differences in tastes.

2 There are, of course, other reasons why data may be inconsistent, such as quantity constraints (Dowrick and Quiggin (1994)), and unobserved non-linearities in the budget constraints. We do not address these problems here.

3 Since the index is constructed such that only the left hand side of each inequality is altered, all values of e below the critical value which removes the violation will also yield no violations.


Gross (1991) shows that if two consumers face similar prices with similar incomes the expenditure index and e will always take on a value close to 1 regardless of differences in preferences.

2. Varian's Minimum Perturbation Test: Varian (1985) proposes a test based on determining the minimum perturbation required to remove violations. For this test, the maintained hypothesis is that the data are consistent with revealed preference. Minimal movements of the data needed for consistency are computed, and these "fitted" values are used to estimate an upper bound on the measurement error for the null not to be rejected. As Varian (1989) points out, this test will be impractical for even modest size data sets and its use requires that the error variances be specified by the investigator.

3. Maximum Number of Violations Possible: Swofford and Whitney (1987) and McMillan and Amoako-Tuffour (1988) compare the observed number of violations to the maximum possible. The most obvious interpretation of the maximum number of violations possible is to simply consider every budget intersection as a possible violation. If every budget intersects in a particular set of data then the maximum number of possible violations of WARP is given by $\binom{n}{2}$, where n is the number of observations.4 For data with only 100 observations, for which each budget crosses, the maximum number of violations may reach as high as 4,950, or 9,900 for double-counting algorithms.

Several problems arise with such a measure. First, it seems unlikely that any observed data will ever be found which appear to be inconsistent when compared with the maximum number of violations possible.5 It is quite common in empirical work to find no more than a handful of violations in data sets involving fewer than fifty observations, though the maximum number possible may be as high as 2,450 if double counting. Further, when budgets intersect, they may intersect where consumption is unlikely. For instance, budgets that intersect near their endpoints can lead to violations only when expenditure is devoted almost entirely to consumption of one good, choices which may not be feasible or even life sustaining. Thus, violation counts compared against the maximum possible will lead to an overestimation of the consistency of the data.
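A rough sketch of such a simple violation count against the $\binom{n}{2}$ maximum (Python; the array layout and the toy data are assumptions for illustration):

```python
import numpy as np
from itertools import combinations

def count_warp_violations(prices, bundles):
    """Count unordered pairs forming a WARP violation and return it
    together with the maximum possible, n(n-1)/2 (no double counting)."""
    n = len(prices)
    def violates(i, j):
        pi, xi, pj, xj = prices[i], bundles[i], prices[j], bundles[j]
        return (not np.allclose(xi, xj)
                and pi @ xi >= pi @ xj and pj @ xj >= pj @ xi)
    found = sum(1 for i, j in combinations(range(n), 2) if violates(i, j))
    return found, n * (n - 1) // 2

prices  = [np.array([1.0, 2.0]), np.array([2.0, 1.0]), np.array([1.0, 1.0])]
bundles = [np.array([2.0, 4.0]), np.array([4.0, 2.0]), np.array([3.0, 3.0])]
print(count_warp_violations(prices, bundles))   # here every pair violates: (3, 3)
```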

More generally, simple violation counts may also lead to an underestimation of data consistency. A single unusual observation may be in violation with large numbers of other observations, leading to the incorrect conclusion that the data are largely inconsistent. Alternatively, many very small differences in tastes can lead to many violations when actual differences in behavior may be negligible.

4. Famulari's Violation Rate: Famulari (1995) defines a violation rate that is essentially the proportion of pairwise combinations that form violations among those observations for which violations could reasonably have occurred. She notes that there are three types of pairwise relationships: those which form violations, V, those for which one observation is revealed preferred to the other, C, and those which cannot be ranked by revealed preference. She defines the violation rate as V/(V + C).

Recognizing the problems with violation based measures, she sharpens the test by observing that violations are unlikely to occur between observations where the total expenditure difference is large. Thus, she restricts the violation rate computation to those observations whose total expenditures differ by less than a proportion determined by the investigator. She further restricts comparisons to those bundles whose overall expenditure is similar and for which the fraction of income allocated to each good is similar. Degrees of both types of similarity are also determined by the investigator.
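In code, her restricted rate might be sketched as follows (Python; the threshold name max_expend_diff is an assumption, and the further restriction on expenditure shares is omitted for brevity):

```python
import numpy as np
from itertools import combinations

def famulari_rate(prices, bundles, max_expend_diff=0.1):
    """Violation rate V/(V+C) computed only over pairs whose total
    expenditures differ by no more than max_expend_diff (a proportion
    chosen by the investigator); unrankable pairs are excluded."""
    V = C = 0
    for i, j in combinations(range(len(prices)), 2):
        pi, xi, pj, xj = prices[i], bundles[i], prices[j], bundles[j]
        ei, ej = pi @ xi, pj @ xj
        if abs(ei - ej) / max(ei, ej) > max_expend_diff:
            continue                     # not considered comparable
        i_over_j = ei >= pi @ xj         # i revealed preferred to j
        j_over_i = ej >= pj @ xi         # j revealed preferred to i
        if i_over_j and j_over_i:
            V += 1                       # a violation
        elif i_over_j or j_over_i:
            C += 1                       # ranked without violation
    return V / (V + C) if V + C else float("nan")
```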

Famulari's violation rate was developed in the context of testing consistency with utility maximization and is a significant improvement over prior measures. By restricting the rate computation to appropriately comparable observations she avoids many of the problems associated with violation based measures. These restrictions come at a price: three parameters, apart from the critical level of the violation rate, must be chosen by the investigator, making results difficult to interpret.

4 Many algorithms for counting violations, including those used by the author, double-count. Thus the number of violations can be $\binom{n}{2} \times 2$.

5 Indeed, even when all budgets intersect it is usually impossible to find consumption bundle placements that allow every observation to be in violation with every other.


B. Summary of Prior Measures

A fundamental tenet of hypothesis testing is that the observed value of a test statistic must be compared with values expected given its distribution under the null. None of the methods above, with the exception of Varian (1985), give any consideration to the distribution of the test statistic. Further, none of these prior metrics captures differences in behavior in an economically meaningful way. In contrast, the measure we propose below is based on utility differences among the observations measured in terms of an estimate of money metric utility.

C. Power of the Test

Bronars (1987) and Aizcorbe (1991) provide methods for estimating the power of revealed preference tests against the alternative of random behavior. Random consumption is equivalent to the hypothesis that consumption follows a uniform or rectangular distribution along the budget constraint. For tests of common preferences or preference instability this places an inordinately high probability on unlikely consumption choices.6 Thus, for these hypotheses their methods will overestimate the power of the test.

IV. Construction of the Test Statistic

A. Overview

The discussion below presumes tests based on the following null hypothesis:

$H_0$: The data were generated by consumers with different preferences (or whose preferences have changed over time).

Thus, the alternative is that the data were generated by consumers with the same preferences or that preferences have remained stable. Rejection of the null is evidence that the data arose from maximization of similar preferences.

Given this setting, it is reasonable to construct a test statistic which reflects differences in the tastes underlying the observations. Our test statistic can be thought of as a measure of the fraction of expenditure wasted by consumers maximizing a common utility function. If all are maximizing identical utility functions and none commit maximization errors, this index will have a value of zero. For a given sample size, as the number of observations generated by different preferences rises, and as the difference in preferences increases, so will the value of the statistic.

B. Observations Causing Violations

In order to construct a measure of wasted expenditure, we need a reference utility function. Yet revealed preference techniques are useful because they free us from the restrictions imposed by specific functional forms. However, using Varian's (1982) methods for bounding money metric utility, we are able to learn something about the set of all utility functions which are consistent with a large subset of the data.

As a first step we propose partitioning the data into two subsets, a set comprised primarily of those observations causing most of the violations, the violator subset (VS), and its complement, a consistent subset (CS). We then estimate the fraction of expenditure which was wasted by the VS observations in maximizing utility consistent with the CS observations.

In partitioning the data, we follow Gross and Kaiser (forthcoming), who give efficient methods for creation of a VS and CS in the WARP case.7 Essentially, each observation is ranked according to the number of observations with which it is involved in violations.8 Those of highest rank are considered the most serious violators and are removed first; the violations are then re-computed and another round of removals is performed. This process is repeated until no further violations remain. To illustrate, consider figures 1 and 2.

Figure 1 depicts data with five violations of revealed preference, while figure 2 depicts data with six violations.9

6 For example, spending such a large proportion of total expenditure on clothing that one cannot afford shelter or adequate nourishment.

7 Houtman and Maks (1985) give a method for determining all subsets consistent with SARP which was the initial inspiration for this approach. However, they do not focus on violators but simply find all consistent subsets of maximal size. We know of no algorithm which quickly partitions data in the GARP case; however, since WARP violations are also GARP violations the Gross and Kaiser approach should be the first step of any such algorithm.

8 There are some additional considerations, discussed in Gross and Kaiser, but these do not alter the basic logic.

9 These would be 10 and 12, respectively, for algorithms which double-count.


FIGURE 1.—WARP DATA WITH |VS| = 1 [figure omitted; two-good diagram, axes $x_1$ and $x_2$]

FIGURE 2.—WARP DATA WITH |VS| = 3 [figure omitted; two-good diagram, axes $x_1$ and $x_2$]

In figure 1 the violations are the pairs {1,2}, {1,3}, {1,4}, {1,5}, and {1,6}. In figure 2 the violations are {1,2}, {1,5}, {1,6}, {2,4}, {3,5}, and {3,6}. Even though the numbers of violations in the two data sets are similar, the data themselves are quite different. For the case illustrated in figure 1, observation 1 is removed on the first round and the algorithm has completed its work: $VS_1 = \{1\}$ and $CS_1 = \{2, 3, 4, 5, 6\}$. Figure 2 illustrates a slightly more complicated case. First, observation 1 is removed as it is involved in three violations. After which observation 3 is found to be involved in 2 violations and is removed. Finally, observations 2 and 4 are in violation with one another. As there is no obvious reason to pick one over the other, the algorithm chooses randomly.10 Thus two possible VSs could be created: $VS_{2,1} = \{1, 3, 2\}$ and $VS_{2,2} = \{1, 3, 4\}$. It is clear that a VS is not necessarily unique. However, by identifying those observations involved in the greatest number of violations, most of the non-uniqueness is relegated to those observations, typically a small fraction in larger data sets, involved in violation with only one other observation.11 In any case, computation of the VS is simply the first step in computation of the summary test statistic. The non-uniqueness typically has little impact on the computed value of the test statistic, and thus little impact on the decision to reject or not reject.12
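A minimal sketch of the ranked-removal idea (Python; this follows the logic described above rather than Gross and Kaiser's published algorithm, and it breaks ties at random):

```python
import numpy as np
import random
from itertools import combinations

def partition_vs_cs(prices, bundles, seed=0):
    """Greedy partition into a violator subset VS and a consistent
    subset CS: repeatedly remove the observation involved in the most
    WARP violations (ties broken at random) until none remain."""
    rng = random.Random(seed)
    def violates(i, j):
        pi, xi, pj, xj = prices[i], bundles[i], prices[j], bundles[j]
        return (not np.allclose(xi, xj)
                and pi @ xi >= pi @ xj and pj @ xj >= pj @ xi)
    remaining, vs = set(range(len(prices))), []
    while True:
        counts = {i: 0 for i in remaining}
        for i, j in combinations(sorted(remaining), 2):
            if violates(i, j):
                counts[i] += 1
                counts[j] += 1
        worst = max(counts.values(), default=0)
        if worst == 0:
            return sorted(vs), sorted(remaining)        # VS, CS
        candidates = [i for i, c in counts.items() if c == worst]
        drop = rng.choice(candidates)
        remaining.discard(drop)
        vs.append(drop)
```

Re-running the partition with different seeds gives a cheap check of how much the random tie-breaking matters in small samples, in the spirit of footnote 12.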

C. Computation of Wasted Expenditure

Suppose we know the reference utility function, $U_{CS}$, for the CS data. This would allow us to compute an index of wasted expenditure for each observation in the VS. Specifically, if at prices $p^i$ consumer i in the VS purchases bundle $y^i$ we would compute

$$\min_{x} \; p^i x \quad \text{s.t.} \quad U_{CS}(x) = U_{CS}(y^i) \qquad (1)$$

where $U_{CS}(x)$ represents the utility associated with some bundle x for the CS preferences and $U_{CS}(y^i)$ is the utility consumer i enjoys if her tastes are the same as the CS preferences. Let $x^*$ represent the cost minimizing solution.

" In fact there are other ways to construct consistent sub- sets of the same size from these data. For example, removal of 2, 5, and 6 would leave the remaining data violation free but, due to its involvement in more violations than any other, observation 1 should be included in any version of a VS.

As we see later, in the limit, as n - , where n is the sample size, non-uniqueness disappears.

1 Alternatively, we propose the following for very small data sets where non-uniqueness may affect the value of the test statistic. Suppose in those cases where two removals are possible we are able to choose which to remove on the basis of some goodness-of-fit measure. If so we would remove the more inconsistent, yielding a larger value of the test statistic. While this would be computationally burdensome, we can simply re-compute the CS and the associated test statistic a number of times and select the largest value. While this will not guarantee that every possible CS has been considered, 100 such computations will make this a virtual certainty. On current desktop machines computation of the WARP CS and test statistic are quite fast, 100 trials on a data set of under 100 observations can be performed in a few minutes.


The index of wasted expenditure is then $\omega_i = (p^i y^i - p^i x^*)/p^i y^i$.13 For each observation in the CS the index is assigned a value of 0, as there is no wasted expenditure. Let $\bar{\omega} = \sum_{i=1}^{n} \omega_i / n$ represent the arithmetic mean of the $\omega_i$, giving the mean wasted expenditure for the sample.
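If $U_{CS}$ were known, $\omega_i$ would follow directly from the associated expenditure function. A minimal sketch, assuming for illustration a two-good Cobb-Douglas reference utility with its closed-form expenditure function (this is not the paper's nonparametric construction, which follows below):

```python
import numpy as np

def wasted_share_cobb_douglas(p, y, a=0.3):
    """Fraction of expenditure wasted by an observation (p, y) if the
    reference CS preferences were U(x) = x1**a * x2**(1-a) -- an
    illustrative assumption. Uses the closed-form expenditure function
    e(p, u) = u * (p1/a)**a * (p2/(1-a))**(1-a)."""
    u = y[0] ** a * y[1] ** (1 - a)                   # utility of the bundle
    min_cost = u * (p[0] / a) ** a * (p[1] / (1 - a)) ** (1 - a)
    actual = p @ y
    return (actual - min_cost) / actual               # omega_i, in [0, 1)

p = np.array([1.0, 1.0])
y = np.array([9.0, 1.0])        # far from the 3:7 reference expenditure split
print(round(wasted_share_cobb_douglas(p, y), 4))      # about 0.64
```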

Since $U_{CS}$ is unobservable, $\bar{\omega}$ cannot be computed. However, Varian (1982) gives methods which bound money metric utility using revealed preference techniques. Our approach follows directly from these methods.

Consider the ith VS observation, $p^i_{VS} y^i_{VS}$, where $y^i_{VS}$ is a vector of quantities and $p^i_{VS}$ is the price vector at which $y^i_{VS}$ was demanded. Since $p^i_{VS} y^i_{VS}$ is an element of a VS it will be in violation with one or more members of the CS. We define $P_{CS}(y^i_{VS})$ as the set of all price vectors at which $y^i_{VS}$ could be demanded and be consistent with the CS observations.

Using the above definitions we define the revealed preferred set for $y^i_{VS}$ with respect to the CS, $RP_{CS}(y^i_{VS})$. $RP_{CS}(y^i_{VS})$ is the set of all observations x in the CS such that for every price $p \in P_{CS}(y^i_{VS})$, $p x > p\, y^i_{VS}$. Thus, $RP_{CS}(y^i_{VS})$ is the set of all observations strictly revealed preferred to $y^i_{VS}$ for all prices for which $y^i_{VS}$ is consistent with the CS observations.

We may define the revealed worst set, $RW_{CS}(y^i_{VS})$, in a similar manner. Having defined $RW_{CS}(y^i_{VS})$ we can form the set $NIRW_{CS}(y^i_{VS})$, which is the complement of the inner bound of $RW_{CS}(y^i_{VS})$, and can be thought of as the upper bound to the set of points to which $y^i_{VS}$ is revealed preferred for all $p \in P_{CS}(y^i_{VS})$.

Now we are ready to define the upper and lower bounds of the CS expenditure function for $p^i_{VS} y^i_{VS}$. Simply put, the upper bound is the least cost bundle in $RP_{CS}(y^i_{VS})$ at prices $p^i_{VS}$, and the lower bound is the least cost bundle in $NIRW_{CS}(y^i_{VS})$ at prices $p^i_{VS}$. More precisely we define them as follows:

$$ae^+_{CS}(p^i_{VS}, y^i_{VS}) = \min_x \; p^i_{VS}\, x \quad \text{such that } x \in RP_{CS}(y^i_{VS})$$

$$ae^-_{CS}(p^i_{VS}, y^i_{VS}) = \min_x \; p^i_{VS}\, x \quad \text{such that } x \in NIRW_{CS}(y^i_{VS}). \qquad (2)$$

Since $ae^+_{CS}(p^i_{VS}, y^i_{VS})$ is the upper bound of the expenditure function it overestimates the amount of expenditure that would have been required to obtain a level of utility equal to that conveyed by bundle $y^i_{VS}$. Likewise $ae^-_{CS}(p^i_{VS}, y^i_{VS})$ gives an underestimation. In estimating the proportion of wasted expenditure associated with observation $p^i_{VS} y^i_{VS}$ with respect to the CS preferences, the overestimation of the expenditure function underestimates the fraction of wasted expenditure and vice-versa. Thus we define these approximations as follows:

$$aw^-_i(p^i_{VS}, y^i_{VS}) = \frac{p^i_{VS} y^i_{VS} - ae^+_{CS}(p^i_{VS}, y^i_{VS})}{p^i_{VS} y^i_{VS}} \qquad (3)$$

$$aw^+_i(p^i_{VS}, y^i_{VS}) = \frac{p^i_{VS} y^i_{VS} - ae^-_{CS}(p^i_{VS}, y^i_{VS})}{p^i_{VS} y^i_{VS}}$$

where $aw^-_i(p^i_{VS}, y^i_{VS})$ is the underapproximation of the fraction of wasted expenditure14 and $aw^+_i(p^i_{VS}, y^i_{VS})$ is the overapproximation.

We define the CS wasted expenditure index for observation i in the VS as

$$aw_i(p^i_{VS}, y^i_{VS}) = \frac{aw^+_i(p^i_{VS}, y^i_{VS}) + aw^-_i(p^i_{VS}, y^i_{VS})}{2}, \qquad (4)$$

the mean of the over- and underapproximation. This is computed for each observation in the VS.
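A sketch of equations (2) through (4) in Python, assuming the index sets of CS observations belonging to $RP_{CS}(y^i_{VS})$ and $NIRW_{CS}(y^i_{VS})$ have already been computed elsewhere (for example with NonPar-style routines, which are not reproduced here):

```python
import numpy as np

def wasted_share_bounds(p_vs, y_vs, cs_bundles, rp_idx, nirw_idx):
    """Equations (2)-(4): from precomputed, non-empty index sets rp_idx
    and nirw_idx, return the under-approximation, over-approximation,
    and averaged index aw_i of the fraction of wasted expenditure."""
    actual   = p_vs @ y_vs
    ae_plus  = min(p_vs @ cs_bundles[k] for k in rp_idx)     # upper bound, eq. (2)
    ae_minus = min(p_vs @ cs_bundles[k] for k in nirw_idx)   # lower bound, eq. (2)
    aw_minus = (actual - ae_plus) / actual                   # eq. (3), under-approx.
    aw_plus  = (actual - ae_minus) / actual                  # eq. (3), over-approx.
    return aw_minus, aw_plus, 0.5 * (aw_minus + aw_plus)     # eq. (4)
```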

Since each observation in the VS can be thought of as the solution to the "wrong" utility function with respect to the CS, the associated expenditure, $p^i_{VS} y^i_{VS}$, will be greater than that required to attain the level of utility associated with that consumption bundle if the behavior were consistent with the CS data. Thus, the value of the index for each VS observation will be non-negative.16

13 I began this line of research after a discussion with Hal Varian in which he described a parametric measure based on wasted expenditure (see Varian (1990)). I was already thinking about consistent subsets and was looking for a reasonable way to characterize differences between CS and VS observations and this seemed a natural approach.

14 Afriat's index and the lower bound on wasted expenditure are equivalent when there is only one violation of WARP. A proof is available from the author.

15 We thank Hal Varian and Lee Redding for subroutines borrowed with only minor changes from NonPar which we use to compute the upper and lower bounds on the expenditure function.

16 A proof is available from the author.


For a data set with n observations we compute the observed value of the test statistic $\hat{\omega}$:17

$$\hat{\omega} = \frac{1}{n}\sum_{i=1}^{n} aw_i. \qquad (5)$$
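Equation (5) is then a simple average; a sketch, assuming the per-violator indices $aw_i$ have been computed as above and that CS observations contribute zero:

```python
def test_statistic(aw_violators, n):
    """Equation (5): mean wasted-expenditure index over all n observations;
    observations in the CS contribute an index of zero."""
    return sum(aw_violators) / n

print(test_statistic([0.12, 0.05, 0.30], 200))   # 0.00235
```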

In theorem 1 we establish the consistency of $\hat{\omega}$.

THEOREM 1: If demand functions are single valued, continuously differentiable in prices and income, and homogeneous of degree zero in prices and income, then as $n \to \infty$, $\hat{\omega} \to \bar{\omega}$.

Proof: Houthakker (1950) shows that if the data are infinite and demand functions have the above properties, through revealed preference techniques we can exactly determine the indifference surface associated with any bundle. Thus, as the number of observations increases, the bounds on the indifference surfaces become tighter and in the limit $aw^+_i$ and $aw^-_i$ coincide, becoming the true measure of wasted expenditure, $\omega_i$.

Remarks on the VS: In the limit, the non-uniqueness of the VS is eliminated. As the number of observations increases and the entire indifference surface becomes more tightly defined, each violator will be in violation with an increasing number of observations. In the limit, each element of the VS will be uniquely identified as it will be in violation with an infinite number of observations.

We generated sample data to illustrate the computation of $\hat{\omega}$ for data of known characteristics. We constructed 40 data sets of 200 observations based on three Cobb-Douglas utility functions: $U_A = x_1^{1/10} x_2^{9/10}$, $U_B = x_1^{1/2} x_2^{1/2}$, and $U_C = x_1^{3/10} x_2^{7/10}$. By randomly selecting incomes from a uniform distribution on the interval between 10,000 and 12,000 and price ratios from a uniform distribution on the 0.5 to 1.5 interval, we constructed the 200 budgets which were used for all 40 data sets. This narrow income distribution and relatively wide price ratio distribution generated plentiful budget intersections. We constructed 20 data sets by choosing utility maximizing bundles consistent with $U_A$ and $U_B$. The first such data set contained 199 bundles found by maximizing $U_A$ and 1 by maximizing $U_B$, the second was comprised of 198 $U_A$ and 2 $U_B$ bundles, and so on, with the 20th consisting of 180 bundles from $U_A$ and 20 from $U_B$.

TABLE 1.—VALUES OF $\hat{\omega}$

$U_A$    $U_B$ or $U_C$    $\hat{\omega}_{A,B}$    $\hat{\omega}_{A,C}$
199      1                 0.0031                  0.0000
198      2                 0.0060                  0.0006
197      3                 0.0086                  0.0006
196      4                 0.0104                  0.0006
195      5                 0.0115                  0.0007
194      6                 0.0117                  0.0007
193      7                 0.0139                  0.0007
192      8                 0.0171                  0.0010
191      9                 0.0186                  0.0013
190      10                0.0194                  0.0019
189      11                0.0194                  0.0023
188      12                0.0198                  0.0023
187      13                0.0228                  0.0027
186      14                0.0260                  0.0030
185      15                0.0263                  0.0035
184      16                0.0286                  0.0035
183      17                0.0312                  0.0035
182      18                0.0322                  0.0038
181      19                0.0350                  0.0042
180      20                0.0363                  0.0042

We also constructed 20 data sets combining $U_A$ and $U_C$ in the same proportions. This gives us data sets involving observations based on very large differences in tastes and data based on rather modest differences. Results of the computation of $\hat{\omega}$ for these 40 data sets are given in table 1.

The first column gives the number of observations based on $U_A$, the second gives the number of observations based on $U_B$ or $U_C$. The column labeled $\hat{\omega}_{A,B}$ gives the results for data formed from combining $U_A$ and $U_B$, the case where the difference in preferences was greater. The results from combining observations from $U_A$ and $U_C$ are found in the column labeled $\hat{\omega}_{A,C}$.

Whether differences in tastes are modest or large, as the number of observations generated by different preferences increases the value of the test statistic tends to increase. This is seen in increasing values as we read down either of the last two columns. Comparing the last two values in any row demonstrates the impact of differences in tastes holding the number of observations generated by different preferences constant. As expected, in each row $\hat{\omega}$ is larger in the $\hat{\omega}_{A,B}$ column, where taste differences are greater. Thus, our test statistic performs as expected.
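Data of this kind are straightforward to generate. A sketch under illustrative assumptions (the Cobb-Douglas exponents and the normalization of the second price are placeholders, not the paper's exact choices):

```python
import numpy as np

def simulate_mixed_data(n=200, k=10, a_main=0.5, a_other=0.3, seed=0):
    """Generate n budgets (incomes U[10000, 12000], price of good 1
    U[0.5, 1.5] with the price of good 2 fixed at 1) and Cobb-Douglas
    demands; the last k observations use a different exponent.
    Exponents are illustrative placeholders, not the paper's."""
    rng = np.random.default_rng(seed)
    incomes = rng.uniform(10_000, 12_000, n)
    p1 = rng.uniform(0.5, 1.5, n)
    prices = np.column_stack([p1, np.ones(n)])
    alphas = np.where(np.arange(n) < n - k, a_main, a_other)
    # Cobb-Douglas demand: spend share alpha on good 1, 1-alpha on good 2
    bundles = np.column_stack([alphas * incomes / prices[:, 0],
                               (1 - alphas) * incomes / prices[:, 1]])
    return prices, bundles

prices, bundles = simulate_mixed_data()
```

Feeding such data through the VS/CS partition and equations (2)-(5) should reproduce the qualitative pattern in table 1: the statistic grows with k and with the gap between the exponents.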

Without knowledge of its distribution under the null hypothesis, however, values of $\hat{\omega}$, like those of any test statistic, are difficult to interpret.

17 Recall that we can sum over all observations as those in the CS are assigned an index of 0.


V. Bootstrapping the Distribution of $\hat{\omega}$

In general, derivation of the analytical form of the distribution of $\hat{\omega}$, or for that matter any test statistic based on revealed preference methods, is quite difficult. Thus, we propose a bootstrap approach.18 Devised by Efron (1979, 1982), the bootstrap involves resampling the observed data in order to generate an empirical distribution of the test statistic. It is both a practical and easily implemented method of obtaining the distribution of a difficult statistic.

A. Maintaining Original Budgets

Recall that our null hypothesis is that the data were generated by consumers with different preferences. Suppose we compare an observed value of the test statistic with values expected from a distribution arising from a process involving more plentiful budget intersections than in the original data. Since this will mean that violations are far more likely than in the observed data, we will be too likely to reject the null. Conversely, if the distribution arises from a process with fewer intersections than in the observed data, we will be too likely not to reject. Thus, the distribution under the null hypothesis of any test statistic based on revealed preference methods is conditional on the observed budgets.19

Consider the experiment suggested above. For a data set D we compute $\hat{\omega}$, the observed value of the test statistic. We then resample the original data with replacement, constructing s data sets with the same budgets as in the data. Using these data sets we compute s values of the test statistic, $\hat{\omega}_1, \hat{\omega}_2, \ldots, \hat{\omega}_s$. The bootstrap estimator, $P_{\hat{\omega}}$, based on s re-samplings is

$$P_{\hat{\omega}} = \frac{\#(\hat{\omega}_j \leq \hat{\omega})}{s}. \qquad (6)$$

Reading # as "number of occurrences" we see from (6) that $P_{\hat{\omega}}$ is simply the proportion of the $\hat{\omega}_j$ no greater than the observed value $\hat{\omega}$, and thus is an estimate of the probability of drawing a value no greater than $\hat{\omega}$ under the null.
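A sketch of equation (6) (the helper name is an assumption):

```python
def bootstrap_p_value(omega_obs, omega_boot):
    """Equation (6): proportion of bootstrap statistics no greater than
    the observed value of the test statistic."""
    return sum(w <= omega_obs for w in omega_boot) / len(omega_boot)

print(bootstrap_p_value(0.0164, [0.02, 0.05, 0.01, 0.03]))   # 0.25
```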

B. Re-sampling

The final issue in construction of the bootstrap distribution is choosing a process for generation of consumption bundles consistent with the null. Under our specified null we expect to observe consumption bundles that are all the result of utility maximization, but of a variety of utility functions.

By randomly drawing, with replacement, from the fractions of expenditure devoted to the consumption goods, we obtain consumption bundles which lie on the original budgets. In the simplest version of this process, each budget has probability 1/n of receiving any observed set of expenditure shares found in the original n observation data.

Using a single distribution of proportions for the entire population implicitly assumes homothetic preferences.20 This may be serious if expenditure variation is large. For example, high income households will almost invariably spend far smaller fractions of total expenditure on food than lower income households and our bootstrap process should reflect this. Thus, we propose that observations be categorized according to total expenditure. The size and number of categories will depend on the characteristics and size of the data set as well as any prior knowledge or assumptions held by the investigator. Typically, two or at most three categories will be adequate, roughly splitting the observations into equal size subsets, or at some level of income that is recognized as a class division.21

In particular, suppose there are two classes of total expenditure and three consumption goods. For each expenditure class there is an observed distribution of fractions of expenditure spent on the goods. Denote the ith set of expenditure fractions for goods 2 and 3 in the kth category as $\delta^i_{2,k}, \delta^i_{3,k}$. For observation j, in expenditure category 2, a set of expenditure fractions is drawn.

18 Bronars (1987) proposes simulations in cases where direct power calculations are not feasible. His proposal led to the idea of using simulations to derive the distribution of the test statistic.

19 Both Bronars (1987) and Aizcorbe (1991) clearly recognize this in their power computations.

20 Thanks to Ted Bergstrom and Stephen Bronars for pointing this out.

21 Increasing the number of categories reduces the variance of the empirical distribution. To see this note that in the limit, if each observation is in a unique group, each will receive its original consumption bundle in the bootstrap procedure and thus every estimate of $\hat{\omega}$ will be the same as the observed value $\hat{\omega}$.


For the jth budget we determine the consumption bundle by setting the quantities according to

$$q_2 = \frac{p^j x^j \,\delta^i_{2,2}}{p_2}, \qquad q_3 = \frac{p^j x^j \,\delta^i_{3,2}}{p_3}$$

and

$$q_1 = \frac{p^j x^j \,(1 - \delta^i_{2,2} - \delta^i_{3,2})}{p_1}.$$

This is carried out for each budget, allowing computation of a value of $\hat{\omega}$, one point in our empirical distribution. The process is repeated s times, creating s values of $\hat{\omega}$ and enabling us to compute the bootstrap estimate $P_{\hat{\omega}}$. If $P_{\hat{\omega}} < \alpha$, where $\alpha$ represents the chosen level of significance, the null is rejected.
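One bootstrap draw under this scheme might be sketched as follows (Python; prices and bundles are assumed to be arrays of shape (n, goods), and splitting categories at expenditure quantiles is an assumption about how the investigator sets the class boundaries):

```python
import numpy as np

def bootstrap_bundles(prices, bundles, n_categories=2, seed=0):
    """One bootstrap draw: for each original budget, redraw a set of
    expenditure shares (with replacement) from observations in the same
    total-expenditure category and place the implied bundle on that
    budget. Category boundaries are expenditure quantiles (an assumption)."""
    rng = np.random.default_rng(seed)
    totals = np.einsum("ij,ij->i", prices, bundles)         # p.x for each obs
    shares = prices * bundles / totals[:, None]              # expenditure shares
    edges = np.quantile(totals, np.linspace(0, 1, n_categories + 1))
    cat = np.clip(np.searchsorted(edges, totals, side="right") - 1,
                  0, n_categories - 1)
    new_bundles = np.empty_like(bundles)
    for j in range(len(prices)):
        pool = np.flatnonzero(cat == cat[j])                 # same category
        s = shares[rng.choice(pool)]                         # drawn share vector
        new_bundles[j] = s * totals[j] / prices[j]           # q_g = s_g * p.x / p_g
    return new_bundles
```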

A small complication arises in data for which one or more of the consumption goods cannot be reasonably thought of as a continuous variable, and thus numerous consumers consume identical quantities. Consider a two good example where good 2 is observed to be consumed in integer amounts. For any two consumers whose consumption of good 2 is the same, no violation is possible.22 Thus, data constructed in accordance with the null should reflect the frequency with which this occurs in the observed data. One method which achieves this is to draw two expenditure proportions with replacement, then draw quantities with replacement from the observed quantities until one is found which yields a proportion within the range defined by the first draw.23
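A sketch of that draw-and-check rule for the two-good case with an integer-valued good 2 (the retry cap is an assumption; the variant that redraws both the proportions and a quantity after each failure is the one shown):

```python
import numpy as np

def draw_discrete_good(total, p, share_pool, qty_pool, rng, max_tries=100):
    """Draw a pair of expenditure proportions, then a quantity of good 2
    from the observed quantities; accept only if the implied share of
    good 2 lies inside the range defined by the drawn proportions.
    After each failure, both the proportions and the quantity are redrawn,
    up to an assumed retry cap."""
    for _ in range(max_tries):
        lo, hi = np.sort(rng.choice(share_pool, size=2, replace=True))
        q2 = rng.choice(qty_pool)
        s2 = p[1] * q2 / total                       # implied share of good 2
        if lo <= s2 <= hi:
            q1 = (total - p[1] * q2) / p[0]          # remainder spent on good 1
            return np.array([q1, q2])
    return None                                      # signal a failed draw

rng = np.random.default_rng(0)
bundle = draw_discrete_good(total=400.0, p=np.array([1.0, 5.0]),
                            share_pool=np.array([0.3, 0.5, 0.7]),
                            qty_pool=np.array([20, 40, 45, 50]), rng=rng)
```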

VI. An Empirical Example

This paper is not intended as an empirical study; however, we provide a very brief empirical example.24

We use data from Wave IX (1976) of the Panel Study of Income Dynamics.

TABLE 2.—CHILD CARE RESULTS

N      Violations    $\hat{\omega}$    $P_{\hat{\omega}}$
245    174            0.0164            0.94

Heads of households were interviewed to obtain behavioral and economic information. Wives were asked about their labor supply, earnings, and use of child care services (for those with children below the age of 12). We consider only the 334 women who obtained paid child care services. Varian (1988) shows that the axioms of revealed preference place no restrictions on unobserved price or quantity information; thus this group had to be further reduced as we eliminated observations in which we have no response on the quantity (number of hours per week of child care), price, or income, leaving us with 245 complete observations.

Our null hypothesis is that, for the women observed, a single utility function does not exist which is consistent with observed choices for child care and a composite good. Our alternative is that such a function does exist. Rejection of the null is taken as evidence that these women have similar preferences.

Recognizing that reported child care consumption is somewhat lumpy (more women report choosing 40 hours than any other value, and 45 and 50 were reported frequently), we constructed our bootstrap data sets by sampling with replacement in the manner described at the end of the previous section. Our findings are given in table 2.

It is evident that there is little support for the hypothesis that a single utility function is consistent with these data. Indeed, we expect that the frequency with which we would see data at least as consistent under the null is 0.94.25

VII. Conclusion

We consider the problem of testing data for consistency with revealed preference, particularly where investigators are interested in testing for commonality or stability of tastes.

22 The unconvinced reader can construct a simple geometric example.

23 Since this process may fail, even with repeated draws, to find a quantity which yields a proportion in the required range, one can either limit the number of quantity draws, redrawing proportions after a number of failures, or, after each failure, redraw both proportions and a quantity until a successful draw is obtained. This latter method was used in the empirical example below.

24 Pascal source code which fully implements the methods presented in this paper for tests based on WARP is available from the author.

25 Gross and Hallinan (1994) investigate these data more fully. Specifically they investigate whether, as data are stratified by demographic characteristics, more consistency with revealed preference emerges.


Several approaches advanced by other authors are examined and rejected as impractical, insufficiently informative, or more appropriate for other investigations.

We propose the use of a test statistic based on differences in preferences between observations which are reconcilable with a single utility function and those which are not. Our metric may be thought of as a measure of the proportion of expenditure wasted if all consumers are maximizing the same utility function. We find that this metric increases in size as differences in taste increase and as the proportion of inconsistent observations increases.

We emphasize the need to estimate the distribution of the test statistic conditional on the observed budget constraints. As a practical alternative to obtaining an analytical form of the distribution of the test statistic, bootstrap methods of estimating the distribution are proposed. Finally, we provide a brief empirical example.


REFERENCES

Afriat, Sydney N., "The Construction of Utility Functions from Expenditure Data," Intemational Economic Re- view 8 (1967), 67-77. , "On a System of Inequalities in Demand Analysis: An Extension of the Classical Method," International Economic Review 14 (1973), 460-472.

Aizcorbe, Ana M., "A Lower Bound for the Power of Non- parametric Tests," Journal of Business and Economic Statistics 9 (1991), 463-467.

Banker, Rajiv D., and Ajay Maindiratta, "Nonparametric Analysis of Technical and Allocative Efficiencies in Production," Econometrica 56 (1988), 1315-1332.

Bronars, Stephen G., "The Power of Nonparametric Test of Preference Maximization," Econometrica 55 (1987), 693-698.

Chalfant, James A., and Julian M. Alston, "Accounting for Changes in Tastes," Joumal of Political Economy 96 (1988), 391-410.

Chavas, Jean-Paul, and Thomas Cox, "A Non-Parametric Analysis of Productivity: The Case of U.S. and Japanese Manufacturing," American Economic Review 80 (1990), 450-464.

Diewert, W. Erwin, "Afriat and Revealed Preference Theory," Review of Economic Studies 40 (1973) 419-426.

Diewert, W. Erwin, and Cecil Parkan, "Tests for the Consis-

tency of Consumer Data," Journal of Econometrics 30 (1985), 127-147.

Dowrick, Steve, and John Quiggin, "International Compar- isons of Living Standards and Tastes: A Revealed-Pref- erence Analysis," American Economic Review 84 (1994), 332-341.

Efron, Bradly, " Bootstrap Methods: Another Look at the Jackknife," Annals of Statistics 7 (1979), 1-26. , The Jackknife, the Bootstrap, and Other Resampling Plans (Philadelphia: Society for Industrial and Applied Mathematics, 1982).

Famulari, Melissa, "Household Labor Supply and Taxes: A Nonparametric Approach," Working Paper, University of Texas at Austin (1994). , "A Household-Based, Nonparametric Test of De- mand Theory," this REVIEW 77 (May 1995), 372-382.

Gross, John, "On Expenditure Indices in Revealed Prefer- ence Tests," Journal of Political Economy 99 (1991), 416-419. , "Heterogeneity of Preferences for Local Public Goods: The Case of Private Expenditure on Public Education," Journal of Public Economics 57 (1995), 103-127.

Gross, John, and Patricia Hallinan, "Demographic Traits and the Demand for Childcare," Working Paper, Marquette University (1994).

Gross, John, and Dan Kaiser, "Two Simple Algorithms for Generating Subsets of Data Consistent with WARP and other Binary Relations," Journal of Business and Economic Statistics (forthcoming).

Houthakker, Hendrik S., "Revealed Preference and the Util- ity Function," Economica 17 (1950), 159-174.

Houtman, Martijn, and J. A. H. Maks, "Determining All Maximal Data Subsets Consistent with Revealed Pref- erence," Kwantitatieve Methoden 19 (1985), 89-104.

McMillan, Melville L., and Joe Amoako-Tuffour, "An Exami- nation of Preferences for Local Public Sector Options," this REVIEW 70 (1988), 45-54.

Rose, H., "Consistency of Preference: The Two-Commodity Case," Review of Economic Studies 35 (1958), 124-125.

Swofford, James L., and Gerald A. Whitney, "Nonparametric Test of Utility Maximization and Weak Separability for Consumption, Leisure and Money," this REVIEW 69 (1987), 458-464.

Varian, Hal R., "The Nonparametric Approach to Demand Analysis," Econometrica 50 (1982), 945-973. , "Non-parametric Tests of Consumer Behavior," Re- view of Economic Studies (1983), 99 -110. ,"Nonparametric Analysis of Optimizing Behavior with Measurement Error," Journal of Econometrics 30 (1985), 445-458.

_____, "Revealed Preference with a Subset of Goods,"Jour- nal of Economic Theory 46 (1988), 179-185.

____, "Goodness-of-Fit in Demand Analysis," CREST working paper (1989). , "Goodness-of-Fit in Demand Analysis," Journal of Econometrics 46 (1990), 125-140.
