Household Responses to Food Subsidies: Evidence …cega.berkeley.edu/assets/cega_events/61/1D_Food_and_Agriculture_3... · Household Responses to Food Subsidies: Evidence from India

Preliminary Draft: Please do not cite without author’s permission.

Household Responses to Food Subsidies:

Evidence from India∗

Tara Kaul†

University of Maryland, College Park

January 8, 2014

Abstract

This paper uses household survey data to examine the effect of food subsidies on thenutritional outcomes of poor households in India. The national food security program,known as the Public Distribution System (PDS), takes the form of a monthly quotaof cereals (rice and/or wheat) available for purchase at substantially discounted prices.The effect of the program is studied by exploiting the geographic and household sizespecific variations in the value of the subsidy that result from differences in state pro-gram rules and local market prices. The analysis focusses on the rice subsidy programand outcomes are consumption data from six years (2002-2008) of the NSSO Socio-Economic surveys. In agreement with other literature on food subsidies, this paperfinds small elasticities for cereal consumption and caloric intake with respect to thevalue of the subsidy. However, households benefit from the program in terms of overallfood intake and not just through cereals directly provided by the PDS. The elastici-ties for calories from all food groups are positive and significant. This is in contrastto studies on pure price subsidies for cereals that have found zero or negative effectson caloric intake. Thus, the results in this paper suggest that quotas may be moreeffective than price subsidies at improving nutrition. Taking into account state leveldifferences in the functioning of the program, a substantially smaller effect is found instates that have higher levels of corruption. Finally, the estimates from this paper areused to simulate the caloric impact of changes in the PDS under the provisions of theNational Food Security Bill of 2013.

JEL classification: I38; H42; O12; O15; Q18

∗I am grateful to Jessica Goldberg and Melissa Kearney for their invaluable guidance and support atevery stage of this project. I thank Sebastian Galiani, Parikshit Ghosh and Susan Godlonton for helpfulcomments and discussions. I also thank Ron Chan, Daisy Dai, Sonal Desai, Brinda Karat, Reetika Khera,Giordano Palloni, Brian Quistorff, Miguel Sarzosa, Suraj Shekhar and seminar participants at the Universityof Maryland, Delhi School of Economics, the 2013 MEA Annual Meeting, the 2013 Midwest InternationalEconomic Development Conference and the 2013 APPAM Fall Research Conference for useful feedback. Igratefully acknowledge use of data from the National Sample Survey Office, New Delhi and the India HumanDevelopment Survey. All errors and omissions are my own.

†Email: [email protected]

1

1. Introduction

Provision of food security is one of the most basic forms of assistance to poor households

and many governments around the world provide aid in the form of food subsidies. Despite

their popularity and importance, there is a relatively small body of research studying the

impact of such subsidies on nutritional outcomes. While the primary rationale for all food

subsidies is reducing food insecurity (FAO 1997), their impact on nutrition (via food intake)

has generally been found to be quite small, in some cases even zero or negative.1

Food subsidies are implemented via food stamps, price subsidies, direct in-kind transfers

or through the operation of ration shops (price subsidies with a quantity cap). The mecha-

nism through which a food subsidy operates depends on the manner in which the program

affects income and prices (and the related elasticities of the demand for food), the specific

types of food groups it targets (and the income and cross price elasticities of other foods),

and how close the beneficiary households are to their ideal level of food consumption. This

paper examines India’s food security program, the Public Distribution System (PDS). The

PDS is one of the country’s biggest anti-poverty programs. It has a target population of

65.2 million households and the total cost of food subsidies amounts to almost 1% of GDP

(Planning Commission 2012).

1Butler, Ohls and Posner (1985) find a very limited nutritional impact of the food stamps program in theUnited States. The Indian food security program has also historically been found to have a small impacton nutrition (Kochar 2005, Tarozzi 2005, Khera 2011c). Jensen and Miller (2011) find a negative impact ofprice subsidies on caloric intake in China. There is also a literature comparing the effects of food subsidiesand income transfers. Hoynes and Schanzenbach (2009) find that the food stamps program in the UnitedStates has the same impact on food expenditure as cash. Steifel and Alderman (2006) and Laderchi (2001)find similar results in Peru where the impact of food subsidies on child nutrition is no different than that ofcash transfers.

2

The PDS provides a monthly quota of cereals (20-35 kg per household) at substantially

discounted prices to households that are below the poverty line (Government of India 2011).

The poverty line in India is determined by having an income deemed adequate to purchase a

basic minimum level of calories.2 Since cereals are relatively cheap and rich in energy, they

account for a substantial proportion of the total calories consumed by poor households.3

Thus the PDS aims to directly increase caloric intake by providing supplementary cereals to

poor households, who are, by assumption, food insecure. The program may, however, have

consequences in terms of consumption of other food groups which are critical to maintaining

a healthy and balanced diet. This is particularly important in India where the levels of child

malnutrition are alarmingly high and have been linked to severe micronutrient deficiencies

(Banerjee and Duflo 2011). Thus it is necessary to study the impact of the program not

just on the intake of cereals and total calories, but also on calories from different food groups.4

The value of the subsidy to a household is a function of the quantity cap and the discount

(difference between the market and PDS price of cereals) that the household is eligible for.

This paper uses two previously unexploited sources of variation in the value of the subsidy to

measure its impact on nutritional outcomes. First, the total quantity cap for each household

2Deaton and Dreze (2009) find that the average caloric intake in India has declined over time. However,they note that this declining trend does not undermine the importance of reducing calorie deficiencies forpoor households. In my sample, the caloric intake of the below poverty line (BPL) population is well belowthe minimum caloric norms (and the calories consumed by more affluent households). See Table 4 ahead.

3Cereals account for over 72% of the total calories consumed by BPL households in the sample understudy (Table 4) .

4While the links between improvements in caloric intake and health outcomes certainly exist, the rela-tionship is not necessarily linear and depends critically on other factors such as a balanced diet, activitylevels, sanitation, and basic medical facilities (Deaton and Dreze 2009). In the absence of data on healthoutcomes in my dataset, I am unable to predict the impact of the subsidy on health. However, assessing theimpact on caloric intake is important in itself. Reducing the incidence of hunger by improving caloric intakefor a food insecure population, such as BPL households in India, is among the major humanitarian goals ofmany governments and international organisations around the world.

3

varies by state since the program is jointly funded by the state and central governments.

Further, some states index the cap by family size and others offer a fixed quantity to every

household. This results in variation, even within state, in the per person quantity cap, due

to differences in family size. Second, the price charged for PDS goods is typically set for

the year and not linked in any way to fluctuations in market prices. Due to the absence of

perfectly integrated agricultural markets and controls on the movement of goods within the

country, the difference between the PDS and market price within a year varies substantially

across districts (within a state) and seasons.

State level program rules and differences in local market prices are used to compute the

value of the subsidy for each beneficiary household, based on its size and district-season-year

cell. This value is used to identify the effect of the subsidy, assuming that after controlling

for household characteristics, district, state-year and seasonal effects, the remaining variation

in the value of the subsidy (owing to unpredictable fluctuations in local market prices and

differences in the per person quantity cap due to family size) is exogenous. The consumption

data come from six years (2002-2008) of the nationally representative socio-economic surveys

conducted by the National Sample Survey Organisation (NSSO). The analysis focusses on

eight major states in India where rice is the primary staple. The main outcomes studied

are cereal consumption, caloric intake, calories from different food groups, and total food

expenditure. In order to compare the estimates with prior work on expenditure and income

elasticities, supplementary data on income and consumption from a smaller but much richer

dataset, the India Human Development Survey (2004-05), is used.

The estimated elasticities of cereal consumption and calories with respect to the value of

4

the subsidy are small, but higher than previous estimates – the elasticity of caloric intake

with respect to the value of the subsidy is 0.144, as compared to 0.06 in Kochar (2005).

An increase in the rupee value of the subsidy increases calories by more than twice of what

is implied by its impact on cereal consumption alone. This is an important indicator that

households benefit from the program in terms of overall caloric intake and not just through

the cereals directly provided by the PDS. Even though the program subsidizes only cereals,

it has a positive effect on the consumption of all types of food: the impact on calories for

all food groups is positive and significant. In contrast to Jensen and Miller (2011), who find

a zero or negative effect of pure price subsidies on overall calories and calories from differ-

ent food groups, the results in this paper suggest that quotas may be more effective than

price subsidies in improving nutrition via caloric intake. Results are robust to alternative

specifications of the value of the subsidy and the impact of the program on non beneficiary

households is examined and found to be small.

State level differences in the administration of the PDS are taken into account using ev-

idence on program fidelity from Khera (2011a). As expected, the subsidy has a substantially

smaller impact in those states where the illegal diversion of food grains away from their in-

tended beneficiaries is reported to be high. Despite serious implementation issues, the PDS

continues to be one of the government’s biggest anti-poverty programs and an expansion of

the system is imminent under the National Food Security Bill, which was passed in Septem-

ber 2013. The bill will expand both the reach, and the value of subsidies provided through

the PDS. The elasticity estimates from this paper, combined with current data on prices,

suggest that the increase in the value of the subsidy under the provisions of the bill will

lead to an increase of 66 - 72 kcal in the daily caloric intake of current beneficiaries of the

5

program. These gains will be much higher if states are able to successfully reduce leakages

from the PDS.

The paper is organized as follows: Section 2 describes the functioning, history and rules

of the Public Distribution System in India. The motivation is described in section 3, which

comprises the conceptual framework and a review of the related literature. The data, empir-

ical specification and descriptive statistics are presented in section 4 and section 5 discusses

the results. The final section concludes and describes avenues for future work.

2. The Public Distribution System in India

The Public Distribution System was established in India in 1939 by the then British govern-

ment to cope with rising food prices and food shortages in Bombay and other urban areas.

Over the years it expanded its scope dramatically and is currently one of the Indian gov-

ernments most significant welfare programs. In the 2012-13 central government budget, the

food subsidy was projected at Rs 750 billion (approximately US$13.8 billion, Government

of India 2012). The PDS provides a minimum support price to farmers and acts as a food

safety net for the rural and urban poor. It works alongside the free market and provides

rice, wheat, edible oils, sugar and kerosene at subsidized prices through 489,000 fair price

shops across the country (Planning Commission 2012).

Prior to 1997, any household could purchase a monthly quota of subsidized products at

fair price shops. In 1997, there was a major change in the PDS with the introduction of the

Targeted Public Distribution System (TPDS). Under the new system, families classified as

6

being below the poverty line (BPL) could get 10 kg of food grains per month at half the

economic cost to the central government of procuring them. This quota was increased in

2000 to 20 kg per month (rice and/or wheat) and in 2002, the quota was revised upwards to

35 kg for most states. The subsidy for above poverty line (APL) families was eliminated.5

Table 1 presents the details of state specific quotas for the period 2002-2008. Kochar (2005)

estimates that the value of the subsidy, defined as the product of the quantity entitlement

and the difference between the market and PDS price, increased from Rs 7 per household

per month in 1993 to Rs 48 per household per month in 2000 (an increase of over 500%).

Thus the TPDS introduced targeting and substantially increased the subsidy for BPL fam-

ilies. Currently, PDS prices are fixed by the central government and state governments are

only allowed to add limited transportation costs and taxes. PDS prices are not indexed in

any way to market prices. 24.3 million households classified as Antyodaya (poorest of the

poor) are entitled to a larger quota and still lower prices. The next big change in the PDS

is imminent as per the provisions of the National Food Security Bill, which was passed in

September 2013. This bill makes it a legal right for 67% of the population to obtain 5 kg of

foodgrains (per person per month) at prices between Re 1 and Rs 3 per kilogram. The eldest

woman in the household will be given access to the subsidy on behalf of her family (The

Hindu, July 2013). The resulting expansion in the reach and value of subsidies provided

through the PDS could make it the largest food security program in the world.

The PDS is an essential part of the social framework in many areas of the country, particu-

larly places with limited access to markets. However, the implementation of the program has

5Tamil Nadu continues to have a universal subsidy and, since June 2011, provides 20 kg of rice per monthfree of cost to BPL households.

7

been called into question on a number of dimensions.6 The first is the targeting accuracy of

the PDS post 1997. Local governments are responsible for periodically carrying out surveys

to assign poverty scores to households. The central government provides a cap on the number

of BPL households for each state. State governments translate these caps into cut offs for

the poverty score. All households that fall below their respective state- and district-specific

cut off are classified as being below the poverty line and are issued BPL cards (Karat 2013

and Dreze and Khera 2010). In 2002, the BPL survey comprised 13 questions related to

educational status, asset holdings, indebtedness, demographics etc. Targeting leads to er-

rors of inclusion and exclusion. The reported rate of exclusion (eligible households without

a BPL card) varies from 3% in Andhra Pradesh to 47% in Assam (Planning Commission

2005). In an independent study on Rajasthan, Khera (2008) finds an exclusion error of 44%.

Swaminathan (2008) also reports high rates of exclusion, particularly in states like Kerala

that switched from a universal PDS. Kochar (2005) argues that targeting reduced political

support for the program.

The second issue is corruption in the PDS, particularly through diversion of food grains

at different points along the distribution chain. This has been often highlighted in newspa-

per editorials, magazines, government reports and journals, though accurate estimates are

difficult to obtain. The Planning Commission (2005) estimates that the government of In-

dia spends Rs 3.65 to get Re 1 worth of subsidy to a BPL household. Based on a survey

conducted in 2001, the report concluded that 58% of food grains procured by PDS did not

reach their intended beneficiaries due to a combination of inaccurate targeting and diver-

sion. Khera (2011a) puts this number at 44% in 2007-08 by comparing official figures for

6A discussion of how the analysis deals with these implementation issues is presented in sections 3.3 and5.5.

8

food grains distributed through the PDS, with data on purchases from household surveys.

She divides states into those that are functioning, reforming, or languishing in terms of the

average quantity of food grains reaching households. Of late, significant state level differ-

ences in the working of the PDS have emerged and some states are improving delivery. A

nine state survey by Khera (2011b) conducted in May and June 2011 finds that over 85%

of the monthly entitlement was received by beneficiaries in these states. In November 2012,

the Indian government announced that a number of subsidy programs such as scholarships,

cooking fuel subsidies, pensions and unemployment benefits would be converted into direct

cash transfers in a phased manner starting in January 2013. This move is an attempt to

reduce inefficiencies and corruption in the implementation of various welfare schemes. Food

grains in the PDS are not yet a part of the proposed switch but some states, such as Bihar,

Madhya Pradesh and Delhi, are conducting pilot studies and have expressed an interest in

making that change.

3. Motivation

3.1 Conceptual Framework

A price discount with a quantity cap is typically represented by an outward shift of the

budget constraint. Figure 1 presents this shift in the household budget set with food on the

horizontal axis and the rupee value of non-food consumption on the vertical axis. Let the

price of non food purchases be normalized to Re 1. A subsidy comprising a discount of δ and

a quota Q shifts the budget set from NF to NCD. As demonstrated in Moffitt (1989) and

Deaton (1984), the household will choose segment I if its indifference curve is tangent at any

point (say, A) on the segment CD and a similar argument holds for tangency on segment

9

II. If tangency is not achieved on either segment, the household will choose the kink C. The

food subsidy program in India varies geographically and by family size in the amount of the

quota, the discount and the resulting value of the subsidy. Following Kochar (2005) and

Khera (2011c), these program rules are presented below by slightly modifying the standard

utility maximization problem.

Let the utility function for household i be Ui = f((1 − α)xi + yi, zi;Ti), where xi repre-

sents the subsidized food, yi represents food purchased from the market, zi is non-food

purchases and Ti denotes tastes which could depend on age, gender, composition and other

household characteristics. The subsidized and market food purchases are substitutes and

α (0 ≤ α ≤ 1) represents any (multiplicative) transaction costs associated with buying the

former. Let px, py and pz be the prices faced by the household where px= (1 − δ)py with

δ being the discount (0 ≤ δ ≤ 1). Let Q be the quota and Mi the household income. The

household’s maximization problem is:

maxxi,yi,zi

f((1− α)xi + yi, zi;Ti)

subject to:

pxxi + pyyi + pzzi ≤ Mi

xi ≤ Q

xi ≥ 0, yi ≥ 0, zi ≥ 0

10

The resulting Lagrangian is:

L = maxxi,yi,zi,λ,γ,µ1,µ2,µ3

f((1− α)xi + yi, zi;Ti) + λ(Mi − pxxi − pyyi − pzzi)

+γ(Q− xi) + µ1xi + µ2yi + µ3zi

Solving the first order conditions provides the demand functions for specific parameter val-

ues. The solution (x∗i , y

∗i , z

∗i ) will belong to one of the following four cases.

Case 1 : x∗i = 0, y∗i ≥ 0, z∗i ≥ 0

α > δ, MUy

MUz= py

pz(First order conditions)

Transaction costs (α) far outweigh the discount benefit (δ) and the household does not make

any purchases from the PDS (i.e. non participation).

Case 2 : 0 < x∗i < Q, y∗i = 0, z∗i ≥ 0

α < δ, MUxMUz

= pxpz

(First order conditions)

The household meets its entire food requirement through the PDS and does not need to

make any purchases from the market. This case is represented by point B in figure 1.

Case 3 : x∗i = Q, y∗i = 0, z∗i ≥ 0

α < δ, MUxMUz

> pxpz, MUy

MUz< py


The household would like to purchase more food at the discounted rate, but not at the mar-

ket price. This case refers to point C in figure 1, i.e. locating exactly at the kink.

11

Case 4 : x∗i = Q, y∗i > 0, z∗i ≥ 0

α < δ, MUy

MUz= py


The quota is binding and the household supplements its food with market purchases. This is

represented by point A in figure 1. Note that voluntary under-purchase, where 0 < x∗i < Q

and y∗i > 0 could take place if α = δ. However, this will never occur if there is even an

infinitely small (�) fixed cost associated with going to the fair price shop and thus, under-

purchase is likely driven by supply issues and collapses to case 4.

3.2 Expected Effect of the PDS

Post 1997, both the discount and the quantity cap in the PDS for BPL families were sub-

stantially increased. Khera (2011c) finds that participation in the program has dramatically

risen since the early 2000’s. She confirms that the conditions for case 1, where eligible house-

holds make no purchases from the PDS, are unlikely to be realized. Cases 2 and 3, where

households make purchases from the PDS but none from the market are also unlikely to

be relevant. The PDS is intended as a supplementary program and does not seek to meet

the entire food requirement of beneficiary households (Government of India, 2011). Kochar

(2005) and Khera (2011c) find that the quota is low enough to necessitate supplementary

purchases of food from the market for all households. The data from the NSSO surveys also

confirm that cases 1-3 are not supported. As shown ahead in section 4, all PDS beneficiary

households make food purchases in the market.7

The special case of households voluntarily under-purchasing from the PDS and buying addi-

tional food in the market is also not likely to hold. Khera (2011c) finds that the quantity of

7PDS rice expenditure accounts for less than 8% of the total food expenditure for a typical BPL household.

12

cereal purchased from the PDS does not respond to income, but overall cereal purchase does.

She also checks for the impact of other household characteristics on PDS purchases and finds

no effect. She concludes that under purchase is largely driven by supply side issues.8 Thus

based on program rules, past studies and summary statistics, the most relevant case is 4,

where the subsidy can be viewed as providing extra income to the household.9 This extra

income is given by the horizontal distance FD in figure 1: Si = (py − px)Q.

For a low enough quota and transaction costs less than the discount, the quota will bind

and the household will purchase food and other items in the market. To see this, consider a

Cobb-Douglas utility function of the form f = ((1− α)xi + yi)1/2z1/2i .10 The solution to the

household’s problem is:

y∗i =Mi −Q(2− α− δ)py

2py, z∗i =

Mi +Q(δ − α)py2pz

, x∗i = Q

conditional on:

Q < Mi(2−α−δ)py

(Quota < threshold value)

α < δ (Transaction costs < discount)

8In the IHDS (2005) data, of the poor households that had not accessed the PDS in the last 6 months,over 57% reported supply side constraints as the primary issue.

9This is also in agreement with opinions recently expressed in the popular press. Since all beneficiaryhouseholds purchase additional grain from the market, Arvind Panagariya believes that the proposed expan-sion of the PDS will have the same effect as a cash transfer of equal value (The Times of India, June 2013).Ashok Kotwal agrees that the PDS ’..is nothing but an income transfer.’ (The Economic Times, June 2013).

10Cobb-Douglas has the unique property that the proportion of the budget spent on food will be indepen-dent of the price of other commodities.

13

The total food consumption (F ∗i = y∗i +Q) is:

F ∗i =

Mi +Q(δ − α)py2py

where ∂Fi∗∂Q > 0 , ∂Fi∗

∂δ > 0 and ∂Fi∗∂α < 0.

Thus, the total food consumption is an increasing function of the quota and discount, and a

decreasing function of transaction costs. This motivates the use of a regression framework

taking into account the program rules, value of the subsidy, transaction costs and tastes.

Inefficiencies and corruption in the administration of the program would manifest in the

form of a higher α (poor quality, long waiting time, inconvenience, distance to FPS) or a

lower actual Q (supply side constraints).

3.3 Evidence on Food Subsidies and Nutrition

Food subsidies are an important policy tool and their impact on household outcomes such

as nutrition, caloric intake, early life development and income stabilization has been exam-

ined in both developed and developing countries. The national food stamps program (FSP)

in the US (renamed SNAP in 2008) has been studied extensively.11 Between 1961-75 the

program was phased in with different counties adopting it at different times. Hoynes and

Schanzenbach (2009) exploit this difference in timing to evaluate the impact of the program

and find that it led to an overall increase in food expenditures, as expected. Butler, Ohls

and Posner (1985) find very small effects of the FSP on the nutrient intake of the eligible

elderly, either through stamps or cash.

11See Hoynes and Schanzenbach (2009) for a review of the large literature on the food stamps program.

14

Despite lower per capita incomes and greater malnutrition, most studies in the context

of developing countries also find very small effects of food subsidies on nutritional outcomes.

Jensen and Miller (2011) conduct an experiment in rural China to estimate the impact of

price subsidies for rice and wheat and find no evidence that subsidies improve nutrition.

Laderchi (2001) finds that food transfers are no more successful at improving child nutrition

than other sources of income in Peru. In a survey of 300 households in Rajasthan, Khera

(2011c) finds that being eligible for PDS food grains does not lead to higher overall cereal

consumption. Kochar (2005) uses the change in 1997 from universal to targeted PDS to es-

timate the impact on calories consumed by rural households. She finds that the elasticity of

caloric intake with respect to the value of the subsidy is very low (0.06 on average.) Tarozzi

(2005) studies a sudden increase in the price of PDS rice in Andhra Pradesh and concludes

that it did not have a big impact on nutritional status and child anthropometrics.

While the effect of the PDS on nutrition has been found to be quite small, some limita-

tions of the identification strategies used in previous studies could cause their results to be

biased in the direction of finding no effect.12 Kochar (2005) uses variation in the value of the

subsidy that is determined by BPL status. The measure of eligibility for the newly targeted

program (BPL status) is imputed and not actually observed in the data which results in

errors of mis-classification. In practice, BPL status is determined using individual poverty

scores and region specific cut offs, both of which vary over time. Even within a particular

region, imputing BPL status using multi-dimensional indicators of poverty is problematic.

Niehaus et al. (forthcoming) survey households in Karnataka which were identified as po-

tential PDS beneficiaries in the government BPL survey. Using the criteria for 2007 (based

12Jensen and Miller (2011) raise similar concerns regarding the methodologies employed by Kochar (2005)and Tarozzi (2005).

15

on eight measures), they find that 13% of eligible households do not possess a BPL card,

while 70% of legally ineligible households do. Thus any exercise in imputing BPL status will

have significant mis-measurement resulting in biased estimates. Further, the time frame of

Kochar’s study is two to three years before and after the targeted program was introduced.

Take up rates were low (6%- 14%) during that time and the program was much less gener-

ous. Since 1997 and particularly during the early 2000’s, the PDS has undergone enormous

changes. There has been a push to increase the value of the subsidy to BPL households and

participation in the PDS has gone up.13 Finally, Kochar’s analysis also suffers from a bias

in the opposite direction. Since BPL status is a prerequisite for other types of government

assistance, using the BPL dummy could confound the effect of the PDS with the impact of

other government programs.14 Khera’s (2011c) study also uses variation due to BPL sta-

tus, and does not exploit differences in the value of the subsidy. Tarozzi’s (2005) analysis

focusses on anthropometrics for children below age 4; identification comes from variation

in the length of exposure to higher PDS prices but does not use differences in the quantity

entitlement. The length of exposure varies only from one to three months and the actual

receipt of benefits is not observed. His analysis focusses on the pre-targeted PDS in 1992,

when the program was much less generous. This paper adds to the literature by being the

first to compute a household specific value of the subsidy using state level program rules and

13Round 61 (2004-05) of the NSSO surveys provides information about the BPL status of a household.The data from this round suggest that participation is not driven by income or idiosyncratic householdcharacteristics that could potentially affect food consumption. 68% of BPL households in 2004-05 reporthaving accessed the PDS in the last 30 days. In data from the India Human Development Survey (2005),over 95% of BPL households report having accessed the PDS in the last 6 months. Thus, as the value ofthe program to BPL households has increased, there has been a corresponding increase in participation.Non-BPL participation in the PDS has fallen dramatically. In the 2004-05 round of the NSSO surveys,0.06% of PDS users were non-BPL. This decline has also been noted by Khera (2011a). Similar behaviourhas been observed in the context of other welfare programs – Brian Mc Call (1995) finds that take up ofunemployment benefits increases as the individual level recipient amounts increase.

14Assistance programs include loans, scholarships, medical benefits, distribution of bicycles etc.

16

local market prices.15

4. Data and Empirical Strategy

4.1 Data and Sample

The National Sample Survey Organisation (Ministry of Statistics and Program Implemen-

tation) has conducted socio-economic surveys every one to two years since the 1950s. The

surveys are repeated cross sections and collect detailed expenditure information. They are

nationally representative, conducted year round to avoid seasonal biases and form the basis

for the government’s official poverty estimates. The NSSO data have been used to study

the PDS and other national programs such as the employment guarantee scheme as well as

issues of inequality, poverty and land relations (Kochar 2005, Khera 2011a, Dutta et.al 2012,

Deshpande 2000 and Sundaram and Tendulkar 2003). This paper uses six years of data from

July 2002 to June 2008.16

The NSSO surveys split expenditure data into several sub categories such as food items,

beverages, durable goods, medical expenditure, educational expenditure, conveyance and

rent. Other variables include family size, number of children, industry classification of the

head of the household, location, land owned, own production, type of dwelling, religion and

social group. Data are collected on the age, gender, education level, marital status and

15Using the value of the subsidy, as opposed to a dummy for eligibility, also enables a comparison withthe results for price subsidies in Jensen and Miller (2011).

16In 2002, PDS allocations were increased from 20 kg to 35 kg per family, per month in most states.Some states started making changes (increases and decreases) in the PDS quotas in 2008. In the absence ofcredible official records of all the changes that ensued, I focus on the relatively stable period between 2002and 2008 for which reliable state level program rules are available.

17

relation to the head of each household member. For goods that are available through the

PDS, households report spending in terms of cost and quantity separately for PDS and

non-PDS sources. This enables the classification of households into PDS beneficiaries and

non-beneficiaries based on actual utilization as opposed to imputing BPL status. The final

analytic sample for this paper comprises the current participants of the PDS.17

Regional preferences for cereals are distinct and strong in India. Atkin (forthcoming) pro-

poses climatic conditions and habit formation as possible reasons for these distinct tastes

and previous studies on the PDS (Kochar 2005, Khera 2011c) have focussed either on rice

or wheat.18 Following the literature, I consider the eight states (151 districts) that have rice

as their dominant staple and a generous rice subsidy.19 These states offer a very small or no

subsidy for wheat. Though the main focus is on rice, the wheat subsidy in these states is used

as a validity check and is found to have a very small impact. Due to data availability and

reliability issues, it is standard practice to focus on the major Indian states. Tamil Nadu, a

state in southern India, is distinct from the rest of the country in that it has a universal PDS.

All households, irrespective of their income, are eligible for the program, making it difficult

to compare their outcomes with the rest of the country since the impact at the bottom of the

income distribution could be very different from its impact at the top. Thus, in the absence

of a credible way to classify households into BPL and APL, and given that the program is

so different in this one state, I drop Tamil Nadu from the sample. The states in the final

17As discussed earlier, non participation in the PDS is not likely to be driven by household specific factors.To the extent that some fraction of households with BPL cards do not avail of the program, strictly speaking,my analysis estimates the effect on eligible households that participate in the program.

18Jensen and Miller’s (2011) results on the impact of price subsidies in China vary substantially by provincesuggesting that the two staples (rice and wheat) should be analysed separately. In this paper, I examine therice subsidy and a similar analysis can be undertaken for wheat.

19Honyes and Schanzenbach (2009) also restrict their sample to limited subgroups of the population thathave a higher probability of participation in the food stamps program.

18

sample are: Andhra Pradesh, Assam, Chattisgarh, Jharkhand, Karnataka, Kerala, Orissa

and West Bengal. As a robustness check, I estimate the impact of the program in all the

major Indian states. As expected, the rice subsidy has a much smaller impact in the non-rice

favoring states. Aside from regional preferences, the rice subsidy specifically is of interest

because there is some evidence that rice may be a giffen good for poor households (Jensen

and Miller 2008). Finally, the rice subsidy in India is cleaner to study because the amount

of illegal diversion of wheat grains is much higher (Khera 2011a).

Prices are computed using households’ reported expenditures and quantities for over 150

food items. For every item, prices can be determined by dividing expenditure by quantity.

Though technically speaking these are unit values, which can give rise to measurement error

and concerns of differentiated products in terms of quality, it is common in the literature to

use them as proxies for prices (Subramaium and Deaton 1996, Kochar 2005, Atkin 2013). In

the context of this study, these issues pose much less of a threat because prices for PDS and

market rice are computed by averaging the district-season-year specific price as reported by

BPL households.20 Each household is assigned this average price and not its self reported

unit value. Further, only prices reported by similar households, i.e. beneficiaries of the PDS,

are used to compute the local average price. Thus the local average does not include prices

paid by much wealthier households, who might be purchasing higher quality rice. Following

Atkin (forthcoming), median local prices are used to calculate the value of the subsidy as a

check. The results are robust to using median prices, which are not likely to be influenced

by quality and outliers. All prices are in 2005 rupees.

20Every NSSO survey period (one year) is subdivived into four or six sub-rounds/waves. This is done toensure that seasonal patterns in consumption can be studied.

19

The conversion of food purchases into per capita caloric intake is done using standard factors

(NSSO 1996) for each of the food items in the survey and total monthly household calories

are converted into per capita daily amounts.21 While this may not be ideal due to potential

differences between purchases and actual intake, it is an approximation used not just in the

literature, but also by the government to estimate the income poverty line based on caloric

intake. 22 Deaton (1997) discusses the merits and limitations of several types of equivalence

scales to convert household consumption into per person quantities. For practical applica-

tions, a simple weight is suggested as a reasonable approximation. All per capita values are

thus calculated using an equivalence scale that assigns 0.5 weight for children below 15 years.

This allows for a correction in the per capita amounts for larger families that are likely to

have more children (who consume fewer calories).23

4.2 Variation in Value of the Subsidy

The value of the subsidy for each household is calculated as the product of the local price

discount and the state specific per capita quota. The discount is the difference between

the average market and average PDS price, at the district-season-year level. The quota is

calculated based on program rules of the state of residence and household size.24

PerCapV alSubijswt = (Pmktjwt − P pds

jwt ) ∗Qis

21Subramanium and Deaton (1996) attempt to correct this conversion from the NSSO surveys to accountfor the number of meals given to guests. However, since there is no way to determine the composition ofthose meals, caloric consumption from different food groups is not adjusted and neither is the expenditure.

22As discussed in Jensen and Miller (2011), data from more detailed individual food intake diaries raisetheir own set of concerns of validity and accuracy.

23Results remain qualitatively the same if a simple per capita variable is used instead. Elasticities computedat the household level are also found to be similar.

24As discussed ahead, I use the national average family size to compute the value of the subsidy for everyhousehold as a robustness check.

20

where i= household, j= district, s= state, w= season, t= year and Qis= state-household

size specific quota for household i.

Prices

Recall that the central government sets a PDS price for the year and states are allowed to

add transport and distribution costs to the final price that households pay. The PDS price is

not indexed to market prices, which are in turn determined by demand and supply side fac-

tors.25 The diverse nature of India’s terrain and climate combined with its large area results

in substantial geographic variation in the prices of agricultural commodities. As discussed in

Jacoby (2013), controls on the inter state trade of food grains lead to substantial differences

in prices across states. Deaton (1997) finds important interregional differences in prices

using NSSO data. Kochar (2005) and Atkin (forthcoming) ascribe the within state, inter

district variation in market prices of food grains to transportation costs and state specific

market controls.26 Agricultural prices also vary by season and as a result of local conditions

and unpredictable weather phenomena.27 Thus, as a result of the government’s policy of

maintaining a fixed (within year) PDS price across regions in India, the difference between

the market and PDS price varies for every district-season-year cell.28

25In a study conducted in Delhi, SEWA (2009) finds that 31% of respondents report that PDS pricesremain low, but market prices keep rising.

26Wadhwa (2001) details the restrictions on the marketing and movement of agricultural goods in India.27Planning Commission’s (2001) extensive study on the price behaviour of rice and wheat confirms sea-

sonality of prices. Sekhar (2003) reports a higher intra year variability in domestic agricultural prices ascompared to international prices which suggests incomplete integration of markets within the country.

28Antyodaya households are identified as being the poorest of the poor and they pay even lower prices. Inround 61 (2004-05) of the NSSO surveys and the IHDS (2005), these households are identified and summarystatistics confirm that they pay 30% lower prices on average at the PDS shop (appendix Table C). Theycomprise less than 10 % of the BPL sample in my data. In the full dataset, it is not possible to identify thesehouseholds and this source of variation is not explicitly used in the analysis. As a check, I use data fromround 61 to include an interaction between the value of the subsidy and an Antyodaya dummy and find noevidence of a differential impact of the subsidy.

21

Table 2 presents the spread of the discount across and within states for different seasons.

The state level average discount varies between 38% (Assam in the monsoon season) and

67% (Karnataka in winter). These rates are in line with the government’s policy of offering

food grains at approximately half of the procurement cost. Within any particular season

and state, there is substantial variation in the district level discount. Figure 2 confirms

this variation graphically for each district-season-year cell in the sample. Panel A plots the

discount data from 2002 for each of the 151 districts in the sample. There is heterogeneity

within states in the discount across districts. For instance, in 2002, the discount in Assam

varies from 15% to about 45% in the monsoon season, with a state level average of 38%. The

average discount in Andhra Pradesh is 55% but the spread (47% to 65%) is much smaller as

compared to Assam. A similar pattern of within state variation in the seasonal discount is

seen for the other six years (panels B to G, Figure 2) .

Quotas

Different states set different quotas for rice and wheat ranging from 0 to 35 kg per household

per month, as shown in Table 1. Some states do not index the quota to household size and

for those that do, there is an upper limit on the maximum amount (except in West Bengal).

Thus the per person quota varies by household, depending on its size and state of residence.29

Table 3 combines the two sources of variation and presents the spread of the per capita

value of the subsidy for the sample under study. As expected, the spread of the per capita

value (standard deviation) is the smallest in West Bengal and Andhra Pradesh, both of

29An examination of the average quantities bought via the PDS confirms that this variation is realized inpractice (Appendix Table A).

22

which index the subsidy to household size. The average subsidy in West Bengal is less than

half of the subsidy in Andhra Pradesh. Overall, the value varies across seasons, states and

within states.

4.3 Descriptive Statistics

Table 4 presents descriptive statistics for PDS users and the full NSSO sample averaged

over all six years. The monthly per capita expenditure for PDS users is lower than that

for the full sample, as expected. The average per capita caloric intake for PDS users is

2190 kcal. Given that over 78% of PDS users live in rural areas, this average is well below

the minimum daily requirement (2100 kcal for urban and 2400 kcal for rural areas). PDS

users spend a higher fraction of their total monthly expenditure (58%) on food. Scheduled

Caste/Scheduled Tribe/Other Backward Castes comprise 76% of PDS users.

The PDS price is 50% lower than the market price. On average, each household buys

more rice in the market than what they receive through the program – the PDS contributes

41% of the total rice consumed. Thus, the PDS is an important source, but not one that

meets the entire food requirements of the household; almost three-fourths of the total food

expenditure is on items other than rice.30 Per capita cereal expenditure is a little over one

fourth of food expenditure, but cereals contribute over 72% of the total calories consumed.

30The fact that households also purchase rice in the market suggests that resale of PDS rice throughconsumers is not a serious concern.

23

4.4 Empirical Specification and Assumptions

The basic equation to estimate the impact of the subsidy on household outcomes is

Yijswt = α + βPerCapV alSubijswt +Xijswtγ + δj + χw + θst + εijswt (1)

Yijswt represents the outcome variable (such as per capita caloric intake) for household i in

district j, state s, season w and year t. PerCapV alSubijswt is the per capita value of the

subsidy to the household based on program rules and local prices. Xijswt is a vector of house-

hold characteristics, δj and χw control for district-specific and seasonal effects respectively,

and θst are state*year dummies. Standard errors are clustered at the district level, as all

households in a district are subject to the same rules determining the per person monthly

quota.

Household characteristics in X include education of the head of the household and the

spouse, a quadratic in the age of the household head, the proportion of females in the house-

hold, land holdings (proxy for income) and urban location. Behrman and Deolalikar (1988)

provide a comprehensive review of the literature on the demand for nutrients in developing

countries and find that education levels, size and demographic composition of the households

are important determinants of the demand for calories.31 While most studies on the PDS

have focussed on the rural poor, the program is also critical in urban areas and the nutri-

31Though the dependent and main explanatory variable in equation (1) are both in per capita terms, theremay still be economies of scale within a household that are not accounted for by this specification. I checkfor any effect of household size in the following ways. First, I use an alternative weighting scheme to generateper person variables for the household. Second, I use household level caloric intake and value of the subsidyto estimate the impact and explicitly control for family size. Third, I shut down the variation due to familysize by using quotas based on the national average family size. Finally, even though household size as aregressor in equation (1) is potentially endogeneous, as an additional check, I find that adding size to theregression does not qualitatively change the results.

24

tional needs of households in the two areas are very different. District and seasonal dummies

are included to take regional and climatic effects on the demand for calories into account.

The model is saturated by introducing state*year dummies that control for any differences

in state level effects and other assistance programs that BPL households in a particular state

might be eligible for. With district, season and state*year controls, any variation in the price

discount is due to unpredictable shocks to prices as a result of random weather phenomenon,

arbitrary controls on the movement of goods and imperfectly integrated agricultural markets.

The parameter of interest from equation (1) is β, the coefficient on the value of the sub-

sidy. The validity of the model rests on the assumption that after controlling for household

characteristics and geographic and seasonal effects, the value of the subsidy (price discount*

quota) is exogenous to unobservable factors that may affect the demand for food. If this

assumption holds, the value of the subsidy should have a negative (through market price) or

zero effect on the caloric consumption of households that do not use the program. To check

for this, I perform a falsification test using non-PDS users.32

The analysis makes two other substantive assumptions. First, the model assumes that

household size is exogenous to the state level quota, i.e. households do not adjust fam-

ily size according to more or less generous program rules. While the program is important,

it is not likely to affect the fertility or migration decisions of households. On average, the

value of the subsidy is 4% of the total monthly expenditure of the household. I check for the

validity of this assumption by using the national average family size instead of the actual

size of the household to compute the value of the subsidy and find that the results remain

32Tables 12-15 ahead present the results of multiple robustness checks.

25

qualitatively the same. I also check for the independent effect of family size by using total

household caloric intake as the outcome and explicitly controlling for family size. Second,

the analysis assumes that the demand for calories or rice by any one household does not

affect the market price that it faces i.e. the local price in its district-season-year cell. In the

absence of this assumption, the coefficients would suffer from simultaneity bias. Treating the

household as a price taker is a standard assumption from the theory of competitive markets.

The agricultural market in India has thousands of producers and consumers, which makes

this assumption fairly reasonable in the context of this study.

5. Results

5.1 Impact on Cereal Consumption and Caloric Intake

To study the impact of the PDS on nutrition, three outcome variables are considered: per

capita cereal consumption, per capita caloric intake and per capita calories from different

food groups. Cereals are a cheap and critical source of calories and the PDS is responsible

for providing a substantial amount of cereals to vulnerable households.33 Column 1 of Table

5 presents results for the main specification with per capita cereal intake as the outcome

variable. An increase in the monthly subsidy by Rs 10 leads to a 20.3 gm (60 kcal/day)

increase in the daily consumption of cereals. Based on average market prices, Rs 10 (per

month) can buy an extra 30.2 gm of cereals every day. This suggests that infra marginal

households prefer to spend part of the extra income on foods other than cereals or on non-

food items. The specification in column 1 assumes that an (absolute) increase in the value

33Most cereals have 3-3.5 kcal/gm. See Appendix Table B for a comparison of the price per calorie fordifferent food groups.

26

of the subsidy has the same effect, irrespective of the base value. Evaluated at the means,

the estimate from column 1 implies an elasticity of 0.12, which is similar to the estimate

from the log-log regression in column 2– 0.123. Given that the elasticity is small and the

sample comprises only BPL households, it is not surprising that the estimates are so close.34

Though small, the elasticity is positive and significant and confirms that the subsidy has a

real impact on cereal consumption, as expected. In order to examine the impact of each of

the components of the subsidy and determine their relative importance, column 3 splits the

value of the subsidy into quota, prices and interaction terms. The coefficient on quota is

insignificant, confirming that households are infra-marginal since the amount of the quota

doesn’t determine overall cereal consumption. The coefficient on market price is negative, as

expected. The interaction between the amount of the quota and the market price is positive

and significant, suggesting that a higher quota increases cereal consumption more, and is

thus more valuable, when the market price of rice is higher.

Though the estimates suggest that the program has a positive effect on cereal consumption,

households may be substituting some of the income gains away from cereals. Combining the

analysis for calories with the results on cereals helps to determine the impact on overall food

intake. Column 1 in Table 6 indicates that a Rs 10 increase in the value of the rice subsidy

results in an increase an increase of 126 kcal/day.35 This is more than double of the impact

34The levels estimate assumes that the impact of an absolute increase in the value of the subsidy isindependent of where in the distribution the household is located (linear relationship). In this sample,households are homogeneous to the extent that they all lie below the poverty line and receive a positive foodsubsidy. The log-log specification assumes a non-linear relationship and a constant elasticity. However, foran elasticity of 0.12, the resulting curve is very flat, i.e. almost linear, over the range of the value of thesubsidy, cereal consumption and caloric intake.

35In order to compare the estimates, I convert the gains in daily cereal consumption from Table 5 intocaloric gains. The 95 % confidence interval from Table 5 implies an increase between 51 kcal/day and 69kcal/day. This does not overlap with the confidence interval from Table 6 which is 104 kcal/day to 148kcal/day. Thus, the overall gains in caloric intake are greater than the increase in calories from cereals alone.

27

on calories through increased cereal consumption (60 kcal/day from by column 1 in Table 5),

suggesting that the subsidy works by increasing calories through not just cereals, but other

foods as well. This is an important indicator that households benefit from the program in

terms of overall food intake and not just through the food grains directly provided by the

PDS.

Table 6 also presents estimates of the cross food group elasticities.36 In China, Jensen

and Miller (2011) find negative effects of an experimental price subsidy for rice on fruits and

vegetables and on pulses (lentils). In contrast, all elasticities here are positive and significant.

Thus, the results suggest that provision of a food subsidy via quotas has a positive impact

on all food groups, similar to what would be expected from an income transfer whereas the

pure price subsidy causes households to consume less of the subsidized good and fewer overall

calories.

5.2 Comparison with the Expenditure Elasticity of Calories

The PDS subsidy is supplementary in nature and thus is expected to operate through the

income effect. The elasticity of calories with respect to the value of the subsidy is 0.144

from column 2 of Table 6. Though small, it lies between the historical estimates of income

elasticity of calories for India, which range from 0 to 0.34. Behrman and Deolalikar (1990)

find the income elasticity of calories for rural south India to be very low: 0.01. Subramanian

and Deaton (1996) use NSSO data and report an expenditure elasticity of calories of 0.34 for

rural households. To compare my results with these earlier studies, I estimate the elasticity

of calories with respect to total expenditure for rural PDS households in my sample. The

36For the sample in this paper: 5% of total calories come from lentils, 5% from fruits and vegetables andapproximately 1% from meat, fish and other animal products.

28

expenditure elasticity is close to the higher estimates in the previous literature: 0.4 from

column 1 in Table 7. Even though the elasticity of caloric intake with respect to the subsidy

is lower than the expenditure elasticity (0.140 for the rural sample), it is still positive and

significant.37

The fact that the PDS acts like an income transfer raises the question of its impact on

price paid per calorie i.e., the extent to which a rise in income induces households to sac-

rifice caloric intake for more expensive and less calorie rich foods. Behrman and Deolalikar

(1989) suggest that the income elasticity of calories is smaller than the income elasticity

of food expenditures because of changes in the shape of the food indifference curve as in-

come rises. They find that even relatively poor households value variety. Subramanian and

Deaton (1996) report the total expenditure elasticity of expenditure on food as 0.75 and the

elasticity of price paid per (1000) calories as 0.35. These numbers are very close to the esti-

mates for the sample used in this paper (columns 3 and 5 in Table 7). The elasticity of food

expenditure with respect to the value of the subsidy is lower at 0.146, but it is positive and

significant. The subsidy elasticity of price per calorie is not significantly different from zero.

Thus the value of the subsidy is not large enough to raise concerns of households sacrificing

calories for taste.

One issue with the NSSO data (and any estimates based on them) is that they do not

contain information on income. Table 8 presents alternate estimates of the elasticity of

37It is not surprising that the elasticity of caloric intake with respect to the value of the subsidy is lowerthan the expenditure elasticity. This is true for two reasons: First, transaction costs associated with goingto the fair price shop, such as irregular supply, inconvenient timings and long queues would make the impactof the subsidy smaller. A pure increase in income does not have any such costs associated with it. Second,this estimate takes into account leakages and inefficiencies in the system since it is based on actual programrules.

29

cereal consumption using income data from the India Human Development Survey 2005.38

The survey is nationally representative, covers 41,554 households and collects detailed house-

hold information on demographics, income, debt, insurance and consumption. The income

elasticity of cereals for PDS users is much lower (0.046 in column 1, panel A of Table 8)

than the elasticity with respect to the rice subsidy which is 0.295 in column 2. The income

elasticity of food expenditure is also much lower than the elasticity of food expenditure with

respect to the value of the subsidy from columns 3 and 4 of panel A. While income is a

reflection of more permanent characteristics of a household such as age and education of

the head, location and occupation, the value of the subsidy fluctuates from month to month

depending on market prices. Not all income is consumed, part of it saved or used to purchase

durable goods, thus it is not surprising that an increase in the value of the subsidy has a

bigger impact on cereal consumption than an increase in income. Further, income tends to

be under-reported and the resulting measurement error would bias the estimates downwards.

The IHDS dataset allows me to check if the household characteristics used in the main

regression are a valid control for income. Column 2 of panel A in Table 8 estimates the

elasticity with respect to the value of the subsidy controlling for income, location, district

and season effects. In column 3 of panel B, I run the same regression, but instead of explic-

itly including income, I add the education of the head of the household and the spouse, a

quadratic in the age of the household head, the proportion of females in the household and

land holdings. The estimates are similar, which validates the use of these characteristics in

place of income.

38This survey is jointly conducted by National Council of Applied Economic Research in Delhi and theUniversity of Maryland (Desai, Vanneman, & National Council of Applied Economic Research, 2005). I usethe IHDS dataset to construct variables that are identical to variables from the NSSO data. I also restrictthe sample to the eight rice favoring states used in the main analysis.

30

The food module of the IHDS surveys is similar to the NSSO, but is much less detailed.

To check for any effect of differences in survey methodologies, in panel B of Table 8, I es-

timate identical regressions for the IHDS and data from the NSSO covering the same time

period (November 2004 - October 2005). The estimates for expenditure elasticity of cereals

from the two datasets are similar: 0.247 and 0.269 in columns 1 and 2 of panel B.39 The

estimates of the elasticity of cereal consumption with respect to value of the subsidy across

the two samples (panel B columns, 3 and 4) are also similar, though the IHDS estimate

is slightly larger. The NSSO sample is larger than the IHDS, but the two don’t differ on

observable characteristics.40 PDS prices are lower, and consequently the subsidy value is

marginally higher on average (Rs 25.46) in the IHDS sample as compared to the NSSO data

(Rs 24.17).41 While it is difficult to say with certainty why the estimates differ, the IHDS

results do not qualitatively change the implications of the main analysis. At best, they

suggest that the estimates from the NSSO data may be a lower bound on the impact of the

program.

5.3 Heterogeneous Impacts

Households that benefit from the PDS are similar to the extent that they all lie below the

poverty line. However, the program could have differential impacts on certain sub-groups of

the population. Table 9 checks for heterogenous impacts of the subsidy on urban households,

39The expenditure elasticity of cereal consumption (column 2 of panel B, table 8) is lower than theexpenditure elasticity of calories (column 1, table 7). Deaton and Dreze (2009) find a similar pattern andattribute it to the fact that a higher fraction of the marginal rupee spent on food goes towards non-cerealfood items.

40Appendix Table D presents descriptive statistics for the two samples.41Price data in the IHDS is collected directly, by asking households for their estimate of the average market

price of various goods over the last 30 days.

31

traditionally marginalized communities, farming households and households in the lowest ex-

penditure quartile. While urban households face different patterns of price changes compared

to rural ones and may have different mechanisms to cope with food insecurity, (rice) farming

households may be more or less affected by market prices depending on whether they are net

buyers or sellers in the market. Traditionally marginalized communities may face discrimina-

tion in terms of access to the program and households in the lowest expenditure quartile may

lack sufficient income to fully leverage the subsidy provided through the program. Residing

in an urban area has a negative and significant impact on cereal consumption (column 1).

This is expected as urban calorie requirements are lower on average. However, the program

does not have a significantly different impact in urban areas. Columns 2 and 3 check for

the impact on cereal consumption of traditionally marginalized communities (by caste) and

the poorest quartile of the sample, respectively. There is no significantly different impact

for the SC/ST/OBC category. Households in the lowest expenditure quartile consume fewer

cereals, but the subsidy does not have a differential impact on them.

The program has a strong, positive effect on the cereal consumption and caloric intake

of farmers, i.e. households that report consuming some of their home grown rice (columns

4 and 8). Given that these households need to rely less on the market and are not likely

to make cereal purchases, the subsidy acts like a direct in-kind transfer of extra grains. As

there is no income gain, these household can not substitute cereals with other types of food

or other goods. For the sample of farmers, Table 10 splits the subsidy into its components

and confirms that market price is not a significant determinant of caloric intake (column 2)

and marginally significant for cereal consumption (column 1). This is in contrast to results

for the overall sample (column 3 of Table 5) where market price was seen to have a significant

32

and negative impact. The interaction between market price and per capita quota has the

expected positive sign and is significant.

From column 7 of Table 9, the impact of the subsidy on the caloric intake of the lowest

income quartile is significantly lower, compared to other households. This group represents

the poorest of the poor. For an increase in the value of the subsidy, these households appear

to substitute away from food towards other non food items. Table 11 presents elasticities

of cereal consumption and caloric intake for the four income quartiles of PDS users. The

elasticities increase with income, suggesting that the poorest households (also likely to be

the most credit constrained) use the extra income generated by the subsidy to purchase

non food items.42 This is consistent with the hypothesis in Sen (2005) that an increase in

the availability, demand and prices of non-food essential goods and services has resulted in

squeezing of the food budget of poor families.

5.4 Robustness and Sensitivity Checks

The identification strategy rests on the assumption that there are no underlying factors that

simultaneously affect the value of the subsidy and food consumption of households. To test

the validity of this assumption, I perform a falsification test by checking for the impact of

the local average value of the PDS subsidy on the cereal consumption and caloric intake of

households that do not receive subsidies from the PDS. From Table 12, it is clear that the

program has no impact on the consumption of Non-PDS beneficiaries. The only (marginally)

significant effect is the elasticity of cereal consumption. As expected, this is negative since

42There is no difference in terms of quantity of rice purchased from the PDS across expenditure quartiles,implying that even households in the lowest quartile purchase the full PDS entitlement and the lower impacton their cereal consumption is not driven by an inability to fully avail of the PDS discount.

33

a higher value of the subsidy (resulting from a higher market price of rice) would negatively

affect cereal consumption of households that purchase cereals only in the market.

Table 13 presents results from a series of robustness checks. Concerns about the possible

endogeneity of household size are addressed in columns 1 and 2. In column 1, the national

average family size is used to calculate the value of the subsidy instead of the actual size and

composition of each household. Column 2 presents the elasticity results at the household

level, with a control for household size. The estimates from both are very close to the main

result (0.144) which supports the assumption that household size is not influenced by the

program. To check that the results are not driven by the equivalence scale used in the main

analysis, column 3 uses a simple per capita estimate. The results remain qualitatively simi-

lar to those from the main analysis, though using a simple per capita without correcting for

the composition of the household makes the elasticity appear larger. Averaging across the

market price of rice within a district-season-year cell should not raise concerns of significant

differences in quality, since households in the sample all lie below the poverty line. However,

I check for this possibility by using the median prices to calculate the value of the subsidy

and find that the results remain the same (column 4). Finally, I check the sensitivity of my

results to seasonal price patterns by utilizing survey waves/sub-rounds. Replacing state*year

and season dummies with state*survey wave dummies does not change the estimates, as seen

in column 5.

Table 14 checks for the impact of the rice subsidy on three different samples – All India,

the rice favoring states and non-rice favoring states. As expected, the elasticity is much

smaller in states where rice is not the main staple, even though some of them offer a small

34

subsidy on rice. This justifies using the smaller sample of rice favoring states to get a more

valid estimate of the impact of the rice subsidy program. Conversely, Table 15 checks for

the impact of the (much smaller) wheat subsidy in the main sample of rice favoring states.

The effect of the wheat subsidy on both cereal consumption and caloric intake is extremely

small and its inclusion makes the effect of the rice subsidy a bit larger. To the extent that

the value of the subsidy is higher when the market price of rice (and possibly other goods)

is higher, this is expected because the wheat subsidy provides a control for the overall level

of market prices.

5.5 Issues in Implementation and Policy Changes

Corruption

The PDS has been criticized for various types of inefficiencies and corruption and there are

striking state-wise differences in implementation. Khera (2011a) estimates the extent of

diversion in food grains using a combination of NSSO data and administrative records on

grain allocation at the state level. She categorizes states into those that are performing (well)

and others that are reforming or languishing.43 Table 16 presents results for the impact on

cereal consumption and caloric intake. Columns 1 and 2 suggest that the impact on cereals

and caloric intake is almost 50% lower in states that are more corrupt.44 This is a huge cost

of inefficiency and is particularly troubling given the high levels of malnutrition that persist

in India.43Of the eight states in the sample, three are categorized as functioning well: Andhra Pradesh, Karnataka

and Kerala.44These estimates provide suggestive, not causal evidence of the effect of corruption as there could be

many other state level factors that make the program less (more) effective in more (less) corrupt states.

35

The National Food Security Bill

The PDS is the government’s flagship program for improving nutritional outcomes and the

National Food Security Bill (NFSB), passed in September 2013, plans to expand it further.

Unless the enormous leakages in the system are plugged, the government will have to procure

a much higher quantity of food grains (than the target) in order to have the intended effect

on nutritional outcomes. A rough estimate of the impact of the NFSB can be made using

price data, program rules and the elasticity estimates from this paper. The NFSB assures 5

kg of food grains per person per month to 67 % of the population at Rs 3 per kg for rice, Rs 2

per kg for wheat and Re 1 per kg for coarse grains. BPL households are currently eligible for

an average of 4 to 5 kg per person per month.45 At current prices, it is estimated that the per

kilogram subsidy will rise from Rs 13.5 to Rs 16.5 (The Financial Express, September 2013).

Thus the NFSB does not substantially increase the quantity of food grains assured to BPL

households, but it does entail a big increase in the price discount they receive. Combined

with average caloric intake and current price data, the elasticity estimate from this paper

suggests that the bill will lead to a per person increase of 72 kcal/day in rural areas and

66 kcal/day in urban areas for the current beneficiaries of the program.46 This estimate is

based on program rules and thus takes into account the leakages and inefficiencies in the

system. If the expansion of the PDS is accompanied by better enforcement, especially in the

more corrupt states, the impact will be even higher.

The bill will also expand the beneficiary pool of the PDS to include 67% of the popula-

45Many state governments such as Tamil Nadu, Andhra Pradesh and Chattisgarh already have a moregenerous PDS than the national average and provide food grains to a large fraction of their population atthese low prices. Further, Antyodaya households are eligible for 35 kg per month at even lower prices.

46This estimate uses the approximate caloric intake of PDS rice users in rural (2260.25 kcal/day) andurban areas (2076.5 kcal/day) from the 2009-10 round of the NSSO Surveys (Government of India, 2013).

36

tion.47 While it is difficult to precisely predict how newly eligible households will respond

to the subsidy, there is some evidence to suggest that there will be a positive effect (similar

to that for current beneficiaries) on their caloric intake. Given that the new beneficiaries

are likely to be better off, the subsidy will not be their primary source of food grains and

thus, should have a positive effect on caloric intake through the standard income effect. This

hypothesis is supported by the results in Table 11, which show that the elasticity of caloric

intake is higher for households in the higher expenditure quartiles. There is a concern that

expansion in the reach of the program could result in the inclusion of some households that

have an income high enough to make the relative value of the subsidy too insubstantial to

have an effect. These households are also likely to have have lower rates of participation,

which would further reduce the overall impact of the program on caloric intake. Neither of

these concerns is found to relevant in Tamil Nadu, which is the only state in India that has

a universal PDS. The participation rate for the entire population of the state (averaged over

the six years) is high (61%) and the elasticity of caloric intake with respect to the value of

the subsidy is 0.2, which is higher than the national average (0.144). Tamil Nadu has a well

functioning PDS and the data from this state suggest that if implemented well, the program

can have a substantial impact on caloric intake for a wide range of households.

6. Conclusion

This paper examines the Indian Public Distribution System and presents evidence of its

impact on nutrition using variation in state specific program rules and fluctuations in local

market prices. In agreement with the literature on food subsidies, the elasticities for cereal

consumption and calories with respect to the value of the subsidy are small. However the

4744.5% of the population currently participates in the PDS (The Indian Express, July 2013).

37

results indicate that households benefit from the program in terms of food intake (calories)

and not this is not just through the food grains (cereals) directly provided by the PDS.

Even though the program provides a subsidy only on cereals, it has a positive effect on the

consumption of different food groups. Thus, the PDS subsidy generates an income effect

for households and is effective in improving nutrition. The results also confirm state wise

differences in the functioning and impact of the PDS. Finally, the elasticity estimates suggest

that the implementation of the National Food Security Bill will lead to an increase of 66 -

72 kcal in the daily caloric intake of current beneficiaries of the program.

Some states and regions in India have had much more success in implementing the PDS

than others. The Indian Human Development Survey is a rich socio-economic dataset that

makes it possible to study these regional differences in more detail. In additional to details

on household expenditure, income and fertility, indicators for learning skills and anthropo-

metrics for children are also collected. Examining district level outcomes using participation

and average value of the subsidy would be an additional dimension to studying the PDS.

This would speak to the significant geographic differences that have been noted in the func-

tioning of the system. Another unexplored aspect of the PDS is the protection it provides

from fluctuations in market prices. This would be an interesting angle to assess its value as

a source of food security. Reduced exposure to market risk could also affect a households

investments and labour market choices.

The PDS has been the topic of a lot of debate recently. The National Food Security Bill,

which was passed in September 2013, will give two thirds of the population the legal right

to obtain 5 kg of food grains (per person per month) at prices between Re 1 and Rs 3 per

38

kg The bill also has implications for intra-household bargaining, since it makes the eldest

woman in the household responsible for accessing the PDS on behalf of her family. Given the

enormous scale of the PDS, its expansion and imminent overhaul, it is important to study

the impact that it has in its present form.

References

[1] Atkin, D. (Forthcoming). Trade, Tastes and Nutrition in India.The American Economic

Review.

[2] Atkin, D. (2013). The Caloric Costs of Culture: Evidence from Indian Migrants (No.

w19196). National Bureau of Economic Research.

[3] Banerjee, A. V., and Duflo, E. (2011). Poor economics A radical rethinking of the way

to fight global poverty. PublicAffairs.

[4] Behrman, J. R., and Deolalikar, A. B. (1990). The intrahousehold demand for nutrients

in rural south India: Individual estimates, fixed effects, and permanent income. Journal

of human resources, 665-696.

[5] Behrman, J. R., and Deolalikar, A. (1989). Is variety the spice of life? Implications for

calorie intake. The Review of Economics and Statistics, 666-672.

[6] Behrman, J. R. and Deolalikar, A. B. (1988). Health and nutrition. Handbook of devel-

opment economics, 1, 631-711.

[7] Butler, J. S., Ohls, J.C., and Posner, B. (1985). The Effect of the Food Stamp Program on

the Nutrient Intake of the Eligible Elderly. Journal of Human Resources, 20(3):405-420.

39

[8] Deaton, A. (1984). Household surveys as a data base for the analysis of optimality and

disequilibrium. Sankhya: The Indian Journal of Statistics, Series B, 228-246.

[9] Deaton, A. (1997). The analysis of household surveys: a microeconomic approach to

development policy. Johns Hopkins University Press.

[10] Deaton, A., and Dreze, J. (2009). Food and nutrition in India: facts and interpretations.

Economic and political weekly, 42-65.

[11] Desai, S., Vanneman, R., and National Council of Applied Economic Research. (2005).

India Human Development Survey (IHDS). Computer File, ICPSR22626-v5. Ann Arbor,

MI: Inter University Consortium for Political and Social Research [distributor], 2009-622.

http://dx.doi.org/10.3886/ICPSR22626.

[12] Deshpande, A. (2000). Does caste still define disparity? A look at inequality in Kerala,

India. The American Economic Review, Vol. 90, No.2, pp. 322-325.

[13] Dreze, J., and Khera, R. (2010). The Bpl census and a possible alternative. Economic

and Political Weekly, 45(9), 54-63.

[14] Dutta, P., Murgai, R., Ravallion, M., and Van de Walle, D. (2012). Does India’s

employment guarantee scheme guarantee employment? World Bank Policy Research

Working Paper, (6003).

[15] The Economic Times (June 14 2013). Why the Food Security Bill is neither populist

nor unaffordable. Viewed on July 11 2013, http://articles.economictimes.indiatimes.com

/2013-06-14/news/39976460 1 national-food-security-bill-cash-transfers-food-subsidy

40

[16] FAO (1997). Safety nets to protect consumers from possible adverse effects Economic

and Social Development Department, Food and Agriculture Organization of the United

Nations.

[17] The Financial Express (September 2 2013). Correct Costs of the Food Security Bill

Viewed on September 3 2013, http://www.financialexpress.com/news/correct-costs-of-

the-food-security-bill/1156251/0

[18] Government of India (2013). Public Distribution System and Other Sources of Household

Consumption. NSS 66th Round. Report No. 545 (66/1.0/6). National Sample Survey

Office (Ministry of Statistics and Programme Implementation). Government of India,

New Delhi.

[19] Government of India (2012). Union Budget 2012-13. Expenditure Budget Volume I

(Ministry of Finance). Government of India, New Delhi.

[20] Government of India (2011). Annual Report 2010-2011. Department of food and public

distribution (Ministry of Consumer Affairs, Food and Public Distribution). Government

of India, New Delhi.

[21] Hoynes, H. W. and Schanzenbach, D. W. (2009). Consumption Responses to In-Kind

Transfers: Evidence from the Introduction of the Food Stamp Program. American Eco-

nomic Journal: Applied Economics, American Economic Association, vol. 1(4), pages

109-39, October.

[22] The Hindu (July 5 2013). President agrees to Food Bill Ordinance. Viewed on

July 9 2013, http://www.thehindu.com/news/national /president-agrees-to-food-bill-

ordinance/article4884834.ece

41

[23] The Indian Express (July 6 2013). Manmonia’s FSB: 3% of GDP. Viewed on September

3 2013, http://www.indianexpress.com/news/manmonias-fsb-3–of-gdp/1138195/1

[24] Jacoby, H. G. (2013). Food prices, wages, and welfare in rural India. World Bank Policy

Research Working Paper, (6412).

[25] Jensen, R. T., and Miller, N.H. (2008). Giffen Behavior and Subsistence Consumption.

The American Economic Review, 98:4, 15531577.

[26] Jensen, R. T. and Miller, N.H. (2011). Do consumer price subsidies really improve

nutrition? Review of Economics and Statistics, 93(4), p. 1205-1223.

[27] Karat, B. (2013). Food Matters: Law, Policy and Hunger. Prajashakti Book House.

[28] Khera, R. (2008). Access to the Targeted Public Distribution System: A Case Study

in Rajasthan. Economic and Political Weekly, Vol - XLIII No. 44, November 01.

[29] Khera, R. (2011a). Trends In Diversion Of PDS Grain. Economic and Political Weekly,

Vol - XLVI No. 21, May 21.

[30] Khera, R. (2011b). Revival of the Public Distribution System: Evidence and Explana-

tions. Economic and Political Weekly, Vol - XLVI No. 44-45, November 05.

[31] Khera, R. (2011c). India’s Public Distribution System: Utilization and Impact. Journal

of Development Studies, March 07.

[32] Kochar, A. (2005). Can Targeted Programs Improve Nutrition? An Empirical Analysis

Of India’s Public Distribution System. Economic development and cultural change, 54

(1), 20335.

42

[33] Laderchi, C. R. (2001). Killing Two Birds With The Same Stone ? The Effectiveness

Of Food Transfers On Nutrition And Monetary Poverty. Working Paper no. 74. QEH

working paper series.

[34] McCall, B. P. (1995). The Impact of Unemployment Insurance Benefit Levels on Re-

cipiency. Journal of Business and Economic Statistics, American Statistical Association,

vol. 13(2), pages 189-98, April.

[35] Moffitt, B. Y. R. (1989). Estimating the value of an In-kind Transfer: The case of Food

stamps. Econometrica, 57(2), 385-409.

[36] National Sample Survey Organisation (1996). Nutritional Intake in India. Report No.

405. National Sample Survey Organisation, Delhi.

[37] Niehaus, P., Atanassova, A., Bertrand, M., and Mullainathan, S. (forthcoming). Tar-

geting with Agents. American Economic Journal: Economic Policy.

[38] Planning Commission (2001). An Analysis of the price behavior of selected commodities.

National Council of Applied Economic Research, India.

[39] Planning Commission (2005). Performance Evaluation of Targeted Public Distribution

System (TPDS). Programme Evaluation Organisation. Government of India.

[40] Planning Commission (2012). Twelfth Five Year Plan, 2012-17. Government of India.

[41] Sen, P. (2005). Of Calories and Things: Reflections on Nutritional Norms, Poverty

Lines and Consumption Behaviour in India. Economic and Political Weekly, 40(43), 22

October, 4611-8.

43

[42] Sekhar, C. S. C. (2003). Volatility of agricultural prices-an analysis of major inter-

national and domestic markets. Indian Council for International Economic Relations.

Working paper no 103.

[43] SEWA (2009). Do poor people in Delhi want to change from PDS to cash transfers?

Self Employed Womens Association.

[44] Stifel, D. and Alderman H. (2006). The Glass of Milk Subsidy Pro- gram and Malnu-

trition in Peru. World Bank Economic Review, 20:3 (2006), 421448.

[45] Subramanian, S., and Deaton, A. (1996). The demand for food and calories. Journal of

political economy, 133-162.

[46] Sundaram, K. and Tendulkar, S. D. (2003). Poverty has declined in the 1990s: A

resolution of comparability problems in NSS consumer expenditure data. Economic and

Political Weekly, 327-337.

[47] Swaminathan, M. (2008). Programmes to Protect the Hungry: Lessons from India.

DESA Working Paper No. 70.

[48] Tarozzi, A. (2005). The Indian Public Distribution System as Provider of Food Security:

Evidence from Child Nutrition in Andhra Pradesh. European Economic Review, 49, p.

1305-1330.

[49] The Times of India (June 1 2013). Why the food security Bill will not boost

foodgrain consumption for the poor. Viewed on July 11 2013, http://articles.

timesofindia.indiatimes.com/2013-06-01/edit-page/39674597 1 cereal-consumption-

food-security-bill-households

44

[50] Wadhwa, M. (2001). Parking Space for the Poor: Restrictions Imposed on Marketing

and Movement of Agricultural Goods in India. Centre for Civil Studies.

45

Figure 1: Impact of a food subsidy on the budget set

15

Food

No

n F

oo

d

C

Q

Segment II

Segment I

B

A

N

F D

46

Figure

2:Variation

inPDSrice

discount

(2002-2008)

Pan

elA

020406080 020406080

Monso

on

Post M

onso

on Monso

on

Post M

onso

on Monso

on

Post M

onso

on Monso

on

Post M

onso

on

Assa

mW

est B

enga

lJh

arkh

and

Oris

sa

Chat

tisga

rhAn

dhra

Pra

desh

Karn

atak

aKe

rala

Rice discount ( % of Mkt. price )

Year

= 2

002

Pan

elB

020406080 020406080 Winter

Summer Monso

onPos

t Mon

soon

Winter

Summer Monso

onPos

t Mon

soon

Winter

Summer Monso

onPos

t Mon

soon

Winter

Summer Monso

onPos

t Mon

soon

Assa

mW

est B

enga

lJh

arkh

and

Oris

sa

Chat

tisga

rhAn

dhra

Pra

desh

Karn

atak

aKe

rala


Year

= 2

003

Pan

elC

020406080 020406080 Winter

Winter

Summer Monso

onPos

t Mon

soon

Winter

Summer Monso

onPos

t Mon

soon

Winter

Summer Monso

onPos

t Mon

soon

Winter

Summer Monso

onPos

t Mon

soon

Assa

mW

est B

enga

lJh

arkh

and

Oris

sa

Chat

tisga

rhAn

dhra

Pra

desh

Karn

atak

aKe

rala


Year

= 2

004

Pan

elD

020406080 020406080 Winter

Summer Monso

onPos

t Mon

soon

Winter

Summer Monso

onPos

t Mon

soon

Winter

Summer

Post M

onso

on

Monso

on

Summer

Winter

Monso

onPos

t Mon

soon

Assa

mW

est B

enga

lJh

arkh

and

Oris

sa

Chat

tisga

rhAn

dhra

Pra

desh

Karn

atak

aKe

rala


Year

= 2

005

Notes:1.

Rou

nd58

surveyed

hou

seholdsbetweenJu

lyan

dDecem

ber

2002,resultingin

noob

servationsfrom

thewinteran

dsummer

season

forthis

year.2.

Discount

calculatedas

(Mkt.price

-PDSprice)/Mkt.price*100

3.Averagesbased

onPDSan

dmarket

pricesreportedby

PDSusers

inthesample.

47

Figure

2:Variation

inPDSrice

discount

(2002-2008)

Pan

elE

020406080 020406080 Winter

Summer Monso

onPos

t Mon

soon

Winter

Summer Monso

onPos

t Mon

soon

Winter

Summer Monso

onPos

t Mon

soon

Winter

Summer Monso

onPos

t Mon

soon

Assa

mW

est B

enga

lJh

arkh

and

Oris

sa

Chat

tisga

rhAn

dhra

Pra

desh

Karn

atak

aKe

rala


Year

= 2

006

Pan

elF

020406080 020406080 Winter

Summer Monso

onPos

t Mon

soon

Winter

Summer Monso

onPos

t Mon

soon

Winter

Summer Monso

onPos

t Mon

soon

Winter

Summer Monso

onPos

t Mon

soon

Assa

mW

est B

enga

lJh

arkh

and

Oris

sa

Chat

tisga

rhAn

dhra

Pra

desh

Karn

atak

aKe

rala


Year

= 2

007

Pan

elG

020406080 020406080 Winter

Summer

Monso

onWint

er

Summer

Monso

onWint

er

Summer

Monso

onWint

er

Summer

Monso

on

Assa

mW

est B

enga

lJh

arkh

and

Oris

sa

Chat

tisga

rhAn

dhra

Pra

desh

Karn

atak

aKe

rala


Year

= 2

008

Notes:1.

Rou

nd63

surveyed

hou

seholdsbetweenJa

nuaryan

dJu

ne2008,resultingin

noob

servationsfrom

thepostmon

soon

season

forthis

year.2.

Discount

calculatedas

(Mkt.price


3.Averagesbased

onPDSan

dmarketprices

reportedby

PDSusers

inthesample.

48

Table 1: State Specific PDS Quotas for BPL households

State Rice (kg) Wheat (kg)

Andhra Pradesh 4 per person (20 hh max) 5 (at APL price*)Assam 20 0Bihar 15 15Chattisgarh 25 0Gujarat 1 per person (3.5 hh max) 1.5 per person (9 hh max)Haryana 10 25Himachal Pradesh 15 20Jharkhand 35 0Karnataka 16 4Kerala 8 per adult 4 per child (20 hh max) 5 (at APL price*)Madhya Pradesh 6 17Maharashtra 5 15Meghalaya 2 per person 0Orissa 16 0Punjab 10 25Rajasthan 5 25Uttar Pradesh 20 15West Bengal 2 per person 2 per person

Sources: Planning Commission (2005), Khera (2011b) and Simplifying the food security bill at

http : //bit.ly/PMNFSB. Notes: 1. * denotes all households pay Above Poverty Line prices i.e. no price

discount. 2. PDS entitlements are for the period under study: 2002-2008. 3. Tamil Nadu is excluded as it

follows a universal PDS.

49

Tab

le2:

Variation

inPDSRicediscount

(%)

Winter(J

anuary

-March)

Summer

(April-M

ay)

State

Mean

Std.Dev

p10

p50

p90

NMean

Std.Dev

p10

p50

p90

N

Assam

41.46

10.59

2040.04

56.45

233

41.78

11.11

6.17

41.82

54.17

163

WestBengal

47.79

1023.51

47.94

60.32

445

48.56

11.28

11.12

50.77

61.18

320

Jharkh

and

54.75

13.57

20.45

51.52

72.73

7356.62

11.6

2062.86

7068

Orissa

42.88

11.76

18.33

44.12

59.42

544

44.58

12.01

13.33

44.57

57.92

427

Chattisgarh

45.15

13.83

17.65

41.04

67.09

265

49.14

17.95

1649.53

73.33

178

Andhra

Pradesh

53.36

5.52

39.91

53.52

60.24

2952

56.13

9.96

37.39

54.53

72.83

2134

Karnataka

66.92

11.6

23.39

69.82

77.32

1239

66.86

11.63

35.56

69.67

78.18

791

Kerala

45.16

10.17

27.39

42.8

59.89

970

47.03

12.42

18.18

45.82

63.74

620

Monso

on

(June-Sep

tember)

Post

Monso

on

(October-D

ecem

ber)

State

Mean

Std.Dev

p10

p50

p90

NMean

Std.Dev

p10

p50

p90

N

Assam

38.1

13.92

4.55

35.48

60283

39.44

13.41

17.65

36.93

59.13

230

WestBengal

43.29

11.9

7.82

43.56

57.19

618

43.35

12.77

6.41

43.13

58.5

531

Jharkh

and

52.73

17.17

6.25

54.83

7576

54.47

14.22

16.67

54.69

71.33

52Orissa

40.63

10.11

16.59

40.42

53.48

693

40.3

10.9

16.67

3955.48

623

Chattisgarh

42.48

12.94

22.5

41.98

63.69

398

41.21

13.67

11.33

39.48

58.43

324

Andhra

Pradesh

52.94

8.01

36.13

53.05

58.99

3952

52.4

5.53

40.98

52.15

60.23

3307

Karnataka

58.22

15.45

24.38

62.13

75.07

1697

60.2

15.04

14.67

65.5

751335

Kerala

42.77

10.55

18.7

42.32

58.22

1452

41.65

10.71

18.26

39.27

58.66

1217

Sou

rce:

Calculation

susing2002-2008NSSO

Socio-E

conom

icSurveys.

Notes:1.

Discount

calculatedas

(Mkt.price


2.Averagesbased

onPDSan

dmarketpricesreportedby

PDSusers

inthesample.

50

Tab

le3:

Variation

intheper

capitavalueof

thePDSRiceSubsidy(R

s)

Winter(J

anuary

-March)

Summer

(April-M

ay)

State

Mean

Std.Dev

p10

p50

p90

NMean

Std.Dev

p10

p50

p90

N

Assam

23.52

11.11

8.64

20.94

38.67

233

26.56

14.4

3.86

23.47

43.42

163

WestBengal

11.44

3.27

4.62

11.01

16.13

445

11.9

3.37

2.21

1215.63

320

Jharkh

and

52.11

29.69

14.35

44.38

8473

60.51

38.41

15.12

53.2

9868

Orissa

18.89

12.55

4.14

15.98

31.97

544

18.95

11.43

416.48

29.99

427

Chattisgarh

33.15

27.58

8.28

25.55

52.71

265

35.66

29.53

5.19

29.39

61.99

178

Andhra

Pradesh

25.36

6.23

12.21

25.18

33.44

2952

26.95

7.81

11.94

25.96

36.35

2134

Karnataka

35.89

21.56

6.93

31.74

58.24

1239

37.93

25.26

8.32

31.49

63.76

791

Kerala

31.11

14.14

7.27

29.72

49.81

970

33.07

15.31

7.8

29.86

53.1

620

Monso

on

(June-Sep

tember)

Post

Monso

on

(October-D

ecem

ber)

State

Mean

Std.Dev

p10

p50

p90

NMean

Std.Dev

p10

p50

p90

N

Assam

24.38

13.98

2.7

2241.81

283

26.12

18.65

7.2

21.89

41.63

230

WestBengal

10.76

3.55

210.55

14.74

618

10.88

3.83

1.52

10.69

15.61

531

Jharkh

and

45.45

29.08

7.17

38.04

7776

54.03

31.11

7.7

45.53

79.8

52Orissa

17.07

9.69

4.34

15.18

28693

17.94

11.33

4.85

14.87

28.8

623

Chattisgarh

32.51

23.28

7.02

24.72

59.79

398

31.2

26.64

3.73

24.55

54.93

324

Andhra

Pradesh

25.87

6.99

12.09

25.27

34.72

3952

26.1

6.55

12.43

25.6

34.94

3307

Karnataka

32.62

20.37

6.63

28.71

55.37

1697

33.71

21.71

4.79

29.1

56.25

1335

Kerala

29.89

13.48

7.7

28.28

48.89

1452

29.12

12.81

6.34

27.64

46.27

1217

Sou

rce:

Calculation

susing2002-2008NSSO

Socio-E

conom

icSurveys.

Notes:1.

Valueof

thesubsidycalculatedas

Per

capitaQuota*(M

kt.price

-PDSprice)2.

Averagesbased

onPDSan

dmarketprices

reportedby

PDSusers

inthesample.

51

Table 4: Descriptive statistics for the full sample and PDS users

Sample: Full Sample PDS users

Mean Standard Mean Standarddeviation deviation

Monthly expenditure per capita (Rs) 1011.0 (1085.3) 636.8 (393.4)Daily calories per capita (kcal) 2334.2 (1300.9) 2190.9 (623.7)Proportion spent on food 0.547 (0.142) 0.577 (0.116)

Size of the household 4.570 (2.382) 4.736 (1.891)Number of children below 15 1.411 (1.428) 1.538 (1.339)Proportion of women 0.515 (0.207) 0.512 (0.152)Age of household head 46.58 (13.61) 45.38 (12.17)

Urban dummy 0.363 (0.481) 0.219 (0.414)Electricity 0.724 (0.447) 0.725 (0.446)Permanent home 0.338 (0.473) 0.314 (0.464)SC/ST/OBC 0.592 (0.491) 0.765 (0.424)

PDS rice price (Rs/kg) 5.271 (1.855)Mkt rice price (Rs/kg) 10.80 (2.193)

PDS rice qty (kg) 18.64 (9.392)Market rice qty (kg) 26.11 (20.24)

Food expenditure per capita (Rs) 400.5 (168.9)Cereal expenditure per capita (Rs) 116.0 (43.85)Rice subsidy per capita (Rs) 25.71 (11.90)

Rice proportion of food expenditure 0.260 (0.130)

Proportion of calories from rice 0.615 (0.175)Proportion of calories from cereals 0.727 (0.0987)

Observations 124228 22564

Notes: 1. Rural Poverty line is Rs 497.6, Urban Poverty line is Rs 635.7 (Planning Commission,

Government of India). 2. Average daily minimum calorie requirements are 2400 kcal for rural

and 2100 kcal for urban areas. 3. All prices in 2005 Rupees. (Rs 45.3 = 1 USD in 2005). 4. The

category cereals includes rice, wheat, semolina, jowar, bajra, millets, corn, barley and ragi. 5. The

sample comprises households in the following states: Andhra Pradesh, Assam, Karnataka, Kerala,

Orissa, Jharkhand, Chattisgarh and West Bengal.

52

Table 5: Impact of the subsidy on cereal consumption

Dependent variable: Cereal consumption Log cereal consumption Cereal consumption(1) (2) (3)

Rice subsidy per capita 2.030∗∗∗

(0.158)

Log rice subsidy per capita 0.123∗∗∗

(0.00963)

Rice quota per capita -1.968(4.442)

Market price* quota per capita 1.697∗∗∗

(0.436)

PDS price* quota per capita 0.157(0.463)

PDS price -1.607(2.849)

Market price -6.131∗∗

(2.395)

Observations 22564 22564 22564Adjusted R2 0.250 0.270 0.258

Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01

Notes: 1. All equations present results clustered at the district level. 2. All equations include household

characteristics (education of household head and spouse, age and age squared of household head, proportion

of females, land owned) and urban, state*year, district and season dummies. 3. Dependent variable in

columns (1) and (3) is daily cereal consumption per capita (grams), dependent variable in column (2)

is log of daily cereal consumption per capita.

53

Tab

le6:

Impactof

thesubsidyon

calories

from

differentfood

grou

ps

Dep

endentvariab

le:

Caloric

intake

Log

caloricintake

Log

caloricintake

from

food

grou

pCereals

Lentils

Fruits&

Vegetab

les

Meat

(1)

(2)

(3)

(4)

(5)

(6)

Ricesubsidyper

capita

12.58∗

∗∗

(1.116)

Log

rice

subsidyper

capita

0.144∗

∗∗0.123∗

∗∗0.154∗

∗∗0.234∗

∗∗0.170∗

∗∗

(0.0103)

(0.00963)

(0.0177)

(0.0160)

(0.0187)

Observations

22564

22564

22564

22118

22562

19833

Adjusted

R2

0.124

0.166

0.270

0.215

0.441

0.426

Standarderrors

inparentheses.

∗p<

0.10,∗∗

p<

0.05,∗∗

∗p<

0.01

Notes:1.

Allequationspresent

resultsclustered

atthedistrictlevel.2.

Allequationsincludehou

seholdcharacteristics(education

ofhou

sehold

headan

dspou

se,agean

dagesquared

ofhou

seholdhead,proportion

offemales

landow

ned)an

durban

,state*year,districtan

dseason

dummies.

3.Dep

endentvariab

lein

column(1)is

daily

caloricintake

per

capita(kcal),dep

endentvariab

lein

column(2)is

logof

daily

caloricintake

per

capita,

dep

endentvariab

lein

column(3)is

logof

calories

from

cerealsper

capita,

dep

endentvariab

lein

column(4)is

logof

calories

from

lentilsper

capita,

dep

endentvariab

lein

column(5)is

logof

calories

from

fruitsan

dvegetablesper

capita,

anddep

endentvariab

lein

column(6)is

logof

calories

from

meatper

capita.

54

Tab

le7:

Exp

enditure

elasticity

ofcalories

andfood

expenditure:Ruralsample

Dep

endentvariab

le:

Log

caloricintake

Log

food

expenditure

Log

Rupeesper

calorie

(1)

(2)

(3)

(4)

(5)

(6)

Log

mon

thly

expenditure

per

capita

0.406∗

∗∗0.751∗

∗∗0.345∗

∗∗

(0.0103)

(0.00883)

(0.0102)

Log

rice

subsidyper

capita

0.140∗

∗∗0.146∗

∗∗0.00575

(0.0135)

(0.0153)

(0.00632)

Observations

13333

13333

13333

13333

13333

13333

Adjusted

R2

0.437

0.157

0.820

0.404

0.675

0.527

Standarderrors

inparentheses.

∗p<

0.10,∗∗

p<

0.05,∗∗

∗p<

0.01

Notes:1.

Allequationspresent

resultsclustered



seholdcharacteristics

(education

ofhou

seholdheadan

dspou

se,agean

dagesquared

ofhou


offemales,landow

ned)an

d

state*year,districtan

dseason

dummies.

3.Dep

endentvariab

lein

columns(1)an

d(2)is

logof

daily

caloricintake

per

capita,

dep

endentvariab

lein

columns(3)an

d(4)is

logof

food

expenditure

per

capita,

dep

endentvariab

lein

columns(5)an

d(6)is

logof

price

paidper

1000

calories.5.

Asin

Subraman

ianan

dDeaton(1996),thesample

comprises

only

ruralhou

seholdsan

dmon

thly

expenditure

iscalculatednet

ofpurchases

ofdurable

good

s.

55

Table 8: Income elasticity of cereal and food consumption

Panel A: IHDS data

Dependent variable: Log cereal consumption Log food expenditure

(1) (2) (3) (4)

Log monthly income per capita 0.0462∗∗∗ 0.0304∗∗∗ 0.0966∗∗∗ 0.0827∗∗∗

(0.00972) (0.00959) (0.0108) (0.0105)

Log rice subsidy per capita 0.295∗∗∗ 0.259∗∗∗

(0.0323) (0.0323)

Observations 3962 3962 3962 3962Adjusted R2 0.306 0.354 0.402 0.429

Panel B: IHDS and NSSO data

Dependent variable: Log cereal consumptionData: IHDS NSSO IHDS NSSO

Log monthly expenditure per capita 0.247∗∗∗ 0.269∗∗∗

(0.0178) (0.0264)


(0.0314) (0.0390)


Standard errors in parentheses.∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01

Notes: 1. All equations present results clustered at the district level. 2. All equations in panel A include

urban, district and season dummies. All equations in panel B include household characteristics (education

of household head and spouse, age and age squared of household head, proportion of females, land owned)

and urban, district and season dummies. 3. Dependent variable in columns (1) and (2) of panel A and

columns (1)-(4) of panel B is log of cereal consumption per capita, dependent variable in columns (3) and

(4) of panel A is log of food expenditure per capita. 4. The data come from the India Human Development

Survey 2004-05 and the NSSO surveys covering November 2004 - October 2005.

56

Tab

le9:

Impactof

thesubsidyon

cereal

consumption

andcaloricintake:Heterogeneity

bysocial

grou

p

Dep

endentvariab

le:

Cerealconsumption

per

capita

Caloric

intake

per

capita

(1)

(2)

(3)

(4)

(5)

(6)

(7)

(8)

Ricesubsidyper

capita

1.979∗

∗∗1.892∗

∗∗1.830∗

∗∗1.960∗

∗∗12.41∗

∗∗11.61∗

∗∗11.55∗

∗∗12.28∗

∗∗

(0.172)

(0.209)

(0.173)

(0.159)

(1.214)

(1.046)

(1.309)

(1.142)

Urban

dummy

-31.61

∗∗∗

-24.97

∗∗∗

-30.90

∗∗∗

-24.86

∗∗∗

-82.32

∗∗∗

-63.93

∗∗∗

-104.0

∗∗∗

-59.30

∗∗∗

(6.412)

(3.026)

(3.084)

(3.030)

(27.98)

(13.92)

(13.52)

(13.91)

Urban

*Ricesubsidy

0.240

0.772

(0.241)

(1.054)

SC/S

T/O

BC

1.488

-76.34

∗∗∗

(5.767)

(27.20)

SC/S

T/O

BC*R

icesubsidy

0.186

1.242

(0.205)

(1.001)

Low

estexpenditure

quartile

-52.27

∗∗∗

-334.8

∗∗∗

(6.058)

(29.14)

Low

estqu

artile*R

icesubsidy

-0.106

-2.961

∗∗

(0.205)

(1.169)

Hom

egrow

nrice

-15.73

∗-41.21

(8.247)

(39.68)

Hom

egrow

n*R

icesubsidy

1.300∗

∗∗5.893∗

∗∗

(0.328)

(1.609)

Observations

22564

22564

22564

22564

22564

22564

22564

22564

Adjusted

R2

0.250

0.250

0.274

0.251

0.124

0.124

0.183

0.126

Standarderrors

inparentheses.

∗p<

0.10,∗∗

p<

0.05,∗∗

∗p<

0.01

Notes:1.

Allequationspresent

resultsclustered



seholdcharacteristics(education

ofhou

sehold

headan

dspou

se,agean

dagesquared

ofhou


offemales,landow

ned)an

durban

,state*year,districtan

dseason

dummies.

3.Dep

endentvariab

lein

columns(1)-(4)is

daily

cereal

consumption

per

capita(grams),dep

endentvariab

lein

columns(5)-(8)is

daily

caloricintake

per

capita(kcal).

57

Table 10: Impact of the subsidy on rice producing households

Dependent variable: Cereal consumption Caloric intake

(1) (2)

Rice quota per capita -14.15 -38.18(18.56) (84.41)

Market price* quota per capita 4.483∗∗ 19.52∗∗

(1.792) (7.896)

PDS price* quota per capita -0.989 -5.916(1.976) (8.667)

PDS price 3.226 38.62(10.98) (49.72)

Market price -17.15∗ -52.60(8.760) (40.03)

Observations 2111 2111Adjusted R2 0.231 0.241


Notes: 1. The equation present results clustered at the district level. 2. The equation includes household

characteristics (education of household head and spouse, age and age squared of household head,

proportion of females, land owned) and urban, state*year, district and season dummies. 3. Dependent

variable in column (1) is daily cereal consumption per capita (grams), dependent variable in column (2)

is daily caloric intake per capita (kcal). 4. The sample comprises PDS households that report

own (produced) rice as one of their sources of supplementary grains.

58

Table 11: Elasticity of cereal consumption and caloric intake by expenditure quartile

Panel A: Cereal consumption

Dependent variable: Log cereal consumption per capitaExpenditure quartile: Lowest Highest

(1) (2) (3) (4)

Log rice subsidy per capita 0.0689∗∗∗ 0.0895∗∗∗ 0.109∗∗∗ 0.175∗∗∗

(0.0138) (0.0147) (0.0137) (0.0180)


Panel B: Caloric intake

Dependent variable: Log caloric intake per capitaExpenditure quartile: Lowest Highest

Log rice subsidy per capita 0.0682∗∗∗ 0.0933∗∗∗ 0.111∗∗∗ 0.196∗∗∗

(0.0135) (0.0132) (0.0129) (0.0182)





proportion of females, land owned) and urban, state*year, district and season dummies. 3. In going

from columns (1) to (4), the sample comprises households in the lowest, second lowest, second

highest and highest expenditure quartile of PDS rice users, respectively. 4. Dependent variable in

panel A is log of cereal consumption per capita, dependent variable in panel B is log of daily

caloric intake per capita.

59

Table 12: Impact on Non-PDS users

Dependent variable: Cereal cons. Log cereal cons. Caloric intake Log caloric intake(1) (2) (3) (4)

Rice subsidy per capita -0.0706 0.462(0.115) (0.772)

Log rice subsidy per capita -0.0141∗ -0.00863(0.00750) (0.00686)






variable in column (1) is daily cereal consumption per capita (grams), dependent variable in

column (2) is log of daily cereal consumption per capita, dependent variable in column (3) is daily caloric

intake per capita (kcal), dependent variable in column (4) is log of daily caloric intake per capita.

4. The sample comprises households that do not receive any subsidies from the PDS. They are assigned

their respective local average value of the PDS subsidy.

60

Table 13: Alternative specifications for value of the subsidy

Dependent variable: Log caloric intake(1) (2) (3) (4) (5)

Log rice subsidy (avg. family size) 0.134∗∗∗

(0.00763)

Log rice subsidy (household level) 0.131∗∗∗

(0.0147)

Size of the household 0.134∗∗∗

(0.00243)

Log rice subsidy (per person) 0.202∗∗∗

(0.00992)

Log rice subsidy (median prices) 0.119∗∗∗

(0.0103)

Log rice subsidy per capita 0.148∗∗∗

(state*survey wave) (0.0107)

Observations 22564 22564 22564 22543 22564Adjusted R2 0.173 0.613 0.200 0.159 0.168


Notes: 1. All equations present results clustered at the district level. 2. All equations include

household characteristics (education of household head and spouse, age and age squared of household

head, proportion of females land owned) and urban and district dummies. Equations (1)-(4) include

state*year and season dummies. Equation (5) includes state*survey-wave dummies. 3. Dependent

variable in columns (1), (4) and (5) is log of daily caloric intake per capita, dependent variable

in column (2) is log of daily caloric intake at the household level, dependent variable in

column (3) is log of daily caloric intake at the household level divided by the total number

of household members.

61

Table 14: Impact on different sub-samples of states

Dependent variable: Log cereal consumption Log caloric intakeStates: All Rice Non Rice All Rice Non Rice

(1) (2) (3) (4) (5) (6)

Log rice subsidy per capita 0.0796∗∗∗ 0.123∗∗∗ 0.0439∗∗∗ 0.101∗∗∗ 0.144∗∗∗ 0.0666∗∗∗

(0.00677) (0.00963) (0.00664) (0.00701) (0.0103) (0.00675)

Observations 33231 22564 10667 33231 22564 10667Adjusted R2 0.263 0.270 0.255 0.197 0.166 0.255





variable in columns (1) - (3) is log of daily cereal consumption per capita, dependent variable in columns

(4)-(6) is log of daily caloric intake per capita. 4. The rice favoring states are: Andhra Pradesh, Assam,

Chattisgarh, Jharkhand, Karnataka, Kerela, Orissa and West Bengal. The non-rice favoring states are:

Bihar, Gujarat, Haryana, Himachal Pradesh, Madhya Pradesh, Maharashtra, Punjab, Rajasthan

and Uttar Pradesh.

Table 15: Impact of the wheat subsidy

Dependent variable: Log cereal consumption Log caloric intake(1) (2)


(0.0110) (0.0125)

Log wheat subsidy per capita 0.0172∗∗∗ 0.0270∗∗∗

(0.00549) (0.00519)






variable in column (1) is log of daily cereal consumption per capita, dependent variable in

column is log of daily caloric intake per capita.

62

Table 16: Impact of the subsidy on cereal consumption and caloric intake: State levelfunctioning

Dependent variable: Cereal consumption per capita Caloric intake per capita(1) (2)

Rice subsidy per capita 2.359∗∗∗ 12.04∗∗∗

(0.180) (1.148)

Corrupt*Rice subsidy -1.207∗∗∗ -7.060∗∗∗

(0.304) (1.480)



Notes: 1. All equations present results clustered at the district level 2. All equations include household


proportion of females, land owned) and urban, state-year, district and season dummies. 3. Dependent

variable in column (1) is daily cereal consumption per capita (grams), dependent variable in column (2)

is daily caloric intake per capita (kcal).

63

Appendix Tables

A. Per capita purchase of PDS Rice

Mean (kg) Std. Dev. N

Assam 7.29 3.5 909

West Bengal 2.82 2.46 1914

Orissa 6.60 3.63 2287

Jharkhand 4.74 2.41 269

Chattisgarh 9.09 4.07 1169

Andhra Pradesh 5.07 2.50 12345

Karnataka 5.05 2.49 5062

Kerala 5.50 3.49 4259

Full Sample 5.34 3.1 28214

Note: The sample comprises PDS rice users.

B. Price paid per calorie by food group

Foodgroup: Cereals Lentils Vegetables Meat

Mean (Rs/1000 kcal) 2.40 8.46 18.89 41.45Std. Dev. 0.63 2.80 6.48 19.42

Note: Price calculations based on quantity and value reported by PDS rice users.

1

C. BPL and Antyodaya Households - IHDS and NSSO 61st round data

Sample: BPL AntyodayaData: IHDS NSSO IHDS NSSO

Monthly expenditure per capita (Rs) 660.3 558.5 485.3 478.9(539.4) (306.4) (481.2) (309.5)

Monthly income per capita (Rs) 606.5 471.0(785.6) (370.1)

PDS rice price (Rs/kg) 4.475 5.399 3.329 3.461(1.493) (1.290) (0.954) (0.983)

Market rice price (Rs/kg) 10.54 10.53 9.753 10.09(2.227) (2.016) (1.634) (2.049)

PDS rice Qty (kg) 18.60 17.95 24.71 23.73(6.766) (8.513) (9.438) (10.81)

Market rice Qty (kg) 23.08 25.29 25.67 16.68(22.63) (20.88) (26.31) (20.38)

Daily cereal consumption per capita (kg) 0.441 0.467 0.538 0.496(0.170) (0.136) (0.195) (0.186)

Rice subsidy per capita (Rs) 27.23 24.97 32.47 30.85(15.42) (14.29) (23.30) (23.90)

Observations 4196 7405 283 788

Notes: 1. Means with standard deviations in parentheses. 2. The sample comprises PDS rice users

from the 61st round (2004-05) of the NSSO surveys and the India Human Development Survey 2005.

2

D. PDS Rice users - IHDS and NSSO

Data: IHDS NSSO

Monthly expenditure per capita (Rs) 625.1 596.7(442.2) (362.4)

Monthly income per capita (Rs) 587.5(567.3)

PDS rice price (Rs/kg) 4.546 5.237(1.612) (1.627)

Market rice price (Rs/kg) 10.48 10.56(1.906) (2.122)

PDS rice qty (kg) 19.16 18.54(7.253) (8.785)

Market rice qty (kg) 24.52 27.97(23.03) (21.37)

Daily cereal consumption per capita (kg) 0.434 0.475(0.158) (0.129)

Rice subsidy per capita (Rs) 25.46 24.17(11.18) (11.37)

Observations 3962 4255

Notes: 1. Means with standard deviations in parentheses. 2. The sample comprises PDS rice users

(November 2004 to October 2005) from the NSSO surveys and the India Human Development Survey 2005.

3

Removing Observation Bias: using “big data” to create cooking metrics from stove usage monitors and minimal observation

Andrew M. Simons, Theresa Beltramo, Garrick Blalock, David I. Levine*

"Big data" holds the promise of improving social science research. One application of big data is in understanding the adoption behaviors of new cooking technologies in the developing world. Stove Usage Monitors, for example, can generate millions of observations with lower Hawthorne effects than traditional surveys or direct observation. This paper describes field methods and proposes a new algorithm for converting temperature data generated from stove use monitors into usage metrics for both traditional and nontraditional stoves. Central to our technique is recording the visual on/off status of a stove anytime research staff observes the stove. The observations are then combined with temperature readings and a predictive logistic regression to predict the probability a stove is in use. Using this algorithm we correctly predict 89% of three stone fire observations and 94% of Envirofit observations. The logistic regression correctly classifies more observations than published temperature analysis algorithms. Further, we analyze stove usage data when multi-‐person measurement teams are present in households compared to the same times in the previous and subsequent weeks when only the unobtrusive sensors are collecting temperature data. We find a statistically and economically significant Hawthorne effect whereby study participants increase the use of the introduced stoves by approximately one third (2 hours per day), while simultaneously decreasing three stone fire usage by approximately the same amount when measurement teams are present.

Keywords: observation bias, Hawthorne effect, improved cookstoves, technology adoption, biomass fuel, monitoring and evaluation, stove use monitors * Simons: Charles H. Dyson School of Applied Economics and Management, Cornell University (email: [email protected]). Beltramo: Impact Carbon, San Francisco (email: [email protected]). Blalock: Charles H. Dyson School of Applied Economics and Management, Cornell University (email: [email protected]). Levine: Haas School of Business, University of California, Berkeley (email: [email protected]).

Acknowledgments: This study was funded by the United States Agency for International Development under Translating Research into Action, Cooperative Agreement No. GHS-‐A-‐00-‐09-‐00015-‐00. The recipient of the grant was Impact Carbon who co-‐funded and managed the project. Juliet Kyaesimira and Stephen Harrell expertly oversaw field operations and Amy Gu provided excellent research support. We thank Impact Carbon partners Matt Evans, Evan Haigler, Jimmy Tran, Caitlyn Toombs, and Johanna Young; Berkeley Air project partners including Dana Charron, Dr. David Pennise, Dr. Michael Johnson, and Erin Milner, the USAID TRAction Technical Advisory Group, and seminar participants at Cornell University for valuable comments. Special thanks to Kirk Smith, Ilse Ruiz-‐Mercado, Ajay Pillarisetti and colleagues from the U.C. Berkeley Household Energy, Climate, and Health Research Group for the development of the Stove Use Monitoring system (SUMs) concept and consultations on optimal SUMs placement and holder design for this project. Data collection was carried out by the Center for Integrated Research and Community Development (CIRCODU), and the project’s success relied on expert oversight by CIRCODU's Director General Joseph Ndemere Arineitwe and field supervisors Moreen Akankunda,

2

Innocent Byaruhanga, Fred Isabirye, Noah Kirabo, and Michael Mukembo. This study was made possible by the support of the American people through the United States Agency for International Development (USAID). We thank the Atkinson Center for a Sustainable Future at Cornell University for additional funding of related expenses. The findings of this study are the sole responsibility of the authors, and do not necessarily reflect the views of their respective institutions, nor USAID or the United States Government. Introduction “Big data” is transforming economic policy and the process of economic research (Einav and Levin 2014). Prominent examples include using internet search results to provide real-‐time estimates of economic activity (Choi and Varian 2012; Goel et al. 2010), analyzing administrative records to evaluate the long-‐term effects of better teachers and the effects of providing health insurance (Chetty, Friedman, and Rockoff 2011; Finkelstein et al. 2012), and utilizing shopping data from millions of online shoppers to understand their sensitivity to in-‐state sales taxes (Einav et al. 2012). The developing world also has opportunities for the analysis of “big data.” We use millions of data points collected by non-‐invasive, relatively inexpensive temperature monitors to create metrics of stove use in a developing country setting. Traditional wood-‐and charcoal-‐burning stoves both burn inefficiently and produce a great amount of smoke. The smoke leads to respiratory, heart, and other disorders that kill approximately 4.0 million people per year (Lim et al. 2012). Much of the health burden is concentrated on women and children, as is the time burden collecting biomass fuel (Kammen, Bailis, and Herzog 2002). Environmental damage can be significant; by one estimate, household energy use in Africa will produce 6.7 billion tons of carbon by 2050 (Bailis, Ezzati, & Kammen, 2005). Furthermore, the incomplete combustion of biomass fuels leads to the release of black carbon (soot) that contributes to current global warming (Bond, Venkataraman, and Masera 2004; Ramanathan and Carmichael 2008). These inefficiencies imply deeper poverty, loss of biodiversity, rapid deforestation, and climate change. These problems have led both policy-‐makers and practitioners to search for safer and more fuel-‐efficient stoves that are attractive to consumers. One challenge to understanding stove use substitution behaviors is to measure stove use precisely, economically and with minimal intrusion into the lives of study participants. Measuring stove usage with minimal observation bias is necessary to recover unbiased estimates of stove use because study participants potentially behave differently when research staff is present. Scientists studying the health effects of cooking technologies need to understand how many hours a day a stove is used to calculate exposure to household air pollution (Martin II et al. 2011; McCracken et al. 2007). Project financiers need to measure the time cooked on both new and old stoves to allocate the carbon credits that can fund stove subsidies in developing countries (Gold Standard Foundation 2013; GIZ 2011; Freeman and Zerriffi 2012). Further, “stove stacking” (the use of multiple fuels and stoves) often occurs when multiple cooking options exist (Masera, Saatkamp, and Kammen 2000; Ruiz-‐Mercado et al. 2013). To understand any of these issues researchers must

3

know the baseline amount of cooking on existing stoves, and then the time cooked on both the existing and introduced stove. Pioneering stove studies have highlighted the harmful effects of indoor air pollution and the health benefits of adopting fuel-‐efficient nontraditional stoves (Ezzati, Saleh, and Kammen 2000; Ezzati and Kammen 2002; Ezzati and Kammen 2001; Smith et al. 2010; Smith et al. 2011; Smith et al. 2006; Smith-‐Sivertsen et al. 2009). However these studies were characterized by high levels of interactions between research staff and study participants; for example, in Ezzati et al. (2000) research staff noted the locations of each household member and the status of how active the cooking fire is every 5-‐10 minutes. Observational studies of stove use raise the possibility of observation bias (or Hawthorne effect), which arises when participants act differently than they normally would because they know they are being observed (as noted by Smith-‐Sivertsen et al., 2009). To minimize observation bias, stove use monitor systems (SUMs) can record stove temperatures without the need for an observer to be present. The SUMs used for our project, iButtons™ manufactured by Maxim Integrated Products, Inc., are small stainless steel temperature sensors about the size of a small coin and the thickness of a watch battery which can be affixed to any stove type. Our SUMs record temperatures with an accuracy of +/-‐ 1.3 degrees C up to 85°C (see Figure 1 for a photograph). 1 Stove Usage Monitors that log stove temperatures was first suggested by Ruiz-‐Mercado et al. (2008). SUMs offer an unobtrusive, precise, relatively inexpensive, and objective measure of stove usage (Ruiz-‐Mercado et al. 2008).2 We know of two published algorithms for determining stove usage with SUMs. Both studies use SUMs to gather continuous temperature data for nontraditional cookstoves, and then calculate temperature changes between subsequent temperature readings. In Ruiz-‐Mercado, Canuz, and Smith (2012) the algorithm considers an increase in temperature above a certain threshold (1.52°C increase over 40 minutes) as the start of a candidate cooking or refueling event. If the post-‐increase stove temperature is also above the ambient temperature, the algorithm counts the passage of time until a temperature drop indicates that the stove is cooling down (indicated by a 2.28°C decrease over 60 minutes), signaling the end of the cooking event. These slope thresholds are chosen by taking the 99th and 1st percentiles of slope values from the distribution of ambient air temperatures in the research area. Multiple temperature peaks within a 2 hour period of each other are considered the same cooking event (Ruiz-‐Mercado, Canuz, and Smith 2012). Mukhopadhyay et al. (2012) use a similar technique combining both change in slope and a temperature threshold to classify when a stove is being used. However, the

1 For additional details concerning the SUMs see the product description website at: http://www.berkeleyair.com/products-‐and-‐services/instrument-‐services/78-‐sums 2 The SUMs used in our project cost approximately USD$16 each and could record temperature data for 24 hours a day for four to six weeks in a household before needing minimal servicing from a technician (after the data download they can be reset and re-‐used).

4

researchers choose different thresholds based on observed local cooking practices and local ambient temperatures. While the decision of what temperature thresholds to use is informed by data from the experiment, it requires the assumption that the slope thresholds for starting and ending cooking are the 1st and 99th percentiles of the slope of the ambient air temperatures. If these slope thresholds are not the true indicator of starting and ending cooking, this introduces error into the specification. Further, a challenge of the existing algorithms is that different thresholds are needed for different geographic areas and potentially seasons. We propose a methodology that uses SUMs temperature data gathered in the same way as previous studies, but combines this temperature data with observations made throughout the experiment of when stoves were seen as on or off. We run a logistic regression and use SUMs temperature data to predict observations of stoves being on or off. We then take the coefficients from this equation to predict the probability of cooking over the much larger set of SUM temperature readings without any observational data. The researcher can sum the predicted minutes cooked in a 24-‐hour period to estimate minutes of cooking that day. This measure of cooking can then be used as the basis for other analysis such as comparing cooking behaviors between groups of households with certain characteristics or groups differentiated by experimental design. Methods Our technique requires continuous SUMs temperature data for a given stove, and recorded instances of whether that particular stove is seen in use or not (henceforth called “observations”). We matched observations of stove use to SUMs temperature data by time and date stamps. The core of our method is a logistic regression using the lags and leads of the SUMs temperature data to predict observations of stove usage. Continuous SUMs temperature data A stove usage monitor takes a temperature reading at an interval specified by the user. The SUMs used in this experiment hold approximately 2,000 data points before the data needs to be downloaded and the device reset. We set the SUM to record temperature every 30 minutes, giving them six weeks between required servicing. Observations of stove use Each time the data collection team visited a household they observed which stoves were in use (along with the date and time). Enumerators visited a house numerous times during a “measurement week,” when we also enumerated a survey and weighed wood for a kitchen performance test (Rob Bailis, Smith, and Edwards

5

2007). Another enumerator visited once every 4-‐6 weeks to download data and reset the SUM.3 We originally intended to use the entire sample of observations. We found, however, a number of anomalies. For example, 3.0% (10 out of 329) of “lit” three stone fire observations had SUM readings below the typical daily mean temperature (23.8°C). At the same time 7.8% (105 out of 1339) of “not lit” three stone fire observations had SUM readings over 40°C, which was the highest ambient air temperature recorded in the observation period. While suspicious, these readings may not necessarily be erroneous, as SUMs can be cool if the stove was just lit in the last few minutes and the SUM has not yet registered the heat increase, or a SUM could still be hot on an unlit stove if the stove was just extinguished. We suspect a substantial share of such cases were due to errors in observing or recording the lit status of stoves or in errors in matching the time stamps, stoves, or homes of observations and SUMs measurements. Therefore we trim the observation sample based on the hourly ambient temperatures (with a conservative adjustment). We drop observations observed as “lit” below the mean minus 2.0°C of the ambient temperature for that hour, and we drop observations observed as “not lit” that are above the maximum ambient temperature plus 2.0°C for that hour. The mean and maximum hourly ambient temperatures are taken across the entire sample of ambient temperature readings. This removes 11.0% (184 out of 1668) and 3.6% (28 out of 782) of the observation samples for three stone fires and Envirofits, respectively. Regression specification There is no theory of what combinations or transforms of current, lagged, and leading temperature readings should be used to predict observed cooking. We test ten specifications and compare the goodness of fit for each specification. (We do not also include slope terms, because they are perfectly collinear with multiple lags or leads.) For simplicity, our desire is to use the same specification on both the three stone fire sample and the Envirofit sample. Data A series of randomized control trials were executed in rural areas of the Mbarara District in southwestern Uganda (Figure 2) from February to September 2012, which focused on the use, and adoption of fuel-‐efficient stoves.4 In our study the 3 One caveat with observations used in this study, is that at times when there are multiple stoves of a given type (for example two Envirofits or two three stone fires) in a household, the observation was the answer to the following question, “Is an [Envirofit stove/Three Stone Fire] on in the given household?” In cases where the answer was “Yes, one Envirofit stove is on and in use” but two Envirofits were present in the household, the data cleaning process required comparing the SUMs temperature reading of both Envirofits that were in the house at the timestamp of the observation and then assigning the “on” value to the hotter one, and the “off” value to the cooler one. The same process was applied to three stone fires as well. This process is described in detail in Appendix A. 4 See Harrell, Beltramo, Levine, Blalock, & Simons (2014) and Levine, Beltramo, Blalock, & Cotterman (2013) for more details on the experimental design and an overview of the study area.

6

traditional stove was a three stone fire,5 and the nontraditional stove was a fuel-‐efficient Envirofit G-‐3300.6 The study area is characterized by agrarian livelihoods including farming of matooke (starchy cooking banana), potatoes, and millet as well as raising livestock. At baseline almost all families cook on a traditional three-‐stone fire (97%), usually located within a separate cooking hut (62% of households had totally enclosed kitchens with no windows, while 38% had semi-‐enclosed kitchens with at least one window). Most stove usage occurs during lunch and dinner preparation, with matooke and beans the most common and most time consuming foods cooked. Matooke, a main food for lunch and dinner, is unripe plantain eaten after steaming for 3-‐5 hours. Beans, another common dish, is prepared by boiling and simmering for generally 2-‐4 hours. Thus for the main meals, it is common cooking practice to simmer and/or steam foods for several hours in a row. The study tracked stove usage (before and after the purchase of a fuel efficient stove) amongst twelve households in each of fourteen rural parishes in Mbarara (168 total households). Upon arriving in a new parish, staff displayed the new Envirofit and offered it for sale to anyone who wanted to purchase at 40,000 Ugandan Shillings (approximately USD $16, see Levine et al. (2013) for an overview of the sales contract). Consumers who wanted to buy the stove were randomly assigned into two groups (early buyers, late buyers). The project asked both early buyers and late buyers if they would agree to have SUMs placed on their traditional stoves immediately. Then approximately four weeks later the early buyers group received their first Envirofit stove, and approximately four weeks after that the late buyers received their first Envirofit stove. In each parish, more than twelve households agreed to join the study; therefore among those that agreed, we randomly selected twelve households per parish for the usage study with the SUMs. Stove temperatures were tracked for approximately six months (April – September 2012). Approximately four weeks after late buyers received their Envirofits, both groups were surprised with a second Envirofit stove. Common cooking practices in the area require two simultaneous cooking pots (for example rice and beans, or matooke and some type of sauce); therefore, because the Envirofit is sized for one cooking pot, we gave a second Envirofit to mimic normal cooking behavior as much as possible. Each household had as many as two three stone fires and two Envirofit stoves monitored with SUMs throughout the study. The results of stove use will be presented in a subsequent analysis.

5 A three stone fire is simply three large stones, approximately the same height, on which a cooking pot can be balanced over a fire. Almost all cooks burned firewood or other solid biomass. 6 Product manufacturer features state that the Envirofit G-‐3300 reduces smoke and harmful gasses by up to 80%, reduces biomass fuel use by up to 60% and reduces cooking time by up to 50%. For more details see: http://www.envirofit.org/products/?sub=cookstoves&pid=10

7

Results Descriptive statistics The data consist of approximately 1.7 million temperature readings from more than 34,000 stove-‐days of use recorded across 580 stoves at 168 households. The sample sizes by stove type are in Table 1. SUMs must be placed close enough to the heat source to capture changes in temperatures, but not so close that they exceed 85°C or they overheat and malfunction. We do not need to recover the exact temperature of the hottest part of the fire to learn about cooking behaviors. Even with SUMs that are reading temperatures 20-‐30 centimeters from the center of the fire, as long as the temperature readings for times when stoves are in use are largely different than times when stoves are not used the logistic regression will be able to predict a probability of usage. We did not have strict control over the location of our SUM in a three stone fire, while we did for an Envirofit. SUMs for three stone fires were placed in a SUM holder (Figure 3)7 and then placed under one of the stones in the three stone fire (left panel, Figure 4). The SUMs for Envirofits were attached using duct tape and wire and placed at the base of the stove behind the intake location for the firewood (right panel, Figure 4). Over multiple weeks users likely move the locations of the stones in their three stone fires depending on their cooking needs, pushing the stones closer together for a smaller fire and wider to fit more fuel wood for a larger fire. Therefore we expect the tracking of three stone fire temperatures to be more difficult with SUMs than the tracking of Envirofit stove usage. Figure 5 shows an example of SUMs temperature data for a household across about three weeks. The left panel shows the temperatures registered in a three stone fire versus the ambient temperature also recorded with SUMs in this household, while the right panel compares the temperature of the Envirofit to the ambient temperature reading. These two figures illustrate a point that is largely consistent throughout our wider data set. SUMs readings of the three stone fires are less responsive than Envirofit SUMs readings to temperature changes. It seems that residual heat stays in the cooking area of a three stone fire preventing temperatures from dropping all the way to the ambient temperature. The design of the Envirofit, which is engineered to be much more heat efficient, shows up in these SUMs graphs as less residual heat when a stove is not in use by the temperatures dropping to room temperature or below. A final difficulty is found July 2-‐4th (left panel, Figure 5) where temperatures for the three stone fire remain above 40°C for about a 48-‐hour period. While it could be that the fire was on for this entire period, it could also be that the household was

7 The location for optimal SUM placement on three stone fires and Envirofit stoves was field tested in Uganda prior to starting the experiment. Additionally, we are thankful for numerous consultations with Dr. Kirk Smith and colleagues at the University of California, Berkeley who have successfully used SUMs and designed the SUM holder (Figure 3) to track stove temperatures in previous studies (Ruiz-‐Mercado et al. 2008; Ruiz-‐Mercado, Canuz, and Smith 2012; Mukhopadhyay et al. 2012).

8

banking embers (keeping coals warm, but not cooking) and the SUM was close to those embers. This figure highlights some of the potential challenges of using temperatures as a predictive mechanism for cooking metrics, especially with three stone fires. Ambient temperature sensors were placed in one household in each of the fourteen parishes.8 Figure 6 is a box plot of the ambient temperatures by hour of the day (averaged over each day of the experiment). There are 1,668 visual observations for three stone fires, and 782 for the Envirofit stove (see Appendix A for a description of the process to match observations to temperature data). The summary statistics for the entire sample and for the trimmed sample (dropping observations that were implausible given the ambient temperature) are in Table 2. Note the spread of mean temperatures (“off” vs. “on”) is wider for observations for Envirofits (26.2°C vs. 41.5°C in full sample, and 25.3°C vs. 42.0°C in trimmed sample) as compared to the three stone fires (30.2°C vs. 38.3°C in full sample, and 28.2°C vs. 39.0°C in trimmed sample). Comparison of fit: full vs. trimmed sample We first examine one regression with two leads and lags on both the full and trimmed samples. Table 3 shows the logistic regression predicting the observation that the stove was recorded as lit. The percent correctly classified is good (83.1%), but the pseudo-‐R2 is a modest 0.17. Figure 7 shows the predicted probability a stove is observed in use at each SUM temperature. Contrary to our priors, the predicted probability is fairly linear for the full sample (especially for the three stone fire). We originally hoped to let the dataset “speak for itself.” However, the predicted probability a three stone fire with a temperature reading of 50°C is “lit” is only about 50%. Because 50°C is much higher than the highest ambient temperature we recorded (40°C), it should have a predicted probability of cooking much higher than 50%. The predictive power rises substantially when we discard outliers, defined as observations coded “lit” and more than two degrees below the hourly mean air temperatures (Figure 6), or “not lit” and more than two degrees above the maximum air temperature we observed that hour of the day across the duration of the experiment (Figure 6). This increase in fit holds for the three stone fires (pseudo-‐R2 increases from 0.17 to 0.40, percentage correctly classified increases from 83.1% to 89.3%) and Envirofits (pseudo-‐R2 increases from 0.45 to 0.70, percentage correctly classified increases from 90.7% to 93.8%). (Note that trimming these outliers mechanically improves the fit and percent correctly classified.) When we drop outliers the predicted probability now has the expected logistic S-‐shape (Figure 7). 8 Ambient temperature readings were only recovered in 13 out of 14 districts. In total there are about 56,000 ambient temperature readings.

9

Regression Specifications We test ten regression specifications on the trimmed sample for both the three stone fires and the Envirofit (Table 4). Initially, we include only the temperature at time t (specification 1); it has a pseudo-‐R2 of 37% and 60%, and correctly classifies 87.5% and 96.1% of the three stone fire and Envirofit sample, respectively. Adding a lead and lag of one period (spec. 2) increases the pseudo R2 to 41% and 68%, and increases the percentage correctly classified to 89.2% for three stone fires, but decreases the percentage correctly classified to 94.9% in the Envirofits. Adding a second (spec. 4), third (spec. 6) or fourth (spec. 8) lead and lag does not improve fit by much. Adding a slope-‐squared term when the slope is positive for the time step from t-‐1 to t (coding slope-‐squared as zero otherwise) measures steep positive increases in slope, as would be the case when a fire is lit. However adding this slope-‐squared term (col. 3, 5, and 7) minimally changes the results. Only including leads (spec. 9) or only including lags (spec. 10) reduces pseudo-‐R2, but increases percentage correctly classified to 89.4% and 95.3% (spec. 10) for three stone fires and Envirofits, respectively. In order to determine which of the models is most appropriate we test the ten specifications with the Akaike information criterion (AIC) (Akaike 1974; Akaike 1981). The AIC trades off goodness of fit of the model with the complexity of the model to guard against over fitting. The AIC is defined as: !"# = 2! − 2 !"(!) (0) where k is the number of parameters in the statistical model, and L is the maximized value of the likelihood function of the model. Given a set of candidate models for a set of data, the preferred model is the one with the lowest AIC. Burnham and Anderson (2002) argue that the AIC is appropriate in cases where the number of observations is many times larger that the square of the number of predictors, as in this case. Among both the three stone fire and Envirofit sample, the AIC is a much higher value for the specifications with temperature at t (spec. 1), temperature at t and four leads (spec. 9), temperature at t and 4 lags (spec. 10) than for the other specifications. The remaining seven specifications all have similar AICs. The specification that includes two leads and two lags (spec. 4) has the lowest AIC among the three stone fire sample. The lowest AIC among the Envirofit sample is the specification with four leads and four lags (spec. 8). For simplicity, we use the same specification on both the three stone fire sample and the Envirofit sample; therefore, we rely on the specification with two leads and lags (spec. 4) for both samples. Logistic Regression Specification According to the AIC for three stone fires, the preferred specification is: !!" = !(γ!!!" + γ!!!"!! + γ!!!"!! + γ!!!"!! + γ!!!"!! + γ!!!" + !!") (0)

10

where F(.) is the logistic functional form, !!" is a dummy variable for observations for stove i at time t. !!" is the SUMs temperature reading for stove i at time t. !!"!! is the SUMs temperature reading for stove i at time t+τ, for τ = -‐2, -‐1, 1, and 2 (that is 30 and 60 minutes prior to the observation, and 30 and 60 minutes after). !!" is a dummy variable for the hour of the day for stove i at time t. The terms γ!, γ!, γ!, γ!, γ!, γ! are the parameters and !!" is an error term. Estimated odds ratios of the logistic regression for both the three stone fires and the Envirofit are presented in Table 3 (columns 2 and 4). An increase of one degree Celsius at time t is associated with an odds ratio of 1.37 and 1.46 (both statistically significant at the 1% level) for the three stone fire and Envirofit, respectively. This means, holding all else constant, that an increase of one degree Celsius at time t means the odds of a stove being “on” are approximately 37% higher for three stone fires, and 46% higher for Envirofits than the odds of the stove being “on” at the initial time t temperature. The three stone fire temperature reading at time t+2 is associated with an odds ratio of 1.07 (significant at 5% level), meaning the odds of a three stone fire being on at time t are 7% higher in the presence of a one degree increase in temperature at time t+2 holding all else constant. The Envirofit temperature reading at time t+1 has an odds ratio of 1.24 (significant at 1% level) and at time t+2 an odds ratio of 0.90 (significant at 5% level). This means the odds of an Envirofit being on at time t are 24% higher in the presence of a one degree increase in temperature at time t+1, and 10% lower in the presence of a one degree increase in temperature at time t+2, holding all else constant. 9 As a robustness check Table 5 shows the in sample and out of sample predictive accuracy of the preferred specification. We randomly split the sample of observations of each stove in half and ran the logistic regression on half of the observations. We then predict the probability of use to both halves of the sample. For the three stone fires the in-‐sample predictive accuracy is 89.8% while out-‐of-‐sample predictive accuracy is 87.8%. For the Envirofits the in-‐sample predictive accuracy is 95.5% while out–of-‐sample predictive accuracy is 91.1%. Because the in-‐sample and out-‐of-‐sample predictive accuracy is very similar for each stove type, it is unlikely that we are over-‐fitting the data. A classification table for the entire sample (Table 6) shows how well the regression predicts the entire observation sample for the three stone fires and for the Envirofit. The sensitivity (probability predicted “lit” when observation observed as “lit”) is only 56.5% for the three stone fires and 77.6% for the Envirofits, showing that the technique struggles when only predicting the sample that was observed as “lit.” The 9 We also tested probit and the linear probability model (LPM) in addition to the logistic regression. Predictions of the probability of stove usage from the probit and logistic regression were correlated at levels higher than 0.99 for both the three stone fire and Envirofit sample. Among the predictions of probability of stove use from the linear probability model, 28.8% and 36.2% of the prediction fell outside of the range [0 to 1] for the three stone fire and Envirofit, respectively. As the proportion of linear probability model predicted probabilities that fall outside of the unit interval increases, the potential bias of using LPM also increases (Horrace and Oaxaca 2006). Because we have a high proportion of LPM predictions that fall outside of this range, using LPM would likely result in biases.

11

specificity (probability predicted “not lit” when the observation observed as “not lit”) is much better, 97.6% for the three stone fire, and 97.0% for the Envirofits. This technique generally holds more predictive power with Envirofits than three stone fires as evidenced by the Envirofit sample having a higher percentage of overall correctly classified observations (93.8% vs. 89.2%). Applying published SUMs algorithms to our data We replicate the algorithm described in (Ruiz-‐Mercado, Canuz, and Smith 2012; Mukhopadhyay et al. 2012) to convert temperature readings to hours of usage. We first determined a threshold increase and decrease in stove temperature equal to the 99th and 1st percentiles of ambient temperature slopes. In our sample of approximately 56,000 ambient temperature readings the 99th percentile of slopes is 0.0833°C/minute (triggering slope of an increase of 2.5°C in 30 minutes), and the 1st percentile of slopes is -‐0.05°C/minute (exit slope of a decrease of 1.5°C in 30 minutes). The algorithm counts a stove as turning “on” when the slope is larger than a triggering slope, then it continues to count each subsequent temperature reading as “on” (as long as temperatures are higher than the ambient temperature) until it reaches a negative slope that is steeper than the exit slope. We calculate the on/off status of our entire data set using this algorithm. We apply this algorithm to both the traditional three stone fires and the Envirofit stoves; however Mukhopadhyay et al. (2012) and Ruiz-‐Mercado, Canuz, and Smith (2012) designed their algorithms for nontraditional stoves and do not endorse this technique for three stone fires. We observe some irregularities in this algorithm when applied to our data, most notably a very hot stove could be marked as “off” too quickly if the temperature falls steeper than a decrease of 1.5°C in 30 minutes even if the absolute temperature is still hot enough to likely indicate cooking. Take for example the following series of temperature data {26, 51, 52, 47, 49, 46, 41, 37, 33, 31°C} with each reading thirty minutes apart. The algorithm would return {off, on, on, off, off, off, off, off, off, off}. When the temperature falls from 52°C to 47°C between the third and fourth reading the slope is steep enough and the algorithm switches to off. However, in reality the subsequent temperature readings above 40°C are likely to indicate stove usage (recall no ambient air temperatures were ever that high). Another irregularity with the basic algorithm is that stoves can cool too slowly to trigger an exit slope so the algorithm returns an excessively long period of cooking (at times multiple days) despite temperatures low enough that cooking is unlikely. For example the data series {26, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20°C} would return {off, on, on, on, on, on, on, on, on, on, on, on, on, on, on} despite 20°C being below the mean ambient temperature in our period, and almost certainly not hot enough to indicate stove usage. We created an adjusted algorithm to account for these two cases. The first adjustment is that once the algorithm has a triggering slope marking the stove as “on” it does not consider an exit slope to trigger as “off” until the absolute temperature is less than 40°C. The second adjustment is that once the algorithm has

12

a triggering slope marking the stove as “on” it will switch to “off” even without a sufficiently steep exit slope if it has negative slope readings for an hour straight and the absolute temperature is below 35°C. Comparison of fit Next we test which algorithm correctly predicts the highest percentage of observations. In addition to testing the Ruiz-‐Mercado-‐Mukhopadhyay et al. algorithm, and the adjusted Ruiz-‐Mercado-‐Mukhopadhyay et al. algorithm, we also test various cutoff points in relation to mean ambient temperature. For example considering all data points above temperature thresholds such as 32°C, 34°C, 36°C, and 38°C as “on” (these correspond to approximately 8°C, 10°C, 12°C, and 14°C above mean ambient temperature, respectively). Table 7 reports the classification table of how well the algorithms predict the observations of three stone fire use. The strict threshold cutoff of 36°C (correctly classified 85.0% of the observations. This was the best performing strict threshold, correctly predicting more than the thresholds of 32°C (76.8%), 34°C (81.9%), and 38°C (84.8%). The adjusted Ruiz-‐Mercado-‐Mukhopadhyay et al. algorithm (77.7%) correctly predicted better than the Ruiz-‐Mercado-‐Mukhopadhyay et al. algorithm (58.9%), but not as well as the strict threshold of 36°C (85.0%). The logistic regression with temperature readings at times t-‐2, t-‐1, t, t+1, t+2 (that is at time t, and 30 and 60 minutes prior to, and 30 and 60 minutes after) and hour of the day predicts the most observations of three stone fire use, predicting 89.3% of observations correctly. The pseudo R-‐squared of the logistic regression explains 41.8% of the variation in observations, whereas each of the temperature thresholds has pseudo R-‐squared in the range of 10.3% to 18.4%. The Ruiz-‐Mercado-‐Mukhopadhyay et al. algorithm and the adjusted Ruiz-‐Mercado-‐Mukhopadhyay et al. algorithm have a pseudo R-‐squared of 0.1% and 7.7%, respectively. Table 8 shows how well the algorithms predict the observations of Envirofit use. Among Envirofits, the strict threshold cutoff of 36°C correctly classified 94.7% of the observations. This was the best performing strict threshold, correctly predicting more than the thresholds of 32°C (91.6%), 34°C (93.9%), and 38°C (94.3%). The strict threshold of 36°C also correctly classified more observations (94.7%) than both the Ruiz-‐Mercado-‐Mukhopadhyay et al. algorithm (84.4%), and the adjusted Ruiz-‐Mercado-‐Mukhopadhyay et al. algorithm (85.9%). The logistic regression with temperature readings at times t-‐2, t-‐1, t, t+1, t+2 (that is at time t, and 30 and 60 minutes prior to, and 30 and 60 minutes after) and hour of the day correctly predicts slightly fewer (93.8%) than the best performing temperature threshold of 36°C (94.7%). However, considering the pseudo R-‐squared the logistic regression explains 67.1% of the variation in trimmed observations, whereas each of the temperature thresholds has a pseudo R-‐squared in the range of 27.0% to 36.7%. The Ruiz-‐Mercado-‐Mukhopadhyay et al. algorithm and the adjusted Ruiz-‐Mercado-‐Mukhopadhyay et al. algorithm have a pseudo R-‐squared of 14.4% and 19.6%, respectively. Because the performance of a strict

13

temperature threshold would vary based on the local climate the logistic regression is a more robust algorithm. Evidence of a Hawthorne Effect Our kitchen performance tests (KPT) involve visits on four days in a row to weigh wood and install measurement devices, all while asking participants to record their cooking on a food diary. A major risk is that direct observational process altered participants’ behavior (as noted by (Ezzati, Saleh, and Kammen 2000; Smith-‐Sivertsen et al. 2009)). Specifically, it is plausible that participants used their Envirofit stoves more (and their three-‐stone fire less) during the KPT than during other times. To test for such Hawthorne effects, we examine stove usage during the final KPT, when all participants have two Envirofit. We compare stove usage during the KPT with stove usage over the same duration and days of the week in the prior and following weeks. Our measurement periods are roughly 72 hours (generally late Monday morning to early Thursday afternoon). Table 9 shows the summary statistics for predicted hours cooked per day among the four stoves. By using paired t-‐tests we compare stove usage on a given stove within the same household. Households used their first Envirofit about 3.8 hours per day in the week before the KPT and 3.4 hours in the following week. Their usage was about 4.6 hours10 during the KPT week. The differences between the KPT measurement period with previous week (0.75 hours, 95% CI = [0.12 to 1.37], p<0.05) and with the following week (1.21 hours, 95% CI = [0.58 to 1.84], p<0.01) were statistically significant at the 5% and 1% levels, respectively. The difference in daily hours cooked on the first Envirofit stove in the week prior to and following the KPT was not statistically significantly different than zero. Households used their second Envirofit about 2.7 hours per day in the week before the KPT and 2.4 hours in the following week. Their usage was about 3.4 hours11 during the KPT week. The differences between the KPT measurement period with previous week (0.83 hours, 95% CI = [0.27 to 1.38], p<0.01) and with the following week (0.81 hours, 95% CI = [0.34 to 1.27], p<0.01) were both highly statistically significant. The difference in daily hours cooked on the second Envirofit stove in the week prior to and following the KPT was not statistically significantly different than zero. While usage on each Envirofit stove rose, three stone fire usage declined during the KPT. Households used their primary three stone fire about 8.9 hours per day in the

10 To be precise, the cooking during the KPT measurement week for the first Envirofit was 4.59 hours per day among the 105 pairs of households that had SUMs temperature readings on their first Envirofit stove for both the KPT week and the prior week, and 4.56 hours among the 111 pairs of households that had SUMs temperature readings on their first Envirofit stove for the KPT week and the following week. 11 To be precise, the cooking during the KPT measurement week for the second Envirofit was 3.48 hours per day among the 88 pairs of households that had SUMs temperature readings on their second Envirofit for both the KPT week and the prior week, and 3.24 hours among the 95 pairs of households that had SUMs temperature readings for their second Envirofit for the KPT week and the following week.

14

week before the KPT and 10.1 hours in the following week. Their usage was about 7.4 hours12 during the KPT week. The differences between the KPT measurement period with previous week (1.44 hours, 95% CI = [0.43 to 2.46], p<0.01) and with the following week (2.74 hours, 95% CI = [1.56 to 3.93], p<0.01) were both highly statistically significant. The difference in daily hours cooked on the primary three stone fire in the week prior to and following the KPT was weakly statistically significantly different than zero (p<0.10). Households used their secondary three stone fire about 9.8 hours per day in the week before the KPT and 9.8 hours in the following week. Their usage was 9.0 during the KPT week. Due to a very small sample size,13 none of these differences are significantly different than zero. Ultimately, our interest is to know how combined cooking on Envirofits compares to combined cooking on three stone fires during this observation period. Due to a constraint in the number of functioning SUM devices at the end of our experiment, we have a very small sample (N=22) of temperature readings for secondary three stone fires. Therefore to analyze total combined stove usage we will use the balanced sample in which we successfully recovered SUMs data for both Envirofits and for the primary three stone fire (Table 10) in a given household. Among this sample (N=48) households used their Envirofits about 6.1 hours per day in the week before the KPT and 5.6 hours in the following week. Their usage was 8.1 hours during the KPT week. (Recall we match the same measurement period in each week.) The differences between the KPT measurement period with previous week (2.06 hours, 95% CI = [1.07 to 3.05], p<0.01) and with the following week (2.54 hours, 95% CI = [1.66 to 3.41], p<0.01) were both highly statistically significant. The difference in daily hours cooked in the week prior to and following the KPT was not statistically significantly different than zero. While Envirofit use rose, three stone fire use declined during the KPT. Among the same sample (Table 10) households used their primary three stone fire about 8.1 hours per day in the week prior to the KPT and 10.5 hours in the week following the KPT. The usage was 7.0 hours per day during the KPT week. The differences between the KPT measurement periods with previous week (1.13 hours, 95% CI = [-‐0.13 to 2.38], p<0.10) and with the following week (3.50 hours, 95% CI = [1.92 to 5.09], p<0.01) were statistically significant at the 10% and 1% levels respectively. The difference in daily hours cooked on the primary three stone fire in the week 12 To be precise, the cooking during the KPT measurement week for the primary three stone fire was 7.43 hours per day among the 98 pairs of households that had SUMs temperature readings on their primary three stone fire for both the KPT week and the prior week, and 7.40 hours among the 100 pairs of households that had SUMs temperature readings on their primary three stone fire for the KPT week and the following week. 13 SUMs devices malfunction and can no longer be used once they reach 85°C. Our study is similar to others (Ruiz-‐Mercado, Canuz, and Smith 2012; Ruiz-‐Mercado et al. 2008; Mukhopadhyay et al. 2012) in that our experiment was no exception to the losses of SUMs devices. Due to losses from malfunction or the inability to recover all the devices we initially placed, we did not have enough SUMs to place on all stoves by the end of the experiment. When the final KPT took place (near the end of our experiment) we instructed enumerators to exclude the second three stone fire if they were constrained in the number of SUMs devices. Table 9 shows few SUMs readings from the second three stone fire (N=22) compared to the other three stoves types.

15

prior to and following the KPT was significantly different (p<.01). All parishes overlapped in time, and therefore robust to time effects. While balancing the sample and focusing on combined Envirofit usage compared to primary three stone fire usage within the same household shows evidence of a Hawthorne effect, it also drastically reduces the sample size. Another option that utilizes more data is regression analysis to estimate the effect of the presence of the KPT measurement team on stove usage. Assign the subscripts t=-‐1 to the week prior to measurement week, t=0 to the measurement week, and t=1 to the week after the measurement week. Let !!"!"# be total hours cooked per day on the primary three stone fire for household i during week t. Let !"#$%&'(_!""# be a dummy variable for an adjacent week (t=-‐1 or t=1). The regression is modeled using Ordinary Least Squares (OLS) as: !!"!"# = !!"# ∗ !"#$%&'(_!""# + !! + !!" (1) where !! is fixed effects for the individual household (this controls for household level characteristics that don’t change over these three weeks like family size, income, etc.), and !!" is an error term. The coefficient !!"# is the estimate of how different (in hours cooked per day) an adjacent week is compared to a measurement week (this specification imposes that the previous and subsequent week are equal). Standard errors are clustered at the household level. The regression for the combined Envirofit usage is similar: !!"!"# = !!"# ∗ !"#$%&'(_!""# + !! + !!" (2) where !!"!"# is the total hours cooked per day on both Envirofits and all other variables are the same as in the three stone fire case. The coefficient !!"# is the estimate of how different (in hours cooked per day) an adjacent week is compared to the measurement week. This also imposes equality of the previous and subsequent week. To test the weeks separately, we use a slightly different specification. Let Week_Prior be a dummy variable equal to 1 for the week before the measurement week (when t=-‐1) and 0 otherwise, and let Week_After be a dummy variable equal to 1 for the week after the measurement week (when t=1) and 0 otherwise. Then the regression is modeled using OLS as: !!"!"# = !!!"#!""#_!"#$" + !!!"#!""#_!"#$% + !! + !!" (3) where !! is fixed effects for the individual household and !!!"# is the estimate of the difference (in hours cooked per day) of the week before the measurement week compared to the measurement week. The coefficient of !!!"# is the estimate of the difference (in hours cooked per day) of the week after the measurement week compared to the measurement week. The standard errors are clustered at the

16

household level. The specification for the combined total daily cooking with Envirofits is similar: !!"!"# = !!!"#!""#_!"#$" + !!!"#!""#_!"#$% + !! + !!" (4) where !!"!"# is the total hours cooked per day on both Envirofits and all other variables are the same as in the three stone fire case. The coefficient !!!"# is the estimate of how different (in hours cooked per day) the week prior to the measurement week is compared to the measurement week. The coefficient !!!"# is the estimate of how different (in hours cooked per day) the week after the measurement week is compared to the measurement week. Based on the regression estimates (Table 11) the patterns of stove use are essentially the same (as Tables 9-‐10) as Envirofit usage increases while three stone fire usage decreases in the presence of the measurement team. These effects disappear once the measurement team departs. When constraining the week prior to and after the measurement week to be equal, estimated primary three stone fire usage is higher by 2.26 hours per day (95% CI = [1.13 to 3.39], p<0.01) when comparing the adjacent weeks to the measurement week (col. 1) and the estimated total Envirofit usage is lower by 2.40 hours per day (95% CI = [1.48 to 3.32], p<0.01) when comparing the adjacent weeks to the measurement week (col. 3). Running the analysis weekly, the estimated use of primary three stone fires (col. 2) as compared to usage levels during the measurement week is 1.62 hours per day higher (95% CI = [0.32 to 2.92], p<0.05) and 2.88 hours per day higher (95% CI = [1.37 to 4.39], p<0.01) in the week prior to and after the measurement week, respectively. These coefficients are jointly significantly different than zero (joint F test significant at 1% level). The total usage on Envirofits (col. 4) as compared to Envirofit usage levels during the measurement week is estimated as a decrease of 2.02 hours per day (95% CI = [0.92 to 3.13], p<0.01) and a decrease of 2.71 hours per day (95% CI = [1.72 to 3.71], p<0.01) in the week prior to and after the measurement week, respectively. These coefficients are jointly significantly different from zero (joint F test significant at 1% level). Importance of a Hawthorne Effect To see the importance of high Envirofit usage during the KPT, consider the following illustrative example. Assume:

o Households without Envirofits use three stone fires twelve hours per day

o Households with Envirofits normally use three stone fires nine hours a day and Envirofits four hours per day

o During the KPT, households with Envirofits use three stone fires seven hours a day and Envirofits six hours per day

o Three stone fires create one unit and Envirofits create ½ unit of pollution per hour

17

With these illustrative assumptions, pollution per day declines from twelve units prior to the introduction of an Envirofit to eleven units of pollution (9 + (½ *4) = 11) once the Envirofit is present, a decline of 8.33%. However, if instead we used the data from the kitchen performance test, we would estimate a decline in pollution from twelve units to ten units (7 + (½*6) = 10), a decline of 16.67%, or twice the true decline. While these figures are merely illustrative, they illustrate the importance of observer effects in observational studies. The kitchen performance test is one of the workhorse tests in determining field-‐based use of stoves. Our results suggest the need to correct for the Hawthorne effect prior to publishing results. For example Bailis et al. (2007) find, when comparing the results of pre and post intervention kitchen performance tests, that improved cookstoves statistically significantly reduce average per capita fuel use and this reduction ranges from 19%-‐67% in rural Mexico. Reductions of the emissions of CO2 equivalents is estimated at 3.9 tons per home per year (Johnson et al., 2009) in the same study area in rural Mexico. In these studies daily biomass consumption patterns were calculated based on the results of kitchen performance tests. Other important studies focusing on the health benefits of improved stoves (Ezzati, Saleh, and Kammen 2000; Smith-‐Sivertsen et al. 2009) also use kitchen performance tests or other direct observational methods when measuring indoor air pollution. Given our findings demonstrate that stove users change their behaviors, favoring the clean stove technology when they are directly observed, it is possible that estimates of health benefits would be incorrectly estimated if the underlying behaviors in those studies were influenced in a similar manner. Conclusion We describe an algorithm that combines observations of stove use with continuous temperature data from a stove usage monitor to predict the on/off status of a stove. We test various regression specifications and use the Akaike Information Criterion for choosing a logistic regression where observations are regressed against temperature at times t-‐2, t-‐1, t, t+1, t+2 and a dummy variable for hour of the day. This regression specification correctly predicts 89.3% of three stone fire observations and 93.8% of Envirofit observations of stove usage. The predictive accuracy is almost the same in and out-‐of-‐sample. We continue by comparing the performance of this algorithm to previously published algorithms (Mukhopadhyay et al. 2012; Ruiz-‐Mercado, Canuz, and Smith 2012) and strict temperature threshold cutoffs. Our logistic regression correctly classifies more observations, with a higher pseudo R-‐squared, than any other algorithm for three stone fires. Among the Envirofit observations our logistic regression correctly predicts a higher percentage of observations than both the Ruiz-‐Mercado-‐Mukhopadhyay et al. algorithm and the adjusted Ruiz-‐Mercado-‐Mukhopadhyay et al. algorithm. The strict 36°C threshold predicts slightly more observations (0.4%) than our logistic regression, however the logistic regression has a much higher pseudo R-‐squared than all other algorithms. We argue that a predictive logistic regression with observations of actual use and temperature readings at times t-‐2, t-‐1, t, t+1, t+2 and hour of the day is the best algorithm for both

18

traditional and nontraditional stove types. Further, this algorithm is transferable to many climatic zones (both hot and cold) whereas strict temperature thresholds and algorithms based on entry and exit slopes need adjustment depending on the local climate and local cooking practices. We find evidence of a large Hawthorne effect during the 72 hour-‐long kitchen performance tests. Households appear to increase cooking on Envirofit stoves by 32.4% (2.02 hours per day) during the measurement week; followed by a reduction in Envirofit use of 33.6% (2.71 hours per day) in the same period the week after the measurement team has departed. Further, use of the primary three stone fires decreases by 18.7% (1.62 hours per day) during the measurement week, followed by an increase of 39.9% (2.88 hours per day) in the week after the kitchen performance test. Attention is warranted in interpreting results of observational studies, our finding of a large and statistically significant Hawthorne effect suggests that reported benefits due to fuel efficient and clean burning stoves could be overstated if researchers are measuring the benefits of an improved stove when behavior is biased in favor of the cleaner technology due to a Hawthorne effect. By using SUMs and a machine-‐learning algorithm to predict the probability of a stove in use, we show one way around this obstacle. Bibliography

Akaike, Hirotugu. 1974. “A New Look at the Statistical Model Identification.” Automatic Control, IEEE Transactions on 19 (6): 716–723.

———. 1981. “Likelihood of a Model and Information Criteria.” Journal of Econometrics 16: 3–14.

Bailis, Rob, Victor Berrueta, Chaya Chengappa, Karabi Dutta, Rufus Edwards, Omar Masera, Dean Still, and Kirk R. Smith. 2007. “Performance Testing for Monitoring Improved Biomass Stove Interventions: Experiences of the Household Energy and Health Project.” Energy for Sustainable Development 11 (2) (June): 57–70. doi:10.1016/S0973-‐0826(08)60400-‐7. http://linkinghub.elsevier.com/retrieve/pii/S0973082608604007.

Bailis, Rob, Kirk R. Smith, and Rufus Edwards. 2007. “Kitchen Performance Test (KPT)”. Univerisity of California, Berkeley, CA.

Bailis, Robert, Majid Ezzati, and Daniel M Kammen. 2005. “Mortality and Greenhouse Gas Impacts of Biomass and Petroleum Energy Futures in Africa.” Science 308 (5718) (April 1): 98–103. doi:10.1126/science.1106881. http://www.ncbi.nlm.nih.gov/pubmed/15802601.

Bond, Tami, Chandra Venkataraman, and Omar Masera. 2004. “Global Atmospheric Impacts of Residential Fuels.” Energy for Sustainable Development 8 (3)

19

(September): 20–32. doi:10.1016/S0973-‐0826(08)60464-‐0. http://linkinghub.elsevier.com/retrieve/pii/S0973082608604640.

Burnham, Kenneth P., and David R. Anderson. 2002. Model Selection and Multimodel Inference: A Practical Information-‐Theoretic Approach. 2nd ed. New York, NY: Springer-‐Verlag.

Chetty, Raj, John N. Friedman, and Jonah E. Rockoff. 2011. “The Long-‐Term Impacts of Teachers: Teacher Value-‐Added and Student Outcomes in Adulthood”. Cambridge, MA, NBER.

Choi, Hyunyoung, and Hal Varian. 2012. “Predicting the Present with Google Trends.” Economic Record 88 (June 27): 2–9. doi:10.1111/j.1475-‐4932.2012.00809.x. http://doi.wiley.com/10.1111/j.1475-‐4932.2012.00809.x.

Einav, Liran, Dan Knoepfle, Jonathan D Levin, and Neel Sundaresan. 2012. “Sales Taxes and Internet Commerce”. Cambridge, MA, NBER.

Einav, Liran, and Jonathan D Levin. 2014. “The Data Revolution and Economic Analysis.” In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern, Vol. 14. University of Chicago Press.

Ezzati, Majid, and Daniel M Kammen. 2001. “Indoor Air Pollution from Biomass Combustion and Acute Respiratory Infections in Kenya: An Exposure-‐Response Study.” Lancet 358 (9282) (August 25): 619–24. http://www.ncbi.nlm.nih.gov/pubmed/11530148.

———. 2002. “Evaluating the Health Benefits of Transitions in Household Energy Technologies in Kenya.” Energy Policy 30 (10) (August): 815–826. doi:10.1016/S0301-‐4215(01)00125-‐2. http://linkinghub.elsevier.com/retrieve/pii/S0301421501001252.

Ezzati, Majid, Homayoun Saleh, and Daniel M Kammen. 2000. “The Contributions of Emissions and Spatial Microenvironments to Exposure to Indoor Air Pollution from Biomass Combustion in Kenya.” Environmental Health Perspectives 108 (9) (September): 833–839.

Finkelstein, Amy, Sarah Taubman, Bill Wright, Mira Bernstein, Jonathan Gruber, Joseph P Newhouse, Heidi Allen, Katherine Baicker, and Oregon Health Study Group. 2012. “The Oregon Health Insurance Experiment: Evidence from the First Year.” The Quarterly Journal of Economics 127 (3): 1057–1106. doi:10.1093/qje/qjs020.Advance.

Freeman, Olivia E., and Hisham Zerriffi. 2012. “Carbon Credits for Cookstoves: Trade-‐Offs in Climate and Health Benefits.” The Forestry Chronicle 88 (5): 600–608.

20

GIZ. 2011. “Carbon Markets for Improved Cooking Stoves: A GIZ Guide for Project Operators”. Poverty-‐oriented Basic Energy Services, Eschborn, Germany.

Goel, Sharad, Jake M Hofman, Sébastien Lahaie, David M Pennock, and Duncan J Watts. 2010. “Predicting Consumer Behavior with Web Search.” Proceedings of the National Academy of Sciences of the United States of America 107 (41) (October 12): 17486–90. doi:10.1073/pnas.1005962107. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2955127&tool=pmcentrez&rendertype=abstract.

Gold Standard Foundation. 2013. “The Gold Standard: Simplified Methodology for Efficient Cookstoves”. Geneva-‐Cointrin, Switzerland.

Harrell, Stephen, Theresa Beltramo, David I. Levine, Garrick Blalock, and Andrew M. Simons. 2014. “What Is a ‘Meal’? Comparing Methods to Determine Cooking Events”. Mimeo.

Horrace, William C., and Ronald L. Oaxaca. 2006. “Results on the Bias and Inconsistency of Ordinary Least Squares for the Linear Probability Model.” Economics Letters 90 (3) (March): 321–327. doi:10.1016/j.econlet.2005.08.024. http://linkinghub.elsevier.com/retrieve/pii/S0165176505003150.

Johnson, Michael, Rufus Edwards, Adrián Ghilardi, Victor Berrueta, Dan Gillen, Claudio Alatorre Frenk, and Omar Masera. 2009. “Quantification of Carbon Savings from Improved Biomass Cookstove Projects.” Environmental Science & Technology 43 (7) (April): 2456–2462. doi:10.1021/es801564u. http://pubs.acs.org/doi/abs/10.1021/es801564u.

Kammen, Daniel M, Robert Bailis, and Antonia Herzog. 2002. “Clean Energy for Development and Economic Growth: Biomass and Other Renewable Energy Options to Meet Energy and Development Needs in Poor Nations”. New York: UNDP and Government of Morocco.

Levine, David I, Theresa Beltramo, Garrick Blalock, and Carolyn Cotterman. 2013. “What Impedes Efficient Adoption of Products? Evidence from Randomized Variation in Sales Offers for Improved Cookstoves in Uganda”. Mimeo.

Lim, Stephen S, Theo Vos, Abraham D Flaxman, Goodarz Danaei, Kenji Shibuya, Heather Adair-‐Rohani, Markus Amann, et al. 2012. “A Comparative Risk Assessment of Burden of Disease and Injury Attributable to 67 Risk Factors and Risk Factor Clusters in 21 Regions, 1990-‐2010: A Systematic Analysis for the Global Burden of Disease Study 2010.” Lancet 380 (9859) (December 15): 2224–60.

Martin II, William J, Roger I Glass, John M Balbus, and Francis S Collins. 2011. “A Major Environmental Cause of Death.” Science 334: 180–181.

21

Masera, Omar R, Barbara D Saatkamp, and Daniel M Kammen. 2000. “From Linear Fuel Switching to Multiple Cooking Strategies: A Critique and Alternative to the Energy Ladder Model.” World Development 28 (12) (December): 2083–2103. doi:10.1016/S0305-‐750X(00)00076-‐0. http://linkinghub.elsevier.com/retrieve/pii/S0305750X00000760.

McCracken, John, J Schwartz, M Mittleman, L Ryan, A Diaz Artiga, and Kirk R Smith. 2007. “Biomass Smoke Exposure and Acute Lower Respiratory Infections Among Guatemalan Children.” Epidemiology 18 (5): S185.

Mukhopadhyay, Rupak, Sankar Sambandam, Ajay Pillarisetti, Darby Jack, Krishnendu Mukhopadhyay, Kalpana Balakrishnan, Mayur Vaswani, et al. 2012. “Cooking Practices, Air Quality, and the Acceptability of Advanced Cookstoves in Haryana, India: An Exploratory Study to Inform Large-‐Scale Interventions.” Global Health Action 5: 19016.

Ramanathan, V., and G. Carmichael. 2008. “Global and Regional Climate Changes due to Black Carbon.” Nature Geoscience 1 (4): 221–227.

Ruiz-‐Mercado, Ilse, Eduardo Canuz, and Kirk R. Smith. 2012. “Temperature Dataloggers as Stove Use Monitors (SUMs): Field Methods and Signal Analysis.” Biomass and Bioenergy 47: 459–468. doi:10.1016/j.biombioe.2012.09.003.

Ruiz-‐Mercado, Ilse, Eduardo Canuz, Joan L. Walker, and Kirk R. Smith. 2013. “Quantitative Metrics of Stove Adoption Using Stove Use Monitors (SUMs).” Biomass and Bioenergy 57 (August): 136–148. doi:10.1016/j.biombioe.2013.07.002. http://linkinghub.elsevier.com/retrieve/pii/S096195341300319X.

Ruiz-‐Mercado, Ilse, Nick L Lam, Eduardo Canuz, Gilberto Davila, and Kirk R Smith. 2008. “Low-‐Cost Temperature Loggers as Stove Use Monitors (SUMs).” Boiling Point 55: 16–18.

Smith, Kirk R, Nigel Bruce, Byron Arana, Alisa Jenny, Asheena Khalakdina, John McCracken, Anaite Diaz, Morten Schei, and Sandy Gove. 2006. “Conducting The First Randomized Control Trial of Acute Lower Respiratory Infections and Indoor Air Pollution: Description of Process and Methods.” Epidemiology 17 (6): S44.

Smith, Kirk R, John P McCracken, Lisa Thompson, Rufus Edwards, Kyra N Shields, Eduardo Canuz, and Nigel Bruce. 2010. “Personal Child and Mother Carbon Monoxide Exposures and Kitchen Levels: Methods and Results from a Randomized Trial of Woodfired Chimney Cookstoves in Guatemala (RESPIRE).” Journal of Exposure Science and Environmental Epidemiology 20 (5) (July): 406–16. doi:10.1038/jes.2009.30. http://www.ncbi.nlm.nih.gov/pubmed/19536077.

22

Smith, Kirk R, John P McCracken, Martin W Weber, Alan Hubbard, Alisa Jenny, Lisa M Thompson, John Balmes, Anaité Diaz, Byron Arana, and Nigel Bruce. 2011. “Effect of Reduction in Household Air Pollution on Childhood Pneumonia in Guatemala (RESPIRE): A Randomised Controlled Trial.” Lancet 378 (9804) (November 12): 1717–26. doi:10.1016/S0140-‐6736(11)60921-‐5. http://www.ncbi.nlm.nih.gov/pubmed/22078686.

Smith-‐Sivertsen, Tone, Esperanza Díaz, Dan Pope, Rolv T Lie, Anaite Díaz, John McCracken, Per Bakke, Byron Arana, Kirk R Smith, and Nigel Bruce. 2009. “Effect of Reducing Indoor Air Pollution on Women’s Respiratory Symptoms and Lung Function: The RESPIRE Randomized Trial, Guatemala.” American Journal of Epidemiology 170 (2) (July 15): 211–20. doi:10.1093/aje/kwp100. http://www.ncbi.nlm.nih.gov/pubmed/19443665.

Figure 1Photograph of stove use monitoring system (SUMs)

Source: http://www.berkeleyair.com/products-and-services/instrument-services/78-sums

Figure 2Map of Uganda with Mbarara district highlighted

1

Figure 3SUM holder designed to encase the stove use monitor to protect it from malfunctions when exceeding

temperatures of 85 degrees Celsius

Figure 4Arrows mark the placement of SUMs on three stone fire and Envirofit

(a) Three Stone Fire (b) Envirofit

2

Figure 5Temperature readings on a three stone fire, Envirofit, and ambient air in one household over 20 days

16jun2012 22jun2012 04jul2012Date

28jun2012

(a) Three Stone Fire

16jun2012 22jun2012 04jul2012Date

28jun2012

(b) Envirofit

Figure 6Distribution of hourly ambient temperatures

1520

2530

3540

SUM

s Te

mpe

ratu

re, [

C]

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

when time is formatted 00:00-23:59Box Plot of Ambient Temperature by Hour of the Day

Hour of Day

Box and whisker plot: The bottom and top of each box represent the first and the third quartiles of the hourly ambient temperaturedistribution, the horizontal band inside each box represents the median hourly ambient temperature. The end of the whiskers marked with a

horizontal line represent the lowest datum within 1.5 times the interquartile range of the lower quartile and the highest datum within 1.5 timesthe interquartile range of the upper quartile. The dots represent individual observations that are outliers outside of this range.

3

Figure 7Comparison of predicted probability of stove in use: full verses trimmed sample

0.2

.4.6

.81

Prob

abilit

y St

ove

in U

se

20 30 40 50Temperature [C]

TSF (full sample) TSF (trimmed sample)

(a) Three Stone Fire

0.2

.4.6

.81

Prob

abilit

y St

ove

in U

se

20 30 40 50Temperature [C]

ENV (full sample) ENV (trimmed sample)

(b) Envirofit

Note: The trimmed sample dropped observations when observed as ‘lit’ and more than two degrees below hourly mean air temperature, or ifobserved as ‘not lit’ but more than two degrees above the hourly maximum air temperature.

4

Table 1Temperature readings, days, stoves observed by stove type

STOVE TYPE READINGS DAYS STOVESEnvirofit 1st 432,033 8,945 153Envirofit 2nd 137,413 2,974 107Three Stone Fire 1st 708,048 14,081 168Three Stone Fire 2nd 417,149 8,204 152Totals 1,694,643 34,204 580Note: data collected March - September 2012

Table 2Summary statistics: observations of stove usage

Observations Mean Std. Dev. Min. Max. NTemp [C] when three stone fire observed as off 30.17 7.27 19.5 85 1339Temp [C] when three stone fire observed as on 38.30 10.63 20 85 329Temp [C] when Envirofit observed as off 26.17 6.34 18.5 76 704Temp [C] when Envirofit observed as on 41.52 11.7 20 69.5 78

Observations - Trimmed Sample* Mean Std. Dev. Min. Max. NTemp [C] when three stone fire observed as off 28.21 4.60 19.5 41.5 1169Temp [C] when three stone fire observed as on 38.97 10.36 22 85 315Temp [C] when Envirofit observed as off 25.27 3.86 18.5 41 678Temp [C] when Envirofit observed as on 42.04 11.39 20 69.5 76

Note*: In the trimmed sample, observations are dropped if ‘lit’ and more than two degrees below hourly meanair temperature, or if observed as ‘not lit’ but more than two degrees above the hourly maximum air temperature.The trimmed sample removes 11.0% (184 out of 1668) and 3.6% (28 out of 782) of the observation samples for thethree stone fire and Envirofits, respectively.

5

Table 3Probability (odds ratio) that three stone fire (TSF) or Envirofit (ENV) is lit: full versus trimmed samples

(1) (2) (3) (4)TSF TSF trimmed ENV ENV trimmed

SUMs Temperature [C], t 1.13*** 1.37*** 1.08 1.46***(0.03) (0.06) (0.06) (0.13)

SUMs Temperature [C], t-1 0.99 0.95 0.99 1.09(0.06) (0.07) (0.06) (0.08)

SUMs Temperature [C], t+1 0.98 0.98 1.16*** 1.24***(0.03) (0.04) (0.05) (0.07)

SUMs Temperature [C], t-2 0.95 1.03 1.05 0.94(0.05) (0.07) (0.05) (0.05)

SUMs Temperature [C], t+2 1.06** 1.07** 0.96 0.90**(0.03) (0.03) (0.03) (0.05)

Control: Hour of Day Yes Yes Yes Yes

Observations 1,184 1,052 366 356Pseudo R-squared 0.167 0.403 0.445 0.696Correctly Classified 83.11% 89.26% 90.71% 93.82%

Robust standard error exponentiated form in parentheses*** p<0.01, ** p<0.05, * p<0.1

Note: Columns (1) and (3) use the full observation sample. Columns (2) and (4) are trimmed foroutliers, meaning observations are dropped if ‘lit’ and more than two degrees below hourly meanair temperature, or if observed as ‘not lit’ but more than two degrees above the hourly maximumair temperature.

Note: Standard errors clustered at stove level.

Table 4Candidate regression specifications: all specifications include temperature at time [t] and control for

hour of the day

Three Stone Fire EnvirofitSpec Leads Lags Slope2 if

Slope>0Pseudo R2 AIC % Correctly

PredictedN Pseudo R2 AIC % Correctly

PredictedN

(1) 0 0 - 0.37 984.01 87.49 1479 0.60 212.05 96.05 735(2) 1 1 - 0.41 649.04 89.16 1052 0.68 120.29 94.94 356(3) 1 1 Yes 0.42 649.44 89.07 1052 0.69 121.55 95.22 356(4) 2 2 - 0.42 648.29 89.26 1052 0.70 120.14 93.82 356(5) 2 2 Yes 0.42 648.49 89.35 1052 0.70 121.61 94.10 356(6) 3 3 - 0.42 650.87 89.15 1051 0.72 117.33 94.93 355(7) 3 3 Yes 0.42 651.19 89.34 1051 0.72 118.49 94.93 355(8) 4 4 - 0.42 653.48 89.05 1050 0.73 116.33 94.93 355(9) 4 0 - 0.39 865.75 86.98 1260 0.69 149.97 94.57 516(10) 0 4 - 0.40 772.18 89.36 1269 0.61 183.79 95.30 574

Note: Classified as correctly predicted if predicted Pr(D) >= 0.5: True D defined when stove use observed as ‘on’

Note: Control for hour of the day is a dummy variable for each hour of the day stoves were observed in use.

Akaike information criterion: defined as AIC = 2k-2ln(L), where k is number of parameters and L is the maximized value of thelikelihood function. Given a set of candidate models for a set of data, the preferred model is the one with the minimum AIC.

6

Table 5Out of sample predictive accuracy: three stone fire and envirofit classification tables for in-sample

predictions and out-of-sample predictions

Three Stone Fire Envirofit

In-Sample Out-Sample In-Sample Out-SampleSensitivity: classified as ‘on’ when observed as ‘on’ 62.04 49.07 80.00 67.86Specificity: classified as ‘off’ when observed as ‘off’ 97.09 97.45 98.65 95.09Positive predictive value: observed ‘on’ when classifed ‘on’ 84.81 82.81 92.31 70.37Negative predictive value: observed ‘off’ when classified ‘off’ 90.70 88.45 96.05 94.51False positive rate given observed as ‘off’ 2.91 2.55 1.35 4.91False negative rate given observed as ‘on’ 37.96 50.93 20.00 32.14False positive rate given classified as ‘on’ 15.19 17.19 7.69 29.63False negative rate given classified as ‘off’ 9.30 11.55 3.95 5.49Overall correctly classified (percentage) 89.81 87.78 95.51 91.10Sample size 520 540 178 191

Note: Unless noted cell contents to be read as probabilities out of one hundredNote: Classified as ‘on’ if predicted probability >= 0.5Note: For this exercise, the samples of observations are randomly split into two groups, the logistic regression is run on one of the twogroups, then based on the coefficients of that regression, the prediction is made to the other (out-of-sample) group. Running a classificationtable on the in-sample and out-of-sample groups shows very little difference between the two samples, suggesting the technique is robustout-of-sample. When the samples were randomly split in half the groups were equally sized (three stone fire groups sized 742/742 andEnvirofit groups sized 377/377), however the sample size of the groups presented in this table is different because various observations dropout of the regression if lacking temperature data for any of the leads and lags. The observation sample is trimmed for outliers, observationsare dropped if ‘lit’ and more than two degrees below hourly mean air temperature, or if observed as ‘not lit’ but more than two degreesabove the hourly maximum air temperature.

Table 6How well do temperature readings generated from stove use monitors predict observations of stove use

Three Stone Fire Envirofit

Sensitivity: classified as ‘on’ when observed as ‘on’ 56.54 77.59Specificity: classified as ‘off’ when observed as ‘off’ 97.61 96.98Positive predictive value: observed ‘on’ when classifed ‘on’ 85.82 83.33Negative predictive value: observed ‘off’ when classified ‘off’ 89.79 95.70False positive rate given observed as ‘off’ 2.39 3.02False negative rate given observed as ‘on’ 43.46 22.41False positive rate given classified as ‘on’ 14.18 16.67False negative rate given classified as ‘off’ 10.21 4.30Overall correctly classified (percentage) 89.26 93.82Sample size 1052 356

Note: Unless noted cell contents to be read as probabilities out of one hundredNote: Classified as ‘on’ if predicted probability >= 0.5Note: The observation sample is trimmed for outliers, observations are dropped if ‘lit’ and more than two degreesbelow hourly mean air temperature, or if observed as ‘not lit’ but more than two degrees above the hourly maximum airtemperature.

7

Table 7How well do temperature readings generated from stove use monitors predict observations of three stone

fires in use: classification tables comparing algorithms

Three Stone Fire

Over 32 Over 34 Over 36 Over 38 RMM adj RMM LogitSensitivity: classified as ‘on’ when obs as ‘on’ 53.0 44.1 36.8 31.4 40.0 41.3 56.5Specificity: classified as ‘off’ when obs as ‘off’ 83.2 92.0 98.0 99.1 64.0 87.5 97.6Positive predictive: obs ‘on’ when classifed ‘on’ 46.0 59.9 83.5 90.8 23.0 47.1 85.8Negative predictive: obs ‘off’ when classified ‘off’ 86.8 85.9 85.2 84.3 79.8 84.7 89.8False positive rate given observed as ‘off’ 16.8 8.0 2.0 0.9 36.0 12.5 2.4False negative rate given observed as ‘on’ 47.0 55.9 63.2 68.6 60.0 58.7 43.5False positive rate given classified as ‘on’ 54.0 40.1 16.5 9.2 77.0 52.9 14.2False negative rate given classified as ‘off’ 13.2 14.1 14.8 15.7 20.2 15.3 10.2Overall correctly classified (percentage) 76.8 81.9 85.0 84.8 58.9 77.7 89.3Pseudo R-squared 10.3 13.4 18.4 17.7 0.1 7.7 41.8

Note: Each column represents a logistic regression with the trimmed sample of observations as the dependent variable and the column header asthe independent variable, cell contents are classified as ‘on’ if predicted probability >= 0.5, the psuedo R-squared is reported for each regression.As an example, the first column is the observation sample regressed on a binary variable registering ‘on’ when the temperature is greater than 32Celsius, and ‘off’ when 32 Celsius or below. The same is true for the ‘Over 34,’ ‘Over 36,’ and ‘Over 38’ columns. The column titled ‘RMM’ usesthe ‘on’/‘off’ status as determined by the Ruiz-Mercado-Mukhopadhyay et al. algorithm as the independent variable for the logistic regression.The column titled ‘adj RMM’ uses the ‘on’/‘off’ status as determined by the adjusted Ruiz-Mercado-Mukhopadhyay et al. algorithm (allowingthe stove to remain ‘on’ if temperatures are very hot even if an exit slope has triggered, and switching ‘off’ if cooling and approaching ambienttemperature even if exit slope not triggered) as the independent variable. The column titled ‘Logit’ uses the logistic regression developed in thispaper where the observations of stove use are regressed against temperature at time t, and two lead and lag temperature readings, and the hourof day. In cases where the regression does not run due to insufficient variability or colinearity, the classification tables are calculated manually,and the pseudo R-squared is reported as n/a. All cells are to be read as probabilities out of one hundred.Note: The observation sample is trimmed for outliers, observations are dropped if ‘lit’ and more than two degrees below hourly mean airtemperature, or if observed as ‘not lit’ but more than two degrees above the hourly maximum air temperature.

Table 8How well do temperature readings generated from stove use monitors predict visual observations of

Envirofit use: classification tables comparing algorithms

Envirofit

Over 32 Over 34 Over 36 Over 38 RMM adj RMM LogitSensitivity: classified as ‘on’ when obs as ‘on’ 57.9 52.6 48.7 43.4 56.6 63.2 77.6Specificity: classified as ‘off’ when obs as ‘off’ 95.4 98.5 99.9 100.0 87.5 88.5 97.0Positive predictive: obs ‘on’ when classifed ‘on’ 58.7 80.0 97.4 100.0 33.6 38.1 83.3Negative predictive: obs ‘off’ when classified ‘off’ 95.3 94.9 94.6 94.0 94.7 95.5 95.7False positive rate given observed as ‘off’ 4.6 1.5 0.1 0.0 12.5 11.5 3.0False negative rate given observed as ‘on’ 42.1 47.4 51.3 56.6 43.4 36.8 22.4False positive rate given classified as ‘on’ 41.3 20.0 2.6 0.0 66.4 61.9 16.7False negative rate given classified as ‘off’ 4.7 5.1 5.4 6.0 5.3 4.5 4.3Overall correctly classified (percentage) 91.6 93.9 94.7 94.3 84.4 85.9 93.8Pseudo R-squared 27.0 32.2 36.7 n/a 14.4 19.6 69.6

Note: Each column represents a logistic regression with the trimmed sample of observations as the dependent variable and the column header asthe independent variable, cell contents are classified as ‘on’ if predicted probability >= 0.5, the psuedo R-squared is reported for each regression.As an example, the first column is the observation sample regressed on a binary variable registering ‘on’ when the temperature is greater than 32Celsius, and ‘off’ when 32 Celsius or below. The same is true for the ‘Over 34,’ ‘Over 36,’ and ‘Over 38’ columns. The column titled ‘RMM’ usesthe ‘on’/‘off’ status as determined by the Ruiz-Mercado-Mukhopadhyay et al. algorithm as the independent variable for the logistic regression.The column titled ‘adj RMM’ uses the ‘on’/‘off’ status as determined by the adjusted Ruiz-Mercado-Mukhopadhyay et al. algorithm (allowingthe stove to remain ‘on’ if temperatures are very hot even if an exit slope has triggered, and switching ‘off’ if cooling and approaching ambienttemperature even if exit slope not triggered) as the independent variable. The column titled ‘Logit’ uses the logistic regression developed in thispaper where the observations of stove use are regressed against temperature at time t, and two lead and lag temperature readings, and the hourof day. In cases where the regression does not run due to insufficient variability or colinearity, the classification tables are calculated manually,and the pseudo R-squared is reported as n/a. All cells are to be read as probabilities out of one hundred.Note: The observation sample is trimmed for outliers, observations are dropped if ‘lit’ and more than two degrees below hourly mean airtemperature, or if observed as ‘not lit’ but more than two degrees above the hourly maximum air temperature.

8

Table 9Summary statistics: predicted hours cooked per day for each three stone fire (TSF) and each Envirofit in

the time period one week prior to and after the approximately 72 hour long kitchen performance test(KPT)

Predicted Daily Hours Cooked

Primary TSF Secondary TSF Envirofit 1st Envirofit 2ndMean SD N Mean SD N Mean SD N Mean SD N

Week Before Final KPT 8.87*** 7.54 98 9.82 8.93 22 3.84** 3.50 105 2.65*** 2.84 88Final KPT Week 7.43 7.62 98 8.99 8.19 22 4.59 3.27 105 3.48 3.07 88

Final KPT Week 7.40 7.56 100 8.99 8.19 22 4.56 3.24 111 3.24 2.95 95Week After Final KPT 10.14*** 8.41 100 9.79 8.34 22 3.35*** 3.21 111 2.44*** 2.15 95

Note: Stove usage during KPT verses stove usage over the same duration and days of the week for the same stove types in the prior and following

weeks. We use paired t-tests relative to the measurement (KPT) week to examine statistical significance. The difference in use is statistically

different than zero at the following levels *** p<0.01, ** p<0.05, * p<0.10. The differences in hours cooked per day in the week prior to and

after the KPT are weakly statistically significantly different than zero for the primary three stone fire (p<0.10). There is no statistically significant

difference in daily hours cooked in the week prior to and after the KPT for neither the secondary three stone fire, the first Envirofit, nor the second

Envirofit.

Table 10Test for Hawthorne effect: primary three stone fire verses combined hours cooked on two Envirofits

during the approximately 72 hour long kitchen performance test (KPT) “week”

Predicted Daily Hours CookedBoth Envirofit Stoves Mean SD N

One Week Before Final KPT 6.08*** 3.91 48Final KPT Measurement Week 8.14 3.67 48One Week After Final KPT 5.60*** 3.05 48

Primary Three Stone Fire Mean SD NOne Week Before Final KPT 8.07* 7.39 48Final KPT Measurement Week 6.95 7.14 48One Week After Final KPT 10.45*** 8.40 48

Note: Stove usage during KPT verses stove usage over the same duration and days of the

week in the prior and following weeks. We use paired t-tests relative to the measurement

(KPT) week to examine statistical significance. The difference in use is statistically dif-

ferent than zero at the following levels *** p<0.01, ** p<0.05, * p<0.10. Each of the 48

households in this sample has SUMs readings for two Envirofit stoves and SUMs readings

for the primary three stone fire. The week prior to and after the KPT are not statistically

significantly different than zero for Envirofit, however they are statistically significantly

different than zero for the three stone fire (at p<0.01).

9

Table 11Regressions testing for Hawthorne effect: estimates of effects of the presence of measurement team in

primary three stone fire (TSF) usage and combined Envirofit usage, the coefficients represent the changein hours cooked per day compared to hours cooked per day in the measurement week

Primary TSF Combined Envirofit(1) (2) (3) (4)

Week prior to and after measurement week 2.26*** -2.40***constrained to be equal (0.57) (0.46)

Week prior to measurement week 1.62** -2.02***(0.66) (0.56)

Week after measurement week 2.88*** -2.71***(0.76) (0.50)

Household fixed effects Yes Yes Yes YesObservations 316 316 229 229R-squared 0.82 0.82 0.79 0.80Household clusters 118 118 89 89

Robust standard errors in parentheses*** p<0.01, ** p<0.05, * p<0.1

Note: The unit of analysis is a measurement “week” (approximately 72 hours) at a household. Thespecification in columns 1 and 3 imposes that the weeks prior to and after the measurement week areequal. The specification in columns 2 and 4 tests usage in the week prior to and after the measurementweek separately. The coefficient estimates in columns 2 and 4 are both jointly significantly not equalto zero (joint F test significant at 1% level in both cases).

Note: Standard errors clustered at household level.

10

Documents

Household Responses to Food Subsidies: Evidence …cega.berkeley.edu/assets/cega_events/61/1D_Food_and_Agriculture_3... · Household Responses to Food Subsidies: Evidence from India