Preliminary Draft: Please do not cite without author’s permission.
Household Responses to Food Subsidies:
Evidence from India∗
Tara Kaul†
University of Maryland, College Park
January 8, 2014
Abstract
This paper uses household survey data to examine the effect of food subsidies on the nutritional outcomes of poor households in India. The national food security program, known as the Public Distribution System (PDS), takes the form of a monthly quota of cereals (rice and/or wheat) available for purchase at substantially discounted prices. The effect of the program is studied by exploiting the geographic and household size specific variations in the value of the subsidy that result from differences in state program rules and local market prices. The analysis focusses on the rice subsidy program and outcomes are consumption data from six years (2002-2008) of the NSSO Socio-Economic surveys. In agreement with other literature on food subsidies, this paper finds small elasticities for cereal consumption and caloric intake with respect to the value of the subsidy. However, households benefit from the program in terms of overall food intake and not just through cereals directly provided by the PDS. The elasticities for calories from all food groups are positive and significant. This is in contrast to studies on pure price subsidies for cereals that have found zero or negative effects on caloric intake. Thus, the results in this paper suggest that quotas may be more effective than price subsidies at improving nutrition. Taking into account state level differences in the functioning of the program, a substantially smaller effect is found in states that have higher levels of corruption. Finally, the estimates from this paper are used to simulate the caloric impact of changes in the PDS under the provisions of the National Food Security Bill of 2013.
JEL classification: I38; H42; O12; O15; Q18
∗I am grateful to Jessica Goldberg and Melissa Kearney for their invaluable guidance and support at every stage of this project. I thank Sebastian Galiani, Parikshit Ghosh and Susan Godlonton for helpful comments and discussions. I also thank Ron Chan, Daisy Dai, Sonal Desai, Brinda Karat, Reetika Khera, Giordano Palloni, Brian Quistorff, Miguel Sarzosa, Suraj Shekhar and seminar participants at the University of Maryland, Delhi School of Economics, the 2013 MEA Annual Meeting, the 2013 Midwest International Economic Development Conference and the 2013 APPAM Fall Research Conference for useful feedback. I gratefully acknowledge use of data from the National Sample Survey Office, New Delhi and the India Human Development Survey. All errors and omissions are my own.
†Email: [email protected]
1. Introduction
Provision of food security is one of the most basic forms of assistance to poor households
and many governments around the world provide aid in the form of food subsidies. Despite
their popularity and importance, there is a relatively small body of research studying the
impact of such subsidies on nutritional outcomes. While the primary rationale for all food
subsidies is reducing food insecurity (FAO 1997), their impact on nutrition (via food intake)
has generally been found to be quite small, in some cases even zero or negative.1
Food subsidies are implemented via food stamps, price subsidies, direct in-kind transfers
or through the operation of ration shops (price subsidies with a quantity cap). The mecha-
nism through which a food subsidy operates depends on the manner in which the program
affects income and prices (and the related elasticities of the demand for food), the specific
types of food groups it targets (and the income and cross price elasticities of other foods),
and how close the beneficiary households are to their ideal level of food consumption. This
paper examines India’s food security program, the Public Distribution System (PDS). The
PDS is one of the country’s biggest anti-poverty programs. It has a target population of
65.2 million households and the total cost of food subsidies amounts to almost 1% of GDP
(Planning Commission 2012).
1Butler, Ohls and Posner (1985) find a very limited nutritional impact of the food stamps program in the United States. The Indian food security program has also historically been found to have a small impact on nutrition (Kochar 2005, Tarozzi 2005, Khera 2011c). Jensen and Miller (2011) find a negative impact of price subsidies on caloric intake in China. There is also a literature comparing the effects of food subsidies and income transfers. Hoynes and Schanzenbach (2009) find that the food stamps program in the United States has the same impact on food expenditure as cash. Steifel and Alderman (2006) and Laderchi (2001) find similar results in Peru where the impact of food subsidies on child nutrition is no different than that of cash transfers.
The PDS provides a monthly quota of cereals (20-35 kg per household) at substantially
discounted prices to households that are below the poverty line (Government of India 2011).
The poverty line in India is defined as the level of income deemed adequate to purchase a
basic minimum level of calories.2 Since cereals are relatively cheap and rich in energy, they
account for a substantial proportion of the total calories consumed by poor households.3
Thus the PDS aims to directly increase caloric intake by providing supplementary cereals to
poor households, who are, by assumption, food insecure. The program may, however, have
consequences in terms of consumption of other food groups which are critical to maintaining
a healthy and balanced diet. This is particularly important in India where the levels of child
malnutrition are alarmingly high and have been linked to severe micronutrient deficiencies
(Banerjee and Duflo 2011). Thus it is necessary to study the impact of the program not
just on the intake of cereals and total calories, but also on calories from different food groups.4
The value of the subsidy to a household is a function of the quantity cap and the discount
(difference between the market and PDS price of cereals) that the household is eligible for.
This paper uses two previously unexploited sources of variation in the value of the subsidy to
measure its impact on nutritional outcomes. First, the total quantity cap for each household
2Deaton and Dreze (2009) find that the average caloric intake in India has declined over time. However, they note that this declining trend does not undermine the importance of reducing calorie deficiencies for poor households. In my sample, the caloric intake of the below poverty line (BPL) population is well below the minimum caloric norms (and the calories consumed by more affluent households). See Table 4 ahead.
3Cereals account for over 72% of the total calories consumed by BPL households in the sample under study (Table 4).
4While the links between improvements in caloric intake and health outcomes certainly exist, the relationship is not necessarily linear and depends critically on other factors such as a balanced diet, activity levels, sanitation, and basic medical facilities (Deaton and Dreze 2009). In the absence of data on health outcomes in my dataset, I am unable to predict the impact of the subsidy on health. However, assessing the impact on caloric intake is important in itself. Reducing the incidence of hunger by improving caloric intake for a food insecure population, such as BPL households in India, is among the major humanitarian goals of many governments and international organisations around the world.
varies by state since the program is jointly funded by the state and central governments.
Further, some states index the cap by family size and others offer a fixed quantity to every
household. This results in variation, even within state, in the per person quantity cap, due
to differences in family size. Second, the price charged for PDS goods is typically set for
the year and not linked in any way to fluctuations in market prices. Due to the absence of
perfectly integrated agricultural markets and controls on the movement of goods within the
country, the difference between the PDS and market price within a year varies substantially
across districts (within a state) and seasons.
State level program rules and differences in local market prices are used to compute the
value of the subsidy for each beneficiary household, based on its size and district-season-year
cell. This value is used to identify the effect of the subsidy, assuming that after controlling
for household characteristics, district, state-year and seasonal effects, the remaining variation
in the value of the subsidy (owing to unpredictable fluctuations in local market prices and
differences in the per person quantity cap due to family size) is exogenous. The consumption
data come from six years (2002-2008) of the nationally representative socio-economic surveys
conducted by the National Sample Survey Organisation (NSSO). The analysis focusses on
eight major states in India where rice is the primary staple. The main outcomes studied
are cereal consumption, caloric intake, calories from different food groups, and total food
expenditure. In order to compare the estimates with prior work on expenditure and income
elasticities, supplementary data on income and consumption from a smaller but much richer
dataset, the India Human Development Survey (2004-05), is used.
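As a rough illustration of the calculation described above, the sketch below computes a per-person monthly subsidy value from a quantity cap and the gap between the market and PDS price. It is not the paper's actual procedure: the function name, the two quota rules and all prices are hypothetical, and the only purpose is to show why a fixed household cap makes the per-person value fall with family size while a cap indexed to family size does not.

```python
def subsidy_value_per_person(market_price, pds_price, family_size,
                             fixed_quota_kg=None, per_person_quota_kg=None):
    """Monthly rupee value of the subsidy per household member.

    Exactly one quota rule must be given (both rules are illustrative):
      fixed_quota_kg      -- the same cap for every household
      per_person_quota_kg -- a cap indexed to family size
    """
    if family_size <= 0:
        raise ValueError("family_size must be positive")
    if (fixed_quota_kg is None) == (per_person_quota_kg is None):
        raise ValueError("give exactly one quota rule")
    if fixed_quota_kg is not None:
        quota_kg = fixed_quota_kg
    else:
        quota_kg = per_person_quota_kg * family_size
    # Assumption: a PDS price above the market price is treated as zero subsidy.
    discount = max(market_price - pds_price, 0.0)
    return discount * quota_kg / family_size


# Hypothetical prices (Rs per kg) and quotas, chosen only for illustration.
print(subsidy_value_per_person(12.0, 5.0, family_size=5, fixed_quota_kg=35.0))      # 49.0
print(subsidy_value_per_person(12.0, 5.0, family_size=7, fixed_quota_kg=35.0))      # 35.0
print(subsidy_value_per_person(12.0, 5.0, family_size=7, per_person_quota_kg=4.0))  # 28.0
```

Under the fixed cap, the larger family receives a smaller per-person value at identical prices, which is the within-state variation by family size that the identification strategy relies on.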
The estimated elasticities of cereal consumption and calories with respect to the value of
the subsidy are small, but higher than previous estimates – the elasticity of caloric intake
with respect to the value of the subsidy is 0.144, as compared to 0.06 in Kochar (2005).
An increase in the rupee value of the subsidy increases calories by more than twice what
is implied by its impact on cereal consumption alone. This is an important indicator that
households benefit from the program in terms of overall caloric intake and not just through
the cereals directly provided by the PDS. Even though the program subsidizes only cereals,
it has a positive effect on the consumption of all types of food: the impact on calories for
all food groups is positive and significant. In contrast to Jensen and Miller (2011), who find
a zero or negative effect of pure price subsidies on overall calories and calories from differ-
ent food groups, the results in this paper suggest that quotas may be more effective than
price subsidies in improving nutrition via caloric intake. Results are robust to alternative
specifications of the value of the subsidy. The impact of the program on non-beneficiary
households is also examined and found to be small.
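To make the size of these elasticities concrete, the following sketch applies the usual first-order reading of a constant elasticity: a given percentage change in the value of the subsidy changes calories by roughly the elasticity times that percentage. The baseline of 2,000 kcal and the 10% change are hypothetical inputs chosen for illustration, not figures from the data, and the approximation is only reasonable for small changes.

```python
def implied_calorie_change(baseline_kcal, pct_change_in_subsidy, elasticity):
    """First-order change in calories implied by a constant elasticity."""
    return baseline_kcal * elasticity * pct_change_in_subsidy / 100.0


BASELINE_KCAL = 2000.0  # hypothetical baseline, not taken from the paper

# Elasticities as reported in the text: 0.144 (this paper) and 0.06 (Kochar 2005).
for label, elasticity in [("this paper", 0.144), ("Kochar (2005)", 0.06)]:
    change = implied_calorie_change(BASELINE_KCAL, 10.0, elasticity)
    print(label, round(change, 1))
```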
State level differences in the administration of the PDS are taken into account using ev-
idence on program fidelity from Khera (2011a). As expected, the subsidy has a substantially
smaller impact in those states where the illegal diversion of food grains away from their in-
tended beneficiaries is reported to be high. Despite serious implementation issues, the PDS
continues to be one of the government’s biggest anti-poverty programs and an expansion of
the system is imminent under the National Food Security Bill, which was passed in September
2013. The bill will expand both the reach and the value of subsidies provided through
the PDS. The elasticity estimates from this paper, combined with current data on prices,
suggest that the increase in the value of the subsidy under the provisions of the bill will
lead to an increase of 66 - 72 kcal in the daily caloric intake of current beneficiaries of the
program. These gains will be much higher if states are able to successfully reduce leakages
from the PDS.
The paper is organized as follows: Section 2 describes the functioning, history and rules
of the Public Distribution System in India. The motivation is described in section 3, which
comprises the conceptual framework and a review of the related literature. The data, empir-
ical specification and descriptive statistics are presented in section 4 and section 5 discusses
the results. The final section concludes and describes avenues for future work.
2. The Public Distribution System in India
The Public Distribution System was established in India in 1939 by the then British govern-
ment to cope with rising food prices and food shortages in Bombay and other urban areas.
Over the years it expanded its scope dramatically and is currently one of the Indian
government's most significant welfare programs. In the 2012-13 central government budget, the
food subsidy was projected at Rs 750 billion (approximately US$13.8 billion, Government
of India 2012). The PDS provides a minimum support price to farmers and acts as a food
safety net for the rural and urban poor. It works alongside the free market and provides
rice, wheat, edible oils, sugar and kerosene at subsidized prices through 489,000 fair price
shops across the country (Planning Commission 2012).
Prior to 1997, any household could purchase a monthly quota of subsidized products at
fair price shops. In 1997, there was a major change in the PDS with the introduction of the
Targeted Public Distribution System (TPDS). Under the new system, families classified as
being below the poverty line (BPL) could get 10 kg of food grains per month at half the
economic cost to the central government of procuring them. This quota was increased in
2000 to 20 kg per month (rice and/or wheat) and in 2002, the quota was revised upwards to
35 kg for most states. The subsidy for above poverty line (APL) families was eliminated.5
Table 1 presents the details of state specific quotas for the period 2002-2008. Kochar (2005)
estimates that the value of the subsidy, defined as the product of the quantity entitlement
and the difference between the market and PDS price, increased from Rs 7 per household
per month in 1993 to Rs 48 per household per month in 2000 (an increase of over 500%).
Thus the TPDS introduced targeting and substantially increased the subsidy for BPL fam-
ilies. Currently, PDS prices are fixed by the central government and state governments are
only allowed to add limited transportation costs and taxes. PDS prices are not indexed in
any way to market prices. 24.3 million households classified as Antyodaya (poorest of the
poor) are entitled to a larger quota and still lower prices. The next big change in the PDS
is imminent as per the provisions of the National Food Security Bill, which was passed in
September 2013. This bill makes it a legal right for 67% of the population to obtain 5 kg of
foodgrains (per person per month) at prices between Re 1 and Rs 3 per kilogram. The eldest
woman in the household will be given access to the subsidy on behalf of her family (The
Hindu, July 2013). The resulting expansion in the reach and value of subsidies provided
through the PDS could make it the largest food security program in the world.
The PDS is an essential part of the social framework in many areas of the country, particu-
larly places with limited access to markets. However, the implementation of the program has
5Tamil Nadu continues to have a universal subsidy and, since June 2011, provides 20 kg of rice per month free of cost to BPL households.
been called into question on a number of dimensions.6 The first is the targeting accuracy of
the PDS post 1997. Local governments are responsible for periodically carrying out surveys
to assign poverty scores to households. The central government provides a cap on the number
of BPL households for each state. State governments translate these caps into cut offs for
the poverty score. All households that fall below their respective state- and district-specific
cut off are classified as being below the poverty line and are issued BPL cards (Karat 2013
and Dreze and Khera 2010). In 2002, the BPL survey comprised 13 questions related to
educational status, asset holdings, indebtedness, demographics etc. Targeting leads to er-
rors of inclusion and exclusion. The reported rate of exclusion (eligible households without
a BPL card) varies from 3% in Andhra Pradesh to 47% in Assam (Planning Commission
2005). In an independent study on Rajasthan, Khera (2008) finds an exclusion error of 44%.
Swaminathan (2008) also reports high rates of exclusion, particularly in states like Kerala
that switched from a universal PDS. Kochar (2005) argues that targeting reduced political
support for the program.
The second issue is corruption in the PDS, particularly through diversion of food grains
at different points along the distribution chain. This has often been highlighted in newspaper
editorials, magazines, government reports and journals, though accurate estimates are
difficult to obtain. The Planning Commission (2005) estimates that the government of In-
dia spends Rs 3.65 to get Re 1 worth of subsidy to a BPL household. Based on a survey
conducted in 2001, the report concluded that 58% of food grains procured by PDS did not
reach their intended beneficiaries due to a combination of inaccurate targeting and diver-
sion. Khera (2011a) puts this number at 44% in 2007-08 by comparing official figures for
6A discussion of how the analysis deals with these implementation issues is presented in sections 3.3 and 5.5.
food grains distributed through the PDS, with data on purchases from household surveys.
She divides states into those that are functioning, reforming, or languishing in terms of the
average quantity of food grains reaching households. Of late, significant state level differ-
ences in the working of the PDS have emerged and some states are improving delivery. A
nine state survey by Khera (2011b) conducted in May and June 2011 finds that over 85%
of the monthly entitlement was received by beneficiaries in these states. In November 2012,
the Indian government announced that a number of subsidy programs such as scholarships,
cooking fuel subsidies, pensions and unemployment benefits would be converted into direct
cash transfers in a phased manner starting in January 2013. This move is an attempt to
reduce inefficiencies and corruption in the implementation of various welfare schemes. Food
grains in the PDS are not yet a part of the proposed switch but some states, such as Bihar,
Madhya Pradesh and Delhi, are conducting pilot studies and have expressed an interest in
making that change.
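The comparison of official distribution figures with survey-reported purchases mentioned above amounts to a simple ratio. The sketch below is an assumption about the arithmetic, not a reproduction of Khera's (2011a) method, and the tonnage figures are hypothetical values picked so that the result matches the 44% quoted in the text.

```python
def diversion_share(official_offtake, household_purchases):
    """Share of officially distributed grain not reported as purchased by households."""
    if official_offtake <= 0:
        raise ValueError("official_offtake must be positive")
    return 1.0 - household_purchases / official_offtake


# Hypothetical quantities (same units for both arguments).
print(round(diversion_share(100.0, 56.0), 2))  # 0.44

# Rs 3.65 spent per Re 1 of subsidy delivered implies this share reaches households:
print(round(1.0 / 3.65, 3))  # 0.274
```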
3. Motivation
3.1 Conceptual Framework
A price discount with a quantity cap is typically represented by an outward shift of the
budget constraint. Figure 1 presents this shift in the household budget set with food on the
horizontal axis and the rupee value of non-food consumption on the vertical axis. Let the
price of non food purchases be normalized to Re 1. A subsidy comprising a discount of δ and
a quota Q shifts the budget set from NF to NCD. As demonstrated in Moffitt (1989) and
Deaton (1984), the household will choose segment I if its indifference curve is tangent at any
point (say, A) on the segment CD and a similar argument holds for tangency on segment
II. If tangency is not achieved on either segment, the household will choose the kink C. The
food subsidy program in India varies geographically and by family size in the amount of the
quota, the discount and the resulting value of the subsidy. Following Kochar (2005) and
Khera (2011c), these program rules are presented below by slightly modifying the standard
utility maximization problem.
Let the utility function for household $i$ be $U_i = f((1-\alpha)x_i + y_i, z_i; T_i)$, where $x_i$
represents the subsidized food, $y_i$ represents food purchased from the market, $z_i$ is non-food
purchases and $T_i$ denotes tastes which could depend on age, gender, composition and other
household characteristics. The subsidized and market food purchases are substitutes and
$\alpha$ ($0 \le \alpha \le 1$) represents any (multiplicative) transaction costs associated with buying the
former. Let $p_x$, $p_y$ and $p_z$ be the prices faced by the household where $p_x = (1-\delta)p_y$ with
$\delta$ being the discount ($0 \le \delta \le 1$). Let $Q$ be the quota and $M_i$ the household income. The
household’s maximization problem is:
$$\max_{x_i, y_i, z_i} f((1-\alpha)x_i + y_i, z_i; T_i)$$
subject to:
$$p_x x_i + p_y y_i + p_z z_i \le M_i$$
$$x_i \le Q$$
$$x_i \ge 0, \quad y_i \ge 0, \quad z_i \ge 0$$
The resulting Lagrangian is:
$$\mathcal{L} = \max_{x_i, y_i, z_i, \lambda, \gamma, \mu_1, \mu_2, \mu_3} f((1-\alpha)x_i + y_i, z_i; T_i) + \lambda(M_i - p_x x_i - p_y y_i - p_z z_i) + \gamma(Q - x_i) + \mu_1 x_i + \mu_2 y_i + \mu_3 z_i$$
Solving the first order conditions provides the demand functions for specific parameter
values. The solution $(x_i^*, y_i^*, z_i^*)$ will belong to one of the following four cases.
Case 1: $x_i^* = 0$, $y_i^* \ge 0$, $z_i^* \ge 0$
$\alpha > \delta$, $\frac{MU_y}{MU_z} = \frac{p_y}{p_z}$ (First order conditions)
Transaction costs (α) far outweigh the discount benefit (δ) and the household does not make
any purchases from the PDS (i.e. non participation).
Case 2: $0 < x_i^* < Q$, $y_i^* = 0$, $z_i^* \ge 0$
$\alpha < \delta$, $\frac{MU_x}{MU_z} = \frac{p_x}{p_z}$ (First order conditions)
The household meets its entire food requirement through the PDS and does not need to
make any purchases from the market. This case is represented by point B in figure 1.
Case 3: $x_i^* = Q$, $y_i^* = 0$, $z_i^* \ge 0$
$\alpha < \delta$, $\frac{MU_x}{MU_z} > \frac{p_x}{p_z}$, $\frac{MU_y}{MU_z} < \frac{p_y}{p_z}$ (First order conditions)
The household would like to purchase more food at the discounted rate, but not at the mar-
ket price. This case refers to point C in figure 1, i.e. locating exactly at the kink.
Case 4: $x_i^* = Q$, $y_i^* > 0$, $z_i^* \ge 0$
$\alpha < \delta$, $\frac{MU_y}{MU_z} = \frac{p_y}{p_z}$ (First order conditions)
The quota is binding and the household supplements its food with market purchases. This is
represented by point A in figure 1. Note that voluntary under-purchase, where $0 < x_i^* < Q$
and $y_i^* > 0$, could take place if $\alpha = \delta$. However, this will never occur if there is even an
infinitely small ($\epsilon$) fixed cost associated with going to the fair price shop and thus,
under-purchase is likely driven by supply issues and collapses to case 4.
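The four cases, plus the special case of under-purchase, depend only on where the PDS purchase sits relative to the quota and on whether the household also buys food in the market. The sketch below classifies an observed bundle accordingly. The function name, the tolerance and the example quantities are hypothetical; this is an illustration of the taxonomy, not a step in the paper's estimation.

```python
def classify_case(pds_qty, market_qty, quota, tol=1e-9):
    """Map observed purchases (x, y) and the quota Q to the cases in the text."""
    if pds_qty <= tol:
        return "case 1"            # x = 0: non-participation
    at_quota = pds_qty >= quota - tol
    if market_qty <= tol:
        # y = 0: all food comes from the PDS
        return "case 3" if at_quota else "case 2"
    if at_quota:
        return "case 4"            # x = Q and y > 0: quota binds
    return "under-purchase"        # 0 < x < Q and y > 0


# Hypothetical monthly quantities in kg with a 35 kg quota.
print(classify_case(0.0, 40.0, 35.0))   # case 1
print(classify_case(20.0, 0.0, 35.0))   # case 2
print(classify_case(35.0, 0.0, 35.0))   # case 3
print(classify_case(35.0, 12.0, 35.0))  # case 4
print(classify_case(20.0, 12.0, 35.0))  # under-purchase
```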
3.2 Expected Effect of the PDS
Post 1997, both the discount and the quantity cap in the PDS for BPL families were sub-
stantially increased. Khera (2011c) finds that participation in the program has dramatically
risen since the early 2000’s. She confirms that the conditions for case 1, where eligible house-
holds make no purchases from the PDS, are unlikely to be realized. Cases 2 and 3, where
households make purchases from the PDS but none from the market are also unlikely to
be relevant. The PDS is intended as a supplementary program and does not seek to meet
the entire food requirement of beneficiary households (Government of India, 2011). Kochar
(2005) and Khera (2011c) find that the quota is low enough to necessitate supplementary
purchases of food from the market for all households. The data from the NSSO surveys also
confirm that cases 1-3 are not supported. As shown ahead in section 4, all PDS beneficiary
households make food purchases in the market.7
The special case of households voluntarily under-purchasing from the PDS and buying addi-
tional food in the market is also not likely to hold. Khera (2011c) finds that the quantity of
7PDS rice expenditure accounts for less than 8% of the total food expenditure for a typical BPL household.
cereal purchased from the PDS does not respond to income, but overall cereal purchase does.
She also checks for the impact of other household characteristics on PDS purchases and finds
no effect. She concludes that under purchase is largely driven by supply side issues.8 Thus
based on program rules, past studies and summary statistics, the most relevant case is 4,
where the subsidy can be viewed as providing extra income to the household.9 This extra
income is given by the horizontal distance FD in figure 1: $S_i = (p_y - p_x)Q$.
For a low enough quota and transaction costs less than the discount, the quota will bind
and the household will purchase food and other items in the market. To see this, consider a
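Cobb-Douglas utility function of the form $f = ((1-\alpha)x_i + y_i)^{1/2} z_i^{1/2}$.10 The solution to the
household’s problem is:
$$y_i^* = \frac{M_i - Q(2-\alpha-\delta)p_y}{2p_y}, \quad z_i^* = \frac{M_i + Q(\delta-\alpha)p_y}{2p_z}, \quad x_i^* = Q$$
conditional on:
$$Q < \frac{M_i}{(2-\alpha-\delta)p_y} \quad \text{(Quota < threshold value)}$$
$$\alpha < \delta \quad \text{(Transaction costs < discount)}$$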
8In the IHDS (2005) data, of the poor households that had not accessed the PDS in the last 6 months, over 57% reported supply side constraints as the primary issue.
9This is also in agreement with opinions recently expressed in the popular press. Since all beneficiary households purchase additional grain from the market, Arvind Panagariya believes that the proposed expansion of the PDS will have the same effect as a cash transfer of equal value (The Times of India, June 2013). Ashok Kotwal agrees that the PDS ’..is nothing but an income transfer.’ (The Economic Times, June 2013).
10Cobb-Douglas has the unique property that the proportion of the budget spent on food will be independent of the price of other commodities.
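The total food consumption net of transaction costs ($F_i^* = y_i^* + (1-\alpha)Q$) is:
$$F_i^* = \frac{M_i + Q(\delta-\alpha)p_y}{2p_y}$$
where $\frac{\partial F_i^*}{\partial Q} > 0$, $\frac{\partial F_i^*}{\partial \delta} > 0$ and $\frac{\partial F_i^*}{\partial \alpha} < 0$.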
Thus, the total food consumption is an increasing function of the quota and discount, and a
decreasing function of transaction costs. This motivates the use of a regression framework
taking into account the program rules, value of the subsidy, transaction costs and tastes.
Inefficiencies and corruption in the administration of the program would manifest in the
form of a higher α (poor quality, long waiting time, inconvenience, distance to the fair price shop) or a
lower actual Q (supply side constraints).
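The Cobb-Douglas solution in section 3.2 can be checked numerically. The sketch below codes the closed-form demands for the case where the quota binds and compares them with a brute-force grid search over PDS and market purchases on the budget line. All parameter values are hypothetical and were chosen only so that both conditions (quota below the threshold, transaction costs below the discount) hold; the grid search is a sanity check, not part of the paper's method.

```python
def closed_form(M, Q, alpha, delta, p_y, p_z=1.0):
    """Case-4 demands (x*, y*, z*) for f = ((1-alpha)x + y)^(1/2) z^(1/2)."""
    if not alpha < delta:
        raise ValueError("requires transaction costs < discount")
    if not Q < M / ((2 - alpha - delta) * p_y):
        raise ValueError("requires quota < threshold value")
    y = (M - Q * (2 - alpha - delta) * p_y) / (2 * p_y)
    z = (M + Q * (delta - alpha) * p_y) / (2 * p_z)
    return Q, y, z


def grid_search(M, Q, alpha, delta, p_y, p_z=1.0, nx=50, ny=2000):
    """Brute-force maximiser over x in [0, Q] and y >= 0 on the budget line."""
    p_x = (1 - delta) * p_y
    best_u, best_x, best_y = -1.0, None, None
    for i in range(nx + 1):
        x = Q * i / nx
        left = M - p_x * x                 # income left after PDS purchases
        for k in range(ny + 1):
            y = (left / p_y) * k / ny
            z = (left - p_y * y) / p_z     # the remainder is spent on non-food
            u = (((1 - alpha) * x + y) * max(z, 0.0)) ** 0.5
            if u > best_u:
                best_u, best_x, best_y = u, x, y
    return best_x, best_y


# Hypothetical parameters: income 1000, quota 35, alpha 0.1, delta 0.5, p_y 10.
params = dict(M=1000.0, Q=35.0, alpha=0.1, delta=0.5, p_y=10.0)
print(closed_form(**params))   # x* = 35, y* close to 25.5, z* close to 570
print(grid_search(**params))   # x at the quota, y within one grid step of 25.5
```

With these numbers the discounted price is 5, so spending is 5(35) + 10(25.5) + 570 = 1000 and the budget is exhausted, as the closed form requires.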
3.3 Evidence on Food Subsidies and Nutrition
Food subsidies are an important policy tool and their impact on household outcomes such
as nutrition, caloric intake, early life development and income stabilization has been exam-
ined in both developed and developing countries. The national food stamps program (FSP)
in the US (renamed SNAP in 2008) has been studied extensively.11 Between 1961 and 1975 the
program was phased in with different counties adopting it at different times. Hoynes and
Schanzenbach (2009) exploit this difference in timing to evaluate the impact of the program
and find that it led to an overall increase in food expenditures, as expected. Butler, Ohls
and Posner (1985) find very small effects of the FSP on the nutrient intake of the eligible
elderly, either through stamps or cash.
11See Hoynes and Schanzenbach (2009) for a review of the large literature on the food stamps program.
Despite lower per capita incomes and greater malnutrition, most studies in the context
of developing countries also find very small effects of food subsidies on nutritional outcomes.
Jensen and Miller (2011) conduct an experiment in rural China to estimate the impact of
price subsidies for rice and wheat and find no evidence that subsidies improve nutrition.
Laderchi (2001) finds that food transfers are no more successful at improving child nutrition
than other sources of income in Peru. In a survey of 300 households in Rajasthan, Khera
(2011c) finds that being eligible for PDS food grains does not lead to higher overall cereal
consumption. Kochar (2005) uses the change in 1997 from universal to targeted PDS to es-
timate the impact on calories consumed by rural households. She finds that the elasticity of
caloric intake with respect to the value of the subsidy is very low (0.06 on average). Tarozzi
(2005) studies a sudden increase in the price of PDS rice in Andhra Pradesh and concludes
that it did not have a big impact on nutritional status and child anthropometrics.
While the effect of the PDS on nutrition has been found to be quite small, some limita-
tions of the identification strategies used in previous studies could cause their results to be
biased in the direction of finding no effect.12 Kochar (2005) uses variation in the value of the
subsidy that is determined by BPL status. The measure of eligibility for the newly targeted
program (BPL status) is imputed and not actually observed in the data which results in
errors of mis-classification. In practice, BPL status is determined using individual poverty
scores and region specific cut offs, both of which vary over time. Even within a particular
region, imputing BPL status using multi-dimensional indicators of poverty is problematic.
Niehaus et al. (forthcoming) survey households in Karnataka which were identified as po-
tential PDS beneficiaries in the government BPL survey. Using the criteria for 2007 (based
12Jensen and Miller (2011) raise similar concerns regarding the methodologies employed by Kochar (2005) and Tarozzi (2005).
on eight measures), they find that 13% of eligible households do not possess a BPL card,
while 70% of legally ineligible households do. Thus any exercise in imputing BPL status will
have significant mis-measurement resulting in biased estimates. Further, the time frame of
Kochar’s study is two to three years before and after the targeted program was introduced.
Take up rates were low (6%-14%) during that time and the program was much less generous.
Since 1997 and particularly during the early 2000s, the PDS has undergone enormous
changes. There has been a push to increase the value of the subsidy to BPL households and
participation in the PDS has gone up.13 Finally, Kochar’s analysis also suffers from a bias
in the opposite direction. Since BPL status is a prerequisite for other types of government
assistance, using the BPL dummy could confound the effect of the PDS with the impact of
other government programs.14 Khera’s (2011c) study also uses variation due to BPL sta-
tus, and does not exploit differences in the value of the subsidy. Tarozzi’s (2005) analysis
focusses on anthropometrics for children below age 4; identification comes from variation
in the length of exposure to higher PDS prices but does not use differences in the quantity
entitlement. The length of exposure varies only from one to three months and the actual
receipt of benefits is not observed. His analysis focusses on the pre-targeted PDS in 1992,
when the program was much less generous. This paper adds to the literature by being the
first to compute a household specific value of the subsidy using state level program rules and
13Round 61 (2004-05) of the NSSO surveys provides information about the BPL status of a household. The data from this round suggest that participation is not driven by income or idiosyncratic household characteristics that could potentially affect food consumption. 68% of BPL households in 2004-05 report having accessed the PDS in the last 30 days. In data from the India Human Development Survey (2005), over 95% of BPL households report having accessed the PDS in the last 6 months. Thus, as the value of the program to BPL households has increased, there has been a corresponding increase in participation. Non-BPL participation in the PDS has fallen dramatically. In the 2004-05 round of the NSSO surveys, 0.06% of PDS users were non-BPL. This decline has also been noted by Khera (2011a). Similar behaviour has been observed in the context of other welfare programs – Brian McCall (1995) finds that take up of unemployment benefits increases as the individual level recipient amounts increase.
14Assistance programs include loans, scholarships, medical benefits, distribution of bicycles etc.
local market prices.15
4. Data and Empirical Strategy
4.1 Data and Sample
The National Sample Survey Organisation (Ministry of Statistics and Program Implemen-
tation) has conducted socio-economic surveys every one to two years since the 1950s. The
surveys are repeated cross sections and collect detailed expenditure information. They are
nationally representative, conducted year round to avoid seasonal biases and form the basis
for the government’s official poverty estimates. The NSSO data have been used to study
the PDS and other national programs such as the employment guarantee scheme as well as
issues of inequality, poverty and land relations (Kochar 2005, Khera 2011a, Dutta et al. 2012,
Deshpande 2000 and Sundaram and Tendulkar 2003). This paper uses six years of data from
July 2002 to June 2008.16
The NSSO surveys split expenditure data into several sub categories such as food items,
beverages, durable goods, medical expenditure, educational expenditure, conveyance and
rent. Other variables include family size, number of children, industry classification of the
head of the household, location, land owned, own production, type of dwelling, religion and
social group. Data are collected on the age, gender, education level, marital status and
15 Using the value of the subsidy, as opposed to a dummy for eligibility, also enables a comparison with the results for price subsidies in Jensen and Miller (2011).
16 In 2002, PDS allocations were increased from 20 kg to 35 kg per family, per month in most states. Some states started making changes (increases and decreases) in the PDS quotas in 2008. In the absence of credible official records of all the changes that ensued, I focus on the relatively stable period between 2002 and 2008 for which reliable state level program rules are available.
relation to the head of each household member. For goods that are available through the
PDS, households report spending in terms of cost and quantity separately for PDS and
non-PDS sources. This enables the classification of households into PDS beneficiaries and
non-beneficiaries based on actual utilization as opposed to imputing BPL status. The final
analytic sample for this paper comprises the current participants of the PDS.17
Regional preferences for cereals are distinct and strong in India. Atkin (forthcoming) pro-
poses climatic conditions and habit formation as possible reasons for these distinct tastes
and previous studies on the PDS (Kochar 2005, Khera 2011c) have focussed either on rice
or wheat.18 Following the literature, I consider the eight states (151 districts) that have rice
as their dominant staple and a generous rice subsidy.19 These states offer a very small or no
subsidy for wheat. Though the main focus is on rice, the wheat subsidy in these states is used
as a validity check and is found to have a very small impact. Due to data availability and
reliability issues, it is standard practice to focus on the major Indian states. Tamil Nadu, a
state in southern India, is distinct from the rest of the country in that it has a universal PDS.
All households, irrespective of their income, are eligible for the program, making it difficult
to compare their outcomes with the rest of the country since the impact at the bottom of the
income distribution could be very different from its impact at the top. Thus, in the absence
of a credible way to classify households into BPL and APL, and given that the program is
so different in this one state, I drop Tamil Nadu from the sample. The states in the final
17 As discussed earlier, non participation in the PDS is not likely to be driven by household specific factors. To the extent that some fraction of households with BPL cards do not avail of the program, strictly speaking, my analysis estimates the effect on eligible households that participate in the program.
18 Jensen and Miller’s (2011) results on the impact of price subsidies in China vary substantially by province, suggesting that the two staples (rice and wheat) should be analysed separately. In this paper, I examine the rice subsidy and a similar analysis can be undertaken for wheat.
19 Hoynes and Schanzenbach (2009) also restrict their sample to limited subgroups of the population that have a higher probability of participation in the food stamps program.
sample are: Andhra Pradesh, Assam, Chattisgarh, Jharkhand, Karnataka, Kerala, Orissa
and West Bengal. As a robustness check, I estimate the impact of the program in all the
major Indian states. As expected, the rice subsidy has a much smaller impact in the non-rice
favoring states. Aside from regional preferences, the rice subsidy specifically is of interest
because there is some evidence that rice may be a Giffen good for poor households (Jensen
and Miller 2008). Finally, the rice subsidy in India is cleaner to study because the amount
of illegal diversion of wheat grains is much higher (Khera 2011a).
Prices are computed using households’ reported expenditures and quantities for over 150
food items. For every item, prices can be determined by dividing expenditure by quantity.
Though technically speaking these are unit values, which can give rise to measurement error
and concerns of differentiated products in terms of quality, it is common in the literature to
use them as proxies for prices (Subramanian and Deaton 1996, Kochar 2005, Atkin 2013). In
the context of this study, these issues pose much less of a threat because prices for PDS and
market rice are computed by averaging the district-season-year specific price as reported by
BPL households.20 Each household is assigned this average price and not its self reported
unit value. Further, only prices reported by similar households, i.e. beneficiaries of the PDS,
are used to compute the local average price. Thus the local average does not include prices
paid by much wealthier households, who might be purchasing higher quality rice. Following
Atkin (forthcoming), median local prices are used to calculate the value of the subsidy as a
check. The results are robust to using median prices, which are not likely to be influenced
by quality and outliers. All prices are in 2005 rupees.
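The price construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (`district`, `season`, `year`, `expenditure`, `quantity`) and the sample values are hypothetical and are not the NSSO variable names or data.

```python
from collections import defaultdict
from statistics import mean, median


def unit_value(expenditure_rs, quantity_kg):
    """Unit value in Rs per kg: reported expenditure divided by reported quantity."""
    return expenditure_rs / quantity_kg


def local_prices(records):
    """Return {(district, season, year): (mean unit value, median unit value)}.

    `records` is a list of dicts with hypothetical keys 'district', 'season',
    'year', 'expenditure' (Rs) and 'quantity' (kg), one per PDS household.
    """
    cells = defaultdict(list)
    for r in records:
        if r["quantity"] > 0:  # households reporting no purchase have no unit value
            cell = (r["district"], r["season"], r["year"])
            cells[cell].append(unit_value(r["expenditure"], r["quantity"]))
    return {cell: (mean(v), median(v)) for cell, v in cells.items()}


# Made-up market rice purchases by three PDS households in one cell.
sample = [
    {"district": "D1", "season": "monsoon", "year": 2004, "expenditure": 100.0, "quantity": 10.0},
    {"district": "D1", "season": "monsoon", "year": 2004, "expenditure": 120.0, "quantity": 10.0},
    {"district": "D1", "season": "monsoon", "year": 2004, "expenditure": 200.0, "quantity": 10.0},
]
prices = local_prices(sample)
avg_price, med_price = prices[("D1", "monsoon", 2004)]
```

In this made-up cell one household reports a much higher unit value, which pulls the mean above the median. That is the sense in which the median check is less sensitive to quality differences and outliers.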
20 Every NSSO survey period (one year) is subdivided into four or six sub-rounds/waves. This is done to ensure that seasonal patterns in consumption can be studied.
The conversion of food purchases into per capita caloric intake is done using standard factors
(NSSO 1996) for each of the food items in the survey and total monthly household calories
are converted into per capita daily amounts.21 While this may not be ideal due to potential
differences between purchases and actual intake, it is an approximation used not just in the
literature, but also by the government to estimate the income poverty line based on caloric
intake.22 Deaton (1997) discusses the merits and limitations of several types of equivalence
scales to convert household consumption into per person quantities. For practical applica-
tions, a simple weight is suggested as a reasonable approximation. All per capita values are
thus calculated using an equivalence scale that assigns 0.5 weight for children below 15 years.
This allows for a correction in the per capita amounts for larger families that are likely to
have more children (who consume fewer calories).23
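As a minimal sketch of the equivalence scale just described, the conversion from monthly household calories to daily per capita calories might look as follows. The 0.5 weight for children below 15 is from the text; the 30-day month and the function names are assumptions made for illustration.

```python
def adult_equivalents(n_adults, n_children, child_weight=0.5):
    """Household size on the equivalence scale: children under 15 count as 0.5."""
    return n_adults + child_weight * n_children


def per_capita_daily_kcal(monthly_household_kcal, n_adults, n_children, days=30):
    """Daily calories per adult equivalent (days=30 is an assumed reference period)."""
    scale = adult_equivalents(n_adults, n_children)
    return monthly_household_kcal / days / scale


# Hypothetical household: 2 adults, 2 children, 198,000 kcal purchased in a month.
scaled = per_capita_daily_kcal(198000, 2, 2)   # divides by 3 adult equivalents
simple = 198000 / 30 / 4                       # simple head count, for comparison
```

For a household with children the scaled figure is higher than the simple per capita figure, which is the correction for larger families that the text refers to.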
4.2 Variation in Value of the Subsidy
The value of the subsidy for each household is calculated as the product of the local price
discount and the state specific per capita quota. The discount is the difference between
the average market and average PDS price, at the district-season-year level. The quota is
calculated based on program rules of the state of residence and household size.24
\[ \mathrm{PerCapValSub}_{ijswt} = \left(P^{mkt}_{jwt} - P^{pds}_{jwt}\right) \times Q_{is} \]
21 Subramanian and Deaton (1996) attempt to correct this conversion from the NSSO surveys to account for the number of meals given to guests. However, since there is no way to determine the composition of those meals, caloric consumption from different food groups is not adjusted and neither is the expenditure.
22 As discussed in Jensen and Miller (2011), data from more detailed individual food intake diaries raise their own set of concerns of validity and accuracy.
23 Results remain qualitatively the same if a simple per capita variable is used instead. Elasticities computed at the household level are also found to be similar.
24 As discussed ahead, I use the national average family size to compute the value of the subsidy for every household as a robustness check.
where $i$ = household, $j$ = district, $s$ = state, $w$ = season, $t$ = year and $Q_{is}$ = state-household
size specific quota for household $i$.
Prices
Recall that the central government sets a PDS price for the year and states are allowed to
add transport and distribution costs to the final price that households pay. The PDS price is
not indexed to market prices, which are in turn determined by demand and supply side fac-
tors.25 The diverse nature of India’s terrain and climate combined with its large area results
in substantial geographic variation in the prices of agricultural commodities. As discussed in
Jacoby (2013), controls on the inter state trade of food grains lead to substantial differences
in prices across states. Deaton (1997) finds important interregional differences in prices
using NSSO data. Kochar (2005) and Atkin (forthcoming) ascribe the within state, inter
district variation in market prices of food grains to transportation costs and state specific
market controls.26 Agricultural prices also vary by season and as a result of local conditions
and unpredictable weather phenomena.27 Thus, as a result of the government’s policy of
maintaining a fixed (within year) PDS price across regions in India, the difference between
the market and PDS price varies for every district-season-year cell.28
25 In a study conducted in Delhi, SEWA (2009) finds that 31% of respondents report that PDS prices remain low, but market prices keep rising.
26 Wadhwa (2001) details the restrictions on the marketing and movement of agricultural goods in India.
27 Planning Commission’s (2001) extensive study on the price behaviour of rice and wheat confirms seasonality of prices. Sekhar (2003) reports a higher intra year variability in domestic agricultural prices as compared to international prices which suggests incomplete integration of markets within the country.
28 Antyodaya households are identified as being the poorest of the poor and they pay even lower prices. In round 61 (2004-05) of the NSSO surveys and the IHDS (2005), these households are identified and summary statistics confirm that they pay 30% lower prices on average at the PDS shop (appendix Table C). They comprise less than 10% of the BPL sample in my data. In the full dataset, it is not possible to identify these households and this source of variation is not explicitly used in the analysis. As a check, I use data from round 61 to include an interaction between the value of the subsidy and an Antyodaya dummy and find no evidence of a differential impact of the subsidy.
Table 2 presents the spread of the discount across and within states for different seasons.
The state level average discount varies between 38% (Assam in the monsoon season) and
67% (Karnataka in winter). These rates are in line with the government’s policy of offering
food grains at approximately half of the procurement cost. Within any particular season
and state, there is substantial variation in the district level discount. Figure 2 confirms
this variation graphically for each district-season-year cell in the sample. Panel A plots the
discount data from 2002 for each of the 151 districts in the sample. There is heterogeneity
within states in the discount across districts. For instance, in 2002, the discount in Assam
varies from 15% to about 45% in the monsoon season, with a state level average of 38%. The
average discount in Andhra Pradesh is 55% but the spread (47% to 65%) is much smaller as
compared to Assam. A similar pattern of within state variation in the seasonal discount is
seen for the other six years (panels B to G, Figure 2).
Quotas
Different states set different quotas for rice and wheat ranging from 0 to 35 kg per household
per month, as shown in Table 1. Some states do not index the quota to household size and
for those that do, there is an upper limit on the maximum amount (except in West Bengal).
Thus the per person quota varies by household, depending on its size and state of residence.29
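The two sources of variation can be combined in a short sketch of the per capita value of the subsidy, (market price minus PDS price) times the per person quota. The rule parameters and prices below are hypothetical and do not reproduce any state's rules from Table 1. For simplicity the sketch divides by a simple head count rather than the equivalence scale used in the paper.

```python
def per_capita_quota(household_size, kg_per_household=None, kg_per_person=None, cap_kg=None):
    """Monthly per person quota (kg) under a hypothetical state rule.

    Either a flat household quota (not indexed to size), or a per person
    entitlement with an optional cap on the household total.
    """
    if kg_per_household is not None:
        total = kg_per_household
    else:
        total = kg_per_person * household_size
        if cap_kg is not None:
            total = min(total, cap_kg)
    return total / household_size


def per_cap_val_sub(p_mkt, p_pds, quota_per_capita):
    """Per capita value of the subsidy: local price discount times per person quota."""
    return (p_mkt - p_pds) * quota_per_capita


# Flat 35 kg rule: the per person quota falls as the household grows.
q_small = per_capita_quota(5, kg_per_household=35)   # 7 kg per person
q_large = per_capita_quota(7, kg_per_household=35)   # 5 kg per person

# Hypothetical local prices (Rs per kg) for one district-season-year cell.
value_small = per_cap_val_sub(12.0, 6.0, q_small)
value_large = per_cap_val_sub(12.0, 6.0, q_large)
```

Under a flat household quota, two households facing the same local discount receive different per capita subsidy values purely because of their size. An indexed rule removes that variation until the cap binds.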
Table 3 combines the two sources of variation and presents the spread of the per capita
value of the subsidy for the sample under study. As expected, the spread of the per capita
value (standard deviation) is the smallest in West Bengal and Andhra Pradesh, both of
29 An examination of the average quantities bought via the PDS confirms that this variation is realized in practice (Appendix Table A).
which index the subsidy to household size. The average subsidy in West Bengal is less than
half of the subsidy in Andhra Pradesh. Overall, the value varies across seasons, states and
within states.
4.3 Descriptive Statistics
Table 4 presents descriptive statistics for PDS users and the full NSSO sample averaged
over all six years. The monthly per capita expenditure for PDS users is lower than that
for the full sample, as expected. The average per capita caloric intake for PDS users is
2190 kcal. Given that over 78% of PDS users live in rural areas, this average is well below
the minimum daily requirement (2100 kcal for urban and 2400 kcal for rural areas). PDS
users spend a higher fraction of their total monthly expenditure (58%) on food. Scheduled
Caste/Scheduled Tribe/Other Backward Castes comprise 76% of PDS users.
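One way to read the comparison with the calorie norms is to weight the rural and urban requirements by the sample composition. The paper does not report this benchmark; the sketch below uses only the figures quoted above (78% rural, 2400 and 2100 kcal, 2190 kcal average intake), and treats 78% as a lower bound on the rural share.

```python
RURAL_REQ_KCAL = 2400   # minimum daily requirement, rural areas (from the text)
URBAN_REQ_KCAL = 2100   # minimum daily requirement, urban areas (from the text)


def weighted_requirement(rural_share):
    """Population-weighted minimum daily calorie requirement."""
    return rural_share * RURAL_REQ_KCAL + (1 - rural_share) * URBAN_REQ_KCAL


benchmark = weighted_requirement(0.78)   # roughly 2334 kcal
shortfall = benchmark - 2190             # roughly 144 kcal per person per day
```

Because the rural share is "over 78%", the weighted benchmark is at least this large, so the implied shortfall is, if anything, understated.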
The PDS price is 50% lower than the market price. On average, each household buys
more rice in the market than what they receive through the program – the PDS contributes
41% of the total rice consumed. Thus, the PDS is an important source, but not one that
meets the entire food requirements of the household; almost three-fourths of the total food
expenditure is on items other than rice.30 Per capita cereal expenditure is a little over one
fourth of food expenditure, but cereals contribute over 72% of the total calories consumed.
30 The fact that households also purchase rice in the market suggests that resale of PDS rice by consumers is not a serious concern.
4.4 Empirical Specification and Assumptions
The basic equation to estimate the impact of the subsidy on household outcomes is
\[ Y_{ijswt} = \alpha + \beta\,\mathrm{PerCapValSub}_{ijswt} + X_{ijswt}\gamma + \delta_j + \chi_w + \theta_{st} + \varepsilon_{ijswt} \tag{1} \]
$Y_{ijswt}$ represents the outcome variable (such as per capita caloric intake) for household $i$ in
district $j$, state $s$, season $w$ and year $t$. $\mathrm{PerCapValSub}_{ijswt}$ is the per capita value of the
subsidy to the household based on program rules and local prices. $X_{ijswt}$ is a vector of household
characteristics, $\delta_j$ and $\chi_w$ control for district-specific and seasonal effects respectively,
and $\theta_{st}$ are state*year dummies. Standard errors are clustered at the district level, as all
households in a district are subject to the same rules determining the per person monthly
quota.
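To make the mechanics of equation (1) concrete, the following is a deliberately stripped-down sketch rather than the paper's estimation code. It keeps only the district fixed effects (handled by demeaning within district) and a single regressor, and computes a district-clustered standard error without any finite-sample correction. The household controls, season dummies and state*year dummies are omitted, and all data values are made up.

```python
from collections import defaultdict
from math import sqrt


def demean_by_group(values, groups):
    """Subtract the group mean from each value (the within transformation)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def within_ols_clustered(y, x, district):
    """Slope of y on x with district fixed effects and district-clustered SE."""
    y_t = demean_by_group(y, district)
    x_t = demean_by_group(x, district)
    sxx = sum(xi * xi for xi in x_t)
    beta = sum(xi * yi for xi, yi in zip(x_t, y_t)) / sxx
    # Cluster-robust variance: sum x*residual within each district, then square.
    scores = defaultdict(float)
    for xi, yi, g in zip(x_t, y_t, district):
        scores[g] += xi * (yi - beta * xi)
    variance = sum(s * s for s in scores.values()) / (sxx * sxx)
    return beta, sqrt(variance)


# Made-up data: subsidy value (Rs) and daily per capita kcal in two districts.
district = ["A", "A", "A", "B", "B", "B"]
subsidy = [10.0, 20.0, 30.0, 15.0, 25.0, 35.0]
kcal = [2065.0, 2115.0, 2182.0, 1888.0, 1954.0, 2008.0]
beta_hat, se_hat = within_ols_clustered(kcal, subsidy, district)
```

Demeaning within district removes any level difference between districts, so the slope is identified only from variation in the subsidy value among households of the same district, which mirrors the role of $\delta_j$ in equation (1).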
Household characteristics in X include education of the head of the household and the
spouse, a quadratic in the age of the household head, the proportion of females in the house-
hold, land holdings (proxy for income) and urban location. Behrman and Deolalikar (1988)
provide a comprehensive review of the literature on the demand for nutrients in developing
countries and find that education levels, size and demographic composition of the households
are important determinants of the demand for calories.31 While most studies on the PDS
have focussed on the rural poor, the program is also critical in urban areas and the nutri-
31 Though the dependent and main explanatory variable in equation (1) are both in per capita terms, there may still be economies of scale within a household that are not accounted for by this specification. I check for any effect of household size in the following ways. First, I use an alternative weighting scheme to generate per person variables for the household. Second, I use household level caloric intake and value of the subsidy to estimate the impact and explicitly control for family size. Third, I shut down the variation due to family size by using quotas based on the national average family size. Finally, even though household size as a regressor in equation (1) is potentially endogenous, as an additional check, I find that adding size to the regression does not qualitatively change the results.
tional needs of households in the two areas are very different. District and seasonal dummies
are included to take regional and climatic effects on the demand for calories into account.
The model is saturated by introducing state*year dummies that control for any differences
in state level effects and other assistance programs that BPL households in a particular state
might be eligible for. With district, season and state*year controls, any variation in the price
discount is due to unpredictable shocks to prices as a result of random weather phenomena,
arbitrary controls on the movement of goods and imperfectly integrated agricultural markets.
The parameter of interest from equation (1) is β, the coefficient on the value of the sub-
sidy. The validity of the model rests on the assumption that after controlling for household
characteristics and geographic and seasonal effects, the value of the subsidy (price discount *
quota) is exogenous to unobservable factors that may affect the demand for food. If this
assumption holds, the value of the subsidy should have a negative (through market price) or
zero effect on the caloric consumption of households that do not use the program. To check
for this, I perform a falsification test using non-PDS users.32
The analysis makes two other substantive assumptions. First, the model assumes that
household size is exogenous to the state level quota, i.e. households do not adjust fam-
ily size according to more or less generous program rules. While the program is important,
it is not likely to affect the fertility or migration decisions of households. On average, the
value of the subsidy is 4% of the total monthly expenditure of the household. I check for the
validity of this assumption by using the national average family size instead of the actual
size of the household to compute the value of the subsidy and find that the results remain
32 Tables 12-15 ahead present the results of multiple robustness checks.
qualitatively the same. I also check for the independent effect of family size by using total
household caloric intake as the outcome and explicitly controlling for family size. Second,
the analysis assumes that the demand for calories or rice by any one household does not
affect the market price that it faces, i.e. the local price in its district-season-year cell. In the
absence of this assumption, the coefficients would suffer from simultaneity bias. Treating the
household as a price taker is a standard assumption from the theory of competitive markets.
The agricultural market in India has thousands of producers and consumers, which makes
this assumption fairly reasonable in the context of this study.
5. Results
5.1 Impact on Cereal Consumption and Caloric Intake
To study the impact of the PDS on nutrition, three outcome variables are considered: per
capita cereal consumption, per capita caloric intake and per capita calories from different
food groups. Cereals are a cheap and critical source of calories and the PDS is responsible
for providing a substantial amount of cereals to vulnerable households.33 Column 1 of Table
5 presents results for the main specification with per capita cereal intake as the outcome
variable. An increase in the monthly subsidy by Rs 10 leads to a 20.3 gm (60 kcal/day)
increase in the daily consumption of cereals. Based on average market prices, Rs 10 (per
month) can buy an extra 30.2 gm of cereals every day. This suggests that infra-marginal
households prefer to spend part of the extra income on foods other than cereals or on non-
food items. The specification in column 1 assumes that an (absolute) increase in the value
33 Most cereals have 3-3.5 kcal/gm. See Appendix Table B for a comparison of the price per calorie for different food groups.
of the subsidy has the same effect, irrespective of the base value. Evaluated at the means,
the estimate from column 1 implies an elasticity of 0.12, which is similar to the estimate
from the log-log regression in column 2 (0.123). Given that the elasticity is small and the
sample comprises only BPL households, it is not surprising that the estimates are so close.34
Though small, the elasticity is positive and significant and confirms that the subsidy has a
real impact on cereal consumption, as expected. In order to examine the impact of each of
the components of the subsidy and determine their relative importance, column 3 splits the
value of the subsidy into quota, prices and interaction terms. The coefficient on quota is
insignificant, confirming that households are infra-marginal since the amount of the quota
does not determine overall cereal consumption. The coefficient on market price is negative, as
expected. The interaction between the amount of the quota and the market price is positive
and significant, suggesting that a higher quota increases cereal consumption more, and is
thus more valuable, when the market price of rice is higher.
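The arithmetic behind the Rs 10 comparison, and the conversion of a levels coefficient into an elasticity at the means, can be sketched as follows. The 20.3 gm and 30.2 gm figures are from the text. The 30-day month is an assumption, and the means passed to `elasticity_at_means` in the last line are hypothetical placeholders, not values from the paper.

```python
DAYS_PER_MONTH = 30            # assumption: 30-day month

extra_rs_per_month = 10.0
affordable_g_per_day = 30.2    # what Rs 10 per month buys at average market prices
estimated_g_per_day = 20.3     # estimated response, Table 5 column 1

# Market price implied by "Rs 10 per month buys 30.2 gm per day" (about Rs 11 per kg).
implied_price_rs_per_kg = extra_rs_per_month / (affordable_g_per_day * DAYS_PER_MONTH / 1000)

# Share of the cereal-equivalent value of the transfer that shows up as cereals
# (about two thirds); the remainder goes to other foods or non-food items.
share_spent_on_cereals = estimated_g_per_day / affordable_g_per_day


def elasticity_at_means(beta, mean_x, mean_y):
    """Elasticity implied by a levels coefficient: beta * mean(x) / mean(y)."""
    return beta * mean_x / mean_y


# Hypothetical means, for illustration only (not the sample means in the paper).
example_elasticity = elasticity_at_means(beta=2.0, mean_x=25.0, mean_y=400.0)
```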
Though the estimates suggest that the program has a positive effect on cereal consumption,
households may be substituting some of the income gains away from cereals. Combining the
analysis for calories with the results on cereals helps to determine the impact on overall food
intake. Column 1 in Table 6 indicates that a Rs 10 increase in the value of the rice subsidy
results in an increase of 126 kcal/day.35 This is more than double the impact
34 The levels estimate assumes that the impact of an absolute increase in the value of the subsidy is independent of where in the distribution the household is located (linear relationship). In this sample, households are homogeneous to the extent that they all lie below the poverty line and receive a positive food subsidy. The log-log specification assumes a non-linear relationship and a constant elasticity. However, for an elasticity of 0.12, the resulting curve is very flat, i.e. almost linear, over the range of the value of the subsidy, cereal consumption and caloric intake.
35 In order to compare the estimates, I convert the gains in daily cereal consumption from Table 5 into caloric gains. The 95% confidence interval from Table 5 implies an increase between 51 kcal/day and 69 kcal/day. This does not overlap with the confidence interval from Table 6 which is 104 kcal/day to 148 kcal/day. Thus, the overall gains in caloric intake are greater than the increase in calories from cereals alone.
on calories through increased cereal consumption (60 kcal/day from column 1 in Table 5),
suggesting that the subsidy works by increasing calories through not just cereals, but other
foods as well. This is an important indicator that households benefit from the program in
terms of overall food intake and not just through the food grains directly provided by the
PDS.
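The comparison between the cereal and total calorie gains can be reproduced from the point estimates and confidence intervals quoted in the text and in footnote 35. This is an informal check only: non-overlapping 95% intervals are a conservative indication of a difference, not a formal test, and the helper name is introduced here for illustration.

```python
def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Per Rs 10 increase in the monthly subsidy, in kcal per person per day.
cereal_kcal, cereal_ci = 60.0, (51.0, 69.0)     # Table 5, converted to calories
total_kcal, total_ci = 126.0, (104.0, 148.0)    # Table 6

non_cereal_kcal = total_kcal - cereal_kcal      # gain attributable to other foods
ratio = total_kcal / cereal_kcal                # "more than double"
overlap = intervals_overlap(cereal_ci, total_ci)
```

On these figures, slightly more than half of the total calorie gain comes from foods other than cereals.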
Table 6 also presents estimates of the cross food group elasticities.36 In China, Jensen
and Miller (2011) find negative effects of an experimental price subsidy for rice on fruits and
vegetables and on pulses (lentils). In contrast, all elasticities here are positive and significant.
Thus, the results suggest that provision of a food subsidy via quotas has a positive impact
on all food groups, similar to what would be expected from an income transfer whereas the
pure price subsidy causes households to consume less of the subsidized good and fewer overall
calories.
5.2 Comparison with the Expenditure Elasticity of Calories
The PDS subsidy is supplementary in nature and thus is expected to operate through the
income effect. The elasticity of calories with respect to the value of the subsidy is 0.144
from column 2 of Table 6. Though small, it lies within the range of historical estimates of income
elasticity of calories for India, which range from 0 to 0.34. Behrman and Deolalikar (1990)
find the income elasticity of calories for rural south India to be very low: 0.01. Subramanian
and Deaton (1996) use NSSO data and report an expenditure elasticity of calories of 0.34 for
rural households. To compare my results with these earlier studies, I estimate the elasticity
of calories with respect to total expenditure for rural PDS households in my sample. The
36 For the sample in this paper: 5% of total calories come from lentils, 5% from fruits and vegetables and approximately 1% from meat, fish and other animal products.
expenditure elasticity is close to the higher estimates in the previous literature: 0.4 from
column 1 in Table 7. Even though the elasticity of caloric intake with respect to the subsidy
(0.140 for the rural sample) is lower than the expenditure elasticity, it is still positive and
significant.37
The fact that the PDS acts like an income transfer raises the question of its impact on
price paid per calorie, i.e. the extent to which a rise in income induces households to sacrifice
caloric intake for more expensive and less calorie rich foods. Behrman and Deolalikar
(1989) suggest that the income elasticity of calories is smaller than the income elasticity
of food expenditures because of changes in the shape of the food indifference curve as in-
come rises. They find that even relatively poor households value variety. Subramanian and
Deaton (1996) report the total expenditure elasticity of expenditure on food as 0.75 and the
elasticity of price paid per (1000) calories as 0.35. These numbers are very close to the esti-
mates for the sample used in this paper (columns 3 and 5 in Table 7). The elasticity of food
expenditure with respect to the value of the subsidy is lower at 0.146, but it is positive and
significant. The subsidy elasticity of price per calorie is not significantly different from zero.
Thus the value of the subsidy is not large enough to raise concerns of households sacrificing
calories for taste.
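The relationship between the three subsidy elasticities discussed above follows from an accounting identity, sketched here in notation introduced only for this note: $F$ is food expenditure, $C$ is calories, $p$ is the price paid per calorie and $v$ is the value of the subsidy. Since food expenditure equals calories times price per calorie, the elasticities add up:

```latex
F = C \cdot p
\;\Longrightarrow\;
\ln F = \ln C + \ln p
\;\Longrightarrow\;
\frac{\partial \ln F}{\partial \ln v}
  = \frac{\partial \ln C}{\partial \ln v}
  + \frac{\partial \ln p}{\partial \ln v},
\qquad\text{so}\qquad
\eta_{p,v} = \eta_{F,v} - \eta_{C,v} \approx 0.146 - 0.144 = 0.002 .
```

This is only a rough consistency check, since the two elasticities come from separate regressions and the 0.144 estimate is for the full sample (the rural estimate is 0.140). Either way the implied elasticity of price per calorie is close to zero, in line with the insignificant estimate reported above.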
One issue with the NSSO data (and any estimates based on them) is that they do not
contain information on income. Table 8 presents alternate estimates of the elasticity of
37 It is not surprising that the elasticity of caloric intake with respect to the value of the subsidy is lower than the expenditure elasticity. This is true for two reasons: First, transaction costs associated with going to the fair price shop, such as irregular supply, inconvenient timings and long queues would make the impact of the subsidy smaller. A pure increase in income does not have any such costs associated with it. Second, this estimate takes into account leakages and inefficiencies in the system since it is based on actual program rules.
cereal consumption using income data from the India Human Development Survey 2005.38
The survey is nationally representative, covers 41,554 households and collects detailed house-
hold information on demographics, income, debt, insurance and consumption. The income
elasticity of cereals for PDS users is much lower (0.046 in column 1, panel A of Table 8)
than the elasticity with respect to the rice subsidy which is 0.295 in column 2. The income
elasticity of food expenditure is also much lower than the elasticity of food expenditure with
respect to the value of the subsidy from columns 3 and 4 of panel A. While income is a
reflection of more permanent characteristics of a household such as age and education of
the head, location and occupation, the value of the subsidy fluctuates from month to month
depending on market prices. Not all income is consumed; part of it is saved or used to purchase
durable goods. It is thus not surprising that an increase in the value of the subsidy has a
bigger impact on cereal consumption than an increase in income. Further, income tends to
be under-reported and the resulting measurement error would bias the estimates downwards.
The IHDS dataset allows me to check if the household characteristics used in the main
regression are a valid control for income. Column 2 of panel A in Table 8 estimates the
elasticity with respect to the value of the subsidy controlling for income, location, district
and season effects. In column 3 of panel B, I run the same regression, but instead of explic-
itly including income, I add the education of the head of the household and the spouse, a
quadratic in the age of the household head, the proportion of females in the household and
land holdings. The estimates are similar, which validates the use of these characteristics in
place of income.
38 This survey is jointly conducted by National Council of Applied Economic Research in Delhi and the University of Maryland (Desai, Vanneman, & National Council of Applied Economic Research, 2005). I use the IHDS dataset to construct variables that are identical to variables from the NSSO data. I also restrict the sample to the eight rice favoring states used in the main analysis.
The food module of the IHDS surveys is similar to the NSSO, but is much less detailed.
To check for any effect of differences in survey methodologies, in panel B of Table 8, I es-
timate identical regressions for the IHDS and data from the NSSO covering the same time
period (November 2004 - October 2005). The estimates for expenditure elasticity of cereals
from the two datasets are similar: 0.247 and 0.269 in columns 1 and 2 of panel B.39 The
estimates of the elasticity of cereal consumption with respect to value of the subsidy across
the two samples (panel B, columns 3 and 4) are also similar, though the IHDS estimate
is slightly larger. The NSSO sample is larger than the IHDS, but the two do not differ on
observable characteristics.40 PDS prices are lower, and consequently the subsidy value is
marginally higher on average (Rs 25.46) in the IHDS sample as compared to the NSSO data
(Rs 24.17).41 While it is difficult to say with certainty why the estimates differ, the IHDS
results do not qualitatively change the implications of the main analysis. At best, they
suggest that the estimates from the NSSO data may be a lower bound on the impact of the
program.
5.3 Heterogeneous Impacts
Households that benefit from the PDS are similar to the extent that they all lie below the
poverty line. However, the program could have differential impacts on certain sub-groups of
the population. Table 9 checks for heterogeneous impacts of the subsidy on urban households,
39 The expenditure elasticity of cereal consumption (column 2 of panel B, Table 8) is lower than the expenditure elasticity of calories (column 1, Table 7). Deaton and Dreze (2009) find a similar pattern and attribute it to the fact that a higher fraction of the marginal rupee spent on food goes towards non-cereal food items.
40 Appendix Table D presents descriptive statistics for the two samples.
41 Price data in the IHDS is collected directly, by asking households for their estimate of the average market price of various goods over the last 30 days.
traditionally marginalized communities, farming households and households in the lowest ex-
penditure quartile. While urban households face different patterns of price changes compared
to rural ones and may have different mechanisms to cope with food insecurity, (rice) farming
households may be more or less affected by market prices depending on whether they are net
buyers or sellers in the market. Traditionally marginalized communities may face discrimina-
tion in terms of access to the program and households in the lowest expenditure quartile may
lack sufficient income to fully leverage the subsidy provided through the program. Residing
in an urban area has a negative and significant impact on cereal consumption (column 1).
This is expected as urban calorie requirements are lower on average. However, the program
does not have a significantly different impact in urban areas. Columns 2 and 3 check for
the impact on cereal consumption of traditionally marginalized communities (by caste) and
the poorest quartile of the sample, respectively. There is no significantly different impact
for the SC/ST/OBC category. Households in the lowest expenditure quartile consume fewer
cereals, but the subsidy does not have a differential impact on them.
The program has a strong, positive effect on the cereal consumption and caloric intake
of farmers, i.e. households that report consuming some of their home grown rice (columns
4 and 8). Given that these households need to rely less on the market and are not likely
to make cereal purchases, the subsidy acts like a direct in-kind transfer of extra grains. As
there is no income gain, these households cannot substitute cereals with other types of food
or other goods. For the sample of farmers, Table 10 splits the subsidy into its components
and confirms that market price is not a significant determinant of caloric intake (column 2)
and marginally significant for cereal consumption (column 1). This is in contrast to results
for the overall sample (column 3 of Table 5) where market price was seen to have a significant
and negative impact. The interaction between market price and per capita quota has the
expected positive sign and is significant.
From column 7 of Table 9, the impact of the subsidy on the caloric intake of the lowest
income quartile is significantly lower, compared to other households. This group represents
the poorest of the poor. For an increase in the value of the subsidy, these households appear
to substitute away from food towards other non food items. Table 11 presents elasticities
of cereal consumption and caloric intake for the four income quartiles of PDS users. The
elasticities increase with income, suggesting that the poorest households (also likely to be
the most credit constrained) use the extra income generated by the subsidy to purchase
non food items.42 This is consistent with the hypothesis in Sen (2005) that an increase in
the availability, demand and prices of non-food essential goods and services has resulted in
squeezing of the food budget of poor families.
5.4 Robustness and Sensitivity Checks
The identification strategy rests on the assumption that there are no underlying factors that
simultaneously affect the value of the subsidy and food consumption of households. To test
the validity of this assumption, I perform a falsification test by checking for the impact of
the local average value of the PDS subsidy on the cereal consumption and caloric intake of
households that do not receive subsidies from the PDS. From Table 12, it is clear that the
program has no impact on the consumption of Non-PDS beneficiaries. The only (marginally)
significant effect is the elasticity of cereal consumption. As expected, this is negative since
42 There is no difference in terms of quantity of rice purchased from the PDS across expenditure quartiles, implying that even households in the lowest quartile purchase the full PDS entitlement and the lower impact on their cereal consumption is not driven by an inability to fully avail of the PDS discount.
a higher value of the subsidy (resulting from a higher market price of rice) would negatively
affect cereal consumption of households that purchase cereals only in the market.
Table 13 presents results from a series of robustness checks. Concerns about the possible
endogeneity of household size are addressed in columns 1 and 2. In column 1, the national
average family size is used to calculate the value of the subsidy instead of the actual size and
composition of each household. Column 2 presents the elasticity results at the household
level, with a control for household size. The estimates from both are very close to the main
result (0.144) which supports the assumption that household size is not influenced by the
program. To check that the results are not driven by the equivalence scale used in the main
analysis, column 3 uses a simple per capita estimate. The results remain qualitatively simi-
lar to those from the main analysis, though using a simple per capita measure without correcting for
the composition of the household makes the elasticity appear larger. Averaging across the
market price of rice within a district-season-year cell should not raise concerns of significant
differences in quality, since households in the sample all lie below the poverty line. However,
I check for this possibility by using the median prices to calculate the value of the subsidy
and find that the results remain the same (column 4). Finally, I check the sensitivity of my
results to seasonal price patterns by utilizing survey waves/sub-rounds. Replacing state*year
and season dummies with state*survey wave dummies does not change the estimates, as seen
in column 5.
Table 14 checks for the impact of the rice subsidy on three different samples – All India,
the rice favoring states and non-rice favoring states. As expected, the elasticity is much
smaller in states where rice is not the main staple, even though some of them offer a small
subsidy on rice. This justifies using the smaller sample of rice favoring states to get a more
valid estimate of the impact of the rice subsidy program. Conversely, Table 15 checks for
the impact of the (much smaller) wheat subsidy in the main sample of rice favoring states.
The effect of the wheat subsidy on both cereal consumption and caloric intake is extremely
small and its inclusion makes the effect of the rice subsidy a bit larger. To the extent that
the value of the subsidy is higher when the market price of rice (and possibly other goods)
is higher, this is expected because the wheat subsidy provides a control for the overall level
of market prices.
5.5 Issues in Implementation and Policy Changes
Corruption
The PDS has been criticized for various types of inefficiencies and corruption and there are
striking state-wise differences in implementation. Khera (2011a) estimates the extent of
diversion in food grains using a combination of NSSO data and administrative records on
grain allocation at the state level. She categorizes states into those that are performing (well)
and others that are reforming or languishing.43 Table 16 presents results for the impact on
cereal consumption and caloric intake. Columns 1 and 2 suggest that the impact on cereals
and caloric intake is almost 50% lower in states that are more corrupt.44 This is a huge cost
of inefficiency and is particularly troubling given the high levels of malnutrition that persist
in India.
43 Of the eight states in the sample, three are categorized as functioning well: Andhra Pradesh, Karnataka and Kerala.
44 These estimates provide suggestive, not causal, evidence of the effect of corruption as there could be many other state level factors that make the program less (more) effective in more (less) corrupt states.
The National Food Security Bill
The PDS is the government’s flagship program for improving nutritional outcomes and the
National Food Security Bill (NFSB), passed in September 2013, plans to expand it further.
Unless the enormous leakages in the system are plugged, the government will have to procure
a much higher quantity of food grains (than the target) in order to have the intended effect
on nutritional outcomes. A rough estimate of the impact of the NFSB can be made using
price data, program rules and the elasticity estimates from this paper. The NFSB assures 5
kg of food grains per person per month to 67% of the population at Rs 3 per kg for rice, Rs 2
per kg for wheat and Re 1 per kg for coarse grains. BPL households are currently eligible for
an average of 4 to 5 kg per person per month.45 At current prices, it is estimated that the per
kilogram subsidy will rise from Rs 13.5 to Rs 16.5 (The Financial Express, September 2013).
Thus the NFSB does not substantially increase the quantity of food grains assured to BPL
households, but it does entail a big increase in the price discount they receive. Combined
with average caloric intake and current price data, the elasticity estimate from this paper
suggests that the bill will lead to a per person increase of 72 kcal/day in rural areas and
66 kcal/day in urban areas for the current beneficiaries of the program.46 This estimate is
based on program rules and thus takes into account the leakages and inefficiencies in the
system. If the expansion of the PDS is accompanied by better enforcement, especially in the
more corrupt states, the impact will be even higher.
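The arithmetic behind this estimate can be sketched as follows. This is a back-of-the-envelope reconstruction, not the author's code: it assumes that the quantity entitlement is held fixed, so that the percentage change in the value of the subsidy equals the percentage change in the per kilogram subsidy (Rs 13.5 to Rs 16.5), and that the main elasticity estimate (0.144) applies to the baseline caloric intakes reported in footnote 46. The function and variable names are hypothetical.

```python
# Figures quoted in the text; everything else is an illustrative assumption.
ELASTICITY = 0.144      # elasticity of caloric intake w.r.t. the value of the subsidy
SUBSIDY_OLD = 13.5      # Rs per kg, current per kilogram subsidy
SUBSIDY_NEW = 16.5      # Rs per kg, per kilogram subsidy under the NFSB
BASELINE_KCAL = {"rural": 2260.25, "urban": 2076.5}  # kcal/day, footnote 46


def simulated_calorie_gain(baseline_kcal, elasticity, old_subsidy, new_subsidy):
    """Change in daily kcal per person implied by a constant elasticity.

    Uses the linear approximation: change = elasticity * % change in subsidy * baseline.
    """
    pct_change = (new_subsidy - old_subsidy) / old_subsidy
    return elasticity * pct_change * baseline_kcal


gains = {
    area: simulated_calorie_gain(kcal, ELASTICITY, SUBSIDY_OLD, SUBSIDY_NEW)
    for area, kcal in BASELINE_KCAL.items()
}
for area, gain in gains.items():
    print(f"{area}: {gain:.0f} kcal/day")
```

Under these assumptions the per kilogram subsidy rises by about 22%, and the implied gains round to the 72 kcal/day (rural) and 66 kcal/day (urban) figures quoted above.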
The bill will also expand the beneficiary pool of the PDS to include 67% of the popula-
45 Many state governments such as Tamil Nadu, Andhra Pradesh and Chattisgarh already have a more generous PDS than the national average and provide food grains to a large fraction of their population at these low prices. Further, Antyodaya households are eligible for 35 kg per month at even lower prices.
46 This estimate uses the approximate caloric intake of PDS rice users in rural (2260.25 kcal/day) and urban areas (2076.5 kcal/day) from the 2009-10 round of the NSSO Surveys (Government of India, 2013).
tion.47 While it is difficult to precisely predict how newly eligible households will respond
to the subsidy, there is some evidence to suggest that there will be a positive effect (similar
to that for current beneficiaries) on their caloric intake. Given that the new beneficiaries
are likely to be better off, the subsidy will not be their primary source of food grains and
thus, should have a positive effect on caloric intake through the standard income effect. This
hypothesis is supported by the results in Table 11, which show that the elasticity of caloric
intake is higher for households in the higher expenditure quartiles. There is a concern that
expansion in the reach of the program could result in the inclusion of some households that
have an income high enough to make the relative value of the subsidy too insubstantial to
have an effect. These households are also likely to have lower rates of participation,
which would further reduce the overall impact of the program on caloric intake. Neither of
these concerns is found to be relevant in Tamil Nadu, which is the only state in India that has
a universal PDS. The participation rate for the entire population of the state (averaged over
the six years) is high (61%) and the elasticity of caloric intake with respect to the value of
the subsidy is 0.2, which is higher than the national average (0.144). Tamil Nadu has a well
functioning PDS and the data from this state suggest that if implemented well, the program
can have a substantial impact on caloric intake for a wide range of households.
6. Conclusion
This paper examines the Indian Public Distribution System and presents evidence of its
impact on nutrition using variation in state specific program rules and fluctuations in local
market prices. In agreement with the literature on food subsidies, the elasticities for cereal
consumption and calories with respect to the value of the subsidy are small. However, the
47 44.5% of the population currently participates in the PDS (The Indian Express, July 2013).
results indicate that households benefit from the program in terms of food intake (calories)
and that this is not just through the food grains (cereals) directly provided by the PDS.
Even though the program provides a subsidy only on cereals, it has a positive effect on the
consumption of different food groups. Thus, the PDS subsidy generates an income effect
for households and is effective in improving nutrition. The results also confirm state wise
differences in the functioning and impact of the PDS. Finally, the elasticity estimates suggest
that the implementation of the National Food Security Bill will lead to an increase of 66 -
72 kcal in the daily caloric intake of current beneficiaries of the program.
Some states and regions in India have had much more success in implementing the PDS
than others. The Indian Human Development Survey is a rich socio-economic dataset that
makes it possible to study these regional differences in more detail. In addition to details
on household expenditure, income and fertility, indicators for learning skills and anthropo-
metrics for children are also collected. Examining district level outcomes using participation
and average value of the subsidy would be an additional dimension to studying the PDS.
This would speak to the significant geographic differences that have been noted in the func-
tioning of the system. Another unexplored aspect of the PDS is the protection it provides
from fluctuations in market prices. This would be an interesting angle to assess its value as
a source of food security. Reduced exposure to market risk could also affect a household's
investments and labour market choices.
The PDS has been the topic of a lot of debate recently. The National Food Security Bill,
which was passed in September 2013, will give two thirds of the population the legal right
to obtain 5 kg of food grains (per person per month) at prices between Re 1 and Rs 3 per
kg. The bill also has implications for intra-household bargaining, since it makes the eldest
woman in the household responsible for accessing the PDS on behalf of her family. Given the
enormous scale of the PDS, its expansion and imminent overhaul, it is important to study
the impact that it has in its present form.
References
[1] Atkin, D. (Forthcoming). Trade, Tastes and Nutrition in India. The American Economic
Review.
[2] Atkin, D. (2013). The Caloric Costs of Culture: Evidence from Indian Migrants (No.
w19196). National Bureau of Economic Research.
[3] Banerjee, A. V., and Duflo, E. (2011). Poor Economics: A Radical Rethinking of the Way
to Fight Global Poverty. PublicAffairs.
[4] Behrman, J. R., and Deolalikar, A. B. (1990). The intrahousehold demand for nutrients
in rural south India: Individual estimates, fixed effects, and permanent income. Journal
of human resources, 665-696.
[5] Behrman, J. R., and Deolalikar, A. (1989). Is variety the spice of life? Implications for
calorie intake. The Review of Economics and Statistics, 666-672.
[6] Behrman, J. R. and Deolalikar, A. B. (1988). Health and nutrition. Handbook of devel-
opment economics, 1, 631-711.
[7] Butler, J. S., Ohls, J.C., and Posner, B. (1985). The Effect of the Food Stamp Program on
the Nutrient Intake of the Eligible Elderly. Journal of Human Resources, 20(3):405-420.
[8] Deaton, A. (1984). Household surveys as a data base for the analysis of optimality and
disequilibrium. Sankhya: The Indian Journal of Statistics, Series B, 228-246.
[9] Deaton, A. (1997). The analysis of household surveys: a microeconomic approach to
development policy. Johns Hopkins University Press.
[10] Deaton, A., and Dreze, J. (2009). Food and nutrition in India: facts and interpretations.
Economic and political weekly, 42-65.
[11] Desai, S., Vanneman, R., and National Council of Applied Economic Research. (2005).
India Human Development Survey (IHDS). Computer File, ICPSR22626-v5. Ann Arbor,
MI: Inter University Consortium for Political and Social Research [distributor], 2009-622.
http://dx.doi.org/10.3886/ICPSR22626.
[12] Deshpande, A. (2000). Does caste still define disparity? A look at inequality in Kerala,
India. The American Economic Review, Vol. 90, No.2, pp. 322-325.
[13] Dreze, J., and Khera, R. (2010). The BPL census and a possible alternative. Economic
and Political Weekly, 45(9), 54-63.
[14] Dutta, P., Murgai, R., Ravallion, M., and Van de Walle, D. (2012). Does India’s
employment guarantee scheme guarantee employment? World Bank Policy Research
Working Paper, (6003).
[15] The Economic Times (June 14 2013). Why the Food Security Bill is neither populist
nor unaffordable. Viewed on July 11 2013, http://articles.economictimes.indiatimes.com
/2013-06-14/news/39976460 1 national-food-security-bill-cash-transfers-food-subsidy
[16] FAO (1997). Safety nets to protect consumers from possible adverse effects. Economic
and Social Development Department, Food and Agriculture Organization of the United
Nations.
[17] The Financial Express (September 2 2013). Correct Costs of the Food Security Bill.
Viewed on September 3 2013, http://www.financialexpress.com/news/correct-costs-of-
the-food-security-bill/1156251/0
[18] Government of India (2013). Public Distribution System and Other Sources of Household
Consumption. NSS 66th Round. Report No. 545 (66/1.0/6). National Sample Survey
Office (Ministry of Statistics and Programme Implementation). Government of India,
New Delhi.
[19] Government of India (2012). Union Budget 2012-13. Expenditure Budget Volume I
(Ministry of Finance). Government of India, New Delhi.
[20] Government of India (2011). Annual Report 2010-2011. Department of food and public
distribution (Ministry of Consumer Affairs, Food and Public Distribution). Government
of India, New Delhi.
[21] Hoynes, H. W. and Schanzenbach, D. W. (2009). Consumption Responses to In-Kind
Transfers: Evidence from the Introduction of the Food Stamp Program. American Eco-
nomic Journal: Applied Economics, American Economic Association, vol. 1(4), pages
109-39, October.
[22] The Hindu (July 5 2013). President agrees to Food Bill Ordinance. Viewed on
July 9 2013, http://www.thehindu.com/news/national /president-agrees-to-food-bill-
ordinance/article4884834.ece
[23] The Indian Express (July 6 2013). Manmonia’s FSB: 3% of GDP. Viewed on September
3 2013, http://www.indianexpress.com/news/manmonias-fsb-3–of-gdp/1138195/1
[24] Jacoby, H. G. (2013). Food prices, wages, and welfare in rural India. World Bank Policy
Research Working Paper, (6412).
[25] Jensen, R. T., and Miller, N.H. (2008). Giffen Behavior and Subsistence Consumption.
The American Economic Review, 98(4), 1553-1577.
[26] Jensen, R. T. and Miller, N.H. (2011). Do consumer price subsidies really improve
nutrition? Review of Economics and Statistics, 93(4), p. 1205-1223.
[27] Karat, B. (2013). Food Matters: Law, Policy and Hunger. Prajashakti Book House.
[28] Khera, R. (2008). Access to the Targeted Public Distribution System: A Case Study
in Rajasthan. Economic and Political Weekly, Vol - XLIII No. 44, November 01.
[29] Khera, R. (2011a). Trends In Diversion Of PDS Grain. Economic and Political Weekly,
Vol - XLVI No. 21, May 21.
[30] Khera, R. (2011b). Revival of the Public Distribution System: Evidence and Explana-
tions. Economic and Political Weekly, Vol - XLVI No. 44-45, November 05.
[31] Khera, R. (2011c). India’s Public Distribution System: Utilization and Impact. Journal
of Development Studies, March 07.
[32] Kochar, A. (2005). Can Targeted Programs Improve Nutrition? An Empirical Analysis
Of India’s Public Distribution System. Economic Development and Cultural Change, 54
(1), 203-235.
[33] Laderchi, C. R. (2001). Killing Two Birds With The Same Stone ? The Effectiveness
Of Food Transfers On Nutrition And Monetary Poverty. Working Paper no. 74. QEH
working paper series.
[34] McCall, B. P. (1995). The Impact of Unemployment Insurance Benefit Levels on Re-
cipiency. Journal of Business and Economic Statistics, American Statistical Association,
vol. 13(2), pages 189-98, April.
[35] Moffitt, R. (1989). Estimating the value of an In-kind Transfer: The case of Food
stamps. Econometrica, 57(2), 385-409.
[36] National Sample Survey Organisation (1996). Nutritional Intake in India. Report No.
405. National Sample Survey Organisation, Delhi.
[37] Niehaus, P., Atanassova, A., Bertrand, M., and Mullainathan, S. (forthcoming). Tar-
geting with Agents. American Economic Journal: Economic Policy.
[38] Planning Commission (2001). An Analysis of the price behavior of selected commodities.
National Council of Applied Economic Research, India.
[39] Planning Commission (2005). Performance Evaluation of Targeted Public Distribution
System (TPDS). Programme Evaluation Organisation. Government of India.
[40] Planning Commission (2012). Twelfth Five Year Plan, 2012-17. Government of India.
[41] Sen, P. (2005). Of Calories and Things: Reflections on Nutritional Norms, Poverty
Lines and Consumption Behaviour in India. Economic and Political Weekly, 40(43), 22
October, 4611-8.
[42] Sekhar, C. S. C. (2003). Volatility of agricultural prices-an analysis of major inter-
national and domestic markets. Indian Council for International Economic Relations.
Working paper no 103.
[43] SEWA (2009). Do poor people in Delhi want to change from PDS to cash transfers?
Self Employed Women's Association.
[44] Stifel, D. and Alderman H. (2006). The Glass of Milk Subsidy Program and Malnu-
trition in Peru. World Bank Economic Review, 20(3), 421-448.
[45] Subramanian, S., and Deaton, A. (1996). The demand for food and calories. Journal of
political economy, 133-162.
[46] Sundaram, K. and Tendulkar, S. D. (2003). Poverty has declined in the 1990s: A
resolution of comparability problems in NSS consumer expenditure data. Economic and
Political Weekly, 327-337.
[47] Swaminathan, M. (2008). Programmes to Protect the Hungry: Lessons from India.
DESA Working Paper No. 70.
[48] Tarozzi, A. (2005). The Indian Public Distribution System as Provider of Food Security:
Evidence from Child Nutrition in Andhra Pradesh. European Economic Review, 49, p.
1305-1330.
[49] The Times of India (June 1 2013). Why the food security Bill will not boost
foodgrain consumption for the poor. Viewed on July 11 2013, http://articles.
timesofindia.indiatimes.com/2013-06-01/edit-page/39674597 1 cereal-consumption-
food-security-bill-households
[50] Wadhwa, M. (2001). Parking Space for the Poor: Restrictions Imposed on Marketing
and Movement of Agricultural Goods in India. Centre for Civil Studies.
Figure 1: Impact of a food subsidy on the budget set
[Figure omitted: budget set over Food and Non Food, with labeled points and budget line Segments I and II.]
Figure 2: Variation in PDS rice discount (2002-2008)
[Figure omitted: Panels A to G, one per year from 2002 to 2008. Each panel plots the rice discount (% of Mkt. price) by season (Winter, Summer, Monsoon, Post Monsoon) for Assam, West Bengal, Jharkhand, Orissa, Chattisgarh, Andhra Pradesh, Karnataka and Kerala.]
Notes: 1. Round 58 surveyed households between July and December 2002, resulting in no observations from the winter and summer seasons for that year. Round 63 surveyed households between January and June 2008, resulting in no observations from the post monsoon season for that year. 2. Discount calculated as (Mkt. price - PDS price)/Mkt. price*100. 3. Averages based on PDS and market prices reported by PDS users in the sample.
Table 1: State Specific PDS Quotas for BPL households
State               Rice (kg)                               Wheat (kg)
Andhra Pradesh      4 per person (20 hh max)                5 (at APL price*)
Assam               20                                      0
Bihar               15                                      15
Chattisgarh         25                                      0
Gujarat             1 per person (3.5 hh max)               1.5 per person (9 hh max)
Haryana             10                                      25
Himachal Pradesh    15                                      20
Jharkhand           35                                      0
Karnataka           16                                      4
Kerala              8 per adult, 4 per child (20 hh max)    5 (at APL price*)
Madhya Pradesh      6                                       17
Maharashtra         5                                       15
Meghalaya           2 per person                            0
Orissa              16                                      0
Punjab              10                                      25
Rajasthan           5                                       25
Uttar Pradesh       20                                      15
West Bengal         2 per person                            2 per person

Sources: Planning Commission (2005), Khera (2011b) and Simplifying the food security bill at
http://bit.ly/PMNFSB. Notes: 1. * denotes all households pay Above Poverty Line prices i.e. no price
discount. 2. PDS entitlements are for the period under study: 2002-2008. 3. Tamil Nadu is excluded as it
follows a universal PDS.
Table 2: Variation in PDS Rice discount (%)

Winter (January-March)
State             Mean    Std.Dev   p10     p50     p90     N
Assam             41.46   10.59     20      40.04   56.45   233
West Bengal       47.79   10        23.51   47.94   60.32   445
Jharkhand         54.75   13.57     20.45   51.52   72.73   73
Orissa            42.88   11.76     18.33   44.12   59.42   544
Chattisgarh       45.15   13.83     17.65   41.04   67.09   265
Andhra Pradesh    53.36   5.52      39.91   53.52   60.24   2952
Karnataka         66.92   11.6      23.39   69.82   77.32   1239
Kerala            45.16   10.17     27.39   42.8    59.89   970

Summer (April-May)
State             Mean    Std.Dev   p10     p50     p90     N
Assam             41.78   11.11     6.17    41.82   54.17   163
West Bengal       48.56   11.28     11.12   50.77   61.18   320
Jharkhand         56.62   11.6      20      62.86   70      68
Orissa            44.58   12.01     13.33   44.57   57.92   427
Chattisgarh       49.14   17.95     16      49.53   73.33   178
Andhra Pradesh    56.13   9.96      37.39   54.53   72.83   2134
Karnataka         66.86   11.63     35.56   69.67   78.18   791
Kerala            47.03   12.42     18.18   45.82   63.74   620

Monsoon (June-September)
State             Mean    Std.Dev   p10     p50     p90     N
Assam             38.1    13.92     4.55    35.48   60      283
West Bengal       43.29   11.9      7.82    43.56   57.19   618
Jharkhand         52.73   17.17     6.25    54.83   75      76
Orissa            40.63   10.11     16.59   40.42   53.48   693
Chattisgarh       42.48   12.94     22.5    41.98   63.69   398
Andhra Pradesh    52.94   8.01      36.13   53.05   58.99   3952
Karnataka         58.22   15.45     24.38   62.13   75.07   1697
Kerala            42.77   10.55     18.7    42.32   58.22   1452

Post Monsoon (October-December)
State             Mean    Std.Dev   p10     p50     p90     N
Assam             39.44   13.41     17.65   36.93   59.13   230
West Bengal       43.35   12.77     6.41    43.13   58.5    531
Jharkhand         54.47   14.22     16.67   54.69   71.33   52
Orissa            40.3    10.9      16.67   39      55.48   623
Chattisgarh       41.21   13.67     11.33   39.48   58.43   324
Andhra Pradesh    52.4    5.53      40.98   52.15   60.23   3307
Karnataka         60.2    15.04     14.67   65.5    75      1335
Kerala            41.65   10.71     18.26   39.27   58.66   1217

Source: Calculations using 2002-2008 NSSO Socio-Economic Surveys.
Notes: 1. Discount calculated as (Mkt. price - PDS price)/Mkt. price*100. 2. Averages based on PDS and market prices reported by PDS users in the sample.
Table 3: Variation in the per capita value of the PDS Rice Subsidy (Rs)

Winter (January-March)
State             Mean    Std.Dev   p10     p50     p90     N
Assam             23.52   11.11     8.64    20.94   38.67   233
West Bengal       11.44   3.27      4.62    11.01   16.13   445
Jharkhand         52.11   29.69     14.35   44.38   84      73
Orissa            18.89   12.55     4.14    15.98   31.97   544
Chattisgarh       33.15   27.58     8.28    25.55   52.71   265
Andhra Pradesh    25.36   6.23      12.21   25.18   33.44   2952
Karnataka         35.89   21.56     6.93    31.74   58.24   1239
Kerala            31.11   14.14     7.27    29.72   49.81   970

Summer (April-May)
State             Mean    Std.Dev   p10     p50     p90     N
Assam             26.56   14.4      3.86    23.47   43.42   163
West Bengal       11.9    3.37      2.21    12      15.63   320
Jharkhand         60.51   38.41     15.12   53.2    98      68
Orissa            18.95   11.43     4       16.48   29.99   427
Chattisgarh       35.66   29.53     5.19    29.39   61.99   178
Andhra Pradesh    26.95   7.81      11.94   25.96   36.35   2134
Karnataka         37.93   25.26     8.32    31.49   63.76   791
Kerala            33.07   15.31     7.8     29.86   53.1    620

Monsoon (June-September)
State             Mean    Std.Dev   p10     p50     p90     N
Assam             24.38   13.98     2.7     22      41.81   283
West Bengal       10.76   3.55      2       10.55   14.74   618
Jharkhand         45.45   29.08     7.17    38.04   77      76
Orissa            17.07   9.69      4.34    15.18   28      693
Chattisgarh       32.51   23.28     7.02    24.72   59.79   398
Andhra Pradesh    25.87   6.99      12.09   25.27   34.72   3952
Karnataka         32.62   20.37     6.63    28.71   55.37   1697
Kerala            29.89   13.48     7.7     28.28   48.89   1452

Post Monsoon (October-December)
State             Mean    Std.Dev   p10     p50     p90     N
Assam             26.12   18.65     7.2     21.89   41.63   230
West Bengal       10.88   3.83      1.52    10.69   15.61   531
Jharkhand         54.03   31.11     7.7     45.53   79.8    52
Orissa            17.94   11.33     4.85    14.87   28.8    623
Chattisgarh       31.2    26.64     3.73    24.55   54.93   324
Andhra Pradesh    26.1    6.55      12.43   25.6    34.94   3307
Karnataka         33.71   21.71     4.79    29.1    56.25   1335
Kerala            29.12   12.81     6.34    27.64   46.27   1217

Source: Calculations using 2002-2008 NSSO Socio-Economic Surveys.
Notes: 1. Value of the subsidy calculated as Per capita Quota*(Mkt. price - PDS price). 2. Averages based on PDS and market prices reported by PDS users in the sample.
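
The two formulas in the notes to Tables 2 and 3 can be sketched in a few lines of Python. This is only an illustration, not the paper's code: the function names are hypothetical, the prices are the sample means of the PDS and market rice prices from Table 4, the quota rule is the Andhra Pradesh entry in Table 1, and the simple head count used here stands in for the equivalence scale used in the main analysis.

```python
def rice_discount_pct(market_price, pds_price):
    """Discount as a percentage of the market price (Table 2, note 1)."""
    return (market_price - pds_price) / market_price * 100


def per_capita_quota(household_size, kg_per_person, household_max_kg):
    """Per capita quota under a per-person rule with a household cap.

    Assumption: simple head count, not the paper's equivalence scale.
    """
    return min(kg_per_person * household_size, household_max_kg) / household_size


def subsidy_value_per_capita(quota_per_capita_kg, market_price, pds_price):
    """Value of the subsidy in Rs: per capita quota times the price gap (Table 3, note 1)."""
    return quota_per_capita_kg * (market_price - pds_price)


MARKET_PRICE = 10.80  # Rs/kg, mean market rice price (Table 4)
PDS_PRICE = 5.271     # Rs/kg, mean PDS rice price (Table 4)

discount = rice_discount_pct(MARKET_PRICE, PDS_PRICE)
print(f"discount: {discount:.1f}% of market price")

# Andhra Pradesh rule from Table 1: 4 kg per person, capped at 20 kg per household.
for size in (4, 6):
    quota = per_capita_quota(size, kg_per_person=4, household_max_kg=20)
    value = subsidy_value_per_capita(quota, MARKET_PRICE, PDS_PRICE)
    print(f"household of {size}: quota {quota:.2f} kg, subsidy Rs {value:.2f} per person")
```

The household cap is what generates household size specific variation in this sketch: a household of six hits the 20 kg ceiling, so its per capita quota falls from 4 kg to about 3.33 kg at unchanged prices.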
Table 4: Descriptive statistics for the full sample and PDS users
Sample:                                 Full Sample             PDS users
                                        Mean      Standard      Mean      Standard
                                                  deviation               deviation
Monthly expenditure per capita (Rs)     1011.0    (1085.3)      636.8     (393.4)
Daily calories per capita (kcal)        2334.2    (1300.9)      2190.9    (623.7)
Proportion spent on food                0.547     (0.142)       0.577     (0.116)

Size of the household                   4.570     (2.382)       4.736     (1.891)
Number of children below 15             1.411     (1.428)       1.538     (1.339)
Proportion of women                     0.515     (0.207)       0.512     (0.152)
Age of household head                   46.58     (13.61)       45.38     (12.17)

Urban dummy                             0.363     (0.481)       0.219     (0.414)
Electricity                             0.724     (0.447)       0.725     (0.446)
Permanent home                          0.338     (0.473)       0.314     (0.464)
SC/ST/OBC                               0.592     (0.491)       0.765     (0.424)

PDS rice price (Rs/kg)                                          5.271     (1.855)
Mkt rice price (Rs/kg)                                          10.80     (2.193)

PDS rice qty (kg)                                               18.64     (9.392)
Market rice qty (kg)                                            26.11     (20.24)

Food expenditure per capita (Rs)                                400.5     (168.9)
Cereal expenditure per capita (Rs)                              116.0     (43.85)
Rice subsidy per capita (Rs)                                    25.71     (11.90)

Rice proportion of food expenditure                             0.260     (0.130)

Proportion of calories from rice                                0.615     (0.175)
Proportion of calories from cereals                             0.727     (0.0987)

Observations                            124228                  22564

Notes: 1. Rural Poverty line is Rs 497.6, Urban Poverty line is Rs 635.7 (Planning Commission,
Government of India). 2. Average daily minimum calorie requirements are 2400 kcal for rural
and 2100 kcal for urban areas. 3. All prices in 2005 Rupees. (Rs 45.3 = 1 USD in 2005). 4. The
category cereals includes rice, wheat, semolina, jowar, bajra, millets, corn, barley and ragi. 5. The
sample comprises households in the following states: Andhra Pradesh, Assam, Karnataka, Kerala,
Orissa, Jharkhand, Chattisgarh and West Bengal.
Table 5: Impact of the subsidy on cereal consumption
Dependent variable:               Cereal         Log cereal     Cereal
                                  consumption    consumption    consumption
                                  (1)            (2)            (3)
Rice subsidy per capita           2.030∗∗∗
                                  (0.158)
Log rice subsidy per capita                      0.123∗∗∗
                                                 (0.00963)
Rice quota per capita                                           -1.968
                                                                (4.442)
Market price* quota per capita                                  1.697∗∗∗
                                                                (0.436)
PDS price* quota per capita                                     0.157
                                                                (0.463)
PDS price                                                       -1.607
                                                                (2.849)
Market price                                                    -6.131∗∗
                                                                (2.395)
Observations                      22564          22564          22564
Adjusted R2                       0.250          0.270          0.258

Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Notes: 1. All equations present results clustered at the district level. 2. All equations include household
characteristics (education of household head and spouse, age and age squared of household head, proportion
of females, land owned) and urban, state*year, district and season dummies. 3. Dependent variable in
columns (1) and (3) is daily cereal consumption per capita (grams), dependent variable in column (2)
is log of daily cereal consumption per capita.
Table 6: Impact of the subsidy on calories from different food groups

Dependent variable:            Caloric     Log caloric   Log caloric intake from food group
                               intake      intake        Cereals      Lentils     Fruits &     Meat
                                                                                  Vegetables
                               (1)         (2)           (3)          (4)         (5)          (6)
Rice subsidy per capita        12.58∗∗∗
                               (1.116)
Log rice subsidy per capita                0.144∗∗∗      0.123∗∗∗     0.154∗∗∗    0.234∗∗∗     0.170∗∗∗
                                           (0.0103)      (0.00963)    (0.0177)    (0.0160)     (0.0187)
Observations                   22564       22564         22564        22118       22562        19833
Adjusted R2                    0.124       0.166         0.270        0.215       0.441        0.426

Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Notes: 1. All equations present results clustered at the district level. 2. All equations include household characteristics (education of household head and spouse, age and age squared of household head, proportion of females, land owned) and urban, state*year, district and season dummies. 3. Dependent variable in column (1) is daily caloric intake per capita (kcal), dependent variable in column (2) is log of daily caloric intake per capita, dependent variable in column (3) is log of calories from cereals per capita, dependent variable in column (4) is log of calories from lentils per capita, dependent variable in column (5) is log of calories from fruits and vegetables per capita, and dependent variable in column (6) is log of calories from meat per capita.
54
Table 7: Expenditure elasticity of calories and food expenditure: Rural sample
Dependent variable: (1)-(2) Log caloric intake; (3)-(4) Log food expenditure; (5)-(6) Log Rupees per calorie
                                    (1)         (2)         (3)         (4)         (5)         (6)
Log monthly expenditure per capita  0.406∗∗∗                0.751∗∗∗                0.345∗∗∗
                                    (0.0103)                (0.00883)               (0.0102)
Log rice subsidy per capita                     0.140∗∗∗                0.146∗∗∗                0.00575
                                                (0.0135)                (0.0153)                (0.00632)
Observations                        13333       13333       13333       13333       13333       13333
Adjusted R2                         0.437       0.157       0.820       0.404       0.675       0.527
Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Notes: 1. All equations present results clustered at the district level. 2. All equations include household
characteristics (education of household head and spouse, age and age squared of household head, proportion
of females, land owned) and state*year, district and season dummies. 3. Dependent variable in
columns (1) and (2) is log of daily caloric intake per capita, dependent variable in columns (3) and (4)
is log of food expenditure per capita, dependent variable in columns (5) and (6) is log of price paid per
1000 calories. 4. As in Subramanian and Deaton (1996), the sample comprises only rural households and
monthly expenditure is calculated net of purchases of durable goods.
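The last pair of columns in Table 7 follows mechanically from the first two pairs. A short derivation, assuming (as the notes indicate) that all six regressions share the same sample and regressors, and writing F for food expenditure per capita, C for caloric intake per capita and P = F/C for the price paid per calorie (these symbols are introduced here for illustration only):

```latex
\begin{align*}
\log P &= \log F - \log C
  && \text{(rescaling to ``per 1000 calories'' only shifts the intercept)} \\
\hat{\beta}_{P} &= \hat{\beta}_{F} - \hat{\beta}_{C}
  && \text{(OLS is linear in the dependent variable)} \\
\text{expenditure:}\quad 0.751 - 0.406 &= 0.345 \\
\text{rice subsidy:}\quad 0.146 - 0.140 &= 0.006 \approx 0.00575
  && \text{(difference due to rounding of the reported coefficients)}
\end{align*}
```

Read this way, the subsidy raises calories and food spending by nearly the same proportion, so the price paid per calorie is essentially unchanged, whereas higher total expenditure raises food spending faster than calories.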
Table 8: Income elasticity of cereal and food consumption
Panel A: IHDS data
Dependent variable: Log cereal consumption Log food expenditure
(1) (2) (3) (4)
Log monthly income per capita 0.0462∗∗∗ 0.0304∗∗∗ 0.0966∗∗∗ 0.0827∗∗∗
(0.00972) (0.00959) (0.0108) (0.0105)
Log rice subsidy per capita 0.295∗∗∗ 0.259∗∗∗
(0.0323) (0.0323)
Observations 3962 3962 3962 3962
Adjusted R2 0.306 0.354 0.402 0.429
Panel B: IHDS and NSSO data
Dependent variable: Log cereal consumption
Data: IHDS NSSO IHDS NSSO
Log monthly expenditure per capita 0.247∗∗∗ 0.269∗∗∗
(0.0178) (0.0264)
Log rice subsidy per capita 0.320∗∗∗ 0.179∗∗∗
(0.0314) (0.0390)
Observations 3962 4255 3962 4255
Adjusted R2 0.388 0.357 0.358 0.286
Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Notes: 1. All equations present results clustered at the district level. 2. All equations in panel A include
urban, district and season dummies. All equations in panel B include household characteristics (education
of household head and spouse, age and age squared of household head, proportion of females, land owned)
and urban, district and season dummies. 3. Dependent variable in columns (1) and (2) of panel A and
columns (1)-(4) of panel B is log of cereal consumption per capita, dependent variable in columns (3) and
(4) of panel A is log of food expenditure per capita. 4. The data come from the India Human Development
Survey 2004-05 and the NSSO surveys covering November 2004 - October 2005.
Table 9: Impact of the subsidy on cereal consumption and caloric intake: Heterogeneity by social group
Dependent variable: (1)-(4) Cereal consumption per capita; (5)-(8) Caloric intake per capita
                              (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
Rice subsidy per capita       1.979∗∗∗   1.892∗∗∗   1.830∗∗∗   1.960∗∗∗   12.41∗∗∗   11.61∗∗∗   11.55∗∗∗   12.28∗∗∗
                              (0.172)    (0.209)    (0.173)    (0.159)    (1.214)    (1.046)    (1.309)    (1.142)
Urban dummy                   -31.61∗∗∗  -24.97∗∗∗  -30.90∗∗∗  -24.86∗∗∗  -82.32∗∗∗  -63.93∗∗∗  -104.0∗∗∗  -59.30∗∗∗
                              (6.412)    (3.026)    (3.084)    (3.030)    (27.98)    (13.92)    (13.52)    (13.91)
Urban*Rice subsidy            0.240                                       0.772
                              (0.241)                                     (1.054)
SC/ST/OBC                                1.488                                       -76.34∗∗∗
                                         (5.767)                                     (27.20)
SC/ST/OBC*Rice subsidy                   0.186                                       1.242
                                         (0.205)                                     (1.001)
Lowest expenditure quartile                         -52.27∗∗∗                                   -334.8∗∗∗
                                                    (6.058)                                     (29.14)
Lowest quartile*Rice subsidy                        -0.106                                      -2.961∗∗
                                                    (0.205)                                     (1.169)
Home grown rice                                                -15.73∗                                     -41.21
                                                               (8.247)                                     (39.68)
Home grown*Rice subsidy                                        1.300∗∗∗                                    5.893∗∗∗
                                                               (0.328)                                     (1.609)
Observations                  22564      22564      22564      22564      22564      22564      22564      22564
Adjusted R2                   0.250      0.250      0.274      0.251      0.124      0.124      0.183      0.126
Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Notes: 1. All equations present results clustered at the district level. 2. All equations include household
characteristics (education of household head and spouse, age and age squared of household head, proportion
of females, land owned) and urban, state*year, district and season dummies. 3. Dependent variable in
columns (1)-(4) is daily cereal consumption per capita (grams), dependent variable in columns (5)-(8) is
daily caloric intake per capita (kcal).
Table 10: Impact of the subsidy on rice producing households
Dependent variable: Cereal consumption Caloric intake
(1) (2)
Rice quota per capita -14.15 -38.18
(18.56) (84.41)
Market price* quota per capita 4.483∗∗ 19.52∗∗
(1.792) (7.896)
PDS price* quota per capita -0.989 -5.916
(1.976) (8.667)
PDS price 3.226 38.62
(10.98) (49.72)
Market price -17.15∗ -52.60
(8.760) (40.03)
Observations 2111 2111
Adjusted R2 0.231 0.241
Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Notes: 1. All equations present results clustered at the district level. 2. All equations include household
characteristics (education of household head and spouse, age and age squared of household head,
proportion of females, land owned) and urban, state*year, district and season dummies. 3. Dependent
variable in column (1) is daily cereal consumption per capita (grams), dependent variable in column (2)
is daily caloric intake per capita (kcal). 4. The sample comprises PDS households that report
own (produced) rice as one of their sources of supplementary grains.
Table 11: Elasticity of cereal consumption and caloric intake by expenditure quartile
Panel A: Cereal consumption
Dependent variable: Log cereal consumption per capita
Expenditure quartile: Lowest Highest
(1) (2) (3) (4)
Log rice subsidy per capita 0.0689∗∗∗ 0.0895∗∗∗ 0.109∗∗∗ 0.175∗∗∗
(0.0138) (0.0147) (0.0137) (0.0180)
Observations 5677 5709 5696 5482
Adjusted R2 0.320 0.362 0.330 0.272
Panel B: Caloric intake
Dependent variable: Log caloric intake per capita
Expenditure quartile: Lowest Highest
Log rice subsidy per capita 0.0682∗∗∗ 0.0933∗∗∗ 0.111∗∗∗ 0.196∗∗∗
(0.0135) (0.0132) (0.0129) (0.0182)
Observations 5677 5709 5696 5482
Adjusted R2 0.251 0.294 0.279 0.177
Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Notes: 1. All equations present results clustered at the district level. 2. All equations include household
characteristics (education of household head and spouse, age and age squared of household head,
proportion of females, land owned) and urban, state*year, district and season dummies. 3. In going
from columns (1) to (4), the sample comprises households in the lowest, second lowest, second
highest and highest expenditure quartile of PDS rice users, respectively. 4. Dependent variable in
panel A is log of cereal consumption per capita, dependent variable in panel B is log of daily
caloric intake per capita.
Table 12: Impact on Non-PDS users
Dependent variable:           Cereal cons.        Log cereal cons.    Caloric intake      Log caloric intake
                              (1)                 (2)                 (3)                 (4)
Rice subsidy per capita       -0.0706                                 0.462
                              (0.115)                                 (0.772)
Log rice subsidy per capita                       -0.0141∗                                -0.00863
                                                  (0.00750)                               (0.00686)
Observations                  26494               26494               26494               26494
Adjusted R2                   0.256               0.261               0.045               0.152
Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Notes: 1. All equations present results clustered at the district level. 2. All equations include household
characteristics (education of household head and spouse, age and age squared of household head,
proportion of females, land owned) and urban, state*year, district and season dummies. 3. Dependent
variable in column (1) is daily cereal consumption per capita (grams), dependent variable in
column (2) is log of daily cereal consumption per capita, dependent variable in column (3) is daily caloric
intake per capita (kcal), dependent variable in column (4) is log of daily caloric intake per capita.
4. The sample comprises households that do not receive any subsidies from the PDS. They are assigned
their respective local average value of the PDS subsidy.
Table 13: Alternative specifications for value of the subsidy
Dependent variable: Log caloric intake
(1) (2) (3) (4) (5)
Log rice subsidy (avg. family size) 0.134∗∗∗
(0.00763)
Log rice subsidy (household level) 0.131∗∗∗
(0.0147)
Size of the household 0.134∗∗∗
(0.00243)
Log rice subsidy (per person) 0.202∗∗∗
(0.00992)
Log rice subsidy (median prices) 0.119∗∗∗
(0.0103)
Log rice subsidy per capita 0.148∗∗∗
(state*survey wave) (0.0107)
Observations 22564 22564 22564 22543 22564
Adjusted R2 0.173 0.613 0.200 0.159 0.168
Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Notes: 1. All equations present results clustered at the district level. 2. All equations include
household characteristics (education of household head and spouse, age and age squared of household
head, proportion of females, land owned) and urban and district dummies. Equations (1)-(4) include
state*year and season dummies. Equation (5) includes state*survey-wave dummies. 3. Dependent
variable in columns (1), (4) and (5) is log of daily caloric intake per capita, dependent variable
in column (2) is log of daily caloric intake at the household level, dependent variable in
column (3) is log of daily caloric intake at the household level divided by the total number
of household members.
Table 14: Impact on different sub-samples of states
Dependent variable: Log cereal consumption Log caloric intake
States: All Rice Non Rice All Rice Non Rice
(1) (2) (3) (4) (5) (6)
Log rice subsidy per capita 0.0796∗∗∗ 0.123∗∗∗ 0.0439∗∗∗ 0.101∗∗∗ 0.144∗∗∗ 0.0666∗∗∗
(0.00677) (0.00963) (0.00664) (0.00701) (0.0103) (0.00675)
Observations 33231 22564 10667 33231 22564 10667
Adjusted R2 0.263 0.270 0.255 0.197 0.166 0.255
Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Notes: 1. All equations present results clustered at the district level. 2. All equations include household
characteristics (education of household head and spouse, age and age squared of household head,
proportion of females, land owned) and urban, state*year, district and season dummies. 3. Dependent
variable in columns (1) - (3) is log of daily cereal consumption per capita, dependent variable in columns
(4)-(6) is log of daily caloric intake per capita. 4. The rice favoring states are: Andhra Pradesh, Assam,
Chattisgarh, Jharkhand, Karnataka, Kerala, Orissa and West Bengal. The non-rice favoring states are:
Bihar, Gujarat, Haryana, Himachal Pradesh, Madhya Pradesh, Maharashtra, Punjab, Rajasthan
and Uttar Pradesh.
Table 15: Impact of the wheat subsidy
Dependent variable: Log cereal consumption Log caloric intake
(1) (2)
Log rice subsidy per capita 0.138∗∗∗ 0.156∗∗∗
(0.0110) (0.0125)
Log wheat subsidy per capita 0.0172∗∗∗ 0.0270∗∗∗
(0.00549) (0.00519)
Observations 12235 12235
Adjusted R2 0.275 0.193
Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Notes: 1. All equations present results clustered at the district level. 2. All equations include household
characteristics (education of household head and spouse, age and age squared of household head,
proportion of females, land owned) and urban, state*year, district and season dummies. 3. Dependent
variable in column (1) is log of daily cereal consumption per capita, dependent variable in
column (2) is log of daily caloric intake per capita.
Table 16: Impact of the subsidy on cereal consumption and caloric intake: State level functioning
Dependent variable: Cereal consumption per capita Caloric intake per capita
(1) (2)
Rice subsidy per capita 2.359∗∗∗ 12.04∗∗∗
(0.180) (1.148)
Corrupt*Rice subsidy -1.207∗∗∗ -7.060∗∗∗
(0.304) (1.480)
Observations 22564 22564
Adjusted R2 0.251 0.117
Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Notes: 1. All equations present results clustered at the district level. 2. All equations include household
characteristics (education of household head and spouse, age and age squared of household head,
proportion of females, land owned) and urban, state*year, district and season dummies. 3. Dependent
variable in column (1) is daily cereal consumption per capita (grams), dependent variable in column (2)
is daily caloric intake per capita (kcal).
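The interaction term in Table 16 implies a smaller marginal effect of the subsidy in states classified as corrupt. A minimal sketch of that arithmetic, using only the point estimates reported in the table; the function name is illustrative, and no standard error for the sum is computed because the table does not report the covariance between the two coefficients:

```python
def marginal_effect(base, interaction, corrupt):
    """Effect of one more rupee of rice subsidy per capita on the outcome."""
    return base + (interaction if corrupt else 0.0)


# Point estimates from Table 16: (Rice subsidy, Corrupt*Rice subsidy).
estimates = {
    "cereal consumption (grams per day)": (2.359, -1.207),
    "caloric intake (kcal per day)": (12.04, -7.060),
}

for outcome, (base, interaction) in estimates.items():
    other = marginal_effect(base, interaction, corrupt=False)
    corrupt = marginal_effect(base, interaction, corrupt=True)
    share = corrupt / other
    print(f"{outcome}: {other:.3f} vs {corrupt:.3f} in corrupt states "
          f"({share:.0%} of the effect elsewhere)")
```

On these point estimates the effect in corrupt states is less than half of the effect elsewhere for both outcomes, which is the sense in which the abstract describes it as substantially smaller.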
Appendix Tables
A. Per capita purchase of PDS Rice
Mean (kg) Std. Dev. N
Assam 7.29 3.5 909
West Bengal 2.82 2.46 1914
Orissa 6.60 3.63 2287
Jharkhand 4.74 2.41 269
Chattisgarh 9.09 4.07 1169
Andhra Pradesh 5.07 2.50 12345
Karnataka 5.05 2.49 5062
Kerala 5.50 3.49 4259
Full Sample 5.34 3.1 28214
Note: The sample comprises PDS rice users.
B. Price paid per calorie by food group
Food group: Cereals Lentils Vegetables Meat
Mean (Rs/1000 kcal) 2.40 8.46 18.89 41.45
Std. Dev. 0.63 2.80 6.48 19.42
Note: Price calculations based on quantity and value reported by PDS rice users.
C. BPL and Antyodaya Households - IHDS and NSSO 61st round data
Sample:                                   BPL                 Antyodaya
Data:                                     IHDS      NSSO      IHDS      NSSO
Monthly expenditure per capita (Rs)       660.3     558.5     485.3     478.9
                                          (539.4)   (306.4)   (481.2)   (309.5)
Monthly income per capita (Rs)            606.5               471.0
                                          (785.6)             (370.1)
PDS rice price (Rs/kg)                    4.475     5.399     3.329     3.461
                                          (1.493)   (1.290)   (0.954)   (0.983)
Market rice price (Rs/kg)                 10.54     10.53     9.753     10.09
                                          (2.227)   (2.016)   (1.634)   (2.049)
PDS rice Qty (kg)                         18.60     17.95     24.71     23.73
                                          (6.766)   (8.513)   (9.438)   (10.81)
Market rice Qty (kg)                      23.08     25.29     25.67     16.68
                                          (22.63)   (20.88)   (26.31)   (20.38)
Daily cereal consumption per capita (kg)  0.441     0.467     0.538     0.496
                                          (0.170)   (0.136)   (0.195)   (0.186)
Rice subsidy per capita (Rs)              27.23     24.97     32.47     30.85
                                          (15.42)   (14.29)   (23.30)   (23.90)
Observations 4196 7405 283 788
Notes: 1. Means with standard deviations in parentheses. 2. The sample comprises PDS rice users
from the 61st round (2004-05) of the NSSO surveys and the India Human Development Survey 2005.
D. PDS Rice users - IHDS and NSSO
Data: IHDS NSSO
Monthly expenditure per capita (Rs)       625.1     596.7
                                          (442.2)   (362.4)
Monthly income per capita (Rs)            587.5
                                          (567.3)
PDS rice price (Rs/kg)                    4.546     5.237
                                          (1.612)   (1.627)
Market rice price (Rs/kg)                 10.48     10.56
                                          (1.906)   (2.122)
PDS rice qty (kg)                         19.16     18.54
                                          (7.253)   (8.785)
Market rice qty (kg)                      24.52     27.97
                                          (23.03)   (21.37)
Daily cereal consumption per capita (kg)  0.434     0.475
                                          (0.158)   (0.129)
Rice subsidy per capita (Rs)              25.46     24.17
                                          (11.18)   (11.37)
Observations 3962 4255
Notes: 1. Means with standard deviations in parentheses. 2. The sample comprises PDS rice users
(November 2004 to October 2005) from the NSSO surveys and the India Human Development Survey 2005.
Removing Observation Bias: using “big data” to create cooking metrics from stove usage monitors and minimal observation
Andrew M. Simons, Theresa Beltramo, Garrick Blalock, David I. Levine*
"Big data" holds the promise of improving social science research. One application of big data is in understanding the adoption behaviors of new cooking technologies in the developing world. Stove Usage Monitors, for example, can generate millions of observations with lower Hawthorne effects than traditional surveys or direct observation. This paper describes field methods and proposes a new algorithm for converting temperature data generated from stove use monitors into usage metrics for both traditional and nontraditional stoves. Central to our technique is recording the visual on/off status of a stove any time research staff observes the stove. The observations are then combined with temperature readings and a predictive logistic regression to predict the probability a stove is in use. Using this algorithm we correctly predict 89% of three stone fire observations and 94% of Envirofit observations. The logistic regression correctly classifies more observations than published temperature analysis algorithms. Further, we analyze stove usage data when multi-person measurement teams are present in households compared to the same times in the previous and subsequent weeks when only the unobtrusive sensors are collecting temperature data. We find a statistically and economically significant Hawthorne effect whereby study participants increase the use of the introduced stoves by approximately one third (2 hours per day), while simultaneously decreasing three stone fire usage by approximately the same amount when measurement teams are present.
Keywords: observation bias, Hawthorne effect, improved cookstoves, technology adoption, biomass fuel, monitoring and evaluation, stove use monitors
* Simons: Charles H. Dyson School of Applied Economics and Management, Cornell University. Beltramo: Impact Carbon, San Francisco. Blalock: Charles H. Dyson School of Applied Economics and Management, Cornell University. Levine: Haas School of Business, University of California, Berkeley.
Acknowledgments: This study was funded by the United States Agency for International Development under Translating Research into Action, Cooperative Agreement No. GHS-A-00-09-00015-00. The recipient of the grant was Impact Carbon, which co-funded and managed the project. Juliet Kyaesimira and Stephen Harrell expertly oversaw field operations and Amy Gu provided excellent research support. We thank Impact Carbon partners Matt Evans, Evan Haigler, Jimmy Tran, Caitlyn Toombs, and Johanna Young; Berkeley Air project partners including Dana Charron, Dr. David Pennise, Dr. Michael Johnson, and Erin Milner; the USAID TRAction Technical Advisory Group; and seminar participants at Cornell University for valuable comments. Special thanks to Kirk Smith, Ilse Ruiz-Mercado, Ajay Pillarisetti and colleagues from the U.C. Berkeley Household Energy, Climate, and Health Research Group for the development of the Stove Use Monitoring system (SUMs) concept and consultations on optimal SUMs placement and holder design for this project. Data collection was carried out by the Center for Integrated Research and Community Development (CIRCODU), and the project’s success relied on expert oversight by CIRCODU's Director General Joseph Ndemere Arineitwe and field supervisors Moreen Akankunda,
Innocent Byaruhanga, Fred Isabirye, Noah Kirabo, and Michael Mukembo. This study was made possible by the support of the American people through the United States Agency for International Development (USAID). We thank the Atkinson Center for a Sustainable Future at Cornell University for additional funding of related expenses. The findings of this study are the sole responsibility of the authors, and do not necessarily reflect the views of their respective institutions, nor USAID or the United States Government.
Introduction
“Big data” is transforming economic policy and the process of economic research (Einav and Levin 2014). Prominent examples include using internet search results to provide real-time estimates of economic activity (Choi and Varian 2012; Goel et al. 2010), analyzing administrative records to evaluate the long-term effects of better teachers and the effects of providing health insurance (Chetty, Friedman, and Rockoff 2011; Finkelstein et al. 2012), and utilizing shopping data from millions of online shoppers to understand their sensitivity to in-state sales taxes (Einav et al. 2012).
The developing world also has opportunities for the analysis of “big data.” We use millions of data points collected by non-invasive, relatively inexpensive temperature monitors to create metrics of stove use in a developing country setting.
Traditional wood- and charcoal-burning stoves both burn inefficiently and produce a great amount of smoke. The smoke leads to respiratory, heart, and other disorders that kill approximately 4.0 million people per year (Lim et al. 2012). Much of the health burden is concentrated on women and children, as is the time burden of collecting biomass fuel (Kammen, Bailis, and Herzog 2002). Environmental damage can be significant; by one estimate, household energy use in Africa will produce 6.7 billion tons of carbon by 2050 (Bailis, Ezzati, and Kammen 2005).
Furthermore, the incomplete combustion of biomass fuels leads to the release of black carbon (soot) that contributes to current global warming (Bond, Venkataraman, and Masera 2004; Ramanathan and Carmichael 2008). These inefficiencies imply deeper poverty, loss of biodiversity, rapid deforestation, and climate change. These problems have led both policy-makers and practitioners to search for safer and more fuel-efficient stoves that are attractive to consumers.
One challenge to understanding stove use substitution behaviors is to measure stove use precisely, economically and with minimal intrusion into the lives of study participants. Measuring stove usage with minimal observation bias is necessary to recover unbiased estimates of stove use because study participants potentially behave differently when research staff is present. Scientists studying the health effects of cooking technologies need to understand how many hours a day a stove is used to calculate exposure to household air pollution (Martin II et al. 2011; McCracken et al. 2007). Project financiers need to measure the time cooked on both new and old stoves to allocate the carbon credits that can fund stove subsidies in developing countries (Gold Standard Foundation 2013; GIZ 2011; Freeman and Zerriffi 2012). Further, “stove stacking” (the use of multiple fuels and stoves) often occurs when multiple cooking options exist (Masera, Saatkamp, and Kammen 2000; Ruiz-Mercado et al. 2013). To understand any of these issues researchers must
know the baseline amount of cooking on existing stoves, and then the time cooked on both the existing and introduced stove.
Pioneering stove studies have highlighted the harmful effects of indoor air pollution and the health benefits of adopting fuel-efficient nontraditional stoves (Ezzati, Saleh, and Kammen 2000; Ezzati and Kammen 2002; Ezzati and Kammen 2001; Smith et al. 2010; Smith et al. 2011; Smith et al. 2006; Smith-Sivertsen et al. 2009). However, these studies were characterized by high levels of interaction between research staff and study participants; for example, in Ezzati et al. (2000) research staff noted the location of each household member and how active the cooking fire was every 5-10 minutes. Observational studies of stove use raise the possibility of observation bias (or Hawthorne effect), which arises when participants act differently than they normally would because they know they are being observed (as noted by Smith-Sivertsen et al. 2009).
To minimize observation bias, stove use monitor systems (SUMs) can record stove temperatures without the need for an observer to be present. The SUMs used for our project, iButtons™ manufactured by Maxim Integrated Products, Inc., are small stainless steel temperature sensors about the size of a small coin and the thickness of a watch battery which can be affixed to any stove type. Our SUMs record temperatures with an accuracy of ±1.3°C up to 85°C (see Figure 1 for a photograph).1 Stove usage monitors that log stove temperatures were first suggested by Ruiz-Mercado et al. (2008). SUMs offer an unobtrusive, precise, relatively inexpensive, and objective measure of stove usage (Ruiz-Mercado et al. 2008).2
We know of two published algorithms for determining stove usage with SUMs. Both studies use SUMs to gather continuous temperature data for nontraditional cookstoves, and then calculate temperature changes between subsequent temperature readings.
In Ruiz-Mercado, Canuz, and Smith (2012) the algorithm considers an increase in temperature above a certain threshold (1.52°C increase over 40 minutes) as the start of a candidate cooking or refueling event. If the post-increase stove temperature is also above the ambient temperature, the algorithm counts the passage of time until a temperature drop indicates that the stove is cooling down (indicated by a 2.28°C decrease over 60 minutes), signaling the end of the cooking event. These slope thresholds are chosen by taking the 99th and 1st percentiles of slope values from the distribution of ambient air temperatures in the research area. Multiple temperature peaks within a two-hour period of each other are considered the same cooking event (Ruiz-Mercado, Canuz, and Smith 2012). Mukhopadhyay et al. (2012) use a similar technique combining both change in slope and a temperature threshold to classify when a stove is being used. However, the
1 For additional details concerning the SUMs see the product description website at: http://www.berkeleyair.com/products-and-services/instrument-services/78-sums
2 The SUMs used in our project cost approximately USD $16 each and could record temperature data for 24 hours a day for four to six weeks in a household before needing minimal servicing from a technician (after the data download they can be reset and re-used).
researchers choose different thresholds based on observed local cooking practices and local ambient temperatures. While the decision of what temperature thresholds to use is informed by data from the experiment, it requires the assumption that the slope thresholds for starting and ending cooking are the 99th and 1st percentiles, respectively, of the slope of the ambient air temperatures. If these slope thresholds are not the true indicators of starting and ending cooking, this introduces error into the specification. Further, a challenge of the existing algorithms is that different thresholds are needed for different geographic areas and potentially for different seasons.
We propose a methodology that uses SUMs temperature data gathered in the same way as previous studies, but combines this temperature data with observations made throughout the experiment of when stoves were seen as on or off. We run a logistic regression and use SUMs temperature data to predict observations of stoves being on or off. We then take the coefficients from this equation to predict the probability of cooking over the much larger set of SUM temperature readings without any observational data. The researcher can sum the predicted minutes cooked in a 24-hour period to estimate minutes of cooking that day. This measure of cooking can then be used as the basis for other analysis such as comparing cooking behaviors between groups of households with certain characteristics or groups differentiated by experimental design.
Methods
Our technique requires continuous SUMs temperature data for a given stove, and recorded instances of whether that particular stove is seen in use or not (henceforth called “observations”). We matched observations of stove use to SUMs temperature data by time and date stamps. The core of our method is a logistic regression using the lags and leads of the SUMs temperature data to predict observations of stove usage.
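The steps just described (fit a logistic regression on the matched observations, apply the coefficients to every reading, then sum predicted minutes per day) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the temperature series and the observations are invented, a single lag and a single lead stand in for whichever of the tested specifications is preferred, the rescaling of temperatures is our own choice, and a hand-written gradient-ascent fitter replaces a statistics package so that the example needs only the standard library.

```python
import math

INTERVAL_MIN = 30  # one SUM reading every 30 minutes, as in the study


def features(temps, i, lags=1, leads=1):
    """Intercept plus rescaled temperatures for readings i-lags .. i+leads."""
    window = temps[i - lags:i + leads + 1]
    return [1.0] + [(t - 25.0) / 10.0 for t in window]  # rescaling is our choice


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


def predict(beta, row):
    """Predicted probability that the stove is in use."""
    return sigmoid(sum(b * x for b, x in zip(beta, row)))


def fit_logit(X, y, steps=5000, rate=0.05):
    """Logistic regression fitted by plain gradient ascent (illustration only)."""
    beta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, target in zip(X, y):
            err = target - predict(beta, row)
            for j, x in enumerate(row):
                grad[j] += err * x
        beta = [b + rate * g / len(X) for b, g in zip(beta, grad)]
    return beta


# Invented day of 48 half-hourly readings (deg C) with two cooking episodes.
temps = ([24.0] * 20 + [45.0, 62.0, 66.0, 58.0, 30.0, 26.0] + [24.0] * 10
         + [50.0, 64.0, 61.0, 40.0] + [25.0] * 8)

# Invented enumerator observations: (index of matched reading, stove seen lit?).
observed = [(5, 0), (12, 0), (21, 1), (22, 1), (25, 0),
            (30, 0), (37, 1), (38, 1), (45, 0)]

X = [features(temps, i) for i, _ in observed]
y = [lit for _, lit in observed]
beta = fit_logit(X, y)

# Apply the fitted coefficients to every reading that has a full window.
probs = [predict(beta, features(temps, i)) for i in range(1, len(temps) - 1)]
minutes = INTERVAL_MIN * sum(probs)
print(f"predicted cooking time: {minutes:.0f} minutes")
```

Summing probabilities rather than thresholded 0/1 predictions keeps borderline readings from being counted as a full interval of cooking; whether the daily total is built from probabilities or from classified readings is a design choice the passage above leaves open.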
Continuous SUMs temperature data
A stove usage monitor takes a temperature reading at an interval specified by the user. The SUMs used in this experiment hold approximately 2,000 data points before the data need to be downloaded and the device reset. We set the SUMs to record a temperature every 30 minutes, giving them about six weeks between required servicing.
Observations of stove use
Each time the data collection team visited a household they observed which stoves were in use (along with the date and time). Enumerators visited a house numerous times during a “measurement week,” when we also administered a survey and weighed wood for a kitchen performance test (Bailis, Smith, and Edwards
2007). Another enumerator visited once every 4-6 weeks to download data and reset the SUM.3
We originally intended to use the entire sample of observations. We found, however, a number of anomalies. For example, 3.0% (10 out of 329) of “lit” three stone fire observations had SUM readings below the typical daily mean temperature (23.8°C). At the same time 7.8% (105 out of 1339) of “not lit” three stone fire observations had SUM readings over 40°C, which was the highest ambient air temperature recorded in the observation period. While suspicious, these readings are not necessarily erroneous: a SUM can be cool if the stove was lit in the last few minutes and the SUM has not yet registered the heat increase, and a SUM can still be hot on an unlit stove if the stove was just extinguished. We suspect a substantial share of such cases were due to errors in observing or recording the lit status of stoves or to errors in matching the time stamps, stoves, or homes of observations and SUMs measurements.
Therefore we trim the observation sample based on the hourly ambient temperatures (with a conservative adjustment). We drop “lit” observations whose SUM reading is more than 2.0°C below the mean ambient temperature for that hour, and we drop “not lit” observations whose SUM reading is more than 2.0°C above the maximum ambient temperature for that hour. The mean and maximum hourly ambient temperatures are taken across the entire sample of ambient temperature readings. This removes 11.0% (184 out of 1668) and 3.6% (28 out of 782) of the observation samples for three stone fires and Envirofits, respectively.
Regression specification
There is no theory of what combinations or transforms of current, lagged, and leading temperature readings should be used to predict observed cooking. We test ten specifications and compare the goodness of fit for each specification. (We do not also include slope terms, because they are perfectly collinear with multiple lags or leads.)
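The trimming rule for the observation sample described above can be sketched as a simple filter. The hourly ambient statistics and the four observations below are invented for illustration, and the function and variable names are hypothetical; only the 2.0°C margin and the two drop conditions come from the text.

```python
def keep_observation(lit, sum_temp, hour_mean, hour_max, margin=2.0):
    """Return False for observations that the trimming rule would drop."""
    if lit and sum_temp < hour_mean - margin:
        return False  # recorded as lit, but the sensor is colder than ambient
    if not lit and sum_temp > hour_max + margin:
        return False  # recorded as not lit, but hotter than ambient ever gets
    return True


# Hypothetical hourly ambient statistics (deg C), keyed by hour of day.
ambient = {12: {"mean": 27.0, "max": 38.0}, 19: {"mean": 22.5, "max": 30.0}}

# Hypothetical matched observations: (hour, observed lit?, SUM temperature).
observations = [
    (12, True, 61.0),   # kept: lit and clearly hot
    (12, True, 23.5),   # dropped: lit but below 27.0 - 2.0
    (19, False, 24.0),  # kept: not lit and near ambient
    (19, False, 45.0),  # dropped: not lit but above 30.0 + 2.0
]

kept = [
    (hour, lit, temp)
    for hour, lit, temp in observations
    if keep_observation(lit, temp, ambient[hour]["mean"], ambient[hour]["max"])
]
print(f"kept {len(kept)} of {len(observations)} observations")
```

Because the rule compares each reading with ambient statistics for the same hour of day, a warm afternoon reading on an unlit stove is not penalized the way it would be under a single all-day cutoff.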
For simplicity, we want to use the same specification on both the three stone fire sample and the Envirofit sample.
Data
A series of randomized controlled trials focusing on the use and adoption of fuel-efficient stoves was conducted in rural areas of the Mbarara District in southwestern Uganda (Figure 2) from February to September 2012.4 In our study the
3 One caveat with the observations used in this study is that when there are multiple stoves of a given type (for example two Envirofits or two three stone fires) in a household, the observation was the answer to the following question: “Is an [Envirofit stove/Three Stone Fire] on in the given household?” In cases where the answer was “Yes, one Envirofit stove is on and in use” but two Envirofits were present in the household, the data cleaning process required comparing the SUMs temperature readings of both Envirofits that were in the house at the timestamp of the observation and then assigning the “on” value to the hotter one, and the “off” value to the cooler one. The same process was applied to three stone fires as well. This process is described in detail in Appendix A.
4 See Harrell, Beltramo, Levine, Blalock, and Simons (2014) and Levine, Beltramo, Blalock, and Cotterman (2013) for more details on the experimental design and an overview of the study area.
traditional stove was a three stone fire,5 and the nontraditional stove was a fuel-efficient Envirofit G-3300.6
The study area is characterized by agrarian livelihoods including farming of matooke (starchy cooking banana), potatoes, and millet as well as raising livestock. At baseline almost all families cook on a traditional three-stone fire (97%), usually located within a separate cooking hut (62% of households had totally enclosed kitchens with no windows, while 38% had semi-enclosed kitchens with at least one window). Most stove usage occurs during lunch and dinner preparation, with matooke and beans the most common and most time-consuming foods cooked. Matooke, a main food for lunch and dinner, is unripe plantain eaten after steaming for 3-5 hours. Beans, another common dish, are prepared by boiling and simmering, generally for 2-4 hours. Thus for the main meals, it is common cooking practice to simmer and/or steam foods for several hours in a row.
The study tracked stove usage (before and after the purchase of a fuel-efficient stove) amongst twelve households in each of fourteen rural parishes in Mbarara (168 total households). Upon arriving in a new parish, staff displayed the new Envirofit and offered it for sale to anyone who wanted to purchase at 40,000 Ugandan Shillings (approximately USD $16, see Levine et al. (2013) for an overview of the sales contract). Consumers who wanted to buy the stove were randomly assigned into two groups (early buyers, late buyers). The project asked both early buyers and late buyers if they would agree to have SUMs placed on their traditional stoves immediately. Then approximately four weeks later the early buyers group received their first Envirofit stove, and approximately four weeks after that the late buyers received their first Envirofit stove.
In each parish, more than twelve households agreed to join the study; therefore among those that agreed, we randomly selected twelve households per parish for the usage study with the SUMs. Stove temperatures were tracked for approximately six months (April – September 2012). Approximately four weeks after late buyers received their Envirofits, both groups were surprised with a second Envirofit stove. Common cooking practices in the area require two simultaneous cooking pots (for example rice and beans, or matooke and some type of sauce); therefore, because the Envirofit is sized for one cooking pot, we gave a second Envirofit to mimic normal cooking behavior as much as possible. Each household had as many as two three stone fires and two Envirofit stoves monitored with SUMs throughout the study. The results of stove use will be presented in a subsequent analysis.
5 A three stone fire is simply three large stones, approximately the same height, on which a cooking pot can be balanced over a fire. Almost all cooks burned firewood or other solid biomass.
6 Product manufacturer features state that the Envirofit G-3300 reduces smoke and harmful gasses by up to 80%, reduces biomass fuel use by up to 60% and reduces cooking time by up to 50%. For more details see: http://www.envirofit.org/products/?sub=cookstoves&pid=10
Results

Descriptive statistics

The data consist of approximately 1.7 million temperature readings from more than 34,000 stove-days of use recorded across 580 stoves at 168 households. The sample sizes by stove type are in Table 1. SUMs must be placed close enough to the heat source to capture changes in temperatures, but not so close that they exceed 85°C, at which point they overheat and malfunction. We do not need to recover the exact temperature of the hottest part of the fire to learn about cooking behaviors. Even with SUMs that are reading temperatures 20-30 centimeters from the center of the fire, as long as the temperature readings for times when stoves are in use are largely different from those for times when stoves are not used, the logistic regression will be able to predict a probability of usage. We did not have strict control over the location of our SUM in a three stone fire, while we did for an Envirofit. SUMs for three stone fires were placed in a SUM holder (Figure 3)7 and then placed under one of the stones in the three stone fire (left panel, Figure 4). The SUMs for Envirofits were attached using duct tape and wire and placed at the base of the stove behind the intake location for the firewood (right panel, Figure 4). Over multiple weeks users likely move the locations of the stones in their three stone fires depending on their cooking needs, pushing the stones closer together for a smaller fire and wider apart to fit more fuel wood for a larger fire. Therefore we expect the tracking of three stone fire temperatures to be more difficult with SUMs than the tracking of Envirofit stove usage. Figure 5 shows an example of SUMs temperature data for a household across about three weeks. The left panel shows the temperatures registered in a three stone fire versus the ambient temperature also recorded with SUMs in this household, while the right panel compares the temperature of the Envirofit to the ambient temperature reading.
These two figures illustrate a point that is largely consistent throughout our wider data set. SUMs readings of the three stone fires are less responsive than Envirofit SUMs readings to temperature changes. It seems that residual heat stays in the cooking area of a three stone fire, preventing temperatures from dropping all the way to the ambient temperature. The design of the Envirofit, which is engineered to be much more heat efficient, shows up in these SUMs graphs as less residual heat when a stove is not in use, with temperatures dropping to room temperature or below. A final difficulty is found on July 2-4 (left panel, Figure 5), where temperatures for the three stone fire remain above 40°C for about a 48-hour period. While it could be that the fire was on for this entire period, it could also be that the household was
7 The location for optimal SUM placement on three stone fires and Envirofit stoves was field tested in Uganda prior to starting the experiment. Additionally, we are thankful for numerous consultations with Dr. Kirk Smith and colleagues at the University of California, Berkeley who have successfully used SUMs and designed the SUM holder (Figure 3) to track stove temperatures in previous studies (Ruiz-Mercado et al. 2008; Ruiz-Mercado, Canuz, and Smith 2012; Mukhopadhyay et al. 2012).
banking embers (keeping coals warm, but not cooking) and the SUM was close to those embers. This figure highlights some of the potential challenges of using temperatures as a predictive mechanism for cooking metrics, especially with three stone fires. Ambient temperature sensors were placed in one household in each of the fourteen parishes.8 Figure 6 is a box plot of the ambient temperatures by hour of the day (averaged over each day of the experiment). There are 1,668 visual observations for three stone fires, and 782 for the Envirofit stove (see Appendix A for a description of the process to match observations to temperature data). The summary statistics for the entire sample and for the trimmed sample (dropping observations that were implausible given the ambient temperature) are in Table 2. Note the spread of mean temperatures (“off” vs. “on”) is wider for observations for Envirofits (26.2°C vs. 41.5°C in full sample, and 25.3°C vs. 42.0°C in trimmed sample) as compared to the three stone fires (30.2°C vs. 38.3°C in full sample, and 28.2°C vs. 39.0°C in trimmed sample).

Comparison of fit: full vs. trimmed sample

We first examine one regression with two leads and lags on both the full and trimmed samples. Table 3 shows the logistic regression predicting the observation that the stove was recorded as lit. The percent correctly classified is good (83.1%), but the pseudo-R2 is a modest 0.17. Figure 7 shows the predicted probability a stove is observed in use at each SUM temperature. Contrary to our priors, the predicted probability is fairly linear for the full sample (especially for the three stone fire). We originally hoped to let the dataset “speak for itself.” However, the predicted probability a three stone fire with a temperature reading of 50°C is “lit” is only about 50%. Because 50°C is much higher than the highest ambient temperature we recorded (40°C), it should have a predicted probability of cooking much higher than 50%.
The predictive power rises substantially when we discard outliers, defined as observations coded “lit” and more than two degrees below the hourly mean air temperatures (Figure 6), or “not lit” and more than two degrees above the maximum air temperature we observed that hour of the day across the duration of the experiment (Figure 6). This increase in fit holds for the three stone fires (pseudo-R2 increases from 0.17 to 0.40, percentage correctly classified increases from 83.1% to 89.3%) and Envirofits (pseudo-R2 increases from 0.45 to 0.70, percentage correctly classified increases from 90.7% to 93.8%). (Note that trimming these outliers mechanically improves the fit and percent correctly classified.) When we drop outliers the predicted probability now has the expected logistic S-shape (Figure 7).

8 Ambient temperature readings were only recovered in 13 out of 14 parishes. In total there are about 56,000 ambient temperature readings.
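The trimming rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Observation` record, the function names, and the example temperatures are hypothetical, and the hourly mean and maximum air temperatures are assumed to be available as lookup tables keyed by hour of day (as in Figure 6).

```python
from dataclasses import dataclass


@dataclass
class Observation:
    hour: int          # hour of day (0-23) of the visual observation
    sum_temp_c: float  # SUMs temperature reading matched to the observation
    lit: bool          # True when the enumerator coded the stove as "lit"


def is_outlier(obs, hourly_mean_c, hourly_max_c, margin_c=2.0):
    """Flag observations that are implausible given ambient air temperature.

    "Lit" observations more than margin_c below the hourly mean air
    temperature, and "not lit" observations more than margin_c above the
    hourly maximum air temperature, are treated as outliers.
    """
    if obs.lit:
        return obs.sum_temp_c < hourly_mean_c[obs.hour] - margin_c
    return obs.sum_temp_c > hourly_max_c[obs.hour] + margin_c


def trim(observations, hourly_mean_c, hourly_max_c):
    """Return the trimmed sample (outliers dropped)."""
    return [o for o in observations
            if not is_outlier(o, hourly_mean_c, hourly_max_c)]


# Hypothetical ambient values for noon; not taken from the study data.
mean_c = {12: 27.0}
max_c = {12: 33.0}
sample = [
    Observation(12, 24.0, True),   # "lit" but colder than mean - 2: dropped
    Observation(12, 45.0, True),   # plausible "lit": kept
    Observation(12, 36.0, False),  # "not lit" but hotter than max + 2: dropped
    Observation(12, 30.0, False),  # plausible "not lit": kept
]
trimmed = trim(sample, mean_c, max_c)
print(len(trimmed))
```

Because the rule removes exactly the observations that disagree most with the temperature data, any model fit on the trimmed sample will classify a larger share correctly by construction, which is the mechanical improvement noted in the text.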
Regression Specifications

We test ten regression specifications on the trimmed sample for both the three stone fires and the Envirofit (Table 4). Initially, we include only the temperature at time t (specification 1); it has a pseudo-R2 of 37% and 60%, and correctly classifies 87.5% and 96.1% of the three stone fire and Envirofit sample, respectively. Adding a lead and lag of one period (spec. 2) increases the pseudo-R2 to 41% and 68%, and increases the percentage correctly classified to 89.2% for three stone fires, but decreases the percentage correctly classified to 94.9% for the Envirofits. Adding a second (spec. 4), third (spec. 6) or fourth (spec. 8) lead and lag does not improve fit by much. Adding a slope-squared term when the slope is positive for the time step from t-1 to t (coding slope-squared as zero otherwise) measures steep positive increases in slope, as would be the case when a fire is lit. However, adding this slope-squared term (col. 3, 5, and 7) minimally changes the results. Only including leads (spec. 9) or only including lags (spec. 10) reduces pseudo-R2, but increases percentage correctly classified to 89.4% and 95.3% (spec. 10) for three stone fires and Envirofits, respectively.

In order to determine which of the models is most appropriate we test the ten specifications with the Akaike information criterion (AIC) (Akaike 1974; Akaike 1981). The AIC trades off goodness of fit of the model against the complexity of the model to guard against overfitting. The AIC is defined as:

$$\mathrm{AIC} = 2k - 2\ln(L)$$

where k is the number of parameters in the statistical model, and L is the maximized value of the likelihood function of the model. Given a set of candidate models for a set of data, the preferred model is the one with the lowest AIC. Burnham and Anderson (2002) argue that the AIC is appropriate in cases where the number of observations is many times larger than the square of the number of predictors, as in this case.
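As a small worked illustration of the selection rule, the sketch below computes the AIC from a parameter count and a maximized log-likelihood and picks the candidate with the lowest value. The parameter counts and log-likelihoods are hypothetical placeholders chosen for the example; they are not the values behind Table 4.

```python
def aic(num_params, log_likelihood):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * num_params - 2 * log_likelihood


def preferred_model(candidates):
    """Return the name of the candidate with the lowest AIC.

    candidates maps a model name to (number of parameters,
    maximized log-likelihood).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical candidates: adding leads and lags raises k and ln(L).
candidates = {
    "spec_1": (25, -520.0),  # AIC = 50 + 1040 = 1090
    "spec_4": (29, -480.0),  # AIC = 58 + 960  = 1018
    "spec_8": (33, -478.5),  # AIC = 66 + 957  = 1023
}
for name, (k, log_l) in candidates.items():
    print(name, aic(k, log_l))
print("preferred:", preferred_model(candidates))
```

In this made-up case the richest model has the highest likelihood but loses to the intermediate one, because the gain in fit does not cover the penalty of two AIC points per extra parameter.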
Among both the three stone fire and Envirofit samples, the AIC is much higher for the specifications with temperature at t (spec. 1), temperature at t and four leads (spec. 9), and temperature at t and four lags (spec. 10) than for the other specifications. The remaining seven specifications all have similar AICs. The specification that includes two leads and two lags (spec. 4) has the lowest AIC among the three stone fire sample. The lowest AIC among the Envirofit sample is the specification with four leads and four lags (spec. 8). For simplicity, we use the same specification on both the three stone fire sample and the Envirofit sample; therefore, we rely on the specification with two leads and lags (spec. 4) for both samples.

Logistic Regression Specification

According to the AIC for three stone fires, the preferred specification is:

$$O_{it} = F(\gamma_1 T_{it} + \gamma_2 T_{i,t-2} + \gamma_3 T_{i,t-1} + \gamma_4 T_{i,t+1} + \gamma_5 T_{i,t+2} + \gamma_6 H_{it} + \varepsilon_{it})$$

where F(.) is the logistic functional form and $O_{it}$ is a dummy variable for the visual observation (lit or not lit) of stove i at time t. $T_{it}$ is the SUMs temperature reading for stove i at time t. $T_{i,t+\tau}$ is the SUMs temperature reading for stove i at time t+τ, for τ = -2, -1, 1, and 2 (that is, 60 and 30 minutes prior to the observation, and 30 and 60 minutes after). $H_{it}$ is a dummy variable for the hour of the day for stove i at time t. The terms $\gamma_1, \gamma_2, \gamma_3, \gamma_4, \gamma_5, \gamma_6$ are the parameters and $\varepsilon_{it}$ is an error term.

Estimated odds ratios of the logistic regression for both the three stone fires and the Envirofit are presented in Table 3 (columns 2 and 4). An increase of one degree Celsius at time t is associated with an odds ratio of 1.37 and 1.46 (both statistically significant at the 1% level) for the three stone fire and Envirofit, respectively. Holding all else constant, a one degree Celsius increase at time t therefore raises the odds of a stove being “on” by approximately 37% for three stone fires and 46% for Envirofits, relative to the odds at the initial time t temperature. The three stone fire temperature reading at time t+2 is associated with an odds ratio of 1.07 (significant at the 5% level), meaning the odds of a three stone fire being on at time t are 7% higher in the presence of a one degree increase in temperature at time t+2, holding all else constant. The Envirofit temperature reading at time t+1 has an odds ratio of 1.24 (significant at the 1% level) and at time t+2 an odds ratio of 0.90 (significant at the 5% level). This means the odds of an Envirofit being on at time t are 24% higher in the presence of a one degree increase in temperature at time t+1, and 10% lower in the presence of a one degree increase in temperature at time t+2, holding all else constant.9

As a robustness check, Table 5 shows the in-sample and out-of-sample predictive accuracy of the preferred specification.
We randomly split the sample of observations of each stove in half and ran the logistic regression on half of the observations. We then predicted the probability of use for both halves of the sample. For the three stone fires the in-sample predictive accuracy is 89.8% while out-of-sample predictive accuracy is 87.8%. For the Envirofits the in-sample predictive accuracy is 95.5% while out-of-sample predictive accuracy is 91.1%. Because the in-sample and out-of-sample predictive accuracy is very similar for each stove type, it is unlikely that we are overfitting the data. A classification table for the entire sample (Table 6) shows how well the regression predicts the entire observation sample for the three stone fires and for the Envirofit. The sensitivity (probability predicted “lit” when the stove was observed as “lit”) is only 56.5% for the three stone fires and 77.6% for the Envirofits, showing that the technique struggles when only predicting the sample that was observed as “lit.”

9 We also tested probit and the linear probability model (LPM) in addition to the logistic regression. Predictions of the probability of stove usage from the probit and logistic regression were correlated at levels higher than 0.99 for both the three stone fire and Envirofit sample. Among the predictions of probability of stove use from the linear probability model, 28.8% and 36.2% of the predictions fell outside of the range [0 to 1] for the three stone fire and Envirofit, respectively. As the proportion of linear probability model predicted probabilities that fall outside of the unit interval increases, the potential bias of using LPM also increases (Horrace and Oaxaca 2006). Because we have a high proportion of LPM predictions that fall outside of this range, using LPM would likely result in biases.

The
specificity (probability predicted “not lit” when the stove was observed as “not lit”) is much better, 97.6% for the three stone fire, and 97.0% for the Envirofits. This technique generally holds more predictive power with Envirofits than three stone fires as evidenced by the Envirofit sample having a higher percentage of overall correctly classified observations (93.8% vs. 89.2%).

Applying published SUMs algorithms to our data

We replicate the algorithm described in Ruiz-Mercado, Canuz, and Smith (2012) and Mukhopadhyay et al. (2012) to convert temperature readings to hours of usage. We first determined a threshold increase and decrease in stove temperature equal to the 99th and 1st percentiles of ambient temperature slopes. In our sample of approximately 56,000 ambient temperature readings the 99th percentile of slopes is 0.0833°C/minute (triggering slope of an increase of 2.5°C in 30 minutes), and the 1st percentile of slopes is -0.05°C/minute (exit slope of a decrease of 1.5°C in 30 minutes). The algorithm counts a stove as turning “on” when the slope is larger than the triggering slope, then it continues to count each subsequent temperature reading as “on” (as long as temperatures are higher than the ambient temperature) until it reaches a negative slope that is steeper than the exit slope. We calculate the on/off status of our entire data set using this algorithm. We apply this algorithm to both the traditional three stone fires and the Envirofit stoves; however, Mukhopadhyay et al. (2012) and Ruiz-Mercado, Canuz, and Smith (2012) designed their algorithms for nontraditional stoves and do not endorse this technique for three stone fires. We observe some irregularities in this algorithm when applied to our data. Most notably, a very hot stove could be marked as “off” too quickly if the temperature falls faster than 1.5°C in 30 minutes, even if the absolute temperature is still hot enough to likely indicate cooking.
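A minimal sketch of the slope-based rule as it is described here, not the cited authors' code. It assumes readings spaced 30 minutes apart, so the triggering and exit slopes are expressed as temperature changes per reading (+2.5°C and -1.5°C). The function name and the optional `ambient_c` argument are illustrative assumptions: the text does not say which ambient reading each temperature is compared against, so the ambient check is skipped unless a value is supplied. The two inputs are the illustrative temperature series the text uses for this algorithm.

```python
def basic_on_off(temps_c, trigger_delta_c=2.5, exit_delta_c=-1.5,
                 ambient_c=None):
    """Classify each reading as on (True) or off (False).

    temps_c: temperature readings in degrees C, 30 minutes apart.
    A stove turns on when the change since the previous reading exceeds
    trigger_delta_c, and turns off when the change is steeper (more
    negative) than exit_delta_c.
    """
    states = []
    on = False
    for i, temp in enumerate(temps_c):
        if i > 0:
            delta = temp - temps_c[i - 1]
            if not on and delta > trigger_delta_c:
                on = True
            elif on and delta < exit_delta_c:
                on = False
        # Optional ambient check (assumption: one fixed ambient value).
        if on and ambient_c is not None and temp <= ambient_c:
            on = False
        states.append(on)
    return states


def labels(states):
    """Render boolean states as 'on'/'off' strings for printing."""
    return ["on" if s else "off" for s in states]


hot_stove = [26, 51, 52, 47, 49, 46, 41, 37, 33, 31]
slow_cooling = [26, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20]

print(labels(basic_on_off(hot_stove)))
print(labels(basic_on_off(slow_cooling)))
```

With the ambient check left off, the first series switches off at the 52°C to 47°C drop even though the stove is still hot, and the second stays on throughout because no single 30-minute drop is steep enough to count as an exit.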
Take for example the following series of temperature data {26, 51, 52, 47, 49, 46, 41, 37, 33, 31°C} with each reading thirty minutes apart. The algorithm would return {off, on, on, off, off, off, off, off, off, off}. When the temperature falls from 52°C to 47°C between the third and fourth reading the slope is steep enough and the algorithm switches to off. However, in reality the subsequent temperature readings above 40°C are likely to indicate stove usage (recall no ambient air temperatures were ever that high). Another irregularity with the basic algorithm is that stoves can cool too slowly to trigger an exit slope so the algorithm returns an excessively long period of cooking (at times multiple days) despite temperatures low enough that cooking is unlikely. For example the data series {26, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20°C} would return {off, on, on, on, on, on, on, on, on, on, on, on, on, on, on} despite 20°C being below the mean ambient temperature in our period, and almost certainly not hot enough to indicate stove usage. We created an adjusted algorithm to account for these two cases. The first adjustment is that once the algorithm has a triggering slope marking the stove as “on” it does not consider an exit slope to trigger as “off” until the absolute temperature is less than 40°C. The second adjustment is that once the algorithm has
a triggering slope marking the stove as “on” it will switch to “off” even without a sufficiently steep exit slope if it has negative slope readings for an hour straight and the absolute temperature is below 35°C.

Comparison of fit

Next we test which algorithm correctly predicts the highest percentage of observations. In addition to testing the Ruiz-Mercado-Mukhopadhyay et al. algorithm and the adjusted Ruiz-Mercado-Mukhopadhyay et al. algorithm, we also test various cutoff points in relation to mean ambient temperature. For example, we consider all data points above temperature thresholds such as 32°C, 34°C, 36°C, and 38°C as “on” (these correspond to approximately 8°C, 10°C, 12°C, and 14°C above mean ambient temperature, respectively). Table 7 reports the classification table of how well the algorithms predict the observations of three stone fire use. The strict threshold cutoff of 36°C correctly classified 85.0% of the observations. This was the best performing strict threshold, correctly predicting more than the thresholds of 32°C (76.8%), 34°C (81.9%), and 38°C (84.8%). The adjusted Ruiz-Mercado-Mukhopadhyay et al. algorithm (77.7%) predicted better than the Ruiz-Mercado-Mukhopadhyay et al. algorithm (58.9%), but not as well as the strict threshold of 36°C (85.0%). The logistic regression with temperature readings at times t-2, t-1, t, t+1, t+2 (that is at time t, and 30 and 60 minutes prior to, and 30 and 60 minutes after) and hour of the day predicts the most observations of three stone fire use, predicting 89.3% of observations correctly. Judged by the pseudo R-squared, the logistic regression explains 41.8% of the variation in observations, whereas each of the temperature thresholds has a pseudo R-squared in the range of 10.3% to 18.4%. The Ruiz-Mercado-Mukhopadhyay et al. algorithm and the adjusted Ruiz-Mercado-Mukhopadhyay et al. algorithm have a pseudo R-squared of 0.1% and 7.7%, respectively.
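The adjusted algorithm evaluated in this comparison can be sketched as follows. This is one reading of the two adjustments rather than the exact implementation: readings are assumed to be 30 minutes apart, so “negative slope readings for an hour straight” is interpreted as two consecutive negative changes, and the reading at which an exit condition is met is itself marked “off.” The function and parameter names are illustrative.

```python
def adjusted_on_off(temps_c, trigger_delta_c=2.5, exit_delta_c=-1.5,
                    exit_ceiling_c=40.0, slow_cool_ceiling_c=35.0,
                    slow_cool_steps=2):
    """Slope-based on/off classification with two adjustments.

    1. A steep exit slope only switches the stove off once the absolute
       temperature is below exit_ceiling_c.
    2. The stove also switches off after slow_cool_steps consecutive
       negative changes (assumed: 2 readings = 1 hour) when the absolute
       temperature is below slow_cool_ceiling_c.
    """
    states = []
    on = False
    negative_run = 0  # consecutive readings with a negative change
    for i, temp in enumerate(temps_c):
        if i > 0:
            delta = temp - temps_c[i - 1]
            negative_run = negative_run + 1 if delta < 0 else 0
            if not on:
                if delta > trigger_delta_c:
                    on = True
            else:
                steep_exit = delta < exit_delta_c and temp < exit_ceiling_c
                slow_exit = (negative_run >= slow_cool_steps
                             and temp < slow_cool_ceiling_c)
                if steep_exit or slow_exit:
                    on = False
        states.append(on)
    return states


hot_stove = [26, 51, 52, 47, 49, 46, 41, 37, 33, 31]
slow_cooling = [26, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20]

print(adjusted_on_off(hot_stove))
print(adjusted_on_off(slow_cooling))
```

Under these assumptions the hot stove stays “on” through the readings above 40°C and switches off at 37°C, while the slowly cooling series switches off after one hour of decline below 35°C instead of remaining “on” down to 20°C.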
Table 8 shows how well the algorithms predict the observations of Envirofit use. Among Envirofits, the strict threshold cutoff of 36°C correctly classified 94.7% of the observations. This was the best performing strict threshold, correctly predicting more than the thresholds of 32°C (91.6%), 34°C (93.9%), and 38°C (94.3%). The strict threshold of 36°C also correctly classified more observations (94.7%) than both the Ruiz-Mercado-Mukhopadhyay et al. algorithm (84.4%), and the adjusted Ruiz-Mercado-Mukhopadhyay et al. algorithm (85.9%). The logistic regression with temperature readings at times t-2, t-1, t, t+1, t+2 (that is at time t, and 30 and 60 minutes prior to, and 30 and 60 minutes after) and hour of the day correctly predicts slightly fewer (93.8%) than the best performing temperature threshold of 36°C (94.7%). However, considering the pseudo R-squared, the logistic regression explains 67.1% of the variation in trimmed observations, whereas each of the temperature thresholds has a pseudo R-squared in the range of 27.0% to 36.7%. The Ruiz-Mercado-Mukhopadhyay et al. algorithm and the adjusted Ruiz-Mercado-Mukhopadhyay et al. algorithm have a pseudo R-squared of 14.4% and 19.6%, respectively. Because the performance of a strict
temperature threshold would vary based on the local climate, the logistic regression is a more robust algorithm.

Evidence of a Hawthorne Effect

Our kitchen performance tests (KPT) involve visits on four days in a row to weigh wood and install measurement devices, all while asking participants to record their cooking in a food diary. A major risk is that the direct observational process altered participants’ behavior (as noted by Ezzati, Saleh, and Kammen 2000; Smith-Sivertsen et al. 2009). Specifically, it is plausible that participants used their Envirofit stoves more (and their three-stone fire less) during the KPT than during other times. To test for such Hawthorne effects, we examine stove usage during the final KPT, when all participants have two Envirofits. We compare stove usage during the KPT with stove usage over the same duration and days of the week in the prior and following weeks. Our measurement periods are roughly 72 hours (generally late Monday morning to early Thursday afternoon). Table 9 shows the summary statistics for predicted hours cooked per day among the four stoves. Using paired t-tests, we compare usage of a given stove within the same household. Households used their first Envirofit about 3.8 hours per day in the week before the KPT and 3.4 hours in the following week. Their usage was about 4.6 hours10 during the KPT week. The differences between the KPT measurement period and the previous week (0.75 hours, 95% CI = [0.12 to 1.37], p<0.05) and the following week (1.21 hours, 95% CI = [0.58 to 1.84], p<0.01) were statistically significant at the 5% and 1% levels, respectively. The difference in daily hours cooked on the first Envirofit stove in the week prior to and following the KPT was not statistically significantly different from zero. Households used their second Envirofit about 2.7 hours per day in the week before the KPT and 2.4 hours in the following week. Their usage was about 3.4 hours11 during the KPT week.
The differences between the KPT measurement period with previous week (0.83 hours, 95% CI = [0.27 to 1.38], p<0.01) and with the following week (0.81 hours, 95% CI = [0.34 to 1.27], p<0.01) were both highly statistically significant. The difference in daily hours cooked on the second Envirofit stove in the week prior to and following the KPT was not statistically significantly different than zero. While usage on each Envirofit stove rose, three stone fire usage declined during the KPT. Households used their primary three stone fire about 8.9 hours per day in the
10 To be precise, the cooking during the KPT measurement week for the first Envirofit was 4.59 hours per day among the 105 pairs of households that had SUMs temperature readings on their first Envirofit stove for both the KPT week and the prior week, and 4.56 hours among the 111 pairs of households that had SUMs temperature readings on their first Envirofit stove for the KPT week and the following week. 11 To be precise, the cooking during the KPT measurement week for the second Envirofit was 3.48 hours per day among the 88 pairs of households that had SUMs temperature readings on their second Envirofit for both the KPT week and the prior week, and 3.24 hours among the 95 pairs of households that had SUMs temperature readings for their second Envirofit for the KPT week and the following week.
week before the KPT and 10.1 hours in the following week. Their usage was about 7.4 hours12 during the KPT week. The differences between the KPT measurement period with previous week (1.44 hours, 95% CI = [0.43 to 2.46], p<0.01) and with the following week (2.74 hours, 95% CI = [1.56 to 3.93], p<0.01) were both highly statistically significant. The difference in daily hours cooked on the primary three stone fire in the week prior to and following the KPT was weakly statistically significantly different than zero (p<0.10). Households used their secondary three stone fire about 9.8 hours per day in the week before the KPT and 9.8 hours in the following week. Their usage was 9.0 during the KPT week. Due to a very small sample size,13 none of these differences are significantly different than zero. Ultimately, our interest is to know how combined cooking on Envirofits compares to combined cooking on three stone fires during this observation period. Due to a constraint in the number of functioning SUM devices at the end of our experiment, we have a very small sample (N=22) of temperature readings for secondary three stone fires. Therefore to analyze total combined stove usage we will use the balanced sample in which we successfully recovered SUMs data for both Envirofits and for the primary three stone fire (Table 10) in a given household. Among this sample (N=48) households used their Envirofits about 6.1 hours per day in the week before the KPT and 5.6 hours in the following week. Their usage was 8.1 hours during the KPT week. (Recall we match the same measurement period in each week.) The differences between the KPT measurement period with previous week (2.06 hours, 95% CI = [1.07 to 3.05], p<0.01) and with the following week (2.54 hours, 95% CI = [1.66 to 3.41], p<0.01) were both highly statistically significant. The difference in daily hours cooked in the week prior to and following the KPT was not statistically significantly different than zero. 
While Envirofit use rose, three stone fire use declined during the KPT. Among the same sample (Table 10) households used their primary three stone fire about 8.1 hours per day in the week prior to the KPT and 10.5 hours in the week following the KPT. The usage was 7.0 hours per day during the KPT week. The differences between the KPT measurement period and the previous week (1.13 hours, 95% CI = [-0.13 to 2.38], p<0.10) and the following week (3.50 hours, 95% CI = [1.92 to 5.09], p<0.01) were statistically significant at the 10% and 1% levels, respectively.

12 To be precise, the cooking during the KPT measurement week for the primary three stone fire was 7.43 hours per day among the 98 pairs of households that had SUMs temperature readings on their primary three stone fire for both the KPT week and the prior week, and 7.40 hours among the 100 pairs of households that had SUMs temperature readings on their primary three stone fire for the KPT week and the following week.
13 SUMs devices malfunction and can no longer be used once they reach 85°C. Like other studies (Ruiz-Mercado, Canuz, and Smith 2012; Ruiz-Mercado et al. 2008; Mukhopadhyay et al. 2012), our experiment was no exception to the losses of SUMs devices. Due to losses from malfunction or the inability to recover all the devices we initially placed, we did not have enough SUMs to place on all stoves by the end of the experiment. When the final KPT took place (near the end of our experiment) we instructed enumerators to exclude the second three stone fire if they were constrained in the number of SUMs devices. Table 9 shows few SUMs readings from the second three stone fire (N=22) compared to the other three stove types.

The difference in daily hours cooked on the primary three stone fire in the week
prior to and following the KPT was significantly different (p<0.01). All parishes overlapped in time, and the results are therefore robust to time effects. While balancing the sample and focusing on combined Envirofit usage compared to primary three stone fire usage within the same household shows evidence of a Hawthorne effect, it also drastically reduces the sample size. Another option that utilizes more data is regression analysis to estimate the effect of the presence of the KPT measurement team on stove usage. Assign the subscripts t=-1 to the week prior to the measurement week, t=0 to the measurement week, and t=1 to the week after the measurement week. Let $Y_{it}^{TSF}$ be total hours cooked per day on the primary three stone fire for household i during week t. Let $\mathit{Adjacent\_Week}$ be a dummy variable for an adjacent week (t=-1 or t=1). The regression is modeled using Ordinary Least Squares (OLS) as:

$$Y_{it}^{TSF} = \beta^{TSF} \cdot \mathit{Adjacent\_Week} + \alpha_i + \varepsilon_{it} \tag{1}$$

where $\alpha_i$ is a fixed effect for the individual household (this controls for household level characteristics that don’t change over these three weeks like family size, income, etc.), and $\varepsilon_{it}$ is an error term. The coefficient $\beta^{TSF}$ is the estimate of how different (in hours cooked per day) an adjacent week is compared to a measurement week (this specification imposes that the previous and subsequent week are equal). Standard errors are clustered at the household level. The regression for the combined Envirofit usage is similar:

$$Y_{it}^{ENV} = \beta^{ENV} \cdot \mathit{Adjacent\_Week} + \alpha_i + \varepsilon_{it} \tag{2}$$

where $Y_{it}^{ENV}$ is the total hours cooked per day on both Envirofits and all other variables are the same as in the three stone fire case. The coefficient $\beta^{ENV}$ is the estimate of how different (in hours cooked per day) an adjacent week is compared to the measurement week. This also imposes equality of the previous and subsequent week. To test the weeks separately, we use a slightly different specification.
Let Week_Prior be a dummy variable equal to 1 for the week before the measurement week (when t=-1) and 0 otherwise, and let Week_After be a dummy variable equal to 1 for the week after the measurement week (when t=1) and 0 otherwise. Then the regression is modeled using OLS as:

$$Y_{it}^{TSF} = \beta_1^{TSF} \mathit{Week\_Prior} + \beta_2^{TSF} \mathit{Week\_After} + \alpha_i + \varepsilon_{it} \tag{3}$$

where $\alpha_i$ is a fixed effect for the individual household and $\beta_1^{TSF}$ is the estimate of the difference (in hours cooked per day) of the week before the measurement week compared to the measurement week. The coefficient $\beta_2^{TSF}$ is the estimate of the difference (in hours cooked per day) of the week after the measurement week compared to the measurement week. The standard errors are clustered at the household level. The specification for the combined total daily cooking with Envirofits is similar:

$$Y_{it}^{ENV} = \beta_1^{ENV} \mathit{Week\_Prior} + \beta_2^{ENV} \mathit{Week\_After} + \alpha_i + \varepsilon_{it} \tag{4}$$

where $Y_{it}^{ENV}$ is the total hours cooked per day on both Envirofits and all other variables are the same as in the three stone fire case. The coefficient $\beta_1^{ENV}$ is the estimate of how different (in hours cooked per day) the week prior to the measurement week is compared to the measurement week. The coefficient $\beta_2^{ENV}$ is the estimate of how different (in hours cooked per day) the week after the measurement week is compared to the measurement week.

Based on the regression estimates (Table 11), the patterns of stove use are essentially the same as in Tables 9-10: Envirofit usage increases while three stone fire usage decreases in the presence of the measurement team. These effects disappear once the measurement team departs. When constraining the week prior to and after the measurement week to be equal, estimated primary three stone fire usage is higher by 2.26 hours per day (95% CI = [1.13 to 3.39], p<0.01) when comparing the adjacent weeks to the measurement week (col. 1) and the estimated total Envirofit usage is lower by 2.40 hours per day (95% CI = [1.48 to 3.32], p<0.01) when comparing the adjacent weeks to the measurement week (col. 3). Running the analysis weekly, the estimated use of primary three stone fires (col. 2) as compared to usage levels during the measurement week is 1.62 hours per day higher (95% CI = [0.32 to 2.92], p<0.05) and 2.88 hours per day higher (95% CI = [1.37 to 4.39], p<0.01) in the week prior to and after the measurement week, respectively. These coefficients are jointly significantly different from zero (joint F test significant at 1% level). The total usage on Envirofits (col.
4) as compared to Envirofit usage levels during the measurement week is estimated as a decrease of 2.02 hours per day (95% CI = [0.92 to 3.13], p<0.01) and a decrease of 2.71 hours per day (95% CI = [1.72 to 3.71], p<0.01) in the week prior to and after the measurement week, respectively. These coefficients are jointly significantly different from zero (joint F test significant at 1% level). Importance of a Hawthorne Effect To see the importance of high Envirofit usage during the KPT, consider the following illustrative example. Assume:
o Households without Envirofits use three stone fires twelve hours per day
o Households with Envirofits normally use three stone fires nine hours a day and Envirofits four hours per day
o During the KPT, households with Envirofits use three stone fires seven hours a day and Envirofits six hours per day
o Three stone fires create one unit and Envirofits create ½ unit of pollution per hour
With these illustrative assumptions, pollution per day declines from twelve units prior to the introduction of an Envirofit to eleven units (9 + (½ × 4) = 11) once the Envirofit is present, a decline of 8.33%. However, if we instead used the data from the kitchen performance test, we would estimate a decline in pollution from twelve units to ten units (7 + (½ × 6) = 10), a decline of 16.67%, or twice the true decline. While these figures are merely illustrative, they show the importance of observer effects in observational studies.
The kitchen performance test is one of the workhorse tests for determining field-based use of stoves. Our results suggest the need to correct for the Hawthorne effect prior to publishing results. For example, Bailis et al. (2007) find, when comparing the results of pre- and post-intervention kitchen performance tests, that improved cookstoves statistically significantly reduce average per capita fuel use, with reductions ranging from 19% to 67% in rural Mexico. The reduction in emissions of CO2 equivalents is estimated at 3.9 tons per home per year (Johnson et al., 2009) in the same study area in rural Mexico. In these studies, daily biomass consumption patterns were calculated based on the results of kitchen performance tests. Other important studies focusing on the health benefits of improved stoves (Ezzati, Saleh, and Kammen 2000; Smith-Sivertsen et al. 2009) also use kitchen performance tests or other direct observational methods when measuring indoor air pollution. Given that our findings demonstrate that stove users change their behaviors, favoring the clean stove technology when they are directly observed, it is possible that estimates of health benefits would be biased if the underlying behaviors in those studies were influenced in a similar manner.
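The arithmetic of the illustrative pollution example above can be checked with a short script. This is a minimal sketch: the function and argument names are ours, and the inputs are the hypothetical hours of use and emission rates assumed in the example, not measured values.

```python
def daily_pollution(tsf_hours, env_hours, tsf_rate=1.0, env_rate=0.5):
    """Units of pollution per day, given daily hours of use on each stove type.

    The default rates are the illustrative ones from the example:
    one unit per hour for a three stone fire, half a unit for an Envirofit.
    """
    return tsf_hours * tsf_rate + env_hours * env_rate


baseline = daily_pollution(12, 0)  # household without an Envirofit
normal = daily_pollution(9, 4)     # normal use once an Envirofit is present
observed = daily_pollution(7, 6)   # use while the KPT team is present

true_decline = (baseline - normal) / baseline * 100
measured_decline = (baseline - observed) / baseline * 100

print(f"true decline: {true_decline:.2f}%")
print(f"decline measured during the KPT: {measured_decline:.2f}%")
```

Under these assumptions the decline measured during the KPT is exactly twice the true decline, which is the overstatement described in the example.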
Conclusion
We describe an algorithm that combines observations of stove use with continuous temperature data from a stove usage monitor to predict the on/off status of a stove. We test various regression specifications and use the Akaike Information Criterion to choose a logistic regression in which observations are regressed against temperature at times t-2, t-1, t, t+1, t+2 and a dummy variable for hour of the day. This regression specification correctly predicts 89.3% of three stone fire observations and 93.8% of Envirofit observations of stove usage. The predictive accuracy is almost the same in-sample and out-of-sample.
We continue by comparing the performance of this algorithm to previously published algorithms (Mukhopadhyay et al. 2012; Ruiz-Mercado, Canuz, and Smith 2012) and strict temperature threshold cutoffs. Our logistic regression correctly classifies more observations, with a higher pseudo R-squared, than any other algorithm for three stone fires. Among the Envirofit observations, our logistic regression correctly predicts a higher percentage of observations than both the Ruiz-Mercado-Mukhopadhyay et al. algorithm and the adjusted Ruiz-Mercado-Mukhopadhyay et al. algorithm. The strict 36°C threshold predicts slightly more observations (0.4%) than our logistic regression; however, the logistic regression has a much higher pseudo R-squared than all other algorithms.
We argue that a predictive logistic regression with observations of actual use and temperature readings at times t-2, t-1, t, t+1, t+2 and hour of the day is the best algorithm for both traditional and nontraditional stove types. Further, this algorithm is transferable to many climatic zones (both hot and cold), whereas strict temperature thresholds and algorithms based on entry and exit slopes need adjustment depending on the local climate and local cooking practices.
We find evidence of a large Hawthorne effect during the 72-hour-long kitchen performance tests. Households appear to increase cooking on Envirofit stoves by 32.4% (2.02 hours per day) during the measurement week, followed by a reduction in Envirofit use of 33.6% (2.71 hours per day) in the same period the week after the measurement team has departed. Further, use of the primary three stone fires decreases by 18.7% (1.62 hours per day) during the measurement week, followed by an increase of 39.9% (2.88 hours per day) in the week after the kitchen performance test. Caution is warranted in interpreting the results of observational studies: our finding of a large and statistically significant Hawthorne effect suggests that reported benefits of fuel efficient and clean burning stoves could be overstated if researchers measure them while behavior is biased in favor of the cleaner technology. By using SUMs and a machine-learning algorithm to predict the probability of a stove being in use, we show one way around this obstacle.
Bibliography
Akaike, Hirotugu. 1974. “A New Look at the Statistical Model Identification.” Automatic Control, IEEE Transactions on 19 (6): 716–723.
———. 1981. “Likelihood of a Model and Information Criteria.” Journal of Econometrics 16: 3–14.
Bailis, Rob, Victor Berrueta, Chaya Chengappa, Karabi Dutta, Rufus Edwards, Omar Masera, Dean Still, and Kirk R. Smith. 2007. “Performance Testing for Monitoring Improved Biomass Stove Interventions: Experiences of the Household Energy and Health Project.” Energy for Sustainable Development 11 (2) (June): 57–70. doi:10.1016/S0973-0826(08)60400-7. http://linkinghub.elsevier.com/retrieve/pii/S0973082608604007.
Bailis, Rob, Kirk R. Smith, and Rufus Edwards. 2007. “Kitchen Performance Test (KPT)”. University of California, Berkeley, CA.
Bailis, Robert, Majid Ezzati, and Daniel M Kammen. 2005. “Mortality and Greenhouse Gas Impacts of Biomass and Petroleum Energy Futures in Africa.” Science 308 (5718) (April 1): 98–103. doi:10.1126/science.1106881. http://www.ncbi.nlm.nih.gov/pubmed/15802601.
Bond, Tami, Chandra Venkataraman, and Omar Masera. 2004. “Global Atmospheric Impacts of Residential Fuels.” Energy for Sustainable Development 8 (3) (September): 20–32. doi:10.1016/S0973-0826(08)60464-0. http://linkinghub.elsevier.com/retrieve/pii/S0973082608604640.
Burnham, Kenneth P., and David R. Anderson. 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed. New York, NY: Springer-Verlag.
Chetty, Raj, John N. Friedman, and Jonah E. Rockoff. 2011. “The Long-Term Impacts of Teachers: Teacher Value-Added and Student Outcomes in Adulthood”. Cambridge, MA, NBER.
Choi, Hyunyoung, and Hal Varian. 2012. “Predicting the Present with Google Trends.” Economic Record 88 (June 27): 2–9. doi:10.1111/j.1475-4932.2012.00809.x. http://doi.wiley.com/10.1111/j.1475-4932.2012.00809.x.
Einav, Liran, Dan Knoepfle, Jonathan D Levin, and Neel Sundaresan. 2012. “Sales Taxes and Internet Commerce”. Cambridge, MA, NBER.
Einav, Liran, and Jonathan D Levin. 2014. “The Data Revolution and Economic Analysis.” In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern, Vol. 14. University of Chicago Press.
Ezzati, Majid, and Daniel M Kammen. 2001. “Indoor Air Pollution from Biomass Combustion and Acute Respiratory Infections in Kenya: An Exposure-Response Study.” Lancet 358 (9282) (August 25): 619–24. http://www.ncbi.nlm.nih.gov/pubmed/11530148.
———. 2002. “Evaluating the Health Benefits of Transitions in Household Energy Technologies in Kenya.” Energy Policy 30 (10) (August): 815–826. doi:10.1016/S0301-4215(01)00125-2. http://linkinghub.elsevier.com/retrieve/pii/S0301421501001252.
Ezzati, Majid, Homayoun Saleh, and Daniel M Kammen. 2000. “The Contributions of Emissions and Spatial Microenvironments to Exposure to Indoor Air Pollution from Biomass Combustion in Kenya.” Environmental Health Perspectives 108 (9) (September): 833–839.
Finkelstein, Amy, Sarah Taubman, Bill Wright, Mira Bernstein, Jonathan Gruber, Joseph P Newhouse, Heidi Allen, Katherine Baicker, and Oregon Health Study Group. 2012. “The Oregon Health Insurance Experiment: Evidence from the First Year.” The Quarterly Journal of Economics 127 (3): 1057–1106. doi:10.1093/qje/qjs020.
Freeman, Olivia E., and Hisham Zerriffi. 2012. “Carbon Credits for Cookstoves: Trade-Offs in Climate and Health Benefits.” The Forestry Chronicle 88 (5): 600–608.
GIZ. 2011. “Carbon Markets for Improved Cooking Stoves: A GIZ Guide for Project Operators”. Poverty-oriented Basic Energy Services, Eschborn, Germany.
Goel, Sharad, Jake M Hofman, Sébastien Lahaie, David M Pennock, and Duncan J Watts. 2010. “Predicting Consumer Behavior with Web Search.” Proceedings of the National Academy of Sciences of the United States of America 107 (41) (October 12): 17486–90. doi:10.1073/pnas.1005962107. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2955127&tool=pmcentrez&rendertype=abstract.
Gold Standard Foundation. 2013. “The Gold Standard: Simplified Methodology for Efficient Cookstoves”. Geneva-Cointrin, Switzerland.
Harrell, Stephen, Theresa Beltramo, David I. Levine, Garrick Blalock, and Andrew M. Simons. 2014. “What Is a ‘Meal’? Comparing Methods to Determine Cooking Events”. Mimeo.
Horrace, William C., and Ronald L. Oaxaca. 2006. “Results on the Bias and Inconsistency of Ordinary Least Squares for the Linear Probability Model.” Economics Letters 90 (3) (March): 321–327. doi:10.1016/j.econlet.2005.08.024. http://linkinghub.elsevier.com/retrieve/pii/S0165176505003150.
Johnson, Michael, Rufus Edwards, Adrián Ghilardi, Victor Berrueta, Dan Gillen, Claudio Alatorre Frenk, and Omar Masera. 2009. “Quantification of Carbon Savings from Improved Biomass Cookstove Projects.” Environmental Science & Technology 43 (7) (April): 2456–2462. doi:10.1021/es801564u. http://pubs.acs.org/doi/abs/10.1021/es801564u.
Kammen, Daniel M, Robert Bailis, and Antonia Herzog. 2002. “Clean Energy for Development and Economic Growth: Biomass and Other Renewable Energy Options to Meet Energy and Development Needs in Poor Nations”. New York: UNDP and Government of Morocco.
Levine, David I, Theresa Beltramo, Garrick Blalock, and Carolyn Cotterman. 2013. “What Impedes Efficient Adoption of Products? Evidence from Randomized Variation in Sales Offers for Improved Cookstoves in Uganda”. Mimeo.
Lim, Stephen S, Theo Vos, Abraham D Flaxman, Goodarz Danaei, Kenji Shibuya, Heather Adair-Rohani, Markus Amann, et al. 2012. “A Comparative Risk Assessment of Burden of Disease and Injury Attributable to 67 Risk Factors and Risk Factor Clusters in 21 Regions, 1990-2010: A Systematic Analysis for the Global Burden of Disease Study 2010.” Lancet 380 (9859) (December 15): 2224–60.
Martin II, William J, Roger I Glass, John M Balbus, and Francis S Collins. 2011. “A Major Environmental Cause of Death.” Science 334: 180–181.
Masera, Omar R, Barbara D Saatkamp, and Daniel M Kammen. 2000. “From Linear Fuel Switching to Multiple Cooking Strategies: A Critique and Alternative to the Energy Ladder Model.” World Development 28 (12) (December): 2083–2103. doi:10.1016/S0305-750X(00)00076-0. http://linkinghub.elsevier.com/retrieve/pii/S0305750X00000760.
McCracken, John, J Schwartz, M Mittleman, L Ryan, A Diaz Artiga, and Kirk R Smith. 2007. “Biomass Smoke Exposure and Acute Lower Respiratory Infections Among Guatemalan Children.” Epidemiology 18 (5): S185.
Mukhopadhyay, Rupak, Sankar Sambandam, Ajay Pillarisetti, Darby Jack, Krishnendu Mukhopadhyay, Kalpana Balakrishnan, Mayur Vaswani, et al. 2012. “Cooking Practices, Air Quality, and the Acceptability of Advanced Cookstoves in Haryana, India: An Exploratory Study to Inform Large-Scale Interventions.” Global Health Action 5: 19016.
Ramanathan, V., and G. Carmichael. 2008. “Global and Regional Climate Changes due to Black Carbon.” Nature Geoscience 1 (4): 221–227.
Ruiz-Mercado, Ilse, Eduardo Canuz, and Kirk R. Smith. 2012. “Temperature Dataloggers as Stove Use Monitors (SUMs): Field Methods and Signal Analysis.” Biomass and Bioenergy 47: 459–468. doi:10.1016/j.biombioe.2012.09.003.
Ruiz-Mercado, Ilse, Eduardo Canuz, Joan L. Walker, and Kirk R. Smith. 2013. “Quantitative Metrics of Stove Adoption Using Stove Use Monitors (SUMs).” Biomass and Bioenergy 57 (August): 136–148. doi:10.1016/j.biombioe.2013.07.002. http://linkinghub.elsevier.com/retrieve/pii/S096195341300319X.
Ruiz-Mercado, Ilse, Nick L Lam, Eduardo Canuz, Gilberto Davila, and Kirk R Smith. 2008. “Low-Cost Temperature Loggers as Stove Use Monitors (SUMs).” Boiling Point 55: 16–18.
Smith, Kirk R, Nigel Bruce, Byron Arana, Alisa Jenny, Asheena Khalakdina, John McCracken, Anaite Diaz, Morten Schei, and Sandy Gove. 2006. “Conducting The First Randomized Control Trial of Acute Lower Respiratory Infections and Indoor Air Pollution: Description of Process and Methods.” Epidemiology 17 (6): S44.
Smith, Kirk R, John P McCracken, Lisa Thompson, Rufus Edwards, Kyra N Shields, Eduardo Canuz, and Nigel Bruce. 2010. “Personal Child and Mother Carbon Monoxide Exposures and Kitchen Levels: Methods and Results from a Randomized Trial of Woodfired Chimney Cookstoves in Guatemala (RESPIRE).” Journal of Exposure Science and Environmental Epidemiology 20 (5) (July): 406–16. doi:10.1038/jes.2009.30. http://www.ncbi.nlm.nih.gov/pubmed/19536077.
Smith, Kirk R, John P McCracken, Martin W Weber, Alan Hubbard, Alisa Jenny, Lisa M Thompson, John Balmes, Anaité Diaz, Byron Arana, and Nigel Bruce. 2011. “Effect of Reduction in Household Air Pollution on Childhood Pneumonia in Guatemala (RESPIRE): A Randomised Controlled Trial.” Lancet 378 (9804) (November 12): 1717–26. doi:10.1016/S0140-6736(11)60921-5. http://www.ncbi.nlm.nih.gov/pubmed/22078686.
Smith-Sivertsen, Tone, Esperanza Díaz, Dan Pope, Rolv T Lie, Anaite Díaz, John McCracken, Per Bakke, Byron Arana, Kirk R Smith, and Nigel Bruce. 2009. “Effect of Reducing Indoor Air Pollution on Women’s Respiratory Symptoms and Lung Function: The RESPIRE Randomized Trial, Guatemala.” American Journal of Epidemiology 170 (2) (July 15): 211–20. doi:10.1093/aje/kwp100. http://www.ncbi.nlm.nih.gov/pubmed/19443665.
Figure 1: Photograph of stove use monitoring system (SUMs)
Source: http://www.berkeleyair.com/products-and-services/instrument-services/78-sums
Figure 2: Map of Uganda with Mbarara district highlighted
Figure 3: SUM holder designed to encase the stove use monitor to protect it from malfunctions when exceeding temperatures of 85 degrees Celsius
Figure 4: Arrows mark the placement of SUMs on three stone fire and Envirofit
(a) Three Stone Fire (b) Envirofit
Figure 5: Temperature readings on a three stone fire, Envirofit, and ambient air in one household over 20 days
[Two time-series panels: (a) Three Stone Fire and (b) Envirofit. X-axis: Date, 16 Jun 2012 to 4 Jul 2012.]
Figure 6: Distribution of hourly ambient temperatures
[Box plot of ambient temperature by hour of the day. X-axis: Hour of Day, 0 to 23, with time formatted 00:00-23:59. Y-axis: SUMs Temperature [C], 15 to 40.]
Box and whisker plot: The bottom and top of each box represent the first and the third quartiles of the hourly ambient temperature distribution; the horizontal band inside each box represents the median hourly ambient temperature. The ends of the whiskers, marked with a horizontal line, represent the lowest datum within 1.5 times the interquartile range of the lower quartile and the highest datum within 1.5 times the interquartile range of the upper quartile. The dots represent individual observations that are outliers outside of this range.
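The whisker and outlier definitions in the note above can be sketched as follows. The temperature values are hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption on our part; plotting software may use a slightly different convention.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, whisker ends, and outliers under the 1.5 x IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical ambient temperatures [C] recorded during one hour of the day.
hourly_temps = [18, 20, 21, 22, 23, 24, 25, 40]
stats = box_plot_stats(hourly_temps)
print(stats)
```

In this made-up sample the reading of 40 lies beyond the upper fence and would be drawn as a dot, while the whiskers stop at 18 and 25.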
Figure 7: Comparison of predicted probability of stove in use: full versus trimmed sample
[Two panels: (a) Three Stone Fire, plotting TSF (full sample) and TSF (trimmed sample); (b) Envirofit, plotting ENV (full sample) and ENV (trimmed sample). X-axis: Temperature [C], 20 to 50. Y-axis: Probability Stove in Use, 0 to 1.]
Note: The trimmed sample drops observations that are observed as ‘lit’ but are more than two degrees below the hourly mean air temperature, or that are observed as ‘not lit’ but are more than two degrees above the hourly maximum air temperature.
Table 1: Temperature readings, days, stoves observed by stove type
Stove type | Readings | Days | Stoves
Envirofit 1st | 432,033 | 8,945 | 153
Envirofit 2nd | 137,413 | 2,974 | 107
Three Stone Fire 1st | 708,048 | 14,081 | 168
Three Stone Fire 2nd | 417,149 | 8,204 | 152
Totals | 1,694,643 | 34,204 | 580
Note: data collected March - September 2012
Table 2: Summary statistics: observations of stove usage
Observations | Mean | Std. Dev. | Min. | Max. | N
Temp [C] when three stone fire observed as off | 30.17 | 7.27 | 19.5 | 85 | 1339
Temp [C] when three stone fire observed as on | 38.30 | 10.63 | 20 | 85 | 329
Temp [C] when Envirofit observed as off | 26.17 | 6.34 | 18.5 | 76 | 704
Temp [C] when Envirofit observed as on | 41.52 | 11.7 | 20 | 69.5 | 78

Observations - Trimmed Sample* | Mean | Std. Dev. | Min. | Max. | N
Temp [C] when three stone fire observed as off | 28.21 | 4.60 | 19.5 | 41.5 | 1169
Temp [C] when three stone fire observed as on | 38.97 | 10.36 | 22 | 85 | 315
Temp [C] when Envirofit observed as off | 25.27 | 3.86 | 18.5 | 41 | 678
Temp [C] when Envirofit observed as on | 42.04 | 11.39 | 20 | 69.5 | 76
Note*: In the trimmed sample, observations are dropped if ‘lit’ and more than two degrees below hourly mean air temperature, or if observed as ‘not lit’ but more than two degrees above the hourly maximum air temperature. The trimmed sample removes 11.0% (184 out of 1668) and 3.6% (28 out of 782) of the observation samples for the three stone fire and Envirofits, respectively.
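The trimming rule in the note above can be written as a simple filter. This is a minimal sketch: the field names and the four sample observations are hypothetical, and the hourly mean and maximum air temperatures are assumed to have been computed beforehand from the ambient readings.

```python
def keep_observation(lit, stove_temp, hour_mean_air, hour_max_air, margin=2.0):
    """Return False for observations that the trimming rule drops."""
    # Observed as lit, yet more than `margin` degrees below the hourly mean.
    if lit and stove_temp < hour_mean_air - margin:
        return False
    # Observed as not lit, yet more than `margin` degrees above the hourly max.
    if not lit and stove_temp > hour_max_air + margin:
        return False
    return True


# Hypothetical observations (temperatures in degrees Celsius).
observations = [
    {"lit": True, "stove_temp": 45.0, "hour_mean_air": 24.0, "hour_max_air": 29.0},
    {"lit": True, "stove_temp": 20.0, "hour_mean_air": 24.0, "hour_max_air": 29.0},
    {"lit": False, "stove_temp": 26.0, "hour_mean_air": 24.0, "hour_max_air": 29.0},
    {"lit": False, "stove_temp": 60.0, "hour_mean_air": 24.0, "hour_max_air": 29.0},
]
trimmed = [obs for obs in observations if keep_observation(**obs)]
print(len(trimmed), "of", len(observations), "observations kept")
```

The second and fourth observations are dropped: a 'lit' stove reading 20 degrees is implausibly cold, and a 'not lit' stove reading 60 degrees is implausibly hot.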
Table 3: Probability (odds ratio) that three stone fire (TSF) or Envirofit (ENV) is lit: full versus trimmed samples
Variable | (1) TSF | (2) TSF trimmed | (3) ENV | (4) ENV trimmed
SUMs Temperature [C], t | 1.13*** (0.03) | 1.37*** (0.06) | 1.08 (0.06) | 1.46*** (0.13)
SUMs Temperature [C], t-1 | 0.99 (0.06) | 0.95 (0.07) | 0.99 (0.06) | 1.09 (0.08)
SUMs Temperature [C], t+1 | 0.98 (0.03) | 0.98 (0.04) | 1.16*** (0.05) | 1.24*** (0.07)
SUMs Temperature [C], t-2 | 0.95 (0.05) | 1.03 (0.07) | 1.05 (0.05) | 0.94 (0.05)
SUMs Temperature [C], t+2 | 1.06** (0.03) | 1.07** (0.03) | 0.96 (0.03) | 0.90** (0.05)
Control: Hour of Day | Yes | Yes | Yes | Yes
Observations | 1,184 | 1,052 | 366 | 356
Pseudo R-squared | 0.167 | 0.403 | 0.445 | 0.696
Correctly Classified | 83.11% | 89.26% | 90.71% | 93.82%
Robust standard errors (exponentiated form) in parentheses. *** p<0.01, ** p<0.05, * p<0.1
Note: Columns (1) and (3) use the full observation sample. Columns (2) and (4) are trimmed for outliers, meaning observations are dropped if ‘lit’ and more than two degrees below hourly mean air temperature, or if observed as ‘not lit’ but more than two degrees above the hourly maximum air temperature.
Note: Standard errors clustered at stove level.
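The regressors in Table 3 are the temperature at the time of the observation plus two lagged and two lead readings. A minimal sketch of how such a feature window might be assembled is shown below; the function name, the list-based data layout, and the sample readings are hypothetical. Observations lacking any lead or lag reading are returned as `None`, which mirrors the way observations without a full set of leads and lags drop out of the regression.

```python
def lead_lag_features(temps, t, n=2):
    """Return [T(t-n), ..., T(t), ..., T(t+n)], or None if any reading is missing."""
    window = []
    for k in range(-n, n + 1):
        idx = t + k
        if idx < 0 or idx >= len(temps) or temps[idx] is None:
            return None  # incomplete window: the observation cannot be used
        window.append(temps[idx])
    return window


# Hypothetical consecutive SUMs readings [C]; None marks a missing reading.
temps = [24.0, 25.5, 31.0, 38.5, 41.0, 36.0, None, 27.5]

for t in (0, 2, 3, 5):
    print(t, lead_lag_features(temps, t))
```

Only the observations at positions 2 and 3 have a complete window here; position 0 lacks lags and position 5 has a missing lead reading.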
Table 4: Candidate regression specifications: all specifications include temperature at time [t] and control for hour of the day
Spec | Leads | Lags | Slope^2 if Slope>0 | TSF Pseudo R^2 | TSF AIC | TSF % Correctly Predicted | TSF N | ENV Pseudo R^2 | ENV AIC | ENV % Correctly Predicted | ENV N
(1) | 0 | 0 | - | 0.37 | 984.01 | 87.49 | 1479 | 0.60 | 212.05 | 96.05 | 735
(2) | 1 | 1 | - | 0.41 | 649.04 | 89.16 | 1052 | 0.68 | 120.29 | 94.94 | 356
(3) | 1 | 1 | Yes | 0.42 | 649.44 | 89.07 | 1052 | 0.69 | 121.55 | 95.22 | 356
(4) | 2 | 2 | - | 0.42 | 648.29 | 89.26 | 1052 | 0.70 | 120.14 | 93.82 | 356
(5) | 2 | 2 | Yes | 0.42 | 648.49 | 89.35 | 1052 | 0.70 | 121.61 | 94.10 | 356
(6) | 3 | 3 | - | 0.42 | 650.87 | 89.15 | 1051 | 0.72 | 117.33 | 94.93 | 355
(7) | 3 | 3 | Yes | 0.42 | 651.19 | 89.34 | 1051 | 0.72 | 118.49 | 94.93 | 355
(8) | 4 | 4 | - | 0.42 | 653.48 | 89.05 | 1050 | 0.73 | 116.33 | 94.93 | 355
(9) | 4 | 0 | - | 0.39 | 865.75 | 86.98 | 1260 | 0.69 | 149.97 | 94.57 | 516
(10) | 0 | 4 | - | 0.40 | 772.18 | 89.36 | 1269 | 0.61 | 183.79 | 95.30 | 574
Note: TSF = Three Stone Fire; ENV = Envirofit. Classified as correctly predicted if predicted Pr(D) >= 0.5: True D defined when stove use observed as ‘on’.
Note: Control for hour of the day is a dummy variable for each hour of the day stoves were observed in use.
Akaike information criterion: defined as AIC = 2k - 2ln(L), where k is the number of parameters and L is the maximized value of the likelihood function. Given a set of candidate models for a set of data, the preferred model is the one with the minimum AIC.
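The AIC definition above, and the minimum-AIC selection rule, can be sketched as follows. The AIC values are copied from Table 4. Restricting the comparison to specifications (2) through (5), which are estimated on the same number of observations for each stove type, is an assumption made here for illustration (AIC values are strictly comparable only across models fit to the same data); it is not a description of the authors' exact selection procedure.

```python
def aic(num_params, log_likelihood):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * num_params - 2 * log_likelihood


def preferred_spec(aic_by_spec):
    """Return the specification label with the minimum AIC."""
    return min(aic_by_spec, key=aic_by_spec.get)


# AIC values from Table 4, specifications (2)-(5).
tsf_aic = {2: 649.04, 3: 649.44, 4: 648.29, 5: 648.49}
env_aic = {2: 120.29, 3: 121.55, 4: 120.14, 5: 121.61}

print("three stone fire:", preferred_spec(tsf_aic))
print("Envirofit:", preferred_spec(env_aic))
```

Within this subset, specification (4), with two leads and two lags, has the lowest AIC for both stove types.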
Table 5: Out of sample predictive accuracy: three stone fire and Envirofit classification tables for in-sample predictions and out-of-sample predictions
Measure | TSF In-Sample | TSF Out-Sample | ENV In-Sample | ENV Out-Sample
Sensitivity: classified as ‘on’ when observed as ‘on’ | 62.04 | 49.07 | 80.00 | 67.86
Specificity: classified as ‘off’ when observed as ‘off’ | 97.09 | 97.45 | 98.65 | 95.09
Positive predictive value: observed ‘on’ when classified ‘on’ | 84.81 | 82.81 | 92.31 | 70.37
Negative predictive value: observed ‘off’ when classified ‘off’ | 90.70 | 88.45 | 96.05 | 94.51
False positive rate given observed as ‘off’ | 2.91 | 2.55 | 1.35 | 4.91
False negative rate given observed as ‘on’ | 37.96 | 50.93 | 20.00 | 32.14
False positive rate given classified as ‘on’ | 15.19 | 17.19 | 7.69 | 29.63
False negative rate given classified as ‘off’ | 9.30 | 11.55 | 3.95 | 5.49
Overall correctly classified (percentage) | 89.81 | 87.78 | 95.51 | 91.10
Sample size | 520 | 540 | 178 | 191
Note: TSF = Three Stone Fire; ENV = Envirofit. Unless noted, cell contents are to be read as probabilities out of one hundred.
Note: Classified as ‘on’ if predicted probability >= 0.5.
Note: For this exercise, the samples of observations are randomly split into two groups, the logistic regression is run on one of the two groups, and then, based on the coefficients of that regression, the prediction is made for the other (out-of-sample) group. Running a classification table on the in-sample and out-of-sample groups shows very little difference between the two samples, suggesting the technique is robust out-of-sample. When the samples were randomly split in half, the groups were equally sized (three stone fire groups sized 742/742 and Envirofit groups sized 377/377); however, the sample size of the groups presented in this table is different because observations drop out of the regression if they lack temperature data for any of the leads and lags. The observation sample is trimmed for outliers: observations are dropped if ‘lit’ and more than two degrees below hourly mean air temperature, or if observed as ‘not lit’ but more than two degrees above the hourly maximum air temperature.
Table 6: How well do temperature readings generated from stove use monitors predict observations of stove use
Measure | Three Stone Fire | Envirofit
Sensitivity: classified as ‘on’ when observed as ‘on’ | 56.54 | 77.59
Specificity: classified as ‘off’ when observed as ‘off’ | 97.61 | 96.98
Positive predictive value: observed ‘on’ when classified ‘on’ | 85.82 | 83.33
Negative predictive value: observed ‘off’ when classified ‘off’ | 89.79 | 95.70
False positive rate given observed as ‘off’ | 2.39 | 3.02
False negative rate given observed as ‘on’ | 43.46 | 22.41
False positive rate given classified as ‘on’ | 14.18 | 16.67
False negative rate given classified as ‘off’ | 10.21 | 4.30
Overall correctly classified (percentage) | 89.26 | 93.82
Sample size | 1052 | 356
Note: Unless noted, cell contents are to be read as probabilities out of one hundred.
Note: Classified as ‘on’ if predicted probability >= 0.5.
Note: The observation sample is trimmed for outliers: observations are dropped if ‘lit’ and more than two degrees below hourly mean air temperature, or if observed as ‘not lit’ but more than two degrees above the hourly maximum air temperature.
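The measures in the classification table follow directly from the four cells of a confusion matrix. The sketch below shows the calculation; the cell counts are not reported in the paper and were back-calculated by us so that they are consistent with the three stone fire column (N = 1052), so they should be read as an assumption used for illustration.

```python
def classification_table(tp, fn, fp, tn):
    """Classification measures, in percent, from confusion-matrix counts.

    tp: observed 'on',  classified 'on'    fn: observed 'on',  classified 'off'
    fp: observed 'off', classified 'on'    tn: observed 'off', classified 'off'
    """
    total = tp + fn + fp + tn
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "positive_predictive_value": 100 * tp / (tp + fp),
        "negative_predictive_value": 100 * tn / (tn + fn),
        "false_positive_given_off": 100 * fp / (tn + fp),
        "false_negative_given_on": 100 * fn / (tp + fn),
        "correctly_classified": 100 * (tp + tn) / total,
    }


# Counts inferred from the reported percentages (an assumption, not source data).
tsf = classification_table(tp=121, fn=93, fp=20, tn=818)
for name, value in tsf.items():
    print(f"{name}: {value:.2f}")
```

The high specificity combined with the lower sensitivity indicates that most classification errors are stoves observed as 'on' that the model classifies as 'off'.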
Table 7: How well do temperature readings generated from stove use monitors predict observations of three stone fires in use: classification tables comparing algorithms
Three Stone Fire | Over 32 | Over 34 | Over 36 | Over 38 | RMM | adj RMM | Logit
Sensitivity: classified as ‘on’ when obs as ‘on’ | 53.0 | 44.1 | 36.8 | 31.4 | 40.0 | 41.3 | 56.5
Specificity: classified as ‘off’ when obs as ‘off’ | 83.2 | 92.0 | 98.0 | 99.1 | 64.0 | 87.5 | 97.6
Positive predictive: obs ‘on’ when classified ‘on’ | 46.0 | 59.9 | 83.5 | 90.8 | 23.0 | 47.1 | 85.8
Negative predictive: obs ‘off’ when classified ‘off’ | 86.8 | 85.9 | 85.2 | 84.3 | 79.8 | 84.7 | 89.8
False positive rate given observed as ‘off’ | 16.8 | 8.0 | 2.0 | 0.9 | 36.0 | 12.5 | 2.4
False negative rate given observed as ‘on’ | 47.0 | 55.9 | 63.2 | 68.6 | 60.0 | 58.7 | 43.5
False positive rate given classified as ‘on’ | 54.0 | 40.1 | 16.5 | 9.2 | 77.0 | 52.9 | 14.2
False negative rate given classified as ‘off’ | 13.2 | 14.1 | 14.8 | 15.7 | 20.2 | 15.3 | 10.2
Overall correctly classified (percentage) | 76.8 | 81.9 | 85.0 | 84.8 | 58.9 | 77.7 | 89.3
Pseudo R-squared | 10.3 | 13.4 | 18.4 | 17.7 | 0.1 | 7.7 | 41.8
Note: Each column represents a logistic regression with the trimmed sample of observations as the dependent variable and the column header as the independent variable. Observations are classified as ‘on’ if the predicted probability >= 0.5, and the pseudo R-squared is reported for each regression. As an example, the first column is the observation sample regressed on a binary variable registering ‘on’ when the temperature is greater than 32 Celsius, and ‘off’ when 32 Celsius or below. The same is true for the ‘Over 34,’ ‘Over 36,’ and ‘Over 38’ columns. The column titled ‘RMM’ uses the ‘on’/‘off’ status as determined by the Ruiz-Mercado-Mukhopadhyay et al. algorithm as the independent variable for the logistic regression. The column titled ‘adj RMM’ uses the ‘on’/‘off’ status as determined by the adjusted Ruiz-Mercado-Mukhopadhyay et al. algorithm (allowing the stove to remain ‘on’ if temperatures are very hot even if an exit slope has triggered, and switching ‘off’ if cooling and approaching ambient temperature even if exit slope not triggered) as the independent variable. The column titled ‘Logit’ uses the logistic regression developed in this paper, where the observations of stove use are regressed against temperature at time t, two lead and two lag temperature readings, and the hour of day. In cases where the regression does not run due to insufficient variability or collinearity, the classification tables are calculated manually, and the pseudo R-squared is reported as n/a. All cells are to be read as probabilities out of one hundred.
Note: The observation sample is trimmed for outliers: observations are dropped if ‘lit’ and more than two degrees below hourly mean air temperature, or if observed as ‘not lit’ but more than two degrees above the hourly maximum air temperature.
Table 8: How well do temperature readings generated from stove use monitors predict visual observations of Envirofit use: classification tables comparing algorithms
Envirofit | Over 32 | Over 34 | Over 36 | Over 38 | RMM | adj RMM | Logit
Sensitivity: classified as ‘on’ when obs as ‘on’ | 57.9 | 52.6 | 48.7 | 43.4 | 56.6 | 63.2 | 77.6
Specificity: classified as ‘off’ when obs as ‘off’ | 95.4 | 98.5 | 99.9 | 100.0 | 87.5 | 88.5 | 97.0
Positive predictive: obs ‘on’ when classified ‘on’ | 58.7 | 80.0 | 97.4 | 100.0 | 33.6 | 38.1 | 83.3
Negative predictive: obs ‘off’ when classified ‘off’ | 95.3 | 94.9 | 94.6 | 94.0 | 94.7 | 95.5 | 95.7
False positive rate given observed as ‘off’ | 4.6 | 1.5 | 0.1 | 0.0 | 12.5 | 11.5 | 3.0
False negative rate given observed as ‘on’ | 42.1 | 47.4 | 51.3 | 56.6 | 43.4 | 36.8 | 22.4
False positive rate given classified as ‘on’ | 41.3 | 20.0 | 2.6 | 0.0 | 66.4 | 61.9 | 16.7
False negative rate given classified as ‘off’ | 4.7 | 5.1 | 5.4 | 6.0 | 5.3 | 4.5 | 4.3
Overall correctly classified (percentage) | 91.6 | 93.9 | 94.7 | 94.3 | 84.4 | 85.9 | 93.8
Pseudo R-squared | 27.0 | 32.2 | 36.7 | n/a | 14.4 | 19.6 | 69.6
Note: Each column represents a logistic regression with the trimmed sample of observations as the dependent variable and the column header as the independent variable. Observations are classified as ‘on’ if the predicted probability >= 0.5, and the pseudo R-squared is reported for each regression. As an example, the first column is the observation sample regressed on a binary variable registering ‘on’ when the temperature is greater than 32 Celsius, and ‘off’ when 32 Celsius or below. The same is true for the ‘Over 34,’ ‘Over 36,’ and ‘Over 38’ columns. The column titled ‘RMM’ uses the ‘on’/‘off’ status as determined by the Ruiz-Mercado-Mukhopadhyay et al. algorithm as the independent variable for the logistic regression. The column titled ‘adj RMM’ uses the ‘on’/‘off’ status as determined by the adjusted Ruiz-Mercado-Mukhopadhyay et al. algorithm (allowing the stove to remain ‘on’ if temperatures are very hot even if an exit slope has triggered, and switching ‘off’ if cooling and approaching ambient temperature even if exit slope not triggered) as the independent variable. The column titled ‘Logit’ uses the logistic regression developed in this paper, where the observations of stove use are regressed against temperature at time t, two lead and two lag temperature readings, and the hour of day. In cases where the regression does not run due to insufficient variability or collinearity, the classification tables are calculated manually, and the pseudo R-squared is reported as n/a. All cells are to be read as probabilities out of one hundred.
Note: The observation sample is trimmed for outliers: observations are dropped if ‘lit’ and more than two degrees below hourly mean air temperature, or if observed as ‘not lit’ but more than two degrees above the hourly maximum air temperature.
Table 9: Summary statistics: predicted hours cooked per day for each three stone fire (TSF) and each Envirofit in the time period one week prior to and after the approximately 72 hour long kitchen performance test (KPT)
Predicted Daily Hours Cooked | Primary TSF Mean | SD | N | Secondary TSF Mean | SD | N | Envirofit 1st Mean | SD | N | Envirofit 2nd Mean | SD | N
Week Before Final KPT | 8.87*** | 7.54 | 98 | 9.82 | 8.93 | 22 | 3.84** | 3.50 | 105 | 2.65*** | 2.84 | 88
Final KPT Week | 7.43 | 7.62 | 98 | 8.99 | 8.19 | 22 | 4.59 | 3.27 | 105 | 3.48 | 3.07 | 88

Final KPT Week | 7.40 | 7.56 | 100 | 8.99 | 8.19 | 22 | 4.56 | 3.24 | 111 | 3.24 | 2.95 | 95
Week After Final KPT | 10.14*** | 8.41 | 100 | 9.79 | 8.34 | 22 | 3.35*** | 3.21 | 111 | 2.44*** | 2.15 | 95
Note: Stove usage during the KPT versus stove usage over the same duration and days of the week for the same stove types in the prior and following weeks. We use paired t-tests relative to the measurement (KPT) week to examine statistical significance. The difference in use is statistically different from zero at the following levels: *** p<0.01, ** p<0.05, * p<0.10. The difference in hours cooked per day between the week prior to and the week after the KPT is weakly statistically significantly different from zero for the primary three stone fire (p<0.10). There is no statistically significant difference in daily hours cooked between the week prior to and the week after the KPT for the secondary three stone fire, the first Envirofit, or the second Envirofit.
Table 10: Test for Hawthorne effect: primary three stone fire versus combined hours cooked on two Envirofits during the approximately 72 hour long kitchen performance test (KPT) “week”
Predicted Daily Hours Cooked | Mean | SD | N
Both Envirofit Stoves | | |
One Week Before Final KPT | 6.08*** | 3.91 | 48
Final KPT Measurement Week | 8.14 | 3.67 | 48
One Week After Final KPT | 5.60*** | 3.05 | 48
Primary Three Stone Fire | | |
One Week Before Final KPT | 8.07* | 7.39 | 48
Final KPT Measurement Week | 6.95 | 7.14 | 48
One Week After Final KPT | 10.45*** | 8.40 | 48
Note: Stove usage during the KPT versus stove usage over the same duration and days of the week in the prior and following weeks. We use paired t-tests relative to the measurement (KPT) week to examine statistical significance. The difference in use is statistically different from zero at the following levels: *** p<0.01, ** p<0.05, * p<0.10. Each of the 48 households in this sample has SUMs readings for two Envirofit stoves and SUMs readings for the primary three stone fire. The difference between the week prior to and the week after the KPT is not statistically significantly different from zero for the Envirofits; however, it is statistically significantly different from zero for the three stone fire (at p<0.01).
Table 11: Regressions testing for Hawthorne effect: estimates of effects of the presence of measurement team in primary three stone fire (TSF) usage and combined Envirofit usage; the coefficients represent the change in hours cooked per day compared to hours cooked per day in the measurement week
Variable | (1) Primary TSF | (2) Primary TSF | (3) Combined Envirofit | (4) Combined Envirofit
Week prior to and after measurement week constrained to be equal | 2.26*** (0.57) | | -2.40*** (0.46) |
Week prior to measurement week | | 1.62** (0.66) | | -2.02*** (0.56)
Week after measurement week | | 2.88*** (0.76) | | -2.71*** (0.50)
Household fixed effects | Yes | Yes | Yes | Yes
Observations | 316 | 316 | 229 | 229
R-squared | 0.82 | 0.82 | 0.79 | 0.80
Household clusters | 118 | 118 | 89 | 89
Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1
Note: The unit of analysis is a measurement “week” (approximately 72 hours) at a household. The specification in columns 1 and 3 imposes that the weeks prior to and after the measurement week are equal. The specification in columns 2 and 4 tests usage in the week prior to and after the measurement week separately. The coefficient estimates in columns 2 and 4 are jointly significantly different from zero (joint F test significant at the 1% level in both cases).
Note: Standard errors clustered at household level.
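The specification in columns 2 and 4 of Table 11 (household fixed effects with separate week-prior and week-after dummies) can be sketched with the within transformation, as below. This is an illustration under stated assumptions: the household labels and hours are hypothetical, the function name is ours, and the sketch returns point estimates only, so the household-clustered standard errors reported in the table are not computed.

```python
from collections import defaultdict


def fixed_effects_ols(rows):
    """Estimate (beta_prior, beta_after) from rows of
    (household, week_prior, week_after, hours_per_day)."""
    by_household = defaultdict(list)
    for household, prior, after, hours in rows:
        by_household[household].append((prior, after, hours))

    # Within transformation: subtracting household means removes the
    # household fixed effect from the regression.
    x1, x2, y = [], [], []
    for obs in by_household.values():
        n = len(obs)
        m1 = sum(o[0] for o in obs) / n
        m2 = sum(o[1] for o in obs) / n
        my = sum(o[2] for o in obs) / n
        for prior, after, hours in obs:
            x1.append(prior - m1)
            x2.append(after - m2)
            y.append(hours - my)

    # Solve the 2x2 normal equations by Cramer's rule.
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    beta_prior = (s22 * s1y - s12 * s2y) / det
    beta_after = (s11 * s2y - s12 * s1y) / det
    return beta_prior, beta_after


# Hypothetical hours cooked per day on a primary three stone fire:
# one row each for the week prior, the measurement week, and the week after.
rows = [
    ("A", 1, 0, 9.0), ("A", 0, 0, 7.0), ("A", 0, 1, 10.0),
    ("B", 1, 0, 8.0), ("B", 0, 0, 6.5), ("B", 0, 1, 9.5),
]
beta_prior, beta_after = fixed_effects_ols(rows)
print(f"week prior: {beta_prior:.2f}, week after: {beta_after:.2f}")
```

With a balanced panel like this one, each coefficient equals the average within-household difference from the measurement week, so positive values indicate more cooking on the three stone fire when the measurement team is absent.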