The impact of health resources on education outcomes in rural India
CANDICE WEI LING TAN
Honours Thesis Bachelor of Commerce (Financial Economics / Business Statistics)
Bachelor of Arts (History)
Supervisor: Dr. Gautam Bose
27th October 2008
Declaration
I hereby declare that this submission is my own work and any contributions or materials
by other authors used in this thesis have been appropriately acknowledged. This thesis has
not been previously submitted to any other university or institution as part of the
requirements for another degree or award.
CANDICE WEI LING TAN
27th October 2008
Acknowledgements
I would like to thank my supervisor Dr. Gautam Bose for his support and assistance
throughout my honours year. An enormous debt of gratitude is owed to Meliyanni Johar
for her willingness to help, her unwavering Stata wisdom and encouragement. I would
also like to thank Professor Denzil Fiebig for his help and insight. Thanks also to Mr.
Hong il Yoo for graciously answering my random econometrics questions and Dr.
Valentyn Pachenko and Dr. Shiko Maruyama for their comments.
Finally, I would like to acknowledge my family: mum, dad and brothers Paul and David
(thank you for your help on this thesis!) who have helped me in their own important way
throughout my studies.
Table of Contents
ABSTRACT
1. INTRODUCTION
2. BACKGROUND
2.1 THE GLOBAL HEALTH AND EDUCATION CRISIS
2.2 CASE STUDY OF INDIA
3. LITERATURE REVIEW
3.1 THE BENEFITS AND DETERMINANTS OF EDUCATION
3.2 EDUCATION AND HEALTH
3.3 GENDER BIAS IN EDUCATION AND HEALTH
4. CONCEPTUAL FRAMEWORK
5. DATA
5.1.1 DESCRIPTION
5.1.2 HANDLING OF DATA
5.1.3 LIMITATIONS OF DATA
5.2 DEPENDENT VARIABLE
5.3 EXPLANATORY VARIABLES
6. ECONOMETRIC APPROACH
6.1 BINARY DEPENDENT VARIABLE
6.2 BIVARIATE PROBIT MODEL WITH SAMPLE SELECTION
6.3 EXTENSION: RANDOM EFFECTS PROBIT MODELS
7. EMPIRICAL RESULTS
7.1 BIVARIATE PROBIT WITH SAMPLE SELECTION RESULTS
7.2 UNIVARIATE PROBIT MODEL RESULTS
7.3 GENDER-DISAGGREGATED PROBIT MODEL RESULTS
7.4 PREDICTED PROBABILITIES
7.5 DIAGNOSTICS
7.6 ROBUSTNESS CHECKS
7.6.1 RANDOM EFFECTS PROBIT RESULTS
7.6.2 ALTERNATIVE VARIABLE DEFINITIONS AND SUB SAMPLE ESTIMATION
8. CONCLUSION
9. APPENDIX
APPENDIX 1: YEARS OF EDUCATION AND AGE DIFFERENCES
APPENDIX 2: QUALITY OF EDUCATION INDEX VARIABLES
APPENDIX 3: DESCRIPTIVE STATISTICS FOR ALL (PARTICIPATION AND PARTICIPATE)
APPENDIX 4: DESCRIPTIVE STATISTICS FOR ALL CHILDREN (CONSISTENT AND INCONSISTENT)
APPENDIX 5: CALCULATING PARTIAL EFFECTS OF CONTINUOUS VARIABLE ON RESPONSE PROBABILITY
APPENDIX 6: GENDER-DISAGGREGATED RANDOM EFFECTS PROBIT RESULTS FOR PARTICIPATION
APPENDIX 7: GENDER-DISAGGREGATED RANDOM EFFECTS PROBIT RESULTS FOR CONSISTENCY
10. REFERENCES
List of Tables
1. DEFINITION OF EXPLANATORY VARIABLES
2. DESCRIPTIVE STATISTICS FOR ALL SAMPLE CHILDREN
3. BIVARIATE PROBIT WITH SAMPLE SELECTION RESULTS
4. PROBIT MODEL RESULTS FOR SCHOOLING PARTICIPATION
5. PROBIT MODEL RESULTS FOR SCHOOLING CONSISTENCY
6. GENDER-DISAGGREGATED PROBIT MODEL RESULTS FOR SCHOOLING PARTICIPATION
7. GENDER-DISAGGREGATED PROBIT MODEL RESULTS FOR SCHOOLING CONSISTENCY
8A. PREDICTED PROBABILITIES FOR SCHOOLING PARTICIPATION (MALES)
8B. PREDICTED PROBABILITIES FOR SCHOOLING PARTICIPATION (FEMALES)
9A. PREDICTED PROBABILITIES FOR SCHOOLING CONSISTENCY (MALES)
9B. PREDICTED PROBABILITIES FOR SCHOOLING CONSISTENCY (FEMALES)
10A. PREDICTION SUCCESSES FROM TABLE 6 (MALES)
10B. PREDICTION SUCCESSES FROM TABLE 7 (MALES)
11A. PREDICTION SUCCESSES FROM TABLE 6 (FEMALES)
11B. PREDICTION SUCCESSES FROM TABLE 7 (FEMALES)
List of Figures
1. DISTRIBUTION OF SAGE SCORES
2. GENERAL HEALTH SESSION FREQUENCY AND SCHOOLING OUTCOMES
3. SOURCE OF WATER AND SCHOOLING OUTCOMES
4. OBSERVATIONS IN A BIVARIATE PROBIT MODEL WITH SAMPLE SELECTION
Abstract
The benefits of education for productivity, growth and development have prompted
many researchers to examine the determinants of schooling outcomes.
Various factors have been considered, including parental education, gender and school
factors. This thesis contributes to the literature by considering another pertinent factor
in education outcomes: the health status of the child.
Many studies have linked the health of a child to their schooling performance because of
the effect that health has on a child’s immune system, cognitive ability and level of
concentration. However, these studies have focused on individual and anthropometric indicators
of health. In this thesis, the focus is on village-level health resources.
Using detailed household survey data from rural India, this thesis investigates the impact
that village infrastructure and resources that promote health, such as clean supplies of
water and proximity to a hospital, have on schooling outcomes. Specifically, it examines whether
health resources affect the propensity of a child to attend school (participation), as well as
his or her propensity to keep up with schooling for his or her age (consistency).
The econometric models used in this thesis take into account the possible relationship
between these two schooling outcomes, as well as unobserved household effects that may
impact on a child’s education. The findings of this study indicate that health resources
have a statistically significant and practically large impact on schooling outcomes after controlling
for a range of individual, household and other village-level characteristics. The
importance of preventive care health measures, such as the frequency of general health
sessions, is particularly robust. Furthermore, the effect of health resources is found to be
greater for female education outcomes, which indicates that improvement in basic health
provision would also help reduce the pronounced gender bias that exists in education
attainment.
1. Introduction
Human capital investment through education is widely recognized as an important source
of economic growth and productivity for a nation. Indeed, governments in developing
nations often devote substantial amounts of their national expenditure to the education
sector because of the positive flow-on effects a more educated population would have on
other sectors of the economy. Education has also been considered an effective means to ease
inequality and to improve the opportunities for socio-economic improvement for people in
developing countries (World Bank, 2001).
Despite this, the education levels in developing countries remain low, and improvements
in education outcomes continue to be needed in order to promote and sustain development
in the world’s poorer nations. This issue is even more urgent as educational attainment is
typically the only means for a family to break the poverty cycle. Thus the implications of
schooling are not only considered for present individuals and families, but for future
generations as well.
In India, the importance of education for children has been demonstrated by the many
programs and policies that have been pursued over the last few decades. However,
although improvements have been made, this second-most populous nation in the world,
like many other developing countries, still faces many challenges and obstacles to
favourable schooling outcomes that need to be addressed.
Because of the known benefits of education attainment, myriad studies have sought to
determine the factors that contribute to improved schooling outcomes. This would then aid
and direct public policy initiatives and investments to strengthen this vital area of
development. Individual and household characteristics such as gender, household income
and parental schooling levels are among the many factors that education papers have
focused on. Following this intention, my thesis will also analyse the determinants of
education outcomes for children using detailed survey data from India. However, my
focus is on another potential determinant of schooling that deserves deeper analysis – the
health status of a child.
The topics of health and education have been addressed extensively in the economic
literature and from many perspectives. This study will take the position that health is an
important determinant of a student’s ability to acquire education: a healthier child
will be more productive and capable in class, which will positively affect their
schooling outcome. Indeed, the importance of health and nutrition for the overall welfare of
a growing child has been widely acknowledged. From its impact on cognitive and learning
ability, to food absorption and other deficiencies, the potential impact of health on
schooling outcomes has gained considerable attention.
Many studies have linked the nutritional status of children to educational achievements
using data from various developing countries. They find that the health status of a child
has a significant impact on their education outcomes, measured either by
participation or by variation in test scores. However, these papers have focused predominantly
on individual-level and anthropometric indicators of a child’s health, such as
height-for-age or weight-for-age measures. In contrast, this study will focus on village-level health
resources and infrastructure as a measure of a child’s health status.
It is hypothesized that a child living in a village that is well endowed with quality health
resources is more likely to achieve favourable education outcomes compared to a child
from a village that lacks proper health services. Although the World Health Organization
(WHO) and the United Nations (UN), among others, have frequently cited unsafe water supply and
sanitation as having severe implications for development, the effect of these and other
health resources on child education outcomes has not been seriously analysed.
Using nationally representative household and village survey data from rural India, this
thesis will investigate the extent to which health services and infrastructure in a village
affect education outcomes. Two measures of education outcomes will be examined:
schooling participation as well as schooling consistency (the extent to which, once in school, a
child keeps up with their education for their age). Although schooling participation in
India has experienced vast improvement over the decades, improvement in the quality of
the education received by children remains critical (WHO, 2000). Thus, schooling
consistency is considered another important education outcome, and one that has
not been commonly addressed in the related literature.
Limited dependent variable models are employed to estimate this relationship.
Although univariate probit models are used for the main results of this thesis, other
econometric strategies are also employed in order to account for two possible sample
problems in analysing schooling outcomes. First, almost a quarter of the children in the
sample had not participated in the school system. If a child’s propensity to participate in
school and a child’s propensity to have consistent schooling are correlated, simply
estimating schooling consistency on children who have participated in school would
amount to endogenous sample selection. Studies that have examined these two schooling
tendencies have often assumed independence between the two, yet from a
methodological standpoint this possible bias should be addressed in the initial stage of the
econometric study. Thus, a bivariate probit model with sample selection is estimated.
Second, because many children in the sample are from the same family, common factors
may affect siblings in the same household. I employ random effects probit models in
order to take account of this potential unobserved household heterogeneity and clustering
effect.
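The two sample problems described above can be sketched in latent-variable form. The
notation below (the symbols p, c, y, x, z, u, e and the subscripts i and h) is assumed here
purely for illustration and is not necessarily the notation used in the econometric section
of this thesis; it follows the standard textbook form of a probit model with sample
selection and of a random effects probit model.

```latex
% Illustrative notation only (assumed for this sketch).
\begin{align}
  % Participation (selection) equation for child i
  p_i^{*} &= z_i'\gamma + \varepsilon_{1i}, & p_i &= \mathbf{1}[\,p_i^{*} > 0\,] \\
  % Consistency (outcome) equation, observed only when p_i = 1
  c_i^{*} &= x_i'\beta + \varepsilon_{2i}, & c_i &= \mathbf{1}[\,c_i^{*} > 0\,] \\
  % Errors are jointly standard normal with correlation \rho
  \begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \end{pmatrix}
    &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
      \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right) \\
  % Random effects probit for child i in household h
  y_{ih}^{*} &= x_{ih}'\beta + u_h + e_{ih}, &
    u_h &\sim N(0, \sigma_u^{2}), \quad e_{ih} \sim N(0, 1)
\end{align}
```

Under these assumptions, a correlation of zero between the two errors corresponds to the
case in which a univariate probit for consistency, estimated on participating children
only, involves no selection bias, while a non-zero correlation is the endogenous sample
selection case. The household term u_h stands for the common factors shared by siblings.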
In addition to investigating the impact of health resources on education outcomes,
this thesis considers whether a gender bias exists not only in education, but also in health
and other factors. The gender bias in education has been well documented and studies
have typically included a single gender dummy in order to capture this effect. My analysis
will split the sample into gender-disaggregated sub samples in the hope of capturing other
biases that may arise. In particular, studies have shown that there is a gender bias in health
outcomes in developing countries. Where health resources are scarce or poor, females
tend to be more disadvantaged than males. This implies that an improvement in health
resources would have a greater impact on females, which could, in turn, help reduce the gender
gap in education outcomes.
The benefits of this rich cross sectional dataset are that several health resources variables
are available to use as a measure of health status, as well as a range of individual,
household and village level data to control for other factors that may affect schooling
outcomes. Accordingly, this thesis will contribute to the literature in two main respects.
First, it will add to the studies that examine the determinants of education outcomes by
focusing on the health status of the child, as well as providing fresh analysis into the effect
of other commonly considered schooling determinants.
Second, building on the literature that links the individual nutritional and health status of a
child to their schooling outcomes, this thesis will investigate the impact of village-level
health resources on education outcomes. Compared with studies that take the individual health
perspective and analyse variation in calorie intake or height-for-age scores, the results from this
thesis on public infrastructure and resources may carry more direct implications
and suggestions for public expenditure and investment.
The remainder of this thesis is structured as follows. Section 2 presents some background
on education and health in developing countries and considers the case study of India.
Section 3 reviews the literature on education and health with emphasis placed on the
studies that link health to education outcomes. Section 4 presents the conceptual
framework of the investigation. Section 5 describes the cross-sectional household and
village data from India used for the empirical analysis. Section 6 develops the econometric
strategies employed. Section 7 presents and discusses the empirical results, which seem to
affirm the hypothesis; in particular, preventive care health measures have a statistically
and practically significant impact on schooling outcomes, most notably through the
positive effect of general health sessions and information. Section 8 concludes with
suggestions for future research.
2. Background
2.1 The Global Health and Education Crisis
There are many paths to development but the focal areas of this thesis are health and
education. The importance of health and education was stated in the historic 1948 United
Nations Universal Declaration of Human Rights.¹
Everyone has the right to a standard of living adequate for the health and well
being of himself and of his family.
(Article 25)
Everyone has the right to an education…directed to the full development of the
human personality and to the strengthening of respect for human rights and
fundamental freedoms.
(Article 26)
These fundamental human rights are the basis for socio-economic improvement.
Indeed, improving the lives of poor and disadvantaged people has always been a worthy
objective in both developing and developed nations. However, this task takes on greater
urgency in developing countries, where poverty and underdevelopment are widespread
and have the potential to persist through generations.
More recently, in 2000, the United Nations set out eight international development goals to
be achieved by the global community. The United Nations and its 189 member states
agreed to make a concerted effort to meet these United Nations Millennium
Development Goals (MDG) by the year 2015.² The goals are:
¹ http://www.un.org/Overview/rights.html
² http://www.un.org/millenniumgoals/
The United Nations Millennium Development Goals
1. Eradicate extreme poverty and hunger
2. Achieve universal primary education
3. Promote gender equality and empower women
4. Reduce child mortality
5. Improve maternal health
6. Combat HIV/AIDS, malaria and other diseases
7. Ensure environmental sustainability
8. Develop a global partnership for development
Providing basic health resources and achieving good education outcomes would greatly
assist in achieving a majority of these goals. Moreover, the poor state of education and
health resources in the world today, largely in developing countries, further highlights the
need for improvement in these vital areas. Progress reports from developing nations
indicate that, contrary to the ideals espoused in 1948, the provision of proper health care
and education are human rights that remain far from being universal.
The United Nations Educational, Scientific and Cultural Organization (UNESCO)
reported that 113 million school-age children around the world are not in school
(UNESCO, 2002). This deficiency is particularly pronounced in developing nations.
Although education is widely acknowledged as an important source of development, a key
to poverty reduction and thus a priority for developing nations (World Bank, 2001), the
acquisition of education in these areas is not guaranteed. In fact, it is estimated that 46 per
cent of people in developing countries are illiterate, 25 per cent of children aged 6-12 years
do not receive primary education and 80 per cent of children aged 13-18 years do not
receive secondary education (Todaro, 2000).
The situation of health care and provision lies in a similarly dismal state in developing
countries. Even in the 21st century, 11 million children die each year from preventable
illnesses (UNICEF, 2002). A lack of clean water supply, poor sanitation practices and
poor health care contribute to the spread of infections and diseases that, though
inconsequential in developed countries, prove devastating in developing areas. In fact, 1.3
million people die every year of malaria and 1.8 million die of diarrheal diseases;
90% of these deaths are of children under the age of five (WHO, 2004). The
relative ease of preventing and curing these low-level ailments further highlights the
severity of the health crisis facing poor countries in the world.
2.2 Case Study of India
Achieving favourable education and health outcomes is of great importance in a nation
like India, the second most populous nation in the world. The United Nations Millennium
Development Goals are goals that the Indian government has pledged to achieve in order
to improve the livelihoods of its people. Finance Minister P. Chidambaram expressed this
commitment during his budget speech in 2004:
“The countries of the world, India included, have set for themselves the Millennium
Development Goals. Our date with destiny is not at the end of the millennium, but in the
year 2015. Will we achieve those goals? In the eleven years that remain, it is in our hands
to shape our destiny.”³
On the one hand, great improvements in education and health have been achieved in India
over the last few decades. Policies and initiatives, from the Integrated Child
Development Services (ICDS) in 1975 to the recently launched National Rural Health
Mission, as well as high economic growth in recent years, have helped to foster improved
outcomes in these two vital areas of development. Yet serious challenges remain and
need to be addressed in order to achieve sustained growth and development.
³ Sachs, J. (2005). “The End of Poverty: How we can make it happen in our lifetime”, Penguin Books, Great Britain. p. 185.
According to the World Bank, the number of Indian children not in school was
reduced from 25 million in 2003 to 9.6 million in 2005-06. More equity in schooling has
also been achieved, and the gaps across gender and social status (caste) have
narrowed.⁴ However, challenges in education persist. Although there have been
improvements in schooling participation, the quality of education outcomes has not
experienced the same progress, with incomplete schooling and drop-outs remaining a
considerable problem.
In terms of health, India is also lagging behind in both the provision and quality of health
resources. In fact, India has one of the highest percentages of undernourished children in
the world with approximately 60 million children classified as being undernourished
(Gragnolati et al., 2005). According to UNICEF, in 2006 some 2.1 million children under
the age of five died in India and this figure has been attributed to India’s poor state of
health care and delivery (UNICEF, 2006).
Moreover, in their report on undernourished children in India, Gragnolati et al. (2005)
argue that the country’s child malnutrition problem persists in part because efforts to
improve nutrition have focused primarily on food intake. They instead highlight the role
that infections and ill-informed health practices have played in India’s malnutrition epidemic.
Although great improvements in the provision of improved drinking-water sources have
been achieved (reaching 86% of the Indian population in 2004), unimproved sanitation and
a lack of general health knowledge remain significant public health threats for
much of the population (UNICEF, 2002). The extent to which such bleak health conditions
contribute to poor education outcomes forms the basis of this investigation.
⁴ World Bank (2008), India Country Overview 2008. See References.
3. Literature Review
3.1 The Benefits and Determinants of Education
Human capital formation through education is fundamental for development and progress.
This is a particularly important issue in developing countries, where the poor may
experience persistent inequality and poverty because of credit constraints and a lack of
opportunity for socio-economic improvement. Through higher post-school earnings, a
good education provides the opportunity for intergenerational income mobility and a
breaking of the poverty cycle (Restuccia & Urrutia, 2004). Conversely, Behrman (1990)
and Bedi and Gaston (1997) argue that children who are poorly educated may have low
productivity in adult life and end up in poverty.
However, free provision of education does not necessarily mean free consumption, and a
vast literature has been devoted to the study of schooling determinants in
developing countries (Dreze and Kingdon, 2000; Duraisamy, 1992; Sipahimalani, 1997
among others). Dreze and Kingdon (2000) examined schooling participation and grade
attainment in rural north India. Using household survey data, they employed a logit model
for estimating participation and an ordered logit model for estimating the determinants of
grade attainment (with three outcomes: not enrolled, enrolled but not completed, and
completed primary schooling) for separate as well as pooled samples of female and male
children of primary age. They found that a range of individual and household
variables affect participation and grade attainment in school. They particularly
highlighted the role that parental education and schooling characteristics play in
schooling outcomes. Mid-day meals were found to be particularly effective in
improving participation in school for girls.
Other studies on the determinants of education outcomes have also considered a range of
variables but focus on some key characteristics. For example, Blau and Grossberg (1992)
find mother’s education to be an important determinant, while Brown and Park
(2002) focus on the role that wealth and credit constraints play in schooling investments.
Birth order and family size are also commonly found to be significant determinants for
schooling outcomes. It is from these studies on schooling determinants that the
explanatory variables in the model are chosen.
Another factor that has been considered in relation to education attainment is the role of
health. The literature on education and health is also quite extensive. Some studies have
examined them as separate inputs for growth and development. This study aims to
investigate a relationship between the two.
3.2 Education and Health
Studies have compared the benefits of investing in education with the benefits of investing
in health in the hope of directing public policy towards the area that would most benefit
economic growth. Knowles and Owen (1995) found that health has a greater impact on
economic growth than education. Using life expectancy as a proxy for health capital, they
found that it has a statistically and practically significant impact on income per capita
compared to education, and they highlighted the importance of including health capital in
models of growth.
Webber (2002) is even more emphatic about the apparent trade-off between education and
health in his paper subtitled “should we invest in health or education?” He addresses this
question using cross-sectional data from 46 countries. Webber finds that his proxy for
health, the intake of calories per head, has a statistically insignificant
effect on economic growth, in contrast to education. He concludes that the results support
the notion that investing in health has lower returns for a nation than investing in
education. His suggestions for future research, however, are to investigate other proxies
for health, in particular health infrastructure such as the supply of clean water and quality
of health care.
Economic growth studies have therefore viewed education and health as two separate,
exogenous and almost opposing inputs for growth and development. This study, however,
will instead consider the intricate relationship between these two important sources of
social capital whilst focusing on individual education and health outcomes.
Both directions of the health and education relationship have been examined. Although this
study investigates the impact of health status on education outcomes, the opposite
direction, the effect of education on the health status of an individual, has also been of
interest – that is, the health benefits of acquiring education. Mushkin (1962) argues that
ignorance delays medical treatment and subsequently increases the strength of infection
and disease. The inability to read and understand medicinal information and innovations
could also be adverse to a person’s health. More educated people could also be more likely
to be employed in “safer” white-collar occupations with fewer health risks and generally
pursue activities that do not endanger their health (Case, 2002; Caldwell, 1986).
However, this direction of the education and health relationship treats education as human
capital already acquired. Since education attainment (and the ensuing positive flow-on
effects) is not guaranteed in developing countries, the reverse relationship is of interest in
this study. The channel that this thesis will investigate – namely, the extent to which health
status affects education outcomes – is analogous to the impact of worker productivity
on output. In other words, we investigate the impact of the health of a child on his or her
capacity for participating and exerting effort in school.
The link between health and education attainment has been well established. Using
height-for-age, weight-for-age and other anthropometric measures of child health, research across
a range of developing countries has shown that variations in these indicators have a
significant impact on schooling outcomes.
The health and nutritional status of a child has been shown to determine the propensity of
a child to participate in school. In their study of Nepalese children, Moock and Leslie
(1986) examined the effect of nutrition status – as measured by height-for-age, weight-for-
age and weight-for-height – on both schooling participation and grade attainment. They
estimated a probit model for schooling participation and found that children with better
nutritional status had a significantly higher probability of attending school compared to
those with stunted growth. In terms of grade attainment, their ordinary least squares (OLS)
results point to the same conclusion on the benefits of nutrition for schooling
outcomes. However, only 15 per cent of the 350 primary school aged children actually
participated in school. Their analysis of grade attainment for this 15 per cent of
the sample therefore does not take into account the possible sample selection arising from
participation. This will be addressed in my study, as the proportion of non-participants is
also non-trivial.
Addressing schooling participation, Glewwe and Jacoby (1995) found that children in
Ghana delayed enrolment in school and also completed fewer years of schooling because
of malnutrition and poor health (measured by height-for-age). Because of the negative
impact delayed schooling would have on post-school labour earnings, the authors
emphasise the importance of child health and nutrition. However, after they control for
unobserved family variables using random and fixed effects estimation, the effect of
health is substantially reduced. This paper highlights the importance of accounting for
unobserved factors that may affect the analysis.
Studies also link poor health status to poor achievements by children in school as
measured by variation in test scores. Gorman and Pollitt (1993) found that children with
better nutrition in Guatemala performed better in cognitive and other school tests.
Similarly, a study in the Philippines also found that a one standard deviation increase in
early-age child health increased test scores by almost a third of a standard deviation
(Glewwe & King, 2001). Thus these studies have indicated that the health status of a child
can impact on their ability to acquire education through its consequence on concentration,
cognitive and physical ability. Subsequently this will impact on the quantity and quality of
their education and potential for socio-economic improvement.
However, these studies on health and education have largely focused on the impact of
individual health status on education outcomes. Because of this, these papers have
primarily used two-stage least squares in order to account for the
possible endogeneity of a child’s health to their education outcomes. As
mentioned previously, the relationship between health and education can run in
both directions, which gives rise to this endogeneity. Less focus, however, has been placed on
the role of village health infrastructure and resources as a determinant of health status, and
its subsequent impact on education outcomes. By using village level health resources as
an explanatory variable, the threat of endogeneity is minimized. That is, a child’s
education outcome may impact on their individual health status, but it is unlikely that a
child’s education outcome would impact on the village’s level of health infrastructure.
It has been established that there exists a relationship between the quality of sanitation and
water and other village-level health resources, and the health status of households,
particularly children. For example, Esrey (1996) found that improved water seemed to
decrease the prevalence of diarrhoea in children by 6 percentage points when analysed
across different countries. However, the extent to which these village-level health
resources impact on education outcomes has not been rigorously addressed and it is this
gap in the literature that my thesis hopes to fill. While studies have included physical
infrastructure variables in their models that could proxy for health resources in the village,
it is usually a single indicator that is used to measure the overall level of development in
the village. The presence of piped water, for example, is a common proxy that has been
used in studies of education outcomes (for example, Psacharopoulos and Arriagada, 1989;
Holmes, 1999). The innovation in this paper is that, besides controlling for village-level
development, various other health resource variables will be included in order to isolate
any causal relationship between health resources and education outcomes.
3.3 Gender Bias in Education and Health
Gender bias in education outcomes has been a keen and important area of interest in the
education literature (Lavy et al 1996; King and Lillard, 1987 among others), which
indicates there is a sharp disparity in female and male schooling outcomes. South Asia is a
region well known for its strong male preferences and discrimination against females and
this inequality has also been viewed from a health perspective. For example, Rosenzweig
and Schultz (1982) and Dasgupta (1987) found that there were significant gender
differences in household health care and resource expenditures in India as a result of
perceived differences in future earning abilities. Studies of Pakistan found that boys
received preferential treatment over girls with respect to treatment for illnesses such as
diarrhoea and fevers as well as acute respiratory infections (Mahmood and Mahmood,
1995; Filmer et al, 1998). Such bias has implications for childbirth and thus the health of
future generations – further highlighting the importance of reducing such inequality.
Females may also be given the task of taking care of ill family members if their
opportunity cost of being outside the home is considered lower than a male counterpart.
Therefore this issue of gender bias will also be addressed in this study, not only from a
schooling outcomes perspective but also in terms of health outcomes.
This thesis will contribute to the literature on the determinants of schooling by analysing a
range of individual, household and village characteristics. This area of research is
particularly important for developing countries in which education attainment is
considered a key to development and poverty reduction. Furthermore, this study will
extend the literature by focusing on a range of village-level health resources as a measure
of health status and nutrition that has not been seriously addressed. The empirical methods
employed in this paper will also take into account the possible sample selection of
schooling participants that tends to be ignored in the literature, as well as accounting for
unobserved household effects that could impact on the schooling outcomes for children in
the same family. Finally, gender bias in education and health outcomes will also be
examined.
4. Conceptual Framework
The primary hypothesis of this thesis is that village-level infrastructure and resources that
promote health lead, through improved health conditions, to more favourable education
outcomes. ‘Education outcomes’ here reflect schooling participation as well as schooling
consistency (the extent that a student keeps up with schooling according to their age). In
terms of health resources, these can be categorized into three broad areas (De Ferranti,
1985):
Preventive care (patient-related): this includes services that are performed
on well patients in order to reduce the incidence of adverse health events such as
gastro-intestinal infections, diarrhoea and malaria. They
include measures such as food supplements, malaria shots and other
vaccinations.
Preventive care (non-patient related): this includes services that are
provided in a community in order to control the spread of disease and
infections. These include resources such as clean water, proper sanitation and
the promotion of good health habits and hygiene.
Curative care: this includes resources such as hospitals and health
facilities, medical practitioners or traditional healers that act to contain and
ease illnesses after they occur.
The ailments that plague school aged children, particularly in developing countries, are
typically common “low level” diseases that tend to be easily preventable (the occurrence
of diarrhoea, for example). The prevention and treatment of more complex ailments, such
as malaria, are also well known. This indicates the importance of investing in basic health
care and resources.
I posit that communities that are less endowed with health resources and measures will
see adverse effects on the education outcomes of the children living in that area. In a
poorly health-resourced environment, children are more likely to be afflicted periodically
with low-level ailments that cause temporary debilitation. The period of time that
they remain indisposed depends on the severity of the illness, capacities of family care as
well as the curative care services available in the area. Therefore the hypothesis of this
thesis is that the health status of a child and their incidence of sicknesses will then impact
on their education outcomes by affecting their ability to perform and succeed in school. It
will also impact on the probability of school entry.
It is hypothesized that the frequency of illnesses in the years preceding schooling age
reduces the probability of school participation. Due to the opportunity cost of spending
time in school, as opposed to utilizing that time in the home or engaging in paid work, the
payoff from schooling needs to outweigh the cost of the invested time (besides other
pecuniary schooling costs). This opportunity cost is particularly high in developing
countries where the resources in households can be heavily constrained.
However, time spent in school is more effective over consistent time periods rather than
short bouts of learning. Consistent attendance in school increases the productivity and
learning ability of the child and also gives them the opportunity to understand ideas and
concepts that would be important for higher level learning. Therefore because schooling
requires a long term and consistent time investment in order to provide “profitable”
returns (in terms of potential earning ability in the future), a child who is prone to illness
and will consequently have a transient presence in the classroom may be more productive
in non-schooling activities or work (where returns require a less consistent time
investment). Alternatively, the poor health status of a child may simply demand time and
caring in the home rather than in the classroom. As such, a hypothesis of this study is that
there exists a positive relationship between the probability of a child participating in
school and the availability and quality of public health infrastructure in the village.
Similarly, I hypothesize that health status impacts on children who are already enrolled in
school by affecting their probability of keeping up with their studies. The frequency with
which the child falls ill, and the extent that they remain ill without timely treatment,
determines the probability that he or she successfully completes their schooling. Again, by
missing classes due to illnesses, a child misses out on learning concepts that would be
used to understand more difficult material. Poor health and nutrition would also affect a
child’s cognitive ability and capacity for learning and concentrating in the classroom. This
would increase the probability that they fall behind in their schooling.
Another consequence of a village being endowed with poor health resources is that the
health status of other family members, besides the school-aged children, would be poor.
This could place more responsibility on the children to take care of their ill siblings or
elders and give less priority to attending or keeping up with their schooling.
Therefore this relationship between health resources and education outcomes can be
expressed as a reduced form achievement function:
$A_i = \beta_1 Z_i + \beta_2 H_i + v_k$    [1]
where $A_i$ is the education outcome of child $i$, $Z_i$ is a vector of individual, household and
other village-level characteristics that affect education outcomes, $H_i$ includes the
village-level health resources available to child $i$, and $v_k$ is a random disturbance term that
includes unobservable characteristics that would affect schooling outcomes, for example,
a child’s innate ability.
Therefore my theory is that the education outcome of a child is, among other things, a
function of the health resources in the village, through their effects on the health status of the
child. Implicit in this theory is that all families in a “good health” community will utilize
the health resources available and alternatively, “bad” health resources will adversely
impact on all families and the health status of the children in that community. This may be
considered a strong assumption if some health measures are not available to all families in
the village. However, the nature of public infrastructure and services is that when they are
in place, they are available to all residents regardless of the economic means of the
individual family. For health improvements in particular, the benefits to a whole
community or village have been well affirmed.
Bundy et al (1990) identified a transmission effect from treating
diseases for school age children to the rest of the adult community. Hughes et al (2000)
and Alderman et al (2001) also found that the impact of village level health resources had
spill over effects on the entire village. Therefore I assume that a village with good health
resources would similarly have positive externalities for the wider community. Likewise,
a poor health-resourced village would be more conducive to infectious disease and illness
that could permeate throughout the village because of interactions and poor practices
across families.
5. Data
5.1.1 Description
The data used in this study was collected by the National Council of Applied Economic
Research (India) in 1999. This ARIS-REDS data comes from a nationally representative
sample of rural Indian villages and households. This rich cross-sectional dataset is
appropriate for this study as it provides detailed individual, household and village-level
data across a range of socio-economic characteristics including health and education
information for every family member.
The ARIS-REDS surveys were taken in several rounds: data were first collected in 1969,
then in 1970, 1971, 1982 and 1999. Because of the long gaps between survey rounds, the
composition of households and villages changed significantly, which makes longitudinal
analysis of the households and villages impractical. The 1999 data was
chosen for this analysis as it is the most recent, the data is in Stata format and the
directories and identifications are presented in a clear layout (relative to the earlier
datasets in which much of the information on scanned photocopies lacked clarity). This
dataset covers 9298 families, consisting of 44,999 individuals across 253 Indian villages
and within 16 rural states.
Although the majority of the 1999 data is available online (see footnote 5), merging the village and
household data needed village identifiers that were suppressed for privacy concerns.
Professor Andrew Foster of the Department of Economics and Community Health at
Brown University manages the full ARIS-REDS data. After obtaining approval from my
faculty’s Human Research Ethics Advisory Panel and Professor Foster, the secure data
allowed full merging of the relevant data decks.
5 http://adfdell.pstc.brown.edu/arisreds_data/
5.1.2 Handling of Data
The sample consists of several decks of information. If a family provided answers for
the deck 2 questionnaire (referring to household composition), then because of full
enumeration, more detailed information for all members of the family was available in
subsequent decks. Since the data for this study needed to be merged across household
and individual levels, deck 2 provided an overview of the family and was considered the
“master” deck. A household identification (ID) number and specific member ID
matched more detailed information about the child such as years of education. For
example, the master deck contains information about a family with ID 7072 with three
children of schooling age. These children had specific IDs that were matched with more
detailed information in deck 6 (sons) and deck 7 (daughters).
However, there were instances of ID inconsistency across individuals within the family.
For example, a son with an ID of “4” in the master deck may not match with his
specific ID in deck 6. These inconsistencies can be attributed to the nature of
survey data and its susceptibility to human error. In such cases, manual
re-identifications were necessary. Although a tedious process, this ensured that the data
remained consistent and informative for this analysis. Finally, the village data needed to
be merged with the master deck. Secure village IDs were matched with coded identifiers
in the master deck. Thus a fully merged dataset that consisted of individual information
within families and across different villages was constructed for analysis.
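The matching steps described above can be illustrated with a minimal sketch in Python. This is not the procedure actually used for this thesis (the source data are in Stata format); the record layouts, field names (hh_id, member_id, village_id, educ_years, elec) and all values are hypothetical, apart from the family ID 7072 borrowed from the example in the text. The sketch joins detailed child records and village records onto the master deck, and sets aside any master record without a match so that it can be checked by hand.

```python
# Hypothetical records: field names and values are illustrative only.
master = [  # deck 2 ("master" deck): one row per family member
    {"hh_id": 7072, "member_id": 3, "village_id": "V01", "age": 12},
    {"hh_id": 7072, "member_id": 4, "village_id": "V01", "age": 9},
    {"hh_id": 7072, "member_id": 5, "village_id": "V01", "age": 15},
]
detail = [  # decks 6 and 7: more detailed information on sons and daughters
    {"hh_id": 7072, "member_id": 3, "educ_years": 6},
    {"hh_id": 7072, "member_id": 5, "educ_years": 8},
]
villages = {"V01": {"elec": 1}}  # village deck keyed by village identifier


def merge_decks(master, detail, villages):
    """Join detail and village data onto the master deck.

    Returns (merged, unmatched): rows for which a detail record was found,
    and master rows with no detail record, which would need manual checking.
    """
    detail_by_key = {(d["hh_id"], d["member_id"]): d for d in detail}
    merged, unmatched = [], []
    for row in master:
        key = (row["hh_id"], row["member_id"])
        if key not in detail_by_key:
            unmatched.append(row)
            continue
        merged.append({**row, **detail_by_key[key], **villages[row["village_id"]]})
    return merged, unmatched


merged, unmatched = merge_decks(master, detail, villages)
print(len(merged), "matched;", len(unmatched), "flagged for manual checking")
```

Keying on the pair of household ID and member ID mirrors the description in the text; a mismatch such as the son with ID “4” surfaces in the unmatched list rather than being silently dropped.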
My study is based on a sub sample of children of the schooling age 6-18 years.
Moreover, we are interested in children who were alive at the time of the survey as well
as children who were living in the family. Therefore the sample did not include children
listed in the more detailed decks that were not indicated in the master deck. These
children were generally older, married and lived away from the family village. I was
interested only in children living in the family village as the village-level health
resources may only then have relevant impact on their education outcomes. Finally,
children of families for which education and other household information were missing
were excluded from the analysis. Therefore families with children, as indicated in the
master deck, that did not have more detailed information in the other decks were
excluded. This was done under the assumption that this information was missing at
random and thus their exclusion would not adversely affect the results of our analysis.
Therefore our analysis was reduced to a sub sample of 8,668 children of the schooling
age 6-18 years.
5.1.3 Limitations of Data
There are limitations to the information contained in this dataset. First, for village data,
information on hospitals and schools is only available for the main institution in the
village. That is, although there may be more than one school in the village, detailed data
about the number of qualified teachers or availability of textbooks is available for the
representative school in the village only. This means that it has to be assumed that the
main school in the village is representative in terms of quality of the other schools. This
assumption has to also be placed on the village information available for hospitals and
health centres.
Second, the cross-sectional nature of the dataset requires the assumption that the children within the
schooling range of 6-18 years had access to the same schooling and health resources as they
were maturing. That is, an 18 year old who is shown in the data to reside in a village
with good health resources is assumed to have had this quality of health resources as he
or she was growing up. Because of India’s emphasis on improving health resources as
early as the 1950s, and more so in the early 1980s, this assumption is considered acceptable
and necessary given the scope and nature of the data.
5.2 Dependent Variable
There are different ways to measure education outcomes. Some studies have utilized and
examined variations in standardized test scores as a measure of education attainment
(for example, Glewwe & King, 2001; Jamison and Lockheed, 1987). Arguably, this may
not be an adequate reflection of education achievement as schooling provides a child
with opportunities of learning social skills through interaction and other positive
externalities. Moreover, test scores reflect only students who are already enrolled in school –
that is, human capital already acquired. However, as mentioned earlier, in many
developing countries including India, participating in the education system is not
guaranteed or universal but is affected by various household and socio-economic factors.
Therefore participation in school and consistency in school are considered as measures
of endogenous education outcomes for this analysis.
Measuring schooling participation is uncomplicated. A child who reports at least one
year of education is considered a school participant. For measuring schooling
consistency, however, simply comparing the number of schooling years attained across
different ages would be erroneous. That is, a 5 year old would have fewer years of
schooling than an 18 year old simply because of their age difference. Although
controlling for age could correct for this, a crux of this study is measuring the
completeness or consistency of schooling outcomes. Therefore whether a child keeps up
with their schooling, given their age, is of particular interest. In order to analyse these
variations in education for children across different ages, a standardised measure that
captures both participation and completion is used.
A standardised measure for education outcomes, called the SAGE score (schooling for
age), has been used in other studies and is a useful measure because it controls for
different aged children and encompasses both schooling participation and years of
schooling completed (Patrinos and Psacharopoulos, 1997; Gitter & Barham, 1999). The
SAGE score is calculated as follows:
$\mathrm{SAGE} = \dfrac{S}{A - E} \times 100$    [2]
where $S$ is the total number of years of schooling completed, $A$ is the age of the child, and $E$ is the age at which
children officially begin schooling. In India, children officially start school at 5
years of age. Thus a SAGE score of 100 would mean that a child’s education is consistent
for their age, whilst a score less than 100 would mean that they have missed some years
of schooling or not participated at all (a SAGE score of zero). Although the official
schooling age is 5 years, children may begin some sort of pre schooling and accordingly
SAGE scores greater than 100 are possible.
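As a minimal sketch of how equation [2] behaves, the calculation can be written as a short Python function. The function and constant names are hypothetical and chosen for illustration only; the starting age of 5 is taken from the text, and the guard against ages at or below the starting age is an added assumption, since the score is undefined there.

```python
OFFICIAL_START_AGE = 5  # E: official school starting age in India, per the text


def sage_score(years_completed, age, start_age=OFFICIAL_START_AGE):
    """Return the SAGE score [S / (A - E)] * 100 from equation [2]."""
    if age <= start_age:
        # Assumption of this sketch: treat the undefined case as an error.
        raise ValueError("SAGE is undefined when age <= official starting age")
    return years_completed / (age - start_age) * 100


print(sage_score(5, 10))  # five years completed by age 10: on track
print(sage_score(0, 10))  # never attended school
print(sage_score(3, 6))   # pre-schooling pushes the score above 100
```

Because the denominator is only one year for a 6-year-old, a small amount of pre-schooling produces a very large score, which is one reason scores well above 100 can appear in the data.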
Again, survey data are not free from error. The dataset used contained discrepancies between
the child’s age and years of education that needed to be accounted for. Although these
differences are the interest of this study, including observations in which the difference
is obviously a case of human error could distort our analysis and lead to biased
results. Therefore, observations with SAGE scores that were confidently regarded as a result
of human error were altered accordingly. For example, a child aged 10 years was reported
to have 40 years of education – a difference of 30 years. Changing the education years
to ‘4’ seemed reasonable. Two other observations were changed in this way as their
differences in age and years of education were unrealistically great in magnitude (and at
the same time it was easy to infer the correct value). Note that the results did not change with
the exclusion of these three observations. However, there were also ambiguous cases in
which the difference between age and years of education was very small.
The minimum admissible difference between age and years of education in this analysis is three years. This means a
6-year-old child having three years of education is considered reasonable since children
may have had pre-schooling before the official schooling age. Moreover, the
observations with differences of less than three years constituted only 0.8 per cent (68
observations) of the entire sample and thus culling these observations was believed to
have little impact on our analysis. See Appendix 1 for a table of the age-education year
differences and the abovementioned changes.
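The screening rule above can be sketched in a few lines of Python. The names and the example records are hypothetical; only the three-year cut-off comes from the text. An observation is kept when the child’s age minus their reported years of education is at least three.

```python
MIN_AGE_EDUC_GAP = 3  # minimum admissible (age - years of education), per the text


def is_plausible(age, educ_years, min_gap=MIN_AGE_EDUC_GAP):
    """True if the gap between age and years of education meets the cut-off."""
    return age - educ_years >= min_gap


# Hypothetical (age, years of education) pairs, for illustration only.
children = [(6, 3), (6, 4), (12, 7), (10, 9)]
kept = [child for child in children if is_plausible(*child)]
print(kept)
```

Under this rule a 6-year-old with three years of education is retained, while a 6-year-old with four years is culled.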
Therefore, the final sample for analysis will consist of 8,600 observations of 6-18 year
old children. The distribution of SAGE scores over this sample is shown in Figure 1.
Figure 1: Distribution of SAGE scores
[Histogram. Vertical axis: per cent (0 to 25); horizontal axis: SAGEX (0 to 300).]
Figure 1 shows that a significant proportion of the observations in the sample had SAGE
scores of zero. Specifically, 2028 or 23.57% of the entire sample of 8,600 children have
had no years of education. Of the 6572 children who have had at least one year of
education, 66.84% of them have not kept up with their education according to their age
(with SAGE scores less than 100). Approximately 6% or 569 children had a SAGE
score greater than 100, indicating some form of pre schooling. The variation in SAGE
scores for the sample will inform the estimation methods to be used, which will be
detailed in the next section.
Finally, it should be noted that variation in SAGE scores between 0 and 100 does not
reflect the extent of schooling “completeness”. That is, a higher SAGE score does not
necessarily mean that the child is keeping up with their schooling “better” than a child
with a lower score. Given the way the SAGE score is constructed, such variations can be
attributed to differences in ages rather than differences in years completed. For example,
a 17 year old with 10 years of education and a 12 year old with 5 years of education are
both behind in their schooling-for-age by two years. Yet the 17 year old has a higher
SAGE score of 83 compared to 71 for the 12 year old. However, because an aim of
this thesis is to analyse the incidence, rather than the extent of, children falling behind in
their schooling, this particular feature of SAGE scores between 0 and 100 is not of key
interest in this study.
Therefore, equation [1] will be estimated with two measures of education attainment:
schooling participation and schooling consistency. A binary outcome of school
participation (SAGE score > 0) or not and school consistency (SAGE score ≥ 100) will
describe the education outcomes of the children. Details on the econometric strategy and
the modelling of these dependent variables will be provided in the next section.
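As a minimal sketch, the two binary outcomes can be derived from a SAGE score as follows. The function name is hypothetical; the thresholds (greater than 0 for participation, at least 100 for consistency) are those stated above. How the two outcomes are modelled jointly is left to the econometric strategy and is not represented here.

```python
def education_outcomes(sage):
    """Map a SAGE score to the (participation, consistency) indicators."""
    participates = int(sage > 0)   # at least one year of schooling reported
    consistent = int(sage >= 100)  # schooling has kept up with age
    return participates, consistent


for score in (0, 71.4, 100, 150):
    print(score, education_outcomes(score))
```

Any score strictly between 0 and 100 maps to the same pair of indicators, which is why the extent of falling behind does not enter the analysis.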
5.3 Explanatory Variables
The variables of interest in this study are the village-level health resources. In addition
to these, our rich dataset also allows for several individual, household and village-level
control variables. Table 1 presents the definitions of the variables used for this
analysis. The choice of control variables follows the practice and
findings of previous studies.
Table 1: Definition of explanatory variables

Variable        Definition

Individual characteristics
AGE6_12         dummy, takes value 1 if child is aged 6-12 years
AGE13_15        dummy, takes value 1 if child is aged 13-15 years
AGE16_18        dummy, takes value 1 if child is aged 16-18 years
MALE            dummy, takes value 1 if child is male
BTHORDER        value 1 for first born, 2 for second born, 3 for third born, etcetera
ACTIVITY        dummy, takes value 1 if child performs activity non-school related

Household characteristics
LAND            dummy, takes value 1 if family owns land
FAMILY_SIZE     number of family members
HH_EXP          family expenditure on food and non-food items as reported in 1999 (Rupees) per capita
HEAD_EDUC       education level of head of the family and spouse

Village characteristics
ELEC            dummy, takes value 1 if village is electrified
SCHOOL_DIST     distance (km) of the main school from the village
EDUC_QUAL       education quality indicator of value 1-10
HEALTHDIST      dummy, takes value 1 if health facility is not situated in the village
HOSPDIST        dummy, takes value 1 if rural hospital is not situated in the village
CHLORF          dummy, takes value 1 if frequency of well chlorination is at least every 3 months
MALAEF          dummy, takes value 1 if frequency of malaria spraying is at least every months
GHEALTH         dummy, takes value 1 if frequency of general health sessions given in the village is at least every 3 months
WATERSOURCE     dummy, takes value 1 if village has improved source of water
TOILETQ         dummy, takes value 1 if village has improved sanitation
HTHGUID         dummy, takes value 1 if village has a health guide

a) Village-level health resources
There are eight health variables of interest that are considered to capture the level of
health resources in a village.
The proximity of health facilities to the village (health_dist and hosp_dist) was
deemed an adequate proxy for the extent of curative care available in the village. A proxy
for the quality of curative care could only be captured by whether the health centre had
beds on its premises. However this had little variation over the sample and thus was not
included.
The variables watersource and toiletq are the indicators of safe drinking water and
proper sanitation in a village. Following the definitions of the World Health
Organization and UNICEF (WHO and UNICEF, 2004), an “improved” source of water in a village is the presence of
a public tap, hand pump or tube well. An “unimproved” source includes canals, rivers
and ponds. “Improved sanitation” refers to toilet facilities with a flush or semi-flush, and
“unimproved” sanitation is defined in the data as a service latrine or open fields.
Other preventive care health resources in the analysis include three frequency measures.
The frequency of well chlorination, chlorf, is considered an important measure of water quality.
An improved source of water may still contain harmful bacteria and other disease-causing
organisms, and chlorinating water sources helps kill such organisms and reduce the
transmission of water-borne diseases (WHO, 2004). The frequency of malaria
spraying, malaef, helps reduce the incidence of malaria and is also taken as an
indicator of the vaccinations and immunizations that the village may perform. The
frequency of general health sessions, ghealthf, reflects the level of health consciousness
in the village. Indeed, general health knowledge has been cited as being fundamental to
improving the health of people in developing countries. Basic practices such as hand
washing, for example, have been acknowledged as an effective means of reducing
the spread of disease and infection and are increasingly being promoted in developing
countries (Reuters, 2008). Finally, the presence of a health guide in a village, hthguid, is
another measure of health resources that is thought to improve schooling outcomes by
promoting good health practices.
b) Individual and household characteristics
Age dummies that reflect different levels of schooling: primary, middle and senior
school, are included in order to capture any possible age cohort effect on schooling
outcomes. Gender may play an important role in determining education outcomes as
well. From traditional or cultural norms, males may be considered more important as a
future income-earner for the family (contrary to females who will marry and “leave” the
family). Therefore parents may favour schooling resources towards sons rather than
daughters.
Although the empirical evidence has been mixed, the birth order of the child may also
have an effect on education outcomes for children. Children that are born earlier (and
have a low birth order) have fewer siblings to compete with and thus can enjoy a greater
proportion of household resources (Lindbert, 1977). Older children may also be
expected to provide for the family and thus are given greater access to schooling
resources by parents. On the other hand, later stages of the life cycle for a family may
have greater resources available for schooling. This means older children may not be
given the chance for schooling because of limited household resources compared to later
born children (Parish and Willis, 1993). Therefore the expected impact of a child’s birth
order on their education outcome is not certain.
The financial resources of the family would impact on the educational outcomes of
children by influencing the family’s ability to put its children through schooling. Indeed,
although schooling may be provided free of charge or subsidized by the government,
there remain other costs to schooling, such as textbook, transport and other
miscellaneous expenditures. Moreover, and importantly in developing
countries, the opportunity cost of the child’s schooling is their labour in the fields and
supplemental income. Thus the activity status of a child – whether he or she is engaging
in non-school activities be it paid or non-paid – is included as an explanatory variable.
Psacharopoulos and Arriagada (1989) found the demand for child labour by the family a
strong indicator of schooling participation.
Moreover, if a family has a higher income, the willingness to invest in a child’s
education (which is a long term investment) rather than the shorter-term child labour
earnings is expected to be greater. However as suggested and used by Maitra (2003), log
of household expenditure should be used as a proxy for permanent income. This is due
to possible mis-measurement of household income information in the data, as well as the
transitory nature of household income. It is considered that households tend to smooth
consumption over time and thus household expenditure per head is a more appropriate
proxy for permanent income. The data on household’s expenditure includes food and
non-food items.
Access to credit is also another factor that may influence human capital investment. If
poor families are credit constrained and lack the collateral to borrow against their
income, investing in human capital would be difficult. Whether a family owns land or
not is used as a proxy for family assets and their ability to invest in education. It is
expected that a family with more resources is more likely to access credit and invest in
education. This is considered important in deciding schooling participation for a child
and thus land is used as an exclusion restriction in the econometric strategy, which will
be detailed later.
The education level of the head of the house, head_educ, is also included in order to
capture its expected positive correlation with the child’s education. This may be due to a
greater appreciation of schooling or ability to assist their children in their studies. The
intergenerational effects of parental education on the schooling outcomes of children have
been studied previously (for example, see Glick and Sahn, 2000; Brown and Park, 2002).
Head education could also be considered a proxy for the child’s innate ability or IQ.
c) Other village-level determinants of child education outcomes
Previous studies have indicated that community-level factors in education, such as
distance of village to schools, teacher-to-pupil numbers, class size and other quality
indicators, impact on the education outcomes of children (Hanushek, 1995; Glewwe,
2002). Supply-side schooling factors are included in this analysis. The distance of a
village from a school entails time and transportation costs that could reduce the demand
for school as well as the ability to keep up with schooling. An index, educ_qual, was
also formed, which includes 10 possible indicators of school quality, including the
availability of mid-day meals, furniture for students and computers in the school. See
Appendix 2 for the list of variables in the index.
Other community-level infrastructure factors were also considered as determinants of
schooling outcomes. The level of development or income in the village could impact on
both education and health outcomes. Thus appropriate proxy variables needed to be
included. Unfortunately, the data allowed for limited variable choices in this respect.
Whether a village is electrified or not, elec, is used to indicate the level of development
in the village. Other proxies that were considered included the number of televisions and
telephones in the village. These were poor proxies, however, because these counts
are not informative unless the proportion of the village people owning these items
is known. As such, only one indicator of village development is included in the
analyses (which is one of the limitations of this study).
Table 2 contains descriptive statistics for the variables used in the empirical model.
Descriptive statistics of the variables with comparisons between children who have
participated in school against those who have not participated, as well as comparison
between consistently and inconsistently schooled children can be seen in Appendixes 3
and 4 respectively.
Table 2: Descriptive Statistics for all children (N = 8600)
Variable       Mean      Std. Dev.   Min      Max
age6_12        0.539     0.498       0        1
age13_15       0.230     0.421       0        1
age16_18       0.230     0.421       0        1
male           0.547     0.498       0        1
bthorder       1.993     1.128       1        13
activity       0.097     0.296       0        1
familysize     6.423     2.490       2        30
lnHH_exp       10.080    0.520       7.711    13.851
headeduc       6.921     5.594       0        44
land           0.740     0.438       0        1
elec           0.917     0.276       0        1
scdist         1.411     2.511       0        9
educ_qual      3.788     2.180       0        10
healthdist     0.414     0.493       0        1
hospdist       0.268     0.443       0        1
chlorf         0.551     0.497       0        1
malaef         0.431     0.495       0        1
ghealthf       0.538     0.499       0        1
watersource    0.641     0.480       0        1
toiletq        0.143     0.350       0        1
hthguid        0.418     0.493       0        1
Before econometric analysis was performed using the data, simple comparisons were
made in order to identify any possible correlation to support the hypothesis. A priori it is
argued that poor health resources lead to poor education outcomes. Therefore it is
expected that children who have not participated in school or who have been
inconsistent in their schooling would live in villages that lacked preventive and curative
care health measures.
Some simple correlations in the data, focusing on two health resources (general health
sessions and source of water), are shown in Figure 2 and Figure 3. Both indicate that this
correlation could exist. Note that participation and consistency have each been modelled as a
binary outcome.
It can be seen from Figure 2 that a higher percentage of children who have participated
in school live in villages with frequent – at least every two months - general health
sessions (57%) compared to children who have not participated in school (43%). In
terms of schooling consistency, though the difference is less pronounced, 60% of
children who have kept up with their schooling live in villages with frequent general
health sessions compared to 56% of children who have fallen behind in their studies.
Source of water and schooling outcomes are compared in Figure 3. Similarly, a greater
percentage of children who have good education outcomes tend to live in villages with
improved sources of water.
Certainly these “naïve” correlations do not take into account any other factors that may
impact on schooling outcomes. However, this preliminary exercise gave some indication
that a relationship may exist, further justified our investigation and necessitated the use
of econometrics to advance our analysis.
Figure 2: General Health Session Frequency and Schooling Outcomes
[Bar chart. Vertical axis: % of children from good health resource village (0 to 70). Participation 57%, Non-Participation 43%, Consistency 60%, Inconsistent 56%.]
Figure 3: Source of Water and Schooling Outcomes
[Bar chart. Vertical axis: % of children from good health resource village (0 to 80). Categories: Participation, Non-Participation, Consistency, Inconsistent. Data labels as extracted (assignment to bars unclear): 64%, 69%, 58%, 66%.]
6. Econometric Approach
6.1 Binary Dependent Variable
Our main hypothesis is that a low level of health resources in a village will increase the
probability that a child falls behind in their schooling. Moreover, it may reduce the
propensity of a child to attend school. As noted earlier, the construction of the SAGE
score means that variations in scores between zero and 100 do not reflect the extent of
schooling “completeness”. Therefore, Ordinary Least Squares (OLS) estimation of
variation in SAGE scores against explanatory variables would not produce meaningful
results. Instead, the nature of the SAGE scores indicates that a binary dependent variable
approach would be appropriate.
A SAGE score equal to or greater than 100 would indicate that a child has kept up with their
schooling and a SAGE score less than 100 would indicate inconsistent schooling for age.
Similarly a positive SAGE score would indicate schooling participation and a SAGE score
of zero would indicate non-participation. Thus two probit models will estimate equation
[1] with a binary dependent variable for schooling participation and schooling consistency.
However, the nature of the data indicates that there may be a sample selection problem
when modelling schooling consistency.
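The coding of the two binary outcomes described above can be sketched as follows. This is only an illustration of the rule stated in the text (SAGE > 0 for participation, SAGE ≥ 100 for consistency); the function name and the example scores are hypothetical and are not taken from the thesis data or its Stata code.

```python
def schooling_outcomes(sage):
    """Map a SAGE score to (participation, consistency) indicators.

    participation = 1 if SAGE > 0, else 0.
    consistency   = 1 if SAGE >= 100, 0 if 0 < SAGE < 100, and None when the
    child never attended school, because consistency is then unobserved.
    """
    if sage < 0:
        raise ValueError("SAGE score cannot be negative")
    participation = 1 if sage > 0 else 0
    if participation == 0:
        return participation, None
    consistency = 1 if sage >= 100 else 0
    return participation, consistency


# Hypothetical scores, for illustration only.
for score in (0, 60, 100, 125):
    print(score, schooling_outcomes(score))
```

Returning None for non-participants makes explicit that consistency is only observed for children who have attended school, which is the source of the sample selection concern.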
Figure 1 showed that a sizable number of the children in the sample have had
no schooling at all. Because inconsistency in schooling presupposes participation in
schooling, ignoring these observations and analysing only the sub-sample of schooled
children could produce inconsistent estimates. This is because the sub-sample of schooled
children may not be randomly selected. Since schooling consistency is only observed if
the child participated in school, sample selectivity bias may arise if the unobserved factors
that drive schooling participation are correlated with those that drive schooling
consistency.
Therefore, to address this suspicion of sample selection and interdependency between
schooling participation and schooling consistency, a bivariate probit model with sample
selection as well as univariate probit models were employed.
6.2 Bivariate Probit Model with Sample Selection
A probit model of the standard form models each schooling outcome of participation and
consistency. Let the superscript * indicate an unobserved or latent variable:
\[
Y_i^* = \beta X_i + e_i
\]
\[
P(Y_i = 1 \mid X_i) = P(Y_i^* > 0 \mid X_i) = P[e_i > -(\beta X_i) \mid X_i] = \Phi(\beta X_i) \qquad [3]
\]
where i is an individual subscript, Xi represents a vector of characteristics that includes a
constant term, ei is a standard-normally distributed error term and Φ(.) is the standard normal
cumulative distribution function. For more details, see Wooldridge (2002). Equation [3] is
estimated by maximum likelihood using the econometric program Stata 9 SE (as are all
estimations in this study).
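As a numerical illustration of the probit probability Φ(βXi), the following minimal sketch evaluates it for one child. The coefficient values and the choice of regressors (a constant, male and lnHH_exp) are hypothetical and chosen only to show the mechanics; they are not estimates from this study.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(beta, x):
    """P(Y = 1 | X) = Phi(beta * X) for one observation."""
    index = sum(b * v for b, v in zip(beta, x))
    return std_normal_cdf(index)


# Hypothetical coefficients on: constant, male, lnHH_exp.
beta = [-2.0, 0.1, 0.25]
# Hypothetical child: constant = 1, male = 1, lnHH_exp = 10.
child = [1.0, 1.0, 10.0]
print(probit_probability(beta, child))
```

Because Φ(.) is non-linear, the effect of a regressor on the probability depends on the values of all the regressors, which is why probit coefficients are not read directly as changes in probability.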
However, because of the possible sample selection bias from estimating the schooling
consistency model, it is necessary to begin the analysis using a two-equation approach in
order to determine if a sample selection problem existed. Thus, a bivariate probit model
with sample selection was employed.
This model consists of two simultaneous equations – one for the selection equation of
attending school, Yi1, and another for the outcome equation of keeping up with schooling,
Yi2. In other words, the econometric model considers two latent variables representing
the propensity of a child to be educated and the propensity of a child to keep up
with their education given their age. Let the superscript * indicate the latent variables, with
the model specified as follows:
\[
\begin{aligned}
Y_{i1}^* &= \beta_1 X_{i1} + e_{i1} \\
Y_{i2}^* &= \beta_2 X_{i2} + e_{i2}
\end{aligned}
\qquad [4]
\]
where i is the individual subscript and Xij, for j = 1,2, are the vectors of individual, household and
village-level characteristics that affect child education outcomes, each including a constant term
(village-level health resources Hi from equation [1] are now included in Xij). The
disturbance terms (ei1, ei2) are assumed to be bivariate normally distributed with zero
means, unit variances and correlation coefficient ρ: (ei1, ei2) ~ BVN(0, 0, 1, 1, ρ).
The modelling strategy is such that the binary choice variable Yi1 takes a value 1 if the
child has had at least one year of education (SAGE score > 0) and 0 if the child has had no
schooling (SAGE score = 0). The second binary variable, Yi2, takes the value 1 if the child
has had consistent schooling (SAGE score ≥ 100) and 0 if he/she has fallen behind for their
age (0 < SAGE score < 100):
\[
Y_{i1} = 1 \text{ if schooling participation } (Y_{i1}^* > 0); \; = 0 \text{ otherwise} \qquad [5a]
\]
\[
Y_{i2} = 1 \text{ if consistent schooling } (Y_{i2}^* > 0); \; = 0 \text{ otherwise} \qquad [5b]
\]
This model is a variant of the standard bivariate probit model with four types of observations (see
Meng & Schmidt, 1985). Also known as a ‘bivariate probit model with partial
observability’, there are three types of observations under this particular model structure: a
child with no schooling, a child with consistent schooling or a child with inconsistent
schooling. The observations of this two-equation probit model can be represented
graphically, where n is the number of observations available for each equation:
Figure 4. Three observations in the bivariate probit model with sample selection
All Children (n = 8600)
    No Participation; Yi1 = 0
    Participation; Yi1 = 1 (n = 6572)
        Inconsistent; Yi2 = 0
        Consistent; Yi2 = 1
The likelihood function is therefore given by:
\[
\ell = \prod \Pr(\text{no schooling}) \cdot \prod \Pr(\text{consistent schooling}) \cdot \prod \Pr(\text{inconsistent schooling})
\]
This implies the log-likelihood function is:
\[
\begin{aligned}
\ln \ell = {} & \sum_{i=1}^{n} (1 - Y_{i1}) \ln\left[1 - \Phi(X_{i1}\beta_1)\right] \\
& + \sum_{i=1}^{n} Y_{i1}(1 - Y_{i2}) \ln\left\{\Phi(X_{i1}\beta_1) - \Phi_2(X_{i1}\beta_1, X_{i2}\beta_2; \rho)\right\} \\
& + \sum_{i=1}^{n} Y_{i1} Y_{i2} \ln \Phi_2(X_{i1}\beta_1, X_{i2}\beta_2; \rho)
\end{aligned}
\qquad [6]
\]
where Φ(·) and Φ2(·,·;ρ) denote the univariate and bivariate standard normal cumulative
distribution functions. Equation [6] is jointly estimated by maximum likelihood using
Stata 9 SE.
This econometric strategy was chosen to deal with the potential sample selection problem.
The need for this model is indicated by the statistical significance of ρ – the correlation
coefficient of the dual equation errors. If ρ is statistically significant, this indicates that
there is a relationship between the two schooling propensities.
Although schooling participation is fully observable, estimating this first probit equation
on its own would produce inefficient estimates when ρ is non-zero. Moreover, the second probit model
would suffer from selectivity bias if only the schooled sub-sample was analysed. However, if
ρ is not statistically different from zero, two univariate probit models to estimate the
probability of schooling participation and consistency following [3] would be appropriate.
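To make the role of ρ concrete, the log-likelihood in equation [6] can be sketched in a few lines. This is an illustrative sketch only, not the Stata routine used in this study: the bivariate normal distribution function is approximated here by Simpson's rule (an assumption made to keep the example self-contained), and the toy data and coefficient values are hypothetical. The toy regressors include land in the participation equation only, mirroring the exclusion restriction.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Univariate standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def Phi2(a, b, rho, steps=2000):
    """Bivariate standard normal CDF P(Z1 <= a, Z2 <= b), requires |rho| < 1.

    Uses Phi2 = integral over x <= a of phi(x) * Phi((b - rho*x)/sqrt(1 - rho^2)),
    approximated by Simpson's rule on [-8, a] (steps must be even).
    """
    lower = -8.0
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lower + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 == 1 else 2)
        total += weight * phi(x) * Phi((b - rho * x) / scale)
    return total * h / 3.0


def index(beta, x):
    """Linear index X * beta for one observation."""
    return sum(b * v for b, v in zip(beta, x))


def selection_loglik(beta1, beta2, rho, X1, X2, y1, y2):
    """Log-likelihood of equation [6] summed over all children."""
    ll = 0.0
    for x1, x2, part, cons in zip(X1, X2, y1, y2):
        a, b = index(beta1, x1), index(beta2, x2)
        if part == 0:                      # no schooling
            ll += math.log(1.0 - Phi(a))
        elif cons == 0:                    # schooled but inconsistent
            ll += math.log(Phi(a) - Phi2(a, b, rho))
        else:                              # schooled and consistent
            ll += math.log(Phi2(a, b, rho))
    return ll


# Hypothetical toy data: X1 = (constant, land), X2 = (constant).
X1 = [[1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
X2 = [[1.0], [1.0], [1.0], [1.0]]
y1 = [0, 1, 1, 1]
y2 = [None, 0, 1, 1]          # consistency unobserved when y1 = 0
beta1, beta2 = [0.5, 0.3], [0.2]
print(selection_loglik(beta1, beta2, 0.4, X1, X2, y1, y2))
print(selection_loglik(beta1, beta2, 0.0, X1, X2, y1, y2))
```

When ρ = 0, Φ2(a, b; 0) = Φ(a)Φ(b), so the log-likelihood separates into a participation probit over all children plus a consistency probit over the schooled children, which is why two univariate probit models are appropriate in that case.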
Identification in a multiple equation probit model has been a source of some debate.
Maddala (1983) stated that in order to identify the second equation, at least one variable
needed to be included in the selection equation that is not included in the outcome
equation. However, Wilde (2000) argues that exclusion restrictions are not needed (and
that Maddala was considering a specific example) if there is sufficient variation in at least
one exogenous regressor in each equation. Essentially, the non-linearities in the probit
models are considered sufficient for identification.
For prudence’s sake, an exclusion restriction was included in this analysis. The variable,
land, was considered a good proxy for a family’s assets and potential to invest in
schooling. This asset proxy is not considered relevant for schooling consistency (with
household income more pertinent). Using Stata, the bivariate probit with sample selection
model did not converge without this identifier and thus its inclusion seemed necessary as
well, despite Wilde’s argument.
6.3 Extension: Random Effects Probit Models
Probit analysis of schooling participation and schooling consistency assumes that each
observation or child is independent. However, the sample consists of 3820 families with
an average of 2.3 children in the sample per family. Therefore there may be some characteristics that are not
specifically in the model but are common to children in the same family and household.
Examples of this clustering effect could be parental competence in assisting their children
in their schoolwork or parental preference for education or health resources. Because
intra-cluster correlation would create biased parameter estimates, an extension of our
econometric models to take into account unobserved household heterogeneity is necessary.
A random effects probit model is used to account for children’s education data being
clustered at the family level. For detailed discussion on the model, see Maddala (1987).
This model considers the household effect to be random. The latent variable (for general
schooling outcomes) is thus of the form:
\[
Y_{ih}^* = \alpha + X_h \beta + Z_{ih} \gamma + v_{ih} \qquad [7]
\]
where i and h are individual and household subscripts respectively. The 1 x K vector Xh
contains the explanatory variables that vary only at the household level. The 1 x L vector Zih
contains the explanatory variables that vary within the households or clusters.
The random effects probit model assumes that the error term vih is composite in nature:
\[
v_{ih} = c_h + u_{ih} \qquad [8]
\]
where i and h are individual and household subscripts respectively, ch is the unobserved
household effect and uih is the idiosyncratic error. It is assumed that uih ~ i.i.d. N(0,1) and
ch ~ N(0, σ2c). Thus,
\[
\mathrm{Var}(v_{ih}) = 1 + \sigma_c^2 \qquad [9]
\]
\[
\rho = \frac{\sigma_c^2}{1 + \sigma_c^2} \qquad [10]
\]
where ρ is the proportion of the error variance that is due to the
unobserved household effect.
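Equations [9] and [10] can be checked with a short calculation. The sketch below computes the composite error variance and the share ρ = σ2c / (1 + σ2c) attributable to the household effect; the variance values used are hypothetical and are not estimates from this study.

```python
def composite_variance(sigma2_c):
    """Var(v_ih) = 1 + sigma_c^2, with the idiosyncratic variance fixed at 1."""
    if sigma2_c < 0:
        raise ValueError("a variance cannot be negative")
    return 1.0 + sigma2_c


def household_share(sigma2_c):
    """rho = sigma_c^2 / (1 + sigma_c^2): share of error variance due to c_h."""
    return sigma2_c / composite_variance(sigma2_c)


# Hypothetical household-effect variances, for illustration only.
for s2 in (0.0, 0.5, 1.0, 3.0):
    print(s2, composite_variance(s2), household_share(s2))
```

A value of ρ close to zero means the household effect contributes little and the pooled probit is adequate, while a value of ρ close to one means most of the unexplained variation is common to children in the same household.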
The random effects probit model makes some strong assumptions, notably that there is no
relationship between the explanatory variables and the unobserved household effect, ch.
This assumption is needed in order to produce consistent estimates. An alternative model
to control for unobserved heterogeneity is the fixed effects model. However, because our
variables of interests are at the non-individual level, using fixed effects would effectively
drop