The impact of health resources on education outcomes in rural India

CANDICE WEI LING TAN

Honours Thesis Bachelor of Commerce (Financial Economics / Business Statistics)

Bachelor of Arts (History)

Supervisor: Dr. Gautam Bose

27th October 2008

Declaration

I hereby declare that this submission is my own work and any contributions or materials

by other authors used in this thesis have been appropriately acknowledged. This thesis has

not been previously submitted to any other university or institution as part of the

requirements for another degree or award.

CANDICE WEI LING TAN

27th October 2008

Acknowledgements

I would like to thank my supervisor Dr. Gautam Bose for his support and assistance

throughout my honours year. An enormous debt of gratitude is owed to Meliyanni Johar

for her willingness to help, her unwavering Stata wisdom and encouragement. I would

also like to thank Professor Denzil Fiebig for his help and insight. Thanks also to Mr.

Hong il Yoo for graciously answering my random econometrics questions and Dr.

Valentyn Pachenko and Dr. Shiko Maruyama for their comments.

Finally, I would like to acknowledge my family: mum, dad and brothers Paul and David

(thank you for your help on this thesis!) who have helped me in their own important way

throughout my studies.

Table of Contents

ABSTRACT……………………………………………………………………………..8

1. INTRODUCTION……………………………………………………………………..9

2. BACKGROUND ........................................................................................................ 13

2.1 THE GLOBAL HEALTH AND EDUCATION CRISIS ................................. 13

2.2 CASE STUDY OF INDIA................................................................................ 15

3. LITERATURE REVIEW ........................................................................................... 17

3.1 THE BENEFITS AND DETERMINANTS OF EDUCATION....................... 17

3.2 EDUCATION AND HEALTH......................................................................... 18

3.3 GENDER BIAS IN EDUCATION AND HEALTH........................................ 21

4. CONCEPTUAL FRAMEWORK ............................................................................... 23

5. DATA ......................................................................................................................... 27

5.1.1 DESCRIPTION.............................................................................................. 27

5.1.2 HANDLING OF DATA ................................................................................ 28

5.1.3 LIMITATIONS OF DATA............................................................................ 29

5.2 DEPENDENT VARIABLE.............................................................................. 29

5.3 EXPLANATORY VARIABLES...................................................................... 33

6. ECONOMETRIC APPROACH ................................................................................. 42

6.1 BINARY DEPENDENT VARIABLE ............................................................. 42

6.2 BIVARIATE PROBIT MODEL WITH SAMPLE SELECTION.................... 43

6.3 EXTENSION: RANDOM EFFECTS PROBIT MODELS.............................. 47

7. EMPIRICAL RESULTS............................................................................................. 49

7.1 BIVARIATE PROBIT WITH SAMPLE SELECTION RESULTS ................ 49

7.2 UNIVARIATE PROBIT MODEL RESULTS ................................................. 51

7.3 GENDER-DISAGGREGATED PROBIT MODEL RESULTS ...................... 58

7.4 PREDICTED PROBABILITIES...................................................................... 64

7.5 DIAGNOSTICS................................................................................................ 67

7.6 ROBUSTNESS CHECKS ................................................................................ 70

7.6.1 RANDOM EFFECTS PROBIT RESULTS........................................................ 70

7.6.2 ALTERNATIVE VARIABLE DEFINITIONS AND SUB SAMPLE ESTIMATION ............................................................................................................. 71

8. CONCLUSION........................................................................................................... 73

9. APPENDIX................................................................................................................. 77

APPENDIX 1: YEARS OF EDUCATION AND AGE DIFFERENCES………..77

APPENDIX 2: QUALITY OF EDUCATION INDEX VARIABLES…………...78

APPENDIX 3: DESCRIPTIVE STATISTICS FOR ALL (PARTICIPATION AND PARTICIPATE)…………………………………………………………………..78

APPENDIX 4: DESCRIPTIVE STATISTICS FOR ALL CHILDREN (CONSISTENT AND INCONSISTENT)………………………………………...79

APPENDIX 5: CALCULATING PARTIAL EFFECTS OF CONTINUOUS VARIABLE ON RESPONSE PROBABILITY…………………………………. 79

APPENDIX 6: GENDER-DISAGGREGATED RANDOM EFFECTS PROBIT RESULTS FOR PARTICIPATION………………………………………………80

APPENDIX 7: GENDER-DISAGGREGATED RANDOM EFFECTS PROBIT RESULTS FOR CONSISTENCY……………………………………………… 81

10. REFERENCES ......................................................................................................... 82

List of Tables

1. DEFINITION OF EXPLANATORY VARIABLES …………………………………………34

2. DESCRIPTIVE STATISTICS FOR ALL SAMPLE CHILDREN …………………………...39

3. BIVARIATE PROBIT WITH SAMPLE SELECTION RESULTS …………………………..50

4. PROBIT MODEL RESULTS FOR SCHOOLING PARTICIPATION ………………………52

5. PROBIT MODEL RESULTS FOR SCHOOLING CONSISTENCY ………………………...54

6. GENDER-DISAGGREGATED PROBIT MODEL RESULTS FOR

SCHOOLING PARTICIPATION ………………………………………………………………..59

7. GENDER-DISAGGREGATED PROBIT MODEL RESULTS FOR

SCHOOLING CONSISTENCY …………………………………………………………………62

8A. PREDICTED PROBABILITIES FOR SCHOOLING PARTICIPATION (MALES) ……....65

8B. PREDICTED PROBABILITIES FOR SCHOOLING PARTICIPATION (FEMALES) ……66

9A. PREDICTED PROBABILITIES FOR SCHOOLING

CONSISTENCY (MALES)………………………………………………………………………66

9B. PREDICTED PROBABILITIES FOR SCHOOLING

CONSISTENCY (FEMALES)…………………………………………………………………... 67

10A. PREDICTION SUCCESSES FROM TABLE 6 (MALES)………………………………...69

10B. PREDICTION SUCCESSES FROM TABLE 7 (MALES) ………………………………..69

11A. PREDICTION SUCCESSES FROM TABLE 6 (FEMALES) ……………………….…….69

11B. PREDICTION SUCCESSES FROM TABLE 7 (FEMALES)……………………………...69

List of Figures

1. DISTRIBUTION OF SAGE SCORES…………………………………………………………32

2. GENERAL HEALTH SESSION FREQUENCY AND SCHOOLING OUTCOMES………...41

3. SOURCE OF WATER AND SCHOOLING OUTCOMES……………………………………41

4. OBSERVATIONS IN A BIVARIATE PROBIT MODEL WITH SAMPLE SELECTION…..45

Abstract

The benefits of education for productivity, growth and development have prompted many

researchers to examine the determinants of schooling outcomes.

Various factors have been considered, including parental education, gender and school

factors. This thesis contributes to the literature by considering another pertinent factor

in education outcomes: the health status of the child.

Many studies have linked the health of a child to their schooling performance because of

the effect that health has on a child’s immune system, cognitive ability and level of

concentration. However, these studies focused on individual and anthropometric indicators

of health. In this thesis, the focus will be on village-level health resources.

Using detailed household survey data from rural India, this thesis investigates the impact

that village infrastructure and resources that promote health, such as clean supplies of

water and proximity to a hospital, have on schooling outcomes. Specifically, it asks whether health

resources impact on the propensity of a child to attend school (participation), as well as

his or her propensity to keep up with their schooling for their age (consistency).

The econometric models used in this thesis take into account the possible relationship

between these two schooling outcomes, as well as unobserved household effects that may

impact on a child’s education. The findings of this study indicate that health resources

have a statistically significant and practically large impact on schooling outcomes after controlling

for a range of individual, household and other village level characteristics. The

importance of preventive care health measures, such as the frequency of general health

lessons, is particularly robust. Furthermore, the effect of health resources is found to be

greater for female education outcomes and indicates that improvement in basic health

provision would also help reduce the pronounced gender bias that exists in education

attainment.

1. Introduction

Human capital investment through education is widely recognized as an important source

of economic growth and productivity for a nation. Indeed governments in developing

nations often divert substantial amounts of their national expenditure to the education

sector because of the positive flow-on effects a more educated population would have on

other sectors of the economy. It has also been considered an effective means to ease

inequality and to improve the opportunities for socio-economic improvement for people in

developing countries (World Bank, 2001).

Despite this, the education levels in developing countries remain low, and improvements

in education outcomes continue to be needed in order to promote and sustain development

in the world’s poorer nations. This issue is even more urgent as educational attainment is

typically the only means for a family to break the poverty cycle. Thus the implications of

schooling are not only considered for present individuals and families, but for future

generations as well.

In India, the importance of education for children has been demonstrated by the many

programs and policies that have been pursued over the last few decades. However,

although improvements have been made, like many other developing countries this

second-most populous nation in the world still faces many challenges and obstacles to

favourable schooling outcomes that need to be addressed.

Because of the known benefits of education attainment, myriad studies have sought to

determine the factors that contribute to improved schooling outcomes. This would then aid

and direct public policy initiatives and investments to strengthen this vital area of

development. Individual and household characteristics such as gender, household income

and parental schooling levels are among the many factors that education papers have

focused on. In the same spirit, my thesis will also analyse the determinants of

education outcomes for children using detailed survey data from India. However, my

focus is on another potential determinant of schooling that deserves deeper analysis – the

health status of a child.

The topics of health and education have been addressed extensively in the economic

literature and from many perspectives. This study will take the position that health is an

important determinant of a student’s ability to acquire education: a healthier child

will be more productive and capable in class, which will positively impact on their

schooling outcome. Indeed the importance of health and nutrition on the overall welfare of

a growing child has been widely acknowledged. From its impact on cognitive and learning

ability, to food absorption and other deficiencies, the potential impact of health on

schooling outcomes has gained considerable attention.

Many studies have linked the nutritional status of children to educational achievements

using data from various developing countries. They find that the health status of a child

does have significant impact on their education outcomes, which is either measured by

participation or test score variations. However, these papers have focused predominantly

on individual-level and anthropometric indicators for a child’s health, such as height-to-

age or weight-to-age measures. In contrast, this study will focus on village-level health

resources and infrastructure as a measure of a child’s health status.

It is hypothesized that a child living in a village that is well endowed with quality health

resources is more likely to achieve favourable education outcomes compared to a child

from a village that lacks proper health services. Although the World Health Organization

(WHO) and the United Nations (UN), among others, have frequently cited unsafe water supply and

sanitation as having severe implications for development, analysis of these and other

health resources on child education outcomes has not been seriously undertaken.

Using nationally representative household and village survey data from rural India, this

thesis will investigate the extent to which health services and infrastructure in a village

affect education outcomes. Two measures of education outcomes will be examined:

schooling participation as well as schooling consistency (the extent that, once in school, a

child keeps up with their education for their age). Although schooling participation in

India has experienced vast improvement over the decades, improvement in the quality of

the education received by children remains critical (WHO, 2000). Thus, determining

schooling consistency is considered another important education outcome and one that has

not been commonly addressed in the related literature.

Limited dependent variable models will be employed to estimate this relationship.

Although univariate probit models were used for the main results of this thesis, other

econometric strategies were employed in order to account for two possible sample

problems in analysing schooling outcomes. First, almost a quarter of the children in the

sample had not participated in the school system. If a child’s propensity to participate in

school and a child’s propensity to have consistent schooling were correlated, simply

estimating schooling consistency on children who have participated in school would

amount to endogenous sample selection. Studies that have examined these two schooling

tendencies have often assumed independence between the two, yet from a

methodological standpoint this possible bias should be addressed in the initial stage of the

econometric study. Thus, a bivariate probit model with sample selection is estimated.

Second, because many children in the sample are from the same family, common factors

may affect siblings in the same household. I employ random effects probit models in

order to take account of this potential unobserved household heterogeneity and clustering

effect.
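The two strategies just described can be sketched as follows. The notation is assumed here purely for illustration: the symbols $P_i$, $C_i$, $x_i$, $z_i$, $u_h$ and $\rho$ are not taken from the thesis's own exposition, and the log-likelihood shown is the standard textbook form for a probit model with sample selection rather than a statement of the exact specification estimated.

```latex
% Illustrative sketch only; all notation is assumed. Requires amsmath.
% Child i lives in household h. P = participation, C = consistency.

% (1) Bivariate probit with sample selection.
% Participation is observed for every child; consistency only if P_i = 1.
\begin{align*}
P_i^{*} &= x_i'\beta_1 + \varepsilon_{1i}, & P_i &= \mathbf{1}[P_i^{*} > 0], \\
C_i^{*} &= z_i'\beta_2 + \varepsilon_{2i}, & C_i &= \mathbf{1}[C_i^{*} > 0]
  \quad \text{(observed only if } P_i = 1\text{)},
\end{align*}
\[
(\varepsilon_{1i}, \varepsilon_{2i}) \sim
N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
          \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right).
\]
% Log-likelihood: non-participants, consistent participants,
% and inconsistent participants contribute the three sums in turn.
\begin{align*}
\ln L ={}& \sum_{P_i = 0} \ln \Phi(-x_i'\beta_1)
  + \sum_{P_i = 1,\, C_i = 1} \ln \Phi_2(x_i'\beta_1,\, z_i'\beta_2;\, \rho) \\
 &+ \sum_{P_i = 1,\, C_i = 0} \ln \Phi_2(x_i'\beta_1,\, -z_i'\beta_2;\, -\rho).
\end{align*}

% (2) Random effects probit: u_h is a household effect shared by siblings.
\begin{align*}
y_{ih}^{*} &= x_{ih}'\beta + u_h + e_{ih}, \qquad
u_h \sim N(0, \sigma_u^2), \quad e_{ih} \sim N(0, 1), \\
\operatorname{Corr}(u_h + e_{ih},\, u_h + e_{jh}) &= \frac{\sigma_u^2}{1 + \sigma_u^2}
  \quad (i \neq j).
\end{align*}
```

Under this sketch, $\rho = 0$ corresponds to the independence assumption made in earlier studies, in which case schooling consistency could be estimated by a univariate probit on participants alone. Similarly, $\sigma_u^2 = 0$ would mean that siblings share no unobserved household effect.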

In addition to the investigation of the impact of health resources on education outcomes,

the determination of a gender bias in not only education, but also health and other factors

is considered as well. The gender bias in education has been well documented and studies

have typically included a single gender dummy in order to capture this effect. My analysis

will split the sample into gender-disaggregated sub samples in the hope of capturing other

biases that may arise. In particular, studies have shown that there is a gender bias in health

outcomes in developing countries. Where health resources are scarce or poor, females

tend to be more disadvantaged than males. This implies that an improvement in health

resources would have a greater impact on females, which could, in turn, help reduce the gender

gap in education outcomes.

The benefits of this rich cross sectional dataset are that several health resources variables

are available to use as a measure of health status, as well as a range of individual,

household and village level data to control for other factors that may affect schooling

outcomes. Accordingly, this thesis will contribute to the literature in two main aspects.

First, it will add to the studies that examine the determinants of education outcomes by

focusing on the health status of the child, as well as providing fresh analysis into the effect

of other commonly considered schooling determinants.

Second, building from the literature that links individual nutritional and health status of a

child to their schooling outcomes, this thesis will investigate the impact of village-level

health resources on education outcomes. Compared to studies from the individual health perspective

that have analysed variation in calorie intake or height-for-age scores, the results from this

thesis on public infrastructure and resources may allow a more direct policy implication

and suggestion for public expenditure and investment.

The remainder of this thesis is structured as follows. Section 2 presents some background

on education and health in developing countries and considers the case study of India.

Section 3 reviews the literature on education and health with emphasis placed on the

studies that link health to education outcomes. Section 4 presents the conceptual

framework of the investigation. Section 5 describes the cross-sectional household and village

data from India used for the empirical analysis. Section 6 develops the econometric strategies

employed. Section 7 presents and discusses the empirical results that seem to

affirm the hypothesis; in particular, preventive care health measures have a statistically

and practically significant impact on schooling outcomes, most notably the

positive effect of general health sessions and information. Section 8 concludes with

suggestions for future research.

2. Background

2.1 The Global Health and Education Crisis

There are many paths to development but the focal areas of this thesis are health and

education. The importance of health and education was stated in the historic 1948 United

Nations Universal Declaration of Human Rights.¹

Everyone has the right to a standard of living adequate for the health and

well-being of himself and of his family.

(Article 25)

Everyone has the right to an education…directed to the full development of the

human personality and to the strengthening of respect for human rights and

fundamental freedoms.

(Article 26)

These fundamental rights for human beings are basic for socio-economic improvement.

Indeed, improving the lives of poor and disadvantaged people has always been a worthy

objective in both developing and developed nations. However, this task takes on greater

urgency in developing countries where poverty and underdevelopment are widespread,

with potential to endure and persist through generations.

More recently in 2000, the United Nations set out eight international development goals to

be achieved by the global community. The United Nations and its 189 member states

agreed to make a concerted effort in meeting these United Nations Millennium

Development Goals (MDGs) by the year 2015.² The goals are:

1 http://www.un.org/Overview/rights.html

2 http://www.un.org/millenniumgoals/

The United Nations Millennium Development Goals

1. Eradicate extreme poverty and hunger

2. Achieve universal primary education

3. Promote gender equality and empower women

4. Reduce child mortality

5. Improve maternal health

6. Combat HIV/AIDS, malaria and other diseases

7. Ensure environmental sustainability

8. Develop a global partnership for development

Providing basic health resources and achieving good education outcomes would greatly

assist in achieving a majority of these goals. Moreover, the poor state of education and

health resources in the world today, largely in developing countries, further highlights the

need for improvement in these vital areas. Progress reports from developing nations

indicate that, contrary to the ideals espoused in 1948, the provision of proper health care

and education are human rights that remain far from being universal.

The United Nations Educational, Scientific and Cultural Organization (UNESCO)

reported that 113 million school-age children around the world are not in school

(UNESCO, 2002). This deficiency is particularly pronounced in developing nations.

Although education is widely acknowledged as an important source of development, a key

to poverty reduction and thus a priority for developing nations (World Bank, 2001), the

acquisition of education in these areas is not guaranteed. In fact it is estimated that 46 per

cent of people in developing countries are illiterate, 25 per cent of children aged 6-12 years

do not receive primary education and 80 per cent of children aged 13-18 years do not

receive secondary education (Todaro, 2000).

The situation of health care and provision lies in a similarly dismal state in developing

countries. Even in the 21st century, 11 million children die each year from preventable

illnesses (UNICEF, 2002). A lack of clean water supply, poor sanitation practices and

poor health care contribute to the spread of infections and diseases that, though

inconsequential in developed countries, prove devastating in developing areas. In fact, 1.3

million people die every year of malaria, 1.8 million people die of diarrheal diseases and

90% of these deaths are children under the age of five (WHO, 2004). The

relative ease in the prevention and cure of these low-level ailments further highlights the

severity of the health crisis that faces poor countries in the world.

2.2 Case Study of India

Achieving favourable education and health outcomes is of great importance in a nation

like India, the second most populous nation in the world. The United Nations Millennium

Development Goals are goals that the Indian government has pledged to achieve in order

to improve the livelihoods of its people. Finance Minister P. Chidambaram expressed this

commitment during his budget speech in 2004:

“The countries of the world, India included, have set for themselves the Millennium

Development Goals. Our date with destiny is not at the end of the millennium, but in the

year 2015. Will we achieve those goals? In the eleven years that remain, it is in our hands

to shape our destiny.”³

On one hand, great improvements in education and health have been achieved in India

over the last few decades. Policies and initiatives, from the Integrated Child

Development Services (ICDS) in 1975 to the recently launched National Rural Health

Mission, as well as high economic growth in recent years, have helped to foster improved

outcomes in these two vital areas of development. Yet, serious challenges still remain and

need to be addressed in order to achieve sustained growth and development.

3 Sachs, J. (2005). “The End of Poverty: How we can make it happen in our lifetime”, Penguin Books, Great Britain. p. 185.

According to the World Bank, the number of Indian children not in school has been

reduced from 25 million in 2003 to 9.6 million in 2005-06. More equity in schooling has

also been achieved and the gaps across gender and social status (through castes) have also

been reduced.⁴ However, challenges in education still persist. Although there have been

improvements in schooling participation, the quality of education outcomes has not

experienced the same progress, with incomplete schooling and drop-outs a considerable

problem.

In terms of health, India is also lagging behind in both the provision and quality of health

resources. In fact, India has one of the highest percentages of undernourished children in

the world with approximately 60 million children classified as being undernourished

(Gragnolati et al., 2005). According to UNICEF, in 2006 some 2.1 million children under

the age of five died in India and this figure has been attributed to India’s poor state of

health care and delivery (UNICEF, 2006).

Moreover, in their report on undernourished children in India, Gragnolati et al. (2005)

argue that the country’s child malnutrition problem persists in part because the focus on

improving nutrition has primarily been on food intake. However, they highlight the role

that infections and ill-informed health practices have had on India’s malnutrition epidemic.

Although great improvements in the provision of improved drinking-water sources have

been achieved (reaching 86% of the Indian population in 2004), unimproved sanitation and lack of

general health knowledge in the population remain significant public health threats for

much of the population (UNICEF, 2002). The extent to which such bleak health conditions

contribute to poor education outcomes forms the basis of this investigation.

4 World Bank (2008) India Country Overview 2008. See References.

3. Literature Review

3.1 The Benefits and Determinants of Education

Human capital formation through education is fundamental for development and progress.

This is a particularly important issue in developing countries where the poor may

experience persistent inequality and poverty because of credit constraints and lack of

opportunity for socio-economic improvement. Through higher post-school earnings, a

good education provides the opportunity for intergenerational income mobility and a

breaking of the poverty cycle (Restuccia & Urrutia, 2004). Conversely, Behrman (1990)

and Bedi and Gaston (1997) argue that children who are poorly educated may have low

productivity in adult life and end up in poverty.

However, free provision of education does not necessarily mean free consumption and a

vast amount of literature has been devoted to the study of schooling determinants in

developing countries (Dreze and Kingdon, 2000; Duraisamy, 1992; Sipahimalani, 1997

among others). Dreze and Kingdon (2000) examined schooling participation and grade

attainment in rural north India. Using household survey data, they employed a logit model

for estimating participation and an ordered logit model for estimating the determinants of

grade attainment (three outcomes: not enrolled, enrolled but not completed, and

completed primary schooling) for separate as well as pooled samples of female and male

children of primary age. They found that a range of individual and household

variables affect participation and grade attainment in school. They particularly

highlighted the role that parental education and school characteristics play in schooling

outcomes. Mid-day meals were found to be particularly effective in

improving participation in school for girls.

Other studies on the determinants of education outcomes have also considered a range of

variables but focus on some key characteristics. For example, Blau and Grossberg (1992)

find mother’s education to be an important determinant, while Brown and Park

(2002) focus on the role that wealth and credit constraints have on schooling investments.

Birth order and family size are also commonly found to be significant determinants for

schooling outcomes. It is from these studies on schooling determinants that the choice of

explanatory variables in the model will be drawn.

Another factor that has been considered in relation to education attainment is the role of

health. The literature on education and health is also quite extensive. Some studies have

examined them as separate inputs for growth and development. This study aims to

investigate a relationship between the two.

3.2 Education and Health

Studies have compared the benefits of investing in education with the benefits of investing

in health in the hopes of directing public policy in the area that would most benefit

economic growth. Knowles and Owen (1995) found that health has a greater impact on

economic growth than education. Life expectancy, used as a proxy for health capital,

was found to have a statistically and practically significant impact on income per capita

compared to education, and the authors highlighted the importance of including health capital in

models of growth.

Webber (2002) is even more emphatic about the apparent trade-off between education and

health in his paper subtitled “should we invest in health or education?” His question is

answered using cross sectional data from 46 countries. Webber finds that his proxy for

health, as measured by the intake of calories per head, has a statistically insignificant

effect on economic growth, in contrast to education. He concluded that the results support

the notion that investing in health has lower returns for a nation than investing in

education. His suggestions for future research, however, are to investigate other proxies

for health, in particular health infrastructure such as the supply of clean water and quality

of health care.

Economic growth studies have thus viewed education and health as two separate,

exogenous and almost opposing inputs for growth and development. This study, however,

will instead consider the intricate relationship between these two important sources of

social capital whilst focusing on individual education and health outcomes.

Both channels of the health and education relationship have been examined. Although this

study is investigating the impact of health status on education outcomes, the opposite

direction of education’s effect on the health status of an individual has also been of

interest – that is, the health benefits of acquiring education. Mushkin (1962) argues that

ignorance delays medical treatment and subsequently increases the strength of infection

and disease. The inability to read and understand medicinal information and innovations

could also be adverse to a person’s health. More educated people could also be more likely

to be employed in “safer” white-collar occupations with fewer health risks and generally

pursue activities that do not endanger their health (Case, 2002; Caldwell, 1986).

However, this direction of the education and health relationship treats education as human

capital already acquired. Since education attainment (and the ensuing positive flow-on

effects) is not guaranteed in developing countries, the reverse relationship is of interest in

this study. The channel that this thesis will investigate – namely, the extent to which health

status impacts on education outcomes – is analogous to the impact of worker productivity

on output. In other words, we investigate the impact of the health of a child on his or her

capacity for participating and exerting effort in school.

The link between health and education attainment has been well established. Using height-

to-age, weight-to-age and other anthropometric measures for child health, research across

a range of developing countries has shown that variations in these indicators have a

significant impact on schooling outcomes.

The health and nutritional status of a child has been shown to determine the propensity of

a child to participate in school. In their study of Nepalese children, Moock and Leslie

(1986) examined the effect of nutrition status – as measured by height-for-age, weight-for-

age and weight-for-height – on both schooling participation and grade attainment. They

estimated a probit model for schooling participation and found that children with better

nutritional status had a significantly higher probability of attending school compared to

those with stunted growth. In terms of grade attainment, their ordinary least squares (OLS)

results also come to the same conclusion on the benefits of nutrition on schooling

outcomes. However, only 15 per cent of the 350 primary school aged children actually

participated in school. Therefore their analysis of grade attainment for this 15 per cent of

the sample does not take into account possible sample selection into participation. This

will be addressed in my study, as the proportion of non-participants is also non-trivial.

Addressing schooling participation, Glewwe and Jacoby (1995) found that children in

Ghana delayed enrolment in school and also completed fewer years in schooling because

of malnutrition and poor health (measured by height-for-age). Because of the negative

impact delayed schooling would have on post-school labour earnings, the authors

emphasise the importance of child health and nutrition. However, after they control for

unobserved family variables using random and fixed effects estimation, the effect of

health is substantially reduced. This paper highlights the importance of accounting for

unobserved factors that may affect the analysis.

Studies also link poor health status to poor achievements by children in school as

measured by variation in test scores. Gorman and Pollitt (1993) found that children with

better nutrition in Guatemala performed better in cognitive and other school tests.

Similarly, a study in the Philippines also found that a one standard deviation increase in

early-age child health increased test scores by almost a third of a standard deviation

(Glewwe & King, 2001). Thus these studies have indicated that the health status of a child

can impact on their ability to acquire education through its consequence on concentration,

cognitive and physical ability. Subsequently this will impact on the quantity and quality of

their education and potential for socio-economic improvement.

However, these studies on health and education have largely focused on the impact of

individual health status on education outcomes. Because of this, these papers have

primarily used two-stage least squares to account for the possible endogeneity of a

child’s health with respect to their education outcomes. This is because, as

mentioned previously, the relationship between health and education can be viewed from

both directions and thus this issue would arise. Less focus, however, has been placed on

the role of village health infrastructure and resources as a determinant of health status, and

its subsequent impact on education outcomes. By using village level health resources as

an explanatory variable, the threat of endogeneity is minimized. That is, a child’s

education outcome may impact on their individual health status, but it is unlikely that a

child’s education outcome would impact on the village’s level of health infrastructure.

It has been established that there exists a relationship between the quality of sanitation and

water and other village-level health resources, and the health status of households,

particularly children. For example, Esrey (1996) found that improved water seemed to

decrease the prevalence of diarrhoea in children by 6 percentage points when analysed

across different countries. However, the extent to which these village-level health

resources impact on education outcomes has not been rigorously addressed and it is this

gap in the literature that my thesis hopes to fill. While studies have included physical

infrastructure variables in their models that could proxy for health resources in the village,

it is usually a single indicator that is used to measure the overall level of development in

the village. The presence of piped water, for example, is a common proxy that has been

used in studies of education outcomes (for example, Psacharopoulos and Arriagada, 1989;

Holmes, 1999). The innovation in this paper is that besides controlling for village level

development, various other health resource variables will be included in order to isolate any

causal relationship between health resources and education outcomes.

3.3 Gender Bias in Education and Health

Gender bias in education outcomes has been a keen and important area of interest in the

education literature (Lavy et al 1996; King and Lillard, 1987 among others), which

indicates there is a sharp disparity in female and male schooling outcomes. South Asia is a

region well known for its strong male preferences and discrimination against females and

this inequality has also been viewed from a health perspective. For example, Rosenzweig

and Schultz (1982) and Dasgupta (1987) found that there were significant gender

differences in household health care and resource expenditures in India as a result of

perceived differences in future earning abilities. Studies of Pakistan found that boys

received preferential treatment over girls with respect to treatment for illnesses such as

diarrhoea and fevers as well as acute respiratory infections (Mahmood and Mahmood,

1995; Filmer et al, 1998). Such bias has implications for childbirth and thus the health of

future generations – further highlighting the importance of reducing such inequality.

Females may also be given the task of taking care of ill family members if their

opportunity cost of being outside the home is considered lower than a male counterpart.

Therefore this issue of gender bias will also be addressed in this study, not only from a

schooling outcomes perspective but also in terms of health outcomes.

This thesis will contribute to the literature on the determinants of schooling by analysing a

range of individual, household and village characteristics. This area of research is

particularly important for developing countries in which education attainment is

considered a key to development and poverty reduction. Furthermore, this study will

extend the literature by focusing on a range of village-level health resources as a measure

of health status and nutrition that has not been seriously addressed. The empirical methods

employed in this paper will also take into account the possible sample selection of

schooling participants that tends to be ignored in the literature, as well as accounting for

unobserved household effects that could impact on the schooling outcomes for children in

the same family. Finally, gender bias in education and health outcomes will also be

examined.

4. Conceptual Framework

The primary hypothesis of this thesis is that village-level infrastructure and resources that

promote health lead, through improved health conditions, to more favourable education

outcomes. ‘Education outcomes’ here reflect schooling participation as well as schooling

consistency (the extent to which a student keeps up with schooling according to their age). In

terms of health resources, these can be categorized into three broad areas (De Ferranti,

1985):

Preventive care (patient-related): this includes services that are performed

on well patients in order to reduce the incidence of adverse health events like

gastro-intestinal infections, diarrhoea and malaria. They would

include measures such as food supplements, malaria shots and other

vaccinations.

Preventive care (non-patient related): this includes services that are

provided in a community in order to control the spread of disease and

infections. These include resources such as clean water, proper sanitation and

the promotion of good health habits and hygiene.

Curative care: these include resources such as hospitals and health

facilities, medical practitioners or traditional healers that act to contain and

ease illnesses after they occur.

The ailments that plague school aged children, particularly in developing countries, are

typically common “low level” diseases that tend to be easily preventable (the occurrence

of diarrhoea, for example). The prevention and treatment of more complex ailments, such

as malaria, are also well known. This indicates the importance of investing in basic health

care and resources.

I posit that a poorer endowment of health resources and measures in a community would

lead to adverse effects on the education outcomes of the children living in that area. In a

poor health-resourced environment, children are more likely to be afflicted periodically

with low-level ailments that would cause temporary debilitation. The period of time that

they remain indisposed depends on the severity of the illness, capacities of family care as

well as the curative care services available in the area. Therefore the hypothesis of this

thesis is that the health status of a child and their incidence of sicknesses will then impact

on their education outcomes by affecting their ability to perform and succeed in school. It

will also impact on the probability of school entry.

It is hypothesized that the frequency of illnesses in the years preceding schooling age

reduces the probability of school participation. Due to the opportunity cost of spending

time in school, as opposed to utilizing that time in the home or engaging in paid work, the

payoff from schooling needs to outweigh the cost of the invested time (besides other

pecuniary schooling costs). This opportunity cost is particularly high in developing

countries where the resources in households can be heavily constrained.

However, time spent in school is more effective over consistent time periods rather than

short bouts of learning. Consistent attendance in school increases the productivity and

learning ability of the child and also gives them the opportunity to understand ideas and

concepts that would be important for higher level learning. Therefore because schooling

requires a long term and consistent time investment in order to provide “profitable”

returns (in terms of potential earning ability in the future), a child who is prone to illness

and will consequently have a transient presence in the classroom may be more productive

in non-schooling activities or work (where returns require a less consistent time

investment). Alternatively, the poor health status of a child may simply demand time and

caring in the home rather than in the classroom. As such, a hypothesis of this study is that

there exists a positive relationship between the probability of a child participating in

school and the availability and quality of public health infrastructure in the village.

Similarly, I hypothesize that health status impacts on children who are already enrolled in

school by affecting their probability of keeping up with their studies. The frequency with

which the child falls ill, and the extent that they remain ill without timely treatment,

determines the probability that he or she successfully completes their schooling. Again, by

missing classes due to illnesses, a child misses out on learning concepts that would be

used to understand more difficult material. Poor health and nutrition would also affect a

child’s cognitive ability and capacity for learning and concentrating in the classroom. This

would increase the probability that they fall behind in their schooling.

Another consequence of a village being endowed with poor health resources is that the

health status of other family members, besides the school-aged children, would be poor.

This could place more responsibility on the children to take care of their ill siblings or

elders and give less priority to attending or keeping up with their schooling.

Therefore this relationship between health resources and education outcomes can be

expressed as a reduced form achievement function:

$A_i = \beta_1 Z_i + \beta_2 H_i + v_k$ [1]

where $A_i$ is the education outcome of child $i$, $Z_i$ is a vector of individual, household and

other village-level characteristics that affect education outcomes, $H_i$ includes the

village-level health resources available to child $i$, and $v_k$ is a random disturbance term that

includes unobservable characteristics that would affect schooling outcomes, for example,

a child’s innate ability.

Therefore my theory is that the education outcome of a child is, among other things, a

function of the health resources in the village, through their effects on the health status of the

child. Implicit in this theory is that all families in a “good health” community will utilize

the health resources available and alternatively, “bad” health resources will adversely

impact on all families and the health status of the children in that community. This may be

considered a strong assumption if some health measures are not available to all families in

the village. However, the nature of public infrastructure and services is that when they are

in place, they are available to all residents regardless of the economic means of the

individual family. For health improvements in particular, the benefits to a whole

community or village have been well affirmed.

Bundy et al (1990) identified a transmission effect from treating

diseases for school age children to the rest of the adult community. Hughes et al (2000)

and Alderman et al (2001) also found that the impact of village level health resources had

spill over effects on the entire village. Therefore I assume that a village with good health

resources would similarly have positive externalities for the wider community. Likewise,

a poor health-resourced village would be more conducive to infectious disease and illness

that could permeate throughout the village because of interactions and poor practices

across families.

5. Data

5.1.1 Description

The data used in this study was collected by the National Council of Applied Economic

Research (India) in 1999. This ARIS-REDS data comes from a nationally representative

sample of rural Indian villages and households. This rich cross-sectional dataset is

appropriate for this study as it provides detailed individual, household and village-level

data across a range of socio-economic characteristics including health and education

information for every family member.

The ARIS-REDS surveys were conducted in several rounds: 1969, 1970, 1971, 1982 and

1999. Because of the time elapsed between the survey rounds, the composition of

households and villages changed significantly, which makes longitudinal analysis of

the households and villages impractical. The 1999 data was

chosen for this analysis as it is the most recent, the data is in Stata format and the

directories and identifications are presented in a clear layout (relative to the earlier

datasets in which much of the information on scanned photocopies lacked clarity). This

dataset covers 9298 families, consisting of 44,999 individuals across 253 Indian villages

and within 16 rural states.

Although the majority of the 1999 data is available online (see footnote 5), merging the village and

household data needed village identifiers that were suppressed for privacy concerns.

Professor Andrew Foster of the Department of Economics and Community Health at

Brown University manages the full ARIS-REDS data. After obtaining approval from my

faculty’s Human Research Ethics Advisory Panel and Professor Foster, the secure data

allowed full merging of the relevant data decks.

5 http://adfdell.pstc.brown.edu/arisreds_data/

5.1.2 Handling of Data

The sample consists of several decks of information. If a family provided answers for

the deck 2 questionnaire (referring to household composition), then because of full

enumeration, more detailed information for all members of the family was available in

subsequent decks. Since the data for this study needed to be merged across household

and individual levels, deck 2 provided an overview of the family and was considered the

“master” deck. A household identification (ID) number and specific member ID

matched more detailed information about the child such as years of education. For

example, the master deck contains information about a family with ID 7072 with three

children of schooling age. These children had specific IDs that were matched with more

detailed information in deck 6 (sons) and deck 7 (daughters).

However, there were instances of ID inconsistency across individuals within the family.

For example, a son with an ID of “4” in the master deck may not match his

specific ID in deck 6. These inconsistencies can be attributed to the nature of

survey data and its susceptibility to human error. In such cases, manual

re-identifications were necessary. Although a tedious process, this ensured that the data

remained consistent and informative for this analysis. Finally, the village data needed to

be merged with the master deck. Secure village IDs were matched with coded identifiers

in the master deck. Thus a fully merged dataset that consisted of individual information

within families and across different villages was constructed for analysis.
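The merging steps described above can be sketched as follows. This is only an illustration in Python, not the procedure actually used (the thesis work was done in Stata). All field names (hh_id, member_id, village_code, educ_years, electrified) and the sample records are hypothetical.

```python
# Hypothetical records; field names and values are illustrative only.
master = [  # deck 2 ("master" deck): one row per household member
    {"hh_id": 7072, "member_id": 3, "village_code": "V01", "age": 12},
    {"hh_id": 7072, "member_id": 4, "village_code": "V01", "age": 9},
]
sons = [  # deck 6: more detailed information on sons
    {"hh_id": 7072, "member_id": 3, "educ_years": 6},
]
villages = {"V01": {"electrified": 1}}  # village deck keyed on village identifier


def merge_decks(master, detail, villages):
    """Join detail and village records onto the master deck by ID."""
    detail_by_key = {(r["hh_id"], r["member_id"]): r for r in detail}
    merged, unmatched = [], []
    for row in master:
        key = (row["hh_id"], row["member_id"])
        if key not in detail_by_key:
            # No matching member ID: a candidate for manual re-identification.
            unmatched.append(key)
            continue
        village = villages[row["village_code"]]
        merged.append({**row, **detail_by_key[key], **village})
    return merged, unmatched


merged, unmatched = merge_decks(master, sons, villages)
```

Keeping the unmatched keys in a separate list mirrors the manual re-identification step: those are the records whose member ID in the master deck has no counterpart in the detailed deck.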

My study is based on a sub sample of children of the schooling age 6-18 years.

Moreover, we are interested in children who were alive at the time of the survey as well

as children who were living in the family. Therefore the sample did not include children

listed in the more detailed decks that were not indicated in the master deck. These

children were generally older, married and lived away from the family village. I was

interested only in children living in the family village as the village-level health

resources may only then have relevant impact on their education outcomes. Finally,

children of families for which education and other household information were missing

were excluded from the analysis. Therefore families with children, as indicated in the

master deck, that did not have more detailed information in the other decks were

excluded. This was done under the assumption that this information was missing at

random and thus their exclusion would not adversely affect the results of our analysis.

Therefore our analysis was reduced to a sub sample of 8,668 children of the schooling

age 6-18 years.

5.1.3 Limitations of Data

There are limitations to the information contained in this dataset. First, for village data,

information on hospitals and schools is only available for the main institution in the

village. That is, although there may be more than one school in the village, detailed data

about the number of qualified teachers or availability of textbooks is available for the

representative school in the village only. This means that it has to be assumed that the

main school in the village is representative in terms of quality of the other schools. This

assumption has to also be placed on the village information available for hospitals and

health centres.

Second, the cross-sectional nature of the dataset requires the assumption that children within the

schooling range of 6-18 years had access to the observed schooling and health resources as they

were maturing. That is, an 18 year old who is shown in the data to reside in a village

with good health resources is assumed to have had this quality of health resources as he

or she was growing up. Because of India’s emphasis on improving health resources as

early as the 1950s but more so in early 1980s, this assumption is considered acceptable

and necessary given the scope and nature of the data.

5.2 Dependent Variable

There are different ways to measure education outcomes. Some studies have utilized and

examined variations in standardized test scores as a measure of education attainment

(for example, Glewwe & King, 2001; Jamison and Lockheed, 1987). Arguably, this may

not be an adequate reflection of education achievement as schooling provides a child

with opportunities of learning social skills through interaction and other positive

externalities. Moreover, test scores reflect only students who are already enrolled in school –

that is, human capital already acquired. However as aforementioned, in many

developing countries including India, participating in the education system is not

guaranteed or universal but is affected by various household and socio-economic factors.

Therefore participation in school and consistency in school are considered as measures

of endogenous education outcomes for this analysis.

Measuring schooling participation is uncomplicated. A child who reports at least one

year of education is considered a school participant. For measuring schooling

consistency, however, simply comparing the number of schooling years attained across

different ages would be erroneous. That is, a 5 year old would have fewer years of

schooling than an 18 year old simply because of their age difference. Although

controlling for age could correct for this, a crux of this study is measuring the

completeness or consistency of schooling outcomes. Therefore whether a child keeps up

with their schooling, given their age, is of particular interest. In order to analyse these

variations in education for children across different ages, a standardised measure that

captures both participation and completion is used.

A standardised measure for education outcomes, called the SAGE score (schooling for

age), has been used in other studies and is a useful measure because it controls for

different aged children and encompasses both schooling participation and years of

schooling completed (Patrinos and Psacharopoulos, 1997; Gitter & Barham, 1999). The

SAGE score is calculated as follows:

$\mathrm{SAGE} = \dfrac{S}{A - E} \times 100$ [2]

S is the total number of years completed, A is the age of the child, and E is the age that

children officially begin schooling. The age at which children start school in India is 5

years. Thus a SAGE score of 100 would mean that a child’s education is consistent

for their age, whilst a score less than 100 would mean that they have missed some years

of schooling or not participated at all (a SAGE score of zero). Although the official

schooling age is 5 years, children may begin some sort of pre schooling and accordingly

SAGE scores greater than 100 are possible.
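The calculation in equation [2] can be sketched as a small function. This is a minimal illustration, assuming the official entry age of 5 stated above; the function name and the example inputs are hypothetical.

```python
def sage_score(years_completed, age, entry_age=5):
    """Schooling-for-age score: S / (A - E) * 100, as in equation [2]."""
    if age <= entry_age:
        # The score is undefined at or below the official entry age.
        raise ValueError("age must exceed the official school entry age")
    return years_completed / (age - entry_age) * 100


print(sage_score(5, 10))  # schooling consistent with age: 100.0
print(sage_score(0, 10))  # never participated: 0.0
print(sage_score(7, 11))  # more years than expected for age: above 100
```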

Again, survey data are not free from error. The dataset used contained discrepancies in

the child’s age and years of education that needed to be accounted for. Although these

differences are the interest of this study, including observations in which the difference

is obviously a case of human error could be adverse to our analysis and lead to biased

results. Therefore, observations with SAGE scores that were confidently regarded as a result

of human error were altered accordingly. For example, a child aged 10 years is purported

to have 40 years of education – a difference of 30 years. Changing the education years

to ‘4’ seemed reasonable. Two other observations were changed in this way as their

differences in age and years of education were unrealistically great in magnitude (and at

the same time easy to infer the correct value). Note that the results did not change with

the exclusion of these three observations. However, there were cases of ambiguity in

which the difference in age and years of education were very small.

The minimum acceptable difference between age and years of education in this analysis is three years. This means a

6-year-old child having three years of education is considered reasonable since children

may have had pre schooling before the official schooling age. Moreover, the

observations with differences of less than three years constituted only 0.8 per cent or 68

observations of the entire sample and thus culling these observations was believed to

have little impact on our analysis. See Appendix 1 for a table of the age-education year

differences and the abovementioned changes.
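The cut-off rule can be sketched as a simple filter. This is an illustration only, assuming the "difference" is age minus reported years of education; the record layout and values are hypothetical.

```python
MIN_GAP = 3  # minimum plausible difference between age and years of education


def keep_observation(age, educ_years, min_gap=MIN_GAP):
    """Keep a record only if age minus years of education is at least min_gap."""
    return age - educ_years >= min_gap


children = [
    {"age": 6, "educ_years": 3},   # difference of 3: kept
    {"age": 8, "educ_years": 7},   # difference of 1: dropped
    {"age": 15, "educ_years": 9},  # difference of 6: kept
]
sample = [c for c in children if keep_observation(c["age"], c["educ_years"])]
```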

Therefore, the final sample for analysis will consist of 8,600 observations of 6-18 year

old children. The distribution of SAGE scores over this sample is shown in Figure 1.

Figure 1: Distribution of SAGE scores

[Figure 1: histogram. Horizontal axis: SAGE score (SAGEX), 0 to 300. Vertical axis: per cent of observations, 0 to 25.]

Figure 1 shows that a significant proportion of the observations in the sample had SAGE

scores of zero. Specifically, 2028 or 23.57% of the entire sample of 8,600 children have

had no years of education. Of the 6572 children who have had at least one year of

education, 66.84% of them have not kept up with their education according to their age

(with SAGE scores less than 100). Approximately 6% or 569 children had a SAGE

score greater than 100, indicating some form of pre schooling. The variation in SAGE

scores for the sample will inform the estimation methods to be used, which will be

detailed in the next section.

Finally, it should be noted that variation in SAGE scores between 0 and 100 does not

reflect the extent of schooling “completeness”. That is, a higher SAGE score does not

necessarily mean that the child is keeping up with their schooling “better” than a child

with a lower score. Given the way the SAGE score is constructed, such variations can be

attributed to difference in ages rather than differences in years completed. For example,

a 17 year old with 10 years of education and a 12 year old with 5 years of education are

both behind in their schooling-for-age by two years. Yet, the 17 year old has a higher

SAGE score of 83 compared to 71 for the 12 year old. However, because an aim of

this thesis is to analyse the incidence, rather than the extent of, children falling behind in

their schooling, this particular feature of SAGE scores between 0 and 100 is not of key

interest in this study.
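As a check on this example, the arithmetic under equation [2] with E = 5 works out as follows on the formula's 0 to 100 scale. Both children are two years behind, since (17 - 5) - 10 = 2 and (12 - 5) - 5 = 2, yet their scores differ:

```latex
\[
\mathrm{SAGE}_{17} = \frac{10}{17 - 5} \times 100 \approx 83.3,
\qquad
\mathrm{SAGE}_{12} = \frac{5}{12 - 5} \times 100 \approx 71.4
\]
```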

Therefore, equation [1] will be estimated with two measures of education attainment:

schooling participation and schooling consistency. A binary outcome of school

participation (SAGE score > 0) or not and school consistency (SAGE score ≥ 100) will

describe the education outcomes of the children. Details on the econometric strategy and

the modelling of these dependent variables will be provided in the next section.
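The two binary outcomes can be derived from a SAGE score as sketched below. This is an illustration of the thresholds stated above; the function name is hypothetical.

```python
def education_outcomes(sage):
    """Return (participation, consistency) indicators from a SAGE score."""
    participation = 1 if sage > 0 else 0   # at least one year of schooling
    consistency = 1 if sage >= 100 else 0  # keeping up with schooling for age
    return participation, consistency
```

For instance, a child with a score of 83.3 counts as a participant but not as consistent, while a child with a score of zero is neither.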

5.3 Explanatory Variables

The variables of interest in this study are the village-level health resources. In addition

to these, our rich dataset also allows for several individual, household and village level

control variables. Table 1 presents the definitions of the variables used for this

analysis. The inclusion of the control variables was considered given the practice and

findings of previous studies.

Table 1: Definition of explanatory variables

Variable       Definition

Individual characteristics
AGE6_12        dummy, takes value 1 if child is aged 6-12 years
AGE13_15       dummy, takes value 1 if child is aged 13-15 years
AGE16_18       dummy, takes value 1 if child is aged 16-18 years
MALE           dummy, takes value 1 if child is male
BTHORDER       value 1 for first born, 2 for second born, 3 for third born, etcetera
ACTIVITY       dummy, takes value 1 if child performs non-school-related activity

Household characteristics
LAND           dummy, takes value 1 if family owns land
FAMILY_SIZE    number of family members
HH_EXP         family expenditure on food and non-food items as reported in 1999 (Rupees) per capita
HEAD_EDUC      education level of head of the family and spouse

Village characteristics
ELEC           dummy, takes value 1 if village is electrified
SCHOOL_DIST    distance (km) of the main school from the village
EDUC_QUAL      education quality indicator of value 1-10
HEALTHDIST     dummy, takes value 1 if health facility is not situated in the village
HOSPDIST       dummy, takes value 1 if rural hospital is not situated in the village
CHLORF         dummy, takes value 1 if frequency of well chlorination is at least every 3 months
MALAEF         dummy, takes value 1 if frequency of malaria spraying is at least every months
GHEALTH        dummy, takes value 1 if frequency of general health sessions given in the village is at least every 3 months
WATERSOURCE    dummy, takes value 1 if village has improved source of water
TOILETQ        dummy, takes value 1 if village has improved sanitation
HTHGUID        dummy, takes value 1 if village has a health guide

a) Village-level health resources

There are eight health variables of interest that are considered to capture the level of

health resources in a village.

The proximity of health facilities to the village (health_dist and hosp_dist) was

deemed an adequate proxy for the extent of curative care available in the village. A proxy

for the quality of curative care could only be captured by whether the health centre had

beds on its premises. However this had little variation over the sample and thus was not

included.

The variables watersource and toiletq are the indicators of safe drinking water and

proper sanitation in a village. Following the definitions by the World Health

Organization and UNICEF, an “improved” source of water in a village is the presence of

a public tap, hand pump or tube well. An “unimproved” source includes canals, rivers

and ponds. “Improved sanitation” refers to toilet facilities with a flush or semi flush, and

“unimproved” sanitation is defined as a service latrine or open fields in the data (WHO

and UNICEF, 2004).

Other preventive care health resources in the analysis include three frequency measures.

The first is the frequency of well chlorination, chlorf.

An improved source of water may still contain harmful bacteria and disease-causing

organisms, and chlorinating water sources helps kill such bacteria and reduce the

transmission of water-borne diseases (WHO, 2004). Thus the frequency of well

chlorination is considered an important water quality measure. The frequency of malaria

spraying, malaef, obviously helps reduce the incidence of malaria but is also an

indicator of the other vaccinations and immunizations that the village may perform. The

frequency of general health sessions, ghealthf, reflects the level of health consciousness

in the village. Indeed general health knowledge has been cited as being fundamental to

improving the health of people in developing countries. Basic practices such as hand

washing, for example, have been acknowledged as an effective means to reduce

the spread of disease and infection and is increasingly being promoted in developing

countries (Reuters, 2008). Finally the presence of a health guide in a village, hthguid, is

another measure of health resources that is thought to improve schooling outcomes by

promoting good health practices.

b) Individual and household characteristics

Age dummies that reflect different levels of schooling: primary, middle and senior

school, are included in order to capture any possible age cohort effect on schooling

outcomes. Gender may play an important role in determining education outcomes as

well. From traditional or cultural norms, males may be considered more important as a

future income-earner for the family (in contrast to females who will marry and “leave” the

family). Therefore parents may favour schooling resources towards sons rather than

daughters.
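The age dummies can be sketched as follows, using the variable names and age bands from Table 1. This is an illustrative coding only; in estimation, one of the three dummies would presumably be omitted as the reference category, which is an assumption here rather than something stated above.

```python
def age_dummies(age):
    """Code the three schooling-level age dummies for a child aged 6-18."""
    if not 6 <= age <= 18:
        raise ValueError("sample is restricted to children aged 6-18 years")
    return {
        "AGE6_12": int(6 <= age <= 12),    # primary school ages
        "AGE13_15": int(13 <= age <= 15),  # middle school ages
        "AGE16_18": int(16 <= age <= 18),  # senior school ages
    }
```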

Although the empirical evidence has been mixed, the birth order of the child may also

have an effect on education outcomes for children. Children that are born earlier (and

have a low birth order) have fewer siblings to compete with and thus can enjoy a greater

proportion of household resources (Lindert, 1977). Older children may also be

expected to provide for the family and thus are given greater access to schooling

resources by parents. On the other hand, later stages of the life cycle for a family may

have greater resources available for schooling. This means older children may not be

given the chance for schooling because of limited household resources compared to later

born children (Parish and Willis, 1993). Therefore the expected impact of a child’s birth

order on their education outcome is not certain.

The financial resources of the family would impact on the educational outcomes of

children by influencing the family’s ability to put them through schooling. Indeed,

although schooling may be provided free of charge or subsidized by the government,

there remain other costs to schooling, such as textbook, transport and other

miscellaneous expenditures. Moreover, and another important issue in developing

countries, the opportunity cost of the child’s schooling is their labour in the fields and

supplemental income. Thus the activity status of a child – whether he or she is engaging

in non-school activities be it paid or non-paid – is included as an explanatory variable.

Psacharopoulos and Arriagada (1989) found the family’s demand for child labour to be a

strong indicator of schooling participation.

Moreover, a family with a higher income is expected to be more willing to invest in a child’s education (a long-term investment) rather than rely on shorter-term child labour earnings. However, following Maitra (2003), the log of household expenditure is used as a proxy for permanent income. This is due to possible mis-measurement of household income in the data, as well as the transitory nature of household income. Households tend to smooth consumption over time, and thus household expenditure per head is a more appropriate proxy for permanent income. The data on household expenditure include food and non-food items.

Access to credit is another factor that may influence human capital investment. If poor families are credit constrained and lack the collateral to borrow against their income, investing in human capital would be difficult. Whether or not a family owns land is used as a proxy for family assets and the family’s ability to invest in education. It is expected that a family with more resources is more likely to access credit and invest in education. This is considered important in deciding schooling participation for a child, and thus land is used as an exclusion restriction in the econometric strategy, which will be detailed later.

The education level of the head of the household, head_educ, is also included in order to capture its expected positive correlation with the child’s education. This may be due to a greater appreciation of schooling or a greater ability to assist children in their studies. The intergenerational effects of parental education on the schooling outcomes of children have been studied previously (for example, see Glick and Sahn, 2000; Brown and Park, 2002). Head education could also be considered a proxy for the child’s innate ability or IQ.


c) Other village-level determinants of child education outcomes

Previous studies have indicated that community-level factors in education, such as distance of village to schools, teacher-to-pupil numbers, class size and other quality indicators, impact on the education outcomes of children (Hanushek, 1995; Glewwe, 2002). Supply-side schooling factors are included in this analysis. The distance of a village from a school entails time and transportation costs that could reduce the demand

for school as well as the ability to keep up with schooling. An index, educ_qual, was

also formed, which includes 10 possible indicators of school quality including the

availability of mid-day meals, furniture for students and computers in the school. See

Appendix 2 for the list of variables in the index.
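As a minimal sketch of how an additive index such as educ_qual could be built, the Python snippet below sums binary indicators. It assumes that each of the 10 indicators is coded 0/1 and equally weighted, which is consistent with the index ranging from 0 to 10 but is not stated explicitly in the text. Only the first three indicator names correspond to items mentioned above; the remaining names and the function name are hypothetical placeholders.

```python
# Hypothetical indicator names; only the first three are mentioned in the text.
INDICATORS = [
    "midday_meal", "furniture", "computers",
    "ind4", "ind5", "ind6", "ind7", "ind8", "ind9", "ind10",
]


def educ_qual_index(school):
    """Count the 0/1 quality indicators that equal 1.

    A missing indicator is treated as 0 (an assumption made for this sketch).
    """
    return sum(1 for name in INDICATORS if school.get(name, 0) == 1)


# Toy record, not taken from the thesis data.
example_school = {"midday_meal": 1, "furniture": 1, "computers": 0, "ind4": 1}
print(educ_qual_index(example_school))
```

An unweighted count treats every facility as equally important; a weighted index would need weights that the text does not provide.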

Other community-level infrastructure factors were also considered as determinants of

schooling outcomes. The level of development or income in the village could impact on

both education and health outcomes. Thus appropriate proxy variables needed to be

included. Unfortunately, the data allowed for limited variable choices in this respect.

Whether a village is electrified or not, elec, is used to indicate the level of development

in the village. Other proxies that were considered included the number of televisions and

telephones in the village. These were poor proxies, however, because these continuous variables were not useful unless the proportion of the village people owning these items was known. As such, only one indicator of village development is included in the analyses (which poses one of the limitations of this study).

Table 2 contains descriptive statistics for the variables used in the empirical model.

Descriptive statistics of the variables with comparisons between children who have

participated in school against those who have not participated, as well as comparison

between consistently and inconsistently schooled children can be seen in Appendixes 3

and 4 respectively.


Table 2: Descriptive Statistics for all children (N = 8600)

Variable       Mean     Std. Dev.   Min      Max
age6_12        0.539    0.498       0        1
age13_15       0.230    0.421       0        1
age16_18       0.230    0.421       0        1
male           0.547    0.498       0        1
bthorder       1.993    1.128       1        13
activity       0.097    0.296       0        1
familysize     6.423    2.490       2        30
lnHH_exp       10.080   0.520       7.711    13.851
headeduc       6.921    5.594       0        44
land           0.740    0.438       0        1
elec           0.917    0.276       0        1
scdist         1.411    2.511       0        9
educ_qual      3.788    2.180       0        10
healthdist     0.414    0.493       0        1
hospdist       0.268    0.443       0        1
chlorf         0.551    0.497       0        1
malaef         0.431    0.495       0        1
ghealthf       0.538    0.499       0        1
watersource    0.641    0.480       0        1
toiletq        0.143    0.350       0        1
hthguid        0.418    0.493       0        1

Before econometric analysis was performed using the data, simple comparisons were

made in order to identify any possible correlation to support the hypothesis. A priori it is

argued that poor health resources lead to poor education outcomes. Therefore it is

expected that children who have not participated in school or who have been

inconsistent in their schooling would live in villages that lacked preventive and curative

care health measures.


Some simple correlations of the data, focusing on two health resources (general health sessions and source of water), are shown in Figure 2 and Figure 3. Both indicate that this correlation could exist. Note that participation and consistency have each been modelled as a binary outcome.

It can be seen from Figure 2 that 57% of children who have participated in school live in villages with frequent general health sessions (at least every two months), compared to 43% of children who have not participated in school. In terms of schooling consistency, the difference is less pronounced: 60% of children who have kept up with their schooling live in villages with frequent general health sessions, compared to 56% of children who have fallen behind in their studies.

Source of water and schooling outcomes are compared in Figure 3. Similarly, a greater percentage of children with good education outcomes live in villages with improved sources of water.

Certainly these “naïve” correlations do not take into account any other factors that may

impact on schooling outcomes. However, this preliminary exercise gave some indication

that a relationship may exist, which further justified our investigation and necessitated the use of econometrics to advance our analysis.


Figure 2: General Health Session Frequency and Schooling Outcomes

[Bar chart. Vertical axis: % of children from good health resource village (0 to 70). Bars: Participation 57%, Non-Participation 43%, Consistency 60%, Inconsistent 56%.]

Figure 3: Source of Water and Schooling Outcomes

[Bar chart. Vertical axis: % of children from good health resource village (0 to 80). Categories: Participation, Non-Participation, Consistency, Inconsistent. Bar labels as extracted: 64%, 69%, 58%, 66%.]


6. Econometric Approach

6.1 Binary Dependent Variable

Our main hypothesis is that a low level of health resources in a village will increase the

probability that a child falls behind in their schooling. Moreover, it may reduce the

propensity that a child attends school. As aforementioned, the construction of the SAGE

score means that variations in scores between zero and 100 do not reflect the extent of

schooling “completeness”. Therefore, Ordinary Least Squares (OLS) estimation of

variation in SAGE scores against explanatory variables would not produce meaningful

results. Instead, the nature of the SAGE scores indicates that a binary dependent variable

approach would be appropriate.

A SAGE score equal to or greater than 100 would indicate that a child has kept up with their

schooling and a SAGE score less than 100 would indicate inconsistent schooling for age.

Similarly a positive SAGE score would indicate schooling participation and a SAGE score

of zero would indicate non-participation. Thus two probit models will estimate equation

[1] with a binary dependent variable for schooling participation and schooling consistency.

However, the nature of the data indicates that there may be a sample selection problem

when modelling schooling consistency.
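The mapping from SAGE scores to the two binary outcomes described above can be sketched in Python as follows. The thresholds (a score above zero for participation, a score of at least 100 for consistency) are taken from the text; the function name is hypothetical, and returning None for the consistency outcome of a child with no schooling is an illustrative way of marking that outcome as unobserved.

```python
def schooling_outcomes(sage):
    """Return (participation, consistency) for a given SAGE score.

    participation: 1 if SAGE > 0, else 0.
    consistency:   1 if SAGE >= 100, 0 if 0 < SAGE < 100,
                   None if the child never participated (not observed).
    """
    if sage < 0:
        raise ValueError("SAGE score cannot be negative")
    if sage == 0:
        return 0, None
    return 1, (1 if sage >= 100 else 0)


# Illustrative scores only.
for score in (0, 60, 100, 120):
    print(score, schooling_outcomes(score))
```

Keeping the unobserved case separate from a zero makes the source of the sample selection problem visible: consistency is simply undefined for children who never attended school.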

Figure 1 showed that a sizable number of the children in the sample have had no schooling at all. Because inconsistency in schooling presupposes participation in schooling, ignoring these observations and analysing only the sub-sample of schooled children could produce inconsistent estimates. This is because the sub-sample of schooled children may not be randomly selected. Since schooling consistency is only observed if the child participated in school, sample selectivity bias may arise if the probability of not participating in school is not independent of the probability of being inconsistent in schooling.


Therefore, to address this suspected sample selection and interdependency between schooling participation and schooling consistency, a bivariate probit model with sample selection as well as univariate probit models were employed.

6.2 Bivariate Probit Model with Sample Selection

A probit model of the standard form models each schooling outcome of participation and

consistency. Let the superscript * indicate an unobserved or latent variable:

\[ Y_i^{*} = \beta X_i + e_i, \qquad P(Y_i = 1 \mid X_i) = P(Y_i^{*} > 0 \mid X_i) = P[e_i > -\beta X_i \mid X_i] = \Phi(\beta X_i) \tag{3} \]

where i is an individual subscript, X represents a vector of characteristics and includes 1, ei is a standard-normally distributed error term and Φ(.) is the standard normal cumulative distribution function. For more details, see Wooldridge (2002). Equation [3] is estimated by maximum likelihood using the econometric program Stata 9 SE (as are all estimations in

this study).
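To make equation [3] concrete, the following Python sketch evaluates the probit probability and the corresponding log-likelihood using only the standard library. It is an illustration of the formula, not the Stata routine used in the thesis; the function names, coefficient values and toy data are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_prob(beta, x):
    """P(Y = 1 | x) = Phi(beta * x); x includes a leading 1 for the constant."""
    index = sum(b * v for b, v in zip(beta, x))
    return std_normal_cdf(index)


def probit_loglik(beta, X, y):
    """Sum of log Phi(index) for y = 1 and log(1 - Phi(index)) for y = 0."""
    total = 0.0
    for x, yi in zip(X, y):
        p = probit_prob(beta, x)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


# Illustrative coefficients and data, not estimates from the thesis.
beta = [0.0, 0.5]
X = [[1.0, 0.0], [1.0, 1.0]]
y = [1, 0]
print(probit_prob(beta, X[0]))
print(probit_loglik(beta, X, y))
```

Maximum likelihood estimation would search for the coefficient vector that maximises this log-likelihood; the sketch stops at evaluating it for given coefficients.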

However, because of the possible sample selection bias from estimating the schooling

consistency model, it is necessary to begin the analysis using a two-equation approach in

order to determine if a sample selection problem existed. Thus, a bivariate probit model

with sample selection was employed.

This model consists of two simultaneous equations: one for the selection equation of attending school, Yi1, and another for the outcome equation of keeping up with schooling, Yi2. In other words, the econometric model considers two latent variables representing the propensity of a child to be educated and the propensity of a child to keep up with their education given their age. Let the superscript * indicate the latent variables, with the model specification as follows:


\[ Y_{i1}^{*} = \beta_1 X_{i1} + e_{i1} \]

\[ Y_{i2}^{*} = \beta_2 X_{i2} + e_{i2} \tag{4} \]

where i is the individual subscript and Xij, for j = 1, 2, are the vectors of individual, household and village-level characteristics that affect child education outcomes, each including 1 (village-level health resources Hi from equation [1] are now included in Xij). The disturbance terms (ei1, ei2) are assumed to be bivariate normally distributed with zero means, unit variances and correlation coefficient ρ, that is, (ei1, ei2) ~ BVN(0, 0, 1, 1, ρ).

The modelling strategy is such that the binary choice variable Yi1 takes the value 1 if the child has had at least one year of education (SAGE score > 0) and 0 if the child has had no schooling (SAGE score = 0). The second binary variable, Yi2, takes the value 1 if the child has had consistent schooling (SAGE score ≥ 100) and 0 if he/she has fallen behind for their age (0 < SAGE score < 100):

Yi1 = 1 if schooling participation (Y*i1 > 0); = 0 otherwise [5a]

Yi2 = 1 if consistent schooling (Y*i2 > 0); = 0 otherwise [5b]

This model is a variant of the standard bivariate probit model with four types of observations (see Meng & Schmidt, 1985). Also known as a ‘bivariate probit model with partial observability’, it has three types of observations under this particular model structure: a child with no schooling, a child with consistent schooling or a child with inconsistent schooling. The observations of this two-equation probit model can be represented graphically, where n is the number of observations for each equation:


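Figure 4. Three observations in the bivariate probit model with sample selection

[Tree diagram. All Children split into No Participation (Yi1 = 0) and Participation (Yi1 = 1); Participation splits into Inconsistent (Yi2 = 0) and Consistent (Yi2 = 1). Observations per equation: n = 8600 and n = 6572.]

The likelihood function is therefore given by:

\[ \ell = \prod \Pr(\text{no schooling}) \cdot \prod \Pr(\text{consistent schooling}) \cdot \prod \Pr(\text{inconsistent schooling}) \]

This implies the log-likelihood function is:

\[
\begin{aligned}
\ln \ell ={} & \sum_{i=1}^{n} (1 - Y_{i1}) \ln\left[1 - \Phi(X_{i1}\beta_1)\right] \\
& + \sum_{i=1}^{n} Y_{i1}(1 - Y_{i2}) \ln\left\{\Phi(X_{i1}\beta_1) - \Phi_2(X_{i1}\beta_1, X_{i2}\beta_2; \rho)\right\} \\
& + \sum_{i=1}^{n} Y_{i1} Y_{i2} \ln \Phi_2(X_{i1}\beta_1, X_{i2}\beta_2; \rho)
\end{aligned}
\tag{6}
\]

where Φ(·) and Φ2(·,·; ρ) denote the univariate and bivariate standard normal cumulative distribution functions. Equation [6] is jointly estimated by maximum likelihood using Stata 9 SE.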
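The three contributions to the log-likelihood in [6] can be evaluated with a short Python sketch. It is an illustration under stated assumptions rather than the Stata routine used in the thesis: the bivariate normal distribution function is approximated by Simpson's rule (a stand-in for whatever numerical method Stata uses), the correlation must satisfy |ρ| < 1, and all function names and toy data are hypothetical.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def Phi2(a, b, rho, steps=2000):
    """Bivariate standard normal CDF P(Z1 <= a, Z2 <= b), |rho| < 1.

    Simpson's rule on the integral over x <= a of
    phi(x) * Phi((b - rho * x) / sqrt(1 - rho^2)); steps must be even.
    """
    lower = -8.0  # the density below -8 is negligible
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        return phi(x) * Phi((b - rho * x) / scale)

    h = (a - lower) / steps
    total = integrand(lower) + integrand(a)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(lower + k * h)
    return total * h / 3.0


def selection_loglik(beta1, beta2, rho, X1, X2, y1, y2):
    """Log-likelihood [6]; y2 is ignored whenever y1 == 0."""
    ll = 0.0
    for x1, x2, part, cons in zip(X1, X2, y1, y2):
        a = sum(b * v for b, v in zip(beta1, x1))
        c = sum(b * v for b, v in zip(beta2, x2))
        if part == 0:        # no schooling
            ll += math.log(1.0 - Phi(a))
        elif cons == 1:      # consistent schooling
            ll += math.log(Phi2(a, c, rho))
        else:                # inconsistent schooling
            ll += math.log(Phi(a) - Phi2(a, c, rho))
    return ll


# Toy data: one child per observation type, constant-only equations.
X1 = [[1.0], [1.0], [1.0]]
X2 = [[1.0], [1.0], [1.0]]
y1 = [0, 1, 1]
y2 = [None, 1, 0]
print(selection_loglik([0.0], [0.0], 0.0, X1, X2, y1, y2))
```

With ρ = 0 the bivariate term factorises into Φ(Xi1β1)·Φ(Xi2β2), so the joint log-likelihood reduces to the sum of two separate probit log-likelihoods, which is why univariate probits suffice when the errors are uncorrelated.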


This econometric strategy was chosen to deal with the potential sample selection problem.

The need for this model is indicated by the statistical significance of ρ, the correlation coefficient of the errors of the two equations. If ρ is statistically significant, this indicates that there is a relationship between the two schooling propensities.

Although schooling participation is fully observable, estimating this first probit equation on its own would produce inefficient estimates when ρ is non-zero. Moreover, the second probit model would suffer from selectivity bias if only the schooled sub-sample was analysed. However, if

ρ is not statistically different from zero, two univariate probit models to estimate the

probability of schooling participation and consistency following [3] would be appropriate.
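One common way to assess whether ρ differs from zero is a likelihood ratio test that compares the joint model with the two separate probits, since the joint log-likelihood equals the sum of the two univariate log-likelihoods when ρ = 0. The Python sketch below assumes this particular test; the text states only that the statistical significance of ρ is examined, so the choice of test, the function name and the log-likelihood values are illustrative assumptions.

```python
import math


def lr_test_rho_zero(ll_joint, ll_probit_selection, ll_probit_outcome):
    """LR statistic and p-value for H0: rho = 0 (chi-square with 1 df)."""
    lr = 2.0 * (ll_joint - (ll_probit_selection + ll_probit_outcome))
    lr = max(lr, 0.0)  # guard against tiny negative values from rounding
    # Survival function of a chi-square(1) variable evaluated at lr.
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# Illustrative log-likelihood values only; not results from the thesis.
lr, p = lr_test_rho_zero(-5000.0, -3000.0, -2001.92)
print(round(lr, 2), round(p, 3))
```

A small p-value would favour the joint model with sample selection; a large one would support estimating the two probit models separately.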

Identification in a multiple equation probit model has been a source of some debate.

Maddala (1983) stated that in order to identify the second equation, at least one variable

needed to be included in the selection equation that is not included in the outcome

equation. However, Wilde (2000) argues that exclusion restrictions are not needed (and

that Maddala was considering a specific example) if there is sufficient variation in at least

one exogenous regressor in each equation. Essentially, the non-linearities in the probit

models are considered sufficient for identification.

For prudence’s sake, an exclusion restriction was included in this analysis. The variable,

land, was considered a good proxy for a family’s assets and potential to invest in

schooling. This asset proxy is not considered relevant for schooling consistency (with

household income more pertinent). Using Stata, the bivariate probit with sample selection

model did not converge without this identifier and thus its inclusion seemed necessary as

well, despite Wilde’s argument.


6.3 Extension: Random Effects Probit Models

Probit analysis of schooling participation and schooling consistency assumes that each

observation or child is independent. However, the sample consists of 3820 families with an average of 2.3 sampled children each. Therefore there may be some characteristics that are not specifically in the model but are common to children in the same family and household. Examples of this clustering effect could be parental competence in assisting their children in their schoolwork or parental preference for education or health resources. Because intra-cluster correlation would create biased parameter estimates, an extension of our econometric models to take into account unobserved household heterogeneity is necessary.

A random effects probit model is used to account for children’s education data being

clustered at the family level. For detailed discussion on the model, see Maddala (1987).

This model considers the household effect to be random. The latent variable (for general

schooling outcomes) is thus of the form:

\[ Y_{ih}^{*} = \alpha + X_h \beta + Z_{ih} \gamma + v_{ih} \tag{7} \]

where i and h are individual and household subscripts respectively. The 1 x K vector Xh contains the explanatory variables that vary only at the household level. The 1 x L vector Zih contains the explanatory variables that vary within the households or clusters.

The random effects probit model assumes that the error term vih is composite in nature:

\[ v_{ih} = c_h + u_{ih} \tag{8} \]

where i and h are individual and household subscripts respectively, ch is the unobserved household effect and uih is the idiosyncratic error. It is assumed that \(u_{ih} \sim \text{i.i.d. } N(0, 1)\) and \(c_h \sim N(0, \sigma_c^2)\). Thus,

\[ \mathrm{Var}(v_{ih}) = 1 + \sigma_c^2 \tag{9} \]


\[ \rho = \frac{\sigma_c^2}{1 + \sigma_c^2} \tag{10} \]

where ρ is the proportion of the total error variance that is due to the

unobserved household effect.
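The interpretation of ρ in [10] can be checked with a short derivation from [8] and [9]. The steps below additionally assume that the household effect is independent of the idiosyncratic errors and that the idiosyncratic errors of two different children i and j in the same household h are independent of each other; these are standard for the random effects setup but are made explicit here as assumptions.

```latex
\begin{align*}
\mathrm{Cov}(v_{ih}, v_{jh})
  &= \mathrm{Cov}(c_h + u_{ih},\; c_h + u_{jh})
   = \mathrm{Var}(c_h) = \sigma_c^2, \qquad i \neq j, \\
\mathrm{Corr}(v_{ih}, v_{jh})
  &= \frac{\sigma_c^2}{\sqrt{(1 + \sigma_c^2)(1 + \sigma_c^2)}}
   = \frac{\sigma_c^2}{1 + \sigma_c^2} = \rho .
\end{align*}
```

For example, a household effect with variance equal to 1 gives ρ = 1/(1 + 1) = 0.5, meaning that half of the total error variance is attributable to the household, while a variance of zero gives ρ = 0 and the model collapses to the pooled probit.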

The random effects probit model rests on some strong assumptions, notably that there is no relationship between the explanatory variables and the unobserved household effect, ch. This assumption is needed in order to produce consistent estimates. An alternative model to control for unobserved heterogeneity is the fixed effects model. However, because our variables of interest are at the non-individual level, using fixed effects would effectively

drop
