
Date: 2 May 2012

To: Dr. Mark Embree

From: TEAM OPTIMIST (Amber Kunkel, Elizabeth Van Itallie, and Duo Wu)

Subject: Final Report

Executive Summary

We created a three-tiered plan for optimally allocating Health Surveillance Assistants (HSAs), HSA backpacks, and backpack resupply centers across all of Malawi or selected regions. To achieve this, we used the p-median problem to allocate HSAs and the capacitated facility location problem to assign backpacks and resupply centers. We solved these integer programs using Gurobi, based on data from the International Food Policy Research Institute processed in ArcGIS.

Across Malawi, we assigned 9147 population areas to 6500 HSA pairs, with a mean distance between population areas and assigned HSAs of 0.39 km. We determined that 2188 backpacks could optimally serve these HSAs, with a mean distance of 1.8 km and an initial cost of $772,167.08. While using health centers along with hospitals as resupply sites could disrupt the supply chain, it produced a desirable average distance of 5.1 km between backpacks and resupply centers, as opposed to 19.8 km when using hospitals only.

Our results are limited by the age (>10 years) and potential inaccuracy of our data; model assumptions such as the use of straight line distances; and inherent difficulties in modeling human behavior. Nevertheless, our plan could optimally scale up the HSA backpack program and improve healthcare access in Malawi.


Contents

1 Problem Statement

2 Literature Review

2.1 Current Health Situation in Malawi

2.2 Approaches to Similar Problems

3 Design Criteria

3.1 Choosing a Model

3.2 Solving the Model

3.3 Applying the Model

4 Selected Design Solutions

4.1 Mathematical Models

4.1.1 The p-Median Problem

4.1.2 The Capacitated Facility Location Problem

4.2 Implementation

5 Ethical Considerations

5.1 Model Inputs

5.2 Parameters Determination

5.3 Policy Implications

5.4 Potential Misuse of Model

6 Testing Plan

6.1 Accuracy of Input Data

6.2 Sensitivity to Small Changes in Data

6.3 Accuracy and Efficiency of Model Implementation

7 Obstacles

7.1 Obtaining Inputs for the Models

7.2 Obtaining Information about the Situation of HSAs

7.3 Determining Parameters for the Models

7.4 Learning New Software and Computer Languages

7.5 Solving Large Integer Programs with Gurobi

8 Results

8.1 Model Outputs

8.1.1 HSA-to-EA assignment

8.1.2 Backpack-to-HSA assignment

8.1.3 Resupply Center-to-Backpack assignment

8.2 Code Details

8.3 Design Criteria Recap

9 Future Work

9.1 Exploration of Solution Spaces

9.2 Implementation

9.2.1 Obtaining Improved Data for Inputs

9.2.2 Choosing Pilot Districts for the Scale-up Plan

9.2.3 Determining an Implementable Plan

10 Conclusion

References

A HSA and Backpack Assignments by District

B p-Median Code Documentation

B.1 Creating the Executable

B.2 Inputs for the Code

B.3 Outputs Generated by Code

B.4 Description of Code

B.5 Examples of Executable Lines for the Command Prompt

C CFLP - Backpack To HSA Code Documentation

C.1 Creating the Executable

C.2 Inputs for the Code

C.3 Running the Code

C.4 Outputs Generated by Code

C.5 Description of Code

D CFLP - Resupply Center To Backpack Code Documentation

D.1 Creating the Executable

D.2 Inputs for the Code

D.3 Running the Code

D.4 Outputs Generated by Code

D.5 Description of Code

E Plot Instructions

F Main ArcGIS Tools Used

1 Problem Statement

Our project goal was to propose a plan for the distribution and resupply of Health Surveillance Assistants (HSAs) and HSA backpacks in the country of Malawi. Currently, selected community health workers around St. Gabriel’s Hospital in Namitete use supplies from similar backpacks to provide limited healthcare within Malawian villages, returning periodically to St. Gabriel’s to resupply. We sought optimal locations for HSAs and a plan for the deployment of backpacks in a nationwide scale-up of this system. To do so, we analyzed census enumeration area (EA) centroids as potential sites for HSAs and backpacks and the existing network of health centers, government district hospitals, and Christian Health Association of Malawi hospitals as potential resupply centers. Our plan was evaluated based on overall costs and travel times, as well as coverage. In this project, straight line distances were used as a proxy for travel times.

Our client for this project was Rice 360° and Rice Beyond Traditional Borders (BTB), specifically Dr. Rebecca Richards-Kortum and Dr. Maria Oden. In their words, the ideal output from this project would be an analysis of how to effectively scale up the backpack program, including a plan for how many backpacks to provide and where to deploy them. In fact, results of this project will be included as part of a grant application to USAID to fund more backpacks.

2 Literature Review

We began our analysis of this problem by conducting a review of the literature to improve our understanding of the current situation in Malawi as well as approaches other researchers and public health practitioners have taken to similar problems.

2.1 Current Health Situation in Malawi

Malawi is a country in southern Africa located east of Zambia and west and north of Mozambique. Its citizens face many health challenges, including an under-five mortality rate of 134/1000, an infant mortality rate of 77/1000 [12], and an HIV prevalence of 12% [8]. Malawi’s government is dedicated to addressing these challenges, and in fact Malawi is one of few African nations on track to reach its Millennium Development Goals. One reason for this progress is the network of HSAs located across the country.

HSAs in Malawi originated as “Smallpox Vaccinators” in the 1960s in response to an outbreak of smallpox. When a cholera outbreak occurred in the 1970s, the “Smallpox Vaccinators” were deployed as “Cholera Assistants” to assist the efforts to control and prevent further spread of cholera. The Ministry of Health and Population in Malawi decided to keep these trained auxiliary medical personnel “under a new mandate of ‘surveying’ factors and behaviors that put people’s health at risk and providing primary assistance before referring complicated cases to health centers and hospitals” [11]. Over the years, the mandate of HSAs has widened considerably: they now provide health services such as immunization across the country. Currently, the government of Malawi is planning to scale up the HSA program to a ratio of 1 HSA : 1000 people [12].

One factor constraining these HSAs’ ability to provide health care is the lack of infrastructure, especially related to transportation [11]. “Riders for Health: An African Success” discusses how the non-governmental organization Riders for Health is addressing transportation problems by focusing on the maintenance and repair of motorcycles [18]. Solving the transportation problem is not part of our project, but the issues that this organization is trying to solve are very relevant.

Partially because of this poor infrastructure, medical supplies are often difficult to acquire in Malawi. There are only three Central Medical Stores in the country, which are located in Mzimba, Mponela, and Mwanza. These Central Medical Stores are frequently understocked, and most HSAs are given very minimal equipment. Rice BTB’s community health worker backpack program attempts to address some of these issues by providing HSAs with backpacks that contain equipment designed specifically for impoverished settings and disposables that can be restocked locally when necessary. While this is an exciting development, it is a challenge for Rice BTB to find an efficient way to distribute and restock the backpacks given the high fuel costs and limited access to medical supplies in Malawi. This project aims to bridge this gap by proposing a plan for HSA deployment and backpack allocation that addresses these limitations.

2.2 Approaches to Similar Problems

We have not found any instances of researchers creating a three-tiered system of community health workers, supplies, and resupply centers. However, many researchers have studied the allocation of healthcare resources in developing countries with slightly different models.

First, some researchers have studied how to best locate physical facilities rather than community health workers. Cocking et al. proposed a coverage model for locating new health facilities in Burkina Faso, Africa [6]. Rahman and Smith provided a review of articles using location-allocation models for locating health centers in developing nations, focusing on their use in locating new sites, measuring effectiveness of past decisions, and improving existing systems [16]. Another interesting project was conducted by Reid et al., who used a location set covering algorithm to determine optimal locations of medical supply centers in Ecuador [17]. However, Murawski and Church noted that location-allocation models provide a static structure that may not be appropriate for determining health center locations, and instead proposed a linear integer programming model taking into account possible future improvements in road infrastructure [15]. Balancing competing incentives and objectives may complicate the process of setting locations, as acknowledged by Massam et al. [14].

Other researchers have studied health system efficiency at the level of individual community health workers. In terms of travel, Brunskill and Lesh suggested house visits be treated as a routing and scheduling problem that could be approached as a traveling salesman problem with time windows, with additional complexity caused by the possibility of future follow-up visits [5]. Meanwhile, Parker et al. presented a simple model to determine how to prioritize tasks for community health workers in Haiti [2]. Finally, Doerner et al. helped bridge the gap between community health workers and physical health center locations by using a multiobjective combinatorial optimization formulation for mobile health centers and evaluating that model within the Thies region of Senegal [7].

While none of these papers directly met the needs of this project, they provide a conceptual basis for future mathematical analysis. Many of these models followed an integer programming formulation similar to our approach.


3 Design Criteria

In determining our particular approach to this problem, we considered criteria for three different stages of the design process: building the model, solving the model, and applying the model to the actual situation in Malawi. Since the decisions made at each stage do not necessarily interact with one another, we described and ranked the criteria within each stage independently.

3.1 Choosing a Model

We used the following criterion to rank different models.

• Relevance: Our model should meet the project objectives of allocating HSAs, assigning backpacks to HSAs, and assigning backpack resupply centers. Any model included in the final solution should fully address at least one of these goals. Furthermore, our final combination should model all three aspects of the problem.

This single criterion makes sense because of the homogeneity of the candidate design solutions, making them largely indistinguishable by other possible criteria. Since it was the only criterion considered here, it received a weight of 1.

3.2 Solving the Model

In considering different ways to solve our final model, we considered the four design criteria described here.

• Doability: Using our solution method, it should take less than 1 hour for us to access the required software from the Rice campus and less than 10 hours for us to program our optimization problem, excluding input data.

• Numerical Accuracy: Our model should produce an output that is within 5% of the optimal value.

• Ease of Update: Once our initial code is written, we should be able to update the parameters of our model in less than 1 hour and run the code in less than 7 hours. Additionally, we should be able to make changes to specific variables, constraints, and objective functions in less than 3 hours of coding time.

• Cost: Our model should not require any hardware beyond a personal computer or software other than programs that are freely available to Rice students on the Rice network.

Analyzing these criteria led us to the following ranking of our solution-specific design criteria, with weights describing the relative importance of each criterion.


Criteria              Weights
Doability             0.4
Numerical Accuracy    0.3
Ease of Update        0.2
Cost                  0.1

3.3 Applying the Model

Unlike the above criteria, the design criteria shown below had little influence on determining the mathematics of our proposed solution. Rather, these are aspects of the project that we have kept in mind as we used our model and then packaged its results.

• Accessibility of Outputs: Our final product should be understandable to someone with a high-school education and without access to a computer. This could include maps, graphs, protocols, tables, etc. On a scale from 1 to 5, with 5 being extremely easy to understand and 1 being extremely difficult, as evaluated by Dr. Richards-Kortum or Dr. Oden, we aim for a score of 4 or higher.

• Comprehensiveness: Our model should include a plan for covering the entire country of Malawi (at least 95% of the population).

• Stability of Proposed Plan: A 5% change in the inputs of our model should produce no more than a 10% change in outputs (number of backpacks assigned to each resupply center), so that our plan will remain relevant even in the face of errors or changes in inputs.

• Accuracy of Inputs: Our model should account for population as well as geography, seasonality, and road structure within 5% of their actual values.

• Reasonableness of Outputs: Our model should produce outputs that are reasonable and implementable. In other words, the constraints of our model should be achieved and meet the needs of the Malawian population, and our output should not contradict the intuition and expertise of our global health advisors. This will be evaluated on a scale from 1 to 5, with 5 being extremely reasonable and 1 being entirely unreasonable, by Dr. Richards-Kortum or Dr. Oden. We aim for a score of 4 or better.
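The stability criterion above lends itself to a mechanical check once a baseline run and a perturbed run are available. The following Python sketch is illustrative only: the function names, the center names, and the sample counts are hypothetical, and it assumes the model output is a mapping from resupply center to the number of backpacks assigned to it.

```python
def relative_change(baseline, perturbed):
    """Return |perturbed - baseline| / |baseline| for a nonzero baseline."""
    if baseline == 0:
        raise ValueError("baseline must be nonzero")
    return abs(perturbed - baseline) / abs(baseline)


def is_stable(baseline_counts, perturbed_counts, max_output_change=0.10):
    """Check that no resupply center's backpack count moves by more than 10%.

    Both arguments map a (hypothetical) resupply-center name to the number
    of backpacks assigned to it in the baseline and perturbed runs.
    """
    return all(
        relative_change(baseline_counts[c], perturbed_counts[c]) <= max_output_change
        for c in baseline_counts
    )


# Made-up counts for illustration; these are not results from the report.
baseline = {"center_A": 40, "center_B": 10}
perturbed = {"center_A": 42, "center_B": 12}
stable = is_stable(baseline, perturbed)  # center_B moves by 20%, so not stable
```

The perturbed counts would come from rerunning the model on inputs changed by 5%; that step is outside this sketch.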

The following weights reflect our understanding of the importance of each of these criteria.

Criteria                     Weights
Reasonableness of Outputs    0.3
Comprehensiveness            0.25
Accuracy of Inputs           0.2
Accessibility of Outputs     0.15
Stability                    0.1


4 Selected Design Solutions

We selected two mathematical models for the three-tiered distribution problem that we were trying to solve. We applied these models by processing geographic and population data in ArcGIS and writing a C++ routine to interface with Gurobi, a solver for large-scale linear programming problems.

4.1 Mathematical Models

4.1.1 The p-Median Problem

The p-Median Problem was used to assign 6500 HSA pairs across 9218 EAs in Malawi. In the p-median problem, the goal is to find the p locations that minimize the average distance in a network [9]. In this problem, the demand for service at each node and the travel time between nodes are deterministic.

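To begin, let $G = (V, E)$ be a graph where $V$ is the set of vertices and $E$ is the set of edges. Associate with each edge a weight $d(v_i, v_j)$, which is the distance of the shortest path between vertices $v_i$ and $v_j$ according to the metric $d$. The $n \times n$ symmetric matrix $[d_{ij}] = [d(v_i, v_j)]$ is the shortest distance matrix. Each vertex $v_i$ is assigned a weight $w_i$, and the weighted distance matrix has entries $W_{ij} = w_i d_{ij}$.

A p-median problem can be formulated and solved as a binary linear program. Let $\xi_{ij}$ be an allocation variable such that

\[
\xi_{ij} =
\begin{cases}
1, & \text{if vertex } v_i \text{ is allocated to vertex } v_j; \\
0, & \text{otherwise.}
\end{cases}
\]

We then seek

\[
\begin{aligned}
\min \quad & \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} \xi_{ij} \\
\text{s.t.} \quad & \sum_{j=1}^{n} \xi_{ij} = 1 && \forall i \\
& \sum_{i=1}^{n} \xi_{ii} = p \\
& \xi_{ij} \le \xi_{jj} && \forall i, j \\
& \xi_{ij} \in \{0, 1\} && \forall i, j
\end{aligned}
\]

Here $\xi_{jj} = 1$ indicates that vertex $v_j$ is chosen as a median, so the third constraint allows a vertex to be allocated only to a chosen median.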
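To make the formulation concrete, the sketch below solves a tiny p-median instance by exhaustive enumeration in Python. It is not the C++/Gurobi routine used in this project; the function name and the four-vertex instance are made up for illustration, and the approach is only practical for very small $n$. It relies on the fact that, once the $p$ medians are fixed, allocating each vertex to its nearest median minimizes the objective.

```python
from itertools import combinations


def solve_p_median(weights, dist, p):
    """Exhaustively solve a tiny p-median instance.

    weights[i] is the demand w_i at vertex i and dist[i][j] is d_ij.
    Returns (total weighted distance, chosen medians, assignment), where
    assignment[i] is the median serving vertex i.
    """
    n = len(weights)
    best = None
    for medians in combinations(range(n), p):
        # For fixed medians, allocating each vertex to its nearest median is optimal.
        assignment = [min(medians, key=lambda j: dist[i][j]) for i in range(n)]
        cost = sum(weights[i] * dist[i][assignment[i]] for i in range(n))
        if best is None or cost < best[0]:
            best = (cost, medians, assignment)
    return best


# Four made-up vertices on a line at positions 0, 1, 5 and 6 km.
positions = [0.0, 1.0, 5.0, 6.0]
dist = [[abs(a - b) for b in positions] for a in positions]
weights = [10.0, 1.0, 1.0, 10.0]
cost, medians, assignment = solve_p_median(weights, dist, p=2)
```

In this instance the two heavily weighted end vertices are chosen as medians, and each light vertex is allocated to its neighbor.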

For our problem, $G$ was the graph of the country of Malawi, where the nodes $v_i$ represented EAs and the associated weights $w_i$ represented the demand assigned to that node. Specifically, the demand at each EA was equal to its population adjusted by:

• Under-5 Population: Under-5 children were given a weight of 0.55 whereas the rest of the population was given a weight of 0.45. This was because the HSA backpacks were built on the basis that HSAs would be spending 55% of their time on under-5 children [19].

• Population Density: This factor was used as a proxy for rural vs. urban. Rural populations, defined as EAs with population densities below a natural cutoff in the data of 0.0035 people per square meter, were weighted upward by a factor of 1.5.

• Proximity to Health Center: If an EA was within 1 km of a health center, its demand was reduced to 10% of its original level.

The distance $d_{ij}$ of the shortest path between nodes was represented by the shortest (straight-line) distance between the centroids of the EAs.
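The three demand adjustments can be sketched as a single Python function. This is one reading of the description above, not the project's actual data-processing step (which was carried out in ArcGIS): the function name and arguments are hypothetical, the adjustments are assumed to combine multiplicatively, and "within 1 km" is assumed to include exactly 1 km.

```python
def adjusted_demand(population, under5, area_m2, km_to_health_center):
    """Demand weight w_i for one EA under the three adjustments described above.

    Assumption: the adjustments are applied multiplicatively in this order.
    """
    # Under-5 children are weighted 0.55, everyone else 0.45.
    demand = 0.55 * under5 + 0.45 * (population - under5)
    # Rural EAs (density below 0.0035 people per square meter) are scaled up by 1.5.
    if population / area_m2 < 0.0035:
        demand *= 1.5
    # EAs within 1 km of a health center keep only 10% of their demand.
    if km_to_health_center <= 1.0:
        demand *= 0.10
    return demand


# A made-up rural EA: 1000 people, 200 of them under 5, spread over 1 square km.
w_far = adjusted_demand(1000, 200, 1_000_000.0, km_to_health_center=5.0)
w_near = adjusted_demand(1000, 200, 1_000_000.0, km_to_health_center=0.5)
```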

4.1.2 The Capacitated Facility Location Problem

The Capacitated Facility Location Problem, or CFLP for short, was used to allocate HSAs to backpacks and select resupply centers from the existing network of health centers and hospitals.

In the general CFLP, a set of potential locations for facilities and a set of customers are given. The problem is to locate facilities and assign them to customers in a way that the total cost of using these facilities to satisfy customers’ demands is minimized. This problem is termed “capacitated” because each possible location has an upper limit on its supplying capacity.

A CFLP with each customer assigned to one facility (Capacitated Facility Location Problem with Single Source, or CFLPSS) generally contains two sets of decision variables, $x_{ij}$ and $y_i$. It can also be formulated as a binary linear program, where we seek to [10], [13]:

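\[
\begin{aligned}
\min \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij} + \sum_{i=1}^{m} f_i y_i \\
\text{s.t.} \quad & \sum_{j=1}^{n} a_j x_{ij} \le b_i y_i && \forall i \\
& \sum_{i=1}^{m} x_{ij} = 1 && \forall j \\
& x_{ij} \le y_i && \forall i, j \\
& x_{ij}, y_i \in \{0, 1\} && \forall i, j
\end{aligned}
\]

where

$y_i = 1$ if facility $i$ is picked, and 0 otherwise

$x_{ij} = 1$ if customer $j$ is served by facility $i$, and 0 otherwise

$a_j$ = demand of customer $j$

$b_i$ = capacity of facility $i$

$c_{ij}$ = cost of using facility $i$ to supply customer $j$

$f_i$ = fixed cost of opening facility $i$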
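As a small illustration of the CFLPSS, the Python sketch below enumerates every single-source assignment for a toy instance and keeps the cheapest one that respects the capacities. It is not the project's C++/Gurobi implementation; the function name and the instance data are made up, and it assumes nonnegative fixed costs, so that only facilities actually in use are opened.

```python
from itertools import product


def solve_cflpss(c, f, a, b):
    """Exhaustively solve a tiny single-source CFLP.

    c[i][j]: cost of serving customer j from facility i; f[i]: fixed cost;
    a[j]: demand of customer j; b[i]: capacity of facility i.
    Returns (total cost, assignment) with assignment[j] = serving facility,
    or None if no assignment respects the capacities.
    """
    m, n = len(f), len(a)
    best = None
    for assignment in product(range(m), repeat=n):
        load = [0] * m
        for j, i in enumerate(assignment):
            load[i] += a[j]
        if any(load[i] > b[i] for i in range(m)):
            continue  # capacity constraint violated
        opened = set(assignment)  # y_i = 1 exactly for the facilities in use
        cost = sum(c[i][j] for j, i in enumerate(assignment)) + sum(f[i] for i in opened)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Two made-up facilities with capacity 2 each, and three unit-demand customers.
c = [[1, 1, 4],
     [4, 4, 1]]
result = solve_cflpss(c, f=[5, 5], a=[1, 1, 1], b=[2, 2])
```

Because neither facility can serve all three customers, both must open; the cheapest plan serves the first two customers from facility 0 and the third from facility 1.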


In the case of HSA-to-backpack assignment, we assumed that:

• Fixed cost of adding an additional backpack ($f_i$) is equal to $352.91 [19], which is the initial cost of building a backpack with two weeks of supplies.

• Cost of traveling from HSA pair $j$ to backpack $i$ ($c_{ij}$) depends on the distance between the centroids of the EAs in which the two agents are located. Specifically, we assumed that traveling 30 km is equivalent to the fixed cost of a backpack because a previous survey showed that the maximum distance an HSA has to travel is 30 km [11].

• Demand of an HSA pair ($a_j$) is 1 backpack.

• Capacity of a backpack ($b_i$) is 3 pairs of HSAs. This number was obtained from conversations with contacts who had worked with these backpacks at St. Gabriel’s Hospital.
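The arithmetic implied by these assumptions can be sketched in a few lines of Python. The linear scaling of travel cost with distance is an assumption made here for illustration (the text states only that 30 km is equivalent to one backpack), and the function names are hypothetical. The HSA pair and backpack counts are those reported in the Executive Summary.

```python
import math

BACKPACK_FIXED_COST = 352.91   # f_i in dollars, from the text
EQUIVALENT_DISTANCE_KM = 30.0  # distance whose travel cost equals one backpack
HSA_PAIRS_PER_BACKPACK = 3     # b_i


def travel_cost(distance_km):
    """Dollar-equivalent travel cost c_ij, assuming linear scaling with distance."""
    return BACKPACK_FIXED_COST * distance_km / EQUIVALENT_DISTANCE_KM


def min_backpacks(hsa_pairs):
    """Capacity-only lower bound on the number of backpacks."""
    return math.ceil(hsa_pairs / HSA_PAIRS_PER_BACKPACK)


lower_bound = min_backpacks(6500)          # capacity alone requires 2167 backpacks
initial_cost = 2188 * BACKPACK_FIXED_COST  # cost of the 2188 backpacks in the plan
```

The 2188 backpacks in the reported solution sit slightly above the capacity-only bound of 2167, and multiplying 2188 by the fixed cost reproduces the reported initial cost of $772,167.08.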

In the case of backpack to resupply center assignment, we assumed that:

• Fixed cost (fi) of setting up a resupply center in a hospital or health center is 0.

• Variable cost (cij) is equal to the distance between the health facility i and the EA in which the backpack j is located.

• Demand (aj) of a backpack is 1.

• Capacity (bi) of a hospital is 80 backpacks and that of a health center is 10 backpacks.

4.2 Implementation

ArcGIS, a commercial Geographic Information System (GIS) software package available on Rice computers, was used to process all the geographic and population data used in this project. Three separate C++ routines were then written to interface with Gurobi to solve each of the distribution problems. Gurobi is a commercial large-scale linear programming solver that is accessible to Rice students through the Curriculum Linux Environment At Rice (CLEAR) network.

5 Ethical Considerations

Despite the effort that went into finalizing these models, we realized that our good intentions did not excuse us from considering their ethical implications. These ethical issues are particularly serious because our suggestions may be carried out by policymakers who do not actually understand the inner workings of these models. We therefore should continue to clearly communicate the limitations of our backpack distribution to our clients and beyond.

5.1 Model Inputs

The first ethical issue produced by our project is the possibility of sub-optimal results, because the two models used in the project depend greatly on the accuracy and resolution of input data. Besides obvious inputs such as distances and population data, this project also considered factors such as population density, percentage of under-5 children, and proximity to healthcare facilities. If these input data are incorrect, the model may end up assigning too few HSAs and backpacks to EAs that are in fact in urgent need of these resources. For example, population data used in this project were obtained from the 1998 Malawi Housing and Population Census, and adjusted to approximate 2008 levels using district-level growth rates published in the 2008 Housing and Population Census. However, the population has exploded from about 10 million in 1998 to about 15 million today, and different enumeration areas (EAs) in the same district may not have experienced uniform growth. As a result, EAs whose population growth exceeded their district’s average will likely be assigned less than adequate healthcare coverage.

5.2 Parameters Determination

Partially due to the difficulty of obtaining high-quality data, we had to make a number of assumptions when deciding model parameters and their weights. For example, we assumed that if the total number of backpacks assigned to a resupply center does not exceed its capacity, then the resupply center will be sufficiently stocked to resupply the backpacks. In reality this is highly unlikely. Like other less developed countries, Malawi is plagued with infrastructure problems, which means that health facilities, including major hospitals, are constantly facing shortages in critical medical supplies. We also read contradictory accounts of the roles and working styles of HSAs while gathering information for determining our model parameters. However, we were not paralyzed by this and have been careful to document the sources for our assumptions and clearly describe the limitations of our results.

5.3 Policy Implications

Despite our best efforts to model the situation in Malawi as realistically as possible, our models are not comprehensive. Population characteristics and distances between EAs are the main factors we have considered, but these two factors alone do not fully determine the quality of this healthcare delivery network. As Dr. Rebecca Richards-Kortum and some of our global health contacts have pointed out, irregularities in the supply chain and corruption are just two factors among possibly many others that affect the strength of this network.

Another issue is the trade-off between equity and efficiency when it comes to healthcare policies. For example, it might have made sense from an efficiency standpoint to locate backpacks in areas with high population density, so that the highest possible number of people could be served. However, such a plan would probably not have addressed the issue of equity between rural and urban populations. Another possible issue is that our plan might lead to differing levels of healthcare being provided to different ethnic groups, something we did not consider.

5.4 Potential Misuse of Model

Another set of ethical implications arises when we consider the possibility of our model being used for a purpose other than the one it was designed for. Our models could be used for achieving an unethical goal. For example, someone who wanted to wage biological warfare on the country of Malawi could use our model to determine how to initially infect the minimum number of people to achieve infection of the entire country. Our models could be used for this purpose because one of the criteria for the HSA allocation portion of our model was that the HSA network serve the entire population of Malawi. Another example of an unethical purpose that our models could help achieve is stealing medical supplies. According to our contacts who have spent time in Malawi, HSAs are known for selling medical supplies that they have received for their job on the black market. This is an unfortunate result of the fact that the HSAs are often underpaid. Our models could be used by the leader of a group that serves as an intermediary between HSAs and the black market by buying supplies directly from HSAs and then selling them again. This group could use our models to determine where to place its members across the country to buy these supplies from the HSAs. To do this, they could use the portion of our models designed to determine how to allocate HSAs to backpacks and backpacks to resupply centers.

6 Testing Plan

We created several tests to investigate the accuracy of our input data, the sensitivity of our models to perturbations in input data, and the accuracy of our implementation.

6.1 Accuracy of Input Data

As discussed in the Ethical Considerations section, our models rely heavily on data about the geography and population of Malawi. While we may not have other data as precise or (ideally) as accurate as those we used within the model, our data acquisition phase left us with many different data sources. In each of the following tests, we compared the International Food Policy Research Institute (IFPRI) data that we used for our project with data from one or more other sources, looking for high levels of consistency between data sets and thus increased confidence in the accuracy of our data. While there are some limitations to this type of test, we were able to obtain a better understanding of our data.

1. Accuracy of number of hospitals: For this test, we compared the IFPRI data that we used in our project with data from the Health Information Systems Programme (HISP). The HISP data set is from 2006, though it is unclear whether that is the date of data acquisition or of publication [1]. The IFPRI data comes from approximately 1998, based on our correspondence with them. The standard we hoped to achieve for this test was having 100% agreement on the number of district and central hospitals, 90% agreement on the total number of hospitals, and 80% agreement on the total number of health centers or other small healthcare facilities. We allowed for this gradual decline in accuracy as the size of the facility decreases because we believed that the existence of small health centers would likely be more fluid than that of larger hospitals. The numbers of hospitals and health centers listed in each data set are as follows:

Type of Facilities             IFPRI                 HISP
Central Hospitals              4                     4
District Hospitals             22                    22
Other Hospitals                23                    19
Rural Hospitals                No such designation   27
Total Healthcare Facilities*   710                   620

* This includes hospitals, maternity wards, dispensaries, health centers, and more.

We thus saw 100% agreement in the number of central and district hospitals; 92% agreementin the total number of hospitals (not including rural hospitals); and 87% agreement in thenumber of non-hospital healthcare facilities. Our data thus passed the tests for hospitalconsistency in terms of total numbers.

2. Accuracy of hospital locationsFor district hospitals, non-district hospitals, and health centers, we looked at the averagedistance from each of the points in the smaller data set to the closest point in the larger dataset. We hoped this distance would be less than 2 kilometers for central and district hospitals,5 kilometers for all hospitals, and 5 kilometers for all health centers. In practice, the averagedistance to the closest facility was 0.32 kilometers for district and central hospitals; 2.88kilometers for hospitals; and 1.68 kilometers for health centers, meeting our goals.

3. Geographic data consistencyIn this we compared the IFPRI data on Traditional Authorities (TAs) with the MalawiAdministration Level 2 2000-2010 file from the University of North Carolina (UNC) database.The IFPRI data were obtained from the 1998 Malawi Housing and Population Census whereasthe UNC data came from the 2008 Malawi Housing and Population Census. We hoped tosee 100% agreement in the raw number of TAs. In fact, the IFPRI and UNC data sets bothcontained 367 TAs. Furthermore, the centroids of these TAs were within 10−4 meters of oneanother. This is better than we had hoped for (2 km average) and could serve to increaseour confidence in the data. However, the fact that the primary source of these two data setscome is the same agency makes the comparison less meaningful.

4. Population data recencyWe would like our population data to accurately reflect the most recent data available. Wehad hoped to obtain EA level population data from the 2008 census for this project. However,we were only able to obtain EA-level data from 1998. To compensate for this, we increasedthe population level for each EA according to the Intercensal Population Increase Percent1998-2008 for the district containing the EA. These growth rates were obtained from the2008 Malawi Population and Housing Census.

5. Population data consistency
For this test, we compared the 2008 district population values with the 1998 district population values adjusted by the 1998-2008 growth rates. We hoped to see 90% agreement in population levels from these two numbers. In fact, most of the districts had >99% agreement. However, there were two main exceptions. First, the Mwanza district was split in 2003 into Mwanza and Neno districts, making the actual population of Mwanza 45% of the expected one. When the published values for Mwanza and Neno were combined, however, they were within 97% of our prediction. Second, our projected population of Zomba District was 509,590 compared with the published value of 579,630, putting us only within 88% of the published value. The reason for this discrepancy is not clear.
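The projection in test 4 and the agreement check in test 5 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and we assume that "agreement" means the ratio of the smaller value to the larger one, which is consistent with the Zomba figure quoted above but is not stated explicitly in the report.

```python
def project_population(pop_1998, growth_pct):
    """Scale a 1998 population by a district-level intercensal growth percentage."""
    return pop_1998 * (1.0 + growth_pct / 100.0)


def agreement(projected, published):
    """Assumed definition: ratio of the smaller value to the larger one."""
    low, high = sorted((projected, published))
    return low / high


# Zomba District figures quoted in test 5 (projected vs. published).
zomba = agreement(509_590, 579_630)
print(f"Zomba agreement: {zomba:.1%}")
print("Meets the 90% goal:", zomba >= 0.90)
```

Under this definition the Zomba projection falls just short of the 90% target, matching the roughly 88% reported above.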

6.2 Sensitivity to Small Changes in Data

To mitigate the impact of inaccurate input population data as well as locations and reliability of health facilities, we continued to search for and update our input files throughout the project. We hoped to allow easy update of input data by users of our models so as to allow continual improvement of their accuracy. While this function is important, it could lead to implementation problems for users of our models if our outputs are too sensitive to small changes in inputs. We hoped that the number of backpacks predicted by our model would change in a manner commensurate with changes in population data. If this number changes significantly with small changes in population data, our results may lead to gross over-estimation, or worse, under-estimation. Furthermore, we hoped that in case of missing health facilities or supply disruption to some resupply centers, the backpacks could be quickly re-assigned to another resupply center within a reasonable distance.

1. Sensitivity to change in population data
We hoped to artificially perturb the population levels of a portion of the EAs and compare the number of backpacks and resupply centers predicted by our models using the perturbed data with that generated by the original data to assess the sensitivity of our models to population data. We hoped to generate the following table after the tests:

% of EA Populations   Extent of          Average % Change in    Average % Change in
Perturbed             Perturbation (%)   Number of Backpacks    Number of Resupply Centers

5                     5
5                     10
10                    5
10                    10
20                    5
20                    10
50                    5
50                    10
100                   5
100                   10

For our models to pass the test, the resulting average % change in number of backpacks or resupply centers should be commensurate with the change in population data. That is:

\[
\text{\% change in number of backpacks or resupply centers} \;\le\; 2 \times \left(\text{\% of EA populations perturbed} \times \text{extent of perturbation}\right)
\]

Unfortunately, we were unable to complete this test due to unforeseen obstacles in using Gurobi. Given more time, we would hope to carry out this test to better understand the solution space of our models.

2. Sensitivity to strength of supply chain
To test the sensitivity of our models to the reliability of the supply chain, we hoped to randomly shut down 5% of the resupply centers by changing their capacities to 0. We would then examine if the affected backpacks would be able to find reasonable alternative resupply centers by running our models with a reduced set of input data that would include only unassigned backpacks, unassigned resupply centers and assigned resupply centers with spare capacities. If the assignment of affected backpacks gave rise to backpack-resupply center distances that were less than 1.5 times the average backpack-resupply center distance, our models would be considered to have passed the test. We hoped to repeat this test by disrupting resupply centers in different parts of Malawi.

From our conversations with contacts in Malawi and our tests on the accuracy of health facility locations, we found out that health centers have the least consistency across different data sets and that they are most likely to be affected by supply chain disruption. We therefore conducted a modified version of test 2, where we tested the scenario that all health centers would be shut down. The average travel distances more than tripled in such a case, which is undesirable. Most likely, BTB will have to use a combination of hospitals and health centers and will have to assess the reliability of health centers before using them.
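Although the population perturbation test (test 1) was not completed, its setup can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, both the share of EAs perturbed and the extent of perturbation are treated as fractions (so 10% of EAs perturbed by 5% gives a bound of 2 x 0.10 x 0.05 = 1%), each perturbed EA is moved up or down at random, and the backpack counts passed to the check are placeholders rather than model outputs.

```python
import random


def perturb_populations(populations, fraction, extent, seed=0):
    """Return a copy in which `fraction` of the entries are scaled up or down by `extent`."""
    rng = random.Random(seed)
    perturbed = list(populations)
    n_changed = round(fraction * len(perturbed))
    for i in rng.sample(range(len(perturbed)), n_changed):
        sign = rng.choice((-1.0, 1.0))  # assumed: direction chosen at random
        perturbed[i] = perturbed[i] * (1.0 + sign * extent)
    return perturbed


def passes_sensitivity_test(baseline_count, perturbed_count, fraction, extent):
    """Pass criterion: relative change <= 2 * (fraction perturbed * extent)."""
    change = abs(perturbed_count - baseline_count) / baseline_count
    return change <= 2.0 * fraction * extent


# Hypothetical input: 200 EAs of 1000 people each, 10% of them perturbed by 5%.
populations = [1000.0] * 200
perturbed = perturb_populations(populations, fraction=0.10, extent=0.05)
n_changed = sum(1 for a, b in zip(populations, perturbed) if a != b)
print(n_changed, "EAs perturbed")

# Placeholder backpack counts standing in for two solver runs.
print(passes_sensitivity_test(2188, 2190, fraction=0.10, extent=0.05))
```

In a full test, the perturbed populations would be fed back through the p-median and CFLP codes, and the resulting counts averaged over several random seeds for each row of the table.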

6.3 Accuracy and Efficiency of Model Implementation

While the codes for the two models we used in this project were not highly complicated, there were still many opportunities for errors. One challenge was that we were working with very large data sets and were therefore loading files of over a million lines. Another challenge, which could be caused by data loading errors as well as other bugs, is that we could have created the wrong integer program. Errors could also have arisen even if we had created the correct integer program but the optimization by Gurobi was not done correctly. We could also have output the information incorrectly. Furthermore, it is possible that it could have taken a long time for the solution to be found. Lastly, our integer program might not have had a solution. Fortunately, we were able to test for and avoid most of these potential errors.

1. Accuracy of parameter weighting
The node weights used in the p-median problem were weighted based on population levels and population characteristics. These node weights were generated using Excel. For ten randomly chosen EAs these values were computed by hand from the distance to the closest health center, the EA population updated to approximate the 2008 level, the percentage of the population under 5, and the population density. For 100% of the chosen FIDs the Excel-generated and hand-computed values were the same.

2. Accuracy of reading inputs
Our codes should accurately read inputs from the input files. To test this, we ran our code on Matlab-generated test input files that were smaller in size than the actual input files. We hoped to achieve 100% accuracy in reading inputs.

• p-Median Node Weights
We hoped to randomly select 100 of the FIDs and node weights of the 9218 nodes read in by the p-median code and test how well they agree with the actual values given in the input file. To do so, we used Matlab to generate a randomly ordered list of 100 numbers between 0 and 9217. The FIDs and node weights for the 100 nodes corresponding to the random list of 100 numbers were printed to a text file. We found that 99 of the generated FIDs were included in the full Malawi problem with the zero nodes removed and 100% of the outputted node weights matched exactly with those given in the input file.

• p-Median Edge Weights
For this test, we hoped to randomly select 100 of the 9218-by-9218 edge weights read in by the p-median code and again test how far they agree with the actual values given in the input file. We used Matlab to generate a random list of 100 pairs of numbers between 0 and 9217. The corresponding values within the edge weight matrix were outputted along with the randomly generated pairs into a text file. Not all of the decimal digits of the edge weights were printed, so the outputs differed from the inputs on the order of hundredths of meters. Otherwise 100% of the outputted edge values were the same as the corresponding ones in the input file.

• CFLP Variable Costs
We used Matlab to generate a 31-customer-by-20-facility variable cost matrix. The variable costs were read in by our code and then outputted to a text file. We then compared the output and input files, which achieved 100% agreement on the values of variable costs.

3. Correctness of setting up the model
For this test, we used the CFLP code to write out a .lp file for a small problem of 10 customers and 10 facilities. An .lp file explicitly writes out the objective function and constraint functions of the model that Gurobi is given to solve. The .lp file written by the CFLP code was exactly what it was supposed to be.

4. Correctness of overall model code
To test that we were writing and solving our integer programs correctly, we used our codes to solve p-median and CFLP test problems with known integer solutions and hoped to achieve 95% accuracy. We chose six of the test problems from OR-Library [3] to run the p-median code on. Among them were four problems with 100 nodes, one problem with 400 nodes, and one problem with 900 nodes. We tested the CFLP code on six of the test problems that Professor Kaj Holmberg kindly shared with us [10]. These test problems can be found on Professor Holmberg's website: http://www.mai.liu.se/~kahol/problemdata/cloc/. For 100% of the problems the optimal objective function value returned by the p-median and CFLP codes was the same as the given optimal answer. While our code produced the correct results, it is possible that issues that arise for the very large data sets that we are using will not arise for a small problem. Therefore we also hoped to run our code on a test problem with a known integer solution that had at least 1 million nodes. Unfortunately, however, we were unable to locate a test problem of the desired size.

5. Computation speed
While the speed at which we are able to generate and solve our integer programs was not of primary importance, we were concerned that it should not take an unreasonable amount of time for the integer programs to be solved, assuming there was a solution. As stated in our Design Criteria, we wanted our code to return the optimal solution in less than 7 hours. This meant that the total run time for the p-median problem and the CFLP should be less than 7 hours because the output of the p-median problem was part of the input for the CFLP. However, due to the large scale of our problems, we were unable to achieve this goal.
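The idea behind test 4, comparing a solver's answer with a known optimum, can also be applied to tiny instances whose optimum is found by exhaustive search. The Python sketch below is not the code used in this project; it is a hypothetical brute-force p-median reference that is only practical for a handful of nodes, and the four-node instance is invented for illustration.

```python
from itertools import combinations


def p_median_brute_force(weights, dist, p):
    """Return (best_cost, best_medians) by enumerating every set of p medians."""
    n = len(weights)
    best_cost, best_medians = float("inf"), None
    for medians in combinations(range(n), p):
        # Each node is served by its closest chosen median.
        cost = sum(weights[i] * min(dist[i][j] for j in medians) for i in range(n))
        if cost < best_cost:
            best_cost, best_medians = cost, medians
    return best_cost, best_medians


# Invented instance: four unit-weight nodes on a line at positions 0, 1, 5 and 6.
positions = [0, 1, 5, 6]
dist = [[abs(a - b) for b in positions] for a in positions]
weights = [1, 1, 1, 1]

cost, medians = p_median_brute_force(weights, dist, p=2)
print(cost, medians)
```

Any integer-programming implementation run on the same instance should report the same objective value; the chosen medians may differ when several optimal solutions exist, as they do here.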

7 Obstacles

Some of the main obstacles we faced in modeling and solving our problem included issues obtaining accurate data, determining model parameters, using new software programs, and solving large integer programs.

7.1 Obtaining Inputs for the Models

To implement our model we needed to create a graph of Malawi with nodes representing population centers and edges representing travel times between them. We also needed to know the locations of the health facilities that we could use to resupply the backpacks. In the fall semester we found published files from the 2008 Housing and Population Census on the website of the National Statistical Office of Malawi. At the very beginning of the spring semester we were introduced to GIS (Geographical Information Systems) and started to search for GIS files for the country of Malawi. We were not able to find EA-level GIS files or population files from the 2008 census. We instead used GIS and population files obtained from Todd Benson of the International Food Policy Research Institute, who had worked with the National Statistical Office of Malawi to publish "Malawi: An Atlas of Social Statistics" in 2002 [4]. Unfortunately, these data dated from the late 1990s. Since the population of Malawi increased from 9 million in the late 1990s to 13 million in 2008, it was important that we update the EA population values to reflect this change. We used district-level growth rates from the 2008 Census to update the population values. Since there are 9218 EAs yet only 31 districts, this was not an ideal method. It was frustrating knowing that there may have been more relevant data in existence that was not accessible to us.

7.2 Obtaining Information about the Situation of HSAs

A challenge throughout the project has been trying to make sense of the inconsistencies between the Ministry of Health's goals for the HSA program, BTB's vision for the HSA backpacks, and the reality of how the current health system operates in Malawi. One particular example of this confusion was that BTB has developed both HSA and Community Health Worker backpacks. From the official description of the HSA job, the differences between the roles of the two different packs seemed significant. However, only the Community Health Worker backpacks had been field tested at St. Gabriel's Hospital, and BTB was anxious for us to use those results to inform our understanding of the project and model. It was challenging trying to determine how relevant this information really was to our project with the HSA backpacks.

In general, trying to understand the current situation of HSAs in Malawi was difficult. While we contacted many different non-governmental organizations as well as individuals with experience working with HSAs in Malawi, few of these correspondences provided useful information. In the end we dealt with this challenge by making our models relatively simple and by relying on a few reputable sources, namely "The Role of Health Surveillance Assistants (HSAs) in the Delivery of Health Services and Immunizations in Malawi" [11] for travel distances that could be expected for an HSA, and the Backpack Cost Analysis spreadsheets produced by Dan Walk [19] for BTB's expectations of the costs and usage of the packs.

7.3 Determining Parameters for the Models

Throughout our project we struggled with the disparate perspectives of BTB and the academic community of operations research. Our contacts at BTB were used to dealing with the inaccessibility of information in the less developed world and often seemed frustrated by our inquiries for specific statistics. When we spoke with experts in operations research, they were very comfortable choosing model parameters that were not strongly supported by data. Since we felt it was important that our model be as relevant as possible, the lack of receptiveness to both our concern about the importance of this and our inability to find these exact values was challenging. Our stress from this even led us to set up a Skype phone call with one of our advisors while he was overseas. In the end our solution was to make our models as simple as possible. Experiencing this disconnect between on-the-ground global health and academic operations research experts also gave us confidence in the importance and potential impact of the work we were doing.

7.4 Learning New Software and Computer Languages

This semester we were challenged when learning to work with new software and/or computer languages. We used the resources of Fondren Library to learn ArcGIS, which enabled us to convert GIS files into the input files for the models and to convert the outputs from the models into graphical and statistical results. We also learned how to code in C++ and how to use the Linux command line and operating environment. In all of these instances the files were large and codes ran for long amounts of time, requiring patience and attention to detail. Running our integer programs was also demanding for Gurobi and this was a challenge as well. Even though we had a basic understanding of Gurobi, solving this problem required us to learn many more details about Gurobi and also led us to realize some of its limitations.

7.5 Solving Large Integer Programs with Gurobi

We initially used the Curricular Linux Environment at Rice (CLEAR) to access Gurobi and to compile the C++ interface codes. However, the Gurobi variables were so numerous that the codes ended up taking over all the computer memory and threads of the CLEAR environment. Thanks to the help of Academic Computing staff member Karl Burkett, a dedicated server was created for our use, hence removing the concern of our code taking over shared resources.

However, the problems were still too large to be solved and we were forced to reduce the number of Gurobi variables created by limiting the number of potential service providers to those within a specified distance. We were also forced to limit the Threads parameter in Gurobi to one because when multiple threads were used, each thread required a large amount of computer memory. This was frustrating because Gurobi prides itself on being able to take advantage of multiple processors, and by limiting Gurobi to only one thread this feature was not utilized. Lastly, the programs were generally relatively slow to solve. This obstacle was made worse because we did not realize that turning on the Presolve parameter was important even when the process did not actually remove rows or columns in the constraint matrix. Email exchanges with one of the founders of Gurobi and an employee were helpful in resolving many of these issues.
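The variable reduction described above, creating an assignment variable only when a potential provider lies within a specified distance, can be sketched as follows. This is a hypothetical Python illustration rather than the project's C++ code: the function names, the planar kilometer coordinates, and the 5 km cutoff are all assumptions made for the example.

```python
import math


def candidate_pairs(customers, facilities, max_km):
    """List (customer, facility) index pairs no farther apart than max_km."""
    pairs = []
    for i, (cx, cy) in enumerate(customers):
        for j, (fx, fy) in enumerate(facilities):
            if math.hypot(cx - fx, cy - fy) <= max_km:
                pairs.append((i, j))
    return pairs


def uncovered_customers(n_customers, pairs):
    """Customers left with no candidate facility; these would make the model infeasible."""
    covered = {i for i, _ in pairs}
    return sorted(set(range(n_customers)) - covered)


# Invented coordinates in kilometers.
customers = [(0.0, 0.0), (10.0, 0.0), (50.0, 50.0)]
facilities = [(1.0, 0.0), (12.0, 0.0)]

pairs = candidate_pairs(customers, facilities, max_km=5.0)
print(len(pairs), "variables instead of", len(customers) * len(facilities))
print("uncovered customers:", uncovered_customers(len(customers), pairs))
```

The second helper highlights the trade-off in choosing the cutoff: a tight distance limit shrinks the model, but any customer left without a candidate provider has to be handled by widening the limit for that customer.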

8 Results

Despite these obstacles, we managed to produce results that met our primary objectives as well as those of BTB. In particular, we succeeded in using ArcGIS and Gurobi to obtain data about the population and geography of Malawi, translate this data into model inputs, solve our models, and interpret these results. We assigned EAs to HSAs with an average distance of only 0.39 km; determined that 2188 backpacks would be necessary to optimally supply all these HSAs, resulting in a total start-up cost of $772,167.08; and assigned each backpack to a resupply site, with varying distances depending on whether health centers or only hospitals were considered. All of these results were determined within approximately 24 total hours and were within 1% of optimal. The details of these results, a summary of our code statistics, and a discussion of how well the results met our design criteria are provided below.

8.1 Model Outputs

We divided the problem into three parts: HSA-to-EA assignment, Backpack-to-HSA assignment, and Resupply Center-to-Backpack assignment. In each case, we looked at both the entire country of Malawi and specifically the district of Lilongwe, which is the district with the highest node weight sum and also the location of St. Gabriel's Hospital. The results for each sub-problem are described below.

8.1.1 HSA-to-EA assignment

In the Malawi case, we assigned 6500 HSA pairs to 9147 EAs (originally 9218, but we removed 71 EAs that had a population of 0) using the p-median problem. We then calculated the average distance between each EA centroid and the location of the HSA pair serving that EA, represented by the centroid of the HSA's base EA. Our model resulted in an average distance of 0.39 km, a median of 0 km, and a maximum distance of 12.7 km. While it is important to note that this does not necessarily represent the distance from an HSA to their farthest patient, these distances nonetheless appear reasonable, especially since most EAs are only a few square kilometers in area.

The number of EAs served by each HSA pair is shown in Table 1. Most HSA pairs are serving only one or two EAs. However, a few are serving 10 or more EAs, with the maximum number being 35. When we traced the HSAs serving large numbers of EAs, we found that they were centered in urban areas (where EAs are much smaller, closer together, and more dense) and located near health centers. This finding makes sense, as we assumed that EAs located near a health center could have the majority of their needs met by that health center and would only have 1/10 the demand for an HSA of an EA without such access.

Number of EAs per HSA   Number of HSA pairs

1                       4909
2                       1073
3                       335
4                       95
5-9                     71
10-19                   14
20+                     3

Table 1: Number of HSAs by the number of enumeration areas they serve.
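A tabulation like Table 1 can be derived directly from the EA-to-HSA assignment output. The Python sketch below is illustrative only: the function names and the small assignment dictionary are hypothetical, while the dictionary `table_1` simply restates the counts from Table 1 so that their total can be checked against the 6500 HSA pairs.

```python
from collections import Counter


def bin_label(n_eas):
    """Map a number of EAs to the bins used in Table 1."""
    if n_eas <= 4:
        return str(n_eas)
    if n_eas <= 9:
        return "5-9"
    if n_eas <= 19:
        return "10-19"
    return "20+"


def eas_per_hsa_table(assignment):
    """assignment maps EA id -> HSA pair id; returns bin label -> number of HSA pairs."""
    eas_per_hsa = Counter(assignment.values())
    return Counter(bin_label(n) for n in eas_per_hsa.values())


# Invented assignment: HSA pair "A" serves 2 EAs, "B" serves 1, "C" serves 5.
assignment = {0: "A", 1: "A", 2: "B", 3: "C", 4: "C", 5: "C", 6: "C", 7: "C"}
print(eas_per_hsa_table(assignment))

# Counts restated from Table 1; they should add up to all 6500 HSA pairs.
table_1 = {"1": 4909, "2": 1073, "3": 335, "4": 95, "5-9": 71, "10-19": 14, "20+": 3}
print(sum(table_1.values()))
```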

In the Lilongwe case, we assigned 615 HSA pairs to 861 EAs. The number of HSAs assigned here was based on a 1:1000 ratio between HSAs and the total population of Lilongwe, similar to how we derived the number of HSAs assigned to the entire country. In this case, the average distance between each EA and the assigned HSA pair was 0.46 km, with a median of 0 km and a maximum of 4.4 km. The assignments of these HSAs are shown in Figure 1. This and the other maps below were created using a combination of ArcGIS and Matlab and indicate connections between servers (in this case, HSAs) and assignments (in this case, to EAs). As this graph shows, most HSA pairs in Lilongwe have neither long distances to travel nor many EAs to serve. As opposed to the entire Malawi case, in the Lilongwe case the maximum number of EAs assigned to one HSA pair was 5.

Figure 1: HSA-to-EA assignments in Lilongwe District. Points represent the location of each HSA pair serving one EA. Lines indicate paths between HSAs and EAs for HSA pairs serving more than one EA.

8.1.2 Backpack-to-HSA assignment

Using the CFLP for backpack assignments allowed us to determine the number of backpacks needed as well as optimal locations and HSA assignments of these packs. In the entire Malawi case, our model assigned 2188 backpacks to the 6500 HSA pairs. With a per-backpack initial cost of $352.91, this corresponds to a total start-up cost of $772,167.08. We found that 2129/2188 (97.3%) of these backpacks were serving at full capacity of 3 HSA pairs, suggesting that this is the main limiting constraint in our model. The average distance between each HSA pair and the backpack serving it was 1.8 km, with a median of 1.9 km and a maximum of 21.7 km.

In the Lilongwe case, our model assigned 205 backpacks to 615 HSAs, with 100% of backpacks serving at full capacity. This corresponds to a total start-up cost of $72,346.55. The assignments are shown in Figure 2. The average distance between each HSA pair and the backpack serving it was 1.6 km, with a maximum of 9.1 km.
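The cost and capacity figures above follow from simple arithmetic, which the short Python check below reproduces. The per-backpack cost, the capacity of 3 HSA pairs, and the backpack counts come from the text; the function names are hypothetical, and the "minimum backpacks" value is just the capacity lower bound, ignoring distance.

```python
from decimal import Decimal
from math import ceil

UNIT_COST = Decimal("352.91")  # per-backpack initial cost in dollars
CAPACITY = 3                   # HSA pairs served by one backpack at full capacity


def startup_cost(n_backpacks):
    """Total initial cost of n_backpacks backpacks."""
    return UNIT_COST * n_backpacks


def min_backpacks(n_hsa_pairs):
    """Capacity-only lower bound on the number of backpacks."""
    return ceil(n_hsa_pairs / CAPACITY)


print("Malawi cost:", startup_cost(2188))
print("Lilongwe cost:", startup_cost(205))
print("Malawi lower bound:", min_backpacks(6500))
print("Lilongwe lower bound:", min_backpacks(615))
print("Share at full capacity: {:.1%}".format(2129 / 2188))
```

The Lilongwe solution uses exactly the capacity lower bound, which is why every backpack there is full, while the Malawi solution uses 21 more backpacks than the bound, presumably to keep distances short in areas where HSA pairs are far apart.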

Figure 2: Backpack-to-HSA assignments in Lilongwe District. Lines indicate paths between each backpack and the HSAs it serves.

8.1.3 Resupply Center-to-Backpack assignment

In assigning resupply centers to backpacks, we considered two different cases. First, we allowed both hospitals and health centers to act as resupply centers. Next, we assumed that health centers would not have a reliable enough supply chain to serve the backpacks and instead based our model solely on the hospital network. Both of these cases are shown for the entire country of Malawi in Figure 3.

In the hospital and health center case, 46/48 hospitals and 574/661 health centers were chosen as resupply centers. The mean number of backpacks served by each hospital was 4.09, with a maximum of 12; for health centers, the mean was 3.48 backpacks, with a maximum of 10. The large number of facilities chosen led to short distances between each backpack and its resupply site, with a mean of 5.1 km and a maximum of 28.4 km. In the hospital-only case, 49/49 hospitals were chosen as resupply centers. The mean number of backpacks served by each hospital was 40, with a maximum of 80. As Figure 3 shows, the distances HSAs would need to travel in this case were much greater than in the health center and hospital case, with an average of 19.8 km and a maximum of 70.5 km. The hospital-only case also introduced a few other difficulties, including that some HSAs would need to cross Lake Malawi to reach a resupply center because of a lack of nearby hospitals.

Figure 3: Resupply Center-to-Backpack assignments across Malawi, when either (L) both hospitals and health centers or (R) hospitals only are allowed as potential resupply sites.

When our model was restricted to consider only the backpacks and potential resupply centers located in Lilongwe District, the results were similar. The health center and hospital case again showed shorter distances between backpacks and resupply centers, with an average distance of 5.4 km and a maximum of 25.7 km, whereas the hospital-only case resulted in an average distance of 28 km with a maximum of 62 km. These results were complicated by the fact that the two hospitals in Lilongwe District were not capable of serving all 205 backpacks with our initial hospital capacity of 80 backpacks per hospital. We therefore raised the capacity to 150, which resulted in one hospital serving 89 backpacks and the other serving 116. In the hospital and health center case, 37/37 health centers and both hospitals were chosen. The hospitals served 6 and 2 backpacks, respectively, while the health centers served a mean of 5.3 backpacks and a maximum of 10.
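The capacity problem in the Lilongwe hospital-only case can be caught before solving with a simple aggregate check. The Python sketch below uses the hospital counts, capacities, and backpack numbers quoted above; the function names are hypothetical, and the check is only a necessary condition, since distance cutoffs can still make a model infeasible even when total capacity is sufficient.

```python
from math import ceil


def total_capacity_ok(n_sites, capacity_per_site, n_backpacks):
    """Necessary condition for feasibility: total capacity covers all backpacks."""
    return n_sites * capacity_per_site >= n_backpacks


def min_uniform_capacity(n_sites, n_backpacks):
    """Smallest per-site capacity that could serve all backpacks."""
    return ceil(n_backpacks / n_sites)


print(total_capacity_ok(2, 80, 205))    # Lilongwe, initial capacity
print(total_capacity_ok(2, 150, 205))   # Lilongwe, raised capacity
print(total_capacity_ok(49, 80, 2188))  # all Malawi, hospitals only
print(min_uniform_capacity(2, 205))
```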

8.2 Code Details

In order to achieve these results, we used three different codes (a p-median program for HSA assignment, a CFLP program for backpack assignment, and a CFLP program for resupply center assignment), applied to both the entire data set and the Lilongwe subset. The statistics for each of these programs are shown in Table 2. A few trends from this table stand out. First, the runtime and gap percent generally decrease as the size of the problem decreases, though this is not the case when we compare the p-median problem to the CFLP because of the differing structure between the two problems. In all cases, we were able to obtain gap percentages less than 1%, indicating that the obtained solution was within 1% of the numerical optimum. In some of the codes, such as the HSA-to-Backpack assignment in Malawi, the solution could be solved to even greater optimality, but we chose to instead stop the code after about 12 hours to keep the runtime reasonable. This decision was closer in line with our design criteria, reviewed in the next section.

Region     Measurement           EAs to HSAs   HSAs to Backpacks   Backpacks to Health     Backpacks to
                                                                   Centers and Hospitals   Hospitals

Malawi     Number of Variables   23 million    10 million          1.6 million             0.11 million
Malawi     Runtime               11.5 hr       12.5 hr             3 min                   5 sec
Malawi     Gap Percent           0.0100%       0.7979%             0.0037%                 0.0000%

Lilongwe   Number of Variables   741,181       378,708             8,034                   412
Lilongwe   Runtime               81 sec        2 hr                15 sec                  3 sec
Lilongwe   Gap Percent           0.0000%       0.0100%             0.0000%                 0.0000%

Table 2: Statistics for each code run. Number of variables indicates the number of Gurobi variables created after variable reduction; runtime indicates the total runtime involved in reading the data, setting up the problem, and solving the problem; and gap percent indicates the gap between the lower bound and upper bound that Gurobi obtained through branch-and-bound.
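The gap percent in Table 2 can be understood as a relative difference between the best bound and the best feasible solution found. The Python sketch below assumes the usual convention, which to our understanding is also the one Gurobi reports: the absolute difference between bound and incumbent divided by the absolute value of the incumbent. The numeric inputs are invented, and the 0.01% threshold is our reading of the default Gurobi gap, consistent with the 0.0100% entries in the table.

```python
def relative_gap(best_bound, incumbent):
    """Relative gap |bound - incumbent| / |incumbent| (assumed convention)."""
    if incumbent == 0:
        return 0.0 if best_bound == 0 else float("inf")
    return abs(best_bound - incumbent) / abs(incumbent)


ASSUMED_DEFAULT_GAP = 1e-4  # 0.01%, our reading of the default stopping tolerance

# Invented objective values for a minimization problem.
gap = relative_gap(best_bound=99.0, incumbent=100.0)
print(f"gap = {gap:.4%}")
print("would stop at assumed default tolerance:", gap <= ASSUMED_DEFAULT_GAP)
```

For a minimization problem, a gap of 0.7979% means the true optimum can be at most about 0.8% below the reported solution value.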

8.3 Design Criteria Recap

With these results obtained, we were able to assess how well our final model met our original design criteria. Overall, our model was successful in meeting the original criteria we had set. Our assessment of each individual criterion is listed below.

• Model Relevance
As intended, our final model achieved all three objectives of allocating HSAs, assigning backpacks to HSAs, and assigning backpack resupply centers.

• Doability
It took less than 1 hour for us to access Gurobi, our software solver, from the Rice campus as desired. However, it took significantly more than 10 hours for us to completely program our C++ codes, especially when considering debugging time. Still, we were able to successfully solve all of our models by the end of the semester, suggesting that our methods were sufficiently doable.

• Numerical Accuracy
Each of our models produced an output well within our initial goal of 5% of the optimal.

• Ease of Update
Our longest code takes 12.5 hours to run, longer than our initial goal of 7 hours. However, this runtime was still short enough for us to run it approximately overnight, which had been our initial justification for choosing 7 hours. The longer runtime allowed us to greatly improve our numerical accuracy. Parameters for node weights could probably be updated within 1 hour, as this weighting was mainly done through a simple Excel routine. However, it would take longer than 1 hour to update edge weights because of the sheer number of edge weights to be calculated. These parameters were determined using ArcGIS and formatted by a shell script, the combination of which would probably take at least 3 hours. We were able to find a way to remove unwanted variables, such as EAs with zero population, and updating that could be done within our goal of 3 hours. Major changes to constraints and objective functions, to the extent that we are no longer using the p-median or CFLP, may take longer.

• Cost
We were able to run our models on personal computers using only software freely available to Rice students on the Rice network (in particular, ArcGIS and Gurobi); however, our solution did require establishing our own dedicated server.

• Accessibility of Outputs
While we did not ask Dr. Richards-Kortum and Dr. Oden to rank this directly, they seemed happy with the maps and tables that we had produced and said they thought they were formatted appropriately for their use.

• Comprehensiveness
Our model included a plan for scaling up the backpack program to cover 100% of the country, better than our 95% goal.

• Stability of Proposed Plan
We did not have time to systematically test the stability of our final results. We did see slightly different results depending on how long we allowed the programs to run and how small the resulting gap was.

• Accuracy of Inputs
This criterion is discussed in the section on our Testing Plan. We were not able to directly test the accuracy of our inputs, but we did see good consistency across different data sources. Nevertheless, we expect that there may be significant errors within our inputs, including within the population data (as we had to project 1998 population levels to 2008), the health center location data (also from the late 1990s), and the distance data (as we used straight line distances as a proxy for travel times). Furthermore, we did not manage to incorporate some of the variables that we had initially aimed to include in our model, such as seasonality.

• Reasonableness of Outputs
We did not ask Dr. Richards-Kortum and Dr. Oden to rank the reasonableness of our outputs. However, from talking to them it appears that our model results do not conflict with their intuitions about backpack placements.

9 Future Work

While we managed to achieve most of our design goals for this project, there is still more work that we would like to do or see done in the future. These changes are primarily related to exploring the solution spaces of our models and implementing our plan.

9.1 Exploration of Solution Spaces

As discussed in the Obstacles section of this report, one of the largest and most unexpected challenges of this problem was that we were continually frustrated by the unpredictability of how Gurobi solved the linear integer programs. For example, some smaller problems (e.g. HSA to backpack assignment in Lilongwe) that we expected would be easier for Gurobi to solve often ended up taking longer than larger ones (e.g. backpack to resupply assignment in Malawi). In order to explore the solution spaces of these models it would be necessary to understand Gurobi more thoroughly. Because of Gurobi's algorithmic design, and because most of its users are chiefly interested in finding a feasible integer solution within some tolerance of optimal rather than exploring the space of near-optimal feasible solutions, the feasible solutions we found were always the same. An increased understanding of Gurobi could lead to reduced run times, which would be useful because it currently takes over twelve hours for the EA to HSA and HSA to Backpack codes to reach the default Gurobi gap. More knowledge about Gurobi could also allow for the development of a feature in the heuristic exploration of nodes that allowed different feasible integer solutions to be found for a single problem formulation.

9.2 Implementation

One major direction that future work on this problem could take is the creation of a detailed implementation plan for the scale-up of BTB's HSA backpack program in the country of Malawi. This effort would contain three different focuses: obtaining more recent and specific inputs, determining the regions of Malawi where the program would be implemented, and outlining a detailed yet feasible plan that would include the salient features of the model solutions.

9.2.1 Obtaining Improved Data for Inputs

In the future, we would like to obtain GIS files from the most recent census with new EA shapefiles and population values. To our knowledge the most recent census in Malawi was conducted in 2008. However, the population has already increased by at least 2 million since then. We would want to update the population values for the EAs to reflect the most current population value.

We would also like to have data about the travel times between EAs as opposed to using the straight line distances between centroids of EAs as a proxy. Especially for large EAs, the current approximation is not very accurate. Furthermore, while we are unsure of the road conditions across much of Malawi, we know that the road quality is poor in some areas, which could make some of the distances between backpacks and resupply centers unreasonable proxies for travel times. We also know that most HSAs travel largely on foot and that travel times along paths not accessible by car could be very different from what the straight line distance suggests. However, we realize that this data would be challenging to obtain.


As discussed in our Testing Plan and Results sections, we found that using hospitals only as resupply centers would produce unreasonable travel distances. Most likely, BTB would have to use a combination of hospitals and health centers to supply the backpacks. Thus, to prepare our plan for implementation we would need to consider the reliability of the supply connection between health centers and hospitals and learn more about which health centers we could expect to act as resupply sites.

9.2.2 Choosing Pilot Districts for the Scale-up Plan

The implementation of the HSA backpack program will likely take place in a few districts across Malawi rather than the entire country. Our preliminary analysis showed that Lilongwe District was the one with the highest demand. However, we would want to determine other places to implement the plan based on demand. It would be important for the test districts to be representative of the different parts of Malawi and for the total proposed cost of providing the backpacks to the chosen districts to be below the amount Dr. Maria Oden and Dr. Rebecca Richards-Kortum would potentially have to spend on initial backpack costs, likely $400,000. Fortunately, our code is already formatted in such a way that we can easily apply it to subsections of the entire data set.
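As a rough illustration of the budget check described above, the following Python sketch estimates the initial backpack cost of a candidate set of pilot districts. It assumes, purely for illustration, that the initial cost scales linearly with the number of backpacks, using the country-wide totals from the Executive Summary (2188 backpacks, $772,167.08); the function name and the choice of districts are hypothetical, and the backpack counts are copied from Table 3.

```python
# Country-wide totals reported in the Executive Summary.
TOTAL_BACKPACKS = 2188
TOTAL_INITIAL_COST = 772_167.08  # USD

# ASSUMPTION: initial cost scales linearly with the number of backpacks.
AVG_COST_PER_BACKPACK = TOTAL_INITIAL_COST / TOTAL_BACKPACKS

# Backpack counts for a few districts, copied from Table 3.
BACKPACKS = {"Lilongwe": 246, "Mangochi": 151, "Dedza": 125, "Mzimba": 121}


def estimated_cost(districts):
    """Estimated initial backpack cost (USD) for a set of pilot districts."""
    return sum(BACKPACKS[d] for d in districts) * AVG_COST_PER_BACKPACK


budget = 400_000.0                          # likely amount available
pilot = ["Lilongwe", "Mangochi", "Dedza"]   # hypothetical pilot selection
cost = estimated_cost(pilot)
affordable = int(budget // AVG_COST_PER_BACKPACK)

print(f"average cost per backpack: ${AVG_COST_PER_BACKPACK:.2f}")
print(f"pilot cost: ${cost:.2f}, within budget: {cost <= budget}")
print(f"backpacks affordable at this rate: {affordable}")
```

Under the linear-cost assumption, the same check could be repeated for any subset of districts before rerunning the full model on that subset.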

9.2.3 Determining an Implementable Plan

While the output for our problem gives assignments for every EA to an HSA pair, every HSA pair to a backpack, and every backpack to a hospital or health center, a plan in this format would not be easy for the Ministry of Health in Malawi to implement because of the sheer size of the network. Therefore, another future aim for this project could be incorporating the salient features from our results into an easily understandable and implementable plan.

This plan could take several possible shapes. A few aspects of our outputs are already simple and implementable, for example the total number and cost of backpacks needed, the large number of backpacks serving at full capacity, and the clear advantage of including health centers as potential resupply sites if possible. Table 3 in the appendices lists the number of HSA pairs and backpacks that our model assigned to each district, a feature that could easily be implemented. One future goal might be to record which HSA pairs serve the greatest number of EAs, to make sure their assigned EAs are not over-served at the expense of others. Similarly, backpacks not being used at full capacity are more likely to be serving remote areas with greater need and should probably be kept below capacity. We could also analyze the characteristics of the hospitals and health centers not chosen by our model to search for a relevant pattern that could be incorporated into the backpack scale-up.

10 Conclusion

This semester, we created a plan for deploying HSAs, assigning backpacks to HSAs, and resupplying backpacks in Malawi that met our main objectives as well as those of BTB. Along the way, we have faced intellectual and ethical questions ranging from solving large integer programs to balancing cost and quality of care. While our final model still has several assumptions and limitations, it provides a detailed analysis of the HSA and backpack programs beyond the mathematical capabilities of most global health organizations. Furthermore, it provides a plan that could be used to successfully scale up BTB’s HSA backpack program and significantly improve healthcare access in Malawi.


References

[1] Malawi gis shape files (2006). http://www.hisp.org/ftp/links/ViewDocCategory.asp?

cat=GIS.

[2] R.P. Barnett, S.K. Stansfield, A. Augustin, R. Boulos, and J.S. Newman. Optimization of taskallocation for community health workers in Haiti. Socio-Economic Planning Sciences, 22(1):3– 14, 1988.

[3] J. Beasley. Or-library: distributiong test problems by electronic mail. Journal of the Opera-tional Research Society, 41:1069–1072, 1990.

[4] Kaphuka J. Kanyanda S. Benson, T. and R. Chinula. Malawi - an atlas of social statistics.National Statistics Office, Zomba, Malawi and International Food Policy Research Institute,Washington, DC, USA, 2002.

[5] E. Brunskill and N. Lesh. Routing for rural health: optimizing community health worker visitschedules. Proceedings of AAAI Artificial Intelligence for Development, pages 326–334, 2010.

[6] C. Cocking, S. Flessa, and G. Reinelt. Locating health facilities in Nouna District, BurkinaFfaso. In H. Haasis, H. Kopfer, and J. Schn, editors, Operations Research Proceedings 2005,volume 2005 of Operations Research Proceedings, pages 431–436. Springer Berlin Heidelberg,2006. doi: 10.1007/3-540-32539-5-68.

[7] K. Doerner, A. Focke, and W.J. Gutjahr. Multicriteria tour planning for mobile healthcarefacilities in a developing country. European Journal of Operational Research, 179(3):1078 –1096, 2007.

[8] U.S. Agency for International Development. USAID Malawi AIDS/HIV health profile, Septem-ber 2010.

[9] S.L. Hakimi. Optimum distribution of switching centers in a communication network and somerelated graph theoretic problems. Operations Research, 13(3):462–475, 1965.

[10] K. Holmberg, M. Ronnqvist, and D. Yu. An exact algorithm for the capacitated facility locationproblems with single sourcing. European Journal of Operational Research, 113:544–559, 1999.

[11] J.M. Kadzandira and W.R. Chilowa. The role of health surveillance assistants (HSAs)in the delivery of health services and immunisation in Malawi. http://www.unicef.org/

evaldatabase/files/MLW_01-04.pdf, 2001. UNICEF Evaluation and Research Database.

[12] A. Katsulukuta. Making use of community health workers to imporove coverage - opportunitiesand challenges. World Health Organization Global Immunization Meeting, February 2010.

[13] J. Klincewicz and H. Luss. A lagrangian relaxation heuristic for the capacitated facility locationwith single-source constraints. Journal of Operational Research Society, 37(5):495–500, 1986.

[14] B.H. Massam, R. Akhtar, and I.D. Askew. Applying operations research to health planning:locating health centres in Zambia. Health Policy and Planning, 1(4):326–334, 1986.

29

http://www.hisp.org/ftp/links/ViewDocCategory.asp?cat=GIS

http://www.hisp.org/ftp/links/ViewDocCategory.asp?cat=GIS

http://www.unicef.org/evaldatabase/files/MLW_01-04.pdf

http://www.unicef.org/evaldatabase/files/MLW_01-04.pdf

[15] L. Murawski and R.L. Church. Improving accessibility to rural health services: the maximalcovering network improvement problem. Socio-Economic Planning Sciences, 43(2):102–110,2009. The contributions of Charles S. Revelle.

[16] S. Rahman and D.K. Smith. Use of location-allocation models in health service developmentplanning in developing nations. European Journal of Operational Research, 123(3):437–452,2000.

[17] R.A. Reid, K.L. Ruffing, and H.L. Smith. Managing medical supply logistics among healthworkers in Ecuador. Social Science and Medicine, 22(1):9–14, 1986.

[18] Muherjee Rhoan. Riders for Health: An African Success. http://ssrn.com/abstract=

1330277, January 2009.

[19] D. Walk. Backpack cost analysis v3.4. Personal communication with author, 2012.

30

http://ssrn.com/abstract=1330277

http://ssrn.com/abstract=1330277

A HSA and Backpack Assignments by District

This section includes a table of HSA and backpack assignments per district. This table may be useful to us later as we work with BTB to determine in which districts the backpack program should initially be deployed and how many backpacks should be assigned there. It is also an example of an output from our model that is easy to interpret and follow. One interesting thing to note is that more HSAs and backpacks are assigned to Lilongwe District than considered above, when this number was based solely on population; this indicates that Lilongwe’s high demand likely comes from additional sources such as a small number of health centers, low population density, and a high percentage of children.

District          Number of HSA pairs   Number of backpacks
Balaka                            200                    68
Blantyre                          184                    61
Blantyre City                      68                    24
Chikwawa                          253                    83
Chiradzulu                        131                    44
Chitipa                           112                    39
Dedza                             377                   125
Dowa                              300                    98
Karonga                           141                    47
Kasungu                           294                    98
Likoma                              4                     2
Lilongwe                          726                   246
Lilongwe City                      95                    33
Machinga                          309                   103
Mangochi                          443                   151
Mchinji                           264                    88
Mulanje                           231                    77
Mwanza                            126                    41
Mzimba                            359                   121
Mzuzu City                         29                     9
Nkhata Bay                        118                    40
Nkhotakota                        151                    52
Nsanje                            126                    44
Ntcheu                            295                    99
Ntchisi                           138                    47
Phalombe                          174                    60
Rumphi                             97                    33
Salima                            176                    59
Thyolo                            279                    96
Zomba                             288                    96
Zomba City                         12                     4

Table 3: Number of HSA pairs and backpacks assigned per district according to our full model.


B p-Median Code Documentation

Our p-median code, HSAToEA.cpp, was created by Elizabeth Van Itallie. This section contains information about how to compile the code, run the executable, and understand the outputs. The section also contains a summary of what the different sections of the code do. The instructions included in this file assume that the code is being compiled and executed using a Linux operating system.

B.1 Creating the Executable

A makefile is used to create an executable from this code. The makefile is named “MakeHSAToEA”. The Linux command line prompt to compile the code is: “make -f MakeHSAToEA”.

B.2 Inputs for the Code

The executable file is “HSAToEA”. The code requires a minimum of nine inputs. The inputs are as follows (where the indexing is the position of the input in the command prompt executable line):

1: n, the number of nodes in the edge files, integer

2: p value for p-median problem (number of HSA pairs being assigned), integer

3: threshold value in meters for determining whether an HSA is close enough to potentially serve an EA, float

4: flag to determine whether node and edge weight input check should be done

• 1 = check performed

• 0 = check not performed

5: text file containing 100 randomly generated ordered (from smallest to largest) integers between 0 and 9217

6: text file containing 100 randomly generated pairs of integers where both values of the pair are between 0 and 9217 and the first values are ordered (from smallest to largest)

7: text file containing the EA FIDs for the EAs that are to be included in the problem formulation

8: text file containing two space delimited columns with all of the EA FIDs in the first column and their corresponding node weight in the second column

9-end: .csv files of the edge weights where the value in the first column is an EA FID minus 1000 * the integer value in the third position of the file name (indexing starting at 0), the value in the second column is an EA FID minus 1000 * the integer value in the fifth position of the file name, and the value in the third column is the straight line distance in meters between the centroids of the two EAs designated by the values in the first two columns
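The indexing rule for the edge weight files can be illustrated with a short Python sketch. It only restates the rule given for inputs 9-end (the characters at positions 3 and 5 of the file name, counting from 0, are multiplied by 1000 and added back to the column values); the function names are hypothetical and are not taken from HSAToEA.cpp.

```python
import os


def block_offsets(filename):
    """Return the two FID offsets encoded in an edge weight file name.

    The characters at positions 3 and 5 of the file name (0-based) are
    single digits; each one is multiplied by 1000.
    """
    name = os.path.basename(filename)
    return 1000 * int(name[3]), 1000 * int(name[5])


def decode_row(filename, col1, col2):
    """Recover the two EA FIDs from the first two columns of one row."""
    off1, off2 = block_offsets(filename)
    return col1 + off1, col2 + off2


# A row "17,250,<distance>" in the file for blocks 2 and 4 refers to
# EA FIDs 2017 and 4250. The separator character in the name is irrelevant.
print(decode_row("EA_2_4.dbf.csv", 17, 250))
```

Because only the character positions matter, the sketch works whichever separator the actual file names use between "EA" and the two digits.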

B.3 Outputs Generated by Code

This code creates six output files if input 4 is equal to 1 and four output files if input 4 is equal to 0. The output files are listed below along with an explanation of the information they contain.

The following four output files are created every time the code is executed:


gurobiout-check.txt This is the log file for the output from the Gurobi optimization process. The file contains information about the size of the constraint matrix, the simplex method results for the associated linear relaxation problem, and the status of the branch and cut algorithm process to find the optimal integer solution.

p-median-out-amber.txt This file contains a column with the FIDs of all of the EA nodes included in the edge weight input files. If the EA is part of the integer program formulation there is a value in the second comma delimited column with the FID of the location of the HSA providing service. If the EA is not part of the integer program there is no comma or value in the second column. This file is used for generating the graphical output and as input for the data.cpp code.

p-median-out-duo.txt This file contains a single column with the FIDs of the EA locations of all of the p chosen HSAs. This file is used as input for the code that determines the assignment of backpacks to HSAs.

outtest.txt This file contains a single column with the FIDs of the EAs included in the edge weight input but not included in the integer program formulation. This file is one of the input files for the data.cpp code.
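The layout of p-median-out-amber.txt described above can be read with a few lines of Python. This is a minimal sketch under the stated format (one EA FID per line, optionally followed by a comma and the FID of the serving HSA location); the function name and the sample FIDs are made up for illustration.

```python
def parse_assignments(lines):
    """Split p-median-out-amber.txt lines into assigned and unassigned EAs.

    A line "ea_fid,hsa_fid" means the EA is served by the HSA pair based at
    hsa_fid; a line holding only "ea_fid" means the EA was not part of the
    integer program.
    """
    assigned, unassigned = {}, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if "," in line:
            ea, hsa = line.split(",", 1)
            assigned[int(ea)] = int(hsa)
        else:
            unassigned.append(int(line))
    return assigned, unassigned


sample = ["0,3", "1", "2,3", "3,3"]  # made-up FIDs for illustration
assigned, unassigned = parse_assignments(sample)
hsa_locations = sorted(set(assigned.values()))
print(assigned, unassigned, hsa_locations)
```

Given the descriptions above, the unassigned list should correspond to the contents of outtest.txt and the distinct HSA locations to the contents of p-median-out-duo.txt.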

The following two files are only created when input 4 is equal to 1:

nodetest.txt This file contains the output of the node weight input checking. The file contains two comma delimited columns. The first column contains the FIDs of the randomly chosen EAs and the second column contains their assigned node weights. The outputted node weights should match the appropriate ones in input 8.

edgetest.txt This file contains the output of the edge weight input checking. The file contains five comma delimited columns. The values of the first two columns are used to determine which of the edge input files the value came from. The values in columns three and four give the indexing of the edge weight within the file specified by the first two columns. The last column contains the edge weight value.

B.4 Description of Code

The code HSAToEA.cpp formulates the integer program for the p-median problem specified by the inputs and optimizes it using Gurobi. There are five portions of this code: input, input testing, creation of the objective function and the constraints, optimization, and output.

In the input section inputs 1, 2, 3, 7, 8, and 9 through the end are stored in the appropriate arrays. If the input checking flag is equal to 1 the outputting of the node weights and edge weights for verification happens next.

In the next section the objective function and the constraints are formed. First, based on the input about the EAs to be included in the problem the edge weight matrix is reduced to only include values designating lengths between nodes in the problem. Next we iterate through the resulting edge weight matrix and store the information about edge weights less than or equal to the threshold level in compressed matrix storage. Next we create the appropriate number of Gurobi integer variables. The objective function is created by passing each of the integer variables and its objective function coefficient, the product of the node weight and edge weight, to the Gurobi object model. Lastly the constraints are passed to the object model. First we ensure that no EA is assigned to an HSA that does not exist by iterating through all of the Gurobi integer variables and placing them in the appropriate constraint variable depending on their original column indexing. For each of the columns the sum of all of the variables with that original column index must be less than or equal to the number of these variables multiplied by the value of the variable representing the possible service of the EA by an HSA located in that EA. To ensure that p EAs are chosen to have HSAs located in them a constraint is added to the object model saying that the sum of the Gurobi variables representing the service of an EA by an HSA located in it must be equal to p. To ensure that every EA is assigned to an HSA, constraints are passed to the object model saying that the sum of the Gurobi variables representing the possible service options for an EA by an HSA is identically equal to one.

In the optimize section the object model is optimized by Gurobi.

In the output section the results of the optimization are outputted. This information is determined by iterating through the Gurobi variables and finding where a variable is equal to 1, designating service of the EA by a specific HSA.
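The structure of the model described in this section can be illustrated on a toy instance. The Python sketch below is not the HSAToEA.cpp implementation and does not use Gurobi: it replaces the integer program solve with plain enumeration, which is only workable for a handful of nodes. It keeps the same ingredients, namely a threshold filter on edge weights, objective coefficients equal to node weight times edge weight, exactly p chosen HSA locations, and every EA served by exactly one chosen location. All names and data are hypothetical.

```python
from itertools import combinations


def solve_tiny_p_median(weights, dist, p, threshold):
    """Solve a very small p-median instance by enumeration (illustration only)."""
    n = len(weights)
    # Sparse storage: keep only pairs (i, j) within the threshold distance.
    # cost[i, j] = node weight of EA i times distance from EA i to site j.
    cost = {(i, j): weights[i] * dist[i][j]
            for i in range(n) for j in range(n)
            if dist[i][j] <= threshold}
    best = None
    for sites in combinations(range(n), p):      # exactly p HSA locations
        total, assign = 0.0, {}
        for i in range(n):                       # each EA gets one open site
            options = [(cost[i, j], j) for j in sites if (i, j) in cost]
            if not options:
                break                            # EA i cannot be served
            c, j = min(options)
            total += c
            assign[i] = j
        else:
            if best is None or total < best[0]:
                best = (total, sites, assign)
    return best


# Four EAs on a line, with made-up node weights.
positions = [0, 1, 5, 6]
weights = [1, 2, 1, 3]
dist = [[abs(a - b) for b in positions] for a in positions]
value, sites, assign = solve_tiny_p_median(weights, dist, p=2, threshold=2)
print(value, sites, assign)
```

In the enumeration, restricting each EA to the chosen sites plays the role of the constraint that no EA is assigned to an HSA that does not exist; in the integer program that requirement has to be written explicitly, as described above.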

B.5 Examples of Executable Lines for the Command Prompt

For the entire country of Malawi:

• If inputs are to be checked: ./HSAToEA 9218 6500 100000.0 1 rand check nodes.txt rand check edges.txt Malawi-nodes-FIDs.txt Node Weights 404.txt EA 0 0.dbf.csv EA 0 2.dbf.csv EA 0 4.dbf.csv EA 0 6.dbf.csv EA 0 8.dbf.csv EA 2 2.dbf.csv EA 2 4.dbf.csv EA 2 6.dbf.csv EA 2 8.dbf.csv EA 4 4.dbf.csv EA 4 6.dbf.csv EA 4 8.dbf.csv EA 6 6.dbf.csv EA 6 8.dbf.csv EA 8 8.dbf.csv

• If inputs are not to be checked: ./HSAToEA 9218 6500 100000.0 0 anything anything Malawi-nodes-FIDs.txt Node Weights 404.txt EA 0 0.dbf.csv EA 0 2.dbf.csv EA 0 4.dbf.csv EA 0 6.dbf.csv EA 0 8.dbf.csv EA 2 2.dbf.csv EA 2 4.dbf.csv EA 2 6.dbf.csv EA 2 8.dbf.csv EA 4 4.dbf.csv EA 4 6.dbf.csv EA 4 8.dbf.csv EA 6 6.dbf.csv EA 6 8.dbf.csv EA 8 8.dbf.csv

For only Lilongwe district:

• If inputs are to be checked: ./HSAToEA 9218 615 100000.0 1 rand check nodes.txt rand check edges.txt Lilongwe-nodes-FIDs.txt Node Weights 404.txt EA 0 0.dbf.csv EA 0 2.dbf.csv EA 0 4.dbf.csv EA 0 6.dbf.csv EA 0 8.dbf.csv EA 2 2.dbf.csv EA 2 4.dbf.csv EA 2 6.dbf.csv EA 2 8.dbf.csv EA 4 4.dbf.csv EA 4 6.dbf.csv EA 4 8.dbf.csv EA 6 6.dbf.csv EA 6 8.dbf.csv EA 8 8.dbf.csv

• If inputs are not to be checked: ./HSAToEA 9218 615 100000.0 0 anything anything Lilongwe-nodes-FIDs.txt Node Weights 404.txt EA 0 0.dbf.csv EA 0 2.dbf.csv EA 0 4.dbf.csv EA 0 6.dbf.csv EA 0 8.dbf.csv EA 2 2.dbf.csv EA 2 4.dbf.csv EA 2 6.dbf.csv EA 2 8.dbf.csv EA 4 4.dbf.csv EA 4 6.dbf.csv EA 4 8.dbf.csv EA 6 6.dbf.csv EA 6 8.dbf.csv EA 8 8.dbf.csv
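Since the list of edge weight files follows a regular pattern (every unordered pair of the blocks 0, 2, 4, 6, 8), the long command lines above can be assembled rather than typed. The Python sketch below is one way to do so. It assumes that the spaces shown inside the file names stand for underscores lost in typesetting, which is a guess; the separator is therefore a parameter, and the helper name is hypothetical.

```python
from itertools import combinations_with_replacement


def edge_files(blocks, sep="_"):
    """Edge weight file names for every unordered pair of FID blocks.

    ASSUMPTION: sep="_" is a guess at the real separator in the file names.
    """
    return [f"EA{sep}{a}{sep}{b}.dbf.csv"
            for a, b in combinations_with_replacement(blocks, 2)]


malawi_edges = edge_files([0, 2, 4, 6, 8])

# Inputs 1-8 followed by inputs 9-end, for the unchecked Malawi run.
# "Node_Weights_404.txt" uses the same assumed separator.
args = ["./HSAToEA", "9218", "6500", "100000.0", "0", "anything", "anything",
        "Malawi-nodes-FIDs.txt", "Node_Weights_404.txt"] + malawi_edges

print(len(malawi_edges))
print(" ".join(args))
```

The resulting list could be passed to `subprocess.run(args)` from the directory holding the executable and input files.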

C CFLP - Backpack To HSA Code Documentation

Our CFLP - backpack to HSA code, BackpackToHSA.cpp, was created by Duo Wu. This section contains information about how to compile the code, run the executable, and understand the outputs. The section also contains a summary of what the different sections of the code do. The instructions included in this file assume that the code is being compiled and executed using a Linux operating system.

C.1 Creating the Executable

A makefile is used to create an executable from this code. The makefile is named “MakeBackpackToHSA”. The Linux command line prompt to compile the code is: “make -f MakeBackpackToHSA”.

C.2 Inputs for the Code

The executable file is “BackpackToHSA”. This code takes 17 files when assigning backpacks for the entire country of Malawi. When restricted to the district of Lilongwe, the code takes in 9 files. The inputs are:

1: A text file containing the location of each HSA pair. Location here is represented by the FID of the HSA pair’s base EA, with each line containing exactly one FID. This input is produced by “pmedian-416-check”.

2: A text file containing the fixed cost and capacity of a backpack, in this order. The two numbers are separated by a space.

3-end: These are .csv files of the variable costs where the value in the first column is an EA FID minus 1000 * the integer value in the third position of the file name (indexing starting at 0), the value in the second column is an EA FID minus 1000 * the integer value in the fifth position of the file name, and the value in the third column is the straight line distance in meters between the centroids of the two EAs designated by the values in the first two columns.

C.3 Running the Code

Type in the command line ./BackpackToHSA > GRB BtoHSA Log.txt followed by the list of input files. Similar to gurobiout-check.txt, “GRB BtoHSA Log.txt” is the log file for the entire code. It documents the problem set up process as well as the Gurobi optimization process.

For example, to run the code for Malawi, type in “./BackpackToHSA p-median-out-duo.txt BackpackInfo.txt EA 0 0.dbf.csv EA 0 2.dbf.csv EA 0 4.dbf.csv EA 0 6.dbf.csv EA 0 8.dbf.csv EA 2 2.dbf.csv EA 2 4.dbf.csv EA 2 6.dbf.csv EA 2 8.dbf.csv EA 4 4.dbf.csv EA 4 6.dbf.csv EA 4 8.dbf.csv EA 6 6.dbf.csv EA 6 8.dbf.csv EA 8 8.dbf.csv”

To run the code for Lilongwe, type in “./BackpackToHSA p-median-out-duo.txt BackpackInfo.txt EA 0 0.dbf.csv EA 0 2.dbf.csv EA 0 4.dbf.csv EA 2 2.dbf.csv EA 2 4.dbf.csv EA 4 4.dbf.csv”

C.4 Outputs Generated by Code

This code generates two output files:

BackpackToHSA Amber.txt First column contains the FIDs of the EAs that are selected to place a backpack. Second column contains the locations of the HSA pairs that are being served by the backpack designated in column 1. The locations are referenced by the FIDs of the HSAs’ base EAs. For example, if there is a backpack in EA 16 that is serving HSA pairs in EAs 16, 17, and 18, we will see

16 16
16 17
16 18

BackpackLocation.txt This file contains a list of EAs that are selected to place a backpack and the number of HSA pairs that backpack is serving.
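The relationship between the two output files can be sketched in Python: grouping the rows of the first file by backpack gives the per-backpack counts that the second file summarizes. The sketch assumes whitespace-separated columns, as in the example above, and assumes that backpack capacity is counted in HSA pairs; the capacity value, the function name, and the extra sample row are hypothetical.

```python
from collections import defaultdict


def group_by_backpack(lines):
    """Map each backpack EA FID to the list of HSA-pair EA FIDs it serves."""
    served = defaultdict(list)
    for raw in lines:
        parts = raw.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        backpack, hsa = map(int, parts)
        served[backpack].append(hsa)
    return dict(served)


# The three rows from the example above, plus one made-up backpack in EA 40.
sample = ["16 16", "16 17", "16 18", "40 41"]
served = group_by_backpack(sample)

# Number of HSA pairs per backpack, as summarized in BackpackLocation.txt.
counts = {backpack: len(hsas) for backpack, hsas in served.items()}

CAPACITY = 3  # ASSUMPTION: hypothetical capacity, counted in HSA pairs
over_capacity = [b for b, c in counts.items() if c > CAPACITY]
print(counts, over_capacity)
```

A check of this kind could also flag backpacks running below capacity, which the Future Work section suggests are likely to serve remote areas.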

C.5 Description of Code

This code follows a similar structure to that of the p-median code. It starts by setting up the Gurobi environment and reading in parameters from the input files. It then creates the appropriate number of Gurobi variables by considering only EAs that are within 100 km of each other. In the next step, the code creates the constraints described in the Mathematical Models section and calls Gurobi to solve the linear integer problem. Finally, it writes the outputs from Gurobi to the two output files.

D CFLP - Resupply Center To Backpack Code Documentation

Our CFLP - resupply center to backpack code, ResupplyToBackpack.cpp, was created by Duo Wu. This section contains information about how to compile the code, run the executable, and understand the outputs. The section also contains a summary of what the different sections of the code do. The instructions included in this file assume that the code is being compiled and executed using a Linux operating system.

D.1 Creating the Executable

A makefile is used to create an executable from this code. The makefile is named “MakeResupplyToBackpack”. The Linux command line prompt to compile the code is: “make -f MakeResupplyToBackpack”.

D.2 Inputs for the Code

The executable file is “ResupplyToBackpack”. This code takes 5 files when both health centers and hospitals are considered potential resupply centers. The inputs are:

1: A text file containing the location of each backpack. Location here is represented by the FID of the backpack’s base EA, with each line containing exactly one FID. This input is produced by “BackpackToHSA”.

2: A text or .csv file containing the FIDs of the set or subset of hospitals that are considered in this assignment. Each line contains exactly one FID.

3: A text or .csv file containing the FIDs of the set or subset of health centers that are considered in this assignment. Each line contains exactly one FID.

4: A .csv file containing straight line distances between each hospital and the centroid of all EAs. The first column contains the FID of hospitals; the second column contains the FID of EAs; the last column contains the straight line distances in meters between the hospital and the EA designated by the two FIDs.

5: A .csv file containing straight line distances between each health center and the centroid of all EAs. The first column contains the FID of health centers; the second column contains the FID of EAs; the last column contains the straight line distances in meters between the health center and the EA designated by the two FIDs.

When only hospitals are used, the code reads in only items 1, 2, and 4.

D.3 Running the Code

Type in the command line ./ResupplyToBackpack > GRB RtoB Log.txt followed by the list of input files. Serving exactly the same function as GRB BtoHSA Log.txt, “GRB RtoB Log.txt” also documents the problem set up process as well as the Gurobi optimization process.

For example, to run the code for Malawi using both hospitals and health centers, type in “./ResupplyToBackpack BackpackLocation.txt Hosp.dbf.csv Primary.dbf.csv Hosp EA.dbf.csv Primary EA.dbf.csv”.

To run the code for Lilongwe, type in “./ResupplyToBackpack BackpackLocation.txt Lilongwe Hosp.txt Lilongwe Primary.txt Hosp EA.dbf.csv Primary EA.dbf.csv”.

Note that the third and fifth files, which contain information about health centers, are simply not used when only hospitals are considered.

D.4 Outputs Generated by Code

This code generates two output files:

ResupplyToBackpack Amber.txt First column contains the variable y_i corresponding to health facilities that are selected to host a resupply center. Second column contains the FIDs of these selected health facilities. Third column contains the locations of the backpacks that are being served by the health facility designated in columns 1 and 2. The locations are referenced by the FIDs of the backpacks’ base EAs.

ResupplyCenter.txt This file contains a list of health facilities that are selected to host a resupply center and the number of backpacks that each facility is serving.

D.5 Description of Code

This code follows exactly the same structure as BackpackToHSA.cpp. In fact, it is a modification of the previous code with the addition of a flag variable that indicates whether health centers are considered along with hospitals. When the number of inputs is larger than 3, we are considering both sets of facilities and the flag is turned on. Otherwise, only hospitals are considered and the flag is turned off.
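The flag rule stated above amounts to a single comparison on the number of input files. The Python sketch below restates it; the function name is hypothetical, and the file names use underscores on the assumption that the spaces shown in the examples of section D.3 were lost in typesetting.

```python
def health_centers_enabled(input_files):
    """Documented rule: more than 3 input files turns the flag on."""
    return len(input_files) > 3


# File names follow section D.3, with assumed underscore separators.
both = ["BackpackLocation.txt", "Hosp.dbf.csv", "Primary.dbf.csv",
        "Hosp_EA.dbf.csv", "Primary_EA.dbf.csv"]
hospitals_only = ["BackpackLocation.txt", "Hosp.dbf.csv", "Hosp_EA.dbf.csv"]

print(health_centers_enabled(both))
print(health_centers_enabled(hospitals_only))
```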


E Plot Instructions

This appendix contains the basic steps necessary to convert the output files from the p-median and CFLP codes into maps such as the ones included in the Results section of this report. Matlab, Excel, and ArcGIS are all needed for this conversion.

1. Format files for GIS import

• Open Excel.

• Open file in Excel. Files are comma delimited.

• If zeros are not yet removed, match EA numbers with EA numbers from population spreadsheet, sort by population, and remove zero rows.

• Remove any rows with no assignment (one or more blank entries in a row).

• For the Both resupply output, keep each hospital ID but add 49 to each health center ID.

• Delete unnecessary columns, if applicable. Add an extra column of zeros as a placeholder at the front. This will hopefully prevent numbers from being cut off.

• Save file as csv.

2. Open ArcCatalog and ArcMap and establish relevant folder connection

3. Load and prepare data in ArcMap

• Add the csv file being used.

• Also add the following shape files: EA Centroid (for all Malawi files); EA Centroid Copy (for Malawi backpack-to-HSA and HSA-to-EA); Lilongwe Centroid (for all Lilongwe files); Lilongwe Centroid Copy (for Lilongwe backpack-to-HSA and HSA-to-EA); Hospitals (for Lilongwe and Malawi Hospitals and Both); Primary Health Centers (for Lilongwe and Malawi Both).

• Make sure all shape files have columns point-x and point-y (note: x-coord and y-coord are not the same thing). If they don’t, use the ArcMap ModelBuilder to run the Add XY Coordinates tool (found in system toolboxes data management tools).

4. Join csv files with relevant shape files

• Right click on the csv file to join.

• First join on broader column (i.e. EA before HSA, HSA before backpack, etc.); next join on more narrow column. For the Both csv files only, three joins are necessary: first on centroids, then on hospitals, then on primary health centers.

• Join based on relevant field: FID for EA Point Centroid (first join for all Malawi files); FID for EA Point Centroid Copy (second join for HSA and backpack Malawi files); Index for Hospitals (second join for all Hospital and Both files); Index for Primary Health Centers (third join for all Both files); ORIG-FID for Lilongwe Point Centroid (first join for all Lilongwe files); and ORIG-FID for Lilongwe Point Centroid Copy files (second join for HSA and backpack Lilongwe files).

• Check the box to keep all records.

• Push the button to join.

5. Export joined data table as a txt file

6. Compute distances and format csv


• Open text file in Excel.

• Delete all columns but point-x, point-y, point-x-1, and point-y-1.

• Create a temporary distance column by calculating the root sum of the squares of the differences between coordinates. Calculate average, median, and max in temporary cells.

• Remove all temporary cells and save as a csv.

7. Create background image

• Open either Lilongwe-Background or Malawi-Background in ArcMap.

• Do not change the size and shape of the map. (Otherwise, you will need to re-record the coordinate boundaries and change these values in Plotter.m).

• Double click on the distance textbox to change the distance values to those obtained from Excel.

• Export the map as a jpeg with a resolution of 960 dpi.

8. Create and finalize plots

• Crop the background images in Preview so only the inner square boundaries are at the very edge of the image.

• Open Plotter.m in Matlab.

• Comment out all but the relevant section. Make sure all file names are correct.

• Run Plotter in the command line.

• Save resulting plot as a pdf.

• Crop the resulting plot in Preview.
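The distance computation in step 6 was done in Excel; the same calculation can be sketched in Python for reference. Each row holds the four coordinate columns kept in that step (point-x, point-y, point-x-1, point-y-1), which are in meters under the projected coordinate system. The function name and the sample coordinates are made up for illustration.

```python
import math
import statistics


def distance_stats(rows):
    """Return (mean, median, max) of straight line distances.

    rows: iterable of (x, y, x1, y1) tuples in meters.
    """
    d = [math.hypot(x - x1, y - y1) for x, y, x1, y1 in rows]
    return statistics.mean(d), statistics.median(d), max(d)


# Made-up coordinates: distances of 5 m, 0 m, and 10 m.
rows = [(0.0, 0.0, 3.0, 4.0),
        (10.0, 10.0, 10.0, 10.0),
        (0.0, 0.0, 6.0, 8.0)]
mean_d, median_d, max_d = distance_stats(rows)
print(mean_d, median_d, max_d)
```

Dividing the results by 1000 gives kilometers, the unit used for the distances reported in the Executive Summary.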

F Main ArcGIS Tools Used

This section includes a list of the main ArcGIS tools used in processing the data for this project.

• Feature to Point: This tool converts polygon features to centroid points, and is how we were able to convert EAs to their centroids.

• Point Distance: This tool calculates the distance from all points in one input file to all points in a second input file. This is how we were able to calculate distances between EA centroids. This tool crashed when we tried to compute distances between all EAs at once, which forced us to split the EAs into five different files: those with FIDs from 0-2000, from 2000-4000, from 4000-6000, from 6000-8000, and ≥ 8000. The output tables here are in dbf format, but we were able to convert them to csv using dbfpy and the shell script command “crunch.”

• Near Distance: This command calculates the distance from each point in one input file to the nearest point in another input file. This was useful in testing the accuracy of our data.

• Add XY Coordinates: This tool calculates the coordinates of point features in the relevant coordinate system. Our data used the WGS 1984 geographic coordinate system and the UTM Zone 36S projected coordinate system, so the units here are meters. This was useful for plotting our data.

• Join: This command allows you to join multiple csv and/or shape files together when you specify which column to join by. This was useful for analyzing and plotting our data.


Other useful ArcGIS features to remember include the ability to select features by attribute or location; to export tables and image files; to add and calculate new fields within a table; to adjust map properties based on data values; and to project data onto the relevant coordinate system.
