
Approximation Algorithms for Stochastic Orienteering

Anupam Gupta∗ Ravishankar Krishnaswamy† Viswanath Nagarajan‡ R. Ravi§

Abstract

In the Stochastic Orienteering problem, we are given a metric, where each node also has a job located there with some deterministic reward and a random size. (Think of the jobs as chores one needs to run, and the sizes as the amount of time it takes to do each chore.) The goal is to adaptively decide which nodes to visit to maximize the total expected reward, subject to the constraint that the total distance traveled plus the total size of jobs processed is at most a given budget B. (I.e., we get reward for all those chores we finish by the end of the day.) The (random) size of a job is not known until it is completely processed. Hence the problem combines aspects of both the stochastic knapsack problem with uncertain item sizes and the deterministic orienteering problem of using a limited travel time to maximize gathered rewards located at nodes.

In this paper, we present a constant-factor approximation algorithm for the best non-adaptive policy for the Stochastic Orienteering problem. We also show a small adaptivity gap—i.e., the existence of a non-adaptive policy whose reward is at least an Ω(1/ log log B) fraction of the optimal expected reward—and hence we also get an O(log log B)-approximation algorithm for the adaptive problem. Finally, we address the case when the node rewards are also random and could be correlated with the waiting time, and give a non-adaptive policy which is an O(log n log B)-approximation to the best adaptive policy on n-node metrics with budget B.

∗Department of Computer Science, Carnegie Mellon University, Pittsburgh PA 15213. Research was partly supported by NSF awards CCF-0964474 and CCF-1016799.

†Department of Computer Science, Carnegie Mellon University, Pittsburgh PA 15213. Research was partly supported by NSF awards CCF-0964474 and CCF-1016799, and an IBM Graduate Fellowship.

‡IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA.

§Tepper School of Business, Carnegie Mellon University, Pittsburgh PA 15213. Research was partly supported by NSF award CCF-1143998.

1 Introduction

Consider the following problem: you start your day at home with a set of chores to run at various locations (e.g., at the bank, the post office, the grocery store), but you only have limited time to run those chores (say, you have from 9am until 5pm, when all these shops close). Each successfully completed chore/job j gives you some fixed reward rj. You know the time it takes you to travel between the various job locations: these distances are deterministic and form a metric (V, d). However, you do not know the amount of time you will spend doing each job (e.g., standing in the queue, filling out forms). Instead, for each job j, you are only given the probability distribution πj governing the random amount of time you need to spend performing j. That is, once you start performing the job j, the job finishes after Sj time units and you get the reward, where Sj is a random variable denoting the size, distributed according to πj.1 (You may cancel processing the job j prematurely, but in that case you don't get any reward, and you are barred from trying j again.) The goal is now a natural one: given the metric (V, d), the starting point ρ, the time budget B, and the probability distributions for all the jobs, give a strategy for traveling around and doing the jobs that maximizes the expected reward accrued.

The case when all the sizes are deterministic (i.e., Sj = sj with probability 1) is the orienteering problem, for which we now know a (2 + ε)-approximation algorithm [5, 7]. Another special case, where all the chores are located at the start node but the sizes are random, is the stochastic knapsack problem, which also admits a (2 + ε)-approximation algorithm [12, 3]. However, the stochastic orienteering problem above, which combines aspects of both these problems, seems to have been hitherto unexplored in the approximation algorithms literature.

It is known that even for stochastic knapsack, an optimal adaptive strategy may be exponential-sized (and finding the best strategy is PSPACE-hard) [12].

1To clarify: before you reach the job, all you know about its size is what can be gleaned from the distribution πj of Sj; and even having worked on j for t units of time, all you know about the actual size of j is what you can infer from the conditional (Sj | Sj > t).


So another set of interesting questions is to bound the "adaptivity gap", and to find non-adaptive solutions whose expected reward is close to that of the best adaptive solution. A non-adaptive solution for stochastic orienteering is simply a tour P of points in the metric space starting at the root ρ: we visit the points in this fixed order, performing the jobs at the points we reach, until time runs out.

One natural algorithm for stochastic orienteering is to replace each random job j by a deterministic job of size E[Sj], and use an orienteering algorithm to find the tour of maximum reward in this deterministic instance.2 Such an approach was taken by [12] for stochastic knapsack: they showed a constant adaptivity gap, and a constant-factor approximation, via precisely this idea. However, for stochastic orienteering this gives only an O(log B) approximation, and indeed there are examples where this approach loses an Ω(log B) factor of the optimal expected reward. (See Section 4 for a more detailed discussion.) In this paper we show we can do much better than this logarithmic approximation:

Theorem 1.1. There is an O(log log B)-approximation algorithm for the stochastic orienteering problem.

Indeed, our proof proceeds by first showing the following structure theorem, which bounds the adaptivity gap:

Theorem 1.2. Given an instance of the stochastic orienteering problem, then

• either there exists a single job which gives an Ω(1/ log log B) fraction of the optimal reward, or

• there exists a value W∗ such that the optimal non-adaptive tour which spends at most W∗ time waiting and B − W∗ time traveling gets an Ω(1/ log log B) fraction of the optimal reward.

Note that naïvely we would expect only a logarithmic fraction of the reward, but the structure theorem shows we can do better. Indeed, this theorem is the technical heart of the paper, and is proved via a martingale argument of independent interest. Since the above theorem shows the existence of a non-adaptive solution close to the best adaptive solution, we can combine it with the following result to prove Theorem 1.1.

Theorem 1.3. There exists a constant-factor approximation algorithm to the optimal non-adaptive policy for stochastic orienteering.

2Actually, one should really replace the stochastic job with a deterministic one of size E[min(Sj, B − d(ρ, j))] and reward rj · Pr[Sj + d(ρ, j) ≤ B], since it is very easy to fool the algorithm otherwise.

Note that if we could show an existential proof of a constant adaptivity gap (which we conjecture to be true), the above approximation for non-adaptive problems would immediately imply an O(1)-approximation algorithm for the adaptive problem too.

Our second set of results is for a variant of the problem, one where both the rewards and the job sizes are random and correlated with each other. For this correlated problem, we show the following results:

Theorem 1.4. There is a polynomial-time algorithm that outputs a non-adaptive solution for correlated stochastic orienteering, which is an O(log n log B)-approximation to the best adaptive solution. Moreover, the correlated problem is at least as hard as the orienteering-with-deadlines problem.

Recall that we only know an O(log n)-approximation for the orienteering-with-deadlines problem [2].

Our Techniques: Most of the previous adaptivity gaps have been shown by considering some linear programming relaxation that captures the optimal adaptive strategies, and then showing how to round a fractional LP solution to get a non-adaptive strategy. But since we do not know a good relaxation even for the deterministic orienteering problem, this approach seems difficult to take directly. So to show our adaptivity gap results, we are forced to argue directly about the optimal adaptive strategy: we use martingale arguments to show the existence of a path (a.k.a. non-adaptive strategy) within this tree which gives a large reward. We can then use algorithms for the non-adaptive setting, which are based on reductions to orienteering (with an additional knapsack constraint).

Roadmap: The rest of the paper follows the outline above. We begin with some definitions in Section 2, and then, in Section 3, define and give an algorithm for the knapsack orienteering problem, which will be a crucial subroutine in all our algorithms. Next, in Section 4, we motivate our non-adaptive algorithm by discussing a few straw-man solutions and the traps they fall into. We then state and prove our constant-factor non-adaptive algorithm in Section 5, which naturally leads us to our main result in Section 6, the proof of the O(log log B) adaptivity gap for StocOrient. Finally, we present in Section 7 the poly-logarithmic approximation for the problem where the rewards and sizes are correlated with each other. For all these results we assume that we are not allowed to cancel jobs once we begin working on them; in Section 8 we show how to discharge this assumption.

1.1 Related Work The orienteering problem is known to be APX-hard, and the first constant-factor approximation was due to Blum et al. [5].


Their factor of 4 was improved by [2] and ultimately by [7] to (2 + ε) for every ε > 0. (The paper [2] also considers the problem with deadlines and time-windows; see also [8, 9].) There is a PTAS for low-dimensional Euclidean space [10]. To the best of our knowledge, the stochastic version of the orienteering problem has not been studied before from the perspective of approximation algorithms. Heuristics and empirical guarantees for a similar problem were given by Campbell et al. [6].

The stochastic knapsack problem [12] is a special case of this problem, where all the tasks are located at the root ρ itself. Constant-factor approximation algorithms for the basic problem were given by [12, 4], and extensions where the rewards and sizes are correlated were studied in [16]. Most of these papers proceed via writing LP relaxations that capture the optimal adaptive policy; extending these to our problem faces the barrier that the orienteering problem is not known to have a good natural LP relaxation.

Another very related body of work is on budgeted learning. Specifically, in the work of Guha and Munagala [15], there is a collection of Markov chains spread around in a metric, each state of each chain having an associated reward. When the player is at the Markov chain at j, she can advance that chain one step every unit of time. If she spends at most L time units traveling, and at most C time units advancing the Markov chains, how can she maximize some function (say the sum or the max) of the rewards of the final states, in expectation? [15] gives an elegant constant-factor approximation to this problem (under some mild conditions on the rewards) via a reduction to classical orienteering using Lagrangian multipliers. Our algorithm/analysis for the knapsack orienteering problem (defined in Section 3) is inspired by theirs; the analysis of our algorithm, though, is simpler, due to the problem itself being deterministic. However, it is unclear how to directly use such two-budget approaches to get O(1)-factors for our one-budget problem without incurring an O(log B) factor in the approximation ratio.

Finally, while most of the work on giving approximations to adaptive problems has proceeded by using LP relaxations to capture the optimal adaptive strategies—and then often rounding them to get non-adaptive strategies, thereby also proving adaptivity gaps [14, 12]—there are some exceptions. In particular, papers on adaptive stochastic matchings [11], on stochastic knapsack [4, 3], and on building decision trees [18, 1, 17] have all had to reason about the optimal adaptive policies directly. We hope that our martingale-based analysis will add to the set of tools used for such arguments.

2 Definitions and Notation

An instance of stochastic orienteering is defined on an underlying metric space (V, d) with ground set |V| = n and symmetric integer distances d : V × V → Z+ (satisfying the triangle inequality) that represent travel times. Each vertex v ∈ V is associated with a unique stochastic job, which we also call v. For the first part of the paper, each job v has a fixed reward rv ∈ Z≥0, and a random processing time/size Sv, which is distributed according to a known but arbitrary probability distribution πv : R+ → [0, 1]. (In the introduction, this was the amount we had to wait in queue before receiving the reward for job v.) We are also given a starting "root" vertex ρ, and a budget B on the total time available. Without loss of generality, we assume that all distances are integer valued.

The only actions allowed to an algorithm are to travel to a vertex v and begin processing the job there: when the job finishes after its random length Sv of time, we get the reward rv (so long as the total time elapsed, i.e., total travel time plus processing time, is at most B), and we can then move to the next job. There are two variants of the basic problem: in the basic variant, we are not allowed to cancel any job that we begin processing (i.e., we cannot leave the queue once we join it). In the version with cancellations, we can cancel any job at any time without receiving its reward, and we are not allowed to attempt this job again in the future. Our results for both versions have similar approximation guarantees—for most of the paper, we focus on the basic version, and in Section 8 we show how to reduce the more general version with cancellations to the basic one.

Note that any strategy consists of a decision tree where each state depends on which previous jobs were processed, and what information we got about their sizes. The goal is to devise a strategy which, starting at the root ρ, decides (possibly adaptively) which jobs to travel to and process, so as to maximize the expected sum of rewards of jobs successfully completed before the total time (travel and processing) reaches the threshold of B.

In the second part of the paper, we consider the setting of correlated rewards and sizes: in this model, the job sizes and rewards are both random, and are correlated with each other. (Recall that the stochastic knapsack version of this problem also admits a constant-factor approximation [16].)

We are interested in both adaptive and non-adaptive strategies, and in particular, want to bound the ratio between the performance of the best adaptive and the best non-adaptive strategies. An adaptive strategy is a decision tree where each node is labeled by a job/vertex of V, with the outgoing arcs from a node labeled by j corresponding to the possible sizes in the support of πj.


A non-adaptive strategy, on the other hand, is a path P starting at ρ; we just traverse this path, executing the jobs we encounter, until the total (random) size of the jobs and the distance traveled exceeds B. (A randomized non-adaptive strategy may pick a path P at random from some distribution before it knows any of the size instantiations, and then follows this path as above.)

Finally, we use [m] to denote the set {0, 1, . . . , m}.

3 The (Deterministic) Knapsack Orienteering Problem

We now define a variant of the orienteering problem which will be crucially used in the rest of the paper. Recall that in the basic orienteering problem, the input consists of a metric (V, d), the root vertex ρ, rewards rv for each job v, and total budget B. The goal is to find a path P of length at most B starting at ρ that maximizes the total reward ∑_{v∈P} rv of vertices in P.

In the knapsack orienteering problem (KnapOrient), we are given two budgets: L, the "travel" budget, and W, the "knapsack" budget. Each job v has a reward rv, and now also a "size" sv. A feasible solution is a path P of length at most L such that the total size s(P) := ∑_{v∈P} sv is at most W. The goal is to find a feasible solution of maximum reward ∑_{v∈P} rv.

Theorem 3.1. There is a polynomial-time O(1)-approximation algorithm AlgKO for the KnapOrient problem.

The idea of the proof is to push the knapsack constraint into the objective function by considering a Lagrangian relaxation of the knapsack constraint; we remark that such an approach was also taken in [15] for a related problem. This way we alter the rewards of items while still optimizing over the set of feasible orienteering tours. For a suitable choice of the Lagrange parameter, we show that we can recover a solution with large (unaltered) reward while meeting both the knapsack and orienteering constraints.

Proof. For a value λ ≥ 0, define an orienteering instance I(λ) on metric (V, d) with root ρ, travel budget L, and profits rλ_v := rv − λ·sv at each v ∈ V. Note that the optimal solution to this orienteering instance has value at least Opt − λ·W, where Opt is the optimal solution to the original KnapOrient instance.

Let Algo(λ) denote an α-approximate solution to I(λ) as well as its profit; we have α = 2 + δ via the algorithm from [7]. By exhaustive search, let us find:

(3.1)   λ∗ := max { λ ≥ 0 : Algo(λ) ≥ λ·W/α }

Observe that by setting λ = Opt/(2W), we have Algo(λ) ≥ (1/α)(Opt − λW) = Opt/(2α). Thus λ∗ ≥ Opt/(2W).

Let σ denote the ρ-path in solution Algo(λ∗), and let ∑_{v∈σ} sv = y·W for some y ≥ 0. Partition the vertices of σ into c = max{1, ⌊2y⌋} parts σ1, . . . , σc with ∑_{v∈σj} sv ≤ W for all j ∈ {1, . . . , c}. This partition can be obtained by greedy aggregation since max_{v∈V} sv ≤ W (all vertices with larger size can be safely excluded by the algorithm). Set σ′ ← σk for k = arg max_{j=1,...,c} r(σj). We then output σ′ as our approximate solution to the KnapOrient instance. It remains to bound the reward we obtain.

r(σ′) ≥ r(σ)/c
      ≥ (λ∗·y·W + λ∗·W/α)/c
      = λ∗·W·(y + 1/α)/c
      ≥ λ∗·W·min{ y + 1/α , 1/2 + 1/(2αy) }
      ≥ λ∗·W/α

The second inequality is by r(σ) − λ∗·s(σ) = Algo(λ∗) ≥ λ∗W/α due to the choice (3.1), which implies that r(σ) ≥ λ∗·s(σ) + λ∗·W/α = λ∗yW + λ∗W/α by the definition of y. The third inequality is by c ≤ max{1, 2y}. The last inequality uses α ≥ 2. It follows that r(σ′) ≥ Opt/(2α), thus giving us the desired approximation guarantee.
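To make the two steps of this proof concrete (Lagrangian reward adjustment, then a greedy split of the resulting path), here is a minimal Python sketch. It assumes a black-box α-approximate orienteering solver orienteer(profits, L, root) whose name and signature are hypothetical, and it replaces the exhaustive search for λ∗ with a simple doubling search; it is an illustration of the scheme under those assumptions, not the exact procedure from the paper.

    def alg_ko(V, rewards, sizes, L, W, root, orienteer, alpha=2.0):
        """Sketch of AlgKO: Lagrangian reward adjustment plus greedy split."""
        if W <= 0:
            return []
        V = [v for v in V if sizes[v] <= W]   # larger jobs can never fit in the knapsack

        def solve(lam):
            # Orienteering instance I(lam) with profits r_v - lam * s_v.
            profits = {v: rewards[v] - lam * sizes[v] for v in V}
            path = orienteer(profits, L, root)        # assumed alpha-approximate black box
            return path, sum(profits[v] for v in path)

        # Look for a large lam with Algo(lam) >= lam * W / alpha, as in (3.1).
        # (A doubling search stands in for the paper's exhaustive search.)
        lam_star, path_star = 0.0, solve(0.0)[0]
        lam, total = 1.0 / (W + 1.0), sum(rewards.values())
        while lam * W / alpha <= total:
            path, value = solve(lam)
            if value >= lam * W / alpha:
                lam_star, path_star = lam, path
            lam *= 2.0

        # Greedily aggregate path_star into pieces of total size <= W and
        # return the piece with the largest (unaltered) reward.
        pieces, cur, cur_size = [], [], 0.0
        for v in path_star:
            if cur and cur_size + sizes[v] > W:
                pieces.append(cur)
                cur, cur_size = [], 0.0
            cur.append(v)
            cur_size += sizes[v]
        pieces.append(cur)
        return max(pieces, key=lambda piece: sum(rewards[v] for v in piece))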

As an aside, this Lagrangian approach can give a constant-factor approximation for a two-budget version of stochastic orienteering, but it is unclear how to extend it to the single-budget version. In particular, we are not able to show that the Lagrangian relaxation (of processing times) has objective value Ω(Opt). This is because different decision paths in the Opt tree might vary a lot in their processing times, implying that there is no reasonable candidate for a good Lagrange multiplier.

4 A Strawman Approach: Reduction to Deterministic Orienteering

As mentioned in the introduction, a natural idea for StocOrient is to replace random jobs by deterministic ones with size equal to the expected size E[Sv], and find a near-optimal orienteering solution P to the deterministic instance which gets reward R. One can then use this path P to get a non-adaptive strategy for the original StocOrient instance with expected reward Ω(R).


Indeed, suppose the path P spends time L traveling and W waiting on the deterministic jobs, such that L + W ≤ B. Then picking a random half of the jobs and visiting them results in a non-adaptive solution for StocOrient which travels at most L and processes jobs for time at most W/2 in expectation. Hence, Markov's inequality says that with probability at least 1/2, all jobs finish processing within W time units and we get the entire reward of this sub-path, which is Ω(R).

However, the problem is in showing that R = Ω(Opt)—i.e., that the deterministic instance has a solution with large reward. The above simplistic reduction of replacing random jobs by deterministic ones with the mean size fails even for stochastic knapsack: suppose the knapsack budget is B, and each of the n jobs has size B·n with probability 1/n, and size 0 otherwise. Note that the expected size of every job is then B. Therefore, a deterministic solution can pick only one job, whereas the optimal solution would finish Ω(n) jobs with high probability. However, observe that this problem disappears if we truncate all sizes at the budget, i.e., set the deterministic size to be the expected "truncated" size E[min(Sj, B)], where Sj is the random size of job j. (Of course, we also have to set the reward to be rj · Pr[Sj ≤ B] to discount the reward from impossible size realizations.) Now E[min(Sj, B)] reduces to B/n, and so the deterministic instance can get Ω(n) reward. Indeed, this is the approach used by [12] to get an O(1)-approximation and adaptivity gap.

But for StocOrient, is there a good truncation threshold? Considering E[min(Sj, B)] fails on the example where all jobs are co-located at a point at distance B − 1 from the root. Each job v has size B with probability 1/B, and 0 otherwise. Truncation by B gives an expected size E_{Sv∼πv}[min(Sv, B)] = 1 for every job, and so the deterministic instance gets reward from only one job, while the StocOrient optimum can collect Ω(B) jobs. Noticing that any algorithm has to spend B − 1 time traveling to reach any vertex that has a job, we can instead truncate each job j's size at B − d(ρ, j), which is the maximum amount of time we can possibly spend at j (since we must reach vertex j from ρ). However, while this fix works for the aforementioned example, the following example shows that the deterministic instance gets only an O(log log B / log B) fraction of the optimal stochastic reward.

Consider n = log B jobs on a line as in Figure 4.1. For i = 1, 2, . . . , log B, the i-th job is at distance B(1 − 1/2^i) from the root, and it takes size B/2^i with probability p := 1/ log B and size 0 otherwise. Each job has unit reward. The optimal (adaptive and non-adaptive) solution to this instance is to try all the jobs in order: with probability (1 − p)^{log B} ≈ 1/e, all the jobs instantiate to size 0 and we accrue reward Ω(log B).

Figure 4.1: Bad example for replacing sizes by expectations—jobs v1, v2, . . . , v_{log B} on a line from the root ρ, with geometrically decreasing gaps B/2, B/4, . . .

However, suppose we replace each job by its expected size µi = B/(2^i log B), and try to pick jobs so that the total travel plus expected sizes is at most B. Suppose j is the first job we pick along the line; then, because its size is µj, we cannot reach any jobs in the last µj length of the path. The number of these lost jobs is log µj = log B − j − log log B (because of the geometrically decreasing gaps between jobs), and hence we can reach only jobs j, j + 1, . . . , j + log log B − 1—giving us a maximum profit of log log B even if we ignore the space these jobs would take. (In fact, since their sizes decrease geometrically, we can indeed get all but a constant number of these jobs.)

This shows that replacing jobs in a StocOrient instance by their expected truncated sizes gives a deterministic instance whose optimal reward is smaller by an Ω(log B / log log B) factor.

4.1 Why the above approaches failed, and our approach The reason why the determinization techniques described above worked for stochastic knapsack, but failed for stochastic orienteering, is the following: the total size of jobs is always roughly B in knapsack (so truncating at B was the right thing to do), but in orienteering it depends on the total time spent traveling, which is itself a random quantity even for a non-adaptive solution. One way around this is to guess the amount of time spent processing jobs (up to a factor of 2) that yields the largest profit, and use that as the truncation threshold. Such an approach seems like it should lose an Ω(log B) factor of the reward. Surprisingly, we show that this simple algorithm actually does much better: it gets a constant factor of the reward of a non-adaptive optimum, and an O(log log B)-approximation when compared to the adaptive optimum!

At a high level, our approach is to solve this problem in three steps. Suppose we start off with an instance Iso of StocOrient with optimal (adaptive) solution having expected reward Opt. The high-level overview of our algorithm is described in Figure 4.2. However, there are many details to be addressed, and we flesh out the details of this algorithm over the next couple of sections.


Before we proceed, it would be useful to recall our definition of the KnapOrient problem from Section 3.

5 The Algorithm for Non-Adaptive Stochastic Orienteering

We first consider the non-adaptive StocOrient problem, and present an O(1)-approximation algorithm. This proves Theorem 1.3, and also serves as a warmup for the more involved analysis in the adaptive setting.

Recall that the input consists of a metric (V, d) with each vertex v ∈ V representing a stochastic job having a deterministic reward rv ∈ Z≥0 and a random processing time/size Sv distributed according to πv : R+ → [0, 1]; we are also given a root ρ and budget B. A non-adaptive strategy is an ordering π of the vertices (starting with ρ), which corresponds to visiting the vertices (and processing the respective jobs) in the order π. The goal in the non-adaptive StocOrient problem is to compute an ordering that maximizes the expected reward, i.e., the total reward of all items which are completed within the budget of B (travel + processing times).

Definition 1. (Truncated Means) For any vertex u ∈ V and any value Z ≥ 0, let µu(Z) := E_{Su∼πu}[min(Su, Z)] denote the expected size truncated at Z. Note that µu(Z1) ≤ µu(Z2) and µu(Z1 + Z2) ≤ µu(Z1) + µu(Z2) for all Z2 ≥ Z1 ≥ 0.

Definition 2. (Valid KnapOrient Instances) Given an instance Iso of StocOrient and a value W ≤ B, we define the valid KnapOrient instance with parameter W to be Iko(W) := KnapOrient(V, d, {(su, r′u) : u ∈ V}, L, W, ρ), as follows:

(i) Set L = B − W.

(ii) For all u ∈ V, define its deterministic size su = µu(W/2).

(iii) For all u ∈ V, set its reward to be r′u = ru if Pr_{Su∼πu}[Su > W/2] ≤ 1/2, and r′u = 0 otherwise.
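As an illustration of Definitions 1 and 2, the following Python sketch builds the deterministic sizes and filtered rewards of Iko(W) from discrete size distributions; the dictionary-based representation is our own choice and not part of the paper.

    def truncated_mean(size_dist, Z):
        # mu_u(Z) = E[min(S_u, Z)] for a discrete distribution {size: probability}.
        return sum(p * min(s, Z) for s, p in size_dist.items())

    def valid_knaporient_instance(jobs, B, W):
        # `jobs` maps each vertex u to (size_dist, reward r_u).
        # Returns the travel budget L = B - W and, per vertex, (s_u, r'_u) as in
        # Definition 2: s_u = mu_u(W/2), and r'_u = r_u only if Pr[S_u > W/2] <= 1/2.
        L = B - W
        instance = {}
        for u, (size_dist, reward) in jobs.items():
            s_u = truncated_mean(size_dist, W / 2.0)
            overflow = sum(p for s, p in size_dist.items() if s > W / 2.0)
            r_u = reward if overflow <= 0.5 else 0.0
            instance[u] = (s_u, r_u)
        return L, W, instance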

Recall that KnapOrient has deterministic sizes but two budgets L and W, and that AlgKO is an O(1)-approximation algorithm for it. The algorithm for non-adaptive StocOrient now proceeds as given below.

Note that the algorithm gives a randomized non-adaptive policy: it chooses a random (sub)path to follow, and just visits all the jobs on that path until time runs out.

5.1 The Analysis The analysis consists of two conceptual parts. Let Opt denote both the optimal non-adaptive strategy and its total expected reward.

Algorithm 1 Algorithm AlgSO for StocOrient on input Iso = (V, d, {(πu, ru) : u ∈ V}, B, ρ)

1: for all v ∈ V do
2:   let Rv := rv · Pr_{Sv∼πv}[Sv ≤ B − d(ρ, v)] be the expected reward of the single-vertex tour to v.
3: w.p. 1/2, just visit the vertex v with the highest Rv and exit.
4: for W = B, B/2, B/4, . . . , B/2^⌈log B⌉, 0 do
5:   let i = log(B/W) if W ≠ 0, otherwise let i = ⌈log B⌉ + 1.
6:   let Pi be the path returned by AlgKO on the valid KnapOrient instance Iko(W).
7:   let Ri be the reward of this KnapOrient solution Pi.
8: let Pi∗ be the solution among {Pi}_{i∈[log B+1]} with maximum reward Ri.
9: sample each vertex in Pi∗ independently w.p. 1/4, and visit these sampled vertices in the order given by Pi∗.

• We first show that either just a single-vertex tour has reward Ω(Opt), or one of the valid KnapOrient instances for some W of the form B/2^i (or W = 0) has reward Ω(Opt).

• If a single-vertex tour is good, we visit it with 50% chance and are done. If not, since AlgKO is an O(1)-approximation algorithm for KnapOrient, the solution Pi∗ selected in Step 8 must have reward Ri∗ = Ω(Opt), albeit only as a KnapOrient solution. So we finally show that a solution for any valid KnapOrient instance with reward R can be converted into a non-adaptive strategy for StocOrient with reward Ω(R).

The following theorem summarizes the first part of this argument:

Theorem 5.1. (Structure Theorem 1) Given an instance Iso for which an optimal non-adaptive strategy has expected reward Opt, either there is a single-vertex tour with expected reward Ω(Opt), or there exists W = B/2^i for some i ∈ Z≥0 (or W = 0) for which the valid KnapOrient instance Iko(W) has reward Ω(Opt).

For now, let us assume Theorem 5.1 and complete the proof of Theorem 1.3; we then prove Theorem 5.1 in the following subsection. Firstly, note that if there is a single-vertex tour with expected reward Ω(Opt), then our algorithm gets this reward in Step 3 with probability 1/2. Therefore, for the remainder of this section, assume that no single-vertex tour is good; hence, by Theorem 5.1 and our O(1)-approximation algorithm (Theorem 3.1) for KnapOrient, the path Pi∗ with the highest reward identified in Step 8 has reward Ω(Opt).


Step 1: Construct a suitable instance Iko of Knapsack Orienteering (KnapOrient), with the guarantee that the optimal profit of this KnapOrient instance Iko is Ω(Opt/ log log B).

Step 2: Use Theorem 3.1 on Iko to find a path P with reward Ω(Opt/ log log B).

Step 3: Follow this solution P, and process the jobs in the order P visits them.

Figure 4.2: High-level overview of our algorithm.

For brevity, denote Pi∗ as P∗, and let W∗ = B/2^{i∗} be the associated value of W. Let P∗ = 〈ρ, v1, v2, . . . , vl〉; hence Ri∗ = ∑_{u∈P∗} r′u. Since P∗ is feasible for Iko(W∗), we know that (a) the length of the path P∗ is at most B − W∗, and (b) ∑_{u∈P∗} su ≤ W∗. Recall that by the way we defined the valid KnapOrient instances (Definition 2), su = µu(W∗/2) for all u, and r′u = ru if Pr_{Su∼πu}[Su > W∗/2] ≤ 1/2, and r′u = 0 otherwise.

Lemma 5.1. For any vertex v ∈ P∗ such that r′v > 0, the probability of successfully reaching v and finishing the job at v before violating the budget is at least 1/16.

Proof. The proof is an easy application of Markov's inequality. Indeed, consider the sum

(5.2)   ∑_{u ≺_{P∗} v} 1u · min(Su, W∗/2)

over the vertices on P∗ that precede v, where 1u is the indicator random variable for whether u is sampled in Step 9. If this sum is at most W∗/2 (the "good" event), then the total time spent processing jobs up to (but not including) v is at most W∗/2. Since we sample each vertex in P∗ independently with probability 1/4, and since ∑_{x∈P∗} E[min(Sx, W∗/2)] = ∑_{x∈P∗} sx ≤ W∗, the expected value of the sum (5.2) is at most W∗/4. By Markov's inequality, the "good" event happens with probability at least 1/2. We condition on the occurrence of this event.

Now, job v is sampled independently with probability 1/4, and if chosen, it finishes processing within W∗/2 time with probability at least 1/2 (because r′v > 0 and therefore Pr_{Sv∼πv}[Sv > W∗/2] ≤ 1/2 by definition). Finally, since the total time spent traveling on the entire path P∗ is at most B − W∗, we can combine the above events to get that the job at v successfully completes with probability at least 1/16.

Now, linearity of expectation implies that using Pi∗ with sub-sampling, as in Step 9, gives us an expected reward of at least Ri∗/16. Since Ri∗ is Ω(Opt), this completes the proof of Theorem 1.3.

5.1.1 Proof of Theorem 5.1 The proof proceeds as follows: assuming that no single-vertex tour has reward Ω(Opt), we want to show that one of the valid KnapOrient instances must have large reward. We first note that this assumption lets us make the following structural observation about Opt.

Observation 1. Assume that no single-vertex tour has reward Ω(Opt). Then Opt can skip all visits to vertices u such that Pr_{Su∼πu}[Su > B − d(ρ, u)] ≥ 1/2, and still preserve a constant factor of the total reward.

Proof. Say that a vertex u is bad if Pr_{Su∼πu}[Su > B − d(ρ, u)] ≥ 1/2. Notice that with each visit to a bad vertex u, the probability that Opt can continue drops by a factor of at least 1/2, because of the high chance of the size being large enough to exceed the total budget. Therefore, the total expected reward that we can collect from all bad jobs is at most (say) twice that obtainable by visiting the single best bad vertex. Since the expected reward from any single-vertex tour is small by assumption, we can safely skip such vertices in Opt and still retain Ω(Opt) reward.

Without loss of generality, let the optimal non-adaptive ordering be P∗ = (ρ = v0, v1, v2, . . . , vn); also denote its expected total reward by Opt. For any vj, let Dj = ∑_{i=1}^{j} d(v_{i−1}, vi) denote the total distance traveled before visiting vertex vj. Note that while the time spent before visiting any vertex is a random quantity, the distance is deterministic, since we deal with non-adaptive strategies. Let vj∗ denote the first vertex vj on P∗ such that

(5.3)   ∑_{i<j} µ_{vi}(B − Dj) ≥ K·(B − Dj)

Here K is some constant which we will fix later.

Lemma 5.2. The probability that Opt visits some vertex indexed j∗ or higher is at most e^{1−K/2}.

Proof. If Opt visits vertex vj∗ then we have ∑_{i<j∗} S_{vi} ≤ B − Dj∗. This also implies that ∑_{i<j∗} min(S_{vi}, B − Dj∗) ≤ B − Dj∗. Now, for each i < j∗ let us define a random variable Xi := min(S_{vi}, B − Dj∗)/(B − Dj∗). Note that the Xi's are independent [0, 1] random variables, and that E[Xi] = µ_{vi}(B − Dj∗)/(B − Dj∗). From this definition, it is also clear that the probability that Opt visits vj∗ is upper bounded by the probability that ∑_{i<j∗} Xi ≤ 1. To this end, we have from Inequality (5.3) that ∑_{i<j∗} E[Xi] ≥ K. Therefore we can apply a standard Chernoff bound to conclude that

Pr[Opt visits vertex vj∗] ≤ Pr[ ∑_{i<j∗} Xi ≤ 1 ] ≤ e^{1−K/2}.

This completes the proof.
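For completeness, here is the arithmetic behind the last step, assuming the standard multiplicative Chernoff lower-tail bound Pr[∑ Xi ≤ (1 − δ)µ] ≤ exp(−δ²µ/2) for independent [0, 1]-valued random variables with µ = ∑ E[Xi] (this unpacking is ours, not spelled out in the text). Choosing δ so that (1 − δ)µ = 1, i.e., δ = 1 − 1/µ, and using µ ≥ K:

    Pr[ ∑_{i<j∗} Xi ≤ 1 ] ≤ exp( −(µ/2)(1 − 1/µ)² ) = exp( −(µ − 2 + 1/µ)/2 ) ≤ e^{1−µ/2} ≤ e^{1−K/2}.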

We therefore conclude that, for a suitably large constant K (say K = 10), we can truncate Opt at vertex v_{j∗−1} and still lose only a constant fraction of the reward: this is because Opt can reach vertex vj∗ with probability at most e^{1−K/2} by Lemma 5.2, and conditioned on reaching vj∗, the expected reward it can fetch is at most Opt. Thus:

∑_{i≤j∗−1} r_{vi} =: R∗ = Ω(Opt).

Moreover,

∑_{i≤j∗−1} µ_{vi}(B − D_{j∗−1}) = ∑_{i<j∗−1} µ_{vi}(B − D_{j∗−1}) + µ_{v_{j∗−1}}(B − D_{j∗−1}) ≤ (K + 1)(B − D_{j∗−1}).

The last inequality is by the choice of j∗ in equation (5.3). Now if D_{j∗−1} < B, then let ℓ ∈ Z+ be such that B/2^ℓ < B − D_{j∗−1} ≤ B/2^{ℓ−1}, and set W∗ = B/2^ℓ. Otherwise, if D_{j∗−1} = B, let W∗ = 0. Observe that in either case, D_{j∗−1} ≤ B − W∗. Furthermore,

(5.4)   ∑_{i≤j∗−1} µ_{vi}(W∗/2) ≤ ∑_{i≤j∗−1} µ_{vi}(B − D_{j∗−1}) ≤ (K + 1)(B − D_{j∗−1}) ≤ 2(K + 1)W∗.

A Valid Knapsack Orienteering Instance with Large Reward. We now show that the prefix of P∗ up to v_{j∗−1} (which has length at most B − W∗) is a good candidate solution for the valid KnapOrient instance Iko(W∗), by applying the following lemma with parameters R = Ω(Opt), W = W∗ and K′ = 2(K + 1) = O(1), all defined as above. From the lemma, we obtain a solution Π′ which is feasible for the valid KnapOrient instance Iko(W = W∗) and has reward Ω(R/K′) = Ω(Opt), and this completes the proof of Theorem 5.1.

Lemma 5.3. Given parameters R, W, and K′, suppose that no single-vertex tour has expected reward Ω(R/K′). Then, given a path Π = (v1, v2, . . . , vj) of length at most B − W such that

• ∑_{i≤j} r_{vi} ≥ R, and

• ∑_{i≤j} µ_{vi}(W/2) ≤ K′·W,

the valid KnapOrient instance with parameter W has reward Ω(R/K′).

Proof. We first partition the vertices in Π into two classes based on their chance of overflowing W/2: let L = {u ∈ Π | Pr_{Su∼πu}[Su > W/2] ≥ 1/2}, and let S be the other vertices visited in Π. Notice that all the vertices in S have r′u = ru in the valid instance Iko(W) of KnapOrient.

Case (i): ∑_{u∈L} ru ≥ (1/2) ∑_{u∈Π} ru. By the definition of L, each node u ∈ L has µu(W/2) ≥ W/4; moreover, ∑_{u∈Π} µu(W/2) ≤ K′·W by the second assumption of the lemma (which in turn follows from equation (5.4)). So |L| ≤ 4K′, and the node v′ in L with the highest reward has reward r_{v′} ≥ Ω(R/K′). Now by Observation 1, the job located at v′ instantiates to a size at most B − d(ρ, v′) with probability at least 1/2, and therefore the single-vertex tour visiting v′ has expected reward Ω(R/K′). This contradicts our assumption that no single-vertex tour is good; so this case cannot occur.

Case (ii): ∑_{u∈S} ru ≥ (1/2) ∑_{u∈Π} ru. In this case we construct the (natural) feasible solution Π′ to the valid KnapOrient instance with parameter W:

(a) Partition the vertices in S into at most 2K′ groups such that the sum of the truncated means of the jobs in each group is at most W.

(b) Pick the group with the largest sum of rewards: clearly this is Ω(R/K′).

(c) Let Π′ visit only the nodes of the group from (b).

Since Π had length at most B − W, the subpath Π′ also satisfies the travel budget. Further, the partitioning in step (a) ensures that ∑_{u∈Π′} su ≤ W, and the definition of S ensures that r′u = ru for all vertices u ∈ Π′, so the reward is ∑_{u∈Π′} r′u ≥ Ω(R/K′). This completes the proof of Lemma 5.3.

6 The Algorithm for Adaptive Stochastic Orienteering

In this section we consider the adaptive StocOrient problem. We show that the same algorithm (Algorithm AlgSO) is an O(log⌈log B⌉)-approximation to the best adaptive solution. Note that this also establishes an adaptivity gap of O(log log B), thus proving Theorem 1.2.

Our proof proceeds by showing that for an optimal adaptive solution with expected reward Opt, either just a single-vertex tour has reward Ω(Opt/ log⌈log B⌉), or one of the valid KnapOrient instances (Definition 2) for some W of the form B/2^i (or W = 0) has reward Ω(Opt/ log⌈log B⌉). This is sufficient because we can now plug in the following Structure Theorem 2 (Theorem 6.1) instead of the non-adaptive Structure Theorem 5.1, and the rest of the previous section's analysis goes through unchanged. Indeed, we argued in Section 5.1 that any solution for a valid KnapOrient instance with reward Ω(R) can be converted into a non-adaptive strategy for StocOrient with reward Ω(R), and therefore this also establishes the adaptivity gap of O(log⌈log B⌉) for StocOrient.

Theorem 6.1. (Structure Theorem 2) Given an instance Iso for which an optimal adaptive strategy has expected reward Opt, either there is a single-vertex tour with expected reward Ω(Opt/ log log B), or there exists W = B/2^i for some i ∈ Z≥0 (or W = 0) for which the valid KnapOrient instance Iko(W) has reward Ω(Opt/ log⌈log B⌉).

Before we begin, recall the typical instance Iso := StocOrient(V, d, {(πu, ru) : u ∈ V}, B, ρ) of the stochastic orienteering problem, where the metric is (V, d), B is the total budget of our tour, the job at vertex u has reward ru and a probability distribution πu : R+ → [0, 1] over sizes, and ρ is the starting vertex.

Roadmap. We begin by giving a roadmap of the proof. Let us view the optimal adaptive strategy Opt as a decision tree where each node is labeled with a vertex/job, and the children correspond to different size instantiations of the job. For any sample path P in this decision tree, consider the first node xP where the sum of expected sizes of the jobs played until xP exceeds the "budget remaining"—here, if L_{x,P} is the total distance traveled from the root ρ to this vertex xP by visiting vertices along P, then the remaining budget is B − L_{x,P}. Call such a node a frontier node, and call the union of all such frontier nodes the frontier. To make sense of this definition, note that if the orienteering instance were non-stochastic (with all sizes equal to the expected sizes), then we would not be able to get any reward from portions of the decision tree on or below the frontier nodes. Unfortunately, since job sizes are random for us, this may not be the case.

The main idea in the proof is to show that we do not lose too much reward by truncation: even if we truncate Opt along this frontier in a StocOrient instance, we still get an expected reward of Ω(Opt/ log⌈log B⌉) from the truncated tree. Now an averaging argument says that there exists some path P∗ of length L where (i) the total reward of jobs is Ω(Opt/ log⌈log B⌉), and (ii) the sum of expected sizes of the jobs is O(B − L); this gives us the candidate KnapOrient solution. However, as we elaborated earlier in Section 4, we need to be very careful in the way we define our deterministic sizes and truncation thresholds.

Viewing Opt as a Discrete-Time Stochastic Process. Note that the transitions of the decision tree Opt represent travel between vertices: if the parent node is labeled with vertex u and its child is labeled with v, the transition takes d(u, v) time. To simplify matters, we take every such transition and subdivide it into d(u, v) unit-length transitions. The intermediate nodes added in are labeled with new dummy vertices, with dummy jobs of deterministic size 0 and reward 0. We denote this tree by Opt′, and note that the amount of time spent traveling is exactly the number of edges traveled down this tree. (All this is only for analysis.) Now if we start a particle at the root and let it evolve down the tree based on the random outcomes of job sizes, then the node reached at timestep t corresponds to some job with a random size and reward. This naturally gives us a discrete-time stochastic process T, which at every timestep picks a job of size St ∼ Dt and reward Rt. Note that St, Rt and the probability distribution Dt are all random variables that depend on the outcomes of the previous timesteps 0, 1, . . . , t − 1 (since the actual job that the particle sees after going t hops depends on past outcomes). We stop this process at the first (random) timestep t_end such that ∑_{t=0}^{t_end} St ≥ B − t_end: this is the natural point to stop, since it is precisely the timestep when the total processing time plus the total distance traveled exceeds the budget B.

Some notation: Nodes will correspond to states of the decision tree Opt′, whereas vertices are points in the metric (V, d). The level t of a node x in Opt′ is the number of hops in the decision tree from the root to reach x—this is the timestep when the stochastic process would reach x, or equivalently the travel time to reach the corresponding vertex in the metric. We denote this by level(x). Let label(x) be the vertex labeling x. We abuse notation and use Sx, Rx and πx to denote the size, reward, and size distribution for node x—hence Sx = S_{label(x)}, Rx = R_{label(x)} and πx = π_{label(x)}. We use x′ ≼ x to denote that x′ is an ancestor of x.

Now we begin the proof of Theorem 6.1. Firstly, we assume that there are no co-located stochastic jobs (i.e., there is only one job at every vertex); note that this also implies that we have to travel a non-zero integral distance between jobs. (This is only to simplify the exposition of the proof; we explain how to discharge this assumption in Section 6.1.) The proof proceeds by assuming that no single-vertex tour has reward Ω(Opt/ log⌈log B⌉), and then showing that one of the valid KnapOrient instances must have large reward.


As in Section 5, we can make the following observation about the optimal solution (the statement is identical to Observation 1; we restate it for continuity).

Observation 2. Assume that no single-vertex tour has reward Ω(Opt/ log⌈log B⌉). Then Opt can skip all visits to vertices u such that Pr_{Su∼πu}[Su > B − d(ρ, u)] ≥ 1/2, and still preserve a constant factor of the total reward.

Proof. See proof of Observation 1.

Defining the Frontiers. Henceforth, we focus on the decision tree Opt′ and the induced stochastic process T. Consider any intermediate node x and the sample path from the root to x in Opt′. We call x a star node if x is the first node along this sample path for which the following condition is satisfied:

(6.5)   ∑_{x′≼x} µ_{x′}(B − level(x)) ≥ 8K·(B − level(x))

Observe that no star node is an ancestor of another star node. To get a sense of this definition of star nodes, ignore the truncation for a moment: then x is a star node if the expected sizes of all the level(x) jobs on the sample path until x sum to at least 8K(B − level(x)). But since we have spent level(x) time traveling to reach x, the process only goes below x if the actual sizes of the jobs total at most B − level(x), i.e., if the sizes of the jobs are a factor K smaller than their expectations. If this were an unlikely event, then pruning Opt′ at the star nodes would result in little loss of reward.

And that is precisely what we show. The following lemma is the main technical component of the reduction, and bounds the probability that the random process T visits nodes that are descendants of star nodes.

Lemma 6.1. The process T continues beyond a star node with probability at most 1/10, if K = Ω(log⌈log B⌉).

Proof. We group the star nodes into bands 0, 1, . . . , ⌈log B⌉ + 1 based on the value of B − level(x). Say that a star node x is in band i if B − level(x) ∈ (B/2^{i+1}, B/2^i] for 0 ≤ i ≤ ⌈log B⌉, and in band ⌈log B⌉ + 1 if level(x) = B. Firstly, note that Opt can never continue beyond a star node in band ⌈log B⌉ + 1, since it would have run out of budget by then. Therefore we focus on a particular band i ∈ {0, 1, . . . , ⌈log B⌉} and bound the probability of T reaching a star node in band i by 1/(10(⌈log B⌉ + 1)); then a trivial union bound over all i suffices.

In order to bound this probability, consider the following altered stochastic process Ti: follow T as long as it could lead to a star node in band i. If we reach a node y such that no band-i star node is a descendant of y, then we stop the process Ti at y; else we stop when we reach a star node in band i. An illustration of the optimal decision tree, the different bands, and the altered processes is given in Figure 6.3.

By a straightforward coupling argument, the probabilities of reaching a band-i star node in T and in Ti are identical, and hence it suffices to bound the corresponding probability of continuing beyond a band-i star node in Ti. Therefore, let us fix a particular band i.

Figure 6.3: Optimal decision tree example, rooted at ρ: dashed lines indicate the bands, × indicates star nodes.

Claim 1. For each i ∈ {0, 1, . . . , ⌈log B⌉} and any star node x in band i, the following inequality holds:

2K·B/2^i ≤ ∑_{x′≼x} µ_{x′}(B/2^{i+1}) ≤ 17K·B/2^i

Proof. By the definition of a star node (6.5), and since the band-i star node x has B/2^{i+1} ≤ B − level(x) ≤ B/2^i,

∑_{x′≼x} µ_{x′}(B/2^{i+1}) ≥ ∑_{x′≼x} µ_{x′}((B − level(x))/2) ≥ (1/2) ∑_{x′≼x} µ_{x′}(B − level(x)) ≥ 4K(B − level(x)) ≥ 2K·B/2^i.

The first two inequalities used the monotonicity and subadditivity of µ_{x′}(·).


Moreover, since y, the parent node of x, is not a star node, it satisfies ∑_{x′≼y} µ_{x′}(B − level(y)) < 8K(B − level(y)) = 8K(B − level(x) + 1). But since we are currently not considering band number ⌈log B⌉ + 1 (and all distances are at least 1), level(x) ≤ B − 1, and hence B − level(x) + 1 ≤ 2(B − level(x)) ≤ 2B/2^i. Thus we have ∑_{x′≼y} µ_{x′}(B − level(y)) < 16K·B/2^i. Now,

∑_{x′≼x} µ_{x′}(B/2^{i+1}) = ∑_{x′≼y} µ_{x′}(B/2^{i+1}) + µx(B/2^{i+1})
                        ≤ ∑_{x′≼y} µ_{x′}(B − level(y)) + B/2^{i+1}
                        ≤ 16K·B/2^i + B/2^{i+1} ≤ 17K·B/2^i

The first inequality uses B − level(y) ≥ B − level(x) ≥ B/2^{i+1}. This completes the proof.

Claim 2. For any i ∈ {0, 1, . . . , ⌈log B⌉ + 1} and any star node x in band i, if the process Ti reaches x, then

∑_{x′≼x} min(S_{x′}, B/2^{i+1}) ≤ B/2^i

Proof. Clearly, if the process Ti reaches node x, it must mean that ∑_{x′≼x} S_{x′} ≤ (B − level(x)) ≤ B/2^i, else we would have run out of budget earlier. The claim follows because truncation can only decrease the left-hand side.

We now finish upper bounding the probability of reaching a star node in band i using a martingale analysis. Define a sequence of random variables Zt, t = 0, 1, . . . where

(6.6)   Zt = ∑_{t′=0}^{t} ( min(S_{t′}, B/2^{i+1}) − µ_{S_{t′}}(B/2^{i+1}) ).

Since the subtracted term is precisely the expectation of the first term, the expected one-step change is zero and the sequence Zt forms a martingale. In turn, E[Zτ] = 0 for any stopping time τ. We define τ to be the time when the process Ti ends—recall that this is the first time when (a) either the process reaches a band-i star node, or (b) there is no way to get to a band-i star node in the future.

Claim 2 says that when Ti reaches any star node x, the sum of the first terms in (6.6) is at most B/2^i, whereas Claim 1 says the sum of the means is at least 2K·B/2^i. Because K ≥ 1, we can infer that Zt ≤ −K(B/2^i) at any star node (at level t). To bound the probability of reaching a star node in Ti, we appeal to Freedman's concentration inequality for martingales, which substantially generalizes the Azuma-Hoeffding inequality.

Theorem 6.2. (Freedman [13] (Theorem 1.6)) Consider a real-valued martingale sequence {Xk}_{k≥0} such that X0 = 0 and E[X_{k+1} | Xk, X_{k−1}, . . . , X0] = 0 for all k. Assume that the sequence is uniformly bounded, i.e., |Xk| ≤ M almost surely for all k. Now define the predictable quadratic variation process of the martingale to be

Wk = ∑_{j=0}^{k} E[Xj² | X_{j−1}, X_{j−2}, . . . , X0]

for all k ≥ 1. Then for all ℓ ≥ 0 and σ² > 0, and any stopping time τ, we have

Pr[ |∑_{j=0}^{τ} Xj| ≥ ℓ and Wτ ≤ σ² ] ≤ 2 exp( −(ℓ²/2)/(σ² + Mℓ/3) )

We apply the above theorem to the martingale difference sequence Xt = Zt − Z_{t−1}. Since each term Xt is just min(St, B/2^{i+1}) − µ_{St}(B/2^{i+1}), we get that E[Xt | X_{t−1}, . . .] = 0 by the definition of µ_{St}(B/2^{i+1}). Moreover, since the sizes and means are both truncated at B/2^{i+1}, we have |Xt| ≤ B/2^{i+1} with probability 1; hence we can set M = B/2^{i+1}. Finally, in order to bound the variance term Wt we appeal to Claim 1. Indeed, consider a single random variable Xt = min(St, B/2^{i+1}) − µ_{St}(B/2^{i+1}), and abbreviate min(St, B/2^{i+1}) by Y for convenience. Then:

E[Xt² | X_{t−1}, . . .] = E[(Y − E[Y])²] = E[Y²] − E[Y]² ≤ Ymax · E[Y] ≤ (B/2^{i+1}) · µ_{St}(B/2^{i+1})

Here, the first inequality uses Y ≥ 0 and Ymax as the maximum value of Y. The last inequality uses the definition of Y. Hence the term Wt is at most (B/2^{i+1}) ∑_{t′≤t} µ_{S_{t′}}(B/2^{i+1}) for the process at time t. Now from Claim 1 we have that for any star node (say at level t) in band i, ∑_{t′≤t} µ_{S_{t′}}(B/2^{i+1}) ≤ 17K(B/2^i). Therefore we have Wt ≤ 9K·(B/2^i)² for star nodes, and we set σ² to be this quantity.

So by setting ℓ = K(B/2^i), σ² = 9K(B/2^i)², and M = B/2^{i+1}, we get that Pr[reaching a star node in Ti] is at most

Pr[ |Zτ| ≥ K(B/2^i) and Wτ ≤ 9K(B/2^i)² ] ≤ 2 e^{−K/20}.

Setting K = Ω(log⌈log B⌉) and performing a simple union bound calculation over the ⌈log B⌉ bands completes the proof of Lemma 6.1.
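For concreteness, filling these parameters into Theorem 6.2 gives the stated constant (a quick check of the arithmetic, not spelled out in the text):

    (ℓ²/2)/(σ² + Mℓ/3) = [K²(B/2^i)²/2] / [9K(B/2^i)² + (B/2^{i+1})·K(B/2^i)/3] = (K/2)/(9 + 1/6) = 3K/55 ≥ K/20,

so the right-hand side of Freedman's bound is at most 2·exp(−K/20), matching the display above.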

Consider the new tree Opt′′ obtained by truncating Opt′ at the star nodes (of any level). Since by Lemma 6.1 we reach star nodes with probability at most 1/10, Opt′′ has reward at least 9Opt/10. This uses the fact that the expected conditional reward in any subtree of Opt′ is itself at most Opt: so the reward from Opt′ lost to truncation is at most Opt/10.

Claim 3. Every leaf node s in the tree Opt′′ satisfies ∑_{x≼s} µx(B − level(s) + 1) ≤ 9K·(B − level(s) + 1).

Proof. Let s′ denote the parent of leaf node s. Note that level(s′) = level(s) − 1, and s′ is not a star node (nor a descendant of one). So:

∑_{x≼s′} µx(B − level(s) + 1) = ∑_{x≼s′} µx(B − level(s′)) < 8K(B − level(s′)).

Clearly µs(B − level(s) + 1) ≤ B − level(s) + 1. Adding these inequalities, we obtain the claim.

So by an averaging argument, there exists a sample path P∗ in Opt′′ to a leaf node s∗ (say belonging to band i∗) of length level(s∗) with (a) the sum of rewards of the nodes visited along this path at least 9Opt/10, and (b) the sum of means (truncated at B − level(s∗) + 1) of these jobs at most 9K(B − level(s∗) + 1). By charging against a single-vertex tour, we may assume WLOG that level(s∗) ≤ B − 1. Then we have:

∑_{x≼s∗} µx((B − level(s∗))/2) ≤ ∑_{x≼s∗} µx(B − level(s∗) + 1) ≤ 18K(B − level(s∗))

Thus we can apply Lemma 5.3 on the path Π = P∗ of length level(s∗), with parameters R = (9/10)Opt, W = B − level(s∗) and K′ = 18K, to obtain that the valid KnapOrient instance Iko(W) has optimal value Ω(Opt/ log log B). This completes the proof of Theorem 6.1.

6.1 Handling Co-Located Jobs To help with the presentation of the above analysis, we assumed that a node x at depth l in the decision tree for Opt is processed only after the adaptive tour has traveled a distance of l. In particular, this meant that there is at most one stochastic job per node (because once Opt is done processing some job, it has to travel for at least one timestep in the decision tree, since its depth increases). However, if we are more careful and define the truncation of any node (in equation (6.5)) by the actual distance traveled along the tour, instead of simply its depth/level in the tree, then we can handle co-located jobs in exactly the same manner as above. In this situation, there could be several nodes on a sample path which have the same truncation threshold, but it is not difficult to see that the rest of the analysis proceeds in an identical fashion.

7 Stochastic Orienteering with Correlated Rewards (CorrOrient)

In this section we consider the stochastic orienteering problem where the reward of each job is a random variable that may be correlated with its size. Crucially, the distributions at different vertices are still independent of each other. For any vertex v, we are given a probability distribution over (size, reward) pairs as follows: for each t ∈ {0, 1, . . . , B}, the job at v has size t and reward rv,t with probability πv,t. Again we consider the setting without cancellations: once a job is started it must be run to completion unless the budget B is exhausted. The goal is to devise a (possibly adaptive) strategy that maximizes the expected reward of completed jobs, subject to the total budget (travel time + processing time) being at most B.

When there is no metric in the problem instance, this is a correlated stochastic knapsack instance, and [16] shows a non-adaptive algorithm which is a constant-factor approximation to the optimal adaptive strategy; this uses an LP relaxation that is quite different from that in the uncorrelated setting. The problem with extending that approach to stochastic orienteering is that we do not know of LP relaxations with good approximation guarantees even for the deterministic orienteering problem. We circumvented this issue in the uncorrelated case by using a martingale analysis to bypass the need for an LP relaxation, which gave a direct lower bound. We adopt a similar technique here, but, as Theorem 1.4 says, our approximation ratio is only an O(log n log B) factor. Moreover, we show that CorrOrient is at least as hard to approximate as the deadline orienteering problem, for which the best known guarantee is an O(log n)-approximation [2].

7.1 The Non-Adaptive Algorithm for CorrOrient We now describe our approximation algorithm, which proceeds via a reduction to a suitably constructed instance of the deadline orienteering problem. This reduction takes a Lagrangian relaxation of the processing times, and then performs an amortized analysis to argue that the reward is high and the budget is met with constant probability. (In this it is similar to the ideas of [15].) However, new ideas are required to handle correlations between sizes and rewards: indeed, this is where the deadline orienteering problem is needed. Also, we use an interesting counting property (Claim 4) to avoid double-counting reward.


Notation. Let Opt denote an optimal decision tree. We classify every execution of a job in this decision tree as belonging to one of (log₂ B + 1) types as follows: for i ∈ [log₂ B], a type-i job execution occurs when the processing time spent before running the job lies in the interval [2^i − 1, 2^{i+1} − 1). (Note that the same job might have different types on different sample paths of Opt, but for a fixed sample path down Opt, it can have at most one type.) If Opt(i) is the expected reward obtained from job runs of type i, then we have Opt = ∑_i Opt(i), and hence max_{i∈[log₂ B]} Opt(i) ≥ Ω(1/log B) · Opt. For all v ∈ V and t ∈ [B], let R_{v,t} := ∑_{z=0}^{B−t} r_{v,z} · π_{v,z} denote the expected reward when job v's size is restricted to being at most B − t. Note that this is the expected reward we can get from job v if it starts at time t.
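As a quick illustration of this quantity, here is a minimal Python sketch that computes R_{v,t} from a job's size distribution; the argument names pi_v and r_v (probability and reward indexed by size) are a hypothetical encoding chosen only for this sketch, not notation from the paper.

def expected_reward_if_started_at(pi_v, r_v, t, B):
    """R_{v,t}: expected reward of job v when it starts at time t, i.e.
    reward is collected only if the instantiated size is at most B - t."""
    if t > B:
        return 0.0
    return sum(pi_v[z] * r_v[z] for z in range(B - t + 1))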

7.1.1 Reducing CorrOrient to DeadlineOrient

The high-level idea is the following: for any fixed i, we create an instance of DeadlineOrient to get an O(log n) fraction of Opt(i) as reward; then choosing the best such setting of i gives us the O(log n log B)-approximation. To get the instance of DeadlineOrient, for each vertex v, we create several copies of it, where the copy 〈v, t〉 corresponds to the situation where the job starts at time t, and hence has reward R_{v,t}. However, to prevent the DeadlineOrient solution from collecting reward from many different copies of the same vertex, we make copies of vertices only when the reward changes by a geometric factor. Indeed, for t1 < t2, if it holds that R_{v,t1} ≤ 2R_{v,t2}, then we do not include the copy 〈v, t1〉 in our instance. The following claim is useful for defining such a minimal set of starting times for each vertex/job.

Claim 4. Given any non-increasing function f : [B] → R+, we can efficiently find a subset I ⊆ [B] such that

    f(t)/2 ≤ max_{ℓ∈I : ℓ≥t} f(ℓ)   and   ∑_{ℓ∈I : ℓ≥t} f(ℓ) ≤ 2 · f(t),   for all t ∈ [B].

Proof. The set I is constructed by Algorithm 2 below. Observe that B is always contained in the set I, and hence for any t ∈ [B], min{ℓ ≥ t : ℓ ∈ I} is well-defined.

To prove the claimed properties, let I = {ℓ_i}_{i=0}^{p}. For the first property, given any t ∈ [B] let ℓ(t) = min{ℓ ≥ t : ℓ ∈ I}. Observe that if this set is empty then it must be that f(t) = 0 and the claim trivially holds. Now assume that the value ℓ(t) exists and that ℓ(t) = ℓ_i. By the definition of ℓ(t) it must be that ℓ_{i−1} < t, and so k_i ≤ t. Hence f(ℓ_i) ≥ f(k_i)/2 ≥ f(t)/2; the first inequality is by the choice of ℓ_i, and the second since f is non-increasing.

Algorithm 2 Computing the support I in Claim 4.
1: let i ← 0, k_0 ← 0, I ← ∅.
2: while k_i ∈ [B] and f(k_i) > 0 do
3:     ℓ_i ← max{ℓ ∈ [B] : f(ℓ) ≥ f(k_i)/2}.
4:     k_{i+1} ← ℓ_i + 1, I ← I ∪ {ℓ_i}.
5:     i ← i + 1.
6: output set I.

We now show the second property. For any index i, we have ℓ_{i−1} < k_i ≤ ℓ_i < k_{i+1} ≤ ℓ_{i+1}. Note that f(k_{i+1}) = f(ℓ_i + 1) < f(k_i)/2 by the choice of ℓ_i. So f(ℓ_{i+1}) ≤ f(k_{i+1}) < f(k_i)/2 ≤ f(ℓ_{i−1})/2. Given any t ∈ [B], let q be the index such that ℓ_q = min{ℓ ≥ t : ℓ ∈ I}. Consider the sum:

    ∑_{i≥q} f(ℓ_i) = ∑_{j≥0} f(ℓ_{q+2j}) + ∑_{j≥0} f(ℓ_{q+1+2j}) ≤ f(ℓ_q) + f(ℓ_{q+1}) ≤ 2 · f(t).

The first inequality uses f(ℓ_{i+1}) < f(ℓ_{i−1})/2 and a geometric summation; the last is by t ≤ ℓ_q < ℓ_{q+1}.
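For concreteness, the following Python sketch is a direct transcription of Algorithm 2; it assumes f is given as a list of non-increasing, non-negative values indexed by 0, 1, ..., B (a hypothetical input encoding chosen for the sketch).

def geometric_support(f):
    """Algorithm 2: given a non-increasing f on {0, ..., B} (as a list),
    return the support set I of Claim 4."""
    B = len(f) - 1
    I = []
    k = 0
    while k <= B and f[k] > 0:
        # largest index whose value is still at least half of f(k)
        ell = max(l for l in range(B + 1) if f[l] >= f[k] / 2)
        I.append(ell)
        k = ell + 1
    return I

For example, geometric_support([16, 9, 9, 8, 3, 3, 1, 0]) returns [3, 5, 6]: roughly one index from each band of values that are within a factor two of each other, which is exactly what the two properties of Claim 4 ask for.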

Consider any i ∈ [log₂ B]. For each v ∈ V, apply Claim 4 to the function f(t) := R_{v,t+2^i−1} to obtain a subset I_v^i ⊆ [B], which defines which copies we use; namely, we create |I_v^i| co-located copies {〈v, ℓ〉 : ℓ ∈ I_v^i}, where node 〈v, ℓ〉 has deadline ℓ. Define the size s_i(〈v, ℓ〉) = E[min(S_v, 2^i)], and the reward r_i(〈v, ℓ〉) = R_{v,ℓ+2^i−1}. Finally, define N_i = {〈v, ℓ〉 : ℓ ∈ I_v^i, v ∈ V}.

Now we are ready to describe the DeadlineOrient instance I_i(λ) for any λ ≥ 0. The metric is (V, d), and for each vertex v ∈ V, the copies I_v^i are all co-located at v. The copy 〈v, ℓ〉 ∈ I_v^i has deadline ℓ and a profit of r_i(v, ℓ, λ) = r_i(v, ℓ) − λ · s_i(v). The goal is to find a path which maximizes the reward of the copies that are visited within their deadlines. Here, λ should be thought of as a Lagrangian multiplier, and hence the above instance is really a kind of Lagrangian relaxation of a DeadlineOrient instance which has a side constraint that the total processing time is at most ≈ 2^i. It is immediate from the definition of the rewards that Opt(I_i(λ)) is a non-increasing function of λ.
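To make the reduction concrete, here is a minimal Python sketch that builds the copies and their Lagrangian profits for a fixed i and λ. It reuses the geometric_support routine sketched after Claim 4; size_dist[v] (a map from sizes to probabilities) and rewards[v][t] (the reward r_{v,t}) are hypothetical input encodings, not part of the paper's formalism.

def build_deadline_orient_instance(V, size_dist, rewards, B, i, lam):
    """Construct the copies <v, ell> of the instance I_i(lambda): copy <v, ell>
    has deadline ell and profit r_i(v, ell) - lambda * s_i(v)."""
    def R(v, t):
        # expected reward of v if started at time t (collected only when size <= B - t)
        ub = B - t
        if ub < 0:
            return 0.0
        return sum(size_dist[v].get(z, 0.0) * rewards[v][z] for z in range(ub + 1))

    def s_i(v):
        # E[min(S_v, 2^i)], the truncated expected size
        return sum(p * min(z, 2 ** i) for z, p in size_dist[v].items())

    copies = []
    for v in V:
        f = [R(v, t + 2 ** i - 1) for t in range(B + 1)]  # the function fed to Claim 4
        for ell in geometric_support(f):                  # the set I_v^i
            profit = R(v, ell + 2 ** i - 1) - lam * s_i(v)
            copies.append((v, ell, profit))               # copy <v, ell> with deadline ell
    return copies

Any off-the-shelf (approximation) algorithm for DeadlineOrient can then be run on the resulting co-located copies over the original metric (V, d).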

The idea of our algorithm is to argue that, for the “right” setting of λ, the optimal DeadlineOrient solution for I_i(λ) has value Ω(Opt(i)), and moreover that we can recover a valid solution to the stochastic orienteering problem from an approximate solution to I_i(λ).

Lemma 7.1. For any i ∈ [log₂ B] and λ > 0, the optimal value of the DeadlineOrient instance I_i(λ) is at least Opt(i)/2 − λ · 2^{i+1}.

Proof. Consider the optimal decision tree Opt of the CorrOrient instance, and label every node in Opt by a (dist, size) pair, where dist is the total time spent traveling and size is the total time spent processing jobs before visiting that node. Note that both dist and size are non-decreasing as we move down Opt. Also, type-i nodes are those where 2^i − 1 ≤ size < 2^{i+1} − 1, so zeroing out the rewards from all but type-i nodes yields expected reward Opt(i).

For any vertex v ∈ V, let X_v^i denote the indicator random variable that job v is run as type-i in Opt, and let S_v be the random variable denoting its instantiated size; note that these are independent. Also let Y^i = ∑_{v∈V} X_v^i · min(S_v, 2^i) be the random variable denoting the sum of truncated sizes of type-i jobs. By the definition of type-i, we have Y^i ≤ 2 · 2^i with probability one, and hence E[Y^i] ≤ 2^{i+1}. For ease of notation, let q_v = Pr[X_v^i = 1] for all v ∈ V. We now have

(7.7)    2^{i+1} ≥ E[Y^i] = ∑_{v∈V} q_v · E[min(S_v, 2^i) | X_v^i = 1] = ∑_{v∈V} q_v · E[min(S_v, 2^i)] = ∑_{v∈V} q_v · s_i(v).

Now consider the expected reward Opt(i). We can write:

(7.8)    Opt(i) = ∑_{v∈V} ∑_{ℓ∈[B]} ∑_{k=2^i−1}^{2^{i+1}−2} Pr[1_{v,dist=ℓ,size=k}] · R_{v,ℓ+k} ≤ ∑_{v∈V} ∑_{ℓ∈[B]} Pr[1_{v,type=i,dist=ℓ}] · R_{v,ℓ+2^i−1},

where 1_{v,dist=ℓ,size=k} is the indicator that Opt visits v with dist = ℓ and size = k, and 1_{v,type=i,dist=ℓ} is the indicator that Opt visits v as type-i with dist = ℓ; the inequality holds because R_{v,·} is non-increasing and every type-i execution has size at least 2^i − 1. Now, going back to the metric, let P denote the set of all possible rooted paths traced by Opt(i) in the metric (V, d). For each path P ∈ P, define the following quantities.

• β(P) is the probability that Opt(i) traces P.
• For each vertex v ∈ P, d_v(P) is the travel time (i.e., dist) incurred in P prior to reaching v. Note that the actual time at which v is visited is dist + size, which is in general larger than d_v(P).
• w_λ(P) = ∑_{v∈P} [ (1/2) · R_{v,d_v(P)+2^i−1} − λ · s_i(v) ].

Moreover, for each v ∈ P, let ℓ_v(P) = min{ℓ ∈ I_v^i : ℓ ≥ d_v(P)}; recall that we defined I_v^i above using Claim 4, and since B ∈ I_v^i, the quantity ℓ_v(P) is well-defined.

For any path P ∈ P, consider P as a solution to I_i(λ) that visits the copies {〈v, ℓ_v(P)〉 : v ∈ P} within their deadlines. It is feasible for the instance I_i(λ) because for each vertex in P we defined ℓ_v(P) ≥ d_v(P), and therefore we reach the chosen copy within its deadline. Moreover, the objective value of P is precisely

    ∑_{v∈P} r_i(v, ℓ_v(P), λ) ≥ ∑_{v∈P} [ R_{v,ℓ_v(P)+2^i−1} − λ · s_i(v) ] ≥ ∑_{v∈P} [ (1/2) · R_{v,d_v(P)+2^i−1} − λ · s_i(v) ] = w_λ(P),

where the second inequality above uses the definition of ℓ_v(P) and Claim 4. Now Opt(I_i(λ)) is at least:

    max_{P∈P} w_λ(P) ≥ ∑_{P∈P} β(P) · w_λ(P)
                     = ∑_{P∈P} β(P) · ∑_{v∈P} [ (1/2) · R_{v,d_v(P)+2^i−1} − λ · s_i(v) ]
                     = (1/2) ∑_{v∈V} ∑_{ℓ∈[B]} Pr[1_{v,type=i,dist=ℓ}] · R_{v,ℓ+2^i−1} − λ · ∑_{v∈V} Pr[X_v^i = 1] · s_i(v)
                     ≥ Opt(i)/2 − λ · 2^{i+1}.

Above, the second-to-last equality follows by interchanging summations and splitting the two terms from the previous expression; the first term in the final inequality comes from (7.8), and the second term comes from (7.7) and the fact that q_v = Pr[X_v^i = 1] = Pr[v is visited as type-i].

Now let AlgDO denote an α-approximation algorithm for the DeadlineOrient problem.³ We focus on the “right” setting of λ, defined thus:

(7.9)    λ*_i = max{ λ : AlgDO(I_i(λ)) ≥ 2^i · λ / α }.

Lemma 7.2. For any i ∈ [log₂ B], we get λ*_i ≥ Opt(i)/2^{i+3}, and hence AlgDO(I_i(λ*_i)) ≥ Opt(i)/(8α).

Proof. Consider the setting λ = Opt(i)/2^{i+3}; by Lemma 7.1, the optimal solution to the DeadlineOrient instance I_i(λ) has value at least Opt(i)/4 ≥ 2^i · λ. Since AlgDO is an α-approximation for DeadlineOrient, it follows that AlgDO(I_i(λ)) ≥ Opt(I_i(λ))/α ≥ 2^i · λ/α. So λ*_i ≥ λ ≥ Opt(i)/2^{i+3}, as claimed.

³We will slightly abuse notation and use AlgDO(I_i(λ)) to denote both the α-approximate solution on instance I_i(λ) as well as its value.
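Equation (7.9) defines λ*_i implicitly through the black box AlgDO. The paper does not spell out here how λ*_i is computed; one plausible implementation, sketched below under the assumption that a λ within a (1+ε) factor of λ*_i suffices for the analysis, is to scan a geometric grid of λ values downward from an upper bound lam_max and keep the first (hence largest) grid point passing the test. The callable alg_do_value is a hypothetical wrapper that runs AlgDO on I_i(λ) and returns its value.

def approximate_lambda_star(alg_do_value, i, alpha, lam_max, eps=0.1):
    """Scan a geometric grid downward and return the largest grid point lambda
    with alg_do_value(lambda) >= 2^i * lambda / alpha, as in equation (7.9)."""
    lam = lam_max
    while lam > lam_max * 1e-9:
        if alg_do_value(lam) >= (2 ** i) * lam / alpha:
            return lam
        lam /= (1 + eps)
    return 0.0  # lambda = 0 always passes the test, since all profits are then nonnegative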


7.1.2 Obtaining a CorrOrient Solution from AlgDO(λ*_i)

It remains to show that the solution output by the approximation algorithm for DeadlineOrient on the instance I_i(λ*_i) yields a good non-adaptive solution to the original CorrOrient instance. Let σ = AlgDO(λ*_i) be this solution; hence σ is a rooted path that visits some set P ⊆ N_i of nodes within their respective deadlines. The algorithm below selects a subset Q ⊆ P of nodes that we will visit in the non-adaptive tour (this is similar to the algorithm for KnapOrient in Section 3).

Algorithm 3 Algorithm A_i for CorrOrient, given a solution for I_i(λ*_i) characterized by a path P.
1: let y = (∑_{v∈P} s_i(v)) / 2^i.
2: partition the vertices of P into c = max(1, ⌊2y⌋) parts P_1, . . . , P_c such that ∑_{v∈P_j} s_i(v) ≤ 2^i for all 1 ≤ j ≤ c.
3: set Q ← P_k where k = arg max_{j=1,...,c} ∑_{〈v,ℓ〉∈P_j} r_i(v, ℓ).
4: for each v ∈ V, define d_v := min{ℓ : 〈v, ℓ〉 ∈ Q}.
5: let Q := {v ∈ V : d_v < ∞} be the vertices with at least one copy in Q.
6: sample each vertex in Q independently w.p. 1/2, and visit the sampled vertices in the order given by P.

At a high level, the algorithm partitions the vertices in σ into groups, where each group obeys the size budget of 2^i in expectation, and then picks the most profitable group. The main issue with the set Q chosen in Step 3 is that it may include multiple copies of the same vertex. But because of the way we constructed the sets I_v^i (based on Claim 4), we can simply pick the copy corresponding to the earliest deadline, and by discarding all the other copies we lose only a constant fraction of the reward r_i(Q). Our first claim bounds the total (potential) reward of the set Q selected in Step 3.
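The following minimal Python sketch mirrors Algorithm 3, assuming the DeadlineOrient solution is handed to us as an ordered list of visited copies (v, ell), together with dictionaries s_i (truncated expected sizes) and r_i (copy rewards); these names and the simple next-fit split used for Step 2 are illustrative assumptions, not the paper's exact bookkeeping.

import random

def corr_orient_from_deadline_solution(path_copies, s_i, r_i, i):
    """Algorithm 3 (sketch): turn a DeadlineOrient path into a non-adaptive tour."""
    if not path_copies:
        return [], {}
    Z = 2 ** i

    # Step 2: split the path into groups whose truncated sizes total at most 2^i.
    groups, current, load = [], [], 0.0
    for (v, ell) in path_copies:
        if current and load + s_i[v] > Z:
            groups.append(current)
            current, load = [], 0.0
        current.append((v, ell))
        load += s_i[v]
    groups.append(current)

    # Step 3: keep the most profitable group.
    Q = max(groups, key=lambda g: sum(r_i[(v, ell)] for (v, ell) in g))

    # Steps 4-5: keep one copy per vertex, the one with the earliest deadline.
    d = {}
    for (v, ell) in Q:
        d[v] = min(ell, d.get(v, ell))

    # Step 6: sample each vertex independently with probability 1/2 and visit
    # the sampled vertices in the order they appear on the path.
    order = []
    for (v, _) in path_copies:
        if v not in order:
            order.append(v)
    sampled = {v for v in d if random.random() < 0.5}
    return [v for v in order if v in sampled], d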

Claim 5. The sum r_i(Q) = ∑_{〈v,ℓ〉∈Q} r_i(v, ℓ) is at least Opt(i)/(8α).

Proof. Consider the following chain of inequalities:

    r_i(Q) ≥ r_i(P)/c ≥ (λ*_i · y · 2^i + λ*_i · 2^i/α)/c = λ*_i · 2^i · (y + 1/α)/c ≥ λ*_i · 2^i · min{ y + 1/α , 1/2 + 1/(2αy) } ≥ λ*_i · 2^i / α.

The second inequality is due to the fact that 2^i · λ*_i/α ≤ AlgDO(I_i(λ*_i)) = r_i(P) − λ*_i · s_i(P) by the choice of λ*_i in equation (7.9), together with s_i(P) = y · 2^i; the third inequality is by c ≤ max{1, 2y}; and the last inequality uses α ≥ 2. To conclude, we simply use Lemma 7.2.

Next, we show that we do not lose much of the total reward by discarding duplicate copies of vertices in Q.

Claim 6. ∑_{v∈Q} R_{v,d_v+2^i−1} ≥ Opt(i)/(16α).

Proof. For each u ∈ Q, let Q_u = Q ∩ {〈u, ℓ〉 : ℓ ∈ I_u^i} denote all copies of u in Q. By the definition of d_u we have ℓ ≥ d_u for all 〈u, ℓ〉 ∈ Q_u. So for any u ∈ Q,

    ∑_{〈u,ℓ〉∈Q_u} R_{u,ℓ+2^i−1} ≤ ∑_{ℓ∈I_u^i : ℓ≥d_u} R_{u,ℓ+2^i−1} ≤ 2 · R_{u,d_u+2^i−1}.

Above, the last inequality uses the definition of I_u^i as given by Claim 4. Adding over all u ∈ Q,

    ∑_{u∈Q} R_{u,d_u+2^i−1} ≥ (1/2) ∑_{u∈Q} ∑_{〈u,ℓ〉∈Q_u} R_{u,ℓ+2^i−1} = (1/2) ∑_{〈v,ℓ〉∈Q} r_i(v, ℓ) ≥ Opt(i)/(16α).

Here, the last inequality uses Claim 5. This completes the proof.

We now argue that the algorithm A_i reaches any vertex v ∈ Q before time d_v + 2^i − 1 with constant probability.

Claim 7. For any vertex v ∈ Q, it holds that Pr[A_i reaches job v by time d_v + 2^i − 1] ≥ 1/2.


Proof. Because P is a feasible tour for the DeadlineOrient instance, the distance traveled before reaching the copy 〈v, d_v〉 is at most d_v. Therefore, in what remains we show that, with probability at least 1/2, the total size of the previous vertices is at most 2^i − 1. To this end, let Q′ denote the set of vertices sampled in Step 6. We say that the bad event occurs if ∑_{u∈Q′\{v}} min(S_u, 2^i) ≥ 2^i. Indeed, if ∑_{u∈Q′\{v}} min(S_u, 2^i) < 2^i, then we certainly reach v by time d_v + 2^i − 1.

We now bound the probability of the bad event. For this purpose, observe that

    E[ ∑_{u∈Q′\{v}} min(S_u, 2^i) ] ≤ ∑_{u∈Q} (1/2) · E[min(S_u, 2^i)],

by linearity of expectation and the fact that each u is sampled with probability 1/2. Now the latter sum is at most (1/2) ∑_{u∈Q} s_i(u) ≤ 2^{i−1}, by the choice of Q in the partition Step 3. Hence, the probability of the bad event is at most 1/2 by Markov's inequality.

Theorem 7.1. The expected reward of the algorithm A_i is at least Opt(i)/(64α).

Proof. We know from the above claim that, for each vertex v ∈ Q, A_i reaches v by time d_v + 2^i − 1 with probability at least 1/2. In this event, we sample v with probability 1/2. Therefore, the expected reward collected from v is at least (1/4) · R_{v,d_v+2^i−1}. The proof is complete by linearity of expectation together with Claim 6.

7.2 Evidence of hardness for CorrOrient

Our approximation algorithm for CorrOrient can be viewed as a reduction to DeadlineOrient. We now provide a reduction in the reverse direction: namely, a β-approximation algorithm for CorrOrient implies a (β − o(1))-approximation algorithm for DeadlineOrient. In particular, this means that a sub-logarithmic approximation for CorrOrient would also improve the best known approximation ratio for DeadlineOrient.

Consider any instance I of DeadlineOrient on metric (V, d) with root ρ ∈ V and deadlines {t_v}_{v∈V}; the goal is to compute a path originating at ρ that visits the maximum number of vertices before their deadlines. We now define an instance J of CorrOrient on the same metric (V, d) with root ρ. Fix a parameter 0 < p ≪ 1/n². The job at each v ∈ V has the following distribution: size B − t_v (and reward 1/p) with probability p, and size zero (and reward 0) with probability 1 − p. To complete the reduction from DeadlineOrient to CorrOrient we will show that (1 − o(1)) · Opt(I) ≤ Opt(J) ≤ (1 + o(1)) · Opt(I).
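The construction of J is mechanical; a minimal Python sketch is below. The dictionary encoding of the job distributions (a map from (size, reward) pairs to probabilities) is a hypothetical format chosen to match the (size, reward) pairs used earlier in this section, and the default p = 1/n³ is just one choice satisfying p ≪ 1/n².

def corr_orient_instance_from_deadline_orient(deadlines, B, n, p=None):
    """Given DeadlineOrient deadlines t_v on an n-node metric with budget B,
    build the job distributions of the CorrOrient instance J: the job at v has
    size B - t_v and reward 1/p with probability p, else size 0 and reward 0."""
    if p is None:
        p = 1.0 / (n ** 3)  # any 0 < p << 1/n^2 works for the argument
    jobs = {}
    for v, t_v in deadlines.items():
        jobs[v] = {(B - t_v, 1.0 / p): p, (0, 0.0): 1.0 - p}
    return jobs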

Let τ be any solution to I that visits a subset S ⊆ V of vertices within their deadlines; so the objective value of τ is |S|. This also corresponds to a (non-adaptive) solution to J. For any vertex v ∈ S, the probability that zero processing time has been spent prior to v is at least (1 − p)^n; in this case, the start time of job v is at most t_v (recall that τ visits v ∈ S by time t_v) and hence the conditional expected reward from v is p · (1/p) = 1 (since v has size B − t_v and reward 1/p with probability p). It follows that the expected reward of τ as a solution to J is at least ∑_{v∈S} (1 − p)^n ≥ |S| · (1 − np) = (1 − o(1)) · |S|. Choosing τ to be the optimal solution to I, we have (1 − o(1)) · Opt(I) ≤ Opt(J).

Consider now any adaptive strategy σ for J, with expected reward R(σ). Define the path σ_0 as the one that starts from the root of σ and always follows the branch corresponding to a size-zero instantiation. Consider σ_0 as a feasible solution to I; let S_0 ⊆ V denote the vertices on path σ_0 that are visited prior to their respective deadlines. Clearly Opt(I) ≥ |S_0|. When strategy σ is run, every size-zero instantiation gives zero reward, and no (V \ S_0)-vertex visited on σ_0 can fit into the knapsack (with a positive size instantiation); so

(7.10)    Pr[σ gets positive reward] ≤ p · |S_0|.

Moreover, since the reward is always an integral multiple of 1/p,

(7.11)    R(σ) = (1/p) · ∑_{i=1}^{n} Pr[σ gets reward at least i/p] = (1/p) · Pr[σ gets positive reward] + (1/p) · ∑_{i=2}^{n} Pr[σ gets reward at least i/p].

Furthermore, for any i ≥ 2, we have Pr[σ gets reward at least i/p] ≤ Pr[at least i jobs instantiate to positive size] ≤ (n choose i) · p^i ≤ (np)^i. It follows that the second term in (7.11) can be upper bounded by (1/p) · ∑_{i=2}^{n} (np)^i ≤ 2n²p. Combining this with (7.10) and (7.11), we obtain that R(σ) ≤ |S_0| + 2n²p = |S_0| + o(1). Since this holds for any adaptive strategy σ for J, we get Opt(I) ≥ (1 − o(1)) · Opt(J).

8 Stochastic Orienteering with Cancellations

We now describe a black-box way to handle the setting where an optimal strategy can actually abort a job during its processing (say, if it turns out to be longer than expected). We first note that, even for the basic case of Stochastic Knapsack, there are very simple examples demonstrating an arbitrarily large gap in expected reward between strategies that can cancel and those that cannot. The idea is simple: essentially, we create up to B co-located copies of every job v, and imagine that copy 〈v, t〉 has expected reward ∑_{t′=0}^{t} π_{v,t′} · r_{v,t′}, i.e., it fetches reward only when the size instantiates to at most t. It is easy to see that any adaptive optimal solution, when it visits a vertex, in fact just plays some copy of it (which copy might depend on the history of the sample path taken to reach this vertex). Therefore we can find a good deterministic solution with suitably large reward (this is the KnapOrient problem for the uncorrelated case and the DeadlineOrient problem for the correlated case). Now the only issue is when we translate back from the deterministic instance to a solution for the StocOrient instance: the deterministic solution might collect reward from multiple copies of the same job. We can fix this by again using the geometric scaling idea (using Claim 4), i.e., if there are two copies of (roughly) the same reward, we only keep the one with the earlier cancellation time. This way, we can ensure that the rewards of all remaining copies of a particular job are at least geometric factors apart. Now, even if the deterministic solution collects reward from multiple copies of a job, we can simply play the one among them with the best reward, and we will have collected a constant fraction of the total.

References

[1] Micah Adler and Brent Heeringa. Approximating optimal binary decision trees. In APPROX, pages 1–9, 2008.
[2] Nikhil Bansal, Avrim Blum, Shuchi Chawla, and Adam Meyerson. Approximation algorithms for deadline-TSP and vehicle routing with time-windows. In STOC, pages 166–174, 2004.
[3] Anand Bhalgat. A (2 + ε)-approximation algorithm for the stochastic knapsack problem. At http://www.seas.upenn.edu/~bhalgat/2-approx-stochastic.pdf, 2011.
[4] Anand Bhalgat, Ashish Goel, and Sanjeev Khanna. Improved approximation results for stochastic knapsack problems. In SODA, 2011.
[5] Avrim Blum, Shuchi Chawla, David R. Karger, Terran Lane, Adam Meyerson, and Maria Minkoff. Approximation algorithms for orienteering and discounted-reward TSP. SIAM J. Comput., 37(2):653–670, 2007.
[6] Ann Melissa Campbell, Michel Gendreau, and Barrett W. Thomas. The orienteering problem with stochastic travel and service times. Annals OR, 186(1):61–81, 2011.
[7] Chandra Chekuri, Nitish Korula, and Martin Pal. Improved algorithms for orienteering and related problems. In SODA, pages 661–670, 2008.
[8] Chandra Chekuri and Amit Kumar. Maximum coverage problem with group budget constraints and applications. In APPROX-RANDOM, pages 72–83, 2004.
[9] Chandra Chekuri and Martin Pal. A recursive greedy algorithm for walks in directed graphs. In FOCS, pages 245–253, 2005.
[10] Ke Chen and Sariel Har-Peled. The Euclidean orienteering problem revisited. SIAM J. Comput., 38(1):385–397, 2008.
[11] Ning Chen, Nicole Immorlica, Anna R. Karlin, Mohammad Mahdian, and Atri Rudra. Approximating matches made in heaven. In ICALP (1), 2009.
[12] Brian C. Dean, Michel X. Goemans, and Jan Vondrák. Approximating the stochastic knapsack problem: The benefit of adaptivity. Math. Oper. Res., 33(4):945–964, 2008.
[13] David A. Freedman. On tail probabilities for martingales. Annals of Probability, 3:100–118, 1975.
[14] Sudipto Guha and Kamesh Munagala. Approximation algorithms for budgeted learning problems. In STOC, pages 104–113, 2007.
[15] Sudipto Guha and Kamesh Munagala. Multi-armed bandits with metric switching costs. In ICALP, pages 496–507, 2009.
[16] Anupam Gupta, Ravishankar Krishnaswamy, Marco Molinaro, and R. Ravi. Approximation algorithms for correlated knapsacks and non-martingale bandits. To appear in FOCS 2011. arXiv:1102.3749, 2011.
[17] Anupam Gupta, Viswanath Nagarajan, and R. Ravi. Approximation algorithms for optimal decision trees and adaptive TSP problems. In ICALP (1), pages 690–701, 2010.
[18] S. Rao Kosaraju, Teresa M. Przytycka, and Ryan S. Borgstrom. On an optimal split tree problem. In WADS, pages 157–168, 1999.

