

Budgeted Online Influence Maximization

Pierre Perrault 1 2 3 Jennifer Healey 1 Zheng Wen 4 Michal Valko 4 3

Abstract

We introduce a new budgeted framework for online influence maximization, considering the total cost of an advertising campaign instead of the common cardinality constraint on the chosen influencer set. Our approach better models the real-world setting where the cost of influencers varies and advertisers want to find the best value for their overall social advertising budget. We propose an algorithm assuming an independent cascade diffusion model and edge-level semi-bandit feedback, and provide both theoretical and experimental results. Our analysis is also valid for the cardinality-constraint setting and improves the state-of-the-art regret bound in this case.

1. Introduction

Viral marketing through online social networks now represents a significant part of many digital advertising budgets. In this form of marketing, companies incentivize chosen influencers in social networks (e.g., Facebook, Twitter, YouTube) to feature a product in the hope that their followers will adopt the product and repost the recommendation to their own network of followers. The effectiveness of the chosen set of influencers can be measured by the expected number of users that adopt the product due to their initial recommendation, called the spread. Influence maximization (IM, Kempe et al., 2003) is the problem of choosing the optimal set of influencers to maximize the spread under a cardinality constraint on the chosen set.

In order to define the spread, we need to specify a diffusion process such as independent cascade (IC) or linear threshold (LT) (Kempe et al., 2003). The parameters of these models are usually unknown. Different methods exist to estimate the parameters of the diffusion model from historical data (see Section 1.1); however, historical data is often difficult to obtain.

¹Adobe Research, San Jose, CA  ²ENS Paris-Saclay  ³Inria Lille  ⁴DeepMind. Correspondence to: Pierre Perrault <[email protected]>.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

Another possibility is to consider online influence maximization (OIM) (Vaswani et al., 2015; Wen et al., 2017), where an agent actively learns about the network by interacting with it repeatedly, trying to find the best seed influencers. The agent thus faces the dilemma of exploration versus exploitation, allowing us to view the problem as a multi-armed bandit (Auer et al., 2002). More precisely, the agent faces IM over T rounds. At each round, it selects m seeds (based on feedback from prior rounds) and diffusion occurs; it then gains a reward equal to the spread and receives some feedback on the diffusion.

IM and OIM optimize under the constraint of a fixed number of seeds. This reflects a fixed seed-cost model, for example, where influencers are incentivized by being given an identical free product. In reality, however, many influencers demand different levels of compensation. Those with a high out-degree (e.g., number of followers) are usually more expensive. Due to these cost variations, marketers usually wish to optimize their seed set S under a budget constraint c(S) ≤ b rather than a cardinality constraint |S| ≤ m. Optimizing a seed set under a budget has been studied in the offline case by Nguyen and Zheng (2013). In the online case, Wang et al. (2020) considered the relaxed constraint E[c(S)] ≤ b, where the expectation is over the possible randomness of S.¹ We believe, however, that the constraint of a fixed, equal budget c(S) ≤ b at each round does not sufficiently model the willingness to choose a cost-efficient seed set. Indeed, the choice of b is crucial: a b that is too large translates into a waste of budget (some seeds that are too expensive will be chosen), and a b that is too small translates into a waste of time (a whole round is used to influence only a few users). To circumvent this issue, instead of a budget per round, our framework allows the agent to choose seed sets of any cost at each round, under an overall budget constraint (equal to B = bT, for instance). In summary, we incorporate the OIM framework into a budgeted bandit setting. Our setting is more flexible for the agent and better meets real-world needs.

¹This relaxation avoids a computationally costly partial enumeration (Krause and Guestrin, 2005; Khuller et al., 1999).

1.1. Related work on IM

IM can be formally defined as follows. A social network is modeled as a directed graph G = (V, E), with nodes V representing users and edges E representing connections. An underlying diffusion model D governs how information spreads in G. More precisely, D is a probability distribution over subgraphs G′ of G, and given some seed set S, the spread σ(S) is defined as the expected number of S-reachable² nodes in G′ ∼ D. IM aims to find S that is a solution to

max_{|S| = m} σ(S).    (1)

Although IM is NP-hard under standard diffusion models, i.e., IC and LT, σ is a monotone submodular³ function (Fujishige, 2005), and given value-oracle access to σ, the standard GREEDY algorithm solves (1) within a 1 − 1/e approximation factor (Nemhauser et al., 1978). There have been multiple lines of work on IM, including the development of heuristics and approximation algorithms, as well as alternative diffusion models (Leskovec et al., 2007; Goyal et al., 2011; Tang et al., 2014; 2015). Additionally, there are also results on learning D from data in the case it is not known (Saito et al., 2008; Goyal et al., 2010; Gomez-Rodriguez et al., 2012; Netrapalli and Sanghavi, 2012).

²Nodes that are reachable from some node in S.
³f is submodular if f(A ∪ {i}) − f(A) is non-increasing in A.

1.2. Related work on OIM

Prior work in OIM has mainly considered either node-level semi-bandit feedback (Vaswani et al., 2015), where the agent observes all the S-reachable nodes in G′, or edge-level semi-bandit feedback (Wen et al., 2017), where the agent observes the whole S-reachable subgraph (i.e., the subgraph of G′ induced by the S-reachable nodes). Other, weaker feedback settings have also been studied, including: pairwise influence feedback, where all nodes that would be influenced by a seed set are observed but not the edges connecting them, i.e., ({i}-reachable nodes)_{i∈S} is observed (Vaswani et al., 2017); local feedback, where the agent observes a set of out-neighbors of S (Carpentier and Valko, 2016); and immediate neighbor observation, where the agent only observes the out-degree of S (Lugosi et al., 2019).

1.3. Our contributions

In this paper, we define the budgeted OIM paradigm and propose a performance metric for an online policy on this problem using the notion of approximation regret (Chen et al., 2013). To the best of our knowledge, both contributions are new. We then focus our study on the IC model with edge-level semi-bandit feedback. We design a CUCB-style algorithm and prove logarithmic regret bounds. We also propose modifications of this algorithm that improve the regret rates. These gains apply to the non-budgeted setting, giving an improvement over the state-of-the-art analysis of the standard CUCB approach (Wang and Chen, 2017). Our proof incorporates an approximation guarantee of GREEDY for a ratio of submodular and modular functions, which may also be of independent interest.

2. Problem definition

In this section, we formulate the problem of budgeted OIM and give a regret definition for evaluating policies in this setting. We also justify our choice for this notion of regret. We typeset vectors in bold and indicate components with indices, e.g., for some set I, a = (a_i)_{i∈I} ∈ R^I is a vector on I. Let e_i be the ith canonical unit vector of R^I. The incidence vector of any subset A ⊂ I is e_A ≜ Σ_{i∈A} e_i.

We consider a fixed directed network G = (V, E), known to the agent, with V ≜ {1, ..., |V|}. We denote by ij ∈ E the directed edge from node i to node j in G. We assume that G does not have self-loops, i.e., for all ij ∈ E, i ≠ j. For a node i ∈ V, a subset S ⊂ V, and a vector w ∈ {0,1}^E, the predicate S →_w i holds if, in the graph defined by G_w ≜ (V, {ij ∈ E : w_{ij} = 1}), there is a forward path from a node in S to the node i. If it holds, we say that i is influenced by S under w. We define p_i(S; w) ≜ I{S →_w i} and the spread as σ(S; w) ≜ |{i ∈ V : S →_w i}|. Our diffusion process is defined by the random vector W ∈ {0,1}^E, and our cost is defined by the random⁴ vector C ∈ [0,1]^{V∪{0}}, where the added component C_0 represents any fixed costs⁵. Notice that the random costs are neither assumed to be mutually independent nor independent of W. We will see that the components of W might, however, be mutually independent (e.g., for the IC model).

2.1. Budgeted online influence maximization

The agent interacts with the diffusion process across several rounds, using a learning policy. At each round t ≥ 1, the agent first selects a seed set S_t ⊂ V, based on its past observations. Then, the random vectors for both the diffusion process, W_t ∼ P_W, and the costs, C_t ∼ P_C, are sampled independently from previous rounds. The agent then observes some feedback from both the diffusion process and the costs.

We provide in (2) the expected cumulative reward F_B, defined for some total budget B > 0. The goal of the agent is to follow a learning policy π maximizing F_B. In (2), recall that S_t is the seed set selected by π at round t.

⁴Although costs are usually deterministic, we assume randomness for more generality (influencer campaigns may have uncertain surcharges, for example).

⁵We provide a toy example where C_0 models a concrete quantity: assume you want to fill your restaurant. You may pay some seeds and ask them to advertise/influence people. C_0 then represents the cost of the food, the staff, the rent, the taxes, etc.

F_B(π) ≜ E[ Σ_{t=1}^{τ_B−1} σ(S_t; W_t) ].    (2)

τ_B is the random round at which the remaining budget becomes negative: if B_t ≜ B − Σ_{t'≤t} (e^T_{S_{t'}} C_{t'} + C_{0,t'}), then B_{τ_B−1} ≥ 0 and B_{τ_B} < 0. Notice that the quantities B_t and τ_B are standard in budgeted multi-armed bandits (Xia et al., 2016; Ding et al., 2013).
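To make the budget accounting concrete, here is a minimal simulation sketch of this protocol (our illustration; `select_seeds`, `sample_diffusion`, `sample_costs`, and `spread` are hypothetical stand-ins for the policy and the environment):

```python
def run_budgeted_oim(B, select_seeds, sample_diffusion, sample_costs, spread):
    """Play rounds until the remaining budget becomes negative; the round
    tau_B that exhausts the budget is not rewarded, matching (2)."""
    total_spread, t = 0.0, 0
    while True:
        t += 1
        S = select_seeds(t)             # seed set S_t, chosen from past feedback
        W = sample_diffusion()          # diffusion realization W_t ~ P_W
        C, C0 = sample_costs()          # node costs C_t and fixed cost C_{0,t}
        B -= sum(C[i] for i in S) + C0  # pay e_{S_t}^T C_t + C_{0,t}
        if B < 0:                       # budget exhausted: t = tau_B
            return total_spread, t
        total_spread += spread(S, W)    # reward sigma(S_t; W_t)
```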

2.2. Performance metric

We restrict ourselves to efficient policies, i.e., we consider a complexity constraint on the policies the agent can follow: for each round t, the space and time complexity for computing S_t has to be polynomial in |V| and polylogarithmic in t. To evaluate the performance of a learning policy π, we use the notion of approximation regret (Kakade et al., 2009; Streeter and Golovin, 2009; Chen et al., 2016). The agent wants to follow a learning policy π that minimizes

R_{B,ε}(π) ≜ (1 − 1/e − ε) F*_B − F_B(π),

where F*_B is the best possible value of F_B over all policies (thus leveraging the knowledge of P_W and P_C), and where ε > 0 is a parameter the agent can control to trade off computation against accuracy.

Remark 1. This OIM with a total budget B is different from OIM in previous work, such as Wang and Chen (2017), even when we set all costs to be equal. In our setting, there is only one total budget for all rounds, and the policy is free to choose seed sets of different costs in each round, whereas in previous work, each round had a fixed budget for the number/cost of seeds selected. Our setting thus avoids the use of a per-round budget, which is in practice more difficult to establish than a global budget B. Nevertheless, as we will see in Section 6, both types of constraints (global and per round) can be considered simultaneously when the true costs are known.

2.3. Justification for the approximation regret

In the non-budgeted OIM problem with a cardinality constraint given by m ∈ [|V|], recall that the approximation regret

R_{T,ε}(π) ≜ Σ_{t≤T} max_{S⊂V, |S|=m} E[(1 − 1/e − ε) σ(S; W) − σ(S_t; W)]

is standard (Wen et al., 2017; Wang and Chen, 2017). In this notion of regret, the factor (1 − 1/e − ε) (Feige, 1998; Chen et al., 2010) reflects the difficulty of approximating the following NP-hard (Kempe et al., 2003) problem in the case where the distribution P_W is described by IC or LT and is known to the agent:

max_{S⊂V, |S|=m} E[σ(S; W)].    (3)

For our budgeted setting, at first sight, it is not straightforward to know which approximation factor to choose. Indeed, since the random horizon may differ between F_B(π) and F*_B, the expected regret R_{B,ε}(π) is not expressed as the expectation of a sum of approximation gaps, so we cannot directly reduce the regret-level approximability to the gap-level approximability. We thus consider a quantity provably close to R_{B,ε}(π) and easier to handle.

Proposition 1. Define

λ* ≜ max_{S⊂V} E[σ(S; W)] / E[e^T_{S∪{0}} C].

For all S ⊂ V, define the gap corresponding to S as

Δ(S) ≜ (1 − 1/e − ε) λ* E[e^T_{S∪{0}} C] − E[σ(S; W)].

Then, for any policy π selecting S_t at round t,

| R_{B,ε}(π) − E[ Σ_{t=1}^{τ_B−1} Δ(S_t) ] | ≤ 2|V| + 2λ*(1 + |V|).

From Proposition 1, whose proof can be found in Appendix A, R_{B,ε}(π) and E[ Σ_{t=1}^{τ_B−1} Δ(S_t) ] are equivalent in terms of regret upper-bound rate. Therefore, the factor (1 − 1/e − ε) should reflect the approximability of

max_{S⊂V} E[σ(S; W)] / E[e^T_{S∪{0}} C] = max_{S⊂V} f(S)/c(S).    (4)

Considering the specific problem where the cost function is of the form c(S) = c_1|S| + c_0, for some (c_0, c_1) ∈ [0,1]², we can reduce the approximability of (4) to the approximability of the following problem considered in Wang et al. (2020):

max_{S⊂V} E[f(S)]  s.t.  E[|S|] ≤ m,    (5)

for some given integer m, where the expectations are with respect to a randomization in the approximation algorithm. Wang et al. (2020) proved that this problem is NP-hard by reduction from the set cover problem. We show here that an approximation ratio α better than 1 − 1/e yields an approximation for set cover within (1 − δ) log(|V|), δ > 0, which is impossible unless NP ⊂ BPTIME(n^{O(log log(|V|))}) (Feige, 1998). Consider the graph where the collection of out-neighborhoods is exactly the collection of sets in the set cover instance. First, trying out all possible values of m, we concentrate on the case in which the optimal m for set cover is tried out. As in Feige (1998), for k ∈ N*, we repeatedly apply the algorithm that α-approximates (5). It outputs a set S_k (that can be associated with a set of neighborhoods), and after each application the nodes already covered by previous applications are removed from the graph, giving a sequence of objective functions (f_k) with f_1 = f. We thus obtain

E[f_k(S_k) | S_1, ..., S_{k−1}] ≥ α ( |V| − Σ_{k'=1}^{k−1} f_{k'}(S_{k'}) ).

Noticing that E[f(S_1 ∪ ... ∪ S_k)] = Σ_{k'=1}^{k} E[f_{k'}(S_{k'})], we get

E[f(S_1 ∪ ... ∪ S_k)] ≥ ( 1 − (1 − α)^k ) |V|.

After ℓ = ⌊log(1/|V|)/log(1 − α)⌋ < (1 − δ) log(|V|) iterations, we obtain that S = S_1 ∪ ... ∪ S_ℓ is a cover, i.e., f(S) = |V|. The result follows by noticing that, in expectation (and thus with positive probability), we have |S| ≤ ℓm.

3. Algorithm for IC with edge-level semi-bandit feedback

3.1. Setting

For w ∈ [0,1]^E, recall that we can define an IC model by taking P_W = ⊗_{ij∈E} Bernoulli(w_{ij}). We can extend the two previous functions p_i and σ to w taking values in [0,1]^E as follows: let W ∼ ⊗_{ij∈E} Bernoulli(w_{ij}). We define the probability that i is influenced by S under W as p_i(S; w) ≜ P[S →_W i], and we let the spread be σ(S; w) ≜ E[ |{i ∈ V : S →_W i}| ]. Another expression for the spread is σ(S; w) = Σ_{i∈V} p_i(S; w). We fix a weight vector on E, w* ≜ (w*_{ij})_{ij∈E} ∈ [0,1]^E, and a cost vector on V ∪ {0}, c* ≜ (c*_i)_{i∈V∪{0}} ∈ [0,1]^{V∪{0}}, with c*_0 > 0. These quantities are initially unknown to the agent. We assume from now on that

P_W ≜ ⊗_{ij∈E} Bernoulli(w*_{ij}),

and that E[C] = c*. We also define S* ∈ arg max_{S⊂V} σ(S; w*)/e^T_{S∪{0}} c*.
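For concreteness, σ(S; w) under the IC model can be estimated by Monte Carlo: sample a live-edge graph from ⊗_{ij∈E} Bernoulli(w_{ij}) and count the S-reachable nodes by breadth-first search. Below is a minimal sketch (our illustration, not the paper's implementation; edges are sampled lazily during the traversal, which is equivalent since each edge is examined at most once per simulation):

```python
import random
from collections import deque

def mc_spread(S, w, n_mc=1000):
    """Monte Carlo estimate of sigma(S; w) for the IC model.
    w: dict mapping directed edges (i, j) to activation probabilities."""
    out_edges = {}
    for (i, j), p in w.items():
        out_edges.setdefault(i, []).append((j, p))
    total = 0
    for _ in range(n_mc):
        reached, queue = set(S), deque(S)
        while queue:  # BFS over the sampled live-edge graph
            i = queue.popleft()
            for j, p in out_edges.get(i, []):
                if j not in reached and random.random() < p:
                    reached.add(j)  # edge ij is live, so j is influenced
                    queue.append(j)
        total += len(reached)
    return total / n_mc
```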

We assume that the feedback received by the agent at round t is {W_{ij,t} : ij ∈ E, S_t →_{W_t} i}. The agent also receives semi-bandit feedback on the costs, i.e., {C_{i,t} : i ∈ S_t ∪ {0}} is observed.

3.2. Algorithm design

In this subsection, we present BOIM-CUCB, a CUCB-style algorithm for the budgeted OIM problem, as Algorithm 1. As we saw in Proposition 1, the policy that, at each round, (1 − 1/e − ε)-approximately maximizes

S ↦ σ(S; w*) / (e^T_S c* + c*_0)    (6)

has a bounded regret. Thus, BOIM-CUCB is based on this objective. Not only are there estimation concerns due to the unknown parameters w*, c*, but in addition we also need to evaluate/optimize our estimates of (6).

Algorithm 1 BOIM-CUCB

Input: ε > 0, B_0 = B > 0.
for each round t ≥ 1 do
    If true costs are known, then c_t ← c*.
    Compute S_t given by Algorithm 2 with inputs S ↦ σ(S; w̄_t) and c_t.
    Select the seed set S_t and pay e^T_{S_t∪{0}} C_t (i.e., remove this cost from B_{t−1} to get the new budget B_t).
    if B_t ≥ 0 then
        Get the reward σ(S_t; W_t), get the feedback, and update the corresponding quantities accordingly.
    else
        The budget is exhausted: leave the for loop.
    end if
end for

We begin by introducing some notation. We define the empirical means for t ≥ 1 as follows: for all i ∈ V ∪ {0},

ĉ_{i,t−1} ≜ ( Σ_{t'∈[t−1]} I{i ∈ S_{t'} ∪ {0}} C_{i,t'} ) / N_{i,t−1},

and for all ij ∈ E,

ŵ_{ij,t−1} ≜ ( Σ_{t'∈[t−1]} I{S_{t'} →_{W_{t'}} i} W_{ij,t'} ) / N^⊕_{i,t−1},

where N_{i,t−1} ≜ Σ_{t'=1}^{t−1} I{i ∈ S_{t'}} and N^⊕_{i,t−1} ≜ Σ_{t'=1}^{t−1} I{S_{t'} →_{W_{t'}} i}. Using concentration inequalities, we get confidence intervals for the above estimates. We are then able to use an upper-confidence-bound (UCB) strategy (Auer et al., 2002). More precisely, in the case where the costs are unknown, we first build the lower confidence bound (LCB) on c*_i as

c_{i,t} ≜ 0 ∨ ( ĉ_{i,t−1} − √(1.5 log(t)/N_{i,t−1}) ).

We can also define UCBs for w*_{ij}:

w̄_{ij,t} ≜ 1 ∧ ( ŵ_{ij,t−1} + √(1.5 log(t)/N^⊕_{i,t−1}) ).
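As a small illustration (hypothetical helper names, not from the paper), the two indices are straightforward to compute from the counters, with the zero-counter convention described next:

```python
import math

def cost_lcb(c_hat, N, t):
    """LCB on c*_i: 0 when the counter is 0, else 0 v (mean - radius)."""
    if N == 0:
        return 0.0
    return max(0.0, c_hat - math.sqrt(1.5 * math.log(t) / N))

def weight_ucb(w_hat, N_plus, t):
    """UCB on w*_{ij}: 1 when the counter N^+_i is 0, else 1 ^ (mean + radius)."""
    if N_plus == 0:
        return 1.0
    return min(1.0, w_hat + math.sqrt(1.5 * math.log(t) / N_plus))
```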

We use w̄_{ij,t} = 1 (and c_{i,t} = 0) when the corresponding counter is equal to 0. Our BOIM-CUCB approach chooses at each round t the seed set S_t given by Algorithm 2, which, as we shall see, approximately maximizes S ↦ σ(S; w̄_t)/(e^T_{S∪{0}} c_t). Indeed, with high probability, this set function is an upper bound on the true ratio (6) (using that σ is non-decreasing w.r.t. w). Notice that this approach is also followed by Wang and Chen (2017) for the non-budgeted setting, i.e., they choose S_t, |S_t| ≤ m, that approximately maximizes S ↦ σ(S; w̄_t). To complete the description of our algorithm, we need to describe Algorithm 2. This is the purpose of the following subsection.

3.3. Greedy for ratio maximization

In BOIM-CUCB, one has to approximately maximize the ratio S ↦ σ(S; w̄_t)/e^T_{S∪{0}} c_t, that is, a ratio of a submodular function over a modular one. A GREEDY technique can be used (see Algorithm 2). Indeed, instead of maximizing the marginal contribution at each time step, as the standard GREEDY algorithm does, the approach is to maximize the so-called bang-per-buck, i.e., the marginal contribution divided by the marginal cost. This builds a sequence of increasing subsets, and the final output is the one that maximizes the ratio. We prove in Appendix D the following Proposition 2, giving an approximation factor of 1 − 1/e for Algorithm 2.

Algorithm 2 GREEDY for ratio, lazy implementation

Input: σ, an increasing submodular function; c ∈ [0,1]^{V∪{0}}.
S_0 ← ∅.
ρ ← [(∞, i)]_{i∈V}.
for k ∈ [|V|] do
    checked ← S_{k−1}.
    (∗) Remove the first element ρ[0] = (∼, i) from ρ.
    if i ∉ checked then
        Insert ((σ({i} ∪ S_{k−1}) − σ(S_{k−1}))/c_i, i) into ρ, such that ρ[:][0] is sorted in decreasing order.
        Add i to checked and go back to (∗).
    else
        S_k ← S_{k−1} ∪ {i}.
    end if
end for
k' ← arg max_{k∈{0,...,|V|}} σ(S_k)/e^T_{S_k∪{0}} c.
Output: S_{k'}.

Proposition 2. Algorithm 2 with inputs σ, c is guaranteed to obtain a solution S such that

(1 − e^{−1}) σ(S*)/e^T_{S*∪{0}} c ≤ σ(S)/e^T_{S∪{0}} c.

Notice that a result similar to Proposition 2 is stated in Theorem 3.2 of Bai et al. (2016). However, their proof does not hold in our case, since their inequality (16) would be true only for a normalized cost (i.e., c_0 = 0). Actually, c_0 = 0 implies that S* is a singleton, by subadditivity of σ.

For more efficiency, we use a greedy algorithm with lazy evaluations (Minoux, 1978; Leskovec et al., 2007), leveraging the submodularity of σ. More precisely, in Algorithm 2, instead of taking the arg max in the step

S_k ← S_{k−1} ∪ { arg max_{i∈V∖S_{k−1}} (σ({i} ∪ S_{k−1}) − σ(S_{k−1}))/c_i },

we maintain upper bounds ρ (initially ∞) on the marginal gains, sorted in decreasing order. At each iteration k, we evaluate the element at the top of the list, say i, and update its upper bound with the marginal gain at S_{k−1}. If after the update the upper bound is greater than all the others, submodularity guarantees that i is the element with the largest marginal gain.
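A minimal Python sketch of this lazy bang-per-buck rule follows (our illustration under the paper's assumptions that σ is monotone submodular and all costs are positive; `sigma` is any spread oracle, e.g., a Monte Carlo estimator as above):

```python
import heapq

def lazy_greedy_ratio(V, sigma, c, c0):
    """Build S_0 subset S_1 subset ... by greedily maximizing the marginal
    gain per unit cost, refreshing stale bounds lazily; return the prefix
    S_k maximizing sigma(S_k) / (c(S_k) + c0)."""
    S, cost, val = [], c0, sigma(set())
    best_S, best_ratio = [], val / c0
    heap = [(-float('inf'), i) for i in V]  # (-upper bound on gain/c_i, node)
    heapq.heapify(heap)
    while heap:
        _, i = heapq.heappop(heap)
        gain = (sigma(set(S) | {i}) - val) / c[i]  # refresh the stale bound
        if heap and gain < -heap[0][0]:
            heapq.heappush(heap, (-gain, i))  # not provably maximal: re-insert
            continue
        # by submodularity, i now has the largest marginal gain per cost
        S.append(i)
        cost += c[i]
        val = sigma(set(S))
        if val / cost > best_ratio:
            best_S, best_ratio = list(S), val / cost
    return best_S
```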

Algorithm 2 (and its approximation factor) cannot be used directly in the OIM context, since computing the exact spread σ is #P-hard (Chen et al., 2010). However, with Monte Carlo (MC) simulations, it can efficiently reach an arbitrarily close ratio of α = 1 − 1/e − ε, with high probability 1 − 1/(t log²(t)) (Kempe et al., 2003).

3.4. An alternative to Lazy Greedy: Ratio maximization from sketches

In the previous approach, MC can still be computationally costly, since the marginal contributions have to be re-evaluated each time, using directed reachability computations in each MC instance. There exist efficient alternatives for α-approximately maximizing the spread under a cardinality constraint, such as TIM from Tang et al. (2014) and SKIM from Cohen et al. (2014). Adapting TIM to our ratio maximization context is not straightforward, since it requires knowing the seed set size in advance, which is not the case in Algorithm 2. SKIM is more promising, since it uses the standard GREEDY in a sketch space. We provide an adaptation of SKIM for approximately maximizing the ratio (see Appendix G). It uses the bottom-k min-hash sketches of Cohen et al. (2014), with a threshold for the length of the sketches that depends on both k = ⌊ε⁻² log(1/δ)⌋ and the cost, where δ is an upper bound on the probability that the relative error is larger than ε. Exactly as Cohen et al. (2014) proved the approximation ratio for SKIM, this approach reaches a factor of 1 − 1/e − ε with high probability. More precisely, at round t ≥ 2, we can actually choose to have the approximation with δ = 1/(t log²(t)), only adding a O(log(t)) factor to the computational complexity of Algorithm 2 (Cohen et al., 2014), thus remaining efficient.⁶

⁶Wang and Chen (2017) use the notion of approximation regret with a certain fixed probability. Here, we rather fix this probability to 1 and allow for a O(log(t)) factor in the running time.

3.5. Regret bound for Algorithm 1

We provide a gap-dependent upper bound on the regret of BOIM-CUCB in Theorem 1. For this, we define, for i ∈ V, the gap

Δ_{i,min} ≜ min_{S⊂V, p_i(S;w*)>0, Δ(S)>0} Δ(S).

We also define, with d_k being the out-degree of node k,

p_{i,max} ≜ max_{S⊂V, p_i(S;w*)>0} Σ_{k∈V} d_k p_k(S; w*).

Theorem 1. If π is the policy described in Algorithm 1, then

R_{B,ε}(π) = O( log B Σ_{i∈V} (|V| λ* + d_i p_{i,max} |V|) / Δ_{i,min} ).

In addition, if the true costs are known, then

R_{B,ε}(π) = O( log B Σ_{i∈V} d_i p_{i,max} |V|² / Δ_{i,min} ).

A proof of Theorem 1 can be found in Appendix B. Notice that the analysis can easily be used for the non-budgeted setting. In this case, it reduces to the state-of-the-art analysis of Wang and Chen (2017), except that we slightly simplify and improve the analysis to replace the factor max_{S⊂V} Σ_{k∈V} d_k I{p_k(S; w*) > 0} by a potentially much lower quantity, p_{i,max}. In case this last quantity is still large, we can further improve it by considering slight modifications of the original Algorithm 1. This is the purpose of the next section.

4. More refined optimistic spreads

We observe that the factor p_{i,max} in Theorem 1 can be as large as |E| in the worst case. In other words, if Δ = min_i Δ_{i,min}, the rate can be as large as

O( log B (λ*|V|² + |E|²|V|²)/Δ ).

We argue here that we can replace |E|²|V|² by |E||V|³ log²(|V|). Indeed, leveraging the mutual independence of the random variables W_{ij}, we can hope to get a tighter confidence region for w*, and thus a provably tighter regret upper bound (Magureanu et al., 2014; Combes et al., 2015; Degenne and Perchet, 2016). We consider the following confidence region from Degenne and Perchet (2016) (see also Perrault et al. (2019a)), adapted to our setting.

Fact 1 (Confidence ellipsoid for weights). For all t ≥ 2, with probability at least 1 − 1/(t log²(t)),

Σ_{ij∈E} N^⊕_{i,t−1} (w*_{ij} − ŵ_{ij,t−1})² ≤ δ(t),

where δ(t) ≜ 2 log(t) + 2(|E| + 2) log log(t) + 1.

For OIM (both budgeted and non-budgeted), there is a large potential gain in the analysis from using the confidence region given by Fact 1 compared to simply using a Hoeffding-based one, as in BOIM-CUCB. More precisely, for classical combinatorial semi-bandits, Degenne and Perchet (2016) reduced the gap-dependent regret upper bound by a factor ℓ/log²(ℓ), where in our case ℓ can be as large as |E|. However, there is also a practical drawback to such a confidence region: computing the optimistic spread might be inefficient, even if an oracle for evaluating the spread is available. Indeed, for a fixed S ⊂ V, the problem of maximizing w ↦ σ(S; w) over w belonging to some ellipsoid might be hard, since the objective is not necessarily concave. We can overcome this issue using the following Fact 2 (Wen et al., 2017; Wang and Chen, 2017).

Fact 2 (Smoothness property of the spread). For all S ⊂ V and all w, w′ ∈ [0,1]^E,

∀k ∈ V, |p_k(S; w) − p_k(S; w′)| ≤ Σ_{ij∈E} p_i(S; w) |w_{ij} − w′_{ij}|.

In particular,

|σ(S; w) − σ(S; w′)| ≤ |V| Σ_{ij∈E} p_i(S; w) |w_{ij} − w′_{ij}|.

For S ⊂ V and w ∈ R^E, we define the confidence "bonus" as follows:

Bonus(S; w) ≜ |V| √( δ(t) Σ_{i∈V, N^⊕_{i,t−1}>0} d_i p_i(S; w)² / N^⊕_{i,t−1} ).

Notice that we do not sum over vertices with a zero counter. We compensate for this by using the convention ŵ_{ij,t−1} = 1 when N^⊕_{i,t−1} = 0. We can successively use Fact 2, the Cauchy-Schwarz inequality, and Fact 1 to get, with probability at least 1 − 1/(t log²(t)),

σ(S; w*) ≤ σ(S; ŵ_{t−1}) + Bonus(S; ŵ_{t−1}).    (7)

In the same way, with probability at least 1 − 1/(t log²(t)), we also have

σ(S; w*) ≤ σ(S; ŵ_{t−1}) + Bonus(S; w*).    (8)

Contrary to (7), this "optimistic spread" cannot be used directly by the agent, since w* is not known.
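Explicitly, the chain behind (7) is as follows (a sketch of the three steps, writing ŵ for ŵ_{t−1} and p_i for p_i(S; ŵ)):

σ(S; w*) − σ(S; ŵ) ≤ |V| Σ_{ij∈E} p_i |w*_{ij} − ŵ_{ij}|    (Fact 2)
  ≤ |V| √( Σ_{ij∈E} p_i²/N^⊕_{i,t−1} ) √( Σ_{ij∈E} N^⊕_{i,t−1} (w*_{ij} − ŵ_{ij})² )    (Cauchy-Schwarz)
  ≤ |V| √( δ(t) Σ_{i∈V} d_i p_i²/N^⊕_{i,t−1} ) = Bonus(S; ŵ),    (Fact 1)

where the last step groups the d_i out-edges of each node i.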

Although the optimistic spread defined in (7) is now much easier to compute, a major drawback remains: as a function of S ⊂ V, Bonus(S; ŵ_{t−1}) is not necessarily submodular, so the optimistic spread is itself no longer submodular. This is an issue because submodularity is a crucial property for reaching the approximation ratio 1 − 1/e − ε. We propose here several submodular upper bounds on Bonus, defined for S ⊂ V and w ∈ R^E:

• Bonus1 is actually modular, and simply uses the subadditivity (w.r.t. S) of Bonus:

Bonus1(S; w) ≜ |V| Σ_{j∈S} √( δ(t) Σ_{i, N^⊕_{i,t−1}>0} d_i p_i({j}; w)² / N^⊕_{i,t−1} ).

• Bonus2 uses the subadditivity of the square root:

Bonus2(S; w) ≜ |V| Σ_{i, N^⊕_{i,t−1}>0} p_i(S; w) √( δ(t) d_i / N^⊕_{i,t−1} ).

• Bonus3 uses p_i(S; w)² ≤ p_i(S; w), and is submodular as the composition of a non-decreasing concave function (the square root) with a monotone submodular function:

Bonus3(S; w) ≜ |V| √( δ(t) Σ_{i, N^⊕_{i,t−1}>0} d_i p_i(S; w) / N^⊕_{i,t−1} ).

• Bonus4 uses Jensen's inequality, and is submodular as the expectation of the square root of a submodular function:

Bonus4(S; w) ≜ E[ |V| √( Σ_{i∈V, N^⊕_{i,t−1}>0, S →_W i} δ(t) d_i / N^⊕_{i,t−1} ) ],

where W ∼ ⊗_{ij∈E} Bernoulli(w_{ij}).

We can state the following approximation guarantees for the first two bonuses:

Bonus(S; w) ≤ Bonus1(S; w) ≤ |S| Bonus(S; w),    (9)
Bonus(S; w) ≤ Bonus2(S; w) ≤ √|V| Bonus(S; w).

Notice that another approach to get a submodular bonus is to approximate p_i(S; ŵ_{t−1}) by the square root of a modular function (Goemans et al., 2009). However, not only would this bonus be much more computationally costly to build than ours, but we would also only get a √(|V| log|V|) approximation factor, which is worse than the one for our Bonus2. Since increasing the bonus by a factor α ≥ 1 increases the gap-dependent regret upper bound by a factor α², we only lose a factor |V| with Bonus2 compared to the use of Bonus, which is still better than the CUCB approach. Bonus1 can also be interesting to use when we have an upper-bound guarantee on the cardinality of the seed sets used (see subsection 5.1). An approximation factor for Bonus3 or Bonus4 does not seem interesting, because it would involve the inverse of triggering probabilities. We can, however, further upper bound Bonus3(S; w*) as follows:

Bonus3(S; w*) = |V| √( δ(t) Σ_{i, N^⊕_{i,t−1}>0} d_i p_i(S; w*) / N^⊕_{i,t−1} )
  ≤ |V| √( Σ_{j∈S} Σ_{i∈V, N^⊕_{i,t−1}>0} d_i δ(t) p_i({j}; w*) / N^⊕_{i,t−1} )
  ≤ |V| √( δ(t) Σ_{j∈S} |E| (8/N_{j,t−1} ∧ 1) ),

where the last inequality only holds under some high proba-bility event, given by the following Proposition 3, involvingcounters on the costs and counters on the weights.

Proposition 3. Consider the event P_t ≜ {∀i ∈ V, N^⊕_{i,t−1} ≥ δ(t)}. Then, for all i, j ∈ V,

P[ P_t and δ(t) p_i({j}; w*)/N^⊕_{i,t−1} > 8 δ(t)/N_{j,t−1} ] ≤ 1/t².

We thus define, for S ⊂ V,

Bonus5(S) ≜ |V| √( δ(t) Σ_{j∈S} |E| (8/N_{j,t−1} ∧ 1) ).

This bonus is much more convenient, since it no longer depends on w* and can thus be computed by the agent. Indeed, although the first four bonuses are likely to be tighter than this last submodular Bonus5, their dependence on w forces us to use them with w = ŵ_{t−1}. Even if this does not pose any problem in practice, it is more difficult to handle in theory, since it would involve optimistic estimates of p_i(·; ŵ_{t−1}) itself (see the next section for further details). Actually, we will see that the analysis with Bonus5 is slightly better than the one we would get with Bonus2, since we lose only a factor |V| log²(|V|)/log²(|E|) compared to the use of Bonus. In addition, it allows a much better constant term in the regret upper bound, thanks to the suppression of the dependence on w*.
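A short sketch (our illustration) of how an agent could compute Bonus5 from the observed counters alone, assuming t ≥ 2 so that δ(t) is well defined:

```python
import math

def bonus5(S, N, n_nodes, n_edges, t):
    """Bonus5(S) = |V| sqrt(delta(t) sum_{j in S} |E| (8/N_{j,t-1} ^ 1)).
    N: dict of cost counters N_{j,t-1}; a zero counter contributes the cap 1."""
    delta_t = 2 * math.log(t) + 2 * (n_edges + 2) * math.log(math.log(t)) + 1
    inner = sum(n_edges * (min(8.0 / N[j], 1.0) if N.get(j, 0) > 0 else 1.0)
                for j in S)
    return n_nodes * math.sqrt(delta_t * inner)
```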

Although we can improve the analysis based on Facts 1 and 2, the inequality in this last fact may be quite loose in practice (we confirm this in Section 7). In that case, we suffer from this looseness, since we actually use Fact 2 to design our bonuses. In contrast, BOIM-CUCB only uses it in the analysis, and can therefore adapt to a better smoothness inequality. Thus, we consider BOIM-CUCB5, where we first compute S_t using the BOIM-CUCB approach and accept it only if

σ(S_t; w̄_t) ≤ σ(S_t; ŵ_{t−1}) + Bonus5(S_t);    (10)

otherwise, we choose S_t maximizing σ(S; ŵ_{t−1}) + Bonus5(S). For technical reasons (due to Proposition 3), we replace S_t by S_t ∪ {j} if there exists a j ∈ V such that

N^⊕_{j,t−1} < δ(t).    (11)

We thus enjoy both the theoretical advantages of Bonus5 and the practical advantages of BOIM-CUCB. We give the following regret bound for this approach.

Theorem 2. If π is the policy following BOIM-CUCB5, then

R_{B,ε}(π) = O( log B ( Σ_{i∈V} (|V| λ* + |V||E| log²|V|) / Δ_{i,min} + λ*|V|² ) ).

A proof of Theorem 2 can be found in Appendix E. Such an analysis also holds in the non-budgeted setting; maximizing the spread only, instead of the ratio, we can build a policy π satisfying the following (with the standard definition of the non-budgeted gaps):

R_{T,ε}(π) = O( log T Σ_{i∈V} |V|²|E| log²|V| / Δ_{i,min} ).

The regret rate is thus better than the one of Wang and Chen (2017), gaining a factor |E|/(|V| log²(|V|)).

5. Improvements using Bonus1 and Bonus4

In this section, we show that the use of Bonus1 and Bonus4 leads to a better regret leading term, at the cost of a larger second-order term. In the following, we propose BOIM-CUCB1 (resp. BOIM-CUCB4), which is the same approach as BOIM-CUCB5 with Bonus1(·; ŵ_{t−1}) (resp. Bonus4(·; ŵ_{t−1})) instead of Bonus5, and where condition (11) is replaced by

∃j ∈ V, N^⊕_{j,t−1} ≤ |E| δ(t).

5.1. Bonus1 for low cardinality seed sets

In many real-world scenarios, the maximal cardinality of a seed set is small compared to |V|. Indeed, in the non-budgeted setting, it is limited by m, and it is usually assumed that m is much smaller than |V|. In the budgeted setting, we will see in Section 6 how to limit the cost of the chosen seeds, and this is likely to also induce a limit on the cardinality of seeds. Using Bonus1 is more appropriate in this situation, according to the approximation factor (9). We state in Theorem 3 the regret bound for BOIM-CUCB1.

Theorem 3. If π is the policy BOIM-CUCB1, and if all selected seed sets have cardinality bounded by m, then we have

R_{B,ε}(π) = O( log B ( Σ_{i∈V} (m λ* + m |V|² d_i log²(|E|)) / Δ_{i,min} + λ*|V|²|E| ) ).

A proof can be found in Appendix F. As previously, we can state the following non-budgeted version, with the seed set cardinality constrained by m:

R_{T,ε}(π) = O( log T ( Σ_{i∈V} m²|V|² d_i log²(|E|) / Δ_{i,min} + |E||V|² ) ).

Notice that, for both settings, there is an improvement in the main (gap-dependent) term in the case m ≤ √|V|. However, a larger gap-independent term also appears.

5.2. Bonus4: the same performance as Bonus?

We show here that the regret with Bonus4 is of the same order as what we would have had with Bonus (which is not submodular). However, Bonus4 does not have the computational guarantees of the other bonuses. We state in Theorem 4 the regret bound for the policy BOIM-CUCB4. Notice that we obtain a bound whose leading term improves on that of BOIM-CUCB by a factor |E|/log²|E|.

Theorem 4. If π is the policy BOIM-CUCB4, then we have

R_{B,ε}(π) = O( log B ( Σ_{i∈V} (|V| λ* + |V|² d_i log²(|E|)) / Δ_{i,min} + λ*|V|²|E| ) ).

The proof is given in Appendix J. As previously, we can state the following non-budgeted version. Notice that the cardinality constraint does not appear in the bound:

R_{T,ε}(π) = O( log T ( Σ_{i∈V} |V|² d_i log²(|E|) / Δ_{i,min} + |E||V|² ) ).

In spite of the superiority, in terms of regret, of using Bonus4, we must point out that, in the worst case, the computation of this bonus may require a number of samples (and thus a time complexity) polynomial in t, which does not meet the efficiency criterion we set ourselves at the beginning of the paper.

6. Knapsack constraint for known costs

In their setting, Wang et al. (2020) considered the relaxed constraint

E[e^T_{S∪{0}} c*] ≤ b,    (12)

instead of ratio maximization, where the expectation is over the possible randomness of S. When the true costs are known to the agent, we can actually combine the two settings: a seed set S can be chosen only if it satisfies (12). In this section, we describe the modifications this new setting implies. First of all, the regret definition is impacted: F*_B is now maximal over policies respecting the constraint (12) within each round. Naturally, the definitions of λ* and S* are also modified accordingly. Otherwise, apart from Algorithm 2, there is conceptually no change to the approaches described in this paper. We now describe the modification needed to make Algorithm 2 work in this setting. The same sequence of sets S_k is considered, but instead of choosing the set that maximizes the ratio over all k ∈ {0, ..., |V|}, we restrict the maximization to k ∈ {0, ..., j}, where j is the first index such that e^T_{S_j∪{0}} c* > b. If this maximizer is not S_j, then it satisfies the constraint and is output. Else, we output S_j with probability (b − e^T_{S_{j−1}∪{0}} c*)/c*_j and S_{j−1} with probability 1 − (b − e^T_{S_{j−1}∪{0}} c*)/c*_j. This way, the expected cost of the output is b. We prove in Appendix K the following Proposition 4, giving an approximation factor of 1 − 1/e for the above modification of Algorithm 2.

Proposition 4. The solution S obtained by the modified Algorithm 2 satisfies

(1 − e^{−1}) E[σ(S*)]/E[e^T_{S*∪{0}} c*] ≤ E[σ(S)]/E[e^T_{S∪{0}} c*],

where the expectations are over the possible randomness of S and S*.
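A sketch (our illustration) of this randomized rounding over the greedy prefixes S_0 ⊂ S_1 ⊂ ..., where `costs[k]` stands for e^T_{S_k∪{0}} c* and we assume c_0 ≤ b so that S_0 is feasible:

```python
import random

def constrained_ratio_output(prefixes, costs, sigma_vals, b):
    """prefixes[k] = S_k, costs[k] = c(S_k) + c_0, sigma_vals[k] = sigma(S_k).
    Restrict the ratio maximization to k <= j, where S_j is the first prefix
    whose cost exceeds b; randomize between S_{j-1} and S_j when needed, so
    that the expected cost of the output is b."""
    j = next((k for k, ck in enumerate(costs) if ck > b), None)
    if j is None:  # no prefix overshoots b: plain ratio maximization
        best = max(range(len(prefixes)), key=lambda k: sigma_vals[k] / costs[k])
        return prefixes[best]
    best = max(range(j + 1), key=lambda k: sigma_vals[k] / costs[k])
    if best < j:  # the maximizer already satisfies the constraint
        return prefixes[best]
    p = (b - costs[j - 1]) / (costs[j] - costs[j - 1])  # P(output S_j)
    return prefixes[j] if random.random() < p else prefixes[j - 1]
```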


7. Experiments

Figure 1. Regret curves with respect to the budget B (expectation computed by averaging over 10 independent simulations).

In this section, we present an experiment for budgeted OIM. In Figure 1, we plot E[ Σ_{t=1}^{τ_B−1} Δ(S_t) ] with respect to the budget B used, running over up to T = 10000 rounds. This quantity is a good approximation to the true regret, according to Proposition 1. Plotting the true regret would require computing F*_B, which is NP-hard. We consider a subgraph of the Facebook network (Leskovec and Krevl, 2014), with |V| = 333 and |E| = 5038, as in Wen et al. (2017). We take w* ∼ U(0, 0.1)^{⊗E} and take deterministic, known costs with c*_0 = 1 and c*_i = d_i/max_{j∈V} d_j. BOIM-CUCB+ is the same approach as BOIM-CUCB5 with Bonus(·; ŵ_{t−1}) instead of Bonus5, ignoring that Bonus(·; ŵ_{t−1}) is not submodular (it is only subadditive).

We observed that in BOIM-CUCB1, BOIM-CUCB4, and BOIM-CUCB5, condition (10) (with the corresponding bonus instead of Bonus5) always holds, meaning that those algorithms coincide with BOIM-CUCB in practice, and that the gain only appears through the analysis. We thus plot a single curve for these four algorithms in Figure 1. On the other hand, we observe only a slight gain of BOIM-CUCB+ compared to BOIM-CUCB.

Our experiments confirm that Fact 2 is loose in practice, as we anticipated. Indeed, our submodular bonuses are not tight enough to compete with BOIM-CUCB, although we gain in the analysis. The slight gain that we have for BOIM-CUCB+ suggests that the issue is not only the tightness of a submodular upper bound, but rather the tightness of Fact 2 itself. This is supported by the following observation we made: for the Facebook subnetwork, over 1000 random draws of seed-set and weight-vector pairs in [0, 0.1]^E, the ratio of the RHS to the LHS in Fact 2 was each time greater than 0.4|V|.

In Appendix I, we conducted further experiments on a synthetic graph, comparing BOIM-CUCB to BOIM-CUCB-REGULARIZED, which greedily maximizes the regularized spread S ↦ σ(S; w̄_t) − λ e^T_S c_t, where λ is a parameter to set. We observed that, for an appropriate choice of λ, a performance similar to BOIM-CUCB can be obtained.

8. Discussion and future work

We introduced a new budgeted OIM problem, taking both the costs of influencers and fixed costs into account in the seed selection, instead of the usual cardinality constraint. This better represents the current challenges in viral marketing, since top influencers tend to be more and more costly. Our fixed cost can also be seen as the time that a round takes: a null fixed cost would mean that reloading the network to get a new independent instance is free and instantaneous.⁷ Obviously, this is not realistic. We also provided an algorithm for budgeted OIM under the IC model in the edge-level semi-bandit feedback setting.

Interesting future directions of research would be to explore other kinds of feedback or diffusion models for budgeted OIM. For practical scalability, it would also be good to investigate the incorporation of the linear generalization framework (Wen et al., 2017) into budgeted OIM. Notice that this extension is not straightforward if we want to keep our tighter confidence region. More precisely, we believe that a linear semi-bandit approach that is aware of the independence between edge observations should be developed (the linear generalization approach of Wen et al. (2017) treats edge observations as arbitrarily correlated).

In addition to this, exploring how the use of Fact 2 in the algorithm might be avoided, while still using the confidence region given by Fact 1, would surely improve the algorithms. One possible way would be to use a Thompson sampling (TS) approach (Wang and Chen, 2018; Perrault et al., 2020a), where the prior takes into account the mutual independence of the weights. However, Wang and Chen (2018) proved in their Theorem 2 that TS gives linear approximation regret for some special approximation algorithms. Thus, we would have to use some specific property of the GREEDY approximation algorithm we use.

⁷In this case, using |S| rounds, each time choosing a single different influencer i ∈ S, is better than choosing the whole set S in a single round.


Acknowledgements

The research presented was supported by the European CHIST-ERA project DELTA, the French Ministry of Higher Education and Research, the Nord-Pas-de-Calais Regional Council, and the French National Research Agency project BOLD (ANR19-CE23-0026-04).

References

Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256.

Bai, W., Iyer, R., Wei, K., and Bilmes, J. (2016). Algorithms for optimizing the ratio of submodular functions. In International Conference on Machine Learning, pages 2751–2759.

Carpentier, A. and Valko, M. (2016). Revealing graph bandits for maximizing local influence. In International Conference on Artificial Intelligence and Statistics.

Chen, W., Wang, C., and Wang, Y. (2010). Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Knowledge Discovery and Data Mining.

Chen, W., Wang, Y., and Yuan, Y. (2013). Combinatorial multi-armed bandit: General framework and applications. In Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 151–159, Atlanta, Georgia, USA. PMLR.

Chen, W., Wang, Y., and Yuan, Y. (2016). Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. Journal of Machine Learning Research, 17.

Cohen, E. (1997). Size-estimation framework with applications to transitive closure and reachability. Journal of Computer and System Sciences, 55(3):441–453.

Cohen, E. (2016). Minhash sketches.

Cohen, E., Delling, D., Pajor, T., and Werneck, R. F. (2014). Sketch-based influence maximization and computation: Scaling up with guarantees. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pages 629–638. ACM.

Cohen, E. and Kaplan, H. (2007). Summarizing data using bottom-k sketches. In Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing, pages 225–234. ACM.

Combes, R., Shahi, M. S. T. M., Proutiere, A., and others (2015). Combinatorial bandits revisited. In Advances in Neural Information Processing Systems, pages 2116–2124.

Degenne, R. and Perchet, V. (2016). Combinatorial semi-bandit with known covariance. In Advances in Neural Information Processing Systems 29, pages 2972–2980. Curran Associates, Inc.

Ding, W., Qin, T., Zhang, X.-D., and Liu, T.-Y. (2013). Multi-armed bandit with budget constraint and variable costs.

Feige, U. (1998). A threshold of ln n for approximating set cover. Journal of the ACM (JACM), 45(4):634–652.

Fujishige, S. (2005). Submodular functions and optimization. Annals of Discrete Mathematics.

Goemans, M. X., Harvey, N. J., Iwata, S., and Mirrokni, V. (2009). Approximating submodular functions everywhere. In Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms, pages 535–544. Society for Industrial and Applied Mathematics.

Gomez-Rodriguez, M., Leskovec, J., and Krause, A. (2012). Inferring networks of diffusion and influence. ACM Trans. Knowl. Discov. Data, 5(4):21:1–21:37.

Goyal, A., Bonchi, F., and Lakshmanan, L. V. (2010). Learning influence probabilities in social networks. In Proceedings of the Third ACM International Conference on Web Search and Data Mining, WSDM '10, pages 241–250, New York, NY, USA. ACM.

Goyal, A., Lu, W., and Lakshmanan, L. V. S. (2011). SimPath: An efficient algorithm for influence maximization under the linear threshold model. In 2011 IEEE 11th International Conference on Data Mining, pages 211–220.

Kakade, S. M., Kalai, A. T., and Ligett, K. (2009). Playing games with approximation algorithms. SIAM Journal on Computing, 39(3):1088–1106.

Kempe, D., Kleinberg, J., and Tardos, E. (2003). Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137–146. ACM.

Khuller, S., Moss, A., and Naor, J. S. (1999). The budgeted maximum coverage problem. Information Processing Letters, 70(1):39–45.

Krause, A. and Guestrin, C. (2005). A note on the budgeted maximization of submodular functions. Technical report, CMU.

Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., and Glance, N. (2007). Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '07, pages 420–429, New York, NY, USA. ACM.

Leskovec, J. and Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.

Lugosi, G., Neu, G., and Olkhovskaya, J. (2019). Online influence maximization with local observations. In Proceedings of the 30th International Conference on Algorithmic Learning Theory, volume 98 of Proceedings of Machine Learning Research, pages 557–580, Chicago, Illinois. PMLR.

Magureanu, S., Combes, R., and Proutiere, A. (2014). Lipschitz bandits: Regret lower bound and optimal algorithms. In Proceedings of The 27th Conference on Learning Theory, volume 35 of Proceedings of Machine Learning Research, pages 975–999, Barcelona, Spain. PMLR.

Minoux, M. (1978). Accelerated greedy algorithms for maximizing submodular set functions. Optimization Techniques, pages 234–243.

Mitzenmacher, M. and Upfal, E. (2017). Probability and computing: randomization and probabilistic techniques in algorithms and data analysis. Cambridge University Press.

Nemhauser, G. L., Wolsey, L. A., and Fisher, M. L. (1978). An analysis of approximations for maximizing submodular set functions-I. Mathematical Programming, 14(1):265–294.

Netrapalli, P. and Sanghavi, S. (2012). Learning the graph of epidemic cascades. SIGMETRICS Perform. Eval. Rev., 40(1):211–222.

Nguyen, H. and Zheng, R. (2013). On budgeted influence maximization in social networks. IEEE Journal on Selected Areas in Communications, 31(6):1084–1094.

Perrault, P., Boursier, E., Perchet, V., and Valko, M. (2020a). Statistical efficiency of Thompson sampling for combinatorial semi-bandits. arXiv preprint arXiv:2006.06613.

Perrault, P., Perchet, V., and Valko, M. (2019a). Exploiting structure of uncertainty for efficient matroid semi-bandits. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 5123–5132, Long Beach, California, USA. PMLR.

Perrault, P., Perchet, V., and Valko, M. (2019b). Finding the bandit in a graph: Sequential search-and-stop. In Proceedings of Machine Learning Research, volume 89 of Proceedings of Machine Learning Research, pages 1668–1677. PMLR.

Perrault, P., Perchet, V., and Valko, M. (2020b). Covariance-adapting algorithm for semi-bandits with application to sparse outcomes. Proceedings of Machine Learning Research, 125:1–33.

Saito, K., Nakano, R., and Kimura, M. (2008). Prediction of information diffusion probabilities for independent cascade model. In Knowledge-Based Intelligent Information and Engineering Systems, pages 67–75, Berlin, Heidelberg. Springer Berlin Heidelberg.

Streeter, M. and Golovin, D. (2009). An online algorithm for maximizing submodular functions. In Advances in Neural Information Processing Systems, pages 1577–1584.

Tang, Y., Shi, Y., and Xiao, X. (2015). Influence maximization in near-linear time: A martingale approach. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pages 1539–1554, New York, NY, USA. ACM.

Tang, Y., Xiao, X., and Shi, Y. (2014). Influence maximization: Near-optimal time complexity meets practical efficiency. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 75–86. ACM.

Vaswani, S., Kveton, B., Wen, Z., Ghavamzadeh, M., Lakshmanan, L. V., and Schmidt, M. (2017). Model-independent online learning for influence maximization. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 3530–3539. JMLR.org.

Vaswani, S., Lakshmanan, L. V. S., and Schmidt, M. (2015). Influence maximization with bandits. In NIPS Workshop on Networks in the Social and Information Sciences 2015.

Wang, Q. and Chen, W. (2017). Improving regret bounds for combinatorial semi-bandits with probabilistically triggered arms and its applications. In Neural Information Processing Systems.

Wang, S. and Chen, W. (2018). Thompson sampling for combinatorial semi-bandits.

Wang, S., Yang, S., Xu, Z., and Truong, V.-A. (2020). Fast Thompson sampling algorithm with cumulative oversampling: Application to budgeted influence maximization. arXiv preprint arXiv:2004.11963.

Wen, Z., Kveton, B., Valko, M., and Vaswani, S. (2017). Online influence maximization under independent cascade model with semi-bandit feedback. In Neural Information Processing Systems.

Xia, Y., Qin, T., Ma, W., Yu, N., and Liu, T.-Y. (2016). Budgeted multi-armed bandits with multiple plays. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pages 2210–2216. AAAI Press.

A. Proof of Proposition 1

Proof. Let α = 1 − 1/e − ε. In the proof, we shall consider several policies π one after the other. In each case, we will denote by S_t the seed set selected by π at round t, and by τ_B the random round at which π has exhausted its budget. We denote by (H_t)_{t≥1} the filtration corresponding to (W_t, C_t)_{t≥1}. Recall that S_t and B_{t−1} are both measurable with respect to H_{t−1}.

Consider first the policy π that selects S_t = S* ∈ arg max_{S⊂V} E[σ(S; W)] E[e^T_{S∪{0}} C]^{−1} at each round t ≥ 1. We can write

F*_B + |V| ≥ F*_B + E[σ(S*; W)]
  ≥ F_B(π) + E[σ(S*; W)]    (definition of F*_B)
  = Σ_{t≥1} E[σ(S*; W_t) I{B_{t−1} ≥ 0}]
  = Σ_{t≥1} E[E[σ(S*; W)] I{B_{t−1} ≥ 0}]    (conditioning on H_{t−1})
  = λ* Σ_{t≥1} E[E[e^T_{S*∪{0}} C] I{B_{t−1} ≥ 0}]
  = λ* Σ_{t≥1} E[(C_{0,t} + e^T_{S*} C_t) I{B_{t−1} ≥ 0}]    (conditioning on H_{t−1})
  ≥ λ* B    (definition of τ_B).

We can use the inequality

F*_B + |V| ≥ λ* B    (13)

with any policy π in the following ways:

• First, we can bound the cost part of the cumulative gap:

α λ* E[ Σ_{t=1}^{τ_B−1} E[e^T_{S_t∪{0}} C] ]
  ≤ α λ* Σ_{t≥1} E[E[e^T_{S_t∪{0}} C] I{B_{t−1} ≥ 0}]    (B_t ≥ 0 ⇒ B_{t−1} ≥ 0)
  = α λ* Σ_{t≥1} E[(e^T_{S_t} C_t + C_{0,t}) I{B_{t−1} ≥ 0}]    (conditioning on H_{t−1})
  ≤ α λ* B + α λ* (1 + |V|)    (definition of τ_B)
  ≤ α F*_B + α |V| + α λ* (1 + |V|)    (inequality (13)).

• Next, we can bound the reward part:

E[ Σ_{t=1}^{τ_B−1} E[σ(S_t; W)] ]
  ≥ E[ Σ_{t=1}^{τ_B} E[σ(S_t; W)] ] − |V|
  = Σ_{t≥1} E[E[σ(S_t; W)] I{B_{t−1} ≥ 0}] − |V|
  = Σ_{t≥1} E[σ(S_t; W_t) I{B_{t−1} ≥ 0}] − |V|    (conditioning on H_{t−1})
  ≥ Σ_{t≥1} E[σ(S_t; W_t) I{B_t ≥ 0}] − |V|    (B_t ≥ 0 ⇒ B_{t−1} ≥ 0)
  = F_B(π) − |V|.

Adding these two inequalities, we get the following upper bound on the cumulative gap:

E[ Σ_{t=1}^{τ_B−1} Δ(S_t) ] = α λ* E[ Σ_{t=1}^{τ_B−1} E[e^T_{S_t∪{0}} C] ] − E[ Σ_{t=1}^{τ_B−1} E[σ(S_t; W)] ]
  ≤ R_{B,ε}(π) + (α + 1)|V| + α λ* (1 + |V|).

In the same way, we can derive a lower bound on E[ Σ_{t=1}^{τ_B−1} Δ(S_t) ], considering first the policy π such that F_B(π) = F*_B:

F*_B = Σ_{t≥1} E[σ(S_t; W_t) I{B_t ≥ 0}]
  ≤ Σ_{t≥1} E[σ(S_t; W_t) I{B_{t−1} ≥ 0}]    (B_t ≥ 0 ⇒ B_{t−1} ≥ 0)
  = Σ_{t≥1} E[E[σ(S_t; W)] I{B_{t−1} ≥ 0}]    (conditioning on H_{t−1})
  ≤ Σ_{t≥1} E[λ* E[e^T_{S_t∪{0}} C] I{B_{t−1} ≥ 0}]    (definition of λ*)
  = λ* E[ Σ_{t=1}^{τ_B} (C_{0,t} + e^T_{S_t} C_t) ]    (conditioning on H_{t−1})
  ≤ λ* (B + 1 + |V|)    (definition of τ_B),

i.e.,

F*_B − λ*(1 + |V|) ≤ λ* B.    (14)

Considering any policy π:

α λ* E[ Σ_{t=1}^{τ_B−1} E[e^T_{S_t∪{0}} C] ]
  ≥ α λ* Σ_{t≥1} E[E[e^T_{S_t∪{0}} C] I{B_{t−1} ≥ 1 + |V|}]    (B_{t−1} ≥ 1 + |V| ⇒ B_t ≥ 0)
  = α λ* Σ_{t≥1} E[(e^T_{S_t} C_t + C_{0,t}) I{B_{t−1} ≥ 1 + |V|}]    (conditioning on H_{t−1})
  ≥ α λ* B − α λ* (1 + |V|)    (definition of τ_B)
  ≥ α F*_B − 2 α λ* (1 + |V|)    (inequality (14)),

and

E[ Σ_{t=1}^{τ_B−1} E[σ(S_t; W)] ] ≤ E[ Σ_{t=1}^{τ_B} E[σ(S_t; W)] ]
  = Σ_{t≥1} E[E[σ(S_t; W)] I{B_{t−1} ≥ 0}]
  = Σ_{t≥1} E[σ(S_t; W_t) I{B_{t−1} ≥ 0}]    (conditioning on H_{t−1})
  ≤ Σ_{t≥1} E[σ(S_t; W_t) I{B_t ≥ 0}] + |V|
  = F_B(π) + |V|.

Again adding these two inequalities, we get the desired lower bound:
\begin{align*}
\mathbb{E}\Bigg[\sum_{t=1}^{\tau_B-1} \Delta(S_t)\Bigg] &= \alpha\lambda^\star\,\mathbb{E}\Bigg[\sum_{t=1}^{\tau_B-1} \mathbb{E}\big[\mathbf{e}_{S_t\cup\{0\}}^{\mathsf{T}}\mathbf{C}\big]\Bigg] - \mathbb{E}\Bigg[\sum_{t=1}^{\tau_B-1} \mathbb{E}[\sigma(S_t;\mathbf{W})]\Bigg] \\
&\ge R_{B,\varepsilon}(\pi) - 2\alpha\lambda^\star(1 + |V|) - |V|.
\end{align*}


B. Proof of Theorem 1

Proof. Let $\alpha = 1 - 1/e - \varepsilon$. From Proposition 1, it suffices to upper bound
\[
\mathbb{E}\Bigg[\sum_{t=1}^{\tau_B-1} \Delta(S_t)\Bigg].
\]

Fix $t \ge 1$. We consider the following events:
\begin{align*}
\mathcal{W}_t &\triangleq \Bigg\{\forall ij \in E,\; 0 \le w_{ij,t} - w^\star_{ij} \le 2\sqrt{\frac{1.5\log(t)}{N^\oplus_{ij,t-1}}}\Bigg\}, \\
\mathcal{C}_t &\triangleq \Bigg\{\forall i \in V \cup \{0\},\; 0 \le c^\star_i - c_{i,t} \le 2\sqrt{\frac{1.5\log(t)}{N_{i,t-1}}}\Bigg\}.
\end{align*}

We also consider the event $\mathcal{A}_t$ under which the $\alpha$-approximation in Algorithm 1 holds. We already saw that
\[
\mathbb{P}[\neg\mathcal{A}_t] \le \frac{1}{t\log^2(t)}\cdot
\]
From Hoeffding's inequality, $\mathcal{C}_t$ fails with probability at most $2(|V|+1)/t^2$, and $\mathcal{W}_t$ fails with probability at most $2|E|/t^2$. Thus, the total regret incurred on rounds where $\mathcal{C}_t$, $\mathcal{W}_t$, or $\mathcal{A}_t$ fails is bounded by a constant (since these probabilities form a convergent series).
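For intuition only (this sketch is ours, not part of the paper), the optimistic weights and pessimistic costs behind the events $\mathcal{W}_t$ and $\mathcal{C}_t$ could be computed as follows in Python; the array names `w_hat`, `c_hat`, `n_edge`, `n_node` are hypothetical placeholders for the empirical means and counters maintained by a CUCB-style algorithm.

import numpy as np

def confidence_radius(t, n):
    # Radius 2 * sqrt(1.5 log(t) / n) appearing in the events W_t and C_t.
    return 2.0 * np.sqrt(1.5 * np.log(t) / np.maximum(n, 1))

def optimistic_weights(w_hat, n_edge, t):
    # Upper confidence bounds on the edge weights, clipped to [0, 1].
    return np.minimum(w_hat + confidence_radius(t, n_edge), 1.0)

def pessimistic_costs(c_hat, n_node, t):
    # Lower confidence bounds on the node costs, clipped to be non-negative.
    return np.maximum(c_hat - confidence_radius(t, n_node), 0.0)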

We thus now assume that $\mathcal{C}_t$, $\mathcal{W}_t$ and $\mathcal{A}_t$ hold. In particular, using $\mathcal{C}_t$ and $\mathcal{W}_t$, we can write
\[
\lambda^\star = \frac{\sigma(S^\star;\mathbf{w}^\star)}{\mathbf{e}_{S^\star\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star} \le \frac{\sigma(S^\star;\mathbf{w}_t)}{\mathbf{e}_{S^\star\cup\{0\}}^{\mathsf{T}}\mathbf{c}_t}\cdot
\]

We can use this relation to write
\begin{align*}
\Delta(S_t) &= \lambda^\star\alpha\big(\mathbf{e}_{S_t}^{\mathsf{T}}\mathbf{c}^\star + c^\star_0\big) - \sigma(S_t;\mathbf{w}^\star) \\
&= \lambda^\star\alpha\bigg(\big(\mathbf{e}_{S_t}^{\mathsf{T}}\mathbf{c}^\star + c^\star_0\big) - \frac{\sigma(S_t;\mathbf{w}_t)}{\alpha\lambda^\star}\bigg) + \big(\sigma(S_t;\mathbf{w}_t) - \sigma(S_t;\mathbf{w}^\star)\big) \\
&\le \lambda^\star\alpha\bigg(\big(\mathbf{e}_{S_t}^{\mathsf{T}}\mathbf{c}^\star + c^\star_0\big) - \frac{\sigma(S_t;\mathbf{w}_t)\,\big(\mathbf{e}_{S^\star}^{\mathsf{T}}\mathbf{c}_t + c_{0,t}\big)}{\alpha\,\sigma(S^\star;\mathbf{w}_t)}\bigg) + \big(\sigma(S_t;\mathbf{w}_t) - \sigma(S_t;\mathbf{w}^\star)\big) \\
&\le \lambda^\star\alpha\Big(\big(\mathbf{e}_{S_t}^{\mathsf{T}}\mathbf{c}^\star + c^\star_0\big) - \big(\mathbf{e}_{S_t}^{\mathsf{T}}\mathbf{c}_t + c_{0,t}\big)\Big) + \big(\sigma(S_t;\mathbf{w}_t) - \sigma(S_t;\mathbf{w}^\star)\big),
\end{align*}
where the last inequality follows from $\mathcal{A}_t$. Notice here that if the costs are known, the first term in this bound vanishes, and we can then safely take $\lambda^\star = 0$, which explains why the term in front of $\lambda^\star$ disappears from the final bound. We now use Fact 2, and then $\mathcal{C}_t$, $\mathcal{W}_t$, to further get the bound

\[
\Delta(S_t) \le \underbrace{\lambda^\star\alpha \sum_{i \in S_t} \Bigg(1 \wedge 2\sqrt{\frac{1.5\log(t)}{N_{i,t-1}}}\Bigg)}_{(15)} + \underbrace{|V| \sum_{ij \in E} p_i(S_t;\mathbf{w}^\star)\Bigg(1 \wedge 2\sqrt{\frac{1.5\log(t)}{N^\oplus_{ij,t-1}}}\Bigg)}_{(16)}\cdot
\]

Then, necessarily, either $\Delta(S_t) \le 2\cdot(15)$ or $\Delta(S_t) \le 2\cdot(16)$ holds. The first event can be handled exactly as in standard combinatorial semi-bandit settings, using the following upper bound on the expectation of the random horizon (Perrault et al., 2019b):
\[
\mathbb{E}[\tau_B - 1] \le (2B/c^\star_0 + 1)^2.
\]
This allows us to get a term of order
\[
\lambda^\star \log(B/c^\star_0) \sum_{i \in V} \frac{|V|}{\Delta_{i,\min}}
\]


in the regret upper bound.

For the second event, the analysis of Wang and Chen (2017) uses triggering probability groups to deal with $p_i(S_t;\mathbf{w}^\star)$. We propose here another, simpler method that allows us to transform the factor $|E|$, present in the bound of Wang and Chen (2017), into $\max_{S \subset V} \sum_{i \in V} d_i p_i(S;\mathbf{w}^\star)$, a potentially much smaller quantity. More precisely, we first use a reverse amortisation:
\begin{align*}
\Delta(S_t) &\le -\Delta(S_t) + 4\cdot(16) = 4|V| \sum_{ij \in E} p_i(S_t;\mathbf{w}^\star)\Bigg(1 \wedge 2\sqrt{\frac{1.5\log(t)}{N^\oplus_{ij,t-1}}} - \frac{\Delta(S_t)}{4|V|\sum_{k \in V} d_k p_k(S_t;\mathbf{w}^\star)}\Bigg) \\
&\le 4|V| \sum_{ij \in E} p_i(S_t;\mathbf{w}^\star)\,\mathbb{I}\Bigg\{N^\oplus_{ij,t-1} \le 1.5\log(t)\bigg(\frac{8|V|\sum_{k \in V} d_k p_k(S_t;\mathbf{w}^\star)}{\Delta(S_t)}\bigg)^{\!2}\Bigg\}\Bigg(1 \wedge 2\sqrt{\frac{1.5\log(t)}{N^\oplus_{ij,t-1}}}\Bigg).
\end{align*}

Since we have
\[
\mathbb{E}\Bigg[\sum_{t=1}^{\tau_B-1} \Delta(S_t)\Bigg] \le \mathbb{E}\Bigg[\sum_{t=1}^{\tau_B} \mathbb{E}[\Delta(S_t)\,|\,\mathcal{H}_{t-1}]\Bigg],
\]

and that
\[
\mathbb{E}\Bigg[p_i(S_t;\mathbf{w}^\star)\,\mathbb{I}\Bigg\{N^\oplus_{ij,t-1} \le 1.5\log(t)\bigg(\frac{8|V|\sum_{k \in V} d_k p_k(S_t;\mathbf{w}^\star)}{\Delta(S_t)}\bigg)^{\!2}\Bigg\}\Bigg(1 \wedge 2\sqrt{\frac{1.5\log(t)}{N^\oplus_{ij,t-1}}}\Bigg)\,\Bigg|\,\mathcal{H}_{t-1}\Bigg]
\]
equals
\[
\mathbb{E}\Bigg[\mathbb{I}\big\{S_t \rightsquigarrow_{\mathbf{W}_t} i\big\}\,\mathbb{I}\Bigg\{N^\oplus_{ij,t-1} \le 1.5\log(t)\bigg(\frac{8|V|\sum_{k \in V} d_k p_k(S_t;\mathbf{w}^\star)}{\Delta(S_t)}\bigg)^{\!2}\Bigg\}\Bigg(1 \wedge 2\sqrt{\frac{1.5\log(t)}{N^\oplus_{ij,t-1}}}\Bigg)\,\Bigg|\,\mathcal{H}_{t-1}\Bigg],
\]

it is sufficient to bound the quantity
\[
\sum_{t=1}^{\tau_B} 4|V| \sum_{ij \in E,\; p_i(S_t;\mathbf{w}^\star) > 0} \mathbb{I}\big\{S_t \rightsquigarrow_{\mathbf{W}_t} i\big\}\,\mathbb{I}\Bigg\{N^\oplus_{ij,t-1} \le 1.5\log(t)\bigg(\frac{8|V|\sum_{k \in V} d_k p_k(S_t;\mathbf{w}^\star)}{\Delta(S_t)}\bigg)^{\!2}\Bigg\}\Bigg(1 \wedge 2\sqrt{\frac{1.5\log(t)}{N^\oplus_{ij,t-1}}}\Bigg)\cdot
\]

Therefore, the counters $N^\oplus_{ij,t-1}$ are guaranteed to increase thanks to the event $\big\{S_t \rightsquigarrow_{\mathbf{W}_t} i\big\}$. We can now handle this exactly as in the standard combinatorial semi-bandit setting, to get a bound of order
\[
\log(B/c^\star_0) \sum_{i \in V} \frac{d_i |V|^2 \max_{S \subset V,\; p_i(S;\mathbf{w}^\star) > 0} \sum_{k \in V} d_k p_k(S;\mathbf{w}^\star)}{\Delta_{i,\min}}\cdot
\]

Problem-independent bound. The problem-independent bound of
\[
\mathcal{O}\Bigg(|V|\sqrt{B\log B \sum_{i \in V} d_i p_{i,\max}}\Bigg)
\]
is an immediate consequence of our problem-dependent bound, obtained by classically decomposing the regret into two terms by filtering on whether or not $\Delta(S_t) \le \delta$, and then taking the worst regime for $\delta$.

C. Proof of Proposition 3

Proof. First, notice that we trivially have
\[
\mathbb{P}\Bigg[\mathcal{P}_t \text{ and } p_i(\{j\};\mathbf{w}^\star) \le \frac{8\delta(t)}{N_{j,t-1}} \text{ and } \frac{\delta(t)\,p_i(\{j\};\mathbf{w}^\star)}{N^\oplus_{i,t-1}} > \frac{8\delta(t)}{N_{j,t-1}}\Bigg] = 0.
\]
Thus, let us prove that
\[
\mathbb{P}\Bigg[\mathcal{P}_t \text{ and } p_i(\{j\};\mathbf{w}^\star) > \frac{8\delta(t)}{N_{j,t-1}} \text{ and } \frac{\delta(t)\,p_i(\{j\};\mathbf{w}^\star)}{N^\oplus_{i,t-1}} > \frac{8\delta(t)}{N_{j,t-1}}\Bigg] \le 1/t^2.
\]


We define another counter for $(i,j) \in V^2$ as follows:
\[
N_{i,j,t-1} \triangleq \sum_{t'=1}^{t-1} \mathbb{I}\big\{j \in S_{t'},\; \{j\} \rightsquigarrow_{\mathbf{W}_{t'}} i\big\}.
\]
Note that we have $N_{i,j,t-1} \le \big(N^\oplus_{i,t-1} \wedge N_{j,t-1}\big)$. We can thus remove $\mathcal{P}_t$ and replace $N^\oplus_{i,t-1}$ by $N_{i,j,t-1}$, since this can only increase the probability. By a union bound we have
\[
\mathbb{P}\Bigg[p_i(\{j\};\mathbf{w}^\star) > \frac{8\delta(t)}{N_{j,t-1}} \text{ and } \frac{p_i(\{j\};\mathbf{w}^\star)}{N_{i,j,t-1}} > \frac{8}{N_{j,t-1}}\Bigg] \le \sum_{t' > \frac{8\delta(t)}{p_i(\{j\};\mathbf{w}^\star)}} \mathbb{P}\Bigg[N_{j,t-1} = t',\; \frac{t'\,p_i(\{j\};\mathbf{w}^\star)}{8} > N_{i,j,t-1}\Bigg].
\]

Since the random variables $\mathbb{I}\big\{\{j\} \rightsquigarrow_{\mathbf{W}_{t''}} i\big\}$ are Bernoulli with mean $p_i(\{j\};\mathbf{w}^\star)$, we can apply Fact 3 to get
\[
\mathbb{P}\Bigg[N_{j,t-1} = t',\; \frac{t'\,p_i(\{j\};\mathbf{w}^\star)}{8} > N_{i,j,t-1}\Bigg] \le \exp\big(-(7/8)^2\, 8\delta(t)/2\big) < 1/t^3.
\]
Taking a union bound over $t' \in \{0,\ldots,t-1\}$, i.e., over at most $t$ terms each smaller than $1/t^3$, the proposition holds.

Fact 3 (Multiplicative Chernoff Bound; Mitzenmacher and Upfal, 2017). Let $X_1,\ldots,X_t$ be Bernoulli random variables with parameter $\mu$. Then, for $Y = X_1 + \cdots + X_t$ and $\delta \in (0,1)$,
\[
\mathbb{P}[Y \le (1-\delta)t\mu] \le e^{-\delta^2 t\mu/2}.
\]
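As a quick numerical illustration of Fact 3 (our own sanity check, not part of the proof), the following Python snippet compares the empirical lower tail of a Bernoulli sum with the Chernoff bound; all names are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
t, mu, delta = 200, 0.3, 0.5           # sample size, Bernoulli mean, deviation
Y = rng.binomial(t, mu, size=100_000)  # 100k i.i.d. copies of Y = X_1 + ... + X_t
empirical = np.mean(Y <= (1 - delta) * t * mu)
bound = np.exp(-delta**2 * t * mu / 2)
# The empirical tail (often 0 at this sample size) stays below the bound ~5.5e-4.
print(f"empirical tail: {empirical:.2e}  Chernoff bound: {bound:.2e}")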

D. Proof of Proposition 2

Proof. In the proof, we use the notation $\sigma(i|S) \triangleq \sigma(\{i\} \cup S) - \sigma(S)$. For any $k \in [|V|]$,
\begin{align*}
\sigma(S^\star) - \sigma(S_{k-1}) &\le \sum_{i \in S^\star \setminus S_{k-1}} \sigma(i|S_{k-1}) && \text{submodularity, monotonicity of } \sigma \\
&\le \frac{\sigma(i_k|S_{k-1})}{c_{i_k}} \sum_{i \in S^\star \setminus S_{k-1}} c_i && \text{Algorithm 2} \\
&\le \frac{\sigma(i_k|S_{k-1})}{c_{i_k}} \sum_{i \in S^\star} c_i,
\end{align*}
i.e., for all $k \in [|V|]$ such that $\sigma(S^\star) - \sigma(S_{k-1}) \ge 0$,
\begin{equation}
\frac{c_{i_k}}{\mathbf{e}_{S^\star}^{\mathsf{T}}\mathbf{c}} \le \frac{\sigma(i_k|S_{k-1})}{\sigma(S^\star) - \sigma(S_{k-1})}\cdot \tag{17}
\end{equation}

There must be an index $\ell \in \{0,1,\ldots,|V|-1\}$ such that $\mathbf{e}_{S_\ell}^{\mathsf{T}}\mathbf{c} \le \mathbf{e}_{S^\star}^{\mathsf{T}}\mathbf{c} \le \mathbf{e}_{S_{\ell+1}}^{\mathsf{T}}\mathbf{c}$. Let $\beta \in [0,1]$ be such that
\begin{equation}
\mathbf{e}_{S^\star}^{\mathsf{T}}\mathbf{c} = (1-\beta)\,\mathbf{e}_{S_\ell}^{\mathsf{T}}\mathbf{c} + \beta\,\mathbf{e}_{S_{\ell+1}}^{\mathsf{T}}\mathbf{c}. \tag{18}
\end{equation}

If $\sigma(S^\star) - (1-\beta)\sigma(S_\ell) - \beta\sigma(S_{\ell+1}) \le 0$, then we have
\[
\big(1 - e^{-1}\big)\frac{\sigma(S^\star)}{\mathbf{e}_{S^\star}^{\mathsf{T}}\mathbf{c}} \le \frac{\sigma(S^\star)}{\mathbf{e}_{S^\star}^{\mathsf{T}}\mathbf{c}} \le \frac{(1-\beta)\sigma(S_\ell) + \beta\sigma(S_{\ell+1})}{(1-\beta)\mathbf{e}_{S_\ell}^{\mathsf{T}}\mathbf{c} + \beta\,\mathbf{e}_{S_{\ell+1}}^{\mathsf{T}}\mathbf{c}}\cdot
\]
Otherwise,
\begin{equation}
\sigma(S^\star) - (1-\beta)\sigma(S_\ell) - \beta\sigma(S_{\ell+1}) > 0 \quad \text{and} \quad \sigma(S^\star) - \sigma(S_k) > 0 \text{ for all } k \in [\ell], \tag{19}
\end{equation}


so we can write the following:
\begin{align*}
\frac{\sigma(S^\star) - (1-\beta)\sigma(S_\ell) - \beta\sigma(S_{\ell+1})}{\sigma(S^\star)} &\le \frac{\sigma(S^\star) - (1-\beta)\sigma(S_\ell) - \beta\sigma(S_{\ell+1})}{\sigma(S^\star|\emptyset)} \\
&= \frac{\sigma(S^\star) - \sigma(S_\ell) - \beta\sigma(i_{\ell+1}|S_\ell)}{\sigma(S^\star) - \sigma(S_\ell)} \prod_{k \in [\ell]} \frac{\sigma(S^\star) - \sigma(S_k)}{\sigma(S^\star) - \sigma(S_{k-1})} \\
&= \Bigg(1 - \frac{\beta\,\sigma(i_{\ell+1}|S_\ell)}{\sigma(S^\star) - \sigma(S_\ell)}\Bigg) \prod_{k \in [\ell]} \Bigg(1 - \frac{\sigma(i_k|S_{k-1})}{\sigma(S^\star) - \sigma(S_{k-1})}\Bigg) \\
&\le \Bigg(1 - \frac{\beta c_{i_{\ell+1}}}{\mathbf{e}_{S^\star}^{\mathsf{T}}\mathbf{c}}\Bigg) \prod_{k \in [\ell]} \Bigg(1 - \frac{c_{i_k}}{\mathbf{e}_{S^\star}^{\mathsf{T}}\mathbf{c}}\Bigg) && \text{(17) and (19)} \\
&\le \exp\Bigg(-\frac{\beta c_{i_{\ell+1}} + \sum_{k \in [\ell]} c_{i_k}}{\mathbf{e}_{S^\star}^{\mathsf{T}}\mathbf{c}}\Bigg) && 1 - x \le e^{-x} \\
&= \exp\Bigg(-\frac{(1-\beta)\mathbf{e}_{S_\ell}^{\mathsf{T}}\mathbf{c} + \beta\,\mathbf{e}_{S_{\ell+1}}^{\mathsf{T}}\mathbf{c}}{\mathbf{e}_{S^\star}^{\mathsf{T}}\mathbf{c}}\Bigg) = e^{-1}. && \text{(18)}
\end{align*}

Rearranging the inequality, we obtain the following:
\begin{equation}
\big(1 - e^{-1}\big)\sigma(S^\star) \le (1-\beta)\sigma(S_\ell) + \beta\sigma(S_{\ell+1}), \tag{20}
\end{equation}
i.e.,
\[
\big(1 - e^{-1}\big)\frac{\sigma(S^\star)}{\mathbf{e}_{S^\star\cup\{0\}}^{\mathsf{T}}\mathbf{c}} \le \frac{(1-\beta)\sigma(S_\ell) + \beta\sigma(S_{\ell+1})}{(1-\beta)\mathbf{e}_{S_\ell\cup\{0\}}^{\mathsf{T}}\mathbf{c} + \beta\,\mathbf{e}_{S_{\ell+1}\cup\{0\}}^{\mathsf{T}}\mathbf{c}}\cdot
\]

The output $S$ of Algorithm 2 maximizes the ratio $\sigma(S_k)/\mathbf{e}_{S_k\cup\{0\}}^{\mathsf{T}}\mathbf{c}$ over $k$. Thus,
\[
\max_{k \le \ell+1} \frac{\sigma(S_k)}{\mathbf{e}_{S_k\cup\{0\}}^{\mathsf{T}}\mathbf{c}} \le \frac{\sigma(S)}{\mathbf{e}_{S\cup\{0\}}^{\mathsf{T}}\mathbf{c}}\cdot
\]
We end the proof by remarking that
\[
\max_{k \in \{\ell,\ell+1\}} \frac{\sigma(S_k)}{\mathbf{e}_{S_k\cup\{0\}}^{\mathsf{T}}\mathbf{c}} \ge \frac{(1-\beta)\sigma(S_\ell) + \beta\sigma(S_{\ell+1})}{(1-\beta)\mathbf{e}_{S_\ell\cup\{0\}}^{\mathsf{T}}\mathbf{c} + \beta\,\mathbf{e}_{S_{\ell+1}\cup\{0\}}^{\mathsf{T}}\mathbf{c}}\cdot
\]
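For concreteness, here is a minimal Python sketch of the greedy rule analyzed above: nodes are added by largest marginal spread per cost, and the best-ratio prefix is returned. The spread oracle `sigma` is an assumed black box, and the function names are ours; this illustrates the rule underlying Algorithm 2, not the paper's implementation.

def greedy_ratio(V, c, c0, sigma):
    # Build the greedy sequence S_1 c S_2 c ... by marginal spread per cost,
    # then return the prefix S_k maximizing sigma(S_k) / c(S_k u {0}).
    S, prefixes, remaining = [], [], set(V)
    while remaining:
        # i_k maximizes sigma(i | S_{k-1}) / c_i over remaining nodes
        i_k = max(remaining,
                  key=lambda i: (sigma(set(S) | {i}) - sigma(set(S))) / c[i])
        S.append(i_k)
        remaining.remove(i_k)
        prefixes.append(set(S))
    return max(prefixes,
               key=lambda A: sigma(A) / (sum(c[i] for i in A) + c0))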

E. Proof of Theorem 2

Proof. Let $\alpha = 1 - 1/e - \varepsilon$. From Proposition 1, it suffices to upper bound
\[
\mathbb{E}\Bigg[\sum_{t=1}^{\tau_B-1} \Delta(S_t)\Bigg].
\]

In the proof, in addition to $\mathcal{P}_t \triangleq \{\forall i \in V,\; N^\oplus_{i,t-1} \ge \delta(t)\}$, we consider the following events:
\begin{align*}
\mathcal{W}_t &\triangleq \Bigg\{\sum_{ij \in E} N^\oplus_{i,t-1}\big(w^\star_{ij} - w_{ij,t-1}\big)^2 \le 2\delta(t)\Bigg\}, \\
\mathcal{C}_t &\triangleq \Bigg\{\forall i \in V \cup \{0\},\; 0 \le c^\star_i - c_{i,t} \le 1 \wedge 2\sqrt{\frac{1.5\log(t)}{N_{i,t-1}}}\Bigg\}, \\
\mathcal{B}_t &\triangleq \Bigg\{\forall i,j \in V,\; \frac{\delta(t)\,p_i(\{j\};\mathbf{w}^\star)}{N^\oplus_{i,t-1}} \le \frac{8\delta(t)}{N_{j,t-1}}\Bigg\}.
\end{align*}


As previously, we have the following upper bound on the expectation of the random horizon:
\[
\mathbb{E}[\tau_B - 1] \le (2B/c^\star_0 + 1)^2.
\]
For each node $i \in V$, if $N^\oplus_{i,t-1} \ge \delta(\tau_B - 1)$, then $i$ will not be intentionally added to the seed set in BOIM-CUCB5. Hence, each node is intentionally added at most $\delta(\tau_B - 1) + 1$ times. Thus, we can write
\[
\mathbb{E}\Bigg[\sum_{t=1}^{\tau_B-1} \Delta(S_t)\,\mathbb{I}\{\neg\mathcal{P}_t\}\Bigg] \le \mathbb{E}\big[(\delta(\tau_B - 1) + 1)\,|V|\,\lambda^\star(|V| + 1)\big] \le \Big(\delta\big((2B/c^\star_0 + 1)^2\big) + 1\Big)|V|\,\lambda^\star(|V| + 1).
\]

We can therefore assume that $\mathcal{P}_t$ holds. In this case, Proposition 3 shows that $\mathcal{B}_t$ fails with probability at most $|V|^2/t^2$. On the other hand, from Fact 1, $\mathcal{W}_t$ fails with probability at most $1/\big(t\log^2(t)\big)$, and from Hoeffding's inequality, $\mathcal{C}_t$ fails with probability at most $2(|V|+1)/t^2$. We also consider the event $\mathcal{A}_t$ under which the $\alpha$-approximation in BOIM-CUCB5 holds. We already saw that
\[
\mathbb{P}[\neg\mathcal{A}_t] \le \frac{1}{t\log^2(t)}\cdot
\]

The regret in the case where one of the events $\neg\mathcal{A}_t, \neg\mathcal{B}_t, \neg\mathcal{W}_t, \neg\mathcal{C}_t$ holds is thus bounded by a constant depending on $|V|$ and $\lambda^\star$. It thus remains to upper bound
\[
\mathbb{E}\Bigg[\sum_{t=1}^{\tau_B-1} \Delta(S_t)\,\mathbb{I}\{\mathcal{A}_t, \mathcal{B}_t, \mathcal{W}_t, \mathcal{C}_t\}\Bigg].
\]

For this, notice that from $\mathcal{A}_t$, the seed set $S_t$ chosen by our policy at round $t$ is an $\alpha$-approximate maximizer of $A \mapsto f(A)/\big(\mathbf{e}_A^{\mathsf{T}}\mathbf{c}_t + c_{0,t}\big)$, where $f$ is one of the optimistic spreads considered in BOIM-CUCB5. We thus have
\[
\frac{f(S_t)}{\mathbf{e}_{S_t}^{\mathsf{T}}\mathbf{c}_t + c_{0,t}} \ge \alpha\,\frac{f(S^\star)}{\mathbf{e}_{S^\star}^{\mathsf{T}}\mathbf{c}_t + c_{0,t}},
\]
where $S^\star \in \arg\max_{S \subset V} \frac{\sigma(S;\mathbf{w}^\star)}{\mathbf{e}_S^{\mathsf{T}}\mathbf{c}^\star + c^\star_0}\cdot$ Since, under $\mathcal{W}_t$, $f(S^\star) \ge \sigma(S^\star;\mathbf{w}^\star)$, we can derive the following upper bound on the gap:

gap:

∆(St) = λ?α(eT

Stc? + c?0

)− σ(St;w

?)

= λ?α

((eT

Stc? + c?0

)− f(St)

αλ?

)+ (f(St)− σ(St;w

?))

≤ λ?α((

eT

Stc? + c?0

)− f(St)

αf(S?)(eT

S?ct + c0,t)

)+ (f(St)− σ(St;w

?)) Wt,Ct

≤ λ?α((eT

Stc? + c?0

)−(eT

Stct + c0,t))

+ (f(St)− σ(St;w?)). At

From this point, we can use the condition satisfied by $f$ in BOIM-CUCB5:
\[
f(S_t) \le \sigma(S_t;\mathbf{w}_{t-1}) + \text{Bonus}_5(S_t).
\]
Using Fact 2 together with Fact 1, we further have, by the Cauchy-Schwarz inequality,
\[
\sigma(S_t;\mathbf{w}_{t-1}) - \sigma(S_t;\mathbf{w}^\star) \le \text{Bonus}_5(S_t).
\]
This allows us to get, using $\mathcal{C}_t$,
\[
\Delta(S_t) \le \lambda^\star\alpha \sum_{i \in S_t \cup \{0\}} 1 \wedge 2\sqrt{\frac{1.5\log(t)}{N_{i,t-1}}} + 2\,\text{Bonus}_5(S_t).
\]

Since we have $N_{0,t-1} = t$, we can remove $\{0\}$ from $S_t \cup \{0\}$ by looking at the regret under the event $\big\{2\lambda^\star\alpha\sqrt{1.5\log(t)/t} \le \Delta(S_t)/2\big\}$, giving
\begin{equation}
\Delta(S_t) \le 2\lambda^\star\alpha \sum_{i \in S_t} 1 \wedge 2\sqrt{\frac{1.5\log(t)}{N_{i,t-1}}} + 4\,\text{Bonus}_5(S_t). \tag{21}
\end{equation}


The regret in the case where this event does not hold is bounded by a constant depending on the inverse of the squared minimum gap and on $\lambda^\star$. The first part in (21) can be handled exactly as in standard budgeted combinatorial semi-bandit settings, to get a term of order
\[
\lambda^\star \log(B/c^\star_0) \sum_{i \in V} \frac{|V|}{\Delta_{i,\min}}
\]
in the regret upper bound. We can use the analysis of Degenne and Perchet (2016) to deal with the second part, to get a term of order
\[
\delta(B/c^\star_0)\,|V|^2\,|E| \sum_{i \in V} \frac{\log^2(|V|)}{\Delta_{i,\min}}
\]
in the regret upper bound. We thus get the desired result.

F. Proof of Theorem 3

Proof. Let $\alpha = 1 - 1/e - \varepsilon$. The beginning of the proof is the same as for Theorem 2, except that we no longer consider the event $\mathcal{B}_t$, and we consider a new event:
\[
\mathcal{R}_t \triangleq \{\forall i \in V,\; N^\oplus_{i,t-1} \ge |E|\,\delta(t)\}.
\]
As for Theorem 2:

• The regret in the case where $\mathcal{R}_t$ does not hold can be bounded by a term of order $\lambda^\star |V|^2 |E| \log(B)$.

• When all the events hold, the same analysis gives
\[
\Delta(S_t) \le 2\lambda^\star\alpha \sum_{i \in S_t} 1 \wedge 2\sqrt{\frac{1.5\log(t)}{N_{i,t-1}}} + 4\,\text{Bonus}_1(S_t;\mathbf{w}_{t-1}),
\]
and the first term can be handled in the same way.

The second term can be analyzed in the following way. After bounding it by $4m\,\text{Bonus}(S_t;\mathbf{w}_{t-1})$, observe that using Fact 2 on the quantity $p_i(S;\mathbf{w}_{t-1})$ present in this bonus, we get
\[
p_i(S_t;\mathbf{w}_{t-1}) \le p_i(S_t;\mathbf{w}^\star) + \frac{1}{|V|}\,\text{Bonus}(S_t;\mathbf{w}^\star).
\]
By subadditivity, and from $\mathcal{R}_t$, we have
\begin{align*}
4m\,\text{Bonus}(S_t;\mathbf{w}_{t-1}) &\le 4m\,\text{Bonus}(S_t;\mathbf{w}^\star) + 4m|V|\sqrt{\delta(t)\sum_{i \in V} \frac{d_i\,\text{Bonus}(S_t;\mathbf{w}^\star)^2}{|V|^2 N^\oplus_{i,t-1}}} \\
&\le 4m\,\text{Bonus}(S_t;\mathbf{w}^\star) + 4m|V|\sqrt{\sum_{i \in V} \frac{d_i\,\text{Bonus}(S_t;\mathbf{w}^\star)^2}{|V|^2 |E|}} = 8m\,\text{Bonus}(S_t;\mathbf{w}^\star).
\end{align*}

We can now use the analysis of Degenne and Perchet (2016), together with the one from Wang and Chen (2017) to deal with probabilistically triggered arms, to get in the regret upper bound a term of order
\[
\delta(B/c^\star_0)\,m^2|V|^2 \sum_{i \in V} \frac{d_i \log^2(|E|)}{\Delta_{i,\min}}\cdot
\]


G. SKIM for influence maximization with cost

In this section, we provide an adaptation of SKIM (Cohen et al., 2014) to our ratio maximization setting. Let $(\mathbf{w}, \text{BONUS}) = (\mathbf{w}_t, 0)$ or $(\mathbf{w}_{t-1}, \text{Bonus}_5)$ (depending on whether we want to maximize the CUCB ratio or the Bonus5-based ratio). SKIM for IC with weights $\mathbf{w}$ can be used (Cohen et al., 2014), but instead of taking $k-1$ as the threshold for the length of the sketch of node $i$ (i.e., $i$ is chosen as soon as $|\text{sketch}[i]| > k-1$), we rather consider
\[
k c_i - \frac{\big(\text{BONUS}(\{i\} \cup S) - \text{BONUS}(S)\big)\,\text{sketch}[i][-1]}{|V|},
\]
where $\text{sketch}[i][-1]$ is the last added rank in $\text{sketch}[i]$ ($0$ if the sketch is empty), and $S$ is the current seed set built so far. This way, we can estimate the (non-optimistic) marginal gain in spread of the chosen node $i$ as
\[
\frac{k|V|c_i - \big(\text{BONUS}(\{i\} \cup S) - \text{BONUS}(S)\big)\,\text{sketch}[i][-1]}{\text{sketch}[i][-1]}\cdot
\]
We get the optimistic version by adding $\text{BONUS}(\{i\} \cup S) - \text{BONUS}(S)$, i.e., it is $\frac{k|V|c_i}{\text{sketch}[i][-1]}\cdot$ Finally, we get the ratio of marginal gain to cost by dividing by $c_i$, giving $\frac{k|V|}{\text{sketch}[i][-1]}\cdot$ This whole procedure does maximize the marginal-gain ratio, because the ranks are examined in ascending order. Notice that we have to normalize the costs at the beginning of the loop for finding $i$, so that each threshold is greater than $k-1$; this ensures that the length of the chosen sketch is at least $k$ and that our estimates hold with high probability.

It should be noted that the main difficulty is to go through the ranks in the right order while taking the costs into account. An alternative would have been to draw classical ranks and change them at each update of the sketch of $u$ according to the cost of $u$. This procedure works for estimating the spread, but we cannot guarantee that the ranks are in descending order of the marginal ratio, so all the marginal ratios must be estimated before selecting the maximizer.

Adapting Lemma 4.2 from Cohen et al. (2014), we obtain that the expected total number of rank insertions at a particular node is $\mathcal{O}\big(k\log\big(|V|k\,\frac{c_{\max}}{c_{\min}}\big)\big)$, thus giving a global complexity of $\mathcal{O}\big(|E|k\log\big(|V|k\,\frac{c_{\max}}{c_{\min}}\big)\big)$. We note the dependence on $c_{\min}$, which, although not desired, is only logarithmic. When all the costs are equal, we recover the standard SKIM complexity.
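As an illustration (our own sketch, under the reconstruction of the threshold given above), the modified stopping test and the resulting optimistic ratio estimate could be computed as follows in Python; the names `adapted_threshold` and `optimistic_ratio_estimate` are hypothetical, and the bookkeeping around the sketches is left out.

def adapted_threshold(k, c_i, bonus_gain, last_rank, n_nodes):
    # Replaces SKIM's fixed threshold k - 1: node i is selected as soon as
    # len(sketch[i]) exceeds this value. Here bonus_gain stands for
    # BONUS({i} u S) - BONUS(S) and last_rank for sketch[i][-1] (0 if empty).
    return k * c_i - bonus_gain * last_rank / n_nodes

def optimistic_ratio_estimate(k, last_rank, n_nodes):
    # Optimistic marginal-gain-per-cost of the selected node: k|V| / sketch[i][-1].
    return k * n_nodes / last_rank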

H. Evaluating bonuses with sketches

In this section, we give details on the evaluation of the bonuses. Notice that the optimistic spread
\begin{equation}
\sigma(S;\mathbf{w}_{t-1}) + \text{Bonus}_2(S;\mathbf{w}_{t-1}) = \sum_{i \in V} p_i(S;\mathbf{w}_{t-1})\Bigg(1 + |V|\sqrt{\frac{\delta(t)\,d_i}{N^\oplus_{i,t-1}}}\Bigg) \tag{22}
\end{equation}
is actually a weighted spread, with weights $1 + |V|\sqrt{\delta(t)d_i/N^\oplus_{i,t-1}}$. Thus, the ranks used in the sketching have to be drawn from a distribution that depends on these weights (Cohen, 2016). This can be done using the exponential or uniform distribution (Cohen, 1997; Cohen and Kaplan, 2007). $\text{Bonus}_3(A)$ is the square root of a weighted spread, and the same as above holds. For $\text{Bonus}_1(A)$, in addition to the above weighting consideration, with weights $\delta(t)d_i/N^\oplus_{i,t-1}$, we have to take care of the squared probability $p_i(\{j\};\mathbf{w}_{t-1})^2$. To do so, the graph is replicated in each instance of the sketching, i.e., instead of considering combined reachability sets with a single graph per instance, we consider two independent graphs per instance, and look at node-instance pairs satisfying the reachability on both graphs. Notice that we leverage here the fact that we only evaluate on sets that are singletons $A = \{j\}$. Indeed, in this case, for a node $i \in V$, the probability that $A$ reaches $i$ on one instance, squared, equals the probability that some node in $A$ reaches $i$ on both instances.
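A minimal sketch of the weighted-rank trick mentioned above, assuming the standard exponential-rank construction for weighted bottom-$k$ sketches; the function and variable names are ours, and the weights below mimic those of (22).

import numpy as np

def weighted_ranks(weights, rng):
    # Rank of node i ~ Exp(rate = weights[i]): the node with the smallest rank
    # is a weighted sample, so bottom-k ranks estimate a weighted spread.
    return rng.exponential(1.0 / np.asarray(weights))

# Example with hypothetical weights 1 + |V| * sqrt(delta_t * d_i / N_i):
rng = np.random.default_rng(0)
V, delta_t = 100, 5.0
d = rng.integers(1, 10, size=V)   # hypothetical out-degrees
N = rng.integers(1, 50, size=V)   # hypothetical observation counters
w = 1.0 + V * np.sqrt(delta_t * d / N)
print(weighted_ranks(w, rng)[:5])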

I. Further experiments

In this section, we present other experiments that we conducted on a complete 10-node graph, with known costs $c^\star_0 = 1$ and, for all $i \in V$, $c^\star_i$ drawn uniformly at random in $(0,1)$. We also chose $\mathbf{w}^\star \sim \mathcal{U}(0, 0.1)^{\otimes E}$, as in Section 7. We compare the BOIM-CUCB algorithm to BOIM-CUCB-REGULARIZED, another algorithm that might challenge BOIM-CUCB in our setting. BOIM-CUCB-REGULARIZED is exactly BOIM-CUCB except that the objective optimized is $S \mapsto \sigma(S;\mathbf{w}_t) - \lambda\,\mathbf{e}_S^{\mathsf{T}}\mathbf{c}_t$, for $\lambda$ an input parameter of the algorithm. As with BOIM-CUCB, this algorithm seeks to maximize the spread $\sigma$ while keeping the cost small.


[Figure 2: five panels, each plotting the expected cumulative gap $\mathbb{E}\big[\sum_{t=1}^{\tau_B-1}\Delta(S_t)\big]$ (scale $10^4$) against the budget $B$, for BOIM-CUCB and BOIM-CUCB-REGULARIZED with $\lambda = 2, 3, 4$.]

Figure 2. Regret curves on five different problem instances, with respect to the budget B (expectation computed by averaging over 10 independent simulations).

The fundamental difference lies in the relative importance given to one objective or the other, controlled by $\lambda$. We use a greedy maximization in BOIM-CUCB-REGULARIZED: a greedy optimization of the objective $S \mapsto \sigma(S;\mathbf{w}_t) - \lambda\,\mathbf{e}_S^{\mathsf{T}}\mathbf{c}_t$ is a heuristic which, although not supported by theory, performs well in practice.
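A minimal Python sketch of such a greedy heuristic (ours, assuming $\sigma(\emptyset) = 0$ and access to an optimistic spread oracle `sigma`; the names are hypothetical):

def greedy_regularized(V, c, lam, sigma):
    # Greedily add nodes while the regularized objective
    # sigma(S) - lam * c(S) strictly improves; a heuristic, not exact.
    S, best = set(), 0.0          # objective of the empty set, assuming sigma({}) = 0
    improved = True
    while improved:
        improved = False
        candidates = set(V) - S
        if not candidates:
            break
        gains = {i: sigma(S | {i}) - lam * sum(c[j] for j in S | {i})
                 for i in candidates}
        i_star, val = max(gains.items(), key=lambda kv: kv[1])
        if val > best:
            S.add(i_star)
            best = val
            improved = True
    return S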

We run experiments over up to $T = 10000$ rounds, on five different draws of $\mathbf{w}^\star$ and $\mathbf{c}^\star$, and 3 different values $\lambda = 2, 3, 4$. Results are shown in Figure 2. We observe that BOIM-CUCB is in general better than BOIM-CUCB-REGULARIZED. If the parameter $\lambda$ is properly chosen, performance similar to that of BOIM-CUCB can be obtained. This is not surprising, since BOIM-CUCB aims (if only approximately) to select $S^\star_t \in \arg\max_{S \subset V} \sigma(S;\mathbf{w}_t)/\mathbf{e}_{S\cup\{0\}}^{\mathsf{T}}\mathbf{c}_t$. If $\lambda = \sigma(S^\star_t;\mathbf{w}_t)/\mathbf{e}_{S^\star_t\cup\{0\}}^{\mathsf{T}}\mathbf{c}_t$, then BOIM-CUCB-REGULARIZED also aims at choosing $S^\star_t$, since one can notice that $S^\star_t \in \arg\max_{S \subset V} \sigma(S;\mathbf{w}_t) - \lambda\,\mathbf{e}_S^{\mathsf{T}}\mathbf{c}_t$.

J. Proof of Theorem 4

J.1. Preliminaries

If we let $p_{S \rightsquigarrow S'}(\mathbf{w}) \triangleq \mathbb{P}\big[\{i \in V,\; S \rightsquigarrow_{\mathbf{W}} i\} = S'\big]$, then another expression is
\begin{align*}
\text{Bonus}_4(S;\mathbf{w}) &= \sum_{S' \supset S} p_{S \rightsquigarrow S'}(\mathbf{w})\,\underbrace{|V|\sqrt{\delta(t)\sum_{i \in S'} \frac{d_i}{N^\oplus_{i,t-1}}}}_{g(S')} \\
&= \sum_{k \ge 0} \big(g(S'_{k+1}) - g(S'_k)\big) \sum_{S' \notin \{S'_0,\ldots,S'_k\}} p_{S \rightsquigarrow S'}(\mathbf{w}),
\end{align*}
where the supersets of $S$ are ordered so that $g(S) = g(S'_1) \le g(S'_2) \le \cdots$, and $S'_0 = \emptyset$ (so that $g(S'_0) = 0$).

Since this bonus shall be used with $\mathbf{w}_{t-1}$, we need a smoothness inequality linking $p_{S \rightsquigarrow S'}(\mathbf{w}_{t-1})$ to $p_{S \rightsquigarrow S'}(\mathbf{w}^\star)$. We prove here the following such inequality.

Proposition 5. For all $S \subset V$, all $\mathbf{w}, \mathbf{w}' \in [0,1]^E$ and every collection $\mathcal{S}$ of subsets of vertices, we have
\[
\Bigg|\sum_{S' \in \mathcal{S}} \big(p_{S \rightsquigarrow S'}(\mathbf{w}) - p_{S \rightsquigarrow S'}(\mathbf{w}')\big)\Bigg| \le \sum_{ij \in E} p_i(S;\mathbf{w})\,\big|w'_{ij} - w_{ij}\big|.
\]

Proof. We assume w.l.o.g. that $\mathbf{w}' \ge \mathbf{w}$. We consider the random graph $G_{\mathbf{W}} = (V, \{ij \in E,\; W_{ij} = 1\})$, where $\mathbf{W} \sim \bigotimes_{ij \in E} \text{Bernoulli}(w_{ij})$. We build $G_{\mathbf{W}'}$ from $G_{\mathbf{W}}$ by adding, independently, each edge $ij$ that is not an edge of $G_{\mathbf{W}}$ with probability $\frac{w'_{ij} - w_{ij}}{1 - w_{ij}}$. Now, see that
\[
\sum_{S' \in \mathcal{S}} p_{S \rightsquigarrow S'}(\mathbf{w}) = \mathbb{P}\big[S \rightsquigarrow_{\mathbf{W}} S'\big] - \sum_{S'' \notin \mathcal{S}} p_{S \rightsquigarrow S''}(\mathbf{w}),
\]


where $S \rightsquigarrow_{\mathbf{W}} S'$ means $S \rightsquigarrow_{\mathbf{W}} i$ for all $i \in S'$. Thus,
\begin{align*}
0 \le p_{S \rightsquigarrow S'}(\mathbf{w}') - p_{S \rightsquigarrow S'}(\mathbf{w}) &= \mathbb{P}\big[S \rightsquigarrow_{\mathbf{W}'} S'\big] - \mathbb{P}\big[S \rightsquigarrow_{\mathbf{W}} S'\big] - \Bigg(\sum_{S'' \notin \mathcal{S}} p_{S \rightsquigarrow S''}(\mathbf{w}') - \sum_{S'' \notin \mathcal{S}} p_{S \rightsquigarrow S''}(\mathbf{w})\Bigg) \\
&\le \mathbb{P}\big[S \rightsquigarrow_{\mathbf{W}'} S'\big] - \mathbb{P}\big[S \rightsquigarrow_{\mathbf{W}} S'\big] \\
&= \mathbb{P}\big[S \rightsquigarrow_{\mathbf{W}'} S' \text{ but not } S \rightsquigarrow_{\mathbf{W}} S'\big] \\
&\le \mathbb{P}\big[\text{there is an edge } ij \text{ s.t. } S \rightsquigarrow_{\mathbf{W}} i,\; W_{ij} = 0, \text{ and } W'_{ij} = 1\big].
\end{align*}
The last inequality is obtained by noticing that if $S \rightsquigarrow_{\mathbf{W}'} S'$ but not $S \rightsquigarrow_{\mathbf{W}} S'$, then there must be an edge $ij$ accessible from $S$ in $G_{\mathbf{W}'}$ such that $W_{ij} = 0$ and $W'_{ij} = 1$. Taking the first such edge $ij$ (watching the contagion spread from $S$ step by step), we see that $ij$ must be accessible from $S$ in $G_{\mathbf{W}}$ as well (since otherwise there is a previous accessible edge $k\ell$ that verifies $W_{k\ell} = 0$ and $W'_{k\ell} = 1$).

We have that $S \rightsquigarrow_{\mathbf{W}} i$ is independent from $W_{ij}, W'_{ij}$. Since $\mathbb{P}\big[W_{ij} = 0 \text{ and } W'_{ij} = 1\big] = (1 - w_{ij})\frac{w'_{ij} - w_{ij}}{1 - w_{ij}} = w'_{ij} - w_{ij}$, we have
\[
\mathbb{P}\big[\text{there is an edge } ij \text{ s.t. } S \rightsquigarrow_{\mathbf{W}} i,\; W_{ij} = 0, \text{ and } W'_{ij} = 1\big] \le \sum_{ij \in E} p_i(S;\mathbf{w})\big(w'_{ij} - w_{ij}\big).
\]
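To see the coupling used in this proof at work, here is a small self-contained Monte Carlo sketch (ours, with hypothetical names) that draws $G_{\mathbf{W}}$ and $G_{\mathbf{W}'}$ jointly so that $G_{\mathbf{W}} \subseteq G_{\mathbf{W}'}$ always holds while the marginals are exactly $\mathbf{w}$ and $\mathbf{w}'$:

import numpy as np

rng = np.random.default_rng(0)
E  = [(0, 1), (1, 2), (0, 2)]        # a toy edge set
w  = np.array([0.2, 0.3, 0.1])       # weights w
wp = np.array([0.4, 0.3, 0.5])       # weights w' >= w

W = rng.random(len(E)) < w           # live edges under w
# Add each dead edge independently with probability (w' - w) / (1 - w):
add = rng.random(len(E)) < (wp - w) / (1.0 - w)
Wp = W | (~W & add)                  # live edges under w'
assert (W <= Wp).all()               # coupling: G_W is a subgraph of G_W'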

J.2. Main proof of Theorem 4

Proof. We apply a similar analysis as above. When all the events hold, the same analysis gives
\[
\Delta(S_t) \le 2\lambda^\star\alpha \sum_{i \in S_t} 1 \wedge 2\sqrt{\frac{1.5\log(t)}{N_{i,t-1}}} + 4\,\text{Bonus}_4(S_t;\mathbf{w}_{t-1}),
\]
and the first term can be handled in the same way. The second term can be analyzed in the following way: using Proposition 5 with $\mathcal{S} = \{S' \subset V,\; S' \notin \{S'_0,\ldots,S'_k\}\}$, we get
\begin{align*}
4\,\text{Bonus}_4(S_t;\mathbf{w}_{t-1}) &\le 4\,\text{Bonus}_4(S_t;\mathbf{w}^\star) + \sum_{k \ge 0}\big(g(S'_{k+1}) - g(S'_k)\big)\,\frac{1}{|V|}\,\text{Bonus}_4(S_t;\mathbf{w}^\star) \\
&= \Bigg(4 + \sqrt{\delta(t)\sum_{i \in V} \frac{d_i}{N^\oplus_{i,t-1}}}\Bigg)\text{Bonus}_4(S_t;\mathbf{w}^\star) \\
&\le 5\,\text{Bonus}_4(S_t;\mathbf{w}^\star),
\end{align*}
where the last inequality uses the event
\[
\mathcal{R}_t \triangleq \{\forall i \in V,\; N^\oplus_{i,t-1} \ge |E|\,\delta(t)\}.
\]
Relying on Theorem 5, we can deal with this last term and obtain a term of order
\[
\delta(B/c^\star_0) \sum_{i \in V} \frac{|V|^2 d_i \log^2(|E|)}{\Delta_{i,\min}}\cdot
\]

K. Proof of Proposition 4

Proof. There are two possibilities for $S^\star$: either $\mathbb{E}\big[\mathbf{e}_{S^\star\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star\big] < b$, or $\mathbb{E}\big[\mathbf{e}_{S^\star\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star\big] = b$. In the first case, we know that $S^\star$ is not random. Indeed, if it were, then $\frac{\mathbb{E}[\sigma(S^\star)]}{\mathbb{E}[\mathbf{e}_{S^\star\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star]}$ is a convex combination of some $\frac{\sigma(S)}{\mathbf{e}_{S\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star}$ for $S$ in the support of the distribution of $S^\star$. Necessarily, the maximizer (over $S$ in the support) of the ratio is such that $\mathbf{e}_{S\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star > b$, since otherwise this maximizer contradicts the definition of $S^\star$. Therefore, increasing the coefficient of this maximizer in the convex combination increases $\mathbb{E}\big[\mathbf{e}_{S^\star\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star\big]$, which can thus be raised to $b$. Since this also increases $\frac{\mathbb{E}[\sigma(S^\star)]}{\mathbb{E}[\mathbf{e}_{S^\star\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star]}$, we get a contradiction, since we improved the solution $S^\star$ while still satisfying the constraint.

• Consider the first case. We have, as in the proof of Proposition 2, that
\[
\big(1 - e^{-1}\big)\frac{\sigma(S^\star)}{\mathbf{e}_{S^\star\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star} \le \frac{(1-\beta)\sigma(S_\ell) + \beta\sigma(S_{\ell+1})}{(1-\beta)\mathbf{e}_{S_\ell\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star + \beta\,\mathbf{e}_{S_{\ell+1}\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star},
\]
where $\ell \in \{0,1,\ldots,|V|-1\}$ is such that $\mathbf{e}_{S_\ell}^{\mathsf{T}}\mathbf{c} \le \mathbf{e}_{S^\star}^{\mathsf{T}}\mathbf{c} \le \mathbf{e}_{S_{\ell+1}}^{\mathsf{T}}\mathbf{c}$, and $\mathbf{e}_{S^\star\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star = (1-\beta)\mathbf{e}_{S_\ell\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star + \beta\,\mathbf{e}_{S_{\ell+1}\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star$. In the case where $S_\ell$ has a greater ratio than $S_{\ell+1}$, it is chosen by our algorithm and has the desired approximation. In the case where $S_{\ell+1}$ has the better ratio, it is chosen if its cost is lower than $b$. If its cost is greater than $b$, then $\ell + 1 = j$ and the algorithm chooses $S_\ell$ with some probability $1 - \beta'$ and $S_{\ell+1}$ with probability $\beta'$. The goal is to show that the coefficient $\beta'$ we use for $S_{\ell+1}$ is greater than $\beta$. This must be the case, since
\[
(1-\beta')\mathbf{e}_{S_\ell\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star + \beta'\mathbf{e}_{S_{\ell+1}\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star = b > \mathbf{e}_{S^\star\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star = (1-\beta)\mathbf{e}_{S_\ell\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star + \beta\,\mathbf{e}_{S_{\ell+1}\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star.
\]

• For the second case, we let $S$ be the output of the Algorithm 1 considered by Wang et al. (2020). We thus have from their Theorem 1 that $\big(1 - e^{-1}\big)\mathbb{E}[\sigma(S^\star)] \le \mathbb{E}[\sigma(S)]$. Since $\mathbb{E}\big[\mathbf{e}_{S\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star\big] = \mathbb{E}\big[\mathbf{e}_{S^\star\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star\big] = b$, we have
\[
\big(1 - e^{-1}\big)\frac{\mathbb{E}[\sigma(S^\star)]}{\mathbb{E}\big[\mathbf{e}_{S^\star\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star\big]} \le \frac{\mathbb{E}[\sigma(S)]}{\mathbb{E}\big[\mathbf{e}_{S\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star\big]}\cdot
\]
If the expected cost of the output $S'$ of our algorithm is $b$, then both algorithms coincide and we have the desired result. Otherwise, $S'$ maximizes the ratio over $\{S_0,\ldots,S_j\}$, which contains the support of $S$ (that is, $\{S_{j-1}, S_j\}$), so the ratio evaluated at $S'$ is greater than $\frac{\mathbb{E}[\sigma(S)]}{\mathbb{E}[\mathbf{e}_{S\cup\{0\}}^{\mathsf{T}}\mathbf{c}^\star]}$, giving again the desired result.

L. Generalities on combinatorial multi-armed bandits

In this section, $i$ represents an "arm", i.e., an edge in our OIM context. $A_t$ is the random set of edges that are triggered at round $t$. Here, the horizon $T$ can be random. Finally, $b_i(S_t)$ is simply some non-negative function (for our $\text{Bonus}_4$, this is $|V|$ times the square root of the out-degree) and $m_i$ is the maximum number of edges that can be reached when $i$ is activated. The following theorem is based on Perrault et al. (2020b, Theorem 4).

Theorem 5 (Regret bound for $\ell_2$-bonus, with expectation outside the norm). For all $i \in [n]$, let $(\alpha_i, \beta_{i,T}) \in [1/2, 1) \times \mathbb{R}_+$. For $t \ge 1$, consider the event
\[
\mathcal{A}_t \triangleq \Bigg\{\Delta_t \le \mathbb{E}\Bigg[\bigg\|\sum_{i \in A_t,\, N_{i,t-1} > 0} \frac{b_i(S_t)\,\beta_{i,T}^{\alpha_i}\,\mathbf{e}_i}{N_{i,t-1}^{\alpha_i}}\bigg\|_2\,\Bigg|\,\mathcal{F}_t\Bigg]\Bigg\}.
\]
Then, if $\{t \le T\} \in \mathcal{F}_t$, we have
\[
\mathbb{E}\Bigg[\sum_{t=1}^{T} \mathbb{I}\{\mathcal{A}_t\}\Delta_t\Bigg] \le \sum_{i \in [n]} 4\log_2\big(4\sqrt{m_i}\big) \max_{S \in \mathcal{S},\, p_i(S) > 0} b_i(S)^{\frac{1}{\alpha_i}}\,\mathbb{E}[\beta_{i,T}]\,\eta_i,
\]
where
\[
\eta_i = \begin{cases} 32\log_2\big(4\sqrt{m_i}\big)\,\Delta_{i,\min}^{-1} & \text{if } \alpha_i = 1/2, \\[4pt] 2^{2\alpha_i}\Big(\big(1 - 2^{\frac{1}{\alpha_i}-2}\big)(1-\alpha_i)\,\Delta_{i,\min}^{\frac{1-\alpha_i}{\alpha_i}}\Big)^{-1} & \text{if } 1/2 < \alpha_i < 1. \end{cases}
\]


Proof. Let $t \ge 1$. With a first reverse amortisation, we start by restricting the set of possible values for $A_t$, keeping only those for which twice the error is at least $\Delta_t$: assuming that $\mathcal{A}_t$ holds, we have
\begin{align*}
\Delta_t &\le \mathbb{E}\Bigg[2\bigg\|\sum_{i \in A_t,\, N_{i,t-1} > 0} \frac{b_i(S_t)\beta_{i,T}^{\alpha_i}\mathbf{e}_i}{N_{i,t-1}^{\alpha_i}}\bigg\|_2 - \Delta_t\,\Bigg|\,\mathcal{F}_t\Bigg] \\
&\le \mathbb{E}\Bigg[\mathbb{I}\Bigg\{\bigg\|\sum_{i \in A_t,\, N_{i,t-1} > 0} \frac{2b_i(S_t)\beta_{i,T}^{\alpha_i}\mathbf{e}_i}{N_{i,t-1}^{\alpha_i}}\bigg\|_2 \ge \Delta_t\Bigg\}\bigg\|\sum_{i \in A_t,\, N_{i,t-1} > 0} \frac{2b_i(S_t)\beta_{i,T}^{\alpha_i}\mathbf{e}_i}{N_{i,t-1}^{\alpha_i}}\bigg\|_2\,\Bigg|\,\mathcal{F}_t\Bigg].
\end{align*}

We now define
\[
\Lambda(A_t) \triangleq \bigg\|\sum_{i \in A_t,\, N_{i,t-1} > 0} \frac{2b_i(S_t)\beta_{i,T}^{\alpha_i}\mathbf{e}_i}{N_{i,t-1}^{\alpha_i}}\bigg\|_2,
\]
and have, for any $j \in A_t$, that
\begin{equation}
\Lambda(A_t) \ge \frac{2b_j(S_t)\beta_{j,T}^{\alpha_j}}{N_{j,t-1}^{\alpha_j}}\cdot \tag{23}
\end{equation}

Then, we can write:
\begin{align*}
\Lambda(A_t) &= -\Lambda(A_t) + \bigg\|\sum_{i \in A_t,\, N_{i,t-1} > 0} \frac{4b_i(S_t)\beta_{i,T}^{\alpha_i}\mathbf{e}_i}{N_{i,t-1}^{\alpha_i}}\bigg\|_2 \\
&= -\bigg\|\sum_{i \in A_t} \frac{\Lambda(A_t)\,\mathbf{e}_i}{\|\mathbf{e}_{A_t}\|_2}\bigg\|_2 + \bigg\|\sum_{i \in A_t,\, N_{i,t-1} > 0} \frac{4b_i(S_t)\beta_{i,T}^{\alpha_i}\mathbf{e}_i}{N_{i,t-1}^{\alpha_i}}\bigg\|_2 \\
&\le \bigg\|\sum_{i \in A_t,\, N_{i,t-1} > 0} \bigg(\frac{4b_i(S_t)\beta_{i,T}^{\alpha_i}}{N_{i,t-1}^{\alpha_i}} - \frac{\Lambda(A_t)}{\|\mathbf{e}_{A_t}\|_2}\bigg)_{\!+}\mathbf{e}_i\bigg\|_2 \\
&= \bigg\|\sum_{i \in A_t,\, N_{i,t-1} > 0} \bigg(\frac{4b_i(S_t)\beta_{i,T}^{\alpha_i}}{N_{i,t-1}^{\alpha_i}} - \frac{\Lambda(A_t)}{\|\mathbf{e}_{A_t}\|_2}\bigg)_{\!+}\mathbb{I}\bigg\{\Lambda(A_t) \ge \frac{2b_i(S_t)\beta_{i,T}^{\alpha_i}}{N_{i,t-1}^{\alpha_i}}\bigg\}\mathbf{e}_i\bigg\|_2 && \text{using (23)} \\
&\le \bigg\|\sum_{i \in A_t,\, N_{i,t-1} > 0} \mathbb{I}\bigg\{2\Lambda(A_t) \ge \frac{4b_i(S_t)\beta_{i,T}^{\alpha_i}}{N_{i,t-1}^{\alpha_i}} \ge \frac{\Lambda(A_t)}{\|\mathbf{e}_{A_t}\|_2}\bigg\}\frac{4b_i(S_t)\beta_{i,T}^{\alpha_i}\mathbf{e}_i}{N_{i,t-1}^{\alpha_i}}\bigg\|_2.
\end{align*}

We now consider the following partition of the set of indices:
\[
\bigg\{i \in A_t,\; N_{i,t-1} > 0,\; 2\Lambda(A_t) \ge \frac{4b_i(S_t)\beta_{i,T}^{\alpha_i}}{N_{i,t-1}^{\alpha_i}} \ge \frac{\Lambda(A_t)}{\|\mathbf{e}_{A_t}\|_2}\bigg\} \subset \bigcup_{k=0}^{\lceil\log_2(\|\mathbf{e}_{A_t}\|_2)\rceil} J_{k,t},
\]
where, for every integer $0 \le k \le \lceil\log_2(\|\mathbf{e}_{A_t}\|_2)\rceil$,
\[
J_{k,t} \triangleq \bigg\{i \in A_t,\; N_{i,t-1} > 0,\; 2^{1-k}\Lambda(A_t) \ge \frac{4b_i(S_t)\beta_{i,T}^{\alpha_i}}{N_{i,t-1}^{\alpha_i}} \ge 2^{-k}\Lambda(A_t)\bigg\}.
\]


We bound $\Lambda(A_t)^2$ as
\begin{align*}
\Lambda(A_t)^2 &\le \bigg\|\sum_{i \in A_t,\, N_{i,t-1} > 0} \mathbb{I}\bigg\{2\Lambda(A_t) \ge \frac{4b_i(S_t)\beta_{i,T}^{\alpha_i}}{N_{i,t-1}^{\alpha_i}} \ge \frac{\Lambda(A_t)}{\|\mathbf{e}_{A_t}\|_2}\bigg\}\frac{4b_i(S_t)\beta_{i,T}^{\alpha_i}\mathbf{e}_i}{N_{i,t-1}^{\alpha_i}}\bigg\|_2^2 \\
&= \sum_{k=0}^{\lceil\log_2(\|\mathbf{e}_{A_t}\|_2)\rceil} \bigg\|\sum_{i \in J_{k,t}} \frac{4b_i(S_t)\beta_{i,T}^{\alpha_i}\mathbf{e}_i}{N_{i,t-1}^{\alpha_i}}\bigg\|_2^2 \\
&\le \sum_{k=0}^{\lceil\log_2(\|\mathbf{e}_{A_t}\|_2)\rceil} 2^{2-2k}\Lambda(A_t)^2\,\big\|\mathbf{e}_{J_{k,t}}\big\|_2^2.
\end{align*}
So there exists an integer $k_t$ such that
\[
|J_{k_t,t}| = \big\|\mathbf{e}_{J_{k_t,t}}\big\|_2^2 \ge 2^{2k_t-2}\big(1 + \lceil\log_2(\|\mathbf{e}_{A_t}\|_2)\rceil\big)^{-1}.
\]

\begin{align*}
\sum_{t=1}^{T} \mathbb{I}\{\mathcal{A}_t\}\Delta_t &\le \sum_{t=1}^{T} \mathbb{E}\big[\mathbb{I}\{\Lambda(A_t) \ge \Delta_t\}\,\Lambda(A_t)\,\big|\,\mathcal{F}_t\big] \\
&\le \sum_{t=1}^{T} \mathbb{E}\Bigg[\sum_{k=0}^{\lceil\log_2(\|\mathbf{e}_{A_t}\|_2)\rceil} \mathbb{I}\{k_t = k,\; \Lambda(A_t) \ge \Delta_t\}\,\Lambda(A_t)\,\Bigg|\,\mathcal{F}_t\Bigg] \\
&\le \sum_{t=1}^{T} \mathbb{E}\Bigg[\sum_{k=0}^{\lceil\log_2(\|\mathbf{e}_{A_t}\|_2)\rceil} \frac{\mathbb{I}\{k_t = k,\; \Lambda(A_t) \ge \Delta_t\}\sum_{i \in [n]} \mathbb{I}\{i \in J_{k,t}\}}{2^{2k-2}\big(1 + \lceil\log_2(\|\mathbf{e}_{A_t}\|_2)\rceil\big)^{-1}}\,\Lambda(A_t)\,\Bigg|\,\mathcal{F}_t\Bigg] \\
&\le \sum_{t=1}^{T} \sum_{i=1}^{n} \mathbb{E}\Bigg[\sum_{k=0}^{\lceil\log_2(\|\mathbf{e}_{A_t}\|_2)\rceil} \frac{\mathbb{I}\Big\{i \in A_t,\; 0 < N_{i,t-1}^{\alpha_i} \le \frac{2^{k+2}b_i(S_t)\beta_{i,T}^{\alpha_i}}{\Lambda(A_t)},\; \Lambda(A_t) \ge \Delta_t\Big\}}{2^{2k-2}\big(1 + \lceil\log_2(\|\mathbf{e}_{A_t}\|_2)\rceil\big)^{-1}}\,\Lambda(A_t)\,\Bigg|\,\mathcal{F}_t\Bigg].
\end{align*}

Taking the expectation of the above, and using $\{t \le T\} \in \mathcal{F}_t$, we have the bound
\begin{align*}
\mathbb{E}\Bigg[\sum_{t=1}^{T} \mathbb{I}\{\mathcal{A}_t\}\Delta_t\Bigg] &\le \sum_{i=1}^{n} \mathbb{E}\Bigg[\sum_{t=1}^{T}\sum_{k=0}^{\lceil\log_2(\|\mathbf{e}_{A_t}\|_2)\rceil} \frac{\mathbb{I}\Big\{i \in A_t,\; 0 < N_{i,t-1}^{\alpha_i} \le \frac{2^{k+2}b_i(S_t)\beta_{i,T}^{\alpha_i}}{\Lambda(A_t)},\; \Lambda(A_t) \ge \Delta_t\Big\}}{2^{2k-2}\big(1 + \lceil\log_2(\|\mathbf{e}_{A_t}\|_2)\rceil\big)^{-1}}\,\Lambda(A_t)\Bigg] \\
&\le \sum_{i=1}^{n} \sum_{k=0}^{\lceil\log_2(\sqrt{m_i})\rceil} \mathbb{E}\Bigg[\frac{1 + \big\lceil\log_2\big(\sqrt{m_i}\big)\big\rceil}{2^{2k-2}} \underbrace{\sum_{t=1}^{T} \mathbb{I}\Big\{i \in A_t,\; 0 < N_{i,t-1}^{\alpha_i} \le \frac{2^{k+2}b_i(S_t)\beta_{i,T}^{\alpha_i}}{\Lambda(A_t)},\; \Lambda(A_t) \ge \Delta_t\Big\}\,\Lambda(A_t)}_{(24)_{i,k}}\Bigg].
\end{align*}

Applying Proposition 6 gives
\[
(24)_{i,k} \le \frac{\max_{S \in \mathcal{S},\, p_i(S) > 0} b_i(S)^{\frac{1}{\alpha_i}}\,\beta_{i,T}\,2^{\frac{k+2}{\alpha_i}}}{1 - \alpha_i}\,\Delta_{i,\min}^{1 - 1/\alpha_i},
\]
so, using $\big\lceil\log_2\big(\sqrt{m_i}\big)\big\rceil + 1 \le \log_2\big(4\sqrt{m_i}\big)$, we get
\[
\mathbb{E}\Bigg[\sum_{t=1}^{T} \mathbb{I}\{\mathcal{A}_t\}\Delta_t\Bigg] \le \sum_{i \in [n]} 4\log_2\big(4\sqrt{m_i}\big) \max_{S \in \mathcal{S},\, p_i(S) > 0} b_i(S)^{\frac{1}{\alpha_i}}\,\mathbb{E}[\beta_{i,T}]\,\eta_i,
\]


where
\[
\eta_i = \begin{cases} 32\log_2\big(4\sqrt{m_i}\big)\,\Delta_{i,\min}^{-1} & \text{if } \alpha_i = 1/2, \\[4pt] 2^{2\alpha_i}\Big(\big(1 - 2^{\frac{1}{\alpha_i}-2}\big)(1-\alpha_i)\,\Delta_{i,\min}^{\frac{1-\alpha_i}{\alpha_i}}\Big)^{-1} & \text{if } 1/2 < \alpha_i < 1. \end{cases}
\]

Proposition 6. Let $i \in [n]$ and let $f_i : \mathbb{R}_+ \to \mathbb{R}_+$ be a non-increasing function, integrable on an interval $[\delta_{i,\min}, \delta_{i,\max}] \subset \mathbb{R}^\star_+$. Then, for any sequence of real numbers $(\delta_t) \in ([\delta_{i,\min}, \delta_{i,\max}] \cup \{0\})^T$,
\[
\sum_{t=1}^{T} \mathbb{I}\{i \in A_t,\; 1 \le N_{i,t-1} \le f_i(\delta_t)\}\,\delta_t \le f_i(\delta_{i,\min})\,\delta_{i,\min} + \int_{\delta_{i,\min}}^{\delta_{i,\max}} f_i(x)\,dx.
\]
In particular:

• If $f_i(x) = \beta_{i,T}\,x^{-1/\alpha_i}$ with $\alpha_i \in (0,1)$ and $\beta_{i,T} \ge 0$, then
\[
\sum_{t=1}^{T} \mathbb{I}\{i \in A_t,\; 1 \le N_{i,t-1} \le f_i(\delta_t)\}\,\delta_t \le \delta_{i,\min}^{1-1/\alpha_i}\,\frac{\beta_{i,T}}{1-\alpha_i} - \delta_{i,\max}^{1-1/\alpha_i}\,\frac{\alpha_i\beta_{i,T}}{1-\alpha_i} \le \delta_{i,\min}^{1-1/\alpha_i}\,\frac{\beta_{i,T}}{1-\alpha_i}\cdot
\]

• If $f_i(x) = \beta_{i,T}\,x^{-1}$ with $\beta_{i,T} \ge 0$, then
\[
\sum_{t=1}^{T} \mathbb{I}\{i \in A_t,\; 1 \le N_{i,t-1} \le f_i(\delta_t)\}\,\delta_t \le \beta_{i,T}\Bigg(1 + \log\bigg(\frac{\delta_{i,\max}}{\delta_{i,\min}}\bigg)\Bigg).
\]

Proof. Let $\delta_{i,\max} = \delta_{i,1} \ge \delta_{i,2} \ge \cdots \ge \delta_{i,K_i} = \delta_{i,\min}$ be all the possible values of $\delta_t$ when $\delta_t \neq 0$. We define a dummy gap $\delta_{i,0} = \infty$ and let $f_i(\delta_{i,0}) = 0$. In (25), we look at the times $t$ where $\delta_t \neq 0$ and first break the range $(0, f_i(\delta_t)]$ of the counter $N_{i,t-1}$ into sub-intervals:
\[
(0, f_i(\delta_t)] = (f_i(\delta_{i,0}), f_i(\delta_{i,1})] \cup \cdots \cup (f_i(\delta_{i,k_t-1}), f_i(\delta_{i,k_t})],
\]
where $k_t$ is the index such that $\delta_{i,k_t} = \delta_t$. This index $k_t$ exists by the assumption that the subdivision contains all possible values of $\delta_t$ when $\delta_t \neq 0$. Notice that in (25), we do not explicitly use $k_t$, but instead sum over all $k \in [K_i]$ and filter against the event $\{\delta_{i,k} \ge \delta_t\}$, which is equivalent to summing over $k \in [k_t]$:
\begin{equation}
\sum_{t=1}^{T} \mathbb{I}\{i \in A_t,\; N_{i,t-1} \le f_i(\delta_t)\}\,\delta_t = \sum_{t=1}^{T}\sum_{k=1}^{K_i} \mathbb{I}\{i \in A_t,\; f_i(\delta_{i,k-1}) < N_{i,t-1} \le f_i(\delta_{i,k}),\; \delta_{i,k} \ge \delta_t\}\,\delta_t. \tag{25}
\end{equation}

Over each event that $N_{i,t-1}$ belongs to the interval $(f_i(\delta_{i,k-1}), f_i(\delta_{i,k})]$, we upper bound the gap $\delta_t$ by $\delta_{i,k}$:
\begin{equation}
(25) \le \sum_{t=1}^{T}\sum_{k=1}^{K_i} \mathbb{I}\{i \in A_t,\; f_i(\delta_{i,k-1}) < N_{i,t-1} \le f_i(\delta_{i,k}),\; \delta_{i,k} \ge \delta_t\}\,\delta_{i,k}. \tag{26}
\end{equation}


Then, we further upper bound the summation by adding the events that $N_{i,t-1}$ belongs to the remaining intervals $(f_i(\delta_{i,k-1}), f_i(\delta_{i,k})]$ for $k_t < k \le K_i$, associating them to a suffered gap $\delta_{i,k}$. This is equivalent to removing the filtering against the event $\{\delta_{i,k} \ge \delta_t\}$:
\begin{equation}
(26) \le \sum_{t=1}^{T}\sum_{k=1}^{K_i} \mathbb{I}\{i \in A_t,\; f_i(\delta_{i,k-1}) < N_{i,t-1} \le f_i(\delta_{i,k})\}\,\delta_{i,k}. \tag{27}
\end{equation}
Now, we invert the summation over $t$ and the one over $k$:
\begin{equation}
(27) = \sum_{k=1}^{K_i}\sum_{t=1}^{T} \mathbb{I}\{i \in A_t,\; f_i(\delta_{i,k-1}) < N_{i,t-1} \le f_i(\delta_{i,k})\}\,\delta_{i,k}. \tag{28}
\end{equation}

For each $k \in [K_i]$, the number of times $t \in [T]$ that the counter $N_{i,t-1}$ belongs to $(f_i(\delta_{i,k-1}), f_i(\delta_{i,k})]$ can be upper bounded by the number of integers in this interval. This is due to the event $\{i \in A_t\}$, which imposes that $N_{i,t-1}$ is incremented, so $N_{i,t-1}$ cannot take the same integer value at two different times $t$ satisfying $i \in A_t$. We use the fact that for all $x, y \in \mathbb{R}$ with $x \le y$, the number of integers in the interval $(x, y]$ is exactly $\lfloor y \rfloor - \lfloor x \rfloor$:
\begin{equation}
(28) \le \sum_{k=1}^{K_i} \big(\lfloor f_i(\delta_{i,k}) \rfloor - \lfloor f_i(\delta_{i,k-1}) \rfloor\big)\,\delta_{i,k}. \tag{29}
\end{equation}

We then simply expand the summation, and some terms cancel (remember that $f_i(\delta_{i,0}) = 0$):
\begin{equation}
(29) = \lfloor f_i(\delta_{i,K_i}) \rfloor\,\delta_{i,K_i} + \sum_{k=1}^{K_i-1} \lfloor f_i(\delta_{i,k}) \rfloor\,(\delta_{i,k} - \delta_{i,k+1}). \tag{30}
\end{equation}

We use $\lfloor x \rfloor \le x$ for all $x \in \mathbb{R}$. Finally, we recognize a right Riemann sum, and use the fact that $f_i$ is non-increasing to upper bound each $f_i(\delta_{i,k})(\delta_{i,k} - \delta_{i,k+1})$ by $\int_{\delta_{i,k+1}}^{\delta_{i,k}} f_i(x)\,dx$, for all $k \in [K_i - 1]$:
\begin{align}
(30) &\le f_i(\delta_{i,K_i})\,\delta_{i,K_i} + \sum_{k=1}^{K_i-1} f_i(\delta_{i,k})(\delta_{i,k} - \delta_{i,k+1}) \tag{31} \\
&\le f_i(\delta_{i,K_i})\,\delta_{i,K_i} + \int_{\delta_{i,K_i}}^{\delta_{i,1}} f_i(x)\,dx. \tag{32}
\end{align}
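As a sanity check of the first special case of Proposition 6 (our own illustration, not part of the proof), the following Python snippet verifies the bound on a random gap sequence in which the counter is incremented at every round; all names are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
T, alpha, beta = 500, 0.5, 50.0
f = lambda x: beta * x ** (-1.0 / alpha)   # f_i(x) = beta * x^(-1/alpha)

d_min, d_max = 0.1, 1.0
deltas = rng.uniform(d_min, d_max, size=T)  # gaps delta_t in [d_min, d_max]
lhs, N = 0.0, 0                             # N plays the role of N_{i,t-1}
for t in range(T):
    if 1 <= N <= f(deltas[t]):              # the filtered event of Proposition 6
        lhs += deltas[t]
    N += 1                                  # i is in A_t at every round here
rhs = d_min ** (1 - 1 / alpha) * beta / (1 - alpha)
assert lhs <= rhs, (lhs, rhs)
print(f"lhs = {lhs:.1f} <= rhs = {rhs:.1f}")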